Systems Programming in Practice: My Journey Through CS:APP as a Rust Engineer
There's a certain magic to writing code, but the real power comes from understanding the underlying system. That's a lesson I'm reinforcing daily as I embark on an immersive journey through 'Computer Systems: A Programmer's Perspective' (CS:APP).
This article isn't just a reading log; it's my personal roadmap for mastering the fundamentals of computer architecture, data representation, and operating systems. Each day, I'll be sharing a key concept, a challenging exercise, or a practical insight gained. I invite you to follow this journey with me – let's demystify the machine, one bit at a time!
There are two types of software engineers: those who understand computer science well enough to do challenging, innovative work, and those who just get by because they're familiar with a few high-level tools.
Chapter #1: A Tour of Computer Systems
Chapter 1 serves as an excellent introduction to this exploration. It let me learn new concepts and deepen my understanding of existing ones. As I noted in my previous post, abstraction is the defining theme of this chapter. Ultimately, computer systems are built upon layers of abstraction that, by design, keep us from direct hardware interaction – and that's generally beneficial. However, as developers, a solid grasp of these underlying mechanisms is essential.
Chapter #2: Representing and Manipulating Information
Chapter 2 of CS:APP, diving into data representation and low-level arithmetic, truly highlights one of Rust's core strengths over C/C++. While C/C++ often employs implicit type conversions—silently reinterpreting bits when mixing signed and unsigned types—Rust makes these crucial operations explicit.
This prevents classic pitfalls, like a -1 unexpectedly becoming a massive positive number during a comparison, or an integer overflow leading to undefined, unpredictable behavior.
Rust's insistence on as for casting, and its well-defined overflow handling (panicking in debug mode, wrapping predictably in release), means you're always aware of how your data is being manipulated at the bit level.
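A minimal sketch of what this looks like in Rust (the values are purely illustrative):

fn main() {
    // In C, comparing a signed -1 with an unsigned value silently converts
    // -1 into a huge unsigned number. In Rust, the reinterpretation must be
    // written out with `as`.
    let x: i32 = -1;
    let y: u32 = x as u32; // 4294967295, and it is visible in the source
    println!("-1 as u32 = {}", y);

    // Overflow is explicit too: you pick the behavior you want.
    let big: u8 = 255;
    println!("wrapping:   {}", big.wrapping_add(1));    // 0
    println!("checked:    {:?}", big.checked_add(1));   // None
    println!("saturating: {}", big.saturating_add(1));  // 255
    // A bare `big + 1` would panic in debug builds and wrap in release builds.
}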
This deliberate design eliminates an entire class of subtle, hard-to-debug errors, making your low-level code significantly safer and more reliable than its C/C++ counterparts.
Chapter #3: Machine-Level Representation of Programs
As a Rust developer, Chapter 3 is a fascinating immersion into the inner workings of a computer. It shows how C code (and by extension, Rust code) is transformed into a language the processor understands: assembly, then binary machine code.
This chapter is fundamental because it establishes the direct link between your high-level lines of code and the microscopic operations performed by the CPU. The big difference is that Rust offers powerful and safe abstractions over these low-level mechanisms, whereas C often lets us manipulate them directly, with all the risks that entails.
To see the assembly code the compiler generates, there is a handy command:
$ rustc --emit asm [filename].rs
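For example, with a tiny file like this (add.rs is just a made-up name), the command above produces an add.s file next to the source; adding -O shows the optimized version:

// add.rs - a trivial function whose generated assembly is easy to read
pub fn add(a: u64, b: u64) -> u64 {
    a + b
}

fn main() {
    println!("{}", add(2, 3));
}

$ rustc -O --emit asm add.rs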
This chapter gives me a better understanding of how Rust protects my code and allows me to achieve near-C performance, while also giving me a better understanding of what's going on "under the hood." It's the bridge between application logic and hardware execution.
Chapter #4: Processor Architecture
So, after diving into this chapter, even with my "Nand to Tetris" background, I feel like I've lifted yet another hood on my computer, and this time, things are getting seriously detailed!
Before, a CPU felt a bit like magic. Now, I understand that even the simplest instruction goes through a clearly defined six-step path: first you fetch the instruction, then you decode it to know what to do, next you execute the operation (like a little calculation), after that, if needed, you access memory (to load or store data), then you write back the result somewhere, and finally you update the program counter to move on.
That's the basic "step-by-step" mode, which really helps visualize the whole process, a true breakdown that adds flesh to the circuit skeletons I already knew.
But where it gets really interesting and relevant for "real life" is with pipelining. From "Nand to Tetris," I had the clock cycle concept down, but this chapter showed me the complex dance of overlapping instructions. That's the secret to why our CPUs aren't super slow!
The processor doesn't just do one instruction at a time like we did in Hack; no, it handles multiple instructions simultaneously, each at a different stage of its "pipe." And that's where hazards (the bottlenecks or wrong turns) become a nightmare: imagine data dependencies, or worse, unpredictable branches that force the CPU to flush its pipeline and start all over.
For me, as a Rust developer, this changes everything! It makes me think about how I should structure my code to help the processor better predict my intentions and avoid those costly pauses.
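As a toy illustration (a sketch, not a benchmark), here are two ways to sum the positive elements of a slice. The first makes the predictor guess a data-dependent branch on every element; the second gives it nothing to mispredict:

// Branchy: the outcome of `if x > 0` depends on the data, so on random input
// the predictor will regularly guess wrong and the pipeline gets flushed.
fn sum_positive_branchy(data: &[i32]) -> i64 {
    let mut sum = 0i64;
    for &x in data {
        if x > 0 {
            sum += x as i64;
        }
    }
    sum
}

// Branchless: the select is plain data flow, which the compiler can often
// lower to a conditional move instead of a jump.
fn sum_positive_branchless(data: &[i32]) -> i64 {
    let mut sum = 0i64;
    for &x in data {
        sum += if x > 0 { x as i64 } else { 0 };
    }
    sum
}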
It's crazy how understanding the "under the hood" at this level of granularity makes performance less mysterious and more manageable, right?
Chapter #5: Optimizing Program Performance
As I delved into Chapter 5, dedicated to optimizing program performance, I realized how much the speed of our code depends not just on the algorithm chosen, but primarily on how it dances with the hardware. I now understand that our CPUs are true marvels, capable of incredible performance thanks to pipelining, their multiple parallel computing units, and their ability to predict our next moves.
However, I also discovered that these superpowers have their "Kryptonites": data dependencies that force the CPU to wait, mispredicted branches that flush the pipeline and force a restart, and especially, the exorbitant cost of distant memory accesses – those "cache misses" that set us back tens or even hundreds of cycles.
This chapter clearly showed me that the true bottlenecks in our programs are not always where we expect them. Loops, for instance, even simple ones, introduce management overhead (checks, increments, jumps) that accumulates and slows down execution.
Unpredictable branches are the pipeline's worst enemy, turning fluidity into chaos. And the way our programs access memory is crucial: not leveraging the CPU's fast caches is like buying every ingredient from the supermarket one by one instead of using the kitchen's well-stocked fridge. This is where code profiling becomes indispensable, as it precisely points out where these time losses are hiding.
Fortunately, CS:APP equips us with powerful strategies to counter these weaknesses. I particularly appreciated understanding Loop Unrolling, that clever trick of doing more work per iteration to reduce overhead and branches.
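Here is a rough sketch of the book's "2x2" idea transposed to Rust: two elements per iteration and two independent accumulators (in real code I would measure before trusting it):

// Plain loop: one add per element, plus loop bookkeeping every single time.
fn sum_simple(data: &[i64]) -> i64 {
    let mut sum = 0;
    for &x in data {
        sum += x;
    }
    sum
}

// Unrolled by two, with two accumulators: less overhead per element, and the
// two accumulators form independent dependency chains the CPU can overlap.
fn sum_unrolled(data: &[i64]) -> i64 {
    let (mut s0, mut s1) = (0i64, 0i64);
    let mut chunks = data.chunks_exact(2);
    for pair in &mut chunks {
        s0 += pair[0];
        s1 += pair[1];
    }
    for &x in chunks.remainder() {
        s0 += x; // leftover element when the length is odd
    }
    s0 + s1
}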
Similarly, the concept of Conditional Moves fascinated me: for simple conditions, the CPU can sometimes compute both outcomes and choose without a costly branch, thus avoiding the penalties of mispredictions.
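A tiny sketch of the idea: a simple select like this is exactly the kind of code the compiler can lower to a conditional move rather than a jump (whether it actually does depends on the target and optimization level):

// Both "arms" are trivially cheap, so evaluating them and picking one with a
// cmov-style instruction is often cheaper than risking a misprediction.
fn max_of(a: i64, b: i64) -> i64 {
    if a > b { a } else { b }
}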
Finally, optimizing cache locality by structuring our data accesses (e.g., "cache blocking" for matrices) has become a priority to ensure that needed data is always readily available to the processor.
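As an illustration of the blocking idea, here is a sketch for square matrices stored row-major in flat slices (the block size is arbitrary and would need tuning):

// Blocked matrix multiply: work on BLOCK x BLOCK tiles so the pieces of
// a, b and c being reused stay resident in cache. Assumes c starts at zero.
const BLOCK: usize = 64;

fn matmul_blocked(a: &[f64], b: &[f64], c: &mut [f64], n: usize) {
    for ii in (0..n).step_by(BLOCK) {
        for kk in (0..n).step_by(BLOCK) {
            for jj in (0..n).step_by(BLOCK) {
                for i in ii..(ii + BLOCK).min(n) {
                    for k in kk..(kk + BLOCK).min(n) {
                        let aik = a[i * n + k];
                        for j in jj..(jj + BLOCK).min(n) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}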
The big lesson I'm taking away from Chapter 5 of CS:APP is fundamental: to truly optimize my code's performance, I need to look beyond language syntax.
It's imperative to consider how my high-level code will be transformed into machine instructions by the compiler, and how those instructions will then be executed by the CPU, taking into account its pipelines, functional units, and memory hierarchy.
It's this deep understanding of "how it really works" that will allow me to guide the Rustc compiler towards even finer optimizations and unlock the full potential of my applications. A crucial step in my journey to become a more efficient and system-aware engineer.
Chapter #6: The Memory Hierarchy
Even though I was already familiar with the concept of memory hierarchy, this chapter has profoundly refined my understanding of why memory is the true key (or bottleneck) to our programs' performance. It's no longer just an idea; it's a detailed reality that directly impacts every line of code I write.
I really dove deep into the specifics of this pyramid of speeds, from the ultra-fast but tiny CPU registers, to the caches (L1, L2, L3) which are larger but slightly slower, all the way down to RAM (main memory), which is massive but frustratingly slow compared to the caches. The big revelation was understanding precisely the cost of each "descent" in this hierarchy – a cost that, accumulated, can cripple the performance of any program, even with a supercharged CPU.
This chapter primarily helped me grasp the two fundamental principles guiding the CPU's cache management: temporal locality and spatial locality.
⌛ Temporal locality explains why the CPU keeps recently used data close by in the cache, anticipating its next use.
🎯 But it was spatial locality that was most enlightening: understanding that the CPU fetches an entire block of memory at once when you access a single piece of data, and that the key is to exploit this.
This is where the concept of a "Stride-1 reference" makes perfect sense: accessing data consecutively in memory isn't just good practice; it's the golden rule for maximizing cache efficiency and making performance soar on the "Memory Mountain." 🎢
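Concretely, for a matrix stored row-major in a flat slice, the two loops below compute the same sum but walk memory very differently (a sketch; the dimensions are whatever the caller passes):

// Stride-1: visits elements in the exact order they sit in memory, so every
// cache line that gets fetched is used completely.
fn sum_row_major(m: &[f64], rows: usize, cols: usize) -> f64 {
    let mut sum = 0.0;
    for i in 0..rows {
        for j in 0..cols {
            sum += m[i * cols + j];
        }
    }
    sum
}

// Stride-`cols`: jumps an entire row ahead on every access, so once the matrix
// outgrows the cache, almost every access is a cache miss.
fn sum_col_major(m: &[f64], rows: usize, cols: usize) -> f64 {
    let mut sum = 0.0;
    for j in 0..cols {
        for i in 0..rows {
            sum += m[i * cols + j];
        }
    }
    sum
}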
This in-depth exploration of the memory hierarchy and these principles of locality is crucial. It explains why code that "jumps" all over memory is a performance disaster (due to numerous "cache misses"!), while code that accesses data sequentially can be blazing fast. Now, I no longer look at my data structures or loops the same way.
Thinking "cache-friendly" has become a new, more precise habit, deeply rooted in the intimate workings of the system!
Chapter #7: Linking
After exploring the fundamentals of "linking" in this chapter, one of the most enlightening distinctions has been that between static linking and dynamic linking. This often-invisible difference has a colossal impact on the size of our programs and how they behave once deployed. I finally understood why my Rust "Hello World" is sometimes chunkier than expected!
Static linking is much like preparing an "all-in-one" survival kit. When you link your code this way, the linker takes all the necessary code and data from every library you use and copies them directly into your final executable.
The result?
A completely self-contained binary. It needs nothing else on the target system to run, which is fantastic for portability – you copy it, and it just runs. However, this autonomy comes at a price: the executable file size is often significantly larger. In Rust, this is the default behavior for most of your dependencies (crates), which are compiled into .rlib (static libraries). This is why even a small Rust program might seem "big" at first; it carries a large part of its environment with it. Strategies like compiling with musl for Linux push this concept to the extreme, creating truly self-sufficient binaries, perfect for containers or minimalist deployments.
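For the record, that musl workflow is roughly this (shown here for x86_64 Linux; the resulting binary no longer depends on the system's libc):

$ rustup target add x86_64-unknown-linux-musl
$ cargo build --release --target x86_64-unknown-linux-musl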
Conversely, there's dynamic linking, which I visualize as subscribing to a shared service. Here, your executable doesn't contain the library code itself, but only references to it. The operating system, when your program starts, will search for these shared libraries (the familiar .so on Linux, .dll on Windows, or .dylib on macOS) and load them into memory.
The advantage is clear: your executable is much smaller, and if multiple programs use the same library, only one copy is loaded into RAM, saving valuable resources. Furthermore, updates to these libraries can automatically benefit all programs that use them without needing to be recompiled. Rust can also generate and use dynamic libraries (through the dylib and cdylib crate types), which is often the case when you interact with C libraries via FFI. However, dynamic linking also has its challenges: "DLL Hell" on Windows (library versioning issues), or the requirement that libraries must be present on the target system for your program to even start.
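If you do want Cargo to produce a shared library, for instance to expose a C-compatible API, that is declared in Cargo.toml, along these lines:

[lib]
crate-type = ["cdylib"]   # emits a .so / .dylib / .dll instead of an .rlib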
Understanding these two linking approaches is essential. It's a constant trade-off between binary size, portability, ease of updates, and managing dependencies on the target system.
Thanks to this chapter, I can now make informed decisions about how I build and deploy my Rust applications, knowing precisely the implications of each linking choice. It's another piece of the performance and system management puzzle!
Chapter #8: Exceptional Control Flow
This chapter is a goldmine for Rustaceans like us! It pulls back the curtain on why our code behaves the way it does, making us better, more informed developers.
First off, your Rust threads aren't magic. Ever wonder why std::thread::spawn isn't "free"? It's because you're asking the OS to create and manage that thread, which involves a system call—a precise "exceptional control flow" operation. Understanding this cost helps you decide when to use OS threads and when lighter-weight async/await tasks might be a better fit.
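A minimal sketch of the point: every call to spawn below asks the kernel for a brand-new OS thread, with its own stack, which is exactly why it isn't free:

use std::thread;

fn main() {
    // Each spawn is a request to the OS; the JoinHandle lets us wait for the
    // thread and collect its result.
    let handles: Vec<_> = (0..4)
        .map(|i| thread::spawn(move || i * i))
        .collect();

    for h in handles {
        println!("result: {}", h.join().unwrap());
    }
}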
Next, panic! is your loud, honest friend. When your Rust program panics, it's not just a random crash. Chapter 8 helps you see it as a controlled, language-level cousin of the traps and exceptions described there: Rust's default stack unwinding, followed by process termination, is the runtime stepping in to manage an unrecoverable situation. It's Rust's way of saying "I can't continue safely, so let's stop now to prevent worse bugs later."
Moreover, every OS interaction has a price. Reading a file with std::fs, sending data over std::net, or launching another program with std::process::Command—these all involve system calls. This chapter demystifies that process, showing you how your user-mode code temporarily hands control to the kernel. Knowing this helps you understand why I/O operations can be slow and why non-blocking I/O (often used in async Rust) is so crucial for performance.
Finally, you can start building resilient apps by understanding signals. The OS uses signals (like when you hit Ctrl+C) to communicate with your program. Understanding these allows you to write Rust applications that can gracefully shut down, clean up resources, or react to external events, making your software more robust.
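The standard library has no signal API of its own, so as an illustration here is a sketch using the third-party ctrlc crate (just one common option) to turn Ctrl+C into a graceful shutdown:

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

fn main() {
    let running = Arc::new(AtomicBool::new(true));
    let r = running.clone();

    // When SIGINT arrives, flip a flag instead of letting the default action
    // kill the process on the spot.
    ctrlc::set_handler(move || r.store(false, Ordering::SeqCst))
        .expect("failed to install Ctrl+C handler");

    while running.load(Ordering::SeqCst) {
        // ... do real work here, checking the flag between steps ...
        std::thread::sleep(std::time::Duration::from_millis(100));
    }
    println!("shutting down cleanly");
}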
In short, Chapter 8 transforms those "mysteries" of why things happen into clear, logical system behaviors. For Rust developers, it's essential for writing not just correct and safe code, but also code that's performant, predictable, and truly understands its place in the grand scheme of the computer system. It's about moving beyond just writing code to truly understanding the machine.
Chapter #9: Virtual Memory
Have you ever pondered the magic happening under the hood, the very force that allows your program to run reliably even when dozens of other applications are active? That's the realm of Virtual Memory (VM).
Imagine that every program you write believes it has access to a vast, contiguous, and private memory space. This is the perfect illusion created by virtual memory. In reality, your computer juggles physical memory (RAM) to serve all programs. VM is the grand orchestrator that translates these "dreamed" addresses (virtual) into real physical addresses in RAM or even on the hard drive.
This illusion is a feat of engineering that brings colossal advantages. Firstly, it ensures unshakeable protection. If one of your Rust applications crashes (rare as it may be!), it won't corrupt the memory of another program or the operating system, ensuring system stability. Secondly, VM enables incredibly efficient use of your RAM. The system loads only the actively used parts of your Rust code into physical memory, leaving the rest calmly on the hard drive, ready to be fetched if needed. Finally, it simplifies life for compilers and operating systems: your Rust binary can always be designed as if it will load at the same address, regardless of its actual location in RAM, and shared libraries can be efficiently used by multiple programs without wasting memory.
The core of this magic lies in address translation. Your CPU works with virtual addresses, and it's the processor's Memory Management Unit (MMU), guided by "page tables" managed by the operating system, that converts them into physical addresses. If the piece of memory your Rust program needs isn't in RAM (but on disk), a "page fault" is triggered, prompting the system to load it. The Translation Lookaside Buffer (TLB) acts as an ultra-fast cache to significantly speed up these translations.
So, what's the direct connection to Rust, this already powerful language? This is where it gets fascinating. Rust's promise of memory safety, with its borrow checker and ownership system, aims to prevent at compile time the types of memory management errors (like data races or invalid pointers) that, in other languages, would often lead to segmentation faults – essentially, violations of the virtual memory space detected by the operating system. Rust adds a layer of software defense that saves you from many VM-related headaches before your program even runs.
When your Rust code uses Box<T>, Vec<T>, or String, memory is allocated on your process's virtual heap. Internally, Rust relies on the system allocator, which in turn interacts with the kernel via syscalls like mmap() to acquire virtual memory pages. Rust's deterministic management of these objects' lifetimes (through the Drop trait) ensures clean release of VM-allocated resources. Furthermore, if you venture into unsafe blocks or interface with C code (FFI - Foreign Function Interface), knowledge of virtual memory becomes critical. You bypass Rust's safety guarantees in these scenarios, and it's then your responsibility to ensure your memory accesses adhere to VM rules to prevent crashes. Understanding the difference between a Rust panic! (an error detected by the program itself) and a true segmentation fault (a processor/OS error) is also crucial for effective debugging.
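A tiny sketch of that life cycle (whatever the allocator does underneath, mmap or otherwise, is the OS's business):

fn main() {
    {
        // The buffer lives on the heap, i.e. in pages of this process's
        // virtual address space handed out by the system allocator.
        let buffer: Vec<u8> = vec![0u8; 16 * 1024 * 1024]; // 16 MiB
        println!("buffer length: {}", buffer.len());
    } // <- `buffer` goes out of scope here: Drop runs and the memory is
      //    returned to the allocator deterministically, with no GC involved.

    // A Box is the same story for a single heap-allocated value.
    let boxed = Box::new(42u64);
    println!("boxed = {}", boxed);
} // `boxed` is freed here, at the end of its owner's scope.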
In conclusion, virtual memory is the invisible backdrop upon which all your code runs. For a Rust developer, understanding VM means not only appreciating the depth of the safety guarantees Rust provides but also gaining the ability to optimize, debug the most complex issues, and navigate with confidence in the fascinating world of systems development. It's the next step to becoming a truly enlightened software architect.
Chapter #10: System-Level I/O
In the complex world of computing, everything is ultimately an abstraction. Your CPU doesn't speak "file" or "network," only bits and bytes. Yet, our programs effortlessly manipulate text documents, communicate over the internet, or display images. This magic is the work of powerful abstraction layers, and this chapter dives into one of the most fundamental: the abstraction of Input/Output (I/O) operations.
The operating system transforms every resource—whether it's a file on your hard drive, your keyboard, the screen, or even a network connection—into a simple sequence of bytes, accessible via a small integer called a file descriptor. A brilliant idea for simplicity, but one that, in C or C++, often comes with its share of challenges and headaches.
The most insidious trap? File descriptor leaks. In C/C++, every time you "open" a file or resource, you must conscientiously remember to explicitly "close" it once your work is done. Forgetting just one close() or fclose() call in an execution path can lead to a leak: the resource remains allocated, your program consumes available system file descriptors, and eventually, it may no longer be able to open new files, leading to unexplained crashes or denial of service. It's a constant mental burden for the developer.
This is precisely where Rust truly shines with its elegant and robust solution: RAII (Resource Acquisition Is Initialization).
In Rust, when you create a File object (or BufReader, BufWriter for buffering) to interact with an I/O resource, this object is more than just a simple variable. It "owns" the underlying resource (the file descriptor). And the magic happens: as soon as this File object is no longer accessible (for example, when it goes out of scope), Rust guarantees that its "cleanup" code (its implementation of the Drop trait) will be executed. This cleanup code automatically handles calling the necessary system functions to close the file descriptor and flush any write buffers if needed.
You almost never write an explicit close() in safe Rust for standard I/O types, because the compiler ensures it's handled for you. This guarantee virtually eliminates resource leaks and data loss due to unflushed buffers upon normal program exit. It removes a major source of bugs, reduces developer mental overhead, and makes your applications infinitely more reliable when it comes to resource management.
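Here is a minimal sketch of that pattern (the file paths are whatever the caller passes in; note that the implicit flush on drop ignores errors, so critical code still calls flush explicitly):

use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};

fn copy_first_line(src: &str, dst: &str) -> io::Result<()> {
    // Opening a file acquires a file descriptor; each File object owns one.
    let reader = BufReader::new(File::open(src)?);
    let mut writer = BufWriter::new(File::create(dst)?);

    if let Some(line) = reader.lines().next() {
        writeln!(writer, "{}", line?)?;
    }

    Ok(())
    // No explicit close(): when `reader` and `writer` go out of scope, Drop
    // flushes the write buffer and closes both descriptors for us.
}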
In essence, the universal abstraction of files is a powerful foundation, but it's Rust's approach to managing the lifecycle of I/O resources, through RAII, that elevates the reliability and security of our applications to a new level. It's a genuine liberation for developers.
Chapter #11: Network Programming
How do computers talk to each other? Imagine the complexity of connecting billions of machines! Well, Chapter 11 shows us it's a marvel of abstraction.
For two computers to communicate, they need an address (IP), a specific service on that machine (port), and a reliable way to send data to each other (like TCP for reliability or UDP for speed). DNS, on the other hand, translates website names into IP addresses.
The brilliant idea in Unix is to treat network connections just like files. Yes, you read that right! When your program wants to talk to a web server, it opens a special "file" called a socket. And just like any other file, the operating system gives it a file descriptor (FD), which is just a simple number.
This means you use the same basic functions (read() to receive data, write() to send data, close() to end the communication) as you would for manipulating a text document on your hard drive. This is the elegance of the "everything is a file" philosophy: a single interface for very different resources, greatly simplifying programming.
This is where Rust comes in, taking Unix's simplicity and adding an unparalleled layer of safety and robustness.
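A minimal sketch of what that looks like with std::net (example.com is just a placeholder host, and a real client would handle the bytes more carefully):

use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Connecting gives us a socket; under the hood it is just another file
    // descriptor, and TcpStream owns it (RAII again: dropped means closed).
    let mut stream = TcpStream::connect("example.com:80")?;

    // The same read/write vocabulary as for files.
    stream.write_all(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")?;

    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    for line in response.lines().take(5) {
        println!("{}", line);
    }
    Ok(())
}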
By combining Unix's powerful abstractions with Rust's memory safety guarantees and robust error handling, you can build modern, high-performance network applications with unmatched confidence. That's the recipe for systems that truly work!
Chapter #12: Concurrent Programming
Imagine your computer could only do one thing at a time. If you clicked a button and the program started a heavy calculation, the entire interface would freeze. Unthinkable! Concurrency solves this: it lets a program stay responsive while heavy work runs in the background, overlap slow I/O with useful computation, and spread work across multiple cores.
The most common form of concurrency this chapter explores is threads. Multiple threads run within the same program, sharing the same memory, but each having its own "execution path."
The flip side is that concurrency is a source of some of the hardest bugs to pinpoint. Why? Because threads share the same memory!
To avoid these issues, you must use synchronization mechanisms like mutexes (which ensure only one thread at a time can access a critical resource) or semaphores. But even with these tools, the complexity is immense, and the slightest error can create security vulnerabilities or crashes.
This is where Rust dramatically changes the game. While CS:APP shows you the complexity of Pthreads in C, Rust offers an approach to concurrency that is both performant and, crucially, incredibly safe.
Rust doesn't reinvent the fundamental concepts of concurrency (threads and mutexes exist there too). But it adds two major guarantees thanks to its ownership system and borrow checker: shared data cannot be mutated from multiple threads without synchronization (data races are rejected at compile time, through the Send and Sync traits), and references can never outlive the data they point to, even when that data is shared across threads.
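A tiny sketch of what the first guarantee feels like in practice: the counter below is shared through an Arc and protected by a Mutex, and removing the Mutex to mutate it from several threads simply does not compile:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    // Locking is the only way to reach the shared data.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    println!("total = {}", *counter.lock().unwrap()); // always 4000
}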
In short, Rust allows you to write performant concurrent code that harnesses your machine's full power, but without the risks of memory corruption or race conditions that plague C/C++ development. It transforms a traditionally risky domain into a much safer and more pleasant one.