Understanding the Linux Kernel
Daniel P. Bovet
Marco Cesati
Publisher: O'Reilly
First Edition October 2000
ISBN: 0-596-00002-2, 702 pages
Understanding the Linux Kernel helps readers understand how Linux performs best and how
it meets the challenge of different environments. The authors introduce each topic by
explaining its importance, and show how kernel operations relate to the utilities that are
familiar to Unix programmers and users.
Table of Contents
Preface ..... 1
The Audience for This Book ..... 1
Organization of the Material ..... 1
Overview of the Book ..... 3
Background Information ..... 4
Conventions in This Book ..... 4
How to Contact Us ..... 4
Acknowledgments ..... 5
1. Introduction ..... 6
1.1 Linux Versus Other Unix-Like Kernels ..... 6
1.2 Hardware Dependency ..... 10
1.3 Linux Versions ..... 11
1.4 Basic Operating System Concepts ..... 12
1.5 An Overview of the Unix Filesystem ..... 16
1.6 An Overview of Unix Kernels ..... 22
2. Memory Addressing ..... 36
2.1 Memory Addresses ..... 36
2.2 Segmentation in Hardware ..... 37
2.3 Segmentation in Linux ..... 41
2.4 Paging in Hardware ..... 44
2.5 Paging in Linux ..... 52
2.6 Anticipating Linux 2.4 ..... 63
3. Processes ..... 64
3.1 Process Descriptor ..... 64
3.2 Process Switching ..... 78
3.3 Creating Processes ..... 86
3.4 Destroying Processes ..... 93
3.5 Anticipating Linux 2.4 ..... 94
4. Interrupts and Exceptions ..... 96
4.1 The Role of Interrupt Signals ..... 96
4.2 Interrupts and Exceptions ..... 97
4.3 Nested Execution of Exception and Interrupt Handlers ..... 106
4.4 Initializing the Interrupt Descriptor Table ..... 107
4.5 Exception Handling ..... 109
4.6 Interrupt Handling ..... 112
4.7 Returning from Interrupts and Exceptions ..... 126
4.8 Anticipating Linux 2.4 ..... 129
5. Timing Measurements ..... 131
5.1 Hardware Clocks ..... 131
5.2 The Timer Interrupt Handler ..... 133
5.3 PIT's Interrupt Service Routine ..... 134
5.4 The TIMER_BH Bottom Half Functions ..... 136
5.5 System Calls Related to Timing Measurements ..... 145
5.6 Anticipating Linux 2.4 ..... 148
6. Memory Management ..... 149
6.1 Page Frame Management ..... 149
6.2 Memory Area Management ..... 160
6.3 Noncontiguous Memory Area Management ..... 176
6.4 Anticipating Linux 2.4 ..... 181
7. Process Address Space ..... 183
7.1 The Process's Address Space ..... 183
7.2 The Memory Descriptor ..... 185
7.3 Memory Regions ..... 186
7.4 Page Fault Exception Handler ..... 201
7.5 Creating and Deleting a Process Address Space ..... 212
7.6 Managing the Heap ..... 214
7.7 Anticipating Linux 2.4 ..... 216
8. System Calls ..... 217
8.1 POSIX APIs and System Calls ..... 217
8.2 System Call Handler and Service Routines ..... 218
8.3 Wrapper Routines ..... 229
8.4 Anticipating Linux 2.4 ..... 230
9. Signals ..... 231
9.1 The Role of Signals ..... 231
9.2 Sending a Signal ..... 239
9.3 Receiving a Signal ..... 242
9.4 Real-Time Signals ..... 251
9.5 System Calls Related to Signal Handling ..... 252
9.6 Anticipating Linux 2.4 ..... 257
10. Process Scheduling ..... 258
10.1 Scheduling Policy ..... 258
10.2 The Scheduling Algorithm ..... 261
10.3 System Calls Related to Scheduling ..... 272
10.4 Anticipating Linux 2.4 ..... 276
11. Kernel Synchronization ..... 277
11.1 Kernel Control Paths ..... 277
11.2 Synchronization Techniques ..... 278
11.3 The SMP Architecture ..... 286
11.4 The Linux/SMP Kernel ..... 290
11.5 Anticipating Linux 2.4 ..... 302
12. The Virtual Filesystem ..... 303
12.1 The Role of the VFS ..... 303
12.2 VFS Data Structures ..... 308
12.3 Filesystem Mounting ..... 324
12.4 Pathname Lookup ..... 329
12.5 Implementations of VFS System Calls ..... 333
12.6 File Locking ..... 337
12.7 Anticipating Linux 2.4 ..... 342
13. Managing I/O Devices ..... 343
13.1 I/O Architecture ..... 343
13.2 Associating Files with I/O Devices ..... 348
13.3 Device Drivers ..... 353
13.4 Character Device Handling ..... 360
13.5 Block Device Handling ..... 361
13.6 Page I/O Operations ..... 377
13.7 Anticipating Linux 2.4 ..... 380
14. Disk Caches ..... 382
14.1 The Buffer Cache ..... 383
14.2 The Page Cache ..... 396
14.3 Anticipating Linux 2.4 ..... 398
15. Accessing Regular Files ..... 400
15.1 Reading and Writing a Regular File ..... 400
15.2 Memory Mapping ..... 408
15.3 Anticipating Linux 2.4 ..... 416
16. Swapping: Methods for Freeing Memory ..... 417
16.1 What Is Swapping? ..... 417
16.2 Swap Area ..... 420
16.3 The Swap Cache ..... 429
16.4 Transferring Swap Pages ..... 433
16.5 Page Swap-Out ..... 437
16.6 Page Swap-In ..... 442
16.7 Freeing Page Frames ..... 444
16.8 Anticipating Linux 2.4 ..... 450
17. The Ext2 Filesystem ..... 451
17.1 General Characteristics ..... 451
17.2 Disk Data Structures ..... 453
17.3 Memory Data Structures ..... 459
17.4 Creating the Filesystem ..... 463
17.5 Ext2 Methods ..... 464
17.6 Managing Disk Space ..... 466
17.7 Reading and Writing an Ext2 Regular File ..... 473
17.8 Anticipating Linux 2.4 ..... 475
18. Process Communication ..... 476
18.1 Pipes ..... 477
18.2 FIFOs ..... 483
18.3 System V IPC ..... 486
18.4 Anticipating Linux 2.4 ..... 499
19. Program Execution ..... 500
19.1 Executable Files ..... 500
19.2 Executable Formats ..... 512
19.3 Execution Domains ..... 514
19.4 The exec-like Functions ..... 515
19.5 Anticipating Linux 2.4 ..... 519
A. System Startup ..... 520
A.1 Prehistoric Age: The BIOS ..... 520
A.2 Ancient Age: The Boot Loader ..... 521
A.3 Middle Ages: The setup( ) Function ..... 523
A.4 Renaissance: The startup_32( ) Functions ..... 523
A.5 Modern Age: The start_kernel( ) Function ..... 524
B. Modules ..... 526
B.1 To Be (a Module) or Not to Be? ..... 526
B.2 Module Implementation ..... 527
B.3 Linking and Unlinking Modules ..... 529
B.4 Linking Modules on Demand ..... 531
C. Source Code Structure ..... 533
Colophon ..... 536
Preface
In the spring semester of 1997, we taught a course on operating systems based on Linux 2.0.
The idea was to encourage students to read the source code. To achieve this, we assigned term
projects consisting of making changes to the kernel and performing tests on the modified
version. We also wrote course notes for our students about a few critical features of Linux like
task switching and task scheduling.
We continued along this line in the spring semester of 1998, but we moved on to the Linux
2.1 development version. Our course notes were becoming larger and larger. In July, 1998 we
contacted O'Reilly & Associates, suggesting they publish a whole book on the Linux kernel.
The real work started in the fall of 1998 and lasted about a year and a half. We read thousands
of lines of code, trying to make sense of them. After all this work, we can say that it was
worth the effort. We learned a lot of things you don't find in books, and we hope we have
succeeded in conveying some of this information in the following pages.
The Audience for This Book
All people curious about how Linux works and why it is so efficient will find answers here.
After reading the book, you will find your way through the many thousands of lines of code,
distinguishing between crucial data structures and secondary ones—in short, becoming a true
Linux hacker.
Our work might be considered a guided tour of the Linux kernel: most of the significant data
structures and many algorithms and programming tricks used in the kernel are discussed; in
many cases, the relevant fragments of code are discussed line by line. Of course, you should
have the Linux source code on hand and should be willing to spend some effort deciphering
some of the functions that are not, for the sake of brevity, fully described.
On another level, the book will give valuable insights to people who want to know more about
the critical design issues in a modern operating system. It is not specifically addressed to
system administrators or programmers; it is mostly for people who want to understand how
things really work inside the machine! Like any good guide, we try to go beyond superficial
features. We offer background, such as the history of major features and the reasons they were
used.
Organization of the Material
When starting to write this book, we were faced with a critical decision: should we refer to a
specific hardware platform or skip the hardware-dependent details and concentrate on the
pure hardware-independent parts of the kernel?
Other books on Linux kernel internals have chosen the latter approach; we decided to adopt
the former one for the following reasons:
• Efficient kernels take advantage of most available hardware features, such as
addressing techniques, caches, processor exceptions, special instructions, processor
control registers, and so on. If we want to convince you that the kernel indeed does
quite a good job in performing a specific task, we must first tell what kind of support
comes from the hardware.
• Even if a large portion of a Unix kernel's source code is processor-independent and
coded in the C language, a small and critical part is coded in assembly language. A
thorough knowledge of the kernel thus requires the study of a few assembly language
fragments that interact with the hardware.
When covering hardware features, our strategy will be quite simple: just sketch the features
that are totally hardware-driven while detailing those that need some software support. In fact,
we are interested in kernel design rather than in computer architecture.
The next step consisted of selecting the computer system to be described: although Linux is
now running on several kinds of personal computers and workstations, we decided to
concentrate on the very popular and cheap IBM-compatible personal computers—thus, on the
Intel 80x86 microprocessors and on some support chips included in these personal computers.
The term Intel 80x86 microprocessor will be used in the forthcoming chapters to denote the
Intel 80386, 80486, Pentium, Pentium Pro, Pentium II, and Pentium III microprocessors or
compatible models. In a few cases, explicit references will be made to specific models.
One more choice was the order followed in studying Linux components. We tried to follow a
bottom-up approach: start with topics that are hardware-dependent and end with those that are
totally hardware-independent. In fact, we'll make many references to the Intel 80x86
microprocessors in the first part of the book, while the rest of it is relatively
hardware-independent. Two significant exceptions are made in Chapter 11 and Chapter 13. In practice,
following a bottom-up approach is not as simple as it looks, since the areas of memory
management, process management, and filesystem are intertwined; a few forward
references—that is, references to topics yet to be explained—are unavoidable.
Each chapter starts with a theoretical overview of the topics covered. The material is then
presented according to the bottom-up approach. We start with the data structures needed to
support the functionalities described in the chapter. Then we usually move from the lowest
level of functions to higher levels, often ending by showing how system calls issued by user
applications are supported.
Level of Description
Linux source code for all supported architectures is contained in about 4500 C and Assembly
files stored in about 270 subdirectories; it consists of about 2 million lines of code, which
occupy more than 58 megabytes of disk space. Of course, this book can cover only a very small
portion of that code. To get a sense of how big the Linux source is, consider that the whole
source code of the book you are reading occupies less than 2 megabytes of disk space.
Therefore, in order to list all code, without commenting on it, we would need more than 25
books like this![1]
[1] Nevertheless, Linux is a tiny operating system when compared with other commercial giants. Microsoft Windows 2000, for example, reportedly has
more than 30 million lines of code. Linux is also small when compared to some popular applications; Netscape Communicator 5 browser, for example,
has about 17 million lines of code.
So we had to make some choices about the parts to be described. This is a rough assessment
of our decisions:
• We describe process and memory management fairly thoroughly.
• We cover the Virtual Filesystem and the Ext2 filesystem, although many functions are
just mentioned without detailing the code; we do not discuss other filesystems
supported by Linux.
• We describe device drivers, which account for a good part of the kernel, as far as the
kernel interface is concerned, but do not attempt analysis of any specific driver,
including the terminal drivers.
• We do not cover networking, since this area would deserve a whole new book by
itself.
In many cases, the original code has been rewritten in an easier to read but less efficient way.
This occurs at time-critical points at which sections of programs are often written in a mixture
of hand-optimized C and Assembly code. Once again, our aim is to provide some help in
studying the original Linux code.
While discussing kernel code, we often end up describing the underpinnings of many familiar
features that Unix programmers have heard of and about which they may be curious (shared
and mapped memory, signals, pipes, symbolic links).
Overview of the Book
To make life easier, Chapter 1 presents a general picture of what is inside a Unix kernel and
how Linux competes against other well-known Unix systems.
The heart of any Unix kernel is memory management. Chapter 2 explains how Intel 80x86
processors include special circuits to address data in memory and how Linux exploits them.
Processes are a fundamental abstraction offered by Linux and are introduced in Chapter 3.
Here we also explain how each process runs either in an unprivileged User Mode or in a
privileged Kernel Mode. Transitions between User Mode and Kernel Mode happen only
through well-established hardware mechanisms called interrupts and exceptions, which are
introduced in Chapter 4. One type of interrupt is crucial for allowing Linux to take care of
elapsed time; further details can be found in Chapter 5.
Next we focus again on memory: Chapter 6 describes the sophisticated techniques required to
handle the most precious resource in the system (besides the processors, of course), that is,
available memory. This resource must be granted both to the Linux kernel and to the user
applications. Chapter 7 shows how the kernel copes with the requests for memory issued by
greedy application programs.
Chapter 8 explains how a process running in User Mode makes requests to the kernel, while
Chapter 9 describes how a process may send synchronization signals to other processes.
Chapter 10 explains how Linux executes, in turn, every active process in the system so that all
of them can progress toward their completions. Synchronization mechanisms are needed by
the kernel too: they are discussed in Chapter 11 for both uniprocessor and multiprocessor
systems.
Now we are ready to move on to another essential topic, that is, how Linux implements the
filesystem. A series of chapters covers this topic: Chapter 12 introduces a general layer that
supports many different filesystems. Some Linux files are special because they provide
trapdoors to reach hardware devices; Chapter 13 offers insights on these special files and on
the corresponding hardware device drivers. Another issue to be considered is disk access
time; Chapter 14 shows how a clever use of RAM reduces disk accesses and thus improves
system performance significantly. Building on the material covered in these last chapters, we
can now explain, in Chapter 15, how user applications access normal files. Chapter 16
completes our discussion of Linux memory management and explains the techniques used by
Linux to ensure that enough memory is always available. The last chapter dealing with files is
Chapter 17, which illustrates the most-used Linux filesystem, namely Ext2.
The last two chapters end our detailed tour of the Linux kernel: Chapter 18 introduces
communication mechanisms other than signals available to User Mode processes; Chapter 19
explains how user applications are started.
Last but not least are the appendixes: Appendix A sketches out how Linux is booted, while
Appendix B describes how to dynamically reconfigure the running kernel, adding and
removing functionalities as needed. Appendix C is just a list of the directories that contain the
Linux source code. The Source Code Index includes all the Linux symbols referenced in the
book; you will find here the name of the Linux file defining each symbol and the book's page
number where it is explained. We think you'll find it quite handy.
Background Information
No prerequisites are required, except some skill in the C programming language and perhaps
some knowledge of Assembly language.
Conventions in This Book
The following is a list of typographical conventions used in this book:
Constant Width
Is used to show the contents of code files or the output from commands, and to
indicate source code keywords that appear in code.
Italic
Is used for file and directory names, program and command names, command-line
options, URLs, and for emphasizing new terms.
How to Contact Us
We have tested and verified all the information in this book to the best of our abilities, but you
may find that features have changed or that we have let errors slip through the production of
the book. Please let us know of any errors that you find, as well as suggestions for future
editions, by writing to:
O'Reilly & Associates, Inc.
101 Morris St.
Sebastopol, CA 95472
(800) 998-9938 (in the U.S. or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (fax)
You can also send messages electronically. To be put on our mailing list or to request a
catalog, send email to:
info@oreilly.com
To ask technical questions or to comment on the book, send email to:
bookquestions@oreilly.com
We have a web site for the book, where we'll list reader reviews, errata, and any plans for
future editions. You can access this page at:
http://www.oreilly.com/catalog/linuxkernel/
We also have an additional web site where you will find material written by the authors about
the new features of Linux 2.4. Hopefully, this material will be used for a future edition of this
book. You can access this page at:
http://www.oreilly.com/catalog/linuxkernel/updates/
For more information about this book and others, see the O'Reilly web site:
http://www.oreilly.com/
Acknowledgments
This book would not have been written without the precious help of the many students of the
school of engineering at the University of Rome "Tor Vergata" who took our course and tried
to decipher the lecture notes about the Linux kernel. Their strenuous efforts to grasp the
meaning of the source code led us to improve our presentation and to correct many mistakes.
Andy Oram, our wonderful editor at O'Reilly & Associates, deserves a lot of credit. He was
the first at O'Reilly to believe in this project, and he spent a lot of time and energy
deciphering our preliminary drafts. He also suggested many ways to make the book more
readable, and he wrote several excellent introductory paragraphs.
Many thanks also to the O'Reilly staff, especially Rob Romano, the technical illustrator, and
Lenny Muellner, for tools support.
We had some prestigious reviewers who read our text quite carefully (in alphabetical order by
first name): Alan Cox, Michael Kerrisk, Paul Kinzelman, Raph Levien, and Rik van Riel.
Their comments helped us to remove several errors and inaccuracies and have made this book
stronger.
—Daniel P. Bovet, Marco Cesati
September 2000
Chapter 1. Introduction
Linux is a member of the large family of Unix-like operating systems. A relative newcomer
experiencing sudden spectacular popularity starting in the late 1990s, Linux joins such
well-known commercial Unix operating systems as System V Release 4 (SVR4) developed by
AT&T, which is now owned by Novell; the 4.4 BSD release from the University of California
at Berkeley (4.4BSD); Digital Unix from Digital Equipment Corporation (now Compaq); AIX
from IBM; HP-UX from Hewlett-Packard; and Solaris from Sun Microsystems.
Linux was initially developed by Linus Torvalds in 1991 as an operating system for IBM-
compatible personal computers based on the Intel 80386 microprocessor. Linus remains
deeply involved with improving Linux, keeping it up-to-date with various hardware
developments and coordinating the activity of hundreds of Linux developers around the
world. Over the years, developers have worked to make Linux available on other
architectures, including Alpha, SPARC, Motorola MC680x0, PowerPC, and IBM
System/390.
One of the more appealing benefits of Linux is that it isn't a commercial operating system: its
source code, released under the GNU General Public License,[1] is open and available to anyone to
study, as we will in this book; if you download the code (the official site is
http://www.kernel.org/) or check the sources on a Linux CD, you will be able to explore from top
to bottom one of the most successful, modern operating systems. This book, in fact, assumes you
have the source code on hand and can apply what we say to your own explorations.
[1] The GNU project is coordinated by the Free Software Foundation, Inc. (http://www.gnu.org/); its aim is to implement a whole operating system
freely usable by everyone. The availability of a GNU C compiler has been essential for the success of the Linux project.
Technically speaking, Linux is a true Unix kernel, although it is not a full Unix operating
system, because it does not include all the applications such as filesystem utilities, windowing
systems and graphical desktops, system administrator commands, text editors, compilers, and
so on. However, since most of these programs are freely available under the GNU General
Public License, they can be installed into one of the filesystems supported by Linux.
Since Linux is a kernel, many Linux users prefer to rely on commercial distributions,
available on CD-ROM, to get the code included in a standard Unix system. Alternatively,
the code may be obtained from several different FTP sites. The Linux source code is usually
installed in the /usr/src/linux directory. In the rest of this book, all file pathnames will refer
implicitly to that directory.
1.1 Linux Versus Other Unix-Like Kernels
The various Unix-like systems on the market, some of which have a long history and may
show signs of archaic practices, differ in many important respects. All commercial variants
were derived from either SVR4 or 4.4BSD; all of them tend to agree on some common
standards like IEEE's POSIX (Portable Operating System Interface) and X/Open's CAE
(Common Applications Environment).
The current standards specify only an application programming interface (API)—that is,
a well-defined environment in which user programs should run. Therefore, the standards do
not impose any restriction on internal design choices of a compliant kernel.[2]
[2] As a matter of fact, several non-Unix operating systems like Windows NT are POSIX-compliant.
In order to define a common user interface, Unix-like kernels often share fundamental design
ideas and features. In this respect, Linux is comparable with the other Unix-like operating
systems. What you read in this book and see in the Linux kernel, therefore, may help you
understand the other Unix variants too.
The 2.2 version of the Linux kernel aims to be compliant with the IEEE POSIX standard.
This, of course, means that most existing Unix programs can be compiled and executed on
a Linux system with very little effort or even without the need for patches to the source code.
Moreover, Linux includes all the features of a modern Unix operating system, like virtual
memory, a virtual filesystem, lightweight processes, reliable signals, SVR4 interprocess
communications, support for Symmetric Multiprocessor (SMP) systems, and so on.
By itself, the Linux kernel is not very innovative. When Linus Torvalds wrote the first kernel,
he referred to some classical books on Unix internals, like Maurice Bach's The Design of
the Unix Operating System (Prentice Hall, 1986). Actually, Linux still has some bias toward
the Unix baseline described in Bach's book (i.e., SVR4). However, Linux doesn't stick to any
particular variant. Instead, it tries to adopt good features and design choices of several
different Unix kernels.
Here is an assessment of how Linux competes against some well-known commercial Unix
kernels:
• The Linux kernel is monolithic. It is a large, complex do-it-yourself program,
composed of several logically different components. In this, it is quite conventional;
most commercial Unix variants are monolithic. A notable exception is Carnegie-
Mellon's Mach 3.0, which follows a microkernel approach.
• Traditional Unix kernels are compiled and linked statically. Most modern kernels can
dynamically load and unload some portions of the kernel code (typically, device
drivers), which are usually called modules. Linux's support for modules is very good,
since it is able to automatically load and unload modules on demand. Among the main
commercial Unix variants, only the SVR4.2 kernel has a similar feature.
• Kernel threading. Some modern Unix kernels, like Solaris 2.x and SVR4.2/MP, are
organized as a set of kernel threads. A kernel thread is an execution context that can
be independently scheduled; it may be associated with a user program, or it may run
only some kernel functions. Context switches between kernel threads are usually much
less expensive than context switches between ordinary processes, since the former
usually operate on a common address space. Linux uses kernel threads in a very
limited way to execute a few kernel functions periodically; since Linux kernel threads
cannot execute user programs, they do not represent the basic execution context
abstraction. (That's the topic of the next item.)
• Multithreaded application support. Most modern operating systems have some kind of
support for multithreaded applications, that is, user programs that are well designed in
terms of many relatively independent execution flows sharing a large portion of the
application data structures. A multithreaded user application could be composed of
many lightweight processes (LWP), or processes that can operate on a common
address space, common physical memory pages, common opened files, and so on.
Linux defines its own version of lightweight processes, which is different from the
types used on other systems such as SVR4 and Solaris. While all the commercial Unix
variants of LWP are based on kernel threads, Linux regards lightweight processes as
the basic execution context and handles them via the nonstandard clone( ) system
call.
• Linux is a nonpreemptive kernel. This means that Linux cannot arbitrarily interleave
execution flows while they are in privileged mode. Several sections of kernel code
assume they can run and modify data structures without fear of being interrupted and
having another thread alter those data structures. Usually, fully preemptive kernels are
associated with special real-time operating systems. Currently, among conventional,
general-purpose Unix systems, only Solaris 2.x and Mach 3.0 are fully preemptive
kernels. SVR4.2/MP introduces some fixed preemption points as a method to get
limited preemption capability.
• Multiprocessor support. Several Unix kernel variants take advantage of multiprocessor
systems. Linux 2.2 offers an evolving kind of support for symmetric multiprocessing
(SMP), which means not only that the system can use multiple processors but also that
any processor can handle any task; there is no discrimination among them. However,
Linux 2.2 does not make optimal use of SMP. Several kernel activities that could be
executed concurrently—like filesystem handling and networking—must now be
executed sequentially.
• Filesystem. Linux's standard filesystem lacks some advanced features, such as
journaling. However, more advanced filesystems for Linux are available, although not
included in the Linux source code; among them, IBM AIX's Journaling File System
(JFS), and Silicon Graphics Irix's XFS filesystem. Thanks to a powerful object-
oriented Virtual File System technology (inspired by Solaris and SVR4), porting
a foreign filesystem to Linux is a relatively easy task.
• STREAMS. Linux has no analog to the STREAMS I/O subsystem introduced in
SVR4, although it is included nowadays in most Unix kernels and it has become the
preferred interface for writing device drivers, terminal drivers, and network protocols.
This somewhat disappointing assessment does not depict, however, the whole truth. Several
features make Linux a wonderfully unique operating system. Commercial Unix kernels often
introduce new features in order to gain a larger slice of the market, but these features are not
necessarily useful, stable, or productive. As a matter of fact, modern Unix kernels tend to be
quite bloated. By contrast, Linux doesn't suffer from the restrictions and the conditioning
imposed by the market, hence it can freely evolve according to the ideas of its designers
(mainly Linus Torvalds). Specifically, Linux offers the following advantages over its
commercial competitors:
Linux is free.
You can install a complete Unix system at no expense other than the hardware (of
course).
Linux is fully customizable in all its components.
Thanks to the General Public License (GPL), you are allowed to freely read and
modify the source code of the kernel and of all system programs.[3]
[3] Several commercial companies have started to support their products under Linux, most of which aren't distributed under the GNU General Public License.
Therefore, you may not be allowed to read or modify their source code.
Linux runs on low-end, cheap hardware platforms.
You can even build a network server using an old Intel 80386 system with 4 MB of
RAM.
Linux is powerful.
Linux systems are very fast, since they fully exploit the features of the hardware
components. The main Linux target is efficiency, and indeed many design choices of
commercial variants, like the STREAMS I/O subsystem, have been rejected by Linus
because of their implied performance penalty.
Linux has a high standard for source code quality.
Linux systems are usually very stable; they have a very low failure rate and system
maintenance time.
The Linux kernel can be very small and compact.
Indeed, it is possible to fit both a kernel image and full root filesystem, including all
fundamental system programs, on just one 1.4 MB floppy disk! As far as we know,
none of the commercial Unix variants is able to boot from a single floppy disk.
Linux is highly compatible with many common operating systems.
It lets you directly mount filesystems for all versions of MS-DOS and MS Windows,
SVR4, OS/2, Mac OS, Solaris, SunOS, NeXTSTEP, many BSD variants, and so on.
Linux is also able to operate with many network layers like Ethernet, Fiber Distributed
Data Interface (FDDI), High Performance Parallel Interface (HIPPI), IBM's Token
Ring, AT&T WaveLAN, DEC RoamAbout DS, and so forth. By using suitable
libraries, Linux systems are even able to directly run programs written for other
operating systems. For example, Linux is able to execute applications written for MS-
DOS, MS Windows, SVR3 and R4, 4.4BSD, SCO Unix, XENIX, and others on the
Intel 80x86 platform.
Linux is well supported.
Believe it or not, it may be a lot easier to get patches and updates for Linux than for
any proprietary operating system! The answer to a problem often comes back within
a few hours after sending a message to some newsgroup or mailing list. Moreover,
drivers for Linux are usually available a few weeks after new hardware products have
been introduced on the market. By contrast, hardware manufacturers release device
drivers for only a few commercial operating systems, usually the Microsoft ones.
Therefore, all commercial Unix variants run on a restricted subset of hardware
components.
With an estimated installed base of more than 12 million and growing, Linux now attracts
people who are used to certain creature comforts that are standard under other operating
systems and who are starting to expect the same from Linux. As such, the demand on Linux
developers is also increasing. Luckily, though, Linux has evolved under the close direction of
Linus over the years to accommodate the needs of the masses.
1.2 Hardware Dependency
Linux tries to maintain a neat distinction between hardware-dependent and hardware-
independent source code. To that end, both the arch and the include directories include nine
subdirectories corresponding to the nine hardware platforms supported. The standard names
of the platforms are:
arm
Acorn personal computers
alpha
Compaq Alpha workstations
i386
IBM-compatible personal computers based on Intel 80x86 or Intel 80x86-compatible
microprocessors
m68k
Personal computers based on Motorola MC680x0 microprocessors
mips
Workstations based on Silicon Graphics MIPS microprocessors
ppc
Workstations based on Motorola-IBM PowerPC microprocessors
sparc
Workstations based on Sun Microsystems SPARC microprocessors
sparc64
Workstations based on Sun Microsystems 64-bit Ultra SPARC microprocessors
s390
IBM System/390 mainframes
1.3 Linux Versions
Linux distinguishes stable kernels from development kernels through a simple numbering
scheme. Each version is characterized by three numbers, separated by periods. The first two
numbers are used to identify the version; the third number identifies the release.
As shown in Figure 1-1, if the second number is even, it denotes a stable kernel; otherwise, it
denotes a development kernel. At the time of this writing, the current stable version of the
Linux kernel is 2.2.14, and the current development version is 2.3.51. The 2.2 kernel, which is
the basis for this book, was first released in January 1999, and it differs considerably from the
2.0 kernel, particularly with respect to memory management. Work on the 2.3 development
version started in May 1999.
Figure 1-1. Numbering Linux versions
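The numbering rule can be sketched in a few lines of C. This is only an illustration of the scheme as described in this section; the names kernel_version, parse_kernel_version, and is_stable are hypothetical and do not come from the kernel sources. A suffix such as the "-pre8" in 2.3.99-pre8 is simply ignored.

```c
#include <stdio.h>

/* Parsed form of a "major.minor.release" kernel version string. */
struct kernel_version {
    int major;    /* first number                      */
    int minor;    /* second number: even means stable  */
    int release;  /* third number: the release         */
};

/*
 * Parse a string such as "2.2.14". Anything after the third number
 * (for example "-pre8") is ignored.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_kernel_version(const char *s, struct kernel_version *v)
{
    if (sscanf(s, "%d.%d.%d", &v->major, &v->minor, &v->release) != 3)
        return -1;
    if (v->major < 0 || v->minor < 0 || v->release < 0)
        return -1;
    return 0;
}

/* An even second number denotes a stable kernel, an odd one a development kernel. */
int is_stable(const struct kernel_version *v)
{
    return v->minor % 2 == 0;
}
```

With this sketch, "2.2.14" is reported as stable and "2.3.51" as a development kernel, matching the two versions quoted above.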
New releases of a stable version come out mostly to fix bugs reported by users. The main
algorithms and data structures used to implement the kernel are left unchanged.
Development versions, on the other hand, may differ quite significantly from one another;
kernel developers are free to experiment with different solutions that occasionally lead to
drastic kernel changes. Users who rely on development versions for running applications may
experience unpleasant surprises when upgrading their kernel to a newer release. This book
concentrates on the most recent stable kernel that we had available because, among all
the new features being tried in experimental kernels, there's no way of telling which will
ultimately be accepted and what they'll look like in their final form.
At the time of this writing, Linux 2.4 has not officially come out. We tried to anticipate the
forthcoming features and the main kernel changes with respect to the 2.2 version by looking
at the Linux 2.3.99-pre8 prerelease. Linux 2.4 inherits a good deal from Linux 2.2: many
concepts, design choices, algorithms, and data structures remain the same. For that reason, we
conclude each chapter by sketching how Linux 2.4 differs from Linux 2.2 with respect to
the topics just discussed. As you'll notice, the new Linux is gleaming and shining; it should
appear more appealing to large corporations and, more generally, to the whole business
community.
1.4 Basic Operating System Concepts
Any computer system includes a basic set of programs called the operating system. The most
important program in the set is called the kernel. It is loaded into RAM when the system boots
and contains many critical procedures that are needed for the system to operate. The other
programs are less crucial utilities; they can provide a wide variety of interactive experiences
for the user—as well as doing all the jobs the user bought the computer for—but the essential
shape and capabilities of the system are determined by the kernel. The kernel, then, is where
we fix our attention in this book. Hence, we'll often use the term "operating system" as
a synonym for "kernel."
The operating system must fulfill two main objectives:
• Interact with the hardware components servicing all low-level programmable elements
included in the hardware platform.
• Provide an execution environment to the applications that run on the computer system
(the so-called user programs).
Some operating systems allow all user programs to directly play with the hardware
components (a typical example is MS-DOS). In contrast, a Unix-like operating system hides
all low-level details concerning the physical organization of the computer from applications
run by the user. When a program wants to make use of a hardware resource, it must issue
a request to the operating system. The kernel evaluates the request and, if it chooses to grant
the resource, interacts with the relevant hardware components on behalf of the user program.
In order to enforce this mechanism, modern operating systems rely on the availability of
specific hardware features that forbid user programs to directly interact with low-level
hardware components or to access arbitrary memory locations. In particular, the hardware
introduces at least two different execution modes for the CPU: a nonprivileged mode for user
programs and a privileged mode for the kernel. Unix calls these User Mode and Kernel Mode,
respectively.
In the rest of this chapter, we introduce the basic concepts that have motivated the design of
Unix over the past two decades, as well as Linux and other operating systems. While the
concepts are probably familiar to you as a Linux user, these sections try to delve into them
a bit more deeply than usual to explain the requirements they place on an operating system
kernel. These broad considerations refer to Unix-like systems, thus also to Linux. The other
chapters of this book will hopefully help you to understand the Linux kernel internals.
1.4.1 Multiuser Systems
A multiuser system is a computer that is able to concurrently and independently execute
several applications belonging to two or more users. "Concurrently" means that applications
can be active at the same time and contend for the various resources such as CPU, memory,
hard disks, and so on. "Independently" means that each application can perform its task with
no concern for what the applications of the other users are doing. Switching from one
application to another, of course, slows down each of them and affects the response time seen
by the users. Many of the complexities of modern operating system kernels, which we will
examine in this book, are present to minimize the delays enforced on each program and to
provide the user with responses that are as fast as possible.
Multiuser operating systems must include several features:
• An authentication mechanism for verifying the user identity
• A protection mechanism against buggy user programs that could block other
applications running in the system
• A protection mechanism against malicious user programs that could interfere with, or
spy on, the activity of other users
• An accounting mechanism that limits the amount of resource units assigned to each
user
In order to ensure safe protection mechanisms, operating systems must make use of the
hardware protection associated with the CPU privileged mode. Otherwise, a user program
would be able to directly access the system circuitry and overcome the imposed bounds. Unix
is a multiuser system that enforces the hardware protection of system resources.
1.4.2 Users and Groups
In a multiuser system, each user has a private space on the machine: typically, he owns some
quota of the disk space to store files, receives private mail messages, and so on. The operating
system must ensure that the private portion of a user space is visible only to its owner. In
particular, it must ensure that no user can exploit a system application for the purpose of
violating the private space of another user.
All users are identified by a unique number called the User ID, or UID. Usually only a
restricted number of persons are allowed to make use of a computer system. When one of
these users starts a working session, the operating system asks for a login name and a
password. If the user does not input a valid pair, the system denies access. Since the password
is assumed to be secret, the user's privacy is ensured.
In order to selectively share material with other users, each user is a member of one or more
groups, which are identified by a unique number called a Group ID, or GID. Each file is also
associated with exactly one group. For example, access could be set so that the user owning
the file has read and write privileges, the group has read-only privileges, and other users on
the system are denied access to the file.
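The owner/group/others example can be made concrete with a small sketch. It is a simplification, not the kernel's actual check: it ignores the superuser and any additional groups a user may belong to, and it assumes the traditional layout of the permission bits (three bits each for owner, group, and others). The function name class_permissions is hypothetical.

```c
#include <sys/types.h>

/*
 * Return the access rights that apply to a user (uid, gid) for a file
 * owned by (file_uid, file_gid) with the given mode.
 * The result is a 3-bit value: 4 = read, 2 = write, 1 = execute.
 */
unsigned class_permissions(mode_t mode, uid_t file_uid, gid_t file_gid,
                           uid_t uid, gid_t gid)
{
    if (uid == file_uid)
        return (mode >> 6) & 7;   /* the user owns the file             */
    if (gid == file_gid)
        return (mode >> 3) & 7;   /* the user is in the file's group    */
    return mode & 7;              /* all other users on the system      */
}
```

For the example in the text, a file with mode 0640 yields 6 (read and write) for its owner, 4 (read-only) for a member of its group, and 0 (no access) for everybody else.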
Any Unix-like operating system has a special user called root, superuser, or supervisor. The
system administrator must log in as root in order to handle user accounts, perform
maintenance tasks like system backups and program upgrades, and so on. The root user can
do almost everything, since the operating system does not apply the usual protection
mechanisms to her. In particular, the root user can access every file on the system and can
interfere with the activity of every running user program.
1.4.3 Processes
All operating systems make use of one fundamental abstraction: the process. A process can
be defined either as "an instance of a program in execution," or as the "execution context" of a
running program. In traditional operating systems, a process executes a single sequence of
instructions in an address space; the address space is the set of memory addresses that the
process is allowed to reference. Modern operating systems allow processes with multiple
execution flows, that is, multiple sequences of instructions executed in the same address
space.
Multiuser systems must enforce an execution environment in which several processes can be
active concurrently and contend for system resources, mainly the CPU. Systems that allow
concurrent active processes are said to be multiprogramming or multiprocessing.[4] It is
important to distinguish programs from processes: several processes can execute the same
program concurrently, while the same process can execute several programs sequentially.
[4] Some multiprocessing operating systems are not multiuser; an example is Microsoft's Windows 98.
On uniprocessor systems, just one process can hold the CPU, and hence just one execution
flow can progress at a time. In general, the number of CPUs is always restricted, and therefore
only a few processes can progress at the same time. The choice of the process that can
progress is left to an operating system component called the scheduler. Some operating
systems allow only nonpreemptive processes, which means that the scheduler is invoked only
when a process voluntarily relinquishes the CPU. But processes of a multiuser system must be
preemptive; the operating system tracks how long each process holds the CPU and
periodically activates the scheduler.
Unix is a multiprocessing operating system with preemptive processes. Indeed, the process
abstraction is really fundamental in all Unix systems. Even when no user is logged in and no
application is running, several system processes monitor the peripheral devices. In particular,
several processes listen at the system terminals waiting for user logins. When a user inputs a
login name, the listening process runs a program that validates the user password. If the user
identity is acknowledged, the process creates another process that runs a shell into which
commands are entered. When a graphical display is activated, one process runs the window
manager, and each window on the display is usually run by a separate process. When a user
creates a graphics shell, one process runs the graphics windows, and a second process runs the
shell into which the user can enter the commands. For each user command, the shell process
creates another process that executes the corresponding program.
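What a shell does for each command can be sketched with the standard fork( ), execvp( ), and waitpid( ) calls. This is a minimal illustration, not the code of any real shell; the function name run_command and the convention of returning 127 when the program cannot be executed are choices made for this example.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX declarations used below */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run the program named by argv[0] in a new process and wait for it.
 * Returns the child's exit status, or -1 if it did not exit normally.
 */
int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();                 /* create another process */

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);          /* child: execute the requested program */
        _exit(127);                     /* reached only if execvp() failed */
    }
    if (waitpid(pid, &status, 0) == -1) /* parent: wait for the child to end */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_command with the argument vector {"sh", "-c", "exit 3", NULL} returns 3 on a system that provides sh. The same process (the child) first runs a copy of the caller's program and then the requested one, which illustrates how one process can execute several programs sequentially.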
Unix-like operating systems adopt a process/kernel model. Each process has the illusion that
it's the only process on the machine and it has exclusive access to the operating system
services. Whenever a process makes a system call (i.e., a request to the kernel), the hardware
changes the privilege mode from User Mode to Kernel Mode, and the process starts the
execution of a kernel procedure with a strictly limited purpose. In this way, the operating
system acts within the execution context of the process in order to satisfy its request.
Whenever the request is fully satisfied, the kernel procedure forces the hardware to return to
User Mode and the process continues its execution from the instruction following the system
call.
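From the program's side, a system call looks like an ordinary function call. The sketch below uses write( ) as the example request; the wrapper name kernel_write is hypothetical. When the kernel refuses the request, the call returns -1 and the reason is left in errno.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX declarations used below */
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Ask the kernel to write the string msg to the open file fd.
 * Returns 0 on success, or the errno value reported for the failure.
 */
int kernel_write(int fd, const char *msg)
{
    size_t left = strlen(msg);

    while (left > 0) {
        /* Each write() switches to Kernel Mode and back to User Mode. */
        ssize_t n = write(fd, msg, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            return errno;          /* the kernel rejected the request */
        }
        msg += n;                  /* the kernel may write only part of it */
        left -= (size_t)n;
    }
    return 0;
}
```

For instance, kernel_write(-1, "x") returns EBADF, because -1 never identifies an open file.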
1.4.4 Kernel Architecture
As stated before, most Unix kernels are monolithic: each kernel layer is integrated into the
whole kernel program and runs in Kernel Mode on behalf of the current process. In contrast,
microkernel operating systems demand a very small set of functions from the kernel,
generally including a few synchronization primitives, a simple scheduler, and an interprocess
communication mechanism. Several system processes that run on top of the microkernel
implement other operating system-layer functions, like memory allocators, device drivers,
system call handlers, and so on.
Although academic research on operating systems is oriented toward microkernels, such
operating systems are generally slower than monolithic ones, since the explicit message
passing between the different layers of the operating system has a cost. However, microkernel
operating systems might have some theoretical advantages over monolithic ones.
Microkernels force the system programmers to adopt a modularized approach, since any
operating system layer is a relatively independent program that must interact with the other
layers through well-defined and clean software interfaces. Moreover, an existing microkernel
operating system can be fairly easily ported to other architectures, since all hardware-
dependent components are generally encapsulated in the microkernel code. Finally,
microkernel operating systems tend to make better use of random access memory (RAM) than
monolithic ones, since system processes that aren't implementing needed functionalities might
be swapped out or destroyed.
Modules are a kernel feature that effectively achieves many of the theoretical advantages of
microkernels without introducing performance penalties. A module is an object file whose
code can be linked to (and unlinked from) the kernel at runtime. The object code usually
consists of a set of functions that implements a filesystem, a device driver, or other features at
the kernel's upper layer. The module, unlike the external layers of microkernel operating
systems, does not run as a specific process. Instead, it is executed in Kernel Mode on behalf
of the current process, like any other statically linked kernel function.
The main advantages of using modules include:
Modularized approach
Since any module can be linked and unlinked at runtime, system programmers must
introduce well-defined software interfaces to access the data structures handled by
modules. This makes it easy to develop new modules.
Platform independence
Even if it may rely on some specific hardware features, a module doesn't depend on a
fixed hardware platform. For example, a disk driver module that relies on the SCSI
standard works as well on an IBM-compatible PC as it does on Compaq's Alpha.
Frugal main memory usage
A module can be linked to the running kernel when its functionality is required and
unlinked when it is no longer useful. This mechanism also can be made transparent to
the user, since linking and unlinking can be performed automatically by the kernel.
No performance penalty
Once linked in, the object code of a module is equivalent to the object code of the
statically linked kernel. Therefore, no explicit message passing is required when the
functions of the module are invoked.[5]
[5] A small performance penalty occurs when the module is linked and when it is unlinked. However, this penalty is comparable to the penalty
caused by the creation and deletion of system processes in microkernel operating systems.
1.5 An Overview of the Unix Filesystem
The Unix operating system design is centered on its filesystem, which has several interesting
characteristics. We'll review the most significant ones, since they will be mentioned quite
often in forthcoming chapters.
1.5.1 Files
A Unix file is an information container structured as a sequence of bytes; the kernel does not
interpret the contents of a file. Many programming libraries implement higher-level
abstractions, such as records structured into fields and record addressing based on keys.
However, the programs in these libraries must rely on system calls offered by the kernel.
From the user's point of view, files are organized in a tree-structured name space as shown in
Figure 1-2.
Figure 1-2. An example of a directory tree
All the nodes of the tree, except the leaves, denote directory names. A directory node contains
information about the files and directories just beneath it. A file or directory name consists of
a sequence of arbitrary ASCII characters,[6] with the exception of / and of the null character \0.
Most filesystems place a limit on the length of a filename, typically no more than 255
characters. The directory corresponding to the root of the tree is called the root directory. By
convention, its name is a slash (/). Names must be different within the same directory, but the
same name may be used in different directories.
[6] Some operating systems allow filenames to be expressed in many different alphabets, based on 16-bit extended coding of graphical characters such
as Unicode.
Unix associates a current working directory with each process (see Section 1.6.1 later in this
chapter); it belongs to the process execution context, and it identifies the directory currently
used by the process. In order to identify a specific file, the process uses a pathname, which
consists of slashes alternating with a sequence of directory names that lead to the file. If the
first item in the pathname is a slash, the pathname is said to be absolute, since its starting
point is the root directory. Otherwise, if the first item is a directory name or filename, the
pathname is said to be relative, since its starting point is the process's current directory.
While specifying filenames, the notations "." and ".." are also used. They denote the current
working directory and its parent directory, respectively. If the current working directory is the
root directory, "." and ".." coincide.
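Both rules are easy to check from a program. The following sketch uses hypothetical helper names (is_absolute and same_file). It decides whether two pathnames denote the same file by comparing the device ID and inode number returned by stat( ); these attributes are introduced later, in Section 1.5.4.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX declarations used below */
#include <sys/stat.h>

/* A pathname is absolute when its first item is a slash. */
int is_absolute(const char *path)
{
    return path[0] == '/';
}

/*
 * Returns 1 if both pathnames resolve to the same file, 0 if they do not,
 * and -1 if either of them cannot be examined.
 */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Here is_absolute("/usr/bin") is true while is_absolute("../bin") is false, and same_file("/.", "/..") returns 1 because "." and ".." coincide in the root directory.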
1.5.2 Hard and Soft Links
A filename included in a directory is called a file hard link, or more simply a link. The same
file may have several links included in the same directory or in different ones, thus several
filenames.
The Unix command:
$ ln f1 f2
is used to create a new hard link that has the pathname f2 for a file identified by the pathname
f1.
Hard links have two limitations:
• Users are not allowed to create hard links for directories. This might transform the
directory tree into a graph with cycles, thus making it impossible to locate a file
according to its name.
• Links can be created only among files included in the same filesystem. This is a
serious limitation since modern Unix systems may include several filesystems located
on different disks and/or partitions, and users may be unaware of the physical
divisions between them.
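A program can create a hard link with the link( ) system call, which is what the ln command relies on. The sketch below, with the hypothetical name hard_link_count, creates a file, gives it a second name, and confirms that both names refer to one file with a link count of 2. It assumes both pathnames lie in the same filesystem, in line with the second limitation above.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX declarations used below */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create an empty file f1, then give it a second name f2, like "ln f1 f2".
 * Returns the file's link count (expected: 2), or -1 on error.
 * Both names are removed again before returning.
 */
int hard_link_count(const char *f1, const char *f2)
{
    struct stat s1, s2;
    int result = -1;
    int fd = open(f1, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd == -1)
        return -1;
    close(fd);

    if (link(f1, f2) == 0) {
        /* Both names must lead to the same device and inode. */
        if (stat(f1, &s1) == 0 && stat(f2, &s2) == 0 &&
            s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino)
            result = (int)s1.st_nlink;
        unlink(f2);
    }
    unlink(f1);
    return result;
}
```

Removing a name with unlink( ) only decrements the link count; the file itself disappears when its last link is removed and no process has it open.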
In order to overcome these limitations, soft links (also called symbolic links) have been
introduced. Symbolic links are short files that contain an arbitrary pathname of another file.
The pathname may refer to any file located in any filesystem; it may even refer to a
nonexistent file.
The Unix command:
$ ln -s f1 f2
creates a new soft link with pathname f2 that refers to pathname f1. When this command is
executed, the filesystem creates a soft link and writes the f1 pathname into it. It then inserts,
in the proper directory, a new entry containing the last component of the f2 pathname. In this
way, any reference to f2 can be translated automatically into a reference to f1.
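The corresponding system calls are symlink( ), which creates the link, and readlink( ), which returns the pathname stored in it. The sketch below uses the hypothetical name symlink_roundtrip; note that the target does not have to exist.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX declarations used below */
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a soft link named linkpath that contains the pathname target
 * (like "ln -s target linkpath"), then read the stored pathname back
 * into buf. Returns the number of bytes stored in the link, or -1 on
 * error. The link is removed again before returning.
 */
long symlink_roundtrip(const char *target, const char *linkpath,
                       char *buf, size_t bufsize)
{
    struct stat st;
    long n = -1;

    if (bufsize == 0 || symlink(target, linkpath) == -1)
        return -1;
    /* lstat() examines the link itself, not the file it refers to. */
    if (lstat(linkpath, &st) == 0 && S_ISLNK(st.st_mode)) {
        ssize_t len = readlink(linkpath, buf, bufsize - 1);
        if (len >= 0) {
            buf[len] = '\0';   /* readlink() does not add a terminator */
            n = (long)len;
        }
    }
    unlink(linkpath);
    return n;
}
```

Because the link merely stores a pathname, the call succeeds even when target names a nonexistent file or a file in another filesystem.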
1.5.3 File Types
Unix files may have one of the following types:
• Regular file
• Directory
• Symbolic link
• Block-oriented device file
• Character-oriented device file
• Pipe and named pipe (also called FIFO)
• Socket
The first three file types are constituents of any Unix filesystem. Their implementation will be
described in detail in Chapter 17.
Device files are related to I/O devices and device drivers integrated into the kernel. For
example, when a program accesses a device file, it acts directly on the I/O device associated
with that file (see Chapter 13).
Pipes and sockets are special files used for interprocess communication (see Section 1.6.5
later in this chapter and Chapter 18).
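A program can discover the type of a file from the mode returned by lstat( ), using the standard S_IS* test macros. The helper names below (file_type_name and path_type_name) are hypothetical; the strings simply mirror the list of file types given above.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX declarations used below */
#include <sys/stat.h>

/* Map the type bits of a file mode to the names used in the list above. */
const char *file_type_name(mode_t mode)
{
    if (S_ISREG(mode))  return "regular file";
    if (S_ISDIR(mode))  return "directory";
    if (S_ISLNK(mode))  return "symbolic link";
    if (S_ISBLK(mode))  return "block-oriented device file";
    if (S_ISCHR(mode))  return "character-oriented device file";
    if (S_ISFIFO(mode)) return "pipe or named pipe (FIFO)";
    if (S_ISSOCK(mode)) return "socket";
    return "unknown";
}

/* Report the type of the file named by path; lstat() does not follow links. */
const char *path_type_name(const char *path)
{
    struct stat st;

    if (lstat(path, &st) == -1)
        return "unknown";
    return file_type_name(st.st_mode);
}
```

For example, path_type_name("/") returns "directory".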
1.5.4 File Descriptor and Inode
Unix makes a clear distinction between a file and a file descriptor. With the exception of
device and special files, each file consists of a sequence of characters. The file does not
include any control information such as its length, or an End-Of-File (EOF) delimiter.
All information needed by the filesystem to handle a file is included in a data structure called
an inode. Each file has its own inode, which the filesystem uses to identify the file.
While filesystems and the kernel functions handling them can vary widely from one Unix
system to another, they must always provide at least the following attributes, which are
specified in the POSIX standard:
• File type (see previous section)
• Number of hard links associated with the file
• File length in bytes
• Device ID (i.e., an identifier of the device containing the file)
• Inode number that identifies the file within the filesystem
• User ID of the file owner
• Group ID of the file
• Several timestamps that specify the inode status change time, the last access time, and
the last modify time
• Access rights and file mode (see next section)
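User programs read these attributes through the stat( ) system call, which fills a struct stat with one field per item in the list. The sketch below formats them into a text line; the function name describe_inode and the output layout are choices made for this example only.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX declarations used below */
#include <stdio.h>
#include <sys/stat.h>

/*
 * Format the inode attributes of path into buf.
 * Returns 0 on success, -1 if the file cannot be examined or buf is too small.
 */
int describe_inode(const char *path, char *buf, size_t bufsize)
{
    struct stat st;
    int n;

    if (stat(path, &st) == -1)
        return -1;
    n = snprintf(buf, bufsize,
        "mode=%lo links=%lu length=%lld device=%llu inode=%llu "
        "uid=%lu gid=%lu ctime=%lld atime=%lld mtime=%lld",
        (unsigned long)st.st_mode,       /* file type and access rights   */
        (unsigned long)st.st_nlink,      /* number of hard links          */
        (long long)st.st_size,           /* file length in bytes          */
        (unsigned long long)st.st_dev,   /* device containing the file    */
        (unsigned long long)st.st_ino,   /* inode number                  */
        (unsigned long)st.st_uid,        /* User ID of the file owner     */
        (unsigned long)st.st_gid,        /* Group ID of the file          */
        (long long)st.st_ctime,          /* inode status change time      */
        (long long)st.st_atime,          /* last access time              */
        (long long)st.st_mtime);         /* last modify time              */
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```

The casts are there because the exact integer types behind fields such as st_ino and st_size differ from one system to another.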
1.5.5 Access Rights and File Mode
The potential users of a file fall into three classes:
• The user who is the owner of the file
• The users who belong to the same group as the file, not including the owner
• All remaining users (others)
There are three types of access rights, Read, Write, and Execute, for each of these three
classes. Thus, the set of access rights associated with a file consists of nine different binary
flags. Three additional flags, called suid (Set User ID), sgid (Set Group ID), and sticky define
the file mode. These flags have the following meanings when applied to executable files:
suid
A process executing a file normally keeps the User ID (UID) of the process owner.
However, if the executable file has the suid flag set, the process gets the UID of the
file owner.
sgid
A process executing a file keeps the Group ID (GID) of the process group. However,
if the executable file has the sgid flag set, the process gets the ID of the file group.
sticky
An executable file with the sticky flag set corresponds to a request to the kernel to
keep the program in memory after its execution terminates.[7]
[7] This flag has become obsolete; other approaches based on sharing of code pages are now used (see Chapter 7).
When a file is created by a process, its owner ID is the UID of the process. Its owner group ID
can be either the GID of the creator process or the GID of the parent directory, depending on
the value of the sgid flag of the parent directory.
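The nine access-right flags and the three file-mode flags are usually displayed as a nine-character string, in the style of the ls -l command. The sketch below builds that string from a mode value with the standard S_I* constants; the function name mode_to_string is hypothetical, and the letters chosen for suid, sgid, and sticky follow the common ls convention rather than anything defined by the kernel.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX/XSI declarations used below */
#include <sys/stat.h>

/*
 * Render the access rights and file mode flags as text, for example
 * "rwsr-xr-x". The array out must hold at least 10 bytes.
 */
void mode_to_string(mode_t mode, char out[10])
{
    out[0] = (mode & S_IRUSR) ? 'r' : '-';   /* owner  */
    out[1] = (mode & S_IWUSR) ? 'w' : '-';
    out[2] = (mode & S_IXUSR) ? 'x' : '-';
    out[3] = (mode & S_IRGRP) ? 'r' : '-';   /* group  */
    out[4] = (mode & S_IWGRP) ? 'w' : '-';
    out[5] = (mode & S_IXGRP) ? 'x' : '-';
    out[6] = (mode & S_IROTH) ? 'r' : '-';   /* others */
    out[7] = (mode & S_IWOTH) ? 'w' : '-';
    out[8] = (mode & S_IXOTH) ? 'x' : '-';
    out[9] = '\0';

    /* suid, sgid, and sticky are shown in the three Execute positions;
       a capital letter means the matching Execute right is not set. */
    if (mode & S_ISUID) out[2] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID) out[5] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX) out[8] = (mode & S_IXOTH) ? 't' : 'T';
}
```

With the traditional octal values, mode 0755 is rendered as "rwxr-xr-x", 04755 (suid set) as "rwsr-xr-x", and 01777 (sticky set) as "rwxrwxrwt".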
1.5.6 File-Handling System Calls
When a user accesses the contents of either a regular file or a directory, he actually accesses
some data stored in a hardware block device. In this sense, a filesystem is a user-level view of
the physical organization of a hard disk partition. Since a process in User Mode cannot
directly interact with the low-level hardware components, each actual file operation must be
performed in Kernel Mode.
Therefore, the Unix operating system defines several system calls related to file handling.
Whenever a process wants to perform some operation on a specific file, it uses the proper
system call and passes the file pathname as a parameter.
All Unix kernels devote great attention to the efficient handling of hardware block devices in
order to achieve good overall system performance. In the chapters that follow, we will
describe topics related to file handling in Linux and specifically how the kernel reacts to file-
related system calls. In order to understand those descriptions, you will need to know how the
main file-handling system calls are used; they are described in the next section.
1.5.6.1 Opening a file
Processes can access only "opened" files. In order to open a file, the process invokes the
system call:
fd = open(path, flag, mode)
The three parameters have the following meanings:
path
Denotes the pathname (relative or absolute) of the file to be opened.
flag
Specifies how the file must be opened (e.g., read, write, read/write, append). It can
also specify whether a nonexisting file should be created.
mode
Specifies the access rights of a newly created file.
This system call creates an "open file" object and returns an identifier called file descriptor.
An open file object contains:
• Some file-handling data structures, like a pointer to the kernel buffer memory area
where file data will be copied; an offset field that denotes the current position in the
file from which the next operation will take place (the so-called file pointer); and so
on.
• Some pointers to kernel functions that the process is enabled to invoke. The set of
permitted functions depends on the value of the flag parameter.
We'll discuss open file objects in detail in Chapter 12. Let's limit ourselves here to describing
some general properties specified by the POSIX semantics:
• A file descriptor represents an interaction between a process and an opened file, while
an open file object contains data related to that interaction. The same open file object
may be identified by several file descriptors.
• Several processes may concurrently open the same file. In this case, the filesystem
assigns a separate file descriptor to each process, along with a separate open file object.
When this occurs, the Unix filesystem does not provide any kind of synchronization
among the I/O operations issued by the processes on the same file. However, several
system calls such as flock( ) are available to allow processes to synchronize
themselves on the entire file or on portions of it (see Chapter 12).
In order to create a new file, the process may also invoke the creat( ) system call, which is
handled by the kernel exactly like open( ).
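As a small user-level illustration of the relation between file descriptors and open file objects, the sketch below opens a scratch file and duplicates its descriptor with dup( ). The function name shared_offset_demo and the scratch path under /tmp are assumptions made for this example only. Since both descriptors identify the same open file object, a write through one of them moves the file pointer seen through the other.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Returns the offset seen through a duplicated descriptor after 5 bytes
 * were written through the original one, or -1 on error. */
long shared_offset_demo(void)
{
    char path[64];
    snprintf(path, sizeof path, "/tmp/ulk_open_%ld", (long)getpid());

    /* flag: read/write, create if missing; mode: rw for the owner only */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int fd2 = dup(fd);           /* second descriptor, same open file object */
    long pos = -1;
    if (fd2 >= 0) {
        assert(fd2 != fd);       /* two distinct identifiers */
        if (write(fd, "hello", 5) == 5)
            pos = (long)lseek(fd2, 0, SEEK_CUR);  /* query via the duplicate */
        close(fd2);
    }
    close(fd);
    unlink(path);
    return pos;
}
```

Had the file been opened a second time with open( ) instead of being duplicated, the second descriptor would refer to a separate open file object whose file pointer starts at 0.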
1.5.6.2 Accessing an opened file
Regular Unix files can be addressed either sequentially or randomly, while device files and
named pipes are usually accessed sequentially (see Chapter 13). In both kinds of access, the
kernel stores the file pointer in the open file object, that is, the current position at which the
next read or write operation will take place.
Sequential access is implicitly assumed: the read( ) and write( ) system calls always refer
to the position of the current file pointer. In order to modify the value, a program must
explicitly invoke the lseek( ) system call. When a file is opened, the kernel sets the file
pointer to the position of the first byte in the file (offset 0).
The lseek( ) system call requires the following parameters:
newoffset = lseek(fd, offset, whence);
which have the following meanings:
fd
Indicates the file descriptor of the opened file
offset
Specifies a signed integer value that will be used for computing the new position of
the file pointer
whence
Specifies whether the new position should be computed by adding the offset value to
the number 0 (offset from the beginning of the file), the current file pointer, or the
position of the last byte (offset from the end of the file)
The read( ) system call requires the following parameters:
nread = read(fd, buf, count);
which have the following meanings:
fd
Indicates the file descriptor of the opened file
buf
Specifies the address of the buffer in the process's address space to which the data will
be transferred
count
Denotes the number of bytes to be read
When handling such a system call, the kernel attempts to read count bytes from the file
having the file descriptor fd, starting from the current value of the opened file's offset field. In
some cases—end-of-file, empty pipe, and so on—the kernel does not succeed in reading all
count bytes. The returned nread value specifies the number of bytes effectively read. The file
pointer is also updated by adding nread to its previous value. The write( ) parameters are
similar.
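The interplay of write( ), lseek( ), and read( ) can be tried out with a few lines of user-level code. This is a sketch under stated assumptions: read_from_offset_6 is a hypothetical helper and the scratch file under /tmp exists only for the demonstration. Note how read( ) returns fewer bytes than requested when it reaches the end of the file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Writes "hello world" to a scratch file, moves the file pointer to offset 6,
 * and reads up to count bytes into buf. Returns nread, or -1 on error. */
long read_from_offset_6(char *buf, size_t count)
{
    char path[64];
    assert(buf != NULL);
    snprintf(path, sizeof path, "/tmp/ulk_seek_%ld", (long)getpid());

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    long nread = -1;
    if (write(fd, "hello world", 11) == 11 &&   /* file pointer is now 11 */
        lseek(fd, 6, SEEK_SET) == 6)            /* whence: beginning of file */
        nread = (long)read(fd, buf, count);     /* may be short at end-of-file */

    close(fd);
    unlink(path);
    return nread;
}
```

With a 16-byte buffer, only the five bytes of "world" remain after offset 6, so the call reports 5 rather than 16.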
1.5.6.3 Closing a file
When a process does not need to access the contents of a file anymore, it can invoke the
system call:
res = close(fd);
which releases the open file object corresponding to the file descriptor fd. When a process
terminates, the kernel closes all its still opened files.
1.5.6.4 Renaming and deleting a file
In order to rename or delete a file, a process does not need to open it. Indeed, such operations
do not act on the contents of the affected file, but rather on the contents of one or more
directories. For example, the system call:
res = rename(oldpath, newpath);
changes the name of a file link, while the system call:
res = unlink(pathname);
decrements the file link count and removes the corresponding directory entry. The file is
deleted only when the link count assumes the value 0.
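The effect of these calls on the link count can be observed through the st_nlink field returned by stat( ). The sketch below is illustrative only; link_count and link_demo are hypothetical helpers, and the scratch names under /tmp are assumptions. It also uses the link( ) system call to give the file a second name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Link count of path, or -1 if the file cannot be examined. */
static long link_count(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Creates a file, adds a second name, renames it, then removes the first
 * name. Stores the link count after each step; returns 0 on success. */
int link_demo(long counts[3])
{
    char a[64], b[64], c[64];
    long pid = (long)getpid();

    assert(counts != NULL);
    snprintf(a, sizeof a, "/tmp/ulk_a_%ld", pid);
    snprintf(b, sizeof b, "/tmp/ulk_b_%ld", pid);
    snprintf(c, sizeof c, "/tmp/ulk_c_%ld", pid);

    int fd = open(a, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);                   /* the calls below work on a closed file */

    if (link(a, b) != 0) { unlink(a); return -1; }
    counts[0] = link_count(a);   /* two directory entries */
    if (rename(b, c) != 0) { unlink(a); unlink(b); return -1; }
    counts[1] = link_count(a);   /* renaming does not change the count */
    unlink(a);
    counts[2] = link_count(c);   /* one name left, file still exists */
    unlink(c);                   /* count reaches 0: the file is deleted */
    return 0;
}
```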
1.6 An Overview of Unix Kernels
Unix kernels provide an execution environment in which applications may run. Therefore, the
kernel must implement a set of services and corresponding interfaces. Applications use those
interfaces and do not usually interact directly with hardware resources.
1.6.1 The Process/Kernel Model
As already mentioned, a CPU can run either in User Mode or in Kernel Mode. Actually, some
CPUs can have more than two execution states. For instance, the Intel 80x86 microprocessors
have four different execution states. But all standard Unix kernels make use of only Kernel
Mode and User Mode.
When a program is executed in User Mode, it cannot directly access the kernel data structures
or the kernel programs. When an application executes in Kernel Mode, however, these
restrictions no longer apply. Each CPU model provides special instructions to switch from
User Mode to Kernel Mode and vice versa. A program executes most of the time in User
Mode and switches to Kernel Mode only when requesting a service provided by the kernel.
When the kernel has satisfied the program's request, it puts the program back in User Mode.
Processes are dynamic entities that usually have a limited life span within the system. The
task of creating, eliminating, and synchronizing the existing processes is delegated to a group
of routines in the kernel.
The kernel itself is not a process but a process manager. The process/kernel model assumes
that processes that require a kernel service make use of specific programming constructs
called system calls. Each system call sets up the group of parameters that identifies the
process request and then executes the hardware-dependent CPU instruction to switch from
User Mode to Kernel Mode.
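Application programs rarely issue the mode-switching instruction themselves; the C library wraps it in ordinary functions. On Linux with glibc, the generic syscall( ) wrapper makes the mechanism a little more visible. The sketch below assumes such a system; raw_getpid is a hypothetical name chosen for this example.

```c
#define _GNU_SOURCE          /* syscall() is a Linux/glibc extension */
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for the process ID by system call number instead of
 * going through the getpid() library wrapper. */
long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);  /* parameters set up, then mode switch */
    assert(pid > 0);
    return pid;
}
```

The value returned is the same one that the getpid( ) wrapper reports, since both end up in the same kernel service.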
Besides user processes, Unix systems include a few privileged processes called kernel threads
with the following characteristics:
• They run in Kernel Mode in the kernel address space.
• They do not interact with users, and thus do not require terminal devices.
• They are usually created during system startup and remain alive until the system is
shut down.
Notice how the process/kernel model is somewhat orthogonal to the CPU state: on a
uniprocessor system, only one process is running at any time and it may run either in User or
in Kernel Mode. If it runs in Kernel Mode, the processor is executing some kernel routine.
Figure 1-3 illustrates examples of transitions between User and Kernel Mode. Process 1 in
User Mode issues a system call, after which the process switches to Kernel Mode and the
system call is serviced. Process 1 then resumes execution in User Mode until a timer interrupt
occurs and the scheduler is activated in Kernel Mode. A process switch takes place, and
Process 2 starts its execution in User Mode until a hardware device raises an interrupt. As a
consequence of the interrupt, Process 2 switches to Kernel Mode and services the interrupt.
Figure 1-3. Transitions between User and Kernel Mode
Unix kernels do much more than handle system calls; in fact, kernel routines can be activated
in several ways:
• A process invokes a system call.
• The CPU executing the process signals an exception, which is some unusual condition
such as an invalid instruction. The kernel handles the exception on behalf of the
process that caused it.
• A peripheral device issues an interrupt signal to the CPU to notify it of an event such
as a request for attention, a status change, or the completion of an I/O operation. Each
interrupt signal is dealt with by a kernel program called an interrupt handler. Since
peripheral devices operate asynchronously with respect to the CPU, interrupts occur at
unpredictable times.
• A kernel thread is executed; since it runs in Kernel Mode, the corresponding program
must be considered part of the kernel, albeit encapsulated in a process.
1.6.2 Process Implementation
To let the kernel manage processes, each process is represented by a process descriptor that
includes information about the current state of the process.
When the kernel stops the execution of a process, it saves the current contents of several
processor registers in the process descriptor. These include:
• The program counter (PC) and stack pointer (SP) registers
• The general-purpose registers
• The floating point registers
• The processor control registers (Processor Status Word) containing information about
the CPU state
• The memory management registers used to keep track of the RAM accessed by the
process
When the kernel decides to resume executing a process, it uses the proper process descriptor
fields to load the CPU registers. Since the stored value of the program counter points to the
instruction following the last instruction executed, the process resumes execution from where
it was stopped.
When a process is not executing on the CPU, it is waiting for some event. Unix kernels
distinguish many wait states, which are usually implemented by queues of process
descriptors; each (possibly empty) queue corresponds to the set of processes waiting for a
specific event.
1.6.3 Reentrant Kernels
All Unix kernels are reentrant: this means that several processes may be executing in Kernel
Mode at the same time. Of course, on uniprocessor systems only one process can progress,
but many of them can be blocked in Kernel Mode waiting for the CPU or the completion of
some I/O operation. For instance, after issuing a read to a disk on behalf of some process, the
kernel will let the disk controller handle it and will resume executing other processes.
An interrupt notifies the kernel when the device has satisfied the read, so the former process
can resume the execution.
One way to provide reentrancy is to write functions so that they modify only local variables
and do not alter global data structures. Such functions are called reentrant functions. But
a reentrant kernel is not limited just to such reentrant functions (although that is how some
real-time kernels are implemented). Instead, the kernel can include nonreentrant functions and
use locking mechanisms to ensure that only one process can execute a nonreentrant function
at a time. Every process in Kernel Mode acts on its own set of memory locations and cannot
interfere with the others.
If a hardware interrupt occurs, a reentrant kernel is able to suspend the current running
process even if that process is in Kernel Mode. This capability is very important, since it
improves the throughput of the device controllers that issue interrupts. Once a device has
issued an interrupt, it waits until the CPU acknowledges it. If the kernel is able to answer
quickly, the device controller will be able to perform other tasks while the CPU handles
the interrupt.
Now let's look at kernel reentrancy and its impact on the organization of the kernel. A kernel
control path denotes the sequence of instructions executed by the kernel to handle a system
call, an exception, or an interrupt.
In the simplest case, the CPU executes a kernel control path sequentially from the first
instruction to the last. When one of the following events occurs, however, the CPU interleaves
the kernel control paths:
• A process executing in User Mode invokes a system call and the corresponding kernel
control path verifies that the request cannot be satisfied immediately; it then invokes
the scheduler to select a new process to run. As a result, a process switch occurs. The
first kernel control path is left unfinished and the CPU resumes the execution of some
other kernel control path. In this case, the two control paths are executed on behalf of
two different processes.
• The CPU detects an exception—for example, an access to a page not present in
RAM—while running a kernel control path. The first control path is suspended, and
the CPU starts the execution of a suitable procedure. In our example, this type of
procedure could allocate a new page for the process and read its contents from disk.
When the procedure terminates, the first control path can be resumed. In this case, the
two control paths are executed on behalf of the same process.
• A hardware interrupt occurs while the CPU is running a kernel control path with the
interrupts enabled. The first kernel control path is left unfinished and the CPU starts
processing another kernel control path to handle the interrupt. The first kernel control
path resumes when the interrupt handler terminates. In this case the two kernel control
paths run in the execution context of the same process and the total elapsed system
time is accounted to it. However, the interrupt handler doesn't necessarily operate on
behalf of the process.
Figure 1-4 illustrates a few examples of noninterleaved and interleaved kernel control paths.
Three different CPU states are considered:
• Running a process in User Mode (User)
• Running an exception or a system call handler (Excp)
• Running an interrupt handler (Intr)
Figure 1-4. Interleaving of kernel control paths
1.6.4 Process Address Space
Each process runs in its private address space. A process running in User Mode refers to
private stack, data, and code areas. When running in Kernel Mode, the process addresses the
kernel data and code area and makes use of another stack.
Since the kernel is reentrant, several kernel control paths—each related to a different
process—may be executed in turn. In this case, each kernel control path refers to its own
private kernel stack.
While it appears to each process that it has access to a private address space, there are times
when part of the address space is shared among processes. In some cases this sharing is
explicitly requested by processes; in others it is done automatically by the kernel to reduce
memory usage.
If the same program, say an editor, is needed simultaneously by several users, the program
will be loaded into memory only once, and its instructions can be shared by all of the users
who need it. Its data, of course, must not be shared, because each user will have separate data.
This kind of shared address space is done automatically by the kernel to save memory.
Processes can also share parts of their address space as a kind of interprocess communication,
using the "shared memory" technique introduced in System V and supported by Linux.
Finally, Linux supports the mmap( ) system call, which allows part of a file or the memory
residing on a device to be mapped into a part of a process address space. Memory mapping
can provide an alternative to normal reads and writes for transferring data. If the same file is
shared by several processes, its memory mapping is included in the address space of each of
the processes that share it.
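From the point of view of a User Mode program, a memory mapping turns file access into ordinary memory access. The following sketch is illustrative only: byte_via_mmap is a hypothetical helper, and the scratch file under /tmp is an assumption of the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes len bytes of msg to a scratch file, maps the file, and returns
 * the byte at index as seen through the mapping, or -1 on error. */
int byte_via_mmap(const char *msg, size_t len, size_t index)
{
    char path[64];
    assert(msg != NULL && index < len);
    snprintf(path, sizeof path, "/tmp/ulk_mmap_%ld", (long)getpid());

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int result = -1;
    if (write(fd, msg, len) == (ssize_t)len) {
        /* MAP_SHARED: the pages are those of the file, not a private copy */
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            result = p[index];   /* plain memory access, no read() call */
            munmap(p, len);
        }
    }
    close(fd);
    unlink(path);
    return result;
}
```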
1.6.5 Synchronization and Critical Regions
Implementing a reentrant kernel requires the use of synchronization: if a kernel control path is
suspended while acting on a kernel data structure, no other kernel control path will be allowed
to act on the same data structure unless it has been reset to a consistent state. Otherwise, the
interaction of the two control paths could corrupt the stored information.
For example, let's suppose that a global variable V contains the number of available items of
some system resource. A first kernel control path A reads the variable and determines that
there is just one available item. At this point, another kernel control path B is activated and
reads the same variable, which still contains the value 1. Thus, B decrements V and starts
using the resource item. Then A resumes the execution; because it has already read the value
of V, it assumes that it can decrement V and take the resource item, which B already uses. As
a final result, V contains -1, and two kernel control paths are using the same resource item
with potentially disastrous effects.
When the outcome of some computation depends on how two or more processes are
scheduled, the code is incorrect: we say that there is a race condition.
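The interleaving described above can be replayed deterministically in a few lines of ordinary C. This is a single-threaded model, not kernel code: struct path, read_v, take_if_seen_available, and replay_race are hypothetical names, and each control path is simply split at the point where it could be suspended.

```c
#include <assert.h>

/* One kernel control path: it remembers the value of V it has read. */
struct path { int seen; };

static void read_v(struct path *p, const int *v)
{
    p->seen = *v;
}

static void take_if_seen_available(struct path *p, int *v)
{
    if (p->seen > 0)     /* decision based on a possibly stale copy */
        (*v)--;
}

/* Replays the A/B interleaving and returns the final value of V. */
int replay_race(void)
{
    int v = 1;
    struct path a, b;

    read_v(&a, &v);                    /* A reads V == 1, then is suspended */
    read_v(&b, &v);                    /* B reads V == 1 */
    take_if_seen_available(&b, &v);    /* B takes the item: V == 0 */
    take_if_seen_available(&a, &v);    /* A resumes with its stale copy */

    assert(a.seen == 1 && b.seen == 1);
    return v;
}
```

The function returns -1, the inconsistent value mentioned in the text, because the check and the decrement of each path are separated by the other path's activity.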
In general, safe access to a global variable is ensured by using atomic operations. In the
previous example, data corruption would not be possible if the two control paths read and
decrement V with a single, noninterruptible operation. However, kernels contain many data
structures that cannot be accessed with a single operation. For example, it usually isn't
possible to remove an element from a linked list with a single operation, because the kernel
needs to access at least two pointers at once. Any section of code that should be finished by
each process that begins it before another process can enter it is called a critical region.[8]
[8] Synchronization problems have been fully described in other works; we refer the interested reader to books on the Unix operating systems (see the bibliography near the end of the book).
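As a user-level analogy for a single, noninterruptible read-and-decrement of V, the sketch below uses the C11 <stdatomic.h> facilities. It is an assumption of this example that C11 atomics are available; the kernel relies on its own architecture-specific primitives, and take_item is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Take one item only if V is positive; the test and the decrement
 * take effect as one atomic step. */
bool take_item(atomic_int *v)
{
    assert(v != NULL);
    int seen = atomic_load(v);
    while (seen > 0) {
        /* Succeeds only if V still equals seen; otherwise seen is
         * refreshed with the current value and the test is repeated. */
        if (atomic_compare_exchange_weak(v, &seen, seen - 1))
            return true;
    }
    return false;        /* no item available, V is left untouched */
}
```

With V initialized to 1, the first caller obtains the item and the second is refused, so V never becomes negative.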
These problems occur not only among kernel control paths but also among processes sharing
common data. Several synchronization techniques have been adopted. The following section
will concentrate on how to synchronize kernel control paths.
1.6.5.1 Nonpreemptive kernels
In search of a drastically simple solution to synchronization problems, most traditional Unix
kernels are nonpreemptive: when a process executes in Kernel Mode, it cannot be arbitrarily
suspended and substituted with another process. Therefore, on a uniprocessor system all
kernel data structures that are not updated by interrupts or exception handlers are safe for the
kernel to access.
Of course, a process in Kernel Mode can voluntarily relinquish the CPU, but in this case it
must ensure that all data structures are left in a consistent state. Moreover, when it resumes its
execution, it must recheck the value of any previously accessed data structures that could be
changed.
Nonpreemptability is ineffective in multiprocessor systems, since two kernel control paths
running on different CPUs could concurrently access the same data structure.
1.6.5.2 Interrupt disabling
Another synchronization mechanism for uniprocessor systems consists of disabling all
hardware interrupts before entering a critical region and reenabling them right after leaving it.
This mechanism, while simple, is far from optimal. If the critical region is large, interrupts
can remain disabled for a relatively long time, potentially causing all hardware activities to
freeze.
Moreover, on a multiprocessor system this mechanism doesn't work at all. There is no way to
ensure that no other CPU can access the same data structures updated in the protected critical
region.
1.6.5.3 Semaphores
A widely used mechanism, effective in both uniprocessor and multiprocessor systems, relies
on the use of semaphores. A semaphore is simply a counter associated with a data structure;
the semaphore is checked by all kernel threads before they try to access the data structure.
Each semaphore may be viewed as an object composed of:
• An integer variable
• A list of waiting processes
• Two atomic methods: down( ) and up( )
The down( ) method decrements the value of the semaphore. If the new value is less than 0,
the method adds the running process to the semaphore list and then blocks (i.e., invokes the
scheduler). The up( ) method increments the value of the semaphore and, if its new value is
greater than or equal to 0, reactivates one or more processes in the semaphore list.
Each data structure to be protected has its own semaphore, which is initialized to 1. When a
kernel control path wishes to access the data structure, it executes the down( ) method on the
proper semaphore. If the new value of the semaphore isn't negative, access to the data
structure is granted. Otherwise, the process that is executing the kernel control path is added
to the semaphore list and blocked. When another process executes the up( ) method on that
semaphore, one of the processes in the semaphore list is allowed to proceed.
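The bookkeeping performed by down( ) and up( ) can be modeled with a counter and a small waiting list. The sketch below is a single-threaded toy, not the Linux implementation: all names (toy_sem, toy_down, toy_up) are hypothetical, blocking is represented by a return value instead of a call to the scheduler, the atomicity of the two methods is not modeled, and waking exactly one process in FIFO order is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 8

struct toy_sem {
    int count;                   /* the integer variable */
    int waiting[MAX_WAITERS];    /* list of waiting process IDs (FIFO) */
    int nwaiting;
};

void toy_sem_init(struct toy_sem *s)
{
    s->count = 1;                /* one process may hold the data structure */
    s->nwaiting = 0;
}

/* Returns true if pid may proceed, false if it was queued (would block). */
bool toy_down(struct toy_sem *s, int pid)
{
    s->count--;
    if (s->count < 0) {
        assert(s->nwaiting < MAX_WAITERS);
        s->waiting[s->nwaiting++] = pid;
        return false;
    }
    return true;
}

/* Returns the pid that is reactivated, or -1 if nobody was waiting. */
int toy_up(struct toy_sem *s)
{
    s->count++;
    if (s->nwaiting == 0)
        return -1;
    int pid = s->waiting[0];
    for (int i = 1; i < s->nwaiting; i++)
        s->waiting[i - 1] = s->waiting[i];
    s->nwaiting--;
    return pid;
}
```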
1.6.5.4 Spin locks
In multiprocessor systems, semaphores are not always the best solution to the synchronization
problems. Some kernel data structures should be protected from being concurrently accessed
by kernel control paths that run on different CPUs. In this case, if the time required to update
the data structure is short, a semaphore could be very inefficient. To check a semaphore, the
kernel must insert a process in the semaphore list and then suspend it. Since both operations
are relatively expensive, in the time it takes to complete them, the other kernel control path
could have already released the semaphore.
In these cases, multiprocessor operating systems make use of spin locks. A spin lock is very
similar to a semaphore, but it has no process list: when a process finds the lock closed by
another process, it "spins" around repeatedly, executing a tight instruction loop until the lock
becomes open.
Of course, spin locks are useless in a uniprocessor environment. When a kernel control path
tries to access a locked data structure, it starts an endless loop. Therefore, the kernel control
path that is updating the protected data structure would not have a chance to continue the
execution and release the spin lock. The final result is that the system hangs.
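A spin lock can be approximated at user level with the C11 atomic_flag type. The following is a minimal sketch assuming a C11 compiler; the toy_spin names are hypothetical, and a kernel spin lock would in addition have to deal with interrupts and architecture details that are ignored here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { atomic_flag locked; } toy_spinlock;   /* hypothetical type */

void toy_spin_init(toy_spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->locked);           /* lock starts open */
}

/* One attempt: true if the lock was open and is now held by the caller. */
bool toy_spin_trylock(toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin in a tight loop until the lock becomes open. */
void toy_spin_lock(toy_spinlock *l)
{
    while (!toy_spin_trylock(l))
        ;                                    /* busy-wait, no process list */
}

void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Calling toy_spin_lock( ) twice from the same thread without an intervening unlock would loop forever, which mirrors the uniprocessor hang described above.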
1.6.5.5 Avoiding deadlocks
Processes or kernel control paths that synchronize with other control paths may easily enter
a deadlocked state. The simplest case of deadlock occurs when process p1 gains access to data
structure a and process p2 gains access to b, but p1 then waits for b and p2 waits for a. Other
more complex cyclic waits among groups of processes may also occur. Of course, a
deadlock condition causes a complete freeze of the affected processes or kernel control paths.
As far as kernel design is concerned, deadlock becomes an issue when the number of kernel
semaphore types used is high. In this case, it may be quite difficult to ensure that no deadlock
state will ever be reached for all possible ways to interleave kernel control paths. Several
operating systems, including Linux, avoid this problem by introducing a very limited number
of semaphore types and by requesting semaphores in an ascending order.
1.6.6 Signals and Interprocess Communication
Unix signals provide a mechanism for notifying processes of system events. Each event has
its own signal number, which is usually referred to by a symbolic constant such as SIGTERM.
There are two kinds of system events:
Asynchronous notifications
For instance, a user can send the interrupt signal SIGINT to a foreground process by
pressing the interrupt keycode (usually, CTRL-C) at the terminal.
Synchronous errors or exceptions
For instance, the kernel sends the signal SIGSEGV to a process when it accesses a
memory location at an illegal address.
The POSIX standard defines about 20 different signals, two of which are user-definable and
may be used as a primitive mechanism for communication and synchronization among
processes in User Mode. In general, a process may react to a signal reception in two possible
ways:
• Ignore the signal.
• Asynchronously execute a specified procedure (the signal handler).
If the process does not specify one of these alternatives, the kernel performs a default action
that depends on the signal number. The five possible default actions are:
• Terminate the process.
• Write the execution context and the contents of the address space in a file (core dump)
and terminate the process.
• Ignore the signal.
• Suspend the process.
• Resume the process's execution, if it was stopped.
Kernel signal handling is rather elaborate since the POSIX semantics allows processes to
temporarily block signals. Moreover, a few signals such as SIGKILL cannot be directly
handled by the process and cannot be ignored.
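A process selects its reaction to a signal with the sigaction( ) system call. The sketch below installs a handler for SIGUSR1, one of the two user-definable signals, and then checks that the kernel refuses a request to ignore SIGKILL. The function names deliver_usr1 and sigkill_can_be_ignored are hypothetical and exist only for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_seen = 0;

/* The signal handler: executed asynchronously when SIGUSR1 is delivered. */
static void on_usr1(int signo)
{
    (void)signo;
    usr1_seen = usr1_seen + 1;
}

/* Installs the handler, sends SIGUSR1 to the calling process, and returns
 * how many times the handler ran, or -1 on error. */
int deliver_usr1(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    usr1_seen = 0;
    int rc = raise(SIGUSR1);         /* handler runs before raise() returns */
    assert(rc == 0);
    sigaction(SIGUSR1, &old, NULL);  /* restore the previous reaction */
    return (int)usr1_seen;
}

/* Returns nonzero if the kernel accepted a request to ignore SIGKILL. */
int sigkill_can_be_ignored(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGKILL, &sa, NULL) == 0;
}
```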
AT&T's Unix System V introduced other kinds of interprocess communication among
processes in User Mode, which have been adopted by many Unix kernels: semaphores,
message queues, and shared memory. They are collectively known as System V IPC.
The kernel implements these constructs as IPC resources: a process acquires a resource by
invoking a shmget( ), semget( ), or msgget( ) system call. Just like files, IPC resources
are persistent: they must be explicitly deallocated by the creator process, by the current
owner, or by a superuser process.
Semaphores are similar to those described in Section 1.6.5 earlier in this chapter, except that
they are reserved for processes in User Mode. Message queues allow processes to exchange
messages by making use of the msgsnd( ) and msgrcv( ) system calls, which respectively
insert a message into a specific message queue and extract a message from it.
Shared memory provides the fastest way for processes to exchange and share data. A process
starts by issuing a shmget( ) system call to create a new shared memory having a required
size. After obtaining the IPC resource identifier, the process invokes the shmat( ) system
call, which returns the starting address of the new region within the process address space.
When the process wishes to detach the shared memory from its address space, it invokes the
shmdt( ) system call. The implementation of shared memory depends on how the kernel
implements process address spaces.
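The sequence of shmget( ), shmat( ), and shmdt( ) calls just described can be sketched as follows. For simplicity the example attaches the same region twice within one process instead of involving two processes; shm_roundtrip is a hypothetical helper, the value passed is assumed to be nonnegative, and the code assumes that System V IPC is enabled on the system.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Creates a private shared memory region, attaches it twice, writes through
 * the first address and reads through the second.
 * Returns the value read, or -1 on error. */
int shm_roundtrip(int value)
{
    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    int result = -1;
    int *a = shmat(id, NULL, 0);     /* kernel picks the starting address */
    int *b = shmat(id, NULL, 0);
    if (a != (void *)-1 && b != (void *)-1) {
        assert(a != b);              /* two addresses, one region */
        *a = value;
        result = *b;
    }
    if (a != (void *)-1) shmdt(a);
    if (b != (void *)-1) shmdt(b);
    shmctl(id, IPC_RMID, NULL);      /* IPC resources persist until removed */
    return result;
}
```

The explicit shmctl( ) call with IPC_RMID reflects the persistence of IPC resources: detaching alone would leave the region allocated.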
1.6.7 Process Management
Unix makes a neat distinction between the process and the program it is executing. To that
end, the fork( ) and exit( ) system calls are used respectively to create a new process and
to terminate it, while an exec( )-like system call is invoked to load a new program. After
such a system call has been executed, the process resumes execution with a brand new
address space containing the loaded program.
The process that invokes a fork( ) is the parent while the new process is its child. Parents
and children can find each other because the data structure describing each process includes a
pointer to its immediate parent and pointers to all its immediate children.
A naive implementation of fork( ) would require both the parent's data and the parent's
code to be duplicated and the copies assigned to the child. This would be quite time-consuming.
Current kernels that can rely on hardware paging units follow the Copy-On-Write approach,
which defers page duplication until the last moment (i.e., until the parent or the child is
required to write into a page). We shall describe how Linux implements this technique in
Section 7.4.4 in Chapter 7.
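Whatever technique the kernel uses internally, the semantics visible to programs is that parent and child each own a private copy of the data. The sketch below demonstrates only that visible behavior, not the Copy-On-Write mechanism itself; fork_private_copy is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child overwrites a variable and reports its own copy through its exit
 * status. Returns parent_copy * 10 + child_copy, or -1 on error. */
int fork_private_copy(void)
{
    volatile int x = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        x = 2;           /* on a Copy-On-Write kernel, this write is what
                            forces the page to be duplicated */
        _exit(x);
    }

    assert(pid > 0);     /* only the parent gets here */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return x * 10 + WEXITSTATUS(status);   /* parent's copy is still 1 */
}
```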
The exit( ) system call terminates a process. The kernel handles this system call by
releasing the resources owned by the process and sending the parent process a SIGCHLD
signal, which is ignored by default.
1.6.7.1 Zombie processes
How can a parent process inquire about termination of its children? The wait( ) system call
allows a process to wait until one of its children terminates; it returns the process ID (PID) of
the terminated child.
When executing this system call, the kernel checks whether a child has already terminated. A
special zombie process state is introduced to represent terminated processes: a process
remains in that state until its parent process executes a wait( ) system call on it. The system
call handler extracts some data about resource usage from the process descriptor fields; the
process descriptor may be released once the data has been collected. If no child process has
already terminated when the wait( ) system call is executed, the kernel usually puts the
process in a wait state until a child terminates.
Many kernels also implement a waitpid( ) system call, which allows a process to wait for a
specific child process. Other variants of wait( ) system calls are also quite common.
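A minimal sketch of a parent collecting the termination status of a specific child follows. The name reap_child and its second parameter are hypothetical; the second waitpid( ) call is there only to show that, once the data has been collected, the kernel no longer knows the child.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that terminates with the given exit code and collects it.
 * *still_known is set to 0 if a second waitpid() on the same PID fails with
 * ECHILD. Returns the collected exit code, or -1 on error. */
int reap_child(int code, int *still_known)
{
    assert(code >= 0 && code <= 255 && still_known != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child terminates at once */

    /* Between its termination and the call below, the child is a zombie. */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    /* The process descriptor has been released: nothing left to wait for. */
    *still_known = !(waitpid(pid, NULL, WNOHANG) == -1 && errno == ECHILD);
    return WEXITSTATUS(status);
}
```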
It's a good practice for the kernel to keep around information on a child process until the
parent issues its wait( ) call, but suppose the parent process terminates without issuing that
call? The information takes up valuable memory slots that could be used to serve living
processes. For example, many shells allow the user to start a command in the background and
then log out. The process that is running the command shell terminates, but its children
continue their execution.
The solution lies in a special system process called init that is created during system
initialization. When a process terminates, the kernel changes the appropriate process
descriptor pointers of all the existing children of the terminated process to make them become
children of init. This process monitors the execution of all its children and routinely issues
wait( ) system calls, whose side effect is to get rid of all zombies.
1.6.7.2 Process groups and login sessions
Modern Unix operating systems introduce the notion of process groups to represent a "job"
abstraction. For example, in order to execute the command line:
$ ls | sort | more
a shell that supports process groups, such as bash, creates a new group for the three processes
corresponding to ls, sort, and more. In this way, the shell acts on the three processes as if
they were a single entity (the job, to be precise). Each process descriptor includes a process
group ID field. Each group of processes may have a group leader, which is the process whose
PID coincides with the process group ID. A newly created process is initially inserted into the
process group of its parent.
Modern Unix kernels also introduce login sessions. Informally, a login session contains all
processes that are descendants of the process that has started a working session on a specific
terminal—usually, the first command shell process created for the user. All processes in a
process group must be in the same login session. A login session may have several process
groups active simultaneously; one of these process groups is always in the foreground, which
means that it has access to the terminal. The other active process groups are in the
background. When a background process tries to access the terminal, it receives a SIGTTIN or
SIGTTOU signal. In many command shells the internal commands bg and fg can be used to
put a process group in either the background or the foreground.
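The group leader rule (PID equal to process group ID) can be checked with the POSIX calls setpgid( ), getpgrp( ), and getpid( ). This is only an illustrative sketch of what a job-control shell might do for the first process of a job; the function name is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that moves itself into a new process group.
 * Returns 1 if the child ended up as leader of its own group
 * (process group ID equal to its PID), 0 if not, -1 on error. */
int child_becomes_group_leader(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: it starts out in the process group of its parent. */
        if (setpgid(0, 0) != 0)          /* create a group whose ID is our PID */
            _exit(2);
        _exit(getpgrp() == getpid() ? 0 : 1);
    }
    assert(pid > 0);                     /* only the parent reaches this point */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```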
1.6.8 Memory Management
Memory management is by far the most complex activity in a Unix kernel. We shall dedicate
more than a third of this book just to describing how Linux does it. This section illustrates
some of the main issues related to memory management.
1.6.8.1 Virtual memory
All recent Unix systems provide a useful abstraction called virtual memory. Virtual memory
acts as a logical layer between the application memory requests and the hardware Memory
Management Unit (MMU). Virtual memory has many purposes and advantages:
Understanding the Linux Kernel
32
• Several processes can be executed concurrently.
• It is possible to run applications whose memory needs are larger than the available
physical memory.
• Processes can execute a program whose code is only partially loaded in memory.
• Each process is allowed to access a subset of the available physical memory.
• Processes can share a single memory image of a library or program.
• Programs can be relocatable, that is, they can be placed anywhere in physical memory.
• Programmers can write machine-independent code, since they do not need to be
concerned about physical memory organization.
The main ingredient of a virtual memory subsystem is the notion of virtual address space.
The set of memory references that a process can use is different from physical memory
addresses. When a process uses a virtual address,[9] the kernel and the MMU cooperate to
locate the actual physical location of the requested memory item.
[9] These addresses have different nomenclatures depending on the computer architecture. As we'll see in Chapter 2, Intel 80x86 manuals refer to them as "logical addresses."
Today's CPUs include hardware circuits that automatically translate the virtual addresses into
physical ones. To that end, the available RAM is partitioned into page frames 4 or 8 KB in
length, and a set of page tables is introduced to specify the correspondence between virtual
and physical addresses. These circuits make memory allocation simpler, since a request for a
block of contiguous virtual addresses can be satisfied by allocating a group of page frames
having noncontiguous physical addresses.
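As a rough illustration of the translation, the sketch below splits an address into a page number and an offset, assuming 4 KB page frames and a toy single-level table. The table contents, NR_PAGES, and the function name are invented for the example; real hardware uses multilevel page tables, as described in Chapter 2.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                   /* 4 KB page frames */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NR_PAGES   4u                    /* hypothetical tiny address space */

/* Toy page table: page_table[virtual page number] = page frame number. */
static const uint32_t page_table[NR_PAGES] = { 7, 2, 9, 4 };

uint32_t virt_to_phys(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;       /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);   /* offset inside the page */

    assert(vpn < NR_PAGES);                      /* the toy table covers only NR_PAGES pages */
    return (page_table[vpn] << PAGE_SHIFT) | offset;
}
```

Virtual pages 0 and 1 are contiguous, yet they map to frames 7 and 2: a contiguous virtual block is backed by noncontiguous page frames.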
1.6.8.2 Random access memory usage
All Unix operating systems clearly distinguish two portions of the random access memory
(RAM). A few megabytes are dedicated to storing the kernel image (i.e., the kernel code and
the kernel static data structures). The remaining portion of RAM is usually handled by the
virtual memory system and is used in three possible ways:
• To satisfy kernel requests for buffers, descriptors, and other dynamic kernel data
structures
• To satisfy process requests for generic memory areas and for memory mapping of files
• To get better performance from disks and other buffered devices by means of caches
Each request type is valuable. On the other hand, since the available RAM is limited, some
balancing among request types must be done, particularly when little available memory is left.
Moreover, when some critical threshold of available memory is reached and a page-frame-
reclaiming algorithm is invoked to free additional memory, which are the page frames most
suitable for reclaiming? As we shall see in Chapter 16, there is no simple answer to this
question and very little support from theory. The only available solution lies in developing
carefully tuned empirical algorithms.
One major problem that must be solved by the virtual memory system is memory
fragmentation. Ideally, a memory request should fail only when the number of free page
frames is too small. However, the kernel is often forced to use physically contiguous memory
areas, hence the memory request could fail even if there is enough memory available but it is
not available as one contiguous chunk.
1.6.8.3 Kernel Memory Allocator
The Kernel Memory Allocator (KMA) is a subsystem that tries to satisfy the requests for
memory areas from all parts of the system. Some of these requests will come from other
kernel subsystems needing memory for kernel use, and some requests will come via system
calls from user programs to increase their processes' address spaces. A good KMA should
have the following features:
• It must be fast. Actually, this is the most crucial attribute, since it is invoked by all
kernel subsystems (including the interrupt handlers).
• It should minimize the amount of wasted memory.
• It should try to reduce the memory fragmentation problem.
• It should be able to cooperate with the other memory management subsystems in order
to borrow and release page frames from them.
Several kinds of KMAs have been proposed, which are based on a variety of different
algorithmic techniques, including:
• Resource map allocator
• Power-of-two free lists
• McKusick-Karels allocator
• Buddy system
• Mach's Zone allocator
• Dynix allocator
• Solaris's Slab allocator
As we shall see in Chapter 6, Linux's KMA uses a Slab allocator on top of a Buddy system.
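To give a flavor of the Buddy system technique, here is a minimal sketch of its two basic computations: rounding a request up to a power-of-two block, and locating the "buddy" block with which a free block can be merged. It illustrates the generic technique only and is not the Linux code; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest order such that a block of 2^order page frames holds `pages` frames. */
unsigned int order_for(uint32_t pages)
{
    unsigned int order = 0;

    assert(pages > 0 && pages <= ((uint32_t)1 << 31));
    while (((uint32_t)1 << order) < pages)
        order++;
    return order;
}

/* Index of the buddy of the block of 2^order frames starting at frame `index`.
 * `index` must be aligned to the block size. */
uint32_t buddy_of(uint32_t index, unsigned int order)
{
    assert(order < 32);
    assert((index & (((uint32_t)1 << order) - 1)) == 0);
    return index ^ ((uint32_t)1 << order);   /* flip the bit that distinguishes the pair */
}
```

A request for 3 page frames is served from a block of order 2 (4 frames), which shows the internal waste the technique accepts in exchange for cheap splitting and merging.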
1.6.8.4 Process virtual address space handling
The address space of a process contains all the virtual memory addresses that the process is
allowed to reference. The kernel usually stores a process virtual address space as a list of
memory area descriptors. For example, when a process starts the execution of some program
via an exec( )-like system call, the kernel assigns to the process a virtual address space that
comprises memory areas for:
• The executable code of the program
• The initialized data of the program
• The uninitialized data of the program
• The initial program stack (that is, the User Mode stack)
• The executable code and data of needed shared libraries
• The heap (the memory dynamically requested by the program)
All recent Unix operating systems adopt a memory allocation strategy called demand paging.
With demand paging, a process can start program execution with none of its pages in physical
memory. As it accesses a nonpresent page, the MMU generates an exception; the exception
handler finds the affected memory region, allocates a free page, and initializes it with the
appropriate data. In a similar fashion, when the process dynamically requires some memory
by using malloc( ) or the brk( ) system call (which is invoked internally by malloc( )),
the kernel just updates the size of the heap memory region of the process. A page frame is
assigned to the process only when it generates an exception by trying to refer to its virtual
memory addresses.
Virtual address spaces also allow other efficient strategies, such as the Copy-On-Write
strategy mentioned earlier. For example, when a new process is created, the kernel just
assigns the parent's page frames to the child address space, but it marks them read only. An
exception is raised as soon as the parent or the child tries to modify the contents of a page. The
exception handler assigns a new page frame to the affected process and initializes it with the
contents of the original page.
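The Copy-On-Write bookkeeping can be imitated with a small user-space simulation. The sketch below is a toy model under stated assumptions: a fixed pool of frames, one page per process, and a write to a read-only mapping standing in for the hardware exception. All names are hypothetical and none of this is kernel code.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NR_FRAMES 8                      /* hypothetical pool size */

struct frame { unsigned char data[PAGE_SIZE]; int users; };
static struct frame frames[NR_FRAMES];

struct mapping { int frame; int writable; };   /* one page of one process */

int alloc_frame(void)
{
    for (int i = 0; i < NR_FRAMES; i++)
        if (frames[i].users == 0) { frames[i].users = 1; return i; }
    return -1;
}

/* "fork": the child shares the parent's frame and both mappings become read only. */
void share_page(struct mapping *parent, struct mapping *child)
{
    parent->writable = 0;
    *child = *parent;
    frames[parent->frame].users++;
}

/* Write one byte; hitting a read-only mapping plays the role of the exception. */
int write_byte(struct mapping *m, int off, unsigned char value)
{
    assert(off >= 0 && off < PAGE_SIZE);
    if (!m->writable) {
        if (frames[m->frame].users > 1) {        /* still shared: copy first */
            int copy = alloc_frame();
            if (copy < 0)
                return -1;
            memcpy(frames[copy].data, frames[m->frame].data, PAGE_SIZE);
            frames[m->frame].users--;
            m->frame = copy;
        }
        m->writable = 1;                         /* sole user: just restore write access */
    }
    frames[m->frame].data[off] = value;
    return 0;
}

unsigned char read_byte(const struct mapping *m, int off)
{
    assert(off >= 0 && off < PAGE_SIZE);
    return frames[m->frame].data[off];
}
```

After share_page( ), the first write by either side triggers the copy; the other side keeps the original frame and later regains write access without copying.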
1.6.8.5 Swapping and caching
In order to extend the size of the virtual address space usable by the processes, the Unix
operating system makes use of swap areas on disk. The virtual memory system regards the
contents of a page frame as the basic unit for swapping. Whenever some process refers to a
swapped-out page, the MMU raises an exception. The exception handler then allocates a new
page frame and initializes the page frame with its old contents saved on disk.
On the other hand, physical memory is also used as cache for hard disks and other block
devices. This is because hard drives are very slow: a disk access requires several milliseconds,
which is a very long time compared with the RAM access time. Therefore, disks are often the
bottleneck in system performance. As a general rule, one of the policies already implemented
in the earliest Unix system is to defer writing to disk as long as possible by loading into RAM
a set of disk buffers corresponding to blocks read from disk. The sync( ) system call forces
disk synchronization by writing all of the "dirty" buffers (i.e., all the buffers whose contents
differ from that of the corresponding disk blocks) into disk. In order to avoid data loss, all
operating systems take care to periodically write dirty buffers back to disk.
1.6.9 Device Drivers
The kernel interacts with I/O devices by means of device drivers. Device drivers are included
in the kernel and consist of data structures and functions that control one or more devices,
such as hard disks, keyboards, mice, monitors, network interfaces, and devices connected
to a SCSI bus. Each driver interacts with the remaining part of the kernel (even with other
drivers) through a specific interface. This approach has the following advantages:
• Device-specific code can be encapsulated in a specific module.
• Vendors can add new devices without knowing the kernel source code: only the
interface specifications must be known.
• The kernel deals with all devices in a uniform way and accesses them through the
same interface.
• It is possible to write a device driver as a module that can be dynamically loaded in the
kernel without requiring the system to be rebooted. It is also possible to dynamically
unload a module that is no longer needed, thus minimizing the size of the kernel image
stored in RAM.
Figure 1-5 illustrates how device drivers interface with the rest of the kernel and with the
processes. Some user programs (P) wish to operate on hardware devices. They make requests
to the kernel using the usual file-related system calls and the device files normally found in
the /dev directory. Actually, the device files are the user-visible portion of the device driver
interface. Each device file refers to a specific device driver, which is invoked by the kernel in
order to perform the requested operation on the hardware component.
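From the point of view of a user program, a device file is accessed with the same system calls as a regular file. The following minimal sketch assumes a system that provides the /dev/null device file; the function name is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a string to a device file using ordinary file-related system calls.
 * Returns the number of bytes accepted by the driver, or -1 on error. */
long write_to_device(const char *path, const char *msg)
{
    assert(path != 0 && msg != 0);

    int fd = open(path, O_WRONLY);             /* same call as for a regular file */
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, msg, strlen(msg));   /* the kernel hands the request to the driver */
    close(fd);
    return (long)n;
}
```

For example, write_to_device("/dev/null", "hello") should report 5 bytes written, since the driver behind that file accepts and discards all data.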
Figure 1-5. Device driver interface
It is worth mentioning that at the time Unix was introduced graphical terminals were
uncommon and expensive, and thus only alphanumeric terminals were handled directly by
Unix kernels. When graphical terminals became widespread, ad hoc applications such as the
X Window System were introduced that ran as standard processes and accessed the I/O ports
of the graphics interface and the RAM video area directly. Some recent Unix kernels, such as
Linux 2.2, include limited support for some frame buffer devices, thus allowing a program to
access the local memory inside a video card through a device file.
Chapter 2. Memory Addressing
This chapter deals with addressing techniques. Luckily, an operating system is not forced to
keep track of physical memory all by itself; today's microprocessors include several hardware
circuits to make memory management both more efficient and more robust in case of
programming errors.
As in the rest of this book, we offer details in this chapter on how Intel 80x86
microprocessors address memory chips and how Linux makes use of the available addressing
circuits. You will find, we hope, that when you learn the implementation details on Linux's
most popular platform you will better understand both the general theory of paging and how
to research the implementation on other platforms.
This is the first of three chapters related to memory management: Chapter 6 discusses how
the kernel allocates main memory to itself, while Chapter 7 considers how linear addresses
are assigned to processes.
2.1 Memory Addresses
Programmers casually refer to a memory address as the way to access the contents of
a memory cell. But when dealing with Intel 80x86 microprocessors, we have to distinguish
among three kinds of addresses:
Logical address
Included in the machine language instructions to specify the address of an operand or
of an instruction. This type of address embodies the well-known Intel segmented
architecture that forces MS-DOS and Windows programmers to divide their programs
into segments. Each logical address consists of a segment and an offset (or
displacement) that denotes the distance from the start of the segment to the actual
address.
Linear address
A single 32-bit unsigned integer that can be used to address up to 4 GB, that is, up to
4,294,967,296 memory cells. Linear addresses are usually represented in hexadecimal
notation; their values range from 0x00000000 to 0xffffffff.
Physical address
Used to address memory cells included in memory chips. They correspond to the
electrical signals sent along the address pins of the microprocessor to the memory bus.
Physical addresses are represented as 32-bit unsigned integers.
The CPU control unit transforms a logical address into a linear address by means of a
hardware circuit called a segmentation unit; subsequently, a second hardware circuit called a
paging unit transforms the linear address into a physical address (see Figure 2-1).
Figure 2-1. Logical address translation
2.2 Segmentation in Hardware
Starting with the 80386 model, Intel microprocessors perform address translation in two
different ways called real mode and protected mode. Real mode exists mostly to maintain
processor compatibility with older models and to allow the operating system to bootstrap (see
Appendix A for a short description of real mode). We shall thus focus our attention on
protected mode.
2.2.1 Segmentation Registers
A logical address consists of two parts: a segment identifier and an offset that specifies the
relative address within the segment. The segment identifier is a 16-bit field called Segment
Selector, while the offset is a 32-bit field.
To make it easy to retrieve segment selectors quickly, the processor provides segmentation
registers whose only purpose is to hold Segment Selectors; these registers are called cs, ss,
ds, es, fs, and gs. Although there are only six of them, a program can reuse the same
segmentation register for different purposes by saving its content in memory and then
restoring it later.
Three of the six segmentation registers have specific purposes:
cs
The code segment register, which points to a segment containing program instructions
ss
The stack segment register, which points to a segment containing the current program
stack
ds
The data segment register, which points to a segment containing static and external
data
The remaining three segmentation registers are general purpose and may refer to arbitrary
segments.
The cs register has another important function: it includes a 2-bit field that specifies the
Current Privilege Level (CPL) of the CPU. The value 0 denotes the highest privilege level, while
the value 3 denotes the lowest one. Linux uses only levels 0 and 3, which are respectively called
Kernel Mode and User Mode.
2.2.2 Segment Descriptors
Each segment is represented by an 8-byte Segment Descriptor (see Figure 2-2) that describes
the segment characteristics. Segment Descriptors are stored either in the Global Descriptor
Table (GDT) or in the Local Descriptor Table (LDT).
Figure 2-2. Segment Descriptor format
Usually only one GDT is defined, while each process may have its own LDT. The address of
the GDT in main memory is contained in the gdtr processor register and the address of the
currently used LDT is contained in the ldtr processor register.
Each Segment Descriptor consists of the following fields:
• A 32-bit Base field that contains the linear address of the first byte of the segment.
• A G granularity flag: if it is cleared, the segment size is expressed in bytes; otherwise,
it is expressed in multiples of 4096 bytes.
• A 20-bit Limit field that denotes the segment length in bytes. If G is set to 0, the size
of a non-null segment may vary between 1 byte and 1 MB; otherwise, it may vary
between 4 KB and 4 GB.
• An S system flag: if it is cleared, the segment is a system segment that stores kernel
data structures; otherwise, it is a normal code or data segment.
• A 4-bit Type field that characterizes the segment type and its access rights. The
following Segment Descriptor types are widely used:
Code Segment Descriptor
Indicates that the Segment Descriptor refers to a code segment; it may be included
either in the GDT or in the LDT. The descriptor has the S flag set.
Data Segment Descriptor
Indicates that the Segment Descriptor refers to a data segment; it may be included
either in the GDT or in the LDT. The descriptor has the S flag set. Stack segments are
implemented by means of generic data segments.
Task State Segment Descriptor (TSSD)
Indicates that the Segment Descriptor refers to a Task State Segment (TSS), that is,
a segment used to save the contents of the processor registers (see Section 3.2.2 in
Chapter 3); it can appear only in the GDT. The corresponding Type field has the value
11 or 9, depending on whether the corresponding process is currently executing on the
CPU. The S flag of such descriptors is set to 0.
Local Descriptor Table Descriptor (LDTD)
Indicates that the Segment Descriptor refers to a segment containing an LDT; it can
appear only in the GDT. The corresponding Type field has the value 2. The S flag of
such descriptors is set to 0.
• A DPL (Descriptor Privilege Level) 2-bit field used to restrict accesses to the segment.
It represents the minimal CPU privilege level required for accessing the segment.
Therefore, a segment with its DPL set to 0 is accessible only when the CPL is 0, that is, in
Kernel Mode, while a segment with its DPL set to 3 is accessible with every CPL value.
• A Segment-Present flag that is set to 0 if the segment is currently not stored in main
memory. Linux always sets this field to 1, since it never swaps out whole segments to
disk.
• An additional flag called D or B depending on whether the segment contains code or
data. Its meaning is slightly different in the two cases, but it is basically set if the
addresses used as segment offsets are 32 bits long and it is cleared if they are 16 bits
long (see the Intel manual for further details).
• A reserved bit (bit 53) always set to 0.
• An AVL flag that may be used by the operating system but is ignored in Linux.
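The fields listed above can be unpacked from the 8-byte descriptor with a few shifts and masks. The sketch below assumes the bit positions documented in the Intel manuals (Limit in bits 0-15 and 48-51, Base in bits 16-39 and 56-63, Type in bits 40-43, S in bit 44, DPL in bits 45-46, Segment-Present in bit 47, AVL in bit 52, D/B in bit 54, G in bit 55). The struct and function names are hypothetical, and last_offset( ) further assumes that the Limit field holds the highest valid offset in G-dependent units.

```c
#include <assert.h>
#include <stdint.h>

struct seg_desc {
    uint32_t base;      /* 32-bit Base */
    uint32_t limit;     /* 20-bit Limit */
    unsigned type;      /* 4-bit Type */
    unsigned s, dpl, present, avl, db, g;
};

struct seg_desc decode_descriptor(uint64_t raw)
{
    struct seg_desc d;

    d.limit   = (uint32_t)(raw & 0xffff) | ((uint32_t)((raw >> 48) & 0xf) << 16);
    d.base    = (uint32_t)((raw >> 16) & 0xffffff) | ((uint32_t)((raw >> 56) & 0xff) << 24);
    d.type    = (unsigned)((raw >> 40) & 0xf);
    d.s       = (unsigned)((raw >> 44) & 1);
    d.dpl     = (unsigned)((raw >> 45) & 3);
    d.present = (unsigned)((raw >> 47) & 1);
    d.avl     = (unsigned)((raw >> 52) & 1);
    d.db      = (unsigned)((raw >> 54) & 1);
    d.g       = (unsigned)((raw >> 55) & 1);
    return d;
}

/* Highest valid offset inside the segment, honoring the G flag. */
uint32_t last_offset(const struct seg_desc *d)
{
    assert(d->limit <= 0xfffff);
    return d->g ? ((d->limit << 12) | 0xfff) : d->limit;
}
```

As a check, the value 0x00cf9a000000ffff decodes to Base 0, Limit 0xfffff, G = 1, S = 1, Type 0xa, DPL 0, and D/B = 1, that is, a flat 4 GB code segment accessible only with CPL 0.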
2.2.3 Segment Selectors
To speed up the translation of logical addresses into linear addresses, the Intel processor
provides an additional nonprogrammable register—that is, a register that cannot be set by a
programmer—for each of the six programmable segmentation registers. Each
nonprogrammable register contains the 8-byte Segment Descriptor (described in the previous
section) specified by the Segment Selector contained in the corresponding segmentation
register. Every time a Segment Selector is loaded in a segmentation register, the
corresponding Segment Descriptor is loaded from memory into the matching
nonprogrammable CPU register. From then on, translations of logical addresses referring to
that segment can be performed without accessing the GDT or LDT stored in main memory;
the processor can just refer directly to the CPU register containing the Segment Descriptor.
Accesses to the GDT or LDT are necessary only when the contents of the segmentation
register change (see Figure 2-3). Each Segment Selector includes the following fields:
• A 13-bit index (described further in the text following this list) that identifies the
corresponding Segment Descriptor entry contained in the GDT or in the LDT
• A TI (Table Indicator) flag that specifies whether the Segment Descriptor is included
in the GDT (TI = 0) or in the LDT (TI = 1)
• An RPL (Requestor Privilege Level) 2-bit field, which is precisely the Current
Privilege Level of the CPU when the corresponding Segment Selector is loaded into
the cs register[1]
[1] The RPL field may also be used to selectively weaken the processor privilege level when accessing data segments; see Intel documentation for details.
Figure 2-3. Segment Selector and Segment Descriptor
Since a Segment Descriptor is 8 bytes long, its relative address inside the GDT or the LDT is
obtained by multiplying the most significant 13 bits of the Segment Selector by 8. For
instance, if the GDT is at 0x00020000 (the value stored in the gdtr register) and the index
specified by the Segment Selector is 2, the address of the corresponding Segment Descriptor
is 0x00020000 + (2 x 8), or 0x00020010.
The first entry of the GDT is always set to 0: this ensures that logical addresses with a null
Segment Selector will be considered invalid, thus causing a processor exception. The
maximum number of Segment Descriptors that can be stored in the GDT is thus 8191, that is,
2^13 - 1.
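The selector layout and the descriptor address computation can be written down directly. The sketch below assumes the Intel bit layout (index in bits 3-15, TI in bit 2, RPL in bits 0-1); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

unsigned sel_index(uint16_t sel) { return sel >> 3; }         /* most significant 13 bits */
unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1; }   /* 0 = GDT, 1 = LDT */
unsigned sel_rpl(uint16_t sel)   { return sel & 3; }          /* least significant 2 bits */

uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti < 2 && rpl < 4);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

/* Address of the Segment Descriptor: table base plus index times 8 bytes. */
uint32_t descriptor_address(uint32_t table_base, uint16_t sel)
{
    return table_base + sel_index(sel) * 8;
}
```

With the GDT at 0x00020000 and a selector whose index is 2, descriptor_address( ) yields 0x00020010, matching the example above.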
2.2.4 Segmentation Unit
Figure 2-4 shows in detail how a logical address is translated into a corresponding linear
address. The segmentation unit performs the following operations:
• Examines the TI field of the Segment Selector, in order to determine which Descriptor
Table stores the Segment Descriptor. This field indicates that the Descriptor is either
in the GDT (in which case the segmentation unit gets the base linear address of the
GDT from the gdtr register) or in the active LDT (in which case the segmentation
unit gets the base linear address of that LDT from the ldtr register).
• Computes the address of the Segment Descriptor from the index field of the Segment
Selector. The index field is multiplied by 8 (the size of a Segment Descriptor), and the
result is added to the content of the gdtr or ldtr register.
• Adds to the Base field of the Segment Descriptor the offset of the logical address, thus
obtaining the linear address.
Figure 2-4. Translating a logical address
Notice that, thanks to the nonprogrammable registers associated with the segmentation
registers, the first two operations need to be performed only when a segmentation register has
been changed.
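The three operations of the segmentation unit can be mimicked in software. The following sketch uses simplified descriptors and two small invented tables standing in for the memory that gdtr and ldtr point to. The limit check is an addition that is not part of the three steps listed above, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified descriptor: only the fields the translation needs. */
struct toy_desc { uint32_t base; uint32_t last_offset; };

static const struct toy_desc toy_gdt[] = {
    { 0x00000000, 0x00000000 },    /* entry 0: null descriptor */
    { 0x00000000, 0xffffffff },    /* flat 4 GB segment */
    { 0x00100000, 0x0000ffff },    /* 64 KB segment starting at 1 MB */
};
static const struct toy_desc toy_ldt[] = {
    { 0x00400000, 0x00000fff },    /* 4 KB segment starting at 4 MB */
};

/* Returns 0 and stores the linear address, or -1 where the CPU would raise an exception. */
int logical_to_linear(uint16_t selector, uint32_t offset, uint32_t *linear)
{
    assert(linear != 0);

    unsigned ti    = (selector >> 2) & 1;
    unsigned index = selector >> 3;

    /* Step 1: the TI field selects the Descriptor Table. */
    const struct toy_desc *table = ti ? toy_ldt : toy_gdt;
    unsigned entries = ti ? 1u : 3u;

    if ((!ti && index == 0) || index >= entries)
        return -1;                           /* null selector or index out of range */

    /* Step 2: locate the descriptor (array indexing scales the index by the entry size). */
    const struct toy_desc *d = &table[index];

    if (offset > d->last_offset)
        return -1;                           /* added check: offset beyond the segment */

    /* Step 3: add the offset to the Base field. */
    *linear = d->base + offset;
    return 0;
}
```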
2.3 Segmentation in Linux
Segmentation has been included in Intel microprocessors to encourage programmers to split
their applications into logically related entities, such as subroutines or global and local data
areas. However, Linux uses segmentation in a very limited way. In fact, segmentation and
paging are somewhat redundant since both can be used to separate the physical address spaces
of processes: segmentation can assign a different linear address space to each process while
paging can map the same linear address space into different physical address spaces. Linux
prefers paging to segmentation for the following reasons:
• Memory management is simpler when all processes use the same segment register
values, that is, when they share the same set of linear addresses.
• One of the design objectives of Linux is portability to the most popular architectures;
however, several RISC processors support segmentation in a very limited way.
The 2.2 version of Linux uses segmentation only when required by the Intel 80x86
architecture. In particular, all processes use the same logical addresses, so the total number of
segments to be defined is quite limited and it is possible to store all Segment Descriptors in
the Global Descriptor Table (GDT). This table is implemented by the array gdt_table
referred to by the gdt variable. If you look in the Source Code Index, you can see that these
symbols are defined in the file arch/i386/kernel/head.S. Every macro, function, and other
symbol in this book is listed in the appendix so you can quickly find it in the source code.
Local Descriptor Tables are not used by the kernel, although a system call exists that allows
processes to create their own LDTs. This turns out to be useful to applications such as Wine
that execute segment-oriented Microsoft Windows applications.
Here are the segments used by Linux:
• A kernel code segment. The fields of the corresponding Segment Descriptor in the
GDT have the following values:
o Base = 0x00000000
o Limit = 0xfffff
o G (granularity flag) = 1, for segment size expressed in pages
o S (system flag) = 1, for normal code or data segment
o Type = 0xa, for code segment that can be read and executed
o DPL (Descriptor Privilege Level) = 0, for Kernel Mode
o D/B (32-bit address flag) = 1, for 32-bit offset addresses
Thus, the linear addresses associated with that segment start at 0 and reach the
addressing limit of 2^32 - 1. The S and Type fields specify that the segment is a code
segment that can be read and executed. Its DPL value is 0, thus it can be accessed only
in Kernel Mode. The corresponding Segment Selector is defined by the __KERNEL_CS
macro: in order to address the segment, the kernel just loads the value yielded by the
macro into the cs register.
• A kernel data segment. The fields of the corresponding Segment Descriptor in the
GDT have the following values:
o Base = 0x00000000
o Limit = 0xfffff
o G (granularity flag) = 1, for segment size expressed in pages
o S (system flag) = 1, for normal code or data segment
o Type = 2, for data segment that can be read and written
o DPL (Descriptor Privilege Level) = 0, for Kernel Mode
o D/B (32-bit address flag) = 1, for 32-bit offset addresses
This segment is identical to the previous one (in fact, they overlap in the linear address
space) except for the value of the Type field, which specifies that it is a data segment
that can be read and written. The corresponding Segment Selector is defined by the
__KERNEL_DS macro.
• A user code segment shared by all processes in User Mode. The fields of the
corresponding Segment Descriptor in the GDT have the following values:
o Base = 0x00000000
o Limit = 0xfffff
o G (granularity flag) = 1, for segment size expressed in pages
o S (system flag) = 1, for normal code or data segment
o Type = 0xa, for code segment that can be read and executed
o DPL (Descriptor Privilege Level) = 3, for User Mode
o D/B (32-bit address flag) = 1, for 32-bit offset addresses
The S and DPL fields specify that the segment is not a system segment and that its
privilege level is equal to 3; it can thus be accessed both in Kernel Mode and in User
Mode. The corresponding Segment Selector is defined by the __USER_CS macro.
• A user data segment shared by all processes in User Mode. The fields of the
corresponding Segment Descriptor in the GDT have the following values:
o Base = 0x00000000
o Limit = 0xfffff
o G (granularity flag) = 1, for segment size expressed in pages
o S (system flag) = 1, for normal code or data segment
o Type = 2, for data segment that can be read and written
o DPL (Descriptor Privilege Level) = 3, for User Mode
o D/B (32-bit address flag) = 1, for 32-bit offset addresses
This segment overlaps the previous one: they are identical, except for the value of
Type. The corresponding Segment Selector is defined by the __USER_DS macro.
• A Task State Segment (TSS) segment for each process. The descriptors of these
segments are stored in the GDT. The Base field of the TSS descriptor associated with
each process contains the address of the tss field of the corresponding process
descriptor. The G flag is cleared, while the Limit field is set to 0xeb, since the TSS
segment is 236 bytes long. The Type field is set to 9 or 11 (available 32-bit TSS), and
the DPL is set to 0, since processes in User Mode are not allowed to access TSS
segments.
• A default LDT segment that is usually shared by all processes. This segment is stored
in the default_ldt variable. The default LDT includes a single entry consisting of a
null Segment Descriptor. Each process has its own LDT Segment Descriptor, which
usually points to the common default LDT segment. The Base field is set to the
address of default_ldt and the Limit field is set to 7. If a process requires a real
LDT, a new 4096-byte segment is created (it can include up to 511 Segment
Descriptors), and the default LDT Segment Descriptor associated with that process is
replaced in the GDT with a new descriptor with specific values for the Base and
Limit fields.
For each process, therefore, the GDT contains two different Segment Descriptors: one for the
TSS segment and one for the LDT segment. The maximum number of entries allowed in the
GDT is 12+2xNR_TASKS, where, in turn, NR_TASKS denotes the maximum number of
processes. The constant 12 counts the entries that do not depend on the number of processes:
the four code and data Segment Descriptors described in the previous list, four additional
Segment Descriptors that cover Advanced Power Management (APM) features, and four
entries of the GDT that are left unused. Adding the TSS and LDT descriptors of a single
process gives a grand total of 14.
As we mentioned before, the GDT can have at most 2^13 = 8192 entries, of which the first is
always null. Since 12 entries are either unused or filled by the system independently of the
number of processes, NR_TASKS cannot be larger than 8180/2 = 4090.
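The bound on NR_TASKS follows from simple arithmetic on the formula 12+2xNR_TASKS. The sketch below merely restates that computation; the macro and function names are hypothetical and are not the identifiers used in the kernel source.

```c
#include <assert.h>

#define GDT_MAX_ENTRIES   8192u   /* 2^13, from the 13-bit selector index */
#define GDT_FIXED_ENTRIES 12u     /* constant term of 12 + 2 x NR_TASKS */
#define GDT_PER_TASK      2u      /* one TSS and one LDT descriptor per process */

/* Largest NR_TASKS whose descriptors still fit in the GDT. */
unsigned max_nr_tasks(void)
{
    return (GDT_MAX_ENTRIES - GDT_FIXED_ENTRIES) / GDT_PER_TASK;
}

/* Number of GDT entries needed for a given NR_TASKS. */
unsigned gdt_entries_needed(unsigned nr_tasks)
{
    assert(nr_tasks <= max_nr_tasks());
    return GDT_FIXED_ENTRIES + GDT_PER_TASK * nr_tasks;
}
```

With these figures, max_nr_tasks( ) evaluates to 4090, and a GDT sized for 4090 processes uses all 8192 entries.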
The TSS and LDT descriptors for each process are added to the GDT as the process is
created. As we shall see in Section 3.3.2 in Chapter 3, the kernel itself spawns the first
process: process 0 running init_task. During kernel initialization, the trap_init( )
function inserts the TSS descriptor of this first process into the GDT using the statement:
set_tss_desc(0, &init_task.tss);
The first process creates others, so that every subsequent process is the child of some existing
process. The copy_thread( ) function, which is invoked from the clone( ) and fork( )
system calls to create new processes, executes the same function in order to set the TSS of the
new process:
set_tss_desc(nr, &(task[nr]->tss));
Since each TSS descriptor refers to a different process, of course, each Base field has a
different value. The copy_thread( ) function also invokes the set_ldt_desc( ) function
in order to insert a Segment Descriptor in the GDT relative to the default LDT for the new
process.
The kernel data segment includes a process descriptor for each process. Each process
descriptor includes its own TSS segment and a pointer to its LDT segment, which is also
located inside the kernel data segment.
As stated earlier, the Current Privilege Level of the CPU reflects whether the processor is in
User or Kernel Mode and is specified by the RPL field of the Segment Selector stored in the
cs register. Whenever the Current Privilege Level is changed, some segmentation registers
must be correspondingly updated. For instance, when the CPL is equal to 3 (User Mode), the
ds register must contain the Segment Selector of the user data segment, but when the CPL is
equal to 0, the ds register must contain the Segment Selector of the kernel data segment.
A similar situation occurs for the ss register: it must refer to a User Mode stack inside the
user data segment when the CPL is 3, and it must refer to a Kernel Mode stack inside the
kernel data segment when the CPL is 0. When switching from User Mode to Kernel Mode,
Linux always makes sure that the ss register contains the Segment Selector of the kernel data
segment.
2.4 Paging in Hardware
The paging unit translates linear addresses into physical ones. It checks the requested access
type against the access rights of the linear address. If the memory access is not valid, it
generates a page fault exception (see Chapter 4 and Chapter 6).
For the sake of efficiency, linear addresses are grouped in fixed-length intervals called pages;
contiguous linear addresses within a page are mapped into contiguous physical addresses. In
this way, the kernel can specify the physical address and the access rights of a page instead of
those of all the linear addresses included in it. Following the usual convention, we shall use
the term "page" to refer both to a set of linear addresses and to the data contained in this group
of addresses.
The paging unit thinks of all RAM as partitioned into fixed-length page frames (they are
sometimes referred to as physical pages). Each page frame contains a page, that is, the length
of a page frame coincides with that of a page. A page frame is a constituent of main memory,
and hence it is a storage area. It is important to distinguish a page from a page frame: the
former is just a block of data, which may be stored in any page frame or on disk.
The data structures that map linear to physical addresses are called page tables; they are
stored in main memory and must be properly initialized by the kernel before enabling the
paging unit.
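The following C fragment is a minimal sketch of the idea, not a model of the real 80x86 paging unit. It assumes 4 KB pages and a toy, single-level page table with eight entries; the `EX_` names, the table contents, and the function `ex_translate` are all invented for the illustration. Access rights are left out, and a missing mapping simply returns -1 where real hardware would raise a page fault exception.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT  12u                       /* assumed: 4 KB pages */
#define EX_PAGE_SIZE   (1u << EX_PAGE_SHIFT)
#define EX_PAGE_MASK   (EX_PAGE_SIZE - 1u)
#define EX_NUM_PAGES   8u                        /* toy linear address space */
#define EX_NOT_PRESENT UINT32_MAX                /* marks an unmapped page */

/* Toy page table: the index is the page number, the value is the
 * number of the page frame that currently holds that page. */
static const uint32_t ex_page_table[EX_NUM_PAGES] = {
    5, 2, EX_NOT_PRESENT, 7, EX_NOT_PRESENT, 0, 1, 3
};

/* Translate a linear address. Returns 0 and stores the physical address
 * on success; returns -1 where the hardware would raise a page fault. */
static int ex_translate(uint32_t linear, uint32_t *physical)
{
    uint32_t page   = linear >> EX_PAGE_SHIFT;   /* which page */
    uint32_t offset = linear & EX_PAGE_MASK;     /* position inside the page */

    if (page >= EX_NUM_PAGES || ex_page_table[page] == EX_NOT_PRESENT)
        return -1;

    /* The offset is carried over unchanged, so contiguous linear addresses
     * within a page map to contiguous physical addresses. */
    *physical = (ex_page_table[page] << EX_PAGE_SHIFT) | offset;
    return 0;
}
```

In this toy table, page 1 is stored in page frame 2, so linear address 0x1234 translates to physical address 0x2234 and 0x1235 to 0x2235. Only one table entry is needed per page, which is the efficiency gain described above.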
looks as if he could do with a bit more, but he always is thin. We
have got a very tall lot of men here, Cecil, Tom Greenfield, Godley,
Fitzclarence, Bentinck, all make an ordinary six-foot individual feel
small, and McKenna isn't exactly short. If we have length
represented we also have breadth, which even our present rations
are unable to reduce. I am certainly not going to quote a nominal
roll of these individuals, as they are fine strong men and I can't get
away.
2nd, Wednesday. This morning firing is going on. I suppose
another attack. I will go out and see. One rather funny incident in
connection with the Boer attack took place yesterday. As a rule they
knock off for breakfast, but yesterday they kept it up till some time
past 8 o'clock, so at 8 o'clock punctually the natives left their
trenches with their tins to draw their porridge, absolutely
disregarding the Boer fire which was renewed at intervals all day. It
is perfectly incredible how we have pushed them back, for within the
area where our advanced trenches now are I recollect seeing a
horse-battery of theirs in action during the first few days of the
siege. They take particular care not to play those games now. I only
wish they would. This sort of drivel relieves one's feelings, even if
one can't see relief.
3rd, Thursday. Firing yesterday and to-day was not of any
value; they kept it up off and on all day. I sat on the roof with the
officers of the Bechuanaland Rifles, and looked on till we got bored.
The operation of getting on to and off the roof again was far more
dangerous than the ordinary Boer battle. This evening I rode round
the guards with Major Panzera. It would take a more enterprising
Boer than we have run up against to get in. Major Panzera has a
theory that he can't be hit; I haven't, however. Both our theories are
good enough viewed from the light of experience.
The Germans participating in the defence of the town are going
to be photographed. I feel sorry for the German Emperor not being
here. He would enjoy this war thoroughly.
I heard from Weston-Jarvis this morning. He wrote a very
cheery letter. At last they appear to be making some effort to relieve
us. Why on earth they didn't try before, Heaven only knows! It
seems a perfectly simple operation for any man of any ordinary
sense, but really it doesn't much matter in the long run whether it is
a month or two sooner or later. I also see the "Baron" is coming
down to relieve us. I hope he won't fall on his head and get
stretched out as he usually persists in doing. We are always meeting
each other in some old ship or other, or in some out of the way
continent, but certainly I never expected to be relieved by the
"Baron" in the middle of Africa; however, the more pals that roll up
the better.
4th, Friday. Absolute quiet. My last letters have fallen into the
Dutchmen's hands. They will be nice light reading for them, as they
were barely complimentary. I do not expect to be popular after this
war. When one is tired and bored out here, it is very refreshing to be
able to abuse all and sundry, and think that one need not settle up
for another two or three months.
5th, Saturday. Life is short, but temper is shorter. Runners in but
no news. This morning a funeral party of the Bechuanaland Rifles
marched from the hospital to the cemetery to bury the remains, I
say advisedly remains, of Lance-Corporal Ironside, who, after having
been wounded some two months ago, had recently had his leg
amputated, and had at last died from sheer weakness. He bore his
extreme sufferings with remarkable fortitude, pluck, and cheeriness.
He was a Scotchman, from Aberdeen, and one of the best shots in
the garrison. It is satisfactory to think that he had already avenged
his death before he was wounded.
6th, Sunday. To-day the Boers most deliberately violated the
tacit Sunday truce which, at their own instigation and request, we
have always observed. The whole proceedings were very peculiar. It
was a fine morning, and the Sabbath calm pervading the town and
the surrounding forts was manifest in the way we were all strolling
about the market square. As regards myself, I had just purchased
some bases of shells at Platnauer's auction mart, where the weekly
auction was proceeding. The firing began, and nobody paid much
attention except the officers and men belonging to the quarter at
which it was apparently directed. They, on foot, horseback, and
bicycle, dispersed headlong to their various posts. One, Mr.
McKenzie, on a bicycle, striking the railway line, reached his post in
four minutes and fifteen seconds, fifteen seconds too quick for the
Boer he was enabled to bag. The Boers, who on previous Sundays
had displayed an inclination to loot our cattle, had crept up to the
dead ground east of Cannon Kopje, and hastily shot one of our cattle
guard and stolen the horses and mules under his charge. It was the
more annoying that they should have been successful as we were
well prepared for them, and had rather anticipated this attack,
having a Maxim in ambush within one hundred and fifty yards, which
unfortunately jammed, and failed to polish off the lot, as it certainly
ought to have done. If we had had any luck it would have been a
very different story. Directly the Maxim began the Boers nipped off
their horses and running alongside of them for protection reached
the cover in the fold of the ground. Unfortunately they killed poor
Francis of the B.S.A.P. (the second brother who has fallen here since
the fighting began) and took all the horses. It was very annoying,
but a smart bit of work and I congratulate the Dutchmen, whoever
they may be, who conducted it. Still it was a breach of our Sunday
truce, and if all is fair in love and war the many irate spectators will
have their pound of flesh to ask for later on. It really was a curious
sight: lines of men impotently watching the raid and behind them
the shouts of the unmoved auctioneer of "Going at fifteen bob."
"Last time." "Going." "Going." "Gone," and gone they were
undoubtedly, but they were our horses and he was referring to some
scrap iron. To cover this nefarious procedure they opened a heavy
fire on various outlying forts. We were lucky enough in the
interchange of courtesies to secure a Dutchman on the railway line,
and as they had practically violated the white flag our advanced
posts had great shooting all the afternoon at his friends who came
to try to pick him up. We buried Francis this evening. The concert
was put off. A certain amount of endurance has been shown by the
inhabitants and a certain amount of pluck by the defenders of the
town, but prior to the Boers starting fooling (successful fooling and
neatly carried out), I and several more were standing in the market
square gossiping about things we did know, and things we didn't,
when we happened to notice a very weak-looking child, apparently
as near death as any living creature could be. It transpired on
inquiry that this infant was a Dutch one, Graaf by name. His father, a
refugee, died of fever; his brother was in hospital, and he had been
offered admission, which he refused, because he said that he must
look after his mother. Even then, though scarcely able to cross the
road, the kid was going to draw his rations. He was taken to
hospital, but I think that this is about the pluckiest individual that
has come under my notice, and nobody can take exception to the
child, though his mother is probably one of those amiable ladies who
eat our rations, betray our plans, and are always expressing a
whole-hearted wish for our extermination.
15th, Tuesday. News has arrived that our troops are within
striking distance; "Sister Ann" performance has begun again. We are
now beginning to recover from our exciting Saturday. As I wired
home, it was the best day that I ever saw, and I must now try and
describe it.
Just before four o'clock in the morning we were roused by
heavy firing. The garrison turned out and manned the various works.
We all turned up, and I went to the headquarters. Everybody got
their horses ready, armed themselves as best they could, and
awaited the real attack. Colonel Baden-Powell said at once the real
attack would be on the stadt. We have had a good many attacks and
don't attach much importance to them, but we did not any one of us
anticipate the day's work that was in store for us. When I say
anticipate, every possible preparation had been made. Well, we
hung about in the cold. After about an hour and a half the firing on
the eastern front began to slacken. Trooper Waterson of the Blues,
as usual, had coffee and cocoa ready at once, and we felt we could
last a bit. Jokes were freely bandied, and we kept saying, "When are
they going to begin?" Suddenly on the west a conflagration was
seen, and betting began as to how far out it was. I got on to the
roof of a house, and with Mr. Arnold, of Dixon's Hotel, saw a very
magnificent sight. Apparently the whole stadt was on fire, and with
the sunrise behind us and the stadt in flames in front, the
combination of effects was truly magnificent, if not exactly
reassuring. However, nobody seemed to mind much. Our guns,
followed by the Bechuanaland Rifles, hurried across the square, men
laughing and joking and saying, "we were going to have a good
fight." Then came the news that the B.S.A.P. fort, garrisoned by the
Protectorate Regiment, had fallen into the enemy's hands. Personally
I did not believe it to be true, and started with a carbine to assure
myself of the fact. I got close up to the fort, met a squadron running
obliquely across its front, and though the bullets were coming from
that direction could not believe but that they were our own men who
were strolling about outside it. That is the worst of being educated
under black powder. I saw poor Hazelrigg, who was a personal friend
of mine, and whom I knew at home, shot, but did not realise who he
was. Both sides were inextricably mixed, but having ridden about,
and got the hang of things, I am certain that within twenty minutes,
order and confidence were absolutely restored on our side. You saw
bodies of men, individuals, everybody armed with what they could
get, guns of any sort, running towards the firing. A smile on every
man's face, and the usual remark was, "Now we've got the
beggars." The "beggars" in question were under the impression that
they had got us and no doubt had a certain amount of ground for
their belief. The fight then began. At least we began to fight, for up
till then no return had been made to the very heavy fusillade to
which we had been subjected. I have soldiered for some years and I
have never seen anything smarter or better than the way the
Bechuanaland Rifles, our Artillery and the Protectorate Regiment ran
down and got between the Boers and their final objective. The Boers
then sent a message through the telephone to say they had got
Colonel Hore and his force prisoners and that we could not touch
them. Campbell, our operator, returned a few remarks of his own not
perhaps wholly complimentary and the telephone was disconnected
and re-connected with Major Godley. Our main telephone wire runs
through the B.S.A.P. fort. McLeod, the man in charge of the wires,
commenced careering about armed with a stick and a rifle, and
followed by his staff of black men with the idea of directly
connecting Major Godley's fort and the headquarters. I may mention
McLeod is a sailor and conducts his horse on the principle of a ship.
He is perhaps the worst horseman I have ever seen and it says
much for the honour of the horse flesh of Mafeking that he is still
alive. However, be that as it may, his pawky humour and absolute
disregard of danger has made him one of the most amusing features
of the siege. You always hear him in broad Scotch and remarkable
places, but he is always where he is wanted. By this time we were
settling down a bit, so were they. They looted everything they
possibly could. A Frenchman got on to the roof of the fort with a
bottle of Burgundy belonging to the officers' mess to drink to
"Fashoda." He got hit in the stomach and his pals drank the bottle.
Our men were very funny. When the Frenchmen yelled "Fashoda,"
they said "silly beggars, their geography is wrong." I was very
pleased with the whole day. I have never heard more or worse jokes
made, and, no doubt, had I been umpiring, I should have put some
of us out of action or at any rate given them a slight advantage.
Every townsman otherwise unoccupied, who had possibly never
contemplated the prospect of a fight to the finish, now turned out.
Mr. Weil (and too much cannot be said for his resource through
every feature of the siege) broke open his boxes, served out every
species of firearms he could to every person who wanted them.
[Illustration: BOERS FIRING THE NATIVE STADT.]
A very deaf old soldier, late of the 24th Regiment, Masters by
name, asked where they were, and then proceeded to investigate in
a most practical fashion. I went down to the jail which more or less
commands the B.S.A.P. fort and buildings, and had a look, and as we
saw that no attack was imminent or at any rate likely to prove
successful, we knocked off by parties and had our breakfast. We
were beginning to kill them very nicely. Jail prisoners had all been
released. Murchison, who shot Parslow, Lonie, the greatest criminal
of the town, were both armed and doing their duty. We were all
shooting with the greatest deliberation and effect whenever they
showed themselves, and perhaps I was better pleased with being an
Englishman from a sightseer's point of view than on any day since
the Jubilee. The quaint part of the whole thing was that we were
shooting at our own people unwittingly. I had a cousin there, and we
laughed consumedly in the evening when we exchanged notes and
found that we had been shooting close to him amongst others. I
don't think that any man who was in that fight will ever think ill of
his neighbour from the highest to the lowest; from our General--or,
at least, he ought to be a General--to the ordinary civilian,
everybody was cheerful and confident of victory. We had had a long
seven months' wait, and at last we were having our decisive fight.
After breakfast (like giants refreshed) we began shooting again. I
cannot tell you who did well, but I can assure you that no man did
badly. Besides the men there were ladies. Mrs. Buchan and Miss
Crawford worked most calmly and bravely under fire. All the other
ladies did their duty too. Whilst the fight was developing, Mrs.
Winter was running about getting us coffee. Her small son, aged six,
was extremely wroth with me because I ordered him under shelter.
Then commenced what you may call the next phase of the fight.
Captain Fitzclarence and his squadron, with Mr. Swinburne and Mr.
Bridges, came down through the town to join hands with Captain
Marsh's squadron, and then with Lord Charles Bentinck's squadron
and the Baralongs, the whole under Major Godley, were now going
to commence to capture the Boers. I must endeavour to describe
the situation. Eloff's attack was clever and determined. He had
seven hundred men and had advanced up the bed of the Molopo.
Into Mafeking he had got, but like many previous attacks had
proved--it was easy to get in, but quite another matter to get out.
The Baralongs and our outlying forts had allowed some three
hundred men to enter, and had then commenced a heavy fire upon
their supports. This discomfited the supports, and they incontinently
fled. Silas Moleno and Lekoko, the Baralong leaders, had decided
that it was better to kraal them up like cattle. One Dutchman was
overheard to shout, "Mafeking is ours," when suddenly his friends
yelled, "My God, we are surrounded." This species of fighting
particularly appeals to the Baralong. He is better than the Boer at
the Boer's own game, and never will I hear a word against the
Baralong. However, Silas was then engaged in conjunction with our
own men in collecting them. He collected them where they had no
water, and then the question resolved itself into the Boer showing
himself and getting shot or gradually starving. If the Baralongs had
been fighting the fight and time had been no particular object, they
would probably still be shooting odd Boers, but it is obvious that
those dilatory measures could not be pursued by ourselves, and that
we had to finish the fight by nightfall. Our men were accordingly
sent down to round them up; there were thus in all three parties of
Boers in the town, one, nearly three hundred strong, in the B.S.A.P.
fort, sundry in a kraal by Mr. Minchin's house, others again in the
kopje. The kraal was captured in an exceedingly clever manner.
Captain Fitzclarence and Captain Marsh worked up to the walls, but
knowing the pleasant nature of the Boer, instead of storming the
place or showing themselves, they bored loopholes with their
bayonets. The artillery under Lieutenant Daniels also had come up to
within forty yards. There was a slight hesitation on the part of the
Boers to surrender. The order was given to the gun to commence
fire. The lanyard broke, but before a fresh start could be made the
Boers hastily surrendered. Captain Marsh, known and respected by
the Baralongs, had great difficulty in restraining them from finishing
the fight their own way, and small blame to them for their desire.
They had had their stadt burned. Odd Boers had been bolting at
intervals, and had mostly been accounted for. The question next to
be settled was as to the possession of the B.S.A.P. fort. Our men
who were captive therein, and indeed the Boers and foreigners to
whom I have since talked describe our fire as extraordinarily
accurate. Eloff had great difficulty in keeping his men together, and
as one man at least was a deserter of ours, it can't altogether be
wondered that they did not wish to remain. Our firing, as we had
more men to spare, became more and more deadly, and at last now
they decided to surrender. Some hundred broke away and escaped
from the fort, in spite of Eloff firing on them, but their bodies have
been coming in ever since and many will never be accounted for,
because the bodies of men with rifles may be possibly put away by
the Baralongs, who are always begging rifles we have been unable
to give them. Eloff accordingly surrendered to Colonel Hore. The
other party in the kopje had made several unsuccessful attempts to
break out, Bentinck and his squadron always successfully heading
them, but as it got dark, and our men had been fighting from before
four, it was decided to let them break out and just shoot what we
could. The Baralongs had some more shooting too. As each
successive batch of prisoners was marched into the town absolute
silence was maintained by the Britishers, except saluting brave men
who had tried and failed. They were brave men and I like them
better now than I ever did; the Kaffirs, however, hooted. As each
batch marched up, their arms, of which they had naturally been
deprived, were handed over to the Cadets, who had been under fire
all day. These warriors range from nine to fifteen years of age. They
are the only smartly clad portion of the garrison, for our victorious
troops were the dirtiest and most vilely robed lot of scarecrows I
have ever seen, still it did one good to see the escort to the
prisoners, they were simply swelling like turkey cocks and all round
our long lines of defences we would hear cheers and "Rule
Britannia" and the "Anthem" being sung with the wildest
enthusiasm. It is impossible as I said before, to say who behaved
best, but none behaved badly. There was only one thing said
afterwards, when all sorts and conditions of men were shaking each
other by the hand, and that was, "This is a great day for England."
Mafeking is still rather mad with the Relief Column within shouting
distance and it is likely to remain so.
[Illustration: CAPTURED BOER PRISONERS]
We lost few men in our great success but I take it that no man
particularly wants to be lost. I really have seen brave men here, but
the man who says he wants to get shot is simply a liar. We know the
story of the Roman sentinel and the Highlander who fought in
Athlone (or was it Mullingar) against Hoche and many men that have
died for their country obstinately. Captain Singleton's servant,
Trooper Muttershek, may be added to their roll. He absolutely
declined to surrender and fought on till killed. It wasn't a case of
dashing in and dashing out and having your fun and a fight, it was a
case of resolution to die sooner than throw down your arms, the
wisdom may be questionable, the heroism undoubted. He wasn't
taking any surrender. As far as I am concerned, I have seen the
British assert their superiority over foreigners before now, but this
man in my opinion, though I didn't see him die, was the bravest man
who fought on either side that day. It is a good thing to be an
Englishman. These foreigners start too quick and finish quicker. They
are good men, but we are better, and have proved so for several
hundred years. I had always wanted to see the Englishman fight in a
tight hole, and I know what he is worth now. He can outstay the
other chap. Well, you must be getting rather bored by the fighting,
and I will write more anon when I have collected some further
particulars. The Rev. W. H. Weekes, our parson, organized a
thanksgiving service on Sunday night. We were still rather mad, and
it gave us a pleasant feeling to sing nice fighting psalms and hymns,
because which ever way you look at it we are perfectly convinced
out here that it is a righteous war. He had rather a mixed
congregation, which probably in times of peace would be half the
size, but he understands his congregation and the congregation
understand him.
Poor Hazelrigg died that night.
[Illustration: INTERVIEWING BOER PRISONERS ON MR. WEIL'S STOEP]
I went over and saw the prisoners this afternoon. They were
very civil, and so were we. I like a Frenchman, and was chaffing
them more or less at having left "La Patrie." They didn't seem to
mind being prisoners; they apparently enjoyed their fight, but they
objected to their food. I did what I could for them, and I couldn't
help feeling that they were absolutely uninvited guests. It wasn't
their quarrel, and why they wanted to shove their nose into it we all
fail to understand. There is really a very charming man amongst
them, who asked me to procure him a grammar as he wished to
improve his mind by learning Dutch and English. Of course, I got
him a grammar, while I couldn't help suggesting that it might have
been as well to remain in comfort in France without travelling all this
way to learn the language, also remarking Dutch seemed rather out
of date. He rather agreed with me, and asked me for a collection of
siege stamps as he said he thought his girl would like them. The
funny part of these fellows is that they seem to think that we haven't
got homes or girls or anything else, but are a sort of automatic
"Aunt Sally," put up here for irresponsible foreigners to have a shy
at. Nobody bears any malice about the fight, but the Frenchman
calls the Boer "canaille," the Boer doesn't seem to like the
Frenchman or, indeed, any other foreigner, regarding him as an
impetuous fool who would probably lead him (the Boer) into some
nasty dangerous place, and the Englishman laughs at the lot;
however, as I said before, the poor devils can't help being
foreigners. I always like a Frenchman, a good many have been kind
to me and they are invariably amusing. Their stomachs, however,
are at present proud, and they cannot swallow "sowen," or horse
flesh, or any local luxuries. However, as we pointed out, it was rather
their fault that we had not any rations in here. Some of these men
had only been in the country a week. It seems a long way to come
to get put in "quod," and live on horse flesh and "sowens." One told
me he passed a battery of our relieving column in harbour at Beira. I
suppose he thought he had put in a smart day's work when he got
ahead of it. He has, but he isn't working now. I never liked Eloff
much, not that I knew him personally, but now I like him better for
his performances. He very nearly did a big thing, but both sides have
apparently an ineradicable mutual contempt for each other, which
has led to some very pretty fighting through the whole war. There is
no mistake about it, he did insult the Queen, and I am glad we have
had the wiping out of that score, but he is a gallant fellow all the
same. When we look back on our discomfiture of Cronje, and the
mopping up of Eloff, it gives a pleasant finish to the siege. It wanted
just a finishing touch to make it satisfactory. There should be
another fight within a few hours, but I reckon that it will be the relief
Column's turn, and though everything is ready for us to assist them I
honestly don't think we could go far and do much. The men were
dog tired on Saturday, absolutely dog tired. I always thought the
Boer was a bad bird to get up to the gun, but he came up that day. I
don't think he will again.
On Monday we saw the tail end of some Boer force arriving. We
had hoped it might be our own people, but they appear to be a few
miles further off. However, we know they are there or thereabouts
now. Nobody minds now, we know we are winning.
To return again to my story of the fighting, the foreigners did try
their best to stop the Boers looting, but loot they did most
thoroughly. They stole everything they could lay their hands on. Not
one officer, whose kit happened to be in the fort has recovered
anything. One "clumpy" of Boers galloped forth laden with food and
drink. The food belonged to themselves, the drink belonged to us.
They happened to fall in with the galloping Maxim, a piece of bad
luck because they all died and our people took the food and drink.
One fellow had taken a pair of brown boots and a horse, he had a
few bullets through the boots, the horse was killed and so was he.
Life had been very dull here, but that morning put everything all
right. We had never before seen a dead or wounded Boer or a
prisoner, and it is weary work to see your friends and neighbours
shot and not see your own bag too, but personally, except in the
way of business, I hope I haven't killed a Boer. In the fight in the
morning, though everything had been prepared for as far as we
could tell, we had had to take up positions which were absolutely
enfiladed by the fresh development of affairs. The trench occupied
by the Bechuanaland Rifles, Protectorate Regiment, and others on
the spur of the moment, was directly enfiladed by the enemy's
quick-firer. Why we were not wiped out on that line I never shall
quite make out. They shot the jailor, Heale, who has done very good
work all through the siege, who I am afraid leaves a wife and family.
Then the prisoners took charge of themselves. Our gunner prisoners
ran down to the guns, one was shot, the others served the gun all
day. The others, armed with Martinis, commenced a heavy fire on
the enemy, or cautioned the Dutch prisoners, the suspects, as to
their behaviour, and put them down a hole. It was an exhilarating
sight and struck me as exceedingly quaint to see men who had
committed every crime, and were undergoing penal servitude,
dismissing their past, oblivious of anything except the fact that we
were all of the same crowd, and had got to keep the Dutchmen out.
I hope Her Majesty will exercise her clemency; they certainly
deserve to regain their rights as citizens.
We have had rather a dull day for some reason or other. A
general idea pervaded the town that relief was at hand, and when
towards evening a cloud of dust and troops were seen to the south-
west, we most of us got on the roofs and looked at them with some
interest. It transpired subsequently, however, that they were the
enemy retiring before Mahon. They passed round the south of the
town, and opposed him later.
16th, Wednesday. A dull day, but towards evening our relief was
really seen. Everybody got on the roofs, and looked on at the Boers
being shelled; most refreshing, but as they were not apparently
coming in, people went to feed, and enthusiasm rather died away
again, so much so that when Major Karri Davis, and some eight men
of the I.L.I. marched in, he told one passer-by he was the advance
guard relief force, the other only murmured "Oh, yes, I heard you
were knocking about," and went to draw his rations, or whatever he
was busily engaged in. However, when it became generally known
the crowd assembled and began to cheer, and go mad again--so to
bed.
17th, Thursday. Roused out this morning at some ungodly hour
to be told they had arrived, and strolled down to the I.L.I. to see
Captain Barnes of my old regiment. It appeared that Mahon and
Plumer had effected a masterly junction the day before, and that the
former, following the only true policy of South African warfare had,
as usual, said he was going to do one thing, and done something
else, viz., camped out, and then suddenly inspanned and marched
into the town. I can't quite convey the feelings of the townspeople,
they were wild with delight, and pleased as they were their bonne
bouche was to come later. Edwardes and Barnes breakfasted with
me and then went back (personally I borrowed a horse from the
I.L.I.). About 9 o'clock the guns moved out to the waterworks, and
then the fun really began. The Boers had been going to intercept
Mahon's entry, but he was a bit too previous. All the morning their
silly old five-pounder (locally known as "Gentle Annie") had been
popping away, when suddenly the R.H.A. Canadian Artillery and
pom-poms began, ably led by our old popguns, who had the honour
of beginning the ball. I rode well out, as I wanted to see the other
people have a treat, but literally in half an hour all there was left of
the laager, which has vexed our eyes and souls so much for long
months, was a cloud of dust on the horizon, except food-stuffs, &c.,
which we looted. I got a Dutch Bible, and from its tidiness I was
pleased to see its late owner was a proficient in the Sunday school.
So, quietly back to the town, and after the march past of the relief
column the relieved troops began. And now, I suppose, after being
bottled up for some eight lunar months, I may effervesce. As I have
said before, I have seen many tributes to her Majesty and joined in
them all, but dirty men in shirt sleeves, and dirtier men in rags on
scarecrows of horses touched me up most of all. We were dirty, we
were ragged, but we were most unmistakably loyal, and we came
from all parts of the world--Canadians, South Africans, Australians,
Englishmen, Indians, and our Cape Boys and various other Africans,
and there was not one of us who did not respect the other, and
know we were for one job, the Queen and Empire, not one.
[Illustration: MARCH PAST OF THE RELIEVING FORCE.]
I wonder how the prisoners felt, poor devils; they must have
wished they were not against us. The Boers had certainly executed
the smartest movement I had seen for some time; I had not
believed it possible that a laager could break up and disperse so
rapidly. We all went back to lunch, having recovered Captain
McLaren, who, I am glad to say, is doing very well. Then after lunch
an alarm was raised that we had rounded up old Snyman, and
everybody started off to help in the operation; but, alas, Snyman
knows too much. They said that he and four hundred Boers were
surrounded and refused to surrender, and we all wanted as much
surrender as we could get--or the other thing. I am glad to say he
was hit on the head in the morning with a bit of shrapnel, but not
dangerously wounded, unfortunately, at least so they report. He
seems equally execrated by Dutch and English--Psalm-singing,
sanctimonious murderer of women and children and his son takes
after him. I may contradict my previous statements, but his actions
have also varied frequently. Well, we had a great dinner; old friends
from all parts of the world foregathered, and at our head was
Smitheman. Many dinners then combined, and more old friends
were met--so to bed, still pleased with England. Men of all sorts and
conditions, trades, professions and ranks, relievers and relieved,
slept that night in and about Mafeking, with a restless sleep, thinking
of what England would think, and we knew and were sorry we
couldn't hear what they said.
The garrison in Mafeking hope to get some recognition or
decoration, but what they attach particular importance to is receiving
the Queen's chocolate.
Immediately after the relief column marched in our Baralongs
under Montsoia Wessels, Silas and Sekoko and Josiah, marched off
on their own to settle up Abraham Ralinti at Rietfontein, and bring in
our trusty ally, Saani. He had been utterly looted, and taken away
from his own stadt, and kept a prisoner at Rietfontein, his great
notion being that we should have a conference with the Boers, and
then lay down what he called "plenty polomite," and blow them up
when they came to confer. You cannot get very far ahead of a
Baralong. I suppose this is the first occasion on which one black man
surrendered under a white flag to another. These Rietfontein rebels
have always been against the remainder of the Baralongs, and have
invariably fought for the Boers since the disturbed relations between
Briton and Boer have existed. I hope they will shoot Abraham, as his
people's invariable cunning in stopping our runners has caused us
great inconvenience, not to mention the numbers they have killed.
18th, Friday. Did very little. Went round and helped our pals to
shop, get stamps, money, &c., &c.
19th, Saturday. The garrison held its solemn Thanksgiving
Service at the cemetery, at the termination of which three volleys
were fired over our dead. We had been unable to do this before
owing to the certainty of drawing fire, not that that really much
mattered, as they usually fired on all our funeral parties, though
there could be no mistaking them. Still they had this excuse that the
cemetery is fortified. After the last post had sounded we reformed
and sang the National Anthem. Then, after Colonel Baden-Powell
had spoken personally to each detachment, we cheered him, and
then with heartfelt cheers for Her Majesty, the siege of Mafeking
closed.
GOD SAVE THE QUEEN.
And now for sheer personalities. Mr. Stuart had arrived, and as I
considered he was much better qualified to represent the paper with
the force than myself, I determined to come south. Mr. B. Weil,
whom as I have previously said, I consider to be one of the principal
factors in the successful defence, certainly as regards the food
supply, said he was going south. I accordingly resolved to
accompany him, and while returning from the ceremony suggested
it. Anyhow, to make a long story short, I arrived as he was starting,
and with a small bag, having relinquished all my Mafeking
impedimenta, climbed into his cart. He had to turn out one of his
boys, but I didn't mind that, and being the most good-natured of
men, he tried to look as if he didn't. So our caravan started--Major
Anderson, Major Davis (Surg. I.L.I.), Mr. Weil, and myself, together
with his servant Mitchell, a prototype of "Binjamin," but absolutely
reliable and hard-working, also Bradley, of Bradley's Hotel, Inspector
Marsh, the Rev. ---- Peart, and Ronny Moncrieffe (who had secured a
horse belonging to a Protectorate regiment, and proposed to
accompany us). He had done a lot of good work in the siege, and
was about as tired and unfit as a man could be. However, he was
determined to get through, and so he did. It was a quaint
pilgrimage, as the column, though it had swept the country, had not
particularly cleared it, and the Boer is here to-day, gone to-morrow,
and back the next day. Well, our commissariat was excellent. I
contributed some eight biscuits and three tins of bully, and that is all
I have done except live on the fat of the land--Lord, how fat it
seemed after Mafeking--a land flowing with fresh milk, butter and
eggs, mutton and white bread, and above all, the sense of freedom,
I never knew what it felt like to be properly free before, and I have
been more or less of a wanderer most of my life. No more sieges for
me, except perhaps from the outside. Yet I was sorry to leave
Mafeking, and I may truly say as far as I know I didn't leave a bad
friend behind me, only all my kit. Towards dark, after an outspan
that was like a picnic, we reached Mr. Wright's farm, where the
wounded were--one had died the night before--and we found Mr.
Hands, Daily Mail, badly wounded in the thigh, but doing well;
Captain Maxwell, I.S.C., and others. Mr. Wright acts up to his name.
Two of his sons were in "tronk" at Zeerust for refusing to join the
Boers, and what he had was at our disposal. I wonder if people at
home realize in what a position our loyalists in Bechuanaland have
been placed. If they didn't come in their own countrymen regarded
them as rebels,--if they did they lost all they had. But by doing as
they have done, that is by carrying on their business while exposed
to all the contumely and insult the Boers could heap on them, with
the possible loss of life as well as property, they have served their
country as well as those who have taken up arms; because their
houses have always been a safe place for runners to go to, and
news about the doings of the Boers could be obtained from them.
Besides, they know which of the Boers fought, and which didn't, and
this fact now terrifies the rebels and keeps many quiet, who might
not otherwise be so. Mr. Weil on arrival bought two hundred bags of
mealies and despatched them to his friends the Baralongs. Such a
pretty place his farm is, with plenty of water and lots of game. We
slept under the cart, and miserably cold it was. Mr. Weil (who is
rather like myself in that respect), could not sleep, and was
determined nobody else should do so. So we got up, and sat round
the fire till sunrise. Our cocoa that morning was indeed acceptable.
The caravan, which was as I say, quaint, marched as follows,
preceded by mounted Kaffir Scouts:--First came Keeley and his boy
in a Cape cart drawn by mules, followed by Weil, his servant, driver
and myself in another Cape cart with six mules, Bradley driving a
pair of horses in another, then Ronny, the Rev. ---- Peart and
Inspector Marsh riding, the latter riding B.P.'s brother's pony. We
inspanned at sunrise on Monday and started for Setloguli. Halted
half way and had the pleasing intelligence that a commando was
raiding within six miles of us. I personally felt very unhappy. I had
always looked upon it as a two-to-one chance, and as we had no
weapons we could make no fight of it. Apart from the bore of being
a prisoner I knew I should be so awfully laughed at. However, there
we were--it was no use grumbling, but I did, as hard as ever I could.
Then we inspanned and drove to Setloguli, where our spirits were
considerably raised by an excellent lunch provided by Mrs. Fraser,
who is the best hostess I have ever met. The Frasers had a terrible
rough time of it, and now "the Queen had got her own again" were
naturally correspondingly cheerful. Later we were also further
relieved to hear that "the commando" was merely a small patrol of
Boers, and that it had withdrawn across the border. During the
afternoon I went up and saw the old fort--quite interesting, and
anybody who wants to spend a quiet time might do worse than to
go to Setloguli. The worst of it is it takes some time to get there.
Lady Sarah Wilson's maid was there. She had been there since Lady
Sarah was brought in by the Boers to Mafeking. Mr. Weil was
showing various curios of the siege to Mrs. Fraser, including a copy
of Her Majesty's Leaves from the Journal of our Life in the
Highlands, which he had looted from the Boer laager. This excited
the good lady's unqualified wrath, "What sacrilege for them to have
it in their hands. Why it smells Boery," she said. On Tuesday Keeley
was returning to Mafeking with Lady Sarah's maid and his scouts, so
Weil engaged two scouts to accompany us to Jan Modebi, where we
were next going to stop. They didn't seem particularly pushing sort
of scouts, as they persistently rode in rear of the Cape cart. The
road too, was infamous, but it was impossible to lose the way as the
column had left an unmistakable track behind them, and this was
fortunate, because when we had been going about an hour and a
half our intelligent guide stated he didn't know the way. I wonder
how Keeley felt all that Tuesday. If he could have heard half we said
he would have torn his two days' beard out and wept. The other
scout lost us altogether. Keeley and Weil were arranging a series of
despatch riders, so as long as we got one of them to Jan Modebi's, it
didn't much matter. We outspanned first at a rebel's farm, and had
an excellent lunch. I was still rather fretful. The prospect of captivity
made me so, and I only believe in dead Dutchmen, till peace is
proclaimed.
One Sonnenberg, a brother of some Bond member or other, was
there trading, I suppose, like most Bondsmen, running with the hare
and hunting with the hounds. He looked well on it, and was very
civil. We inspanned and then came a long trek to Jan Modebi's.
About half-way there, we saw two horsemen with guns cruising
about. One obviously was not a soldier. I reckoned Pretoria was the
ticket, however, they came up and Weil went to interview them.
They turned out to be one of the Kimberley Light Horse and a
civilian who was showing him the way, and he said he had got a
convoy of cattle. It felt like being near home again then. We
afterwards met the convoy--total, four white men and five black. I
still marvel at their colossal impudence, marching through a rebel
country within five miles of the enemy's border, escorting cattle for
which any Boer will peril his skin. He calmly assured me they were
going to pick up all they saw on the way; to use his own words, "All
is fish that comes to our net." I hope they got through all right. So
to Mr. Menson's, where we put up for the night, and he, like
everyone else, did all he could. He, too, had had a bad time. He
didn't grumble, but when the relief column had come through they
had cut all his barbed wire fences. Having a constitutional antipathy
to barbed wire I sympathized with the relief column, but naturally
did not say so. I was amused to see three prints of Sir Alfred Milner,
Lord Roberts, and Oom Paul, the inscription under the latter being,
"The end is better than the beginning, 14.10.99," also to hear his
account of how when driving his cattle to Vryburg at the outbreak of
the war he had met a Dutchman who told him that they had driven
the English into the sea. His reply was, "Oh, that's too far to go,"
and so he turned and drove his cattle back again to his farm. Weil,
as usual, bought up cattle, &c., also butter and other luxuries, and
despatched them to the hospital at Mafeking on his own account.
Wednesday. We started rather later than usual owing to the
heavy rain, and half way to Vryburg we crossed the fresh spoor of
men, wagons, cattle, &c., going towards the Transvaal. It afterwards
transpired it was the rebel Van Zyl and his following, bolting from
Kuruman to the Transvaal. Let off number two. We couldn't have
been more than an hour or two behind them, and they would
certainly have scooped us had we met them, so the rain was lucky.
Well, we got into Vryburg from one side as the troops got in from
the other. An old acquaintance rushed me off to the Club, and I then
strolled up to see the Scotch Yeomanry and found Charley Burn. I
found also Kidd and several others I knew--then on to see Reade,
who had been Intelligence Officer at Mafeking before the war, and
was D.A.A.G. to General Barton, and arranged about getting on in
the first train. This was my first chance of seeing the infantry Tommy
on the war path to any great extent. He is no more beautiful or
clean, in fact, if anything less so than his cavalry brother, but by
heaven he looks a useful one! However, what matter the man as
long as the flag is clean. Met North of the Royal Fusiliers and dined
with him, they all asked after Fitzclarence, Godley, and the others.
They and the Scots Fusiliers had done quite an extraordinary march
of forty-four miles in thirty-four hours, and now our infantry were
within striking distance of Mafeking. The line should soon be
repaired as they had begun from Mafeking and the line as far as
Maribogo was practically untouched, in fact next morning, Thursday,
they ran twelve miles north. Thursday we began our preparations for
departure. The garrison were preparing to celebrate the Queen's
Birthday, and the populace to display great enthusiasm, and the
women began to come into town. It was not a highly polished
parade, so far as I could see. Still, it was rather good to have it there
just then, where the Dutchmen had been in occupation within ten
days. Rifles were now coming in by the hundred, and the rebel of a
fortnight before became a British patriot. We drove to the station,
and there met the Scots Fusiliers. I was accosted by a warrior in
large blue goggles, who said I didn't remember him. I naturally
didn't in the goggles, but it turned out to be Scudamore. They did
the best they could for us, and then Dick of the Royal Irish Fusiliers
turned up, who had once been my sergeant-major. I was glad to see
him--the old regiment and squadron seems fairly dotted all over
Africa. Barnes was at Mafeking, three of us had been through the
siege, and I met one Lambart at Taungs, who had been a corporal
with us, and was a captain in the Kimberley Mounted Corps,
curiously enough all belonging to two squadrons, B and D. Well, we
left Vryburg with a light engine and a truck full of niggers. We were
all sitting on the tank, in charge of young Gregg, R.E., who is a good
train master. He ran us down, after dropping the niggers to repair a
bridge, to Dry Hartz, where we had to pull out for an up-coming
train, and as we had half an hour to wait, and it was just mid-day at
twelve, we formed up and gave three cheers for the Queen and
drank her health. It was the smallest and dirtiest Queen's Birthday
parade I have ever attended; nine all told, but "mony a little makes
a muckle." We ran down to Taungs, where one way and another we
were detained some twelve hours. I didn't mind. The Royal Welsh
Fusiliers were there, and I found several old friends and
acquaintances--Gough Radcliffe, R.H., Cooper (Royal Fusiliers),
Broke Wright, R.E., the former railway staff officer. So into a cattle
truck we jumped with one of the Welsh Fusiliers and some men and
arrived at Kimberley 7 o'clock next morning, where I called on Sir C.
Parsons, and had fish for breakfast at the hotel. Thus my journey
was practically ended. It transpired that Vryburg was held by some
half dozen of our forces, and that the remainder of the garrison was
only sixty loyalists from the town population. It did not seem a large
garrison, but apparently it was good enough. There was rather a
curious coincidence at dinner at Orange River. I saw a man whose
face I thought I knew, but I was mistaken; it was his likeness to his
brother which misled me. He turned out to be Tom Greenfield's
brother, who was down here sick, and to whom I had wired to meet
me at Fourteen Streams, so that I could give him news of Tom.
However, I struck him on the next river or so, so it didn't much
matter.
It was sad to pass the Modder River and see our cemeteries--all
English; so we passed on to Cape Town. And how jolly it was to see


Understanding The Linux Kernel 1st Edition Daniel Pierre Bovet

  • 6. Understanding the Linux Kernel
    Daniel P. Bovet, Marco Cesati
    Publisher: O'Reilly
    First Edition, October 2000
    ISBN: 0-596-00002-2, 702 pages

    Understanding the Linux Kernel helps readers understand how Linux performs best and how it meets the challenge of different environments. The authors introduce each topic by explaining its importance, and show how kernel operations relate to the utilities that are familiar to Unix programmers and users.
  • 7. Table of Contents

    Preface
      The Audience for This Book
      Organization of the Material
      Overview of the Book
      Background Information
      Conventions in This Book
      How to Contact Us
      Acknowledgments

    1. Introduction
      1.1 Linux Versus Other Unix-Like Kernels
      1.2 Hardware Dependency
      1.3 Linux Versions
      1.4 Basic Operating System Concepts
      1.5 An Overview of the Unix Filesystem
      1.6 An Overview of Unix Kernels

    2. Memory Addressing
      2.1 Memory Addresses
      2.2 Segmentation in Hardware
      2.3 Segmentation in Linux
      2.4 Paging in Hardware
      2.5 Paging in Linux
      2.6 Anticipating Linux 2.4

    3. Processes
      3.1 Process Descriptor
      3.2 Process Switching
      3.3 Creating Processes
      3.4 Destroying Processes
      3.5 Anticipating Linux 2.4

    4. Interrupts and Exceptions
      4.1 The Role of Interrupt Signals
      4.2 Interrupts and Exceptions
      4.3 Nested Execution of Exception and Interrupt Handlers
      4.4 Initializing the Interrupt Descriptor Table
      4.5 Exception Handling
      4.6 Interrupt Handling
      4.7 Returning from Interrupts and Exceptions
      4.8 Anticipating Linux 2.4

    5. Timing Measurements
      5.1 Hardware Clocks
      5.2 The Timer Interrupt Handler
      5.3 PIT's Interrupt Service Routine
      5.4 The TIMER_BH Bottom Half Functions
      5.5 System Calls Related to Timing Measurements
      5.6 Anticipating Linux 2.4

  • 8. 6. Memory Management
      6.1 Page Frame Management
      6.2 Memory Area Management
      6.3 Noncontiguous Memory Area Management
      6.4 Anticipating Linux 2.4

    7. Process Address Space
      7.1 The Process's Address Space
      7.2 The Memory Descriptor
      7.3 Memory Regions
      7.4 Page Fault Exception Handler
      7.5 Creating and Deleting a Process Address Space
      7.6 Managing the Heap
      7.7 Anticipating Linux 2.4

    8. System Calls
      8.1 POSIX APIs and System Calls
      8.2 System Call Handler and Service Routines
      8.3 Wrapper Routines
      8.4 Anticipating Linux 2.4

    9. Signals
      9.1 The Role of Signals
      9.2 Sending a Signal
      9.3 Receiving a Signal
      9.4 Real-Time Signals
      9.5 System Calls Related to Signal Handling
      9.6 Anticipating Linux 2.4

    10. Process Scheduling
      10.1 Scheduling Policy
      10.2 The Scheduling Algorithm
      10.3 System Calls Related to Scheduling
      10.4 Anticipating Linux 2.4

    11. Kernel Synchronization
      11.1 Kernel Control Paths
      11.2 Synchronization Techniques
      11.3 The SMP Architecture
      11.4 The Linux/SMP Kernel
      11.5 Anticipating Linux 2.4

    12. The Virtual Filesystem
      12.1 The Role of the VFS
      12.2 VFS Data Structures
      12.3 Filesystem Mounting
      12.4 Pathname Lookup
      12.5 Implementations of VFS System Calls
      12.6 File Locking
      12.7 Anticipating Linux 2.4

  • 9. 13. Managing I/O Devices
      13.1 I/O Architecture
      13.2 Associating Files with I/O Devices
      13.3 Device Drivers
      13.4 Character Device Handling
      13.5 Block Device Handling
      13.6 Page I/O Operations
      13.7 Anticipating Linux 2.4

    14. Disk Caches
      14.1 The Buffer Cache
      14.2 The Page Cache
      14.3 Anticipating Linux 2.4

    15. Accessing Regular Files
      15.1 Reading and Writing a Regular File
      15.2 Memory Mapping
      15.3 Anticipating Linux 2.4

    16. Swapping: Methods for Freeing Memory
      16.1 What Is Swapping?
      16.2 Swap Area
      16.3 The Swap Cache
      16.4 Transferring Swap Pages
      16.5 Page Swap-Out
      16.6 Page Swap-In
      16.7 Freeing Page Frames
      16.8 Anticipating Linux 2.4

    17. The Ext2 Filesystem
      17.1 General Characteristics
      17.2 Disk Data Structures
      17.3 Memory Data Structures
      17.4 Creating the Filesystem
      17.5 Ext2 Methods
      17.6 Managing Disk Space
      17.7 Reading and Writing an Ext2 Regular File
      17.8 Anticipating Linux 2.4

    18. Process Communication
      18.1 Pipes
      18.2 FIFOs
      18.3 System V IPC
      18.4 Anticipating Linux 2.4

    19. Program Execution
      19.1 Executable Files
      19.2 Executable Formats
      19.3 Execution Domains
      19.4 The exec-like Functions
      19.5 Anticipating Linux 2.4

  • 10. A. System Startup
      A.1 Prehistoric Age: The BIOS
      A.2 Ancient Age: The Boot Loader
      A.3 Middle Ages: The setup( ) Function
      A.4 Renaissance: The startup_32( ) Functions
      A.5 Modern Age: The start_kernel( ) Function

    B. Modules
      B.1 To Be (a Module) or Not to Be?
      B.2 Module Implementation
      B.3 Linking and Unlinking Modules
      B.4 Linking Modules on Demand

    C. Source Code Structure

    Colophon
Understanding the Linux Kernel

Preface

In the spring semester of 1997, we taught a course on operating systems based on Linux 2.0. The idea was to encourage students to read the source code. To achieve this, we assigned term projects consisting of making changes to the kernel and performing tests on the modified version. We also wrote course notes for our students about a few critical features of Linux like task switching and task scheduling.

We continued along this line in the spring semester of 1998, but we moved on to the Linux 2.1 development version. Our course notes were becoming larger and larger. In July 1998, we contacted O'Reilly & Associates, suggesting they publish a whole book on the Linux kernel.

The real work started in the fall of 1998 and lasted about a year and a half. We read thousands of lines of code, trying to make sense of them. After all this work, we can say that it was worth the effort. We learned a lot of things you don't find in books, and we hope we have succeeded in conveying some of this information in the following pages.

The Audience for This Book

All people curious about how Linux works and why it is so efficient will find answers here. After reading the book, you will find your way through the many thousands of lines of code, distinguishing between crucial data structures and secondary ones; in short, you will become a true Linux hacker.

Our work might be considered a guided tour of the Linux kernel: most of the significant data structures and many algorithms and programming tricks used in the kernel are discussed; in many cases, the relevant fragments of code are discussed line by line. Of course, you should have the Linux source code on hand and should be willing to spend some effort deciphering some of the functions that are not, for the sake of brevity, fully described.

On another level, the book will give valuable insights to people who want to know more about the critical design issues in a modern operating system.
It is not specifically addressed to system administrators or programmers; it is mostly for people who want to understand how things really work inside the machine! Like any good guide, we try to go beyond superficial features. We offer background, such as the history of major features and the reasons they were used.

Organization of the Material

When starting to write this book, we were faced with a critical decision: should we refer to a specific hardware platform or skip the hardware-dependent details and concentrate on the pure hardware-independent parts of the kernel? Other books on Linux kernel internals have chosen the latter approach; we decided to adopt the former one for the following reasons:

• Efficient kernels take advantage of most available hardware features, such as addressing techniques, caches, processor exceptions, special instructions, processor control registers, and so on. If we want to convince you that the kernel indeed does quite a good job in performing a specific task, we must first explain what kind of support comes from the hardware.
• Even if a large portion of a Unix kernel's source code is processor-independent and coded in C, a small and critical part is coded in assembly language. A thorough knowledge of the kernel thus requires the study of a few assembly language fragments that interact with the hardware.

When covering hardware features, our strategy will be quite simple: just sketch the features that are totally hardware-driven while detailing those that need some software support. In fact, we are interested in kernel design rather than in computer architecture.

The next step consisted of selecting the computer system to be described: although Linux is now running on several kinds of personal computers and workstations, we decided to concentrate on the very popular and cheap IBM-compatible personal computers, and thus on the Intel 80x86 microprocessors and on some support chips included in these personal computers. The term Intel 80x86 microprocessor will be used in the forthcoming chapters to denote the Intel 80386, 80486, Pentium, Pentium Pro, Pentium II, and Pentium III microprocessors or compatible models. In a few cases, explicit references will be made to specific models.

One more choice was the order followed in studying Linux components. We tried to follow a bottom-up approach: start with topics that are hardware-dependent and end with those that are totally hardware-independent. In fact, we'll make many references to the Intel 80x86 microprocessors in the first part of the book, while the rest of it is relatively hardware-independent. Two significant exceptions are made in Chapter 11 and Chapter 13.
In practice, following a bottom-up approach is not as simple as it looks, since the areas of memory management, process management, and filesystem are intertwined; a few forward references (that is, references to topics yet to be explained) are unavoidable.

Each chapter starts with a theoretical overview of the topics covered. The material is then presented according to the bottom-up approach. We start with the data structures needed to support the functionalities described in the chapter. Then we usually move from the lowest level of functions to higher levels, often ending by showing how system calls issued by user applications are supported.

Level of Description

Linux source code for all supported architectures is contained in about 4500 C and Assembly files stored in about 270 subdirectories; it consists of about 2 million lines of code, which occupy more than 58 megabytes of disk space. Of course, this book can cover only a very small portion of that code. To get a sense of how big the Linux source is, consider that the whole source code of the book you are reading occupies less than 2 megabytes of disk space. Therefore, in order to list all code, without commenting on it, we would need more than 25 books like this![1]

[1] Nevertheless, Linux is a tiny operating system when compared with other commercial giants. Microsoft Windows 2000, for example, reportedly has more than 30 million lines of code. Linux is also small when compared to some popular applications; the Netscape Communicator 5 browser, for example, has about 17 million lines of code.

So we had to make some choices about the parts to be described. This is a rough assessment of our decisions:
• We describe process and memory management fairly thoroughly.
• We cover the Virtual Filesystem and the Ext2 filesystem, although many functions are just mentioned without detailing the code; we do not discuss other filesystems supported by Linux.
• We describe device drivers, which account for a good part of the kernel, as far as the kernel interface is concerned, but do not attempt analysis of any specific driver, including the terminal drivers.
• We do not cover networking, since this area would deserve a whole new book by itself.

In many cases, the original code has been rewritten in an easier-to-read but less efficient way. This occurs at time-critical points at which sections of programs are often written in a mixture of hand-optimized C and Assembly code. Once again, our aim is to provide some help in studying the original Linux code.

While discussing kernel code, we often end up describing the underpinnings of many familiar features that Unix programmers have heard of and about which they may be curious (shared and mapped memory, signals, pipes, symbolic links).

Overview of the Book

To make life easier, Chapter 1 presents a general picture of what is inside a Unix kernel and how Linux competes against other well-known Unix systems.

The heart of any Unix kernel is memory management. Chapter 2 explains how Intel 80x86 processors include special circuits to address data in memory and how Linux exploits them.

Processes are a fundamental abstraction offered by Linux and are introduced in Chapter 3. Here we also explain how each process runs either in an unprivileged User Mode or in a privileged Kernel Mode. Transitions between User Mode and Kernel Mode happen only through well-established hardware mechanisms called interrupts and exceptions, which are introduced in Chapter 4. One type of interrupt is crucial for allowing Linux to take care of elapsed time; further details can be found in Chapter 5.
Next we focus again on memory: Chapter 6 describes the sophisticated techniques required to handle the most precious resource in the system (besides the processors, of course), that is, available memory. This resource must be granted both to the Linux kernel and to the user applications. Chapter 7 shows how the kernel copes with the requests for memory issued by greedy application programs.

Chapter 8 explains how a process running in User Mode makes requests to the kernel, while Chapter 9 describes how a process may send synchronization signals to other processes. Chapter 10 explains how Linux executes, in turn, every active process in the system so that all of them can progress toward their completions. Synchronization mechanisms are needed by the kernel too: they are discussed in Chapter 11 for both uniprocessor and multiprocessor systems.

Now we are ready to move on to another essential topic, that is, how Linux implements the filesystem. A series of chapters covers this topic: Chapter 12 introduces a general layer that supports many different filesystems. Some Linux files are special because they provide trapdoors to reach hardware devices; Chapter 13 offers insights on these special files and on the corresponding hardware device drivers. Another issue to be considered is disk access time; Chapter 14 shows how a clever use of RAM reduces disk accesses and thus improves system performance significantly. Building on the material covered in these last chapters, we can now explain in Chapter 15 how user applications access normal files. Chapter 16 completes our discussion of Linux memory management and explains the techniques used by Linux to ensure that enough memory is always available. The last chapter dealing with files is Chapter 17, which illustrates the most-used Linux filesystem, namely Ext2.

The last two chapters end our detailed tour of the Linux kernel: Chapter 18 introduces communication mechanisms other than signals available to User Mode processes; Chapter 19 explains how user applications are started.

Last but not least are the appendixes: Appendix A sketches out how Linux is booted, while Appendix B describes how to dynamically reconfigure the running kernel, adding and removing functionalities as needed. Appendix C is just a list of the directories that contain the Linux source code. The Source Code Index includes all the Linux symbols referenced in the book; you will find here the name of the Linux file defining each symbol and the book's page number where it is explained. We think you'll find it quite handy.

Background Information

No prerequisites are required, except some skill in the C programming language and perhaps some knowledge of Assembly language.

Conventions in This Book

The following is a list of typographical conventions used in this book:

Constant Width
    Is used to show the contents of code files or the output from commands, and to indicate source code keywords that appear in code.
Italic
    Is used for file and directory names, program and command names, command-line options, URLs, and for emphasizing new terms.

How to Contact Us

We have tested and verified all the information in this book to the best of our abilities, but you may find that features have changed or that we have let errors slip through the production of the book. Please let us know of any errors that you find, as well as suggestions for future editions, by writing to:

O'Reilly & Associates, Inc.
101 Morris St.
Sebastopol, CA 95472
(800) 998-9938 (in the U.S. or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (fax)

You can also send messages electronically. To be put on our mailing list or to request a catalog, send email to:

info@oreilly.com

To ask technical questions or to comment on the book, send email to:

bookquestions@oreilly.com

We have a web site for the book, where we'll list reader reviews, errata, and any plans for future editions. You can access this page at:

http://www.oreilly.com/catalog/linuxkernel/

We also have an additional web site where you will find material written by the authors about the new features of Linux 2.4. Hopefully, this material will be used for a future edition of this book. You can access this page at:

http://www.oreilly.com/catalog/linuxkernel/updates/

For more information about this book and others, see the O'Reilly web site:

http://www.oreilly.com/

Acknowledgments

This book would not have been written without the precious help of the many students of the school of engineering at the University of Rome "Tor Vergata" who took our course and tried to decipher the lecture notes about the Linux kernel. Their strenuous efforts to grasp the meaning of the source code led us to improve our presentation and to correct many mistakes.

Andy Oram, our wonderful editor at O'Reilly & Associates, deserves a lot of credit. He was the first at O'Reilly to believe in this project, and he spent a lot of time and energy deciphering our preliminary drafts. He also suggested many ways to make the book more readable, and he wrote several excellent introductory paragraphs.

Many thanks also to the O'Reilly staff, especially Rob Romano, the technical illustrator, and Lenny Muellner, for tools support.

We had some prestigious reviewers who read our text quite carefully (in alphabetical order by first name): Alan Cox, Michael Kerrisk, Paul Kinzelman, Raph Levien, and Rik van Riel. Their comments helped us to remove several errors and inaccuracies and have made this book stronger.

Daniel P. Bovet, Marco Cesati
September 2000
Chapter 1. Introduction

Linux is a member of the large family of Unix-like operating systems. A relative newcomer experiencing sudden spectacular popularity starting in the late 1990s, Linux joins such well-known commercial Unix operating systems as System V Release 4 (SVR4), developed by AT&T and now owned by Novell; the 4.4 BSD release from the University of California at Berkeley (4.4BSD); Digital Unix from Digital Equipment Corporation (now Compaq); AIX from IBM; HP-UX from Hewlett-Packard; and Solaris from Sun Microsystems.

Linux was initially developed by Linus Torvalds in 1991 as an operating system for IBM-compatible personal computers based on the Intel 80386 microprocessor. Linus remains deeply involved with improving Linux, keeping it up-to-date with various hardware developments and coordinating the activity of hundreds of Linux developers around the world. Over the years, developers have worked to make Linux available on other architectures, including Alpha, SPARC, Motorola MC680x0, PowerPC, and IBM System/390.

One of the more appealing benefits of Linux is that it isn't a commercial operating system: its source code under the GNU General Public License[1] is open and available to anyone to study, as we will in this book; if you download the code (the official site is http://www.kernel.org/) or check the sources on a Linux CD, you will be able to explore from top to bottom one of the most successful, modern operating systems. This book, in fact, assumes you have the source code on hand and can apply what we say to your own explorations.

[1] The GNU project is coordinated by the Free Software Foundation, Inc. (http://www.gnu.org/); its aim is to implement a whole operating system freely usable by everyone. The availability of a GNU C compiler has been essential for the success of the Linux project.
Technically speaking, Linux is a true Unix kernel, although it is not a full Unix operating system, because it does not include all the applications such as filesystem utilities, windowing systems and graphical desktops, system administrator commands, text editors, compilers, and so on. However, since most of these programs are freely available under the GNU General Public License, they can be installed into one of the filesystems supported by Linux.

Since Linux is a kernel, many Linux users prefer to rely on commercial distributions, available on CD-ROM, to get the code included in a standard Unix system. Alternatively, the code may be obtained from several different FTP sites. The Linux source code is usually installed in the /usr/src/linux directory. In the rest of this book, all file pathnames will refer implicitly to that directory.

1.1 Linux Versus Other Unix-Like Kernels

The various Unix-like systems on the market, some of which have a long history and may show signs of archaic practices, differ in many important respects. All commercial variants were derived from either SVR4 or 4.4BSD; all of them tend to agree on some common standards like IEEE's POSIX (Portable Operating System Interface) and X/Open's CAE (Common Applications Environment).
The current standards specify only an application programming interface (API), that is, a well-defined environment in which user programs should run. Therefore, the standards do not impose any restriction on internal design choices of a compliant kernel.[2]

[2] As a matter of fact, several non-Unix operating systems like Windows NT are POSIX-compliant.

In order to define a common user interface, Unix-like kernels often share fundamental design ideas and features. In this respect, Linux is comparable with the other Unix-like operating systems. What you read in this book and see in the Linux kernel, therefore, may help you understand the other Unix variants too.

The 2.2 version of the Linux kernel aims to be compliant with the IEEE POSIX standard. This, of course, means that most existing Unix programs can be compiled and executed on a Linux system with very little effort or even without the need for patches to the source code. Moreover, Linux includes all the features of a modern Unix operating system, like virtual memory, a virtual filesystem, lightweight processes, reliable signals, SVR4 interprocess communications, support for Symmetric Multiprocessor (SMP) systems, and so on.

By itself, the Linux kernel is not very innovative. When Linus Torvalds wrote the first kernel, he referred to some classical books on Unix internals, like Maurice Bach's The Design of the Unix Operating System (Prentice Hall, 1986). Actually, Linux still has some bias toward the Unix baseline described in Bach's book (i.e., SVR4). However, Linux doesn't stick to any particular variant. Instead, it tries to adopt good features and design choices of several different Unix kernels.

Here is an assessment of how Linux competes against some well-known commercial Unix kernels:

• The Linux kernel is monolithic. It is a large, complex do-it-yourself program, composed of several logically different components.
In this, it is quite conventional; most commercial Unix variants are monolithic. A notable exception is Carnegie Mellon's Mach 3.0, which follows a microkernel approach.
• Traditional Unix kernels are compiled and linked statically. Most modern kernels can dynamically load and unload some portions of the kernel code (typically, device drivers), which are usually called modules. Linux's support for modules is very good, since it is able to automatically load and unload modules on demand. Among the main commercial Unix variants, only the SVR4.2 kernel has a similar feature.
• Kernel threading. Some modern Unix kernels, like Solaris 2.x and SVR4.2/MP, are organized as a set of kernel threads. A kernel thread is an execution context that can be independently scheduled; it may be associated with a user program, or it may run only some kernel functions. Context switches between kernel threads are usually much less expensive than context switches between ordinary processes, since the former usually operate on a common address space. Linux uses kernel threads in a very limited way to execute a few kernel functions periodically; since Linux kernel threads cannot execute user programs, they do not represent the basic execution context abstraction. (That's the topic of the next item.)
• Multithreaded application support. Most modern operating systems have some kind of support for multithreaded applications, that is, user programs that are well designed in terms of many relatively independent execution flows sharing a large portion of the application data structures. A multithreaded user application could be composed of many lightweight processes (LWP), or processes that can operate on a common address space, common physical memory pages, common opened files, and so on. Linux defines its own version of lightweight processes, which is different from the types used on other systems such as SVR4 and Solaris. While all the commercial Unix variants of LWP are based on kernel threads, Linux regards lightweight processes as the basic execution context and handles them via the nonstandard clone( ) system call.
• Linux is a nonpreemptive kernel. This means that Linux cannot arbitrarily interleave execution flows while they are in privileged mode. Several sections of kernel code assume they can run and modify data structures without fear of being interrupted and having another thread alter those data structures. Usually, fully preemptive kernels are associated with special real-time operating systems. Currently, among conventional, general-purpose Unix systems, only Solaris 2.x and Mach 3.0 are fully preemptive kernels. SVR4.2/MP introduces some fixed preemption points as a method to get limited preemption capability.
• Multiprocessor support. Several Unix kernel variants take advantage of multiprocessor systems. Linux 2.2 offers an evolving kind of support for symmetric multiprocessing (SMP), which means not only that the system can use multiple processors but also that any processor can handle any task; there is no discrimination among them. However, Linux 2.2 does not make optimal use of SMP. Several kernel activities that could be executed concurrently (like filesystem handling and networking) must now be executed sequentially.
• Filesystem. Linux's standard filesystem lacks some advanced features, such as journaling. However, more advanced filesystems for Linux are available, although not included in the Linux source code; among them, IBM AIX's Journaling File System (JFS), and Silicon Graphics Irix's XFS filesystem.
Thanks to a powerful object-oriented Virtual File System technology (inspired by Solaris and SVR4), porting a foreign filesystem to Linux is a relatively easy task.
• STREAMS. Linux has no analog to the STREAMS I/O subsystem introduced in SVR4, although it is included nowadays in most Unix kernels and it has become the preferred interface for writing device drivers, terminal drivers, and network protocols.

This somewhat disappointing assessment does not depict, however, the whole truth. Several features make Linux a wonderfully unique operating system. Commercial Unix kernels often introduce new features in order to gain a larger slice of the market, but these features are not necessarily useful, stable, or productive. As a matter of fact, modern Unix kernels tend to be quite bloated. By contrast, Linux doesn't suffer from the restrictions and the conditioning imposed by the market, hence it can freely evolve according to the ideas of its designers (mainly Linus Torvalds). Specifically, Linux offers the following advantages over its commercial competitors:

Linux is free. You can install a complete Unix system at no expense other than the hardware (of course).

Linux is fully customizable in all its components. Thanks to the General Public License (GPL), you are allowed to freely read and modify the source code of the kernel and of all system programs.[3]

[3] Several commercial companies have started to support their products under Linux, most of which aren't distributed under the GNU General Public License. Therefore, you may not be allowed to read or modify their source code.

Linux runs on low-end, cheap hardware platforms. You can even build a network server using an old Intel 80386 system with 4 MB of RAM.

Linux is powerful. Linux systems are very fast, since they fully exploit the features of the hardware components. The main Linux target is efficiency, and indeed many design choices of commercial variants, like the STREAMS I/O subsystem, have been rejected by Linus because of their implied performance penalty.

Linux has a high standard for source code quality. Linux systems are usually very stable; they have a very low failure rate and system maintenance time.

The Linux kernel can be very small and compact. Indeed, it is possible to fit both a kernel image and full root filesystem, including all fundamental system programs, on just one 1.4 MB floppy disk! As far as we know, none of the commercial Unix variants is able to boot from a single floppy disk.

Linux is highly compatible with many common operating systems. It lets you directly mount filesystems for all versions of MS-DOS and MS Windows, SVR4, OS/2, Mac OS, Solaris, SunOS, NeXTSTEP, many BSD variants, and so on. Linux is also able to operate with many network layers like Ethernet, Fiber Distributed Data Interface (FDDI), High Performance Parallel Interface (HIPPI), IBM's Token Ring, AT&T WaveLAN, DEC RoamAbout DS, and so forth. By using suitable libraries, Linux systems are even able to directly run programs written for other operating systems.
For example, Linux is able to execute applications written for MS-DOS, MS Windows, SVR3 and SVR4, 4.4BSD, SCO Unix, XENIX, and others on the Intel 80x86 platform.

Linux is well supported. Believe it or not, it may be a lot easier to get patches and updates for Linux than for any proprietary operating system! The answer to a problem often comes back within a few hours after sending a message to some newsgroup or mailing list. Moreover, drivers for Linux are usually available a few weeks after new hardware products have been introduced on the market. By contrast, hardware manufacturers release device drivers for only a few commercial operating systems, usually the Microsoft ones. Therefore, all commercial Unix variants run on a restricted subset of hardware components.

Linux has an estimated installed base of more than 12 million, and that number is growing. People who are used to certain creature features that are standard under other operating systems are starting to expect the same from Linux. As such, the demand on Linux developers is also increasing. Luckily, though, Linux has evolved under the close direction of Linus over the years, to accommodate the needs of the masses.

1.2 Hardware Dependency

Linux tries to maintain a neat distinction between hardware-dependent and hardware-independent source code. To that end, both the arch and the include directories contain nine subdirectories corresponding to the nine hardware platforms supported. The standard names of the platforms are:

arm
    Acorn personal computers
alpha
    Compaq Alpha workstations
i386
    IBM-compatible personal computers based on Intel 80x86 or Intel 80x86-compatible microprocessors
m68k
    Personal computers based on Motorola MC680x0 microprocessors
mips
    Workstations based on Silicon Graphics MIPS microprocessors
ppc
    Workstations based on Motorola-IBM PowerPC microprocessors
sparc
    Workstations based on Sun Microsystems SPARC microprocessors
sparc64
    Workstations based on Sun Microsystems 64-bit Ultra SPARC microprocessors
s390
    IBM System/390 mainframes

1.3 Linux Versions

Linux distinguishes stable kernels from development kernels through a simple numbering scheme. Each version is characterized by three numbers, separated by periods. The first two numbers are used to identify the version; the third number identifies the release. As shown in Figure 1-1, if the second number is even, it denotes a stable kernel; otherwise, it denotes a development kernel. At the time of this writing, the current stable version of the Linux kernel is 2.2.14, and the current development version is 2.3.51. The 2.2 kernel, which is the basis for this book, was first released in January 1999, and it differs considerably from the 2.0 kernel, particularly with respect to memory management. Work on the 2.3 development version started in May 1999.

Figure 1-1. Numbering Linux versions

New releases of a stable version come out mostly to fix bugs reported by users. The main algorithms and data structures used to implement the kernel are left unchanged.

Development versions, on the other hand, may differ quite significantly from one another; kernel developers are free to experiment with different solutions that occasionally lead to drastic kernel changes. Users who rely on development versions for running applications may experience unpleasant surprises when upgrading their kernel to a newer release. This book concentrates on the most recent stable kernel that we had available because, among all the new features being tried in experimental kernels, there's no way of telling which will ultimately be accepted and what they'll look like in their final form.

At the time of this writing, Linux 2.4 has not officially come out. We tried to anticipate the forthcoming features and the main kernel changes with respect to the 2.2 version by looking at the Linux 2.3.99-pre8 prerelease.
Linux 2.4 inherits a good deal from Linux 2.2: many concepts, design choices, algorithms, and data structures remain the same. For that reason, we conclude each chapter by sketching how Linux 2.4 differs from Linux 2.2 with respect to the topics just discussed. As you'll notice, the new Linux is gleaming and shining; it should appear more appealing to large corporations and, more generally, to the whole business community.
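As a rough illustration of the numbering scheme described in this section, the following C sketch splits a version string into its three numbers and applies the even/odd rule to the second one. The type struct kver and the functions kver_parse( ) and kver_is_stable( ) are hypothetical names chosen for this example; they are not taken from the kernel sources.

```c
#include <stdio.h>

/* Hypothetical helper type: the three numbers of a kernel version string. */
struct kver {
    int major;    /* first number */
    int minor;    /* second number: even = stable, odd = development */
    int release;  /* third number */
};

/* Parse a string such as "2.2.14" into its three fields.
 * Returns 0 on success, -1 if three numbers could not be read. */
int kver_parse(const char *s, struct kver *v)
{
    if (sscanf(s, "%d.%d.%d", &v->major, &v->minor, &v->release) != 3)
        return -1;
    return 0;
}

/* Under the scheme described in the text, an even second number
 * denotes a stable kernel. Returns 1 for stable, 0 for development. */
int kver_is_stable(const struct kver *v)
{
    return v->minor % 2 == 0;
}
```

Under these assumptions, 2.2.14 is classified as a stable kernel and 2.3.51 as a development kernel, in agreement with the versions cited in this section.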
  • 22. Understanding the Linux Kernel 12 1.4 Basic Operating System Concepts Any computer system includes a basic set of programs called the operating system. The most important program in the set is called the kernel. It is loaded into RAM when the system boots and contains many critical procedures that are needed for the system to operate. The other programs are less crucial utilities; they can provide a wide variety of interactive experiences for the user, as well as doing all the jobs the user bought the computer for, but the essential shape and capabilities of the system are determined by the kernel. The kernel, then, is where we fix our attention in this book. Hence, we'll often use the term "operating system" as a synonym for "kernel." The operating system must fulfill two main objectives: • Interact with the hardware components, servicing all low-level programmable elements included in the hardware platform. • Provide an execution environment to the applications that run on the computer system (the so-called user programs). Some operating systems allow all user programs to directly play with the hardware components (a typical example is MS-DOS). In contrast, a Unix-like operating system hides all low-level details concerning the physical organization of the computer from applications run by the user. When a program wants to make use of a hardware resource, it must issue a request to the operating system. The kernel evaluates the request and, if it chooses to grant the resource, interacts with the relevant hardware components on behalf of the user program. In order to enforce this mechanism, modern operating systems rely on the availability of specific hardware features that prevent user programs from directly interacting with low-level hardware components or accessing arbitrary memory locations. In particular, the hardware introduces at least two different execution modes for the CPU: a nonprivileged mode for user programs and a privileged mode for the kernel.
Unix calls these User Mode and Kernel Mode, respectively. In the rest of this chapter, we introduce the basic concepts that have motivated the design of Unix over the past two decades, as well as Linux and other operating systems. While the concepts are probably familiar to you as a Linux user, these sections try to delve into them a bit more deeply than usual to explain the requirements they place on an operating system kernel. These broad considerations refer to Unix-like systems, thus also to Linux. The other chapters of this book will hopefully help you to understand the Linux kernel internals. 1.4.1 Multiuser Systems A multiuser system is a computer that is able to concurrently and independently execute several applications belonging to two or more users. "Concurrently" means that applications can be active at the same time and contend for the various resources such as CPU, memory, hard disks, and so on. "Independently" means that each application can perform its task with no concern for what the applications of the other users are doing. Switching from one application to another, of course, slows down each of them and affects the response time seen by the users. Many of the complexities of modern operating system kernels, which we will examine in this book, are present to minimize the delays enforced on each program and to provide the user with responses that are as fast as possible.
  • 23. Understanding the Linux Kernel 13 Multiuser operating systems must include several features: • An authentication mechanism for verifying the user identity • A protection mechanism against buggy user programs that could block other applications running in the system • A protection mechanism against malicious user programs that could interfere with, or spy on, the activity of other users • An accounting mechanism that limits the amount of resource units assigned to each user In order to ensure safe protection mechanisms, operating systems must make use of the hardware protection associated with the CPU privileged mode. Otherwise, a user program would be able to directly access the system circuitry and overcome the imposed bounds. Unix is a multiuser system that enforces the hardware protection of system resources. 1.4.2 Users and Groups In a multiuser system, each user has a private space on the machine: typically, he owns some quota of the disk space to store files, receives private mail messages, and so on. The operating system must ensure that the private portion of a user space is visible only to its owner. In particular, it must ensure that no user can exploit a system application for the purpose of violating the private space of another user. All users are identified by a unique number called the User ID, or UID. Usually only a restricted number of persons are allowed to make use of a computer system. When one of these users starts a working session, the operating system asks for a login name and a password. If the user does not input a valid pair, the system denies access. Since the password is assumed to be secret, the user's privacy is ensured. In order to selectively share material with other users, each user is a member of one or more groups, which are identified by a unique number called a Group ID, or GID. Each file is also associated with exactly one group.
For example, access could be set so that the user owning the file has read and write privileges, the group has read-only privileges, and other users on the system are denied access to the file. Any Unix-like operating system has a special user called root, superuser, or supervisor. The system administrator must log in as root in order to handle user accounts, perform maintenance tasks like system backups and program upgrades, and so on. The root user can do almost everything, since the operating system does not apply the usual protection mechanisms to her. In particular, the root user can access every file on the system and can interfere with the activity of every running user program. 1.4.3 Processes All operating systems make use of one fundamental abstraction: the process. A process can be defined either as "an instance of a program in execution," or as the "execution context" of a running program. In traditional operating systems, a process executes a single sequence of instructions in an address space; the address space is the set of memory addresses that the process is allowed to reference. Modern operating systems allow processes with multiple
  • 24. Understanding the Linux Kernel 14 execution flows, that is, multiple sequences of instructions executed in the same address space. Multiuser systems must enforce an execution environment in which several processes can be active concurrently and contend for system resources, mainly the CPU. Systems that allow concurrent active processes are said to be multiprogramming or multiprocessing.[4] It is important to distinguish programs from processes: several processes can execute the same program concurrently, while the same process can execute several programs sequentially. [4] Some multiprocessing operating systems are not multiuser; an example is Microsoft's Windows 98. On uniprocessor systems, just one process can hold the CPU, and hence just one execution flow can progress at a time. In general, the number of CPUs is always restricted, and therefore only a few processes can progress at the same time. The choice of the process that can progress is left to an operating system component called the scheduler. Some operating systems allow only nonpreemptive processes, which means that the scheduler is invoked only when a process voluntarily relinquishes the CPU. But processes of a multiuser system must be preemptive; the operating system tracks how long each process holds the CPU and periodically activates the scheduler. Unix is a multiprocessing operating system with preemptive processes. Indeed, the process abstraction is really fundamental in all Unix systems. Even when no user is logged in and no application is running, several system processes monitor the peripheral devices. In particular, several processes listen at the system terminals waiting for user logins. When a user inputs a login name, the listening process runs a program that validates the user password. If the user identity is acknowledged, the process creates another process that runs a shell into which commands are entered.
When a graphical display is activated, one process runs the window manager, and each window on the display is usually run by a separate process. When a user creates a graphics shell, one process runs the graphics windows, and a second process runs the shell into which the user can enter the commands. For each user command, the shell process creates another process that executes the corresponding program. Unix-like operating systems adopt a process/kernel model. Each process has the illusion that it's the only process on the machine and it has exclusive access to the operating system services. Whenever a process makes a system call (i.e., a request to the kernel), the hardware changes the privilege mode from User Mode to Kernel Mode, and the process starts the execution of a kernel procedure with a strictly limited purpose. In this way, the operating system acts within the execution context of the process in order to satisfy its request. Whenever the request is fully satisfied, the kernel procedure forces the hardware to return to User Mode and the process continues its execution from the instruction following the system call. 1.4.4 Kernel Architecture As stated before, most Unix kernels are monolithic: each kernel layer is integrated into the whole kernel program and runs in Kernel Mode on behalf of the current process. In contrast, microkernel operating systems demand a very small set of functions from the kernel, generally including a few synchronization primitives, a simple scheduler, and an interprocess communication mechanism. Several system processes that run on top of the microkernel implement other operating system-layer functions, like memory allocators, device drivers, system call handlers, and so on.
  • 25. Understanding the Linux Kernel 15 Although academic research on operating systems is oriented toward microkernels, such operating systems are generally slower than monolithic ones, since the explicit message passing between the different layers of the operating system has a cost. However, microkernel operating systems might have some theoretical advantages over monolithic ones. Microkernels force the system programmers to adopt a modularized approach, since any operating system layer is a relatively independent program that must interact with the other layers through well-defined and clean software interfaces. Moreover, an existing microkernel operating system can be fairly easily ported to other architectures, since all hardware-dependent components are generally encapsulated in the microkernel code. Finally, microkernel operating systems tend to make better use of random access memory (RAM) than monolithic ones, since system processes that aren't implementing needed functionalities might be swapped out or destroyed. Modules are a kernel feature that effectively achieves many of the theoretical advantages of microkernels without introducing performance penalties. A module is an object file whose code can be linked to (and unlinked from) the kernel at runtime. The object code usually consists of a set of functions that implements a filesystem, a device driver, or other features at the kernel's upper layer. The module, unlike the external layers of microkernel operating systems, does not run as a specific process. Instead, it is executed in Kernel Mode on behalf of the current process, like any other statically linked kernel function. The main advantages of using modules include: Modularized approach Since any module can be linked and unlinked at runtime, system programmers must introduce well-defined software interfaces to access the data structures handled by modules. This makes it easy to develop new modules.
Platform independence Even if it may rely on some specific hardware features, a module doesn't depend on a fixed hardware platform. For example, a disk driver module that relies on the SCSI standard works as well on an IBM-compatible PC as it does on Compaq's Alpha. Frugal main memory usage A module can be linked to the running kernel when its functionality is required and unlinked when it is no longer useful. This mechanism also can be made transparent to the user, since linking and unlinking can be performed automatically by the kernel. No performance penalty Once linked in, the object code of a module is equivalent to the object code of the statically linked kernel. Therefore, no explicit message passing is required when the functions of the module are invoked.[5] [5] A small performance penalty occurs when the module is linked and when it is unlinked. However, this penalty can be compared to the penalty caused by the creation and deletion of system processes in microkernel operating systems.
  • 26. Understanding the Linux Kernel 16 1.5 An Overview of the Unix Filesystem The Unix operating system design is centered on its filesystem, which has several interesting characteristics. We'll review the most significant ones, since they will be mentioned quite often in forthcoming chapters. 1.5.1 Files A Unix file is an information container structured as a sequence of bytes; the kernel does not interpret the contents of a file. Many programming libraries implement higher-level abstractions, such as records structured into fields and record addressing based on keys. However, the programs in these libraries must rely on system calls offered by the kernel. From the user's point of view, files are organized in a tree-structured name space as shown in Figure 1-2. [Figure 1-2. An example of a directory tree] All the nodes of the tree, except the leaves, denote directory names. A directory node contains information about the files and directories just beneath it. A file or directory name consists of a sequence of arbitrary ASCII characters,[6] with the exception of / and of the null character \0. Most filesystems place a limit on the length of a filename, typically no more than 255 characters. The directory corresponding to the root of the tree is called the root directory. By convention, its name is a slash (/). Names must be different within the same directory, but the same name may be used in different directories. [6] Some operating systems allow filenames to be expressed in many different alphabets, based on 16-bit extended coding of graphical characters such as Unicode. Unix associates a current working directory with each process (see Section 1.6.1 later in this chapter); it belongs to the process execution context, and it identifies the directory currently used by the process. In order to identify a specific file, the process uses a pathname, which consists of slashes alternating with a sequence of directory names that lead to the file.
If the first item in the pathname is a slash, the pathname is said to be absolute, since its starting point is the root directory. Otherwise, if the first item is a directory name or filename, the pathname is said to be relative, since its starting point is the process's current directory. While specifying filenames, the notations "." and ".." are also used. They denote the current working directory and its parent directory, respectively. If the current working directory is the root directory, "." and ".." coincide.
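The pathname rules above can be checked from a user program. In the following minimal sketch, path_is_absolute( ) and same_file( ) are hypothetical helpers written for this example; only stat( ) and the st_dev and st_ino fields belong to the standard interface. Two pathnames are assumed to denote the same file when they yield the same device and the same inode number.

```c
#include <sys/stat.h>

/* Hypothetical helper: a pathname is absolute when its first item is a slash. */
int path_is_absolute(const char *path)
{
    return path[0] == '/';
}

/* Hypothetical helper: returns 1 if both pathnames resolve to the same file
 * (same device and same inode number), 0 if they resolve to different files,
 * and -1 if either pathname cannot be examined. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With these helpers, same_file("/.", "/..") is expected to return 1, which reflects the statement that "." and ".." coincide in the root directory.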
  • 27. Understanding the Linux Kernel 17 1.5.2 Hard and Soft Links A filename included in a directory is called a file hard link, or more simply a link. The same file may have several links included in the same directory or in different ones, thus several filenames. The Unix command: $ ln f1 f2 is used to create a new hard link that has the pathname f2 for a file identified by the pathname f1. Hard links have two limitations: • Users are not allowed to create hard links for directories. This might transform the directory tree into a graph with cycles, thus making it impossible to locate a file according to its name. • Links can be created only among files included in the same filesystem. This is a serious limitation since modern Unix systems may include several filesystems located on different disks and/or partitions, and users may be unaware of the physical divisions between them. In order to overcome these limitations, soft links (also called symbolic links) have been introduced. Symbolic links are short files that contain an arbitrary pathname of another file. The pathname may refer to any file located in any filesystem; it may even refer to a nonexistent file. The Unix command: $ ln -s f1 f2 creates a new soft link with pathname f2 that refers to pathname f1. When this command is executed, the filesystem creates a soft link and writes into it the f1 pathname. It then inserts, in the proper directory, a new entry containing the last name of the f2 pathname. In this way, any reference to f2 can be translated automatically into a reference to f1. 1.5.3 File Types Unix files may have one of the following types: • Regular file • Directory • Symbolic link • Block-oriented device file • Character-oriented device file • Pipe and named pipe (also called FIFO) • Socket
  • 28. Understanding the Linux Kernel 18 The first three file types are constituents of any Unix filesystem. Their implementation will be described in detail in Chapter 17. Device files are related to I/O devices and device drivers integrated into the kernel. For example, when a program accesses a device file, it acts directly on the I/O device associated with that file (see Chapter 13). Pipes and sockets are special files used for interprocess communication (see Section 1.6.5 later in this chapter and Chapter 18). 1.5.4 File Descriptor and Inode Unix makes a clear distinction between a file and a file descriptor. With the exception of device and special files, each file consists of a sequence of characters. The file does not include any control information such as its length, or an End-Of-File (EOF) delimiter. All information needed by the filesystem to handle a file is included in a data structure called an inode. Each file has its own inode, which the filesystem uses to identify the file. 
While filesystems and the kernel functions handling them can vary widely from one Unix system to another, they must always provide at least the following attributes, which are specified in the POSIX standard: • File type (see previous section) • Number of hard links associated with the file • File length in bytes • Device ID (i.e., an identifier of the device containing the file) • Inode number that identifies the file within the filesystem • User ID of the file owner • Group ID of the file • Several timestamps that specify the inode status change time, the last access time, and the last modify time • Access rights and file mode (see next section) 1.5.5 Access Rights and File Mode The potential users of a file fall into three classes: • The user who is the owner of the file • The users who belong to the same group as the file, not including the owner • All remaining users (others) There are three types of access rights, Read, Write, and Execute, for each of these three classes. Thus, the set of access rights associated with a file consists of nine different binary flags. Three additional flags, called suid (Set User ID), sgid (Set Group ID), and sticky define the file mode. These flags have the following meanings when applied to executable files:
  • 29. Understanding the Linux Kernel 19 suid A process executing a file normally keeps the User ID (UID) of the process owner. However, if the executable file has the suid flag set, the process gets the UID of the file owner. sgid A process executing a file keeps the Group ID (GID) of the process group. However, if the executable file has the sgid flag set, the process gets the ID of the file group. sticky An executable file with the sticky flag set corresponds to a request to the kernel to keep the program in memory after its execution terminates.[7] [7] This flag has become obsolete; other approaches based on sharing of code pages are now used (see Chapter 7). When a file is created by a process, its owner ID is the UID of the process. Its owner group ID can be either the GID of the creator process or the GID of the parent directory, depending on the value of the sgid flag of the parent directory. 1.5.6 File-Handling System Calls When a user accesses the contents of either a regular file or a directory, he actually accesses some data stored in a hardware block device. In this sense, a filesystem is a user-level view of the physical organization of a hard disk partition. Since a process in User Mode cannot directly interact with the low-level hardware components, each actual file operation must be performed in Kernel Mode. Therefore, the Unix operating system defines several system calls related to file handling. Whenever a process wants to perform some operation on a specific file, it uses the proper system call and passes the file pathname as a parameter. All Unix kernels devote great attention to the efficient handling of hardware block devices in order to achieve good overall system performance. In the chapters that follow, we will describe topics related to file handling in Linux and specifically how the kernel reacts to file-related system calls.
In order to understand those descriptions, you will need to know how the main file-handling system calls are used; they are described in the next section. 1.5.6.1 Opening a file Processes can access only "opened" files. In order to open a file, the process invokes the system call: fd = open(path, flag, mode) The three parameters have the following meanings:
  • 30. Understanding the Linux Kernel 20 path Denotes the pathname (relative or absolute) of the file to be opened. flag Specifies how the file must be opened (e.g., read, write, read/write, append). It can also specify whether a nonexistent file should be created. mode Specifies the access rights of a newly created file. This system call creates an "open file" object and returns an identifier called file descriptor. An open file object contains: • Some file-handling data structures, like a pointer to the kernel buffer memory area where file data will be copied; an offset field that denotes the current position in the file from which the next operation will take place (the so-called file pointer); and so on. • Some pointers to kernel functions that the process is enabled to invoke. The set of permitted functions depends on the value of the flag parameter. We'll discuss open file objects in detail in Chapter 12. Let's limit ourselves here to describing some general properties specified by the POSIX semantics: • A file descriptor represents an interaction between a process and an opened file, while an open file object contains data related to that interaction. The same open file object may be identified by several file descriptors. • Several processes may concurrently open the same file. In this case, the filesystem assigns a separate file descriptor to each file, along with a separate open file object. When this occurs, the Unix filesystem does not provide any kind of synchronization among the I/O operations issued by the processes on the same file. However, several system calls such as flock( ) are available to allow processes to synchronize themselves on the entire file or on portions of it (see Chapter 12). In order to create a new file, the process may also invoke the creat( ) system call, which is handled by the kernel exactly like open( ).
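The POSIX properties just listed can be observed with a short experiment. In the sketch below, offsets_after_read( ) and create_file( ) are hypothetical helpers written for this example. The first one opens the same file twice and duplicates the first descriptor with dup( ); reading through the first descriptor should advance the file pointer seen through its duplicate, since both identify the same open file object, but not the one belonging to the second open( ). The call lseek(fd, 0, SEEK_CUR) is used here only to query the current file pointer.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens path twice, duplicates the first descriptor,
 * reads one byte through the first descriptor, and stores the resulting
 * file pointers in offs[0] (first open), offs[1] (second open), and
 * offs[2] (duplicate of the first). Returns 0 on success, -1 on failure. */
int offsets_after_read(const char *path, off_t offs[3])
{
    char c;
    int rc = -1;
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);          /* separate open file object */
    int fd3 = (fd1 >= 0) ? dup(fd1) : -1;    /* same open file object as fd1 */

    if (fd1 >= 0 && fd2 >= 0 && fd3 >= 0 && read(fd1, &c, 1) == 1) {
        offs[0] = lseek(fd1, 0, SEEK_CUR);
        offs[1] = lseek(fd2, 0, SEEK_CUR);
        offs[2] = lseek(fd3, 0, SEEK_CUR);
        rc = 0;
    }
    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    if (fd3 >= 0) close(fd3);
    return rc;
}

/* Hypothetical helper: creates (or truncates) path for writing, with read
 * and write rights for the owner and read rights for everyone else. POSIX
 * specifies creat(path, mode) as equivalent to this open( ) call.
 * Returns a file descriptor, or -1 on failure. */
int create_file(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```

On a nonempty file, the three offsets are expected to be 1, 0, and 1, respectively.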
1.5.6.2 Accessing an opened file Regular Unix files can be addressed either sequentially or randomly, while device files and named pipes are usually accessed sequentially (see Chapter 13). In both kinds of access, the kernel stores the file pointer in the open file object, that is, the current position at which the next read or write operation will take place. Sequential access is implicitly assumed: the read( ) and write( ) system calls always refer to the position of the current file pointer. In order to modify the value, a program must explicitly invoke the lseek( ) system call. When a file is opened, the kernel sets the file pointer to the position of the first byte in the file (offset 0).
  • 31. Understanding the Linux Kernel 21 The lseek( ) system call requires the following parameters: newoffset = lseek(fd, offset, whence); which have the following meanings: fd Indicates the file descriptor of the opened file offset Specifies a signed integer value that will be used for computing the new position of the file pointer whence Specifies whether the new position should be computed by adding the offset value to the number 0 (offset from the beginning of the file), the current file pointer, or the position of the last byte (offset from the end of the file) The read( ) system call requires the following parameters: nread = read(fd, buf, count); which have the following meanings: fd Indicates the file descriptor of the opened file buf Specifies the address of the buffer in the process's address space to which the data will be transferred count Denotes the number of bytes to be read When handling such a system call, the kernel attempts to read count bytes from the file having the file descriptor fd, starting from the current value of the opened file's offset field. In some cases (end-of-file, empty pipe, and so on) the kernel does not succeed in reading all count bytes. The returned nread value specifies the number of bytes actually read. The file pointer is also updated by adding nread to its previous value. The write( ) parameters are similar.
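The interplay of lseek( ) and read( ) described above can be sketched as follows. The functions read_at( ) and file_length( ) are hypothetical helpers written for this example; SEEK_SET and SEEK_END are the standard whence values for an offset taken from the beginning and from the end of the file, respectively.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: reads up to count bytes of the file at path,
 * starting at byte position pos. Returns the number of bytes actually
 * read (possibly fewer than count, or 0 at end-of-file), or -1 on error. */
ssize_t read_at(const char *path, off_t pos, void *buf, size_t count)
{
    ssize_t nread;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    /* SEEK_SET: the new file pointer is 0 + pos. */
    if (lseek(fd, pos, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    nread = read(fd, buf, count);   /* advances the file pointer by nread */
    close(fd);
    return nread;
}

/* Hypothetical helper: returns the length of the file in bytes by moving
 * the file pointer to the end of the file, or -1 on error. */
off_t file_length(const char *path)
{
    off_t end;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    end = lseek(fd, 0, SEEK_END);   /* offset 0 relative to the end */
    close(fd);
    return end;
}
```

For a file holding the 11 bytes "hello world", read_at( ) with pos equal to 6 and a large count is expected to return 5, illustrating a read that stops early at end-of-file.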
  • 32. Understanding the Linux Kernel 22 1.5.6.3 Closing a file When a process does not need to access the contents of a file anymore, it can invoke the system call: res = close(fd); which releases the open file object corresponding to the file descriptor fd. When a process terminates, the kernel closes all its still opened files. 1.5.6.4 Renaming and deleting a file In order to rename or delete a file, a process does not need to open it. Indeed, such operations do not act on the contents of the affected file, but rather on the contents of one or more directories. For example, the system call: res = rename(oldpath, newpath); changes the name of a file link, while the system call: res = unlink(pathname); decrements the file link count and removes the corresponding directory entry. The file is deleted only when the link count assumes the value 0. 1.6 An Overview of Unix Kernels Unix kernels provide an execution environment in which applications may run. Therefore, the kernel must implement a set of services and corresponding interfaces. Applications use those interfaces and do not usually interact directly with hardware resources. 1.6.1 The Process/Kernel Model As already mentioned, a CPU can run either in User Mode or in Kernel Mode. Actually, some CPUs can have more than two execution states. For instance, the Intel 80x86 microprocessors have four different execution states. But all standard Unix kernels make use of only Kernel Mode and User Mode. When a program is executed in User Mode, it cannot directly access the kernel data structures or the kernel programs. When an application executes in Kernel Mode, however, these restrictions no longer apply. Each CPU model provides special instructions to switch from User Mode to Kernel Mode and vice versa. A program executes most of the time in User Mode and switches to Kernel Mode only when requesting a service provided by the kernel. 
When the kernel has satisfied the program's request, it puts the program back in User Mode. Processes are dynamic entities that usually have a limited life span within the system. The task of creating, eliminating, and synchronizing the existing processes is delegated to a group of routines in the kernel. The kernel itself is not a process but a process manager. The process/kernel model assumes that processes that require a kernel service make use of specific programming constructs
  • 33. Understanding the Linux Kernel 23 called system calls. Each system call sets up the group of parameters that identifies the process request and then executes the hardware-dependent CPU instruction to switch from User Mode to Kernel Mode. Besides user processes, Unix systems include a few privileged processes called kernel threads with the following characteristics: • They run in Kernel Mode in the kernel address space. • They do not interact with users, and thus do not require terminal devices. • They are usually created during system startup and remain alive until the system is shut down. Notice how the process/kernel model is somewhat orthogonal to the CPU state: on a uniprocessor system, only one process is running at any time and it may run either in User or in Kernel Mode. If it runs in Kernel Mode, the processor is executing some kernel routine. Figure 1-3 illustrates examples of transitions between User and Kernel Mode. Process 1 in User Mode issues a system call, after which the process switches to Kernel Mode and the system call is serviced. Process 1 then resumes execution in User Mode until a timer interrupt occurs and the scheduler is activated in Kernel Mode. A process switch takes place, and Process 2 starts its execution in User Mode until a hardware device raises an interrupt. As a consequence of the interrupt, Process 2 switches to Kernel Mode and services the interrupt. [Figure 1-3. Transitions between User and Kernel Mode] Unix kernels do much more than handle system calls; in fact, kernel routines can be activated in several ways: • A process invokes a system call. • The CPU executing the process signals an exception, which is some unusual condition such as an invalid instruction. The kernel handles the exception on behalf of the process that caused it. • A peripheral device issues an interrupt signal to the CPU to notify it of an event such as a request for attention, a status change, or the completion of an I/O operation.
Each interrupt signal is handled by a kernel program called an interrupt handler. Since peripheral devices operate asynchronously with respect to the CPU, interrupts occur at unpredictable times. • A kernel thread is executed; since it runs in Kernel Mode, the corresponding program must be considered part of the kernel, albeit encapsulated in a process.
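The first of the activation paths listed above, a process invoking a system call, can be illustrated from user space. The sketch below assumes a Linux system with the GNU C library, whose generic syscall( ) wrapper places the system call number and parameters where the kernel expects them and then executes the hardware-dependent instruction that switches to Kernel Mode. The function raw_getpid( ) is a hypothetical name chosen for this example.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issues the getpid system call through the generic
 * syscall( ) wrapper instead of the dedicated getpid( ) library function.
 * The kernel services the request in Kernel Mode and execution then
 * continues in User Mode with the returned process ID. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Both routes reach the same kernel service, so raw_getpid( ) is expected to return the same value as getpid( ).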
  • 34. Understanding the Linux Kernel 24 1.6.2 Process Implementation To let the kernel manage processes, each process is represented by a process descriptor that includes information about the current state of the process. When the kernel stops the execution of a process, it saves the current contents of several processor registers in the process descriptor. These include: • The program counter (PC) and stack pointer (SP) registers • The general-purpose registers • The floating point registers • The processor control registers (Processor Status Word) containing information about the CPU state • The memory management registers used to keep track of the RAM accessed by the process When the kernel decides to resume executing a process, it uses the proper process descriptor fields to load the CPU registers. Since the stored value of the program counter points to the instruction following the last instruction executed, the process resumes execution from where it was stopped. When a process is not executing on the CPU, it is waiting for some event. Unix kernels distinguish many wait states, which are usually implemented by queues of process descriptors; each (possibly empty) queue corresponds to the set of processes waiting for a specific event. 1.6.3 Reentrant Kernels All Unix kernels are reentrant: this means that several processes may be executing in Kernel Mode at the same time. Of course, on uniprocessor systems only one process can progress, but many of them can be blocked in Kernel Mode waiting for the CPU or the completion of some I/O operation. For instance, after issuing a read to a disk on behalf of some process, the kernel will let the disk controller handle it and will resume executing other processes. An interrupt notifies the kernel when the device has satisfied the read, so the former process can resume the execution. One way to provide reentrancy is to write functions so that they modify only local variables and do not alter global data structures.
Such functions are called reentrant functions. But a reentrant kernel is not limited just to such reentrant functions (although that is how some real-time kernels are implemented). Instead, the kernel can include nonreentrant functions and use locking mechanisms to ensure that only one process can execute a nonreentrant function at a time. Every process in Kernel Mode acts on its own set of memory locations and cannot interfere with the others. If a hardware interrupt occurs, a reentrant kernel is able to suspend the current running process even if that process is in Kernel Mode. This capability is very important, since it improves the throughput of the device controllers that issue interrupts. Once a device has issued an interrupt, it waits until the CPU acknowledges it. If the kernel is able to answer quickly, the device controller will be able to perform other tasks while the CPU handles the interrupt.
Now let's look at kernel reentrancy and its impact on the organization of the kernel. A kernel control path denotes the sequence of instructions executed by the kernel to handle a system call, an exception, or an interrupt.

In the simplest case, the CPU executes a kernel control path sequentially from the first instruction to the last. When one of the following events occurs, however, the CPU interleaves the kernel control paths:
• A process executing in User Mode invokes a system call and the corresponding kernel control path verifies that the request cannot be satisfied immediately; it then invokes the scheduler to select a new process to run. As a result, a process switch occurs. The first kernel control path is left unfinished and the CPU resumes the execution of some other kernel control path. In this case, the two control paths are executed on behalf of two different processes.
• The CPU detects an exception (for example, an access to a page not present in RAM) while running a kernel control path. The first control path is suspended, and the CPU starts the execution of a suitable procedure. In our example, this type of procedure could allocate a new page for the process and read its contents from disk. When the procedure terminates, the first control path can be resumed. In this case, the two control paths are executed on behalf of the same process.
• A hardware interrupt occurs while the CPU is running a kernel control path with interrupts enabled. The first kernel control path is left unfinished and the CPU starts processing another kernel control path to handle the interrupt. The first kernel control path resumes when the interrupt handler terminates. In this case the two kernel control paths run in the execution context of the same process and the total elapsed system time is accounted to it. However, the interrupt handler doesn't necessarily operate on behalf of the process.
Figure 1-4 illustrates a few examples of noninterleaved and interleaved kernel control paths. Three different CPU states are considered: • Running a process in User Mode (User) • Running an exception or a system call handler (Excp) • Running an interrupt handler (Intr) Figure 1-4. Interleaving of kernel control paths
1.6.4 Process Address Space

Each process runs in its private address space. A process running in User Mode refers to private stack, data, and code areas. When running in Kernel Mode, the process addresses the kernel data and code area and makes use of another stack.

Since the kernel is reentrant, several kernel control paths, each related to a different process, may be executed in turn. In this case, each kernel control path refers to its own private kernel stack.

While it appears to each process that it has access to a private address space, there are times when part of the address space is shared among processes. In some cases this sharing is explicitly requested by processes; in others it is done automatically by the kernel to reduce memory usage.

If the same program, say an editor, is needed simultaneously by several users, the program will be loaded into memory only once, and its instructions can be shared by all of the users who need it. Its data, of course, must not be shared, because each user will have separate data. This kind of sharing is done automatically by the kernel to save memory.

Processes can also share parts of their address space as a kind of interprocess communication, using the "shared memory" technique introduced in System V and supported by Linux.

Finally, Linux supports the mmap( ) system call, which allows part of a file or the memory residing on a device to be mapped into a part of a process address space. Memory mapping can provide an alternative to normal reads and writes for transferring data. If the same file is shared by several processes, its memory mapping is included in the address space of each of the processes that share it.
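As a rough illustration of memory mapping as an alternative to read( ), the following user-space sketch maps a file read-only and scans it directly in memory. The helper name count_newlines_mapped( ) is hypothetical and chosen only for this example; the calls themselves (open, fstat, mmap, munmap) are the standard POSIX ones.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count newline bytes in a file by mapping it into
 * the process address space instead of calling read( ) in a loop.
 * Returns the count, or -1 on error. */
long count_newlines_mapped(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap( ) rejects a zero length */
        close(fd);
        return 0;
    }

    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long n = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == '\n')
            n++;

    munmap(p, (size_t)st.st_size);
    return n;
}
```

No explicit read( ) is issued here: the file contents are fetched when the loop touches the mapped pages.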
1.6.5 Synchronization and Critical Regions Implementing a reentrant kernel requires the use of synchronization: if a kernel control path is suspended while acting on a kernel data structure, no other kernel control path will be allowed to act on the same data structure unless it has been reset to a consistent state. Otherwise, the interaction of the two control paths could corrupt the stored information. For example, let's suppose that a global variable V contains the number of available items of some system resource. A first kernel control path A reads the variable and determines that there is just one available item. At this point, another kernel control path B is activated and reads the same variable, which still contains the value 1. Thus, B decrements V and starts using the resource item. Then A resumes the execution; because it has already read the value of V, it assumes that it can decrement V and take the resource item, which B already uses. As a final result, V contains -1, and two kernel control paths are using the same resource item with potentially disastrous effects. When the outcome of some computation depends on how two or more processes are scheduled, the code is incorrect: we say that there is a race condition. In general, safe access to a global variable is ensured by using atomic operations. In the previous example, data corruption would not be possible if the two control paths read and
decrement V with a single, noninterruptible operation. However, kernels contain many data structures that cannot be accessed with a single operation. For example, it usually isn't possible to remove an element from a linked list with a single operation, because the kernel needs to access at least two pointers at once. Any section of code that should be finished by each process that begins it before another process can enter it is called a critical region.[8]

[8] Synchronization problems have been fully described in other works; we refer the interested reader to books on the Unix operating systems (see the bibliography near the end of the book).

These problems occur not only among kernel control paths but also among processes sharing common data. Several synchronization techniques have been adopted. The following section will concentrate on how to synchronize kernel control paths.

1.6.5.1 Nonpreemptive kernels

In search of a drastically simple solution to synchronization problems, most traditional Unix kernels are nonpreemptive: when a process executes in Kernel Mode, it cannot be arbitrarily suspended and substituted with another process. Therefore, on a uniprocessor system all kernel data structures that are not updated by interrupt or exception handlers are safe for the kernel to access.

Of course, a process in Kernel Mode can voluntarily relinquish the CPU, but in this case it must ensure that all data structures are left in a consistent state. Moreover, when it resumes its execution, it must recheck the value of any previously accessed data structures that could be changed.

Nonpreemptability is ineffective in multiprocessor systems, since two kernel control paths running on different CPUs could concurrently access the same data structure.
1.6.5.2 Interrupt disabling Another synchronization mechanism for uniprocessor systems consists of disabling all hardware interrupts before entering a critical region and reenabling them right after leaving it. This mechanism, while simple, is far from optimal. If the critical region is large, interrupts can remain disabled for a relatively long time, potentially causing all hardware activities to freeze. Moreover, on a multiprocessor system this mechanism doesn't work at all. There is no way to ensure that no other CPU can access the same data structures updated in the protected critical region. 1.6.5.3 Semaphores A widely used mechanism, effective in both uniprocessor and multiprocessor systems, relies on the use of semaphores. A semaphore is simply a counter associated with a data structure; the semaphore is checked by all kernel threads before they try to access the data structure. Each semaphore may be viewed as an object composed of: • An integer variable • A list of waiting processes • Two atomic methods: down( ) and up( )
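As a user-space analogue of this object, the sketch below guards a shared counter with a POSIX semaphore initialized to 1, using sem_wait( ) and sem_post( ) in the roles of down( ) and up( ). This is only an illustration under that assumption: the kernel's own semaphores are a separate implementation, and the names run_semaphore_demo( ), NTHREADS, and NITER are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define NTHREADS 4
#define NITER    10000

static sem_t counter_sem;       /* the semaphore guarding shared_counter */
static long shared_counter;     /* the protected data structure */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        sem_wait(&counter_sem);     /* plays the role of down( ): may block */
        shared_counter++;           /* critical region */
        sem_post(&counter_sem);     /* plays the role of up( ): may wake a waiter */
    }
    return NULL;
}

/* Runs NTHREADS workers and returns the final counter value. */
long run_semaphore_demo(void)
{
    pthread_t t[NTHREADS];

    shared_counter = 0;
    int rc = sem_init(&counter_sem, 0, 1);  /* initial value 1: one holder at a time */
    assert(rc == 0);

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    sem_destroy(&counter_sem);
    return shared_counter;
}
```

Because every increment happens between the wait and the post, no update is lost and the result is NTHREADS * NITER.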
The down( ) method decrements the value of the semaphore. If the new value is less than 0, the method adds the running process to the semaphore list and then blocks (i.e., invokes the scheduler). The up( ) method increments the value of the semaphore and, if its new value is greater than or equal to 0, reactivates one or more processes in the semaphore list.

Each data structure to be protected has its own semaphore, which is initialized to 1. When a kernel control path wishes to access the data structure, it executes the down( ) method on the proper semaphore. If the new value of the semaphore isn't negative, access to the data structure is granted. Otherwise, the process that is executing the kernel control path is added to the semaphore list and blocked. When another process executes the up( ) method on that semaphore, one of the processes in the semaphore list is allowed to proceed.

1.6.5.4 Spin locks

In multiprocessor systems, semaphores are not always the best solution to the synchronization problems. Some kernel data structures should be protected from being concurrently accessed by kernel control paths that run on different CPUs. In this case, if the time required to update the data structure is short, a semaphore could be very inefficient. To check a semaphore, the kernel must insert a process in the semaphore list and then suspend it. Since both operations are relatively expensive, in the time it takes to complete them, the other kernel control path could have already released the semaphore.

In these cases, multiprocessor operating systems make use of spin locks. A spin lock is very similar to a semaphore, but it has no process list: when a process finds the lock closed by another process, it "spins" around repeatedly, executing a tight instruction loop until the lock becomes open.

Of course, spin locks are useless in a uniprocessor environment.
When a kernel control path tries to access a locked data structure, it starts an endless loop. Therefore, the kernel control path that is updating the protected data structure would not have a chance to continue the execution and release the spin lock. The final result is that the system hangs.

1.6.5.5 Avoiding deadlocks

Processes or kernel control paths that synchronize with other control paths may easily enter a deadlocked state. The simplest case of deadlock occurs when process p1 gains access to data structure a and process p2 gains access to b, but p1 then waits for b and p2 waits for a. Other more complex cyclic waits among groups of processes may also occur. Of course, a deadlock condition causes a complete freeze of the affected processes or kernel control paths.

As far as kernel design is concerned, deadlock becomes an issue when the number of kernel semaphore types used is high. In this case, it may be quite difficult to ensure that no deadlock state will ever be reached for all possible ways to interleave kernel control paths. Several operating systems, including Linux, avoid this problem by introducing a very limited number of semaphore types and by requesting semaphores in an ascending order.
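The idea of always requesting locks in a fixed ascending order can be sketched in user space as follows. This is not how the kernel orders its semaphores; as an assumption made for the example, it uses pthread mutexes and takes the address of each resource as the ordering key. The names struct resource, lock_pair( ), and transfer( ) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical resource type: each protected structure carries its lock. */
struct resource {
    pthread_mutex_t lock;
    int value;
};

/* Acquire both locks in ascending address order, whatever order the
 * caller names them in. Two paths locking the same pair therefore
 * always agree on which lock comes first, so the p1/p2 cycle described
 * in the text cannot form. */
static void lock_pair(struct resource *x, struct resource *y)
{
    assert(x != y);
    if ((uintptr_t)x > (uintptr_t)y) {
        struct resource *tmp = x;
        x = y;
        y = tmp;
    }
    pthread_mutex_lock(&x->lock);
    pthread_mutex_lock(&y->lock);
}

static void unlock_pair(struct resource *x, struct resource *y)
{
    pthread_mutex_unlock(&x->lock);
    pthread_mutex_unlock(&y->lock);
}

/* Move `amount` units from one resource to the other. */
void transfer(struct resource *from, struct resource *to, int amount)
{
    lock_pair(from, to);
    from->value -= amount;
    to->value += amount;
    unlock_pair(from, to);
}
```

A call transfer(&a, &b, n) on one thread and transfer(&b, &a, m) on another both lock the lower-addressed resource first, so neither can hold one lock while waiting forever for the other.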
1.6.6 Signals and Interprocess Communication

Unix signals provide a mechanism for notifying processes of system events. Each event has its own signal number, which is usually referred to by a symbolic constant such as SIGTERM. There are two kinds of system events:

Asynchronous notifications
For instance, a user can send the interrupt signal SIGINT to a foreground process by pressing the interrupt keycode (usually, CTRL-C) at the terminal.

Synchronous errors or exceptions
For instance, the kernel sends the signal SIGSEGV to a process when it accesses a memory location at an illegal address.

The POSIX standard defines about 20 different signals, two of which are user-definable and may be used as a primitive mechanism for communication and synchronization among processes in User Mode. In general, a process may react to a signal reception in two possible ways:
• Ignore the signal.
• Asynchronously execute a specified procedure (the signal handler).

If the process does not specify one of these alternatives, the kernel performs a default action that depends on the signal number. The five possible default actions are:
• Terminate the process.
• Write the execution context and the contents of the address space in a file (core dump) and terminate the process.
• Ignore the signal.
• Suspend the process.
• Resume the process's execution, if it was stopped.

Kernel signal handling is rather elaborate since the POSIX semantics allow processes to temporarily block signals. Moreover, a few signals such as SIGKILL cannot be directly handled by the process and cannot be ignored.

AT&T's Unix System V introduced other kinds of interprocess communication among processes in User Mode, which have been adopted by many Unix kernels: semaphores, message queues, and shared memory. They are collectively known as System V IPC.
The kernel implements these constructs as IPC resources: a process acquires a resource by invoking a shmget( ), semget( ), or msgget( ) system call. Just like files, IPC resources are persistent: they must be explicitly deallocated by the creator process, by the current owner, or by a superuser process. Semaphores are similar to those described in Section 1.6.5 earlier in this chapter, except that they are reserved for processes in User Mode. Message queues allow processes to exchange
messages by making use of the msgsnd( ) and msgrcv( ) system calls, which respectively insert a message into a specific message queue and extract a message from it.

Shared memory provides the fastest way for processes to exchange and share data. A process starts by issuing a shmget( ) system call to create a new shared memory region of the required size. After obtaining the IPC resource identifier, the process invokes the shmat( ) system call, which returns the starting address of the new region within the process address space. When the process wishes to detach the shared memory from its address space, it invokes the shmdt( ) system call. The implementation of shared memory depends on how the kernel implements process address spaces.

1.6.7 Process Management

Unix makes a neat distinction between the process and the program it is executing. To that end, the fork( ) and exit( ) system calls are used respectively to create a new process and to terminate it, while an exec( )-like system call is invoked to load a new program. After such a system call has been executed, the process resumes execution with a brand new address space containing the loaded program.

The process that invokes a fork( ) is the parent while the new process is its child. Parents and children can find each other because the data structure describing each process includes a pointer to its immediate parent and pointers to all its immediate children.

A naive implementation of fork( ) would require both the parent's data and the parent's code to be duplicated and the copies assigned to the child. This would be quite time-consuming. Current kernels that can rely on hardware paging units follow the Copy-On-Write approach, which defers page duplication until the last moment (i.e., until the parent or the child is required to write into a page). We shall describe how Linux implements this technique in Section 7.4.4 in Chapter 7.
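The visible effect of fork( ), whether or not the kernel uses Copy-On-Write underneath, is that a write by the child never shows up in the parent. A minimal user-space sketch of that behavior follows; the function name parent_value_after_child_write( ) is hypothetical, and the code cannot observe the page duplication itself, only its result.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's view of `value` after the child has overwritten
 * its own copy; returns -1 if fork( ) or waitpid( ) fails. */
int parent_value_after_child_write(void)
{
    int value = 42;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        value = 99;     /* this write lands in the child's private copy */
        _exit(0);       /* leave at once, skipping the parent's exit handlers */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status));

    return value;       /* unchanged in the parent */
}
```

With Copy-On-Write, the page holding value is physically shared until one side writes to it; only then does the writer receive its own page frame.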
The exit( ) system call terminates a process. The kernel handles this system call by releasing the resources owned by the process and sending the parent process a SIGCHLD signal, which is ignored by default. 1.6.7.1 Zombie processes How can a parent process inquire about termination of its children? The wait( ) system call allows a process to wait until one of its children terminates; it returns the process ID (PID) of the terminated child. When executing this system call, the kernel checks whether a child has already terminated. A special zombie process state is introduced to represent terminated processes: a process remains in that state until its parent process executes a wait( ) system call on it. The system call handler extracts some data about resource usage from the process descriptor fields; the process descriptor may be released once the data has been collected. If no child process has already terminated when the wait( ) system call is executed, the kernel usually puts the process in a wait state until a child terminates. Many kernels also implement a waitpid( ) system call, which allows a process to wait for a specific child process. Other variants of wait( ) system calls are also quite common.
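The following sketch shows the parent's side of this protocol in user space: it creates a child, lets it terminate, and then collects its exit status with waitpid( ), which is the point at which the zombie's process descriptor can be released. The helper name spawn_and_reap( ) is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits immediately with `code`, waits for that
 * specific child, and returns the exit status reported by the kernel.
 * Returns -1 on failure. */
int spawn_and_reap(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: terminate right away */

    /* Between the child's exit and this call, the child is a zombie. */
    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    if (reaped != pid || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```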
It's a good practice for the kernel to keep around information on a child process until the parent issues its wait( ) call, but what if the parent process terminates without issuing that call? The information takes up valuable memory slots that could be used to serve living processes. For example, many shells allow the user to start a command in the background and then log out. The process that is running the command shell terminates, but its children continue their execution.

The solution lies in a special system process called init that is created during system initialization. When a process terminates, the kernel changes the appropriate process descriptor pointers of all the existing children of the terminated process to make them become children of init. This process monitors the execution of all its children and routinely issues wait( ) system calls, whose side effect is to get rid of all zombies.

1.6.7.2 Process groups and login sessions

Modern Unix operating systems introduce the notion of process groups to represent a "job" abstraction. For example, in order to execute the command line:

    $ ls | sort | more

a shell that supports process groups, such as bash, creates a new group for the three processes corresponding to ls, sort, and more. In this way, the shell acts on the three processes as if they were a single entity (the job, to be precise). Each process descriptor includes a process group ID field. Each group of processes may have a group leader, which is the process whose PID coincides with the process group ID. A newly created process is initially inserted into the process group of its parent.

Modern Unix kernels also introduce login sessions. Informally, a login session contains all processes that are descendants of the process that has started a working session on a specific terminal (usually, the first command shell process created for the user).
All processes in a process group must be in the same login session. A login session may have several process groups active simultaneously; one of these process groups is always in the foreground, which means that it has access to the terminal. The other active process groups are in the background. When a background process tries to access the terminal, it receives a SIGTTIN or SIGTTOU signal. In many command shells the internal commands bg and fg can be used to put a process group in either the background or the foreground.

1.6.8 Memory Management

Memory management is by far the most complex activity in a Unix kernel. We shall dedicate more than a third of this book just to describing how Linux does it. This section illustrates some of the main issues related to memory management.

1.6.8.1 Virtual memory

All recent Unix systems provide a useful abstraction called virtual memory. Virtual memory acts as a logical layer between the application memory requests and the hardware Memory Management Unit (MMU). Virtual memory has many purposes and advantages:
• Several processes can be executed concurrently.
• It is possible to run applications whose memory needs are larger than the available physical memory.
• Processes can execute a program whose code is only partially loaded in memory.
• Each process is allowed to access a subset of the available physical memory.
• Processes can share a single memory image of a library or program.
• Programs can be relocatable, that is, they can be placed anywhere in physical memory.
• Programmers can write machine-independent code, since they do not need to be concerned about physical memory organization.

The main ingredient of a virtual memory subsystem is the notion of virtual address space. The set of memory references that a process can use is different from physical memory addresses. When a process uses a virtual address,[9] the kernel and the MMU cooperate to locate the actual physical location of the requested memory item.

[9] These addresses have different nomenclatures depending on the computer architecture. As we'll see in Chapter 2, Intel 80x86 manuals refer to them as "logical addresses."

Today's CPUs include hardware circuits that automatically translate the virtual addresses into physical ones. To that end, the available RAM is partitioned into page frames 4 or 8 KB in length, and a set of page tables is introduced to specify the correspondence between virtual and physical addresses. These circuits make memory allocation simpler, since a request for a block of contiguous virtual addresses can be satisfied by allocating a group of page frames having noncontiguous physical addresses.

1.6.8.2 Random access memory usage

All Unix operating systems clearly distinguish two portions of the random access memory (RAM). A few megabytes are dedicated to storing the kernel image (i.e., the kernel code and the kernel static data structures).
The remaining portion of RAM is usually handled by the virtual memory system and is used in three possible ways:
• To satisfy kernel requests for buffers, descriptors, and other dynamic kernel data structures
• To satisfy process requests for generic memory areas and for memory mapping of files
• To get better performance from disks and other buffered devices by means of caches

Each request type is valuable. On the other hand, since the available RAM is limited, some balancing among request types must be done, particularly when little available memory is left. Moreover, when some critical threshold of available memory is reached and a page-frame-reclaiming algorithm is invoked to free additional memory, which are the page frames most suitable for reclaiming? As we shall see in Chapter 16, there is no simple answer to this question and very little support from theory. The only available solution lies in developing carefully tuned empirical algorithms.

One major problem that must be solved by the virtual memory system is memory fragmentation. Ideally, a memory request should fail only when the number of free page frames is too small. However, the kernel is often forced to use physically contiguous memory areas, hence the memory request could fail even if there is enough memory available but it is not available as one contiguous chunk.
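The fragmentation problem can be made concrete with a toy model. In the sketch below, which is purely illustrative and not related to any kernel data structure, an array of booleans marks which page frames are free; a request for contiguous frames can fail even though the total number of free frames would be sufficient. All names (struct frame_stats, scan_frames( ), can_allocate_contiguous( )) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: frame_free[i] is true when page frame i is free. */
struct frame_stats {
    size_t total_free;    /* number of free frames overall */
    size_t largest_run;   /* longest run of adjacent free frames */
};

struct frame_stats scan_frames(const bool *frame_free, size_t n)
{
    assert(frame_free != NULL || n == 0);

    struct frame_stats s = { 0, 0 };
    size_t run = 0;

    for (size_t i = 0; i < n; i++) {
        if (frame_free[i]) {
            s.total_free++;
            if (++run > s.largest_run)
                s.largest_run = run;
        } else {
            run = 0;      /* an allocated frame breaks the run */
        }
    }
    return s;
}

/* A request for `want` physically contiguous frames can be met only
 * if some run of free frames is at least that long. */
bool can_allocate_contiguous(const bool *frame_free, size_t n, size_t want)
{
    return scan_frames(frame_free, n).largest_run >= want;
}
```

For example, with frames {free, used, free, free, used, free, used, free} there are five free frames, yet a request for four contiguous frames fails because the longest free run is two.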
1.6.8.3 Kernel Memory Allocator

The Kernel Memory Allocator (KMA) is a subsystem that tries to satisfy the requests for memory areas from all parts of the system. Some of these requests will come from other kernel subsystems needing memory for kernel use, and some requests will come via system calls from user programs to increase their processes' address spaces. A good KMA should have the following features:
• It must be fast. Actually, this is the most crucial attribute, since it is invoked by all kernel subsystems (including the interrupt handlers).
• It should minimize the amount of wasted memory.
• It should try to reduce the memory fragmentation problem.
• It should be able to cooperate with the other memory management subsystems in order to borrow and release page frames from them.

Several kinds of KMAs have been proposed, which are based on a variety of different algorithmic techniques, including:
• Resource map allocator
• Power-of-two free lists
• McKusick-Karels allocator
• Buddy system
• Mach's Zone allocator
• Dynix allocator
• Solaris's Slab allocator

As we shall see in Chapter 6, Linux's KMA uses a Slab allocator on top of a Buddy system.

1.6.8.4 Process virtual address space handling

The address space of a process contains all the virtual memory addresses that the process is allowed to reference. The kernel usually stores a process virtual address space as a list of memory area descriptors.
For example, when a process starts the execution of some program via an exec( )-like system call, the kernel assigns to the process a virtual address space that comprises memory areas for: • The executable code of the program • The initialized data of the program • The uninitialized data of the program • The initial program stack (that is, the User Mode stack) • The executable code and data of needed shared libraries • The heap (the memory dynamically requested by the program) All recent Unix operating systems adopt a memory allocation strategy called demand paging. With demand paging, a process can start program execution with none of its pages in physical memory. As it accesses a nonpresent page, the MMU generates an exception; the exception handler finds the affected memory region, allocates a free page, and initializes it with the appropriate data. In a similar fashion, when the process dynamically requires some memory by using malloc( ) or the brk( ) system call (which is invoked internally by malloc( )), the kernel just updates the size of the heap memory region of the process. A page frame is
assigned to the process only when it generates an exception by trying to refer to its virtual memory addresses.

Virtual address spaces also allow other efficient strategies, such as the Copy-On-Write strategy mentioned earlier. For example, when a new process is created, the kernel just assigns the parent's page frames to the child address space, but it marks them read only. An exception is raised as soon as the parent or the child tries to modify the contents of a page. The exception handler assigns a new page frame to the affected process and initializes it with the contents of the original page.

1.6.8.5 Swapping and caching

In order to extend the size of the virtual address space usable by the processes, the Unix operating system makes use of swap areas on disk. The virtual memory system regards the contents of a page frame as the basic unit for swapping. Whenever some process refers to a swapped-out page, the MMU raises an exception. The exception handler then allocates a new page frame and initializes the page frame with its old contents saved on disk.

On the other hand, physical memory is also used as cache for hard disks and other block devices. This is because hard drives are very slow: a disk access requires several milliseconds, which is a very long time compared with the RAM access time. Therefore, disks are often the bottleneck in system performance. As a general rule, one of the policies already implemented in the earliest Unix system is to defer writing to disk as long as possible by loading into RAM a set of disk buffers corresponding to blocks read from disk. The sync( ) system call forces disk synchronization by writing all of the "dirty" buffers (i.e., all the buffers whose contents differ from that of the corresponding disk blocks) to disk. In order to avoid data loss, all operating systems take care to periodically write dirty buffers back to disk.
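From a user program's point of view, deferred writing means that a successful write( ) usually only fills buffers in RAM. The sketch below forces one file's dirty buffers out with fsync( ), the per-file counterpart of the sync( ) call mentioned above; fsync( ) is not discussed in the text and is used here as an assumption to keep the example narrow. The helper name write_durably( ) is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `text` to `path`, then block until the
 * kernel reports that the file's dirty buffers have been written out.
 * Returns 0 on success, -1 on failure. */
int write_durably(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);   /* normally lands in RAM buffers only */

    /* fsync( ) flushes this file's dirty buffers; sync( ) would
     * request the same for every dirty buffer in the system. */
    int ok = (n == (ssize_t)len) && fsync(fd) == 0;

    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```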
1.6.9 Device Drivers The kernel interacts with I/O devices by means of device drivers. Device drivers are included in the kernel and consist of data structures and functions that control one or more devices, such as hard disks, keyboards, mouses, monitors, network interfaces, and devices connected to a SCSI bus. Each driver interacts with the remaining part of the kernel (even with other drivers) through a specific interface. This approach has the following advantages: • Device-specific code can be encapsulated in a specific module. • Vendors can add new devices without knowing the kernel source code: only the interface specifications must be known. • The kernel deals with all devices in a uniform way and accesses them through the same interface. • It is possible to write a device driver as a module that can be dynamically loaded in the kernel without requiring the system to be rebooted. It is also possible to dynamically unload a module that is no longer needed, thus minimizing the size of the kernel image stored in RAM. Figure 1-5 illustrates how device drivers interface with the rest of the kernel and with the processes. Some user programs (P) wish to operate on hardware devices. They make requests to the kernel using the usual file-related system calls and the device files normally found in the /dev directory. Actually, the device files are the user-visible portion of the device driver
interface. Each device file refers to a specific device driver, which is invoked by the kernel in order to perform the requested operation on the hardware component.

Figure 1-5. Device driver interface

It is worth mentioning that at the time Unix was introduced graphical terminals were uncommon and expensive, and thus only alphanumeric terminals were handled directly by Unix kernels. When graphical terminals became widespread, ad hoc applications such as the X Window System were introduced that ran as standard processes and accessed the I/O ports of the graphics interface and the RAM video area directly. Some recent Unix kernels, such as Linux 2.2, include limited support for some frame buffer devices, thus allowing a program to access the local memory inside a video card through a device file.
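As noted in Section 1.6.9, user programs reach a driver through ordinary file-related system calls on a device file. The sketch below illustrates this with two hypothetical helpers: one checks that a path names a character device file, and the other reads from it with plain open( ), read( ), and close( ). The choice of /dev/zero in the usage note is an assumption made for the example, since it exists on typical Linux systems and always returns zero bytes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns true if `path` names a character device file,
 * false if it does not or if stat( ) fails. */
bool is_char_device(const char *path)
{
    assert(path != NULL);

    struct stat st;
    return stat(path, &st) == 0 && S_ISCHR(st.st_mode);
}

/* Reads up to `len` bytes from a device file using the same calls a
 * program would use on a regular file. Returns the number of bytes
 * read, or -1 on error. */
ssize_t read_device(const char *path, unsigned char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, len);   /* the kernel dispatches this to the driver */
    close(fd);
    return n;
}
```

Calling read_device("/dev/zero", buf, 16) fills buf with sixteen zero bytes supplied by the driver behind that device file, with no device-specific code in the caller.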
Chapter 2. Memory Addressing

This chapter deals with addressing techniques. Luckily, an operating system is not forced to keep track of physical memory all by itself; today's microprocessors include several hardware circuits to make memory management both more efficient and more robust in case of programming errors.

As in the rest of this book, we offer details in this chapter on how Intel 80x86 microprocessors address memory chips and how Linux makes use of the available addressing circuits. You will find, we hope, that when you learn the implementation details on Linux's most popular platform you will better understand both the general theory of paging and how to research the implementation on other platforms.

This is the first of three chapters related to memory management: Chapter 6 discusses how the kernel allocates main memory to itself, while Chapter 7 considers how linear addresses are assigned to processes.

2.1 Memory Addresses

Programmers casually refer to a memory address as the way to access the contents of a memory cell. But when dealing with Intel 80x86 microprocessors, we have to distinguish among three kinds of addresses:

Logical address
Included in the machine language instructions to specify the address of an operand or of an instruction. This type of address embodies the well-known Intel segmented architecture that forces MS-DOS and Windows programmers to divide their programs into segments. Each logical address consists of a segment and an offset (or displacement) that denotes the distance from the start of the segment to the actual address.

Linear address
A single 32-bit unsigned integer that can be used to address up to 4 GB, that is, up to 4,294,967,296 memory cells. Linear addresses are usually represented in hexadecimal notation; their values range from 0x00000000 to 0xffffffff.

Physical address
Used to address memory cells included in memory chips.
They correspond to the electrical signals sent along the address pins of the microprocessor to the memory bus. Physical addresses are represented as 32-bit unsigned integers. The CPU control unit transforms a logical address into a linear address by means of a hardware circuit called a segmentation unit; successively, a second hardware circuit called a paging unit transforms the linear address into a physical address (see Figure 2-1).
  • 47. Understanding the Linux Kernel 37

Figure 2-1. Logical address translation

2.2 Segmentation in Hardware

Starting with the 80386 model, Intel microprocessors perform address translation in two different ways called real mode and protected mode. Real mode exists mostly to maintain processor compatibility with older models and to allow the operating system to bootstrap (see Appendix A for a short description of real mode). We shall thus focus our attention on protected mode.

2.2.1 Segmentation Registers

A logical address consists of two parts: a segment identifier and an offset that specifies the relative address within the segment. The segment identifier is a 16-bit field called Segment Selector, while the offset is a 32-bit field.

To make it easy to retrieve segment selectors quickly, the processor provides segmentation registers whose only purpose is to hold Segment Selectors; these registers are called cs, ss, ds, es, fs, and gs. Although there are only six of them, a program can reuse the same segmentation register for different purposes by saving its content in memory and then restoring it later.

Three of the six segmentation registers have specific purposes:

cs
The code segment register, which points to a segment containing program instructions

ss
The stack segment register, which points to a segment containing the current program stack

ds
The data segment register, which points to a segment containing static and external data

The remaining three segmentation registers are general purpose and may refer to arbitrary segments.

The cs register has another important function: it includes a 2-bit field that specifies the Current Privilege Level (CPL) of the CPU. The value 0 denotes the highest privilege level, while the value 3 denotes the lowest one. Linux uses only levels 0 and 3, which are respectively called Kernel Mode and User Mode.
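As a minimal sketch of the privilege field just described: on the 80x86 the privilege level occupies the two least significant bits of a Segment Selector, so the CPL can be read by masking the value held in cs. The function names and the sample selector values used in the note below are assumptions made for illustration; they are not taken from the text.

```c
#include <stdint.h>

/* The two privilege levels Linux uses, as named in the text. */
enum { KERNEL_MODE = 0, USER_MODE = 3 };

/* The privilege level is the low-order 2-bit field of a selector. */
unsigned privilege_level(uint16_t selector)
{
    return selector & 0x3u;
}

/* Nonzero when a cs value with this selector means Kernel Mode. */
int in_kernel_mode(uint16_t cs)
{
    return privilege_level(cs) == KERNEL_MODE;
}
```

With the illustrative selectors 0x10 and 0x23, privilege_level( ) yields 0 and 3 respectively, so only the first one corresponds to Kernel Mode.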
  • 48. Understanding the Linux Kernel 38

2.2.2 Segment Descriptors

Each segment is represented by an 8-byte Segment Descriptor (see Figure 2-2) that describes the segment characteristics. Segment Descriptors are stored either in the Global Descriptor Table (GDT) or in the Local Descriptor Table (LDT).

Figure 2-2. Segment Descriptor format

Usually only one GDT is defined, while each process may have its own LDT. The address of the GDT in main memory is contained in the gdtr processor register and the address of the currently used LDT is contained in the ldtr processor register.

Each Segment Descriptor consists of the following fields:

• A 32-bit Base field that contains the linear address of the first byte of the segment.
• A G granularity flag: if it is cleared, the segment size is expressed in bytes; otherwise, it is expressed in multiples of 4096 bytes.
• A 20-bit Limit field that denotes the segment length, in the units selected by the G flag. If G is set to 0, the size of a non-null segment may vary between 1 byte and 1 MB; otherwise, it may vary between 4 KB and 4 GB.
• An S system flag: if it is cleared, the segment is a system segment that stores kernel data structures; otherwise, it is a normal code or data segment.
• A 4-bit Type field that characterizes the segment type and its access rights. The following Segment Descriptor types are widely used:

Code Segment Descriptor
Indicates that the Segment Descriptor refers to a code segment; it may be included either in the GDT or in the LDT. The descriptor has the S flag set.
  • 49. Understanding the Linux Kernel 39

Data Segment Descriptor
Indicates that the Segment Descriptor refers to a data segment; it may be included either in the GDT or in the LDT. The descriptor has the S flag set. Stack segments are implemented by means of generic data segments.

Task State Segment Descriptor (TSSD)
Indicates that the Segment Descriptor refers to a Task State Segment (TSS), that is, a segment used to save the contents of the processor registers (see Section 3.2.2 in Chapter 3); it can appear only in the GDT. The corresponding Type field has the value 11 or 9, depending on whether the corresponding process is currently executing on the CPU. The S flag of such descriptors is set to 0.

Local Descriptor Table Descriptor (LDTD)
Indicates that the Segment Descriptor refers to a segment containing an LDT; it can appear only in the GDT. The corresponding Type field has the value 2. The S flag of such descriptors is set to 0.

• A DPL (Descriptor Privilege Level) 2-bit field used to restrict accesses to the segment. It represents the minimal CPU privilege level required for accessing the segment. Therefore, a segment with its DPL set to 0 is accessible only when the CPL is 0, that is, in Kernel Mode, while a segment with its DPL set to 3 is accessible with every CPL value.
• A Segment-Present flag that is set to 0 if the segment is currently not stored in main memory. Linux always sets this field to 1, since it never swaps out whole segments to disk.
• An additional flag called D or B depending on whether the segment contains code or data. Its meaning is slightly different in the two cases, but it is basically set if the addresses used as segment offsets are 32 bits long and it is cleared if they are 16 bits long (see the Intel manual for further details).
• A reserved bit (bit 53) always set to 0.
• An AVL flag that may be used by the operating system but is ignored in Linux.
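The field list above can be made concrete with a short decoding sketch. It assumes the standard 80x86 bit positions that Figure 2-2 depicts (the text itself names only bit 53 explicitly); the struct seg_desc type and the two function names are hypothetical, chosen for this illustration.

```c
#include <stdint.h>

/* Fields of an 8-byte Segment Descriptor, unpacked for readability. */
struct seg_desc {
    uint32_t base;     /* 32-bit Base: linear address of the first byte */
    uint32_t limit;    /* 20-bit Limit */
    unsigned type;     /* 4-bit Type */
    unsigned s;        /* S system flag */
    unsigned dpl;      /* 2-bit Descriptor Privilege Level */
    unsigned present;  /* Segment-Present flag */
    unsigned avl;      /* AVL flag */
    unsigned db;       /* D or B flag */
    unsigned g;        /* G granularity flag */
};

/* Split a raw descriptor into its fields (assumed standard layout). */
struct seg_desc decode_descriptor(uint64_t raw)
{
    struct seg_desc d;
    d.limit   = (uint32_t)(raw & 0xffff)                    /* bits 0..15  */
              | ((uint32_t)((raw >> 48) & 0xf) << 16);      /* bits 48..51 */
    d.base    = (uint32_t)((raw >> 16) & 0xffffff)          /* bits 16..39 */
              | ((uint32_t)((raw >> 56) & 0xff) << 24);     /* bits 56..63 */
    d.type    = (unsigned)((raw >> 40) & 0xf);
    d.s       = (unsigned)((raw >> 44) & 1);
    d.dpl     = (unsigned)((raw >> 45) & 3);
    d.present = (unsigned)((raw >> 47) & 1);
    d.avl     = (unsigned)((raw >> 52) & 1);
    d.db      = (unsigned)((raw >> 54) & 1);
    d.g       = (unsigned)((raw >> 55) & 1);
    return d;
}

/* Segment size in bytes: Limit counts bytes when G is cleared,
 * and 4096-byte units when G is set. */
uint64_t segment_size(const struct seg_desc *d)
{
    uint64_t unit = d->g ? 4096u : 1u;
    return ((uint64_t)d->limit + 1) * unit;
}
```

Decoding the illustrative value 0x00cf9a000000ffff gives Base 0, Limit 0xfffff, Type 0xa, S, Segment-Present, D/B and G all set, DPL 0, and a size of 4 GB, which matches the upper bound quoted for the Limit field when G is set.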
2.2.3 Segment Selectors To speed up the translation of logical addresses into linear addresses, the Intel processor provides an additional nonprogrammable register—that is, a register that cannot be set by a programmer—for each of the six programmable segmentation registers. Each nonprogrammable register contains the 8-byte Segment Descriptor (described in the previous section) specified by the Segment Selector contained in the corresponding segmentation register. Every time a Segment Selector is loaded in a segmentation register, the corresponding Segment Descriptor is loaded from memory into the matching nonprogrammable CPU register. From then on, translations of logical addresses referring to that segment can be performed without accessing the GDT or LDT stored in main memory; the processor can just refer directly to the CPU register containing the Segment Descriptor. Accesses to the GDT or LDT are necessary only when the contents of the segmentation register change (see Figure 2-3). Each Segment Selector includes the following fields:
  • 50. Understanding the Linux Kernel 40

• A 13-bit index (described further in the text following this list) that identifies the corresponding Segment Descriptor entry contained in the GDT or in the LDT
• A TI (Table Indicator) flag that specifies whether the Segment Descriptor is included in the GDT (TI = 0) or in the LDT (TI = 1)
• An RPL (Requestor Privilege Level) 2-bit field, which is precisely the Current Privilege Level of the CPU when the corresponding Segment Selector is loaded into the cs register [1]

[1] The RPL field may also be used to selectively weaken the processor privilege level when accessing data segments; see Intel documentation for details.

Figure 2-3. Segment Selector and Segment Descriptor

Since a Segment Descriptor is 8 bytes long, its relative address inside the GDT or the LDT is obtained by multiplying the most significant 13 bits of the Segment Selector by 8. For instance, if the GDT is at 0x00020000 (the value stored in the gdtr register) and the index specified by the Segment Selector is 2, the address of the corresponding Segment Descriptor is 0x00020000 + (2 x 8), or 0x00020010.

The first entry of the GDT is always set to 0: this ensures that logical addresses with a null Segment Selector will be considered invalid, thus causing a processor exception. The maximum number of Segment Descriptors that can be stored in the GDT is thus 8191, that is, 2^13 - 1.

2.2.4 Segmentation Unit

Figure 2-4 shows in detail how a logical address is translated into a corresponding linear address. The segmentation unit performs the following operations:

• Examines the TI field of the Segment Selector, in order to determine which Descriptor Table stores the Segment Descriptor.
This field indicates that the Descriptor is either in the GDT (in which case the segmentation unit gets the base linear address of the GDT from the gdtr register) or in the active LDT (in which case the segmentation unit gets the base linear address of that LDT from the ldtr register).
• Computes the address of the Segment Descriptor from the index field of the Segment Selector. The index field is multiplied by 8 (the size of a Segment Descriptor), and the result is added to the content of the gdtr or ldtr register.
• Adds to the Base field of the Segment Descriptor the offset of the logical address, thus obtaining the linear address.
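The three operations above can be sketched in a few lines of C. This is an illustration under stated assumptions, not the hardware's actual logic: the function names are hypothetical, gdtr and ldtr are passed in as plain base addresses, and the limit and access-rights checks a real segmentation unit performs are left out.

```c
#include <stdint.h>

#define DESCRIPTOR_SIZE 8u  /* bytes per Segment Descriptor */

/* Index: the 13 most significant bits of the 16-bit selector. */
unsigned selector_index(uint16_t sel)
{
    return (unsigned)sel >> 3;
}

/* TI flag: 0 selects the GDT, 1 selects the LDT. */
unsigned selector_ti(uint16_t sel)
{
    return ((unsigned)sel >> 2) & 1u;
}

/* Operations 1 and 2: choose the table base by TI, then add index * 8. */
uint32_t descriptor_address(uint16_t sel, uint32_t gdtr, uint32_t ldtr)
{
    uint32_t table = selector_ti(sel) ? ldtr : gdtr;
    return table + selector_index(sel) * DESCRIPTOR_SIZE;
}

/* Operation 3: add the logical-address offset to the descriptor's Base. */
uint32_t linear_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}
```

A selector with index 2 and TI = 0 has the value 0x10; with the GDT at 0x00020000, descriptor_address( ) returns 0x00020010, reproducing the worked example given earlier for the gdtr register.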
  • 51. Understanding the Linux Kernel 41

Figure 2-4. Translating a logical address

Notice that, thanks to the nonprogrammable registers associated with the segmentation registers, the first two operations need to be performed only when a segmentation register has been changed.

2.3 Segmentation in Linux

Segmentation has been included in Intel microprocessors to encourage programmers to split their applications into logically related entities, such as subroutines or global and local data areas. However, Linux uses segmentation in a very limited way. In fact, segmentation and paging are somewhat redundant since both can be used to separate the physical address spaces of processes: segmentation can assign a different linear address space to each process while paging can map the same linear address space into different physical address spaces. Linux prefers paging to segmentation for the following reasons:

• Memory management is simpler when all processes use the same segment register values, that is, when they share the same set of linear addresses.
• One of the design objectives of Linux is portability to the most popular architectures; however, several RISC processors support segmentation in a very limited way.

The 2.2 version of Linux uses segmentation only when required by the Intel 80x86 architecture. In particular, all processes use the same logical addresses, so the total number of segments to be defined is quite limited and it is possible to store all Segment Descriptors in the Global Descriptor Table (GDT). This table is implemented by the array gdt_table referred to by the gdt variable. If you look in the Source Code Index, you can see that these symbols are defined in the file arch/i386/kernel/head.S. Every macro, function, and other symbol in this book is listed in the appendix so you can quickly find it in the source code.

Local Descriptor Tables are not used by the kernel, although a system call exists that allows processes to create their own LDTs.
This turns out to be useful to applications such as Wine that execute segment-oriented Microsoft Windows applications.
  • 52. Understanding the Linux Kernel 42

Here are the segments used by Linux:

• A kernel code segment. The fields of the corresponding Segment Descriptor in the GDT have the following values:
  o Base = 0x00000000
  o Limit = 0xfffff
  o G (granularity flag) = 1, for segment size expressed in pages
  o S (system flag) = 1, for normal code or data segment
  o Type = 0xa, for code segment that can be read and executed
  o DPL (Descriptor Privilege Level) = 0, for Kernel Mode
  o D/B (32-bit address flag) = 1, for 32-bit offset addresses
Thus, the linear addresses associated with that segment start at 0 and reach the addressing limit of 2^32 - 1. The S and Type fields specify that the segment is a code segment that can be read and executed. Its DPL value is 0, thus it can be accessed only in Kernel Mode. The corresponding Segment Selector is defined by the __KERNEL_CS macro: in order to address the segment, the kernel just loads the value yielded by the macro into the cs register.

• A kernel data segment. The fields of the corresponding Segment Descriptor in the GDT have the following values:
  o Base = 0x00000000
  o Limit = 0xfffff
  o G (granularity flag) = 1, for segment size expressed in pages
  o S (system flag) = 1, for normal code or data segment
  o Type = 2, for data segment that can be read and written
  o DPL (Descriptor Privilege Level) = 0, for Kernel Mode
  o D/B (32-bit address flag) = 1, for 32-bit offset addresses
This segment is identical to the previous one (in fact, they overlap in the linear address space) except for the value of the Type field, which specifies that it is a data segment that can be read and written. The corresponding Segment Selector is defined by the __KERNEL_DS macro.

• A user code segment shared by all processes in User Mode.
The fields of the corresponding Segment Descriptor in the GDT have the following values:
  o Base = 0x00000000
  o Limit = 0xfffff
  o G (granularity flag) = 1, for segment size expressed in pages
  o S (system flag) = 1, for normal code or data segment
  o Type = 0xa, for code segment that can be read and executed
  o DPL (Descriptor Privilege Level) = 3, for User Mode
  o D/B (32-bit address flag) = 1, for 32-bit offset addresses
The S and DPL fields specify that the segment is not a system segment and that its privilege level is equal to 3; it can thus be accessed both in Kernel Mode and in User Mode. The corresponding Segment Selector is defined by the __USER_CS macro.
  • 53. Understanding the Linux Kernel 43

• A user data segment shared by all processes in User Mode. The fields of the corresponding Segment Descriptor in the GDT have the following values:
  o Base = 0x00000000
  o Limit = 0xfffff
  o G (granularity flag) = 1, for segment size expressed in pages
  o S (system flag) = 1, for normal code or data segment
  o Type = 2, for data segment that can be read and written
  o DPL (Descriptor Privilege Level) = 3, for User Mode
  o D/B (32-bit address flag) = 1, for 32-bit offset addresses
This segment overlaps the previous one: they are identical, except for the value of Type. The corresponding Segment Selector is defined by the __USER_DS macro.

• A Task State Segment (TSS) segment for each process. The descriptors of these segments are stored in the GDT. The Base field of the TSS descriptor associated with each process contains the address of the tss field of the corresponding process descriptor. The G flag is cleared, while the Limit field is set to 0xeb, since the TSS segment is 236 bytes long. The Type field is set to 9 or 11 (available 32-bit TSS), and the DPL is set to 0, since processes in User Mode are not allowed to access TSS segments.

• A default LDT segment that is usually shared by all processes. This segment is stored in the default_ldt variable. The default LDT includes a single entry consisting of a null Segment Descriptor. Each process has its own LDT Segment Descriptor, which usually points to the common default LDT segment. The Base field is set to the address of default_ldt and the Limit field is set to 7. If a process requires a real LDT, a new 4096-byte segment is created (it can include up to 511 Segment Descriptors), and the default LDT Segment Descriptor associated with that process is replaced in the GDT with a new descriptor with specific values for the Base and Limit fields.

For each process, therefore, the GDT contains two different Segment Descriptors: one for the TSS segment and one for the LDT segment.
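Returning to the four overlapping code and data segments in the list above, the following sketch packs their field values into 8-byte descriptors. It assumes the standard 80x86 descriptor bit layout, sets the Segment-Present flag to 1 (as the text says Linux always does), and leaves the AVL and reserved bits at 0. The names make_descriptor( ) and flat_segment( ) are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/* Pack descriptor fields into the 8-byte format (assumed standard layout). */
uint64_t make_descriptor(uint32_t base, uint32_t limit, unsigned type,
                         unsigned s, unsigned dpl, unsigned present,
                         unsigned db, unsigned g)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xffff);              /* Limit bits 15..0  */
    d |= (uint64_t)(base & 0xffffff) << 16;       /* Base bits 23..0   */
    d |= (uint64_t)(type & 0xf) << 40;            /* Type              */
    d |= (uint64_t)(s & 1) << 44;                 /* S flag            */
    d |= (uint64_t)(dpl & 3) << 45;               /* DPL               */
    d |= (uint64_t)(present & 1) << 47;           /* Segment-Present   */
    d |= (uint64_t)((limit >> 16) & 0xf) << 48;   /* Limit bits 19..16 */
    d |= (uint64_t)(db & 1) << 54;                /* D/B flag          */
    d |= (uint64_t)(g & 1) << 55;                 /* G flag            */
    d |= (uint64_t)(base >> 24) << 56;            /* Base bits 31..24  */
    return d;
}

/* A segment with Base 0, Limit 0xfffff, G = 1, S = 1, D/B = 1, as listed
 * for the kernel and user segments; only Type and DPL vary among them. */
uint64_t flat_segment(unsigned type, unsigned dpl)
{
    return make_descriptor(0x00000000u, 0xfffffu, type, 1, dpl, 1, 1, 1);
}
```

Calling flat_segment(0xa, 0), flat_segment(2, 0), flat_segment(0xa, 3) and flat_segment(2, 3) produces the kernel code, kernel data, user code and user data descriptors; they differ only in the bits holding Type and DPL, which is why the text describes the segments as overlapping.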
The maximum number of entries allowed in the GDT is 12 + 2 x NR_TASKS, where NR_TASKS denotes the maximum number of processes. In the previous list we described the six main Segment Descriptors used by Linux. Four additional Segment Descriptors cover Advanced Power Management (APM) features, and four entries of the GDT are left unused, for a grand total of 14. As we mentioned before, the GDT can have at most 2^13 = 8192 entries, of which the first is always null. Since 12 of them are either unused or filled by the system regardless of the number of processes (the 14 above less the TSS and LDT descriptors, which are counted once per process), NR_TASKS cannot be larger than 8180/2 = 4090.

The TSS and LDT descriptors for each process are added to the GDT as the process is created. As we shall see in Section 3.3.2 in Chapter 3, the kernel itself spawns the first process: process 0 running init_task. During kernel initialization, the trap_init( ) function inserts the TSS descriptor of this first process into the GDT using the statement:

set_tss_desc(0, &init_task.tss);

The first process creates others, so that every subsequent process is the child of some existing process. The copy_thread( ) function, which is invoked from the clone( ) and fork( )
  • 54. Understanding the Linux Kernel 44

system calls to create new processes, executes the same function in order to set the TSS of the new process:

set_tss_desc(nr, &(task[nr]->tss));

Since each TSS descriptor refers to a different process, of course, each Base field has a different value. The copy_thread( ) function also invokes the set_ldt_desc( ) function in order to insert a Segment Descriptor in the GDT relative to the default LDT for the new process.

The kernel data segment includes a process descriptor for each process. Each process descriptor includes its own TSS segment and a pointer to its LDT segment, which is also located inside the kernel data segment.

As stated earlier, the Current Privilege Level of the CPU reflects whether the processor is in User or Kernel Mode and is specified by the RPL field of the Segment Selector stored in the cs register. Whenever the Current Privilege Level is changed, some segmentation registers must be correspondingly updated. For instance, when the CPL is equal to 3 (User Mode), the ds register must contain the Segment Selector of the user data segment, but when the CPL is equal to 0, the ds register must contain the Segment Selector of the kernel data segment.

A similar situation occurs for the ss register: it must refer to a User Mode stack inside the user data segment when the CPL is 3, and it must refer to a Kernel Mode stack inside the kernel data segment when the CPL is 0. When switching from User Mode to Kernel Mode, Linux always makes sure that the ss register contains the Segment Selector of the kernel data segment.

2.4 Paging in Hardware

The paging unit translates linear addresses into physical ones. It checks the requested access type against the access rights of the linear address. If the memory access is not valid, it generates a page fault exception (see Chapter 4 and Chapter 6).
For the sake of efficiency, linear addresses are grouped in fixed-length intervals called pages; contiguous linear addresses within a page are mapped into contiguous physical addresses. In this way, the kernel can specify the physical address and the access rights of a page instead of those of all the linear addresses included in it. Following the usual convention, we shall use the term "page" to refer both to a set of linear addresses and to the data contained in this group of addresses. The paging unit thinks of all RAM as partitioned into fixed-length page frames (they are sometimes referred to as physical pages). Each page frame contains a page, that is, the length of a page frame coincides with that of a page. A page frame is a constituent of main memory, and hence it is a storage area. It is important to distinguish a page from a page frame: the former is just a block of data, which may be stored in any page frame or on disk. The data structures that map linear to physical addresses are called page tables; they are stored in main memory and must be properly initialized by the kernel before enabling the paging unit.
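The relationship between pages and page frames can be illustrated with a deliberately simplified sketch. It assumes 4096-byte pages and lets a single flat array stand in for the page tables; the name translate( ) and the frame_of array are hypothetical, and the access-rights check mentioned above is omitted.

```c
#include <stdint.h>

#define PAGE_SHIFT  12u                    /* assumption: 4096-byte pages */
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define OFFSET_MASK (PAGE_SIZE - 1u)

/* frame_of[page] holds the number of the page frame that stores the page.
 * The offset inside the page is carried over unchanged, which is why
 * contiguous linear addresses within a page map to contiguous physical
 * addresses. */
uint32_t translate(const uint32_t *frame_of, uint32_t linear)
{
    uint32_t page   = linear >> PAGE_SHIFT;   /* which page */
    uint32_t offset = linear & OFFSET_MASK;   /* position inside the page */
    return (frame_of[page] << PAGE_SHIFT) | offset;
}
```

Because only the page number is looked up, the kernel needs one table entry per page rather than one per linear address, which is the efficiency argument made above.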
  • 55. Exploring the Variety of Random Documents with Different Content
  • 56. looks as if he could do with a bit more, but he always is thin. We have got a very tall lot of men here, Cecil, Tom Greenfield, Godley, Fitzclarence, Bentinck, all make an ordinary six-foot individual feel small, and McKenna isn't exactly short. If we have length represented we also have breadth, which even our present rations are unable to reduce. I am certainly not going to quote a nominal roll of these individuals, as they are fine strong men and I can't get away. 2nd, Wednesday. This morning firing is going on. I suppose another attack. I will go out and see. One rather funny incident in connection with the Boer attack took place yesterday. As a rule they knock off for breakfast, but yesterday they kept it up till some time past 8 o'clock, so at 8 o'clock punctually the natives left their trenches with their tins to draw their porridge, absolutely disregarding the Boer fire which was renewed at intervals all day. It is perfectly incredible how we have pushed them back, for within the area where our advanced trenches now are I recollect seeing a horse-battery of theirs in action during the first few days of the siege. They take particular care not to play those games now. I only wish they would. This sort of drivel relieves one's feelings, even if one can't see relief. 3rd, Thursday. Firing yesterday and to-day was not of any value; they kept it up off and on all day. I sat on the roof with the officers of the Bechuanaland Rifles, and looked on till we got bored. The operation of getting on to and off the roof again was far more dangerous than the ordinary Boer battle. This evening I rode round the guards with Major Panzera. It would take a more enterprising Boer than we have run up against to get in. Major Panzera has a
  • 57. theory that he can't be hit; I haven't, however. Both our theories are good enough viewed from the light of experience. The Germans participating in the defence of the town are going to be photographed. I feel sorry for the German Emperor not being here. He would enjoy this war thoroughly. I heard from Weston-Jarvis this morning. He wrote a very cheery letter. At last they appear to be making some effort to relieve us. Why on earth they didn't try before, Heaven only knows! It seems a perfectly simple operation for any man of any ordinary sense, but really it doesn't much matter in the long run whether it is a month or two sooner or later. I also see the "Baron" is coming down to relieve us. I hope he won't fall on his head and get stretched out as he usually persists in doing. We are always meeting each other in some old ship or other, or in some out of the way continent, but certainly I never expected to be relieved by the "Baron" in the middle of Africa; however, the more pals that roll up the better. 4th, Friday. Absolute quiet. My last letters have fallen into the Dutchmen's hands. They will be nice light reading for them, as they were barely complimentary. I do not expect to be popular after this war. When one is tired and bored out here, it is very refreshing to be able to abuse all and sundry, and think that one need not settle up for another two or three months. 5th, Saturday. Life is short, but temper is shorter. Runners in but no news. This morning a funeral party of the Bechuanaland Rifles marched from the hospital to the cemetery to bury the remains, I say advisedly remains, of Lance-Corporal Ironside, who, after having been wounded some two months ago, had recently had his leg
  • 58. amputated, and had at last died from sheer weakness. He bore his extreme sufferings with remarkable fortitude, pluck, and cheeriness. He was a Scotchman, from Aberdeen, and one of the best shots in the garrison. It is satisfactory to think that he had already avenged his death before he was wounded. 6th, Sunday. To-day the Boers most deliberately violated the tacit Sunday truce which, at their own instigation and request, we have always observed. The whole proceedings were very peculiar. It was a fine morning, and the Sabbath calm pervading the town and the surrounding forts was manifest in the way we were all strolling about the market square. As regards myself, I had just purchased some bases of shells at Platnauer's auction mart, where the weekly auction was proceeding. The firing began, and nobody paid much attention except the officers and men belonging to the quarter at which it was apparently directed. They, on foot, horseback, and bicycle, dispersed headlong to their various posts. One, Mr. McKenzie, on a bicycle, striking the railway line, reached his post in four minutes and fifteen seconds, fifteen seconds too quick for the Boer he was enabled to bag. The Boers, who on previous Sundays had displayed an inclination to loot our cattle, had crept up to the dead ground east of Cannon Kopje, and hastily shot one of our cattle guard and stolen the horses and mules under his charge. It was the more annoying that they should have been successful as we were well prepared for them, and had rather anticipated this attack, having a Maxim in ambush within one hundred and fifty yards, which unfortunately jammed, and failed to polish off the lot, as it certainly ought to have done. If we had had any luck it would have been a very different story. Directly the Maxim began the Boers nipped off
  • 59. their horses and running alongside of them for protection reached the cover in the fold of the ground. Unfortunately they killed poor Francis of the B.S.A.P. (the second brother who has fallen here since the fighting began) and took all the horses. It was very annoying, but a smart bit of work and I congratulate the Dutchmen, whoever they may be, who conducted it. Still it was a breach of our Sunday truce, and if all is fair in love and war the many irate spectators will have their pound of flesh to ask for later on. It really was a curious sight: lines of men impotently watching the raid and behind them the shouts of the unmoved auctioneer of "Going at fifteen bob." "Last time." "Going." "Going." "Gone," and gone they were undoubtedly, but they were our horses and he was referring to some scrap iron. To cover this nefarious procedure they opened a heavy fire on various outlying forts. We were lucky enough in the interchange of courtesies to secure a Dutchman on the railway line, and as they had practically violated the white flag our advanced posts had great shooting all the afternoon at his friends who came to try to pick him up. We buried Francis this evening. The concert was put off. A certain amount of endurance has been shown by the inhabitants and a certain amount of pluck by the defenders of the town, but prior to the Boers starting fooling (successful fooling and neatly carried out), I and several more were standing in the market square gossiping about things we did know, and things we didn't, when we happened to notice a very weak-looking child, apparently as near death as any living creature could be. It transpired on inquiry that this infant was a Dutch one, Graaf by name. His father, a refugee, died of fever; his brother was in hospital, and he had been offered admission, which he refused, because he said that he must
  • 60. look after his mother. Even then, though scarcely able to cross the road, the kid was going to draw his rations. He was taken to hospital, but I think that this is about the pluckiest individual that has come under my notice, and nobody can take exception to the child, though his mother is probably one of those amiable ladies who eat our rations, betray our plans, and are always expressing a whole-hearted wish for our extermination. 15th, Tuesday. News has arrived that our troops are within striking distance; "Sister Ann" performance has begun again. We are now beginning to recover from our exciting Saturday. As I wired home, it was the best day that I ever saw, and I must now try and describe it. Just before four o'clock in the morning we were roused by heavy firing. The garrison turned out and manned the various works. We all turned up, and I went to the headquarters. Everybody got their horses ready, armed themselves as best they could, and awaited the real attack. Colonel Baden-Powell said at once the real attack would be on the stadt. We have had a good many attacks and don't attach much importance to them, but we did not any one of us anticipate the day's work that was in store for us. When I say anticipate, every possible preparation had been made. Well, we hung about in the cold. After about an hour and a half the firing on the eastern front began to slacken. Trooper Waterson of the Blues, as usual, had coffee and cocoa ready at once, and we felt we could last a bit. Jokes were freely bandied, and we kept saying, "When are they going to begin?" Suddenly on the west a conflagration was seen, and betting began as to how far out it was. I got on to the roof of a house, and with Mr. Arnold, of Dixon's Hotel, saw a very
  • 61. magnificent sight. Apparently the whole stadt was on fire, and with the sunrise behind us and the stadt in flames in front, the combination of effects was truly magnificent, if not exactly reassuring. However, nobody seemed to mind much. Our guns, followed by the Bechuanaland Rifles, hurried across the square, men laughing and joking and saying, "we were going to have a good fight." Then came the news that the B.S.A.P. fort, garrisoned by the Protectorate Regiment, had fallen into the enemy's hands. Personally I did not believe it to be true, and started with a carbine to assure myself of the fact. I got close up to the fort, met a squadron running obliquely across its front, and though the bullets were coming from that direction could not believe but that they were our own men who were strolling about outside it. That is the worst of being educated under black powder. I saw poor Hazelrigg, who was a personal friend of mine, and whom I knew at home, shot, but did not realise who he was. Both sides were inextricably mixed, but having ridden about, and got the hang of things, I am certain that within twenty minutes, order and confidence were absolutely restored on our side. You saw bodies of men, individuals, everybody armed with what they could get, guns of any sort, running towards the firing. A smile on every man's face, and the usual remark was, "Now we've got the beggars." The "beggars" in question were under the impression that they had got us and no doubt had a certain amount of ground for their belief. The fight then began. At least we began to fight, for up till then no return had been made to the very heavy fusillade to which we had been subjected. I have soldiered for some years and I have never seen anything smarter or better than the way the Bechuanaland Rifles, our Artillery and the Protectorate Regiment ran
  • 62. down and got between the Boers and their final objective. The Boers then sent a message through the telephone to say they had got Colonel Hore and his force prisoners and that we could not touch them. Campbell, our operator, returned a few remarks of his own not perhaps wholly complimentary and the telephone was disconnected and re-connected with Major Godley. Our main telephone wire runs through the B.S.A.P. fort. McLeod, the man in charge of the wires, commenced careering about armed with a stick and a rifle, and followed by his staff of black men with the idea of directly connecting Major Godley's fort and the headquarters. I may mention McLeod is a sailor and conducts his horse on the principle of a ship. He is perhaps the worst horseman I have ever seen and it says much for the honour of the horse flesh of Mafeking that he is still alive. However, be that as it may, his pawky humour and absolute disregard of danger has made him one of the most amusing features of the siege. You always hear him in broad Scotch and remarkable places, but he is always where he is wanted. By this time we were settling down a bit, so were they. They looted everything they possibly could. A Frenchman got on to the roof of the fort with a bottle of Burgundy belonging to the officers' mess to drink to "Fashoda." He got hit in the stomach and his pals drank the bottle. Our men were very funny. When the Frenchmen yelled "Fashoda," they said "silly beggars, their geography is wrong." I was very pleased with the whole day. I have never heard more or worse jokes made, and, no doubt, had I been umpiring, I should have put some of us out of action or at any rate given them a slight advantage. Every townsman otherwise unoccupied, who had possibly never contemplated the prospect of a fight to the finish, now turned out.
  • 63. Mr. Weil (and too much cannot be said for his resource through every feature of the siege) broke open his boxes, served out every species of firearms he could to every person who wanted them.
  [Illustration: BOERS FIRING THE NATIVE STADT.]
  A very deaf old soldier, late of the 24th Regiment, Masters by name, asked where they were, and then proceeded to investigate in a most practical fashion. I went down to the jail which more or less commands the B.S.A.P. fort and buildings, and had a look, and as we saw that no attack was imminent or at any rate likely to prove successful, we knocked off by parties and had our breakfast. We
  • 64. were beginning to kill them very nicely. Jail prisoners had all been released. Murchison, who shot Parslow, Lonie, the greatest criminal of the town, were both armed and doing their duty. We were all shooting with the greatest deliberation and effect whenever they showed themselves, and perhaps I was better pleased with being an Englishman from a sightseer's point of view than on any day since the Jubilee. The quaint part of the whole thing was that we were shooting at our own people unwittingly. I had a cousin there, and we laughed consumedly in the evening when we exchanged notes and found that we had been shooting close to him amongst others. I don't think that any man who was in that fight will ever think ill of his neighbour from the highest to the lowest; from our General--or, at least, he ought to be a General--to the ordinary civilian, everybody was cheerful and confident of victory. We had had a long seven months' wait, and at last we were having our decisive fight. After breakfast (like giants refreshed) we began shooting again. I cannot tell you who did well, but I can assure you that no man did badly. Besides the men there were ladies. Mrs. Buchan and Miss Crawford worked most calmly and bravely under fire. All the other ladies did their duty too. Whilst the fight was developing, Mrs. Winter was running about getting us coffee. Her small son, aged six, was extremely wroth with me because I ordered him under shelter. Then commenced what you may call the next phase of the fight. Captain Fitzclarence and his squadron, with Mr. Swinburne and Mr. Bridges, came down through the town to join hands with Captain Marsh's squadron, and then with Lord Charles Bentinck's squadron and the Baralongs, the whole under Major Godley, were now going to commence to capture the Boers. I must endeavour to describe
  • 65. the situation. Eloff's attack was clever and determined. He had seven hundred men and had advanced up the bed of the Molopo. Into Mafeking he had got, but like many previous attacks had proved--it was easy to get in, but quite another matter to get out. The Baralongs and our outlying forts had allowed some three hundred men to enter, and had then commenced a heavy fire upon their supports. This discomfited the supports, and they incontinently fled. Silas Moleno and Lekoko, the Baralong leaders, had decided that it was better to kraal them up like cattle. One Dutchman was overheard to shout, "Mafeking is ours," when suddenly his friends yelled, "My God, we are surrounded." This species of fighting particularly appeals to the Baralong. He is better than the Boer at the Boer's own game, and never will I hear a word against the Baralong. However, Silas was then engaged in conjunction with our own men in collecting them. He collected them where they had no water, and then the question resolved itself into the Boer showing himself and getting shot or gradually starving. If the Baralongs had been fighting the fight and time had been no particular object, they would probably still be shooting odd Boers, but it is obvious that those dilatory measures could not be pursued by ourselves, and that we had to finish the fight by nightfall. Our men were accordingly sent down to round them up; there were thus in all three parties of Boers in the town, one, nearly three hundred strong, in the B.S.A.P. fort, sundry in a kraal by Mr. Minchin's house, others again in the kopje. The kraal was captured in an exceedingly clever manner. Captain Fitzclarence and Captain Marsh worked up to the walls, but knowing the pleasant nature of the Boer, instead of storming the place or showing themselves, they bored loopholes with their
  • 66. bayonets. The artillery under Lieutenant Daniels also had come up to within forty yards. There was a slight hesitation on the part of the Boers to surrender. The order was given to the gun to commence fire. The lanyard broke, but before a fresh start could be made the Boers hastily surrendered. Captain Marsh, known and respected by the Baralongs, had great difficulty in restraining them from finishing the fight their own way, and small blame to them for their desire. They had had their stadt burned. Odd Boers had been bolting at intervals, and had mostly been accounted for. The question next to be settled was as to the possession of the B.S.A.P. fort. Our men who were captive therein, and indeed the Boers and foreigners to whom I have since talked describe our fire as extraordinarily accurate. Eloff had great difficulty in keeping his men together, and as one man at least was a deserter of ours, it can't altogether be wondered that they did not wish to remain. Our firing, as we had more men to spare, became more and more deadly, and at last now they decided to surrender. Some hundred broke away and escaped from the fort, in spite of Eloff firing on them, but their bodies have been coming in ever since and many will never be accounted for, because the bodies of men with rifles may be possibly put away by the Baralongs, who are always begging rifles we have been unable to give them. Eloff accordingly surrendered to Colonel Hore. The other party in the kopje had made several unsuccessful attempts to break out, Bentinck and his squadron always successfully heading them, but as it got dark, and our men had been fighting from before four, it was decided to let them break out and just shoot what we could. The Baralongs had some more shooting too. As each successive batch of prisoners was marched into the town absolute
  • 67. silence was maintained by the Britishers, except saluting brave men who had tried and failed. They were brave men and I like them better now than I ever did; the Kaffirs, however, hooted. As each batch marched up, their arms, of which they had naturally been deprived, were handed over to the Cadets, who had been under fire all day. These warriors range from nine to fifteen years of age. They are the only smartly clad portion of the garrison, for our victorious troops were the dirtiest and most vilely robed lot of scarecrows I have ever seen, still it did one good to see the escort to the prisoners, they were simply swelling like turkey cocks and all round our long lines of defences we would hear cheers and "Rule Britannia" and the "Anthem" being sung with the wildest enthusiasm. It is impossible as I said before, to say who behaved best, but none behaved badly. There was only one thing said afterwards, when all sorts and conditions of men were shaking each other by the hand, and that was, "This is a great day for England." Mafeking is still rather mad with the Relief Column within shouting distance and it is likely to remain so.
  • 68. [Illustration: CAPTURED BOER PRISONERS]
  We lost few men in our great success but I take it that no man particularly wants to be lost. I really have seen brave men here, but the man who says he wants to get shot is simply a liar. We know the story of the Roman sentinel and the Highlander who fought in Athlone (or was it Mullingar) against Hoche and many men that have died for their country obstinately. Captain Singleton's servant, Trooper Muttershek, may be added to their roll. He absolutely declined to surrender and fought on till killed. It wasn't a case of dashing in and dashing out and having your fun and a fight, it was a case of resolution to die sooner than throw down your arms, the wisdom may be questionable, the heroism undoubted. He wasn't
  • 69. taking any surrender. As far as I am concerned, I have seen the British assert their superiority over foreigners before now, but this man in my opinion, though I didn't see him die, was the bravest man who fought on either side that day. It is a good thing to be an Englishman. These foreigners start too quick and finish quicker. They are good men, but we are better, and have proved so for several hundred years. I had always wanted to see the Englishman fight in a tight hole, and I know what he is worth now. He can outstay the other chap. Well, you must be getting rather bored by the fighting, and I will write more anon when I have collected some further particulars. The Rev. W. H. Weekes, our parson, organized a thanksgiving service on Sunday night. We were still rather mad, and it gave us a pleasant feeling to sing nice fighting psalms and hymns, because which ever way you look at it we are perfectly convinced out here that it is a righteous war. He had rather a mixed congregation, which probably in times of peace would be half the size, but he understands his congregation and the congregation understand him. Poor Hazelrigg died that night.
  • 70. [Illustration: INTERVIEWING BOER PRISONERS ON MR. WEIL'S STOEP]
  I went over and saw the prisoners this afternoon. They were very civil, and so were we. I like a Frenchman, and was chaffing them more or less at having left "La Patrie." They didn't seem to mind being prisoners; they apparently enjoyed their fight, but they objected to their food. I did what I could for them, and I couldn't help feeling that they were absolutely uninvited guests. It wasn't their quarrel, and why they wanted to shove their nose into it we all fail to understand. There is really a very charming man amongst them, who asked me to procure him a grammar as he wished to improve his mind by learning Dutch and English. Of course, I got
  • 71. him a grammar, while I couldn't help suggesting that it might have been as well to remain in comfort in France without travelling all this way to learn the language, also remarking Dutch seemed rather out of date. He rather agreed with me, and asked me for a collection of siege stamps as he said he thought his girl would like them. The funny part of these fellows is that they seem to think that we haven't got homes or girls or anything else, but are a sort of automatic "Aunt Sally," put up here for irresponsible foreigners to have a shy at. Nobody bears any malice about the fight, but the Frenchman calls the Boer "canaille," the Boer doesn't seem to like the Frenchman or, indeed, any other foreigner, regarding him as an impetuous fool who would probably lead him (the Boer) into some nasty dangerous place, and the Englishman laughs at the lot; however, as I said before, the poor devils can't help being foreigners. I always like a Frenchman, a good many have been kind to me and they are invariably amusing. Their stomachs, however, are at present proud, and they cannot swallow "sowen," or horse flesh, or any local luxuries. However, as we pointed out, it was rather their fault that we had not any rations in here. Some of these men had only been in the country a week. It seems a long way to come to get put in "quod," and live on horse flesh and "sowens." One told me he passed a battery of our relieving column in harbour at Beira. I suppose he thought he had put in a smart day's work when he got ahead of it. He has, but he isn't working now. I never liked Eloff much, not that I knew him personally, but now I like him better for his performances. He very nearly did a big thing, but both sides have apparently an ineradicable mutual contempt for each other, which has led to some very pretty fighting through the whole war. There is
  • 72. no mistake about it, he did insult the Queen, and I am glad we have had the wiping out of that score, but he is a gallant fellow all the same. When we look back on our discomfiture of Cronje, and the mopping up of Eloff, it gives a pleasant finish to the siege. It wanted just a finishing touch to make it satisfactory. There should be another fight within a few hours, but I reckon that it will be the relief Column's turn, and though everything is ready for us to assist them I honestly don't think we could go far and do much. The men were dog tired on Saturday, absolutely dog tired. I always thought the Boer was a bad bird to get up to the gun, but he came up that day. I don't think he will again. On Monday we saw the tail end of some Boer force arriving. We had hoped it might be our own people, but they appear to be a few miles further off. However, we know they are there or thereabouts now. Nobody minds now, we know we are winning. To return again to my story of the fighting, the foreigners did try their best to stop the Boers looting, but loot they did most thoroughly. They stole everything they could lay their hands on. Not one officer, whose kit happened to be in the fort has recovered anything. One "clumpy" of Boers galloped forth laden with food and drink. The food belonged to themselves, the drink belonged to us. They happened to fall in with the galloping Maxim, a piece of bad luck because they all died and our people took the food and drink. One fellow had taken a pair of brown boots and a horse, he had a few bullets through the boots, the horse was killed and so was he. Life had been very dull here, but that morning put everything all right. We had never before seen a dead or wounded Boer or a prisoner, and it is weary work to see your friends and neighbours
  • 73. shot and not see your own bag too, but personally, except in the way of business, I hope I haven't killed a Boer. In the fight in the morning, though everything had been prepared for as far as we could tell, we had had to take up positions which were absolutely enfiladed by the fresh development of affairs. The trench occupied by the Bechuanaland Rifles, Protectorate Regiment, and others on the spur of the moment, was directly enfiladed by the enemy's quick-firer. Why we were not wiped out on that line I never shall quite make out. They shot the jailor, Heale, who has done very good work all through the siege, who I am afraid leaves a wife and family. Then the prisoners took charge of themselves. Our gunner prisoners ran down to the guns, one was shot, the others served the gun all day. The others, armed with Martinis, commenced a heavy fire on the enemy, or cautioned the Dutch prisoners, the suspects, as to their behaviour, and put them down a hole. It was an exhilarating sight and struck me as exceedingly quaint to see men who had committed every crime, and were undergoing penal servitude, dismissing their past, oblivious of anything except the fact that we were all of the same crowd, and had got to keep the Dutchmen out. I hope Her Majesty will exercise her clemency; they certainly deserve to regain their rights as citizens. We have had rather a dull day for some reason or other. A general idea pervaded the town that relief was at hand, and when towards evening a cloud of dust and troops were seen to the south- west, we most of us got on the roofs and looked at them with some interest. It transpired subsequently, however, that they were the enemy retiring before Mahon. They passed round the south of the town, and opposed him later.
  • 74. 16th, Wednesday. A dull day, but towards evening our relief was really seen. Everybody got on the roofs, and looked on at the Boers being shelled; most refreshing, but as they were not apparently coming in, people went to feed, and enthusiasm rather died away again, so much so that when Major Karri Davis, and some eight men of the I.L.I. marched in, he told one passer-by he was the advance guard relief force, the other only murmured "Oh, yes, I heard you were knocking about," and went to draw his rations, or whatever he was busily engaged in. However, when it became generally known the crowd assembled and began to cheer, and go mad again--so to bed. 17th, Thursday. Roused out this morning at some ungodly hour to be told they had arrived, and strolled down to the I.L.I. to see Captain Barnes of my old regiment. It appeared that Mahon and Plumer had effected a masterly junction the day before, and that the former, following the only true policy of South African warfare had, as usual, said he was going to do one thing, and done something else, viz., camped out, and then suddenly inspanned and marched into the town. I can't quite convey the feelings of the townspeople, they were wild with delight, and pleased as they were their bonne bouche was to come later. Edwardes and Barnes breakfasted with me and then went back (personally I borrowed a horse from the I.L.I.). About 9 o'clock the guns moved out to the waterworks, and then the fun really began. The Boers had been going to intercept Mahon's entry, but he was a bit too previous. All the morning their silly old five-pounder (locally known as "Gentle Annie") had been popping away, when suddenly the R.H.A. Canadian Artillery and pom-poms began, ably led by our old popguns, who had the honour
  • 75. of beginning the ball. I rode well out, as I wanted to see the other people have a treat, but literally in half an hour all there was left of the laager, which has vexed our eyes and souls so much for long months, was a cloud of dust on the horizon, except food-stuffs, &c., which we looted. I got a Dutch Bible, and from its tidiness I was pleased to see its late owner was a proficient in the Sunday school. So, quietly back to the town, and after the march past of the relief column the relieved troops began. And now, I suppose, after being bottled up for some eight lunar months, I may effervesce. As I have said before, I have seen many tributes to her Majesty and joined in them all, but dirty men in shirt sleeves, and dirtier men in rags on scarecrows of horses touched me up most of all. We were dirty, we were ragged, but we were most unmistakably loyal, and we came from all parts of the world--Canadians, South Africans, Australians, Englishmen, Indians, and our Cape Boys and various other Africans, and there was not one of us who did not respect the other, and know we were for one job, the Queen and Empire, not one.
  • 76. [Illustration: MARCH PAST OF THE RELIEVING FORCE.]
  I wonder how the prisoners felt, poor devils; they must have wished they were not against us. The Boers had certainly executed the smartest movement I had seen for some time; I had not believed it possible that a laager could break up and disperse so rapidly. We all went back to lunch, having recovered Captain McLaren, who, I am glad to say, is doing very well. Then after lunch an alarm was raised that we had rounded up old Snyman, and everybody started off to help in the operation; but, alas, Snyman knows too much. They said that he and four hundred Boers were
  • 77. surrounded and refused to surrender, and we all wanted as much surrender as we could get--or the other thing. I am glad to say he was hit on the head in the morning with a bit of shrapnel, but not dangerously wounded, unfortunately, at least so they report. He seems equally execrated by Dutch and English--Psalm-singing, sanctimonious murderer of women and children and his son takes after him. I may contradict my previous statements, but his actions have also varied frequently. Well, we had a great dinner; old friends from all parts of the world foregathered, and at our head was Smitheman. Many dinners then combined, and more old friends were met--so to bed, still pleased with England. Men of all sorts and conditions, trades, professions and ranks, relievers and relieved, slept that night in and about Mafeking, with a restless sleep, thinking of what England would think, and we knew and were sorry we couldn't hear what they said. The garrison in Mafeking hope to get some recognition or decoration, but what they attach particular importance to is receiving the Queen's chocolate. Immediately after the relief column marched in our Baralongs under Montsoia Wessels, Silas and Sekoko and Josiah, marched off on their own to settle up Abraham Ralinti at Rietfontein, and bring in our trusty ally, Saani. He had been utterly looted, and taken away from his own stadt, and kept a prisoner at Rietfontein, his great notion being that we should have a conference with the Boers, and then lay down what he called "plenty polomite," and blow them up when they came to confer. You cannot get very far ahead of a Baralong. I suppose this is the first occasion on which one black man surrendered under a white flag to another. These Rietfontein rebels
  • 78. have always been against the remainder of the Baralongs, and have invariably fought for the Boers since the disturbed relations between Briton and Boer have existed. I hope they will shoot Abraham, as his people's invariable cunning in stopping our runners has caused us great inconvenience, not to mention the numbers they have killed. 18th, Friday. Did very little. Went round and helped our pals to shop, get stamps, money, &c., &c. 19th, Saturday. The garrison held its solemn Thanksgiving Service at the cemetery, at the termination of which three volleys were fired over our dead. We had been unable to do this before owing to the certainty of drawing fire, not that that really much mattered, as they usually fired on all our funeral parties, though there could be no mistaking them. Still they had this excuse that the cemetery is fortified. After the last post had sounded we reformed and sang the National Anthem. Then, after Colonel Baden-Powell had spoken personally to each detachment, we cheered him, and then with heartfelt cheers for Her Majesty, the siege of Mafeking closed. GOD SAVE THE QUEEN. And now for sheer personalities. Mr. Stuart had arrived, and as I considered he was much better qualified to represent the paper with the force than myself, I determined to come south. Mr. B. Weil, whom as I have previously said, I consider to be one of the principal factors in the successful defence, certainly as regards the food supply, said he was going south. I accordingly resolved to accompany him, and while returning from the ceremony suggested it. Anyhow, to make a long story short, I arrived as he was starting,
  • 79. and with a small bag, having relinquished all my Mafeking impedimenta, climbed into his cart. He had to turn out one of his boys, but I didn't mind that, and being the most good-natured of men, he tried to look as if he didn't. So our caravan started--Major Anderson, Major Davis (Surg. I.L.I.), Mr. Weil, and myself, together with his servant Mitchell, a prototype of "Binjamin," but absolutely reliable and hard-working, also Bradley, of Bradley's Hotel, Inspector Marsh, the Rev. ---- Peart, and Ronny Moncrieffe (who had secured a horse belonging to a Protectorate regiment, and proposed to accompany us). He had done a lot of good work in the siege, and was about as tired and unfit as a man could be. However, he was determined to get through, and so he did. It was a quaint pilgrimage, as the column, though it had swept the country, had not particularly cleared it, and the Boer is here to-day, gone to-morrow, and back the next day. Well, our commissariat was excellent. I contributed some eight biscuits and three tins of bully, and that is all I have done except live on the fat of the land--Lord, how fat it seemed after Mafeking--a land flowing with fresh milk, butter and eggs, mutton and white bread, and above all, the sense of freedom, I never knew what it felt like to be properly free before, and I have been more or less of a wanderer most of my life. No more sieges for me, except perhaps from the outside. Yet I was sorry to leave Mafeking, and I may truly say as far as I know I didn't leave a bad friend behind me, only all my kit. Towards dark, after an outspan that was like a picnic, we reached Mr. Wright's farm, where the wounded were--one had died the night before--and we found Mr. Hands, Daily Mail, badly wounded in the thigh, but doing well; Captain Maxwell, I.S.C., and others. Mr. Wright acts up to his name.
  • 80. Two of his sons were in "tronk" at Zeerust for refusing to join the Boers, and what he had was at our disposal. I wonder if people at home realize in what a position our loyalists in Bechuanaland have been placed. If they didn't come in their own countrymen regarded them as rebels,--if they did they lost all they had. But by doing as they have done, that is by carrying on their business while exposed to all the contumely and insult the Boers could heap on them, with the possible loss of life as well as property, they have served their country as well as those who have taken up arms; because their houses have always been a safe place for runners to go to, and news about the doings of the Boers could be obtained from them. Besides, they know which of the Boers fought, and which didn't, and this fact now terrifies the rebels and keeps many quiet, who might not otherwise be so. Mr. Weil on arrival bought two hundred bags of mealies and despatched them to his friends the Baralongs. Such a pretty place his farm is, with plenty of water and lots of game. We slept under the cart, and miserably cold it was. Mr. Weil (who is rather like myself in that respect), could not sleep, and was determined nobody else should do so. So we got up, and sat round the fire till sunrise. Our cocoa that morning was indeed acceptable. The caravan, which was as I say, quaint, marched as follows, preceded by mounted Kaffir Scouts:--First came Keeley and his boy in a Cape cart drawn by mules, followed by Weil, his servant, driver and myself in another Cape cart with six mules, Bradley driving a pair of horses in another, then Ronny, the Rev. ---- Peart and Inspector Marsh riding, the latter riding B.P.'s brother's pony. We inspanned at sunrise on Monday and started for Setloguli. Halted half way and had the pleasing intelligence that a commando was
  • 81. raiding within six miles of us. I personally felt very unhappy. I had always looked upon it as a two-to-one chance, and as we had no weapons we could make no fight of it. Apart from the bore of being a prisoner I knew I should be so awfully laughed at. However, there we were--it was no use grumbling, but I did, as hard as ever I could. Then we inspanned and drove to Setloguli, where our spirits were considerably raised by an excellent lunch provided by Mrs. Fraser, who is the best hostess I have ever met. The Frasers had a terrible rough time of it, and now "the Queen had got her own again" were naturally correspondingly cheerful. Later we were also further relieved to hear that "the commando" was merely a small patrol of Boers, and that it had withdrawn across the border. During the afternoon I went up and saw the old fort--quite interesting, and anybody who wants to spend a quiet time might do worse than to go to Setloguli. The worst of it is it takes some time to get there. Lady Sarah Wilson's maid was there. She had been there since Lady Sarah was brought in by the Boers to Mafeking. Mr. Weil was showing various curios of the siege to Mrs. Fraser, including a copy of Her Majesty's Leaves from the Journal of our Life in the Highlands, which he had looted from the Boer laager. This excited the good lady's unqualified wrath, "What sacrilege for them to have it in their hands. Why it smells Boery," she said. On Tuesday Keeley was returning to Mafeking with Lady Sarah's maid and his scouts, so Weil engaged two scouts to accompany us to Jan Modebi, where we were next going to stop. They didn't seem particularly pushing sort of scouts, as they persistently rode in rear of the Cape cart. The road too, was infamous, but it was impossible to lose the way as the column had left an unmistakable track behind them, and this was
  • 82. fortunate, because when we had been going about an hour and a half our intelligent guide stated he didn't know the way. I wonder how Keeley felt all that Tuesday. If he could have heard half we said he would have torn his two days' beard out and wept. The other scout lost us altogether. Keeley and Weil were arranging a series of despatch riders, so as long as we got one of them to Jan Modebi's, it didn't much matter. We outspanned first at a rebel's farm, and had an excellent lunch. I was still rather fretful. The prospect of captivity made me so, and I only believe in dead Dutchmen, till peace is proclaimed. One Sonnenberg, a brother of some Bond member or other, was there trading, I suppose, like most Bondsmen, running with the hare and hunting with the hounds. He looked well on it, and was very civil. We inspanned and then came a long trek to Jan Modebi's. About half-way there, we saw two horsemen with guns cruising about. One obviously was not a soldier. I reckoned Pretoria was the ticket, however, they came up and Weil went to interview them. They turned out to be one of the Kimberley Light Horse and a civilian who was showing him the way, and he said he had got a convoy of cattle. It felt like being near home again then. We afterwards met the convoy--total, four white men and five black. I still marvel at their colossal impudence, marching through a rebel country within five miles of the enemy's border, escorting cattle for which any Boer will peril his skin. He calmly assured me they were going to pick up all they saw on the way; to use his own words, "All is fish that comes to our net." I hope they got through all right. So to Mr. Menson's, where we put up for the night, and he, like everyone else, did all he could. He, too, had had a bad time. He
  • 83. didn't grumble, but when the relief column had come through they had cut all his barbed wire fences. Having a constitutional antipathy to barbed wire I sympathized with the relief column, but naturally did not say so. I was amused to see three prints of Sir Alfred Milner, Lord Roberts, and Oom Paul, the inscription under the latter being, "The end is better than the beginning, 14.10.99," also to hear his account of how when driving his cattle to Vryburg at the outbreak of the war he had met a Dutchman who told him that they had driven the English into the sea. His reply was, "Oh, that's too far to go," and so he turned and drove his cattle back again to his farm. Weil, as usual, bought up cattle, &c., also butter and other luxuries, and despatched them to the hospital at Mafeking on his own account. Wednesday. We started rather later than usual owing to the heavy rain, and half way to Vryburg we crossed the fresh spoor of men, wagons, cattle, &c., going towards the Transvaal. It afterwards transpired it was the rebel Van Zyl and his following, bolting from Kuruman to the Transvaal. Let off number two. We couldn't have been more than an hour or two behind them, and they would certainly have scooped us had we met them, so the rain was lucky. Well, we got into Vryburg from one side as the troops got in from the other. An old acquaintance rushed me off to the Club, and I then strolled up to see the Scotch Yeomanry and found Charley Burn. I found also Kidd and several others I knew--then on to see Reade, who had been Intelligence Officer at Mafeking before the war, and was D.A.A.G. to General Barton, and arranged about getting on in the first train. This was my first chance of seeing the infantry Tommy on the war path to any great extent. He is no more beautiful or clean, in fact, if anything less so than his cavalry brother, but by
  • 84. heaven he looks a useful one! However, what matter the man as long as the flag is clean. Met North of the Royal Fusiliers and dined with him, they all asked after Fitzclarence, Godley, and the others. They and the Scots Fusiliers had done quite an extraordinary march of forty-four miles in thirty-four hours, and now our infantry were within striking distance of Mafeking. The line should soon be repaired as they had begun from Mafeking and the line as far as Maribogo was practically untouched, in fact next morning, Thursday, they ran twelve miles north. Thursday we began our preparations for departure. The garrison were preparing to celebrate the Queen's Birthday, and the populace to display great enthusiasm, and the women began to come into town. It was not a highly polished parade, so far as I could see. Still, it was rather good to have it there just then, where the Dutchmen had been in occupation within ten days. Rifles were now coming in by the hundred, and the rebel of a fortnight before became a British patriot. We drove to the station, and there met the Scots Fusiliers. I was accosted by a warrior in large blue goggles, who said I didn't remember him. I naturally didn't in the goggles, but it turned out to be Scudamore. They did the best they could for us, and then Dick of the Royal Irish Fusiliers turned up, who had once been my sergeant-major. I was glad to see him--the old regiment and squadron seems fairly dotted all over Africa. Barnes was at Mafeking, three of us had been through the siege, and I met one Lambart at Taungs, who had been a corporal with us, and was a captain in the Kimberley Mounted Corps, curiously enough all belonging to two squadrons, B and D. Well, we left Vryburg with a light engine and a truck full of niggers. We were all sitting on the tank, in charge of young Gregg, R.E., who is a good
  • 85. train master. He ran us down, after dropping the niggers to repair a bridge, to Dry Hartz, where we had to pull out for an up-coming train, and as we had half an hour to wait, and it was just mid-day at twelve, we formed up and gave three cheers for the Queen and drank her health. It was the smallest and dirtiest Queen's Birthday parade I have ever attended; nine all told, but "mony a little makes a muckle." We ran down to Taungs, where one way and another we were detained some twelve hours. I didn't mind. The Royal Welsh Fusiliers were there, and I found several old friends and acquaintances--Gough Radcliffe, R.H., Cooper (Royal Fusiliers), Broke Wright, R.E., the former railway staff officer. So into a cattle truck we jumped with one of the Welsh Fusiliers and some men and arrived at Kimberley 7 o'clock next morning, where I called on Sir C. Parsons, and had fish for breakfast at the hotel. Thus my journey was practically ended. It transpired that Vryburg was held by some half dozen of our forces, and that the remainder of the garrison was only sixty loyalists from the town population. It did not seem a large garrison, but apparently it was good enough. There was rather a curious coincidence at dinner at Orange River. I saw a man whose face I thought I knew, but I was mistaken; it was his likeness to his brother which misled me. He turned out to be Tom Greenfield's brother, who was down here sick, and to whom I had wired to meet me at Fourteen Streams, so that I could give him news of Tom. However, I struck him on the next river or so, so it didn't much matter. It was sad to pass the Modder River and see our cemeteries--all English; so we passed on to Cape Town. And how jolly it was to see