Presentation of Chapter 4,
LINUX Kernel Internals
Zhihua (Scott) Jiang
Computer Science Department
University of Maryland, Baltimore County
Baltimore, MD 21250
<zhjiang@cs.umbc.edu>
Guideline
• The Architecture-independent Memory
Model in LINUX
• The Virtual Address Space for a
Process
• Block Device Caching
• Paging Under LINUX
The architecture-independent
memory model
• Pages of Memory
• Virtual Address Space
• Converting the Linear Address
• The Page Directory
• The Page Middle Directory
• The Page Table
Pages of memory
• The page size is defined by the PAGE_SIZE macro in asm/page.h
• On x86, a page is 4 Kbytes
• On Alpha, a page is 8 Kbytes
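The arithmetic that follows from a fixed page size can be sketched in user-space C. This is a minimal sketch, assuming the 4-Kbyte x86 page; the DEMO_ macros and demo_ functions are hypothetical stand-ins, not the kernel's own definitions from asm/page.h.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the kernel's page macros, fixed here to the
 * x86 value (4 Kbytes). For the 8-Kbyte Alpha page the shift would be 13. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Round an address up to the next page boundary. */
unsigned long demo_page_align(unsigned long addr)
{
    /* Guard against wrap-around when adding PAGE_SIZE - 1. */
    assert(addr <= ULONG_MAX - (DEMO_PAGE_SIZE - 1));
    return (addr + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK;
}

/* Number of whole pages needed to hold `bytes` bytes. */
unsigned long demo_pages_needed(unsigned long bytes)
{
    return demo_page_align(bytes) >> DEMO_PAGE_SHIFT;
}
```

With these definitions, a request of 8193 bytes needs three pages, because the last byte spills into a third 4-Kbyte page.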
Virtual address space
• Given by reference to a segment selector and the offset within
the segment
• C pointers hold the offsets
• Defined in asm/segment.h
– KERNEL_DS (segment selector for kernel data)
– USER_DS (segment selector for user data)
• By carrying out a conversion on the segment selector register,
a system function can be given pointers to the kernel
segment.
– Used by UMSDOS file system to simulate a Unix file system
Continued
• MMU of an x86 processor converts the virtual address to a
linear address
• The linear address space is 4 Gbytes, fixed by the 32-bit width of the linear address
– 3 Gbytes for user segment
– 1 Gbyte for kernel segment
• Alpha does not support segmentation
– Offset addresses for the user segment not permitted to overlap
with the offset addresses for the kernel segment
Converting the linear address
[Figure: Linear address conversion in the architecture-independent memory model]
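The conversion can be illustrated for the x86 case. This is a minimal sketch, assuming the usual 32-bit x86 split (10 bits of page directory index, 10 bits of page table index, 12 bits of offset), where the page middle directory of the architecture-independent model is not used as a separate level. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Parts of a 32-bit linear address under the assumed 10/10/12 split. */
struct demo_linear_parts {
    unsigned dir_index;    /* index into the page directory (0..1023) */
    unsigned table_index;  /* index into the page table     (0..1023) */
    unsigned offset;       /* byte offset within the page   (0..4095) */
};

/* Split a linear address into its three parts. */
struct demo_linear_parts demo_split_linear(uint32_t addr)
{
    struct demo_linear_parts p;
    p.dir_index   = addr >> 22;
    p.table_index = (addr >> 12) & 0x3FFu;
    p.offset      = addr & 0xFFFu;
    return p;
}

/* Rebuild the linear address from its parts (inverse of the split). */
uint32_t demo_join_linear(struct demo_linear_parts p)
{
    assert(p.dir_index < 1024 && p.table_index < 1024 && p.offset < 4096);
    return ((uint32_t)p.dir_index << 22) |
           ((uint32_t)p.table_index << 12) |
           (uint32_t)p.offset;
}
```

For example, the address 0xC0101234 splits into directory index 768, table index 257, and offset 0x234.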
The virtual address space for a
process
• The User Segment
• Virtual Memory Areas
• The System Call brk
• Mapping Functions
• The Kernel Segment
• Static Memory Allocation in the Kernel Segment
• Dynamic Memory Allocation in the Kernel
Segment
The user segment
• In user mode, a process can access only the user segment
• Individual page tables for different processes
• system call fork
– child and parent processes have different page directories and page
tables
– however, in the kernel segment page tables are shared by all
processes
• system call clone
– old and new threads share the memory fully
Continued
• Background on shared libraries in the user
segment
– Originally, everything was linked into one binary, which was efficient
– The drawback is the growth in the size of the binaries
– Shared libraries are stored in separate files and loaded at program start
– Linked to static addresses
– ELF allows shared libraries to be loaded during
program execution
– Such libraries contain no absolute address references in the compiled code
Virtual memory areas
• A process does not use all of its functions at any one time
• Processes can share code if they run the
same executable file
• A copy-on-write strategy is used for memory
management
The system call brk
• The brk field points to the end of the BSS segment for non-statically initialized data
• Used for allocating or releasing dynamic memory
• The system call brk can be used to find the current value of
the pointer or to set it to a new one, subject to a protection check
• The request is rejected if the memory required exceeds the estimated size
• The function sys_brk() calls do_mmap() to map a private and
anonymous area between the old and new values of brk
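From user space, the effect of moving the break can be observed through the C library wrapper sbrk(). The following is a minimal sketch for a Linux/glibc system, not the kernel's sys_brk() itself; the demo_ function names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* asks glibc headers to expose sbrk() */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Return the current program break; sbrk(0) leaves it unchanged. */
void *demo_current_break(void)
{
    return sbrk(0);
}

/* Try to grow the break by `bytes` bytes.
 * Returns the number of bytes the break actually moved,
 * or -1 if the request was refused. */
long demo_grow_break(long bytes)
{
    assert(bytes >= 0);
    void *old = sbrk((intptr_t)bytes);   /* returns the previous break */
    if (old == (void *)-1)
        return -1;
    return (long)((char *)sbrk(0) - (char *)old);
}
```

In practice a program rarely calls sbrk() directly; malloc() typically adjusts the break on the program's behalf.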
Mapping functions
• The C library provides four functions in sys/mman.h
– caddr_t mmap(caddr_t addr, size_t len, int prot, int flags,
int fd, off_t off);
– int munmap(caddr_t addr, size_t len);
– int mprotect(caddr_t addr, size_t len, int prot);
– int msync(caddr_t addr, size_t len, int flags);
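A short user-space example shows the mapping functions working together. This is a minimal sketch for a current Linux/glibc system, where the prototypes take void * rather than caddr_t. The MAP_ANONYMOUS flag and the demo_map_cycle name are assumptions of this example and do not come from the slides.

```c
#define _DEFAULT_SOURCE   /* asks glibc headers to expose MAP_ANONYMOUS */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one private anonymous page, fill it, make it read-only,
 * read it back, and unmap it. Returns 0 on success, -1 on failure. */
int demo_map_cycle(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0x5a, len);                       /* write while writable  */
    int ok = (mprotect(p, len, PROT_READ) == 0) /* drop write permission */
             && (p[0] == 0x5a);                 /* reading still works   */

    if (munmap(p, len) != 0)                    /* release the mapping   */
        ok = 0;
    return ok ? 0 : -1;
}
```

After the mprotect() call, a write to the page would raise SIGSEGV, which is why the sketch only reads from it.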
The kernel segment
• On the x86 architecture, a system call is generally initiated by
triggering software interrupt 128 (0x80)
• Every process in system mode sees the same kernel
segment
• On the Alpha architecture, the kernel segment cannot start at address 0
• PAGE_OFFSET gives the offset between physical and virtual addresses
Static memory allocation in the kernel
segment
• Initialization routine for character-oriented
devices is called as follows
memory_start = console_init(memory_start, memory_end);
• Reserves memory by returning a value higher
than the parameter memory_start
• The memory between the return value and
memory_start can be used as desired by the
initialized component
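The calling pattern can be imitated in ordinary C. This is a minimal sketch using plain integers as addresses: demo_component_init() is a hypothetical routine that follows the same convention as console_init(), and the 256-byte table size is an arbitrary assumption.

```c
#include <assert.h>

/* Arbitrary amount of memory this hypothetical component needs. */
#define DEMO_TABLE_BYTES 256UL

/* Start address of the area the component claimed (0 if none). */
static unsigned long demo_table_addr;

/* Claim DEMO_TABLE_BYTES at the bottom of [memory_start, memory_end)
 * and return the new start of free memory. If there is not enough
 * room, reserve nothing and return memory_start unchanged. */
unsigned long demo_component_init(unsigned long memory_start,
                                  unsigned long memory_end)
{
    assert(memory_start <= memory_end);
    if (memory_end - memory_start < DEMO_TABLE_BYTES)
        return memory_start;
    demo_table_addr = memory_start;
    return memory_start + DEMO_TABLE_BYTES;
}

/* Accessor for the claimed area. */
unsigned long demo_table_location(void)
{
    return demo_table_addr;
}
```

Each routine called this way pushes memory_start upward, so the next one sees only the memory that is still free.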
Dynamic memory allocation in the kernel
segment
• In the LINUX kernel, kmalloc() and kfree() are used for dynamic
memory allocation
– void * kmalloc(size_t size, int priority);
– void kfree(void *obj);
• To increase efficiency, the memory reserved is not initialized
• In LINUX kernel 1.2, __get_free_pages() can only reserve
contiguous areas of memory of 4, 8, 16, 32, 64, and 128
Kbytes in size
• kmalloc() can reserve far smaller areas of memory
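The six area sizes are successive doublings of a 4-Kbyte page, so a request can be mapped to a power-of-two exponent. This is a minimal sketch of that mapping, assuming the exponent plays the role of the order argument; demo_order_for() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL  /* assumed x86 page size             */
#define DEMO_MAX_ORDER 5       /* 4 Kbytes << 5 = 128 Kbytes        */

/* Return the smallest order such that (PAGE_SIZE << order) can hold
 * `bytes` bytes, or -1 if the request exceeds 128 Kbytes. */
int demo_order_for(size_t bytes)
{
    for (int order = 0; order <= DEMO_MAX_ORDER; order++) {
        if ((DEMO_PAGE_SIZE << order) >= bytes)
            return order;
    }
    return -1;
}
```

A request of 4097 bytes therefore needs order 1 (8 Kbytes), which wastes almost half of the area; this is the gap that kmalloc() fills for smaller requests.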
Continued
• sizes[] contains descriptors for the different
sizes of memory area
– one manages memory suitable for DMA
– the other is responsible for ordinary memory
Continued
[Figure: Structures for kmalloc]
Continued
• kmalloc() and kfree() are restricted to the size of one page of
memory
• vmalloc() and vfree() extend this to multiples of the size of one
page of memory
• The maximum value of size is limited by the amount of physical
memory available
• Memory reserved by vmalloc() is not copied to external
storage
Continued
• vmalloc() compared with kmalloc()
– the size of the area of memory requested can be better
adjusted to actual needs
– Limited only by the size of free physical memory and not
by its segmentation (as kmalloc() is)
– Does not return any physical address
– reserved memory can be non-consecutive pages
– not suitable for reserving memory for DMA
Block Device Caching
• Block Buffering
• The update and bdflush Processes
• List Structures for the Buffer Cache
• Using the Buffer Cache
Block Buffering
• Block size may be 512, 1024, 2048, or 4096 bytes
• Blocks are held in memory via a buffering system
• A special case applies to blocks taken from files
opened with the flag O_SYNC
– They are transferred to disk every time their contents are modified
• Data is organized so that frequently requested data lie
very close together and can be kept in the processor
cache
The update and bdflush
Processes
• At periodic intervals, the update process calls the system call
bdflush with a parameter
• All modified buffer blocks are written back to disk along with all
superblock and inode information
• bdflush writes back the number of block buffers marked
“dirty” given in the bdflush parameter
• It is always activated when a block is released by means of
brelse()
• It is also activated when new block buffers are requested or the
size of the buffer cache needs to be reduced
List structure for the buffer cache
• LINUX manages its block buffers via a number of different
doubly linked lists
• Block buffers in use are managed in a set of special LRU lists
LRU list (index)   Description
BUF_CLEAN          Block buffers not managed in other lists; content matches the relevant block on hard disk
BUF_UNSHARED       Block buffers formerly (but no longer) managed in BUF_SHARED
BUF_LOCKED         Locked block buffers (b_lock != 0)
BUF_LOCKED1        Locked block buffers for inodes and superblocks
BUF_DIRTY          Block buffers with contents not matching the relevant block on hard disk
BUF_SHARED         Block buffers situated in a page of memory mapped to the user segment of a process
Table: The various LRU lists
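The idea of a doubly linked LRU list can be sketched independently of the kernel. This is a minimal illustration, assuming a circular list whose head is the least recently used entry; the demo_ types and functions are hypothetical and do not reproduce the kernel's buffer_head structure.

```c
#include <assert.h>
#include <stddef.h>

/* A buffer entry linked into a circular doubly linked list. */
struct demo_buf {
    int block;                     /* block number, for illustration */
    struct demo_buf *prev, *next;
};

/* One LRU list; head is the least recently used entry (NULL if empty). */
struct demo_lru {
    struct demo_buf *head;
};

/* Unlink b from the list it is on. */
void demo_lru_remove(struct demo_lru *l, struct demo_buf *b)
{
    assert(l->head != NULL);
    if (b->next == b) {            /* b was the only entry */
        l->head = NULL;
    } else {
        b->prev->next = b->next;
        b->next->prev = b->prev;
        if (l->head == b)
            l->head = b->next;
    }
    b->prev = b->next = NULL;
}

/* Append b at the tail, i.e. as the most recently used entry. */
void demo_lru_add_tail(struct demo_lru *l, struct demo_buf *b)
{
    if (l->head == NULL) {
        b->prev = b->next = b;
        l->head = b;
    } else {
        b->next = l->head;
        b->prev = l->head->prev;
        l->head->prev->next = b;
        l->head->prev = b;
    }
}

/* Mark b as just used: move it from its position to the tail. */
void demo_lru_touch(struct demo_lru *l, struct demo_buf *b)
{
    demo_lru_remove(l, b);
    demo_lru_add_tail(l, b);
}
```

With one such list per index, moving a buffer from, say, the clean list to the dirty list is a remove on one list followed by an add on the other.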
Using the buffer cache
• The function bread() is called to read a block
• A variant of bread(), breada(), reads into the buffer cache not only the
requested block but also a number of
following blocks
Paging under LINUX
• Page Cache and Management
• Finding a Free Page
• Page Errors and Reloading a Page
Page Cache and Management
• LINUX can save pages to external media in two ways
– a complete block device as the external medium, typically
a partition on a hard disk
– fixed-length files on a file system for its external storage
• Data that belong together are stored in a cache line
(16 bytes)
Finding a free page
• __get_free_pages() is called when physical pages of memory are to be
reserved
– unsigned long __get_free_pages(int priority, unsigned long order,
int dma);
Priority       Description
GFP_BUFFER     A free page is returned only if free pages are still available in physical memory
GFP_ATOMIC     The function __get_free_page must not interrupt the current process, but a page should be returned if possible
GFP_USER       The current process may be interrupted to swap pages
GFP_KERNEL     This parameter is the same as GFP_USER
GFP_NOBUFFER   The buffer cache will not be reduced by an attempt to find a free page in memory
GFP_NFS        Differs from GFP_USER in that the number of pages reserved for GFP_ATOMIC is reduced from min_free_pages to five; this speeds up NFS operations
Table: Priorities for the function __get_free_page()
Page errors and reloading a page
• do_page_fault() is called when a
page fault interrupt is generated
– void do_page_fault(struct pt_regs *regs, unsigned long
error_code);
• If the address is in a virtual memory area, the legality of the
read or write operation is checked by reference to the
flags for the virtual memory area, and do_no_page() or
do_wp_page() is then called
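The decision described above can be sketched as a small classifier. This is a simplified illustration, not the kernel's do_page_fault(): it assumes the x86 error-code layout (bit 0 set for a protection violation on a present page, bit 1 set for a write), and the DEMO_ flags, the enum, and the function name are hypothetical.

```c
#include <assert.h>

/* x86 page-fault error-code bits used by this sketch. */
#define DEMO_ERR_PRESENT 0x1UL  /* fault on a page that is present */
#define DEMO_ERR_WRITE   0x2UL  /* fault was caused by a write     */

/* Hypothetical permission flags of the virtual memory area. */
#define DEMO_VM_READ  0x1u
#define DEMO_VM_WRITE 0x2u

enum demo_fault_action {
    DEMO_BAD_ACCESS,  /* operation not permitted by the area's flags  */
    DEMO_NO_PAGE,     /* page missing: the do_no_page() case          */
    DEMO_WP_PAGE      /* write to protected page: the do_wp_page() case */
};

/* Decide how a fault inside a virtual memory area would be handled. */
enum demo_fault_action demo_classify_fault(unsigned long error_code,
                                           unsigned vm_flags)
{
    int is_write = (error_code & DEMO_ERR_WRITE) != 0;

    /* First check the legality of the operation against the flags. */
    if (is_write && !(vm_flags & DEMO_VM_WRITE))
        return DEMO_BAD_ACCESS;
    if (!is_write && !(vm_flags & DEMO_VM_READ))
        return DEMO_BAD_ACCESS;

    /* Present page: only a write fault is recoverable in this sketch
     * (for example, a copy-on-write page). */
    if (error_code & DEMO_ERR_PRESENT)
        return is_write ? DEMO_WP_PAGE : DEMO_BAD_ACCESS;

    /* Page not present: it has to be loaded or created. */
    return DEMO_NO_PAGE;
}
```

Treating a read fault on a present page as a bad access is a simplification made for this sketch.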

