* Based on kernel 6.3 (x86_64) – QEMU
* 2-socket CPUs (4 cores/socket)
* 16GB memory
* Kernel parameter: nokaslr norandmaps
* KASAN: disabled
* Userspace: ASLR is disabled
* Legacy BIOS
Memory Management with Page Folios
Adrian Huang | May, 2023
Agenda
• Problem description
✓[Background] Normal high-order page & compound page
✓Legacy page cache
• Memory folio’s goal
• Solution
✓Page cache with memory folio
✓page struct vs folio struct
✓[Example] total_mapcount() implementation difference between legacy approach and folio
Normal high-order page & compound page
Compound page layout (order 2: four struct pages)
• Head page: flags |= PG_head
• First tail page only: compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr
• Second tail page only: _compound_pad_1 (compound_head), hpage_pinned_refcount, deferred_list
• Remaining tail pages: _compound_pad_1 (compound_head)
Normal high-order page
pages = alloc_pages(GFP_KERNEL, 2);
Four physically contiguous pages: not a compound page
Compound page
pages = alloc_pages(GFP_KERNEL | __GFP_COMP, 2);
Four physically contiguous pages: compound page metadata is initialized during page allocation
Compound page – Use Cases
• Mainly used in huge page
✓ hugetlbfs (also called HugeTLB Pages or persistent huge pages)
➢ Reserved inside the kernel and cannot be used for other purposes.
➢ Cannot be swapped out.
➢ Two allocation methods:
• Pre-allocated into the kernel huge page pool by appending a kernel boot parameter.
• [Dynamically allocated huge pages of the default size] Example: `echo 10 > /proc/sys/vm/nr_hugepages`
➢ A user application calls the mmap system call or the shared memory system calls (shmget and shmat) to request huge pages.
➢ Used by databases for many years.
➢ Manual configuration of hugetlb pages is required.
➢ Application changes are required (via open/mmap).
Compound page – Use Cases
• Mainly used in huge page
✓ Transparent Huge Page (THP)
➢ Supports automatic promotion and demotion of page sizes.
➢ Transparent to the application: no application changes are needed.
➢ Control via /sys/kernel/mm/transparent_hugepage/enabled:
Compound page – Use Cases
• Mainly used in huge page
• kmalloc: allocation size > 8192 bytes
o Check kmalloc_order()
• Memory folio
When to configure compound page?
• Condition: page order >= 1 && __GFP_COMP allocation flag is set
• alloc_pages -> … -> prep_new_page -> prep_compound_page
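The condition and call chain above can be sketched in userspace. This is only an illustrative model: `struct page_model`, `prep_compound_model()`, `compound_head_model()`, and `alloc_pages_model()` are hypothetical names, not kernel APIs. The part that follows the kernel's scheme is the idea itself: the head page carries a head flag, and every tail page stores the head's address with bit 0 set so that a tail page can be recognized and resolved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct page_model {
    unsigned long flags;          /* bit 0 plays the role of PG_head */
    uintptr_t compound_head;      /* tail pages: head address with bit 0 set */
    unsigned char compound_order; /* meaningful in the first tail page only */
};

#define PG_HEAD_MODEL 0x1UL

/* Mirrors the idea of prep_compound_page(): mark the head, link every tail. */
static void prep_compound_model(struct page_model *page, unsigned int order)
{
    unsigned int nr = 1U << order;

    assert(order >= 1);           /* order-0 pages are never compound */
    page[0].flags |= PG_HEAD_MODEL;
    page[1].compound_order = (unsigned char)order;
    for (unsigned int i = 1; i < nr; i++)
        page[i].compound_head = (uintptr_t)&page[0] | 1UL;
}

/* Mirrors compound_head(): a set bit 0 means "tail page, follow the link". */
static struct page_model *compound_head_model(struct page_model *page)
{
    uintptr_t head = page->compound_head;

    if (head & 1UL)
        return (struct page_model *)(head - 1);
    return page;
}

/* Allocates 1 << order zeroed page models; links them only if want_comp
 * is set and order >= 1, matching the condition stated above. */
static struct page_model *alloc_pages_model(int want_comp, unsigned int order)
{
    struct page_model *pages = calloc(1U << order, sizeof(*pages));

    if (pages && want_comp && order >= 1)
        prep_compound_model(pages, order);
    return pages;
}
```

Calling `alloc_pages_model(1, 2)` yields four linked page models where any tail resolves to page 0; `alloc_pages_model(0, 2)` yields four independent ones, matching the two `alloc_pages()` calls shown earlier.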
Compound page: Problem Description #1
[Problem Description] No unified interface: Ambiguity
• Some functions operate in PAGE_SIZE units (4KB): they are unaware of compound pages and huge pages
• Some functions accept the head page *only*
• Some functions accept either the head page or a tail page
✓ They call compound_head() to find the head page: extra instructions on every call → performance impact
✓ compound_head() users:
➢ get_page(): called very frequently
➢ put_page(): called very frequently
➢ …
Legacy page cache
[Figure: user space reaches the page cache through read()/write()/sendfile() and mmap(). In kernel space, file->f_pos (continuous file position) indexes 4KB page cache pages; each page is tracked by buffer heads (buffer cache) that map it to 512B sectors on disk.]
Legacy page cache: Problem Description #2
1. The page cache occupies most memory pages.
2. Every page cache page is added to the active/inactive LRU list individually (there is no “compound page” concept)
• Long LRU list: lock contention & cache misses
Memory folio’s goal
• Unified interface
✓All accesses via folio struct (head page/tail page in a compound page)
• [Page Cache] Shorter LRU list
✓Original: one struct page per 4KB is added to the LRU list
✓Folio: one struct page (the head page) per 8KB, 16KB, 32KB, 64KB, 128KB, and so on (including THP) is added to the LRU list
• [Anonymous page] THP support
✓create_huge_pmd -> do_huge_pmd_anonymous_page ->
__do_huge_pmd_anonymous_page
* THP: Transparent Huge Page
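To make the "shorter LRU list" goal concrete, here is a small arithmetic sketch. It assumes 4KB base pages and that every folio in the cache has the same order, which is a simplification; `lru_entries()` and `PAGE_SHIFT_MODEL` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12   /* 4KB base pages, as on x86_64 */

/* Number of LRU entries needed to track cache_bytes of page cache when
 * every entry is a folio of 2^order base pages (order 0 = legacy 4KB). */
static uint64_t lru_entries(uint64_t cache_bytes, unsigned int order)
{
    uint64_t folio_bytes;

    assert(PAGE_SHIFT_MODEL + order < 64);  /* keep the shift well defined */
    folio_bytes = (uint64_t)1 << (PAGE_SHIFT_MODEL + order);
    return (cache_bytes + folio_bytes - 1) / folio_bytes;  /* round up */
}
```

Under these assumptions, 8GB of page cache needs 2,097,152 LRU entries with 4KB pages, 524,288 with order-2 (16KB) folios, and 4,096 with order-9 (2MB) folios.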
Page cache with folio
[Figure: same layout as the legacy page cache, except that file->f_pos (continuous file position) now indexes a folio that groups several 4KB pages; each page still maps to 512B sectors on disk. User space access is unchanged: read()/write()/sendfile() and mmap().]
• Folio is the container of struct page(s)
✓ All accesses via folio struct
✓ No tail page → fewer run-time checks
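The "fewer run-time checks" point can be illustrated with a toy model. Everything here is a hypothetical userspace stand-in (`page_model`, `folio_model`, and the `*_model()` helpers are not kernel APIs), and the `head_lookups` counter exists only to make the difference visible: a page-based helper must resolve the head page on every call, while a folio-based helper resolves it once at conversion time and never again.

```c
#include <assert.h>
#include <stdint.h>

struct page_model {
    uintptr_t compound_head;  /* tail pages: head address with bit 0 set */
    int refcount;
};

/* A folio is never a tail page, so it can simply wrap the head page. */
struct folio_model {
    struct page_model page;
};

static int head_lookups;      /* counts head/tail checks, for illustration */

static struct page_model *compound_head_model(struct page_model *page)
{
    head_lookups++;
    if (page->compound_head & 1UL)
        return (struct page_model *)(page->compound_head - 1);
    return page;
}

/* Legacy style: every call must resolve the head page first. */
static void get_page_model(struct page_model *page)
{
    compound_head_model(page)->refcount++;
}

/* Folio style: resolve once at the boundary ... */
static struct folio_model *page_folio_model(struct page_model *page)
{
    return (struct folio_model *)compound_head_model(page);
}

/* ... then operate on the folio with no head/tail check at all. */
static void folio_get_model(struct folio_model *folio)
{
    folio->page.refcount++;
}
```

Taking three references through a tail page costs three head lookups with `get_page_model()`, but only one with `page_folio_model()` followed by three `folio_get_model()` calls.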
Folio’s page order: readahead mechanism
• CONFIG_TRANSPARENT_HUGEPAGE is enabled
✓ Minimum: order 2 (4 pages)
✓ Maximum: order 9 (512 pages)
• CONFIG_TRANSPARENT_HUGEPAGE is disabled
✓ Minimum: order 2 (4 pages)
✓ Maximum: order 8 (256 pages)
• Commit 793917d997df (“mm/readahead: Add large folio readahead”): merged in 5.18 kernel
• Default readahead size: 128KB (32 pages)
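The order limits above can be explored with a simplified model of how the next readahead folio order might be chosen. This is a sketch under assumptions, not the kernel function: it assumes the order grows by two per readahead round, is capped at the page cache maximum, and is reduced until the folio fits in the readahead window. `next_ra_order()` and its parameters are hypothetical names; check the actual readahead code before relying on the details.

```c
#include <assert.h>

/* Picks the order for the next readahead folio. prev_order is the order of
 * the folio that triggered readahead, ra_pages is the readahead window in
 * pages, and max_order is the page cache limit (9 with THP, 8 without). */
static unsigned int next_ra_order(unsigned int prev_order,
                                  unsigned int ra_pages,
                                  unsigned int max_order)
{
    unsigned int order = prev_order;

    assert(ra_pages >= 1);
    if (order < max_order) {
        order += 2;                       /* assumed growth: two orders per round */
        if (order > max_order)
            order = max_order;
        while ((1U << order) > ra_pages)  /* never exceed the window */
            order--;
    }
    return order;
}
```

With the default 32-page window, this model starts at order 2 (the stated minimum), moves to order 4, and then stays at order 5 because an order-6 folio (64 pages) would not fit in the window. The order 8 or order 9 maximum only comes into play with a larger readahead size.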
1. Shorter LRU list: only the head page of a folio is added to the LRU list → performance improvement
2. 45% improvement for lru-file-mmap-read (vm-scalability), according to Matthew Wilcox’s PDF file
Page cache with folio: backtrace * kernel: 6.3
page struct vs folio struct
folio (one union per struct page slot):
• Page #0 (head): union of
✓ struct { flags; union { struct list_head lru; struct { void *__filler; mlock_count } }; struct address_space *mapping; pgoff_t index; void *private; atomic_t _mapcount; atomic_t _refcount; unsigned long memcg_data }
✓ struct page page
• Page #1 (first tail): union of
✓ struct { _flags_1; _head_1; unsigned char _folio_dtor; unsigned char _folio_order; atomic_t _entire_mapcount; atomic_t _nr_pages_mapped; atomic_t _pincount; unsigned int _folio_nr_pages }
✓ struct page __page_1
• Page #2 (second tail): union of
✓ struct { _flags_2; _head_2; void *_hugetlb_subpool; void *_hugetlb_cgroup; void *_hugetlb_cgroup_rsvd; void *_hugetlb_hwpoison }
✓ struct { _flags_2a; _head_2a; struct list_head _deferred_list }
✓ struct page __page_2
Compound pages (legacy page struct):
• Page #0 (head): flags, …
• Page #1 (tail): flags, compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr, …
• Page #2 (tail): flags, _compound_pad_1, hpage_pinned_refcount, deferred_list
• … (remaining tail pages)
folio’s benefit
• [Example] 512KB compound page
✓ page struct: Need to maintain 128 page structs (1 head page and 127 tail pages)
✓ folio struct: Maintain 3 page structs regardless of the size of compound pages.
• In struct folio, the members struct page __page_1 and struct page __page_2 won’t be used.
page struct vs folio struct
folio struct’s members
• _entire_mapcount
✓ Counts mappings of the entire compound page via a single PMD (huge page).
• _nr_pages_mapped
✓ Number of individual subpages (4KB pages) that are mapped via PTEs.
✓ Scenario: Two processes map the same memory range
✓ One process maps the entire 2MB compound page (Transparent Huge Page - THP): mapped via a single PMD
✓ The other process maps some 4KB pages within this 2MB memory area: mapped via PTEs
✓ THP benefit: No need to split the huge page if other processes map 4KB pages within the same memory area.
• _folio_nr_pages
✓ Number of pages in this folio.
✓ _folio_nr_pages = 1 << order, where order > 0.
folio struct vs legacy compound page
Page #1 of the folio and the first tail page of the legacy compound page occupy the same struct page, so their fields overlay each other.
[kernel v5.11] total_mapcount()
struct page
• First union (one struct per use):
✓ Page cache and anonymous pages
✓ page_pool used by netstack
✓ slab, slob and slub
✓ Tail pages of compound page
✓ Second tail page of compound page
✓ Page table pages: (1) PMD huge PTE, (2) x86 pgd page <-> mm_struct
✓ ZONE_DEVICE pages
✓ rcu_head: free a page by RCU
• Second union:
✓ atomic_t _mapcount: number of page table entries that reference this page
✓ unsigned int page_type
✓ unsigned int active: used by slab
✓ int units: used by slob
• atomic_t _refcount
• …
Case 1: Singleton page(s)
Get _mapcount directly
total_mapcount() users:
• Huge page
• rmap (reverse mapping)
Case 2: Compound page && hugetlb (hugetlbfs) page
page (first tail)
• compound_head
• compound_dtor
• compound_order
• compound_mapcount
• compound_nr = 1 << compound_order
• _mapcount / page_type / active / units union, _refcount, …
page (second tail)
• _compound_pad_1 (compound_head)
• hpage_pinned_refcount
• deferred_list
• _mapcount / page_type / active / units union, _refcount, …
Get compound_mapcount directly
compound_mapcount:
• Map count of the whole compound page (does not include mapped sub-pages)
Steps:
1. Get the head page from any page (head page or tail page)
2. Read ‘compound_mapcount’ from the first tail page
A. page[1].compound_mapcount
Case 3: [Anonymous page] Compound page && transparent huge page
Steps:
1. Get the head page from any page (head page or tail page)
2. Read ‘compound_mapcount’ from the first tail page
A. page[1].compound_mapcount: map count of the whole compound page
3. Accumulate each subpage’s _mapcount. Example scenario:
A. One process maps the 2MB range as a single huge page (a single PMD)
B. Another process maps it as 512 individual PTEs
4. Result: sum of each subpage’s _mapcount + page[1].compound_mapcount
Case 3: [Page cache] Compound page && transparent huge page
Steps:
1. Get the head page from any page (head page or tail page)
2. Read ‘compound_mapcount’ from the first tail page
A. page[1].compound_mapcount: map count of the whole compound page
3. Accumulate each subpage’s _mapcount. Example scenario:
A. One process maps the 2MB range as a single huge page (a single PMD)
B. Another process maps it as 512 individual PTEs
4. Result: sum of each subpage’s _mapcount + page[1].compound_mapcount - page[1].compound_mapcount * page[1].compound_nr
A. For file pages, compound_mapcount is already included in each subpage’s _mapcount
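The three v5.11 cases can be put side by side in a small userspace model. This is a simplified sketch, not the kernel source: `struct page_model`, `enum page_kind`, and `total_mapcount_v511_model()` are hypothetical, the page kind is passed in explicitly instead of being derived from page flags, and the extra adjustment the real function makes for double-mapped anonymous pages is left out. Counters are stored biased by -1, as the kernel does for _mapcount and compound_mapcount.

```c
#include <assert.h>

/* Hypothetical simplified page; counters are stored as (real count - 1). */
struct page_model {
    int mapcount;             /* PTE mappings of this 4KB page, minus 1 */
    int compound_mapcount;    /* first tail only: whole-page mappings, minus 1 */
    unsigned int compound_nr; /* first tail only: pages in the compound page */
};

enum page_kind { SINGLETON, HUGETLB, THP_ANON, THP_FILE };

static int total_mapcount_v511_model(const struct page_model *head,
                                     enum page_kind kind)
{
    int compound, ret;
    unsigned int nr;

    if (kind == SINGLETON)                  /* Case 1: read _mapcount */
        return head->mapcount + 1;

    compound = head[1].compound_mapcount + 1;
    if (kind == HUGETLB)                    /* Case 2: compound_mapcount only */
        return compound;

    nr = head[1].compound_nr;
    ret = compound;
    for (unsigned int i = 0; i < nr; i++)   /* Case 3: walk every subpage */
        ret += head[i].mapcount + 1;

    if (kind == THP_FILE)                   /* file pages already count the */
        ret -= compound * (int)nr;          /* PMD mapping in each _mapcount */
    return ret;
}
```

The loop over every subpage is the cost the folio version later avoids in the common case where no subpage is mapped by PTE.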
[kernel v6.3] total_mapcount() and folio_mapcount()
Case 1: Singleton page – Not a compound page
Case 2: Compound page is mapped via PMD (huge page)
• Get _entire_mapcount
• Get _nr_pages_mapped
Case 3: Compound page is mapped via PMD (huge page) and some subpages are mapped by PTE
• mapcount = folio’s _entire_mapcount + sum(each subpage’s _mapcount)
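The three v6.3 cases can be modeled the same way. Again this is a hedged userspace sketch: `struct folio_model` flattens only the fields named on the slides, `folio_mapcount_model()` is a hypothetical name, and the fixed eight-page array is an arbitrary limit for the example. Counters are stored biased by -1, so the loop adds the raw values and then adds the page count back.

```c
#include <assert.h>

struct page_model {
    int mapcount;               /* per-page PTE mappings, stored minus 1 */
};

/* Hypothetical flattened folio: only the fields used on the slides. */
struct folio_model {
    unsigned int order;         /* 0 for a singleton page */
    int entire_mapcount;        /* PMD mappings of the whole folio, minus 1 */
    int nr_pages_mapped;        /* subpages currently mapped by PTE */
    struct page_model pages[8]; /* room for up to order 3 in this sketch */
};

static int folio_mapcount_model(const struct folio_model *folio)
{
    int nr_pages, mapcount;

    assert(folio->order <= 3);
    if (folio->order == 0)                  /* Case 1: singleton page */
        return folio->pages[0].mapcount + 1;

    mapcount = folio->entire_mapcount + 1;  /* Case 2: PMD mappings */
    if (folio->nr_pages_mapped == 0)
        return mapcount;                    /* no PTE mappings: skip the loop */

    nr_pages = 1 << folio->order;           /* _folio_nr_pages */
    for (int i = 0; i < nr_pages; i++)      /* Case 3: add PTE mappings */
        mapcount += folio->pages[i].mapcount;
    return mapcount + nr_pages;             /* undo the -1 bias per page */
}
```

Compared with the v5.11 model, the PMD-only case returns without touching any subpage, and there is no separate file-page correction.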
Reference
• Memory Folios
• LWN - A memory-folio update
• LWN - An introduction to compound pages
• LWN - Huge pages part 1 (Introduction)
• Documentation/mm/transhuge.rst
backup
Learn new C standard (C11) from folio: Generic Selection
* Reference from: ISO/IEC 9899:201x
• C99 defines type-generic macros in the standardized
library: the type of argument is detected automatically,
and the corresponding function is invoked based on that
type.
✓ Example: sqrt(X),
➢ X is double → invoke sqrt()
➢ X is float → invoke sqrtf()
➢ X is long double → invoke sqrtl()
• However, programmers cannot define their own type-generic macros in C99.
• In C11, programmers can define their own type-generic macros via generic selection (_Generic).
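A minimal sketch of C11 generic selection, assuming a C11-capable compiler. The macros `type_name` and `cube` and the helpers `cube_f` and `cube_d` are hypothetical examples written for this note; they are not taken from the kernel or from the slides.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type-generic macro: selects a string by the static type of x. */
#define type_name(x) _Generic((x), \
    int:         "int",            \
    float:       "float",          \
    double:      "double",         \
    long double: "long double",    \
    default:     "other")

/* Hypothetical per-type implementations behind one type-generic name. */
static float  cube_f(float v)  { return v * v * v; }
static double cube_d(double v) { return v * v * v; }

/* Selects the function by argument type at compile time, then calls it. */
#define cube(x) _Generic((x), float: cube_f, default: cube_d)(x)
```

Here `cube(2.0f)` expands to a call to `cube_f`, while `cube(3)` falls through to the `default` association and calls `cube_d`. The selection happens entirely at compile time, so it adds no run-time dispatch cost.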
Some Functions

More Related Content

PDF
Memory Mapping Implementation (mmap) in Linux Kernel
PDF
Memory Compaction in Linux Kernel.pdf
PDF
Reverse Mapping (rmap) in Linux Kernel
PDF
Physical Memory Models.pdf
PDF
Anatomy of the loadable kernel module (lkm)
PDF
Decompressed vmlinux: linux kernel initialization from page table configurati...
PDF
Page cache in Linux kernel
PDF
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
Memory Mapping Implementation (mmap) in Linux Kernel
Memory Compaction in Linux Kernel.pdf
Reverse Mapping (rmap) in Linux Kernel
Physical Memory Models.pdf
Anatomy of the loadable kernel module (lkm)
Decompressed vmlinux: linux kernel initialization from page table configurati...
Page cache in Linux kernel
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...

What's hot (20)

PDF
Process Address Space: The way to create virtual address (page table) of user...
PDF
malloc & vmalloc in Linux
PDF
Physical Memory Management.pdf
PPTX
Slab Allocator in Linux Kernel
PDF
Linux Kernel - Virtual File System
PPTX
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
PDF
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
PDF
Kdump and the kernel crash dump analysis
PDF
semaphore & mutex.pdf
PDF
spinlock.pdf
PPTX
Linux Kernel Booting Process (2) - For NLKB
PDF
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
PDF
Linux Synchronization Mechanism: RCU (Read Copy Update)
PDF
Linux kernel debugging
PPTX
Linux Kernel Booting Process (1) - For NLKB
PPTX
Linux Memory Management
PDF
Memory management in Linux kernel
PDF
The Linux Block Layer - Built for Fast Storage
PDF
XPDDS17: Shared Virtual Memory Virtualization Implementation on Xen - Yi Liu,...
PDF
Arm device tree and linux device drivers
Process Address Space: The way to create virtual address (page table) of user...
malloc & vmalloc in Linux
Physical Memory Management.pdf
Slab Allocator in Linux Kernel
Linux Kernel - Virtual File System
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Kdump and the kernel crash dump analysis
semaphore & mutex.pdf
spinlock.pdf
Linux Kernel Booting Process (2) - For NLKB
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Linux Synchronization Mechanism: RCU (Read Copy Update)
Linux kernel debugging
Linux Kernel Booting Process (1) - For NLKB
Linux Memory Management
Memory management in Linux kernel
The Linux Block Layer - Built for Fast Storage
XPDDS17: Shared Virtual Memory Virtualization Implementation on Xen - Yi Liu,...
Arm device tree and linux device drivers
Ad

Similar to Memory Management with Page Folios (20)

PDF
Page Cache in Linux 2.6.pdf
PPTX
Linux Memory Management with CMA (Contiguous Memory Allocator)
PDF
Understand
PDF
Practical ,Transparent Operating System Support For Superpages
PDF
AOS Lab 7: Page tables
PDF
Vmreport
PPT
Windows memory manager internals
PPT
Chapter 04
PDF
Buddy system
PDF
Кирилл Шутемов aka “kas” - О Transparent Hugepages и Huge zero page
PPTX
Virtual Memory Managementddddddddffffffffffffff.pptx
PPTX
Abhaycavirtual memory and the pagehit.pptx
PDF
Unit 5
PPT
Segmentation with paging methods and techniques
PPT
Paging and Segmentation
PPT
Linux memory
PPT
Linux Memory Management
PPT
Memory Management
PPT
Computer memory management
PPT
Page Cache in Linux 2.6.pdf
Linux Memory Management with CMA (Contiguous Memory Allocator)
Understand
Practical ,Transparent Operating System Support For Superpages
AOS Lab 7: Page tables
Vmreport
Windows memory manager internals
Chapter 04
Buddy system
Кирилл Шутемов aka “kas” - О Transparent Hugepages и Huge zero page
Virtual Memory Managementddddddddffffffffffffff.pptx
Abhaycavirtual memory and the pagehit.pptx
Unit 5
Segmentation with paging methods and techniques
Paging and Segmentation
Linux memory
Linux Memory Management
Memory Management
Computer memory management
Ad

Recently uploaded (20)

PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
PDF
Understanding Forklifts - TECH EHS Solution
PDF
System and Network Administration Chapter 2
PDF
AI in Product Development-omnex systems
PDF
Design an Analysis of Algorithms II-SECS-1021-03
PDF
Nekopoi APK 2025 free lastest update
PDF
Adobe Illustrator 28.6 Crack My Vision of Vector Design
PPTX
Operating system designcfffgfgggggggvggggggggg
PDF
System and Network Administraation Chapter 3
PPT
Introduction Database Management System for Course Database
PDF
Which alternative to Crystal Reports is best for small or large businesses.pdf
PPTX
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
PDF
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
PDF
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PDF
How to Choose the Right IT Partner for Your Business in Malaysia
PPTX
Online Work Permit System for Fast Permit Processing
PPTX
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
PDF
How Creative Agencies Leverage Project Management Software.pdf
PDF
Softaken Excel to vCard Converter Software.pdf
VVF-Customer-Presentation2025-Ver1.9.pptx
Understanding Forklifts - TECH EHS Solution
System and Network Administration Chapter 2
AI in Product Development-omnex systems
Design an Analysis of Algorithms II-SECS-1021-03
Nekopoi APK 2025 free lastest update
Adobe Illustrator 28.6 Crack My Vision of Vector Design
Operating system designcfffgfgggggggvggggggggg
System and Network Administraation Chapter 3
Introduction Database Management System for Course Database
Which alternative to Crystal Reports is best for small or large businesses.pdf
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
2025 Textile ERP Trends: SAP, Odoo & Oracle
How to Choose the Right IT Partner for Your Business in Malaysia
Online Work Permit System for Fast Permit Processing
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
How Creative Agencies Leverage Project Management Software.pdf
Softaken Excel to vCard Converter Software.pdf

Memory Management with Page Folios

  • 1. * Based on kernel 6.3 (x86_64) – QEMU * 2-socket CPUs (4 cores/socket) * 16GB memory * Kernel parameter: nokaslr norandmaps * KASAN: disabled * Userspace: ASLR is disabled * Legacy BIOS Memory Management with Page Folios Adrian Huang | May, 2023
  • 2. Agenda • Problem description ✓[Background] Normal high-order page & compound page ✓Legacy page cache • Memory folio’s goal • Solution ✓Page cache with memory folio ✓page struct vs folio struct ✓[Example] total_mapcount() implementation difference between legacy approach and folio
  • 3. Normal high-order page & compound page Page (head page) flags |= PG_head Page (Tail page) compound_head compound_dtor compound_order compound_mapcount compound_nr First Tail Page only Page (Tail page) _compound_pad_1 (compound_head) hpage_pinned_refcount deferred_list 2nd Tail Page only Compound Page Page (Tail page) _compound_pad_1 (compound_head) Page Page Page Page Normal high-order page pages = alloc_pages(GFP_KERNEL, 2); Four physically contiguous pages: Init compound page metadata during page allocation pages = alloc_pages(GFP_KERNEL | __GFP_COMP, 2); Four physically contiguous pages: not a compound page
  • 4. Compound page Page (head page) flags |= PG_head Page (Tail page) compound_head compound_dtor compound_order compound_mapcount compound_nr First Tail Page only Page (Tail page) _compound_pad_1 (compound_head) hpage_pinned_refcount deferred_list 2nd Tail Page only . . . Compound Page Page (Tail page) _compound_pad_1 (compound_head) Page (Tail page) _compound_pad_1 (compound_head) Compound page – Use Cases • Mainly used in huge page ✓ hugetlbfs (also called HugeTLB Pages or persistent huge pages) ➢ Reserved inside the kernel and cannot be used for other purposes. ➢ Cannot be swapped out. ➢ Two allocation methods: • Pre-allocated to the kernel huge page pool with appending kernel parameter. • [Dynamically allocated huge pages of the default size] Example: `echo 10 > /proc/sys/vm/nr_hugepages` ➢ User application calls the mmap system call or shared memory system calls (shmget and shmat) to request the huge page allocation. ➢ Used by database for many years ➢ Manual configuration for hugetlb pages is required. ➢ Application change is required. (via open/mmap)
  • 5. Compound page Page (head page) flags |= PG_head Page (Tail page) compound_head compound_dtor compound_order compound_mapcount compound_nr First Tail Page only Page (Tail page) _compound_pad_1 (compound_head) hpage_pinned_refcount deferred_list 2nd Tail Page only . . . Compound Page Page (Tail page) _compound_pad_1 (compound_head) Page (Tail page) _compound_pad_1 (compound_head)
  • 6. Compound page Page (head page) flags |= PG_head Page (Tail page) compound_head compound_dtor compound_order compound_mapcount compound_nr First Tail Page only Page (Tail page) _compound_pad_1 (compound_head) hpage_pinned_refcount deferred_list 2nd Tail Page only . . . Compound Page Page (Tail page) _compound_pad_1 (compound_head) Page (Tail page) _compound_pad_1 (compound_head) Compound page – Use Cases • Mainly used in huge page ✓ Transparent Huge Page (THP) ➢ Support the automatic promotion and demotion of page sizes ➢ Transparent to the application: No need to modify application. ➢ Control via /sys/kernel/mm/transparent_hugepage/enabled:
  • 7. Compound page Page (head page) flags |= PG_head Page (Tail page) compound_head compound_dtor compound_order compound_mapcount compound_nr First Tail Page only Page (Tail page) _compound_pad_1 (compound_head) hpage_pinned_refcount deferred_list 2nd Tail Page only . . . Compound Page Page (Tail page) _compound_pad_1 (compound_head) Page (Tail page) _compound_pad_1 (compound_head) Compound page – Use Cases • Mainly used in huge page • kmalloc: allocation size > 8192 bytes o Check kmalloc_order() • Memory folio When to configure compound page? • Condition: page order >= 1 && __GFP_COMP allocation flag is set • alloc_pages -> … -> prep_new_page -> prep_compound_page
  • 8. Compound page: Problem Description #1 Page (head page) flags |= PG_head Page (Tail page) compound_head compound_dtor compound_order compound_mapcount compound_nr First Tail Page only Page (Tail page) _compound_pad_1 (compound_head) hpage_pinned_refcount deferred_list 2nd Tail Page only . . . Compound Page Page (Tail page) _compound_pad_1 (compound_head) Page (Tail page) _compound_pad_1 (compound_head) [Problem Description] No unified interface: Ambiguity • Some functions may deal with PAGES_SIZE unit (4KB): They’re unaware of compound pages and huge pages • Some functions accept the page head *only* • Some functions accept the page head or page tail ✓ Call compound_head() to get the page head: waste instructions to get the page head → Performance impact ✓ compound_head() users: ➢ get_page(): This function is called quite frequently ➢ put_page(): This function is called quite frequently ➢ …
  • 9. Legacy page cache
    [Figure: user space (read()/write()/sendfile(), mmap()) reaches a file (file->f_pos: continuous file position) through the kernel-space page cache of individual 4KB pages; each page maps through the buffer cache (buffer heads) to 512B sectors on disk]
  • 10. Legacy page cache: Problem Description #2
    1. The page cache occupies most of the memory pages.
    2. Each page-cache page (there is no “compound page” concept) is added to the active/inactive LRU list individually.
       • Long LRU list: lock contention & cache misses
  • 12. Memory folio’s goal
    • Unified interface
      ✓ All accesses via folio struct (head page/tail page in a compound page)
    • [Page Cache] Shorter LRU list
      ✓ Original: one struct page per 4KB is added to the LRU list
      ✓ Folio: one struct page (page head) per 8KB, 16KB, 32KB, 64KB, 128KB, and so on (including THP) is added to the LRU list
    • [Anonymous page] THP support
      ✓ create_huge_pmd -> do_huge_pmd_anonymous_page -> __do_huge_pmd_anonymous_page
    * THP: Transparent Huge Page
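The effect on LRU length is simple arithmetic, sketched below. `model_lru_entries()` is a hypothetical helper that assumes every folio in the cache has the same order, which real workloads will not guarantee.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL /* 4KB base page, as in the slides */

/* Number of LRU entries needed to track 'bytes' of page cache when every
 * folio has the given order (order 0 models the legacy one-page case). */
unsigned long model_lru_entries(unsigned long bytes, unsigned int order)
{
    unsigned long folio_bytes;

    assert(order < 20); /* keep the shift well inside unsigned long */
    folio_bytes = MODEL_PAGE_SIZE << order;
    return (bytes + folio_bytes - 1) / folio_bytes; /* round up */
}
```

For 1MB of cached file data, this gives 256 entries with 4KB pages, 64 entries with 16KB folios, and 16 entries with 64KB folios.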
  • 14. Page cache with folio
    [Figure: user space (read()/write()/sendfile(), mmap()) reaches a file (file->f_pos: continuous file position) through kernel-space folios; each folio contains several 4KB pages, and each page maps to 512B sectors on disk]
    • Folio is the container of struct page(s)
      ✓ All accesses via folio struct
      ✓ No tail page → fewer run-time checks
  • 15. Page cache with folio
    [Figure: page cache built from folios, each containing several 4KB pages that map to 512B disk sectors]
    Folio’s page order: readahead mechanism
    • CONFIG_TRANSPARENT_HUGEPAGE is enabled
      ✓ Minimum: order 2 (4 pages)
      ✓ Maximum: order 9 (512 pages)
    • CONFIG_TRANSPARENT_HUGEPAGE is disabled
      ✓ Minimum: order 2 (4 pages)
      ✓ Maximum: order 8 (256 pages)
    • Commit 793917d997df (“mm/readahead: Add large folio readahead”): merged in 5.18 kernel
    • Default readahead size: 128KB (32 pages)
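The order limits and the default readahead size can be related with a short calculation. The sketch below is an illustrative model only and does not reproduce the kernel's readahead logic: it simply picks the largest order within the limits quoted above whose page count fits in the readahead window. All `MODEL_*` and `model_*` names are hypothetical.

```c
#define MODEL_PAGE_SIZE 4096UL

/* Order limits quoted on the slide for large-folio readahead. */
#define MODEL_MIN_ORDER 2u
#define MODEL_MAX_ORDER_THP 9u    /* CONFIG_TRANSPARENT_HUGEPAGE enabled */
#define MODEL_MAX_ORDER_NO_THP 8u /* CONFIG_TRANSPARENT_HUGEPAGE disabled */

/* Largest folio order, within the limits above, whose page count fits in a
 * readahead window of 'ra_pages' pages. Returns 0 (single pages) when even
 * the minimum order does not fit. */
unsigned int model_max_folio_order(unsigned long ra_pages, int thp_enabled)
{
    unsigned int order = thp_enabled ? MODEL_MAX_ORDER_THP
                                     : MODEL_MAX_ORDER_NO_THP;

    while (order >= MODEL_MIN_ORDER && (1UL << order) > ra_pages)
        order--;
    return order >= MODEL_MIN_ORDER ? order : 0;
}

/* Bytes covered by one folio of the given order. */
unsigned long model_folio_bytes(unsigned int order)
{
    return MODEL_PAGE_SIZE << order;
}
```

Under this model, the default 32-page window caps folios at order 5, which is exactly 128KB; the order 8 and order 9 maxima only come into play with a larger readahead window.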
  • 16. Page cache with folio
    [Figure: page cache built from folios, each containing several 4KB pages that map to 512B disk sectors]
    1. Short LRU list: only the head page of a folio is added to the LRU list → performance improvement
    2. 45% improvement for lru-file-mmap-read (vm-scalability): Matthew Wilcox’s PDF file
  • 17. Page cache with folio: backtrace * kernel: 6.3
  • 19. page struct vs folio struct
    [Figure: struct folio laid out over the page structs of a compound page]
    folio:
    • First slot (union with struct page page): flags, struct list_head lru (or void *__filler + mlock_count), struct address_space *mapping, pgoff_t index, void *private, atomic_t _mapcount, atomic_t _refcount, unsigned long memcg_data
    • Second slot (union with struct page __page_1): _flags_1, _head_1, unsigned char _folio_dtor, unsigned char _folio_order, atomic_t _entire_mapcount, atomic_t _nr_pages_mapped, atomic_t _pincount, unsigned int _folio_nr_pages
    • Third slot (union with struct page __page_2): _flags_2, _head_2, void *_hugetlb_subpool, void *_hugetlb_cgroup, void *_hugetlb_cgroup_rsvd, void *_hugetlb_hwpoison; or _flags_2a, _head_2a, struct list_head _deferred_list
    Compound pages:
    • Page #0 (head): flags, …
    • Page #1 (tail): flags, compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr, …
    • Page #N (tail): flags, _compound_pad_1, hpage_pinned_refcount, deferred_list
  • 20. page struct vs folio struct
    [Figure: struct folio layout next to compound pages Page #0 (head), Page #1 (tail), Page #2 (tail), as on slide 19]
    folio’s benefit
    • [Example] 512KB compound page
      ✓ page struct: Need to maintain 128 page structs (1 head page and 127 tail pages)
      ✓ folio struct: Maintain 3 page structs regardless of the size of compound pages.
  • 21. page struct vs folio struct
    [Figure: struct folio layout, as on slide 19; struct page __page_1 and struct page __page_2 → won’t be used]
    folio struct’s members
    • _entire_mapcount
      ✓ Counts mappings of the whole compound page via a single PMD (huge page).
    • _nr_pages_mapped
      ✓ Number of individual subpages (4KB pages) that are mapped via PTEs.
      ✓ Scenario: two processes map the same memory range
        ➢ One process maps the entire 2MB compound page (Transparent Huge Page - THP): mapped via a single PMD
        ➢ The other process maps some 4KB pages within this 2MB memory area: mapped via PTEs
      ✓ Benefit for THP: no need to split the huge page if other processes map 4KB pages within the same memory area.
    • _folio_nr_pages
      ✓ Number of pages in this folio.
      ✓ _folio_nr_pages = 1 << order, where order > 0.
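The relation `_folio_nr_pages = 1 << order` ties the folio order to the sizes used in the examples (512KB compound page, 2MB THP). A minimal sketch, assuming a 4KB base page; the `model_*` names are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL /* 4KB base page */

/* Models _folio_nr_pages: number of base pages in a folio of 'order'. */
unsigned int model_folio_nr_pages(unsigned int order)
{
    assert(order < 32); /* keep the shift inside unsigned int */
    return 1u << order;
}

/* Total bytes covered by a folio of 'order'. */
unsigned long model_folio_size(unsigned int order)
{
    return MODEL_PAGE_SIZE * model_folio_nr_pages(order);
}
```

Order 7 gives 128 pages (512KB), matching the 128 page structs in the compound-page example, and order 9 gives 512 pages (2MB), the PMD-mapped THP size on x86_64.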
  • 22. folio struct vs legacy compound page
    [Figure: struct folio layout next to compound pages Page #0 (head), Page #1 (tail), Page #2 (tail), as on slide 19]
    Page #1 of the folio and Page #1 of the legacy compound page have the same field mapping.
  • 24. [kernel v5.11] total_mapcount()
    [Figure: struct page. First union: page cache and anonymous pages; page_pool used by netstack; slab, slob and slub; tail pages of compound page; second tail page of compound page; page table pages (1. PMD huge PTE, 2. x86 pgd page <-> mm_struct); ZONE_DEVICE pages; rcu_head (free a page by RCU). Second union: atomic_t _mapcount (number of page-table references to this page); unsigned int page_type; unsigned int active (used by slab); int units (used by slob). Then atomic_t _refcount, …]
    Case 1: Singleton page(s)
    • Get _mapcount directly
    total_mapcount() users:
    • Huge page
    • rmap (reverse mapping)
  • 25. [kernel v5.11] total_mapcount()
    [Figure: struct page for the head page, the first tail page (compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr = 1 << compound_order) and the second tail page (_compound_pad_1 (compound_head), hpage_pinned_refcount, deferred_list)]
    Case 2: Compound page && hugetlb (hugetlbfs) page
    • Get compound_mapcount directly
    compound_mapcount:
    • Map count of the whole compound page (does not include mapped sub-pages)
    Steps:
    1. Get the head page from any page (page head or page tail)
    2. Read ‘compound_mapcount’ of the first tail page
       A. page[1].compound_mapcount
  • 26. [kernel v5.11] total_mapcount()
    [Figure: struct page for the head page, the first tail page (compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr = 1 << compound_order) and the second tail page (_compound_pad_1 (compound_head), hpage_pinned_refcount, deferred_list)]
    Case 3: [Anonymous page] Compound page && transparent huge page
    Steps:
    1. Get the head page from any page (page head or page tail)
    2. Read ‘compound_mapcount’ of the first tail page
       A. page[1].compound_mapcount: map count of the whole compound page
    3. Accumulate each subpage._mapcount. Example scenario:
       A. One process maps the 2MB range as a single huge page (a single PMD)
       B. Another process maps 512 individual PTEs
    4. Result: sum of each subpage._mapcount + page[1].compound_mapcount
  • 27. [kernel v5.11] total_mapcount()
    [Figure: struct page for the head page, the first tail page (compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr = 1 << compound_order) and the second tail page (_compound_pad_1 (compound_head), hpage_pinned_refcount, deferred_list)]
    Case 3: [Page cache] Compound page && transparent huge page
    Steps:
    1. Get the head page from any page (page head or page tail)
    2. Read ‘compound_mapcount’ of the first tail page
       A. page[1].compound_mapcount: map count of the whole compound page
    3. Accumulate each subpage._mapcount. Example scenario:
       A. One process maps the 2MB range as a single huge page (a single PMD)
       B. Another process maps 512 individual PTEs
    4. Result: sum of each subpage._mapcount + page[1].compound_mapcount - page[1].compound_mapcount * page[1].compound_nr
       A. File pages have compound_mapcount included in _mapcount
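The three v5.11 cases above can be put side by side in a user-space model. This is a sketch of the steps on the slides under stated assumptions, not the kernel function: `struct model_compound` and the `model_*` functions are hypothetical, mapcounts are stored biased by -1 (an unmapped page holds -1), and further adjustments in the real code, such as the double-map correction, are left out.

```c
#include <assert.h>

#define MODEL_MAX_SUBPAGES 512

/* Userspace model of a v5.11-style page or compound page. */
struct model_compound {
    int compound;          /* 0: singleton page, only mapcount[0] is used */
    int hugetlb;           /* hugetlbfs page */
    int anon;              /* 1: anonymous page, 0: page cache */
    int nr;                /* models compound_nr */
    int compound_mapcount; /* models page[1].compound_mapcount (biased by -1) */
    int mapcount[MODEL_MAX_SUBPAGES]; /* each subpage's _mapcount (biased by -1) */
};

int model_total_mapcount(const struct model_compound *p)
{
    int i, compound, ret;

    if (!p->compound)                 /* Case 1: read _mapcount directly */
        return p->mapcount[0] + 1;

    assert(p->nr <= MODEL_MAX_SUBPAGES);
    compound = p->compound_mapcount + 1;
    if (p->hugetlb)                   /* Case 2: compound_mapcount only */
        return compound;

    ret = compound;                   /* Case 3: add every subpage */
    for (i = 0; i < p->nr; i++)
        ret += p->mapcount[i] + 1;
    if (!p->anon)                     /* file pages: compound_mapcount is */
        ret -= compound * p->nr;      /* already included in each _mapcount */
    return ret;
}

/* Builds a 512-page THP with 'pmd_maps' PMD mappings and the first
 * 'pte_mapped' subpages each mapped once by PTE, then counts mappings. */
int model_demo_thp(int anon, int pmd_maps, int pte_mapped)
{
    struct model_compound p = { .compound = 1, .anon = anon,
                                .nr = MODEL_MAX_SUBPAGES,
                                .compound_mapcount = pmd_maps - 1 };
    int i;

    for (i = 0; i < p.nr; i++) {
        int count = (i < pte_mapped) ? 1 : 0;

        if (!anon)
            count += pmd_maps; /* file pages fold PMD mappings into _mapcount */
        p.mapcount[i] = count - 1;
    }
    return model_total_mapcount(&p);
}
```

In both the anonymous and the page-cache variant, the model returns `pmd_maps + pte_mapped`. The cost to note is the loop over all 512 subpages, which runs for every non-hugetlb compound page.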
  • 28. [kernel v6.3] total_mapcount() and folio_mapcount()
    [Figure: struct folio layout (as on slide 19) with numbered code callouts 1-2]
    Case 1: Singleton page – Not a compound page
  • 29. [kernel v6.3] total_mapcount() and folio_mapcount()
    [Figure: struct folio layout (as on slide 19) with numbered code callouts 1-5]
    Case 2: Compound page is mapped via PMD (huge page)
    • Get _entire_mapcount
    • Get _nr_pages_mapped
  • 30. [kernel v6.3] total_mapcount() and folio_mapcount()
    [Figure: struct folio layout (as on slide 19) with numbered code callouts 1-2]
    Case 3: Compound page is mapped via PMD (huge page) and some subpages are mapped by PTE
    • mapcount = folio’s _entire_mapcount + sum(each subpage’s _mapcount)
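The v6.3 cases reduce to one short routine once the counters live in the folio. The sketch below is a user-space model under stated assumptions, not the kernel source: `struct model_folio` and the `model_*` functions are hypothetical, `entire_mapcount` is kept unbiased for readability, and each per-page `_mapcount` is stored biased by -1.

```c
#include <assert.h>

#define MODEL_MAX_PAGES 512

struct model_folio {
    int nr_pages;        /* models _folio_nr_pages */
    int entire_mapcount; /* number of PMD (whole-folio) mappings */
    int nr_pages_mapped; /* number of subpages mapped by PTE */
    int mapcount[MODEL_MAX_PAGES]; /* per-page PTE mapcount, biased by -1 */
};

int model_folio_total_mapcount(const struct model_folio *f)
{
    int mapcount = f->entire_mapcount;
    int i;

    assert(f->nr_pages <= MODEL_MAX_PAGES);
    if (f->nr_pages_mapped == 0) /* Case 2: PMD-mapped only, skip the loop */
        return mapcount;

    for (i = 0; i < f->nr_pages; i++) /* Case 3: add PTE mappings */
        mapcount += f->mapcount[i];
    return mapcount + f->nr_pages;    /* undo the -1 bias of each entry */
}

/* Builds a 512-page folio with 'pmd_maps' PMD mappings and the first
 * 'pte_mapped' pages each mapped once by PTE, then counts mappings. */
int model_demo_folio(int pmd_maps, int pte_mapped)
{
    struct model_folio f = { .nr_pages = MODEL_MAX_PAGES,
                             .entire_mapcount = pmd_maps,
                             .nr_pages_mapped = pte_mapped };
    int i;

    for (i = 0; i < f.nr_pages; i++)
        f.mapcount[i] = (i < pte_mapped) ? 0 : -1;
    return model_folio_total_mapcount(&f);
}
```

Compared with the v5.11 steps, the PMD-only case returns without touching any subpage, and there is no separate correction for file pages in this model.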
  • 31. Reference • Memory Folios • LWN - A memory-folio update • LWN - An introduction to compound pages • LWN - Huge pages part 1 (Introduction) • Documentation/mm/transhuge.rst
  • 33. Learn new C standard (C11) from folio: Generic Selection
    * Reference from: ISO/IEC 9899:201x
    • C99 defines type-generic macros in the standard library (<tgmath.h>): the argument type is detected automatically, and the corresponding function is invoked based on that type.
      ✓ Example: sqrt(X)
        ➢ X is double → invoke sqrt()
        ➢ X is float → invoke sqrtf()
        ➢ X is long double → invoke sqrtl()
    • However, programmers cannot define their own type-generic macros in C99.
    • In C11, programmers can define their own type-generic macros:
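A minimal sketch of a user-defined type-generic macro built on C11's `_Generic` selection. The names `twice`, `twice_int` and `twice_double` are hypothetical and chosen only for illustration; a C11-capable compiler is assumed.

```c
/* Hypothetical per-type helpers that the generic macro dispatches to. */
int twice_int(int x)
{
    return 2 * x;
}

double twice_double(double x)
{
    return 2.0 * x;
}

/* _Generic picks one function at compile time, based on the type of x,
 * and the trailing (x) then calls it. */
#define twice(x) _Generic((x), \
        int: twice_int,        \
        double: twice_double)(x)
```

Calling `twice(21)` selects `twice_int`, while `twice(1.5)` selects `twice_double`. A type with no matching association and no `default:` branch is rejected at compile time.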