Memory Virtualization and
Management
Hwanju Kim
MEMORY VIRTUALIZATION
Memory Virtualization
• VMM: “Virtualizing virtual memory”
• Virtual → Physical → Machine
[Figure: multi-level guest page tables (level 1, level 2) translate a virtual address into pseudo-physical memory; a "physical-to-machine" table maps pseudo-physical frames onto machine memory]
[Goal] Secure memory isolation
• A VM is NOT permitted to access another VM’s memory region
• A VM is NOT permitted to manipulate the “physical-to-machine” mapping
• All mappings to machine memory MUST be verified by the VMM
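The two mapping layers above compose into the V2M translation the VMM must enforce. A minimal sketch (page size, table contents, and names are hypothetical, single-level tables for brevity):

```python
PAGE_SIZE = 4096

v2p = {0: 7, 1: 3}      # guest page table: virtual page -> pseudo-physical frame
p2m = {7: 42, 3: 19}    # VMM-owned table: pseudo-physical frame -> machine frame

def translate(vaddr):
    """Translate a guest virtual address to a machine address (V2M = P2M o V2P)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = v2p[vpn]       # stage managed by the guest OS
    mfn = p2m[pfn]       # stage owned and verified by the VMM
    return mfn * PAGE_SIZE + offset

print(translate(4096 + 5))   # virtual page 1 -> PFN 3 -> MFN 19
```

Because the guest only ever sees PFNs, isolation holds as long as the VMM alone writes the p2m table.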
SW-Based Memory Virtualization
• x86 was virtualization-unfriendly w.r.t. memory
• Memory management unit (MMU) has only a page
table root for “virtual-to-machine (V2M)” mapping
[Figure: the MMU’s CR3 holds a single page-table root, so hardware walks only one set of tables for the “virtual-to-machine” mapping; the pseudo-physical memory and the physical-to-machine table exist purely in software]
• “Pseudo” means SW, not HW
• This P2M table is used to establish V2M; it is not recognized by HW
Full- vs. Para-virtualization
• How to maintain V2M mapping
• Full-virtualization
• No modification to V2P in a guest OS
• Secretly modifying the binary would violate OS semantics
• “Shadow page tables”
• V2M made by referring to V2P and P2M
• + No OS modification
• - Performance overheads for maintaining shadow page tables
• Para-virtualization
• Direct modification to V2P in a guest OS using hypercall
• V2P → V2M
• + High performance (batching optimization is possible)
• - OS modification
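The shadow-page-table idea from the full-virtualization case above can be sketched as follows — the VMM composes the guest’s V2P with its private P2M, and re-syncs a shadow entry whenever the guest updates its own table. All structures and names are illustrative simplifications, not a real VMM interface:

```python
p2m = {0: 100, 1: 101, 2: 102}   # VMM-owned physical -> machine mapping

class ShadowMMU:
    def __init__(self, guest_v2p):
        self.guest_v2p = guest_v2p
        # Shadow V2M table: the one the MMU actually walks.
        self.shadow_v2m = {vpn: p2m[pfn] for vpn, pfn in guest_v2p.items()}

    def guest_update(self, vpn, pfn):
        # In a real VMM this path is entered via a page fault on the
        # write-protected guest table; here we simply mirror the update.
        self.guest_v2p[vpn] = pfn
        self.shadow_v2m[vpn] = p2m[pfn]   # keep the shadow in sync

mmu = ShadowMMU({0: 0, 1: 1})
mmu.guest_update(1, 2)
print(mmu.shadow_v2m)   # {0: 100, 1: 102}
```

The maintenance cost is exactly this sync step, taken on every guest page-table write — which is why para-virtualization’s batched hypercalls can win.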
Full- vs. Para-virtualization
• How to maintain V2M mapping
[Figure: Shadow mode (full virtualization) — the guest OS reads and writes its own V2P page directory and page tables; writes raise page faults so the VMM can sync a shadow V2M page table, which is what the MMU hardware actually walks. Direct mode (para-virtualization) — the MMU walks the guest’s V2M page tables directly, and the page fault handler verifies that the machine page to be updated is owned by the domain.]
Linux Virtual Memory (x86-32)
[Figure: x86-32 Linux splits virtual memory into user (3G) and kernel (1G, starting at PAGE_OFFSET) regions; cr3 points to the page directory, whose page tables map physical frames PFN 0..N (the kernel directly maps physical memory up to high_memory). Each frame has a struct page descriptor (_count, flags, mapping, lru) in the mem_map array, and physical memory is managed by the buddy system allocator (__alloc_pages/__free_pages) with the slab allocator on top.]
Xen Memory Virtualization
• Para-virtualization
[Figure: under Xen, each VM’s virtual address space holds user (3G), kernel, and Xen (top 64M) regions; guest page tables map directly to machine frames MFN 0..N. Each machine frame has a struct page_info descriptor (list, count_info, _domain, type_info) in Xen’s frame_table, and machine memory is managed by Xen’s buddy system allocator (__alloc_heap_pages/__free_heap_pages).]
Page Table Identification
• Auditing page table updates
• Following mapping from a page table root (CR3) to
identify page tables
• Once identified, page table updates are carefully
monitored and verified
[Figure: on a pin request, the VMM walks from the page directory (type PD) through its page tables (type PT) down to data pages (type RW) and validates every mapping; the validated PD/PT pages are then monitored as page tables.]
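The auditing rule above can be sketched as follows. The type tags, ownership table, and function names are hypothetical simplifications of what a VMM like Xen enforces, but the two checks are the ones the slide names: no mapping into another domain’s memory, and no writable mapping to an identified page-table page:

```python
page_type = {}     # mfn -> "PD" | "PT" | "RW"
page_owner = {}    # mfn -> owning domain id

def verify_update(domain, target_mfn, writable):
    # Reject mappings into another domain's memory, and writable
    # mappings to a page that is itself a page table.
    if page_owner[target_mfn] != domain:
        raise PermissionError("mapping another domain's page")
    if writable and page_type.get(target_mfn) in ("PD", "PT"):
        raise PermissionError("writable mapping to a page table")

def pin_as_page_table(mfn, domain, entries):
    # Validate every existing entry, then mark the page as a page table
    # so later updates to it are monitored.
    for target_mfn, writable in entries:
        verify_update(domain, target_mfn, writable)
    page_type[mfn] = "PT"

# Domain 0 owns frames 1-3; frame 3 is an ordinary data page.
page_owner.update({1: 0, 2: 0, 3: 0})
page_type[3] = "RW"
pin_as_page_table(1, 0, [(3, True)])       # OK: writable map to a data page
try:
    pin_as_page_table(2, 0, [(1, True)])   # rejected: writable map to a PT
except PermissionError as e:
    print(e)
```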
HW Memory Virtualization
• What if nested page table walking is supported
by HW?
• Eliminating SW overheads to maintain V2M
• HW-assisted memory virtualization
• Intel Extended Page Tables (EPT)
• AMD Rapid Virtualization Indexing (RVI)
[Figure: with shadow page tables (SPT), the VMM composes the guest’s V2P with its P2M into the SPT that the MMU walks for V2M. With extended page tables (EPT), the MMU performs the composition itself: the 1st walk traverses the guest page tables (GPT, V2P) and the 2nd walk traverses the EPT (P2M), yielding V2M in hardware.]
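The price of this hardware 2nd walk shows up on a TLB miss. Each of the n guest levels is reached through a guest-physical address that itself needs an m-level nested walk, plus the final data address needs one more nested walk, giving n*(m+1) + m = (n+1)*(m+1) - 1 memory references — the 24-reference figure for 4-level guest and nested tables analyzed in the ASPLOS’08 paper cited below:

```python
def two_dim_walk_refs(guest_levels, nested_levels):
    """Memory references for one two-dimensional page walk on a TLB miss."""
    return (guest_levels + 1) * (nested_levels + 1) - 1

print(two_dim_walk_refs(4, 4))   # 24 refs with nested paging
print(two_dim_walk_refs(4, 0))   # 4 refs: a native (or shadow-table) walk
```

This is why nested paging does not always outperform shadow paging: the per-miss walk is far longer, and large pages or page-walk caches are needed to recover the cost.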
HW Memory Virtualization
• AMD RVI (formerly Nested Page Tables (NPT))
• Two page table roots: gCR3 and nCR3
Accelerating Two-Dimensional Page Walks for Virtualized Systems [ASPLOS’08]
HW Memory Virtualization
• Advantages
• Significantly simplifying VMM
• Just informing MMU of a P2M root
• No shadow page tables
• No synchronizing overheads and memory overheads
• No OS modification
• Disadvantages
• Not always outperforming SW-based methods
• Page walking overheads on a TLB miss
• SW solution: SW-HW hybrid scheme [VEE’11], Large pages
• HW solution: Caching page walks [ASPLOS’08], Flat page tables
[ISCA’12]
ARM Memory Virtualization
• Two-stage address translation
[Figure: natively, the OS translates applications’ virtual addresses (VA) to physical addresses (PA). Under virtualization, stage 1 translation maps the guest user/kernel virtual address space to an intermediate physical address (IPA) space, and stage 2 translation, controlled by the VMM, maps the IPA space to the physical address space.]
Summary
• SW-based memory virtualization has been the
most complex part in VMM
• Before HW support, Xen kept optimizing its shadow page tables up to version 3
• Virtual memory itself is already complicated, but
virtualizing virtual memory is horrible
• HW-based memory virtualization significantly
reduces VMM complexity
• The most complex and heavy part is now offloaded
to HW
• But, energy issues on ARM HW memory
virtualization?
MEMORY MANAGEMENT
Process Memory Management
• Memory sharing
• Parent-child copy-on-write (CoW) sharing
• On fork(), a child CoW-shares its parent memory
• On write to a shared page, copy and modify a private page
• Advantages
• Reducing memory footprint
• Lightweight fork
• Memory overcommitment
• Giving a process larger memory space than physical
memory
• Paging or swapping out to backing storage when
memory is pressured
• Advantage
• Efficient memory utilization
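The parent-child CoW sharing above can be sketched with a toy frame table and reference counts (all structures and names are illustrative; a real kernel does this per page-table entry with a write-protect fault):

```python
frames = {0: b"parent data"}   # frame id -> contents
refcount = {0: 1}
next_frame = 1

def fork(page_table):
    """Child CoW-shares every frame of the parent."""
    child = dict(page_table)
    for f in child.values():
        refcount[f] += 1
    return child

def write(page_table, vpn, data):
    global next_frame
    f = page_table[vpn]
    if refcount[f] > 1:              # CoW break: copy before modifying
        refcount[f] -= 1
        new = next_frame; next_frame += 1
        frames[new] = frames[f]
        refcount[new] = 1
        page_table[vpn] = f = new
    frames[f] = data

parent = {0: 0}
child = fork(parent)
write(child, 0, b"child data")
print(frames[parent[0]], frames[child[0]])   # b'parent data' b'child data'
```

Until the first write, parent and child cost one frame between them — the footprint reduction and lightweight fork the slide lists.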
VM Memory Management
• Memory sharing
• No parent-child relationship
• But, a research project found a useful case for this relationship
• The Potemkin virtual honeyfarm [SOSP’05]
• Honeypot VMs CoW-share a reference image
• General memory sharing
• Block-based sharing
• Content-based sharing
• Memory overcommitment
• Σ VM memory allocation > Machine memory
• Dynamic memory balancing
[Figure: in the Potemkin virtual honeyfarm, honeypot VMs CoW-share the machine memory of a parent reference image while the VMM performs logging & analysis — Scalability, Fidelity, and Containment in the Potemkin Virtual Honeyfarm [SOSP’05]]
Why VM Memory Sharing?
• Why memory?
• Memory limitation inhibits high consolidation density
• Other resources are wasted as a result
• HW cost
• Memory itself
• Limited motherboard slot
• Energy cost
• RAM energy consumption matters!
• Main goal
• Reducing memory footprint as much as possible even with
more CPU computation
Memory Sharing
• Block-based page sharing
• Transparent page sharing of Disco [SOSP’97]
• Sharing-aware block devices [USENIX’09]
• On reading a common block from shared disk, only
one memory copy is CoW-shared
+ Finding identical pages is lightweight
- Sharing works only for a shared disk
(Disco: Running Commodity Operating Systems on Scalable Multiprocessors [SOSP’97])
Memory Sharing
• Content-based page sharing
• Sharing pages with identical contents
• VMWare ESX server and KSM for KVM
1. Periodic scan
2. Hashing page contents (e.g., …2bd806af)
3. Hash collision detected
4. Byte-by-byte comparison
5. CoW sharing & reclaiming the redundant page (PA remapped to the shared MA)
Memory Resource Management in VMware ESX Server [OSDI’02]
+ High memory utilization
- Finding identical pages is nontrivial
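The five-step ESX flow above can be sketched directly — hash each scanned page, treat a hash hit only as a hint, confirm with a full byte comparison, then reclaim the duplicate. Hash choice and data structures here are illustrative, not what ESX uses:

```python
import hashlib

hint_table = {}   # content hash -> representative frame id
frames = {}       # frame id -> page contents

def scan_page(fid, content):
    """Return the frame this page ends up backed by (itself, or a shared one)."""
    frames[fid] = content
    h = hashlib.sha1(content).hexdigest()      # step 2: hash page contents
    if h in hint_table:                        # step 3: hash collision
        rep = hint_table[h]
        if frames[rep] == content:             # step 4: byte-by-byte comparison
            del frames[fid]                    # step 5: CoW-share, reclaim dup
            return rep
    hint_table[h] = fid
    return fid

a = scan_page(1, b"\x00" * 4096)
b = scan_page(2, b"\x00" * 4096)
print(a, b)   # 1 1  -> page 2 now CoW-shares frame 1
```

The byte comparison is what makes a hash collision safe: the hash only narrows the candidates.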
Memory Sharing
• Subpage sharing
• Difference Engine: Harnessing Memory Redundancy
in Virtual Machine [OSDI’08]
• Patching similar pages
• Compressing idle pages
• Reference & dirty bit tracking to find idle pages
(Similar pages are stored as patches against a reference page, with PAs remapped onto shared MAs.)
+ Much higher memory utilization
- Computationally intensive
Put it all together: sharing, patching, and compression combined!
Memory Sharing
• Kernel Samepage Merging (KSM)
• Open source!!
• Content-based page sharing in Linux
• Increasing memory density by using KSM [OLS’09]
• Linux kernel service
• Applicable to all Linux processes including KVM
• Target memory regions can be registered via madvise()
system call
• Content comparison is done by memcmp()
• Candidate pages are kept in red-black trees ordered by content
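KSM’s content-ordered lookup can be sketched with a toy binary tree — the kernel uses red-black trees and memcmp(), so this unbalanced tree is only illustrative of how raw byte ordering makes the search O(log n) comparisons:

```python
class Node:
    def __init__(self, content):
        self.content, self.left, self.right = content, None, None

def insert_or_find(root, content):
    """Return (root, matched); matched is True if an identical page exists."""
    if root is None:
        return Node(content), False
    node = root
    while True:
        if content == node.content:
            return root, True                    # identical page: CoW-merge it
        # Bytes compare lexicographically, mirroring memcmp() ordering.
        branch = "left" if content < node.content else "right"
        nxt = getattr(node, branch)
        if nxt is None:
            setattr(node, branch, Node(content))
            return root, False
        node = nxt

root, merged = insert_or_find(None, b"A" * 4096)
root, merged = insert_or_find(root, b"A" * 4096)
print(merged)   # True
```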
Memory Overcommitment
• Two types of memory overcommitment
• Using surplus memory reclaimed by sharing
• Providing to memory-hungry VMs
• Creating more VMs
• When is memory pressured?
• Shared pages are CoW-broken
• Balancing memory between VMs
• Providing idle memory to memory-hungry VMs
• When is memory pressured?
• Idle memory becomes busy
Research issues
• How to detect memory-hungry VMs
• How to detect idle memory in VMs
• How to effectively move memory from one VM to another
(Keywords: working set estimation techniques; Satori: Enlightened page sharing [USENIX’09]; sharing cycle)
How to Detect Memory-hungry VMs
• Monitoring memory pressure of VMs
• Swap I/O traffic
• Simple method, but only for anonymous pages (e.g., heap)
• How much memory is required?
• Feedback-driven method
• Allocate more memory → monitor swap traffic → …
• Buffer cache monitoring (Geiger [ASPLOS’06])
• Monitoring the use of unified buffer cache based on
• Page faults, page table updates, and disk I/Os
• How much memory is required?
• LRU miss ratio curve (MRC)
[Figure: the VM’s unified buffer cache sits between the VM and the disk; Geiger associates memory and disk locations, and detects cache eviction by observing page reuse (a cached page reused by CoW or demand paging must have been evicted).]
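The LRU miss ratio curve mentioned above can be computed in one pass over a reference trace with Mattson’s stack algorithm: record each reference’s LRU stack distance, then read off the miss ratio for every cache size. Trace and sizes here are toy values:

```python
def miss_ratio_curve(trace, max_size):
    """Miss ratio for every LRU cache size 1..max_size, in one pass."""
    stack, hist = [], [0] * max_size      # hist[d]: refs with stack distance d
    for page in trace:
        if page in stack:
            d = stack.index(page)         # LRU stack distance of this reference
            if d < max_size:
                hist[d] += 1
            stack.remove(page)
        stack.insert(0, page)             # move/insert as most recently used
    total = len(trace)
    curve, hits = [], 0
    for size in range(1, max_size + 1):
        hits += hist[size - 1]            # distance < size => hit at this size
        curve.append((total - hits) / total)
    return curve

curve = miss_ratio_curve([1, 2, 3, 1, 2, 3, 1, 2, 3], 4)
print(curve)   # the drop at size 3 reveals a 3-page working set
```

The knee of the curve is the estimate a balancer wants: giving this VM fewer than 3 pages of cache would sharply raise its miss ratio.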
How to Detect Idle Memory
• Idle memory
• Inactive memory
• Not recently used memory
• Monitoring page access frequency
• Nontrivial
• Page access is done solely by HW
• Using memory protection of MMU
• Sampling-based idle memory tracking
• Memory Resource Management in VMware ESX Server [OSDI’02]
• Invalidating access privilege of sample pages → access to a sample page generates a page fault to the VMM → the VMM estimates the size of idle memory
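The ESX-style sampling above can be simulated: pick a random sample of pages, "invalidate" their access rights, and take the fraction touched during the interval (each touch would fault into the VMM) as the active-memory estimate. The whole setup is a simulation — page counts, the active set, and function names are made up for illustration:

```python
import random

def estimate_active_fraction(n_pages, active_set, sample_size, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(range(n_pages), sample_size)   # pages to write-protect
    # Each touched sample page would generate a page fault to the VMM.
    touched = sum(1 for p in sample if p in active_set)
    return touched / sample_size

active = set(range(250))        # 250 of 1000 pages are actually in use
est = estimate_active_fraction(1000, active, sample_size=100)
print(est)                      # close to the true active fraction, 0.25
```

Idle memory is then (1 - estimate) of the VM's allocation, obtained without tracking every page access.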
How to Detect Idle Memory
• Para-virtualized approach
• Ghost buffer with hypervisor exclusive cache
• Paravirtualized paging
• Transcendent memory (tmem)
• Providing OS with explicit interface for hypervisor cache
• When a page is evicted, put the page in hypervisor cache
• Oracle’s project
• https://oss.oracle.com/projects/tmem/
Virtual Machine Memory Access Tracking With Hypervisor Exclusive Cache [USENIX’07]
[Figure: MRC under the original configuration vs. with the hypervisor exclusive cache]
How to Move Memory
• VMM-level swap (host swap)
• Full-virtualization
• VMM is responsible for reclaiming pages to be moved
[Figure: VM1 and VM2 each have their own memory and guest swap; the VMM additionally swaps guest pages out to a host swap area.]
Drawback
• The VMM cannot know which page is less important (the VMM does not know OS policies)
• Even if the VMM chooses the same victim page as the OS, a double page fault occurs if the OS tries to swap out a “host-swapped page” to guest swap
How to Move Memory
• Memory ballooning
• Para-virtualization
• OS is responsible for reclaiming pages to be moved
Memory Resource Management in VMware ESX Server [OSDI’02]
+ The OS knows the best victim pages to reclaim
+ The VMM doesn’t need to track guest memory
- Guest OS support is required
Popular solution now!
• Module-based, simple implementation
• Balloon drivers for KVM and Xen are maintained in the Linux mainline
• Windows versions are also available
How to Move Memory
• Memory ballooning
• Overcommitted memory
• Guest OS 2 requests six pages, but four pages are available
[Figure: guest OS 2 requests six pages from the VMM, but only four machine pages are free. The VMM asks guest OS 1’s balloon driver to reclaim two pages; the driver requests pages from the guest’s memory allocator, which may swap guest OS 1’s pages out to its own swap. The reclaimed (ballooned) pages then back guest OS 2’s allocation.]
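The ballooning flow in the example above can be sketched end to end: the VMM asks guest OS 1’s balloon driver to inflate by the shortfall, the guest’s own allocator picks the victims (possibly swapping them out), and the freed machine pages back guest OS 2’s request. Classes, page numbers, and the victim policy are all hypothetical:

```python
class Guest:
    def __init__(self, machine_pages):
        self.pages = set(machine_pages)   # machine pages backing this guest
        self.balloon = set()              # pages pinned by the balloon driver

    def inflate_balloon(self, n):
        # The guest allocator chooses victim pages (here: arbitrary ones);
        # the VMM gets the underlying machine pages back.
        victims = set(list(self.pages)[:n])
        self.pages -= victims
        self.balloon |= victims
        return victims

free_pool = set(range(100, 104))          # 4 machine pages free at the VMM
guest1 = Guest(range(0, 6))
need = 6                                  # guest OS 2 requests six pages
if need > len(free_pool):
    free_pool |= guest1.inflate_balloon(need - len(free_pool))
guest2_pages = set(list(free_pool)[:need])
print(len(guest2_pages))                  # 6
```

The key design point survives the simplification: the reclaim decision runs inside guest OS 1, so its own paging policy chooses what to give up.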
Summary
• Memory is precious in virtualized environments
• Sharing and overcommitment contribute to high
consolidation density
• But, we should balance memory efficiency against QoS
• Insufficient memory can severely degrade QoS
• VM memory management issues will draw even more attention in mobile virtualization
[Figure: the degree of consolidation trades off high QoS with low memory utilization against low QoS with high memory utilization.]