Memory Virtualization and
Management
Hwanju Kim
MEMORY VIRTUALIZATION
Memory Virtualization
• VMM: “Virtualizing virtual memory”
• Virtual → Physical → Machine
[Figure: multi-level guest page tables (level 1, level 2) translate a virtual address into pseudo-physical memory; a "physical-to-machine" table maps pseudo-physical frames onto machine memory]
[Goal] Secure memory isolation
• A VM is NOT permitted to access another VM’s memory region
• A VM is NOT permitted to manipulate the “physical-to-machine” mapping
• All mappings to machine memory MUST be verified by the VMM
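The two mapping layers above compose into the V2M translation the VMM must enforce. A minimal sketch (page size, table contents, and names are hypothetical, single-level tables for brevity):

```python
PAGE_SIZE = 4096

v2p = {0: 7, 1: 3}      # guest page table: virtual page -> pseudo-physical frame
p2m = {7: 42, 3: 19}    # VMM-owned table: pseudo-physical frame -> machine frame

def translate(vaddr):
    """Translate a guest virtual address to a machine address (V2M = P2M o V2P)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = v2p[vpn]       # stage managed by the guest OS
    mfn = p2m[pfn]       # stage owned and verified by the VMM
    return mfn * PAGE_SIZE + offset

print(translate(4096 + 5))   # virtual page 1 -> PFN 3 -> MFN 19
```

Because the guest only ever sees PFNs, isolation holds as long as the VMM alone writes the p2m table.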
SW-Based Memory Virtualization
• x86 was virtualization-unfriendly w.r.t. memory
• Memory management unit (MMU) has only a page
table root for “virtual-to-machine (V2M)” mapping
[Figure: the MMU’s CR3 holds a single page-table root, so hardware walks only one set of tables for the “virtual-to-machine” mapping; the pseudo-physical memory and the physical-to-machine table exist purely in software]
• “Pseudo” means SW, not HW
• This P2M table is used to establish V2M; it is not recognized by HW
Full- vs. Para-virtualization
• How to maintain V2M mapping
• Full-virtualization
• No modification to V2P in a guest OS
• Secretly modifying the binary would violate OS semantics
• “Shadow page tables”
• V2M made by referring to V2P and P2M
• + No OS modification
• - Performance overheads for maintaining shadow page tables
• Para-virtualization
• Direct modification to V2P in a guest OS using hypercall
• V2P → V2M
• + High performance (batching optimization is possible)
• - OS modification
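The shadow-page-table idea from the full-virtualization case above can be sketched as follows — the VMM composes the guest’s V2P with its private P2M, and re-syncs a shadow entry whenever the guest updates its own table. All structures and names are illustrative simplifications, not a real VMM interface:

```python
p2m = {0: 100, 1: 101, 2: 102}   # VMM-owned physical -> machine mapping

class ShadowMMU:
    def __init__(self, guest_v2p):
        self.guest_v2p = guest_v2p
        # Shadow V2M table: the one the MMU actually walks.
        self.shadow_v2m = {vpn: p2m[pfn] for vpn, pfn in guest_v2p.items()}

    def guest_update(self, vpn, pfn):
        # In a real VMM this path is entered via a page fault on the
        # write-protected guest table; here we simply mirror the update.
        self.guest_v2p[vpn] = pfn
        self.shadow_v2m[vpn] = p2m[pfn]   # keep the shadow in sync

mmu = ShadowMMU({0: 0, 1: 1})
mmu.guest_update(1, 2)
print(mmu.shadow_v2m)   # {0: 100, 1: 102}
```

The maintenance cost is exactly this sync step, taken on every guest page-table write — which is why para-virtualization’s batched hypercalls can win.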
Full- vs. Para-virtualization
• How to maintain V2M mapping
[Figure: Shadow mode (full virtualization) — the guest OS reads and writes its own V2P page directory and page tables; writes raise page faults so the VMM can sync a shadow V2M page table, which is what the MMU hardware actually walks. Direct mode (para-virtualization) — the MMU walks the guest’s V2M page tables directly, and the page fault handler verifies that the machine page to be updated is owned by the domain.]
Linux Virtual Memory (x86-32)
[Figure: x86-32 Linux splits virtual memory into user (3G) and kernel (1G, starting at PAGE_OFFSET) regions; cr3 points to the page directory, whose page tables map physical frames PFN 0..N (the kernel directly maps physical memory up to high_memory). Each frame has a struct page descriptor (_count, flags, mapping, lru) in the mem_map array, and physical memory is managed by the buddy system allocator (__alloc_pages/__free_pages) with the slab allocator on top.]
Xen Memory Virtualization
• Para-virtualization
[Figure: under Xen, each VM’s virtual address space holds user (3G), kernel, and Xen (top 64M) regions; guest page tables map directly to machine frames MFN 0..N. Each machine frame has a struct page_info descriptor (list, count_info, _domain, type_info) in Xen’s frame_table, and machine memory is managed by Xen’s buddy system allocator (__alloc_heap_pages/__free_heap_pages).]
Page Table Identification
• Auditing page table updates
• Following mapping from a page table root (CR3) to
identify page tables
• Once identified, page table updates are carefully
monitored and verified
[Figure: on a pin request, the VMM walks from the page directory (type PD) through its page tables (type PT) down to data pages (type RW) and validates every mapping; the validated PD/PT pages are then monitored as page tables.]
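The auditing rule above can be sketched as follows. The type tags, ownership table, and function names are hypothetical simplifications of what a VMM like Xen enforces, but the two checks are the ones the slide names: no mapping into another domain’s memory, and no writable mapping to an identified page-table page:

```python
page_type = {}     # mfn -> "PD" | "PT" | "RW"
page_owner = {}    # mfn -> owning domain id

def verify_update(domain, target_mfn, writable):
    # Reject mappings into another domain's memory, and writable
    # mappings to a page that is itself a page table.
    if page_owner[target_mfn] != domain:
        raise PermissionError("mapping another domain's page")
    if writable and page_type.get(target_mfn) in ("PD", "PT"):
        raise PermissionError("writable mapping to a page table")

def pin_as_page_table(mfn, domain, entries):
    # Validate every existing entry, then mark the page as a page table
    # so later updates to it are monitored.
    for target_mfn, writable in entries:
        verify_update(domain, target_mfn, writable)
    page_type[mfn] = "PT"

# Domain 0 owns frames 1-3; frame 3 is an ordinary data page.
page_owner.update({1: 0, 2: 0, 3: 0})
page_type[3] = "RW"
pin_as_page_table(1, 0, [(3, True)])       # OK: writable map to a data page
try:
    pin_as_page_table(2, 0, [(1, True)])   # rejected: writable map to a PT
except PermissionError as e:
    print(e)
```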
HW Memory Virtualization
• What if nested page table walking is supported
by HW?
• Eliminating SW overheads to maintain V2M
• HW-assisted memory virtualization
• Intel Extended Page Tables (EPT)
• AMD Rapid Virtualization Indexing (RVI)
[Figure: with shadow page tables (SPT), the VMM composes the guest’s V2P with its P2M into the SPT that the MMU walks for V2M. With extended page tables (EPT), the MMU performs the composition itself: the 1st walk traverses the guest page tables (GPT, V2P) and the 2nd walk traverses the EPT (P2M), yielding V2M in hardware.]
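The price of this hardware 2nd walk shows up on a TLB miss. Each of the n guest levels is reached through a guest-physical address that itself needs an m-level nested walk, plus the final data address needs one more nested walk, giving n*(m+1) + m = (n+1)*(m+1) - 1 memory references — the 24-reference figure for 4-level guest and nested tables analyzed in the ASPLOS’08 paper cited below:

```python
def two_dim_walk_refs(guest_levels, nested_levels):
    """Memory references for one two-dimensional page walk on a TLB miss."""
    return (guest_levels + 1) * (nested_levels + 1) - 1

print(two_dim_walk_refs(4, 4))   # 24 refs with nested paging
print(two_dim_walk_refs(4, 0))   # 4 refs: a native (or shadow-table) walk
```

This is why nested paging does not always outperform shadow paging: the per-miss walk is far longer, and large pages or page-walk caches are needed to recover the cost.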
HW Memory Virtualization
• AMD RVI (formerly Nested Page Tables (NPT))
• Two page table roots: gCR3 and nCR3
Accelerating Two-Dimensional Page Walks for Virtualized Systems [ASPLOS’08]
HW Memory Virtualization
• Advantages
• Significantly simplifying VMM
• Just informing MMU of a P2M root
• No shadow page tables
• No synchronizing overheads and memory overheads
• No OS modification
• Disadvantages
• Not always outperforming SW-based methods
• Page walking overheads on a TLB miss
• SW solution: SW-HW hybrid scheme [VEE’11], Large pages
• HW solution: Caching page walks [ASPLOS’08], Flat page tables
[ISCA’12]
ARM Memory Virtualization
• Two-stage address translation
[Figure: natively, the OS translates applications’ virtual addresses (VA) to physical addresses (PA). Under virtualization, stage 1 translation maps the guest user/kernel virtual address space to an intermediate physical address (IPA) space, and stage 2 translation, controlled by the VMM, maps the IPA space to the physical address space.]
Summary
• SW-based memory virtualization has been the
most complex part in VMM
• Before HW support, Xen kept optimizing its shadow page tables up to version 3
• Virtual memory itself is already complicated, but
virtualizing virtual memory is horrible
• HW-based memory virtualization significantly
reduces VMM complexity
• The most complex and heavy part is now offloaded
to HW
• But, energy issues on ARM HW memory
virtualization?
MEMORY MANAGEMENT
Process Memory Management
• Memory sharing
• Parent-child copy-on-write (CoW) sharing
• On fork(), a child CoW-shares its parent memory
• On write to a shared page, copy and modify a private page
• Advantages
• Reducing memory footprint
• Lightweight fork
• Memory overcommitment
• Giving a process larger memory space than physical
memory
• Paging or swapping out to backing storage when
memory is pressured
• Advantage
• Efficient memory utilization
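The parent-child CoW sharing above can be sketched with a toy frame table and reference counts (all structures and names are illustrative; a real kernel does this per page-table entry with a write-protect fault):

```python
frames = {0: b"parent data"}   # frame id -> contents
refcount = {0: 1}
next_frame = 1

def fork(page_table):
    """Child CoW-shares every frame of the parent."""
    child = dict(page_table)
    for f in child.values():
        refcount[f] += 1
    return child

def write(page_table, vpn, data):
    global next_frame
    f = page_table[vpn]
    if refcount[f] > 1:              # CoW break: copy before modifying
        refcount[f] -= 1
        new = next_frame; next_frame += 1
        frames[new] = frames[f]
        refcount[new] = 1
        page_table[vpn] = f = new
    frames[f] = data

parent = {0: 0}
child = fork(parent)
write(child, 0, b"child data")
print(frames[parent[0]], frames[child[0]])   # b'parent data' b'child data'
```

Until the first write, parent and child cost one frame between them — the footprint reduction and lightweight fork the slide lists.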
VM Memory Management
• Memory sharing
• No parent-child relationship
• But, a research project found a useful case for this relationship
• The Potemkin virtual honeyfarm [SOSP’05]
• Honeypot VMs CoW-share a reference image
• General memory sharing
• Block-based sharing
• Content-based sharing
• Memory overcommitment
• Σ VM memory allocation > Machine memory
• Dynamic memory balancing
[Figure: in the Potemkin virtual honeyfarm, honeypot VMs CoW-share the machine memory of a parent reference image while the VMM performs logging & analysis — Scalability, Fidelity, and Containment in the Potemkin Virtual Honeyfarm [SOSP’05]]
Why VM Memory Sharing?
• Why memory?
• Memory limitation inhibits high consolidation density
• Other resources are wasted as a result
• HW cost
• Memory itself
• Limited motherboard slot
• Energy cost
• RAM energy consumption matters!
• Main goal
• Reducing memory footprint as much as possible even with
more CPU computation
Memory Sharing
• Block-based page sharing
• Transparent page sharing of Disco [SOSP’97]
• Sharing-aware block devices [USENIX’09]
• On reading a common block from shared disk, only
one memory copy is CoW-shared
+ Finding identical pages is lightweight
- Sharing works only for a shared disk
(Disco: Running Commodity Operating Systems on Scalable Multiprocessors [SOSP’97])
Memory Sharing
• Content-based page sharing
• Sharing pages with identical contents
• VMWare ESX server and KSM for KVM
1. Periodic scan
2. Hashing page contents (e.g., …2bd806af)
3. Hash collision detected
4. Byte-by-byte comparison
5. CoW sharing & reclaiming the redundant page (PA remapped to the shared MA)
Memory Resource Management in VMware ESX Server [OSDI’02]
+ High memory utilization
- Finding identical pages is nontrivial
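The five-step ESX flow above can be sketched directly — hash each scanned page, treat a hash hit only as a hint, confirm with a full byte comparison, then reclaim the duplicate. Hash choice and data structures here are illustrative, not what ESX uses:

```python
import hashlib

hint_table = {}   # content hash -> representative frame id
frames = {}       # frame id -> page contents

def scan_page(fid, content):
    """Return the frame this page ends up backed by (itself, or a shared one)."""
    frames[fid] = content
    h = hashlib.sha1(content).hexdigest()      # step 2: hash page contents
    if h in hint_table:                        # step 3: hash collision
        rep = hint_table[h]
        if frames[rep] == content:             # step 4: byte-by-byte comparison
            del frames[fid]                    # step 5: CoW-share, reclaim dup
            return rep
    hint_table[h] = fid
    return fid

a = scan_page(1, b"\x00" * 4096)
b = scan_page(2, b"\x00" * 4096)
print(a, b)   # 1 1  -> page 2 now CoW-shares frame 1
```

The byte comparison is what makes a hash collision safe: the hash only narrows the candidates.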
Memory Sharing
• Subpage sharing
• Difference Engine: Harnessing Memory Redundancy
in Virtual Machine [OSDI’08]
• Patching similar pages
• Compressing idle pages
• Reference & dirty bit tracking to find idle pages
(Similar pages are stored as patches against a reference page, with PAs remapped onto shared MAs.)
+ Much higher memory utilization
- Computationally intensive
Put it all together: sharing, patching, and compression combined!
Memory Sharing
• Kernel Samepage Merging (KSM)
• Open source!!
• Content-based page sharing in Linux
• Increasing memory density by using KSM [OLS’09]
• Linux kernel service
• Applicable to all Linux processes including KVM
• Target memory regions can be registered via madvise()
system call
• Content comparison is done by memcmp()
• Candidate pages are kept in red-black trees ordered by content
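KSM’s content-ordered lookup can be sketched with a toy binary tree — the kernel uses red-black trees and memcmp(), so this unbalanced tree is only illustrative of how raw byte ordering makes the search O(log n) comparisons:

```python
class Node:
    def __init__(self, content):
        self.content, self.left, self.right = content, None, None

def insert_or_find(root, content):
    """Return (root, matched); matched is True if an identical page exists."""
    if root is None:
        return Node(content), False
    node = root
    while True:
        if content == node.content:
            return root, True                    # identical page: CoW-merge it
        # Bytes compare lexicographically, mirroring memcmp() ordering.
        branch = "left" if content < node.content else "right"
        nxt = getattr(node, branch)
        if nxt is None:
            setattr(node, branch, Node(content))
            return root, False
        node = nxt

root, merged = insert_or_find(None, b"A" * 4096)
root, merged = insert_or_find(root, b"A" * 4096)
print(merged)   # True
```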
Memory Overcommitment
• Two types of memory overcommitment
• Using surplus memory reclaimed by sharing
• Providing to memory-hungry VMs
• Creating more VMs
• When is memory pressured?
• Shared pages are CoW-broken
• Balancing memory between VMs
• Providing idle memory to memory-hungry VMs
• When is memory pressured?
• Idle memory becomes busy
Research issues
• How to detect memory-hungry VMs
• How to detect idle memory in VMs
• How to effectively move memory from one VM to another
(Keywords: working set estimation techniques; Satori: Enlightened page sharing [USENIX’09]; sharing cycle)
How to Detect Memory-hungry VMs
• Monitoring memory pressure of VMs
• Swap I/O traffic
• Simple method, but only for anonymous pages (e.g., heap)
• How much memory is required?
• Feedback-driven method
• Allocate more memory → monitor swap traffic → …
• Buffer cache monitoring (Geiger [ASPLOS’06])
• Monitoring the use of unified buffer cache based on
• Page faults, page table updates, and disk I/Os
• How much memory is required?
• LRU miss ratio curve (MRC)
[Figure: the VM’s unified buffer cache sits between the VM and the disk; Geiger associates memory and disk locations, and detects cache eviction by observing page reuse (a cached page reused by CoW or demand paging must have been evicted).]
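The LRU miss ratio curve mentioned above can be computed in one pass over a reference trace with Mattson’s stack algorithm: record each reference’s LRU stack distance, then read off the miss ratio for every cache size. Trace and sizes here are toy values:

```python
def miss_ratio_curve(trace, max_size):
    """Miss ratio for every LRU cache size 1..max_size, in one pass."""
    stack, hist = [], [0] * max_size      # hist[d]: refs with stack distance d
    for page in trace:
        if page in stack:
            d = stack.index(page)         # LRU stack distance of this reference
            if d < max_size:
                hist[d] += 1
            stack.remove(page)
        stack.insert(0, page)             # move/insert as most recently used
    total = len(trace)
    curve, hits = [], 0
    for size in range(1, max_size + 1):
        hits += hist[size - 1]            # distance < size => hit at this size
        curve.append((total - hits) / total)
    return curve

curve = miss_ratio_curve([1, 2, 3, 1, 2, 3, 1, 2, 3], 4)
print(curve)   # the drop at size 3 reveals a 3-page working set
```

The knee of the curve is the estimate a balancer wants: giving this VM fewer than 3 pages of cache would sharply raise its miss ratio.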
How to Detect Idle Memory
• Idle memory
• Inactive memory
• Not recently used memory
• Monitoring page access frequency
• Nontrivial
• Page access is done solely by HW
• Using memory protection of MMU
• Sampling-based idle memory tracking
• Memory Resource Management in VMware ESX Server [OSDI’02]
• Invalidating access privilege of sample pages → access to a sample page generates a page fault to the VMM → the VMM estimates the size of idle memory
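The ESX-style sampling above can be simulated: pick a random sample of pages, "invalidate" their access rights, and take the fraction touched during the interval (each touch would fault into the VMM) as the active-memory estimate. The whole setup is a simulation — page counts, the active set, and function names are made up for illustration:

```python
import random

def estimate_active_fraction(n_pages, active_set, sample_size, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(range(n_pages), sample_size)   # pages to write-protect
    # Each touched sample page would generate a page fault to the VMM.
    touched = sum(1 for p in sample if p in active_set)
    return touched / sample_size

active = set(range(250))        # 250 of 1000 pages are actually in use
est = estimate_active_fraction(1000, active, sample_size=100)
print(est)                      # close to the true active fraction, 0.25
```

Idle memory is then (1 - estimate) of the VM's allocation, obtained without tracking every page access.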
How to Detect Idle Memory
• Para-virtualized approach
• Ghost buffer with hypervisor exclusive cache
• Paravirtualized paging
• Transcendent memory (tmem)
• Providing OS with explicit interface for hypervisor cache
• When a page is evicted, put the page in hypervisor cache
• Oracle’s project
• https://oss.oracle.com/projects/tmem/
Virtual Machine Memory Access Tracking With Hypervisor Exclusive Cache [USENIX’07]
[Figure: MRC under the original configuration vs. with the hypervisor exclusive cache]
How to Move Memory
• VMM-level swap (host swap)
• Full-virtualization
• VMM is responsible for reclaiming pages to be moved
[Figure: VM1 and VM2 each have their own memory and guest swap; the VMM additionally swaps guest pages out to a host swap area.]
Drawback
• The VMM cannot know which page is less important (the VMM does not know OS policies)
• Even if the VMM chooses the same victim page as the OS, a double page fault occurs if the OS tries to swap out a “host-swapped page” to guest swap
How to Move Memory
• Memory ballooning
• Para-virtualization
• OS is responsible for reclaiming pages to be moved
Memory Resource Management in VMware ESX Server [OSDI’02]
+ The OS knows the best victim pages to reclaim
+ The VMM doesn’t need to track guest memory
- Guest OS support is required
Popular solution now!
• Module-based, simple implementation
• Balloon drivers for KVM and Xen are maintained in the Linux mainline
• Windows versions are also available
How to Move Memory
• Memory ballooning
• Overcommitted memory
• Guest OS 2 requests six pages, but four pages are available
[Figure: guest OS 2 requests six pages from the VMM, but only four machine pages are free. The VMM asks guest OS 1’s balloon driver to reclaim two pages; the driver requests pages from the guest’s memory allocator, which may swap guest OS 1’s pages out to its own swap. The reclaimed (ballooned) pages then back guest OS 2’s allocation.]
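The ballooning flow in the example above can be sketched end to end: the VMM asks guest OS 1’s balloon driver to inflate by the shortfall, the guest’s own allocator picks the victims (possibly swapping them out), and the freed machine pages back guest OS 2’s request. Classes, page numbers, and the victim policy are all hypothetical:

```python
class Guest:
    def __init__(self, machine_pages):
        self.pages = set(machine_pages)   # machine pages backing this guest
        self.balloon = set()              # pages pinned by the balloon driver

    def inflate_balloon(self, n):
        # The guest allocator chooses victim pages (here: arbitrary ones);
        # the VMM gets the underlying machine pages back.
        victims = set(list(self.pages)[:n])
        self.pages -= victims
        self.balloon |= victims
        return victims

free_pool = set(range(100, 104))          # 4 machine pages free at the VMM
guest1 = Guest(range(0, 6))
need = 6                                  # guest OS 2 requests six pages
if need > len(free_pool):
    free_pool |= guest1.inflate_balloon(need - len(free_pool))
guest2_pages = set(list(free_pool)[:need])
print(len(guest2_pages))                  # 6
```

The key design point survives the simplification: the reclaim decision runs inside guest OS 1, so its own paging policy chooses what to give up.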
Summary
• Memory is precious in virtualized environments
• Sharing and overcommitment contribute to high
consolidation density
• But, we should balance memory efficiency against QoS
• Insufficient memory can severely degrade QoS
• VM memory management issues will draw even more attention in mobile virtualization
[Figure: the degree of consolidation trades off high QoS with low memory utilization against low QoS with high memory utilization.]