LO-PHI: Low-Observable Physical Host
Instrumentation for Malware Analysis
Chad Spensky, Hongyi Hu and Kevin Leach
Pietro De Nicolao
Politecnico di Milano
June 13, 2016
1 / 31
Open challenges in Dynamic Malware Analysis
LO-PHI framework implementation
Experiments
Critique
Related and future work
2 / 31
The paper and the code
LO-PHI: Low-Observable Physical Host
Instrumentation for Malware Analysis
Chad Spensky*†, Hongyi Hu*§ and Kevin Leach*‡
*MIT Lincoln Laboratory lophi@mit.edu
†University of California, Santa Barbara cspensky@cs.ucsb.edu
§Dropbox hongyihu@alum.mit.edu
‡University of Virginia kjl2y@virginia.edu
Abstract: Dynamic-analysis techniques have become the
linchpins of modern malware analysis. However, software-based
methods have been shown to expose numerous artifacts, which
can either be detected and subverted, or potentially interfere
with the analysis altogether, making their results untrustworthy.
The need for less-intrusive methods of analysis has led many
researchers to utilize introspection in place of instrumenting the
software itself. While most current introspection technologies
have focused on virtual-machine introspection, we present a novel
system, LO-PHI, which is capable of physical-machine introspection
of both non-volatile and volatile memory, i.e., hard disk and
system memory. We demonstrate that we are able to provide analysis
capabilities comparable to existing solutions, whilst exposing [...]
[...] al. [20] even provide a taxonomy of anti-analysis techniques
and mitigations commonly employed. However, no bullet-proof
solutions exist, and most existing solutions require
continuous updating as they rely on emulation frameworks.
To address these problems we present LO-PHI (Low-Observable
Physical Host Instrumentation), a novel system capable of
analyzing software executing on commercial-off-the-shelf
(COTS) bare-metal machines, without the need for any
additional software on the machines. LO-PHI permits accurate
monitoring and analysis of live-running physical hosts in
real-time, with a minimal addition of “plug-and-play” components [...]
Network and Distributed System Symposium (NDSS) ’16
21-24 February 2016, San Diego, CA, USA
GitHub repository: https://github.com/mit-ll/LO-PHI
3 / 31
Open challenges in Dynamic Malware Analysis
4 / 31
Dynamic Malware Analysis
Idea: run the malware in a sandbox (VM, debugger, . . . ) and use
tools to analyze its behavior.
We are interested in observing:
Memory accesses
Network activity
Disk activity
Observation tools must be placed outside the sandbox
At a lower level w.r.t. the malware
In theory, completely transparent to the malware
Not so simple. . .
5 / 31
Virtualization is like dreaming
Unsure if you’re living in a dream, or awake?
Look for artifacts (i.e. anomalies) in your reality!
Malware can do the same. . .
Figure 1: Never stopping spinning top: a possible artifact in dreams.
6 / 31
Artifacts and environment-aware malware
Observer effect
Executing software inside a debugger or VM leaves artifacts.
Artifacts are evidence of an “artificial” environment
According to [Garfinkel, 2007], building a transparent VMM is
fundamentally infeasible.
Malware resistance to dynamic analysis
If the malware is able to detect artifacts, it can resist
traditional dynamic analysis tools:
1. Remain inactive (do not trigger the payload)
2. Abort or crash its host
3. Disable defenses or tools
7 / 31
Artifacts: some examples
[Chen, 2008], [Garfinkel, 2007] provide a taxonomy of artifacts:
Hardware
Special devices or adapters in VMs
Specific manufacturer prefixes on device names
Memory
Hypervisors placing interrupt table in different positions
Too little RAM (typical of VMs)
Software
Presence of installed tools on the system
IsDebuggerPresent() Windows API
Behavior
Timing differences (e.g. interception of privileged instructions in
VMMs)
8 / 31
Semantic Gap
What is the malware doing?
Need to mine semantics from the extracted raw data.
From raw data:
Disk read at sector 123
TCP packet ABC
Some bytes at memory address 0xDEADBEEF
To concise, high-level event descriptions:
/etc/passwd has been read
A connection to badguy.com has been opened
Forensics to the rescue
LO-PHI uses well-known forensic tools, adapted for live analysis:
Disk forensics: Sleuthkit
Memory forensics: Volatility
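The step from "disk read at sector 123" to "/etc/passwd has been read" can be sketched in a few lines of Python. This is only an illustration of the idea, not LO-PHI's code: the Extent record, the EXTENTS table and both helper functions are hypothetical, and a real tool would obtain the sector-to-file mapping from the file system metadata (e.g. through Sleuthkit).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extent:
    """A contiguous run of sectors owned by one file (hypothetical layout)."""
    first_sector: int
    sector_count: int
    path: str


def file_for_sector(extents: list[Extent], sector: int) -> Optional[str]:
    """Return the path of the file that owns the sector, or None if unknown."""
    for extent in extents:
        if extent.first_sector <= sector < extent.first_sector + extent.sector_count:
            return extent.path
    return None


def describe(extents: list[Extent], operation: str, sector: int) -> str:
    """Turn a raw (operation, sector) event into a high-level description."""
    path = file_for_sector(extents, sector)
    if path is None:
        return f"{operation} at sector {sector} (no known file)"
    verb = "read" if operation == "READ" else "written"
    return f"{path} has been {verb}"


# Made-up sector layout, for illustration only.
EXTENTS = [Extent(100, 50, "/etc/passwd"), Extent(400, 8, "/etc/hosts")]

if __name__ == "__main__":
    print(describe(EXTENTS, "READ", 123))
```

The linear scan keeps the sketch short; with a real file system the lookup table would be large enough to justify a sorted structure or an interval tree.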
9 / 31
Malware Analysis Framework Evaluation
[...] to the undetectability of the presence of the analysis environment.
Quality
Stealthiness
Efficiency
Figure 5: Malware analysis framework evaluation
There is a constant tension between these factors, which
is represented as a triangular surface in Figure 5. That is,
while trying to achieve better results for one factor, the analysis
technique has to make compromises in remain two other
factors. For example, an analysis system with an in-guest [...]
Figure 2: From [Kirat, 2011]
Tradeoff between:
1. Low-artifact, semantically poor tools (Virtual Machine
Introspection)
2. High-artifact, semantically-rich frameworks (debuggers)
10 / 31
LO-PHI framework implementation
11 / 31
Goals
No virtualization: run malware on bare metal machines!
Physical sensors and actuators
Bridging the semantic gap
Physical sensors collect raw data
Automated restore to pre-infection state
Stealthiness: very few, undetectable artifacts
Extendability: support new OSs and filesystems
12 / 31
Threat model
Assumptions about the malware, which are also limitations of the
approach.
1. Malicious modifications are evident either in memory or on disk
2. No infection delivered to hardware
3. Instrumentation is in place before malware is executed
Malware cannot analyze the system without LO-PHI in place
Harder to compare and detect artifacts: no baseline
13 / 31
Sensors and actuators (1)
[Diagram labels: Physical Instrumentation (Power, Keyboard, Mouse),
Memory Introspection, Network Tap, SATA Introspection, Semantic Analysis]
Figure 3: Hardware instrumentation to inspect a bare-metal machine.
14 / 31
Sensors and actuators (2)
Sensors
Memory. Xilinx ML507 board connected to PCIe, reads and
writes arbitrary memory locations via DMA.
Disk. ML507 board intercepting all the traffic over SATA
interface. Sends SATA frames via Gigabit Ethernet and UDP.
Completely passive. . .
except when SATA data rate exceeds Ethernet bandwidth:
throttling of frames.
Network interface. Mentioned in paper, but the technology
used is unclear.
Actuators
An Arduino Leonardo emulates USB keyboard and mouse.
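The condition under which the disk sensor stops being passive is a plain comparison of data rates, sketched below. The 125 MB/s figure is the raw Gigabit Ethernet line rate with UDP and Ethernet framing overhead ignored (an assumption), and the function names and the disk rates in the example are hypothetical.

```python
# Raw Gigabit Ethernet line rate in bytes per second (protocol overhead ignored).
GIGABIT_ETHERNET_BYTES_PER_SEC = 1_000_000_000 // 8


def must_throttle(sata_bytes_per_sec: int,
                  link_bytes_per_sec: int = GIGABIT_ETHERNET_BYTES_PER_SEC) -> bool:
    """True when the disk produces data faster than the capture link can carry."""
    return sata_bytes_per_sec > link_bytes_per_sec


def sustainable_fraction(sata_bytes_per_sec: int,
                         link_bytes_per_sec: int = GIGABIT_ETHERNET_BYTES_PER_SEC) -> float:
    """Fraction of the disk's data rate the link can forward (1.0 = fully passive)."""
    if sata_bytes_per_sec <= 0:
        return 1.0
    return min(1.0, link_bytes_per_sec / sata_bytes_per_sec)


if __name__ == "__main__":
    for rate in (80_000_000, 250_000_000):  # hypothetical disk throughputs
        print(rate, must_throttle(rate), sustainable_fraction(rate))
```

Whenever the fraction drops below 1.0, the sensor has to slow the disk down, which is itself a timing artifact that software on the host could in principle measure.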
15 / 31
Infrastructure
Restoring physical machines
We cannot simply “restore a snapshot” like in VMs
Preboot Execution Environment (PXE) with Clonezilla
Restores the disk to a previously saved state
No interaction with the OS
Scalable infrastructure
Job submission system: jobs are sent to a scheduler
The scheduler executes the routine on an appropriate machine
Python interface to control the machine and run malware
and analysis
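The job submission flow described above could look roughly like the following sketch. It is not LO-PHI's actual Python interface: the Job, Machine and Scheduler classes, their fields and the machine names are all hypothetical, and the real system also has to restore and boot the machine between jobs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    sample: str        # hypothetical identifier of the binary to analyze
    machine_type: str  # e.g. "physical" or "virtual"


@dataclass
class Machine:
    name: str
    machine_type: str
    busy: bool = False


class Scheduler:
    """Queues jobs and hands each one to an idle machine of the requested type."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.queue = deque()

    def submit(self, job):
        self.queue.append(job)

    def dispatch(self):
        """Assign as many queued jobs as possible; return (sample, machine) pairs."""
        assignments = []
        still_waiting = deque()
        while self.queue:
            job = self.queue.popleft()
            machine = next(
                (m for m in self.machines
                 if not m.busy and m.machine_type == job.machine_type),
                None,
            )
            if machine is None:
                still_waiting.append(job)  # no suitable idle machine yet
            else:
                machine.busy = True
                assignments.append((job.sample, machine.name))
        self.queue = still_waiting
        return assignments

    def release(self, machine_name):
        """Mark a machine as idle again, e.g. after its disk has been restored."""
        for machine in self.machines:
            if machine.name == machine_name:
                machine.busy = False
```

A typical use would be to submit a batch of samples, call dispatch(), and call release() followed by dispatch() again each time a machine finishes its restore cycle.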
16 / 31
Bridging the semantic gap via forensic analysis
(a) Disk Reconstruction:
Raw SATA Capture -> Disk Reconstruction (Custom Module) -> File System
Reconstruction (PyTSK + Custom Code) -> Filter Noise -> FS Modifications
(b) Memory Reconstruction:
Memory Image (Clean) -> Semantic Reconstruction (Volatility) -> OS Information
Memory Image (Dirty) -> Semantic Reconstruction (Volatility) -> OS Information
Both OS Information outputs -> Extract Differences -> Filter Noise ->
Memory Modifications
VII. EVALUATION AND ANALYSIS
In this section, we explain our methodology for semantic
gap reconstruction (Section VII-A) and demonstrate the
practicality of LO-PHI with three targeted experiments. The
experiments were constructed to demonstrate the following:
• The ability of LO-PHI to detect the behaviors
elicited by real malware, confirmed with ground truth
(Section VII-C)
• The ability to scale and extract meaningful results
from unknown malware samples (Section VII-D)
[...] ssdt, svcscan, and callbacks which examine kernel descriptor
tables, registered services, and kernel callbacks.
2) Disk: The first step in our disk analysis is to first convert
the raw capture of the SATA activity into a 4-tuple containing
the disk operation (e.g., READ or WRITE), starting sector,
total number of sectors, and data. Our physical drives, as
with most modern drives, used an optimization in the SATA
specification known as Native Command Queuing (NCQ) [23].
NCQ reorders SATA Frame Information Structure (FIS) requests
to achieve better performance by reducing extraneous
head movement and then asynchronously replies based on [...]
Figure 4: Binary analysis workflow. Rounded nodes represent data and
rectangles represent data manipulation.
Background noise was removed by analyzing non-malicious
software.
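The memory branch of the workflow (extract differences, then filter noise) can be sketched as set operations on the semantic output. This is a simplified illustration, not the authors' implementation: rows are reduced to (name, pid) tuples, the process names are made up, and the noise list is assumed to have been collected beforehand from runs of non-malicious software.

```python
def new_entries(clean, dirty):
    """Rows present in the dirty (post-execution) listing but not in the clean one."""
    clean_rows = set(clean)
    return [row for row in dirty if row not in clean_rows]


def filter_noise(rows, noise_names):
    """Drop rows whose process name also shows up when benign software is run."""
    return [row for row in rows if row[0] not in noise_names]


# Hypothetical process listings as (name, pid) tuples.
CLEAN = [("explorer.exe", 1008), ("svchost.exe", 700)]
DIRTY = CLEAN + [("wuauclt.exe", 1500), ("sample.exe", 1044)]
NOISE = {"wuauclt.exe"}  # assumed to come from benign baseline runs

if __name__ == "__main__":
    print(filter_noise(new_entries(CLEAN, DIRTY), NOISE))
```

The same two steps apply to any listing that can be reduced to hashable rows, such as sockets or loaded modules.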
17 / 31
Example of output of the analysis toolbox
Offset      Name                      PID   PPID
0x86292438  AcroRd32.exe              1340  1048
0x86458818  AcroRd32.exe              1048  1008
0x86282be0  AdobeARM.exe              1480  1048
0x864562a0  $$ rk sketchy server.exe  1044  1008
(a) New Processes (pslist)
Figure 5: New processes.

PID   Port  Protocol  Address
1048  1038  UDP       127.0.0.1
1044  21    TCP       0.0.0.0
(b) New Sockets (sockets)
Figure 6: New sockets.

Selector  Base        Limit       Type        DPL  Gr  Pr
0x320     0x8003b6da  0x00000000  CallGate32  3    -   P
(c) GDT Hooks (gdt)

Name          Base        Size    File
hookssdt.sys  0xf7c5b000  0x1000  C:\...\lophi\hookssdt.sys
(d) Loaded Kernel Modules (modscan)

Table  Entry Index  Address     Name                      Module
0      0x0000f7     0xf7c5b406  NtSetValueKey             hookssdt.sys
0      0x0000ad     0xf7c5b44c  NtQuerySystemInformation  hookssdt.sys
0      0x000091     0xf7c5b554  NtQueryDirectoryFile      hookssdt.sys
(e) SSDT Hooks (ssdt)

Created Filename
/.../lophi/$$ rk sketchy server.exe
/.../lophi/hookssdt.sys
/.../lophi/sample 0742475e94904c41de1397af5c53dff8e.exe
(f) Disk Event Log (81 Entries Truncated)
Figure 7: Disk event log.

Fig. 6: Post-filtered semantic output from rootkit experiment (Section VII-C1).
18 / 31
Experiments
19 / 31
Experiment on evasive malware
Malware previously labeled as “evasive” was executed on Windows 7
and analyzed using LO-PHI.
Many samples clearly exhibited typical malware behavior.
Some samples failed because of no network connection, or
wrong Windows version.
[. . . ] we feel that our findings are more than sufficient to
showcase LO-PHI’s ability to analyze evasive malware,
without being subverted, and subsequently produce
high-fidelity results for further analysis.
20 / 31
Timing
Figure 8: Time spent in each step of binary analysis. Both environments
were booting a 10 GB Windows 7 (64-bit) hibernate image and were
running on a system with 1 GB of volatile memory.
21 / 31
Critique
22 / 31
Known limitations
Newer chipsets use IOMMUs, disabling DMA from peripherals
Current memory acquisition technique will become unusable
Smearing: the memory can change during the acquisition
Inconsistent states
Faster polling rates can help
Filesystem caching: some data will not pass through SATA
interface
Malware could write a file to the disk cache, execute it and delete it
before the cache is flushed to disk.
However, the effects would be visible in memory.
23 / 31
Issues with the technique and the experiments
The malware is left to run for only 3 minutes.
Many malware samples need much more time to fully reveal their
effects (e.g. ransomware).
No memory polling during the execution of the malware
Only snapshot before and after the execution
Temporary data used by the malware is never seen
Assumption: malware does not modify BIOS or firmware.
But if it does, the physical machine might not be recoverable.
Costly!
No Internet access: the authors always run the malware on
disconnected machines.
Most malware becomes useless without Command&Control
infrastructure.
Network access could expose further, unseen, artifacts.
24 / 31
Methodological issues
The article claims that the artifacts from LO-PHI are unusable
by malware because there’s no baseline (i.e. the malware
cannot see the machine before the installation of LO-PHI).
This can also be true for traditional approaches.
No statistical test is used to determine whether the difference in
disk/memory throughput is statistically significant (i.e. whether
artifacts are present).
A very simple and standard procedure, which should really be done in a
scientific paper.
The network analysis technique is not described and remains unclear.
We exclude the network trace analysis from much of our
discussion since it is a well-known technique and not the focus
of our work.
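As an illustration of the kind of check asked for above, here is a minimal two-sided permutation test on the difference of mean throughput between instrumented and uninstrumented runs. The choice of a permutation test is an assumption made for this sketch (neither the paper nor these slides prescribe a specific test), the function name and its parameters are hypothetical, and no measurement data is included.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, rounds=10_000, seed=0):
    """Estimate the two-sided p-value for the difference of means of a and b.

    The labels of the pooled samples are shuffled `rounds` times; the p-value
    is the share of shuffles whose mean difference is at least as large as the
    observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    split = len(a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:split]) - _mean(pooled[split:])) >= observed:
            hits += 1
    return (hits + 1) / (rounds + 1)
```

A small p-value would indicate that the instrumentation measurably changes throughput, i.e. that it leaves an artifact software could detect; a large one is consistent with the claim that the difference is noise.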
25 / 31
Memory throughput with and without LO-PHI
[Box plots: (a) Physical machine (polling at 14 MB/sec); (b) [...].
500 samples for each box plot.]
Figure 9: Average memory throughput comparison as reported by
RAMSpeed, with and without instrumentation. Deviation from
uninstrumented trial is only 0.4% in worst case. No statistical test used.
26 / 31
Another point. . .
Virtualization is increasingly used in production contexts.
Just think about cloud computing!
Attackers will want to target those VMs [Garfinkel, 2007]
Malware that deactivates in VMs will be less common
27 / 31
Related and future work
28 / 31
Related work
Traditional dynamic analysis
Many dynamic malware analysis tools rely on virtualization: Ether,
BitBlaze, Anubis, V2E, HyperDbg, SPIDER.
We already saw the limitations of VM approaches: artifacts
Bare-metal dynamic analysis
BareBox [Kirat, 2011]: malware analysis framework based on a
bare-metal machine without virtualization or emulation
Only targets user-mode malware
Only disk analysis (no memory tools)
29 / 31
Future evolutions
Automated, repeated analyses
Disk restore phase is the lengthiest (> 6 min)
The resetting and boot process could be decreased significantly
by writing a custom PXE loader, or completely mitigated by
implementing copy-on-write into our FPGA.
While snapshots are trivial with virtual machines, it is still an
open problem for physical machines.
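The copy-on-write idea mentioned in the quotation can be illustrated in software. The sketch below is only a model of the concept: the authors propose doing this in the FPGA on the SATA path, whereas the CopyOnWriteDisk class, its methods and the sector contents here are hypothetical.

```python
class CopyOnWriteDisk:
    """Serves reads from a golden image, redirecting all writes to an overlay."""

    SECTOR_SIZE = 512

    def __init__(self, base):
        self._base = base    # golden image: sector number -> bytes, never modified
        self._overlay = {}   # sectors written since the last reset

    def read(self, sector):
        if sector in self._overlay:
            return self._overlay[sector]
        # Sectors missing from the image are modeled as zero-filled.
        return self._base.get(sector, b"\x00" * self.SECTOR_SIZE)

    def write(self, sector, data):
        self._overlay[sector] = data  # the golden image stays untouched

    def modified_sectors(self):
        """Sectors changed during the run, i.e. candidates for disk analysis."""
        return sorted(self._overlay)

    def reset(self):
        """Return to the pre-infection state by discarding the overlay."""
        self._overlay.clear()
```

With such a layer the restore step reduces to dropping the overlay instead of re-imaging the whole disk, and the overlay doubles as a record of what the sample wrote.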
Extensions
Analyze transient behavior of binary
Continuous memory polling
Need to deal with DMA artifacts
Cover malware that infects hardware (BIOS, firmware)
30 / 31
References
Chen X., Andersen J., Mao Z. M., Bailey M., Nazario J.
Towards an understanding of anti-virtualization and
anti-debugging behavior in modern malware
Proceedings of the International Conference on Dependable
Systems and Networks (2008)
doi: 10.1109/DSN.2008.4630086
Kirat D., Vigna G., Kruegel C.
BareBox: Efficient Malware Analysis on Bare-metal
Proceedings of the 27th Annual Computer Security Applications
Conference (2011)
doi: 10.1145/2076732.2076790
Garfinkel T., Adams K., Warfield A., Franklin J.
Compatibility is Not Transparency: VMM Detection Myths and
Realities
Proceedings of the 11th USENIX Workshop on Hot Topics in
Operating Systems (2007)
31 / 31


LO-PHI: Low-Observable Physical Host Instrumentation for Malware Analysis

  • 1. LO-PHI: Low-Observable Physical Host Instrumentation for Malware Analysis Chad Spensky, Hongyi Hu and Kevin Leach Pietro De Nicolao Politecnico di Milano June 13, 2016 1 / 31
  • 2. Open challenges in Dynamic Malware Analysis LO-PHI framework implementation Experiments Critique Related and future work 2 / 31
  • 3. The paper and the code LO-PHI: Low-Observable Physical Host Instrumentation for Malware Analysis Chad Spensky⇤†, Hongyi Hu⇤§ and Kevin Leach⇤‡ ⇤MIT Lincoln Laboratory lophi@mit.edu †University of California, Santa Barbara cspensky@cs.ucsb.edu §Dropbox hongyihu@alum.mit.edu ‡University of Virginia kjl2y@virginia.edu Abstract—Dynamic-analysis techniques have become the chpins of modern malware analysis. However, software-based thods have been shown to expose numerous artifacts, which n either be detected and subverted, or potentially interfere h the analysis altogether, making their results untrustworthy. e need for less-intrusive methods of analysis has led many earchers to utilize introspection in place of instrumenting the tware itself. While most current introspection technologies ve focused on virtual-machine introspection, we present a novel tem, LO-PHI, which is capable of physical-machine introspec- n of both non-volatile and volatile memory, i.e., hard disk and tem memory. We demonstrate that we are able to provide anal- s capabilities comparable to existing solutions, whilst exposing al. [20] even provide a taxonomy of anti-analysis technique and mitigations commonly employed. However, no bulle proof solutions exist, and most existing solutions requir continuous updating as they rely on emulation frameworks. To address these problems we present LO-PHI (Low Observable Physical Host Instrumentation), a novel system ca pable of analyzing software executing on commercial-off-the shelf (COTS) bare-metal machines, without the need for an additional software on the machines. LO-PHI permits accurat monitoring and analysis of live-running physical hosts in rea time, with a minimal addition of “plug-and-play” component Network and Distributed System Symposium (NDSS) ’16 21-24 February 2016, San Diego, CA, USA GitHub repository: https://github.com/mit-ll/LO-PHI 3 / 31
  • 4. Open challenges in Dynamic Malware Analysis 4 / 31
  • 5. Dynamic Malware Analysis Idea: run the malware in a sandbox (VM, debugger, . . . ) and use tools to analyze its behavior. We are interested in observing: Memory accesses Network activity Disk activity Observation tools must be placed outside the sandbox At a lower level w.r.t. the malware In theory, completely transparent to the malware Not so simple. . . 5 / 31
  • 6. Virtualization is like dreaming Unsure if you’re living in a dream, or awake? Look for artifacts (i.e. anomalies) in your reality! Malware can do the same. . . Figure 1: Never stopping spinning top: a possible artifact in dreams. 6 / 31
  • 7. Artifacts and environment-aware malware Observer effect Execution of software into a debugger or VM leaves artifacts. Artifacts are evidences of an “artificial” environment According to [Garfinkel, 2007], building a transparent VMM is fundamentally infeasible. Malware resistance to dynamic analysis If the malware is able to detect artifacts, it can resist to traditional dynamic analysis tools. 1. Remain inactive (not trigger the payload) 2. Abort or crash its host 3. Disable defenses or tools 7 / 31
  • 8. Artifacts: some examples [Chen, 2008], [Garfinkel, 2007] provide a taxonomy of artifacts: Hardware Special devices or adapters in VMs Specific manufacturer prefixes on device names Memory Hypervisors placing interrupt table in different positions Too little RAM (typical of VMs) Software Presence of installed tools on the system isDebuggerPresent() Windows API Behavior Timing differences (e.g. interception of privileged instructions in VMMs) 8 / 31
  • 9. Semantic Gap What is the malware doing? Need to mine semantics from the extracted raw data. From raw data: Disk read at sector 123 TCP packet ABC Some bytes at memory address 0xDEADBEEF To concise, high-level event descriptions: /etc/passwd has been read A connection to badguy.com has been opened Forensics to the rescue LO-PHI uses well-known forensic tools, adapted for live analysis: Disk forensics: Sleuthkit Memory forensics: Volatility 9 / 31
  • 10. Malware Analysis Framework Evaluation to the undetectability of the presence of the analysis envi- ronment. Quality Stealthiness Efficiency Figure 5: Malware analysis framework evaluation There is a constant tension between these factors, which is represented as a triangular surface in Figure 5. That is, while trying to achieve better results for one factor, the anal- ysis technique has to make compromises in remain two other factors. For example, an analysis system with an in-guest Figure 2: From [Kirat, 2011] Tradeoff between: 1. Low-artifact, semantically poor tools (Virtual Machine Introspection) 2. High-artifact, semantically-rich frameworks (debuggers) 10 / 31
  • 12. Goals No virtualization: run malware on bare metal machines! Physical sensors and actuators Bridging the semantic gap Physical sensors collect raw data Automated restore to pre-infection state Stealthiness: very few, undetectable artifacts Extendability: support new OSs and filesystems 12 / 31
  • 13. Threat model Assumptions on our model of malware: they are limitations of the approach. 1. Malicious modifications evident either in memory or on disk 2. No infection delivered to hardware 3. Instrumentation is in place before malware is executed Malware cannot analyze the system without LO-PHI in place Harder to compare and detect artifacts: no baseline 13 / 31
  • 14. Sensors and actuators (1) Live Disk Forensics - 11 CS & HH 11/5/2014 Physical Instrumentation Power, Keyboard, Mouse Memory Introspection Network Tap SATA Introspection Semantic Analysis Figure 3: Hardware instrumentation to inspect a bare-metal machine. 14 / 31
  • 15. Sensors and actuators (2) Sensors Memory. Xilinx ML507 board connected to PCIe, reads and writes arbitrary memory locations via DMA. Disk. ML507 board intercepting all the traffic over SATA interface. Sends SATA frames via Gigabit Ethernet and UDP. Completely passive. . . except when SATA data rate exceeds Ethernet bandwidth: throttling of frames. Network interface. Mentioned in paper, but the technology used is unclear. Actuators An Arduino Leonardo emulates USB keyboard and mouse. 15 / 31
  • 16. Infrastructure Restoring physical machines We cannot simply “restore a snapshot” like in VMs Preboot Execute Environment (PXE) with CloneZilla Allows to restore the disk to a previously saved state No interaction with the OS Scalable infrastructure Job submission system: jobs are sent to a scheduler The scheduler executes the routine on an appropriate machine Python interface to control the machine and run malware and analysis 16 / 31
  • 17. Bridging the semantic gap via forensic analysis Raw SATA Capture Disk Reconstruction (Custom Module) File System Reconstruction (PyTSK + Custom Code) Filter Noise FS Modifications (a) Disk Reconstruction Memory Image (Clean) Semantic Reconstruction (Volatility) OS Information Memory Image (Dirty) Semantic Reconstruction (Volatility) OS Information Extract Differences Filter Noise Memory Modifications (b) Memory Reconstruction Fig. 5: Binary analysis workflow. (Rounded nodes represent data and rectangles represent data manipulation.) VII. EVALUATION AND ANALYSIS In this section, we explain our methodology for seman- tic gap reconstruction (Section VII-A) and demonstrate the practicality of LO-PHI with three targeted experiments. The experiments were constructed to demonstrate the following: • The ability of LO-PHI to detect the behaviors elicited by real malware, confirmed with ground truth (Section VII-C) • The ability to scale and extract meaningful results from unknown malware samples (Section VII-D) ssdt, svcscan, and callbacks which examine kernel descriptor tables, registered services, and kernel callbacks. 2) Disk: The first step in our disk analysis is to first convert the raw capture of the SATA activity into a 4-tuple containing the disk operation (e.g., READ or WRITE), starting sector, total number of sectors, and data. Our physical drives, as with most modern drives, used an optimization in the SATA specification known as Native Command Queuing (NCQ) [23]. NCQ reorders SATA Frame Information Structure (FIS) re- quests to achieve better performance by reducing extraneous head movement and then asynchronously replies based on Figure 4: Binary analysis workflow. Rounded nodes represent data and rectangles represent data manipulation. Background noise was removed by analyzing non-malicious software. 17 / 31
  • 18. Example of output of the analysis toolbox Offset Name PID PPID 0x86292438 AcroRd32.exe 1340 1048 0x86458818 AcroRd32.exe 1048 1008 0x86282be0 AdobeARM.exe 1480 1048 0x864562a0 $$ rk sketchy server.exe 1044 1008 (a) New Processes (pslist) Selector Base Limit Type DPL Gr Pr 0x320 0x8003b6da 0x00000000 CallGate32 3 - P (c) GDT Hooks (gdt) Name hookssdt.sys Table Entry Index Address Name Module 0 0x0000f7 0xf7c5b406 NtSetValueKey hookssdt.sys 0 0x0000ad 0xf7c5b44c NtQuerySystemInformation hookssdt.sys 0 0x000091 0xf7c5b554 NtQueryDirectoryFile hookssdt.sys (e) SSDT Hooks (ssdt) /.../lo /.../lo /.../lo Fig. 6: Post-filtered semantic output from rootkit experim B. Filtering Background Noise While the ability to provide a complete log of modifications to the entire system is useful in its own right, it is likely more (Figure 6d), h then execute have omitted Adobe Acrob Figure 5: New processes. D PID Port Protocol Address 1048 1038 UDP 127.0.0.1 1044 21 TCP 0.0.0.0 (b) New Sockets (sockets) Gr Pr - P Name Base Size File hookssdt.sys 0xf7c5b000 0x1000 C: ...lophihookssdt.sys (d) Loaded Kernel Models (modscan) Module hookssdt.sys mation hookssdt.sys e hookssdt.sys Created Filename /.../lophi/$$ rk sketchy server.exe /.../lophi/hookssdt.sys /.../lophi/sample 0742475e94904c41de1397af5c53dff8e.exe (f) Disk Event Log (81 Entries Truncated) Figure 6: New sockets. PPID 1048 1008 1048 1008 PID Port Protocol Address 1048 1038 UDP 127.0.0.1 1044 21 TCP 0.0.0.0 (b) New Sockets (sockets) DPL Gr Pr 3 - P Name Base Size File hookssdt.sys 0xf7c5b000 0x1000 C: ...lophihookssdt.sys (d) Loaded Kernel Models (modscan) Module hookssdt.sys Information hookssdt.sys ryFile hookssdt.sys Created Filename /. .. /lophi/$$ rk sketchy server.exe /. .. /lophi/hookssdt.sys /. .. /lophi/sample 0742475e94904c41de1397af5c53dff8e.exe (f) Disk Event Log (81 Entries Truncated) ntic output from rootkit experiment (Section VII-C1).Figure 7: Disk event log. 18 / 31
  • 20. Experiment on evasive malware. Malware previously labeled as “evasive” was executed on Windows 7 and analyzed using LO-PHI. Many samples clearly exhibited typical malware behavior. Some samples failed to run because there was no network connection or because the Windows version was wrong. From the paper: “[...] we feel that our findings are more than sufficient to showcase LO-PHI’s ability to analyze evasive malware, without being subverted, and subsequently produce high-fidelity results for further analysis.”
  • 21. Timing. Figure 8: Time spent in each step of binary analysis. Both environments were booting a 10 GB Windows 7 (64-bit) hibernate image and were running on a system with 1 GB of volatile memory.
  • 23. Known limitations. Newer chipsets use IOMMUs, which restrict DMA from peripherals, so the current memory acquisition technique will become unusable. Smearing: memory can change during acquisition, producing inconsistent states; faster polling rates can help. Filesystem caching: some data will never pass through the SATA interface. Malware could write a file to the disk cache, execute it, and delete it before the cache is flushed to disk. However, the effects would still be visible in memory.
  • 24. Issues with the technique and the experiments. The malware is left to run for only 3 minutes, while many samples need much more time to show their full effects (e.g., ransomware). There is no memory polling during execution, only a snapshot before and after, so temporary data used by the malware is never seen. Assumption: the malware does not modify the BIOS or firmware. If it does, the physical machine might not be recoverable, which is costly. No Internet access: the authors always run the malware on disconnected machines. Most malware becomes useless without its command-and-control infrastructure, and network access could expose further, unseen artifacts.
  • 25. Methodological issues. The article claims that malware cannot exploit the artifacts introduced by LO-PHI because it has no baseline (i.e., the malware cannot observe the machine before LO-PHI was installed). The same can be said of traditional approaches. No statistical test was used to determine whether the difference in disk/memory throughput is statistically significant (which would indicate the presence of artifacts). This is a simple, standard procedure that a scientific paper should include. The network analysis technique is not described and remains unclear. From the paper: “We exclude the network trace analysis from much of our discussion since it is a well-known technique and not the focus of our work.”
  • 26. Memory throughput with and without LO-PHI. Figure 9: Average memory throughput comparison as reported by RAMSpeed, with and without instrumentation (500 samples for each box plot; panel (a): physical machine, polling at 14 MB/sec). Deviation from the uninstrumented trial is only 0.4% in the worst case. No statistical test was used.
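As an illustration of the kind of check slides 25 and 26 say is missing, the sketch below runs a two-sided permutation test on the difference of mean throughput between an instrumented and an uninstrumented run. The choice of a permutation test is an assumption of this sketch (neither the paper nor the slides name a specific test), the function name permutation_test is hypothetical, and the throughput values are synthetic, not RAMSpeed measurements.

```python
import random
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the absolute difference of means.

    Returns an estimated p-value: the fraction of random relabelings whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)          # fixed seed keeps the result reproducible
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)            # randomly reassign samples to the two groups
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


# Synthetic throughput samples in MB/s (illustrative values only).
gen = random.Random(42)
uninstrumented = [gen.gauss(3000.0, 25.0) for _ in range(500)]
instrumented = [gen.gauss(2995.0, 25.0) for _ in range(500)]

p_value = permutation_test(uninstrumented, instrumented, n_resamples=2_000)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value would suggest that the instrumentation measurably changes throughput, i.e. that it leaves a detectable artifact. A large one would support the claim that the 0.4% deviation is within noise.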
  • 27. Another point... Virtualization is increasingly used in production contexts (just think about cloud computing!). Attackers will want to target those VMs [Garfinkel, 2007], so malware that deactivates in VMs will become less common.
  • 28. Related and future work
  • 29. Related work. Traditional dynamic analysis: many dynamic malware analysis tools rely on virtualization (Ether, BitBlaze, Anubis, V2E, HyperDbg, SPIDER). We already saw the limitation of VM-based approaches: artifacts. Bare-metal dynamic analysis: BareBox [Kirat, 2011] is a malware analysis framework based on a bare-metal machine, without virtualization or emulation. It targets only user-mode malware and offers only disk analysis (no memory tools).
  • 30. Future evolutions. Automated, repeated analyses: the disk restore phase is the lengthiest (> 6 min). The authors note: “The resetting and boot process could be decreased significantly by writing a custom PXE loader, or completely mitigated by implementing copy-on-write into our FPGA.” While snapshots are trivial with virtual machines, they are still an open problem for physical machines. Extensions: analyze the transient behavior of a binary through continuous memory polling, which requires dealing with DMA artifacts; cover malware that infects hardware (BIOS, firmware).
  • 31. References
      Chen X., Andersen J., Morley M., Bailey M., Nazario J. Towards an understanding of anti-virtualization and anti-debugging behavior in modern malware. In Proceedings of the International Conference on Dependable Systems and Networks (2008). doi: 10.1109/DSN.2008.4630086
      Kirat D., Vigna G., Kruegel C. BareBox: Efficient Malware Analysis on Bare-metal. In Proceedings of the 27th Annual Computer Security Applications Conference (2011). doi: 10.1145/2076732.2076790
      Garfinkel T., Adams K., Warfield A., Franklin J. Compatibility is Not Transparency: VMM Detection Myths and Realities. In Proceedings of the 11th USENIX Workshop on Hot Topics in Operating Systems (2007).