ACSAC2016: Code Obfuscation Against Symbolic Execution Attacks

Code Obfuscation Against Symbolic Execution Attacks
Sebastian Banescu¹, Christian Collberg², Vijay Ganesh³,
Zack Newsham³, Alexander Pretschner¹
¹ Technical University of Munich, Germany
² University of Arizona, Tucson, AZ, USA
³ University of Waterloo, Ontario, Canada

Research Questions
1. How do we measure obfuscation strength?
2. Which obfuscation transformations are stronger? Why?
3. Are combinations of obfuscation transformations stronger?
4. How many obfuscation transformations should one combine?
5. Can we build stronger obfuscation transformations?

Introduction
• Many obfuscation transformations available
• Malware developers combine them to generate millions of malware variants
• Human-assisted analysis of all variants unscalable
• Automated analysis must be employed

Deobfuscation Attack Goals
• Simplify control-flow graph
• Identify & disable tamper-proofing checks
• Bypass authentication checks / trigger conditions

Simplify Control-Flow Graph (Yadegari et al. 2015)
1. Explore paths such that all code is covered
2. Simplify traces using compiler optimization tricks
3. Reconstruct CFG from traces

Identify Tamper-Proofing Checks (Qiu et al. 2015)
1. Taint code segment
2. Explore paths until enough self-checks disabled
(cyclic checks → explore all code)
3. Disable self-checking instructions

Symbolic / Concolic Execution
1. Make variables (inputs) symbolic
2. Collect path constraints during execution
3. Solve path constraints with an SMT solver → concrete values (test cases)
#include <stdlib.h> /* atoi */

int main(int ac, char* av[]){
    if (ac < 4) return 1;       // three arguments are required
    int a = atoi(av[1]);        // symbolic
    int b = atoi(av[2]);        // symbolic
    int c = atoi(av[3]);        // symbolic
    if (a > b)
        a = a - b;
    if (b < 1) {
        if (c != a) {
            c = a + b;
        }
    }
    b = 1;
    return 0;
}

Bypass Authentication Checks (Banescu et al. 2015)
1. Make password symbolic
2. Explore paths until desired instruction (sequence) is found
3. Solve path constraints on paths that lead to desired instruction via SMT solver
4. Find satisfiable path constraints → concrete inputs to bypass check

A Common Sub-Problem of Deobfuscation Attacks
• Common sub-problem: path exploration
• How do we explore paths of a given program?
• Generate test cases:
  ▪ Black-box test generation: fuzzing, random testing
  ▪ White-box test generation: symbolic/concolic execution

Measuring Obfuscation Strength
• Strength of obfuscation: increase in test case generation time
• Observation: Generally, obfuscation does not change input-output behavior
→ No increase in black-box test case generation time
• Example:
    if (arg[1][0] > 127)
        // do this
    else
        // do that
  [Figure: Obfuscator → Obfuscated Program]
• Observation: It could be faster to use a black-box test generator than a white-box one
• Conclusion: Apply obfuscation transformations until white-box test case generation is
slower than black-box test case generation

Code Tampering Attacks
• Question: Why do we need code obfuscation? Why not just use a cryptographic hash?
• Example:
    if (SHA256(arg[1]) == 0xa49…3793)
        // do this
    else
        // do that
• Hard for symbolic execution (SMT solver) to break crypto hash functions
• Answer:
  ▪ Test case generation is a non-invasive attack, i.e. code is read, not changed
  ▪ Obfuscation aims to defend against a MATE (man-at-the-end) attacker, who can tamper with code
  ▪ Easy to find and patch out crypto hash functions

Overview of Experiments
• Datasets of programs:
1. Manually crafted 48 small programs (heterogeneous)
2. Randomly generated 5761 larger programs (homogeneous)
• Obfuscation tools:
1. Tigress C Obfuscator / Virtualizer (source code level)
2. Obfuscator-LLVM (LLVM IR level)
• Symbolic execution tools:
1. KLEE (LLVM IR level)
2. Angr (binary level)
3. Triton (binary level)

Description of Experiment 1
• Attacker goal: 100% code coverage → CFG recovery, remove all self-checks
• Obfuscated programs in 1st dataset with:
  ▪ 30 combinations of 5 obfuscation transformations from Tigress
      ▪ Opaque predicates
      ▪ Encode literals
      ▪ Encode arithmetic
      ▪ Control flow flattening
      ▪ Virtualization
  ▪ 9 combinations of 3 obfuscation transformations from Obfuscator-LLVM
      ▪ Instruction substitution
      ▪ Bogus control flow
  ▪ 48 original programs x 39 obfuscations + 48 original programs = 1920
• Ran KLEE 10 times on each of the 1920 programs → recorded time, memory size, …

Results of Experiment 1
• Opaque predicates and virtualization have highest increase in program size
• Opaque predicates and encode literals have smallest impact on symbolic execution time
• Flattening and virtualization (also when combined with other transformations) increase symbolic execution time
• Share of time spent waiting for the solver is increased by flattening and encode arithmetic, decreased by virtualization
• Flattening increases number of queries sent to SMT solver
• Encode arithmetic increases size of queries sent to SMT solver
[Figure panels: Tigress, Obfuscator-LLVM]

Description of Experiment 2
• Attacker goal: find test for “winning” path → bypass license check
• Obfuscated programs in 2nd dataset with:
  ▪ 5 obfuscation transformations from Tigress
      ▪ Opaque predicates
      ▪ Encode literals
      ▪ Encode arithmetic
      ▪ Virtualization
  ▪ 5761 programs x 5 obfuscations + 5761 programs = 34 566 programs
• Ran symbolic execution tools:
1. KLEE (LLVM IR level)
2. Angr (binary level)
3. Triton (binary level)

Results of Experiment 2
• Triton ran out of memory when given larger obfuscated programs
• KLEE and angr successfully analyzed only 12 713 obfuscated programs
• Data types of variables and type of operators influence symbolic execution time
• KLEE incurs overall lower slowdown than angr (also requires less memory)
• Slowdown for finding “winning” path is lower than slowdown for 100% code coverage

Key Observation from Experiments
Observation: The number of path constraints is the same for all obfuscated and
original programs
Reason: Obfuscation transformations do not introduce new paths dependent on
symbolic values
Idea: Introduce new paths dependent on symbolic values!

Conclusions
• Test case generation is a common sub-goal of 3 deobfuscation attacks
• Used 2 datasets of small programs to compare obfuscation and attack implementations:
  ▪ Opaque predicates, instruction substitution and encode literals have little impact on symbolic execution
  ▪ Virtualization, flattening and encode arithmetic have more impact
  ▪ KLEE slightly faster than angr
• Remark: Obfuscation transformations don’t introduce input-dependent paths
• Proposed obfuscation transformations to raise the bar for sym-exec
• Future work:
  ▪ Use real-world programs
  ▪ Binary obfuscators (e.g. Themida)
  ▪ Other automated attacks (e.g. active / tampering attacks)

Thank you for your attention!
Questions?
