Code Obfuscation Against Symbolic Execution Attacks
Sebastian Banescu1, Christian Collberg2, Vijay Ganesh3,
Zack Newsham3, Alexander Pretschner1
1 Technical University of Munich, Germany
2 University of Arizona, Tucson, AZ, USA
3 University of Waterloo, Ontario, Canada
Research Questions
1. How do we measure obfuscation strength?
2. Which obfuscation transformations are stronger? Why?
3. Are combinations of obfuscation transformations stronger?
4. How many obfuscation transformations should one combine?
5. Can we build stronger obfuscation transformations?
Introduction
• Many obfuscation transformations available
• Malware developers combine them to generate millions of malware variants
• Human-assisted analysis of all variants does not scale
• Automated analysis must be employed
Automated Analysis Attacks
Deobfuscation Attack Goals
• Simplify control-flow graph
• Identify & disable tamper-proofing checks
• Bypass authentication checks / trigger conditions
Simplify Control-Flow Graph (Yadegari et al. 2015)
1. Explore paths such that all code is covered
2. Simplify traces using compiler optimization tricks
3. Reconstruct CFG from traces
Identify Tamper-Proofing Checks (Qiu et al. 2015)
1. Taint code segment
2. Explore paths until enough self-checks are disabled
(if the checks protect each other cyclically, all code must be explored)
3. Disable self-checking instructions
Symbolic / Concolic Execution
1. Make variables (inputs) symbolic
2. Collect path constraints during execution
3. Solve path constraints with an SMT solver → concrete values (test cases)
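#include <stdlib.h>

int main(int ac, char* av[]) {
  if (ac < 4) return 1;   // expects three integer arguments
  int a = atoi(av[1]);    // symbolic
  int b = atoi(av[2]);    // symbolic
  int c = atoi(av[3]);    // symbolic
  if (a > b)              // path constraint: a > b or a <= b
    a = a - b;
  if (b < 1) {            // path constraint: b < 1 or b >= 1
    if (c != a) {         // path constraint: c != a or c == a (a possibly updated above)
      c = a + b;
    }
  }
  b = 1;
  return 0;
}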
Bypass Authentication Checks (Banescu et al. 2015)
1. Make password symbolic
2. Explore paths until desired instruction (sequence) is found
3. Solve path constraints on paths that lead to desired instruction via SMT solver
4. Find satisfiable path constraints → concrete inputs to bypass check
A Common Sub-Problem of Deobfuscation Attacks
• Common sub-problem: path exploration
• How do we explore paths of a given program?
• Generate test cases:
  ◦ Black-box test generation: Fuzzing, Random testing
  ◦ White-box test generation: Symbolic/Concolic execution
Measuring Obfuscation Strength
• Strength of obfuscation: increase in test case generation time
• Observation: generally, obfuscation does not change the input-output behavior of a program
→ No increase in black-box test case generation time
• Example: the obfuscator turns the program below into an obfuscated program with the same input-output behavior
if ((unsigned char)arg[1][0] > 127)
  // do this
else
  // do that
• Observation: it could therefore be faster to use a black-box test generator than a white-box one
• Conclusion: apply obfuscation transformations until white-box test case generation is slower than black-box test case generation
Code Tampering Attacks
• Question: Why do we need code obfuscation? Why not just use a cryptographic hash?
• Example:
if (SHA256(arg[1]) == 0xa49…3793)
  // do this
else
  // do that
• It is hard for symbolic execution (the SMT solver) to invert cryptographic hash functions
• Answer:
  ◦ Test case generation is a non-invasive attack, i.e. the code is read, not changed
  ◦ Obfuscation aims to defend against the man-at-the-end (MATE) attacker, who can tamper with the code
  ◦ Cryptographic hash functions are easy to find and patch out
Experiments
Overview of Experiments
• Datasets of programs:
1. Manually crafted 48 small programs (heterogeneous)
2. Randomly generated 5761 larger programs (homogeneous)
• Obfuscation tools:
1. Tigress C Obfuscator / Virtualizer (source code level)
2. Obfuscator-LLVM (LLVM IR level)
• Symbolic execution tools:
1. KLEE (LLVM IR level)
2. angr (binary level)
3. Triton (binary level)
Description of Experiment 1
• Attacker goal: 100% code coverage → CFG recovery, remove all self-checks
• Obfuscated programs in 1st dataset with:
  ◦ 30 combinations of 5 obfuscation transformations from Tigress:
    - Opaque predicates
    - Encode literals
    - Encode arithmetic
    - Control flow flattening
    - Virtualization
  ◦ 9 combinations of 3 obfuscation transformations from Obfuscator-LLVM:
    - Instruction substitution
    - Control flow flattening
    - Bogus control flow
  ◦ 48 original programs × 39 obfuscations + 48 original programs = 1920 programs
• Ran KLEE 10x on each of the 1920 programs → recorded time, mem. size …
Results of Experiment 1
• Opaque predicates and virtualization cause the largest increase in program size
• Opaque predicates and encode literals have the smallest impact on symbolic execution time
• Flattening and virtualization (also when combined with other transformations) increase symbolic execution time
• The percentage of time spent waiting for the SMT solver is increased by flattening and encode arithmetic, and decreased by virtualization
• Flattening increases the number of queries sent to the SMT solver
• Encode arithmetic increases the size of queries sent to the SMT solver
[Figure panels: Tigress, Obfuscator-LLVM]
Description of Experiment 2
• Attacker goal: find test for “winning” path → bypass license check
• Obfuscated programs in 2nd dataset with:
  ◦ 5 obfuscation transformations from Tigress:
    - Opaque predicates
    - Encode literals
    - Encode arithmetic
    - Control flow flattening
    - Virtualization
  ◦ 5761 programs × 5 obfuscations + 5761 programs = 34 566 programs
• Ran symbolic execution tools:
1. KLEE (LLVM IR level)
2. angr (binary level)
3. Triton (binary level)
Results of Experiment 2
• Triton ran out of memory on the larger obfuscated programs
• KLEE and angr successfully analyzed only 12 713 obfuscated programs
• The data types of variables and the types of operators influence symbolic execution time
• KLEE incurs a lower overall slowdown than angr and also requires less memory
• The slowdown for finding the “winning” path is lower than the slowdown for reaching 100% code coverage
Key Observation from Experiments
Observation: The number of path constraints is the same for all obfuscated programs and their original versions
Reason: The obfuscation transformations do not introduce new paths that depend on symbolic values
Idea: Introduce new paths that depend on symbolic values!
Conclusions
• Test case generation is a common sub-goal of 3 deobfuscation attacks
• Used 2 datasets of small programs to compare obfuscation and attack implementations:
  ◦ Opaque predicates, instruction substitution and encode literals offer little protection
  ◦ Virtualization, flattening and encode arithmetic are stronger
  ◦ KLEE is slightly faster than angr
• Remark: obfuscation transformations do not introduce input-dependent paths
• Proposed obfuscation transformations to raise the bar for symbolic execution
• Future work:
  ◦ Use real-world programs
  ◦ Binary obfuscators (e.g. Themida)
  ◦ Other automated attacks (e.g. active / tampering attacks)
Thank you for your attention!
Questions?
ACSAC2016: Code Obfuscation Against Symbolic Execution Attacks
