Crash Analysis with
Reverse Taint
Powered by Taintgrind
Marek Zmysłowski
V. 12122019
whoami
Security Researcher @
Interested in fuzzing and vulnerability finding
Fan of The Matrix and Hacker movies
Co-organizer of “H4x0r5 %40 Warsaw” meetings
Crash Analysis with Reverse Taint
Crash Analysis with Reverse Taint
- What is a crash?
Crash, or system crash, occurs when a computer program (...) stops functioning properly
and exits. *
An application typically crashes when it performs an operation that is not allowed by the
operating system. The operating system then triggers an exception or signal in the
application. Unix applications traditionally responded to the signal by dumping core. *
- Why is it important to identify a crash?
“performs an operation that is not allowed”. This indicates that a bug exists inside the
application. Once the crash is identified and can be reproduced, the bug can be found.
*https://en.wikipedia.org/wiki/Crash_(computing)
Crash Analysis with Reverse Taint
- How to identify a crash?
“operating system then triggers an exception or signal” Different operating
systems have different mechanisms to “collect” crashes. Some perform a core
dump, storing the process memory (Linux); others attach a debugger so that the
user can start a debugging session with the crashed application (Windows).
- How do we get crashes?
“That, Detective, is the ‘right question.’”
“I, Robot” (2004)
Hunting in the wild
- How can a crash be found?
By accident, or by fuzzing, one of the most popular techniques and the one
discussed here. The sooner a bug is found in the production process, the lower
the cost of fixing it.
- So what is ‘fuzzing’?
The idea behind fuzzing is very simple. Let’s take a malformed input and feed it
to the application. Maybe it will crash. Of course, how the input is “chosen” and
how the crashes are caught is a topic for another presentation (or several).
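The idea can be sketched in a few lines of Python. This is only a toy, not how AFL or any real fuzzer works: `parse_record` is a made-up target with a deliberate bug, and the mutation strategy (overwrite one random byte) is an assumption chosen for brevity.

```python
import random

def parse_record(data: bytes) -> bytes:
    """Hypothetical target: the first byte is a length, the rest is the payload."""
    length = data[0]
    payload = data[1:]
    # Bug: the declared length is trusted without checking the payload size.
    return bytes(payload[i] for i in range(length))

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed: bytes, rounds: int = 500, rng_seed: int = 1) -> list:
    """Feed mutated inputs to the target and collect the ones that raise."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(rounds):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except IndexError:
            crashes.append(candidate)
    return crashes

crashes = fuzz(b"\x03abc")
print(len(crashes), "crashing inputs found")
```

Every collected input has a corrupted length byte, which already hints at the question the rest of the talk asks: which input bytes are responsible for a crash?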
Fuzzers
The godfather of all is, of course, AFL.
However, recent years brought multiple
fuzzers used for different purposes. Some
of them are different clones of AFL, some of
them try to do things differently.
Everyone can find something for
themselves.
American Fuzzy Lop
HonggFuzz
AFL++
Angora
QSYM
WinAFL
Real Example - Fuzzing
jhead is used to display and manipulate data contained in the Exif header of
JPEG images from digital cameras. By default, jhead displays the more useful
camera settings from the file in a user-friendly format.
The version used here is 3.03.
http://www.sentex.net/~mwandel/jhead/
Real Example - Fuzzing
Crash
Example
Crash vs Bug
- What is the difference between Crash and Bug?
A crash is the result of incorrectly working code caused by a bug. Sometimes
the place where the program crashes and the place where the bug lives are
“the same”. And sometimes not ...
- Is one crash caused by one bug?
No.
Crash vs Bug
Case 1.
One bug causes one crash.
This is the easiest situation as the identification
is straightforward.
Crash vs Bug
Case 2.
One bug can cause a few
crashes.
This happens quite often, especially
with simple buffer overflows where
a “size” variable is used. Depending on
its value, the read or write reaches
different memory regions and causes
different crashes.
Crash vs Bug
Case 3.
A few bugs can cause one crash.
This depends on how we identify a crash. The
simplest example is a frame processor:
for different types of frame, the size
parser works incorrectly and may cause
different crashes for different paths.
Crash vs Bug
In an additional experiment we computed a portion of ground truth. We applied all patches to cxxfilt
from the version we fuzzed up until the present. We grouped together all inputs that a particular patch
caused to now gracefully exit [11], confirming that the patch represented a single conceptual bugfix. We
found that all 57,142 crashing inputs deemed “unique” by coverage profiles were addressed by 9
distinct patches.
Stack hashes did better, but still over-counted bugs. Instead of the bug mapping to, say 500 AFL
coverage-unique crashes in a given trial, it would map to about 46 stack hashes, on average.
Stack hashes were also subject to false negatives: roughly 16% of hashes for crashes from one bug
were shared by crashes from another bug. In five cases, a distinct bug was found by only one crash,
and that crash had a non-unique hash, meaning that evidence of a distinct bug would have been
dropped by “de-duplication.”
“Evaluating Fuzz Testing” https://arxiv.org/pdf/1808.09700.pdf
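To make the stack-hash idea concrete, here is a minimal sketch. The backtraces, the crash names and the choice of SHA-1 over the top three frames are all assumptions for illustration; real tools differ in which frames and how many they hash.

```python
import hashlib
from collections import defaultdict

def stack_hash(frames, depth=3):
    """Hash the top `depth` frames of a backtrace into a short bucket id."""
    top = "|".join(frames[:depth])
    return hashlib.sha1(top.encode()).hexdigest()[:12]

def bucket(crashes, depth=3):
    """Group crash names by the stack hash of their backtrace."""
    buckets = defaultdict(list)
    for name, frames in crashes.items():
        buckets[stack_hash(frames, depth)].append(name)
    return dict(buckets)

# Hypothetical backtraces, innermost frame first.
crashes = {
    "crash-001": ["memcpy", "read_section", "process_exif", "main"],
    "crash-002": ["memcpy", "read_section", "process_exif", "main"],
    "crash-003": ["memcpy", "read_section", "process_gps", "main"],
}

buckets = bucket(crashes)             # three frames: two buckets
shallow = bucket(crashes, depth=2)    # two frames: one bucket
print(len(buckets), len(shallow))
```

The same three crashes count as two "bugs" or one depending only on the hash depth, which is the kind of over- and under-counting the quoted paper measures.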
Crash vs Bug
So what is needed?
Crash Analysis with Reverse Taint
Crash Analysis with Reverse Taint
- What is crash analysis and why do we need that?
Crash analysis is the process of evaluating the exploitability of a crash and
identifying its root cause. If you are fuzzing something, the number
of crashes can be huge. Also, the impact and consequences (criticality) that the bug
might have depend on the application technology and the system.
- So what exactly do we analyze?
There are two major things to analyze: the crash and the bug.
Analysis - Crash
- What type of the crash is it?
For example: out-of-bounds read, NULL pointer dereference, buffer overflow, etc.
- Is the crash exploitable?
Part of the identification process is to find out whether the crash can be used to achieve
something more than just crashing the application: reading a piece of data, overwriting
memory or executing code.
- Critical or exploitable - what is the difference?
Exploitability relates only to the bug and the crash itself. Criticality relates to
the whole environment. A safe NULL pointer dereference means something different for
nuclear power plant software than for a kids' game.
Analysis - Bug
The second important part of the analysis is to identify the bug. As mentioned
before, there can be different relations between the bug and the crash.
It is also important to know how the inputs (which bits and bytes) correlate with the bug.
This may of course influence the crash and its exploitability.
- What different relations are there?
User data can control the crash directly (e.g. an offset inside a table is calculated
from user data) or indirectly (e.g. an incorrect branch is taken).
For example: NULL Pointer Dereference vs Safe NULL Pointer Dereference
Analysis - Bug (Direct vs. Indirect)
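char *pointer = NULL;
char table[100];
int index = 0;
char user_data;               /* byte taken from the input */

if (user_data > 100)
    pointer[index] = 0;       /* indirect: user data only selects the branch */
else
    table[user_data] = 0;     /* direct: user data is the offset */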
Analysis - Few Interesting Tools
crashwalk
- cwtriage
It runs crash files with instrumentation and outputs results in various formats.
- cwdump
It summarizes crashes in a crashwalk database by major / minor stack hash.
Although AFL (for example) already de-dupes crashes, bucketing summarizes
those crashes by an order of magnitude or more. Crashes that bucket the same
have exactly the same stack contents, so they're likely (not guaranteed) to be
the same bug.
- cwfind
It is a simple utility to output the filenames of all crashes matching a given hash. I
use it in combination with xargs to bulk delete / move crash files.
https://github.com/bnagy/crashwalk
Analysis - Few Interesting Tools
afl-utils
- afl-collect
Copies all crash sample files from an afl synchronisation
directory (used by multiple afl instances when run in parallel)
into a single location providing easy access for further crash
analysis. Also executes exploitable on them and removes
uninteresting crashes.
- afl-minimize
Helps to create a minimized corpus from samples of a parallel
fuzzing job.
https://github.com/rc0r/afl-utils
afl-collect
Results of the
“exploitable” plugin
afl-minimize
Reducing Input Files
Analysis - Few Interesting Tools
afl-analyze
It takes an input file, attempts to
sequentially flip bytes, and observes
the behavior of the tested program. It
then color-codes the input based on
which sections appear to be critical,
and which are not.
While not bulletproof, it can often offer
quick insights into complex file
formats.
https://lcamtuf.blogspot.com/2016/02/say-hello-to-afl-analyze.html
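The byte-flipping idea can be illustrated with a small sketch. It is not afl-analyze's actual algorithm; `check_header` is a hypothetical target, and "behavior" is reduced here to the value the target returns.

```python
def check_header(data: bytes) -> str:
    """Hypothetical target: reports which path it took for this input."""
    if data[:2] != b"\xff\xd8":
        return "reject"
    return "accept"

def classify_bytes(data: bytes, target) -> list:
    """Flip each byte in turn and record offsets whose change alters behavior."""
    baseline = target(data)
    critical = []
    for offset in range(len(data)):
        flipped = bytearray(data)
        flipped[offset] ^= 0xFF  # invert every bit of this one byte
        if target(bytes(flipped)) != baseline:
            critical.append(offset)
    return critical

critical = classify_bytes(b"\xff\xd8hello", check_header)
print(critical)
```

Only the two magic bytes are reported as critical. The approach is black-box: it observes that a byte matters, but not through which instructions, which is the gap taint analysis tries to fill.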
afl-analyze
*Of course it works, it just does not always give expected results.
Crash Analysis with Reverse Taint
Tainting
- What is tainting?
The purpose of dynamic taint analysis is to track information flow between
sources and sinks. Any program value whose computation depends on data
derived from a taint source is considered tainted. Any other value is considered
untainted.
- What are the types of tainting?
● The direct value is tainted
● Indirect/Control flow
● Address/Pointer relation
https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf
Tainting - Types
Indirect/ Control Flow
if (X > 2)
Y = 5
else
Y = 10
Address/Pointer
Y = A[X]
Direct Value
Y = X + 2
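The three cases can be modelled with a tiny tainted-value wrapper. This is a sketch under assumptions: the `Tainted` class and the `track_pointers` / `track_control` switches are invented here to show what a policy may or may not propagate; they do not correspond to any tool's API.

```python
class Tainted:
    """A value paired with the set of input offsets it depends on."""
    def __init__(self, value, labels=frozenset()):
        self.value = value
        self.labels = frozenset(labels)

def direct(x):
    # Y = X + 2: the result is computed from X, so it inherits X's labels.
    return Tainted(x.value + 2, x.labels)

def address(table, x, track_pointers):
    # Y = A[X]: X only selects which element is read.
    labels = x.labels if track_pointers else frozenset()
    return Tainted(table[x.value], labels)

def control(x, track_control):
    # if (X > 2) Y = 5 else Y = 10: X only selects which constant is assigned.
    labels = x.labels if track_control else frozenset()
    return Tainted(5 if x.value > 2 else 10, labels)

x = Tainted(3, {0})  # X came from input byte 0

print(direct(x).labels)
print(address([10, 20, 30, 40], x, track_pointers=False).labels)
print(control(x, track_control=False).labels)
```

With both switches off, only the direct case keeps the label, even though X influences Y in all three.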
Tainting Propagation (Policy)
Depending on the application, different rules can be used to propagate the taint. The policy
is the set of rules for how taint on the source operands is propagated to the destination. In
standard taint analysis, the destination operand is typically marked as tainted if
any of the source operands is tainted, regardless of how the specific semantics
of the instruction affect its destination operands.
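A minimal sketch of this standard policy, assuming a shadow map from temporaries to sets of input offsets. The temporary names and the three "instructions" are made up for illustration.

```python
def propagate(shadow, dest, sources):
    """Standard policy: dest takes the union of the labels of all sources."""
    labels = set()
    for src in sources:
        labels |= shadow.get(src, set())
    shadow[dest] = labels
    return labels

# Shadow state: t1 depends on input byte 4, t2 on byte 7, t3 on nothing.
shadow = {"t1": {4}, "t2": {7}, "t3": set()}

propagate(shadow, "t4", ["t1", "t3"])  # t4 = t1 + t3
propagate(shadow, "t5", ["t4", "t2"])  # t5 = t4 * t2
propagate(shadow, "t6", ["t1", "t1"])  # t6 = t1 ^ t1, which is always 0

print(shadow["t4"], shadow["t5"], shadow["t6"])
```

`t6` is a constant, yet it ends up labelled with byte 4, because the policy looks only at which operands are tainted and not at what the operation does with them.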
Tainting - Issues
- What is over-tainting?
Overtainting occurs when code or data identified by the analysis as tainted is
not in fact influenced by any taint source (false-positive).
- What is under-tainting?
Under-tainting occurs when code or data that is influenced by a taint source is
not identified by the analysis as tainted. Such imprecision can be problematic,
especially in systems where the result of the taint analysis is critically important
(false-negative).
Powered by Taintgrind
- What is Valgrind?
Valgrind
It is an instrumentation framework for
building dynamic analysis tools. It comes with
a set of tools each of which performs some
kind of debugging, profiling, or similar task
that helps you improve your programs.
Valgrind's architecture is modular, so new
tools can be created easily and without
disturbing the existing structure.
http://valgrind.org
Valgrind IR
Valgrind had an x86-specific, part D&R, part
C&A, assembly-code-like IR in which the
units of translation were basic blocks. Since
then Valgrind has had an architecture-neutral,
D&R, single-static-assignment
(SSA) IR that is more similar to what might
be used in a compiler. IR blocks are
superblocks: single-entry, multiple-exit
stretches of code.
*http://valgrind.org/docs/valgrind2007.pdf
Single-Static-Assignment (SSA)
It is a property of an intermediate representation (IR),
which requires that each variable is assigned exactly
once, and every variable is defined before it is used.
Existing variables in the original IR are split into
versions, new variables typically indicated by the
original name with a subscript in textbooks, so that
every definition gets its own version. In SSA form, use-
def chains are explicit and each contains a single
element.
*https://en.wikipedia.org/wiki/Static_single_assignment_form
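A small sketch of the renaming for straight-line code. It is an illustration only: the statement syntax (`name = expression`) is assumed, and branches, which would need phi functions, are not handled.

```python
import re

def to_ssa(statements):
    """Rename variables in straight-line 'x = expr' statements into SSA form."""
    version = {}
    out = []
    for stmt in statements:
        target, expr = [part.strip() for part in stmt.split("=", 1)]

        def rename(match):
            # A use refers to the latest version defined so far.
            name = match.group(0)
            return f"{name}{version[name]}" if name in version else name

        expr = re.sub(r"[A-Za-z_]\w*", rename, expr)
        # Each assignment creates a fresh version of its target.
        version[target] = version.get(target, 0) + 1
        out.append(f"{target}{version[target]} = {expr}")
    return out

ssa = to_ssa(["x = 1", "y = x + 2", "x = x + y", "y = x"])
for line in ssa:
    print(line)
```

Because every name is assigned exactly once, following a value back to its definition is a single lookup, which is what makes walking a taint log backward practical.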
- What is Taintgrind?
Taintgrind
Taintgrind is based on Valgrind's MemCheck and Flayer plugin.
Taintgrind borrows the bit-precise shadow memory from MemCheck and only
propagates explicit data flow. This means that Taintgrind will not propagate taint
in control structures such as if-else, for-loops and while-loops. Taintgrind will also
not propagate taint in dereferenced tainted pointers.
http://valgrind.org/docs/memcheck2005.pdf
Taintgrind - Propagation Rules
1. The direct value is tainted
2. Indirect/Control flow
3. Address/Pointer relation
Taintgrind - Propagation Rules
- What are the Taintgrind propagation rules?
The granularity for memory operations is 1 byte.
For register operations it is the size of the operand. Even if only one byte
is used, the whole register is still tainted. In such a case, Taintgrind
is over-tainting.
However, because Taintgrind handles only the first type (direct value), it is also under-tainting.
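The effect of the two granularities can be shown with a toy shadow state. This models only the behavior described on this slide, with one flag per memory byte and one flag per register; it is not Taintgrind's implementation, and the addresses and register name are arbitrary.

```python
class Shadow:
    def __init__(self):
        self.memory = {}     # address -> bool, one flag per byte
        self.registers = {}  # register name -> bool, one flag per register

    def taint_memory(self, address, length):
        for offset in range(length):
            self.memory[address + offset] = True

    def load(self, register, address, length):
        """Load `length` bytes; the whole register is tainted if any byte is."""
        self.registers[register] = any(
            self.memory.get(address + i, False) for i in range(length)
        )

    def store(self, register, address, length):
        """Store a register; every written byte takes the register's flag."""
        for i in range(length):
            self.memory[address + i] = self.registers.get(register, False)

shadow = Shadow()
shadow.taint_memory(0x1000, 1)   # only one input byte is tainted
shadow.load("rax", 0x1000, 8)    # 8-byte load that includes it
shadow.store("rax", 0x2000, 8)   # write the register back elsewhere

tainted_out = sum(shadow.memory.get(0x2000 + i, False) for i in range(8))
print(tainted_out)
```

One tainted byte goes in and eight tainted bytes come out: the round trip through the register is where the over-tainting appears.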
Taintgrind - Propagation Rules
WRITE READ
Overtainting bit-byte operation
Taintgrind
Here is an example of the logs and how
the taint is propagated through the file.
The job is to find all the paths from
the end of the file back to the beginning.
One instruction can be tainted by
multiple inputs.
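The backward walk can be sketched as follows. The log format used here is a simplification invented for the example (a list of `(destination, sources)` pairs, where an integer source is an offset in the input file); it is not the Taintgrind log format, and this is not the rtaint code.

```python
def reverse_taint(log):
    """Walk a taint log backward from the last entry to the input offsets."""
    # Map each destination to the sources it was computed from.
    # The log is assumed to be SSA-like: each destination is written once.
    producers = {dest: sources for dest, sources in log}

    offsets = set()
    seen = set()
    stack = [log[-1][0]]  # start from the value used at the crash
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for src in producers.get(name, []):
            if isinstance(src, int):
                offsets.add(src)   # reached a byte of the input file
            else:
                stack.append(src)  # keep walking toward the beginning
    return sorted(offsets)

log = [
    ("t1", [16]),          # t1 loaded from input byte 16
    ("t2", [17]),          # t2 loaded from input byte 17
    ("t3", ["t1", "t2"]),  # t3 combines both
    ("t4", [40]),          # tainted, but unrelated to the crash
    ("t5", ["t3"]),        # t5 is the value used by the crashing instruction
]

print(reverse_taint(log))
```

Byte 40 is tainted in the forward direction but never reaches the crashing value, so the backward walk leaves it out.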
Taintgrind
The original Taintgrind was not usable for reverse taint. It was
missing a few parts.
- What was changed?
The “Read” function did not show the size of the data that was read.
The “Load” and “Store” functions did not report the size of the operation either.
Tracked variables
Reported crash
Taintgrind
GDB
Function where the
crash occurred
Instruction that caused the crash
The crash occurred with the
reference to the address
stored in RAX register
GDB
Taintgrind
/work/taint-analysis/valgrind-3.15.0/build/bin/valgrind \
  --tool=taintgrind \
  --file-filter=/work/taint-analysis/CRASH \
  --compact=yes \
  --taint-start=0 \
  --taint-len=1504 \
  /work/taint-analysis/jhead-3.03/jhead /work/taint-analysis/CRASH
Taintgrind
/work/taint-analysis/valgrind-3.15.0/build/bin/valgrind --tool=taintgrind
Calling the Taintgrind tool.
--file-filter=/work/taint-analysis/CRASH
The name of the file that needs to be tainted. It must be the FULL path.
--compact=yes
Makes the log file smaller.
--taint-start=0
Offset inside the file.
--taint-len=1504
Taint size.
/work/taint-analysis/jhead-3.03/jhead /work/taint-analysis/CRASH
The command to run under Taintgrind.
Crash Analysis with Reverse Taint
Reverse Tainting the Value
An example of how the values are tracked.
Reverse Tainting the
Value
Parts of one tainted variable diagram
rtaint
https://github.com/Cycura/rtaint
rtaint
- -f
The name of the log file created by Taintgrind. It can be in the compact
format.
- -g
The script can also produce a file in dot format, which is used to generate a graph.
- -s
The name of the file with the slice. Later, this can be used to display which
operations were tainted with the values.
- -k
The path of the directory where the Kaitai Struct files will be stored.
Reverse Tainting
the Value
These are the
indexes from
the file that
are causing
the crash.
Crash File
with Kaitai
Struct
Reverse Tainting the Value
- With or without the file size?
What is the probability that files of different sizes with the same Kaitai Struct will
have different root causes?
- What is the relation between AFL Unique Algorithm and the Tainted
Input?
It is an open question...
Reverse Tainting the Value - Results
413 total crashes found by 4 instances (1 master and 3 slaves)
- master - 44 crashes
- slave1 - 116 crashes
- slave2 - 124 crashes
- slave3 - 129 crashes
349 crashes were reproduced under Taintgrind
177 crashes had a unique Kaitai Struct.
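The reduction from reproduced crashes to unique structures can be pictured as grouping by signature. In this sketch the signature is simply the set of tainted input offsets, which is an assumption standing in for the generated Kaitai Struct; the crash names and offsets are made up.

```python
from collections import defaultdict

def group_by_offsets(crash_offsets):
    """Group crash names by the set of input offsets that reach the crash."""
    groups = defaultdict(list)
    for name, offsets in crash_offsets.items():
        groups[frozenset(offsets)].append(name)
    return dict(groups)

# Hypothetical results of reverse tainting three crash files.
crash_offsets = {
    "id:000000": [16, 17],
    "id:000001": [17, 16],
    "id:000002": [16, 17, 40],
}

groups = group_by_offsets(crash_offsets)
print(len(crash_offsets), "crashes,", len(groups), "unique signatures")
```

Whether two crashes with the same signature always share a root cause is exactly the open question raised on the previous slide.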
Slicing
It is the computation of the set of program
statements, the program slice, that may
affect the values at some point of interest,
referred to as a slicing criterion.*
https://en.wikipedia.org/wiki/Program_slicing
Graph
Taintgrind and rtaint allow you to
create a dot graph that can be
rendered with the graphviz
package.
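For readers who have not seen the dot format, this sketch writes a small graph by hand. The node names and edges are invented; the output of the real tools will look different.

```python
def to_dot(edges):
    """Render (source, destination) pairs as a Graphviz digraph."""
    lines = ["digraph taint {"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical flow: two input bytes feed t3 through t1 and t2.
dot = to_dot([
    ("in[16]", "t1"),
    ("in[17]", "t2"),
    ("t1", "t3"),
    ("t2", "t3"),
])
print(dot)
```

Saved to a file such as `taint.dot`, the text can be rendered with Graphviz, for example `dot -Tpng taint.dot -o taint.png`.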
Simple Example
Crash Analysis with Reverse Taint
Powered by Taintgrind
Powered by ...
Moflow - https://github.com/Cisco-Talos/moflow/tree/master/BAP-0.7-moflow
Binary Ninja - https://blog.trailofbits.com/2019/08/29/reverse-taint-analysis-using-binary-ninja/
Triton - https://triton.quarkslab.com/
BARF - https://github.com/programa-stic/barf-project
libdft - http://www.cs.columbia.edu/~vpk/research/libdft/
CommercialFree
TETRANE - https://www.tetrane.com/
What Next?
Updates
- Address
The tainting starts from the last line inside the file. This is
useful when there is a crash, but there is currently no way to taint an
arbitrary instruction if the application doesn’t crash.
- Scripts
An IDA Pro/Ghidra/Binary Ninja script for highlighting the tainted
instructions. This will make it easier to identify the data flow.
- Speed
The tool as currently written is slow. Optimization
or a change of language (thinking about Rust) is required.
What Next?
Issues
Currently Taintgrind doesn't work on ARM
processors. This is caused by Valgrind itself, which is missing
some of the ARM conversions. The bug has already been
reported.
Summary
The solution is based on Valgrind/Taintgrind. This means that it supports all the
systems supported by Valgrind itself (+), but it also suffers from Valgrind's issues
(-)
The process of creating the taint log is time-consuming (-)
rtaint can be used in most cases, making the analysis “faster” and
automated. It is easy to incorporate into other tools. (+)
Python may not be the best choice for rtaint. Too slow? (-)
It requires more testing on real-life applications. I’m happy to receive any
feedback :) (+)
References and Interesting Docs
https://github.com/wmkhoo/taintgrind
http://valgrind.org/
http://valgrind.org/docs/memcheck2005.pdf
https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf
https://www2.cs.arizona.edu/~debray/Publications/bit-level-taint.pdf
http://bitblaze.cs.berkeley.edu/papers/dta%2B%2B-ndss11.pdf
http://shell-storm.org/blog/Taint-analysis-and-pattern-matching-with-Pin/
https://www.blackhat.com/docs/eu-15/materials/eu-15-Kim-Triaging-Crashes-With-Backward-Taint-Analysis-For-ARM-Architecture.pdf
Special Thanks
Wei Ming Khoo
Questions
Thank you :)
mzmyslowski@cycura.com
@marekzmyslowski
https://github.com/Cycura/rtaint
https://twitter.com/H4x0r54

Crash Analysis with Reverse Taint

  • 1. Crash Analysis with Reverse Taint Powered by Taintgrind Marek Zmysłowski V. 12122019
  • 2. whoami Security Researcher @ Interested in fuzzing and vulnerability finding Fan of The Matrix and Hacker movies Co-organizer of “H4x0r5 %40 Warsaw” meetings
  • 3. Crash Analysis with Reverse Taint
  • 4. Crash Analysis with Reverse Taint - What is a crash? Crash, or system crash, occurs when a computer program (...) stops functioning properly and exits. * An application typically crashes when it performs an operation that is not allowed by the operating system. The operating system then triggers an exception or signal in the application. Unix applications traditionally responded to the signal by dumping core. * - Why is it important to identify a crash? “performs an operation that is not allowed”. This indicate that inside the application a bug exists. When the crash is identified and recurrent the bug can be found. *https://en.wikipedia.org/wiki/Crash_(computing)
  • 5. Crash Analysis with Reverse Taint - How to identify a crash? “operating system then triggers an exception or signal” Different operating systems contain different mechanisms to “collect” crashes. Some perform core dump, store the memory (Linux), other attach the debugger to allow user the debugging session with the crashed application (Windows). - How do we get crashes? “That, Detective, is the ‘right question.’” “I, Robot” (2004)
  • 6. Hunting in the wild - How can a crash be found? By accident or by one of the most popular techniques that we will be also mentioned here, fuzzing. The sooner the bug is found in the production process, the less the costs are. - So what is ‘fuzzing’? The idea behind fuzzing is very simple. Let’s take an malformed input and feed it to the application. Maybe it will crash. Of course, how the input is “chosen” and how the crashes are caught is a topic for another presentation(s).
  • 7. Fuzzers The godfather of all is, of course, AFL. However, recent years brought multiple fuzzers used for different purposes. Some of them are different clones of AFL, some of them try to do things differently. Everyone can find something for themselves. American Fuzzy Lop HonggFuzz AFL++ Angora QSYM WinAFL
  • 8. Real Example - Fuzzing jhead is used to display and manipulate data contained in the Exif header of JPEG images from digital cameras. By default, jhead displays the more useful camera settings from the file in a user-friendly format. The version used here is 3.03. http://www.sentex.net/~mwandel/jhead/
  • 9. Real Example - Fuzzing
  • 11. Crash vs Bug - What is the difference between Crash and Bug? Crash is a result of incorrectly working code caused by a bug. Sometimes it happens, that the crashing place and the bug place are “the same”. And sometimes not ... - Is one crash caused by one bug? No.
  • 12. Crash vs Bug Case 1. One bug causes one crash. This is the easiest situation as the identification is straightforward.
  • 13. Crash vs Bug Case 2. One bug can cause a few crashes. This happens quite often especially with simple buffer overflows where the “size” variable is used. Direct read or write and access different memory regions cause different crashes.
  • 14. Crash vs Bug Case 3. A few bugs can cause one crash. This depends on how we identify crash. The simplest example can be a frame processor. For the different types of frame, the size parser works incorrectly and may cause different crashes for different paths.
  • 15. Crash vs Bug In an additional experiment we computed a portion of groundtruth. We applied all patches to cxxfilt from the version we fuzzed up until the present. We grouped together all inputs that a particular patch caused to now gracefully exit [11], confirming that the patch represented a single conceptual bugfix. We found that all 57,142 crashing inputs deemed “unique” by coverage profiles were addressed by 9 distinct patches. Stack hashes did better, but still over-counted bugs. Instead of the bug mapping to, say 500 AFL coverage-unique crashes in a given trial, it would map to about 46 stack hashes, on average. Stackhashes were also subject to false negatives: roughly 16% of hashesfor crashes from one bug were shared by crashes from another bug.In five cases, a distinct bug was found by only one crash, and that crash had a non-unique hash, meaning that evidence of a distinct bug would have been dropped by “de-duplication.” “Evaluating Fuzz Testing” https://arxiv.org/pdf/1808.09700.pdf
  • 16. Crash vs Bug So what is needed?
  • 17. Crash Analysis with Reverse Taint
  • 18. Crash Analysis with Reverse Taint - What is crash analysis and why do we need that? Crash analysis is a process of evaluating exploitability of the crash and identifying the root cause of this crash. If you are fuzzing something, the number of crashes can be huge. Also the impact and consequences (criticality) the bug might have, depends on the application technology and the system. - So what exactly do we analyze? There are two major things to analyze: the crash and the bug.
  • 19. Analysis - Crash - What type of the crash is it? For example: Out-of-bound read, NULL Pointer Dereference, Buffer Overflow, etc. - Is the crash exploitable? It is a part of identification process to find out if the crash can be used to achieve something more than just crash the application - read a piece data, overwrite memory or execute code. - Critical or exploitable - what is the difference? The exploitability related only to bug and crash itself. The criticality is related to the whole environment. A Safe NULL Pointer Dereference is different for a nuclear power plant software and a kids game.
  • 20. Analysis - Bug The second important analysis part is to identify the bug. As this was mentioned before, there can be different relations between bug and crash. It is also important to how inputs (which bits and bytes) correlate with the bug. This of course may influence the crash later and its exploitability. - What different relation are here? User data can control the crash directly (e: offset inside the table is calculated based on user data) or indirectly (e: the incorrect branch is taken) For example: NULL Pointer Dereference vs Safe NULL Pointer Dereference
  • 21. Analysis - Bug (Direct vs. Indirect) void *pointer = NULL char table[100] int index = 0 char user_data user_data>100 pointer[index] = 0 table[user_data] = 0
  • 22. Analysis - Few Interesting Tools It runs crash files with instrumentation and outputs results in various formats. It summarizes crashes in a crashwalk database by major / minor stack hash. Although AFL (for example) already de-dupes crashes, bucketing summarizes those crashes by an order of magnitude or more. Crashes that bucket the same have exactly the same stack contents, so they're likely (not guaranteed) to be the same bug. It is a simple utility to output the filenames of all crashes matching a given hash. I use it in combination with xargs to bulk delete / move crash files. crashwalk - cwtriage - cwdump - cwfind https://github.com/bnagy/crashwalk
  • 23. Analysis - Few Interesting Tools afl-utils - afl-collect - afl-minimize Copies all crash sample files from an afl synchronisation directory (used by multiple afl instances when run in parallel) into a single location providing easy access for further crash analysis. Also executes exploitable on them and remove uninteresting crashes. Helps to create a minimized corpus from samples of a parallel fuzzing job. https://github.com/rc0r/afl-utils
  • 26. Analysis - Few Interesting Tools afl-analyze It takes an input file, attempts to sequentially flip bytes, and observes the behavior of the tested program. It then color-codes the input based on which sections appear to be critical, and which are not. While not bulletproof, it can often offer quick insights into complex file formats. https://lcamtuf.blogspot.com/2016/02/say-hello-to-afl-analyze.html
  • 27. afl-analyze *Of course it works, it just does not always give expected results.
  • 28. Crash Analysis with Reverse Taint
  • 29. Tainting - What is tainting? The purpose of dynamic taint analysis is to track information flow between sources and sinks. Any program value whose computation depends on data derived from a taint source is considered tainted. Any other value is considered untainted. - What are the types of tainting? ● The direct value is tainted ● Indirect/Control flow ● Address/Pointer relation https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf
  • 30. Tainting - Types Direct Value: Y = X + 2. Indirect/Control Flow: if (X > 2) Y = 5 else Y = 10. Address/Pointer: Y = A[X].
  • 31. Tainting Propagation (Policy) Depending on the application, different rules can be used to propagate the taint. The policy is a set of rules describing how taint is propagated from the source operands to the destination. In standard taint analysis, the destination operand is typically marked as tainted if any of the source operands is tainted, regardless of how the specific semantics of the instruction affect its destination operands.
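A minimal sketch of the standard policy described above, assuming a program is given as a list of hypothetical `(destination, sources)` pairs. It models explicit data flow only; control-flow and pointer dependencies are deliberately ignored.

```python
def propagate(instructions, tainted):
    """Apply the 'any tainted source taints the destination' policy in order."""
    tainted = set(tainted)
    for dest, sources in instructions:
        if any(src in tainted for src in sources):
            tainted.add(dest)
        else:
            tainted.discard(dest)  # destination overwritten with clean data
    return tainted


program = [
    ("y", ["x", "c2"]),  # y = x + 2
    ("z", ["y", "k"]),   # z = y * k
    ("y", ["c5"]),       # y = 5 (overwrites the tainted value)
]
result = propagate(program, {"x"})
```

With `x` as the only taint source, `z` ends up tainted through `y`, while `y` itself becomes clean again after being overwritten by a constant.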
  • 32. Tainting - Issues - What is over-tainting? Over-tainting occurs when code or data identified by the analysis as tainted is not in fact influenced by any taint source (false positive). - What is under-tainting? Under-tainting occurs when code or data that is influenced by a taint source is not identified by the analysis as tainted (false negative). Such imprecision can be problematic, especially in systems where the result of the taint analysis is critically important.
  • 34. - What is Valgrind?
  • 35. Valgrind It is an instrumentation framework for building dynamic analysis tools. It comes with a set of tools each of which performs some kind of debugging, profiling, or similar task that helps you improve your programs. Valgrind's architecture is modular, so new tools can be created easily and without disturbing the existing structure. http://valgrind.org
  • 36. Valgrind IR Valgrind had an x86-specific, part D&R, part C&A, assembly-code-like IR in which the units of translation were basic blocks. Since then Valgrind has had an architecture-neutral, D&R, single-static-assignment (SSA) IR that is more similar to what might be used in a compiler. IR blocks are superblocks: single-entry, multiple-exit stretches of code. *http://valgrind.org/docs/valgrind2007.pdf
  • 37. Single-Static-Assignment (SSA) It is a property of an intermediate representation (IR), which requires that each variable is assigned exactly once, and every variable is defined before it is used. Existing variables in the original IR are split into versions, new variables typically indicated by the original name with a subscript in textbooks, so that every definition gets its own version. In SSA form, use-def chains are explicit and each contains a single element. *https://en.wikipedia.org/wiki/Static_single_assignment_form
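The versioning step can be shown with a small sketch for straight-line code. It assumes a hypothetical `(destination, operands)` statement format and skips branches, so no phi functions are needed; it is not Valgrind's IR.

```python
def to_ssa(statements):
    """Rename variables in straight-line code so each one is assigned exactly once."""
    version = {}
    out = []
    for dest, operands in statements:
        # Uses refer to the latest version; names never assigned are treated as inputs.
        renamed = [f"{v}{version[v]}" if v in version else v for v in operands]
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}{version[dest]}", renamed))
    return out


# x = a; x = x + b; y = x
code = [("x", ["a"]), ("x", ["x", "b"]), ("y", ["x"])]
ssa = to_ssa(code)
```

Because every name now has exactly one definition, walking backwards from a use to its definition is a simple lookup, which is what makes reverse tainting over such a representation straightforward.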
  • 38. - What is Taintgrind?
  • 39. Taintgrind Taintgrind is based on Valgrind's MemCheck and Flayer plugin. Taintgrind borrows the bit-precise shadow memory from MemCheck and only propagates explicit data flow. This means that Taintgrind will not propagate taint in control structures such as if-else, for-loops and while-loops. Taintgrind will also not propagate taint in dereferenced tainted pointers. http://valgrind.org/docs/memcheck2005.pdf
  • 40. Taintgrind - Propagation Rules 1. The direct value is tainted (propagated) 2. Indirect/Control flow (not propagated) 3. Address/Pointer relation (not propagated)
  • 41. Taintgrind - Propagation Rules - What are the Taintgrind propagation rules? For memory operations the granularity is 1 byte. For register operations it is the size of the operand: even if only one byte is used, the whole register is tainted. In that case Taintgrind is over-tainting. However, because Taintgrind handles only the first type of tainting (direct values), it is also under-tainting.
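The effect of mixing byte-granular memory with whole-register tainting can be modelled with a toy state machine. This follows the description on the slide only; the `ShadowState` class, the addresses, and the register name are hypothetical and do not reflect Taintgrind's actual shadow-memory implementation.

```python
class ShadowState:
    """Toy taint state: memory tracked per byte, registers tracked as a whole."""

    def __init__(self):
        self.mem = set()   # addresses of tainted bytes
        self.regs = set()  # names of tainted registers

    def taint_memory(self, addr, size):
        self.mem.update(range(addr, addr + size))

    def load(self, reg, addr, size):
        # Any tainted byte in the source range taints the whole register.
        if any(a in self.mem for a in range(addr, addr + size)):
            self.regs.add(reg)
        else:
            self.regs.discard(reg)

    def store(self, addr, size, reg):
        # A tainted register taints every byte it is written to.
        for a in range(addr, addr + size):
            if reg in self.regs:
                self.mem.add(a)
            else:
                self.mem.discard(a)


state = ShadowState()
state.taint_memory(0x1000, 1)   # one tainted input byte
state.load("rax", 0x1000, 1)    # a one-byte load taints all of rax
state.store(0x2000, 8, "rax")   # eight bytes become tainted
tainted_out = sorted(a for a in state.mem if a >= 0x2000)
```

A single tainted input byte ends up marking eight output bytes as tainted, which is the over-tainting the slide refers to.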
  • 42. Taintgrind - Propagation Rules [Diagram: WRITE and READ operations; over-tainting in a bit/byte operation]
  • 43. Taintgrind Here is an example of the logs and of how the taint is propagated through the log file. The job is to find all the paths from the end of the file to the beginning. One instruction can be tainted by multiple inputs.
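The backward walk can be sketched as follows. The log format here is a hypothetical simplification, not Taintgrind's real output: each record is `(destination, sources)`, every destination is defined once, and an integer source stands for a byte offset in the input file.

```python
def reverse_taint(log):
    """Walk the log from the last record backwards and collect input offsets."""
    wanted = {log[-1][0]}  # start from the value used by the last (crashing) record
    offsets = set()
    for dest, sources in reversed(log):
        if dest not in wanted:
            continue  # this record does not influence the crash
        wanted.discard(dest)
        for src in sources:
            if isinstance(src, int):
                offsets.add(src)  # reached a byte of the input file
            else:
                wanted.add(src)   # keep following this temporary backwards
    return sorted(offsets)


log = [
    ("t1", [4]),           # t1 = input byte at offset 4
    ("t2", [5]),           # t2 = input byte at offset 5
    ("t3", [9]),           # t3 = input byte at offset 9 (unrelated to the crash)
    ("t4", ["t1", "t2"]),  # t4 = t1 + t2
    ("t5", ["t3"]),
    ("t6", ["t4"]),        # crashing instruction dereferences t4
]
crash_offsets = reverse_taint(log)
```

Only offsets 4 and 5 reach the crashing instruction; offset 9 is tainted input as well, but it never flows into the value that caused the crash.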
  • 44. Taintgrind The original Taintgrind was not sufficient for reverse tainting. It was missing a few parts. - What was changed? The “Read” function did not show the size of the data that was read. The “Load” and “Store” functions also did not report the size of the operation.
  • 46. GDB Function where the crash occurred Instruction that caused the crash
  • 47. GDB The crash occurred on a reference to the address stored in the RAX register
  • 49. Taintgrind /work/taint-analysis/valgrind-3.15.0/build/bin/valgrind --tool=taintgrind: Calls the Taintgrind tool. --file-filter=/work/taint-analysis/CRASH: The name of the file that needs to be tainted. It must be the FULL path. --compact=yes: Makes the log file smaller. --taint-start=0: Offset inside the file. --taint-len=1504: Taint size. /work/taint-analysis/jhead-3.03/jhead /work/taint-analysis/CRASH: The command to run.
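When many crash samples have to be processed, the command from the slide can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of Taintgrind or rtaint; it uses only the options shown on the slide and passes the offset and length through verbatim as strings.

```python
import shlex


def taintgrind_command(valgrind, target, sample, taint_start, taint_len, compact=True):
    """Assemble the argument list from the slide; values are passed through verbatim."""
    return [
        valgrind,
        "--tool=taintgrind",
        f"--file-filter={sample}",  # must be the full path of the tainted file
        f"--compact={'yes' if compact else 'no'}",
        f"--taint-start={taint_start}",
        f"--taint-len={taint_len}",
        target,
        sample,
    ]


cmd = taintgrind_command(
    "/work/taint-analysis/valgrind-3.15.0/build/bin/valgrind",
    "/work/taint-analysis/jhead-3.03/jhead",
    "/work/taint-analysis/CRASH",
    "0",
    "1504",
)
command_line = shlex.join(cmd)
```

The resulting list could be handed to `subprocess.run`, or `command_line` pasted into a shell, once per crash file.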
  • 50. Crash Analysis with Reverse Taint
  • 51. Reverse Tainting the Value An example of how the values are tracked.
  • 52. Reverse Tainting the Value Parts of one tainted variable diagram
  • 54. rtaint -f: The name of the log file created by Taintgrind. It can be in the compact version. -g: The script can also produce a file in dot format, used to generate a graph. -s: The name of the file with the slice. Later, this can be used to display which operations were tainted with the values. -k: The directory path where the Kaitai Struct files will be stored.
  • 55. Reverse Tainting the Value These are the indexes from the file that are causing the crash.
  • 57. Reverse Tainting the Value - With or without the file size? What is the probability that files of different sizes with the same Kaitai Struct have a different root cause? - What is the relation between AFL's unique-crash algorithm and the tainted input? It is an open question...
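The "with or without the file size" question can be made concrete with a small grouping sketch. It substitutes a plain tuple of tainted offsets for the Kaitai Struct description used in the talk, and the crash names, offsets, and sizes are invented for illustration.

```python
from collections import defaultdict


def signature(offsets, file_size=None):
    """Build a hashable signature from tainted offsets, optionally with file size."""
    key = tuple(sorted(set(offsets)))
    return key if file_size is None else (key, file_size)


def group_crashes(crashes, use_size=False):
    """Group crash names whose signatures match."""
    groups = defaultdict(list)
    for name, (offsets, size) in crashes.items():
        groups[signature(offsets, size if use_size else None)].append(name)
    return dict(groups)


# name -> (tainted offsets that reach the crash, file size); hypothetical data.
crashes = {
    "crash_a": ([4, 5, 20], 1504),
    "crash_b": ([20, 5, 4], 1800),
    "crash_c": ([64, 65], 1504),
}
without_size = group_crashes(crashes)
with_size = group_crashes(crashes, use_size=True)
```

Ignoring the size merges `crash_a` and `crash_b` into one group; including it keeps them apart. Which choice matches the real root cause more often is exactly the open question raised above.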
  • 58. Reverse Tainting the Value - Results 413 total crashes found by 4 instances (1 master and 3 slaves) - master - 44 crashes - slave1 - 116 crashes - slave2 - 124 crashes - slave3 - 129 crashes 349 crashes were reproduced under Taintgrind. 177 crashes had a unique Kaitai Struct.
  • 59. Slicing It is the computation of the set of program statements, the program slice, that may affect the values at some point of interest, referred to as a slicing criterion.* https://en.wikipedia.org/wiki/Program_slicing
  • 60. Graph Taintgrind and rtaint make it possible to create a dot graph that can be rendered with the Graphviz package.
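Producing a dot file from taint edges needs very little code. The sketch below is not rtaint's implementation; `to_dot` and the edge list are hypothetical, and only the standard DOT digraph syntax is assumed.

```python
def to_dot(edges, name="taint"):
    """Render (source, destination) pairs as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot([("offset 4", "t1"), ("offset 5", "t2"), ("t1", "t4"), ("t2", "t4")])
```

Saved to a file such as `taint.dot`, the output can be rendered with Graphviz, for example `dot -Tpng taint.dot -o taint.png`.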
  • 62. Crash Analysis with Reverse Taint Powered by Taintgrind
  • 63. Powered by ... Moflow - https://github.com/Cisco-Talos/moflow/tree/master/BAP-0.7-moflow Binary Ninja - https://blog.trailofbits.com/2019/08/29/reverse-taint-analysis-using-binary-ninja/ Triton - https://triton.quarkslab.com/ BARF - https://github.com/programa-stic/barf-project libdft - http://www.cs.columbia.edu/~vpk/research/libdft/ TETRANE - https://www.tetrane.com/ [Legend: Commercial / Free]
  • 64. What Next? Updates - Address: The tainting starts from the last line of the log file. This is useful when there is a crash, but there is currently no way to start the taint from an arbitrary instruction if the application doesn’t crash. - Scripts: An IDA Pro/Ghidra/Binary Ninja script for highlighting the tainted instructions. This will make it easier to identify the data flow. - Speed: The way it is currently written makes it slow. Optimization or a change of language (thinking about Rust) is required.
  • 65. What Next? Issues Currently Taintgrind doesn't work on ARM processors. This is caused by Valgrind itself, which is missing some of the ARM conversions. The bug has already been reported.
  • 66. Summary The solution is based on Valgrind/Taintgrind. This means that it supports all the systems supported by Valgrind itself (+) but it also suffers from Valgrind's issues (-) The process of creating the taint log is time-consuming (-) rtaint can be used in most cases, making the analysis “faster” and automated. It is easy to incorporate into other tools. (+) Python may not be the best choice for rtaint. Too slow? (-) It requires more testing on real-life applications. I’m happy to receive any feedback :) (+)
  • 67. References and Interesting Docs https://github.com/wmkhoo/taintgrind http://valgrind.org/ http://valgrind.org/docs/memcheck2005.pdf https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf https://www2.cs.arizona.edu/~debray/Publications/bit-level-taint.pdf http://bitblaze.cs.berkeley.edu/papers/dta%2B%2B-ndss11.pdf http://shell-storm.org/blog/Taint-analysis-and-pattern-matching-with-Pin/ https://www.blackhat.com/docs/eu-15/materials/eu-15-Kim-Triaging-Crashes-With-Backward-Taint-Analysis-For-ARM-Architecture.pdf