Decompiler internals: microcode
Hex-Rays
Ilfak Guilfanov
(c) 2018 Ilfak Guilfanov
Presentation Outline
Decompiler architecture
Overview of the microcode
Opcodes and operands
Stack and registers
Data flow analysis, aliasability
Microcode availability
Your feedback
Online copy of this presentation is available at
https://www.hex-rays.com/products/ida/support/ppt/rootedcon2018.ppt
Hex-Rays Decompiler
Interactive, fast, robust, and programmable decompiler
Can handle x86, x64, ARM, ARM64, PowerPC
Runs on top of IDA Pro
Has been evolving for more than 10 years
Internals were not really published
Namely, the intermediate language
Decompiler architecture
It uses a very straightforward sequence of steps:
Generate microcode
Transform microcode (optimize, resolve memrefs, analyze calls, etc)
Allocate local vars
Generate ctree
Beautify ctree
Print ctree
Decompiler architecture
We will focus on the first two steps:
Generate microcode
Transform microcode (optimize, resolve memrefs, analyze calls, etc)
Allocate local vars
Generate ctree
Beautify ctree
Print ctree
Why microcode?
It helps to get rid of the complexity of processor
instructions
Also we get rid of processor idiosyncrasies. Examples:
− x86: segment registers, fpu stack
− ARM: thumb mode addresses
− PowerPC: multiple copies of CF register (and other
condition registers)
− MIPS: delay slots
− Sparc: stack windows
It makes the decompiler portable. We “just” need to
replace the microcode generator
Writing a decompiler without an intermediate language
looks like a waste of time
Is implementing an IR difficult?
Your call :)
How many IR languages do you know?
Why not use an existing IR?
There are tons of other intermediate languages: LLVM,
REIL, Binary Ninja's ILs, RetDec's IL, etc.
Yes, we could use something
But I started to work on the microcode when none of the
above languages existed
This is the main reason why we use our own IR
mov.d EAX,, T0
ldc.d #5,, T1
mkcadd.d T0, T1, CF
mkoadd.d T0, T1, CF
add.d T0, T1, TT
setz.d TT,, ZF
sets.d TT,, ZF
mov.d TT,, EAX
(this is how it looked in 1999)
A long evolution
I started to work on the microcode in 1998 or earlier
The name is nothing fancy but reflects the nature of it
Some design decisions turned out to be bad (and some of
them are already very difficult to fix)
For example, the notion of virtual stack registers
We will fix it, though. Just takes time
Even today we modify our microcode when necessary
For example, I reshuffled the instruction opcodes for this
talk...
Design highlights
Simplicity:
− No processor specific stuff
− One microinstruction does one thing
− Small number of instructions (only 45 in 1999, now
72)
− Simple instruction operands (register, number,
memory)
− Consider only compiler generated code
Discard things we do not care about:
− Instruction timing (anyway it is a lost battle)
− Instruction order (exceptions are a problem!)
− Order of memory accesses (later we added logic to
preserve indirect memory accesses)
− Handcrafted code
Generated microcode
Initially the microcode looks like RISC code:
− Memory loads and stores are done using dedicated
microinstructions
− The desired operation is performed on registers
− Microinstructions have no side effects
− Each output register is initialized by a separate
microinstruction
It is very verbose. Example:
004014FB mov eax, [ebx+4]
004014FE mov dl, [eax+1]
00401501 sub dl, 61h ; 'a'
00401504 jz short loc_401517
Initial microcode: very verbose
2. 0 mov ebx.4, eoff.4 ; 4014FB u=ebx.4 d=eoff.4
2. 1 mov ds.2, seg.2 ; 4014FB u=ds.2 d=seg.2
2. 2 add eoff.4, #4.4, eoff.4 ; 4014FB u=eoff.4 d=eoff.4
2. 3 ldx seg.2, eoff.4, et1.4 ; 4014FB u=eoff.4,seg.2,(STACK,GLBMEM) d=et1.4
2. 4 mov et1.4, eax.4 ; 4014FB u=et1.4 d=eax.4
2. 5 mov eax.4, eoff.4 ; 4014FE u=eax.4 d=eoff.4
2. 6 mov ds.2, seg.2 ; 4014FE u=ds.2 d=seg.2
2. 7 add eoff.4, #1.4, eoff.4 ; 4014FE u=eoff.4 d=eoff.4
2. 8 ldx seg.2, eoff.4, t1.1 ; 4014FE u=eoff.4,seg.2,(STACK,GLBMEM) d=t1.1
2. 9 mov t1.1, dl.1 ; 4014FE u=t1.1 d=dl.1
2.10 mov #0x61.1, t1.1 ; 401501 u= d=t1.1
2.11 setb dl.1, t1.1, cf.1 ; 401501 u=dl.1,t1.1 d=cf.1
2.12 seto dl.1, t1.1, of.1 ; 401501 u=dl.1,t1.1 d=of.1
2.13 sub dl.1, t1.1, dl.1 ; 401501 u=dl.1,t1.1 d=dl.1
2.14 setz dl.1, #0.1, zf.1 ; 401501 u=dl.1 d=zf.1
2.15 setp dl.1, #0.1, pf.1 ; 401501 u=dl.1 d=pf.1
2.16 sets dl.1, sf.1 ; 401501 u=dl.1 d=sf.1
2.17 mov cs.2, seg.2 ; 401504 u=cs.2 d=seg.2
2.18 mov #0x401517.4, eoff.4 ; 401504 u= d=eoff.4
2.19 jcnd zf.1, $loc_401517 ; 401504 u=zf.1
The first optimization pass
2. 0 ldx ds.2, (ebx.4+#4.4), eax.4 ; 4014FB u=ebx.4,ds.2,
;(STACK,GLBMEM) d=eax.4
2. 1 ldx ds.2, (eax.4+#1.4), dl.1 ; 4014FE u=eax.4,ds.2,
;(STACK,GLBMEM) d=dl.1
2. 2 setb dl.1, #0x61.1, cf.1 ; 401501 u=dl.1 d=cf.1
2. 3 seto dl.1, #0x61.1, of.1 ; 401501 u=dl.1 d=of.1
2. 4 sub dl.1, #0x61.1, dl.1 ; 401501 u=dl.1 d=dl.1
2. 5 setz dl.1, #0.1, zf.1 ; 401501 u=dl.1 d=zf.1
2. 6 setp dl.1, #0.1, pf.1 ; 401501 u=dl.1 d=pf.1
2. 7 sets dl.1, sf.1 ; 401501 u=dl.1 d=sf.1
2. 8 jcnd zf.1, $loc_401517 ; 401504 u=zf.1
Only 8 microinstructions
Some intermediate registers disappeared
Sub-instructions appeared
Still too noisy and verbose
Further microcode transformations
2. 1 ldx ds.2{3}, ([ds.2{3}:(ebx.4+#4.4)].4+#1.4), dl.1{5} ; 4014FE
     ; u=ebx.4,ds.2,(GLBLOW,sp+20..,GLBHIGH) d=dl.1
2. 2 sub dl.1{5}, #0x61.1, dl.1{6} ; 401501 u=dl.1 d=dl.1
2. 3 jz dl.1{6}, #0.1, @7 ; 401504 u=dl.1
And the final code is:
2. 0 jz [ds.2{4}:([ds.2{4}:(ebx.4{8}+#4.4){7}].4{6}+#1.4){5}].1{3}, #0x61.1, @7
     ; 401504 u=ebx.4,ds.2,(GLBLOW,GLBHIGH)
This code is ready to be translated to ctree.
(numbers in curly braces are value numbers)
The output will look like this:
if ( argv[1][1] == 'a' )
    ...
Minor details
Reading microcode is not easy (but hey, it was not
designed for that! :)
All operand sizes are spelled out explicitly
The initial microcode is very simple (RISC like)
As we transform microcode, nested subinstructions may
appear
We implemented the translation from processor
instructions to microinstructions in plain C++
We do not use automatic code generators or machine
descriptions to generate them. Anyway, there are too many
processor-specific details to make that feasible
Opcodes: constants and move
Copy from (l) to (d)estination
Operand sizes must match
ldc l, d // load constant
mov l, d // move
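For example (an illustrative, hand-written snippet in the same notation as the listings above):
ldc   #5.4, eax.4     ; eax = 5
mov   eax.4, esi.4    ; esi = eax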
Opcodes: changing operand size
Copy from (l) to (d)estination
Operand sizes must differ
Since real world programs work with partial registers (like
al, ah), we absolutely need low/high
xds l, d // extend (signed)
xdu l, d // extend (unsigned)
low l, d // take low part
high l, d // take high part
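Illustrative, hand-written examples in the same notation:
xdu   al.1, eax.4     ; zero-extend al into eax
xds   al.1, eax.4     ; sign-extend al into eax
low   eax.4, al.1     ; take the low byte of eax
high  ax.2, ah.1      ; take the high byte of ax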
Opcodes: load and store
{sel, off} is a segment:offset pair
Usually 'sel' is ds or cs; for processors with flat memory it
is ignored
'off' is the most interesting part: it is a memory address
stx l, sel, off // store value to memory
ldx sel, off, d // load value from memory
Example:
ldx ds.2, (ebx.4+#4.4), eax.4
stx #0x2E.1, ds.2, eax.4
Opcodes: comparisons
Compare (l)eft against (r)ight
The result is stored into (d)estination, a bit register like
CF,ZF,SF,...
sets l, d // sign
setp l, r, d // unordered/parity
setnz l, r, d // not equal
setz l, r, d // equal
setae l, r, d // above or equal
setb l, r, d // below
seta l, r, d // above
setbe l, r, d // below or equal
setg l, r, d // greater
setge l, r, d // greater or equal
setl l, r, d // less
setle l, r, d // less or equal
seto l, r, d // overflow of (l-r)
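For example (illustrative snippets, compare the listings earlier in this deck):
setz  dl.1, #0x61.1, zf.1   ; zf = (dl == 0x61)
setb  dl.1, #0x61.1, cf.1   ; cf = (dl <u 0x61)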
Opcodes: arithmetic and bitwise operations
Operand sizes must be the same
The result is stored into (d)estination
neg l, d // -l -> d
lnot l, d // !l -> d
bnot l, d // ~l -> d
add l, r, d // l + r -> d
sub l, r, d // l - r -> d
mul l, r, d // l * r -> d
udiv l, r, d // l / r -> d (unsigned)
sdiv l, r, d // l / r -> d (signed)
umod l, r, d // l % r -> d (unsigned)
smod l, r, d // l % r -> d (signed)
or l, r, d // bitwise or
and l, r, d // bitwise and
xor l, r, d // bitwise xor
Opcodes: shifts (and rotations?)
Shift (l)eft by the amount specified in (r)ight
The result is stored into (d)estination
Initially our microcode had rotation operations but they
turned out to be useless because they can not be nicely
represented in C
shl l, r, d // shift logical left
shr l, r, d // shift logical right
sar l, r, d // shift arithmetic right
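For example (illustrative):
shl   eax.4, #2.4, eax.4    ; eax = eax << 2
sar   edx.4, #31.4, edx.4   ; edx = edx >> 31 (arithmetic, keeps the sign)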
Opcodes: condition codes
Perform the operation on (l)eft and (r)ight
Generate carry or overflow bits
Store CF or OF into (d)estination
We need these instructions to precisely track carry and
overflow bits
Normally these instructions get eliminated during
microcode transformations
cfadd l, r, d // carry of (l+r)
ofadd l, r, d // overflow of (l+r)
cfshl l, r, d // carry of (l<<r)
cfshr l, r, d // carry of (l>>r)
Opcodes: unconditional flow control
Initially calls have only the callee address
The decompiler retrieves the callee prototype from the
database or tries to guess it
After that the 'd' operand contains all information about the
call, including the function prototype and actual arguments
ijmp {sel, off} // indirect jmp
goto l // unconditional jmp
call l d // direct call
icall {sel, off} d // indirect call
ret // return
call $___org_fprintf <...:
“FILE *” &($stdout).4,
"const char *" &($aArIllegalSwitc).4,
_DWORD xds.4([ds.2:([ds.2:(ebx.4+#4.4)].4+#1.4)].1)>.0
Opcodes: conditional jumps
Compare (l)eft against (r)ight and jump to (d)estination if
the condition holds
Jtbl is used to represent 'switch' idioms
jcnd l, d // jump if condition l is satisfied
jnz l, r, d // ZF=0 Not Equal
jz l, r, d // ZF=1 Equal
jae l, r, d // CF=0 Above or Equal
jb l, r, d // CF=1 Below
ja l, r, d // CF=0 & ZF=0 Above
jbe l, r, d // CF=1 | ZF=1 Below or Equal
jg l, r, d // SF=OF & ZF=0 Greater
jge l, r, d // SF=OF Greater or Equal
jl l, r, d // SF!=OF Less
jle l, r, d // SF!=OF | ZF=1 Less or Equal
jtbl l, cases // Table jump
Opcodes: floating point operations
Basically we have conversions and a few arithmetic
operations
There is little we can do with these operations, they are
not really optimizable
Other fp operations use helper functions (e.g. sqrt)
f2i l, d // int(l) => d; convert fp -> int, any size
f2u l, d // uint(l)=> d; convert fp -> uint,any size
i2f l, d // fp(l) => d; convert int -> fp, any size
u2f l, d // fp(l) => d; convert uint-> fp, any size
f2f l, d // l => d; change fp precision
fneg l, d // -l => d; change sign
fadd l, r, d // l + r => d; add
fsub l, r, d // l - r => d; subtract
fmul l, r, d // l * r => d; multiply
fdiv l, r, d // l / r => d; divide
Opcodes: miscellaneous
Some operations can not be expressed in microcode
If possible, we use intrinsic calls for them (e.g. sqrtpd)
If no intrinsic call exists, we use “ext” for them and only try
to keep track of data dependencies (e.g. “aam”)
“und” is used when a register is spoiled in a way that we
can not predict or describe (e.g. ZF after mul)
nop // no operation
und d // undefine
ext l, r, d // external insn
push l
pop d
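For example (illustrative): after an x86 “mul”, the decompiler can mark ZF as spoiled:
und   zf.1     ; ZF has no predictable value here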
More opcodes?
We quickly reviewed all 72 instructions
Probably we should extend microcode
Ternary operator?
Post-increment and post-decrement?
All this requires more research
Operands!
Like everyone else, initially we had only:
− constant integer numbers
− registers
Life was simple and easy in the good old days!
Alas, the reality is more diverse. We quickly added:
− stack variables
− global variables
− address of an operand
− list of cases (for switches)
− result of another instruction
− helper functions
− call arguments
− string and floating point constants
Register operands
The microcode engine provides an unlimited (in theory)
number of microregisters
Processor registers are mapped to microregisters:
− eax => microregisters (mreg) 8, 9, 10, 11
− al => mreg 8
− ah => mreg 9
Usually there are more microregisters than the processor
registers. We allocate them as needed when generating
microcode
Examples:
eax.4
rsi.8
ST00_04.4
Stack as microregisters
I was reluctant to introduce a new operand type for stack
variables and decided to map the stack frame to
microregisters
Say, the stack frame is mapped to microregisters #100
and higher
A bright idea? Nope!
Very soon I realized that we have to handle indirect
references to the stack frame
Not really possible with microregisters
But there was so much code relying on this concept that
we still have it
Laziness “pays off” now and will do so in the future
(negatively)
Stack as viewed by the decompiler
[Stack frame diagram. Regions, as shown on the slide: shadow stkargs, input stkargs, return address, saved registers, local variables, output stkargs (not visible in IDA). Reference points: inargtop, inargoff, minimal esp, typical ebp, minstkref, minargref, stkvar base 0. The yellow part of the diagram is mapped to microregisters; the red part is aliasable.]
More operand types!
64-bit values are represented as pairs of registers
Usually it is a standard pair like edx:eax
Compilers keep getting better and nowadays may use any two
registers as a pair; or even pair a stack location with a register: sp+4:esi
We ended up with a new operand type:
− operand pair
It consists of low and high halves
They can be located anywhere (stack, registers, glbmem)
Scattered operands
The nightmare has just begun, in fact
Modern compilers use very intricate rules to pass structs
and unions by value to and from the called functions
A register like RDI may contain multiple structure fields
Some structure fields may be passed on the stack
Some in floating-point registers
Some in general registers (unaligned wrt register start)
We had no other choice but to add
− scattered operands
that can represent all the above
A simple scattered return value
A function that returns a struct in rax:
Assembler code:
struct div_t { int quot; int rem; };
div_t div(int numer, int denom);
mov edi, esi
mov esi, 1000
call _div
movsxd rdx, eax
sar rax, 20h
add [rbx], rdx
imul eax, 1000
cdqe
add rax, [rbx+8]
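Reading the assembler: div_t comes back packed into rax (quot in the low 32 bits, rem in the high 32 bits), so “movsxd rdx, eax” extracts quot and “sar rax, 20h” extracts rem; the decompiler has to recognize this packing as a paired/scattered return value.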
A simple scattered return value
…and the output is:
Our decompiler managed to represent things nicely!
Similar or more complex situations exist for all 64-bit
processors
Support for scattered operands is not complete yet but we
continue to improve it
v2 = div(a2, 1000);
*a1 += v2.quot;
result = a1[1] + 1000 * v2.rem;
More detailed look at microcode transformations
The initial “preoptimization” step uses a very simple
constant and register propagation algorithm (a minimal
sketch of the idea is shown after this list)
It is very fast
It gets rid of most temporary registers and reduces the
microcode size by a factor of two
Normally we use a more sophisticated propagation
algorithm
It also works on the basic block level
It is much slower but can:
− handle partial registers (propagate eax into an
expression that uses ah)
− move entire instruction inside another
− work with operands other than registers (stack and
global memory, pair and scattered operands)
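A minimal C++ sketch of the simple within-block constant propagation mentioned above (a toy IR of my own, not the actual Hex-Rays classes):
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy instruction: at most one register source, one optional immediate,
// and one register destination.
struct Insn {
  std::string op;               // "mov", "add", ...
  std::string src;              // register use ("" if none)
  std::optional<int> imm;       // constant operand, if any
  std::string dst;              // register definition ("" if none)
};

// Propagate constants assigned by "mov #imm -> reg" into later uses inside
// one basic block; forget a mapping whenever its register is redefined.
void propagate_constants(std::vector<Insn> &block) {
  std::map<std::string, int> known;   // reg -> known constant value
  for (Insn &ins : block) {
    auto it = known.find(ins.src);
    if (!ins.src.empty() && it != known.end()) {
      ins.imm = it->second;           // replace the register use by the constant
      ins.src.clear();
    }
    if (ins.dst.empty())
      continue;
    if (ins.op == "mov" && ins.imm && ins.src.empty())
      known[ins.dst] = *ins.imm;      // dst now holds a known constant
    else
      known.erase(ins.dst);           // dst redefined with an unknown value
  }
}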
Global optimization
We build the control flow graph
Perform data flow analysis to find where each operand is
used or defined
The use/def information is used to:
− delete dead code (if the instruction result is not used,
then we delete the instruction)
− propagate operands and instructions across block
boundaries
− generate assertions for future optimizations (we know
that eax is zero at the target of “jz eax” if there are no
other predecessors; so we generate “mov 0, eax”)
Synthetic assertion instructions
If jump is not taken, then we know that eax is zero
Assertions can be propagated and lead to more
simplifications
jnz eax.4, #0, @5
blk5:
...
mov #0.4, eax.4 ; assert
...
Simple algebraic transformations
We have implemented (in plain C++) hundreds of very
small optimization rules. For example:
They are simple and sound
They apply to all cases without exceptions
Overall the decompiler uses sound rules
They do not depend on the compiler
(x-y)+y => x
x - ~y => x+y+1
x*m-x*n => x*(m-n)
(x<<n)-x => (2**n-1)*x
-(x-y) => y-x
(~x) < 0 => x >= 0
(-x)*n => x*-n
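For illustration only, here is how one such rule could look on a toy expression tree (my own simplified types, not the decompiler's actual classes):
#include <memory>
#include <string>

// Toy expression node: op is '+', '-' or 0 for a leaf.
struct Expr {
  char op = 0;
  std::string name;              // leaf name, e.g. "x"
  std::shared_ptr<Expr> l, r;
};

static bool same_leaf(const std::shared_ptr<Expr> &a, const std::shared_ptr<Expr> &b) {
  return a && b && a->op == 0 && b->op == 0 && a->name == b->name;
}

// Rule: (x - y) + y => x. Returns the simplified tree, or the input if the rule does not apply.
std::shared_ptr<Expr> rule_sub_add(const std::shared_ptr<Expr> &e) {
  if (e && e->op == '+' && e->l && e->l->op == '-' && same_leaf(e->l->r, e->r))
    return e->l->l;
  return e;
}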
More complex rules
For example, this rule recognizes 64-bit subtractions:
We have a swarm of rules like this. They work like little
ants :)
CMB18 (combination rule #18):
sub xlow.4, ylow.4, rlow.4
sub xhigh.4, (xdu.4((xlow.4 <u ylow.4))+yhigh.4), rhigh.4
=>
sub x.8, y.8, r.8
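(the xdu(xlow <u ylow) term is the borrow produced by the low-half subtraction, zero-extended and added to yhigh before the high-half subtraction)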
if yhigh is zero, then it can be optimized away
a special case when xh is zero:
sub xl, yl, rl
neg (xdu(lnot(xl >=u yl))+yh), rh
Data dependency dependent rules
Naturally, all these rules are compiler-independent, they
use common algebraic number properties
Unfortunately we do not have a language to describe
these rules, so we added them manually in C++
However, the pattern recognition does not naively check if
the previous or next instruction is the expected one. We
use data dependencies to find the instructions that form
the pattern
For example, the rule CMB43 looks for the 'low' instruction
by searching forward for an instruction that accesses the
'x' operand:
CMB43:
mul #(1<<N).4, xl.4, yl.4
low (x.8 >>a #M.1), yh.4, M == 32-N
=>
mul x.8, #(1<<N).8, y.8
Interblock rules
Some rules work across multiple blocks. For example, the “64-bit 3-way check” rule transforms this structure:
jl xh, yh, SUCCESS
jg xh, yh, @4
jb xl, yl, SUCCESS
FAILED: ...
SUCCESS: ...
into this simple form:
jl x, y, SUCCESS
FAILED: ...
SUCCESS: ...
(xh means high half of x, xl means low half of x,
yh means high half of y, yl means low half of y)
Interblock rules: signed division by power2
Signed division is sometimes replaced by a shift:
jcnd !SF(x), b3
add  x, (1<<N)-1, x
sar  x, N, r
A simple rule transforms it back:
sdiv x, (1<<N), r
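A worked example of why the idiom is equivalent: with N = 2 and x = -5, a bare arithmetic shift gives -5 >> 2 = -2, but C requires -5 / 4 = -1; adding (1<<2)-1 = 3 first (only for negative x, hence the jcnd on the sign flag) gives (-5 + 3) >> 2 = -1, the expected result.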
Hooks
It is possible to hook into the optimization engine and add
your own transformation rules
The Decompiler SDK has some examples of how to do it
Currently it is not possible to disable an existing rule
However, since (almost?) all of them are sound and do not
use heuristics, it is not a problem
In fact the processor specific parts of the decompiler
internally use these hooks as well
ARM hooks
For example, the ARM decompiler has the following rule:
ijmp cs, initial_lr => ret
so that a construct like this: BX LR
will be converted into: RET
but only if we can prove that the value of LR at the "BX LR"
instruction is equal to the initial value of LR at the entry
point.
However, how do we find out if we jump to the initial_lr?
Data flow analysis helps us here.
Data flow analysis
In fact virtually all transformation rules are based on data
flow analysis. Very rarely we check the previous or the
next instruction for pattern matching
Instead, we calculate the use/def lists for the instruction
and search for the instructions that access them
We keep track of what is used and what is defined by
every microinstruction (the u= and d= lists in the listing
below). These lists are calculated when necessary:
mov %argv.4, ebx.4 ; 4014E9 u=arg+4.4 d=ebx.4
mov %argc.4, edi.4 ; 4014EC u=arg+0.4 d=edi.4
mov &($dword_41D128).4, ST18_4.4 ; 4014EF u= d=ST18_4.4
goto @12 ; 4014F6 u= d=
Use-def lists
Similar lists are maintained for each block. Instead of
calculating them on request we keep them precalculated:
We keep both “must” and “may” access lists
The values in parentheses are part of the “may” list
For example, an indirect memory access may read any
memory:
; 1WAY-BLOCK 6 INBOUNDS: 5 OUTBOUNDS: 58 [START=401515 END=401517]
; USE: ebx.4,ds.2,(GLBLOW,GLBHIGH)
; DEF: eax.4,(cf.1,zf.1,sf.1,of.1,pf.1,edx.4,ecx.4,fps.2,fl.1,
; c0.1,c2.1,c3.1,df.1,if.1,ST00_12.12,GLBLOW,GLBHIGH)
; DNU: eax.4
add [ds.2:(ebx.4+#4.4)].4, #2.4, ST18_4.4
; u=ebx.4,ds.2,(GLBLOW,GLBHIGH)
; d=ST18_4.4
Usefulness of use-def lists
Based on use-def lists of each block the decompiler can
build global use-def chains and answer questions like:
− Is a defined value used anywhere? If yes, where
exactly? Just one location? If yes, what about moving
the definition there? If the value is used nowhere,
what about deleting it?
− Where does a value come from? If only from one
location, can we propagate (or even move) it?
− What are the values that are used but never defined?
These are the candidates for input arguments
− What are the values that are defined, never used, and
still reach the last block? These are the candidates for
the return values
Global propagation in action
Imagine we have code like this (three basic blocks):
blk1: mov #5.4, esi.4
blk2: do some stuff that does not modify esi.4
blk3: call func(esi.4)
Global propagation in action
The use-def chains clearly show that esi is defined only in
block #1:
blk1: mov #5.4, esi.4                           use:            def: esi.4{3}
blk2: do some stuff that does not modify esi.4  use: ...        def: ...
blk3: call func(esi.4)                          use: esi.4{1}   def: ...
Therefore it can be propagated:
blk3: call func(#5.4)
Testing the microcode
Microcode is verified for consistency after every
transformation
BTW, third party plugins should do the same
Very few microcode related bug reports
We have quite extensive test suites that constantly grow
A hundred or so processor cores running tests
However, after publishing microcode there will be a new
wave of bug reports
Found a bug? Send us the database with a description of
how to reproduce it
Most problems are solved within one day or faster
Publishing microcode
The microcode API for C++ has been available since the 27th of
Feb (for 2 days already! :)
Python API is not available yet
Check out the sample plugins that show how to use the
new API
We will blog about how to use the microcode API
Any technical feedback? We love that, bring it on! :)
Was it interesting?
Thank you for your attention!
Questions?