SlideShare a Scribd company logo
LLVM 總是打開你的心:
從電玩模擬器看編譯器應用實例
Jim Huang ( 黃敬群 ) <jserv@0xlab.org>
Jul 11, 2013 / 新竹碼農
Rights to copy
Attribution – ShareAlike 3.0
You are free
to copy, distribute, display, and perform the work
to make derivative works
to make commercial use of the work
Under the following conditions
Attribution. You must give the original author credit.
Share Alike. If you alter, transform, or build upon this work, you may distribute the
resulting work only under a license identical to this one.
For any reuse or distribution, you must make clear to others the license terms of this work.
Any of these conditions can be waived if you get permission from the copyright holder.
Your fair use and other rights are in no way affected by the above.
License text: http://creativecommons.org/licenses/by-sa/3.0/legalcode
© Copyright 2013 0xlab
http://0xlab.org/
contact@0xlab.org
Corrections, suggestions, contributions and translations
are welcome!
Latest update:Jul 22, 2013
姊妹議程:
〈窮得只剩下 Compiler 〉
OSDC.tw 2009
http://www.slideshare.net/jserv/what-can-compilers-do-for-us
1972 Pong ( 硬體 )
1980 Zork ( 高階直譯語言 )
1993 DOOM (C)
1998 Unreal (C++, Java-style scripting)
2005-6 Xbox 360, PlayStation 3
with 6-8 hardware threads
2009 Next console generation
Unification of the CPU, GPU.
Massive multi-core, data parallelism, etc.
遊戲產業加速技術革新
目標 : 重新編譯任天堂遊戲為 Native Game
http://andrewkelley.me/post/jamulator.html
提綱 (1) 任天堂遊戲機概況
(2) Code Generation: 使用 LLVM
(3) Optimizations & LLVM
(4) 建構 NES 執行環境
任天堂遊戲機
NES (Nintendo Entertainment System)
任天堂電玩主機歡度誕生 30 周年 !
http://gnn.gamer.com.tw/6/82966.html
• 1983 年 7 月 15 日在日本推出、陪伴許多 6 、 7 年級
玩家度過童年時光的任天堂 Family Computer( 簡稱
FAMICOM / FC ,台灣俗稱任天堂紅白機 ) 電視遊樂
器主機,歡度誕生 30 周年紀念
• 因為《超級瑪利歐兄弟》等眾多人氣經典遊戲的推出
而帶動了爆發性的銷售
• 自 1983 年推出到 2003 年停產為止,全球總計出貨
6291 萬台
NES 硬體簡述
• 四個獨立運作的單元
– CPU: 8-bit processor - MOS 6502.
– Graphics: Picture Processing Unit (PPU)
– Sound: Audio Processing Unit (APU)
– (Memory) Mapper: 提供額外 ROM 的客製化硬體
Source: https://en.wikipedia.org/wiki/Nintendo_Entertainment_SystemSource: https://en.wikipedia.org/wiki/Nintendo_Entertainment_System
NES 硬體定址空間
• NES 使用 MMIO (memory-mapped I/O) ,可藉由讀寫特定的記
憶體位址,存取到週邊硬體
• 定址空間為 64 KB ,所有位址可用 2 bytes 表示
起始位址 結束位址 描述
$0000 $07FF RAM: 除非使用 Mapper ,否則 NES 遊戲
只有 2K 的 RAM 空間
$2000 $2007 保留給 PPU 使用,此區段的記憶體內
容將會在螢幕上顯示
$4000 $4015 保留給 APU 使用,作為音效輸出
$4016 $4017 遊戲控制狀態的保存
$8000 $FFFF 唯讀記憶體,遊戲本身載入於此區段
[code+data]
$ 表示 16 進位$ 表示 16 進位
6502 程式組譯後成為 32 KB binary ,將在
系統啟動後,被載入到 $8000 - $FFFF 的範圍
6502 程式組譯後成為 32 KB binary ,將在
系統啟動後,被載入到 $8000 - $FFFF 的範圍
撰寫 Hello World! 程式
• 在 NES 開發程式的門檻不低,即便是 Hello World 等級的程式
• 難度在於得自行處理以下
– 硬體週邊
– Interrupt
• Reset
• IRQ: 以 CLI/SEI 指令控制觸發與否
• NMI (Non-Maskable Interrupt): 無硬體指令可控制此類 ;
vertical blank 起始時,會觸發 NMI
• 定義自訂的 ABI ,以簡化處理
位址 描述
$2008 Write a byte to this address and the byte will go to
standard out.
$2009 Write a byte to this address and the program will exit
with that byte as the return code.
org $C000
msg: .data "Hello, world!", 10, 0
Reset_Routine:
LDX #$00 ; put the starting index, 0, in register X
loop: LDA msg, X ; read 1 char
BEQ loopend ; end loop if we hit the 0
STA $2008 ; putchar (custom ABI)
INX ; (Register X)++
JMP loop ; repeat
loopend:
LDA #$00 ; return code 0
STA $2009 ; exit (custom ABI)
IRQ_Routine: rti ; do nothing
NMI_Routine: rti ; do nothing
org $FFFA
dc.w NMI_Routine
dc.w Reset_Routine
dc.w IRQ_Routine
code/data 將自 $C000 位址開始,所以 "Hello, World!n0"
字串的 W 字元將被載入到位址 $C007
code/data 將自 $C000 位址開始,所以 "Hello, World!n0"
字串的 W 字元將被載入到位址 $C007
若 zero flag 為 1 ,跳往 loopend 標記,
否則移往下一道指令
若 zero flag 為 1 ,跳往 loopend 標記,
否則移往下一道指令
將暫存器 A 的內容存到記憶體位址 $2008 ,
這意味著,將暫存器 A 的值輸出到 stdout
將暫存器 A 的內容存到記憶體位址 $2008 ,
這意味著,將暫存器 A 的值輸出到 stdout
NES 程式僅有 32 KB ,這指令告知組譯器,
填寫空白直到位址 $FFFA
NES 程式僅有 32 KB ,這指令告知組譯器,
填寫空白直到位址 $FFFA
; memcpy --
; Copy a block of memory from one location to another.
;
; Entry parameters
; SRC - Address of source data block
; DST - Address of target data block
; CNT - Number of bytes to copy
0040 ORG $0040 ;Parameters at $0040
0040 00 00 SRC DW $0000
0042 00 00 DST DW $0000
0044 00 00 CNT DW $0000
0600 ORG $0600 ;Code at $0600
0600 A4 44 MEMCPY LDY CNT+0 ;Set Y = SRC.L
0602 D0 05 BNE LOOP ;If SRC.L > 0, then loop
0604 A5 45 LDA CNT+1 ;If CNT.H > 0,
0606 D0 01 BNE LOOP ; then loop
0608 60 RTS ;Return
0609 B1 40 LOOP LDA (SRC),Y ;Load A from ((SRC)+Y)
060B 91 42 STA (DST),Y ;Store A to ((DST)+Y)
060D 88 DEY ;Decr CNT.L
060E D0 F9 BNE LOOP ;if CNT.L > 0, then loop
0610 E6 41 INC SRC+1 ;Incr SRC += $0100
0612 E6 43 INC DST+1 ;Incr DST += $0100
0614 88 DEY ;Decr CNT.L
0615 C6 45 DEC CNT+1 ;Decr CNT.H
0617 D0 F0 BNE LOOP ;If CNT.H > 0, then loop
0619 60 RTS ;Return
061A END
Source: https://en.wikipedia.org/wiki/MOS_Technology_6502Source: https://en.wikipedia.org/wiki/MOS_Technology_6502
• 在 NES 遊戲中,記憶體區段 $FFFA 到 $FFFF 有特別含意
• 當遊戲啟動時, NES 會從 $FFFC 與 $FFFD 讀取 2 bytes ,並
使用其內含值,決定第一道執行指令的位址
• 6502 暫存器定義
名稱 寬度 描述
A 8-bit "main" register. Most of the arithmetic operations are performed on this register.
X 8-bit Another general purpose register. Fewer instructions support X than A. Often used
as an index, as in the above code.
Y 8-bit Pretty much the same as X. Fewer instructions support Y than X. Also often used as
an index.
P 8-bit The "status" register. When the assembly program mentions the "zero flag" it is
actually referring to bit 1 (the second smallest bit) of the status register. The other
bits mean other things, which we will get into later.
SP 8-bit The "stack pointer". The stack is located at $100 - $1ff. The stack pointer is
initialized to $1ff. When you push a byte on the stack, the stack pointer is
decremented, and when you pull a byte from the stack, the stack pointer is
incremented.
PC 16-bit The program counter. You can't directly modify this register but you can indirectly
modify it using the stack.
程式開發
用 Go 語言寫工具程式
• Assembler
– lexer
– asm6502.nex → asm6502.nn.go ; asm6502.y → y.go
• Disassembler
– 首先面對的問題 : 如何區分 code 與 data?
org $FFFA
dc.w NMI_Routine
dc.w Reset_Routine
dc.w IRQ_Routine
– 建立一個 AST ,使每個單一 byte 都是個 .db 敘述
– 對於 $FFFA 到 $FFFB 範圍的記憶體區間,若有參照
NMI_Routine 標籤的話,以 .dw 敘述替換掉 .db 敘述
– 計算 $FFFA 到 $FFFB 之間有參照的 .db 敘述,並適當地
該敘述前,安插 NMI_Routine
– 視該位址的 .db 敘述為一道指令
設計 Disassembler ,分析遊戲運作
• Disassembler: 如何區分 code 與 data?
• 解析機械碼指令的策略
• 以 B 字母開頭的指令,表示 conditional branch;
• JSR = Jump to SubRoutine; RTS = ReTurn from Subroutine
指令 策略
BPL,
BMI,
BVC,
BVS,
BCC,
BCS,
BNE,
BEQ, JSR
Mark the jump target address and the next address as
an instruction.
JMP
absolute
Mark the jump target address as an instruction
Code Generation: 使用 LLVM
• bit-code
• 完整的編譯器基礎建設
• 可重用的、用以建構編譯器的軟體元件
• 允許更快更完整的打造新的編譯器
• static compiler, JIT, trace-based optimizer, …
• 開放的編譯器框架
• 多種程式語言支援
• 高彈性的自由軟體授權模式 (BSD License)
• 活躍的開發
• 豐富的編譯輸出: ARM, x86, MIPS, GPU, …
LLVM
豐富的工具集
Assemblers, automatic debugger, linker, code
generator, compiler driver, modular optimizer, …
Frontend LLVM IR Backend
C x86 機械碼
ARM 機械碼C++
Obj-C
Clang
$ clang main.c mod.c
libLLVM
ld
as
Clang driver
libClang
Clang: LLVM 的程式語言前端
[C]
main.c
x86 組語
main.s
libClang libLLVM
Source →
BitCode
[C]
mod.c
BitCode
main.bc
BitCode
mod.bc
Optimizer
Code
Generator
Clang 與 LLVM 的關聯
[C]
main.c
libClang
lex
parse
[C]
mod.c
semantics
codegen
BitCode
main.bc
BitCode
mod.bc
AST
main.ast
AST
mod.ast
Indexer
Analysis
Checker
AST
輸出 / 解析
IDE 整合
(Xcode)
Serialize/
Deserialize
Clang 功能示意
BitCode
main.bc
BitCode
mod.bc
Analysis Transforms
BitCode
parser
BitCode
Linker
Inter-precedual
BitCode
main.opt.bc
BitCode
mod.opt.bc
BitCode
main.lto.bc
Analysis Transforms
BitCode
BitCode
writer
BitCode + Optimizer
LTO: Link-Time OptimizationLTO: Link-Time Optimization
ARM.td
InstrInfo
Register
Subtarget
Frame
Lowering
CallingConv
libLLVM
Transform
SDNode
Transform
Machine
Instr
BitCode
parser
x86 組語
main.s
Instruction
Selection
Instruction
Scheduler
BitCode
Asm
Emitter
BitCode
main.lto.bc
X86.td
InstrInfo
Register
Subtarget
Frame
Lowering
CallingConv
Machine
Transforms
Machine
Analysis
Machine
Instr
Register
Allocation
CodeGen
[C]
main.c
[C++]
main.cpp
Execution
Engine
InterPreter
x86
CallbackUtils
Transforms
SDNode
LLVM Code Generation
Selection DAGSelection DAG
先從 Hello World 開始
• 完整的 Compiler Driver
$ clang hello.c ­o hello
• 生成 IR
$ clang ­O3 ­emit­llvm hello.c ­c ­o hello.bc
• 以 Just-In-Time compiler 模式執行 BitCode
$ lli hello.bc
Getting Started with the LLVM System
http://llvm.org/docs/GettingStarted.html
Getting Started with the LLVM System
http://llvm.org/docs/GettingStarted.html
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("Hello world!n");
return 0;
}
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("Hello world!n");
return 0;
}
  ; ModuleID = '<stdin>'
  target datalayout = "e­p:32:32:32­...
  target triple = "i386­pc­linux­gnu"
  @str = internal constant [13 x i8] c"Hello world!00"
  define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
  entry:
    %puts = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @str, i32 0, i32 0))
    ret i32 0
  }
  Declare i32 @puts(i8* nocapture) nounwind
  ; ModuleID = '<stdin>'
  target datalayout = "e­p:32:32:32­...
  target triple = "i386­pc­linux­gnu"
  @str = internal constant [13 x i8] c"Hello world!00"
  define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
  entry:
    %puts = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @str, i32 0, i32 0))
    ret i32 0
  }
  Declare i32 @puts(i8* nocapture) nounwind
• 輸出 x86 後端組合語言
$ llc hello.bc ­o hello.s
• 反組譯 BitCode
$ llvm­dis < hello.bc
函式 printf() 後方僅有一個
字串參數,前端預設將其
轉換為 puts()
函式 printf() 後方僅有一個
字串參數,前端預設將其
轉換為 puts()
org $C000
msg: .data "Hello, world!", 10, 0
Reset_Routine:
LDX #$00 ; put the starting index, 0, in register X
loop: LDA msg, X ; read 1 char
BEQ loopend ; end loop if we hit the 0
STA $2008 ; putchar (custom ABI)
INX ; (Register X)++
JMP loop ; repeat
loopend:
LDA #$00 ; return code 0
STA $2009 ; exit (custom ABI)
IRQ_Routine: rti ; do nothing
NMI_Routine: rti ; do nothing
org $FFFA
dc.w NMI_Routine
dc.w Reset_Routine
dc.w IRQ_Routine
Hello World in NES
; "constant" means that this data is read-only.
; [15 x i8] is the type of this data. i8 means an 8-bit integer.
@msg = private constant [15 x i8] c"Hello, World!0a00"
; Here we declare a dependency on the `putchar` symbol.
declare i32 @putchar(i32)
; `noreturn` indicates that we do not expect this function to return.
; It will end the process, after all.
; `nounwind` has to do with LLVM's error handling model.
; We use `nounwind` because we know that `exit` will not
; throw an exception.
declare void @exit(i32) noreturn nounwind
define i32 @main() {
Entry:
; Here we allocate some variables on the stack.
; These are X, Y, and A,
; 3 of the 6502's 8-bit registers.
%X = alloca i8
%Y = alloca i8
%A = alloca i8
NES to LLVM IR (1)
%S_neg = alloca i1
%S_zero = alloca i1
; Send control flow to the Reset_Routine basic block.
br label %Reset_Routine
Reset_Routine:
; This is the code to generate for
; LDX #$00
; Store 0 in the X register.
store i8 0, i8* %X
; Clear the negative status bit, because we just stored 0
; in X, and 0 is not negative.
store i1 0, i1* %S_neg
; Set the zero status bit, because we just stored 0 in X.
store i1 1, i1* %S_zero
...
NES to LLVM IR (2)
驗證 LLVM IR 與執行
• LLVM static compiler
$ llc -filetype=obj hello.llvm
• Link ELF image
$ gcc hello.llvm.o
• Veirfy
$ ./a.out
Hello, World!
32
#include <stdio.h>
void foo(char c) { putchar(c); }
#include <stdio.h>
void foo(char c) { putchar(c); }
define void @foo(i8 signext %c) nounwind {
%1 = alloca i8, align 1 ; <i8*> [#uses=2]
store i8 %c, i8* %1
%2 = load i8* %1 ; <i8> [#uses=1]
%3 = sext i8 %2 to i32 ; <i32> [#uses=1]
%4 = call i32 @putchar(i32 %3) ; <i32> [#uses=0]
ret void
}
declare i32 @putchar(i32)
define void @foo(i8 signext %c) nounwind {
%1 = alloca i8, align 1 ; <i8*> [#uses=2]
store i8 %c, i8* %1
%2 = load i8* %1 ; <i8> [#uses=1]
%3 = sext i8 %2 to i32 ; <i32> [#uses=1]
%4 = call i32 @putchar(i32 %3) ; <i32> [#uses=0]
ret void
}
declare i32 @putchar(i32)
   // declare i32 @putchar(i32)
Function* putchar = cast<Function>(
module->getOrInsertFunction(
"putchar", voidType, cellType, NULL));
putchar->setCallingConv(CallingConv::C);
呼叫底層系統 libc 的 putchar 函式呼叫底層系統 libc 的 putchar 函式
33
Function* makeFunc(Module* module,
const char* source,
int tapeSize = 400) {
...
Value* zero = ConstantInt::get(cellType, 0);
Value* one = ConstantInt::get(cellType, 1);
Value* minOne = ConstantInt::get(cellType, -1);
...
BasicBlock* block =
BasicBlock::Create(getGlobalContext(), "code", main);
std::stack<bfLoopInfo> loops;
IRBuilder<> codeIR(block);
Value *head = codeIR.CreateAlloca(cellType,
ConstantInt::get(indexType, tapeSize));
Value *it = head;
for (int i = 0; i < tapeSize; i++) {
codeIR.CreateStore(zero, it);
it = codeIR.CreateGEP(it, one);
}
在 LLVM IR 中,預先若干定義的常數
zero = 0, one =1, minOne = -1
在 LLVM IR 中,預先若干定義的常數
zero = 0, one =1, minOne = -1
建立 LLVM IR建立 LLVM IR
Code Generation: LDX
LDX (LoaD X register)
func (i *Instruction) Compile(c *Compilation) {
v := llvm.ConstInt(llvm.Int8Type(),
uint64(i.Value), false)
switch i.OpCode {
case 0xa2: // ldx
c.builder.CreateStore(v, c.rX)
c.testAndSetZero(i.Value)
c.testAndSetNeg(i.Value)
// other immediate insn
}
}
func (c *Compilation)
testAndSetZero(v int) {
if v == 0 {
c.setZero()
return
}
c.clearZero()
}
func (c *Compilation)
testAndSetZero(v int) {
if v == 0 {
c.setZero()
return
}
c.clearZero()
}
func (c *Compilation)
testAndSetNeg(v int) {
if v&0x80 == 0x80 {
c.setNeg()
return
}
c.clearNeg()
}
func (c *Compilation)
testAndSetNeg(v int) {
if v&0x80 == 0x80 {
c.setNeg()
return
}
c.clearNeg()
}
Optimizations
LLVM IR is designed to be optimized
• Constant Propagation [ConstantPropagationPass]
• Looks for instructions involving only constants and replaces them
with a constant value instead of an instruction.
• Combine Redundant Instructions [InstructionCombining]
• Combines instructions to form fewer, simple instructions. For
example if you add 1 twice, it will instead add 2.
• Promote Memory to Register [PromoteMemoryToRegister]
• This pass allows us to load every "register" variable (X, Y, A, etc)
before performing an instruction, and then store the register
variable back after performing the instruction.
• Global Value Numbering [GVN]
• Eliminates partially and fully redundant instructions, and delete
redundant load instructions.
• Control Flow Graph Simplification [CFGSimplification]
• Removes unnecessary code and merges basic blocks
together when possible.
套用各項優化策略
engine, err := llvm.NewJITCompiler(c.mod, 3)
if err != nil {
c.Errors = append(c.Errors, err.Error())
return
}
defer engine.Dispose()
pass := llvm.NewPassManager()
defer pass.Dispose()
pass.Add(engine.TargetData())
pass.AddConstantPropagationPass()
pass.AddInstructionCombiningPass()
pass.AddPromoteMemoryToRegisterPass()
pass.AddGVNPass()
pass.AddCFGSimplificationPass()
pass.Run(c.mod)
原本的輸出 (1)
@Label_c000 = private global [15 x i8] c"Hello, World!0A00"
declare i32 @putchar(i32)
declare void @exit(i32) noreturn nounwind
define i32 @main() {
Entry:
%X = alloca i8
%Y = alloca i8
%A = alloca i8
%S_neg = alloca i1
%S_zero = alloca i1
br label %Reset_Routine
Reset_Routine: ; preds =
%Entry
store i8 0, i8* %X
store i1 true, i1* %S_zero
store i1 false, i1* %S_neg
br label %Label_c011
原本的輸出 (2)
Label_c011: ; preds = %else, %Reset_Routine
%0 = load i8* %X
%1 = getelementptr [15 x i8]* @Label_c000, i8 0, i8 %0
%2 = load i8* %1
store i8 %2, i8* %A
%3 = and i8 %2, -128
%4 = icmp eq i8 %3, -128
store i1 %4, i1* %S_neg
%5 = icmp eq i8 %2, 0
store i1 %5, i1* %S_zero
%6 = load i1* %S_zero
br i1 %6, label %Label_c01d, label %else
原本的輸出 (3)
else: ; preds = %Label_c011
%7 = load i8* %A
%8 = zext i8 %7 to i32
%9 = call i32 @putchar(i32 %8)
%10 = load i8* %X
%11 = add i8 %10, 1
store i8 %11, i8* %X
%12 = and i8 %11, -128
%13 = icmp eq i8 %12, -128
store i1 %13, i1* %S_neg
%14 = icmp eq i8 %11, 0
store i1 %14, i1* %S_zero
br label %Label_c011
原本的輸出 (4)
Label_c01d: ; preds = %Label_c011
store i8 0, i8* %A
store i1 true, i1* %S_zero
store i1 false, i1* %S_neg
%15 = load i8* %A
%16 = zext i8 %15 to i32
call void @exit(i32 %16)
unreachable
IRQ_Routine: ; preds = %Label_c01d
unreachable
NMI_Routine: ; No predecessors!
unreachable
}
@Label_c000 = private global [15 x i8] c"Hello, World!0A00"
declare i32 @putchar(i32)
declare void @exit(i32) noreturn nounwind
define i32 @main() {
Entry:
br label %Label_c011
Label_c011: ; preds = %else, %Entry
%storemerge = phi i8 [ 0, %Entry ], [ %6, %else ]
%0 = sext i8 %storemerge to i64
%1 = getelementptr [15 x i8]* @Label_c000, i64 0, i64 %0
%2 = load i8* %1, align 1
%3 = icmp eq i8 %2, 0
br i1 %3, label %Label_c01d, label %else
else: ; preds = %Label_c011
%4 = zext i8 %2 to i32
%5 = call i32 @putchar(i32 %4)
%6 = add i8 %storemerge, 1
br label %Label_c011
Label_c01d: ; preds = %Label_c011
call void @exit(i32 0)
unreachable
}
經過 LLVM optimization pass經過 LLVM optimization pass
建構 NES
執行環境
NES ROM 組成
• 16 byte header + metadata
• 是否使用 mapper
• PPU 使用垂直或水平鏡像處理
• 組譯的 32 KB 程式碼將會在開機時,載入到 $8000 - $ffff 記憶體
區段,若 mapper 存在,會有更多的程式區段可用
• CHR-ROM: 8 KB 大小的圖形資料,將會在開機時,載入到
PPU 。同樣的,若有 mapper ,可允許更多的圖形資料段
NES ROM 組成
• $ ./jamulate -unrom roms/Mario.nes
loading roms/Mario.nes
disassembling to roms/Mario
• $ ls roms/Mario/
chr0.chr Mario.jam prg.asm
chr0.chr is binary data;
it is the 8KB of graphics data mentioned before
which is loaded into the PPU on bootup.
chr0.chr is binary data;
it is the 8KB of graphics data mentioned before
which is loaded into the PPU on bootup.
program metadata in a
simple key-value text format
program metadata in a
simple key-value text format
http://0xlab.org

More Related Content

PDF
How A Compiler Works: GNU Toolchain
PDF
What Can Compilers Do for Us?
PDF
用十分鐘 向jserv學習作業系統設計
PDF
Interpreter, Compiler, JIT from scratch
PDF
Virtual Machine Constructions for Dummies
PDF
from Source to Binary: How GNU Toolchain Works
PDF
Let's trace Linux Lernel with KGDB @ COSCUP 2021
PDF
The Internals of "Hello World" Program
How A Compiler Works: GNU Toolchain
What Can Compilers Do for Us?
用十分鐘 向jserv學習作業系統設計
Interpreter, Compiler, JIT from scratch
Virtual Machine Constructions for Dummies
from Source to Binary: How GNU Toolchain Works
Let's trace Linux Lernel with KGDB @ COSCUP 2021
The Internals of "Hello World" Program

What's hot (20)

PDF
淺談探索 Linux 系統設計之道
PDF
The Microkernel Mach Under NeXTSTEP
PDF
Learn C Programming Language by Using GDB
PDF
Launch the First Process in Linux System
PDF
Qemu device prototyping
PPTX
QEMU - Binary Translation
PDF
Kernel Recipes 2019 - Faster IO through io_uring
PDF
ARM Trusted FirmwareのBL31を単体で使う!
PPTX
Linux Initialization Process (1)
PDF
Address/Thread/Memory Sanitizer
PDF
給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明
PDF
constexpr関数はコンパイル時処理。これはいい。実行時が霞んで見える。cpuの嬌声が聞こえてきそうだ
PDF
COSCUP 2016 - LLVM 由淺入淺
PDF
Qemu Introduction
PDF
Embedded Virtualization applied in Mobile Devices
PDF
GDB Rocks!
PDF
入門core.async
PPTX
Understand more about C
PDF
Lispマシン・シミュレータの紹介
PPTX
90分 Scheme to C(勝手に抄訳版)
淺談探索 Linux 系統設計之道
The Microkernel Mach Under NeXTSTEP
Learn C Programming Language by Using GDB
Launch the First Process in Linux System
Qemu device prototyping
QEMU - Binary Translation
Kernel Recipes 2019 - Faster IO through io_uring
ARM Trusted FirmwareのBL31を単体で使う!
Linux Initialization Process (1)
Address/Thread/Memory Sanitizer
給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明
constexpr関数はコンパイル時処理。これはいい。実行時が霞んで見える。cpuの嬌声が聞こえてきそうだ
COSCUP 2016 - LLVM 由淺入淺
Qemu Introduction
Embedded Virtualization applied in Mobile Devices
GDB Rocks!
入門core.async
Understand more about C
Lispマシン・シミュレータの紹介
90分 Scheme to C(勝手に抄訳版)
Ad

Similar to LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例 (20)

PPTX
Buffer overflow – Smashing The Stack
PDF
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...
PDF
Assembly language part I
PDF
Assembly language part I
PDF
ruby2600 - an Atari 2600 emulator written in Ruby
PDF
Buffer Overflow - Smashing the Stack
PDF
XS Boston 2008 Paravirt Ops in Linux IA64
PPTX
Coal (1)
PDF
SMB3 Offload Data Transfer (ODX)
PDF
Specialized Compiler for Hash Cracking
PDF
JavaScript on the GPU
PDF
Gdc 14 bringing unreal engine 4 to open_gl
PPTX
BOOTABLE OPERATING SYSTEM PPT
PDF
Next Stop, Android
PDF
Hardware hacking
PPTX
PV-Drivers for SeaBIOS using Upstream Qemu
PPTX
Optimizing your Infrastrucure and Operating System for Hadoop
PDF
Introduction to SCSI over FCP for Linux on System z
PDF
AllBits presentation - Lower Level SW Security
PDF
Shellcoding in linux
Buffer overflow – Smashing The Stack
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...
Assembly language part I
Assembly language part I
ruby2600 - an Atari 2600 emulator written in Ruby
Buffer Overflow - Smashing the Stack
XS Boston 2008 Paravirt Ops in Linux IA64
Coal (1)
SMB3 Offload Data Transfer (ODX)
Specialized Compiler for Hash Cracking
JavaScript on the GPU
Gdc 14 bringing unreal engine 4 to open_gl
BOOTABLE OPERATING SYSTEM PPT
Next Stop, Android
Hardware hacking
PV-Drivers for SeaBIOS using Upstream Qemu
Optimizing your Infrastrucure and Operating System for Hadoop
Introduction to SCSI over FCP for Linux on System z
AllBits presentation - Lower Level SW Security
Shellcoding in linux
Ad

More from National Cheng Kung University (20)

PDF
PyPy's approach to construct domain-specific language runtime
PDF
Making Linux do Hard Real-time
PDF
2016 年春季嵌入式作業系統課程說明
PDF
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
PDF
Construct an Efficient and Secure Microkernel for IoT
PDF
從線上售票看作業系統設計議題
PDF
進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明
PDF
Xvisor: embedded and lightweight hypervisor
PDF
Making Linux do Hard Real-time
PDF
Implement Runtime Environments for HSA using LLVM
PDF
Priority Inversion on Mars
PDF
Develop Your Own Operating Systems using Cheap ARM Boards
PDF
Lecture notice about Embedded Operating System Design and Implementation
PDF
Explore Android Internals
PDF
中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學
PDF
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
PDF
Open Source from Legend, Business, to Ecosystem
PDF
Summer Project: Microkernel (2013)
PDF
進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明
PDF
Faults inside System Software
PyPy's approach to construct domain-specific language runtime
Making Linux do Hard Real-time
2016 年春季嵌入式作業系統課程說明
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
Construct an Efficient and Secure Microkernel for IoT
從線上售票看作業系統設計議題
進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明
Xvisor: embedded and lightweight hypervisor
Making Linux do Hard Real-time
Implement Runtime Environments for HSA using LLVM
Priority Inversion on Mars
Develop Your Own Operating Systems using Cheap ARM Boards
Lecture notice about Embedded Operating System Design and Implementation
Explore Android Internals
中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
Open Source from Legend, Business, to Ecosystem
Summer Project: Microkernel (2013)
進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明
Faults inside System Software

Recently uploaded (20)

PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
NewMind AI Monthly Chronicles - July 2025
PDF
Empathic Computing: Creating Shared Understanding
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Encapsulation theory and applications.pdf
PDF
Electronic commerce courselecture one. Pdf
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Encapsulation_ Review paper, used for researhc scholars
NewMind AI Monthly Chronicles - July 2025
Empathic Computing: Creating Shared Understanding
Unlocking AI with Model Context Protocol (MCP)
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Understanding_Digital_Forensics_Presentation.pptx
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Spectral efficient network and resource selection model in 5G networks
Encapsulation theory and applications.pdf
Electronic commerce courselecture one. Pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
Per capita expenditure prediction using model stacking based on satellite ima...
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Building Integrated photovoltaic BIPV_UPV.pdf
NewMind AI Weekly Chronicles - August'25 Week I
The Rise and Fall of 3GPP – Time for a Sabbatical?
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Chapter 3 Spatial Domain Image Processing.pdf
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
CIFDAQ's Market Insight: SEC Turns Pro Crypto

LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例

  • 1. LLVM 總是打開你的心: 從電玩模擬器看編譯器應用實例 Jim Huang ( 黃敬群 ) <jserv@0xlab.org> Jul 11, 2013 / 新竹碼農
  • 2. Rights to copy Attribution – ShareAlike 3.0 You are free to copy, distribute, display, and perform the work to make derivative works to make commercial use of the work Under the following conditions Attribution. You must give the original author credit. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. License text: http://creativecommons.org/licenses/by-sa/3.0/legalcode © Copyright 2013 0xlab http://0xlab.org/ contact@0xlab.org Corrections, suggestions, contributions and translations are welcome! Latest update:Jul 22, 2013
  • 3. 姊妹議程: 〈窮得只剩下 Compiler 〉 OSDC.tw 2009 http://www.slideshare.net/jserv/what-can-compilers-do-for-us
  • 4. 1972 Pong ( 硬體 ) 1980 Zork ( 高階直譯語言 ) 1993 DOOM (C) 1998 Unreal (C++, Java-style scripting) 2005-6 Xbox 360, PlayStation 3 with 6-8 hardware threads 2009 Next console generation Unification of the CPU, GPU. Massive multi-core, data parallelism, etc. 遊戲產業加速技術革新
  • 5. 目標 : 重新編譯任天堂遊戲為 Native Game http://andrewkelley.me/post/jamulator.html
  • 6. 提綱 (1) 任天堂遊戲機概況 (2) Code Generation: 使用 LLVM (3) Optimizations & LLVM (4) 建構 NES 執行環境
  • 8. 任天堂電玩主機歡度誕生 30 周年 ! http://gnn.gamer.com.tw/6/82966.html • 1983 年 7 月 15 日在日本推出、陪伴許多 6 、 7 年級 玩家度過童年時光的任天堂 Family Computer( 簡稱 FAMICOM / FC ,台灣俗稱任天堂紅白機 ) 電視遊樂 器主機,歡度誕生 30 周年紀念 • 因為《超級瑪利歐兄弟》等眾多人氣經典遊戲的推出 而帶動了爆發性的銷售 • 自 1983 年推出到 2003 年停產為止,全球總計出貨 6291 萬台
  • 9. NES 硬體簡述 • 四個獨立運作的單元 – CPU: 8-bit processor - MOS 6502. – Graphics: Picture Processing Unit (PPU) – Sound: Audio Processing Unit (APU) – (Memory) Mapper: 提供額外 ROM 的客製化硬體 Source: https://en.wikipedia.org/wiki/Nintendo_Entertainment_SystemSource: https://en.wikipedia.org/wiki/Nintendo_Entertainment_System
  • 10. NES 硬體定址空間 • NES 使用 MMIO (memory-mapped I/O) ,可藉由讀寫特定的記 憶體位址,存取到週邊硬體 • 定址空間為 64 KB ,所有位址可用 2 bytes 表示 起始位址 結束位址 描述 $0000 $07FF RAM: 除非使用 Mapper ,否則 NES 遊戲 只有 2K 的 RAM 空間 $2000 $2007 保留給 PPU 使用,此區段的記憶體內 容將會在螢幕上顯示 $4000 $4015 保留給 APU 使用,作為音效輸出 $4016 $4017 遊戲控制狀態的保存 $8000 $FFFF 唯讀記憶體,遊戲本身載入於此區段 [code+data] $ 表示 16 進位$ 表示 16 進位 6502 程式組譯後成為 32 KB binary ,將在 系統啟動後,被載入到 $8000 - $FFFF 的範圍 6502 程式組譯後成為 32 KB binary ,將在 系統啟動後,被載入到 $8000 - $FFFF 的範圍
  • 11. 撰寫 Hello World! 程式 • 在 NES 開發程式的門檻不低,即便是 Hello World 等級的程式 • 難度在於得自行處理以下 – 硬體週邊 – Interrupt • Reset • IRQ: 以 CLI/SEI 指令控制觸發與否 • NMI (Non-Maskable Interrupt): 無硬體指令可控制此類 ; vertical blank 起始時,會觸發 NMI • 定義自訂的 ABI ,以簡化處理 位址 描述 $2008 Write a byte to this address and the byte will go to standard out. $2009 Write a byte to this address and the program will exit with that byte as the return code.
  • 12. org $C000 msg: .data "Hello, world!", 10, 0 Reset_Routine: LDX #$00 ; put the starting index, 0, in register X loop: LDA msg, X ; read 1 char BEQ loopend ; end loop if we hit the 0 STA $2008 ; putchar (custom ABI) INX ; (Register X)++ JMP loop ; repeat loopend: LDA #$00 ; return code 0 STA $2009 ; exit (custom ABI) IRQ_Routine: rti ; do nothing NMI_Routine: rti ; do nothing org $FFFA dc.w NMI_Routine dc.w Reset_Routine dc.w IRQ_Routine code/data 將自 $C000 位址開始,所以 "Hello, World!n0" 字串的 W 字元將被載入到位址 $C007 code/data 將自 $C000 位址開始,所以 "Hello, World!n0" 字串的 W 字元將被載入到位址 $C007 若 zero flag 為 1 ,跳往 loopend 標記, 否則移往下一道指令 若 zero flag 為 1 ,跳往 loopend 標記, 否則移往下一道指令 將暫存器 A 的內容存到記憶體位址 $2008 , 這意味著,將暫存器 A 的值輸出到 stdout 將暫存器 A 的內容存到記憶體位址 $2008 , 這意味著,將暫存器 A 的值輸出到 stdout NES 程式僅有 32 KB ,這指令告知組譯器, 填寫空白直到位址 $FFFA NES 程式僅有 32 KB ,這指令告知組譯器, 填寫空白直到位址 $FFFA
  • 13. ; memcpy -- ; Copy a block of memory from one location to another. ; ; Entry parameters ; SRC - Address of source data block ; DST - Address of target data block ; CNT - Number of bytes to copy 0040 ORG $0040 ;Parameters at $0040 0040 00 00 SRC DW $0000 0042 00 00 DST DW $0000 0044 00 00 CNT DW $0000 0600 ORG $0600 ;Code at $0600 0600 A4 44 MEMCPY LDY CNT+0 ;Set Y = SRC.L 0602 D0 05 BNE LOOP ;If SRC.L > 0, then loop 0604 A5 45 LDA CNT+1 ;If CNT.H > 0, 0606 D0 01 BNE LOOP ; then loop 0608 60 RTS ;Return 0609 B1 40 LOOP LDA (SRC),Y ;Load A from ((SRC)+Y) 060B 91 42 STA (DST),Y ;Store A to ((DST)+Y) 060D 88 DEY ;Decr CNT.L 060E D0 F9 BNE LOOP ;if CNT.L > 0, then loop 0610 E6 41 INC SRC+1 ;Incr SRC += $0100 0612 E6 43 INC DST+1 ;Incr DST += $0100 0614 88 DEY ;Decr CNT.L 0615 C6 45 DEC CNT+1 ;Decr CNT.H 0617 D0 F0 BNE LOOP ;If CNT.H > 0, then loop 0619 60 RTS ;Return 061A END Source: https://en.wikipedia.org/wiki/MOS_Technology_6502Source: https://en.wikipedia.org/wiki/MOS_Technology_6502
  • 14. • 在 NES 遊戲中,記憶體區段 $FFFA 到 $FFFF 有特別含意 • 當遊戲啟動時, NES 會從 $FFFC 與 $FFFD 讀取 2 bytes ,並 使用其內含值,決定第一道執行指令的位址 • 6502 暫存器定義 名稱 寬度 描述 A 8-bit "main" register. Most of the arithmetic operations are performed on this register. X 8-bit Another general purpose register. Fewer instructions support X than A. Often used as an index, as in the above code. Y 8-bit Pretty much the same as X. Fewer instructions support Y than X. Also often used as an index. P 8-bit The "status" register. When the assembly program mentions the "zero flag" it is actually referring to bit 1 (the second smallest bit) of the status register. The other bits mean other things, which we will get into later. SP 8-bit The "stack pointer". The stack is located at $100 - $1ff. The stack pointer is initialized to $1ff. When you push a byte on the stack, the stack pointer is decremented, and when you pull a byte from the stack, the stack pointer is incremented. PC 16-bit The program counter. You can't directly modify this register but you can indirectly modify it using the stack. 程式開發
  • 15. 用 Go 語言寫工具程式 • Assembler – lexer – asm6502.nex → asm6502.nn.go ; asm6502.y → y.go • Disassembler – 首先面對的問題 : 如何區分 code 與 data? org $FFFA dc.w NMI_Routine dc.w Reset_Routine dc.w IRQ_Routine – 建立一個 AST ,使每個單一 byte 都是個 .db 敘述 – 對於 $FFFA 到 $FFFB 範圍的記憶體區間,若有參照 NMI_Routine 標籤的話,以 .dw 敘述替換掉 .db 敘述 – 計算 $FFFA 到 $FFFB 之間有參照的 .db 敘述,並適當地 該敘述前,安插 NMI_Routine – 視該位址的 .db 敘述為一道指令
  • 16. 設計 Disassembler ,分析遊戲運作 • Disassembler: 如何區分 code 與 data? • 解析機械碼指令的策略 • 以 B 字母開頭的指令,表示 conditional branch; • JSR = Jump to SubRoutine; RTS = ReTurn from Subroutine 指令 策略 BPL, BMI, BVC, BVS, BCC, BCS, BNE, BEQ, JSR Mark the jump target address and the next address as an instruction. JMP absolute Mark the jump target address as an instruction
  • 18. • bit-code • 完整的編譯器基礎建設 • 可重用的、用以建構編譯器的軟體元件 • 允許更快更完整的打造新的編譯器 • static compiler, JIT, trace-based optimizer, … • 開放的編譯器框架 • 多種程式語言支援 • 高彈性的自由軟體授權模式 (BSD License) • 活躍的開發 • 豐富的編譯輸出: ARM, x86, MIPS, GPU, … LLVM
  • 19. 豐富的工具集 Assemblers, automatic debugger, linker, code generator, compiler driver, modular optimizer, …
  • 20. Frontend LLVM IR Backend
  • 21. C x86 機械碼 ARM 機械碼C++ Obj-C Clang $ clang main.c mod.c libLLVM ld as Clang driver libClang Clang: LLVM 的程式語言前端
  • 22. [C] main.c x86 組語 main.s libClang libLLVM Source → BitCode [C] mod.c BitCode main.bc BitCode mod.bc Optimizer Code Generator Clang 與 LLVM 的關聯
  • 26. 先從 Hello World 開始 • 完整的 Compiler Driver $ clang hello.c ­o hello • 生成 IR $ clang ­O3 ­emit­llvm hello.c ­c ­o hello.bc • 以 Just-In-Time compiler 模式執行 BitCode $ lli hello.bc Getting Started with the LLVM System http://llvm.org/docs/GettingStarted.html Getting Started with the LLVM System http://llvm.org/docs/GettingStarted.html
  • 27. #include <stdio.h> int main(int argc, char *argv[]) { printf("Hello world!n"); return 0; } #include <stdio.h> int main(int argc, char *argv[]) { printf("Hello world!n"); return 0; }   ; ModuleID = '<stdin>'   target datalayout = "e­p:32:32:32­...   target triple = "i386­pc­linux­gnu"   @str = internal constant [13 x i8] c"Hello world!00"   define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {   entry:     %puts = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @str, i32 0, i32 0))     ret i32 0   }   Declare i32 @puts(i8* nocapture) nounwind   ; ModuleID = '<stdin>'   target datalayout = "e­p:32:32:32­...   target triple = "i386­pc­linux­gnu"   @str = internal constant [13 x i8] c"Hello world!00"   define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {   entry:     %puts = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @str, i32 0, i32 0))     ret i32 0   }   Declare i32 @puts(i8* nocapture) nounwind • 輸出 x86 後端組合語言 $ llc hello.bc ­o hello.s • 反組譯 BitCode $ llvm­dis < hello.bc 函式 printf() 後方僅有一個 字串參數,前端預設將其 轉換為 puts() 函式 printf() 後方僅有一個 字串參數,前端預設將其 轉換為 puts()
  • 28. org $C000 msg: .data "Hello, world!", 10, 0 Reset_Routine: LDX #$00 ; put the starting index, 0, in register X loop: LDA msg, X ; read 1 char BEQ loopend ; end loop if we hit the 0 STA $2008 ; putchar (custom ABI) INX ; (Register X)++ JMP loop ; repeat loopend: LDA #$00 ; return code 0 STA $2009 ; exit (custom ABI) IRQ_Routine: rti ; do nothing NMI_Routine: rti ; do nothing org $FFFA dc.w NMI_Routine dc.w Reset_Routine dc.w IRQ_Routine Hello World in NES
  • 29. ; "constant" means that this data is read-only. ; [15 x i8] is the type of this data. i8 means an 8-bit integer. @msg = private constant [15 x i8] c"Hello, World!0a00" ; Here we declare a dependency on the `putchar` symbol. declare i32 @putchar(i32) ; `noreturn` indicates that we do not expect this function to return. ; It will end the process, after all. ; `nounwind` has to do with LLVM's error handling model. ; We use `nounwind` because we know that `exit` will not ; throw an exception. declare void @exit(i32) noreturn nounwind define i32 @main() { Entry: ; Here we allocate some variables on the stack. ; These are X, Y, and A, ; 3 of the 6502's 8-bit registers. %X = alloca i8 %Y = alloca i8 %A = alloca i8 NES to LLVM IR (1)
  • 30. %S_neg = alloca i1 %S_zero = alloca i1 ; Send control flow to the Reset_Routine basic block. br label %Reset_Routine Reset_Routine: ; This is the code to generate for ; LDX #$00 ; Store 0 in the X register. store i8 0, i8* %X ; Clear the negative status bit, because we just stored 0 ; in X, and 0 is not negative. store i1 0, i1* %S_neg ; Set the zero status bit, because we just stored 0 in X. store i1 1, i1* %S_zero ... NES to LLVM IR (2)
  • 31. 驗證 LLVM IR 與執行 • LLVM static compiler $ llc -filetype=obj hello.llvm • Link ELF image $ gcc hello.llvm.o • Veirfy $ ./a.out Hello, World!
  • 32. 32 #include <stdio.h> void foo(char c) { putchar(c); } #include <stdio.h> void foo(char c) { putchar(c); } define void @foo(i8 signext %c) nounwind { %1 = alloca i8, align 1 ; <i8*> [#uses=2] store i8 %c, i8* %1 %2 = load i8* %1 ; <i8> [#uses=1] %3 = sext i8 %2 to i32 ; <i32> [#uses=1] %4 = call i32 @putchar(i32 %3) ; <i32> [#uses=0] ret void } declare i32 @putchar(i32) define void @foo(i8 signext %c) nounwind { %1 = alloca i8, align 1 ; <i8*> [#uses=2] store i8 %c, i8* %1 %2 = load i8* %1 ; <i8> [#uses=1] %3 = sext i8 %2 to i32 ; <i32> [#uses=1] %4 = call i32 @putchar(i32 %3) ; <i32> [#uses=0] ret void } declare i32 @putchar(i32)    // declare i32 @putchar(i32) Function* putchar = cast<Function>( module->getOrInsertFunction( "putchar", voidType, cellType, NULL)); putchar->setCallingConv(CallingConv::C); 呼叫底層系統 libc 的 putchar 函式呼叫底層系統 libc 的 putchar 函式
  • 33. 33 Function* makeFunc(Module* module, const char* source, int tapeSize = 400) { ... Value* zero = ConstantInt::get(cellType, 0); Value* one = ConstantInt::get(cellType, 1); Value* minOne = ConstantInt::get(cellType, -1); ... BasicBlock* block = BasicBlock::Create(getGlobalContext(), "code", main); std::stack<bfLoopInfo> loops; IRBuilder<> codeIR(block); Value *head = codeIR.CreateAlloca(cellType, ConstantInt::get(indexType, tapeSize)); Value *it = head; for (int i = 0; i < tapeSize; i++) { codeIR.CreateStore(zero, it); it = codeIR.CreateGEP(it, one); } 在 LLVM IR 中,預先若干定義的常數 zero = 0, one =1, minOne = -1 在 LLVM IR 中,預先若干定義的常數 zero = 0, one =1, minOne = -1 建立 LLVM IR建立 LLVM IR
  • 34. Code Generation: LDX LDX (LoaD X register) func (i *Instruction) Compile(c *Compilation) { v := llvm.ConstInt(llvm.Int8Type(), uint64(i.Value), false) switch i.OpCode { case 0xa2: // ldx c.builder.CreateStore(v, c.rX) c.testAndSetZero(i.Value) c.testAndSetNeg(i.Value) // other immediate insn } } func (c *Compilation) testAndSetZero(v int) { if v == 0 { c.setZero() return } c.clearZero() } func (c *Compilation) testAndSetZero(v int) { if v == 0 { c.setZero() return } c.clearZero() } func (c *Compilation) testAndSetNeg(v int) { if v&0x80 == 0x80 { c.setNeg() return } c.clearNeg() } func (c *Compilation) testAndSetNeg(v int) { if v&0x80 == 0x80 { c.setNeg() return } c.clearNeg() }
  • 36. LLVM IR is designed to be optimized • Constant Propagation [ConstantPropagationPass] • Looks for instructions involving only constants and replaces them with a constant value instead of an instruction. • Combine Redundant Instructions [InstructionCombining] • Combines instructions to form fewer, simple instructions. For example if you add 1 twice, it will instead add 2. • Promote Memory to Register [PromoteMemoryToRegister] • This pass allows us to load every "register" variable (X, Y, A, etc) before performing an instruction, and then store the register variable back after performing the instruction. • Global Value Numbering [GVN] • Eliminates partially and fully redundant instructions, and delete redundant load instructions. • Control Flow Graph Simplification [CFGSimplification] • Removes unnecessary code and merges basic blocks together when possible.
  • 37. 套用各項優化策略 engine, err := llvm.NewJITCompiler(c.mod, 3) if err != nil { c.Errors = append(c.Errors, err.Error()) return } defer engine.Dispose() pass := llvm.NewPassManager() defer pass.Dispose() pass.Add(engine.TargetData()) pass.AddConstantPropagationPass() pass.AddInstructionCombiningPass() pass.AddPromoteMemoryToRegisterPass() pass.AddGVNPass() pass.AddCFGSimplificationPass() pass.Run(c.mod)
  • 38. 原本的輸出 (1) @Label_c000 = private global [15 x i8] c"Hello, World!0A00" declare i32 @putchar(i32) declare void @exit(i32) noreturn nounwind define i32 @main() { Entry: %X = alloca i8 %Y = alloca i8 %A = alloca i8 %S_neg = alloca i1 %S_zero = alloca i1 br label %Reset_Routine Reset_Routine: ; preds = %Entry store i8 0, i8* %X store i1 true, i1* %S_zero store i1 false, i1* %S_neg br label %Label_c011
  • 39. 原本的輸出 (2) Label_c011: ; preds = %else, %Reset_Routine %0 = load i8* %X %1 = getelementptr [15 x i8]* @Label_c000, i8 0, i8 %0 %2 = load i8* %1 store i8 %2, i8* %A %3 = and i8 %2, -128 %4 = icmp eq i8 %3, -128 store i1 %4, i1* %S_neg %5 = icmp eq i8 %2, 0 store i1 %5, i1* %S_zero %6 = load i1* %S_zero br i1 %6, label %Label_c01d, label %else
  • 40. 原本的輸出 (3) else: ; preds = %Label_c011 %7 = load i8* %A %8 = zext i8 %7 to i32 %9 = call i32 @putchar(i32 %8) %10 = load i8* %X %11 = add i8 %10, 1 store i8 %11, i8* %X %12 = and i8 %11, -128 %13 = icmp eq i8 %12, -128 store i1 %13, i1* %S_neg %14 = icmp eq i8 %11, 0 store i1 %14, i1* %S_zero br label %Label_c011
  • 41. 原本的輸出 (4) Label_c01d: ; preds = %Label_c011 store i8 0, i8* %A store i1 true, i1* %S_zero store i1 false, i1* %S_neg %15 = load i8* %A %16 = zext i8 %15 to i32 call void @exit(i32 %16) unreachable IRQ_Routine: ; preds = %Label_c01d unreachable NMI_Routine: ; No predecessors! unreachable }
  • 42. @Label_c000 = private global [15 x i8] c"Hello, World!0A00" declare i32 @putchar(i32) declare void @exit(i32) noreturn nounwind define i32 @main() { Entry: br label %Label_c011 Label_c011: ; preds = %else, %Entry %storemerge = phi i8 [ 0, %Entry ], [ %6, %else ] %0 = sext i8 %storemerge to i64 %1 = getelementptr [15 x i8]* @Label_c000, i64 0, i64 %0 %2 = load i8* %1, align 1 %3 = icmp eq i8 %2, 0 br i1 %3, label %Label_c01d, label %else else: ; preds = %Label_c011 %4 = zext i8 %2 to i32 %5 = call i32 @putchar(i32 %4) %6 = add i8 %storemerge, 1 br label %Label_c011 Label_c01d: ; preds = %Label_c011 call void @exit(i32 0) unreachable } 經過 LLVM optimization pass經過 LLVM optimization pass
  • 44. NES ROM 組成 • 16 byte header + metadata • 是否使用 mapper • PPU 使用垂直或水平鏡像處理 • 組譯的 32 KB 程式碼將會在開機時,載入到 $8000 - $ffff 記憶體 區段,若 mapper 存在,會有更多的程式區段可用 • CHR-ROM: 8 KB 大小的圖形資料,將會在開機時,載入到 PPU 。同樣的,若有 mapper ,可允許更多的圖形資料段
  • 45. NES ROM 組成 • $ ./jamulate -unrom roms/Mario.nes loading roms/Mario.nes disassembling to roms/Mario • $ ls roms/Mario/ chr0.chr Mario.jam prg.asm chr0.chr is binary data; it is the 8KB of graphics data mentioned before which is loaded into the PPU on bootup. chr0.chr is binary data; it is the 8KB of graphics data mentioned before which is loaded into the PPU on bootup. program metadata in a simple key-value text format program metadata in a simple key-value text format