An annotated context-free grammar based vulnerability detection using LALR parser

Ruo Ando
National Institute of Information and Communications Technology
Network Security Research Institute
Detection and evaluation of CVE-2013-4371 using an LALR (look-ahead, bottom-up) parser
CVE-2013-4371: realloc() vulnerability under high memory pressure
Technical Committee on Information and Communication System Security (ICSS)
Friday, November 27, 2015, 15:05-15:30

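Overview: vulnerability scanning of large-scale programs using an LALR (look-ahead, bottom-up) parser
■ In recent years, large open-source projects (Linux, Xen, OpenFlow, etc.) have come to be used as mission-critical infrastructure software, and the size of this open-source code is growing explosively.
■ With the spread of parsing (language implementation) technologies such as ANTLR, Bison/Flex, and Boost Spirit, functional-language frameworks are increasingly applied to processing and generating large programs. Examples: ANTLR in the Android Open Source Project, Bison/Flex in OpenFlow.
■ With the recent spread of cluster computing and cloud computing, bottom-up text processing such as MapReduce and LALR (Look-Ahead LR) parsing can now be carried out in realistic computing time.
■ In this paper, we apply an LALR parser to the detection of vulnerabilities in source code. We describe, as a context-free grammar, a vulnerability in large programs that risks information leakage under high memory pressure, and we detect it by scanning the entire source tree within acceptable computing time.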

Background and design policy (Scalability vs. False Negatives)
In designing a vulnerability checker, we face a difficult choice between
precision and scalability. In particular, security system design is forced
to emphasize either false negatives or false positives. In today's large-scale
computing era, we conclude that the false negative rate should be
as close to 0 as possible.
As of January 2013, GitHub had grown to 3 million users and
4.9 million repositories (repositories are histories of code
shared on the site). [9] And by December of that year, the
company hit 10 million repositories.
http://slideplayer.us/slide/703331/

Two streams of computing paradigms (1940-2015): Imperative vs. Declarative
[Figure: timeline with the decades 1940-1950, 1960, 1970-1980, 1990, 2000, 2010. Labels on the figure: Turing machine, John von Neumann, mainframe, assembler, C language (1972-), Java, Dalvik VM, Ruby / Python, lambda calculus, Kurt Gödel, set theory, quantum mechanics, first-order logic, resolution, Lisp (1958-), Prolog (1972-), Otter (first-order theorem prover), ICOT, Isabelle, ProVerif, Map and Fold, MapReduce, OCaml, Scala, Haskell. Example lambda shown on the figure: -> x { -> y { x.call(y) } }]

Long-term trend (inspection methods and problem domains)
ITS4 (ACSAC 2000)
MOPS (CCS 2002)
MC: Meta-Level Compilation (OSDI 2000)
MACE: Concolic Execution (USENIX SEC 2011)
COTS (ROP) (USENIX 2013)
Automation (NDSS 2000)
Format String (USENIX SEC 2001)
MOPS (2) (CCS 2004)
MetaSymploit (USENIX SEC 2013)
CHUCKY (CCS 2013)
Computational Verification (ProVerif) (CCS 2012)
ConfAid (OSDI 2011)
Metal Compiler Extension (SSP 2002)
SLAM (POPL 2002)
FortNOX (HotSDN 2012)
Dowser (USENIX SEC 2013)
F7 verification (CCS 2010)
StackGuard (USENIX SEC 1998)
Branch Tracing (ROP) (USENIX SEC 2013)
ProVerif (SSP 2006)
Trend labels on the figure: refinement of protocol verification; hybrid approaches; configuration consistency; response to faster attack techniques

Classification of inspection methods
■ Syntax-directed (Syntax Directed Translation)
- This translator consists of a parser (or grammar) with embedded actions that immediately generate output.
Regular expressions, finite automata
ITS4: a static vulnerability scanner for C and C++ code, Computer Security Applications Conference, ACSAC 2000
Chucky: exposing missing checks in source code for vulnerability discovery, CCS 2013
■ Rule-based (Rule Based Translation)
- Rule-based translators use the DSL of a particular rule engine to specify a set of "this goes to that"
translation rules.
Transition rules, pushdown automata
Using programmer-written compiler extensions to catch security holes, SSP 2002
Checking system rules using system-specific, programmer-written compiler extensions, OSDI 2000
■ Model-driven (Model Driven Translation)
- From the input model, a translator can emit output directly, build up strings, build up templates (documents
with "holes" in them where we can stick values), or build up specialized output objects.
Model checking and execution systems
MOPS: an infrastructure for examining security properties of software, CCS 2002
Chucky: exposing missing checks in source code for vulnerability discovery, CCS 2013

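Proposed method (tagging, LALR parsing, and binary search)
A-1: List the files in the target source tree.
A-2: Tag the functions in the file list (for example, with the line number of each function).
B-1: Detect the loops in the target source tree.
B-2: Detect realloc() calls in the target source tree.
C-3: For each detected loop, run a binary search between the array of function line numbers and the line number of realloc().
D-1: Summarize the vulnerability information based on steps A to C.
{ "_id" : ObjectId("5633ed7f42e0e0048307ec14"),
"loop_end_line" : 420, "realloc" : 1, "loop_start_line" : 398,
"loop_type" : 1, "realloc_line" : 402, "file_name" :
"tools_libxl_libxl__c", "func_line_number" : 388 }
Function analysis part
Loop analysis part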
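The binary search between function line numbers and a realloc() line could look roughly like the sketch below. It is an illustration only, not the author's implementation: the name `find_enclosing_function` and the plain sorted `int` array are assumptions. Given the function start lines produced by tagging and the line of a realloc() call, it returns the index of the last function that starts at or before that line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the slides): returns the index of the last
 * entry in func_lines that is <= target_line, or -1 if target_line comes
 * before every tagged function. func_lines must be sorted ascending. */
static int find_enclosing_function(const int *func_lines, size_t n,
                                   int target_line)
{
    size_t lo = 0, hi = n;              /* search window is [lo, hi) */

    assert(func_lines != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (func_lines[mid] <= target_line)
            lo = mid + 1;               /* candidate found, look further right */
        else
            hi = mid;                   /* function starts after the target */
    }
    return (int)lo - 1;
}
```

With the function line 388 and the realloc() line 402 from the record above, the search selects the entry 388; any other entries in a test array would be placeholders.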

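Compared method (SCIS 2015): block analysis with a pushdown automaton
Main Loop
Lexer
NFA (finite automaton)
PDA (pushdown automaton)
Token Analyzer: detection and handling of identifiers (control statements, memory operation instructions, etc.)
Block Handler: nesting management of block statements (loops, branches)
Saturator-1
lightweight code checker with document database
https://github.com/RuoAndo/Saturator-1
Iteration for each token
    switch (charatyp[ch]) {
    case Letter:
        /* collect an identifier made of letters and digits */
        for ( ; charatyp[ch]==Letter || charatyp[ch]==Digit; ch=nextCh())
            if (p < p_16) *p++ = ch;
        *p = '\0';
        if (strcmp(tkn.text, "for") == 0)
Document Database
State information of the processor (position in the program, etc.)
Query
Store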
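The nesting management done by the Block Handler amounts to a stack, which is what makes the analyzer a pushdown automaton. The sketch below is a minimal illustration under that reading. The names `block_stack`, `block_open`, and `block_close` and the fixed depth limit are assumptions and do not come from Saturator-1.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64                    /* assumed nesting limit */

struct block_stack {
    int start_line[MAX_DEPTH];          /* line where each open block began */
    int is_loop[MAX_DEPTH];             /* 1 if the block belongs to a loop */
    int depth;                          /* number of currently open blocks */
};

static void block_init(struct block_stack *s)
{
    assert(s != NULL);
    s->depth = 0;
}

/* Push on '{'. Returns 0 on success, -1 if the nesting limit is exceeded. */
static int block_open(struct block_stack *s, int line, int is_loop)
{
    if (s->depth >= MAX_DEPTH)
        return -1;
    s->start_line[s->depth] = line;
    s->is_loop[s->depth] = is_loop;
    s->depth++;
    return 0;
}

/* Pop on '}'. Returns the start line if the closed block was a loop,
 * 0 for an ordinary block, and -1 if there is no open block. */
static int block_close(struct block_stack *s)
{
    if (s->depth == 0)
        return -1;
    s->depth--;
    return s->is_loop[s->depth] ? s->start_line[s->depth] : 0;
}
```

When a loop block closes, the caller knows both its start line (the return value) and its end line (the current line), which is the pair a loop record needs.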

Target: CVE-2013-4371
Xen Hypervisor
388 libxl_cpupoolinfo * libxl_list_cpupool(libxl_ctx *ctx, int *nb_pool)
389 {
397     poolid = 0;
398     for (i = 0;; i++) {
399         info = xc_cpupool_getinfo(ctx->xch, poolid);
400         if (info == NULL)
401             break;
402         tmp = realloc(ptr, (i + 1) * sizeof(libxl_cpupoolinfo));
403         if (!tmp) {
404             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "allocating cpupool info");
405             free(ptr);
406             xc_cpupool_infofree(ctx->xch, info);
407             return NULL;
408         }
409         ptr = tmp;
410         ptr[i].poolid = info->cpupool_id;
411         ptr[i].sched_id = info->sched_id;
412         ptr[i].n_dom = info->n_dom;
413         if (libxl_cpumap_alloc(ctx, &ptr[i].cpumap)) {
415             break;
416         }
417         memcpy(ptr[i].cpumap.map, info->cpumap, ptr[i].cpumap.size);
418         poolid = info->cpupool_id + 1;
420     }
realloc use-after-free vulnerability
Use-after-free vulnerability in the
libxl_list_cpupool function in the libxl toolstack
library in Xen 4.2.x and 4.3.x, when running
"under memory pressure," returns the original
pointer when the realloc function fails, which
allows local users to cause a denial of service
(heap corruption and crash) and possibly
execute arbitrary code via unspecified vectors.
At line 402, Xen calls realloc to grow the buffer. Note that the
address of the libxl_cpupoolinfo array has already been
assigned outside this routine. Under high memory
pressure, realloc cannot extend the
memory in place from the original pointer that was
already obtained. In this case, realloc yields a new
address while data remains to be written
through the address obtained earlier.
realloc is executed with an inappropriate pointer as its argument
inside a loop whose boundary (termination condition) is loose.
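For contrast with the pattern described above, here is a hedged sketch of a growth helper that never leaves the caller holding a stale pointer. It is not the Xen fix. The helper name `grow_or_release` and its interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: grows *arr to new_count elements of elem_size bytes.
 * On success *arr points to the (possibly moved) block and 0 is returned.
 * On any failure the old block is freed, *arr is set to NULL, and -1 is
 * returned, so the caller cannot keep using a freed address. */
static int grow_or_release(void **arr, size_t new_count, size_t elem_size)
{
    void *tmp;

    assert(arr != NULL);
    /* Reject zero sizes and products that would overflow size_t. */
    if (new_count == 0 || elem_size == 0 ||
        new_count > (size_t)-1 / elem_size) {
        free(*arr);
        *arr = NULL;
        return -1;
    }
    tmp = realloc(*arr, new_count * elem_size);
    if (tmp == NULL) {
        free(*arr);     /* realloc failed, so the old block is still live */
        *arr = NULL;    /* do not hand a freed pointer back to the caller */
        return -1;
    }
    *arr = tmp;
    return 0;
}
```

The design choice is to make the pointer itself carry the outcome: after a failure the only value the caller can observe is NULL, whatever the loop around the call does next.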

Loop representation and semantic action
Parser (Yacc/Bison grammar, excerpt):
    line
        : for_statement_1
        | for_statement_2
        | for_loop_start
        | condition_1
        | condition_2
        | realloc
        | block
        ;
    block
        : BRACE_LEFT {
            /* ... */
          }
        | BRACE_RIGHT {
            counter = yylval.ival;
            /* ... */
            func_for_statement_end();
            func_for_statement_insert();
            /* ... */
          }
Lexer (Lex/Flex rules, excerpt):
    "for"      { return FOR; }
    "realloc"  { return REALLOC; }
    [0-9]+     { return NUMBER; }
    "("        { return PAREN_LEFT; }
Build and run flow:
LR specification -> Yacc or Bison compiler -> y.tab.c -> C compiler -> parser binary (a.out)
Input stream -> a.out -> output stream
Output of the semantic action (bottom-up parsing):
{ "_id" : ObjectId("5633ed7f42e0e0048307ec14"),
"loop_end_line" : 420, "realloc" : 1, "loop_start_line" : 398,
"loop_type" : 1, "realloc_line" : 402, "file_name" :
"tools_libxl_libxl__c", "func_line_number" : 388 }
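A semantic action that fires when a loop closes could assemble a record like the one shown on this slide. The sketch below is a guess at that step, not the author's code: the struct `loop_record` and the function `format_loop_record` are hypothetical, and only the field names are taken from the record above. The `_id` field is omitted because the database assigns it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record filled in by the parser's semantic actions. */
struct loop_record {
    const char *file_name;
    int func_line_number;
    int loop_type;
    int loop_start_line;
    int loop_end_line;
    int realloc_line;       /* 0 when no realloc() was seen in the loop */
};

/* Writes the record as a JSON-like document into buf.
 * Returns the snprintf result: the length needed, excluding the NUL. */
static int format_loop_record(const struct loop_record *r,
                              char *buf, size_t len)
{
    assert(r != NULL && buf != NULL);
    return snprintf(buf, len,
        "{ \"loop_end_line\" : %d, \"realloc\" : %d, "
        "\"loop_start_line\" : %d, \"loop_type\" : %d, "
        "\"realloc_line\" : %d, \"file_name\" : \"%s\", "
        "\"func_line_number\" : %d }",
        r->loop_end_line, r->realloc_line > 0 ? 1 : 0,
        r->loop_start_line, r->loop_type,
        r->realloc_line, r->file_name, r->func_line_number);
}
```

A caller would check that the return value is smaller than the buffer size before storing the string, since snprintf truncates silently otherwise.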

Evaluation: CVE-2013-4371 with a parallelized pushdown automaton
{"_id" : ObjectId("53f9ec4764e21cef244d69fb"), "located" : "402", "functionName" : "libxl_list_cpupool", "functionLine" : "388", "filename" : "libxl.c"}
{"_id" : ObjectId("53f9ec9464e21cef244d6a0e"), "start_line" : "398", "end_line" : "420", "functionName" : "libxl_list_cpupool", "functionLine" : "388", "filename" : "libxl.c"}
realloc
{"_id" : ObjectId("53d291fe40c2acf65bbbf9f7"), "located" : "145", "functionName" : "xc_vcpu_setaffinity", "functionLine" : "116", "filename" : "xc_domain.c" }
Use-after-free vulnerability in the libxl_list_cpupool function in the libxl toolstack library in Xen 4.2.x and 4.3.x, when
running "under memory pressure," returns the original pointer when the realloc function fails, which allows local users
to cause a denial of service (heap corruption and crash) and possibly execute arbitrary code via unspecified vectors.
http://www.cvedetails.com/cve/CVE-2013-4371/
We compiled our system on Ubuntu 12 LTS with Linux kernel
3.2.0. The proposed system is hosted on an Intel Xeon E5645 with a 2.4
GHz clock.
version  forloop  realloc  functions  real       user      sys        real       user      sys
4.0.4    5438     76       1314       3m41.925s  0m9.213s  0m22.837s  0m17.817s  0m2.880s  0m0.328s
4.1.0    5579     80       1373       5m35.133s  0m9.381s  0m25.002s  0m18.597s  0m2.980s  0m0.448s
4.1.2    5547     76       1368       2m2.915s   0m9.301s  0m23.545s  0m18.432s  0m3.012s  0m0.396s
Blue: without parallelization. Red: proposed method (task parallelization).

Evaluation (2): identifying loop types and related properties with LALR
{ "_id" : ObjectId("5633ed7f42e0e0048307ec14"),
"loop_end_line" : 420, "realloc" : 1, "loop_start_line" : 398,
"loop_type" : 1, "realloc_line" : 402, "file_name" :
"tools_libxl_libxl__c", "func_line_number" : 388 }
version  computing time  detected realloc() loop  detected loop
4.2.1    10m40.012s      21                       3734
4.2.5    11m17.259s      20                       3737
4.3.1    11m54.117s      18                       3907
4.3.4    12m3.511s       18                       3911
for_statement_1
: for_loop_start condition_1_1 condition_1_2 condition_1_3 {
printf("for_statement type:1 starts at :");
print_line_number();
func_for_statement_1_start();
for_loop_flag = 1;
}
;
for_statement_2
: for_loop_start condition_1_1 condition_2_2 {
printf("for_statement type:2 started at :");
print_line_number();
func_for_statement_2_start();
for_loop_flag = 1;
}
;
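Loop implementations with an insufficient termination condition (boundary)
The proposed LALR-based method can identify variable transitions and loop types, but because the parser is not reentrant (pure),
it cannot currently be parallelized.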

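Ongoing analysis and future work
Function containing the vulnerable loop
* libxl_list_cpupool(libxl_ctx *ctx, int *nb_pool)
Function through which an external attacker can influence the number of loop iterations (main)
Exhaustive path search (enumerating all paths from the source code)
Linux 4.1.3: 400 minutes
Httpd 2.2.31: 268 minutes
Inspection of this loop implementation in other open-source software
For Xen 4.2.0, exhaustive path search can cover everything in about *** minutes.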
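The idea of enumerating every path from an attacker-reachable function such as main to the function containing the vulnerable loop can be illustrated on a toy call graph. The sketch below is not the tool used for the measurements above. The adjacency-matrix representation, the name `count_paths`, and the assumption that the call graph is acyclic are all made up for illustration.

```c
#include <assert.h>

#define MAX_FUNCS 8                     /* assumed size of the toy graph */

/* calls[a][b] != 0 means function a calls function b.
 * Counts the distinct call paths from `from` to `to` by depth-first search.
 * Assumes the graph is acyclic; recursion in the analyzed code would need
 * a visited set, which this sketch leaves out. */
static unsigned long count_paths(int calls[MAX_FUNCS][MAX_FUNCS], int n,
                                 int from, int to)
{
    unsigned long total = 0;
    int next;

    assert(n >= 0 && n <= MAX_FUNCS);
    if (from == to)
        return 1;                       /* reached the target function */
    for (next = 0; next < n; next++)
        if (calls[from][next])
            total += count_paths(calls, n, next, to);
    return total;
}
```

The number of paths can grow exponentially with the depth of the call graph, which is consistent with exhaustive search taking hundreds of minutes on large source trees.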

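Summary and future work: vulnerability scanning of large-scale programs using an LALR (look-ahead, bottom-up) parser
■ With the spread of parsing (language implementation) technologies such as ANTLR, Bison/Flex, and Boost Spirit, functional-language frameworks are increasingly applied to processing and generating large programs. Examples: ANTLR in the Android Open Source Project, Bison/Flex in OpenFlow.
■ With the recent spread of cluster computing and cloud computing, bottom-up text processing such as MapReduce and LALR (Look-Ahead LR) parsing can now be carried out in realistic computing time.
■ In this paper, we apply an LALR parser to the detection of vulnerabilities in source code. We describe, as a context-free grammar, a vulnerability in large programs that risks information leakage under high memory pressure, and we detect it by scanning the entire source tree within acceptable computing time.
■ Future work: parallelization by making the proposed LALR parser pure (reentrant), speeding up exhaustive path search (traversal of all paths), and strengthening semantic actions by applying ANTLR and Boost Spirit.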
