B-link-tree

Lehman-Yao concurrent B-tree (a.k.a. B-link-tree), Makoto YUI (油井誠)

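What is a B-link-tree? (history): The authors call it a variant of the B*-tree, but is it actually closer to a B+-tree? (Data is stored in the leaves only.) It was published in 1981 by Philip L. Lehman and S. Bing Yao in the paper "Efficient Locking for Concurrent Operations on B-Trees". At the time, building databases that several processes could operate on concurrently was apparently a popular topic. The paper's main concern is how to build an index that supports a high degree of concurrency efficiently.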

B*-tree: an elegant(?) variant of the B+-tree that guarantees pages are kept at least 2/3 full. The benefit over B+-trees is not large: a worst-case reduction in page usage of 16.67%. As memory becomes cheaper and faster, perhaps we should move away from algorithms designed to save memory space. Using large amounts of memory can make latency the problem instead, so more elegant algorithms become necessary. This is already happening as CPU speed outpaces bus capacity. http://www.cise.ufl.edu/~jhammer/classes/b_star.html
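One way to read the 16.67% figure, assuming it compares the guaranteed minimum fill of a B+-tree page (1/2) with that of a B*-tree page (2/3):

```latex
\frac{2}{3} - \frac{1}{2} = \frac{1}{6} \approx 16.67\%
```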

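B-link-tree (key issue): No read locks! When multiple processes update concurrently, each locks only a small number of nodes, so lock contention is rare. Locks are fine-grained and released early (non-2PL). The lock itself is a simple binary lock taken on update. There is no deadlock, but the scheme may be a poor fit for distributed systems (?). Additions over the B*-tree: a right-link and a high key. Breadth-first search becomes possible, and there is an alternative way to reach each node. Tree traversal: search is top-down and left-to-right; insert is bottom-up.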

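Deadlock (wait-for graph): [Figure: transactions T1, T2 and T3 forming a cycle.] Under two-phase locking (a transaction locks the objects it processes one after another), a cycle can form in the wait-for graph, and deadlock occurs.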
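To make the wait-for graph concrete, here is a minimal sketch of cycle detection by depth-first search. The dictionary representation and the name `find_cycle` are assumptions made for illustration; the slides only show the graph.

```python
def find_cycle(wait_for):
    """Return the transactions on one cycle of the wait-for graph, or None.

    wait_for maps a transaction to the transactions it is waiting on
    (an illustrative representation, not taken from the slides).
    """
    state = {}   # missing = unvisited, 1 = on the current path, 2 = finished
    path = []

    def visit(t):
        state[t] = 1
        path.append(t)
        for u in wait_for.get(t, ()):
            if state.get(u) == 1:            # back edge: u is on the current path
                return path[path.index(u):]
            if u not in state:
                cycle = visit(u)
                if cycle:
                    return cycle
        state[t] = 2
        path.pop()
        return None

    for t in wait_for:
        if t not in state:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})` reports the three transactions as deadlocked, while removing the edge from T3 to T1 makes it return `None`.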

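B-link-tree structure
(a) The distance from the root to every leaf (which holds the pointers to the actual data) is the same (of course, the tree is well balanced).
(b) Each node normally has at least k+1 children (pointers to them); the root, however, always has at least 2 children.
(c) Each node has at most 2k+1 children (excluding the high key). A node may also hold one additional entry called the "high key" (+1).
(d) Keys are stored in the leaf nodes. Each key carries a pointer to a database record (every record is associated with a key).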

B*-tree node structure (with no "high key"). Internal use only.

P_i points to subtree T_i. Every key value v in T_i satisfies K_{i-1} < v <= K_i. Leaf and non-leaf nodes share the same structure (simple). M is a marker that identifies a node as a leaf.
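As a minimal sketch of how this key-range rule selects a child, assume a node keeps its keys in a sorted list `keys` and its subtree pointers in a list `children` with one more entry than `keys`. Both names and the exact layout are illustrative and not taken from the slides.

```python
from bisect import bisect_left

def choose_child(keys, children, v):
    """Return the child whose range K_{i-1} < v <= K_i contains v."""
    # bisect_left gives the index of the first key >= v, i.e. the slot
    # whose upper bound K_i is the smallest key that is not below v.
    return children[bisect_left(keys, v)]
```

With `keys = [10, 20, 30]`, a value equal to a key (such as 20) goes to the subtree on that key's left, matching the inclusive upper bound in the rule above.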

High key: appended at the right end of a node when the node overflows (split). It is used when following the rightmost-link.

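Update procedure (basis)
x: a physical page (referred to as a logical node)
lock(x);            // mark page x before updating it; if it is already locked, wait()
A <- get(x);        // read the page from disk into memory
modify data in A;   // update A in memory
put(A, x);          // write A from memory back to disk (sync)
unlock(x);          // release the lock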
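The lock / get / modify / put / unlock sequence can be sketched as follows. `PageStore` is a hypothetical in-memory stand-in for the disk, and every name in it is an assumption made for illustration.

```python
import threading

class PageStore:
    """Hypothetical stand-in for a disk: page id -> stored contents."""

    def __init__(self):
        self._disk = {}
        self._locks = {}
        self._guard = threading.Lock()   # protects the table of page locks

    def _lock_for(self, x):
        with self._guard:
            return self._locks.setdefault(x, threading.Lock())

    def lock(self, x):
        self._lock_for(x).acquire()      # blocks (waits) if x is already locked

    def unlock(self, x):
        self._lock_for(x).release()

    def get(self, x):
        return list(self._disk.get(x, []))   # copy: "read the page into memory"

    def put(self, a, x):
        self._disk[x] = list(a)              # copy: "write the page back"

def update(store, x, modify):
    """Apply modify() to page x under the page's binary lock."""
    store.lock(x)
    try:
        a = store.get(x)
        modify(a)
        store.put(a, x)
    finally:
        store.unlock(x)
```

Note that `get` takes no lock, which mirrors the point that readers never lock; only the writer path goes through `update`.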

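Naive approach: locking is required even for reads (read lock). Top-down locking: lock the descendants below the node being operated on. [Figure: while a search for 15 is in progress, 9 is inserted and a node is split; the search has to wait.]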

Rightmost-link: each node of a B-link-tree holds a single link pointer, so each level forms a singly linked list. This provides an alternative way to reach the target node: every node can be reached through two pointers. Overflow: the node is replaced by two new nodes (usually the two nodes are on the same physical page). The pair is essentially the same as a single node until the parent pointer is added.
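A split that preserves the right-link chain might look like the sketch below. It is single-threaded and omits locking and disk writes; the `Node` fields and the choice of the last remaining key as the new high key are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    high_key: Optional[int] = None    # None = no upper bound (rightmost node)
    right: Optional["Node"] = None    # link pointer to the right sibling

def split(node):
    """Move the upper half of node's keys into a new right sibling."""
    mid = len(node.keys) // 2
    # Build the new sibling completely first: it inherits the old high key
    # and the old right-link, so the level stays one connected list.
    new = Node(keys=node.keys[mid:], high_key=node.high_key, right=node.right)
    node.keys = node.keys[:mid]
    node.high_key = node.keys[-1]     # largest key left in the old node
    node.right = new
    return new
```

Until the parent receives a pointer to the new sibling, the only way to reach it is through `node.right`, which is why the two halves behave like one node.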

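Using the high key: [Figure: nodes labeled 74, 74 and 63, each with a high key.] Consider searching for 72 immediately after a split, before the pointer in the parent has been updated. The parent still directs the search to the original node, but the key now lies beyond that node's high key, so the link pointer must be followed. This condition is checked by comparing the high key with the search key: if the search key is greater than the high key (keys inside a node satisfy v <= high key), follow the link pointer.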
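The high-key check can be sketched as a small loop. The comparison is strict because the high key is treated here as an inclusive upper bound, in line with the rule K_{i-1} < v <= K_i; the `Node` class and the concrete key values are assumptions chosen to resemble the slide's example.

```python
class Node:
    def __init__(self, keys, high_key=None, right=None):
        self.keys = keys
        self.high_key = high_key   # None = no upper bound (rightmost node)
        self.right = right         # link pointer to the right sibling

def move_right(node, v):
    """Follow link pointers while v lies beyond the node's high key."""
    while node.high_key is not None and v > node.high_key:
        node = node.right
    return node
```

For instance, if a node with high key 63 links to a sibling with high key 74, a search for 72 that lands on the first node moves one step right, while a search for 63 stays where it is.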

A structural example of a B-link-tree. Additional advantage: retrieval in level-major order is fast (for example, when fetching all leaves).
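Fetching a whole level only requires walking the right-links from the leftmost node. A minimal sketch, assuming an illustrative `Node` with `keys` and `right` fields:

```python
class Node:
    def __init__(self, keys, right=None):
        self.keys = keys
        self.right = right   # link pointer to the right sibling

def scan_level(leftmost):
    """Yield every key on one level, left to right, via the right-links."""
    node = leftmost
    while node is not None:
        yield from node.keys
        node = node.right
```

No parent is visited during the scan, so reading all leaves costs one pass over the leaf level.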

Search algorithm
current = root;                    /* start from the root pointer */
A = get(current);                  /* read the node into memory */
while (current is not a leaf) {    /* loop until a leaf is reached */
    current = scannode(v, A);      /* scan memory block A for key v; returns a link pointer */
    A = get(current);              /* read the node the pointer leads to */
}
/* now at the leaf level: linear search along the leaves */
while ((t = scannode(v, A)) == link pointer of A) {
    current = t;
    A = get(current);
}
/* was a leaf holding value v found? */
if (v is in A) return(success); else return(failure);
Simple! The only trick is to have scannode know about high keys and right-links. It does not matter whether scannode returns a right-link or a child link.
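The search loop translates almost directly into runnable code. This is a sketch under stated assumptions: nodes are plain in-memory objects (so `get` disappears), `children is None` marks a leaf, and the field names are illustrative.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None, high_key=None, right=None):
        self.keys = keys
        self.children = children   # None marks a leaf
        self.high_key = high_key   # None = no upper bound
        self.right = right         # link pointer to the right sibling

def scannode(v, node):
    """Return the right-link if v is past the high key, else the child for v."""
    if node.high_key is not None and v > node.high_key:
        return node.right
    return node.children[bisect_left(node.keys, v)]

def search(root, v):
    current = root
    while current.children is not None:    # descend; no locks are taken
        current = scannode(v, current)
    # at the leaf level: move right while v lies beyond the high key
    while current.high_key is not None and v > current.high_key:
        current = current.right
    return v in current.keys
```

The search still succeeds when a leaf has been split but its parent has not yet been told: it lands on the old leaf and reaches the new one through the right-link.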

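Insert algorithm: the preparation is basically the same as for search. Step 1: descend the tree to find the leaf node that should contain key value v, pushing the rightmost node visited at each level onto a stack. Step 2: walk back up the "remembered list" (the path taken) and perform the actual insertion (bottom-up insert).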

Insert (1)
initialize stack;                  /* to remember ancestors */
current = root;
A = get(current);
/* scan down the tree */
while (current is not a leaf) {
    t = current;
    current = scannode(v, A);
    if (current was not a link pointer in A)
        push(t);                   /* remember the node at this level */
    A = get(current);
}
/* reached a leaf */
lock(current);
A = get(current);
move_right();                      /* if required, move right and lock the neighbor */
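The descent phase of insert can be sketched as below. Assumptions: nodes live in memory, each carries a `threading.Lock` as its binary lock, and `move_right` locks the neighbor before releasing the current node. All names are illustrative, and the actual insertion and split are left out.

```python
import threading
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None, high_key=None, right=None):
        self.keys = keys
        self.children = children       # None marks a leaf
        self.high_key = high_key       # None = no upper bound
        self.right = right             # link pointer to the right sibling
        self.lock = threading.Lock()   # binary lock, taken by writers only

def past_high_key(node, v):
    return node.high_key is not None and v > node.high_key

def descend_for_insert(root, v):
    """Return (locked leaf that should hold v, stack of remembered ancestors)."""
    stack = []
    current = root
    while current.children is not None:    # scan down without locking
        t = current
        if past_high_key(t, v):
            current = t.right              # link pointer: same level, push nothing
        else:
            current = t.children[bisect_left(t.keys, v)]
            stack.append(t)                # rightmost node visited at this level
    current.lock.acquire()
    while past_high_key(current, v):       # move_right
        nxt = current.right
        nxt.lock.acquire()                 # lock the neighbor first,
        current.lock.release()             # then let go of the old node
        current = nxt
    return current, stack
```

The caller owns the returned leaf's lock and is expected to release it after updating the leaf; the stack gives the parents to revisit if a split has to propagate upward.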

// Update the current node.
// Split the node.
// For locking: remember the old node and the parent.
// Lock the parent, the old node, and the next sibling.

[Figure: example tree with nodes A to H, one of which has been split. The descent finds leaf G; the bottom-up stack (rightmost-link) holds A, B, E.]

[Figure: bottom-up insert at leaf G using the stack (A, B, E) when a split is needed. Step labels: Flush, Flush, Lock, Flush, Unlock, Finish. New nodes shown: G', I'.]

[Figure: insert algorithm (write lock, no read lock). Leaf A under parent P is split into A' and B' using the bottom-up stack. Step labels: Flush B', Lock A, Flush A'.]

Delete: too simple. Just remove keys from a leaf node (never delete from internal nodes), even if the leaf drops below k entries; the paper assumes a primary index. If deletes continue, a node can end up with fewer than k entries (starvation?).
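A minimal sketch of this delete, assuming the target leaf has already been located and that each leaf carries its own binary lock. `Leaf`, `delete` and `is_underfull` are illustrative names, and no rebalancing is attempted.

```python
import threading

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.lock = threading.Lock()   # binary lock, taken by writers only

def delete(leaf, v):
    """Remove v from the leaf under its lock; report whether it was present."""
    with leaf.lock:
        if v in leaf.keys:
            leaf.keys.remove(v)
            return True
        return False

def is_underfull(leaf, k):
    """True once repeated deletes have left the leaf with fewer than k keys."""
    return len(leaf.keys) < k
```

Because nothing is merged or redistributed, `is_underfull` can become true and stay true, which is the situation the slide flags with "starvation?".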

Problems: livelock. On a multiprocessor system, one process may run indefinitely (especially when processing speeds differ). A search can become extremely slow if the link pointers it follows keep being created by other processes.
