A Walkthrough of Discretized Streams: Fault-Tolerant Streaming Computation at Scale

Discretized Streams:
Fault-Tolerant Streaming
Computation at Scale
July 4, 2014
Katsunori Kanda

About the Paper
• SOSP 13
• Author: Matei Zaharia et al. (UCB)
• CTO@databricks
• Assistant Professor@MIT
• Contributor of Apache Spark

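Overview
• Proposes a new stream processing model, discretized streams (D-Streams)
• Feature 1: parallel recovery
• Feature 2: throughput scales (up to 100 nodes)
• Feature 3: latency of a few hundred milliseconds to a few seconds
• Implemented as Spark Streaming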

Spark Model
Write programs in terms of transformations
on distributed datasets
Resilient Distributed Datasets (RDDs)
• Collections of objects that can be stored in
memory or disk across a cluster
• Parallel functional transformations (map,
filter, …)
• Automatically rebuilt on failure

2. Goals and Background
• Examples of target applications
• Site activity statistics: 10^6 events/s
• Cluster monitoring
• Spam detection
• 0.5-2 sec latency (not a target: high-frequency
trading)

2.1 Goals
1. Scalability to hundreds of nodes
2. Minimal cost beyond base processing
3. Second-scale latency
4. Second-scale recovery from faults and
stragglers

2.2 Previous Processing Models
• continuous operator model
• Computation is split across long-lived, stateful
operators. Each operator updates its state as
input records arrive.

2.2 Previous Processing
Models: Replication
• Two copies of the system receive the same input at the same time.
The two copies must be kept synchronized (databases are a
typical example)

2.2 Previous Processing
Models: Upstream Backup
• Each node retains a copy of the messages it has
sent since some checkpoint
• When a node fails, a standby node rebuilds the
failed node's state. This rebuild is
expensive.
• Examples: MapReduce Online, Storm

Handle stragglers
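• Previous models cannot deal with
stragglers
• replication: a straggler slows down the whole
system (because the replicas must stay synchronized)
• upstream backup: a straggler has to be treated as a
failure, but recovery is expensive (as noted above)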

3. Discretized Streams
(D-Streams)
• D-Streams structure a computation as a set of tasks that are:
• short
• stateless
• deterministic

3.1. Computation Model
• A series of deterministic batch computations on short time intervals
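The idea of cutting a stream into short intervals can be sketched in plain Python. This is only an illustration, not Spark Streaming code: the function name `discretize`, the `(timestamp, value)` record layout, and the 1-second interval are all assumptions made for the example.

```python
from collections import defaultdict

def discretize(records, interval):
    """Group (timestamp, value) records into batches, one per time interval."""
    batches = defaultdict(list)
    for ts, value in records:
        # Records whose timestamps fall in the same interval share a batch.
        batches[int(ts // interval)].append(value)
    # Return the non-empty batches in time order.
    return [batches[i] for i in sorted(batches)]

records = [(0.1, "a"), (0.7, "b"), (1.2, "c"), (2.9, "d")]
batches = discretize(records, interval=1.0)

# Each batch is then processed by a deterministic function (here: a count).
sizes = [len(batch) for batch in batches]
```

Because each batch is an immutable input and the per-batch function is deterministic, re-running the function on the same batch always yields the same result.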

3. Computation Model:
Recovery from faults
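• Lost data is recomputed per partition
• To keep recomputation from going back indefinitely,
RDD state is periodically saved through
asynchronous replication
• Recomputation can run in parallel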
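The recovery idea can be sketched as follows, assuming a toy state (a running record count) and hypothetical helper names (`step`, `recover_partition`, `recover_all`). A thread pool stands in for the cluster's workers; this is not how Spark schedules recovery tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def step(prev_state, batch):
    """Deterministic update: running count = previous count + batch size."""
    return prev_state + len(batch)

def recover_partition(checkpoint_state, batches_since_checkpoint):
    """Rebuild one lost partition by replaying its inputs from the last checkpoint."""
    state = checkpoint_state
    for batch in batches_since_checkpoint:
        state = step(state, batch)
    return state

def recover_all(lost):
    """Recompute all lost partitions in parallel.

    `lost` maps partition id -> (checkpointed state, batches since the checkpoint).
    """
    with ThreadPoolExecutor() as pool:
        futures = {
            pid: pool.submit(recover_partition, checkpoint, batches)
            for pid, (checkpoint, batches) in lost.items()
        }
        return {pid: future.result() for pid, future in futures.items()}

lost = {0: (10, [["a"], ["b", "c"]]), 1: (5, [[]])}
recovered = recover_all(lost)
```

The checkpoint bounds how far back the replay has to go, and because partitions are independent they can be rebuilt concurrently.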

3.2. Timing Considerations
• Handling records that arrive out of order
• Wait for a slack time before starting each
batch
• Provide a way to handle late records at the
application level
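A minimal sketch of the slack-time rule, under the assumption that each record carries both an event time and an arrival time. The function name `assign_with_slack` and the record layout are hypothetical.

```python
def assign_with_slack(records, interval, slack):
    """Split (event_time, arrival_time, value) records into batches and late records."""
    batches, late = {}, []
    for event_time, arrival_time, value in records:
        idx = int(event_time // interval)
        # Batch idx starts processing once its interval has ended plus the slack.
        deadline = (idx + 1) * interval + slack
        if arrival_time <= deadline:
            batches.setdefault(idx, []).append(value)
        else:
            # Too late for its batch: left to application-level handling.
            late.append((idx, value))
    return batches, late

records = [(0.5, 0.6, "a"), (0.9, 1.3, "b"), (0.8, 2.5, "c")]
batches, late = assign_with_slack(records, interval=1.0, slack=0.5)
```

A longer slack admits more out-of-order records but adds the same amount to the end-to-end latency.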

3.3. D-Stream API(1/3)
Stateless API
• Transformations: create a new D-Stream
• pairs = words.map(w => (w, 1))
• counts = pairs.reduceByKey((a, b) => a + b)
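To show what these two stateless transformations compute on each interval, here is a plain-Python sketch. `reduce_by_key` and `word_count` are hypothetical stand-ins written for this example, not the Spark API.

```python
def reduce_by_key(pairs, func):
    """Combine the values of each key with an associative function."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out

def word_count(batch):
    """Apply the map and reduceByKey steps to one interval's words."""
    pairs = [(w, 1) for w in batch]                   # words.map(w => (w, 1))
    return reduce_by_key(pairs, lambda a, b: a + b)   # pairs.reduceByKey(...)

# A stream is a sequence of batches; the same function runs on every batch.
stream = [["a", "b", "a"], ["b"]]
counts = [word_count(batch) for batch in stream]
```

No state is carried from one batch to the next, which is what makes these operators stateless.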

Stateful API
Incremental aggregation, e.g.:
pairs.reduceByWindow("5s", (a,b) => a + b)
pairs.reduceByWindow("5s", (a,b) => a + b, (a,b) => a - b)
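The benefit of passing an inverse function can be sketched in plain Python: the window total is updated by adding the newest interval and subtracting the one that slides out, instead of re-summing the whole window. The function name and the choice to measure the window in intervals rather than seconds are assumptions for this example.

```python
from collections import deque

def incremental_window(batch_values, window, add, subtract):
    """Return the windowed aggregate after each interval, updated incrementally."""
    recent = deque()   # per-interval values currently inside the window
    total = None
    results = []
    for value in batch_values:
        total = value if total is None else add(total, value)
        recent.append(value)
        if len(recent) > window:
            # Remove the interval that just left the window.
            total = subtract(total, recent.popleft())
        results.append(total)
    return results

values = [1, 2, 3, 4, 5]
sums = incremental_window(values, 3, lambda a, b: a + b, lambda a, b: a - b)
```

Each step costs one add and at most one subtract, regardless of the window length. This only works when the aggregation is invertible, as addition is.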

Stateful API
State tracking:
sessions = events.track(
  (key, ev) => 1,
  (key, st, ev) =>
    ev == Exit ? null : 1,
  "30s")
count = sessions.count()
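A plain-Python sketch of how an operator like `track` might behave, assuming it takes an initialize function, an update function, and a timeout. Here the timeout is counted in intervals rather than seconds, and the `EXIT` marker and the returned per-interval snapshots are assumptions of the example.

```python
EXIT = "exit"

def track(batches, initialize, update, timeout):
    """Process batches of (key, event); return a copy of the state after each interval."""
    state, last_seen, snapshots = {}, {}, []
    for t, batch in enumerate(batches):
        for key, ev in batch:
            if key in state:
                new_state = update(key, state[key], ev)
            else:
                new_state = initialize(key, ev)
            if new_state is None:
                # A None state ends the session.
                state.pop(key, None)
                last_seen.pop(key, None)
            else:
                state[key] = new_state
                last_seen[key] = t
        # Drop sessions that have been idle for more than `timeout` intervals.
        for key in [k for k, seen in last_seen.items() if t - seen > timeout]:
            del state[key]
            del last_seen[key]
        snapshots.append(dict(state))
    return snapshots

batches = [[("u1", "view"), ("u2", "view")], [("u1", EXIT)], [], [], []]
sessions = track(
    batches,
    lambda key, ev: 1,
    lambda key, st, ev: None if ev == EXIT else 1,
    timeout=2,
)
active_counts = [len(s) for s in sessions]
```

In this run `u1` leaves through an explicit exit event, while `u2` is dropped by the timeout.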

3.4. Consistency Semantics
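• If nodes have progressed to different points in the computation,
consistency problems arise
• Existing systems: solve this with synchronization, or ignore it
• D-Streams: semantics are clear because time is divided into intervals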

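3.5. Unification with Batch
& Interactive Processing
• D-Streams use the same computation model as batch jobs,
so they combine easily with batch processing
• Feature 1: streams can be joined with batch results
• Feature 2: historical data can be processed
• Feature 3: interactive queries are possible
• counts.slice("21:00", "21:05").topK(10)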
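An interactive query of this kind can be sketched in plain Python over stored per-interval results. `slice_counts` and `top_k` are hypothetical helpers, the inclusive end bound is an assumption, and timestamps are zero-padded "HH:MM" strings so that string comparison matches time order.

```python
import heapq
from collections import Counter

def slice_counts(counts_by_time, start, end):
    """Merge the per-interval count dicts whose timestamp lies in [start, end]."""
    merged = Counter()
    for ts, counts in counts_by_time.items():
        if start <= ts <= end:
            merged.update(counts)  # adds the counts key by key
    return merged

def top_k(counts, k):
    """Return the k (key, count) pairs with the largest counts."""
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

counts_by_time = {
    "20:59": {"a": 5},
    "21:00": {"a": 1, "b": 3},
    "21:03": {"b": 2, "c": 1},
    "21:06": {"c": 9},
}
top = top_k(slice_counts(counts_by_time, "21:00", "21:05"), 1)
```

The query reuses the results the streaming job already produced instead of recomputing them.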

4. System Architecture
• Master: manages the D-Stream lineage graph, schedules
tasks, and creates RDD partitions
• Worker nodes: receive data, store the partitions of
input and computed RDDs, and execute tasks
• Client Library: sends data into the system

4.2. Optimization for
Stream Processing
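• Network communication: introduced asynchronous I/O, which sped up reduce operations.
• Timestep pipelining: modified Spark's scheduler so that tasks for the next
timestep can be submitted ahead of time
• Task scheduling
• Storage layer: added asynchronous checkpointing. Because RDDs are immutable,
checkpointing does not need to block. Zero-copy I/O was also implemented.
• Lineage cutoff: lineage is now deleted after a checkpoint is created
• Master recovery: implemented recovery of the master's state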

4.3 Memory Management
• Data is spilled to disk in LRU order
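LRU spilling can be sketched with an ordered dictionary. `LruBlockStore` is a hypothetical class for illustration, with a plain dict standing in for the disk; it does not reflect Spark's actual block manager.

```python
from collections import OrderedDict

class LruBlockStore:
    """Keep up to `capacity` blocks in memory; spill the least recently used to disk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()  # ordered from least to most recently used
        self.disk = {}               # stand-in for on-disk storage

    def put(self, block_id, data):
        self.memory[block_id] = data
        self.memory.move_to_end(block_id)
        while len(self.memory) > self.capacity:
            old_id, old_data = self.memory.popitem(last=False)
            self.disk[old_id] = old_data

    def get(self, block_id):
        if block_id in self.memory:
            self.memory.move_to_end(block_id)  # mark as recently used
            return self.memory[block_id]
        # Not in memory: load from disk (KeyError if the block is unknown).
        data = self.disk.pop(block_id)
        self.put(block_id, data)
        return data

store = LruBlockStore(capacity=2)
store.put("a", 1)
store.put("b", 2)
store.get("a")       # "a" becomes the most recently used block
store.put("c", 3)    # "b" is now the least recently used and is spilled
```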

5.2. Straggler Mitigation
• simple threshold to detect straggler:
• 1.4x the median task running time
• Stragglers are mitigated within 1 second
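The threshold rule can be sketched in a few lines of Python. The function name `find_stragglers` and the task-id-to-seconds input format are assumptions; the factor 1.4 is the one quoted above.

```python
from statistics import median

def find_stragglers(running_times, factor=1.4):
    """Return ids of tasks running longer than `factor` times the median time."""
    threshold = factor * median(running_times.values())
    return sorted(task for task, t in running_times.items() if t > threshold)

running_times = {"t1": 1.0, "t2": 1.1, "t3": 0.9, "t4": 3.0}
slow = find_stragglers(running_times)
```

Because tasks are deterministic, a task flagged this way can safely be run again on another node and the first result to finish can be used.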

5.3. Master Recovery
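1. Reliably log the state of the computation before starting each timestep
2. When the master fails, each worker reports the
RDD partitions it holds to the new
master
The key point:
Computing the same RDD twice causes no problems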

5.3. D-Streams Metadata
• D-Streams metadata is stored in HDFS
• The user's D-Stream graph and user-defined
functions
• Time of the last checkpoint
• IDs of RDDs created since the checkpoint

6.1. Comparison with S4
and Storm

6.2. Varying the Checkpoint Interval

6.2. Varying the Number of
Nodes
