Guagua: An Iterative Computing Framework on Hadoop
Zhang Pengshan (David), PayPal

AGENDA
• Introduction
• Distributed Neural Network Algorithm
• What is Guagua?
• Guagua Advanced Features
• Shifu on Guagua
• Future Plans

ALIPAY vs. PAYPAL
Q: Where is risk control in PayPal?
A: Risk control is everywhere in paypal.com.

FRAUD TYPES IN PAYPAL
• Account Take Over
• Stolen Financials
• INR/SNAD
• Credit Cards
INR: Item Not Received
SNAD: Significantly Not as Described

RISK CONTROL IN PAYPAL
• Models
• Rules
• Agents

RISK MODELING IN PAYPAL
MODELING CHALLENGES
• Thousands of Features
• Algorithms (LR, NN, DT)
• Big Training Data
• SLA (Online)
• Simulation

Q: How to train models on terabytes of data with thousands of features?

DISTRIBUTED NEURAL NETWORK ALGORITHM*
[Diagram: in each iteration (1st, 2nd, …), every Worker sends its gradients (double[]) to the Master. The Master accumulates the gradients, updates the weights, and sends the new weights (double[]) back to every Worker.]
* Distributed batch gradient descent algorithm.
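
The exchange in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not Guagua or Shifu code: a single linear neuron with squared error stands in for the neural network, the workers run one after another in a loop rather than on separate machines, and the names `worker_gradients`, `master_update` and `train` are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (features, target)


def worker_gradients(weights: List[float], shard: List[Sample]) -> List[float]:
    """Worker step: sum the squared-error gradients over this worker's shard."""
    grads = [0.0] * len(weights)
    for features, target in shard:
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        for i, x in enumerate(features):
            grads[i] += error * x
    return grads


def master_update(weights: List[float], all_grads: List[List[float]],
                  learning_rate: float, n_samples: int) -> List[float]:
    """Master step: accumulate the workers' gradients, then update the weights."""
    total = [sum(g[i] for g in all_grads) for i in range(len(weights))]
    return [w - learning_rate * t / n_samples for w, t in zip(weights, total)]


def train(shards: List[List[Sample]], n_features: int,
          iterations: int = 200, learning_rate: float = 0.1) -> List[float]:
    """Run batch gradient descent; each pass of the loop is one iteration."""
    weights = [0.0] * n_features
    n_samples = sum(len(shard) for shard in shards)
    for _ in range(iterations):
        # Every worker sees the same weights and returns a gradient array.
        all_grads = [worker_gradients(weights, shard) for shard in shards]
        # The master produces the weights that go back out to every worker.
        weights = master_update(weights, all_grads, learning_rate, n_samples)
    return weights
```

Because the master only sums gradient arrays, splitting the data across more workers changes the result only by floating-point rounding compared with a single worker holding all the data.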

DISTRIBUTED NEURAL NETWORK ALGORITHM
[Diagram: in each iteration (1st, 2nd, …), Workers send gradients (double[]) to the Master, which updates the weights.]
Q: How to implement it?

WHY NOT MAHOUT OR SPARK?
Mahout
• No distributed logistic regression or neural network.
• Iterates by chaining Hadoop jobs, which performs badly.
Spark
• No independent Spark cluster available.
• Hadoop cluster is still 1.0 based, not YARN.
Q: How to implement it in Hadoop?

POSSIBLE SOLUTIONS
Hadoop YARN
Pros
• Flexible framework for building frameworks
• Self resource management
Cons
• 2.0.3-Alpha
• PayPal Clusters: Hadoop 0.20.2
• Extra fault tolerance, splits, UI …
Hadoop MapReduce
Pros
• Works well on all Hadoop versions
• Mature computing model
• Internal fault tolerance, splits, UI …
Cons
• Different computing model
• How to do iterative coordination?
Q: How to implement it in Hadoop MapReduce?

ITERATIVE COMPUTING MODEL IN GUAGUA
[Diagram: in each iteration (1st, 2nd, …), every Worker sends a WORKER RESULT to the Master, and the Master sends a MASTER RESULT back to every Worker.]
Guagua is a framework for this iterative computing model, in the same way that Hadoop 1.0 is a framework for the MapReduce model.

GUAGUA API
MasterComputable
WorkerComputable
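
To make the roles of the two interfaces concrete, here is a Python analogue of the master/worker contract. Guagua itself is a Java library, so everything below is an illustrative assumption: the method signatures, the `run` driver and the `SumWorker`/`SumMaster` example are hypothetical and only mirror the shape of the iterative model (worker results go to the master, the master result goes back to the workers).

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Optional, TypeVar

M = TypeVar("M")  # master result type
W = TypeVar("W")  # worker result type


class WorkerComputable(ABC, Generic[M, W]):
    @abstractmethod
    def compute(self, iteration: int, last_master_result: Optional[M]) -> W:
        """Produce this worker's result from the previous master result."""


class MasterComputable(ABC, Generic[M, W]):
    @abstractmethod
    def compute(self, iteration: int, worker_results: List[W]) -> M:
        """Combine all worker results into the next master result."""


def run(master: MasterComputable, workers: List[WorkerComputable],
        max_iterations: int):
    """Hypothetical single-process driver for the iterative model."""
    master_result = None
    for iteration in range(1, max_iterations + 1):
        worker_results = [w.compute(iteration, master_result) for w in workers]
        master_result = master.compute(iteration, worker_results)
    return master_result


class SumWorker(WorkerComputable[int, int]):
    """Example worker: returns the sum of its local data."""

    def __init__(self, data: List[int]) -> None:
        self.data = data

    def compute(self, iteration: int, last_master_result: Optional[int]) -> int:
        return sum(self.data)


class SumMaster(MasterComputable[int, int]):
    """Example master: adds up the partial sums."""

    def compute(self, iteration: int, worker_results: List[int]) -> int:
        return sum(worker_results)
```

For instance, `run(SumMaster(), [SumWorker([1, 2]), SumWorker([3, 4])], max_iterations=1)` computes a distributed sum in one iteration.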

GUAGUA OVERVIEW
[Architecture diagram: Iterative Computing Framework]
• Core
• Master-Workers Core
• Coordination
• Fault Tolerance
• MapReduce Adapter (for Hadoop 1.0)
• YARN Adapter (for Hadoop 2.0)
• Consistent Client
• Distributed Neural Network Application

PLUGGABLE, SCALABLE INTERCEPTORS
Master
• Fault Tolerance Interceptor
• ZooKeeper Coordinator
• Perf Interceptor
• Timer
• User Defined Interceptors
• Master Computation
Worker
• Fault Tolerance Interceptor
• ZooKeeper Coordinator
• Perf Interceptor
• Timer
• User Defined Interceptors
• Worker Computation
* Both interceptor chains wrap the computation as aspects in every iteration.
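
One way to picture interceptors as per-iteration aspects is a chain of pre/post hooks around the computation. The sketch below is an assumption-laden illustration, not Guagua's interceptor API: the class and method names (`Interceptor`, `pre_iteration`, `post_iteration`, `run_iteration`) are hypothetical, and running the post hooks in reverse order is a design choice made here for symmetry.

```python
import time
from typing import Callable, Dict, List


class Interceptor:
    """Hypothetical base class: hooks run before and after each iteration."""

    def pre_iteration(self, context: Dict) -> None:
        pass

    def post_iteration(self, context: Dict) -> None:
        pass


class TimerInterceptor(Interceptor):
    """Records how long the wrapped computation took."""

    def pre_iteration(self, context: Dict) -> None:
        context["start"] = time.perf_counter()

    def post_iteration(self, context: Dict) -> None:
        context["elapsed"] = time.perf_counter() - context["start"]


class LogInterceptor(Interceptor):
    """A user-defined interceptor that appends events to a list."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def pre_iteration(self, context: Dict) -> None:
        self.log.append(f"pre {context['iteration']}")

    def post_iteration(self, context: Dict) -> None:
        self.log.append(f"post {context['iteration']}")


def run_iteration(iteration: int, interceptors: List[Interceptor],
                  computation: Callable[[int], object]) -> Dict:
    """Run one iteration with every interceptor wrapped around the computation."""
    context: Dict = {"iteration": iteration}
    for interceptor in interceptors:
        interceptor.pre_iteration(context)
    context["result"] = computation(iteration)
    # Assumption: post hooks unwind in reverse, like nested wrappers.
    for interceptor in reversed(interceptors):
        interceptor.post_iteration(context)
    return context
```

Adding a new cross-cutting concern then means appending another interceptor to the list, without touching the master or worker computation.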

GUAGUA RUNTIME
Master: Mapper (Container)
Worker: Mapper (Container)

GUAGUA RUNTIME
[Diagram: the master and every worker REGISTER with the ZooKeeper cluster.]
1. The master listens on the znodes of the workers.
2. The workers listen on the znode of the master.

GUAGUA RUNTIME
[Diagram: the master and every worker UPDATE ITER in the ZooKeeper cluster.]
1. Data is loaded into worker memory in the first iteration.
2. The whole process finishes when the maximum iteration count is reached or a halt condition is triggered.
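
The runtime loop can be simulated in a single process to show how the pieces fit together. This is only a sketch under stated assumptions: `FakeZnodes` is an in-memory dictionary standing in for ZooKeeper (no real ZooKeeper client is used), the znode paths are invented, the workers run sequentially, and the computation (moving an estimate halfway toward the global mean each iteration) is a toy chosen so that a halt condition exists.

```python
from typing import Any, Dict, List, Tuple


class FakeZnodes:
    """In-memory stand-in for ZooKeeper znodes (path -> data)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Any] = {}

    def set(self, path: str, data: Any) -> None:
        self._nodes[path] = data

    def get(self, path: str) -> Any:
        return self._nodes.get(path)


def run_job(shards: List[List[float]], max_iterations: int,
            tolerance: float = 1e-6) -> Tuple[float, int]:
    """Return (final estimate, iterations run)."""
    zk = FakeZnodes()
    memory: Dict[int, List[float]] = {}  # per-worker in-memory data
    zk.set("/master/0", 0.0)             # initial master result
    iterations_run = 0
    for iteration in range(1, max_iterations + 1):
        iterations_run = iteration
        estimate = zk.get(f"/master/{iteration - 1}")
        for worker_id, shard in enumerate(shards):
            if iteration == 1:
                # Data is loaded into worker memory only in the first iteration.
                memory[worker_id] = list(shard)
            data = memory[worker_id]
            residual = sum(x - estimate for x in data)
            zk.set(f"/workers/{worker_id}/{iteration}", (residual, len(data)))
        # The master reads every worker's znode for this iteration.
        results = [zk.get(f"/workers/{w}/{iteration}") for w in range(len(shards))]
        total = sum(r[0] for r in results)
        count = sum(r[1] for r in results)
        step = 0.5 * total / count
        zk.set(f"/master/{iteration}", estimate + step)
        if abs(step) < tolerance:
            break  # halt condition triggered before the maximum iteration
    return zk.get(f"/master/{iterations_run}"), iterations_run
```

With a large `max_iterations` the loop stops early through the halt condition. With a small one it stops at the maximum iteration instead.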

FAULT TOLERANCE
[Diagram: one Master and five Workers shown at each of iterations 1, 2, 3, 4, … n.]

FAULT TOLERANCE
[Diagram: Master and Workers shown across iterations 1, 2, 3, 4, … n.]
* The same on workers.

STRAGGLER MITIGATION
[Diagrams: three panels, each showing a Master and its Workers over iterations 1, 2, 3.]

PROGRESS AND STATE REPORT
Progress = (Current Iteration) / (Total Iterations), e.g. 0.86 = 432/501
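
The ratio on this slide is simple to reproduce. The helper below is a hypothetical illustration of the arithmetic, not Guagua's reporting code.

```python
def progress(current_iteration: int, total_iterations: int) -> float:
    """Fraction of the job completed, as current / total iterations."""
    if total_iterations <= 0:
        raise ValueError("total_iterations must be positive")
    return current_iteration / total_iterations
```

Formatting `progress(432, 501)` to two decimal places gives `0.86`, the value shown on the slide.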

WHAT IS SHIFU?
Shifu* is an open-source, end-to-end machine learning and data
mining framework built on top of Hadoop.
Steps: NEW → INIT → STATS → VARSELECT → NORMALIZE → TRAIN (built on Guagua) → POSTTRAIN → EVAL
*Want to try Shifu? Please visit http://shifu.ml.

SHIFU ON GUAGUA (TRAIN STEP)
[Class diagram relating NNMaster, NNWorker and NNOutput to MasterComputable, WorkerComputable, AbstractWorkerComputable, MasterInterceptor and BasicMasterInterceptor, together with Gradients and Weights. Color legend: Guagua API, Shifu code, Encog code.]

SHIFU NN vs. SPARK LR
Run Time Comparison
Shifu-NN: 1102*20*1 Network, 319 Mappers * 1G
Spark-LR: 1102 features, 120 executors * 3G
[Chart: time (y-axis, 0 to 45) per iteration (x-axis, 1 to 20) for Shifu-NN and Spark-LR.]

SHIFU NN BENCHMARK RESULTS
All data is held in memory. At most 2400 mappers were used, and each run used 20 epochs.
[Chart: run time in seconds (y-axis, 0 to 1400) against size of input (x-axis: 125G, 375G, 500G, 625G, 750G, 875G, 1000G).]

WHAT’S NEXT?
• More open source docs
• Support more (distributed) machine learning algorithms
• Improve YARN (Beta) implementation
• Support more input formats
• Big model support
• Deep learning support

APPENDIX
• Website: http://shifu.ml and http://shifu.ml/docs/guagua/
• Shifu issue website: https://github.com/shifuml/shifu/issues
• Guagua issue website: https://github.com/shifuml/guagua/issues
• Shifu source code: https://github.com/shifuml/shifu/
• Guagua source code: https://github.com/shifuml/guagua/
