On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms

Marat Zhanikeev
maratishe@gmail.com
maratishe.github.io
Tokyo Univ. of Science
WebDB Forum 2017 @ Ochanomizu University
PDF → bit.do/170920

Background on Hadoop
• Hadoop performance measurement
◦ creators on performance limits 09
◦ superlinear effect 08
◦ various benchmarks on Hadoop vs Spark 07
◦ inconsistencies in measurements 11
• Hadoop/MapReduce optimization in 14 and a ton of other papers
• the ”Do We (actually) Need Hadoop?” argument in 10 and a few recent papers
09 K.Shvachko+0 ”HDFS scalability: the limits to growth” Usenix Login (2010)
08 N.Gunther+2 ”Hadoop Superlinear Scalability” ACM Queue (2015)
07 J.Shi+6 ”Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics” Very Large Data Bases (2015)
11 M.Xia+3 ”Performance Inconsistency in Large Scale Data Processing Clusters” 10th USENIX ICAC (2013)
14 A.Rasooli+1 ”COSHH: A Classification and Optimization based Scheduler for Heterogeneous Hadoop Systems” Future Gen.Comp.Sys. (2014)
10 A.Rowstron+1 ”Nobody ever got fired for using Hadoop on a cluster” 1st HotCDP (2012)

Modeling Hadoop Bottlenecks
[Figure: bottleneck model. Components: Network (NW), Bulk Storage (BS), RAM-based Shared Memory (sSM), On-Chip Shared Memory (hSM), Core Output. Axes: number of parallel accesses versus ability to isolate the bottleneck (pipe width). Regions: Big Data Processing; HPC, Simulators, Modeling; Small Data.]

Hadoop’s Answer: Rack Awareness
[Figure: rack topology. Physical view: clients, a core switch, rack switches, and the datanodes under each rack switch. Logical view from a client: own rack switch, other rack switches, datanodes.]
• official Hadoop feature (not a bug) 12
• some dynamics: goes off-rack when local nodes have too many jobs
• sadly, rack affiliation is configured manually (much potential here for research on virtual network coordinates: Meridian, Vivaldi, ...)
12 ”Hadoop: Rack Awareness” https://hadoop.apache.org (2017)
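The manual configuration mentioned above is usually supplied to Hadoop as a topology script, registered through the `net.topology.script.file.name` property, which receives host names or IP addresses as arguments and prints one rack path per argument. A minimal sketch in Python follows; the subnet-to-rack table and the rack names are hypothetical and only illustrate the shape of such a script.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack-topology script (the mapping table is hypothetical)."""
import sys

# Hypothetical, hand-maintained mapping from address prefix to rack path.
RACK_BY_PREFIX = {
    "10.1.1.": "/rack1",
    "10.1.2.": "/rack2",
}

# Hadoop's conventional fallback rack for unknown hosts.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack path for one host name or IP address."""
    for prefix, rack in RACK_BY_PREFIX.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK


def main(argv):
    # Hadoop may pass several hosts per call; print one rack per argument.
    for host in argv:
        print(resolve_rack(host))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Every new rack or subnet needs an edit to the table, which is the manual step that coordinate systems such as Meridian or Vivaldi could potentially replace.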

Hadoop vs Bigdata Replay Method
• the basic idea is similar to 10, but uses circuits 02 to transfer shards and multicore 01 to process them in parallel
[Figure: the two architectures side by side. Hadoop: you deploy, start, and use your code through a Hadoop client on a client machine; name server(s) and the Hadoop space manager locate files (file A, file B, file C, …) on many storage nodes (shards), where many MapReduce jobs (your code) find, read, and parse the data. Replay: you start and use your sketcher through a client on a client machine; a manager schedules shards from storage nodes with time-aware sub-stores onto many replay nodes, which run multicore replay. Both serve many users inside the DC.]
10 A.Rowstron+1 ”Nobody ever got fired for using Hadoop on a cluster” 1st HotCDP (2012)
02 M.Zhanikeev+0 ”Circuit Emulation for Big Data Transfers in Clouds” Networking for Big Data, CRC (2015)
01 M.Zhanikeev+0 ”Streaming Algorithms for Big Data Processing on Multicore” Big Data: Algorithms, Analytics, and Applications, CRC (2015)

Replay Environment is Highly Flexible!
• replay is time-aligned, so jobs can pick any spot on the timeline
• similar to Spark in going beyond the key-value datatype, but goes further: the full scope of streaming algorithms 01
• massively multicore environments 04 with 100+ cores, dynamic re-packing of job batches, etc.
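The time-aligned property can be sketched as a single pass over time-ordered records in which each job attaches its own window on the timeline. This is only an illustration under stated assumptions: the `SketchJob` class, its counting sketch, and the half-open `[start, end)` window are hypothetical and are not taken from the replay platform itself.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SketchJob:
    """A hypothetical job: one streaming sketch over a window of the timeline."""
    name: str
    start: int  # first timestamp the job is interested in (inclusive)
    end: int    # timestamp at which the job detaches (exclusive)
    counts: Counter = field(default_factory=Counter)

    def update(self, key):
        # The simplest possible sketch: exact per-key counts.
        self.counts[key] += 1


def replay(records, jobs):
    """Feed time-ordered (timestamp, key) records to every job whose window is open."""
    for timestamp, key in records:
        for job in jobs:
            if job.start <= timestamp < job.end:
                job.update(key)
    return jobs


records = [(0, "a"), (1, "b"), (2, "a"), (3, "a")]
jobs = replay(records, [SketchJob("early", 0, 2), SketchJob("late", 2, 4)])
for job in jobs:
    print(job.name, dict(job.counts))
```

The data is read once regardless of how many jobs are attached, which is the property the replay side of the comparison relies on.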
[Figure: replay environment. A replay manager moves a cursor along time-aligned big data in the time direction; jobs (one sketch each) start and end at their chosen points and run on cores 1 to X. Data is read and prepared into a shared-memory buffer (buffer head = now, buffer tail, per-job positions); a controller manages jobs in real time (report, kill). Replay at scale: one replay batch holds several buffers, each with its own jobs.]
01 M.Zhanikeev+0 ”Streaming Algorithms for Big Data Processing on Multicore” Big Data: Algorithms, Analytics, and Applications, CRC (2015)
04 M.Zhanikeev+0 ”Volume and Irregularity Effects on Massively Multicore Packet Processors” APNOMS (2016)

Performance under hotspots

The Hotspot Distribution
[Figure: hotspot distributions for Class A, Class B, Class C, Class D, and Class E. x-axis: items in decreasing order (0 to 100); y-axis: log(value) (0 to 2.8).]
• models Flash/Hotspot/Killerapp/Blackswan events using extreme variance in popularity
• generation method: stick-breaking process, Dirichlet distribution with parallel beta sources 05
• final step: classify based on the number of hot/flash items
05 M.Zhanikeev+1 ”Popularity-Based Modeling of Flash Events in Synthetic Packet Traces” IEICE CQ Technical Committee (2012)
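The generation and classification steps can be illustrated with a plain stick-breaking process. This is a minimal sketch and not the parallel-beta-source method of 05: it uses a single Beta(1, alpha) source, and `alpha`, the `factor` threshold for calling an item hot, and the function names are all assumptions made for illustration.

```python
import random


def stick_breaking(n, alpha, rng):
    """Break a unit stick into n popularity weights, returned in decreasing order."""
    remaining = 1.0
    weights = []
    for _ in range(n - 1):
        fraction = rng.betavariate(1.0, alpha)  # share taken from what is left
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # the last item takes the rest of the stick
    return sorted(weights, reverse=True)


def count_hot(weights, factor=5.0):
    """Count items whose weight exceeds factor times the mean weight."""
    mean = sum(weights) / len(weights)
    return sum(1 for w in weights if w > factor * mean)


weights = stick_breaking(100, alpha=5.0, rng=random.Random(1))
print("hot items:", count_hot(weights))
```

A smaller `alpha` concentrates popularity in fewer items, so the hot-item count used for classification changes with it.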

The Binary ”Till Contention” Metric
• not common, but a very realistic way to model performance under load
• note: even more applicable under hotspot-y input
[Figure: a client reaches data shards in racks across the rack border (switch); as volume grows, the border crosses the threshold from contention-free to contention-ful.]
• example: server response time as a function of load can be expressed as:
$$T = \frac{1}{2}\left[(L - n) + \sqrt{\frac{(L - n)^2 + k}{1 - L}}\right]$$
• ...where T is response time, L is load, and k is the knee = contention point!
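One way to read the binary metric is as the time at which the volume crossing the rack border first exceeds the contention threshold. The sketch below assumes exactly that: per-step volumes compared against a fixed threshold, with the step index as the result. The function name and the per-step (rather than cumulative) comparison are assumptions.

```python
def time_till_contention(volumes, threshold):
    """Return the first step whose volume exceeds the threshold, or None if none does."""
    for step, volume in enumerate(volumes):
        if volume > threshold:
            return step
    return None


# Volume per time step at the rack border, in arbitrary units.
print(time_till_contention([1, 2, 5, 3], threshold=4))
```

The metric is binary in the sense that each step is either contention-free or contention-ful; only the position of the first crossing is recorded.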

Performance Models
• shard size is S; the ratio of in-job traffic to shard size is r
◦ so a Hadoop job generates traffic rS, versus always strictly S under Replay
• contention threshold is C (covers contention and/or capacity)
• list of shard hotness (popularity) $\{h_1, h_2, h_3, \ldots, h_n\}$ and sizes $\{S_1, S_2, S_3, \ldots, S_n\}$
• then we have (job/traffic) volume for Hadoop:
$$V_{hadoop} = \sum_{i=1}^{n} r h_i S_i$$
• ... and for Replay method:
$$V_{replay} = \sum_{i=1}^{n} S_i \qquad (1)$$
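The two volume formulas translate directly into code. The sketch below only evaluates the sums as written; the sample hotness values, sizes, and ratio r are made up for illustration.

```python
def hadoop_volume(hotness, sizes, r):
    """V_hadoop: every access to shard i moves r * S_i, and shard i is accessed h_i times."""
    return sum(r * h * s for h, s in zip(hotness, sizes))


def replay_volume(sizes):
    """V_replay: every shard is transferred once, regardless of its hotness."""
    return sum(sizes)


hotness = [3, 1]   # hypothetical popularity of two shards
sizes = [10, 20]   # hypothetical shard sizes
print(hadoop_volume(hotness, sizes, r=0.5), replay_volume(sizes))
```

Either volume can then be compared against the contention threshold C; only the Hadoop volume grows with the hotness values.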

Results
[Figure: three panels comparing hadoop and replay for Classes A to E, each class shown at the labelled values 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, and 0.5. x-axis ticks: 10 to 10000. y-axis: log(1 + time till contention), reaching 4.2, 4.8, and 5.4 in the three panels. Annotation on the first panel: Replay period (step) is 10.]

That’s all, thank you ...