Big Data Meets HPC – Exploiting HPC Technologies for Accelerating Big Data Processing

Dhabaleswar K. (DK) Panda
The Ohio State University
E-mail: panda@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~panda

Keynote Talk at HPCAC-Switzerland (Mar 2016)

Introduction to Big Data Applications and Analytics

•  Big Data has become one of the most important elements of business analytics
•  Provides groundbreaking opportunities for enterprise information management and decision making
•  The amount of data is exploding; companies are capturing and digitizing more information than ever
•  The rate of information growth appears to be exceeding Moore's Law

4V Characteristics of Big Data

•  Commonly accepted 3V's of Big Data
   •  Volume, Velocity, Variety
   Michael Stonebraker: Big Data Means at Least Three Different Things, http://www.nist.gov/itl/ssd/is/upload/NIST-stonebraker.pdf
•  4/5V's of Big Data – 3V + *Veracity, *Value

Courtesy: http://api.ning.com/files/tRHkwQN7s-Xz5cxylXG004GLGJdjoPd6bVfVBwvgu*F5MwDDUCiHHdmBW-JTEz0cfJjGurJucBMTkIUNdL3jcZT8IPfNWfN9/dv1.jpg

Data Generation in Internet Services and Applications

•  Webpages (content, graph)
•  Clicks (ad, page, social)
•  Users (OpenID, FB Connect, etc.)
•  e-mails (Hotmail, Y!Mail, Gmail, etc.)
•  Photos, Movies (Flickr, YouTube, Video, etc.)
•  Cookies / tracking info (see Ghostery)
•  Installed apps (Android Market, App Store, etc.)
•  Location (Latitude, Loopt, Foursquare, Google Now, etc.)
•  User generated content (Wikipedia & co, etc.)
•  Ads (display, text, DoubleClick, Yahoo, etc.)
•  Comments (Disqus, Facebook, etc.)
•  Reviews (Yelp, Y!Local, etc.)
•  Social connections (LinkedIn, Facebook, etc.)
•  Purchase decisions (Netflix, Amazon, etc.)
•  Instant Messages (YIM, Skype, Gtalk, etc.)
•  Search terms (Google, Bing, etc.)
•  News articles (BBC, NYTimes, Y!News, etc.)
•  Blog posts (Tumblr, Wordpress, etc.)
•  Microblogs (Twitter, Jaiku, Meme, etc.)
•  Link sharing (Facebook, Delicious, Buzz, etc.)

Number of Apps in the Apple App Store, Android Market, Blackberry, and Windows Phone (2013)
•  Android Market: ~1200K
•  Apple App Store: ~1000K

Courtesy: http://dazeinfo.com/2014/07/10/apple-inc-aapl-ios-google-inc-goog-android-growth-mobile-ecosystem-2014/

Velocity of Big Data – How Much Data Is Generated Every Minute on the Internet?

The global Internet population grew 18.5% from 2013 to 2015 and now represents 3.2 Billion People.

Courtesy: https://www.domo.com/blog/2015/08/data-never-sleeps-3-0/

Not Only in Internet Services – Big Data in Scientific Domains

•  Scientific Data Management, Analysis, and Visualization
•  Application examples
   –  Climate modeling
   –  Combustion
   –  Fusion
   –  Astrophysics
   –  Bioinformatics
•  Data Intensive Tasks
   –  Run large-scale simulations on supercomputers
   –  Dump data on parallel storage systems
   –  Collect experimental / observational data
   –  Move experimental / observational data to analysis sites
   –  Visual analytics – help understand data visually

Typical Solutions or Architectures for Big Data Analytics

•  Hadoop: http://hadoop.apache.org
   –  The most popular framework for Big Data analytics
   –  HDFS, MapReduce, HBase, RPC, Hive, Pig, ZooKeeper, Mahout, etc.
•  Spark: http://spark-project.org
   –  Provides primitives for in-memory cluster computing; jobs can load data into memory and query it repeatedly
•  Storm: http://storm-project.net
   –  A distributed real-time computation system for real-time analytics, online machine learning, continuous computation, etc.
•  S4: http://incubator.apache.org/s4
   –  A distributed system for processing continuous unbounded streams of data
•  GraphLab: http://graphlab.org
   –  Consists of a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API
•  Web 2.0: RDBMS + Memcached (http://memcached.org)
   –  Memcached: a high-performance, distributed memory object caching system

Big Data Processing with Hadoop Components

•  Major components included in this tutorial:
   –  MapReduce (Batch)
   –  HBase (Query)
   –  HDFS (Storage)
   –  RPC (Inter-process communication)
•  Underlying Hadoop Distributed File System (HDFS) used by both MapReduce and HBase
•  Model scales, but the high amount of communication during intermediate phases can be further optimized

[Figure: Hadoop framework stack – User Applications over MapReduce and HBase, with HDFS and Hadoop Common (RPC) underneath]

Spark Architecture Overview

•  An in-memory data-processing framework
   –  Iterative machine learning jobs
   –  Interactive data analytics
   –  Scala based implementation
   –  Standalone, YARN, Mesos
•  Scalable and communication intensive
   –  Wide dependencies between Resilient Distributed Datasets (RDDs)
   –  MapReduce-like shuffle operations to repartition RDDs
   –  Sockets based communication

[Figure: Spark cluster – a Driver with SparkContext coordinating Workers through a Master, with Zookeeper and HDFS]

http://spark.apache.org
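
These two traits can be made concrete with a small job. The following is a minimal sketch against the Spark 1.x Java API, assuming a spark-submit deployment and hypothetical HDFS paths: an RDD is cached in memory and queried repeatedly, and a reduceByKey step introduces the wide dependency that triggers a MapReduce-like shuffle.

```java
// Minimal sketch of the two traits above: caching an RDD in memory for
// repeated queries, and a reduceByKey step whose wide dependency triggers a
// MapReduce-like shuffle across workers. Paths are illustrative assumptions.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;

public class SparkShuffleSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("shuffle-sketch"); // master set by spark-submit
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Load once, cache in memory, query repeatedly (iterative/interactive use).
    JavaRDD<String> lines = sc.textFile("hdfs:///data/input").cache();
    long total = lines.count();                               // first action materializes the cache
    long nonEmpty = lines.filter(s -> !s.isEmpty()).count();  // served from memory

    // reduceByKey repartitions the RDD by key: a wide dependency, hence a shuffle.
    JavaPairRDD<String, Integer> counts = lines
        .flatMap(s -> Arrays.asList(s.split(" ")))            // Spark 1.x API (returns Iterable)
        .mapToPair(w -> new Tuple2<>(w, 1))
        .reduceByKey((a, b) -> a + b);
    counts.saveAsTextFile("hdfs:///data/output");
    sc.stop();
  }
}
```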

Memcached Architecture

•  Distributed Caching Layer
   –  Allows aggregating spare memory from multiple nodes
   –  General purpose
•  Typically used to cache database queries, results of API calls
•  Scalable model, but typical usage very network intensive

[Figure: multiple nodes, each with main memory, CPUs, SSD, and HDD, connected through high performance networks]
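
To ground "cache database queries": the cache-aside pattern below is a minimal sketch using the spymemcached Java client; the server host, key scheme, TTL, and queryDatabase() helper are illustrative assumptions.

```java
// Cache-aside sketch of caching database queries: check Memcached first, fall
// back to the database on a miss, then populate the cache. Host, key scheme,
// TTL, and queryDatabase() are assumptions for illustration.
import net.spy.memcached.MemcachedClient;
import java.io.IOException;
import java.net.InetSocketAddress;

public class QueryCacheSketch {
  private final MemcachedClient mc;

  public QueryCacheSketch() throws IOException {
    mc = new MemcachedClient(new InetSocketAddress("cache-node", 11211));
  }

  public String lookup(String sql) {
    String key = "q:" + Integer.toHexString(sql.hashCode()); // assumed key scheme
    String cached = (String) mc.get(key);                    // one network round trip
    if (cached != null) return cached;                       // cache hit
    String result = queryDatabase(sql);                      // slow path: hit the DB
    mc.set(key, 300, result);                                // cache for 300 seconds
    return result;
  }

  private String queryDatabase(String sql) { /* hypothetical DB call */ return ""; }
}
```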

Data Management and Processing on Modern Clusters

•  Substantial impact on designing and utilizing data management and processing systems in multiple tiers
   –  Front-end data accessing and serving (Online)
      •  Memcached + DB (e.g. MySQL), HBase
   –  Back-end data analytics (Offline)
      •  HDFS, MapReduce, Spark

[Figure: Internet-facing front-end tier of web servers backed by Memcached + DB (MySQL) and NoSQL DB (HBase) for data accessing and serving; back-end tier of HDFS with MapReduce and Spark for data analytics apps/jobs]

Open Standard InfiniBand Networking Technology

•  Introduced in Oct 2000
•  High Performance Data Transfer
   –  Interprocessor communication and I/O
   –  Low latency (<1.0 microsec), high bandwidth (up to 12.5 GigaBytes/sec -> 100Gbps), and low CPU utilization (5-10%)
•  Multiple Operations
   –  Send/Recv
   –  RDMA Read/Write
   –  Atomic Operations (unique)
      •  high performance and scalable implementations of distributed locks, semaphores, collective communication operations
•  Leading to big changes in designing
   –  HPC clusters
   –  File systems
   –  Cloud computing systems
   –  Grid computing systems

Interconnects and Protocols in OpenFabrics Stack

[Figure: Application/Middleware sits on either the Sockets or the Verbs interface; each column below is one interconnect/protocol option in the OpenFabrics stack]

  Interface | Protocol                    | Adapter            | Switch
  ----------+-----------------------------+--------------------+------------------
  Sockets   | TCP/IP (kernel space)       | Ethernet Adapter   | Ethernet Switch    – 1/10/40/100 GigE
  Sockets   | TCP/IP w/ Hardware Offload  | Ethernet Adapter   | Ethernet Switch    – 10/40 GigE-TOE
  Sockets   | IPoIB (kernel space)        | InfiniBand Adapter | InfiniBand Switch  – IPoIB
  Sockets   | RSockets (user space)       | InfiniBand Adapter | InfiniBand Switch  – RSockets
  Sockets   | SDP (kernel space)          | InfiniBand Adapter | InfiniBand Switch  – SDP
  Verbs     | TCP/IP, RDMA (user space)   | iWARP Adapter      | Ethernet Switch    – iWARP
  Verbs     | RDMA (user space)           | RoCE Adapter       | Ethernet Switch    – RoCE
  Verbs     | RDMA (user space)           | InfiniBand Adapter | InfiniBand Switch  – IB Native

How Can HPC Clusters with High-Performance Interconnect and Storage Architectures Benefit Big Data Applications?

•  What are the major bottlenecks in current Big Data processing middleware (e.g. Hadoop, Spark, and Memcached)?
•  Can the bottlenecks be alleviated with new designs by taking advantage of HPC technologies?
•  Can RDMA-enabled high-performance interconnects benefit Big Data processing?
•  Can HPC clusters with high-performance storage systems (e.g. SSD, parallel file systems) benefit Big Data applications?
•  How much performance benefit can be achieved through enhanced designs?
•  How to design benchmarks for evaluating the performance of Big Data middleware on HPC clusters?

Bring HPC and Big Data processing into a "convergent trajectory"!

Designing Communication and I/O Libraries for Big Data Systems: Challenges

[Figure: Applications and Big Data middleware (HDFS, MapReduce, HBase, Spark and Memcached) over programming models (Sockets – other protocols?) and a communication and I/O library (point-to-point communication, threaded models and synchronization, virtualization, I/O and file systems, QoS, fault-tolerance, benchmarks – upper level changes?), running on commodity computing system architectures (multi- and many-core architectures and accelerators), networking technologies (InfiniBand, 1/10/40/100 GigE and intelligent NICs), and storage technologies (HDD, SSD, and NVMe-SSD)]

Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?

•  Sockets not designed for high-performance
   –  Stream semantics often mismatch for upper layers
   –  Zero-copy not available for non-blocking sockets

Current Design: Application -> Sockets -> 1/10/40/100 GigE Network
Our Approach: Application -> OSU Design -> Verbs Interface -> 10/40/100 GigE or InfiniBand

The High-Performance Big Data (HiBD) Project

•  RDMA for Apache Spark
•  RDMA for Apache Hadoop 2.x (RDMA-Hadoop-2.x)
   –  Plugins for Apache, Hortonworks (HDP) and Cloudera (CDH) Hadoop distributions
•  RDMA for Apache Hadoop 1.x (RDMA-Hadoop)
•  RDMA for Memcached (RDMA-Memcached)
•  OSU HiBD-Benchmarks (OHB)
   –  HDFS and Memcached Micro-benchmarks
•  http://hibd.cse.ohio-state.edu
•  User base: 155 organizations from 20 countries
•  More than 15,500 downloads from the project site
•  RDMA for Apache HBase and Impala (upcoming)

Available for InfiniBand and RoCE

RDMA for Apache Hadoop 2.x Distribution

•  High-Performance Design of Hadoop over RDMA-enabled Interconnects
   –  High performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs-level for HDFS, MapReduce, and RPC components
   –  Enhanced HDFS with in-memory and heterogeneous storage
   –  High performance design of MapReduce over Lustre
   –  Plugin-based architecture supporting RDMA-based designs for Apache Hadoop, CDH and HDP
   –  Easily configurable for different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre) and different protocols (native InfiniBand, RoCE, and IPoIB)
•  Current release: 0.9.9
   –  Based on Apache Hadoop 2.7.1
   –  Compliant with Apache Hadoop 2.7.1, HDP 2.3.0.0 and CDH 5.6.0 APIs and applications
   –  Tested with
      •  Mellanox InfiniBand adapters (DDR, QDR and FDR)
      •  RoCE support with Mellanox adapters
      •  Various multi-core platforms
      •  Different file systems with disks and SSDs and Lustre
   –  http://hibd.cse.ohio-state.edu

RDMA for Apache Spark Distribution

•  High-Performance Design of Spark over RDMA-enabled Interconnects
   –  High performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs-level for Spark
   –  RDMA-based data shuffle and SEDA-based shuffle architecture
   –  Non-blocking and chunk-based data transfer
   –  Easily configurable for different protocols (native InfiniBand, RoCE, and IPoIB)
•  Current release: 0.9.1
   –  Based on Apache Spark 1.5.1
   –  Tested with
      •  Mellanox InfiniBand adapters (DDR, QDR and FDR)
      •  RoCE support with Mellanox adapters
      •  Various multi-core platforms
      •  RAM disks, SSDs, and HDD
   –  http://hibd.cse.ohio-state.edu

RDMA for Memcached Distribution

•  High-Performance Design of Memcached over RDMA-enabled Interconnects
   –  High performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs-level for Memcached and libMemcached components
   –  High performance design of SSD-Assisted Hybrid Memory
   –  Easily configurable for native InfiniBand, RoCE and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
•  Current release: 0.9.4
   –  Based on Memcached 1.4.24 and libMemcached 1.0.18
   –  Compliant with libMemcached APIs and applications
   –  Tested with
      •  Mellanox InfiniBand adapters (DDR, QDR and FDR)
      •  RoCE support with Mellanox adapters
      •  Various multi-core platforms
      •  SSD
   –  http://hibd.cse.ohio-state.edu

OSU HiBD Micro-Benchmark (OHB) Suite – HDFS & Memcached

•  Micro-benchmarks for Hadoop Distributed File System (HDFS)
   –  Sequential Write Latency (SWL) Benchmark, Sequential Read Latency (SRL) Benchmark, Random Read Latency (RRL) Benchmark, Sequential Write Throughput (SWT) Benchmark, Sequential Read Throughput (SRT) Benchmark
   –  Support benchmarking of
      •  Apache Hadoop 1.x and 2.x HDFS, Hortonworks Data Platform (HDP) HDFS, Cloudera Distribution of Hadoop (CDH) HDFS
•  Micro-benchmarks for Memcached
   –  Get Benchmark, Set Benchmark, and Mixed Get/Set Benchmark
•  Current release: 0.8
•  http://hibd.cse.ohio-state.edu
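
A Get-latency micro-benchmark of the kind listed above reduces to timing individual request-response round trips. The sketch below shows that measurement pattern with the spymemcached Java client; OHB itself is a separate implementation, and the payload size, iteration counts, and key name here are assumptions.

```java
// Measurement pattern for a Get-latency micro-benchmark: populate, warm up,
// then time each get() round trip and report the average. The client library,
// iteration counts, and key naming are illustrative, not the OHB code.
import net.spy.memcached.MemcachedClient;
import java.net.InetSocketAddress;

public class GetLatencySketch {
  public static void main(String[] args) throws Exception {
    MemcachedClient mc = new MemcachedClient(new InetSocketAddress("server", 11211));
    byte[] value = new byte[4096];                    // 4 KB payload
    mc.set("bench", 0, value).get();                  // populate and wait for the set

    for (int i = 0; i < 1000; i++) mc.get("bench");   // warm-up iterations

    int iters = 100000;
    long start = System.nanoTime();
    for (int i = 0; i < iters; i++) mc.get("bench");  // timed round trips
    double avgUs = (System.nanoTime() - start) / (iters * 1000.0);
    System.out.printf("avg get latency: %.2f us%n", avgUs);
    mc.shutdown();
  }
}
```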

Different Modes of RDMA for Apache Hadoop 2.x

•  HHH: Heterogeneous storage devices with hybrid replication schemes are supported in this mode of operation to have better fault-tolerance as well as performance. This mode is enabled by default in the package.
•  HHH-M: A high-performance in-memory based setup has been introduced in this package that can be utilized to perform all I/O operations in-memory and obtain as much performance benefit as possible.
•  HHH-L: With parallel file systems integrated, HHH-L mode can take advantage of the Lustre available in the cluster.
•  MapReduce over Lustre, with/without local disks: Besides HDFS based solutions, this package also provides support to run MapReduce jobs on top of Lustre alone. Here, two different modes are introduced: with local disks and without local disks.
•  Running with Slurm and PBS: Supports deploying RDMA for Apache Hadoop 2.x with Slurm and PBS in different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre).

Acceleration Case Studies and Performance Evaluation

•  RDMA-based Designs and Performance Evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (Basic, Hybrid and Non-blocking APIs)
   –  HDFS + Memcached-based Burst Buffer
   –  OSU HiBD Benchmarks (OHB)

Design Overview of HDFS with RDMA

•  Enables high performance RDMA communication, while supporting traditional socket interface
•  JNI Layer bridges Java based HDFS with communication library written in native code
•  Design Features
   –  RDMA-based HDFS write
   –  RDMA-based HDFS replication
   –  Parallel replication support
   –  On-demand connection setup
   –  InfiniBand/RoCE support

[Figure: Applications over HDFS; the write path goes through the Java Native Interface (JNI) to the OSU verbs-based design on RDMA capable networks (IB, iWARP, RoCE ..), while other operations use the Java socket interface over 1/10/40/100 GigE or IPoIB networks]

N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy and D. K. Panda, High Performance RDMA-Based Design of HDFS over InfiniBand, Supercomputing (SC), Nov 2012
N. Islam, X. Lu, W. Rahman, and D. K. Panda, SOR-HDFS: A SEDA-based Approach to Maximize Overlapping in RDMA-Enhanced HDFS, HPDC '14, June 2014
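
Because the package is API-compliant (see the release notes earlier), applications keep writing through the stock HDFS interface and the RDMA path is taken underneath the JNI layer. A minimal sketch of such a write, with an assumed path and payload:

```java
// A standard HDFS write through the public FileSystem API. In the
// RDMA-enhanced design described above, the same call path is redirected
// below the JNI layer to the verbs-based library; no application change is
// needed. The path and payload here are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class HdfsWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/user/demo/block.dat");
    try (FSDataOutputStream out = fs.create(path, true /* overwrite */)) {
      byte[] payload = "hello hdfs".getBytes(StandardCharsets.UTF_8);
      out.write(payload);                       // pipelined write + replication
      out.hsync();                              // flush to the DataNode pipeline
    }
    fs.close();
  }
}
```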

Enhanced HDFS with In-Memory and Heterogeneous Storage (Triple-H)

•  Design Features
   –  Three modes
      •  Default (HHH)
      •  In-Memory (HHH-M)
      •  Lustre-Integrated (HHH-L)
   –  Policies to efficiently utilize the heterogeneous storage devices
      •  RAM, SSD, HDD, Lustre
   –  Eviction/Promotion based on data usage pattern
   –  Hybrid Replication
   –  Lustre-Integrated mode:
      •  Lustre-based fault-tolerance

[Figure: Applications over Triple-H – heterogeneous storage with hybrid replication, data placement policies, and eviction/promotion across RAM Disk, SSD, HDD, and Lustre]

N. Islam, X. Lu, M. W. Rahman, D. Shankar, and D. K. Panda, Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture, CCGrid '15, May 2015
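
The eviction/promotion policy above can be pictured as a tier-selection step consulted on each block placement. The sketch below is a hypothetical illustration of that idea only; the tier names, thresholds, and access-count heuristic are assumptions, not the Triple-H implementation.

```java
// Hypothetical illustration of usage-driven tiering: place hot blocks high
// (RAM disk), demote cold ones toward HDD/Lustre. Thresholds, tier names,
// and the access-count heuristic are assumptions for illustration.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TieringSketch {
  enum Tier { RAM_DISK, SSD, HDD, LUSTRE }

  private final Map<String, Integer> accessCount = new ConcurrentHashMap<>();

  // Pick a tier for a block based on how often it has been read recently.
  Tier placementFor(String blockId) {
    int hits = accessCount.getOrDefault(blockId, 0);
    if (hits > 100) return Tier.RAM_DISK;   // hot: keep in memory
    if (hits > 10)  return Tier.SSD;        // warm: fast persistent device
    return Tier.HDD;                        // cold: capacity tier (Lustre in HHH-L)
  }

  // Record a read; promotion happens the next time placementFor() is consulted.
  void recordAccess(String blockId) {
    accessCount.merge(blockId, 1, Integer::sum);
  }
}
```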

Design Overview of MapReduce with RDMA

•  Enables high performance RDMA communication, while supporting traditional socket interface
•  JNI Layer bridges Java based MapReduce with communication library written in native code
•  Design Features
   –  RDMA-based shuffle
   –  Prefetching and caching map output
   –  Efficient Shuffle Algorithms
   –  In-memory merge
   –  On-demand Shuffle Adjustment
   –  Advanced overlapping
      •  map, shuffle, and merge
      •  shuffle, merge, and reduce
   –  On-demand connection setup
   –  InfiniBand/RoCE support

[Figure: Applications over MapReduce (Job Tracker, Task Tracker, Map, Reduce); the shuffle path goes through the Java Native Interface (JNI) to the OSU verbs-based design on RDMA capable networks (IB, iWARP, RoCE ..), while other traffic uses the Java socket interface over 1/10/40/100 GigE or IPoIB networks]

M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR: A Hybrid Approach to Exploit Maximum Overlapping in MapReduce over High Performance Interconnects, ICS, June 2014
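
As with HDFS, the RDMA-based shuffle sits beneath the standard MapReduce API, so a shuffle-heavy job runs unchanged. The classic word count below (input/output paths assumed) is the kind of job whose map-to-reduce data movement the design above accelerates.

```java
// Standard shuffle-heavy MapReduce job (word count). Under the design above,
// the map-output shuffle between these two phases moves over RDMA; the
// application code itself is unchanged. Input/output paths are assumed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountSketch {
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    protected void map(LongWritable k, Text v, Context ctx)
        throws java.io.IOException, InterruptedException {
      for (String w : v.toString().split("\\s+"))
        ctx.write(new Text(w), ONE);            // emitted pairs are what get shuffled
    }
  }
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    protected void reduce(Text k, Iterable<IntWritable> vals, Context ctx)
        throws java.io.IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : vals) sum += v.get();
      ctx.write(k, new IntWritable(sum));
    }
  }
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "wordcount-sketch");
    job.setJarByClass(WordCountSketch.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("/user/demo/in"));
    FileOutputFormat.setOutputPath(job, new Path("/user/demo/out"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```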

Advanced Overlapping among Different Phases

•  A hybrid approach to achieve maximum possible overlapping in MapReduce across all phases compared to other approaches
   –  Efficient Shuffle Algorithms
   –  Dynamic and Efficient Switching
   –  On-demand Shuffle Adjustment

[Figure: timelines contrasting the default architecture, enhanced overlapping with in-memory merge, and advanced hybrid overlapping]

M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR: A Hybrid Approach to Exploit Maximum Overlapping in MapReduce over High Performance Interconnects, ICS, June 2014

Design Overview of Hadoop RPC with RDMA

•  Enables high performance RDMA communication, while supporting traditional socket interface
•  JNI Layer bridges Java based RPC with communication library written in native code
•  Design Features
   –  JVM-bypassed buffer management
   –  RDMA or send/recv based adaptive communication
   –  Intelligent buffer allocation and adjustment for serialization
   –  On-demand connection setup
   –  InfiniBand/RoCE support

[Figure: Applications over Hadoop RPC; the default path uses the Java socket interface over 1/10/40/100 GigE or IPoIB networks, while our design goes through the Java Native Interface (JNI) to the OSU verbs-based design on RDMA capable networks (IB, iWARP, RoCE ..)]

X. Lu, N. Islam, M. W. Rahman, J. Jose, H. Subramoni, H. Wang, and D. K. Panda, High-Performance Design of Hadoop RPC with RDMA over InfiniBand, Int'l Conference on Parallel Processing (ICPP '13), October 2013.

Performance Benefits – RandomWriter & TeraGen in TACC-Stampede

[Figure: execution time (s) vs. data size (80-120 GB) for RandomWriter and TeraGen, comparing IPoIB (FDR) against the OSU design; RandomWriter time reduced by 3x, TeraGen by 4x]

Cluster with 32 Nodes with a total of 128 maps
•  RandomWriter
   –  3-4x improvement over IPoIB for 80-120 GB file size
•  TeraGen
   –  4-5x improvement over IPoIB for 80-120 GB file size

Performance Benefits – Sort & TeraSort in TACC-Stampede

[Figure: execution time (s) vs. data size (80-120 GB) for Sort (cluster with 32 nodes, 128 maps and 64 reduces) and TeraSort (cluster with 32 nodes, 128 maps and 57 reduces), comparing IPoIB (FDR) and OSU-IB (FDR); Sort time reduced by 52%, TeraSort by 44%]

•  Sort with single HDD per node
   –  40-52% improvement over IPoIB for 80-120 GB data
•  TeraSort with single HDD per node
   –  42-44% improvement over IPoIB for 80-120 GB data

Evaluation of HHH and HHH-L with Applications

[Figure: MR-MSPolyGraph execution time (s) vs. concurrent maps per host (4, 6, 8) for HDFS, Lustre, and HHH-L, reduced by 79%; CloudBurst execution time of 60.24 s with HDFS (FDR) vs. 48.3 s with HHH (FDR)]

•  MR-MSPolygraph on OSU RI with 1,000 maps
   –  HHH-L reduces the execution time by 79% over Lustre, 30% over HDFS
•  CloudBurst on TACC Stampede
   –  With HHH: 19% improvement over HDFS

Evaluation with Spark on SDSC Gordon (HHH vs. Tachyon/Alluxio)

[Figure: TeraGen and TeraSort execution time (s) vs. cluster size : data size (8:50, 16:100, 32:200 GB) for HDFS-IPoIB (QDR), Tachyon, and OSU-IB (QDR); TeraGen reduced by 2.4x, TeraSort by 25.2%]

•  For 200GB TeraGen on 32 nodes
   –  Spark-TeraGen: HHH has 2.4x improvement over Tachyon; 2.3x over HDFS-IPoIB (QDR)
   –  Spark-TeraSort: HHH has 25.2% improvement over Tachyon; 17% over HDFS-IPoIB (QDR)

N. Islam, M. W. Rahman, X. Lu, D. Shankar, and D. K. Panda, Performance Characterization and Acceleration of In-Memory File Systems for Hadoop and Spark Applications on HPC Clusters, IEEE BigData '15, October 2015

Design Overview of Shuffle Strategies for MapReduce over Lustre

•  Design Features
   –  Two shuffle approaches
      •  Lustre read based shuffle
      •  RDMA based shuffle
   –  Hybrid shuffle algorithm to take benefit from both shuffle approaches
   –  Dynamically adapts to the better shuffle approach for each shuffle request based on profiling values for each Lustre read operation (see the sketch below)
   –  In-memory merge and overlapping of different phases are kept similar to RDMA-enhanced MapReduce design

[Figure: map tasks write intermediate data to Lustre; reduce tasks obtain it via Lustre read or RDMA, followed by in-memory merge/sort and reduce]

M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, High Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA, IPDPS, May 2015
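
The per-request decision referenced above can be rendered as a small cost comparison: keep a running profile of Lustre read bandwidth and choose RDMA whenever the network path looks cheaper. The sketch below is a hypothetical illustration; the moving-average profiling and cost model are assumptions, not the released algorithm.

```java
// Hypothetical sketch of the hybrid per-request choice described above:
// profile Lustre reads, estimate the cost of serving the next shuffle request
// from Lustre, and fall back to RDMA when the network path looks cheaper.
// All names and the cost model are illustrative assumptions.
public class HybridShuffleSketch {
  enum Path { LUSTRE_READ, RDMA }

  private double lustreBytesPerSec = 1e9;   // running estimate, updated by profiling
  private final double rdmaBytesPerSec;     // measured once at startup (assumed)

  HybridShuffleSketch(double rdmaBytesPerSec) { this.rdmaBytesPerSec = rdmaBytesPerSec; }

  // Called after every profiled Lustre read: exponential moving average.
  void recordLustreRead(long bytes, long nanos) {
    double observed = bytes / (nanos / 1e9);
    lustreBytesPerSec = 0.8 * lustreBytesPerSec + 0.2 * observed;
  }

  // Choose the cheaper path for one shuffle request of the given size.
  Path choose(long requestBytes) {
    double lustreCost = requestBytes / lustreBytesPerSec;
    double rdmaCost = requestBytes / rdmaBytesPerSec;
    return (lustreCost <= rdmaCost) ? Path.LUSTRE_READ : Path.RDMA;
  }
}
```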

Performance Improvement of MapReduce over Lustre on TACC-Stampede

•  Local disk is used as the intermediate data directory
•  For 500GB Sort in 64 nodes
   –  44% improvement over IPoIB (FDR)
•  For 640GB Sort in 128 nodes
   –  48% improvement over IPoIB (FDR)

[Figure: job execution time (sec) vs. data size (300-500 GB), and vs. cluster/data scale (20 GB on 4 nodes up to 640 GB on 128 nodes), comparing IPoIB (FDR) and OSU-IB (FDR); reduced by 44% and 48% respectively]

M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, MapReduce over Lustre: Can RDMA-based Approach Benefit?, Euro-Par, August 2014.

Case Study – Performance Improvement of MapReduce over Lustre on SDSC-Gordon

•  Lustre is used as the intermediate data directory
•  For 80GB Sort in 8 nodes
   –  34% improvement over IPoIB (QDR)
•  For 120GB TeraSort in 16 nodes
   –  25% improvement over IPoIB (QDR)

[Figure: job execution time (sec) vs. data size for Sort (40-80 GB) and TeraSort (40-120 GB), comparing IPoIB (QDR), OSU-Lustre-Read (QDR), OSU-RDMA-IB (QDR), and OSU-Hybrid-IB (QDR); reduced by 34% and 25% respectively]

Acceleration Case Studies and Performance Evaluation

•  RDMA-based Designs and Performance Evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (Basic, Hybrid and Non-blocking APIs)
   –  HDFS + Memcached-based Burst Buffer
   –  OSU HiBD Benchmarks (OHB)

HBase-RDMA Design Overview

•  Enables high performance RDMA communication, while supporting traditional socket interface
•  JNI Layer bridges Java based HBase with communication library written in native code

[Figure: Applications over HBase; the OSU-IB design goes through the Java Native Interface (JNI) to IB verbs on RDMA capable networks (IB, iWARP, RoCE ..), while the default path uses the Java socket interface over 1/10/40/100 GigE or IPoIB networks]
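
The YCSB workloads on the next two slides exercise exactly this client-side Get/Put path. A minimal sketch with the standard HBase 1.x Java client follows; the table, column family, and row names are assumptions, and the RDMA-enhanced design leaves this public API untouched.

```java
// Minimal HBase client Get/Put of the kind YCSB issues in the read-write
// workload measured on the next slide. Table and column names are assumed;
// the RDMA-enhanced design leaves this public client API unchanged.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseOpsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("usertable"))) {
      byte[] cf = Bytes.toBytes("f"), qual = Bytes.toBytes("q");

      Put put = new Put(Bytes.toBytes("user1"));             // one YCSB-style write
      put.addColumn(cf, qual, Bytes.toBytes("value-1"));
      table.put(put);

      Result r = table.get(new Get(Bytes.toBytes("user1"))); // one YCSB-style read
      System.out.println(Bytes.toString(r.getValue(cf, qual)));
    }
  }
}
```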

HBase – YCSB Read-Write Workload

[Figure: read and write latency (us) vs. number of clients (8-128) for IPoIB (QDR), OSU-IB (QDR), and 10GigE]

•  HBase Get latency (QDR, 10GigE)
   –  64 clients: 2.0 ms; 128 clients: 3.5 ms
   –  42% improvement over IPoIB for 128 clients
•  HBase Put latency (QDR, 10GigE)
   –  64 clients: 1.9 ms; 128 clients: 3.5 ms
   –  40% improvement over IPoIB for 128 clients

J. Huang, X. Ouyang, J. Jose, M. W. Rahman, H. Wang, M. Luo, H. Subramoni, Chet Murthy, and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, IPDPS'12

HBase – YCSB Get Latency and Throughput on SDSC-Comet

[Figure: average Get latency (ms) and total throughput (Kops/sec) vs. number of client threads (1-4), comparing IPoIB (FDR) and OSU-IB (FDR); 59% latency reduction and 2.4x throughput gain]

•  HBase Get average latency (FDR)
   –  4 client threads: 38 us
   –  59% improvement over IPoIB for 4 client threads
•  HBase Get total throughput
   –  4 client threads: 102 Kops/sec
   –  2.4x improvement over IPoIB for 4 client threads

Acceleration Case Studies and Performance Evaluation

•  RDMA-based Designs and Performance Evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (Basic, Hybrid and Non-blocking APIs)
   –  HDFS + Memcached-based Burst Buffer
   –  OSU HiBD Benchmarks (OHB)

Design Overview of Spark with RDMA

•  Enables high performance RDMA communication, while supporting traditional socket interface
•  JNI Layer bridges Scala based Spark with communication library written in native code
•  Design Features
   –  RDMA based shuffle
   –  SEDA-based plugins
   –  Dynamic connection management and sharing
   –  Non-blocking data transfer
   –  Off-JVM-heap buffer management
   –  InfiniBand/RoCE support

[Figure: Spark applications (Scala/Java/Python) and tasks over the BlockManager/BlockFetcherIterator; shuffle servers and fetchers come in Java NIO (default), Netty (optional), and RDMA (plug-in) variants, running over Java sockets on 1/10 Gig Ethernet/IPoIB (QDR/FDR) or over the RDMA-based shuffle engine (Java/JNI) on native InfiniBand (QDR/FDR)]

X. Lu, M. W. Rahman, N. Islam, D. Shankar, and D. K. Panda, Accelerating Spark with RDMA for Big Data Processing: Early Experiences, Int'l Symposium on High Performance Interconnects (HotI'14), August 2014
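
The SortBy/GroupBy and HiBench results on the following slides stress this shuffle path. The sketch below (Spark 1.x Java API, generated data) produces the two shuffle patterns being measured; with the RDMA shuffle plug-in configured, the same job exercises the verbs path instead of NIO/Netty.

```java
// The two shuffle patterns measured on the following slides: sortByKey and
// groupByKey both force a repartition of the RDD, which is the traffic the
// RDMA shuffle plug-in accelerates. Data generation here is illustrative.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ShufflePatternsSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("shuffle-patterns"));
    Random rnd = new Random(42);
    List<Tuple2<Integer, Integer>> data = new ArrayList<>();
    for (int i = 0; i < 1_000_000; i++) data.add(new Tuple2<>(rnd.nextInt(1000), i));
    JavaPairRDD<Integer, Integer> pairs = sc.parallelizePairs(data, 64); // 64 partitions

    long sorted = pairs.sortByKey().count();    // range-partitioning shuffle (SortBy)
    long grouped = pairs.groupByKey().count();  // hash-partitioning shuffle (GroupBy)
    System.out.println(sorted + " " + grouped);
    sc.stop();
  }
}
```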

Performance Evaluation on SDSC Comet – SortBy/GroupBy

[Figure: total time (sec) vs. data size (64-256 GB) for SortByTest and GroupByTest on 64 worker nodes, 1536 cores, comparing IPoIB and RDMA; reduced by up to 80% and 57% respectively]

•  InfiniBand FDR, SSD, 64 Worker Nodes, 1536 Cores, (1536M 1536R)
•  RDMA-based design for Spark 1.5.1
•  RDMA vs. IPoIB with 1536 concurrent tasks, single SSD per node
   –  SortBy: Total time reduced by up to 80% over IPoIB (56Gbps)
   –  GroupBy: Total time reduced by up to 57% over IPoIB (56Gbps)

Performance Evaluation on SDSC Comet – HiBench Sort/TeraSort

[Figure: total time (sec) vs. data size (64-256 GB) for Sort and TeraSort on 64 worker nodes, 1536 cores, comparing IPoIB and RDMA; reduced by 38% and 15% respectively]

•  InfiniBand FDR, SSD, 64 Worker Nodes, 1536 Cores, (1536M 1536R)
•  RDMA-based design for Spark 1.5.1
•  RDMA vs. IPoIB with 1536 concurrent tasks, single SSD per node
   –  Sort: Total time reduced by 38% over IPoIB (56Gbps)
   –  TeraSort: Total time reduced by 15% over IPoIB (56Gbps)

Performance Evaluation on SDSC Comet – HiBench PageRank

[Figure: total time (sec) vs. data size (Huge, BigData, Gigantic) for PageRank on 32 worker nodes/768 cores and on 64 worker nodes/1536 cores, comparing IPoIB and RDMA; reduced by 37% and 43% respectively]

•  InfiniBand FDR, SSD, 32/64 Worker Nodes, 768/1536 Cores, (768/1536M 768/1536R)
•  RDMA-based design for Spark 1.5.1
•  RDMA vs. IPoIB with 768/1536 concurrent tasks, single SSD per node
   –  32 nodes/768 cores: Total time reduced by 37% over IPoIB (56Gbps)
   –  64 nodes/1536 cores: Total time reduced by 43% over IPoIB (56Gbps)

Acceleration Case Studies and Performance Evaluation

•  RDMA-based Designs and Performance Evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (Basic, Hybrid and Non-blocking APIs)
   –  HDFS + Memcached-based Burst Buffer
   –  OSU HiBD Benchmarks (OHB)

Memcached-RDMA Design

•  Server and client perform a negotiation protocol
   –  Master thread assigns clients to appropriate worker thread
•  Once a client is assigned a verbs worker thread, it can communicate directly and is "bound" to that thread
•  All other Memcached data structures are shared among RDMA and Sockets worker threads
•  Memcached server can serve both socket and verbs clients simultaneously
•  Memcached applications need not be modified; uses verbs interface if available

[Figure: sockets and RDMA clients negotiate with the master thread (1), which hands them to sockets or verbs worker threads (2); all worker threads share the Memcached data structures (memory slabs, items, ...)]

RDMA-Memcached Performance (FDR Interconnect)

[Figure: Memcached GET latency (us) vs. message size (1 byte - 4 KB), and throughput (thousands of transactions per second) vs. number of clients (16-4080), comparing OSU-IB (FDR) and IPoIB (FDR); latency reduced by nearly 20x, throughput improved 2x]

•  Memcached Get latency
   –  4 bytes OSU-IB: 2.84 us; IPoIB: 75.53 us
   –  2K bytes OSU-IB: 4.49 us; IPoIB: 123.42 us
•  Memcached Throughput (4 bytes)
   –  4080 clients OSU-IB: 556 Kops/sec, IPoIB: 233 Kops/s
   –  Nearly 2X improvement in throughput

Experiments on TACC Stampede (Intel SandyBridge Cluster, IB: FDR)

Micro-benchmark Evaluation for OLDP Workloads

[Figure: latency (sec) and throughput (Kq/s) vs. number of clients (64-400), comparing Memcached-IPoIB (32Gbps) and Memcached-RDMA (32Gbps); latency reduced by 66%]

•  Illustration with Read-Cache-Read access pattern using modified mysqlslap load testing tool
•  Memcached-RDMA can
   –  improve query latency by up to 66% over IPoIB (32Gbps)
   –  improve throughput by up to 69% over IPoIB (32Gbps)

D. Shankar, X. Lu, J. Jose, M. W. Rahman, N. Islam, and D. K. Panda, Can RDMA Benefit On-Line Data Processing Workloads with Memcached and MySQL, ISPASS'15

Performance Benefits of Hybrid Memcached (Memory + SSD) on SDSC-Gordon

[Figure: average latency (us) vs. message size, and throughput (million trans/sec) vs. number of clients (64-1024), comparing IPoIB (32Gbps), RDMA-Mem (32Gbps), and RDMA-Hybrid (32Gbps); 2x throughput gain]

•  ohb_memlat & ohb_memthr latency & throughput micro-benchmarks
•  Memcached-RDMA can
   –  improve query latency by up to 70% over IPoIB (32Gbps)
   –  improve throughput by up to 2X over IPoIB (32Gbps)
   –  No overhead in using hybrid mode when all data can fit in memory

Performance Evaluation on IB FDR + SATA/NVMe SSDs

•  Memcached latency test with Zipf distribution, server with 1 GB memory, 32 KB key-value pair size, total size of data accessed is 1 GB (when data fits in memory) and 1.5 GB (when data does not fit in memory)
•  When data fits in memory: RDMA-Mem/Hybrid gives 5x improvement over IPoIB-Mem
•  When data does not fit in memory: RDMA-Hybrid gives 2x-2.5x over IPoIB/RDMA-Mem

[Figure: Set/Get latency (us) broken down into slab allocation (SSD write), cache check+load (SSD read), cache update, server response, client wait, and miss-penalty, for IPoIB-Mem, RDMA-Mem, RDMA-Hybrid-SATA, and RDMA-Hybrid-NVMe, with data fitting and not fitting in memory]

Accelerating Hybrid Memcached with RDMA, Non-blocking Extensions and SSDs

•  Design Features
   –  RDMA-Accelerated Communication for Memcached Get/Set
   –  Hybrid 'RAM+SSD' slab management for higher data retention
   –  Non-blocking API extensions
      •  memcached_(iset/iget/bset/bget/test/wait)
      •  Achieve near in-memory speeds while hiding bottlenecks of network and SSD I/O
      •  Ability to exploit communication/computation overlap
      •  Optional buffer re-use guarantees
   –  Adaptive slab manager with different I/O schemes for higher throughput

[Figure: client and server connected through the RDMA-enhanced communication library; the client's libMemcached library exposes blocking and non-blocking API flows, and the server runs the hybrid slab manager (RAM+SSD)]

D. Shankar, X. Lu, N. S. Islam, M. W. Rahman, and D. K. Panda, High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits, IPDPS, May 2016
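
The overlap these extensions enable can be illustrated with any future-based client. The sketch below uses the spymemcached Java client's asynchronous calls as a stand-in; the actual iset/iget/bset/bget extensions are C additions to libMemcached, so this is an analogy rather than their API.

```java
// Communication/computation overlap in the style of the non-blocking
// iset/iget extensions, illustrated with spymemcached's future-based calls
// (a Java stand-in; the actual extensions are C additions to libMemcached).
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.internal.GetFuture;
import net.spy.memcached.internal.OperationFuture;
import java.net.InetSocketAddress;

public class NonBlockingOverlapSketch {
  public static void main(String[] args) throws Exception {
    MemcachedClient mc = new MemcachedClient(new InetSocketAddress("server", 11211));

    OperationFuture<Boolean> setF = mc.set("k1", 0, "v1"); // issue set, don't wait
    GetFuture<Object> getF = mc.asyncGet("k2");            // issue get, don't wait

    doUsefulComputation();  // overlap: network + SSD I/O proceed in the background

    boolean stored = setF.get();    // analogous to memcached_wait/test
    Object value = getF.get();
    System.out.println(stored + " " + value);
    mc.shutdown();
  }

  private static void doUsefulComputation() { /* application work overlapped with I/O */ }
}
```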

Performance Evaluation with Non-Blocking Memcached API

[Figure: average Set/Get latency (us), broken down into miss penalty (backend DB access overhead), client wait, server response, cache update, cache check+load (memory and/or SSD read), and slab allocation (w/ SSD write on out-of-memory), for IPoIB-Mem, RDMA-Mem, H-RDMA-Def, H-RDMA-Opt-Block, H-RDMA-Opt-NonB-i, and H-RDMA-Opt-NonB-b]

•  Data does not fit in memory: Non-blocking Memcached Set/Get API Extensions can achieve
   •  >16x latency improvement vs. blocking API over RDMA-Hybrid/RDMA-Mem w/ penalty
   •  >2.5x throughput improvement vs. blocking API over default/optimized RDMA-Hybrid
•  Data fits in memory: Non-blocking Extensions perform similar to RDMA-Mem/RDMA-Hybrid and >3.6x improvement over IPoIB-Mem

H = Hybrid Memcached over SATA SSD; Opt = Adaptive slab manager; Block = Default Blocking API; NonB-i = Non-blocking iset/iget API; NonB-b = Non-blocking bset/bget API w/ buffer re-use guarantee

Acceleration Case Studies and Performance Evaluation

•  RDMA-based Designs and Performance Evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (Basic, Hybrid and Non-blocking APIs)
   –  HDFS + Memcached-based Burst Buffer
   –  OSU HiBD Benchmarks (OHB)

Accelerating I/O Performance of Big Data Analytics through RDMA-based Key-Value Store

•  Design Features
   –  Memcached-based burst-buffer system
      •  Hides latency of parallel file system access
      •  Read from local storage and Memcached
   –  Data locality achieved by writing data to local storage
   –  Different approaches of integration with parallel file system to guarantee fault-tolerance

[Figure: Map/Reduce tasks and the DataNode issue I/O through an I/O forwarding module, which writes to local disk for data locality and to the Memcached-based burst buffer system, with Lustre behind it for fault-tolerance]
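
Conceptually, the read path is a key-value lookup placed in front of the parallel file system. The following is a hypothetical sketch of that forwarding logic; the block naming, Memcached client, and Lustre path are all assumptions.

```java
// Hypothetical sketch of the burst-buffer read path described above: serve a
// block from the Memcached layer when present, otherwise fall back to the
// parallel file system and repopulate the buffer. Names/paths are assumed.
import net.spy.memcached.MemcachedClient;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BurstBufferReadSketch {
  private final MemcachedClient mc;

  public BurstBufferReadSketch() throws Exception {
    mc = new MemcachedClient(new InetSocketAddress("bb-node", 11211));
  }

  byte[] readBlock(String blockId) throws Exception {
    byte[] cached = (byte[]) mc.get(blockId);   // fast path: burst buffer hit
    if (cached != null) return cached;
    byte[] data = Files.readAllBytes(Paths.get("/lustre/blocks/" + blockId)); // slow path
    mc.set(blockId, 0, data);                   // repopulate to hide future PFS latency
    return data;
  }
}
```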

Evaluation with PUMA Workloads

[Figure: execution time (s) for the SeqCount, RankedInvIndex, and HistoRating workloads, comparing HDFS (32Gbps), Lustre (32Gbps), and Mem-bb (32Gbps); gains of 40%, 48.3%, and 17% respectively]

Gains on OSU RI with our approach (Mem-bb) on 24 nodes
•  SequenceCount: 34.5% over Lustre, 40% over HDFS
•  RankedInvertedIndex: 27.3% over Lustre, 48.3% over HDFS
•  HistogramRating: 17% over Lustre, 7% over HDFS

N. S. Islam, D. Shankar, X. Lu, M. W. Rahman, and D. K. Panda, Accelerating I/O Performance of Big Data Analytics with RDMA-based Key-Value Store, ICPP '15, September 2015

Acceleration Case Studies and Performance Evaluation

•  RDMA-based Designs and Performance Evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (Basic, Hybrid and Non-blocking APIs)
   –  HDFS + Memcached-based Burst Buffer
   –  OSU HiBD Benchmarks (OHB)

Are the Current Benchmarks Sufficient for Big Data?

•  The current benchmarks provide some performance behavior
•  However, they do not provide any information to the designer/developer on:
   –  What is happening at the lower-layer?
   –  Where the benefits are coming from?
   –  Which design is leading to benefits or bottlenecks?
   –  Which component in the design needs to be changed and what will be its impact?
   –  Can performance gain/loss at the lower-layer be correlated to the performance gain/loss observed at the upper layer?

Challenges in Benchmarking of RDMA-based Designs

[Figure: the same stack as before – applications and Big Data middleware (HDFS, MapReduce, HBase, Spark and Memcached) over programming models (Sockets, RDMA protocols – other protocols?) and the communication and I/O library, on commodity computing system architectures, networking technologies, and storage technologies; current benchmarks exist only at the middleware level, no benchmarks at the lower layers – can the two be correlated?]

OSU MPI Micro-Benchmarks (OMB) Suite

•  A comprehensive suite of benchmarks to
   –  Compare performance of different MPI libraries on various networks and systems
   –  Validate low-level functionalities
   –  Provide insights to the underlying MPI-level designs
•  Started with basic send-recv (MPI-1) micro-benchmarks for latency, bandwidth and bi-directional bandwidth
•  Extended later to
   –  MPI-2 one-sided
   –  Collectives
   –  GPU-aware data movement
   –  OpenSHMEM (point-to-point and collectives)
   –  UPC
•  Has become an industry standard
•  Extensively used for design/development of MPI libraries, performance comparison of MPI libraries and even in procurement of large-scale systems
•  Available from http://mvapich.cse.ohio-state.edu/benchmarks
•  Available in an integrated manner with MVAPICH2 stack

Iterative Process – Requires Deeper Investigation and Design for Benchmarking Next Generation Big Data Systems and Applications

[Figure: the same stack again, with applications-level benchmarks at the top and micro-benchmarks at the bottom feeding an iterative design process across the middleware, communication and I/O library, and hardware layers]

OSU HiBD Benchmarks (OHB)

•  HDFS Benchmarks
   –  Sequential Write Latency (SWL) Benchmark
   –  Sequential Read Latency (SRL) Benchmark
   –  Random Read Latency (RRL) Benchmark
   –  Sequential Write Throughput (SWT) Benchmark
   –  Sequential Read Throughput (SRT) Benchmark
•  Memcached Benchmarks
   –  Get Benchmark
   –  Set Benchmark
   –  Mixed Get/Set Benchmark
•  Available as a part of OHB 0.8
•  MapReduce and RPC benchmarks to be released

N. S. Islam, X. Lu, M. W. Rahman, J. Jose, and D. K. Panda, A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters, Int'l Workshop on Big Data Benchmarking (WBDB '12), December 2012
D. Shankar, X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks, BPOE-5 (2014)
X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks, Int'l Workshop on Big Data Benchmarking (WBDB '13), July 2013

On-going and Future Plans of OSU High Performance Big Data (HiBD) Project

•  Upcoming releases of RDMA-enhanced packages will support
   –  HBase
   –  Impala
•  Upcoming releases of OSU HiBD Micro-Benchmarks (OHB) will support
   –  MapReduce
   –  RPC
•  Advanced designs with upper-level changes and optimizations
   –  Memcached with Non-blocking API
   –  HDFS + Memcached-based Burst Buffer

Concluding Remarks

•  Discussed challenges in accelerating Hadoop, Spark and Memcached
•  Presented initial designs to take advantage of InfiniBand/RDMA for HDFS, MapReduce, RPC, Spark, and Memcached
•  Results are promising
•  Many other open issues need to be solved
•  Will enable the Big Data community to take advantage of modern HPC technologies to carry out their analytics in a fast and scalable manner

Funding Acknowledgments

Funding Support by [sponsor logos]

Equipment Support by [vendor logos]

Personnel Acknowledgments

Current Students: A. Augustine (M.S.), A. Awan (Ph.D.), S. Chakraborthy (Ph.D.), C.-H. Chu (Ph.D.), N. Islam (Ph.D.), M. Li (Ph.D.), K. Kulkarni (M.S.), M. Rahman (Ph.D.), D. Shankar (Ph.D.), A. Venkatesh (Ph.D.), J. Zhang (Ph.D.)

Past Students: P. Balaji (Ph.D.), S. Bhagvat (M.S.), A. Bhat (M.S.), D. Buntinas (Ph.D.), L. Chai (Ph.D.), B. Chandrasekharan (M.S.), N. Dandapanthula (M.S.), V. Dhanraj (M.S.), T. Gangadharappa (M.S.), K. Gopalakrishnan (M.S.), W. Huang (Ph.D.), W. Jiang (M.S.), J. Jose (Ph.D.), K. Kandalla (Ph.D.), S. Kini (M.S.), M. Koop (Ph.D.), S. Krishnamoorthy (M.S.), R. Kumar (M.S.), P. Lai (M.S.), J. Liu (Ph.D.), M. Luo (Ph.D.), A. Mamidala (Ph.D.), G. Marsh (M.S.), V. Meshram (M.S.), A. Moody (M.S.), S. Naravula (Ph.D.), R. Noronha (Ph.D.), X. Ouyang (Ph.D.), S. Pai (M.S.), S. Potluri (Ph.D.), R. Rajachandrasekar (Ph.D.), G. Santhanaraman (Ph.D.), A. Singh (Ph.D.), J. Sridhar (M.S.), H. Subramoni (Ph.D.), S. Sur (Ph.D.), K. Vaidyanathan (Ph.D.), A. Vishnu (Ph.D.), J. Wu (Ph.D.), W. Yu (Ph.D.)

Current Research Scientists: H. Subramoni, X. Lu
Current Senior Research Associate: K. Hamidouche
Current Post-Docs: J. Lin, D. Banerjee
Current Programmer: J. Perkins
Current Research Specialist: M. Arnold

Past Research Scientist: S. Sur
Past Post-Docs: H. Wang, X. Besseron, H.-W. Jin, M. Luo, E. Mancini, S. Marcarelli, J. Vienne
Past Programmer: D. Bureddy

Second International Workshop on High-Performance Big Data Computing (HPBDC)

HPBDC 2016 will be held with the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2016), Chicago, Illinois USA, May 27th, 2016

Keynote Talk: Dr. Chaitanya Baru, Senior Advisor for Data Science, National Science Foundation (NSF); Distinguished Scientist, San Diego Supercomputer Center (SDSC)

Panel Moderator: Jianfeng Zhan (ICT/CAS)
Panel Topic: Merge or Split: Mutual Influence between Big Data and HPC Techniques

Six Regular Research Papers and Two Short Research Papers
http://web.cse.ohio-state.edu/~luxi/hpbdc2016

HPBDC 2015 was held in conjunction with ICDCS'15
http://web.cse.ohio-state.edu/~luxi/hpbdc2015

Thank You!
panda@cse.ohio-state.edu

The High-Performance Big Data Project
http://hibd.cse.ohio-state.edu/

Network-Based Computing Laboratory
http://nowlab.cse.ohio-state.edu/

The MVAPICH2 Project
http://mvapich.cse.ohio-state.edu/
