Politecnico di Milano
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Anna Maria Nestorov, Enrico Reggiani and Marco D. Santambrogio
{annamaria.nestorov, enrico2.reggiani}@mail.polimi.it
marco.santambrogio@polimi.it
A SCALABLE DATAFLOW IMPLEMENTATION OF
CURRAN’S APPROXIMATION ALGORITHM
7th June 2017 @ Xilinx
2
Contributions
Thanks to the Maxeler Tools productivity features, we aimed to create an
efficient parametric design which: 

1. Computes Value at Risk (VaR) of a portfolio of Asian Options based on
Curran’s approximation method

2. Supports an arbitrary number of averaging points
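
The slides do not state how VaR is derived from the option prices. A minimal sketch, assuming VaR is taken as a loss quantile over per-scenario portfolio profit-and-loss values (the function name, the quantile convention and the sample numbers are illustrative assumptions, not the authors' method):

```python
import math


def value_at_risk(pnl_scenarios, confidence=0.99):
    """Return VaR as a loss figure at the given confidence level.

    pnl_scenarios: one portfolio profit-and-loss value per scenario
    (negative values are losses). Assumed convention: VaR is the smallest
    loss that is not exceeded in at least `confidence` of the scenarios.
    """
    if not pnl_scenarios:
        raise ValueError("need at least one scenario")
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    losses = sorted(-p for p in pnl_scenarios)  # ascending losses
    idx = math.ceil(confidence * len(losses)) - 1
    idx = min(max(idx, 0), len(losses) - 1)
    return losses[idx]


if __name__ == "__main__":
    # Hypothetical P&L values, made up for illustration only.
    print(value_at_risk([-3, 1, -1, 2], confidence=0.75))
```

In a design like the one described, each scenario's P&L would come from re-pricing every option in the portfolio, which is why the pricing kernel dominates the run time.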
3
• Black-Scholes model: the option payoff variable has no closed-form representation for
its probability distribution

• Curran's Approximation: computes the expected option payoff conditional on the
geometric mean of the prices at averaging points

• Curran’s algorithm is characterised by:

1. High degree of precision

2. High computational intensity

• High number of invocations of the Normal Cumulative Distribution Function
(NCDF), exponentials and logarithms

• Highly parallel computation: the calculated variables are completely independent

• Evaluation of one portfolio takes from one to many hours
Curran’s Approximation for Asian Option Pricing
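
The building blocks named above can be sketched in a few lines. This is not the authors' kernel code: it only shows the NCDF written with the error function and the geometric mean computed through logarithms and an exponential, which is where the heavy use of NCDF, exp and log comes from. The function names and sample prices are assumptions for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def geometric_mean(prices):
    """Geometric mean computed as exp(mean(log(p))) for positive prices."""
    if not prices or any(p <= 0 for p in prices):
        raise ValueError("prices must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(p) for p in prices) / len(prices))


def arithmetic_mean(prices):
    """Plain average of the prices at the averaging points."""
    return sum(prices) / len(prices)


if __name__ == "__main__":
    prices = [100.0, 105.0, 98.0, 110.0]  # hypothetical averaging-point prices
    print(norm_cdf(0.0), geometric_mean(prices), arithmetic_mean(prices))
```

The arithmetic mean is never below the geometric mean, which is one reason the geometric mean is a convenient conditioning variable for an arithmetic-average payoff.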
4
• A server-class HPC system comprising:
1. 8 MAX4 MAIA DFEs with an Altera Stratix V FPGA and 96 GB of DRAM each
2. a dual-socket Intel Xeon X5650 CPU subsystem with 24 hardware cores per
socket running at 2.67 GHz and using 768 GB of RAM
1U Maxeler MAX4 MPC-X Architecture
5
• DFE input: N_O x N_S x #optionFields
• Initialisation K1, intermediate K3 and finalisation K5 kernels do not
require multi-cycling
• Summation kernels K2 and K4 unroll k summand computations
• DFE output: N_S
Data Flow Architecture Single DFE
[Figure: single-DFE dataflow diagram; two connections are labelled "Infiniband link"]
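
As a software analogy for the unrolling in the summation kernels (not MaxJ code, and not the authors' implementation), the sketch below sums a list k terms at a time and counts how many steps that takes. The function name and the idea of counting "steps" are assumptions made for illustration.

```python
def unrolled_sum(terms, k):
    """Sum `terms` in groups of k and report how many steps were needed.

    Each loop iteration stands in for one step in which k summands are
    handled side by side, so the step count falls roughly as 1/k.
    """
    if k < 1:
        raise ValueError("unrolling factor k must be at least 1")
    total = 0.0
    steps = 0
    for start in range(0, len(terms), k):
        total += sum(terms[start:start + k])
        steps += 1
    return total, steps


if __name__ == "__main__":
    terms = [1.0] * 30
    for k in (1, 3, 15):
        print(k, unrolled_sum(terms, k))
```

The trade-off is that each extra unrolled summand needs its own arithmetic hardware, so the achievable k is bounded by the resources of the device.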
6
• DFE input: N_O x (N_S / #DFEs) x #optionFields
• Initialisation K1, intermediate K3 and finalisation K5 kernels do not
require multi-cycling
• Summation kernels K2 and K4 unroll k summand computations
• DFE output: N_S / #DFEs
Data Flow Architecture Multi-DFEs
[Figure: multi-DFE dataflow diagram showing the DFEs; two connections are labelled "Infiniband link"]
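
The multi-DFE split divides one input dimension (written N_S here) by the number of DFEs. A minimal host-side sketch of such a partition, assuming contiguous index ranges and hypothetical function names; the slides do not describe how the host actually distributes the data:

```python
def partition_range(n_s, n_dfes):
    """Split indices 0..n_s-1 into n_dfes contiguous (start, end) ranges.

    Ranges are half-open; any remainder is spread over the first DFEs.
    """
    if n_dfes < 1 or n_s < 0:
        raise ValueError("need n_dfes >= 1 and n_s >= 0")
    base, extra = divmod(n_s, n_dfes)
    ranges = []
    start = 0
    for d in range(n_dfes):
        size = base + (1 if d < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def input_values_per_dfe(n_o, n_s, n_dfes, option_fields):
    """N_O x (N_S / #DFEs) x #optionFields, assuming N_S divides evenly."""
    if n_s % n_dfes != 0:
        raise ValueError("this sketch assumes N_S is a multiple of #DFEs")
    return n_o * (n_s // n_dfes) * option_fields


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the arithmetic visible.
    print(partition_range(16, 8))
    print(input_values_per_dfe(n_o=30, n_s=16, n_dfes=8, option_fields=5))
```

Because the calculated variables are independent, each DFE can work on its own range without exchanging intermediate results with the others.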
7
• Two test data sets: DataSet30 and DataSet780
• Precision analysis performed exploiting fixed-point
and floating-point data types, one per build, for the
entire design
• DFE resource usage analysis for the same data types
• Dynamic ranges analysis
Experiments
8
• Domain-specific accuracy constraint: precision < 10^-9
[Figure: precision analysis chart for the data types Fix32(11,21), Fix48(16,32), Fix54(16,38), Fix64(11,53), Float32(8,24), Float48(11,37), Float52(11,41) and Float64(11,53)]
Precision Analysis Results
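
To make the type names concrete, the sketch below assumes that Fix(a,b) means a signed fixed-point type with a integer bits (sign included) and b fractional bits, so one step of the representation is 2^-b. That reading is an assumption, and the resolution of a single value is only a rough guide: the precision reported on the slide is measured on the whole design, where rounding errors accumulate.

```python
def fix_resolution(frac_bits):
    """Smallest representable step of a fixed-point type with frac_bits."""
    return 2.0 ** -frac_bits


def quantize(value, int_bits, frac_bits):
    """Round `value` to the nearest representable fixed-point number.

    Assumes a signed two's-complement layout of int_bits + frac_bits bits
    and saturates instead of wrapping on overflow.
    """
    scale = 1 << frac_bits
    total_bits = int_bits + frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    raw = round(value * scale)
    raw = max(lowest, min(highest, raw))
    return raw / scale


if __name__ == "__main__":
    for name, int_bits, frac_bits in [("Fix32", 11, 21), ("Fix48", 16, 32),
                                      ("Fix54", 16, 38), ("Fix64", 11, 53)]:
        step = fix_resolution(frac_bits)
        print(name, step, "finer than 1e-9" if step < 1e-9 else "coarser than 1e-9")
```

Under this reading, a 21-bit fraction has a step of about 4.8e-7, which is already coarser than the 10^-9 constraint before any error accumulation.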
9
• 54-bit and 64-bit fixed-point data representations use fewer resources
than floating-point ones (across 48, 54 and 64 bits)
DFE Resource Analysis Results
15
• Assuming worst-case (linear) scaling of resource utilisation with
parameter k
• With Fix54(16,38), the maximum value of the unrolling factor is k=3
• Dynamic range analysis aims to increase the unrolling factor
[Figure: kernel pipeline K1, K2 (replicated), K3, K4 (replicated), K5. Data type labels: Float(11,32), shown three times; Fix48(14,34), Fix48(8,40), Fix64(20,40), Fix54(21,33) and Fix32(32,0), shown twice. Unrolling factor labels: k=3 and k=15]
Dynamic Ranges Analysis Results
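
The two ideas on this slide can be sketched under simple assumptions: a dynamic range analysis records the values a signal takes and derives how many integer bits it needs, and a linear resource model gives the largest unrolling factor that fits a budget. The function names, the cost model and the numbers are hypothetical and are not taken from the authors' tool flow.

```python
import math


def integer_bits_needed(values, signed=True):
    """Integer bits needed to cover the observed values without overflow.

    Counts the bits of the largest integer part and adds one sign bit for
    signed types. Fractional bits are chosen separately from the accuracy
    constraint.
    """
    if not values:
        raise ValueError("need at least one observed value")
    max_abs = max(abs(v) for v in values)
    bits = int(math.floor(max_abs)).bit_length()
    return bits + (1 if signed else 0)


def max_unroll_factor(budget, fixed_cost, cost_per_unit):
    """Largest k with fixed_cost + k * cost_per_unit <= budget."""
    if cost_per_unit <= 0:
        raise ValueError("cost_per_unit must be positive")
    return max(0, int((budget - fixed_cost) // cost_per_unit))


if __name__ == "__main__":
    observed = [0.5, -3.2, 100.7]  # hypothetical trace of one signal
    print(integer_bits_needed(observed))
    # Made-up resource figures: narrower types lower the per-unit cost.
    print(max_unroll_factor(budget=100, fixed_cost=10, cost_per_unit=30))
    print(max_unroll_factor(budget=100, fixed_cost=10, cost_per_unit=18))
```

Choosing a separate, narrower type per signal lowers the per-summand cost, which under the linear model directly raises the k that fits on the device.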
16
Speedups and Energy Efficiencies

Table 1-1 (chart data residue; units not stated)
               DataSet30   DataSet780
CPU 1 Core         21         400
CPU 24 Cores        7          36
CPU 48 Cores        8          27
Single DFE          5           6

Table 1-1-1 (chart data residue; units not stated)
               DataSet30   DataSet780
CPU 1 Core         11          30
CPU 24 Cores        7          36
CPU 48 Cores        8          27
Single DFE         11          12

[Figure: RunTime [s], log scale from 1 to 10000, for CPU 1 Core, CPU 24 Cores, CPU 48 Cores, Single DFE and 8 DFEs on DataSet30 and DataSet780. Data labels in extraction order, not matched to bars: 2.181, 11.99, 238.564, 240.017, 3789.277, 1.46, 1.25, 10.33, 10.49, 158.81]
[Figure: Socket Energy [Wh], log scale from 1 to 100, same categories, on DataSet30 and DataSet780. Data labels in extraction order, not matched to bars: 8, 6, 27, 36, 400, 55, 87, 21]
17
• An example of a large class of HPC applications with numerical solvers, used
as a case study in the EXTRA European project
• Improvements in runtime and energy utilisation offer a compelling
advantage to financial institutions that want to reduce both option pricing
time and energy usage
• DFE:
1. Multi-DFE energy efficiency evaluation in progress
2. Porting to the new Maxeler MAX5 based on Xilinx Virtex UltraScale+
• CPU:
1. Further improvements to be made
Conclusions and Future Work
18
THANKS FOR THE ATTENTION!
{annamaria.nestorov, enrico2.reggiani}@mail.polimi.it
marco.santambrogio@polimi.it
Acknowledgements to Hristina Palikareva, Pavel Burovskiy and Tobias Becker from Maxeler Technologies London
