Estimating Financial
Risk with Spark
Sandy Ryza
Senior Data Scientist, Cloudera
Estimating Financial Risk with Apache Spark
In reasonable circumstances, what’s the most you can expect to lose?
def valueAtRisk(
  portfolio,
  timePeriod,
  pValue
): Double = { ... }
valueAtRisk(
  portfolio,
  2 weeks,
  .05
) = $1,000,000
[Figure: probability density of portfolio return ($) over the time period]
VaR estimation approaches
• Variance-covariance
• Historical
• Monte Carlo
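As a rough illustration of the first approach, and not something taken from the slides: if the portfolio return over the period is assumed to be normally distributed, the return at probability level p has a closed form. The symbols w (portfolio weights), m (mean instrument returns) and \Sigma (instrument return covariance matrix) are assumptions introduced for this sketch.

```latex
\mu = w^{\top} m, \qquad
\sigma^{2} = w^{\top} \Sigma\, w, \qquad
q_{p} = \mu + z_{p}\,\sigma, \qquad z_{p} = \Phi^{-1}(p)
```

Here \Phi is the standard normal CDF, so z_{0.05} is approximately -1.645. The historical and Monte Carlo approaches drop the normality assumption and read q_p off an empirical distribution of returns instead.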
Market Risk Factors
• Indexes (S&P 500, NASDAQ)
• Prices of commodities
• Currency exchange rates
• Treasury bonds
Predicting Instrument Returns from Factor Returns
• Train a linear model on the factors for each instrument
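Written out, a linear model of this kind could take the following form. The notation is an assumption made for illustration: r_{i,t} is the return of instrument i in period t, f_{j,t} is the return of factor j in that period, N is the number of factors, and c_i and w_{i,j} are the fitted intercept and weights.

```latex
r_{i,t} = c_{i} + \sum_{j=1}^{N} w_{i,j}\, f_{j,t} + \varepsilon_{i,t}
```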
Fancier
• Add features that are non-linear transformations of the market risk factors
• Decision trees
• For options, use Black-Scholes
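One possible way to add non-linear features is sketched below. The choice of signed squares and signed square roots, and the names Featurize and featurize, are assumptions made for illustration; the slides do not say which transformations were used.

```scala
object Featurize {
  // Hypothetical helper: append signed-square and signed-square-root
  // transformations of each factor return to the raw factor returns.
  // Keeping the sign lets a linear model treat gains and losses asymmetrically.
  def featurize(factorReturns: Array[Double]): Array[Double] = {
    val squared = factorReturns.map(x => math.signum(x) * x * x)
    val squareRooted = factorReturns.map(x => math.signum(x) * math.sqrt(math.abs(x)))
    factorReturns ++ squared ++ squareRooted
  }
}
```

The linear model is then trained on the widened feature vector instead of the raw factor returns, so the fit stays linear in its parameters.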
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression
import org.apache.spark.rdd.RDD
// Load the instruments and factors
// factorReturns: one row per time period, one column per factor
val factorReturns: Array[Array[Double]] = ...
// instrumentReturns: one array of per-period returns per instrument
val instrumentReturns: RDD[Array[Double]] = ...
// Fit a model to each instrument
// Each model is the intercept followed by one weight per factor
val models: Array[Array[Double]] =
  instrumentReturns.map { instrument =>
    val regression = new OLSMultipleLinearRegression()
    regression.newSampleData(instrument, factorReturns)
    regression.estimateRegressionParameters()
  }.collect()
How to sample factor returns?
• Need to be able to generate sample vectors where each component is a factor return.
• Factor returns are usually correlated.
[Figure: Distribution of US treasury bond two-week returns]
[Figure: Distribution of crude oil two-week returns]
The Multivariate Normal Distribution
• Probability distribution over vectors of length N
• Given all the variables but one, that variable is distributed according to a univariate normal distribution
• Models correlations between variables
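To make the correlation point concrete, here is a minimal sketch of drawing one correlated sample vector using a Cholesky factor of the covariance matrix. It uses only the Scala standard library; the names MvnSampler, cholesky and sample are hypothetical, and the sketch assumes the covariance matrix is symmetric and positive definite.

```scala
import scala.util.Random

object MvnSampler {
  // Lower-triangular Cholesky factor L such that L * L^T equals cov.
  // Assumes cov is symmetric and positive definite.
  def cholesky(cov: Array[Array[Double]]): Array[Array[Double]] = {
    val n = cov.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
      l(i)(j) =
        if (i == j) math.sqrt(cov(i)(i) - s)
        else (cov(i)(j) - s) / l(j)(j)
    }
    l
  }

  // Draw one vector as mean + L * z, where the components of z are
  // independent standard normal draws.
  def sample(mean: Array[Double], chol: Array[Array[Double]], rng: Random): Array[Double] = {
    val z = Array.fill(mean.length)(rng.nextGaussian())
    mean.indices.map { i =>
      mean(i) + (0 to i).map(k => chol(i)(k) * z(k)).sum
    }.toArray
  }
}
```

Multiplying independent draws by the factor L is what introduces the correlations: each output component mixes the draws used by the components before it.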
import org.apache.commons.math3.stat.correlation.Covariance
// Compute means (one per factor)
val factorMeans: Array[Double] = factorReturns.transpose
  .map(factor => factor.sum / factor.size)
// Compute covariances (the columns of factorReturns are the factors)
val factorCovs: Array[Array[Double]] = new Covariance(factorReturns)
  .getCovarianceMatrix().getData()
Fancier
• Multivariate normal is often a poor choice compared to more sophisticated options
• Fatter tails: Multivariate T Distribution
• Filtered historical simulation
• ARMA
• GARCH
Running the Simulations
• Create an RDD of seeds
• Use each seed to generate a set of simulations
• Aggregate results
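The seed-per-task pattern can be tried locally before moving to a cluster. The sketch below uses a plain Scala collection in place of an RDD, and a single standard normal draw in place of a full portfolio trial; SeededTrials, runTask and runAll are hypothetical names.

```scala
import scala.util.Random

object SeededTrials {
  // One task: build a generator from the task's seed and run its trials.
  // Each trial here is a single standard normal draw, standing in for a
  // full portfolio-return simulation.
  def runTask(seed: Long, trialsPerTask: Int): Seq[Double] = {
    val rng = new Random(seed)
    Seq.fill(trialsPerTask)(rng.nextGaussian())
  }

  // All tasks: one seed per task, results flattened into one collection.
  def runAll(baseSeed: Long, parallelism: Int, trialsPerTask: Int): Seq[Double] = {
    val seeds = baseSeed until baseSeed + parallelism
    seeds.flatMap(seed => runTask(seed, trialsPerTask))
  }
}
```

Because every task derives its random stream from its own seed, rerunning with the same base seed reproduces the same trials regardless of how the tasks are scheduled.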
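import org.apache.commons.math3.distribution.MultivariateNormalDistribution
def trialReturn(factorDist: MultivariateNormalDistribution, models: Seq[Array[Double]]): Double = {
  val trialFactorReturns = factorDist.sample()
  var totalReturn = 0.0
  for (model <- models) {
    // Add the returns from the instrument to the total trial return.
    // model(0) is the regression intercept; model(i + 1) is the weight for factor i.
    totalReturn += model(0)
    for (i <- 0 until trialFactorReturns.length) {
      totalReturn += trialFactorReturns(i) * model(i + 1)
    }
  }
  totalReturn
}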
// Broadcast the factor return -> instrument return models
val bModels = sc.broadcast(models)
// Generate a seed for each task
val seeds = (baseSeed until baseSeed + parallelism)
val seedRdd = sc.parallelize(seeds, parallelism)
// Create an RDD of trials; trialReturns runs trialsPerTask trials from one seed
val trials: RDD[Double] = seedRdd.flatMap { seed =>
  trialReturns(seed, trialsPerTask, bModels.value, factorMeans, factorCovs)
}
[Diagram: executors over time; each executor holds the model parameters and runs several tasks, and each task runs multiple trials]
// Cache for reuse
trials.cache()
val numTrialReturns = trials.count().toInt
// Compute value at risk: the return at the boundary of the worst 5% of trials
val valueAtRisk = trials.takeOrdered(numTrialReturns / 20).last
// Compute expected shortfall: the average return over the worst 5% of trials
val expectedShortfall =
  trials.takeOrdered(numTrialReturns / 20).sum / (numTrialReturns / 20)
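The same two statistics can be checked on a small local sample. This is a sketch over a plain Seq rather than an RDD; LocalRisk and its fraction parameter are hypothetical, with fraction = 0.05 corresponding to the division by 20 above.

```scala
object LocalRisk {
  // The worst `fraction` of trials, smallest returns first.
  private def worst(trials: Seq[Double], fraction: Double): Seq[Double] = {
    val k = math.max(1, math.round(trials.length * fraction).toInt)
    trials.sorted.take(k)
  }

  // Return at the boundary of the worst `fraction` of trials.
  def valueAtRisk(trials: Seq[Double], fraction: Double): Double =
    worst(trials, fraction).last

  // Average return over the worst `fraction` of trials.
  def expectedShortfall(trials: Seq[Double], fraction: Double): Double = {
    val w = worst(trials, fraction)
    w.sum / w.length
  }
}
```

For 100 trial returns running from -50 to 49, the worst 5% are -50 through -46, so the value at risk is -46 and the expected shortfall is -48.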
[Figure: VaR]
So why Spark?
Easier to use
• Scala and Python REPLs
• Single platform for
  • Cleaning data
  • Fitting models
  • Running simulations
  • Analyzing results
New powers
• Save full simulation-loss matrix in memory (or disk)
• Run deeper analyses
• Join with other datasets
But it’s CPU bound and we’re using Java?
• Computational bottlenecks are normally in matrix operations, which can be BLAS-ified
• Can call out to GPUs just like in C++
• Memory access patterns aren’t high-GC inducing
Want to do this yourself?
spark-finance
• https://github.com/cloudera/spark-finance
• Everything here + the fancier stuff
• Patches welcome!