Estimating Future Initial Margin
with Machine Learning
Andres Hernandez
Estimating Initial Margin
Set up
Consider a bank with a portfolio of OTC contracts traded with a
counterparty, covered under a single netting agreement, which can
include variation margin VM and initial margin IM. The exposure
of the bank at time t is

E(t) = (V(t) − VM(t) + U(t) − IM(t))^+

V(t) is the value of the portfolio at time t
VM(t) is the variation margin available to the bank at time t
U(t) is the value of cashflows scheduled to be paid up to time t
IM(t) is the initial margin available to the bank at time t
3
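As a minimal illustration of the exposure definition above, the following sketch evaluates E(t) elementwise across simulated scenarios; the array names are hypothetical.

```python
import numpy as np

def exposure(V, VM, U, IM):
    """E(t) = (V(t) - VM(t) + U(t) - IM(t))^+ applied elementwise over scenarios."""
    return np.maximum(V - VM + U - IM, 0.0)
```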
Initial Margin
Our purpose is to produce a forecast of initial margin.
For the purpose of MVA, the expectation E_t[IM(u)] of future initial margin
is needed, e.g. Green and Kenyon

MVA = −\int_t^T ((1 − R_B) λ_B(u) − s_I(u)) e^{−\int_t^u (r(s) + λ_B(s) + λ_C(s)) ds} E_t[IM(u)] du,

but in general, if the intention is to calculate exposure, what we will
strive for is the forecast along a particular scenario path m

IM_m(t) = Q_{99}(Δ_m(t; δ_IM) | F_t)

where Q_{99} is the 99th percentile, δ_IM is the MPoR (margin period of risk),
and Δ_m(t; δ_IM) is the clean change in portfolio value

Δ_m(t; δ_IM) = V_m(t + δ_IM) + U_m(t, t + δ_IM) − V_m(t)    (1)
4
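A minimal sketch of the clean P&L change in Eq. (1) and an empirical 99th percentile across scenarios. The arrays V and U and the index conventions are assumptions; note the talk ultimately targets the quantile conditional on F_t along a path, not this unconditional one.

```python
import numpy as np

# Hypothetical inputs: V[m, t] simulated portfolio values, U[m, t] cashflows
# paid in (t, t + delta_IM], for M scenarios and T time steps.
def clean_pnl(V, U, t, delta_steps):
    """Delta_m(t; delta_IM) = V_m(t + delta_IM) + U_m(t, t + delta_IM) - V_m(t), per Eq. (1)."""
    return V[:, t + delta_steps] + U[:, t] - V[:, t]

def q99_unconditional(V, U, t, delta_steps):
    """Empirical 99th percentile of the clean P&L change across scenarios."""
    return np.quantile(clean_pnl(V, U, t, delta_steps), 0.99)
```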
Assumptions
We follow L. Andersen, M. Pykhtin, and A. Sokol and assume that,
as δ_IM is relatively short, the P&L under the quantile is Gaussian
with zero drift

IM_m(t) ≈ σ_m(t) Φ^{−1}(99%)

and where the variance is defined as

σ_m^2(t) = E[Δ_m^2(t, δ_IM) | F_t]

Forecasting the variance σ_m^2(t) is really the purpose of this talk.
5
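Under this zero-drift Gaussian assumption the per-path IM is just the forecast standard deviation scaled by the normal quantile; a one-line sketch:

```python
from scipy.stats import norm

def im_gaussian(sigma_m, q=0.99):
    """IM_m(t) ~= sigma_m(t) * Phi^{-1}(q) under the zero-drift Gaussian assumption."""
    return sigma_m * norm.ppf(q)
```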
Estimating σ_m(t) through Regression
Longstaff-Schwartz Regression
F(t_0, t) = E[f(x_t; t_0, t) | F_{t_0}]

For square-integrable functions in L^2(Ω, F, Q), the expansion via
orthonormal functions can be used to resolve the expectation

f(x_t; t_0, t) = \sum_{i=0}^{∞} a_i(t_0, t) L_i(x_t)

where L_i(x_t) is part of an orthonormal function sequence which
covers the L^2 space

F(t_0, t) = \sum_{i=0}^{∞} a_i(t_0, t)
7
Longstaff-Schwartz Regression
The coefficients a_i(t_0, t) are given by

a_i(t_0, t) = E[f(x_t; t_0, t) L_i(x_t) | F_{t_0}]    (2)

With such a sequence, one can guarantee that for a chosen error
tolerance ϵ there exists an N such that

|F(t_0, t) − \sum_{i=0}^{N} a_i(t_0, t)| < ϵ

However, if one needed to evaluate Eq. (2) directly in order to use
the method, the method would be of little use.
8
Longstaff-Schwartz Regression
Instead of evaluating the coefficients a_i, one chooses an N and
estimates them by regressing against the available Monte Carlo
simulation.
In the original Longstaff-Schwartz paper, a basis of Laguerre and
Hermite functions was used. Note, however, that the orthonormal
sequence is required only to guarantee that the precision can be
increased; in practice one does not rely on this property and often
just uses a polynomial basis.
9
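A minimal sketch of the practical procedure: at a given time step, regress the simulated targets on a polynomial in the state variable by least squares. The variable names and the degree are assumptions, not the talk's configuration.

```python
import numpy as np

def ls_fit(x, y, degree=2):
    """Longstaff-Schwartz-style regression at one time step: fit the basis
    1, x, ..., x^degree to the simulated targets y by least squares."""
    coeffs = np.polyfit(x, y, degree)
    return np.poly1d(coeffs)   # callable estimate of the conditional expectation

# usage with hypothetical data: sigma2_hat = ls_fit(rate_t, delta2_t)(rate_t)
```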
Regression with Machine Learning
Machine Learning tools
While a myriad of methods is available from the machine-learning
toolkit, due to time constraints we will look at the following methods:
Least-Squares Regression (LSE)
Nadaraya-Watson kernel Regression (NWK)
k-Nearest Neighbor Regression (kNR)
Gradient Boosted Regression Trees (GBRT)
Recurrent Neural Network (LSTM)
All attempt to approximate the conditional expectation of Y relative
to a variable X
E [Y |X] = m(X)
11
Least Squares Regression
A form for the function is proposed, e.g.

m̂(X) = \sum_{n=0}^{N} a_n X^n

and the coefficients are determined by minimising the squared errors.
In the following, N = 1 will be used. For a sample set {(x_i, y_i)}, with
i = 1, ..., M,

min \sum_i w_i (y_i − m̂(x_i))^2

One could introduce some clever choice of weights w_i, but in the
following all points are equally weighted.
12
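A sketch with scikit-learn, assuming a one-dimensional state variable and N = 1 as above; equal weights correspond to leaving the sample weights unset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def lse_fit(x, y, w=None):
    """Weighted least squares with a linear basis (N = 1); w=None means equal weights."""
    model = LinearRegression()
    model.fit(np.asarray(x).reshape(-1, 1), y, sample_weight=w)
    return model

# usage: sigma2_hat = lse_fit(rate_t, delta2_t).predict(np.asarray(rate_t).reshape(-1, 1))
```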
Nadaraya-Watson kernel Regression
Nadaraya-Watson uses a locally weighted average, with the weights
provided by a kernel K. The estimate of m, m̂_h, is then given by

m̂_h(x) = \frac{\sum_i K_h(x − x_i) y_i}{\sum_i K_h(x − x_i)}

The parameter h, called the bandwidth, determines how much the
kernel focuses on local over global features. For example, the
radial basis function kernel

K_h(x, x_i) = exp(−‖x − x_i‖^2 / (2h^2))

As h varies from 0 to ∞, the kernel moves from weighting only an
exact match with some x_i to weighting all points equally.
13
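A direct implementation for a one-dimensional state variable; the bandwidth h is the free parameter discussed above, and the data names are hypothetical.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, h):
    """Nadaraya-Watson estimator with a Gaussian (RBF) kernel of bandwidth h (1-D sketch)."""
    d2 = (np.asarray(x_query)[:, None] - np.asarray(x_train)[None, :]) ** 2
    K = np.exp(-d2 / (2.0 * h ** 2))          # kernel weights
    return (K @ np.asarray(y_train)) / K.sum(axis=1)   # locally weighted average
```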
k-Nearest Neighbor Regression
To estimate the value at an input x, the distance to all points in the
sample set is calculated. The k samples with the shortest distance
are picked, and the output value is simply the weighted average over
them. In our case, the points are weighted by inverse distance, so
that the nearest points carry more weight.
14
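The scikit-learn k-nearest-neighbour regressor with inverse-distance weighting matches this description; k = 25 here is an arbitrary assumption, not the value used in the talk.

```python
from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors=25, weights="distance")
# usage with hypothetical per-timestep data:
# knn.fit(rate_t.reshape(-1, 1), delta2_t)
# sigma2_hat = knn.predict(rate_t.reshape(-1, 1))
```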
Gradient Boosted Regression Trees
In a decision regression tree, a decision based on one of the
predictor variables is made at each node, leading to a final
prediction at the leaves.

A GBRT fits an additive set of decision trees (weak learners)

F(x) = \sum_m γ_m h_m(x).

The model is built up one tree at a time,

F_m(x) = F_{m−1}(x) + γ_m h_m(x),

by having the new decision tree h_m(x) minimize the error of F_{m−1}(x).
15
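A sketch with scikit-learn's gradient-boosted trees; the hyperparameters are assumptions, not the talk's settings.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Each new tree h_m is fit to the residual error of F_{m-1}, scaled by the learning rate.
gbrt = GradientBoostingRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
# gbrt.fit(rate_t.reshape(-1, 1), delta2_t)
# sigma2_hat = gbrt.predict(rate_t.reshape(-1, 1))
```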
Artificial Neural Networks
An ANN is simply a network of regression units stacked in a particular
configuration. Each regression unit, called a neuron, takes input
from the previous layer¹, combines that input according to a rule,
and applies a function to the result:

[Diagram: a neuron computes a = σ(\sum_i w_i x_i + b) from inputs x_1, ..., x_n, weights w_1, ..., w_n, and bias b]

¹There are more complicated topologies, e.g. Recursive Neural Networks or
Restricted Boltzmann machines.
16
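The diagram corresponds to the following single-unit computation (a sigmoid activation is assumed for concreteness):

```python
import numpy as np

def neuron(x, w, b):
    """One regression unit: a = sigma(w . x + b) with a sigmoid nonlinearity."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))
```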
Artificial Neural Networks
In ANNs independent regression units are stacked together in layers,
with layers stacked on top of each other
17
Many to Many Recurrent Neural Network
A long short-term memory (LSTM) architecture was used. The standard
LSTM block is composed of several gates with an internal state:

[Diagram of an LSTM block, from Wikimedia]

The LSTM blocks are grouped into layers, and several layers can be
stacked on top of each other.
18
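A minimal sketch of a stacked, many-to-many LSTM regressor in Keras; the layer sizes, the single input feature, and the loss are assumptions rather than the configuration used in the talk.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

# Input shape: (n_paths, n_timesteps, n_features); one sigma^2 prediction per time step.
model = Sequential([
    LSTM(32, return_sequences=True, input_shape=(None, 1)),  # first LSTM layer
    LSTM(32, return_sequences=True),                          # stacked second layer
    TimeDistributed(Dense(1)),                                # per-timestep output
])
model.compile(optimizer="adam", loss="mse")
```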
Estimating σ_m(t) with Machine Learning
Benchmark
Originally we intended to use forward SIMM as the benchmark, but
since what we are calculating and SIMM are different quantities that
need to be scaled before they can be compared (see Anfuso et al.
2016), a different benchmark was used:

σ_m^2(t) ≈ E[Δ_m^2(t, δ_IM) | V(t) = V_m(t)]
         ≈ E[Δ_m^2(t, δ_IM) | V(t) ≈ V_m(t)]
         = E[Δ_m^2(t, δ_IM) | |V(t) − V_m(t)| < ϵ]

For the regular calculations 1k scenarios are used, but 100k for the
benchmark calculations. ϵ is chosen on a per-timestep basis, being at
most half the distance between the two nearest points on that time
step.
20
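A brute-force sketch of the benchmark conditioning: average Δ² over the benchmark simulation's scenarios whose portfolio value at t lies within ϵ of the target path's value. The array names are hypothetical.

```python
import numpy as np

def benchmark_sigma2(V_bench_t, delta2_bench_t, v_m, eps):
    """E[Delta^2 | |V(t) - V_m(t)| < eps], estimated over the 100k benchmark scenarios."""
    mask = np.abs(V_bench_t - v_m) < eps
    return delta2_bench_t[mask].mean() if mask.any() else np.nan
```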
IR Swap
[Figure: E[σ²(t)] against time step (0–250) for LSE, NWK, kNR, GBRT and LSTM versus the benchmark]
21
5 × 5 European Swaption - Physical Exercise
[Figure: E[σ²(t)] against time step (0–250) for LSE, NWK, kNR, GBRT and LSTM versus the benchmark]
22
5 × 5 European Swaption - Physical Exercise
While all methods so far would probably be acceptable to calculate
E[IM(t)], and hence MVA, not all would be acceptable to calculate
exposure
[Figure: Δ²_m against rate at time step 180, benchmark (left) and LSE (right)]
23
5 × 5 European Swaption - Physical Exercise
[Figure: Δ²_m against rate at time step 180 for kNR, GBRT and LSTM]
24
Portfolio
Multiple currencies: EUR, USD, SEK, AUD
Multiple indices: EUR 6M, EUR 3M, EUR 12M
15 IR Swaps, 5 XCCy Swaps, 4 FX Swaps
7 Bermudan Swaptions with physical exercise and multiple
exercise dates
While not all products are treated equally under regulation, for the
purpose of this exercise, they will all be included in the same netting
set.
25
Portfolio
[Figure: E[σ²(t)] against time step (0–250) for LSE, NWK, kNR, GBRT and LSTM versus the benchmark]
26
Single model for the whole simulation
The LSTM produces a much smoother prediction because it is
"cheating". Instead of training a new model at each time step, à la
Longstaff-Schwartz, the LSTM is provided with the benchmark itself,
albeit a few days old: the current simulation is used to predict, but a
much bigger simulation in the past was used to train. Apart from LSE,
for which it makes no sense to even try, this could be done for the
other methods as well.
27
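A sketch of the "single model" idea using kNR: fit once on a large, slightly older simulation, then reuse the fitted model at every time step of today's 1k-scenario run. The feature construction and data names are assumptions.

```python
from sklearn.neighbors import KNeighborsRegressor

# Train once, in the past, on the large benchmark-style simulation.
pretrained = KNeighborsRegressor(n_neighbors=25, weights="distance")
# pretrained.fit(X_large_old, delta2_large_old)

# Reuse at each time step of the current, much smaller simulation.
# sigma2_today_t = pretrained.predict(X_today_t)
```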
k-Nearest Neighbor Single Model for Swaption
[Figure: E[σ²(t)] against time step with a 2-day delay: pre-trained kNR vs. timestep-trained kNR vs. benchmark]
28
k-Nearest Neighbor Single Model for Swaption
[Figure: E[σ²(t)] against time step with a 10-day delay: pre-trained kNR vs. timestep-trained kNR vs. benchmark]
29
Summary
A regression-based approach allows for a fast calculation of expected
initial margin, and hence MVA, but care needs to be taken for exposure
calculations. So far, the best solution seems to be k-Nearest Neighbor
Regression: simple, intuitive, and fast.
Moving forward
Validate the Gaussian assumption for more complex portfolios
Backtest the stability of the mapping to forward SIMM
Improve the neural network response by trying out generative models
Use transfer learning and other tools to attempt to replace ever more
parts of the Monte Carlo workflow with neural networks trained on
large data sets.
30
Thank you
© 2017 PricewaterhouseCoopers GmbH Wirtschaftsprüfungsgesellschaft. All rights reserved. In this
document, PwC refers to PricewaterhouseCoopers GmbH Wirtschaftsprüfungsgesellschaft, which is a member
firm of PricewaterhouseCoopers International Limited (PwCIL). Each member firm of PwCIL is a separate
and independent legal entity.