Introduction
Model Formalisation
Numerical Results
Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Networks
Salvatore Scognamiglio
Department of Management and Quantitative Studies,
University of Naples “Parthenope”
XLV Annual Meeting of the AMASES (2021)
S. Scognamiglio 13 September 2021 1 / 33
Introduction
Single-population mortality modelling: Lee and Carter (JASA 1992); Brouhns, Denuit and Vermunt (IME 2002); Renshaw and Haberman (IME 2006):
- each model is applied to a single population.
Multi-population mortality modelling: Li and Lee (Demography 2005); Kleinow (IME 2015):
- generally applied to smaller subsets of data;
- usually intended for forecasting the mortality of similar populations;
- hard to fit (complex optimisation schemes, less well-known statistical techniques).
Large-scale mortality modelling: Richman and Wüthrich (AAS 2020); Perla, Richman, Scognamiglio and Wüthrich (SAJ 2021):
- allows more accurate forecasts than the traditional models across a large set of populations;
- provides only point forecasts.
Large-Scale Mortality Modelling via neural networks
We develop a neural network model that describes the mortality dynamics of many different, and potentially unrelated, populations:
- individual stochastic mortality models are combined into a neural network environment that encourages information sharing among populations;
- the model parameters are jointly optimised in a single stage using all available information, instead of population-specific subsets of data as in the traditional fitting schemes;
- the proposed model has very few, easy-to-interpret parameters and allows the uncertainty of the predictions to be measured;
- the parameter estimates appear more robust and the forecasting performance improves.
The full paper is available on SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3868303
The Lee-Carter Model
Let $X = \{x_0, x_1, \ldots, x_\omega\}$ be the set of ages and $T = \{t_0, t_1, \ldots, t_n\}$ the set of calendar years considered.
The Lee-Carter (LC) model describes the logarithm of the central death rate $\log(m_{x,t}) \in \mathbb{R}$ at age $x \in X$ in calendar year $t \in T$ as
$$\log(m_{x,t}) = a_x + b_x k_t + \varepsilon_{x,t},$$
where:
- $a_x$ is the average (log) force of mortality at age $x$;
- $k_t$ is the overall mortality trend in calendar year $t$;
- $b_x$ is the age-specific sensitivity of the log force of mortality to changes in $k_t$;
- $\varepsilon_{x,t}$ is an error term.
To avoid identifiability problems, the following constraints are imposed:
$$\sum_{x \in X} b_x = 1, \qquad \sum_{t \in T} \frac{k_t}{|T|} = 0.$$
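The constraints fix an identification without costing any fit: any triple $(a, b, k)$ can be mapped to an equivalent one that satisfies them. The sketch below illustrates this with NumPy on made-up parameter values; the helper name `normalise_lc` is our own and assumes $\sum_x b_x \neq 0$.

```python
import numpy as np

def normalise_lc(a, b, k):
    """Map any (a, b, k) to the equivalent triple satisfying
    sum_x b_x = 1 and mean_t k_t = 0, leaving a_x + b_x * k_t unchanged."""
    k_bar = k.mean()
    s = b.sum()                      # assumed non-zero
    a_new = a + b * k_bar            # move the level of k into a
    b_new = b / s                    # rescale b so that it sums to one
    k_new = (k - k_bar) * s          # centre k and undo the rescaling of b
    return a_new, b_new, k_new

# Illustrative, made-up values for 3 ages and 4 calendar years.
a = np.array([-6.0, -5.0, -3.0])
b = np.array([0.2, 0.4, 0.6])
k = np.array([5.0, 3.0, 2.0, 1.0])
a2, b2, k2 = normalise_lc(a, b, k)

before = a[:, None] + b[:, None] * k[None, :]
after = a2[:, None] + b2[:, None] * k2[None, :]
```

Both `before` and `after` contain the same fitted log-rates, so the constraints only select one representative of each equivalence class.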
The Lee-Carter Model: Ordinary Least Squares (OLS) estimation
The OLS estimates of the parameters are obtained by solving
$$\arg\min_{(a_x)_x, (b_x)_x, (k_t)_t} \sum_{x \in X} \sum_{t \in T} \left( \log(m_{x,t}) - a_x - b_x k_t \right)^2.$$
The $(a_x)_x$ are estimated as
$$\hat{a}_x = \log\Big( \prod_{t \in T} m_{x,t}^{1/|T|} \Big) = \frac{1}{|T|} \sum_{t \in T} \log(m_{x,t}),$$
while $(k_t)_t$ and $(b_x)_x$ are estimated from the first right and the first left singular vectors, respectively, of the Singular Value Decomposition (SVD) of the centred log-mortality matrix
$$M = \left( \log(m_{x,t}) - \hat{a}_x \right)_{x \in X, t \in T} \in \mathbb{R}^{|X| \times |T|},$$
rescaled so that the constraints hold.
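A minimal NumPy sketch of this OLS/SVD estimation, assuming the log-rates are stored as an $|X| \times |T|$ array with ages in rows. The function name `fit_lc_svd` and the synthetic data are illustrative and not taken from the paper.

```python
import numpy as np

def fit_lc_svd(log_m):
    """Fit the Lee-Carter model by SVD.

    log_m: (|X|, |T|) array of log central death rates, ages in rows and
    calendar years in columns. Returns (a_hat, b_hat, k_hat) satisfying
    sum(b_hat) = 1 and mean(k_hat) = 0.
    """
    a_hat = log_m.mean(axis=1)                 # log of the geometric mean over t
    M = log_m - a_hat[:, None]                 # centred log-mortality matrix
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    b = U[:, 0]                                # first left singular vector
    k = s[0] * Vt[0, :]                        # first right singular vector, scaled
    c = b.sum()                                # rescale so that sum_x b_x = 1;
    return a_hat, b / c, k * c                 # k has mean 0 because M's rows are centred

# Synthetic example with known parameters (illustrative values only).
rng = np.random.default_rng(0)
n_ages, n_years = 10, 30
a_true = np.linspace(-8.0, -2.0, n_ages)
b_true = np.full(n_ages, 1.0 / n_ages)
k_true = np.linspace(15.0, -15.0, n_years)     # mean zero
log_m = a_true[:, None] + b_true[:, None] * k_true[None, :]
log_m = log_m + rng.normal(scale=0.01, size=log_m.shape)
a_hat, b_hat, k_hat = fit_lc_svd(log_m)
```

Dividing by the sum of the singular vector also removes the sign ambiguity of the SVD, since $b$ and $k$ flip sign together.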
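To forecast, $(a_x)_x$ and $(b_x)_x$ are assumed to be constant over time, and the time index $k_t$ is modelled as an ARIMA(0,1,0) process, i.e. a random walk with drift:
$$k_t = k_{t-1} + \gamma + e_t, \qquad e_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),$$
where $\gamma \in \mathbb{R}$ is the drift.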
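The random walk with drift can be estimated and projected in a few lines. This sketch assumes the usual estimates (mean and standard deviation of the first differences of $k_t$) and ignores the uncertainty in $\gamma$ when building the interval; `forecast_rwd` is a hypothetical helper name.

```python
import numpy as np
from statistics import NormalDist

def forecast_rwd(k, horizon, level=0.95):
    """Project k_t = k_{t-1} + gamma + e_t, e_t ~ N(0, sigma^2).

    Returns the central forecast and a pointwise prediction interval for
    k_{n+1}, ..., k_{n+horizon}. Parameter uncertainty in gamma is ignored.
    """
    dk = np.diff(k)
    gamma = dk.mean()                      # drift estimate: (k_n - k_0) / n
    sigma = dk.std(ddof=1)                 # s.d. of the increments
    h = np.arange(1, horizon + 1)
    mean = k[-1] + gamma * h
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma * np.sqrt(h)          # Var(k_{n+h} | k_n) = h * sigma^2
    return mean, mean - half, mean + half

k = np.array([10.0, 8.0, 7.0, 5.0, 4.0, 2.0])   # illustrative fitted k_t
mean, lo, hi = forecast_rwd(k, horizon=3)
```

Projected log-rates then follow as $\hat a_x + \hat b_x k_{n+h}$, with $a_x$ and $b_x$ held fixed.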
Multi-population mortality modelling: the Individual Lee-Carter approach
A simple way of modelling the mortality of a set $I$ of different populations is to describe each population separately with its own LC model:
$$\log(m^{(i)}_{x,t}) = a^{(i)}_x + b^{(i)}_x k^{(i)}_t + \varepsilon^{(i)}_{x,t}, \qquad \forall i \in I.$$
This is sometimes called the Individual Lee-Carter (ILC) approach. The model is fitted separately for each $i \in I$, and the population- and time-specific terms $k^{(i)}_t$ are projected with independent ARIMA(0,1,0) processes.
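Under the ILC approach the fit-and-project step is simply repeated population by population, with no information shared between them. The sketch below assumes SVD fitting and a drift-only (central) projection; the population labels such as `("POP1", "male")` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_lc_svd(log_m):
    """SVD fit of one population's (|X|, |T|) log-rate matrix, with sum(b) = 1."""
    a = log_m.mean(axis=1)
    U, s, Vt = np.linalg.svd(log_m - a[:, None], full_matrices=False)
    c = U[:, 0].sum()
    return a, U[:, 0] / c, s[0] * Vt[0] * c

def ilc_forecast(log_m_by_pop, horizon):
    """Individual Lee-Carter: fit and project every population independently.

    log_m_by_pop maps a population label to its (|X|, |T|) log-rate matrix.
    Returns, per label, the (|X|, horizon) matrix of central log-rate forecasts."""
    forecasts = {}
    for pop, log_m in log_m_by_pop.items():
        a, b, k = fit_lc_svd(log_m)                  # population-specific fit
        gamma = np.diff(k).mean()                    # population-specific drift
        k_future = k[-1] + gamma * np.arange(1, horizon + 1)
        forecasts[pop] = a[:, None] + b[:, None] * k_future[None, :]
    return forecasts

# Noise-free synthetic data for two hypothetical populations.
a_true = np.linspace(-7.0, -2.0, 5)
b_true = np.full(5, 0.2)
k_true = np.linspace(10.0, -10.0, 21)                # k_t falls by 1 per year
log_m = a_true[:, None] + b_true[:, None] * k_true[None, :]
fc = ilc_forecast({("POP1", "male"): log_m, ("POP1", "female"): log_m - 0.3}, horizon=3)
```

Each population's drift and parameters come only from its own data, which is exactly what makes small populations noisy under ILC.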
The Poisson Lee-Carter Model
The main drawback of the SVD approach is its assumption of homoskedastic errors (see Alho (NAAJ 2000)).
Brouhns, Denuit and Vermunt (IME 2002) propose a maximum likelihood estimation based on Poisson death counts $D^{(i)}_{x,t}$, which allows for heteroskedasticity:
$$D^{(i)}_{x,t} \sim \text{Poisson}\left( E^{(i)}_{x,t}\, m^{(i)}_{x,t} \right) \quad \text{with} \quad m^{(i)}_{x,t} = e^{a^{(i)}_x + b^{(i)}_x k^{(i)}_t},$$
where $E^{(i)}_{x,t}$ is the exposure to risk at age $x$ in year $t$ in population $i$, and the classical LC constraints still hold $\forall i \in I$.
The model parameters are estimated by solving
$$\arg\max_{(a^{(i)}_x)_x, (b^{(i)}_x)_x, (k^{(i)}_t)_t} \sum_{x \in X} \sum_{t \in T} \left( D^{(i)}_{x,t} \left( a^{(i)}_x + b^{(i)}_x k^{(i)}_t \right) - E^{(i)}_{x,t}\, e^{a^{(i)}_x + b^{(i)}_x k^{(i)}_t} \right) + c_i, \qquad \forall i \in I,$$
where $c_i \in \mathbb{R}$ is a constant that does not depend on the parameters.
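One standard way to solve this maximisation is a one-parameter-at-a-time Newton scheme in the spirit of Brouhns, Denuit and Vermunt (2002). The sketch below applies that idea to a single population; the starting values, the fixed number of iterations and the simulated data are our own assumptions, and a production fit would add a convergence check.

```python
import numpy as np

def poisson_loglik(D, E, a, b, k):
    """Poisson LC log-likelihood, up to a constant free of (a, b, k)."""
    eta = a[:, None] + b[:, None] * k[None, :]
    return np.sum(D * eta - E * np.exp(eta))

def fit_poisson_lc(D, E, n_iter=200):
    """Poisson LC fit by one-parameter-at-a-time Newton updates.
    D, E: (|X|, |T|) arrays of death counts and exposures to risk."""
    n_x, n_t = D.shape
    a = np.log(D.sum(axis=1) / E.sum(axis=1))      # crude starting values
    b = np.full(n_x, 1.0 / n_x)
    k = np.zeros(n_t)

    def expected(a, b, k):
        return E * np.exp(a[:, None] + b[:, None] * k[None, :])

    for _ in range(n_iter):
        D_hat = expected(a, b, k)                  # Newton step for every a_x
        a = a + (D - D_hat).sum(axis=1) / D_hat.sum(axis=1)
        D_hat = expected(a, b, k)                  # Newton step for every k_t
        k = k + ((D - D_hat) * b[:, None]).sum(axis=0) / (D_hat * b[:, None] ** 2).sum(axis=0)
        D_hat = expected(a, b, k)                  # Newton step for every b_x
        b = b + ((D - D_hat) * k[None, :]).sum(axis=1) / (D_hat * k[None, :] ** 2).sum(axis=1)
    # Impose sum(b) = 1 and mean(k) = 0 without changing the fitted rates.
    k_bar, s = k.mean(), b.sum()
    return a + b * k_bar, b / s, (k - k_bar) * s

# Simulated deaths for one hypothetical population.
rng = np.random.default_rng(1)
n_x, n_t = 6, 15
a_true = np.linspace(-6.0, -2.0, n_x)
b_true = np.full(n_x, 1.0 / n_x)
k_true = np.linspace(6.0, -6.0, n_t)
E = np.full((n_x, n_t), 1e7)                       # illustrative exposures
D = rng.poisson(E * np.exp(a_true[:, None] + b_true[:, None] * k_true[None, :])).astype(float)
a_hat, b_hat, k_hat = fit_poisson_lc(D, E)
```

Each update weights the residuals $D - \hat D$ by the expected counts, which is how the Poisson fit accounts for the heteroskedasticity that SVD ignores.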
Lee-Carter Model forecasting performance vs population size (Perla,
Richman, Scognamiglio and Wüthrich (SAJ 2021)):
[Figure: two panels (Male, Female); x-axis: Country (39 HMD countries, from USA to ISL); y-axis: log(MSE); models: LC_Poisson, LC_SVD]
Figure: Forecasting mean squared error (MSE), in log scale, of the LC model (LC_SVD) and the Poisson LC model (LC_Poisson) on the populations of the Human Mortality Database (HMD); fitting period 1950-1999; forecasting period 2000-2019; countries are sorted by population size in 2000.
USA mortality data (high-population country)
[Figure: two panels (Male, Female); x-axis: Age; y-axis: log(mx); colour: Year]
Figure: Log mortality rates for different ages in the USA from 1950 to 2018. Source: Human Mortality Database (HMD).
LC model estimations using USA data (high-population country)
[Figure: three panels (ax, bx, kt); x-axis: Age for ax and bx, Year for kt; models: LC_poisson, LC_SVD; Gender: Female, Male]
Figure: Parameter estimates of the LC model and the Poisson LC model for USA mortality data.
Luxembourg Mortality Data (low-population country)
[Figure: two panels (Male, Female); x-axis: Age; y-axis: log(mx); colour: Year]
Figure: Log mortality rates for different ages in Luxembourg from 1960 to 2018. Source: Human Mortality Database (HMD).
LC model estimations using Luxembourg data (low-population country)
[Figure: three panels (ax, bx, kt); x-axis: Age for ax and bx, Year for kt; models: LC_poisson, LC_SVD; Gender: Female, Male]
Figure: Parameter estimates of the LC model and the Poisson LC model for Luxembourg mortality data.
The Model formalisation
We jointly model the mortality of a set of populations $I$ that differ by region and gender: $i = (r, g) \in I = R \times G$, with $G = \{\text{male}, \text{female}\}$.
The network model consists of three subnets that approximate the parameters of the LC model. Each subnet combines several kinds of neural network layers:
- the $a^{(i)}_x$-subnet uses embedding and fully-connected (FCN) layers;
- the $b^{(i)}_x$-subnet uses embedding and fully-connected layers;
- the $k^{(i)}_t$-subnet uses fully-connected layers and/or other feed-forward layers.
The a(i)-subnet
The two embedding layers map $r \in R$ and $g \in G$ into real-valued vectors:
$$z^{(a)}_R : R \to \mathbb{R}^{q^{(a)}_R}, \quad r \mapsto z^{(a)}_R(r) = \left( z^{(a)}_{R,1}(r), z^{(a)}_{R,2}(r), \ldots, z^{(a)}_{R,q^{(a)}_R}(r) \right),$$
$$z^{(a)}_G : G \to \mathbb{R}^{q^{(a)}_G}, \quad g \mapsto z^{(a)}_G(g) = \left( z^{(a)}_{G,1}(g), z^{(a)}_{G,2}(g), \ldots, z^{(a)}_{G,q^{(a)}_G}(g) \right).$$
The vector
$$z^{(a)}_I = z^{(a)}_I(r, g) = \left( z^{(a)}_R(r), z^{(a)}_G(g) \right) \in \mathbb{R}^{q^{(a)}_I}, \qquad q^{(a)}_I = q^{(a)}_R + q^{(a)}_G,$$
is a learned representation of the population $i = (r, g)$.
It is further processed by an FCN layer that maps $z^{(a)}_I$ into a new $|X|$-dimensional real-valued space:
$$f^{(a)} : \mathbb{R}^{q^{(a)}_I} \to \mathbb{R}^{|X|}, \quad z^{(a)}_I \mapsto f^{(a)}(z^{(a)}_I) = \left( f^{(a)}_{x_0}(z^{(a)}_I), f^{(a)}_{x_1}(z^{(a)}_I), \ldots, f^{(a)}_{x_\omega}(z^{(a)}_I) \right).$$
Each new feature $f^{(a)}_x(z^{(a)}_I)$ is an age-specific function of the vector $z^{(a)}_I$:
$$z^{(a)}_I \mapsto f^{(a)}_x(z^{(a)}_I) = \varphi^{(a)}\Big( w^{(a)}_{x,0} + \sum_{l=1}^{q^{(a)}_I} w^{(a)}_{x,l}\, z^{(a)}_{I,l} \Big) = \varphi^{(a)}\Big( w^{(a)}_{x,0} + \langle w^{(a)}_x, z^{(a)}_I \rangle \Big), \qquad x \in X,$$
where $\varphi^{(a)} : \mathbb{R} \to \mathbb{R}$ is a (non-linear) activation function and $w^{(a)}_{x,l} \in \mathbb{R}$ are the network parameters.
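A minimal forward pass of the $a^{(i)}$-subnet in NumPy, with randomly initialised (untrained) weights. The region labels, $|X| = 100$ and the embedding sizes are hypothetical; each embedding layer is represented as a lookup table with one row per category, and the activation defaults to the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
regions = ["USA", "ITA", "LUX"]      # hypothetical subset of R
genders = ["female", "male"]         # G
n_ages = 100                         # |X| (assumed)
q_R, q_G = 5, 2                      # hypothetical embedding sizes q_R^(a), q_G^(a)

# Embedding layers: one trainable row per region and per gender (random here).
Z_R = rng.normal(size=(len(regions), q_R))
Z_G = rng.normal(size=(len(genders), q_G))
# FCN layer: bias w_{x,0}^(a) and weight vector w_x^(a) for every age x.
w0 = rng.normal(size=n_ages)
W = rng.normal(size=(n_ages, q_R + q_G))

def a_subnet(region, gender, phi=lambda v: v):
    """Return (f_x^(a)(z_I^(a)))_{x in X} for population i = (region, gender).
    phi defaults to the identity; a non-linear choice such as np.tanh also works."""
    z_I = np.concatenate([Z_R[regions.index(region)], Z_G[genders.index(gender)]])
    return phi(w0 + W @ z_I)         # one output neuron per age

gap_ita = a_subnet("ITA", "male") - a_subnet("ITA", "female")
gap_usa = a_subnet("USA", "male") - a_subnet("USA", "female")
```

With the identity activation, the male-female gap in the output is the same for every region (an additive gender effect); a non-linear `phi` generally removes this property.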
The b(i)-subnet
As in the first subnet, the second one has two embedding layers, of sizes $q^{(b)}_R, q^{(b)}_G \in \mathbb{N}$:
$$z^{(b)}_R : R \to \mathbb{R}^{q^{(b)}_R}, \quad r \mapsto z^{(b)}_R(r) = \left( z^{(b)}_{R,1}(r), z^{(b)}_{R,2}(r), \ldots, z^{(b)}_{R,q^{(b)}_R}(r) \right),$$
$$z^{(b)}_G : G \to \mathbb{R}^{q^{(b)}_G}, \quad g \mapsto z^{(b)}_G(g) = \left( z^{(b)}_{G,1}(g), z^{(b)}_{G,2}(g), \ldots, z^{(b)}_{G,q^{(b)}_G}(g) \right),$$
and an $|X|$-dimensional FCN layer that maps the population-specific vector
$$z^{(b)}_I = z^{(b)}_I(r, g) = \left( z^{(b)}_R(r), z^{(b)}_G(g) \right) \in \mathbb{R}^{q^{(b)}_I}, \qquad q^{(b)}_I = q^{(b)}_R + q^{(b)}_G,$$
into the $|X|$-dimensional real-valued space:
$$f^{(b)} : \mathbb{R}^{q^{(b)}_I} \to \mathbb{R}^{|X|}, \quad z^{(b)}_I \mapsto f^{(b)}(z^{(b)}_I) = \left( f^{(b)}_{x_0}(z^{(b)}_I), f^{(b)}_{x_1}(z^{(b)}_I), \ldots, f^{(b)}_{x_\omega}(z^{(b)}_I) \right).$$
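Again, each new component $f^{(b)}_x(z^{(b)}_I)$ is an age-specific function of $z^{(b)}_I$:
$$z^{(b)}_I \mapsto f^{(b)}_x(z^{(b)}_I) = \varphi^{(b)}\Big( w^{(b)}_{x,0} + \sum_{l=1}^{q^{(b)}_I} w^{(b)}_{x,l}\, z^{(b)}_{I,l} \Big) = \varphi^{(b)}\Big( w^{(b)}_{x,0} + \langle w^{(b)}_x, z^{(b)}_I \rangle \Big), \qquad x \in X,$$
with activation function $\varphi^{(b)} : \mathbb{R} \to \mathbb{R}$ and parameters $w^{(b)}_{x,l} \in \mathbb{R}$.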
The first two subnets in compact form
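Denoting by $w^{(j)}_0 = (w^{(j)}_{x,0})_{x \in X} \in \mathbb{R}^{|X|}$ the bias vector and by $W^{(j)} = (w^{(j)}_x)_{x \in X} \in \mathbb{R}^{|X| \times q^{(j)}_I}$ the weight matrix whose rows are the $w^{(j)}_x$, $\forall j \in \{a, b\}$, the outputs of the first two subnets can be written in compact form (with $\varphi^{(j)}$ applied component-wise and $\langle W, z \rangle$ denoting the matrix-vector product):
$$f^{(a)}(z^{(a)}_I) = \varphi^{(a)}\Big( w^{(a)}_0 + \langle W^{(a)}, z^{(a)}_I \rangle \Big) = \varphi^{(a)}\Big( w^{(a)}_0 + \langle W^{(a)}_R, z^{(a)}_R(r) \rangle + \langle W^{(a)}_G, z^{(a)}_G(g) \rangle \Big),$$
$$f^{(b)}(z^{(b)}_I) = \varphi^{(b)}\Big( w^{(b)}_0 + \langle W^{(b)}, z^{(b)}_I \rangle \Big) = \varphi^{(b)}\Big( w^{(b)}_0 + \langle W^{(b)}_R, z^{(b)}_R(r) \rangle + \langle W^{(b)}_G, z^{(b)}_G(g) \rangle \Big),$$
where the weight matrices of the FCN layers are split column-wise as $W^{(j)} = (W^{(j)}_R, W^{(j)}_G)$ to distinguish the weights applied to the region-specific features from those applied to the gender-specific features.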
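The column split of $W^{(j)}$ is just block matrix multiplication. The short check below, with hypothetical sizes and random values, confirms that the pre-activation $w^{(j)}_0 + \langle W^{(j)}, z^{(j)}_I \rangle$ equals the bias plus a regional and a gender contribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ages, q_R, q_G = 4, 3, 2                  # small hypothetical sizes
w0 = rng.normal(size=n_ages)                # w_0^(j)
W = rng.normal(size=(n_ages, q_R + q_G))    # W^(j) = (W_R^(j), W_G^(j))
W_R, W_G = W[:, :q_R], W[:, q_R:]           # column blocks: region / gender weights
z_R = rng.normal(size=q_R)                  # z_R^(j)(r) for one region r
z_G = rng.normal(size=q_G)                  # z_G^(j)(g) for one gender g
z_I = np.concatenate([z_R, z_G])            # z_I^(j) = (z_R^(j)(r), z_G^(j)(g))

compact = w0 + W @ z_I                      # w_0 + <W, z_I>
split = w0 + W_R @ z_R + W_G @ z_G          # bias + regional part + gender part
```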
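The $k^{(i)}_t$-subnet
Let $\log(m^{(i)}_t) = \left( \log(m^{(i)}_{x,t}) \right)_{x \in X} \in \mathbb{R}^{|X|}$ denote the log-mortality curve of population $i$ in year $t$. The first FCN layer maps $\log(m^{(i)}_t)$ into a $q_{z1}$-dimensional real-valued space:
$$f^{(k1)} : \mathbb{R}^{|X|} \to \mathbb{R}^{q_{z1}}, \quad \log(m^{(i)}_t) \mapsto f^{(k1)}\big(\log(m^{(i)}_t)\big) = \Big( f^{(k1)}_1\big(\log(m^{(i)}_t)\big), \ldots, f^{(k1)}_{q_{z1}}\big(\log(m^{(i)}_t)\big) \Big),$$
where each new feature component $f^{(k1)}_s(\log(m^{(i)}_t))$ is a function of the mortality rates at all ages:
$$\log(m^{(i)}_t) \mapsto f^{(k1)}_s\big(\log(m^{(i)}_t)\big) = \varphi^{(k1)}\Big( w^{(k1)}_{s,0} + \langle w^{(k1)}_s, \log(m^{(i)}_t) \rangle \Big), \qquad s = 1, \ldots, q_{z1},$$
where $w^{(k1)}_{s,0} \in \mathbb{R}$ and $w^{(k1)}_s \in \mathbb{R}^{|X|}$ are parameters.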
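The second FCN layer, of size $q_{z2} = 1$, is the mapping
$$f^{(k2)} : \mathbb{R}^{q_{z1}} \to \mathbb{R}, \quad f^{(k1)}\big(\log(m^{(i)}_t)\big) \mapsto f^{(k2)}\Big( f^{(k1)}\big(\log(m^{(i)}_t)\big) \Big) = \big( f^{(k2)} \circ f^{(k1)} \big)\big(\log(m^{(i)}_t)\big).$$
It extracts a single new feature
$$\big( f^{(k2)} \circ f^{(k1)} \big)\big(\log(m^{(i)}_t)\big) = \varphi^{(k2)}\Big( w^{(k2)}_0 + \Big\langle w^{(k2)}, \varphi^{(k1)}\big( w^{(k1)}_0 + \langle W^{(k1)}, \log(m^{(i)}_t) \rangle \big) \Big\rangle \Big),$$
where $w^{(k2)}_0 \in \mathbb{R}$, $w^{(k1)}_0 = (w^{(k1)}_{s,0})_{1 \le s \le q_{z1}} \in \mathbb{R}^{q_{z1}}$, $w^{(k2)} \in \mathbb{R}^{q_{z1}}$ and $W^{(k1)} = (w^{(k1)}_s)_{1 \le s \le q_{z1}} \in \mathbb{R}^{q_{z1} \times |X|}$ are network parameters, and $\varphi^{(j)} : \mathbb{R} \to \mathbb{R}$, $j \in \{k1, k2\}$, are activation functions.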
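The two-layer fully-connected form of the $k^{(i)}_t$-subnet can be sketched as follows. The width $q_{z1} = 8$, the choices $\varphi^{(k1)} = \tanh$ and $\varphi^{(k2)}$ = identity, $|X| = 101$ and the random (untrained) weights are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ages, q_z1 = 101, 8                              # |X| and hypothetical width q_{z1}

W_k1 = rng.normal(scale=0.1, size=(q_z1, n_ages))  # W^(k1), rows w_s^(k1)
w0_k1 = np.zeros(q_z1)                             # w_0^(k1)
w_k2 = rng.normal(size=q_z1)                       # w^(k2)
w0_k2 = 0.0                                        # w_0^(k2)

def k_subnet(log_m_t, phi1=np.tanh, phi2=lambda v: v):
    """Map a log-mortality curve log(m_t) in R^|X| to the scalar k_t.
    Accepts a single curve of shape (|X|,) or a batch of shape (n, |X|)."""
    h = phi1(w0_k1 + log_m_t @ W_k1.T)             # first FCN layer: q_{z1} features
    return phi2(w0_k2 + h @ w_k2)                  # second FCN layer: one output

curve = np.linspace(-8.0, -1.0, n_ages)            # illustrative log-rates by age
k_single = k_subnet(curve)
k_batch = k_subnet(np.stack([curve, curve - 0.1, curve + 0.1]))
```

Because $k_t$ is computed from the observed curve itself, the same weights serve every population and every year.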
Graphical summary of the model
[Diagram legend: fully-connected layer, embedding layer, concatenation]
Figure: Graphical representation of the neural network architecture for ILC model fitting.
Model Interpretation
Finally, an approximation of the log-mortality curve at time $t$ in population $i$ is obtained as
$$\widehat{\log(m^{(i)}_t)} = f^{(a)}(z^{(a)}_I) + f^{(b)}(z^{(b)}_I)\, \big( f^{(k2)} \circ f^{(k1)} \big)\big(\log(m^{(i)}_t)\big),$$
where each age component is given by
$$\widehat{\log(m^{(i)}_{x,t})} = \underbrace{f^{(a)}_x(z^{(a)}_I)}_{a^{(i)}_x} + \underbrace{f^{(b)}_x(z^{(b)}_I)}_{b^{(i)}_x}\, \underbrace{\big( f^{(k2)} \circ f^{(k1)} \big)\big(\log(m^{(i)}_t)\big)}_{k^{(i)}_t}.$$
All the terms have a simple interpretation:
- $f^{(a)}_x(z^{(a)}_I) \in \mathbb{R}$ is a population- and age-specific term that plays the same role as $a^{(i)}_x$ in the LC model;
- $f^{(b)}_x(z^{(b)}_I) \in \mathbb{R}$ is a population- and age-specific term that plays the same role as $b^{(i)}_x$ in the LC model;
- $(f^{(k2)} \circ f^{(k1)})(\log(m^{(i)}_t)) \in \mathbb{R}$ is a population- and time-specific term that plays the same role as $k^{(i)}_t$ in the LC model.
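Putting the three subnets together, a forward pass for one population might look like the sketch below. All sizes, labels and the untrained random weights are hypothetical; $\varphi^{(a)}$ and $\varphi^{(b)}$ are taken as the identity and $\varphi^{(k1)}$ as tanh. The k-subnet is applied to every year's log-mortality curve at once, and broadcasting yields $a_x + b_x k_t$ for all ages and years.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ages, n_years = 101, 50                 # |X| and |T| (assumed sizes)
q_R, q_G, q_z1 = 5, 2, 8                  # hypothetical layer sizes
regions, genders = ["ITA", "LUX"], ["female", "male"]

def init(*shape, scale=0.1):
    return rng.normal(scale=scale, size=shape)

# Population-specific parameters: region and gender embeddings for a and b.
emb = {j: (init(len(regions), q_R), init(len(genders), q_G)) for j in ("a", "b")}
# Cross-population parameters: FCN layers of the a-, b- and k-subnets.
fcn = {j: (init(n_ages), init(n_ages, q_R + q_G)) for j in ("a", "b")}
W_k1, w0_k1 = init(q_z1, n_ages), np.zeros(q_z1)
w_k2, w0_k2 = init(q_z1), 0.0

def forward(region, gender, log_m):
    """Fitted log-rates for population (region, gender).
    log_m: (|X|, |T|) observed log-rates; column t is the k-subnet input for year t.
    Returns (a, b, k, log_m_hat) with log_m_hat[x, t] = a[x] + b[x] * k[t]."""
    out = {}
    for j in ("a", "b"):
        Z_R, Z_G = emb[j]
        z_I = np.concatenate([Z_R[regions.index(region)], Z_G[genders.index(gender)]])
        w0, W = fcn[j]
        out[j] = w0 + W @ z_I                        # identity activation phi^(j)
    h = np.tanh(w0_k1[:, None] + W_k1 @ log_m)       # (q_z1, |T|): first k-layer
    k = w0_k2 + w_k2 @ h                             # (|T|,): one k_t per year
    log_m_hat = out["a"][:, None] + out["b"][:, None] * k[None, :]
    return out["a"], out["b"], k, log_m_hat

log_m = rng.normal(-5.0, 1.0, size=(n_ages, n_years))   # placeholder input data
a, b, k, log_m_hat = forward("ITA", "male", log_m)
```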
Model Interpretation
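Setting linear activations $\varphi^{(j)}(x) = x$, $\forall j \in \{a, b\}$, and expanding all the terms in the previous equation gives further interpretations:
$$\widehat{\log(m^{(i)}_{x,t})} = \underbrace{\Big( \overbrace{w^{(a)}_{x,0}}^{\text{global } a_x} + \overbrace{\langle w^{(a)}_x, z^{(a)}_I \rangle}^{\text{population effect}} \Big)}_{a^{(i)}_x} + \underbrace{\Big( \overbrace{w^{(b)}_{x,0}}^{\text{global } b_x} + \overbrace{\langle w^{(b)}_x, z^{(b)}_I \rangle}^{\text{population effect}} \Big)}_{b^{(i)}_x} \cdot \underbrace{\big( f^{(k2)} \circ f^{(k1)} \big)\big(\log(m^{(i)}_t)\big)}_{k^{(i)}_t}$$
- $w^{(a)}_{x,0}$ can be seen as a population-independent $a_x$ parameter;
- $\langle w^{(a)}_x, z^{(a)}_I \rangle$ can be seen as a population-specific correction of $a_x$, which decomposes as
$$\underbrace{\langle w^{(a)}_x, z^{(a)}_I \rangle}_{\text{population effect}} = \underbrace{\langle w^{(a)}_{x,R}, z^{(a)}_R(r) \rangle}_{\text{regional effect}} + \underbrace{\langle w^{(a)}_{x,G}, z^{(a)}_G(g) \rangle}_{\text{gender effect}};$$
- $w^{(b)}_{x,0}$ can be seen as a population-independent $b_x$ parameter;
- $\langle w^{(b)}_x, z^{(b)}_I \rangle$ can be seen as a population-specific correction of $b_x$, which decomposes as
$$\underbrace{\langle w^{(b)}_x, z^{(b)}_I \rangle}_{\text{population effect}} = \underbrace{\langle w^{(b)}_{x,R}, z^{(b)}_R(r) \rangle}_{\text{regional effect}} + \underbrace{\langle w^{(b)}_{x,G}, z^{(b)}_G(g) \rangle}_{\text{gender effect}},$$
where $w^{(j)}_{x,R}$ and $w^{(j)}_{x,G}$ denote the rows of $W^{(j)}_R$ and $W^{(j)}_G$ associated with age $x$.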
Model fitting and forecasting
The full set $\psi$ of network parameters can be split into two groups:
- the population-specific parameters $z^{(a)}_R(r), z^{(b)}_R(r)$, $\forall r \in R$, and $z^{(a)}_G(g), z^{(b)}_G(g)$, $\forall g \in G$;
- the cross-population parameters $w^{(j)}_0, W^{(j)}$, $\forall j \in \{a, b, k1\}$, and $w^{(k2)}, w^{(k2)}_0$.
These parameters are iteratively adjusted with the back-propagation algorithm to minimise a given loss function.
The resulting estimates $\hat{\psi}$ are used to compute the neural network (NN) estimates of the LC parameters:
$$\hat{a}^{(i)}_{x,NN} = \varphi^{(a)}\Big( \hat{w}^{(a)}_{x,0} + \langle \hat{w}^{(a)}_{x,R}, \hat{z}^{(a)}_R(r) \rangle + \langle \hat{w}^{(a)}_{x,G}, \hat{z}^{(a)}_G(g) \rangle \Big), \qquad \forall x \in X, \forall i \in I,$$
$$\hat{b}^{(i)}_{x,NN} = \varphi^{(b)}\Big( \hat{w}^{(b)}_{x,0} + \langle \hat{w}^{(b)}_{x,R}, \hat{z}^{(b)}_R(r) \rangle + \langle \hat{w}^{(b)}_{x,G}, \hat{z}^{(b)}_G(g) \rangle \Big), \qquad \forall x \in X, \forall i \in I,$$
$$\hat{k}^{(i)}_{t,NN} = \varphi^{(k2)}\Big( \hat{w}^{(k2)}_0 + \Big\langle \hat{w}^{(k2)}, \varphi^{(k1)}\big( \hat{w}^{(k1)}_0 + \langle \hat{W}^{(k1)}, \log(m^{(i)}_t) \rangle \big) \Big\rangle \Big), \qquad \forall t \in T, \forall i \in I.$$
Forecasts assume that $\hat{a}^{(i)}_{x,NN}$ and $\hat{b}^{(i)}_{x,NN}$ are constant over time, while $\hat{k}^{(i)}_{t,NN}$ is projected with a random walk with drift, $\forall i \in I$.
Experiment Design: Human Mortality Database
Data description:
- Human Mortality Database (HMD): we jointly consider all populations $i = (r, g) \in I$, with $|I| = 80$ (male and female populations of 40 countries), for the calendar years $T = \{t \in \mathbb{N} : 1950 \le t \le 2018\}$.
Data partitioning:
- training data: $T_{train} = \{t \in \mathbb{N} : 1950 \le t \le 1999\}$;
- test data: $T_{test} = \{t \in \mathbb{N} : 2000 \le t \le 2018\}$.
We consider three networks that differ from each other in the design of the $k^{(i)}_t$-subnet that processes the log-mortality curves:
1. LC_FCN employs a fully-connected layer;
2. LC_LCN employs a 1D locally-connected layer (local connectivity);
3. LC_CONV employs a 1D convolutional layer (local connectivity and parameter sharing).
See Chapter 9 of Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Table: Number of network parameters for the neural network models considered.
Model      # parameters
LC_FCN     5,171
LC_LCN     2,771
LC_CONV    2,651
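The parameter savings of LC_LCN and LC_CONV come from local connectivity and weight sharing. The helper functions below count parameters for the three layer types under common conventions (one bias per output unit; for the locally-connected layer, one bias per filter and output position, with 'valid' padding). The sizes are hypothetical and are not meant to reproduce the counts in the table.

```python
def dense_params(n_in, n_out):
    """Fully-connected layer: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

def conv1d_params(kernel, c_in, filters):
    """1D convolution: a single kernel per filter, shared across all positions."""
    return kernel * c_in * filters + filters

def local1d_params(length, kernel, c_in, filters, stride=1):
    """1D locally-connected layer ('valid' padding): like a convolution,
    but with separate weights and biases at every output position."""
    out_len = (length - kernel) // stride + 1
    return out_len * (kernel * c_in * filters + filters)

# Hypothetical sizes: 100 ages in, 4 filters of width 5 moved in steps of 5.
n_ages, kernel, filters, stride = 100, 5, 4, 5
out_len = (n_ages - kernel) // stride + 1                 # 20 output positions
n_dense = dense_params(n_ages, out_len * filters)         # 8080, same output size
n_local = local1d_params(n_ages, kernel, 1, filters, stride)  # 480
n_conv = conv1d_params(kernel, 1, filters)                # 24
```

Local connectivity removes most of the dense weights, and parameter sharing in the convolution removes the dependence on the number of output positions.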
Model fitting: MSE minimisation
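In the first stage, all the network models are fitted by minimising the mean squared error (MSE). Network training involves minimising the loss function
$$L(\psi) = \sum_{x \in X} \sum_{i \in I} \sum_{t \in T} \Big( \log(m^{(i)}_{x,t}) - \varphi^{(a)}\big( w^{(a)}_{x,0} + \langle w^{(a)}_x, z^{(a)}_I \rangle \big) - \varphi^{(b)}\big( w^{(b)}_{x,0} + \langle w^{(b)}_x, z^{(b)}_I \rangle \big) \cdot \varphi^{(k2)}\Big( w^{(k2)}_0 + \Big\langle w^{(k2)}, \varphi^{(k1)}\big( w^{(k1)}_0 + \langle W^{(k1)}, \log(m^{(i)}_t) \rangle \big) \Big\rangle \Big) \Big)^2.$$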
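In array form the loss is a single broadcast expression. This sketch assumes the subnet outputs have already been computed and stored as arrays (shapes in the docstring); it returns the sum as written above, and dividing by the number of cells gives the mean without changing the minimiser. The tiny sizes and random values are illustrative only.

```python
import numpy as np

def mse_loss(log_m, a, b, k):
    """Squared-error loss summed over populations, ages and years.

    log_m: (|I|, |X|, |T|) observed log-rates; a, b: (|I|, |X|) outputs of the
    a- and b-subnets; k: (|I|, |T|) outputs of the k-subnet."""
    fitted = a[:, :, None] + b[:, :, None] * k[:, None, :]
    return np.sum((log_m - fitted) ** 2)

rng = np.random.default_rng(0)
n_pop, n_ages, n_years = 2, 3, 4                  # tiny illustrative sizes
a = rng.normal(size=(n_pop, n_ages))
b = rng.normal(size=(n_pop, n_ages))
k = rng.normal(size=(n_pop, n_years))
exact = a[:, :, None] + b[:, :, None] * k[:, None, :]
```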
Forecasting performance
Table: Results of the three network architectures considered: forecasting MSE, and number of populations and ages in which each network beats the LC_SVD model; forecasting period 2000-2019; MSE values are in units of $10^{-4}$.
Model          MSE    # Populations   # Ages
LC_CONV_mse    3.41   52/80           83/100
LC_FCN_mse     3.25   59/80           84/100
LC_LCN_mse     3.22   60/80           84/100
LC_SVD         6.12
Estimates comparison
[Figure: one panel per country (40 HMD countries, from USA to ISL); x-axis: Age; y-axis: value; models: LC_LCN_mse, LC_SVD; Gender: Female, Male]
Figure: Comparison of the LC_LCN_mse and LC_SVD estimates of $(a^{(i)}_x)_{x \in X}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.
Estimates comparison
[Figure: two panels (LC_LCN_mse, LC_SVD); x-axis: Age; y-axis: value; Gender: Female, Male]
Figure: Comparison of the LC_LCN_mse and LC_SVD estimates of $(a^{(i)}_x)_{x \in X}$, by model; fitting period 1950-1999.
Estimates comparison
[Figure: one panel per country (40 HMD countries, from USA to ISL); x-axis: Age; y-axis: value; models: LC_LCN_mse, LC_SVD; Gender: Female, Male]
Figure: Comparison of the LC_LCN_mse and LC_SVD estimates of $(b^{(i)}_x)_{x \in X}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.
Estimates comparison
[Figure: one panel per country (40 HMD countries, from USA to ISL); x-axis: Year; y-axis: value; models: LC_LCN_mse, LC_SVD; Gender: Female, Male]
Figure: Comparison of the LC_LCN_mse and LC_SVD estimates of $(k^{(i)}_t)_{t \in T}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.
Model fitting: Poisson loss minimisation
Assuming Poisson-distributed death counts $D^{(i)}_{x,t}$, we explore the use of the Poisson loss function to train the neural network models:
$$D^{(i)}_{x,t} \sim \text{Poisson}\left( E^{(i)}_{x,t}\, e^{\tilde{m}^{(i)}_{x,t}} \right),$$
where
$$\tilde{m}^{(i)}_{x,t} = \Big( w^{(a)}_{x,0} + \langle w^{(a)}_x, z^{(a)}_I \rangle \Big) + \Big( w^{(b)}_{x,0} + \langle w^{(b)}_x, z^{(b)}_I \rangle \Big) \cdot \Big( w^{(k2)}_0 + \Big\langle w^{(k2)}, \varphi^{(k1)}\big( w^{(k1)}_0 + \langle W^{(k1)}, \log(m^{(i)}_t) \rangle \big) \Big\rangle \Big).$$
In this setting, fitting the neural network models involves minimising
$$L(\psi) = \sum_{x \in X} \sum_{i \in I} \sum_{t \in T} \Big( E^{(i)}_{x,t}\, e^{\tilde{m}^{(i)}_{x,t}} - D^{(i)}_{x,t}\, \tilde{m}^{(i)}_{x,t} \Big) + c,$$
which is equivalent to maximising the log-likelihood under the Poisson assumption; $c \in \mathbb{R}$ is a constant that does not depend on $\psi$.
We use the same HMD data, but this time we exclude the Canadian populations, because their exposure-to-risk time series contain several missing values. Hence $|I| = 78$.
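The Poisson loss is equally compact. The sketch below assumes `m_tilde` holds the predicted log-rates $\tilde{m}^{(i)}_{x,t}$ in an array with the same shape as the deaths `D` and exposures `E`, and drops the constant $c$; the toy numbers only illustrate that, cell by cell, the loss is minimised where $E\, e^{\tilde m} = D$.

```python
import numpy as np

def poisson_loss(D, E, m_tilde):
    """Sum of E * exp(m_tilde) - D * m_tilde over all cells: minus the
    Poisson log-likelihood up to a constant that does not depend on m_tilde."""
    return np.sum(E * np.exp(m_tilde) - D * m_tilde)

# Illustrative deaths and exposures for 2 ages x 2 years.
D = np.array([[10.0, 20.0], [5.0, 40.0]])
E = np.array([[1000.0, 1000.0], [500.0, 2000.0]])
m_star = np.log(D / E)       # cell-wise minimiser: E * exp(m) = D
```

Unlike the squared-error loss, each cell's contribution is weighted by its exposure, so large populations and ages with many deaths carry more weight in the fit.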
Forecasting performance
Table: Results of the three network architectures considered: forecasting MSEs, and number of populations and ages in which each network beats the LC_Poisson and the LC_SVD models; forecasting period 2000-2019; MSE values are in units of $10^{-4}$.
                          vs LC_Poisson             vs LC_SVD
Model              MSE    # Populations   # Ages    # Populations   # Ages
LC_CONV_Poisson    3.02   57/78           83/100    64/78           83/100
LC_FCN_Poisson     3.07   55/78           83/100    63/78           83/100
LC_LCN_Poisson     2.89   61/78           83/100    67/78           83/100
LC_SVD             6.12
LC_Poisson         5.19
Forecasting MSEs of the LC_LCN_Poisson and the LC_Poisson models on different populations
                 Male                             Female
   Country       LC_LCN_Poisson   LC_Poisson     LC_LCN_Poisson   LC_Poisson
1 USA 1.15 1.42 0.27 0.50
2 RUS 2.19 8.35 2.23 5.89
3 JPN 0.91 0.45 2.30 0.40
4 DEUTW 0.63 0.80 0.23 0.35
5 FRATNP 0.77 0.52 0.64 0.34
6 ITA 0.49 0.58 0.94 0.24
7 GBRTENW 0.74 1.11 0.66 0.38
8 UKR 2.05 7.19 3.40 3.72
9 ESP 0.80 1.72 0.63 1.27
10 POL 2.61 4.69 0.85 3.29
11 TWN 4.82 10.49 1.42 0.95
12 AUS 0.89 1.14 0.32 0.41
13 NLD 1.11 1.76 0.43 0.35
14 DEUTE 1.82 2.71 0.70 1.45
15 GRC 1.73 3.16 0.55 1.97
16 HUN 3.45 6.01 1.22 1.38
17 PRT 1.33 2.42 0.99 2.01
18 BLR 3.34 12.76 3.47 10.24
19 CZE 2.97 4.68 1.03 2.27
20 BEL 1.56 2.31 0.47 0.51
21 SWE 1.10 1.13 0.25 0.38
22 AUT 1.51 2.57 0.40 0.61
23 BGR 5.83 11.30 2.95 6.14
24 CHE 1.41 1.81 0.32 0.32
25 ISR 2.38 1.85 2.03 1.81
26 SVK 7.20 13.27 3.20 2.54
27 DNK 2.01 2.27 0.53 0.42
28 FIN 3.74 3.73 0.82 1.10
29 GBR_SCO 1.69 1.97 0.41 0.67
30 NOR 2.11 3.50 0.71 0.51
31 IRL 3.40 7.82 1.52 2.23
32 LTU 6.59 9.37 9.54 7.60
33 NZL_NM 2.50 4.19 0.70 1.19
34 LVA 10.38 11.37 3.00 3.57
35 SVN 10.18 69.32 2.01 4.77
36 GBR_NIR 5.75 8.21 1.62 1.80
37 EST 16.05 18.88 3.49 6.88
38 LUX 15.90 43.12 5.42 6.74
39 ISL 19.17 19.98 7.56 7.40
Projected log-mortality surface for the Luxembourg male population
Figure: Projected log-mortality surfaces of the LC_Poisson and LC_LCN_Poisson models for the Luxembourg male population.
Future works
Future research intends to:
1. analyse the performance of the proposed model on other data sources, such as the United States Mortality DataBase (USMDB) and insurance portfolio data;
2. investigate the use of neural networks for fitting other stochastic mortality models:
- the single-population models belonging to the family of Generalized Age-Period-Cohort (GAPC) models;
- the multi-population extensions of the LC model: Li and Lee (Demography 2005), Kleinow (IME 2015);
3. explore the potential of the proposed large-scale mortality model for actuarial valuations and longevity risk management.
For advice or comments:
salvatore.scognamiglio@uniparthenope.it

  • 1. Introduction / Model Formalisation / Numerical Results
Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Networks
Salvatore Scognamiglio, Department of Management and Quantitative Studies, University of Naples “Parthenope”
XLV Annual Meeting of the AMASES (2021)
S. Scognamiglio, 13 September 2021, 1 / 33
  • 2. Introduction
Mortality modelling: Lee and Carter (JASA 1992), Brouhns, Denuit and Vermunt (IME 2002), Renshaw and Haberman (IME 2006):
- can be applied to a single population.
Multi-population mortality modelling: Li and Lee (Demography 2005), Kleinow (IME 2015):
- generally applied to smaller subsets of data;
- usually intended for forecasting the mortality of similar populations;
- hard to fit (complex optimisation schemes or less well-known statistical techniques).
Large-scale mortality modelling: Richman and Wüthrich (AAS 2020), Perla, Richman, Scognamiglio and Wüthrich (SAJ 2021):
- allows more accurate forecasting than the traditional models for a large set of populations;
- provides only point forecasts.
  • 3. Large-Scale Mortality Modelling via neural networks
We develop a neural network model that describes the mortality dynamics of many different and potentially unrelated populations:
- individual stochastic mortality models are combined into a neural network environment that encourages information sharing among populations;
- the model parameters are jointly optimised in a single stage using all available information, instead of population-specific subsets of data as in the traditional fitting schemes;
- the proposed model has very few, easy-to-interpret parameters and allows uncertainty in the predictions to be measured;
- the parameter estimates appear more robust and the forecasting performance improves.
◦ The full paper is available on SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3868303
  • 4. The Lee-Carter Model
Let $X = \{x_0, x_1, \dots, x_\omega\}$ be the set of ages and $T = \{t_0, t_1, \dots, t_n\}$ the set of calendar years considered. The Lee-Carter (LC) model defines the logarithm of the central death rate $\log(m_{x,t}) \in \mathbb{R}$ at age $x \in X$ in calendar year $t \in T$ as
$$\log(m_{x,t}) = a_x + b_x k_t + \varepsilon_{x,t},$$
where $a_x$ is the average force of mortality at age $x$, $k_t$ is the overall mortality trend in calendar year $t$, $b_x$ is the rate of change of the force of mortality broken down by age, and $\varepsilon_{x,t}$ is an error term. To avoid identifiability problems, the following constraints are imposed:
$$\sum_{x \in X} b_x = 1, \qquad \sum_{t \in T} \frac{k_t}{|T|} = 0.$$
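The products $b_x k_t$ do not change if every $b_x$ is divided by a constant $c$ and every $k_t$ is multiplied by it. A constant shift of $k_t$ can likewise be absorbed into $a_x$. The constraints therefore only pick one representative among equivalent parameter sets. A minimal NumPy sketch of that normalisation follows; the function names are illustrative, not taken from the paper.

```python
import numpy as np

def lc_log_rates(a, b, k):
    """Systematic part of the LC model: log m[x, t] = a[x] + b[x] * k[t]."""
    return a[:, None] + np.outer(b, k)

def normalise_lc(a, b, k):
    """Return an equivalent (a, b, k) with sum_x b_x = 1 and mean_t k_t = 0."""
    c = b.sum()
    b_n, k_n = b / c, k * c        # rescaling leaves every product b_x * k_t unchanged
    k_bar = k_n.mean()
    a_n = a + b_n * k_bar          # absorb the level of k into a
    return a_n, b_n, k_n - k_bar

rng = np.random.default_rng(0)
a = rng.normal(-5.0, 1.0, size=5)
b = rng.uniform(0.1, 1.0, size=5)
k = rng.normal(3.0, 2.0, size=10)
a_n, b_n, k_n = normalise_lc(a, b, k)
print(np.allclose(lc_log_rates(a, b, k), lc_log_rates(a_n, b_n, k_n)))  # True
```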
  • 5. The Lee-Carter Model: Ordinary Least Squares (OLS) estimation
The Ordinary Least Squares (OLS) estimates of the parameters are obtained by solving
$$\arg\min_{(a_x)_x,\,(b_x)_x,\,(k_t)_t} \sum_{x \in X} \sum_{t \in T} \left(\log(m_{x,t}) - a_x - b_x k_t\right)^2.$$
The $(a_x)_x$ are estimated as
$$\hat a_x = \log\left(\prod_{t \in T} m_{x,t}^{1/|T|}\right),$$
while $(k_t)_t$ and $(b_x)_x$ are estimated as the first right and first left singular vectors (suitably rescaled to satisfy the constraints) in the Singular Value Decomposition (SVD) of the centred log-mortality matrix
$$M = \left(\log(m_{x,t}) - \hat a_x\right)_{x \in X,\, t \in T} \in \mathbb{R}^{|X| \times |T|}.$$
In order to forecast, $(a_x)_x$ and $(b_x)_x$ are assumed to be constant over time, and the time index $k_t$ is modelled as an ARIMA(0,1,0) process
$$k_t = k_{t-1} + \gamma + e_t, \qquad e_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),$$
where $\gamma \in \mathbb{R}$ is the drift.
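The SVD recipe above can be sketched in a few lines of NumPy. This is an illustrative implementation with our own naming. Rescaling the first singular triplet is one way to enforce $\sum_x b_x = 1$; because every row of the centred matrix sums to zero, the resulting $k_t$ also sums to zero. The toy data are synthetic and noise-free.

```python
import numpy as np

def fit_lc_svd(log_m):
    """OLS fit of the Lee-Carter model via SVD.

    log_m: array (n_ages, n_years) of log central death rates.
    Returns (a, b, k) with sum(b) = 1 and sum(k) = 0 (up to rounding)."""
    a = log_m.mean(axis=1)               # log of the geometric mean of m_{x,t} over t
    M = log_m - a[:, None]               # centred log-mortality matrix
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]             # first left / right singular vectors
    b = u / u.sum()                      # rescale so that sum_x b_x = 1
    k = s[0] * u.sum() * v               # keeps b_x * k_t = s_1 * u_x * v_t
    return a, b, k

# Noise-free toy data generated from known parameters that satisfy the constraints.
rng = np.random.default_rng(1)
a_true = np.linspace(-8.0, -1.0, 20)
b_true = rng.uniform(0.5, 1.5, 20)
b_true /= b_true.sum()
k_true = np.linspace(30.0, -30.0, 40)
log_m = a_true[:, None] + np.outer(b_true, k_true)
a_hat, b_hat, k_hat = fit_lc_svd(log_m)
```

On exact rank-one data the sketch recovers the generating parameters, and the sign ambiguity of the singular vectors cancels in the rescaling. On real data the first singular triplet is only the best rank-one least-squares approximation of $M$.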
  • 6. Multi-population mortality modelling: the Individual Lee-Carter approach
A simple way of modelling the mortality of a set of different populations $I$ is to describe each population separately with its own LC model:
$$\log(m^{(i)}_{x,t}) = a^{(i)}_x + b^{(i)}_x k^{(i)}_t + \varepsilon^{(i)}_{x,t}, \qquad \forall i \in I.$$
This approach is sometimes called the Individual Lee-Carter (ILC) approach. In this case, the model is fitted separately for each $i \in I$, and the population- and time-specific terms $k^{(i)}_t$ are projected with independent ARIMA(0,1,0) processes.
  • 7. The Poisson Lee-Carter Model
The main drawback of the SVD approach is the assumption of homoskedastic errors (see Alho (NAAJ 2000)). Brouhns, Denuit and Vermunt (IME 2002) propose a maximum likelihood estimation based on a Poisson number of deaths $D^{(i)}_{x,t}$ to allow for heteroskedasticity:
$$D^{(i)}_{x,t} \sim \text{Poisson}\left(E^{(i)}_{x,t}\, m^{(i)}_{x,t}\right) \quad \text{with} \quad m^{(i)}_{x,t} = e^{a^{(i)}_x + b^{(i)}_x k^{(i)}_t},$$
where $E^{(i)}_{x,t}$ is the exposure-to-risk at age $x$ at time $t$ in population $i$, and the classical LC constraints still hold $\forall i \in I$. The model parameters can be estimated by solving
$$\arg\max_{(a^{(i)}_x)_x,\,(b^{(i)}_x)_x,\,(k^{(i)}_t)_t} \sum_{x \in X} \sum_{t \in T} \left( D^{(i)}_{x,t}\left(a^{(i)}_x + b^{(i)}_x k^{(i)}_t\right) - E^{(i)}_{x,t}\, e^{a^{(i)}_x + b^{(i)}_x k^{(i)}_t} \right) + c_i, \qquad \forall i \in I,$$
where $c_i \in \mathbb{R}$ is a constant that does not depend on the parameters.
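The slide does not say how the Poisson maximisation is carried out. One common choice, in the spirit of Brouhns, Denuit and Vermunt, is a sequence of one-dimensional Newton updates of $a_x$, $k_t$ and $b_x$. The sketch below assumes that scheme and runs on synthetic data. It imposes the LC constraints only at the end, which does not change the fitted rates. Function names and the iteration count are our own choices.

```python
import numpy as np

def eta(a, b, k):
    """Linear predictor a_x + b_x k_t, shape (n_ages, n_years)."""
    return a[:, None] + np.outer(b, k)

def poisson_loglik(D, E, a, b, k):
    """Poisson log-likelihood up to the parameter-free constant c_i."""
    e = eta(a, b, k)
    return float(np.sum(D * e - E * np.exp(e)))

def fit_poisson_lc(D, E, n_iter=300):
    """Unidimensional Newton updates of a, k and b (illustrative sketch)."""
    n_x, n_t = D.shape
    a = np.log(D.sum(axis=1) / E.sum(axis=1))    # start from log crude rates by age
    b = np.full(n_x, 1.0 / n_x)
    k = np.zeros(n_t)
    for _ in range(n_iter):
        D_hat = E * np.exp(eta(a, b, k))
        a = a + (D - D_hat).sum(axis=1) / D_hat.sum(axis=1)
        D_hat = E * np.exp(eta(a, b, k))
        k = k + ((D - D_hat) * b[:, None]).sum(axis=0) / (D_hat * b[:, None] ** 2).sum(axis=0)
        D_hat = E * np.exp(eta(a, b, k))
        b = b + ((D - D_hat) * k).sum(axis=1) / (D_hat * k ** 2).sum(axis=1)
    c = b.sum()                                  # impose sum(b) = 1 and mean(k) = 0
    b, k = b / c, k * c
    k_bar = k.mean()
    return a + b * k_bar, b, k - k_bar

# Synthetic deaths from known parameters, with an exposure of 1e5 in every cell.
rng = np.random.default_rng(1)
a_true = np.linspace(-7.0, -2.0, 10)
b_true = np.linspace(0.05, 0.15, 10)             # sums to 1
k_true = np.linspace(15.0, -15.0, 20)            # mean 0
E = np.full((10, 20), 1e5)
D = rng.poisson(E * np.exp(eta(a_true, b_true, k_true))).astype(float)
a_hat, b_hat, k_hat = fit_poisson_lc(D, E)
```

Since the maximum likelihood estimate maximises the log-likelihood, its value at the fitted parameters should be at least as high as at the generating parameters. This gives a simple convergence check.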
  • 8. Lee-Carter Model forecasting performance vs population size (Perla, Richman, Scognamiglio and Wüthrich (SAJ 2021)):
[Figure: log(MSE) by country, sorted by population size from USA to ISL; panels Male and Female; models LC_Poisson and LC_SVD.]
Figure: Forecasting Mean Squared Error (MSE), in log scale, of the LC model (LC SVD) and the Poisson LC model (LC Poisson) on the populations of the Human Mortality Database (HMD); fitting period 1950-1999; forecasting period 2000-2019; countries are sorted by population size in 2000.
  • 9. USA mortality data (high-population country)
[Figure panels: Male, Female; x-axis: Age; y-axis: log(mx); colour: Year.]
Figure: Log mortality rates for different ages in the USA from 1950 to 2018. Source: Human Mortality Database (HMD).
  • 10. LC model estimations using USA data (high-population country)
[Figure panels: ax, bx, kt; models LC_poisson and LC_SVD; gender Female, Male.]
Figure: Parameter estimates of the LC and Poisson LC models for USA mortality data.
  • 11. Luxembourg mortality data (low-population country)
[Figure panels: Male, Female; x-axis: Age; y-axis: log(mx); colour: Year.]
Figure: Log mortality rates for different ages in Luxembourg from 1960 to 2018. Source: Human Mortality Database (HMD).
  • 12. LC model estimations using Luxembourg data (low-population country)
[Figure panels: ax, bx, kt; models LC_poisson and LC_SVD; gender Female, Male.]
Figure: Parameter estimates of the LC and Poisson LC models for Luxembourg mortality data.
  • 13. Model formalisation
We simultaneously model the mortality of a set of populations $I$ that differ from one another in region and gender:
$$i = (r, g) \in I = R \times \{\text{male}, \text{female}\}.$$
The network model provides three subnets that approximate the parameters of the LC model. Each of these subnets combines several kinds of neural network layers:
- the $a^{(i)}$-subnet uses embedding and fully-connected layers;
- the $b^{(i)}$-subnet uses embedding and fully-connected layers;
- the $k^{(i)}_t$-subnet uses fully-connected layers and/or other feed-forward layers.
  • 14. The $a^{(i)}$-subnet
The two embedding layers map $r \in R$ and $g \in G$ into real-valued vectors:
$$z^{(a)}_R : R \to \mathbb{R}^{q^{(a)}_R}, \quad r \mapsto z^{(a)}_R(r) = \left(z^{(a)}_{R,1}(r), z^{(a)}_{R,2}(r), \dots, z^{(a)}_{R,q^{(a)}_R}(r)\right),$$
$$z^{(a)}_G : G \to \mathbb{R}^{q^{(a)}_G}, \quad g \mapsto z^{(a)}_G(g) = \left(z^{(a)}_{G,1}(g), z^{(a)}_{G,2}(g), \dots, z^{(a)}_{G,q^{(a)}_G}(g)\right).$$
The vector $z^{(a)}_I = z^{(a)}_I(r, g) = \left(z^{(a)}_R(r), z^{(a)}_G(g)\right) \in \mathbb{R}^{q^{(a)}_I}$, with $q^{(a)}_I = q^{(a)}_R + q^{(a)}_G$, is a learned representation of the population $i = (r, g)$. It is further processed by an FCN layer, which maps $z^{(a)}_I$ into a new $|X|$-dimensional real-valued space:
$$f^{(a)} : \mathbb{R}^{q^{(a)}_I} \to \mathbb{R}^{|X|}, \quad z^{(a)}_I \mapsto f^{(a)}(z^{(a)}_I) = \left(f^{(a)}_{x_0}(z^{(a)}_I), f^{(a)}_{x_1}(z^{(a)}_I), \dots, f^{(a)}_{x_\omega}(z^{(a)}_I)\right).$$
Each new feature $f^{(a)}_x(z^{(a)}_I)$ is an age-specific function of the vector $z^{(a)}_I$:
$$z^{(a)}_I \mapsto f^{(a)}_x(z^{(a)}_I) = \phi^{(a)}\left(w^{(a)}_{x,0} + \sum_{l=1}^{q^{(a)}_I} w^{(a)}_{x,l}\, z^{(a)}_{I,l}\right) = \phi^{(a)}\left(w^{(a)}_{x,0} + \left\langle w^{(a)}_x, z^{(a)}_I \right\rangle\right), \qquad x \in X,$$
where $\phi^{(a)} : \mathbb{R} \to \mathbb{R}$ is a (non-linear) activation function and $w^{(a)}_{x,l} \in \mathbb{R}$ are the network parameters.
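A forward pass of the $a^{(i)}$-subnet can be sketched in NumPy as two embedding lookups, a concatenation and one dense layer. The region list, dimensions and random weights below are hypothetical, and training is omitted. $\phi^{(a)}$ defaults to the identity, which is the case used later for interpretation; any element-wise function such as `np.tanh` can be passed instead. A practical implementation would use a deep-learning library's embedding and dense layers.

```python
import numpy as np

rng = np.random.default_rng(0)
regions = ["USA", "JPN", "LUX"]          # illustrative subset of R
genders = ["male", "female"]             # G
n_ages, q_R, q_G = 101, 5, 2             # |X| (ages 0..100) and embedding sizes: assumed

# Embedding layers: one trainable vector per region and per gender (random values here).
Z_R = rng.normal(size=(len(regions), q_R))
Z_G = rng.normal(size=(len(genders), q_G))
# FCN layer: bias w_0 in R^|X| and weights W in R^{|X| x q_I}, with q_I = q_R + q_G.
w_0 = rng.normal(size=n_ages)
W = 0.1 * rng.normal(size=(n_ages, q_R + q_G))

def a_subnet(region, gender, phi=lambda v: v):
    """Return f^(a)(z_I), i.e. phi(w_{x,0} + <w_x, z_I>) for every age x."""
    z_I = np.concatenate([Z_R[regions.index(region)], Z_G[genders.index(gender)]])
    return phi(w_0 + W @ z_I)

a_usa_male = a_subnet("USA", "male")     # one value per age, shape (101,)
```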
  • 15. The $b^{(i)}$-subnet
Similarly to the first subnet, the second one provides two embedding layers of sizes $q^{(b)}_R, q^{(b)}_G \in \mathbb{N}$:
$$z^{(b)}_R : R \to \mathbb{R}^{q^{(b)}_R}, \quad r \mapsto z^{(b)}_R(r) = \left(z^{(b)}_{R,1}(r), z^{(b)}_{R,2}(r), \dots, z^{(b)}_{R,q^{(b)}_R}(r)\right),$$
$$z^{(b)}_G : G \to \mathbb{R}^{q^{(b)}_G}, \quad g \mapsto z^{(b)}_G(g) = \left(z^{(b)}_{G,1}(g), z^{(b)}_{G,2}(g), \dots, z^{(b)}_{G,q^{(b)}_G}(g)\right),$$
and an $|X|$-dimensional FCN layer. This layer maps the population-specific vector $z^{(b)}_I = z^{(b)}_I(r, g) = \left(z^{(b)}_R(r), z^{(b)}_G(g)\right) \in \mathbb{R}^{q^{(b)}_I}$, with $q^{(b)}_I = q^{(b)}_R + q^{(b)}_G$, into an $|X|$-dimensional real-valued space:
$$f^{(b)} : \mathbb{R}^{q^{(b)}_I} \to \mathbb{R}^{|X|}, \quad z^{(b)}_I \mapsto f^{(b)}(z^{(b)}_I) = \left(f^{(b)}_{x_0}(z^{(b)}_I), f^{(b)}_{x_1}(z^{(b)}_I), \dots, f^{(b)}_{x_\omega}(z^{(b)}_I)\right).$$
Also in this case, each new component $f^{(b)}_x(z^{(b)}_I)$ is an age-specific function of $z^{(b)}_I$:
$$z^{(b)}_I \mapsto f^{(b)}_x(z^{(b)}_I) = \phi^{(b)}\left(w^{(b)}_{x,0} + \sum_{l=1}^{q^{(b)}_I} w^{(b)}_{x,l}\, z^{(b)}_{I,l}\right) = \phi^{(b)}\left(w^{(b)}_{x,0} + \left\langle w^{(b)}_x, z^{(b)}_I \right\rangle\right), \qquad x \in X,$$
with $\phi^{(b)} : \mathbb{R} \to \mathbb{R}$ and $w^{(b)}_{x,l} \in \mathbb{R}$.
  • 16. The first two subnets in compact form
Denote by $w^{(j)}_0 = (w^{(j)}_{x,0})_{x \in X} \in \mathbb{R}^{|X|}$ the bias vector and by $W^{(j)} = (w^{(j)\top}_x)_{x \in X} \in \mathbb{R}^{|X| \times q^{(j)}_I}$ the weight matrix, $\forall j \in \{a, b\}$. The outputs of the first two subnets can then be written in compact form:
$$f^{(a)}(z^{(a)}_I) = \phi^{(a)}\left(w^{(a)}_0 + \left\langle W^{(a)}, z^{(a)}_I \right\rangle\right) = \phi^{(a)}\left(w^{(a)}_0 + \left\langle W^{(a)}_R, z^{(a)}_R(r) \right\rangle + \left\langle W^{(a)}_G, z^{(a)}_G(g) \right\rangle\right),$$
$$f^{(b)}(z^{(b)}_I) = \phi^{(b)}\left(w^{(b)}_0 + \left\langle W^{(b)}, z^{(b)}_I \right\rangle\right) = \phi^{(b)}\left(w^{(b)}_0 + \left\langle W^{(b)}_R, z^{(b)}_R(r) \right\rangle + \left\langle W^{(b)}_G, z^{(b)}_G(g) \right\rangle\right),$$
where the matrices of the FCN layers are decomposed as $W^{(j)} = \left(W^{(j)}_R, W^{(j)}_G\right)$ to distinguish the weights that act on the region-specific features from those that act on the gender-specific features.
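The decomposition $W^{(j)} = (W^{(j)}_R, W^{(j)}_G)$ works because multiplying a column-partitioned matrix by a concatenated vector splits into the sum of two block products. Here is a quick numerical check with arbitrary random values; the sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_ages, q_R, q_G = 101, 4, 2                             # illustrative sizes
z_R, z_G = rng.normal(size=q_R), rng.normal(size=q_G)    # z^(j)_R(r), z^(j)_G(g)
w0 = rng.normal(size=n_ages)                             # w^(j)_0
W = rng.normal(size=(n_ages, q_R + q_G))                 # W^(j)

W_R, W_G = W[:, :q_R], W[:, q_R:]                # column split W^(j) = (W^(j)_R, W^(j)_G)
full = w0 + W @ np.concatenate([z_R, z_G])       # w_0 + <W, z_I>
split = w0 + W_R @ z_R + W_G @ z_G               # w_0 + <W_R, z_R(r)> + <W_G, z_G(g)>
print(np.allclose(full, split))                  # True (up to floating-point rounding)
```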
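  • 17. The $k^{(i)}_t$-subnet
The first FCN layer maps $\log(m^{(i)}_t)$ into a $q_{z_1}$-dimensional real-valued space:
$$f^{(k_1)} : \mathbb{R}^{|X|} \to \mathbb{R}^{q_{z_1}}, \quad \log(m^{(i)}_t) \mapsto f^{(k_1)}\left(\log(m^{(i)}_t)\right) = \left(f^{(k_1)}_1\left(\log(m^{(i)}_t)\right), \dots, f^{(k_1)}_{q_{z_1}}\left(\log(m^{(i)}_t)\right)\right),$$
where each new feature component $f^{(k_1)}_s(\log(m^{(i)}_t))$ is a function of the mortality rates at all ages:
$$\log(m^{(i)}_t) \mapsto f^{(k_1)}_s\left(\log(m^{(i)}_t)\right) = \phi^{(k_1)}\left(w^{(k_1)}_{s,0} + \left\langle w^{(k_1)}_s, \log(m^{(i)}_t) \right\rangle\right), \qquad s = 1, \dots, q_{z_1},$$
where $w^{(k_1)}_{s,0} \in \mathbb{R}$ and $w^{(k_1)}_s \in \mathbb{R}^{|X|}$ are parameters. The second FCN layer, of size $q_{z_2} = 1$, is a mapping
$$f^{(k_2)} : \mathbb{R}^{q_{z_1}} \to \mathbb{R}, \quad f^{(k_1)}\left(\log(m^{(i)}_t)\right) \mapsto f^{(k_2)}\left(f^{(k_1)}\left(\log(m^{(i)}_t)\right)\right) = \left(f^{(k_2)} \circ f^{(k_1)}\right)\left(\log(m^{(i)}_t)\right).$$
It extracts a single new feature
$$\left(f^{(k_2)} \circ f^{(k_1)}\right)\left(\log(m^{(i)}_t)\right) = \phi^{(k_2)}\left(w^{(k_2)}_0 + \left\langle w^{(k_2)}, \phi^{(k_1)}\left(w^{(k_1)}_0 + \left\langle W^{(k_1)}, \log(m^{(i)}_t) \right\rangle\right) \right\rangle\right),$$
where $w^{(k_2)}_0 \in \mathbb{R}$, $w^{(k_1)}_0 = (w^{(k_1)}_{s,0})_{1 \le s \le q_{z_1}} \in \mathbb{R}^{q_{z_1}}$, $w^{(k_2)} \in \mathbb{R}^{q_{z_1}}$ and $W^{(k_1)} = (w^{(k_1)\top}_s)_{1 \le s \le q_{z_1}} \in \mathbb{R}^{q_{z_1} \times |X|}$ are network parameters, and $\phi^{(j)} : \mathbb{R} \to \mathbb{R}$ for $j \in \{k_1, k_2\}$ are activation functions.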
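The $k^{(i)}_t$-subnet is a two-layer perceptron that compresses a whole log-mortality curve into one number. Below is a minimal NumPy forward pass with untrained random weights. It assumes $q_{z_1} = 8$, $\phi^{(k_1)} = \tanh$ and a linear $\phi^{(k_2)}$; all three are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ages, q_z1 = 101, 8                          # |X| and q_{z1}: illustrative values

W_k1 = 0.05 * rng.normal(size=(q_z1, n_ages))  # W^(k1), shape (q_z1, |X|)
w0_k1 = np.zeros(q_z1)                         # w^(k1)_0
w_k2 = rng.normal(size=q_z1)                   # w^(k2), shape (q_z1,)
w0_k2 = 0.0                                    # w^(k2)_0

def k_subnet(log_m_t, phi1=np.tanh, phi2=lambda v: v):
    """(f^(k2) o f^(k1))(log m_t): a curve of |X| log-rates -> one scalar k_t."""
    h = phi1(w0_k1 + W_k1 @ log_m_t)           # first FCN layer: q_z1 features
    return float(phi2(w0_k2 + w_k2 @ h))       # second FCN layer: q_z2 = 1 output

log_m_t = np.linspace(-8.0, -1.0, n_ages)      # toy log-mortality curve for one (i, t)
k_t = k_subnet(log_m_t)
```

Because the input is the observed curve itself, the same function applied year by year produces the whole series $k^{(i)}_t$ for a population.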
  • 18. Graphical summary of the model
[Figure legend: fully-connected layer, embedding layer, concatenation.]
Figure: Graphical representation of the neural network architecture for ILC models fitting.
  • 19. Model interpretation
Finally, an approximation of the log-mortality curve at time $t$ in population $i$ can be obtained as
$$\log(m^{(i)}_t) = f^{(a)}(z^{(a)}_I) + f^{(b)}(z^{(b)}_I)\left(f^{(k_2)} \circ f^{(k_1)}\right)\left(\log(m^{(i)}_t)\right),$$
where each age component is given by
$$\log(m^{(i)}_{x,t}) = \underbrace{f^{(a)}_x(z^{(a)}_I)}_{a^{(i)}_x} + \underbrace{f^{(b)}_x(z^{(b)}_I)}_{b^{(i)}_x}\, \underbrace{\left(f^{(k_2)} \circ f^{(k_1)}\right)\left(\log(m^{(i)}_t)\right)}_{k^{(i)}_t}.$$
A simple interpretation of all the terms can be given:
- $f^{(a)}_x(z^{(a)}_I) \in \mathbb{R}$ is a population- and age-specific term that plays the same role as $a^{(i)}_x$ in the LC model;
- $f^{(b)}_x(z^{(b)}_I) \in \mathbb{R}$ is a population- and age-specific term that plays the same role as $b^{(i)}_x$ in the LC model;
- $\left(f^{(k_2)} \circ f^{(k_1)}\right)\left(\log(m^{(i)}_t)\right) \in \mathbb{R}$ is a population- and time-specific term that plays the same role as $k^{(i)}_t$ in the LC model.
  • 20. Model interpretation
Setting linear activations $\phi^{(j)}(x) = x$, $\forall j \in \{a, b\}$, and expanding all the terms in the previous equation, some further interpretations can be given:
$$\log(m^{(i)}_{x,t}) = \underbrace{\overbrace{w^{(a)}_{x,0}}^{\text{global } a_x} + \overbrace{\left\langle w^{(a)}_x, z^{(a)}_I \right\rangle}^{\text{population effect}}}_{a^{(i)}_x} + \underbrace{\left(\overbrace{w^{(b)}_{x,0}}^{\text{global } b_x} + \overbrace{\left\langle w^{(b)}_x, z^{(b)}_I \right\rangle}^{\text{population effect}}\right)}_{b^{(i)}_x} \cdot \underbrace{\left(f^{(k_2)} \circ f^{(k_1)}\right)\left(\log(m^{(i)}_t)\right)}_{k^{(i)}_t}$$
- $w^{(a)}_{x,0}$ can be seen as a population-independent $a_x$ parameter, and $\langle w^{(a)}_x, z^{(a)}_I \rangle$ as a population-specific correction of $a_x$, which can be decomposed as
$$\underbrace{\left\langle w^{(a)}_x, z^{(a)}_I \right\rangle}_{\text{population effect}} = \underbrace{\left\langle w^{(a)}_{x,R}, z^{(a)}_R(r) \right\rangle}_{\text{regional effect}} + \underbrace{\left\langle w^{(a)}_{x,G}, z^{(a)}_G(g) \right\rangle}_{\text{gender effect}}.$$
- $w^{(b)}_{x,0}$ can be seen as a population-independent $b_x$ parameter, and $\langle w^{(b)}_x, z^{(b)}_I \rangle$ as a population-specific correction of $b_x$, which can be decomposed as
$$\underbrace{\left\langle w^{(b)}_x, z^{(b)}_I \right\rangle}_{\text{population effect}} = \underbrace{\left\langle w^{(b)}_{x,R}, z^{(b)}_R(r) \right\rangle}_{\text{regional effect}} + \underbrace{\left\langle w^{(b)}_{x,G}, z^{(b)}_G(g) \right\rangle}_{\text{gender effect}}.$$
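With linear activations, each age-specific LC parameter is literally the sum of a global term, a regional term and a gender term. The sketch below assembles $a^{(i)}_x$, $b^{(i)}_x$ and $\log(m^{(i)}_{x,t})$ from those pieces for one population and one year. All weights, embeddings and the value of $k^{(i)}_t$ are random placeholders rather than fitted values.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ages, q_R, q_G = 101, 4, 2
# Linear-activation a- and b-subnets for one population i = (r, g); all values are placeholders.
params = {
    j: {
        "w0": rng.normal(size=n_ages),              # global term w^(j)_{x,0}
        "W_R": rng.normal(size=(n_ages, q_R)),      # weights acting on the region embedding
        "W_G": rng.normal(size=(n_ages, q_G)),      # weights acting on the gender embedding
        "z_R": rng.normal(size=q_R),                # z^(j)_R(r)
        "z_G": rng.normal(size=q_G),                # z^(j)_G(g)
    }
    for j in ("a", "b")
}

def effects(j):
    """Global, regional and gender effects of subnet j, one value per age."""
    p = params[j]
    return p["w0"], p["W_R"] @ p["z_R"], p["W_G"] @ p["z_G"]

k_t = -12.3                                   # k-subnet output for some year t (placeholder)
a_glob, a_reg, a_gen = effects("a")
b_glob, b_reg, b_gen = effects("b")
a_i = a_glob + a_reg + a_gen                  # a^(i)_x
b_i = b_glob + b_reg + b_gen                  # b^(i)_x
log_m_t = a_i + b_i * k_t                     # log m^(i)_{x,t} for every age x
```

Inspecting `a_reg` and `a_gen` separately across regions and genders is one way to read the regional and gender effects off a trained model.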
  • 21. Model fitting and forecasting
The full set $\psi$ of the network model's parameters can be split into two groups:
- the population-specific parameters $z^{(a)}_R(r), z^{(b)}_R(r)$, $\forall r \in R$, and $z^{(a)}_G(g), z^{(b)}_G(g)$, $\forall g \in G$;
- the cross-population parameters $w^{(j)}_0, W^{(j)}$, $\forall j \in \{a, b, k_1\}$, and $w^{(k_2)}, w^{(k_2)}_0$.
These parameters are iteratively adjusted via the back-propagation algorithm to minimise a given loss function. The resulting estimates $\hat\psi$ can be used to compute the neural network (NN) estimates of the LC parameters:
$$\hat a^{(i)}_{x,NN} = \phi^{(a)}\left(\hat w^{(a)}_{x,0} + \left\langle \hat w^{(a)}_{x,R}, \hat z^{(a)}_R(r) \right\rangle + \left\langle \hat w^{(a)}_{x,G}, \hat z^{(a)}_G(g) \right\rangle\right), \qquad \forall x \in X,\ \forall i \in I,$$
$$\hat b^{(i)}_{x,NN} = \phi^{(b)}\left(\hat w^{(b)}_{x,0} + \left\langle \hat w^{(b)}_{x,R}, \hat z^{(b)}_R(r) \right\rangle + \left\langle \hat w^{(b)}_{x,G}, \hat z^{(b)}_G(g) \right\rangle\right), \qquad \forall x \in X,\ \forall i \in I,$$
$$\hat k^{(i)}_{t,NN} = \phi^{(k_2)}\left(\hat w^{(k_2)}_0 + \left\langle \hat w^{(k_2)}, \phi^{(k_1)}\left(\hat w^{(k_1)}_0 + \left\langle \hat W^{(k_1)}, \log(m^{(i)}_t) \right\rangle\right) \right\rangle\right), \qquad \forall t \in T,\ \forall i \in I.$$
Forecasting is performed assuming that $\hat a^{(i)}_{x,NN}$ and $\hat b^{(i)}_{x,NN}$ are constant over time, while $\hat k^{(i)}_{t,NN}$ is projected with a random walk with drift, $\forall i \in I$.
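Projecting $\hat k^{(i)}_{t,NN}$ with a random walk with drift only requires the mean and standard deviation of its first differences. Here is a minimal sketch under two assumptions: the innovations are Gaussian, and the bands ignore the estimation error of the drift. The series `k_hat` and the values `a_hat` and `b_hat` are made up for illustration.

```python
import numpy as np

def rw_drift_forecast(k, horizon, z=1.96):
    """k_t = k_{t-1} + gamma + e_t with e_t ~ N(0, sigma^2).
    Returns point forecasts and approximate 95% bands for h = 1..horizon;
    the bands ignore the estimation error of gamma."""
    dk = np.diff(k)
    gamma = dk.mean()                  # drift: (k_T - k_1) / (number of steps)
    sigma = dk.std(ddof=1)             # sample standard deviation of the innovations
    h = np.arange(1, horizon + 1)
    mean = k[-1] + gamma * h
    half_width = z * sigma * np.sqrt(h)
    return mean, mean - half_width, mean + half_width

k_hat = np.array([10.0, 8.5, 7.9, 6.0, 4.8, 3.1])   # toy fitted k_t series
mean, lower, upper = rw_drift_forecast(k_hat, horizon=3)   # mean is about 1.72, 0.34, -1.04

# a_x and b_x are held constant, so projected log-rates are a_x + b_x * k_{T+h}.
a_hat, b_hat = np.array([-6.0, -4.0]), np.array([0.6, 0.4])   # toy values for two ages
log_m_future = a_hat[:, None] + np.outer(b_hat, mean)           # shape (2, 3)
```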
  • 22. Experiment design: Human Mortality Database
Data description: Human Mortality Database (HMD). We simultaneously consider all populations $i = (r, g) \in I$ with $|I| = 80$ (male and female populations of 40 countries) for calendar years in $T = \{t \in \mathbb{N} : 1950 \le t \le 2018\}$.
Data partitioning:
- training data $T_{train} = \{t \in \mathbb{N} : 1950 \le t \le 1999\}$;
- test data $T_{test} = \{t \in \mathbb{N} : 2000 \le t \le 2018\}$.
We consider three networks that differ from each other in the design of the $k^{(i)}_t$-subnet, which processes the log-mortality curves:
1. LC FCN employs a fully-connected layer;
2. LC LCN employs a 1D locally-connected layer (local connectivity);
3. LC CONV employs a 1D convolutional layer (local connectivity and parameter sharing).
See Chapter 9 of Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Table: Number of network parameters for the neural network models considered.
Model | # parameters
LC FCN | 5,171
LC LCN | 2,771
LC CONV | 2,651
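Here is a NumPy sketch that contrasts the two local layers. Both use a single input channel, stride 1 and no padding. These are simplified re-implementations for illustration, not the layers used in the paper. The sizes are arbitrary, so the parameter counts below are unrelated to the table.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d(x, w, bias):
    """'Valid' 1D convolution (cross-correlation, as in deep-learning libraries), one channel.
    x: (n,), w: (filters, kernel), bias: (filters,) -> output (n - kernel + 1, filters).
    The same weights are reused at every position (parameter sharing)."""
    windows = sliding_window_view(x, w.shape[1])      # (positions, kernel)
    return windows @ w.T + bias

def locally_connected1d(x, w, bias):
    """Same receptive fields as conv1d, but separate weights per position (no sharing).
    w: (positions, filters, kernel), bias: (positions, filters)."""
    windows = sliding_window_view(x, w.shape[2])      # (positions, kernel)
    return np.einsum("pk,pfk->pf", windows, w) + bias

rng = np.random.default_rng(0)
n_ages, filters, kernel = 100, 3, 5                   # illustrative sizes only
positions = n_ages - kernel + 1
x = rng.normal(size=n_ages)                           # e.g. one log-mortality curve
w_conv, b_conv = rng.normal(size=(filters, kernel)), rng.normal(size=filters)

# A locally-connected layer whose per-position weights are all equal reduces to the convolution.
w_lc = np.tile(w_conv, (positions, 1, 1))
b_lc = np.tile(b_conv, (positions, 1))
same = np.allclose(conv1d(x, w_conv, b_conv), locally_connected1d(x, w_lc, b_lc))

n_params_conv = w_conv.size + b_conv.size             # filters * (kernel + 1) = 18
n_params_lc = w_lc.size + b_lc.size                   # positions * filters * (kernel + 1) = 1728
```

Weight sharing is the only difference between the two layers. It is why, for the same receptive fields, a convolution has far fewer parameters than a locally-connected layer.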
  • 23. Model fitting: MSE minimisation
In the first stage, all the network models are fitted by minimising the Mean Squared Error (MSE). The network training involves the minimisation of the following loss function:
$$L(\psi) = \sum_{x \in X} \sum_{i \in I} \sum_{t \in T} \left( \log(m^{(i)}_{x,t}) - \phi^{(a)}\left(w^{(a)}_{x,0} + \left\langle w^{(a)}_x, z^{(a)}_I \right\rangle\right) - \phi^{(b)}\left(w^{(b)}_{x,0} + \left\langle w^{(b)}_x, z^{(b)}_I \right\rangle\right) \cdot \phi^{(k_2)}\left(w^{(k_2)}_0 + \left\langle w^{(k_2)}, \phi^{(k_1)}\left(w^{(k_1)}_0 + \left\langle W^{(k_1)}, \log(m^{(i)}_t) \right\rangle\right) \right\rangle\right) \right)^2.$$
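Given the outputs of the three subnets, evaluating this loss takes one vectorised expression. The sketch below assumes the subnet outputs have already been stacked into arrays; the names and shapes are our own. In practice the loss would be written in an automatic-differentiation framework so that back-propagation can compute its gradients. Such frameworks usually average rather than sum, which rescales the loss without moving its minimiser.

```python
import numpy as np

def lc_network_loss(log_m, a, b, k):
    """Sum over populations i, ages x and years t of
    (log m^(i)_{x,t} - a^(i)_x - b^(i)_x * k^(i)_t) ** 2.

    log_m: (n_pop, n_ages, n_years); a, b: (n_pop, n_ages); k: (n_pop, n_years)."""
    pred = a[:, :, None] + b[:, :, None] * k[:, None, :]
    return float(np.sum((log_m - pred) ** 2))

log_m = np.ones((2, 3, 4))
print(lc_network_loss(log_m, np.zeros((2, 3)), np.ones((2, 3)), np.zeros((2, 4))))  # 24.0
print(lc_network_loss(log_m, np.ones((2, 3)), np.ones((2, 3)), np.zeros((2, 4))))   # 0.0
```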
  • 24. Forecasting performance
Table: Results of all three network architectures considered: forecasting MSE, and number of populations and ages in which each network beats the LC SVD model; forecasting period 2000-2019; MSE values are in $10^{-4}$.
Model | MSE | # Populations | # Ages
LC CONV mse | 3.41 | 52/80 | 83/100
LC FCN mse | 3.25 | 59/80 | 84/100
LC LCN mse | 3.22 | 60/80 | 84/100
LC SVD | 6.12 | |
  • 25. Estimates comparison
[Figure: one panel per country, from UKR to LTU; x-axis: Age; y-axis: value; models LC_LCN_mse and LC_SVD; gender Female, Male.]
Figure: Comparison of the LC LCN mse and LC SVD estimates of $(a^{(i)}_x)_{x \in X}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.
  • 26. Estimates comparison
[Figure panels: LC_LCN_mse, LC_SVD; x-axis: Age; y-axis: value; gender Female, Male.]
Figure: Comparison of the LC LCN mse and LC SVD estimates of $(a^{(i)}_x)_{x \in X}$, distinguishing by model; fitting period 1950-1999.
  • 27. Estimates comparison
[Figure: one panel per country, from UKR to LTU; x-axis: Age; y-axis: value; models LC_LCN_mse and LC_SVD; gender Female, Male.]
Figure: Comparison of the LC LCN mse and LC SVD estimates of $(b^{(i)}_x)_{x \in X}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.
  • 28. Estimates comparison
[Figure: one panel per country, from UKR to LTU; x-axis: Year; y-axis: value; models LC_LCN_mse and LC_SVD; gender Female, Male.]
Figure: Comparison of the LC LCN mse and LC SVD estimates of $(k^{(i)}_t)_{t \in T}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.
  • 29. Model fitting: Poisson loss minimisation
Assuming a Poisson number of deaths $D^{(i)}_{x,t}$, we explore the use of the Poisson loss function to train the neural network models:
$$D^{(i)}_{x,t} \sim \text{Poisson}\left(E^{(i)}_{x,t}\, e^{\tilde m^{(i)}_{x,t}}\right),$$
where $\tilde m^{(i)}_{x,t}$ denotes the log-mortality rate predicted by the network:
$$\tilde m^{(i)}_{x,t} = w^{(a)}_{x,0} + \left\langle w^{(a)}_x, z^{(a)}_I \right\rangle + \left(w^{(b)}_{x,0} + \left\langle w^{(b)}_x, z^{(b)}_I \right\rangle\right) \cdot \left(w^{(k_2)}_0 + \left\langle w^{(k_2)}, \phi^{(k_1)}\left(w^{(k_1)}_0 + \left\langle W^{(k_1)}, \log(m^{(i)}_t) \right\rangle\right) \right\rangle\right).$$
In this setting, fitting the neural network models involves the minimisation of
$$L(\psi) = \sum_{x \in X} \sum_{i \in I} \sum_{t \in T} \left( E^{(i)}_{x,t}\, e^{\tilde m^{(i)}_{x,t}} - D^{(i)}_{x,t}\, \tilde m^{(i)}_{x,t} \right) + c,$$
which, for a constant $c \in \mathbb{R}$, is equivalent to maximising the log-likelihood function under the Poisson assumption. We use the same HMD data; however, this time we exclude the Canadian populations, since their data present several missing values in the exposure-to-risk time series. Here, $|I| = 78$.
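Write $\eta$ for the network's predicted log-mortality rate in a cell. The Poisson loss for that cell is $E e^{\eta} - D\eta$, and its derivative $E e^{\eta} - D$ vanishes at $\eta = \log(D/E)$, the log of the crude death rate. The minimal sketch below checks this on one cell; the function name is hypothetical.

```python
import numpy as np

def poisson_loss(D, E, eta):
    """sum(E * exp(eta) - D * eta), where eta is the predicted log-mortality rate.
    This equals the negative Poisson log-likelihood plus sum(D * log(E) - log(D!)),
    a term that does not depend on eta."""
    return float(np.sum(E * np.exp(eta) - D * eta))

D, E = np.array([50.0]), np.array([1000.0])
eta_star = np.log(D / E)                                                   # log crude death rate
print(poisson_loss(D, E, eta_star) < poisson_loss(D, E, eta_star + 0.1))   # True
print(poisson_loss(D, E, eta_star) < poisson_loss(D, E, eta_star - 0.1))   # True
```

Unlike the squared error on log-rates, this loss weights each cell by its exposure. Noisy cells with few deaths therefore have less influence on the fit.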
  • 30. Forecasting performance
Table: Results of all three network architectures considered: forecasting MSEs, and number of populations and ages in which each network beats the LC Poisson and the LC SVD models; forecasting period 2000-2019; MSE values are in $10^{-4}$.
Model | MSE | vs LC Poisson: # Populations | vs LC Poisson: # Ages | vs LC SVD: # Populations | vs LC SVD: # Ages
LC CONV Poisson | 3.02 | 57/78 | 83/100 | 64/78 | 83/100
LC FCN Poisson | 3.07 | 55/78 | 83/100 | 63/78 | 83/100
LC LCN Poisson | 2.89 | 61/78 | 83/100 | 67/78 | 83/100
LC SVD | 6.12 | | | |
LC Poisson | 5.19 | | | |
  • 31. Forecasting MSEs of the LC LCN Poisson and the LC Poisson models on different populations
# | Country | Male: LC LCN Poisson | Male: LC Poisson | Female: LC LCN Poisson | Female: LC Poisson
1 | USA | 1.15 | 1.42 | 0.27 | 0.50
2 | RUS | 2.19 | 8.35 | 2.23 | 5.89
3 | JPN | 0.91 | 0.45 | 2.30 | 0.40
4 | DEUTW | 0.63 | 0.80 | 0.23 | 0.35
5 | FRATNP | 0.77 | 0.52 | 0.64 | 0.34
6 | ITA | 0.49 | 0.58 | 0.94 | 0.24
7 | GBRTENW | 0.74 | 1.11 | 0.66 | 0.38
8 | UKR | 2.05 | 7.19 | 3.40 | 3.72
9 | ESP | 0.80 | 1.72 | 0.63 | 1.27
10 | POL | 2.61 | 4.69 | 0.85 | 3.29
11 | TWN | 4.82 | 10.49 | 1.42 | 0.95
12 | AUS | 0.89 | 1.14 | 0.32 | 0.41
13 | NLD | 1.11 | 1.76 | 0.43 | 0.35
14 | DEUTE | 1.82 | 2.71 | 0.70 | 1.45
15 | GRC | 1.73 | 3.16 | 0.55 | 1.97
16 | HUN | 3.45 | 6.01 | 1.22 | 1.38
17 | PRT | 1.33 | 2.42 | 0.99 | 2.01
18 | BLR | 3.34 | 12.76 | 3.47 | 10.24
19 | CZE | 2.97 | 4.68 | 1.03 | 2.27
20 | BEL | 1.56 | 2.31 | 0.47 | 0.51
21 | SWE | 1.10 | 1.13 | 0.25 | 0.38
22 | AUT | 1.51 | 2.57 | 0.40 | 0.61
23 | BGR | 5.83 | 11.30 | 2.95 | 6.14
24 | CHE | 1.41 | 1.81 | 0.32 | 0.32
25 | ISR | 2.38 | 1.85 | 2.03 | 1.81
26 | SVK | 7.20 | 13.27 | 3.20 | 2.54
27 | DNK | 2.01 | 2.27 | 0.53 | 0.42
28 | FIN | 3.74 | 3.73 | 0.82 | 1.10
29 | GBR_SCO | 1.69 | 1.97 | 0.41 | 0.67
30 | NOR | 2.11 | 3.50 | 0.71 | 0.51
31 | IRL | 3.40 | 7.82 | 1.52 | 2.23
32 | LTU | 6.59 | 9.37 | 9.54 | 7.60
33 | NZL_NM | 2.50 | 4.19 | 0.70 | 1.19
34 | LVA | 10.38 | 11.37 | 3.00 | 3.57
35 | SVN | 10.18 | 69.32 | 2.01 | 4.77
36 | GBR_NIR | 5.75 | 8.21 | 1.62 | 1.80
37 | EST | 16.05 | 18.88 | 3.49 | 6.88
38 | LUX | 15.90 | 43.12 | 5.42 | 6.74
39 | ISL | 19.17 | 19.98 | 7.56 | 7.40
  • 32. Projected log-mortality surface for the Luxembourg male population
Figure: Projected log-mortality surfaces of the LC Poisson and LC LCN Poisson models for the Luxembourg male population.
  • 33. Future works
Future research intends to:
1. analyse the performance of the proposed model on other available data sources, such as the United States Mortality Database (USMDB) and insurance portfolio data;
2. investigate the use of neural networks for fitting other stochastic mortality models:
- the single-population models belonging to the family of Generalized Age-Period-Cohort (GAPC) models;
- the multi-population extensions of the LC model: Li and Lee (Demography 2005), Kleinow (IME 2015);
3. explore the potential of the proposed large-scale mortality model in actuarial evaluations and longevity risk management.
For advice or comments: salvatore.scognamiglio@uniparthenope.it