Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Networks

Introduction | Model Formalisation | Numerical Results
Salvatore Scognamiglio
Department of Management and Quantitative Studies,
University of Naples “Parthenope”
XLV Annual Meeting of the AMASES (2021)
S. Scognamiglio, 13 September 2021

Introduction
Mortality modelling: Lee and Carter (JASA 1992), Brouhns, Denuit and Vermunt (IME 2002), Renshaw and Haberman (IME 2006):
  - can be applied to a single population.
Multi-population mortality modelling: Li and Lee (Demography 2005), Kleinow (IME 2015):
  - generally applied to small subsets of the data;
  - usually intended for forecasting the mortality of similar populations;
  - hard to fit (complex optimisation schemes, less familiar statistical techniques).
Large-scale mortality modelling: Richman and Wüthrich (AAS 2020), Perla, Richman, Scognamiglio and Wüthrich (SAJ 2021):
  - allows more accurate forecasts than the traditional models for a large set of populations;
  - provides only point forecasts.

Large-Scale Mortality Modelling via neural networks
We develop a neural network model that describes the mortality dynamics of many different and potentially unrelated populations:
  - individual stochastic mortality models are combined into a neural network environment that encourages information sharing among populations;
  - the model parameters are jointly optimised in a single stage using all available information, instead of population-specific subsets of data as in the traditional fitting schemes;
  - the proposed model has very few parameters, which are easy to interpret, and allows the uncertainty in the predictions to be measured;
  - the parameter estimates appear more robust and the forecasting performance improves.
The full paper is available on SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3868303

The Lee-Carter Model
Let $X = \{x_0, x_1, \dots, x_\omega\}$ be the set of ages and $T = \{t_0, t_1, \dots, t_n\}$ the set of calendar years considered.
The Lee-Carter (LC) model describes the logarithm of the central death rate $\log(m_{x,t}) \in \mathbb{R}$ at age $x \in X$ in calendar year $t \in T$ as
$$\log(m_{x,t}) = a_x + b_x k_t + \varepsilon_{x,t},$$
where:
  - $a_x$ is the average level of log-mortality at age $x$;
  - $k_t$ is the overall mortality trend (time index) in calendar year $t$;
  - $b_x$ measures how the log-mortality at age $x$ responds to changes in $k_t$, i.e. the rate of change of mortality broken down by age;
  - $\varepsilon_{x,t}$ is an error term.
To avoid identifiability problems, the following constraints are imposed:
$$\sum_{x\in X} b_x = 1, \qquad \frac{1}{|T|}\sum_{t\in T} k_t = 0.$$
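The constraints remove two exact invariances of the model: $a_x + b_x k_t$ is unchanged under $b_x \to b_x / c$, $k_t \to c\,k_t$, and under $a_x \to a_x + b_x d$, $k_t \to k_t - d$. The following minimal NumPy sketch (the helper names `normalise_lc` and `lc_log_rates` are hypothetical, and $\sum_x b_x \neq 0$ is assumed) maps any parameter triple onto one that satisfies the constraints without changing the fitted log-rates:

```python
import numpy as np

def normalise_lc(a, b, k):
    """Rescale (a_x, b_x, k_t) so that sum_x b_x = 1 and mean_t k_t = 0.

    The fitted surface a_x + b_x * k_t is left unchanged.
    Assumes sum_x b_x != 0.
    """
    a, b, k = (np.asarray(v, dtype=float) for v in (a, b, k))
    c = b.sum()
    b, k = b / c, k * c          # scale: b_x k_t = (b_x / c) (c k_t)
    d = k.mean()
    a, k = a + b * d, k - d      # location: a_x + b_x k_t = (a_x + b_x d) + b_x (k_t - d)
    return a, b, k

def lc_log_rates(a, b, k):
    """Matrix of log(m_{x,t}) = a_x + b_x k_t, ages in rows and years in columns."""
    return np.asarray(a, dtype=float)[:, None] + np.outer(b, k)

# Usage: three ages, three years
a0 = np.array([-6.0, -5.0, -4.0])
b0 = np.array([2.0, 1.0, 1.0])
k0 = np.array([1.0, 2.0, 3.0])
a1, b1, k1 = normalise_lc(a0, b0, k0)   # b1 = [0.5, 0.25, 0.25], k1 = [-4, 0, 4], a1 = [-2, -3, -2]
```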

The Lee-Carter Model: Ordinary Least Squares (OLS) estimation
The Ordinary Least Squares (OLS) estimates of the parameters are obtained by solving
$$\arg\min_{(a_x)_x,\,(b_x)_x,\,(k_t)_t} \sum_{x\in X}\sum_{t\in T}\big(\log(m_{x,t}) - a_x - b_x k_t\big)^2.$$
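The $(a_x)_x$ are estimated as
$$\hat a_x = \log\Big(\prod_{t\in T} m_{x,t}^{1/|T|}\Big) = \frac{1}{|T|}\sum_{t\in T}\log(m_{x,t}),$$
while $(k_t)_t$ and $(b_x)_x$ are estimated, respectively, as the first right and first left singular vectors in the Singular Value Decomposition (SVD) of the centred log-mortality matrix
$$M = \big(\log(m_{x,t}) - \hat a_x\big)_{x\in X,\,t\in T} \in \mathbb{R}^{|X|\times|T|},$$
rescaled (with the first singular value absorbed into $k_t$) so that the constraints above hold.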
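A minimal NumPy sketch of this SVD estimator, assuming the log-rates are stored as an $|X| \times |T|$ array with ages in rows (the function name `fit_lc_svd` is hypothetical). Because every row of $M$ has zero mean, the first right singular vector is orthogonal to the vector of ones, so $\sum_t \hat k_t = 0$ holds automatically and only the scale has to be fixed:

```python
import numpy as np

def fit_lc_svd(log_m):
    """OLS (SVD) estimates of the LC model.

    log_m: array of shape (|X|, |T|) with log central death rates, ages in rows.
    Returns (a, b, k) with sum_x b_x = 1 and sum_t k_t = 0 (up to rounding).
    """
    log_m = np.asarray(log_m, dtype=float)
    a = log_m.mean(axis=1)                   # a_x: time average of log(m_{x,t})
    M = log_m - a[:, None]                   # centred log-mortality matrix
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    b = U[:, 0]                              # first left singular vector (ages)
    k = s[0] * Vt[0]                         # first right singular vector (years), times s_1
    c = b.sum()                              # assumes sum_x b_x != 0
    return a, b / c, k * c                   # impose sum_x b_x = 1; b_x k_t is unchanged

# Usage on noise-free synthetic data generated from known parameters
a_true = np.array([-7.0, -5.0, -3.0, -1.0])
b_true = np.array([0.4, 0.3, 0.2, 0.1])
k_true = np.array([-5.0, -3.0, -1.0, 1.0, 3.0, 5.0])
a_hat, b_hat, k_hat = fit_lc_svd(a_true[:, None] + np.outer(b_true, k_true))
```

Dividing by $\sum_x b_x$ also resolves the sign ambiguity of the singular vectors, since flipping the sign of both $U$ and $V$ flips the sign of $c$ as well.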
In order to forecast, $(a_x)_x$ and $(b_x)_x$ are assumed to be constant over time, and the time index $k_t$ is modelled as an ARIMA(0,1,0) process with drift
$$k_t = k_{t-1} + \gamma + e_t, \qquad e_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),$$
where $\gamma \in \mathbb{R}$.
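The projection step can be sketched in NumPy as follows. The function name is hypothetical, the drift is estimated by the mean increment, and the approximate 95% bounds ($z = 1.96$) ignore the estimation error in $\gamma$, which is a simplifying assumption rather than part of the method described here:

```python
import numpy as np

def forecast_rw_drift(k, horizon, z=1.96):
    """Project k_t with k_t = k_{t-1} + gamma + e_t, e_t i.i.d. N(0, sigma^2).

    Returns point forecasts and approximate bounds point +/- z * sigma * sqrt(h),
    ignoring the estimation error in gamma (a simplifying assumption).
    """
    k = np.asarray(k, dtype=float)
    diffs = np.diff(k)
    gamma = diffs.mean()                                  # drift estimate, equals (k_n - k_0) / n
    sigma = diffs.std(ddof=1) if diffs.size > 1 else 0.0  # sample std of the increments
    h = np.arange(1, horizon + 1)
    point = k[-1] + gamma * h
    half_width = z * sigma * np.sqrt(h)                   # Var(k_{n+h} | k_n) = h * sigma^2
    return point, point - half_width, point + half_width

point, lower, upper = forecast_rw_drift([0.0, 1.0, 3.0, 4.0], horizon=3)
# gamma = 4/3, so point = [16/3, 20/3, 8]
```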

Multi-population mortality modelling: the Individual Lee-Carter Approach
A simple way of modelling the mortality of a set $I$ of different populations is to describe each population separately with its own LC model
$$\log(m^{(i)}_{x,t}) = a^{(i)}_x + b^{(i)}_x k^{(i)}_t + \varepsilon^{(i)}_{x,t}, \qquad \forall i \in I.$$
This approach is sometimes called the Individual Lee-Carter (ILC) approach. In this case, the model is fitted separately for each $i \in I$, and the population- and time-specific terms $k^{(i)}_t$ are projected with independent ARIMA(0,1,0) processes.

The Poisson Lee-Carter Model
The main drawback of the SVD approach is that it assumes homoskedastic errors (see Alho (NAAJ 2000)).
Brouhns, Denuit and Vermunt (IME 2002) propose a maximum likelihood estimation based on Poisson death counts $D^{(i)}_{x,t}$ to allow for heteroskedasticity:
$$D^{(i)}_{x,t} \sim \text{Poisson}\big(E^{(i)}_{x,t}\, m^{(i)}_{x,t}\big) \quad \text{with} \quad m^{(i)}_{x,t} = e^{a^{(i)}_x + b^{(i)}_x k^{(i)}_t},$$
where $E^{(i)}_{x,t}$ is the exposure-to-risk at age $x$ at time $t$ in population $i$, and the classical LC constraints still hold $\forall i \in I$.
The model parameters can be estimated by solving
$$\arg\max_{(a^{(i)}_x)_x,\,(b^{(i)}_x)_x,\,(k^{(i)}_t)_t} \sum_{x\in X}\sum_{t\in T}\Big(D^{(i)}_{x,t}\big(a^{(i)}_x + b^{(i)}_x k^{(i)}_t\big) - E^{(i)}_{x,t}\, e^{a^{(i)}_x + b^{(i)}_x k^{(i)}_t}\Big) + c_i, \qquad \forall i \in I,$$
where $c_i \in \mathbb{R}$ is a constant that does not depend on the parameters.
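A minimal NumPy sketch of this objective for one population (the function name and the $|X| \times |T|$ array layout are assumptions made for illustration). With `include_constant=True` it adds $c_i = \sum_{x,t}\big(D^{(i)}_{x,t}\log E^{(i)}_{x,t} - \log D^{(i)}_{x,t}!\big)$, which recovers the full Poisson log-likelihood and makes explicit that $c_i$ does not affect the maximiser:

```python
import numpy as np
from math import lgamma

def poisson_lc_loglik(a, b, k, D, E, include_constant=False):
    """Poisson log-likelihood of the LC model for one population.

    D, E: (|X|, |T|) arrays of death counts and exposures-to-risk.
    eta_{x,t} = a_x + b_x k_t is the log death rate; E_{x,t} exp(eta_{x,t}) are the expected deaths.
    Without the constant this is sum(D * eta - E * exp(eta)), the objective to maximise.
    """
    eta = np.asarray(a, dtype=float)[:, None] + np.outer(b, k)
    ll = np.sum(D * eta - E * np.exp(eta))
    if include_constant:
        # c = sum(D log E - log D!) does not depend on (a, b, k)
        ll += np.sum(D * np.log(E)) - sum(lgamma(d + 1.0) for d in np.ravel(D))
    return ll
```

Maximising it requires an iterative optimiser; that step is not shown here.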

Lee-Carter Model forecasting performance vs population size (Perla, Richman, Scognamiglio and Wüthrich (SAJ 2021)):
[Figure: log(MSE) by country (x-axis, from USA to ISL), separate Male and Female panels; models LC_SVD and LC_Poisson]
Figure: Forecasting Mean Squared Error (MSE), in log scale, of the LC model (LC SVD) and the Poisson LC model (LC Poisson) on the populations of the Human Mortality Database (HMD); fitting period 1950-1999; forecasting period 2000-2019; countries are sorted by population size in 2000.

USA mortality data (high-population country)
[Figure panels: Male, Female; x-axis: Age (0-100); y-axis: log(m_x); colour: Year]
Figure: Log mortality rates for different ages in the USA from 1950 to 2018. Source: Human Mortality Database (HMD).

LC model estimates using USA data (high-population country)
[Figure panels: a_x and b_x (x-axis: Age), k_t (x-axis: Year); y-axis: value; models: LC_SVD, LC_poisson; Gender: Female, Male]
Figure: Parameter estimates of the LC and Poisson LC models for USA mortality data.

Luxembourg Mortality Data (low-population country)
[Figure panels: Male, Female; x-axis: Age (0-100); y-axis: log(m_x); colour: Year]
Figure: Log mortality rates for different ages in Luxembourg from 1960 to 2018. Source: Human Mortality Database (HMD).

LC model estimates using Luxembourg data (low-population country)
[Figure panels: a_x and b_x (x-axis: Age), k_t (x-axis: Year); y-axis: value; models: LC_SVD, LC_poisson; Gender: Female, Male]
Figure: Parameter estimates of the LC and Poisson LC models for Luxembourg mortality data.

Model Formalisation
We simultaneously model the mortality of a set of populations $I$ that differ by region and gender: $i = (r, g) \in I = R \times G$, with $G = \{\text{male}, \text{female}\}$.
The network provides three subnets that approximate the parameters of the LC model. Each of these subnets combines several kinds of neural network layers:
  - the $a^{(i)}$-subnet uses embedding and fully-connected layers;
  - the $b^{(i)}$-subnet uses embedding and fully-connected layers;
  - the $k^{(i)}_t$-subnet uses fully-connected layers and/or other feed-forward layers.

The $a^{(i)}$-subnet
The two embedding layers map $r \in R$ and $g \in G$ into real-valued vectors:
$$z^{(a)}_R : R \to \mathbb{R}^{q^{(a)}_R}, \quad r \mapsto z^{(a)}_R(r) = \big(z^{(a)}_{R,1}(r), z^{(a)}_{R,2}(r), \dots, z^{(a)}_{R,q^{(a)}_R}(r)\big),$$
$$z^{(a)}_G : G \to \mathbb{R}^{q^{(a)}_G}, \quad g \mapsto z^{(a)}_G(g) = \big(z^{(a)}_{G,1}(g), z^{(a)}_{G,2}(g), \dots, z^{(a)}_{G,q^{(a)}_G}(g)\big).$$
The vector $z^{(a)}_I = z^{(a)}_I(r, g) = \big(z^{(a)}_R(r), z^{(a)}_G(g)\big) \in \mathbb{R}^{q^{(a)}_I}$ (with $q^{(a)}_I = q^{(a)}_R + q^{(a)}_G$) is a learned representation of the population $i = (r, g)$.
It is further processed by a fully-connected (FCN) layer, which maps $z^{(a)}_I$ into a new $|X|$-dimensional real-valued space:
$$f^{(a)} : \mathbb{R}^{q^{(a)}_I} \to \mathbb{R}^{|X|}, \quad z^{(a)}_I \mapsto f^{(a)}(z^{(a)}_I) = \big(f^{(a)}_{x_0}(z^{(a)}_I), f^{(a)}_{x_1}(z^{(a)}_I), \dots, f^{(a)}_{x_\omega}(z^{(a)}_I)\big).$$
Each new feature $f^{(a)}_x(z^{(a)}_I)$ is an age-specific function of the vector $z^{(a)}_I$:
$$z^{(a)}_I \mapsto f^{(a)}_x(z^{(a)}_I) = \phi^{(a)}\Big(w^{(a)}_{x,0} + \sum_{l=1}^{q^{(a)}_I} w^{(a)}_{x,l}\, z^{(a)}_{I,l}\Big) = \phi^{(a)}\big(w^{(a)}_{x,0} + \langle w^{(a)}_x, z^{(a)}_I\rangle\big), \quad x \in X,$$
where $\phi^{(a)} : \mathbb{R} \to \mathbb{R}$ is a (non-linear) activation function and $w^{(a)}_{x,l} \in \mathbb{R}$ are the network parameters.
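A minimal NumPy sketch of the forward pass of this subnet, assuming integer codes for regions and genders, embedding tables stored as plain arrays, and illustrative sizes ($|X| = 100$, $q^{(a)}_R = 5$, $q^{(a)}_G = 1$) that are not taken from the paper. The activation defaults to the identity; in practice these arrays would be trained jointly with the rest of the network, e.g. in a deep learning framework:

```python
import numpy as np

rng = np.random.default_rng(0)

n_regions, n_genders, n_ages = 40, 2, 100   # |R|, |G|, |X| (illustrative sizes)
q_R, q_G = 5, 1                             # embedding sizes q_R^(a), q_G^(a) (assumed)

# Parameters of the a^(i)-subnet, randomly initialised for illustration
Z_R = rng.normal(size=(n_regions, q_R))     # row r holds z_R^(a)(r)
Z_G = rng.normal(size=(n_genders, q_G))     # row g holds z_G^(a)(g)
w0 = rng.normal(size=n_ages)                # biases w_{x,0}^(a)
W = rng.normal(size=(n_ages, q_R + q_G))    # row x holds w_x^(a)

def a_subnet(r, g, phi=lambda u: u):
    """Return (f_x^(a)(z_I^(a)))_{x in X} for population i = (r, g).

    r, g are integer codes; phi is the activation (identity by default).
    """
    z_I = np.concatenate([Z_R[r], Z_G[g]])  # z_I^(a) = (z_R^(a)(r), z_G^(a)(g))
    return phi(w0 + W @ z_I)                # one output neuron per age x

a_curve = a_subnet(r=3, g=1)                # shape (100,)
```

The $b^{(i)}$-subnet has exactly the same structure with its own embeddings and weights.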

The $b^{(i)}$-subnet
Similarly to the first subnet, the second one provides two embedding layers of sizes $q^{(b)}_R, q^{(b)}_G \in \mathbb{N}$:
$$z^{(b)}_R : R \to \mathbb{R}^{q^{(b)}_R}, \quad r \mapsto z^{(b)}_R(r) = \big(z^{(b)}_{R,1}(r), z^{(b)}_{R,2}(r), \dots, z^{(b)}_{R,q^{(b)}_R}(r)\big),$$
$$z^{(b)}_G : G \to \mathbb{R}^{q^{(b)}_G}, \quad g \mapsto z^{(b)}_G(g) = \big(z^{(b)}_{G,1}(g), z^{(b)}_{G,2}(g), \dots, z^{(b)}_{G,q^{(b)}_G}(g)\big),$$
and a $|X|$-dimensional FCN layer, which maps the population-specific vector $z^{(b)}_I = z^{(b)}_I(r, g) = \big(z^{(b)}_R(r), z^{(b)}_G(g)\big) \in \mathbb{R}^{q^{(b)}_I}$ (with $q^{(b)}_I = q^{(b)}_R + q^{(b)}_G$) into the $|X|$-dimensional real-valued space:
$$f^{(b)} : \mathbb{R}^{q^{(b)}_I} \to \mathbb{R}^{|X|}, \quad z^{(b)}_I \mapsto f^{(b)}(z^{(b)}_I) = \big(f^{(b)}_{x_0}(z^{(b)}_I), f^{(b)}_{x_1}(z^{(b)}_I), \dots, f^{(b)}_{x_\omega}(z^{(b)}_I)\big).$$
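Also in this case, each new component $f^{(b)}_x(z^{(b)}_I)$ is an age-specific function of $z^{(b)}_I$:
$$z^{(b)}_I \mapsto f^{(b)}_x(z^{(b)}_I) = \phi^{(b)}\Big(w^{(b)}_{x,0} + \sum_{l=1}^{q^{(b)}_I} w^{(b)}_{x,l}\, z^{(b)}_{I,l}\Big) = \phi^{(b)}\big(w^{(b)}_{x,0} + \langle w^{(b)}_x, z^{(b)}_I\rangle\big), \quad x \in X,$$
with $\phi^{(b)} : \mathbb{R} \to \mathbb{R}$ and $w^{(b)}_{x,l} \in \mathbb{R}$.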

The first two subnets in compact form
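Denoting by $w^{(j)}_0 = (w^{(j)}_{x,0})_{x\in X} \in \mathbb{R}^{|X|}$ and $W^{(j)} = (w^{(j)}_{x,l})_{x\in X,\,1\le l\le q^{(j)}_I} \in \mathbb{R}^{|X|\times q^{(j)}_I}$, $\forall j \in \{a, b\}$, the output of the first two subnets can be written in compact form as
$$f^{(a)}(z^{(a)}_I) = \phi^{(a)}\big(w^{(a)}_0 + \langle W^{(a)}, z^{(a)}_I\rangle\big) = \phi^{(a)}\big(w^{(a)}_0 + \langle W^{(a)}_R, z^{(a)}_R(r)\rangle + \langle W^{(a)}_G, z^{(a)}_G(g)\rangle\big),$$
$$f^{(b)}(z^{(b)}_I) = \phi^{(b)}\big(w^{(b)}_0 + \langle W^{(b)}, z^{(b)}_I\rangle\big) = \phi^{(b)}\big(w^{(b)}_0 + \langle W^{(b)}_R, z^{(b)}_R(r)\rangle + \langle W^{(b)}_G, z^{(b)}_G(g)\rangle\big),$$
where $\langle W, z\rangle$ denotes the matrix-vector product, $\phi^{(j)}$ is applied component-wise, and the FCN weight matrices are decomposed as $W^{(j)} = \big(W^{(j)}_R, W^{(j)}_G\big)$ to distinguish the weights that refer to the region-specific and the gender-specific features.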

The $k^{(i)}_t$-subnet
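The first FCN layer maps the log-mortality curve $\log(m^{(i)}_t) = \big(\log(m^{(i)}_{x,t})\big)_{x\in X} \in \mathbb{R}^{|X|}$ into a $q_{z_1}$-dimensional real-valued space:
$$f^{(k_1)} : \mathbb{R}^{|X|} \to \mathbb{R}^{q_{z_1}}, \quad \log(m^{(i)}_t) \mapsto f^{(k_1)}\big(\log(m^{(i)}_t)\big) = \Big(f^{(k_1)}_1\big(\log(m^{(i)}_t)\big), \dots, f^{(k_1)}_{q_{z_1}}\big(\log(m^{(i)}_t)\big)\Big),$$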
where each new feature component $f^{(k_1)}_s\big(\log(m^{(i)}_t)\big)$ is a function of the mortality rates at all ages:
$$\log(m^{(i)}_t) \mapsto f^{(k_1)}_s\big(\log(m^{(i)}_t)\big) = \phi^{(k_1)}\big(w^{(k_1)}_{s,0} + \langle w^{(k_1)}_s, \log(m^{(i)}_t)\rangle\big), \quad s = 1, \dots, q_{z_1},$$
where $w^{(k_1)}_{s,0} \in \mathbb{R}$ and $w^{(k_1)}_s \in \mathbb{R}^{|X|}$ are parameters.
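The second FCN layer, of size $q_{z_2} = 1$, is a mapping
$$f^{(k_2)} : \mathbb{R}^{q_{z_1}} \to \mathbb{R}, \quad f^{(k_1)}\big(\log(m^{(i)}_t)\big) \mapsto f^{(k_2)}\Big(f^{(k_1)}\big(\log(m^{(i)}_t)\big)\Big) = \big(f^{(k_2)} \circ f^{(k_1)}\big)\big(\log(m^{(i)}_t)\big).$$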
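It extracts a single new feature
$$\big(f^{(k_2)} \circ f^{(k_1)}\big)\big(\log(m^{(i)}_t)\big) = \phi^{(k_2)}\Big(w^{(k_2)}_0 + \Big\langle w^{(k_2)}, \phi^{(k_1)}\big(w^{(k_1)}_0 + \langle W^{(k_1)}, \log(m^{(i)}_t)\rangle\big)\Big\rangle\Big),$$
where $w^{(k_2)}_0 \in \mathbb{R}$, $w^{(k_1)}_0 = (w^{(k_1)}_{s,0})_{1\le s\le q_{z_1}} \in \mathbb{R}^{q_{z_1}}$, $w^{(k_2)} \in \mathbb{R}^{q_{z_1}}$ and $W^{(k_1)} = (w^{(k_1)}_s)_{1\le s\le q_{z_1}} \in \mathbb{R}^{q_{z_1}\times|X|}$ are network parameters, and $\phi^{(j)}(\cdot) : \mathbb{R} \to \mathbb{R}$ for $j \in \{k_1, k_2\}$ are activation functions.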
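The composition $f^{(k_2)} \circ f^{(k_1)}$ can be sketched in NumPy as below. The sizes, the random initialisation, and the choice $\phi^{(k_1)} = \tanh$ with $\phi^{(k_2)}$ equal to the identity are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n_ages, q_z1 = 100, 8                                  # |X| and q_{z1} (illustrative)

W_k1 = rng.normal(scale=0.1, size=(q_z1, n_ages))     # W^(k1), row s holds w_s^(k1)
w0_k1 = np.zeros(q_z1)                                # w_0^(k1)
w_k2 = rng.normal(scale=0.1, size=q_z1)               # w^(k2)
w0_k2 = 0.0                                           # w_0^(k2)

def k_subnet(log_m_t, phi1=np.tanh, phi2=lambda u: u):
    """Map a log-mortality curve log(m_t^(i)) in R^|X| to the scalar k_t^(i)."""
    h = phi1(w0_k1 + W_k1 @ log_m_t)                  # first FCN layer: q_{z1} features
    return phi2(w0_k2 + w_k2 @ h)                     # second FCN layer, q_{z2} = 1 unit

curve = np.linspace(-9.0, -1.0, n_ages)               # a made-up log-mortality curve
k_t = k_subnet(curve)
```

Because $k^{(i)}_t$ is computed from the observed curve of year $t$, the same weights serve every population and every year.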

Graphical summary of the model
[Figure legend: fully-connected layer, embedding layer, concatenation]
Figure: Graphical representation of the neural network architecture for fitting ILC models.

Model Interpretation
Finally, an approximation of the log-mortality curve at time $t$ in population $i$ can be obtained as
$$\widehat{\log(m^{(i)}_t)} = f^{(a)}(z^{(a)}_I) + f^{(b)}(z^{(b)}_I)\,\big(f^{(k_2)} \circ f^{(k_1)}\big)\big(\log(m^{(i)}_t)\big),$$
where each age component is given by
$$\widehat{\log(m^{(i)}_{x,t})} = \underbrace{f^{(a)}_x(z^{(a)}_I)}_{a^{(i)}_x} + \underbrace{f^{(b)}_x(z^{(b)}_I)}_{b^{(i)}_x}\,\underbrace{\big(f^{(k_2)} \circ f^{(k_1)}\big)\big(\log(m^{(i)}_t)\big)}_{k^{(i)}_t}.$$
A simple interpretation of all the terms can be provided:
  - $f^{(a)}_x(z^{(a)}_I) \in \mathbb{R}$ is a population- and age-specific term that plays the same role as $a^{(i)}_x$ in the LC model;
  - $f^{(b)}_x(z^{(b)}_I) \in \mathbb{R}$ is a population- and age-specific term that plays the same role as $b^{(i)}_x$ in the LC model;
  - $\big(f^{(k_2)} \circ f^{(k_1)}\big)\big(\log(m^{(i)}_t)\big) \in \mathbb{R}$ is a population- and time-specific term that plays the same role as $k^{(i)}_t$ in the LC model.

Model Interpretation
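Setting linear activations $\phi^{(j)}(x) = x$, $\forall j \in \{a, b\}$, and expanding all the terms in the previous equation, some further interpretations can be drawn:
$$\widehat{\log(m^{(i)}_{x,t})} = \underbrace{\Big(\overbrace{w^{(a)}_{x,0}}^{\text{global } a_x} + \overbrace{\langle w^{(a)}_x, z^{(a)}_I\rangle}^{\text{population effect}}\Big)}_{a^{(i)}_x} + \underbrace{\Big(\overbrace{w^{(b)}_{x,0}}^{\text{global } b_x} + \overbrace{\langle w^{(b)}_x, z^{(b)}_I\rangle}^{\text{population effect}}\Big)}_{b^{(i)}_x} \cdot \underbrace{\big(f^{(k_2)} \circ f^{(k_1)}\big)\big(\log(m^{(i)}_t)\big)}_{k^{(i)}_t}$$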
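  - $w^{(a)}_{x,0}$ can be seen as a population-independent $a_x$ parameter;
  - $\langle w^{(a)}_x, z^{(a)}_I\rangle$ can be seen as a population-specific $a_x$ correction, which can be decomposed as
$$\underbrace{\langle w^{(a)}_x, z^{(a)}_I\rangle}_{\text{population effect}} = \underbrace{\langle w^{(a)}_{x,R}, z^{(a)}_R(r)\rangle}_{\text{regional effect}} + \underbrace{\langle w^{(a)}_{x,G}, z^{(a)}_G(g)\rangle}_{\text{gender effect}};$$
  - $w^{(b)}_{x,0}$ can be seen as a population-independent $b_x$ parameter;
  - $\langle w^{(b)}_x, z^{(b)}_I\rangle$ can be seen as a population-specific $b_x$ correction, which can be decomposed as
$$\underbrace{\langle w^{(b)}_x, z^{(b)}_I\rangle}_{\text{population effect}} = \underbrace{\langle w^{(b)}_{x,R}, z^{(b)}_R(r)\rangle}_{\text{regional effect}} + \underbrace{\langle w^{(b)}_{x,G}, z^{(b)}_G(g)\rangle}_{\text{gender effect}}.$$
Here $w^{(j)}_{x,R}$ and $w^{(j)}_{x,G}$ denote the rows of $W^{(j)}_R$ and $W^{(j)}_G$ corresponding to age $x$, for $j \in \{a, b\}$.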
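With linear activations, the split into global, regional and gender effects is simply a block decomposition of a matrix-vector product. A small NumPy check with random, purely illustrative values (the sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n_ages, q_R, q_G = 100, 5, 1                 # illustrative sizes

z_R = rng.normal(size=q_R)                   # z_R^(a)(r) for one region r
z_G = rng.normal(size=q_G)                   # z_G^(a)(g) for one gender g
w0 = rng.normal(size=n_ages)                 # w_{x,0}^(a)
W_R = rng.normal(size=(n_ages, q_R))         # rows w_{x,R}^(a)
W_G = rng.normal(size=(n_ages, q_G))         # rows w_{x,G}^(a)

W = np.hstack([W_R, W_G])                    # W^(a) = (W_R^(a), W_G^(a))
z_I = np.concatenate([z_R, z_G])             # z_I^(a) = (z_R^(a)(r), z_G^(a)(g))

a_x = w0 + W @ z_I                           # phi^(a)(u) = u
global_effect = w0
regional_effect = W_R @ z_R
gender_effect = W_G @ z_G

assert np.allclose(a_x, global_effect + regional_effect + gender_effect)
```

The same identity holds for $b^{(i)}_x$; with a non-linear $\phi^{(j)}$ the effects are still additive inside the activation, but no longer on the scale of $a^{(i)}_x$ or $b^{(i)}_x$ itself.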

Model fitting and forecasting
Denoting by $\psi$ the full set of the network model's parameters, it can be split into two groups:
  - the population-specific parameters $z^{(a)}_R(r), z^{(b)}_R(r)$, $\forall r \in R$, and $z^{(a)}_G(g), z^{(b)}_G(g)$, $\forall g \in G$;
  - the cross-population parameters $w^{(j)}_0, W^{(j)}$, $\forall j \in \{a, b, k_1\}$, and $w^{(k_2)}, w^{(k_2)}_0$.
These parameters are iteratively adjusted by gradient-based optimisation, with the gradients computed via the back-propagation algorithm, to minimise a given loss function.
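The resulting estimates $\hat\psi$ can be used to compute the Neural Network (NN) estimates of the LC parameters:
$$\hat a^{(i)}_{x,NN} = \phi^{(a)}\big(\hat w^{(a)}_{x,0} + \langle \hat w^{(a)}_{x,R}, \hat z^{(a)}_R(r)\rangle + \langle \hat w^{(a)}_{x,G}, \hat z^{(a)}_G(g)\rangle\big), \quad \forall x \in X,\ \forall i \in I,$$
$$\hat b^{(i)}_{x,NN} = \phi^{(b)}\big(\hat w^{(b)}_{x,0} + \langle \hat w^{(b)}_{x,R}, \hat z^{(b)}_R(r)\rangle + \langle \hat w^{(b)}_{x,G}, \hat z^{(b)}_G(g)\rangle\big), \quad \forall x \in X,\ \forall i \in I,$$
$$\hat k^{(i)}_{t,NN} = \phi^{(k_2)}\Big(\hat w^{(k_2)}_0 + \Big\langle \hat w^{(k_2)}, \phi^{(k_1)}\big(\hat w^{(k_1)}_0 + \langle \hat W^{(k_1)}, \log(m^{(i)}_t)\rangle\big)\Big\rangle\Big), \quad \forall t \in T,\ \forall i \in I.$$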
Forecasting is performed by assuming that $\hat a^{(i)}_{x,NN}$ and $\hat b^{(i)}_{x,NN}$ are constant over time, while $\hat k^{(i)}_{t,NN}$ is projected with a random walk with drift, $\forall i \in I$.

Experiment Design: Human Mortality Database
Data description:
  - Human Mortality Database (HMD): we simultaneously consider all populations $i = (r, g) \in I$ with $|I| = 80$ (male and female populations of 40 countries) for calendar years in $T = \{t \in \mathbb{N} : 1950 \le t \le 2018\}$.
Data partitioning:
  - training data: $T_{train} = \{t \in \mathbb{N} : 1950 \le t \le 1999\}$;
  - test data: $T_{test} = \{t \in \mathbb{N} : 2000 \le t \le 2018\}$.
We consider three different networks, which differ from each other in the design of the $k^{(i)}_t$-subnet that processes the log-mortality curves:
1. LC FCN employs a fully-connected layer;
2. LC LCN employs a 1D locally-connected layer (local connectivity);
3. LC CONV employs a 1D convolutional layer (local connectivity and parameter sharing).
See Chapter 9 of Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Table: Number of network parameters for the neural network models considered.
Model | # parameters
LC FCN | 5,171
LC LCN | 2,771
LC CONV | 2,651
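The difference between the LCN and CONV designs can be illustrated with a single-channel, single-filter NumPy sketch ("valid" windows, stride 1). The function names and sizes are hypothetical and unrelated to the configurations counted in the table above. Both layers see only a window of neighbouring ages for each output; the convolution additionally shares one kernel across all positions, which is why it needs far fewer parameters:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def locally_connected_1d(x, W, bias):
    """Local connectivity only: output j uses its own kernel W[j] and bias[j].

    x: (n,), W: (n - K + 1, K), bias: (n - K + 1,); 'valid' windows, stride 1.
    """
    windows = sliding_window_view(x, W.shape[1])   # (n - K + 1, K)
    return np.sum(windows * W, axis=1) + bias

def conv_1d(x, w, bias):
    """Local connectivity + parameter sharing: one kernel w and one bias for all positions.

    Like deep learning libraries, this computes a cross-correlation (no kernel flip).
    """
    windows = sliding_window_view(x, w.size)
    return windows @ w + bias

n, K = 100, 5                                      # curve length |X| and kernel size (illustrative)
n_out = n - K + 1
params_locally_connected = n_out * K + n_out       # 96 * 5 + 96 = 576
params_conv = K + 1                                # 6
```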

Model fitting: MSE minimisation
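In the first stage, all the network models are fitted by minimising the Mean Squared Error (MSE). Network training involves the minimisation of the following loss function (a sum of squared errors, which differs from the MSE only by the constant factor given by the number of observations):
$$L(\psi) = \sum_{x\in X}\sum_{i\in I}\sum_{t\in T}\Big(\log(m^{(i)}_{x,t}) - \phi^{(a)}\big(w^{(a)}_{x,0} + \langle w^{(a)}_x, z^{(a)}_I\rangle\big) - \phi^{(b)}\big(w^{(b)}_{x,0} + \langle w^{(b)}_x, z^{(b)}_I\rangle\big)\cdot\phi^{(k_2)}\Big(w^{(k_2)}_0 + \Big\langle w^{(k_2)}, \phi^{(k_1)}\big(w^{(k_1)}_0 + \langle W^{(k_1)}, \log(m^{(i)}_t)\rangle\big)\Big\rangle\Big)\Big)^2.$$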
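Once the three subnets have produced $a^{(i)}_x$, $b^{(i)}_x$ and $k^{(i)}_t$, the loss is a sum of squared residuals over a three-dimensional array. A NumPy sketch with a hypothetical function name and random stand-ins for the subnet outputs; in an actual training loop it would be computed inside an automatic-differentiation framework so that back-propagation can update $\psi$:

```python
import numpy as np

def lc_sse_loss(log_m, a, b, k):
    """Sum of squared errors over ages x, populations i and years t.

    log_m: (|I|, |X|, |T|) observed log death rates.
    a, b:  (|I|, |X|) subnet outputs playing the role of a_x^(i) and b_x^(i).
    k:     (|I|, |T|) subnet outputs playing the role of k_t^(i).
    Dividing by log_m.size gives the mean squared error.
    """
    fitted = a[:, :, None] + b[:, :, None] * k[:, None, :]   # a_x^(i) + b_x^(i) k_t^(i)
    return np.sum((log_m - fitted) ** 2)

rng = np.random.default_rng(3)
n_pop, n_ages, n_years = 4, 100, 50                           # illustrative sizes
a = rng.normal(-5.0, 1.0, size=(n_pop, n_ages))
b = rng.uniform(0.0, 0.02, size=(n_pop, n_ages))
k = rng.normal(0.0, 10.0, size=(n_pop, n_years))
log_m = a[:, :, None] + b[:, :, None] * k[:, None, :]         # noise-free synthetic data
```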

Forecasting performance
Table: Results of all three network architectures considered: forecasting MSE, and number of populations and ages in which each network beats the LC SVD model; forecasting period 2000-2019; MSE values are in units of $10^{-4}$.
Model | MSE | # Populations | # Ages
LC CONV mse | 3.41 | 52/80 | 83/100
LC FCN mse | 3.25 | 59/80 | 84/100
LC LCN mse | 3.22 | 60/80 | 84/100
LC SVD | 6.12 | - | -

Estimates comparison
[Figure: one panel per country, in the order USA, RUS, JPN, DEUTW, FRATNP, ITA, GBRTENW, UKR, ESP, POL, CAN, TWN, AUS, NLD, DEUTE, GRC, HUN, PRT, BLR, CZE, BEL, SWE, AUT, BGR, CHE, ISR, SVK, DNK, FIN, GBR_SCO, NOR, IRL, LTU, NZL_NM, LVA, SVN, GBR_NIR, EST, LUX, ISL; x-axis: Age; y-axis: value; models LC_LCN_mse and LC_SVD; Gender: Female, Male]
Figure: Comparison of the LC LCN mse and LC SVD estimates of $(a^{(i)}_x)_{x\in X}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.

[Figure panels: LC_LCN_mse, LC_SVD; x-axis: Age; y-axis: value; Gender: Female, Male]
Figure: Comparison of the LC LCN mse and LC SVD estimates of $(a^{(i)}_x)_{x\in X}$, distinguishing by model; fitting period 1950-1999.

[Figure: one panel per country; x-axis: Age; y-axis: value; models LC_LCN_mse and LC_SVD; Gender: Female, Male]
Figure: Comparison of the LC LCN mse and LC SVD estimates of $(b^{(i)}_x)_{x\in X}$ for all the populations.

[Figure: one panel per country; x-axis: Year; y-axis: value; models LC_LCN_mse and LC_SVD; Gender: Female, Male]
Figure: Comparison of the LC LCN mse and LC SVD estimates of $(k^{(i)}_t)_{t\in T}$ for all the populations.

Model fitting: Poisson loss minimisation
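Assuming Poisson-distributed numbers of deaths $D^{(i)}_{x,t}$, we explore the use of the Poisson loss function to train the neural network models:
$$D^{(i)}_{x,t} \sim \text{Poisson}\big(E^{(i)}_{x,t}\, e^{\tilde m^{(i)}_{x,t}}\big),$$
where $\tilde m^{(i)}_{x,t}$ is the predicted log death rate
$$\tilde m^{(i)}_{x,t} = \big(w^{(a)}_{x,0} + \langle w^{(a)}_x, z^{(a)}_I\rangle\big) + \big(w^{(b)}_{x,0} + \langle w^{(b)}_x, z^{(b)}_I\rangle\big)\cdot\Big(w^{(k_2)}_0 + \Big\langle w^{(k_2)}, \phi^{(k_1)}\big(w^{(k_1)}_0 + \langle W^{(k_1)}, \log(m^{(i)}_t)\rangle\big)\Big\rangle\Big).$$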
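In this setting, fitting the neural network models involves the minimisation of
$$L(\psi) = \sum_{x\in X}\sum_{i\in I}\sum_{t\in T}\Big(E^{(i)}_{x,t}\, e^{\tilde m^{(i)}_{x,t}} - D^{(i)}_{x,t}\, \tilde m^{(i)}_{x,t}\Big) + c,$$
which corresponds to maximising the log-likelihood function under the Poisson assumption, with $c \in \mathbb{R}$ a constant that does not depend on $\psi$.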
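A NumPy sketch of this loss (the function name and the toy counts are made up for illustration). Setting the derivative $E e^{\tilde m} - D$ to zero shows that, cell by cell, the loss is minimised at the crude log death rate $\tilde m = \log(D/E)$; the network cannot reach this for every cell because all cells share the parameters $\psi$:

```python
import numpy as np

def poisson_loss(log_rate, D, E):
    """Poisson loss sum(E * exp(log_rate) - D * log_rate).

    log_rate: predicted log death rates (m-tilde in the text), same shape as D and E.
    Equals the negative Poisson log-likelihood up to a constant that does not depend on log_rate.
    """
    return np.sum(E * np.exp(log_rate) - D * log_rate)

D = np.array([[3.0, 5.0], [2.0, 8.0]])          # death counts (illustrative, all > 0)
E = np.array([[100.0, 120.0], [90.0, 150.0]])   # exposures-to-risk (illustrative)
best = np.log(D / E)                            # cell-wise minimiser: E * exp(m) = D
```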
We use the same HMD data; however, this time we exclude the Canadian populations, since their data present several missing values in the exposure-to-risk time series. Hence, $|I| = 78$.

Forecasting performance
Table: Results of all three network architectures considered: forecasting MSEs, and number of populations and ages in which each network beats the LC Poisson and the LC SVD models; forecasting period 2000-2019; MSE values are in units of $10^{-4}$.
Model | MSE | vs LC Poisson: # Populations | vs LC Poisson: # Ages | vs LC SVD: # Populations | vs LC SVD: # Ages
LC CONV Poisson | 3.02 | 57/78 | 83/100 | 64/78 | 83/100
LC FCN Poisson | 3.07 | 55/78 | 83/100 | 63/78 | 83/100
LC LCN Poisson | 2.89 | 61/78 | 83/100 | 67/78 | 83/100
LC SVD | 6.12 | - | - | - | -
LC Poisson | 5.19 | - | - | - | -

Forecasting MSEs of the LC LCN Poisson and the LC Poisson models on different populations
# | Country | Male: LC LCN Poisson | Male: LC Poisson | Female: LC LCN Poisson | Female: LC Poisson
1 | USA | 1.15 | 1.42 | 0.27 | 0.50
2 | RUS | 2.19 | 8.35 | 2.23 | 5.89
3 | JPN | 0.91 | 0.45 | 2.30 | 0.40
4 | DEUTW | 0.63 | 0.80 | 0.23 | 0.35
5 | FRATNP | 0.77 | 0.52 | 0.64 | 0.34
6 | ITA | 0.49 | 0.58 | 0.94 | 0.24
7 | GBRTENW | 0.74 | 1.11 | 0.66 | 0.38
8 | UKR | 2.05 | 7.19 | 3.40 | 3.72
9 | ESP | 0.80 | 1.72 | 0.63 | 1.27
10 | POL | 2.61 | 4.69 | 0.85 | 3.29
11 | TWN | 4.82 | 10.49 | 1.42 | 0.95
12 | AUS | 0.89 | 1.14 | 0.32 | 0.41
13 | NLD | 1.11 | 1.76 | 0.43 | 0.35
14 | DEUTE | 1.82 | 2.71 | 0.70 | 1.45
15 | GRC | 1.73 | 3.16 | 0.55 | 1.97
16 | HUN | 3.45 | 6.01 | 1.22 | 1.38
17 | PRT | 1.33 | 2.42 | 0.99 | 2.01
18 | BLR | 3.34 | 12.76 | 3.47 | 10.24
19 | CZE | 2.97 | 4.68 | 1.03 | 2.27
20 | BEL | 1.56 | 2.31 | 0.47 | 0.51
21 | SWE | 1.10 | 1.13 | 0.25 | 0.38
22 | AUT | 1.51 | 2.57 | 0.40 | 0.61
23 | BGR | 5.83 | 11.30 | 2.95 | 6.14
24 | CHE | 1.41 | 1.81 | 0.32 | 0.32
25 | ISR | 2.38 | 1.85 | 2.03 | 1.81
26 | SVK | 7.20 | 13.27 | 3.20 | 2.54
27 | DNK | 2.01 | 2.27 | 0.53 | 0.42
28 | FIN | 3.74 | 3.73 | 0.82 | 1.10
29 | GBR_SCO | 1.69 | 1.97 | 0.41 | 0.67
30 | NOR | 2.11 | 3.50 | 0.71 | 0.51
31 | IRL | 3.40 | 7.82 | 1.52 | 2.23
32 | LTU | 6.59 | 9.37 | 9.54 | 7.60
33 | NZL_NM | 2.50 | 4.19 | 0.70 | 1.19
34 | LVA | 10.38 | 11.37 | 3.00 | 3.57
35 | SVN | 10.18 | 69.32 | 2.01 | 4.77
36 | GBR_NIR | 5.75 | 8.21 | 1.62 | 1.80
37 | EST | 16.05 | 18.88 | 3.49 | 6.88
38 | LUX | 15.90 | 43.12 | 5.42 | 6.74
39 | ISL | 19.17 | 19.98 | 7.56 | 7.40

Projected log-mortality surface for the Luxembourg male population
Figure: Projected log-mortality surfaces of the LC Poisson and LC LCN Poisson models for the Luxembourg male population.

Future works
Future research intends to:
1. analyse the performance of the proposed model on other available data sources, such as the United States Mortality DataBase (USMDB) and insurance portfolio data;
2. investigate the use of neural networks for fitting other stochastic mortality models:
  - the single-population models belonging to the family of Generalized Age-Period-Cohort (GAPC) models;
  - the multi-population extensions of the LC model: Li and Lee (Demography 2005), Kleinow (IME 2015);
3. explore the potential of the proposed large-scale mortality model in actuarial valuations and longevity risk management.
For advice or comments: salvatore.scognamiglio@uniparthenope.it
