Using Transformations of Variables to Improve Inference
A. Charpentier (UQAM),
joint work with J.-D. Fermanian (CREST) E. Flachaire (AMSE) G. Geenens
(UNSW), A. Oulidi (UIO), D. Paindaveine (ULB), O. Scaillet (UNIGE)
Montréal, UQAM, November 2018
Side Note
Most of the content here is based on old results (revised, and continued).
Work on genealogical trees, Étude de la démographie française du XIXe siècle à
partir de données collaboratives de généalogie and Internal Migrations in France in
the Nineteenth Century with E. Gallic.
[Figures: two panels (Children, Grandchildren, Great-grandchildren), with density scales 0–0.503 and 0–0.116.]
Motivation
2005, The Estimation of Copulas: Theory and Practice, with J.-D. Fermanian and O. Scaillet: a survey on non-parametric techniques (kernel based) to visualize the estimator of a copula density.
[Figure: three perspective plots of copula density estimates on $[0,1]^2$.]
Idea: beta kernels and transformed kernels. More recently, mix those techniques in the univariate case.
Motivation
Consider some n-i.i.d. sample {(X_i, Y_i)} with cumulative distribution function $F_{XY}$ and joint density $f_{XY}$. Let $F_X$ and $F_Y$ denote the marginal distributions, and $C$ the copula,

$$F_{XY}(x, y) = C(F_X(x), F_Y(y))$$

so that

$$f_{XY}(x, y) = f_X(x)\, f_Y(y)\, c(F_X(x), F_Y(y)).$$

We want a nonparametric estimate of $c$ on $[0, 1]^2$.

[Figure: scatterplot of the raw sample, log-log scales.]
Notations
Define the uniformized n-i.i.d. sample {(U_i, V_i)},

$$U_i = F_X(X_i) \quad\text{and}\quad V_i = F_Y(Y_i),$$

or the uniformized n-i.i.d. pseudo-sample $\{(\hat U_i, \hat V_i)\}$,

$$\hat U_i = \frac{n}{n+1}\,\hat F_{X,n}(X_i) \quad\text{and}\quad \hat V_i = \frac{n}{n+1}\,\hat F_{Y,n}(Y_i),$$

where $\hat F_{X,n}$ and $\hat F_{Y,n}$ denote the empirical c.d.f.
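A minimal R sketch of these pseudo-observations (assuming no ties; x and y are the observed vectors, the names are ours):

n <- length(x)
U.hat <- rank(x) / (n + 1)   # equals n/(n+1) * Fhat_{X,n}(X_i) when there are no ties
V.hat <- rank(y) / (n + 1)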
Standard Kernel Estimate
The standard kernel estimator for $c$, say $\hat c^*$, at $(u, v) \in I$ would be (see Wand & Jones (1995))

$$\hat c^*(u, v) = \frac{1}{n\,|H_{UV}|^{1/2}} \sum_{i=1}^n K\!\left(H_{UV}^{-1/2}\begin{pmatrix} u - U_i \\ v - V_i \end{pmatrix}\right), \qquad (1)$$

where $K : \mathbb{R}^2 \to \mathbb{R}$ is a kernel function and $H_{UV}$ is a bandwidth matrix.
Standard Kernel Estimate
However, this estimator is not consistent along the boundaries of $[0, 1]^2$:

$$E(\hat c^*(u, v)) = \tfrac{1}{4}\, c(u, v) + O(h) \text{ at corners}, \qquad E(\hat c^*(u, v)) = \tfrac{1}{2}\, c(u, v) + O(h) \text{ on the borders},$$

if $K$ is symmetric and $H_{UV}$ symmetric.

Corrections have been proposed, e.g. mirror reflection, Gijbels & Mielniczuk (1990), or the use of boundary kernels, Chen (2007), but with mixed results.

Remark: the graph on the bottom is $\hat c^*$ on the (first) diagonal.
Mirror Kernel Estimate
Use an enlarged sample: instead of only $\{(\hat U_i, \hat V_i)\}$, add $\{(-\hat U_i, \hat V_i)\}$, $\{(\hat U_i, -\hat V_i)\}$, $\{(-\hat U_i, -\hat V_i)\}$, $\{(\hat U_i, 2 - \hat V_i)\}$, $\{(2 - \hat U_i, \hat V_i)\}$, $\{(-\hat U_i, 2 - \hat V_i)\}$, $\{(2 - \hat U_i, -\hat V_i)\}$ and $\{(2 - \hat U_i, 2 - \hat V_i)\}$.

See Gijbels & Mielniczuk (1990). That estimator will be used as a benchmark in the simulation study.
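A sketch of the mirror-reflection construction in R: nine reflected copies, then an ordinary kernel estimate normalised by the original n (not 9n), so the mass falling back into the unit square is accounted for:

U.mir <- c(U.hat, -U.hat, U.hat, -U.hat, U.hat, 2 - U.hat, -U.hat, 2 - U.hat, 2 - U.hat)
V.mir <- c(V.hat, V.hat, -V.hat, -V.hat, 2 - V.hat, V.hat, 2 - V.hat, -V.hat, 2 - V.hat)
c.mirror <- function(u, v, h) {
  sum(dnorm((u - U.mir) / h) * dnorm((v - V.mir) / h)) / (length(U.hat) * h^2)
}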
Using Beta Kernels
Use a kernel which is a product of beta kernels,

$$K_{x_i}(u) \propto x_{1,i}^{\,u_1/b}\,[1 - x_{1,i}]^{(1-u_1)/b} \cdot x_{2,i}^{\,u_2/b}\,[1 - x_{2,i}]^{(1-u_2)/b}$$

for some $b > 0$, see Chen (1999); shown here for some observation $x_i$ in the lower left corner.
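A sketch of the resulting estimator in R: the estimate at (u, v) averages products of beta densities evaluated at the pseudo-observations (the "+ 1" in the shape parameters follows the usual beta-kernel convention and is our assumption):

c.beta <- function(u, v, U, V, b) {
  mean(dbeta(U, u / b + 1, (1 - u) / b + 1) * dbeta(V, v / b + 1, (1 - v) / b + 1))
}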
Using Beta Kernels

[Figure: beta-kernel copula density estimates.]
Probit Transformation
See Devroye & Györfi (1985) and Marron & Ruppert (1994).

Define the normalized n-i.i.d. sample $\{(S_i, T_i)\}$,

$$S_i = \Phi^{-1}(U_i) \quad\text{and}\quad T_i = \Phi^{-1}(V_i),$$

or the normalized n-i.i.d. pseudo-sample $\{(\hat S_i, \hat T_i)\}$,

$$\hat S_i = \Phi^{-1}(\hat U_i) \quad\text{and}\quad \hat T_i = \Phi^{-1}(\hat V_i),$$

where $\Phi^{-1}$ is the quantile function of $N(0, 1)$ (probit transformation).
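In R the normal scores are one line each (qnorm is $\Phi^{-1}$):

S.hat <- qnorm(U.hat)
T.hat <- qnorm(V.hat)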
Probit Transformation
$$F_{ST}(x, y) = C(\Phi(x), \Phi(y))$$

so that

$$f_{ST}(x, y) = \phi(x)\,\phi(y)\, c(\Phi(x), \Phi(y)).$$

Thus

$$c(u, v) = \frac{f_{ST}(\Phi^{-1}(u), \Phi^{-1}(v))}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))}.$$

So use

$$\hat c^{(\tau)}(u, v) = \frac{\hat f_{ST}(\Phi^{-1}(u), \Phi^{-1}(v))}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))}.$$
The naive estimator
Since we cannot use

$$\hat f^*_{ST}(s, t) = \frac{1}{n\,|H_{ST}|^{1/2}} \sum_{i=1}^n K\!\left(H_{ST}^{-1/2}\begin{pmatrix} s - S_i \\ t - T_i \end{pmatrix}\right),$$

where $K$ is a kernel function and $H_{ST}$ is a bandwidth matrix, use

$$\hat f_{ST}(s, t) = \frac{1}{n\,|H_{ST}|^{1/2}} \sum_{i=1}^n K\!\left(H_{ST}^{-1/2}\begin{pmatrix} s - \hat S_i \\ t - \hat T_i \end{pmatrix}\right),$$

and the copula density is

$$\hat c^{(\tau)}(u, v) = \frac{\hat f_{ST}(\Phi^{-1}(u), \Phi^{-1}(v))}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))}.$$
The naive estimator
$$\hat c^{(\tau)}(u, v) = \frac{1}{n\,|H_{ST}|^{1/2}\,\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))} \sum_{i=1}^n K\!\left(H_{ST}^{-1/2}\begin{pmatrix} \Phi^{-1}(u) - \Phi^{-1}(\hat U_i) \\ \Phi^{-1}(v) - \Phi^{-1}(\hat V_i) \end{pmatrix}\right)$$

as suggested in C., Fermanian & Scaillet (2007) and Lopez-Paz et al. (2013). Note that Omelka et al. (2009) obtained theoretical properties on the convergence of $\hat C^{(\tau)}(u, v)$ (not $c$).

In Probit transformation for nonparametric kernel estimation of the copula density, with G. Geenens and D. Paindaveine, we extended that estimator.

See also the kdecopula R package by T. Nagler.
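A short usage sketch of that package (the method codes are as we recall its documentation — "MR" for mirror reflection, "beta" for beta kernels, "TLL2" for the transformation local-likelihood estimator discussed below — so treat them as assumptions):

library(kdecopula)
udata <- cbind(U.hat, V.hat)           # pseudo-observations in [0,1]^2
fit <- kdecop(udata, method = "TLL2")  # transformation + local log-quadratic fit
dkdecop(c(0.5, 0.5), fit)              # estimated copula density at (0.5, 0.5)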
Improved probit-transformation copula density estimators
When estimating a density from a pseudo-sample, Loader (1996) and Hjort & Jones (1996) define a local likelihood estimator.

Around $(s, t) \in \mathbb{R}^2$, use a polynomial approximation of order $p$ for $\log f_{ST}$:

$$\log f_{ST}(\check s, \check t) \approx a_{1,0}(s,t) + a_{1,1}(s,t)(\check s - s) + a_{1,2}(s,t)(\check t - t) \doteq P_{a_1}(\check s - s, \check t - t)$$

$$\log f_{ST}(\check s, \check t) \approx a_{2,0}(s,t) + a_{2,1}(s,t)(\check s - s) + a_{2,2}(s,t)(\check t - t) + a_{2,3}(s,t)(\check s - s)^2 + a_{2,4}(s,t)(\check t - t)^2 + a_{2,5}(s,t)(\check s - s)(\check t - t) \doteq P_{a_2}(\check s - s, \check t - t).$$
Improved probit-transformation copula density estimators
Remark: vectors $a_1(s,t) = (a_{1,0}(s,t), a_{1,1}(s,t), a_{1,2}(s,t))$ and $a_2(s,t) \doteq (a_{2,0}(s,t), \ldots, a_{2,5}(s,t))$ are then estimated by solving a weighted maximum likelihood problem,

$$\tilde a_p(s,t) = \arg\max_{a_p}\left\{ \sum_{i=1}^n K\!\left(H_{ST}^{-1/2}\begin{pmatrix} s - \hat S_i \\ t - \hat T_i \end{pmatrix}\right) P_{a_p}(\hat S_i - s, \hat T_i - t) - n \int_{\mathbb{R}^2} K\!\left(H_{ST}^{-1/2}\begin{pmatrix} s - \check s \\ t - \check t \end{pmatrix}\right) \exp\!\left(P_{a_p}(\check s - s, \check t - t)\right) d\check s\, d\check t \right\}.$$

The estimate of $f_{ST}$ at $(s,t)$ is then $\tilde f^{(p)}_{ST}(s,t) = \exp(\tilde a_{p,0}(s,t))$, for $p = 1, 2$. The improved probit-transformation kernel copula density estimators are

$$\tilde c^{(\tau,p)}(u,v) = \frac{\tilde f^{(p)}_{ST}(\Phi^{-1}(u), \Phi^{-1}(v))}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))}.$$
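Local likelihood density fitting in this sense is implemented in the locfit R package (Loader); a sketch on the normal scores, with deg = 2 for the log-quadratic fit (the lp() arguments and predict call are as we recall the package's API, so treat them as assumptions):

library(locfit)
fit <- locfit(~ lp(S.hat, T.hat, deg = 2, h = 0.5))  # local log-quadratic density fit
f.st <- predict(fit, newdata = cbind(qnorm(0.5), qnorm(0.5)))
f.st / (dnorm(qnorm(0.5)) * dnorm(qnorm(0.5)))       # back to the copula density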
Improved probit-transformation copula density estimators

For the local log-linear (p = 1) approximation,

$$\tilde c^{(\tau,1)}(u, v) = \frac{\exp(\tilde a_{1,0}(\Phi^{-1}(u), \Phi^{-1}(v)))}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))}$$
Improved probit-transformation copula density estimators

For the local log-quadratic (p = 2) approximation,

$$\tilde c^{(\tau,2)}(u, v) = \frac{\exp(\tilde a_{2,0}(\Phi^{-1}(u), \Phi^{-1}(v)))}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))}$$
Asymptotic properties
A1. The sample $\{(X_i, Y_i)\}$ is an n-i.i.d. sample from the joint distribution $F_{XY}$, an absolutely continuous distribution with marginals $F_X$ and $F_Y$ strictly increasing on their support (this guarantees uniqueness of the copula).
Asymptotic properties
A2. The copula $C$ of $F_{XY}$ is such that $(\partial C/\partial u)(u,v)$ and $(\partial^2 C/\partial u^2)(u,v)$ exist and are continuous on $\{(u,v) : u \in (0,1), v \in [0,1]\}$, and $(\partial C/\partial v)(u,v)$ and $(\partial^2 C/\partial v^2)(u,v)$ exist and are continuous on $\{(u,v) : u \in [0,1], v \in (0,1)\}$. In addition, there are constants $K_1$ and $K_2$ such that

$$\left|\frac{\partial^2 C}{\partial u^2}(u,v)\right| \le \frac{K_1}{u(1-u)} \text{ for } (u,v) \in (0,1)\times[0,1]; \qquad \left|\frac{\partial^2 C}{\partial v^2}(u,v)\right| \le \frac{K_2}{v(1-v)} \text{ for } (u,v) \in [0,1]\times(0,1).$$

A3. The density $c$ of $C$ exists, is positive and admits continuous second-order partial derivatives on the interior of the unit square $I$. In addition, there is a constant $K_{00}$ such that

$$c(u,v) \le K_{00}\min\!\left(\frac{1}{u(1-u)},\, \frac{1}{v(1-v)}\right) \quad \forall (u,v) \in (0,1)^2.$$

See Segers (2012).
Asymptotic properties
Assume that $K(z_1, z_2) = \phi(z_1)\phi(z_2)$ and $H_{ST} = h^2 I$ with $h \sim n^{-a}$ for some $a \in (0, 1/4)$. Under Assumptions A1–A3, the 'naive' probit-transformation kernel copula density estimator at any $(u, v) \in (0, 1)^2$ is such that

$$\sqrt{nh^2}\left( \hat c^{(\tau)}(u,v) - c(u,v) - h^2\, \frac{b(u,v)}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))} \right) \xrightarrow{\,L\,} N\!\left(0, \sigma^2(u,v)\right),$$

where

$$b(u,v) = \frac12\left[ \frac{\partial^2 c}{\partial u^2}(u,v)\,\phi^2(\Phi^{-1}(u)) + \frac{\partial^2 c}{\partial v^2}(u,v)\,\phi^2(\Phi^{-1}(v)) \right] - 3\left[ \frac{\partial c}{\partial u}(u,v)\,\Phi^{-1}(u)\,\phi(\Phi^{-1}(u)) + \frac{\partial c}{\partial v}(u,v)\,\Phi^{-1}(v)\,\phi(\Phi^{-1}(v)) \right] + c(u,v)\left[ \{\Phi^{-1}(u)\}^2 + \{\Phi^{-1}(v)\}^2 - 2 \right] \qquad (2)$$

and

$$\sigma^2(u,v) = \frac{c(u,v)}{4\pi\,\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))}.$$
The Amended version
The last, unbounded, term in $b$ can easily be adjusted:

$$\hat c^{(\tau am)}(u,v) = \frac{\hat f_{ST}(\Phi^{-1}(u), \Phi^{-1}(v))}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))} \times \frac{1}{1 + \frac12 h^2\left( \{\Phi^{-1}(u)\}^2 + \{\Phi^{-1}(v)\}^2 - 2 \right)}.$$

The asymptotic bias becomes proportional to

$$b^{(am)}(u,v) = \frac12\left[ \frac{\partial^2 c}{\partial u^2}(u,v)\,\phi^2(\Phi^{-1}(u)) + \frac{\partial^2 c}{\partial v^2}(u,v)\,\phi^2(\Phi^{-1}(v)) \right] - 3\left[ \frac{\partial c}{\partial u}(u,v)\,\Phi^{-1}(u)\,\phi(\Phi^{-1}(u)) + \frac{\partial c}{\partial v}(u,v)\,\Phi^{-1}(v)\,\phi(\Phi^{-1}(v)) \right].$$
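The amendment is a pointwise multiplicative correction, so it bolts onto any implementation of the naive estimator (a sketch; c.naive is assumed to compute $\hat c^{(\tau)}$ as above):

c.amended <- function(u, v, h) {
  s <- qnorm(u); t <- qnorm(v)
  c.naive(u, v, h) / (1 + 0.5 * h^2 * (s^2 + t^2 - 2))
}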
A local log-linear probit-transformation kernel estimator
$$\tilde c^{*(\tau,1)}(u, v) = \tilde f^{*(1)}_{ST}(\Phi^{-1}(u), \Phi^{-1}(v)) \,/\, \left[\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))\right]$$

Then

$$\sqrt{nh^2}\left( \tilde c^{*(\tau,1)}(u,v) - c(u,v) - h^2\, \frac{b^{(1)}(u,v)}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))} \right) \xrightarrow{\,L\,} N\!\left(0, \sigma^{(1)\,2}(u,v)\right),$$

where

$$b^{(1)}(u,v) = \frac12\left[ \frac{\partial^2 c}{\partial u^2}(u,v)\,\phi^2(\Phi^{-1}(u)) + \frac{\partial^2 c}{\partial v^2}(u,v)\,\phi^2(\Phi^{-1}(v)) \right] - \frac{1}{c(u,v)}\left[ \left(\frac{\partial c}{\partial u}(u,v)\right)^{\!2} \phi^2(\Phi^{-1}(u)) + \left(\frac{\partial c}{\partial v}(u,v)\right)^{\!2} \phi^2(\Phi^{-1}(v)) \right] - \left[ \frac{\partial c}{\partial u}(u,v)\,\Phi^{-1}(u)\,\phi(\Phi^{-1}(u)) + \frac{\partial c}{\partial v}(u,v)\,\Phi^{-1}(v)\,\phi(\Phi^{-1}(v)) - 2c(u,v) \right]$$
Using a higher order polynomial approximation
Locally fitting a polynomial of a higher degree is known to reduce the asymptotic bias of the estimator, here from order $O(h^2)$ to order $O(h^4)$, see Loader (1996) or Hjort & Jones (1996), under sufficient smoothness conditions.

If $f_{ST}$ admits continuous fourth-order partial derivatives and is positive at $(s, t)$, then

$$\sqrt{nh^2}\left( \tilde f^{*(2)}_{ST}(s,t) - f_{ST}(s,t) - h^4\, b^{(2)}_{ST}(s,t) \right) \xrightarrow{\,L\,} N\!\left(0, \sigma^{(2)\,2}_{ST}(s,t)\right),$$

where $\sigma^{(2)\,2}_{ST}(s,t) = \frac{5}{2}\cdot\frac{f_{ST}(s,t)}{4\pi}$ and

$$b^{(2)}_{ST}(s,t) = -\frac18 f_{ST}(s,t) \times \left[ \frac{\partial^4 g}{\partial s^4} + \frac{\partial^4 g}{\partial t^4} + 4\left( \frac{\partial^3 g}{\partial s^3}\frac{\partial g}{\partial s} + \frac{\partial^3 g}{\partial t^3}\frac{\partial g}{\partial t} + \frac{\partial^3 g}{\partial s^2 \partial t}\frac{\partial g}{\partial t} + \frac{\partial^3 g}{\partial s \partial t^2}\frac{\partial g}{\partial s} \right) + 2\,\frac{\partial^4 g}{\partial s^2 \partial t^2} \right](s,t),$$

with $g(s,t) = \log f_{ST}(s,t)$.
Using a higher order polynomial approximation
A4. The copula density $c(u,v) = (\partial^2 C/\partial u \partial v)(u,v)$ admits continuous fourth-order partial derivatives on the interior of the unit square $[0,1]^2$.

Then

$$\sqrt{nh^2}\left( \tilde c^{*(\tau,2)}(u,v) - c(u,v) - h^4\, \frac{b^{(2)}(u,v)}{\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))} \right) \xrightarrow{\,L\,} N\!\left(0, \sigma^{(2)\,2}(u,v)\right)$$

where $\sigma^{(2)\,2}(u,v) = \frac{5}{2}\cdot\frac{c(u,v)}{4\pi\,\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))}$.
Improving Bandwidth choice
Consider the principal components decomposition of the $(n \times 2)$ matrix $[\hat S, \hat T] = M$. Let $W_1 = (W_{11}, W_{12})^T$ and $W_2 = (W_{21}, W_{22})^T$ be the eigenvectors of $M^T M$. Set

$$\begin{pmatrix} Q \\ R \end{pmatrix} = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}\begin{pmatrix} S \\ T \end{pmatrix} = W \begin{pmatrix} S \\ T \end{pmatrix},$$

which is only a linear reparametrization of $\mathbb{R}^2$, so an estimate of $f_{ST}$ can be readily obtained from an estimate of the density of $(Q, R)$.

Since $\{\hat Q_i\}$ and $\{\hat R_i\}$ are empirically uncorrelated, consider a diagonal bandwidth matrix $H_{QR} = \mathrm{diag}(h^2_Q, h^2_R)$.
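A sketch of this rotation in R, via the eigendecomposition of $M^T M$ as on the slide (prcomp would give the same rotation, up to signs, after centering):

M <- cbind(S.hat, T.hat)
W <- eigen(crossprod(M))$vectors     # eigenvectors of M'M
QR <- M %*% W                        # rotated scores
Q.hat <- QR[, 1]; R.hat <- QR[, 2]
cor(Q.hat, R.hat)                    # near 0 (exactly 0 if M is column-centered)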
Improving Bandwidth choice
Use univariate procedures to select $h_Q$ and $h_R$ independently.

Denote $\tilde f^{(p)}_Q$ and $\tilde f^{(p)}_R$ ($p = 1, 2$) the local log-polynomial estimators of the densities. $h_Q$ can be selected via cross-validation (see Section 5.3.3 in Loader (1999)),

$$h_Q = \arg\min_{h>0}\left\{ \int_{-\infty}^{\infty} \left(\tilde f^{(p)}_Q(q)\right)^2 dq - \frac{2}{n}\sum_{i=1}^n \tilde f^{(p)}_{Q(-i)}(\hat Q_i) \right\},$$

where $\tilde f^{(p)}_{Q(-i)}$ is the 'leave-one-out' version of $\tilde f^{(p)}_Q$.
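A generic R sketch of this criterion for a univariate estimator (fhat(q, data, h) stands in for $\tilde f^{(p)}_Q$ and must accept a vector of evaluation points; the names are ours):

cv.score <- function(h, data, fhat) {
  term1 <- integrate(function(q) fhat(q, data, h)^2, -Inf, Inf)$value
  loo   <- sapply(seq_along(data), function(i) fhat(data[i], data[-i], h))
  term1 - 2 * mean(loo)                      # the CV objective to minimise over h
}
# h.Q <- optimize(cv.score, interval = c(0.05, 2), data = Q.hat, fhat = fhat)$minimum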
Graphical Comparison (loss ALAE dataset)
[Figure: contour plots of copula density estimates on the loss–ALAE dataset — panels $\tilde c^{(\tau,2)}$, $\hat c^{(\beta)}$, $\hat c^{(b)}$ and $\hat c^{(p)}$, with Loss (X) and ALAE (Y) on the axes and contour levels from 0.25 to 4.]
Simulation Study
M = 1,000 independent random samples $\{(U_i, V_i)\}_{i=1}^n$ of sizes n = 200, n = 500 and n = 1000 were generated from each of the following copulas:
· the independence copula (i.e., $U_i$'s and $V_i$'s drawn independently);
· the Gaussian copula, with parameters ρ = 0.31, ρ = 0.59 and ρ = 0.81;
· the Student t-copula with 4 degrees of freedom, with parameters ρ = 0.31, ρ = 0.59 and ρ = 0.81;
· the Frank copula, with parameters θ = 1.86, θ = 4.16 and θ = 7.93;
· the Gumbel copula, with parameters θ = 1.25, θ = 1.67 and θ = 2.5;
· the Clayton copula, with parameters θ = 0.5, θ = 1.67 and θ = 2.5.
(Approximated) MISE relative to the MISE of the mirror-reflection estimator (last column), n = 1000. Bold values show the minimum MISE for the corresponding copula (non-significantly different values are highlighted as well).
n = 1000   $\hat c^{(\tau)}$  $\hat c^{(\tau am)}$  $\tilde c^{(\tau,1)}$  $\tilde c^{(\tau,2)}$  $\hat c^{(\beta)}_1$  $\hat c^{(\beta)}_2$  $\hat c^{(B)}_1$  $\hat c^{(B)}_2$  $\hat c^{(p)}_1$  $\hat c^{(p)}_2$  $\hat c^{(p)}_3$
Indep 3.57 2.80 2.89 1.40 7.96 11.65 1.69 3.43 1.62 0.50 0.14
Gauss2 2.03 1.52 1.60 0.76 4.63 6.06 1.10 1.82 0.98 0.66 0.89
Gauss4 0.63 0.49 0.44 0.21 1.72 1.60 0.75 0.58 0.62 0.99 2.93
Gauss6 0.21 0.20 0.11 0.05 0.74 0.33 0.77 0.37 0.72 1.21 2.83
Std(4)2 0.61 0.56 0.50 0.40 1.57 1.80 0.78 0.67 0.75 1.01 1.88
Std(4)4 0.21 0.27 0.17 0.15 0.88 0.51 0.75 0.42 0.75 1.12 2.07
Std(4)6 0.09 0.17 0.08 0.09 0.70 0.19 0.82 0.47 0.90 1.17 1.90
Frank2 3.31 2.42 2.57 1.35 7.16 9.63 1.70 2.95 1.31 0.45 0.49
Frank4 2.35 1.45 1.51 0.99 4.42 4.89 1.49 1.65 0.60 0.72 6.14
Frank6 0.96 0.52 0.45 0.44 1.51 1.19 1.35 0.76 0.65 1.58 7.25
Gumbel2 0.65 0.62 0.56 0.43 1.77 1.97 0.82 0.75 0.83 1.03 1.52
Gumbel4 0.18 0.28 0.16 0.19 0.89 0.41 0.78 0.47 0.81 1.10 1.78
Gumbel6 0.09 0.21 0.10 0.15 0.78 0.29 0.85 0.58 0.94 1.12 1.63
Clayton2 0.63 0.60 0.51 0.34 1.78 1.99 0.78 0.70 0.79 1.04 1.79
Clayton4 0.11 0.26 0.10 0.15 0.79 0.27 0.83 0.56 0.90 1.10 1.50
Clayton6 0.11 0.28 0.08 0.15 0.82 0.35 0.88 0.67 0.96 1.09 1.36
Probit Transform in the Univariate Case : the log transform
See Log-Transform Kernel Density Estimation of Income Distribution with E. Flachaire.

The Gaussian kernel estimator of a density is

$$\hat f_Z(z) = \frac{1}{n}\sum_{i=1}^n \varphi(z;\, z_i, h)$$

where $\varphi(\cdot;\, \mu, \sigma)$ is the density of the normal distribution. Use a Gaussian kernel estimation of the density based on a logarithmic transformation of the data $x_i$,

$$\hat f_X(x) = \frac{\hat f_Z(\log x)}{x} = \frac{1}{n}\sum_{i=1}^n \ell(x;\, \log x_i, h)$$

where $\ell(\cdot;\, \mu, \sigma)$ is the density of the lognormal distribution.
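A sketch in R: the lognormal kernel is dlnorm, since dlnorm(x, meanlog, sdlog) equals $\varphi(\log x;\, \text{meanlog}, \text{sdlog})/x$ pointwise:

f.logKDE <- function(x, data, h) {
  sapply(x, function(xi) mean(dlnorm(xi, meanlog = log(data), sdlog = h)))
}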
Probit Transform in the Univariate Case : the log transform
Recall that, classically,

$$\mathrm{bias}[\hat f_Z(z)] \sim \frac{h^2}{2} f_Z''(z) \quad\text{and}\quad \mathrm{Var}[\hat f_Z(z)] \sim \frac{0.2821}{nh} f_Z(z).$$

Here, in the neighborhood of 0,

$$\mathrm{bias}[\hat f_X(x)] \sim \frac{h^2}{2}\left[ f_X(x) + 3x\, f_X'(x) + x^2 f_X''(x) \right],$$

which is positive if $f_X(0) > 0$, while

$$\mathrm{Var}[\hat f_X(x)] \sim \frac{0.2821}{nhx} f_X(x).$$

The log-transform kernel may perform poorly when $f_X(0) > 0$, see Silverman (1986).
Back on the Transformed Kernel
See Devroye & Györfi (1985), and Devroye & Lugosi (2001)
... use the transformed kernel the other way, R → [0, 1] → R
Back on the Transformed Kernel
Interesting point: the optimal $T$ should be $F$; thus, $T$ can be taken as some parametric $F_\theta$.
Heavy Tailed distribution
Let $X$ denote a (heavy-tailed) random variable with tail index $\alpha \in (0, \infty)$, i.e.

$$P(X > x) = x^{-\alpha} L_1(x)$$

where $L_1$ is some slowly varying function.

Let $T$ denote an $\mathbb{R} \to [0, 1]$ function, such that $1 - T$ is regularly varying at infinity, with tail index $\beta \in (0, \infty)$. Define $Q(x) = T^{-1}(1 - x^{-1})$, the associated tail quantile function; then $Q(x) = x^{1/\beta} L_2(1/x)$, where $L_2$ is some slowly varying function (the de Bruijn conjugate of the slowly varying function associated with $T$). Assume here that $Q(x) = b\, x^{1/\beta}$.

Let $U = T(X)$. Then, as $u \to 1$,

$$P(U > u) \sim (1 - u)^{\alpha/\beta}.$$
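A sketch of the transformed-kernel estimator with a Pareto-type transformation $T_\beta(x) = 1 - (1+x)^{-\beta}$ (our illustrative choice; any c.d.f. with the right tail index would do):

T.beta  <- function(x, beta) 1 - (1 + x)^(-beta)
Tp.beta <- function(x, beta) beta * (1 + x)^(-beta - 1)   # derivative t = T'
f.TK <- function(x, beta, g.hat) {
  # g.hat: a density estimator on [0,1], fitted to U_i = T.beta(X_i, beta)
  g.hat(T.beta(x, beta)) * Tp.beta(x, beta)
}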
Heavy Tailed distribution
see C. & Oulidi (2010): true index $\alpha = 0.75^{-1}$, transformations $T_{0.75^{-1}}$, $T_{0.65^{-1}}$ (lighter tail), $T_{0.85^{-1}}$ (heavier tail) and $T_{\hat\alpha}$

[Figure: for each transformation, the histogram and estimated density of the transformed sample on [0, 1].]
Heavy Tailed distribution
see C. & Oulidi (2007), Beta kernel quantile estimators of heavy-tailed loss distributions: impact on quantile estimation?

[Figure: densities of the quantile estimators.]
Heavy Tailed distribution
see C. & Oulidi (2007): impact on quantile estimation? bias? m.s.e.?
Bimodal distribution
Let $X$ denote a bimodal distribution, obtained from a mixture,

$$X \sim F_\Theta = \begin{cases} F_0 & \text{if } \Theta = 0 \text{ (probability } p_0) \\ F_1 & \text{if } \Theta = 1 \text{ (probability } p_1) \end{cases}$$

Idea: $T(X)$ can be obtained as a transformation of two distributions on $[0, 1]$,

$$T(X) \sim G_\Theta = \begin{cases} G_0 & \text{if } \Theta = 0 \text{ (probability } p_0) \\ G_1 & \text{if } \Theta = 1 \text{ (probability } p_1) \end{cases}$$

→ standard for income observations...
Which transformation ?
GB2: $t(y;\, a, b, p, q) = \dfrac{|a|\, y^{ap-1}}{b^{ap}\, B(p, q)\,[1 + (y/b)^a]^{p+q}}$, for $y > 0$.

Nested and limiting cases (the classical GB2 family tree): GB2 gives the generalized gamma GG ($q \to \infty$), Beta2 ($a = 1$), Singh-Maddala SM ($p = 1$) and Dagum ($q = 1$); GG gives the Lognormal ($a \to 0$), the Gamma ($a = 1$) and the Weibull ($p = 1$); Beta2 gives the Gamma ($q \to \infty$); SM gives the Weibull ($q \to \infty$) and the Champernowne ($q = 1$); Dagum gives the Champernowne ($p = 1$).
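The GB2 density from the slide is easy to code directly, so any member of the tree can serve as the transformation $T$ (a sketch):

dGB2 <- function(y, a, b, p, q) {
  abs(a) * y^(a * p - 1) / (b^(a * p) * beta(p, q) * (1 + (y / b)^a)^(p + q))
}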
Example, X ∼ Pareto
Pareto plot ($\log(1 - F(x))$ vs. $\log x$), and histogram of $\{U_1, \ldots, U_n\}$, $U_i = T_{\hat\theta}(X_i)$

[Figure: Pareto plot of the sample and histogram of the transformed values $u = H(x)$.]
Example, X ∼ Pareto
Estimation of the density $g$ of $U = T_\theta(X)$, and estimated c.d.f. of $X$,

$$F_n(x) = \int_0^x f_n(y)\, dy \quad\text{where}\quad f_n(y) = g_n(T_{\hat\theta}(y)) \cdot t_{\hat\theta}(y)$$

[Figure: adaptive kernel estimate of $g(u)$ and the resulting Pareto plot.]
Beta kernel
$$\hat g(u) = \sum_{i=1}^n \frac{1}{n} \cdot b\!\left(u;\, \frac{U_i}{h},\, \frac{1 - U_i}{h}\right), \quad u \in [0, 1],$$

with some possible boundary correction, as suggested in Chen (1999),

$$\frac{u}{h} \to \rho(u, h) = 2h^2 + 2.5 - \left(4h^4 + 6h^2 + 2.25 - u^2 - u/h\right)^{1/2}.$$

Problem: choice of the bandwidth $h$? Standard loss function

$$L(h) = \int [\hat g_n(u) - g(u)]^2\, du = \underbrace{\int [\hat g_n(u)]^2\, du - 2\int \hat g_n(u)\, g(u)\, du}_{\text{estimated by } CV(h)} + \int [g(u)]^2\, du,$$

where

$$CV(h) = \int [\hat g_n(u)]^2\, du - \frac{2}{n}\sum_{i=1}^n \hat g_{(-i)}(U_i).$$
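A sketch of this estimator in R, as a mixture of beta densities with one component per (pseudo-)observation; the bandwidth h can then be chosen with the cross-validation criterion above (e.g. via cv.score earlier, integrating over [0, 1]):

g.betamix <- function(u, U, h) {
  sapply(u, function(ui) mean(dbeta(ui, U / h, (1 - U) / h)))
}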
Beta kernel
[Figure: beta-kernel density estimates for two simulated samples.]
Beta kernel
[Figure: estimated density of $u = H(x)$ for beta mixtures with h = 0.1, 0.05, 0.02, 0.01, and the corresponding Pareto plots.]
Mixture of Beta distributions
$$\hat g(u) = \sum_{j=1}^k \pi_j \cdot b(u;\, \alpha_j, \beta_j), \quad u \in [0, 1].$$

Problem: choice of the number of components $k$ (and estimation...). Use of a stochastic EM algorithm (or some variant of it), see Celeux & Diebolt (1985).
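Evaluating a fitted k-component beta mixture is straightforward (a sketch; the weights $\pi_j$ and shapes $\alpha_j, \beta_j$ are assumed to come from the stochastic-EM fit):

g.mix <- function(u, w, alpha, beta) {
  sapply(u, function(ui) sum(w * dbeta(ui, alpha, beta)))
}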
Mixture of Beta distributions
[Figure: beta-mixture density estimates for two simulated samples.]
Mixture of Beta distributions
[Figure: beta-mixture estimate of the density of $u = H(x)$ and the corresponding Pareto plot.]
Bernstein approximation
$$\hat g(u) = \sum_{k=1}^m [m\,\omega_k] \cdot b(u;\, k, m - k), \quad u \in [0, 1],$$

where $\omega_k = G\!\left(\dfrac{k}{m}\right) - G\!\left(\dfrac{k-1}{m}\right)$.
Bernstein approximation
[Figure: Bernstein density estimates for two simulated samples.]
Bernstein approximation
[Figure: Bernstein estimates of the density of $u = H(x)$ for m = 100, 60, 40, 20, 10, 5, and the corresponding Pareto plots.]
Quantities of interest
Standard statistical quantities
• miae, $\int_0^\infty |f_n(x) - f(x)|\, dx$
• mise, $\int_0^\infty [f_n(x) - f(x)]^2\, dx$

Inequality indices and risk measures, based on $F(x) = \int_0^x f(t)\, dt$:
• Gini, $\frac{1}{\mu}\int_0^\infty F(t)[1 - F(t)]\, dt$
• Theil, $\int_0^\infty \frac{t}{\mu}\log\!\left(\frac{t}{\mu}\right) f(t)\, dt$
• VaR-quantile, $x$ such that $F(x) = P(X \le x) = \alpha$, i.e. $F^{-1}(\alpha)$
• TVaR-expected shortfall, $E[X \mid X > F^{-1}(\alpha)]$

where $\mu = \int_0^\infty [1 - F(x)]\, dx$.
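Once a density estimate $f_n$ is available on a grid, all of these are simple numerical integrals; a sketch (x is a fine, equally spaced grid on $(0, x_{\max})$ and f the vector of density values on it; names are ours):

quantities <- function(x, f, alpha = 0.95) {
  dx <- x[2] - x[1]
  Fx <- cumsum(f) * dx                          # F_n on the grid
  mu <- sum((1 - Fx) * dx)                      # mean via the survival function
  gini  <- sum(Fx * (1 - Fx) * dx) / mu
  theil <- sum((x / mu) * log(x / mu) * f * dx)
  VaR  <- x[which(Fx >= alpha)[1]]              # F_n^{-1}(alpha)
  tail <- x > VaR
  TVaR <- sum(x[tail] * f[tail] * dx) / (1 - alpha)
  c(mu = mu, Gini = gini, Theil = theil, VaR = VaR, TVaR = TVaR)
}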