Center for Uncertainty Quantification

Hierarchical matrix approximation of large covariance matrices

A. Litvinenko¹, M. Genton², Ying Sun², R. Tempone
¹SRI-UQ Center and ²Spatio-Temporal Statistics & Data Analysis Group at KAUST
alexander.litvinenko@kaust.edu.sa
Abstract

We approximate large unstructured covariance matrices in the H-matrix format with log-linear computational cost and O(n log n) storage. We compute the inverse, the Cholesky decomposition, and the determinant in the H-matrix format. As an example we consider the class of Matérn covariance functions, which are very popular in spatial statistics, geostatistics, machine learning, and image analysis. Applications include kriging and optimal design.
1. Matérn covariance

C(x, y) = C(|x − y|) = σ² · (1 / (Γ(ν) 2^(ν−1))) · (√(2ν) r/L)^ν · K_ν(√(2ν) r/L),

where Γ is the gamma function, K_ν is the modified Bessel function of the second kind, r = |x − y|, and ν, L are non-negative parameters of the covariance.
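The Matérn covariance above can be evaluated directly with SciPy's modified Bessel function; the following is a minimal sketch (the helper `matern` and its default parameters are illustrative, not from the poster):

```python
import numpy as np
from scipy.special import gamma, kv  # kv = modified Bessel function of the second kind

def matern(r, sigma2=1.0, nu=0.5, L=1.0):
    """Matern covariance C(r) for distances r >= 0 (illustrative helper)."""
    r = np.asarray(r, dtype=float)
    c = np.full_like(r, sigma2)                      # C(0) = sigma^2 by continuity
    nz = r > 0
    s = np.sqrt(2.0 * nu) * r[nz] / L
    c[nz] = sigma2 * (2.0 ** (1.0 - nu) / gamma(nu)) * s ** nu * kv(nu, s)
    return c

# nu = 0.5 reduces to the exponential covariance sigma^2 * exp(-r/L)
r = np.array([0.0, 0.3, 1.0])
print(matern(r, nu=0.5, L=1.0))
```

For ν = 1/2, K_{1/2}(s) = √(π/(2s)) e^{−s}, so the formula collapses to σ² exp(−r/L), matching the special case stated below.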
[Figure 1: Matérn covariance for ν = 1, σ = 0.5 and ℓ ∈ {0.5, 0.3, 0.2, 0.1} (left); Matérn covariance for ν ∈ {0.15, 0.3, 0.5, 1, 2, 30} (right).]
As ν → ∞ [4],

C(r) = σ² exp(−r² / (2L²)).

When ν = 0.5, the Matérn covariance is identical to the exponential covariance function.
C_{ν=3/2}(r) = (1 + √3 r/L) exp(−√3 r/L),

C_{ν=5/2}(r) = (1 + √5 r/L + 5r²/(3L²)) exp(−√5 r/L).
Note: neither C(x, y) = C(|x − y|) nor a tensor grid needs to be assumed.
2. H-matrix approximation
[H-matrix block structure; the number printed in each block is its rank.]
Figure 2: Two approximation strategies [1]: fixed-rank (left) and flexible-rank (right) approximations, C ∈ R^{n×n}, n = 65².
[Figure 3: cluster tree over the index set I = I₁ ∪ I₂, I₁ = I₁₁ ∪ I₁₂, I₂ = I₂₁ ∪ I₂₂, and an admissible block t × s with bounding boxes Q_t, Q_s at distance dist(Q_t, Q_s).]
1. Build the cluster tree T_I, I = {1, 2, ..., n}.
2. Build the block cluster tree T_{I×I}.
3. For each block (t × s) ∈ T_{I×I}, t, s ∈ T_I, check the admissibility condition min{diam(Q_t), diam(Q_s)} ≤ η · dist(Q_t, Q_s).
   If the block is admissible, M|_{t×s} is approximated by a rank-k matrix block; otherwise, subdivide M|_{t×s} further, or store it as a dense matrix block if it is small enough.

Grid → cluster tree (T_I) + admissibility condition → block cluster tree (T_{I×I}) → H-matrix → H-matrix arithmetic.
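The construction above can be sketched for points in R^d; this is a minimal illustration (the names `build_blocks` and `n_min` are ours, and production H-matrix libraries implement this far more carefully):

```python
import numpy as np

def diam(pts):
    """Diameter of the axis-aligned bounding box of a point cluster."""
    return np.linalg.norm(pts.max(axis=0) - pts.min(axis=0))

def dist(pts_t, pts_s):
    """Distance between the two axis-aligned bounding boxes."""
    lo_t, hi_t = pts_t.min(axis=0), pts_t.max(axis=0)
    lo_s, hi_s = pts_s.min(axis=0), pts_s.max(axis=0)
    gap = np.maximum(0.0, np.maximum(lo_t - hi_s, lo_s - hi_t))
    return np.linalg.norm(gap)

def admissible(pts_t, pts_s, eta=1.0):
    """min{diam(Q_t), diam(Q_s)} <= eta * dist(Q_t, Q_s)."""
    return min(diam(pts_t), diam(pts_s)) <= eta * dist(pts_t, pts_s)

def build_blocks(idx_t, idx_s, pts, eta=1.0, n_min=32):
    """Recursively partition the block cluster (t, s); return labeled leaves."""
    if admissible(pts[idx_t], pts[idx_s], eta):
        return [(idx_t, idx_s, "low-rank")]
    if len(idx_t) <= n_min or len(idx_s) <= n_min:
        return [(idx_t, idx_s, "dense")]
    def split(idx):
        # bisect a cluster along the longest edge of its bounding box
        p = pts[idx]
        ax = np.argmax(p.max(axis=0) - p.min(axis=0))
        mid = np.median(p[:, ax])
        return idx[p[:, ax] <= mid], idx[p[:, ax] > mid]
    blocks = []
    for t in split(idx_t):
        for s in split(idx_s):
            if len(t) and len(s):
                blocks += build_blocks(t, s, pts, eta, n_min)
    return blocks
```

The leaves labeled "low-rank" are the blocks that get compressed to rank k; the "dense" leaves are stored exactly.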
Operation        Sequential complexity              Parallel complexity
                 (Hackbusch et al. '99-'06)         (Kriemann '05)
storage(M)       N = O(kn log n)                    N/P
Mx               N = O(kn log n)                    N/P
M1 ⊕ M2          N = O(k²n log n)                   N/P
M1 ⊙ M2, M⁻¹     N = O(k²n log² n)                  N/P + O(n)
H-LU             N = O(k²n log² n)                  N/P + O(k²n log² n / n^{1/d})

Table 1: Computational cost of H-matrix arithmetic, sequential and parallel.
Let ε = ‖(C − C^H)z‖₂ / (‖C‖₂ ‖z‖₂), where z is a random vector.
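This error indicator is cheap to evaluate; the sketch below estimates it for a rank-k truncation of a small exponential covariance matrix (the truncated SVD stands in for the H-matrix approximation, and the helper name is ours):

```python
import numpy as np

def rel_error_estimate(C, C_H, n_trials=10, seed=0):
    """Estimate ||(C - C_H) z|| / (||C||_2 ||z||) over random vectors z."""
    rng = np.random.default_rng(seed)
    nrmC = np.linalg.norm(C, 2)          # spectral norm (fine at this size)
    errs = []
    for _ in range(n_trials):
        z = rng.standard_normal(C.shape[0])
        errs.append(np.linalg.norm((C - C_H) @ z) / (nrmC * np.linalg.norm(z)))
    return max(errs)

# Compare a covariance matrix with its rank-k SVD truncation.
n, k = 200, 20
x = np.linspace(0, 1, n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)   # exponential covariance
U, s, Vt = np.linalg.svd(C)
C_k = (U[:, :k] * s[:k]) @ Vt[:k]
print(rel_error_estimate(C, C_k))   # small for modest k
```

By construction the estimate is bounded by s_{k+1}/s_1, the best possible rank-k spectral error.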
n          rank k   size (MB)       t (sec)        ε         max_{i=1..10} |λ_i − λ̃_i|, i   ε₂ for C̃
                    C      C̃       C      C̃
4.0·10³    10       48     3        0.8    0.08    7·10⁻³    7.0·10⁻², i = 9                 2.0·10⁻⁴
1.05·10⁴   18       439    19       7.0    0.4     7·10⁻⁴    5.5·10⁻², i = 2                 1.0·10⁻⁴
2.1·10⁴    25       2054   64       45.0   1.4     1·10⁻⁵    5.0·10⁻², i = 9                 4.4·10⁻⁶

Table 2: Accuracy of the H-matrix approximation of the exponential covariance function, ℓ₁ = ℓ₃ = 0.1, ℓ₂ = 0.5.
ℓ₁      ℓ₂      ε
0.01    0.02    3·10⁻²
0.1     0.2     8·10⁻³
0.5     1       2.8·10⁻⁵

Table 3: Dependence of the H-matrix accuracy on the covariance lengths ℓ₁ and ℓ₂, n = 129². The smaller the covariance length, the less accurate the H-matrix approximation.
Figure 4: Two realizations of a random field generated via Cholesky decomposition of the Matérn covariance matrix, ν = 0.4.
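Realizations like those in Figure 4 are drawn as u = Lz with C = LLᵀ and z ~ N(0, I); a minimal 1D sketch with an exponential covariance (synthetic setup, not the poster's grid):

```python
import numpy as np

# Draw a realization u = L z of a Gaussian field with covariance C = L L^T.
n = 500
x = np.linspace(0, 1, n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)   # exponential covariance (nu = 0.5)
Lc = np.linalg.cholesky(C + 1e-10 * np.eye(n))       # small jitter for numerical SPD
rng = np.random.default_rng(1)
u = Lc @ rng.standard_normal(n)                      # one realization of the field
```

For large n the dense Cholesky factor is replaced by an H-Cholesky factor at O(k²n log² n) cost (Table 1).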
3. Kullback-Leibler divergence

A measure of the information lost when distribution Q is used to approximate P:

D_KL(P‖Q) = Σ_i P(i) ln(P(i)/Q(i)),   D_KL(P‖Q) = ∫_{−∞}^{∞} p(x) ln(p(x)/q(x)) dx,

where p, q are the densities of P and Q. For multivariate normal distributions N₀ = N(μ₀, C) and N₁ = N(μ₁, C^H):

2 D_KL(N₀‖N₁) = tr((C^H)⁻¹ C) + (μ₁ − μ₀)ᵀ (C^H)⁻¹ (μ₁ − μ₀) − n − ln(det C / det C^H).
[Two panels: log relative error vs. rank k, in spectral and Frobenius norms, for L = 0.1, 0.2, 0.5; left panel ν = 0.5, right panel ν = 1.5.]
Figure 5: Relative H-matrix approximation error ‖C − C^H‖₂ for covariance lengths L ∈ {0.1, 0.2, 0.5} and ν ∈ {0.5, 1.5}.
        KLD(C, C^H)           ‖C − C^H‖₂            ‖C(C^H)⁻¹ − I‖₂
k       L = 0.25   L = 0.75   L = 0.25   L = 0.75   L = 0.25   L = 0.75
5       0.51       2.3        4.0e-2     0.1        4.8        63
6       0.34       1.6        9.4e-3     0.02       3.4        22
8       5.3e-2     0.4        1.9e-3     0.003      1.2        8
10      2.6e-3     0.2        7.7e-4     7.0e-4     6.0e-2     3.1
12      5.0e-4     2e-2       9.7e-5     5.6e-5     1.6e-2     0.5
15      1.0e-5     9e-4       2.0e-5     1.1e-5     8.0e-4     0.02
20      4.5e-7     4.8e-5     6.5e-7     2.8e-7     2.1e-5     1.2e-3
50      3.4e-13    5e-12      2.0e-13    2.4e-13    4e-11      2.7e-9

Table 4: Dependence of the KLD on the H-matrix rank k, Matérn covariance with L ∈ {0.25, 0.75} and ν = 0.5, domain G = [0, 1]²; ‖C‖₂ = 212 for L = 0.25 and 568 for L = 0.75.
For ν = 1.5, the KLD and the inverse (C^H)⁻¹ are hard to compute numerically. The results in Table 4 are better because the covariance matrix with ν = 0.5 has its smallest eigenvalues far enough from zero. The ν = 1.5 covariance is smoother and its eigenvalues decay faster, but the smallest eigenvalues come much closer to zero than in the ν = 0.5 case.
4. Other applications

4.1 Low-rank approximation for kriging and geostatistical optimal design

Let ŝ ∈ Rⁿ be the vector to be estimated, C_ss its covariance matrix, and y ∈ Rᵐ the vector of measurements. The corresponding cross- and auto-covariance matrices are denoted by C_sy and C_yy, of sizes n × m and m × m, respectively.

Kriging estimate: ŝ = C_sy C_yy⁻¹ y.

The estimation variance σ̂_s is the diagonal of the conditional covariance matrix C_{ss|y}: σ̂_s = diag(C_{ss|y}) = diag(C_ss − C_sy C_yy⁻¹ C_ys).

Geostatistical optimal design:
φ_A = n⁻¹ trace(C_{ss|y}),
φ_C = cᵀ (C_ss − C_sy C_yy⁻¹ C_ys) c, where c is a vector.
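The kriging formulas above can be sketched densely for a tiny 1D problem (synthetic measurement locations and data; for large n the dense solves are exactly what the H-matrix arithmetic replaces):

```python
import numpy as np

rng = np.random.default_rng(2)
xs = np.linspace(0, 1, 100)          # estimation points (n = 100)
xm = rng.uniform(0, 1, 15)           # measurement locations (m = 15)
cov = lambda a, b: np.exp(-np.abs(a[:, None] - b[None, :]) / 0.2)

Css, Csy, Cyy = cov(xs, xs), cov(xs, xm), cov(xm, xm)
y = np.sin(2 * np.pi * xm)           # synthetic measurements

s_hat = Csy @ np.linalg.solve(Cyy, y)                 # kriging estimate
C_cond = Css - Csy @ np.linalg.solve(Cyy, Csy.T)      # conditional covariance
sigma_hat = np.diag(C_cond)                           # estimation variance
phi_A = np.trace(C_cond) / len(xs)                    # A-criterion
```

Note that sigma_hat never exceeds the prior variance diag(C_ss): conditioning on data can only reduce uncertainty.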
4.2 Weather forecast in Europe

Figure 6: European weather stations (≈ 2500). Collected data set M ∈ R^{2500×365}.
Figure 7: True temperature forecast and its low-rank approximation (rank-50 approximation of the matrix M) at one station; relative error 25%.
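A rank-50 approximation like the one in Figure 7 is the truncated SVD of the station/day matrix; a sketch on synthetic data of the same shape (the real temperature data is not reproduced here):

```python
import numpy as np

# Rank-k approximation of a data matrix via truncated SVD (synthetic M
# with a decaying spectrum standing in for the 2500 x 365 temperature data).
rng = np.random.default_rng(3)
M = rng.standard_normal((2500, 365)) @ np.diag(1.0 / np.arange(1, 366))
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 50
M_k = (U[:, :k] * s[:k]) @ Vt[:k]
rel_err = np.linalg.norm(M - M_k) / np.linalg.norm(M)   # Frobenius relative error
```

By the Eckart-Young theorem this truncation is optimal: the relative Frobenius error equals √(Σ_{j>k} s_j² / Σ_j s_j²).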
5. Open questions

1. Compute the whole spectrum of a large covariance matrix.
2. Compute the KLD for large matrices (det Σ?).
3. How sensitive is the KLD to the H-matrix accuracy?
4. Derive/estimate the KLD for non-Gaussian distributions.
Acknowledgements
A. Litvinenko is a member of the KAUST SRI UQ Center.
References

1. B. N. Khoromskij, A. Litvinenko, H. G. Matthies, Application of hierarchical matrices for computing the Karhunen-Loève expansion, Computing, Vol. 84, Issue 1-2, pp. 49-67, 2008.
2. R. Furrer, M. Genton, D. Nychka, Covariance tapering for interpolation of large spatial datasets, J. Comp. & Graph. Stat., Vol. 15, No. 3, pp. 502-523.
3. M. Stein, Limitations on low rank approximations for covariance matrices of spatial data, Spatial Statistics, 2013.
4. J. Castrillón-Candás, M. Genton, R. Yokota, Multi-Level Restricted Maximum Likelihood Covariance Estimation and Kriging for Large Non-Gridded Datasets, 2014.