Computational Information Geometry
on Matrix Manifolds
Frank Nielsen
Frank.Nielsen@acm.org
www.informationgeometry.org
Sony Computer Science Laboratories, Inc.

July 2013, ICTP, Trieste, IT

© 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.
Geometry of matrix manifolds...
◮ Euclidean geometry, Frobenius norm → distance:
  $\|M\|_F^2 = \sum_{i,j} m_{ij}^2 = \sum_i \|M_{i*}\|_2^2 = \sum_j \|M_{*j}\|_2^2 = \mathrm{tr}(M^\top M)$
◮ Riemannian geometry of symmetric positive definite (SPD) matrices [9, 2]
◮ Riemannian geometry of rank-deficient symmetric positive semi-definite (SPSD) matrices: Stiefel/Grassmann manifolds [3]
◮ Quantum geometry: SPD matrices with unit trace

“One geometry cannot be more true than another; it can only be more convenient”
— Jules Henri Poincaré (1902)

Forthcoming conference (GSI)
28th-30th August, Paris.

What is Computational Information Geometry?
◮ What is Information? = essence of data (datum = “thing”)
  (make it tangible → e.g., parameters of generative models)
◮ Can we do intrinsic computing?
  (unbiased by any particular “data representation” → same results after recoding the data)
◮ Geometry −→ science of invariance
  (mother of Science; compass & ruler; Descartes’ analytic = coordinate/Cartesian geometry; imaginaries, ...)
... the open-ended poetic mathematics!

Rationale for Computational Information Geometry
◮ Information is ... never void! → lower bounds:
  ◮ Fisher information and Cramér-Rao lower bound (estimation)
  ◮ Bayes error and Chernoff information (classification)
  ◮ Coding and Shannon entropy (communication)
  ◮ Program and Kolmogorov complexity (compression; unfortunately not computable!)
◮ Geometry:
  ◮ Language (point, line, ball, dimension, orthogonal, projection, geodesic, immersion, etc.)
  ◮ Power of characterization (e.g., the intersection of two pseudo-segments need not admit a closed-form expression)
◮ Computing: information computing. Seeking mathematical convenience and mathematical tricks (RKHS in ML).
  How to manipulate “spaces of functions”?!

Example I: Matrix manifold
Pattern = Gaussian mixture models (universal class)
Statistical (dis)similarity/distance: total Bregman divergence
(tBD, tKL).
Invariance: for xi ∼ N(µi, Σi) and the affine map y = A(x) = Lx + t,
yi ∼ N(Lµi + t, LΣi L⊤) and D(X1 : X2) = D(Y1 : Y2)
(L: any invertible linear transformation, t: a translation).

Shape Retrieval using Hierarchical Total Bregman Soft Clustering [7], IEEE PAMI, 2012.

Example II: Matrix manifolds
DTI: diffusion ellipsoids, tensor interpolation.
Pattern = zero-centered “Gaussians”
Statistical (dis)similarity/distance: total Bregman divergence (tBD, tKL).
Invariance: D(A⊤ PA : A⊤ QA) = D(P : Q) for A ∈ SL(d)
(volume/orientation-preserving transformations), for the total Bregman divergence (tBD).

(3D rat corpus callosum)
Total Bregman Divergence and its Applications to DTI Analysis [20], IEEE TMI, 2011.

Example III: Gaussian manifolds
Consider 5D Gaussian Mixture Models (GMMs) of color images
(image=RGBxy point set)

A Gaussian mixture model Σi wi N(µi, Σi) is interpreted as a
weighted point set {θi = (µi, Σi)}.

Matrix center points & clustering
Aggregation (matrix quantization for codebooks):
Given a data-set of matrices M = {M1 , ..., Mn } ⊂ M, compute a
center matrix C .
Centering as a variational minimization problem:
  $(\mathrm{OPT}): \quad C_p = \arg\min_{C \in \mathbb{M}} \sum_i w_i\, \mathrm{distance}^p(C, M_i)$
Notion of centrality, robustness to outliers?
For diagonal matrices, with the “Euclidean” distance, the usual geometric center points:
◮ median (p = 1): robust to outliers (Fermat-Weber point, no closed form),
◮ centroid (p = 2): breakdown point of 1 (→ tBD),
◮ circumcenter (p → ∞): minimizes the farthest-point distance (minimax [1]).

Diffusion Tensor Magnetic Resonance Imaging
DT-MRI: Measures anisotropic diffusion of water molecules in a
3 × 3 tensor assigned to each voxel position (circa 1990).
Used to analyze in-vivo connectivity patterns of brain tissues:
gray matter, white matter (corpus callosum) and cerebrospinal
fluid (CSF)

© Image courtesy of Peter J. Basser
(Magnetic resonance imaging of the brain and spine, Chapter 31)

Gradiometry tensor: 3 × 3 SPSD matrices
Beyond the “constant” g ≃ 9.81 m/s²: measuring the anisotropy of the gravity field.

→ Oil & gas industry.
Courtesy of BellGeo.
http://www.bellgeo.com/tech/technology_theory_of_FTG.html
Structure tensors in computer vision
→ Pioneered in image processing: tensor descriptor of a region at
a pixel. (Harris-Stephens [6]).
Consider a kernel K, and compute the tensor descriptor
  $T(p=(x,y)) = K * \begin{pmatrix} I_x'^2 & I_x' I_y' \\ I_y' I_x' & I_y'^2 \end{pmatrix} = \sum_{u,v} w(u,v)\, \nabla I(u,v)\, (\nabla I(u,v))^\top$
K: uniform or Gaussian kernel (e.g., an s × s window W centered at the pixel p);
I′x, I′y: the image gradient (partial derivatives of the image).
Versatile method: corner detection, optical flow estimation, segmentation, stereo matching, etc.
→ Tensor image processing

Harris-Stephens structure tensor (1988)
Deformation tensor field

Harris-Stephens combined corner-edge detector:
  R = det T − k (tr T)²
→ Measures of tensor anisotropy.
The structure tensor represents local orientation (eigenvectors/eigenvalues).

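As a concrete illustration of the two slides above, here is a minimal NumPy sketch of the structure tensor and the Harris-Stephens response R = det T − k (tr T)²; the 3×3 box window standing in for the kernel K, the function name harris_response, and the test image are illustrative choices, not from the talk.

```python
import numpy as np

def harris_response(I, k=0.04):
    """Structure tensor + Harris-Stephens corner response on a grayscale image I."""
    Iy, Ix = np.gradient(I.astype(float))       # image derivatives I'(y), I'(x)
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy   # entries of grad I (grad I)^T
    def box(A):                                 # 3x3 box window standing in for K
        P = np.pad(A, 1, mode='edge')
        return sum(P[i:i + A.shape[0], j:j + A.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    Txx, Tyy, Txy = box(Ixx), box(Iyy), box(Ixy)
    return Txx * Tyy - Txy ** 2 - k * (Txx + Tyy) ** 2   # R = det T - k (tr T)^2

# A bright square on a dark background: responses peak near its four corners.
img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0
R = harris_response(img)
print(np.unravel_index(np.argmax(R), R.shape))
```
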
Matrices with the Frobenius metric distance
Matrix space M with vectorial structure:
  $d_E(P, Q) = \|P - Q\|_F = \sqrt{\mathrm{tr}\big((P-Q)^\top (P-Q)\big)}$   (1)
Centroid of tensors:
  $C_E = \frac{1}{n} \sum_{i=1}^n w_i T_i$   (2)
→ scalar average of each element of the tensor.
Tensor Field Segmentation Using Region Based Active Contour Model [21], ECCV, 2004.

Matrix vectorization & computational geometry
Computational geometry on spaces of w × h matrices with respect to the
Frobenius distance amounts to computational geometry on the Euclidean
vector space of dimension D = w × h.
→ Voronoi diagrams, smallest enclosing ball, minimum spanning tree, etc.
For symmetric matrices, we have D = d(d+1)/2 degrees of freedom, and
vectorize as follows:
  $\|M\|_F^2 = \sum_{i=1}^d \sum_{j=1}^d m_{ij}^2 = \sum_{i=1}^d m_{ii}^2 + 2 \sum_{i=1}^{d-1} \sum_{j=i+1}^d m_{ij}^2 = \|m\|_2^2$
with $m = [m_{11} \ldots m_{dd}\ \sqrt{2}\,m_{12} \ldots \sqrt{2}\,m_{1d} \ldots \sqrt{2}\,m_{d-1,d}]^\top$.

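A short sketch of this isometric vectorization in NumPy (the helper name svec is a common convention, used here illustratively), checking that the Frobenius norm is preserved:

```python
import numpy as np

def svec(M):
    """Vectorize a symmetric d x d matrix, scaling off-diagonal entries by
    sqrt(2) so that ||M||_F = ||svec(M)||_2."""
    iu = np.triu_indices(M.shape[0], k=1)   # strict upper-triangular indices
    return np.concatenate([np.diag(M), np.sqrt(2.0) * M[iu]])

A = np.random.randn(4, 4); M = (A + A.T) / 2   # random symmetric matrix
assert np.isclose(np.linalg.norm(M, 'fro'), np.linalg.norm(svec(M)))
```
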
Matrix functions

From the spectral decomposition M = UDU⊤, with D = λ(M) = diag(λ1, ..., λd)
the diagonal matrix of eigenvalues, a real-valued function x → f(x) extends to
matrices as
  $f(M) = U\, \mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_d))\, U^\top$
Examples: log x, exp x, |x|, x², x^{1/2}, etc.
O(d³) spectral (SVD) factorization complexity.

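A sketch of this spectral extension with NumPy's symmetric eigendecomposition (the helper name matrix_function is illustrative):

```python
import numpy as np

def matrix_function(M, f):
    """f(M) = U diag(f(lambda_1), ..., f(lambda_d)) U^T for symmetric M."""
    lam, U = np.linalg.eigh(M)      # O(d^3) spectral decomposition
    return (U * f(lam)) @ U.T       # scales column j of U by f(lam_j)

# Consistency check on an SPD matrix: exp(log(P)) recovers P.
A = np.random.randn(3, 3); P = A @ A.T + 3 * np.eye(3)
assert np.allclose(matrix_function(matrix_function(P, np.log), np.exp), P)
```
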
Riemannian cone of SPD matrices
Exponential map from the tangent plane (the vector space Sym of symmetric matrices) to the manifold cone C:
  $\exp_P : T_P \mathcal{C} = \mathrm{Sym} \to \mathcal{C}$
Logarithmic map from the manifold cone C to the tangent planes:
  $\log_P : \mathcal{C} \to T_P \mathcal{C} = \mathrm{Sym}$
  $\log_P(Q) = P^{\frac{1}{2}} \log\big(P^{-\frac{1}{2}} Q P^{-\frac{1}{2}}\big) P^{\frac{1}{2}}$
maps any point Q ∈ Sym⁺⁺ to the unique tangent vector at P such that γ(0) = P and γ(1) = Q.
Geodesic equation:
  $\gamma_t(P, Q) = P^{\frac{1}{2}} \big(P^{-\frac{1}{2}} Q P^{-\frac{1}{2}}\big)^t P^{\frac{1}{2}}$
Geodesic (metric length) distance:
  $d_R(P, Q) = \big\|\log\big(P^{-\frac{1}{2}} Q P^{-\frac{1}{2}}\big)\big\|_F$

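The geodesic and the length distance above translate directly into a few lines of NumPy; a minimal sketch (helper names are illustrative and reused by later snippets):

```python
import numpy as np

def spd_power(P, t):
    """P^t for a symmetric positive definite P, via eigendecomposition."""
    lam, U = np.linalg.eigh(P)
    return (U * lam ** t) @ U.T

def spd_geodesic(P, Q, t):
    """gamma_t(P, Q) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}."""
    Ph, Pmh = spd_power(P, 0.5), spd_power(P, -0.5)
    return Ph @ spd_power(Pmh @ Q @ Pmh, t) @ Ph

def spd_distance(P, Q):
    """d_R(P, Q) = ||log(P^{-1/2} Q P^{-1/2})||_F."""
    Pmh = spd_power(P, -0.5)
    lam = np.linalg.eigvalsh(Pmh @ Q @ Pmh)
    return np.sqrt(np.sum(np.log(lam) ** 2))

# The geodesic is parameterized proportionally to arc length:
A, B = np.random.randn(3, 3), np.random.randn(3, 3)
P, Q = A @ A.T + np.eye(3), B @ B.T + np.eye(3)
assert np.isclose(spd_distance(P, spd_geodesic(P, Q, 0.3)), 0.3 * spd_distance(P, Q))
```
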
Riemannian Karcher centroid
  $d_R(P, Q) = \sqrt{\mathrm{tr}\,\log^2(P^{-1} Q)} = \sqrt{\sum_{i=1}^d \log^2 \lambda_i} = \big\|\log\big(P^{-\frac{1}{2}} Q P^{-\frac{1}{2}}\big)\big\|_F$
where the λi’s are the eigenvalues of P⁻¹Q
(P⁻¹Q is similar to Q^{1/2} P⁻¹ Q^{1/2}, hence has the same positive spectrum).
The unique mean is characterized by $\sum_{i=1}^n \log(T_i^{-1} C_R) = 0$.
Closed-form solution only for n = 2 (the geodesic midpoint):
  $C_R(P, Q) = P^{\frac{1}{2}} \big(P^{-\frac{1}{2}} Q P^{-\frac{1}{2}}\big)^{\frac{1}{2}} P^{\frac{1}{2}}$
Otherwise, iterative approximation ($C_R = \lim_{t \to \infty} C_t$):
  $C_{t+1} = C_t \exp\Big(\frac{1}{n} \sum_{i=1}^n \log\big(C_t^{-1} T_i\big)\Big)$

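A sketch of this fixed-point iteration, rewritten in the algebraically equivalent symmetrized form C_{t+1} = C^{1/2} exp((1/n) Σᵢ log(C^{−1/2} Tᵢ C^{−1/2})) C^{1/2} so that every matrix passed to exp/log stays symmetric (a convenience assumption; no step-size control or stopping test):

```python
import numpy as np

def spd_fun(X, f):
    """Spectral extension of a scalar function f to a symmetric matrix X."""
    lam, U = np.linalg.eigh(X)
    return (U * f(lam)) @ U.T

def karcher_mean(Ts, iters=50):
    C = sum(Ts) / len(Ts)                       # start at the arithmetic mean
    for _ in range(iters):
        Ch = spd_fun(C, np.sqrt)
        Cmh = spd_fun(C, lambda x: 1.0 / np.sqrt(x))
        # Average of the log-maps of the T_i in the tangent space at C.
        V = sum(spd_fun(Cmh @ T @ Cmh, np.log) for T in Ts) / len(Ts)
        C = Ch @ spd_fun(V, np.exp) @ Ch        # exponential map back to the cone
    return C

rng = np.random.default_rng(0)
Ts = [A @ A.T + np.eye(3) for A in rng.standard_normal((5, 3, 3))]
print(karcher_mean(Ts))
```
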
Riemannian minimax SPD center (circumcenter [1])
Case of p = ∞, center that minimizes the maximum distance.
GEO-ALG: start with c₁ ∈ P and iteratively update the current circumcenter
as cᵢ₊₁ = Geodesic(cᵢ, fᵢ, 1/(i+1)), where fᵢ denotes the farthest point of P
from cᵢ, and Geodesic(p, q, t) denotes the intermediate point m on the
geodesic passing through p and q such that ρ(p, m) = t × ρ(p, q).
Geodesic:
  $\gamma_t(P, Q) = P^{\frac{1}{2}} \big(P^{-\frac{1}{2}} Q P^{-\frac{1}{2}}\big)^t P^{\frac{1}{2}}$
Find t such that $\sum_{i=1}^d \log^2 \lambda_i^t = t^2 \sum_{i=1}^d \log^2 \lambda_i = r^2$,
that is, $t = r / \sqrt{\sum_{i=1}^d \log^2 \lambda_i}$.
Core-set proof and guaranteed convergence [1].

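GEO-ALG itself is then a short loop; this sketch (name illustrative) reuses spd_geodesic and spd_distance from the SPD geodesic snippet above:

```python
def spd_minimax_center(Ps, iters=200):
    """Geodesic walk toward the current farthest point with step 1/(i+1).
    Assumes spd_geodesic and spd_distance as defined in the earlier sketch."""
    c = Ps[0]
    for i in range(1, iters + 1):
        far = max(Ps, key=lambda P: spd_distance(c, P))  # farthest matrix from c
        c = spd_geodesic(c, far, 1.0 / (i + 1))          # move along the geodesic
    return c
```
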
Matrices as parameters in probability distributions
Exponential families: Gaussian, Wishart, etc.:
  $p(x; \lambda) = p_F(x; \theta) = \exp\big(\langle t(x), \theta\rangle - F(\theta) + k(x)\big)$
Example: Poisson distribution
  $p(x; \lambda) = \frac{\lambda^x}{x!} \exp(-\lambda)$
◮ t(x) = x, the sufficient statistic,
◮ θ = log λ, the natural parameter,
◮ F(θ) = exp θ, the log-normalizer → CONVEX,
◮ k(x) = − log x!, the carrier measure (with respect to the counting measure).

Gaussians as an exponential family
  $p(x; \lambda) = p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} \sqrt{\det \Sigma}} \exp\Big(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\Big)$
◮ $\theta = (\Sigma^{-1}\mu, \frac{1}{2}\Sigma^{-1}) \in \Theta = \mathbb{R}^d \times K_{d\times d}$, with $K_{d\times d}$ the cone of positive definite matrices,
◮ $F(\theta) = \frac{1}{4} \mathrm{tr}(\theta_2^{-1} \theta_1 \theta_1^\top) - \frac{1}{2} \log \det \theta_2 + \frac{d}{2} \log \pi$ → CONVEX,
◮ $t(x) = (x, -x x^\top)$,
◮ k(x) = 0.
Inner product: composite, the sum of a dot product and a matrix trace:
  $\langle \theta, \theta' \rangle = \theta_1^\top \theta_1' + \mathrm{tr}(\theta_2^\top \theta_2')$
The coordinate transformation τ : Λ → Θ is given for λ = (µ, Σ) by
  $\tau(\lambda) = \Big(\lambda_2^{-1}\lambda_1, \tfrac{1}{2}\lambda_2^{-1}\Big), \qquad \tau^{-1}(\theta) = \Big(\tfrac{1}{2}\theta_2^{-1}\theta_1, \tfrac{1}{2}\theta_2^{-1}\Big)$

Convex duality: Legendre transformation
◮ For a strictly convex and differentiable function F : X → R:
  $F^*(y) = \sup_{x \in \mathcal{X}} \{\langle y, x\rangle - F(x)\} =: \sup_x\, l_F(y; x)$
◮ Maximum obtained for y = ∇F(x):
  $\nabla_x l_F(y; x) = y - \nabla F(x) = 0 \Rightarrow y = \nabla F(x)$
◮ Maximum unique from the convexity of F (∇²F ≻ 0):
  $\nabla_x^2 l_F(y; x) = -\nabla^2 F(x) \prec 0$
◮ Convex conjugates:
  $(F, \mathcal{X}) \Leftrightarrow (F^*, \mathcal{Y}), \qquad \mathcal{Y} = \{\nabla F(x) \mid x \in \mathcal{X}\}$

Legendre duality: Geometric interpretation
Consider the epigraph of F as a convex object:
◮ convex hull (V -representation), versus
◮ half-space (H-representation).

Legendre transform also called “slope” transform.

Legendre duality & Canonical divergence
◮ Convex conjugates have functionally inverse gradients: ∇F⁻¹ = ∇F*.
  ∇F* may require numerical approximation (not always available in analytical closed form).
◮ Involution: (F*)* = F with ∇F* = (∇F)⁻¹.
◮ Convex conjugate F* expressed using (∇F)⁻¹:
  $F^*(y) = \langle x, y\rangle - F(x)$, with $x = \nabla_y F^*(y)$
  $\quad\;\;\; = \langle (\nabla F)^{-1}(y), y\rangle - F\big((\nabla F)^{-1}(y)\big)$
◮ Fenchel-Young inequality at the heart of the canonical divergence:
  $F(x) + F^*(y) \ge \langle x, y\rangle$
  $A_F(x : y) = A_{F^*}(y : x) = F(x) + F^*(y) - \langle x, y\rangle \ge 0$

Dual Bregman divergences & canonical divergence [14]
  $\mathrm{KL}(P : Q) = E_P\Big[\log \frac{p(x)}{q(x)}\Big] \ge 0$
  $\quad = B_F(\theta_Q : \theta_P) = B_{F^*}(\eta_P : \eta_Q)$
  $\quad = F(\theta_Q) + F^*(\eta_P) - \langle \theta_Q, \eta_P\rangle$
  $\quad = A_F(\theta_Q : \eta_P) = A_{F^*}(\eta_P : \theta_Q)$
with θ_Q the natural parameterization and η_P = E_P[t(X)] = ∇F(θ_P) the moment parameterization.
  $\mathrm{KL}(P : Q) = \underbrace{\int p(x) \log \frac{1}{q(x)}\, \mathrm{d}x}_{H^\times(P:Q)} - \underbrace{\int p(x) \log \frac{1}{p(x)}\, \mathrm{d}x}_{H(P) = H^\times(P:P)}$
Shannon cross-entropy and entropy of an exponential family [14]:
  $H^\times(P : Q) = F(\theta_Q) - \langle \theta_Q, \nabla F(\theta_P)\rangle - E_P[k(x)]$
  $H(P) = F(\theta_P) - \langle \theta_P, \nabla F(\theta_P)\rangle - E_P[k(x)]$
  $H(P) = -F^*(\eta_P) - E_P[k(x)]$

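A quick numerical check of the identity KL(P : Q) = B_F(θ_Q : θ_P) on the Poisson family (F(θ) = e^θ, θ = log λ; the closed-form Poisson KL is standard):

```python
import numpy as np

def kl_poisson(lp, lq):
    """Closed-form KL between Poisson(lp) and Poisson(lq)."""
    return lp * np.log(lp / lq) + lq - lp

def bregman_exp(tq, tp):
    """B_F(theta_q : theta_p) for the log-normalizer F(theta) = exp(theta)."""
    return np.exp(tq) - np.exp(tp) - (tq - tp) * np.exp(tp)

lp, lq = 3.0, 5.0
assert np.isclose(kl_poisson(lp, lq), bregman_exp(np.log(lq), np.log(lp)))
```
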
Bregman divergence: Geometric interpretation (I)
Potential function F, graph plot F: (x, F(x)).
  $D_F(p : q) = F(p) - F(q) - \langle p - q, \nabla F(q)\rangle$

Bregman divergence: Geometric interpretation (II)
Potential function f, graph plot F: (x, f(x)).
  $B_f(p \| q) = f(p) - f(q) - (p - q) f'(q)$
B_f(·‖q): vertical distance between the hyperplane H_q tangent to the graph of f
at the lifted point q̂, and the translated hyperplane at p̂.

Bregman divergence: Geometric interpretation (III)
Bregman divergence and path integrals
Bregman divergence and path integrals:
  $B(\theta_1 : \theta_2) = F(\theta_1) - F(\theta_2) - \langle \theta_1 - \theta_2, \nabla F(\theta_2)\rangle$   (3)
  $\quad = \int_{\theta_2}^{\theta_1} \langle \nabla F(t) - \nabla F(\theta_2), \mathrm{d}t\rangle$   (4)
  $\quad = \int_{\eta_1}^{\eta_2} \langle \nabla F^*(t) - \nabla F^*(\eta_1), \mathrm{d}t\rangle$   (5)
  $\quad = B^*(\eta_2 : \eta_1)$   (6)

Matrix Bregman divergences [4, 16]
Choose a real-valued functional generator F and extend it to matrices:
  $F(X) = \mathrm{tr}(\Psi(X)), \qquad \Psi(X) = \sum_{k \ge 0} t_{F,k}\, X^k$
($t_{F,k}$ from the Taylor expansion of the real-valued F)
  $B_F(P : Q) = F(P) - F(Q) - \mathrm{tr}\big((P - Q)^\top \nabla F(Q)\big)$
  $\nabla F(X) = \sum_{k \ge 0} t'_{F,k}\, X^k$
($t'_{F,k}$ from the Taylor expansion of the real-valued F′)

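A sketch of a matrix Bregman divergence computed spectrally rather than through the Taylor series, here with the von Neumann generator F(X) = tr(X log X − X) (helper names illustrative):

```python
import numpy as np

def spd_fun(X, f):
    lam, U = np.linalg.eigh(X)     # spectral extension of f to symmetric X
    return (U * f(lam)) @ U.T

def matrix_bregman(P, Q, F, gradF):
    """B_F(P : Q) = F(P) - F(Q) - tr((P - Q)^T gradF(Q))."""
    return F(P) - F(Q) - np.trace((P - Q).T @ gradF(Q))

F_vn = lambda X: np.trace(spd_fun(X, lambda l: l * np.log(l)) - X)  # tr(X log X - X)
grad_vn = lambda X: spd_fun(X, np.log)                              # gradient: log X

A, B = np.random.randn(3, 3), np.random.randn(3, 3)
P, Q = A @ A.T + np.eye(3), B @ B.T + np.eye(3)
print(matrix_bregman(P, Q, F_vn, grad_vn))   # nonnegative; zero iff P == Q
```
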
Matrix Bregman divergences [16]

(figure omitted)

Particular case: Bregman Schatten p-divergences [5, 16]

Schatten p-norm of a real symmetric matrix X (a unitarily invariant matrix norm):
  $\|X\|_p = \|\lambda(X)\|_p$
Bregman generator:
  $F(X) = \frac{1}{2} \|X\|_p^2$
Used in regularized convex optimization [5] and matrix data mining [16].

Matrix Legendre transformation

Extends the classical Legendre-Fenchel transformation:
  $F^*(\eta) = \sup_{\mathrm{spec}(\theta) \subseteq \mathrm{dom}(F)} \mathrm{tr}(\theta \eta^\top) - F(\theta)$
  $D_F(\theta_P : \theta_Q) = D_{F^*}(\eta_Q : \eta_P) = F(\theta) + F^*(\eta) - \mathrm{tr}(\theta \eta^\top)$
θ and η are dual matrix coordinate systems on the matrix manifold.
Non-metric differential structure with dual coordinate systems.

Bregman matrix means
  $B_F(X, P) = F(X) - F(P) - \mathrm{tr}\big((X - P)^\top \nabla F(P)\big)$
F(·): strictly convex and differentiable function on an open convex space.
  $C = \nabla F^{-1}\Big(\sum_{i=1}^n w_i \nabla F(T_i)\Big)$
→ the quasi-arithmetic mean for ∇F.
Since B_F(X, P) ≠ B_F(P, X), one also defines a right-sided centroid M′:
find the center of mass [13] (independent of the generator F).
Generators:
◮ F(X) = tr(X⊤X): the quadratic matrix entropy,
◮ F(X) = − log det X: the matrix Burg entropy,
◮ F(X) = tr(X log X − X): the von Neumann entropy [19, 18, 15] (Umegaki quantum relative entropy).

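Each generator thus yields its own matrix mean through C = ∇F⁻¹((1/n) Σᵢ ∇F(Tᵢ)); a minimal sketch with equal weights (function names illustrative):

```python
import numpy as np

def spd_fun(X, f):
    lam, U = np.linalg.eigh(X)
    return (U * f(lam)) @ U.T

def quasi_arithmetic_mean(Ts, gradF, gradF_inv):
    """C = gradF^{-1}((1/n) sum_i gradF(T_i)), the quasi-arithmetic matrix mean."""
    return gradF_inv(sum(gradF(T) for T in Ts) / len(Ts))

# Burg entropy F = -log det: gradF(X) = -X^{-1}  ->  harmonic matrix mean.
harmonic = lambda Ts: quasi_arithmetic_mean(
    Ts, lambda X: -np.linalg.inv(X), lambda Y: -np.linalg.inv(Y))
# von Neumann entropy: gradF(X) = log X  ->  log-Euclidean matrix mean.
log_euclidean = lambda Ts: quasi_arithmetic_mean(
    Ts, lambda X: spd_fun(X, np.log), lambda Y: spd_fun(Y, np.exp))
```
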
Total Bregman divergences (tBD)
Instead of ”vertical” projection in Bregman divergence, consider
perpendicular projection.
(Analogy with least squares and total least squares regression.)

  $\mathrm{tB}_F(P, Q) = \frac{B_F(P, Q)}{\sqrt{1 + \|\nabla F(Q)\|^2}}$
→ proven statistically robust.
Applications to robust DT-MRI segmentation [8].

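Dividing by the conformal factor takes one extra line on top of the previous sketch (same F/gradF conventions as matrix_bregman above):

```python
import numpy as np

def total_bregman(P, Q, F, gradF):
    """tB_F(P, Q) = B_F(P, Q) / sqrt(1 + ||gradF(Q)||_F^2)."""
    b = F(P) - F(Q) - np.trace((P - Q).T @ gradF(Q))
    return b / np.sqrt(1.0 + np.linalg.norm(gradF(Q), 'fro') ** 2)
```
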
Matrix Jensen/Burbea-Rao divergences [10]

The convexity gap defines a divergence:
  $\mathrm{BR}_F(P, Q) = \frac{F(P) + F(Q)}{2} - F\Big(\frac{P + Q}{2}\Big) \ge 0$
◮ F(X) = tr(X⊤X): the quadratic matrix entropy,
◮ F(X) = − log det X: the matrix Burg entropy,
◮ F(X) = tr(X log X − X): the von Neumann entropy,
◮ etc.

Smooth family of convex generators [12, 17]
1-parameter family of generators:
  $F_\alpha(X) = \frac{1}{\alpha(1-\alpha)} \mathrm{tr}\big(\alpha X - X^\alpha + (1-\alpha) I\big), \quad \alpha \notin \{0, 1\}$
  $B_\alpha(P : Q) = \frac{1}{\alpha(1-\alpha)} \mathrm{tr}\big(Q^\alpha - P^\alpha + \alpha Q^{\alpha-1}(P - Q)\big)$
  $\nabla F_\alpha(X) = \frac{1}{1-\alpha}\big(I - X^{\alpha-1}\big), \qquad \nabla F_\alpha^{-1}(X) = \big(I - (1-\alpha) X\big)^{\frac{1}{\alpha-1}}$
When α → 1, ∇F_α(X) → ∇F₁(X) = log X. When α → 0, ∇F_α(X) → ∇F₀(X) = I − X⁻¹.
◮ α = 2: quadratic matrix information
◮ α → 1: von Neumann information
◮ α → 0: Burg log-det information

Jensen (Burbea-Rao) divergences
Based on Jensen’s inequality for a strictly convex function F(·):
  $\mathrm{BR}_F(X, P) \stackrel{\mathrm{def}}{=} \frac{F(X) + F(P)}{2} - F\Big(\frac{X + P}{2}\Big) \ge 0$
Includes the special case of the Jensen-Shannon divergence:
  $\mathrm{JS}(p, q) = H\Big(\frac{p + q}{2}\Big) - \frac{H(p) + H(q)}{2}$
for F(x) = −H(x), the negative Shannon entropy H(x) = −x log x.
→ generators are convex; entropies are concave (negative generators).

Visualizing Burbea-Rao divergences

Burbea-Rao divergences include the squared Mahalanobis distance. (figure omitted)

Burbea-Rao from Symmetrizing Bregman divergences [13]
◮ Jeffreys-Bregman divergences:
  $S_F(p; q) = \frac{B_F(p, q) + B_F(q, p)}{2} = \frac{1}{2} \langle p - q, \nabla F(p) - \nabla F(q)\rangle$
◮ Jensen-Bregman divergences (diversity index):
  $J_F(p; q) = \frac{B_F\big(p, \frac{p+q}{2}\big) + B_F\big(q, \frac{p+q}{2}\big)}{2} = \frac{F(p) + F(q)}{2} - F\Big(\frac{p+q}{2}\Big) = \mathrm{BR}_F(p, q)$

Skew Burbea-Rao divergences
  $\mathrm{BR}_F^{(\alpha)} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$
  $\mathrm{BR}_F^{(\alpha)}(p, q) = \alpha F(p) + (1-\alpha) F(q) - F(\alpha p + (1-\alpha) q) = \mathrm{BR}_F^{(1-\alpha)}(q, p)$
Skew symmetrization of Bregman divergences:
  $\alpha B_F(p, \alpha p + (1-\alpha) q) + (1-\alpha) B_F(q, \alpha p + (1-\alpha) q) \stackrel{\mathrm{def}}{=} \mathrm{BR}_F^{(\alpha)}(p, q)$
= skew Jensen-Bregman divergences.

Bregman divergences = asymptotic skewed Jensen
divergences

  $B_F(p, q) = \lim_{\alpha \to 1} \frac{1}{1-\alpha}\, \mathrm{BR}_F^{(\alpha)}(p, q)$
  $B_F(q, p) = \lim_{\alpha \to 0} \frac{1}{\alpha}\, \mathrm{BR}_F^{(\alpha)}(p, q)$

Burbea-Rao/Jensen centroids
(p = 1)
  $\mathrm{OPT}: \quad C_F = \arg\min_X \sum_{i=1}^n w_i\, \mathrm{BR}_F^{(\alpha_i)}(X, T_i) = \arg\min_X L(X)$
W.l.o.g., equivalent to minimizing
  $E(C) = \Big(\sum_{i=1}^n w_i \alpha_i\Big) F(C) - \sum_{i=1}^n w_i F(\alpha_i C + (1-\alpha_i) T_i)$
a sum E = F + G of a convex function F and a concave function G ⇒
Convex-ConCave Procedure (CCCP, NIPS*01).
Start from an arbitrary C₀, and iteratively update as:
  $\nabla F(C_{t+1}) = -\nabla G(C_t)$
⇒ guaranteed convergence to a (local) minimum.

ConCave Convex Procedure (CCCP)
  $\min_x E(x) = F(x) + G(x), \qquad \nabla F(c_{t+1}) = -\nabla G(c_t)$
The decomposition may not be unique...

Iterative algorithm for Burbea-Rao centroids
Apply the CCCP scheme:
  $\nabla F(C_{t+1}) = \frac{1}{\sum_{i=1}^n w_i \alpha_i} \sum_{i=1}^n w_i \alpha_i \nabla F(\alpha_i C_t + (1-\alpha_i) T_i)$
  $C_{t+1} = \nabla F^{-1}\Big(\frac{1}{\sum_{i=1}^n w_i \alpha_i} \sum_{i=1}^n w_i \alpha_i \nabla F(\alpha_i C_t + (1-\alpha_i) T_i)\Big)$
Get arbitrarily fine approximations of the (skew) Burbea-Rao matrix centroids and barycenters.

Special case: α-log det divergence [15, 11]
Cone of Hermitian positive definite matrices (self-adjoint matrices M^H = M̄^⊤ = M).
  $F(X) = -\log \det X, \qquad \nabla F(X) = \nabla F^{-1}(X) = -X^{-1}$
Burbea-Rao α-log det divergences:
  $D_{\mathrm{ld}}^{(\alpha)}(P, Q) = \begin{cases} \mathrm{tr}(Q^{-1} P - I) - \log \det(Q^{-1} P) & \alpha = 1 \\ \frac{4}{1-\alpha^2} \log \frac{\det\left(\frac{1-\alpha}{2} P + \frac{1+\alpha}{2} Q\right)}{(\det P)^{\frac{1-\alpha}{2}} (\det Q)^{\frac{1+\alpha}{2}}} & \alpha \in \mathbb{R} \setminus \{-1, 1\} \\ \mathrm{tr}(P^{-1} Q - I) - \log \det(P^{-1} Q) & \alpha = -1 \end{cases}$
Start with $C_1 = \frac{1}{n} \sum_{i=1}^n T_i$, and iterate
  $C_{t+1} = n \Big(\sum_{i=1}^n \Big(\frac{1-\alpha}{2} T_i + \frac{1+\alpha}{2} C_t\Big)^{-1}\Big)^{-1}$
→ unique global mean (obtained from CCCP).

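The CCCP fixed point for this generator is a matrix-harmonic update; a sketch with equal weights (function name illustrative):

```python
import numpy as np

def alpha_logdet_mean(Ts, alpha=0.0, iters=100):
    """C_{t+1} = n (sum_i ((1-a)/2 T_i + (1+a)/2 C_t)^{-1})^{-1},
    started at the arithmetic mean C_1 = (1/n) sum_i T_i."""
    n = len(Ts)
    C = sum(Ts) / n
    for _ in range(iters):
        S = sum(np.linalg.inv(0.5 * (1 - alpha) * T + 0.5 * (1 + alpha) * C)
                for T in Ts)
        C = n * np.linalg.inv(S)
    return C

rng = np.random.default_rng(1)
Ts = [A @ A.T + np.eye(3) for A in rng.standard_normal((4, 3, 3))]
print(alpha_logdet_mean(Ts))   # alpha = 0: symmetric Jensen log-det mean
```
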
Bhattacharyya coefficients/distances
Bhattacharyya coefficient and non-metric distance:
  $C(p, q) = \int \sqrt{p(x) q(x)}\, \mathrm{d}x, \quad 0 < C(p, q) \le 1, \qquad B(p, q) = -\ln C(p, q)$
(the coefficient is always strictly positive). Hellinger metric:
  $H(p, q) = \sqrt{\frac{1}{2} \int \big(\sqrt{p(x)} - \sqrt{q(x)}\big)^2\, \mathrm{d}x}$
such that 0 ≤ H(p, q) ≤ 1, with
  $H(p, q) = \sqrt{\frac{1}{2}\Big(\int p(x)\,\mathrm{d}x + \int q(x)\,\mathrm{d}x - 2\int \sqrt{p(x)}\sqrt{q(x)}\,\mathrm{d}x\Big)} = \sqrt{1 - C(p, q)}$

Chernoff coefficients/α-divergences
Skew Bhattacharyya divergences based on Chernoff α-coefficients:
  $B_\alpha(p, q) = -\ln \int_x p^\alpha(x)\, q^{1-\alpha}(x)\, \mathrm{d}x = -\ln C_\alpha(p, q)$
  $\quad = -\ln \int_x q(x) \Big(\frac{p(x)}{q(x)}\Big)^\alpha \mathrm{d}x = -\ln E_q[L^\alpha(x)]$
Amari α-divergence:
  $D_\alpha(p \| q) = \begin{cases} \frac{4}{1-\alpha^2}\Big(1 - \int p(x)^{\frac{1-\alpha}{2}} q(x)^{\frac{1+\alpha}{2}}\, \mathrm{d}x\Big) & \alpha \ne \pm 1 \\ \int p(x) \log \frac{p(x)}{q(x)}\, \mathrm{d}x = \mathrm{KL}(p, q) & \alpha = -1 \\ \int q(x) \log \frac{q(x)}{p(x)}\, \mathrm{d}x = \mathrm{KL}(q, p) & \alpha = 1 \end{cases}$
  $D_\alpha(p \| q) = D_{-\alpha}(q \| p)$
Remapping α′ = (1−α)/2 (i.e., α = 1 − 2α′) yields the Chernoff α′-divergences.

Bhattacharyya/Chernoff of exponential families [10]

Equivalence with skew Burbea-Rao divergences:
  $B_\alpha(p_F(x; \theta_p), p_F(x; \theta_q)) = \mathrm{BR}_F^{(\alpha)}(\theta_p, \theta_q) = \alpha F(\theta_p) + (1-\alpha) F(\theta_q) - F(\alpha \theta_p + (1-\alpha) \theta_q)$   (7)
The Bhattacharyya divergence between probability distributions amounts to
computing a Jensen divergence on their (natural) parameters.

Closed-form Bhattacharyya distances for exp. fam.

A generic formula that instantiates into the well-known formulas of statistical
pattern recognition, with BR_F(λ_p, λ_q) = BR_F(τ(λ_p), τ(λ_q)) and F(θ) given
up to a constant:

◮ Multinomial: $F(\theta) = \log(1 + \sum_{i=1}^{d-1} e^{\theta_i})$;
  $-\ln \sum_{i=1}^d \sqrt{p_i q_i}$
◮ Poisson: $F(\theta) = \exp\theta$;
  $\frac{1}{2}\big(\sqrt{\mu_p} - \sqrt{\mu_q}\big)^2$
◮ Univariate Gaussian: $F(\theta) = -\frac{\theta_1^2}{4\theta_2} + \frac{1}{2}\log\big(-\frac{\pi}{\theta_2}\big)$;
  $\frac{1}{4}\frac{(\mu_p-\mu_q)^2}{\sigma_p^2+\sigma_q^2} + \frac{1}{2}\ln\frac{\sigma_p^2+\sigma_q^2}{2\sigma_p\sigma_q}$
◮ Multivariate Gaussian: $F(\theta) = \frac{1}{4}\mathrm{tr}(\theta_2^{-1}\theta_1\theta_1^\top) - \frac{1}{2}\log\det\theta_2$;
  $\frac{1}{8}(\mu_p-\mu_q)^\top\Big(\frac{\Sigma_p+\Sigma_q}{2}\Big)^{-1}(\mu_p-\mu_q) + \frac{1}{2}\ln\frac{\det\frac{\Sigma_p+\Sigma_q}{2}}{\sqrt{\det\Sigma_p\det\Sigma_q}}$

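For instance, the multivariate-Gaussian row of the table transcribes directly (the function name is illustrative):

```python
import numpy as np

def bhattacharyya_gaussians(mu_p, S_p, mu_q, S_q):
    """B = (1/8) dmu^T Sbar^{-1} dmu + (1/2) ln(det Sbar / sqrt(det S_p det S_q)),
    with Sbar = (S_p + S_q)/2."""
    Sbar = 0.5 * (S_p + S_q)
    dmu = mu_p - mu_q
    quad = 0.125 * dmu @ np.linalg.solve(Sbar, dmu)
    return quad + 0.5 * np.log(np.linalg.det(Sbar) /
                               np.sqrt(np.linalg.det(S_p) * np.linalg.det(S_q)))

print(bhattacharyya_gaussians(np.zeros(2), np.eye(2),
                              np.array([1.0, 0.0]), 2 * np.eye(2)))
```
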
Wrapping up

◮ Besides the Euclidean, log-Euclidean and Riemannian metric-based means, proposed divergence-based matrix centroids,
◮ total Bregman divergences and robustness (conformal geometry),
◮ Riemannian minimax center,
◮ skew Burbea-Rao/Jensen divergences extending Bregman divergences,
◮ Bhattacharyya means of densities = Burbea-Rao means on the (matrix) parameters.
Which mean do you mean or need?

Non-metric matrix manifolds with dually affine connections

In a nutshell:
◮ asymmetric (Bregman) non-metric divergences,
◮ Legendre transform, convex conjugates & dual divergences,
◮ dual θ-, η-, or mixed coordinate systems,
◮ dual closed-form affine geodesics (computationally convenient),
◮ Pythagorean theorem.

Thank you.

www.informationgeometry.org

“One geometry cannot be more true than another; it can only be more convenient”
— Jules Henri Poincaré (1902)

Bibliographic references I
Marc Arnaudon and Frank Nielsen.
On approximating the Riemannian 1-center.
Comput. Geom., 46(1):93–104, 2013.
Rajendra Bhatia.
The Riemannian mean of positive matrices.
In Frank Nielsen and Rajendra Bhatia, editors, Matrix Information Geometry, pages 35–51, 2012.
Silvere Bonnabel and Rodolphe Sepulchre.
Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank.
SIAM J. Matrix Analysis Applications, 31(3):1055–1070, 2009.
Inderjit S. Dhillon and Joel A. Tropp.
Matrix nearness problems with Bregman divergences.
SIAM J. Matrix Anal. Appl., 29(4):1120–1146, November 2007.
John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Ambuj Tewari.
Composite objective mirror descent.
In Adam Tauman Kalai and Mehryar Mohri, editors, COLT, pages 14–26. Omnipress, 2010.
C. Harris and M. Stephens.
A Combined Corner and Edge Detection.
In Proceedings of The Fourth Alvey Vision Conference, pages 147–151, 1988.
Bibliographic references II
Meizhu Liu, Baba C. Vemuri, Shun-ichi Amari, and Frank Nielsen.
Shape retrieval using hierarchical total Bregman soft clustering.
Transactions on Pattern Analysis and Machine Intelligence, 34(12):2407–2419, 2012.
Meizhu Liu, Baba C. Vemuri, Shun ichi Amari, and Frank Nielsen.
Shape retrieval using hierarchical total Bregman soft clustering.
IEEE Trans. Pattern Anal. Mach. Intell., 34(12):2407–2419, 2012.
Maher Moakher.
A differential geometric approach to the geometric mean of symmetric positive-definite matrices.
SIAM Journal on Matrix Analysis and Applications, 26(3):735–747, 2005.
Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455–5466, 2011.
Frank Nielsen, Meizhu Liu, Xiaojing Ye, and Baba C. Vemuri.
Jensen divergence based SPD matrix means and applications.
In International Conference on Pattern Recognition (ICPR), 2012.
Frank Nielsen and Richard Nock.
Quantum Voronoi diagrams and Holevo channel capacity for 1-qubit quantum states.
In IEEE International Symposium on Information Theory (ISIT), pages 96–100, 2008.
Bibliographic references III
Frank Nielsen and Richard Nock.
Sided and symmetrized Bregman centroids.
IEEE Trans. Inf. Theor., 55(6):2882–2904, June 2009.
Frank Nielsen and Richard Nock.
Entropies and cross-entropies of exponential families.
In International Conference on Image Processing (ICIP), pages 3621–3624, 2010.
R. Nock, B. Magdalou, E. Briys, and F. Nielsen.
On tracking portfolios with certainty equivalents on a generalization of Markowitz model: the fool, the wise
and the adaptive.
In Thorsten Joachims, editor, International Conference on Machine Learning (ICML). Omnipress, 2011.
Richard Nock, Brice Magdalou, Eric Briys, and Frank Nielsen.
Mining matrix data with Bregman matrix divergences for portfolio selection.
In Frank Nielsen and Rajendra Bhatia, editors, Matrix Information Geometry, pages 373–402, 2012.
Masanori Ohya and Dénes Petz.
Quantum Entropy and Its Use.
1st ed. 1993. Corr 2nd printing, 2004.
Koji Tsuda, Gunnar Rätsch, and Manfred K. Warmuth.
Matrix exponentiated gradient updates for on-line learning and Bregman projection.
J. Mach. Learn. Res., 6:995–1018, December 2005.
Bibliographic references IV

Hisaharu Umegaki.
Conditional expectation in an operator algebra. IV. Entropy and information.
Kodai Math. Sem. Rep., 14(2):59, 1962.
Baba Vemuri, Meizhu Liu, Shun ichi Amari, and Frank Nielsen.
Total Bregman divergence and its applications to DTI analysis.
IEEE Transactions on Medical Imaging, 2011.
Zhizhou Wang and Baba C. Vemuri.
An affine invariant tensor dissimilarity measure and its applications to tensor-valued image segmentation.
In CVPR (1), pages 228–233, 2004.

