Jeffreys centroids:
A closed-form expression for positive histograms
and a guaranteed tight approximation for frequency histograms

Frank Nielsen
Frank.Nielsen@acm.org
Sony Computer Science Laboratories, Inc.

April 2013

© 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.
Why histogram clustering?

Task: Classify documents into categories, using the Bag-of-Words (BoW) modeling paradigm [3, 6]:
◮ Define a word dictionary, and
◮ Represent each document by a word-count histogram.

Centroid-based k-means clustering [1]:
◮ Cluster document histograms to learn categories,
◮ Build visual vocabularies by quantizing image features:
  Compressed Histogram of Gradients descriptors [4].

→ histogram centroids

Notation: w_h = \sum_{i=1}^d h^i is the cumulative sum of the bin values, and \tilde{\cdot} is the normalization operator (\tilde{h} = h / w_h).
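To make the BoW step concrete, here is a minimal Python sketch (purely illustrative: the toy dictionary, documents, and add-one smoothing are assumptions, not part of the original slides) that builds word-count histograms and normalizes them to frequency histograms:

```python
import numpy as np
from collections import Counter

# Toy dictionary and documents (illustrative only).
dictionary = ["divergence", "centroid", "histogram", "cluster"]
docs = ["histogram centroid of a histogram cluster",
        "divergence between a histogram and a centroid"]

def word_count_histogram(doc, dictionary):
    """Word-count histogram over the dictionary, with add-one smoothing
    so that every bin stays positive (the log terms used later require it)."""
    counts = Counter(doc.split())
    return np.array([counts[w] + 1.0 for w in dictionary])

H = np.array([word_count_histogram(d, dictionary) for d in docs])   # positive histograms h_j
H_tilde = H / H.sum(axis=1, keepdims=True)                          # frequency histograms (tilde)
```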
Why Jeffreys divergence?

Distance between two frequency histograms \tilde{p} and \tilde{q}: the Kullback-Leibler divergence, or relative entropy,

    KL(\tilde{p} : \tilde{q}) = H^\times(\tilde{p} : \tilde{q}) - H(\tilde{p}),

    H^\times(\tilde{p} : \tilde{q}) = \sum_{i=1}^d \tilde{p}^i \log \frac{1}{\tilde{q}^i}   (cross-entropy),

    H(\tilde{p}) = H^\times(\tilde{p} : \tilde{p}) = \sum_{i=1}^d \tilde{p}^i \log \frac{1}{\tilde{p}^i}   (Shannon entropy).

→ expected extra number of bits per datum that must be transmitted when using the "wrong" distribution \tilde{q} instead of the true distribution \tilde{p}.
\tilde{p} is hidden by nature (and hypothesized), \tilde{q} is estimated.
Why Jeffreys divergence?

When clustering histograms, all histograms play the same role → Jeffreys [8] divergence:

    J(p, q) = KL(p : q) + KL(q : p) = \sum_{i=1}^d (p^i - q^i) \log \frac{p^i}{q^i} = J(q, p).

→ symmetrizes the KL divergence.
(Also called the J-divergence, symmetrical Kullback-Leibler divergence, etc.)
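As a quick reference, a minimal NumPy sketch of these quantities (function names are illustrative; histograms are assumed to have strictly positive bins):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p : q) = sum_i p^i log(p^i / q^i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def jeffreys(p, q):
    """Jeffreys divergence J(p, q) = KL(p : q) + KL(q : p) = sum_i (p^i - q^i) log(p^i / q^i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((p - q) * np.log(p / q)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])
assert np.isclose(jeffreys(p, q), kl(p, q) + kl(q, p))   # symmetrized KL
assert np.isclose(jeffreys(p, q), jeffreys(q, p))         # symmetric in its arguments
```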
Jeffreys centroids: frequency and positive centroids

A set H = {h_1, ..., h_n} of weighted histograms, with positive histogram weights \pi_j > 0 normalized so that \sum_{j=1}^n \pi_j = 1.

◮ Jeffreys positive centroid c:

    c = \arg\min_{x \in \mathbb{R}^d_+} \sum_{j=1}^n \pi_j J(h_j, x).

◮ Jeffreys frequency centroid \tilde{c}:

    \tilde{c} = \arg\min_{x \in \Delta_d} \sum_{j=1}^n \pi_j J(\tilde{h}_j, x).

\Delta_d: the probability (d-1)-dimensional simplex.
Prior work

◮ Histogram clustering wrt. the χ² distance [10]
◮ Histogram clustering wrt. the Bhattacharyya distance [11, 13]
◮ Histogram clustering wrt. the Kullback-Leibler distance as Bregman k-means clustering [1]
◮ Jeffreys frequency centroid [16] (Newton numerical optimization)
◮ Jeffreys frequency centroid as an equivalent symmetrized Bregman centroid [14]
◮ Mixed Bregman clustering [15]
◮ Smooth family of symmetrized KL centroids including Jensen-Shannon centroids and Jeffreys centroids as limit cases [12]
Jeffreys positive centroid

    c = \arg\min_{x \in \mathbb{R}^d_+} J(H, x) = \arg\min_{x \in \mathbb{R}^d_+} \sum_{j=1}^n \pi_j J(h_j, x).

Theorem (Theorem 1)
The Jeffreys positive centroid c = (c^1, ..., c^d) of a set {h_1, ..., h_n} of n weighted positive histograms with d bins can be calculated component-wise exactly using the Lambert W analytic function:

    c^i = \frac{a^i}{W(\frac{a^i}{g^i} e)},

where a^i = \sum_{j=1}^n \pi_j h_j^i denotes the coordinate-wise weighted arithmetic mean and g^i = \prod_{j=1}^n (h_j^i)^{\pi_j} the coordinate-wise weighted geometric mean.

Lambert analytic function [2]: W(x) e^{W(x)} = x for x ≥ 0.
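A minimal sketch of Theorem 1 with NumPy/SciPy (the helper name is illustrative; scipy.special.lambertw evaluates the principal branch and returns a complex value, so we keep its real part):

```python
import numpy as np
from scipy.special import lambertw

def jeffreys_positive_centroid(H, pi):
    """Closed-form Jeffreys positive centroid (Theorem 1).

    H: (n, d) array of positive histograms h_j; pi: (n,) weights summing to 1.
    Returns c with c^i = a^i / W((a^i / g^i) e)."""
    H = np.asarray(H, dtype=float)
    pi = np.asarray(pi, dtype=float)
    a = pi @ H                               # weighted arithmetic means a^i
    g = np.exp(pi @ np.log(H))               # weighted geometric means g^i
    return a / lambertw(a / g * np.e).real   # principal branch W_0

# Example with three frequency histograms and uniform weights.
H = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.4, 0.2],
              [0.1, 0.3, 0.6]])
pi = np.ones(3) / 3
c = jeffreys_positive_centroid(H, pi)
print(c, c.sum())    # for frequency inputs, sum(c) <= 1 (see Lemma 1 below)
```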
Jeffreys positive centroid (proof)

    \min_x \sum_{j=1}^n \pi_j J(h_j, x)
    = \min_x \sum_{j=1}^n \pi_j \sum_{i=1}^d (h_j^i - x^i)(\log h_j^i - \log x^i)
    ≡ \min_x \sum_{i=1}^d \sum_{j=1}^n \pi_j (x^i \log x^i - x^i \log h_j^i - h_j^i \log x^i)

(dropping the terms h_j^i \log h_j^i that do not depend on x)

    = \min_x \sum_{i=1}^d \Big( x^i \log x^i - x^i \log \underbrace{\prod_{j=1}^n (h_j^i)^{\pi_j}}_{g^i} - \underbrace{\sum_{j=1}^n \pi_j h_j^i}_{a^i} \log x^i \Big)

    = \min_x \sum_{i=1}^d \Big( x^i \log \frac{x^i}{g^i} - a^i \log x^i \Big).
Jeffreys positive centroid (proof)

Coordinate-wise, minimize

    \min_x \ x \log \frac{x}{g} - a \log x.

Setting the derivative to zero, we solve

    \log \frac{x}{g} + 1 - \frac{a}{x} = 0,

i.e. \frac{a}{x} e^{a/x} = \frac{a}{g} e, so that \frac{a}{x} = W(\frac{a}{g} e) and

    x = \frac{a}{W(\frac{a}{g} e)}.
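A quick numerical sanity check of this coordinate-wise solution (a sketch, assuming SciPy and illustrative values of a and g with a ≥ g):

```python
import numpy as np
from scipy.special import lambertw
from scipy.optimize import minimize_scalar

a, g = 0.4, 0.25                                 # illustrative arithmetic/geometric means
f = lambda x: x * np.log(x / g) - a * np.log(x)  # coordinate-wise objective (convex)

x_closed = a / lambertw(a / g * np.e).real       # closed-form minimizer
x_num = minimize_scalar(f, bounds=(1e-9, 10.0), method="bounded").x
assert np.isclose(x_closed, x_num, atol=1e-4)
```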
Jeffreys frequency centroid: A guaranteed approximation

    \tilde{c} = \arg\min_{x \in \Delta_d} \sum_{j=1}^n \pi_j J(\tilde{h}_j, x).

Relaxing x from the probability simplex \Delta_d to \mathbb{R}^d_+, we get

    \tilde{c}' = \frac{c}{w_c},   with   w_c = \sum_i c^i   and   c^i = \frac{a^i}{W(\frac{a^i}{g^i} e)}.

Lemma (Lemma 1)
The cumulative sum w_c of the bin values of the Jeffreys positive centroid c of a set of frequency histograms is less than or equal to one: 0 < w_c ≤ 1.
Proof of Lemma 1

From Theorem 1:

    w_c = \sum_{i=1}^d c^i = \sum_{i=1}^d \frac{a^i}{W(\frac{a^i}{g^i} e)}.

Arithmetic-geometric mean inequality: a^i ≥ g^i.
Therefore W(\frac{a^i}{g^i} e) ≥ W(e) = 1 (W is increasing), so c^i ≤ a^i. Thus

    w_c = \sum_{i=1}^d c^i ≤ \sum_{i=1}^d a^i = 1.
Lemma 2

Lemma (Lemma 2)
For any positive histogram x and frequency histogram \tilde{h}, we have

    J(x, \tilde{h}) = J(\tilde{x}, \tilde{h}) + (w_x - 1)(KL(\tilde{x} : \tilde{h}) + \log w_x),

where w_x denotes the normalization factor (w_x = \sum_{i=1}^d x^i).

For a set \tilde{H} of frequency histograms, likewise

    J(x, \tilde{H}) = J(\tilde{x}, \tilde{H}) + (w_x - 1)(KL(\tilde{x} : \tilde{H}) + \log w_x),

where J(x, \tilde{H}) = \sum_{j=1}^n \pi_j J(x, \tilde{h}_j) and KL(\tilde{x} : \tilde{H}) = \sum_{j=1}^n \pi_j KL(\tilde{x} : \tilde{h}_j) (with \sum_{j=1}^n \pi_j = 1).
Proof of Lemma 2

Write x^i = w_x \tilde{x}^i. Then

    J(x, \tilde{h}) = \sum_{i=1}^d (w_x \tilde{x}^i - \tilde{h}^i) \log \frac{w_x \tilde{x}^i}{\tilde{h}^i}

    = \sum_{i=1}^d \Big( w_x \tilde{x}^i \log \frac{\tilde{x}^i}{\tilde{h}^i} + w_x \tilde{x}^i \log w_x + \tilde{h}^i \log \frac{\tilde{h}^i}{\tilde{x}^i} - \tilde{h}^i \log w_x \Big)

    = (w_x - 1) \log w_x + J(\tilde{x}, \tilde{h}) + (w_x - 1) \sum_{i=1}^d \tilde{x}^i \log \frac{\tilde{x}^i}{\tilde{h}^i}

    = J(\tilde{x}, \tilde{h}) + (w_x - 1)(KL(\tilde{x} : \tilde{h}) + \log w_x),

since \sum_{i=1}^d \tilde{h}^i = \sum_{i=1}^d \tilde{x}^i = 1.
Guaranteed approximation of \tilde{c}

Theorem (Theorem 2)
Let \tilde{c} denote the Jeffreys frequency centroid and \tilde{c}' = \frac{c}{w_c} the normalized Jeffreys positive centroid. Then the approximation factor \alpha_{\tilde{c}'} = \frac{J(\tilde{c}', \tilde{H})}{J(\tilde{c}, \tilde{H})} is such that 1 ≤ \alpha_{\tilde{c}'} ≤ \frac{1}{w_c} (with w_c ≤ 1).
Proof of Theorem 2

    J(c, \tilde{H}) ≤ J(\tilde{c}, \tilde{H}) ≤ J(\tilde{c}', \tilde{H}).

From Lemma 2, since J(\tilde{c}', \tilde{H}) = J(c, \tilde{H}) + (1 - w_c)(KL(\tilde{c}' : \tilde{H}) + \log w_c) and J(c, \tilde{H}) ≤ J(\tilde{c}, \tilde{H}):

    1 ≤ \alpha_{\tilde{c}'} ≤ 1 + \frac{(1 - w_c)(KL(\tilde{c}' : \tilde{H}) + \log w_c)}{J(\tilde{c}, \tilde{H})}.

Using KL(\tilde{c}' : \tilde{H}) = \frac{1}{w_c} KL(c, \tilde{H}) - \log w_c:

    \alpha_{\tilde{c}'} ≤ 1 + \frac{(1 - w_c) KL(c, \tilde{H})}{w_c J(\tilde{c}, \tilde{H})}.

Since J(\tilde{c}, \tilde{H}) ≥ J(c, \tilde{H}) and KL(c, \tilde{H}) ≤ J(c, \tilde{H}), we get

    \alpha_{\tilde{c}'} ≤ \frac{1}{w_c}.

When w_c = 1 the bound is tight.
In practice...

c is available in closed form → compute w_c, KL(c, \tilde{H}), and J(c, \tilde{H}), and bound the approximation factor \alpha_{\tilde{c}'} as:

    \alpha_{\tilde{c}'} ≤ 1 + \Big(\frac{1}{w_c} - 1\Big) \frac{KL(c, \tilde{H})}{J(c, \tilde{H})} ≤ \frac{1}{w_c}.
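A sketch of this practical bound, reusing the jeffreys_positive_centroid, kl, and jeffreys helpers sketched earlier (the function name and returned pair are illustrative assumptions):

```python
import numpy as np

def normalized_centroid_with_bound(H_tilde, pi):
    """Normalized Jeffreys positive centroid and a data-dependent bound on its
    approximation factor: 1 + (1/w_c - 1) * KL(c, H~)/J(c, H~) <= 1/w_c."""
    c = jeffreys_positive_centroid(H_tilde, pi)   # closed form (Theorem 1)
    w_c = c.sum()
    kl_cH = sum(p * kl(c, h) for p, h in zip(pi, H_tilde))        # KL(c, H~)
    j_cH = sum(p * jeffreys(c, h) for p, h in zip(pi, H_tilde))   # J(c, H~)
    bound = 1.0 + (1.0 / w_c - 1.0) * kl_cH / j_cH
    return c / w_c, min(bound, 1.0 / w_c)

c_tilde_prime, alpha_bound = normalized_centroid_with_bound(H, pi)   # H, pi from the earlier sketch
print(c_tilde_prime.sum(), alpha_bound)   # the normalized centroid sums to 1
```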
Fine approximation

From [16, 14], minimizing the Jeffreys frequency centroid criterion is equivalent to:

    \tilde{c} = \arg\min_{\tilde{x} \in \Delta_d} KL(\tilde{a} : \tilde{x}) + KL(\tilde{x} : \tilde{g}).

The Lagrangian function enforcing \sum_i \tilde{c}^i = 1 yields

    \log \frac{\tilde{c}^i}{\tilde{g}^i} + 1 - \frac{\tilde{a}^i}{\tilde{c}^i} + \lambda = 0,

    \tilde{c}^i = \frac{\tilde{a}^i}{W(\frac{\tilde{a}^i}{\tilde{g}^i} e^{\lambda+1})},

    \lambda = -KL(\tilde{c} : \tilde{g}) ≤ 0.
Fine approximation: Bisection search

Requiring \tilde{c}^i = \frac{\tilde{a}^i}{W(\frac{\tilde{a}^i}{\tilde{g}^i} e^{\lambda+1})} ≤ 1 gives

    \lambda ≥ \log(e^{\tilde{a}^i} \tilde{g}^i) - 1  for all i,   i.e.   \lambda \in [\max_i \log(e^{\tilde{a}^i} \tilde{g}^i) - 1, 0].

Define

    s(\lambda) = \sum_{i=1}^d \tilde{c}^i(\lambda) = \sum_{i=1}^d \frac{\tilde{a}^i}{W(\frac{\tilde{a}^i}{\tilde{g}^i} e^{\lambda+1})}.

The function s is monotonically decreasing with s(0) ≤ 1.
→ Bisection search for s(\lambda^*) ≃ 1 to arbitrary precision.
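A minimal sketch of this bisection search (assuming SciPy; plain double precision here, whereas the experiments below use arbitrary-precision arithmetic; the function name is illustrative):

```python
import numpy as np
from scipy.special import lambertw

def jeffreys_frequency_centroid(H_tilde, pi, tol=1e-12):
    """Arbitrarily fine approximation of the Jeffreys frequency centroid by
    bisection on the Lagrange multiplier lambda."""
    H_tilde = np.asarray(H_tilde, dtype=float)
    pi = np.asarray(pi, dtype=float)
    a = pi @ H_tilde                        # weighted arithmetic mean a~ (already normalized)
    g = np.exp(pi @ np.log(H_tilde))        # weighted geometric mean
    g = g / g.sum()                         # normalized geometric mean g~

    def s(lam):                             # s(lambda) = sum_i c~^i(lambda), decreasing in lambda
        return float(np.sum(a / lambertw(a / g * np.exp(lam + 1.0)).real))

    lo = float(np.max(a + np.log(g))) - 1.0   # lambda >= max_i log(e^{a~^i} g~^i) - 1
    hi = 0.0                                  # s(0) <= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if s(mid) > 1.0:
            lo = mid                          # s decreasing: need a larger lambda
        else:
            hi = mid
    c = a / lambertw(a / g * np.exp(hi + 1.0)).real
    return c / c.sum()                        # absorb the residual tolerance
```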
Experiments: Caltech-256

Caltech-256 [7]: 30607 images labeled into 256 categories (256 Jeffreys centroids).
Arbitrary floating-point precision: http://www.apfloat.org/
Veldhuis' approximation: \tilde{c}'' = \frac{\tilde{a} + \tilde{g}}{2}.

         α_c (optimal positive)   α_{c̃'} (normalized approx.)   w_c ≤ 1 (normalizing coeff.)   α_{c̃''} (Veldhuis' approx.)
    avg  0.9648680345638155       1.0002205080964255             0.9338228644308926             1.065590178484613
    min  0.906414219584823        1.0000005079528809             0.8342819488534723             1.0027707382095195
    max  0.9956399220678585       1.0000031489541772             0.9931975105809021             1.3582296675397754
Experiments: Synthetic data sets

Random binary histograms:

    \alpha = \frac{J(\tilde{c}')}{J(\tilde{c})} ≥ 1.

Performance: \bar{\alpha} ∼ 1.0000009, \alpha_{max} ∼ 1.00181506, \alpha_{min} = 1.000000.

Open question: can a better worst-case upper bound on this performance be expressed?
Summary and conclusion

◮ Jeffreys positive centroid c in closed form
◮ Normalized Jeffreys positive centroid \tilde{c}' within approximation factor \frac{1}{w_c}
◮ Bisection search for an arbitrarily fine approximation of \tilde{c}

→ Variational Jeffreys k-means clustering (a minimal sketch follows this slide)

Other Kullback-Leibler symmetrizations:
◮ Jensen-Shannon divergence [9]
◮ Chernoff divergence [5]
◮ Family of symmetrized centroids including Jensen-Shannon and Jeffreys centroids [12]
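A minimal sketch of such a variational Jeffreys k-means (an illustrative assumption-laden implementation, not the paper's algorithm: it reuses the jeffreys and jeffreys_positive_centroid helpers sketched earlier, with uniform in-cluster weights and a fixed iteration budget):

```python
import numpy as np

def jeffreys_kmeans(H, k, iters=50, seed=0):
    """Lloyd-type k-means on positive histograms under the Jeffreys divergence.

    H: (n, d) array of positive histograms. Returns (centroids, labels)."""
    H = np.asarray(H, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = H[rng.choice(len(H), size=k, replace=False)].copy()   # random initialization
    labels = np.zeros(len(H), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest centroid under J.
        labels = np.array([int(np.argmin([jeffreys(h, c) for c in centroids])) for h in H])
        # Update step: closed-form Jeffreys positive centroid of each cluster (Theorem 1).
        for j in range(k):
            members = H[labels == j]
            if len(members):
                weights = np.full(len(members), 1.0 / len(members))
                centroids[j] = jeffreys_positive_centroid(members, weights)
    return centroids, labels
```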
Thank you!

http://www.informationgeometry.org
@Article{JeffreysCentroid-2013,
author = {Frank Nielsen},
title = {Jeffreys centroids: {A} closed-form expression for positive histograms
and a guaranteed tight approximation for frequency histograms},
journal = {IEEE Signal Processing Letters (SPL)},
year = {2013}
}

Bibliographic references

[1] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705-1749, 2005.
[2] D. A. Barry, P. J. Culligan-Hensley, and S. J. Barry. Real values of the W-function. ACM Transactions on Mathematical Software, 21(2):161-171, June 1995.
[3] Brigitte Bigi. Using Kullback-Leibler distance for text categorization. In Proceedings of the 25th European Conference on IR Research (ECIR'03), pages 305-319, Berlin, Heidelberg, 2003. Springer-Verlag.
[4] Vijay Chandrasekhar, Gabriel Takacs, David M. Chen, Sam S. Tsai, Yuriy A. Reznik, Radek Grzeszczuk, and Bernd Girod. Compressed histogram of gradients: A low-bitrate descriptor. International Journal of Computer Vision, 96(3):384-399, 2012.
[5] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493-507, 1952.
[6] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. Workshop on Statistical Learning in Computer Vision (ECCV), pages 1-22, 2004.
[7] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical Report 7694, California Institute of Technology, 2007.
[8] Harold Jeffreys. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, 186(1007):453-461, March 1946.
[9] Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37:145-151, 1991.
[10] Huan Liu and Rudy Setiono. Chi2: Feature selection and discretization of numeric attributes. In Proceedings of the Seventh International Conference on Tools with Artificial Intelligence (TAI), pages 88-, Washington, DC, USA, 1995. IEEE Computer Society.
[11] Max Mignotte. Segmentation by fusion of histogram-based k-means clusters in different color spaces. IEEE Transactions on Image Processing (TIP), 17(5):780-787, 2008.
[12] Frank Nielsen. A family of statistical symmetric divergences based on Jensen's inequality. CoRR, abs/1009.4004, 2010.
[13] Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455-5466, August 2011.
[14] Frank Nielsen and Richard Nock. Sided and symmetrized Bregman centroids. IEEE Transactions on Information Theory, 55(6):2048-2059, June 2009.
[15] Richard Nock, Panu Luosto, and Jyrki Kivinen. Mixed Bregman clustering with approximation guarantees. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, pages 154-169, Berlin, Heidelberg, 2008. Springer-Verlag.
[16] Raymond N. J. Veldhuis. The centroid of the symmetrical Kullback-Leibler distance. IEEE Signal Processing Letters, 9(3):96-99, March 2002.
