http://www.cs.umd.edu/linqs

Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
Matthias Bröcheler and Lise Getoor (NIPS 2010)
Supported by NSF Grant No. 0937094


Problem?
Computing marginal distributions in constrained continuous MRFs (CCMRFs).

Motivation?
CCMRFs have many applications; probabilistic soft logic is one of them.

Contributions?
An analysis of the theoretical and practical aspects of computing marginals in CCMRFs.

What's a CCMRF?
A constrained continuous Markov random field is given by:
- random variables X = \{X_1, .., X_n\} with domains D_i \subset \mathbb{R} and joint domain D = \times_{i=1}^n D_i,
- potential functions \phi = \{\phi_1, .., \phi_m\} with \phi_j : D \to [0, M],
- weights \Lambda = \{\lambda_1, .., \lambda_m\},
- equality constraints A : D \to \mathbb{R}^{k_A}, a \in \mathbb{R}^{k_A}, and inequality constraints B : D \to \mathbb{R}^{k_B}, b \in \mathbb{R}^{k_B}, which restrict the domain to

    \tilde{D} = D \cap \{x \mid A(x) = a \wedge B(x) \le b\}.

The probability measure P over X is defined through the density

    f(x) = \frac{1}{Z(\Lambda)} \exp\Big[-\sum_{j=1}^m \lambda_j \phi_j(x)\Big],  with  f(x) = 0 \ \forall x \notin \tilde{D},

and normalization constant

    Z(\Lambda) = \int_{\tilde{D}} \exp\Big[-\sum_{j=1}^m \lambda_j \phi_j(x)\Big]\, dx.
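To make the definition concrete, here is a minimal Python sketch of a CCMRF with a box domain, linear inequality constraints, and hinge-style potentials. The class and all names are illustrative, not from any released implementation, and equality constraints are omitted for brevity.

```python
import numpy as np

class CCMRF:
    """Hypothetical container for a CCMRF with a box domain and linear
    inequality constraints B x <= b; equality constraints are omitted."""

    def __init__(self, potentials, weights, lower, upper, B=None, b=None):
        self.potentials = potentials          # callables phi_j : R^n -> [0, M]
        self.weights = np.asarray(weights)    # lambda_j
        self.lower = np.asarray(lower)        # box domain D = x_i [lower_i, upper_i]
        self.upper = np.asarray(upper)
        self.B, self.b = B, b                 # inequality constraints B x <= b

    def in_domain(self, x):
        """Membership test for the constrained domain D~."""
        if np.any(x < self.lower) or np.any(x > self.upper):
            return False
        if self.B is not None and np.any(self.B @ x > self.b):
            return False
        return True

    def unnormalized_density(self, x):
        """exp(-sum_j lambda_j phi_j(x)) on D~, zero outside it."""
        if not self.in_domain(x):
            return 0.0
        energy = sum(w * phi(x) for w, phi in zip(self.weights, self.potentials))
        return float(np.exp(-energy))
```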
What does it look like?
A three-variable example: X = \{X_1, X_2, X_3\} with unit-interval domains D_i = [0, 1], potentials

    \phi_1(x) = x_1,  \quad \phi_2(x) = \max(0, x_1 - x_2),  \quad \phi_3(x) = \max(0, x_2 - x_3),

weights \Lambda = \{1, 2, 1\}, and the inequality constraint x_1 + x_3 \le 1.

[Figure: the constrained domain over (X_1, X_2, X_3) with the highest-probability region marked, and the induced marginal density f over X_1; marginal probabilities such as P(0.4 \le X_2 \le 0.6) are areas under such marginal densities.]
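Continuing the sketch above (same imports), a naive self-normalized importance-sampling check of a marginal probability in this example. This brute-force estimator is only viable because the example has three dimensions, which is exactly why the sampling machinery below is needed.

```python
# Monte Carlo check of P(0.4 <= X2 <= 0.6) in the running example.
rng = np.random.default_rng(0)

model = CCMRF(
    potentials=[lambda x: x[0],                      # phi_1(x) = x_1
                lambda x: max(0.0, x[0] - x[1]),     # phi_2(x) = max(0, x_1 - x_2)
                lambda x: max(0.0, x[1] - x[2])],    # phi_3(x) = max(0, x_2 - x_3)
    weights=[1.0, 2.0, 1.0],                         # Lambda = {1, 2, 1}
    lower=[0.0, 0.0, 0.0], upper=[1.0, 1.0, 1.0],
    B=np.array([[1.0, 0.0, 1.0]]), b=np.array([1.0]),  # x_1 + x_3 <= 1
)

# Uniform proposal over the box; the self-normalized importance weights are
# the unnormalized density (zero outside D~), so Z(Lambda) cancels.
xs = rng.uniform(0.0, 1.0, size=(100_000, 3))
w = np.array([model.unnormalized_density(x) for x in xs])
hit = (xs[:, 1] >= 0.4) & (xs[:, 1] <= 0.6)
print("P(0.4 <= X2 <= 0.6) ~", w[hit].sum() / w.sum())
```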
In Theory…
Computing the marginal probability density function

    f_{X'}(x') = \int_{y \in \times_{i : X_i \notin X'} D_i} f(x', y)\, dy

for a subset X' \subset X under the probability measure defined by a CCMRF is #P-hard in the worst case.

Let's approximate!

Hit-and-Run Sampling (Lovász & Vempala '04)
1. Sample a random direction d at the current point p.
2. Compute the line segment through p along d that lies inside \tilde{D}.
3. Induce a density on that line segment from f.
4. Sample the next point q from the induced density (see the code sketch after the complexity bound below).

The complexity of computing an approximate distribution \sigma^* using hit-and-run sampling such that the total variation distance of \sigma^* and P is less than \epsilon is

    O^*\big(\tilde{n}^3 (k_B + \tilde{n} + m)\big),

where \tilde{n} = n - k_A, under the assumptions that we start from an initial distribution \sigma such that the density function d\sigma/dP is bounded by M except on a set S with \sigma(S) \le \epsilon/s.
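A minimal sketch of one hit-and-run step over the CCMRF class above, assuming equality constraints have already been eliminated (step 2 of the practical algorithm below) and approximating steps 3 and 4 with a grid over the chord; the helper names are illustrative.

```python
def _linear_bounds(model, p, d):
    """Yield (coeff, bound) pairs so that feasibility of p + t*d along the
    chord means coeff * t <= bound."""
    for i in range(p.size):
        yield d[i], model.upper[i] - p[i]     # p_i + t*d_i <= upper_i
        yield -d[i], p[i] - model.lower[i]    # p_i + t*d_i >= lower_i
    if model.B is not None:
        for row, bi in zip(model.B, model.b):
            yield row @ d, bi - row @ p       # row . (p + t*d) <= b_i

def hit_and_run_step(model, p, rng, grid=256):
    """One hit-and-run step (steps 1-4 above) from a feasible point p."""
    # 1. Sample a uniformly random direction on the unit sphere.
    d = rng.standard_normal(p.size)
    d /= np.linalg.norm(d)
    # 2. Intersect the line {p + t*d} with the constrained domain D~.
    t_lo, t_hi = -np.inf, np.inf
    for coeff, bound in _linear_bounds(model, p, d):
        if coeff > 0:
            t_hi = min(t_hi, bound / coeff)
        elif coeff < 0:
            t_lo = max(t_lo, bound / coeff)
    # 3./4. Discretize the induced one-dimensional density on the chord and
    # sample from it (the paper induces f on the line in closed form instead).
    ts = np.linspace(t_lo, t_hi, grid)
    w = np.array([model.unnormalized_density(p + t * d) for t in ts])
    t_next = rng.choice(ts, p=w / w.sum())
    return p + t_next * d
```

Iterating hit_and_run_step and histogramming a coordinate of the resulting samples yields the marginal estimates discussed in the convergence analysis.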
In Practice…
Algorithm:
1. Start at the MAP state.
2. Reduce dimensionality via linear algebra: the k_A equality constraints are eliminated, so sampling proceeds in \tilde{n} = n - k_A dimensions.
3. Get out of corners. Near a corner of \tilde{D}, most sampled directions leave only a tiny feasible segment; the corner heuristic reflects the direction d_i on the hyperplane W_k x = z_k of the blocking constraint (see the sketch after this list):

    d_{i+1} = d_i + 2\, \frac{z_k - W_k d_i}{\|W_k\|^2}\, W_k^T

4. Induce f on the line segment efficiently.
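A one-line sketch of that reflection as reconstructed from the poster; W_k is taken to be the row vector of the active constraint with bound z_k, and the names are illustrative.

```python
def corner_reflect(d, W_k, z_k):
    """Corner heuristic, as reconstructed from the poster:
    d_{i+1} = d_i + 2 * (z_k - W_k . d_i) / ||W_k||^2 * W_k."""
    return d + 2.0 * (z_k - W_k @ d) / (W_k @ W_k) * W_k
```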
Why CCMRF?
Probabilistic soft logic (PSL) is a declarative language for collective probabilistic reasoning about similarity or uncertainty in relational domains. PSL focuses on statistical relational learning problems with continuous random variables and supports sets and aggregation. PSL programs are grounded into CCMRFs for inference. For example, a collective document-classification program:

    w1 : class(B,C) ∧ A.text ≈ B.text → class(A,C)
    w2 : class(B,C) ∧ link(A,B) → class(A,C)
    Constraint: functional(class)
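To connect the two formalisms: in PSL's published semantics, soft truth values live in [0, 1], logical operators follow the Lukasiewicz t-norm, and each ground rule contributes its distance to satisfaction as a hinge potential \phi_j. The sketch below shows this for one hypothetical grounding of rule w1; the indices and the observed similarity value are illustrative.

```python
def lukasiewicz_and(a, b):
    """Lukasiewicz t-norm: soft conjunction of truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)

def ground_rule_potential(body_idx, head_idx, sim):
    """One hypothetical grounding of w1: class(B,C) AND A.text~B.text -> class(A,C).
    Its potential is the rule's distance to satisfaction, max(0, I(body) - I(head));
    body_idx/head_idx index truth-value variables in x, and sim is an observed
    similarity truth value (illustrative)."""
    def phi(x):
        body = lukasiewicz_and(x[body_idx], sim)
        return max(0.0, body - x[head_idx])
    return phi
```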
Experimental Results
Setup: collective classification of 1717 Wikipedia articles with 20% seed documents, using tf/idf-weighted cosine similarity as the baseline and comparing against a PSL program with learned weights under K-fold cross-validation.

    Folds | Improvement over baseline | P(null hypothesis) | Relative std. deviation difference Δ(σ)
    ------|---------------------------|--------------------|----------------------------------------
    20    | 41.4%                     | 1.95E-09           | 38.3%
    25    | 31.7%                     | 2.40E-13           | 41.2%
    30    | 39.1%                     | 1.00E-16           | 43.5%
    35    | 46.1%                     | 4.54E-08           | 39.0%

The relative standard-deviation difference is used as an indicator of confidence,

    \Delta(\sigma) = 2\, \frac{\sigma_- - \sigma_+}{\sigma_+ + \sigma_-},

with the hypothesis \Delta(\sigma) \gg 0.

Convergence Analysis
[Figure: KL divergence (0.05 to 5, log scale) against number of samples (30,000 to 3,000,000, log scale), showing the average KL divergence, the lowest-quartile KL divergence (322-413 RVs), and the highest-quartile KL divergence (174-224 RVs), all decreasing as samples accumulate.]
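The convergence study compares sampled marginals against reference marginals via KL divergence. Below is a minimal sketch of such a measurement between two histogram estimates, assuming discretized marginals; the smoothing constant is illustrative.

```python
def kl_divergence(p_counts, q_counts, eps=1e-12):
    """KL(P || Q) between two histogram estimates of a marginal, e.g. the
    running sampler's histogram vs. a long reference run (eps smooths empty bins)."""
    p = p_counts / p_counts.sum()
    q = q_counts / q_counts.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```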