Part 2: Introduction to Graphical Models

Sebastian Nowozin and Christoph H. Lampert

Colorado Springs, 25th June 2011
Introduction

- Model: relating observations x to quantities of interest y.
- Example 1: given an RGB image x, infer a depth y for each pixel.
- Example 2: given an RGB image x, infer the presence and positions y of all objects shown.

[Figure: a mapping f : X → Y, where X is the space of images and Y the space of object annotations.]
Introduction (continued)

- General case: mapping x ∈ X to y ∈ Y.
- Graphical models are a concise language to define this mapping.
- The mapping can be ambiguous: measurement noise, lack of well-posedness (e.g. occlusions).
- Probabilistic graphical models: define p(y|x) or p(x, y) for all y ∈ Y.

[Figure: instead of a single prediction f(x), a probabilistic model assigns a conditional distribution p(Y | X = x) over several candidate outputs in Y.]
Graphical Models

A graphical model defines
- a family of probability distributions over a set of random variables,
- by means of a graph,
- so that the random variables satisfy conditional independence assumptions encoded in the graph.

Popular classes of graphical models:
- Undirected graphical models (Markov random fields),
- Directed graphical models (Bayesian networks),
- Factor graphs,
- Others: chain graphs, influence diagrams, etc.
Bayesian Networks

- Graph: G = (V, E), E ⊂ V × V, directed and acyclic.
- Variable domains Y_i.
- Factorization over distributions, conditioning each variable on its parent nodes:

  p(Y = y) = \prod_{i ∈ V} p(y_i | y_{pa_G(i)})

- Example (a simple Bayes net with edges Y_i → Y_k, Y_j → Y_k, Y_k → Y_l):

  p(Y = y) = p(Y_l = y_l | Y_k = y_k) p(Y_k = y_k | Y_i = y_i, Y_j = y_j) p(Y_i = y_i) p(Y_j = y_j).

- The graph and its factorization define a family of distributions.
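To make the factorization concrete, here is a minimal sketch in Python that evaluates the joint of the simple Bayes net above as a product of local conditionals. The variables are assumed binary and all CPT entries are made up for illustration.

```python
import itertools

# Hypothetical CPTs for the net Y_i -> Y_k <- Y_j, Y_k -> Y_l (binary states).
p_i = {0: 0.6, 1: 0.4}                    # p(Y_i)
p_j = {0: 0.7, 1: 0.3}                    # p(Y_j)
p_k = {(0, 0): {0: 0.9, 1: 0.1},          # p(Y_k | Y_i, Y_j)
       (0, 1): {0: 0.5, 1: 0.5},
       (1, 0): {0: 0.4, 1: 0.6},
       (1, 1): {0: 0.2, 1: 0.8}}
p_l = {0: {0: 0.8, 1: 0.2},               # p(Y_l | Y_k)
       1: {0: 0.3, 1: 0.7}}

def joint(yi, yj, yk, yl):
    """p(Y = y) as the product of local conditionals (the factorization above)."""
    return p_i[yi] * p_j[yj] * p_k[(yi, yj)][yk] * p_l[yk][yl]

# Because each local table is normalized, the joint sums to one without
# any global normalization constant.
total = sum(joint(*y) for y in itertools.product([0, 1], repeat=4))
print(total)  # 1.0 (up to floating point)
```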
Undirected Graphical Models

- Also known as Markov random field (MRF) or Markov network.
- Graph: G = (V, E), E ⊂ V × V, undirected, no self-edges.
- Variable domains Y_i.
- Factorization over potentials ψ at the cliques C(G) of the graph:

  p(y) = (1/Z) \prod_{C ∈ C(G)} ψ_C(y_C)

- Normalizing constant Z = \sum_{y ∈ Y} \prod_{C ∈ C(G)} ψ_C(y_C).
- Example (the simple chain MRF Y_i − Y_j − Y_k):

  p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
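The normalization is what distinguishes MRFs computationally. Below is a minimal sketch, assuming binary variables and made-up potential tables, that computes Z and p(y) for the chain Y_i − Y_j − Y_k by exhaustive enumeration.

```python
import itertools

# Arbitrary positive potential tables for the chain MRF Y_i - Y_j - Y_k.
psi_unary = {"i": {0: 1.0, 1: 2.0},
             "j": {0: 1.0, 1: 1.0},
             "k": {0: 2.0, 1: 1.0}}
psi_pair = {("i", "j"): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
            ("j", "k"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}}

def unnormalized(y):
    """Product of clique potentials for the assignment y = (y_i, y_j, y_k)."""
    yi, yj, yk = y
    value = psi_unary["i"][yi] * psi_unary["j"][yj] * psi_unary["k"][yk]
    return value * psi_pair[("i", "j")][(yi, yj)] * psi_pair[("j", "k")][(yj, yk)]

# Unlike a Bayes net, the product of potentials is not normalized: the
# partition function Z must be computed by summing over all assignments.
Z = sum(unnormalized(y) for y in itertools.product([0, 1], repeat=3))

def p(y):
    return unnormalized(y) / Z

print(Z, p((0, 0, 0)))
```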
Example 1

[Figure: the chain MRF Y_i − Y_j − Y_k.]

- Cliques C(G): the set of vertex sets V′ ⊆ V with E ∩ (V′ × V′) = V′ × V′, i.e. the fully connected vertex subsets.
- Here C(G) = {{i}, {i,j}, {j}, {j,k}, {k}}, so

  p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
Example 2

[Figure: the fully connected graph on Y_i, Y_j, Y_k, Y_l.]

- Here C(G) = 2^V: all subsets of V are cliques, so

  p(y) = (1/Z) \prod_{A ∈ 2^{{i,j,k,l}}} ψ_A(y_A).
Factor Graphs

- Graph: G = (V, F, E), E ⊆ V × F, with
  - variable nodes V,
  - factor nodes F,
  - edges E between variable and factor nodes,
  - scope of a factor: N(F) = {i ∈ V : (i, F) ∈ E}.
- Variable domains Y_i.
- Factorization over potentials ψ at the factors:

  p(y) = (1/Z) \prod_{F ∈ F} ψ_F(y_{N(F)})

- Normalizing constant Z = \sum_{y ∈ Y} \prod_{F ∈ F} ψ_F(y_{N(F)}).

[Figure: a factor graph over Y_i, Y_j, Y_k, Y_l.]
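A factor graph is convenient to represent in code as a list of (scope, table) pairs. The sketch below, in which the variables, scopes, and numbers are all made up for illustration, evaluates the factorization and the constant Z by enumeration.

```python
import itertools

# Each factor is a scope (tuple of variable names, the set N(F)) together
# with a potential table over the joint states of that scope.
domains = {"i": [0, 1], "j": [0, 1], "k": [0, 1], "l": [0, 1]}
factors = [
    (("i", "j"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}),
    (("j", "l"), {(0, 0): 1.5, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.5}),
    (("i", "k"), {(0, 0): 1.2, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.2}),
]
variables = sorted(domains)

def unnormalized(assignment):
    """Product over factors F of psi_F(y_N(F)) for a {variable: state} dict."""
    value = 1.0
    for scope, table in factors:
        value *= table[tuple(assignment[v] for v in scope)]
    return value

# Brute-force partition function: sum over all joint assignments.
Z = sum(unnormalized(dict(zip(variables, states)))
        for states in itertools.product(*(domains[v] for v in variables)))
print(Z)
```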
Why factor graphs?

[Figure: three models over Y_i, Y_j, Y_k, Y_l: the same undirected graph admits different factorizations, and only the factor graphs make the factorization visible.]

- Factor graphs are explicit about the factorization.
- Hence, they are easier to work with.
- They are universal (just like MRFs and Bayesian networks).
Capacity

[Figure: two factor graphs over Y_i, Y_j, Y_k, Y_l whose factorizations define families of different sizes.]

- A factor graph defines a family of distributions.
- Some families are larger than others.
Four remaining pieces

1. Conditional distributions (CRFs)
2. Parameterization
3. Test-time inference
4. Learning the model from training data
Conditional Distributions

- We have discussed p(y); how do we define p(y|x)?
- Potentials become a function of x_{N(F)}.
- The partition function then depends on x.
- This yields conditional random fields (CRFs).
- x is not part of the probability model, i.e. it is not treated as a random variable.
- The unconditional model

  p(y) = (1/Z) \prod_{F ∈ F} ψ_F(y_{N(F)})

  becomes

  p(y|x) = (1/Z(x)) \prod_{F ∈ F} ψ_F(y_{N(F)}; x_{N(F)}).

[Figure: a CRF in which observed variables X_i, X_j enter the factors of the output variables Y_i, Y_j, defining a conditional distribution.]
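To see how the partition function becomes observation dependent, here is a minimal CRF sketch with two binary outputs. The log-linear potentials and weights are made-up stand-ins, not the parameterization discussed later.

```python
import itertools
import math

# Tiny CRF over outputs (y_i, y_j) in {0,1}^2 conditioned on real-valued
# observations x = (x_i, x_j); the weights are arbitrary illustrative numbers.
w_unary, w_pair = 1.5, 0.8

def psi_unary(y, x):
    # Unary potential: prefers y = 1 when the observation x is large.
    return math.exp(w_unary * x) if y == 1 else 1.0

def psi_pair(yi, yj):
    # Pairwise potential: rewards agreement between the two labels.
    return math.exp(w_pair) if yi == yj else 1.0

def unnormalized(y, x):
    (yi, yj), (xi, xj) = y, x
    return psi_unary(yi, xi) * psi_unary(yj, xj) * psi_pair(yi, yj)

def p(y, x):
    # Z(x) must be recomputed for every observation x.
    Zx = sum(unnormalized(yy, x) for yy in itertools.product([0, 1], repeat=2))
    return unnormalized(y, x) / Zx

print(p((1, 1), (2.0, -0.5)))
```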
Potentials and Energy Functions

- For each factor F ∈ F, let Y_F = ×_{i ∈ N(F)} Y_i and define an energy function

  E_F : Y_{N(F)} → R.

- Potentials and energies are related by (assuming ψ_F(y_F) > 0)

  ψ_F(y_F) = exp(−E_F(y_F))   and   E_F(y_F) = −log ψ_F(y_F).

- Then p(y) can be written as

  p(Y = y) = (1/Z) \prod_{F ∈ F} ψ_F(y_F) = (1/Z) exp(−\sum_{F ∈ F} E_F(y_F)).

- Hence, p(y) is completely determined by the energy E(y) = \sum_{F ∈ F} E_F(y_F).
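The relation between potentials and energies is a plain log/exp transform; a tiny sketch with a made-up pairwise energy table:

```python
import math

# E -> psi -> E round trip for a made-up pairwise energy table.
E_F = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}

psi_F = {y: math.exp(-e) for y, e in E_F.items()}      # psi_F = exp(-E_F)
E_back = {y: -math.log(v) for y, v in psi_F.items()}   # E_F = -log(psi_F)

assert all(abs(E_F[y] - E_back[y]) < 1e-12 for y in E_F)
```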
Energy Minimization

  argmax_{y ∈ Y} p(Y = y) = argmax_{y ∈ Y} (1/Z) exp(−\sum_{F ∈ F} E_F(y_F))
                          = argmax_{y ∈ Y} exp(−\sum_{F ∈ F} E_F(y_F))
                          = argmax_{y ∈ Y} −\sum_{F ∈ F} E_F(y_F)
                          = argmin_{y ∈ Y} \sum_{F ∈ F} E_F(y_F)
                          = argmin_{y ∈ Y} E(y).

  (The constant 1/Z > 0 does not change the maximizer, and exp is strictly increasing.)

- Energy minimization can be interpreted as solving for the most likely state of some factor graph model.
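For small models the equivalence can be checked directly. Below is a brute-force sketch with made-up energies on a three-variable chain, where the argmin of E(y) is the MAP state.

```python
import itertools

# Made-up unary and pairwise energies on the chain i - j - k.
E_unary = {"i": {0: 0.2, 1: 0.9}, "j": {0: 0.5, 1: 0.1}, "k": {0: 0.3, 1: 0.3}}
E_pair = {("i", "j"): {(0, 0): 0.0, (0, 1): 0.7, (1, 0): 0.7, (1, 1): 0.0},
          ("j", "k"): {(0, 0): 0.0, (0, 1): 0.7, (1, 0): 0.7, (1, 1): 0.0}}

def energy(y):
    """E(y) = sum of factor energies for y = (y_i, y_j, y_k)."""
    yi, yj, yk = y
    return (E_unary["i"][yi] + E_unary["j"][yj] + E_unary["k"][yk]
            + E_pair[("i", "j")][(yi, yj)] + E_pair[("j", "k")][(yj, yk)])

# Energy minimization by exhaustive search = most likely state of the model.
y_map = min(itertools.product([0, 1], repeat=3), key=energy)
print(y_map, energy(y_map))
```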
Parameterization

- Factor graphs define a family of distributions.
- Parameterization: identifying individual members of the family by parameters w.

[Figure: parameter vectors w index distributions p_{w_1}, p_{w_2}, ... within the family of distributions defined by the factor graph.]
Example: Parameterization

- Image segmentation model.
- Pairwise "Potts" energy function E_F(y_i, y_j; w_1),

  E_F : {0,1} × {0,1} × R → R,

  E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0,
  E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1.

[Figure: grid-structured image segmentation model.]
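The Potts energy is a one-line function; the following is a direct transcription of the table above:

```python
def potts_energy(yi: int, yj: int, w1: float) -> float:
    """Pairwise Potts energy: 0 for equal labels, w1 otherwise."""
    return 0.0 if yi == yj else w1

# For w1 > 0 the factor penalizes label changes between neighboring pixels,
# which encourages smooth segmentations.
assert potts_energy(0, 0, 2.5) == 0.0
assert potts_energy(0, 1, 2.5) == 2.5
```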
Example: Parameterization (cont)

- Image segmentation model.
- Unary energy function E_F(y_i; x, w),

  E_F : {0,1} × X × R^{{0,1}×D} → R,

  E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩,
  E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩.

- Features ψ_F : X → R^D, e.g. image filters.

[Figure: grid-structured image segmentation model.]
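A sketch of the linear unary energy follows; the feature function below is a hypothetical stand-in for the image filters mentioned above, and the weights are made up.

```python
import numpy as np

D = 3
w = {0: np.array([0.5, -0.2, 0.1]),    # w(0), made-up weights
     1: np.array([-0.5, 0.2, 0.3])}    # w(1)

def psi_F(x: np.ndarray) -> np.ndarray:
    # Hypothetical length-D feature vector of the local image content.
    return np.array([x.mean(), x.std(), 1.0])

def unary_energy(y: int, x: np.ndarray) -> float:
    # E_F(y; x, w) = <w(y), psi_F(x)>
    return float(w[y] @ psi_F(x))

patch = np.array([0.1, 0.4, 0.35, 0.2])  # a fake image patch
print(unary_energy(0, patch), unary_energy(1, patch))
```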
Example: Parameterization (cont)

[Figure: grid model with unary energies ⟨w(0), ψ_F(x)⟩, ⟨w(1), ψ_F(x)⟩ at each pixel and the Potts table (0 on the diagonal, w_1 off the diagonal) at each pairwise factor.]

- Total number of parameters: D + D + 1.
- Parameters are shared across factors, but the energies differ because of the different features ψ_F(x).
- General form, linear in w:

  E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩
Making Predictions

- Making predictions: given x ∈ X, predict y ∈ Y.
- How do we measure the quality of a prediction, or of a prediction function f : X → Y?
Loss function

- Define a loss function

  Δ : Y × Y → R_+,

  so that Δ(y, y*) measures the loss incurred by predicting y when y* is true.
- The loss function is application dependent.
Test-time Inference

- Loss function Δ(y, f(x)): correct label y, prediction f(x), with Δ : Y × Y → R.
- True joint distribution d(X, Y) and true conditional d(y|x); model distribution p(y|x).
- Expected loss as the quality of a prediction:

  R_f^Δ(x) = E_{y ∼ d(y|x)} [Δ(y, f(x))]
           = \sum_{y ∈ Y} d(y|x) Δ(y, f(x))
           ≈ E_{y ∼ p(y|x;w)} [Δ(y, f(x))],

  assuming that p(y|x; w) ≈ d(y|x).
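The expected loss can be evaluated exactly when Y is small enough to enumerate. A sketch with a made-up (already normalized) model distribution over two binary variables:

```python
import itertools

Y = list(itertools.product([0, 1], repeat=2))
p_y_given_x = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}  # made up

def expected_loss(y_pred, loss):
    """E_{y ~ p(y|x)} [loss(y, y_pred)] by exhaustive enumeration."""
    return sum(p_y_given_x[y] * loss(y, y_pred) for y in Y)

zero_one = lambda y, yp: 0.0 if y == yp else 1.0
print(expected_loss((0, 0), zero_one))          # 0.5: the mass off (0, 0)

# The best prediction minimizes the expected loss over candidate outputs.
print(min(Y, key=lambda yp: expected_loss(yp, zero_one)))  # (0, 0)
```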
Example 1: 0/1 loss

- Loss 0 iff perfectly predicted, 1 otherwise:

  Δ_{0/1}(y, y*) = I(y ≠ y*) = { 0 if y = y*; 1 otherwise }

- Plugging it in:

  y* := argmin_{y ∈ Y} E_{y' ∼ p(y'|x)} [Δ_{0/1}(y', y)]
      = argmax_{y ∈ Y} p(y|x)
      = argmin_{y ∈ Y} E(y; x).

- Minimizing the expected 0/1 loss → MAP prediction (energy minimization).
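Under the 0/1 loss the expected loss of predicting y is 1 − p(y|x), so the Bayes-optimal prediction is the MAP labeling. A small self-contained check with made-up probabilities:

```python
import itertools

Y = list(itertools.product([0, 1], repeat=2))
p = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}  # made up, normalized

expected_01 = lambda yp: sum(q for y, q in p.items() if y != yp)  # = 1 - p(yp|x)
y_bayes = min(Y, key=expected_01)   # minimize expected 0/1 loss
y_map = max(Y, key=lambda y: p[y])  # maximize p(y|x)
assert y_bayes == y_map == (0, 0)
```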
Example 2: Hamming loss

- Count the number of mislabeled variables:

  Δ_H(y, y*) = (1/|V|) \sum_{i ∈ V} I(y_i ≠ y_i*)

- Plugging it in:

  y* := argmin_{y ∈ Y} E_{y' ∼ p(y'|x)} [Δ_H(y', y)]
      = ( argmax_{y_i ∈ Y_i} p(y_i|x) )_{i ∈ V}

- Minimizing the expected Hamming loss → maximum posterior marginal (MPM, Max-Marg) prediction.
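Because the expected Hamming loss decomposes over variables, the prediction maximizes each posterior marginal separately. A sketch with made-up probabilities, chosen so that the MPM and MAP labelings differ:

```python
# Made-up joint model distribution over two binary variables.
p = {(0, 0): 0.10, (0, 1): 0.35, (1, 0): 0.35, (1, 1): 0.20}

def marginal(i, yi):
    """p(y_i = yi | x): sum the joint over all labelings that agree at i."""
    return sum(q for y, q in p.items() if y[i] == yi)

# Per-variable argmax of the marginals (MPM) versus joint argmax (MAP).
y_mpm = tuple(max([0, 1], key=lambda yi: marginal(i, yi)) for i in range(2))
y_map = max(p, key=p.get)
print(y_mpm, y_map)  # (1, 1) (0, 1): the two predictions disagree here
```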
Example 3: Squared error

- Assume a vector space on Y_i (pixel intensities, optical flow vectors, etc.).
- Sum of squared errors:

  Δ_Q(y, y*) = (1/|V|) \sum_{i ∈ V} ||y_i − y_i*||².

- Plugging it in:

  y* := argmin_{y ∈ Y} E_{y' ∼ p(y'|x)} [Δ_Q(y', y)]
      = ( \sum_{y_i ∈ Y_i} p(y_i|x) · y_i )_{i ∈ V},

  i.e. each component is the mean of that variable's posterior marginal.

- Minimizing the expected squared error → minimum mean squared error (MMSE) prediction.
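The MMSE prediction is the per-variable posterior mean. A sketch with made-up marginals over a small numeric label set, illustrating that the prediction can fall between labels:

```python
# Made-up posterior marginals p(y_i | x) over the numeric labels {0, 1, 2}.
marginals = [
    {0.0: 0.2, 1.0: 0.5, 2.0: 0.3},   # p(y_0 | x)
    {0.0: 0.7, 1.0: 0.2, 2.0: 0.1},   # p(y_1 | x)
]

# Minimizing the expected squared error gives the mean of each marginal.
y_mmse = [sum(prob * yi for yi, prob in m.items()) for m in marginals]
print(y_mmse)  # [1.1, 0.4]: unlike MAP or MPM, not necessarily a valid label
```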
Inference Task: Maximum A Posteriori (MAP) Inference

Definition (Maximum A Posteriori (MAP) Inference)
Given a factor graph, parameterization, and weight vector w, and given the observation x, find

  y* = argmax_{y ∈ Y} p(Y = y | x, w) = argmin_{y ∈ Y} E(y; x, w).
Inference Task: Probabilistic Inference

Definition (Probabilistic Inference)
Given a factor graph, parameterization, and weight vector w, and given the observation x, find

  log Z(x, w) = log \sum_{y ∈ Y} exp(−E(y; x, w)),
  μ_F(y_F) = p(Y_F = y_F | x, w),   ∀F ∈ F, ∀y_F ∈ Y_F.

- This typically includes the variable marginals

  μ_i(y_i) = p(y_i | x, w).
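Both quantities can be computed by brute force for tiny models; the sketch below uses a made-up three-variable chain energy. Real models require the message-passing or sampling algorithms discussed in later parts.

```python
import itertools
import math

# Made-up pairwise energies on the chain i - j - k (unaries omitted).
E_pair = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}

def energy(y):
    yi, yj, yk = y
    return E_pair[(yi, yj)] + E_pair[(yj, yk)]

states = list(itertools.product([0, 1], repeat=3))
log_Z = math.log(sum(math.exp(-energy(y)) for y in states))

def factor_marginal(y_ij):
    """mu_F(y_F) = p(Y_i, Y_j = y_ij) for the first pairwise factor."""
    return sum(math.exp(-energy(y) - log_Z)
               for y in states if (y[0], y[1]) == y_ij)

print(log_Z, factor_marginal((0, 0)))
```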
Example: Man-made structure detection

[Figure: left, input image x; middle, ground-truth labeling on 16-by-16 pixel blocks; right, the factor graph model with observations X_i, unary factors ψ_i^1, ψ_i^2 and pairwise factors ψ_{i,k}^3.]

- Features: gradient and color histograms.
- Model parameters estimated from ≈ 60 training images.
Example: Man-made structure detection (cont)

[Figure, three panels:]
- Left: input image x.
- Middle (probabilistic inference): visualization of the variable marginals p(y_i = "man-made" | x, w).
- Right (MAP inference): joint MAP labeling y* = argmax_{y ∈ Y} p(y | x, w).
Training the Model

What can be learned?
- Model structure: the factors.
- Model variables: observed variables are fixed, but we can add unobserved variables.
- Factor energies: the parameters.
Training: Overview

- Assume a fully observed, independent and identically distributed (iid) sample set

  {(x^n, y^n)}_{n=1,...,N},   (x^n, y^n) ∼ d(X, Y).

- Goal: predict well.
- Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss.
Probabilistic Learning

Problem (Probabilistic Parameter Learning)
Let d(y|x) be the (unknown) conditional distribution of labels for a problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the task of finding a point estimate of the parameter w* that makes p(y|x, w*) closest to d(y|x).

- We will discuss probabilistic parameter learning in detail.
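As a preview, here is a deliberately tiny sketch of probabilistic parameter learning: a one-parameter logistic model (a stand-in, not the tutorial's parameterization) fit to made-up training pairs by maximizing the conditional log-likelihood with a crude grid search.

```python
import math

# Made-up training pairs (x^n, y^n) with x real and y in {-1, +1}.
data = [(0.5, 1), (1.2, 1), (-0.3, -1), (-0.8, -1), (0.1, -1)]

def log_p(y, x, w):
    """log p(y|x, w) for the model p(y|x, w) = exp(w*x*y) / Z(x, w)."""
    log_Zx = math.log(sum(math.exp(w * x * yy) for yy in (-1, 1)))
    return w * x * y - log_Zx

def cond_log_lik(w):
    return sum(log_p(y, x, w) for x, y in data)

# Grid search stands in for the gradient-based training discussed later.
w_star = max((i / 10 for i in range(-50, 51)), key=cond_log_lik)
print(w_star, cond_log_lik(w_star))
```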
Loss-Minimizing Parameter Learning

Problem (Loss-Minimizing Parameter Learning)
Let d(x, y) be the unknown distribution of data and labels, and let Δ : Y × Y → R be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk

  E_{(x,y) ∼ d(x,y)} [Δ(y, f_p(x))]

is as small as possible, where f_p(x) = argmax_{y ∈ Y} p(y|x, w*).

- Requires the loss function at training time.
- Directly learns a prediction function f_p(x).

More Related Content

PDF
Ben Gal
PDF
Tro07 sparse-solutions-talk
PDF
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
PDF
Object Detection with Discrmininatively Trained Part based Models
PDF
MATHEON Center Days: Index determination and structural analysis using Algori...
PDF
Lesson 4: Calculating Limits (Section 21 slides)
PDF
Johan Suykens: "Models from Data: a Unifying Picture"
Ben Gal
Tro07 sparse-solutions-talk
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
Object Detection with Discrmininatively Trained Part based Models
MATHEON Center Days: Index determination and structural analysis using Algori...
Lesson 4: Calculating Limits (Section 21 slides)
Johan Suykens: "Models from Data: a Unifying Picture"

What's hot (19)

PDF
Lesson 12: Linear Approximation
PDF
Elementary Landscape Decomposition of Combinatorial Optimization Problems
PPT
JavaYDL13
PDF
Lesson 25: The Definite Integral
PDF
Robust Shape and Topology Optimization - Northwestern
PDF
Identity Based Encryption
PDF
Elementary Landscape Decomposition of Combinatorial Optimization Problems
PDF
Lecture11
PPTX
JOSA TechTalks - Machine Learning on Graph-Structured Data
PDF
Adaptive Signal and Image Processing
PPTX
PDF
Image formation
PDF
Discussion of Faming Liang's talk
PDF
Kernelization algorithms for graph and other structure modification problems
PDF
Optimal Transport in Imaging Sciences
PDF
Camera parameters
PDF
An Introduction to Optimal Transport
PDF
Bayesian Defect Signal Analysis for Nondestructive Evaluation of Materials
PDF
A type system for the vectorial aspects of the linear-algebraic lambda-calculus
Lesson 12: Linear Approximation
Elementary Landscape Decomposition of Combinatorial Optimization Problems
JavaYDL13
Lesson 25: The Definite Integral
Robust Shape and Topology Optimization - Northwestern
Identity Based Encryption
Elementary Landscape Decomposition of Combinatorial Optimization Problems
Lecture11
JOSA TechTalks - Machine Learning on Graph-Structured Data
Adaptive Signal and Image Processing
Image formation
Discussion of Faming Liang's talk
Kernelization algorithms for graph and other structure modification problems
Optimal Transport in Imaging Sciences
Camera parameters
An Introduction to Optimal Transport
Bayesian Defect Signal Analysis for Nondestructive Evaluation of Materials
A type system for the vectorial aspects of the linear-algebraic lambda-calculus
Ad

Similar to 01 graphical models (20)

PDF
A discussion on sampling graphs to approximate network classification functions
PDF
Physics of Algorithms Talk
PDF
YSC 2013
PDF
從 VAE 走向深度學習新理論
PDF
Stochastic Differentiation
PDF
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
PDF
Bayesian case studies, practical 2
PDF
Slides2 130201091056-phpapp01
PDF
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
PDF
cswiercz-general-presentation
PDF
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
PDF
Chapter 3 projection
PDF
Final presentation2-----------------.pdf
PDF
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
PDF
UCB 2012-02-28
PDF
Slides: A glance at information-geometric signal processing
PDF
Maximum likelihood estimation of regularisation parameters in inverse problem...
PDF
Basics of probability in statistical simulation and stochastic programming
PDF
An introduction to quantum stochastic calculus
PDF
Tuto part2
A discussion on sampling graphs to approximate network classification functions
Physics of Algorithms Talk
YSC 2013
從 VAE 走向深度學習新理論
Stochastic Differentiation
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
Bayesian case studies, practical 2
Slides2 130201091056-phpapp01
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
cswiercz-general-presentation
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Chapter 3 projection
Final presentation2-----------------.pdf
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
UCB 2012-02-28
Slides: A glance at information-geometric signal processing
Maximum likelihood estimation of regularisation parameters in inverse problem...
Basics of probability in statistical simulation and stochastic programming
An introduction to quantum stochastic calculus
Tuto part2
Ad

More from zukun (20)

PDF
My lyn tutorial 2009
PDF
ETHZ CV2012: Tutorial openCV
PDF
ETHZ CV2012: Information
PDF
Siwei lyu: natural image statistics
PDF
Lecture9 camera calibration
PDF
Brunelli 2008: template matching techniques in computer vision
PDF
Modern features-part-4-evaluation
PDF
Modern features-part-3-software
PDF
Modern features-part-2-descriptors
PDF
Modern features-part-1-detectors
PDF
Modern features-part-0-intro
PDF
Lecture 02 internet video search
PDF
Lecture 01 internet video search
PDF
Lecture 03 internet video search
PDF
Icml2012 tutorial representation_learning
PPT
Advances in discrete energy minimisation for computer vision
PDF
Gephi tutorial: quick start
PDF
EM algorithm and its application in probabilistic latent semantic analysis
PDF
Object recognition with pictorial structures
PDF
Iccv2011 learning spatiotemporal graphs of human activities
My lyn tutorial 2009
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Information
Siwei lyu: natural image statistics
Lecture9 camera calibration
Brunelli 2008: template matching techniques in computer vision
Modern features-part-4-evaluation
Modern features-part-3-software
Modern features-part-2-descriptors
Modern features-part-1-detectors
Modern features-part-0-intro
Lecture 02 internet video search
Lecture 01 internet video search
Lecture 03 internet video search
Icml2012 tutorial representation_learning
Advances in discrete energy minimisation for computer vision
Gephi tutorial: quick start
EM algorithm and its application in probabilistic latent semantic analysis
Object recognition with pictorial structures
Iccv2011 learning spatiotemporal graphs of human activities

Recently uploaded (20)

PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PPTX
SOPHOS-XG Firewall Administrator PPT.pptx
PPTX
1. Introduction to Computer Programming.pptx
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PPTX
A Presentation on Artificial Intelligence
PDF
Getting Started with Data Integration: FME Form 101
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
gpt5_lecture_notes_comprehensive_20250812015547.pdf
PDF
Empathic Computing: Creating Shared Understanding
PPTX
Spectroscopy.pptx food analysis technology
PDF
Machine learning based COVID-19 study performance prediction
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PPT
Teaching material agriculture food technology
PPTX
Big Data Technologies - Introduction.pptx
PDF
cuic standard and advanced reporting.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Approach and Philosophy of On baking technology
Group 1 Presentation -Planning and Decision Making .pptx
SOPHOS-XG Firewall Administrator PPT.pptx
1. Introduction to Computer Programming.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
20250228 LYD VKU AI Blended-Learning.pptx
A Presentation on Artificial Intelligence
Getting Started with Data Integration: FME Form 101
Encapsulation_ Review paper, used for researhc scholars
Advanced methodologies resolving dimensionality complications for autism neur...
gpt5_lecture_notes_comprehensive_20250812015547.pdf
Empathic Computing: Creating Shared Understanding
Spectroscopy.pptx food analysis technology
Machine learning based COVID-19 study performance prediction
Network Security Unit 5.pdf for BCA BBA.
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Teaching material agriculture food technology
Big Data Technologies - Introduction.pptx
cuic standard and advanced reporting.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
Approach and Philosophy of On baking technology

01 graphical models

  • 1. Graphical Models Factor Graphs Test-time Inference Training Part 2: Introduction to Graphical Models Sebastian Nowozin and Christoph H. Lampert Colorado Springs, 25th June 2011 Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 2. Graphical Models Factor Graphs Test-time Inference Training Graphical Models Introduction Model: relating observations x to quantities of interest y f Example 1: given RGB image x, infer depth y for each pixel Example 2: given RGB image x, infer X Y presence and positions y of all objects f :X →Y shown Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 3. Graphical Models Factor Graphs Test-time Inference Training Graphical Models Introduction Model: relating observations x to quantities of interest y f Example 1: given RGB image x, infer depth y for each pixel Example 2: given RGB image x, infer X Y presence and positions y of all objects f :X →Y shown X : image, Y: object annotations Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 4. Graphical Models Factor Graphs Test-time Inference Training Graphical Models Introduction General case: mapping x ∈ X to y ∈ Y Graphical models are a concise language to define this mapping x Mapping can be ambiguous: f (x) measurement noise, lack of X Y well-posedness (e.g. occlusions) f :X →Y Probabilistic graphical models: define form p(y |x) or p(x, y ) for all y ∈ Y Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 5. Graphical Models Factor Graphs Test-time Inference Training Graphical Models Introduction General case: mapping x ∈ X to y ∈ Y Graphical models are a concise ? language to define this mapping x Mapping can be ambiguous: ? measurement noise, lack of X Y well-posedness (e.g. occlusions) p(Y |X = x) Probabilistic graphical models: define form p(y |x) or p(x, y ) for all y ∈ Y Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 6–7. Graphical Models
    A graphical model defines a family of probability distributions over a set of random variables, by means of a graph, so that the random variables satisfy the conditional independence assumptions encoded in the graph.
    Popular classes of graphical models:
    - Undirected graphical models (Markov random fields)
    - Directed graphical models (Bayesian networks)
    - Factor graphs
    - Others: chain graphs, influence diagrams, etc.
  • 8–9. Bayesian Networks
    Graph: G = (V, E), E ⊂ V × V, directed acyclic
    Variable domains Y_i
    Factorization over distributions, by conditioning on parent nodes:
        p(Y = y) = ∏_{i ∈ V} p(y_i | y_{pa_G(i)})
    Example (figure: a simple Bayes net with edges Y_i → Y_k, Y_j → Y_k, Y_k → Y_l):
        p(Y = y) = p(Y_l = y_l | Y_k = y_k) p(Y_k = y_k | Y_i = y_i, Y_j = y_j) p(Y_i = y_i) p(Y_j = y_j)
    The factorization defines a family of distributions.
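
To make the factorization concrete, here is a minimal Python sketch (an assumed example with made-up CPT numbers, not taken from the slides) that evaluates the joint of the four-variable network above:

    # Bayes net p(y) = p(yi) p(yj) p(yk|yi,yj) p(yl|yk), binary variables.
    from itertools import product

    p_i = {0: 0.6, 1: 0.4}                                      # p(Yi)
    p_j = {0: 0.7, 1: 0.3}                                      # p(Yj)
    p_k = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
           (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}  # p(Yk|Yi,Yj)
    p_l = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}            # p(Yl|Yk)

    def joint(yi, yj, yk, yl):
        # Product of the local conditionals -- the Bayes-net factorization.
        return p_i[yi] * p_j[yj] * p_k[(yi, yj)][yk] * p_l[yk][yl]

    # The joint sums to one over all 2^4 configurations, no explicit Z needed.
    assert abs(sum(joint(*y) for y in product([0, 1], repeat=4)) - 1.0) < 1e-12
    print(joint(1, 0, 1, 1))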
  • 10–11. Undirected Graphical Models
    = Markov random field (MRF) = Markov network
    Graph: G = (V, E), E ⊂ V × V, undirected, no self-edges
    Variable domains Y_i
    Factorization over potentials ψ at the cliques:
        p(y) = (1/Z) ∏_{C ∈ C(G)} ψ_C(y_C)
    Normalizing constant Z = Σ_{y ∈ Y} ∏_{C ∈ C(G)} ψ_C(y_C)
    Example (figure: a simple chain MRF over Y_i, Y_j, Y_k):
        p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
  • 12. Example 1
    Cliques C(G): the vertex sets V′ ⊆ V that are fully connected, i.e. E ∩ (V′ × V′) contains every pair of distinct vertices of V′
    For the chain over Y_i, Y_j, Y_k: C(G) = {{i}, {i,j}, {j}, {j,k}, {k}}
        p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
  • 13. Example 2
    Fully connected graph over Y_i, Y_j, Y_k, Y_l: here C(G) = 2^V, all subsets of V are cliques
        p(y) = (1/Z) ∏_{A ∈ 2^{{i,j,k,l}}} ψ_A(y_A)
  • 14–15. Factor Graphs
    Graph: G = (V, F, E), E ⊆ V × F, with variable nodes V, factor nodes F, and edges E between variable and factor nodes
    Scope of a factor: N(F) = {i ∈ V : (i, F) ∈ E}
    Variable domains Y_i
    Factorization over potentials ψ at the factors:
        p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)})
    Normalizing constant Z = Σ_{y ∈ Y} ∏_{F ∈ F} ψ_F(y_{N(F)})
    (figure: a factor graph over Y_i, Y_j, Y_k, Y_l)
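
The factor-graph factorization is easy to state in code. A minimal Python sketch (an assumed example, not from the slides): factors are (scope, potential) pairs over binary variables, and Z is computed by brute-force enumeration, which is feasible only for tiny models:

    from itertools import product

    variables = ["i", "j", "k", "l"]
    # Each factor: (scope, potential psi_F evaluated on the scoped values).
    factors = [
        (("i", "j"), lambda yi, yj: 2.0 if yi == yj else 1.0),
        (("j", "k"), lambda yj, yk: 2.0 if yj == yk else 1.0),
        (("k", "l"), lambda yk, yl: 2.0 if yk == yl else 1.0),
    ]

    def unnormalized(y):  # y: dict mapping variable name -> value
        p = 1.0
        for scope, psi in factors:
            p *= psi(*(y[v] for v in scope))
        return p

    Z = sum(unnormalized(dict(zip(variables, ys)))
            for ys in product([0, 1], repeat=len(variables)))

    def p(y):
        return unnormalized(y) / Z

    print(p({"i": 0, "j": 0, "k": 0, "l": 0}))  # a smooth, high-probability labeling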
  • 16. Why factor graphs?
    (figure: three model representations over Y_i, Y_j, Y_k, Y_l, ending with the factor graph)
    Factor graphs are explicit about the factorization
    Hence, easier to work with
    Universal (just like MRFs and Bayesian networks)
  • 17. Capacity
    (figure: two factor graphs over Y_i, Y_j, Y_k, Y_l with different sets of factors)
    A factor graph defines a family of distributions
    Some families are larger than others
  • 18–19. Four remaining pieces
    1. Conditional distributions (CRFs)
    2. Parameterization
    3. Test-time inference
    4. Learning the model from training data
  • 20–22. Conditional Distributions
    We have discussed p(y); how do we define p(y|x)?
    The potentials become functions of x_{N(F)}, and the partition function depends on x
    Conditional random fields (CRFs): x is not part of the probability model, i.e. it is not treated as a random variable
        p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)})
        p(y|x) = (1/Z(x)) ∏_{F ∈ F} ψ_F(y_{N(F)}; x_{N(F)})
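
A minimal Python sketch of the change (an assumed example): the potentials now also read the observation x, so the partition function Z(x) must be recomputed for every input:

    from itertools import product
    import math

    def unary(yi, xi, w=1.5):
        # Data-dependent potential: pulls y_i towards the sign of feature x_i.
        return math.exp(w * xi if yi == 1 else -w * xi)

    def pairwise(yi, yj, w=0.8):
        # Data-independent smoothness potential.
        return math.exp(w if yi == yj else -w)

    def p_cond(y, x):
        # p(y|x) for a 3-variable chain CRF, normalized by brute force.
        def score(yy):
            s = unary(yy[0], x[0]) * unary(yy[1], x[1]) * unary(yy[2], x[2])
            return s * pairwise(yy[0], yy[1]) * pairwise(yy[1], yy[2])
        Z_x = sum(score(yy) for yy in product([0, 1], repeat=3))
        return score(y) / Z_x

    print(p_cond((1, 1, 0), x=(0.9, 0.2, -1.1)))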
  • 23–25. Potentials and Energy Functions
    For each factor F ∈ F: Y_F = ×_{i ∈ N(F)} Y_i and an energy E_F : Y_{N(F)} → R
    Potentials and energies (assume ψ_F(y_F) > 0):
        ψ_F(y_F) = exp(−E_F(y_F))  and  E_F(y_F) = −log ψ_F(y_F)
    Then p(y) can be written as
        p(Y = y) = (1/Z) ∏_{F ∈ F} ψ_F(y_F) = (1/Z) exp(−Σ_{F ∈ F} E_F(y_F))
    Hence, p(y) is completely determined by the energy function E(y) = Σ_{F ∈ F} E_F(y_F)
  • 26. Energy Minimization
        argmax_{y ∈ Y} p(Y = y) = argmax_{y ∈ Y} (1/Z) exp(−Σ_{F ∈ F} E_F(y_F))
                                = argmax_{y ∈ Y} exp(−Σ_{F ∈ F} E_F(y_F))
                                = argmax_{y ∈ Y} −Σ_{F ∈ F} E_F(y_F)
                                = argmin_{y ∈ Y} Σ_{F ∈ F} E_F(y_F)
                                = argmin_{y ∈ Y} E(y)
    Energy minimization can be interpreted as solving for the most likely state of some factor graph model
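
The chain of identities above says that MAP prediction is energy minimization. A minimal Python sketch (an assumed example with made-up energies): brute-force minimization over a tiny chain, feasible because Y is small:

    from itertools import product

    def E(y):
        # Unary terms prefer label 1 at the two ends; pairwise terms
        # penalize disagreeing neighbours (made-up numbers).
        unary = sum(-1.0 for yi in (y[0], y[2]) if yi == 1)
        pair = sum(0.5 for a, b in ((0, 1), (1, 2)) if y[a] != y[b])
        return unary + pair

    y_map = min(product([0, 1], repeat=3), key=E)
    print(y_map, E(y_map))  # the minimizer of E is also the argmax of p(y)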
  • 27–28. Parameterization
    Factor graphs define a family of distributions
    Parameterization: identifying individual members of the family by parameters w
    (figure: the family of distributions, with individual members p_{w1}, p_{w2} indexed by w)
  • 29. Example: Parameterization
    Image segmentation model
    Pairwise "Potts" energy function E_F(y_i, y_j; w_1), with E_F : {0,1} × {0,1} × R → R:
        E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0
        E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1
  • 30. Example: Parameterization (cont)
    Image segmentation model
    Unary energy function E_F(y_i; x, w), with E_F : {0,1} × X × R^({0,1}×D) → R:
        E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩
        E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩
    Features ψ_F : X → R^D, e.g. image filters
  • 31–32. Example: Parameterization (cont)
    (figure: grid model with unary energies ⟨w(0), ψ_F(x)⟩ and ⟨w(1), ψ_F(x)⟩ at each node, and the Potts table [0, w_1; w_1, 0] on each edge)
    Total number of parameters: D + D + 1
    Parameters are shared between factors, but the energies differ because of the different ψ_F(x)
    General form, linear in w: E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩
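
A minimal Python sketch of this parameterization (an assumed example: a 1-D "segmentation" with D = 2 features per pixel, shared unary weights w(0), w(1) and one Potts weight, so 2D + 1 parameters in total):

    import numpy as np
    from itertools import product

    D = 2
    w_unary = np.array([[1.0, 0.1],     # w(0): weight vector for label 0
                        [-1.0, -0.1]])  # w(1): weight vector for label 1
    w_pair = 0.7                        # Potts weight w_1

    def features(x):
        # psi_F(x): per-pixel features, here the intensity and a bias term.
        return np.stack([x, np.ones_like(x)], axis=1)  # shape (n_pixels, D)

    def energy(y, x):
        psi = features(x)
        e = sum(w_unary[y[i]] @ psi[i] for i in range(len(y)))           # unaries
        e += sum(w_pair for i in range(len(y) - 1) if y[i] != y[i + 1])  # Potts
        return e

    x = np.array([0.9, 0.8, -0.7, -1.0])  # made-up pixel intensities
    y_map = min(product([0, 1], repeat=len(x)), key=lambda y: energy(y, x))
    print(y_map)  # bright pixels take label 1, dark pixels label 0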
  • 33. Making Predictions
    Making predictions: given x ∈ X, predict y ∈ Y (or learn a function f : X → Y)
    How do we measure the quality of a prediction?
  • 34. Loss function
    Define a loss function ∆ : Y × Y → R+ so that ∆(y, y*) measures the loss incurred by predicting y when y* is true.
    The loss function is application dependent
  • 35–37. Test-time Inference
    Loss function ∆(y, f(x)): correct label y, prediction f(x), with ∆ : Y × Y → R
    True joint distribution d(X, Y) with true conditional d(y|x); model distribution p(y|x)
    Expected loss as the quality of a prediction f:
        R_∆^f(x) = E_{y ∼ d(y|x)} [∆(y, f(x))] = Σ_{y ∈ Y} d(y|x) ∆(y, f(x))
                 ≈ E_{y ∼ p(y|x;w)} [∆(y, f(x))]
    assuming that p(y|x; w) ≈ d(y|x)
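
The expected loss also tells us which prediction to make: choose the y that minimizes it under the model. A minimal Python sketch (an assumed example with a made-up posterior), enumerating both the candidate predictions and the posterior:

    from itertools import product

    # Made-up posterior p(y|x) over three binary variables (normalized below).
    posterior = {y: 1.0 for y in product([0, 1], repeat=3)}
    posterior[(1, 1, 0)] = 5.0
    posterior[(1, 1, 1)] = 4.0
    Z = sum(posterior.values())
    posterior = {y: v / Z for y, v in posterior.items()}

    def hamming(y, y_star):
        return sum(a != b for a, b in zip(y, y_star)) / len(y)

    def expected_loss(y_pred, loss):
        return sum(p * loss(y_true, y_pred) for y_true, p in posterior.items())

    y_best = min(posterior, key=lambda y: expected_loss(y, hamming))
    print(y_best, expected_loss(y_best, hamming))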
  • 38–39. Example 1: 0/1 loss
    Loss 0 iff perfectly predicted, 1 otherwise:
        ∆_{0/1}(y, y*) = I(y ≠ y*) = 0 if y = y*, and 1 otherwise
    Plugging it in:
        y* := argmin_{y ∈ Y} E_{y′ ∼ p(y′|x)} [∆_{0/1}(y′, y)] = argmax_{y ∈ Y} p(y|x) = argmin_{y ∈ Y} E(y, x)
    Minimizing the expected 0/1 loss → MAP prediction (energy minimization)
  • 40–41. Example 2: Hamming loss
    Count the fraction of mislabeled variables:
        ∆_H(y, y*) = (1/|V|) Σ_{i ∈ V} I(y_i ≠ y_i*)
    Plugging it in:
        y* := argmin_{y ∈ Y} E_{y′ ∼ p(y′|x)} [∆_H(y′, y)], with components y_i* = argmax_{y_i ∈ Y_i} p(y_i|x) for each i ∈ V
    Minimizing the expected Hamming loss → maximum posterior marginal (MPM, Max-Marg) prediction
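
MPM can differ from MAP: it maximizes each variable marginal p(y_i|x) separately rather than the joint posterior. A minimal Python sketch (an assumed example with a made-up posterior) where the two predictions disagree:

    posterior = {(0, 0): 0.4, (1, 1): 0.3, (1, 0): 0.3, (0, 1): 0.0}

    y_map = max(posterior, key=posterior.get)  # most probable joint labeling

    def marginal(i, v):
        return sum(p for y, p in posterior.items() if y[i] == v)

    y_mpm = tuple(max((0, 1), key=lambda v: marginal(i, v)) for i in range(2))
    print("MAP:", y_map)  # (0, 0)
    print("MPM:", y_mpm)  # (1, 0), since p(y0=1) = 0.6 and p(y1=0) = 0.7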
  • 42–43. Example 3: Squared error
    Assume a vector space on Y_i (pixel intensities, optical flow vectors, etc.). Sum of squared errors:
        ∆_Q(y, y*) = (1/|V|) Σ_{i ∈ V} ‖y_i − y_i*‖²
    Plugging it in:
        y* := argmin_{y ∈ Y} E_{y′ ∼ p(y′|x)} [∆_Q(y′, y)], with components y_i* = Σ_{y_i ∈ Y_i} p(y_i|x) y_i
    Minimizing the expected squared error → minimum mean squared error (MMSE) prediction
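
The MMSE prediction is just the per-variable posterior mean. A minimal Python sketch (an assumed example with a made-up marginal over quantized intensities):

    marginal = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}  # p(y_i|x) over Y_i = {0,1,2,3}
    y_mmse = sum(p * v for v, p in marginal.items())
    print(y_mmse)  # 1.9 -- the posterior mean need not be an element of Y_i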
  • 44. Inference Task: Maximum A Posteriori (MAP) Inference
    Definition (Maximum A Posteriori (MAP) Inference). Given a factor graph, a parameterization, a weight vector w, and an observation x, find
        y* = argmax_{y ∈ Y} p(Y = y | x, w) = argmin_{y ∈ Y} E(y; x, w)
  • 45. Inference Task: Probabilistic Inference
    Definition (Probabilistic Inference). Given a factor graph, a parameterization, a weight vector w, and an observation x, find
        log Z(x, w) = log Σ_{y ∈ Y} exp(−E(y; x, w)),
        µ_F(y_F) = p(Y_F = y_F | x, w), for all F ∈ F and all y_F ∈ Y_F
    This typically includes the variable marginals µ_i(y_i) = p(y_i | x, w)
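
A minimal Python sketch of both quantities (an assumed example, computed by brute-force enumeration; realistic models need message passing or approximate inference instead):

    from itertools import product
    import math

    def E(y):
        # Pairwise energies on a 3-variable chain (made-up numbers).
        return sum(0.8 for a, b in ((0, 1), (1, 2)) if y[a] != y[b])

    configs = list(product([0, 1], repeat=3))
    Z = sum(math.exp(-E(y)) for y in configs)
    log_Z = math.log(Z)

    # Factor marginal mu_F(y_F) for the pairwise factor on variables (0, 1).
    mu = {y01: sum(math.exp(-E(y)) for y in configs if y[:2] == y01) / Z
          for y01 in product([0, 1], repeat=2)}

    print(log_Z, mu)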
  • 46. Example: Man-made structure detection
    (figure: left, input image x; middle, ground truth labeling on 16-by-16 pixel blocks; right, factor graph model with factors ψ_i^1 and ψ_i^2 connecting X_i and Y_i and pairwise factors ψ_{i,k}^3 between neighbouring Y_i, Y_k)
    Features: gradient and color histograms
    Model parameters estimated from ≈ 60 training images
  • 47. Example: Man-made structure detection (cont)
    Left: input image x
    Middle (probabilistic inference): visualization of the variable marginals p(y_i = “manmade” | x, w)
    Right (MAP inference): joint MAP labeling y* = argmax_{y ∈ Y} p(y | x, w)
  • 48–49. Training the Model
    What can be learned?
    - Model structure: the factors
    - Model variables: the observed variables are fixed, but we can add unobserved variables
    - Factor energies: the parameters
  • 50. Training: Overview
    Assume a fully observed, independent and identically distributed (iid) sample set {(x^n, y^n)}_{n=1,…,N} with (x^n, y^n) ∼ d(X, Y)
    Goal: predict well
    Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss
  • 51–52. Probabilistic Learning
    Problem (Probabilistic Parameter Learning). Let d(y|x) be the (unknown) conditional distribution of labels for the problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the task of finding a point estimate of the parameter w* that makes p(y|x, w*) closest to d(y|x).
    We will discuss probabilistic parameter learning in detail.
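
One standard way to make "closest" precise is maximum conditional likelihood, which minimizes the empirical KL divergence from d(y|x) to p(y|x, w). A minimal Python sketch (an assumed example, not the training procedure discussed later): a single scalar weight of a two-variable CRF is fit by grid search on the negative log-likelihood, with Z(x) computed by enumeration:

    from itertools import product
    import math

    # Made-up training pairs: x holds one feature per variable, y the labels.
    data = [((0.8, 0.9), (1, 1)), ((-0.7, 0.6), (0, 1)),
            ((-0.9, -0.8), (0, 0)), ((0.5, -0.4), (0, 0))]  # last pair is noisy

    def E(y, x, w):
        e = sum(-w * (2 * y[i] - 1) * x[i] for i in range(2))  # unary terms
        e += 0.5 if y[0] != y[1] else 0.0                      # fixed Potts term
        return e

    def nll(w):
        total = 0.0
        for x, y in data:
            Zx = sum(math.exp(-E(yy, x, w)) for yy in product([0, 1], repeat=2))
            total += E(y, x, w) + math.log(Zx)  # -log p(y|x, w)
        return total

    w_star = min((k / 10 for k in range(51)), key=nll)  # grid search on [0, 5]
    print(w_star, nll(w_star))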
  • 53–54. Loss-Minimizing Parameter Learning
    Problem (Loss-Minimizing Parameter Learning). Let d(x, y) be the (unknown) distribution of data and labels, and let ∆ : Y × Y → R be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk
        E_{(x,y) ∼ d(x,y)} [∆(y, f_p(x))]
    is as small as possible, where f_p(x) = argmax_{y ∈ Y} p(y|x, w*).
    Requires the loss function at training time
    Directly learns a prediction function f_p(x)