Course:CPSC522/SMC for PGMs

From UBC Wiki
Jump to: navigation, search

Sequential Monte Carlo for Probabilistic Graphical Models

Twisted sequential Monte Carlo (tSMC) is a general framework for adaptive sampling-based inference. In the context of probabilistic graphical models (PGMs), it enables the combination of fast (but biased) deterministic approximations with asymptotically consistent (but expensive) stochastic inference.

Principal Author: Boyan Beronov

Collaborators:

Abstract

Probabilistic graphical models (PGMs) are a broad class of statistical models employed in all branches of science. Their static and explicit independence structure has allowed for the development of several powerful inference algorithms. Since in all but a limited subclass of models, exact inference is computationally intractable, approximate inference algorithms have been crucial for their adoption in applied domains. This article presents a line of work which combines two classes of such algorithms in a mutually beneficial way:

  1. deterministic, biased but efficient approximations to exact inference algorithms which make use of the explicit PGM structure, and
  2. adaptive versions of the stochastic, asymptotically consistent but slow methods from the sequential Monte Carlo (SMC) family.

Builds on

All concepts in this article are built on the fundamental basis of probability theory. The probabilistic model representations employed here are graphical models, with notable subclasses being Markov networks and Bayesian networks. The stochastic approximate inference algorithms rest on the theory of Markov Chain Monte Carlo and Sequential Monte Carlo, see also references therein.

Related Pages

Being a general probabilistic inference method, SMC relates to many topics in the category of Probability and Graphical Models. See the sections below on deterministic and stochastic approximate inference for references to related algorithms.

Background

This section introduces the necessary technical context for the line of research represented by Naesseth et al. (2014) [1] and Lindsten et al. (2018) [2].

Probabilistic Graphical Models

(Un-)Directed Graphical Models

Main articles: Graphical models, Markov networks, Bayesian networks

For example, the joint density of a Bayesian network is defined by construction as:

where is the index set of variables, and is the set of parent variables of variable .

Factor Graphs

Main article: Factor graph

The joint density of any PGM can be equivalently represented as a bipartite factor graph with edges only connecting factors with variables:

Here, is the index set of the factors, and is the set of variables on which the factor depends, hence . As usual, is referred to as the normalization constant or partition function. Estimating , or the model evidence / log-likelihood is important for tasks such as model learning and comparison.

Deterministic Approximate Inference

Main article: Inference in graphical models

There exist exact inference algorithms for many restricted classes of PGMs, including variable elimination, weighted model counting, belief propagation and junction trees. They usually require restricted graph connectivities, discrete random variables with finite support, or conjugate priors. In the general case, though, exact inference is intractable, hence numerous deterministic approximations have been devised and successfully employed in various applications. The following lists several important methods.

Laplace Approximation

Main article: Laplace's method

(Loopy) Belief Propagation

Main article: Belief propagation

Expectation Propagation

Main article: Expectation propagation

Variational Inference

Main article: Variational inference

Stochastic Approximate Inference: Feynman-Kac Models

Importance Sampling

Main article: Importance sampling

When the expectation of a quantity of interest is to be computed under a distribution from which exact sampling is intractable, importance sampling allows the derivation of an unbiased estimator via a proposal distribution on the same space . If the latter admits efficient sampling and (absolute continuity), the likelihood ratio (Radon-Nikodym derivative) exists and can be used to re-weight the samples obtained from :

The last identity simply observes .

Sequential Monte Carlo

Main article: SMC samplers

[3][4][5]

Twisted Sequential Monte Carlo

[6][7][8]

Twisting function :

Feynman-Kac Models

Main articles: Particle filter, Mean field particle methods, Feynman-Kac formula

This section is meant to highlight the commonality between the stochastic inference algorithms presented above – and their numerous variants in literature – in a systematic way. All such methods are special cases of the general class of Feynman-Kac path models. [9] They provide a means for constructing a measure over paths by re-weighting a base measure with a path potential , where is directly obtained from an initial measure and Markov transition kernels , and similarly is defined via a sequence of state potentials :

The usual goal of Feynman-Kac models is then to measure quantities of interest along the flow of corresponding time marginal distributions:
In particular, all quantities above – including target densities, importance weights, resampling and rejuvenation strategies, twisting functions etc. – can be understood as particular choices for approximating the quantities , and in terms of particle populations (discrete measures). This broad class of models also encompasses evolutionary optimization methods, which in turn constitute a highly diverse and active field of study. For example, in the case of genetic algorithms, the transition kernels are called recombination and mutation operators, while the potentials are fitness functions.

From this abstract perspective, the explanation of twisted SMC is very simple:

The target densities are manipulated by the factors in such a way that the effective sample size at each SMC step is improved, while all factors cancel out in the full expression for , i.e., they maintain an unbiased estimator for the final distribution of interest and its normalization constant . The power of this approach stems from the flexibility of choosing the twisting potentials adaptively, i.e., depending on the data and current particle forest.

SMC for PGMs

Using the formalism, data structures and algorithms introduced above, this section summarizes the core contributions of [1] and [2], highlighting the respective incremental improvements.

Basic SMC

Sequential Graph Decomposition

[1] formulates generic PGM inference as an SMC problem, i.e., as the task of sampling from a sequence of increasingly complex probability distributions – where the first distribution is efficiently accessible and the last distribution corresponds to the full target. This is a very common theme in statistical inference, the most widely used techniques of this sort being tempering, annealing and bridging. In the setting of PGMs, this sequence is constructed by ordering the factors according to some criterion and gradually including them into the sequence of unnormalized densities:

Twisted SMC

Twisting functions

[2] extends the above framework by introducing a general twisting mechanism in the setting of PGMs. In particular, the authors derive the optimal twisting function for intermediate target densities to be:

Exactly computing is of course intractable, as it would amount to exact inference. Instead, the authors derive general schemes for adopting well-known deterministic approximations as twisting functions.

Experiments

Both methods were tested on a variety of applications, including the Classical XY and Ising models from statistical mechanics, topic models (Latent Dirichlet Allocation), and a Gaussian Markov random field. In general, the basic SMC method performed comparably to state of the art inference algorithms at the time of publication, and the more recent twisted SMC variant consistently outperformed basic SMC, requiring 1-2 orders of magnitude less particles to achieve similar accuracy in terms of .

Conclusions

Recent years have seen important developments in adaptive sampling-based inference methods within the SMC framework. While these were mostly introduced in the context of state-space models, the publications discussed here have demonstrated the potential of such methods in the much more general class of PGMs. Despite adaptation methods presenting a computational overhead, in many cases this cost can be amortized over subsequent inference steps, opening up the possibility of significant gains in accuracy per computation. Perhaps the most valuable contribution in [2] is the insight that, in combination with traditional deterministic inference approximations, twisted SMC provides an alternative path towards inference amortization, whereas the statistical discourse has been dominated by variational inference methods in recent years.

Annotated Bibliography

  1. 1.0 1.1 1.2 Naesseth, Christian Andersson, Fredrik Lindsten, and Thomas B. Schön. "Sequential Monte Carlo for graphical models." Advances in Neural Information Processing Systems. 2014. http://papers.nips.cc/paper/5570-sequential-monte-carlo-for-graphical-models
  2. 2.0 2.1 2.2 2.3 Lindsten, Fredrik, Jouni Helske, and Matti Vihola. "Graphical model inference: Sequential Monte Carlo meets deterministic approximations." Advances in Neural Information Processing Systems. 2018. http://papers.nips.cc/paper/8041-graphical-model-inference-sequential-monte-carlo-meets-deterministic-approximations
  3. Liu, Jun S., and Rong Chen. "Sequential Monte Carlo methods for dynamic systems." Journal of the American statistical association 93.443 (1998): 1032-1044. https://www.tandfonline.com/doi/abs/10.1080/01621459.1998.10473765
  4. Del Moral, Pierre, Arnaud Doucet, and Ajay Jasra. "Sequential monte carlo samplers." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.3 (2006): 411-436. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2006.00553.x
  5. Doucet, Arnaud, and Adam M. Johansen. "A tutorial on particle filtering and smoothing: Fifteen years later." Handbook of nonlinear filtering 12.656-704 (2009): 3. http://www.warwick.ac.uk/fac/sci/statistics/staff/academic-research/johansen/publications/dj11.pdf
  6. Kappen, Hilbert Johan, and Hans Christian Ruiz. "Adaptive importance sampling for control and inference." Journal of Statistical Physics 162.5 (2016): 1244-1266. https://link.springer.com/article/10.1007/s10955-016-1446-7
  7. Guarniero, Pieralberto, Adam M. Johansen, and Anthony Lee. "The iterated auxiliary particle filter." Journal of the American Statistical Association 112.520 (2017): 1636-1647. https://amstat.tandfonline.com/doi/abs/10.1080/01621459.2016.1222291
  8. Heng, J., Bishop, A. N., Deligiannidis, G., & Doucet, A. (2017). Controlled sequential monte carlo. arXiv preprint arXiv:1708.08396. https://arxiv.org/abs/1708.08396
  9. Moral, P. D. "Feynman-kac formulae: Genealogical and interacting particle systems with applications, Probability and its applications." (2004). https://www.infona.pl/resource/bwmeta1.element.springer-3c891fde-b218-3132-870a-10bb8290626c

To Add

Some rights reserved
Permission is granted to copy, distribute and/or modify this document according to the terms in Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0
By-nc-sa-small-transparent.png