# Course:CPSC522/SMC for PGMs

## Sequential Monte Carlo for Probabilistic Graphical Models

Twisted sequential Monte Carlo (tSMC) is a general framework for adaptive sampling-based inference. In the context of probabilistic graphical models (PGMs), it enables the combination of fast (but biased) deterministic approximations with asymptotically consistent (but expensive) stochastic inference.

Principal Author: Boyan Beronov

Collaborators:

## Abstract

Probabilistic graphical models (PGMs) are a broad class of statistical models employed in all branches of science. Their static and explicit independence structure has allowed for the development of several powerful inference algorithms. Since exact inference is computationally intractable in all but a limited subclass of models, approximate inference algorithms have been crucial for their adoption in applied domains. This article presents a line of work which combines two classes of such algorithms in a mutually beneficial way:

1. deterministic, biased but efficient approximations to exact inference algorithms which make use of the explicit PGM structure, and
2. adaptive versions of the stochastic, asymptotically consistent but slow methods from the sequential Monte Carlo (SMC) family.

### Builds on

All concepts in this article are built on the fundamental basis of probability theory. The probabilistic model representations employed here are graphical models, with notable subclasses being Markov networks and Bayesian networks. The stochastic approximate inference algorithms rest on the theory of Markov chain Monte Carlo and sequential Monte Carlo; see also the references therein.

### Related Pages

Being a general probabilistic inference method, SMC relates to many topics in the category of Probability and Graphical Models. See the sections below on deterministic and stochastic approximate inference for references to related algorithms.

## Background

This section introduces the necessary technical context for the line of research represented by Naesseth et al. (2014) [1] and Lindsten et al. (2018) [2].

### Probabilistic Graphical Models

#### (Un-)Directed Graphical Models

Main articles: Graphical models, Markov networks, Bayesian networks

For example, the joint density ${\displaystyle \pi }$ of a Bayesian network is defined by construction as:

${\displaystyle \pi (x_{1:T})=\prod _{t\,\in \,{\mathcal {V}}}p\left(x_{t}\mid x_{\,\operatorname {Pa} _{t}}\right)\;,}$
where ${\displaystyle {\mathcal {V}}=\left\{1,\dots ,T\right\}}$ is the index set of variables, and ${\displaystyle x_{\,\operatorname {Pa} _{j}}=\left\{x_{t}\mid t\,\in \,\operatorname {Pa} _{j}\right\}}$ is the set of parent variables of variable ${\displaystyle j}$.
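As a sketch of this factorization, the following evaluates the joint density of a hypothetical two-variable Bayesian network ${\displaystyle x_{1}\rightarrow x_{2}}$ with binary states; the conditional probability tables are invented purely for illustration.

```python
# A toy Bayesian network x1 -> x2 with binary states (invented conditional
# probability tables), illustrating the factorized joint density.
p_x1 = {0: 0.6, 1: 0.4}                        # p(x1), the root's prior
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3},          # p(x2 | x1 = 0)
                 1: {0: 0.2, 1: 0.8}}          # p(x2 | x1 = 1)

def joint(x1, x2):
    """pi(x_{1:2}) = p(x1) * p(x2 | x1): the product over the variable set."""
    return p_x1[x1] * p_x2_given_x1[x1][x2]

# A Bayesian network's joint density is normalized by construction
total = sum(joint(a, b) for a in (0, 1) for b in (0, 1))
```

Note that no partition function appears: the product of locally normalized conditionals sums to one automatically, in contrast to the factor graphs below.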

#### Factor Graphs

Main article: Factor graph

The joint density ${\displaystyle \pi }$ of any PGM can be equivalently represented as a bipartite factor graph, whose edges connect only factors and variables:

${\displaystyle \pi (x_{1:T})={\frac {1}{Z}}\,\prod _{j\,\in \,{\mathcal {F}}}f_{j}(x_{{\mathcal {V}}_{j}})\qquad \qquad Z=\int \prod _{j\,\in \,{\mathcal {F}}}f_{j}(x_{{\mathcal {V}}_{j}})\operatorname {d} \!x_{1:T}\;.}$
Here, ${\displaystyle {\mathcal {F}}}$ is the index set of the factors, and ${\displaystyle x_{{\mathcal {V}}_{j}}=\left\{x_{t}\mid t\,\in \,{\mathcal {V}}_{j}\right\}}$ is the set of variables on which the factor ${\displaystyle f_{j}:X_{{\mathcal {V}}_{j}}\rightarrow \mathbb {R} }$ depends, hence ${\displaystyle {\mathcal {V}}=\bigcup _{j\,\in \,{\mathcal {F}}}{\mathcal {V}}_{j}}$. As usual, ${\displaystyle Z}$ is referred to as the normalization constant or partition function. Estimating ${\displaystyle Z}$, or equivalently the model evidence / log-likelihood ${\displaystyle \log(Z)}$, is important for tasks such as model learning and comparison.
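For small discrete models, the definitions above can be made concrete by computing ${\displaystyle Z}$ via brute-force enumeration. The sketch below uses a hypothetical two-factor chain over three binary variables; the factor tables are invented.

```python
from itertools import product

# A toy factor graph (invented tables): two pairwise factors on three binary
# variables. Z is computed by brute-force enumeration of all configurations.
def f1(x1, x2):
    return 2.0 if x1 == x2 else 1.0            # factor over variables {1, 2}

def f2(x2, x3):
    return 3.0 if x2 == x3 else 0.5            # factor over variables {2, 3}

# Partition function: sum the product of all factors over all configurations
Z = sum(f1(a, b) * f2(b, c) for a, b, c in product((0, 1), repeat=3))

def pi(x1, x2, x3):
    """Normalized joint density pi(x_{1:3}) = f1 * f2 / Z."""
    return f1(x1, x2) * f2(x2, x3) / Z
```

The enumeration visits ${\displaystyle 2^{3}}$ configurations; the exponential growth of this sum in the number of variables is exactly why approximate inference is needed.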

### Deterministic Approximate Inference

Main article: Inference in graphical models

For many restricted classes of PGMs there exist exact inference algorithms, including variable elimination, weighted model counting, belief propagation and the junction tree algorithm. These usually require restricted graph connectivity, discrete random variables with finite support, or conjugate priors. In the general case, though, exact inference is intractable, hence numerous deterministic approximations have been devised and successfully employed in various applications. The following lists several important methods.
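As a concrete instance of one such exact method, the following sketch performs variable elimination on a hypothetical chain-structured factor graph, replacing one exponential sum with a sequence of small local sums; the factor tables are invented.

```python
# A toy variable elimination run (invented tables) on a chain x1 - x2 - x3:
# summing out one variable at a time computes Z in time linear in the chain
# length, instead of enumerating all joint configurations at once.
def f12(a, b):
    return 2.0 if a == b else 1.0              # factor over (x1, x2)

def f23(b, c):
    return 3.0 if b == c else 0.5              # factor over (x2, x3)

# Eliminate x1: message m1(x2) = sum_{x1} f12(x1, x2)
m1 = {b: sum(f12(a, b) for a in (0, 1)) for b in (0, 1)}
# Eliminate x2: message m2(x3) = sum_{x2} m1(x2) * f23(x2, x3)
m2 = {c: sum(m1[b] * f23(b, c) for b in (0, 1)) for c in (0, 1)}
# Eliminate x3: what remains is the partition function
Z = sum(m2.values())
```

The same message-passing pattern underlies belief propagation; on graphs with loops or large treewidth, however, the intermediate messages blow up, which motivates the approximations listed below.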

#### Laplace Approximation

Main article: Laplace's method

#### (Loopy) Belief Propagation

Main article: Belief propagation

#### Expectation Propagation

Main article: Expectation propagation

#### Variational Inference

Main article: Variational inference

### Stochastic Approximate Inference: Feynman-Kac Models

#### Importance Sampling

Main article: Importance sampling

When the expectation of a quantity of interest ${\textstyle f}$ is to be computed under a distribution ${\displaystyle p_{X}}$ from which exact sampling is intractable, importance sampling allows the derivation of an unbiased estimator via a proposal distribution ${\displaystyle q_{X}}$ on the same space ${\displaystyle X}$. If the latter admits efficient sampling and ${\displaystyle q_{X}\!\gg \!p_{X}}$ (absolute continuity), the likelihood ratio (Radon-Nikodym derivative) ${\textstyle r(x)={\frac {\operatorname {d} \!p_{X}}{\operatorname {d} \!q_{X}}}(x)={\frac {p_{X}(x)}{q_{X}(x)}}}$ exists and can be used to re-weight the samples obtained from ${\displaystyle q_{X}}$:

${\displaystyle \mathbb {E} _{x\sim p_{X}}\left[\,f(x)\,\right]=\int _{X}q_{X}(\operatorname {d} \!x)\,{\frac {p_{X}(x)}{q_{X}(x)}}\,f(x)=\mathbb {E} _{x\sim q_{X}}\left[\,r(x)\,f(x)\,\right]={\frac {\mathbb {E} _{x\sim q_{X}}\left[\,r(x)\,f(x)\,\right]}{\mathbb {E} _{x\sim q_{X}}\left[\,r(x)\,\right]}}\;.}$

The last identity simply observes ${\displaystyle \mathbb {E} _{x\sim q_{X}}\left[\,r(x)\,\right]=\mathbb {E} _{x\sim p_{X}}\left[\,1\,\right]=1}$.
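The identities above translate directly into the self-normalized importance sampling estimator. The sketch below estimates ${\textstyle \mathbb {E} _{p}\left[x\right]}$ for an invented target/proposal pair (target ${\displaystyle {\mathcal {N}}(1,1)}$, proposal ${\displaystyle {\mathcal {N}}(0,4)}$); note that both log-densities are written without normalizing constants, which the self-normalized form absorbs.

```python
import math
import random

# A sketch of self-normalized importance sampling (invented example):
# target p = N(1, 1), proposal q = N(0, 4). Both log-densities omit their
# normalizing constants; the ratio estimator cancels them out.
random.seed(0)

def log_p(x):
    # unnormalized log-density of the target N(1, 1)
    return -0.5 * (x - 1.0) ** 2

def log_q(x):
    # unnormalized log-density of the proposal N(0, 4), i.e. sigma = 2
    return -0.5 * x ** 2 / 4.0

N = 100_000
xs = [random.gauss(0.0, 2.0) for _ in range(N)]     # draw from q
ws = [math.exp(log_p(x) - log_q(x)) for x in xs]    # importance weights r(x)

# Self-normalized estimator of E_p[f(x)] with f(x) = x (true value: 1)
estimate = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
```

The proposal is deliberately wider than the target: absolute continuity ${\displaystyle q_{X}\!\gg \!p_{X}}$ fails in the tails if the proposal is too narrow, making the weights unbounded.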

#### Sequential Monte Carlo

Main article: SMC samplers

#### Twisted Sequential Monte Carlo

Twisted SMC modifies each intermediate unnormalized target ${\displaystyle \gamma _{t}}$ by multiplying in a twisting function ${\displaystyle \psi _{t}}$:

${\displaystyle \gamma _{t}^{\psi }(x_{1:t})=\psi _{t}(x_{1:t})\,\gamma _{t}(x_{1:t})}$

#### Feynman-Kac Models

Main articles: Particle filter, Mean field particle methods, Feynman-Kac formula

This section is meant to highlight the commonality between the stochastic inference algorithms presented above – and their numerous variants in the literature – in a systematic way. All such methods are special cases of the general class of Feynman-Kac path models [9]. They provide a means for constructing a measure ${\textstyle \mathbb {P} _{T}}$ over paths ${\textstyle x_{1:T}\,\in \,X^{T}}$ by re-weighting a base measure ${\textstyle \mathbb {Q} _{T}}$ with a path potential ${\textstyle R_{T}}$, where ${\textstyle \mathbb {Q} _{T}}$ is directly obtained from an initial measure ${\textstyle \mu _{1}}$ and Markov transition kernels ${\textstyle \left(\kappa _{t}\right)_{t=2:T}}$, and similarly ${\textstyle R_{T}}$ is defined via a sequence of state potentials ${\textstyle \left(\phi _{t}\right)_{t=1:T}}$:

${\displaystyle \mathbb {Q} _{T}(\operatorname {d} \!x_{1:T})=\mu _{1}(\operatorname {d} \!x_{1})\,\prod _{t=2}^{T}\kappa _{t}(x_{t-1},\operatorname {d} \!x_{t})\qquad \qquad R_{T}(x_{1:T})=\phi _{1}(x_{1})\,\prod _{t=2}^{T}\phi _{t}(x_{t-1},x_{t})}$
${\displaystyle \mathbb {P} _{T}(\operatorname {d} \!x_{1:T})={\frac {R_{T}(x_{1:T})\,\mathbb {Q} _{T}(\operatorname {d} \!x_{1:T})}{\mathbb {E} _{{\hat {x}}_{1:T}\sim \mathbb {Q} _{T}}\left[R_{T}({\hat {x}}_{1:T})\right]}}={\frac {(\mu _{1}\!\cdot \!\phi _{1})(\operatorname {d} \!x_{1})\,\prod _{t=2}^{T}(\kappa _{t}\!\cdot \!\phi _{t})(x_{t-1},\operatorname {d} \!x_{t})}{\int _{X^{T}}(\mu _{1}\!\cdot \!\phi _{1})(\operatorname {d} \!{\hat {x}}_{1})\,\prod _{t=2}^{T}(\kappa _{t}\!\cdot \!\phi _{t})({\hat {x}}_{t-1},\operatorname {d} \!{\hat {x}}_{t})}}\;.}$
The usual goal of Feynman-Kac models is then to measure quantities of interest ${\textstyle f}$ along the flow of corresponding time marginal distributions:
${\displaystyle \eta _{t}\!\left[\,f\,\right]={\frac {\gamma _{t}\!\left[\,f\,\right]}{Z_{t}}}={\frac {\gamma _{t}\!\left[\,f\,\right]}{\gamma _{t}\!\left[\,1\,\right]}}={\frac {\mathbb {E} _{x_{1:t}\sim \mathbb {Q} _{t}}\left[\,R_{t}(x_{1:t})\,f(x_{t})\,\right]}{\mathbb {E} _{x_{1:t}\sim \mathbb {Q} _{t}}\left[\,R_{t}(x_{1:t})\,\right]}}\;.}$
In particular, all quantities above – including target densities, importance weights, resampling and rejuvenation strategies, twisting functions, etc. – can be understood as particular choices for approximating the quantities ${\textstyle \mu _{1}}$, ${\textstyle \left(\kappa _{t}\right)_{t=2:T}}$ and ${\textstyle \left(\phi _{t}\right)_{t=1:T}}$ in terms of particle populations (discrete measures). This broad class of models also encompasses evolutionary optimization methods, which in turn constitute a highly diverse and active field of study. For example, in the case of genetic algorithms, the transition kernels ${\textstyle \kappa _{t}}$ are called recombination and mutation operators, while the potentials ${\displaystyle \phi _{t}}$ are fitness functions.
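To make the Feynman-Kac construction concrete, the following sketch builds a small binary-state model with invented numbers, short enough that ${\textstyle \mathbb {E} _{\mathbb {Q} _{T}}\left[R_{T}\right]}$ can be enumerated exactly, and checks it against a particle approximation with multinomial resampling.

```python
import random
from itertools import product

# A toy Feynman-Kac model (invented numbers): binary states, a sticky Markov
# kernel, and a state potential applied at every step. The horizon is short
# enough to enumerate all paths, so the particle estimate of E_Q[R_T] can be
# checked against the exact value.
random.seed(1)

T = 3
mu1 = {0: 0.5, 1: 0.5}                         # initial measure
kappa = {0: {0: 0.8, 1: 0.2},                  # Markov transition kernel
         1: {0: 0.2, 1: 0.8}}
phi = {0: 0.5, 1: 2.0}                         # state potential

# Exact normalization by enumerating all 2^T paths
Z_exact = 0.0
for path in product((0, 1), repeat=T):
    prob = mu1[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= kappa[a][b]                    # Q_T path probability
    R = 1.0
    for x in path:
        R *= phi[x]                            # path potential R_T
    Z_exact += prob * R

# Particle approximation: the running product of average weights is an
# unbiased estimator of E_Q[R_T].
N = 50_000
particles = [0 if random.random() < mu1[0] else 1 for _ in range(N)]
Z_hat = 1.0
for t in range(T):
    if t > 0:                                  # propagate through kappa
        particles = [0 if random.random() < kappa[x][0] else 1
                     for x in particles]
    weights = [phi[x] for x in particles]
    Z_hat *= sum(weights) / N
    # multinomial resampling proportional to the weights
    particles = random.choices(particles, weights=weights, k=N)
```

The potentials here depend only on the current state, a special case of the general ${\textstyle \phi _{t}(x_{t-1},x_{t})}$ notation above.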

From this abstract perspective, the explanation of twisted SMC is very simple:

The target densities ${\displaystyle \gamma _{t}}$ are manipulated by the factors ${\displaystyle \psi _{t}}$ in such a way that the effective sample size at each SMC step is improved, while all factors cancel out in the full expression for ${\displaystyle \eta _{T}}$, i.e., they maintain an unbiased estimator for the final distribution of interest and its normalization constant ${\displaystyle Z_{T}}$. The power of this approach stems from the flexibility of choosing the twisting potentials adaptively, i.e., depending on the data and current particle forest.

## SMC for PGMs

Using the formalism, data structures and algorithms introduced above, this section summarizes the core contributions of [1] and [2], highlighting the respective incremental improvements.

### Basic SMC

#### Sequential Graph Decomposition

[1] formulates generic PGM inference as an SMC problem, i.e., as the task of sampling from a sequence of increasingly complex probability distributions – where the first distribution is efficiently accessible and the last distribution corresponds to the full target. This is a very common theme in statistical inference, the most widely used techniques of this sort being tempering, annealing and bridging. In the setting of PGMs, this sequence is constructed by ordering the factors according to some criterion and gradually including them into the sequence of unnormalized densities:

${\displaystyle \gamma _{t}(x_{1:t})=\prod _{j\,\in \,{\mathcal {F}}_{t}}f_{j}(x_{{\mathcal {V}}_{j}})\;,}$
where ${\displaystyle {\mathcal {F}}_{t}\subseteq {\mathcal {F}}}$ is the subset of factors included after ${\displaystyle t}$ steps, so that ${\displaystyle {\mathcal {F}}_{T}={\mathcal {F}}}$ and ${\displaystyle \gamma _{T}}$ is the full unnormalized target.
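This decomposition can be illustrated on a hypothetical chain of binary variables, where ordering the pairwise factors left to right yields intermediate targets over growing prefixes of the variables; the factor tables are invented.

```python
from itertools import product

# A toy sequential decomposition (invented factor tables): pairwise factors on
# a chain of four binary variables are added one at a time, so gamma_t depends
# only on the prefix x_{1:t+1} and gamma_3 is the full unnormalized joint.
factors = [lambda a, b: 2.0 if a == b else 1.0,     # f_1(x1, x2)
           lambda a, b: 3.0 if a == b else 0.5,     # f_2(x2, x3)
           lambda a, b: 1.5 if a == b else 1.0]     # f_3(x3, x4)

def gamma(t, x):
    """Intermediate unnormalized target: product of the first t factors."""
    out = 1.0
    for j in range(t):
        out *= factors[j](x[j], x[j + 1])
    return out

# The final target's normalization constant, by brute force
Z = sum(gamma(3, x) for x in product((0, 1), repeat=4))
```

The choice of factor ordering is a degree of freedom of the method and, as the original paper discusses, can strongly affect the variance of the resulting SMC estimator.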

### Twisted SMC

#### Twisting functions

[2] extends the above framework by introducing a general twisting mechanism in the setting of PGMs. In particular, the authors derive the optimal twisting function for intermediate target densities to be:

${\displaystyle \psi _{t}^{\star }(x_{1:t})=\int \prod _{j\,\in \,{\mathcal {F}}\setminus {\mathcal {F}}_{t}}f_{j}(x_{{\mathcal {V}}_{j}})\operatorname {d} \!x_{t+1:T}\;,}$
where ${\displaystyle {\mathcal {F}}_{t}}$ denotes the factors already included at step ${\displaystyle t}$. Exactly computing ${\displaystyle \psi _{t}^{\star }}$ is of course intractable, as it would amount to exact inference. Instead, the authors derive general schemes for adopting well-known deterministic approximations as twisting functions.
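On a small discrete model, the optimal twisting function can in fact be computed exactly by a backward recursion, which illustrates why it is the ideal choice: with the optimal twisting, the very first twisted weight already recovers the normalization constant, i.e., the estimator has zero variance. The sketch below uses a toy Markov-chain Feynman-Kac model with invented numbers rather than a general PGM, purely for illustration.

```python
# A toy illustration (invented numbers) of the optimal twisting function on a
# binary-state Feynman-Kac chain: psi_star[t][x] integrates all remaining
# potentials given the current state, computed here by backward recursion.
T = 3
mu1 = {0: 0.5, 1: 0.5}                         # initial measure
kappa = {0: {0: 0.8, 1: 0.2},                  # Markov transition kernel
         1: {0: 0.2, 1: 0.8}}
phi = {0: 0.5, 1: 2.0}                         # state potential per step

# Backward recursion: psi_star[t][x] = E[product of remaining potentials | x]
psi_star = [dict() for _ in range(T)]
psi_star[T - 1] = {0: 1.0, 1: 1.0}             # nothing left after step T
for t in range(T - 2, -1, -1):
    psi_star[t] = {x: sum(kappa[x][y] * phi[y] * psi_star[t + 1][y]
                          for y in (0, 1))
                   for x in (0, 1)}

# With the optimal twisting, the first-step twisted weight already recovers
# the normalization constant Z exactly (a zero-variance estimator).
Z = sum(mu1[x] * phi[x] * psi_star[0][x] for x in (0, 1))
```

In practice this backward pass is intractable, and the point of the twisted-SMC framework is to substitute a cheap deterministic approximation (e.g., loopy belief propagation or expectation propagation) for the exact recursion.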

### Experiments

Both methods were tested on a variety of applications, including the classical XY and Ising models from statistical mechanics, topic models (latent Dirichlet allocation), and a Gaussian Markov random field. In general, the basic SMC method performed comparably to state-of-the-art inference algorithms at the time of publication, and the more recent twisted SMC variant consistently outperformed basic SMC, requiring 1–2 orders of magnitude fewer particles to achieve similar accuracy in terms of ${\displaystyle \log(Z)}$.

## Conclusions

Recent years have seen important developments in adaptive sampling-based inference methods within the SMC framework. While these were mostly introduced in the context of state-space models, the publications discussed here have demonstrated the potential of such methods in the much more general class of PGMs. Although adaptation introduces a computational overhead, in many cases this cost can be amortized over subsequent inference steps, opening up the possibility of significant gains in accuracy per unit of computation. Perhaps the most valuable contribution of [2] is the insight that, in combination with traditional deterministic inference approximations, twisted SMC provides an alternative path towards inference amortization, a topic whose statistical discourse has been dominated by variational inference methods in recent years.

## Annotated Bibliography

1. Naesseth, Christian Andersson, Fredrik Lindsten, and Thomas B. Schön. "Sequential Monte Carlo for graphical models." Advances in Neural Information Processing Systems. 2014. http://papers.nips.cc/paper/5570-sequential-monte-carlo-for-graphical-models
2. Lindsten, Fredrik, Jouni Helske, and Matti Vihola. "Graphical model inference: Sequential Monte Carlo meets deterministic approximations." Advances in Neural Information Processing Systems. 2018. http://papers.nips.cc/paper/8041-graphical-model-inference-sequential-monte-carlo-meets-deterministic-approximations
3. Liu, Jun S., and Rong Chen. "Sequential Monte Carlo methods for dynamic systems." Journal of the American Statistical Association 93.443 (1998): 1032-1044. https://www.tandfonline.com/doi/abs/10.1080/01621459.1998.10473765
4. Del Moral, Pierre, Arnaud Doucet, and Ajay Jasra. "Sequential Monte Carlo samplers." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.3 (2006): 411-436. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2006.00553.x
5. Doucet, Arnaud, and Adam M. Johansen. "A tutorial on particle filtering and smoothing: Fifteen years later." Handbook of nonlinear filtering 12.656-704 (2009): 3. http://www.warwick.ac.uk/fac/sci/statistics/staff/academic-research/johansen/publications/dj11.pdf
6. Kappen, Hilbert Johan, and Hans Christian Ruiz. "Adaptive importance sampling for control and inference." Journal of Statistical Physics 162.5 (2016): 1244-1266. https://link.springer.com/article/10.1007/s10955-016-1446-7
7. Guarniero, Pieralberto, Adam M. Johansen, and Anthony Lee. "The iterated auxiliary particle filter." Journal of the American Statistical Association 112.520 (2017): 1636-1647. https://amstat.tandfonline.com/doi/abs/10.1080/01621459.2016.1222291
8. Heng, Jeremy, Adrian N. Bishop, George Deligiannidis, and Arnaud Doucet. "Controlled sequential Monte Carlo." arXiv preprint arXiv:1708.08396 (2017). https://arxiv.org/abs/1708.08396
9. Del Moral, Pierre. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer, 2004. https://www.infona.pl/resource/bwmeta1.element.springer-3c891fde-b218-3132-870a-10bb8290626c