Title
This page covers Normalizing Flows, a class of generative model which transforms a latent distribution to match the data distribution using an invertible transformation.
Principal Author: Matthew Niedoba
Collaborators:
Abstract
Normalizing Flows are an attractive class of generative models which map samples from a simple analytic distribution one-to-one to a complex data distribution through a bijective transformation. Notably, this construction allows for both sampling from the data distribution and exact likelihood computation. In this page, we introduce normalizing flows and detail their construction and application.
Builds on
An understanding of probability and probability distributions is required to understand the construction of Normalizing Flows. The bijective transforms of normalizing flows are typically parameterized by Neural Networks.
Related Pages
Normalizing Flows are just one member of the broad class of generative models. Other types of generative models include Generative Adversarial Networks, Variational Autoencoders, and Diffusion Probabilistic Models.
Method
Source and Target Distributions
Normalizing Flows model the data distribution by transforming the latent space (right) into the data space (left) through an invertible mapping. Source: Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2016) [1]
The aim of normalizing flows is to model a target distribution $p_X(x)$, $x \in \mathbb{R}^D$. Since the analytic form of $p_X$ is not generally known, normalizing flows model it by transforming samples $z \sim p_Z(z)$ from a source distribution through a transformation $x = f_\theta(z)$, generally parameterized with some parameters $\theta$.
The goal of normalizing flows is to achieve a one-to-one, invertible mapping between $z$ and $x$. To this end, normalizing flows restrict the transformation $f_\theta$ to diffeomorphisms - transformations which are invertible and where both $f_\theta$ and $f_\theta^{-1}$ are differentiable. Since $f_\theta$ must be invertible, we require that $z$ must be the same dimensionality as $x$. Using the inverse transformation, we can transform data samples into samples from the source distribution, $z = f_\theta^{-1}(x)$.
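As a small illustration, the sketch below samples from a Gaussian source distribution and maps the sample into "data" space through an elementwise affine bijection, checking that the inverse recovers the latent. The affine map and its parameters are hypothetical stand-ins for a learned transformation $f_\theta$.

```python
# A minimal sketch of the normalizing flow setup: a source (latent) distribution
# and an invertible transformation f_theta mapping latent samples to data samples.
import torch

D = 2                                                              # z and x share dimensionality
p_Z = torch.distributions.Normal(torch.zeros(D), torch.ones(D))    # source distribution

log_scale = torch.tensor([0.5, -0.3])    # hypothetical parameters theta
shift = torch.tensor([1.0, 2.0])

def f(z):                                # x = f_theta(z): latent -> data
    return z * torch.exp(log_scale) + shift

def f_inv(x):                            # z = f_theta^{-1}(x): data -> latent
    return (x - shift) * torch.exp(-log_scale)

z = p_Z.sample()                         # sample from the source distribution
x = f(z)                                 # transform into a "data" sample
assert torch.allclose(f_inv(x), z)       # invertibility: f^{-1}(f(z)) = z
```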
Likelihood Computations
The previous section illustrates how to sample data from a normalizing flow by transforming samples from the source distribution through the transformation $f_\theta$. Another important task in generative modelling is computing the likelihood of data under the model. Due to the one-to-one mapping between the source and target distributions, normalizing flows allow exact computation of likelihoods through the change of variables formula

$$p_X(x) = p_Z\left(f_\theta^{-1}(x)\right)\left|\det\frac{\partial f_\theta^{-1}(x)}{\partial x}\right|$$

Here, $\det\frac{\partial f_\theta^{-1}(x)}{\partial x}$ refers to the determinant of the Jacobian of $f_\theta^{-1}$ with respect to $x$. Since $z = f_\theta^{-1}(x)$, and using the identity $\det\left(A^{-1}\right) = \det(A)^{-1}$, we can also write

$$p_X(x) = p_Z(z)\left|\det\frac{\partial f_\theta(z)}{\partial z}\right|^{-1}$$
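The sketch below illustrates the change of variables formula for the same kind of elementwise affine bijection used above: the exact log-likelihood is the source log-density of the inverted point plus the log-determinant of the inverse Jacobian. The parameters and function names are hypothetical.

```python
# A minimal sketch of exact likelihood computation via the change of variables formula:
# log p_X(x) = log p_Z(f^{-1}(x)) + log|det d f^{-1}(x)/dx|.
import torch

D = 2
p_Z = torch.distributions.Normal(torch.zeros(D), torch.ones(D))
log_scale = torch.tensor([0.5, -0.3])    # hypothetical flow parameters
shift = torch.tensor([1.0, 2.0])

def f_inv(x):                            # z = f^{-1}(x) for the elementwise affine map
    return (x - shift) * torch.exp(-log_scale)

def log_prob_x(x):
    z = f_inv(x)
    # The Jacobian of f^{-1} is diag(exp(-log_scale)), so its log-determinant is -sum(log_scale).
    log_det_inv = -log_scale.sum()
    return p_Z.log_prob(z).sum() + log_det_inv

x = torch.tensor([1.5, 1.0])
print(log_prob_x(x))                     # exact log-likelihood of x under the flow
```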
Training Objective
Like many generative models, the goal of training is to approximate the true data distribution $p_X(x)$ with our model distribution $p_\theta(x)$, which transforms samples from our source distribution into the target distribution through the parameterized transformation $f_\theta$. The objective of training is to minimize the KL divergence between these two distributions. That is, to minimize

$$\mathcal{L}(\theta) = D_{KL}\left(p_X(x) \,\|\, p_\theta(x)\right)$$

Note that this is the so-called forward KL divergence, which is an expectation over samples from the true data distribution $p_X(x)$, usually in the form of a dataset of examples. Noting that $D_{KL}(p \,\|\, q) = \mathbb{E}_{x \sim p}\left[\log p(x) - \log q(x)\right]$, we can rewrite the loss as the sum of two expectations

$$\mathcal{L}(\theta) = \mathbb{E}_{x \sim p_X}\left[\log p_X(x)\right] - \mathbb{E}_{x \sim p_X}\left[\log p_\theta(x)\right]$$

Since the first term is constant with respect to the parameters, we can see that minimizing the KL divergence of a normalizing flow with the data distribution is equivalent to minimizing the negative log likelihood of the data under the normalizing flow model

$$\mathcal{L}(\theta) = -\mathbb{E}_{x \sim p_X}\left[\log p_\theta(x)\right]$$

During training, we compute the exact likelihood $p_\theta(x)$ using the change of variables formula.
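A minimal sketch of this training objective is given below, assuming an elementwise affine flow with learnable scale and shift. The data, learning rate, and parameter names are placeholders; the point is that the loss is the negative log-likelihood computed through the change of variables formula.

```python
# A minimal sketch of maximum-likelihood training of a (hypothetical) affine flow.
import torch

D = 2
p_Z = torch.distributions.Normal(torch.zeros(D), torch.ones(D))
log_scale = torch.zeros(D, requires_grad=True)      # learnable parameters theta
shift = torch.zeros(D, requires_grad=True)

def log_prob_x(x):                                   # change of variables, as above
    z = (x - shift) * torch.exp(-log_scale)
    return p_Z.log_prob(z).sum(dim=-1) - log_scale.sum()

data = torch.randn(1000, D) * 2.0 + 3.0              # hypothetical training data
optimizer = torch.optim.Adam([log_scale, shift], lr=1e-2)
for step in range(500):
    loss = -log_prob_x(data).mean()                  # negative log-likelihood of the data
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```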
Finite Normalizing Flows
The key challenge in designing a normalizing flow is choosing the structure of the transformation which maps between the source and target distributions. Such a transformation must be complex enough to model the data distribution. However, generating such a complex transformation in one shot is difficult, especially when it must be invertible. Instead of computing the transformation in one shot, finite normalizing flows achieve the required complexity through the composition of a finite number of simpler transformations, $f_\theta = f_K \circ f_{K-1} \circ \cdots \circ f_1$.
Each transformation $f_k$ can be thought of as a miniature normalizing flow, transforming an intermediate source distribution $p_{k-1}(z_{k-1})$ into an intermediate target distribution $p_k(z_k)$ via the relation

$$z_k = f_k(z_{k-1})$$

for $k = 1, \dots, K$. We set $z_0$ equal to the samples from our original source distribution (usually a multivariate Gaussian) and aim to have $z_K$ match the target distribution $p_X(x)$. With the finite normalizing flow construction, the determinant of the Jacobian of the overall transformation is equal to the product of the determinants of the Jacobians of each transform

$$\left|\det\frac{\partial f_\theta(z_0)}{\partial z_0}\right| = \prod_{k=1}^{K}\left|\det\frac{\partial f_k(z_{k-1})}{\partial z_{k-1}}\right|$$
By selecting individual transforms which are invertible and for which the determinant of the Jacobian can be computed efficiently, we ensure that the overall normalizing flow is also invertible with an easy to compute Jacobian determinant. In the next sections, we discuss choices of transforms which have these properties and can be composed to construct more complex normalizing flows.
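The sketch below illustrates this composition: each layer returns its output together with the log-determinant of its Jacobian, and the log-determinant of the full flow is the sum over layers (equivalently, the product of the determinants). The simple affine layer is a hypothetical stand-in for the coupling and planar transforms discussed next.

```python
# A minimal sketch of a finite normalizing flow as a composition of simple invertible layers.
import torch

class AffineLayer:
    """Hypothetical elementwise affine bijection z_k = z_{k-1} * exp(s) + t."""
    def __init__(self, dim):
        self.log_scale = torch.randn(dim) * 0.1
        self.shift = torch.randn(dim) * 0.1

    def forward(self, z):
        # Returns the transformed sample and log|det| of this layer's Jacobian.
        return z * torch.exp(self.log_scale) + self.shift, self.log_scale.sum()

layers = [AffineLayer(2) for _ in range(4)]   # f = f_4 ∘ f_3 ∘ f_2 ∘ f_1
z = torch.randn(2)                            # z_0 from the source distribution
total_log_det = 0.0
for layer in layers:
    z, log_det = layer.forward(z)
    total_log_det = total_log_det + log_det   # log|det J_f| = sum_k log|det J_{f_k}|
```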
Coupling Flows
Coupling transformations, introduced by [2], aim to make the Jacobian matrix triangular by partitioning $x$ into two parts, $x_A$ and $x_B$. Then, we define the transformation of each partition separately

$$y_A = x_A, \qquad y_B = x_B + m(x_A)$$

where $m$ is an arbitrary function, typically a neural network. Coupling transformations are easily invertible, and the Jacobian matrix has the form

$$\frac{\partial y}{\partial x} = \begin{bmatrix} I & 0 \\ \frac{\partial y_B}{\partial x_A} & I \end{bmatrix}$$

Since the Jacobian is lower triangular with ones along the diagonal, the determinant is equal to one. The partitioning of $x$ is modified with each transformation layer such that all components of $x$ are transformed by the end of the flow.
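A minimal sketch of an additive coupling layer in this spirit is given below. The partition into two halves, the hidden width, and the network $m$ are illustrative choices, not the exact architecture of [2].

```python
# A minimal sketch of an additive coupling layer: y_A = x_A, y_B = x_B + m(x_A).
# The layer is trivially invertible and its Jacobian determinant is exactly one.
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.m = nn.Sequential(nn.Linear(self.half, 64), nn.ReLU(),
                               nn.Linear(64, dim - self.half))

    def forward(self, x):
        x_a, x_b = x[..., :self.half], x[..., self.half:]
        y_b = x_b + self.m(x_a)                  # shift one partition using the other
        return torch.cat([x_a, y_b], dim=-1)     # log|det J| = 0

    def inverse(self, y):
        y_a, y_b = y[..., :self.half], y[..., self.half:]
        x_b = y_b - self.m(y_a)                  # exact inverse, since y_A = x_A
        return torch.cat([y_a, x_b], dim=-1)
```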
RealNVP
RealNVP [1] extends the coupling transformations introduced in [2] by adding a scaling to the transformation. Specifically

$$y_A = x_A, \qquad y_B = x_B \odot \exp\left(s(x_A)\right) + t(x_A)$$

Here, the transformation is parameterized by two neural networks, $s$ and $t$, which control the scale and shift of the transformation. With this modification, the transformation is still easily invertible, and the Jacobian is still lower triangular

$$\frac{\partial y}{\partial x} = \begin{bmatrix} I & 0 \\ \frac{\partial y_B}{\partial x_A} & \mathrm{diag}\left(\exp\left(s(x_A)\right)\right) \end{bmatrix}$$

where $\mathrm{diag}$ indicates a diagonal matrix. Since the Jacobian is lower triangular, the determinant is the product of its diagonal entries, $\exp\left(\sum_j s(x_A)_j\right)$.
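Below is a minimal sketch of an affine coupling layer along these lines, with small illustrative networks for $s$ and $t$; it is not the full RealNVP architecture, which also uses masking and a multi-scale structure.

```python
# A minimal sketch of an affine coupling layer: y_A = x_A, y_B = x_B * exp(s(x_A)) + t(x_A).
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.s = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                               nn.Linear(hidden, dim - self.half), nn.Tanh())
        self.t = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                               nn.Linear(hidden, dim - self.half))

    def forward(self, x):
        x_a, x_b = x[..., :self.half], x[..., self.half:]
        s, t = self.s(x_a), self.t(x_a)
        y_b = x_b * torch.exp(s) + t
        log_det = s.sum(dim=-1)                  # log of the product of diagonal entries
        return torch.cat([x_a, y_b], dim=-1), log_det

    def inverse(self, y):
        y_a, y_b = y[..., :self.half], y[..., self.half:]
        x_b = (y_b - self.t(y_a)) * torch.exp(-self.s(y_a))
        return torch.cat([y_a, x_b], dim=-1)
```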
Planar Flows
Planar flows [3] are a type of transformation which allows for linear-time computation of the determinant of the Jacobian. They have the form

$$f(z) = z + u\, h\!\left(w^\top z + b\right)$$

where $h$ is a smooth elementwise nonlinearity. Here, the parameters $\lambda$ of the transformation are $\lambda = \{u \in \mathbb{R}^D, w \in \mathbb{R}^D, b \in \mathbb{R}\}$. The determinant of the Jacobian is easily computed as

$$\left|\det\frac{\partial f}{\partial z}\right| = \left|1 + u^\top h'\!\left(w^\top z + b\right) w\right|$$

With planar flows, $f$ and its determinant are easy to compute, but $f$ is not easily invertible. As a result, the authors of this method train their flow using the reverse KL divergence. In this setup, they minimize the divergence $D_{KL}\left(p_\theta(x) \,\|\, p_X(x)\right)$ by drawing samples from the source distribution.
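A minimal sketch of a planar flow layer follows, using $h = \tanh$. The initialization is arbitrary, and the sketch omits the reparameterization of $u$ and $w$ that [3] uses to guarantee invertibility.

```python
# A minimal sketch of a planar flow: f(z) = z + u * h(w^T z + b), with
# |det J| = |1 + u^T h'(w^T z + b) w|, which costs O(D) to evaluate.
import torch
import torch.nn as nn

class PlanarFlow(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.u = nn.Parameter(torch.randn(dim) * 0.1)
        self.w = nn.Parameter(torch.randn(dim) * 0.1)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):                                        # z: (batch, dim)
        lin = z @ self.w + self.b                                # w^T z + b
        f_z = z + self.u * torch.tanh(lin).unsqueeze(-1)         # f(z)
        psi = (1 - torch.tanh(lin) ** 2).unsqueeze(-1) * self.w  # psi(z) = h'(w^T z + b) w
        log_det = torch.log(torch.abs(1 + psi @ self.u))         # log|1 + u^T psi(z)|
        return f_z, log_det
```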
Continuous Normalizing Flows
Synthetic celebrity portraits generated using Glow, a normalizing flow. Source: Kingma, D. P., & Dhariwal, P. (2018) [4]
Continuous normalizing flows [5] consider the case of extending finite normalizing flows to an infinite number of infinitesimal transformations. If we let $z(0)$ be a variable from our source distribution and $z(1)$ be a variable from our target distribution, then the continuous normalizing flow transforming $z(0)$ to $z(1)$ is given by the ordinary differential equation (ODE)

$$\frac{\partial z(t)}{\partial t} = f_\theta\left(z(t), t\right), \qquad z(1) = z(0) + \int_0^1 f_\theta\left(z(t), t\right)\, dt$$

The log density of the resulting distribution is given by the instantaneous change of variables formula, another ODE:

$$\frac{\partial \log p\left(z(t)\right)}{\partial t} = -\mathrm{Tr}\left(\frac{\partial f_\theta}{\partial z(t)}\right)$$

Notably, unlike finite normalizing flows, computing the log density only requires evaluating the trace of the Jacobian, instead of the determinant. This allows for more freedom in selecting $f_\theta$, but at the cost of using a numerical ODE solver for sampling and likelihood evaluation. Training continuous normalizing flows is also challenging, as it requires backpropagating through the ODE solver.
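The sketch below integrates a continuous normalizing flow with a simple fixed-step Euler scheme as a stand-in for a proper ODE solver, evolving the sample and the change in log-density jointly. The dynamics network, step count, and exact trace computation (practical only in low dimensions) are illustrative choices.

```python
# A minimal sketch of a continuous normalizing flow: Euler-integrate dz/dt = f(z, t)
# and d log p / dt = -Tr(df/dz) from t = 0 to t = 1.
import torch
import torch.nn as nn

dim = 2
# Hypothetical dynamics network f_theta(z, t); takes [z, t] and returns dz/dt.
f = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))

def dynamics(z, t):
    t_col = torch.full((z.shape[0], 1), float(t))
    return f(torch.cat([z, t_col], dim=-1))

def trace_df_dz(z, t):
    """Exact trace of the Jacobian of the dynamics via autograd (D backward passes)."""
    z = z.detach().requires_grad_(True)
    out = dynamics(z, t)
    trace = torch.zeros(z.shape[0])
    for i in range(z.shape[1]):
        grad_i = torch.autograd.grad(out[:, i].sum(), z, retain_graph=True)[0]
        trace = trace + grad_i[:, i]
    return trace

z = torch.randn(8, dim)                  # z(0): samples from the source distribution
delta_logp = torch.zeros(8)
steps = 100
dt = 1.0 / steps
for k in range(steps):                   # fixed-step Euler integration
    t = k * dt
    with torch.no_grad():
        dz = dynamics(z, t)
    delta_logp = delta_logp - trace_df_dz(z, t) * dt
    z = z + dz * dt
# log p(z(1)) = log p_Z(z(0)) + delta_logp, by the instantaneous change of variables formula.
```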
Applications
Normalizing Flows have primarily been used to model image data, such as in [2][1][4]. However, other applications exist, such as text modelling [6] and audio synthesis [7]. Normalizing flows have become less popular recently, possibly because the requirements of invertibility and easy-to-compute Jacobian determinants place large restrictions on the class of transformations that can be used. Instead, many practitioners have shifted to other generative models, such as Generative Adversarial Networks, Variational Autoencoders or Diffusion Probabilistic Models. However, recent work has led to renewed interest in normalizing flows due to a simplification of the training objective for continuous normalizing flows through flow matching [8], which may be used for a new generation of powerful normalizing flow models.
Annotated Bibliography
- [1] Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2016). Density estimation using Real NVP. arXiv preprint arXiv:1605.08803.
- [2] Dinh, L., Krueger, D., & Bengio, Y. (2014). NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516.
- [3] Rezende, D., & Mohamed, S. (2015). Variational inference with normalizing flows. In International Conference on Machine Learning (pp. 1530-1538). PMLR.
- [4] Kingma, D. P., & Dhariwal, P. (2018). Glow: Generative flow with invertible 1x1 convolutions. Advances in Neural Information Processing Systems, 31.
- [5] Chen, R. T., Rubanova, Y., Bettencourt, J., & Duvenaud, D. K. (2018). Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31.
- [6] Tran, D., Vafa, K., Agrawal, K., Dinh, L., & Poole, B. (2019). Discrete flows: Invertible generative models of discrete data. Advances in Neural Information Processing Systems, 32.
- [7] Oord, A., Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., ... & Hassabis, D. (2018). Parallel WaveNet: Fast high-fidelity speech synthesis. In International Conference on Machine Learning (pp. 3918-3926). PMLR.
- [8] Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M., & Le, M. (2022). Flow matching for generative modeling. arXiv preprint arXiv:2210.02747.
Permission is granted to copy, distribute and/or modify this document according to the terms in Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0