Critique

The article reads well and is easy to grasp.

Some clarifications that would be helpful:

- It wasn't fully clear why a 1-to-1 mapping is required in a normalizing flow.

- How does composing simpler transformations increase the complexity of the overall transformation? Is non-linearity what provides the added complexity?

- Is s_\theta a neural network in RealNVP?

Suggestions:

- Can you add some images from flow-based models to give some motivation?

- Can you give some cons of this method and explain why it's not preferred over, say, diffusion-based models?

Some minor language errors:

- "generally known, model it by transforming samples from a source distribution"

- "Such a transformation must b.."