Discussion 1
I think I've addressed most of your concerns.
1. I simplified the abstract to reduce the redundancy.
2. Fixed that grammatical error (if beta is small).
3. Expanded the derivation of p(x_0) and the top/bottom multiplication steps.
4. Added a sentence on Jensen's inequality and how it is used in this context. Also added a link to the VI page, which talks about the ELBO.
5. Added the algebraic manipulation, and changed the final form to match the DDPM paper so it's clearer when I discuss the modifications made in the DDPM paper.
6. Selecting hyperparameters is just a choice you make. The DDPM authors don't really motivate it in the paper, although they do provide an ablation, which I've included in the experimental results section.
7. I completely rewrote that section.
8. I tried to rewrite this. The main idea here is that the authors were inspired by a different body of work (score matching methods), and noticed that with a specific parameterization of the reverse trajectory they end up with a score-matching-like objective.
9. As with 5, I tried to introduce this earlier so it doesn't come out of the blue so much.
10. I had f_\mu as the function approximator for the mean in the first paper, but I changed it to \mu_\theta for clarity.
11. I tried to explain how this equation comes from the reparameterization trick. I also added the ablation section to show that this parameterization is more effective.
12. Added a conclusion section.