Initial Feedback

This looks good for the assignment.

The abstract says "two papers that use Markov Decision Processes for Differentially Private (DP) Reinforcement Learning", but does the first one guarantee DP?
What does "sends" mean in the pseudo-code?
Perhaps a page on Differential Privacy could be used for the next assignment, as it wasn't explained here in a way that is easy to understand.
The conclusion should discuss both papers.