Talk:MDP for Differentially Private Reinforcement Learning

From UBC Wiki

Contents

Thread title        Replies   Last modified
Initial Feedback    0         03:41, 15 February 2023
Wiki Critique       0         20:57, 13 February 2023
Critique            1         05:22, 13 February 2023

Initial Feedback

This looks good for the assignment.

  • The abstract says "two papers that use Markov Decision Processes for Differentially Private (DP) Reinforcement Learning", but does the first one guarantee DP?
  • What does "sends" mean in the pseudo-code?
  • Perhaps a page on Differential Privacy could be used for the next assignment (as it wasn't explained here in a way that is easy to understand)
  • The conclusion should talk about both papers
DavidPoole (talk) 03:41, 15 February 2023

Wiki Critique

Overall thoughts:

  • I really enjoyed the wiki article, but I think a bit of reorganization could be very useful. Specifically, I think some sort of background section covering the MDP / DP fundamentals would be good to get out of the way first, so that I can clearly understand the contributions of each of these papers.
  • I thought the abstract was good, but perhaps a more fleshed-out introduction would be nice to have. Specifically, it would be good to prime me with some expectation of why I should be interested in the problem of episodic RL and why DP is important in the context of that problem.

Detailed critiques:

Title Block

  • Introduce the authors and titles of the papers, and transform the raw URL into a link with descriptive text.

Abstract

  • "may contain sensitive information such as personalized medical treatment applications" - what is a medical treatment application?

Paper 1

  • Most of the MDP notation is pretty standard. Perhaps move it to a background section since the MDP terminology is not the specific contribution of the paper
  • Generally in RL the agent interacts with the environment. How does that compare with the interaction with users in the episodic RL algorithm block?
  • In the UBEV algorithm, it would be nice to get a blurb about what the algorithm is before seeing a wall of text. I don’t know what to look for when reading through this algorithm because I don’t have context about what it is doing.
  • You talk about the Q function but you didn’t introduce the Q function in your MDP/RL background section.
  • Add "(JDP)" after the first instance of "Joint differential privacy" so I know what the acronym means.
  • Not sure if it is relevant but I’m curious what the difference between JDP and DP is. Maybe add a sentence about that?

Paper 2

  • I don’t know what a prefix count is
  • I like the intro to the PUCB algorithm. That is exactly what I’d like to see for UBEV
MatthewNiedoba (talk) 20:56, 13 February 2023
Edited by author.
Last edit: 05:22, 13 February 2023

I think your use of pseudo-code is good, and I can see the contribution of the second paper in making a DP variant! It worked well as a unit too.

Paper 1

  • Should it be, or can it be, any number of time steps?
  • I was wondering how SAH is computed if S is a set of states and A is a set of actions.
  • What represents is not explicitly mentioned.

Paper 2

  • Related to paper 1, Q-functions and Q-learning are mentioned and a link is given but it is not really elaborated on. I think it may help to write a few sentences on Q-learning in the background information.

Minor Corrections

Abstract

  • as well intro the -> as well as the pseudo-code of the two algorithms and their PAC and regret guarantees

Paper 1

  • S a finite set of states -> S is a finite set of states
  • s.t. could be written out as such that
  • for all time and state -> for all times and states

Notible -> Notable

Under Intuition of JDP, there is a [todo] remaining.

SarahChen (talk) 05:17, 13 February 2023

Should it be, or can it be, any number of time steps?

SarahChen (talk) 05:19, 13 February 2023