Critique

Edited by author.
Last edit: 05:22, 13 February 2023

I think your use of pseudocode is good, and I can see the contribution of the second paper in making a DP variant! The two papers also worked well together as a unit.

Paper 1

  • Should it be or can be any number of time steps?
  • I was wondering how SAH is computed if S is a set of states and A is a set of actions.
  • What represents is not explicitly mentioned.

Paper 2

  • Related to paper 1: Q-functions and Q-learning are mentioned and a link is given, but they are not really elaborated on. It may help to write a few sentences on Q-learning in the background information.

Minor Corrections

Abstract

  • as well intro the -> as well as the pseudo-code of the two algorithms and their PAC and regret guarantees

Paper 1

  • S a finite set of states -> S is a finite set of states
  • "s.t." could be written out as "such that"
  • for all time and state -> for all times and states

Notible -> Notable

Under Intuition of JDP, there is a [todo] remaining.

SarahChen (talk) 05:17, 13 February 2023

Should it be or can be any number of time steps?*

SarahChen (talk) 05:19, 13 February 2023