MDP suggestions

Hi Md Abed Rahman, Yaashaar Hadadian Pour, Adnan Reza. Awesome page! It helped me a lot in understanding MDPs. Here are some of the things I needed clarification on:

  1. What is  ? Is it the action at time t?
  2. In the reward and optimal policy section, I couldn't understand how you got the values 1.6284, 0.4278, 0.0850, etc.
  3. For value iteration, could you provide some explanation of the pseudocode, like the one present in the policy iteration section?
  4. As Jiahong Chen has mentioned, a few practical examples would be fun to read.
SamprityKashyap (talk) 04:15, 5 February 2016

Thanks, Samprity, for your feedback. We will definitely try to make the necessary improvements. Any future comments and/or feedback will be greatly appreciated.

MDAbedRahman (talk) 04:46, 5 February 2016

Also, I forgot to address your points earlier. Here they are; we will also try to incorporate them into the wiki.

Regarding question 1: yes, it is the action at time t picked by the policy.

Regarding question 2: the values there were picked to show how rewards for traversing nodes, ranging from highly punishing to highly rewarding, can change the optimal policy. The value ranges are mostly empirical here and calibrated only to this situation (see the sketch below). We are not sure whether putting an explanation of this in the wiki would be useful; if you have any insights on that, we would be happy to hear them.

Regarding questions 3 and 4: we will definitely try our best :)
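
To make question 2 (and part of question 3) more concrete, here is a minimal Python sketch of value iteration on a toy MDP. The state names, rewards, and discount factor below are made up purely for illustration; they are not the values used on the wiki page.

 # Toy MDP: from "start" you can either take a shortcut through a "risky"
 # node or a two-step detour through "safe" nodes. All names and numbers
 # here are illustrative only, not the values from the wiki page.
 GAMMA = 0.9
 
 def value_iteration(states, actions, transitions, gamma=GAMMA, theta=1e-6):
     """Apply the Bellman optimality update until values stop changing."""
     V = {s: 0.0 for s in states}
     while True:
         delta = 0.0
         for s in states:
             if not actions[s]:  # terminal state: its value stays 0
                 continue
             best = max(
                 sum(p * (r + gamma * V[s2]) for s2, p, r in transitions[(s, a)])
                 for a in actions[s]
             )
             delta = max(delta, abs(best - V[s]))
             V[s] = best
         if delta < theta:
             return V
 
 states = ["start", "risky", "safe1", "safe2", "goal"]
 actions = {"start": ["shortcut", "detour"], "risky": ["go"],
            "safe1": ["go"], "safe2": ["go"], "goal": []}
 
 # Vary the reward for traversing the risky node and watch the policy flip.
 for r in (-2.0, 0.0, 2.0):
     # transitions[(state, action)] -> list of (next_state, probability, reward)
     T = {("start", "shortcut"): [("risky", 1.0, 0.0)],
          ("start", "detour"):   [("safe1", 1.0, 0.0)],
          ("risky", "go"):       [("goal", 1.0, r)],      # the varied reward
          ("safe1", "go"):       [("safe2", 1.0, -0.1)],  # small step cost
          ("safe2", "go"):       [("goal", 1.0, -0.1)]}
     V = value_iteration(states, actions, T)
     q = {a: sum(p * (rr + GAMMA * V[s2]) for s2, p, rr in T[("start", a)])
          for a in actions["start"]}
     print(f"traversal reward {r:+.1f} -> best start action: {max(q, key=q.get)}")

Running this prints the detour as the best start action when the traversal reward is punishing (-2.0) and the shortcut once it is neutral or rewarding, which is the kind of policy flip the numbers on the page are meant to demonstrate.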

MDAbedRahman (talk) 04:56, 5 February 2016

Thanks for the clarifications! I thought you might have used some equations to arrive at those values.

SamprityKashyap (talk) 06:06, 5 February 2016