File:Q-Learning on Taxi-v2 with Shifted Reward.png

From UBC Wiki

Original file (843 × 647 pixels, file size: 79 KB, MIME type: image/png)

Summary

Description
English: Cumulative true reward from the environment, plotted as a function of training episode. Runs with perfect reward observation (beta0=0, blue) and runs with negative reward shifts converge to the global optimum, while runs with positive shifts get stuck in a local optimum.
Date 15 April 2019 (2019-04-15)
File source Own Work
Author NamHeeKim

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
Permission is granted to copy, distribute and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).

File history


Version | Date/Time | Dimensions | User | Comment
current | 05:37, 19 April 2019 | 843 × 647 (79 KB) | NamHeeKim | User created page with UploadWizard