Comments on the first draft
This is based on the talk and the page.
This is interesting, but I come at it from a different perspective than the one given.
A positive linear function of the reward (r' = a*r + c with a > 0) has the same optimal policy as the original reward.
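One way to see this (my own derivation, assuming an infinite-horizon discounted return with discount factor gamma < 1): if every reward is replaced by r'_t = a*r_t + c with a > 0, the return becomes

G' = \sum_{t=0}^{\infty} \gamma^t (a\, r_t + c) = a \sum_{t=0}^{\infty} \gamma^t r_t + \frac{c}{1-\gamma} = a\, G + \frac{c}{1-\gamma},

so every policy's expected return undergoes the same increasing transformation, and the ordering of policies, and hence the optimal policy, is unchanged.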
You can compute how adding a constant to the reward changes the return (the sum of discounted rewards). It doesn't follow that the algorithms will behave the same, mainly because of the initial Q-values. If you use 0 as the initial Q-values, then I would conjecture that adding a constant so that some of the rewards are positive and some are negative will work better than making them all positive or all negative. (And it will be equivalent to a different initialization of the Q-values.)
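To make the "equivalent to a different initialization" point concrete, here is a minimal sketch (my own toy example, not from the draft): tabular Q-learning on a small made-up MDP, run once with the original rewards and zero initial Q-values, and once with every reward shifted by a constant c and the initial Q-values shifted by c/(1-gamma). With the same random seed the two runs make identical action choices, and their Q-tables differ by exactly c/(1-gamma) at every step.

```python
import numpy as np

def run_q_learning(P, R, gamma, init_q, shift, steps=5000, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning on a small deterministic MDP.
    P[s, a] gives the next state, R[s, a] the reward; `shift` is added to
    every reward, and `init_q` is the constant used to initialize Q."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.full((n_states, n_actions), init_q, dtype=float)
    s = 0
    for _ in range(steps):
        # epsilon-greedy selection; both runs consume the same RNG stream
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        s2 = P[s, a]
        r = R[s, a] + shift
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
    return Q

# A tiny 4-state, 2-action MDP, invented purely for illustration.
rng = np.random.default_rng(42)
P = rng.integers(0, 4, size=(4, 2))   # next-state table
R = rng.normal(size=(4, 2))           # reward table
gamma, c = 0.9, 5.0

Q_orig  = run_q_learning(P, R, gamma, init_q=0.0,               shift=0.0)
Q_shift = run_q_learning(P, R, gamma, init_q=c / (1 - gamma),   shift=c)

# The shifted run is the original run translated by c/(1-gamma).
print(np.allclose(Q_shift - Q_orig, c / (1 - gamma)))  # expected: True (up to roundoff)
```

Shifting the rewards without also shifting the initialization breaks this equivalence, which is exactly why the choice of constant interacts with zero-initialized Q-values.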
Similarly, think about multiplying the reward by a constant.
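The same bookkeeping handles scaling (again my own working, in the infinite-horizon discounted setting): multiplying every reward by a > 0 multiplies every return, and hence every Q-value, by a. With zero initial Q-values the Q-learning iterates scale exactly, since if Q'_t = a Q_t then

Q'_{t+1}(s,a) = a\, Q_t(s,a) + \alpha\big(a\, r + \gamma \max_{a'} a\, Q_t(s',a') - a\, Q_t(s,a)\big) = a\, Q_{t+1}(s,a),

so the greedy (and epsilon-greedy) action choices are unchanged. What scaling does change is how the rewards compare to any nonzero initial Q-values, which is where it can affect behaviour in practice.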
As for adding noise: the reason we use a step size alpha that does averaging is to allow for noise; the noise should average out. I find that alpha_k = 10/(9+k) works well for averaging out the noise in the state-based examples I have tried. I would suggest that learning a model, or using experience replay (where the rewards can be averaged), may work well even for very noisy rewards.
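A small sketch of the averaging point (my own toy numbers, not from the draft): estimating the mean of a very noisy reward with the decaying step size alpha_k = 10/(9+k) drives the noise out, whereas a fixed step size keeps chasing it.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, noise_sd, n = 1.0, 5.0, 10_000
rewards = rng.normal(true_mean, noise_sd, size=n)   # very noisy reward samples

est_decay, est_fixed = 0.0, 0.0
for k, r in enumerate(rewards, start=1):
    alpha_k = 10.0 / (9.0 + k)            # the decaying step size suggested above
    est_decay += alpha_k * (r - est_decay)
    est_fixed += 0.1 * (r - est_fixed)    # fixed step size, for comparison

print(f"decaying alpha estimate: {est_decay:.3f}")  # close to the true mean (1.0)
print(f"fixed alpha estimate:    {est_fixed:.3f}")  # still bouncing around with the noise
```

The decaying schedule satisfies the usual stochastic-approximation conditions (the steps sum to infinity but their squares do not), which is what lets the noise average out; a fixed step size leaves a persistent noise floor.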