February 13th Questions
1) For the first example, should each of the physician specific scores be normalized to account for the different range of scores given?
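One way to frame the question: a minimal sketch of per-physician z-score normalization, using hypothetical scores (the data and ranges below are illustrative, not from the actual example), so each physician's scores are placed on a common scale regardless of the range they use.

```python
import numpy as np

# Hypothetical scores: each row is one physician's ratings of the same cases.
# The physicians use visibly different ranges, so raw scores are not comparable.
scores = np.array([
    [2.0, 3.0, 4.0, 5.0],    # physician A stays in roughly 2-5
    [1.0, 4.0, 7.0, 10.0],   # physician B spreads across 1-10
])

# Per-physician z-score normalization: subtract each physician's own mean
# and divide by their own standard deviation, removing range differences.
means = scores.mean(axis=1, keepdims=True)
stds = scores.std(axis=1, ddof=1, keepdims=True)
normalized = (scores - means) / stds
```

After this step each physician's row has mean 0 and unit variance, which is the strongest version of the adjustment; the Q1 follow-up (centering or scaling only) would drop one of the two operations.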
2) Could we add a Wilcoxon signed-rank test alongside these visual displays to account for non-linear association?
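For reference, a minimal sketch of the paired test in SciPy, on hypothetical scores from two physicians rating the same ten cases (the data are invented for illustration). The signed-rank test checks for a systematic shift between the paired scores without assuming normality.

```python
from scipy.stats import wilcoxon

# Hypothetical paired scores from two physicians rating the same ten cases.
physician_a = [3, 5, 2, 6, 4, 7, 3, 5, 6, 4]
physician_b = [4, 7, 1, 7, 5, 9, 4, 6, 8, 5]

# Wilcoxon signed-rank test on the paired differences: a nonparametric
# check for a systematic location shift between the two physicians.
stat, p_value = wilcoxon(physician_a, physician_b)
```

Note that this tests for a shift in location rather than measuring association directly; a rank correlation (e.g., Spearman's) could complement it if the interest is strength of monotonic association.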
3) In measuring the agreement between physicians in the last example, should each disagreement be weighted the same as each agreement? That is, should every disagreement detract more from our measure of agreement than each agreement adds to it?
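One standard way to let the weighting of disagreements differ is a weighted kappa; a minimal sketch with scikit-learn, on hypothetical ordinal ratings (the data are invented for illustration):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal ratings (e.g., severity 1-4) from two physicians
# on the same ten cases.
rater_1 = [1, 2, 3, 4, 2, 3, 1, 4, 2, 3]
rater_2 = [1, 3, 3, 4, 2, 2, 1, 4, 3, 3]

# Unweighted kappa treats every disagreement identically; linearly
# weighted kappa penalizes a disagreement in proportion to its size,
# so near-misses count against agreement less than large discrepancies.
unweighted = cohen_kappa_score(rater_1, rater_2)
weighted = cohen_kappa_score(rater_1, rater_2, weights="linear")
```

The choice of weights (none, linear, quadratic) is exactly the question of how much each disagreement should detract relative to each agreement.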
Q1) Rather than normalizing, would there be an advantage to only scaling or centering the physician-specific scores?
Q2) Could we look into a between- vs. within-physician effect for the scores as a way to better understand the differences?
Q3) Should we consider the reliability of the scores when carrying out the analyses?
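For Q2 and Q3 above, a minimal sketch of a one-way variance decomposition into between- and within-physician components, with a rough ICC(1)-style reliability estimate; the scores are hypothetical and the one-way random-effects layout (rows = physicians, columns = repeated ratings) is an assumption for illustration.

```python
import numpy as np

# Hypothetical scores: rows = physicians, columns = repeated ratings.
scores = np.array([
    [3.0, 4.0, 3.5, 4.5],
    [6.0, 7.0, 6.5, 7.5],
    [2.0, 2.5, 3.0, 2.5],
])

n, k = scores.shape
grand_mean = scores.mean()
physician_means = scores.mean(axis=1)

# Between-physician mean square: spread of physician means about the grand mean.
ms_between = k * np.sum((physician_means - grand_mean) ** 2) / (n - 1)
# Within-physician mean square: spread of each physician's own scores.
ms_within = np.sum((scores - physician_means[:, None]) ** 2) / (n * k - n)

# ICC(1): one-way random-effects reliability of a single rating.
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Comparing the two mean squares addresses Q2 (are differences mostly between physicians or within each physician's own ratings?), and the ICC gives one conventional answer to Q3's reliability question.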