Reading for January 8
Reply to this thread for the January 8th discussion.
1. For repeated data, what does the author mean by "weighted analysis"? Why is it necessary to put different weights on different observations based on the number of observations for each subject?
2. When discussing the additional advantage of analysis of covariance, the author claims that it has greater statistical power, comparing the number of patients needed. I wonder how these exact numbers are obtained.
3. Publication bias is given as one example of regression towards the mean. Does it arise because different methods are used to assess the quality of a paper? How can we solve such a common problem?
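On question 1 above, a minimal sketch of one way a weighted analysis could look. It assumes the weighting means that each subject's pair of means counts in proportion to that subject's number of observations, so a mean based on ten readings counts more than a mean based on two. The function name `weighted_corr` and the data are made up for illustration and are not from the paper.

```python
import math

def weighted_corr(x_means, y_means, n_obs):
    """Pearson correlation of subject means, each subject weighted by n_obs."""
    total = sum(n_obs)
    mx = sum(w * x for w, x in zip(n_obs, x_means)) / total
    my = sum(w * y for w, y in zip(n_obs, y_means)) / total
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(n_obs, x_means, y_means))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(n_obs, x_means))
    syy = sum(w * (y - my) ** 2 for w, y in zip(n_obs, y_means))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical subject means and numbers of observations per subject
x_means = [1.0, 2.0, 3.0, 4.0]
y_means = [2.1, 1.9, 3.5, 3.9]
n_obs = [2, 10, 3, 8]

r_weighted = weighted_corr(x_means, y_means, n_obs)
r_unweighted = weighted_corr(x_means, y_means, [1, 1, 1, 1])  # ordinary Pearson r
```

With equal weights the function reduces to the ordinary correlation of the subject means, so the two results differ only when subjects contribute unequal numbers of observations.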
1. In two of the papers the author claims that initial measurements and change scores (or relative change) are negatively correlated. My simple simulations indicate that this is not always the case.
2. Clarification needed when using change scores (paper 4): Does the author imply the common definition Treatment_Effect = (average change score in Treatment group) - (average change score in Control group)? I do not understand his claim that this treatment effect is affected by regression toward the mean if baseline imbalance exists.
3. Regarding "Analysing controlled trials...": the author appears to be a strong advocate for using ANCOVA. However, I feel he does not emphasize enough, for the intended audience, that several strong assumptions must be satisfied. It would be interesting to investigate the Type I error (H0: no treatment effect) of ANCOVA vs using change scores when the ANCOVA assumption of homogeneity of regression slopes is violated.
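On question 1 above, a small simulation sketch of when baseline and change are negatively correlated. Writing X for baseline and Y for follow-up, cov(X, Y - X) = r*sd(X)*sd(Y) - sd(X)^2, which is negative only when r*sd(Y) < sd(X). The setup below (bivariate normal data, baseline SD fixed at 1, helper names `pearson` and `baseline_change_corr`) is an assumption for illustration, not the author's analysis.

```python
import math
import random

def pearson(a, b):
    """Ordinary sample Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def baseline_change_corr(r, sd_follow, n=20000, seed=1):
    """Simulate baseline (SD 1) and follow-up (SD sd_follow) with correlation r,
    then return the sample correlation between baseline and change."""
    rng = random.Random(seed)
    base, change = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = z1
        y = sd_follow * (r * z1 + math.sqrt(1 - r * r) * z2)
        base.append(x)
        change.append(y - x)
    return pearson(base, change)

equal_sd = baseline_change_corr(r=0.7, sd_follow=1.0)   # r*sd_follow < 1: negative
larger_sd = baseline_change_corr(r=0.7, sd_follow=2.0)  # r*sd_follow > 1: positive
```

So with equal SDs at baseline and follow-up the correlation is always negative for r < 1, but it can turn positive when the follow-up SD is sufficiently larger than the baseline SD, which may be what the simulations picked up.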
1. In paper 1 (Correlation, regression and repeated data), how are the p-values in the table (and the third paragraph) computed?
2. In paper 3, example 1, the author says "The difference between the second mean for the subgroup and the population mean would be approximately r times the difference between the first mean and the population mean". Does "the second mean" here refer to the predicted mean obtained by regressing the second measurement on the first? According to paper 2, it is the predicted value of Y, not the observed value, that is always fewer standard deviations from its mean than X is from its mean.
3. In paper 4, why are the efficiency gains of ANCOVA compared with a change score low when there is a high correlation between baseline and follow-up measurements? How are the efficiency gains measured?
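On question 3 above, a sketch of one common way to measure efficiency: the variance of the treatment-effect estimate under each method, relative to a follow-up-only analysis. The factors used here (1 for follow-up scores, 2(1 - r) for change scores, 1 - r^2 for ANCOVA) are the standard large-sample results when baseline and follow-up have equal SDs and correlation r. Whether the paper's sample size figures were derived exactly this way is an assumption on my part, and `variance_factors` is a hypothetical helper name.

```python
def variance_factors(r):
    """Variance of the treatment-effect estimate relative to a follow-up-only
    analysis, assuming equal SDs at baseline and follow-up and correlation r."""
    return {
        "follow_up": 1.0,
        "change": 2.0 * (1.0 - r),
        "ancova": 1.0 - r * r,
    }

for r in (0.2, 0.5, 0.8):
    f = variance_factors(r)
    # Required sample size scales with variance, so these ratios are also
    # relative sample sizes for the same power.
    change_vs_ancova = f["change"] / f["ancova"]        # equals 2 / (1 + r)
    follow_up_vs_ancova = f["follow_up"] / f["ancova"]  # equals 1 / (1 - r^2)
    print(r, round(change_vs_ancova, 3), round(follow_up_vs_ancova, 3))
```

The change-score to ANCOVA ratio is 2 / (1 + r), which approaches 1 as r approaches 1. That is why the gain over change scores is small at high correlation, while at low correlation change scores do worse than simply analysing follow-up scores.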
1. (Paper 1) The method of finding the correlation between means should allow us to compute a shorter confidence interval for the correlation coefficient (since the mean for a group is based on a number of observations in that group and is different from having only one observation that happens to have the value at the mean). How should we incorporate the number of observations in each group in such a calculation?
2. (Paper 3) The sentence "Even if subjects are not treated the mean blood pressure will go down" is quite confusing. Surely BP can also increase, which would raise the mean. Judging from the second paper, it seems that those comparisons are made in terms of numbers of standard deviations, but they still pertain only to the fitted/estimated slope, not to the observed values.
3. (Last paper) The multiple regression / ANCOVA method uses only the main effects (baseline score and group). I wonder what interpretations we can develop by adding an interaction term. Certainly the estimates of a and b will be different from those obtained in the main effects model.
(Article 1) By using subject means rather than individual observations, are we losing important information? For instance, an anomalous case may disappear when only considering the mean, but this case may still be scientifically relevant (if not statistically relevant).
(Article 3) The paper claims that the blood pressure of an extreme group will surely decrease without treatment due to regression toward the mean. Is this statement dependent on biological variation in blood pressure? In other words, does it assume that subjects with extreme measurements typically have lower blood pressure than what was recorded?
(Article 4) This paper seems to be fairly one-sided. I wonder if there are advantages to using change or follow-up scores versus ANCOVA that the author fails to mention. For example, change scores may be easier to interpret and require fewer conditions than ANCOVA. I also wonder which method is more common in practice.
1. Regarding the pooling of subject data from the “Correlation, Regression, and Repeated Data” paper: would researchers use this incorrect method on purpose so that they may obtain a significant result? If so, how many?
2. What type of data would not have the issue of regression towards the mean?
3. Which medical treatments in use today are deemed significant, when that conclusion in fact rests on poor analyses?
1. (Paper 1) I notice that in the table of simulated data, the values of X for each subject are randomly assigned. However, in reality, repeated measurements on the same subject should be similar, which decreases the within-subject variability. Would this affect the conclusion of the example?
2. (Paper 3) In the second paragraph, the author says that if subjects with hypertension were measured again, their mean would be closer to the mean of the whole population, even if they are not treated. But I think the difference could also become greater than before. Otherwise, after several re-measurements, the mean of the extreme group would be very close to the population mean.
3. (Paper 4) In the last but one paragraph, the author mentions that when the correlation is high (say r>0.8), analysis of change scores is a reasonable alternative. So what happens when the correlation is low (say r<0.2)? Would analysis of follow-up scores be an alternative?
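On question 2 above, a simulation sketch of what happens when an extreme group is re-measured more than once. It assumes a simple model that is not taken from the paper: each measurement is a stable true value plus independent error, scaled so that every measurement has SD 1 and any two measurements on the same subject have correlation r. The function name and parameter values are made up.

```python
import math
import random

def remeasure_extreme_group(r=0.6, cutoff=1.0, n=50000, seed=2):
    """Select subjects whose FIRST measurement exceeds cutoff, then return the
    group's mean on the first, second and third measurements."""
    rng = random.Random(seed)
    sd_true, sd_err = math.sqrt(r), math.sqrt(1 - r)
    sums = [0.0, 0.0, 0.0]
    count = 0
    for _ in range(n):
        true_value = rng.gauss(0, sd_true)
        m = [true_value + rng.gauss(0, sd_err) for _ in range(3)]
        if m[0] > cutoff:  # selection uses the first measurement only
            count += 1
            for i in range(3):
                sums[i] += m[i]
    return [s / count for s in sums]

first, second, third = remeasure_extreme_group()
# Under this model: second is about r * first, and third is about equal to second.
```

Under these assumptions the group mean moves towards the population mean (0 here) once, by roughly the factor r, and then stays put: further re-measurements without new selection do not pull it any closer. Individual subjects can still go up or down; the statement is about the group mean.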
(Correlation, regression, and repeated data): The authors state that, when we have repeated measurements, we can compare the subject means to determine if subjects with high values of X tend to also have a high value of Y. However, are we not also concerned about the variability of the observations from each subject? How does one incorporate this into the analysis?
(Regression towards the mean): How is "regression towards the mean" interpreted for multiple regression? We no longer have the nice interpretation of the slope as in simple regression because we would have a vector of independent variables at each response value.
(Analysing controlled trials ...): Why does "regression towards the mean" negatively impact follow up score analysis and change score analysis when baseline scores are worse in the treatment group?
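On the baseline imbalance question above, a worked sketch under assumptions that are mine rather than the paper's wording: no true treatment effect, equal SDs at baseline and follow-up, a common correlation r, and a chance difference d in baseline means (treatment minus control). X is baseline, Y is follow-up, and b is the slope used for the ANCOVA adjustment. Since E[Y | X = x] = mu + r(x - mu) in both groups:

```latex
\begin{align*}
E\big[\bar Y_T - \bar Y_C \mid d\big] &= r\,d
  && \text{(follow-up scores)} \\
E\big[(\bar Y_T - \bar X_T) - (\bar Y_C - \bar X_C) \mid d\big] &= r\,d - d = -(1 - r)\,d
  && \text{(change scores)} \\
E\big[(\bar Y_T - b\,\bar X_T) - (\bar Y_C - b\,\bar X_C) \mid d\big] &= (r - b)\,d = 0 \ \text{ when } b = r
  && \text{(ANCOVA)}
\end{align*}
```

So when d is not zero and 0 < r < 1, the follow-up comparison keeps part of the baseline difference (r d), the change-score comparison overcorrects it in the opposite direction (-(1 - r) d), and only the adjustment with b close to r removes it. Which direction counts as "worse" depends on whether high scores are good or bad.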
(Paper 1) What are the detailed steps to do the multiple regression to find whether subjects with a high value of X tend also to have a high value of Y?
(Paper 2) The slope is less than 1 if we switch X and Y (the heights of parents and of children), given that they have the same mean and variance. Of course, the slope is still less than 1 when we don't switch them. Does this contradict the notion of "regression toward mediocrity"?
(Paper 4) The author already stated that ANCOVA generally has greater statistical power. Then why don't we just use ANCOVA even when there is a high correlation?
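On the Paper 2 question above about switching X and Y, a small simulation sketch. It assumes standardized, bivariate normal parent and child heights with correlation r = 0.5; the data and the helper `ols_slope` are made up for illustration. The least-squares slope of Y on X is r*sd(Y)/sd(X) and that of X on Y is r*sd(X)/sd(Y), so with equal SDs both equal r.

```python
import math
import random

def ols_slope(x, y):
    """Least-squares slope for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(3)
r = 0.5
parent, child = [], []
for _ in range(20000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    parent.append(z1)                                  # SD 1
    child.append(r * z1 + math.sqrt(1 - r * r) * z2)   # SD 1, correlation r

slope_child_on_parent = ols_slope(parent, child)  # predicts child from parent
slope_parent_on_child = ols_slope(child, parent)  # predicts parent from child
```

Both slopes come out near r, and their product is the squared sample correlation, which cannot exceed 1. The two fits are different lines answering different prediction questions, so a slope below 1 in both directions is what the regression effect predicts rather than a contradiction of it.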