Course talk:STAT 550 2013
Contents
| Thread title | Replies | Last modified |
|---|---|---|
| Birth Attitudes Survey | 11 | 07:24, 26 March 2013 |
| Questions regarding Dialysis study | 9 | 06:39, 5 March 2013 |
| Recommendations for Hart Banach | 10 | 20:13, 31 January 2013 |
| Questions regarding Juvenile Arthritis Study | 9 | 07:48, 31 January 2013 |
| Jan 24: Assignment 2 Questions | 10 | 20:08, 24 January 2013 |
| Basic Instructions | 2 | 03:54, 13 January 2013 |
| Reading for January 8 | 9 | 02:14, 8 January 2013 |
| Readings for discussion on Jan. 8 | 0 | 21:44, 2 January 2013 |
Please post three questions
1. Is the average taken for each bullet point or for each "attitude/belief" category? The latter seems weird since the questions in a category are quite different and it doesn't make sense to average their responses.
2. Is "correlation structure of the average scores" referring to the correlation among the 15 categories?
3. In what way do you judge validity? Is there a "gold standard" for a valid survey for attitudes?
1. Should "no opinion" be counted as a score of 4, or should it be excluded when calculating averages? The former results in less variation.
2. Is it reasonable to assign the same step size between adjacent responses? For example, people may be reluctant to choose "strongly agree", which might imply that "strongly agree" should carry more weight.
3. Aren't there too many categories? For example, do people really distinguish between "mildly agree" and "agree"?
1. What is a "provider group"?
2. What do you mean by "establish the validity of the survey"?
3. Are the questions within each set equally important? If not, weighted averages should be used.
1. If the researchers wish to take an average over each set of questions, the questions in that set should be measuring the same thing. For instance, in the maternal choices section, some of the statements seem to represent contradictory views.
2. Are six questions enough to measure the attitudes on one topic? Increasing the number of questions that measure the same thing might improve the validity of the survey.
3. Since the data are correlated (multiple responses from one subject), a linear mixed effects model may be a good way to compare provider groups. In this case, the response is the raw scores (not averaged) with predictors question area and provider group.
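A minimal sketch of the mixed-effects comparison suggested in point 3, assuming a long-format table with hypothetical columns score (the raw 1-7 response), provider_group, question_area, and respondent_id; the file name is a placeholder.
```python
# Minimal sketch only: column and file names below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey_long.csv")  # placeholder file with one row per response

# Fixed effects for provider group and question area; a random intercept per
# respondent accounts for the correlation among one respondent's answers.
model = smf.mixedlm("score ~ C(provider_group) + C(question_area)",
                    data=df, groups=df["respondent_id"])
result = model.fit()
print(result.summary())
```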
- How do the responses align with the attitudes/beliefs of a set of questions? For example, if the average score for a group of providers is 7 (strongly agree) for the "maternal choices" section, what kind of attitude does this imply?
- Can you clarify what you mean by "correlation structure of the average scores to possibly identify underlying factors or groupings"? What correlation are you interested in (correlation of average scores for a particular question, average scores for a particular section, etc.)?
- What does it mean in this case to "validate the survey as a measure of attitudes"?
1. Is the degree of internal consistency of the items in each of the 15 sets (e.g. the maternal choices set) large enough?
2. What are the Cronbach's alpha values for each set of questions?
3. Is it good to include highly similar items? For example, "For a woman, having a naturally managed birth is a more empowering experience than delivering by cesarean section." and "Women who deliver their baby by cesarean section miss an important life experience."
1. If the researchers wish to use the survey as a measure of "attitude" via some kind of scoring system derived from the responses, what kind of attitude does a perfect score correspond to?
2. How can two people's responses to the same question be compared fairly when what people internally define as "strongly agree" varies across individuals?
3. What does it mean to 'validate' a survey, and how do you do that?
1. When calculating the average for each set of questions, is it reasonable to give each question in the set the same weight? Some questions may reflect the attitude measured by that set better than others.
2. From the attached article, maybe we can use Cronbach's alpha to test whether the questions within each set are related to each other (a small computational sketch follows this post). What is an appropriate sample size for calculating this coefficient?
3. What is the cut-off numerical score that distinguishes the overall attitude (disagree, agree, perhaps no opinion) for each set of questions? Is a chi-squared test helpful here?
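A minimal sketch of the Cronbach's alpha calculation mentioned in point 2 above, assuming a respondents-by-items array of numeric scores for one set of questions with no missing values; the demo responses are made up for illustration only.
```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items array of scores."""
    k = items.shape[1]                         # number of items in the set
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up responses from 5 people on a 4-item set, for illustration only.
demo = np.array([[7, 6, 7, 6],
                 [4, 4, 5, 4],
                 [2, 3, 2, 2],
                 [6, 5, 6, 7],
                 [3, 3, 4, 3]])
print(round(cronbach_alpha(demo), 3))
```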
1. They say they intend to compare provider groups. Do they compare these groups in each of the 15 areas, or just take an average over the 15 areas? Here, taking a simple average might lead to a misleading conclusion.
2. At the end, they talk about the validity of their survey. I'm quite confused about the concept of "validity" here. Does it mean they hope to show that this survey indeed reflects respondents' real attitudes towards surgical delivery?
3. Are the participants able to distinguish the seven possibilities listed in the questionnaire? When I fill in such questionnaires, I often have trouble telling "disagree" from "strongly disagree".
1. By using the average of these scores (i.e. agree, disagree, and so on), would it be realistic to validate this survey?
2. How could they identify groupings by using the correlation structure of these categories' average scores? I think it is sometimes difficult to distinguish among agree, strongly agree, and mildly agree.
3. Besides these two sets of questions, is it possible to add some quantitative measurements? I think quantitative measurements would help in making a concrete decision about the safety of childbirth.
Please supply three questions to clarify the choice of appropriate analysis
1. How is "sudden drop of blood pressure" defined?
2. Does the sequence of administration of dialysate at different temperatures have confounding effects? In other words, if a patient is assigned to the "cold" group, will (s)he be more/less likely to develop hypotension during the subsequent "hot" period compared with someone who is given warm dialysate at the beginning?
3. From the sample data it seems that all patients were monitored from late May to early August. Depending on the location of the study, the ambient temperature may change significantly during that period, and as such "room temperature" could be quite variable throughout the study if it is not kept more or less constant.
1. There are a lot of missing data here, which is likely to bias the estimates if we simply discard them. How should we handle such a large amount of missing data?
2. In this case, is it necessary to consider a clustering effect? How should the intraclass correlation coefficient be estimated?
3. How should the sample size be decided here, since no information about power and effect size has been provided?
1. How is an episode of hypotension defined? Is there some threshold change in blood pressure that triggers an episode?
2. Are the investigators concerned with changes in blood pressure at different time points during dialysis? If not, is it appropriate to simply average blood pressure measurements taken through the course of one treatment?
3. Did the two groups receive the same number of cold and room temperature treatments?
1. Is it better to let one group merely receive warmed dialysate and the other merely receive room temperature dialysate?
2. Is the temperature of the warmed dialysate kept the same for each patient and in each time during the whole study period?
3. Should some other clinical (e.g. kidney treatment history) and demographic information be considered?
1. How can a sudden drop in blood pressure be measured?
2. The nephrologists hypothesize that the drop in blood pressure is related to internal body temperature. Is there any scientific evidence or prior statistical support for their hypothesis?
3. Why was blood pressure recorded for only up to 4 hours, and not longer? Are the blood pressure measurements correlated?
1) The "date" column in the data suggests that each subject underwent dialysis sessions on the same days (roughly speaking). Was this intentional such that we must incorporate the exact date into the analysis (compared with an analysis where order of trials didn't matter)?
2) Within each session, do the researchers expect a (significant) time-varying effect of dialysis on BP?
3) Since there are unequal numbers of hot/cold treatments per subject, is the rate of incidence of hypotension the right measure to consider (which suggests some kind of Poisson regression)?
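A minimal sketch of the Poisson-rate idea in point 3, assuming one row per subject-by-temperature combination with hypothetical columns episodes (count of hypotensive episodes), sessions (number of dialysis sessions at that temperature), and temperature; the file name is a placeholder. A fuller version would also respect the repeated measures per subject (e.g. a random effect or GEE).
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("dialysis_counts.csv")  # placeholder: one row per subject x temperature

# Poisson regression with log(sessions) as an offset, so the temperature
# coefficient compares episode *rates* per session rather than raw counts.
model = smf.glm("episodes ~ C(temperature)", data=df,
                family=sm.families.Poisson(),
                offset=np.log(df["sessions"]))
print(model.fit().summary())
```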
1. What measure of blood pressure are you interested in? When talking about “drops” in blood pressure, we must be referring to one number.
2. Was blood pressure recorded strictly every hour, or were the recording times lenient?
3. Are you interested in the magnitude of the incidence of hypotension, or just the occurrence?
- Were other potentially confounding variables also recorded for the patients (e.g. age, sex, etc.) and controlled for during the study? If not, can we check to see if there were significant differences between the groups in the study?
- How rare is hypotension normally? If it is a rare occurrence, then we may, by chance, observe many instances of the adverse effect at one of the temperatures and mistakenly attribute the temperature as an indicator of hypotension.
- Checking Wikipedia, there seem to be three potential disadvantages to a cross-over study: (a) the order of the treatments may affect the results, (b) there may be a "carry-over" effect of one treatment into the next, and (c) a learning effect, which probably will not affect this study. How will issues (a) and (b) be addressed in the study?
1. Is the sudden drop in blood pressure defined as an absolute number or a percentage? Is it compared with the previous record or the baseline record? And should drops in both systolic and diastolic blood pressure be taken into consideration?
2. If we want to explore whether temperature has any effect on blood pressure, should the normal fluctuation of blood pressure without dialysis for each subject also be recorded as control?
3. Are you interested in whether an episode occurs or in when the sudden drop happens? That is, if hypotension occurs in two subjects but arrives at different times, are they treated the same or not?
Please contribute two recommendations to make to Dr. Banach
1. The proportion of students that are “represented” by the funds of knowledge that are incorporated in a program may affect student success. As a result, these proportions should be kept constant amongst the programs. Or, if enough programs are held, the proportions can vary and be considered as a covariate, perhaps telling us how this proportion impacts student success.
2. A good measure of “student success” is their enthusiasm about learning and school. A good way to quantify this is through student participation – namely, the number of times a student “raises their hand”. This is similar to “attendance” data but has the benefits that 1. it is controlled by the student (not influenced by the parents), 2. data can be collected in a smaller time frame, and 3. an upper limit is not really an issue.
1. It would be a good idea to clearly state the research question(s). From there, we can work on defining the variable(s) of interest we may want to measure to answer the research question(s), which will lead to the statistical techniques we may want to use. Then the design and scope of the study can be created with these specified goals while taking into consideration any restrictions we may have. This was discussed in Tuesday's class, but it would be a good idea to make everything as concrete as possible before moving forward.
2. It is probably very likely that there will be students that "drop out" of the program. In this study, I believe that students that "drop out" from the study may be intrinsically different than students that choose to stay in the study (e.g. students that drop out of the study are less likely to try in school than those that do not). Therefore, the assumption that units drop out randomly will probably not hold in this study, and it would be wrong to base results on just the students that remained in the study. Careful consideration should be taken in order to retain as many students as possible in the study.
1. I don't think it's a good idea to hold several programmes concurrently or spaced closely in time. Otherwise, since the proportion of vulnerable students is (presumably) small, it will mean that students may be engaged in different activities at almost the same time, and it will be difficult to gauge which one actually contributed to the variables we are planning to use.
2. Getting more schools involved seems to be a necessity. There might be some school-specific effects that one will not be able to differentiate from the true experimental effect if only one school is considered. This will still not eliminate teacher effect though. How about treating that as a random effect?
1. We discussed several designs in class, but not all designs answer the same research question. It is important to clearly define your research question so that the right design can be selected. For instance, if you hope to prove that curriculums built around students’ funds of knowledge improve their ability to learn (measured by some quantitative factor), then the design where all participants partake in a program seems fitting.
2. It may be useful to consider quantitative measures that you could reasonably expect to see a difference in between the treatment and control groups. The larger the expected difference, the greater the statistical power of your analysis. Since you mentioned that your sample size is relatively fixed around 40 students, choosing a metric that will reflect a difference between the two groups is important.
1. Since the study lasts quite a long time, quite a few subjects might drop out of the study. Please keep a detailed record of the reasons they drop out, since this information plays an important role in the statistical analysis of missing data.
2. In a typical treatment-and-control study, the subjects themselves do not know which group they have been assigned to, as we saw in the acupuncture study. For this study, the researcher seems to pay little attention to this point. In my opinion, this design might lead to bias, that is, differences produced by factors other than the treatment itself.
1. As the main objective of the study is to inform steps in policy implementation, I think it would be better to include schools in several different areas rather than just one (stratified sampling).
2. The subjects should be selected using a probability sampling procedure rather than subjective sampling.
1. Carefully and clearly choose and define one or several variables that can directly reflect the research purpose. For example, if the purpose is to see whether students benefit from certain approach of teaching, then variables that measure students' performance would be good candidates.
2. Once the subjects (students) are voluntarily enrolled in the study, it's better to randomize them into control group and treatment group instead of grouping them according to certain criterion. It's also important to blind them to their grouping label.
1. Before designing the study and collecting data, it is better to clearly state the purpose of the research. Even if there are several potential topics to explore, pick one as the primary research question and treat the others as secondary.
2. Maybe it is better to choose some funds of knowledge yourself and let the children check off the ones that apply, instead of asking them to come up with their own, which may cause response bias. In this way, you can design the program or curriculum beforehand and set up better quantitative measurements of the effect.
1. Design.
Two programs: curriculum based on "funds of knowledge" (Program T) vs. traditional curriculum (Program C).
Two types of subjects: children with (or with more) funds of knowledge (Group F) vs. children with no (or fewer) funds of knowledge (Group !F).
Implementation:
Round 1: Program T and Program C run at the same time. Randomize (equal) numbers of Group F and Group !F subjects into each program. Conduct some kind of entry and exit measures related to learning (m_i and m_e; let d = m_e - m_i), balanced across groups and programs.
Analysis:
- Potential to compare the effect of Program T vs. C (i.e. compare d_T vs. d_C) --> hypothesis: the program developed around funds of knowledge is better for children on this particular measure. (A small sketch of this Round 1 comparison appears after this post.)
- Potential to compare the effect of having more funds of knowledge vs. no (or fewer) funds of knowledge (i.e. compare d_F vs. d_!F) --> stretching this a bit: children with more funds of knowledge tend to do better on this measure.
Round 2: Switch the treatment (program) for each individual. Conduct the same entry and exit measures. Can do the same comparisons as above.
IN ADDITION, ask the question: "Which program increased your interest in learning?" Since the order of the programs is randomized for all participants, I believe this could be a valid question.
(Am I too ambitious? Sample size could be a problem (and budget)).
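A minimal sketch of the Round 1 comparison of d = m_e - m_i between Program T and Program C described above, assuming hypothetical columns program, entry, and exit; the file name and the choice of a Welch t-test are illustrative assumptions, not part of the original proposal.
```python
import pandas as pd
from scipy import stats

df = pd.read_csv("round1_measures.csv")  # placeholder: one row per child
df["d"] = df["exit"] - df["entry"]       # change from entry to exit measure

d_t = df.loc[df["program"] == "T", "d"]
d_c = df.loc[df["program"] == "C", "d"]

# Two-sample comparison of the mean change between programs.
t_stat, p_val = stats.ttest_ind(d_t, d_c, equal_var=False)
print(f"mean d_T - mean d_C = {d_t.mean() - d_c.mean():.2f}, p = {p_val:.3f}")
```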
1. Be careful with survey questions measured on a "strongly disagree, disagree, ..., strongly agree" scale: kids are whimsical and might not respond truthfully, or might not know how to answer this kind of question properly.
2. Although the effect may vary across baseline performance levels, it should still be valid to use a quantitative measure. Maybe we can alleviate this problem by constructing an unbounded measure instead of the typical test scores, which have upper bounds (e.g. the number of flowers planted in one minute).
Please supply two questions
1. What “patterns” of corticosteroid dosings are you considering?
2. Which is more important to monitor – height or weight?
1. Is there a measure for the seriousness of the arthritis? For example, can we quantitatively compare arthritis in different children with some sort of "arthritis score"? Does this affect dosing?
2. What is the protocol for the dosing? (Does a child receive a dosing amount proportional to his/her body weight or age? Does a child receive a more frequent dosing schedule if they are older? Etc.)
1. How long are the "longer breaks"? Growth effects may take considerable time to manifest themselves. If the longer breaks occur near the end of the study, we may not be able to tell whether reduced corticosteroid use would mitigate those effects.
2. How is the dosing determined in clinical charts?
1. Are doses of corticosteroid equally effective when ingested versus injected?
2. What are the key units of measurement for dosage patterns, and which is of the greatest interest? (i.e. dosage amount, dosage frequency)
3. What difference in growth would be scientifically relevant between a JRA case and a similar child without JRA?
1. Taking corticosteroids might have a beneficial effect on weight gain but an adverse influence on growth in height. How, then, should the effect of this treatment be evaluated?
2. How is the pattern of dosing decided? Does it depend only on the doctor's decisions, or also on the subject's physical condition? Does this variation in dosing between physicians result in bias in the final analysis?
1. Data structure: there are variables that change with time and variables that don't. It is possible to record a list of dates and the corresponding changes, but that dataset becomes very big very quickly. Is this too much information?
2. Analysis of the data: what analysis will be used? Data storage could be more efficient if it is designed around an analysis plan.
1. The patterns of dosing vary between physicians; can the patterns be classified into several categories?
2. JRA has a number of particular disease types. Are the rheumatologists interested in investigating the effects of each individual type, or in investigating JRA overall?
1. Do you want to analyze weight and height together (say, as BMI) or separately? Similarly for the frequency and dose of corticosteroid: do you want to analyze them together as dose per week/month, or treat them as two explanatory variables?
2. I wonder whether they will also collect height and weight data from juveniles without rheumatoid arthritis, because I assume there might be strong effect modifiers, such as diet and age (juveniles usually grow faster in height and weight at certain ages). So instead of putting many potential variables/confounders into the model, it may be better to control for them in the data collection (a case-control or cohort design?).
I took the liberty of starting a discussion. Post your questions here!
The trial used “semi-standardized” acupuncture. It seems like the success of acupuncture in the treatment group would depend on initial diagnostics – since this determined the course of treatment. This might lead to more variable results.
How comparable was the pilot trial used for sample size calculations? It seems like the pilot used TCM, which the paper suggests is more effective than semi-standardized acupuncture.
Is it really true that age does not impact migraine symptoms and severity? I wonder if the difference in age between the treatment and control groups may have introduced bias into the study.
Due to the strict inclusion and exclusion criteria, the number of subjects (sample size) is quite small. In this case, does it make sense to compute p-values in order to do hypothesis testing?
In both the real acupuncture group and the sham acupuncture group, quite a few participants dropped out during the course of treatment. The authors simply discarded the missing data and dropouts. Does this simple approach introduce bias into the estimates?
Just to clarify, was the sample size determined with a targeted significance level 0.05 and power 0.80 using the asymptotic result that a difference in proportions is normally distributed when n is large?
If yes, was the sample size, n, deemed large enough?
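For reference, a minimal sketch of the normal-approximation sample-size formula being asked about here, for comparing two proportions at a 5% two-sided level and 80% power; the response rates p1 and p2 below are placeholders, not values taken from the trial.
```python
from scipy.stats import norm

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Per-group sample size for detecting p1 vs p2 with a two-sided z-test."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * var / (p1 - p2) ** 2

print(round(n_per_group(0.35, 0.55)))  # illustrative inputs only
```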
The authors were not exactly clear on how they did their sample size calculations. Did they calculate a confidence interval (what they call an "equivalence range" (?) ) for the estimate in the pilot study, and then use the end points of that estimate in their sample size calculation?
1. On page 524, "Statistical analysis" section: What do the authors mean by "The equivalence range was 11.8-57.9%"?
2. On page 524, "Patients" section, last paragraph: Why do the authors test for mean age difference between groups? Are there even "true means" here to compare?
3. On page 524, "Efficacy and long-term follow-up" section: The authors found that "pain" had significantly reduced within each group after treatment. Is this due to regression towards the mean? How can we tell?
1. On page 526 ("Efficacy"), are the differences between the real and sham acupuncture groups judged statistically insignificant for a number of pain parameters just by looking at the graphs (Figs. 2, 3, and 4), without any formal statistical inference? This is not clear to me; how is that possible?
2. On page 524, the authors used an unpublished pilot trial to determine the sample size for the study, but they do not specify how long ago the pilot study was done. Is the pilot trial valid?
1. In "Statistical Analysis" section, p524, what is "equivalence range"?
2. In "Statistical Analysis" section, p524, how did they get beta=0.2?
3. In question 4 of assignment 2, how can we do ANCOVA based on the paper which does not give us the raw data?
My question relates more to the assignment than to the technical details: What is the level of statistical knowledge we should expect of the clients? This may have impact on how we explain things in the report.
1. ANCOVA. In order to suggest a suitable sample size for an ANCOVA of the change in headache frequency, we need to estimate the variance of the error term, which can be estimated from the sum of squares error (SSE) after fitting the regression. The paper does not provide paired data on each patient: how do we give a reasonable estimate?
2. ANCOVA. Re: the treatment effect used in the power calculation. One way to interpret the effect is as the difference in migraine frequency between the treatment and control groups given the same baseline value: what effect size is of scientific interest? If there is no way to get this question answered by the time the consulting report is due, is it good enough to present sample sizes calculated under several scenarios (perhaps shown graphically)? A rough sketch of such a scenario table follows this post.
3. Change Scores Analysis. What is an effect size (difference in mean change) of scientific interest?
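A rough sketch of the scenario idea in point 2, assuming equal-sized groups and that the ANCOVA residual SD is approximately sigma*sqrt(1 - rho^2), where sigma is the follow-up SD and rho the baseline/follow-up correlation; the sigma, effect sizes, and correlations below are placeholders to be replaced by values justified from the paper or the client.
```python
from scipy.stats import norm

def n_ancova(delta: float, sigma: float, rho: float,
             alpha: float = 0.05, power: float = 0.80) -> float:
    """Per-group n for an ANCOVA treatment effect delta, given follow-up SD sigma."""
    sd_resid = sigma * (1 - rho ** 2) ** 0.5   # approximate ANCOVA residual SD
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * sd_resid / delta) ** 2

sigma = 2.5  # placeholder SD of follow-up migraine frequency (days/month)
for delta in (0.5, 1.0, 1.5):
    for rho in (0.3, 0.5, 0.7):
        print(f"delta={delta}, rho={rho}: n per group ~ {n_ancova(delta, sigma, rho):.0f}")
```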
1. On page 524, in "statistical analysis" section, is the 15% drop-out rate a generally assumed value in such type of trials, or is it based on result from previous acupuncture trials?
2. On page 524, in "associated symptoms" section, how is the Tukey test conducted?
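The paper does not spell out its computation, but as a point of comparison, a minimal sketch of how a Tukey HSD comparison is typically run, assuming a long-format table with hypothetical columns symptom_score and group; the file name is a placeholder.
```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("associated_symptoms.csv")  # placeholder long-format data
result = pairwise_tukeyhsd(endog=df["symptom_score"], groups=df["group"], alpha=0.05)
print(result.summary())  # all pairwise group comparisons with adjusted intervals
```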
1. The researchers used chi-squared and Fisher's exact tests for comparisons; I wonder whether they could also fit some models for this study.
2. On page 522 ("Real acupuncture treatment"), it is pointed out that possible sensations were explained to the patients in the real treatment group, but not in the sham group. If patients ever talked with each other, could they figure out whether they received the real or sham treatment? Would this influence the outcome?
Add your discussion contributions by clicking on the Reply button relating to the appropriate thread. Use the "Save page" button at the bottom to save your work. I suggest creating your submission first in your favourite text editor or word processing program and then pasting it in, rather than doing all the editing on-line. After you've pasted in your contribution use "Show preview" to see what the formatted version of the page will look like after you've saved and then save. It is not necessary to sign your name for the purposes of this discussion, but I will ask people to submit their contributions to me by e-mail as well as posting it here. However, contributions are not anonymous because anyone can track who's edited the page by using the "History" tab above.
In the first article, what method would be used when there are only two subjects, with repeated pairs of observations on each?
In the third article, the author states that, to compare two methods of measurement, a regression analysis was done with reported weight as the outcome variable and measured weight as the predictor variable. The slope of the regression was less than 1 in each study; the mean reported weight of heavy subjects was less than their mean measured weight, while the mean reported weight of light subjects was greater than their mean measured weight. How does this come about?
In the last article, the author states that the efficiency gains of ANCOVA over a change-score analysis are low when there is a high correlation (say r > 0.8) between baseline and follow-up measurements. What would be the case when the correlation is not high?
Reply to this thread for the January 8th discussion.
1. For repeated data, what is meant by a "weighted analysis"? Why is it necessary to put different weights on different observations based on the number of observations for each subject?
2. When discussing the additional advantage of analysis of covariance, the author claims that it has greater statistical power by comparing the numbers of patients needed. I wonder how these exact numbers are obtained.
3. Regarding publication bias, given as one example of regression towards the mean: does the bias arise from the different methods used to assess the quality of a paper? How can we address such a common problem?
1. In two of the papers, the authors claim that initial measurements and relative change/change scores are negatively correlated. My simple simulations indicate that this is not always the case.
2. Clarification needed on the use of change scores (paper 4): does the author mean the common definition, treatment effect = (average change score in treatment group) - (average change score in control group)? I do not understand his claim that this treatment effect is affected by regression toward the mean when baseline imbalance exists.
3. Regarding "Analysing controlled trials...": the author appears to be a strong advocate of ANCOVA. However, I feel he does not emphasize enough, for the intended audience, that several strong assumptions must be satisfied. It would be interesting to investigate the Type I error (H0: no treatment effect) of ANCOVA vs. change scores when the ANCOVA assumption of homogeneous regression slopes is violated. A rough simulation sketch follows this post.
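A rough simulation sketch of the investigation proposed in point 3: data generated under no treatment effect but with a different baseline slope in each arm, analysed both by ANCOVA (common slope) and by a change-score t-test; all settings (slopes, variances, n, number of replicates) are illustrative assumptions.
```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha = 50, 2000, 0.05
rej_ancova = rej_change = 0

for _ in range(reps):
    base = rng.normal(0.0, 1.0, 2 * n)                    # randomized baseline
    group = np.repeat([0, 1], n)
    slope = np.where(group == 0, 0.5, 0.9)                # heterogeneous slopes
    follow = slope * base + rng.normal(0.0, 1.0, 2 * n)   # no treatment shift
    df = pd.DataFrame({"base": base, "follow": follow, "group": group})

    # ANCOVA with a common slope (the misspecified model).
    fit = smf.ols("follow ~ base + C(group)", data=df).fit()
    rej_ancova += fit.pvalues["C(group)[T.1]"] < alpha

    # Change-score analysis: two-sample t-test on follow - base.
    change = follow - base
    rej_change += stats.ttest_ind(change[group == 0], change[group == 1]).pvalue < alpha

print("ANCOVA Type I error:      ", rej_ancova / reps)
print("Change-score Type I error:", rej_change / reps)
```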
1. In paper 1 (Correlation, regression and repeated data), how are the p-values in the table (and in the third paragraph) computed?
2. In paper 3, example 1, the author says "The difference between the second mean for the subgroup and the population mean would be approximately r times the difference between the first mean and the population mean". Does "the second mean" here refer to the predicted mean obtained by regressing the second measurement on the first measurement? According to paper 2, it is the predicted rather than the observed value of Y that is always fewer standard deviations from its mean than X is from its mean.
3. In paper 4, why are the efficiency gains of ANCOVA over a change-score analysis low when there is a high correlation between baseline and follow-up measurements? How are the efficiency gains measured? (A small numerical illustration follows this post.)
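One common way the comparison in point 3 is made, sketched under the simplifying assumption of equal baseline and follow-up variances sigma^2 and correlation r: the error variance is 2*sigma^2*(1 - r) for a change-score analysis and sigma^2*(1 - r^2) for ANCOVA, so ANCOVA needs roughly (1 + r)/2 of the change-score sample size, and the gain shrinks as r grows.
```python
# Relative sample size of ANCOVA vs change-score analysis, in units of sigma^2.
for r in (0.2, 0.5, 0.8, 0.95):
    var_change = 2 * (1 - r)     # error variance of the change-score analysis
    var_ancova = 1 - r ** 2      # error variance of ANCOVA
    print(f"r = {r}: ANCOVA needs about {100 * var_ancova / var_change:.0f}% "
          f"of the change-score sample size")
```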
1. (Paper 1) The method of finding the correlation between means should allow us to compute a shorter confidence interval for the correlation coefficient (since the mean for a group is based on a number of observations in that group, which is different from having only one observation that happens to fall at the mean). How should we incorporate the number of observations in each group into such a calculation?
2. (Paper 3) The sentence "Even if subjects are not treated the mean blood pressure will go down" is quite confusing. Surely BP can also increase, thus bringing up the mean. Judging from the second paper, it seems that those comparisons are made in terms of numbers of standard deviations, but they still pertain only to the fitted/estimated slope, not the observed values.
3. (Last paper) The multiple regression / ANCOVA method uses only the main effects (baseline score and group). I wonder what interpretations can we develop by adding an interaction term - certainly the estimates of a and b will be different compared with those obtained in the main effects model.
(Article 1) By using subject means rather than individual observations, are we losing important information? For instance, an anomalous case may disappear when only considering the mean, but this case may still be scientifically relevant (if not statistically relevant).
(Article 3) The paper claims that the blood pressure of an extreme group will surely decrease without treatment due to regression toward the mean. Is this statement dependent on biological variation in blood pressure? In other words, does it assume that subjects with extreme measurements typically have lower blood pressure than what was recorded?
(Article 4) This paper seems to be fairly one-sided. I wonder if there are advantages to using change or follow-up scores versus ANCOVA that the author fails to mention. For example, change scores may be easier to interpret and require fewer conditions than ANCOVA. I also wonder which method is more common in practice.
1. Regarding the pooling of subject data from the “Correlation, Regression, and Repeated Data” paper: would researchers use this incorrect method on purpose so that they may obtain a significant result? If so, how many?
2. What type of data would not have the issue of regression towards the mean?
3. What medical treatments are used today which are deemed significant, but are in fact concluded due to poor analyses?
1. (Paper 1) I notice that in the table of simulated data, the values of X for each subject are randomly assigned. However, in reality, the repeated measurements on the same subject should be similar, which decreases the within-subject variability. Will this affect the conclusion of the example?
2. (Paper 3) In the second paragraph, the author says that if subjects with hypertension were measured again, their mean would be closer to that of the whole population, even if they were not treated. But I think the difference could also become greater than before. If not, after several rounds of re-measurement, the mean of the extreme group would end up very close to the population mean.
3. (Paper 4) In the second-to-last paragraph, the author mentions that when the correlation is high (say r > 0.8), analysis of change scores is a reasonable alternative. So what happens when the correlation is low (say r < 0.2)? Would analysis of follow-up scores be an alternative?
(Correlation, regression, and repeated data): The authors state that, when we have repeated measurements, we can compare the subject means to determine if subjects with high values of X tend to also have a high value of Y. However, are we not also concerned about the variability of the observations from each subject? How does one incorporate this into the analysis?
(Regression towards the mean): How is "regression towards the mean" interpreted for multiple regression? We no longer have the nice interpretation of the slope as in simple regression because we would have a vector of independent variables at each response value.
(Analysing controlled trials ...): Why does "regression towards the mean" negatively impact follow up score analysis and change score analysis when baseline scores are worse in the treatment group?
(Paper 1) What are the detailed steps to do the multiple regression to find whether subjects with a high value of X tend also to have a high value of Y?
(Paper 2) The slope is less than 1 if we switch X and Y, i.e. the heights of parents and of children, given that they have the same mean and variance. Of course, the slope is still less than 1 when we don't switch them. Does this contradict the notion of "regression toward mediocrity"?
(Paper 4) The author already stated that ANCOVA generally has greater statistical power. Then why don't we just use ANCOVA even when there is a high correlation?