Course talk:STAT 550
Contents
| Thread title | Replies | Last modified |
|---|---|---|
| Reports of data cleaning issues, A3, Q1. | 0 | 23:40, 18 March 2014 |
| Discussion of Assignment 3, Q2 | 0 | 23:39, 18 March 2014 |
| Discussion of Assignment 3, Q1. | 0 | 23:37, 18 March 2014 |
| Discussion of Migraine Paper | 9 | 07:55, 18 March 2014 |
| Assignment #3 | 2 | 05:06, 18 March 2014 |
| Singing Study Comments | 9 | 06:42, 11 March 2014 |
| Bias (+ Confounding?) | 9 | 17:49, 7 March 2014 |
| Birth Attitudes Survey | 9 | 16:35, 4 March 2014 |
| Birth Attitudes Survey | 0 | 17:40, 28 February 2014 |
| February 13th Questions | 3 | 20:06, 13 February 2014 |
| Discussion of Dialysis Experiment | 10 | 19:26, 11 February 2014 |
| Discussion of Questions for January 30th | 9 | 22:17, 30 January 2014 |
| Questions for week 2 | 8 | 07:02, 21 January 2014 |
| Sample size discussion for Thursday Jan. 8 | 9 | 05:30, 14 January 2014 |
| Discussion for Assignment 0 | 1 | 19:20, 9 January 2014 |
| Basic Instructions | 0 | 20:12, 7 January 2014 |
Identify specific problems with the data set.
Add your queries and observations here!
Please read the paper e-mailed Sunday, focusing attention on statistical methods and results, and comment on strengths and weaknesses (one of each) of the paper.
weakness:
The paper says they used a chi-squared test. However, the sample size is quite small. Are they sure the expected cell counts are all greater than 5? If not, the chi-squared approximation may be inaccurate.
strength:
They offered confidence intervals, which are more informative than p-values alone.
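To make the cell-count concern concrete, here is a minimal R sketch (the counts are made up for illustration) showing how to inspect the expected cell counts behind a chi-squared test and fall back to Fisher's exact test when they are small:

```r
# Hypothetical 2x2 table: response by treatment arm (invented counts)
tab <- matrix(c(12, 8, 5, 15), nrow = 2,
              dimnames = list(arm = c("real", "sham"),
                              response = c("yes", "no")))
ct <- chisq.test(tab)
ct$expected       # if any expected count is below 5, the approximation suffers
fisher.test(tab)  # exact alternative for small samples
```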
Strength: patients for the study were chosen based on specific criteria in order to avoid confounding. For instance, patients with only migraine were included, whereas patients with other chronic pain syndrome were excluded.
Weakness: Due to the nature of the treatment, the version given to these patients might not be representative of acupuncture practice as a whole.
weakness: Data collection seems very basic. The diaries don't seem to provide accurate measures, although this might not be easy over such a long period.
strength: Well designed experiment for sham acupuncture
Weakness: The power for the study was only 20%, which is quite low when the other problems in the study are taken into account.
Strength: They used Fisher's exact test in addition to the chi-squared test, which is helpful given the small sample size.
weakness:
Although not necessarily a weakness per se, it seems strange to formally announce to patients that there would be both real and sham treatments. While anyone participating in a study is likely aware that researchers will be studying differences through control groups, a formal announcement, and the anticipation of a questionnaire about which randomization arm a patient believed they were in, seems likely to weaken the placebo effect. For example, if patients knew they would be asked about their impression of their randomization arm, they could easily research the differences between the sham and real treatments.
strength:
The researchers presented findings against the efficacy of semi-standardized methods instead of trying to make the data tell another story promoting these treatments. They also recorded and reported detailed patient flow through the experiment (though with little commentary on whether the dropouts were systematic or random).
Does anyone recall why the lab file is coded with ** in the LabTest column to indicate a new patient? It seems like this information can be completely discarded...
Additionally, can we throw away rows such as row 286 -- 33,"","" ? This also seems to be a human error. I am mostly asking to confirm that these are indeed human data-entry errors and not some feature of the dataset I don't understand.
Thanks!
I think it can be discarded. That's what I'm doing anyway. What approach are you taking to creating the extra columns? I'm struggling to get this cumulative hour tally mentioned in class to work.
I don't think you actually need to add any columns; one could just change the 0d coding into a 0, for all intents and purposes.
I think the easiest way to do the tally is to convert the date-time into POSIX format and then prototype the cumulative function on a subset of the data (for one id). Basically, you just want to create a vector of first differences of the date-times and then multiply these by the dosage; see the sketch below. Let me know if that makes sense.
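Here is a minimal sketch of that approach in R. The file name, column names (id, datetime, dose), and the timestamp format are all assumptions; substitute whatever the assignment file actually uses.

```r
df <- read.csv("a3_data.csv", stringsAsFactors = FALSE)  # hypothetical file name

# Convert the character timestamps to POSIXct (format string is a guess)
df$datetime <- as.POSIXct(df$datetime, format = "%Y-%m-%d %H:%M")

# Prototype on a single patient first, as suggested above
one <- df[df$id == df$id[1], ]
one <- one[order(one$datetime), ]

# First differences of the timestamps, in hours
hrs <- as.numeric(diff(one$datetime), units = "hours")

# Each interval contributes (interval length) x (dose in force over it);
# accumulate to get the running dose-hour tally
one$tally <- c(0, cumsum(hrs * head(one$dose, -1)))
```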
Please write three or four sentences commenting on the potential for bias in the study described on the course web page.
1. Based on the description of the sampling procedures, participation bias could occur in this study.
2. Also, secured-ward participants who sang or did not sing were assigned through observation of their participation level. This can introduce reporting bias.
3. In addition, the tests were administered by assessors, and I have not seen any information indicating that this experiment was double-blinded. In this case, interviewer bias can occur in this study.
1) Spectrum bias. I didn't see much reporting on the stage of Alzheimer's disease in the patients. This could alter results drastically if the sample is not representative of the general Alzheimer's population (further question: what is this?).
2) Not sure of the name of the bias, but outside factors could have an influence on happiness/well-being. So many other factors could be involved (maybe a form of reporting bias).
3) Attrition bias. The paper admits that this occurs. Participants were lost due to death, transfer to other facilities, etc. Such losses could themselves be related to well-being, and so could cause lost information.
1) There is participation bias, as only Alzheimer's patients in the ECE facility were selected as a sample of the whole population of Alzheimer's patients.
2) There is observer bias as the researchers allowed their hypothesised results to influence their sampling and assignment (i.e., they used a "natural assignment" rather than random assignment).
3) Finally, confounding bias is also present, as the experimenters did not properly correct for much of what happened during the 4-month testing period, including the norovirus outbreak (although they do note that this might have influenced their results).
1) Measurement bias: the study measurement is based on cognitive assessments, which are hard to measure quantitatively and accurately.
2) Omitted-variable bias: environmental control is essential in this study. However, some covariates are hard to control, or are not recognized by the researchers, so omitted variables may bias the study.
3) Selection bias: the people who participate in this study may tend to love music, so the sample is not representative. Besides, the sample size of N = 45 is quite small.
1) Selection bias: the researchers did not randomly assign patients into groups. Since they are considering pre- and post-measurements, the singer and listener groups may not be comparable at baseline.
2) Volunteering bias: the participants decided whether they want to be in the study or not.
3) Technical/Information (?) bias: the participants were reduced from 73 to 45. Moreover, no control group was included in the study. Only two different treatments were considered, with no comparison given between the treatments and a control.
Please "discover" 3 forms of bias and provide short definitions in preparation for discussion on Thursday Mar. 6.
1. Response bias - the design of the survey, e.g., the wording of questions, may suggest a favoured response.
2. Non-response bias - a survey generally does not receive responses from everyone. The problem is that in some cases those who respond are quite different from those who do not respond.
3. Observer bias - researchers know the objective of the study and allow this knowledge to influence their actions or observations during the study.
1. Information bias: it can occur during data collection
1.1 misclassification bias: for example, in a case-control study, some people who have the disease may be misclassified as controls, and some without the disease may be misclassified as cases.
1.2 Recall bias: for example, women who had a baby with a malformation tended to remember more mild infections that occurred during their pregnancies than did mothers of normal infants.
2. Selection bias: those who participate in the study differ from those who would have been eligible to participate but were unwilling or not selected.
2.1 Incidence-prevalence bias: patients who die quickly may be missed by the study.
2.2 Participation bias: those who participate in the study differ from those who are eligible but do not participate.
1) Spectrum bias - when the population under investigation does not reflect the general population or the clinically relevant population.
2) Reporting bias - defined as "selective revealing or suppression of information" by subjects. For instance, some subjects do not fully report information about past medical history, or smoking. Sometimes reporting bias is also caused by tendency of the researchers to under-report unexpected or undesirable experimental results.
3) Information bias (misclassification bias) - due to inaccurate measurement or classification of disease, exposure, or other variables. For example, an inaccurately calibrated instrument, or the situation where some individuals consistently have missing data.
1. Belief bias - An effect where someone's evaluation of the logical strength of an argument is biased by the believability of the conclusion.
2. Distinction bias - The tendency to view two options as more dissimilar when evaluating them simultaneously than when evaluating them separately.
3. Data-snooping bias - misusing data mining techniques to uncover relationships in data
1. Cognitive bias: a pattern of deviation in judgement, whereby inferences about other people and situations may be drawn in an illogical fashion.
2. Omitted-variable bias: created when a model compensates for a missing factor by over- or underestimating the effect of one of the other factors.
3. Systematic error: a measurement error which leads to the mean of many separate measurements differing significantly from the actual value of the measured attribute.
1. Experimenter bias: occurs when the measurements obtained in a study are influenced by the experimenter's expectations regarding the outcome of the study.
2. Recency bias: causes people to recall and emphasize recent events and observations more prominently than those that occurred in the near or distant past.
3. Confounding bias: occurs when two factors are associated and the effect of one is confused with or distorted by the effect of the other.
1) Monte Carlo bias: the difference between the true value of a parameter of interest and the value given by a Monte Carlo estimate using a finite Monte Carlo sample.
2) Survivorship bias: often, the subjects that survive a particular event are not randomly selected out of a population. If one tried to draw conclusions about an entire population using only those that survived the event, the conclusions would be incorrect (survivorship bias would be present). A good example is the Abraham Wald WWII planes anecdote.
3) Leading question bias: non-neutral wording of a question can influence a participant's response compared to a neutral version of the question. For example, "Did you enjoy the critically acclaimed, Academy Award-winning film Twelve Years a Slave?" is a leading question compared to "Did you enjoy the film Twelve Years a Slave?" Mentioning the acclaim a film has received could affect the participant's response.
Please submit 3 questions by Tues, Mar. 4th noon.
Q1) Do we take into account the health care provider's gender? Furthermore, would knowledge of whether the respondent has a child be of some use here?
Q2) Do we have to retain the linear scale, or is it possible/common to weight the scale for analysis, for example by placing heavier weights on extreme answers?
Q3) 'What statistical advice can be offered?' seems extremely vague. What input would be most useful from a statistician? E.g., are sample size calculations required, or has the survey already been conducted? Are we needed to check that the questions aren't biased?
1. What's the main purpose of the maternal healthcare researchers? Are they going to develop a new intervention program?
2. I agree with Danny. I believe gender could have a large influence on the results of this survey.
3. Do they include names for the health care providers? Reporting names on the survey may influence participants and bias the results.
4. What's the potential sample size? Are we going to collect data from all of them?
Q1) How will they consider a "no opinion" answer? For instance, if they take averages should no opinion be included or not?
Q2) How can the validity of a survey be established?
Q3) Are the participants chosen based on specific criteria?
Q1) Are we assuming reliable responses within each group (e.g., among nurses), or must we test the level of agreement within each group before comparing the attitudes between groups? Q2) Should we consider an analysis based on clustering the types of questions (e.g., moral-based, practicality-based, etc.) rather than testing the overall agreement in attitudes? Q3) How are we categorising attitudes? Pro, neutral, anti, or are we keeping the 7-level scale of the test?
Q1) What is the point of this experiment? Is there a specific question they are looking to answer? Q2) Are all the sample questions points of debate or controversy? If not, are they looking to find out the amount of bias in healthcare providers? Q3) How would they like the ranks to be weighted? Is an "agree" equally opposite to "disagree" or will one side be more heavily weighted and for what reason?
Q1. For a midwife, do we consider whether or not she has children and gave birth naturally? Q2. What is the specific aim of the study? Do these questions cover the things we want to study? Q3. Do these questions measure the same thing? 'When a woman is in labour, the safest place for her to be is in the hospital' seems uncorrelated with the other questions. What is the Cronbach's alpha?
Q1) If the questions are not equally independent, is it rational to give the same score to all questions? Q2) Sometimes it is really hard for me to distinguish 'disagree', 'mildly disagree', etc., so is there a method to test the robustness of the result? Q3) In a survey, it is possible that some people fill it in very carelessly; is there any method to distinguish good survey responses from bad ones?
1) The problem states that we are interested in "attitudes about child-birth". With factor analysis in mind, is there a pre-specified set of attitudes to be investigated? (e.g. Attitude concerning Caesarian, Attitude concerning patients making choices, attitude toward non-traditional methods)
2) Is it necessary to have such a large scale? Would 5 levels be sufficient?
3) Will the order of the questions on the survey be randomized?
1. Neil's question on question-order randomization is interesting, for it addresses the issue that many of the questions seem to elicit highly correlated responses. In that vein, can we use some of the questions within this survey to test for internal validity?
2. To test for internal validity we could use Cronbach's alpha (a sketch of the calculation appears after this post). In this case, what is the best way to define the score function (the question the researchers have posed is not well-defined)? Should it be weighted for different questions?
3. Do the researchers have other results or variables related to this survey that we can use to test the present results' construct validity?
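For concreteness, here is a minimal sketch of the Cronbach's alpha calculation from first principles in R; the 7-point responses are simulated purely for illustration (independent random answers, so alpha will be near zero here):

```r
set.seed(1)
# Hypothetical survey: 40 respondents x 5 items on a 7-point scale
items <- as.data.frame(matrix(sample(1:7, 200, replace = TRUE), ncol = 5))

k <- ncol(items)
item_vars <- apply(items, 2, var)  # variance of each item
total_var <- var(rowSums(items))   # variance of the summed score

# Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / total variance)
(k / (k - 1)) * (1 - sum(item_vars) / total_var)
```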
Please submit 3 questions by Tues, Mar. 4th noon.
1) Could you discuss the use of box plots instead of violin plots?
2) With respect to the first example: is the magnitude of the ratings important, or is it more important to focus on the patients' ratings relative to each other?
3) For the second example, would you be able to provide more information regarding performance of each method on individual patients? It would be useful to be able to pair them up to see the correlation of the two methods.
1) For the first example, should each of the physician specific scores be normalized to account for the different range of scores given?
2) Can we consider adding a Wilcoxon signed-rank test to these visual displays to account for non-linear association?
3) In measuring the agreement between physicians in the last example, should each disagreement be weighted the same as each agreement? That is, should every disagreement detract from our measure of agreement more than each agreement adds to it?
Q1) Rather than normalising, would there be an advantage to only scaling or centring the specific scores? Q2) Could we perhaps look into a between- vs. within-physician effect for the scores as a method to better understand the differences? Q3) Should we consider the reliability of the scores when carrying out the analyses?
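As a concrete starting point for the normalization and Wilcoxon suggestions above, here is a minimal R sketch; the paired scores are invented for illustration:

```r
# Hypothetical ratings of the same seven patients by two physicians
phys1 <- c(4, 6, 5, 7, 3, 8, 6)
phys2 <- c(5, 7, 4, 8, 4, 9, 5)

scale(phys1)  # centre and scale one physician's scores to mean 0, sd 1

# Paired Wilcoxon signed-rank test for a systematic difference in location
wilcox.test(phys1, phys2, paired = TRUE)

# Rank-based (Spearman) correlation as a measure of monotone association
cor(phys1, phys2, method = "spearman")
```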
Please submit your three questions by Wednesday evening.
Q1) What are some reasons why a systolic/diastolic blood pressure measurement may be missing in the context of the study? Q2) What are the "normal" ranges for a systolic/diastolic blood pressure measurement? Q3) How large would a change in a blood pressure measurement from one hour to the next have to be to be considered unusual?
Q1) Was blood pressure measured before the study? I.e., will they consider that some patients might already have hypotension/hypertension problems? Q2) What is the difference in temperature between the warm dialysate and the room-temperature one? Q3) How much time passes between two consecutive dialysis sessions?
Q1) What other factors were considered for each patient (e.g., sex, weight)? Q2) Do blood pressures change with age/sex/height etc., and similarly, do the blood pressure drops signalling hypotension change with regard to other factors? Q3) What is the expected blood pressure difference between the two, and will this change over time?
Q1) What is the expected rate of hypotension for patients using dialysis? Q2) Would using warm dialysate cause any adverse or other effects? Q3) Would changing the dialysate temperature for the same patient cause any adverse or other effects?
Q1. Why are the previously recorded blood pressures for one subject so unstable? Q2. Why are there so many missing values in the 4th record of subject #01? Q3. Does the cold-temperature dialysis have a long-term effect on blood pressure that may affect later experiments?
Q1) Are the missing values in the systolic/diastolic blood pressure measurements missing completely at random, at random, or not at random? Q2) How is "hypotension" defined under each condition? Is the definition the same in both conditions? Q3) What are the potential reasons that participants drop out? Q4) Does this study have a "wash-out" period? How long is it? If a side effect does not occur right away, a short wash-out period may cause a problem.
Additional question: What percentage of the sample is experiencing acute renal failure versus end-stage renal failure? Is information available on which patients have which?
Q1) What values of FEV1 are considered 'normal' and how does this change with age? Q2) What values of FEV1 classify moderate and severe asthma? Q3) What is the expected drop-out rate in such studies (if known)?
Q1) What difference are the researchers hoping to see? Q2) How many subjects/months are feasible, taking into account cost and availability? Q3) How would the baseline change if the treatments were not given?
Q1) Are any measurements being taken (e.g., changes of weight, diet, blood pressure, self-report questionnaires) within the two-month intervals?
Q2) What kinds of variables have been found to be confounding in past tests of asthma drugs (e.g., age, gender, etc.)?
Q3) Is the new treatment to be used for all kinds of asthma or just certain types (e.g., allergic, exercise-induced, etc.)? In the former case, are you considering dividing the asthma patients by asthma type?
a) In "Analysing controlled trials with baseline and follow up measurements", it seems that linear regression does not fit the data well. Does this matter? (A sketch of the regression approach from that paper is below.) b) Apart from gender, are there any other factors that have an impact on FVC? In other words, what determines the difference in FVC between males and females?
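On point (a), the approach discussed in that paper is an analysis of covariance: regress the follow-up measurement on the baseline value plus a treatment indicator. A minimal R sketch, with all data simulated purely for illustration:

```r
set.seed(1)
n <- 40
baseline <- rnorm(n, mean = 3.0, sd = 0.5)  # e.g. FEV1 in litres (invented)
treat    <- rep(0:1, each = n / 2)          # 0 = control, 1 = new treatment
followup <- 0.8 * baseline + 0.2 * treat + rnorm(n, sd = 0.3)

# The coefficient on `treat` is the baseline-adjusted treatment effect
summary(lm(followup ~ baseline + treat))
```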
Q1. What kind of blinding strategy should this study choose (unblinded, single, double, or triple)? And what kinds of biases will this study have to deal with? Q2. What is the anticipated non-compliance rate? Since treatments are self-administered each day, some patients may forget to take them; how do we address this? Q3. What is the participation rate in the source population? Q4. Do we need a two-sided test? And how much power should this study have in order to detect a meaningful effect?
Q1. For trials of chronic conditions, the study period is normally very long. Is it necessary to measure several times during the study period? Q2. Is it necessary to consider other factors that may influence the result? Q3. The FVC test is normally repeated several times on a patient. Is it repeated immediately after the previous test, or at a different time of day?
Q1. What is the expected difference between the two proposed treatments, i.e., the expected effect size and the expected variability around this measure? Q2. What metric do the researchers propose using to measure the difference between treatments, i.e., relative vs. absolute differences, percent changes, etc.? Q3. If patients are self-reporting, how do the researchers expect to confirm the reliability and validity of the measurements?
Q1: Have previous studies been performed about this (or a similar) topic? If so, could you point me towards a paper?
Q2: How do you plan on recruiting participants? Is it possible that some participants will have something in common (e.g., two members of the same family)?
Q3: To elaborate on Danny's drop-out question, do you know why participants drop out? (Is it random? Death? The treatment not working?) Do you expect the drop-out rate to be larger in one group than the other? Is there a certain period after which drop-out increases? (e.g. Christmas)
Please add a relevant comment and question to this discussion thread.
To get the ball rolling, one area highlighted in Gerald van Belle's book is the case when costs differ between samples (section 2.11). This could be a factor to consider on Thursday. Also, since Neil mentioned we are dealing with a binomial distribution, we could talk about the need for care when using certain methods to find the sample size. For example, in the same book, Equation 2.27 is considered appropriate only in the region 10 < n < 100 (a quick cross-check with base R is sketched below).
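As one such cross-check, base R's built-in calculator handles the two-proportion case directly; the proportions below are invented for illustration:

```r
# Per-group sample size to detect a difference between success
# probabilities of 0.2 and 0.4 with 80% power at the 5% level
power.prop.test(p1 = 0.2, p2 = 0.4, power = 0.80, sig.level = 0.05)
```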
Extending Daniel's point, an interesting issue to consider and discuss, which is not fully fleshed out in the chapter, is what to do in situations where multiple constraints influence the sample size: for example, where there are constraints in terms of costs (section 2.11), the number of subjects available (section 2.10), and the requirement of a certain effect size dictated by a journal (section 2.5). Perhaps we could delineate situations where ranking the importance of one constraint over another might be appropriate.
In the case mentioned by Andres, can we do a separate sample size calculation for each constraint and select the smallest number as the sample size?
I believe it is important to consider whether the number of subjects required per group is available and, if it is not, whether it is useful to have unequal sample sizes (section 2.10; a sketch of the underlying relation is below). Moreover, as Andres mentioned in the previous comment, we could discuss the consequences of relying solely on the effect size (as the rules of thumb presented in most of the sections depend on it).
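For reference, the relation that typically underlies unequal-sample-size adjustments (my reading of the section, not a quotation from the book) is that groups of sizes n1 and n2 are roughly as informative as two equal groups of their harmonic mean size. A short R sketch with invented numbers:

```r
n_equal <- 50   # per-group size from the equal-arm calculation (invented)
n1      <- 200  # subjects available in the larger group (invented)

# Solve 2/n_equal = 1/n1 + 1/n2 for the required size of the smaller group
n2 <- n_equal * n1 / (2 * n1 - n_equal)
ceiling(n2)     # about 29 rather than 50
```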
Yes: the level of statistical significance, the desired power, a one-sided vs. two-sided test, cost differences, effect size, and unequal sample sizes are all essential aspects to consider when we try to identify the sample size. Beyond the calculation itself, in order to increase the internal validity of the study design, do we need to consider matching? Patients may have different characteristics, such as age and gender. For this example, suppose we only know whether the patients received the activated carbon treatment or not; do we still have to go through all the files to match the two samples? Is there a better way, or do we not care about matching? In addition, it is also important to consider the response rate in one of the groups (especially for surveys).
While it would be nice to match or block patients, I believe it would be too inefficient to implement this in an observational study such as the activated carbon case, unless a good number of patients is available. And it would be rather unethical to create a randomized experiment, forcing subjects to consume poison in order to compare activated carbon with other treatments.
As for the sample size, and in regard to Andres's comment, it could be feasible to find the limiting factor and base the calculation on that. But that factor should be a certain "mix" of other dependent factors, such as costs and the number of subjects available.
One point I'd like to ask: for a real experiment, how do we know some of the values that we need for the sample size calculation (for example, the standard deviation, the probability of success, etc.)? Should they be based on previous findings, a pilot study, or something else?
A step toward answering Andres' question is to address the "number of subjects available" and the "cost" constraints as a single constraint. It would involve generalizing the concept of cost to allow for varying cost of additional observations as the sample size increases.
Increasing costs are often the case in reality; for example, suppose that the most recent charts are already digitized (very accessible, meaning cheap), slightly older charts are organized in a filing cabinet (still relatively cheap), but the pre-2000 charts are disorganized in a cardboard box due to an office move or flood (much more expensive).
To handle a limit on the number of subjects available, we could consider the cost of additional observations beyond that point to be infinite. In this case, the simple method given by the book would no longer apply, but if costs were known it would not be too difficult to work out the solution. I think we would be forced to use observations from the single remaining available set regardless of cost.
Another question, related to this one, is how to solve the cost problem when additional samples come in batches at a fixed cost. For example, sorting a box of charts gives 10 observations rather than just one.
Combining some of the ideas already mentioned could yield an interesting approach to choosing sample sizes. Consider regarding the currently available clinical results on activated carbon as fixed, with close to zero cost (Neil suggests that these will likely be digitized and thus have zero incremental lookup cost). This implicitly assumes the treatment is new and has not been used in the past. In comparison, the results of alternative treatments must be ascertained through manual searching (per Neil's suggestion). Hence, using section 2.10 on unequal sample sizes, as suggested by Chiara, will yield the number of past records to be manually searched. One intricacy not commented on is that the cost of retrieving one record of an alternative treatment will not be uniform: pre-digitized records are not organized by symptom or treatment, so finding one relevant record may follow something like a Poisson distribution among all the other causes bringing children to the ER.
Alternatively, we can view this problem from a different perspective: what fixed cost are we willing to incur in the manual search process? With this cost we could estimate the number of applicable records found, consider this value fixed, and use section 2.10 to determine the number of cases where activated carbon is used. If this sample size is greater than the number of records currently available, simply wait and conduct the study when more records become available. [Here, the waiting time can be considered a generalized cost (likely modelled by an exponential random variable) to be compared against the costs of the search process. In this way the investigator could minimize costs by balancing the relative sample sizes.]
I agree with Sean's option. Also, if we only care about computational feasibility, an alternative way to deal with the case where the required sample size is greater than the number of records available may be to apply the bootstrap; I think the cost in this case would be very small (a minimal sketch is below). However, it can be argued that this approach does not increase the amount of information in the original data, so it may not meet the requirements of our design.
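A minimal sketch of that bootstrap idea, with invented recovery indicators; as noted above, it quantifies the variability in the available records without adding new information:

```r
set.seed(550)
x <- rbinom(30, size = 1, prob = 0.6)  # hypothetical recovery indicators

# Resample the available records with replacement and recompute the rate
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))  # percentile bootstrap interval
```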
Students are invited to collaborate on Assignment 0 by posting comments and questions in this discussion.
A few questions/comments:
1. Could you please elaborate on the instructions for this assignment? In particular, is our analysis meant to be based on the summary statistics provided in the "Simple Study Designs" or from the linked articles?
2. It appears the link for example 3 is broken.
Thanks!
You can add your discussion contributions by clicking on the Reply button for the appropriate thread, and use the "Save page" button at the bottom to save your work. I suggest creating your submission first in your favourite text editor or word processor and then pasting it in, rather than doing all the editing on-line. After you've pasted in your contribution, use "Show preview" to see what the formatted version of the page will look like, and then save.

It is not necessary to sign your name for the purposes of this discussion, but I will ask people to submit their contributions to me by e-mail as well as post them here. However, contributions are not anonymous, because anyone can track who has edited the page using the "History" tab above.