
Course talk:STAT 550

From UBC Wiki

Contents

Thread title | Replies | Last modified
Reports of data cleaning issues, A3, Q1. | 0 | 23:40, 18 March 2014
Discussion of Assignment 3, Q2 | 0 | 23:39, 18 March 2014
Discussion of Assignment 3, Q1. | 0 | 23:37, 18 March 2014
Discussion of Migraine Paper | 9 | 07:55, 18 March 2014
Assignment #3 | 2 | 05:06, 18 March 2014
Singing Study Comments | 9 | 06:42, 11 March 2014
Bias (+ Confounding?) | 9 | 17:49, 7 March 2014
Birth Attitudes Survey | 9 | 16:35, 4 March 2014
Birth Attitudes Survey | 0 | 17:40, 28 February 2014
February 13th Questions | 3 | 20:06, 13 February 2014
Discussion of Dialysis Experiment | 10 | 19:26, 11 February 2014
Discussion of Questions for January 30th | 9 | 22:17, 30 January 2014
Questions for week 2 | 8 | 07:02, 21 January 2014
Sample size discussion for Thursday Jan. 8 | 9 | 05:30, 14 January 2014
Discussion for Assignment 0 | 1 | 19:20, 9 January 2014
Basic Instructions | 0 | 20:12, 7 January 2014

Reports of data cleaning issues, A3, Q1.

Identify specific problems with the data set.

RollinBrant (talk)23:40, 18 March 2014

Discussion of Assignment 3, Q2

Discuss here!

RollinBrant (talk)23:39, 18 March 2014

Discussion of Assignment 3, Q1.

Add your queries and observations here!

RollinBrant (talk)23:37, 18 March 2014

Discussion of Migraine Paper

Please read the paper e-mailed Sunday, focusing attention on statistical methods and results, and comment on strengths and weaknesses (one of each) of the paper.

RollinBrant (talk)03:36, 17 March 2014

weakness:

The paper says they used a chi-squared test. However, the sample size is really small. Are they sure the expected cell counts are all greater than 5? If not, the results may not be accurate.
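As a concrete check of this rule of thumb, the expected counts can be computed directly from the table margins. A minimal Python sketch (the table values below are made up, not taken from the paper):

```python
def expected_counts(table):
    """Expected cell counts under independence for a 2-D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table: rows = treatment arms, columns = outcome.
table = [[8, 2],
         [4, 6]]
exp = expected_counts(table)
print(exp)                                      # [[6.0, 4.0], [6.0, 4.0]]
print(all(e >= 5 for row in exp for e in row))  # False: chi-squared is shaky here
```

If any expected count falls below 5, an exact test (e.g. Fisher's) is the usual fallback.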

strength:

They reported confidence intervals, which are more informative than p-values alone.

HuitingMa (talk)18:36, 17 March 2014

Strength: patients for the study were chosen based on specific criteria in order to avoid confounding. For instance, patients with only migraine were included, whereas patients with other chronic pain syndrome were excluded.

Weakness: Due to the nature of the treatment, the version given to the patients might not be representative of acupuncture practice as a whole.

ChiaraDiGravio (talk)00:56, 18 March 2014

weakness: Data collection seems very basic. The diaries don't seem to provide accurate measures, although this might not be easy over such a long period.

strength: Well designed experiment for sham acupuncture

DanielDinsdale (talk)02:01, 18 March 2014

Weakness: The power for the study was only 20% which is a bit low when other problems in the study are taken into account.

Strength: The used Fisher's exact test in addition to the Chi-squared test which is helpful due to the small sample size.

AndresSanchezOrdonez (talk)04:10, 18 March 2014

weakness:

Although not necessarily a weakness per se, it seems strange to formally announce to patients that there would be both real and sham treatments. While anyone participating in a study is likely aware that researchers will be studying differences through control groups, a formal announcement, and the anticipation of a questionnaire on which randomization arm they were in, seems likely to weaken the placebo effect. For example, if patients knew they would be required to fill in a survey on their impression of their randomization arm, they could easily research the differences between the sham and real treatments.

strength:

The researchers presented findings against the efficacy of semi-standardized methods instead of trying to make the data tell another story promoting these treatments. They also included and recorded detailed patient flows through the experiment (though without much commentary on whether the dropouts were systematic or random).

SeanJewell (talk)05:19, 18 March 2014

Assignment #3

Does anyone recall why the lab file is coded with ** in the LabTest column to indicate a new patient? It seems like this information can be completely discarded...

Additionally, can we throw away rows such as row 286 -- 33,"","" ? This also seems to be a human error. I am mostly asking to confirm that these are indeed human data entry errors and not some feature of the dataset I do not understand.

Thanks!

SeanJewell (talk)20:40, 17 March 2014

I think it can be discarded. That's what I'm doing anyway. What approach are you taking to creating the extra columns? I'm struggling to get this cumulative hour tally mentioned in class to work.

DanielDinsdale (talk)21:31, 17 March 2014

I don't think that you actually need to add any columns; one could just change the 0d coding into a 0, for all intents and purposes.

I think the easiest way to do the tally is to convert the date time into posix format and then prototype the cum function on a subset of the data (for one id). Basically you just want to create a vector of first differences of the datetime and then multiply these by the dosage...Let me know if that makes sense.
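A rough sketch of that approach in Python (the record layout, times, and doses below are invented for illustration; the assignment data has its own column names and codings):

```python
from datetime import datetime

# Hypothetical (datetime, dose) records for a single patient id,
# already sorted by time.
records = [
    ("2014-03-01 08:00", 2.0),
    ("2014-03-01 10:30", 1.0),
    ("2014-03-01 14:30", 0.0),
]

times = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t, _ in records]
doses = [d for _, d in records]

# First differences of the datetimes (in hours), multiplied by the dose
# in effect over each interval, then accumulated into a running tally.
cum, total = [], 0.0
for i in range(1, len(times)):
    hours = (times[i] - times[i - 1]).total_seconds() / 3600
    total += hours * doses[i - 1]
    cum.append(total)

print(cum)  # [5.0, 9.0] cumulative dose-hours at each subsequent record
```

The same prototype can then be applied per id across the full data set.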

SeanJewell (talk)05:06, 18 March 2014
 
 

Singing Study Comments

Please write three or four sentences commenting on the potential for bias in the study described on the course web page.

RollinBrant (talk)04:38, 9 March 2014

1. Based on the description of the sampling procedures, participation bias could occur in this study.

2. Also, secured-ward participants who sang or did not sing were assigned through observation of participation level. This can introduce reporting bias.

3. In addition, the tests are administered by assessors, and I have not seen any information indicating this experiment is double-blinded. In this case, interviewer bias can occur in this study.

HuitingMa (talk)05:35, 9 March 2014

1) Spectrum bias. Didn't see much reporting on stage of Alzheimer's disease in the patients. This could alter results drastically if not representative of general Alzheimer's population (further question, what is this??).

2) Not sure name of the bias, but outside factors could have an influence on happiness/well being. Could be so many other factors involved (maybe potential reporting bias)

3) Attrition Bias. Admitted that this occurs in paper. Loss of participants due to death, transfer to other facilities etc. This could be due to factors such as well being, so could cause lost information.

DanielDinsdale (talk)22:47, 9 March 2014

1) There is participation bias, as only Alzheimer's patients in the ECE facility were selected as a sample of the whole population of Alzheimer's patients.

2) There is observer bias as the researchers allowed their hypothesised results to influence their sampling and assignment (i.e., they used a "natural assignment" rather than random assignment).

3) Finally, confounding bias is also present, as the experimenters did not properly correct for much of what happened during the 4-month testing period, including the norovirus outbreak (although they do note that this might have influenced their results).

AndresSanchezOrdonez (talk)00:43, 10 March 2014

1) Measurement bias: the study outcome is based on cognitive assessments, which are hard to measure quantitatively and accurately. 2) Omitted-variable bias: environmental control is essential in this study; however, some covariates are hard to control or may not be recognized by the researchers, so omitted variables may bias the study. 3) Selection bias: the people who participate in this study may tend to love music; therefore, the sample is not representative. Besides, a sample size of N = 45 is too small.

JinyuanZhang (talk)02:07, 10 March 2014

1) Selection bias: the researchers did not randomly assign patients into groups. Since they are considering pre and post measurements, the singer and listener groups may not be comparable at baseline.

2) Volunteering bias: the participants decided whether they want to be in the study or not.

3) Technical/Information (?) bias: participants were reduced from 73 to 45. Moreover, there was no control group in the study. Only two different treatments were considered, with no comparison between treatments and controls.

ChiaraDiGravio (talk)04:42, 10 March 2014

Bias (+ Confounding?)

Please "discover" 3 forms of bias and provide short definitions in preparation for discussion on Thursday Mar. 6.

RollinBrant (talk)18:45, 5 March 2014

1. Response bias - the design of the survey, e.g., wording of questions, may suggest a favoured response. 2. Non-response bias - a survey generally does not receive responses from everyone. The problem is that in some cases those who respond are quite different from those who do not respond. 3. Observer bias - researchers know the objective of the study and allow this knowledge to influence their actions or observations during the study.

AndresSanchezOrdonez (talk)23:59, 5 March 2014
 

1. Information bias: it can occur during data collection

1.1 misclassification bias: for example, in a case-control study, some people who have the disease may be misclassified as controls, and some without the disease may be misclassified as cases.

1.2 Recall bias: for example, women who had a baby with a malformation tended to remember more mild infections that occurred during their pregnancies than did mothers of normal infants.

2. Selection bias: those who participate in the study differ from those who would have been eligible to participate but were unwilling or not selected.

2.1 Incidence-prevalence bias: patients who die too quickly may be left out of the study.

2.2 Participation bias: those who participate in the study differ from those who are eligible but do not participate.

HuitingMa (talk)00:28, 6 March 2014

1) Spectrum bias - when the population under investigation does not reflect the general population or the clinically relevant population.

2) Reporting bias - defined as "selective revealing or suppression of information" by subjects. For instance, some subjects do not fully report information about past medical history, or smoking. Sometimes reporting bias is also caused by tendency of the researchers to under-report unexpected or undesirable experimental results.

3) Information bias (misclassification bias) - due to inaccurate measurement or classification of disease, exposure, or other variables. For example, an inaccurately calibrated instrument, or the situation where some individuals consistently have missing data.

ChiaraDiGravio (talk)01:53, 6 March 2014

1. Belief bias - An effect where someone's evaluation of the logical strength of an argument is biased by the believability of the conclusion.

2. Distinction bias - The tendency to view two options as more dissimilar when evaluating them simultaneously than when evaluating them separately.

3. Data-snooping bias - misusing data mining techniques to uncover relationships in data

JackNi (talk)03:33, 6 March 2014

1. Cognitive bias: a pattern of deviation in judgement, whereby inferences about other people and situations may be drawn in an illogical fashion. 2. Omitted-variable bias: created when a model compensates for a missing factor by over- or underestimating the effect of one of the other factors. 3. Systematic error: measurement error that leads to the mean of many separate measurements differing significantly from the actual value of the measured attribute.

JinyuanZhang (talk)06:02, 6 March 2014

1. Experimenter bias: occurs when the measurements obtained in a study are influenced by the experimenter's expectations regarding the outcome of the study. 2. Recency bias: causes people to recall and emphasize recent events and observations more prominently than those that occurred in the near or distant past. 3. Confounding bias: occurs when two factors are associated and the effect of one is confused with or distorted by the effect of the other.

YifanZhang (talk)07:30, 6 March 2014
 

1) Monte Carlo Bias: The difference between the true value of a parameter of interest and the value given by a Monte Carlo estimate based on a finite Monte Carlo sample.

2) Survivorship Bias: Often, the subjects that survive a particular event are not randomly selected out of a population. If one tried to make conclusions regarding an entire population using only those that the survived the event, the conclusions would be incorrect (survivorship bias would be present). A good example is the Abraham Wald WWII Planes anecdote.

3) Leading Question Bias: Non-neutral wording of a question can influence a participant's response compared to a neutral version of the question. This influence is known as leading question bias. For example: "Did you enjoy the critically acclaimed Academy Award-winning film entitled Twelve Years a Slave?" is a leading question compared to "Did you enjoy the film entitled Twelve Years a Slave?". Mentioning the acclaim a film has been given could affect the participant's response.

NeilSpencer (talk)07:44, 6 March 2014

Birth Attitudes Survey

Please submit 3 questions by Tues, Mar. 4th noon.

RollinBrant (talk)17:40, 28 February 2014

Q1) Do we take into account the health care provider's gender? Furthermore, knowledge of whether the respondent has a child could be of some use here.

Q2) Do we have to retain the linear scale, or is it possible/common to weight the scale for analysis? For example, placing heavier weights on extreme answers.

Q3) 'What statistical advice can be offered?' <- Seems extremely vague. What input would be of most use from a statistician? E.g., are sample size calculations required, or has the survey been conducted? Are we needed to check that questions aren't biased?

DanielDinsdale (talk)01:26, 3 March 2014

1. What's the main purpose for maternal healthcare researchers? Are they going to develop a new intervention program?

2. I agree with Danny. I believe gender could have a huge influence on the results of this survey.

3. Do they include names of health care providers? Reporting names on the survey may influence participants and bias the results.

4. What's the potential sample size? Are we going to collect data from all of them?

HuitingMa (talk)02:42, 3 March 2014

Q1) How will they consider a "no opinion" answer? For instance, if they take averages should no opinion be included or not?

Q2) How can the validity of a survey be established?

Q3) Are the participants chosen based on specific criteria?

ChiaraDiGravio (talk)06:25, 3 March 2014

Q1) Are we assuming reliable responses within each group (e.g., among nurses), or must we test the level of agreement within each group before comparing the attitudes between groups? Q2) Should we consider an analysis based on clustering the types of questions (e.g., moral-based, practicality-based, etc.) rather than testing the overall agreement in attitudes? Q3) How are we categorising attitudes? Pro, Neutral, Anti, or are we keeping the 7-level scale of the test?

AndresSanchezOrdonez (talk)23:51, 3 March 2014

Q1) What is the point of this experiment? Is there a specific question they are looking to answer? Q2) Are all the sample questions points of debate or controversy? If not, are they looking to find out the amount of bias in healthcare providers? Q3) How would they like the ranks to be weighted? Is an "agree" equally opposite to "disagree" or will one side be more heavily weighted and for what reason?

JackNi (talk)03:51, 4 March 2014
 

Q1. For a midwife, do we consider whether or not she has children and gave birth naturally? Q2. What is the specific aim of the study? Do these questions cover the things we want to study? Q3. Do these questions measure the same thing? 'When a woman is in labour, the safest place for her to be is in the hospital' seems uncorrelated with the other questions. What is the Cronbach's alpha?

YifanZhang (talk)03:58, 4 March 2014
 

Q1) If the questions are not all independent, is it rational to give the same score to every question? Q2) Sometimes it is really hard for me to distinguish between disagree, mildly disagree, etc., so is there a method to test the robustness of the result? Q3) In a survey, it is possible that some people fill it in very carelessly; is there any method to distinguish good survey responses from bad ones?

JinyuanZhang (talk)04:23, 4 March 2014

1) The problem states that we are interested in "attitudes about child-birth". With factor analysis in mind, is there a pre-specified set of attitudes to be investigated? (e.g. Attitude concerning Caesarian, Attitude concerning patients making choices, attitude toward non-traditional methods)

2) Is it necessary to have such a large scale? Would 5 levels be sufficient?

3) Will the order of the questions on the survey be randomized?

NeilSpencer (talk)06:40, 4 March 2014

1. Neil's question on question order randomization is interesting for it addresses the issue that many of the questions seem to elicit highly correlated responses. In that vein of thought, can we use some of the questions within this survey to test for internal validity?

2. To test for internal validity we could use Cronbach's alpha technique. In this case, what is the best way to define the score function (the question the researchers have posed is not well-defined)? Should it be weighted for different questions?

3. Do the researchers have other results or variables related to this survey that we can use to test the present results' construct validity?
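On the Cronbach's alpha suggestion, a minimal sketch of the computation (the 7-point Likert responses below are invented, not survey data):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of responses per question, aligned across the
    same respondents."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]       # per-respondent totals
    item_var = sum(pvariance(vals) for vals in items)  # sum of item variances
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical responses: 3 questions, 4 respondents, 7-point scale.
items = [
    [7, 5, 2, 6],
    [6, 5, 1, 7],
    [7, 4, 2, 5],
]
print(round(cronbach_alpha(items), 2))  # 0.96: high internal consistency
```

A weighted version would replace the raw per-respondent totals with a weighted score function, which is exactly the choice the researchers would need to pin down.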

SeanJewell (talk)16:35, 4 March 2014
 
 


February 13th Questions

1) Could you discuss the use of box plots instead of violin plots?

2) With respect to the first example: Is the magnitude of the ratings important, or is it more important to focus on the ratings of the patients relative to each other?

3) For the second example, would you be able to provide more information regarding performance of each method on individual patients? It would be useful to be able to pair them up to see the correlation of the two methods.

NeilSpencer (talk)18:18, 13 February 2014

1) For the first example, should each of the physician specific scores be normalized to account for the different range of scores given?

2) Can we consider adding a Wilcoxon signed-rank test to these visual displays to account for non-linear association?

3) In measuring the agreement between physicians in the last example, should each disagreement be weighted the same as each agreement? That is, should every disagreement detract from our measure of association more than each agreement adds to it?

SeanJewell (talk)18:26, 13 February 2014

Q1) Rather than normalising, would there be an advantage to only scaling or centring the specific scores. Q2) Could we perhaps look into a between vs. within physician effect for the scores as a method to better understand the differences. Q3) Should we consider the reliability of the scores when carrying out analyses.

AndresSanchezOrdonez (talk)20:02, 13 February 2014

Q1) For example 1, is the scatterplot of the correlations more informative than the boxplot? Q2) In example 2, since we have just 40 patients, can we use a non-parametric test to understand the agreement? Q3) For the last example, will taking proportions be effective?

ChiaraDiGravio (talk)20:06, 13 February 2014
 
 
 

Discussion of Dialysis Experiment

Please submit your three questions by Wednesday evening.

RollinBrant (talk)21:40, 4 February 2014

Q1) What are some reasons why a systolic/diastolic blood pressure measurement may be missing in the context of the study? Q2) What are the "normal" ranges for systolic/diastolic blood pressure measurements? Q3) How large would a unit change in a blood pressure measurement from one hour to the next have to be to be considered unusual?

AndresSanchezOrdonez (talk)01:28, 5 February 2014

Q1) Was blood pressure measured before the study? I.e., will they consider that some patients might also have a hypotension/hypertension problem? Q2) What is the difference in temperature between the warm dialysate and the room-temperature one? Q3) How much time passes between two consecutive dialysis sessions?

ChiaraDiGravio (talk)05:34, 5 February 2014

Q1) What other factors were considered for each patient? Ie/ male/female, weight Q2) Do blood pressures change with age/sex/height etc and similarly do blood pressure drops signalling hypotension change with regards to other factors? Q3) What is the expected blood pressure difference between the two and will this change with time?

DanielDinsdale (talk)21:30, 5 February 2014

Q1) What is the expected rate of hypotension for patients using dialysis? Q2) Would using warm dialysate cause any adverse or other effects? Q3) Would changing the dialysate temperature for the same patient cause any adverse or other effects?

JackNi (talk)22:20, 5 February 2014

Q1. Why are the previously recorded blood pressures for one subject so unstable? Q2. Why are there so many missing values in the 4th record of subject #01? Q3. Does cold-temperature dialysis have a long-term effect on blood pressure that may affect later experiments?

YifanZhang (talk)02:07, 6 February 2014
 

Q1) Are the missing values in the systolic/diastolic blood pressure measurements missing completely at random, at random, or not at random? Q2) How is "hypotension" defined under the two conditions? Is the definition the same in both? Q3) What are the potential reasons that participants drop out? Q4) Does this study have a "wash-out" period? How long is it? If side effects do not occur right away, a short wash-out period may cause a problem.

HuitingMa (talk)02:09, 6 February 2014

Additional question: What percentage of the sample is experiencing acute renal failure versus end-stage renal failure? Is the information regarding which patients have which available?

NeilSpencer (talk)19:26, 11 February 2014
 

Discussion of Questions for January 30th

I am planning on discussing E4 and F2.

NeilSpencer (talk)17:49, 29 January 2014

E3 and C4

JackNi (talk)00:16, 30 January 2014

E5 and D1

ChiaraDiGravio (talk)00:52, 30 January 2014
 
 
 

Questions for week 2

Q1) What values of FEV1 are considered 'normal' and how does this change with age? Q2) What values of FEV1 classify moderate and severe asthma? Q3) What is the expected drop-out rate in such studies (if known)?

DanielDinsdale (talk)19:07, 18 January 2014

Q1) What difference are the researchers hoping to see? Q2) How many subjects/months are feasible, taking into account cost and availability? Q3) How would the baseline change if the treatments were not given?

JackNi (talk)01:34, 19 January 2014

Q1) Are any measurements being taken (e.g., changes of weight, diet, blood pressure, self-report questionnaires) within the two-month intervals?

Q2) What kinds of variables have been found to be confounding when testing asthma drugs in the past (e.g., age, gender, etc.)?

Q3) Is the new treatment to be used for all kinds of asthma or just certain types (e.g., allergic, exercise-induced, etc.)? In the former case, are you considering dividing the asthma patients by asthma type?

AndresSanchezOrdonez (talk)02:25, 19 January 2014

a) In "Analysing controlled trials with baseline and follow up measurements", it seems that linear regression does not fit the data well. Does this matter? b) Apart from gender, are there any other factors that have an impact on FVC? In other words, what determines the difference in FVC between males and females?

JinyuanZhang (talk)06:04, 19 January 2014

Q.1) Since FEV1 depends on multiple variables (e.g. age, sex, height) are the researchers planning to take these into account when they assign the two treatments? Q.2) What are they planning to do in case of possible drop outs? Q.3) What kind of treatment effect are the researchers anticipating?

ChiaraDiGravio (talk)07:25, 19 January 2014
 
 
 

Q1. What kind of blinding strategy should this study choose (unblinded, single, double, or triple)? And what kinds of biases will this study have to deal with? Q2. What is the anticipated non-compliance rate? Since treatments are self-administered regularly each day, some patients may forget to take them. How do we address this problem? Q3. What is the participation rate for the source population? Q4. Do we need a two-sided test? In order to detect a meaningful effect, how much power should this study have?

HuitingMa (talk)06:36, 20 January 2014

Q1. For trials of chronic conditions, the study period is normally very long. Is it necessary to measure several times during the study period? Q2. Is it necessary to consider other factors that may influence the result? Q3. The FVC test is normally repeated several times on a patient. Is it repeated immediately after the previous test, or at a different time of day?

YifanZhang (talk)06:20, 21 January 2014

Q1. What is the expected difference between the two proposed treatments? ie. the expected effect size and the expected variability around this measure. Q2. What metric do the researchers propose using to measure the difference between treatments? ie. relative differences vs. absolute differences, percent changes etc. Q3. If patients are self-reporting how do the researchers expect to confirm reliability and validity of measurements?

SeanJewell (talk)06:44, 21 January 2014
 
 

Q1: Have previous studies been performed about this (or a similar) topic? If so, could you point me towards a paper?

Q2: How do you plan on recruiting participants? Is it possible that some participants will have something in common (e.g. two members of the same family)

Q3: To elaborate on Danny's drop-out question, do you know why participants drop out? (Is it random? Death? The treatment not working?) Do you expect the drop-out rate to be larger in one group than the other? Is there a certain period after which drop-out increases? (e.g. Christmas)

NeilSpencer (talk)07:02, 21 January 2014
 

Sample size discussion for Thursday Jan. 8

Please add a relevant comment and question to this discussion thread.

RollinBrant (talk)03:05, 8 January 2014

To get the ball rolling, one area highlighted in Gerald van Belle's book is the case when costs differ between samples. This could be a factor to consider on Thursday (section 2.11). Also, since Neil mentioned we are dealing with a binomial distribution, we could talk about the need for care when using certain methods to find the sample size. For example, in the same book, Equation 2.27 is considered appropriate only for the region 10 < n < 100.
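To make this concrete, here is the standard normal-approximation sample size formula for comparing two proportions (a generic textbook formula, not Equation 2.27 itself; the planning values below are made up):

```python
from math import ceil
from statistics import NormalDist

def two_proportion_n(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for detecting a difference
    between two proportions, via the normal approximation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_b = NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * var / (p1 - p2) ** 2)

# Hypothetical planning values: detect 50% vs. 30% success rates.
print(two_proportion_n(0.5, 0.3))  # 91 per group, inside the 10 < n < 100 region
```

Outside that region (tiny or very large n), exact or continuity-corrected methods are the safer choice, which is exactly the caution the book raises.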

DanielDinsdale (talk)03:13, 9 January 2014

Extending Daniel's point, an interesting issue to consider and discuss, which is not fully fleshed out in the chapter, is what to do in situations where multiple constraints influence the sample size. For example, what to do when there are constraints both in terms of costs (section 2.11), the number of subjects available (section 2.10), and the requirement of a certain effect size dictated by a journal (section 2.5). Perhaps we could delineate situations where ranking the importance of one constraint over another might be appropriate.

AndresSanchezOrdonez (talk)05:59, 9 January 2014

In the case mentioned by Andres, can we run a separate sample size calculation for each constraint and select the smallest number as the sample size?

YifanZhang (talk)07:00, 9 January 2014

I believe it is important to consider whether the number of subjects required per group is available and, if it is not, whether it is useful to have unequal sample sizes (section 2.10). Moreover, as Andres mentioned in the previous comment, we could discuss the consequences of relying solely on the effect size (as the rule of thumb presented in most of the sections depends on it).

ChiaraDiGravio (talk)07:08, 9 January 2014
 

Yes, the level of statistical significance, the desired power, a one-sided vs. two-sided test, cost differences, effect size, and unequal sample sizes are all essential aspects to consider when we try to determine the sample size. Besides the calculation itself, in order to increase the internal validity of the study design, do we need to consider matching? Patients may have different characteristics, such as age and gender. For this example, suppose we only know whether the patients used the activated carbon treatment or not; do we still have to go through all the files to match the two samples? Is there a better way, or do we not care about matching? In addition, it is also important to consider the response rate in one of the groups (especially for surveys).

HuitingMa (talk)07:06, 9 January 2014

While it would be nice to match or block patients, I believe it would be too inefficient to implement this in an observational study such as the activated carbon case, unless a good number of patients is available. And it would be rather unethical to create a randomized experiment, forcing subjects to consume poison and comparing the use of activated carbon to other treatments.

As for the sample size, and in regard to Andres' comment, it could be feasible to find the limiting factor and base the calculation on that. But that factor should be a certain "mix" of other dependent factors such as costs and the number of subjects available.

One point I'd like to ask: for a real experiment, how do we know some of the values that we need for the sample size calculation (for example, the standard deviation, probability of success, etc.)? Should they be based on previous findings, a pilot study, or something else?

JackNi (talk)21:53, 9 January 2014
 

A step toward answering Andres' question is to address the "number of subjects available" and the "cost" constraints as a single constraint. It would involve generalizing the concept of cost to allow for varying cost of additional observations as the sample size increases.

Increasing costs are often the case in reality; for example, suppose that the most recent charts are already digitized (very accessible, meaning cheap), slightly older charts are organized in a filing cabinet (still relatively cheap), but the pre- 2000 charts are disorganized in a cardboard box due to an office move or flood (much more expensive).

To control the number of subjects available, we could consider the cost of additional observations beyond that point as infinity. In this case, the simple method given by the book would no longer apply, but if costs were known it would not be too difficult to figure out the solution. I think we would be forced to use observations from the single remaining available set regardless of cost.

Another question, related to this one, is how to solve the cost problem when additional samples come in batches at a fixed cost. For example, sorting a box of charts gives 10 observations rather than just one.

NeilSpencer (talk)08:30, 9 January 2014

Combining some of the ideas already mentioned could result in an interesting approach to choosing sample sizes. Consider regarding the currently available clinical results on activated carbon as fixed, with close to zero cost (since Neil suggests that these will likely be digitized and thus have zero incremental lookup cost). This also implicitly assumes this is a new treatment method that has not been used in the past. In comparison, the results of alternative treatments must be ascertained through manual searching (per Neil's suggestion). Hence, using section 2.10 on unequal sample sizes, as suggested by Chiara, will yield the number of past records to be manually searched. One intricacy not commented on is that the cost of retrieving one record of an alternative treatment will not be uniform. Pre-digitized records are not organized by symptoms or treatment, so finding one positive record may actually follow a Poisson-like process among all the other causes bringing kids to the ER.

Alternatively, we can view this problem from a different perspective: What fixed cost are we willing to incur in the manual search process? With this cost we could estimate the number of applicable records found, consider this value fixed, and use section 2.10 to determine the number of cases where Activated carbon is used. If this sample size is greater than the number of records currently available simply wait and conduct the study when more records become available. [Here, the waiting time can be considered a generalized cost (likely modelled by an exponential random variable) to be compared against the costs from the search process. In this way the investigator could minimize his costs by balancing the relative sample sizes].

SeanJewell (talk)19:12, 9 January 2014

I agree with Sean's option. Also, if we only care about computing power, an alternative way to deal with the case where the required sample size is greater than the number of records available may be to apply the bootstrap. I think the cost in this case would be very small. However, it can be argued that this approach does not increase the amount of information in the original data, so it may not meet the requirements of our design.
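As a rough illustration of the resampling idea, here is a percentile-bootstrap sketch (the data values are made up; note that it only reuses the same observations, consistent with the caveat that no new information is added):

```python
import random
from statistics import mean

random.seed(0)  # for reproducibility
data = [2.1, 3.4, 1.8, 2.9, 3.1, 2.4, 2.2, 3.0]  # hypothetical measurements

# Resample with replacement and collect the mean of each resample.
boot_means = sorted(
    mean(random.choices(data, k=len(data))) for _ in range(2000)
)

# Approximate 95% percentile interval for the mean.
ci = (boot_means[49], boot_means[1949])
print(ci)
```

The interval reflects only the variability in the records we already have, which is exactly why the bootstrap cannot substitute for collecting a larger sample.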

JinyuanZhang (talk)05:30, 14 January 2014

Discussion for Assignment 0

Students are invited to collaborate on Assignment 0 by posting comments and questions in this discussion.

RollinBrant (talk)20:08, 7 January 2014

A few questions/comments:

1. Could you please elaborate on the instructions for this assignment? In particular, is our analysis meant to be based on the summary statistics provided in the "Simple Study Designs" or from the linked articles?

2. It appears the link for example 3 is broken.

Thanks!

SeanJewell (talk)19:20, 9 January 2014
 

Basic Instructions

You can add your discussion contributions by clicking on the Reply button relating to the appropriate thread. Use the "Save page" button at the bottom to save your work. I suggest creating your submission first in your favourite text editor or word processing program and then pasting it in, rather than doing all the editing on-line. After you've pasted in your contribution use "Show preview" to see what the formatted version of the page will look like after you've saved and then save. It is not necessary to sign your name for the purposes of this discussion, but I will ask people to submit their contributions to me by e-mail as well as posting it here. However, contributions are not anonymous because anyone can track who's edited the page by using the "History" tab above.

RollinBrant (talk)20:05, 7 January 2014