Signup List

The case studies are described at the link given on the Course Page. If you are interested in taking on one of the projects, please reply to this discussion identifying which project(s) interest(s) you.

RollinBrant (talk)01:44, 22 January 2014

STAT450 students, please select two projects and post at least one question and at least one comment to each.

GabrielaCohen (talk)14:22, 22 January 2014

Hello,

Here are the 2 projects I am interested in:

Predictors of Anemia: 1) I am wondering what random sampling methodology was used. 2) What is meant by "... eight independent variables (30)"? Are there eight independent variables or 30?

TA Grading Consistency: 1) From where and how were the TAs chosen for the test data? 2) Were the TAs aware that they were being monitored for grading inconsistencies? If they were, they might have altered their behavior so as not to appear inconsistent. This might affect the level at which a TA would be labelled "inconsistent".

Thanks, Angela Cotton

AngelaCotton (talk)23:24, 25 January 2014
 

Hi,

Here are the 2 projects that I am interested in, with my questions and comments on them:


1) TA Grading Consistency

Questions: What do you mean by "common exam grades"? Are they the grades that are most commonly given out by most TAs? If so, what if all the TAs give significantly different grades? Do you obtain such "common exam grades" by sampling or from the entire population (all the students enrolled in the course, in all sections)?

Comments: Beyond determining the number of TAs/students required, I suggest that the tool also be improved so that users can input other factors, for instance the type of question (short answer, essay, reasoning, etc.) being marked by the TAs each time. This is to capture the interaction between our main factor (TA consistency) and other factors that could interact or be confounded with it.


2) Longevity in China

Questions:

What do the codes that provide information about subjects' occupations look like? I want to know whether the codes can be classified into the desired categories by software such as R, instead of manually. What is the periodization that you will supply?

Comments:

So far the article only mentions the "average" as the measurement to refer to. To analyze the data set better, I would suggest that at least the standard deviation be measured for the entire data set and within each group. This can be used to identify the main source of variation in the data set when statistical methods are applied.
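
A minimal sketch of the per-group spread suggested above, in Python using only the standard library. The group names and lifespans are made up for illustration and are not taken from the project data.

```python
from statistics import mean, stdev

# Hypothetical lifespans (years) by occupation group; illustrative values only.
lifespans = {
    "monk": [72, 68, 80, 75],
    "official": [61, 58, 70, 66],
}

def summarize(groups):
    """Return {group: (mean, sample standard deviation)} plus an 'overall' entry."""
    summary = {name: (mean(vals), stdev(vals)) for name, vals in groups.items()}
    pooled = [v for vals in groups.values() for v in vals]
    summary["overall"] = (mean(pooled), stdev(pooled))
    return summary

summary = summarize(lifespans)
for name, (m, s) in summary.items():
    print(f"{name}: mean={m:.2f}, sd={s:.2f}")
```

Comparing each group's standard deviation with the overall one gives a first, informal look at how much of the variation sits between groups rather than within them.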

YungMingHuang (talk)17:42, 26 January 2014
 

Hi:

I'm interested in the following two projects and I have some questions and comments regarding each.

1. Longevity in China:

  • What is the source of chronic data, and will the groups be consistent with those of the Harvard data groups?
  • Is the main subject of interest religion or occupation? If the categories get too numerous, we may not have enough data for each occupation to draw conclusions.
  • Many factors other than occupation affect longevity, such as the district people live in. Ignoring these factors can cause the researcher to draw inaccurate conclusions. Either the factors should be included or they should be held constant (for example, by taking all data from one specific province).

2. DNA and Aging in Mice

  • What are the criteria for being an "aged mouse" and a "young mouse"? Is there another group between "young" and "old" mice?
  • How are the mice selected? Are the ages of the mice known? Do the mice spend their entire lives in the lab?
  • Are the experiments replicated or repeated?
  • The mice need to be randomly assigned to different technicians to measure or draw samples in order to minimize the possible source of error in measurement.

Tracy Tien

Mengping (talk)18:58, 27 January 2014

Hello I am interested in the following two projects: 1. Micronutrient Powders

Questions: Was the sample of 300 children chosen equally from each of the 4 districts (75 children from each district)? Comments: A paired t-test might be used to compare the effect of MNP on mean Hb level in the treatment group and the control group.

2. DNA And AGE In Mice

Questions: How many samples will be used? At what age are mice defined as aged mice? What is cell culture? Do you want to compare protein expression, various functions (migration, response to injury, etc.), levels of secreted proteins, and level of DNA compaction after treatment in young mice and aged mice, as well as before the treatment? Which experimental design is used? Is the experiment conducted as a time series?

Comments: ANOVA may be used to compare effects of the treatment on young mice and aged mice.

HyunWooKim1 (talk)21:35, 27 January 2014
 

Hi,

These are the two projects I'm interested in:

1. Micronutrient Powders Question(s): The client wants to pair each intervention district with a control district, so why isn't the number of intervention children (604) equal to the number of control children (505)? Are there any other factors that could affect the severity of diarrhoea? Comment: Other factors could determine the children's growth and the severity of diarrhoea. For example, their genes and diets could affect the results. We could use a two-sample t-test in this case, since the sample sizes of the control group and the intervention group are different.

2. TA Grading Consistency Question(s): Are the TAs all chosen from the same faculty? Typically, a professor has a class average in mind; would that affect how TAs mark their students? Comment: Include other factors about the TAs. For example, how many courses are the TAs teaching and taking? The more courses they are taking and/or teaching, the more likely they are to be inconsistent. Other factors such as their level of education (undergraduate, graduate, Master's, PhD) and gender could also have an impact on the tool.
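
For the unequal group sizes mentioned under Micronutrient Powders, one option is Welch's form of the two-sample t-test, which does not assume equal variances or equal sample sizes. This is only a sketch in standard-library Python; the haemoglobin values are hypothetical, and turning the statistic into a p-value would need a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors of each mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical haemoglobin values (g/dL); the groups are deliberately unequal in size.
intervention = [11.2, 11.8, 12.1, 11.5, 12.4, 11.9]
control = [10.9, 11.1, 11.4, 10.7]

t, df = welch_t(intervention, control)
print(f"t = {t:.3f}, df = {df:.1f}")
```

Note that this treats the children as independent observations; the two-stage cluster sampling described in the case study would need to be accounted for in a real analysis.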

ZhenHuang (talk)04:11, 28 January 2014
 

1. Predictors in Anemia

Questions: Are there steps to ensure an even age distribution in all 4 groups? Are you considering taking into account whether the sampled women are ethnic Cambodians or from elsewhere? Perhaps economic standing will also play a role in the results gathered.

Comments: Use ANOVA to compare the treatment groups.

2. DNA and Mice age

Questions: How are the sample/treatment sizes chosen/used? What are the controls for mice deemed young and old, to ensure that the two groups do not vary much within themselves? What is the drug used? Is more than one drug being used? Does the study depend on time after treatment?

Comments: Use ANOVA to test for statistically significant differences between age groups.

Jonluo (talk)04:46, 28 January 2014
 

Hi, 1. Predictors in Anemia

Questions: How were these villages chosen for the study? (Randomly selected or volunteers?) Are sample sizes similar across groups?

Comments: Perhaps we can also use a stepwise/stagewise variable selection method to reduce the model down to the meaningful variables first and then fit the model again before estimating the effects of each variable.


2. DNA and Age in Mice

Questions: How many samples are going to be used and how will they be split up? Are there any blocking factors that are relevant during the experiment? Where will randomization be used in this experiment? Also, for the age division in the mice, are you planning to split them with a point value as a cutoff (e.g. mice older than 1 year vs. mice less than 1 year old) or into two distinct groups (e.g. mice 1 month old vs. mice 12 months old)?

Comments: Since we are testing for differences between two groups then a two sample t-test can be used.

DerekCho (talk)06:25, 28 January 2014
 

Good Evening,

The two projects I picked are:

1. Longevity in China

Q: a) Are the data well randomized across the study groups, such as emperors, Buddhist monks/nuns, Daoists, medical practitioners, etc.? The north of China usually has more wars than the south, so people in the south have longer lives. b) Are lurking variables such as war and sickness, which also cause death, taken care of, or are only natural deaths of interest? Wars and sickness will affect the result of interest.

C: We can use R/Excel code to do the data cleanup.

2. TA Grading Consistency

Q: a) If the tool is not statistically accurate, are the results statistically reliable? Wrong conclusions may be drawn. b) Are lurking variables taken care of? Some lurking variables may be class size and the workload of the course. Suppose one class has only 30 people and the TA only needs to mark two midterms and a final, while another class has 150 people and the TA needs to mark exams and assignments, supervise labs, etc. This will surely affect the marking consistency of the TAs.

C: Statistical sample size selection techniques can be used to find the minimum number of TAs/students required to produce reliable results.

Sincerely

HongXuanZhao (talk)06:29, 28 January 2014
 

1. TA Grading Consistency: Questions: Is there randomization in the study? For example, are the students randomly assigned to each TA? Is there an equal number of students for each TA?

Comments: A common grading scheme/template should be provided to the TAs in order to minimize the inconsistency in grading.

2. Longevity in China Questions: Do all the subjects in the study have piety towards their occupation/religion? For example, what if a subject is both Buddhist monk and medical practitioner?

Comments: We should construct an ANOVA table to see the interactions between factors.

KathyYanSinNg (talk)06:42, 28 January 2014
 

Establishing a statistical tool to assess TA grading consistency: Questions: When the TA grading was assessed, did each TA grade more than one section? A single-section sample may not be accurate, because by chance that particular section may be better or worse than others. What exactly do you mean by common exam grades? Are they from that year or from previous years? Comments: We could cross-check the Excel tool by also looking at the differences between TA grading with ANOVA.

DNA structure of cells in young vs. aged mice: Questions: Was there replication or were there repeated measurements? How large was the sample size? How were the data measured, that is, was the experiment blocked and randomized? Comments: Randomizing and blocking this experiment is really important, as the data may be correlated and our results may not be accurate.

TommyPoChungTang (talk)07:55, 28 January 2014
 

Hello,

1. Longevity in China Questions: Would you be open to using the statistical program R and performing tests? Simply calculating the means would not be a good indicator of whether there is a significant difference in the average lifespan of a particular occupation. Are there other variables, such as location, that may play a more significant role? Comment: Possibly a t-test, ANOVA or even variable selection could be used to analyze the data.

2. Predictors of Anemia Questions: What model selection method was chosen? Different methods could give us different results. Were the blood samples analyzed by the same lab? Comment: Look in more detail at how the data were analyzed.

JonnyPoHongTang (talk)08:05, 28 January 2014
 

Hi, I am interested in the following two topics: Micronutrient Powders Questions: During these 12 months, what if some children in the control group get severe anaemia? It is impossible that they receive no treatment other than MNP. They measured 3 times in one year. Did each measurement take place at the same time? And were these 3 measurements evenly spaced over the year? Problems: In this two-stage cluster sampling, they purposively chose two districts rather than randomly selecting them. This might cause some bias in the results.

Longevity in China Questions: There are two important factors that can affect the result. One is the occupation and the other is the period people were living in. How did they isolate the effect of each factor? Are these the only two factors in the data set? Comments: The people who have records of their birth and death dates are likely to be famous Chinese people. This might lead to some bias. Since famous people tend to have a better quality of life, this might affect the result for longevity in China.

ShengyiZhu (talk)09:00, 28 January 2014
 

Hello,

Here are the two projects that I'm interested in:

1. Longevity in China

Question: a) How were the data collected and what sampling methods were used in the study? b) How are the occupation groups identified and do we already know the occupation of each individual? Comment: We could use ANOVA to compare all the isolated groups.

2. Predictors of Anemia

Question: a) It's mentioned that '...each of the eight independent variables (30)'; what are the eight variables and does 30 mean that each variable has 30 random samples from the four villages? Comment: we could add a control group to see the effects of other factors.

AyakaYingCui (talk)09:09, 28 January 2014
 

I am interested in the following two topics: 1) Micronutrient Powders Question: i) How many primary sampling units are listed in the frame? ii) What problems were encountered when the data were collected using simple random sampling? Comment: A two-sample t-test can be used to test the mean difference between the intervention and control groups. 2) Longevity in China Question: Were the data collected successively at equally spaced time intervals, or all at once? Comment: The average lifespan for the entire data set might not be a good estimator of mean lifespan for the Chinese if the data form a time series.

WenyanZhao (talk)09:30, 28 January 2014
 

Hello

Here are my chosen topics and the corresponding comments and questions:

1) Spruce Budworm - My question for this topic is: for the number of pupae and the number of moths, do we count each number as one category or do we have a range for the counting? - My comment for this topic is that we could convert all the data into numerical variables. Since the data will be normalized, we can use Pearson’s R Correlation Test to investigate the correlation between variables. Also, if we do not want to convert the data, or are not sure about the correct way of normalizing the data, we can use Spearman’s R Correlation Test instead.

2) Micronutrient Powders - My question for this topic is: do we need to worry about equal proportions from each district in this experiment? For example, do Musanze and Burera need to have the same number of children participating in the experiment in order to get an accurate percentage result for comparison? - My comment on testing the effectiveness of MNP is that, since we are comparing a control group (i.e. Musanze or Nyaruguru) with the MNP-consuming group (i.e. Burera or Nyamagabe), we can use a paired t-test to analyze the data.
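
The Pearson versus Spearman choice raised for the Spruce Budworm data can be sketched as follows, in standard-library Python. The variable names and counts are hypothetical. Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, with tied values given their average rank.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: dose of compound vs. surviving pupae (monotone but not linear).
dose = [1, 2, 3, 4, 5]
pupae = [100, 50, 20, 8, 1]

print(f"Pearson:  {pearson(dose, pupae):.3f}")
print(f"Spearman: {spearman(dose, pupae):.3f}")
```

With a relationship that is monotone but curved, Spearman's coefficient reaches its extreme value while Pearson's does not, which is one reason to prefer the rank-based version for count data that are far from normal.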

TerriZhang (talk)14:54, 28 January 2014
 

Micronutrient Powders -Questions:

 -Do we need to have the same sample size in both the control and intervention groups?
 -There are two trials: Nyamagabe (control) vs Nyaruguru and Burera (control) vs Musanze. Do we need to use one trial or more than two trials? If the result of the data analysis for the first trial is the opposite of that for the second trial, what should we do?
 -In the methods part, it says we need to randomly select 300 children from each district, so the total sample size is 300*4=1200. However, at the end of the paragraph, it says there are 1109 children. These numbers are not consistent. Maybe there are some missing data?

-Comments: If we only compare two groups, control and intervention, then we can use a paired t-test to analyze the data.



Spruce Budworm -Questions:

  -When we count the number of surviving pupae and moths, we cannot determine whether those that died were killed by the compound similar to the one made from the needles of Douglas-fir trees or died a natural death.
  -I have the same question as the one in the statistical advice part.
  

-Comments: A random effect should be counted in the correlation analysis.


Longevity in China -Questions:

 -How are the periods specified? This is not clearly stated in the article.
 -We know the total sample size is 27,000. Are the three groups' sample sizes equal?


-Comments: To compare the longevity of different groups of people, we can use a one-way ANOVA test.

FangyuDi (talk)17:32, 28 January 2014
 

Good evening,

The two projects I am interested in are:

  * 1. TA Grading Consistency     
         What is meant by the reference table? What kind of data are given to create such a table? Were all TAs involved in all the grading (homework/midterms/final exams) and in all of the tutorials? If not, how was that decided? What does the "plug and play" statistical tool do? For instance, in an office job there are different positions held by different people, such as an HR Manager or a Payroll Manager, and in most instances they are qualified only for those specific positions. The same notion applies to TAs, since they are hired based on their qualifications and placed in their respective departments. In the social sciences (Criminology), a TA has a different background and different marking criteria from a TA in a science department (Statistics). How were the TAs chosen? Did they all come from the same department? What if there are not enough TAs to test in one department? Were the rest of the TAs chosen from other departments within the same school? Or were TAs chosen from a similar department at another school? What did their grading consist of? How much time were they given for grading? Some people need more time than others to grade, especially if a student's handwriting is messy, and this may affect the grading consistency. In order to run a statistical analysis, one would need a large enough sample, and I am concerned about whether that was accomplished. This is something I have always wondered about, and I am very interested in whether there is any inconsistency in the grading and, if there is, how big the inconsistency is. In that case, the TAs would need more training so that grading would be fair for all students.
  * 2. Predictors Of Anemia 
           What are the eight independent variables? And what is meant by the 30 in parentheses? Were there 30 samples tested for each of the eight independent variables? What are some of the assumptions I should be aware of? Why were the samples chosen from only four villages? Are there any limitations? Effect modification often occurs in biomedical research, so we might need to take it into consideration.

Sincerely, Simona Cristiana Hrehorciuc

CristianaSimonaHrehorciuc (talk)07:11, 26 January 2014

Hi, I am interested in the following two projects:

  1. Longevity in China
    • What kind of periodization is proposed? Are these periods same in length?
    • If R can read the occupation codes, we can save a lot of the work of doing that in Excel. Even so, using Excel to isolate individuals according to their occupation groups is not hard (assuming isolation is not doable in R). If the codes involve some complexity, I believe we can observe some patterns in the codes. Above all, we need to view the data first.
    • Is region of residence considered in this case? People who resided in an urban area might have had more access to medication, fresh food and water compared to those who resided in a rural area, and thus lifespan could be affected by this factor. To make the result more accurate, we can further separate the data into different regions (if this piece of information is included in the original data).
  2. DNA and Age in Mice
    • How are young and aged mice defined?
    • What is the sample size? What kind of data have been collected so far?
    • More details about the experiment are needed, i.e., whether the experiment is replicated and whether the mice are randomly selected.
QingWeiLi (talk)07:24, 28 January 2014
 

1. Micronutrient Powder Question: Do we need to consider the amount taken by each unit each day? Comment: Two two-sample t tests can be performed for this project.

2. Spruce Budworm Question: For those "counts" in measure section, do we consider them categorical or numerical with a certain range? Comment: A linear regression model can be constructed to provide a good analysis based on data given.

ChuanZhang (talk)07:05, 28 January 2014