Course talk:CPSC522/Treatment of Missing Data
|Thread title||Replies||Last modified|
|Feedback(J0)||3||22:34, 20 February 2019|
|Feedback on first draft.||1||01:33, 10 February 2019|
Sections, formalism and examples look good. I can't completely verify the coverage of important concepts or the correctness, but I think enough time was spent.
I read it in more detail. Can you look at the denominator of the first formula after this sentence: "Using the definition of conditional probability, we conduct the following steps"?
I don't know whether it should be there or not.
Looks good. Here are a few comments based on the first draft.
In preliminaries, what are the "responses"? Why do you say "?" is the missing entries inside X, but then also use it for y in your example? (Does it mean something different there?)
What is a missingness indicator for a feature? (What does it mean for a feature to be missing, when the feature can be missing for some examples and not for others? What if every feature has at least one example with that feature missing?)
I'm not sure it is appropriate to claim that women are less likely to reveal their age (even if it were true).
I cannot work out from your examples what the difference between MAR and NMAR is. (I guess that MCAR is when the R variable has no parents.) Why isn't there a gender variable as well as a GENDER variable? I'm guessing that this means that gender is not observed. In the description, how does "using another method" relate to "model why the data are missing"? Is there a clearer definition?
Why is collaborative filtering suitable for NMAR?
I am not sure what "complete training data, incomplete test data" means. Isn't this what supervised machine learning is? I thought that you were considering missing data in the training set (that is what I got from the introduction, which I thought was all about the training data).