Course talk:CPSC522/Treatment of Missing Data

From UBC Wiki


Thread title | Replies | Last modified
Feedback(J0) | 3 | 22:34, 20 February 2019
Feedback on first draft. | 1 | 01:33, 10 February 2019


Feedback(J0)

The sections, formalism, and examples look good. I can't completely verify the coverage of important concepts or their correctness, but I think enough time was spent on it.

01:26, 13 February 2019

Thank you for the feedback, Hooman.

19:12, 13 February 2019

I read it in more detail. Can you look at the denominator of the first formula after this sentence: "Using the definition of conditional probability, we conduct the following steps"?

I don't know whether it should be there or not.

00:27, 19 February 2019

You are absolutely right. I corrected the error.

22:34, 20 February 2019

Feedback on first draft.

Looks good. Here are a few comments based on the first draft.

In the preliminaries, what are the "responses"? Why do you say that ? denotes the missing entries in X, but then also use it for y in your example? (Does it mean something different there?)

What is a missingness indicator for a feature? What does it mean for a feature to be missing, when the feature can be missing for some examples and not for others? What if every feature is missing in at least one example?

I'm not sure it is appropriate to claim that women are less likely to reveal their age (even if it were true).

I cannot work out from your examples what the difference between MAR and NMAR is. (I guess that MCAR is when the R variable has no parents.) Why isn't there a gender variable as well as a GENDER variable? I'm guessing that this means that gender is not observed. In the description, how does "using another method" relate to "model why the data are missing"? Is there a clearer definition?

Why is collaborative filtering suitable for NMAR?

I am not sure what "complete training data, incomplete test data" means: isn't this just what supervised machine learning is? I thought that you were considering missing data in the training set (that is what I got from the introduction, which I thought was all about the training data).

04:21, 8 February 2019

Thanks for the feedback. I will incorporate these points.

01:33, 10 February 2019