# Course talk:CPSC522/Titanic: Machine Learning from Disaster

## Contents

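- Critique (1 reply, last modified 18:39, 22 April 2016)
- suggestions and questions (1 reply, last modified 06:26, 21 April 2016)
- Comments and critique (1 reply, last modified 03:28, 21 April 2016)
- Suggestions for Titanic: Machine Learning from Disaster (1 reply, last modified 22:54, 20 April 2016)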

## Critique

Hi yunyuan,

It was my pleasure to read your page. I think the problem you are trying to solve is very practical and important, since many datasets include missing values. I also learned a lot about how to process missing values rather than simply removing the samples that contain them. However, I found that the Reference section is missing. In addition, can you briefly explain the coefficient of determination instead of just linking to the Wikipedia article on the term?

Sincerely,

Ke Dai

06:56, 21 April 2016

Hi Ke,

Thank you for your feedback! I will change my page based on your suggestion.

Sincerely,

Junyuan Zheng

18:39, 22 April 2016
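
Since this thread asks for a brief explanation of the coefficient of determination, here is a minimal sketch of how it is usually computed, alongside RMSE. With true values $y_i$, predictions $\hat{y}_i$ and mean $\bar{y}$, the standard definition is $R^{2} = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$, so 1 means a perfect fit and 0 means no better than always predicting the mean. The function names and the age values below are made up for illustration and are not taken from the project page.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ages (illustrative only, not results from the page).
ages_true = [22.0, 38.0, 26.0, 35.0]
ages_pred = [24.0, 36.0, 27.0, 33.0]

print(rmse(ages_true, ages_pred))
print(r_squared(ages_true, ages_pred))
```

For these made-up values the RMSE is about 1.80 years and $R^{2}$ is about 0.92. RMSE is expressed in the units of the target (years of age here), while $R^{2}$ is unitless, which is why the two are often reported together.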

## suggestions and questions

Hi Junyuan,

Interesting topic, and very easy to understand.

I have some suggestions and questions here:

1. MICE does not always perform as well as your graph suggests, though. Can you explain more about why you chose MICE as the comparison target? For example, you could describe how MICE performs in similar circumstances.

2. I think your neural network solution is very inspiring and reasonable. Maybe you just chose the wrong dataset for testing; if you tested on cases that have more instances, the results would be much more acceptable.

3. In your “master” investigation part, do you mean that you have only 53 instances for this training, or do you also have others? Why are there also people with other titles in your result? Did I miss anything?

I still do not want to believe that your hypothesis is wrong.

05:03, 21 April 2016

Hi Dandan,

1. The main reason I use MICE is that it is very easy to implement, and most people in the competition are using MICE with R to fill in the missing values.

3. Yes, I used the full dataset first (with 5 different title classes), but the results of the single neural network and the neural network ensemble were not good. I therefore wanted to investigate why the neural network ensemble could not improve the performance. So, in the further investigation section, I implemented a single neural network using a dataset that contains only the Master title, which has 53 instances.

Thanks!

Sincerely,

Junyuan Zheng

06:26, 21 April 2016

## Comments and critique

Hi Junyuan,

First, let me thank you for your project and your contribution to the CPSC 522 wiki pages. I like your page and find your project topic interesting. I would like to share my opinion and give you some comments; maybe they will help you.

1) First of all, you can treat this assignment's wiki page as a research paper and include the usual sections, such as abstract, motivation, introduction, method, evaluation, results, conclusion, future work and acknowledgment.
2) I would work a bit more on the wording of some of the section titles, such as "Using Probability (Probabilistic) Method to Fill Missing Age Value(s)".
3) Maybe move most of the code and pseudocode to a file or link and point to it, rather than putting all of it here.
4) I need more explanation of the attributes.
5) Finally, as I mentioned in the first point above, there should be subsections such as conclusion, results or data, so that a reader who wants to look at one specific aspect of the page can do so as efficiently as possible.

02:53, 21 April 2016

Your suggestions are very helpful! I will change the format of my wiki page. For question 4, I have just added a "variable descriptions" table; I hope this will be helpful.

Thank you!

Sincerely,

Junyuan

03:28, 21 April 2016

## Suggestions for Titanic: Machine Learning from Disaster

Hi Junyuan,

I enjoyed going through your page. I think you have very clearly explained the hypothesis, the problem description and how you proceeded towards solving it. Some suggestions and queries are outlined below:

1. It would be great if you could proofread the page once, because there are typos and grammatical errors, some of which make it hard to understand what you are trying to convey.
2. You might want to explain the attributes. For instance, for the attribute Embarked, what do S, C and Q stand for? What does Parch stand for? Under your section "Using probability method to fill missing age value", you might want to explain what PMM (predictive mean matching) stands for and, in a line or two, how it works.
3. I think it was really clever how you used the title and relevant attributes to guess the age of a person. By rare title, do you mean titles like Dr.?
4. You could give links for RMSE and $R^{2}$, because a lay reader might not know these standards of comparison.
5. If my understanding of ensemble methods is correct, they create and combine multiple models to, in effect, average out the errors. I am not sure how multiple models are created and combined in your ensemble NN. Also, why did you choose to ignore the attribute FamilyID in your second set of implementations? You could probably explain this more explicitly on your page.
6. So, if I understand correctly, your conclusion is that because of insufficient unique data you are unable to train your neural network properly, and so you cannot use it to give a good prediction of the missing values of the age attribute. You could consider adding Conclusion, Discussion and Future Work sections to give readers a better idea of what you have established and what could be checked further.

Thanks for a great page!

Ritika

20:02, 20 April 2016
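
On the request above for a line or two about how predictive mean matching (PMM) works: the idea is to fit a regression on the rows where the value is observed, predict for every row, and then fill each missing value with the actual observed value of a "donor" row whose prediction is close. The sketch below is a simplified, single-predictor version under that description. The function names, the choice of `k` and the toy data are assumptions for illustration; the full procedure in the R `mice` package also adds a random draw of the regression coefficients, which this sketch skips.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pmm_impute(xs, ys, k=3, rng=None):
    """Fill None entries of ys by simplified predictive mean matching."""
    rng = rng or random.Random(0)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in observed],
                                [y for _, y in observed])

    def predict(x):
        return slope * x + intercept

    # Each donor keeps its predicted value and its real observed value.
    donors = [(predict(x), y) for x, y in observed]
    filled = []
    for x, y in zip(xs, ys):
        if y is not None:
            filled.append(y)
            continue
        target = predict(x)
        # The k donors whose predictions are closest to this row's prediction.
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        # Impute a real observed value, not the regression prediction itself.
        filled.append(rng.choice(nearest)[1])
    return filled


# Toy data (hypothetical): one numeric predictor and ages with gaps.
predictor = [1, 2, 3, 4, 5, 6]
ages = [10.0, None, 30.0, 41.0, None, 62.0]
print(pmm_impute(predictor, ages, k=2))
```

Because every imputed value is copied from an observed row, PMM never produces values outside the observed range (for example, a negative age), which is one common reason it is preferred over plugging in the regression prediction directly.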

Hi Ritika,

Thank you for your feedback! I will modify my page based on your suggestions.

For question 3, 'Dona', 'Lady', 'the Countess', 'Capt', 'Col', 'Don', 'Dr', 'Major', 'Rev', 'Sir' and 'Jonkheer' all belong to the Rare title class.
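
The grouping listed above could be sketched roughly as follows. This is only an illustration, not the code used on the page: it assumes passenger names follow a "Surname, Title. Given names" layout, and the regular expression and function names are hypothetical. Titles outside the rare list are returned unchanged.

```python
import re

# Titles listed in the reply above as belonging to the Rare class.
RARE_TITLES = {
    "Dona", "Lady", "the Countess", "Capt", "Col", "Don",
    "Dr", "Major", "Rev", "Sir", "Jonkheer",
}

# Assumed name layout: "Surname, Title. Given names".
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the text between the comma and the first period, if any."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else None


def title_class(name):
    """Map rare titles to 'Rare'; leave every other title as it is."""
    title = extract_title(name)
    return "Rare" if title in RARE_TITLES else title


print(title_class("Braund, Mr. Owen Harris"))
print(title_class("Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)"))
```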

For question 5, I did test a neural network that includes FamilyID, but the result was almost the same. I will update the page based on your suggestion and include FamilyID.

Thanks!

Junyuan

22:53, 20 April 2016
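
On the question in this thread about how an ensemble creates and combines multiple models: one common recipe is bagging, where each model is trained on a bootstrap resample of the training data and the predictions are averaged. The sketch below shows that recipe with a simple least-squares line standing in for each base model. It is a generic illustration under those assumptions, not the neural network ensemble from the project page, and all names and data in it are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    """Average the predictions of models fit on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw n training rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [xs[i] for i in idx]
        boot_y = [ys[i] for i in idx]
        if len(set(boot_x)) < 2:
            continue  # a line cannot be fit to a single distinct x
        slope, intercept = fit_line(boot_x, boot_y)
        predictions.append(slope * x_new + intercept)
    if not predictions:
        raise ValueError("no usable bootstrap sample")
    # Combine: a plain average of the individual predictions.
    return sum(predictions) / len(predictions)


# Toy data (hypothetical): noisy values around a rising trend.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3.2, 4.9, 7.1, 8.8, 11.3, 12.9]
print(bagged_predict(train_x, train_y, x_new=7))
```

Averaging helps most when the individual models make different errors. With very few distinct training rows, as in the 53-instance Master subset discussed earlier on this page, the bootstrap samples overlap heavily, so the models tend to agree and the ensemble has less to average out.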