Course:CPSC532:StaRAI:2017:As1:Q1

From UBC Wiki


Assignment 1 - Question 1

Please post your results here on how well a method you propose works. Do not include more significant digits than you have evidence for (running your algorithm once does not provide evidence for accuracy). This is not a competition; we want as many suggestions as possible. Keep the method name in the table short and add an explanation below:
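A minimal sketch of how to get that evidence: repeat the experiment many times and report the mean together with its standard error. The trial in `run_once` is a hypothetical setup chosen only for illustration (true probability drawn uniformly, 100 test examples, training-proportion predictor, sum-of-squares error). It is not taken from triv_learn.py.

```python
import random
import statistics


def run_once(n_train, n_test=100, rng=random):
    """One hypothetical trial (assumed setup, not from triv_learn.py):
    draw a true probability p uniformly, sample n_train training labels and
    n_test test labels, predict the training proportion, and return the
    sum-of-squares error on the test labels."""
    p = rng.random()
    train = [1 if rng.random() < p else 0 for _ in range(n_train)]
    test = [1 if rng.random() < p else 0 for _ in range(n_test)]
    pred = sum(train) / len(train)
    return sum((y - pred) ** 2 for y in test)


def summarize(n_train, repeats=1000, seed=0):
    """Run many independent trials and return (mean, standard error)."""
    rng = random.Random(seed)
    errors = [run_once(n_train, rng=rng) for _ in range(repeats)]
    mean = statistics.mean(errors)
    std_err = statistics.stdev(errors) / len(errors) ** 0.5
    return mean, std_err


if __name__ == "__main__":
    for n in (1, 10, 100):
        mean, se = summarize(n)
        # Digits of the mean finer than the standard error are not
        # supported by the evidence.
        print(f"n={n}: mean {mean:.2f}, standard error {se:.2f}")
```

Under these assumptions the mean for n=1 should land near 100/3, and the standard error tells you how many of its digits are worth posting.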

David's solution

Sum of Squares Error

Method                n=1     n=2     n=3     n=4     n=5     n=10    n=20    n=100   n=1000
Predict 0.5           25      25      25      25      25      25      25      25      25
Training proportion   33.61   25.234  23.2    21.4    19.9    18.5    17.2    16.9    16.78
Predict Mode          33.6    32.7    30.5    29.3    28.8    27.4    26.3    25      24.9

Sum of Absolute Error

Method                n=1     n=2     n=3     n=4     n=5     n=10    n=20    n=100   n=1000
Predict 0.5           50      50      50      50      50      50      50      50      50
Training proportion   33.3    33.3    33.8    33.8    32.7    32.6    33.2    32.7    33.6
Predict Mode          33.6    33.2    30.4    29.4    27.9    27.9    26.5    25.8    25.4

Log Loss

Method                n=1     n=2     n=3     n=4     n=5     n=10    n=20    n=100   n=1000
Predict 0.5           100     100     100     100     100     100     100     100     100
Training proportion   5979.5  100.0   140.5   70.0    5979.5  68.2    68.2    68.0    68.9
Predict Mode          121.9   119.6   109.2   113.8   105.7   102.1   98.8    95.3    93.8

Description of Methods

Predict 0.5

Always predict 0.5
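Why the Predict 0.5 row never changes: whatever the actual label, each test example contributes 0.25 squared error, 0.5 absolute error and exactly 1 bit of log loss. The constant rows (25, 50, 100) are consistent with 100 test examples and a base-2 log loss; both the test-set size and the log base are inferences from the tables, not stated on this page. A minimal sketch of the arithmetic:

```python
import math


def sum_squares_error(preds, labels):
    """Sum over test examples of (actual - predicted)^2."""
    return sum((y - p) ** 2 for p, y in zip(preds, labels))


def sum_absolute_error(preds, labels):
    """Sum over test examples of |actual - predicted|."""
    return sum(abs(y - p) for p, y in zip(preds, labels))


def log_loss_bits(preds, labels):
    """Sum of -log2 of the probability given to the actual label.

    math.log2(0) raises ValueError, so a wrong prediction of exactly 0 or 1
    breaks this exact version."""
    return -sum(math.log2(p if y == 1 else 1 - p) for p, y in zip(preds, labels))


# Assumed test set of 100 labels; the results do not depend on which labels.
labels = [1, 0, 0, 1] * 25
preds = [0.5] * len(labels)

print(sum_squares_error(preds, labels))   # 25.0
print(sum_absolute_error(preds, labels))  # 50.0
print(log_loss_bits(preds, labels))       # 100.0
```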

Training Proportion

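Predict $\frac{n_1}{n_0 + n_1}$, where $n_1$ and $n_0$ are the numbers of training examples labelled 1 and 0. This is the "training average" or the "empirical proportion". It is taken from http://www.cs.ubc.ca/~poole/cs532/2017/as1/triv_learn.py (Can someone do log loss, please?)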
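A minimal re-implementation of this predictor for illustration (not the code in triv_learn.py; the list-of-0/1-labels input format is an assumption). Note that with n=1 it always predicts exactly 0 or 1, so a single wrong test label makes the exact log loss infinite.

```python
def training_proportion(train_labels):
    """Predict n1 / (n0 + n1), the fraction of training labels equal to 1.

    train_labels is assumed to be a non-empty list of 0/1 labels.
    """
    n1 = sum(train_labels)
    n0 = len(train_labels) - n1
    return n1 / (n0 + n1)


print(training_proportion([1, 0, 1]))  # 0.6666666666666666
print(training_proportion([0]))        # 0.0
```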

Predict Mode (Moumita)

Predicts the label of the majority class: if n1 > n0 it predicts 1, otherwise it predicts 0. Log loss cannot be computed for a wrong prediction of 0 or 1, because log(0) is undefined, so for this loss only it predicts 0.1 instead of 0 and 0.9 instead of 1.
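A sketch of the mode predictor as described above, with ties (n1 = n0) going to 0 as the "otherwise" implies. The `for_log_loss` flag is a hypothetical parameter name used only to show the 0.1/0.9 substitution.

```python
def predict_mode(train_labels, for_log_loss=False):
    """Predict 1 if n1 > n0, else 0.

    With the hypothetical for_log_loss flag set, 1 becomes 0.9 and 0 becomes
    0.1, so a wrong prediction has a finite log loss."""
    n1 = sum(train_labels)
    n0 = len(train_labels) - n1
    label = 1 if n1 > n0 else 0
    if for_log_loss:
        return 0.9 if label == 1 else 0.1
    return label


print(predict_mode([1, 1, 0]))                     # 1
print(predict_mode([1, 0]))                        # 0 (tie)
print(predict_mode([1, 1, 0], for_log_loss=True))  # 0.9
```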

Notes on Log Loss

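The log loss was not calculated exactly, which would be $-\sum_{e} \left[ y_e \log_2 p_e + (1 - y_e) \log_2 (1 - p_e) \right]$, where $y_e \in \{0, 1\}$ is the actual label and $p_e$ the predicted probability of label 1 for test example $e$. To avoid potential numerical issues, what is calculated here is .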

The extra error term introduced here is very small, so as long as the predictions are in a reasonable range (not extremely close to 0 or 1), it has no real effect on the result.
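Since the exact smoothed formula is not given above, the sketch below shows one common way to keep log loss finite: clip the probability given to the actual label into [eps, 1 - eps] before taking the log. Both the clipping technique and the value of `eps` are assumptions and may differ from what produced the tables. It also illustrates the point about the extra error: when every prediction is already in a reasonable range, the clipped and exact sums are identical.

```python
import math


def log_loss_exact(preds, labels):
    """Sum of -log2(probability given to the actual label); inf if that is 0."""
    total = 0.0
    for p, y in zip(preds, labels):
        q = p if y == 1 else 1 - p
        if q <= 0:
            return math.inf
        total -= math.log2(q)
    return total


def log_loss_clipped(preds, labels, eps=1e-10):
    """Same sum, but each probability is first clipped into [eps, 1 - eps].

    eps is a hypothetical choice; it caps the cost of one confident mistake
    at -log2(eps) bits (about 33.2 bits for eps=1e-10)."""
    total = 0.0
    for p, y in zip(preds, labels):
        q = p if y == 1 else 1 - p
        q = min(max(q, eps), 1 - eps)
        total -= math.log2(q)
    return total


print(log_loss_exact([0.0], [1]))    # inf
print(log_loss_clipped([0.0], [1]))  # about 33.2
```

For predictions such as 0.9 and 0.2, clipping changes nothing and both functions return the same value.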