Course:CPSC532:StaRAI:2017:As1:Q1

Assignment 1 - Question 1

Please post here your results on how well a method you propose works. Do not include more significant digits than you have evidence for (running your algorithm once does not provide evidence for accuracy). This is not a competition; we want as many suggestions as possible. Keep the method name in the table short and add an explanation below:

David's solution

Sum of Squares Error

Method	n=1	n=2	n=3	n=4	n=5	n=10	n=20	n=100	n=1000
Predict 0.5	25	25	25	25	25	25	25	25	25
Training proportion	33.61	25.234	23.2	21.4	19.9	18.5	17.2	16.9	16.78
Predict Mode	33.6	32.7	30.5	29.3	28.8	27.4	26.3	25	24.9

Sum of Absolute Error

Method	n=1	n=2	n=3	n=4	n=5	n=10	n=20	n=100	n=1000
Predict 0.5	50	50	50	50	50	50	50	50	50
Training proportion	33.3	33.3	33.8	33.8	32.7	32.6	33.2	32.7	33.6
Predict Mode	33.6	33.2	30.4	29.4	27.9	27.9	26.5	25.8	25.4

Log Loss

Method	n=1	n=2	n=3	n=4	n=5	n=10	n=20	n=100	n=1000
Predict 0.5	100	100	100	100	100	100	100	100	100
Training proportion	5979.5	100.0	140.5	70.0	5979.5	68.2	68.2	68.0	68.9
Predict Mode	121.9	119.6	109.2	113.8	105.7	102.1	98.8	95.3	93.8

Description of Methods

Predict 0.5

Always predict 0.5
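The constant rows for this baseline can be checked by hand. The following is a minimal sketch, assuming each table entry is a sum over 100 binary test examples (an assumption suggested by the values 25, 50 and 100, not something stated on this page). The function name `losses_for_half` and the made-up test data are hypothetical.

```python
import math

def losses_for_half(test_examples):
    """Return (sum of squares, sum of absolute, log loss in bits) for always predicting 0.5."""
    p = 0.5
    sum_sq = sum((e - p) ** 2 for e in test_examples)
    sum_abs = sum(abs(e - p) for e in test_examples)
    log_loss = sum(-e * math.log2(p) - (1 - e) * math.log2(1 - p)
                   for e in test_examples)
    return sum_sq, sum_abs, log_loss

# Assumed test set: 100 made-up 0/1 examples. The mix of 0s and 1s does not
# matter, because every example costs 0.25, 0.5 and 1 bit respectively.
print(losses_for_half([0] * 40 + [1] * 60))  # (25.0, 50.0, 100.0)
```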

Training Proportion

Predict ${\frac {n_{1}}{n_{0}+n_{1}}}$, where $n_{0}$ and $n_{1}$ are the numbers of 0s and 1s in the training data. This is the "training average" or the "empirical proportion". It comes from http://www.cs.ubc.ca/~poole/cs532/2017/as1/triv_learn.py (Can someone do log loss, please?)
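As an illustration only, the empirical proportion can be sketched as below. This is not the code from triv_learn.py; the function name `training_proportion` and the list-of-0/1 input format are assumptions made for the example.

```python
def training_proportion(train):
    """Empirical proportion of 1s in a non-empty list of 0/1 training examples."""
    n1 = sum(train)            # number of 1s
    n0 = len(train) - n1       # number of 0s
    return n1 / (n0 + n1)

print(training_proportion([1, 0, 1, 1]))  # 0.75
print(training_proportion([0]))           # 0.0
```

With a single training example the prediction can only be exactly 0 or 1, which is the case where log loss runs into trouble.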

Predict Mode (Moumita)

Predicts the label of the majority class: if n1 > n0, predicts 1; otherwise predicts 0. Log loss gives an error for a prediction of exactly 0 or 1, because it needs the log of 0. So for this loss only, it predicts 0.1 or 0.9 instead of 0 or 1.
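A minimal sketch of this rule, assuming the training data is a list of 0/1 values. The function name `predict_mode` and the `for_log_loss` flag are hypothetical names chosen for the example, not taken from the original solution.

```python
def predict_mode(train, for_log_loss=False):
    """Predict the majority label; ties (n1 == n0) go to 0, as in the rule above."""
    n1 = sum(train)            # number of 1s
    n0 = len(train) - n1       # number of 0s
    label = 1 if n1 > n0 else 0
    if for_log_loss:
        # Soften the prediction so that log loss never takes the log of 0.
        return 0.9 if label == 1 else 0.1
    return label

print(predict_mode([1, 1, 0]))                     # 1
print(predict_mode([1, 0]))                        # 0 (tie)
print(predict_mode([1, 1, 0], for_log_loss=True))  # 0.9
```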

Notes on Log Loss

The log loss values were not calculated with the exact formula, which is $-e\log _{2}(p)-(1-e)\log _{2}(1-p)$, where $e$ is the observed value and $p$ is the prediction. To avoid the numerical problem of taking the log of 0, what is calculated here is $-e\log _{2}(p+10^{-100})-(1-e)\log _{2}(1-p+10^{-100})$.

The extra term added here is very small, so if the prediction is in a reasonable range it doesn't have any real effect on the result.
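The adjusted formula can be sketched in Python as follows. This is an illustrative version under the assumption that `e` is a single 0/1 outcome and `p` a single predicted probability; the names `EPS` and `smoothed_log_loss` are hypothetical.

```python
import math

EPS = 1e-100  # the 10^{-100} offset from the adjusted formula

def smoothed_log_loss(e, p):
    """Log loss in bits for outcome e (0 or 1) and prediction p, offset by EPS."""
    return -e * math.log2(p + EPS) - (1 - e) * math.log2(1 - p + EPS)

# A reasonable prediction is unaffected: 0.5 + 1e-100 rounds back to 0.5.
print(smoothed_log_loss(1, 0.5))  # 1.0

# A confidently wrong prediction costs a large finite amount
# (100 * log2(10), about 332 bits) instead of raising an error.
print(smoothed_log_loss(1, 0.0))
```

In double precision the offset only changes the result when `p` or `1 - p` is exactly 0; for any other prediction the sum rounds back to the original value.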