Critique

Hi Dandan,

Thanks for your kind and detailed feedback. Let me explain a little about C. C is the penalty coefficient on the error term: it controls how tolerant the model is of misclassified training samples. The larger the value of C, the less tolerant the model is of error. That is to say, too large a value of C fits the training data very closely but may lead to overfitting, while too small a value of C underfits and gives poor predictive accuracy. So in practice you have to find an appropriate value of C by trial and error.
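To make the role of C concrete, here is a minimal pure-Python sketch of the soft-margin objective, 0.5*||w||^2 + C * (sum of hinge losses). The data, weights, and C values are made up purely for illustration; they are not from the thyroid experiment.

```python
def soft_margin_objective(w, b, samples, C):
    """0.5*||w||^2 + C * sum of hinge losses over (x, y) pairs."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge = 0.0
    for x, y in samples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hinge += max(0.0, 1.0 - y * score)  # slack for this sample
    return margin_term + C * hinge

# Toy data: the second sample is misclassified (y = -1, score = +0.5).
samples = [([1.0, 0.0], 1), ([0.5, 0.5], -1)]
w, b = [1.0, 0.0], 0.0

# A larger C makes the same training error cost more, so the optimizer
# is pushed toward fitting that sample (the overfitting risk above).
print(soft_margin_objective(w, b, samples, C=0.01))
print(soft_margin_objective(w, b, samples, C=10.0))
```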

1. When C is set to 0.004, the tolerance for error increases and some important, necessary features are removed. So a model trained on the reduced training set cannot fit the test set well.

2. From my perspective, the size of a data set is not the most important factor affecting the predictive accuracy of a model. The distribution of samples across the different classes is more important. An imbalanced data set will misdirect feature selection and make the model meaningless. Take the thyroid data set as an example. When C is set to 0.005, only the feature sex is selected, yet the model trained on the new data set still achieves very high predictive accuracy, which means the model predicts whether a patient suffers from thyroid disease only by his or her sex. It is ridiculous, right? Do you know the iris data set? It contains only 150 instances, but the samples of its 3 classes are uniformly distributed, and the model trained on it attained 100% predictive accuracy in my experiment.
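The point about imbalance can be shown with a toy example (the labels below are made up, not the actual thyroid data): a classifier that ignores every feature and always predicts the majority class still scores high accuracy, which is exactly why the sex-only thyroid model looks deceptively good.

```python
# 95 "healthy" samples vs. 5 "sick" samples: a heavily imbalanced set.
labels = [0] * 95 + [1] * 5

# A degenerate model that learned nothing: predict "healthy" for everyone.
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.95, despite every sick patient being missed
```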

3. As mentioned above, no one knows in advance what the appropriate value of C is for a given data set. The only method is trial and error.
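The trial-and-error search usually means scanning C over a logarithmic grid and keeping the value with the best held-out score. In this sketch, evaluate() is a hypothetical stand-in (a placeholder score peaking at C = 0.01); real code would train and validate the model at each candidate C instead.

```python
import math

def evaluate(C):
    # Placeholder validation score with a single peak near C = 0.01.
    # In practice, replace this with a full train/validate cycle.
    return 1.0 - abs(math.log10(C) - math.log10(0.01))

# Candidate values of C on a logarithmic grid: 1e-4 ... 1e2.
candidates = [10 ** k for k in range(-4, 3)]
best_C = max(candidates, key=evaluate)
print(best_C)  # 0.01 for this placeholder score
```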

4. Yes. I have added a conclusion section on this page.

Sincerely,

Ke Dai

KeDai (talk) 12:09, 22 April 2016