Suggestions
Hi,
An interesting wiki page. Here are several suggestions:
1. I think you should add your name to the page.
2. The 'Data Preprocessing' section is a little abstract. Would you mind showing some results after each step? That would make it easier for people to understand.
3. I am curious why DBN-Ling's accuracy is higher than NB-Ling's, while DBN-Enron's accuracy is lower than NB-Enron's. Also, of those four measurements, which one is the most important?
Sincerely,
Junyuan Zheng
Thanks for your comments.
1. Sure.
2. I will try to add some explanation.
3. I guess the reason is that the spam ratio in Ling-Spam is much lower, so NB performs much worse there. Also, since the dataset is small, the variance of the classification results can be high.
4. I think it should be precision, because it reflects false positives (legitimate email incorrectly rejected as spam), which are a big problem for email users.
Yan
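To make the precision argument concrete, here is a minimal sketch of how the usual metrics follow from confusion-matrix counts when spam is treated as the positive class. It assumes the four measurements on the page are accuracy, precision, recall and F1 (the thread does not name them), and the function name `spam_metrics` and the counts in the usage line are hypothetical, not results from the experiments. Precision is the only one of these whose denominator is driven by false positives (legitimate mail flagged as spam), which is why it tracks the error users care about most.

```python
def spam_metrics(tp, fp, fn, tn):
    """Compute metrics with spam as the positive class (hypothetical helper).

    tp: spam correctly flagged as spam
    fp: legitimate email wrongly flagged as spam (the costly error)
    fn: spam that slipped through to the inbox
    tn: legitimate email correctly delivered
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision drops as false positives grow: fp appears only in this denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
m = spam_metrics(tp=80, fp=5, fn=20, tn=895)
print(m)
```

Note that with a low spam ratio (as in Ling-Spam), tn dominates the total, so accuracy can stay high even when precision or recall are poor.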