Course talk:CPSC522/Spam Detection
Contents
Thread title | Replies | Last modified
---|---|---
response | 1 | 19:26, 27 April 2016 |
Some suggestions | 1 | 19:26, 27 April 2016 |
Suggestions | 1 | 19:25, 27 April 2016 |
Suggestions | 1 | 00:47, 21 April 2016 |
Hi Yan, Good work. Your hypothesis is stated and motivated well, but a few of the claims you make lack justification from the work you presented.
1. NB has better performance in our experiments. However, our first guess is that DBN should be better. The reason might be that the size of the DBN is too small.
Do you mean that adding more layers would always help? Why do you feel that there is no saturation? And what is the standard practice for deciding the number of layers? Is there a principled way to set this?
4. Data that contains more spam leads to better performance.
True, as the model sees more examples to learn from. It would be more interesting to check whether the spam/(not spam) ratio has any impact, so you could generalize the claim the way you stated it.
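For example, you could subsample one dataset to several target spam ratios and re-run both classifiers at each ratio. A minimal sketch of the subsampling step (the `subsample_to_ratio` helper and the 1 = spam / 0 = ham label convention are my own assumptions, not anything from your page):

```python
import random

def subsample_to_ratio(emails, labels, spam_ratio, seed=0):
    """Subsample a dataset so spam makes up roughly `spam_ratio` of it.

    Hypothetical helper: labels are 1 for spam, 0 for ham. All ham is
    kept; spam is drawn at random to hit the target ratio.
    """
    rng = random.Random(seed)
    spam = [(e, l) for e, l in zip(emails, labels) if l == 1]
    ham = [(e, l) for e, l in zip(emails, labels) if l == 0]
    # Solve n_spam / (n_spam + n_ham) = spam_ratio for n_spam.
    n_spam = int(len(ham) * spam_ratio / (1 - spam_ratio))
    sample = rng.sample(spam, min(n_spam, len(spam))) + ham
    rng.shuffle(sample)
    return [e for e, _ in sample], [l for _, l in sample]
```

Training NB and DBN on each subsampled set would show whether the ratio itself, rather than just the raw amount of spam, drives the performance difference.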
5. NB is more suitable for serving as a prototype for spam detection, while DBN can achieve better performance in a larger system with more data and computational power.
You can only claim this once you actually see it happening; as of now you are just going by gut feeling :-). Moreover, no study of running/training time was performed, which again leaves the claim unjustified.
Hi Yan,
It is a good page that makes a clear statement of the hypothesis and gives a detailed description of the algorithms you are going to compare. A simple suggestion: it might be better if you could say more about the motivation for this topic and add an introduction. Also, more discussion of the evaluation results would be welcome.
Best regards, Jiahong Chen
Hi Yan,
Nice page. Here are some suggestions:
- Adding some explanation to your Evaluation Metric part would make it easier to understand.
- The Abstract section could be more detailed.
Bests,
Yu Yan
Hi,
An interesting wiki page. Here are several suggestions:
1. I think maybe you should add your name to the page.
2. The 'Data Preprocessing' section is a little abstract. Would you mind showing some results after each step? That would make it easier for people to understand.
3. I am curious why DBN-Ling's accuracy is higher than NB-Ling's accuracy, but DBN-Enron's accuracy is lower than NB-Enron's accuracy. And of those four measurements, which one is the most important?
Sincerely,
Junyuan Zheng
Thanks for your comments.
1. Sure.
2. I will try to add some explanation.
3. I guess the reason is that the spam ratio in Ling-Spam is much lower, so the performance of NB is much worse. Also, since the dataset is small, the variance of the classification results can be high.
4. I think it should be precision, because it measures false positives (incorrectly rejecting legitimate email), which is a big problem for email users.
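To be concrete about what precision measures here, a minimal sketch (assuming 1 = spam and 0 = legitimate mail; this is an illustration, not code from the page):

```python
def precision(y_true, y_pred, spam=1):
    """Of the messages flagged as spam, the fraction that really were spam.

    A false positive here is legitimate mail incorrectly flagged as
    spam, which is exactly the error email users care most about.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == spam and t == spam)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == spam and t != spam)
    return tp / (tp + fp) if tp + fp else 0.0
```

For instance, with true labels [1, 0, 1, 0] and predictions [1, 1, 1, 0] there are two true positives and one false positive, giving a precision of 2/3.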
Yan