Course talk:CPSC522/Spam Detection
Contents
Thread title | Replies | Last modified
---|---|---
response | 1 | 19:26, 27 April 2016 |
Some suggestions | 1 | 19:26, 27 April 2016 |
Suggestions | 1 | 19:25, 27 April 2016 |
Suggestions | 1 | 00:47, 21 April 2016 |
Hi Yan, Good work. Your hypothesis is stated and motivated well, but a few of the claims you make lack justification from the work you presented.
1. NB has better performance in our experiments. However, our first guess is that DBN should be better. The reason might be that the size of the DBN is too small.
Do you mean that adding more layers would always help? Why do you feel that there is no saturation? And what is the standard practice for deciding the number of layers? Is there a principled way to set this?
4. Data that contains more spam leads to better performance.
True, as the model sees more examples to learn from. It would be more interesting to check whether the spam/(not spam) ratio has any impact, so you could generalize the claim the way you stated it.
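For example, you could subsample one dataset to several target spam ratios and re-run both classifiers at each ratio. A minimal sketch of the subsampling step (the `subsample_to_ratio` helper and the 1 = spam / 0 = ham label convention are my own assumptions, not anything from your page):

```python
import random

def subsample_to_ratio(emails, labels, spam_ratio, seed=0):
    """Subsample a dataset so spam makes up roughly `spam_ratio` of it.

    Hypothetical helper: labels are 1 for spam, 0 for ham. All ham is
    kept; spam is drawn at random to hit the target ratio.
    """
    rng = random.Random(seed)
    spam = [(e, l) for e, l in zip(emails, labels) if l == 1]
    ham = [(e, l) for e, l in zip(emails, labels) if l == 0]
    # Solve n_spam / (n_spam + n_ham) = spam_ratio for n_spam.
    n_spam = int(len(ham) * spam_ratio / (1 - spam_ratio))
    sample = rng.sample(spam, min(n_spam, len(spam))) + ham
    rng.shuffle(sample)
    return [e for e, _ in sample], [l for _, l in sample]
```

Training NB and DBN on each subsampled set would show whether the ratio itself, rather than just the raw amount of spam, drives the performance difference.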
5. NB is more suitable for serving as a prototype for spam detection, while DBN can achieve better performance in a larger system with more data and computational power.
You can only claim this once you actually see it happening; as of now you are just going by gut feeling :-). Moreover, no study of running/training time was performed, which again leaves the claim unjustified.
Hi Yan,
It is a good page that makes a clear statement of the hypothesis and gives a detailed description of the algorithms you are going to compare. A simple suggestion: it might be better if you could say more about the motivation for this topic and add an introduction. Also, more discussion of the evaluation results would be welcome.
Best regards, Jiahong Chen
Hi Yan,
Nice page. Here are some suggestions:
- Adding some explanation to your Evaluation Metric part would make it easier to understand.
- The Abstract section could be more detailed.
Bests,
Yu Yan
Hi,
An interesting wiki page. Here are several suggestions:
1. I think maybe you should add your name to the page.
2. The 'Data Preprocessing' section is a little abstract. Would you mind showing some results after each step? That would make it easier for people to understand.
3. I am curious why DBN-Ling's accuracy is higher than NB-Ling's accuracy, but DBN-Enron's accuracy is lower than NB-Enron's accuracy. And of those four measurements, which one is the most important?
Sincerely,
Junyuan Zheng
Thanks for your comments.
1. Sure.
2. I will try to add some explanation.
3. I guess the reason is that the spam ratio in Ling-Spam is much lower, so the performance of NB is much worse. Also, since the dataset is small, the variance of the classification results can be high.
4. I think it should be precision, because it measures false positives (incorrectly rejecting legitimate email), which is a big problem for email users.
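To be concrete about what precision measures here, a minimal sketch (assuming 1 = spam and 0 = legitimate mail; this is an illustration, not code from the page):

```python
def precision(y_true, y_pred, spam=1):
    """Of the messages flagged as spam, the fraction that really were spam.

    A false positive here is legitimate mail incorrectly flagged as
    spam, which is exactly the error email users care most about.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == spam and t == spam)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == spam and t != spam)
    return tp / (tp + fp) if tp + fp else 0.0
```

For instance, with true labels [1, 0, 1, 0] and predictions [1, 1, 1, 0] there are two true positives and one false positive, giving a precision of 2/3.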
Yan