Course talk:CPSC522/Spam Detection

From UBC Wiki

Contents

Thread title        Replies    Last modified
response            1          19:26, 27 April 2016
Some suggestions    1          19:26, 27 April 2016
Suggestions         1          19:25, 27 April 2016
Suggestions         1          00:47, 21 April 2016

response

Hi Yan,

Good work. Your hypothesis is stated and motivated well. However, a few of the claims you make lack justification from the work you presented.

1. NB has better performance in our experiments. However, our first guess is that DBN should be better. The reason might be that the size of the DBN is too small.

   Do you mean that adding more layers would always help? Why do you feel there is no saturation? And what is the standard practice for deciding the number of layers? Is there a principled way to set it?

4. Data that contains more spam leads to better performance.

   True, since the classifier sees more examples to learn from. What would be more interesting is whether the spam/(not spam) ratio itself has any impact, so that you could generalize the claim the way you stated it.

5. NB is more suitable for serving as a prototype for spam detection, while DBN can achieve better performance in a larger system with more data and computational power.

   You can only claim this once you have actually seen it happen; as of now you are just going by gut feeling :-). Moreover, no study of running/training time was performed, which again leaves the claim unjustified.
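On the ratio question in point 4, one concrete way to see why it should matter (a sketch, assuming a standard Naive Bayes decision rule; the ratios below are illustrative, not from your experiments):

```python
import math

# Sketch: in Naive Bayes the training-set class prior enters the
# decision rule as log(P(spam)/P(ham)). With spam ratio r this term is
# log(r/(1-r)), so the rarer spam is in the training data, the more
# log-evidence the word features must supply before a message is flagged.
for r in (0.5, 0.2, 0.05):
    gap = math.log((1 - r) / r)  # extra evidence needed to overcome the prior
    print(f"spam ratio {r:.2f}: words must supply {gap:.2f} extra nats")
```

This suggests re-running the comparison with the same corpus subsampled to several spam ratios, rather than only varying the total amount of spam.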
PrithuBanerjee (talk) 06:28, 21 April 2016

Thanks! I have already updated the conclusion section.

YanZhao (talk) 19:26, 27 April 2016
 

Some suggestions

Hi Yan,

This is a good page: it states the hypothesis clearly and gives a detailed description of the algorithms you are comparing. One simple suggestion is that it might be better to say more about the motivation for this topic and add an introduction. Also, more discussion of the evaluation results would help.

Best regards, Jiahong Chen

JiahongChen (talk) 03:52, 21 April 2016

Thanks for your comment! I will try to update this page according to your advice : )

YanZhao (talk) 19:26, 27 April 2016
 

Suggestions

Hi Yan,

Nice page. Here are some suggestions:

  1. Adding some explanation to the Evaluation Metric part would make it easier to understand.
  2. The Abstract section could be more detailed.

Bests,

Yu Yan

YuYan1 (talk) 08:52, 21 April 2016

Thanks for your comment. I will try to update this page according to your advice.

YanZhao (talk) 19:25, 27 April 2016
 

Suggestions

Hi,

An interesting wiki page. Here are several suggestions:

1. I think you should add your name to the page.
2. The 'Data Preprocessing' section is a little abstract. Would you mind showing some results after each step? That would make it easier for people to understand.
3. I am curious why DBN-Ling's accuracy is higher than NB-Ling's, yet DBN-Enron's accuracy is lower than NB-Enron's. Also, of the four measurements, which one is the most important?

Sincerely,

Junyuan Zheng

JunyuanZheng (talk) 06:12, 19 April 2016

Thanks for your comments.

1. Sure.

2. I will try to add some explanation.

3. I guess the reason is that the spam ratio in Ling-Spam is much lower, so the performance of NB is much worse. Also, since the dataset is small, the variance of the classification results can be high.
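To back up the variance point with a rough sketch: the standard error of an accuracy estimate from n test emails is sqrt(p(1-p)/n), so a small corpus gives noisy comparisons (the p = 0.9 below is a made-up placeholder, not a measured number):

```python
import math

# Sketch: uncertainty of an accuracy estimate shrinks with test-set size.
# p = 0.9 is a hypothetical true accuracy, chosen only for illustration.
p = 0.9
for n in (200, 2000, 20000):
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)  # 95% CI half-width
    print(f"n={n:6d}: accuracy +/- {half_width:.3f}")
```

At n = 200 the interval is roughly +/- 0.04, which can easily swallow the gap between two classifiers.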

4. I think it should be precision, because it captures false positives (legitimate email incorrectly rejected as spam), which are a big problem for email users.
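As a toy illustration of why precision is the metric that tracks false positives (the confusion-matrix counts below are made up, not results from the page's experiments):

```python
# Made-up confusion-matrix counts for 1000 test emails, purely to
# illustrate the metrics (not numbers from the experiments).
tp, fp = 80, 5     # spam correctly flagged / ham wrongly flagged (false positive)
fn, tn = 20, 895   # spam missed / ham correctly delivered

precision = tp / (tp + fp)                   # trustworthiness of a "spam" flag
recall    = tp / (tp + fn)                   # fraction of spam actually caught
accuracy  = (tp + tn) / (tp + fp + fn + tn)  # overall fraction correct

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

Note that one additional false positive (fp = 6) drops precision from about 0.941 to about 0.930, while accuracy only moves from 0.975 to 0.974, which is why precision is the number to watch when false positives are costly.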

Yan

YanZhao (talk) 00:46, 21 April 2016