Course talk:CPSC522/Density-Based Unsupervised Learning
Contents
Thread title | Replies | Last modified |
---|---|---|
Critique-Prithu | 1 | 06:53, 14 March 2016 |
Critique | 1 | 06:45, 14 March 2016 |
Feedback | 1 | 06:41, 14 March 2016 |
Regarding the page Density-Based Unsupervised Learning | 1 | 06:37, 14 March 2016 |
Hi Jiahong Chen, Thanks for the nice write-up. It covers some seminal works on density-based unsupervised learning, a very fast and useful tool in AI. One thing that bothers me in general is how one can validate the efficacy of different algorithms in unsupervised tasks such as clustering. To illustrate: as you say on your page, DBSCAN discovers interleaved clusters where K-means might fail, but an application might not necessarily want that. So it would be good if you could shed some light on that. One more query: it seems that the techniques you described do not allow a point to be part of more than one cluster. Is that true of every unsupervised clustering method? Just to stress again, these are some basic intuitions that I feel can help readers with the background. Your page does the rest very aptly.
best, Prithu
Hi Prithu,
Thanks for your suggestions! The efficacy might be validated with a test data set that contains both the data points and their labels; then we just compute how well the clustering agrees with those labels. As for your other question, I am not sure how other unsupervised learning methods handle this, sorry about that. If you have other questions, please let me know.
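One caveat when "computing correctness" against labels: cluster IDs are arbitrary, so cluster 0 in the output need not correspond to class 0 in the test set. A common label-permutation-invariant score is the Rand index, which checks, for every pair of points, whether the clustering and the ground truth agree on "same group" versus "different group". The sketch below is a minimal, dependency-free illustration of that idea, not the method from the page; the function name `rand_index` is hypothetical, and treating a DBSCAN noise label (for example `-1`) as just another group is a simplifying assumption.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(predicted: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Fraction of point pairs on which the clustering and the labels agree.

    A pair agrees if both assignments put the two points in the same group,
    or both put them in different groups. Label names themselves are ignored,
    so renaming clusters does not change the score.
    """
    if len(predicted) != len(truth):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(truth)), 2))
    if not pairs:
        return 1.0  # zero or one point: nothing to disagree on
    agree = sum(
        (predicted[i] == predicted[j]) == (truth[i] == truth[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

For example, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is `1.0` because the two groupings are identical up to renaming, while `rand_index([0, 0, 1, 1], [0, 1, 0, 1])` is `1/3`. Libraries such as scikit-learn also offer a chance-adjusted variant, which is usually preferred in practice.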
Best regards, Jiahong Chen
Hi Jiahong,
A solid first draft, with good overall organization and flow.
General comments:
- It is good that the figures cite their sources when I click on them, but it would also be good to cite them directly on the page.
- It would be good to more clearly state which sections are about which paper, and put extra emphasis on what the contribution of the second paper is over the first.
- I highly recommend an extra proofreading pass. The page is readable overall in its current form, but there are places where it becomes more difficult to read and understand due to grammatical errors.
Section-specific comments:
- Introduction
- When you mention the old methods (partitioning and hierarchical), it would be nice to have links to other pages.
- DBSCAN-definition
- You have 6 definitions in this section, so the section should be called Definitions <--plural
- Are the definitions taken directly from the paper? I'd recommend either citing each of them or putting a note at the beginning of the section that they're all from a particular source.
- You frequently use the terms "neighbour", "neighbourhood", and "neighbor region", and I'm not sure if you're using them consistently. Mixing up those terms can make this section very confusing for people, so please make sure you're not accidentally using one of the terms when you mean another. You may also want to specify what a "neighbour region" is and how it's different from a neighbourhood.
- I think you mean "border", not boarder.
- DENCLUE
- You may want to move your discussion about the drawbacks of DBSCAN into the DBSCAN section as a new subsection.
- This is the most math-heavy part of your page, and so it is very important that your text is as clear as possible. The figures you supplied are very useful here.
- Comparison
- The comparisons you show are DBSCAN vs old methods (performance) and DENCLUE vs DBSCAN (runtime), and you also show a weakness of DBSCAN. However, you don't show if DENCLUE improves upon this weakness. How does DENCLUE compare to DBSCAN in terms of performance?
- I think you mean Figure 10, not Figure 1-.
Clear skies,
Jordon
Hi Jordan,
Thanks! You always give the most detailed critiques, and they really help a lot in improving the page. I am correcting these errors now. If you have further suggestions, please let me know.
Best regards,
Jiahong Chen
Hi Jiahong,
I like the given motivations for density-based clustering.
For Figure 9, I fail to see how DBSCAN can possibly output these clusterings. It looks to me like the points on the border separating the blue and green clusters should be "density-connected".
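To make the "density-connected" question concrete, here is a minimal sketch of the DBSCAN definitions as I understand them from Ester et al. (1996): a point is a core point if its Eps-neighbourhood (assumed here to include the point itself) contains at least MinPts points; points reachable through chains of core points are density-reachable; and two points are density-connected if some point o reaches both. The helper names `region_query`, `reachable_from` and `density_connected` are hypothetical illustrations, not code from the page or the paper, and the brute-force neighbour search is chosen for clarity rather than speed.

```python
import math
from collections import deque
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def region_query(data: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of data[i] (including i)."""
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= eps]


def reachable_from(data: Sequence[Point], o: int, eps: float, min_pts: int) -> Set[int]:
    """All points density-reachable from data[o] (o reaches itself trivially)."""
    reached = {o}
    if len(region_query(data, o, eps)) < min_pts:
        return reached  # non-core start: only the trivial chain exists
    queue = deque([o])
    while queue:
        x = queue.popleft()
        neighbours = region_query(data, x, eps)
        if len(neighbours) < min_pts:
            continue  # border point: it is reached, but chains cannot continue through it
        for j in neighbours:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached


def density_connected(data: Sequence[Point], p: int, q: int, eps: float, min_pts: int) -> bool:
    """True if some point o has both p and q density-reachable from it."""
    return any(
        p in r and q in r
        for r in (reachable_from(data, o, eps, min_pts) for o in range(len(data)))
    )
```

With `data = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]`, `eps = 1.5` and `min_pts = 3`, only the middle point of each triple is a core point. Points 0 and 2 are border points that cannot reach each other directly, yet they are density-connected through point 1, while points 0 and 3 are not. Applying the same check to the border points in Figure 9 would show whether the two clusters should have merged.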
For the DENCLUE section, you mention hill climbing but not the function that is supposedly being maximized? Also, your only definition of is the distribution for . Where is the function defined? (Unless is a typo.)
For Figure 6, can you explain what each diagram is showing? How is changed going from the middle to the right diagram?
What is the in the formula for ? This also doesn't quite look like the derivative to ...
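For reference on what the DENCLUE hill climbing maximizes: as I recall from Hinneburg and Keim (1998), with a Gaussian influence function the overall density is f(x) = Σ_i exp(-d(x, x_i)² / (2σ²)), and hill climbing repeatedly moves a point a fixed step δ along the normalized gradient until the density stops increasing; the point reached is its density attractor. The sketch below follows that reading and is not the page's code. The exact derivative carries a factor 1/σ², which the paper's formula may omit; because the step uses the normalized gradient, a positive constant factor does not change the direction. The names `gaussian_density`, `density_gradient` and `hill_climb` and the `max_iter` guard are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_density(x: Point, data: Sequence[Point], sigma: float) -> float:
    """Unnormalized density: sum of Gaussian influences of all data points at x."""
    return sum(math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in data)


def density_gradient(x: Point, data: Sequence[Point], sigma: float) -> List[float]:
    """Exact gradient of gaussian_density: sum of (x_i - x) * influence / sigma^2."""
    grad = [0.0] * len(x)
    for xi in data:
        w = math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) / sigma ** 2
        for k in range(len(x)):
            grad[k] += (xi[k] - x[k]) * w
    return grad


def hill_climb(start: Point, data: Sequence[Point], sigma: float,
               delta: float, max_iter: int = 1000) -> Point:
    """Step along the normalized gradient; stop once a step would lower the density."""
    x = tuple(start)
    fx = gaussian_density(x, data, sigma)
    for _ in range(max_iter):
        grad = density_gradient(x, data, sigma)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0:
            break  # flat point: no ascent direction
        x_next = tuple(xk + delta * gk / norm for xk, gk in zip(x, grad))
        f_next = gaussian_density(x_next, data, sigma)
        if f_next < fx:
            break  # overshot the local maximum: x approximates the density attractor
        x, fx = x_next, f_next
    return x
```

For instance, with data points at 0, 0.1, -0.1 and 5 on a line, `sigma = 0.5` and `delta = 0.05`, starting at 0.5 climbs to an attractor near 0, whereas starting at 4.8 climbs to one near 5. In DENCLUE, points that share an attractor end up in the same cluster.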
Ricky
Hi Ricky,
Thank you for pointing out some errors on my page. I am trying to correct them and will read the paper and other resources again.
Best regards,
Jiahong Chen
Hi Jiahong,
A well-organized draft, in my opinion. I would like to mention the following points:
1. A bit more explanation of the algorithms could be helpful.
2. A bit more focus on an example where k-means (or any other well-known clustering algorithm) fails and these algorithms (for example, DBSCAN) work.
3. Since other well-known clustering algorithms exist, adding some links to them in the To Add section might make the wiki even better.
I hope these suggestions are helpful.
Hi,
Thank you for your suggestions! I am trying to find some more examples to make the page more understandable. If you have further suggestions, please let me know.
Many thanks,
Jiahong Chen