Suggestions

Hi Prithu, great page! You explained the basics lucidly, which made them easy to follow. Thumbs up for the Layman's Explanation of LDA! You could consider adding an abstract; that would help readers see at a quick glance what the paper is about. Some questions I had:

In the generative model of LDA you have written that the matrix $\beta$ is $k \times V$, where $k$ is the number of topics. What is $V$ in this case?
How do we decide on the values of the topics, or are they known, like the number of topics?
Does LDA essentially act like a recommendation model?
Is CTM always better than LDA? Is there any scenario where considering the correlation between topics is actually harmful?

SamprityKashyap (talk)

Hi Samprity, thanks for your feedback. Here is a quick response:
1. $V$ is explained in the first bullet of the notation section. Essentially, it is the vocabulary, i.e. the collection of all the words in the corpus (after some preprocessing such as stemming, stopword removal, etc.).
2. By "values of topics", do you mean the names of the topics? I am answering on that assumption. The name of a topic does not really matter; think of LDA as unsupervised clustering, where what a cluster means is not so relevant (or rather, depends on the application). Having said that, when LDA is applied to text data, there is an easy hack for naming the clusters. Remember, topics are distributions over words, so by looking at the most probable words in a topic's distribution, one can make a fair guess at the topic name.
3. No, LDA on its own is not a recommendation model. It just maps each observation (i.e. document) to a (in most cases) smaller space (i.e. topics) from which it is supposed to have been generated. Having this smaller-space mapping helps in comparing observations, and hence, if needed, an application may use this similarity measure to recommend a set of similar observations.
4. I guess CTM is always as good as LDA. Provided there is no bound on inference time, there is no need to prefer LDA over CTM. As you can also see from the generative model, it is mostly similar to LDA. If there is no correlation across topics in the dataset, CTM will learn a covariance matrix that is (roughly) diagonal, so it will essentially behave like LDA. Thus, according to my understanding, performance-wise it is never inferior to LDA.
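To make points 2 and 3 concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration and is not from the page: the toy vocabulary, the $k \times V$ matrix `beta`, the per-document topic proportions `theta`, and the helper names `top_words`, `cosine` and `most_similar`. Cosine similarity is just one possible choice for comparing documents in topic space.

```python
import math

# Hypothetical toy vocabulary (V = 6 words) and a k x V topic-word matrix
# beta (k = 2 topics). Each row is a probability distribution over the vocabulary.
vocab = ["gene", "dna", "cell", "ball", "team", "score"]
beta = [
    [0.40, 0.30, 0.20, 0.04, 0.03, 0.03],  # topic 0
    [0.02, 0.03, 0.05, 0.30, 0.35, 0.25],  # topic 1
]

def top_words(topic_row, vocab, n=3):
    """Return the n most probable words of one topic, most probable first."""
    ranked = sorted(range(len(vocab)), key=lambda i: topic_row[i], reverse=True)
    return [vocab[i] for i in ranked[:n]]

def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical per-document topic proportions, as LDA inference might return them.
theta = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.8, 0.2],
    "doc_c": [0.1, 0.9],
}

def most_similar(name, theta):
    """Return the other document whose topic proportions are closest to `name`."""
    others = [d for d in theta if d != name]
    return max(others, key=lambda d: cosine(theta[name], theta[d]))

if __name__ == "__main__":
    # Point 2: guess a topic's name from its most probable words.
    for k, row in enumerate(beta):
        print("topic", k, top_words(row, vocab))
    # Point 3: an application could recommend the nearest document in topic space.
    print("closest to doc_a:", most_similar("doc_a", theta))
```

Reading the top words, one might label topic 0 "genetics" and topic 1 "sports". The recommendation step sits entirely outside LDA; the model only supplies the low-dimensional representation.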

I will include an abstract, and also some of the material from my response if it looks convincing to you. Thanks again for your suggestions.

Best, Prithu

PrithuBanerjee (talk)

Thanks for the clarifications! Looks good to me now :)

SamprityKashyap (talk)