Can someone help me? What is the best likelihood value in each of these charts? Thanks a lot.
In my view there is no simple answer to this question. In general, the more topics you allow, the better your performance metrics look. But as you noted, more topics also make your analysis more complex. So you have to make a tradeoff. I don't think there is any single way to find the "best" number. @mschmitz is the architect of the LDA extension, so I would be interested in his thoughts on this.
Hi,
For me, finding the optimal number of topics is very similar to choosing k in k-means: there is no easy way to optimize it. The next toolbox version will include "Perplexity", which is the common measure.
Here is a nice read on the topic: LDA Best Practices
BR,
Martin
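One way to make this tradeoff explicit is an "elbow" rule, much like picking k in k-means. You fit LDA for several candidate topic counts, compute perplexity on held-out documents for each, and stop once adding topics no longer helps much. Held-out data matters here because perplexity on the training data usually keeps falling as k grows. The Python sketch below assumes you already have those perplexity values. The function name `choose_num_topics` and the 5% threshold are hypothetical choices, not part of the LDA extension, and this is only one heuristic among many.

```python
def choose_num_topics(perplexity_by_k, min_relative_gain=0.05):
    """Return the smallest k after which adding more topics stops paying off.

    perplexity_by_k: dict mapping number of topics k -> held-out perplexity
    min_relative_gain: hypothetical threshold; moving to the next k must lower
        perplexity by at least this fraction to count as worthwhile.
    """
    ks = sorted(perplexity_by_k)
    if not ks:
        raise ValueError("need at least one candidate k")
    for current, nxt in zip(ks, ks[1:]):
        before, after = perplexity_by_k[current], perplexity_by_k[nxt]
        relative_gain = (before - after) / before
        if relative_gain < min_relative_gain:
            return current  # the next step is not worth the extra complexity
    return ks[-1]  # every step helped enough; the largest candidate wins


# Illustrative (made-up) perplexities, not measurements from a real corpus.
scores = {5: 1200.0, 10: 950.0, 20: 920.0, 40: 905.0}
print(choose_num_topics(scores))  # 10
```

Going from 5 to 10 topics cuts perplexity by about 21%, but going from 10 to 20 cuts it by only about 3%, so the rule stops at 10. The threshold encodes how much extra complexity you are willing to accept.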
Hello, thank you both, dear professors. Regarding what you mentioned: where does Perplexity come from, and what is its purpose? Is it better to tune Alpha based on the data, or to use the heuristics? Is there a criterion for assessing the goodness of LDA models with different alpha and beta parameters? How?
Thank you. With respect,
Hi @elena2020chao,
Perplexity is defined as
exp(-LLH/#tokens)
and is therefore derived directly from the LLH. It will be available in the next release. It is simply more common to report this measure than the raw LLH.
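A minimal Python sketch of this formula, assuming the LLH is the natural-log likelihood summed over all tokens in the corpus (the function name is hypothetical, not part of the toolbox). Because the LLH is negative and exp is monotonic, a higher LLH always gives a lower perplexity. For the same corpus, both measures therefore rank models the same way; perplexity just normalizes by corpus size.

```python
import math


def perplexity(log_likelihood: float, num_tokens: int) -> float:
    """Perplexity = exp(-LLH / #tokens); lower values indicate a better fit."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(-log_likelihood / num_tokens)


# A model that assigns every one of 1000 tokens probability 1/50 is as
# "perplexed" as a uniform choice among 50 words.
llh = 1000 * math.log(1 / 50)  # total log-likelihood over 1000 tokens
print(round(perplexity(llh, 1000), 6))  # 50.0
```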
For alpha/beta: I would go for Heuristics + Optimize Hyperparameters. That option adjusts the hyperparameters automatically during the fitting process.
Hello, thank you very much for your reply. However, I could not find the Optimize Hyperparameters operator... And what is the likelihood based on? Thanks in advance if you answer. With regards,
Optimize Hyperparameters is a setting of the LDA operator, not a separate operator.
The LLH is the log-likelihood of the underlying model. See: https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
It can be interpreted like a "goodness of fit" measure in other models.
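To illustrate the "goodness of fit" reading, here is a simplified Python sketch. It computes a corpus log-likelihood from point estimates of the document-topic distribution (theta) and the topic-word distribution (phi): each token contributes log of the sum over k of theta[d][k] * phi[k][w]. This is an assumption for illustration only. The LDA extension's own LLH may be computed differently (for example, as a joint likelihood of words and sampled topic assignments), and the names and toy numbers below are hypothetical.

```python
import math


def corpus_log_likelihood(docs, theta, phi):
    """Sum of log p(w | d) over all tokens, using point estimates.

    docs:  list of documents, each a list of word ids
    theta: theta[d][k] = probability of topic k in document d
    phi:   phi[k][w]   = probability of word w under topic k
    """
    llh = 0.0
    for d, words in enumerate(docs):
        for w in words:
            # Mixture over topics: p(w | d) = sum_k theta[d][k] * phi[k][w]
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            llh += math.log(p)
    return llh


# Toy example: 2 topics over a 3-word vocabulary (made-up numbers).
phi = [[0.8, 0.1, 0.1],
       [0.1, 0.1, 0.8]]
theta = [[0.9, 0.1],   # document 0 is mostly topic 0
         [0.2, 0.8]]   # document 1 is mostly topic 1
docs = [[0, 0, 1], [2, 2, 0]]

llh = corpus_log_likelihood(docs, theta, phi)
n_tokens = sum(len(doc) for doc in docs)
print(llh, math.exp(-llh / n_tokens))  # LLH and the corresponding perplexity
```

A model whose topics explain the observed words well assigns them higher probabilities, so its LLH is closer to zero (a better fit).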
Hi, thank you very much, I understand now. I am impatiently waiting for the new version of the program... How can you find out what each topic is about? Do I have to work it out myself from the words that recur in each topic? Is it possible to use LDA to determine the content of each cluster found by k-means? I could not get anything to work... Thanks again.