Some personal understanding of LDA perplexity

I struggled with this problem for a long time. Along the way I mainly read the gensim Google group and searched StackOverflow and StackExchange with keywords such as "number of topics" and "perplexity", and came away with the following rough understanding:

1. Interpretation of log_perplexity() of gensim:

According to the source code of gensim 3.8.3, the value returned by log_perplexity() is perwordbound, which is calculated as follows:

First, bound() is called on a chunk of the corpus $\vec{W}$ to estimate the log likelihood $\log p(\vec{W})$ of the entire corpus. The estimate is the variational bound $E_q[\log p(\vec{W})] - E_q[\log q(\vec{W})]$.

Then the bound on $\log p(\vec{W})$ is divided by the size of the entire corpus, N (the total number of words), and the result, perwordbound, is the return value of log_perplexity().
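Written as a single formula, the two steps above look roughly like this. It is only a sketch of my reading of the source: N is taken to be the total word count of the chunk, and the subsample scaling that gensim applies when total_docs is passed is left out.

```latex
\text{perwordbound}
  = \frac{E_q[\log p(\vec{W})] - E_q[\log q(\vec{W})]}{N}
```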

During the function call, $2^{-\text{perwordbound}}$ is also written to the log as the perplexity. This perplexity uses base 2, which differs from the base-e perplexity defined in the following two papers, although the idea is basically the same:
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
M. Hoffman, D. Blei, and F. Bach. Online Learning for Latent Dirichlet Allocation. NIPS 2010.
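To make the difference between the two bases concrete, here is a minimal sketch in plain Python. It does not call gensim; the function names per_word_bound and perplexity and the numbers are hypothetical and only illustrate the arithmetic described above.

```python
import math


def per_word_bound(bound: float, num_words: int) -> float:
    """Divide a corpus-level bound by the total word count N."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return bound / num_words


def perplexity(per_word: float, base: float = 2.0) -> float:
    """Turn a per-word bound into a perplexity: base ** (-per_word)."""
    return base ** (-per_word)


# Hypothetical values, not taken from any real model.
bound = -7000.0   # assumed corpus-level bound
num_words = 1000  # assumed total word count N

pwb = per_word_bound(bound, num_words)       # -7.0
ppl_base2 = perplexity(pwb, base=2.0)        # 2 ** 7 = 128.0
ppl_base_e = perplexity(pwb, base=math.e)    # e ** 7, roughly 1096.63

print(pwb, ppl_base2, ppl_base_e)
```

The same per-word bound gives very different perplexity numbers depending on the base, so a value logged by gensim should not be compared directly with a base-e perplexity reported elsewhere.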

The following question and answer gives a similar explanation:
https://stats.stackexchange.com/questions/322809/inferring-the-number-of-topics-for-gensims-lda-perplexity-cm-aic-and-bic?r=SearchResults

2. Interpretation of the relationship between perplexity and the number of topics:

First of all, the log_perplexity() function does not normalize for the number of topics, so its values for models with different numbers of topics cannot be compared directly:
Link: https://groups.google.com/g/gensim/c/krs1Uytq5bY/m/ePZXIKfwGwAJ

Secondly, Radim, the author of the gensim package, replied in the forum that perplexity is not a good indicator of topic quality:
Link: https://groups.google.com/g/gensim/c/TpuYRxhyIOc


Origin blog.csdn.net/yocencyy/article/details/111147746