Perplexity not monotonically decreasing for batch Latent Dirichlet Allocation #6777

Open
amueller opened this issue May 12, 2016 · 14 comments

@amueller
Member

When using the batch method, the perplexity in LDA should be non-increasing in every iteration, right?
I have cases where it does increase. If this is indeed a bug, I'll investigate.
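
A minimal sketch of the kind of check being described (no reproducer was posted here, so the toy data and settings below are illustrative): fit batch LDA with a growing `max_iter` from a fixed `random_state` and record the training perplexity after each fit.

```python
# Illustrative only: toy counts, not the corpus that triggered the issue.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.RandomState(0)
X = rng.poisson(1.0, size=(200, 100))  # toy document-term count matrix

for max_iter in [1, 2, 5, 10, 20, 50]:
    lda = LatentDirichletAllocation(n_components=10, learning_method="batch",
                                    max_iter=max_iter, random_state=0)
    lda.fit(X)
    # With batch updates, this value is expected to be non-increasing in max_iter.
    print(max_iter, lda.perplexity(X))
```

If the printed values go up between consecutive settings, that matches the behavior reported above.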

@amueller amueller added the Bug label May 12, 2016
@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@kenanz0630

Have you solved this problem? The literature states that perplexity should decrease as the number of topics increases. I tried this on both my own dataset and datasets from sklearn.datasets, but the perplexity didn't go down in either case. I also tried setting evaluate_every to a non-zero number, but it didn't print the perplexity over the iterations. Am I making a mistake somewhere?

@amueller
Member Author
amueller commented Dec 3, 2016

There might be an issue in the implementation, but I'm not sure :(

@amueller
Member Author
amueller commented Dec 8, 2016

@kenanz0630 you should see the perplexity at each iteration, though. Are you doing batch or online learning?
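
For reference, a sketch of the settings that should surface the per-iteration values (assuming, as far as I can tell from the implementation, that `evaluate_every > 0` triggers the evaluation and `verbose > 0` is needed for it to actually be printed):

```python
from sklearn.decomposition import LatentDirichletAllocation

# X is a document-term count matrix, e.g. the output of CountVectorizer.
lda = LatentDirichletAllocation(n_components=10, learning_method="batch",
                                max_iter=20, evaluate_every=1, verbose=1,
                                random_state=0)
lda.fit(X)  # should log the perplexity at every iteration
```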

@amueller
Member Author
amueller commented Dec 8, 2016

Not sure if this is related to #7992

@amueller
Member Author
amueller commented Dec 8, 2016

@kenanz0630 can you provide sample code? And can you try running the code from #7992? Because obviously I can't reproduce any more :(

@jnothman
Member

Do we have a minimal failing example? It's unclear whether this issue remains.

@bingbong-sempai

I recently tried a grid search over the number of topics for LDA, and the log-likelihood scores for both the training and validation sets decrease monotonically as the number of topics increases.

@gianlucamalato

> I recently tried a grid search over the number of topics for LDA, and the log-likelihood scores for both the training and validation sets decrease monotonically as the number of topics increases.

I see the same behavior, which makes this implementation almost useless. I'm using sklearn 0.23.2.
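
A sketch of the kind of sweep both comments seem to describe (the corpora and settings actually used were not posted, so 20 newsgroups and the parameters below are assumptions): score the training split and compute perplexity on a held-out split for a range of topic counts.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data
X = CountVectorizer(max_features=2000, stop_words="english").fit_transform(docs)
X_train, X_valid = train_test_split(X, random_state=0)

for n in [5, 10, 20, 40, 80]:
    lda = LatentDirichletAllocation(n_components=n, learning_method="batch",
                                    max_iter=20, random_state=0).fit(X_train)
    # Reported behavior: both numbers get monotonically worse as n grows
    # (log likelihood goes down, held-out perplexity goes up).
    print(n, lda.score(X_train), lda.perplexity(X_valid))
```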

@ihavemanyquestions

Hi, I am having trouble with the same bug on the latest sklearn version. Does anyone have an idea how to fix it, or what alternative routes could be used to get perplexity scores? I think this would be very useful.

@ozls
ozls commented Apr 19, 2021

Hi, is there any news regarding this? It's been five years and this bug is still present. I think it would be wise to hide this feature entirely, or at the very least mention in the docs that it's broken.

@ogrisel
Member
ogrisel commented Apr 19, 2021

Nobody has posted a minimal reproducing example yet as far as I can see.

@fzhem
fzhem commented Nov 9, 2021

I totally forgot about this bug. I faced this issue when doing one of my projects.

This seems to happen because of document lengths.
Try the following dataset to recreate the bug: https://www.kaggle.com/therohk/india-headlines-news-dataset
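
The headlines corpus needs a Kaggle download, so here is a self-contained, hedged attempt to probe the same document-length hypothesis with synthetic counts: very short documents versus longer ones, run through the same perplexity-vs-max_iter check as above.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.RandomState(0)
for mean_count, label in [(0.05, "short docs"), (2.0, "long docs")]:
    X = rng.poisson(mean_count, size=(500, 200))
    X = X[X.sum(axis=1) > 0]  # drop documents that ended up empty
    perplexities = []
    for max_iter in [1, 5, 10, 20]:
        lda = LatentDirichletAllocation(n_components=10, learning_method="batch",
                                        max_iter=max_iter, random_state=0).fit(X)
        perplexities.append(round(lda.perplexity(X), 1))
    # If only the short-document run stops decreasing, that would support
    # the document-length explanation suggested above.
    print(label, perplexities)
```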

@cmarmo cmarmo removed this from the 0.19 milestone Dec 1, 2021
@Alexander-philip-sage

I'm seeing this issue as well...

@mateuspestana

Same here, guys. Any tips on how to solve it?
