LatentDirichletAllocation.fit() gives joblib error when evaluate_every > 0. · Issue #6258 · scikit-learn/scikit-learn · GitHub
Closed
@groceryheist

Description

How to reproduce:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["help i have a bug yikes" for i in range(1000)]

vectorizer = CountVectorizer(analyzer='word')  # docs are passed to fit_transform, not to `input`
lda_features = vectorizer.fit_transform(docs)

lda_model = LatentDirichletAllocation(
    n_topics=10,
    learning_method='online',
    evaluate_every=10,
    n_jobs=4,
)
model = lda_model.fit(lda_features)

The error only occurs when 10 >= evaluate_every > 0.
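One possible reading of that range, sketched without scikit-learn: assume `fit` runs its perplexity check only on iterations where `evaluate_every > 0 and (i + 1) % evaluate_every == 0`, and assume `max_iter` is left at 10. Both are assumptions about the library, not something the report states. Under them the check is reached only for values from 1 to 10. The helper name `perplexity_check_iterations` is hypothetical.

```python
def perplexity_check_iterations(max_iter, evaluate_every):
    """Return the 1-based iterations on which the perplexity check would run.

    Assumed condition (not taken from the report):
        evaluate_every > 0 and (i + 1) % evaluate_every == 0
    """
    return [
        i + 1
        for i in range(max_iter)
        # The `> 0` test short-circuits, so evaluate_every == 0 never divides.
        if evaluate_every > 0 and (i + 1) % evaluate_every == 0
    ]


# Assumed max_iter of 10, with a few evaluate_every values.
for evaluate_every in (0, 3, 10, 11):
    print(evaluate_every, perplexity_check_iterations(10, evaluate_every))
```

If this reading holds, `evaluate_every=10` triggers the extra `_e_step` call exactly once, on the last iteration, which is consistent with the `random_init=False` call visible in the traceback.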

The error is:

Traceback (most recent call last):
  File "topic_model.py", line 59, in <module>
    model = lda_model.fit(lda_features)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/decomposition/online_lda.py", line 520, in fit
    random_init=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/decomposition/online_lda.py", line 358, in _e_step
    for idx_slice in gen_even_slices(X.shape[0], n_jobs))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/externals/joblib/parallel.py", line 771, in __call__
    n_jobs = self._initialize_pool()
  File "/usr/local/lib/python3.4/dist-packages/sklearn/externals/joblib/parallel.py", line 518, in _initialize_pool
    raise ImportError('[joblib] Attempting to do parallel computing '
ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. 
To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". 
Please see the joblib documentation on Parallel for more information

This is the error that Windows users get when they don't run their code under "if __name__ == '__main__':". However, I am on Linux.
The error actually indicates that a thread pool is being reinitialized. I suspect that the pool is reinitialized after perplexity is evaluated.
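For reference, the guard that the error message asks for looks roughly like the sketch below, together with a quick check of whether the platform offers the "fork" start method at all. It uses only the standard library. `fork_available` and `main` are hypothetical names, and nothing here is known to resolve the failure described above; it only helps rule out the platform as the cause.

```python
import multiprocessing as mp


def fork_available():
    """Return True if this platform offers the 'fork' start method."""
    return "fork" in mp.get_all_start_methods()


def main():
    # Parallel work (for example the lda_model.fit(...) call from the
    # report) would go here, so that it only runs when the file is
    # executed as a script and not when it is imported by a worker.
    print("default start method:", mp.get_start_method())
    print("fork available:", fork_available())


if __name__ == "__main__":
    main()
```

On Linux, `fork_available()` is expected to return True, which would support the suspicion that the message about a system "that does not support forking" is misleading in this case.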
