Closed
Description
How to reproduce:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
docs = ["help i have a bug yikes" for i in range(1000)]
vectorizer = CountVectorizer(input=docs,analyzer='word')
lda_features = vectorizer.fit_transform(docs)
lda_model = LatentDirichletAllocation(
    n_topics=10,
    learning_method='online',
    evaluate_every=10,
    n_jobs=4,
)
model = lda_model.fit(lda_features)
The error only occurs when 10 >= evaluate_every > 0.
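The range of evaluate_every values that trigger the failure appears to line up with the iterations at which perplexity would be computed. The sketch below is only an illustration, not the library's code: it assumes the fit loop runs max_iter iterations (10 by default) and evaluates perplexity whenever evaluate_every is positive and (i + 1) is a multiple of it. The helper name evaluation_iterations is hypothetical.

```python
def evaluation_iterations(evaluate_every, max_iter=10):
    """Return the 1-based iterations at which perplexity would be evaluated.

    Assumption: evaluation happens when evaluate_every > 0 and
    (i + 1) % evaluate_every == 0 for i in range(max_iter).
    """
    if evaluate_every <= 0:
        # Non-positive values disable the evaluation entirely.
        return []
    return [i + 1 for i in range(max_iter) if (i + 1) % evaluate_every == 0]


for value in (0, 5, 10, 11):
    print(value, evaluation_iterations(value))
```

Under these assumptions, only values from 1 to 10 produce at least one evaluation, which would be consistent with the failure being tied to the perplexity step rather than to training itself.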
The error is:
Traceback (most recent call last):
  File "topic_model.py", line 59, in <module>
    model = lda_model.fit(lda_features)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/decomposition/online_lda.py", line 520, in fit
    random_init=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/decomposition/online_lda.py", line 358, in _e_step
    for idx_slice in gen_even_slices(X.shape[0], n_jobs))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/externals/joblib/parallel.py", line 771, in __call__
    n_jobs = self._initialize_pool()
  File "/usr/local/lib/python3.4/dist-packages/sklearn/externals/joblib/parallel.py", line 518, in _initialize_pool
    raise ImportError('[joblib] Attempting to do parallel computing '
ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking.
To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'".
Please see the joblib documentation on Parallel for more information
This is the error that Windows users get when they don't run their code under "if __name__ == '__main__':". However, I am on Linux.
The error actually indicates that a thread pool is being reinitialized. I suspect that this happens after perplexity is evaluated.