ValueError: assignment destination is read-only, when paralleling with n_jobs > 1 #5956
Comments
I am taking a look at this |
Is it not related to #5481, which seems more generic? |
It is, but |
Not that it matters, but SparseCoder is an estimator:
from sklearn.base import BaseEstimator
from sklearn.decomposition import SparseCoder
issubclass(SparseCoder, BaseEstimator)  # True |
I guess the error wasn't detected in #4807 as it is raised only when using |
Was there a resolution to this bug? I've run into something similar while doing
Someone ran into the same exact problem on StackOverflow - |
@alichaudry I just commented on a similar issue here. |
I confirm that there is an error and that it is intermittent in nature. Calling sklearn.decomposition.SparseCoder(D, transform_algorithm='omp', n_jobs=64).transform(X) fails with ValueError: assignment destination is read-only when X.shape[0] > 4000. OS: Linux 3.10.0-327.13.1.el7.x86_64 |
Hi there, OS: OSX |
I am currently still dealing with this issue, nearly a year on. This is still an open issue. |
If you have a solution, please contribute it, @williamdjones |
#4807 is probably the more advanced effort to address this. |
@williamdjones I was not suggesting that it's solved, but that it's an issue that is reported at a different place, and having multiple issues related to the same problem makes keeping track of it harder. |
Not sure where to report this, or if it's related, but I get the |
@JGH1000 NOT A SOLUTION, but I would try using a random forest for feature selection instead since it is stable and has working joblib functionality. |
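A minimal sketch of that suggestion (not from the thread; the data shapes, n_estimators, and the choice to keep the top 100 features are assumptions): rank features by random-forest importance and keep the highest-ranked ones.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(50, 5000)              # assumed small-sample, high-dimensional data
y = np.random.randint(0, 2, size=50)

forest = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:100]   # indices of the 100 most important features
X_selected = X[:, top]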
Thanks @williamdjones, I used several different methods but found that RandomizedLasso works best for a couple of particular datasets. In any case, it works, just a bit slowly. Not a deal breaker. |
@JGH1000 No problem. If you don't mind, I'm curious about the dimensionality of the datasets for which RLasso was useful versus those for which it was not. |
@williamdjones it was a small-sample (40-50), high-dimensional (40,000-50,000 features) dataset. I would not say that other methods were bad, but RLasso provided results/rankings that were much more consistent with several univariate tests plus domain knowledge. These might not be the 'right' features, but I had more trust in this method. Shame to hear it will be removed from scikit-learn. |
The problem still seems to exist on a 24-core Ubuntu machine for RLasso with n_jobs = -1 and sklearn 0.19.1 |
@coldfog @garciaev I don't know if it is still relevant for you, but I ran into the same problem using joblib without scikit-learn. The reason is the max_nbytes parameter of the Parallel invocation in the joblib library when you set n_jobs > 1; it defaults to 1M. This parameter is defined as: "Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder". So, once an array exceeds 1M, joblib memory-maps it read-only for the workers, and any in-place write then raises "ValueError: assignment destination is read-only". To overcome this, the parameter has to be set higher, e.g. max_nbytes='50M'. If you want a quick fix, you can add max_nbytes='50M' to the Parallel call in "sklearn/decomposition/dict_learning.py" at line 297 to increase the allowed size of temporary files. |
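A small standalone sketch of the joblib behaviour described in that comment (the array size, worker function, and n_jobs value are illustrative, not from the thread):

import numpy as np
from joblib import Parallel, delayed

def fill_in_place(arr):
    arr[:100] = 0.0          # in-place write; fails if arr arrives as a read-only memmap
    return float(arr.sum())

big = np.ones((2000, 1000))  # ~16 MB, far above the default 1M memmapping threshold

# With the defaults, joblib memory-maps `big` read-only for the workers, and the
# in-place write raises "ValueError: assignment destination is read-only":
#   Parallel(n_jobs=2)(delayed(fill_in_place)(big) for _ in range(2))

# The workaround described above: raise the threshold so the array is serialized
# instead of memory-mapped, giving each worker its own writable copy.
results = Parallel(n_jobs=2, max_nbytes='50M')(
    delayed(fill_in_place)(big) for _ in range(2))

An alternative that avoids touching joblib's settings is to copy the array inside the worker before writing to it.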
Just to complement @lvermue's answer. I did what he suggested, but instead inside
And you can find |
When I run SparseCoder with n_jobs > 1, there is a chance of raising the exception ValueError: assignment destination is read-only. The code is shown as follows:
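(The original snippet did not survive extraction; this is a minimal reconstruction of the same setup, with the dictionary shape, feature count, and n_jobs value assumed rather than taken from the report.)

import numpy as np
from sklearn.decomposition import SparseCoder

data_dims = 5000                         # sample count; failures reported above ~2000
n_features, n_atoms = 64, 128            # assumed sizes, not from the report
D = np.random.randn(n_atoms, n_features)
D /= np.linalg.norm(D, axis=1, keepdims=True)    # unit-norm dictionary atoms
X = np.random.randn(data_dims, n_features)

coder = SparseCoder(dictionary=D, transform_algorithm='omp', n_jobs=2)
codes = coder.transform(X)               # raises ValueError on affected versions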
The bigger data_dims is, the higher the chance of failure. When data_dims is small (lower than 2000, I verified), everything works fine. Once data_dims is bigger than 2000, there is a chance of getting the exception. When data_dims is bigger than 5000, the exception is always raised.

My version info:
OS: OS X 10.11.1
python: Python 2.7.10 | Anaconda 2.2.0
numpy: 1.10.1
sklearn: 0.17
The full error information is shown as follows: