imblearn SMOTE throwing error when n_jobs > 1 #10916
Comments
I'm not sure this is the right issue tracker, but since the issue is here, I might need the skills of @lesteve or @ogrisel. We are using But I am not really sure what is going on. |
Hmm... might this be due to Cython code in scipy? Try using an older version of scipy, or the most recent Cython? |
current imblearn minimum scipy version is >= 0.19.0 |
As a work-around, I think using n_jobs=1 in pairwise_distances may be as fast (see #8216 for more details). The error seems similar to #6614. At the time, as I said in #6614 (comment), the only work-around I could find was to make sure the CSR matrix had its indices sorted before reaching the problematic step of the Pipeline. The root cause of the problem lies in scipy: it looks like
import numpy as np
from scipy import sparse
from sklearn.externals import joblib
filename = '/tmp/test.pkl'
data = [2, 1, 4, 3]
indices = [1, 0, 1, 0]
indptr = [0, 2, 4]
matrix = sparse.csr_matrix((data, indices, indptr))
print('matrix.todense():\n', repr(matrix.todense()))
# To trigger the error you need to make sure that the indices are not sorted
print('matrix.has_sorted_indices:', matrix.has_sorted_indices)
joblib.dump(matrix, filename)
mmap_backed_matrix = joblib.load(filename, mmap_mode='r')
mmap_backed_matrix.astype(np.float64)
Stack-trace
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/test.py in <module>()
12 mmap_backed_matrix = joblib.load(filename, mmap_mode='r')
13 mmap_backed_matrix.has_sorted_indices = False
---> 14 mmap_backed_matrix.astype(np.float64)
/home/local/lesteve/miniconda3/lib/python3.6/site-packages/scipy/sparse/data.py in astype(self, dtype, casting, copy)
69 if self.dtype != dtype:
70 return self._with_data(
---> 71 self._deduped_data().astype(dtype, casting=casting, copy=copy),
72 copy=copy)
73 elif copy:
/home/local/lesteve/miniconda3/lib/python3.6/site-packages/scipy/sparse/data.py in _deduped_data(self)
32 def _deduped_data(self):
33 if hasattr(self, 'sum_duplicates'):
---> 34 self.sum_duplicates()
35 return self.data
36
/home/local/lesteve/miniconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py in sum_duplicates(self)
1007 if self.has_canonical_format:
1008 return
-> 1009 self.sort_indices()
1010
1011 M, N = self._swap(self.shape)
/home/local/lesteve/miniconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py in sort_indices(self)
1053 if not self.has_sorted_indices:
1054 _sparsetools.csr_sort_indices(len(self.indptr) - 1, self.indptr,
-> 1055 self.indices, self.data)
1056 self.has_sorted_indices = True
1057
ValueError: WRITEBACKIFCOPY base is read-only |
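The work-around mentioned above (sorting the CSR indices before the matrix reaches a step that may memory-map it read-only) could look roughly like the sketch below. The helper name `ensure_sorted_indices` is hypothetical and not part of scipy, scikit-learn, or imblearn; it only relies on the scipy methods `has_sorted_indices` and `sorted_indices()`. It assumes the failure is triggered solely by the in-place sort on a read-only buffer, as the stack trace suggests.

```python
from scipy import sparse


def ensure_sorted_indices(X):
    """Hypothetical helper: return a copy of X with sorted indices.

    Only CSR/CSC matrices carry the `has_sorted_indices` flag; any other
    input is returned unchanged.
    """
    if sparse.issparse(X) and X.format in ("csr", "csc"):
        if not X.has_sorted_indices:
            # sorted_indices() returns a sorted copy and leaves X untouched
            return X.sorted_indices()
    return X


# Same unsorted matrix as in the reproducer above
data = [2, 1, 4, 3]
indices = [1, 0, 1, 0]
indptr = [0, 2, 4]
matrix = sparse.csr_matrix((data, indices, indptr))

# Sort while the arrays are still writable, i.e. before any dump/load
# with mmap_mode='r' or any n_jobs > 1 step
fixed = ensure_sorted_indices(matrix)
print('fixed.has_sorted_indices:', fixed.has_sorted_indices)
```

Calling such a helper on the data before it enters the Pipeline means the later `sort_indices()` call inside scipy should have nothing left to write, so the read-only buffer is never modified.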
For the record I opened scipy/scipy#8678. |
For the record, I think we could in principle work around the problem (for example a try/catch around the |
Description
imblearn SMOTE throws error with n_jobs > 1
Expected Results
Actual Results
Error:
multiprocessing.pool.RemoteTraceback:
Steps/Code to Reproduce
<----- Version----->
Versions