BUG liblinear/libsvm-based learners segfault when passed large sparse matrices #9545
@jnothman I would love to solve this bug. Any suggestions?
@kdhingra307 add a test to these classes to check for these matrices?
Yes, start with a test. Construct a CSR matrix with a big (64-bit) index dtype. Use `assert_raises` to pass it into `SVC().fit` and check that it raises an error (check what type of error other estimators such as `SGDClassifier` raise). Then make the test pass by changing the `SVC.fit` code.
> On 15 Aug 2017 5:13 am, Andreas Mueller wrote:
> @kdhingra307 add a test to these classes to check for these matrices?
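A minimal sketch of the test described above. It assumes a scikit-learn release in which `SVC.fit` checks the sparse index dtype and raises `ValueError`; the exception type is an assumption worth confirming against other estimators such as `SGDClassifier`. `make_int64_csr` is a hypothetical helper that casts the index arrays after construction, because scipy picks 32-bit indices for small matrices.

```python
import numpy as np
import scipy.sparse as sp
from numpy.testing import assert_raises
from sklearn.svm import SVC


def make_int64_csr(dense):
    """Hypothetical helper: build a CSR matrix whose index arrays are int64.

    scipy chooses int32 index arrays for small matrices, so the dtype is
    changed after construction to mimic a "large sparse" input.
    """
    X = sp.csr_matrix(dense)
    X.indices = X.indices.astype(np.int64)
    X.indptr = X.indptr.astype(np.int64)
    return X


def test_svc_rejects_64bit_sparse_indices():
    rng = np.random.RandomState(0)
    X = make_int64_csr(rng.rand(20, 3))
    y = np.array([0, 1] * 10)
    # Expect a Python exception instead of a crash inside libsvm.
    assert_raises(ValueError, SVC().fit, X, y)
```

On a release without the check, the same call may crash the interpreter instead of failing cleanly, so such a test is safer to run in a separate process.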
@AishwaryaRK I am working on this; I moved to a new place last week, so I couldn't finish it.
@jnothman I have opened a pull request that addresses all of the test cases. It also adds a new function in `utils/validation.py` that checks the dtype of sparse indices; by default it is called only from `check_array`. By default, `check_array` rejects 64-bit indices, but this can be bypassed with the `accept_large_sparse` option.
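A rough sketch of what such an index check could look like, using only NumPy and scipy. The name `check_sparse_indices` and its error message are hypothetical, chosen for illustration; they are not claimed to match the helper in the pull request.

```python
import numpy as np
import scipy.sparse as sp


def check_sparse_indices(X, accept_large_sparse=False):
    """Hypothetical helper: raise ValueError if X has non-int32 index arrays.

    Returns X unchanged when the check passes or is skipped.
    """
    if accept_large_sparse or not sp.issparse(X):
        return X
    # COO stores row/col arrays; CSR, CSC and BSR store indices/indptr.
    if X.format == "coo":
        index_arrays = {"row": X.row, "col": X.col}
    elif X.format in ("csr", "csc", "bsr"):
        index_arrays = {"indices": X.indices, "indptr": X.indptr}
    else:
        return X
    for name, arr in index_arrays.items():
        if arr.dtype != np.int32:
            raise ValueError(
                "Only sparse matrices with 32-bit integer indices are accepted; "
                f"got {arr.dtype} for {name!r}."
            )
    return X
```

Making the check a separate function lets `check_array` (or any other caller) decide per call whether 64-bit indices are acceptable.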
From #2969 (comment):
"Anything that uses liblinear (and possibly other bundled C, as opposed to Cython, code) will segfault when given CSR arrays with 64-bit indices (e.g. `LogisticRegression()`, `LinearSVC()`, etc.). This is fairly critical IMO, and even if sparse arrays with 64-bit indices won't be supported there in the near future (or at all), it would be good to check the indices dtype and raise a Python exception when appropriate. This is also the reason these tests need to be run with pytest-xdist using the `-n 1` option, so that pytest can recover from a crashed interpreter."
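pytest-xdist's `-n 1` runs tests in a worker process, so a segfault kills the worker rather than the test runner. The underlying isolation idea (not xdist's actual implementation) can be sketched with the standard library; `run_isolated` is a hypothetical helper, and decoding the signal from a negative return code assumes a POSIX platform.

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Hypothetical helper: run `code` in a fresh interpreter and report how it ended.

    A segfault in C code kills only the child process, so the caller survives
    and can turn the crash into an ordinary test failure.
    """
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with status {proc.returncode}"
```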
I assume the same is true of `SVC` and `SVR`.
The issue is that scipy.sparse only relatively recently began to support large sparse matrices, i.e. where the `indptr` and `indices` arrays of a `csr_matrix` may be 64-bit ints. This case should be ruled out for the liblinear/libsvm solvers. I think the best solution (so that we can later support or reject large sparse matrices more systematically) is to add a boolean parameter such as `accept_large_sparse` to `sklearn.utils.check_array`.
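For illustration, here is how the proposed flag reads at the call site, assuming a scikit-learn release where `sklearn.utils.check_array` has an `accept_large_sparse` parameter defaulting to `True`. The `accepts` wrapper is hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils import check_array


def accepts(X, **kwargs):
    """Hypothetical helper: True if check_array validates X, False on ValueError."""
    try:
        check_array(X, accept_sparse="csr", **kwargs)
    except ValueError:
        return False
    return True


X = sp.csr_matrix(np.eye(4))
# scipy picks int32 indices for a matrix this small; force int64 to mimic
# the index dtype scipy uses once a matrix outgrows 32-bit indexing.
X.indices = X.indices.astype(np.int64)
X.indptr = X.indptr.astype(np.int64)

print(accepts(X))                             # default: large sparse input passes
print(accepts(X, accept_large_sparse=False))  # opt-out, as libsvm/liblinear need
```

Keeping the default permissive and opting out per estimator means only the solvers that cannot handle 64-bit indices need to change.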