scikit-learn/scikit-learn · Issue #9545

BUG liblinear/libsvm-based learners segfault when passed large sparse matrices #9545

Closed
jnothman opened this issue Aug 14, 2017 · 6 comments · Fixed by #11327
Labels: Bug, Easy (well-defined and straightforward way to resolve), help wanted

Comments

@jnothman
Member

From #2969 (comment):
"Anything that uses liblinear (and possibly other bundled C as opposed to Cython code) will segfault when given CSR arrays with 64 bit indices (e.g. LogisticRegression(), LinearSVC() etc). This is fairly critical IMO, and even if sparse arrays with 64 bit indices won't be supported there in the near future (or at all), it would be good to check for indices dtype and raise a python exception when appropriate. This is also the reason these tests need to be run with pytest-xdist using the -n 1 option, so that pytest could recover from a crashed interpreter."

I assume the same is true of SVC, SVR.

The issue is that scipy.sparse only relatively recently began to support large sparse matrices, i.e. matrices where the indptr and indices arrays of a csr_matrix may be 64-bit integers. This case should be ruled out for the liblinear/libsvm solvers. I think the best solution (so that we can later support or reject large sparse matrices more systematically) is to add a boolean parameter such as accept_large_sparse to sklearn.utils.check_array.
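A minimal sketch of the kind of dtype check being proposed, assuming scipy.sparse inputs. The helper name `check_large_sparse` is hypothetical and not the scikit-learn API; the point is only that a Python exception is raised before 64-bit index arrays can reach the bundled C code.

```python
import numpy as np
import scipy.sparse as sp


def check_large_sparse(X, accept_large_sparse=False):
    """Hypothetical helper: reject sparse matrices whose index arrays are not int32.

    Dense inputs, and sparse inputs when accept_large_sparse=True, are
    returned unchanged.
    """
    if accept_large_sparse or not sp.issparse(X):
        return X
    # COO stores coordinates in row/col; CSR, CSC and BSR use indices/indptr.
    attrs = ("row", "col") if X.format == "coo" else ("indices", "indptr")
    for attr in attrs:
        index_array = getattr(X, attr, None)
        # Formats such as LIL or DOK have neither attribute; this sketch skips them.
        if index_array is not None and index_array.dtype != np.int32:
            raise ValueError(
                "Only sparse matrices with 32-bit integer indices are accepted; "
                f"got {index_array.dtype} {attr}."
            )
    return X
```

For a small matrix such as `sp.csr_matrix(np.eye(3))`, scipy stores int32 indices, so the input is returned as-is; casting its `indices` and `indptr` to int64 makes the same call raise `ValueError`.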

jnothman added the Bug, Easy and Need Contributor labels on Aug 14, 2017
@kdhingra307

@jnothman I would love to solve this bug. Any suggestions?

@amueller
Member

@kdhingra307 add a test to these classes to check for these matrices?
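Building such matrices for a test is less obvious than it looks: scipy typically stores the index arrays of a small matrix as int32, and constructing or copying a matrix may downcast 64-bit index arrays whose values fit in 32 bits. A minimal sketch, assuming CSR/CSC inputs, therefore casts the arrays after construction. The helper name `with_int64_indices` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def with_int64_indices(X):
    """Hypothetical test helper: return a copy of a CSR/CSC matrix whose
    indices and indptr arrays are int64, standing in for a "large" matrix."""
    if not sp.issparse(X) or X.format not in ("csr", "csc"):
        raise TypeError("expected a CSR or CSC matrix")
    X64 = X.copy()
    # Assign the cast arrays directly: passing them through the constructor
    # could let scipy pick int32 again, because the index values are small.
    X64.indices = X64.indices.astype(np.int64)
    X64.indptr = X64.indptr.astype(np.int64)
    return X64
```

A test could then fit each liblinear/libsvm-based estimator on `with_int64_indices(X)` and assert that a Python exception is raised instead of the interpreter crashing.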

@jnothman
Member Author
jnothman commented Aug 15, 2017 via email

@AishwaryaRK
Contributor

If this issue is still unresolved, I would like to contribute. @jnothman @amueller, can you please give me some pointers to start with?

@kdhingra307

@AishwaryaRK I am working on this; I moved to a new place last week, so I couldn't finish it.

@kdhingra307

@jnothman I have opened a pull request that addresses all of the test cases. It also adds a new function in utils/validation.py that checks the dtype of sparse indices; by default it is called only from check_array.

By default, check_array rejects 64-bit indices, but the check can be bypassed with the new accept_large_sparse option.
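A simplified sketch of how such an option can be threaded through a check_array-style validator, following the default described above (reject 64-bit indices unless the caller opts in). `check_array_sketch` is a hypothetical stand-in, not scikit-learn's check_array, and it assumes `accept_sparse` lists only compressed formats (CSR/CSC).

```python
import numpy as np
import scipy.sparse as sp


def check_array_sketch(X, accept_sparse=("csr",), accept_large_sparse=False):
    """Hypothetical, heavily simplified check_array-like validator.

    Assumes accept_sparse names only compressed formats ("csr", "csc"),
    which store their index arrays as `indices` and `indptr`.
    """
    if not sp.issparse(X):
        return np.asarray(X, dtype=np.float64)
    if X.format not in accept_sparse:
        # e.g. COO -> CSR; scipy chooses the index dtype of the result.
        X = X.asformat(accept_sparse[0])
    if not accept_large_sparse:
        # Check the matrix that would actually be handed to the C code,
        # i.e. after any format conversion.
        for attr in ("indices", "indptr"):
            dtype = getattr(X, attr).dtype
            if dtype != np.int32:
                raise ValueError(
                    f"64-bit sparse indices are not supported here (got {dtype} {attr}); "
                    "pass accept_large_sparse=True if the caller can handle them."
                )
    return X
```

In this sketch a liblinear- or libsvm-backed estimator would keep the default, while code paths that can handle int64 indices would pass `accept_large_sparse=True`.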

5 participants