Segmentation fault when running test_common.py::test_estimators[NuSVC()-check_classifiers_train(readonly_memmap=True)] #21361
Comments
Thanks for the report. Do you happen to know which specific test is causing the segfault? For instance is the following enough to reproduce?
You can then refine the test name pattern passed to |
For the record, I cannot reproduce locally with the same versions of numpy and scipy and our CI is green. Which compiler version are you using? Assuming you are using the system's compiler on Linux, the following should do:
|
Yes, it reproduces the same error:
It seems to come from
|
I have the exact same version of gcc and I don't reproduce the failure locally |
@alexrz can you please try to install clang / clang++ from the llvm project (either with
|
After running the commands described in "Steps/Code to Reproduce", I ran:
sudo apt install clang
CC="clang" CXX="clang++" LDSHARED="clang -shared" make clean inplace
pytest -v sklearn/tests/test_common.py -k "check_classifiers_train and svc"
and got the same error on the same test. |
For instance:
gdb -ex r --args python -m pytest -v sklearn/tests/test_common.py -k "check_classifiers_train and svc"
and then, type |
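As a complement to gdb, Python's standard `faulthandler` module can dump the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV, which may help narrow down which test triggers the crash. The following is a minimal sketch; the helper name `enable_crash_traceback` is illustrative and was not used in this thread.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Dump Python tracebacks of all threads to log_path on a fatal signal.

    The returned file object must stay open for as long as the handler
    is enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The same handler can be enabled without touching any code by starting the interpreter with `python -X faulthandler -m pytest ...`.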
I got: (gdb) bt
#0 0x00007fffdbe31c31 in ddot_k_PRESCOTT () from /home/aperez/dev/sandbox/test/scikit-learn/venv/lib/python3.8/site-packages/scipy/spatial/../../scipy.libs/libopenblasp-r0-085ca80a.3.9.so
#1 0x00007fffc893b39a in ?? () from /home/aperez/dev/sandbox/test/scikit-learn/venv/lib/python3.8/site-packages/scipy/linalg/cython_blas.cpython-38-x86_64-linux-gnu.so
#2 0x00007fffc89226e0 in ?? () from /home/aperez/dev/sandbox/test/scikit-learn/venv/lib/python3.8/site-packages/scipy/linalg/cython_blas.cpython-38-x86_64-linux-gnu.so
#3 0x00007fffbe547aca in __pyx_fuse_1__pyx_f_7sklearn_5utils_12_cython_blas__dot (__pyx_v_n=<optimized out>, __pyx_v_x=<optimized out>, __pyx_v_incx=<optimized out>, __pyx_v_y=<optimized out>,
__pyx_v_incy=<optimized out>) at sklearn/utils/_cython_blas.c:2861
#4 0x00007fffbe258327 in svm::Kernel::Kernel (this=0x7fffffff6030, l=<optimized out>, x_=<optimized out>, param=..., blas_functions=0x7fffffff64e0) at sklearn/svm/src/libsvm/svm.cpp:394
#5 0x00007fffbe25b394 in svm::SVC_Q::SVC_Q (blas_functions=0x7fffffff64e0, y_=0x555556ff0fd0 '\001' <repeats 100 times>, '\377' <repeats 100 times>..., param=..., prob=..., this=0x7fffffff6030)
at sklearn/svm/src/libsvm/svm.cpp:1682
#6 svm::solve_nu_svc (blas_functions=0x7fffffff64e0, si=0x7fffffff6000, alpha=0x555557450160, param=0x7fffffff69a0, prob=0x7fffffff6360) at sklearn/svm/src/libsvm/svm.cpp:1682
#7 svm::svm_train_one (prob=0x7fffffff6360, param=0x7fffffff69a0, Cp=<optimized out>, Cn=<optimized out>, status=0x7fffffff64d4, blas_functions=0x7fffffff64e0)
at sklearn/svm/src/libsvm/svm.cpp:1856
#8 0x00007fffbe264cde in svm_train (prob=0x7fffffff6340, prob@entry=0x7fffffff6500, param=param@entry=0x7fffffff69a0, status=status@entry=0x7fffffff64d4,
blas_functions=blas_functions@entry=0x7fffffff64e0) at sklearn/svm/src/libsvm/svm.cpp:2504
#9 0x00007fffbe23e921 in __pyx_pf_7sklearn_3svm_7_libsvm_fit (__pyx_v_X=__pyx_v_X@entry=0x7fffbb041db0, __pyx_v_Y=__pyx_v_Y@entry=0x7fffbb041c90, __pyx_v_svm_type=__pyx_v_svm_type@entry=1,
__pyx_v_kernel=__pyx_v_kernel@entry=0x7fffc3052730, __pyx_v_degree=__pyx_v_degree@entry=3, __pyx_v_gamma=__pyx_v_gamma@entry=0.53188777537536391, __pyx_v_coef0=__pyx_v_coef0@entry=0,
__pyx_v_tol=__pyx_v_tol@entry=0.001, __pyx_v_C=__pyx_v_C@entry=0, __pyx_v_nu=0.5, __pyx_v_epsilon=0, __pyx_v_class_weight=__pyx_v_class_weight@entry=0x7fffbb041f30,
__pyx_v_sample_weight=0x7fffbb041e10, __pyx_v_shrinking=1, __pyx_v_probability=0, __pyx_v_cache_size=200, __pyx_v_max_iter=-1, __pyx_v_random_seed=209652396, __pyx_self=<optimized out>) |
OpenBLAS 0.3.9 seems very old. Can you please include the output of:
in the environment where you can reproduce the crash? |
|
When installing packages with condaforge, I don't get the error.
|
It's possible that openblas 0.3.9 on CPU architectures detected as |
@AlexPrz maybe you can try to do:
and also:
in the venv that has the problem. |
When running:
in the venv that has the problem, I don't get the error anymore. And
|
On my side, I confirm that forcing the Prescott architecture triggers the segfault. |
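OpenBLAS builds that select their kernels at runtime read the `OPENBLAS_CORETYPE` environment variable when the library is loaded, which is one way to force a given architecture such as Prescott. The sketch below runs a command in a child process with that variable set. The helper name `run_with_coretype` is hypothetical, and the exact commands used in this thread are not shown above, so this is only an assumption about how such a check could be scripted.

```python
import os
import subprocess
import sys


def run_with_coretype(coretype, command):
    """Run command in a child process with OPENBLAS_CORETYPE set.

    The variable has to be set before OpenBLAS is loaded, hence the
    separate process rather than a change to os.environ in place.
    """
    env = dict(os.environ, OPENBLAS_CORETYPE=coretype)
    return subprocess.run(command, env=env, capture_output=True, text=True)


# Harmless demonstration: the child process only echoes the variable back.
result = run_with_coretype(
    "Prescott",
    [sys.executable, "-c", "import os; print(os.environ['OPENBLAS_CORETYPE'])"],
)
print(result.stdout.strip())
```

Replacing the demonstration command with the pytest invocation quoted earlier in the thread would run the affected tests under the forced architecture. On POSIX systems, a child process killed by a signal is reported through a negative `returncode`.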
I think the only fix is to wait for scipy to upgrade its OpenBLAS dependency. |
I reported the bug upstream in scipy. The workaround, in the meantime, is to install scipy from conda-forge instead of pypi.org.
|
So should this no longer be a blocker? |
scipy/scipy#14316 seems to have fixed this issue indirectly by upgrading OpenBLAS for the PyPI distribution of scipy:
/tmp/scikit-learn ± ● main $ source venv/bin/activate
/tmp/scikit-learn ± ● main $ where python
/tmp/scikit-learn/venv/bin/python
/usr/bin/python
/tmp/scikit-learn ± ● main $ python -m threadpoolctl -i sklearn
[
{
"user_api": "openmp",
"internal_api": "openmp",
"prefix": "libomp",
"filepath": "/home/jjerphan/.local/share/miniconda3/lib/libomp.so",
"version": null,
"num_threads": 8
},
{
"user_api": "blas",
"internal_api": "openblas",
"prefix": "libopenblas",
"filepath": "/tmp/scikit-learn/venv/lib/python3.9/site-packages/numpy.libs/libopenblasp-r0-2d23e62b.3.17.so",
"version": "0.3.17",
"threading_layer": "pthreads",
"architecture": "SkylakeX",
"num_threads": 8
},
{
"user_api": "blas",
"internal_api": "openblas",
"prefix": "libopenblas",
"filepath": "/tmp/scikit-learn/venv/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-8b9e111f.3.17.so",
"version": "0.3.17",
"threading_layer": "pthreads",
"architecture": "SkylakeX",
"num_threads": 8
}
]
@AlexPrz: can you confirm? |
I do confirm that I don't get the segfault anymore with scipy 1.7.2 whereas I get it with scipy 1.7.1. |
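The threadpoolctl report quoted above is plain JSON, so the OpenBLAS version and detected architecture can be pulled out programmatically when comparing environments. The sketch below works on a trimmed copy of that report; the function name `openblas_entries` is illustrative.

```python
import json

# Trimmed copy of the report quoted in this thread (fewer fields per entry).
REPORT = """
[
  {"user_api": "openmp", "internal_api": "openmp",
   "filepath": "/home/jjerphan/.local/share/miniconda3/lib/libomp.so",
   "version": null},
  {"user_api": "blas", "internal_api": "openblas",
   "filepath": "/tmp/scikit-learn/venv/lib/python3.9/site-packages/numpy.libs/libopenblasp-r0-2d23e62b.3.17.so",
   "version": "0.3.17", "architecture": "SkylakeX"},
  {"user_api": "blas", "internal_api": "openblas",
   "filepath": "/tmp/scikit-learn/venv/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-8b9e111f.3.17.so",
   "version": "0.3.17", "architecture": "SkylakeX"}
]
"""


def openblas_entries(report_json):
    """Return (filepath, version, architecture) for each OpenBLAS library."""
    return [
        (module["filepath"], module.get("version"), module.get("architecture"))
        for module in json.loads(report_json)
        if module.get("internal_api") == "openblas"
    ]


for filepath, version, architecture in openblas_entries(REPORT):
    print(version, architecture, filepath)
```

Both numpy and scipy ship their own copy of OpenBLAS in PyPI wheels, which is why two entries appear and why each has to be checked separately.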
For information we observed this crash on the CI on the unrelated PR #21697. The CI entry with the crash is also using the old version of openblas: 0.3.13 that detects the
We should probably update our CI tests to XFAIL or skip those tests on the |
As discussed with @glemaitre, I am taking this as I already did some openblas xfail fixtures for faulty tests. |
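A condition for skipping or xfailing the affected tests could look roughly like the sketch below. It is not the fixture that was merged in scikit-learn. It assumes that OpenBLAS versions older than 0.3.17 (the version reported as working in this thread, whereas 0.3.9 and 0.3.13 crashed) are suspect, that an unknown architecture is prudently treated as possibly Prescott, and that an unknown version is treated as old. The names `parse_version` and `in_unstable_openblas_configuration` are chosen here for illustration.

```python
import re


def parse_version(version):
    """Turn '0.3.17' into (0, 3, 17), keeping only leading digits per part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def in_unstable_openblas_configuration(modules_info, first_safe=(0, 3, 17)):
    """Return True if any loaded OpenBLAS looks old and possibly Prescott.

    modules_info is a list of dicts shaped like the threadpoolctl report.
    The first_safe cutoff is an assumption, not a documented boundary.
    """
    for info in modules_info:
        if info.get("internal_api") != "openblas":
            continue
        version = info.get("version")
        architecture = info.get("architecture")
        # Assumption: a missing version is treated like an old one.
        too_old = version is None or parse_version(version) < first_safe
        # Assumption: a missing architecture might be Prescott.
        maybe_prescott = architecture is None or architecture.lower() == "prescott"
        if too_old and maybe_prescott:
            return True
    return False


sample = [{"internal_api": "openblas", "version": "0.3.9", "architecture": "Prescott"}]
print(in_unstable_openblas_configuration(sample))
```

With threadpoolctl installed, `modules_info` could come from `threadpoolctl.threadpool_info()`, and the boolean result could be passed as the condition of `pytest.mark.skipif` or `pytest.mark.xfail`.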
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
* TST XFAIL test when unstable openblas configuration
* Adapt versions parsing
Co-authored-By: Adrin Jalali <adrin.jalali@gmail.com>
* DOC Prefer ifskip and reference #21361
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
* Fix Julien's scheduler incorrect dispatch
* Simplify
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
* Do not remove blank line
* fixup! Do not remove blank line
* Retrigger CI
* Prudently assume Prescott might be the architecture if it is unknown
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…rn#21702) * TST XFAIL test when unstable openblas configuration * Adapt versions parsing Co-authored-By: Adrin Jalali <adrin.jalali@gmail.com> * DOC Prefer ifskip and reference scikit-learn#21361 Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Fix Julien's scheduler incorrect dispatch * Simplify Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Do not remove blank line * fixup! Do not remove blank line * Retrigger CI * Prudently assume Prescott might be the architecture if it is unknown Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Describe the bug
I followed the contributing code guidelines and ran into a segmentation fault when running the test sklearn/tests/test_common.py. It appears to come from libsvm.fit(...). I am not sure whether the problem is related to my device or not.
Steps/Code to Reproduce
Expected Results
No error is thrown.
Actual Results
Versions