8000 Floating point exception in GradientBoostingClassifier during nosetests on macosx 10.8.2 64bit (and others?) · Issue #1406 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Floating point exception in GradientBoostingClassifier during nosetests on macosx 10.8.2 64bit (and others?) #1406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
erg opened this issue Nov 24, 2012 · 19 comments · Fixed by #1458
Labels
Milestone

Comments

@erg
Copy link
Contributor
erg commented Nov 24, 2012

Fails every time. Using current master and either nosetests or make. Git id 3c46eba.

sklearn.decomposition.tests.test_sparse_pca.test_initialization ... ok
sklearn.decomposition.tests.test_sparse_pca.test_mini_batch_correct_shapes ... ok
sklearn.decomposition.tests.test_sparse_pca.test_mini_batch_fit_transform ... SKIP
Doctest: sklearn.ensemble.gradient_boosting.GradientBoostingClassifier ... Floating point exception: 8
modern:scikit-learn erg$ [master*] 
@mblondel
Copy link
Member

Seems like your computer is catching lots of errors :)

@pprett
Copy link
Member
pprett commented Nov 25, 2012

@erg I cannot reproduce (neither does our buildbot)... can you try to pin point the error? what numpy/blas version are your running?

thx,
Peter

8000

@erg
Copy link
Contributor Author
erg commented Nov 25, 2012

Blas is the builtin one for mac. How do I find the version?

All you need is a new Macbook or like 10.8.2 64bit. Might get time to debug later...

In [2]: np.__version__
Out[2]: '1.8.0.dev-20224ea'

modern:scikit-learn erg$ [master*] file /usr/lib/libblas.dylib
/usr/lib/libblas.dylib: Mach-O universal binary with 2 architectures
/usr/lib/libblas.dylib (for architecture i386): Mach-O dynamically linked shared library i386
/usr/lib/libblas.dylib (for architecture x86_64):   Mach-O 64-bit dynamically linked shared library x86_64

@amueller
Copy link
Member

On 11/25/2012 11:06 PM, erg wrote:

Blas is the builtin one for mac. How do I find the version?

All you need is a new Macbook or like 10.8.2 64bit. Might get time to
debug later...

@erg if you send me one, I'll to the debugging for you ;)

@ogrisel
Copy link
Member
ogrisel commented Nov 25, 2012

I cannot reproduce either, although I am running OS X 10.8.2 (12C3006):

$ otool -L /usr/local/lib/python2.7/site-packages/numpy/core/_dotblas.so 
/usr/local/lib/python2.7/site-packages/numpy/core/_dotblas.so:
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)

@GaelVaroquaux
Copy link
Member

I can reproduce with my system with the hand-compiled atlas library.

8000

@GaelVaroquaux
Copy link
Member

Hum, I can even reproduce with 0.12.... I am pretty certain that the tests used to pass. Thus I am starting to think that it is not a problem in the scikit.

@GaelVaroquaux
Copy link
Member

OK, with numpy 1.6.2 it works fine...

@pprett
Copy link
Member
pprett commented Nov 26, 2012

I see - I'm using 1.6.2 - I'll hunt down the problem with 1.7

@GaelVaroquaux
Copy link
Member

I did a bisect on numpy, and the numpy commit that induces the breakage is:
numpy/numpy@c48156d

@amueller
Copy link
Member

@GaelVaroquaux nice job :)

@ogrisel
Copy link
Member
ogrisel commented Nov 26, 2012

Now we need a volunteer to qualify the bug in a minimalistic reproduction script and report it to the numpy github issue tracker if it's indeed a numpy regression rather than a misuse from our side.

@GaelVaroquaux
Copy link
Member

Now we need a volunteer to qualify the bug in a minimalistic reproduction
script and report it to the numpy github issue tracker if it's indeed a numpy
regression rather than a misuse from our side.

I am still looking at it. We might have something fishy. The segfault is
in _tree.pyx, in recursive_partition, line 437:
cdef int y_stride = y.strides[0] / y.strides[1]

I suspect that y.strides[1] is now 0.

@GaelVaroquaux
Copy link
Member

cdef int y_stride = y.strides[0] / y.strides[1]

I suspect that y.strides[1] is now 0.

Confirmed.

OK, there's been an API change at the numpy level: a C-contiguous array
can have zero-length strides. It's a change in behavior, and we can
report it. But we can easily be robust to it. Is this a bug on the numpy
side? I wonder.

@GaelVaroquaux
Copy link
Member

OK, there's been an API change at the numpy level: a C-contiguous array
can have zero-length strides. It's a change in behavior, and we can
report it. But we can easily be robust to it.

This change of behavior has a wider impact. I have fixed the segfault in
my fix_1406 branch, but I am now getting a lot of failing tests. I'll
report on the numpy tracker and see what follows.

@pprett
Copy link
Member
pprett commented Nov 26, 2012

@GaelVaroquaux thanks for taking care of that

@GaelVaroquaux
Copy link
Member

This change of behavior has a wider impact. I have fixed the segfault in
my fix_1406 branch, but I am now getting a lot of failing tests. I'll
report on the numpy tracker and see what follows.

numpy/numpy#2770

We'll see what they reply.

@seberg
Copy link
Contributor
seberg commented Nov 28, 2012

I had already mentioned this in numpy/numpy#2694 (comment) (there is a second occurance that I saw back then outside of tree in case you did not yet) the fix is simply to use .itemsize for the division. The use before the division is OK, since it seems only used for that given dimension.

I have not tried this specifically for this code, but with previous numpy you should be able to break it for example with some (possibly not easy to create) 0-sized or 1-sized arrays.

Btw. sorry if you guys spend so much time on it, I somewhat thought it was seen (and also thought by then it would be reverted in numpy faster, though thats no reason not to fix it here as well)

@glouppe
Copy link
Contributor
glouppe commented Dec 12, 2012

Fixed in #1458.

@glouppe glouppe closed this as completed Dec 12, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants
0