
Solaris10/SPARC/gcc-4.5.1/32bit: SIGBUS in test_mrecords.py (Trac #1631) #2227


Closed
numpy-gitbot opened this issue Oct 19, 2012 · 16 comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/1631 on 2010-10-10 by @nstange, assigned to @charris.

Hi everybody,

For numpy-1.5.0, the test suite crashes with a SIGBUS here:
"Test filled w/ flexible dtype ... Bus Error (core dumped)" (numpy-1.5.0/numpy/ma/tests/test_core.py)
and here:
"Test that 'exotic' formats are processed properly ... Bus Error (core dumped)" (numpy-1.5.0/numpy/ma/tests/test_mrecords.py).

The problem is that SPARC processors require alignment, that is, 8-byte values have to be aligned on 8-byte memory boundaries.
The good news is that numpy knows the array is misaligned (verified using gdb), so the fix is easy (see attached diff 02_put_mask_only_on_behaved_arrays.diff): the check for ISCONTIGUOUS in PyArray_PutMask (item_selection.c) isn't enough; it also needs to check ISBEHAVED (= ISALIGNED && ISWRITEABLE).
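To make the alignment argument concrete, here is a minimal pure-Python sketch. It is not numpy code: the names `is_aligned` and `can_use_fast_path`, the base address, and the packed int-plus-double record layout are all assumptions chosen for illustration. It shows why a double stored right after a 4-byte int ends up off an 8-byte boundary, and what a guard that checks alignment and writeability in addition to contiguity would decide.

```python
import struct

# Hypothetical packed record layout: a 4-byte int followed by an 8-byte double.
# "=" asks struct for standard sizes with no padding between fields.
INT_SIZE = struct.calcsize("=i")     # 4
DOUBLE_SIZE = struct.calcsize("=d")  # 8
DOUBLE_FIELD_OFFSET = INT_SIZE       # the double starts right after the int


def is_aligned(address, alignment):
    """Return True if address is a multiple of alignment."""
    return address % alignment == 0


def can_use_fast_path(contiguous, writeable, address, itemsize):
    """Model of the stricter guard: contiguous AND writeable AND aligned."""
    return contiguous and writeable and is_aligned(address, itemsize)


base = 0x10000  # assumed 8-byte-aligned start of the record buffer
double_address = base + DOUBLE_FIELD_OFFSET

print(is_aligned(double_address, DOUBLE_SIZE))                     # False
print(can_use_fast_path(True, True, double_address, DOUBLE_SIZE))  # False
```

With a contiguity-only check the fast path would be taken for this field, which is the situation that ends in a bus error on hardware that enforces alignment.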

But now, another problem arises:
a SIGSEGV here:
"Tests fields retrieval"
(numpy-1.5.0/numpy/ma/tests/test_mrecords.py:77).

The problem is in _copy_from_same_shape (numpy-1.5.0/numpy/core/src/multiarray/ctors.c:732): dest->dimensions == NULL.
Please also note that maxaxis == -1 at that point.
Check the attached diff "04_copy_from_same_shape_zerodim_fix.diff" for details. There's one point in the diff I'm unsure about: The right position of
PyArray_INCREF(src);
PyArray_XDECREF(dest);
I tried to keep them at the original "logical" position, but since I have no clue about Python's reference counting, please have a look.

While debugging the last SIGSEGV-issue, I stumbled over another mistake (at least I think so):

In PyArray_IterAllButAxis (numpy-1.5.0/numpy/core/src/multiarray/iterators.c), the minaxis won't be set if the first nonzero stride is the smallest one.
See attached diff (03_fix_iterallbutaxis_minstride_search.diff) for a fix.
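The minaxis problem can be illustrated with a small pure-Python model. This is a reconstruction from the description above, not the numpy source: both function names are hypothetical, and the "buggy" variant only assumes that the search first picks up the first nonzero stride as the running minimum without recording its axis.

```python
def min_stride_axis_buggy(strides):
    """Assumed shape of the faulty search: the initial minimum is taken from
    the first nonzero stride, but minaxis is left at 0."""
    minaxis = 0
    minstride = 0
    i = 0
    while minstride == 0 and i < len(strides):
        minstride = strides[i]
        i += 1
    for i in range(1, len(strides)):
        if 0 < strides[i] < minstride:
            minaxis = i
            minstride = strides[i]
    return minaxis


def min_stride_axis_fixed(strides):
    """Track the axis together with the smallest positive stride."""
    minaxis = 0
    minstride = 0
    for i, stride in enumerate(strides):
        if stride > 0 and (minstride == 0 or stride < minstride):
            minaxis = i
            minstride = stride
    return minaxis


strides = (0, 4, 8)  # first nonzero stride (axis 1) is also the smallest
print(min_stride_axis_buggy(strides))  # 0
print(min_stride_axis_fixed(strides))  # 1
```

When the smallest stride is not the first nonzero one, for example `(8, 4)`, both variants agree, which may be why the problem is easy to miss.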

The reason I am posting these three issues in one report is that the test suite still doesn't pass, so I'm unsure whether I've broken something with my diffs.

What I get now is:

FAIL: Ticket #1897 second test

Traceback (most recent call last):
File "/pf/m/m222086/xas/solaris10/python2/python-2.7-ve0-gcc/lib/python2.7/site-packages/numpy/core/tests/test_regression.py", line 1255, in test_structured_arrays_with_objects2
assert sys.getrefcount(strb) == numb
AssertionError:
7 = <module 'sys' (built-in)>.getrefcount('aaaa')
array([[(0L, 'aaaa'), (0L, 'aaaa')]],
dtype=[('f0', '>i8'), ('f1', '|O4')]) = <module 'numpy' from '/pf/m/m222086/xas/solaris10/python2/python-2.7-ve0-gcc/lib/python2.7/site-packages/numpy/__init__.pyc'>.array([[(0,'aaaa'),(1,'bbbb')]], 'i8,O')

array([[(0L, 'aaaa'), (0L, 'aaaa')]],
dtype=[('f0', '>i8'), ('f1', '|O4')])[array([[(0L, 'aaaa'), (0L, 'aaaa')]],
dtype=[('f0', '>i8'), ('f1', '|O4')]).nonzero()] = array([[(0L, 'aaaa'), (0L, 'aaaa')]],
dtype=[('f0', '>i8'), ('f1', '|O4')]).ravel()[:1]
assert <module 'sys' (built-in)>.getrefcount('bbbb') == 7
assert <module 'sys' (built-in)>.getrefcount('aaaa') == 7 + 2

FAIL: Test filled w/ mvoid

Traceback (most recent call last):
File "/pf/m/m222086/xas/solaris10/python2/python-2.7-ve0-gcc/lib/python2.7/site-packages/numpy/ma/tests/test_core.py", line 518, in test_filled_w_mvoid
assert_equal(tuple(test), (1, default_fill_value(1.)))
File "/pf/m/m222086/xas/solaris10/python2/python-2.7-ve0-gcc/lib/python2.7/site-packages/numpy/ma/testutils.py", line 94, in assert_equal
return _assert_equal_on_sequences(actual, desired, err_msg='')
File "/pf/m/m222086/xas/solaris10/python2/python-2.7-ve0-gcc/lib/python2.7/site-packages/numpy/ma/testutils.py", line 66, in _assert_equal_on_sequences
assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg))
File "/pf/m/m222086/xas/solaris10/python2/python-2.7-ve0-gcc/lib/python2.7/site-packages/numpy/ma/testutils.py", line 98, in assert_equal
raise AssertionError(msg)
AssertionError:
Items are not equal:
item=1

ACTUAL: 2.0
DESIRED: 1e+20

raise AssertionError('\nItems are not equal:\nitem=1\n\n ACTUAL: 2.0\n DESIRED: 1e+20')

At least no segfaults/bus errors anymore ;).
Btw.: I don't know what mvoid is, but have a look at the following:

myuid@myhost:~$ ~/xas/solaris10/python2/python-2.7-ve0-gcc/bin/python
Python 2.7 (r27:82500, Oct 9 2010, 17:26:38)
[GCC 4.5.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import numpy.ma as ma
>>> import numpy.ma.core
>>> from numpy.ma.core import *
>>> x = ma.array([(1,2.)], mask=[(0,1)], dtype=[('a', int), ('b', float)])
>>> print x.filled()
[(1, 1e+20)]
>>> x = mvoid((1,2.), mask=[(0,1)], dtype=[('a', int), ('b', float)])
>>> print x.filled()
(1, 2.0)

Since the non-working mvoid mask could affect results, I cannot release that build to the scientists at our site. Do you have any idea what the issue could be?
Do you know what an mvoid is? I can't find any documentation about it. I'm seriously thinking about just removing that class from my numpy build (if I knew that there weren't any dependents outside of numpy)...

Wishes

Nicolai

@numpy-gitbot

Attachment added by @nstange on 2010-10-10: 02_put_mask_only_on_behaved_arrays.diff

@numpy-gitbot

Attachment added by @nstange on 2010-10-10: 03_fix_iterallbutaxis_minstride_search.diff

@numpy-gitbot

Attachment added by @nstange on 2010-10-10: 04_copy_from_same_shape_zerodim_fix.diff

@numpy-gitbot

@charris wrote on 2010-10-10

Hmmm... Curious. Other SPARC users haven't reported this problem (yet). The same compiler version has been used, so the remaining difference looks like the 32-bit build. Can you try compiling for 64 bits? I'm mostly concerned that your fixes might be hiding something more fundamental, like a failure to detect correct integer sizes somewhere. Also, is there any chance that different builds may have been mixed together?

@numpy-gitbot

@charris wrote on 2010-10-10

This also looks like ticket #1692. We need more information to track down the root cause.

@numpy-gitbot

@nstange wrote on 2010-10-10

The location of the bus error is exactly the same as in #1692.
It definitely is the same issue!

@numpy-gitbot

@charris wrote on 2010-10-10

Add #1630 as another related ticket.

@numpy-gitbot

@nstange wrote on 2010-10-10

With 64 bits:

See the different configs attached (py_and_numpy_config.tar.bz2).

I don't think the issue has anything to do with wrong type size detection:
The NPY_ALIGNED flag is not set (in the 32-bit case) and PyArray_PutMask invokes DOUBLE_fastputmask nonetheless. This is definitely wrong behaviour.
I guess the mvoid test fails with my patches applied because NPY_UPDATEIFCOPY, which is set on the copy made within PyArray_PutMask, is not being respected.

@numpy-gitbot

Attachment added by @nstange on 2010-10-10: py_and_numpy_config.tar.bz2

@numpy-gitbot

@charris wrote on 2010-10-10

I'm curious how this 64bit/32bit thing works. Are you compiling numpy as 32bits and running it with a 64bit Python, or do you also have a 32 bit Python installed?

@numpy-gitbot

@nstange wrote on 2010-10-10

Either both 32 bit or both 64 bits. Mixing them wouldn't work anyway as the *.so could not be loaded then.

@numpy-gitbot

@nstange wrote on 2010-10-10

After some intensive debugging, I've found the reason for the mvoid failure: it has nd == 0 (that is, zero dimensions).

The error only appears if the array has to be copied in PyArray_PutMask due to alignment requirements.
In that case, when the copy is destroyed at the end of PyArray_PutMask and the data is to be copied back to self->base because NPY_UPDATEIFCOPY is set in the flags, I can choose between a SIGSEGV in _copy_from_same_shape and wrong behaviour (my patch 04_copy_from_same_shape_zerodim_fix.diff makes _copy_from_same_shape return silently, without copying anything, if nd == 0).

So now how to fix that?
There are two possibilities:

  • If nd == 0 makes no sense: Set nd at mvoid's construction to a meaningful value. What value would that be?
  • If nd == 0 makes sense: Prepare _copy_from_same_shape to copy from zerodim arrays. How?
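As a rough illustration of the trade-off described above, the following pure-Python model mimics a putmask that works on a temporary copy and writes the result back afterwards. It is only a sketch under assumptions: all three function names are hypothetical, lists stand in for array buffers, and treating a zero-dimensional array as "exactly one item" is one possible reading of the second option, not something the thread confirms as the eventual fix.

```python
def copy_back_skip_zerodim(dest, src, shape):
    """Models the interim patch: return silently for zero-dimensional arrays."""
    if len(shape) == 0:
        return
    dest[:] = src


def copy_back_handle_zerodim(dest, src, shape):
    """Treats a zero-dimensional array as exactly one item to copy."""
    if len(shape) == 0:
        dest[0] = src[0]
        return
    dest[:] = src


def putmask_via_copy(base, mask, fill, shape, copy_back):
    """Apply the mask on a temporary copy, then write the copy back to base."""
    work = list(base)  # stands in for the aligned temporary copy
    for i, masked in enumerate(mask):
        if masked:
            work[i] = fill
    copy_back(base, work, shape)  # models the write-back step
    return base


# A zero-dimensional "array" holding the masked float field of the record.
print(putmask_via_copy([2.0], [True], 1e20, (), copy_back_skip_zerodim))    # [2.0]
print(putmask_via_copy([2.0], [True], 1e20, (), copy_back_handle_zerodim))  # [1e+20]
```

The first call reproduces the symptom seen in the failing test (2.0 instead of 1e+20): the fill value is written to the temporary copy but never reaches the original data.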

Unfortunately mvoid isn't documented anywhere and thus, I can't answer any of these questions. What is it supposed to do?

I think I'll just remove it completely: It's not used anywhere (at least I think so).

@numpy-gitbot

@charris wrote on 2010-10-11

Thank you for all the work. I think you've brought us much closer to fixing the problem.

@numpy-gitbot

@nstange wrote on 2010-10-11

Two things to mention:

  • the test_structured_arrays_with_objects2 failure mentioned above is unrelated to this ticket (except that it happens only on machines requiring alignment).
  • what remains is to fix the mvoid failure

Regarding my earlier remark "I think I'll just remove it completely: It's not used anywhere (at least I think so)":
This won't work; it is used here: numpy-1.5.0/numpy/ma/core.py:2956
Too sad...

@numpy-gitbot

Milestone changed to Unscheduled by @rgommers on 2011-03-30

@charris
charris commented Feb 19, 2014

Duplicate of #1692.

@charris charris closed this as completed Feb 19, 2014