8000 Complex128 alignment leads to scipy test failures. · Issue #3768 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

Complex128 alignment leads to scipy test failures. #3768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
charris opened this issue Sep 20, 2013 · 18 comments
Closed

Complex128 alignment leads to scipy test failures. #3768

charris opened this issue Sep 20, 2013 · 18 comments

Comments

@charris
Copy link
Member
charris commented Sep 20, 2013

The tests that fail check inplace fft and fail due to copies being made on win 32 when allocated data memory is not 16 byte aligned. The change is c9bf9b0. Discussion at scipy/scipy#2890 (comment).

@juliantaylor
Copy link
Contributor

the numpy issue is that complex arrays are not classified as aligned when they actually are on the hardware.
E.g. complex128 requires only 4 byte alignment on i386, but it is set to require 8/16 byte because thats what is needed on sparc64.

We could fix that, but the scipy test is still questionable as it expects the data it uses to always be aligned.

@charris
Copy link
Member Author
charris commented Sep 20, 2013

@pv Agree about the test. I also agree that the alignment should match what is needed for efficient operation or to avoid bus errors and that will be architecture dependent. On Intel I suspect the default malloc alignment matches what is needed.

@juliantaylor
Copy link
Contributor

the question is how to fix it. As we aren't using C99 complex types we have no reliable way of getting the alignment for them. On the other hand I think sparc64 is the only platform that has 16 byte large scalar types so we could just hardcode the double alignment for this platform and keep double alignment for all others.

@juliantaylor
Copy link
Contributor

or we could search for where numpy accesses the two double as one 16 byte object which caused the issue that was the original reason for the alignment being removed for complex types on transfer.
Most code seems to access each double of the complex double separately, I don't even know how you would get an unaligned access without the complex double type, except maybe casting it to long double (which is 16 byte on sparc).

@charris
Copy link
Member Author
charris commented Sep 20, 2013

I wonder what the Fortran alignment requirements are? In any case, the double alignment seems to have worked for all but the SPARC, so I think hard coding that exception would be the best short term fix. The search for the use of a C99 complex seems more of a long term thing, and for all I know, we might want to move to that at some point in the future rather than fix it ;) Any idea what the situation is with respect to the long double complex type?

@charris
Copy link
Member Author
charris commented Sep 20, 2013

The complex type comes from numpy/core/src/npymath/npy_math_private.h where complex.h is included if NPY_USE_C99_COMPLEX is true. The type is used to call complex library functions if they are available.

@juliantaylor
Copy link
Contributor

from googling a bit it seems sparc does not even requires 16 byte alignment for complex doubles:
http://docs.oracle.com/cd/E19957-01/805-4939/c400041360f5/index.html

so I dug a bit more for the source of the #2668
in this example the data is complex float which has alignment requirement 4, and itemsize 8
nditer only sees itemsize 8 and returns an aligned size8 transfer function which crashes on the 4byte aligned buffer.
No idea how to fix that (besides bloating the alignment requirement as done right now), does nditer always know about all the buffers involved? if yes somewhere an alignment check is missing.

it can be reproduced on x86 by asserting on alignment in lowlevel_strided_loops.c.src and reverting the alignment increase:

--- a/numpy/core/src/multiarray/arraytypes.c.src
+++ b/numpy/core/src/multiarray/arraytypes.c.src
@@ -3819,7 +3819,7 @@ NPY_NO_EXPORT PyArray_Descr @from@_Descr = {
     /* elsize */
     @num@ * sizeof(@fromtype@),
     /* alignment */
-    @num@ * _ALIGN(@fromtype@),
+    _ALIGN(@fromtype@),
     /* subarray */
     NULL,
     /* fields */
@@ -4150,7 +4150,6 @@ set_typeinfo(PyObject *dict)
      *         CFLOAT, CDOUBLE, CLONGDOUBLE#
      * #Name = Half, Float, Double, LongDouble,
      *         CFloat, CDouble, CLongDouble#
-     * #num  = 1, 1, 1, 1, 2, 2, 2#
      */

     PyDict_SetItemString(infodict, "@name@",
@@ -4161,7 +4160,7 @@ set_typeinfo(PyObject *dict)
 #endif
                 NPY_@name@,
                 NPY_BITSOF_@name@,
-                @num@ * _ALIGN(@type@),
+                _ALIGN(@type@),
                 (PyObject *) &Py@Name@ArrType_Type));
     Py_DECREF(s);


--- a/numpy/core/src/multiarray/lowlevel_strided_loops.c.src
+++ b/numpy/core/src/multiarray/lowlevel_strided_loops.c.src
@@ -118,6 +118,10 @@ static void
                         npy_intp N, npy_intp NPY_UNUSED(src_itemsize),
                         NpyAuxData *NPY_UNUSED(data))
 {
+#if @is_aligned@ && @elsize@ != 16
+    assert((long)dst % sizeof(@type@) == 0);
+    assert((long)src % sizeof(@type@) == 0);
+#endif
     /*printf("fn @prefix@_@oper@_size@elsize@\n");*/
     while (N > 0) {

@charris
Copy link
Member Author
charris commented Sep 22, 2013

Let's see if @seberg or @mwiebe can shed any light on the buffer.

@juliantaylor
Copy link
Contributor

replacing PyArray_ISALIGNED in nditer_constr.c with something like _IsAligned with the itemsize instead of the stored alignment fixes this issue, but its impossible to tell if thats sufficient.

@njsmith
Copy link
Member
njsmith commented Sep 22, 2013

Sounds like nditer assumes that itemsize and alignment are always the same
and this is just wrong, which makes it a bug in nditer?

(Can one reproduce it then with e.g. an S8 array?)
On 22 Sep 2013 22:14, "Julian Taylor" notifications@github.com wrote:

from googling a bit it seems sparc does not even requires 16 byte
alignment for complex doubles:
http://docs.oracle.com/cd/E19957-01/805-4939/c400041360f5/index.html

so I dug a bit more for the source of the #2668#2668
in this example the data is complex float which has alignment requirement
4, and itemsize 8
nditer only sees itemsize 8 and returns an aligned size8 transfer function
which crashes on the 4byte aligned buffer.
No idea how to fix that (besides bloating the alignment requirement as
done right now), does nditer always know about all the buffers involved? if
yes somewhere an alignment check is missing.

it can be reproduced on x86 by asserting on alignment in
lowlevel_strided_loops.c.src and reverting the alignment increase:

--- a/numpy/core/src/multiarray/arraytypes.c.src+++ b/numpy/core/src/multiarray/arraytypes.c.src@@ -3819,7 +3819,7 @@ NPY_NO_EXPORT PyArray_Descr @from@_Descr = {
/* elsize /
@num@ * sizeof(@fromtype@),
/
alignment _/- @num@ * _ALIGN(@fromtype@),+ ALIGN(@fromtype@),
/
subarray /
NULL,
/
fields */@@ -4150,7 +4150,6 @@ set_typeinfo(PyObject *dict)
* CFLOAT, CDOUBLE, CLONGDOUBLE#
* #Name = Half, Float, Double, LongDouble,
* CFloat, CDouble, CLongDouble#- * #num = 1, 1, 1, 1, 2, 2, 2#
*/

 PyDict_SetItemString(infodict, "@name@",@@ -4161,7 +4160,7 @@ set_typeinfo(PyObject *dict)

#endif
NPY_@name@,
NPY_BITSOF_@name@,- @num@ * _ALIGN(@type@),+ _ALIGN(@type@),
(PyObject *) &Py@Name@ArrType_Type));
Py_DECREF(s);

--- a/numpy/core/src/multiarray/lowlevel_strided_loops.c.src+++ b/numpy/core/src/multiarray/lowlevel_strided_loops.c.src@@ -118,6 +118,10 @@ static void
npy_intp N, npy_intp NPY_UNUSED(src_itemsize),
NpyAuxData _NPY_UNUSED(data))
{+#if @is_aligned@ && @elsize@ != 16+ assert((long)dst % sizeof(@type@) == 0);+ assert((long)src % sizeof(@type@) == 0);+#endif
/printf("fn @Prefix@@Oper@_size@elsize@\n");*/
while (N > 0) {


Reply to this email directly or view it on GitHubhttps://github.com//issues/3768#issuecomment-24890741
.

@juliantaylor
Copy link
Contributor

you can also argue the other way round, complex can be copied as a type of double size so it should be aligned to that double size.
Then nditer is working correctly and the alignment requirement was wrong.

I think sticking with doubled alignment requirement is better than putting check for complex types in all alignment checks (as we don't want to align to S1234).
i386 performance could be recovered by using aligned memory allocation or putting a half aligned special case into the unaligned strided copy.

Then we should add a alignment check to NewFromDescr as malloc does not provide this double alignment on i386 and fix a couple failures in test_array_from_pyobj.py.

@charris
Copy link
Member Author
charris commented Sep 23, 2013

This is sounding more like a cleanup task than a 1.8 blocker. Should I just ignore this for 1.8?

@charris
Copy link
Member Author
charris commented Sep 28, 2013

I'm taking this out of the blocker category.

@charris
Copy link
Member Author
charris commented Sep 28, 2013

To break it down,

  • The scipy test should not use unaligned data for the overwrite test.
  • Numpy is making unneeded copies on i386.

The second isn't fatal, and the first should be fixed in scipy.

@juliantaylor
Copy link
Contributor

one just reduce the alignment on platforms we know don't need it.

juliantaylor added a commit to juliantaylor/numpy that referenced this issue Sep 28, 2013
x86 can deal with unaligned memory access so it does not need to align
complex types to 2 * sizeof(T) for the dtype transfers which work on the
full complex number.
closes numpy#3768

Does not fix that new complex arrays are reported as aligned when
the allocator does not provide the required alignment.
@charris
Copy link
Member Author
charris commented May 5, 2014

@juliantaylor Do you recall where we are with this issue?

@charris charris added this to the 1.9 blockers milestone May 5, 2014
@juliantaylor
Copy link
Contributor

not fixed but we could by reducing the alignment of complex types but excluding sparc

@charris
Copy link
Member Author
charris commented May 5, 2014

OK. I think I can remove this from the 1.9 blockers as we did for 1.8.

@charris charris removed this from the 1.9 blockers milestone May 5, 2014
@charris charris closed this as completed in fc5d7cf Jul 3, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
0