Merge pull request #6377 from ahaldane/fix_align · numpy/numpy@4a7926f · GitHub
Commit 4a7926f

Merge pull request #6377 from ahaldane/fix_align
BUG: define "uint-alignment", fixes complex64 alignment
2 parents 87c1fcd + 12bd7c3 commit 4a7926f

21 files changed

+361
-141
lines changed

doc/release/1.16.0-notes.rst

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -69,6 +69,21 @@ old behavior, use ``np.isnat`` to explicitly check for NaT or convert
6969
datetime64/timedelta64 arrays with ``.astype(np.int64)`` before making
7070
comparisons.
7171

72+
complex64/128 alignment has changed
73+
-----------------------------------
74+
The memory alignment of complex types is now the same as a C-struct composed of
75+
two floating point values, while before it was equal to the size of the type.
76+
For many users (for instance on x64/unix/gcc) this means that complex64 is now
77+
4-byte aligned instead of 8-byte aligned. An important consequence is that
78+
aligned structured dtypes may now have a different size. For instance,
79+
``np.dtype('c8,u1', align=True)`` used to have an itemsize of 16 (on x64/gcc)
80+
but now it is 12.
81+
82+
In more detail, the complex64 type now has the same alignment as a C-struct
83+
``struct {float r, i;}``, according to the compiler used to compile numpy, and
84+
similarly for the complex128 and complex256 types.
85+
86+
7287
C API changes
7388
=============
7489
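The release note above compares complex64 to a C struct of two floats. A minimal sketch of that comparison in plain C11 follows; the names `complex64_like`, `c8_u1_like`, `complex64_like_alignment` and `c8_u1_like_itemsize` are hypothetical stand-ins, not NumPy types or functions, and the values depend on the compiler and platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for complex64: two C floats, as described in the release note. */
struct complex64_like { float r, i; };

/* The two floats are expected to be packed without padding. */
static_assert(sizeof(struct complex64_like) == 2 * sizeof(float),
              "unexpected padding between the two floats");

/* Stand-in for np.dtype('c8,u1', align=True): the compiler pads this
 * struct the way an aligned structured dtype is padded. */
struct c8_u1_like {
    struct complex64_like c;
    uint8_t u;
};

/* Alignment the compiler assigns to the two-float struct. */
size_t complex64_like_alignment(void)
{
    return _Alignof(struct complex64_like);
}

/* Padded size of the struct, the analogue of the dtype's itemsize. */
size_t c8_u1_like_itemsize(void)
{
    return sizeof(struct c8_u1_like);
}
```

On the x64/gcc configuration mentioned in the note, these two functions would be expected to return 4 and 12, matching the new behaviour described there.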

doc/source/dev/alignment.rst

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
.. _alignment:
2+
3+
4+
Numpy Alignment Goals
5+
=====================
6+
7+
There are three use-cases related to memory alignment in numpy (as of 1.14):
8+
9+
1. Creating structured datatypes with fields aligned like in a C-struct.
10+
2. Speeding up copy operations by using uint assignment instead of memcpy
11+
3. Guaranteeing safe aligned access for ufuncs/setitem/casting code
12+
13+
Numpy uses two different forms of alignment to achieve these goals:
14+
"True alignment" and "Uint alignment".
15+
16+
"True" alignment refers to the architecture-dependent alignment of an
17+
equivalent C-type in C. For example, in x64 systems ``numpy.float64`` is
18+
equivalent to ``double`` in C. On most systems this has either an alignment of
19+
4 or 8 bytes (and this can be controlled in gcc by the option
20+
``-malign-double``). A variable is aligned in memory if its memory offset is a
21+
multiple of its alignment. On some systems (eg sparc) memory alignment is
22+
required, on others it gives a speedup.
23+
24+
"Uint" alignment depends on the size of a datatype. It is defined to be the
25+
"True alignment" of the uint used by numpy's copy-code to copy the datatype, or
26+
undefined/unaligned if there is no equivalent uint. Currently numpy uses uint8,
27+
uint16, uint32, uint64 and uint64 to copy data of size 1,2,4,8,16 bytes
28+
respectively, and all other sized datatypes cannot be uint-aligned.
29+
30+
For example, on a (typical linux x64 gcc) system, the numpy ``complex64``
31+
datatype is implemented as ``struct { float real, imag; }``. This has "true"
32+
alignment of 4 and "uint" alignment of 8 (equal to the true alignment of
33+
``uint64``).
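The distinction between the two alignments can be sketched in standalone C11. This is only an illustration of the definition given above: `uint_alignment_sketch` and `c64_pair` are hypothetical names, and the function is not NumPy's `npy_uint_alignment`, whose body is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for complex64 as described above: two floats. */
struct c64_pair { float real, imag; };

static_assert(sizeof(struct c64_pair) == 2 * sizeof(float),
              "unexpected padding between the two floats");

/*
 * Hypothetical helper: the "uint alignment" for an item size is the true
 * alignment of the uint used to copy it, or 0 if no such uint exists.
 */
size_t uint_alignment_sketch(size_t itemsize)
{
    switch (itemsize) {
        case 1:  return _Alignof(uint8_t);
        case 2:  return _Alignof(uint16_t);
        case 4:  return _Alignof(uint32_t);
        case 8:  return _Alignof(uint64_t);
        case 16: return _Alignof(uint64_t); /* the text lists uint64 for 16 bytes */
        default: return 0;                  /* cannot be uint-aligned */
    }
}

/* "True" alignment of the complex64 stand-in, as chosen by the compiler. */
size_t c64_pair_true_alignment(void)
{
    return _Alignof(struct c64_pair);
}
```

For the complex64 stand-in, `c64_pair_true_alignment()` reports the struct's own alignment, while `uint_alignment_sketch(sizeof(struct c64_pair))` reports the alignment of `uint64_t`, which is the pair of values the example above contrasts.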
34+
35+
Variables in Numpy which control and describe alignment
36+
=======================================================
37+
38+
There are 4 relevant uses of the word ``align`` used in numpy:
39+
40+
* The ``dtype.alignment`` attribute (``descr->alignment`` in C). This is meant
41+
to reflect the "true alignment" of the type. It has arch-dependent default
42+
values for all datatypes, with the exception of structured types created
43+
with ``align=True`` as described below.
44+
* The ``ALIGNED`` flag of an ndarray, computed in ``IsAligned`` and checked
45+
by ``PyArray_ISALIGNED``. This is computed from ``dtype.alignment``.
46+
It is set to ``True`` if every item in the array is at a memory location
47+
consistent with ``dtype.alignment``, which is the case if the data ptr and
48+
all strides of the array are multiples of that alignment.
49+
* The ``align`` keyword of the dtype constructor, which only affects structured
50+
arrays. If the structure's field offsets are not manually provided numpy
51+
determines offsets automatically. In that case, ``align=True`` pads the
52+
structure so that each field is "true" aligned in memory and sets
53+
``dtype.alignment`` to be the largest of the field "true" alignments. This
54+
is like what C-structs usually do. Otherwise if offsets or itemsize were
55+
manually provided ``align=True`` simply checks that all the fields are
56+
"true" aligned and that the total itemsize is a multiple of the largest
57+
field alignment. In either case ``dtype.isalignedstruct`` is also set to
58+
True.
59+
* ``IsUintAligned`` is used to determine if an ndarray is "uint aligned" in
60+
an analogous way to how ``IsAligned`` checks for true-alignment.
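The padding rule described for ``align=True`` (each field placed at a multiple of its "true" alignment, struct alignment equal to the largest field alignment) can be sketched as follows. The function names and signature are hypothetical and only mirror the prose above; they are not taken from NumPy's dtype constructor.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of alignment (a power of two). */
size_t round_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/*
 * Hypothetical layout routine: given per-field sizes and true alignments,
 * fill offsets[] and return the padded itemsize. *struct_alignment receives
 * the largest field alignment.
 */
size_t layout_aligned_struct(size_t nfields, const size_t *sizes,
                             const size_t *alignments, size_t *offsets,
                             size_t *struct_alignment)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < nfields; i++) {
        offset = round_up(offset, alignments[i]);  /* pad before the field */
        offsets[i] = offset;
        offset += sizes[i];
        if (alignments[i] > max_align) {
            max_align = alignments[i];
        }
    }
    *struct_alignment = max_align;
    return round_up(offset, max_align);            /* trailing padding */
}
```

With field sizes {8, 1} and alignments {4, 1}, the layout of a ``c8,u1`` struct under 4-byte complex64 alignment, this returns 12; with alignments {8, 1} it returns 16.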
61+
62+
Consequences of alignment
63+
=========================
64+
65+
Here is how the variables above are used:
66+
67+
1. Creating aligned structs: In order to know how to offset a field when
68+
``align=True``, numpy looks up ``field.dtype.alignment``. This includes
69+
fields which are nested structured arrays.
70+
2. Ufuncs: If the ``ALIGNED`` flag of an array is False, ufuncs will
71+
buffer/cast the array before evaluation. This is needed since ufunc inner
72+
loops access raw elements directly, which might fail on some archs if the
73+
elements are not true-aligned.
74+
3. Getitem/setitem/copyswap function: Similar to ufuncs, these functions
75+
generally have two code paths. If ``ALIGNED`` is False they will
76+
use a code path that buffers the arguments so they are true-aligned.
77+
4. Strided copy code: Here, "uint alignment" is used instead. If the itemsize
78+
of an array is equal to 1, 2, 4, 8 or 16 bytes and the array is uint
79+
aligned then instead numpy will do ``*((uintN*)dst) = *((uintN*)src)`` for
80+
appropriate N. Otherwise numpy copies by doing ``memcpy(dst, src, N)``.
81+
5. Nditer code: Since this often calls the strided copy code, it must
82+
check for "uint alignment".
83+
6. Cast code: if the array is "uint aligned" this will essentially do
84+
``*dst = CASTFUNC(*src)``. If not, it does
85+
``memmove(srcval, src); dstval = CASTFUNC(srcval); memmove(dst, dstval)``
86+
where dstval/srcval are aligned.
87+
88+
Note that in principle, only "true alignment" is required for casting code.
89+
However, because the casting code and copy code are deeply intertwined they
90+
both use "uint" alignment. This should be safe assuming uint alignment is
91+
always larger than true alignment, though it can cause unnecessary buffering if
92+
an array is "true aligned" but not "uint aligned". If there is ever a big
93+
rewrite of this code it would be good to allow them to use different
94+
alignments.
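The strided copy case (item 4 above) can be illustrated with a small sketch for 4-byte items. `ptr_is_aligned` and `strided_copy4` are hypothetical names; the real copy loops are not part of the hunks shown here, so this only mirrors the two code paths the text describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if p is a multiple of alignment (a power of two). */
int ptr_is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/*
 * Hypothetical strided copy of n 4-byte items. uint_aligned is the caller's
 * promise that every src and dst item address is aligned for uint32_t.
 */
void strided_copy4(char *dst, ptrdiff_t dst_stride,
                   const char *src, ptrdiff_t src_stride,
                   size_t n, int uint_aligned)
{
    for (size_t i = 0; i < n; i++) {
        if (uint_aligned) {
            /* Fast path: one uint32 load and store per item. This cast
             * assumes the buffers may be accessed as uint32_t; strictly
             * portable code would keep the memcpy path. */
            *(uint32_t *)dst = *(const uint32_t *)src;
        }
        else {
            /* Safe path for arbitrary addresses. */
            memcpy(dst, src, 4);
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

A caller would compute `uint_aligned` once per array, from the data pointers and strides, and then reuse it for every item, which is why the flag is a parameter rather than a per-item check.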
95+
96+

numpy/core/src/common/array_assign.c

Lines changed: 52 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -84,14 +84,43 @@ broadcast_error: {
8484

8585
/* See array_assign.h for parameter documentation */
8686
NPY_NO_EXPORT int
87-
raw_array_is_aligned(int ndim, char *data, npy_intp *strides, int alignment)
87+
raw_array_is_aligned(int ndim, npy_intp *shape,
88+
char *data, npy_intp *strides, int alignment)
8889
{
89-
if (alignment > 1) {
90-
npy_intp align_check = (npy_intp)data;
91-
int idim;
9290

93-
for (idim = 0; idim < ndim; ++idim) {
94-
align_check |= strides[idim];
91+
/*
92+
* The code below expects the following:
93+
* * that alignment is a power of two, as required by the C standard.
94+
* * that casting from pointer to uintp gives a sensible representation
95+
* we can use bitwise operations on (perhaps *not* req. by C std,
96+
* but assumed by glibc so it should be fine)
97+
* * that casting stride from intp to uintp (to avoid dependence on the
98+
* signed int representation) preserves remainder wrt alignment, so
99+
* stride%a is the same as ((unsigned intp)stride)%a. Req. by C std.
100+
*
101+
* The code checks whether the lowest log2(alignment) bits of `data`
102+
* and all `strides` are 0, as this implies that
103+
* (data + n*stride)%alignment == 0 for all integers n.
104+
*/
105+
106+
if (alignment > 1) {
107+
npy_uintp align_check = (npy_uintp)data;
108+
int i;
109+
110+
for (i = 0; i < ndim; i++) {
111+
#if NPY_RELAXED_STRIDES_CHECKING
112+
/* skip dim == 1 as it is not required to have stride 0 */
113+
if (shape[i] > 1) {
114+
/* if shape[i] == 1, the stride is never used */
115+
align_check |= (npy_uintp)strides[i];
116+
}
117+
else if (shape[i] == 0) {
118+
/* an array with zero elements is always aligned */
119+
return 1;
120+
}
121+
#else /* not NPY_RELAXED_STRIDES_CHECKING */
122+
align_check |= (npy_uintp)strides[i];
123+
#endif /* not NPY_RELAXED_STRIDES_CHECKING */
95124
}
96125

97126
return npy_is_aligned((void *)align_check, alignment);
@@ -101,6 +130,23 @@ raw_array_is_aligned(int ndim, char *data, npy_intp *strides, int alignment)
101130
}
102131
}
103132

133+
NPY_NO_EXPORT int
134+
IsAligned(PyArrayObject *ap)
135+
{
136+
return raw_array_is_aligned(PyArray_NDIM(ap), PyArray_DIMS(ap),
137+
PyArray_DATA(ap), PyArray_STRIDES(ap),
138+
PyArray_DESCR(ap)->alignment);
139+
}
140+
141+
NPY_NO_EXPORT int
142+
IsUintAligned(PyArrayObject *ap)
143+
{
144+
return raw_array_is_aligned(PyArray_NDIM(ap), PyArray_DIMS(ap),
145+
PyArray_DATA(ap), PyArray_STRIDES(ap),
146+
npy_uint_alignment(PyArray_DESCR(ap)->elsize));
147+
}
148+
149+
104150

105151
/* Returns 1 if the arrays have overlapping data, 0 otherwise */
106152
NPY_NO_EXPORT int
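The bitwise check in `raw_array_is_aligned` can be tried outside NumPy with a standalone sketch. `all_elements_aligned` is a hypothetical name, and standard `uintptr_t`/`ptrdiff_t` stand in for `npy_uintp`/`npy_intp`; the sketch follows the relaxed-strides branch, where strides of length-1 dimensions are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the check: returns 1 if data and every
 * stride that is actually used are multiples of alignment, 0 otherwise.
 */
int all_elements_aligned(int ndim, const ptrdiff_t *shape, const char *data,
                         const ptrdiff_t *strides, size_t alignment)
{
    /* The low-bit trick below is only valid for powers of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    uintptr_t bits = (uintptr_t)data;
    for (int i = 0; i < ndim; i++) {
        if (shape[i] == 0) {
            return 1;                       /* no elements, trivially aligned */
        }
        if (shape[i] > 1) {
            /* Converting to unsigned keeps the low bits of negative strides. */
            bits |= (uintptr_t)strides[i];
        }
    }
    /* All low log2(alignment) bits must be zero in data and used strides. */
    return (bits & (alignment - 1)) == 0;
}
```

OR-ing the pointer and strides together works because a set low bit in any of them means some element address `data + n*stride` is not a multiple of the alignment.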

numpy/core/src/common/array_assign.h

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -87,10 +87,26 @@ broadcast_strides(int ndim, npy_intp *shape,
8787

8888
/*
8989
* Checks whether a data pointer + set of strides refers to a raw
90-
* array which is fully aligned data.
90+
* array whose elements are all aligned to a given alignment.
91+
* alignment should be a power of two.
9192
*/
9293
NPY_NO_EXPORT int
93-
raw_array_is_aligned(int ndim, char *data, npy_intp *strides, int alignment);
94+
raw_array_is_aligned(int ndim, npy_intp *shape,
95+
char *data, npy_intp *strides, int alignment);
96+
97+
/*
98+
* Checks if an array is aligned to its "true alignment"
99+
* given by dtype->alignment.
100+
*/
101+
NPY_NO_EXPORT int
102+
IsAligned(PyArrayObject *ap);
103+
104+
/*
105+
* Checks if an array is aligned to its "uint alignment"
106+
* given by npy_uint_alignment(dtype->elsize).
107+
*/
108+
NPY_NO_EXPORT int
109+
IsUintAligned(PyArrayObject *ap);
94110

95111
/* Returns 1 if the arrays have overlapping data, 0 otherwise */
96112
NPY_NO_EXPORT int

numpy/core/src/common/lowlevel_strided_loops.h

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,9 @@
77
/*
88
* NOTE: This API should remain private for the time being, to allow
99
* for further refinement. I think the 'aligned' mechanism
10-
* needs changing, for example.
10+
* needs changing, for example.
11+
*
12+
* Note: Updated in 2018 to distinguish "true" from "uint" alignment.
1113
*/
1214

1315
/*
@@ -69,8 +71,9 @@ typedef void (PyArray_StridedBinaryOp)(char *dst, npy_intp dst_stride,
6971
* strided memory. Returns NULL if there is a problem with the inputs.
7072
*
7173
* aligned:
72-
* Should be 1 if the src and dst pointers are always aligned,
73-
* 0 otherwise.
74+
* Should be 1 if the src and dst pointers always point to
75+
* locations at which a uint of equal size to dtype->elsize
76+
* would be aligned, 0 otherwise.
7477
* src_stride:
7578
* Should be the src stride if it will always be the same,
7679
* NPY_MAX_INTP otherwise.
@@ -165,8 +168,9 @@ PyArray_GetDTypeCopySwapFn(int aligned,
165168
* function when the transfer function is no longer required.
166169
*
167170
* aligned:
168-
* Should be 1 if the src and dst pointers are always aligned,
169-
* 0 otherwise.
171+
* Should be 1 if the src and dst pointers always point to
172+
* locations at which a uint of equal size to dtype->elsize
173+
* would be aligned, 0 otherwise.
170174
* src_stride:
171175
* Should be the src stride if it will always be the same,
172176
* NPY_MAX_INTP otherwise.

numpy/core/src/common/npy_config.h

Lines changed: 0 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -6,22 +6,6 @@
66
#include "numpy/npy_cpu.h"
77
#include "numpy/npy_os.h"
88

9-
/*
10-
* largest alignment the copy loops might require
11-
* required as string, void and complex types might get copied using larger
12-
* instructions than required to operate on them. E.g. complex float is copied
13-
* in 8 byte moves but arithmetic on them only loads in 4 byte moves.
14-
* the sparc platform may need that alignment for long doubles.
15-
* amd64 is not harmed much by the bloat as the system provides 16 byte
16-
* alignment by default.
17-
*/
18-
#if (defined NPY_CPU_X86 || defined _WIN32 || defined NPY_CPU_ARMEL_AARCH32 ||\
19-
defined NPY_CPU_ARMEB_AARCH32)
20-
#define NPY_MAX_COPY_ALIGNMENT 8
21-
#else
22-
#define NPY_MAX_COPY_ALIGNMENT 16
23-
#endif
24-
259
/* blacklist */
2610

2711
/* Disable broken Sun Workshop Pro math functions */

numpy/core/src/multiarray/_multiarray_tests.c.src

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66
#include "numpy/arrayscalars.h"
77
#include "numpy/npy_math.h"
88
#include "numpy/halffloat.h"
9+
#include "common.h"
910
#include "mem_overlap.h"
1011
#include "npy_extint128.h"
1112
#include "common.h"
@@ -1641,6 +1642,42 @@ extint_ceildiv_128_64(PyObject *NPY_UNUSED(self), PyObject *args) {
16411642
return pylong_from_int128(c);
16421643
}
16431644

1645+
struct TestStruct1 {
1646+
npy_uint8 a;
1647+
npy_complex64 b;
1648+
};
1649+
1650+
struct TestStruct2 {
1651+
npy_uint32 a;
1652+
npy_complex64 b;
1653+
};
1654+
1655+
struct TestStruct3 {
1656+
npy_uint8 a;
1657+
struct TestStruct1 b;
1658+
};
1659+
1660+
static PyObject *
1661+
get_struct_alignments(PyObject *NPY_UNUSED(self), PyObject *args) {
1662+
PyObject *ret = PyTuple_New(3);
1663+
PyObject *alignment, *size, *val;
1664+
1665+
/**begin repeat
1666+
* #N = 1,2,3#
1667+
*/
1668+
alignment = PyInt_FromLong(_ALIGN(struct TestStruct@N@));
1669+
size = PyInt_FromLong(sizeof(struct TestStruct@N@));
1670+
val = PyTuple_Pack(2, alignment, size);
1671+
Py_DECREF(alignment);
1672+
Py_DECREF(size);
1673+
if (val == NULL) {
1674+
return NULL;
1675+
}
1676+
PyTuple_SET_ITEM(ret, @N@-1, val);
1677+
/**end repeat**/
1678+
return ret;
1679+
}
1680+
16441681

16451682
static char get_fpu_mode_doc[] = (
16461683
"get_fpu_mode()\n"
@@ -1956,6 +1993,9 @@ static PyMethodDef Multiarray_TestsMethods[] = {
19561993
{"format_float_OSprintf_g",
19571994
(PyCFunction)printf_float_g,
19581995
METH_VARARGS , NULL},
1996+
{"get_struct_alignments",
1997+
get_struct_alignments,
1998+
METH_VARARGS, NULL},
19591999
{NULL, NULL, 0, NULL} /* Sentinel */
19602000
};
19612001
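The test helper above reports the alignment and size of three structs. A rough standalone analogue is sketched below, using the classic `offsetof` probe idiom (the offset of a member placed after a single `char` equals that member's alignment). All names are hypothetical, `c64_like` stands in for `npy_complex64`, and the sketch does not claim to reproduce how `_ALIGN` is defined in NumPy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct c64_like { float real, imag; };   /* stand-in for npy_complex64 */

struct test_struct1 { uint8_t a;  struct c64_like b; };
struct test_struct2 { uint32_t a; struct c64_like b; };
struct test_struct3 { uint8_t a;  struct test_struct1 b; };

/* Probe structs: v sits at the first offset after c that satisfies its
 * alignment, so offsetof(..., v) is the alignment of v's type. */
struct probe1 { char c; struct test_struct1 v; };
struct probe2 { char c; struct test_struct2 v; };
struct probe3 { char c; struct test_struct3 v; };

/* Alignment of test struct n (1, 2 or 3), measured with the probe idiom. */
size_t struct_alignment(int n)
{
    switch (n) {
        case 1: return offsetof(struct probe1, v);
        case 2: return offsetof(struct probe2, v);
        case 3: return offsetof(struct probe3, v);
        default: assert(!"n must be 1, 2 or 3"); return 0;
    }
}

/* Padded size of test struct n (1, 2 or 3). */
size_t struct_size(int n)
{
    switch (n) {
        case 1: return sizeof(struct test_struct1);
        case 2: return sizeof(struct test_struct2);
        case 3: return sizeof(struct test_struct3);
        default: assert(!"n must be 1, 2 or 3"); return 0;
    }
}
```

The third struct nests the first, which makes it a convenient check that a nested struct contributes its own alignment to the enclosing one.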

numpy/core/src/multiarray/array_assign_array.c

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -49,10 +49,10 @@ raw_array_assign_array(int ndim, npy_intp *shape,
4949
NPY_BEGIN_THREADS_DEF;
5050

5151
/* Check alignment */
52-
aligned = raw_array_is_aligned(ndim,
53-
dst_data, dst_strides, dst_dtype->alignment) &&
54-
raw_array_is_aligned(ndim,
55-
src_data, src_strides, src_dtype->alignment);
52+
aligned = raw_array_is_aligned(ndim, shape, dst_data, dst_strides,
53+
npy_uint_alignment(dst_dtype->elsize)) &&
54+
raw_array_is_aligned(ndim, shape, src_data, src_strides,
55+
npy_uint_alignment(src_dtype->elsize));
5656

5757
/* Use raw iteration with no heap allocation */
5858
if (PyArray_PrepareTwoRawArrayIter(
@@ -134,10 +134,10 @@ raw_array_wheremasked_assign_array(int ndim, npy_intp *shape,
134134
NPY_BEGIN_THREADS_DEF;
135135

136136
/* Check alignment */
137-
aligned = raw_array_is_aligned(ndim,
138-
dst_data, dst_strides, dst_dtype->alignment) &&
139-
raw_array_is_aligned(ndim,
140-
src_data, src_strides, src_dtype->alignment);
137+
aligned = raw_array_is_aligned(ndim, shape, dst_data, dst_strides,
138+
npy_uint_alignment(dst_dtype->elsize)) &&
139+
raw_array_is_aligned(ndim, shape, src_data, src_strides,
140+
npy_uint_alignment(src_dtype->elsize));
141141

142142
/* Use raw iteration with no heap allocation */
143143
if (PyArray_PrepareThreeRawArrayIter(
