8000 MAINT: CPUs that support unaligned access. by Qiyu8 · Pull Request #18065 · numpy/numpy · GitHub

Merged · 4 commits · Jan 5, 2021
8 changes: 0 additions & 8 deletions numpy/core/include/numpy/npy_common.h
@@ -10,14 +10,6 @@
#include <npy_config.h>
#endif

// int*, int64* should be propertly aligned on ARMv7 to avoid bus error
#if !defined(NPY_STRONG_ALIGNMENT) && defined(__arm__) && !(defined(__aarch64__) || defined(_M_ARM64))
#define NPY_STRONG_ALIGNMENT 1
#endif
#if !defined(NPY_STRONG_ALIGNMENT)
#define NPY_STRONG_ALIGNMENT 0
#endif

// compile time environment variables
#ifndef NPY_RELAXED_STRIDES_CHECKING
#define NPY_RELAXED_STRIDES_CHECKING 0
14 changes: 10 additions & 4 deletions numpy/core/include/numpy/npy_cpu.h
@@ -110,10 +110,16 @@
information about your platform (OS, CPU and compiler)
#endif

#if (defined(NPY_CPU_X86) || defined(NPY_CPU_AMD64))
#define NPY_CPU_HAVE_UNALIGNED_ACCESS 1
#else
#define NPY_CPU_HAVE_UNALIGNED_ACCESS 0
/*
 * Except for the following architectures, memory access is limited to the natural
 * alignment of data types; otherwise it may lead to a bus error or performance
 * regression. For more details about unaligned access, see
 * https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt.
 */
#if defined(NPY_CPU_X86) || defined(NPY_CPU_AMD64) || defined(__aarch64__) || defined(__powerpc64__)
Member:
@Qiyu8, this should be a simple comment, not a case study, and the current one is not accurate enough. The comment can be as simple as the following:

// Except for the following architectures, memory access is limited to the natural
// alignment of data types otherwise it may lead to bus error or performance regression.

The new #define NPY_STRONG_ALIGNMENT_REQUIRED is too long; three words are enough: NPY_ALIGNMENT_REQUIRED or NPY_STRONG_ALIGNMENT.

Member @seberg (Jan 3, 2021):
Sorry, I tried to shut down a bit over the break. I agree with Sayed: we don't need a list of how things can go wrong or how specific CPUs behave (it is interesting, though!).

The more important thing is to write a short comment about where it is safe to do unaligned access: i.e. if you put an `if (!NPY_STRONG_ALIGNMENT_REQUIRED) {code}`, what are you allowed to do in `code`? Are there remaining constraints on unaligned access?

EDIT: You could include the URL to this PR so that someone interested can find it!

Member Author (@Qiyu8):
I will choose the name NPY_ALIGNMENT_REQUIRED, which indicates that only aligned memory access is allowed.

Member:
Thanks. What about the auto-vectorization note in the comment? Do I not have to worry about it?

Member Author:
NPY_USE_UNALIGNED_ACCESS is already disabled no matter whether auto-vectorization is enabled or not. I will restructure lowlevel_strided_loops.c.src using universal intrinsics so that NPY_USE_UNALIGNED_ACCESS can be enabled when -O3 is specified, but that's not the point of this PR.

Member:
I realize that it is disabled; my question was just to make sure that the comment doesn't somehow apply to the new code as well, since I am not quite sure how it would differ from the one in lowlevel_strided_loops.c.src.

Btw. don't let it stop you, but that file needs at least two bigger pieces of maintenance (one already open, the other fixing most function signatures). That should be pretty orthogonal to anything you do, though; making sure casts are vectorized definitely seems worth it.

Member:
Anyway, I am happy with this, feel free to merge. I just still don't quite understand the auto-vectorization comment, unless it is specific to using NPY_GCC_UNROLL_LOOPS.

#define NPY_ALIGNMENT_REQUIRED 0
#endif
#ifndef NPY_ALIGNMENT_REQUIRED
#define NPY_ALIGNMENT_REQUIRED 1
#endif

#endif
2 changes: 1 addition & 1 deletion numpy/core/src/multiarray/common.h
@@ -267,7 +267,7 @@ npy_memchr(char * haystack, char needle,
}
else {
/* usually find elements to skip path */
if (NPY_CPU_HAVE_UNALIGNED_ACCESS && needle == 0 && stride == 1) {
if (!NPY_ALIGNMENT_REQUIRED && needle == 0 && stride == 1) {
/* iterate until last multiple of 4 */
char * block_end = haystack + size - (size % sizeof(unsigned int));
while (p < block_end) {
2 changes: 1 addition & 1 deletion numpy/core/src/multiarray/compiled_base.c
@@ -1521,7 +1521,7 @@ pack_inner(const char *inptr,
bb[2] = npyv_tobits_b8(npyv_cmpneq_u8(v2, v_zero));
bb[3] = npyv_tobits_b8(npyv_cmpneq_u8(v3, v_zero));
if(out_stride == 1 &&
(!NPY_STRONG_ALIGNMENT || isAligned)) {
(!NPY_ALIGNMENT_REQUIRED || isAligned)) {
npy_uint64 *ptr64 = (npy_uint64*)outptr;
#if NPY_SIMD_WIDTH == 16
npy_uint64 bcomp = bb[0] | (bb[1] << 16) | (bb[2] << 32) | (bb[3] << 48);
2 changes: 1 addition & 1 deletion numpy/core/src/multiarray/item_selection.c
@@ -2245,7 +2245,7 @@ count_boolean_trues(int ndim, char *data, npy_intp const *ashape, npy_intp const
count += count_nonzero_bytes((const npy_uint8 *)d, stride);
d += stride;
#else
if (NPY_CPU_HAVE_UNALIGNED_ACCESS ||
if (!NPY_ALIGNMENT_REQUIRED ||
npy_is_aligned(d, sizeof(npy_uint64))) {
npy_uintp stride = 6 * sizeof(npy_uint64);
for (; d < e - (shape[0] % stride); d += stride) {
2 changes: 1 addition & 1 deletion numpy/core/src/multiarray/lowlevel_strided_loops.c.src
@@ -29,7 +29,7 @@
* instructions (16 byte).
* So this flag can only be enabled if autovectorization is disabled.
*/
#if NPY_CPU_HAVE_UNALIGNED_ACCESS
#if NPY_ALIGNMENT_REQUIRED
# define NPY_USE_UNALIGNED_ACCESS 0
#else
# define NPY_USE_UNALIGNED_ACCESS 0