BUG: Resolve Divide by Zero on Apple silicon + test failures by Developer-Ecosystem-Engineering · Pull Request #19926 · numpy/numpy · GitHub

BUG: Resolve Divide by Zero on Apple silicon + test failures #19926

Merged 3 commits on Sep 25, 2021
Address Azure CI failures with older versions of clang
- -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64
- Apple Clang and Clang have different, but overlapping versions
- Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64

Testing was done against Apple Clang versions
- v11 / x86_64 - failed previously, now passes (azure failure)
- v12+ / x86_64 - passes before and after
- v13 / arm64 - failed before initial patch, passes after
Developer-Ecosystem-Engineering committed Sep 24, 2021
commit 5f93ba4fbe1a3ebb13cf125809552cfa3200a43e
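
The version gating described in the commit message can be sketched as a standalone translation unit. This is only an illustration, not the code in the patch: `DEMO_TARGET_X86`, `DEMO_WORKAROUND`, `needs_recip_workaround` and `toolchain_name` are hypothetical names, and the compiler built-ins `__i386__` / `__x86_64__` stand in for NumPy's `NPY_CPU_X86` / `NPY_CPU_AMD64`, which are assumed to be unavailable outside the NumPy headers.

```c
#include <assert.h>

/* Assumption: compiler built-ins replace NumPy's NPY_CPU_X86 / NPY_CPU_AMD64. */
#if defined(__i386__) || defined(__x86_64__)
    #define DEMO_TARGET_X86 1
#else
    #define DEMO_TARGET_X86 0
#endif

/*
 * Hypothetical condensed form of the ladder in the diff:
 *   - not clang                      -> no workaround
 *   - Apple Clang before v12         -> workaround
 *   - upstream Clang before v10      -> workaround
 *   - newer clang, x86 target        -> no workaround
 *   - newer clang, any other target  -> workaround
 */
#if !defined(__clang__)
    #define DEMO_WORKAROUND 0
#elif defined(__apple_build_version__)
    #define DEMO_WORKAROUND ((__apple_build_version__ < 12000000) || !DEMO_TARGET_X86)
#else
    #define DEMO_WORKAROUND ((__clang_major__ < 10) || !DEMO_TARGET_X86)
#endif

/* The condensed expression must still be a plain 0/1 flag. */
static_assert(DEMO_WORKAROUND == 0 || DEMO_WORKAROUND == 1,
              "workaround flag must be 0 or 1");

/* Returns 1 if this toolchain would take the workaround path, else 0. */
static int needs_recip_workaround(void)
{
    return DEMO_WORKAROUND;
}

/* Names the compiler family that the ladder detected. */
static const char *toolchain_name(void)
{
#if defined(__clang__) && defined(__apple_build_version__)
    return "Apple Clang";
#elif defined(__clang__)
    return "Clang";
#else
    return "not Clang";
#endif
}
```

Calling `needs_recip_workaround()` from a small `main` and printing it next to `toolchain_name()` is one way to check which branch a given compiler and target would select.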
57 changes: 44 additions & 13 deletions numpy/core/src/umath/loops_unary_fp.dispatch.c.src
@@ -79,21 +79,52 @@ NPY_FINLINE double c_square_f64(double a)
 #define NCONTIG 1

 /*
- * clang has a bug on at least v13 and prior. The bug is present at -O1 or
- * greater. When partially loading a NEON register for a reciprocal operation,
- * the remaining elements are set to 1 to avoid divide-by-zero. The partial
- * load is paired with a partial store after the reciprocal operation. clang
- * notices that the entire NEON register is not needed for the store and
- * optimizes out the fill of 1 to the remaining elements. This causes a
- * divide-by-zero error that we were trying to avoid by filling.
+ * clang has a bug that's present at -O1 or greater. When partially loading a
+ * vector register for a reciprocal operation, the remaining elements are set
+ * to 1 to avoid divide-by-zero. The partial load is paired with a partial
+ * store after the reciprocal operation. clang notices that the entire register
+ * is not needed for the store and optimizes out the fill of 1 to the remaining
+ * elements. This causes either a divide-by-zero or 0/0 with invalid exception
+ * that we were trying to avoid by filling.
  *
  * Using a dummy variable marked 'volatile' convinces clang not to ignore
- * the explicit fill of remaining elements.
+ * the explicit fill of remaining elements. If `-ftrapping-math` is
+ * supported, then it'll also avoid the bug. `-ftrapping-math` is supported
+ * on Apple clang v12+ for x86_64. It is not currently supported for arm64.
+ * `-ftrapping-math` is set by default of Numpy builds in
+ * numpy/distutils/ccompiler.py.
+ *
+ * Note: Apple clang and clang upstream have different versions that overlap
  */
-#if defined(__clang__) && defined(__APPLE__) && defined(__arm64__)
-    #define WORKAROUND_CLANG_ARM64_RECIPROCAL_BUG 1
+#if defined(__clang__)
+    #if defined(__apple_build_version__)
+    // Apple Clang
+        #if __apple_build_version__ < 12000000
+        // Apple Clang before v12
+        #define WORKAROUND_CLANG_RECIPROCAL_BUG 1
+        #elif defined(NPY_CPU_X86) || defined(NPY_CPU_AMD64)
+        // Apple Clang after v12, targeting i386 or x86_64
+        #define WORKAROUND_CLANG_RECIPROCAL_BUG 0
+        #else
+        // Apple Clang after v12, not targeting i386 or x86_64
+        #define WORKAROUND_CLANG_RECIPROCAL_BUG 1
+        #endif
+    #else
+    // Clang, not Apple Clang
+        #if __clang_major__ < 10
+        // Clang before v10
+        #define WORKAROUND_CLANG_RECIPROCAL_BUG 1
+        #elif defined(NPY_CPU_X86) || defined(NPY_CPU_AMD64)
+        // Clang v10+, targeting i386 or x86_64
+        #define WORKAROUND_CLANG_RECIPROCAL_BUG 0
+        #else
+        // Clang v10+, not targeting i386 or x86_64
+        #define WORKAROUND_CLANG_RECIPROCAL_BUG 1
+        #endif
+    #endif
 #else
-    #define WORKAROUND_CLANG_ARM64_RECIPROCAL_BUG 0
+// Not a Clang compiler
+#define WORKAROUND_CLANG_RECIPROCAL_BUG 0
 #endif

 /**begin repeat
@@ -106,7 +137,7 @@ NPY_FINLINE double c_square_f64(double a)
  * #kind = sqrt, absolute, square, reciprocal#
  * #intr = sqrt, abs, square, recip#
  * #repl_0w1 = 0, 0, 0, 1#
- * #RECIP_WORKAROUND = 0, 0, 0, WORKAROUND_CLANG_ARM64_RECIPROCAL_BUG#
+ * #RECIP_WORKAROUND = 0, 0, 0, WORKAROUND_CLANG_RECIPROCAL_BUG#
  */
 /**begin repeat2
  * #STYPE = CONTIG, NCONTIG, CONTIG, NCONTIG#
@@ -203,7 +234,7 @@ static void simd_@TYPE@_@kind@_@STYPE@_@DTYPE@
 #endif // @VCHK@
 /**end repeat**/

-#undef WORKAROUND_CLANG_ARM64_RECIPROCAL_BUG
+#undef WORKAROUND_CLANG_RECIPROCAL_BUG

 /********************************************************************************
  ** Defining ufunc inner functions
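
The padding behaviour described in the source comment can be illustrated with a scalar emulation. This is a minimal sketch under stated assumptions, not NumPy's SIMD code: `LANES` and `recip_partial` are hypothetical, plain C arrays stand in for a vector register, and `volatile` is used here only so the compiler keeps the padding lanes and performs the divisions at run time. A padding lane whose reciprocal is infinite corresponds to a division by zero, which is the case the fill of 1 is meant to prevent.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define LANES 4 /* hypothetical vector width, for illustration only */

/*
 * Emulates a partial load of n (<= LANES) doubles, a full-width reciprocal,
 * and a partial store of the first n results into dst.
 * Lanes beyond n are padded with `fill` before the reciprocal.
 * Returns how many padding lanes produced an infinite result, i.e. how many
 * unused lanes divided by zero.
 */
static size_t recip_partial(const double *src, size_t n, double fill, double *dst)
{
    volatile double lanes[LANES]; /* volatile: keep every lane "live" */
    size_t bad = 0;

    assert(n <= LANES);

    /* Partial load: real data first, then the fill value. */
    for (size_t i = 0; i < LANES; i++) {
        lanes[i] = (i < n) ? src[i] : fill;
    }

    /* Full-width reciprocal, including the padding lanes. */
    for (size_t i = 0; i < LANES; i++) {
        lanes[i] = 1.0 / lanes[i];
    }

    /* Partial store, plus a check of what happened in the padding lanes. */
    for (size_t i = 0; i < LANES; i++) {
        double r = lanes[i];
        if (i < n) {
            dst[i] = r;
        } else if (isinf(r)) {
            bad++;
        }
    }
    return bad;
}
```

With two valid inputs, a fill of `0.0` leaves two padding lanes dividing by zero, while a fill of `1.0` leaves none; the stored results are the same in both cases, which is why a compiler that drops the fill changes only the exception behaviour and not the visible output.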