MAINT: Fix floating point warning flag as much as possible #19476

seberg · 2021-07-14T16:58:38Z

This PR avoids setting FPE flags rather than clearing them
where possible.

The main place where this is not easy is SSE: I could not yet find
the isless equivalent for basic SSE, and I did not want to modify
the code too much.

This also updates the docs to copy the informational section to the
"Floating point error handling" landing page. Right now, the docs
assume that we (for now) keep the behaviour of NOT giving floating
point warnings for comparisons with NaN.

Hopefully, using these functions should not have any speed impact.
There is some chance that compilers will not honor the contract
and replace isless with instructions that do set floating point
error flags. If/where this happens, we may have to undo the changes.

This should have no impact on functionality (except if compilers
do not follow C99/IEEE correctly, which is entirely possible).

However, signalling NaNs WILL now warn more often (as per C99/IEEE),
since we do not clear the warnings as aggressively.
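
For readers less familiar with the quiet/signalling distinction mentioned above, here is a minimal sketch of how the two kinds of NaN differ at the bit level. It assumes IEEE 754 binary64 with the usual convention (used on x86 and ARM) that a set most-significant fraction bit marks a quiet NaN. The helper names are hypothetical and are not part of NumPy or this PR. The sketch works on integer bit patterns because moving a signalling NaN through floating point registers may quieten it.

```python
import math
import struct

EXP_MASK = 0x7FF0_0000_0000_0000   # all 11 exponent bits set
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF  # the 52 fraction bits
QUIET_BIT = 0x0008_0000_0000_0000  # most significant fraction bit

# Example bit patterns (illustrative choices, many other NaN payloads exist).
QUIET_NAN = 0x7FF8_0000_0000_0000
SIGNALLING_NAN = 0x7FF0_0000_0000_0001
INFINITY = 0x7FF0_0000_0000_0000


def is_nan_bits(bits: int) -> bool:
    """True if the binary64 bit pattern encodes any NaN (sign bit ignored)."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_signalling_nan_bits(bits: int) -> bool:
    """True if the pattern is a NaN whose quiet bit is clear."""
    return is_nan_bits(bits) and (bits & QUIET_BIT) == 0


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a Python float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


if __name__ == "__main__":
    for name, bits in [("quiet", QUIET_NAN), ("signalling", SIGNALLING_NAN)]:
        print(name, hex(bits), is_signalling_nan_bits(bits),
              math.isnan(bits_to_float(bits)))
```

Both patterns are NaN as far as `math.isnan` is concerned. Only operations on the signalling one are required by IEEE 754 to raise the invalid flag.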

A few notes:

I am not sure that we can trust compilers, so there is some chance that relying on isless and friends won't work perfectly. There is also some chance of performance regressions forcing us back.
Maybe @seiko2plus can have a look at the SIMD related changes? Otherwise, I could split them out to make things simpler.

To some degree getting warning flags right seems a bit "aspirational"...

seberg · 2021-07-14T17:17:37Z

Well, that attempt died quickly on CI :/, it had worked great locally. (I still have some hope I just missed some SIMD path though)

seberg · 2021-07-14T19:03:18Z

Aha, I managed to rewrite the AVX512F logic to avoid the spurious warning setting. Quick timing gives (on a machine with AVX512F*):

In [1]: arr1 = np.random.random(10000); arr2 = np.random.random(10000)

On main:

In [2]: %timeit np.minimum(arr1, arr2)
7.89 µs ± 83.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

On this branch:

In [3]: %timeit np.minimum(arr1, arr2)
8.39 µs ± 6.36 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

So it looks like a ~6% slowdown, probably we can live with that.
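
The relative slowdown can be computed directly from the two mean timings quoted above. This is only a sketch of the arithmetic: the function name is hypothetical, and a single `%timeit` run on one machine is a rough estimate rather than a benchmark.

```python
def relative_slowdown(baseline_us: float, candidate_us: float) -> float:
    """Fractional slowdown of candidate relative to baseline (0.05 == 5%)."""
    return (candidate_us - baseline_us) / baseline_us


# Mean timings for np.minimum(arr1, arr2) taken from the runs above.
main_us = 7.89     # on main
branch_us = 8.39   # on this branch

slowdown = relative_slowdown(main_us, branch_us)
print(f"{slowdown:.1%}")  # about 6.3%
```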

I think the AVX512 solution may be a big improvement for the SSE2 code. That code currently tests for an invalid flag set during the comparison. But as far as I can tell, both of the instructions I now use in AVX have equivalent ones in SSE2.

This is technically even incorrect in IEEE: signalling NaNs should pretty much always give a warning. But overall, it seems unlikely NumPy should bother about it (we never create them after all). This is split off from gh-19476, mainly to check whether s390x is broken...

seberg · 2021-07-14T20:00:20Z

I can reproduce the MacOS failures locally with clang, and they go away with -ffp-exception-behavior=strict, so gh-19049 (not working) strikes again...

So we will have to figure out gh-19049 before this has any chance :(. Marking as draft, I guess some of the SIMD stuff could be split out regardless of clang. EDIT: xref the issue gh-19427

seberg · 2021-07-19T00:01:55Z

Well, probably will have to just pull out the useful parts... Godbolt points out that clang <10.0 does bad optimization: https://godbolt.org/z/38hcaaP7o (note the cmpltsd xmm3, xmm1 instruction, where clang 10 correctly only uses ucomisd).

I am not actually sure why this fails CI, since that looks like clang 11, unless it is Mac specific? But, I guess we support clang 9 anyway, so...

EDIT: Apparently the CI issue is that clang 11 does not support trapping-math, at least not for x86_64.

seberg · 2021-11-10T15:59:46Z

Closing and reopening, because I am curious if bumping MacOS in CI from 10.14 to 10.15 makes a difference here (although I am not sure that would actually help with moving forward).

seberg · 2022-04-11T17:58:15Z

Going to close this for now. There are a few things here that would be nice to have. The whole thing would be nice, but seems only viable for newer clang versions (which probably takes a bit more time).

github-actions bot added the 03 - Maintenance label Jul 14, 2021

seberg mentioned this pull request Jul 14, 2021

TST: Do not test for signalling NaNs not raising warnings #19477

Closed

seberg force-pushed the use-no-fp-error-C99-functions branch from fd815bd to 7291c4b Compare July 14, 2021 19:39

seberg marked this pull request as draft July 14, 2021 20:00

seberg mentioned this pull request Jul 14, 2021

BLD: Add clang -ftrapping-math also for compiler_so #19479

Merged

seberg force-pushed the use-no-fp-error-C99-functions branch from 7291c4b to f137362 Compare July 18, 2021 20:26

seberg marked this pull request as ready for review July 18, 2021 20:26

seberg marked this pull request as draft July 18, 2021 23:14

seberg mentioned this pull request Jul 22, 2021

MAINT,ENH: Use non-error functions isless, isgreater, etc. #19398

Open

seberg closed this Nov 10, 2021

seberg reopened this Nov 10, 2021

seberg added 2 commits November 10, 2021 10:01

BUG: Try if avoiding AVX max for NaN elements works

1110d7c

seberg force-pushed the use-no-fp-error-C99-functions branch from f137362 to 1110d7c Compare November 10, 2021 16:04

seberg added the 64 - Good Idea label Apr 11, 2022

seberg closed this Apr 11, 2022
