Set -mfpmath=sse on x86-32 for gcc/clang numeric consistency · numpy/numpy@1cd1a0e · GitHub

Commit 1cd1a0e

Set -mfpmath=sse on x86-32 for gcc/clang numeric consistency
Force SSE-based floating-point on 32-bit x86 systems to fix inconsistent results between einsum and other math functions. Prevents test failures with int16 operations by avoiding the x87 FPU's extended precision.
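The effect of an extra, wider intermediate precision can be illustrated with exact rational arithmetic. The sketch below is not taken from the commit or from NumPy's test suite; the helper `round_to_significand` is a hypothetical name, and it models only the significand width (64 bits for x87 extended, 53 bits for a double), ignoring exponent range, subnormals, and non-positive inputs. It shows one value that rounds differently when it passes through a 64-bit significand first, which is the kind of discrepancy that x87 code can exhibit and SSE code avoids.

```python
from fractions import Fraction


def round_to_significand(x: Fraction, bits: int) -> Fraction:
    """Round a positive rational to `bits` significand bits, ties to even.

    Hypothetical helper for illustration only: the exponent range is
    unbounded, so overflow and subnormals are not modelled.
    """
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Start from an estimate of the exponent, then normalise so that
    # 2**(bits - 1) <= scaled < 2**bits.
    e = x.numerator.bit_length() - x.denominator.bit_length() - bits
    scaled = x / Fraction(2) ** e
    while scaled >= 2 ** bits:
        scaled /= 2
        e += 1
    while scaled < 2 ** (bits - 1):
        scaled *= 2
        e -= 1
    # round() on a Fraction rounds halfway cases to the even integer.
    return round(scaled) * Fraction(2) ** e


# An exact value lying just above the midpoint between two adjacent doubles.
exact = 1 + Fraction(1, 2**53) + Fraction(1, 2**65)

# SSE-style: a single rounding, straight to a 53-bit significand.
direct = round_to_significand(exact, 53)

# x87-style: round to a 64-bit significand first, then to 53 bits on store.
via_extended = round_to_significand(round_to_significand(exact, 64), 53)

print(direct == 1 + Fraction(1, 2**52))  # single rounding goes up
print(via_extended == 1)                 # double rounding goes down
```

The first rounding to 64 bits discards the small term that placed the value above the midpoint, so the second rounding sees an exact tie and resolves it to the even neighbour. With a single rounding step the result depends only on the exact value, which is why forcing SSE arithmetic makes results reproducible across code paths.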
1 parent 8bd973d commit 1cd1a0e

1 file changed: +5 -1 lines changed

meson_cpu/x86/meson.build

Lines changed: 5 additions & 1 deletion
@@ -4,7 +4,11 @@ cpu_family = host_machine.cpu_family()
 mod_features = import('features')
 
 HWY_SSE4_FLAGS = ['-DHWY_WANT_SSE4', '-DHWY_DISABLE_PCLMUL_AES']
-X86_64_V2_FLAGS = cpu_family == 'x86' ? [] : ['-mcx16']
+# Use SSE for floating-point on x86-32 to ensure numeric consistency.
+# The x87 FPU's 80-bit internal precision causes unpredictable rounding
+# and overflow behavior when converting to smaller types. SSE maintains
+# strict 32/64-bit precision throughout all calculations.
+X86_64_V2_FLAGS = cpu_family == 'x86' ? ['-mfpmath=sse'] : ['-mcx16']
 X86_64_V2_NAMES = cpu_family == 'x86' ? [] : ['CX16']
 X86_V2 = mod_features.new(
   'X86_V2', 1, args: ['-msse', '-msse2', '-msse3', '-mssse3', '-msse4.1', '-msse4.2',
