Description
Describe the issue:
We discovered a simple test failing on windows only: sunpy/sunpy#7754
I have investigated the symptoms of this in some detail but have not tried to find the cause: In short it seems like matrix multiplications with largeish numbers fails inconsistently in windows, and when it does fail it will fail a few times in a row before returning to normal. I would suspect openblas but the failure never occurs with numpy <2
Reproduce the code example:
import numpy as np
def test():
# Test whether matrix multiplication involving a large matrix always gives the same answer
# This indirectly tests whichever BLAS/LAPACK libraries that NumPy is linking to (if any)
x = np.arange(500000, dtype=np.float64)
src = np.vstack((x, -10*x)).T
matrix = np.array([[0, 1], [1, 0]])
expected = np.vstack((-10*x, x)).T # src @ matrix
mismatches = np.zeros(500, int)
for i in range(len(mismatches)):
result = src @ matrix
mismatches[i] = (~np.isclose(result, expected)).sum()
if mismatches[i] != 0:
print(f"{mismatches[i]} mismatching elements in multiplication #{i}")
Error message:
This bug is absolutely wild by the way:
With numpy 2.0.1:
16 mismatching elements in multiplication #22
316 mismatching elements in multiplication #33
28 mismatching elements in multiplication #53
307 mismatching elements in multiplication #100
32 mismatching elements in multiplication #201
32 mismatching elements in multiplication #267
288 mismatching elements in multiplication #272
1177 mismatching elements in multiplication #276
1596 mismatching elements in multiplication #289
32 mismatching elements in multiplication #298
16 mismatching elements in multiplication #314
3268 mismatching elements in multiplication #407
1403 mismatching elements in multiplication #419
64 mismatching elements in multiplication #446
with numpy 2.0.0
80 mismatching elements in multiplication #0
920 mismatching elements in multiplication #183
683 mismatching elements in multiplication #186
232 mismatching elements in multiplication #187
400 mismatching elements in multiplication #190
1217 mismatching elements in multiplication #195
1515 mismatching elements in multiplication #249
593 mismatching elements in multiplication #269
108 mismatching elements in multiplication #285
120 mismatching elements in multiplication #295
204 mismatching elements in multiplication #332
78 mismatching elements in multiplication #357
1816 mismatching elements in multiplication #380
164 mismatching elements in multiplication #396
48 mismatching elements in multiplication #492
with 1.26.4 it passes.
So it is a numpy 2 issue and not a 2.0.1 issue... it is also random:
Inserting a return and a break on the first time the matmul fails:
In [52]: test()
32 mismatching elements in multiplication #27
In [53]: test()
152 mismatching elements in multiplication #44
In [54]: test()
108 mismatching elements in multiplication #75
In [55]: test()
68 mismatching elements in multiplication #44
In [56]: test()
964 mismatching elements in multiplication #1
In [57]: test()
228 mismatching elements in multiplication #89
In [58]: test()
24 mismatching elements in multiplication #0
In a given set of multiplications the errors always have the same values.
In [92]: print([i for i in x if np.any(i != 0)])
[array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480., 6948.]), array([-69480.,
6948.])]
In [94]: print([i for i in (res-ex) if np.any(i != 0)])
[array([-1289600., 128960.]), array([-1289600., 128960.]), array([-1289600., 128960.]), array([-1289600., 128960.]), array([-1289600., 128960.]), array([-1289600., 128960.]), array([-1289600., 128960.]), array([-1289600., 128960.])]
In [101]: print([i for i in (res-ex) if np.any(i != 0)])
[array([-208450., 20845.]), array([-208450., 20845.]), array([-208450., 20845.]), array([-208450., 20845.]), array([-208450., 20845.]), array([-208450., 20845.]), array([-208450., 20845.]), array([-208450., 20845.])]
As you can see they also have the weird symmetry where the first column is -10* the second column which I guess makes some twisted sense because thats what the calculation is.
The errors are also always in the same region i.e. with indexs > 400k or so
They also always fail multiple indexes in a row, e.g. you might have 20 or 50 or whatever failed calcualtions in a row but then it starts working again.
e.g.
[417292, 417293, 417294, 417295, 417296, 417297, 417298, 417299, 417300, 417301, 417302, 417303, 449292, 449293, 449294, 449295, 449296, 449297, 449298, 449299, 449300, 449301, 449302, 449303, 449400, 449401, 449402, 449403, 449404, 449405, 449406, 449407, 449408, 449409, 449410, 449411, 449412, 449413, 449414, 449415, 449416, 449417, 449418, 449419, 449420, 449421, 449422, 449423]
or
[487736, 487737, 487738, 487739, 487740, 487741, 487742, 487743, 487760, 487761, 487762, 487763, 487764, 487765, 487766, 487767, 490490, 490491, 490492, 490493, 490494, 490495, 490496, 490497, 496838, 496839, 496840, 496841, 496842, 496843, 496844, 496845]
You can see in one of the plots there has been the same failure for 2 separate sequences of indices.
Python and NumPy Versions:
Windows 10, cannot recreate this at all in linux.
python 3.11.8
numpy 2.0.0 or numpy 2.0.1 fail in the same manner
This also failed on our CI on a windows-latest runner.
https://github.com/sunpy/sunpy/actions/runs/10072095800/job/27843572893#step:10:5160
Runtime Environment:
[{'numpy_version': '2.0.1',
'python': '3.11.8 (tags/v3.11.8:db85d51, Feb 6 2024, 22:03:32) [MSC v.1937 '
'64 bit (AMD64)]',
'uname': uname_result(system='Windows', node='DESKTOP-0EA2O83', release='10', version='10.0.19045', machine='AMD64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Haswell',
'filepath': 'C:\\Users\\Alasdair\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\numpy.libs\\libscipy_openblas64_-fb1711452d4d8cee9f276fd1449ee5c7.dll',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]
Context for the issue:
Any matrix multiplication result could go wrong at any time in windows.