8000 BUG: Matrix multiplication in windows produces large numerical errors inconsistently in numpy >= 2 · Issue #27036 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content
BUG: Matrix multiplication in windows produces large numerical errors inconsistently in numpy >= 2 #27036
Closed
@alasdairwilson

Description

@alasdairwilson

Describe the issue:

We discovered a simple test failing on windows only: sunpy/sunpy#7754

I have investigated the symptoms of this in some detail but have not tried to find the cause: In short it seems like matrix multiplications with largeish numbers fails inconsistently in windows, and when it does fail it will fail a few times in a row before returning to normal. I would suspect openblas but the failure never occurs with numpy <2

Reproduce the code example:

import numpy as np
def test():
    # Test whether matrix multiplication involving a large matrix always gives the same answer
    # This indirectly tests whichever BLAS/LAPACK libraries that NumPy is linking to (if any)
    x = np.arange(500000, dtype=np.float64)
    src = np.vstack((x, -10*x)).T
    matrix = np.array([[0, 1], [1, 0]])

    expected = np.vstack((-10*x, x)).T  # src @ matrix
    
    mismatches = np.zeros(500, int)
    for i in range(len(mismatches)):
        result = src @ matrix
        mismatches[i] = (~np.isclose(result, expected)).sum()
        if mismatches[i] != 0:
            print(f"{mismatches[i]} mismatching elements in multiplication #{i}")

Error message:

This bug is absolutely wild by the way:

With numpy 2.0.1:

16 mismatching elements in multiplication #22
316 mismatching elements in multiplication #33
28 mismatching elements in multiplication #53
307 mismatching elements in multiplication #100
32 mismatching elements in multiplication #201
32 mismatching elements in multiplication #267
288 mismatching elements in multiplication #272
1177 mismatching elements in multiplication #276
1596 mismatching elements in multiplication #289
32 mismatching elements in multiplication #298
16 mismatching elements in multiplication #314
3268 mismatching elements in multiplication #407
1403 mismatching elements in multiplication #419
64 mismatching elements in multiplication #446

with numpy 2.0.0

80 mismatching elements in multiplication #0
920 mismatching elements in multiplication #183
683 mismatching elements in multiplication #186
232 mismatching elements in multiplication #187
400 mismatching elements in multiplication #190
1217 mismatching elements in multiplication #195
1515 mismatching elements in multiplication #249
593 mismatching elements in multiplication #269
108 mismatching elements in multiplication #285
120 mismatching elements in multiplication #295
204 mismatching elements in multiplication #332
78 mismatching elements in multiplication #357
1816 mismatching elements in multiplication #380
164 mismatching elements in multiplication #396
48 mismatching elements in multiplication #492

with 1.26.4 it passes.

So it is a numpy 2 issue and not a 2.0.1 issue... it is also random:

Inserting a return and a break on the first time the matmul fails:

In [52]: test()
32 mismatching elements in multiplication #27

In [53]: test()
152 mismatching elements in multiplication #44

In [54]: test()
108 mismatching elements in multiplication #75

In [55]: test()
68 mismatching elements in multiplication #44

In [56]: test()
964 mismatching elements in multiplication #1

In [57]: test()
228 mismatching elements in multiplication #89

In [58]: test()
24 mismatching elements in multiplication #0

In a given set of multiplications the errors always have the same values.

In [92]: print([i for i in x if np.any(i != 0)])
[array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,   6948.]), array([-69480.,  
 6948.])]
In [94]: print([i for i in (res-ex) if np.any(i != 0)])
[array([-1289600.,   128960.]), array([-1289600.,   128960.]), array([-1289600.,   128960.]), array([-1289600.,   128960.]), array([-1289600.,   128960.]), array([-1289600.,   128960.]), array([-1289600.,   128960.]), array([-1289600.,   128960.])]
In [101]: print([i for i in (res-ex) if np.any(i != 0)])
[array([-208450.,   20845.]), array([-208450.,   20845.]), array([-208450.,   20845.]), array([-208450.,   20845.]), array([-208450.,   20845.]), array([-208450.,   20845.]), array([-208450.,   20845.]), array([-208450.,   20845.])]

As you can see they also have the weird symmetry where the first column is -10* the second column which I guess makes some twisted sense because thats what the calculation is.

The errors are also always in the same region i.e. with indexs > 400k or so
Figure_1

Figure_2

They also always fail multiple indexes in a row, e.g. you might have 20 or 50 or whatever failed calcualtions in a row but then it starts working again.

e.g.

[417292, 417293, 417294, 417295, 417296, 417297, 417298, 417299, 417300, 417301, 417302, 417303, 449292, 449293, 449294, 449295, 449296, 449297, 449298, 449299, 449300, 449301, 449302, 449303, 449400, 449401, 449402, 449403, 449404, 449405, 449406, 449407, 449408, 449409, 449410, 449411, 449412, 449413, 449414, 449415, 449416, 449417, 449418, 449419, 449420, 449421, 449422, 449423]

or

[487736, 487737, 487738, 487739, 487740, 487741, 487742, 487743, 487760, 487761, 487762, 487763, 487764, 487765, 487766, 487767, 490490, 490491, 490492, 490493, 490494, 490495, 490496, 490497, 496838, 496839, 496840, 496841, 496842, 496843, 496844, 496845]

You can see in one of the plots there has been the same failure for 2 separate sequences of indices.

Python and NumPy Versions:

Windows 10, cannot recreate this at all in linux.

python 3.11.8
numpy 2.0.0 or numpy 2.0.1 fail in the same manner 

This also failed on our CI on a windows-latest runner.
https://github.com/sunpy/sunpy/actions/runs/10072095800/job/27843572893#step:10:5160

Runtime Environment:

[{'numpy_version': '2.0.1',
  'python': '3.11.8 (tags/v3.11.8:db85d51, Feb  6 2024, 22:03:32) [MSC v.1937 '
            '64 bit (AMD64)]',
  'uname': uname_result(system='Windows', node='DESKTOP-0EA2O83', release='10', version='10.0.19045', machine='AMD64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': 'C:\\Users\\Alasdair\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\numpy.libs\\libscipy_openblas64_-fb1711452d4d8cee9f276fd1449ee5c7.dll',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libscipy_openblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.27'}]

Context for the issue:

Any matrix multiplication result could go wrong at any time in windows.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0