Description
Describe the issue:
When an in-place ndarray operation is performed with a higher-precision NumPy scalar (here, a float32 array and an np.float64 scalar), it takes roughly 3x longer on NumPy 2.0rc1 than on 1.26.4.
I have read:
https://numpy.org/devdocs/numpy_2_0_migration_guide.html#changes-to-numpy-data-type-promotion
Setting np._set_promotion_state("weak_and_warn") does emit the warning:
UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from float32 to float64.
My question is whether this is considered counter-intuitive behavior for an in-place operation. The slower runtime would seem to imply that a temporary ndarray is created, which defeats the purpose of an in-place operation (i.e., avoiding memory allocation for temporaries).
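A minimal sketch for checking which computation dtype each NumPy version selects, without timing anything. It assumes only that NumPy is installed. The small array shape and the variable names are illustrative and not part of the original reproducer. np.result_type reports the dtype of the out-of-place expression data - scalar, which is consistent with the "Changed from float32 to float64" UserWarning quoted in this report.

```python
import numpy as np

# Illustrative small stand-in for the 4000x6000 float32 array in the report.
data = np.full((4, 6), 42, dtype=np.float32)
scalar = np.float64(0)

# Dtype NumPy picks for the out-of-place expression `data - scalar`.
# NumPy 1.x (legacy value-based promotion) looks at the scalar's value and
# keeps float32; NumPy 2.x (NEP 50) treats np.float64 as a "strong" type,
# so the result dtype becomes float64.
loop_dtype = np.result_type(data, scalar)
print("computation dtype:", loop_dtype)

# The in-place form never changes the array's dtype: a float64 result is
# cast back into the float32 array (allowed by the default casting="same_kind").
data -= scalar
print("array dtype after -=:", data.dtype)

# Converting the scalar to the array's own dtype keeps the computation in
# float32 under both promotion schemes.
matching_dtype = np.result_type(data, data.dtype.type(scalar))
print("with matching scalar:", matching_dtype)
```

On NumPy 2.x the first line should report float64 and on 1.26.4 float32; the array itself stays float32 in both cases.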
Reproduce the code example:
import numpy as np
data = np.full((4000, 6000), 42, dtype=np.float32)
%timeit global data; data -= np.float64(0)
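A plain-Python equivalent of the %timeit line above, using the standard timeit module so it also runs outside IPython. It adds a second variant that converts the scalar to the array's dtype before subtracting; this is one possible user-side workaround, not an official NumPy fix, and both function names are hypothetical.

```python
import timeit

import numpy as np

# Same shape and dtype as the reproducer (about 96 MB of float32).
data = np.full((4000, 6000), 42, dtype=np.float32)


def sub_float64_scalar(arr):
    # Same operation as the %timeit line: float32 array minus a float64 scalar.
    arr -= np.float64(0)


def sub_matching_scalar(arr):
    # Hypothetical workaround: convert the scalar to the array's dtype first,
    # so the subtraction runs as a float32 loop with no float32<->float64 casts.
    arr -= arr.dtype.type(np.float64(0))


results = {}
for fn in (sub_float64_scalar, sub_matching_scalar):
    # Best of 5 repeats of 10 calls each, reported per call.
    per_call = min(timeit.repeat(lambda: fn(data), number=10, repeat=5)) / 10
    results[fn.__name__] = per_call
    print(f"{fn.__name__}: {per_call * 1e3:.1f} ms per loop")
```

Absolute timings depend on the machine, so the useful comparison is the ratio between the two variants on the same NumPy version. For non-zero values the two variants can also round differently: one subtracts in float64 and rounds the result once, the other rounds the scalar to float32 before subtracting.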
Error message:
(with NumPy 2.0rc1)
27.4 ms ± 207 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
(with NumPy 1.26.4)
9.5 ms ± 299 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Python and NumPy Versions:
2.0.0rc1
3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
1.26.4
3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Runtime Environment:
[{'numpy_version': '2.0.0rc1',
  'python': '3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC '
            'v.1929 64 bit (AMD64)]',
  'uname': uname_result(system='Windows', node='LAPTOP-GP728CM2', release='10', version='10.0.22621', machine='AMD64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}}]
Context for the issue:
This performance regression was found in pyqtgraph (pyqtgraph/pyqtgraph#2974).