The assertion in the repro below fails when using numpy >= 2.1.0 and succeeds with numpy <= 2.0.2, with pandas 2.2.3 (latest; I didn't try an earlier pandas).
Reproduce the code example:
import pandas as pd
import numpy as np

print(f'{pd.__version__=}')
print(f'{np.__version__=}')
s = pd.Series([1.0, 2.0, 3.0])
s_r = np.rint(s).astype(np.int32)
# This next line makes a copy in numpy 2.0.2
# and incorrectly takes a reference in numpy 2.1.0.
a = np.array(s_r, dtype=np.int32)
print('before:')
print(f's_r:\n{s_r}')
print(f'a:\n{a}')
a += 1
print('after:')
np.testing.assert_array_equal(s_r, [1.0, 2.0, 3.0])
print(f's_r:\n{s_r}')
print(f'a:\n{a}')
Error message:
Traceback (most recent call last):
  File "/usr/local/google/home/daiweili/src/nptest/test.py", line 20, in <module>
    np.testing.assert_array_equal(s_r, [1.0,2.0,3.0])
  File "/usr/local/google/home/daiweili/src/nptest/venv/lib/python3.11/site-packages/numpy/_utils/__init__.py", line 85, in wrapper
    return fun(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/daiweili/src/nptest/venv/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1025, in assert_array_equal
    assert_array_compare(operator.__eq__, actual, desired, err_msg=err_msg,
  File "/usr/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/daiweili/src/nptest/venv/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 889, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal
Mismatched elements: 3 / 3 (100%)
Max absolute difference among violations: 1.
Max relative difference among violations: 1.
ACTUAL: array([2, 3, 4], dtype=int32)
DESIRED: array([1., 2., 3.])
Python and NumPy Versions:
3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0]
2.1.0
This was causing numerical bugs in our codebase. We managed to work around it by explicitly doing a .copy(), but according to the documentation in https://numpy.org/doc/2.1/reference/generated/numpy.array.html, the copy should be happening in the array init.
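For anyone who needs a stopgap until the libraries agree on this, here is a minimal sketch of a defensive helper that copies only when the result still aliases the source. `independent_array` and `SharedWrapper` are hypothetical names made up for illustration, not part of NumPy or pandas; `SharedWrapper` is just a stand-in for an object whose `__array__` hands back its own buffer, so the sketch runs without pandas.

```python
import numpy as np


class SharedWrapper:
    """Hypothetical stand-in for an object whose __array__ returns its own buffer."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __array__(self, dtype=None, copy=None):
        # Deliberately ignores copy=True, to model the behavior reported above.
        if dtype is not None and np.dtype(dtype) != self._values.dtype:
            return self._values.astype(dtype)
        return self._values


def independent_array(obj, dtype=None):
    """Return an ndarray that does not share memory with obj's data."""
    arr = np.array(obj, dtype=dtype)
    # np.asarray gives a (possibly shared) view of the source to compare against.
    source = np.asarray(obj)
    if np.shares_memory(arr, source):
        # The conversion aliased the source buffer, so force a real copy.
        arr = arr.copy()
    return arr


wrapped = SharedWrapper(np.array([1, 2, 3], dtype=np.int32))
a = independent_array(wrapped, dtype=np.int32)
a += 1  # must not modify the wrapped data
```

In the repro above, the equivalent one-line workaround is simply calling `.copy()` on the result of `np.array(...)`; the helper only avoids a second copy when NumPy has already made one.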
daiweili changed the title from "BUG: np.array in 2.1.0 on pandas.Series takes a reference instead of copy" to "BUG: np.array in 2.1.0 or newer on pandas.Series takes a reference instead of copy" on Dec 8, 2024.
I am facing the same issue. My code used to work with these older versions:
python 3.9.9
numpy 1.26.4
pandas 1.3.4
I switched to:
python 3.11.11
numpy 2.1.3
pandas 2.2.3
Here is a piece of code showing the reference issue:
import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [True, False]})
m = np.array(df["a"])  # <== here is the problem: m aliases the column
m &= np.array([False, False])
print(df)
There is nothing NumPy can reasonably do. We could make the copy for pandas, but all that is needed to fix this is the next pandas 2.3.4 release and NumPy would also have to make a release just for this purpose...
@jorisvandenbossche can we expect the pandas 2.3.4 release soon? I somewhat thought it would exist by now. FWIW, I dunno how big of a problem this one is, but since it can silently do wrong things, I think it may be a pretty big one.
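To illustrate what "making the copy" on the producer side could look like, here is a rough sketch of an array-like that honors the `copy` keyword NumPy 2 passes to `__array__`. `CopyAwareWrapper` is a hypothetical class written for this comment and is not pandas' actual implementation; it only assumes the documented `__array__(self, dtype=None, copy=None)` signature.

```python
import numpy as np


class CopyAwareWrapper:
    """Hypothetical array-like that respects the copy request in __array__."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __array__(self, dtype=None, copy=None):
        if dtype is not None and np.dtype(dtype) != self._values.dtype:
            if copy is False:
                # A dtype conversion cannot be done without copying.
                raise ValueError("a copy is required to convert the dtype")
            return self._values.astype(dtype)  # astype returns a new array
        if copy:
            # copy=True: the caller must get an independent buffer.
            return self._values.copy()
        # copy=None or copy=False: sharing the buffer is allowed.
        return self._values


w = CopyAwareWrapper(np.array([True, False]))
m = np.array(w)
m &= np.array([False, False])  # only m changes; w keeps its data
```

With a producer like this, `np.array(w)` yields an independent array whether NumPy makes the copy itself or delegates it to `__array__`, while `np.asarray(w)` can still avoid a copy.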
Runtime Environment:
[{'numpy_version': '2.0.2',
'python': '3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0]',
'uname': uname_result(system='Linux', node='', release='6.10.11-1rodete2-amd64', version='#1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1rodete2 (2024-10-16)', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX'],
'not_found': ['AVX512_KNL',
'AVX512_KNM',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'SkylakeX',
'filepath': '/home/daiweili/src/nptest/venv/lib/python3.11/site-packages/numpy.libs/libscipy_openblas64_-99b71e71.so',
'internal_api': 'openblas',
'num_threads': 64,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]