BUG: np.array in 2.1.0 or newer on pandas.Series takes a reference instead of copy · Issue #27938 · numpy/numpy · GitHub

BUG: np.array in 2.1.0 or newer on pandas.Series takes a reference instead of copy #27938


Open
daiweili opened this issue Dec 8, 2024 · 3 comments
Labels
00 - Bug 57 - Close? Issues which may be closable unless discussion continued

Comments

@daiweili
daiweili commented Dec 8, 2024

Describe the issue:

The assertion in the repro below fails when using numpy >= 2.1.0 and succeeds with numpy <= 2.0.2, with pandas 2.2.3 (latest; I didn't try an earlier pandas).

Reproduce the code example:

import pandas as pd
import numpy as np

print(f'{pd.__version__=}')
print(f'{np.__version__=}')
s = pd.Series([1.0,2.0,3.0])
s_r = np.rint(s).astype(np.int32)
# This next line makes a copy in numpy 2.0.2
# and incorrectly takes a reference in numpy 2.1.0.
a = np.array(s_r, dtype=np.int32)
print('before:')
print(f's_r:\n{s_r}')
print(f'a:\n{a}')
a += 1
print('after:')
np.testing.assert_array_equal(s_r, [1.0,2.0,3.0])
print(f's_r:\n{s_r}')
print(f'a:\n{a}')

Error message:

Traceback (most recent call last):
  File "/usr/local/google/home/daiweili/src/nptest/test.py", line 20, in <module>
    np.testing.assert_array_equal(s_r, [1.0,2.0,3.0])
  File "/usr/local/google/home/daiweili/src/nptest/venv/lib/python3.11/site-packages/numpy/_utils/__init__.py", line 85, in wrapper
    return fun(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/daiweili/src/nptest/venv/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1025, in assert_array_equal
    assert_array_compare(operator.__eq__, actual, desired, err_msg=err_msg,
  File "/usr/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/daiweili/src/nptest/venv/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 889, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

Mismatched elements: 3 / 3 (100%)
Max absolute difference among violations: 1.
Max relative difference among violations: 1.
 ACTUAL: array([2, 3, 4], dtype=int32)
 DESIRED: array([1., 2., 3.])

Python and NumPy Versions:

3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0]
2.1.0

Runtime Environment:

[{'numpy_version': '2.0.2',
'python': '3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0]',
'uname': uname_result(system='Linux', node='', release='6.10.11-1rodete2-amd64', version='#1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1rodete2 (2024-10-16)', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX'],
'not_found': ['AVX512_KNL',
'AVX512_KNM',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'SkylakeX',
'filepath': '/home/daiweili/src/nptest/venv/lib/python3.11/site-packages/numpy.libs/libscipy_openblas64_-99b71e71.so',
'internal_api': 'openblas',
'num_threads': 64,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]

Context for the issue:

This was causing numerical bugs in our codebase. We managed to work around it by explicitly doing a .copy(), but according to the documentation in https://numpy.org/doc/2.1/reference/generated/numpy.array.html, the copy should be happening in the array init.
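A minimal sketch of a more defensive version of that workaround, which copies only when the result actually aliases the source. The `BufferOwner` class and the `independent_array` helper are hypothetical names introduced for illustration; `BufferOwner` is a stand-in that hands out its own buffer from `__array__`, used here only to keep the example free of pandas, and is not a description of pandas internals.

```python
import numpy as np


class BufferOwner:
    """Hypothetical stand-in for an array-like that hands out its own buffer."""

    def __init__(self, values):
        self._data = np.asarray(values)

    def __array__(self, dtype=None, copy=None):
        # Assumption for illustration: `copy` is ignored and the internal
        # buffer is returned whenever no dtype conversion is needed.
        if dtype is None or np.dtype(dtype) == self._data.dtype:
            return self._data
        return self._data.astype(dtype)


def independent_array(obj, dtype=None):
    """Return an ndarray that does not alias the memory exposed by `obj`."""
    arr = np.array(obj, dtype=dtype)
    source = np.asarray(obj)  # whatever buffer the object exposes
    if np.shares_memory(arr, source):
        # np.array handed back a reference, so copy explicitly.
        arr = arr.copy()
    return arr


owner = BufferOwner(np.array([1, 2, 3], dtype=np.int32))
a = independent_array(owner, dtype=np.int32)
a += 1  # modifies only `a`, never the owner's buffer
```

The check costs one extra `np.asarray` call, and the copy is made only when `np.shares_memory` reports aliasing, so the helper should behave the same whether or not `np.array` already copied.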

@daiweili daiweili changed the title BUG: np.array in 2.1.0 on pandas.Series takes a reference instead of copy BUG: np.array in 2.1.0 or newer on pandas.Series takes a reference instead of copy Dec 8, 2024
@seberg
Member
seberg commented Dec 9, 2024

This was a pandas bug; it is fixed in 2.3 and 3.0. Unfortunately, I suppose the 2.3 release is delayed a bit.
See also gh-57739

@seberg seberg added the 57 - Close? Issues which may be closable unless discussion continued label Dec 9, 2024
@maximeLeurent
maximeLeurent commented Dec 11, 2024

Hi,

I am facing the same issue. This used to work with older versions:
python 3.9.9
numpy = 1.26.4
pandas = 1.3.4

I switched to:
python 3.11.11
numpy = 2.1.3
pandas = 2.2.3

Here is a piece of code showing the reference issue:

import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [True, False]})
m = np.array(df["a"])  # <== the problem is here
m &= np.array([False, False])
print(df)

With the old versions I got:

       a
0   True
1  False

With the new versions I get:

       a
0  False
1  False

The result is the same when using:

m = df["a"].to_numpy()
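For comparison, a short sketch of two ways to request an independent array explicitly, assuming pandas and NumPy are installed. `Series.to_numpy` accepts a `copy` parameter that defaults to `False`, so a plain `to_numpy()` call may return a view; the variable names below are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [True, False]})

# Option 1: ask pandas for an array that is not a view on the column.
m = df["a"].to_numpy(copy=True)
m &= np.array([False, False])

# Option 2: copy the result of np.array explicitly.
m2 = np.array(df["a"]).copy()
m2 &= np.array([False, False])

print(df)  # the column still holds its original values
```

Either form avoids depending on whether `np.array` copies for a given combination of NumPy and pandas versions.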

@seberg
Member
seberg commented Dec 11, 2024

There is nothing NumPy can reasonably do. We could make the copy for pandas, but all that is needed to fix this is the pandas 2.3 release, and NumPy would also have to make a release just for this purpose...

@jorisvandenbossche can we expect the pandas 2.3 release soon? I somewhat thought it would exist by now. FWIW, I don't know how big of a problem this one is, but since it can silently do wrong things, I think it may be a pretty big one.
