Strange problem when creating a pandas.Series from void ndarray since 1.15.0 #11668
I ran this with Python 3.4.8 and Pandas 0.22.0:
import numpy
import pandas
a = numpy.array([b'abcd'], "V4")
#print(a)
s = pandas.Series(a).apply(lambda x: str(x))
print(s)
With Numpy 1.14.5:
With Numpy 1.15.0:
With Numpy 1.15.0 and uncommented "print(a)":
Still very different behaviours, |
I think that there is something uninitialized in the printing function for that dtype. EDIT: Or maybe not. Python 3.3.6, Pandas 0.22.0
@ahaldane Can you think of anything? |
So to be clear, the act of printing the array changes the behavior of the following call to pandas? |
Yes.
I don't know how the printing function is involved in the construction of the pandas.Series, but the object inside the Series is different depending on whether you've printed the ndarray first, e.g. (with Python 3.4.8, Pandas 0.22.0, Numpy 1.15.0):
a = numpy.array([b'abcd'], "V4")
#print(a)
s = pandas.Series(a).apply(lambda x: x.decode('latin-1'))
print(s[0] == 'abcd')
Prints |
Is accessing the element sufficient? So replacing |
No. The pandas.Series still contains garbage if I do that. |
How about |
Yes. That fixes the problem. |
Something very strange is going on. After some poking around, I got this behavior:
Seems we're using uninitialized memory from somewhere |
Printing |
Well, I've found one bug
should be
For length 1 arrays though like the one in your example, I can't see why that would matter. This causes:
I think this slipped through the cracks since I removed almost all the occurrences of |
The bad commit is a83af93 in #8157. Which seems to be the same as @eric-wieser has above. |
Fixing the bug found by @eric-wieser fixes the immediate problem. The |
@eric-wieser Any idea how to test the fix without using pandas? I'm coming up dry. |
I gave a failing test case in my comment above |
The return value for a void array was not correct. Closes numpy#11668.
I don't know whether this is a Numpy bug or a Pandas bug. Since upgrading to 1.15.0, I'm getting garbage in my pandas.Series when I construct it using an ndarray. This worked fine in Numpy 1.14.5.
Reproducing code example:
Error message:
With Numpy 1.14.5, it prints:
With Numpy 1.15.0, it prints:
However, if you uncomment the print a, then it will print:
Note that the original "V4" ndarray is produced by another library. Ideally, they should probably be producing an "a4" ndarray. But that's another matter.
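For anyone who receives a "V4" array from a library and really wants byte strings, here is a minimal sketch of reinterpreting it before handing it to anything else. It assumes the void dtype is unstructured and that "S4" (the fixed-width byte-string dtype that "a4" refers to above) is the intended type. The names as_bytes and decoded are only illustrative. This is not the fix for the bug itself, just a way to get a byte-string array.

```python
import numpy

# Same 4-byte void array as in the report above.
a = numpy.array([b'abcd'], "V4")

# Reinterpret the same buffer as fixed-width byte strings.
# view() does not copy the data; it only changes the dtype.
as_bytes = a.view("S4")

# Elements are now ordinary bytes objects and can be decoded directly.
decoded = [x.decode('latin-1') for x in as_bytes.tolist()]
print(as_bytes.dtype, decoded)
```

One caveat: the "S" dtype drops trailing null bytes when elements are read back, so this is only appropriate when the payload is text rather than arbitrary binary data.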
Numpy/Python version information:
Python 2.7.5
Numpy 1.15.0
Pandas 0.23.3