8000 ENH: core: add hack enabling unpickling Py2 pickled scalars on Py3 under encoding='latin1' by pv · Pull Request #4883 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: core: add hack enabling unpickling Py2 pickled scalars on Py3 under encoding='latin1' #4883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 18, 2014

Conversation

pv
Copy link
Member
@pv pv commented Jul 18, 2014

There is a similar dirty hack in place for arrays, but scalar unpickling was not covered.

Should provide a workaround for gh-4879

@juliantaylor
Copy link
Contributor

-1, I don't think adding more latin1 hacks is the right way, e.g. with this patch this won't work (using #aa☎#):

numpy.core.multiarray.scalar(numpy.dtype('U5'), '#\x00\x00\x00a\x00\x00\x00a\x00\x00\x00\x0e&\x00\x00#\x00\x00\x00')

I think the better way is using the pickle protocol, variant 0 does always seem to use utf32 (or ucs2 for narrow builds) as its the python2 protocol, we can use that to decode correctly (just use PyUnicode_DATA for the wide build at least. mixing pickles from wide and narrow builds could be problematic), the newer ones work differently but I'm willing to assume they are saner but I still need to check.
I'll have a go at this this weekend.

@pv
Copy link
Member Author
pv commented Jul 18, 2014

If the user does not pass encoding='some tolerant encoding' or encoding='bytes', the execution never even reaches Numpy code.

Moreover, I don't think the protocol affects string unpickling very much, as it seems to be mainly about adding more opcodes.

@pv
Copy link
Member Author
pv commented Jul 18, 2014

As far as I was able to verify, the purpose of the function numpy.core.multiarray.scalar is to support the scalar reduce pickling method. It will in normal use never be called with unicode as the second argument. When it is called with unicode as second argument, the encoding of the original raw data is determined by the encoding= parameter given to pickle.load.

@juliantaylor
Copy link
Contributor

hm, right I was testing the function itself, so I guess this is ok. what is actually writting the strings as latin1? is it numpy or python?
I think another testcase with the non-latin1 string loaded with encoding='latin1' should be added

@juliantaylor
Copy link
Contributor

also please run

git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^

as I think this should be merged to 1.9

@pv
Copy link
Member Author
pv commented Jul 18, 2014

Yes, you're right that there should also be non-latin1 unicode test case. Adding that...

Nothing actually writes latin1 strings --- on Python2, numpy hands over the raw binary data as a Python string object to pickle. Python 3 however loads the corresponding S opcode data as unicode, using an encoding determined by the encoding= parameter to pickle.load*, given by the user with ASCII as the default value.

The cleanest case for us would be encoding="bytes", but this is not always desirable if there's other stuff in the pickle (and we have some bugs that cause it to fail in dtype.setstate).

On Python 3, Numpy hands over the raw data as Python bytes object, so the problem does not arise with Py3 generated pickles.

@pv
Copy link
Member Author
pv commented Jul 18, 2014

Better tests + rebased

@juliantaylor
Copy link
Contributor

needs a rebase due to the bugfix merge which touches the same test

8000

…der encoding='latin1'

There is a similar hack in place for arrays, but scalar unpickling was not covered.

Provides a workaround for numpygh-4879
@pv
Copy link
Member Author
pv commented Jul 18, 2014

rebased

juliantaylor added a commit that referenced this pull request Jul 18, 2014
ENH: core: add hack enabling unpickling Py2 pickled scalars on Py3 under encoding='latin1'
@juliantaylor juliantaylor merged commit 4cbced2 into numpy:master Jul 18, 2014
juliantaylor added a commit that referenced this pull request Jul 18, 2014
ENH: core: add hack enabling unpickling Py2 pickled scalars on Py3 under encoding='latin1'
@juliantaylor
Copy link
Contributor

thanks, also pushed to 1.9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0