-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: core: make unpickling with encoding='bytes' work #4888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Interesting test failure pattern, 3.2 and 3.3 fail, 3.4 works. |
The |
Want to try Julian's rebase trick? Probably need |
I think we need some docs somewhere explaining how to load py2 pickles into py3? |
hm wil
8000
l this fix gh-4798? |
@charris: I think this is already based on the merge base. @njsmith: maybe separate PR, I don't immediately see where it should be put. @juliantaylor: yes, if you also add The right thing to do with |
hm I guess my commit proposal is flawed, merging this onto maintenance conflicts for no good reason :/ |
oh it does work but we need the recursive merge strategy
|
@juliantaylor: the parent commit is 88cf0e4 which AFAIK is the current merge base, so I don't see what to fix. (Note also that your merge-base command assumes the local clone also has those branches, and that they are up-to-date, which is not usually the case. Moreover, some people might use |
oh yes right the branch is fine, I screwed up a command used a merge into my master instead of the original branch to test |
Added the dtype field name conversions. |
#undef _ARGSTR_ | ||
PyErr_Clear(); | ||
#endif | ||
if (!PyArg_ParseTuple(args, "(icOOOiii)", &version, &endian_char, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be simpler to check the type of args[2] and change the format string accordingly instead of parsing twice?
PyArg_ParseTuple is a very slow function it could maybe slow down loading large pickles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about something like this: juliantaylor@bfd96a9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually shouldn't we just convert the endian object to ascii? anything else is not allowed as endian character anyway
Make dtype.__setstate__ accept endian either as a byte string or unicode. Also fix a missing error return introduced in c355157, apparently mistake.
… + coerce on Py3 That 'names' is a tuple and 'fields' a dict (when present) is assumed in most of the code base, so check them during unpickling. Also add coercion from bytes using ASCII codec on Python 3. This is never triggered in the "usual" case of loading Py3-generated pickles on Py3, but can occur if loading Py2 generated pickles with pickle.load(f, encoding='bytes'), which sometimes is the only working option. The ASCII codec is probably the safest choice and likely covers most use cases.
Rewrote it in a saner way. |
looks good, thanks |
ENH: core: make unpickling with encoding='bytes' work
ENH: core: make unpickling with encoding='bytes' work
When loading Py2-generated pickles on Py3, it may be sometimes useful to use
pickle.load(..., encoding='bytes')
instead ofencoding='latin1'
.This is currently blocked by
dtype.__setstate__
not accepting the endian as a byte string.There's not really a good reason to not support this, so we could change it, as below.
The routine also seemed to have some old bug with missing error return, so fix that too.