-
-
Notifications
You must be signed in to change notification settings - Fork 11k
BUG: fix np.save tokenizer for python 2.7.5 #10523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
What is the issue with circleci? I suspect it is not related to my changes... |
Don't worry about circle CI yet - it was added recently and seems to have teething trouble. This seems like too large a refactor to slip into a bugfic release, especially since the original problem was not a bug in our code. I'd be more comfortable with a tiny change for the workaround, and then queue uo this refactor for 1.15 |
Well, the last two commits should fix the original issue. |
In practice, I think we consider |
# Totally non-contiguous data. We will have to make it C-contiguous | ||
# before writing. Note that we need to test for C_CONTIGUOUS first | ||
# because a 1-D array is both C_CONTIGUOUS and F_CONTIGUOUS. | ||
d['fortran_order'] = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be preserved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the feeling that the part about "[w]e will have to make it C-contiguous before writing" belongs somewhere else. At least I do not see the connection between setting d['fortran_order'] = False
and making it C-contiguous before writing. The part that c_contiguous implies not f_contiguous could be explained above the refactored code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to fill you in: The c_contiguous
and f_contiguous
flags have somewhat confusing meanings which it is too late for us to change for back-compat reasons. It is possible for arrays to be both c-contiguous and f-contiguous (eg, all non-unusually-strided 1d arrays are like this). It is also possible to be neither (eg, for weird strides). I think the logic here is that if it is neither, we will later copy it into a C-contiguous buffer, so fortran_ordered
should be False.
I agree something about this code seems a bit messy, since we repeat the check for f-order in two places: Here, and in write_array
(which is where we copy to a C-contiguous buffer if needed). But maybe it's OK...
result_list = ["{"] | ||
for key, value in sorted(dictionary.items()): | ||
# Need to use repr here, since we eval these when reading | ||
result_list.append("'%s': %s, " % (key, repr(value))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use repr for both, via %r
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was like that before, I just moved this piece of code around a bit...
I agree with Eric here that a very simple fix for the 1.14 problems is desirable, refactoring is best left to 1.15 where there is more time |
The circleci problem probably indicates that you didn't branch off of current master. I don't know if a rebase will help as circleci seems to not update once it has run. Might be worth a try, though, just so we can learn if that works. |
@charris If you want I can prepare two new pull requests. One with the bugfix and then a second one with the refactorings so far. I just have to do it as soon as possible, since I am currently between two jobs and from tomorrow on I need to get a permission from my employer. This should not be a problem, but might take a few days. |
@pnbat Great, thanks. |
Not really. Everything that's also in |
Looks like I made that up - thanks for calling me out on it - looking again, we've done a pretty good job in |
The changes include a bugfix for this issue here (tested it with python 2.7.5 and 2.7.14, but unfortunately I cannot test it with python 3). This is basically the last commit. I went for the easy solution, adding a newline right before the filtering and removing it after the filtering. To make this work I also had to change from StringIO.read to StringIO.readline (which seems more appropriate in any case).
Apart from that I started with some refactoring to make the code more readable and more testable. It should also be quite easy to make the alignment configurable now. I could continue a bit more into that direction, but I would be happy about a bit of feedback at this point.
It is also not really clear to me, what is part of the public API of the format module. To me the methods of interest are
write_array
,read_array
andopen_mmap
. The methodsmagic
,read_magic
,dtype_to_descr
,header_data_from_array_1_0
,write_array_header_1_0
,write_array_header_2_0
,read_array_header_1_0
,read_array_header_2_0
do not look like they should be exposed. What is the reason behind that?