8000 BUG: fix np.save tokenizer for python 2.7.5 by pnbat · Pull Request #10523 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: fix np.save tokenizer for python 2.7.5 #10523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

Conversation

pnbat
Copy link
Contributor
@pnbat pnbat commented Feb 5, 2018

The changes include a bugfix for this issue here (tested it with python 2.7.5 and 2.7.14, but unfortunately I cannot test it with python 3). This is basically the last commit. I went for the easy solution, adding a newline right before the filtering and removing it after the filtering. To make this work I also had to change from StringIO.read to StringIO.readline (which seems more appropriate in any case).

Apart from that I started with some refactoring to make the code more readable and more testable. It should also be quite easy to make the alignment configurable now. I could continue a bit more into that direction, but I would be happy about a bit of feedback at this point.

It is also not really clear to me, what is part of the public API of the format module. To me the methods of interest are write_array, read_array and open_mmap. The methods magic, read_magic, dtype_to_descr, header_data_from_array_1_0, write_array_header_1_0, write_array_header_2_0, read_array_header_1_0, read_array_header_2_0 do not look like they should be exposed. What is the reason behind that?

@charris charris added this to the 1.14.1 release milestone Feb 5, 2018
@pnbat
Copy link
Contributor Author
pnbat commented Feb 5, 2018

What is the issue with circleci? I suspect it is not related to my changes...

@eric-wieser
Copy link
Member

Don't worry about circle CI yet - it was added recently and seems to have teething trouble.

This seems like too large a refactor to slip into a bugfic release, especially since the original problem was not a bug in our code. I'd be more comfortable with a tiny change for the workaround, and then queue uo this refactor for 1.15

@pnbat
Copy link
Contributor Author
pnbat commented Feb 5, 2018

Well, the last two commits should fix the original issue.
Regarding the refactoring. I did not touch any tests and kept the interface of the format module as it was before. So it boils down to how much you trust the tests :) I can also prepare two pull requests, one for the bugfix and the other for the refactoring. In fact, I would also like to continue a bit more with the refactoring. But for that I would be happy about some feedback and I also need to know which methods in the format module need to be exposed (public) and which are exposed for other reasons (e.g. for tests without calling private methods).

@eric-wieser
Copy link
Member

In practice, I think we consider np.lib.* to be private, and only the functions exposed through np.* to be public.

# Totally non-contiguous data. We will have to make it C-contiguous
# before writing. Note that we need to test for C_CONTIGUOUS first
# because a 1-D array is both C_CONTIGUOUS and F_CONTIGUOUS.
d['fortran_order'] = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be preserved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling that the part about "[w]e will have to make it C-contiguous before writing" belongs somewhere else. At least I do not see the connection between setting d['fortran_order'] = False and making it C-contiguous before writing. The part that c_contiguous implies not f_contiguous could be explained above the refactored code...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to fill you in: The c_contiguous and f_contiguous flags have somewhat confusing meanings which it is too late for us to change for back-compat reasons. It is possible for arrays to be both c-contiguous and f-contiguous (eg, all non-unusually-strided 1d arrays are like this). It is also possible to be neither (eg, for weird strides). I think the logic here is that if it is neither, we will later copy it into a C-contiguous buffer, so fortran_ordered should be False.

I agree something about this code seems a bit messy, since we repeat the check for f-order in two places: Here, and in write_array (which is where we copy to a C-contiguous buffer if needed). But maybe it's OK...

result_list = ["{"]
for key, value in sorted(dictionary.items()):
# Need to use repr here, since we eval these when reading
result_list.append("'%s': %s, " % (key, repr(value)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use repr for both, via %r

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was like that before, I just moved this piece of code around a bit...

@charris
Copy link
Member
charris commented Feb 5, 2018

I agree with Eric here that a very simple fix for the 1.14 problems is desirable, refactoring is best left to 1.15 where there is more time

@charris
Copy link
Member
charris commented Feb 5, 2018

The circleci problem probably indicates that you didn't branch off of current master. I don't know if a rebase will help as circleci seems to not update once it has run. Might be worth a try, though, just so we can learn if that works.

@pnbat
Copy link
Contributor Author
pnbat commented Feb 5, 2018

@charris If you want I can prepare two new pull requests. One with the bugfix and then a second one with the refactorings so far. I just have to do it as soon as possible, since I am currently between two jobs and from tomorrow on I need to get a permission from my employer. This should not be a problem, but might take a few days.

@pnbat
Copy link
Contributor Author
pnbat commented Feb 5, 2018

@charris The new pull request for the bugfix is #10524 (circlet ci is working fine now) and the new pull request for the refactoring is #10525.

@pnbat pnbat closed this Feb 5, 2018
@charris
Copy link
Member
charris commented Feb 5, 2018

@pnbat Great, thanks.

@charris charris removed this from the 1.14.1 release milestone Feb 5, 2018
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Feb 5, 2018
@rgommers
Copy link
Member
rgommers commented Feb 5, 2018

In practice, I think we consider np.lib.* to be private, and only the functions exposed through np.* to be public.

Not really. Everything that's also in np.* should be imported from there, however there are certainly functions and classes that are only exposed in np.lib (e.g. NumpyVersion, stride_tricks, Arrayterator).

@eric-wieser
Copy link
Member

Looks like I made that up - thanks for calling me out on it - looking again, we've done a pretty good job in lib of naming private helper functions appropriately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0