-
-
Notifications
You must be signed in to change notification settings - Fork 11k
ENH: Save to ZIP files without using temporary files. #9863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
818c4c7
to
d5ec8e9
Compare
numpy/lib/npyio.py
Outdated
try: | ||
format.write_array(fid, np.asanyarray(val), | ||
val = np.asanyarray(val) | ||
force_zip64 = val.nbytes > 1000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be clearer if a simple power of 2 is used, something like val.nbytes > 2**30
.
fd, tmpfile = tempfile.mkstemp(prefix=file_prefix, dir=file_dir, suffix='-numpy.npy') | ||
os.close(fd) | ||
try: | ||
if sys.version_info >= (3, 6): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might add a comment that since Python 3.6 it is possible to write directly to a zipfile.
This is worthy of a release note. Could you add something to |
d5ec8e9
to
169d295
Compare
Done. If you with I can add other improvements. Save small data without using temporary files even in Python <3.6. Use the context manager for avoiding file descriptor leak in case of failure. |
How would you do that? The problem is that the array data needs to be formatted and the resultant stream cannot be written directly to the zipfile. |
Upgrades are pretty easy these days, I'd be inclined to simply point to Python 3.6 for people who find the old implementation limiting. |
The size of the formatted data can be estimated by the |
Hmm, those are pretty small files. I suspect speed is not of great importance here, but size can be. However, if you wish to add that I won't complain. |
Since Python 3.6 it is possible to write directly to a ZIP file, without creating temporary files.
169d295
to
52c1ef6
Compare
If speed is not of great importance here, I think it is not worth to complicate the code. I have moved |
format.write_array(fid, np.asanyarray(val), | ||
val = np.asanyarray(val) | ||
force_zip64 = val.nbytes >= 2**30 | ||
with zipf.open(fname, 'w', force_zip64=force_zip64) as fid: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is force_zip64
an unrelated fix here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and should this be wb
to match the one below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mode parameter, if included, must be 'r' (the default) or 'w'
I suspect the zipped files are automatically binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
force_zip64
is needed when write a stream. Tests are failed without this option.
'r'
and 'w'
are the only supported options. Opened file-like objects are binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like zip64 has been available since 2.7, but the force_zip64
keyword is new. It is used for files of unknown size that may exceed 2 GiB. As we are formatting the array, I suppose it qualifies as of unknown size, but probably not by much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be using force_zip64
on all version of python 3, not just 3.6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
force_zip64
doesn't exist in earlier versions.
Thanks @serhiy-storchaka . |
Thank you for your time and review @charris. |
In Python 3.6 large data can be written to ZIP files directly, without using temporary files.