ENH: Save to ZIP files without using temporary files. by serhiy-storchaka · Pull Request #9863 · numpy/numpy · GitHub

ENH: Save to ZIP files without using temporary files. #9863

Merged
merged 1 commit into from
Oct 15, 2017
5 changes: 5 additions & 0 deletions doc/release/1.14.0-notes.rst
@@ -263,6 +263,11 @@ common cache line size. This makes ``npy`` files easier to use in
programs which open them with ``mmap``, especially on Linux where an
``mmap`` offset must be a multiple of the page size.

NPZ files now can be written without using temporary files
----------------------------------------------------------
In Python 3.6+ ``numpy.savez`` and ``numpy.savez_compressed`` now write
directly to a ZIP file, without creating intermediate temporary files.

Better support for empty structured and string types
----------------------------------------------------
Structured types can contain zero fields, and string dtypes can contain zero
55 changes: 33 additions & 22 deletions numpy/lib/npyio.py
@@ -661,8 +661,6 @@ def _savez(file, args, kwds, compress, allow_pickle=True, pickle_kwargs=None):
# Import is postponed to here since zipfile depends on gzip, an optional
# component of the so-called standard library.
import zipfile
# Import deferred for startup time improvement
import tempfile

if isinstance(file, basestring):
if not file.endswith('.npz'):
@@ -686,31 +684,44 @@ def _savez(file, args, kwds, compress, allow_pickle=True, pickle_kwargs=None):

zipf = zipfile_factory(file, mode="w", compression=compression)

# Stage arrays in a temporary file on disk, before writing to zip.

# Since target file might be big enough to exceed capacity of a global
# temporary directory, create temp file side-by-side with the target file.
file_dir, file_prefix = os.path.split(file) if _is_string_like(file) else (None, 'tmp')
fd, tmpfile = tempfile.mkstemp(prefix=file_prefix, dir=file_dir, suffix='-numpy.npy')
os.close(fd)
try:
if sys.version_info >= (3, 6):
Member

Might add a comment that since Python 3.6 it is possible to write directly to a zipfile.

# Since Python 3.6 it is possible to write directly to a ZIP file.
for key, val in namedict.items():
fname = key + '.npy'
fid = open(tmpfile, 'wb')
try:
format.write_array(fid, np.asanyarray(val),
val = np.asanyarray(val)
force_zip64 = val.nbytes >= 2**30
with zipf.open(fname, 'w', force_zip64=force_zip64) as fid:
Member

Is force_zip64 an unrelated fix here?

Member

and should this be wb to match the one below?

Member
@charris charris Oct 15, 2017

The mode parameter, if included, must be 'r' (the default) or 'w'

I suspect the zipped files are automatically binary.

Contributor Author

force_zip64 is needed when writing a stream. Tests fail without this option.

'r' and 'w' are the only supported options. Opened file-like objects are binary.

Member

Looks like zip64 has been available since 2.7, but the force_zip64 keyword is new. It is used for files of unknown size that may exceed 2 GiB. As we are formatting the array, I suppose it qualifies as of unknown size, but probably not by much.

Member

Should we be using force_zip64 on all versions of Python 3, not just 3.6?

Contributor Author

force_zip64 doesn't exist in earlier versions.
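
To make the points in this thread concrete, here is a minimal standard-library sketch (Python 3.6+) of streaming bytes straight into a ZIP member. It is not the code from this PR: `write_member` and its `big_threshold` parameter are hypothetical names chosen for illustration, and the payload is a stand-in for a serialized array. It shows that `ZipFile.open` accepts only `'r'` or `'w'`, that the returned object is binary, and where `force_zip64` is passed.

```python
import io
import zipfile


def write_member(zipf, name, payload, big_threshold=2**30):
    """Stream `payload` (bytes) into archive member `name`, no temp file.

    Hypothetical helper: `big_threshold` mirrors the 2**30 cutoff in the diff.
    """
    # Python 3.6+ only: force_zip64 asks for ZIP64 headers up front, since the
    # final member size is not known when the header is first written.
    force_zip64 = len(payload) >= big_threshold
    with zipf.open(name, 'w', force_zip64=force_zip64) as fid:
        fid.write(payload)  # the member handle is binary; it takes bytes


buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w', compression=zipfile.ZIP_DEFLATED) as zipf:
    write_member(zipf, 'a.npy', b'\x93NUMPY' + b'\x00' * 16)
    # A threshold of 0 forces ZIP64 even for a tiny member.
    write_member(zipf, 'b.npy', b'payload', big_threshold=0)
    try:
        zipf.open('c.npy', 'wb')  # 'wb' is rejected; only 'r' and 'w' exist
    except ValueError as exc:
        mode_error = str(exc)

# Read the archive back from the same in-memory buffer.
with zipfile.ZipFile(buf) as zipf:
    names = zipf.namelist()
    data = zipf.read('a.npy')
```

Forcing ZIP64 on a small member is harmless for readers that support ZIP64, which is why a size-based cutoff is only a space optimisation rather than a correctness requirement.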

format.write_array(fid, val,
allow_pickle=allow_pickle,
pickle_kwargs=pickle_kwargs)
fid.close()
fid = None
zipf.write(tmpfile, arcname=fname)
except IOError as exc:
raise IOError("Failed to write to %s: %s" % (tmpfile, exc))
finally:
if fid:
else:
# Stage arrays in a temporary file on disk, before writing to zip.

# Import deferred for startup time improvement
import tempfile
# Since target file might be big enough to exceed capacity of a global
# temporary directory, create temp file side-by-side with the target file.
file_dir, file_prefix = os.path.split(file) if _is_string_like(file) else (None, 'tmp')
fd, tmpfile = tempfile.mkstemp(prefix=file_prefix, dir=file_dir, suffix='-numpy.npy')
os.close(fd)
try:
for key, val in namedict.items():
fname = key + '.npy'
fid = open(tmpfile, 'wb')
try:
format.write_array(fid, np.asanyarray(val),
allow_pickle=allow_pickle,
pickle_kwargs=pickle_kwargs)
fid.close()
finally:
os.remove(tmpfile)
fid = None
zipf.write(tmpfile, arcname=fname)
except IOError as exc:
raise IOError("Failed to write to %s: %s" % (tmpfile, exc))
finally:
if fid:
fid.close()
finally:
os.remove(tmpfile)

zipf.close()
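
Because the diff above has lost its indentation and +/- markers, the overall shape of the change can be hard to follow. The following is a simplified sketch of the version-gated structure, using only the standard library. It is an illustration under stated assumptions, not the merged code: `save_members` is a hypothetical helper, raw bytes stand in for `format.write_array` output, the temporary file goes to the default temp directory rather than next to the target, and the archive is closed in a `finally` block.

```python
import io
import os
import sys
import tempfile
import zipfile


def save_members(target, members):
    """Write each name -> bytes pair in `members` into a ZIP at `target`.

    Hypothetical helper sketching the two code paths discussed in this PR.
    """
    zipf = zipfile.ZipFile(target, mode='w')
    try:
        if sys.version_info >= (3, 6):
            # Direct path: stream each payload into its archive member.
            for name, payload in members.items():
                force_zip64 = len(payload) >= 2**30
                with zipf.open(name, 'w', force_zip64=force_zip64) as fid:
                    fid.write(payload)
        else:
            # Fallback path: stage each payload in one reusable temp file,
            # then copy that file into the archive under the member name.
            fd, tmpfile = tempfile.mkstemp(suffix='-sketch.bin')
            os.close(fd)
            try:
                for name, payload in members.items():
                    with open(tmpfile, 'wb') as fid:
                        fid.write(payload)
                    zipf.write(tmpfile, arcname=name)
            finally:
                os.remove(tmpfile)
    finally:
        zipf.close()


out = io.BytesIO()
save_members(out, {'x.npy': b'abc', 'y.npy': b'defgh'})
with zipfile.ZipFile(out) as archive:
    restored = {n: archive.read(n) for n in archive.namelist()}
```

Both branches produce the same member contents; the direct path simply avoids the extra disk write and the risk of the staging file outgrowing the temporary directory.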
