Description
Describe the issue:
numpy.memmap fails when attempting to create an empty memmap that has an offset which is a multiple of mmap.ALLOCATIONGRANULARITY.
The issue occurs because, under these conditions, numpy.memmap calls mmap.mmap with length=0. NumPy assumes this means a mapping of zero length. However, as described in the Python mmap docs:
If length is 0, the maximum length of the map will be the current size of the file when mmap is called.
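The length=0 behaviour can be checked with the standard library alone, without NumPy. The following is a minimal sketch: the helper name try_map and the scratch file are illustrative assumptions, not part of NumPy or of the original report.

```python
import mmap
import os
import tempfile


def try_map(file_size, length, offset):
    """Map `length` bytes at `offset` of a scratch file of `file_size` bytes.

    Returns the length of the resulting map, or the error message if
    mmap.mmap raises ValueError. (Hypothetical helper for illustration.)
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"c" * file_size)
        try:
            with mmap.mmap(fd, length, access=mmap.ACCESS_READ, offset=offset) as m:
                return len(m)
        except ValueError as exc:
            return str(exc)
    finally:
        os.close(fd)
        os.remove(path)


gran = mmap.ALLOCATIONGRANULARITY

# length=0 does not mean "empty": the map covers the rest of the file.
print(try_map(2 * gran, 0, gran))

# When the offset equals the file size there is nothing left to map,
# so a ValueError message is returned instead of an empty map.
print(try_map(gran, 0, gran))
```

The second call corresponds to the situation in this report: the file ends exactly at the offset, and length=0 asks for "everything after the offset".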
Reproduce the code example:
import numpy as np

def empty_memmap(offset):
    fname = "test.dat"
    a = np.array([])  # empty array
    with open(fname, 'wb+') as f:
        f.write(b'c'*offset)
        mm = np.memmap(f, shape=a.shape, dtype=a.dtype, offset=offset, mode='r+')
        print(mm)

# This works
empty_memmap(4321)

# This fails
empty_memmap(4096)

# This also fails
empty_memmap(2*4096)
Error message:
Traceback (most recent call last):
  File "/home/luke/m.py", line 15, in <module>
    empty_memmap(4096)
  File "/home/luke/m.py", line 8, in empty_memmap
    mm = np.memmap(f, shape=a.shape, dtype=a.dtype, offset=offset, mode='r+')
  File "/home/luke/user310/lib/python3.10/site-packages/numpy/_core/memmap.py", line 280, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap offset is greater than file size
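The bytes and start arguments in the failing call can be traced with a small calculation. This sketch paraphrases the alignment arithmetic in numpy/_core/memmap.py rather than copying it verbatim. The function name mmap_args is hypothetical, and a granularity of 4096 is assumed to match the offsets used in the report.

```python
import mmap


def mmap_args(offset, data_nbytes, granularity=mmap.ALLOCATIONGRANULARITY):
    """Return the (start, length) pair that would be passed to mmap.mmap.

    `start` is the offset rounded down to the allocation granularity;
    `length` covers everything from `start` to the end of the array data.
    (Hypothetical helper; a paraphrase, not NumPy's actual code.)
    """
    start = offset - offset % granularity
    length = offset + data_nbytes - start
    return start, length


# An empty array has zero bytes of data.
for offset in (4321, 4096, 2 * 4096):
    start, length = mmap_args(offset, 0, granularity=4096)
    print(offset, "->", "start =", start, "length =", length)
```

With offset 4321 the length is 225, so the call succeeds. With an offset that is an exact multiple of the granularity the length becomes 0, which mmap.mmap interprets as "map the rest of the file" and rejects because the file ends at the offset.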
Python and NumPy Versions:
2.1.3
3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
Runtime Environment:
[{'numpy_version': '2.1.3',
'python': '3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]',
'uname': uname_result(system='Linux', node='titan', release='5.15.0-122-generic', version='#132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}}]
Context for the issue:
While this is an extreme edge case, it is still important to resolve. In my case, a program that had been running reliably in production for a number of years crashed due to this issue when the offset of my memmap happened to be a multiple of the allocation granularity.
The fix to this issue is fairly straightforward and I will be submitting a pull request shortly.