BUG: `np.loadtxt` return F_CONTIGUOUS ndarray if row size is too big · Issue #26900 · numpy/numpy · GitHub

Closed
determ1ne opened this issue Jul 10, 2024 · 0 comments · Fixed by #26901

@determ1ne
Contributor

Describe the issue:

If the row size in a text file is too large, np.loadtxt will return an np.ndarray with the F_CONTIGUOUS flag set.

Reproduce the code example:

import numpy as np

a = np.arange(25).reshape((-1, 5))
b = np.arange((1 << 14) + 2).reshape((-1, (1 << 13) + 1))

np.savetxt('a.txt', a)
np.savetxt('b.txt', b)

a = np.loadtxt('a.txt')
b = np.loadtxt('b.txt')
print(a.flags)
print(b.flags)

assert(b.tobytes('F') == b.copy().tobytes('F')) # error

Error message:

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[25], line 14
     11 print(a.flags)
     12 print(b.flags)
---> 14 assert(b.tobytes('F') == b.copy().tobytes('F'))

AssertionError:
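Until a fix lands, one possible workaround is to copy the loaded array: `.copy()` allocates a fresh buffer and recomputes the contiguity flags. A minimal sketch, using an in-memory buffer instead of files:

```python
import io
import numpy as np

# Round-trip a wide (2 x 8193) array through savetxt/loadtxt in memory.
buf = io.StringIO()
np.savetxt(buf, np.arange((1 << 14) + 2).reshape((-1, (1 << 13) + 1)))
buf.seek(0)

# .copy() allocates a new array and recomputes its flags, so the result
# is C-contiguous only, as expected for a 2-D array with multiple rows.
b = np.loadtxt(buf).copy()
assert b.flags['C_CONTIGUOUS'] and not b.flags['F_CONTIGUOUS']
```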

Python and NumPy Versions:

2.1.0.dev0+git20240708.735a477
3.12.4 (main, Jul 9 2024, 15:36:49) [GCC 11.4.0]

Runtime Environment:

[{'numpy_version': '2.1.0.dev0+git20240708.735a477',
'python': '3.12.4 (main, Jul 9 2024, 15:36:49) [GCC 11.4.0]',
'uname': uname_result(system='Linux', node='hn00', release='5.15.0-105-generic', version='#115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': [], 'found': [], 'not_found': []}},
{'filepath': '/opt/intel/oneapi/mkl/2022.1.0/lib/intel64/libmkl_rt.so.2',
'internal_api': 'mkl',
'num_threads': 48,
'prefix': 'libmkl_rt',
'threading_layer': 'intel',
'user_api': 'blas',
'version': '2022.1-Product'},
{'filepath': '/opt/intel/oneapi/compiler/2022.1.0/linux/compiler/lib/intel64_lin/libiomp5.so',
'internal_api': 'openmp',
'num_threads': 96,
'prefix': 'libiomp',
'user_api': 'openmp',
'version': None}]

Context for the issue:

np.loadtxt creates the PyArrayObject in read_rows, which reads a specific number of rows of the text file based on the allocated buffer size. If the row size is too large, read_rows may decide to allocate space for only one row (var min_rows below), in which case PyArray_SimpleNewFromDescr sets both F_CONTIGUOUS and C_CONTIGUOUS. This is inconsistent with the flags loadtxt produces for small files.
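The flag discrepancy can be seen directly from Python: a single-row array is trivially contiguous in both orders, while a C-order allocation with two or more rows is C-contiguous only:

```python
import numpy as np

# A (1, N) array is contiguous in both C and Fortran order, so a fresh
# allocation of that shape gets both flags set.
one_row = np.empty((1, 8193))
assert one_row.flags['C_CONTIGUOUS'] and one_row.flags['F_CONTIGUOUS']

# With two or more rows, a C-order allocation is no longer F-contiguous.
two_rows = np.empty((2, 8193))
assert two_rows.flags['C_CONTIGUOUS'] and not two_rows.flags['F_CONTIGUOUS']
```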

code for context:

    if (data_array == NULL) {
        if (max_rows < 0) {
            /*
             * Negative max_rows denotes to read the whole file, we
             * approach this by allocating ever larger blocks.
             * Adds a number of rows based on `MIN_BLOCK_SIZE`.
             * Note: later code grows assuming this is a power of two.
             */
            if (row_size == 0) {
                /* actual rows_per_block should not matter here */
                rows_per_block = 512;
            }
            else {
                /* safe on overflow since min_rows will be 0 or 1 */
                size_t min_rows = (
                        (MIN_BLOCK_SIZE + row_size - 1) / row_size);
                while (rows_per_block < min_rows) {
                    rows_per_block *= 2;
                }
            }
            data_allocated_rows = rows_per_block;
        }
        else {
            data_allocated_rows = max_rows;
        }
        result_shape[0] = data_allocated_rows;
        Py_INCREF(out_descr);
        /*
         * We do not use Empty, as it would fill with None
         * and requiring decref'ing if we shrink again.
         */
        data_array = (PyArrayObject *)PyArray_SimpleNewFromDescr(
                ndim, result_shape, out_descr);
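The block-size arithmetic in the branch above can be sketched in Python. Note that MIN_BLOCK_SIZE and the initial rows_per_block here are assumed placeholder values, not the actual constants in the NumPy source:

```python
# Placeholder constant for illustration only.
MIN_BLOCK_SIZE = 1 << 13

def first_block_rows(row_size, rows_per_block=1):
    """Rows allocated for the first block, mirroring the C snippet."""
    if row_size == 0:
        return 512  # actual rows_per_block should not matter here
    # Ceiling division, as in the C code.
    min_rows = (MIN_BLOCK_SIZE + row_size - 1) // row_size
    # Grow by powers of two until at least min_rows fit.
    while rows_per_block < min_rows:
        rows_per_block *= 2
    return rows_per_block
```

Once row_size reaches MIN_BLOCK_SIZE, min_rows is 1 and the very first allocation has shape (1, N), which is where the spurious F_CONTIGUOUS flag comes from.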
