npyio.py: genfromtxt() handles comments incorrectly with names=True (Trac #2184) #637

numpy-gitbot · 2012-10-19T15:10:40Z

Original ticket http://projects.scipy.org/numpy/ticket/2184 on 2012-07-11 by trac user khaeru, assigned to unknown.

The documentation for genfromtxt() reads:

When the variables are named (either by a flexible dtype or with names, there must not be any header in the file (else a ValueError exception is raised).

and also:

If names is True, the field names are read from the first valid line after the first skip_header lines.

The cause of this seems to be in numpy/lib/npyio.py at lines 1347-9 (https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1347):

    if names is True:
        if comments in first_line:
            first_line = asbytes('').join(first_line.split(comments)[1:])

The last line should read first_line = first_line.split(comments)[0].

With the current code, the input line:

# Example comment line

will be transformed to:

Example comment line

resulting in columns named 'Example', 'comment' and 'line' (this is what the warning in the documentation is about).

But also the input line:

ColumnA ColumnB ColumnC # the column names precede this comment

will be transformed to:

the column names precede this comment

resulting in columns named 'the', 'column', 'names' …etc. In this instance actual column names present in the file are inappropriately discarded.
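The two transformations described above can be reproduced without NumPy. The following is a minimal sketch that mimics only the quoted `names is True` branch and then splits on whitespace. The helper name `current_header_tokens` is hypothetical, and plain `bytes.split()` is an assumed stand-in for the library's own line splitter.

```python
def current_header_tokens(first_line, comments=b"#"):
    """Mimic the quoted names=True branch, then split on whitespace.

    Hypothetical helper for illustration; not part of numpy.
    """
    if comments in first_line:
        # Quoted logic: keep everything AFTER the first comment marker.
        first_line = b"".join(first_line.split(comments)[1:])
    return first_line.split()


commented = current_header_tokens(b"# Example comment line")
trailing = current_header_tokens(
    b"ColumnA ColumnB ColumnC # the column names precede this comment"
)

print(commented)  # [b'Example', b'comment', b'line']
print(trailing)   # [b'the', b'column', b'names', b'precede', b'this', b'comment']
```

In the second case the text before the marker, which holds the real column names, never reaches the splitter.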

By taking the [0] portion of the split instead of [1:]:

- Lines beginning with comments result in an empty string being passed to split_lines() on L1350, producing no usable output and causing the while not first_values loop to try the next line.
- Partial-line comments following actual heading names are discarded, instead of the names themselves.
- As a result, files can have commented headers of any length and column names, simultaneously.
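The effect of the proposed `[0]` variant can be sketched in plain Python as follows. This is an illustration under assumptions, not the library code: `proposed_header_tokens` is a hypothetical name, the `for` loop stands in for the `while not first_values` loop, and `str.split()` stands in for the line splitter.

```python
def proposed_header_tokens(lines, comments="#"):
    """Return the tokens of the first line that still has content after
    everything from the comment marker onward is dropped.

    Hypothetical helper for illustration; not part of numpy.
    """
    for line in lines:
        # Proposed logic: keep only the text BEFORE the first comment marker.
        tokens = line.split(comments)[0].split()
        if tokens:
            return tokens
        # A line that is entirely a comment yields no tokens: try the next one.
    return []


sample = [
    "# Example comment line",
    "# another header comment",
    "ColumnA ColumnB ColumnC # the column names precede this comment",
    "1 2 3",
]
names = proposed_header_tokens(sample)
print(names)  # ['ColumnA', 'ColumnB', 'ColumnC']
```

Under this variant the comment-only lines are skipped and the trailing comment on the names line is discarded.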


numpy-gitbot · 2012-10-19T15:10:40Z

trac user khaeru wrote on 2012-07-11

Sorry, bad title. Also, what's the difference between the Trac issues list and https://github.com/numpy/numpy/issues ?

numpy-gitbot · 2012-10-19T15:10:41Z

Title changed from Remove to npyio.py: genfromtxt() handles comments incorrectly with names=True by trac user khaeru on 2012-07-11

numpy-gitbot · 2012-10-19T15:10:41Z

@rgommers wrote on 2012-07-12

We opened GitHub issues only a few weeks ago, and we're in the process of transitioning all Trac tickets to it. When that's done we'll close Trac or make it read-only. For now you can use either one.

numpy-gitbot · 2012-10-19T15:10:41Z

@rgommers wrote on 2012-07-12

Suggested fix looks correct.

numpy-gitbot · 2012-10-19T15:10:41Z

trac user khaeru wrote on 2012-07-12

Oh, I see — well, I also posted a branch with this fix and a pull request: #351

charris · 2014-02-16T17:02:26Z

#351 was closed as a wrong fix -- broke user code. Don't know the status of fixing this.

mattip · 2018-01-11T06:16:37Z

It seems there are two issues here, one is documentation (in the notes):

When the variables are named (either by a flexible dtype or with names, there must not be any header in the file

should be

When the variables are named (either by a flexible dtype or with a names sequence), there must not be any header in the file

The second is what happens when names=True and there is a comment character in the names line. Since changing this may break user code, the documentation for names should read (addition in bold):

If names is True, the field names are read from the first line after the first skip_header lines. This line can optionally be preceded by a comment delimiter. Any content before the comment delimiter is discarded.
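The thread does not say which user code the earlier fix broke. One plausible case, stated here as an assumption, is a file whose names line is itself commented, which the wording above explicitly allows. The sketch below compares the two split strategies on such a line; both helper names are hypothetical and neither is a NumPy function.

```python
def names_existing(line, comments="#"):
    """Existing behavior: discard whatever precedes the comment marker."""
    if comments in line:
        line = "".join(line.split(comments)[1:])
    return line.split()


def names_with_rejected_fix(line, comments="#"):
    """Variant from the closed pull request: discard the marker and what follows."""
    return line.split(comments)[0].split()


header = "# a b c"
existing = names_existing(header)
rejected = names_with_rejected_fix(header)

print(existing)  # ['a', 'b', 'c']
print(rejected)  # []
```

With the rejected variant the commented names line yields nothing, so a reader relying on it would move on and take the next line, likely a data row, as the names.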


mattip removed the priority: normal label Oct 21, 2018

liang3zy22 added a commit to liang3zy22/numpy that referenced this issue Feb 13, 2024

DOC: Update genfromtxt documentation

3127ede

Fix numpygh-637 for genfromtxt documentation. [skip cirrus] [skip azp] [skip actions] Signed-off-by: Liang Yan <ckgppl_yan@sina.cn>

liang3zy22 mentioned this issue Feb 13, 2024

DOC: Update genfromtxt documentation #25813

Merged

liang3zy22 added a commit to liang3zy22/numpy that referenced this issue Feb 14, 2024

DOC: Update genfromtxt documentation

ad99c62

Fix numpygh-637 for genfromtxt documentation. [skip cirrus] [skip azp] [skip actions] Signed-off-by: Liang Yan <ckgppl_yan@sina.cn>

liang3zy22 added a commit to liang3zy22/numpy that referenced this issue Feb 15, 2024

DOC: Update genfromtxt documentation

8644162

Fix numpygh-637 for genfromtxt documentation. [skip cirrus] [skip azp] [skip actions] Signed-off-by: Liang Yan <ckgppl_yan@sina.cn>

charris closed this as completed in #25813 Feb 15, 2024
