npyio.py: genfromtxt() handles comments incorrectly with names=True (Trac #2184) · Issue #637 · numpy/numpy · GitHub




Closed
numpy-gitbot opened this issue Oct 19, 2012 · 7 comments · Fixed by #25813

Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/2184 on 2012-07-11 by trac user khaeru, assigned to unknown.

The documentation for genfromtxt() reads:

When the variables are named (either by a flexible dtype or with names, there must not be any header in the file (else a ValueError exception is raised).

and also:

If names is True, the field names are read from the first valid line after the first skip_header lines.

The cause of this seems to be in numpy/lib/npyio.py at lines 1347-1349 (https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1347):

    if names is True:
        if comments in first_line:
            first_line = asbytes('').join(first_line.split(comments)[1:])

The last line should read first_line = first_line.split(comments)[0].
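A minimal sketch of the two expressions side by side. It works on plain Python `str` rather than the bytes handled by `asbytes` in the quoted code, and the function names `strip_current` and `strip_proposed` are hypothetical, not part of NumPy:

```python
def strip_current(line, comments="#"):
    # Current code: keep everything AFTER the first comment marker.
    if comments in line:
        line = "".join(line.split(comments)[1:])
    return line


def strip_proposed(line, comments="#"):
    # Proposed fix: keep everything BEFORE the first comment marker.
    if comments in line:
        line = line.split(comments)[0]
    return line


for text in ["# Example comment line",
             "ColumnA ColumnB ColumnC # the column names precede this comment"]:
    # Whitespace splitting stands in for how the line becomes field names.
    print(strip_current(text).split(), strip_proposed(text).split())
```

The first expression turns the comment text into names; the second yields no names for a full-line comment and keeps the real names when the comment trails them.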

With the current code, the input line:

# Example comment line

will be transformed to:

Example comment line

resulting in columns named 'Example', 'comment' and 'line' (this is what the warning in the documentation is about).

But also the input line:

ColumnA ColumnB ColumnC # the column names precede this comment

will be transformed to:

the column names precede this comment

resulting in columns named 'the', 'column', 'names', and so on. In this case the actual column names present in the file are wrongly discarded.

By taking the [0] portion of the split instead of [1:]:

  • Lines beginning with comments result in an empty string being passed to split_lines() on L1350, producing no usable output and causing the while not first_values loop to try the next line.
  • Partial-line comments following actual heading names are discarded, instead of the names themselves.
  • As a result, a file can contain both a commented header of any length and a line of column names.
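The header search described in the list above can be modelled as a loop that moves to the next line whenever the stripped line yields no values. This is a simplified, hypothetical sketch: `find_names` stands in for the `while not first_values` loop, `str.split()` stands in for NumPy's `split_line()`, and `skip_header`, bytes decoding and name validation are left out.

```python
from typing import Callable, Iterable, List


def keep_after(line: str, comments: str) -> str:
    # Current code: discard everything before the first comment marker.
    return "".join(line.split(comments)[1:])


def keep_before(line: str, comments: str) -> str:
    # Proposed fix: discard everything from the first comment marker on.
    return line.split(comments)[0]


def find_names(lines: Iterable[str],
               strip: Callable[[str, str], str],
               comments: str = "#") -> List[str]:
    """Return the field names from the first line that yields any values."""
    for line in lines:
        if comments in line:
            line = strip(line, comments)
        values = line.split()  # stand-in for split_line()
        if values:  # mirrors `while not first_values`: empty means try the next line
            return values
    raise ValueError("no header line found")


sample = [
    "# Example comment line",
    "# A second comment line",
    "ColumnA ColumnB ColumnC # the column names precede this comment",
    "1 2 3",
]

print(find_names(sample, keep_after))   # names come from the first comment
print(find_names(sample, keep_before))  # names come from the real header line
```

With the proposed slice, both comment lines reduce to empty strings and are skipped, so the multi-line commented header and the trailing comment on the names line coexist.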
@numpy-gitbot
Author

trac user khaeru wrote on 2012-07-11

Sorry, bad title. Also, what's the difference between the Trac issues list and https://github.com/numpy/numpy/issues ?

@numpy-gitbot
Author

Title changed from "Remove" to "npyio.py: genfromtxt() handles comments incorrectly with names=True" by trac user khaeru on 2012-07-11

@numpy-gitbot
Author

@rgommers wrote on 2012-07-12

We opened GitHub issues only a few weeks ago, we're in the process of transitioning all Trac tickets to it. When that's done we'll close Trac, or make it read-only. For now you can use either one.

@numpy-gitbot
Author

@rgommers wrote on 2012-07-12

Suggested fix looks correct.

@numpy-gitbot
Author

trac user khaeru wrote on 2012-07-12

Oh, I see — well, I also posted a branch with this fix and a pull request: #351

@charris
Member
charris commented Feb 16, 2014

#351 was closed as a wrong fix -- broke user code. Don't know the status of fixing this.

@mattip
Copy link
Member
mattip commented Jan 11, 2018

It seems there are two issues here. The first is in the documentation (in the Notes section):

When the variables are named (either by a flexible dtype or with names, there must not be any header in the file

should be

When the variables are named (either by a flexible dtype or with a names sequence), there must not be any header in the file

The second is what happens when names=True and there is a comment character in the names line. Since changing this may break user code, the documentation for names should instead read as follows (the addition is the last two sentences):

If names is True, the field names are read from the first line after the first skip_header lines. This line can optionally be preceded by a comment delimiter. Any content before the comment delimiter is discarded.
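Because the behaviour is being documented rather than changed, a file whose names line carries a trailing comment still needs a workaround. One option, sketched here under the assumption that the header is the first line with any content before the comment delimiter, is to read the names yourself and pass them to genfromtxt as a sequence. The helper `header_names` is hypothetical:

```python
import io
from typing import List, TextIO, Tuple


def header_names(fh: TextIO, comments: str = "#") -> Tuple[List[str], int]:
    """Return (names, lines_consumed) for the first line with content before `comments`."""
    consumed = 0
    for line in fh:
        consumed += 1
        # Keep only the text before the comment delimiter (the [0] slice).
        names = line.split(comments)[0].split()
        if names:
            return names, consumed
    raise ValueError("no header line found")


text = "# Example comment line\nColumnA ColumnB ColumnC # trailing comment\n1 2 3\n"
names, n = header_names(io.StringIO(text))
print(names, n)
```

The result could then be passed as `np.genfromtxt(io.StringIO(text), names=names, skip_header=n)`. With an explicit names sequence the header lines have to be skipped, in line with the note that a file read with named variables must not contain a header.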

liang3zy22 added commits to liang3zy22/numpy that referenced this issue on Feb 13, Feb 14, and Feb 15, 2024, each with the message:
Fix numpygh-637 for genfromtxt documentation.

[skip cirrus] [skip azp] [skip actions]

Signed-off-by: Liang Yan <ckgppl_yan@sina.cn>