npyio.py: genfromtxt() handles comments incorrectly with names=True (Trac #2184)

_Original ticket http://projects.scipy.org/numpy/ticket/2184 on 2012-07-11 by trac user khaeru, assigned to unknown._

The documentation for `genfromtxt()` reads:

  When the variables are named (either by a flexible dtype or with _names_, there must not be any header in the file (else a ValueError exception is raised).

and also:

  If _names_ is True, the field names are read from the first valid line after the first _skip_header_ lines.

The way comments on the names line are actually handled does not match this. The cause seems to be in [numpy/lib/npyio.py at lines 1347-9](https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1347):

```
    if names is True:
        if comments in first_line:
            first_line = asbytes('').join(first_line.split(comments)[1:])
```

**The last line should read `first_line = first_line.split(comments)[0]`.**

With the current code, the input line:

```
# Example comment line
```

will be transformed to:

```
Example comment line
```

resulting in columns named 'Example', 'comment' and 'line' (this is what the warning in the documentation is about).

But also the input line:

```
ColumnA ColumnB ColumnC # the column names precede this comment
```

will be transformed to:

```
the column names precede this comment
```

resulting in columns named 'the', 'column', 'names', and so on. In this case the actual column names present in the file are discarded.

By taking the `[0]` portion of the split instead of `[1:]`:
- Lines beginning with comments result in an empty string being passed to `split_lines()` on L1350, producing no usable output and causing the `while not first_values` loop to try the next line.
- Partial-line comments following actual heading names are discarded, instead of the names themselves.
- As a result, files can have commented headers of any length _and_ column names, simultaneously.
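
The difference between the two slices can be checked outside of NumPy. The sketch below is not the `genfromtxt()` implementation: `header_current`, `header_proposed` and `first_names` are hypothetical helpers, plain `str` is used instead of bytes, and whitespace splitting stands in for the real line splitter. It only mirrors the quoted snippet and the proposed one-line change.

```python
def header_current(line, comments="#"):
    """Mirror the quoted snippet: keep the text AFTER the first comment marker."""
    if comments in line:
        line = "".join(line.split(comments)[1:])
    return line.split()  # assumption: whitespace-delimited names


def header_proposed(line, comments="#"):
    """Mirror the proposed change: keep the text BEFORE the comment marker."""
    if comments in line:
        line = line.split(comments)[0]
    return line.split()


def first_names(lines, parse):
    """Stand-in for the `while not first_values` loop: skip lines yielding nothing."""
    for line in lines:
        values = parse(line)
        if values:
            return values
    return []


full_comment = "# Example comment line"
trailing_comment = "ColumnA ColumnB ColumnC # the column names precede this comment"

print(header_current(full_comment))       # ['Example', 'comment', 'line']
print(header_proposed(full_comment))      # []
print(header_current(trailing_comment))   # ['the', 'column', 'names', 'precede', 'this', 'comment']
print(header_proposed(trailing_comment))  # ['ColumnA', 'ColumnB', 'ColumnC']

# A commented header followed by a names line with a trailing comment.
lines = ["# file description", "# more notes", trailing_comment]
print(first_names(lines, header_proposed))  # ['ColumnA', 'ColumnB', 'ColumnC']
```

With the proposed slice, full-line comments produce an empty result and are skipped, so the names are taken from the first line that has content before the comment marker.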

