Description
Original ticket http://projects.scipy.org/numpy/ticket/2184 on 2012-07-11 by trac user khaeru, assigned to unknown.
The documentation for genfromtxt() reads:
When the variables are named (either by a flexible dtype or with names, there must not be any header in the file (else a ValueError exception is raised).
and also:
If names is True, the field names are read from the first valid line after the first skip_header lines.
The cause of this seems to be in [https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1347 numpy/lib/npyio.py at lines 1347-9]:
    if names is True:
        if comments in first_line:
            first_line = asbytes('').join(first_line.split(comments)[1:])
The last line should read first_line = first_line.split(comments)[0].
With the current code, the input line:
# Example comment line
will be transformed to:
Example comment line
resulting in columns named 'Example', 'comment' and 'line' (this is what the warning in the documentation is about).
But also the input line:
ColumnA ColumnB ColumnC # the column names precede this comment
will be transformed to:
the column names precede this comment
resulting in columns named 'the', 'column', 'names', etc. In this instance, the actual column names present in the file are inappropriately discarded.
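The two transformations above can be reproduced outside of numpy with a small sketch. It assumes plain str input rather than the bytes that npyio.py handles via asbytes(), and the helper name current_header_transform is hypothetical; only the split/join expression is taken from the quoted code.

```python
def current_header_transform(first_line, comments="#"):
    """Mimic the quoted branch: keep only the text AFTER the first comment marker.

    Hypothetical helper for illustration; uses str where numpy uses bytes.
    """
    if comments in first_line:
        first_line = "".join(first_line.split(comments)[1:])
    return first_line


# A line that is entirely a comment: its words become the column names.
commented = current_header_transform("# Example comment line")

# Real column names followed by a trailing comment: the names are thrown away.
trailing = current_header_transform(
    "ColumnA ColumnB ColumnC # the column names precede this comment"
)

print(commented.split())
print(trailing.split())
```

In the second case the text before the comment marker, which holds the real column names, never reaches the name-splitting step.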
By taking the [0] portion of the split instead of [1:]:
- Lines beginning with comments result in an empty string being passed to split_lines() on L1350, producing no usable output and causing the while not first_values loop to try the next line.
- Partial-line comments following actual heading names are discarded, instead of the names themselves.
- As a result, files can have commented headers of any length and column names, simultaneously.
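The behaviour described in the list above can be sketched as follows. This is only an approximation of the proposed change, not the numpy implementation: it works on str rather than bytes, replaces the while not first_values loop with a simple for loop, and the helper name proposed_header_names is hypothetical.

```python
def proposed_header_names(lines, comments="#"):
    """Return names from the first line with content before any comment marker.

    Hypothetical helper: takes the [0] portion of the split, so fully
    commented lines yield nothing and the next line is tried instead.
    """
    for line in lines:
        candidate = line.split(comments)[0]  # text before the comment marker
        names = candidate.split()
        if names:  # skip lines that were empty or comment-only
            return names
    return []


sample = [
    "# Example comment line",
    "# a second header comment",
    "ColumnA ColumnB ColumnC # the column names precede this comment",
    "1 2 3",
]
names = proposed_header_names(sample)
print(names)
```

With this approach the commented header can span any number of lines, and a trailing comment on the names line is dropped without affecting the names.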