npyio.py: genfromtxt() handles comments incorrectly with names=True (Trac #2184) #5974
Open
@numpy-gitbot


Original ticket http://projects.scipy.org/numpy/ticket/2184 on 2012-07-11 by trac user khaeru, assigned to unknown.

The documentation for genfromtxt() reads:

When the variables are named (either by a flexible dtype or with names, there must not be any header in the file (else a ValueError exception is raised).

and also:

If names is True, the field names are read from the first valid line after the first skip_header lines.

The cause of this seems to be in numpy/lib/npyio.py at lines 1347-9 (https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1347):

    if names is True:
        if comments in first_line:
            first_line = asbytes('').join(first_line.split(comments)[1:])

The last line should read first_line = first_line.split(comments)[0].

With the current code, the input line:

# Example comment line

will be transformed to:

Example comment line

resulting in columns named 'Example', 'comment' and 'line' (this is what the warning in the documentation is about).

But also the input line:

ColumnA ColumnB ColumnC # the column names precede this comment

will be transformed to:

the column names precede this comment

resulting in columns named 'the', 'column', 'names' …etc. In this instance actual column names present in the file are inappropriately discarded.
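
Both transformations can be reproduced outside NumPy. The following is only a sketch: current_header is a hypothetical helper, it uses str rather than the bytes produced by asbytes(), and it assumes whitespace-delimited names instead of calling genfromtxt()'s own line splitter.

```python
COMMENTS = "#"  # assumed comment marker, matching the examples above


def current_header(first_line, comments=COMMENTS):
    """Apply the expression quoted from npyio.py to a single header line.

    Hypothetical helper for illustration; not part of NumPy.
    """
    if comments in first_line:
        # Keeps everything AFTER the first comment marker.
        first_line = "".join(first_line.split(comments)[1:])
    # Assumption: names are separated by whitespace.
    return first_line.split()


print(current_header("# Example comment line"))
# ['Example', 'comment', 'line']
print(current_header(
    "ColumnA ColumnB ColumnC # the column names precede this comment"))
# ['the', 'column', 'names', 'precede', 'this', 'comment']
```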

By taking the [0] portion of the split instead of [1:]:

  • Lines beginning with comments result in an empty string being passed to split_lines() on L1350, producing no usable output and causing the while not first_values loop to try the next line.
  • Partial-line comments following actual heading names are discarded, instead of the names themselves.
  • As a result, files can have commented headers of any length and column names, simultaneously.
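
The effect of the proposed change can be sketched in the same way. The function proposed_names below is hypothetical and only approximates the while not first_values loop: it assumes whitespace-delimited names, uses str instead of bytes, and ignores skip_header and the other genfromtxt() options.

```python
def proposed_names(lines, comments="#"):
    """Return the names from the first line that still has content
    once any trailing comment has been stripped.

    Hypothetical helper for illustration; not part of NumPy.
    """
    for line in lines:
        if comments in line:
            # Proposed fix: keep the text BEFORE the comment marker.
            line = line.split(comments)[0]
        values = line.split()  # assumption: whitespace-delimited names
        if values:
            return values
        # A comment-only line yields no values, so try the next line.
    return []


sample = [
    "# Example comment line",
    "# A second commented header line",
    "ColumnA ColumnB ColumnC # the column names precede this comment",
    "1 2 3",
]
print(proposed_names(sample))
# ['ColumnA', 'ColumnB', 'ColumnC']
```

In this sketch the two comment-only lines are skipped and the trailing comment is dropped, which corresponds to the three points above.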
