Get-Content -Delimiter unexpectedly keeps the delimiter in the lines returned · Issue #3706 · PowerShell/PowerShell · GitHub

Get-Content -Delimiter unexpectedly keeps the delimiter in the lines returned #3706

Closed
mklement0 opened this issue May 5, 2017 · 10 comments
Labels
Breaking-Change breaking change that may affect users Issue-Bug Issue has been identified as a bug in the product Resolution-Fixed The issue is fixed. WG-Engine-Providers built-in PowerShell providers such as FileSystem, Certificates, Registry, etc.

Comments

@mklement0
Contributor
mklement0 commented May 5, 2017

Note: It may be too late to change this behavior, or perhaps it falls into Bucket 3: Unlikely Grey Area.
If the former, perhaps a -TrimDelimiter option could be added.

By default, Get-Content strips the newline character [sequence] from the lines it returns.

The generalization of this concept is to use the -Delimiter parameter, which allows specifying a custom line (record) delimiter (terminator / separator).

However, unlike with the default delimiter - a newline - whatever -Delimiter argument you specify is kept in the lines returned, which is:

  • an inconsistency

  • an inconvenience, because the delimiter - which by its nature is not part of the data itself - must be stripped explicitly before further processing.

Steps to reproduce

I'd expect the following 2 commands to be equivalent:

1,2 > t.txt; get-content                                     t.txt | % { "[$_]" }
1,2 > t.txt; get-content -delimiter ([environment]::newline) t.txt | % { "[$_]" }

Expected behavior

[1]
[2]
[1]
[2]

Actual behavior

[1]
[2]
[1
]
[2
]

Note how the trailing newlines were retained in the output from the 2nd command.
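For versions that retain the delimiter, one possible workaround is to strip it explicitly after reading. The following is only a minimal sketch: `Remove-TrailingDelimiter` is a hypothetical helper name, not part of PowerShell, and it assumes the delimiter appears at most once at the end of each returned record. Because it only removes a delimiter that is actually present, it should be harmless in versions that already strip it.

```powershell
# Hypothetical helper (not a built-in cmdlet): removes one trailing copy
# of the delimiter from each string received through the pipeline.
function Remove-TrailingDelimiter {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $InputString,

        [Parameter(Mandatory)]
        [string] $Delimiter
    )
    process {
        if ($Delimiter.Length -gt 0 -and
            $InputString.EndsWith($Delimiter, [StringComparison]::Ordinal)) {
            # The delimiter was retained: cut it off.
            $InputString.Substring(0, $InputString.Length - $Delimiter.Length)
        }
        else {
            # The delimiter was already stripped (or never present): pass through.
            $InputString
        }
    }
}

# Same scenario as in the reproduction steps above.
$nl = [Environment]::NewLine
1, 2 > t.txt
Get-Content -Delimiter $nl t.txt |
    Remove-TrailingDelimiter -Delimiter $nl |
    ForEach-Object { "[$_]" }
```

With the helper in the pipeline, the second command from the reproduction steps is intended to produce the same bracketed output as the first.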

Environment data

PowerShell Core v6.0.0-alpha (v6.0.0-alpha.18) on macOS 10.12.4
@MaximoTrinidad

Hey @iSazonov,

Just wondering if this could be tied to the other issue I'm experiencing with Import-Csv:
"Import-Csv does not handle newlines containing just \r (Carriage Return) #3692"

Can someone confirm this?

:)

@mklement0
Contributor Author
mklement0 commented May 5, 2017

@MaximoTrinidad: The other issue is somewhat related, though in the case of Import-Csv the proper fix is to have it accept \r-only line breaks too (in line with Get-Content), so I see less of a need for an explicit -Delimiter parameter there.

@MaximoTrinidad

Thanks @mklement0!

@iSazonov iSazonov added WG-Engine-Providers built-in PowerShell providers such as FileSystem, Certificates, Registry, etc. Issue-Bug Issue has been identified as a bug in the product labels May 5, 2017
@iSazonov
Collaborator
iSazonov commented May 5, 2017

I agree that Get-Content and Get-Content -Delimiter should work uniformly, although LineTerminator would be a more suitable parameter name.

The Import-Csv problem is a separate issue: Import-Csv doesn't use the FileProvider code. It is worth discussing, though, whether all file cmdlets (Out-File, Import-Csv, Export-Csv ...) should be based on the FileProvider code so that they work uniformly.

/cc @jeffbi

@mklement0
Contributor Author

@iSazonov:

Historically, there's been a lot of confusion around the terms separator, delimiter, and terminator.

I agree that -Delimiter is not a great name, and that terminator is more appropriate here - though, arguably, we're then not dealing with lines anymore, but with the abstraction of records (the terminology adopted by awk for instance, which has otherwise settled on separator, unfortunately).

Therefore, bypassing the secondary issue of line vs. record, perhaps defining -Terminator as an alias for -Delimiter is appropriate (-Delimiter currently has no alias).

@iSazonov
Collaborator
iSazonov commented May 10, 2017

@mklement0 We could rely on the W3C standard - Model for Tabular Data and Metadata on the Web
line terminators
(from #3692 (comment))

@mklement0
Contributor Author

@iSazonov:

In the context of CSV files, specifically, which are line-oriented, line terminator is the right term.

By contrast, by allowing arbitrary terminator strings with -Delimiter, what a terminator terminates is no longer necessarily a line: with a terminator other than a newline (in any of its variations), any number of lines may then occur between instances of the terminator.

In other words: the units of data may themselves be multi-line, so calling such a unit a line is inappropriate.

That's why awk chose record, for instance, but my point is that it isn't necessary for the parameter name to reflect a term for the data unit, given that only one kind of terminator is supported (for the implied lines / data units).
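To illustrate the point about multi-line data units, here is a small sketch with records terminated by `;`, each spanning two lines. The file name `records-demo.txt` and its contents are made up for the example. The explicit `EndsWith` check is there only so the sketch behaves the same whether or not the running version retains the terminator.

```powershell
# Hypothetical demo file: two records, each two lines long, terminated by ';'.
$path = Join-Path ([IO.Path]::GetTempPath()) 'records-demo.txt'
Set-Content -LiteralPath $path -NoNewline -Value "name: a`nsize: 1`n;name: b`nsize: 2`n;"

# Read record by record instead of line by line.
$records = @(Get-Content -LiteralPath $path -Delimiter ';')

foreach ($record in $records) {
    # Drop the terminator if this version of Get-Content kept it.
    $body = if ($record.EndsWith(';')) {
        $record.Substring(0, $record.Length - 1)
    } else {
        $record
    }
    if (-not $body) { continue }   # skip an empty record, should one be returned

    # Each record is itself made up of several lines.
    $lineCount = @($body -split "`n" | Where-Object { $_ }).Count
    "record with $lineCount line(s)"
}
```

Each of the two records should report two lines, which is why "line" is a misleading name for what the terminator delimits here.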

On a meta note: even though the label of your link suggests that it is to a specific comment on the issue page, the link is just to the issue page itself.
A recent post by @lzybkr exhibited the same problem.
How are you generating these links? Is this a GitHub bug?

@iSazonov
Collaborator

@mklement0 You can copy such links from the date/time string in the post header, e.g. "mklement0 commented 9 minutes ago".
It is a GitHub bug.

@SteveL-MSFT SteveL-MSFT added the Breaking-Change breaking change that may affect users label Jun 1, 2017
mklement0 added a commit to mklement0/PowerShell that referenced this issue Aug 31, 2017
mklement0 added a commit to mklement0/PowerShell that referenced this issue Aug 31, 2017
daxian-dbw pushed a commit that referenced this issue Sep 1, 2017
daxian-dbw pushed a commit that referenced this issue Sep 1, 2017
@iSazonov iSazonov added the Resolution-Fixed The issue is fixed. label Sep 2, 2017
@shreyjain362

The issue is still not resolved. Why has it been marked "Resolution Fixed"? @mklement0 @iSazonov

@mklement0
Contributor Author

@shreyjain362, it has been fixed in PowerShell 7 (which this repo is solely devoted to) - at least judging by running the original reproduction steps.

Note that Windows PowerShell won't receive this fix, because it is no longer actively developed and only receives security-critical fixes.

5 participants