-
Notifications
You must be signed in to change notification settings - Fork 7.6k
Enable use 'Singleline,Multiline' option in split operator #4721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Context "Binary split operator options" { | ||
BeforeAll { | ||
$testText = "a12a`nb34b`nc56c`nd78d" | ||
# Add '#' - now second line do not start with 'b'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use some other char instead of #
- it's easy to confuse with comments, but comments are irrelevant here.
also nit: doesn't start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
} | ||
|
||
It "Unary split operator can split array of values" { | ||
$res = -split ("a b", "c d") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow didn't know you can do that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vors:
Turns out I didn't know either, mostly because the documentation:
-
in part suggests that only string scalars are supported (
<String>
in the syntax diagram) -
where it does mention arrays (multiple input strings), mistakenly claims that only the binary form supports them ("To split more than one string, use the binary split operator ").
I've created a doc issue to address that.
(I originally thought that when you pass an array it simply gets stringified based on $OFS
, but that's not the case.)
$res[2] | Should Be "c" | ||
$res[3] | Should Be "d" | ||
|
||
$res = "a b c d" -split " ", -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a valid scenario, or should PowerShell report an error when the count is less than 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this from our documentation - see <Max-substrings
>.
So it will be a breaking change if we add a exception here. I don't see the need for that.
@mklement0 Could you please comment too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iSazonov: Yes, the documentation states "A value of 0 and negative values return all the substrings".
Supporting 0
definitely makes sense, because in order to be able to specify options you must also specify a <max-substrings>
value for syntactical reasons, so you need a way to signal the default of "return all".
Negative values serve no real purpose - and could lead people to mistakenly assume that they have special meaning - but given that they're documented and th 10000 at we could break existing code, I don't think a change is warranted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mklement0 Thanks for good comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an argument in favor of a breaking change - is a negative count a likely bug? For example, if I compute the count and swapped my operands, this feels like a bug PowerShell should catch.
I do think it's an unlikely breaking change, mostly because the count it most commonly a constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant something like:
$cnt = $b - $a # Should have been $a - $b
'a b c d' -split ' ', $cnt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks. Preventing negative values categorically is certainly at odds with what I just proposed, but I'd still consider it an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is interesting improvement:
PS C:\temp> 'a b c d' -split ' ', 2
a
b c d
PS C:\temp> 'a b c d' -split ' ', 1
a b c d
PS C:\temp> 'a b c d' -split ' ', 0
a
b
c
d
PS C:\temp> 'a b c d' -split ' ', -1
a b c d
PS C:\temp> 'a b c d' -split ' ', -2
a b c
d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iSazonov: Yes, though your -2
behavior is what I suggest for -1
- analogous to index [-1]
retrieving the last element of a collection.
Not sure if providing a way to not split at all is necessary.
> 'a b c d' -split ' ', -1 # last 1
a b c
d
> 'a b c d' -split ' ', -2 # last 2
a b
c
d
Also, it's probably worth always returning a first item that represents the remaining, unsplit part - even if that part is empty:
> 'a b' -split ' ', -2 # last 2, with nothing remaining
# empty string to signal that there's no remaining part
a
b
That enables the following idiom, irrespective of whether there's a remaining part (prefix) or not.
> $prefix, $tokens = 'a b' -split ' ', -2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mklement0 I like the "breaking change". If @lzybkr approve we should open Issue Enhancement.
|
||
Context "Binary split operator options" { | ||
BeforeAll { | ||
$testText = "a12a`nb34b`nc56c`nd78d" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multi-line tests should probably all test the three linefeed variants: `n
, `r
, and `r`n
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we is full based on RegEx implementation in our code. We don't parse the string, we only pass it to RegEx as is. Do we really want to test CoreFX implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True today and probably true forever, but if somebody decided they could be smarter than regex (maybe in limited scenarios), it's useful (if it's easy) to cover those additional scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. I added tests for 'n' and '
rn'. SingleLine option catch only '
n so I believe no tests needed for '
r'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, but it would be nice to see more tests with the various valid variants of line feeds.
I approve of the breaking change, but we'll need an ack from the @PowerShell/powershell-committee before proceeding. |
@PowerShell/powershell-committee can see how this can be useful and the current documented behavior is an edge case that isn't likely used so accepting this breaking change is approved |
The suggested behavior for negative count should be a different PR. |
@mklement0 Feel free to open Issue for the negative count enhancement. |
@lzybkr Is the PR ready to merge? |
@lzybkr @mklement0 Thanks for great discussion! |
@iSazonov Great test additions! Thanks! |
Fix #4712
RegEx Singleline and Multiline options are not mutually exclusive and can be used at the same time.
Tests for -split operator is added to get real code coverage and exclude regressions.