8000 Enable use 'Singleline,Multiline' option in split operator by iSazonov · Pull Request #4721 · PowerShell/PowerShell · GitHub
[go: up one dir, main page]

Skip to content

Enable use 'Singleline,Multiline' option in split operator #4721

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 8, 2017

Conversation

iSazonov
Copy link
Collaborator
@iSazonov iSazonov commented Aug 31, 2017

Fix #4712

RegEx Singleline and Multiline options are not mutually exclusive and can be used at the same time.

Tests for -split operator is added to get real code coverage and exclude regressions.

Context "Binary split operator options" {
BeforeAll {
$testText = "a12a`nb34b`nc56c`nd78d"
# Add '#' - now second line do not start with 'b'.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use some other char instead of # - it's easy to confuse with comments, but comments are irrelevant here.

also nit: doesn't start

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}

It "Unary split operator can split array of values" {
$res = -split ("a b", "c d")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow didn't know you can do that!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vors:
Turns out I didn't know either, mostly because the documentation:

  • in part suggests that only string scalars are supported (<String> in the syntax diagram)

  • where it does mention arrays (multiple input strings), mistakenly claims that only the binary form supports them ("To split more than one string, use the binary split operator ").

I've created a doc issue to address that.

(I originally thought that when you pass an array it simply gets stringified based on $OFS, but that's not the case.)

$res[2] | Should Be "c"
$res[3] | Should Be "d"

$res = "a b c d" -split " ", -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a valid scenario, or should PowerShell report an error when the count is less than 1?

Copy link
Collaborator Author
@iSazonov iSazonov Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this from our documentation - see <Max-substrings>.
So it will be a breaking change if we add a exception here. I don't see the need for that.

@mklement0 Could you please comment too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov: Yes, the documentation states "A value of 0 and negative values return all the substrings".

Supporting 0 definitely makes sense, because in order to be able to specify options you must also specify a <max-substrings> value for syntactical reasons, so you need a way to signal the default of "return all".

Negative values serve no real purpose - and could lead people to mistakenly assume that they have special meaning - but given that they're documented and th 10000 at we could break existing code, I don't think a change is warranted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mklement0 Thanks for good comment!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an argument in favor of a breaking change - is a negative count a likely bug? For example, if I compute the count and swapped my operands, this feels like a bug PowerShell should catch.

I do think it's an unlikely breaking change, mostly because the count it most commonly a constant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant something like:

$cnt = $b - $a # Should have been $a - $b
'a b c d' -split ' ', $cnt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks. Preventing negative values categorically is certainly at odds with what I just proposed, but I'd still consider it an improvement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is interesting improvement:

PS C:\temp> 'a b c d' -split ' ', 2
a
b c d
PS C:\temp> 'a b c d' -split ' ', 1
a b c d
PS C:\temp> 'a b c d' -split ' ', 0
a
b
c
d
PS C:\temp> 'a b c d' -split ' ', -1
a b c d
PS C:\temp> 'a b c d' -split ' ', -2
a b c
d

Copy link
Contributor
@mklement0 mklement0 Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov: Yes, though your -2 behavior is what I suggest for -1 - analogous to index [-1] retrieving the last element of a collection.
Not sure if providing a way to not split at all is necessary.

> 'a b c d' -split ' ', -1  # last 1
a b c
d

> 'a b c d' -split ' ', -2  # last 2
a b
c
d

Also, it's probably worth always returning a first item that represents the remaining, unsplit part - even if that part is empty:

> 'a b' -split ' ', -2  # last 2, with nothing remaining
   # empty string  to signal that there's no remaining part
a
b

That enables the following idiom, irrespective of whether there's a remaining part (prefix) or not.

> $prefix, $tokens = 'a b' -split ' ', -2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mklement0 I like the "breaking change". If @lzybkr approve we should open Issue Enhancement.


Context "Binary split operator options" {
BeforeAll {
$testText = "a12a`nb34b`nc56c`nd78d"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-line tests should probably all test the three linefeed variants: `n, `r, and `r`n.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we is full based on RegEx implementation in our code. We don't parse the string, we only pass it to RegEx as is. Do we really want to test CoreFX implementation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True today and probably true forever, but if somebody decided they could be smarter than regex (maybe in limited scenarios), it's useful (if it's easy) to cover those additional scenarios.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. I added tests for 'n' and 'rn'. SingleLine option catch only 'n so I believe no tests needed for 'r'.

Copy link
Contributor
@lzybkr lzybkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but it would be nice to see more tests with the various valid variants of line feeds.

@lzybkr
Copy link
Contributor
lzybkr commented Sep 6, 2017

I approve of the breaking change, but we'll need an ack from the @PowerShell/powershell-committee before proceeding.

@SteveL-MSFT SteveL-MSFT added Breaking-Change breaking change that may affect users Review - Committee The PR/Issue needs a review from the PowerShell Committee labels Sep 6, 2017
@SteveL-MSFT
Copy link
Member

@PowerShell/powershell-committee can see how this can be useful and the current documented behavior is an edge case that isn't likely used so accepting this breaking change is approved

@SteveL-MSFT SteveL-MSFT added Committee-Reviewed PS-Committee has reviewed this and made a decision and removed Review - Committee The PR/Issue needs a review from the PowerShell Committee labels Sep 6, 2017
@lzybkr lzybkr removed the Breaking-Change breaking change that may affect users label Sep 6, 2017
@lzybkr
Copy link
Contributor
lzybkr commented Sep 6, 2017

The suggested behavior for negative count should be a different PR.

@iSazonov
Copy link
Collaborator Author
iSazonov commented Sep 7, 2017

@mklement0 Feel free to open Issue for the negative count enhancement.

@mklement0
Copy link
Contributor

@iSazonov: Done - see #4765.

@iSazonov
Copy link
Collaborator Author
iSazonov commented Sep 8, 2017

@lzybkr Is the PR ready to merge?

@iSazonov iSazonov added the Documentation Needed in this repo Documentation is needed in this repo label Sep 8, 2017
@lzybkr lzybkr merged commit fc9d798 into PowerShell:master Sep 8, 2017
@iSazonov
Copy link
Collaborator Author
iSazonov commented Sep 8, 2017

@lzybkr @mklement0 Thanks for great discussion!

@iSazonov iSazonov deleted the fix-split branch September 8, 2017 16:26
@daxian-dbw
Copy link
Member

@iSazonov Great test additions! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Committee-Reviewed PS-Committee has reviewed this and made a decision
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0