-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Add char range overload to DotDot operator #5026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add char range overload to DotDot operator #5026
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes look fine - but we need tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about returning char[] instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@IISResetMe Could you please address the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea, but for now I think we should keep operator output consistent (ie. return object[] with individual items of the expected type).
There are are number of builtin operators that return object[] in this fashion (@(), -split, any comparison operator in array/filter mode etc.), changing the behavior should be a concerted effort across all of them IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this special case is correctly handled by the code below and it happens rarely, so maybe we should remove it. (And I see you copied the code from the int version, so you could remove it there too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly did it this way to keep it similar to the existing code in Compiler.cs. I'm inclined to make a single Range method that takes the operands as Expression's, de-duplicate the loop that generates the actual sequence and then cast the entire thing to char[], int[] or long[] based on the lhs operand before returning. Will update the code later today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@IISResetMe Is this addresed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the redundant branch in both overloads now. 0406cb3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work:
'A'..'Z' | % { ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this will produce the sequence A (0x41) through Z (0x5A)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently it does not work on the Powershell Core 6.0.0 GA
> 'A'..'Z' | % {$_}
Cannot convert value "A" to type "System.Int32". Error: "Input string was not in a correct format."
At line:1 char:1
+ 'A'..'Z' | % {$_}
+ ~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToIntegerThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kborowinski there is an open issue for that #5519
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markekraus: sorry, I've missed that issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder - is this ASCII Arithmetic work correctly for unicode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't work with surrogate pairs because those can't be represented in a char, but I don't think that is an interesting scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we added `u{} should we detect surrogate pairs and write an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should error on invalid characters (half of the pair). We could also support surrogate pair ranges if we didn't convert the string to char immediately, instead analyzing the string more thoroughly. But I'm not sure I'd block this PR on either of those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it will work with traditional languages in common scenarios but before I saw issues from China and Japan mans - if the new operator does not matter to them I agree with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@IISResetMe Could you please address the comment in code or add relevant comments in the PR description for future documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iSazonov I don't feel confident enough in my unicode-fu to attempt to implement the appropriate handling in code for this at the moment, I've updated the PR description to reflect this deficiency instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Closed.
|
Oops, rebased powershell:master into the branch, looks like all parent commits are in the PR now, how do I clean it up? 😞 |
|
@IISResetMe did you only have the 2 commits? if so you could do this (assuming git fetch
git checkout -b 20171015fix PowerShell/master
git cherry-pick 6e6dfffc2d0150b504585597ad4df997a174867a
git cherry-pick 6d6f20d28157ee49071f018051ac7b35be57bbb0
git checkout -b 20171015backup overload-DotDot-CharRange-1
git checkout overload-DotDot-CharRange-1
git reset --hard 20171015fix
git push --forceThis will create a new |
This commit attempts to allow the DotDot operator (..) to generate character ranges. Since PowerShell has no shorthand syntax for char literals, this change tests whether the left-hand-side is of type String, and subsequently attempts to convert both operands to a char. NOTE: This breaks a number of currently supported range operations, such as implicit numerical conversion of strings: '1'..'100'
This commit adds basic tests for the current integer range (..) operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were the most basic tests I could think of for the existing Range operator. All examples I could find in the about_* help files pass these.
|
Thanks @markekraus, that seems to have done the trick. |
Add tests for the [char] overload feature. RangeOperator.Tests.ps1 encoding changed to UTF16LE (Windows "Unicode") to allows inline unicode chars in test case
|
Can you make sure |
Changed RangeOperator.Tests.ps1 encoding to UTF-8 (no BOM), substituted UTF-16 char literals with [char] casts
|
@TravisEz13 Done, https://github.com/IISResetMe/PowerShell/blob/34bc59046698d0de41bb5403a6105c7070b38a46/test/powershell/Language/Operators/RangeOperator.Tests.ps1 is UTF-8 (with updated tests) now |
| It "Range operator generates arrays of integers" { | ||
| $Range = 5..8 | ||
| $Range.count | Should Be 4 | ||
| $Range[0] | Should BeOfType [int] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests for values - $Range[0] | Should Be 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closed.
| It "Range operator accepts negative integer values" { | ||
| { | ||
| -8..-5 | ||
| } |Should Not Throw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the should - next line do the same and Pester can catch a throw there correctly.
|
|
||
| $Range = -8..-5 | ||
| $Range.count | Should Be 4 | ||
| $Range[0] |Should Be -8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting - please add space after |
Please add type check as in previous test.
| It "Range operator support single-item sequences" { | ||
| { | ||
| 0..0 | ||
| } |Should Not Throw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove.
| $Range[0] | Should Be 0 | ||
| } | ||
|
|
||
| It "Range operator works in ascending and descending order" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous tests already test "ascending". I think we could test here only descending order for positive and negative numbers. But I believe we should replace some tests with one - "-2 .. 2" and "2 .. -2" - so we check transition through zero too.
|
|
||
| Context "Range operator operand types" { | ||
|
|
||
| It "Range operator works on [int]" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the test - we do the checks above already.
| } | Should Not Throw | ||
| $Range = ([long]1)..([long]10) | ||
| $Range.count | Should Be 10 | ||
| $Range.Where({$_ -is [int]}).count | Should Be 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really [int]? I think we should use values > int.MaxValue.
And please check values too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementation doesn't support values > int.MaxValue, intended to add this in a separate branch/PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarify. Please remove the confuse test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@IISResetMe Please address the comment about the tests for [long] and [bigint].
We should remove its - they don't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, they don't make too much sense at the moment. I've removed them now
| } | Should Not Throw | ||
| $Range = ([bigint]1)..([bigint]10) | ||
| $Range.count | Should Be 10 | ||
| $Range.Where({$_ -is [int]}).count | Should Be 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really [int]? I think we should use values > long.MaxValue.
And please check values too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing numeric range operator only supports int regardless of input type
| { | ||
| 1.2d..2.9d | ||
| } | Should Not Throw | ||
| $Range = 1.1d..9.9d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why so much? I think enough $Range = 1.1d..3.9d
| $Range[9] | Should Be 10 | ||
| } | ||
| } | ||
| } No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add new line at EOF.
Fixed formatting, removed unnecessary checks
|
@lzybkr Can you update your review? |
|
@iSazonov Can you indicate if you think this is ready or not (feel free to say if someone else signs off)? |
|
@TravisEz13 I updated three comment above - after they have been resolved we'll ready to merge. |
Both Range() overloads have unnecessary branches for the special case of lhs == rhs, removed for simplicity.
Removed tests for .. using [long] and [bigint] input operands. I believe these will make more sense once we add support for ranges with values outside Int32, but for now might be confusing
|
@iSazonov I believe I've addressed all outstanding comments :-) |
|
@IISResetMe Many thanks! We are near to merge. Please address @lzybkr comment above. |
|
@iSazonov Done |
|
@lzybkr Could you please take another look if you have a bit free time? |
This PR attempts to allow the DotDot operator (..) to generate
character ranges, in turn allowing shorthand syntax like:
Since PowerShell has no shorthand syntax for char literals, we
test whether the left-hand-side is of type String, and subsequently
attempts to convert both operands to a char (in accordance with
PowerShell operator overload behavior in general).
NOTE:
This breaks a number of currently supported range operations, such
as implicit numerical conversion of strings:
'1'..'100'
This implementation is fairly naive, making no attempts to assert whether the input characters (or any codepoints in between) fall outside UCS-2 (ie. surrogates). Attempts to expand ranges bound by surrogate pairs are expected to fail and not covered by this implementation (I will leave this an exercise for someone more experienced with unicode hackery than myself)
I was unable to locate existing tests for the integer range function, please advise on where to add those.