8000 Add char range overload to DotDot operator by IISResetMe · Pull Request #5026 · PowerShell/PowerShell · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@IISResetMe
Copy link
Collaborator
@IISResetMe IISResetMe commented Oct 5, 2017

This PR attempts to allow the DotDot operator (..) to generate
character ranges, in turn allowing shorthand syntax like:

$AtoZ = -join('A'..'Z')

Since PowerShell has no shorthand syntax for char literals, we
test whether the left-hand-side is of type String, and subsequently
attempts to convert both operands to a char (in accordance with
PowerShell operator overload behavior in general).

NOTE:

  1. This breaks a number of currently supported range operations, such
    as implicit numerical conversion of strings:
    '1'..'100'

  2. This implementation is fairly naive, making no attempts to assert whether the input characters (or any codepoints in between) fall outside UCS-2 (ie. surrogates). Attempts to expand ranges bound by surrogate pairs are expected to fail and not covered by this implementation (I will leave this an exercise for someone more experienced with unicode hackery than myself)

I was unable to locate existing tests for the integer range function, please advise on where to add those.

@msftclas
Copy link
msftclas commented Oct 5, 2017

CLA assistant check
All CLA requirements met.

@lzybkr lzybkr self-assigned this Oct 5, 2017
Copy link
Contributor
@lzybkr lzybkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes look fine - but we need tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about returning char[] instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IISResetMe Could you please address the comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, but for now I think we should keep operator output consistent (ie. return object[] with individual items of the expected type).

There are are number of builtin operators that return object[] in this fashion (@(), -split, any comparison operator in array/filter mode etc.), changing the behavior should be a concerted effort across all of them IMO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this special case is correctly handled by the code below and it happens rarely, so maybe we should remove it. (And I see you copied the code from the int version, so you could remove it there too.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly did it this way to keep it similar to the existing code in Compiler.cs. I'm inclined to make a single Range method that takes the operands as Expression's, de-duplicate the loop that generates the actual sequence and then cast the entire thing to char[], int[] or long[] based on the lhs operand before returning. Will update the code later today

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IISResetMe Is this addresed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the redundant branch in both overloads now. 0406cb3

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work:

'A'..'Z' | % { ... }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will produce the sequence A (0x41) through Z (0x5A)

Copy link
Contributor
@kborowinski kborowinski Jan 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently it does not work on the Powershell Core 6.0.0 GA

> 'A'..'Z' | % {$_}
Cannot convert value "A" to type "System.Int32". Error: "Input string was not in a correct format."
At line:1 char:1
+ 'A'..'Z' | % {$_}
+ ~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kborowinski there is an open issue for that #5519

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markekraus: sorry, I've missed that issue

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder - is this ASCII Arithmetic work correctly for unicode?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't work with surrogate pairs because those can't be represented in a char, but I don't think that is an interesting scenario.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we added `u{} should we detect surrogate pairs and write an error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should error on invalid characters (half of the pair). We could also support surrogate pair ranges if we didn't convert the string to char immediately, instead analyzing the string more thoroughly. But I'm not sure I'd block this PR on either of those.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it will work with traditional languages in common scenarios but before I saw issues from China and Japan mans - if the new operator does not matter to them I agree with you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IISResetMe Could you please address the comment in code or add relevant comments in the PR description for future documentation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov I don't feel confident enough in my unicode-fu to attempt to implement the appropriate handling in code for this at the moment, I've updated the PR description to reflect this deficiency instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Closed.

@IISResetMe
Copy link
Collaborator Author

Oops, rebased powershell:master into the branch, looks like all parent commits are in the PR now, how do I clean it up? 😞

@markekraus
Copy link
Contributor
markekraus commented Oct 15, 2017

@IISResetMe did you only have the 2 commits? if so you could do this (assuming PowerShell is the name of your upstream remote for the actual PowerShell repo). Make sure you stash or commit any workspace changes before doing this as they will be lost. I'm not a git expert, and maybe there is a better way. So maybe wait until someone more seasoned either "thumbs up" this or tells you just how wrong I am :)

git fetch
git checkout -b 20171015fix PowerShell/master
git cherry-pick 6e6dfffc2d0150b504585597ad4df997a174867a
git cherry-pick 6d6f20d28157ee49071f018051ac7b35be57bbb0
git checkout -b 20171015backup overload-DotDot-CharRange-1
git checkout overload-DotDot-CharRange-1
git reset --hard 20171015fix
git push --force

This will create a new 20171015fix branch at the current master on this repo, then cherry-picks your 2 commits, then backs up your existing branch to 20171015backup, and then hard reset your branch to the 20171015fix branch. This will both "rebase" your branch on to master and add your commits with out the merge mess. This will likely mess up any comment history here, but I guess it's already some what messed up.

This commit attempts to allow the DotDot operator (..) to generate
character ranges.

Since PowerShell has no shorthand syntax for char literals, this change
tests whether the left-hand-side is of type String, and subsequently
attempts to convert both operands to a char.

NOTE: This breaks a number of currently supported range operations, such
as implicit numerical conversion of strings:
'1'..'100'
This commit adds basic tests for the current integer range (..) operator
Copy link
Collaborator Author
@IISResetMe IISResetMe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were the most basic tests I could think of for the existing Range operator. All examples I could find in the about_* help files pass these.

@IISResetMe
Copy link
Collaborator Author

Thanks @markekraus, that seems to have done the trick.

Add tests for the [char] overload feature. RangeOperator.Tests.ps1 encoding changed to UTF16LE (Windows "Unicode") to allows inline unicode chars in test case
@TravisEz13
Copy link
Member

Can you make sure RangeOperator.Tests.ps1 is UTF8 without BOM? It shows up as binary.

Changed RangeOperator.Tests.ps1 encoding to UTF-8 (no BOM), substituted UTF-16 char literals with [char] casts
@IISResetMe
Copy link
Collaborator Author

< 6284 td class="blob-num blob-num-addition empty-cell">
It "Range operator generates arrays of integers" {
$Range = 5..8
$Range.count | Should Be 4
$Range[0] | Should BeOfType [int]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for values - $Range[0] | Should Be 5

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closed.

It "Range operator accepts negative integer values" {
{
-8..-5
} |Should Not Throw
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the should - next line do the same and Pester can catch a throw there correctly.


$Range = -8..-5
$Range.count | Should Be 4
$Range[0] |Should Be -8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting - please add space after |

Please add type check as in previous test.

It "Range operator support single-item sequences" {
{
0..0
} |Should Not Throw
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove.

$Range[0] | Should Be 0
}

It "Range operator works in ascending and descending order" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous tests already test "ascending". I think we could test here only descending order for positive and negative numbers. But I believe we should replace some tests with one - "-2 .. 2" and "2 .. -2" - so we check transition through zero too.


Context "Range operator operand types" {

It "Range operator works on [int]" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the test - we do the checks above already.

} | Should Not Throw
$Range = ([long]1)..([long]10)
$Range.count | Should Be 10
$Range.Where({$_ -is [int]}).count | Should Be 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really [int]? I think we should use values > int.MaxValue.
And please check values too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation doesn't support values > int.MaxValue, intended to add this in a separate branch/PR.

Copy link
Collaborator
@iSazonov iSazonov Oct 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarify. Please remove the confuse test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IISResetMe Please address the comment about the tests for [long] and [bigint].
We should remove its - they don't make sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, they don't make too much sense at the moment. I've removed them now

} | Should Not Throw
$Range = ([bigint]1)..([bigint]10)
$Range.count | Should Be 10
$Range.Where({$_ -is [int]}).count | Should Be 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really [int]? I think we should use values > long.MaxValue.
And please check values too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing numeric range operator only supports int regardless of input type

{
1.2d..2.9d
} | Should Not Throw
$Range = 1.1d..9.9d
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so much? I think enough $Range = 1.1d..3.9d

$Range[9] | Should Be 10
}
}
} No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add new line at EOF.

Fixed formatting, removed unnecessary checks
@TravisEz13
Copy link
Member

@lzybkr Can you update your review?

@TravisEz13
Copy link
Member

@iSazonov Can you indicate if you think this is ready or not (feel free to say if someone else signs off)?

@iSazonov
Copy link
Collaborator

@TravisEz13 I updated three comment above - after they have been resolved we'll ready to merge.

Both Range() overloads have unnecessary branches for the special case of lhs == rhs, removed for simplicity.
Removed tests for .. using [long] and [bigint] input operands. I believe these will make more sense once we add support for ranges with values outside Int32, but for now might be confusing
@IISResetMe
Copy link
Collaborator Author

@iSazonov I believe I've addressed all outstanding comments :-)

@iSazonov
Copy link
Collaborator

@IISResetMe Many thanks! We are near to merge. Please address @lzybkr comment above.

@vors vors removed their request for review October 25, 2017 06:02
@IISResetMe
Copy link
Collaborator Author

@iSazonov Done

@iSazonov
Copy link
Collaborator

@lzybkr Could you please take another look if you have a bit free time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants

0