globbing for native commands is too agressive · Issue #3931 · PowerShell/PowerShell · GitHub

Closed
vors opened this issue Jun 3, 2017 · 26 comments
Labels
Issue-Bug Issue has been identified as a bug in the product Resolution-Fixed The issue is fixed. WG-Language parser, language semantics

@vors
Collaborator
vors commented Jun 3, 2017

Steps to reproduce

Intuitively, globbing should not kick in inside single-quoted strings.

echo '11:1' | grep '.*:.'

Expected behavior

Works, output is 11:1, like in bash.

Actual behavior

Cannot find drive. A drive with the name '.*' does not exist.
At line:1 char:1
+ echo '11:1' | grep '.*:.'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (.*:String) [], DriveNotFoundException
    + FullyQualifiedErrorId : DriveNotFound 

The error is pretty confusing for a unix user.

Workaround

Escape the * with a backtick in the regex.

Environment data

> $PSVersionTable

Name                           Value                                                                                      
----                           -----                                                                                      
PSVersion                      6.0.0-beta                                                                                 
PSEdition                      Core                                                                                       
BuildVersion                   3.0.0.0                                                                                    
CLRVersion                                                                                                                
GitCommitId                    v6.0.0-beta.2                                                                              
OS                             Darwin 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789....
Platform                       Unix                                                                                       
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}                                                                    
PSRemotingProtocolVersion      2.3                                                                                        
SerializationVersion           1.1.0.1                                                                                    
WSManStackVersion              3.0  
@vors vors added the WG-Language parser, language semantics label Jun 3, 2017
@SteveL-MSFT SteveL-MSFT added this to the 6.0.0-HighPriority milestone Jun 3, 2017
@mklement0
Contributor

Just to state it explicitly: globs shouldn't be expanded inside "..." (double-quoted strings) either, which currently happens too:

printf '%s\n' '*'    # should print literal *
printf '%s\n' "*"    # ditto

Only unquoted tokens should ever be subject to globbing, as in POSIX-like shells.

@latkin
Contributor
latkin commented Jun 5, 2017

Just noticed this after moving to Beta on OSX. Maybe a recent regression?

It makes using curl impossible if you have URL query parameters.

> curl 'https://google.com'                                                                                                                                                                                                                                         
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
> curl 'https://google.com?foo=bar'                                                                                                                                                                                                                                 
Cannot find drive. A drive with the name 'https' does not exist.
At line:1 char:1
+ curl 'https://google.com?foo=bar'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (https:String) [], DriveNotFoundException
    + FullyQualifiedErrorId : DriveNotFound

@vors vors added the Issue-Bug Issue has been identified as a bug in the product label Jun 5, 2017
@vors
Collaborator Author
vors commented Jun 5, 2017

Just noticed this after moving to Beta on OSX. Maybe a recent regression?

Yes, though it's not really a regression; it's a new feature in beta-1:
#3643

@latkin
Contributor
latkin commented Jun 6, 2017

Is there a way to disable globbing entirely and opt back in to the Windows-style behavior? Even when it works by design, I kind of hate it.

e.g. git add * used to just work, but with globbing I'd need to do git add '*'

@vors
Collaborator Author
vors commented Jun 10, 2017

For this specific case git add . may work

@latkin
Contributor
latkin commented Jul 12, 2017

Bump. Any update? This is preventing me from moving to beta on non-Windows.

@vors
Collaborator Author
vors commented Jul 12, 2017

Polite ping @BrucePay

@vors
Collaborator Author
vors commented Jul 27, 2017

Another example of a native utility that became unusable is youtube-dl (or anything that takes a URL, for that matter):

youtube-dl https://www.youtube.com/watch?v=QQ0Yn1fqugg
Cannot find drive. A drive with the name 'https' does not exist.
At line:1 char:1
+ youtube-dl 'https://www.youtube.com/watch?v=QQ0Yn1fqugg'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (https:String) [], DriveNotFound 
   Exception
    + FullyQualifiedErrorId : DriveNotFound

@rkeithhill
Collaborator
rkeithhill commented Oct 11, 2017

And yet another example of globbing messing up invocation of a native executable:

conan info --only "None" --package_filter PkgName/* --cwd ../..

BTW the suggested workaround of escaping the * doesn't work because `* is passed to the executable.

vors reacted with thumbs up emoji

@rkeithhill
Collaborator

So I was trying out my home grown echoargs in my WSL/Bash shell and noticed that given this command (in Bash):

hillr@HILLR1:~$ echoargs g* --package_filter FOO/*

You get this output:

Command line: "/home/hillr/dotnet/echoargs/bin/Debug/netcoreapp2.0/ubuntu.16.04-x64/echoargs.dll getcwd getcwd.c get-pip.py git --package_filter FOO/*"

RE globbing, how does Bash know to glob the first argument g* but not the second one FOO/*?

@mklement0
Contributor
mklement0 commented Oct 11, 2017

Bash (per POSIX):

  • only ever applies globbing to unquoted tokens; * is subject to globbing, whereas '*' and "*" are not (nor is a backslash-escaped *, \*).

    • Side note: strictly speaking, what matters is whether the individual globbing metacharacters/expressions are quoted or not; POSIX-like shells allow you to form a single argument composed of both quoted and unquoted parts; e.g., "foo"* would still match all files whose names start with foo, because the * is unquoted.
  • passes unquoted tokens that do not match any filesystem objects through as-is (in Bash, you can opt into different behavior, but that is not part of the POSIX standard); in your example, FOO/* presumably didn't match anything and was therefore left untouched.

    • Note that it is irrelevant whether or not an unquoted token with globbing metacharacters partially happens to refer to existing filesystem items or is even a syntactically valid path - the original argument is passed through as-is in all cases.
    • Arguably, this is not the most sensible default behavior, but it's POSIX-mandated and has a long history.
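
To make the rules above concrete, here is a minimal sketch that can be pasted into Bash or another POSIX-like shell. The scratch directory, the file names, and the `count_args` helper are all made up for illustration; the results assume default shell options (no `nullglob` or `failglob`).

```shell
# Work in a scratch directory so the results do not depend on existing files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch foo1.txt foo2.txt

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args *.txt)      # unquoted glob: expanded to the two files
single=$(count_args '*.txt')      # single-quoted: passed literally as one argument
double=$(count_args "*.txt")      # double-quoted: also passed literally
mixed=$(count_args "foo"*)        # quoted prefix, unquoted *: still expanded
nomatch=$(printf '%s' nosuch/*)   # nothing matches: token is passed through as-is

echo "unquoted=$unquoted single=$single double=$double mixed=$mixed nomatch=$nomatch"
```

The argument counts show which tokens were expanded, and `nomatch` holds the untouched token `nosuch/*` even though the `nosuch` directory does not exist.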

My assumption has been that the idea behind PowerShell's native globbing is to emulate POSIX-shell rules (is that not true?), so in your earlier conan example, if you wanted to pass PkgName/* through, you'd have to quote it to exempt it from globbing:

conan info --only "None" --package_filter 'PkgName/*' --cwd ../..

@rkeithhill
Collaborator
rkeithhill commented Oct 11, 2017

Quoting doesn't work in PowerShell (it does work from Bash). That is part of the problem with PowerShell's aggressive globbing:

1> echoargs info --only "None" --package_filter 'PkgName/*' --cwd ../..
Cannot find path '/home/hillr/PkgName' because it does not exist.

That cannot find path error is coming from the globbing code - not echoargs.

BTW you're right about Bash not finding anything that matched PkgName/*. If I add a folder with that name and some files in it, it globs that as well. So I guess one difference (bug?) with PowerShell's globbing is that if it doesn't find a path it shouldn't error; it should pass the value straight through to the native exe.

@mklement0
Contributor

Sorry for the reposts - I didn't want to clutter this thread with piecemeal insights.

Indeed: that it currently doesn't work as I described and instead unexpectedly works as you demonstrate is the reason this issue was created.

What you're seeing is a variation of what @vors experienced, and demonstrates both current problems (deviations from POSIX-shell behavior as of beta 8):

  • Globbing is applied blindly, instead of leaving quoted tokens alone.

  • A glob that matches nothing due to nonexistent path components results in an error rather than in the original token getting passed through (e.g., /bin/echo /nosuch/*); by contrast, a glob without a path component or one whose non-globbing components exist is passed through as-is, in line with POSIX (e.g., /bin/echo nosuch*).

    • In other words: PowerShell currently in part matches the POSIX behavior in this respect. Given that the passing-through-as-is-if-no-match arguably never really makes sense, perhaps PowerShell could decide to deviate from POSIX behavior here:
      Arguably, the sensible behavior is to complain about non-existent path components (as PowerShell already does) and to otherwise pass globs that happen to match nothing as distinct empty-string arguments? (This is not the same as setting shopt -s nullglob in Bash, where non-matching globs are eliminated as arguments altogether). Not sure what the right answer is.
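
For comparison, here is a small sketch of the two existing Bash behaviors mentioned above: the default pass-through and `shopt -s nullglob`. The `count_args` helper is hypothetical, and the nullglob case runs in a child `bash` (assumed to be on the PATH) so the option does not leak into the current shell. The empty-string variant floated above is not something Bash offers, so it is not shown.

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Use an empty scratch directory so that nosuch* is guaranteed to match nothing.
cd "$(mktemp -d)" || exit 1

# Default behavior: the non-matching glob is passed through as one literal argument.
default_count=$(count_args nosuch*)

# With nullglob, the non-matching glob is removed from the argument list entirely.
nullglob_count=$(bash -c 'shopt -s nullglob; set -- nosuch*; echo "$#"')

echo "default=$default_count nullglob=$nullglob_count"
```

The default case reports one argument (the literal `nosuch*`), while the nullglob case reports zero.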

@latkin
Contributor
latkin commented Oct 12, 2017

Thanks for sharing the POSIX design. I readily acknowledge that there is a lot of value in offering this well-established behavior to PowerShell users, even as a default on unix systems. This makes PowerShell easier to pick up for folks coming from that ecosystem. By no means am I opposed to PowerShell having this capability.

But dang, that's totally idiotic. I really really hope there can be an option to disable this and use Windows-style (i.e., no) globbing, even on Unix. I have to switch back and forth between Windows and Mac pretty frequently, and it's been fantastic being able to use PowerShell on both. But the introduction of globbing on Mac was a complete show-stopper. I've had to stick with the alpha builds due to this. Even if the overly-aggressive stuff is fixed and behavior matches POSIX as described above, it still sounds terrible to me. | fl * is muscle memory, now I need to type | fl '*'? Wildcards are all over the place in PowerShell, I will need to defensively quote each and every one from now on, just in case my CWD has a particular structure?

@rkeithhill
Collaborator
rkeithhill commented Oct 13, 2017

the introduction of globbing on Mac was a complete show-stopper.

Same here for us on Ubuntu.

really hope there can be an option to disable this

Agreed. You can disable pathname-expansion globbing in Bash with set -f (equivalently, set -o noglob). There needs to be a way in PowerShell to do the same.
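
As a rough illustration of what such a switch looks like on the Bash side, the sketch below toggles pathname expansion with `set -f` and `set +f`. The scratch directory, file names, and `count_args` helper are invented for the example.

```shell
# Scratch directory with two known files.
cd "$(mktemp -d)" || exit 1
touch a.txt b.txt

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

before=$(count_args *.txt)   # globbing on: expands to a.txt b.txt

set -f                       # turn pathname expansion off
during=$(count_args *.txt)   # the literal string *.txt is passed as one argument

set +f                       # turn pathname expansion back on
after=$(count_args *.txt)

echo "before=$before during=$during after=$after"
```

While the option is set, every unquoted wildcard is passed through untouched, which is roughly the Windows-style behavior being asked for here.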

@rkeithhill
Collaborator

BTW a question for the Bash savvy. We expect that some folks will continue to live in Bash but we want them to be able to use PowerShell scripts in our repo. So we've shebang'd them and committed them with chmod=+x.

The problem comes when they want to use Bash globbing with a PowerShell command. Bash globbing space-separates the files, and that messes up our PowerShell command. Internally, it can do the wildcard resolution, but it is kind of sucky to have to tell Bash users they have to put wildcards in quotes when calling PowerShell scripts. Is there an option in Bash to get it to generate glob lists that are comma-separated?

@mklement0
Contributor
mklement0 commented Oct 13, 2017

@latkin:

| fl * is muscle memory, now I need to type | fl '*'?

No: the globbing is only applied when calling external utilities, on Unix, so nothing changes for calls to cmdlets / functions / *.ps1 PS scripts - or at all on Windows.

Note: Globbing is applied when passing arguments to an executable PowerShell script with a shebang line, as it technically is an external utility too.

@mklement0
Contributor
mklement0 commented Oct 13, 2017

@rkeithhill:

I don't think there is such an option - arguments are strictly space-separated in the Unix world, and there is no concept of an array-valued argument in the shell.

You could use a ValueFromRemainingArguments parameter in your PowerShell scripts, but that limits you to 1 wildcard expression and notably precludes use of the parameter name in the invocation.

As an aside: even a comma-separated list wouldn't help you, because PowerShell doesn't recognize arrays in arguments passed to it from the outside; e.g., 1,2 would be interpreted as scalar string 1,2, and 1, 2 would be interpreted as 2 arguments: 1, and 2.
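
The argument boundaries described here are drawn by the calling shell before the target program ever runs, which a short sketch can show. `show_args` is a hypothetical stand-in for whatever receives the arguments; it only prints each one in brackets.

```shell
# Hypothetical receiver: print each argument it was given, wrapped in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]' "$arg"
  done
  echo
}

one=$(show_args 1,2)             # no whitespace: a single argument
two=$(show_args 1, 2)            # whitespace splits it into two arguments
quoted=$(show_args 'pid, comm')  # quoting keeps the whitespace inside one argument

echo "$one $two $quoted"
```

Under these assumptions the receiver sees `[1,2]`, then `[1,][2]`, then `[pid, comm]`; any list semantics have to be applied by the receiver itself.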

@rkeithhill
Collaborator

So Bash users need to know to quote wildcard arguments. Guess that's just the way it'll have to be.

PowerShell doesn't recognize arrays in arguments passed to it from the outside

Well that appears to be a new bug. Has that been filed yet? So how do I direct my script users to pass array args to my PowerShell script from Bash? Sigh...

@mklement0
Contributor

@rkeithhill:

Well that appears to be a new bug.

I agree that passing arrays from the outside would be nice, but has that really ever worked?

I discovered the issue a while ago and assumed it was a by-design limitation of the CLI's argument parsing, similar to how all arguments are interpreted as literal strings.

@rkeithhill
Collaborator
rkeithhill commented Oct 13, 2017

has that really ever worked

Well, it is something that PowerShell users are used to, e.g.:

Remove-Item foo.txt, bar.txt, baz.txt

The question is what do Bash users expect? Presumably us PowerShell users will be using PowerShell. However, we will tell our Bash buddies how to run our PowerShell scripts and we're going to have to know that the array literal syntax we're used to in PowerShell isn't going to work in Bash.

@mklement0
Contributor

The question is what do Bash users expect?

They expect to pass a list (array) - whose semantics are known to the target utility only - as either a single whitespace-less argument - e.g., to pass column names pid and comm to utility ps via its -o option:
ps -o pid,comm
or as a quoted argument, if the value contains whitespace or other shell metacharacters - e.g.,
ps -o 'pid, comm'

To Bash it is just a single argument in either case.


So, at least with how the PowerShell CLI currently works, the answer is again:

  • quoting when passing - even when calling from within PowerShell (see below)
  • combined with splitting the single string into the embedded elements inside the PowerShell script
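
The receiving side of that pattern, sketched here in shell rather than PowerShell purely as an analogue: the caller passes one quoted string, and the receiver splits it into elements itself. `split_list` is a hypothetical helper, and the comma-plus-optional-spaces format is an assumption taken from the ps example above.

```shell
# Hypothetical helper: print each comma-separated element of $1 on its own line,
# trimming the spaces around each element.
split_list() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//; s/ *$//'
}

# The caller quotes the list so it arrives as a single argument.
elements=$(split_list 'pid, comm')
element_count=$(printf '%s\n' "$elements" | wc -l | tr -d ' ')

echo "$element_count elements:"
printf '%s\n' "$elements"
```

A PowerShell script would do the equivalent split on its own parameter value; the point is only that the list structure is recovered by the receiver, not by the shell.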

From within PowerShell, if you call an external utility - including a shebang-line PS script - with an array argument:

  • if what would normally be an array in PS has no embedded whitespace (e.g., 1,2 or 'a','b'), it is NOT treated as an array and passed as a single argument.

  • If the array elements are space-separated, they turn into individual arguments, by virtue of converting the array to a space-separated list of its elements (e.g., 1, 2 turns into 1 2, seen by the target utility as separate arguments 1 and 2).

@al-ign
al-ign commented Oct 31, 2017

Oh, at last I found this issue.
There is the same problem (at least as I see it) with invoking native commands with variables containing special symbols in options/arguments.

Here is a real-life example where globbing interferes when it shouldn't: SELinux file context management through semanage (of course I learned this the hard way, in the middle of writing a deployment script).
I wrote this 'mini-test' to demonstrate it.
I have this behavior on PS 6.0.0-beta.9 on CentOS 7 1611

"make sure you have semanage, if not - run 'yum --assumeyes install policycoreutils-python'" 
"recreating test directory" 
if (test-path /testdata -ea 0) { Remove-Item /testdata -Force -Recurse }; New-Item /testdata/testdir1 -ItemType Directory
"show current selinux context" 
ls -lZ /testdata 

"Expected behavior:"
"testing context changing using 'stop-parsing --%' symbol" 
semanage --% fcontext --add -t httpd_sys_rw_content_t "/testdata/testdir1(/.*)?"
restorecon --% -R /testdata/testdir1
"we should see context changed to httpd_sys_rw_content_t" 
ls -lZ /testdata 
"restoring default context" 
semanage --% fcontext --delete "/testdata/testdir1(/.*)?"
restorecon --% -R /testdata/testdir1
ls -lZ /testdata 

"Actual behavior:"
"And now we try to execute same command using PS variables" 
$contextpath = '/testdata/testdir1(/.*)?'
semanage fcontext -a -t httpd_sys_rw_content_t $contextpath
"And what if we try to enclose path in double-quotes?" 
$contextpath = '"/testdata/testdir1(/.*)?"'
semanage fcontext -a -t httpd_sys_rw_content_t $contextpath
"What about escaping?" 
$contextpath = "/testdata/testdir1`(`/`.`*`)`?"
semanage fcontext -a -t httpd_sys_rw_content_t $contextpath

"Workaround:"
"1) write resulting invoke to script file, chmod +x, invoke bash file"
"2) Use Start-process, which doesn't capture command output, which brings another PITA to solve:"
$contextpath = '/testdata/testdir1(/.*)?'
$semanageArgs = @(
'fcontext'
'-a' 
'-t'
'httpd_sys_rw_content_t'
$contextpath
)
Start-Process -FilePath semanage -ArgumentList $semanageArgs -Wait 
restorecon --% -R /testdata/testdir1
ls -lZ /testdata

@mklement0
Contributor
mklement0 commented Oct 31, 2017

I think at this point there is agreement that "Unix-native" globbing in PowerShell is broken, but we don't know yet how it will be fixed.

It's been laid out here how POSIX-like shells handle globbing: they decide whether to apply globbing based on the distinction between quoted and unquoted tokens, and anyone with Unix shell-scripting experience is aware of that.
Furthermore, even unquoted variable references are subject to globbing (e.g., in Bash:
var='*.txt'; echo $var # globbing happens, because $var is unquoted)
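
A slightly fuller sketch of that variable case, runnable in Bash or another POSIX-like shell. The scratch directory, file names, and `count_args` helper are made up for the example.

```shell
# Scratch directory with two known files.
cd "$(mktemp -d)" || exit 1
touch a.txt b.txt

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

var='*.txt'                       # the assignment stores the literal pattern

unquoted=$(count_args $var)       # unquoted reference: the value is globbed
quoted=$(count_args "$var")       # quoted reference: the value is passed literally

echo "unquoted=$unquoted quoted=$quoted"
```

The unquoted reference yields two arguments (the matching files) and the quoted one yields a single literal `*.txt`.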

Both concepts are alien to PowerShell, where

  • *.txt and '*.txt' are treated the same.
  • the distinction between $var and "$var" exists, but is entirely unrelated to globbing (and the need to pass a value with embedded whitespace as a single argument); it merely forces stringification.

Two worlds collide here, and something's gotta give.

Adopting the quoted-vs.-unquoted distinction at least for literal unquoted tokens for calls to external utilities seems like a reasonable compromise to me, but perhaps there's a different solution - we have yet to hear from the powers that be. (a fix is underway)

As an aside re --%:

--% is not the answer, not only because you then cannot use PS variables, but also because it was designed for Windows and still behaves exclusively that way:

  • Because it doesn't treat single quotes as syntactic elements, they become part of the argument to pass: /bin/echo --% 'foo, bar' results in 2 arguments, 'foo, and bar' - note the embedded single quotes.

  • It will expand cmd-style environment variable references even on Unix (e.g., %HOME%), yet doesn't recognize Bash-style ones (e.g., $HOME).

A decision was made not to adapt --% to Unix (whether as --% or with a distinct name) -
see #3733 (comment)

@iSazonov
Collaborator

@mklement0 Could you please review #5188?

@mklement0
Contributor

@iSazonov Oops! Sorry I missed that a fix is already underway - will take a look.

lzybkr added a commit to lzybkr/PowerShell that referenced this issue Nov 1, 2017
Also fix some minor issues with exceptions being raised when resolving
the path - falling back to no glob.

Fix: PowerShell#3931 PowerShell#4971
lzybkr added a commit that referenced this issue Nov 1, 2017
Also fix some minor issues with exceptions being raised when resolving
the path - falling back to no glob.

Fix: #3931 #4971
@iSazonov iSazonov added the Resolution-Fixed The issue is fixed. label Nov 12, 2017