Implementing a solution
There is no complete solution to argument-escaping on Windows, particularly when cmd.exe is involved. The best that can be achieved is a robust compromise that can handle most cases without introducing a set of complex rules.
From How Windows parses the command-line it is clear that the only character that might cause unexpected results is a double-quote. So we need a convention for handling these and arguments in general:
- The argument is treated as unescaped.
- Any double-quotes in an argument are escaped as literal double-quotes.
- An argument will not be enclosed in double-quotes unless absolutely necessary.
This will avoid inconsistencies when handling consecutive double-quotes and enable each argument to be included in a command-line without it affecting other items. It will also prevent double-quotes breaking batch scripts.
Having defined our convention, the steps to escape an argument are simply:
- Replace all [backslashes] double-quote with [2 x backslashes] backslash double-quote.
- If a space or tab character is found, or the argument is empty:
  - double up trailing backslashes.
  - add surrounding double-quotes.
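To make the backslash counting explicit, the two steps can be traced character by character. This is only a sketch under the convention described above; the helper name sketchEscapeWin is hypothetical and it deliberately avoids regular expressions so that each rule is visible.

```php
<?php
// Hypothetical helper: applies the two steps in a single pass over the argument.
function sketchEscapeWin(string $arg): string
{
    $out = '';
    $backslashes = 0; // length of the current run of backslashes

    foreach (str_split($arg) as $ch) {
        if ($ch === '\\') {
            $backslashes++;
            continue;
        }
        if ($ch === '"') {
            // [n backslashes]" becomes [2n backslashes]\"
            $out .= str_repeat('\\', $backslashes * 2) . '\\"';
        } else {
            // Backslashes not followed by a double-quote are left alone.
            $out .= str_repeat('\\', $backslashes) . $ch;
        }
        $backslashes = 0;
    }

    $needsQuotes = $arg === '' || strpbrk($arg, " \t") !== false;
    // Trailing backslashes are doubled only when a closing quote will follow.
    $out .= str_repeat('\\', $needsQuotes ? $backslashes * 2 : $backslashes);

    return $needsQuotes ? '"' . $out . '"' : $out;
}

echo sketchEscapeWin('C:\\dir\\'), PHP_EOL;    // C:\dir\
echo sketchEscapeWin('C:\\my dir\\'), PHP_EOL; // "C:\my dir\\"
echo sketchEscapeWin('say "hi"'), PHP_EOL;     // "say \"hi\""
```

The same steps are usually written with two regular expressions, which is more compact.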
function escapeWin($arg)
{
    // [n backslashes]" becomes [2n backslashes]\"
    $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);

    if (strpbrk($arg, " \t") !== false || $arg === '') {
        // Double up trailing backslashes so they do not escape the closing quote.
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }

    return $arg;
}

From How cmd.exe parses a command we know that meta characters have a special meaning. How we deal with these is split into the following sections:
From the point of view of cmd, all double-quotes either start or end a quoted-string, regardless of whether they are backslash-escaped. This could have unexpected consequences if there is an odd number of double quotes, or in other situations.
For example, the argument colors="red & blue" would be escaped as:
"colors=\"red & blue\""
However the & character is no longer protected by the opening double-quote, because the quoted-string has been closed by the first literal (backslash-escaped) double-quote. The result is that the argument is split at the & character and cmd tries to call a program named blue\"".
The only way to solve this is to caret-escape the whole argument, which in this case would be ^"colors=\^"red ^& blue\^"^".
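The transformation of this example can be reproduced in two stages: first the argument is quoted, then every double-quote and meta character is prefixed with a caret. The sketch below uses hypothetical helper names (sketchQuoteArg, sketchCaretEscape) and is meant only to show how the two forms of the argument are derived.

```php
<?php
// Hypothetical helper: backslash-escape literal double-quotes, then enclose
// the argument in double-quotes.
function sketchQuoteArg(string $arg): string
{
    $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);
    $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);

    return '"' . $arg . '"';
}

// Hypothetical helper: prefix every double-quote and cmd meta character
// with a caret.
function sketchCaretEscape(string $arg): string
{
    return preg_replace('/(["^&|<>()%])/', '^$1', $arg);
}

$quoted = sketchQuoteArg('colors="red & blue"');
$safe = sketchCaretEscape($quoted);

echo $quoted, PHP_EOL; // "colors=\"red & blue\""
echo $safe, PHP_EOL;   // ^"colors=\^"red ^& blue\^"^"
```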
Environment variable expansion is triggered by the %...% and !...! syntax, regardless of the quoted-string state. Therefore we need to caret-escape the whole argument.
However we cannot do this for exclamation-marks. These require an escape sequence of two carets ^^!, due to the two step parsing that cmd performs, and we have no way of knowing the DelayedExpansion state (other than it is disabled by default):
- If enabled, an escaped ^^!var^^! will be transformed to !var! as intended.
- If disabled, an escaped ^^!var^^! will be transformed to ^!var^! and introduce two unintended carets.
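The two outcomes can be illustrated with a deliberately simplified model. It assumes that each parsing step removes one level of caret-escaping and that the second step only runs when DelayedExpansion is enabled. This is not how cmd is implemented, and the function names are hypothetical.

```php
<?php
// Simplified model only: one parsing step turns every "^x" pair into "x".
function sketchCaretPass(string $text): string
{
    return preg_replace('/\^(.)/s', '$1', $text);
}

// Hypothetical model of the two-step parse; the second step is assumed to
// run only when DelayedExpansion is enabled.
function sketchCmdView(string $text, bool $delayedExpansion): string
{
    $text = sketchCaretPass($text);

    return $delayedExpansion ? sketchCaretPass($text) : $text;
}

echo sketchCmdView('^^!var^^!', true), PHP_EOL;  // !var!
echo sketchCmdView('^^!var^^!', false), PHP_EOL; // ^!var^!
```

Because the caller cannot know which of the two branches applies, no single escape sequence gives the right result in both.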
These are the characters that have not yet been accounted for: ^ & | < > ( )
Since they have no special meaning inside a quoted-string (and we know there are no double-quotes to confuse the quoted-string state) we have two choices:
- Do nothing if there is whitespace in the argument (because these meta characters will be escaped by the enclosing double-quotes).
- Enclose the argument in double-quotes if it contains any of these meta characters.
Note that we do not use caret-escaping in case we come up against its single limitation.
We can condense the above into the following rules:
- If an argument contains double-quotes or %...% syntax, the transformed argument must be caret-escaped.
- Otherwise, if it does not contain whitespace but does contain meta characters, it will be enclosed in double-quotes.
- The ! meta character is not escaped because it cannot be handled reliably.
We need to set the following flags:
- Set quote to true if a space or tab character is found, or the argument is empty.
- Set dquotes to true if a double-quote character is found.
- Set meta to true if dquotes is true or two % characters surround other characters.
  - We need to caret-escape everything, including any enclosing double-quotes.
- If meta and quote are false, set quote to true if any ^&|<>() characters are found.
  - We can safely escape these characters using the surrounding double-quotes.
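As a quick check of these flags, the sketch below computes them for a handful of sample arguments and prints which ones are set. The helper name sketchFlags and the sample values are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical helper that only computes the three flags described above.
function sketchFlags(string $arg): array
{
    $quote = $arg === '' || strpbrk($arg, " \t") !== false;
    $dquotes = strpos($arg, '"') !== false;
    // Two % characters with at least one other character between them.
    $meta = $dquotes || preg_match('/%[^%]+%/', $arg) === 1;

    if (!$meta && !$quote) {
        // No caret-escaping is needed, so double-quotes can protect these.
        $quote = strpbrk($arg, '^&|<>()') !== false;
    }

    return ['quote' => $quote, 'dquotes' => $dquotes, 'meta' => $meta];
}

foreach (['plain', 'a b', 'a&b', '%PATH%', 'say "hi"', '100%'] as $arg) {
    $set = array_keys(array_filter(sketchFlags($arg)));
    printf("%-10s %s\n", $arg, implode(',', $set) ?: '(none)');
}
```

Note that a lone % (as in 100%) sets no flag, because the pattern requires a second % with other characters in between.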
Now we can perform the escaping:
- If dquotes is true:
  - Replace all [backslashes] double-quote with [2 x backslashes] backslash double-quote.
- If quote is true:
  - double up trailing backslashes.
  - add surrounding double-quotes.
- If meta is true:
  - escape all "^&|<>()% characters with a caret ^.
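function escapeCmdExe($arg)
{
    // Set the flags.
    $quote = strpbrk($arg, " \t") !== false || $arg === '';
    $dquotes = strpos($arg, '"') !== false;
    $meta = $dquotes || preg_match('/%[^%]+%/', $arg);

    if (!$meta && !$quote) {
        // Meta characters can be protected by enclosing double-quotes.
        $quote = strpbrk($arg, '^&|<>()') !== false;
    }

    if ($dquotes) {
        // [n backslashes]" becomes [2n backslashes]\"
        $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);
    }

    if ($quote) {
        // Double up trailing backslashes, then add surrounding double-quotes.
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }

    if ($meta) {
        // Caret-escape everything, including any enclosing double-quotes.
        $arg = preg_replace('/(["^&|<>()%])/', '^$1', $arg);
    }

    return $arg;
}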
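The following usage sketch runs a few sample arguments through the steps above. So that it runs on its own, it carries a compact copy of the function under the hypothetical name sketchEscapeCmdExe; the sample arguments are illustrative.

```php
<?php
// Compact copy of the steps above so that this snippet is self-contained;
// the name sketchEscapeCmdExe is hypothetical.
function sketchEscapeCmdExe(string $arg): string
{
    $quote = strpbrk($arg, " \t") !== false || $arg === '';
    $dquotes = strpos($arg, '"') !== false;
    $meta = $dquotes || preg_match('/%[^%]+%/', $arg);

    if (!$meta && !$quote) {
        $quote = strpbrk($arg, '^&|<>()') !== false;
    }
    if ($dquotes) {
        $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);
    }
    if ($quote) {
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"' . $arg . '"';
    }

    return $meta ? preg_replace('/(["^&|<>()%])/', '^$1', $arg) : $arg;
}

$samples = ['plain', 'a b', 'a&b', '%PATH%', 'colors="red & blue"', 'wow!'];
foreach ($samples as $arg) {
    echo sketchEscapeCmdExe($arg), PHP_EOL;
}
// plain
// "a b"
// "a&b"
// ^%PATH^%
// ^"colors=\^"red ^& blue\^"^"
// wow!
```

The last sample passes through unchanged, which reflects the decision not to escape the ! character.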