-
Notifications
You must be signed in to change notification settings - Fork 7.6k
Fix Default/OEM encoding behavior PowerShell Core #3467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
36ab1e2
to
9580b4a
Compare
I was able to build locally with |
@@ -7569,8 +7569,7 @@ private static Encoding GetEncodingFromEnum(FileSystemCmdletProviderEncoding typ | |||
|
|||
case FileSystemCmdletProviderEncoding.Oem: | |||
{ | |||
uint oemCP = NativeMethods.GetOEMCP(); | |||
encoding = System.Text.Encoding.GetEncoding((int)oemCP); | |||
encoding = ClrFacade.GetOEMEncoding(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NativeMethods.GetOEMCP
in this file can be removed if it's no longer used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used in ClrFacade.GetOEMEncoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClrFacade
has its own copy of GetOEMEncoding
, so the one in FileSystemProvider.cs
can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
#if CORECLR | ||
|
||
#if UNUX // PowerShell Core on Unix | ||
s_defaultEncoding = Encoding.GetEncoding(65001); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using Encoding.UTF8
? Isn't 65001 the same as utf8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
#if UNUX // PowerShell Core on Unix | ||
s_defaultEncoding = Encoding.GetEncoding(65001); | ||
#else // PowerShell Core on Windows | ||
uint aCp = NativeMethods.GetACP(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename the variable to currentAnsiCp
? It's helpful to know what GetACP
does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// We will revisit this if it causes any failures when running tests on Core PS. | ||
s_defaultEncoding = Encoding.GetEncoding(28591); | ||
#else | ||
#if CORECLR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if/def pattern used in other places is
#if UNIX
// unix behavior
#elif CORECLR
// win-core behavior
#else
// win-full behavior
#endif
Could you please follow the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
uint aCp = NativeMethods.GetACP(); | ||
if (s_oemEncoding == null) | ||
{ | ||
Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to separate this into a method to reduce code duplication?
@@ -17,6 +17,7 @@ internal static class PinvokeDllNames | |||
internal const string QueryDosDeviceDllName = "api-ms-win-core-file-l1-1-0.dll"; /*1*/ | |||
internal const string CreateSymbolicLinkDllName = "api-ms-win-core-file-l2-1-0.dll"; /*2*/ | |||
internal const string GetOEMCPDllName = "api-ms-win-core-localization-l1-2-0.dll"; /*3*/ | |||
internal const string GetACPDllName = "api-ms-win-core-localization-l1-2-0.dll"; /*3*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The /*3*/
at the end is important to track if there is a mismatch. Please add this API to the end of the list and use an incremented number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -139,6 +140,7 @@ internal static class PinvokeDllNames | |||
internal const string QueryDosDeviceDllName = "kernel32.dll"; /*1*/ | |||
internal const string CreateSymbolicLinkDllName = "kernel32.dll"; /*2*/ | |||
internal const string GetOEMCPDllName = "kernel32.dll"; /*3*/ | |||
internal const string GetACPDllName = "kernel32.dll"; /*3*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -555,10 +564,20 @@ internal static Encoding GetOEMEncoding() | |||
{ | |||
if (s_oemEncoding == null) | |||
{ | |||
#if CORECLR // The OEM code page '437' is not supported by CoreCLR. | |||
// Use the default encoding (ISO-8859-1, CodePage 28591) as the OEM encoding in OneCore powershell. | |||
#if CORECLR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here regarding the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
uint oemCp = NativeMethods.GetOEMCP(); | ||
if (s_defaultEncoding == null) | ||
{ | ||
Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here regarding the code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Why are you using |
@@ -85,7 +85,7 @@ | |||
<PackageReference Include="System.ServiceModel.Primitives" Version="4.3.0" /> | |||
<PackageReference Include="System.ServiceModel.Security" Version="4.3.0" /> | |||
<PackageReference Include="System.ServiceProcess.ServiceController" Version="4.3.0" /> | |||
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.3.0" /> | |||
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.0.11" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change it to 4.0.11
? I don't even find such a version on nuget.org
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With 4.0.11 I can compile locally on my computer. Now I changed to 4.0.1 but failed too.
EncodingRegisterProvider() | ||
|
||
uint currentAnsiCp = NativeMethods.GetACP(); | ||
s_defaultEncoding = Encoding.GetEncoding((int)currentAnsiCp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Encoding.Default
is coming back in netstandard2.0, are we going to change to it at that point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have RFC for that. No we only recover "Windows PowerShell" behavior.
@@ -530,17 +530,21 @@ internal static object[] GetCustomAttributes<T>(Assembly assembly) | |||
#region Encoding | |||
|
|||
/// <summary> | |||
/// Facade for Encoding.Default | |||
/// Facade for getting Encoding.Default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on what we will do when Encoding.Default
is back in netstandard2.0. If we are not going to change to Encoding.Default
in powershell core, then the comment should be Facade for getting default encoding
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
|
||
uint currentAnsiCp = NativeMethods.GetACP(); | ||
s_defaultEncoding = Encoding.GetEncoding((int)currentAnsiCp); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the extra blank line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
uint oemCp = NativeMethods.GetOEMCP(); | ||
s_oemEncoding = Encoding.GetEncoding((int)oemCp); | ||
|
||
#else // Windows PowerShell | ||
uint oemCp = NativeMethods.GetOEMCP(); | ||
s_oemEncoding = Encoding.GetEncoding((int)oemCp); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't notice the duplicate code between coreclr and fullclr. In this case, maybe the following is better?
#if UNIX
s_oemEncoding = GetDefaultEncoding();
#else
#if CORECLR
EncodingRegisterProvider()
#endif
uint oemCp = NativeMethods.GetOEMCP();
s_oemEncoding = Encoding.GetEncoding((int)oemCp);
#end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried - it is difficult to read. In compile time no duplicate code is. So maybe leave as is?
uint oemCp = NativeMethods.GetOEMCP(); | ||
s_oemEncoding = Encoding.GetEncoding((int)oemCp); | ||
#endif | ||
} | ||
return s_oemEncoding; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra line, or move private static volatile Encoding s_defaultEncoding;
here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved.
private static volatile Encoding s_oemEncoding; | ||
|
||
private static EncodingRegisterProvider() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method should be enclosed with #if CoreCLR, as it won't build for fullclr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
{ | ||
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
{ | ||
if (s_defaultEncoding == null && s_oemEncoding == null) | ||
{ | ||
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible this line will be called more than once when there are multiple runspace. Will it fail at the second call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First I put the call in GetDefaultEncoding and GetOEMEncoding without "if" and second call was well.
@daxian-dbw |
@JamesWTruher @mklement0, you have been closely following up with this issue (and related), could you please take a look at the fix? |
s_defaultEncoding = Encoding.GetEncoding(28591); | ||
#else | ||
#if UNUX // PowerShell Core on Unix | ||
s_defaultEncoding = Encoding.UTF8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a mistake to default to a UTF-8 encoding with BOM.
While .NET Core, at least currently, also reports the same BOM-based encoding - unofficially in Encoding.Default
(not part of the contract yet) and officially in Encoding.GetDefaultEncoding(0)
(and I consider that misguided too) -
at least it doesn't govern the actual default behavior in the sense of what happens if no encoding is specified. The methods of type System.IO.File
default to BOM-less UTF-8 - both in .NET Framework since its inception and in .NET Core.
It think it's imperative that we use a BOM-less UTF-8 encoding in the Unix world (and, I would argue, even make that the default on Windows (Core) as well, with opt-in to legacy behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -85,7 +85,7 @@ | |||
<PackageReference Include="System.ServiceModel.Primitives" Version="4.3.0" /> | |||
<PackageReference Include="System.ServiceModel.Security" Version="4.3.0" /> | |||
<PackageReference Include="System.ServiceProcess.ServiceController" Version="4.3.0" /> | |||
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.3.0" /> | |||
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.0.1" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use 4.3.0
? The type System.Text.CodePagesEncodingProvider
is available in that package.
You need to update SMA.csproj to add this package reference since that's the project depending on it. Then you can remove this entry from Microsoft.PowerShell.SDK.csproj.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks! It is solved the problem and PowerShell is builded well.
But currently Travis is failed on test phase with
Import-Module : Unable to load DLL 'api-ms-win-core-localization-l1-2-0.dll'
Could you comment please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if UNUX
As Steve pointed out, this is why 😄
@daxian-dbw, @JamesWTruher : I know this debate is cumbersome, but I think it's a very important one to have, and I don't think we're ready to move ahead yet - I've just added further thoughts here. |
// We will revisit this if it causes any failures when running tests on Core PS. | ||
s_defaultEncoding = Encoding.GetEncoding(28591); | ||
#else | ||
#if UNUX // PowerShell Core on Unix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be UNIX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh... fixed.
@@ -135,6 +135,7 @@ internal static class PinvokeDllNames | |||
internal const string CreateToolhelp32SnapshotDllName = "api-ms-win-core-toolhelp-l1-1-0"; /*120*/ | |||
internal const string Process32FirstDllName = "api-ms-win-core-toolhelp-l1-1-0"; /*121*/ | |||
internal const string Process32NextDllName = "api-ms-win-core-toolhelp-l1-1-0"; /*122*/ | |||
internal const string GetACPDllName = "api-ms-win-core-localization-l1-2-0.dll"; /*123*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API set matches the API on my laptop. I don't have access to a Nano VM at the moment to verify it there. On Nano, it must match the exporting API set exactly. You can verify it with dumpbin.exe /exports <path to api set>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I don't run still dumpbin.exe /exports
MSDN docs say about:
- Nano api-ms-win-core-localization-l1-2-2.dll https://msdn.microsoft.com/en-us/library/mt588480%28v=vs.85%29.aspx?#api-ms-win-core-localization-l1-2-2.dll
- Desktop api-ms-win-core-localization-l1-2-1.dll https://msdn.microsoft.com/en-us/library/windows/desktop/dn933214(v=vs.85).aspx
What version we should use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetACP
is in api-ms-win-core-localization-l1-2-0.dll
of runtime.win7-x64.Microsoft.NETCore.Windows.ApiSets
dump.
But It seems Nano use runtime.win10-x64.Microsoft.NETCore.Windows.ApiSets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is correct "as-is" here. No change needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Clear.
Is Nano still under question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Nano Server contains the 1-2-0 version of the API set
3a8d68b
to
0f04279
Compare
@daxian-dbw @SteveL-MSFT CI have passed - many thanks! Sorry for many typos (make the PR in the weekend night was bad my idea). Please continue code review. @mklement0 Can you build the branch and check our expectations? If no, could you please attach files with your tests? |
@iSazonov: I built on my macOS 10.12 machine and I can confirm that the following ad-hoc tests return true on macOS (and, judging by the source, therefore all Unix platforms), as desired: # Setup: *create* a BOM-less UTF-8 file.
'ö' | set-content -nonewline /tmp/$pid.txt
# Tests: Both should output $True
# Compare the raw bytes of the new file to the UTF-8 encoding of 'ö' (0xc3 0xb6)
# With the current alpha17, this would return $False, because Set-Content creates an
# ISO-8859-1 file.
$null -eq (Compare-Object (Get-Content -Encoding Byte -Raw /tmp/$pid.txt) (0xc3, 0xb6))
# See if the BOM-less UTF-8 file is *read* correctly.
'ö' -eq (Get-Content -Raw /tmp/$pid.txt) (I wasn't able to run the tests - the
) |
@mklement0 Thanks for confirmation! It means the code is step in right direction! All tests have passed on CI. I believe "New-PSSession basic test" too and you can ignore the error in context of the PR (or try discover a root of the error: copy-past test code in your session and see $error[0] - I think you don't install any package). |
I added comment about tests in first post. |
Great idea, but I suggest, to avoid ambiguity, that you explicitly mention in your comment that this PR implements BOM-less UTF-8, especially given that your comment links to the .NET Core To add some more fodder for upcoming tests, here's a simple one that tests whether PowerShell itself correctly reads BOM-less UTF-8 source code (such as scripts) by default in PS Core on Unix: # Setup:
# Create a no-BOM UTF-8 file with the following literal content:
# 'ö'
# which, when executed as a PS script, should echo 'ö' back, IF the script
# was correctly decoded from UTF-8 by PS.
# UTF-8 bytes: 0x27 (single quote), 0xc3 0xb6 (encoding of 'ö', U+00F6), 0x27 (single quote)
[byte[]] (0x27, 0xc3, 0xb6, 0x27) | Set-Content -Encoding Byte /tmp/$PID.ps1
# Test: See if the 'ö' is echoed back correctly.
# Should return $True
'ö' -eq (& /tmp/$PID.ps1) |
@mklement0 I corrected first post about BOM-less, in the code it is obvious. |
@iSazonov are you going to add tests to this PR? |
I added in first post:
I am confused. Partial test is meaningless. The full test also does not make sense until the approval of the Encoding RFC. What is compromise? |
@iSazonov the code changes look good to me. If the corresponding tests will come later, then we should keep that work item tracked in an issue. Is #3248 considered fixed with this PR? If not, the test work item should be tracked in the first post of that issue, similar to how you are tracking |
@daxian-dbw Tracking Issue #3488. |
@iSazonov Thank you! I will merge this PR later to 4D1F day. |
Based on conclusion in discussion #3248
We need internal fix because .Net Core haven't "OEM" term.
The fix should provide PowerShell Core behavior:
on Windows - as Windows PowerShell based on GetOEMCP() and legacy encodings (System.Text.Encoding.CodePages)
on Unix - the same as default encoding in PowerShell Core
- Default
We need internal fix because CoreFX default for all platforms is UTF8
Until we have completed Encoding RFC we should restore behaivor as in Windows PowerShell.
The fix should provide PowerShell Core behavior:
on Windows - as Windows PowerShell based on GetACP() and legacy encodings (System.Text.Encoding.CodePages)
on Unix - UTF-8 without BOM (We are expecting that the same will be in CoreFX)
- Tests
Now we don't have a full set of encoding tests (only for redirections). I believe we need to create them during future RFC implementation, not now.