Converting from windows 1252 to UTF8 · Issue #6550 · PowerShell/PowerShell · GitHub

Converting from windows 1252 to UTF8 #6550

Closed
Calimerou opened this issue Apr 3, 2018 · 8 comments
Labels
Issue-Bug: Issue has been identified as a bug in the product
Resolution-Answered: The question is answered.
WG-Cmdlets-Core: cmdlets in the Microsoft.PowerShell.Core module

Comments

@Calimerou

Steps to reproduce

Using Windows 1252 encoding, create a file "test.txt" that contains this sentence:
cette fonction doit être appelée avant l'initialisation de l'API

Try to convert the file "test.txt" from Windows 1252 to UTF8 using this script.

Param (
    [Parameter(Mandatory=$True)][String]$SourcePath
)

Get-ChildItem $SourcePath* -recurse -Include *.txt | ForEach-Object {
    $content = $_ | Get-Content

    Set-Content -PassThru $_.Fullname $content -Encoding UTF8 -Force
}

Expected behavior

In UTF8 :

cette fonction doit être appelée avant l'initialisation de l'API

Actual behavior

In UTF8:

cette fonction doit �tre appel�e avant l'initialisation de l'API

Environment data

Name                           Value
----                           -----
PSVersion                      6.1.0-preview.1
PSEdition                      Core
GitCommitId                    v6.1.0-preview.1
OS                             Microsoft Windows 6.1.7601 S
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Note

PowerShell 4.0 does not have this issue.

@mklement0
Contributor

The default encoding in PowerShell Core is now UTF-8 (without a BOM when creating files).

That means that a Windows 1252-encoded file - in the absence of a BOM defining it as such (there is none for Windows 1252) - is now interpreted as UTF-8.

The upshot is that you must now tell Get-Content what encoding to assume - unless it is UTF-8 or there is a BOM.
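
The misreading described above can be reproduced without any files. The following is a minimal sketch, not taken from the thread: it encodes the word "être" as Windows 1252 bytes and then decodes those bytes as UTF-8. The variable names are illustrative, and `[char]0xEA` stands in for "ê" so the result does not depend on how the script file itself is saved.

```powershell
# Windows 1252 encoding object; code page 1252 is assumed to be available.
$cp1252 = [System.Text.Encoding]::GetEncoding(1252)

# "être", with ê built from its code point (U+00EA).
$word = "$([char]0xEA)tre"

# In Windows 1252, ê is the single byte 0xEA.
$bytes = $cp1252.GetBytes($word)

# Decoded as UTF-8, 0xEA announces a multi-byte sequence, but the next byte
# (0x74, 't') is not a continuation byte, so the decoder substitutes U+FFFD.
$misread = [System.Text.Encoding]::UTF8.GetString($bytes)

# Decoded with the encoding the bytes were written in, the text round-trips.
$correct = $cp1252.GetString($bytes)
```

Here `$misread` begins with the replacement character that shows up as "�" in the actual behavior reported above, while `$correct` equals the original word.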

Regrettably, Get-Content doesn't currently allow you to specify Windows 1252, because Default now represents UTF-8 and no longer the active "ANSI" code page (such as Windows 1252), as on Windows PowerShell, and you cannot pass a [System.Text.Encoding] instance directly.

This is an oversight that must be corrected.

My suggestion: add an ANSI encoding enumeration value on Windows that represents the system's legacy "ANSI" code page (e.g., Windows 1252 on US-English systems).


The - cumbersome - workaround to use in the meantime requires use of the .NET framework directly:

$content = [IO.File]::ReadAllText($_.FullName, [text.encoding]::GetEncoding(1252))

Or, more generically:

$content = [IO.File]::ReadAllText($_.FullName, [text.encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage))
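
To show how this workaround might fit into a complete conversion, here is a sketch that reads a file with an explicit code page and writes it back as BOM-less UTF-8 using .NET file APIs only. The function name `Convert-FileToUtf8` and its parameters are hypothetical, and the default of 1252 is an assumption about the source files.

```powershell
# Hypothetical helper (not from the thread): re-encode one file in place
# from a legacy code page to UTF-8 without a BOM.
function Convert-FileToUtf8 {
    param(
        [Parameter(Mandatory)][string]$Path,
        [int]$SourceCodePage = 1252   # assumed source encoding
    )

    # .NET resolves relative paths against the process directory, not the
    # PowerShell location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Decode the whole file using the legacy code page.
    $source = [System.Text.Encoding]::GetEncoding($SourceCodePage)
    $text   = [System.IO.File]::ReadAllText($fullPath, $source)

    # $false = do not emit a BOM.
    $utf8NoBom = [System.Text.UTF8Encoding]::new($false)
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8NoBom)
}
```

A call such as `Convert-FileToUtf8 -Path .\test.txt` would rewrite the file in place, so it is worth trying on a copy first. Unlike Get-Content and Set-Content, this approach leaves line endings untouched.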

BrucePay added the Issue-Bug and WG-Cmdlets-Core labels Apr 3, 2018
@stknohg
Contributor
stknohg commented Apr 4, 2018

@mklement0

PowerShell Core 6.0 accepts System.Text.Encoding class in -Encoding parameter. (#5080)

We can write it as follows.

$content = $_ | Get-Content -Encoding ([System.Text.Encoding]::GetEncoding(1252))

# or

$content = $_ | Get-Content -Encoding ([System.Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage))
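
Applied to the script from the original report, this could look like the sketch below. It assumes PowerShell Core 6.0 or later (where -Encoding accepts an Encoding object), treats `$SourcePath` as a directory, and wraps the loop in a function whose name, `Convert-TextFilesToUtf8`, is hypothetical.

```powershell
# Sketch of the reporter's conversion loop with an explicit source encoding.
# Assumes PowerShell Core 6.0+; Windows PowerShell rejects an Encoding object here.
function Convert-TextFilesToUtf8 {   # hypothetical name
    param(
        [Parameter(Mandatory)][string]$SourcePath
    )

    $cp1252 = [System.Text.Encoding]::GetEncoding(1252)

    Get-ChildItem -Path $SourcePath -Recurse -File -Include *.txt | ForEach-Object {
        # Read all lines first so the file is closed before it is rewritten.
        $content = Get-Content -LiteralPath $_.FullName -Encoding $cp1252

        # In PowerShell Core, UTF8 writes without a BOM.
        Set-Content -LiteralPath $_.FullName -Value $content -Encoding UTF8 -Force
    }
}
```

Because Get-Content splits the file into lines and Set-Content writes each line followed by a newline, the output ends with a newline even if the input did not.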

Additionally, WindowsLegacy is proposed in an RFC
(but WindowsLegacy is not implemented yet...)

If compatibility is necessary, it would be better to discuss it in this RFC.


Maybe related to #5204.

@mklement0
Contributor
mklement0 commented Apr 4, 2018

@stknohg:

Ah, thanks. Somehow I had wrongly convinced myself that you couldn't directly pass a System.Text.Encoding instance - thanks for clarifying that.

I think the discussion around the linked RFC eventually led to the current Core behavior of globally defaulting to BOM-less UTF-8 - see PowerShell/PowerShell-RFC#71

The WindowsLegacy meta-setting was intended for a never-implemented $PSDefaultEncoding preference variable, and was meant to globally revert to the old, inconsistent encoding behavior for the sake of backward compatibility - an approach that I personally think is not worth pursuing.

Again, given that OEM - the OEM code page implied by the legacy system locale - already exists as a predefined encoding enumeration value, it should be complemented with an ANSI identifier for the "ANSI" code page implied by the system locale (on Windows only; the equivalent of what Default represents for Windows PowerShell).

@stknohg
Contributor
stknohg commented Apr 4, 2018

Certainly, introducing ANSI is simpler and does not apply globally, as you say.
I think it's good.

@Calimerou
Author

The workaround proposed by mklement0 works for me.
I propose to close this issue since the rest of the discussion is mainly focused on BOM-less UTF8, which is indeed covered in PowerShell/PowerShell-RFC#71.
Thanks.

@mklement0
Contributor

@Calimerou: Alternatively, we could retitle your issue and modify the initial post to propose the missing ANSI encoding-enumeration value, as discussed. If you prefer my creating a new issue instead, let me know.

@Calimerou
Author

I would prefer yours.
Thanks in advance.

@sba923
Contributor
sba923 commented Jul 18, 2019

For now, I work around this issue in my scripts as follows:

    $iswinps = ($null, 'Desktop') -contains $PSVersionTable.PSEdition
    if (!$iswinps)
    {
        $encoding = [System.Text.Encoding]::GetEncoding(1252)
    }
    else
    {
        $encoding = [Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding]::Default
    }
    
    Get-Content -Encoding $encoding ...

HTH
