Allow non-default encodings to be used in user's script/code by daxian-dbw · Pull Request #18605 · PowerShell/PowerShell
Allow non-default encodings to be used in user's script/code #18605

Merged 2 commits on Dec 7, 2022

Conversation

@daxian-dbw (Member) commented Nov 17, 2022

PR Summary

Fix #18537

Fix the regression to allow non-default encodings to be used in user's script/code.

Move the registration of CodePagesEncodingProvider to the static constructor of AutomationEngine, so that it runs before the AutomationEngine instance constructor. That constructor calls into iss.Bind(...), which may run user code.

Also, almost right after creating an AutomationEngine instance in LocalRunspace.DoOpenHelper(), InitialSessionState.BindRunspace will be called, which will try loading modules (could be custom modules) that are specified in the InitialSessionState instance.
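
The ordering argument above can be sketched in C#. This is a minimal illustration, not the actual PowerShell source: EngineSketch is a hypothetical stand-in for AutomationEngine, and the userCode delegate is a hypothetical stand-in for whatever iss.Bind(...) ends up running. It assumes a modern .NET runtime where CodePagesEncodingProvider is available without an extra package.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for AutomationEngine (not the real class).
internal sealed class EngineSketch
{
    // An explicit static constructor runs once, before the first instance
    // is created or any static member of the type is referenced.
    static EngineSketch()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    internal EngineSketch(Action userCode)
    {
        // Stand-in for iss.Bind(...), which may run user code.
        // By this point the provider is already registered.
        userCode();
    }
}

internal static class EngineSketchDemo
{
    private static void Main()
    {
        // "User code" that needs a code-page encoding during construction.
        var engine = new EngineSketch(
            () => Console.WriteLine(Encoding.GetEncoding("IBM437").WebName));
    }
}
```

Because the registration sits in the static constructor, it does not matter which code path first touches the type; the provider is in place before any instance constructor body executes.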

In fact, Encoding.GetEncoding('IBM437') doesn't work consistently as of today (7.2 or 7.3). Here is a simple repro:

## Running this from 7.2.x or 7.3.0 fails because no module loading gets triggered, and thus, the call
## to `GetDefaultEncoding()` doesn't happen as early as when `PSReadLine` is auto-loaded at startup.
pwsh -noprofile -c "[System.Text.Encoding]::GetEncoding('IBM437').WebName"

You will get this failure:

[Image: screenshot of the resulting error]

So, this PR makes the behavior consistent and predictable.

PR Context

The root cause of the regression is:

  • Before "Replace UTF8Encoding(false) with Encoding.Default part 2" (#18356), Encoding.RegisterProvider was called the first time the default encoding was retrieved by GetDefaultEncoding(), which happens when the PSReadLine module is loaded on startup. So, the encoding provider was registered early enough for GetEncoding("IBM437") to work in user script/code.
  • After #18356, Encoding.RegisterProvider is not called until GetOEMEncoding() is called, which happens quite late: only when you run a cmdlet that accesses EncodingConversion. So, the encoding provider is not yet registered when GetEncoding("IBM437") is used in user script/code.
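
The effect of late registration described in the two points above can be reproduced outside PowerShell. The following is a minimal sketch, assuming a modern .NET runtime where code-page encodings such as IBM437 are only available after CodePagesEncodingProvider is registered; LazyRegistrationSketch and CanResolve are hypothetical names used for illustration only.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of PowerShell.
internal static class LazyRegistrationSketch
{
    // Returns true if the encoding name can be resolved right now.
    internal static bool CanResolve(string name)
    {
        try
        {
            Encoding.GetEncoding(name);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown when the name is not a supported encoding name.
            return false;
        }
    }

    private static void Main()
    {
        // Before registration: user code that runs at this point is
        // expected to fail to resolve IBM437.
        Console.WriteLine(CanResolve("IBM437"));

        // The registration that, after #18356, only happened late.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // After registration the same lookup succeeds.
        Console.WriteLine(CanResolve("IBM437"));
    }
}
```

Whether user code succeeds therefore depends purely on whether it runs before or after the registration call, which is why moving the call earlier restores the previous behavior.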

@iSazonov (Collaborator) commented

I did not find any API outside of AutomationEngine that could depend on Encoding.RegisterProvider; nevertheless, please confirm.

@daxian-dbw (Member, Author) commented Nov 17, 2022

I called it out in the PR description: iss.Bind(...), which is called in the AutomationEngine constructor, may run user code (binding providers, etc.).

PowerShell doesn't depend on non-default encodings except for the OEM encoding, but user code may depend on them.
Updated the PR description to add more information about the context and the fix.

@iSazonov (Collaborator) commented Nov 18, 2022

@daxian-dbw Please look at the macOS CI failures. Do they come from #18567?

@pull-request-quantifier-deprecated

This PR has 17 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Small
Size       : +10 -7
Percentile : 6.8%

Total files changed: 5

Change summary by file extension:
.cs : +5 -7
.ps1 : +5 -0

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

@daxian-dbw (Member, Author) commented

The macOS CI somehow uses new test changes from a different PR that was merged yesterday. I have rebased the code, and the CI runs should hopefully pass now.

@iSazonov (Collaborator) commented

@SeeminglyScience Please review.

@ghost added the "Review - Needed" label (The PR is being reviewed) on Nov 26, 2022

@ghost commented Nov 26, 2022

This pull request has been automatically marked as Review Needed because there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@kilasuit (Collaborator) commented

@daxian-dbw Just an FYI: I had to do more investigation into this PR than should have been needed, because you did a little bit of cleanup that was not associated with its own commit. In particular, when looking at /src/System.Management.Automation/utils/EncodingUtils.cs in the GitHub web UI, it just looks like code was removed that could not be easily accounted for, but could have been with a commit message for that file. While I get that you were likely working entirely within VS Code for this PR, please remember in future that VS Code is not the only place where we look at PRs.

@daxian-dbw (Member, Author) commented

@kilasuit Thanks for the feedback!
@SeeminglyScience Can you please review when you get a chance? Thanks!

@ghost removed the "Review - Needed" label (The PR is being reviewed) on Nov 29, 2022
@iSazonov (Collaborator) commented Dec 2, 2022

@SeeminglyScience Friendly ping!

@SeeminglyScience (Collaborator) left a comment

Sorry for the delay, LGTM!

@daxian-dbw added the "CL-Engine" label (Indicates that a PR should be marked as an engine change in the Change Log) on Dec 6, 2022
@iSazonov merged commit 0619479 into PowerShell:master on Dec 7, 2022
@iSazonov (Collaborator) commented Dec 7, 2022

Great! 👍

@daxian-dbw deleted the encoding branch on December 7, 2022, 17:21
Labels
CL-Engine (Indicates that a PR should be marked as an engine change in the Change Log), Extra Small
Development

Successfully merging this pull request may close these issues.

"IBM437 is not supported encoding name" exception when saving Excel workbook on daily PS 7.3 build
4 participants