8000 Avoid creating instances of the generated delegate helper class every time `ReplaceOperatorImpl` is called by daxian-dbw · Pull Request #14128 · PowerShell/PowerShell · GitHub
[go: up one dir, main page]

Skip to content

Uh oh!

There was an error while loading. Please reload this page.

Conversation

@daxian-dbw
Copy link
Member
@daxian-dbw daxian-dbw commented Nov 18, 2020

PR Summary

Unnecessary Allocation

In ReplaceOperatorImpl, a lambda expression is used in one of the switch case. But in the generated IL code, an instance of the generated delegate helper class is created at the beginning of the function, so even if the that specific switch case is never hit during a script execution, an instance of the helper class will be created for every -replace call.

For the particular repro in #14087, 76 MB of the delegate helper class instances got allocated in @iSazonov's profiling, which should all be avoided.

image

Here is the IL code generated for ReplaceOperatorImpl:

image

The Fix

The change in this PR moves the delegate to a local static method, so the creation of the delegate helper class instance is done in the helper method instead, and therefore that happens only if the particular swtich case is hit.

PR Checklist

@ghost ghost assigned TravisEz13 Nov 18, 2020
@daxian-dbw daxian-dbw added the CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log label Nov 18, 2020
@daxian-dbw daxian-dbw assigned iSazonov and unassigned TravisEz13 Nov 18, 2020

// Local helper function to avoid creating an instance of the generated delegate helper class
// every time 'ReplaceOperatorImpl' is invoked.
static MatchEvaluator GetMatchEvaluator(ExecutionContext context, ScriptBlock sb)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could put the method in a helper static class and save the result in an private static. So we will create the delegate only once.
I found the pattern in https://www.stevejgordon.co.uk/high-performance-logging-in-net-core.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the compiler not cache the delegate?

Copy link
Member Author
@daxian-dbw daxian-dbw Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov The delegate body for MatchEvaluator requires 2 closed-over variables from its lexically-enclosing method sb and context -- they are not passed to the delegate as paramters, and thus a generated helper class is needed.

The article you referred to is a good read, but the key perf improvement there is avoid processing the message template for every call to each of the log methods.

The key improvement here, vs. Standard logging, is that the template for the log message now only needs to be parsed once when this message is defined. In standard logging, that occurs for each call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xtqqczze The delegate here cannot be cached becuase it depends on the scirpt block sb and also the execution context context. These 2 closed-over variables make caching not useful in most cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delegate body for MatchEvaluator requires 2 closed-over variables from its lexically-enclosing method sb and context -- they are not passed to the delegate as paramters, and thus a generated helper class is needed.

Can't we pass them as parameters?
I guess the pattern I mentioned excludes delegate allocations at all.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be talking about different things. :-)
One code path where the ReplaceOperatorImpl() is called is a while cycle. There the same context and the same scriptblock but we create the delegate again and again - what if the lval is a large list?

Copy link
Member Author
@daxian-dbw daxian-dbw Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you were talking about now :)
I'm not trying to optimize for all scenarios, but the most common scenario.

To optimize for the scenario you are talking about, you will need a delegate cache with both the scirpt block and the context together as the key. But as I said previously, the cache will not be useful for most cases. You need to think about how often would $ABigList -replace "pattern", $AScriptBlock be used. Holding the cache indefinitely would be a waste of memory; adding logic to clear the cache smartly would be extra complexity. Either way, I don't think the value would be worth the effort.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8000

Either way, I don't think the value would be worth the effort.

After bad experience, I avoid such scripts. I suppose others do the same. But this does not mean that it cannot be improved. For example, the original script from the question could be rewritten using this pattern and if optimized, it could possibly run even faster than the original script.

I will open a tracking issue for the optimization.

Copy link
Member Author
@daxian-dbw daxian-dbw Nov 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, the original script from the question could be rewritten using this pattern and if optimized could possibly run even faster than the original script.

No, the original repro scirpt cannot be rewritten with $ABigList -replace "pattern", $AScriptBlock, because

  1. the script uses many different patterns
  2. the substitute strings are known, not requiring calculation based on $match. (be noted on the $AScriptBlock part)

The repro script uses -replace in a way that doesn't allocate delegate at all.

Copy link
Collaborator
@iSazonov iSazonov Nov 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am definitely in another dimension :-)

I implemented my thoughts in #14221.

@iSazonov iSazonov merged commit 44f5811 into PowerShell:master Nov 21, 2020
@daxian-dbw daxian-dbw deleted the smallmemory branch November 22, 2020 05:11
@iSazonov iSazonov added this to the 7.2.0-preview.2 milestone Nov 22, 2020
@ghost
Copy link
ghost commented Dec 15, 2020

🎉v7.2.0-preview.2 has been released which incorporates this pull request.:tada:

Handy links:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

0