This arose from #3008(comment) where the slow performance of opening numerous runspaces that import modules was noted as a barrier to a generally-performant implementation of Invoke-Parallel and parallel ForEach-Object. From what I can tell, this slow performance is a barrier to generally-performant PowerShell everywhere scriptblocks are invoked concurrently within a process. One such example is the certificate validation scriptblock in #4970, which (as best I can tell) performs whatever CPU-bound work is necessary to import whatever modules are used for certificate validation each time the http client invokes HttpClientHandler.ServerCertificateCustomValidationCallback.
The root performance limiter
Currently, the performance of an implementation involving concurrent scriptblocks is limited by contention amongst threads importing a common script module in parallel. That limitation is detailed in #7035.
What is currently possible
Despite #7035, it is possible to achieve some useful concurrency with slow-loading modules using the current Runspace implementation. In order to achieve this, the following is necessary:
- Open each runspace and import the required modules into it in anticipation of its use. The best performance is (currently) achieved by importing modules into runspaces one runspace at a time because of the contention problems.
- Reset and re-use the runspaces with the imported modules. Note that there are limitations to the degree to which a runspace can be reset, so you should expect some cross-talk between invocations as a result.
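The two steps above can be sketched roughly as follows. This is a minimal illustration, not the experimental Invoke-Parallel implementation mentioned in this issue: the module list, pool size, work scriptblock, and batch-style scheduling are all assumptions chosen to keep the example short.

```powershell
# Assumed values for illustration; substitute your own slow-loading modules.
[string[]]$modules = @('Microsoft.PowerShell.Utility')
$poolSize = 4
$work = 'param($x) $x * 2'   # hypothetical stand-in for the real scriptblock

# Step 1: open runspaces one at a time so module imports never contend.
$iss = [initialsessionstate]::CreateDefault()
$iss.ImportPSModule($modules)
$pool = foreach ($i in 1..$poolSize) {
    $rs = [runspacefactory]::CreateRunspace($iss)
    $rs.Open()               # the modules are imported here, serially
    $rs
}

# Step 2: run the items in batches, re-using the warmed runspaces.
$items = 1..10
$results = for ($start = 0; $start -lt $items.Count; $start += $poolSize) {
    $end = [Math]::Min($start + $poolSize, $items.Count) - 1
    $batch = @($items[$start..$end])
    $running = @(for ($k = 0; $k -lt $batch.Count; $k++) {
        $ps = [powershell]::Create()
        $ps.Runspace = $pool[$k]
        $null = $ps.AddScript($work).AddArgument($batch[$k])
        [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
    })
    foreach ($r in $running) {
        $r.Shell.EndInvoke($r.Handle)      # emits this invocation's output
        $rs = $r.Shell.Runspace
        $r.Shell.Dispose()                 # does not close the shared runspace
        # Defensive wait: ResetRunspaceState() throws unless the runspace is idle.
        while ($rs.RunspaceAvailability -ne 'Available') { Start-Sleep -Milliseconds 10 }
        $rs.ResetRunspaceState()           # partial reset only; some state can leak
    }
}

$pool | ForEach-Object { $_.Dispose() }
$results -join ','
```

A real scheduler would hand out each runspace as soon as it becomes free rather than waiting for a whole batch, but the warm-once, reset, re-use shape is the same.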
I applied this strategy in an experimental implementation of Invoke-Parallel. The processing of a CPU-bound workload on 8 cores is shown in the linked gif.
In this example, Invoke-Parallel is processing 20 items through the same scriptblock, and the scriptblock performs 10 operations on each item. You can see that runspaces are opened one at a time and used as they become available. They are re-used as each concurrent scriptblock invocation completes, so parallelism increases as each runspace becomes available.
Using this technique to parallelize unit tests on my 16-core computer took 40 seconds to open all the runspaces and reach full parallelism. This is simply because it takes 40 seconds to import the test framework module and module under test 16 times.
What is not currently possible
As best I can tell, it is currently not possible to open runspaces with imported modules within a single process any faster than single-threaded. The problem with this is that if there is a demand for many concurrent scriptblocks that use a slow-loading module, the last scriptblock can be waiting in line for a runspace with imported modules for quite a while before it can start to execute. This occurs easily when you want to invoke in parallel a scriptblock involving a blocking call and a module that is slow loading.
For example, suppose you want to run a scriptblock that involves the following:
- a module that takes 1 second to import
- a 20-second call to Invoke-WebRequest
Suppose you want to invoke that scriptblock 20 times with different parameters. Ideally this would all take around 20 seconds to complete, but with the current runspace implementation it would take 40 seconds: 20 seconds to import the module 20 times, once into each of the 20 runspaces, and another 20 seconds for the last runspace to complete the 20th call to Invoke-WebRequest.
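The arithmetic behind those figures can be written out as a small model. It is only a sketch: it assumes imports are strictly serialized, that each runspace starts its call the moment its own import finishes, and that the blocking calls overlap without slowing each other down.

```powershell
$importSeconds = 1    # time to import the module into one runspace
$callSeconds   = 20   # duration of the blocking Invoke-WebRequest call
$count         = 20   # number of scriptblock invocations

# Serialized imports: the k-th runspace is ready after k imports,
# then spends $callSeconds on its call.
$finishTimes = foreach ($k in 1..$count) { $k * $importSeconds + $callSeconds }
$serialTotal = ($finishTimes | Measure-Object -Maximum).Maximum

# If all imports could proceed in parallel without contention,
# every invocation would finish after one import plus one call.
$parallelTotal = $importSeconds + $callSeconds

'serialized imports: {0} s, parallel imports: {1} s' -f $serialTotal, $parallelTotal
```

Under these assumptions the serialized case comes to 40 seconds and the contention-free case to 21 seconds, which is the "around 20 seconds" ideal described above.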
Runspace features that would support improved performance
It seems like the following features would support improved performance for concurrent scriptblocks:
1. Relieve the contention that results when multiple threads attempt to import the same module.
2. Rearrange when and how compilation of script modules occurs so that it is possible to pay the price of compilation as little as once per process, even when that module is used in several runspaces. It seems like this would involve two different things:
   a. Introduce the concept of a compiled-but-not-imported script module. The idea would be that the compiled script module could be used when Runspace.Open() is invoked, so that numerous runspaces could be opened without having to compile the same module again.
   b. Establish a supported way of defining multi-file script modules that doesn't involve invoking scriptblocks to gather the files. By doing so, the compilation of a module could be separated from the invocation of its scriptblock. Note that per #5942(comment) there currently doesn't seem to be an alternative to dot-sourcing .ps1 files inside the .psm1.
3. Reduce the cost of producing clones of a runspace with imported script modules. I think this is what @powercode's suggested Snapshot, ResetToSnapShot, and CloneSnapshot could do. I suspect that the time it takes to execute a script module's scriptblock on module import is probably non-trivial. I'm not sure how, exactly, this would work since a module's scriptblock could be constructing any variety of objects that themselves aren't trivially clonable.
4. Improve the ability to reset runspaces to their InitialSessionState. ResetRunspaceState() only resets variables, so there are situations where that precludes runspace reuse.
I think that (1) alone would be a significant improvement because at least then all the cores could be used for module import instead of just one.
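The limitation of ResetRunspaceState() mentioned above can be probed with a short experiment. This is a sketch only: the helper Invoke-InRunspace and the names leftover and Get-Leftover are made up for illustration, and the outcome may vary between PowerShell versions.

```powershell
$rs = [runspacefactory]::CreateRunspace()
$rs.Open()

# Hypothetical helper: run a script synchronously in the given runspace.
function Invoke-InRunspace([runspace]$Runspace, [string]$Script) {
    $ps = [powershell]::Create()
    $ps.Runspace = $Runspace
    try { $ps.AddScript($Script).Invoke() } finally { $ps.Dispose() }
    # Defensive wait so the runspace is idle before the caller touches it again.
    while ($Runspace.RunspaceAvailability -ne 'Available') { Start-Sleep -Milliseconds 10 }
}

# The first "invocation" leaves a global variable and a function behind.
$null = Invoke-InRunspace $rs '$global:leftover = 42; function global:Get-Leftover { "still here" }'

$rs.ResetRunspaceState()

# Check what a later invocation in the same runspace can still see.
$variableSurvived = [bool](Invoke-InRunspace $rs 'Test-Path variable:leftover')
$functionSurvived = [bool](Invoke-InRunspace $rs 'Test-Path function:Get-Leftover')
$rs.Dispose()

'variable survived reset: {0}; function survived reset: {1}' -f $variableSurvived, $functionSurvived
```

If the reset only covers variables, as described above, the variable should be gone while the function is still defined, which is the kind of cross-talk that can make a reset runspace unsafe to re-use.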