Improve control over number of threads used in an mne call #10522
Comments
I like this idea but it's tricky to make work out of the box. I suspect the optimal behavior depends on the length of the files, the number of channels, epochs...
Thanks for the reply @agramfort. In contrast, I think it would be straightforward. The current [...] And there are good reasons why you would want to consistently define a max thread number, e.g. when you work on shared resources or want to run multiple jobs on the same compute server. As it is now, a user needs to go out of their way to make sure that all restrictions via environment variables are set before anything else is imported, or do some additional research to find what I referenced above.
How would you do this? If `n_jobs=1`, you make sure one thread is used? So suddenly all computations in MNE are monothread? It means we would need to add `n_jobs` in many places? Note that we usually use processes and not threads for parallelism. Do I understand what you want to do correctly?
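For context, a minimal sketch (not MNE code; `heavy` and the array sizes are illustrative) of how process-level `n_jobs` and library-level threads interact:

```python
import numpy as np
from joblib import Parallel, delayed

def heavy(block):
    # A BLAS-backed matrix product; each call may use the BLAS pool's
    # full thread count inside its worker process.
    return block @ block.T

blocks = [np.random.rand(500, 500) for _ in range(8)]

# 4 worker processes, each of which can spin up every BLAS thread it
# inherits: the total number of CPU threads in use can exceed n_jobs.
out = Parallel(n_jobs=4)(delayed(heavy)(b) for b in blocks)
```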
Well I guess the question would then be whether [...]. Apart from functions that have the `n_jobs` argument, [...]. As a user, I assumed that [...]. If you decide to empower [...]
I actually meant available CPU resources, not threads or processes as they are used in the context of Python. I have used "CPU threads" now to clarify.
Essentially yes. But that could be done gradually.
I guess it comes down to what you would like `n_jobs` to mean. In my current code, the use of [...]
I would try to avoid a behavior that deviates from big libraries like scikit-learn. What is unclear to me is how big the change you suggest is. In the lab we ask users to set `OMP_NUM_THREADS` to 1 in their `.bashrc` on the shared machines and to nice their jobs with `nice -5 python ...`.
According to this resource, the default for [...]. All these changes are non-breaking in the sense that actively setting [...]. That would be enforcing monothreading, assuming it catches all cases (might need to add more environment variables, see this Stack Overflow answer). Even so, wouldn't it make more sense to use a Python API instead of having users manipulate their `.bashrc`?
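A minimal sketch of the two approaches under discussion, assuming a numpy build with an OpenMP-backed BLAS:

```python
import os

# Environment variables only work if set before the library that reads
# them is first imported (numpy sizes its BLAS pool at import time):
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np
from threadpoolctl import threadpool_limits

# threadpoolctl, by contrast, is a Python API that can re-limit pools
# that were already initialized, at any point after import:
with threadpool_limits(limits=1):
    np.random.rand(1000, 1000) @ np.random.rand(1000, 1000)
```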
Let me think about this. We will discuss it in the next MNE core dev meeting; you're welcome to join. It will be on Friday the 22nd at 5 PM CET on the MNE Discord channel. Two remarks: [...]
I have not thought about threadpoolctl much, but I've seen it used in SciPy (with modifications from sklearn), and they mention there that it's also what NumPy uses. Given that scikit-learn also uses joblib to spawn new processes, we can probably learn from their experience and try to do the same things.
Two ideas (and I think the second is better): [...]
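For reference, a small sketch of how threadpoolctl exposes the native pools it manages (assuming numpy with a BLAS backend is installed):

```python
import numpy as np  # importing numpy loads its BLAS and registers a thread pool
from threadpoolctl import threadpool_info

# Each entry describes one native pool (BLAS, OpenMP, ...) that
# threadpoolctl can later re-limit at runtime:
for pool in threadpool_info():
    print(pool["user_api"], pool["internal_api"], pool["num_threads"])
```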
@dafrose we usually try to follow what sklearn does, under the assumption that they have thought about this stuff a lot. It sounds like they, in turn, mostly delegate to joblib. With that in mind, I propose we follow their model by: [...]

This seems like it would allow MNE functions that use [...]
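A rough sketch of the joblib semantics being referenced, with hypothetical `process` and `run` helpers standing in for MNE functions:

```python
from joblib import Parallel, delayed, parallel_backend

def process(x):
    return x ** 2

def run(data, n_jobs=None):
    # n_jobs=None lets an enclosing backend context decide; an explicit
    # integer passed by the caller takes precedence.
    return Parallel(n_jobs=n_jobs)(delayed(process)(x) for x in data)

with parallel_backend("loky", n_jobs=4):
    results = run(range(10))               # runs with 4 workers
results_serial = run(range(10), n_jobs=1)  # explicit value wins
```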
+1 to making `None` the default for `n_jobs` and using this semantic from sklearn.
@agramfort thanks for the invite. I will see whether I can make it on Friday. @larsoner +1 for adding the context manager as a decorator or to an existing decorator. Regarding `n_jobs=None`: I like the idea, because it does not change the current default behaviour, but allows handing control over to a lower-tier context. However, it should always be clear what takes preference. Unless otherwise explained, I would expect that the call `raw.filter(..., n_jobs=<a_number>)` should overrule whatever an external context defines, unless the value is `None` as you defined above. Do I understand correctly that setting `n_jobs=-1` would still mean that all available cores are used?
Yes, it's the sklearn behavior.
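A minimal sketch of the decorator idea mentioned above, built on threadpoolctl's context manager (`limit_threads` is a hypothetical name, not an existing MNE or threadpoolctl API):

```python
import functools
from threadpoolctl import threadpool_limits

def limit_threads(n):
    """Run the wrapped function with all native thread pools capped at n."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with threadpool_limits(limits=n):
                return func(*args, **kwargs)
        return wrapper
    return decorator

@limit_threads(1)
def heavy_computation(x):
    # stand-in for an MNE routine that calls into BLAS/LAPACK
    return x @ x.T
```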
@dafrose do you want to take a stab at a PR to implement this?
@larsoner thanks for the offer. I would love to, but I am afraid it would take some time. I already have a few PRs on my todo list, one of them already for [...]
Description

I propose to use `threadpoolctl` to improve control over the number of threads used throughout mne and in calls to external libraries like `numpy`. This is apparently the direction that `numpy` has moved to, as discussed here: numpy/numpy#11826

Reasoning

I have had trouble completely controlling the number of threads used by various `mne` functions. Many mne functions have `n_jobs` arguments that control the number of threads used in that function, but there are cases where code within that function can escape this limit due to externally defined reference values. And then there are functions like `mne.chpi.filter_chpi` that do not have the `n_jobs` argument but can still parallelize. It is possible to control this with environment variables as discussed here, but that only works before you import the respective library, e.g. `numpy`. The easiest way to control thread limits after an import has happened appears to be `threadpoolctl`.

Proposed Implementation
I have successfully used the following kind of syntax to control the threads used in an mne call.
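A minimal sketch of this pattern, assuming `raw` is an already-loaded `mne.io.Raw` object and an illustrative band-pass filter:

```python
from threadpoolctl import threadpool_limits

# Cap every native thread pool (BLAS, OpenMP, ...) for the duration of
# the call, even though the libraries were already imported and configured:
with threadpool_limits(limits=1):
    raw.filter(l_freq=1.0, h_freq=40.0)
```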
The same could be used internally to make better use of the existing `n_jobs` argument without forcing the user to do it themselves. If this proves successful, it might make sense to add the `n_jobs` argument in even more places.