[MRG] Automatically group short tasks in batches #157
Conversation
One of the applications that brought this to our attention is the parallel OvR meta-estimator, where benchmarks revealed that estimation is much faster without parallelization when the number of classes (and therefore of parallel jobs) is very high (4000). The following timings use default settings except where specified. With old joblib:
With this PR:
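Independently of the exact timings, the workload here is essentially a large number of very short tasks. A minimal sketch of that pattern using the batch_size parameter this PR introduces (n_jobs, the task count and the sleep duration are illustrative, not taken from the benchmark script):

import time
from joblib import Parallel, delayed

def short_task(i):
    # Each task is far cheaper than the cost of dispatching it to a worker,
    # which is what made unbatched dispatch so slow in this scenario.
    time.sleep(1e-4)
    return i

# batch_size='auto' is the behaviour this PR adds; previously each of the
# 4000 tasks was sent to the workers individually.
results = Parallel(n_jobs=4, batch_size='auto')(
    delayed(short_task)(i) for i in range(4000))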
This seems great!
All tests under Windows pass: https://ci.appveyor.com/project/ogrisel/joblib/build/1.0.29 I had to push
@GaelVaroquaux this is ready for final review. @larsmans you might also be interested in having a look at this.
@jnothman I think you might also be interested in reviewing this. There is a benchmark script you can use to explore the runtime behavior.
@@ -132,10 +158,10 @@ def delayed_function(*args, **kwargs):
class ImmediateApply(object):
Maybe the class name should be changed to 'ImmediateComputeBatch': the 'ImmediateApply' name comes from the 'apply' terminology used in multiprocessing.
The docstring should clearly be changed.
Agreed.
I'll have to see if I have time to review, but @vene's summary above is very enticing!
Note that predict is not parallelized. :-)
I think the first batch of comments is addressed in new commits. I plan to squash those prior to merging once the review is over.
Default is '2*n_jobs'. When batch_size="auto" this is a reasonable
default and the multiprocessing workers should never starve.
batch_size: int or 'auto', default: 'auto'
    The number of atomic tasks to dispatch at once to a specific
"a specific worker" isn't quite right here. Perhaps just "to each worker"?
@jnothman thanks for the review. I addressed your comments in the last commit.
I haven't reviewed the benchmark or the tests, but the feature itself looks great!
Looks like the memory_profiler profile decorators had a significant overhead in the benchmark scripts. So here are the more accurate timings:
Thanks for the benchmarks!
BTW, you asked me earlier how to test the dev branch of joblib in scikit-learn without modifying the scikit-learn source code: I put a monkeypatch in a
try:
    import joblib
    from sklearn.externals import joblib as skl_joblib
    print('Monkeypatching scikit-learn embedded joblib')
    for k, v in vars(joblib).items():
        setattr(skl_joblib, k, v)
except ImportError as e:
    print("Could not apply joblib monkeypatch: %s" % e)
def test_batching_auto_multiprocessing():
    # Batching is not enabled with the threading backend as it has found
I guess this comment is a copy and paste of the one in test_batching_auto_threading and should be removed.
good catch
self.n_dispatched_batches += 1
self.n_dispatched_tasks += len(batch)
self.n_completed_tasks += len(batch)
if not _verbosity_filter(self.n_dispatched_batches, self.verbose):
    self._print('Done %3i jobs | elapsed: %s',
You didn't change jobs to tasks here but you did it below somewhere.
@lesteve I pushed a fix for the
bench_short_tasks(low_variance, **bench_parameters)

# Second pair of benchmarks: one has a cycling task duration pattern that
# the auto batching feature should be able to roughly track. We use a pair
by pair I assume you mean even? 🇫🇷
no I mean a couple, e.g. like in the expression "a pair of gloves": https://www.google.fr/search?q=define:pair
Isn't that idiomatic English?
I mean "a pair power of cos". The first use of "pair" is perfectly readable to me, but the second struck as odd.
ah ok, will fix it.
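For readers following along, the cycling task duration pattern discussed in this thread can be sketched roughly like this (the durations, counts and cos-based formula are illustrative; the actual benchmark script differs):

import math
import time
from joblib import Parallel, delayed

def cyclic_task(i, base=0.001):
    # Task duration follows an even power of cos, so it cycles slowly enough
    # for the auto-batching heuristic to have a pattern it can try to track.
    time.sleep(base * (1 + math.cos(i / 10.0) ** 2))

Parallel(n_jobs=2, batch_size='auto')(
    delayed(cyclic_task)(i) for i in range(500))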
Thanks a lot @ogrisel for the work on this; is there anything I can do to help?
Merging, thanks a lot, great stuff.
Yay!
Great !!!
🍻
This is based on an online evaluation of the batch completion time to dynamically tune the size of the batch of tasks to dispatch at once to a worker.
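A rough sketch of that idea, purely for illustration (the function name, the 0.2 s target and the resizing rule below are assumptions, not the actual implementation):

import time

def run_auto_batched(tasks, run_batch, target_duration=0.2,
                     initial_batch_size=1, max_batch_size=1000):
    # Dispatch tasks in batches, resizing each new batch so that it takes
    # roughly target_duration seconds to complete.
    batch_size = initial_batch_size
    i = 0
    while i < len(tasks):
        batch = tasks[i:i + batch_size]
        start = time.time()
        run_batch(batch)  # e.g. send the batch to a worker and wait for it
        elapsed = time.time() - start
        if elapsed > 0:
            # Scale the next batch so its expected duration is ~target_duration.
            per_task = elapsed / len(batch)
            batch_size = max(1, min(max_batch_size,
                                    int(target_duration / per_task)))
        i += len(batch)

This only conveys the general shape of the feedback loop; the actual implementation in this PR differs in its details.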
TODO: