[MRG] Automatically group short tasks in batches #157
Conversation
One of the applications that brought this to our attention is the parallel OvR meta-estimator, where benchmarks revealed that estimation is much faster without parallelization when the number of classes (and therefore of parallel jobs) is very high (4000). The following timings use default settings except where specified. With old joblib:
With this PR:
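Independently of the exact timings, the workload here is essentially a large number of very short tasks. A minimal sketch of that pattern using the batch_size parameter this PR introduces (n_jobs, the task count and the sleep duration are illustrative, not taken from the benchmark script):

import time
from joblib import Parallel, delayed

def short_task(i):
    # Each task is far cheaper than the cost of dispatching it to a worker,
    # which is what made unbatched dispatch so slow in this scenario.
    time.sleep(1e-4)
    return i

# batch_size='auto' is the behaviour this PR adds; previously each of the
# 4000 tasks was sent to the workers individually.
results = Parallel(n_jobs=4, batch_size='auto')(
    delayed(short_task)(i) for i in range(4000))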
This seems great!
All tests under Windows pass: https://ci.appveyor.com/project/ogrisel/joblib/build/1.0.29 I had to push
@GaelVaroquaux this is ready for final review. @larsmans you might also be interested in having a look at this.
@jnothman I think you might also be interested in reviewing this. There is a benchmark script you can use to explore the runtime behavior.
@@ -132,10 +158,10 @@ def delayed_function(*args, **kwargs):
class ImmediateApply(object):
Maybe the class name should be changed to 'ImmediateComputeBatch': the 'ImmediateApply' name comes from the 'apply' terminology used in multiprocessing.
The docstring should clearly be changed.
Agreed.
I'll have to see if I have time to review, but @vene's summary above is very enticing!
Note that predict is not parallelized. :-)
I think the first batch of comments is addressed in new commits. I plan to squash those prior to merging once the review is over.
Default is '2*n_jobs'. When batch_size="auto" this is a reasonable
default and the multiprocessing workers should never starve.
batch_size: int or 'auto', default: 'auto'
    The number of atomic tasks to dispatch at once to a specific
"a specific worker" isn't quite right here. Perhaps just "to each worker"?
@jnothman thanks for the review. I addressed your comments in the last commit.
I haven't reviewed the benchmark or the tests, but the feature itself looks great!
Looks like the memory_profiler profile decorators had a significant overhead in the benchmark scripts. So here are the more accurate timings:
Thanks for the benchmarks!
BTW, you asked me earlier how to test the dev branch of joblib in scikit-learn without modifying the scikit-learn source code: I put a monkeypatch in a
try:
    import joblib
    from sklearn.externals import joblib as skl_joblib
    print('Monkeypatching scikit-learn embedded joblib')
    for k, v in vars(joblib).items():
        setattr(skl_joblib, k, v)
except ImportError as e:
    print("Could not apply joblib monkeypatch: %s" % e)
def test_batching_auto_multiprocessing():
    # Batching is not enabled with the threading backend as it has found
I guess this comment is a copy and paste of the one in test_batching_auto_threading and should be removed.
good catch
self.n_dispatched_batches += 1
self.n_dispatched_tasks += len(batch)
self.n_completed_tasks += len(batch)
if not _verbosity_filter(self.n_dispatched_batches, self.verbose):
    self._print('Done %3i jobs | elapsed: %s',
You didn't change jobs to tasks here but you did it below somewhere.
@lesteve I pushed a fix for the
bench_short_tasks(low_variance, **bench_parameters)

# Second pair of benchmarks: one has a cycling task duration pattern that
# the auto batching feature should be able to roughly track. We use a pair
by pair I assume you mean even? 🇫🇷
no I mean a couple, e.g. like in the expression "a pair of gloves": https://www.google.fr/search?q=define:pair
Isn't that idiomatic English?
I mean "a pair power of cos". The first use of "pair" is perfectly readable to me, but the second struck as odd.
ah ok, will fix it.
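For readers following along, the cycling task duration pattern discussed in this thread can be sketched roughly like this (the durations, counts and cos-based formula are illustrative; the actual benchmark script differs):

import math
import time
from joblib import Parallel, delayed

def cyclic_task(i, base=0.001):
    # Task duration follows an even power of cos, so it cycles slowly enough
    # for the auto-batching heuristic to have a pattern it can try to track.
    time.sleep(base * (1 + math.cos(i / 10.0) ** 2))

Parallel(n_jobs=2, batch_size='auto')(
    delayed(cyclic_task)(i) for i in range(500))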
Thanks a lot @ogrisel for the work on this; is there anything I can do to help?
Merging, thanks a lot, great stuff.
Yay!
Great !!!
🍻
This is based on an online evaluation of the batch completion time to dynamically tune the size of the batch of tasks to dispatch at once to a worker.
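A rough sketch of that idea, purely for illustration (the function name, the 0.2 s target and the resizing rule below are assumptions, not the actual implementation):

import time

def run_auto_batched(tasks, run_batch, target_duration=0.2,
                     initial_batch_size=1, max_batch_size=1000):
    # Dispatch tasks in batches, resizing each new batch so that it takes
    # roughly target_duration seconds to complete.
    batch_size = initial_batch_size
    i = 0
    while i < len(tasks):
        batch = tasks[i:i + batch_size]
        start = time.time()
        run_batch(batch)  # e.g. send the batch to a worker and wait for it
        elapsed = time.time() - start
        if elapsed > 0:
            # Scale the next batch so its expected duration is ~target_duration.
            per_task = elapsed / len(batch)
            batch_size = max(1, min(max_batch_size,
                                    int(target_duration / per_task)))
        i += len(batch)

This only conveys the general shape of the feedback loop; the actual implementation in this PR differs in its details.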
TODO: