ENH: avoid oversubscription with nested for loops by GaelVaroquaux · Pull Request #690 · joblib/joblib

Open · GaelVaroquaux wants to merge 17 commits into main

Conversation

GaelVaroquaux (Member)

Nested for loops can request a very large number of threads. This leads to oversubscription, which in turn causes excessive memory use and possibly a fork bomb with threads (#688).

The solution I implemented involves two changes, sketched below:

  • Sharing the thread pool across Parallel backends. This requires scaling it when we add parallel instances.
  • Falling back to sequential computing when too many Parallel backends are around.
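A minimal sketch of these two mechanisms, reusing the module-level names _thread_pool and _thread_pool_users that appear in the diff excerpts quoted later in the review; the helper functions are hypothetical and only illustrate the approach, not the PR's exact code:

from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool

# Hypothetical module-level state, mirroring the names in the diff below.
_thread_pool = None
_thread_pool_users = []

def acquire_thread_pool(parallel_instance):
    """Return the shared pool, or None to fall back to SequentialBackend."""
    global _thread_pool
    if len(_thread_pool_users) > 2 * cpu_count():
        # Too many nested Parallel instances already: computing
        # sequentially avoids a thread bomb.
        return None
    _thread_pool_users.append(parallel_instance)
    if _thread_pool is None:
        _thread_pool = ThreadPool(cpu_count())
    return _thread_pool

def release_thread_pool(parallel_instance):
    global _thread_pool
    _thread_pool_users.remove(parallel_instance)
    if not _thread_pool_users and _thread_pool is not None:
        _thread_pool.terminate()
        _thread_pool = None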

This PR fixes #688. It also avoids very large memory consumption in the following code (use scikit-learn/scikit-learn#11166 to test in scikit-learn):

import os
# Route scikit-learn's parallelism through joblib (see scikit-learn/scikit-learn#11166)
os.environ['SKLEARN_SITE_JOBLIB'] = '1'
import joblib
from sklearn import datasets, model_selection, ensemble

data = datasets.fetch_covtype()
X = data.data
y = data.target


# Three nested levels of parallelism, each requesting all cores (n_jobs=-1):
# the forest, the grid search, and the cross-validation.
rf = ensemble.RandomForestClassifier(n_estimators=100, n_jobs=-1, verbose=10,
                                     max_depth=1)

model = model_selection.GridSearchCV(estimator=rf,
                                     param_grid=dict(
                                         max_features=[.1, .2, .3, .4, .5, .6]),
                                     n_jobs=-1, verbose=10)

with joblib.parallel_backend('threading', n_jobs=-1):
    model_selection.cross_val_score(model, X, y, n_jobs=-1, verbose=10)
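The same oversubscription pattern can be reproduced without scikit-learn; a minimal sketch with two nested levels of joblib parallelism (the task counts are arbitrary):

from joblib import Parallel, delayed

def inner(i):
    # Each outer task opens its own inner pool: with n worker threads
    # at each level, up to n * n threads can be requested at once.
    return sum(Parallel(n_jobs=-1, backend='threading')(
        delayed(abs)(j) for j in range(8)))

results = Parallel(n_jobs=-1, backend='threading')(
    delayed(inner)(i) for i in range(8))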

@codecov bot commented May 31, 2018

Codecov Report

Merging #690 into master will decrease coverage by 0.8%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master    #690      +/-   ##
=========================================
- Coverage   95.01%   94.2%   -0.81%     
=========================================
  Files          40      40              
  Lines        5694    5744      +50     
=========================================
+ Hits         5410    5411       +1     
- Misses        284     333      +49
Impacted Files Coverage Δ
joblib/parallel.py 97.39% <ø> (+0.28%) ⬆️
joblib/_parallel_backends.py 97.37% <100%> (+0.76%) ⬆️
joblib/test/test_parallel.py 95.89% <100%> (-0.2%) ⬇️
joblib/backports.py 39.58% <0%> (-56.26%) ⬇️
joblib/testing.py 87.5% <0%> (-7.5%) ⬇️
joblib/func_inspect.py 89.71% <0%> (-5.15%) ⬇️
joblib/pool.py 87.93% <0%> (-3.45%) ⬇️
joblib/test/common.py 86.44% <0%> (-1.7%) ⬇️
joblib/disk.py 80% <0%> (-1.67%) ⬇️
joblib/logger.py 85.52% <0%> (-1.32%) ⬇️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 48ae8f6...1e28b78.

@ogrisel (Contributor) left a comment:

As soon as Travis is happy, I will be happy :)


else:
    def cpu_count():
        return(1)
Contributor:
style: return 1 (return is still a statement in Python 3 ;)

    SafeFunction(func), callback=callback)
return(out)
Contributor:
Why introduce a local variable here?

# Don't spawn new threads if there are already many running
# This will fall back to SequentialBackend in the configure
# method
# This is necessary to avoid fork bombs
Contributor:
thread bombs?

if len(_thread_pool_users) > 2 * cpu_count():
    # Don't spawn new threads if there are already many running
    # This will fall back to SequentialBackend in the configure
    # method
Contributor:
method.

global _thread_pool
_thread_pool = None
try:
    _thread_pool_users.remove(self)
Contributor:
I think you are missing a global here?

Also, I don't understand why you terminate the ThreadPool if len(_thread_pool_users) > 1. In this case, you should just remove some threads from it, no?

@ogrisel (Contributor) commented Jun 4, 2018:
I don't think the global declaration is required: we are just mutating the value of the _thread_pool_users variable, not assigning it a new value.
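A quick illustration of that Python rule, independent of joblib (the names here are made up):

_users = []

def register(user):
    # Mutating the object the name points to: no `global` needed,
    # list.append modifies the existing list in place.
    _users.append(user)

def reset():
    # Rebinding the name to a new object: `global` is required,
    # otherwise `_users = []` would create a local variable.
    global _users
    _users = []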

# The code below is accessing multiprocessing private API
max_processes = cpu_count() + len(_thread_pool_users)
if _thread_pool._processes < max_processes:
    _thread_pool._processes = min(max_processes,
Contributor:
PEP 8?

It feels hard to parse, as the first argument is not aligned with the second.
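For reference, the two continuation styles PEP 8 allows here; the second argument of min is truncated in the excerpt above, so upper_bound below is only a placeholder:

# Aligned with the opening delimiter:
_thread_pool._processes = min(max_processes,
                              upper_bound)

# Or a hanging indent:
_thread_pool._processes = min(
    max_processes, upper_bound)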

Gael Varoquaux

Avoid oversubscription when there are multiple nested parallel loops.
As a result the system avoids fork bombs with recursive parallel
Contributor:
Let's be more specific about this change:

the system avoids thread bombs with ....

This change is not about the "fork" system call or new process creation.

@@ -26,6 +27,26 @@
from .externals.loky._base import TimeoutError as LokyTimeoutError
from .externals.loky import process_executor, cpu_count

class SafeThreadPool(ThreadPool):
    " A ThreadPool that can repopulate in a thread safe way."
@ogrisel (Contributor) commented Jun 4, 2018:

Style:

class SafeThreadPool(ThreadPool):
    """A ThreadPool that can repopulate in a thread safe way."""


def test_fork_bomp():
    # Test that recursive parallelism raises a recursion rather than
    # doing a fork bomp
Contributor:
doing a fork bomb nor a thread bomb.

# Depending on whether the exception is raised in the main thread
# or in a slave thread, and on the version of Python, one exception or
# another is raised
with parallel_backend('threading', n_jobs=-1):
Contributor:
It would be even better to test with the default backend: loky would be used at the top level and threads in the nested calls.
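A sketch of such a test, with loky at the top level and threads in the nested calls; the function name and the small job counts are illustrative, not the PR's actual test:

from joblib import Parallel, delayed, parallel_backend

def nested_sum(i):
    # Runs inside a loky worker process; the nested call uses threads.
    with parallel_backend('threading', n_jobs=2):
        return sum(Parallel()(delayed(abs)(j) for j in range(4)))

def test_nested_default_backend():
    # No explicit backend: the default (loky) handles the outer loop.
    results = Parallel(n_jobs=2)(delayed(nested_sum)(i) for i in range(4))
    assert results == [6] * 4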


@ogrisel (Contributor) commented Jun 18, 2018

I added a new stress test and it caused a deadlock under Windows, as @tomMoral suggested earlier it would. I am not sure we can fix the design to avoid this.

I would rather not postpone the joblib release further for this.

@GaelVaroquaux (Member, Author) commented Jun 19, 2018 via email

@ogrisel (Contributor) commented Jun 20, 2018

> Can we at least put in place a system that falls back to the sequential backend when there is too much nesting? Right now it is easy to shoot oneself in the foot.

Done in #700.

@ogrisel (Contributor) left a comment:

We cannot merge this as is (because of the potential deadlocks).

We can probably find a way to better mitigate oversubscription issues, but we should not delay the joblib release because of this, as it is a complex problem.

Successfully merging this pull request may close these issues:

Be robust to infinite recursively parallel

3 participants