ENH: avoid oversubscription with nested for loops by GaelVaroquaux · Pull Request #690 · joblib/joblib

Open · GaelVaroquaux wants to merge 17 commits into main

Conversation

GaelVaroquaux (Member)

Nested for loops can request a very large number of threads. This leads to oversubscription, which in turn causes excessive memory use and possibly a fork bomb with threads (#688).

The solution I implemented involves two changes, sketched below:

  • Sharing the thread pool across Parallel backends. This requires scaling it when we add parallel instances.
  • Falling back to sequential computing when too many Parallel backends are around.
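A minimal sketch of these two mechanisms, reusing the module-level names _thread_pool and _thread_pool_users that appear in the diff excerpts quoted later in the review; the helper functions are hypothetical and only illustrate the approach, not the PR's exact code:

from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool

# Hypothetical module-level state, mirroring the names in the diff below.
_thread_pool = None
_thread_pool_users = []

def acquire_thread_pool(parallel_instance):
    """Return the shared pool, or None to fall back to SequentialBackend."""
    global _thread_pool
    if len(_thread_pool_users) > 2 * cpu_count():
        # Too many nested Parallel instances already: computing
        # sequentially avoids a thread bomb.
        return None
    _thread_pool_users.append(parallel_instance)
    if _thread_pool is None:
        _thread_pool = ThreadPool(cpu_count())
    return _thread_pool

def release_thread_pool(parallel_instance):
    global _thread_pool
    _thread_pool_users.remove(parallel_instance)
    if not _thread_pool_users and _thread_pool is not None:
        _thread_pool.terminate()
        _thread_pool = None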

This PR fixes #688. It also avoids very large memory consumption in the following code (use scikit-learn/scikit-learn#11166 to test in scikit-learn):

import os
# Route scikit-learn's parallelism through joblib (see scikit-learn/scikit-learn#11166)
os.environ['SKLEARN_SITE_JOBLIB'] = '1'
import joblib
from sklearn import datasets, model_selection, ensemble

data = datasets.fetch_covtype()
X = data.data
y = data.target


# Three nested levels of parallelism, each requesting all cores (n_jobs=-1):
# the forest, the grid search, and the cross-validation.
rf = ensemble.RandomForestClassifier(n_estimators=100, n_jobs=-1, verbose=10,
                                     max_depth=1)

model = model_selection.GridSearchCV(estimator=rf,
                                     param_grid=dict(
                                         max_features=[.1, .2, .3, .4, .5, .6]),
                                     n_jobs=-1, verbose=10)

with joblib.parallel_backend('threading', n_jobs=-1):
    model_selection.cross_val_score(model, X, y, n_jobs=-1, verbose=10)
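The same oversubscription pattern can be reproduced without scikit-learn; a minimal sketch with two nested levels of joblib parallelism (the task counts are arbitrary):

from joblib import Parallel, delayed

def inner(i):
    # Each outer task opens its own inner pool: with n worker threads
    # at each level, up to n * n threads can be requested at once.
    return sum(Parallel(n_jobs=-1, backend='threading')(
        delayed(abs)(j) for j in range(8)))

results = Parallel(n_jobs=-1, backend='threading')(
    delayed(inner)(i) for i in range(8))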

@codecov bot commented May 31, 2018

Codecov Report

Merging #690 into master will decrease coverage by 0.8%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master    #690      +/-   ##
=========================================
- Coverage   95.01%   94.2%   -0.81%     
=========================================
  Files          40      40              
  Lines        5694    5744      +50     
=========================================
+ Hits         5410    5411       +1     
- Misses        284     333      +49
Impacted Files Coverage Δ
joblib/parallel.py 97.39% <ø> (+0.28%) ⬆️
joblib/_parallel_backends.py 97.37% <100%> (+0.76%) ⬆️
joblib/test/test_parallel.py 95.89% <100%> (-0.2%) ⬇️
joblib/backports.py 39.58% <0%> (-56.26%) ⬇️
joblib/testing.py 87.5% <0%> (-7.5%) ⬇️
joblib/func_inspect.py 89.71% <0%> (-5.15%) ⬇️
joblib/pool.py 87.93% <0%> (-3.45%) ⬇️
joblib/test/common.py 86.44% <0%> (-1.7%) ⬇️
joblib/disk.py 80% <0%> (-1.67%) ⬇️
joblib/logger.py 85.52% <0%> (-1.32%) ⬇️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 48ae8f6...1e28b78.

@ogrisel (Contributor) left a comment:

As soon as Travis is happy, I will be happy :)


else:
    def cpu_count():
        return(1)
Contributor:
style: return 1 (return is still a statement in Python 3 ;)

    SafeFunction(func), callback=callback)
return(out)
Contributor:
Why introduce a local variable here?

# Don't spawn new threads if there are already many running
# This will fall back to SequentialBackend in the configure
# method
# This is necessary to avoid fork bombs
Contributor:
thread bombs?

if len(_thread_pool_users) > 2 * cpu_count():
    # Don't spawn new threads if there are already many running
    # This will fall back to SequentialBackend in the configure
    # method
Contributor:
method.

global _thread_pool
_thread_pool = None
try:
    _thread_pool_users.remove(self)
Contributor:
I think you are missing a global here?

Also, I don't understand why you terminate the ThreadPool if len(_thread_pool_users) > 1. In this case, you should just remove some threads from it, no?

@ogrisel (Contributor) commented Jun 4, 2018:
I don't think the global declaration is required: we are just mutating the value of the _thread_pool_users variable, not assigning it a new value.
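A quick illustration of that Python rule, independent of joblib (the names here are made up):

_users = []

def register(user):
    # Mutating the object the name points to: no `global` needed,
    # list.append modifies the existing list in place.
    _users.append(user)

def reset():
    # Rebinding the name to a new object: `global` is required,
    # otherwise `_users = []` would create a local variable.
    global _users
    _users = []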

# The code below is accessing multiprocessing private API
max_processes = cpu_count() + len(_thread_pool_users)
if _thread_pool._processes < max_processes:
    _thread_pool._processes = min(max_processes,
Contributor:
PEP 8?

It feels hard to parse, as the first argument is not aligned with the second.
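For reference, the two continuation styles PEP 8 allows here; the second argument of min is truncated in the excerpt above, so upper_bound below is only a placeholder:

# Aligned with the opening delimiter:
_thread_pool._processes = min(max_processes,
                              upper_bound)

# Or a hanging indent:
_thread_pool._processes = min(
    max_processes, upper_bound)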

Gael Varoquaux

Avoid oversubscription when there are multiple nested parallel loops.
As a result the system avoids fork bombs with recursive parallel
Contributor:
Let's be more specific about this change:

the system avoids thread bombs with ....

This change is not about the "fork" system call or new process creation.

@@ -26,6 +27,26 @@
from .externals.loky._base import TimeoutError as LokyTimeoutError
from .externals.loky import process_executor, cpu_count

class SafeThreadPool(ThreadPool):
    " A ThreadPool that can repopulate in a thread safe way."
@ogrisel (Contributor) commented Jun 4, 2018:

Style:

class SafeThreadPool(ThreadPool):
    """A ThreadPool that can repopulate in a thread safe way."""


def test_fork_bomp():
    # Test that recursive parallelism raises a recursion rather than
    # doing a fork bomp
Contributor:
doing a fork bomb nor a thread bomb.

# Depending on whether the exception is raised in the main thread
# or in a slave thread, and on the version of Python, one exception or
# another is raised
with parallel_backend('threading', n_jobs=-1):
Contributor:
It would be even better to test with the default backend: loky would be used at the top level and threads in the nested calls.
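A sketch of such a test, with loky at the top level and threads in the nested calls; the function name and the small job counts are illustrative, not the PR's actual test:

from joblib import Parallel, delayed, parallel_backend

def nested_sum(i):
    # Runs inside a loky worker process; the nested call uses threads.
    with parallel_backend('threading', n_jobs=2):
        return sum(Parallel()(delayed(abs)(j) for j in range(4)))

def test_nested_default_backend():
    # No explicit backend: the default (loky) handles the outer loop.
    results = Parallel(n_jobs=2)(delayed(nested_sum)(i) for i in range(4))
    assert results == [6] * 4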


@ogrisel (Contributor) commented Jun 18, 2018

I added a new stress test and it caused a deadlock under Windows, as @tomMoral suggested earlier it would. I am not sure we can fix the design to avoid this.

I would rather not postpone the joblib release further for this.

@GaelVaroquaux (Member, Author) commented Jun 19, 2018 via email

@ogrisel (Contributor) commented Jun 20, 2018

> Can we at least put in place a system that falls back to the sequential backend when there is too much nesting? Right now it is easy to shoot oneself in the foot.

Done in #700.

@ogrisel (Contributor) left a comment:

We cannot merge this as is (because of the potential deadlocks).

We can probably find a way to better mitigate oversubscription issues, but we should not delay the joblib release because of this, as it is a complex problem.

Successfully merging this pull request may close these issues:

Be robust to infinite recursively parallel

3 participants