Closed
Description
Hi,
I'm using the parallel version of clustering.MeanShift (which I had written, interestingly). I've now noticed that most of the processes are actually "sleeping", and only a few actually work. Even more oddly, this doesn't always happen:
- the problem is worse on some machines than on others
- the problem doesn't seem to appear when working with 2 dimensions instead of 4 (see code below).
- changing the code to use multiprocessing instead of joblib makes it work
I have no idea where to start...
Reproduce
When running the code
from sklearn.cluster import MeanShift
import numpy as np
ndim = 4
points = np.random.random([100000, ndim])
MS = MeanShift(n_jobs=20, bandwidth=0.1)
print("Starting.")
MS.fit(points)
a call to htop shows: [screenshot not preserved]
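As a scriptable alternative to watching htop, the states of the worker processes can be counted directly. The following is only a minimal sketch under several assumptions: it is Linux-only (it reads /proc), it assumes the workers are direct children of the Python process running the script, and the helper names parse_stat and child_states are hypothetical, not part of scikit-learn or joblib.

```python
import os
from collections import Counter


def parse_stat(stat_line):
    """Return (state, ppid) parsed from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so everything up to the last ')' is skipped.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    return rest[0], int(rest[1])


def child_states(parent_pid):
    """Count scheduler states (R = running, S = sleeping, D = disk wait, ...)
    of the direct children of parent_pid. Linux-only: relies on /proc."""
    counts = Counter()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join("/proc", entry, "stat")) as fh:
                state, ppid = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or was unreadable during the scan
        if ppid == parent_pid:
            counts[state] += 1
    return counts
```

Calling child_states(pid) from a second interpreter while MS.fit(points) is running, with pid set to the PID of the script above, returns a Counter keyed by state letter. Many "S" entries next to only a few "R" entries would match what htop displays.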
Versions
Linux-2.6.32-573.3.1.el6.x86_64-x86_64-with-redhat-6.6-Carbon
Python 3.4.2 (default, Feb 4 2015, 08:24:27)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
NumPy 1.11.1
SciPy 0.17.1
Scikit-Learn 0.17.1