Description
The final stage of fitting an isolation forest is very slow and single-threaded.
This is an issue I run into quite frequently. I'll train an isolation forest on a decently large data set (say on the order of 1M to 100M records, around 50 features), and at first it runs rapidly and in parallel with nearly 100% CPU utilization. I get output like the following:
[Parallel(n_jobs=30)]: Using backend LokyBackend with 30 concurrent workers.
[Parallel(n_jobs=30)]: Done 3 out of 30 | elapsed: 17.9s remaining: 2.7min
[Parallel(n_jobs=30)]: Done 7 out of 30 | elapsed: 18.5s remaining: 1.0min
[Parallel(n_jobs=30)]: Done 11 out of 30 | elapsed: 19.4s remaining: 33.5s
[Parallel(n_jobs=30)]: Done 15 out of 30 | elapsed: 19.7s remaining: 19.7s
[Parallel(n_jobs=30)]: Done 19 out of 30 | elapsed: 20.0s remaining: 11.6s
[Parallel(n_jobs=30)]: Done 23 out of 30 | elapsed: 20.2s remaining: 6.2s
[Parallel(n_jobs=30)]: Done 27 out of 30 | elapsed: 20.9s remaining: 2.3s
[Parallel(n_jobs=30)]: Done 30 out of 30 | elapsed: 21.5s finished
It then runs for a very long time (10x as long? more?) on a single core before eventually finishing. Often the progress statements are all printed at once at the end, when the task completes:
Building estimator 1 of 3 for this parallel run (total 100)...
Building estimator 2 of 3 for this parallel run (total 100)...
Building estimator 3 of 3 for this parallel run (total 100)...
...
I presume that's from parallel processes or threads printing to stdout without flushing.
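To illustrate the buffering guess above, here is a minimal standard-library sketch. It does not involve joblib or scikit-learn at all, and the function name `buffered_demo` is made up for this example. It only shows that lines printed to a buffered stream do not reach the underlying sink until a flush happens. Whether the worker output in the run above behaves the same way is an assumption, not something confirmed here.

```python
import io


def buffered_demo(flush_each_line):
    """Return how many bytes reached the sink after each print call.

    The last entry is the byte count after a final explicit flush.
    """
    sink = io.BytesIO()
    # A buffered text stream on top of the sink, similar in spirit to a
    # stdout that is not attached to a terminal.
    stream = io.TextIOWrapper(
        io.BufferedWriter(sink, buffer_size=65536),
        encoding="utf-8",
        newline="\n",
    )
    seen = []
    for i in range(1, 4):
        print("Building estimator {} of 3".format(i),
              file=stream, flush=flush_each_line)
        seen.append(len(sink.getvalue()))
    stream.flush()
    seen.append(len(sink.getvalue()))
    return seen


if __name__ == "__main__":
    print("without flush:", buffered_demo(False))
    print("with flush:   ", buffered_demo(True))
```

Without flushing, nothing shows up in the sink until the end, which matches the "all printed simultaneously" symptom. With `flush=True`, each line arrives as soon as it is printed.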
I create the isolation forest with:
from sklearn.ensemble import IsolationForest

model_kwargs = {
    'n_estimators': 100,
    'n_jobs': 30,
    'verbose': 10,
    'max_samples': 1000,
    'behaviour': "new",
}
clf = IsolationForest(**model_kwargs)
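For anyone trying to reproduce this, a rough timing sketch on synthetic data might look like the following. The helper name `time_stages`, the random Gaussian data, and the sample sizes are all made up for illustration and are not my real data. It times `fit` and a separate `score_samples` pass over the same array, on the assumption (not verified here) that the slow single-core stage might be a scoring pass over the full training set. The `behaviour` argument is left out of the defaults so the sketch also runs on newer scikit-learn releases; on 0.20.x it can be passed through `**model_kwargs`.

```python
import time

import numpy as np
from sklearn.ensemble import IsolationForest


def time_stages(n_samples, n_features=50, seed=0, **model_kwargs):
    """Time fit() and a separate scoring pass on synthetic data."""
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(n_samples, n_features))  # synthetic stand-in data

    clf = IsolationForest(random_state=seed, **model_kwargs)

    start = time.perf_counter()
    clf.fit(X)
    fit_seconds = time.perf_counter() - start

    # Score the same array that was used for fitting, timed separately.
    start = time.perf_counter()
    scores = clf.score_samples(X)
    score_seconds = time.perf_counter() - start

    return {"fit": fit_seconds, "score": score_seconds,
            "n_scored": scores.shape[0]}


if __name__ == "__main__":
    kwargs = {"n_estimators": 100, "max_samples": 1000}
    for n in (10000, 50000):
        print(n, time_stages(n, **kwargs))
```

If the `fit` time grows with the number of records even though `max_samples` is fixed at 1000, that would suggest the extra time is spent on something other than building the trees.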
Versions
System:
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
executable: /home/ibackus/anaconda3/bin/python
machine: Linux-4.15.0-1032-aws-x86_64-with-debian-buster-sid
BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /home/ibackus/anaconda3/lib
cblas_libs: mkl_rt, pthread
Python deps:
pip: 18.1
setuptools: 40.6.3
sklearn: 0.20.2
numpy: 1.15.4
scipy: 1.2.1
Cython: 0.29.2
pandas: 0.24.1