Isolation forest final stage very slow and single threaded · Issue #13295 · scikit-learn/scikit-learn · GitHub

Isolation forest final stage very slow and single threaded #13295
Closed
@ibackus

Description

Isolation forest final stage very slow and single threaded.

I run into this issue frequently. When I train an isolation forest on a fairly large dataset (on the order of 1M to 100M records, with around 50 features), the fit runs quickly and in parallel at nearly 100% CPU utilization. I get output like the following:

[Parallel(n_jobs=30)]: Using backend LokyBackend with 30 concurrent workers.
[Parallel(n_jobs=30)]: Done   3 out of  30 | elapsed:   17.9s remaining:  2.7min
[Parallel(n_jobs=30)]: Done   7 out of  30 | elapsed:   18.5s remaining:  1.0min
[Parallel(n_jobs=30)]: Done  11 out of  30 | elapsed:   19.4s remaining:   33.5s
[Parallel(n_jobs=30)]: Done  15 out of  30 | elapsed:   19.7s remaining:   19.7s
[Parallel(n_jobs=30)]: Done  19 out of  30 | elapsed:   20.0s remaining:   11.6s
[Parallel(n_jobs=30)]: Done  23 out of  30 | elapsed:   20.2s remaining:    6.2s
[Parallel(n_jobs=30)]: Done  27 out of  30 | elapsed:   20.9s remaining:    2.3s
[Parallel(n_jobs=30)]: Done  30 out of  30 | elapsed:   21.5s finished

After that, it runs for a very long time (10x as long? more?) on a single core before it finally finishes. Often, the progress messages are all printed at once at the end, when the task completes:

Building estimator 1 of 3 for this parallel run (total 100)...
Building estimator 2 of 3 for this parallel run (total 100)...
Building estimator 3 of 3 for this parallel run (total 100)...
...

I presume that's from parallel processes or threads printing to stdout without flushing.
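The "Building estimator 1 of 3 for this parallel run (total 100)" lines are consistent with the 100 trees being split as evenly as possible over the 30 workers: 100 = 10 × 4 + 20 × 3, so ten workers would each build four trees and the other twenty would build three. A minimal sketch of that split, assuming an even-split-with-remainder scheme similar to what scikit-learn's private `_partition_estimators` helper does; `partition_estimators` below is a hypothetical name, not a public API:

```python
def partition_estimators(n_estimators, n_jobs):
    """Split n_estimators as evenly as possible across n_jobs workers.

    Hypothetical illustration: the first (n_estimators % n_jobs) workers
    receive one extra tree each.
    """
    n_jobs = min(n_jobs, n_estimators)          # never more workers than trees
    base, extra = divmod(n_estimators, n_jobs)  # trees per worker, leftover trees
    per_job = [base + 1 if i < extra else base for i in range(n_jobs)]
    # Start index of each worker's slice of the estimator list
    starts = [0]
    for count in per_job:
        starts.append(starts[-1] + count)
    return n_jobs, per_job, starts


n_jobs, per_job, starts = partition_estimators(100, 30)
print(n_jobs)                              # 30
print(per_job.count(4), per_job.count(3))  # 10 20
print(starts[-1])                          # 100
```

Since each worker grows only three or four trees on `max_samples=1000` rows, a parallel phase of about 20 seconds is plausible, which suggests the slow single-core stage is something other than tree building.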

I create the isolation forest with:

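from sklearn.ensemble import IsolationForest

model_kwargs = {
    'n_estimators': 100,   # total number of trees
    'n_jobs': 30,          # parallel workers used to build the trees
    'verbose': 10,         # print joblib and per-estimator progress
    'max_samples': 1000,   # rows subsampled to grow each tree
    'behaviour': "new",    # only accepted by scikit-learn 0.20 to 0.23
}
clf = IsolationForest(**model_kwargs)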
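One possible explanation, not confirmed in this thread: in scikit-learn 0.20, when `contamination` is numeric (the default `"legacy"` is treated as 0.1), `fit` appears to finish by calling `score_samples` on the whole training set to set `offset_`. That pass pushes every one of the 1M to 100M rows through all 100 trees in a plain loop that does not use `n_jobs`, whereas tree building only touches 1000 rows per tree. Assuming that is the bottleneck, a workaround sketch is to fit with `contamination="auto"` (which sets `offset_ = -0.5` without scoring the training data), then score the rows in parallel chunks with joblib and pick the 10% threshold manually. `parallel_score_samples` is a hypothetical helper written for this example, and the code targets a recent scikit-learn where `behaviour` no longer exists:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.ensemble import IsolationForest


def parallel_score_samples(model, X, n_jobs=4, n_chunks=None):
    """Score rows of X with a fitted IsolationForest, one chunk per task.

    Each row's score depends only on the fitted trees, so scoring disjoint
    row chunks independently and concatenating them reproduces a single
    model.score_samples(X) call.
    """
    # At least one chunk, and never more chunks than rows (no empty chunks)
    n_chunks = max(1, min(n_chunks or n_jobs, len(X)))
    chunks = np.array_split(X, n_chunks)  # contiguous, nearly equal row blocks
    scores = Parallel(n_jobs=n_jobs)(
        delayed(model.score_samples)(chunk) for chunk in chunks
    )
    return np.concatenate(scores)


rng = np.random.RandomState(0)
X = rng.normal(size=(10_000, 50))  # synthetic stand-in for the real data

# contamination="auto" sets offset_ to -0.5 instead of scoring X inside fit.
# (On scikit-learn 0.20 to 0.23 this also requires behaviour="new".)
clf = IsolationForest(n_estimators=100, max_samples=1000,
                      contamination="auto", n_jobs=4, random_state=0)
clf.fit(X)

scores = parallel_score_samples(clf, X, n_jobs=4)
# Mimic contamination=0.1: flag rows scoring below the 10th percentile.
threshold = np.percentile(scores, 10.0)
is_outlier = scores < threshold
```

The chunked scores should match `clf.score_samples(X)` exactly. The trade-off is that joblib's default process-based backend copies each chunk (and the model) to a worker, which costs memory on very large inputs.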

Versions

System:
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
executable: /home/ibackus/anaconda3/bin/python
machine: Linux-4.15.0-1032-aws-x86_64-with-debian-buster-sid

BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /home/ibackus/anaconda3/lib
cblas_libs: mkl_rt, pthread

Python deps:
pip: 18.1
setuptools: 40.6.3
sklearn: 0.20.2
numpy: 1.15.4
scipy: 1.2.1
Cython: 0.29.2
pandas: 0.24.1
