KMeans significantly slower on 0.23 · Issue #17208 · scikit-learn/scikit-learn · GitHub

KMeans significantly slower on 0.23 #17208


Closed
PrimozGodec opened this issue May 13, 2020 · 15 comments

@PrimozGodec

Describe the bug

With the latest release, KMeans is significantly slower on small datasets. The time needed to compute clusters is around ten times longer.

Steps/Code to Reproduce

Times with the following code are:
scikit-learn 0.22: ~0.015 s
scikit-learn 0.23: ~0.15 s

import time

import sklearn.cluster
from sklearn import datasets

data = datasets.load_iris()['data']

t = time.time()
sklearn.cluster.KMeans(n_clusters=2).fit(data)
print(time.time() - t)

I also tried a bigger dataset of shape (300, 25), where clustering with the new version needed 3-4 s, while before it finished in a fraction of a second.
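For a steadier measurement than a single wall-clock sample, the same comparison can be run with timeit on synthetic data of a comparable shape (the (300, 25) array below is randomly generated and only stands in for the original dataset):

```python
import timeit

import numpy as np
import sklearn.cluster

rng = np.random.RandomState(0)
data = rng.rand(300, 25)  # synthetic stand-in for the (300, 25) dataset

# Time a full fit a few times and report the best run,
# which filters out one-off startup costs.
best = min(timeit.repeat(
    lambda: sklearn.cluster.KMeans(n_clusters=2, n_init=10,
                                   random_state=0).fit(data),
    repeat=3, number=1))
print(f"best of 3 fits: {best:.3f} s")
```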

Expected Results

Clusters would be computed as fast as before.

Versions

System:
    python: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:05:27)  [Clang 9.0.1 ]
executable: /Users/primoz/miniconda3/envs/orange/bin/python
   machine: Darwin-19.0.0-x86_64-i386-64bit
Python dependencies:
       pip: 20.1
setuptools: 46.1.3
   sklearn: 0.23.0
     numpy: 1.18.4
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.3
matplotlib: 3.2.1
    joblib: 0.14.1
Built with OpenMP: True
@jeremiedbb
Member

Thanks for the report. Can you tell how many cores you have?
Can you try setting OMP_NUM_THREADS=1 as an env var before launching Python? (I'm not proposing to do it permanently; it's just to check whether I'm on the right path.)
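Setting the variable for a single run only (so it does not persist in the shell) can be done by prefixing the command; the script name below is just a placeholder for whatever reproduction you run:

```shell
# Limit OpenMP to one thread for this one invocation only;
# the variable does not leak into the surrounding shell session.
# (benchmark_kmeans.py is a hypothetical reproduction script.)
OMP_NUM_THREADS=1 python benchmark_kmeans.py
```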

@PrimozGodec
Author

Thank you for the fast reaction. I have 4 cores.
I also tried setting OMP_NUM_THREADS=1 and nothing changed regarding the speed; it was just as slow as before.

@jeremiedbb
Member

I can reproduce the slowdown on my laptop, and the changes I made in #17210 solve the issue for me. We are going to merge it and it will be available in the nightly builds. Here are the instructions to install them: https://scikit-learn.org/stable/developers/advanced_installation.html#installing-nightly-builds. We'd be very interested if you could check whether it fixes your issue (you'd have to wait a day after we merge it).

@PrimozGodec
Author

I checked out your PR and installed the package with pip install -e . It slightly improves the speed, but it is still much slower than before for me. Did I do anything wrong?

Now times are:
scikit-learn 0.22: ~0.015 s
scikit-learn 0.23: ~0.15 s
scikit-learn #17210: ~0.11 s

@jeremiedbb
Member

I did cleaner measurements, and indeed the proposed fix is still ~3x slower. Profiling showed that a new helper takes 90% of the execution time; it is called at each iteration. I'll make a PR to move it outside the iteration loop.
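The fix described here is a classic hoisting transformation: an invariant check is pulled out of the loop so it runs once instead of once per iteration. A generic sketch, with a made-up `validate` helper standing in for the real scikit-learn internals:

```python
import numpy as np

def validate(X):
    """Stand-in for an expensive per-call check (hypothetical helper)."""
    return np.ascontiguousarray(X, dtype=np.float64)

def fit_slow(X, n_iter=100):
    total = 0.0
    for _ in range(n_iter):
        Xv = validate(X)  # re-validates the same unchanged array every iteration
        total += Xv.sum()
    return total

def fit_fast(X, n_iter=100):
    Xv = validate(X)      # hoisted: validate once, before the loop
    total = 0.0
    for _ in range(n_iter):
        total += Xv.sum()
    return total

X = [[1.0, 2.0], [3.0, 4.0]]
assert fit_slow(X) == fit_fast(X)  # same result, far fewer validations
```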

Out of curiosity, is performance critical for you for problems that take ~10ms to run ?

@PrimozGodec
Author

@jeremiedbb thank you. That would be great.

We use scikit-learn as a dependency of Orange (a graphical tool for data analysis), and on the other problem (data of shape (300, 25)) clustering that took a fraction of a second before (0.1-0.5 s) now takes a few seconds. Most of Orange's users work with smaller datasets, and for them it is quite a difference. It is not the most critical issue, but if possible we would prefer that things be computed faster.

@jeremiedbb
Member
jeremiedbb commented May 15, 2020

I opened #17235. It's better but still not as fast as scikit-learn 0.22.

EDIT: the conclusions are wrong, please ignore.

The reason is the overhead of the deprecation of positional args in 0.23 as we can see in this profiling:
[profiling flame-graph screenshot]

Calling public functions that are now wrapped in the positional-args deprecation decorator inside tight loops may have a non-negligible overhead. The solution would be to do something like:

@_deprecate_positional_args
def function(x):
    return _function(x)

def _function(x):
    ...  # do the actual work

and call _function internally. That would apply to all utilities like metrics, validation, etc. I don't know if we want to dive into this to improve the performance of millisecond-scale problems. What do you think @adrinjalali @rth ?

@adrinjalali
Member
adrinjalali commented May 15, 2020

Another alternative is to disable the wrapping with a context manager. For instance:

def _deprecate_positional_args(f):
    if not get_config()["wrap_positional"]:
        return f
    ...

with config_context(wrap_positional=False):
    public_method(...)

Would this improve the overhead?

WDYT @thomasjpfan?

@thomasjpfan
Member
thomasjpfan commented May 15, 2020

WDYT @thomasjpfan?

This could work. I have another idea I want to try out ;)

@jeremiedbb Can you provide the code snippet you used to profile?

@jeremiedbb
Member
jeremiedbb commented May 15, 2020

I take back what I said; I misinterpreted the profile. The overhead of the decorator is actually the very small bars right under inner_f. The three large bars are the functions themselves (euclidean_distances, check_array and check_pairwise here).

As a confirmation, I tried adrin's suggestion and it did not improve performance.

@thomasjpfan here's what I use to profile:

pip install snakeviz

ipython
> %load_ext snakeviz
> %snakeviz my_func(x)

in this specific case:

import sklearn.cluster
from sklearn import datasets
data = datasets.load_iris()['data']

# 10000 iterations to get a more reliable profile
km = sklearn.cluster.KMeans(n_clusters=2, max_iter=10000, tol=-1).fit(data)

%snakeviz km.fit(data)
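The same kind of profile can be collected without IPython or snakeviz, using the standard-library cProfile and sorting by cumulative time:

```python
import cProfile
import pstats

import sklearn.cluster
from sklearn import datasets

data = datasets.load_iris()['data']
km = sklearn.cluster.KMeans(n_clusters=2, n_init=10, random_state=0)

profiler = cProfile.Profile()
profiler.enable()
km.fit(data)
profiler.disable()

# Show the 10 entries with the largest cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)
```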

@thomasjpfan
Member

At a glance, it looks like the other parts of inner_f are all the other wrapped functions such as check_pairwise_array and check_array; the overhead of _deprecate_positional_args itself is on the order of a millisecond in total:

[snakeviz profile screenshot]

That being said I think we can speed things up here as well.

@jeremiedbb
Member

At a glance, it looks like the other parts of inner_f are all the other wrapped functions such as check_pairwise_array and check_array, the overhead of _deprecate_positional_args itself is ~ms in total

That's exactly what I said in the previous comment :D

@jeremiedbb
Member

I think I have now managed to fix the issue in #17235. @PrimozGodec, would you mind checking it out to see if you recover the performance of 0.22?

@PrimozGodec
Author

@jeremiedbb thank you for your help. I tested the PR and it now performs normally.

@jeremiedbb
Member

Thanks @PrimozGodec ! Closing.
