KMeans significantly slower on 0.23 · Issue #17208 · scikit-learn/scikit-learn · GitHub

KMeans significantly slower on 0.23 #17208


Closed
PrimozGodec opened this issue May 13, 2020 · 15 comments

@PrimozGodec

Describe the bug

With the latest release, KMeans is significantly slower on small datasets. The time needed to compute clusters is around ten times longer.

Steps/Code to Reproduce

Times with the following code are:
scikit-learn 0.22: ~0.015 s
scikit-learn 0.23: ~0.15 s

import time

import sklearn.cluster
from sklearn import datasets

data = datasets.load_iris()['data']

t = time.time()
sklearn.cluster.KMeans(n_clusters=2).fit(data)
print(time.time() - t)

I also tried a bigger dataset of shape (300, 25), where clustering with the new version needed 3-4 s, while before it finished in a fraction of a second.
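For a steadier measurement than a single wall-clock sample, the same comparison can be run with timeit on synthetic data of a comparable shape (the (300, 25) array below is randomly generated and only stands in for the original dataset):

```python
import timeit

import numpy as np
import sklearn.cluster

rng = np.random.RandomState(0)
data = rng.rand(300, 25)  # synthetic stand-in for the (300, 25) dataset

# Time a full fit a few times and report the best run,
# which filters out one-off startup costs.
best = min(timeit.repeat(
    lambda: sklearn.cluster.KMeans(n_clusters=2, n_init=10,
                                   random_state=0).fit(data),
    repeat=3, number=1))
print(f"best of 3 fits: {best:.3f} s")
```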

Expected Results

Clusters would be computed as fast as before.

Versions

System:
    python: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:05:27)  [Clang 9.0.1 ]
executable: /Users/primoz/miniconda3/envs/orange/bin/python
   machine: Darwin-19.0.0-x86_64-i386-64bit
Python dependencies:
       pip: 20.1
setuptools: 46.1.3
   sklearn: 0.23.0
     numpy: 1.18.4
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.3
matplotlib: 3.2.1
    joblib: 0.14.1
Built with OpenMP: True
@jeremiedbb
Member

Thanks for the report. Can you tell how many cores you have?
Can you try setting OMP_NUM_THREADS=1 as an env var before launching Python? (I'm not proposing to do it permanently; it's just to check whether I'm on the right path.)
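Setting the variable for a single run only (so it does not persist in the shell) can be done by prefixing the command; the script name below is just a placeholder for whatever reproduction you run:

```shell
# Limit OpenMP to one thread for this one invocation only;
# the variable does not leak into the surrounding shell session.
# (benchmark_kmeans.py is a hypothetical reproduction script.)
OMP_NUM_THREADS=1 python benchmark_kmeans.py
```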

@PrimozGodec
Author

Thank you for the fast reaction. I have 4 cores.
I also tried setting OMP_NUM_THREADS=1 and nothing changed regarding the speed; it was just as slow as before.

@jeremiedbb
Member

I can reproduce the slowdown on my laptop, and the changes I made in #17210 solve the issue for me. We are going to merge it and it will be available in the nightly builds. Here are the instructions to install them: https://scikit-learn.org/stable/developers/advanced_installation.html#installing-nightly-builds. We'd be very interested if you could check whether it fixes your issue (you'd have to wait a day after we merge it).

@PrimozGodec
Author

I checked out your PR and installed the package with pip install -e . It slightly improves the speed, but it is still much slower than before for me. Did I do anything wrong?

Now times are:
scikit-learn 0.22: ~0.015 s
scikit-learn 0.23: ~0.15 s
scikit-learn #17210: ~0.11 s

@jeremiedbb
Member

I did cleaner measurements, and indeed the proposed fix is still ~3x slower. Profiling showed that a new helper takes 90% of the execution time; it is called at each iteration. I'll make a PR to move it outside the iteration loop.
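The fix described here is a classic hoisting transformation: an invariant check is pulled out of the loop so it runs once instead of once per iteration. A generic sketch, with a made-up `validate` helper standing in for the real scikit-learn internals:

```python
import numpy as np

def validate(X):
    """Stand-in for an expensive per-call check (hypothetical helper)."""
    return np.ascontiguousarray(X, dtype=np.float64)

def fit_slow(X, n_iter=100):
    total = 0.0
    for _ in range(n_iter):
        Xv = validate(X)  # re-validates the same unchanged array every iteration
        total += Xv.sum()
    return total

def fit_fast(X, n_iter=100):
    Xv = validate(X)      # hoisted: validate once, before the loop
    total = 0.0
    for _ in range(n_iter):
        total += Xv.sum()
    return total

X = [[1.0, 2.0], [3.0, 4.0]]
assert fit_slow(X) == fit_fast(X)  # same result, far fewer validations
```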

Out of curiosity, is performance critical for you for problems that take ~10ms to run ?

@PrimozGodec
Author

@jeremiedbb thank you. That would be great.

We use scikit-learn as a dependency of Orange (a graphical tool for data analysis), and on the other problem (data of shape (300, 25)) clustering that took a fraction of a second before (0.1-0.5 s) now takes a few seconds. Most of Orange's users work with smaller datasets, and for them it is quite a difference. It is not the most critical issue, but if possible we would prefer that things be computed faster.

@jeremiedbb
Member
jeremiedbb commented May 15, 2020

I opened #17235. It's better but still not as fast as scikit-learn 0.22.

EDIT: the conclusions are wrong, please ignore.

The reason is the overhead of the deprecation of positional args in 0.23 as we can see in this profiling:
[profiling flame-graph screenshot]

Calling public functions that are now wrapped in the positional-args deprecation decorator inside tight loops may have a non-negligible overhead. The solution would be to do something like:

@_deprecate_positional_args
def function(x):
    return _function(x)

def _function(x):
    ...  # do the actual work

and call _function internally. That would apply to all utilities like metrics, validation, etc. I don't know if we want to dive into this to improve the performance of millisecond-scale problems. What do you think @adrinjalali @rth ?

@adrinjalali
Member
adrinjalali commented May 15, 2020

Another alternative is to disable the wrapping with a context manager. For instance:

def _deprecate_positional_args(f):
    if not get_config()["wrap_positional"]:
        return f
    ...

with config_context(wrap_positional=False):
    public_method(...)

Would this improve the overhead?

WDYT @thomasjpfan?

@thomasjpfan
Member
thomasjpfan commented May 15, 2020

WDYT @thomasjpfan?

This could work. I have another idea I want to try out ;)

@jeremiedbb Can you provide the code snippet you used to profile?

@jeremiedbb
Member
jeremiedbb commented May 15, 2020

I take back what I said; I misinterpreted the profile. The overhead of the decorator is actually the very small bars right under inner_f. The three large bars are the functions themselves (euclidean_distances, check_array and check_pairwise here).

As a confirmation, I tried adrin's suggestion and it did not improve performance.

@thomasjpfan here's what I use to profile:

pip install snakeviz

ipython
> %load_ext snakeviz
> %snakeviz my_func(x)

in this specific case:

import sklearn.cluster
from sklearn import datasets
data = datasets.load_iris()['data']

# 10000 iterations to get a more reliable profile
km = sklearn.cluster.KMeans(n_clusters=2, max_iter=10000, tol=-1).fit(data)

%snakeviz km.fit(data)
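The same kind of profile can be collected without IPython or snakeviz, using the standard-library cProfile and sorting by cumulative time:

```python
import cProfile
import pstats

import sklearn.cluster
from sklearn import datasets

data = datasets.load_iris()['data']
km = sklearn.cluster.KMeans(n_clusters=2, n_init=10, random_state=0)

profiler = cProfile.Profile()
profiler.enable()
km.fit(data)
profiler.disable()

# Show the 10 entries with the largest cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)
```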

@thomasjpfan
Member

At a glance, it looks like the other parts of inner_f are all the other wrapped functions such as check_pairwise_array and check_array; the overhead of _deprecate_positional_args itself is on the order of a millisecond in total:

[snakeviz profile screenshot]

That being said I think we can speed things up here as well.

@jeremiedbb
Member

At a glance, it looks like the other parts of inner_f are all the other wrapped functions such as check_pairwise_array and check_array, the overhead of _deprecate_positional_args itself is ~ms in total

That's exactly what I said in the previous comment :D

@jeremiedbb
Member

I think I have now managed to fix the issue in #17235. @PrimozGodec, would you mind checking it out to see if you recover the performance of 0.22?

@PrimozGodec
Author

@jeremiedbb thank you for your help. I tested the PR and it now performs normally.

@jeremiedbb
Member

Thanks @PrimozGodec ! Closing.
