Discussion working memory vs. performances · Issue #11506 · scikit-learn/scikit-learn · GitHub

Discussion working memory vs. performances #11506
Closed
@jeremiedbb

Description

I ran some benchmarks of KMeans performance while varying working_memory (see #10280). I am opening this discussion as suggested by @rth in #11271. In KMeans, working memory comes into play in the function pairwise_distances_argmin_min.
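To make the role of working memory concrete, here is a rough sketch of how a memory budget translates into the number of rows of the distance matrix processed per chunk. It is a simplified model and not the library's actual code: the helper name `rows_per_chunk` is hypothetical, and it assumes float64 distances and a budget expressed in MiB.

```python
def rows_per_chunk(working_memory_mib, n_centers, itemsize=8):
    """Rows of the (n_samples x n_centers) distance matrix that fit in the budget.

    Hypothetical helper: assumes one float64 (8 bytes) per distance and a
    working-memory budget given in MiB.
    """
    bytes_per_row = n_centers * itemsize
    budget_bytes = int(working_memory_mib * 2**20)
    # Always process at least one row, even if the budget is tiny.
    return max(1, budget_bytes // bytes_per_row)


# With 1000 centers, each row of distances takes 8000 bytes.
for wm in (1, 8, 1024):
    print(f"{wm:5d} MiB -> {rows_per_chunk(wm, n_centers=1000)} rows per chunk")
```

Under these assumptions, a budget of about 1 GiB is enough to hold the distances of more than 100000 samples to 1000 centers in a single chunk, while a budget of a few MiB splits the same computation into many small chunks.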

The benchmarks are shown below. I timed KMeans.fit on a problem with 100000 samples, 50 dimensions and 1000 clusters, on 3 different machines.
[Figures: three benchmark plots, images not preserved]
It seems that working memory has an impact on performance and, moreover, that the optimum is close to the CPU cache size. I think the first plot is quite noisy because it was made on my machine with other processes running; it also focuses on smaller working memory values.
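A measurement along these lines could be sketched as follows. This is not the script used for the plots: it times pairwise_distances_argmin_min directly on a much smaller random problem, and the helper name `time_argmin_min` and the list of budgets are assumptions made for illustration. Recent scikit-learn releases may route this function through a different backend, in which case working_memory may have little or no visible effect.

```python
import time

import numpy as np
from sklearn import config_context
from sklearn.metrics import pairwise_distances_argmin_min


def time_argmin_min(X, centers, working_memory_mib, repeats=3):
    """Best-of-`repeats` wall time (seconds) for one working_memory setting."""
    best = float("inf")
    with config_context(working_memory=working_memory_mib):
        for _ in range(repeats):
            start = time.perf_counter()
            pairwise_distances_argmin_min(X, centers)
            best = min(best, time.perf_counter() - start)
    return best


# Small illustrative problem; the benchmark in this issue is far larger.
rng = np.random.RandomState(0)
X = rng.rand(10_000, 50)
centers = rng.rand(100, 50)

# Working-memory budgets in MiB (arbitrary choices for this sketch).
timings = {wm: time_argmin_min(X, centers, wm) for wm in (1, 4, 16, 64, 1024)}
for wm, seconds in timings.items():
    print(f"working_memory={wm:5d} MiB: {seconds * 1e3:.1f} ms")
```

Taking the best of several repeats reduces, but does not remove, the noise from other processes running on the machine.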

Even if the improvement is at most 2x, it's worth considering changing the default value of working_memory, which is currently 1024 MiB. However, the best value depends on the CPU specs. Would it be possible to infer working_memory from them?
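As one possible direction, the sketch below reads the CPU cache sizes that Linux exposes under sysfs and derives a candidate budget from the largest one. It is only an illustration under stated assumptions: the helpers `parse_cache_size` and `largest_cache_bytes` are hypothetical, the sysfs layout is Linux-specific, and using the largest cache as the budget is a guess based on the plots, not a validated rule.

```python
from pathlib import Path

_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_cache_size(text):
    """Convert a sysfs cache size string such as '32K' or '8M' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def largest_cache_bytes(cpu_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Largest cache size reported for one CPU, or None if unavailable."""
    sizes = []
    for size_file in Path(cpu_dir).glob("index*/size"):
        try:
            sizes.append(parse_cache_size(size_file.read_text()))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
    return max(sizes) if sizes else None


cache_bytes = largest_cache_bytes()
if cache_bytes is None:
    print("cache size not available; keep the default working_memory")
else:
    # Round down to whole MiB, with a floor of 1 MiB.
    print(f"candidate working_memory: {max(1, cache_bytes // 2**20)} MiB")
```

On platforms without this sysfs layout the function returns None, so any default built on it would still need a fallback value.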

ping @ogrisel
