Discussion: working memory vs. performance #11506
Thanks for these benchmarks @jeremiedbb! They are consistent with @jnothman's earlier benchmarks in #10280 (comment). If I rerun the first set of benchmarks, divide all timings by the curve minimum (to be able to compare them on a linear scale), remove the log scale, and add more points: (updated plot omitted). My L3 CPU cache is 3 MB, and the optimum here appears to be around 30 MB, which is rather consistent with your figures above.
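The normalization step described above (dividing each timing curve by its minimum so curves from different machines share one linear scale) can be sketched as follows; the arrays are hypothetical, stand-ins for the actual benchmark data:

```python
import numpy as np

# Hypothetical timings (seconds), one per tested working_memory value (MB)
working_memory_mb = np.array([1, 4, 16, 30, 64, 256, 1024])
timings = np.array([12.0, 8.5, 6.2, 5.9, 6.4, 7.8, 9.1])

# Dividing by the curve minimum maps the optimum to 1.0 on every machine,
# so curves from CPUs with very different speeds can share one plot.
relative = timings / timings.min()

best = working_memory_mb[np.argmin(relative)]
print(best)  # → 30 (the working_memory with the fastest run in this fake data)
```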
Assuming we could detect the CPU L3 cache size (which doesn't seem very straightforward), what do you think the relationship would be? |
Not that consistent: I find the minimum to be around the CPU cache size, while yours seems to be about 10× the CPU cache. Which function did you benchmark? |
What's the cache size on your machine? |
Sorry, I didn't say: it's 4 MB. What's the number of clusters, and what's the dtype of your arrays? |
See the gist with the benchmark code in the first link of my first comment. |
https://github.com/workhorsy/py-cpuinfo seems to be the python tool of choice to get the L3 CPU cache size. |
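A minimal sketch of how py-cpuinfo might be queried, assuming its `get_cpu_info()` dict exposes an `l3_cache_size` field (an int in bytes in recent versions, a string like `"3072 KB"` in older ones); the helper below normalizes both forms:

```python
def cache_size_bytes(value):
    """Normalize a py-cpuinfo cache-size value to bytes.

    Recent py-cpuinfo versions report an int (bytes); older ones
    report strings such as '3072 KB' or '3 MB'.
    """
    if isinstance(value, int):
        return value
    units = {"KB": 1024, "MB": 1024 ** 2, "KIB": 1024, "MIB": 1024 ** 2}
    number, unit = value.split()
    return int(float(number) * units[unit.upper()])

try:
    import cpuinfo  # pip install py-cpuinfo (assumption: field name below exists)
    l3 = cache_size_bytes(cpuinfo.get_cpu_info()["l3_cache_size"])
    print(f"L3 cache: {l3 / 1024 ** 2:.0f} MB")
except (ImportError, KeyError):
    l3 = None  # fall back to a fixed working_memory default
```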
@rth I ran your benchmark on my machine and got the same result: a minimum around 30 MB of working memory.
Agreed. However, I've no idea how the optimal working memory relates to the CPU specs. So should we use a lower fixed default like 32 or 64 MB? |
In addition, the dtype of X in your benchmarks is np.float64. When I run the same benchmark with dtype=np.float32, I find the minimum at 60 MB. See below. Hmm, actually I'm not sure, but the chunk sizes seem to be computed incorrectly in:

```python
chunk_n_rows = get_chunk_n_rows(row_bytes=8 * _num_samples(Y),
                                max_n_rows=n_samples_X,
                                working_memory=working_memory)
```

We should compute row_bytes according to X.dtype. @jnothman You wrote that code, right? Can you confirm? |
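To illustrate the suggested fix, here is a self-contained sketch (not sklearn's actual get_chunk_n_rows, though it mirrors its MiB-based arithmetic) where row_bytes is derived from the array's dtype instead of the hard-coded 8-byte float64 itemsize:

```python
import numpy as np

def chunk_n_rows(row_bytes, max_n_rows, working_memory):
    """Sketch of the chunking logic: how many rows of the distance
    matrix fit in `working_memory` MiB, capped at `max_n_rows`."""
    n = int(working_memory * 2 ** 20 // row_bytes)
    return max(1, min(n, max_n_rows))

# The fix suggested above: derive row_bytes from X.dtype instead of
# hard-coding 8 (the float64 itemsize).
X = np.zeros((100_000, 50), dtype=np.float32)
n_samples_Y = 1000  # e.g. number of cluster centers in KMeans

row_bytes = X.dtype.itemsize * n_samples_Y  # 4 * 1000 for float32
print(chunk_n_rows(row_bytes, max_n_rows=X.shape[0], working_memory=64))
# → 16777 (twice as many rows per chunk as the hard-coded float64 path)
```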
We won't be able to add an external dependency like py-cpuinfo for this.
Yes, but what about #10280 (comment)?
There was a somewhat related discussion in #10280 (comment), but I'm not sure if this was desired or not. (I would test 64 bit rather than 32 bit by default, given #9354.) |
Sorry, I don't have much attention for this right now. Yes, I may have forgotten to make the output dtype size configurable. I did not, however, expect us to get a perfect parameterisation on the first shot; I needed, in the first instance, a sensible mechanism for controlling algorithms where unbounded memory usage was causing problems.

I had also initially found 64 MiB was appropriate, but increased the value given others' benchmarks. Again, I don't think this requires fine-tuning as long as the API is stable, but I would happily see it decreased an order of magnitude or calculated from system specs where readily available. |
Given the recent improvements to pairwise distances and reductions, which are memory-efficient and fast, I think the subject of this issue is becoming outdated. pairwise_distances_chunked and pairwise_distances_argmin(_min) are destined to disappear at some point. |
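For context, pairwise_distances_chunked is the function that honours the working_memory setting: it yields the distance matrix in row blocks sized to fit the budget. A quick sketch of its use (the array here is arbitrary demo data):

```python
import numpy as np
from sklearn.metrics import pairwise_distances_chunked

X = np.random.RandomState(0).rand(500, 10)

# Each yielded chunk is a block of rows of the full 500 x 500 distance
# matrix, sized so a chunk stays within working_memory (in MB). With a
# 1 MB budget this yields several chunks instead of one big matrix.
chunks = list(pairwise_distances_chunked(X, working_memory=1))
D = np.vstack(chunks)
print(D.shape)  # → (500, 500)
```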
I ran some benchmarks on KMeans performance when varying working_memory (see #10280). I open this discussion as suggested by @rth in #11271. In KMeans, working memory is involved in the function pairwise_distances_argmin_min. You can see the benchmarks below: I benchmarked KMeans.fit on a problem with 100000 samples, 50 dimensions and 1000 clusters, on 3 different machines.

(benchmark plots omitted)

It seems that working memory has an impact on performance, and moreover that the optimum is close to the CPU cache size. I think the first plot is quite noisy because it was made on my machine with other processes running, and it also focuses on smaller working memories.

Even if the improvement is at most 2x, it's worth considering changing the default value of working memory, which is currently 1000 MB. However, the optimum depends on the CPU specs. Would it be possible to make working_memory be inferred from them? ping @ogrisel
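For anyone wanting to reproduce these experiments: whatever the default ends up being, it can already be overridden globally with sklearn.set_config or temporarily with sklearn.config_context, e.g.:

```python
import sklearn

# Global override: chunked pairwise-distance computations (including the
# pairwise_distances_argmin_min calls inside KMeans) target ~64 MB chunks.
sklearn.set_config(working_memory=64)
print(sklearn.get_config()["working_memory"])  # → 64

# Temporary override, handy when sweeping values in a benchmark loop
with sklearn.config_context(working_memory=4):
    print(sklearn.get_config()["working_memory"])  # → 4
```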