Array API support for k-nearest neighbors models with the brute force method · Issue #26586 · scikit-learn/scikit-learn

Open

ogrisel opened this issue Jun 15, 2023 · 4 comments
@ogrisel (Member) commented Jun 15, 2023

This issue is a sibling of the corresponding issue for k-means (#26585), with a similar purpose but likely different constraints.

In particular, an efficient implementation of k-NN on the GPU would require:

@github-actions github-actions bot added the Needs Triage Issue requires triage label Jun 15, 2023
@ogrisel ogrisel added New Feature Array API and removed Needs Triage Issue requires triage labels Jun 15, 2023
@fcharras (Contributor) commented:

Here is a relevant gist of what could be a PyTorch drop-in replacement for the kneighbors method:

https://gist.github.com/fcharras/82772cf7651e087b3b91b99105a860dd

Quoting myself from the k-means thread:

To my knowledge, the best brute-force implementations require materializing the pairwise distance matrix in memory and are bound by the memory I/O bottleneck, so the achievable speedup is more limited, and the PyTorch implementation should be decently close to the best you can get.
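To make the approach concrete, here is a minimal sketch of a brute-force kneighbors step along the lines described above (materialize the full pairwise squared-distance matrix, then select the k smallest entries per row). This is not the code from the gist; the function name is hypothetical, and NumPy stands in for whatever Array API namespace (PyTorch, CuPy, ...) the inputs come from:

```python
import numpy as np


def kneighbors_brute_force(X_train, X_query, n_neighbors):
    """Hypothetical sketch of brute-force kneighbors (not the gist's code).

    Materializes the full (n_query, n_train) squared-distance matrix,
    which is exactly the memory/IO bottleneck discussed above.
    """
    # Squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, with no explicit loops.
    sq_train = (X_train ** 2).sum(axis=1)
    sq_query = (X_query ** 2).sum(axis=1)
    distances = sq_query[:, None] - 2.0 * (X_query @ X_train.T) + sq_train[None, :]

    # argpartition avoids a full sort: only the k smallest are needed.
    idx = np.argpartition(distances, n_neighbors - 1, axis=1)[:, :n_neighbors]
    # Order the k selected neighbors by distance for a stable result.
    order = np.argsort(np.take_along_axis(distances, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)
```

A GPU implementation would follow the same two phases (a large matmul-like distance computation, then a top-k selection), typically chunking the query rows so the distance matrix fits in device memory.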

@ogrisel (Member, Author) commented Jul 13, 2023

It would be interesting to compare with cuML; if cuML is much faster than this PyTorch GPU implementation of brute-force kNN, it might be worth seeing whether we can reach similar performance with a Triton-based implementation.

@betatim (Member) commented Jul 20, 2023

I forked your original gist and added a basic cuML comparison: https://gist.github.com/betatim/68219c95f539df51afad96cd9cd14a1c

On a machine with 8 Tesla V100 GPUs (32 GB each), 80 Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz cores (from /proc/cpuinfo), and 1 TB RAM, I get about 8 s for the torch implementation and about 1 s for the cuML option, with 5M samples in the data (more than the original gist used).

On a second run I got 6 s and 0.33 s respectively; the timings seem to fluctuate a bit.
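One common way to reduce that kind of run-to-run fluctuation is to warm up once and then report the best of several repetitions. A minimal, CPU-only sketch (the helper name is hypothetical; for CUDA code one would additionally call `torch.cuda.synchronize()` before each clock read, since GPU kernels launch asynchronously):

```python
import time

import numpy as np


def best_of(fn, n_repeats=5):
    """Hypothetical helper: time fn several times, return the best wall time.

    The warm-up call excludes one-off costs (allocation, JIT, caches).
    A CUDA version would synchronize the device before reading the clock.
    """
    fn()  # warm-up
    timings = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))
elapsed = best_of(lambda: X @ X.T)
```

Reporting the minimum rather than the mean is a deliberate choice for this kind of micro-benchmark: the fastest run is closest to the noise-free cost, while slower runs mostly measure interference from the rest of the system.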

@ogrisel (Member, Author) commented Aug 21, 2023

Have you tried setting CUDA_VISIBLE_DEVICES=0 to make sure that neither implementation leverages the fact that the benchmark machine has multiple GPU devices?
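For reference, the variable can be set either in the shell (`CUDA_VISIBLE_DEVICES=0 python bench.py`) or from Python, as long as it happens before the CUDA-using library initializes the driver, i.e. before its first import in the process. A minimal sketch:

```python
import os

# Restrict CUDA frameworks to the first GPU only. This must run before
# torch / cuml / cupy are imported, otherwise the setting is ignored
# because the CUDA context has already enumerated all devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import torch  # would now see exactly one device
```

With this in place, any speed difference between the two implementations can no longer be attributed to one of them silently using several GPUs.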

@glemaitre glemaitre moved this to Todo in Array API May 17, 2024