-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+1] Parallel radius neighbors #10887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Parallel radius neighbors #10887
Conversation
We found in benchmarks that this did not improve runtime. Does your mileage
vary?
|
I haven't done a proper benchmark yet but I saw much better runtime on my laptop. I'll do a benchmark today and post the results here. |
I ran 10 times the following benchmark on a Google Cloud instance with 64 vCPUs: from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.externals.joblib import cpu_count
import time
d = make_blobs(100000, 100)[0]
nn = NearestNeighbors().fit(d)
for n_jobs in range(1, cpu_count() + 1):
nn.n_jobs = n_jobs
start = time.time()
nn.radius_neighbors()
end = time.time()
print('{},{}'.format(n_jobs, end - start)) Although the scaling is not linear it definitely runs faster: |
maybe I'm not recalling correctly, and we never got as far as completely
removing the gil. The results look great! The implementation looks good at
a glance but will need a proper review. please add a test that results are
unchanged.
|
One thing that was not obvious to me but made a huge difference was that in - raise ValueError("Fatal: count out of range. "
- "This should never happen.")
+ with gil:
+ raise ValueError("Fatal: count out of range. "
+ "This should never happen.")
instead of this: - raise ValueError("Fatal: count out of range. "
- "This should never happen.")
+ return -1 have a completely different scaling behavior. In both case the function is |
@jnothman Thank you for your comments. I think this is ready for review. |
And I wish I were available to review it, but will not be for a few weeks
at least...
…On 2 April 2018 at 17:44, Joël Billaud ***@***.***> wrote:
@jnothman <https://github.com/jnothman> Thank you for your comments. I
think this is ready for review.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#10887 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6yEYiXdA6Uiz7cn1_xVbU6Y1IzEcks5tkdbigaJpZM4S_fFL>
.
|
Nice work ! The code seems reasonable, though I am a novice in C memory allocation. As a general comment, your code could benefit from more comments, especially near non-standard function like For instance, you could add comments with the equivalent python expressions: |
@TomDLT Thank you for your comment. I added some comments as you suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Disclaimer: I am a novice in C memory allocation.
Could you add an entry in doc/whats_new/v0.20.rst
?
@TomDLT thank you for the review. I updated the release notes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Nice work, thanks
Merge when green |
Reference Issues/PRs
Fixes #8003
What does this implement/fix? Explain your changes.
This makes
RadiusNeighborsMixin.radius_neighbors
honor then_jobs
argument and split the queries among processors. This also makesquery_radius
GIL-free so that it is actually faster than single thread.TODO:
Any other comments?