-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH Add float32
implementations for BallTree
and KDTree
#25914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Add float32
implementations for BallTree
and KDTree
#25914
Conversation
Hi Omar, thank you for looking into it. Adding support for float32 in binary trees is major work. I think we might want to proceed in several steps using a feature branch. This way, we could create PRs to:
What do people think? |
@jjerphan Could you kindly check the error and let me know what I am missing? It is an error related to |
This stems from Note I am not able to make the test fail on my end, but I think this issue can be fixed by performing this change: diff --git i/setup.py w/setup.py
index f5522600f..a07506013 100755
--- i/setup.py
+++ w/setup.py
@@ -497,10 +497,10 @@ def configure_extension_modules():
# `source` is a Tempita file
tempita_sources.append(source)
- # Do not include pxd files that were generated by tempita
- if os.path.splitext(new_source_path)[-1] == ".pxd":
- continue
- sources.append(new_source_path)
+ # Do not include header files (".pxd") and include files ("pxi")
+ # that were generated by Tempita.
+ if os.path.splitext(new_source_path)[-1] not in (".pxd", ".pxi"):
+ sources.append(new_source_path)
gen_from_templates(tempita_sources) |
How about we include the cython changes and basic tests for ball_tree and kd_tree in one PR and then a follow up PR can be used to treat any underlying estimators which make use of ball_tree or kd_tree? |
We also can. I think I would wait for other maintainers' point of view for the relevance of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few notes.
This increase the size of binary outputs, but I think this is acceptable.
Details
On 3043ea6 (main
):
sklearn/neighbors/_partition_nodes.cpython-311-x86_64-linux-gnu.so: 29656 bytes
sklearn/neighbors/_kd_tree.cpython-311-x86_64-linux-gnu.so: 510104 bytes
sklearn/neighbors/_ball_tree.cpython-311-x86_64-linux-gnu.so: 515280 bytes
On 301afc9 (this branch):
sklearn/neighbors/_partition_nodes.cpython-311-x86_64-linux-gnu.so: 34192 bytes
sklearn/neighbors/_kd_tree.cpython-311-x86_64-linux-gnu.so: 709496 bytes
sklearn/neighbors/_ball_tree.cpython-311-x86_64-linux-gnu.so: 710944 bytes
Hence:
- 15% binary size increase for
_partition_nodes
- 39% binary size increase for
_kd_tree
- 37% binary size increase for
_ball_tree
Also, could you add a changelog?
BallTree
and KDTree
support for float32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the last comments are treated.
Thank you, @OmarManzoor!
I think this PR is ready for a review, @OmarManzoor. What do you think? |
Well I think there might be two items remaining here:
What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API and implementation look good as discussed. Some minor notes to clean up tests a bit. Thanks!
@Micky774: what do you think of requiring a third reviewer's approval for this PR? If you or another maintainer is OK with our two approvals, I let you merge this PR. Edit: I added a note in the description of the PR to mention that those implementations aren't usable from the API yet. |
Sure. It's a novel pattern, so extra consideration won't hurt (plus it's not a super urgent change). General ping for anyone w/ bandwidth to review this (mainly concerning API) @scikit-learn/core-devs For those that are interested, see this comment for a cleaner diff than with the github UI. |
@ogrisel Wanted to go ahead and ping you in case you were interested in discussing the path we're taking here! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing the usage of pxi
files and Tempita does make this part of the code base a more complicated. I do not see any great alternatives, so I am okay with moving forward with this PR.
I see the linter check failed, but I didn't understand why? |
There are sometimes sporadic failures due to the network or the CI infrastructure. I reran the job and it passed. |
@thomasjpfan Can this be merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR does not actually use the new KDTree32
or BallTree32
in non-testing code.
@Micky774 @OmarManzoor @jjerphan For transparency, can one of you open an issue regarding the roadmap for what work needs to be done after this PR is merged?
{{endfor}} | ||
|
||
|
||
class BallTree(BallTree64): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BallTree docs are missing:
class BallTree(BallTree64): | |
class BallTree(BallTree64): | |
__doc__ = CLASS_DOC.format(BinaryTree="BallTree") |
(Looking over CLASS_DOC
, it only requires BinaryTree
)
{{endfor}} | ||
|
||
|
||
class KDTree(KDTree64): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here regarding __doc__
.
Congratulations @OmarManzoor 🥳! Thank you for your work. |
All reactions
-
🎉 1 reaction
Congratulations, @OmarManzoor! |
…-learn#25914) Co-authored-by: Omar Salman <omar.salman@arbisoft> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
…-learn#25914) Co-authored-by: Omar Salman <omar.salman@arbisoft> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
Fixes #15474
What does this implement/fix? Explain your changes.
ℹ️ Note that those new implementations for float32 are tested but aren't usable from the
KDTree
andBallTree
public API because the implementations for float64 need to be decoupled from those API. This will be done in another PR.Any other comments?
CC: @jjerphan