-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[WIP] Refactor fibonacci tree, use in Ball Tree #11541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @vyasr, Do you still have some time to work on this PR? |
This PR almost certainly needs to be restarted from scratch given how long it's been, but I'm interested and can certainly take a look. |
Please, do if you are. Personally, I am interested in seeing what Fibonacci Heaps have to offer for various implementations. |
Thank for exploring this, @vyasr. In retrospective, I do not think Fibonacci heaps are relevant for scikit-learn. Fibonacci heaps are theoretically speaking interesting construction over simple binary heaps: some operations are But technically, working with nodes which aren't contiguous in memory (the case of Fibonacci heaps) is much more costly and inefficient than working with a single structure of contiguous arrays which have been allocated once (the case of the current binary heaps used in scikit-learn originally implemented by @jakevdp): this implementation of binary heaps is optimal, much easier to understand and maintain as it is a single fused-typed function. Moreover, graph algorithms previously implemented in Hence, I am closing this PR. Still, you might be interested in scipy/scipy#16696. |
@jjerphan thanks for following up. Yes I think you are right. I briefly started working on this after our last messages, and when I saw that the graph algorithms had been removed I wondered the same. IIRC there's also some issues because the ball tree implementation uses two different binary heaps for two different purposes (I believe a fixed-size one for tracking nearest neighbors and a variable-size one for performance BFS/DFS) and the needs and performance characteristics of each are different. If there's future interest in this speedup I suspect that looking at pairing heaps for each of these uses separately might make sense. Pairing heaps are likely a better choice in practice than Fibonacci heaps given the operations in use and our interest in real-world performance, not just better scaling if it comes at the cost of larger prefactors. How they'll compare to a binary heap is less of a given, but if all we're doing is inserting then we may not see much performance gain by switching. On the plus side, the pairing heap implementation is far simpler than a Fibonacci heap (not quite as simple as a binary heap, but quite close) so it wouldn't be so painful to test out. |
Thanks for your comment, @vyasr.
That's exact. To add precision, both
I think, @jakevdp originally thought of improving the NeighborsHeaps (i.e. the max-heaps, used in both DFS and BFS)
That's exact. If you have time and are interested, feel free to have a look — I alternatively can. Small parenthesis: the |
Address Issue #597, which requests a refactoring of the existing fibonacci tree used for Dijkstra's algorithm in graph_nearest_neighbor.py in order to accelerate the Ball Tree in sklearn.neighbors.
This pull request is currently a work in progress. The fibonacci tree has been extracted, but the ball tree code still needs to be modified.