PWD Engine Refactor by Micky774 · Pull Request #7 · Micky774/scikit-learn · GitHub
PWD Engine Refactor #7


Closed · wants to merge 15 commits

Conversation

@Micky774 (Owner)

Reference Issues/PRs
N/A

What does this implement/fix? Explain your changes.
Refactors PWD backend code to disentangle core computation from problem context

Any other comments?
Draft view for refactor -- will make PR on main repo when more mature.

@Vincent-Maladiere left a comment

Hey @Micky774, very nice rework!

I worked on a similar approach in scikit-learn#25170, but I ran into some performance decreases and found no workaround for them.

I think your approach is more likely to succeed, so I took the liberty of writing some comments.

@Micky774 (Owner, Author) commented Jan 4, 2023

Initial tests show no significant difference in performance between the current refactor approach and main. I have not run exhaustive tests, but here are the results I currently have on my 16-core machine (only on ArgKmin with the EuclideanEngines):

Small `n_samples`

[benchmark plot]

Larger `n_samples`

[benchmark plot]

@Vincent-Maladiere commented Jan 11, 2023

Hey @Micky774! We scratched our heads for a while with @jjerphan trying to make scikit-learn#25170 work, and we concluded that calling surrogate_dist in a for-loop leads to the observed ~30% performance decrease. As a workaround, we tried to inline it so that surrogate_dist wouldn't introduce extra cost, but we failed because inlining is not feasible with inheritance: Cython doesn't know how to dispatch inline methods when using inheritance (more details here).
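
For illustration, a minimal Cython sketch of the pattern being described (names invented; this is not the actual scikit-learn code): an overridable `cdef` method cannot be declared `inline`, so every call in the hot loop goes through virtual dispatch.

```cython
cdef class BaseEngine:
    # Marking this `cdef inline` is a compile error: subclasses may
    # override it, so the concrete body is only known at runtime and
    # each call must be dispatched through the class's vtable.
    cdef double surrogate_dist(self, double d) noexcept nogil:
        return d

cdef class EuclideanEngine(BaseEngine):
    cdef double surrogate_dist(self, double d) noexcept nogil:
        return d * d

cdef void reduce_chunk(BaseEngine engine, double[::1] dists) noexcept nogil:
    cdef Py_ssize_t j
    for j in range(dists.shape[0]):
        # Every iteration pays for a virtual-method dispatch here; this
        # per-call overhead is what the ~30% slowdown was attributed to.
        dists[j] = engine.surrogate_dist(dists[j])
```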

We think that you will encounter the same issues because you have specialized your EuclideanEngine.surrogate_dist method for ArgKmin, with the call to heap_push, but you won't be able to implement RadiusNeighbors in this fashion. You'll also need to migrate the data structures for both ArgKmin and RadiusNeighbors to BaseEngine, creating massive coupling between these classes.

What do you think?

@Micky774 (Owner, Author)

> Hey @Micky774! We scratched our heads for a while with @jjerphan trying to make scikit-learn#25170 work, and we concluded that calling surrogate_dist in a for-loop leads to the observed ~30% performance decrease. As a workaround, we tried to inline it so that surrogate_dist wouldn't introduce extra cost, but we failed because inlining is not feasible with inheritance: Cython doesn't know how to dispatch inline methods when using inheritance.
>
> We think that you will encounter the same issues because you have specialized your EuclideanEngine.surrogate_dist method for ArgKmin, with the call to heap_push, but you won't be able to implement RadiusNeighbors in this fashion. You'll also need to migrate the data structures for both ArgKmin and RadiusNeighbors to BaseEngine, creating massive coupling between these classes.
>
> What do you think?

Yes, that's where I had become blocked as well. @thomasjpfan and I explored several approaches to solve this, mostly centered on some form of efficient callback system. Ideally, context classes like ArgKmin and RadiusNeighbors would provide a callback to the engine object such that it could be called efficiently within the loop; the difficulty is how to make that generic.

We have an idea that may work, relying on tightly monitored and reviewed callbacks that take a void * extra_params struct whose contract is mediated through documentation -- each context class would create its own callback in some _callbacks.pyx and define whatever extra parameters it needs in a custom-typed struct passed through the void * argument.
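
As a rough sketch of that idea (all names hypothetical; this illustrates the void * pattern, not code from the draft):

```cython
# Function-pointer type the engine would invoke in its inner loop; the
# context-specific state travels through the opaque `void *` argument.
ctypedef void (*engine_callback_t)(
    void* extra_params,
    double surrogate_dist,
    Py_ssize_t sample_index,
) noexcept nogil

# Payload an ArgKmin-like context might define in _callbacks.pyx; its
# layout is the documented contract between context and callback.
cdef struct ArgKminParams:
    double* heap_distances
    Py_ssize_t* heap_indices
    Py_ssize_t k

cdef void argkmin_callback(
    void* extra_params,
    double surrogate_dist,
    Py_ssize_t sample_index,
) noexcept nogil:
    # Cast the opaque pointer back to the documented struct type...
    cdef ArgKminParams* params = <ArgKminParams*> extra_params
    # ...then, e.g., push onto the k-nearest heap:
    # heap_push(params.heap_distances, params.heap_indices, params.k,
    #           surrogate_dist, sample_index)
```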

Hopefully I'll have a version of that on this draft soon, and I'll see what the performance looks like.

@Micky774 (Owner, Author)

A callback system has been implemented in #8, but the initial benchmark results don't look great. I'll publish more comprehensive results tomorrow. Right now I'm struggling to figure out exactly what is causing the slowdown (py-spy, even with native tracking, seems insufficient).

@jjerphan commented Jan 12, 2023

I fear that adding callbacks with support for arbitrary arguments will make those implementations harder to understand and maintain over time.

I would rather have duplication at the level of reduction specialization (e.g. Euclidean{ArgKmin,RadiusNeighbors}) than a complex chain of responsibilities between many composites and components. What do you think?

@Micky774 (Owner, Author)

> I fear that adding callbacks with support for arbitrary arguments will make those implementations harder to understand and maintain over time.
>
> I would rather have duplication at the level of reduction specialization (e.g. Euclidean{ArgKmin,RadiusNeighbors}) than a complex chain of responsibilities between many composites and components. What do you think?

Honestly, I favor tight documentation and review to mediate the complexity of the callback system over the current duplication and messy hierarchy on main. Of course, I understand this may be an unpopular opinion, in which case I'm fine with whatever the consensus among core devs is -- or at least among those most involved with Cython development 😉

It may still be a moot point if the callbacks result in an irreducible performance regression though.

@thomasjpfan

Since all of the callback implementations lead to a regression, I am okay with keeping the status quo. Currently, when we add a new reduction such as ArgKminLabels in scikit-learn#24076, there need to be two implementations: one for the normal case and another for the Euclidean specialization. For me, this is still maintainable.

If we get to a point where we want to add another specialization, we can revisit the current design. (Adding another specialization likely means adding it to ArgKmin, RadiusNeighbors, ArgKminLabels, and any future reductions.)
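
For concreteness, a hypothetical sketch of that "two implementations per reduction" pattern (ArgKminLabels is the name used in scikit-learn#24076; the pairing mirrors the existing ArgKmin/EuclideanArgKmin split):

```cython
cdef class ArgKminLabels(ArgKmin):
    # Generic implementation, working with any DistanceMetric.
    pass

cdef class EuclideanArgKminLabels(EuclideanArgKmin):
    # Specialization reusing the GEMM-based (squared) Euclidean
    # distance computation of EuclideanArgKmin.
    pass
```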

Micky774 closed this on Feb 4, 2023