8000 FEA Introduce `PairwiseDistances` by Vincent-Maladiere · Pull Request #1 · Vincent-Maladiere/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

FEA Introduce PairwiseDistances #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Vincent-Maladiere
Copy link
Owner

Reference Issues/PRs

Towards scikit-learn#23958

What does this implement/fix? Explain your changes.

This simplifies the original implementation of PairwiseDistance by @jjerphan, with the following differences:

  • PairwiseDistance{32,64} doesn't subclass BaseBaseDistancesReduction{32,64} anymore.
  • This allows to add _parallel_on_{X,Y} methods to PairwiseDistance{32,64}, since these methods are decorated with @final in BaseBaseDistancesReduction{32,64} and thus can't be overwritten.
  • This also remove the chunk computing mechanism, by considering only the case chunk_size = 1, as proposed by @ogrisel in this comment.
  • This doesn't implement the Euclidean specialization yet to make benchmarks simpler.

Following this benchmark, we found that this PR yields a significant performance regression when n_jobs = 1 and an improvement when n_jobs > 1, for both euclidean and manhattan distances:

euclidean

manhattan

Any other comments?

As suggested by @jjerphan, decorating DistanceMetric subclasses with @final could alleviate some of the overhead and make this implementation competitive with main when n_jobs=1.

@jjerphan
Copy link
jjerphan commented Feb 6, 2023

Thanks for pursuing this work, @Vincent-Maladiere.

This PR targets your fork's main branch. Can you recreate another one targeting upsteam/main? I will then provide some feedback and comment to you. Thank you. 🙂

@Vincent-Maladiere Vincent-Maladiere force-pushed the feat/pairwise_distances-pdr-backend branch from 162b501 to e4bf4e7 Compare February 7, 2023 09:08
@Vincent-Maladiere
Copy link
Owner Author

Closed to target scikit-learn/main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4E8F
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0