-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH Introduce PairwiseDistancesReduction
and PairwiseDistancesArgKmin
(feature branch)
#22134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
I am okay with merging this into |
As this code is reviewed, we can merge it anytime (after merge conflicts are resolved). |
Following #21462 (comment), the last thing that is missing is interfacing those implementations with scikit-learn interfaces, adapting and completing the existing tests. I was planning to stick to the plan, doing another sub-PR as initially planned not to have "dead-code" integrated. But no strong opinion: I am also OK doing a separate pull-request as mentioned by @thomasjpfan. |
No strong opinion either. We can stick with the original plan. This means we need one more PR into |
I believe this is the best approach for posterity.
That's it. I'll do it tomorrow. |
I am also fine with either approaches. Whatever works for you best @jjerphan. To get the coverage report greener, it should be possible to skip coverage in some branch of the openblas safe compat code: https://app.codecov.io/gh/scikit-learn/scikit-learn/compare/22134/diff |
Also Either it should be added to the list or that branch should be removed from the test. https://app.codecov.io/gh/scikit-learn/scikit-learn/compare/22134/diff#D5L329 |
PairwiseDistancesReductions
and PairwiseDistancesReductionsArgKmin
(feature branch)
PairwiseDistancesReductions
and PairwiseDistancesReductionsArgKmin
(feature branch)PairwiseDistancesReduction
and PairwiseDistancesArgKmin
(feature branch)
* Forward pairwise_dist_chunk_size in the configuration * Flip finalized results for PairwiseDistancesArgKmin The previous would have made the code more complex by introducing some boilerplate for the interface plugs. Having it this way actually simplifies the code. This also removes the haversine branch for test_pairwise_distances_argkmin Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Plug PairwiseDistancesArgKmin as a back-end * Adapt test accordingly * Add whats_new entry * Change input validation order for kneighbors * Remove duplicated test_neighbors_distance_metric_deprecation Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * Adapt the documentation Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * Add mahalanobis case to test fixtures * Correct whats_new entry * CLN Remove unneeded private metric attribute This was needed when 'fast_sqeuclidean' and 'fast_euclidean' were present to choose the best implementation based on the user specification. Those metric have been removed since then, making this attribute useless. Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * TST Assert FutureWarning instead of DeprecationWarning in test_neighbors_metrics * MAINT Add use_pairwise_dist_activate to scikit-learn config * TST Add a test for the 'brute' backends' results' consistency Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * fixup! MAINT Add use_pairwise_dist_activate to scikit-learn config * fixup! fixup! MAINT Add use_pairwise_dist_activate to scikit-learn config * TST Filter FutureWarning for WMinkowskiDistance * MAINT pin numpydoc in arm for now (#22292) * fixup! TST Filter FutureWarning for WMinkowskiDistance * Revert keywords arguments removal for the GEMM trick for 'euclidean' * MAINT pin max numpydoc for now (#22286) * Add 'haversine' to CDIST_PAIRWISE_DISTANCES_REDUCTION_COMMON_METRICS Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * fixup! Add 'haversine' to CDIST_PAIRWISE_DISTANCES_REDUCTION_COMMON_METRICS * Apply suggestions from code review Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * MAINT Document some config parameters for maintenance Also rename one of them. Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * FIX Support and test one of 'sqeuclidean' specification Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * FIX Various typos fix and correct haversine 'haversine' is not supported by cdist. * Directly use get_config * CLN Apply comments from review Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com> * Motivate swapped returned values Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * TST Remove mahalanobis from test fixtures Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com> * MNT Add comment regaduction functions' signatures Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * TST Complete test for `pairwise_distance_{argmin,argmin_min}` (#22371) * DOC Add sub-pull requests to the whats_new entry
This needs to by synched with main and then ... Is there anything left or can those 2k loc be merged? I guess, the honor would be yours, @jjerphan. |
Nothing that I can remember of. |
In that case, let's wait one or a few days, if anybody has any further comment. Eventually
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit, otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Woot!! This is a big deal! Congratulations to everyone involved!!! |
Thanks to @jjerphan for the hard work, and to the reviewers!! |
Coming back from holidays ⛰️ Thanks to everyone involved! This was a big piece which paves the way to smaller PRs, which reviews hopefully would be eased.
You're welcome. |
…min` (feature branch) (scikit-learn#22134) * MAINT Introduce Pairwise Distances Reductions private submodule (scikit-learn#22064) * MAINT Introduce FastEuclideanPairwiseArgKmin (scikit-learn#22065) * fixup! Merge branch 'main' into pairwise-distances-argkmin Remove duplicated Bunch * MAINT Plug `PairwiseDistancesArgKmin` as a back-end (scikit-learn#22288) * Forward pairwise_dist_chunk_size in the configuration * Flip finalized results for PairwiseDistancesArgKmin The previous would have made the code more complex by introducing some boilerplate for the interface plugs. Having it this way actually simplifies the code. This also removes the haversine branch for test_pairwise_distances_argkmin * Plug PairwiseDistancesArgKmin as a back-end * Adapt test accordingly * Add whats_new entry * Change input validation order for kneighbors * Remove duplicated test_neighbors_distance_metric_deprecation * Adapt the documentation * Add mahalanobis case to test fixtures * Correct whats_new entry * CLN Remove unneeded private metric attribute This was needed when 'fast_sqeuclidean' and 'fast_euclidean' were present to choose the best implementation based on the user specification. Those metric have been removed since then, making this attribute useless. * TST Assert FutureWarning instead of DeprecationWarning in test_neighbors_metrics * MAINT Add use_pairwise_dist_activate to scikit-learn config * TST Add a test for the 'brute' backends' results' consistency Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * fixup! MAINT Add use_pairwise_dist_activate to scikit-learn config * fixup! fixup! MAINT Add use_pairwise_dist_activate to scikit-learn config * TST Filter FutureWarning for WMinkowskiDistance * MAINT pin numpydoc in arm for now (scikit-learn#22292) * fixup! TST Filter FutureWarning for WMinkowskiDistance * Revert keywords arguments removal for the GEMM trick for 'euclidean' * MAINT pin max numpydoc for now (scikit-learn#22286) * Add 'haversine' to CDIST_PAIRWISE_DISTANCES_REDUCTION_COMMON_METRICS * fixup! Add 'haversine' to CDIST_PAIRWISE_DISTANCES_REDUCTION_COMMON_METRICS * Apply suggestions from code review * MAINT Document some config parameters for maintenance Also rename one of them. * FIX Support and test one of 'sqeuclidean' specification Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * FIX Various typos fix and correct haversine 'haversine' is not supported by cdist. * Directly use get_config * CLN Apply comments from review * Motivate swapped returned values * TST Remove mahalanobis from test fixtures * MNT Add comment regaduction functions' signatures * TST Complete test for `pairwise_distance_{argmin,argmin_min}` (scikit-learn#22371) * DOC Add sub-pull requests to the whats_new entry * DOC place comment inside functions * DOC move up whatsnew entry Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
Supersedes #21462.
Relates to #22587.
Task list as mentioned in #21462 (comment):
PairwiseDistancesArgKmin
to use a class method.compute
jjerphan/scikit-learn#5 for the small refactoring suggested by @thomasjpfan.PairwiseDistancesReduction
andPairwiseDistancesArgKmin
(feature branch) #22134 : high-level PR for the feature branchPairwiseDistancesArgKmin
as a back-end #22288pairwise-distances-argkmin
inmain
Results
Using this script:
On scikit-learn
1.0
:This PR:
See #21462 for reference.