8000 MAINT Plug `PairwiseDistancesArgKmin` as a back-end by jjerphan · Pull Request #22288 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MAINT Plug PairwiseDistancesArgKmin as a back-end #22288

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jjerphan
Copy link
Member

Reference Issues/PRs

Part of #21462 (comment).

Supersedes #22214.

What does this implement/fix? Explain your changes.

This plugs PairwiseDistancesArgKmin as a back-end so that it is used for some API, namely:

  • sklearn.metrics.pairwise_distances_argmin
  • sklearn.metrics.pairwise_distances_argmin_min
  • sklearn.cluster.AffinityPropagation
  • sklearn.cluster.Birch
  • sklearn.cluster.MeanShift
  • sklearn.cluster.OPTICS
  • sklearn.cluster.SpectralClustering
  • sklearn.feature_selection.mutual_info_regression
  • sklearn.neighbors.KNeighborsClassifier
  • sklearn.neighbors.KNeighborsRegressor
  • sklearn.neighbors.LocalOutlierFactor
  • sklearn.neighbors.NearestNeighbors
  • sklearn.manifold.Isomap
  • sklearn.manifold.LocallyLinearEmbedding
  • sklearn.manifold.TSNE
  • sklearn.manifold.trustworthiness
  • sklearn.semi_supervised.LabelPropagation
  • sklearn.semi_supervised.LabelSpreading

Existing tests are adapted and completed accordingly.

jjerphan and others added 13 commits January 14, 2022 18:14
The previous would have made the code more complex
by introducing some boilerplate for the interface plugs.

Having it this way actually simplifies the code.

This also removes the haversine branch for
test_pairwise_distances_argkmin

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
This was needed when 'fast_sqeuclidean' and 'fast_euclidean'
were present to choose the best implementation based on the user
specification.

Those metric have been removed since then, making this attribute
useless.

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (assuming all tests now pass)!

@jjerphan
Copy link
Member Author

sklearn/linear_model/tests/test_quantile.py::test_quantile_estimates_calibration[0.9] (unrelated to this PR) makes the CI timeout. This also happens on main.

@thomasjpfan
Copy link
Member

Most likely need to merge with main again to fix the stalling issue.

Copy link
Member
@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

jjerphan and others added 2 commits February 1, 2022 16:28
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Jérémie du Boisberranger
<jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger
<jeremiedbb@users.noreply.github.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member
@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think the config_context discussion is not a blocker for this PR (But let's not forget it before merging the final PR in main).

@jeremiedbb
Copy link
Member

The coverage seems a bit light according to codecov (assuming it worked and did not forget any coverage file).

jjerphan and others added 2 commits February 2, 2022 08:28
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@jjerphan
Copy link
Member Author
jjerphan commented Feb 4, 2022

#22371 has been opened to improve coverage.

@ogrisel
Copy link
Member
ogrisel commented Feb 9, 2022

I think the freeze in some CI jobs comes from pytest 7. Merge the current main to pin to pytest 6.

@jjerphan jjerphan force-pushed the pairwise-distances-argkmin-plug-contd branch from 7b90dfa to b8e9fb8 Compare February 9, 2022 20:51
@jjerphan
Copy link
Member Author

Finally all 🟢 🙂

@lorentzenchr lorentzenchr merged commit 5a4d710 into scikit-learn:pairwise-distances-argkmin Feb 10, 2022
lorentzenchr added a commit that referenced this pull request Feb 17, 2022
…min` (feature branch) (#22134)

* MAINT Introduce Pairwise Distances Reductions private submodule  (#22064)

* MAINT Introduce FastEuclideanPairwiseArgKmin  (#22065)

* fixup! Merge branch 'main' into pairwise-distances-argkmin

Remove duplicated Bunch

* MAINT Plug `PairwiseDistancesArgKmin` as a back-end (#22288)

* Forward pairwise_dist_chunk_size in the configuration

* Flip finalized results for PairwiseDistancesArgKmin

The previous would have made the code more complex
by introducing some boilerplate for the interface plugs.

Having it this way actually simplifies the code.

This also removes the haversine branch for
test_pairwise_distances_argkmin

* Plug PairwiseDistancesArgKmin as a back-end

* Adapt test accordingly

* Add whats_new entry

* Change input validation order for kneighbors

* Remove duplicated test_neighbors_distance_metric_deprecation

* Adapt the documentation

* Add mahalanobis case to test fixtures

* Correct whats_new entry

* CLN Remove unneeded private metric attribute

This was needed when 'fast_sqeuclidean' and 'fast_euclidean'
were present to choose the best implementation based on the user
specification.

Those metric have been removed since then, making this attribute
useless.

* TST Assert FutureWarning instead of DeprecationWarning in
test_neighbors_metrics

* MAINT Add use_pairwise_dist_activate to scikit-learn config

* TST Add a test for the 'brute' backends' results' consistency

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* fixup! MAINT Add use_pairwise_dist_activate to scikit-learn config

* fixup! fixup! MAINT Add use_pairwise_dist_activate to scikit-learn config

* TST Filter FutureWarning for WMinkowskiDistance

* MAINT pin numpydoc in arm for now (#22292)

* fixup! TST Filter FutureWarning for WMinkowskiDistance

* Revert keywords arguments removal for the GEMM trick for 'euclidean'

* MAINT pin max numpydoc for now (#22286)

* Add 'haversine' to CDIST_PAIRWISE_DISTANCES_REDUCTION_COMMON_METRICS

* fixup! Add 'haversine' to CDIST_PAIRWISE_DISTANCES_REDUCTION_COMMON_METRICS

* Apply suggestions from code review

* MAINT Document some config parameters for maintenance

Also rename one of them.

* FIX Support and test one of 'sqeuclidean' specification

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

* FIX Various typos fix and correct haversine

'haversine' is not supported by cdist.

* Directly use get_config

* CLN Apply comments from review

* Motivate swapped returned values

* TST Remove mahalanobis from test fixtures

* MNT Add comment regaduction functions' signatures

* TST Complete test for `pairwise_distance_{argmin,argmin_min}` (#22371)

* DOC Add sub-pull requests to the whats_new entry

* DOC place comment inside functions

* DOC move up whatsnew entry

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this pull request Mar 1, 2022
…min` (feature branch) (scikit-learn#22134)

* MAINT Introduce Pairwise Distances Reductions private submodule  (scikit-learn#22064)

* MAINT Introduce FastEuclideanPairwiseArgKmin  (scikit-learn#22065)

* fixup! Merge branch 'main' into pairwise-distances-argkmin

Remove duplicated Bunch

* MAINT Plug `PairwiseDistancesArgKmin` as a back-end (scikit-learn#22288)

* Forward pairwise_dist_chunk_size in the configuration

* Flip finalized results for PairwiseDistancesArgKmin

The previous would have made the code more complex
by introducing some boilerplate for the interface plugs.

Having it this way actually simplifies the code.

This also removes the haversine branch for
test_pairwise_distances_argkmin

* Plug PairwiseDistancesArgKmin as a back-end

* Adapt test accordingly

* Add whats_new entry

* Change input validation order for kneighbors

* Remove duplicated test_neighbors_distance_metric_deprecation

* Adapt the documentation

* Add mahalanobis case to test fixtures

* Correct whats_new entry

* CLN Remove unneeded private metric attribute

This was needed when 'fast_sqeuclidean' and 'fast_euclidean'
were present to choose the best implementation based on the user
specification.

Those metric have been removed since then, making this attribute
useless.

* TST Assert FutureWarning instead of DeprecationWarning in
test_neighbors_metrics

* MAINT Add use_pairwise_dist_activate to scikit-learn config

* TST Add a test for the 'brute' backends' results' consistency

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* fixup! MAINT Add use_pairwise_dist_activate to scikit-learn config

* fixup! fixup! MAINT Add use_pairwise_dist_activate to scikit-learn config

* TST Filter FutureWarning for WMinkowskiDistance

* MAINT pin numpydoc in arm for now (scikit-learn#22292)

* fixup! TST Filter FutureWarning for WMinkowskiDistance

* Revert keywords arguments removal for the GEMM trick for 'euclidean'

* MAINT pin max numpydoc for now (scikit-learn#22286)

* Add 'haversine' to CDIST_PAIRWISE_DISTANCES_REDUCTION_COMMON_METRICS

* fixup! Add 'haversine' to CDIST_PAIRWISE_DISTANCES_REDUCTION_COMMON_METRICS

* Apply suggestions from code review

* MAINT Document some config parameters for maintenance

Also rename one of them.

* FIX Support and test one of 'sqeuclidean' specification

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

* FIX Various typos fix and correct haversine

'haversine' is not supported by cdist.

* Directly use get_config

* CLN Apply comments from review

* Motivate swapped returned values

* TST Remove mahalanobis from test fixtures

* MNT Add comment regaduction functions' signatures

* TST Complete test for `pairwise_distance_{argmin,argmin_min}` (scikit-learn#22371)

* DOC Add sub-pull requests to the whats_new entry

* DOC place comment inside functions

* DOC move up whatsnew entry

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
@jjerphan jjerphan deleted the pairwise-distances-argkmin-plug-contd branch October 21, 2022 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0