8000 ENH add partial_fit to NearestCentroid · Pull Request #19262 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

ENH add partial_fit to NearestCentroid #19262

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from
Closed

ENH add partial_fit to NearestCentroid #19262

wants to merge 11 commits into from

Conversation

ghost
Copy link
@ghost ghost commented Jan 24, 2021

Reference Issues/PRs

Adresses issue #12952 and @Robinspecteur 's PR #13033

What does this implement/fix? Explain your changes.

Implements the partial_fit method and the associated tests from PR #13033 while taking into account the comments that were never addressed.

To summarize :

Done

  • @jnothman comments :

    • Link. deepcopy() has been replaced by .copy()
    • Link. The true_classes_ attribute has been documented
    • Link. A redundant code block of test_partial_fit() in sklearn/neighbors/tests/test_nearest_centroid.py has been removed.
      The author of PR Partial centroid #13033 seems to have wanted to add an example of values for which the predicted classes were already correctly guessed partway through the partial_fit process and thus equal to the predicted classes by the end of the complete fit.
    • Link. test_partial_fit now checks the predicted classes of partial_fit against fit
  • @dirkpadfield comments :

    • Link. The two deepcopy() (which were replaced by .copy()) were inserted inside a if self.shrink_threshold condition.
      They could not be move under the original if self.shrink_threshold line 228 (now 230) due to tests failing because the value recovered from the class attributes wouldn't be the same due to the for loop line 195 (now 200)
    • Link. The warning has been relocated under the if _refit or _check_partial_fit_first_call(self, classes) condition line 173 (now 171)

To do

  • @jnothman comments :
    • Link. This comment, regarding the following test comments, was not addressed because I did not quite understand what the PR Partial centroid #13033 author's comment was referring to.
# Ensure that the shrinking is correct.
# The expected result is calculated by R (pamr),
# which is implemented by the author of the original paper.
# (One need to modify the code to output the new centroid in pamr.predict)

Any other comments?

I ran Flake8 and pytest locally to ensure the CI passes, but that is about as many tools as I can run on my machine, I'm not 100% sure about the other scikit-learn's repo automatic checking tools.

@cmarmo
Copy link
Contributor
cmarmo commented Jan 25, 2021

Hi @aerodynamic-saucepan some tests are still failing. There is a docstring failure that you can check locally using

$ pytest --doctest-modules sklearn/neighbors/_nearest_centroid.py -k NearestCentroid

Other failures are accessible via this link. Let us know if you need some help to solve them.

@ghost
Copy link
Author
ghost commented Jan 26, 2021

Hi @aerodynamic-saucepan some tests are still failing. There is a docstring failure that you can check locally using

$ pytest --doctest-modules sklearn/neighbors/_nearest_centroid.py -k NearestCentroid

Other failures are accessible via this link. Let us know if you need some help to solve them.

Hi @cmarmo , thank you for the help.
Now I am wondering, is there a way of running the Azure checkers locally ? I'm less familiar with their errors and would like to do quick tests without having to push them each time and wait for the CI pipeline to process (I couldn't find the answer in the Contributing section).

@cmarmo
Copy link
Contributor
cmarmo commented Jan 26, 2021

Now I am wondering, is there a way of running the Azure checkers locally ?

Azure run the test suite on different architectures (32/64 bit) and python environments.
You should be able to reproduce locally the failures of the environment closest to yours.
In your case (for eg Linux pylatest_pip_openblas_pandas, sorry this is the one closest to my env... ) apparently the issue is in the common tests related to the estimator you have modified.

$ pytest sklearn/tests/test_common.py -k NearestCentroid

will reproduce the issue locally.
Hope this could help.

@ghost ghost closed this Feb 1, 2021
@ghost ghost reopened this Feb 1, 2021
@ghost
Copy link
Author
ghost commented Feb 1, 2021

I apologize for the inconvenience, but I accidentally closed the PR by misinterpreting a 'Close with comment' button for closing a comment, when it actually means closing the PR with the comment as final.. I re-opened it right away but it seems now the references for the CI pipeline are broken :

(from circleci)

fatal: Couldn't find remote ref refs/pull/19262/merge
error: pathspec 'pr/19262/merge' did not match any file(s) known to git
Could not fetch merge commit.
There may be conflicts in merging PR #19262 with main.

I would appreciate some advice to avoid breaking it any further.

@cmarmo
Copy link
Contributor
cmarmo commented Feb 1, 2021

Hi @aerodynamic-saucepan , apparently you created your branch before the renaming of the main branch in upstream. You still have some references to master and this breaks the checks. Sychronising with upstream will solve the issue.
Also some conflicts arose in the meanwhile, they need to be fixed during the sync.

@cmarmo
Copy link
Contributor
cmarmo commented Feb 1, 2021

Hi @aerodynamic-saucepan , apparently you created your branch before the renaming of the main branch in upstream. You still have some references to master and this breaks the checks. Sychronising with upstream will solve the issue.

Sorry I misread... there is no issue with renaming

Also some conflicts arose in the meanwhile, they need to be fixed during the sync.

Conflicts are the reason of the failing checks.

@ghost
Copy link
Author
ghost commented Feb 1, 2021

But conflicts arise when attempting to merge with main right ? And this PR isn't ready to be merged, yet wouldn't attempting to resolve merge conflicts lead to a merge request on main ?

@cmarmo
Copy link
Contributor
cmarmo commented Feb 1, 2021

But conflicts arise when attempting to merge with main right ?

Right: but every time there is a commit to an open pull request some checks are run (this is called Continuous Integration , CI) to verify that your pull request does not contain conflicting commits with respect to the base branch, let's say that a merge is simulated at each commit. If there is such a conflict the build checks are not run. You might not want main and your branch conflict too much during the review process, otherwise the merging step will be painful. You will find more information about that on the github documentation about conflicts.

@glemaitre
Copy link
Member

@aerodynamic-saucepan Could you solve the conflict such that we can make an appropriate review?

@glemaitre glemaitre changed the title Partial centroid #13033 PR : comments corrections applied ENH add partial_fit to NearestCentroid Aug 4, 2021
@Valentin-Laurent
Copy link
Contributor

Hey 👋🏻
I'd like to work on this stalled PR. Not sure I'll succeed but it is worth trying.
By the way, I noticed some issues related to NearestCentroid and metrics (see #23890), but I believe starting by this PR makes more sense.

@ArturoAmorQ
Copy link
Member
ArturoAmorQ commented Aug 10, 2022

Hello @Valentin-Laurent, are you still working on a related PR?

@Valentin-Laurent
Copy link
Contributor

Hello @Valentin-Laurent, are you still working on a related PR?

Hello @ArturoAmorQ, I was waiting for PR #24083 to be reviewed before starting to work on implementing partial fit. Why?

Repository owner closed this by deleting the head repository Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0