8000 FIX Improve error message when RepeatedStratifiedKFold.split is called without a y argument by Anurag-Varma · Pull Request #29402 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

FIX Improve error message when RepeatedStratifiedKFold.split is called without a y argument #29402

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 11, 2024

Conversation

Anurag-Varma
Copy link
Contributor

Reference Issues/PRs

Fixes #29369

What does this implement/fix? Explain your changes.

For sklearn.model_selection.RepeatedStratifiedKFold, it fixes the exception when y argument is not provided.

Now it returns: TypeError: RepeatedStratifiedKFold.split() missing 1 required positional argument: 'y'.
This matches the similar format of sklearn.model_selection.StratifiedKFold when y argument is not provided.

Any other comments?

Created a new function in class RepeatedStratifiedKFold -> def split(self, X, y, groups=None) in which the argument 'y' doesn't have default value of None. So its mandatory to provide y argument in split function in RepeatedStratifiedKFold class or else, it gives the error.

Copy link
github-actions bot commented Jul 3, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4f4016c. Link to the linter CI: here

@Anurag-Varma
Copy link
Contributor Author

Hi @lesteve,

When possible, can you review this.

Copy link
Member
@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test using pytest.raises that checks that the error is
TypeError with missing 1 required positional argument: 'y' in the error message?

@Anurag-Varma Anurag-Varma requested a review from lesteve July 5, 2024 10:09
@lesteve
Copy link
Member
lesteve commented Jul 5, 2024

Can you please add a test see other tests in https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/model_selection/tests/test_split.py.

@Anurag-Varma
Copy link
Contributor Author

@lesteve All the changes are done, can you merge the PR

@Anurag-Varma
Copy link
Contributor Author

@lesteve can you review this PR

@lesteve
Copy link
Member
lesteve commented Jul 8, 2024

@Anurag-Varma maybe you can imagine that pinging me 3 times over 4 days can be perceived as a bit too pushy 😉. Generally speaking, I would say pinging after a week without answer is probably fine, 3 times over 4 days is certainly not. Have a look at this FAQ for more details.

I will review this when I find the time, as you can imagine there is plenty of other stuff that I am involved in ...

Also, since you seem to be in quite a hurry, note that in scikit-learn we need the approval of 2 maintainers before we can merge a PR.

English is not my native language and may not be yours but even without the 3 pings in 4 days thing, I personally find this kind of language too pushy:

@lesteve All the changes are done, can you merge the PR

I would rather have something like this:

I have tackled all your comments, let me know if there is something else that needs to be done

8000
@Anurag-Varma
Copy link
Contributor Author

@lesteve Ohh sorry about that, I didn't know like the rules and faq. This is my first time contributing to scikit.

I thought after my changes, I should inform a member so that they can verify them.

I don't know about the 2 reviewers rule or that it takes time for the review. I was thinking like the review will be done immediatly and then it will be merged as I resolved the issues which were mentioned.

From next time, I will keep this in mind.

Thanks

@lesteve
Copy link
Member
lesteve commented Jul 9, 2024

No worries this is part of unwritten etiquette and each project may work slightly differently.

Since you see to be relatively new to open-source contributiony you maybe interested in is the Github open-source guide (the section "What happens after you submit your contribution" mentions pinging someone if you haven't received a response in over a week which sounds a reasonable amount of time in general).

Also don't hesitate to have a look at the scikit-learn contributing guide.

Copy link
Member
@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, a few comments, mostly about improving the test.

@lesteve lesteve changed the title FIX Erroneous optional status for y parameter in RepeatedStratifiedKFold.split FIX Improve error message when RepeatedStratifiedKFold.split is called without a y argument Jul 9, 2024
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
@lesteve
Copy link
Member
lesteve commented Jul 9, 2024

Just a general question, do you have a reason to force-push as often as you do?

I don't think this PR depends on any of the changes in the main branch since you have created your branch and as such there is no need to merge main or rebase + force-push.

FYI, when there is a need for it, we generally ask people to favor merging rather than rebase + force-push. As a reviewer it makes it easier to see what has changed since I last looked at the PR. Also we use squash + merge when merging a PR so it gets turned into a single commit on the main branch so we don't care about having a complicated history with many merge commits in the PR branch.

@Anurag-Varma
Copy link
Contributor Author

The sklearn main was getting updated and my branch was getting outdated. So i was rebasing the latest updates inti my branch and forch pushing it.

Regarding the merging, i thought too many extra merge commits were there.

Sorry that you are spending too much time on this pr reviewing it multiple times.

Copy link
Member
@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

As I mentioned earlier it needs a second reviewer for the PR to be merged.

@lesteve
Copy link
Member
lesteve commented Jul 9, 2024

The sklearn main was getting updated and my branch was getting outdated.

In general this is not an issue, i.e. if there are new commits in the main branch, you don't need to do anything. I would say the exception is if there are conflicts in your PR (you changed something and another PR got merged and changed the same lines) or a maintainer ask you to do it for some specific reason.

@lesteve lesteve added Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one! labels Jul 9, 2024
Copy link
Member
@lucyleeow lucyleeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Just one nit.

Co-authored-by: Lucy Liu <jliu176@gmail.com>
@lesteve lesteve enabled auto-merge (squash) July 11, 2024 06:46
@lesteve
Copy link
Member
lesteve commented Jul 11, 2024

Thanks for the review @lucyleeow, I have enabled auto-merge so this PR will be merged when CI is green.

@lesteve lesteve merged commit 20c7bd0 into scikit-learn:main Jul 11, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:model_selection Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Erroneous optional status for y parameter in RepeatedStratifiedKFold.split
3 participants
0