-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[WIP] ENH Make stratified a parameter and conflate all the stratified/non-stratified cross-validator class pairs. #5569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d1696a7
to
d8c4f45
Compare
Is this Pull Request in "GSoC is over, so I need not finish this up stage?" ;-) |
:P No No I'll start this soon. Apologies for the delay! |
Was there some discussion on changing the |
I discussed with Andy IRL... I have noted it down in my note I guess... I'll go home and check |
Ok I'm splitting this into multiple smaller PRs... That is better I feel |
Since 2013, maybe before... |
Do we unanimously agree to make stratified as a constructor argument? Sorry if this is obvious but what are the benefits? |
I think the advantage is mostly just in a reduction of the number of classes. One of the key points is that this change is enabled by the new splitter design, wherein |
I see, now should be the best time to do it, then. Should we also support continuous y as done in #6598 ? |
Yes, this is an issue; the type_of_target function attempts to identify, On 13 April 2016 at 14:16, Manoj Kumar notifications@github.com wrote:
|
If this is broadly the way to go, I think it's possible we'll land up with stratify=False/True becoming stratify=False/True/'auto'/'binned' where 'binned' artificially creates |
Going slightly in the same direction, I would think that stratification
for regression would also be useful. In such a case, the stratification
would be inexact, but in my opinion that would still be useful.
Whether or not to merge the classes... I have no strong opinion. I would
say that if the code becomes significantly harder to read and maintain,
we shouldn't do it.
|
I think the advantage is mostly just in a reduction of the number of classes.
Given that the code is not getting simpler, that's probably a potential
advantage in terms of usability right? But I am not sure whether finding
an argument hidden in a class is easier to find a class. And the way
people now learn CV with scikit-learn, they pretty much need to look at a
list of class. Hence, I think that for usuability, stratified as a
different class is probably a good thing.
What do people think about the above reasonning?
|
I'm fairly ambivalent here. I do suspect that this merger is creating On 13 April 2016 at 15:05, Gael Varoquaux notifications@github.com wrote:
|
The good timing to do this change would have been when we created the model_selection module. Now it feels a bit too much to ask users to change their code once again. |
model_selection is not released. On 13 April 2016 at 16:13, Mathieu Blondel notifications@github.com wrote:
|
You mean to say
Should I count that as a +1 for making this change? |
I agree with Joel and Gael and am also really +0 on this (but that's probably because I am less experienced). I think the more important thing would be the multiple-metric support. |
Okay. Thanks for the comments!! |
Partially fixes #5053 -- Decided not to proceed as the benefit is not worth the deprecation.
VOTE
Mathieu +1 (#5053 (comment))
Alex ?
Joel +0
Manoj +0
Gael +0?
TODO
stratified
a constructor parameter.NOTES
cc: @amueller @vene @jnothman @MechCoder