[WIP] ENH Make stratified a parameter and conflate all the stratified/non-stratified cross-validator class pairs.#5569
Conversation
d1696a7 to
d8c4f45
Compare
|
Is this Pull Request in "GSoC is over, so I need not finish this up stage?" ;-) |
|
:P No No I'll start this soon. Apologies for the delay! |
|
Was there some discussion on changing the |
|
I discussed with Andy IRL... I have noted it down in my note I guess... I'll go home and check |
|
Ok I'm splitting this into multiple smaller PRs... That is better I feel |
Since 2013, maybe before... |
|
Do we unanimously agree to make stratified as a constructor argument? Sorry if this is obvious but what are the benefits? |
|
I think the advantage is mostly just in a reduction of the number of classes. One of the key points is that this change is enabled by the new splitter design, wherein |
|
I see, now should be the best time to do it, then. Should we also support continuous y as done in #6598 ? |
Yes, this is an issue; the type_of_target function attempts to identify, On 13 April 2016 at 14:16, Manoj Kumar notifications@github.com wrote:
|
|
If this is broadly the way to go, I think it's possible we'll land up with stratify=False/True becoming stratify=False/True/'auto'/'binned' where 'binned' artificially creates |
|
Going slightly in the same direction, I would think that stratification
for regression would also be useful. In such a case, the stratification
would be inexact, but in my opinion that would still be useful.
Whether or not to merge the classes... I have no strong opinion. I would
say that if the code becomes significantly harder to read and maintain,
we shouldn't do it.
|
|
I think the advantage is mostly just in a reduction of the number of classes.
Given that the code is not getting simpler, that's probably a potential
advantage in terms of usability right? But I am not sure whether finding
an argument hidden in a class is easier to find a class. And the way
people now learn CV with scikit-learn, they pretty much need to look at a
list of class. Hence, I think that for usuability, stratified as a
different class is probably a good thing.
What do people think about the above reasonning?
|
|
I'm fairly ambivalent here. I do suspect that this merger is creating On 13 April 2016 at 15:05, Gael Varoquaux notifications@github.com wrote:
|
|
The good timing to do this change would have been when we created the model_selection module. Now it feels a bit too much to ask users to change their code once again. |
|
model_selection is not released. On 13 April 2016 at 16:13, Mathieu Blondel notifications@github.com wrote:
|
You mean to say
Should I count that as a +1 for making this change? |
|
I agree with Joel and Gael and am also really +0 on this (but that's probably because I am less experienced). I think the more important thing would be the multiple-metric support. |
|
Okay. Thanks for the comments!! |
Partially fixes #5053 -- Decided not to proceed as the benefit is not worth the deprecation.
VOTE
Mathieu +1 (#5053 (comment))
Alex ?
Joel +0
Manoj +0
Gael +0?
TODO
stratifieda constructor parameter.NOTES
cc: @amueller @vene @jnothman @MechCoder