[WIP] ENH Make stratified a parameter and conflate all the stratified/non-stratified cross-validator class pairs. #5569

raghavrv · 2015-10-23T16:40:22Z

Partially fixes #5053 -- Decided not to proceed as the benefit is not worth the deprecation.

VOTE

Mathieu +1 (#5053 (comment))
Alex ?
Joel +0
Manoj +0
Gael +0?

TODO

Make stratified a constructor parameter.
Conflate all the stratified/non-stratified class pairs
Make the tests pass.

NOTES

Comments from Joel, Andy and Mathieu: here, here and here
Also refer to "extend StratifiedKFold to float for regression" (#4757) for discussion on stratification in the case of regression
Refer to this for an existing implementation(?)

cc: @amueller @vene @jnothman @MechCoder

MechCoder · 2016-02-08T20:25:43Z

Is this pull request in the "GSoC is over, so I need not finish this up" stage? ;-)

raghavrv · 2016-02-08T22:27:20Z

:P No No I'll start this soon. Apologies for the delay!

MechCoder · 2016-02-09T19:27:37Z

Was there some discussion on changing the grid_scores_ attribute structure?

raghavrv · 2016-02-09T19:30:07Z

I discussed with Andy IRL... I have noted it down in my notes, I guess... I'll go home and check

raghavrv · 2016-02-10T10:42:51Z

Ok I'm splitting this into multiple smaller PRs... That is better I feel

raghavrv · 2016-04-11T15:47:37Z

@amueller @jnothman I am resuming my work on model_selection... (Aiming to finish up multiple metric soon)

Do I finish this up first? Is there interest in this?

EDIT: The diff is incorrect. Kindly don't look at the diff!

raghavrv · 2016-04-11T15:47:51Z

Also ping @vene @mblondel

jnothman · 2016-04-12T01:09:18Z

Was there some discussion on changing the grid_scores_ attribute structure?

Since 2013, maybe before...

jnothman · 2016-04-12T01:10:43Z

Indeed, 2012, arguably: #1034; also #1787.

MechCoder · 2016-04-12T06:13:44Z

Do we unanimously agree to make stratified a constructor argument? Sorry if this is obvious, but what are the benefits?

jnothman · 2016-04-12T07:01:10Z

I think the advantage is mostly just in a reduction of the number of classes. One of the key points is that this change is enabled by the new splitter design, wherein y etc are no longer passed as constructor parameters. The disadvantage is that actually the stratified and unstratified implementations tend to be quite different.
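To make the trade-off concrete, here is a minimal pure-Python sketch of what a merged splitter could look like. `KFoldSketch`, its `stratified` and `seed` parameters, and the round-robin assignment are all hypothetical illustrations, not scikit-learn's actual classes or algorithm. It relies on the point made above: `y` is passed to `split`, not to the constructor, so a single class can serve both modes.

```python
import random
from collections import defaultdict


class KFoldSketch:
    """Hypothetical merged splitter; NOT scikit-learn's actual API."""

    def __init__(self, n_folds=3, stratified=False, seed=0):
        # Only data-independent settings live in the constructor.
        self.n_folds = n_folds
        self.stratified = stratified
        self.seed = seed

    def split(self, X, y=None):
        """Yield (train_indices, test_indices) pairs."""
        n_samples = len(X)
        indices = list(range(n_samples))
        random.Random(self.seed).shuffle(indices)
        if self.stratified:
            if y is None:
                raise ValueError("stratified=True requires y")
            # Lay the shuffled indices out class by class, so that dealing
            # them round-robin below spreads every class across the folds.
            by_class = defaultdict(list)
            for i in indices:
                by_class[y[i]].append(i)
            indices = [i for label in sorted(by_class) for i in by_class[label]]
        # Deal the indices out round-robin, one fold at a time.
        fold_of = {idx: pos % self.n_folds for pos, idx in enumerate(indices)}
        for k in range(self.n_folds):
            test = sorted(i for i in range(n_samples) if fold_of[i] == k)
            train = sorted(i for i in range(n_samples) if fold_of[i] != k)
            yield train, test


if __name__ == "__main__":
    y = [0] * 9 + [1] * 3
    X = list(range(len(y)))
    for train, test in KFoldSketch(n_folds=3, stratified=True).split(X, y):
        # Each test fold holds 3 samples of class 0 and 1 sample of class 1.
        print(test, [y[i] for i in test])
```

Even in this toy version the two modes share little beyond the final dealing step, which is consistent with the concern that the real stratified and unstratified implementations differ substantially.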

MechCoder · 2016-04-13T04:16:05Z

I see, now should be the best time to do it, then.

Should we also support continuous y as done in #6598 ?
I think there is the issue of how to identify whether the target is discrete in that case. (Do we just go by the dtype of y?)

MechCoder · 2016-04-13T04:16:17Z

#4757 (comment)

jnothman · 2016-04-13T04:16:58Z

I think there is the issue of how to identify whether the target is discrete in that case. (Do we just go by the dtype of y?)

Yes, this is an issue; the type_of_target function attempts to identify it, but it's not foolproof.
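As a small illustration of why the heuristic is not foolproof, the snippet below calls `sklearn.utils.multiclass.type_of_target` on three toy targets (the data is made up for this example). Floats that happen to be integer-valued are reported as a classification target, so a regression target with whole-number values would be misread.

```python
from sklearn.utils.multiclass import type_of_target

# Integer labels with two distinct values.
binary_label = type_of_target([0, 1, 1, 0])

# Floats with fractional parts are treated as a regression target.
continuous_label = type_of_target([0.5, 1.7, 3.2, 2.4])

# Floats that are all whole numbers are treated as class labels,
# even if they were meant as regression targets.
ambiguous_label = type_of_target([1.0, 2.0, 3.0, 2.0, 1.0, 3.0])

print(binary_label)      # binary
print(continuous_label)  # continuous
print(ambiguous_label)   # multiclass
```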

jnothman · 2016-04-13T04:43:27Z

If this is broadly the way to go, I think it's possible we'll land up with stratify=False/True becoming stratify=False/True/'auto'/'binned' where 'binned' artificially creates n_samples/n_folds classes, and 'auto' is like the check_cv heuristic. But it does seem like we're creating a monolith.
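One possible reading of the 'binned' option, sketched in pure Python: rank the continuous targets and give each consecutive run of `n_folds` samples its own pseudo-class, which yields roughly n_samples/n_folds classes. The helper name `bin_targets` and the rank-based binning are assumptions made for illustration; nothing like this was settled in the thread.

```python
def bin_targets(y, n_folds):
    """Assign each sample a pseudo-class based on the rank of its target.

    Hypothetical helper: consecutive runs of n_folds ranked samples share
    a label, giving about len(y) / n_folds pseudo-classes.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank // n_folds
    return labels


if __name__ == "__main__":
    y = [0.1, 5.0, 2.2, 9.3, 4.1, 7.7, 3.0, 8.8, 1.5]
    print(bin_targets(y, n_folds=3))  # [0, 1, 0, 2, 1, 2, 1, 2, 0]
```

Stratifying on these pseudo-classes would then place one low, one middle and one high target in each fold, which is the inexact stratification for regression discussed here.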

GaelVaroquaux · 2016-04-13T05:01:26Z

Going slightly in the same direction, I would think that stratification for regression would also be useful. In such a case, the stratification would be inexact, but in my opinion that would still be useful. Whether or not to merge the classes... I have no strong opinion. I would say that if the code becomes significantly harder to read and maintain, we shouldn't do it.

GaelVaroquaux · 2016-04-13T05:05:00Z

I think the advantage is mostly just in a reduction of the number of classes.

Given that the code is not getting simpler, that's probably a potential advantage in terms of usability, right? But I am not sure whether finding an argument hidden in a class is easier than finding a class. And the way people now learn CV with scikit-learn, they pretty much need to look at a list of classes. Hence, I think that for usability, stratified as a different class is probably a good thing. What do people think about the above reasoning?

jnothman · 2016-04-13T05:10:20Z

I'm fairly ambivalent here. I do suspect that this merger is creating unnecessary work for us and for the users. It only becomes beneficial IMO once we're dealing with a non-binary stratify=False/True, and even then, the amount of code shared may be minimal.

mblondel · 2016-04-13T06:13:07Z

The right time to make this change would have been when we created the model_selection module. Now it feels a bit too much to ask users to change their code once again.

jnothman · 2016-04-13T06:16:28Z

model_selection is not released.

raghavrv · 2016-04-13T08:42:28Z

once we're dealing with a non-binary stratify=False/True

You mean to say stratify=True/False/'binned'/'auto' like you said before?

The right time to make this change would have been when we created the model_selection module. Now it feels a bit too much to ask users to change their code once again.

Should I count that as a +1 for making this change?

MechCoder · 2016-04-13T19:47:22Z

I agree with Joel and Gael and am also really +0 on this (but that's probably because I am less experienced). I think the more important thing would be the multiple-metric support.

raghavrv · 2016-04-13T20:44:16Z

Okay. Thanks for the comments!! ~~I'm leaving this open for the future.. feel free to close~~ feel free to reopen this...

raghavrv mentioned this pull request Oct 30, 2015

[MRG+1] Make cross-validators data independent + Reorganize grid_search, cross_validation and learning_curve into model_selection #4294

Merged

24 tasks

raghavrv force-pushed the model_sel_enh branch from d1696a7 to d8c4f45 Compare November 9, 2015 12:51

raghavrv mentioned this pull request Mar 10, 2016

[RFC] Changes to model_selection? #5053

Closed

raghavrv force-pushed the model_sel_enh branch from d8c4f45 to 529084d Compare March 10, 2016 15:36

ENH Various enhancements to the model_selection module

8f7bea9

raghavrv force-pushed the model_sel_enh branch from 529084d to 8f7bea9 Compare March 10, 2016 15:39

raghavrv changed the title ~~[WIP] ENH Various enhancements to the model_selection module~~ [WIP] ENH Make stratified a parameter and conflate all the stratified/non-stratified cross-validator class pairs. Mar 10, 2016

raghavrv closed this May 11, 2016

raghavrv deleted the model_sel_enh branch May 11, 2016 02:51

raghavrv restored the model_sel_enh branch May 11, 2016 13:24
