8000 [WIP] ENH Make stratified a parameter and conflate all the stratified/non-stratified cross-validator class pairs. by raghavrv · Pull Request #5569 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[WIP] ENH Make stratified a parameter and conflate all the stratified/non-stratified cross-validator class pairs. #5569

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

raghavrv
Copy link
Member
@raghavrv raghavrv commented Oct 23, 2015

Partially fixes #5053 -- Decided not to proceed as the benefit is not worth the deprecation.

VOTE

Mathieu +1 (#5053 (comment))
Alex ?
Joel +0
Manoj +0
Gael +0?


TODO

  • Make stratified a constructor parameter.
  • Conflate all the stratified/non-stratified class pairs
  • Make the tests pass.

NOTES


cc: @amueller @vene @jnothman @MechCoder

@MechCoder
Copy link
Member

Is this Pull Request in "GSoC is over, so I need not finish this up stage?" ;-)

@raghavrv
Copy link
Member Author
raghavrv commented Feb 8, 2016

:P No No I'll start this soon. Apologies for the delay!

@MechCoder
Copy link
Member

Was there some discussion on changing the grid_scores_ attribute structure?

@raghavrv
Copy link
Member Author
raghavrv commented Feb 9, 2016

I discussed with Andy IRL... I have noted it down in my note I guess... I'll go home and check

@raghavrv
Copy link
Member Author

Ok I'm splitting this into multiple smaller PRs... That is better I feel

@raghavrv raghavrv changed the title [WIP] ENH Various enhancements to the model_selection module [WIP] ENH Make stratified a parameter and conflate all the stratified/non-stratified cross-validator class pairs. Mar 10, 2016
@raghavrv
Copy link
Member Author

@amueller @jnothman I am resuming my work on model_selection... (Aiming to finish up multiple metric soon)

Do I finish this up first? Is there interest in this?

EDIT: The diff is incorrect. Kindly don't look at the diff!

@raghavrv
Copy link
Member Author

Also ping @vene @mblondel

@jnothman
Copy link
Member

Was there some discussion on changing the grid_scores_ attribute structure?

Since 2013, maybe before...

@jnothman
Copy link
Member

Indeed, 2012, arguably: #1034; also #1787.

@MechCoder
Copy link
Member

Do we unanimously agree to make stratified as a constructor argument? Sorry if this is obvious but what are the benefits?

@jnothman
Copy link
Member

I think the advantage is mostly just in a reduction of the number of classes. One of the key points is that this change is enabled by the new splitter design, wherein y etc are no longer passed as constructor parameters. The disadvantage is that actually the stratified and unstratified implementations tend to be quite different.

@MechCoder
Copy link
Member

I see, now should be the best time to do it, then.

Should we also support continuous y as done in #6598 ?
I think there is the issue of how to identify whether the target is discrete in that case. (Do we just go by the dtype of y?)

@MechCoder
Copy link
Member

#4757 (comment)

@jnothman
Copy link
Member

I think there is the issue of how to identify whether the target is
discrete in that case. (Do we just go by the dtype of y?)

Yes, this is an issue; the type_of_target function attempts to identify,
but it's not foolproof.

On 13 April 2016 at 14:16, Manoj Kumar notifications@github.com wrote:

#4757 (comment)
#4757 (comment)


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#5569 (comment)

@jnothman
Copy link
Member

If this is broadly the way to go, I think it's possible we'll land up with stratify=False/True becoming stratify=False/True/'auto'/'binned' where 'binned' artificially creates n_samples/n_folds classes, and 'auto' is like the check_cv heuristic. But it does seem like we're creating a monolith.

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Apr 13, 2016 via email

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Apr 13, 2016 via email

@jnothman
Copy link
Member

I'm fairly ambivalent here. I do suspect that this merger is creating
unnecessary work for us and for the users. It only becomes beneficial IMO
once we're dealing with a non-binary stratify=False/True, and even then,
the amount of code shared may be minimal.

On 13 April 2016 at 15:05, Gael Varoquaux notifications@github.com wrote:

I think the advantage is mostly just in a reduction of the number of
classes.

Given that the code is not getting simpler, that's probably a potential
advantage in terms of usability right? But I am not sure whether finding
an argument hidden in a class is easier to find a class. And the way
people now learn CV with scikit-learn, they pretty much need to look at a
list of class. Hence, I think that for usuability, stratified as a
different class is probably a good thing.

What do people think about the above reasonning?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#5569 (comment)

@mblondel
Copy link
Member

The good timing to do this change would have been when we created the model_selection module. Now it feels a bit too much to ask users to change their code once again.

@jnothman
Copy link
Member

model_selection is not released.

On 13 April 2016 at 16:13, Mathieu Blondel notifications@github.com wrote:

The good timing to do this change would have been when we created the
model_selection module. Now it feels a bit too much to ask users to change
their code once again.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#5569 (comment)

@raghavrv
Copy link
Member Author

once we're dealing with a non-binary stratify=False/True

You mean to say stratify=True/False/'binned'/'auto' like you said before?

The good timing to do this change would have been when we created the model_selection module. Now it feels a bit too much to ask users to change their code once again.

Should I count that as a +1 for making this change?

@MechCoder
Copy link
Member

I agree with Joel and Gael and am also really +0 on this (but that's probably because I am less experienced). I think the more important thing would be the multiple-metric support.

@raghavrv
Copy link
Member Author
raghavrv commented Apr 13, 2016

Okay. Thanks for the comments!! I'm leaving this open for the future.. feel free to close feel free to reopen this...

@raghavrv raghavrv closed this May 11, 2016
@raghavrv raghavrv deleted the model_sel_enh branch May 11, 2016 02:51
@raghavrv raghavrv restored the model_sel_enh branch May 11, 2016 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] Changes to model_selection?
5 participants
0