8000 Work on cross val and pipelines. by GaelVaroquaux · Pull Request #6 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Work on cross val and pipelines. #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
3 commits merged into from
Sep 16, 2010
Merged

Conversation

GaelVaroquaux
Copy link
Member

Code review: I am interested in suggestions on how to make this code stink less, or criticism that finds problems or bugs with this code.

Enhancements

  • Cross validation in parallel (with several CPUs) works better.
  • The GridSearchCV stores its scores on the grid. I don't like the data structure that they are stored in. I think it
    is something that will need to be revisited at some point.
  • The score stored are now scaled even in the case iid=True. Alex, could you check that the code is right. You
    are the number one person who wants iid=True to work well.

Bug fixes

  • Make clone work on pipelines

    The clone code is fragile. I don't like it. The alternative is to add a '_clone' method to an object that would
    enable to override the clone behavior. I think that we need to keep this option in mind (the con is that it
    makes the estimator 'contract' heavier, the pro is that it makes this contract more explicit. I still think that we
    need the clone behavior in order to cross validate.

  • Make sure that classifiers are indeed recognized as so even when wrapped in CV or Pipeline objects

    I am very unhappy with this code (but I would still like it merged, because it gives a temporary solution).
    It raises the issue of how to recognize a classifier from a regressor. Finding the duck-typing signature for
    classifiers is partly the problem, because once we have found it, we need to find a good way to 'propagate' in
    the case of nested objects, as this commit shows.

Make it possible to pipeline GridsearchCV,
Give a meaningful scaling to the scores in GridsearchCV
Store the scores in GridsearchCV
…ized

as classifiers, if they contain a classifier.
and use clone in cross_val_func
@agramfort
Copy link
Member

the "iid" stuff look good. I have to play a bit with the Pipeline to see if it works as expected. I'll keep you posted.

NelleV referenced this pull request in NelleV/scikit-learn Jun 14, 2011
larsmans referenced this pull request in larsmans/scikit-learn Jul 15, 2011
Revised text classification chapter
pprett referenced this pull request in pprett/scikit-learn Aug 8, 2011
Boston dataset commit
jakevdp added a commit that referenced this pull request Feb 9, 2012
glouppe referenced this pull request in glouppe/scikit-learn Dec 18, 2012
larsmans referenced this pull request in larsmans/scikit-learn May 20, 2013
agramfort added a commit that referenced this pull request Jul 24, 2013
nitpick fixes, pep8 and fix math equations
jaquesgrobler pushed a commit to jaquesgrobler/scikit-learn that referenced this pull request Jul 25, 2013
ENH added the fork me ribbon to the website
jnothman referenced this pull request in jnothman/scikit-learn Jan 10, 2015
Fix merge conflict with scikit-learn/master
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0