Restructure the output attributes of *SearchCV#1768
Restructure the output attributes of *SearchCV#1768jnothman wants to merge 13 commits intoscikit-learn:masterfrom
Conversation
add docstring for GridSearchCV, RandomizedSearchCV and fit_grid_point. In "fit_grid_point" I used test_score rather than validation_score, as the split is given to the function. rbf svm grid search example now also shows training scores - which illustrates overfitting for high C, and training/prediction times... which pasically serve to illustrate that this is possible. Maybe random forests would be better to evaluate training times?
Currently, no tests have been added, and backwards compatibility is eschewed
This includes some refactoring and creation of new Scorer classes to wrap legacy scorer forms. Conflicts: sklearn/grid_search.py sklearn/tests/test_grid_search.py
Thus the attributes stored by BaseSearchCV._fit() are no longer redundant. Also: test for these attributes
There was a problem hiding this comment.
Should this be called fit_fold for consistency?
There was a problem hiding this comment.
Yes, with deprecation of the current function name.
|
Thanks a lot for your work. I hope I can incooperate you chages soon. |
This replaces Scorer.store() Also: tests for new Scorer functionality and descendants, and fixes broken WrapScorer
|
I think this PR does to many thing / unrelated things at once. Current master basically does a rename of the On the other hand, one could argue that your changes are not going far enough. If you do have a grid of parameters, the reshaping is still annoying to do by hand, right? I'm all for incremental improvements, but not if they change the api ;) |
|
Btw, I don't see how this is an alternative to #1742. |
|
All true, @amueller . It is not strictly an alternative to #1742 (which also grew much broader than its title suggests); rather, [what I meant:] it incorporates an alternative approach to returning additional statistics from the parameter search. The implementation at #1742 (d884180) actually breaks backwards-compatibility: where This patch doesn't change the structure of I am not sure why you are concerned that the output cannot be indexed directly by parameter values. I don't think I ever suggested reshaping the output to be indexed by parameter values: this might indeed be a nice extension (though as far as I can tell would still not be trivial to use), but is certainly separate. Moreover this critique -- and the potential extension -- would apply equally to this patch and to the existing code. Finally, I included multiple scores in this PR because:
But if you insist, it can be removed; but it can then only be proposed after the restructure patch is accepted. As this was my first content PR to scikit-learn, would you advise me on the appropriate action? (If I am to split it, do I merely edit this branch, or do I close this PR and reopen?) |
|
Okay. I've split it up. Start with #1787 |
|
Thanks for responding to my comments. I think I agree with you. Thanks for splitting up the PR. I think this is the way to go. |
BaseSearchCVnow stores attribsbest_index_,grid_results_andfold_results_from whichbest_params_,best_score_andgrid_scores_are derived; andbest_estimator_.grid_results_andfold_results_are structured arrays, allowing for flexible further computation.An alternative to #1742