[MRG] Model selection documentation by raghavrv · Pull Request #4 · raghavrv/scikit-learn

Closed

raghavrv wants to merge 5 commits

Conversation

raghavrv (Owner)

Documentation for scikit-learn#4294.

The built docs are over here.

TODO -

  • Move grid_search.rst to search.rst

@@ -143,43 +143,58 @@ Classes
covariance.oas
covariance.graph_lasso

.. _model_selection_ref:


As I said, I think I would keep the modules in the references under an "old and will be removed" header. I'm not sure what @vene @GaelVaroquaux or @jnothman think about that, though.


Is that model_selection_ref tag used anywhere?


Ok so all the "ref" tags were added in 6709ed5, the usage later got removed, and we kept adding the tags in good old cargo-cult fashion?


I'm +0.5 about keeping the "old and will be removed". It might hurt googlability so we need good links to the new non-deprecated files.


also ping @larsmans ;)

raghavrv (Owner, Author)

Ok so all the "ref" tags were added in 6709ed5, the usage later got removed, and we kept adding the tags in good old cargo-cult fashion?

Ah :P


I'm -1 for keeping the old modules listed here, but don't object to having their docs generated (by putting automodule somewhere hidden, say "deprecated.rst") with a clear deprecation-oriented docstring, if that's what you're more-or-less going for, @amueller. I'm not sure whether there's an easy way to set rel="canonical" to point to new equivalents, if that's your concern @vene.


I'm also convinced now that we should not list them here; maybe just not list them at all. People can always look at the docs in IPython or the source if they like.


+.. _model_selection_ref:

As I said, I think I would keep the modules in the references under an
"old and will be removed" header. I'm not sure what @vene
@GaelVaroquaux or @jnothman think about that, though.

Sorry for the slow reply. I think that it is a good suggestion.

@amueller

The developers/contributing.rst file still has references to the old modules; I found cross_validation and grid_search there.

@amueller

Not sure what to do with all the mentions in whatsnew...

@@ -618,24 +610,6 @@ From text
lda.LDA


.. _learning_curve_ref:


I'm confused that these tags were not used anywhere...

@raghavrv (Owner, Author)

Not sure what to do with all the mentions in whatsnew...

I was also wondering about that :P but I thought maybe it's better to leave it unchanged, since we'll be adding a whatsnew entry to note that they are grouped into model_selection...

@raghavrv (Owner, Author)

The developers/contributing.rst file still has references to the old modules; I found cross_validation and grid_search there.

Oh are there? I'll fix them right away!

EDIT: fixed!

model_selection.cross_val_score
model_selection.cross_val_predict
model_selection.permutation_test_score
model_selection.check_cv


Maybe a stupid question, but why is check_cv public? Input from @vene @GaelVaroquaux @larsmans @jnothman welcome.


Keep it public.
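For context, a minimal sketch of what a public check_cv buys downstream code; this assumes the new module's check_cv(cv, y, classifier) signature, where an integer cv is expanded to a (Stratified)KFold:

>>> import numpy as np
>>> from sklearn.model_selection import check_cv
>>> y = np.array([0, 0, 1, 1])
>>> cv = check_cv(cv=2, y=y, classifier=True)  # int -> StratifiedKFold for classifiers
>>> for train, test in cv.split(np.zeros((4, 1)), y):  # doctest: +SKIP
...     print(train, test)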

->>> slo = LabelShuffleSplit(labels, n_iter=4, test_size=0.5,
-...                         random_state=0)
->>> for train, test in slo:
+>>> slo = LabelShuffleSplit(n_iter=4, test_size=0.5, random_state=0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slo? maybe lss
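A hedged sketch of the new-style usage with the suggested name; the data is illustrative and it assumes the split(X, y, labels) API this PR introduces:

>>> from sklearn.model_selection import LabelShuffleSplit
>>> X = [0.1, 0.2, 2.2, 2.4, 2.3, 4.55, 5.8, 0.001]
>>> labels = [1, 1, 2, 2, 3, 3, 4, 4]
>>> lss = LabelShuffleSplit(n_iter=4, test_size=0.5, random_state=0)
>>> for train, test in lss.split(X, labels=labels):  # doctest: +SKIP
...     print("train:", train, "test:", test)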

@amueller

LGTM apart from minor nitpicks. If we rename n_labels in LeavePLabelOut, that needs to be fixed in the doctests.

>>> cross_val_score(clf, X, y, scoring='wrong_choice')
Traceback (most recent call last):
ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'log_loss', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc']
>>> clf = svm.SVC(probability=True, random_state=0)
->>> cross_validation.cross_val_score(clf, X, y, scoring='log_loss') # doctest: +ELLIPSIS
+>>> cross_val_score(clf, X, y, scoring='log_loss') # doctest: +ELLIPSIS
array([-0.07..., -0.16..., -0.06...])

Please reverse the order of the two examples: show the valid use case (scoring='log_loss') before the failure case (scoring='wrong_choice').
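A sketch of the suggested ordering, reusing the definitions from the snippet above:

>>> clf = svm.SVC(probability=True, random_state=0)
>>> cross_val_score(clf, X, y, scoring='log_loss')  # doctest: +ELLIPSIS
array([-0.07..., -0.16..., -0.06...])
>>> cross_val_score(clf, X, y, scoring='wrong_choice')
Traceback (most recent call last):
ValueError: 'wrong_choice' is not a valid scoring value. ...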

@raghavrv raghavrv force-pushed the model_selection branch 2 times, most recently from 64eebe2 to 1c47e5d on October 22, 2015 16:01
@raghavrv raghavrv force-pushed the model_selection_documentation branch 2 times, most recently from 5a6ce7e to d147a85 on October 22, 2015 16:51
@raghavrv raghavrv force-pushed the model_selection_documentation branch from d147a85 to 4dcfa19 on October 23, 2015 12:47
@raghavrv raghavrv force-pushed the model_selection_documentation branch from 4dcfa19 to 602fc8e on October 23, 2015 12:58
@raghavrv raghavrv force-pushed the model_selection_documentation branch from 602fc8e to efff882 on October 23, 2015 13:02
dohmatob and others added 5 commits October 23, 2015 17:28

* ENH Reorganize classes/fn from grid_search into search.py
* ENH Reorganize classes/fn from cross_validation into split.py
* ENH Reorganize cls/fn from cross_validation/learning_curve into validate.py

* MAINT Merge _check_cv into check_cv inside the model_selection module
* MAINT Update all the imports to point to the model_selection module
* FIX use iter_cv to iterate through the new style/old style cv objs
* TST Add tests for the new model_selection members
* ENH Wrap the old-style cv obj/iterables instead of using iter_cv

* ENH Use scipy's binomial coefficient function comb for calculation of nCk (see the sketch after this list)
* ENH Few enhancements to the split module
* ENH Improve check_cv input validation and docstring
* MAINT _get_test_folds(X, y, labels) --> _get_test_folds(labels)
* TST if 1d arrays for X introduce any errors
* ENH use 1d X arrays for all tests;
* ENH X_10 --> X (global var)
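
A one-line sketch of the nCk point flagged above, using scipy.special.comb with exact integer arithmetic:

>>> from scipy.special import comb
>>> comb(10, 2, exact=True)  # number of ways to choose 2 samples out of 10
45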

Minor
-----

* ENH _PartitionIterator --> _BaseCrossValidator;
* ENH CVIterator --> CVIterableWrapper
* TST Import the old SKF locally
* FIX/TST Clean up the split module's tests.
* DOC Improve documentation of the cv parameter
* COSMIT consistently hyphenate cross-validation/cross-validator
* TST Calculate n_samples from X
* COSMIT Use separate lines for each import.
* COSMIT cross_validation_generator --> cross_validator

Commits merged manually
-----------------------

* FIX Document the random_state attribute in RandomizedSearchCV
* MAINT Use check_cv instead of _check_cv
* ENH refactor OVO decision function, use it in SVC for sklearn-like
  decision_function shape
* FIX avoid memory cost when sampling from large parameter grids

ENH Major to Minor incremental enhancements to the model_selection

Squashed commit messages - (For reference)

Major
-----

* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make the rng be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: sort using only the 1st item (name)
    to avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
    pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
  - From scikit-learn#5161
    - MAINT remove redundant p variable
    - Add check for sparse prediction in cross_val_predict
  - From scikit-learn#5201 - DOC improve random_state param doc
  - From scikit-learn#5190 - LabelKFold and test
  - From scikit-learn#4583 - LabelShuffleSplit and tests
  - From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
  - From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
  - From scikit-learn#5458 - Remove shuffle from LabelKFold
  - From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
  - From scikit-learn#4826 - Move custom error / warnings into sklearn.exception

Minor
-----

* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc

FIX/DOC/MAINT Addressing the review comments by Arnaud and Andy

COSMIT Sort the members alphabetically
COSMIT len_cv --> n_splits
COSMIT Merge 2 if; FIX Use kwargs
DOC Add my name to the authors :D
DOC make labels parameter consistent
FIX Remove hack for boolean indices; + COSMIT idx --> indices; DOC Add Returns
COSMIT preds --> predictions
DOC Add Returns and neatly arrange X, y, labels
FIX idx(s)/ind(s)--> indice(s)
COSMIT Merge if and else to elif
COSMIT n --> n_samples
COSMIT Use bincount only once
* COSMIT cls --> class_i; class_i (ith class indices) --> perm_indices_class_i

FIX/ENH/TST Addressing the final reviews

COSMIT c --> count
FIX/TST make check_cv raise ValueError for string cv value
TST nested cv (gs inside cross_val_score) works for diff cvs (see the sketch after this list)
FIX/ENH Raise ValueError when labels is None for label-based cvs
TST labels is passed correctly to the cv and the resulting ValueError is propagated to cross_val_score/predict and grid search
FIX pass labels to cross_val_score
FIX use make_classification
DOC Add Returns; COSMIT Remove scaffolding
TST add a test to check the _build_repr helper
REVERT the old GS/RS should also be tested by the common tests.
ENH Add a tuple of all/label based CVS
FIX raise VE even at get_n_splits if labels is None
FIX Fabian's comments
PEP8
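
For reference, a minimal sketch of the nested-cv pattern mentioned above (a GridSearchCV used as the estimator inside cross_val_score); the estimator and parameter grid here are illustrative, not necessarily what the test uses:

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import GridSearchCV, cross_val_score
>>> from sklearn.svm import SVC
>>> iris = load_iris()
>>> gs = GridSearchCV(SVC(), param_grid={'C': [0.1, 1, 10]}, cv=3)   # inner cv selects C
>>> scores = cross_val_score(gs, iris.data, iris.target, cv=5)       # outer cv scores the tuned model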
@raghavrv raghavrv force-pushed the model_selection_documentation branch from efff882 to cff6258 on October 23, 2015 15:34
@raghavrv raghavrv closed this Oct 23, 2015
@raghavrv raghavrv deleted the model_selection_documentation branch February 11, 2016 13:12
Labels: none yet · Projects: none yet · 8 participants