[MRG+1] Fix #9743: Adding parameter information to docstring. by taylorkm · Pull Request #9757 · scikit-learn/scikit-learn

[MRG+1] Fix #9743: Adding parameter information to docstring. #9757

Merged 4 commits on Sep 15, 2017
13 changes: 10 additions & 3 deletions sklearn/model_selection/_split.py
@@ -1706,12 +1706,19 @@ def _validate_shuffle_split(n_samples, test_size, train_size):
 class PredefinedSplit(BaseCrossValidator):
     """Predefined split cross-validator

-    Splits the data into training/test set folds according to a predefined
-    scheme. Each sample can be assigned to at most one test set fold, as
-    specified by the user through the ``test_fold`` parameter.
+    Provides train/test indices to split data into train/test sets using a
Member

I think "provides" is a poor word choice here, given that in this case the user provides the indices.

Contributor Author
@taylorkm, Sep 14, 2017

I think it's a bit tricky. Through test_fold, the user provides what I want to call a "test-fold assignment" or "test-fold membership." This implicitly defines a collection of arrays holding the indices of the data points in each (train-fold, test-fold) pair. There are as many folds as there are unique values other than -1 in test_fold. The explicit representation of the train/test folds, in the form of that collection of arrays, is what is being "provided" by the function.

A quick fix that avoids the colloquial term I used above (which I hoped would help distinguish the implicit and explicit ways of representing a collection of train/test folds) is to use "computes" instead of "provides". Or we could do nothing. Another possibility is to make the example in the documentation a bit larger to clear up confusion, but that may be overkill. I'm open to suggestions, as my only rationale for using "provides" was the docstring for KFold.

+    predefined scheme specified by the user with the ``test_fold`` parameter.

     Read more in the :ref:`User Guide <cross_validation>`.

+    Parameters
+    ----------
+    test_fold : array-like, shape (n_samples,)
+        The entry ``test_fold[i]`` represents the index of the test set that
+        sample ``i`` belongs to. It is possible to exclude sample ``i`` from
+        any test set (i.e. include sample ``i`` in every training set) by
+        setting ``test_fold[i]`` equal to -1.
+
     Examples
     --------
     >>> from sklearn.model_selection import PredefinedSplit
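
The review thread distinguishes the implicit fold assignment passed as ``test_fold`` from the explicit train/test index arrays derived from it. A minimal pure-Python sketch of that mapping is given here. ``explicit_folds`` is a hypothetical helper written only for illustration and is not scikit-learn's implementation; it assumes fold labels are integers and that -1 means "never placed in a test set", as the new docstring text describes.

```python
def explicit_folds(test_fold):
    """Turn a test-fold assignment into explicit (train, test) index lists.

    Hypothetical illustration only; not part of scikit-learn.
    """
    # One fold per unique label, ignoring the -1 "exclude from test" marker.
    fold_ids = sorted({label for label in test_fold if label != -1})
    splits = []
    for fold in fold_ids:
        # Samples labelled with this fold form the test set.
        test_idx = [i for i, label in enumerate(test_fold) if label == fold]
        # Every other sample, including those labelled -1, is used for training.
        train_idx = [i for i, label in enumerate(test_fold) if label != fold]
        splits.append((train_idx, test_idx))
    return splits


test_fold = [0, 1, -1, 1]
for train_idx, test_idx in explicit_folds(test_fold):
    print("TRAIN:", train_idx, "TEST:", test_idx)
```

With ``test_fold = [0, 1, -1, 1]`` there are two folds: sample 2, labelled -1, appears in both training sets and in neither test set.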