Added user guide documentation for permutation_test_score #10905 by vmanisha · Pull Request #14769 · scikit-learn/scikit-learn · GitHub

Added user guide documentation for permutation_test_score #10905 #14769

Closed
wants to merge 6 commits into from
@@ -235,6 +235,20 @@ Here is an example of ``cross_validate`` using a single metric::
>>> sorted(scores.keys())
['estimator', 'fit_time', 'score_time', 'test_score', 'train_score']

Cross-validation significance evaluation
----------------------------------------
Member

I would add this section after "Obtaining predictions by cross-validation"

Done in PR #14757.


The significance of cross-validation scores can be evaluated using the
:func:`permutation_test_score` function. The function returns a p-value, which
approximates the probability that an average cross-validation score at least as
good as the observed one would be obtained by chance if the target is
independent of the data.


It also returns the cross-validation score obtained for each permutation of the
labels ``y``. The function permutes the labels of the samples and computes the
p-value against the null hypothesis that the features and the labels are
independent, meaning that there is no difference between the classes.


Obtaining predictions by cross-validation
-----------------------------------------
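A minimal usage sketch of the function described in the user-guide text above, assuming a recent scikit-learn release. The iris dataset, the linear ``SVC`` and the 5-fold ``StratifiedKFold`` are illustrative choices and are not part of this pull request. The three return values are unpacked in the order scikit-learn documents them: the score on the true labels, the per-permutation scores, and the p-value.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

# Illustrative data and model (assumptions, not taken from the PR)
X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear")
cv = StratifiedKFold(n_splits=5)

# score: mean cross-validation score on the original labels
# permutation_scores: mean cross-validation score for each shuffled copy of y
# pvalue: fraction of permutations scoring at least as well (with a +1 correction)
score, permutation_scores, pvalue = permutation_test_score(
    clf, X, y, scoring="accuracy", cv=cv, n_permutations=100, random_state=0
)

print(f"Score on original labels: {score:.3f}")
print(f"Mean score on permuted labels: {permutation_scores.mean():.3f}")
print(f"p-value: {pvalue:.4f}")
```

Because the iris features are strongly related to the labels, the score on the original labels should be far above the permuted scores, and the p-value should sit at or near its lower bound of ``1 / (n_permutations + 1)``.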
30 changes: 23 additions & 7 deletions sklearn/model_selection/_validation.py
@@ -871,7 +871,20 @@ def _index_param_value(X, v, indices):
def permutation_test_score(estimator, X, y, groups=None, cv=None,
n_permutations=100, n_jobs=1, random_state=0,
verbose=0, scoring=None):
"""Evaluate the significance of a cross-validated score with permutations
"""Evaluate the significance of a cross-validated score by permuting
Member

Following pep8, we try to keep the first sentence to 1 line.

Done in PR #14757.

Author

Closing this pull request in favour of #14757

the labels of the samples and computing the p-value against the null
hypothesis that the features and the labels are independent, meaning that
there is no difference between the classes.

The p-value represents the fraction of randomized data sets on which the
classifier performs at least as well as on the original data, i.e. where its
error on the original data is at least as large as on the randomized one.

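A small p-value (below a chosen significance level, such as
:math:`\alpha = 0.05`) gives evidence against the null hypothesis, suggesting
that the classifier has exploited a real dependency between the features and
the labels rather than a random pattern in the data.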

.. versionadded:: 0.9

Read more in the :ref:`User Guide <cross_validation>`.

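To make the docstring's description of the p-value concrete, here is a from-scratch sketch of the same permutation idea built on ``cross_val_score``. The helper ``manual_permutation_pvalue`` is hypothetical and is not scikit-learn's implementation: it ignores ``groups``, runs serially, and uses its own random generator. It follows the formula scikit-learn uses, ``(C + 1) / (n_permutations + 1)``, where ``C`` counts permutations whose score is greater than or equal to the score on the original labels.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def manual_permutation_pvalue(estimator, X, y, cv, n_permutations=100, seed=0):
    """Hypothetical helper sketching the idea behind permutation_test_score."""
    rng = np.random.RandomState(seed)
    # Mean cross-validation score on the original labels
    score = cross_val_score(estimator, X, y, cv=cv).mean()
    # Shuffling y breaks any dependency between features and labels,
    # so these scores sample the null distribution
    perm_scores = np.array([
        cross_val_score(estimator, X, rng.permutation(y), cv=cv).mean()
        for _ in range(n_permutations)
    ])
    # C = number of permutations scoring at least as well as the original labels
    c = np.sum(perm_scores >= score)
    pvalue = (c + 1) / (n_permutations + 1)
    return score, perm_scores, pvalue


# Illustrative run on iris (dataset and model are assumptions)
X, y = load_iris(return_X_y=True)
score, perm_scores, pvalue = manual_permutation_pvalue(
    SVC(kernel="linear"), X, y, cv=StratifiedKFold(n_splits=5), n_permutations=50
)
print(f"score={score:.3f}, p-value={pvalue:.4f}")
```

The ``+ 1`` in numerator and denominator counts the original labeling as one of the possible permutations, which keeps the p-value strictly positive.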
@@ -953,14 +966,17 @@ def permutation_test_score(estimator, X, y, groups=None, cv=None,

The best possible p-value is 1/(n_permutations + 1), the worst is 1.0.

Notes
-----
This function implements Test 1 in:
References
----------

Ojala and Garriga. Permutation Tests for Studying Classifier
Performance. The Journal of Machine Learning Research (2010)
vol. 11
* `"Permutation Tests for Studying Classifier Performance"
<http://ieeexplore.ieee.org/document/5360332/>`_
Ojala and Garriga - The Journal of Machine Learning Research (2010)
vol. 11

Notes
-----
This function implements "Test 1" as described in the paper given above.
"""
X, y, groups = indexable(X, y, groups)

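The docstring's statement that "the best possible p-value is 1/(n_permutations + 1), the worst is 1.0" follows from the formula scikit-learn uses for the p-value. Here :math:`s` is the score on the original labels, :math:`s_i^{\pi}` the score on the :math:`i`-th permutation, and :math:`n` stands for ``n_permutations``:

```latex
p = \frac{C + 1}{n + 1},
\qquad C = \#\{\, i \in \{1, \dots, n\} : s_i^{\pi} \ge s \,\},
\qquad 0 \le C \le n
\;\Longrightarrow\;
\frac{1}{n + 1} \le p \le \frac{n + 1}{n + 1} = 1 .
```

The minimum is reached when no permutation matches the original score (:math:`C = 0`), and the maximum when every permutation does (:math:`C = n`). With the default ``n_permutations=100``, the smallest attainable p-value is therefore :math:`1/101 \approx 0.0099`.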