Cross validation iterators can also be used to directly perform model
selection using Grid Search for the optimal hyperparameters of the
model. This is the topic of the next section: :ref:`grid_search`.

.. _permutation_test_score:

Permutation test score
======================

:func:`~sklearn.model_selection.permutation_test_score` offers another way
to evaluate the performance of classifiers. It provides a permutation-based
p-value, which represents how likely an observed performance of the
classifier would be obtained by chance. The null hypothesis in this test is
that the classifier fails to leverage any statistical dependency between the
features and the labels to make correct predictions on left-out data.
:func:`~sklearn.model_selection.permutation_test_score` generates a null
distribution by calculating `n_permutations` different permutations of the
data. In each permutation the labels are randomly shuffled, thereby removing
any dependency between the features and the labels. The p-value output
is the fraction of permutations for which the average cross-validation score
obtained by the model is better than the cross-validation score obtained by
the model using the original data. For reliable results ``n_permutations``
should typically be larger than 100 and ``cv`` between 3 and 10 folds.
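
The procedure above can be sketched in a few lines. The dataset and
estimator below (iris with a linear SVC) are illustrative choices, not
prescribed by this guide:

```python
# Hedged sketch: iris and a linear SVC are stand-in choices for
# illustration; any estimator/dataset pair works the same way.
from sklearn.datasets import load_iris
from sklearn.model_selection import permutation_test_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", random_state=0)

# Returns the score on the original labels, the null distribution of
# scores on shuffled labels, and the permutation-based p-value.
score, permutation_scores, pvalue = permutation_test_score(
    clf, X, y, scoring="accuracy", cv=5, n_permutations=100, random_state=0
)
print(f"score={score:.3f}, p-value={pvalue:.4f}")
```

Here ``permutation_scores`` has one entry per permutation, and a small
``pvalue`` indicates the original score is unlikely under shuffled labels.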

A low p-value provides evidence that the dataset contains a real dependency
between features and labels and that the classifier was able to utilize this
to obtain good results. A high p-value could be due to a lack of dependency
between features and labels (there is no difference in feature values between
the classes) or because the classifier was not able to use the dependency in
the data. In the latter case, using a more appropriate classifier that
is able to utilize the structure in the data would result in a lower
p-value.

Cross-validation provides information about how well a classifier generalizes,
specifically the range of expected errors of the classifier. However, a
classifier trained on a high-dimensional dataset with no structure may still
perform better than expected on cross-validation, just by chance.
This can typically happen with small datasets of fewer than a few hundred
samples.
:func:`~sklearn.model_selection.permutation_test_score` provides information
on whether the classifier has found a real class structure and can help in
evaluating the performance of the classifier.
899
+ It is important to note that this test has been shown to produce low
900
+ p-values even if there is only weak structure in the data because in the
901
+ corresponding permutated datasets there is absolutely no structure. This
902
+ test is therefore only able to show when the model reliably outperforms
903
+ random guessing.
904
+
905
+ Finally, :func: `~sklearn.model_selection.permutation_test_score ` is computed
906
+ using brute force and interally fits ``(n_permutations + 1) * n_cv `` models.
907
+ It is therefore only tractable with small datasets for which fitting an
908
+ individual model is very fast.
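
The p-value can be recovered from the returned null distribution. A minimal
sketch, using a synthetic null distribution as a stand-in for the scores the
test would return, with the "+1" smoothing common to permutation tests
(the observed score is counted as one of the permutations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical null distribution of 100 permutation scores (stand-ins
# for the real ones) and an observed score on the original labels.
permutation_scores = rng.uniform(0.2, 0.5, size=100)
score = 0.97

# Smoothed empirical p-value: fraction of permutations scoring at least
# as well as the original, counting the original run itself.
pvalue = (np.sum(permutation_scores >= score) + 1) / (len(permutation_scores) + 1)
print(pvalue)
```

With no permutation reaching the observed score, the smallest attainable
p-value is ``1 / (n_permutations + 1)``, which is why more permutations
allow smaller p-values.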

.. topic:: Examples

  * :ref:`sphx_glr_auto_examples_feature_selection_plot_permutation_test_for_classification.py`

.. topic:: References:

  * Ojala and Garriga. `Permutation Tests for Studying Classifier Performance
    <http://www.jmlr.org/papers/volume11/ojala10a/ojala10a.pdf>`_.
    J. Mach. Learn. Res. 2010.