@@ -165,7 +165,7 @@ validation strategies.
K-fold
------

-:class:`KFold` divides all the samples in math:`k` groups of samples,
+:class:`KFold` divides all the samples into :math:`k` groups of samples,
called folds (if :math:`k = n`, this is equivalent to the *Leave One
Out* strategy), of equal sizes (if possible). The prediction function is
learned using :math:`k - 1` folds, and the fold left out is used for test.
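
Here is a minimal sketch of 2-fold cross validation on a dataset with 4
samples (this assumes the ``sklearn.model_selection`` module path; older
releases expose :class:`KFold` elsewhere)::

  >>> from sklearn.model_selection import KFold

  >>> X = ["a", "b", "c", "d"]
  >>> kf = KFold(n_splits=2)
  >>> for train, test in kf.split(X):
  ...     print("%s %s" % (train, test))
  [2 3] [0 1]
  [0 1] [2 3]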
@@ -231,6 +231,41 @@ not waste much data as only one sample is removed from the learning set::
[0 1 2] [3]


+Potential users of LOO for model selection should weigh a few known caveats.
+When compared with :math:`k`-fold cross validation, one builds :math:`n` models
+from :math:`n` samples instead of :math:`k` models, where :math:`n > k`.
+Moreover, each is trained on :math:`n - 1` samples rather than
+:math:`(k-1) n / k`. In both ways, assuming :math:`k` is not too large and
+:math:`k < n`, LOO is more computationally expensive than :math:`k`-fold
+cross validation. Typically :math:`k` should be between 5 and 10.
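+
+For illustration, here is a minimal sketch of the cost difference (this
+assumes the ``sklearn.model_selection`` module path; older releases expose
+these classes elsewhere)::
+
+  >>> import numpy as np
+  >>> from sklearn.model_selection import KFold, LeaveOneOut
+  >>> X = np.ones(50)
+  >>> LeaveOneOut().get_n_splits(X)   # LOO fits n models, each on n - 1 samples
+  50
+  >>> KFold(n_splits=5).get_n_splits(X)   # k-fold fits only k models
+  5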
+
+In terms of accuracy, LOO often results in high variance as an estimator of
+the test error. Intuitively, since :math:`n - 1` of the :math:`n` samples are
+used to build each model, models constructed from folds are virtually
+identical to each other and to the model built from the entire training set.
+
+Conversely, it can be shown that if the learning curve has a steep slope at
+the training size in question, then 5- or 10-fold cross validation tends to
+overestimate the generalization error.
+
+As a general rule, most authors and empirical evidence suggest that 5- or
+10-fold cross validation is preferred to LOO.
+
+
+.. topic:: References:
+
+ * http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-12.html
+ * T. Hastie, R. Tibshirani, J. Friedman, `The Elements of Statistical Learning
+   <http://www-stat.stanford.edu/~tibs/ElemStatLearn>`_, Springer 2009
+ * L. Breiman, P. Spector, `Submodel selection and evaluation in regression: The X-random case
+   <http://digitalassets.lib.berkeley.edu/sdtr/ucb/text/197.pdf>`_, International Statistical Review 1992
+ * R. Kohavi, `A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
+   <http://www.cs.iastate.edu/~jtian/cs573/Papers/Kohavi-IJCAI-95.pdf>`_, Intl. Jnt. Conf. AI 1995
+ * R. Bharat Rao, G. Fung, R. Rosales, `On the Dangers of Cross-Validation. An Experimental Evaluation
+   <http://www.siam.org/proceedings/datamining/2008/dm08_54_Rao.pdf>`_, SIAM 2008
+ * G. James, D. Witten, T. Hastie, R. Tibshirani, `An Introduction to Statistical Learning
+   <http://www-bcf.usc.edu/~gareth/ISL>`_, Springer 2013
+
+
Leave-P-Out - LPO
-----------------