Prediction error estimation: a comparison of resampling methods
Annette M. Molinaro1,3,∗, Richard Simon2 and Ruth M. Pfeiffer1

1 Biostatistics Branch, Division of Cancer Epidemiology and Genetics and 2 Biometric Research Branch, Division of Cancer Treatment and Diagnostics, NCI, NIH, Rockville, MD 20852, USA and 3 Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA

∗ To whom correspondence should be addressed.

Received on April 6, 2005; revised on April 28, 2005; accepted on May 12, 2005
Advance Access publication May 19, 2005
ABSTRACT

Motivation: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection.

Results: For small studies where features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV) and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean square error. The .632+ bootstrap is quite biased in small sample sizes with strong signal-to-noise ratios. Differences in performance among resampling methods are reduced as the number of specimens available increases.

Contact: annette.molinaro@yale.edu

Supplementary Information: A complete compilation of results and R code for simulations and analyses is available in Molinaro et al. (2005) (http://linus.nci.nih.gov/brb/TechReport.htm).

1 INTRODUCTION

In genomic experiments one frequently encounters high-dimensional data and small sample sizes. Microarrays simultaneously monitor expression levels for several thousands of genes. Proteomic profiling studies using SELDI-TOF (surface-enhanced laser desorption and ionization time-of-flight) measure size and charge of proteins and protein fragments by mass spectroscopy, and result in up to 15 000 intensity levels at prespecified mass values for each spectrum. Sample sizes in such experiments are typically <100.

In many studies, observations are known to belong to predetermined classes and the task is to build predictors or classifiers for new observations whose class is unknown. Deciding which genes or proteomic measurements to include in the prediction is called feature selection and is a crucial step in developing a class predictor. Including too many noisy variables reduces accuracy of the prediction and may lead to over-fitting of data, resulting in promising but often non-reproducible results (Ransohoff, 2004).

Another difficulty is model selection, given the numerous classification models available. An important step in reporting results is assessing the chosen model's error rate, or generalizability. In the absence of independent validation data, a common approach to estimating predictive accuracy is based on some form of resampling the original data, e.g. cross-validation. These techniques divide the data into a learning set and a test set, and range in complexity from the popular learning-test split to v-fold cross-validation, Monte Carlo v-fold cross-validation and bootstrap resampling. Few comparisons of standard resampling methods have been performed to date, and all of them exhibit limitations that make their conclusions inapplicable to most genomic settings. Early comparisons of resampling techniques in the literature focused on model selection as opposed to prediction error estimation (Breiman and Spector, 1992; Burman, 1989). In two recent assessments of resampling techniques for error estimation (Braga-Neto and Dougherty, 2004; Efron, 2004), feature selection was not included as part of the resampling procedures, causing the conclusions to be inappropriate for the high-dimensional setting.

We have performed an extensive comparison of resampling methods to estimate prediction error using simulated (large signal-to-noise ratio), microarray (intermediate signal-to-noise ratio) and proteomic data (low signal-to-noise ratio), encompassing increasing sample sizes with large numbers of features. The impact of feature selection on the performance of various cross-validation methods is highlighted. The results elucidate the 'best' resampling techniques for future research involving high-dimensional data, to avoid overly optimistic assessment of the performance of a model.

2 METHODS

In the prediction problem, one observes n independent and identically distributed (i.i.d.) random variables O1, ..., On with unknown distribution P. Each observation in O consists of an outcome Y with range 𝒴 and an l-vector of measured covariates, or features, X with range 𝒳, such that Oi = (Xi, Yi), i = 1, ..., n. In microarray experiments X includes gene expression measurements, while in proteomic data it includes the intensities at the mass over charge (m/z) values. X may also contain covariates such as a patient's age and/or histopathologic measurements. The outcome Y may be a continuous measure, such as months to disease, or a categorical measure, such as disease status.
The goal in class prediction is to build a rule implementing the information from X in order to predict Y. The intention is that by building this rule, based on the observations O1, ..., On, a future unobserved outcome Y0 can be predicted based on its corresponding measured features X0. If the outcome is continuous, then the rule, or predictor, ψ is defined as a mapping from the feature space 𝒳 onto the real line, i.e. ψ : 𝒳 → R. Consequently, ŷ = ψ(x) denotes the predicted outcome based on the observed X. Such predictors can be built via regression (linear and non-linear) or recursive binary partitioning such as classification and regression trees (CART) (Breiman et al., 1984).

If the outcome Y is categorical, it assumes one of K values. In this case, the rule ψ partitions the feature space 𝒳 into K disjoint and exhaustive groups Gk, where k = 1, ..., K, such that ŷ = k if x ∈ Gk. Standard statistical analyses include linear discriminant analysis (LDA) and diagonal discriminant classifiers (DDA), nearest neighbors (NN) and CART, as well as aggregate classifiers. Thorough discussions of the prediction problem and available algorithms can be found in Breiman et al. (1984), McLachlan (1992), Ripley (1996) and Hastie et al. (2003).

The rule ψ can be written as ψ(·|Pn), where Pn denotes the empirical distribution of O and reflects the dependence of the built rule on the observed data. Loss functions may be employed to quantify the performance of a given rule. A common loss function for a continuous outcome Y is the squared error loss, L(Y, ψ) = [Y − ψ(X)]². With a categorical outcome Y, a popular choice is the indicator loss function, L(Y, ψ) = I[Y ≠ ψ(X)]. A loss function could also incorporate differential misclassification costs (Breiman et al., 1984). For either type of outcome, the expected loss, or risk, is defined as

  \tilde{\theta} = R(\psi, P) = E_P[L(Y, \psi)] = \int L(y, \psi(x)) \, dP(x, y).   (1)

The rule in Equation (1) is constructed and evaluated upon the distribution P; as such, θ̃ is referred to as the asymptotic risk. However, in reality P is unknown; thus, the rule based upon the observations O1, ..., On has an expected loss, or conditional risk (also known as the generalization error), defined as

  \tilde{\theta}_n = R(\psi(\cdot \mid P_n), P) = \int L(y, \psi(x \mid P_n)) \, dP(x, y).   (2)

There are two impetuses for evaluating the conditional risk: model selection and performance assessment. In model selection, the goal is to find the model which minimizes the conditional risk over a collection of potential models. In performance assessment, the goal is to estimate the generalization error for a given model, i.e. to assess how well it predicts the outcome of an observation not included in O.

In an ideal setting an independent dataset would be available for the purposes of model selection and estimating the generalization error. Typically, however, one must use the observed sample O for model building, selection and performance assessment. The simplest method for estimating the conditional risk is the resubstitution or apparent error:

  \hat{\theta}_n^{RS} = R(\psi(\cdot \mid P_n), P_n) = \int L(y, \psi(x \mid P_n)) \, dP_n(x, y).   (3)

Here each of the n observations is used for constructing, selecting and, subsequently, evaluating the prediction error of ψ. Consequently, the resubstitution risk estimate tends to underestimate the generalization error (Efron, 1983; McLachlan, 1992). To alleviate this biased estimation, resampling methods such as cross-validation or bootstrapping can be employed. In the next section, we describe these techniques and their implications in the framework of prediction error.
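To make Equation (3) concrete, the sketch below computes the resubstitution error in R for a binary outcome under the indicator loss, using LDA (one of the classifiers of Section 2.2) as the rule; the function name and interface are ours, not from the paper.

    # Resubstitution (apparent) error: the rule psi(.|Pn) is built and
    # evaluated on the same n observations, so the estimate is optimistic.
    library(MASS)
    resubstitution_error <- function(x, y) {
      fit  <- lda(x, grouping = factor(y))   # build the rule on Pn
      yhat <- predict(fit, x)$class          # predict the SAME observations
      mean(yhat != factor(y))                # empirical indicator loss
    }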
2.1 Resampling methods

In the absence of a large, independent test set, there are numerous techniques for assessing prediction error by implementing some form of partitioning or resampling of the original observed data O. Each of these techniques involves dividing the data into a learning set and a test set. For purposes of model selection, the learning set may further be divided into a training set and a validation set. We will focus solely on the partitioning of the data into learning and test sets for the express purpose of estimating the generalization error.

To enhance a general discussion of resampling methods, we define a binary random n-vector, Sn ∈ {0, 1}^n, which splits the observations into the desired subsets (Molinaro et al., 2004). A realization of Sn = (Sn,1, ..., Sn,n) prescribes a particular split of the entire dataset of n observations into a learning set, {i ∈ {1, ..., n} : Sn,i = 0}, and a test set, {i ∈ {1, ..., n} : Sn,i = 1}. Let p be the proportion of observations in the test set. The empirical distributions of the learning and test sets are denoted by P^0_{n,Sn} and P^1_{n,Sn}, respectively. Importantly, Sn is independent of the empirical distribution of the complete dataset of n observations Pn, and the particular distribution of Sn defines the type of resampling method. Given Sn, the performance of any given estimator ψ(·|Pn) can be assessed via the resampling conditional risk estimate

  \hat{\theta}_{n(1-p)} = E_{S_n} \int L\big(o, \psi(\cdot \mid P^{0}_{n,S_n})\big) \, dP^{1}_{n,S_n}(o),   (4)

where Sn refers to binary split vectors for the entire dataset of n observations and p = \sum_i S_{n,i}/n is the proportion of the n observations in the test set.

There are several considerations when selecting a resampling method. The first is the sample size n. For v-fold cross-validation and the bootstrap, Dudoit and van der Laan (2003) (http://www.bepress.com/ucbbiostat/paper126) have shown that as n → ∞ (and consequently np → ∞) asymptotic optimality is achieved. However, no such results exist for finite samples. Other considerations are the proportion p of the observations allotted to the test set and the number of times the estimate is calculated. We address these considerations in the following sections and refer the reader to more detailed discussions in McLachlan (1992) and Davison and Hinkley (1997).
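In code, a realization of Sn is simply a 0/1 vector. The R sketch below (our own naming and interface, not the paper's) evaluates Equation (4) for an arbitrary collection of split vectors; each concrete resampling method then reduces to a particular choice of the rows of S.

    # Resampling conditional risk: build the rule on {Sn,i = 0}, evaluate on
    # {Sn,i = 1}, and average over the realizations of Sn (the rows of S).
    resampling_risk <- function(x, y, S, fit_predict) {
      errs <- apply(S, 1, function(s) {
        learn <- which(s == 0)
        test  <- which(s == 1)
        yhat  <- fit_predict(x[learn, , drop = FALSE], y[learn],
                             x[test,  , drop = FALSE])
        mean(yhat != y[test])                # test-set indicator loss
      })
      mean(errs)
    }

Here fit_predict(x_learn, y_learn, x_test) is any rule that returns predicted classes for the test set; wrappers for the four algorithms of Section 2.2 are sketched below.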
2.1.1 Split sample This popular resampling method, also known as the learning-test split or holdout method (McLachlan, 1992), entails a single partition of the data into a learning set and a test set based on a predetermined p. For example, p = 1/3 allots two-thirds of the data to the learning set and one-third to the test set. The distribution of Sn places mass 1/2 on two binary vectors which assign the n observations to the learning and test sets. The advantage of this method is the ease of computation. Also, since the classifier is developed only once, a completely specified algorithm for classifier development need not be available; the development can be more informal and subjective. There are two potential sources of bias inherent in this method: bias introduced by each individual observation contributing only to the learning or test set; and bias due to a small learning set, as both the features and the classifier selected depend solely on the learning set. Because the learning set is smaller than the full dataset, the test-set error for a model built on the training set will tend to over-estimate the unknown generalization error for a model built on the full dataset.
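In terms of the helper above, the split-sample estimate corresponds to a single realization of Sn; for instance, with p = 1/3 (a sketch, with our own naming):

    # Split sample (learning-test split): one random split, test proportion p.
    split_sample_risk <- function(x, y, fit_predict, p = 1/3) {
      n <- nrow(x)
      s <- integer(n)                          # 0 = learning set
      s[sample(n, size = round(n * p))] <- 1   # 1 = test set
      resampling_risk(x, y, matrix(s, nrow = 1), fit_predict)
    }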
2.1.2 v-fold cross-validation This method randomly assigns the n observations to one of v partitions such that the partitions are of near-equal size. Subsequently, the learning set contains all but one of the partitions, which is labeled the test set. The generalization error is assessed for each of the v test sets and then averaged over v. In this method, the distribution of Sn puts mass 1/v on the v binary vectors which assign each of the n observations to one of the v partitions. The proportion p is approximately equal to 1/v. Both p and the number of averages can adversely or positively affect this estimate of error. For example, a larger v (e.g. v = 10) results in a smaller proportion p in the test set; thus, a higher proportion in the learning set decreases the bias. In addition, the number of averages is equivalent to v and thus may additionally decrease the bias.

2.1.3 Leave-one-out cross-validation (LOOCV) This is the most extreme case of v-fold cross-validation. In this method each observation is individually assigned to the test set, i.e. v = n and p = 1/n (Lachenbruch and Mickey, 1968; Geisser, 1975; Stone, 1974, 1977). The distribution of Sn places mass 1/n on the n binary vectors which assign each of the n observations to the learning and test sets. LOOCV and the corresponding p = 1/n represent the best example of a bias-variance trade-off: it tends toward a small bias with elevated variance. In model selection, LOOCV has performed poorly compared with v-fold cross-validation (Breiman and Spector, 1992). Due to the computational burden, LOOCV has not been a favored method for large samples, and its behavior in estimating generalization error has not been thoroughly studied.
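Both v-fold CV and LOOCV fit the same split-vector template: the v rows of S indicate the folds, and LOOCV is the special case v = n. A minimal R sketch (our naming):

    # v-fold cross-validation: each fold serves once as the test set.
    vfold_cv_risk <- function(x, y, fit_predict, v = 10) {
      n    <- nrow(x)
      fold <- sample(rep(seq_len(v), length.out = n))   # near-equal partitions
      S    <- t(sapply(seq_len(v), function(k) as.integer(fold == k)))
      resampling_risk(x, y, S, fit_predict)
    }
    # LOOCV is vfold_cv_risk(x, y, fit_predict, v = nrow(x)), i.e. p = 1/n.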
2.1.4 Monte Carlo cross-validation (MCCV) MCCV randomly splits the sample into a learning and test set numerous times (e.g. 20, 50 or 1000 iterations). For each split, np = n(1/v) of the observations are labeled as the test set and n(1 − p) = n(1 − 1/v) as the learning set. For example, in MCCV with v = 10, each of 50 iterations allots 10% of the data to the test set and 90% to the learning set. The generalization error is assessed for each of the 50 test sets and subsequently averaged over the 50 iterations. The distribution of Sn puts mass $1/\binom{n}{np}$ on each of the binary vectors representing one split into a learning and test set. As the number of iterations increases, the computational burden of MCCV becomes quite large. However, unless the number of random splits approaches infinity, the chance that each observation is included in a learning set and a test set (over all iterations) is small, introducing a bias similar to that of the split-sample approach (i.e. when each observation is either in the learning set or the test set).
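MCCV replaces the v disjoint folds by B independent random splits with a fixed test proportion p; for example, B = 50 and p = 0.1 reproduce the 'v = 10' setting described above. A sketch (our naming):

    # Monte Carlo cross-validation: B independent splits, test proportion p.
    mccv_risk <- function(x, y, fit_predict, p = 0.1, B = 50) {
      n <- nrow(x)
      S <- t(replicate(B, {
        s <- integer(n)
        s[sample(n, size = round(n * p))] <- 1
        s
      }))
      resampling_risk(x, y, S, fit_predict)
    }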
2.1.5 .632+ bootstrap Several variations of the bootstrap have been introduced to estimate the generalization error. The leave-one-out bootstrap (θ̂nBS) is based on a random sample drawn with replacement from the n observations (Efron, 1983; Efron and Tibshirani, 1993). For each draw, the observations left out (≈.368n) serve as the test set. The learning set has ≈.632n unique observations, which leads to an overestimation of the prediction error (i.e. a decrease in the learning set size leads to an increase in the bias). To correct for this, two estimators have been suggested: the .632 bootstrap and the .632+ estimator. Both correct by adding in the underestimated resubstitution error θ̂nRS, forming the weighted combination ωθ̂nBS + (1 − ω)θ̂nRS. For the .632 bootstrap the weight ω is constant (ω = .632), whereas for the .632+ bootstrap ω is determined based on the 'no-information error rate' (Efron and Tibshirani, 1997). We focus on the latter, as it is the most used in the literature and the most robust across different algorithms (Efron and Tibshirani, 1997).
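The sketch below assembles the leave-one-out bootstrap and the .632+ combination for a binary outcome. It follows our reading of Efron and Tibshirani (1997): γ̂ is the no-information error rate and the weight is ω = .632/(1 − .368·R̂); the truncations that the original reference applies to θ̂nBS and R̂ are omitted for brevity, and the naming is ours.

    # Leave-one-out bootstrap plus the .632+ correction; assumes y is coded
    # 0/1 and that fit_predict returns labels on the same coding.
    boot632plus_risk <- function(x, y, fit_predict, B = 50) {
      n <- nrow(x)
      errs <- replicate(B, {
        learn <- sample(n, replace = TRUE)            # bootstrap learning set
        test  <- setdiff(seq_len(n), unique(learn))   # ~ .368n left-out obs
        yhat  <- fit_predict(x[learn, , drop = FALSE], y[learn],
                             x[test,  , drop = FALSE])
        mean(yhat != y[test])
      })
      theta_bs <- mean(errs)                          # leave-one-out bootstrap
      yhat_rs  <- fit_predict(x, y, x)
      theta_rs <- mean(yhat_rs != y)                  # resubstitution error
      p1 <- mean(y == 1)                              # observed class-1 rate
      q1 <- mean(yhat_rs == 1)                        # predicted class-1 rate
      gamma <- p1 * (1 - q1) + (1 - p1) * q1          # no-information error rate
      Rhat  <- (theta_bs - theta_rs) / (gamma - theta_rs)
      w     <- 0.632 / (1 - 0.368 * Rhat)             # w = .632 gives the .632 estimator
      w * theta_bs + (1 - w) * theta_rs
    }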
2.2 Algorithms

Predictions of outcomes based on the observed X can employ parametric or non-parametric algorithms. If the outcome is continuous, predictors can be built using regression models or recursive binary partitioning like CART. If the outcome is categorical, algorithms which partition the feature space 𝒳 into disjoint and exhaustive groups are used. In this manuscript, we limit our discussion to the classification of binary outcomes, i.e. Y = 0 or Y = 1, and thus evaluate methods for the estimation of prediction error in the context of the following classification algorithms.

We calculate the LDA with the lda function in the MASS library of the statistical package R (Venables and Ripley, 1994; Ihaka and Gentleman, 1996). We use the function dlda in the library supclust in R to implement DDA (Dettling and Maechler, 2005, http://lib.stat.cmu.edu/R/CRAN/src/contrib/Descriptions/supclust.html). The library supclust also houses the function nnr for NN. CART classification is obtained using the function rpart in the library rpart in R (Breiman et al., 1984; Therneau and Atkinson, 1997).
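Wrapped in the common fit_predict interface used above, the four classifiers look roughly as follows. The package and function names are the ones cited in this section; the argument conventions for dlda and nnr (here assumed to be (xlearn, ylearn, xtest), returning test-set class labels) may differ across supclust versions.

    library(MASS)       # lda
    library(supclust)   # dlda and nnr (Dettling and Maechler, 2005)
    library(rpart)      # classification trees (CART)

    fit_lda  <- function(xl, yl, xt) predict(lda(xl, grouping = factor(yl)), xt)$class
    fit_dlda <- function(xl, yl, xt) dlda(xl, yl, xt)   # assumed interface
    fit_nn   <- function(xl, yl, xt) nnr(xl, yl, xt)    # assumed interface
    fit_cart <- function(xl, yl, xt) {
      learn <- data.frame(y = factor(yl), xl)           # xl, xt need matching columns
      fit   <- rpart(y ~ ., data = learn, method = "class")
      predict(fit, newdata = data.frame(xt), type = "class")
    }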
3 ANALYSIS

The goal of this analysis is to ascertain differences between resampling methods in the estimation of generalization error (presently limited to the classification problem) in the presence of feature selection. We evaluate the influence of sample size, parametric versus non-parametric classification methods, and large feature spaces on each resampling method's ability to estimate the resampling conditional risk θ̂n(1−p) [Equation (4)] compared with that of the 'true' conditional risk θ̃n [Equation (2)]. As such, a range of sample sizes (n = 40, 80 and 120), classification algorithms (LDA, DDA, NN and CART), and datasets (simulated, microarray and proteomic; see Sections 3.1–3.3) are utilized. Prior to discussing results, the general strategy for estimating the risks is explained, followed by the specifics of each dataset.

Each dataset consists of N observations with N1 cases and N0 controls and l measured features. For r = 1, ..., R repetitions, a random sample of size n stratified by case/control status is selected from the N observations, such that the number of cases in the subsample (n/2) equals the number of controls. The stratification allows for equal representation of both cases and controls, such that classification algorithms relying on majority consensus are not biased toward either (Quackenbush, 2004). This random sample, or subsample, plays two roles. First, it serves as a sample from which the resampling conditional risk θ̂n(1−p) can be estimated. This is accomplished by splitting the subsample into a learning and test set corresponding to each of the resampling methods. For each r, an estimate of θ̂n(1−p) is obtained for each resampling method with all four algorithms. In reality, the distribution P of the observed data O is unknown, and thus so is the 'true' conditional risk. In order to estimate θ̃n in Equation (2) we use the complete observed data. As such, the subsample's second role is to serve as the learning set, with the remaining N − n observations as the test set, for an approximation of the conditional risk θ̃n.

Given the high-dimensional structure of each dataset (i.e. large l), feature selection is an important task administered before running any of the algorithms. Feature selection must occur based on the learning set within each resampling; otherwise, additional bias is introduced (Simon et al., 2003). This correct approach to feature selection within cross-validation has been referred to as honest or complete (Quackenbush, 2004). There are many methods available for feature selection; here t-tests are used. Initially, the components of X with the 10 largest absolute t-test statistics are considered. Subsequently, the largest 20 are discussed.

All simulations and analyses were implemented in R (Ihaka and Gentleman, 1996).
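The honest approach described above amounts to wrapping feature selection around any fit_predict rule, so that the t-tests see only the learning set of the current resample. A sketch (our naming):

    # Select the m features with largest absolute two-sample t-statistics,
    # computed on the learning set only, then fit on the reduced matrices.
    select_top_t <- function(xl, yl, m = 10) {
      tstat <- apply(xl, 2, function(g)
        t.test(g[yl == 1], g[yl == 0])$statistic)
      order(abs(tstat), decreasing = TRUE)[seq_len(m)]
    }

    with_selection <- function(fit_predict, m = 10) {
      function(xl, yl, xt) {
        keep <- select_top_t(xl, yl, m)       # selection inside the resample
        fit_predict(xl[, keep, drop = FALSE], yl, xt[, keep, drop = FALSE])
      }
    }

    # e.g. honest 10-fold CV with DDA on the top 10 genes:
    # vfold_cv_risk(x, y, with_selection(fit_dlda, m = 10), v = 10)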
3.1 Simulated data

The simulated datasets are generated as described in Bura and Pfeiffer (2003). Each dataset contains N = 300 observations with 750 covariates, representing patients and genes, respectively. Half of the observations (i.e. 150) are labeled controls (Y = 0) and half cases (Y = 1). Of the 750 genes, 8 are associated with disease and the others are non-predictive. The controls are simulated from a multivariate normal distribution with a mean of 0 and covariance matrix Σ. The cases have 99% non-differentially expressed genes, which are generated from the same N(0, Σ) as the controls. The 1% of the genes that are differentially expressed are generated from a mixture of two multivariate normals with means µ1 and µ2 and covariance structure Σ; the mixing probability is 0.5. The covariance matrix Σ = (σij) has a block structure with σij = 0.2 for |j − i| ≤ 5 and zero otherwise. Estimates of θ̂n(1−p) and θ̃n are based on learning samples of size 40, 80 and 120 and test sets of size 260, 220 and 180, respectively.
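A sketch of this generating mechanism in R. The section fixes Σ and the mixing probability but not µ1, µ2 or the diagonal of Σ, so the unit variances and the effect sizes below are placeholders rather than values from the paper.

    library(MASS)   # mvrnorm
    l <- 750; n_ctrl <- 150; n_case <- 150; de <- 1:8   # 8 disease genes
    Sigma <- outer(1:l, 1:l, function(i, j) ifelse(abs(i - j) <= 5, 0.2, 0))
    diag(Sigma) <- 1                         # assumed unit variances
    x_ctrl <- mvrnorm(n_ctrl, mu = rep(0, l), Sigma = Sigma)
    x_case <- mvrnorm(n_case, mu = rep(0, l), Sigma = Sigma)
    mu1 <- rep( 1, length(de))               # placeholder mean vectors
    mu2 <- rep(-1, length(de))
    from1 <- runif(n_case) < 0.5             # mixing probability 0.5
    x_case[, de] <- x_case[, de] +           # shift only the 8 disease genes
      t(vapply(from1, function(z) if (z) mu1 else mu2, numeric(length(de))))
    x <- rbind(x_ctrl, x_case)
    y <- rep(c(0, 1), c(n_ctrl, n_case))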
3.2 Lymphoma and lung datasets

The microarray datasets are both publicly available. The first focuses on diffuse large-B-cell lymphoma (Rosenwald et al., 2002). In this study there are 7399 genes on the microarray and 240 patients. For the purposes of this analysis, the outcome variable represents the lymphoma subtype: activated B-cell for Y = 0 and germinal-center B-cell for Y = 1. This is an example of a moderate signal-to-noise ratio dataset, as the subgroups do not separate perfectly based on the microarray observations (Wright et al., 2003). Estimates of θ̂n(1−p) and θ̃n are based on learning samples of size 40, 80 and 120 and test sets of size 200, 160 and 120, respectively. The second study uses oligonucleotide microarrays to measure 12 601 transcript sequences for 186 lung tumor samples (Bhattacharjee et al., 2001). For our analysis, the outcome represents the 139 adenocarcinomas as Y = 0 and the remaining 47 tumors as Y = 1.

3.3 Proteomic ovarian dataset

The proteomic dataset consists of 164 SELDI-TOF measurements from NCI/Mayo Clinic serum samples. These data are part of a study designed to validate previously identified proteomic markers for ovarian cancer. The readings are from fraction 4, IMAC30 ProteinChip arrays, read at high and low energy settings in a PCS4000 ProteinChip Reader (Ciphergen Biosystems, Inc., Fremont, CA). The spectra were externally calibrated for mass, internally normalized for intensity using total ion current, and baseline subtracted. Peaks were manually selected and the intensity recorded.

Of the n = 164 observations, 45 are ovarian cancer cases and 119 are controls. Estimates of θ̂n(1−p) and θ̃n are based on learning samples of size 40 and 80 and test sets of size 144 and 104, respectively. Given the nature of proteomic data as well as the naive algorithms implemented, this will serve as a low signal-to-noise example.

3.4 Results

To compare the resampling methods in Section 2.1, conditional risk estimates for each method are calculated and compared with each other and with the truth (i.e. the conditional risk). This evaluation is based on the mean squared error (MSE) and bias, calculated as follows:

  \mathrm{MSE} = \frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}_{n,r} - \tilde{\theta}_{n,r})^2,

  \mathrm{Bias} = \frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}_{n,r} - \tilde{\theta}_{n,r}),

where θ̂n,r is the resampling conditional risk and θ̃n,r is the conditional risk for the r-th repetition. In all results the total number of repetitions is set at 100, i.e. R = 100.
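Given, for one resampling method and algorithm, the vector of R = 100 resampling estimates theta_hat and the vector of approximated conditional risks theta_tilde (names ours), the two criteria are:

    bias <- mean(theta_hat - theta_tilde)        # average signed deviation
    mse  <- mean((theta_hat - theta_tilde)^2)    # average squared deviation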
There were several attempts to examine the effect of varying p on those resampling methods which allow user-defined test-set proportions (i.e. v-fold cross-validation, MCCV and split sample). For v-fold cross-validation, 2-, 5- and 10-fold were explored. In MCCV, both p and the number of MCCV repetitions affect the estimation; thus, test-set proportions of p = 0.5, p = 0.2 and p = 0.1, as well as repetitions of 20, 50 and 1000, were run. In split-sample estimation, test-set proportions of both p = 1/3 and p = 1/2 were examined to assess the bias/variance trade-off.

Due to space limitations, all results are discussed but only a limited number of tables can be displayed. The interested reader is referred to Molinaro et al. (2005) (http://linus.nci.nih.gov/brb/TechReport.htm) for a comprehensive compilation of results. The MCCV results are not included below, as the only noticeable improvement over v-fold CV is a slight decrease in variance. Additionally, the advantage of increasing the MCCV iterations from 20 to 50 to 1000 is minimal.

Simulation study results. For n = 40, LOOCV and 10-fold CV have the smallest MSE and bias, followed by 5-fold CV and then .632+ (Table 1). The largest MSE and bias occur with 2-fold CV and split sample with p = 1/2. For n = 80 and n = 120, the differences among these methods diminish. For n = 40 and n = 80, .632+ has the smallest SD, followed by 10-fold CV, LOOCV and 5-fold CV. The only exception is for LDA and NN at n = 80, when LOOCV and 10-fold CV have the smallest. At n = 120, the differences among these methods diminish.

Table 1. Prediction error estimates

  Estimator    p       Algorithm   Estimate   SD      Bias    MSE
  θ̃n           0.87    LDA         0.078      0.093
                       DDA         0.160      0.086
                       NN          0.042      0.084
                       CART        0.121      0.133
  v-fold CV    0.5     LDA         0.357      0.126   0.279   0.097
                       DDA         0.342      0.106   0.182   0.052
                       NN          0.277      0.135   0.235   0.077
                       CART        0.430      0.121   0.309   0.134
               0.2     LDA         0.161      0.127   0.083   0.017
                       DDA         0.208      0.086   0.048   0.012
                       NN          0.108      0.102   0.066   0.011
                       CART        0.284      0.117   0.163   0.055
               0.1     LDA         0.118      0.120   0.040   0.008
                       DDA         0.177      0.087   0.017   0.007
                       NN          0.078      0.102   0.036   0.005
                       CART        0.189      0.104   0.068   0.024
  LOOCV        0.025   LDA         0.092      0.115   0.014   0.008
                       DDA         0.164      0.096   0.004   0.007
                       NN          0.058      0.103   0.016   0.005
                       CART        0.146      0.125   0.025   0.018
  Split        0.333   LDA         0.205      0.184   0.127   0.053
                       DDA         0.243      0.138   0.083   0.034
                       NN          0.145      0.169   0.103   0.044
                       CART        0.371      0.174   0.25    0.121
               0.5     LDA         0.348      0.185   0.270   0.113
                       DDA         0.344      0.139   0.184   0.062
                       NN          0.265      0.177   0.223   0.086
                       CART        0.438      0.155   0.317   0.147
  .632+ (50)   ≈.368   LDA         0.274      0.084   0.196   0.047
                       DDA         0.286      0.074   0.126   0.028
                       NN          0.200      0.070   0.158   0.032
                       CART        0.387      0.080   0.266   0.100

The estimate θ̂n (column 4) and SD (column 5) are based on a learning sample of size 40; the estimate θ̃n (rows 1–4) and its SD are based on the remaining 260 observations. Bias (column 6) and MSE (column 7) are reported for each resampling technique (column 1) and algorithm (column 3). The ten features with the largest t-statistics were used in the algorithms. Minimums in bold.

Lymphoma and lung study results. In the lymphoma study, for n = 40, 80 and 120, .632+, LOOCV, 5- and 10-fold CV have the smallest MSE and bias. The two split samples and 2-fold CV have the largest MSE and bias. Similar to the simulation study, .632+ has the smallest SD across the algorithms and sample sizes, while both split samples do by far the worst. Partial results are shown in Table 2. The results from the lung study are very similar and are thoroughly discussed in Molinaro et al. (2005) (http://linus.nci.nih.gov/brb/TechReport.htm).

Ovarian study results. For n = 40 to n = 80, LOOCV and .632+ have the smallest MSE, followed by 5- and 10-fold CV. As for bias,
Table 2. Comparison of the resampling methods' MSE, bias and SD

              n = 40                   n = 80                   n = 120
              SD     Bias    MSE       SD     Bias    MSE       SD     Bias    MSE
  2-fold CV   0.085  0.038   0.01      0.043  0.002   0.004     0.031  0.0     0.003
  5-fold CV   0.07   0.004   0.007     0.045  −0.008  0.005     0.032  −0.006  0.003
  10-fold CV  0.063  −0.007  0.006     0.036  −0.009  0.003     0.031  −0.006  0.003
  LOOCV       0.072  −0.019  0.008     0.04   −0.013  0.004     0.033  −0.004  0.003
  Split 1/3   0.119  0.001   0.017     0.071  0.0     0.007     0.059  −0.004  0.005
  Split 1/2   0.117  0.037   0.018     0.058  0.001   0.005     0.046  −0.001  0.004
  .632+       0.049  −0.006  0.004     0.025  −0.02   0.003     0.018  −0.015  0.002

Results shown are for the DDA algorithm using the top 10 genes as ranked by t-tests.

Table 3. Ovarian study results

Table 4. Prediction error estimates without feature selection
Table 5. Prediction error estimates: leave-one-out bootstrap versus 3-fold MCCV

                    Leave-one-out bootstrap           3-fold MCCV
                    Estimate  SD     Bias   MSE       Estimate  SD     Bias   MSE
  n = 40    LDA     0.331     0.075  0.252  0.072     0.242     0.101  0.164  0.035
            DDA     0.337     0.075  0.177  0.044     0.270     0.072  0.110  0.022
            NN      0.259     0.072  0.217  0.055     0.167     0.083  0.125  0.022
            CART    0.414     0.065  0.296  0.114     0.377     0.085  0.256  0.094
  n = 80    LDA     0.07      0.063  0.043  0.004     0.044     0.053  0.017  0.002
            DDA     0.146     0.058  0.074  0.008     0.104     0.058  0.033  0.003
            NN      0.046     0.056  0.036  0.003     0.022     0.043  0.012  0.001
            CART    0.098     0.047  0.057  0.006     0.062     0.039  0.020  0.002
  n = 120   LDA     0.032     0.033  0.011  0.001     0.026     0.026  0.005  0
            DDA     0.088     0.045  0.036  0.002     0.068     0.043  0.016  0.001
            NN      0.016     0.030  0.007  0         0.012     0.023  0.003  0
            CART    0.048     0.025  0.022  0.001     0.038     0.022  0.012  0.001

The leave-one-out bootstrap and 3-fold MCCV estimate, SD, bias and MSE, over three sample sizes and four algorithms. Feature selection was used to select the top 10 features as ranked by t-tests.
Resampling with and without replacement. To understand the ramification of resampling with replacement as it pertains to the bootstrap estimates, we compared the leave-one-out bootstrap estimate (Section 2.1.5) with 3-fold MCCV. The 3-fold MCCV randomly selects .666n of the observations for the learning set and .333n for the test set; this is repeated numerous times and the estimates are averaged. Therefore, 3-fold MCCV is equivalent to the leave-one-out bootstrap, except that it employs resampling without replacement. Table 5 displays the simulation study results for the two estimates using 50 iterations for both. Interestingly, the bias and MSE for the leave-one-out bootstrap are roughly double those of 3-fold MCCV. The only two distinct differences between the two methods are the replicate copies in the learning set, inherent in the bootstrap estimate, and the fact that on average .632n unique observations are in the learning sample for the leave-one-out bootstrap, whereas there are always .666n in the learning sample for 3-fold MCCV. Both these factors may contribute to the increase in bias and MSE.
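The ≈.632n figure is easy to verify: the probability that observation i appears at least once in a bootstrap learning set is 1 − (1 − 1/n)^n → 1 − e⁻¹ ≈ .632. A two-line check in R:

    n <- 40   # smallest learning sample size used in Table 5
    mean(replicate(1e4, length(unique(sample(n, replace = TRUE))) / n))
    # ~ 0.637 for n = 40 (exact: 1 - (1 - 1/n)^n); the limit is 1 - exp(-1) = 0.632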
4 DISCUSSION

Estimation of prediction error when confronted with a multitude of covariates and small sample sizes is a relatively new problem. Feature selection, sample size and signal-to-noise ratio are important influences on the relative performance of resampling methods. We have evaluated resampling methods for use in high-dimensional classification problems using a range of sample sizes, algorithms and signals. Some general conclusions may be summarized as follows:

(1) With small sample sizes, the split-sample method and 2-fold CV perform very poorly. This poor performance is primarily due to a large positive bias resulting from the use of a reduced training set size, which severely impairs the ability to effectively select features and fit a model. The large bias contributes to a large MSE.

(2) LOOCV generally performs very well with regard to MSE and bias. The only exception is when an unstable classifier (e.g. CART) is used in the presence of a weak signal. In this setting, the larger MSE is attributed to LOOCV's increased variance.

(3) 10-fold CV prediction error estimates approximate those of LOOCV in almost all settings. For computationally burdensome analyses, 10-fold CV may be preferable to LOOCV. Additionally, in the simulated data, repeated resamplings (the average of 10 repeats) reduce the MSE, bias and variance of 10-fold CV.

(4) The .632+ prediction error estimate performs best with moderate to weak signal-to-noise ratios. Previous studies have found the bootstrap variants superior to LOOCV and v-fold CV; however, these studies did not include feature selection. As seen in Table 1, honest resampling in small samples with strong signal suggests that LOOCV and 10-fold CV are in fact better than the .632+ bootstrap. This discrepancy fades when feature selection is discarded (Table 4) and when the signal decreases, as seen in the lymphoma and ovarian datasets (Tables 2 and 3). Additional glimpses into the bootstrap estimate (Table 5) indicate that sampling with replacement increases the MSE and bias substantially over 3-fold MCCV (i.e. resampling without replacement).

(5) MCCV does not decrease the MSE or bias enough to warrant its use over v-fold CV.

(6) As the sample size grows, the differences among the resampling methods decrease. Additionally, as the signal decreases from strong in the simulated data to rather weak in the ovarian data, the discrepancies between the methods diminish.

In future work we will compare the resampling methods for continuous outcomes and continue to explore the behavior of the bootstrap estimates. Also, the effect of the feature selection method may play an important role in prediction and needs further investigation.

ACKNOWLEDGEMENTS

A.M.M. was supported by the Cancer Prevention Fellowship Program, DCP/NCI/NIH. The authors thank Mark J. van der Laan for fruitful discussions.

Conflict of Interest: none declared.
REFERENCES

Bhattacharjee,A. et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci. USA, 98, 13790–13795.
Braga-Neto,U.M. and Dougherty,E.R. (2004) Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20, 374–380.
Breiman,L. and Spector,P. (1992) Submodel selection and evaluation in regression. The X-random case. Int. Stat. Rev., 60, 291–319.
Breiman,L., Friedman,J.H., Olshen,R.A. and Stone,C.J. (1984) Classification and Regression Trees. Wadsworth and Brooks/Cole, Monterey, CA.
Bura,E. and Pfeiffer,R.M. (2003) Graphical methods for class prediction using dimension reduction techniques on DNA microarray data. Bioinformatics, 19, 1252–1258.
Burman,P. (1989) A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika, 76, 503–514.
Davison,A.C. and Hinkley,D.V. (1997) Bootstrap Methods and Their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, UK.
Dettling,M. and Maechler,M. (2005) Software R contributed package supclust: supervised clustering of genes, version 1.0-5 (http://cran.r-project.org).
Dudoit,S. and van der Laan,M.J. (2003) Asymptotics of cross-validated risk estimation in model selection and performance assessment. Technical Report 126, U.C. Berkeley Division of Biostatistics Working Paper Series (http://www.bepress.com/ucbbiostat/paper126).
Efron,B. (1983) Estimating the error rate of a prediction rule: improvement on cross-validation. J. Am. Stat. Assoc., 78, 316–331.
Efron,B. (2004) The estimation of prediction error: covariance penalties and cross-validation. J. Am. Stat. Assoc., 99, 619–642.
Efron,B. and Tibshirani,R.J. (1993) An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, Vol. 57. Chapman and Hall, New York.
Efron,B. and Tibshirani,R.J. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Assoc., 92, 548–560.
Geisser,S. (1975) The predictive sample reuse method with applications. J. Am. Stat. Assoc., 70, 320–328.
Hastie,T., Tibshirani,R. and Friedman,J. (2003) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer-Verlag, New York.
Ihaka,R. and Gentleman,R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Stat., 5, 299–314.
Lachenbruch,P.A. and Mickey,M.R. (1968) Estimation of error rates in discriminant analysis. Technometrics, 10, 1–11.
McLachlan,G.J. (1992) Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons, New York.
Molinaro,A.M. et al. (2004) Tree-based multivariate regression and density estimation with right-censored data. J. Multivar. Anal., 90, 154–177.
Molinaro,A.M., Simon,R. and Pfeiffer,R.M. (2005) Prediction error estimation: a comparison of resampling methods. Technical Report 30, Biometrics Research Branch, Division of Cancer Treatment and Diagnosis, NCI.
Quackenbush,J. (2004) Meeting the challenges of functional genomics: from the laboratory to the clinic. Preclinica, 2, 313–316.
Ransohoff,D.F. (2004) Rules of evidence for cancer molecular marker discovery and validation. Nat. Rev. Cancer, 4, 309–313.
Ripley,B.D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press, New York.
Rosenwald,A. et al. (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. New Engl. J. Med., 346, 1937–1946.
Simon,R. et al. (2003) Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J. Natl Cancer Inst., 95, 14–18.
Stone,M. (1974) Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Series B, 36, 111–147.
Stone,M. (1977) Asymptotics for and against cross-validation. Biometrika, 64, 29–35.
Therneau,T. and Atkinson,E. (1997) An introduction to recursive partitioning using the RPART routine. Technical Report 61, Section of Biostatistics, Mayo Clinic, Rochester.
Venables,W.N. and Ripley,B.D. (1994) Modern Applied Statistics with S-PLUS. Springer, New York.
Wright,G. et al. (2003) A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc. Natl Acad. Sci. USA, 100, 9991–9996.