Ratio Estimation with Measurement Error in the
Auxiliary Variate
Article in Biometrics · September 2009
DOI: 10.1111/j.1541-0420.2008.01110.x · Source: PubMed
DOI: 10.1111/j.1541-0420.2008.01110.x
Biometrics 65, 590–598
June 2009
Ratio Estimation with Measurement Error in the Auxiliary Variate
Timothy G. Gregoire^{1,*} and Christian Salas^{1,2,**}
^1 School of Forestry and Environmental Studies, Yale University, New Haven, Connecticut 06511-2104, U.S.A.
^2 Departamento de Ciencias Forestales, Universidad de La Frontera, Temuco, Chile
* email: timothy.gregoire@yale.edu
** email: christian.salas@yale.edu
Summary. With auxiliary information that is well correlated with the primary variable of interest, ratio estimation of
the finite population total may be much more efficient than alternative estimators that do not make use of the auxiliary
variate. The well-known properties of ratio estimators are perturbed when the auxiliary variate is measured with error. In
this contribution we examine the effect of measurement error in the auxiliary variate on the design-based statistical properties
of three common ratio estimators. We examine the case of systematic measurement error as well as measurement error that
varies according to a fixed distribution. Aside from presenting expressions for the bias and variance of these estimators when
they are contaminated with measurement error we provide numerical results based on a specific population. Under systematic
measurement error, the biasing effect is asymmetric around zero, and precision may be improved or degraded depending on
the magnitude of the error. Under variable measurement error, bias of the conventional ratio-of-means estimator increased
slightly with increasing error dispersion, but far less than the increased bias of the conventional mean-of-ratios estimator. In
similar fashion, the variance of the mean-of-ratios estimator incurs a greater loss of precision with increasing error dispersion
compared with the other estimators we examine. Overall, the ratio-of-means estimator appears to be remarkably resistant to
the effects of measurement error in the auxiliary variate.
Key words:
Beta errors; Design-based inference; Gaussian errors; Rounding errors; Sampling; Surveys; Uniform errors.
1. Introduction
In the field of survey sampling, auxiliary information is used
variously to guide the inclusion of population units into the
sample or to assist in the estimation of population descriptive parameters, or both. Classic, recent, and contemporary
texts on sampling—for example, Cochran (1977), Särndal,
Swensson, and Wretman (1992), and Gregoire and Valentine
(2008)—all devote substantial portions of their expositions
to the use of auxiliary information to increase the precision
with which quantitative population characteristics may be estimated. To be worthwhile for these purposes, the auxiliary
variate, say x, must be well correlated, usually positively correlated, with the attribute of principal interest, say y, and
it must be comparatively inexpensive to obtain, or else the
investment in time and energy to obtain information on the
auxiliary variate could be better spent obtaining a larger sample of the variable of interest alone.
Oftentimes the secondary importance of the auxiliary variate, x, may result in a more sizeable error in its measurement compared to that of y. For example, x may result from
remotely sensed, satellite data that have been processed by
a digital classifier, and hence subject to classification error.
Or, x may be an ocular or other type of subjective assessment of a size characteristic that is associated with y. Airborne or space Light Detection and Ranging (LiDAR) measurements of forest canopy height may change due to degradation in laser system performance over time, due to location
error between airborne laser profiles, or due to technological
improvements that necessitate the use of two different laser
systems for data collection flown years apart. Yet another
source of error when x is measured in the field or forest is
nonnegligible rounding error. These few examples are hardly
exhaustive, and they are provided merely to help motivate
the problem we address: because of the nature of auxiliary
information and how it is used in the sample design and
estimation, it is apt to include nonnegligible measurement
error.
The purpose of this article is to explore the effect of measurement error (ME) in x on the design-based statistical properties of commonly employed ratio estimators of τ y , the total amount of the variate of interest, y, in the population.
We know of no prior investigation of this issue, notwithstanding considerable recent attention devoted to measurement error in modeling (e.g., Fuller, 1987; Carroll, 1998) and
its effect on model-based inference. On one level it is evident that although the information in x about y is degraded
when measurement error is included, the well-known properties of ratio estimators persist. However, on a different level we
think that there is value in scrutinizing more specific effects
of measurement error: for example, whether it affects bias
more egregiously than variance, whether the relative impact
changes with sample size, and whether the effects are symmetric around zero. The results of this article shed light on such
effects.
© 2008, The International Biometric Society
2. Ratio Estimation of Population
Descriptive Parameters
2.1 Population Descriptive Parameters
Similar to Breidt et al. (2007), we denote the ordered labels
for a finite population of discrete units by $\mathcal{P} = \{1, 2, \ldots, N\}$,
indexed by $k$ in the following. Corresponding to each unit is a
nonnegative value of some measurable characteristic, $y_k$, $\forall k \in \mathcal{P}$;
likewise, there is a quantitative, auxiliary characteristic,
$x_k$, also nonnegative. Our interest is focused on the estimation
of the aggregate value of $y$ in $\mathcal{P}$, namely, $\tau_y = \sum_{\mathcal{P}} y_k$, on the
basis of a sample of $n$ units selected without replacement by
simple random sampling (SRSwoR). We denote the mean $y$-value per element in $\mathcal{P}$ by $\mu_y = \tau_y/N$.
2.2 Ratio Estimation in the Absence of Measurement
Error in x
We first summarize the design-based bias and variance of
three ratio estimators of interest when the auxiliary variate
is free of measurement error. In the succeeding section, we
examine how these properties are altered by measurement
error.
2.2.1 Ratio-of-means estimator. Letting $\bar{y}$ and $\bar{x}$ denote
sample mean values, the aptly named “ratio-of-means”
estimator is
$$\hat\tau_{y1} = \hat{R}\,\tau_x, \tag{1}$$
where $\tau_x = \sum_{\mathcal{P}} x_k$. In (1), $\hat{R} = \bar{y}/\bar{x}$ is an estimator of
$R = \tau_y/\tau_x = \mu_y/\mu_x$.
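As a minimal sketch of equation (1), the following Python function computes the ratio-of-means estimate from sample values and a known auxiliary total. The function name and the sample numbers are illustrative assumptions and do not come from the study.

```python
def ratio_of_means_total(y_sample, x_sample, tau_x):
    """Ratio-of-means estimate of the population total of y, equation (1)."""
    if len(y_sample) != len(x_sample) or not y_sample:
        raise ValueError("samples must be non-empty and of equal length")
    y_bar = sum(y_sample) / len(y_sample)
    x_bar = sum(x_sample) / len(x_sample)
    r_hat = y_bar / x_bar  # R-hat, the ratio of the two sample means
    return r_hat * tau_x


# Hypothetical sample of three units and a hypothetical known total of x.
y_s = [30.0, 60.0, 90.0]
x_s = [40.0, 85.0, 115.0]
estimate = ratio_of_means_total(y_s, x_s, tau_x=1000.0)
```

Here the sample means are 60 and 80, so the estimated ratio is 0.75 and the estimated total is 0.75 times the assumed auxiliary total.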
A Taylor-series expansion of $\hat\tau_{y1}$ truncated after two terms
leads to an expression of its design bias as an estimator of $\tau_y$
as
$$B[\hat\tau_{y1} : \tau_y] \approx \left(\frac{1}{n} - \frac{1}{N}\right)\left(\gamma_x^2 - \rho\,\gamma_x\gamma_y\right)\tau_y, \tag{2}$$
where $\rho$ is the linear correlation coefficient and $\gamma_x$ and $\gamma_y$
are the coefficients of variation of $x$ and $y$, respectively, in
the population. This approximation to the bias of $\hat\tau_{y1}$ may
be deduced, inter alia, from Cochran (1977, Section 6.8, Eq.
6.34).
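The approximation in equation (2) can be evaluated directly when the whole population is known, as in an empirical study. The sketch below is one possible implementation; the function name is hypothetical, and the use of divisor N for the population variances and covariance is an assumption (any consistent divisor gives the same coefficients of variation ratio and correlation up to a common factor).

```python
import math


def approx_bias_ratio_of_means(x, y, n):
    """Two-term Taylor approximation to the design bias, equation (2).

    x and y hold the values for every unit in the population; n is the
    SRSwoR sample size. Population moments use divisor N (an assumption).
    """
    N = len(x)
    mu_x = sum(x) / N
    mu_y = sum(y) / N
    var_x = sum((xk - mu_x) ** 2 for xk in x) / N
    var_y = sum((yk - mu_y) ** 2 for yk in y) / N
    cov_xy = sum((xk - mu_x) * (yk - mu_y) for xk, yk in zip(x, y)) / N
    gamma_x = math.sqrt(var_x) / mu_x  # coefficient of variation of x
    gamma_y = math.sqrt(var_y) / mu_y  # coefficient of variation of y
    rho = cov_xy / math.sqrt(var_x * var_y)
    tau_y = N * mu_y
    return (1 / n - 1 / N) * (gamma_x ** 2 - rho * gamma_x * gamma_y) * tau_y


# Hypothetical five-unit population in which y is exactly proportional to x.
x_pop = [2.0, 4.0, 6.0, 8.0, 10.0]
y_pop = [0.7 * xk for xk in x_pop]
bias_proportional = approx_bias_ratio_of_means(x_pop, y_pop, n=2)
```

When y is exactly proportional to x, the two coefficients of variation coincide and the correlation is one, so the approximate bias vanishes; it also vanishes for a census (n = N).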
The usual approximation of the variance of $\hat\tau_{y1}$ following
SRSwoR is (cf. Gregoire and Valentine, 2008, Section 6.3, Eq.
6.16)
$$V[\hat\tau_{y1}] = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{rm}^2, \tag{3}$$
where $\sigma_{rm}^2 = \frac{1}{N-1}\sum_{\mathcal{P}} (y_k - R x_k)^2$.
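Equation (3) is simple to compute from a fully enumerated population. The following sketch assumes the population values are held in two Python lists; the function name and the three-unit population are hypothetical.

```python
def approx_var_ratio_of_means(x, y, n):
    """Approximate design variance of the ratio-of-means estimator, equation (3)."""
    N = len(x)
    R = sum(y) / sum(x)  # population ratio tau_y / tau_x
    sigma2_rm = sum((yk - R * xk) ** 2 for xk, yk in zip(x, y)) / (N - 1)
    return N ** 2 * (1 / n - 1 / N) * sigma2_rm


# Hypothetical three-unit population, sampled with n = 2.
variance = approx_var_ratio_of_means([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], n=2)
```

For this toy population R = 7/6, the squared residuals sum to 14/36, and the approximation works out to 7/24. The residual variance is zero whenever y is exactly proportional to x, which is why a strongly proportional auxiliary variate makes the estimator precise.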
2.2.2 Mean-of-ratios estimator. In addition to $\hat\tau_{y1}$ we consider the “mean-of-ratios” estimator,
$$\hat\tau_{y2} = \bar{r}\,\tau_x, \tag{4}$$
where $\bar{r}$ is the average ratio $r_k = y_k/x_k$ of the $n$ units in
the sample. Its expected value under the SRSwoR design is
$E[\hat\tau_{y2}] = \mu_r\tau_x$, where $\mu_r$ is the population average ratio, i.e.,
$\mu_r = \sum_{\mathcal{P}} r_k/N$.
The design bias of $\hat\tau_{y2}$ as an estimator of $\tau_y$ may be deduced
straightforwardly as
$$B[\hat\tau_{y2} : \tau_y] = \sum_{\mathcal{P}} y_k\left(\frac{\mu_x}{x_k} - 1\right).$$
Evidently its bias is impervious to the size of the sample
actually selected. Indeed, even when $n = N$, $\hat\tau_{y2} \neq \tau_y$, and according to Sukhatme and Sukhatme (1970, p. 160), this inconsistency has limited its use. The variance of $\hat\tau_{y2}$ is
$$V[\hat\tau_{y2}] = \tau_x^2\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{mr}^2,$$
where $\sigma_{mr}^2 = \frac{1}{N-1}\sum_{\mathcal{P}} (r_k - \mu_r)^2$, as shown in equation (7) of
Goodman and Hartley (1958).
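The claim that the bias of the mean-of-ratios estimator does not depend on n can be checked exactly on a small population by enumerating every possible SRSwoR sample. The sketch below does this with the standard library; the five-unit population and all function names are hypothetical.

```python
from itertools import combinations


def mean_of_ratios_total(y_sample, x_sample, tau_x):
    """Mean-of-ratios estimate of the total of y, equation (4)."""
    ratios = [yk / xk for yk, xk in zip(y_sample, x_sample)]
    return sum(ratios) / len(ratios) * tau_x


def exact_design_bias(x, y, n):
    """Average the estimator over every SRSwoR sample of size n, minus tau_y."""
    tau_x, tau_y = sum(x), sum(y)
    estimates = [
        mean_of_ratios_total([y[k] for k in s], [x[k] for k in s], tau_x)
        for s in combinations(range(len(x)), n)
    ]
    return sum(estimates) / len(estimates) - tau_y


def closed_form_bias(x, y):
    """Sum over the population of y_k * (mu_x / x_k - 1)."""
    mu_x = sum(x) / len(x)
    return sum(yk * (mu_x / xk - 1) for xk, yk in zip(x, y))


# Hypothetical five-unit population, for illustration only.
x_pop = [2.0, 3.0, 5.0, 7.0, 11.0]
y_pop = [1.0, 2.5, 3.0, 6.0, 8.0]
bias_by_n = {n: exact_design_bias(x_pop, y_pop, n) for n in (1, 2, 3, 5)}
```

Every entry of `bias_by_n`, including the census case n = N = 5, agrees with `closed_form_bias(x_pop, y_pop)` up to floating-point error, which illustrates the inconsistency noted by Sukhatme and Sukhatme.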
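2.2.3 Unbiased ratio-type estimator. Hartley and Ross
(1954) introduced the following design-unbiased estimator of
$\tau_y$:
$$\hat\tau_{y3} = \hat\tau_{y2} + \left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}\,\hat\tau_{x\pi}\right),$$
in which $\hat\tau_{y\pi}$ and $\hat\tau_{x\pi}$ are the Horvitz–Thompson estimators
of $\tau_y$ and $\tau_x$, respectively. Unbiasedness of $\hat\tau_{y3}$ is obtained
because $E\left[\left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}\,\hat\tau_{x\pi}\right)\right] = \tau_y - \mu_r\tau_x$.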
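Design unbiasedness of the Hartley–Ross estimator can likewise be confirmed by brute-force enumeration on a toy population. In this sketch the Horvitz–Thompson estimators under SRSwoR are taken to be N times the sample means; the population values and function names are hypothetical.

```python
from itertools import combinations


def hartley_ross_total(y_sample, x_sample, tau_x, N):
    """Hartley-Ross estimate of the total of y under SRSwoR (requires n >= 2)."""
    n = len(y_sample)
    r_bar = sum(yk / xk for yk, xk in zip(y_sample, x_sample)) / n
    tau_y_pi = N * sum(y_sample) / n  # Horvitz-Thompson estimate of tau_y
    tau_x_pi = N * sum(x_sample) / n  # Horvitz-Thompson estimate of tau_x
    correction = ((N - 1) / N) * (n / (n - 1)) * (tau_y_pi - r_bar * tau_x_pi)
    return r_bar * tau_x + correction


def design_expectation(x, y, n):
    """Average of the estimator over every SRSwoR sample of size n."""
    N, tau_x = len(x), sum(x)
    estimates = [
        hartley_ross_total([y[k] for k in s], [x[k] for k in s], tau_x, N)
        for s in combinations(range(N), n)
    ]
    return sum(estimates) / len(estimates)


# Hypothetical five-unit population, for illustration only.
x_pop = [2.0, 3.0, 5.0, 7.0, 11.0]
y_pop = [1.0, 2.5, 3.0, 6.0, 8.0]
expectations = {n: design_expectation(x_pop, y_pop, n) for n in (2, 3, 4)}
```

Each expectation equals the true total `sum(y_pop)` to within rounding, for every sample size tried.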
Goodman and Hartley (1958, Eq. 18) derived the variance
of $\hat\tau_{y3}$. In our notation and after adjusting to a finite population context,
$$V[\hat\tau_{y3}] = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\left[\sigma_y^2 + \mu_r^2\sigma_x^2 - 2\mu_r C(x, y) + \frac{1}{n-1}\left(\sigma_r^2\sigma_x^2 + C(r, x)^2\right)\right],$$
where $C(x, y)$ is the population covariance: $C(x, y) = \sum_{\mathcal{P}} (x_k - \mu_x)(y_k - \mu_y)/N$, and $C(r, x)$ is analogously defined.
2.3 Ratio Estimation in the Presence of Additive
Measurement Error in x
We assume that $x_k$ cannot be measured without error; consequently, in its stead we measure
$$x_k^* = x_k + \delta_k,$$
where δ k is the measurement error. In keeping with precepts
of design-based inference, we assume further that δ k is fixed
in the sense that repeated measurements of the kth unit of
P would result in the same value x∗k . Fixed error is reasonable, for example, in the case where a measure of length is
rounded to the nearest centimeter: We presume that repeated
measurements of the same length would result in an identical measurement, the error of which would be unknown but
have constant magnitude among the repeated measurements.
In a remote sensing context LiDAR readings of height will
contain error, the magnitude of which will vary among pixels, yet for a given scene it will be fixed for each pixel in the
scene. As a further example, measurement error of a fixed
magnitude may result from faulty instrumentation, thereby
leading to the same magnitude of measurement error among
all elements of P. Cochran (1977, Section 13.9) terms the latter “constant bias over all units,” yet he discusses the case
where such biased measurement only affects y, not x.
As explained in the next section, we shall consider both the
case where the magnitude of δ k may vary among units, and
when it is constant for all units.
In the sequel, the population and sample mean measurement errors are denoted as $\mu_\delta$ and $\bar\delta$, respectively. In an
obvious extension to the notation introduced above, let the
error-contaminated total, mean, and coefficient of variation of
$x^*$ in $\mathcal{P}$ be denoted by $\tau_x^*$, $\mu_x^*$, and $\gamma_x^*$, respectively. In similar
fashion, the sample mean of $x^*$ is $\bar{x}^*$.
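The analog to $\hat\tau_{y1}$ with the error-contaminated auxiliary
variate is
$$\hat\tau_{y1}^* = \hat{R}^*\tau_x^* = \hat\tau_{y1}\,\frac{1 + \mu_\delta/\mu_x}{1 + \bar\delta/\bar{x}},$$
where $\hat{R}^* = \bar{y}/\bar{x}^* = \hat{R}/(1 + \bar\delta/\bar{x})$. The bias of $\hat\tau_{y1}^*$ is analogous
to equation (2):
$$B[\hat\tau_{y1}^* : \tau_y] \approx \left(\frac{1}{n} - \frac{1}{N}\right)\left(\gamma_x^{*2} - \rho^*\gamma_x^*\gamma_y\right)\tau_y. \tag{5}$$
The change in magnitude of bias, which arises from the
measurement error, may be expressed usefully by the ratio of
equation (5) to equation (2), namely,
$$\frac{B[\hat\tau_{y1}^* : \tau_y]}{B[\hat\tau_{y1} : \tau_y]} = \frac{\gamma_x^*}{\gamma_x}\left(\frac{\gamma_x^* - \rho^*\gamma_y}{\gamma_x - \rho\gamma_y}\right) = \frac{\mu_y\sigma_{x^*}^2 - C(x^*, y)(\mu_x + \mu_\delta)}{\mu_y\sigma_x^2 - C(x, y)\mu_x}\cdot\frac{1}{\left(1 + \mu_\delta/\mu_x\right)^2}. \tag{6}$$
The utility of this expression is that it is independent of sample size, $n$. It shows, also, that even when the population mean
error, $\mu_\delta$, is identically zero, the bias of $\hat\tau_{y1}^*$ is affected by the
variability of measurement error.
From equation (3) we deduce that the approximate variance
of $\hat\tau_{y1}^*$ is given by
$$V[\hat\tau_{y1}^*] = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{rm}^{*2}, \tag{7}$$
where
$$\sigma_{rm}^{*2} = \frac{1}{N-1}\sum_{\mathcal{P}} (y_k - R^* x_k^*)^2 = \frac{1}{N-1}\sum_{\mathcal{P}} \left(y_k - \frac{R\,(x_k + \delta_k)}{1 + \mu_\delta/\mu_x}\right)^2. \tag{8}$$
With measurement error in the auxiliary variate, the
“mean-of-ratios” estimator in equation (4) becomes
$$\hat\tau_{y2}^* = \bar{r}^*\tau_x^* = \frac{N(\mu_x + \mu_\delta)}{n}\sum_{k \in S}\frac{y_k}{x_k + \delta_k},$$
where $\bar{r}^*$ is the average ratio $r_k^* = y_k/x_k^*$ of the $n$ units in the
sample. Therefore, $E[\hat\tau_{y2}^*] = \mu_r^*\tau_x^*$, and
$$B[\hat\tau_{y2}^* : \tau_y] = \mu_r^*\tau_x^* - \tau_y = \sum_{\mathcal{P}} y_k\left(\frac{\mu_x + \mu_\delta}{x_k + \delta_k} - 1\right), \tag{9}$$
where $\mu_r^* = N^{-1}\sum_{\mathcal{P}} y_k/x_k^*$.
The ratio of the bias of $\hat\tau_{y2}^*$ to $\hat\tau_{y2}$ is
$$\frac{B[\hat\tau_{y2}^* : \tau_y]}{B[\hat\tau_{y2} : \tau_y]} = \frac{\sum_{\mathcal{P}} y_k\left(\dfrac{\mu_x + \mu_\delta}{x_k + \delta_k} - 1\right)}{\sum_{\mathcal{P}} y_k\left(\dfrac{\mu_x}{x_k} - 1\right)}. \tag{10}$$
The variance of $\hat\tau_{y2}^*$ is
$$V[\hat\tau_{y2}^*] = (\tau_x + N\mu_\delta)^2\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{mr}^{*2}, \tag{11}$$
where $\sigma_{mr}^{*2} = \frac{1}{N-1}\sum_{\mathcal{P}} \left(\frac{y_k}{x_k^*} - \mu_r^*\right)^2$.
The error-contaminated version of $\hat\tau_{y3}$ is
$$\hat\tau_{y3}^* = \hat\tau_{y2}^* + \left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}^*\hat\tau_{x\pi}^*\right)$$
$$\phantom{\hat\tau_{y3}^*} = \frac{N(\mu_x + \mu_\delta)}{n}\sum_{k \in S}\frac{y_k}{x_k + \delta_k} + \left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}^*\hat\tau_{x\pi}^*\right),$$
where $\bar{r}^*$ is the average sample ratio $r_k^*$, as mentioned earlier. With SRSwoR, $\hat\tau_{x\pi}^* = N(\bar{x} + \bar\delta)$. Because
$E\left[\left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}^*\hat\tau_{x\pi}^*\right)\right] = \tau_y - \mu_r^*\tau_x^*$, the design unbiasedness of $\hat\tau_{y3}^*$ is
preserved despite measurement error in the $x_k$’s.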
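The bias ratio for the ratio-of-means estimator, equation (6), can be written either in terms of coefficients of variation and correlations or in terms of means, variances, and covariances. The sketch below computes both forms for a small population so that they can be compared numerically. All names and numbers are hypothetical, and population moments are computed with divisor N, which is an assumption.

```python
import math


def _moments(x, y):
    """Population means, variances, and covariance, all with divisor N."""
    N = len(x)
    mu_x, mu_y = sum(x) / N, sum(y) / N
    var_x = sum((xk - mu_x) ** 2 for xk in x) / N
    var_y = sum((yk - mu_y) ** 2 for yk in y) / N
    cov = sum((xk - mu_x) * (yk - mu_y) for xk, yk in zip(x, y)) / N
    return mu_x, mu_y, var_x, var_y, cov


def bias_ratio_cv_form(x, delta, y):
    """First form of equation (6): coefficients of variation and correlations."""
    x_star = [xk + dk for xk, dk in zip(x, delta)]
    mu_x, mu_y, var_x, var_y, cov = _moments(x, y)
    mu_xs, _, var_xs, _, cov_s = _moments(x_star, y)
    g_x, g_xs = math.sqrt(var_x) / mu_x, math.sqrt(var_xs) / mu_xs
    g_y = math.sqrt(var_y) / mu_y
    rho = cov / math.sqrt(var_x * var_y)
    rho_s = cov_s / math.sqrt(var_xs * var_y)
    return (g_xs / g_x) * (g_xs - rho_s * g_y) / (g_x - rho * g_y)


def bias_ratio_moment_form(x, delta, y):
    """Second form of equation (6): means, variances, and covariances."""
    x_star = [xk + dk for xk, dk in zip(x, delta)]
    mu_x, mu_y, var_x, _, cov = _moments(x, y)
    _, _, var_xs, _, cov_s = _moments(x_star, y)
    mu_delta = sum(delta) / len(delta)
    numerator = mu_y * var_xs - cov_s * (mu_x + mu_delta)
    denominator = mu_y * var_x - cov * mu_x
    return (numerator / denominator) / (1 + mu_delta / mu_x) ** 2


# Hypothetical population and hypothetical fixed errors with mean zero.
x_pop = [2.0, 3.0, 5.0, 7.0, 11.0]
y_pop = [1.0, 2.5, 3.0, 6.0, 8.0]
delta = [0.5, -0.5, 1.0, -1.0, 0.0]
ratio_cv = bias_ratio_cv_form(x_pop, delta, y_pop)
ratio_moment = bias_ratio_moment_form(x_pop, delta, y_pop)
```

The two forms agree numerically, and both return 1 when every error is zero. Neither function takes a sample size, reflecting the remark that the ratio is independent of n.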
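The variance of $\hat\tau_{y3}^*$ is
$$V[\hat\tau_{y3}^*] = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\left[\sigma_y^2 + \mu_r^{*2}\sigma_{x^*}^2 - 2\mu_r^* C(x^*, y) + \frac{1}{n-1}\left(\sigma_{r^*}^2\sigma_{x^*}^2 + C(r^*, x^*)^2\right)\right], \tag{12}$$
where $C(x^*, y) = \sum_{\mathcal{P}} (x_k^* - \mu_x^*)(y_k - \mu_y)/N$, and $C(r^*, x^*) = \sum_{\mathcal{P}} (r_k^* - \mu_r^*)(x_k^* - \mu_x^*)/N$.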
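Among the variance expressions in this section, the one for the error-contaminated mean-of-ratios estimator, equation (11), is the design variance of a scaled sample mean and so can be verified exactly by enumerating all SRSwoR samples from a small population. The sketch below does so; the population, the errors, and the function names are hypothetical.

```python
from itertools import combinations


def var_formula_mean_of_ratios(x_star, y, n):
    """Equation (11): (tau_x + N*mu_delta)^2 * (1/n - 1/N) * sigma*_mr^2."""
    N = len(y)
    r_star = [yk / xk for yk, xk in zip(y, x_star)]
    mu_r_star = sum(r_star) / N
    sigma2_mr_star = sum((rk - mu_r_star) ** 2 for rk in r_star) / (N - 1)
    tau_x_star = sum(x_star)  # equals tau_x + N * mu_delta
    return tau_x_star ** 2 * (1 / n - 1 / N) * sigma2_mr_star


def var_by_enumeration(x_star, y, n):
    """Design variance computed over every SRSwoR sample of size n."""
    N, tau_x_star = len(y), sum(x_star)
    estimates = [
        tau_x_star * sum(y[k] / x_star[k] for k in s) / n
        for s in combinations(range(N), n)
    ]
    mean_est = sum(estimates) / len(estimates)
    return sum((e - mean_est) ** 2 for e in estimates) / len(estimates)


# Hypothetical population with hypothetical fixed additive errors in x.
x_pop = [2.0, 3.0, 5.0, 7.0, 11.0]
delta = [0.5, -0.5, 1.0, -1.0, 0.0]
x_star = [xk + dk for xk, dk in zip(x_pop, delta)]
y_pop = [1.0, 2.5, 3.0, 6.0, 8.0]
v_formula = var_formula_mean_of_ratios(x_star, y_pop, n=2)
v_enumerated = var_by_enumeration(x_star, y_pop, n=2)
```

The two values coincide up to rounding. The same enumeration idea applies to the other estimators, although for the ratio-of-means estimator equation (7) is only an approximation and exact agreement should not be expected.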
3. Empirical Study
The complicated dependence of the bias and variance of
τy∗1 , τy∗2 , and τy∗3 on the mean and variance of the error-contaminated auxiliary variate prevents a general analytical
comparison of the relative performance of these estimators. To
circumvent this difficulty we examined the statistical properties of these estimators when sampling from a specific population, which we describe below.
The empirical portion of this study was undertaken to provide an indication of the magnitude of the effects of additive
measurement error in x k on the estimators presented above,
and to examine how the magnitude of these effects changes
with the mean and variance of the measurement error process itself. We imposed various types of measurement error
on data collected by Candy (1999). In that study, conducted
in Tasmania, Australia, the length, width, and area of Eucalyptus nitens leaves were measured. We computed the product
of leaf length and width, and used this erstwhile “rectangular” area as the auxiliary variate for the estimation of total
leaf area, τ y , of the population. Descriptive parameters for
leaf area and for the corresponding rectangular area are displayed in Table 1. The marginal distribution of leaf area and
the relation between leaf area and rectangular area are shown
in Figure 1.
Table 1
Descriptive parameters of the Eucalyptus nitens leaves population (N = 501)

Descriptive parameter           Leaf area (cm²)    Length × width (cm²)
Minimum                               28.1                 55.1
Maximum                              146.6                222.2
Mean (µ)                              71.0                 99.8
Standard deviation (σ)                21.1                 31.5
Total (τ)                         35,575.2             49,988.1
Coefficient of variation (γ)         29.7%                31.6%
Coefficient of skewness                0.6                  0.9
Kurtosis                               0.2                  0.7
Correlation coefficient (ρ)                     0.96

[Figure 1 image not reproduced. Panel (a): percent of the total versus leaf area class (cm²). Panel (b): leaf area (cm²) versus leaf length × width (cm²).]
Figure 1. Eucalyptus nitens leaves population: (a) histogram
of leaf areas and (b) relationship between leaf area and length
× width.

4. Measurement Error Processes
We examined the effect of measurement error in the auxiliary variate in the case where the magnitude of the error
was constant for all units in the population. In addition,
we looked at its effect when the magnitude of measurement
error varied among population units in accordance with a
uniform distribution, a Gaussian distribution, and a beta
distribution.
The case of constant measurement error corresponds to the
situation, mentioned above, where instrumentation yields a
reading that systematically is larger than it should be, or
else is systematically smaller. We examined the effect of such
systematic error in the measurement of x k for the estimation
of total E. nitens leaf area, τ y , when its magnitude was δ =
−25 cm2 for each x k , i.e., δ k = −25 cm2 , ∀k. We did likewise
when δ k = 25 cm2 , ∀k, and then at a succession of values on
a fine grid within this range.
Uniform measurement error mimics the process of rounding
in the recording of x k . In the empirical study, we generated
a uniform random number error, u k , for each x k from a U[−25, 25] distribution. We then applied a multiplicative scaling
factor, f U , to each random number, so that the scaled uniformly distributed measurement error was δ k = fU uk . We did
this for a minimum value of f U = 0, a maximum value of
f U = 1, and at a succession of values on a fine grid within
this range. The case where f U = 0 evidently corresponds to
the absence of measurement error.
Gaussian measurement error may result from the agglomeration of errors from independent sources. Example sources
include change in ambient temperature, personnel fatigue, altered battery or signal strength, background noise
level, glare, and mental distraction, to name a few. We generated
δ k from a N (0, σ 2δ ) distribution, where σ δ = f N σ x , and
σ x is the standard deviation of the auxiliary variate in the
population of E. nitens leaves.
The multiplicative scaling factor, f N , varied from a minimum of 0.0 to a maximum of 0.30, with intermediate values on
a fine grid within this range. Here, too, a value of f N = 0
corresponds to the absence of measurement error.
We also wished to examine the effect of skew. For this purpose, we generated δ k proportional to a beta-distributed random variate, b k ∼ β(a, b), with a = 2 and b = 10. The proportionality factor was set to f β = 25p/max (b k ), where p is a
proportion that was varied from a minimum of p = 0 to a maximum of p = 1, with intermediate values on a fine grid within
this range. The case where p = 0 corresponds to the absence
of measurement error. For each set of scaled, beta-distributed
measurement errors, δ k , we deducted the median value from
each so that the skewed distribution of measurement errors
was centered around zero, as in the uniform and Gaussian
cases and the constant measurement error case introduced
above.
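The three variable error processes described in this section can be sketched with Python's standard `random` module. This is an illustrative reading of the text rather than the authors' code: the function names, the seed, and the choice to pass an explicit random generator are assumptions.

```python
import random
import statistics


def uniform_errors(N, f_u, rng):
    """delta_k = f_U * u_k with u_k drawn from U[-25, 25]."""
    return [f_u * rng.uniform(-25.0, 25.0) for _ in range(N)]


def gaussian_errors(N, f_n, sigma_x, rng):
    """delta_k drawn from N(0, sigma_delta^2) with sigma_delta = f_N * sigma_x."""
    return [rng.gauss(0.0, f_n * sigma_x) for _ in range(N)]


def beta_errors(N, p, rng, a=2.0, b=10.0):
    """Scaled beta(a, b) draws, shifted so that their median is zero."""
    draws = [rng.betavariate(a, b) for _ in range(N)]
    f_beta = 25.0 * p / max(draws)  # proportionality factor 25p / max(b_k)
    scaled = [f_beta * d for d in draws]
    median = statistics.median(scaled)
    return [s - median for s in scaled]


# Hypothetical usage with N = 501 units and sigma_x = 31.5 as in Table 1.
rng = random.Random(12345)  # the seed is arbitrary
delta_uniform = uniform_errors(501, 0.5, rng)
delta_gaussian = gaussian_errors(501, 0.15, 31.5, rng)
delta_beta = beta_errors(501, 1.0, rng)
```

Setting the scaling factor (`f_u`, `f_n`, or `p`) to zero returns errors that are all zero, which corresponds to the absence of measurement error in each case.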
5. Effect of Constant Measurement Error
In this section, we examine the effect on design-based bias,
standard error (SE), and root mean square error (RMSE)
caused by introducing a systematic (constant) measurement error in the auxiliary variate. That is, δ k = µ δ , ∀k.
5.1 Bias Ratio Trend with Change in Average Error
of Measurement
The bias ratio for τy∗1 , equation (6), and for τy∗2 , (10), are
displayed in Figure 2a for a range of values of µ δ arrayed
as a proportion of µ x . Horizontal reference lines have been
superimposed at values of −1, 0, and 1 on the vertical axis, as
well as a vertical reference line at µ δ /µ x = 0. As mentioned
earlier, these ratios do not depend on sample size, n.
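For the mean-of-ratios estimator, the bias ratio of equation (10) under a constant error can be traced over a grid of µδ/µx values in the manner of Figure 2a. The following is a sketch on a hypothetical five-unit population, not the E. nitens data; the function names are also assumptions.

```python
def bias_mean_of_ratios(x, y, delta):
    """Equation (9): sum of y_k * ((mu_x + mu_delta) / (x_k + delta_k) - 1)."""
    N = len(x)
    mu_x_star = (sum(x) + sum(delta)) / N
    return sum(
        yk * (mu_x_star / (xk + dk) - 1) for xk, yk, dk in zip(x, y, delta)
    )


def bias_ratio_constant_error(x, y, c):
    """Equation (10) when every unit carries the same error c."""
    N = len(x)
    with_error = bias_mean_of_ratios(x, y, [c] * N)
    without_error = bias_mean_of_ratios(x, y, [0.0] * N)
    return with_error / without_error


# Hypothetical population; the grid expresses the error as a proportion of mu_x.
x_pop = [2.0, 3.0, 5.0, 7.0, 11.0]
y_pop = [1.0, 2.5, 3.0, 6.0, 8.0]
mu_x = sum(x_pop) / len(x_pop)
grid = [-0.25, -0.10, 0.0, 0.10, 0.25]
ratios = {g: bias_ratio_constant_error(x_pop, y_pop, g * mu_x) for g in grid}
```

No sample size appears anywhere in these functions, and the ratio equals 1 at µδ = 0 by construction. The constant error must leave every contaminated value x_k + c positive for the ratios y_k/x*_k to be meaningful.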
As seen in this graphic, relative to the bias of τy 1 and τy 2 ,
the bias of τy∗1 and τy∗2 when µ δ < 0 exceeds that of the
corresponding estimator in the absence of measurement error. Conversely, positive measurement error results in reduced
bias. However, with sufficiently large positive µ δ , the bias of
each estimator becomes negative and larger in absolute magnitude than the bias of the corresponding error-free estimator.
Nonetheless, at least for this E. nitens leaf population, there
is a range, 0 < µ δ ≲ 0.2µ x , of constant measurement error
where the bias is reduced over what it is in the absence of
measurement error.
Figure 2b shows the bias of τy∗1 and τy∗2 expressed as a percentage of τ y . This display serves to emphasize the comparative imperviousness of the bias in τy 1 to this
type of measurement error. The results depicted here were
computed with equations (5) and (9), presuming a sample of
size n = 7, which is roughly a 1% sample of the 501-element
E. nitens leaf population. For larger sample sizes, the bias
of τy∗1 will lie closer to the zero reference line, whereas the
percentage bias trend line for τy∗2 would be unchanged.
[Figure 2 image not reproduced. Six panels, (a) to (f), each with µδ/µx on the horizontal axis. Vertical axes: (a) ratio of bias with/without ME; (b) bias as a percentage of τy; (c) ratio of SE with/without ME; (d) SE as a percentage of τy; (e) ratio of RMSE with/without ME; (f) RMSE as a percentage of τy.]
Figure 2. Properties of estimators under systematic measurement error in the x variate. For $\hat\tau_{y1}^*$ (solid line) and $\hat\tau_{y2}^*$ (dot–dash
line), the ratio of bias with:without measurement error in (a) and bias as a percentage of $\tau_y$ in (b). For $\hat\tau_{y1}^*$, $\hat\tau_{y2}^*$, and $\hat\tau_{y3}^*$ (dashed
line), the ratio of standard error with:without measurement error in (c) and standard error as a percentage of $\tau_y$ in (d); ratio
of RMSE with:without measurement error in (e), and RMSE as a percentage of $\tau_y$ in (f). Results for $\hat\tau_{y3}^*$ are based
on samples of size n = 7.
5.2 Standard Error Trend with Change in Average Error
of Measurement
The ratio of the standard error of τy∗1 to τy 1 is displayed in
Figure 2c with similar traces for the standard error ratios of
τy∗2 and τy∗3 superimposed. The standard error ratio of τy∗3 ,
only, depends on n, and results in this figure are shown for
n = 7. When µ δ < 0, the standard errors of all three estimators
are increased, and when µ δ > 0, the precision of all three is
improved. Although not easily discernible in Figure 2c, when
µ δ is within the region of ±5% of µ x , the standard error of τy∗2
is less affected than that of τy∗1 , whereas the standard error of
τy∗3 is more affected. This is evident, also, when the standard
errors of the estimators are expressed as a percentage of τ y
as in Figure 2d. When µ δ is within the region of ±5% of µ x ,
the increase or decrease in standard error is a tiny fraction of
a percentage point.
By differentiating equation (8) with respect to µ δ , we deduce that the standard error of τy∗1 is minimized at the value
of µ δ satisfying the relation
ζµδ = µy − ζµx ,
where ζ = C(x, y)/σ 2x is the linear regression coefficient of y
on x.
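The minimizing relation can be checked numerically. Under constant error, the residual in equation (8) equals (y_k − µy) − R*(x_k − µx) with R* = µy/(µx + µδ), so the sum of squares is smallest when R* equals the regression coefficient ζ, which gives µδ = µy/ζ − µx. The sketch below compares this closed form with nearby values on a hypothetical population; the function names are assumptions.

```python
def sigma2_rm_star(x, y, mu_delta):
    """Equation (8) when every unit carries the same error mu_delta."""
    N = len(x)
    mu_x, mu_y = sum(x) / N, sum(y) / N
    r_star = mu_y / (mu_x + mu_delta)  # R* = R / (1 + mu_delta / mu_x)
    return sum(
        (yk - r_star * (xk + mu_delta)) ** 2 for xk, yk in zip(x, y)
    ) / (N - 1)


def minimizing_mu_delta(x, y):
    """Solve zeta * mu_delta = mu_y - zeta * mu_x for mu_delta."""
    N = len(x)
    mu_x, mu_y = sum(x) / N, sum(y) / N
    var_x = sum((xk - mu_x) ** 2 for xk in x) / N
    cov_xy = sum((xk - mu_x) * (yk - mu_y) for xk, yk in zip(x, y)) / N
    zeta = cov_xy / var_x  # regression coefficient of y on x
    return mu_y / zeta - mu_x


# Hypothetical five-unit population, for illustration only.
x_pop = [2.0, 3.0, 5.0, 7.0, 11.0]
y_pop = [1.0, 2.5, 3.0, 6.0, 8.0]
best = minimizing_mu_delta(x_pop, y_pop)
at_best = sigma2_rm_star(x_pop, y_pop, best)
nearby = [sigma2_rm_star(x_pop, y_pop, best + h) for h in (-0.5, -0.05, 0.05, 0.5)]
```

Every value in `nearby` is at least as large as `at_best`, consistent with the stated minimizer. Because equation (7) scales this quantity by a factor that does not involve µδ, the same value minimizes the approximate standard error.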
5.3 RMSE Trend with Change in Average Error
of Measurement
A similar set of graphs is shown in Figure 2e, which portrays
the RMSE ratios for τy∗1 , τy∗2 , and τy∗3 , and Figure 2f, which
shows the change in RMSE, expressed as a percentage of τ y ,
with increasing µ δ . As before, only the results pertaining to
τy∗3 depend on the size of the sample. In both figures it is
apparent that RMSE is affected much more when µ δ < 0
than when µ δ > 0.
When judged by RMSE, τy∗2 is most affected when µ δ < 0
and τy∗1 is least affected. When expressed as a percentage of
τ y , as in Figure 2f, τy∗1 is superior regardless of the sign of µ δ .
All three estimators have smaller RMSE under small positive
µ δ than they do in the absence of measurement error.
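RMSE combines the two properties examined above: it is the square root of the squared bias plus the variance. A minimal sketch follows, with hypothetical helper names and illustrative numbers only.

```python
import math


def rmse(bias, variance):
    """Root mean square error from design bias and design variance."""
    return math.sqrt(bias ** 2 + variance)


def as_percentage_of_total(value, tau_y):
    """Express a bias, SE, or RMSE as a percentage of the true total."""
    return 100.0 * value / tau_y


# Illustrative numbers only: bias 3, variance 16, true total 200.
example_rmse = rmse(3.0, 16.0)
example_pct = as_percentage_of_total(example_rmse, 200.0)
```

Because an unbiased estimator contributes no bias term, the RMSE of the Hartley–Ross estimator is simply its standard error.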
6. Effect of Variable Measurement Error
The three distributions—uniform, Gaussian, and beta—of
measurement error that we investigated all were centered at
zero. By using different scaling factors, we were able to vary
the spread of the error distributions in a systematic fashion.
In the graphical results discussed in this section, we portray
the change in bias, standard error, and RMSE of the error-contaminated estimators of τ y as a function of the standard
deviation, σ δ , of the error distribution expressed as a proportion of the standard deviation, σ x , of the auxiliary variate.
The upper panels of Figure 3 display the ratio of the bias
of τy∗1 to τy 1 as σ δ increases, and the similar ratio of the
bias of τy∗2 to τy 2 is superimposed. From left to right, panels (a), (b), and (c) show the bias ratio, respectively, for uniformly, Gaussian-, and beta-distributed measurement error.
In all three cases, the bias of both τy∗1 and τy∗2 is increased
compared to the bias in the absence of measurement error,
becoming greater with increasing dispersion of the error distribution. Under all three error processes τy∗2 is more sensitive
to measurement error than τy∗1 , in the sense that its bias is increased more. For any specified level of σ δ , the bias ratio was
greatest when measurement errors were Gaussian distributed,
and least when beta distributed.
The inserts in the upper left corner of these panels show
the percentage bias of τy∗1 and τy∗2 when n = 7. Percentage
bias increases in a smooth fashion with increasing σ δ . Compared to τy∗2 , the bias of τy∗1 is rather insensitive to increasing
measurement error dispersion. At any specified level of σδ ,
bias in τy∗1 and τy∗2 was greatest when measurement errors
were Gaussian distributed, and least when beta distributed,
although the differences between them are slight. Arguably,
the salient message carried by these graphs is that variable
measurement error increases the bias of τy 1 and τy 2 , and that
the increase in bias is directly related to the dispersion of the
error distribution.
Panels (a), (b), and (c) in the middle row of graphs in
Figure 3 show the ratio of standard error with and without
measurement error in the auxiliary variate. It is apparent that
the standard error of estimation, like bias, increases directly
with increasing error dispersion, and that τy∗1 and τy∗3 are less
sensitive than τy∗2 in this regard. As with bias, the Gaussian-distributed errors exert more of an effect on the standard error
of estimation than uniformly distributed errors, whereas the
beta-distributed errors had the smallest effect. This result is
evident when examining the standard error of estimation as a
percentage of τ y , shown in the upper left inserts of the middle
row of graphs.
When performance is judged by RMSE, τy∗1 performs best
under all three variable measurement error processes, as
shown in panels (a), (b), and (c) of the lowest row of graphs
of Figure 3.
7. Simulation Results
For the error-contaminated versions of the E. nitens leaf population described in Section 3 we checked the results presented
in Figures 2 and 3 by means of a simulation study in which we
drew 30,000 samples from the population of E. nitens leaves
contaminated by each of the measurement error processes described in Section 4. In all cases the
results displayed in Figures 2 and 3 and the simulation results
agreed to within a fraction of a percentage point. Moreover,
when the simulation was repeated with 100,000 samples, the
results changed minimally. Sampling was conducted with samples of size 7, 15, and 37. Because of the close similarity of
results, we report results for samples of size 7 only.
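A simulation of this kind can be sketched as repeated SRSwoR draws from the contaminated population, with the Monte Carlo bias and standard error computed from the resulting estimates. The code below uses a synthetic population, an arbitrary seed, and hypothetical function names; it is not the authors' program and does not use the E. nitens data.

```python
import random
import statistics


def ratio_of_means(y_s, x_s, tau_x_star):
    return (sum(y_s) / sum(x_s)) * tau_x_star


def mean_of_ratios(y_s, x_s, tau_x_star):
    return statistics.fmean(yk / xk for yk, xk in zip(y_s, x_s)) * tau_x_star


def monte_carlo(x_star, y, n, reps, rng, estimator):
    """Return (Monte Carlo bias, Monte Carlo standard error) for an estimator."""
    N, tau_x_star, tau_y = len(y), sum(x_star), sum(y)
    estimates = []
    for _ in range(reps):
        s = rng.sample(range(N), n)  # one SRSwoR sample of unit labels
        estimates.append(
            estimator([y[k] for k in s], [x_star[k] for k in s], tau_x_star)
        )
    return statistics.fmean(estimates) - tau_y, statistics.pstdev(estimates)


# Synthetic population of 60 units (an assumption, for illustration only).
pop_rng = random.Random(2009)
x_true = [pop_rng.uniform(50.0, 200.0) for _ in range(60)]
y = [0.7 * xk + pop_rng.gauss(0.0, 5.0) for xk in x_true]
x_star = [xk + pop_rng.uniform(-10.0, 10.0) for xk in x_true]  # fixed errors

mc_bias_1, mc_se_1 = monte_carlo(x_star, y, 7, 2000, random.Random(1), ratio_of_means)
mc_bias_2, mc_se_2 = monte_carlo(x_star, y, 7, 2000, random.Random(1), mean_of_ratios)
```

The errors are generated once and then held fixed across all samples, in keeping with the design-based premise that repeated measurement of a unit returns the same contaminated value. The Monte Carlo standard errors can then be compared with the values given by equations (7), (11), and (12).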
Aside from serving this confirmatory purpose, the simulation enabled us to evaluate how well the approximations put
forth in equations (7), (11), and (12) portray the variance
of τy∗1 , τy∗2 , and τy∗3 , respectively, under the different types of
measurement error processes we examined. Cochran (1977,
pp. 162–163) considered their adequacy in the absence of measurement error.
Results for the case of constant measurement error in the
auxiliary variate are displayed in Table 2a. In the absence of
measurement error, i.e., when µ δ = 0, the standard errors
computed from these variance approximations all are within
1% of the Monte Carlo standard errors for all three estimators of τ y and for the three sample sizes that we examined.
When µ δ > 0, the computed standard error of τy∗1 appears to
track the Monte Carlo standard error as well as it does when
µ δ = 0: there is no apparent pattern of increasing or decreasing deviation from the Monte Carlo error. The same can be
said for the deviation of the standard error of τy∗2 and τy∗3 from
the empirical error observed in the simulation study.
When µ δ < 0, these approximations to the standard error
of estimation perform less well, especially for the largest n.
However, only in one instance did the deviation from the empirical standard error exceed 2% in absolute magnitude, and
that case occurred only when µ δ was −25% of µ x . Overall,
the variance approximations given for all three estimators lead
to trustworthy standard errors of estimation under constant
measurement error in the auxiliary variate.
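The comparison in Table 2 rests on a percentage deviation between a formula-based standard error and its Monte Carlo counterpart. The sign convention below, in which a negative value means the approximation understates the Monte Carlo standard error, is inferred from the discussion of the results and is an assumption, as is the function name.

```python
def pct_deviation(approx_se, monte_carlo_se):
    """Percentage deviation of an approximate SE from the Monte Carlo SE.

    Assumed convention: negative values mean the approximation understates
    the Monte Carlo standard error.
    """
    return 100.0 * (approx_se - monte_carlo_se) / monte_carlo_se


# Illustrative numbers only: an approximation 4% below the Monte Carlo value.
example = pct_deviation(0.96, 1.00)
```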
[Figure 3 image not reproduced. Three columns of panels: Uniform (a), Gaussian (b), and Beta (c). Three rows: bias ratio, standard error ratio, and RMSE ratio, each plotted against σδ/σx. Inset plots show the corresponding quantity as a percentage (%).]
Figure 3. Bias (first panel row), standard error (second panel row), and RMSE (third panel row) of $\hat\tau_{y1}^*$ (solid line), $\hat\tau_{y2}^*$
(dot–dash line), and $\hat\tau_{y3}^*$ (dashed line) relative to that of $\hat\tau_{y1}$, $\hat\tau_{y2}$, and $\hat\tau_{y3}$, respectively, with uniform (a), Gaussian (b), and
beta (c) distributed measurement error in the auxiliary variate. The inner plots represent bias (first panel row), standard
error (second panel row), and RMSE (third panel row) expressed as a percentage of $\tau_y$. The horizontal axes of the inserts
span the range $0 \le \sigma_\delta/\sigma_x \le 0.30$. Results for $\hat\tau_{y3}^*$ are based on samples of size n = 7.

Results for variable measurement error are tabulated in
Table 2b. Recall that we examined the effects of uniform,
Gaussian, and beta-distributed measurement error, where, in
each case, the distribution was centered at zero, i.e., µ δ = 0.
Looking first at the results that pertain to uniformly distributed measurement error, there is an apparent pattern of
understatement of actual standard error, which increases in
magnitude with increasing σ δ . When σ δ = 0.3σ x , the computed standard errors of τy∗1 , τy∗2 , and τy∗3 are about 2–4%
smaller than what was observed among the 30,000 estimates
observed in the simulation study.
In contrast, when δ ∼ N [0, σ 2δ ], the computed approximations to the standard errors of τy∗2 and τy∗3 tend to overstate
the empirically observed standard error, with the overstatement increasing moderately with increasing σ δ . This trend
was apparent for τy∗1 at the large sample sizes (n = 15 & 37)
that we examined.
Results when δ is distributed as a beta random variable
mimic the Gaussian results, but the overstatement is greater
for a similar level of σδ. Whereas overstatement generally ranged from 0–2% when σδ = 0.3σx under Gaussian measurement error, it ranged from 2–3% under beta measurement
error.

Table 2
Percentage deviation of the standard error approximation of τy∗1, τy∗2, and τy∗3 from the Monte Carlo standard error. In part (a),
results are shown when the measurement error in the auxiliary variate is a constant magnitude given by µδ; in part (b), results
are shown when the measurement error in the auxiliary variate varies according to uniform, Gaussian, and beta distributions with
dispersion indicated by σδ. Simulation results are based on 30,000 samples of size n = 7.

(a) Constant measurement error, by µδ/µx

Estimator  −0.25  −0.20  −0.15  −0.10  −0.05      0   0.05   0.10   0.15   0.20   0.25
τy∗1       −3.05  −1.90  −0.94  −0.19   0.34   0.62   0.67   0.55   0.36   0.15  −0.03
τy∗2       −0.74  −0.61  −0.43  −0.20   0.05   0.28   0.45   0.54   0.56   0.53   0.48
τy∗3       −0.85  −0.76  −0.63  −0.46  −0.24  −0.02   0.19   0.34   0.42   0.43   0.41

(b) Variable measurement error, by σδ/σx

Distribution  Estimator      0   0.03   0.07   0.10   0.13   0.17   0.20   0.23   0.27   0.30
Uniform       τy∗1        0.92   0.83   0.49  −0.03  −0.64  −1.89  −2.47  −3.02  −3.54  −4.04
              τy∗2        0.44   1.01   1.19   1.03   0.64  −0.43  −0.99  −1.55  −2.10  −2.65
              τy∗3        0.29   0.26   0.13  −0.08  −0.34  −0.92  −1.21  −1.48  −1.75  −2.02
Gaussian      τy∗1        0.49   0.47   0.47   0.46   0.46   0.45   0.43   0.40   0.35   0.29
              τy∗2       −0.31   0.03   0.36   0.67   0.96   1.23   1.50   1.77   2.09   2.47
              τy∗3       −0.05   0.02   0.17   0.39   0.66   0.95   1.25   1.55   1.83   2.09
Beta          τy∗1        0.76   1.22   1.60   1.88   2.07   2.21   2.20   2.16   2.09   2.00
              τy∗2        0.26   0.83   1.47   2.07   2.57   3.21   3.35   3.41   3.40   3.32
              τy∗3        0.21   0.75   1.28   1.76   2.18   2.77   2.97   3.10   3.19   3.25
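The percentage deviations discussed here compare an analytical standard error approximation with the standard deviation of the simulated estimates. A minimal sketch of that comparison follows; the function name `percent_deviation` and the toy inputs are hypothetical, and the sign convention (negative values indicating understatement) is an assumption inferred from the text, as is the use of the n − 1 divisor for the Monte Carlo standard error.

```python
from statistics import stdev

def percent_deviation(se_approx, estimates):
    """Percentage deviation of an approximate standard error from the
    Monte Carlo standard error of a collection of simulated estimates.

    Negative values mean the approximation understates the empirical
    standard error; positive values mean it overstates it.
    """
    se_mc = stdev(estimates)  # empirical (Monte Carlo) standard error
    return 100.0 * (se_approx - se_mc) / se_mc

# Toy illustration with made-up numbers: the three estimates below have
# a sample standard deviation of exactly 1.
toy_estimates = [1.0, 2.0, 3.0]
deviation = percent_deviation(0.97, toy_estimates)
print(f"percentage deviation: {deviation:.2f}")
```

With 30,000 replicates the choice between the n and n − 1 divisor is immaterial, so either convention would lead to essentially the same percentages.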
8. Discussion
This has been a multifaceted study, the major conclusions of
which are enumerated below. Although it is impossible to generalize the results of a single empirical study, our results at
the very least provide insight into how the effects of measurement error change with increasing µδ in the case of constant
measurement error and with increasing measurement error dispersion in the case of variable measurement error. The analytical expressions for the bias and variance of τy∗1, τy∗2, and
τy∗3 have been presented in Section 2.3 and confirmed by the
simulation study.
(i) The unbiasedness of τy3 is preserved even when measurement error contaminates the auxiliary variate.
As a referee pointed out, this estimator is unbiased
without regard to the auxiliary variate and how or
whether it may be contaminated.
(ii) Constant measurement error in the auxiliary variate
can occur when the measuring instrument is faulty
in a consistent manner. The effects of such errors
are not symmetric: when µδ < 0, the effect on the
bias, standard errors, and mean square errors of τy∗1
and τy∗2 differs from the effect when µδ > 0. The
same can be said regarding the effect on the standard error of τy∗3. When µδ > 0, the bias and standard error are both reduced from their values in the
absence of measurement error. It will be useful to
determine whether this result holds for other study
populations. Over a broad range of µδ ≠ 0, τy∗1 has
the smallest RMSE.
(iii) Variable measurement error in the auxiliary variate
can occur for any of a number of reasons, e.g., rounding error in the measurement process. In the face of
variable measurement error, the bias of τy∗1 increases
less than that of τy∗2. The RMSEs of all three estimators we examined increase directly with increasing
σδ, a result that is intuitively clear. The RMSEs of
τy∗1 and τy∗3 are nearly identical, both being less than
that of τy∗2.
(iv) The approximations we presented to the variance,
and hence standard error, of the three estimators
work well under constant measurement error when
µδ > 0, but deteriorate slightly when µδ < 0. Under a variable measurement error process, the magnitude and direction of the effect on the standard error
approximation differ from one error distribution to
another. Overall, the absolute size of the effect increases directly with increasing σδ.
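The first two conclusions can be illustrated on a toy population by enumerating every simple random sample without replacement, which gives the exact design-based bias rather than a Monte Carlo estimate. The sketch below rests on several assumptions not stated in this section: that the three estimators are the ratio-of-means, the mean-of-ratios, and the Hartley and Ross (1954) unbiased estimator; that the contaminated values x∗ = x + µδ are available for every population unit, so that their total is known; and that the eight-unit population, which is entirely hypothetical, is adequate for illustration.

```python
from itertools import combinations
from statistics import fmean

def ratio_estimates(y, x, tau_x, N):
    """Return (ratio-of-means, mean-of-ratios, Hartley-Ross) estimates of
    the population total of y from one sample, given the known total tau_x."""
    n = len(y)
    y_bar, x_bar = fmean(y), fmean(x)
    r_bar = fmean([yi / xi for yi, xi in zip(y, x)])
    rom = tau_x * y_bar / x_bar                       # ratio of means
    mor = tau_x * r_bar                               # mean of ratios
    hr = mor + n * (N - 1) / (n - 1) * (y_bar - r_bar * x_bar)  # Hartley-Ross
    return rom, mor, hr

def exact_bias(y_pop, x_pop, n):
    """Exact design-based bias of the three estimators under simple random
    sampling without replacement, found by enumerating all samples."""
    N = len(y_pop)
    tau_y, tau_x = sum(y_pop), sum(x_pop)
    totals, count = [0.0, 0.0, 0.0], 0
    for idx in combinations(range(N), n):
        est = ratio_estimates([y_pop[i] for i in idx],
                              [x_pop[i] for i in idx], tau_x, N)
        totals = [t + e for t, e in zip(totals, est)]
        count += 1
    return tuple(t / count - tau_y for t in totals)

# Hypothetical population of N = 8 units (not the study population).
x_pop = [4.0, 6.0, 7.0, 9.0, 11.0, 12.0, 15.0, 18.0]
y_pop = [9.0, 11.0, 16.0, 17.0, 24.0, 23.0, 33.0, 35.0]
mu_x = fmean(x_pop)

biases = {}
for rel_shift in (-0.25, 0.0, 0.25):                  # constant error as a fraction of mu_x
    x_star = [xi + rel_shift * mu_x for xi in x_pop]  # contaminated auxiliary variate
    biases[rel_shift] = exact_bias(y_pop, x_star, n=3)

for rel_shift, (b_rom, b_mor, b_hr) in biases.items():
    print(f"shift {rel_shift:+.2f}: ratio-of-means {b_rom:+.4f}, "
          f"mean-of-ratios {b_mor:+.4f}, Hartley-Ross {b_hr:+.2e}")
```

Under these assumptions the Hartley-Ross bias vanishes up to floating-point error for every shift, whereas the biases of the other two estimators can be compared at negative and positive shifts of equal magnitude to see whether they respond symmetrically.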
There are a number of ways in which the results presented
here can be extended. It would be of interest to determine
how estimators of variance are affected by measurement error
in the auxiliary variate, as well as how the coverage of nominal
(1 − α)100% confidence intervals is affected by such error. When the relationship
between y and x does not pass near the origin, the linear regression estimator is more apt, in the model-assisted sense of
Särndal et al. (1992). It is not clear whether the results of measurement error in x observed for the ratio estimators of this
article would be magnified or diminished with the regression
estimator of τ y . Although we have taken a design-based approach to infer the effect of fixed measurement error, there
may be utility in a model-based approach that postulates a
probability distribution for the error of repeated measurement
of each unit of the population. Surely for some practitioners
who are more accustomed to relying on a presumed model as
the basis for statistical inference, this would be a more satisfying approach. Accordingly, we shall report on our study of
this approach later. Lastly, the combination of measurement
error in both y and x, possibly stemming from different error
processes, needs to be examined.
Acknowledgements
The authors wish to thank Jeff Gove, Daniel Mandallaz,
Ross Nelson, Al Stage, Göran Ståhl, and Steve Stehman
for suggestions, which helped to improve the quality of the
manuscript.
References
Breidt, F. J., Opsomer, J. D., Johnson, A. A., and Ranalli, M. G.
(2007). Semiparametric model-assisted estimation for natural resource surveys. Survey Methodology 33, 35–44.
Candy, S. G. (1999). Predictive models for integrated pest management
of the leaf beetle Chrysophtharta bimaculata in Eucalyptus nitens
in Tasmania. Doctoral dissertation, University of Tasmania, Hobart, Australia.
Carroll, R. J. (1998). Measurement Error in Nonlinear Models. Boca
Raton, Florida: Chapman & Hall/CRC.
Cochran, W. G. (1977). Sampling Techniques, 3rd edition. New York:
John Wiley.
Fuller, W. (1987). Measurement Error Models. New York: John Wiley.
Goodman, L. A. and Hartley, H. O. (1958). The precision of unbiased
ratio-type estimators. Journal of the American Statistical Association 53, 491–508 (corrigenda: 1969 64, 1700).
Gregoire, T. G. and Valentine, H. T. (2008). Sampling Strategies for Natural Resources and the Environment. Boca
Raton, Florida: Chapman & Hall/CRC.
Hartley, H. O. and Ross, A. (1954). Unbiased ratio estimators. Nature
174, 270–271.
Särndal, C.-E., Swensson, B., and Wretman, J. (1992). Model Assisted
Survey Sampling. New York: Springer-Verlag.
Sukhatme, P. V. and Sukhatme, B. V. (1970). Sampling Theory of Surveys with Applications. Ames: Iowa State University Press.
Received December 2007. Revised April 2008.
Accepted May 2008.