Ratio estimation with measurement error in the auxiliary variate

2009, Biometrics

Timothy G. Gregoire (1,*) and Christian Salas (1,2,**)

(1) School of Forestry and Environmental Studies, Yale University, New Haven, Connecticut 06511-2104, U.S.A.
(2) Departamento de Ciencias Forestales, Universidad de La Frontera, Temuco, Chile
(*) email: timothy.gregoire@yale.edu
(**) email: christian.salas@yale.edu

Biometrics 65, 590–598, June 2009. DOI: 10.1111/j.1541-0420.2008.01110.x

Summary. With auxiliary information that is well correlated with the primary variable of interest, ratio estimation of the finite population total may be much more efficient than alternative estimators that do not make use of the auxiliary variate. The well-known properties of ratio estimators are perturbed when the auxiliary variate is measured with error. In this contribution we examine the effect of measurement error in the auxiliary variate on the design-based statistical properties of three common ratio estimators.
We examine the case of systematic measurement error as well as measurement error that varies according to a fixed distribution. Aside from presenting expressions for the bias and variance of these estimators when they are contaminated with measurement error, we provide numerical results based on a specific population. Under systematic measurement error, the biasing effect is asymmetric around zero, and precision may be improved or degraded depending on the magnitude of the error. Under variable measurement error, bias of the conventional ratio-of-means estimator increased slightly with increasing error dispersion, but far less than the increased bias of the conventional mean-of-ratios estimator. In similar fashion, the variance of the mean-of-ratios estimator incurs a greater loss of precision with increasing error dispersion compared with the other estimators we examine. Overall, the ratio-of-means estimator appears to be remarkably resistant to the effects of measurement error in the auxiliary variate.

Key words: Beta errors; Design-based inference; Gaussian errors; Rounding errors; Sampling; Surveys; Uniform errors.

1. Introduction

In the field of survey sampling, auxiliary information is used variously to guide the inclusion of population units into the sample or to assist in the estimation of population descriptive parameters, or both. Classic, recent, and contemporary texts on sampling (for example, Cochran, 1977; Särndal, Swensson, and Wretman, 1992; and Gregoire and Valentine, 2008) all devote substantial portions of their expositions to the use of auxiliary information to increase the precision with which quantitative population characteristics may be estimated.
To be worthwhile for these purposes, the auxiliary variate, say $x$, must be well correlated, usually positively correlated, with the attribute of principal interest, say $y$, and it must be comparatively inexpensive to obtain, or else the investment in time and energy to obtain information on the auxiliary variate could be better spent obtaining a larger sample of the variable of interest alone.

Oftentimes the secondary importance of the auxiliary variate, $x$, may result in a more sizeable error in its measurement compared to that of $y$. For example, $x$ may result from remotely sensed, satellite data that have been processed by a digital classifier, and hence are subject to classification error. Or, $x$ may be an ocular or other type of subjective assessment of a size characteristic that is associated with $y$. Airborne or space Light Detection and Ranging (LiDAR) measurements of forest canopy height may change due to degradation in laser system performance over time, due to location error between airborne laser profiles, or due to technological improvements that necessitate the use of two different laser systems for data collection flown years apart. Yet another source of error when $x$ is measured in the field or forest is nonnegligible rounding error. These few examples are hardly exhaustive, and they are provided merely to help motivate the problem we address: because of the nature of auxiliary information and how it is used in the sample design and estimation, it is apt to include nonnegligible measurement error.

The purpose of this article is to explore the effect of measurement error (ME) in $x$ on the design-based statistical properties of commonly employed ratio estimators of $\tau_y$, the total amount of the variate of interest, $y$, in the population. We know of no prior investigation of this issue, notwithstanding considerable recent attention devoted to measurement error in modeling (e.g., Fuller, 1987; Carroll, 1998) and its effect on model-based inference.
On one level it is evident that although the information in $x$ about $y$ is degraded when measurement error is included, the well-known properties of ratio estimators persist. However, on a different level we think that there is value in scrutinizing more specific effects of measurement error: for example, whether it affects bias more egregiously than variance, whether the relative impact changes with sample size, and whether the effects are symmetric around zero. The results of this article shed light on such effects.

© 2008, The International Biometric Society

2. Ratio Estimation of Population Descriptive Parameters

2.1 Population Descriptive Parameters

Similar to Breidt et al. (2007), we denote the ordered labels for a finite population of discrete units by $\mathcal{P} = \{1, 2, \ldots, N\}$, indexed by $k$ in the following. Corresponding to each unit is a nonnegative value of some measurable characteristic, $y_k$, $\forall k \in \mathcal{P}$; likewise, there is a quantitative, auxiliary characteristic, $x_k$, also nonnegative. Our interest is focused on the estimation of the aggregate value of $y$ in $\mathcal{P}$, namely, $\tau_y = \sum_{\mathcal{P}} y_k$, on the basis of a sample of $n$ units selected without replacement by simple random sampling (SRSwoR). We denote the mean $y$-value per element in $\mathcal{P}$ by $\mu_y = \tau_y/N$.

2.2 Ratio Estimation in the Absence of Measurement Error in x

We first summarize the design-based bias and variance of three ratio estimators of interest when the auxiliary variate is free of measurement error. In the succeeding section, we examine how these properties are altered by measurement error.

2.2.1 Ratio-of-means estimator. Letting $\bar{y}$ and $\bar{x}$ denote sample mean values, the aptly named "ratio-of-means" estimator is
\[ \hat\tau_{y1} = \hat{R}\,\tau_x, \tag{1} \]
where $\tau_x = \sum_{\mathcal{P}} x_k$. In (1), $\hat{R} = \bar{y}/\bar{x}$ is an estimator of $R = \tau_y/\tau_x = \mu_y/\mu_x$.
A Taylor-series expansion of $\hat\tau_{y1}$ truncated after two terms leads to an expression of its design bias as an estimator of $\tau_y$ as
\[ B[\hat\tau_{y1} : \tau_y] \approx \left(\frac{1}{n} - \frac{1}{N}\right)\left(\gamma_x^2 - \rho\,\gamma_x\gamma_y\right)\tau_y, \tag{2} \]
where $\rho$ is the linear correlation coefficient and $\gamma_x$ and $\gamma_y$ are the coefficients of variation of $x$ and $y$, respectively, in the population. This approximation to the bias of $\hat\tau_{y1}$ may be deduced, inter alia, from Cochran (1977, Section 6.8, Eq. 6.34). The usual approximation of the variance of $\hat\tau_{y1}$ following SRSwoR is (cf. Gregoire and Valentine, 2008, Section 6.3, Eq. 6.16)
\[ V[\hat\tau_{y1}] = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{rm}^2, \tag{3} \]
where $\sigma_{rm}^2 = \frac{1}{N-1}\sum_{\mathcal{P}} (y_k - R x_k)^2$.

2.2.2 Mean-of-ratios estimator. In addition to $\hat\tau_{y1}$ we consider the "mean-of-ratios" estimator,
\[ \hat\tau_{y2} = \bar{r}\,\tau_x, \tag{4} \]
where $\bar{r}$ is the average ratio $r_k = y_k/x_k$ of the $n$ units in the sample. Its expected value under the SRSwoR design is $E[\hat\tau_{y2}] = \mu_r\tau_x$, where $\mu_r$ is the population average ratio, i.e., $\mu_r = \sum_{\mathcal{P}} r_k/N$. The design bias of $\hat\tau_{y2}$ as an estimator of $\tau_y$ may be deduced straightforwardly as
\[ B[\hat\tau_{y2} : \tau_y] = \sum_{\mathcal{P}} y_k\left(\frac{\mu_x}{x_k} - 1\right). \]
Evidently its bias is impervious to the size of the sample actually selected. Indeed, even when $n = N$, $\hat\tau_{y2} \neq \tau_y$, and according to Sukhatme and Sukhatme (1970, p. 160), this inconsistency has limited its use. The variance of $\hat\tau_{y2}$ is
\[ V[\hat\tau_{y2}] = \tau_x^2\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{mr}^2, \]
where $\sigma_{mr}^2 = \frac{1}{N-1}\sum_{\mathcal{P}} (r_k - \mu_r)^2$, as shown in equation (7) of Goodman and Hartley (1958).

2.2.3 Unbiased ratio-type estimator. Hartley and Ross (1954) introduced the following design-unbiased estimator of $\tau_y$:
\[ \hat\tau_{y3} = \hat\tau_{y2} + \left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}\,\hat\tau_{x\pi}\right), \]
in which $\hat\tau_{y\pi}$ and $\hat\tau_{x\pi}$ are the Horvitz–Thompson estimators of $\tau_y$ and $\tau_x$, respectively. Unbiasedness of $\hat\tau_{y3}$ is obtained because $E\left[\left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}\,\hat\tau_{x\pi}\right)\right] = \tau_y - \mu_r\tau_x$. Goodman and Hartley (1958, Eq. 18) derived the variance of $\hat\tau_{y3}$.
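The three estimators of Sections 2.2.1 to 2.2.3 can be sketched in a few lines. The block below is a minimal illustration, not the authors' code: the function names and the eight-unit toy population are hypothetical, and the design expectation is obtained by brute-force enumeration of every SRSwoR sample, which is feasible only for very small $N$.

```python
from itertools import combinations
from statistics import fmean


def ratio_of_means(y, x, tau_x):
    """Equation (1): (ybar / xbar) * tau_x."""
    return fmean(y) / fmean(x) * tau_x


def mean_of_ratios(y, x, tau_x):
    """Equation (4): rbar * tau_x, with r_k = y_k / x_k."""
    return fmean(yk / xk for yk, xk in zip(y, x)) * tau_x


def hartley_ross(y, x, tau_x, N):
    """Section 2.2.3: mean-of-ratios plus the Hartley-Ross correction term."""
    n = len(y)
    r_bar = fmean(yk / xk for yk, xk in zip(y, x))
    tau_y_pi = N * fmean(y)  # Horvitz-Thompson estimator of tau_y under SRSwoR
    tau_x_pi = N * fmean(x)  # Horvitz-Thompson estimator of tau_x under SRSwoR
    correction = (N - 1) / N * n / (n - 1) * (tau_y_pi - r_bar * tau_x_pi)
    return r_bar * tau_x + correction


def design_expectation(estimator, pop_y, pop_x, n, **kwargs):
    """Exact SRSwoR expectation: average the estimator over all samples of size n."""
    tau_x = sum(pop_x)
    estimates = [
        estimator([pop_y[k] for k in s], [pop_x[k] for k in s], tau_x, **kwargs)
        for s in combinations(range(len(pop_y)), n)
    ]
    return fmean(estimates)


# Hypothetical toy population (NOT the E. nitens data).
pop_x = [55.0, 70.0, 82.0, 96.0, 110.0, 134.0, 160.0, 205.0]
pop_y = [44.0, 51.0, 60.0, 67.0, 77.0, 89.0, 107.0, 132.0]
tau_y = sum(pop_y)
N = len(pop_y)

for name, est, kw in [
    ("ratio-of-means", ratio_of_means, {}),
    ("mean-of-ratios", mean_of_ratios, {}),
    ("Hartley-Ross", hartley_ross, {"N": N}),
]:
    bias = design_expectation(est, pop_y, pop_x, n=3, **kw) - tau_y
    print(f"{name:15s} design bias = {bias: .4f}")
```

Enumeration gives the exact design bias rather than a Taylor approximation, so the Hartley–Ross line should print a bias that is zero up to floating-point rounding, while the other two need not.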
In our notation and after adjusting to a finite population context,
\[ V[\hat\tau_{y3}] = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\left[\sigma_y^2 + \mu_r^2\sigma_x^2 - 2\mu_r C(x, y) + \frac{1}{n-1}\left(\sigma_r^2\sigma_x^2 + C(r, x)^2\right)\right], \]
where $C(x, y)$ is the population covariance, $C(x, y) = \sum_{\mathcal{P}} (x_k - \mu_x)(y_k - \mu_y)/N$, and $C(r, x)$ is analogously defined.

2.3 Ratio Estimation in the Presence of Additive Measurement Error in x

We assume that $x_k$ cannot be measured without error; consequently, in its stead we measure $x^*_k = x_k + \delta_k$, where $\delta_k$ is the measurement error. In keeping with precepts of design-based inference, we assume further that $\delta_k$ is fixed in the sense that repeated measurements of the $k$th unit of $\mathcal{P}$ would result in the same value $x^*_k$. Fixed error is reasonable, for example, in the case where a measure of length is rounded to the nearest centimeter: we presume that repeated measurements of the same length would result in an identical measurement, the error of which would be unknown but have constant magnitude among the repeated measurements. In a remote sensing context LiDAR readings of height will contain error, the magnitude of which will vary among pixels, yet for a given scene it will be fixed for each pixel in the scene. As a further example, measurement error of a fixed magnitude may result from faulty instrumentation, thereby leading to the same magnitude of measurement error among all elements of $\mathcal{P}$. Cochran (1977, Section 13.9) terms the latter "constant bias over all units," yet he discusses the case where such biased measurement only affects $y$, not $x$. As explained in the next section, we shall consider both the case where the magnitude of $\delta_k$ may vary among units, and when it is constant for all units.

In the sequel, the population and sample mean measurement errors are denoted as $\mu_\delta$ and $\bar\delta$, respectively. In an obvious extension to the notation introduced above, let the error-contaminated total, mean, and coefficient of variation of $x^*$ in $\mathcal{P}$ be denoted by $\tau^*_x$, $\mu^*_x$, and $\gamma^*_x$, respectively.
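In similar fashion, the sample mean of $x^*$ is $\bar{x}^*$. The analog to $\hat\tau_{y1}$ with the error-contaminated auxiliary variate is
\[ \hat\tau^*_{y1} = \hat{R}^*\tau^*_x = \hat\tau_{y1}\left(\frac{1 + \mu_\delta/\mu_x}{1 + \bar\delta/\bar{x}}\right), \]
where $\hat{R}^* = \bar{y}/\bar{x}^* = \hat{R}/(1 + \bar\delta/\bar{x})$. The bias of $\hat\tau^*_{y1}$ is analogous to equation (2):
\[ B[\hat\tau^*_{y1} : \tau_y] \approx \left(\frac{1}{n} - \frac{1}{N}\right)\left(\gamma_x^{*2} - \rho^*\gamma^*_x\gamma_y\right)\tau_y. \tag{5} \]
The change in magnitude of bias, which arises from the measurement error, may be expressed usefully by the ratio of equation (5) to that of equation (2), namely,
\[ \frac{B[\hat\tau^*_{y1} : \tau_y]}{B[\hat\tau_{y1} : \tau_y]} = \frac{\gamma^*_x}{\gamma_x}\cdot\frac{\gamma^*_x - \rho^*\gamma_y}{\gamma_x - \rho\gamma_y} = \frac{\mu_y\sigma_x^{*2} - C(x^*, y)(\mu_x + \mu_\delta)}{\mu_y\sigma_x^2 - C(x, y)\mu_x}\left(1 + \frac{\mu_\delta}{\mu_x}\right)^{-2}. \tag{6} \]
The utility of this expression is that it is independent of sample size, $n$. It shows, also, that even when the population mean error, $\mu_\delta$, is identically zero, the bias of $\hat\tau^*_{y1}$ is affected by the variability of measurement error. From equation (3) we deduce that the approximate variance of $\hat\tau^*_{y1}$ is given by
\[ V[\hat\tau^*_{y1}] = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{rm}^{*2}, \tag{7} \]
where
\[ \sigma_{rm}^{*2} = \frac{1}{N-1}\sum_{\mathcal{P}} (y_k - R^* x^*_k)^2 = \frac{1}{N-1}\sum_{\mathcal{P}} \left(y_k - \frac{R(x_k + \delta_k)}{1 + \mu_\delta/\mu_x}\right)^2. \tag{8} \]

With measurement error in the auxiliary variate, the "mean-of-ratios" estimator in equation (4) becomes
\[ \hat\tau^*_{y2} = \bar{r}^*\tau^*_x = \frac{N}{n}\sum_{k \in S} y_k\left(\frac{\mu_x + \mu_\delta}{x_k + \delta_k}\right), \]
where $\bar{r}^*$ is the average ratio $r^*_k = y_k/x^*_k$ of the $n$ units in the sample. Therefore, $E[\hat\tau^*_{y2}] = \mu^*_r\tau^*_x$, and
\[ B[\hat\tau^*_{y2} : \tau_y] = \mu^*_r\tau^*_x - \tau_y = \sum_{\mathcal{P}} y_k\left(\frac{\mu_x + \mu_\delta}{x_k + \delta_k} - 1\right), \tag{9} \]
where $\mu^*_r = N^{-1}\sum_{\mathcal{P}} y_k/x^*_k$. The ratio of the bias of $\hat\tau^*_{y2}$ to $\hat\tau_{y2}$ is
\[ \frac{B[\hat\tau^*_{y2} : \tau_y]}{B[\hat\tau_{y2} : \tau_y]} = \frac{\sum_{\mathcal{P}} y_k\left(\dfrac{\mu_x + \mu_\delta}{x_k + \delta_k} - 1\right)}{\sum_{\mathcal{P}} y_k\left(\dfrac{\mu_x}{x_k} - 1\right)}. \tag{10} \]
The variance of $\hat\tau^*_{y2}$ is
\[ V[\hat\tau^*_{y2}] = (\tau_x + N\mu_\delta)^2\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{mr}^{*2}, \tag{11} \]
where $\sigma_{mr}^{*2} = \frac{1}{N-1}\sum_{\mathcal{P}} \left(\frac{y_k}{x^*_k} - \mu^*_r\right)^2$.

The error-contaminated version of $\hat\tau_{y3}$ is
\[ \hat\tau^*_{y3} = \hat\tau^*_{y2} + \left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}^*\hat\tau^*_{x\pi}\right) = \frac{N}{n}\sum_{k \in S} y_k\left(\frac{\mu_x + \mu_\delta}{x_k + \delta_k}\right) + \left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}^*\hat\tau^*_{x\pi}\right), \]
where $\bar{r}^*$ is the average sample ratio $r^*_k$, as mentioned earlier. With SRSwoR, $\hat\tau^*_{x\pi} = N(\bar{x} + \bar\delta)$.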
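As a numerical companion to equations (5), (6), (9), and (10), the following sketch evaluates the bias expressions directly from a population and a vector of fixed errors. It rests on assumptions that are not in the article: the function names and toy data are hypothetical, and population variances and covariances use the divisor $N$ throughout (the choice cancels in the ratios).

```python
from statistics import fmean


def ratio_of_means_bias(y, x_obs, n):
    """Approximate bias of the ratio-of-means estimator: equation (2) when
    x_obs is the true x, equation (5) when x_obs = x + delta."""
    N = len(y)
    mu_y, mu_x = fmean(y), fmean(x_obs)
    var_x = sum((xk - mu_x) ** 2 for xk in x_obs) / N
    cov_xy = sum((xk - mu_x) * (yk - mu_y) for xk, yk in zip(x_obs, y)) / N
    # gamma_x^2 - rho*gamma_x*gamma_y equals var_x/mu_x^2 - cov_xy/(mu_x*mu_y)
    return (1 / n - 1 / N) * (var_x / mu_x**2 - cov_xy / (mu_x * mu_y)) * sum(y)


def mean_of_ratios_bias(y, x_obs):
    """Exact design bias of the mean-of-ratios estimator, as in equation (9)."""
    mu_x = fmean(x_obs)
    return sum(yk * (mu_x / xk - 1.0) for yk, xk in zip(y, x_obs))


def bias_ratios(y, x, delta, n=7):
    """Equations (6) and (10): bias with measurement error over bias without."""
    x_star = [xk + dk for xk, dk in zip(x, delta)]
    ratio_1 = ratio_of_means_bias(y, x_star, n) / ratio_of_means_bias(y, x, n)
    ratio_2 = mean_of_ratios_bias(y, x_star) / mean_of_ratios_bias(y, x)
    return ratio_1, ratio_2


# Hypothetical toy population (NOT the E. nitens data).
pop_x = [55.0, 70.0, 82.0, 96.0, 110.0, 134.0, 160.0, 205.0]
pop_y = [44.0, 51.0, 60.0, 67.0, 77.0, 89.0, 107.0, 132.0]

for shift in (-10.0, 0.0, 10.0):
    r1, r2 = bias_ratios(pop_y, pop_x, [shift] * len(pop_x))
    print(f"mu_delta = {shift:+5.1f}: ratio (6) = {r1:.3f}, ratio (10) = {r2:.3f}")
```

Because the factor $(1/n - 1/N)$ cancels, the first ratio does not change with `n`, which mirrors the remark that equation (6) is independent of sample size. Both ratios are undefined for a population whose error-free bias is exactly zero.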
Because $E\left[\left(\frac{N-1}{N}\right)\left(\frac{n}{n-1}\right)\left(\hat\tau_{y\pi} - \bar{r}^*\hat\tau^*_{x\pi}\right)\right] = \tau_y - \mu^*_r\tau^*_x$, the design unbiasedness of $\hat\tau^*_{y3}$ is preserved despite measurement error in the $x_k$'s. The variance of $\hat\tau^*_{y3}$ is
\[ V[\hat\tau^*_{y3}] = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\left[\sigma_y^2 + \mu_r^{*2}\sigma_x^{*2} - 2\mu^*_r C(x^*, y) + \frac{1}{n-1}\left(\sigma_r^{*2}\sigma_x^{*2} + C(r^*, x^*)^2\right)\right], \tag{12} \]
where $C(x^*, y) = \sum_{\mathcal{P}} (x^*_k - \mu^*_x)(y_k - \mu_y)/N$, and $C(r^*, x^*) = \sum_{\mathcal{P}} (r^*_k - \mu^*_r)(x^*_k - \mu^*_x)/N$.

3. Empirical Study

The complicated dependence of the bias and variance of $\hat\tau^*_{y1}$, $\hat\tau^*_{y2}$, and $\hat\tau^*_{y3}$ on the mean and variance of the error-contaminated auxiliary variate prevents a general analytical comparison of the relative performance of these estimators. To circumvent this difficulty we examined the statistical properties of these estimators when sampling from a specific population, which we describe below. The empirical portion of this study was undertaken to provide an indication of the magnitude of the effects of additive measurement error in $x_k$ on the estimators presented above, and to examine how the magnitude of these effects changes with the mean and variance of the measurement error process itself.

We imposed various types of measurement error on data collected by Candy (1999). In that study, conducted in Tasmania, Australia, the length, width, and area of Eucalyptus nitens leaves were measured. We computed the product of leaf length and width, and used this erstwhile "rectangular" area as the auxiliary variate for the estimation of total leaf area, $\tau_y$, of the population. Descriptive parameters for leaf area and for the corresponding rectangular area are displayed in Table 1. The marginal distribution of leaf area and the relation between leaf area and rectangular area are shown in Figure 1.

4. Measurement Error Processes

We examined the effect of measurement error in the auxiliary variate in the case where the magnitude of the error was constant for all units in the population.
In addition, we looked at its effect when the magnitude of measurement error varied among population units in accordance with a uniform distribution, a Gaussian distribution, and a beta distribution.

Table 1. Descriptive parameters of the Eucalyptus nitens leaves population (N = 501)

Descriptive parameter            Leaf area (cm²)    Length × width (cm²)
Minimum                          28.1               55.1
Maximum                          146.6              222.2
Mean (µ)                         71.0               99.8
Standard deviation (σ)           21.1               31.5
Total (τ)                        35,575.2           49,988.1
Coefficient of variation (γ)     29.7%              31.6%
Coefficient of skewness          0.6                0.9
Kurtosis                         0.2                0.7
Correlation coefficient (ρ)                         0.96

[Figure 1: panel (a) plots percent of the total against leaf area class (cm²); panel (b) plots leaf area (cm²) against leaf length × width (cm²).]
Figure 1. Eucalyptus nitens leaves population: (a) histogram of leaf areas and (b) relationship between leaf area and length × width.

The case of constant measurement error corresponds to the situation, mentioned above, where instrumentation yields a reading that systematically is larger than it should be, or else is systematically smaller. We examined the effect of such systematic error in the measurement of $x_k$ for the estimation of total E. nitens leaf area, $\tau_y$, when its magnitude was $\delta = -25$ cm² for each $x_k$, i.e., $\delta_k = -25$ cm², $\forall k$. We did likewise when $\delta_k = 25$ cm², $\forall k$, and then at a succession of values on a fine grid within this range.

Uniform measurement error mimics the process of rounding in the recording of $x_k$. In the empirical study, we generated a uniform random number error, $u_k$, for each $x_k$ from a $U[-25, 25]$ distribution. We then applied a multiplicative scaling factor, $f_U$, to each random number, so that the scaled uniformly distributed measurement error was $\delta_k = f_U u_k$. We did this for a minimum value of $f_U = 0$, a maximum value of $f_U = 1$, and at a succession of values on a fine grid within this range.
The case where $f_U = 0$ evidently corresponds to the absence of measurement error.

Gaussian measurement error may result from the agglomeration of errors from independent sources. Example sources may include change in ambient temperature, personnel fatigue, altered battery or signal strength, background noise level, glare, and mental distraction, to name a few. We generated $\delta_k$ from a $N(0, \sigma_\delta^2)$ distribution, where $\sigma_\delta = f_N\sigma_x$, and $\sigma_x$ is the standard deviation of the auxiliary variate in the population of E. nitens leaves. The multiplicative scaling factor, $f_N$, varied from a minimum of 0.0 to a maximum of 0.30, with intermediate values on a fine grid within this range. Here, too, a value of $f_N = 0$ corresponds to the absence of measurement error.

We also wished to examine the effect of skew. For this purpose, we generated $\delta_k$ proportional to a beta-distributed random variate, $b_k \sim \beta(a, b)$, with $a = 2$ and $b = 10$. The proportionality factor was set to $f_\beta = 25p/\max(b_k)$, where $p$ is a proportion that was varied from a minimum of $p = 0$ to a maximum of $p = 1$, with intermediate values on a fine grid within this range. The case where $p = 0$ corresponds to the absence of measurement error. For each set of scaled, beta-distributed measurement errors, $\delta_k$, we deducted the median value from each so that the skewed distribution of measurement errors was centered around zero, as in the uniform and Gaussian cases and the constant measurement error case introduced above.

5. Effect of Constant Measurement Error

In this section, we examine the effect on design-based bias, standard error (SE), and root mean square error (RMSE) caused by the introduction of a systematic (constant) measurement error in the auxiliary variate. That is, $\delta_k = \mu_\delta$, $\forall k$.

5.1 Bias Ratio Trend with Change in Average Error of Measurement

The bias ratio for $\hat\tau^*_{y1}$, equation (6), and for $\hat\tau^*_{y2}$, equation (10), are displayed in Figure 2a for a range of values of $\mu_\delta$ arrayed as a proportion of $\mu_x$.
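The four error processes of Section 4 can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the authors' procedure: the function names and the seeded generator are hypothetical, and $\sigma_x$ is computed with the divisor $N$ (`pstdev`), since the article does not say which divisor was used.

```python
import random
from statistics import median, pstdev


def constant_errors(N, mu_delta):
    """Systematic error: delta_k = mu_delta for every unit."""
    return [mu_delta] * N


def uniform_errors(N, f_u, rng, half_width=25.0):
    """delta_k = f_U * u_k, with u_k drawn from U[-25, 25]."""
    return [f_u * rng.uniform(-half_width, half_width) for _ in range(N)]


def gaussian_errors(x, f_n, rng):
    """delta_k drawn from N(0, sigma_delta^2), with sigma_delta = f_N * sigma_x."""
    sigma_delta = f_n * pstdev(x)
    return [rng.gauss(0.0, sigma_delta) for _ in x]


def beta_errors(N, p, rng, a=2.0, b=10.0, scale=25.0):
    """delta_k = f_beta * b_k minus the median, with b_k ~ Beta(a, b)
    and f_beta = 25 * p / max(b_k)."""
    draws = [rng.betavariate(a, b) for _ in range(N)]
    f_beta = scale * p / max(draws)
    scaled = [f_beta * bk for bk in draws]
    centre = median(scaled)  # centre the skewed errors around zero
    return [dk - centre for dk in scaled]


rng = random.Random(42)  # arbitrary seed, for reproducibility only
x = [rng.uniform(55.0, 220.0) for _ in range(501)]  # stand-in for the auxiliary variate
delta = beta_errors(len(x), p=0.5, rng=rng)
x_star = [xk + dk for xk, dk in zip(x, delta)]  # error-contaminated auxiliary variate
print(f"min error {min(delta):.2f}, median {median(delta):.2f}, max {max(delta):.2f}")
```

Setting the scaling factor (`f_u`, `f_n`, or `p`) to zero returns all-zero errors, matching the error-free cases described in the text.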
Horizontal reference lines have been superimposed at values of −1, 0, and 1 on the vertical axis, as well as a vertical reference line at $\mu_\delta/\mu_x = 0$. As mentioned earlier, these ratios do not depend on sample size, $n$. As seen in this graphic, relative to the bias of $\hat\tau_{y1}$ and $\hat\tau_{y2}$, the bias of $\hat\tau^*_{y1}$ and $\hat\tau^*_{y2}$ when $\mu_\delta < 0$ exceeds that of the corresponding estimator in the absence of measurement error. Conversely, positive measurement error results in reduced bias. However, with sufficiently large positive $\mu_\delta$, the bias of each estimator becomes negative and larger in absolute magnitude than the bias of the corresponding error-free estimator. Nonetheless, at least for this E. nitens leaf population, there is a range, $0 < \mu_\delta \lesssim 0.2\mu_x$, of constant measurement error where the bias is reduced over what it is in the absence of measurement error.

Figure 2b shows the bias of $\hat\tau^*_{y1}$ and $\hat\tau^*_{y2}$, respectively, expressed as a percentage of $\tau_y$. This display serves to emphasize the comparative imperviousness of the bias in $\hat\tau_{y1}$ to this type of measurement error. The results depicted here were computed with equations (5) and (9), presuming a sample of size $n = 7$, which is roughly a 1% sample of the 501 element E. nitens leaf population. For larger sample sizes, the bias of $\hat\tau^*_{y1}$ will lie closer to the zero reference line, whereas the percentage bias trend line for $\hat\tau^*_{y2}$ would be unchanged.

[Figure 2: six panels, each plotted against $\mu_\delta/\mu_x$. Vertical axes: (a) ratio of bias with/without ME; (b) bias as a percentage of $\tau_y$; (c) ratio of SE with/without ME; (d) SE as a percentage of $\tau_y$; (e) ratio of RMSE with/without ME; (f) RMSE as a percentage of $\tau_y$.]
Figure 2. Properties of estimators under systematic measurement error in the x variate.
For $\hat\tau^*_{y1}$ (solid line) and $\hat\tau^*_{y2}$ (dot–dash line), the ratio of bias with:without measurement error in (a) and bias as a percentage of $\tau_y$ in (b). For $\hat\tau^*_{y1}$, $\hat\tau^*_{y2}$, and $\hat\tau^*_{y3}$ (dashed line), the ratio of standard error with:without measurement error in (c) and standard error as a percentage of $\tau_y$ in (d); ratio of RMSE with:without measurement error in (e), and RMSE as a percentage of $\tau_y$ in (f). Results for $\hat\tau^*_{y3}$ are based on samples of size $n = 7$.

5.2 Standard Error Trend with Change in Average Error of Measurement

The ratio of the standard error of $\hat\tau^*_{y1}$ to $\hat\tau_{y1}$ is displayed in Figure 2c with similar traces for the standard error ratios of $\hat\tau^*_{y2}$ and $\hat\tau^*_{y3}$ superimposed. The standard error ratio of $\hat\tau^*_{y3}$, only, depends on $n$, and results in this figure are shown for $n = 7$. When $\mu_\delta < 0$, the standard errors of all three estimators are increased, and when $\mu_\delta > 0$, the precision of all three is improved. Although not easily discernible in Figure 2c, when $\mu_\delta$ is within the region of ±5% of $\mu_x$, the standard error of $\hat\tau^*_{y2}$ is less affected than that of $\hat\tau^*_{y1}$, whereas the standard error of $\hat\tau^*_{y3}$ is more affected. This is evident, also, when the standard errors of the estimators are expressed as a percentage of $\tau_y$ as in Figure 2d. When $\mu_\delta$ is within the region of ±5% of $\mu_x$, the increase or decrease in standard error is a tiny fraction of a percentage point.

By differentiating equation (8) with respect to $\mu_\delta$, we deduce that the standard error of $\hat\tau^*_{y1}$ is minimized at the value of $\mu_\delta$ satisfying the relation $\zeta\mu_\delta = \mu_y - \zeta\mu_x$, where $\zeta = C(x, y)/\sigma_x^2$ is the linear regression coefficient of $y$ on $x$. Equivalently, the minimum occurs where $R^* = \mu_y/(\mu_x + \mu_\delta)$ equals $\zeta$.

5.3 RMSE Trend with Change in Average Error of Measurement

A similar set of graphs is shown in Figure 2e, which portrays the RMSE ratios for $\hat\tau^*_{y1}$, $\hat\tau^*_{y2}$, and $\hat\tau^*_{y3}$, and Figure 2f, which shows the change in RMSE, expressed as a percentage of $\tau_y$, with increasing $\mu_\delta$. As before, only the results pertaining to $\hat\tau^*_{y3}$ depend on the size of the sample.
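The minimizing shift noted in Section 5.2 can be worked out numerically from the relation $\zeta\mu_\delta = \mu_y - \zeta\mu_x$. The sketch below is only indicative: it uses the rounded summary values printed in Table 1, takes $\zeta = \rho\sigma_y/\sigma_x$ (which follows from $C(x, y) = \rho\sigma_x\sigma_y$), and the function name is hypothetical.

```python
def se_minimizing_shift(mu_y, mu_x, sigma_y, sigma_x, rho):
    """Solve zeta * mu_delta = mu_y - zeta * mu_x for mu_delta, where
    zeta = C(x, y) / sigma_x^2 = rho * sigma_y / sigma_x."""
    zeta = rho * sigma_y / sigma_x
    return mu_y / zeta - mu_x


# Rounded Table 1 values for the E. nitens population; rounding in rho alone
# moves the result by a few tenths of a cm^2.
shift = se_minimizing_shift(mu_y=71.0, mu_x=99.8, sigma_y=21.1, sigma_x=31.5, rho=0.96)
print(f"mu_delta = {shift:.1f} cm^2, or {shift / 99.8:.2f} of mu_x")
```

With these rounded inputs the shift comes out near 10.6 cm², roughly a tenth of $\mu_x$, which is in line with the statement that precision improves for small positive $\mu_\delta$.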
In both figures it is apparent that RMSE is affected much more when $\mu_\delta < 0$ than when $\mu_\delta > 0$. When judged by RMSE, $\hat\tau^*_{y2}$ is most affected when $\mu_\delta < 0$ and $\hat\tau^*_{y1}$ is least affected. When expressed as a percentage of $\tau_y$, as in Figure 2f, $\hat\tau^*_{y1}$ is superior regardless of the sign of $\mu_\delta$. All three estimators have smaller RMSE under small positive $\mu_\delta$ than they do in the absence of measurement error.

6. Effect of Variable Measurement Error

The three distributions of measurement error that we investigated (uniform, Gaussian, and beta) all were centered at zero. By using different scaling factors, we were able to vary the spread of the error distributions in a systematic fashion. In the graphical results discussed in this section, we portray the change in bias, standard error, and RMSE of the error-contaminated estimators of $\tau_y$ as a function of the standard deviation, $\sigma_\delta$, of the error distribution expressed as a proportion of the standard deviation, $\sigma_x$, of the auxiliary variate.

The upper panels of Figure 3 display the ratio of the bias of $\hat\tau^*_{y1}$ to $\hat\tau_{y1}$ as $\sigma_\delta$ increases, and the similar ratio of the bias of $\hat\tau^*_{y2}$ to $\hat\tau_{y2}$ is superimposed. From left to right, panels (a), (b), and (c) show the bias ratio, respectively, for uniformly, Gaussian-, and beta-distributed measurement error. In all three cases, the bias of both $\hat\tau^*_{y1}$ and $\hat\tau^*_{y2}$ is increased compared to the bias in the absence of measurement error, becoming greater with increasing dispersion of the error distribution. Under all three error processes $\hat\tau^*_{y2}$ is more sensitive to measurement error than $\hat\tau^*_{y1}$, in the sense that its bias is increased more. For any specified level of $\sigma_\delta$, the bias ratio was greatest when measurement errors were Gaussian distributed, and least when beta distributed.

The inserts in the upper left corner of these panels show the percentage bias of $\hat\tau^*_{y1}$ and $\hat\tau^*_{y2}$ when $n = 7$. Percentage bias increases in a smooth fashion with increasing $\sigma_\delta$.
Compared to $\hat\tau^*_{y2}$, the bias of $\hat\tau^*_{y1}$ is rather insensitive to increasing measurement error dispersion. At any specified level of $\sigma_\delta$, bias in $\hat\tau^*_{y1}$ and $\hat\tau^*_{y2}$ was greatest when measurement errors were Gaussian distributed, and least when beta distributed, although the differences between them are slight. Arguably, the salient message carried by these graphs is that variable measurement error increases the bias of $\hat\tau_{y1}$ and $\hat\tau_{y2}$, and that the increase in bias is directly related to the dispersion of the error distribution.

Panels (a), (b), and (c) in the middle row of graphs in Figure 3 show the ratio of standard error with and without measurement error in the auxiliary variate. It is apparent that the standard error of estimation, like bias, increases directly with increasing error dispersion, and that $\hat\tau^*_{y1}$ and $\hat\tau^*_{y3}$ are less sensitive than $\hat\tau^*_{y2}$ in this regard. As with bias, the Gaussian-distributed errors exert more of an effect on the standard error of estimation than uniformly distributed errors, whereas the beta-distributed errors had the smallest effect. This result is evident when examining the standard error of estimation as a percentage of $\tau_y$, shown in the upper left inserts of the middle row of graphs.

When performance is judged by RMSE, $\hat\tau^*_{y1}$ performs best under all three variable measurement error processes, as shown in panels (a), (b), and (c) of the lowest row of graphs of Figure 3.

7. Simulation Results

For the error-contaminated versions of the E. nitens leaf population described in Section 3 we checked the results presented in Figures 2 and 3 by means of a simulation study in which we drew 30,000 samples from the population of E. nitens leaves contaminated by each of the measurement error processes described in Section 4. In all cases the results displayed in Figures 2 and 3 and the simulation results agreed to within a fraction of a percentage point.
Moreover, when the simulation was repeated with 100,000 samples, the results changed minimally. Sampling was conducted with samples of size 7, 15, and 37. Because of the close similarity of results, we report results for samples of size 7 only.

Aside from serving this confirmatory purpose, the simulation enabled us to evaluate how well the approximations put forth in equations (7), (11), and (12) portray the variance of $\hat\tau^*_{y1}$, $\hat\tau^*_{y2}$, and $\hat\tau^*_{y3}$, respectively, under the different types of measurement error processes we examined. Cochran (1977, pp. 162–163) considered their adequacy in the absence of measurement error.

Results for the case of constant measurement error in the auxiliary variate are displayed in Table 2a. In the absence of measurement error, i.e., when $\mu_\delta = 0$, the standard errors computed from these variance approximations all are within 1% of the Monte Carlo standard errors for all three estimators of $\tau_y$ and for the three sample sizes that we examined. When $\mu_\delta > 0$, the computed standard error of $\hat\tau^*_{y1}$ appears to track the Monte Carlo standard error as well as it does when $\mu_\delta = 0$: there is no apparent pattern of increasing or decreasing deviation from the Monte Carlo error. The same can be said for the deviation of the standard error of $\hat\tau^*_{y2}$ and $\hat\tau^*_{y3}$ from the empirical error observed in the simulation study. When $\mu_\delta < 0$, these approximations to the standard error of estimation perform less well, especially for the largest $n$. However, only in one instance did the deviation from the empirical standard error exceed 2% in absolute magnitude, and that case occurred only when $\mu_\delta$ was −25% of $\mu_x$. Overall, the variance approximations given for all three estimators lead to trustworthy standard errors of estimation under constant measurement error in the auxiliary variate.

Results for variable measurement error are tabulated in Table 2b.
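The kind of comparison summarized in Table 2 can be sketched as follows for the ratio-of-means estimator: compute the approximate standard error from equations (7) and (8), estimate the empirical standard error by repeated SRSwoR sampling, and report the percentage deviation. Everything specific here is an assumption for illustration: the population is synthetic (not the E. nitens data), the function names are hypothetical, the seed is arbitrary, and only 5,000 replicates are drawn instead of 30,000.

```python
import math
import random
from statistics import fmean, stdev


def approx_se_ratio_of_means(y, x_star, n):
    """Square root of equation (7), with sigma*_rm^2 taken from equation (8)."""
    N = len(y)
    R_star = sum(y) / sum(x_star)
    s2 = sum((yk - R_star * xk) ** 2 for yk, xk in zip(y, x_star)) / (N - 1)
    return math.sqrt(N**2 * (1 / n - 1 / N) * s2)


def monte_carlo_se(y, x_star, n, reps, rng):
    """Empirical SE of the ratio-of-means estimator over repeated SRSwoR samples."""
    N, tau_x_star = len(y), sum(x_star)
    estimates = []
    for _ in range(reps):
        s = rng.sample(range(N), n)  # simple random sampling without replacement
        y_bar = fmean(y[k] for k in s)
        x_bar = fmean(x_star[k] for k in s)
        estimates.append(y_bar / x_bar * tau_x_star)
    return stdev(estimates)


def percent_deviation(approx, empirical):
    """Negative values mean the approximation understates the empirical SE."""
    return 100.0 * (approx - empirical) / empirical


rng = random.Random(2009)  # arbitrary seed
x = [rng.uniform(55.0, 220.0) for _ in range(501)]       # synthetic auxiliary variate
y = [0.7 * xk + rng.gauss(0.0, 6.0) for xk in x]         # synthetic variable of interest
x_star = [xk + rng.uniform(-25.0, 25.0) for xk in x]     # uniform measurement error

dev = percent_deviation(
    approx_se_ratio_of_means(y, x_star, n=7),
    monte_carlo_se(y, x_star, n=7, reps=5000, rng=rng),
)
print(f"deviation of approximate SE from Monte Carlo SE: {dev:+.2f}%")
```

With a few thousand replicates the Monte Carlo standard error itself carries noise on the order of one percent, so small deviations should not be over-interpreted.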
Recall that we examined the effects of uniform, Gaussian, and beta-distributed measurement error, where, in each case, the distribution was centered at zero, i.e., $\mu_\delta = 0$. Looking first at the results that pertain to uniformly distributed measurement error, there is an apparent pattern of understatement of actual standard error, which increases in magnitude with increasing $\sigma_\delta$. When $\sigma_\delta = 0.3\sigma_x$, the computed standard errors of $\hat\tau^*_{y1}$, $\hat\tau^*_{y2}$, and $\hat\tau^*_{y3}$ are about 2–4% smaller than what was observed among the 30,000 estimates observed in the simulation study.

[Figure 3: a 3 × 3 grid of panels. Columns: uniform (a), Gaussian (b), and beta (c) measurement error. Rows: bias ratio, standard error ratio, and RMSE ratio, each plotted against $\sigma_\delta/\sigma_x$, with an insert in each panel showing the same quantity as a percentage.]
Figure 3. Bias (first panel row), standard error (second panel row), and RMSE (third panel row) of $\hat\tau^*_{y1}$ (solid line), $\hat\tau^*_{y2}$ (dot–dash line), and $\hat\tau^*_{y3}$ (dashed line) relative to that of $\hat\tau_{y1}$, $\hat\tau_{y2}$, and $\hat\tau_{y3}$, respectively, with uniform (a), Gaussian (b), and beta (c) distributed measurement error in the auxiliary variate. The inner plots represent bias (first panel row), standard error (second panel row), and RMSE (third panel row) expressed as a percentage of $\tau_y$. The horizontal axis of the inserts spans the range $0 \le \sigma_\delta/\sigma_x \le 0.30$. Results for $\hat\tau^*_{y3}$ are based on samples of size $n = 7$.
In contrast, when $\delta \sim N[0, \sigma_\delta^2]$, the computed approximations to the standard errors of $\hat\tau^*_{y2}$ and $\hat\tau^*_{y3}$ tend to overstate the empirically observed standard error, with the overstatement increasing moderately with increasing $\sigma_\delta$. This trend was apparent for $\hat\tau^*_{y1}$ at the large sample sizes ($n = 15$ and 37) that we examined. Results when $\delta$ is distributed as a beta random variable mimic the Gaussian results, but the overstatement is greater for a similar level of $\sigma_\delta$.

Table 2. Percentage deviation of the standard error approximation of $\hat\tau^*_{y1}$, $\hat\tau^*_{y2}$, and $\hat\tau^*_{y3}$ from the Monte Carlo standard error. In part (a), results are shown when the measurement error in the auxiliary variate is a constant magnitude given by $\mu_\delta$; in part (b), results are shown when the measurement error in the auxiliary variate varies according to uniform, Gaussian, and β distributions with dispersion indicated by $\sigma_\delta$. Simulation results are based on 30,000 samples of size $n = 7$.

Whereas overstatement ranged
generally from 0–2% when $\sigma_\delta = 0.3\sigma_x$ under Gaussian measurement error, it ranged from 2–3% under beta measurement error.

Table 2, part (a): constant measurement error; columns give $\mu_\delta/\mu_x$

Estimator             −0.25  −0.20  −0.15  −0.10  −0.05      0   0.05   0.10   0.15   0.20   0.25
$\hat\tau^*_{y1}$     −3.05  −1.90  −0.94  −0.19   0.34   0.62   0.67   0.55   0.36   0.15  −0.03
$\hat\tau^*_{y2}$     −0.74  −0.61  −0.43  −0.20   0.05   0.28   0.45   0.54   0.56   0.53   0.48
$\hat\tau^*_{y3}$     −0.85  −0.76  −0.63  −0.46  −0.24  −0.02   0.19   0.34   0.42   0.43   0.41

Table 2, part (b): variable measurement error; columns give $\sigma_\delta/\sigma_x$

Distribution  Estimator                 0   0.03   0.07   0.10   0.13   0.17   0.20   0.23   0.27   0.30
Uniform       $\hat\tau^*_{y1}$      0.92   0.83   0.49  −0.03  −0.64  −1.89  −2.47  −3.02  −3.54  −4.04
Uniform       $\hat\tau^*_{y2}$      0.44   1.01   1.19   1.03   0.64  −0.43  −0.99  −1.55  −2.10  −2.65
Uniform       $\hat\tau^*_{y3}$      0.29   0.26   0.13  −0.08  −0.34  −0.92  −1.21  −1.48  −1.75  −2.02
Gaussian      $\hat\tau^*_{y1}$      0.49   0.47   0.47   0.46   0.46   0.45   0.43   0.40   0.35   0.29
Gaussian      $\hat\tau^*_{y2}$     −0.31   0.03   0.36   0.67   0.96   1.23   1.50   1.77   2.09   2.47
Gaussian      $\hat\tau^*_{y3}$     −0.05   0.02   0.17   0.39   0.66   0.95   1.25   1.55   1.83   2.09
Beta          $\hat\tau^*_{y1}$      0.76   1.22   1.60   1.88   2.07   2.21   2.20   2.16   2.09   2.00
Beta          $\hat\tau^*_{y2}$      0.26   0.83   1.47   2.07   2.57   3.21   3.35   3.41   3.40   3.32
Beta          $\hat\tau^*_{y3}$      0.21   0.75   1.28   1.76   2.18   2.77   2.97   3.10   3.19   3.25

8. Discussion

This has been a multifaceted study, the major conclusions of which are enumerated below. Although it is impossible to generalize the results of a single empirical study, our results at the very least provide insight into how the effects of measurement error change with increasing $\mu_\delta$ in the case of constant measurement error and on increasing measurement error dispersion in the case of variable measurement error. The analytical expressions for the bias and variance of $\hat\tau^*_{y1}$, $\hat\tau^*_{y2}$, and $\hat\tau^*_{y3}$ have been presented in Section 2.3 and confirmed by the simulation study.

(i) The unbiasedness of $\hat\tau_{y3}$ is preserved even when measurement error contaminates the auxiliary variate. As a referee pointed out, this estimator is unbiased without regard to the auxiliary variate and how or whether it may be contaminated.

(ii) Constant measurement error in the auxiliary variate can occur when the measuring instrument is faulty in a consistent manner.
The effects of such errors are not symmetric: when µ_δ < 0, its effect on the bias, standard errors, and mean square errors of τ*_{y1} and τ*_{y2} differs from its effect when µ_δ > 0. The same can be said regarding its effect on the standard error of τ*_{y3}. When µ_δ > 0, the bias and standard error are both reduced from their values in the absence of measurement error. It will be useful to determine whether this result holds for other study populations. Over a broad range of µ_δ ≠ 0, τ*_{y1} has the smallest RMSE.

(iii) Variable measurement error in the auxiliary variate can occur for any of a number of reasons, e.g., rounding error in the measurement process. In the face of variable measurement error, the bias of τ*_{y1} increases less than that of τ*_{y2}. The RMSEs of all three estimators we examined increase directly with increasing σ_δ, a result which is intuitively clear. The RMSEs of τ*_{y1} and τ*_{y3} are nearly identical, both being less than that of τ*_{y2}.

(iv) The approximations we presented to the variance, and hence standard error, of the three estimators work well under constant measurement error when µ_δ > 0, but deteriorate slightly when µ_δ < 0. Under a variable measurement error process, the magnitude and direction of its effect on the standard error approximation differ from one error distribution to another. Overall, the absolute size of the effect increases directly with increasing σ_δ.

There are a number of ways in which the results presented here can be extended. It would be of interest to determine how estimators of variance are affected by measurement error in the auxiliary variate, as well as how the coverage of nominal (1 − α)100% confidence intervals is affected by such error. When the relationship between y and x does not pass near the origin, the linear regression estimator is more apt, in the model-assisted sense of Särndal et al. (1992).
It is not clear whether the results of measurement error in x observed for the ratio estimators of this article would be magnified or diminished with the regression estimator of τ_y. Although we have taken a design-based approach to infer the effect of fixed measurement error, there may be utility in a model-based approach that postulates a probability distribution for the error of repeated measurement of each unit of the population. Surely for some practitioners who are more accustomed to relying on a presumed model as the basis for statistical inference, this would be a more satisfying approach. Accordingly, we shall report on our study of this approach later. Lastly, the combination of measurement error in both y and x, possibly stemming from different error processes, needs to be examined.

Acknowledgements

The authors wish to thank Jeff Gove, Daniel Mandallaz, Ross Nelson, Al Stage, Göran Ståhl, and Steve Stehman for suggestions, which helped to improve the quality of the manuscript.

References

Breidt, F. J., Opsomer, J. D., Johnson, A. A., and Ranalli, M. G. (2007). Semiparametric model-assisted estimation for natural resource surveys. Survey Methodology 33, 35–44.
Candy, S. G. (1999). Predictive models for integrated pest management of the leaf beetle Chrysophtharta bimaculata in Eucalyptus nitens in Tasmania. Doctoral dissertation, University of Tasmania, Hobart, Australia.
Carroll, R. J. (1998). Measurement Error in Nonlinear Models. Boca Raton, Florida: Chapman & Hall/CRC.
Cochran, W. G. (1977). Sampling Techniques, 3rd edition. New York: John Wiley.
Fuller, W. (1987). Measurement Error Models. New York: John Wiley.
Goodman, L. A. and Hartley, H. O. (1958). The precision of unbiased ratio-type estimators. Journal of the American Statistical Association 53, 491–508 (corrigenda: 1969 64, 1700).
Gregoire, T. G. and Valentine, H. T. (2008). Sampling Strategies for Natural Resources and the Environment. Boca Raton, Florida: Chapman & Hall/CRC.
Hartley, H. O. and Ross, A. (1954). Unbiased ratio estimators. Nature 174, 270–271.
Särndal, C.-E., Swensson, B., and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.
Sukhatme, P. V. and Sukhatme, B. V. (1970). Sampling Theory of Surveys with Applications. Ames: Iowa State University Press.

Received December 2007. Revised April 2008. Accepted May 2008.
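
Illustrative sketch. Conclusion (i) of the Discussion, that the unbiased ratio-type estimator stays unbiased when the auxiliary variate is contaminated, can be checked exactly on a small population by enumerating every simple random sample without replacement. The Python sketch below is not the simulation code used in the study. It rests on several assumptions that are ours, not the article's: the three functions are taken to correspond to the ratio-of-means, mean-of-ratios, and Hartley and Ross (1954) estimators; the measurement error δ_k is fixed for each unit; the auxiliary total used by the estimators is the total of the contaminated values x_k + δ_k; and the eight-unit population and the error values are hypothetical numbers chosen only for illustration.

```python
from itertools import combinations
from statistics import mean


def ratio_of_means(y, x, tau_x, N):
    """Ratio-of-means estimator of the total of y (N is unused)."""
    return tau_x * mean(y) / mean(x)


def mean_of_ratios(y, x, tau_x, N):
    """Mean-of-ratios estimator of the total of y (N is unused)."""
    return tau_x * mean(yk / xk for yk, xk in zip(y, x))


def hartley_ross(y, x, tau_x, N):
    """Hartley-Ross unbiased ratio-type estimator of the total of y."""
    n = len(y)
    r_bar = mean(yk / xk for yk, xk in zip(y, x))
    # The second term is an unbiased correction for the bias of r_bar * tau_x.
    return r_bar * tau_x + n * (N - 1) / (n - 1) * (mean(y) - r_bar * mean(x))


def design_bias(estimator, y_pop, x_pop, n):
    """Exact design-based bias under SRSWOR, found by enumerating all samples."""
    N = len(y_pop)
    tau_x = sum(x_pop)  # auxiliary total of the x values actually used
    estimates = []
    for idx in combinations(range(N), n):
        y = [y_pop[k] for k in idx]
        x = [x_pop[k] for k in idx]
        estimates.append(estimator(y, x, tau_x, N))
    return mean(estimates) - sum(y_pop)


# Hypothetical population (illustration only, not the study population).
y_pop = [12.0, 15.0, 9.0, 20.0, 18.0, 25.0, 14.0, 30.0]
x_pop = [5.0, 6.0, 4.0, 8.0, 7.0, 10.0, 6.0, 12.0]

# Two hypothetical fixed error patterns: constant, and varying by unit.
delta_const = [1.0] * len(x_pop)
delta_var = [0.5, -0.5, 0.2, -0.8, 0.6, -0.3, 0.4, -0.1]

scenarios = {
    "no error": x_pop,
    "constant error": [xk + dk for xk, dk in zip(x_pop, delta_const)],
    "variable error": [xk + dk for xk, dk in zip(x_pop, delta_var)],
}

for label, x_used in scenarios.items():
    for est in (ratio_of_means, mean_of_ratios, hartley_ross):
        bias = design_bias(est, y_pop, x_used, n=3)
        print(f"{label:15s} {est.__name__:15s} bias = {bias: .6f}")
```

Under these assumptions the Hartley-Ross bias should print as zero up to floating-point rounding in all three scenarios, whereas the other two estimators generally show a nonzero bias that changes with the error pattern. Exhaustive enumeration is feasible only for very small populations; for populations of realistic size, Monte Carlo sampling of the kind reported in Table 2 would take its place.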
