

INTRODUCTION AND OUTLINE

This short course is based upon the book

Measurement Error in Nonlinear Models


R. J. Carroll, D. Ruppert and L. A. Stefanski
Chapman & Hall/CRC Press, 1995
ISBN: 0 412 04721 7
http://www.crcpress.com

The project described was supported by Grant Number R44 RR12435 from the National Institutes of Health, National
Center for Research Resources. Its contents are solely the responsibility of the authors and do not necessarily
represent the official views of the National Center for Research Resources.

OUTLINE OF SEGMENT 1
• What is measurement error?
• Some examples
• Effects of measurement error in simple linear regression
• Effects of measurement error in multiple regression
• Analysis of Covariance: effects of measurement error in a covariate on the comparisons of populations
• The correction for attenuation: the classic way of correcting for biases caused
by measurement error

OUTLINE OF SEGMENT 2
• Broad classes of measurement error

∗ Nondifferential: you only measure an error–prone predictor because the error–free predictor is unavailable
∗ Differential: the measurement error is itself predictive of outcome
• Surrogates
∗ Proxies for a difficult–to–measure predictor
• Assumptions about the form of the measurement error: additive and homoscedastic
• Replication to estimate measurement error variance
• Methods to diagnose whether measurement error is additive and homoscedastic

OUTLINE OF SEGMENT 3
• Transportability: using other data sets to estimate properties of measurement
error
• Conceptual definition of an exact predictor
• The classical error model
∗ You observe the real predictor plus error
• The Berkson error model
∗ The real predictor is what you observe plus error
• Functional and structural models defined and discussed

OUTLINE OF SEGMENT 4
• The regression calibration method: replace X by an estimate of it given the
observed data
• Regression calibration is correction for attenuation (Segment 1) in linear regression
• Use of validation, replication and external data
• Logistic and Poisson regression
• Use of an unbiased surrogate to estimate the calibration function

OUTLINE OF SEGMENT 5
• The SIMEX method
• Motivation from design of experiments
• The algorithm

∗ The simulation step
∗ The extrapolation step
• Application to logistic regression
• Application to a generalized linear mixed model

OUTLINE OF SEGMENT 6
• Instrumental variables:

∗ Indirect way to understand measurement error
∗ Often the least informative
• The IV method/algorithm

∗ Why the results are variable
∗ IV estimation as a type of regression calibration
• Examples in logistic regression

OUTLINE OF SEGMENT 7
• Likelihood methods
• The Berkson model and the Utah fallout study
∗ The essential parts of a Berkson likelihood analysis
• The classical model and the Framingham study
∗ The essential parts of a classical likelihood analysis
• Model robustness and computational issues

SEGMENT 1: INTRODUCTION AND LINEAR MEASUREMENT ERROR MODELS REVIEW
OUTLINE

• About This Course
• Measurement Error Model Examples
• Structure of a Measurement Error Problem
• A Classical Error Model
• Classical Error Model in Linear Regression
• Summary

ABOUT THIS COURSE


• This course is about analysis strategies for regression problems in which
predictors are measured with error.
• Remember your introductory regression text ...
∗ Snedecor and Cochran (1967), “Thus far we have assumed that the X-variable in regression is measured without error. Since no measuring instrument is perfect this assumption is often unrealistic.”
∗ Steel and Torrie (1980), “... if the X’s are also measured with error, ... an
alternative computing procedure should be used ...”
∗ Neter and Wasserman (1974), “Unfortunately, a different situation holds if
the independent variable X is known only with measurement error.”

• This course focuses on nonlinear measurement error models (MEMs), with some essential review of linear MEMs (see Fuller, 1987)

EXAMPLES OF MEASUREMENT ERROR MODELS
• Measures of nutrient intake

∗ A classical error model

• Coronary Heart Disease vs Systolic Blood Pressure

∗ A classical error model

• Radiation Dosimetry

∗ A Berkson error model



MEASURES OF NUTRIENT INTAKE


• Y = average daily percentage of calories from fat as measured by a food frequency questionnaire (FFQ).
• X = true long–term average daily percentage of calories from fat
• The problem: fit a linear regression of Y on X
• In symbols, Y = β0 + βx X + ε
• X is never observable. It is measured with error:

MEASURES OF NUTRIENT INTAKE


• Along with the FFQ, on 6 days over the course of a year women are interviewed
by phone and asked to recall their food intake over the past year (24–hour
recalls).
• Their average % Calories from Fat is recorded and denoted by W .
∗ The analysis of 24–hour recall introduces some error =⇒ analysis error
∗ Measurement error = sampling error + analysis error
∗ Measurement error model: Wi = Xi + Ui, where the Ui are measurement errors

HEART DISEASE VS SYSTOLIC BLOOD PRESSURE
• Y = indicator of Coronary Heart Disease (CHD)
• X = true long-term average systolic blood pressure (SBP) (maybe transformed)
• Goal: Fit a logistic regression of Y on X
• In symbols, pr(Y = 1) = H (β0 + βxX)
• Data are CHD indicators and determinations of systolic blood pressure for n = 1,600 in the Framingham Heart Study
• X measured with error:

HEART DISEASE VS SYSTOLIC BLOOD PRESSURE
• SBP measured at two exams (and averaged) =⇒ sampling error
• The determination of SBP is subject to machine and reader variability =⇒
analysis error
∗ Measurement error = sampling error + analysis error
∗ Measurement error model: Wi = Xi + Ui, where the Ui are measurement errors

THE KEY FACTOID OF MEASUREMENT ERROR PROBLEMS
• Y = response, Z = error-free predictor, X = error-prone predictor, W = proxy
for X

• Observed are (Y, Z, W )

• Unobserved is X

• Want to fit a regression model (linear, logistic, etc.)

• In symbols, E(Y |Z, X) = f (Z, X, β)

• Key point: The regression model in the observed data is not the same as the
regression model when X is observed

• In symbols, E(Y|Z, W) ≠ f(Z, W, β)



A CLASSICAL ERROR MODEL

• What you see is the true/real predictor plus measurement error

• In symbols, Wi = Xi + Ui

• This is called additive measurement error

• The measurement errors Ui are:
∗ independent of all Yi, Zi and Xi (independent)
∗ IID(0, σu²) (IID, unbiased, homoscedastic)

SIMPLE LINEAR REGRESSION WITH A CLASSICAL ERROR MODEL
• Y = response, X = error-prone predictor

• Y = β0 + βx X + ε

• Observed data: (Yi, Wi), i = 1, . . . , n

• Wi = Xi + Ui (additive)

• Ui are:
∗ independent of all Yi and Xi (independent)
∗ IID(0, σu²) (IID, unbiased, homoscedastic)

What are the effects of measurement error on the usual analysis?



SIMULATION STUDY
• Generate X1, . . . , X50, IID N(0, 1)

• Generate Yi = β0 + βxXi + ²i

∗ εi IID N(0, 1/9)
∗ β0 = 0
∗ βx = 1

• Generate U1, . . . , U50, IID N(0, 1)

• Set Wi = Xi + Ui

• Regress Y on X and Y on W and compare



[Figure: scatterplot of Reliable Data with fitted line]
Figure 1: True Data Without Measurement Error.

[Figure: scatterplots of Error–prone Data and Reliable Data with fitted lines]
Figure 2: Observed Data With Measurement Error.

THEORY BEHIND THE PICTURES: THE NAIVE ANALYSIS
• Least Squares Estimate of Slope:
β̂x = Sy,w / Sw²
where
Sy,w → Cov(Y, W) = Cov(Y, X + U) = Cov(Y, X) = σy,x
Sw² → Var(W) = Var(X + U) = σx² + σu²

THEORY BEHIND THE PICTURES: THE NAIVE ANALYSIS
So
β̂x → σy,x / (σx² + σu²) = {σx² / (σx² + σu²)} βx

• Note how classical measurement error causes a bias in the least squares regression coefficient

THEORY BEHIND THE PICTURES: THE NAIVE ANALYSIS
• The attenuation factor or reliability ratio describes the bias in linear regression caused by classical measurement error. You estimate λβx, where
λ = σx² / (σx² + σu²)

• Important Factoids:
∗ As the measurement error increases, more bias
∗ As the variability in the true predictor increases, less bias

THEORY BEHIND THE PICTURES: THE NAIVE ANALYSIS
• Least Squares Estimate of Intercept:
β̂0 = Ȳ − β̂x W̄ → μy − λβx μx = β0 + (1 − λ)βx μx
• Estimate of Residual Variance:
MSE → σε² + (1 − λ)βx² σx²

• Note how the residual variance is inflated
∗ Classical measurement error in X causes the regression to have more noise

MORE THEORY: JOINT NORMALITY
• Y, X, W jointly normal =⇒
∗ Y | W ∼ Normal
∗ E(Y | W) = β0 + (1 − λ)βx μx + λβx W
∗ Var(Y | W) = σε² + (1 − λ)βx² σx²
• Intercept is shifted by (1 − λ)βx μx
• Slope is attenuated by the factor λ
• Residual variance is inflated by (1 − λ)βx² σx²

• And simple linear regression is an easy problem!



MORE THEORY: IMPLICATIONS FOR TESTING HYPOTHESES
• Because
βx = 0 iff λβx = 0
it follows that
[H0 : βx = 0] ≡ [H0 : λβx = 0]
which in turn implies that the naive test of βx = 0 is valid (correct Type I error rate).
• The discussion of naive tests when there are multiple predictors measured with error, or error-free predictors, is more complicated
• In the following graph, we show that as the measurement error increases:
∗ Statistical power decreases
∗ Sample size to obtain a fixed power increases
Segment 1 (@ R.J. Carroll & D. Ruppert, 2002) 28

[Figure: Sample Size versus Measurement Error Variance]
Figure 3: Sample Size for 80% Power. True slope βx = 0.75. Variances σx² = σε² = 1.

MULTIPLE LINEAR REGRESSION WITH ERROR
• Model
Y = β0 + βz^t Z + βx^t X + ε
W = X + U is observed instead of X
• Regressing Y on Z and W estimates
(βz*, βx*)^t = Λ (βz, βx)^t ≠ (βz, βx)^t
• Λ is the attenuation matrix or reliability matrix:
Λ = [Σzz Σzx ; Σxz Σxx + Σuu]^{-1} [Σzz Σzx ; Σxz Σxx]
• Biases in components of βx* and βz* can be multiplicative or additive =⇒
∗ Naive test of H0 : βx = 0, βz = 0 is valid
∗ Naive test of H0 : βx = 0 is valid
∗ Naive test of H0 : βx,1 = 0 is typically not valid (βx,1 denotes a subvector of βx)
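
A small numerical sketch of the attenuation matrix Λ in Python (the covariance values below are invented purely for illustration):

    import numpy as np

    # Illustrative moments for scalar Z and scalar X
    Szz, Szx, Sxx, Suu = 1.0, 0.5, 1.0, 0.5

    cov_zw = np.array([[Szz, Szx], [Szx, Sxx + Suu]])   # Cov of observed (Z, W)
    cov_zx = np.array([[Szz, Szx], [Szx, Sxx]])
    Lam = np.linalg.solve(cov_zw, cov_zx)               # attenuation matrix

    beta = np.array([1.0, 1.0])                         # (beta_z, beta_x)
    print(Lam @ beta)   # naive limit: beta_z* = 1.2 (biased!) and beta_x* = 0.6 here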

MULTIPLE LINEAR REGRESSION WITH ERROR
• For X scalar, the attenuation factor changes:
λ1 = σx|z² / (σx|z² + σu²)
∗ σx|z² = residual variance in the regression of X on Z
∗ σx|z² ≤ σx² =⇒
λ1 = σx|z² / (σx|z² + σu²) ≤ σx² / (σx² + σu²) = λ
∗ =⇒ Collinearity accentuates attenuation

MULTIPLE LINEAR REGRESSION WITH ERROR
• Amazingly, classical measurement error in X causes biased estimates of βz:
• Suppose that the regression of X on Z is γ0 + γz Z
• Then what you estimate is
βz* = βz + (1 − λ1) βx γz
• So, there is bias in the coefficient for Z if:
∗ X is correlated with Z
∗ X is a significant predictor (βx ≠ 0) were it to be observed

ANALYSIS OF COVARIANCE
• These results have implications for the two group ANCOVA.

∗ X = true covariate
∗ Z = dummy indicator of group
• We are interested in estimating βz, the group effect. Biased estimates of βz:
βz* = βz + (1 − λ1) βx γz
∗ γz is from E(X | Z) = γ0 + γz^t Z
∗ γz is the difference in the mean of X between the two groups.
∗ Thus, biased unless X and Z are unrelated.
∗ A randomized study makes X and Z unrelated!!!

[Figure: Response versus Predictors]
Figure 4: UNBALANCED ANCOVA. RED = TRUE DATA, BLUE = OBSERVED. SOLID = FIRST GROUP, OPEN = SECOND GROUP. NO DIFFERENCE IN GROUPS.

CORRECTIONS FOR ATTENUATION
Y = β0 + βz^t Z + βx^t X + ε
W = X + U is observed instead of X
• Let Σuu be the measurement error covariance matrix
• Let Σzz be the covariance matrix of the Z’s
• Let Σww be the covariance matrix of the W’s
• Let Σzw be the covariance matrix of the Z’s and W’s
• Ordinary least squares actually estimates
[Σzz Σzw ; Σwz Σww]^{-1} [Σzz Σzw ; Σwz Σww − Σuu] (βz ; βx).
• The correction for attenuation simply fixes this up:
(β̂z,eiv ; β̂x,eiv) = [Σzz Σzw ; Σwz Σww − Σuu]^{-1} [Σzz Σzw ; Σwz Σww] (β̂z,ols ; β̂x,ols).
• In simple linear regression, this means that the ordinary least squares slope is divided by the estimated attenuation factor: β̂x,eiv = β̂x,ols / λ̂.
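
A minimal sketch of this correction in simple linear regression, assuming an estimate of σu² is available from elsewhere (Python; function and variable names are ours):

    import numpy as np

    def correct_for_attenuation(y, w, sigma2_u):
        """Return (naive OLS slope, corrected slope beta_ols / lambda-hat)."""
        beta_ols = np.cov(y, w, ddof=1)[0, 1] / np.var(w, ddof=1)
        # lambda-hat = sigma_x^2 / (sigma_x^2 + sigma_u^2), with sigma_x^2 = var(W) - sigma_u^2
        lam_hat = (np.var(w, ddof=1) - sigma2_u) / np.var(w, ddof=1)
        return beta_ols, beta_ols / lam_hat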

SEGMENT 2: NONLINEAR MODELS AND DATA TYPES
OUTLINE
• Differential and Nondifferential measurement error.
• Estimating error variances:
∗ Validation
∗ Replication
• Using Replication data to check error models

∗ Additivity
∗ Homoscedasticity
∗ Normality

THE BASIC DATA


• A response Y
• Predictors X measured with error.
• Predictors Z measured without error.
• A major proxy W for X.
• Sometimes, a second proxy T for X.

NONDIFFERENTIAL ERROR
• Error is said to be nondifferential if W and T would not be measured if one
could have measured X.
∗ It is not clear how this term arose, but it is in common use.
• More formally, (W, T ) are conditionally independent of Y given (X, Z).
∗ The idea: (W, T ) provide no additional information about Y if X were
observed
• This often makes sense, but it may be fairly subtle in each application.

NONDIFFERENTIAL ERROR
• Many crucial theoretical calculations revolve around nondifferential error.
• Consider simple linear regression: Y = β0 + βx X + ε, where ε is independent of X.
E(Y|W) = E[{E(Y|X, W)}|W]
= E[{E(Y|X)}|W]   (this step uses nondifferential error)
= β0 + βx E(X|W).
∗ This reduces the problem in general to estimating E(X|W).
• If the error is differential, then the second line fails, and no simplification is possible.
• For example,
cov(Y, W) = βx cov(X, W) + cov(ε, W).

HEART DISEASE VS SYSTOLIC BLOOD PRESSURE
– Y = indicator of Coronary Heart Disease (CHD)
– X = true long-term average systolic blood pressure (SBP) (maybe transformed)
– Assume P(Y = 1) = H(β0 + βx X)
– Data are CHD indicators and determinations of systolic blood pressure for n = 1,600 in the Framingham Heart Study
– X measured with error:
∗ SBP measured at two exams (and averaged) =⇒ sampling error
∗ The determination of SBP is subject to machine and reader variability
∗ It is hard to believe that the short–term average of two days carries any additional information about the subject’s chance of CHD over and above true SBP.
∗ Hence, Nondifferential

IS THIS NONDIFFERENTIAL?
• From Tosteson et al. (1989).

• Y = I{wheeze}.

• X is personal exposure to NO2.

• W = (NO2 in kitchen, NO2 in bedroom) is observed in the primary study.



IS THIS NONDIFFERENTIAL?
• From Küchenhoff & Carroll

• Y = I{lung irritation}.

• X is actual personal long–term dust exposure

• W = dust exposure as measured by occupational epidemiology techniques.

∗ They sampled the plant for dust.
∗ Then they tried to match each person to a work area

IS THIS NONDIFFERENTIAL?
• Y = average daily percentage of calories from fat as measured by a food frequency questionnaire (FFQ).

• FFQ’s are in wide use because they are inexpensive

• The non–objectivity (self–report) suggests a generally complex error structure

• X = true long–term average daily percentage of calories from fat

• Assume Y = β0 + βx X + ε

• X is never observable. It is measured with error:

∗ Along with the FFQ, on 6 days over the course of a year women are inter-
viewed by phone and asked to recall their food intake over the past year
(24–hour recalls). Their average is recorded and denoted by W .

WHAT IS NECESSARY TO DO AN ANALYSIS?


• In linear regression with classical additive error W = X + U , we have seen that
what we need is:

∗ Nondifferential error
∗ An estimate of the error variance var(U )

• How do we get the latter information?

• The best way is to get a subsample of the study in which X is observed. This
is called validation.

∗ In our applications, generally not possible.

• Another method is to do replications of the process, often called calibration.

• A third way is to get the value from another similar study.



REPLICATION
• In a replication study, for some of the study participants you measure more
than one W .

• The standard additive model with mi replicates is
Wij = Xi + Uij, j = 1, ..., mi.
• This is an unbalanced 1–factor ANOVA with mean squared error var(U) estimated by
σ̂u² = Σ_{i=1}^n Σ_{j=1}^{mi} (Wij − W̄i•)² / Σ_{i=1}^n (mi − 1).
• Of course, as the proxy or surrogate for Xi one would use the sample mean W̄i•:
W̄i• = Xi + Ūi•
var(Ūi•) = σu² / mi.
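
A sketch of this one–way ANOVA estimate for unbalanced replicate data (Python; here w is a list of per–person arrays, a representation chosen only for illustration):

    import numpy as np

    def sigma2_u_hat(w):
        """w[i] holds the m_i replicates W_ij for person i."""
        num = sum(float(((wi - wi.mean()) ** 2).sum()) for wi in w)
        den = sum(len(wi) - 1 for wi in w)
        return num / den

    # surrogate for X_i: the replicate mean, with var(Wbar_i | X_i) = sigma_u^2 / m_i
    def wbar(w):
        return np.array([wi.mean() for wi in w])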

REPLICATION
• Replication allows you to test whether your model is basically additive with
constant error variance.

• If Wij = Xi +Uij with Uij symmetrically distributed about zero and independent
of Xi, we have a major fact:

∗ The sample mean and sample standard deviation are uncorrelated.

• Also, if Uij are normally distributed, then so too are differences Wi1 − Wi2 =
Ui1 − Ui2.

∗ q-q plots of these differences can be used to assess normality of the measurement errors

• Both procedures can be implemented easily in any package.
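
For instance, a sketch of both checks with scipy/matplotlib, assuming two replicates per person stored in arrays w1 and w2:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    def replication_diagnostics(w1, w2):
        # q-q plot of within-person differences: normality of the errors
        stats.probplot(w1 - w2, dist="norm", plot=plt.subplot(1, 2, 1))
        # s.d. versus mean: under an additive homoscedastic model there is no trend
        mean = (w1 + w2) / 2.0
        sd = np.abs(w1 - w2) / np.sqrt(2.0)   # per-person s.d. from 2 replicates
        ax = plt.subplot(1, 2, 2)
        ax.scatter(mean, sd)
        ax.set_xlabel("mean"); ax.set_ylabel("s.d.")
        plt.show()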



REPLICATION: WISH
• The WISH study measured caloric intake using a 24–hour recall.

∗ There were 6 replicates per woman in the study.

• A plot of the caloric intake data showed that W was nowhere close to being normally distributed in the population.

∗ If additive, then either X or U is not normal.

• When plotting standard deviation versus the mean, it is typical to use the rule that the method “passes” the test if the essential max–to–min ratio is less than 2.0.
∗ A little bit of non–constant variance never hurt anyone. See Carroll & Ruppert (1988)

[Figure: Normal Q–Q plot of WISH Calories, FFQ]
Figure 5: WISH, CALORIC INTAKE, Q–Q plot of Observed data. Caloric intake is clearly not normally distributed.

[Figure: Normal Q–Q plot of pairwise differences, WISH Calories, 24–hour recalls]
Figure 6: WISH, CALORIC INTAKE, Q–Q plot of Differenced data. This suggests that the measurement errors are reasonably normally distributed.

[Figure: s.d. versus mean, WISH Calories, 24–hour recalls]
Figure 7: WISH, CALORIC INTAKE, plot for additivity, loess and OLS. The standard deviation versus the mean plot suggests lots of non–constant variance. Note how the range of the fits violates the 2:1 rule.

REPLICATION: WISH
• Taking logarithms improves all the plots.

[Figure: Normal Q–Q plot of WISH Log-Calories, FFQ]
Figure 8: WISH, LOG CALORIC INTAKE, Q–Q plot of Observed data. The actual logged data appears nearly normally distributed.

[Figure: Normal Q–Q plot of pairwise differences, WISH Log-Calories, 24–hour recalls]
Figure 9: WISH, LOG CALORIC INTAKE, Q–Q plot of Differenced data. The measurement errors appear normally distributed.

[Figure: s.d. versus mean, WISH Log-Calories, 24–hour recalls]
Figure 10: WISH, LOG CALORIC INTAKE, plot for additivity, loess and OLS. The 2:1 rule is not badly violated, suggesting constant variance of the errors. This transformation seems to work fine.

SUMMARY
• Nondifferential error is an important assumption.

∗ In the absence of validation data, it is not a testable assumption.

• Additivity, Normality, Homoscedasticity of errors can be assessed graphically via replication

∗ Sample standard deviation versus sample mean.
∗ q–q plots of differences of within–person replicates.

SEGMENT 3: BASIC CONCEPTUAL ISSUES


• Transportability: what parts of a measurement error model can be assessed
by external data sets

• What is Berkson? What is classical?

• Functional versus structural modeling



TRANSPORTABILITY AND THE LIKELIHOOD


• In linear regression, we have seen that we only require knowing the measurement error variance (after checking for constant variance, additivity, normality).

• Remember that the reliability ratio or attenuation coefficient is
λ = σx² / (σx² + σu²) = var(X) / var(W)
• In general though, more is needed. Let’s remember that if we observe W instead of X, then the observed data have a regression of Y on W that effectively acts as if
E(Y|W) = β0 + βx E(X|W)
≈ β0 + βx λ W.

• If we knew λ, it would be easy to correct for the bias



TRANSPORTABILITY
• It is tempting to try to use outside data and transport this distribution to your
problem.

∗ Bad idea!!!!!!!!!!!!
λ = σx² / (σx² + σu²)

∗ Note how λ depends on the distribution of X.
∗ It is rarely the case that two populations have the same X distribution, even when the same instrument is used.

EXTERNAL DATA AND TRANSPORTABILITY


• A model is transportable across studies if it holds with the same parameters in the two studies.

∗ Internal data, i.e., data from the current study, is ideal since there is no
question about transportability.

• With external data, transportability back to the primary study cannot be taken for granted.

∗ Sometimes transportability clearly will not hold. Then the value of the external data is, at best, questionable.
∗ Even if transportability seems to be a reasonable assumption, it is still just that, an assumption.

EXTERNAL DATA AND TRANSPORTABILITY


• As an illustration, consider two nutrition data sets which use exactly the same
FFQ

• Nurses Health Study
∗ Nurses in the Boston Area
• American Cancer Society
∗ National sample
• Since the same instrument is used, error properties should be about the
same.

∗ But maybe not the distribution of X!!!
∗ var(differences, NHS) = 47
∗ var(differences, ACS) = 45
∗ var(sum, ACS) = 296

[Figure: FFQ histograms of % calories from fat, NHS (pccal.nhs) and ACS (pccal.acs)]
Figure 11: FFQ Histograms of % Calories from Fat in NHS and ACS

THE BERKSON MODEL


• The Berkson model says that
True Exposure = Observed Exposure + Error

X = W + Ub

• Note the difference:

∗ Classical: We observe true X plus error
∗ Berkson: True X is what we observe (W) plus error
∗ Further slides will describe the difference in detail
• In the linear regression model,

∗ Ignoring error still leads to unbiased intercept and slope estimates,
∗ but the error about the line is increased.

WHAT’S BERKSON? WHAT’S CLASSICAL?


• In practice, it may be hard to distinguish between the classical and the Berkson error models.

∗ In some instances, neither holds exactly.
∗ In some complex situations, errors may have both Berkson and classical components, e.g., when the observed predictor is a combination of 2 or more error–prone predictors.

• Berkson model: a nominal value is assigned.

∗ Direct measures cannot be taken, nor can replicates.

• Classical error structure: direct individual measurements are taken, and can be replicated but with variability.

WHAT’S BERKSON? WHAT’S CLASSICAL?


• Direct measures possible?

• Replication possible?

• Classical: We observe true X plus error

• Berkson: True X is what we observe (W ) plus error

• Let’s play stump the experts!

• Framingham Heart Study

∗ Predictor is systolic blood pressure



WHAT’S BERKSON? WHAT’S CLASSICAL?


• All workers with the same job classification and age are assigned the same ex-
posure based on job exposure studies.

• Using a phantom, all persons of a given height and weight with a given recorded
dose are assigned the same radiation exposure.

WHAT’S BERKSON? WHAT’S CLASSICAL?


• Long–term nutrient intake as measured by repeated 24–hour recalls.

FUNCTIONAL AND STRUCTURAL MODELING


• Once you have decided on an error model, you have to go about estimation and inference.

• In classical error models, you have to know the structure of the error.

∗ Additive or multiplicative?
∗ Some experimentation is necessary to give information about the measurement error variance.

• With all this information, you have to decide upon a method of estimation.

• The methods can be broadly categorized as functional or structural.



FUNCTIONAL AND STRUCTURAL MODELING


• The common linear regression texts make the distinction:
∗ Functional: X’s are fixed constants
∗ Structural: X’s are random variables
• If you pretend that the X’s are fixed constants, it seems plausible to try to
estimate them as well as all the other model parameters.
• This is the functional maximum likelihood estimator.
∗ Every textbook has the linear regression functional maximum likelihood estimator.

• Unfortunately, the functional MLE in nonlinear problems has two defects.

∗ It’s really nasty to compute.
∗ It’s a lousy estimator (badly inconsistent).

FUNCTIONAL AND STRUCTURAL MODELING: CLASSICAL ERROR MODELS
• The common linear regression texts make the distinction:
∗ Functional: X’s are fixed constants
∗ Structural: X’s are random variables
• These terms are misnomers.
• All inferential methods assume that the X’s behave like a random sample anyway!
• More useful distinction:

∗ Functional: No assumptions made about the X’s (could be random or fixed)
∗ Classical structural: Strong parametric assumptions made about the distribution of X. Generally normal, lognormal or gamma.

FUNCTIONAL METHODS IN THIS COURSE: CLASSICAL ERROR MODELS
• Regression Calibration/Substitution

∗ Replaces true exposure X by an estimate of it based only on covariates but not on the response.
∗ In linear model with additive errors, this is the classical correction for
attenuation.
∗ In Berkson model, this means to ignore measurement error.
• The SIMEX method (Segment 4) is a fairly generally applicable functional
method.
∗ It assumes only that you have an error model, and that in some fashion you
can “add on” measurement error to make the problem worse.

FUNCTIONAL METHODS: CLASSICAL ERROR MODELS
• The strength of the functional model is its model robustness

∗ No assumptions are made about the true predictors.
∗ Standard error estimates are available.
• There are potential costs.

∗ Loss of efficiency of estimation (missing data problems, highly nonlinear parameters)
∗ Inference comparable to likelihood ratio tests is possible (SIMEX) but not well–studied.

SEGMENT 4: REGRESSION CALIBRATION
OUTLINE
• Basic ideas
• The regression calibration algorithm
• Correction for attenuation
• Example: NHANES-I
• Estimating the calibration function

∗ validation data
∗ instrumental data
∗ replication data

REGRESSION CALIBRATION—BASIC IDEAS


• Key idea: replace the unknown X by E(X|Z, W ) which depends only
on the known (Z, W ).
∗ This provides an approximate model for Y in terms of (Z, W ).
• Developed as a general approach by Carroll and Stefanski (1990) and Gleser
(1990).
∗ Special cases appeared earlier in the literature.
• Generally applicable (like SIMEX).
∗ Depends on the measurement error being “not too large” in order for the approximation to be sufficiently accurate.

THE REGRESSION CALIBRATION ALGORITHM

∗ Using replication, validation, or instrumental data, develop a model for the regression of X on (W, Z).
∗ Replace X by the model fits and run your favorite analysis.
∗ Obtain standard errors by the bootstrap or the “sandwich method.”

• In linear regression, regression calibration is equivalent to the “correction for attenuation.”
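
A hedged end-to-end sketch of the algorithm for logistic regression with two replicates and no Z, using the linear (best linear predictor) form of E(X|W) developed later in this segment (Python with statsmodels; names are ours):

    import numpy as np
    import statsmodels.api as sm

    def regression_calibration_logistic(y, w1, w2):
        wbar = (w1 + w2) / 2.0
        s2u = np.mean((w1 - w2) ** 2) / 2.0           # error variance from differences
        s2x = np.var(wbar, ddof=1) - s2u / 2.0        # var(X) = var(Wbar) - s2u/2
        lam = s2x / (s2x + s2u / 2.0)                 # attenuation for the 2-replicate mean
        xhat = wbar.mean() + lam * (wbar - wbar.mean())   # E(X | Wbar), linear form
        fit = sm.Logit(y, sm.add_constant(xhat)).fit(disp=0)
        return fit.params   # bootstrap over subjects for standard errors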

AN EXAMPLE: LOGISTIC REGRESSION, NORMAL X
• Consider the logistic regression model
pr(Y = 1|X) = {1 + exp(−β0 − βx X)}^{-1} = H(β0 + βx X).

• Remarkably, the regression calibration approximation works extremely well in this case

AN EXAMPLE: POISSON REGRESSION, NORMAL X
• Consider the Poisson loglinear regression model with
E(Y|X) = exp(β0 + βx X).

• Suppose that X and U are normally distributed.
• Then the regression calibration approximation is approximately correct for the mean
• However, the observed data are not Poisson, but are overdispersed
• In other words, and crucially, measurement error can destroy the distributional relationship.

NHANES-I
• The NHANES-I example is from Jones et al. (1987).

• Y = I{breast cancer}.

• Z = (age, poverty index ratio, body mass index, I{use alcohol}, I{family
history of breast cancer}, I{age at menarche ≤ 12}, I{pre-menopause}, race).

• X = daily intake of saturated fat (grams).

• Untransformed surrogate:

∗ saturated fat measured by 24-hour recall.
∗ considerable error ⇒ much controversy about validity.

• Transformation: W = log(5 + measured saturated fat).



NHANES-I—CONTINUED
• Without adjustment for Z, W appears to have a small protective effect

• Naive logistic regression of Y on (Z, W ):

∗ β̂W = −.97, se(β̂W) = .29, p < .001
∗ again evidence for a protective effect.

• Result is sensitive to the three individuals with the largest values of W .

∗ all were non-cases.
∗ changing them to cases: p = .06 and β̂W = −.53, even though only 0.1% of the data are changed.

[Figure: histograms of transformed saturated fat for the Healthy and Breast Cancer groups]
Figure 12: Histograms of log(.05 + Saturated Fat/100) in the NHANES data, for women with and without breast cancer in 10 year follow–up.

NHANES-I—CONTINUED
• External replication data:
∗ CSFII (Continuous Survey of Food Intake by Individuals).
∗ 24-hour recall (W) plus three additional 24-hour recall phone interviews, (T1, T2, T3).
∗ Over 75% of σ²_{W|Z} appears due to measurement error.
• From CSFII:
∗ σ̂²_{W|Z} = 0.217.
∗ σ̂u² = 0.171 (assuming W = X + U)
∗ Correction for attenuation:
β̂x = {σ̂²_{W|Z} / (σ̂²_{W|Z} − σ̂u²)} β̂w = {0.217 / (0.217 − 0.171)} (−.97) = −4.67
∗ 95% bootstrap confidence interval: (−10.37, −1.38).
∗ Protective effect is now much bigger but estimated with much less precision.

ESTIMATING THE CALIBRATION FUNCTION


• Need to estimate E(X|Z, W ).

∗ How this is done depends, of course, on the type of auxiliary data available.

• Easy case: validation data

∗ Suppose one has internal validation data.
∗ Then one can simply regress X on (Z, W) and transport the model to the non-validation data.
∗ For the validation data one regresses Y on (Z, X), and this estimate must be combined with the one from the non-validation data.

• Same approach can be used for external validation data, but with the usual
concern for non-transportability.

ESTIMATING THE CALIBRATION FUNCTION: INSTRUMENTAL DATA: ROSNER’S METHOD
• Internal unbiased instrumental data:

∗ Suppose E(T|X) = E(T|X, W) = X, so that T is an unbiased instrument.
∗ If T is expensive to measure, then T might be available for only a subset of the study. W will generally be available for all subjects.
∗ Then
E(T|W) = E{E(T|X, W)|W} = E(X|W).

• Thus, T regressed on W follows the same model as X regressed on W, although with greater variance.
• One regresses T on (Z, W) to estimate the parameters in the regression of X on (Z, W).

ESTIMATING THE CALIBRATION FUNCTION: REPLICATION DATA
• Suppose that one has unbiased internal replicate data:

∗ n individuals
∗ ki replicates for the ith individual
∗ Wij = Xi + Uij , i = 1, . . . , n and j = 1, . . . , ki, where E(Uij |Zi, Xi) = 0.
∗ W̄i· := (1/ki) Σj Wij.
∗ Notation: µz is E(Z), Σxz is the covariance (matrix) between X and Z, etc.

• There are formulae to implement a regression calibration method in this case. Basically, you use standard least squares theory to get the best linear unbiased predictor of X from (W, Z).
∗ Formulae are ugly; see the next two slides and the book



ESTIMATING THE CALIBRATION FUNCTION: REPLICATION DATA, CONTINUED
• E(X|Z, W)
≈ μx + (Σxx  Σxz) [Σxx + Σuu/k  Σxz ; Σxz^t  Σzz]^{-1} (W̄ − μw ; Z − μz).  (1)
(best linear approximation = exact conditional expectation under joint normality).
• Need to estimate the unknown μ’s and Σ’s.
∗ These estimates can then be substituted into (1).
∗ μ̂z and Σ̂zz are the “usual” estimates since the Z’s are observed.
∗ μ̂x = μ̂w = Σ_{i=1}^n ki W̄i· / Σ_{i=1}^n ki.
∗ Σ̂xz = Σ_{i=1}^n ki (W̄i· − μ̂w)(Zi − μ̂z)^t / ν, where ν = Σ ki − Σ ki² / Σ ki.

ESTIMATING THE CALIBRATION FUNCTION: REPLICATION DATA, CONTINUED
∗ Σ̂uu = Σ_{i=1}^n Σ_{j=1}^{ki} (Wij − W̄i·)(Wij − W̄i·)^t / Σ_{i=1}^n (ki − 1).
∗ Σ̂xx = {Σ_{i=1}^n ki (W̄i· − μ̂w)(W̄i· − μ̂w)^t − (n − 1) Σ̂uu} / ν.
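
A sketch of these moment estimates and of the predictor (1), for scalar X and Z and equal replication ki ≡ k (Python/numpy; names are ours):

    import numpy as np

    def calibrate(wbar, z, k, Suu):
        """E(X|Z,W) from equation (1); wbar = replicate means, Suu = error variance."""
        mu_w, mu_z = wbar.mean(), z.mean()
        Sxx = np.var(wbar, ddof=1) - Suu / k      # var(X) = var(Wbar) - Suu/k
        Sxz = np.cov(wbar, z, ddof=1)[0, 1]       # cov(X, Z) = cov(Wbar, Z)
        Szz = np.var(z, ddof=1)
        C = np.array([[Sxx + Suu / k, Sxz],
                      [Sxz, Szz]])                # covariance of (Wbar, Z)
        dev = np.stack([wbar - mu_w, z - mu_z])   # centered (Wbar, Z), shape (2, n)
        return mu_w + np.array([Sxx, Sxz]) @ np.linalg.solve(C, dev)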

SEGMENT 5, REMEASUREMENT METHODS: SIMULATION EXTRAPOLATION, OUTLINE

• About Simulation Extrapolation

• The Key Idea

• An Empirical Version

• Simulation Extrapolation Algorithm

• Example: Measurement Error in Systolic Blood Pressure

• Summary

ABOUT SIMULATION EXTRAPOLATION


• Restricted to classical measurement error

∗ additive, unbiased, independent in some scale, e.g., log
∗ for this segment:
∗ one variable measured with error
∗ error variance, σu², assumed known
• A functional method
∗ no assumptions about the true X values
• Not model dependent
∗ like bootstrap and jackknife
• Handles complicated problems
• Computer intensive
• Approximate, less efficient for certain problems

THE KEY IDEA


• The effects of measurement error on a statistic can be studied
with a simulation experiment in which additional measurement
error is added to the measured data and the statistic recalculated.

• Response variable is the statistic under study

• Independent factor is the measurement error variance

∗ Factor levels are the variances of the added measurement errors

• Objective is to study how the statistic depends on the variance of the measure-
ment error

OUTLINE OF THE ALGORITHM


• Add measurement error to the variable measured with error
∗ θ controls the amount of added measurement error
∗ σu² increased to (1 + θ)σu²

• Recalculate estimates — called pseudo estimates

• Plot pseudo estimates versus θ

• Extrapolate to θ = −1
∗ θ = −1 corresponds to the case of no measurement error

[Figure: Illustration of SIMEX; Coefficient versus Measurement Error Variance, with the Naive Estimate marked]
Figure 13: Your estimate when you ignore measurement error.

[Figure: Illustration of SIMEX; pseudo estimates at larger measurement error variances]
Figure 14: This shows what happens to your estimate when you have more error, but you still ignore the error.

[Figure: Illustration of SIMEX; a curve fitted through the pseudo estimates]
Figure 15: What statistician can resist fitting a curve?

[Figure: Illustration of SIMEX; the SIMEX Estimate at the extrapolated point θ = −1]
Figure 16: Now extrapolate to the case of no measurement error.

OUTLINE OF THE ALGORITHM


• Add measurement error to the variable measured with error
∗ θ controls the amount of added measurement error
∗ σu² increased to (1 + θ)σu²

• Recalculate estimates — called pseudo estimates. Do many times and average for each θ

• Plot pseudo estimates versus θ

• Extrapolate to θ = −1
∗ θ = −1 corresponds to the case of no measurement error

AN EMPIRICAL VERSION OF SIMEX: FRAMINGHAM DATA EXAMPLE
• Data

∗ Y = indicator of CHD
∗ Wk = SBP at Exam k, k = 1, 2
∗ X = “true” SBP
∗ Data, 1660 subjects:
(Yj , W1,j , W2,j ), j = 1, . . . , 1660

• Model Assumptions

∗ W1, W2 | X iid N(X, σu²)
∗ Pr(Y = 1 | X) = H(α + βX), H logistic

FRAMINGHAM DATA EXAMPLE: THREE NAIVE ANALYSES
• Regress Y on W̄ → β̂_Average
• Regress Y on W1 → β̂1
• Regress Y on W2 → β̂2

θ | Predictor Measurement Error Variance = (1 + θ)σu²/2 | Slope Estimate
−1 | 0 | ?
0 | σu²/2 | β̂_A
1 | σu² | β̂1, β̂2

Figure 17: Logistic regression fits in Framingham using first replicate, second replicate and average of both.

Figure 18: A SIMEX–type plot for the Framingham data, where the errors are not computer–generated.

Figure 19: A SIMEX–type extrapolation for the Framingham data, where the errors are not computer–generated.

SIMULATION AND EXTRAPOLATION STEPS: EXTRAPOLATION
• Framingham Example: (two points θ = 0, 1)
∗ Linear Extrapolation — a + bθ

• In General: (multiple θ points)
∗ Linear: a + bθ
∗ Quadratic: a + bθ + cθ²
∗ Rational Linear: (a + bθ)/(c + θ)

SIMULATION AND EXTRAPOLATION ALGORITHM
• Simulation Step
• For θ ∈ {θ1, . . . , θM}
• For b = 1, ..., B, compute:
∗ the bth pseudo data set
W_{b,i}(θ) = Wi + √θ × Normal(0, σu²)_{b,i}
∗ the bth pseudo estimate
θ̂_b(θ) = θ̂[{Yi, W_{b,i}(θ)}_{i=1}^n]
∗ the average of the pseudo estimates
θ̂(θ) = B^{-1} Σ_{b=1}^B θ̂_b(θ) ≈ E[θ̂_b(θ) | {Yj, Xj}_{j=1}^n]

SIMULATION AND EXTRAPOLATION ALGORITHM
• Extrapolation Step
• Plot θ̂(θ) vs θ (θ > 0)
• Extrapolate to θ = −1 to get θ̂(−1) = θ̂_SIMEX
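
A compact sketch of both steps for a least squares slope (Python; B, the θ grid and the quadratic extrapolant are our own choices):

    import numpy as np

    def simex_slope(y, w, sigma2_u, thetas=(0.5, 1.0, 1.5, 2.0), B=200, seed=0):
        rng = np.random.default_rng(seed)
        means = []
        for t in thetas:
            # simulation step: total error variance becomes (1 + theta) * sigma2_u
            bs = [np.polyfit(w + np.sqrt(t * sigma2_u) * rng.standard_normal(len(w)),
                             y, 1)[0] for _ in range(B)]
            means.append(np.mean(bs))
        # extrapolation step: fit a quadratic in theta, evaluate at theta = -1
        return np.polyval(np.polyfit(thetas, means, 2), -1.0)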

EXAMPLE: MEASUREMENT ERROR IN SYSTOLIC BLOOD PRESSURE
• Framingham Data:
à !
Yj , Agej , Smokej , Cholj WA,j , j = 1, . . . , 1615

∗ Y = indicator of CHD
∗ Age (at Exam 2)
∗ Smoking Status (at Exam 1)
∗ Serum Cholesterol (at Exam 3)
∗ Transformed SBP
WA = (W1 + W2) /2,
Wk = ln (SBP − 50) at Exam k

• Consider logistic regression of Y on Age, Smoke, Chol and SBP with transformed
SBP measured with error

EXAMPLE: PARAMETER ESTIMATION

• The plots on the following page illustrate the simulation extrapolation method
for estimating the parameters in the logistic regression model

[Figure: four SIMEX extrapolation panels, Coefficient versus Lambda, for Age, Smoking, Cholesterol and Log(SBP-50)]
Figure 20: SIMEX extrapolations for the coefficients in the Framingham logistic regression.

EXAMPLE: VARIANCE ESTIMATION


• The pseudo estimates can be used for variance estimation.
∗ The theory is similar to that for jackknife and bootstrap variance estimation.

∗ The calculations, too involved to review here, are similar as well. See Chapter
4 of our book.
• In many cases, with decent coding, you can use the bootstrap to estimate
the variance of SIMEX.

A MIXED MODEL
• Data from the Framingham Heart Study
• There were m = 75 clusters (individuals) with most having n = 4 exams, each
taken 2 years apart.
• The variables were
∗ Y = evidence of LVH (left ventricular hypertrophy) diagnosed by ECG in
patients who developed coronary heart disease before or during the study
period
∗ W = log(SBP-50)
∗ Z = age, exam number, smoking status, body mass index.
∗ X = average log(SBP-50) over many applications within 6 months (say) of
each exam.

A MIXED MODEL
• We fit this as a logistic mixed model, with a random intercept for each
person having mean β0 and variance θ.
• We assumed that measurement error was independent at each visit.

[Figure: Framingham Data, SIMEX extrapolations; all intraindividual variability due to error; beta(SBP) and theta versus (m.e. var added on)/(m.e. var)]
Figure 21: LVH Framingham data. β(SBP) is the coefficient for transformed systolic blood pressure, while θ is the variance of the person–to–person random intercept.

SUMMARY
• Bootstrap-like method for estimating bias and variance due to measurement
error
• Functional method for classical measurement error
• Not model dependent
• Computer intensive
∗ Generate and analyze several pseudo data sets
• Approximate method like regression calibration

SEGMENT 6: INSTRUMENTAL VARIABLES
OUTLINE
• Linear Regression
• Regression Calibration for GLIM’s

LINEAR REGRESSION
• Let’s remember what the linear model says.
Y = β0 + βx X + ε;
W = X + U;
U ∼ Normal(0, σu²).

• We know that if we ignore measurement error, ordinary least squares estimates not βx, but instead it estimates
λβx = βx σx² / (σx² + σu²)

• λ is the attenuation coefficient or reliability ratio
• Without information about σu², we cannot estimate βx.

INFORMATION ABOUT MEASUREMENT ERROR
• Classical measurement error: W = X + U, U ∼ Normal(0, σu²).
• The most direct and efficient way to get information about σu2 is to observe
X on a subset of the data.
• The next best way is via replication, namely to take ≥ 2 independent
replicates

∗ W1 = X + U1
∗ W2 = X + U2.
• If these are indeed replicates, then we can estimate σu2 via a components of
variance analysis.
• The third and least efficient method is to use Instrumental Variables, or IV’s

∗ Sometimes replicates cannot be taken.
∗ Sometimes X cannot be observed.

WHAT IS AN INSTRUMENTAL VARIABLE?

Y = β0 + βx X + ε;
W = X + U;
U ∼ Normal(0, σu²).

• In linear regression, an instrumental variable T is a random variable which has three properties:
∗ T is independent of ε
∗ T is independent of U
∗ T is related to X.
∗ You only measure T to get information about measurement error: it is not
part of the model.
∗ In our parlance, T is a surrogate for X!

WHAT IS AN INSTRUMENTAL VARIABLE?


• Whether T qualifies as an instrumental variable can be a difficult
and subtle question.
∗ After all, we do not observe U, X or ε, so how can we know that the assumptions are satisfied?

AN EXAMPLE

X = usual (long–term) average intake of Fat (log scale);
Y = Fat as measured by a questionnaire;
W = Fat as measured by 6 days of 24–hour recalls;
T = Fat as measured by a diary record

• In this example, the time ordering was:

∗ Questionnaire
∗ Then one year later, the recalls were done fairly close together in time.
∗ Then 6 months later, the diaries were measured.
• One could think of the recalls as replicates, but some researchers have worried
that major correlations exist, i.e., they are not independent replicates.
• The 6–month gap with the recalls and the 18–month gap with the questionnaire make the diary records a good candidate for an instrument.

INSTRUMENTAL VARIABLES ALGORITHM


• The simple IV algorithm in linear regression works as follows:
STEP 1: Regress W on T (may be a multivariate regression)
STEP 2: Form the predicted values of this regression
STEP 3: Regress Y on the predicted values.
STEP 4: The regression coefficients are the IV estimates.
• Only Step 3 changes if you do not have linear regression but instead have logistic
regression or a generalized linear model.

∗ Then the “regression” is logistic or GLIM.
∗ Very simple to compute.
∗ Easily bootstrapped.
• This method is “valid” in GLIM’s to the extent that regression calibration is
valid.
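
A sketch of the four steps (Python with statsmodels; for a GLIM, swap the last OLS for, e.g., sm.Logit, as noted above):

    import numpy as np
    import statsmodels.api as sm

    def iv_estimate(y, w, t):
        # STEPS 1-2: regress W on T and form the predicted values
        w_hat = sm.OLS(w, sm.add_constant(t)).fit().fittedvalues
        # STEPS 3-4: regress Y on the predictions; coefficients are the IV estimates
        fit = sm.OLS(y, sm.add_constant(w_hat)).fit()
        return fit.params   # slope equals slope(Y on T) / slope(W on T)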

USING INSTRUMENTAL VARIABLES: MOTIVATION
• In what follows, subscripts denote which regression a coefficient comes from.
• For example, βY|1X is the coefficient for X in the regression of Y on (1, X); we write β0,Y|1X for the corresponding intercept.
• Let’s do a little algebra:
Y = β0,Y|1X + βY|1X X + ε;
W = X + U;
(ε, U) independent of T.
• This means
E(Y | T) = β0,Y|1T + βY|1T T
= β0,Y|1X + βY|1X E(X | T)
= β0,Y|1X + βY|1X E(W | T)

MOTIVATION
E(Y | T) = β0,Y|1T + βY|1T T
= β0,Y|1X + βY|1X E(X | T)
= β0,Y|1X + βY|1X E(W | T)
= β0,Y|1T + βY|1X βW|1T T.
• We want to estimate βY|1X
• Algebraically, this means that the slope of Y on T is the product of the slope for Y on X times the slope for W on T:
βY|1T = βY|1X βW|1T



MOTIVATION
∗ Equivalently, it means
βY|1X = βY|1T / βW|1T.
∗ Regress Y on T and divide its slope by the slope of the regression of W on T!

THE DANGERS OF A WEAK INSTRUMENT


• Remember that we get the IV estimate using the relationship
βY|1X = βY|1T / βW|1T.
• This means we divide
(Slope of Regression of Y on T) / (Slope of Regression of W on T).

• The division causes increased variability.

∗ If the instrument is very weak, the slope βW |1T will be near zero.
∗ This will make the IV estimate very unstable.
• It is generally far more efficient in practice to take replicates and get a good estimate of the measurement error variance than it is to “hope and pray” with an instrumental variable.

OTHER ALGORITHMS
• The book describes other algorithms which improve upon the simple algorithm,
in the sense of having smaller variation.

• The methods are described in the book, but are largely algebraic and difficult
to explain here.

• However, for most generalized linear models the two methods are fairly similar
in practice.

FIRST EXAMPLE
• WISH Data (Women’s Interview Study of Health).

X = usual (long–term) average intake of Fat (log scale);
Y = Fat as measured by a Food Frequency Questionnaire;
W = Fat as measured by 6 days of 24–hour recalls;
T = Fat as measured by a diary record

• Recall the algorithm:

∗ Regress W on T
∗ Form predicted values
∗ Regress Y on the predicted values.
• Dietary intake data have large error, and signals are difficult to find.

[Figure: FFQ versus Average of 24–hour recalls; slope of Y on W = 0.52]
Figure 22: WISH Data: Regression of FFQ (Y) on Mean of Recalls (W).

[Figure: FFQ versus Average of 6 Days of Diaries; slope of Y on T = 0.48]
Figure 23: WISH Data: Regression of FFQ (Y) on Mean of Diaries (T).

[Figure: Recall Average versus Average of 6 days of diaries; slope of W on T = 0.51, slope of Y on T = 0.48]
Figure 24: WISH Data: Regression of mean of recalls (W) on mean of diaries (T)

[Figure: FFQ versus Predicted 24–hour recall; slope of Y on predicted W = 0.94; slope of W on T = 0.51, slope of Y on T = 0.48]
Figure 25: WISH Data: Regression of FFQ (Y) on the predictions from the regression of recalls (W) on diaries (T)

[Figure: bootstrap histograms, Method of Moments (betaxmom) and IV (betaxiv), WISH log–Calories]
Figure 26: Bootstrap sampling, comparison with SIMEX and Regression Calibration

[Figure: bootstrap histograms, Method of Moments (betaxmom) and IV (betaxiv), only 1 food record as IV]
Figure 27: Bootstrap sampling, comparison with SIMEX and Regression Calibration, when the instrument is of lower quality and only one of the diaries is used.

FURTHER ANALYSES
• The naive analysis has
∗ Slope = 0.4832
∗ OLS standard error = 0.0987
∗ Bootstrap standard error = 0.0946
• The instrumental variable analysis has
∗ Slope = 0.8556
∗ Bootstrap standard error = 0.1971
• For comparison purposes, the analysis which treats the 6 24–hour recalls
as independent replicates has
∗ Slope = 0.765
∗ Bootstrap standard error = 0.1596
• Simulations show that if the 24–hour recalls were really replicates, then the EIV
estimate is less variable than the IV estimate.

SEGMENT 7: LIKELIHOOD METHODS
OUTLINE


• Nevada A–bomb test site data

∗ Berkson likelihood analysis

• Framingham Heart Study

∗ Classical likelihood analysis

• Extensions of the models

• Comments on Computation

NEVADA A–BOMB TEST FALLOUT DATA


• In the early 1990’s, Richard Kerber (University of Utah) and colleagues investigated the effects of 1950’s Nevada A–bomb tests on thyroid neoplasm in exposed children.

• Data were gathered from Utah, Nevada and Arizona.

• Dose to the thyroid was measured by a complex modeling process (more later)

• If the true dose on the log scale is X, and other covariates are Z, fit a logistic regression model:
pr(Y = 1|X, Z) = H[Z^T βz + log{1 + βx exp(X)}].

NEVADA A–BOMB TEST FALLOUT DATA


• Dosimetry in radiation cancer epidemiology is a difficult and time–consuming
process.

• In the fallout study, many factors were taken into account

∗ Age at exposure
∗ Amount of milk drunk
∗ Milk producers
∗ I–131 (a radioisotope) deposition on the ground
∗ Physical transport models from milk and vegetables to the thyroid

• Essentially all of these steps have uncertainties associated with them.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 134

NEVADA A–BOMB TEST FALLOUT DATA


• The investigators worked initially in the log scale, and propagated errors and
uncertainties through the system.

∗ Much of how they did this is a mystery to us.


∗ They took published estimates of the measurement errors in food frequency
questionnaire reports of milk intake.

∗ They also had estimates of the measurement errors in ground deposition of
I–131.

∗ And they had subjective estimates of the errors in transport from milk to
the human to the thyroid.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 135

NEVADA A–BOMB TEST FALLOUT DATA


• Crucially, and as usual in this field, the data file contained not only the esti-
mated dose of I–131, but also an uncertainty associated with this dose.

• For purposes of today we are going to assume that the errors are Berkson in
the log scale:

Xi = Wi + Ubi.

∗ The variance of Ubi is the uncertainty in the data file:

var(Ubi) = σ²bi (known).

• And to repeat, the dose–response model of major interest is

pr(Y = 1|X, Z) = H[Z^T βz + log{1 + βx exp(X)}].
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 136

Utah Study

[Scatterplot: SD of log dose against log dose, by state (Utah, Nevada, Arizona).]

Figure 28: Log(Dose) and estimated uncertainty in the Utah Data.


Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 137

Utah Study: Black = Neoplasm

[Scatterplot: SD of log dose against log dose, by state (Utah, Nevada, Arizona).]

Figure 29: Log(Dose) and estimated uncertainty in the Utah Data. Large black octagons are the 19 cases of thyroid neoplasm. Note the neoplasm for a person with no dose.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 138

Utah Study: Black = Neoplasm

[Scatterplot: SD of log dose against log dose for the thyroid neoplasm cases, by state.]

Figure 30: Log(Dose) and estimated uncertainty in the Utah Data for the thyroid neoplasm cases, by state.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 139

BERKSON LIKELIHOOD ANALYSIS


• How do we analyze such data?

• We propose that, in the Berkson model, the only realistic methods available
for this complex, heteroscedastic, nonlinear logistic model are those based on
likelihood.

• Let’s see if we can understand what the likelihood is for this problem.

• The first step in any likelihood analysis is to write out the likelihood if there
were no measurement error.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 140

BERKSON LIKELIHOOD ANALYSIS


• As a generality, we have a likelihood function for the underlying model in terms
of a parameter Θ:

log{fY|Z,X(y|z, x, Θ)}
= Y log( H[Z^T βz + log{1 + βx exp(X)}] )
+ (1 − Y) log( 1 − H[Z^T βz + log{1 + βx exp(X)}] ).
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 141

BERKSON LIKELIHOOD ANALYSIS


• The next step in the Berkson context is to write out the likelihood function
of true exposure given the observed covariates.

• As a generality, this is

fX|Z,W(x|z, w, A) = σb^{-1} φ{(x − w)/σb};
φ(z) = (2π)^{-1/2} exp(−z²/2).

• This calculation is obviously dependent upon the problem, and can be more or
less difficult.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 142

BERKSON LIKELIHOOD ANALYSIS


• Likelihood for underlying model: fY |Z,X (y|z, x, Θ)

• Likelihood for error model: fX|Z,W (x|z, w, A)

• We observe only (Y, W, Z).

• Likelihood for Y given (W, Z) is

fY|W,Z(y|w, z, Θ, A)
= ∫ fY,X|W,Z(y, x|w, z, Θ, A) dx
= ∫ fY|Z,X(y|z, x, Θ) fX|Z,W(x|z, w, A) dx.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 143

BERKSON LIKELIHOOD ANALYSIS


• The likelihood function fY |W,Z (y|w, z, Θ, A) can be computed by numerical
integration.

• The maximum likelihood estimate maximizes the loglikelihood of all the data:

L(Θ, A) = Σ_{i=1}^n log fY|Z,W(Yi|Zi, Wi, Θ, A).

• A function maximization program can then be used; a sketch follows below.
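A hedged sketch of this computation (our illustration, not the authors' code), using Gauss–Hermite quadrature for the integral over X given W; the arrays y, z, w, sd_b (per-subject Berkson SDs) and starting value theta0 are hypothetical:

```python
# Berkson-model likelihood for the logistic excess-relative-risk model,
# pr(Y=1|X,Z) = H[Z'bz + log{1 + bx*exp(X)}], with X | W ~ N(W, sd_b^2).
# Gauss-Hermite quadrature handles the integral over X for each subject.
import numpy as np
from scipy.optimize import minimize

nodes, wts = np.polynomial.hermite.hermgauss(40)

def p_case(x, zi, bz, bx):
    eta = zi @ bz + np.log1p(bx * np.exp(x))   # bx assumed positive
    return 1.0 / (1.0 + np.exp(-eta))          # H(.), the logistic function

def neg_loglik(theta, y, z, w, sd_b):
    bz, bx = theta[:-1], theta[-1]
    ll = 0.0
    for yi, zi, wi, si in zip(y, z, w, sd_b):
        x = wi + np.sqrt(2.0) * si * nodes     # quadrature points for X | W = wi
        p = p_case(x, zi, bz, bx)
        like = np.sum(wts * np.where(yi == 1, p, 1.0 - p)) / np.sqrt(np.pi)
        ll += np.log(like)
    return -ll

# fit = minimize(neg_loglik, theta0, args=(y, z, w, sd_b), method="Nelder-Mead")
```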


Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 144

BERKSON LIKELIHOOD ANALYSIS: SUMMARY


• Berkson error modeling is relatively straightforward in general.

• Likelihood for underlying model: fY |Z,X (y|z, x, Θ)

∗ Logistic nonlinear model

• Likelihood for error model: fX|Z,W (x|z, w, A)

∗ In our case, the Utah study data file tells us the Berkson error variance for
each individual.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 145

BERKSON LIKELIHOOD ANALYSIS: SUMMARY

• Overall likelihood computed by numerical integration:

fY|W,Z(y|w, z, Θ, A) = ∫ fY|Z,X(y|z, x, Θ) fX|Z,W(x|z, w, A) dx.

• The maximum likelihood estimate maximizes

L(Θ, A) = Σ_{i=1}^n log fY|Z,W(Yi|Zi, Wi, Θ, A).
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 146

Utah Study: LR chi–square

[Plot: likelihood ratio χ2 against gamma, for the naive and Berkson analyses.]

Figure 31: Likelihood Ratio χ2 tests for naive and Berkson analyses. Note that the dose effect is statistically significant for both, but that the estimate of γ is larger for the naive than for the Berkson analysis. Very strange.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 147

CLASSICAL ERROR LIKELIHOOD METHODS—MAIN IDEAS

• There are major differences and complications in doing a likelihood analysis
for the classical error problem.

• We will discuss these issues, but once we do we are in business.

• INFERENCE AS USUAL:

∗ Maximize the likelihood to get point estimates.


∗ Invert the Fisher information matrix to get standard errors.
∗ Generate likelihood ratio tests and confidence intervals.
∗ These are generally more accurate than those based on normal
approximations.
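To make this "inference as usual" recipe concrete, here is one hedged possibility (a sketch, not the authors' code): once a negative log-likelihood such as the Berkson sketch shown earlier is coded, a quasi-Newton maximizer gives the estimates and an approximate inverse information matrix in one pass.

```python
# Sketch: point estimates and standard errors from a generic negative
# log-likelihood (e.g., the hypothetical neg_loglik sketched earlier).
import numpy as np
from scipy.optimize import minimize

def mle_and_se(neg_loglik, theta0, *data):
    """MLE and SEs; SEs come from the BFGS approximation to the
    inverse observed information matrix."""
    fit = minimize(neg_loglik, theta0, args=data, method="BFGS")
    return fit.x, np.sqrt(np.diag(fit.hess_inv))

# usage (hypothetical): theta_hat, se = mle_and_se(neg_loglik, theta0, y, z, w, sd_b)
```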
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 148

CLASSICAL ERROR LIKELIHOOD METHODS—STRENGTHS

• STRENGTHS: can be applied to a wide class of problems

∗ including discrete covariates with misclassification

• Efficient

∗ makes use of assumptions about the distribution of X.


∗ can efficiently combine different data types, e.g., validation data
with data where X is missing.
∗ Linear measurement error with missing data is a case where maximum like-
lihood seems much more efficient than functional methods.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 149

CLASSICAL ERROR LIKELIHOOD METHODS—WEAKNESSES
• Need to parametrically model every component of the data (struc-
tural not functional)

∗ Need a parametric model for the unobserved predictor.


∗ robustness is a major issue because of the strong parametric as-
sumptions.
∗ Special computer code may need to be written,
∗ but one can use packaged routines for numerical integration and optimization.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 150

FRAMINGHAM HEART STUDY DATA


• The aim is to understand the relationship between coronary heart disease (CHD
= Y ) and systolic blood pressure (SBP) in the presence of covariates (age and
smoking status).

• SBP is known to be measured with error.

∗ If we define X = log(SBP − 50), then about 1/3 of the variability in the
observed values W is due to error.

∗ Classical error is reasonable here.

∗ The measurement error variance is essentially known to equal σ²u = 0.01259.

• Here is a q–q plot of the observed SBP’s (W ), along with a density estimate.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 151

Normal QQ plot, Framingham Data

[Normal q–q plot of log(SBP − 50) against quantiles of the standard normal; lambda = 0.0, theta = 50.]

Figure 32: q–q plot in Framingham for log(SBP − 50)


Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 152

Framingham, theta = 50, lambda = 0.0

[Kernel density estimate (width = 1.0) with the best fitting normal density (solid); AD statistic for normality = 0.70.]

Figure 33: Kernel density estimate and best fitting normal density plot in Framingham for log(SBP − 50)
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 153

FRAMINGHAM HEART STUDY DATA


• We will let age and smoking status be denoted by Z.

• A reasonable model is logistic regression:

pr(Y = 1|X, Z) = H(β0 + βz^T Z + βx X)
= 1/[1 + exp{−(β0 + βz^T Z + βx X)}].

• A reasonable error model is

W = X + U,  σ²u = 0.01259.

• W is only very weakly correlated with Z. Thus, a reasonable model for X
given Z is

X ∼ Normal(µx, σ²x).
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 154

FRAMINGHAM HEART STUDY DATA


• We have now specified everything we need to do a likelihood analysis.

∗ A model for Y given (X, Z)


∗ A model for W given (X, Z)
∗ A model for X given Z.

• The unknown parameters are β0, βz, βx, µx, σ²x.

• We need a formula for the likelihood function, and for this we need a little theory.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 155

LIKELIHOOD WITH AN ERROR MODEL


• Assume that we observe (Y, W, Z) on every subject.

• fY |X,Z (y|x, z, β) is the density of Y given X and Z.

∗ this is the underlying model of interest.


∗ the density depends on an unknown parameter β.

• fW |X,Z (w|x, z, U) is the conditional density of W given X and Z.

∗ This is the error model.


∗ It depends on another unknown parameter U.

• fX|Z(x|z, A) is the density of X given Z, depending on the parameter A. This
is the model for the unobserved predictor. This density may be hard
to specify, but it is needed. This is where model robustness becomes a big
issue.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 156

LIKELIHOOD WITH AN ERROR MODEL—CONTINUED
• The joint density of (Y, W) given Z is

fY,W|Z(y, w|z, β, U, A)
= ∫ fY,W,X|Z(y, w, x|z) dx
= ∫ fY|X,Z,W(y|x, z, w, β) fW|X,Z(w|x, z, U) fX|Z(x|z, A) dx
= ∫ fY|X,Z(y|x, z, β) fW|X,Z(w|x, z, U) fX|Z(x|z, A) dx.

∗ The assumption of nondifferential measurement error is used here,
so that fY|X,W,Z = fY|X,Z.
∗ The integral will usually be calculated numerically.
∗ The integral is replaced by a sum if X is discrete.
∗ Note that fY,W|Z depends on fX|Z — again this is why robustness is a worry.
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 157

LIKELIHOOD WITH AN ERROR MODEL—CONTINUED
• The log-likelihood for the data is, of course,

L(β, U, A) = Σ_{i=1}^n log fY,W|Z(Yi, Wi|Zi, β, U, A).

• The log–likelihood is often computed numerically.

• Function maximizers can then be used to carry out the likelihood analysis.


Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 158

LIKELIHOOD WITH AN ERROR MODEL—CONTINUED
• If X is scalar, generally the likelihood function can be computed numerically
and then maximized by a function maximizer:

fY,W|Z(y, w|z, β, U, A) = ∫ fY|X,Z(y|x, z, β) fW|X,Z(w|x, z, U) fX|Z(x|z, A) dx.

• We did this in the Framingham data.

∗ We used starting values for β0, βz, βx, µx, σ²x from the naive analysis which
ignores measurement error.
∗ We will show you the profile loglikelihood functions for βx for both analyses.
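A hedged sketch of this Framingham-type computation (our illustration, not the slides' actual code): with Y|X,Z logistic, W = X + U with known error variance, and X ∼ Normal(µx, σ²x) independent of Z, the joint density factors as f(w) times an integral over the normal conditional density of X given W, which Gauss–Hermite quadrature handles. The arrays y, z, w and parameter vector theta are hypothetical:

```python
# Classical-error likelihood sketch: logistic outcome model, additive normal
# measurement error with known variance, and X ~ N(mu_x, var_x).
# Uses f(y, w) = f(w) * Integral of f(y|x) f(x|w) dx, with X | W normal.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

nodes, wts = np.polynomial.hermite.hermgauss(40)

def neg_loglik(theta, y, z, w, var_u=0.01259):
    b0, bz, bx = theta[0], theta[1:-3], theta[-3]
    mu_x, var_x = theta[-2], np.exp(theta[-1])   # log-variance keeps var_x > 0
    lam = var_x / (var_x + var_u)                # attenuation factor
    m = mu_x + lam * (w - mu_x)                  # E(X | W)
    v = lam * var_u                              # var(X | W)
    ll = 0.0
    for yi, zi, wi, mi in zip(y, z, w, m):
        x = mi + np.sqrt(2.0 * v) * nodes        # quadrature points for X | W
        eta = b0 + zi @ bz + bx * x
        p = 1.0 / (1.0 + np.exp(-eta))
        f_y_given_w = np.sum(wts * np.where(yi == 1, p, 1.0 - p)) / np.sqrt(np.pi)
        ll += np.log(f_y_given_w) + norm.logpdf(wi, mu_x, np.sqrt(var_x + var_u))
    return -ll

# fit = minimize(neg_loglik, theta0, args=(y, z, w), method="Nelder-Mead")
```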
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 159

Framingham Heart Study


15
Naive
Classical

10
LR chi square

0.0 0.5 1.0 1.5 2.0 2.5 3.0


beta for transformed SBP

Figure 34: Profile likelihoods for SBP in Framingham Heart Study.


Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 160

A NOTE ON COMPUTATION
• It is almost always better to standardize the covariates to have sample
mean zero and sample variance one.

• Especially in logistic regression, this improves the accuracy and stability of nu-
merical integration and likelihood maximization.
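For illustration, this standardization is one line in numpy; the covariate matrix z below is a hypothetical stand-in, with subjects in rows:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=50.0, scale=10.0, size=(100, 2))   # hypothetical covariates

# Center and scale each covariate column to sample mean 0, sample variance 1.
z_std = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)
```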
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 161

A NOTE ON COMPUTATION
• Not all problems are amenable to numerical integration to com-
pute the log–likelihood

∗ Mixed GLIMs are just such a case.

∗ In fact, for mixed GLIMs, the likelihood function is not computable in
closed form even with no measurement error.

• In these cases, specialized tools are necessary; Monte–Carlo EM is one
example (McCulloch, 1997, JASA; Booth & Hobert, 1999, JRSS–B).
Segment 7 (@ R.J. Carroll & D. Ruppert, 2002) 162

EXTENSIONS OF THE MODELS


• It’s relatively easy to write down the likelihood of complex, nonstandard models.

∗ So likelihood analysis is a good option when the data or scientific knowledge
suggest a nonstandard model.

• For example, multiplicative measurement error will often make sense. These are
additive models in the log scale, e.g., the Utah data.
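∗ To spell out the reduction (an added remark): if W = X · V with positive
multiplicative error V, then log(W) = log(X) + log(V), an additive error
model in the log scale.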

• Generally, the numerical issues are no more or less difficult for multiplicative
error.
