S-PLUS Missing Data Analysis Library
September 2001
Insightful Corporation
Seattle, Washington
Proprietary Notice: Insightful Corporation owns both this software program and its documentation. Both the program and documentation are copyrighted with all rights reserved by Insightful Corporation.
The correct bibliographical reference for this document is:
Schimert, J., Schafer, J.L., Hesterberg, T., Fraley, C., & Clarkson, D.B., Analyzing Data with Missing Values in S-PLUS. Insightful Corporation, Seattle, WA.
Printed in the United States.
CONTENTS
Chapter 1 Introduction 1
Overview 2
Imputing Missing Data with S+MISSINGDATA 5
S+MISSINGDATA Features 8
Using S+MISSINGDATA 9
Using This Manual 11
Chapter 2 Background 13
Overview 14
Taxonomy of Missing Data Methods 15
Imputation 17
Model Fitting Algorithms 20
Multiple Imputation Using DA 23
Using the EM and DA Algorithms in Conjunction 25
Chapter 6 Imputation 63
Overview 64
Imputing Data 65
The Class of impute Objects 68
Bibliography 165
1  INTRODUCTION

Overview 2
Model-Based Approaches 3
Model-Based Multiple Imputation 3
Imputing Missing Data with S+MISSINGDATA 5
Workflow 7
S+MISSINGDATA Features 8
Using S+MISSINGDATA 9
Starting and Quitting S+MISSINGDATA 9
Organizing Your Working Data 9
Getting Help 10
Using This Manual 11
Intended Audience 11
Organization 11
Typographic Conventions 12
OVERVIEW
Missing data cripple most routines in statistical packages, which typically expect a complete data set (a data set with no missing values). The common practice is to artificially create a complete data set as follows:

• Throw away cases with missing values, or
• Impute (estimate and fill in) missing data using some ad hoc method.

The analyst then treats the altered data set as if:

• The deleted cases had never been observed, or
• The imputed values had always been observed.

These and other ad hoc methods can lead to misleading inferences because they either throw away or distort information in the data. More principled methods require methodology and computational techniques that can be expensive to implement.
The S-PLUS library S+MISSINGDATA extends the statistical modeling capabilities of S-PLUS to support model-based missing data methods as outlined in Little and Rubin (1987). These may be applied more or less routinely to handle a wide variety of missing data problems. The models are fit using a variety of computational tools, including:

1. The Expectation-Maximization (EM) algorithm (Dempster, Laird, and Rubin (1977)) and extensions (see Rubin (1992) for a review).
2. Data Augmentation (DA) algorithms (Tanner and Wong (1987), Schafer (1991), Schafer (1997)). These are Markov chain Monte Carlo methods (Gelfand and Smith (1990), Gelman and Rubin (1992), Geyer (1992), Smith and Roberts (1993), Tierney (1991)). One important property is that these DA algorithms also produce proper multiple imputations (Rubin (1987)), which are discussed at length below.
This chapter briefly discusses model-based methods, including multiple imputation. It explains how this software for missing data adds to the collection of S-PLUS modeling functions and expands the S-PLUS modeling paradigm to incorporate multiple imputation. It explains the steps you will take in using this software to perform statistical analysis on data with missing values, and organizes the functions and objects by these steps.
IMPUTING MISSING DATA WITH S+MISSINGDATA
[Figure 1.1: S-PLUS modeling functions combine data and models to produce a fitted model object.]
[Figure 1.2: The role of multiple imputation objects and functions in S-PLUS. The asterisk (*) indicates components that are the same as in Figure 1.1.]
Workflow

The workflow for using S+MISSINGDATA can be broken down into distinct stages:

• Explore. Look for and understand patterns in the missing data.
• Preprocess. Process the data to create an object that contains information needed by the model fitting algorithms. By creating this object once, calculation can be saved if the model fitting functions are used several times.
• Fit. Fit a missing data model.

For multiple imputation, the additional steps are:

• Impute. Create M complete data sets, starting the imputation algorithm from the parameters of the fitted missing data model.
• Analyze. Analyze the completed data sets with respect to a standard analysis model to produce M fitted analysis objects.
• Consolidate. Combine the M fitted analysis objects to obtain a single inference that incorporates uncertainty due to missing values.
S+MISSINGDATA FEATURES
Table 1.1 organizes the objects and functions available in S+MISSINGDATA by the activities specified in the workflow on page 7.

Table 1.1: Objects and functions in the S+MISSINGDATA library, organized by activities in the workflow.

Activity       Objects and functions
Explore        miss
Preprocess     preGauss, preLoglin, preCgm
Fit            mdGauss, emGauss, daGauss, mdLoglin, emLoglin, daLoglin, mdCgm, emCgm, daCgm
Impute         impGauss, impLoglin, impCgm
Analyze        miApply, miEval, miList, miVariable
Consolidate    miMean, miVar, miMeanSE, miChiSquareTest, miFTest, miLikelihoodTest
USING S+MISSINGDATA
If you are familiar with S-PLUS, getting started with S+MISSINGDATA is simple. If you have not used S-PLUS before, consult the S-PLUS User's Guide; we recommend that you learn more about S-PLUS before proceeding with S+MISSINGDATA.
Starting and Quitting S+MISSINGDATA

To start S+MISSINGDATA, you must first start S-PLUS. See the S-PLUS User's Guide for detailed instructions on starting S-PLUS.

To add the S+MISSINGDATA functions to your S-PLUS session, type the following at the S-PLUS command line:

> library(missing)

In S-PLUS for Windows, you can also select File → Load Library from the main menu to add S+MISSINGDATA to your session.

If you plan to use S+MISSINGDATA extensively, you may want to customize your S-PLUS start-up routine to automatically attach the S+MISSINGDATA library. You can do this by adding the line library(missing) to your .First function. If you do not already have a .First function, you can create one from the S-PLUS command line by typing:
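The command itself is not preserved here; a minimal sketch, assuming that attaching the library is the only start-up action you need, is:

> .First <- function() library(missing)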
Organizing Your Working Data

To help you organize the data you analyze with S+MISSINGDATA, you can create separate directories for individual projects. In this section, we briefly describe how to create project directories in both UNIX and Windows. For a detailed discussion, see the S-PLUS User's Guide.
UNIX

To create a specific project directory in S-PLUS for UNIX, use the CHAPTER utility. To then work in a particular project, simply start S-PLUS from that project's directory. For example, to create and use the directory missingdir for an S+MISSINGDATA project, type the following commands from the UNIX prompt:

mkdir missingdir
cd missingdir
Splus CHAPTER
Splus

In these commands, Splus should be replaced with whatever you type to start S-PLUS on your system.
Windows

To create a specific project directory in S-PLUS for Windows, use the Open S-PLUS Project dialog. If this dialog does not automatically appear when you start S-PLUS, choose Options → General Settings from the main menu, click the Startup tab, and check the Prompt for project folder box. The next time you launch S-PLUS, the Open S-PLUS Project dialog appears, in which you can specify a project folder for the duration of your session. If the folder you select does not already exist, S-PLUS creates and initializes it for you.
Getting Help

S+MISSINGDATA provides help files for virtually all functions included in the library. For example, you can obtain help on the function impGauss by typing the following at the S-PLUS command line:

> help(impGauss)

In S-PLUS for Windows, you can also select Help → Available Help → missing after loading S+MISSINGDATA into your session. Note that some functions intended for internal use do not have help files.
USING THIS MANUAL
Organization

The main body of this book is divided into 11 chapters that guide you step-by-step through the S+MISSINGDATA library.

• Chapter 1 (this chapter) introduces you to S+MISSINGDATA, lists its features, and tells you how to use this manual.
• Chapter 2 briefly gives background information, which may be skimmed at first and revisited as needed.
• Chapters 3 to 8 describe each step in the workflow given on page 7.
• Chapters 9 to 11 provide examples using the functions and objects in S+MISSINGDATA.
Typographic Conventions

S-PLUS commands in this manual are shown with the > prompt; when a command continues over more than one line, the continuation lines begin with the + prompt. For example, the following two commands are equivalent:

> miss(myData)
> miss(
+ myData)
2  BACKGROUND

Overview 14
Taxonomy of Missing Data Methods 15
Omit Cases with Missing Values 15
Imputation 15
Weighting 15
Model-Based Approaches 16
Imputation 17
Single Imputation 17
Multiple Imputation 17
Model Fitting Algorithms 20
Expectation-Maximization (EM) 20
Data Augmentation (DA) 21
Multiple Imputation Using DA 23
Using the EM and DA Algorithms in Conjunction 25
OVERVIEW
This chapter discusses background information regarding model-based missing data methods. You may want to skim this chapter at first and return to it when needed as you read the rest of the manual.

To put model-based methods into context, we briefly describe common approaches to handling missing data, with additional details on imputation. Two algorithms, expectation-maximization (EM) and data augmentation (DA), are described for fitting missing data models.

The DA algorithm may be used to produce either multiple imputations or multiple sets of parameter estimates. The average of the parameter estimates may be used as a point estimate; their variability indicates the additional uncertainty due to missing data. Whether you use DA to produce multiple imputations or parameter estimates, assessing convergence is an important practical problem. To address this problem, we discuss simple diagnostic procedures that may be enough to assess convergence in the missing data models described here.

Finally, we describe how the EM and DA algorithms complement each other in analysis.
TAXONOMY OF MISSING DATA METHODS
Note
The methods discussed here are not mutually exclusive. For example, S+MISSINGDATA provides a model-based approach to multiple imputation.
Omit Cases with Missing Values

Omitting cases with missing values is easy to do and may be satisfactory with small amounts of missing data. However, it can lead to serious biases, and this approach is usually not very efficient.
Imputation

With imputation methods, you estimate and fill in missing values, then analyze the resulting complete data set by standard methods. To obtain valid inferences, the standard analyses must be modified to account for the differing status of the observed and imputed values. Single imputation replaces each missing value by a single imputed value. Multiple imputation replaces each missing value by a vector of M ≥ 2 imputed values, and thereby shares the advantages of single imputation while overcoming its disadvantages. We discuss this in more detail in the section Imputation on page 17.
Weighting

Weighting is used mostly for unit missingness, in which the values for all variables in a case are missing. Respondents and non-respondents are grouped together into a relatively small number of classes based on other variables, such as survey design variables, that are recorded for both respondents and non-respondents. The non-respondents are assigned weights of zero, and the weights of the remaining cases are proportionately inflated so that the total weight of the cases within cells is preserved.
Model-Based Approaches

In a model-based approach, you define a model for the missing data and base inferences on the likelihood or posterior distribution under that model. Parameters are estimated by procedures such as maximum likelihood or iterative simulation.
IMPUTATION
One advantage of imputation over the other methods described in the previous section is that, once the missing values have been imputed, standard analysis methods can be applied to the complete data. Imputation is also advantageous in contexts where the data producer (collector) and consumer (data analyst) are different individuals:

• The producer may have access to information and resources for creating imputations that are not available to the consumer;
• The created set of official imputations tends to increase the comparability of analyses of the same data set;
• The possibly substantial effort required to create sensible imputations need be carried out only once.
Single Imputation

Single imputation replaces each missing value in a data set by a single imputed value. While this is a straightforward approach to filling in missing data, it does not provide valid inferences that adjust for observed differences between respondents and non-respondents. In addition, single imputation does not provide standard errors that reflect the reduced sample size, nor does it display sensitivity of inferences to various plausible models for nonresponse.
MODEL FITTING ALGORITHMS
Posterior step (P-step). Given $Y_{mis}^{(t+1)}$, draw $\theta^{(t+1)}$ from its complete-data posterior $P[\theta \mid Y_{obs}, Y_{mis}^{(t+1)}]$.

With a sample of independent, identically distributed, incomplete multivariate data, the following is true:

$$P[Y_{mis} \mid Y_{obs}, \theta] = \prod_{i=1}^{n} P[y_{i(mis)} \mid y_{i(obs)}, \theta].$$

Here, $y_{i(mis)}$ and $y_{i(obs)}$ are the missing and observed parts of the $i$th row of data, respectively. Thus, in the I-step above, the missing data are imputed independently for each row.
MULTIPLE IMPUTATION USING DA
• The subsequence $\{\theta^{(t)}; t = 1, 2, \ldots\}$ has a stationary distribution of $P[\theta \mid Y_{obs}]$.
• The subsequence $\{Y_{mis}^{(t)}; t = 1, 2, \ldots\}$ has a stationary distribution of $P[Y_{mis} \mid Y_{obs}]$.

This implies that, for sufficiently large $s$,

$$(\theta^{(t+s)}, Y_{mis}^{(t+s)}) \sim P[\theta, Y_{mis} \mid Y_{obs}].$$
3  EXPLORING AND PREPROCESSING

Overview 28
Exploring Patterns of Missingness 29
Initial Explorations 29
The miss Function 31
Preprocessing Data 35
OVERVIEW
Most data analyses begin by exploring the data, often graphically.
When there are missing values in the data, additional tools are
necessary to analyze patterns of missingness. In particular, the EM
and DA algorithms require initial analysis of the patterns in missing
data. To accomplish this, S+MISSINGDATA includes functions that
preprocess the data. If you perform this preprocessing once at the
beginning of an analysis, it need not be repeated every time you
apply EM or DA. As described in Chapter 2, the EM and DA
algorithms are often used in a complementary fashion and called
several times, so preprocessing can save considerable resources over
the course of a large analysis.
In this chapter, we discuss graphical and numerical techniques for
discovering patterns in missing data, some of which are implemented
in the miss function and its methods. In the final section of this
chapter, we discuss model-specific preprocessing functions that
compute and return information required by the EM and DA fitting
algorithms.
EXPLORING PATTERNS OF MISSINGNESS
> is.na(health$Hyp)
[1] T F F T F T F F F T T T F F F T F F F F T F F F F
> which.na(health$Hyp)
[1] 1 4 6 10 11 12 16 21
> anyMissing(health$Hyp)
[1] T
> numberMissing(health$Hyp)
[1] 8
$Age:
numeric(0)
$Hyp:
[1] 1 4 6 10 11 12 16 21
$BMI:
[1] 1 3 4 6 10 11 12 16 21
$Chl:
[1] 1 4 10 11 12 15 16 20 21 24
> colSums(is.na(health))
> rowSums(is.na(health))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
3 0 1 3 0 2 0 0 0 3 3 3 0 0 1 3 0 0 0 1 3 0
23 24 25
0 1 0
> round(cor(is.na(health)), 2)
The miss Function

The miss function facilitates the discovery of patterns in missing data by grouping together similar variables and observations. The output of the miss function is an object of class "miss". You can use the print, summary, and plot methods to display the information in a miss object.

For example, create a miss object for the built-in health data set and then print it:
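The command itself is not preserved here; given the object name used in the summary call below, it was presumably:

> M <- miss(health)
> M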
> summary(M)
Breakdown by variable
V O name Missing % missing
1 2 Hyp 8 32
2 3 BMI 9 36
3 4 Chl 10 40
V = Variable number used below, O = Original number (before
sorting)
No missing values for variables:
Age
See Figure 3.1 for the plots that result from the following commands:

[Figure 3.1: Plots of the miss object for the health data set.]
If your data set has a missing value code other than NA, you should change it to NA before calling miss. For example, the following command changes all missing values in the vector x from -9 to NA:
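The command is not preserved; a minimal sketch using standard S-PLUS replacement subscripting is:

> x[x == -9] <- NA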
PREPROCESSING DATA
Preprocessing functions in S+MISSINGDATA process a data set to create an object that contains the information needed for the EM and DA model fitting algorithms. Note that the preprocessing functions are model-specific; see Chapter 4 for detailed descriptions of the models mentioned here.

• To fit a Gaussian imputation model, use the preGauss function to preprocess the data. This returns an object of class "preGauss". For example, the following preprocesses the built-in cholesterol data:
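The call is a sketch; preGauss is documented above, but the exact invocation on a data frame and the object name are assumptions:

> cholesterol.pre <- preGauss(cholesterol)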
4  FITTING A MISSING DATA MODEL

Overview 38
Missing Data Models 39
The Gaussian Model 39
The Loglinear Model 41
The Conditional Gaussian Model 44
S-PLUS Implementation 48
Fitting a Gaussian Model 49
Fitting a Loglinear Model 51
Fitting a Conditional Gaussian Model 54
OVERVIEW
Once you've explored the patterns of missingness in your data and preprocessed the data for the fitting algorithms, the next step is to fit a model. The model you fit is the distribution assumed for the complete data (the missing and observed data together).

S+MISSINGDATA implements three models for independent, identically distributed (iid) observations: the Gaussian model for numeric variables, the loglinear model for factor variables, and the conditional Gaussian model for both numeric and factor variables.
This chapter describes these three models and their associated priors,
and then shows how to fit the models using functions in
S+MISSINGDATA.
MISSING DATA MODELS
The Gaussian Model

The Gaussian model handles missing data when all the variables are numeric.

Model

Let $Y_1, \ldots, Y_p$ be numeric variables in which values are recorded for $n$ cases, so that the complete data form an $n \times p$ data frame $Y$. The cases are assumed to be independent and identically distributed multivariate Gaussians with mean $\mu$ and covariance $\Sigma$.
Prior distribution
In complete data problems, using a normal inverted-Wishart prior
distribution leads to a conjugate analysis. The posterior distribution is
again normal inverted-Wishart with updated parameters involving the
data and prior parameters. In the presence of missing data, though,
this family is not conjugate in general. However, using this family of
distributions is computationally convenient for the EM and DA
model fitting algorithms. This is because both algorithms depend on
the complete data problem being tractable; see the section S-PLUS Implementation on page 48. Further details may be found in Schafer
(1997).
A normal inverted-Wishart prior distribution means the following. Given $\Sigma$, the mean $\mu$ is assumed to have a conditional Gaussian distribution:

$$\mu \mid \Sigma \sim N(\mu_0, \tau^{-1}\Sigma)$$
Note that $\mu$ does not appear on the right side of this equation; its distribution is assumed to be uniform.
With an informative prior, you choose reasonable values for the hyperparameters by interpreting them as a summary of the information provided by an imaginary set of data. The value $\mu_0$ is the best guess as to what $\mu$ is, $\tau$ is the number of imaginary prior observations on which $\mu_0$ is based, and $m^{-1}\Lambda^{-1}$ is the best guess for $\Sigma$. The parameter $m$ is the number of imaginary prior degrees of freedom on which $m^{-1}\Lambda^{-1}$ is based.
A ridge prior is useful for stabilizing the inference about $\mu$ when the sample covariance matrix is singular or nearly so, and little is known a priori about $\mu$ or $\Sigma$. This can happen, for example, when the data are sparse.

A ridge prior is the limiting form of the normal inverted-Wishart distribution when $\tau \to 0$. Take $m = \varepsilon > 0$ and $\Lambda^{-1} = \varepsilon \cdot \Psi$, where $\Psi$ is a covariance matrix. For complete data, the estimate for $\Sigma$ is a weighted average of $\Psi$ and the sample covariance $S$. When $S$ is nearly singular, set $\Psi = \mathrm{diag}(S)$, the matrix with sample variances along the diagonal and zeroes elsewhere. This helps to smooth the calculated variances towards the observed data variances and the correlations towards zero; the smoothing results in something closer to an independence model. The relative sizes of $\varepsilon$ and the sample size $n$ determine the degree of smoothing.
The Loglinear Model

The loglinear model handles missing data when all the variables are categorical, or of class "factor".

Models

Let $W_1, W_2, \ldots, W_q$ be factor variables with values recorded for $n$ cases, so that the complete data form an $n \times q$ data frame $W$. If the cases are independent and identically distributed, the information in $W$ is equivalent to a contingency table with $D$ cells, where $D$ is the number of level combinations:

$$D = \prod_{j=1}^{q} d_j.$$

Here, $d_j$ is the number of levels for the variable $W_j$. Some cells in the contingency table are empty because of logical constraints; these are known as structural zeroes.
If the sample size is assumed fixed, the set of D cell frequencies (or
counts) has a multinomial distribution. The parameters of the
distribution are the D probabilities that a case falls into each of the D
cells of the contingency table. If there are no restrictions on the
parameters other than that they are true probabilities, then the model
is said to be saturated. In many realistic examples, however, the
amount of data is insufficient to model such arbitrarily complex
associations among the variables.
Loglinear models are a flexible class of models for specifying possible
dependencies among variables. The cell probabilities are
parameterized as the product of effects for each variable and the
associations among variables. The log of the probabilities is therefore
linear. Eliminating terms from this decomposition imposes equality
constraints on odds ratios in the contingency table. See Bishop,
Fienberg, and Holland (1975) or Agresti (1990) for details.
Other situations

If the levels of the factors in your data are ordered, you may either:

• Pretend that they are approximately normally distributed, or
• Disregard the order. If the immediate goal is to create plausible multiple imputations of missing data, then applying a loglinear model may be reasonable in this case (Schafer, 1997, page 240).
The multinomial model can also be applied in some non-multinomial situations:

• If the distribution of one or more categorical variables is fixed by design, the cell frequencies follow a product-multinomial model. This arises, for example, in variables used to define strata in sample surveys. The multinomial model may still be valid in this situation if the missing values are confined to variables that are not fixed.
• If the total sample size n is random, the multinomial likelihood may lead to valid conditional inferences. This occurs, for example, in Poisson sampling.
Prior distribution
With complete data, using a Dirichlet prior distribution for the
saturated model leads to a conjugate analysis. The posterior
distribution is again Dirichlet with updated parameters involving the
data and prior parameters.
For the loglinear model, Schafer (1997) adopts the constrained Dirichlet
as the prior distribution. This has the same functional form as the
Dirichlet but requires the parameters to satisfy constraints imposed by
a loglinear model. The advantage of this prior is that it forms a
conjugate class: the posterior distribution is another constrained
Dirichlet with updated parameters. Note, however, that the
constrained Dirichlet prior assumes that the given loglinear model is
true. This can be assessed by performing goodness-of-fit tests against
more general alternative models.
The Conditional Gaussian Model

The conditional Gaussian model (CGM) handles missing data when some of the variables are factors and others are numeric. This arises, for example, in the analysis of covariance and in logistic regression with continuous predictors.
Model

Let $W_1, W_2, \ldots, W_p$ be factor variables and let $Z_1, Z_2, \ldots, Z_q$ be numeric variables in which values are recorded for $n$ cases. Thus, the complete data form an $n \times (p + q)$ data frame $Y = (W, Z)$. The rows are assumed to be:

• Independent and identically distributed, and
• Distributed according to a general location model (Olkin and Tate, 1961), or, more descriptively, a conditional Gaussian model.

The conditional Gaussian model is best described in terms of the marginal distribution of $W$ and the conditional distribution of $Z$ given $W$, as follows. The information in $W$ is equivalent to a contingency table with $D$ cells, where $D$ is the number of level combinations:

$$D = \prod_{j=1}^{p} d_j.$$
Restricted models

The number of parameters in the unrestricted conditional Gaussian model is:

$$(D - 1) + Dq + q(q+1)/2.$$
Restricting the model constrains the matrix of cell means through a design matrix $A$ and coefficients $\beta$:

$$\mu = A\beta.$$
Prior distribution
The likelihood factors as a product of a multinomial distribution
involving W and a conditional Gaussian distribution for Z given W .
By applying independent prior distributions for the parameters of
each distribution, the parameter sets remain independent in the
posterior distribution.
In principle, the same prior distributions discussed in the sections The
Gaussian Model on page 39 and The Loglinear Model on page 41 can
be used. In practice, however, it may be difficult to quantify prior
knowledge about the Gaussian model parameters. A noninformative
prior for these parameters is the only option allowed in the S-PLUS
functions for fitting a CGM.
In sparse data situations, the posterior distribution may be improper
or the Gaussian parameters from certain cells may be poorly
estimated. Rather than trying to stabilize the inferences through
informative priors, Schafer (1997; pages 341, 348) recommends
restricting the model. In case of problems, simplify the model by
using a design matrix that has fewer columns.
For the multinomial portion of the model, apply a Dirichlet prior
distribution. See the section The Loglinear Model on page 41 for
details.
S-PLUS IMPLEMENTATION
This section describes the functions in S+MISSINGDATA that fit the
models introduced in the section Missing Data Models on page 39.
Table 4.1 lists the available fitting functions for each model.
Table 4.1: The model fitting functions available in S+MISSINGDATA.

Model                  Wrapper     EM         DA
Gaussian               mdGauss     emGauss    daGauss
Loglinear              mdLoglin    emLoglin   daLoglin
Conditional Gaussian   mdCgm       emCgm      daCgm
Fitting a Gaussian Model

The main wrapper function for the Gaussian model is mdGauss. It estimates the parameters of a Gaussian model, with or without missing values in the data. Missing data options are specified through the argument na.proc; Table 4.2 lists the possible values for this argument.

Table 4.2: Possible values for the na.proc argument to the mdGauss function.
All three functions for fitting the Gaussian model have a prior argument that specifies the hyperparameters of the normal inverted-Wishart distribution. The following are possible values for prior:

• One of the character strings "ml", "noninformative", or "ridge". When prior="ml", no prior is specified and maximum likelihood estimates are produced. Specifying prior="ridge" sets the scale hyperparameter of the inverted-Wishart distribution to a diagonal matrix of observed variances, with degrees of freedom equal to 1.
• To specify a different scale hyperparameter or different degrees of freedom, use the function dataDepPrior (for data-dependent prior). This is a generic function with methods for preGauss and preLoglin objects (see page 35).
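As a hedged illustration of these options (mdGauss, na.proc, and prior are documented above, but the exact call form on a data set and the object name are assumptions):

> # Fit the Gaussian model by EM, stabilizing the covariance with a ridge prior
> chol.fit <- mdGauss(cholesterol, na.proc = "em", prior = "ridge")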
Fitting a Loglinear Model

The model fitting functions for a loglinear model are analogous to those described for the Gaussian model in the previous section. The wrapper function mdLoglin estimates the parameters of the loglinear model, with or without missing values in the data. Missing data options are specified through the argument na.proc, which takes the values described in Table 4.2. The lower level functions emLoglin and daLoglin fit the model using the EM and DA algorithms, respectively; they are equivalent to calling mdLoglin with na.proc="em" and na.proc="da".
The default value for prior is the noninformative prior. When you
give a missmodel object to one of the model fitting functions, the prior
used to produce that object is applied instead of the default, unless
prior is explicitly set.
prior                 α_d (posterior mode)     α_d (posterior mean)
maximum likelihood    c = 1                    c = 0
noninformative        c = 1                    c = 1/2
data-dependent        α_d = 1 + n_0·θ̂_d        α_d = n_0·θ̂_d
control = daLoglin.control(monotone = T)

or

control = list(monotone = T)
Fitting a Conditional Gaussian Model

The model fitting functions for a conditional Gaussian model are entirely analogous to those described for the Gaussian and loglinear models of the previous sections. The wrapper function mdCgm estimates the parameters of the conditional Gaussian model, with or without missing values in the data. Missing data options are specified through the argument na.proc, which takes the values described in Table 4.2. The lower level functions emCgm and daCgm fit the model using the EM and DA algorithms, respectively; they are equivalent to calling mdCgm with na.proc="em" and na.proc="da". Control parameters for the fitting algorithms are specified through the control argument to mdCgm. See the on-line help files for emCgm.control and daCgm.control for details.
5  CONVERGENCE OF DATA AUGMENTATION ALGORITHMS

Overview 56
Parameter Simulation 57
Multiple Imputation 58
Practical Considerations for Missing Data Problems 60
Starting Values 60
S-PLUS Functions 61
OVERVIEW
The goal of Markov chain Monte Carlo (MCMC) methods is to sample values from a convergent Markov chain whose limiting distribution is the true joint posterior of the quantities of interest. In practice, you need to determine when the algorithm has converged. That is, you must determine when the samples are representative of the stationary distribution of the Markov chain and can be used to estimate characteristics of the distribution of interest.

Theoretical convergence rates involve laborious and sophisticated mathematics that must be repeated for each model. In addition, the bounds of such rates can be so loose as to be impractical. Instead, S+MISSINGDATA applies statistical analyses, called convergence diagnostics, to the generated samples to assess convergence. The diagnostics for assessing convergence vary according to the method of inference being used.
This chapter discusses diagnostics used for both parameter simulation
and multiple imputation. In conclusion, we discuss practical
considerations for missing data problems, including starting values
and the implementation of convergence diagnostics in
S+MISSINGDATA.
PARAMETER SIMULATION

In parameter simulation, the goal is to accurately estimate characteristics of the posterior distribution $P[\theta \mid Y_{obs}]$, such as its moments and quantiles. Convergence is given by the law of large numbers and occurs when the sample summaries are sufficiently close to the posterior quantities they estimate.

To reduce bias due to starting values, samples within an initial burn-in period are thrown away. The length of this period varies according to how fast the algorithm converges to the parameters of the target distribution.
To estimate a quantity $g = g(\theta)$ of interest, such as a point estimate, standard error, interval estimate, or p-value, collect the iterates

$$g^{(k+1)}, g^{(k+2)}, \ldots, g^{(k+n)}.$$

Here, $k$ is the burn-in period and $n$ is the Monte Carlo sample size. If $k$ is large enough to ensure stationarity and $n/k$ is large enough for the law of large numbers to apply, then the sample quantities estimate the corresponding posterior quantities (for example, the posterior mean $E[g \mid Y_{obs}]$). The burn-in period $k$ should be chosen large enough to make $g^{(k)}$ practically independent of $g^{(0)}$. To determine $k$, Schafer (1997) recommends looking at time series and autocorrelation function plots of $\{g^{(t)}\}$.
After convergence, the following should apply:

• Time series plots should not show a trend, nor should iterates $k$ steps apart have more than negligible correlation. See Schafer (1997), page 121 for examples.
• Autocorrelation function (ACF) plots should die out. The sample ACFs should fall within approximate 0.05-level critical values for testing that the ACFs are zero. See Schafer (1997), page 122 for examples.
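As a hedged sketch of these checks in S-PLUS, assuming theta.series is a vector holding the saved iterates of one scalar parameter (the name is hypothetical):

> tsplot(theta.series)  # time series plot: look for the absence of a trend
> acf(theta.series)     # ACF plot: correlations should die out quickly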
MULTIPLE IMPUTATION

In multiple imputation, the goal is to generate Bayesianly proper multiple imputations. These are independent realizations of $P[Y_{mis} \mid Y_{obs}]$, the posterior predictive distribution of the missing data under some complete-data model and prior.

The DA algorithm simulates values of $Y_{mis}$ that have $P[Y_{mis} \mid Y_{obs}]$ as their stationary distribution. In practice, M imputations are produced either with one long chain or with several chains. The working notions of convergence differ depending on whether one or several chains are used, as we discuss below. In both cases, however, the main problem is to approximate the burn-in period $k$. As in parameter simulation, samples within an initial burn-in period are discarded to reduce bias due to starting values.
Note
As discussed in the section Multiple Imputation Using DA on page 23, it is easier to monitor
convergence using the parameter sequence rather than the imputed data sequence.
PRACTICAL CONSIDERATIONS FOR MISSING DATA PROBLEMS

Starting Values

The rate of convergence to stationarity for DA depends partly on the starting values or starting distribution. Schafer (1997) recommends using starting values that are near the center of the posterior; for example, use a maximum likelihood estimate or posterior mode obtained from running an EM algorithm.

To facilitate this, objects of class "missmodel" returned by the EM fitting functions emGauss, emLoglin, and emCgm may be used as input to a DA algorithm. For example, the missmodel object returned by emGauss can be supplied as the starting point for a DA run.
2. Calculate $\hat{\theta}_b = \hat{\theta}(Y_{obs}^{*b})$.

If we take $n^*$ to be smaller than $n$, say $n^* = n/2$, then $\hat{\theta}_b$ tends to be overdispersed relative to $P[\theta \mid Y_{obs}]$. Care is required, however, since a reduced data set size may lead to problems such as collinearity.
6  IMPUTATION

Overview 64
Imputing Data 65
The Gaussian Model 65
The Loglinear Model 66
The Conditional Gaussian Model 67
The Class of impute Objects 68
Extracting Imputations 69
Replacing Imputations 70
Manipulating impute Objects 70
OVERVIEW
As discussed in Chapter 2, the DA algorithm can be used to produce
multiple imputations under one of the models discussed in Chapter 4:
Gaussian, loglinear, and conditional Gaussian. Applied within the
framework of multiple imputation, these models are more widely
applicable than would appear at first glance. This is because multiple
imputation is fairly robust to model misspecification, especially with
small fractions of missing information (Ezzati-Rice et al. (1995), Rubin
and Schenker (1986), Schafer (1997)). Multiple imputation under
these models thus applies more or less routinely to a wide variety of
missing data problems.
This chapter discusses the functions and objects in S+MISSINGDATA
that support multiple imputation. The main functions discussed are
impGauss, impLoglin, and impCgm, corresponding to each of the
models from Chapter 4. All three imputation functions return objects
of class "impute", which are designed to store multiple imputations
efficiently with the original data. The impute objects work with a
variety of utility functions available in S+MISSINGDATA.
In principle, you can generate multiple imputations by using nonparametric procedures, or by using parametric models different from the ones provided in S+MISSINGDATA. As long as your custom procedures and models return an object of class "impute", you may use the capabilities described in the next two chapters to perform multiple complete data analyses and consolidate results.
IMPUTING DATA
The following functions for imputing data are available in S+MISSINGDATA:

• The impGauss function produces multiple imputations under the Gaussian model.
• The impLoglin function produces multiple imputations under the loglinear model.
• The impCgm function produces multiple imputations under the conditional Gaussian model.
These functions are generic with methods for preGauss, preLoglin,
and preCgm objects, respectively; see the section Preprocessing Data
on page 35 for descriptions of these objects. The impGauss,
impLoglin, and impCgm functions also have methods for the missmodel
objects described in Chapter 4, as well as default methods for
matrices and data frames.
All three functions for imputing data return an object of class "impute", which by default is a data frame with columns of class "miVariable". See the section The Class of impute Objects on page 68 for details on miVariable. Alternatively, you can set the argument return.type="matrix" in a call to impGauss, impLoglin, or impCgm. In this case, an miVariable version of a matrix is returned instead of a data frame. The form of the return object usually depends on whether the original data is a data frame or a matrix. Preserving the form of the original data in the impute object makes the commands for subsequent complete data analyses parallel to those used when the data has no missing values.
The form of the starting values, given by the argument start in the
impGauss, impLoglin, and impCgm functions, determines how many
chains are run. The possible values for start are model-specific, as
we discuss below.
The Gaussian Model

For one long chain, the start argument of the impGauss function is a list with two components: a vector that gives the mean and a matrix that gives the covariance matrix.
The Loglinear Model

For one long chain, the start argument of the impLoglin function is a vector of cell probabilities. The length of start equals the number of distinct combinations of levels in the factor variables. The ordering is such that the first variable varies the fastest, then the second variable, and so on. Starting values should be zero for cells that are structural zeros. For one long chain, you must also supply the argument nimpute, which gives the number of imputations.
The Conditional Gaussian Model

For one long chain, the start argument of the impCgm function is a list with the following components:

• mu, a matrix of cell means. Each column in the matrix represents one numeric variable and each row represents a cell. The ordering of the rows is equivalent to the ordering in the pi component.
• sigma, a variance-covariance matrix of the numeric variables.
• pi, a vector of cell probabilities. The length of pi equals the number of distinct combinations of the factor variable levels. The ordering is such that the first variable varies the fastest, then the second variable, and so on. Starting values should be zero for cells that are structural zeros.
For multiple chains, the start argument may take several forms:

• A list of lists. The inner lists must all have a vector component containing the mean, a matrix component containing the covariance, and a vector component containing the cell probabilities. Each list of parameters starts a separate chain.
• An object of class "Cgm", which is the paramIter component of a missmodel object created by one of the functions mdCgm, emCgm, or daCgm. A Cgm object is a matrix in which each row contains one set of parameter estimates; each row then starts a separate chain. Typically, a Cgm object contains a limited set of iterations, obtained either through subsetting or by specifying the save argument to emCgm.control or daCgm.control.
• A list of Cgm objects. In this case, the last row of each Cgm object starts a separate chain.
THE CLASS OF IMPUTE OBJECTS
miList that the data value was originally missing. The converse is not
always true, however. Only an miList object with components that
have the same length, names, attributes, and atomic mode may be
converted to an miVariable object. The generic function
miVariablePossible determines whether an object can be converted
to an miVariable; you may write your own methods for this function.
Generally, miVariable objects should be used for data and miList
objects for the results of analyses. Results of analyses that can be
treated as data, such as a vector of residuals from a regression, are
usually created and stored as miList objects. However, it is possible
to convert them to miVariable objects.
Both miList and miVariable objects may be components of a list. In
particular, variables in a data frame may be miVariable objects.
These types of objects may be contained in an attribute, though this
has not been well tested and is not currently recommended. These
objects can also be contained in a slot if the definition for the class
allows this.
is equivalent to
Extraction for lists (or objects with slots) containing miList and
miVariable objects proceeds recursively. Each miList or miVariable
component is replaced with the corresponding extracted complete-
data object, then the whole list (or object with slots) is returned.
Manipulating impute Objects

The miList and miVariable objects can be created using the miList and miVariable functions, respectively.

To determine the number, names, or existence of imputations, use the miReps, miNames, and is.mi functions, respectively. These functions operate recursively, searching for imputations in any list component or slot of an object. The is.miVariable and is.miList existence functions are also useful. The latter has an optional argument recursive; if recursive=TRUE, the is.miList function searches for miList objects recursively.
7  ANALYZING COMPLETED DATA SETS

Overview 74
Analysis Functions 75
The miEval Function 75
The miApply Function 76
Additional Arguments 77
Compatibility of miEval and miApply 77
OVERVIEW
The process of statistical analysis may involve creating graphics,
fitting models, investigating diagnostics, and comparing results. If you
use multiple imputation to handle missing values, you still perform a
similar sequence of analysis steps. However, you need to perform the
steps on several data sets, each of which is completed by filling in the
missing values using one set of imputations. Each completed data set
gives a different result; the results must then be collected as described
in this chapter and combined as described in the next chapter.
The S+MISSINGDATA library provides two functions to facilitate the process of performing analyses and collecting results:

• The miApply function is analogous to S-PLUS functions such as apply and sapply. It is useful when the complete data analysis can be expressed as a function applied to one set of data.
• The miEval function is analogous to the S-PLUS function eval. If the complete data analysis involves more than one set of data or requires an S-PLUS expression, then miEval is the function to use.

In this chapter, we discuss miApply and miEval in detail. Both functions usually produce miList objects in which each component is the result of one complete data analysis.
ANALYSIS FUNCTIONS
or
> miEval({
+ print(mean(x, trim = 0.2))
+ any(x + y > z)
+ })
corresponds to either
or
The miEval function does not provide support for functions that
access data directly, without being passed through the argument list.
This becomes a problem when, for example, the function is called by
another function. However, miEval has an advantage in this situation:
the given expression is evaluated in the calling frame, so that data that
would be visible if the expression was evaluated outside of miEval is
also visible inside of miEval. If the data is an impute object, the
appropriate set of imputations is not extracted. The expression given
to miEval can include explicit assignments to frame 1 to handle these
situations.
Note that miApply cannot be used for expressions; you must use a
wrapper function instead. For example, the expression x+y/z
becomes the following:
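The original wrapper is not preserved; a hedged sketch, assuming by analogy with sapply that miApply takes the impute object followed by the function, and treating x, y, and z as columns of the hypothetical impute object myData:

> f <- function(d) d$x + d$y / d$z   # wrapper for the expression x + y/z
> result <- miApply(myData, f)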
You must also use a wrapper function when calling functions that
require multiple impute objects. For example, the command
Compatibility of miEval and miApply

In most cases, the objects produced by miEval and miApply (when called with analogous expressions) are compatible. When the expressions contain modeling functions such as lm or glm, however, the objects produced by miEval and miApply are slightly different.
Both m.fit1 and m.fit2 are miList objects with components that are
glm objects. The call attributes for the first components of each are,
respectively:
However, the same commands using m.fit1 fail because the data
cannot be found. Instead, you must do one of the following:
> miEval({
+ assign("xx", m.kyphosis, frame = 1)
+ predict(m.fit1, type = "terms")
+ })
This ensures that the appropriate data sets are assigned to frame 1
where they are sure to be found. The first completed data set from
m.kyphosis is assigned to frame 1 with the name xx before the first
analysis is run, then the second completed data set is assigned there,
and so on. Note that replacing the dummy name xx with m.kyphosis
when creating m.fit1 would be dangerous, because some code would
not know whether to use the original m.kyphosis (which contains
multiple imputations) or its completed data sets with the same names.
Results could be transparently incorrect.
8  CONSOLIDATING ANALYSES

Overview 82
Simple Statistics 83
Inferences 84
Normal and Student's t Inferences 84
Chi-Square and F Inferences 86
Likelihood Ratio Inferences 90
OVERVIEW
The final step in an analysis of multiple imputations is consolidating
results from all imputations to produce a single result. If the result
from a single set of imputations is an estimate with no associated
standard error or other inferences, then you can use the miMean
function, which computes the average result across imputations.
Likewise, the miVar function computes the variance across
imputations.
More interesting is the case where you need to combine not only estimates but also standard errors or other inferences. The final result must encompass both the uncertainty associated with individual estimates, such as standard errors for linear regression coefficients, and the additional uncertainty due to missing data. The miMeanSE function combines point and variance estimates that are used for inference assuming asymptotic normality (Rubin (1987), Chapter 3) or Student's t (Barnard and Rubin (1999); Hesterberg (1998)). The functions miChiSquareTest, miFTest, and miLikelihoodTest combine inferences based on $\chi^2$ statistics (Li et al. (1991)), F statistics (Hesterberg (1998); Li, Raghunathan, and Rubin (1991)), and likelihood ratio statistics, respectively.
SIMPLE STATISTICS
S+MISSINGDATA includes two functions for calculating simple
statistics across imputation sets. The miMean function calculates the
mean across imputation sets and the miVar function calculates the
variances:
> miMean(m.coef)
> miVar(m.coef)
INFERENCES
Normal and Student's t Inferences

Many inferences are based on estimates, standard errors, and approximate normality. The miMeanSE function consolidates both estimates and their standard deviations or standard errors by averaging the estimates and obtaining adjusted standard errors. The rules implemented in miMeanSE are based on those described in Rubin (1987) for combining normal-based inferences, and in Barnard and Rubin (1999) and Hesterberg (1998) for combining Student's t inferences. In this section, we describe the computations underlying miMeanSE.
This allows you to call miMeanSE without first calculating sumfit and
extracting the coefficients, degrees of freedom, and covariance
matrices.
Chi-Square and F Inferences

When complete data inferences are based on $\chi^2$ or F statistics, there are two cases to consider:

• The estimates and variance-covariance estimates are available from each set of imputations; or
• Only the $\chi^2$ or F statistics are available.

In the first case, consolidate the estimates and variance-covariance matrices using miMeanSE and calculate an F statistic using the formula:

$$(\theta - \theta_0)^T \hat{\Sigma}^{-1} (\theta - \theta_0) / df_1.$$
> m.C$df[4:8])
When only the $\chi^2$ or F statistics are available, the functions in S+MISSINGDATA follow Schafer (1997, page 115) and Li et al. (1991), with natural extensions to F statistics. The functions miChiSquareTest and miFTest accept as input the scalar $\chi^2$ or F statistics calculated on each completed data set, together with the degrees of freedom for the tests. They return the consolidated F statistic, the numerator and denominator degrees of freedom, the estimated average relative increase in variance due to nonresponse, and an approximate p-value corresponding to the F statistic. The p-value should be used for screening only; the actual p-value may be larger or smaller by a factor of two.
For example, we might use the F statistics computed by the anova
function to compare linear models with and without a factor variable.
To continue the preceding example:
$Fstatistic:
[1] 0.2926602
$df1:
[1] 5
$df2:
[1] 5.758413
$r:
[1] 0.3130781
$p:
[1] 0.9001019
Both procedures fail to reject the null hypothesis that all coefficients
for the Type variable are zero. However, note that the p-value
obtained by combining F statistics is larger than the p-value based on
averaging parameter estimates (obtained earlier). The p-value here is
based on a less powerful test.
• Case 1: The $X_i$ values are observed. In this case, the test uses the consolidated estimate $\bar{X}$ of $\mu$.
• Case 2: Only $Y_i = X_i^2$ is observed (these are equivalent to $\chi^2$ test statistics). In this case, the test uses the statistic $\sum Y_i$.

Both procedures give exact tests, where the probability of Type I error is exactly equal to $\alpha$. However, any single set of data can reach different conclusions. The first procedure, which averages individual parameter estimates, is more powerful.
The next example consolidates chi-square statistics from a loglinear
analysis of a contingency table. First, create an impute object from
the built-in data set barley:
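The original command is not preserved; a minimal sketch using the default data frame method and the documented nimpute argument (the object name and the number of imputations are assumptions) is:

> barley.imp <- impLoglin(barley, nimpute = 5)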
Here, the df1 and ... arguments are defined as before, but FUN is a
function that calculates parameter estimates internally for both the
alternative and null hypotheses and returns the likelihood ratio
statistic. Furthermore, the data argument must be such that the
completed data sets can be combined into a single large data set using
rbind, and FUN must be able to take this large data set as input and
compute likelihood ratios. The log-likelihood statistic for the large
data set formed by stacking M copies of a single data set should be M
times the statistic obtained for the single data set. The procedure
followed in this case is invariant under transformations of the
parameters.
For example:
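The original example is not preserved. The following is a hedged sketch under stated assumptions: the completed data sets stack into a frequency table with a Freq column and factor columns, the model terms and the df1 value are hypothetical, and only the data, FUN, and df1 arguments documented above are used:

> lrFun <- function(d) {
+   # Fit alternative and null loglinear models to one (stacked) data set
+   fit1 <- glm(Freq ~ variety * site, family = poisson, data = d)
+   fit0 <- glm(Freq ~ variety + site, family = poisson, data = d)
+   fit0$deviance - fit1$deviance  # likelihood ratio statistic
+ }
> miLikelihoodTest(data = barley.imp, FUN = lrFun, df1 = 9)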
9  EXAMPLE 1: THE GAUSSIAN MODEL

Overview 94
The Cholesterol Data 94
Exploring Patterns of Missingness 96
Summarizing and Plotting 96
Preprocessing Data 99
Model Fitting 100
Fitting a Model Using EM 100
Fitting a Model Using DA 103
Assessing Convergence 105
Autocorrelation Plots 105
Fractions of Missing Information 108
Conclusions 110
Analysis Using Parameter Simulation 111
Generating Multiple Imputations Through DA 114
Omitting Cases with Missing Values 119
Summary 120
OVERVIEW
This chapter provides detailed examples illustrating the Gaussian model fitting process, in which all variables with missing values are numeric. Chapter 4 briefly describes the Gaussian model, the associated priors, and the functions in S+MISSINGDATA used to fit it. In this chapter, we illustrate the S+MISSINGDATA functions using the cholesterol example from Schafer (1997). Note that the algorithms in S+MISSINGDATA differ from those in Schafer's book, which involve sweep operators; details are in Fraley (1998).

Multivariate normality is often assumed in analyzing continuous data. It is therefore natural to treat missing data using the same assumptions. The Gaussian model handles missing data even when data sets deviate from normality, however. Some reasons are:

• Transformations of the variables may make normality more tenable.
• If some variables are clearly non-normal but complete, the Gaussian model can be used if the incomplete variables may be modeled as conditionally Gaussian, given a linear function of the complete variables. In this case, inferences must be made about the parameters of the conditional distribution only.
• When used as a model for multiple imputation, the Gaussian model is applied only to the missing part of the data. Multiple imputation inferences are robust to assumptions on the imputation model as long as the fraction of missing information is small.
The Cholesterol Data

Schafer (1997) illustrates the Gaussian model using a data set of 28 patients treated for heart attacks at a Pennsylvania medical center. The original data are given in Table 9.1 of Ryan and Joiner (1994). For each patient, serum-cholesterol levels are measured 2 and 4 days after the attack. For 19 patients, a measurement is also taken 14 days after the attack.

The data from the cholesterol study is included in S+MISSINGDATA as the built-in data set cholesterol. It consists of three variables: chol2, chol4, and chol14.
> cholesterol

For additional details, see the online help file for cholesterol.

The goals of the study are to estimate three parameters:

• Mean cholesterol level at 14 days;
• Average decrease in cholesterol level from day 2 to day 14;
• Percentage decrease in cholesterol level from day 2 to day 14.

We accomplish these goals in this chapter using the EM algorithm, the DA algorithm, and multiple imputation. The latter two techniques provide confidence intervals for each of the estimated parameters.
EXPLORING PATTERNS OF MISSINGNESS

Summarizing and Plotting

In this section, we use the miss function and its associated methods to explore the cholesterol data. As discussed in Chapter 3, the miss function is designed to facilitate exploratory data analysis for data sets that include missing values. It creates an object of class "miss", which by default rearranges the rows and columns of the data according to the numbers and patterns of missing values.

To create a miss object from the cholesterol data, type:
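The command itself is not preserved here; given the object name used in the summary call below, it was presumably:

> cholesterol.miss <- miss(cholesterol)
> cholesterol.miss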
Note that omitting cases with missing values would throw out nearly a
third (32%) of the observations.
Use summary for more detailed information. Here is the annotated
output from summary for the cholesterol.miss object:
> summary(cholesterol.miss)
Breakdown by variable
V O name Missing % missing
1 3 chol14 9 32
V = Variable number used below, O = Original number (before
sorting)
No missing values for variables:
chol2 chol4
Observed values are displayed with a period and missing values with
an m. The output indicates that the first pattern has no missing values
while the second pattern has missing values only in variable 1. As we
previously noted, the first variable after reordering is chol14.
Each pattern detected by the miss function corresponds to one or
more rows in the original data set. The correspondence between rows
and patterns is shown in the next section of output from summary:
9 .
10 m
11 .
12 .
13 m
14 .
15 .
16 m
17 .
18 m
19 .
20 .
21 .
22 .
23 m
24 .
25 m
26 .
27 .
28 .
> plot(cholesterol.miss)
Preprocessing Data
In the next section, we fit models to the cholesterol data using both the EM and DA algorithms. To save computation resources when fitting these models, preprocess the cholesterol data by creating a preGauss object as follows:
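A minimal sketch; the object name cholesterol.s is assumed by analogy with the crime.s object used in Chapter 10:

> cholesterol.s <- preGauss(cholesterol)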
MODEL FITTING
Fitting a Model Using EM
To fit a Gaussian model to the cholesterol data using the EM algorithm, type:
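A sketch of the call; passing the preprocessed object is an assumption, but the result name cholesterol.EM matches the accesses below:

> cholesterol.EM <- emGauss(cholesterol.s)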
Iterations of EM:
Iteration ParChange
1 2.2853
2 0.3018
3 0.1211
4 0.0523
5 0.0233
6 0.0105
7 0.0048
8 0.0022
9 0.0010
10 0.0005
> cholesterol.EM$paramIter
Covariance
chol2 chol4 chol14
chol2 2194.995 1454.617 835.233
chol4 1454.617 2127.158 1514.498
chol14 835.233 1514.498 1950.798
========== iteration = 10 ================
Mean
chol2 chol4 chol14
253.9286 230.6429 222.2329
Covariance
chol2 chol4 chol14
chol2 2194.9949 1454.617 835.3333
chol4 1454.6173 2127.158 1515.0270
chol14 835.3333 1515.027 1951.5629
==========================================
By default, only the last two iterates are saved for the EM algorithm.
This can be modified through the argument last to emGauss.control.
For example, to save the last four iterates, add the following to the
argument list in the original call to emGauss:
control = emGauss.control(last = 4)
> cholesterol.EM$algorithm
> worstFraction(cholesterol.EM)
$direction:
Mean
chol2 chol4 chol14
0 0 -0.3081905
Covariance
chol2 chol4 chol14
chol2 0 0 0.0000000
chol4 0 0 0.0000000
chol14 0 0 0.9050186
==========================================
$fraction:
[1] 0.4265396
Since there are no missing values in either chol2 or chol4, the parameters corresponding to these variables converge in a single step, and their fractions of missing information are zero.
To compute the worst fraction of missing information using the power
method, type:
> worstFraction(cholesterol.EM, method = "power")
$direction:
chol2 chol4 chol14 chol2.chol2 chol2.chol4 chol4.chol4
0 0 -0.4331057 0 0 0
$fraction:
[1] 0.4657516
For details on the power argument, see the online help file for
worstFraction.Gauss.
> cholesterol.EM.ex$mu[3]
chol14
222.2329
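The quantity dec.2to14 used below is presumably the difference between the day-2 and day-14 mean estimates:

> dec.2to14 <- cholesterol.EM.ex$mu[1] - cholesterol.EM.ex$mu[3]  # about 31.70; see Table 9.2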
> 100*dec.2to14/cholesterol.EM.ex$mu[1]
chol2
12.48212
These estimates are summarized in Table 9.2 on page 120, along with
those obtained using data augmentation and multiple imputation.
Fitting a Model Using DA
It is also possible to estimate the three parameters of interest in the cholesterol study via parameter simulation. To accomplish this, it is generally a good idea to start a DA algorithm near the center of the posterior obtained from running an EM algorithm. See the section Using the EM and DA Algorithms in Conjunction on page 25 for additional details.
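A sketch of the DA run; the function name daGauss and the control argument names are assumptions, but the settings match the algorithm summary below (iterations 101 through 1100 saved):

> cholesterol.DA <- daGauss(cholesterol.s, start = cholesterol.EM,
+                           control = list(niter = 1100, save = 101:1100))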
> cholesterol.DA$algorithm
seed = 13 46 10 7 30 0 6 9 59 60 1 1
parameter estimates saved for iterations: 101:1100
ASSESSING CONVERGENCE
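The time series plots in Figure 9.2 are presumably produced by the plot method for missmodel objects, as in Chapter 10:

> plot(cholesterol.DA)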
Figure 9.2: Time series plots for parameters that are related to cholesterol measurements on day 14, the only
variable in cholesterol with missing values.
The ACF plots for the parameters of the linear regression are
generated by writing several functions. For the intercept, define the
following function:
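A sketch of such a function, assuming the covariance columns of paramIter are named chol2.chol2, chol2.chol4, and so on, as in the worstFraction output above. It computes the intercept of the regression of chol14 on chol2 and chol4 from one row of simulated parameters:

> chol14.intercept <- function(theta)
+ {
+   mu <- theta[c("chol2", "chol4", "chol14")]
+   Sigma <- matrix(c(theta["chol2.chol2"], theta["chol2.chol4"], theta["chol2.chol14"],
+                     theta["chol2.chol4"], theta["chol4.chol4"], theta["chol4.chol14"],
+                     theta["chol2.chol14"], theta["chol4.chol14"], theta["chol14.chol14"]),
+                   nrow = 3)
+   # slope coefficients of chol14 on chol2 and chol4
+   beta <- solve(Sigma[1:2, 1:2], Sigma[1:2, 3])
+   mu[3] - sum(beta * mu[1:2])
+ }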
Figure 9.3: ACF plots for parameters that are related to cholesterol measurements on day 14. ACF plots for
parameters of the linear regression of chol14 on both chol2 and chol4 are also displayed. These parameters are
conjectured to have high fractions of missing information as well.
Figure 9.4: ACF plot of the worst linear function of the parameters in the cholesterol models.
Conclusions
In summary, the time series plots of this section show no unusual features. Similarly, the ACF plots indicate rapid convergence and negligible correlations by lag 20. Based on this evidence, it seems safe to conclude that the DA algorithm achieves stationarity by 20 iterations. To be conservative, discard the first 100 observations:
> cholesterol.DA$paramIter <-
+ cholesterol.DA$paramIter[-c(1:100),]
ANALYSIS USING PARAMETER SIMULATION
See Figure 9.5 for histograms of the parameters, which are generated
as follows:
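A sketch of these commands, using the paramIter column names from the worstFraction output above (the panel layout is an assumption):

> mean.day14 <- cholesterol.DA$paramIter[, "chol14"]
> decrease <- cholesterol.DA$paramIter[, "chol2"] - mean.day14
> percent.decrease <- 100 * decrease/cholesterol.DA$paramIter[, "chol2"]
> par(mfrow = c(2, 2))
> hist(mean.day14)
> hist(decrease)
> hist(percent.decrease)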
Figure 9.5: Histograms of the three parameters of interest in the cholesterol study.
GENERATING MULTIPLE IMPUTATIONS THROUGH DA
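The quantities of interest are the three parameters listed in the chapter overview. A sketch of a helper that computes them for a single completed data set, consistent with the names in the consolidated output below:

> cholesterol.estimates <- function(data)
+ {
+   # data: one completed data set; cholesterol.imp below holds these,
+   # presumably generated earlier with impGauss
+   mean.day14 <- mean(data$chol14)
+   decrease <- mean(data$chol2) - mean.day14
+   c(mean.day14 = mean.day14, decrease = decrease,
+     percent.decrease = 100 * decrease/mean(data$chol2))
+ }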
You calculate these quantities for each of the completed data sets with
the following:
> m.cholesterol.estimates <-
+ miEval(cholesterol.estimates(cholesterol.imp))
> m.cholesterol.estimates
> m.cholesterol.se
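The estimates and standard errors are presumably combined with Rubin's rules; the function name and signature here are assumptions, and the components of the result are shown below:

> chol.consolidate <- consolidate(m.cholesterol.estimates, m.cholesterol.se)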
> chol.consolidate
$est:
mean.day14 decrease percent.decrease
219.9349 33.99365 13.38709
$std.err:
sigma.mean.day14 sigma.decrease sigma.percent.decrease
8.950076 10.25067 3.782212
$df:
sigma.mean.day14 sigma.decrease sigma.percent.decrease
150.7358 259.3694 199.8662
$m:
[1] 5
$r:
mean.day14 decrease percent.decrease
0.1946008 0.1417942 0.1647799
$fminf:
mean.day14 decrease percent.decrease
0.1737904 0.1308616 0.1499327
> chol.consolidate$est
SUMMARY
Table 9.2 shows the estimates and confidence intervals obtained for
each of the methods for handling missing data. In each cell, the order
of the methods is first EM, then DA, and finally multiple imputation.
The Estimate column also shows estimates obtained after omitting
cases with missing values.
Table 9.2: Comparison of answers obtained using EM, DA, multiple imputation,
and deleting cases with missing values.
Parameter          Method      Estimate   Lower Bound   Upper Bound
mean               EM          222.23     NA            NA
                   DA          222.05     201.05        242.21
                   MI          220.8542   201.36        240.35
                   omit cases  221.47
decrease           EM          31.70      NA            NA
                   DA          31.87      9.01          54.23
                   MI          33.07      10.83         55.32
                   omit cases  38
percent decrease   EM          12.48      NA            NA
                   DA          12.48      3.78          20.55
                   MI          13.03      4.71          21.34
                   omit cases  14.65
Figure 9.6: The average cholesterol level at day 2 is lower for incomplete cases than for complete cases. This helps to explain the different parameter estimates obtained using complete case analysis versus using EM, DA, or multiple imputation.
EXAMPLE 2: THE LOGLINEAR MODEL
Overview 124
The Crime Data 124
Exploring Patterns of Missingness 126
Summarizing and Plotting 126
Preprocessing Data 128
Model Fitting 129
Fitting a Model Using EM 129
Fitting a Model Using DA 130
Assessing Convergence 133
Autocorrelation Plots 133
Fractions of Missing Information 135
Conclusions 136
Generating Multiple Imputations Through DA 140
Analyzing Completed Data Sets 142
Analysis Using Multiple Imputation 142
OVERVIEW
This chapter provides detailed examples illustrating the loglinear
model fitting process, in which all variables with missing values are
categorical. Chapter 4 briefly describes both the saturated
multinomial model and the loglinear model, as well as the functions
in S+MISSINGDATA used to fit them. In this chapter, we illustrate the
S+MISSINGDATA functions using the crime example from Schafer
(1997). See Schafer's book for details about this study and descriptions
of the algorithms involved.
The Crime Data
Schafer (1997, page 45) illustrates the loglinear model using a data set that represents 641 housing occupants. The original data were obtained through the National Crime Survey conducted by the U.S. Bureau of the Census. Housing occupants were initially asked whether they had been victimized by crimes committed in the previous six months; six months later, they were asked the same question. A total of 641 occupants responded on at least one of the two occasions.
The data from the crime study is included in S+MISSINGDATA as the
built-in data set crime. It consists of two dichotomous variables,
Visit.1 and Visit.2, each taking on the values Crime-free or
Victim. A third variable provides the number of housing occupants
corresponding to each of the combinations:
> crime
For additional details, see the online help file for crime.
Note that this data set is in a grouped format. This saves space by
representing the data as the unique combinations of values in Visit.1
and Visit.2 with the corresponding frequencies in the count column.
Equivalently, the data can be represented in an ungrouped format
with the following command. This requires 756 rows instead of 9:
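A sketch of the ungrouping command, replicating each row according to its count (the count column name follows the text above):

> crime.df <- crime[rep(1:nrow(crime), crime$count),
+                   c("Visit.1", "Visit.2")]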
EXPLORING PATTERNS OF MISSINGNESS
Summarizing and Plotting
In this section, we use the miss function and its associated methods to explore the crime data. As discussed in Chapter 3, the miss function is designed to facilitate exploratory data analysis for data sets that include missing values. Patterns in missing data are reasonably easy to discern for data in a grouped format, such as crime. This is especially true when the data set includes a limited number of factors. Nevertheless, we illustrate the miss function using the ungrouped data set crime.df.
The miss function creates an object of class "miss", which by default rearranges the rows and columns of the data according to the numbers and patterns of missing values. To create a miss object from the crime.df data, type:
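Based on the object name used below, the command is presumably:

> crime.miss <- miss(crime.df)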
> summary(crime.miss)
Breakdown by variable
V O name Missing % missing
1 1 Visit.1 153 20
2 2 Visit.2 157 21
V = Variable number used below, O = Original number (before
sorting)
Observed values are displayed with a period and missing values with
an m. The output indicates that the first pattern has no missing values
while the second pattern has missing values only in the first variable.
As we previously noted, the first variable after reordering is Visit.1.
Likewise, the third pattern detected has missing values only in the
second variable (Visit.2), and the fourth pattern has missing values
in both variables.
Each pattern detected by the miss function corresponds to one or
more rows in the original data set. The correspondence between rows
and patterns is shown in the next section of output from summary:
Preprocessing Data
In the next section, we fit models to the crime data using both the EM and DA algorithms. To save computation resources when fitting these models, preprocess the crime data by creating a preLoglin object as follows:
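A minimal sketch; the object name crime.s matches the call attribute recorded in the imputation output later in this chapter, and any argument identifying the count column is assumed to take its default:

> crime.s <- preLoglin(crime)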
MODEL FITTING
Fitting a Model Using EM
To perform the likelihood ratio test of independence, the likelihood must be maximized twice: once for the saturated model and once under the null hypothesis of independence. Since there are missing values in the crime data set, an EM algorithm is used to maximize the likelihoods. The maximum likelihood estimates (MLEs) under independence are obtained as follows:
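A sketch of the call; specifying the independence model through a margins formula with main effects only is an assumption:

> crime.EM <- emLoglin(crime.s, margins = ~ Visit.1 + Visit.2)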
Iterations of ECM:
1...2.498457, -589.665968278183
2...0.6356901, -575.88481965762
3...0.1355902, -575.224397116118
4...0.02803836, -575.195583085395
5...0.005763061, -575.194355312622
6...0.00118325, -575.194303240237
Similarly, the MLEs for the saturated model are obtained with either
of the following:
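Only one of the two equivalent variants is sketched here; a saturated fit presumably adds the two-variable association:

> crime.EM.sat <- emLoglin(crime.s, margins = ~ Visit.1 * Visit.2)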
Asymptotic analysis
The following command calculates the likelihood ratio test statistic for
testing independence:
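A sketch, assuming each fit stores its final log-likelihood in a loglik component:

> like.ratio.test <- 2 * (crime.EM.sat$loglik - crime.EM$loglik)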
> 1 - pchisq(like.ratio.test, 1)
[1] 4.70321e-07
Fitting a Model Using DA
Next, start from crime.EM and run the DA algorithm for 5100 iterations, saving all of them:
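A sketch of the DA call; the function name daLoglin and the control argument names are assumptions, but the settings match the algorithm summary below (iterations 1:5100 saved):

> crime.DA <- daLoglin(crime.s, start = crime.EM,
+                      control = list(niter = 5100, save = 1:5100))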
> crime.EM$paramIter
Visit.1=1;Visit.2=1 Visit.1=2;Visit.2=1
5 0.6969570 0.1358427
6 0.6970886 0.1357966
Visit.1=1;Visit.2=2 Visit.1=2;Visit.2=2
5 0.09872477 0.06847549
6 0.09865219 0.06846263
By default, only the last two iterates are saved for the EM algorithm.
This can be modified through the argument last to
emLoglin.control.
> crime.DA$paramIter[1:10,]
Visit.1=1;Visit.2=1 Visit.1=2;Visit.2=1
1 0.6745408 0.1479869
2 0.6784437 0.1372391
3 0.6967126 0.1552910
4 0.6997548 0.1312311
5 0.6952653 0.1364643
6 0.6921411 0.1398454
7 0.6676453 0.1529987
8 0.6944598 0.1230416
9 0.6508019 0.1560291
10 0.6968313 0.1309274
Visit.1=1;Visit.2=2 Visit.1=2;Visit.2=2
1 0.08282638 0.09464592
2 0.10567274 0.07864449
3 0.07906126 0.06893515
4 0.10795915 0.06105492
5 0.09763649 0.07063395
6 0.11058398 0.05742950
7 0.11219294 0.06716300
8 0.11210834 0.07039024
9 0.12209065 0.07107828
10 0.08831767 0.08392358
> crime.EM$algorithm
> crime.DA$algorithm
seed = 21 14 49 32 43 1 32 22 36 23 28 3
parameter estimates saved for iterations: 1:5100
ASSESSING CONVERGENCE
> plot(crime.DA)
> daAcfPlot(crime.DA)
Figure 10.1: Plots for the parameters in the crime model. The top row, produced by the plot method for
missmodel objects, is a set of time series plots of each parameter versus iteration number. The bottom row,
produced by the daAcfPlot function, is a set of ACF plots of each parameter. These suggest that convergence is
reached quickly.
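Fractions of Missing Information
The worst fraction of missing information is computed as in the Gaussian example, presumably with:

> worstFraction(crime.EM)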
$direction:
[1] -0.4616784 0.4846614 0.5137191 -0.5367021
$fraction:
[1] 0.2642576
Figure 10.2: ACF plot of the worst linear function of the parameters in the crime models, with 95% confidence bounds. The correlations for lags 4 and beyond are not significantly different from 0.
Figure 10.3: Histogram of the simulated odds ratios. This distribution forms the basis of inference using
parameter simulation.
> mean(crime.omega)
[1] 3.666283
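The 95% interval below presumably comes from the empirical quantiles of the simulated odds ratios:

> quantile(crime.omega, c(0.025, 0.975))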
2.5% 97.5%
2.211379 5.692615
See Table 10.1 on page 143 to compare these results with those obtained by other methods.
Figure 10.4: Histogram of the likelihood ratio test statistic that compares the MLE with the simulated values. Asymptotically, this observed-data posterior distribution is chi-squared with 3 degrees of freedom. The agreement is good, suggesting good large-sample properties.
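The call that generates crime.imp is recorded in the call attribute at the end of the output below; start.crime is presumably a starting value defined earlier:

> crime.imp <- impLoglin(crime.s, prior = "n", start = start.crime,
+                        control = list(niter = 100))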
> crime.imp
$I1:
Visit.1 Visit.2 frequency
1 "Crime-free" "Crime-free" "529"
2 " Victim" "Crime-free" "100"
3 "Crime-free" " Victim" " 71"
4 " Victim" " Victim" " 56"
$I2:
Visit.1 Visit.2 frequency
1 "Crime-free" "Crime-free" "525"
2 " Victim" "Crime-free" " 96"
3 "Crime-free" " Victim" " 76"
4 " Victim" " Victim" " 59"
.
.
.
$I10:
Visit.1 Visit.2 frequency
1 "Crime-free" "Crime-free" "517"
2 " Victim" "Crime-free" "102"
3 "Crime-free" " Victim" " 82"
4 " Victim" " Victim" " 55"
attr(, "call"):
impLoglin.preLoglin(object = crime.s, prior = "n", start =
start.crime, control
= list(niter = 100))
attr(, "seed"):
[1] 33 2 58 26 51 3 47 25 42 12 28 1
Note
Any calculation on the result of crosstabs is another object of class "crosstabs", so the elements of the calculation must be integers. In particular, the odds ratio calculation fails, which is why we must use oldUnclass above.
Analysis Using Multiple Imputation
Several functions combine the separate complete data analyses to produce one result that accounts for missing data uncertainty. For the crime example, we test independence by:
• Combining odds ratios. Asymptotically, the log odds ratios are normally distributed. Following rules given by Rubin (1987), the multiple imputation inference is based on a t distribution.
• Combining likelihood ratio tests.
The asymptotic variance of the log odds ratio for a 2 × 2 table with cell counts x11, x12, x21, and x22 is
1/x11 + 1/x12 + 1/x21 + 1/x22.
We calculate the log odds ratio and the variance for each completed
data set as follows:
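A sketch of this step; the crosstabs formula interface and the frequency column name follow the imputation output above, and the variance is the asymptotic formula given earlier:

> logodd.fun <- function(data)
+ {
+   # data: one completed (grouped) data set
+   x <- oldUnclass(crosstabs(frequency ~ Visit.1 + Visit.2, data))
+   list(est = log(x[1, 1] * x[2, 2]/(x[1, 2] * x[2, 1])),
+        se = sqrt(1/x[1, 1] + 1/x[1, 2] + 1/x[2, 1] + 1/x[2, 2]))
+ }
> m.crime.logodd <- miEval(logodd.fun(crime.imp))

The combined object crime.logodd.comb used below then presumably comes from consolidating these estimates and standard errors with Rubin's rules.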
> exp(crime.logodd.comb$est)
[1] 3.809245
Table 10.1 compares the inferences for the odds ratio obtained using
the EM algorithm, DA algorithm, and multiple imputation.
Table 10.1: Comparison of inferences obtained for the odds ratio.
Method   Estimate   Lower Confidence Bound   Upper Confidence Bound
EM       3.57       NA                       NA
We can also test independence using the likelihood ratio test. The
following simple function calculates the test:
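A sketch of such a function for a two-way table of counts:

> lr.test <- function(x)
+ {
+   # expected counts under independence
+   e <- outer(apply(x, 1, sum), apply(x, 2, sum))/sum(x)
+   2 * sum(x * log(x/e))
+ }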
This confirms the likelihood ratio test result obtained using the EM algorithm in the section Fitting a Model Using EM on page 129.
EXAMPLE 3: THE CONDITIONAL GAUSSIAN MODEL
Overview 146
The Foreign Language Data 146
Exploring Patterns of Missingness 148
Summarizing and Plotting 148
Preprocessing Data 151
Model Fitting 152
Specifying a Restricted Model 152
Fitting a Model Using EM 153
Fitting a Model Using DA 154
Assessing Convergence 156
Multiple Imputation 158
Analyzing Completed Data Sets 159
Consolidating Inferences 161
Conclusions 163
OVERVIEW
This chapter provides detailed examples illustrating the conditional
Gaussian model fitting process, in which the variables with missing
values are either numeric or categorical. This model arises, for
example, in analysis of covariance and logistic regression with
continuous predictors. Chapter 4 briefly describes the conditional
Gaussian model, the associated priors, and the functions in
S+MISSINGDATA used to fit it. In this chapter, we illustrate the
S+MISSINGDATA functions using the foreign language example from
Schafer (1997). See Schafer (1997) for additional details and algorithm
descriptions.
The Foreign Language Data
Schafer (1997) illustrates the conditional Gaussian model using a data set of 279 students enrolled in foreign language courses at the Pennsylvania State University. The original data are given in Raymond and Roberts (1983). For each student, twelve variables were collected, including age and sex. Variables measuring academic achievement in foreign languages were also collected. One such variable, GRD, is the final grade in the foreign language course. Two instruments, the new Foreign Language Attitude Scale (FLAS) and the established Modern Language Aptitude Test (MLAT), are designed to predict success in studying foreign languages. The students' scores on these standardized tests were also collected and recorded.
The data from the foreign language study is included in
S+MISSINGDATA as the built-in data set language. It consists of 12
variables, including GRD, FLAS, and MLAT:
> language
AGE PRI SEX FLAS MLAT SATV SATM ENG HGPA CGPA GRD
1 20-21 3 male 74 32 540 660 58 3.77 3.75 A
2 <20 2 male 69 28 610 760 75 2.18 3.81 A
3 20-21 0 female 81 28 610 560 61 3.19 3.73 A
4 <20 4+ female 89 13 430 470 33 2.21 3.54 B
5 <20 3 male 56 26 630 630 78 3.59 4.00 NA
6 20-21 3 female 95 22 440 580 48 3.25 3.20 A
7 NA NA male 71 NA NA NA NA 2.46 NA NA
8 <20 4+ female 95 NA 560 540 55 2.00 2.77 NA
. . .
EXPLORING PATTERNS OF MISSINGNESS
Summarizing and Plotting
In this section, we use the miss function and its associated methods to explore the language data. As discussed in Chapter 3, the miss function is designed to facilitate exploratory data analysis for data sets that include missing values. It creates an object of class "miss", which by default rearranges the rows and columns of the data according to the numbers and patterns of missing values.
To create a miss object from the language data, type:
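Based on the object name used below, the command is presumably:

> language.miss <- miss(language)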
Ten of the twelve variables have at least one missing value. Omitting
cases with missing values would delete 38% of the observations.
Use summary for more detailed information. Here is the annotated
output from summary for language.miss:
> summary(language.miss)
Breakdown by variable
V O name Missing % missing
1 9 HGPA 1 0
2 3 SEX 1 0
3 1 AGE 11 4
4 2 PRI 11 4
5 6 SATV 34 12
6 7 SATM 34 12
7 10 CGPA 34 12
8 8 ENG 37 13
9 11 GRD 47 17
10 5 MLAT 49 18
V = Variable number used below, O = Original number (before
sorting)
No missing values for variables:
FLAS LAN
Of the 279 rows in the original language data, there are 18 distinct
patterns of missing values. These are shown in the next section of the
output from summary:
Observed values are displayed with a period and missing values with
an m. The output indicates that the first pattern has no missing values
while the second pattern has missing values only in variable 10. As we
previously noted, the tenth variable after reordering is MLAT.
Each pattern detected by the miss function corresponds to one or
more rows in the original data set. The correspondence between rows
and patterns is shown in the next section of the output from summary:
You can view an image plot of the language.miss object by using the
plot.miss function. Figure 11.1 displays the plot created by the
following command:
> plot(language.miss)
Figure 11.1: Image plot of the language.miss object. The twelve variables, ordered by the number of missing values, are plotted against the reordered observations (1 to 279).
Preprocessing Data
In the next section, we fit models to the language data using both the EM and DA algorithms. To save computation resources during the model fitting process, preprocess the language data by creating a preCgm object as follows:
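The text below indicates that no margins or gauss arguments are needed here; the object name language.s is assumed for illustration:

> language.s <- preCgm(language)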
The arguments margins and gauss to preCgm identify the factor and
numeric variables, respectively. Since language is a data frame and
these arguments are not supplied in the above command, all factor
variables are modeled by the loglinear part of the model and all
numeric variables are modeled by the (conditional) Gaussian part.
For additional details, see page 35 and the online help file for preCgm.
MODEL FITTING
Specifying a Restricted Model
The categorical variables in the language data set, AGE, PRI, SEX, GRD, and LAN, have 5, 5, 2, 5, and 4 levels, respectively. Together, they specify a 5-dimensional contingency table with 5 × 5 × 2 × 5 × 4 = 1000 cells. In addition, there are 7 numeric variables in language. An unrestricted model therefore involves (1000 − 1) + (1000 × 7) + (7 × (7 + 1)/2) = 8027 free parameters. Clearly, an unrestricted model cannot be fit with 279 observations!
Instead, Schafer (1997, page 367) suggests a restricted model in which:
• The table formed by the factor variables is described by a loglinear model with all main effects and two-variable associations.
• The numeric variables are collectively described by a regression with main effects for each factor variable. The eight-column design matrix for this regression includes an intercept, dummy indicators for SEX and LAN, and linear contrasts for AGE, PRI, and GRD.
To compute this restricted model, first specify the formula for a
loglinear model with all main effects and two-variable associations:
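A sketch of the formula; the name lang.margins is an assumption, and the ^2 operator expands to all main effects and two-way associations:

> lang.margins <- ~ (AGE + PRI + SEX + GRD + LAN)^2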
Finally, the formula that produces the appropriate design matrix is:
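A sketch, using the C function to request a single (linear) polynomial contrast for the ordered factors; the name lang.gauss is an assumption:

> lang.gauss <- ~ SEX + LAN + C(AGE, contr.poly, 1) + C(PRI, contr.poly, 1) + C(GRD, contr.poly, 1)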
Note that LAN and SEX are coded by dummy variables while the other
factor variables are represented by linear contrasts.
Fitting a Model Using EM
The command below fits the restricted conditional Gaussian model to the language data using the EM algorithm. To ensure a mode in the interior of the parameter space, Schafer (1997, page 369) recommends setting the Dirichlet prior hyperparameter to 1.05:
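A sketch of the call; the argument names for the prior and for the formulas defined above are assumptions:

> language.EM <- emCgm(language.s, margins = lang.margins,
+                      gauss = lang.gauss, prior = 1.05)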
Steps of ECM:
1...2...3...4...5...6...7...8...9...10...11...12...13...14
...15...16...17...18...19...20...21...22...23...24...25...
26...27...28...29...30...31...32...33...34...35...36...37
...38...39...40...41...42...43...44...45...46...47...48...
49...50...51...52...53...54...55...56...57...58...59...60
...61...62...63...64...65...66...
> language.EM$paramIter
ENG 43.504382
HGPA 1.516468
CGPA 2.530749
AGE=2;PRI=1;SEX=1;GRD=1;LAN=1
FLAS 70.311911
MLAT 15.833215
SATV 468.583594
SATM 513.125791
ENG 41.335946
HGPA 1.563725
CGPA 2.417728
. . .
> language.EM$algorithm
Fitting a Model Using DA
In this section, we use the DA model fitting algorithm on the language data. As discussed in Schafer (1997, page 369), a flattening prior may be undesirable for models of the language data because the AGE and GRD variables have rare levels. Flattening priors can distort the marginal distributions for these variables, leading to too many rare levels in the imputed values. Instead, Schafer recommends using a data dependent prior that smooths toward a table of mutual independence but leaves the marginal distributions unchanged. The hyperparameters are scaled so that they add to 50, giving the same effective prior sample size as used in the previous section for the EM algorithm.
Create such a data dependent prior as follows:
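A sketch of the construction described above: take the observed marginal proportions of each factor, form their product table (mutual independence), and scale the hyperparameters to sum to 50. All object names are assumptions:

> m <- lapply(language[, c("AGE", "PRI", "SEX", "GRD", "LAN")],
+             function(x) table(x)/sum(table(x)))
> language.prior <- 50 * outer(outer(outer(outer(m$AGE, m$PRI), m$SEX), m$GRD), m$LAN)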
ASSESSING CONVERGENCE
Since there are 8028 parameters for the language data, it is
unreasonable to monitor each of them individually with
autocorrelation plots. Instead, we look at the worst linear function of
the parameters. The rate of convergence for the EM algorithm is
governed by the fraction of missing information. To aid in assessing
convergence, we monitor a function with high rates of missing
information, since convergence is slowest for this type of function.
Schafer (1997, pages 129-131) recommends monitoring the worst linear function of θ.
Warning
When the posterior is non-normal, other functions may converge more slowly. Thus, do not
blindly rely on the apparent stationarity of the worst linear function without enlisting other
diagnostic techniques.
Figure 11.2: ACF plot of the worst linear function of the parameters in the language models. The
autocorrelations seem to die out by iteration 50.
MULTIPLE IMPUTATION
The ACF of the worst linear function in Figure 11.2 suggests convergence by 50 iterations. To be conservative, we draw an imputation every 250 iterations. The following command generates ten imputations, starting from the last parameter values in the language.DA object:
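A sketch, patterned after the impLoglin call recorded in Chapter 10; the function name impCgm and the argument names are assumptions:

> language.imp <- impCgm(object = language.s, prior = language.prior,
+                        start = language.DA, control = list(niter = 250))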
> miSubscript(language.imp, 2)
ANALYZING COMPLETED DATA SETS
r = ± sqrt( T^2 / (T^2 + ν) ),
where the sign is chosen to be the sign of T. Moreover, atan(r) is asymptotically Gaussian with a mean of atan(ρ) and variance 1/(ν − 1). We use this fact in the next section to apply Rubin's rule and consolidate inferences.
The first step in estimating partial correlations is to fit a linear model
to each of the ten completed data sets for language:
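A sketch; the regression formula described by Schafer (1997) is abbreviated to the hypothetical name lang.formula:

> language.fits <- miEval(lm(lang.formula, data = language.imp))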
To apply Rubin's rule, we must calculate the estimate and its standard error for each completed data set. First, calculate the transformed partial correlation for each of the data sets:
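A sketch, applying the transformation above to the t statistics of each fit; all object names are assumptions:

> partial.z <- function(fit)
+ {
+   tval <- summary(fit)$coefficients[, "t value"]
+   nu <- fit$df.residual
+   atan(sign(tval) * sqrt(tval^2/(tval^2 + nu)))
+ }
> m.language.z <- miEval(partial.z(language.fits))
> # the corresponding standard error for Rubin's rule is sqrt(1/(nu - 1))
> m.language.se <- miEval(sqrt(1/(language.fits$df.residual - 1)))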
CONSOLIDATING INFERENCES
The following command calculates the consolidated estimate of the
transformed partial correlation:
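Presumably this consolidates the transformed estimates and their standard errors; the function name and signature here are assumptions:

> partCorr <- consolidate(m.language.z, m.language.se)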
> tan(partCorr$est)
> partCorr$fminf
CONCLUSIONS
As Schafer (1997, page 372) indicates, the assumptions underlying the regression model and the normal approximation to the transformed partial correlation do not hold, so the estimated partial correlation coefficients must be interpreted loosely. Even so, these estimates indicate that, except for HGPA, the FLAS variable has the highest partial correlation with GRD. In particular, its partial correlation is higher than that of the well-established instrument MLAT.
BIBLIOGRAPHY