Chapter 267
D-Optimal Designs
Introduction
This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative
factors. The factors may have different numbers of levels. For example, you could use this procedure to design an
experiment with two quantitative factors having three levels each and a qualitative factor having seven levels.
D-optimal designs are constructed to minimize the generalized variance of the estimated regression coefficients.
In the multiple regression setting, the matrix X is often used to represent the data matrix of independent variables.
D-optimal designs minimize the overall variance of the estimated regression coefficients by maximizing the
determinant of X’X. Designs that are D-optimal have been shown to be nearly optimal for several other criteria
that have been proposed as well.
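To make the criterion concrete, here is a minimal Python sketch (NumPy is assumed; the function names are hypothetical and not part of NCSS). It scores every three-point design for a straight-line model y = b0 + b1·x, with x drawn from the scaled levels -1, 0, and 1, and reports which designs maximize the determinant of X’X.

```python
import itertools
import numpy as np

def line_model_matrix(xs):
    """Expanded design matrix for y = b0 + b1*x: an intercept column plus x."""
    return np.column_stack([np.ones(len(xs)), np.asarray(xs, dtype=float)])

def det_xtx(X):
    """D-criterion: determinant of X'X (larger is better)."""
    return float(np.linalg.det(X.T @ X))

def best_line_designs(levels=(-1, 0, 1), n=3):
    """Exhaustively score every n-point design (points may repeat)."""
    scored = [(round(det_xtx(line_model_matrix(d)), 6), d)
              for d in itertools.combinations_with_replacement(levels, n)]
    best = max(score for score, _ in scored)
    return best, [d for score, d in scored if score == best]

best, designs = best_line_designs()
print(best, designs)  # → 8.0 [(-1, -1, 1), (-1, 1, 1)]
```

The winning designs put every point at an extreme level. The balanced design (-1, 0, 1) scores only 6, because its center point adds nothing to the precision of the slope.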
When would you use D-optimal designs? When you have a limited budget and cannot run a completely replicated
factorial design. For example, suppose you want to study the response to three factors: A with three levels, B with
four levels, and C with eight levels. One complete replication of this experiment would require 3 x 4 x 8 = 96
points (we use the word ‘point’ to mean an experimental unit). Suppose you can afford only 20 points. Which 20
of the 96 possible should you use? The D-optimal design algorithm provides a reasonable choice.
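A quick count shows why the choice cannot be made by trying every possibility. This sketch counts only designs without repeated points.

```python
import math

levels = {"A": 3, "B": 4, "C": 8}
full_factorial = math.prod(levels.values())  # points in one complete replication
subsets = math.comb(full_factorial, 20)      # distinct 20-point designs without repeats
print(full_factorial)  # → 96
print(f"{subsets:.2e}")
```

The count is on the order of 10^20, which is far too many to evaluate one at a time. That is why a search algorithm is needed.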
© NCSS, LLC. All Rights Reserved.
NCSS Statistical Software NCSS.com
100 random starting sets are needed. (During the testing of the algorithm, we found that some designs required
500 starts to obtain the global maximum.)
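This excerpt does not describe the exact search algorithm. To show why many random starts matter, here is a minimal sketch of a generic point-exchange search. This is an assumption and not necessarily the algorithm NCSS uses. The sketch chooses six points from a 3 x 3 candidate grid for a full quadratic model in two factors, repeats the search from several random starts, and counts how many starts reached the best determinant found. NumPy is assumed, and all names are hypothetical.

```python
import itertools
import random
import numpy as np

def quad_row(a, b):
    """Full quadratic model in two factors: 1, A, B, A*B, A*A, B*B."""
    return [1.0, a, b, a * b, a * a, b * b]

def log_det(X):
    """log |X'X|, or -inf when X'X is singular."""
    sign, value = np.linalg.slogdet(X.T @ X)
    return value if sign > 0 else -np.inf

def exchange(F, n, rng, max_passes=50):
    """One random start followed by greedy single-point swaps.
    Assumed heuristic for illustration, not NCSS's documented method."""
    design = rng.sample(range(len(F)), n)  # distinct starting rows
    best = log_det(F[design])
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            for j in range(len(F)):
                trial = design.copy()
                trial[i] = j  # swap design point i for candidate row j
                value = log_det(F[trial])
                if value > best + 1e-9:
                    design, best, improved = trial, value, True
        if not improved:
            break  # local maximum: no single swap helps
    return sorted(design), best

def search(F, n, starts=30, seed=1):
    """Run several random starts; report the best design, its determinant,
    and how many starts reached that determinant."""
    rng = random.Random(seed)
    runs = [exchange(F, n, rng) for _ in range(starts)]
    top = max(value for _, value in runs)
    hits = sum(1 for _, value in runs if value > top - 1e-6)
    design = next(d for d, value in runs if value == top)
    return design, float(np.exp(top)), hits

F = np.array([quad_row(a, b) for a, b in itertools.product((-1, 0, 1), repeat=2)])
design, det, hits = search(F, n=6)
print(design, det, hits)
```

In this sketch, `hits` plays the same role as the iteration count discussed under the Determinant Analysis report. If only a few starts reach the top value, rerun the search with more starts.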
Factor Scaling
This algorithm deals with both quantitative (continuous) and qualitative (discrete) factors. The levels of
quantitative factors are scaled so that the minimum value is -1 and the maximum value is 1. Qualitative factors are
included as a set of variables. For example, suppose that a qualitative variable has four values. Three independent
variables are created to represent this factor:
Original  X1  X2  X3
1         -1   0   0
2          0  -1   0
3          0   0  -1
4          1   1   1
As you can see, each of these variables compares a separate group with the last group. Also note that the number
of generated variables is always one less than the number of levels.
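The coding pattern in the table above can be written as a small function. This is a sketch of that table's pattern for any number of levels, not the program's internal code, and the function name is hypothetical.

```python
def qualitative_coding(level, n_levels):
    """Code one qualitative factor value (1..n_levels) as n_levels - 1 variables.

    Level i < n_levels gets -1 in column i and 0 elsewhere;
    the last level gets +1 in every column.
    """
    if not 1 <= level <= n_levels:
        raise ValueError("levels must be positive integers 1..n_levels")
    if level == n_levels:
        return [1] * (n_levels - 1)
    row = [0] * (n_levels - 1)
    row[level - 1] = -1
    return row

for level in range(1, 5):
    print(level, qualitative_coding(level, 4))
```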
Duplicates (Replicates)
The measurement of experimental error is extremely important in the analysis of an experiment. In most cases, if
an estimate of experimental error is not available, the data from the experiment cannot be analyzed. One of the
best estimates of experimental error comes from points that are duplicates (often called replicates) of each other.
Since D-optimal designs are often used in situations with limited budgets, the experimenter is often tempted to
ignore the need for duplicates and instead add points with additional treatment combinations. The tenth
commandment of experimental design should be “Thou shalt have at least four duplicates in an experiment.”
Unfortunately, the D-optimal design algorithm ignores the need for duplicates. Instead, you have to add them after
the experimental design has been found. To do this, hold back at least four points from the algorithm. For
example, suppose you have a budget for 20 design points. You would tell the program that you have only 16 points.
The algorithm would find the best 16 point design. You would then duplicate four of the resulting design points to
provide an estimate of experimental error. We recommend that you spread these duplicates out across the
experiment so you can have some indication as to whether the magnitude of the experimental error is constant
across all treatment settings.
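Duplicates give a model-free estimate of experimental error, often called pure error. Here is a minimal sketch that assumes each duplicated point was run exactly twice. The pooled variance is the sum of squared within-pair differences divided by twice the number of pairs. The response values are made up for illustration.

```python
def pure_error_variance(pairs):
    """Pooled variance from duplicate pairs: each pair contributes
    (y1 - y2)**2 / 2 with one degree of freedom."""
    if not pairs:
        raise ValueError("need at least one duplicate pair")
    ss = sum((y1 - y2) ** 2 / 2 for y1, y2 in pairs)
    return ss / len(pairs)

# Hypothetical responses for four duplicated design points.
pairs = [(10.2, 9.8), (15.1, 15.9), (12.0, 11.4), (8.7, 9.1)]
print(round(pure_error_variance(pairs), 4))  # → 0.165
```

With four duplicate pairs, the estimate rests on four degrees of freedom.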
Specifying a Model
Selecting an appropriate model is subjective by nature. Often, you will know very little about the true functional
form of the relationship between the response and the factor variables. A common approach is to assume that a
second-order Taylor-series approximation will work fairly well. You are assuming that the true function may be
approximated by a parabolic surface in the neighborhood of interest. Cutting down on the complexity of the model
reduces the number of points that must be added to the experimental design.
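The cost of a second-order model is easy to count. For k quantitative factors, a full quadratic model has an intercept, k main effects, k squared terms, and k(k-1)/2 two-factor interactions, which gives (k+1)(k+2)/2 coefficients in all. Least squares needs at least that many distinct points. This is a general fact about regression, not a rule specific to this procedure, and the function name is hypothetical.

```python
from math import comb

def quadratic_terms(k):
    """Coefficients in a full second-order model with k quantitative factors."""
    return 1 + k + k + comb(k, 2)  # intercept, linear, squared, interactions

for k in range(2, 6):
    print(k, quadratic_terms(k))
```

With three factors, the model has 10 coefficients. This matches the ten-point, three-factor example later in this chapter.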
When dealing with qualitative factors, you generally limit the model to first order interactions. Higher order
interactions may be studied later when a complete experiment can be run.
Procedure Options
This section describes the options available in this procedure.
Design Tab
This panel specifies the parameters that will be used to create the design values.
Experimental Setup
N Per Block
This option specifies the required sample size. If you are not using blocks, enter a single number giving the total
sample size. The sample size must be large enough to fit the designated model. If it is not large enough, you will
be shown the minimum number of points necessary.
If you are using blocks, enter the sample size for each block, separated by blanks or commas. These sample sizes
do not have to be equal, although they usually are. For example, if you have three blocks, you might enter 8,8,12
which would give an overall sample size of 28. The first block will have 8 points, the second 8 points, and the
third 12 points.
Be careful when specifying blocks if you also have forced design points. In this case, the first few
blocks are matched with the forced design points, so the sizes of those blocks must add up exactly to the number of forced points.
For example, suppose you have already run two blocks of four each and you want to augment this with three
blocks of six each. You would have eight forced points. The entry in this field would be 4,4,6,6,6. If you entered
4,3,7,6,6 an error would occur because the forced points cannot be assigned exactly to one or more blocks. The
bottom line is, you cannot force partial blocks into the design.
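The rule above can be expressed as a check on the running totals of the block sizes. The forced points must end exactly at a block boundary. This is a hypothetical helper for illustration, not part of NCSS.

```python
from itertools import accumulate

def check_forced_blocks(block_sizes, n_forced):
    """Return how many leading blocks hold the forced points, or raise
    if the forced points would fill only part of a block."""
    if n_forced == 0:
        return 0
    for count, total in enumerate(accumulate(block_sizes), start=1):
        if total == n_forced:
            return count
        if total > n_forced:
            break
    raise ValueError("forced points cannot be assigned to whole blocks")

print(check_forced_blocks([4, 4, 6, 6, 6], 8))  # → 2
```

With 4,3,7,6,6, the running totals are 4, 7, and 14. None of these equals 8, so the check fails, just as the program reports an error.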
Input Variables (Candidate and Forced)
When specified, these variables contain either a set of points to be forced into the final design, a set of candidate
points from which the design is to be selected, or both. The data must be arranged so that the forced points are
located at the top of the spreadsheet followed by any candidate points. When candidate points are specified, no
additional candidate points are generated. If you want to force points in the design and choose the rest from
among those generated by the model statement, the total number of rows in these variables must equal the total
number of forced rows specified below.
Note that these variables are matched with the factors specified in the model after those factors have been sorted.
Qualitative factors must be entered using positive integers (1, 2, 3, etc.). You cannot use any other identifiers. If
you have data entered using some other scheme (such as A, B, C, etc.), you will have to recode the values so that
they are positive integers.
Quantitative factors must be scaled so that the minimum value is -1 and the maximum value is 1. For example,
suppose an existing design has a factor whose values are 10, 15, and 20. Here the minimum is 10 and the
maximum is 20. You would transform these using the formula
Scaled = (Original + Original - Max - Min) / (Max - Min)
Since, in this example, Max = 20 and Min = 10, the transformation reduces to Scaled = (Original + Original - 30)/10
= Original/5 - 3. You would create a new variable using the transformation Original/5 - 3. This transformation
would give 10/5 - 3 = -1, 15/5 - 3 = 0, and 20/5 - 3 = 1. That is, the new variable would contain -1’s, 0’s, and 1’s
instead of 10’s, 15’s, and 20’s.
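The same transformation can be written as a one-line function. This is a sketch, and the name is hypothetical.

```python
def to_scaled(original, lo, hi):
    """Map [lo, hi] onto [-1, 1]: (2*original - hi - lo) / (hi - lo)."""
    return (2 * original - hi - lo) / (hi - lo)

print([to_scaled(x, 10, 20) for x in (10, 15, 20)])  # → [-1.0, 0.0, 1.0]
```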
Number Duplicates
It is very important to have duplicates of at least some of the design points to provide an estimate of experimental
error. This option designates the number of duplicates to be generated. The first design point is duplicated, then
the second, and so on. Even though this option is convenient, we recommend that you pick appropriate points for
duplication by looking at scatter plots of the design.
If your design includes blocking, you should not create duplicates since that will give erroneous block sizes.
Rather, you should manually create duplicates.
Input Data Type
If you have Input Variables specified, this option specifies the type of data contained in those variables. Two
types of data are possible.
• Factor Values
Specifies that the input data contains indices of each factor. An expanded design matrix will be generated
from these factor indices using the designated model. This is the more common data type.
• Expanded Matrix
Specifies that the input dataset contains the expanded design matrix. That is, the quadratic, cubic, and
interaction terms have been created. The model statement is not used. You would use this option when you
want to specify the candidate design set in more detail than is allowed by the program. The expanded matrix
must include the intercept (a column of ones) if one is to be included in the model.
Forced Points
The number of rows in the Input Variables that should be forced into the final design. These rows must be located
at the top of the database, before any candidate points. If the number of forced points is equal to the number of
points read in, the generated design matrix is used. Otherwise, the additional rows are used as candidate points
and no other rows are generated.
Optimize the Design for this Model
Your design is optimized for the model specified here. Specify main effects (factors) with names consisting of
one or more letters, such as A B C. Specify interactions using an asterisk (*), such as A*B. You can use the bar (|)
symbol (see examples below) as a shorthand method to specify a complete model. You can use parentheses. You
can separate terms with blanks or the '+' (plus) sign. Duplicate terms are removed during the evaluation of the
model. Note that the main effects are always sorted in alphabetical order.
Some examples will help to indicate how the model syntax works:
C + B + A + B*A + C*A = A+B+C+A*B+A*C (Note the sorting!)
A|B = A+B+A*B
B|A = A+B+A*B
A|B A*A B*B = A+B+A*B+A*A+B*B
A|A|B|B (Max Term Order=2) = A+B+A*B+A*A+B*B
A|B|C = A+B+C+A*B+A*C+B*C+A*B*C
(A+B)*(C+D) = A*C+A*D+B*C+B*D
(A+B)|C = A+B+C+(A+B)*C
= A+B+C+A*C+B*C
You can experiment with various expressions by viewing the Model Terms report.
For quantitative factors, each term represents a single variable in the expanded design matrix. For qualitative
variables, each term represents a set of variables in the expanded design matrix.
Note that qualitative terms should not be squared or cubed. That is, if A is a qualitative factor, you would not
include A*A or A*A*A in your model.
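The bar shorthand can be illustrated with a short sketch. It expands a single chain such as A|B|C into every product of the listed factors, sorts the factors within each term, removes duplicates, and optionally caps the term order. Parentheses and the + sign are not handled. The function is hypothetical, and its ordering within an order (for example, A*A before A*B) may differ from the program's Model Terms report.

```python
from itertools import combinations

def expand_bar(expr, max_order=None):
    """Expand a single bar chain such as 'A|B|C' into model terms.

    Every product of the listed factors is formed, factors inside a term are
    sorted alphabetically, duplicates are dropped, and max_order (if given)
    caps how many factors a term may contain.
    """
    factors = [f.strip() for f in expr.split("|")]
    top = len(factors) if max_order is None else max_order
    terms = set()
    for order in range(1, top + 1):
        for combo in combinations(factors, order):
            terms.add("*".join(sorted(combo)))
    # Lower-order terms first, then alphabetical within an order.
    return sorted(terms, key=lambda t: (t.count("*"), t))

print(expand_bar("A|B|C"))
# → ['A', 'B', 'C', 'A*B', 'A*C', 'B*C', 'A*B*C']
print(expand_bar("A|A|B|B", max_order=2))
# → ['A', 'B', 'A*A', 'A*B', 'B*B']
```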
Storage Tab
This panel specifies the parameters that will be used to store values to the spreadsheet.
Data Storage
Store Data with the Dataset
Check this box to generate the design data on the dataset. The data will be identical to the design data generated
on the output window.
First Factor Column
If the Input Data Type is set to Factor Values, the final design is stored in a set of contiguous columns of the
dataset, beginning with this column. Be careful not to overwrite existing data. If you have four factors, the design
will be stored in this variable and the next three to the right. Existing data will be lost!
If the Input Data Type is set to Expanded Matrix, an index is stored in this variable that represents whether the
row is used in the design. If the row is not in the optimum design, a zero is stored. If the row is in the optimum
design, the number of times it occurs is stored here.
Reports Tab
This panel specifies the reports that will be generated.
Select Reports
Factor Report - Expanded Design Matrix Report
These options control which reports are displayed. Some of the reports may be fairly lengthy, so you will often
want to omit them.
Report Options
Precision
Specify the precision of numbers in the report. A single-precision number will show seven-place accuracy, while
a double-precision number will show thirteen-place accuracy. Note that the reports are formatted for single
precision. If you select double precision, some numbers may run into others. Also note that all calculations are
performed in double precision regardless of which option you select here. This is for reporting purposes only.
Decimal Places
Specify the number of decimal places shown when displaying the design.
Several columns in the dataset are filled with data. The first, second, and third columns (A, B, and C) contain the
actual design. You would replace the -1’s with the corresponding factor’s minimum value, the 1’s with the
maximum value, and the 0’s with the average of the two.
The columns from Intercept to C_C contain the expanded design matrix. Each variable is generated by
multiplying the appropriate factor values. For example, in the first row, A_B is found by multiplying the value for
A, which is -1, by the value for B, which is also -1. The result is 1. The intercept is set to one for all rows. The
expanded matrix is usually saved so that the design can be analyzed using multiple regression.
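Each expanded column is a product of factor values. Here is a minimal sketch for one row of the full quadratic model in A, B, and C. The column names follow the Intercept ... C_C convention mentioned above, but the exact column order used by the program is an assumption, and the function name is hypothetical.

```python
def expand_quadratic(a, b, c):
    """One row of the expanded design matrix for the full quadratic
    model in A, B, and C (column order is illustrative)."""
    return {"Intercept": 1, "A": a, "B": b, "C": c,
            "A_B": a * b, "A_C": a * c, "B_C": b * c,
            "A_A": a * a, "B_B": b * b, "C_C": c * c}

print(expand_quadratic(-1, -1, -1)["A_B"])  # → 1
```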
To use this design, you would randomly assign these ten points to the ten experimental units.
Factor Section
Name  Number Values  Type          Value1   Value2  Value3
A     3              Quantitative  -1.0000  0.0000  1.0000
B     3              Quantitative  -1.0000  0.0000  1.0000
C     3              Quantitative  -1.0000  0.0000  1.0000
This report summarizes the factors that were included in the design. The last line of this report gives the number
of observations required for one complete replication of the experiment. This value is the product of the number
of levels for each factor.
Name
The symbol(s) used to represent the factor.
Number Values
The number of values (levels) generated for each factor. For qualitative factors, this value was set in the
Qualitative Factors and Levels box of the Design panel. For quantitative factors, this value is one more than the
highest exponent used with this term. For example, if the model includes an A*A and nothing of a higher order,
this value will be three.
Type
A factor is either quantitative or qualitative.
Value1 - Value3
These columns list the individual values that are used as the levels of each factor when generating the expanded
design matrix based on the model. Notice that the smallest is always -1 and the largest is always 1.
When the expanded design matrix is input directly, these values should be ignored.
This report shows the terms generated by your model. You should check this report carefully to make sure that the
generated model matches what you wanted. The last line of the report gives the total number of degrees of
freedom (except for the intercept) required for your model. This number plus one is the minimum size of the D-
optimal design for this model.
Variables Needed
The number of degrees of freedom (expanded design variables) required for this term.
Term
The name of each term.
D-Optimal Design
Original Factors
Row A B C
1 -1 -1 -1
3 1 -1 -1
5 0 0 -1
7 -1 1 -1
9 1 1 -1
13 -1 0 0
17 0 1 0
20 0 -1 1
25 -1 1 1
27 1 1 1
The values 10, 15, and 20 represent the three levels of factor A that are used in the design. They would replace the
-1, 0, and 1 displayed in this report.
This report shows the largest twenty determinants. The main purpose of this report is to let you decide if enough
iterations have been run so that a global maximum has been found. Unless the maximum value was achieved on at
least five iterations, you should double the number of iterations and rerun the procedure.
In this example, the top value occurred on only two iterations. In practice we would probably try another 200
iterations to find out if this is the global maximum.
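The rule of thumb in the preceding paragraph can be written as a small check. Given the determinants from each random start, count how many reached the maximum and flag the run as converged only when at least five did. The function name and the relative tolerance are assumptions for illustration.

```python
def convergence_check(determinants, min_hits=5, rel_tol=1e-6):
    """Count how many random starts reached the largest determinant and
    report whether that meets the min_hits rule of thumb."""
    top = max(determinants)
    hits = sum(1 for d in determinants if d >= top * (1 - rel_tol))
    return hits, hits >= min_hits

print(convergence_check([5.0, 5.0, 4.0]))  # → (2, False): double the starts
```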
Rank
Only the top twenty are shown on this report. The values are sorted by the determinant.
Determinant of X’X
This is the value of the determinant of X’X, which is the statistic that is being maximized. Its reciprocal is
proportional to the generalized variance of the regression coefficients. Since this value occurs in the denominator of the
variance of each regression coefficient, maximizing it has the effect of reducing the variance of the estimated
regression coefficients.
D-Efficiency
D-efficiency is the relative number of runs (expressed as a percent) required by a hypothetical orthogonal design
to achieve the same determinant value. It provides a way of comparing designs across different sample sizes.
$$DE = 100 \times \frac{\left|X'X\right|^{1/p}}{N}$$
where p is the total number of degrees of freedom in the model and N is the number of points in the design.
Percent of Maximum
This is the determinant on this row expressed as a percentage of the best determinant found.
Determinant 1327104
D-Efficiency 40.95345
Trace 4.583333
A-Efficiency 21.81818
This report shows the diagonal elements of X’X and its inverse. Since the variance of each coefficient is
proportional to the corresponding diagonal element of the inverse of X’X, the last column of this report lets you compare those
variances. From this report you can determine whether the coefficients will be estimated with the desired relative
precision.
For example, this report shows that the main effects will be estimated with the greatest precision, which is
usually a desirable quality in a design.
Number
An arbitrary sequence number.
Name
The name of the term.
Diagonal of X’X
The diagonal element of this term in the X’X matrix.
Diagonal of X’X Inv
The diagonal element of this term in the X’X inverse matrix. See the discussion above for an understanding of
how this value might be interpreted.
Determinant
This is the value of the determinant of X’X, which is the statistic that is being maximized. Its reciprocal is
proportional to the generalized variance of the regression coefficients. Since this value occurs in the denominator of the
variance of each regression coefficient, maximizing it has the effect of reducing the variance of the estimated
regression coefficients.
D-Efficiency
D-efficiency is the relative number of runs (expressed as a percent) required by a hypothetical orthogonal design
to achieve the same determinant value. It provides a way of comparing designs across different sample sizes.
$$DE = 100 \times \frac{\left|X'X\right|^{1/p}}{N}$$
where p is the total number of degrees of freedom in the model and N is the number of points in the design.
Trace
This is the value of the trace of the inverse of X’X, which is associated with A-optimality.
A-Efficiency
A-efficiency is the relative number of runs (expressed as a percent) required by a hypothetical orthogonal design
to achieve the same trace value. It provides a way of comparing designs across different sample sizes.
$$AE = 100 \times \frac{p}{N \, \mathrm{trace}\left[(X'X)^{-1}\right]}$$
where p is the total number of degrees of freedom in the model and N is the number of points in the design.
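This sketch implements the D-efficiency and A-efficiency formulas and checks them against the Design Summary values reported in this chapter (Determinant 1327104, Trace 4.583333, D-Efficiency 40.95345, A-Efficiency 21.81818). It assumes N = 10 points and p = 10 model terms, that is, the ten-term full quadratic model in A, B, and C including the intercept. NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np

def d_efficiency(det_xtx, p, n):
    """DE = 100 * |X'X|**(1/p) / N."""
    return 100.0 * det_xtx ** (1.0 / p) / n

def a_efficiency(trace_inv, p, n):
    """AE = 100 * p / (N * trace((X'X)^-1))."""
    return 100.0 * p / (n * trace_inv)

def design_summary(X):
    """Determinant, trace of the inverse, and both efficiencies for an
    expanded design matrix X (N rows, p columns including the intercept)."""
    n, p = X.shape
    xtx = X.T @ X
    det = float(np.linalg.det(xtx))
    trace_inv = float(np.trace(np.linalg.inv(xtx)))
    return det, trace_inv, d_efficiency(det, p, n), a_efficiency(trace_inv, p, n)

# Values from the Design Summary report (N = 10 points, p = 10 terms).
print(round(d_efficiency(1327104, 10, 10), 5))   # → 40.95345
print(round(a_efficiency(4.583333, 10, 10), 5))  # → 21.81818

# A 2x2 factorial with a first-order model is orthogonal, so both efficiencies are 100.
X = np.array([[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]], dtype=float)
print([round(v, 4) for v in design_summary(X)])  # → [64.0, 0.75, 100.0, 100.0]
```

The orthogonal example shows the reference point that both efficiencies are measured against. A design with 100% efficiency is as good as an orthogonal design of the same size.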
This report gives a list of candidate points from which the D-optimal design points were selected.
Original Row
This is an arbitrary identification number.
Factors (A B C)
These are the values of the factors. For example, the first row sets A, B, and C to -1. Remember that these are
scaled values. You would transform them back into their original metric using the formula:
Original = (Scaled(Max - Min) + Max + Min)/2
For example, suppose the original metric for factor A is minimum = 10 and maximum = 20. The original values
would be calculated as follows:
Scaled Formula Original
-1 (-1(20-10)+20+10)/2 10
0 (0(20-10)+20+10)/2 15
1 (1(20-10)+20+10)/2 20
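The back-transformation in the table above can be written as a one-line function. This is a sketch, and the name is hypothetical.

```python
def to_original(scaled, lo, hi):
    """Undo the -1..1 scaling: (scaled*(hi - lo) + hi + lo) / 2."""
    return (scaled * (hi - lo) + hi + lo) / 2

print([to_original(s, 10, 20) for s in (-1, 0, 1)])  # → [10.0, 15.0, 20.0]
```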
The values 10, 15, and 20 represent the three levels of factor A. They would replace the -1, 0, and 1 displayed in
this report.
This report gives a list of candidate points expanded so that each individual term may be seen. The report is useful
to show you how the expanded matrix looks. Each variable is generated by multiplying the appropriate factor
values. For example, in the first row, A_B is found by multiplying the value for A, which is -1, by the value for B,
which is also -1. The result is 1. The intercept is set to one for all rows.
If you want to constrain the design space, you could cut and paste these values back into the spreadsheet and then
eliminate points that cannot occur.
Plot of Design (scatter plots): A vs B, A vs C, B vs C
Finally, we ran the D-optimal design through the Scatter Plot procedure so that we could visually see how the
design values are placed.
From these three scatter plots, we can see the configuration of the points fairly well. It appears that the B*C term
is missing two points while the A*B and A*C terms are missing only one. Using this information, we would want
to arrange our factors in such a way that the B*C term is the least likely to have an interaction.
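The same check can be made without plots by listing the empty cells of each two-factor grid. This sketch uses the ten-point design from the D-Optimal Design report, in scaled values, and reproduces the observation that B*C is missing two points while A*B and A*C are each missing one. The helper name is hypothetical.

```python
from itertools import combinations, product

# The ten-point design from the D-Optimal Design report (scaled values).
design = [(-1, -1, -1), (1, -1, -1), (0, 0, -1), (-1, 1, -1), (1, 1, -1),
          (-1, 0, 0), (0, 1, 0), (0, -1, 1), (-1, 1, 1), (1, 1, 1)]
names = "ABC"

def missing_cells(design, i, j, levels=(-1, 0, 1)):
    """Level combinations of factors i and j that no design point uses."""
    used = {(row[i], row[j]) for row in design}
    return [cell for cell in product(levels, levels) if cell not in used]

for i, j in combinations(range(3), 2):
    print(names[i] + "*" + names[j], missing_cells(design, i, j))
# → A*B [(1, 0)]
#   A*C [(1, 0)]
#   B*C [(-1, 0), (0, 1)]
```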
Columns A and B give the design. The Determinant Analysis Section showed that the maximum was achieved on
25 of the 30 iterations. Hence, we assume that the algorithm converged to the global maximum.
Next, we add the two duplicates to the design. When only a few duplicates are available, we like to have them in
the middle, so we will duplicate the two rows having zero values. We choose random numbers for the two new
response values. The resulting design appears as follows.
Next, we change the factor values back to their original scale. Factor A went from 10 to 20 and factor B went
from 1 to 3. We call the two new variables A1 and B1. While we are at it, we also create other columns of the
expanded design matrix. The resulting dataset appears as follows.
We could continue this exercise by running these data through the multiple regression procedure and paying
particular attention to the Multicollinearity Section and the Eigenvalues of Centered Correlations Section. When
we did this, we found that multicollinearity seemed to be a problem in the original scale, but not in the -1 to 1
scale used by the D-optimal algorithm.
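One simple reason for this finding is that a factor and its square are nearly collinear when the levels are far from zero. The correlation disappears when the levels are centered at zero. This sketch illustrates that effect using the levels of factor A. It is not a reproduction of the Multiple Regression procedure's reports. NumPy is assumed, and the function name is hypothetical.

```python
import numpy as np

def corr_with_square(levels):
    """Correlation between a factor's levels and their squares."""
    x = np.asarray(levels, dtype=float)
    return float(np.corrcoef(x, x ** 2)[0, 1])

print(round(corr_with_square([10, 15, 20]), 4))  # original scale: close to 1
print(round(corr_with_square([-1, 0, 1]), 4))    # scaled: essentially zero
```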
Plot of Design (scatter plot): A1 vs B1
In order to better understand the design, we look at a scatter plot of the two factors. Remember that this began as a
six-point design. We can see from this plot that the optimum configuration puts points at each corner and in the
middle—just what we would expect. Viewing the design configuration is extremely important.
Remember that we duplicated the two center points of this design.
Variables A, B, C, and Blocks give the design. The Determinant Analysis Section showed that the maximum was
achieved on 12 of the 100 iterations. Hence, we assume that the algorithm converged to the global maximum.
In order to visually analyze the design, we generate the scatter plots for each pair of variables in the design.
Plot of Design (scatter plots): A vs B, A vs C, B vs C, and Block vs A, B, and C
We can see from these plots that each of the interactions seems to be well represented: only a few points are
missing from each, and none of these are on the corners. The design seems pretty good. We decide to use the
interactions with blocks as the measure of experimental error, so no other duplicates are needed.
As an exercise, try adding one more block to this experiment. You will notice that each of the two-way interaction
plots is completely full.
Columns A, B, C, and Blocks give the design. The new block is shown as the last four rows of the design.
The Determinant Analysis Section showed that the maximum was achieved on 9 of the 30 iterations. Hence, we
assume that the algorithm converged to the global maximum.
In order to visually analyze the design, we generate the scatter plots for each pair of variables in the design.
Plot of Design (scatter plots): Ax vs Bx, Ax vs Cx, Bx vs Cx
We set the plotting symbols in the scatter plots so that the new points are displayed as squares. It is interesting to
see where these points were added.
Scatter plots of the candidate points: A vs B, A vs C, B vs C
The task for the algorithm is to pick the ten best points from the thirteen that are shown here.
You may follow along here by making the appropriate entries or load the completed template Example 5 by
clicking on Open Example Template from the File menu of the D-Optimal Designs window.
Mixture Design
Original Factors
Row A B C
1 0.7000 0.1000 0.2000
2 0.2000 0.6000 0.2000
3 0.7000 0.2000 0.1000
4 0.2000 0.2000 0.6000
5 0.3000 0.6000 0.1000
6 0.3000 0.1000 0.6000
8 0.2000 0.4000 0.4000
9 0.5000 0.1000 0.4000
11 0.5000 0.4000 0.1000
13 0.4000 0.3000 0.3000
Columns A, B, and C give the design. The original row from the candidate list is shown as the first column of the
report.
The Determinant Analysis Section showed that the maximum was achieved on 30 of the 30 iterations. Hence, we
assume that the algorithm converged to the global maximum.
In order to visually analyze the design, we generate the scatter plots for each pair of variables in the design.
Plot of Design (scatter plots): Ax vs Bx, Ax vs Cx, Bx vs Cx
It is interesting to compare these plots with those produced earlier to see which points were kept by the algorithm.
Columns A, B, and C give the design. Notice that column C simply gives the level for factor C—it was not
rescaled. Also note that the levels of factor C are numbered arbitrarily. This means that only the pattern is
important, not the particular level. For example, in this solution, there are only three level 2’s and three level 4’s.
In the next solution, there might be three level 3’s and three level 4’s.
The Determinant Analysis Section showed that the maximum was achieved on 5 of the 30 iterations. Hence, we
assume that the algorithm converged to the global maximum.
In order to visually analyze the design, we generate the scatter plots for each pair of columns in the design.
Plot of Design (scatter plots): A vs B, A vs C, B vs C
It is interesting to note that all nine positions were filled for the interaction of the two quantitative factors, A and
B. However, some points were omitted for the AC interaction and the BC interaction.