Wombat Manual
Wombat Manual
A program for
Mixed Model Analyses
by Restricted
Maximum Likelihood
2 Availability 4
3 Getting started 6
3.1 Installation . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.1 Installation for Linux operating systems . . . . . 6
3.1.2 Installation under Windows . . . . . . . . . . . . . 7
3.1.3 Running WOMBAT . . . . . . . . . . . . . . . . . . 8
3.1.4 Examples and testing . . . . . . . . . . . . . . . . . 9
3.1.5 Compilation notes . . . . . . . . . . . . . . . . . . 10
3.1.6 Updates . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Using the manual . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Additional documentation . . . . . . . . . . . . . . . . . . 13
3.4.1 On-line version of the manual . . . . . . . . . . . . 13
3.4.2 Notes distributed with specific examples . . . . . 13
3.4.3 Wiki pages . . . . . . . . . . . . . . . . . . . . . . . 13
4 Parameter file 15
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2 General rules . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.3 Run options . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.4 Comment line . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.5 Overriding default program limits . . . . . . . . . . . . . 18
4.6 Analysis type . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.7 Pedigree file . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.8 Marker counts file . . . . . . . . . . . . . . . . . . . . . . 22
4.9 Data file . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.9.1 Simple . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.9.2 Compact . . . . . . . . . . . . . . . . . . . . . . . . 23
4.10 Model of analysis . . . . . . . . . . . . . . . . . . . . . . . 24
4.10.1 Effects fitted . . . . . . . . . . . . . . . . . . . . . . 24
4.10.2 Fixed effects . . . . . . . . . . . . . . . . . . . . . . 24
4.10.3 Fixed covariables . . . . . . . . . . . . . . . . . . . 25
4.10.4 Random effects . . . . . . . . . . . . . . . . . . . . 26
4.10.5 ‘Subject’ identifier . . . . . . . . . . . . . . . . . . 29
4.10.6 ‘Extra’ effects . . . . . . . . . . . . . . . . . . . . . 29
4.10.7 Traits analysed . . . . . . . . . . . . . . . . . . . . 29
4.11 Covariance components . . . . . . . . . . . . . . . . . . . 30
4.11.1 Residual covariances . . . . . . . . . . . . . . . . . 30
4.11.2 Covariances for random effects . . . . . . . . . . . 32
4.12 Additional functions of covariance components . . . . . . 33
i
ii
5 Run options 57
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Basic run options . . . . . . . . . . . . . . . . . . . . . . . 58
5.2.1 Continuation run . . . . . . . . . . . . . . . . . . . 58
5.2.2 Level of screen output . . . . . . . . . . . . . . . . 58
5.2.3 Set-up steps . . . . . . . . . . . . . . . . . . . . . . 59
5.2.4 Quality of starting values . . . . . . . . . . . . . . 59
5.2.5 Numerical settings . . . . . . . . . . . . . . . . . . 59
5.2.6 Intermediate results . . . . . . . . . . . . . . . . . 60
5.2.7 Prediction only . . . . . . . . . . . . . . . . . . . . 60
5.2.8 Simulation only . . . . . . . . . . . . . . . . . . . . 62
5.2.9 Sampling to approximate standard errors . . . . . 63
5.2.10 Matrix inversion only . . . . . . . . . . . . . . . . . 64
5.2.11 Calculation of genomic relationships or H−1 . . . . 66
5.2.12 Quadratic approximation . . . . . . . . . . . . . . 66
5.2.13 Analyses of subsets of traits . . . . . . . . . . . . . 67
5.2.14 Pooling estimates of covariance components from
part analyses . . . . . . . . . . . . . . . . . . . . . 68
5.2.15 Miscellaneous . . . . . . . . . . . . . . . . . . . . . 69
5.3 Advanced run options . . . . . . . . . . . . . . . . . . . . 69
5.3.1 Ordering strategies . . . . . . . . . . . . . . . . . . 70
5.3.2 REML algorithms . . . . . . . . . . . . . . . . . . . 71
iii
5.3.3 Parameterisation . . . . . . . . . . . . . . . . . . . 74
5.3.4 Matrix storage mode . . . . . . . . . . . . . . . . . 74
5.3.5 Sparse matrix factorisation, auto-differentiation
and inversion . . . . . . . . . . . . . . . . . . . . . 75
5.3.6 Other . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.4 Parameter file name . . . . . . . . . . . . . . . . . . . . . 78
6 Input files 83
6.1 Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2 Data File . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.3 Pedigree File . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.4 Marker models: Allele counts . . . . . . . . . . . . . . . . 86
6.5 Parameter File . . . . . . . . . . . . . . . . . . . . . . . . 87
6.6 Other Files . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.6.1 General inverse . . . . . . . . . . . . . . . . . . . . 87
6.6.2 Basis function . . . . . . . . . . . . . . . . . . . . . 91
6.6.3 GWAS: Allele counts . . . . . . . . . . . . . . . . . 92
6.6.4 Results from part analyses . . . . . . . . . . . . . . 93
6.6.5 ‘Utility’ files . . . . . . . . . . . . . . . . . . . . . . 93
6.6.6 File SubSetsList . . . . . . . . . . . . . . . . . . 94
6.6.7 File(s) Pen*(.dat) . . . . . . . . . . . . . . . . . . 94
7 Output files 96
7.1 Main results files . . . . . . . . . . . . . . . . . . . . . . . 97
7.1.1 File SumPedigrees.out . . . . . . . . . . . . . . 97
7.1.2 File SumModel.out . . . . . . . . . . . . . . . . . 97
7.1.3 File SumEstimates.out . . . . . . . . . . . . . . 97
7.1.4 File BestSoFar.out . . . . . . . . . . . . . . . . 97
7.1.5 File FixSolutions.out . . . . . . . . . . . . . . 97
7.1.6 File SumSampleAI.out . . . . . . . . . . . . . . . 98
7.2 Additional results . . . . . . . . . . . . . . . . . . . . . . . 98
7.2.1 File Residuals.dat . . . . . . . . . . . . . . . . 98
7.2.2 File(s) RnSoln_rname.dat . . . . . . . . . . . . . 99
7.2.3 File(s) Curve_cvname(_trname).dat . . . . . . 100
7.2.4 File(s) RanRegname.dat . . . . . . . . . . . . . . 100
7.2.5 Files SimDatan.dat . . . . . . . . . . . . . . . . . 101
7.2.6 Files EstimSubSetn+. . .+m.dat . . . . . . . . . 102
7.2.7 Files PDMatrix.dat and PDBestPoint . . . . . 102
7.2.8 Files PoolEstimates.out and PoolBestPoint 103
7.2.9 Files MME*.dat . . . . . . . . . . . . . . . . . . . . 103
7.2.10 File SNPSolutions.dat and
SNPCovariances.dat . . . . . . . . . . . . . . . 104
7.2.11 Files Pen*(.dat) and ValidateLogLike.dat . . 104
7.2.12 File CovSamples_name.dat . . . . . . . . . . . . 105
iv
9 Examples 112
Bibliography 130
Index 134
1 Introduction
Purpose
WOMBAT replaces DfReml [29, 30] which has been withdrawn from
distribution at the end of 2005.
Features
Remarks
topics are available. To get the best from WOMBAT, users should also
be familiar with some of the technical aspects of REML estimation, in
particular properties of various algorithms to maximise the likelihood,
ordering strategies and parameterisations – otherwise many of the
(advanced) options provided will not make a great deal of sense.
2 Availability
WOMBAT has been developed under and is meant for a Linux op-
erating system. Highly optimised executables are available for this
environment that are suitable for large analyses.
Conditions of use
Alternatives:
DISCLAIMER
While every effort has been made to ensure that WOMBAT does
what it claims to do, there is absolutely no guarantee that the
results provided are correct.
3.1 Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.1 Installation for Linux operating systems . . . . . . . 6
3.1.2 Installation under Windows . . . . . . . . . . . . . . 7
3.1.3 Running WOMBAT . . . . . . . . . . . . . . . . . . . 8
3.1.4 Examples and testing . . . . . . . . . . . . . . . . . . 9
3.1.5 Compilation notes . . . . . . . . . . . . . . . . . . . . 10
3.1.6 Updates . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Using the manual . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Additional documentation . . . . . . . . . . . . . . . . . . . . 13
3.4.1 On-line version of the manual . . . . . . . . . . . . . 13
3.4.2 Notes distributed with specific examples . . . . . . . 13
3.4.3 Wiki pages . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1 Installation
4. Make sure that the operating system can find wombat, by doing
one of the following :
Hint
PATH is set in your login file, e.g. .login_usr, or shell start-up
file, e.g. .tcshrc; use printenv PATH to check your current
PATH settings.
You make have to log out and log in again before changes in your
PATH are recognised, and wombat is found by the system.
Hint
You may be able to inspect the setting for PATH by opening a
command window (type cmd.exe into the “Run” box in the start
menu & hit return) and typing ECHO %PATH%.
Under XP, My Computer (from the start
right-click on
menu), Properties, then Advanced and the
choose
Environment Variables tab. This will open a dialog
box which will list the existing variables and their settings. If
PATH already exists, choose Edit and add the WOMBAT directory –
remember to use the full path and to separate it from the existing
variables by a semi-colon. Otherwise choose New and create it.
The general form of the command (under Linux and its emulation) is
wombat options parfile
with parfile the ‘parameter file’ specifying everything WOMBAT
needs to know about the input files and models of analysis (see
Chapter 4 for full details), and options any run time options you may
wish to give (see Chapter 5). If you have not put the executable file
somewhere the operating system can find it (or have not modified your
PATH settings), you will need to replace wombat above by the full path,
e.g. /home/agbu/WOMBAT/wombat if the program has been placed
in /home/agbu/WOMBAT.
❏ Click “Start”, then “Run”, type CMD.EXE into the “Open” box and
click “O.K.”, or
❏ Find “Command Prompt” under “Accessories” (Under “All
Programs”) and click it. To use this regularly, add this to your “Start”
menu.
wombat.bat
rem Simple .bat file to run WOMBAT in a DOS window 1
rem Assumptions: 2
@echo off 6
color 70 7
pause 9
C:\Program_Files\WOMBAT\wombat -v 10
pause 12
exit 13
Each subdirectory contains the input and output files for a run of
WOMBAT, together with a records of the screen output in the file
Test runs typescript. To test your installation,
N.B.
This should be a ‘parallel’ directory to sub-directories A,B,. . ., i.e.
Examplen/try not Examplen/A/try. Otherwise, you need to
adjust the paths to the data and pedigree files in the parameter
file for WOMBAT to be able to find them !
3.1.5.1 Linux
3.1.5.2 Windows
3.1.6 Updates
WOMBAT caters for novice users by supplying defaults for most as-
Must read pects of estimation. Essential sections of the manual to ‘get started’
are :
Hint
A suitable strategy might be
i) Choose the type of analysis you are interested in, and decide on the
model of analysis. Start with a relatively simple scenario.
ii) Try to find an example which matches the type of analysis and fits a
not too dissimilar model.
iii) Inspect the example parameter and input files.
iv) Read the description of individual entries in the parameter file
(Chapter 4). Compare each section to the relevant section in the
example parameter file.
v) Try to modify the example file for your data (& pedigree) file and
model.
3.3 Troubleshooting
Input errors Errors in the input files are the most likely source of problems which
are not a program bug. Some of these are trapped, leading to a pro-
grammed error stop (with a screen message of “exit WOMBAT” or
12 3 Getting started
Program If – after thoroughly checking your parameter file and other input files
bugs – you think you have encountered a genuine error in the program,
please submit a ‘bug report’, as specified below1 .
❏ wombatcheck.tar.gz
3. Try to recreate the problem using one of the test data sets and
pedigree files supplied. Edit these as appropriate to generate
the structure causing the problem if necessary, e.g. delete or
add records or effects in the model.
Only if absolutely necessary, use a small subset of your data and
pedigree files – the smaller the better – but definitely less than
100 animals in the pedigree and less than 500 records !
7. tar the complete directory and gzip it (or use any other common
compression utility), and send it to me.
I need all input files, output files and the typescript file !
8. If you have any theory on what might be the problem, please tell
me.
1
Never send any .doc, .rtf, etc files !
3.4 Additional documentation 13
This may sound like a lot of work, but is necessary to for me to even
begin to try understanding what is going on !
didgeridoo.une.edu.au/km/WOMBAT/WWW/manual.html
❏ Error messages
didgeridoo.une.edu.au/womwiki/doku.php?id=wombat:faqserr
❏ Standard errors
didgeridoo.une.edu.au/womwiki/doku.php?id=wombat:sterrors
❏ How to . . .?
didgeridoo.une.edu.au/womwiki/doku.php?id=wombat:faqshow
❏ Confusion
didgeridoo.une.edu.au/womwiki/doku.php?id=wombat:faqsconfusion
14 3 Getting started
❏ Miscellaneous
didgeridoo.une.edu.au/womwiki/doku.php?id=wombat:faqsmisc
❏ Windows specific
didgeridoo.une.edu.au/womwiki/doku.php?id=wombat:faqwin
4 The Parameter File
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2 General rules . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.3 Run options . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.4 Comment line . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.5 Overriding default program limits . . . . . . . . . . . . . . . 18
4.6 Analysis type . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.7 Pedigree file . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.8 Marker counts file . . . . . . . . . . . . . . . . . . . . . . . . 22
4.9 Data file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.9.1 Simple . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.9.2 Compact . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.10 Model of analysis . . . . . . . . . . . . . . . . . . . . . . . . 24
4.10.1 Effects fitted . . . . . . . . . . . . . . . . . . . . . . . 24
4.10.2 Fixed effects . . . . . . . . . . . . . . . . . . . . . . . 24
4.10.3 Fixed covariables . . . . . . . . . . . . . . . . . . . . 25
4.10.4 Random effects . . . . . . . . . . . . . . . . . . . . . 26
4.10.5 ‘Subject’ identifier . . . . . . . . . . . . . . . . . . . . 29
4.10.6 ‘Extra’ effects . . . . . . . . . . . . . . . . . . . . . . 29
4.10.7 Traits analysed . . . . . . . . . . . . . . . . . . . . . 29
4.11 Covariance components . . . . . . . . . . . . . . . . . . . . . 30
4.11.1 Residual covariances . . . . . . . . . . . . . . . . . . 30
4.11.2 Covariances for random effects . . . . . . . . . . . . 32
4.12 Additional functions of covariance components . . . . . . . 33
4.13 Pooling estimates of covariance components . . . . . . . . . 34
4.14 Dependencies among fixed effects . . . . . . . . . . . . . . . 39
4.15 Random effects to be treated as fixed . . . . . . . . . . . . . 40
4.16 Covariance components to be treated as fixed . . . . . . . . 41
4.17 Special information . . . . . . . . . . . . . . . . . . . . . . . 41
4.17.1 Weighted analysis . . . . . . . . . . . . . . . . . . . . 41
4.17.2 Covariables that are zero . . . . . . . . . . . . . . . . 42
4.17.3 Animals that are clones . . . . . . . . . . . . . . . . . 42
4.17.4 Standard errors for fixed and random effect solutions 42
4.17.5 GWAS from ssGBLUP: backsolving for marker effects 43
4.17.6 Repeated records . . . . . . . . . . . . . . . . . . . . 43
4.17.7 Fitting SNP effects . . . . . . . . . . . . . . . . . . . 44
4.17.8 Sampling to approximate standard errors . . . . . . 45
4.17.9 Fitting “social” genetic effects . . . . . . . . . . . . . 46
4.17.10 Fitting ‘explicit’ genetic groups . . . . . . . . . . . . 46
4.17.11 In core storage . . . . . . . . . . . . . . . . . . . . . 49
4.17.12 Calculation of alternative outlier model statistics . . 49
4.17.13Analyses using a Gaussian Kernel matrix . . . . . . . 50
4.17.14 Correlations for random regression analyses . . . . 51
16 4 Parameter file
4.1 Overview
All information on the the model of analysis and the data and pedigree
files is specified through the parameter file.
The name of the parameter file should have extension ’.par’. By default,
WOMBAT expects to find a file wombat.par in the current working
File name directory.
Other filenames can be specified at run time as the last command line
option – see Chapter 5 for details.
Error stops If WOMBAT does find an obvious error, it will stop with a message like
“exit WOMBAT” or, in verbose mode, “Programmed(error) stop
for WOMBAT encountered”. Inconsistencies or errors not discovered
are likely to wreak havoc in subsequent steps !
Hint
Use the -v option at run time. This will cause WOMBAT to echo each
line in the parameter file as it is read (to the screen) – if there is a
programmed error stop at this stage, it is easier to find the mistake as
you know which is the offending line.
Line length The parameter file is read line by line. Each line is assumed to be 88
characters long, i.e. anything in columns 89 onwards is ignored.
4.2 General rules 17
Example
PEDS pedigrees.dat 1
2. ‘Long’ or block entries, spanning several lines (e.g. for the layout
of the data file, or the model of analysis).
Each of these entries starts with a line which gives the code for
the entry (see Table 4.1), and possibly some additional informa-
tion. This is followed by lines with specific information, where of
each of these lines again starts with a specific code. Except for
blocks of starting values of covariance components, the block is
terminated by a line beginning with END.
18 4 Parameter file
Example
MODEL 1
FIX CGroup 2
TRA Weight 4
END MODEL 5
This shows a block entry for the model of analysis, where the
model fits CGroup as a crossclassified, fixed effect and Animal as
random effect with covariance matrix proportional to the numer-
ator relationship matrix, and Weight is the trait to be analysed.
Table 4.1: Valid codes for entries in the parameter file - Part I
and its new value. There should be one such line for each variable to
be changed. Variable names recognised are:
Table 4.2: Valid codes for entries in the parameter file - Part II
These names can be abbreviated to the first six letters. Values for
MAXANIS and MAXEQNS apply to REML analyses or BLUP runs with
direct solution for the mixed model equations – the limits for BLUP ana-
lyses via iterative solutions are obtained by multiplying the respective
values with a factor of 20.
4.6 Analysis type 21
N.B.
Remember to put any such lines close to the top of the parameter file –
so that they are read prior to parts specifying the data layout or model
of analysis.
Example
ANALYSIS MUV PC 8 1
This is a single line entry beginning with code PEDS (can be abbrevi-
ated to PED), followed by the name of the pedigree file. There is no
default name. This entry is ‘optional’, in such that it is only required
if a code of NRM if specified for the covariance structure of a random
effect (see Section 4.10.4). Only one pedigree file can be given. The
format of the pedigree file required by WOMBAT is described in detail
in Section 6.3.
This is a single line entry beginning with code MRKFILE (can be ab-
breviated to MRK), followed by the name of the file. There is no default
name. This entry is ‘optional’, i.e. it is only required for selected ana-
lyses where the default name MarkerCounts.dat is to be replaced.
The layout of the marker file required by WOMBAT is described in
detail in Section 6.4.
This is a block entry. The block begins with a line containing the code
DATA (can be abbreviated to DAT) followed by the name of the data file.
There is no default name. The general form of the data file required in
described in Section 6.2.
The following lines specify the record layout of the data file for all
traits. There are two alternative ways of specification.
4.9.1 Simple
For each trait in turn, there should be one parameter file line for each
column, up to the last column used in the analysis.
The lines can have up to 3 elements :
(a) The code TRn where n is a one- or two-digit trait number. This can
be omitted for univariate analyses.
(b) The name of the variable in this column.
(c) The maximum number of levels. This is required if the column
represents a fixed or random effect in the model of analysis or a
control variable in a random regression analysis.
Exception: For random effects with covariance option NRM (i.e.
additive genetic effects) the number of levels can be given as 0
as WOMBAT counts the number of animals in the pedigree and
replaces this number.
Example
DATA mydata.dat 1
TR1 traitno 2 2
TR1 fixeffect 50 4
TR1 weight 5
TR2 traitno 2 6
TR2 fixeffect 30 8
TR2 feedintake 9
END DATA 10
This shows the block for an analysis reading records for 2 traits from
the file mydata.dat.
4.9.2 Compact
If there are several traits for which the record layout is the same, the
respective record layout can be given for the whole group of traits.
This avoids tedious duplication of lines.
Grouped This alternative is selected by placing the code GRP after the name of
traits the data file (same line, separated by a space).
For each group of traits, the following lines need to be given :
Example
TRNOS 1 2 2
traitno 2 3
animal 1000 4
fixeffect 50 5
END DATA 7
This shows the ‘grouped’ alternative for specifying the data file layout,
for two traits with the same column structure in the example above.
This is another block entry. The block begins with a line containing
MODEL the code MODEL (can be abbreviated to MOD), and finishes with a line
beginning with END. The block then should contain one line for each
effect to be fitted and one line for each trait in the analysis.
NB The name for a fixed effect should not contain a “(”, otherwise
it is assumed that this effect is a covariable.
Hint
‘Not implemented’ here means merely that WOMBAT will not
code the interaction for you – you can, of course, still fit a model
with an interaction effect, but you a) have to insert an additional
column in the data file with the code for the appropriate subclass,
b) fit this as if it were a crossclassified fixed effect, and c) specify
any additional dependencies arising explicitly (using ZEROUT, see
below).
COV : This specifies a fixed covariable. The effect name should have
the form “vn(n,BAF)”, where vn is a variable name, the integer
variable n gives the number of regression coefficients fitted and
BAF stands for a three-letter code describing the basis functions
to be used in the regression. NB: By default, intercepts for fixed
covariables are not fitted.
y − ȳ = b1 (x − x̄) + b2 (x − x̄)2 + · · ·
y = β0 + β1 x + β2 x2 + · · ·
N.B.
WOMBAT expects the value of the control variable to be non-
negative (i.e. 0 or greater) and not to be a fraction (the control
variable is read as a real variable but converted to the nearest
integer, e.g. values of 3,.0, 3.1 and 3.4 would be treated as 3) –
scale your values appropriately if necessary!
N.B.
For multivariate analyses, WOMBAT collects the range of values
for the control variable (i.e. minimum and maximum) across all
traits. This may be undesirable if different traits have distinct
ranges and Legendre polynomials are used as basis function - if
so, use the USR option and set up your own file which maps the
range for each trait exactly as you want it!
4.10 Model of analysis 29
For animal model analyses, WOMBAT assumes that the code for the
first genetic effect fitted also identifies the subject on which measure-
ments are taken. For some analyses (in particular those not fitting any
additive genetic effects!) and sire models this is not appropriate, and
such identifier needs to be supplied as an extra column in the data file.
SUBJ : This code needs to be followed by the name of the variable which
identifies the individual.
For some models, coding is required for effects which are not explicitly
fitted in the model of analysis, for instance, when fitting nested effects.
These need to be specified as ‘extra’ effects.
One line should be given for each trait. It should contain the following
TRAIT information :
Next, the parameter file needs to specify values for all (co)variance
components in the model of analysis. For variance component estima-
tion, these are the starting values used. For simple BLUP analyses and
simulation runs, these are the values assumed to be the population
values.
The input matrices for full rank analyses must be positive definite, i.e.
cannot have eigenvalues less than the operational zero. For reduced
rank (PC) analyses, some ‘zero’ but no negative eigenvalues are ac-
ceptable, provided the rank (i.e. the number of non-zero eigenvalues)
is equal to (or greater than) the number of principal components to be
fitted.
Example
For a univariate RR analysis with values of the control vari-
able ranging from 1 to 5, say we want to fit separate residual
variances for 1 & 2, 3 & 4 and 5:
1 2 1.234 2
3 4 3.456 3
5 5 5.678 4
1
Currently implemented for full-rank, ‘standard’ analyses only !
4.12 Additional functions of covariance components 33
Example
SE+USR 1
SUM siga+pe 1 2 2
VRA repeat 5 4 3
END 4
Hint
Run WOMBAT with the --setup option to begin with. Inspect the
file ListOfCovs generated in this step – this gives a list of running
numbers for individual covariance components.
Example
MODEL 1
trait bwgt 1 5
trait wwgt 2 6
trait ywgt 3 7
END MOD 8
POOL 13
PSEUPED BON 10 14
DIRADD animal 15
MATADD gmdam 16
MATPE pedam 17
SMALL 0.001 18
DELTAL 0.005 19
END 20
Example
MODEL 1
... 3
END MOD 4
POOL 7
PSEUPED USR 5 8
1.0 0.25 12
1.0 13
END 14
MINPAR : The default assumption for the parameter file used is that it is a
‘full’ parameter file as for a corresponding, multivariate analysis.
However, the information on pedigree and data file and their
layout are not actually required. Hence the option MINPAR (on
a line by itself) is provided which allows use of a ‘minimum’
parameter file, switching off some of the consistency checks
otherwise carried out. If used, MINPAR must be the first entry
within the POOL block.
effect.
Example
ANAL MUV 14 1
POOL 4
MINPAR 5
SMALL 0.001d0 6
DIRADD animal 8
END 9
SINGLE : The default form of input for results from part analysis is to read
estimates from separate files, in the form of output generated
by WOMBAT when carrying out multiple analyses of subsets of
traits. Alternatively, all information can be given in a single file.
This is selected by the option SINGLE, followed (space separated)
by the name of the input file (same line). The layout required for
this file is described in Section 6.6.4.2.
The last entry on the line relates to the tuning factor(s) to be used:
If a single penalized analysis is to be carried out, the correspond-
ing tuning factor should be given (real value). To specify multiple
penalized analyses, specify the number of separate tuning factors
4.14 Dependencies among fixed effects 39
Example
1. Shrink all correlation matrices towards the phenotypic cor-
relation matrix using a single tuning factor of 0.1, and calcu-
late the shrinkage target from unpenalized results.
(a) The name of the fixed effect, as specified in the MODEL block.
(b) The ‘original’ code for the level to be zeroed out, as encountered
in the data file.
(c) The trait number; this can be omitted for univariate analyses.
40 4 Parameter file
N.B.
Treating levels of random effect(s) as fixed may lead to additional de-
pendencies in the fixed effects part of the model, especially for multivari-
ate analyses. Great care must be taken to identify these and specify
them explicitly, as outlined above (Section 4.14). WOMBAT has no
provision to account for this possibility !
(a) The name of the random effect, as specified in the MODEL block.
This should be a genetic effect, distributed proportionally to the
numerator relationship matrix among animals.
(b) The ‘original’ code for the level to be treated as ’fixed’, as given in
the pedigree file.
4.16 Covariance components to be treated as fixed 41
N.B.
This option has only undergone fairly rudimentary testing !
It is not available in conjunction with the PX-EM algorithm.
Selecting this option prohibits re-use of inverse NRM matrices set up in
any previous runs.
(a) The name of the random effect, as specified in the MODEL block.
(b) The code ALL to signify that all pertaining covariances are fixed.
Example
Consider an animal with two records and corresponding
weights w1 = 0.8 and w2 = 2. Let the unweighted residual
covariance matrix be
10 −5 0.1143 0.0286
with inverse
−5 20 0.0286 0.0571
WOMBAT reports solutions for fixed and random effects fitted – these
are calculated as a by-product by all algorithms to obtain REML estim-
4.17 Special information 43
N.B.
For large analyses, this can take quite some extra time.
Details about additional information and input files needed and the
output generated are available with Example 22.
Example
SPECIAL 1
END 3
When using the run option --snap, the covariable in the model rep-
resenting SNP effects needs to be identified by a line in a SPECIAL
block containing
ables are specified (in the MODEL block) before any other, ‘regular’
covariables to be fitted.
Example
SPECIAL 1
SNPEFF snp(1) 2
END 4
Example
SPECIAL 1
SAMPLEAI animal 2
END 3
46 4 Parameter file
For each run where a dilution factor has been specified, WOMBAT
appends the value of the dilution factor used together with the cor-
responding maximum log likelihood to a file in the working directory
named LogL4Quapprox.dat (see Section 7.3.8). These represent
points on the profile likelihood curve for the dilution factor, and a
quadratic approximation can be used to determine its maximum. The
run time option --quapp is provided to assist in this task.
y = Xb + Zu + ZQg + e
(b) The name of the random effect in the model representing genetic
group effects.
(e) The name of the random effect in the model representing individu-
als’ additive genetic effect corresponding to the genetic groups.
For example,
48 4 Parameter file
Example
MODEL 1
... 2
... 6
END 7
... 8
SPECIAL 9
... 13
END 14
For example, for g = 5 and a scale factor of 1000, the *.codes file
might be:
Example
4 2304 2 0 0 0 0 1000 4
... 5
with columns 1 and 2 the running number and original animal code,
column 3 the type of animal (1 = non genotyped, 2 = genotyped; for
compatibility, this code needs to be included for all run options if
genetic groups are fitted, including those that do not treat genotyped
animals differently). Columns 4 to 8 then give the proportions of group
membership, multiplied by the scale of 1000 for each of the 5 groups.
Example
SPECIAL 1
... 2
... 4
END 5
Example
SPECIAL 1
... 2
AOM-RES 3
... 4
END 5
The AOM options are implemented for uni- and full rank standard
multivariate analyses only; they are ignored for reduced rank (PC)
analyses and random regression models. Calculations are carried out
for one observation or random effects level at a time. For large models
or cases with dense coefficient matrices C, this can take quite a long
time.
p
AOM statistics for residual effects, R−1 ê/ Dia(R−1 − R−1 WC−1 W′ R−1 ),
are reported together with residuals in Residuals.dat. This file has
one row per record with up to seven columns. Columns 1 to 3 are
written out for all analyses and contain the residual, the predicted
observation and the actual observation (for reference). Specifying
50 4 Parameter file
Example
SPECIAL 1
... 2
... 4
END 5
WOMBAT will then construct the kernel matrix for the given value of
θ and attempt to invert it, as well as calculate its log determinant. K
is assumed to be dense and is automatically stored ‘in-core’. Inversion
is carried out using LAPACK [3] routine DPFTRF [19] which requires a
positive definite matrix.
For CANEIG:
SPECIAL 1
END 3
Example
SPECIAL 1
END 4
Example
SPECIAL 1
END 5
SPECIAL 1
END 3
SPECIAL 1
END 3
Example
SPECIAL 1
END 4
NEW For KANEIG: see example 19 For KORREL: see example 19 For PACORR:
Example
SPECIAL 1
END 3
Example
SPECIAL 1
END 6
For this option to work, the first tuning factor given has to be ψ = 0!
Hint
Use run time option ---valid (Section 5.3.2) for validation steps when
using cross-validation to estimate the value for ψ to be used.
5 Run options
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Basic run options . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2.1 Continuation run . . . . . . . . . . . . . . . . . . . . 58
5.2.2 Level of screen output . . . . . . . . . . . . . . . . . 58
5.2.3 Set-up steps . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.4 Quality of starting values . . . . . . . . . . . . . . . . 59
5.2.5 Numerical settings . . . . . . . . . . . . . . . . . . . 59
5.2.6 Intermediate results . . . . . . . . . . . . . . . . . . 60
5.2.7 Prediction only . . . . . . . . . . . . . . . . . . . . . 60
5.2.8 Simulation only . . . . . . . . . . . . . . . . . . . . . 62
5.2.9 Sampling to approximate standard errors . . . . . . 63
5.2.10 Matrix inversion only . . . . . . . . . . . . . . . . . . 64
5.2.11 Calculation of genomic relationships or H−1 . . . . . 66
5.2.12 Quadratic approximation . . . . . . . . . . . . . . . . 66
5.2.13 Analyses of subsets of traits . . . . . . . . . . . . . . 67
5.2.14 Pooling estimates of covariance components from
part analyses . . . . . . . . . . . . . . . . . . . . . . 68
5.2.15 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . 69
5.3 Advanced run options . . . . . . . . . . . . . . . . . . . . . . 69
5.3.1 Ordering strategies . . . . . . . . . . . . . . . . . . . 70
5.3.2 REML algorithms . . . . . . . . . . . . . . . . . . . . 71
5.3.3 Parameterisation . . . . . . . . . . . . . . . . . . . . 74
5.3.4 Matrix storage mode . . . . . . . . . . . . . . . . . . 74
5.3.5 Sparse matrix factorisation, auto-differentiation and
inversion . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.3.6 Other . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.4 Parameter file name . . . . . . . . . . . . . . . . . . . . . . . 78
5.1 Overview
Example
wombat 1
highly specialised options to modify the search strategies for the max-
imum of the likelihood function. Multiple options can be combined, but
some care is required so that options given last do not unintentionally
cancel out some of the effects of options given first. Options available
are summarised in Table 5.1, Table 5.2 and Table 5.3.
3. On the first line of the parameter file, after a code of RUNOP (see
Section 4.3). Any such options are read before options specified
on the command line, i.e. the latter may overide them.
Form Following Linux standard, run options have the form “-a” where a
stands for a single, lower-case letter, or the form “--abcdef” where
abcdef stands for a multi-letter code.
--setup It is good practice, to start each analysis with a run which carries out
the set-up steps only, checking that the summary information provided
on the model of analysis and data structure in file SumModel.out
(see Section 7.1.2) is as expected, i.e. that WOMBAT is fitting the
correct model and has read the data file correctly. This is specified
using --setup. This option also invokes verbose screen output.
If we are confident, that we have good starting values, this might cause
--good an unnecessary computational overhead. Specifying --good reduces
the maximum number of (PX-)EM iterates to 1. Similarly, for potentially
bad starting values, estimation might converge more reliably if a few
--bad more initial (PX-)EM iterates were carried out. Specifying --bad sets
this number to 8. If the increase in log likelihood during the initial (PX-
)EM iterates is less than 2.0, WOMBAT will switch to the AI algorithms
immediately.
the run option --pivotx where x is the real value specifying the new
limit.
WOMBAT can be used for a simple BLUP run, using the ‘starting’
values given in the parameter files as assumed values for the true
covariances. In this mode, no pruning of pedigrees is carried out. If
-c is specified in addition, WOMBAT will try to read the values from
the file BestPoint instead – this is useful to obtain ‘backsolutions’
after convergence of an estimation run. Solutions can be obtained
either directly or iteratively:
--blup ❏ Run option --blup selects the direct solution scheme. This in-
volves sparse matrix inversion of the coefficient matrix, i.e. com-
putational requirements are similar to those of a single REML
iterate using the EM algorithm. While this can be computation-
ally demanding for larger problems, it allows standard errors and
expected accuracies (correlations between estimated and true
effects) for random effects to be calculated from the diagonal
elements of the direct inverse.
5.2 Basic run options 61
--mmeout ❏ Run option --mmeout acts like --solvit, but writes out the
complete mixed model equations in a form suitable for further
analyses. This includes a file containing all non-zero elements
in the lower triangle of the coefficient matrix (default name
MMECoeffMatrix.dat), and the file MMEEqNos+Solns.dat
which specifies the mapping of equation numbers to effects fitted,
as well as giving the vector of right hand sides and solutions; see
Section 7.2.9 for details.
❏ Run option --s1step provides an iterative solver as in
--solvit, but targeted towards a so-called single-step analysis
and providing a PCG algorithm only. The main difference is that
equations for genotyped individuals are grouped together and
are then processed as a dense block. This requires the inverse
--s1step of the combined relationship matrix (H−1 ) to be supplied as a
*.gin matrix, and a code for each individual – specifying whether
is has been genotyped or not – to be given in the third, additional
column of the pertaining *.codes file.
NEW Change: The default for the PCG algorithm has been changed to
the so-called SSOR preconditioner. There are now five additional
choices to vary the preconditioner by appending the letter A, B,
C, D or E:
--s1stepA : invokes a simple diagonal preconditioning scheme.
--s1stepB : provides a blockdiagonal preconditioner with the
62 5 Run options
a
diag: diagonal only, blox: trait x trait diag. blocks, full: full diag. block
--simul Specifying --simul causes WOMBAT to sample values for all ran-
dom effects fitted for the data and pedigree structure given by the
respective files, from a multi-variate normal distribution with a mean
5.2 Basic run options 63
This option is not available for models involving covariance option GIN
(see Section 4.10.4).
best for estimation runs with the PC option, even when matrices are
estimated at full rank.
Use of this facility has been changed and the following section has
been UPDATED!
a) row number,
b) column number and
c) the matrix element.
--invert 1. Using --invert (or simply --inv) is the default option which
selects inversion of a full-stored, dense matrix. This uses LAPACK
routines DPOTRF and DPOTRI.
2. Option --invlap selects the same calculations as --inv. Ex-
panding this to --invlapp or --invlapf (note the extra let-
ter!) changes the matrix storage to ‘packed’ form, storing
only one triangle of the symmetric matrix and thus requiring
less memory. For --invlapp, calculations are carried out us-
ing LAPACK routines DPPTRF and DPPTRI and for --invlapf,
LAPACK routines DPFTRF and DPFTRI are used. These two differ
in the in way the elements of the matrix are packed and the latter
tends to be computationally faster (see the LAPACK manual for
details). Note that for integer*4 addressing, the largest size of a
packed matrix that can be accommodated is 65 535.
5.2 Basic run options 65
Example
For some types of analyses a parameter ( e.g. the dilution factor for “so-
cial” genetic effects) is fixed at a given value and variance components
are estimated for this value. To estimate the parameter then requires
multiple runs for different values, each generating a point of the pro-
file likelihood for this parameter. Provided an initial triplet of points
can be established, comprised of a ‘middle’ parameter value with the
highest likelihood bracketed by points with lower likelihoods, a quad-
ratic approximation of the curve can be used to locate its maximum.
WOMBAT collects the pairs of parameter values and corresponding
likelihoods in a file with the standard name LogL4Quapprox.dat.
--quapp Specifying the run option --quapp invokes an attempt at a quadratic
approximation, utilizing the information from this file. If successful
(i.e. if a suitable triplet of points can be found), the estimate of the
5.2 Basic run options 67
N.B.
This option does not carry through any information from a SPECIAL
block, and is not implemented for random regression models or analyses
involving correlated random effects or permanent environmental effects
fitted as part of the residuals.
For n = 2 (bivariate), WOMBAT also writes out a file containing a bash
shell script, run_bivar.sh, to allow execution of all analyses with a
single command.
Hint
WOMBAT will add a line to SubSetsList on each run – this may cause
redundant entries. Inspect & edit if necessary before proceeding to
combining estimates !
68 5 Run options
--itsum Option --itsum selects a run to combine estimates from partial ana-
lyses, using the ’iterative summing of expanded part matrices’ ap-
proach of Mäntysaari [27] (see also Koivula et al. [23]), modified to
allow for differential weighing of individual analyses. For this run, the
parameter file required is equal to that for a full multivariate analysis
for all traits. Further, a file SubSetsList is assumed to exist and list
the names of all the files containing the results from analyses of sub-
sets, and, optionally, the weightings to be applied (see Section 7.3.9).
Pooled covariance matrices are written to a file name PDMatrix.dat
as well as a file named PDBestPoint (see Section 7.2.7).
Hint
Use of --itsum is not limited to combining bi-variate analyses or the
use of files with standard names (EstimSubsetn + . . . + m.dat), but
all input files must have the form as generated by WOMBAT.
To use --itsum to combine estimates from analyses involving different
data sets, be sure to a) number the traits in individual analyses appro-
priately (i.e. 1, . . . , q with q the total number of traits, not the number
of traits in a partial analysis), and b) to use the syntax described in
Section 4.10.7 to renumber traits – this will ‘switch on’ the output of
subset results files.
Copy PDBestPoint to BestPoint and run WOMBAT with option
--best to obtain a listing with values of correlations and variance ratios
for the pooled results.
5.3 Advanced run options 69
Hint
Details and a file with further documentation are available with Example
15.
5.2.15 Miscellaneous
--expiry Option --expiry will print the expiry date for your copy of WOMBAT
to the screen.
--limits Option --limits can be used to find at the upper limits imposed
on analyses feasible in WOMBAT, as ‘hard-coded’ in the program.
N.B. Sometimes these are larger than your computing environment
(memory available) allows.
--times Option --times causes WOMBAT to print out values for the CPU
time used in intermediate steps.
--wide Option --wide will generate formatted output files which are wider
than 80 columns.
--help Run option --help causes a list of available run options to be printed
to screen.
There are default values for most of these codes, which are adequate
for most simpler analyses, and analyses of small to moderately sized
data sets. Hence these options are predominantly of interest to users
which want to fit complicated models or analyse large data sets, and
thus need to tweak some of the options for ordering of equations,
parameterisations, maximisation strategies and convergence criteria
to get the best possible performance from WOMBAT.
The order in which the equations in the mixed model are processed
can have a dramatic impact on the computational requirements of
REML analyses. Hence, WOMBAT attempts to reorder the equations,
selecting a default ordering strategy based on the number of equations;
see Section A.1 for details. This can often be improved upon by
selecting the strategy used explicitly.
--mmd Option --mmd specifies ordering using the multiple minimum degree
procedure.
faster.
Example
--aireml30,0.001 1
limits the number of iterates carried out to 30, and stops estimation
when the change in log likelihood is less than 10−3 .
--mod In this case, a modification of the AI matrix can result in better per-
aim formance of the AI algorithm. Four procedures have been implemen-
ted in WOMBAT. These are selected by specifying --modaimn, with
n = 1, 2, 3, 4 selecting modification by setting all eigenvalues to a min-
imum value (n = 1), by adding a diagonal matrix (n = 2, the default),
or through a modified (n = 3) [45, 46] or partial (n = 4) [15] Cholesky
decomposition of the AI matrix; see Section A.5 for details.
--pxem Option --pxem then selects estimation using the PX-EM algorithm. As
above, this can be followed by n or n, C .
Example
---pxai6,50,0.001 1
--emai Option --emai is like --pxai except that the initial iterates are car-
ried out using the standard EM algorithm. This is the default for
reduced rank estimation. Optional arguments are as for --pxai.
5.3.3 Parameterisation
NEW this sets an upper limit of n = 65, 535. For larger problems, the option
--dense2 is available which uses storage in two-dimensional arrays
--dense2 instead. Note that use of --dense2 may require substantial amounts
of RAM and that it is not yet implemented for all algorithms (e.g. not
for PX(EM)).
5.3.6 Other
For sire model analyses, the default is not to carry out any pruning
or pedigree reduction. The option --redped is provided to switch
on pedigree reduction for sire model analyses, i.e. elimination of
--redped all individuals without a link to an individual in the data from the
pedigree file.
--quaas Alternative algorithms, due to Quaas [44] and Meuwissen and Luo
--meuw [28], which do not have this limitation but are slower and use very
little memory, can be selected using the run time option --quaas and
--meuw, respectively.
--nosolut Run option --nosolut will skip writing out files with fixed and ran-
dom effects solutions at the end of an estimation run. This is useful
for simulation studies where these are not of interest.
--nohead Run option --nohead will omit writing out the header line for files
with solutions for random effects. This is useful when these files are
to be processed further using other software.
--noraw Run option --noraw will omit writing out the raw means for solutions
of fixed or random effects. This is useful for large problems and when
these are not of interest.
--threads The Linux executables compiled using ifort will carry out calcula-
tions using multiple threads where available. The default is to use half
of the maximum number available (as each core typically is counted as
two threads, this is the number of cores). A different number can be
selected via the run option --threadsn where n is an integer value
giving the number of threads to be utilised.
78 5 Run options
Example
wombat weights 1
Option Section
--amd [5.3.1] Option Section
--aireml [5.3.2]
--noraw [5.3.6.9]
--bad [5.2.4]
--nostrict [5.3.2]
--batch [5.3.6.3]
--nologd [5.3.3]
--best [5.2.6]
--noreord [5.3.3]
--blup [5.2.7]
--nonly [5.3.6.6]
-c [5.2.1]
--noprune [5.3.6.1]
--choozhz [5.3.1]
--nocenter [5.3.6.4]
--cycle [5.3.2]
--nolsq [5.3.6.7]
-d [5.2.2]
--norped [5.3.6.2]
--dense [5.3.4]
--pivot [5.2.5.2]
--deraud [5.3.2]
--powell [5.3.2]
--derinv [5.3.2]
--pxai [5.3.2]
--emai [5.3.2]
--pxem [5.3.2]
--emalg [5.3.2]
--pool [5.2.14.2]
--expiry [5.2.15]
--quapp [5.2.12]
--force [5.3.2]
--quaas [5.3.6.5]
--fudge [5.2.5.3]
--reconstr
--good [5.2.4]
--redped [5.3.6.2]
--help [5.2.15]
--reorder
--hinv [5.2.11]
--s1step [5.2.7]
--hdiag [5.2.10]
--s2step [5.2.7]
--hchol [5.2.10]
--sample [4.17.8]
--invert [5.2.10]
--setup [5.2.3]
--inveig [5.2.10]
--simplex [5.3.2]
--invlap [5.2.10]
--simul [5.2.8]
--invspa [5.2.10]
--snap [5.2.7]
--itsum [5.2.14.1]
--subset [5.2.13]
--limits [5.2.15]
--solvit [5.2.7]
--like1 [5.3.2]
-t [5.2.2]
--logdia [5.3.3]
--threads [5.3.6.11]
--maxgen [5.3.6.5]
--times [5.2.15]
--metis [5.3.1]
-v [5.2.2]
--mmd [5.3.1]
--valid [5.3.2]
--mmeout [5.2.7]
--wide [5.2.15]
--meuw [5.3.6.5]
--zero [5.2.5.1]
--modaim [5.3.2]
--nohead [5.3.6.10]
6 Input files for WOMBAT
6.1 Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2 Data File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.3 Pedigree File . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.4 Marker models: Allele counts . . . . . . . . . . . . . . . . . 86
6.5 Parameter File . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.6 Other Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.6.1 General inverse . . . . . . . . . . . . . . . . . . . . . 87
6.6.2 Basis function . . . . . . . . . . . . . . . . . . . . . . 91
6.6.3 GWAS: Allele counts . . . . . . . . . . . . . . . . . . 92
6.6.4 Results from part analyses . . . . . . . . . . . . . . . 93
6.6.5 ‘Utility’ files . . . . . . . . . . . . . . . . . . . . . . . 93
6.6.6 File SubSetsList . . . . . . . . . . . . . . . . . . . 94
6.6.7 File(s) Pen*(.dat) . . . . . . . . . . . . . . . . . . . 94
6.1 Format
All input files supplied by the user are expected to be ’formatted’ (in a
Fortran sense), i.e. should be plain text of ASCII file.
Hint
Take great care when switching from DOS/Windows to Linux or vice
versa: Remember that these operating systems use different end-of-line
coding - this means that you may have to ‘translate’ files using a utility
like dos2unix (unix2dos) or fromdos (todos).
The data file is mandatory. It gives the traits to be analysed, and all
information on effects in the model of analysis. It is expected to have
the following features:
File name 1. There is no ’default’ name for the data file. File names up to 30
characters long are accommodated.
Format 2. Variables in the data file should be in fixed width columns, separ-
ated by spaces.
84 6 Input files
N.B.
Calculations in WOMBAT use an operational zero (default value:
10−8 ), treating all smaller values as zero. To avoid numerical prob-
lems, please ensure your traits are scaled so that their variances
are in a moderate range (something like 10−5 to 105 ).
Order of i) the individual (or ’subject’) for which traits are recorded,
records and
ii) according to the trait number within individual.
iii) For RR analyses, records are expected to be sorted according
to the value of the control variable (within individual and
trait number) in addition.
N.B.
WOMBAT does not allow ‘repeated’ records for individual
points on the trajectory in RR analyses, i.e. you can not have
multiple observations for an individual with the same value
of the control variable.
1
Yes, this may result in some duplication of codes, if the model is the same for all
traits !
6.3 Pedigree File 85
Annotation To facilitate annotation of the data file (e.g. column headers, date of
creation, source), WOMBAT will skip lines with a ’#’ (hash sign) in
column 1 at the beginning of the file - there is no limit on the number,
n, of such lines, but they must represent the first n lines (any ’#’
elsewhere will cause an error).
File name 1. There is no ’default’ name for the pedigree file. File names up to
30 characters long are accommodated.
2. The pedigree file must contain one line for each animal in the
Parents data. Additional lines with pedigree information for parents
without records themselves can be included.
All codes must be valid integer in the range of 0 to 2 147 483 647.
Note that these relate to older, seldom used model options which
are incompatible with genotype codes or genetic group informa-
tion.
As for the data file, any lines at the beginning of the pedigree file with
a ’#’ (hash sign) in column 1 are ignored.
2. The marker counts file file must contain at least one line for each
genotyped animal.
(a) the animal code (as in data or pedigree file) at the beginning
of a NEW line.
(b) the marker counts, typically 0, 1 and 2 (through real variables
can be accommodated for some forms of input; see below) for
exactly m markers, with m as specified in the parameter file.
These may be on the same line or extend over continuation
lines.
Rules to set up the parameter file are complex, and are described in
detail in a separate chapter (Chapter 4).
For each random effect fitted for which the covariance option GIN
(see Section 4.10.4) has been specified, WOMBAT expects a file set
up by the user which contains the inverse of the matrix K (such as
relationship or correlation matrix) which determines the ‘structure’ of
the covariance matrix for the random effect. The following rules apply
:
1. The file name should be equal to the name of the random effect,
File name with the extension .gin. For example, mother.gin for a ran-
dom effect called mother.
For random effect names containing additional information in
round brackets, for instance in RR analysis, only the part preced-
ing the ‘(’ should be used. In this case, be careful to name the
effects in the model so that no ambiguities arise!
88 6 Input files
2. The first line of the file should contain a real variable with value
equal to the log determinant of the covariance/general relation-
ship matrix (NB: This is the log determinant of the matrix K,
not of the inverse K−1 ; this can generally be calculated as a ‘by-
product’ during inversion).
This comprises a constant term in the (log) likelihood, i.e. any
value can be given (e.g. zero) if no comparisons between models
are required.
NEW Optionally, this can be followed (separated by space(s)) by the
keyword “DENSE”. If given, WOMBAT will store the elements
of the general relationship matrix in core, assuming it is dense,
i.e. for n levels, an array of size n(n + 1)/2 is used. This can
require substantial additional memory, but reduces the overhead
incurred by re-reading this matrix from disk for every iteration,
and may be advantageous if the matrix is (almost) dense, such
as the inverse of a genomic relationship matrix.
3. The file should then contain one line for each non-zero element
Layout in the inverse. Each line is expected to contain three space-
separated variables :
(a) An integer code for the ‘column’ number
(b) An integer code for the ‘row’ number
(c) A real variable specifying the element of the inverse
Here ‘row’ and ‘column’ numbers should range from 1 to N ,
where N is the number of levels for the random effect.
Only the elements of the lower triangle of the inverse should be
given and given ‘row-wise’, i.e. WOMBAT expects a ’column’
number which is less than or equal to the ‘row’ number.
Hint
Calculations involved are more efficient if elements are given in
order (of the lower triangle)!
For runs which produce random effects solutions WITH standard er-
NEW rors, a file containing the diagonal elements of the general relationship
matrix K (NOT of the inverse!) is recognised. This is expected to have
the same name as the .gin file but extension .hdiags or .gdiags.(
If found, these diagonal elements are used to attempt computation
of the accuracies corresponding to the prediction errors computed.
The file is expected to be formatted, with one line for each level of the
corresponding random effect. Each line should contain three space
separated variables:
Hint
Output of such file — with extension .hdiags – can be requested when
using WOMBAT with run option --hinv to build the joint inverse rela-
tionship matrix for a single-step analysis, H−1 .
A) If a file with the same name as the .gin file but extension .matrix
is found, WOMBAT expects to read the elements of the lower
triangle of the symmetric matrix from this file.
B) If the original matrix is not available, it may be more convenient
to supply the Cholesky factor of the inverse K−1 instead, in a file
with the same name as the .gin file but extension .chlsky. If
this is given (and no .matrix file is found), the required product
is evaluated using two triangular solves in each iterate.
C) If neither of these files is available, WOMBAT will attempt to carry
out the Cholesky factorisation if the GIN matrix is more than 75%
dense. This is done storing the matrix in full and using Lapack
routines for symmetric, positive definite matrices – if the matrix
is large or not safely positive definite this can cause problems.
Otherwise, the program will stop.
To avoid setting up a .matrix or .chlsky file altogether, please
disable use of the average information algorithm by explicitly spe-
cifying one of the other maximisation algorithms.
As for the .gin file, the .matrix or .chlsky file should be format-
ted, with one line for each non-zero element containing three space-
separated variables (but no line corresponding to the determinant)
Hint
Run option --hchol is provided to carry out a sparse Cholesky factor-
isation as a separate step; see Section 5.2.10
6.6 Other Files 91
1. The name of the file should be the name of the covariable (or
‘control’ variable), as given in the parameter file (model of ana-
File name lysis part), followed by _USR, the number of coefficients, and the
extension .baf.
Example
If the model of analysis includes the effect age and the maximum
number of regression coefficients for age is 7, the corresponding
input file expected is age_USR7.baf
N.B.
The file name does not include a trait number.
This implies, that for multivariate analyses the same basis function
is assumed to be used for a particular covariable across all traits.
The only differentiation allowed is that the number of regression
coefficients may be different (i.e. that a subset of coefficients
may be fitted for some traits); in this case, the file supplied must
correspond to the largest number of coefficients specified.
Example
Assume the covariable has possible values of 1, 3, 5, 7 and 9, and that
we want to fit a cubic regression on ’ordinary’ polynomials, including
the intercept. In this case, WOMBAT would expect to find a file with 5
rows (corresponding to the 5 values of the covariable) and 4 columns
(corresponding to the 4 regression coefficients, i.e. intercept, linear,
quadratic and cubic):
1 1 1 1
1 3 9 27
1 5 25 125
1 7 49 343
1 9 81 729
Note that there is no leading column with the value of the covariable
(you can add it as the last column which is ignored by WOMBAT, if you
wish) – the association between value of covariable and user defined
function is made through the order of records.
For run option --pool, results can be given in a single file instead.
For each part analysis, this should contain the following information:
WOMBAT will check for existence of other files with default names
in the working directory and, if they exist, acquire information from
them.
Example
age.baf mybasefn.dat 1
damage.baf mybasefn.dat 2
For penalty options COVARM and CORREL a file with this name must
be supplied which gives the shrinkage target. This must be a positive
definite matrix. The file should be a plain text file and contain the
elements of the upper triangle of the matrix. It is read in ‘free’ format,
i.e. variable numbers of elements per line are allowed.
6.6 Other Files 95
A run with the option --valid expects to read sets of estimates from
a file with this name. This is generated by WOMBAT when penalized
estimation is specified, but can be edited to suit or generated by other
means. For each tuning factor, it should contain:
This file gives a summary about the model of analysis specified and
the corresponding features of the data found. Statistics given include
means, standard deviations and ranges for traits and covariables, and
numbers of levels found for the effects in the model of analysis.
It is written after the first processing of the input files, i.e. during the
set-up steps.
Standard If the final estimates were obtained using the AI algorithm, WOMBAT
errors provides approximate sampling errors for the parameters and covari-
ance components estimated, as given by the inverse of the respective
average information matrices. In addition, sampling errors of variance
ratios and correlations are derived, as described in Section A.4.2.
This file lists the generalised least-squares solutions for all fixed effects
fitted, together with ‘raw’ means and numbers of observations for
individual subclasses.
98 7 Output files
Hint
If this file is the ‘by-product’ of an estimation run using the AI-REML
algorithm (default), no standard errors for fixed effects are given. The
reason is that the AI algorithm does not involve calculation of the in-
verse of the coefficient matrix of the mixed model equations. Asymptotic
lower bound standard errors are written out, however, if the (PX-)EM
algorithm is used in the last iterate performed, or if a BLUP run is
specified, as both evaluate this inverse. You can force WOMBAT to cal-
culate standard errors by supplying the option FORCE-SE in a SPECIAL
block (see Section 4.17.4). Note though that is may require additional
calculations to invert the coefficient matrix which can be demanding for
large analyses.
This file is only written out when specifying run option --sample
(see Section 5.2.9. It gives a brief summary of the characteristics
of the analysis and average information matrix for which samples
were obtained, together with means and variances across replicates.
In addition, large deviations between theoretical results from the
information matrix and samples obtained are flagged.
These are large files, most likely subject to further processing by other
programs. Thus they contain minimum or no text. They have extension
.dat.
This files gives the residuals for all observations, for the model fitted
and current estimates of covariance components. It has the same order
as the data file, and contains 3 or more space separated columns:
The first line contains the column names. Note that this file is not
produced for runs involving iterative solution of the mixed model
7.2 Additional results 99
equations.
Example
res<-read.table(‘‘Residuals.dat’’) 1
summary(res); sd(res) 2
par(mfrow=c(1,2)) 3
plot(res[,1],res[,3]); hist(res[,1]) 4
Solutions for each random effect are written to a separate file. These
files have names RnSoln_rname.dat, with rname representing the
name of the random effect. Columns in these files are :
(a) Column 1 gives the running number for the level considered
(b) Column 2 gives the corresponding original code.
(c) Column 3 gives the ’effect’ number, where, depending on the ana-
lysis, ’effect’ is a trait, principal component or random regression
coefficient.
(d) Column 4 gives the solution.
(e) Column 5 gives the sampling error of the solution, calculated as
the square root value of the corresponding diagonal element of
the coefficient matrix in the mixed model equations. This is only
available, if the last iterate has been carried out using an EM or
PX-EM algorithm (or if FORCE-SE has been invoked).
If you have carried out a reduced rank analysis, i.e. give the PC
option for the analysis type, the solutions in RnSoln_rname.dat
pertain to the principal components! You might then also be inter-
100 7 Output files
Hint
To get most information from these files, it might be worth your while
scaling your covariables prior to analysis!
(a) Column 1 gives the running number of the value of the control
variable.
7.2 Additional results 101
(a) The number of traits in the subset and their names, as given in the
parameter file.
(b) The corresponding trait numbers in the ‘full’ analysis.
(c) The first line for each covariance matrix gives the running number
of the random effect, the order of fit and the name of the effect
(d) The following lines give the elements of covariance matrix, with
one line per row.
NB The number of rows written is equal to the number of traits in
the subset; for random effects not fitted for all traits, corres-
ponding rows and columns of zeros are written out.
(a) A line with the qualifier VAR, followed by the name of the random
effect and the order and rank of the covariance matrix.
(b) The elements of the upper triangle of the covariance matrix; these
are written out as one element per line.
These files provided the results from a run with the option --pool:
This is the output file for a run using option --snap. It contains one
line for each line found in the input file SNPCounts.dat containing
This file gives a brief summary of estimates together with log likelihood
values for all values of the tuning parameter given.
This file collects the BestPoint’s for all tuning parameters. The
format for each is similar to that for BestPoint (see Section 7.3.3),
except that the first line has 3 entries comprising the tuning factor,
the maximum penalized likelihood and the corresponding unpenalized
value. It is suitable as input for additional calculations aimed at com-
paring estimates, and used as input file for ‘validation’ runs (see run
option --valid, Section 5.3.2).
7.2 Additional results 105
Similarly, this file collects the values of the tuning factors and corres-
ponding penalized and unpenalized log likelihood values. It has one
line for each tuning factor. For a penalty on the canonical eigenvalues,
estimates of the latter are written out as well (in descending order).
Again, if this file exists in the working directory it is not over-written
but appended to.
This file is the output resulting from a run with the option --valid.
It contains one line per tuning factor with he following entries:
This (potentially large) file contains the samples drawn from the mul-
tivariate normal distribution of estimates, either for all random effects
in the analysis (name = ALL) or for single, selected effect. The file
106 7 Output files
This file gives a count of the numbers of repeated records per trait
and, if option TSELECT is used, a count of the number of pairs taken
at the same time.
N.B.
BestPoint is used in any continuation or post-estimation steps – do
not delete is until the analysis is complete !
(a) Column 1 gives a two-letter identifying the algorithms (AI, PX, EM)
used.
(b) Column 2 gives the running number of the iterate.
(c) Column 3 contains the log likelihood value at the end of the iterate.
(d) Column 4 gives the change in log likelihood from the previous
iterate.
(e) Column 5 shows the norm of the vector of first derivatives of the
log likelihood (zero for PX and EM)
(f) Column 6 gives the norm of the vector of changes in the paramet-
ers, divided by the norm of the vector of parameters.
(g) Column 7 gives the Newton decrement (absolute value) for the
iterate (zero for PX and EM).
(h) Column 8 shows a) the step size factor used for AI steps, b) the
average squared deviation of the matrices of additional parameters
in the PX-EM algorithm from the identity matrix, c) zero for EM
steps.
(i) Column 9 gives the CPU time for the iterate in seconds
(j) Column 10 gives the number of likelihood evaluations carried out
so far.
(k) Column 11 gives the factor used to ensure that the average inform-
ation matrix is ‘safely’ positive definite
(l) Column 12 identifies the method used to modify the average in-
formation matrix (0: no modification, 1: modify eigenvalues dir-
ectly, 2: add diagonal matrix, 3: modified Cholesky decomposition,
4: partial Cholesky decomposition – see Section A.5).
particular analysis. The file contains one line per ordering tried, with
the following information :
These files are written out when the AI algorithm is used. After
each iterate, they give the average information matrix (not its inverse
!) corresponding to the ‘best’ estimates obtained by an AI step, as
written out to BestPoint. These can be used to approximate sampling
variances and errors of genetic parameters.
N.B.
If the AI iterates are followed by further estimates steps using a different
algorithm, the average information matrices given may not pertain to
the ‘best’ estimates any longer.
For full rank estimation, the average information is first calculated with
respect to the covariance components and then transformed to the
Cholesky scale. Hence, the average information for the covariances is
available directly, and is written to the file AvInfoParms.
Both files give the elements of the upper triangle of the symmetric
information matrix row-wise. The first line gives the log likelihood
value for the estimates to which the matrix pertains – this can be used
to ensure corresponding files of estimates and average information
are used. Each of the following lines in the file represents one element
of the matrix, containing 3 variables :
N.B.
Written out are the information matrices for all parameters. If some
parameters (or covariances) are not estimated (such as zero residual
covariances for traits measured on different animals), the corresponding
rows and columns may be zero.
For random regression analyses, file(s) with the basis functions evalu-
ated for the values of the control variable(s) in the data are written
out. These can be used, for example, in calculating covariances of
predicted random effects at specific points.
The name of a file is equal to the name of the covariable (or ‘control’
variable), as given in the parameter file (model of analysis part), fol-
lowed by the option describing the form of basis function (POL, LEG,
BSP; see Section 4.10.2) the maximum number of coefficients, and the
extension .baf. The file then contains one row for each value of the
covariable, giving the covariable, followed by the coefficients of the
basis function.
NB. These files pertain to the random regressions fitted! Your model
may contain a fixed regression on the same covariable with the same
number of specified regression coefficients, n, but with intercept omit-
ted. If so, the coefficients in this file are not appropriate to evaluate
the fixed regression curve.
7.4 Miscellaneous
This file collects ‘time stamps’ for various stages of a run, together
explanatory messages for programmed stops. The content largely
duplicates what is written to the screen. However, the file is closed
after any message written – in contrast to a capture of the screen
output which, under Linux, may be incomplete. It is intended for those
running a multitude of analyses and wanting to trace hat went on. N.B.
This file is appended too, i.e. will collect information for multiple runs
in the current directory if not deleted between runs.
8 ‘Internal’ files used by WOMBAT
Hint
Checks for a match between an existing .bin file and the current model
of analysis and data file are rather elementary and should not be relied
upon. When starting a ‘new’ analysis, it is good practice to switch to a
new, ‘clean’ directory, copying any .bin files which are known to match,
if appropriate.
9 Worked examples
Each subdirectory contains the data and pedigree files for a particular
example, a file WhatIsIt with a brief description of the example, and
one or subdirectories for individual runs, A, B, C, . . ..
N.B.
The example data sets have been selected for ease of demonstration, and
to allow fairly rapid replication of the example runs. Clearly, most of the
data sets used are too small to support estimation of all the parameters
fitted, in particular for the higher dimensional analyses shown !
N.B.
Examples runs have been carried out on a 64-bit machine (Intel I7
processor, rated at 3.20Ghz) ; numbers obtained on 32-bit machine may
vary slightly.
Note further that all the example files are Linux files - you may need to
’translate’ them to DOS format if you plan to run the examples under
Windows.
Example 1
Example 2
This shows a bivariate analysis for the case where the same model
is fitted for both traits and all traits are recorded for all animals.
The model of analysis includes 3 cross-classified fixed effects and an
additional random effect
Example 3
Example 4
Example 5
Example 6
Example 7
Example 8
of the animal is thus fitted for this trait. Gestation length is treated
as trait of the calf and assumed to be affected by both genetic and
permanent environmental effects.
A Standard model
B Equivalent model, using the PEQ option for permanent environ-
mental effects of the animal.
Example 9
Example 10
The tutorial and all data (& pedigree) files used in this example are
not included – you have to download these from their web site:
http://wildanimalmodels.org
Example 11
Example 13
Example 14
This gives a toy example illustrating the use of the run option --snap
for a simple GWAS type analysis.
Brief notes are supplied as a .pdf file with the example or can be
downloaded as RNote_WOMBATSnappy.pdf
counts for the number of copies of the reference allele; this has
the default name "SNPCounts.dat" and must be (or a link to it)
in the working directory. The output file "SNPSolutions.dat"
contains estimates of SNP effects, their standard errors and
resulting t-values, followed by their p-value derived from the
Normal density and log 10 of the latter (one per line).
B As A, but for a bivariate analysis (2 traits, fit a single SNP). The
output file SNPSolutions.dat contains 6 values per line: estimates
of the SNP effects for traits 1 and 2, their standard errors and
resulting t-values.
C As A, but for two SNPs (no p-values).
D Ditto, but two SNPs & two traits
E As A with the addition of ‘explicit’ genetic groups
Example 15
Some notes are supplied as a .pdf file with the example or can be
downloaded as RNote_WOMBATPool.pdf
A All part analyses have been carried out using WOMBAT and a ‘full’
parameter file is available.
B Results from part analyses are summarized in a single file and a
‘minimum’ parameter file is used.
Example 16
A This directory shows an example run for simulated data for a dilu-
tion factor of 0, treating direct and social genetic effects as uncor-
related.
B As A, but allowing for a non-zero genetic covariance
C As B, but treating social group and residual variances as heterogen-
eous.
119
Example 17
Example 18
A Univariate analysis
B As A, using –blup
C Bivariate analysis
D Univariate analysis, fitting "explicit" genetic groups
Example 19
Brief notes are supplied as a .pdf file with the example or can be
downloaded as RNote_WOMBATSimplePenalty.pdf
Example 20
Example 21
Example 22
Simulated data for a trait with heritability of 0.4; 130 animals with
allele counts for 100 SNPs.
122 9 Examples
where θ̂it denotes the estimate of the i−th parameter from iterate
t, and p is the number of parameters.
3. The norm of the gradient vector, i.e., for git = ∂log Lt /∂θi ,
v
u p
uX
t (g t )2
i (A.2)
i=1
t
where Hij is the ij−the element of the inverse of the average
information matrix for iterate t. This gives a measure of the
expected difference of log Lt from the maximum, and has been
suggested as an alternative convergence criterion [6].
Default values for the thresholds for these criteria used in WOMBAT
are summarised in Table Table A.1.
A.3 Parameterisation 125
Algorithm
Criterion AI (PX-) EM Powell
Change in log L < 5 × 10−4 < 10−5 < 10−4
Change in parameters < 10−8 < 10−8 < 10−8
Norm of gradient vector < 10−3 – –
Newton decrement not used – –
N.B.
Current values used are rather stringent; ‘softer’ limits combining sev-
eral criteria may be more appropriate for practical estimation.
A.3 Parameterisation
For reduced rank analyses, however, the AI matrix for the parameters
to be estimated are calculated directly, as outlined by Meyer and Kirk-
patrick [36]. Hence, sampling covariances among the corresponding
covariance components need to be approximated from the inverse
of the AI matrix. WOMBAT estimates the leading columns of the
Cholesky factor (L) of a matrix to obtain a reduced rank estimate,
Σ = LL′ . Let lir (for i ≤ r) denote the non-zero elements of L. The
inverse of the AI matrix then gives approximate sampling covariances
126 Appendix A Technical details
q(i,j)
X
σij = lir ljr
r=1
with q(i, j) = min(i, j, t) and t the rank which the estimate of Σ is set
to have. The covariance between two covariances, σij and σkm is then
q(i,j) q(k,m)
X X
Cov (σij , σkl ) = Cov (lir ljr , lks lms )
r=1 s=1
q(i,j) q(k,m)
X X
Cov (σij , σkl ) ≈ [ ljr lms Cov (lir , lks ) + ljr lks Cov (lir , lms )
r=1 s=1
(A.4)
+lir lms Cov (ljr , lks ) + lir lks Cov (ljr , lms ) ]
σ12
≈ σ24 Var(σ12 ) + σ14 Var(σ22 ) − 2σ12 σ22 Cov(σ12 , σ22 ) /σ28 (A.6)
Var
σ22
A.5 Modification of the average information matrix 127
S n
− −
X
V t+1
= ws Vt (Ps Ps ′ Vt Ps Ps ′ ) Ps Cs Ps ′ (Ps Ps ′ Vt Ps Ps ′ ) Vt
s=1
S
′ t −1 ′
− o X
+ (I − Ps Ps ) (V ) (I − Ps Ps ) / ws (A.8)
s=1
until Vt and Vt+1 are virtually identical, with Vt the value of V for the
t−th iterate.
Other components of (Equation A.8) are ws , the weight given to ana-
lysis s, I, an identity matrix of size q × q , and transformation matrix Ps
for analysis s, of size q × ks . Ps has ks elements of unity, pij = 1, if the
i-th trait overall is the j -th trait in analysis s, and zero otherwise.
Starting values (V0 ) are obtained initially by decomposing covariance
matrices Cs into variances and correlations, and calculating simple
averages over individual analyses. If the resulting correlation matrix
is not positive definite, it is modified by regressing all eigenvalues
towards their mean, choosing a regression factor so that the smallest
eigenvalue becomes 10−6 , and pre- and post-multiplying the diagonal
matrix of modified eigenvalues with the matrix of eigenvectors and
A.6 Iterative summation 129
t
with vij denoting the ij−th element of Vt .
Bibliography
[1] Aguilar I., Legarra A., Cardoso F., Masuda Y., Lourenco D., Misztal I. Fre-
quentist p-values for large-scale-single step genome-wide association, with an
application to birth weight in American Angus cattle. Genet. Sel. Evol. 51
(2019) 28. doi: 10.1186/s12711-019-0469-3.
[2] Amestoy P.R., Davis T.A., Duff I.S. An approximate minimum degree ordering
algorithm. SIAM J. Matr. Anal. Appl. 17 (1996) 886–905.
[3] Anderson E., Bai Z., Bischof C., Blackford S., Demmel J., Dongarra J., Du Croz
J., Greenbaum A., Hammarling S., McKenney A., Sorensen D. LAPACK Users’
Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, Third
edn. (1999). ISBN 978-0-89871-447-0.
[5] Bondari K., Willham R.L., Freeman A.E. Estimates of direct and maternal
genetic correlations for pupa weight and family size of Tribolium. J. Anim. Sci.
47 (1978) 358–365.
[8] Dennis J.E., Schnabel R.B. Numerical Methods for Unconstrained Optimization
and Nonlinear Equations. SIAM Classics in Applied Mathematics. Society for
Industrial and Applied Mathematics, Philadelphia (1996).
[9] Dongarra J.J., Croz J.D., Hammarling S., Hanson R.J. An extended set of
FORTRAN Basic Linear Algebra Subprograms. ACM Trans. Math. Softw. 14
(1988) 1–17. doi: 10.1145/42288.42291.
[10] Erisman A.M., Tinney W.F. On computing certain elements of the inverse of
a sparse matrix. Commun. ACM 18 (1975) 177–179. ISSN 0001-0782. doi:
10.1145/360680.360704.
[11] Eskow E., Schnabel R.B. Algorithm 695: Software for a new modified Cholesky
factorization. ACM Trans. Math. Software 17 (1991) 306–312.
[13] Fernando R.L., Cheng H., Golden B.L., Garrick D.J. Computational strategies
for alternative single-step Bayesian regression models with large numbers of
genotyped and non-genotyped animals. Genet. Sel. Evol. 48 (2016) 96. doi:
10.1186/s12711-016-0273-2.
[14] Ferris M., Lucidi S., Roma M. Nonmonotone curvilinear line search methods
for unconstrained optimization. Comput. Optim. Applic. 6 (1996) 117–136.
[15] Forsgren A., Gill P.E., Murray W. Computing modified Newton directions
using a partial Cholesky factorization. SIAM J. Sci. Statist. Comp. 16 (1995)
139–150.
[16] Garcia-Baccino C.A., Legarra A., Christensen O.F., Misztal I., Pocrnic I., Vitezica
Z.G., Cantet R.J.C. Metafounders are related to FST fixation indices and reduce
bias in single-step genomic evaluations. Genet. Sel. Evol. 49 (2017) 34. doi:
10.1186/s12711-017-0309-2.
[17] Grippo L., Lampariello F., Lucidi S. A nonmonotone line search technique for
Newton’s method. SIAM J. Numer. Anal. 23 (1986) 707–716. ISSN 0036-1429.
doi: 10.1137/0723046.
[18] Gualdrón Duarte J.L., Cantet R.J.C., Bates R.O., Ernst C.W., Raney N.E., Steibel
J.P. Rapid screening for phenotype-genotype associations by linear trans-
formations of genomic evaluations. BMC Bioinformatics 15 (2014) 246. doi:
10.1186/1471-2105-15-246.
[19] Gustavson F., Waśniewski J., Dongarra J., Langou J. Rectangular full packed
format for Cholesky’s algorithm: factorization, solution, and inversion. ACM
Trans. Math. Softw. 37 (2010) 18. doi: 10.1145/1731022.1731028.
[20] Harville D.A. Use of the Gibbs sampler to invert large, possibly sparse, positive
definite matrices. Lin. Alg. Appl. 289 (1999) 203–224.
[21] Ihaka R., Gentleman R. R: A language for data analysis and graphics. J. Comp.
Graph. Stat. 5 (1996) 299–314.
[22] Karypis G., Kumar V. MeTis A software package for partioning unstructured
graphs, partitioning meshes, and computing fill-in reducing ordering of sparse
matrices Version 4.0. Department of Computer Science, University of Min-
nesota, Minneapolis, MN 55455 (1998). 44 pp.
[23] Koivula M., Negussie E., Mäntysaari E.A. Genetic parameters for test-day
somatic cell count at different lactation stages of Finnish dairy cattle. Livest.
Prod. Sci. 90 (2004) 145–157. doi: 10.1016/j.livprodsci.2004.03.004.
[24] Legarra A., Aguilar I., Colleau J. Short communication: Methods to compute
genomic inbreeding for ungenotyped individuals. J. Dairy Sci. 103 (2020)
3363–3367. doi: 10.3168/jds.2019-17750.
132 BIBLIOGRAPHY
[25] Legarra A., Christensen O.F., Vitezica Z.G., Aguilar I., Misztal I. Ancestral
relationships using metafounders: finite ancestral populations and across
population relationships. Genetics 200 (2015) 455–468. doi: 10.1534/genet-
ics.115.177014.
[26] Liu J.W.H. Modification of the minimum degree algorithm by multiple elimina-
tion. ACM Trans. Math. Soft. 11 (1985) 141–153.
[27] Mäntysaari E.A. Derivation of multiple trait reduced random regression (RR)
model for the first lactation test day records of milk, protein and fat. In:
Proceedings of the 50th Annual Meeting of the European Association of Animal
Production. Europ. Ass. Anim. Prod. (1999).
[30] Meyer K. DfReml version 3.0. CD-ROM of the Sixth World Congress on
Genetics Applied to Livestock Production (1998).
[33] Meyer K. Multivariate analyses of carcass traits for Angus cattle fitting reduced
rank and factor–analytic models. J. Anim. Breed. Genet. 124 (2007) 50–64.
doi: 10.1111/j.1439-0388.2007.00637.x.
[38] Meyer K., Kirkpatrick M., Gianola D. Penalized maximum likelihood estimates
of genetic covariance matrices with shrinkage towards phenotypic dispersion.
Proc. Ass. Advan. Anim. Breed. Genet. 19 (2011) 87–90.
[39] Meyer K., Smith S.P. Restricted maximum likelihood estimation for animal
models using derivatives of the likelihood. Genet. Sel. Evol. 28 (1996) 23–49.
doi: 10.1051/gse:19960102.
[40] Meyer K., Tier B. "SNP Snappy": A strategy for fast genome wide associ-
ation studies fitting a full mixed model. Genetics 190 (2012) 275–277. doi:
10.1534/genetics.111.134841.
[41] Nelder J.A., Mead R. A simplex method for function minimization. Computer J.
7 (1965) 308–313.
[42] Nocedahl J., Wright S.J. Numerical Optimization. Springer Series in Operations
Research. Springer Verlag, New York, Berlin Heidelberg (1999). ISBN 0-
38798793-2.
[43] Powell M.J.D. An efficient method for finding the minimum of a function
of several variables without calculating derivatives. Computer J. 7 (1965)
155–162.
[44] Quaas R.L. Computing the diagonal elements of a large numerator relationship
matrix. Biometrics 32 (1976) 949–953. doi: 10.2307/2529279.
[45] Schnabel R.B., Estrow E. A new modified Cholesky factorization. SIAM J. Sci.
Statist. Comp. 11 (1990) 1136–1158.
[47] Tier B. Computing inbreeding coefficients quickly. Genet. Sel. Evol. 22 (1990)
419–425.
[48] Westell R.A., Quaas R.L., Van Vleck L.D. Genetic groups in an animal model. J.
Dairy Sci. 71 (1988) 1310–1318. doi: 10.3168/jds.S0022-0302(88)79688-5.
[49] Wilson A.J., Reale D., Clements M.N., Morrissey M.B., Postma E., Walling C.A.,
Kruuk L.E.B., Nussey D.H. An ecologist’s guide to the animal model. Journal
of Animal Ecology 79 (2010) 13–26. doi: 10.1111/j.1365-2656.2009.01639.x.
Index
134
INDEX 135
--invspa, 65 -c, 58
--itsum, 68 -d, 58
--like1, 73 -t, 58
--limits, 69 -v, 58
--logdia, 74 hchol, 65
--maxgen, 76 hdiag, 65
--metis, 70
--meuw, 76 Sire model, 21
--mmd, 70 Example, 113
--mmeout, 61 Social genetic effects, 46
--modaim, 72 Dilution factor, 46
--nocenter, 76 Special block option
--nohead, 77 AOM-RAN, 49
--nologd, 74 AOM-RES, 49
--nolsq, 77 BSOLVE-SNP, 43
--nonly, 76 CLONES, 42
--noprune, 75 COVZER, 42
--noraw, 77 FORCE-SE, 42
--noreord, 74 GENGROUPS, 46
--norped, 75 INCORE, 49
--nosolut, 77 KERNEL, 50
--nostrict, 72 PENALTY, 52
--old, 75 REPEAT, 43
--pivot, 60 RPTCOV, 43
--pool, 69 RRCORR-ALL, 51
--powell, 73 SAMPLEAI, 45
--pxai, 72 SNPEFF, 44
--pxem, 72 SOCIAL, 46
--quaas, 76 WEIGHT, 41
--quapp, 66 Standard errors
--redped, 75 Approximation by sampling, 45
--s1step, 61 Force calculation, 42
--sample, 63
Traits, 29
--setup, 59
Troubleshooting, 11
--simplex, 73
Tuning factor, 53
--simul, 63
--solvit, 61 User defined
--subset, 67 Basis functions, 26, 28, 91
--threads, 77 Functions of covariance components,
--times, 69 33
--valid, 73 Relationship matrix, 27, 87
--wide, 69
--zero, 59 Weighted analysis, 41