UK Biobank Data Access Guide
Version 3.4
http://www.ukbiobank.ac.uk/
October 2023
This document details the means by which data supplied by UK Biobank can be obtained
and manipulated once access has been approved.
Contents
3.2 Using ukbfetch
3.2.1 Data downloaded using ukbfetch
3.2.2 Downloading a single bulk item
3.2.3 Creating and using a bulk file
3.3 Using gfetch
3.3.1 Data downloaded using gfetch
3.3.2 A gfetch example
3.4 Using ukblink
3.4.1 Returned datasets
3.4.2 Using ukblink to download a Return
3.4.3 Using ukblink to create a bridge between participant identifiers
4 The Data Portal
4.1 Record-level data on the Data Portal
4.2 Accessing the Data Portal
4.3 Downloading tables from the Data Portal
4.4 Using SQL to query the tables
Appendix A: Troubleshooting guide
A.1 General
A.2 ukbunpack
A.3 ukbconv
A.4 ukbfetch
A.5 gfetch
A.6 ukblink
A.7 The Data Portal
Appendix B: Sizes of bulk files
Appendix C: Size of core dataset
Appendix D: File types of older returned datasets
Summary of changes in this version (3.4)
Compared to the previous version, the following changes have been made to this
version:
- Added COVID-19 vaccination data to Table 1.1 and Section 4.1, and included information about access restrictions.
- Clarified the specific Showcase field used to access COVID-19 PCR test data in Table 1.1 and Section 4.1.
- Added instruction on firewall configuration to allow for email notification of data release in Section 1.1.
- Updated the length of main dataset validity before basket expiration in Section 2.6.1.
1 Introduction
This guide is intended for researchers who have an approved application for access to
UK Biobank (UKB) data, or an approved additional data request on an existing project,
and have received a notification email stating that their data is now available for
download.
This guide covers how to download the main dataset and access the Data Portal using
the Showcase download page, as well as how to use the download utilities to access
bulk and genetics data. This guide is not for researchers who will be accessing the data
via the UKB Research Analysis Platform (RAP). Details on accessing data via the UKB
RAP can be found here.
If you require information on applying for access to the resource, please see the page
Apply for access on the UKB website. If you require information about finding Data-fields
in Showcase, please see the Showcase User Guide.
When a new data release is approved on a project, a notification email (from no-
reply@ndph.ox.ac.uk) is sent to all collaborators on the project informing them that data
is now available for download, and giving outline instructions on how to do this. Please
ensure that firewall permissions for your organisation are configured to accept emails
from this domain.
The notification email contains a 32-character MD5 checksum within the main body of the
email. This is needed to download the main dataset. (Note that in the case of data
releases only giving access to returned datasets (see Section 3.4), there is no main
dataset and hence no MD5 checksum.)
It will also have an attachment, called the "(authentication) keyfile", with a filename of the
form: k56789r23456.key where 56789 will be replaced by the relevant project id and
23456 by the "run id" (a unique identifier for this particular dataset). This file is needed to
decrypt the main dataset, and to provide authentication for the utilities used to download
bulk data, some genetics data, and returned datasets.
1.2 Methods of accessing UK Biobank data
The data available to download from UK Biobank comes in a variety of formats which
need to be accessed in different ways.
- The main dataset. This is a rectangular dataset with one row per participant, and all fields from a researcher's basket as columns. It is downloaded, decrypted and converted according to the instructions given in Section 2.
- Download utilities. The ukbfetch, gfetch & ukblink utilities are used to download images and bulk data files, e.g. MRI images, ECG data, certain genomics data, and returned datasets (from other UKB research projects). See Section 3.
- The Data Portal. Record-level hospital inpatient, death, primary care (GP), COVID-19 PCR test result and COVID-19 vaccination data, as well as Olink and OMOP data, are accessed via the Data Portal, which is reached from the Downloads page of Showcase. See Section 4 for details.
- The Research Analysis Platform (RAP). Most UKB data is now available on the RAP, with a few exceptions. This guide does not go into detail of how to access data using the RAP, which has its own documentation.
The table on the following two pages gives an overview of the different types of data
available through the UKB resource, and how they can each be accessed.
Table 1.1: Methods of accessing UKB data
1.3 Tiered access
In the past, the elements of the UKB data that a research project can access have been controlled by the use of "baskets", where a selection of particular Data-Fields is made and then, once the basket is approved, access is granted to those particular Data-Fields. For many older projects, which remain on the old-style Material Transfer Agreement (MTA), this will continue to be the way that access to data is determined.
For new applications, and those which have reached the point where they need to move
onto the new MTA, access will instead be controlled by researchers selecting one of
three "tiers", with different types of data being available at each tier, and with different
costs associated with these as described here.
Each Data-Field on Showcase has a "Cost Tier" associated with it, indicating which
methods of access are available for that Data-Field and what tier the application must be
in order to access the Data-Field by each method. For example, for Data-Field 20158 the
Cost Tier is shown below:
This Cost Tier is given in a coded form as: dN1 oN2 sN3, where each Ni is a number from 1 to 3 or the letter X. In this code, the letters denote the access methods (d for downloading from the RAP, o for viewing online on the RAP, and s for downloading using the Showcase download utilities), the numbers give the minimum tier that a project must be in order to access that Data-Field via that method, and an X indicates that the Data-Field is not available by that method at any tier. So for example, d1 o1 s1 means that any Tier 1, Tier 2 or Tier 3 project can access that Data-Field, whereas dX o3 sX means that the data for the Data-Field cannot be downloaded from the RAP or using the Showcase download utilities, and is only available to Tier 3 projects online on the RAP.
In the case of Data-Field 20158 given in Figure 1.2 above, the Cost Tier of d3 o2 s3
indicates that the DXA images can be viewed online on the RAP by Tier 2 or Tier 3
projects, but that only Tier 3 projects are permitted to download the data from the RAP or
download the data using the Showcase download utilities (ukbfetch in this case).
If you are having difficulties with any aspect of the data download process we have
collected some previously encountered issues in Appendix A of this document.
If you are unable to find a solution, then please contact the Access Management Team
(AMT) at access@ukbiobank.ac.uk. It will help us to solve your problem more quickly if
you provide as much information as possible, including:
- Your Project ID.
- The Run ID (i.e. which main dataset) to which the problem relates (where relevant).
- Screenshots of your problem.
- The steps you followed up until the point when the issue occurred.
- Any error messages received.
- Listings of the contents of the folder you are working from (where relevant).
If you find any errors in this document, or any parts that are unclear or incomplete, we
would be grateful if you would pass them on to the AMT at the address above.
2 The main dataset
The main dataset is a rectangular dataset with one row per participant in the UK Biobank
(UKB) study. It contains all of the measurements and information collected on a
participant either at the UKB assessment centres or via online questionnaires, some
information gathered from external data providers such as the NHS (though some of this
is only available via the Data Portal; see Section 4), and derived data from bulk files
(such as MRI scans).
The main dataset also indicates whether a particular bulk or genomics Data-Field is available for a particular participant: it contains an entry where such data exists and is blank where it does not. The bulk/genomics data itself needs to be downloaded separately (see Section 3).
Further information about how a main dataset is structured can be found in Section 2.5.
Downloading a main dataset involves several steps. The encrypted dataset must be
downloaded by logging onto AMS for the relevant project and navigating to the
Showcase downloads page. It must then be decrypted (“unpacked”), and finally
converted to a suitable format for use. A number of “helper programs” need to be
downloaded to accomplish these steps.
There are three helper programs required for decrypting and converting the main dataset:
ukbmd5 – for ensuring the encrypted main dataset has downloaded correctly.
ukbunpack – for decrypting the downloaded main dataset.
ukbconv – for converting the decrypted dataset into a suitable format.
These are provided in the File Handlers tab in the Downloads section of the Showcase
website, as shown in figure 2.1 below.
There are separate versions of the helper programs for Windows and Linux systems. The
Windows format is distinguished by the suffix ".exe". The Linux versions do not work on
Macs.
Figure 2.1: The File Handlers tab of the Showcase Downloads page
The helper programs can be downloaded one at a time by selecting the required
operating system version. This will open a new page, where the file to download can be
found.
For example, to download the Windows version of the ukbmd5 program, you should
right-click on the link to the file (shown in green on Figure 2.2 below), and select "Save
link as…". It is also possible to use the command line wget utility to download the file by
copying the wget command given (shown in blue below) into a command prompt /
terminal.
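For example, for the ukbmd5 program the wget command given on the page has a form roughly like the one below; the URL shown here is only indicative, so copy the exact command from the download page itself:
wget -nd biobank.ndph.ox.ac.uk/ukb/util/ukbmd5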
2.2.2 Downloading a main dataset via AMS
To download a main dataset, you must first log in to the Access Management System
(AMS), click on Projects, and select the relevant project ID and click the blue button
"View/Update". Now click on the Data tab at the top right, and then on the “Go to
Showcase to refresh or download data” button which will lead to the Showcase
Downloads page.
Click on the ID (also called the "Run ID") for the dataset you wish to download, which will
take you to the authentication screen:
Now enter the 32-character MD5 checksum, which was included in the main body of the
notification email for the dataset. Then click "Generate".
This will open a new page with a link to your dataset as shown below:
Click the "Fetch" button to download the encrypted dataset. Then save your dataset in
the same folder as the helper programs.
In order to proceed with the download process, i.e. validating, decrypting & converting
the downloaded file, it is necessary to be able to run the helper programs (see Section
2.2.1) using command line instructions from a command prompt in Windows or a terminal
window in Linux.
The easiest way to open a command prompt in the correct folder in Windows is to navigate to the required folder using the Windows file explorer, then type cmd into the address bar followed by Return, as shown below:
Figure 2.6: Opening a Command Prompt in Windows
Now that you have a command prompt / terminal window open, you can verify the
integrity of the downloaded main dataset file by typing the command:
ukbmd5 ukb23456.enc
You will receive information about the size of the downloaded file (in bytes) and a 32-
character MD5 checksum (labelled "MD5="). If this checksum matches the MD5
checksum in your notification email (the checksum used to download the dataset) then
the dataset has downloaded correctly, and you can proceed to the next step.
If the MD5 checksum differs from that in your notification email then the dataset has not
downloaded correctly. You should discard the downloaded file and repeat the steps in
Section 2.2.2.
Attached to your notification email was a file, called an "(authentication) keyfile", with a
filename like k56789r23456.key. This is a simple text file with your Project ID on the
1st line and a 64-character password (the "keyvalue") on the 2nd line.
To use ukbunpack to decrypt the dataset, run it with the name of the downloaded dataset followed by the name of the keyfile, replacing 23456 with the run ID of your dataset and 56789 with your project ID.
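With the example filenames used throughout this guide, and assuming both files are in the current folder, the command would be along these lines:
ukbunpack ukb23456.enc k56789r23456.key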
Note that the keyfile for each main dataset extraction, including refreshes of the same
basket, is different, and so you have to use the keyfile with the matching run ID (23456
above). If the keyfile is not in the same directory as the dataset, you must provide a full
filepath to it.
The decryption could take a few minutes, after which a new file will have been created in
the folder named ukb23456.enc_ukb where 23456 is replaced by the run ID of your
dataset.
The result of the unpacking program is a dataset in a custom UK Biobank format (the
.enc_ukb file above). The ukbconv program can be used to convert this into various
other formats.
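In outline the call takes the decrypted dataset and the desired output format; a sketch of the general form (dataset name first, output option second) is:
ukbconv ukb23456.enc_ukb <option>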
where ukb23456.enc_ukb is the file generated from the previous unpacking step (with
23456 replaced by the run ID of your dataset), and <option> is replaced by one of:
docs, csv, txt, r, sas, stata or bulk depending on the output desired.
Note that large datasets may take a considerable amount of time (possibly hours) to
convert, depending on the speed of the local system.
The various options do the following:
csv or txt: Converts the dataset into a comma-separated (csv) file or a tab-
separated (txt) file respectively (see Section 2.3.2);
r, sas or stata: Converts the dataset into a file suitable for one of the statistics
packages R, SAS or Stata, including (if desired) converting coded values into their
meanings (see Section 2.3.3);
docs: Generates a data dictionary for your dataset (see Section 2.3.5);
bulk: Creates a “bulk” file which is used in conjunction with the ukbfetch download
utility to download bulk data (see Section 3.2.3 for further details). This option is only
relevant for the downloading of bulk data items such as MRI images etc.
In all cases the original .enc_ukb file remains intact so the converter may be used
multiple times to generate different outputs.
All options generate the following two files (as well as the output from the conversion):
fields.ukb – A simple text file, giving a list of all the Showcase Data-Field numbers
appearing in the dataset.
ukb23456.log – A log file used to summarise the result of the conversion process,
giving the date & time, name of the output file, project id, basket number, the number
of variables, and the time required to convert.
All options, except bulk, produce a dataset that has one row per participant.
To convert the dataset into a comma-separated or tab-separated file, use the command:
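A sketch of the call, as before with the dataset name first:
ukbconv ukb23456.enc_ukb <option>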
where ukb23456.enc_ukb is the file generated from the previous unpacking step (with
23456 replaced by the run ID of your dataset), and <option> is replaced by csv or txt
respectively.
As well as the field and log files given above, the conversion will also produce the converted dataset itself: ukb23456.csv for the csv option or ukb23456.txt for the txt option. In both cases coded values for categorical variables will be retained rather than being replaced by their meanings (see Section 2.3.5).
The csv and txt files produced in Section 2.3.2 can be imported into R, SAS or Stata in
the usual ways. Alternatively, the r, sas and stata options for ukbconv can be used to
automatically replace the coded values used in categorical variables with their meanings.
For example, in Field 31 (Sex), Female is represented by the value 0 and Male by the
value 1. If, for example, the r option is used, a tab-separated file will be created that still
codes the values 0 and 1 for this variable, but an R script will also be generated that,
when used to import the dataset into R, will recode each 0 to "Female" and each 1 to
"Male".
In order for ukbconv to perform this recoding, the file encoding.dat needs to be
downloaded from Showcase and placed into the same folder as your .enc_ukb file and
ukbconv. The encoding.dat file can be found in the "Miscellaneous Utility" tab of the
Showcase Downloads page:
Figure 2.8: The encoding file
Note that the same file works on either Windows or Linux systems. To download it, click on
"all" and then either right-click on the filename and select "Save link as…" or use the
wget command.
In order to recode the categorical variables ukbconv needs to be given the name of the
encoding file using the -e flag as shown below:
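With the example filenames used in this guide, and with encoding.dat in the same folder, the call would look something like the following (the flag is shown attached directly to the filename, by analogy with the other flags in this guide):
ukbconv ukb23456.enc_ukb <option> -eencoding.dat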
where ukb23456.enc_ukb is the file generated from the previous unpacking step (with
23456 replaced by the run ID of your dataset) and <option> is replaced by one of: r,
sas, or stata.
The files generated, in addition to the field and log files described in Section 2.3.1 above,
are given on the following table:
Format File generated Description
Note that if the file encoding.dat is not present when the conversion into R, SAS or
Stata format is run, or the -e flag is not used, then the conversion will still proceed but
without the categorical variables being recoded.
2.3.4 Converting a subset of fields
A main dataset can become very large when it contains many fields, and so ukbconv
has flags which can be used to restrict the output to only certain columns:
Flag Meaning
-s Specify a single field (only) to include in the output
-i Specify a subset of fields to include in the output
-x Specify a subset of fields to exclude from the output
Options are included by adding them to the end of the ukbconv command. So for
example the command:
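A sketch of such a command, with the flag attached directly to the field number:
ukbconv ukb23456.enc_ukb r -s20002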
would convert the dataset into an R format, keeping only the eid column and all columns
relating to field 20002. Note that since field 20002 (Non-cancer illness code, self-
reported) has numerous different instance and array indices, this will produce multiple
additional columns (see Section 2.5 for an explanation of instance and array indices).
When using the options -i or -x to select fields to include or exclude respectively from
the converted dataset, the option should be immediately followed by the name of a text
file which contains the list of fields with one field number per row.
For example, assume we just want to extract fields 31, 20204 and 40000 from our
dataset and convert it to csv format. We create a text file called field_list.txt with
the contents:
31
20204
40000
and place it into the same folder as ukbconv. We then run the command:
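A sketch of the command, with the flag attached directly to the filename:
ukbconv ukb23456.enc_ukb csv -ifield_list.txt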
The resulting output will only contain the eid column and columns for all instance and
array combinations for Data-Fields 31, 20204 and 40000 (assuming those Data-Fields
are present in the .enc_ukb file).
To assist with preparing the file giving the list of required fields, ukbconv outputs the file named fields.ukb each time it is run, which lists all the available fields associated with the dataset. This can be edited to identify the particular fields which are to be included in or excluded from the subset.
Note that running the converter twice, using the same subset file but with -i on one run
and -x on the other, will split the dataset into two complementary parts (except with the
eid column in both).
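For example, using the field_list.txt file from above, a pair of calls along the following lines would produce the two complementary parts (these are illustrative; if both runs write to the same default output filename, rename the first output before running the second):
ukbconv ukb23456.enc_ukb csv -ifield_list.txt
ukbconv ukb23456.enc_ukb csv -xfield_list.txt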
The ukbconv option docs creates an HTML document that lists information about the
structure of the dataset. The first nine rows of such a file are shown below for illustration:
Here the UDI (Unique Data Identifier) is how each item is referenced in the UKB
repository. The format for standard data fields is:
<field_id>-<instance_index>.<array_index>
so that 47-2.0 references array index 0 for instance index 2 of Data-Field 47. Genomic
SNPs are identified by a number prefixed by "affy". See Section 2.5 on the structure of a
main dataset for an explanation of what is meant by “instance index” and “array index” for
a main dataset.
The Count column gives the number of non-empty rows for each variable (i.e. field,
instance, array combination) present in this dataset.
2.4 A decryption and conversion example
A researcher has been notified by email that data for their application 56789 is available
for download. The email provides the run ID 23456 for the dataset. The 32-character MD5
Checksum is:
abcdef0123456789abcdef0123456789
We assume that the three helper programs have already been downloaded in accordance
with Section 2.2.1 and that the dataset is being downloaded into the same folder.
1. The researcher logs on to the Access Management System (AMS), clicks Projects
and then clicks on the blue button View/Update for project 56789. They select the
Data tab at the top right of the page, and select the option to go to the Showcase
download page. From this page they select the Dataset tab. An entry with (run) ID
23456 should be listed.
2. They click on the (run) ID 23456 for the entry and on the following screen enter the
MD5 checksum given above (from the main body of the notification email):
abcdef0123456789abcdef0123456789
into the box and click Generate. This will open the download page; they click Fetch to
initiate the download of file ukb23456.enc and save it in the same folder as the
helper programs and encoding file.
3. To verify that the file has arrived intact they open a command prompt, navigate to the
appropriate folder and enter:
ukbmd5 ukb23456.enc
This displays an MD5 value which matches the MD5 Checksum from the notification
email (the one used to download the dataset). In this example it will be:
abcdef0123456789abcdef0123456789
If the MD5 checksum had not matched, the researcher would need to repeat the
download operation. If there was still no match they would need to contact the Access
Management Team (AMT) for further assistance.
4. They next decrypt ("unpack") the data by entering into the command prompt:
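With the example IDs above, and the keyfile saved alongside the dataset, this would be a command along the lines of:
ukbunpack ukb23456.enc k56789r23456.key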
5. To create a comma separated variable (csv) version of the data, they enter into the
command prompt:
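Something like the following, using the .enc_ukb file produced by the previous step:
ukbconv ukb23456.enc_ukb csv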
2.5 The structure of a main dataset
Having followed the above steps a researcher will now have a UK Biobank main dataset. We here give some indication of what this
would look like, focusing in particular on the meanings of the column headers.
A main dataset will be rectangular with one participant per row, and columns headers giving the Showcase Data-Field number that
the data in that column relates to together with the “instance index” and “array index” of that item. Broadly speaking, the instance
index is used to distinguish data for a Data-Field which were gathered at different times, and the array index is used to distinguish
multiple pieces of data for that field which were gathered at the same time.
These will display differently depending on the format that the dataset has been converted to (see Table 2.13 at the end of this
section). The example given in Table 2.12 below shows a small portion of a sample dataset as it would appear in .csv format
opened in Excel:
eid 53-0.0 53-1.0 53-2.0 20002-0.0 20002-0.1 20002-1.0 20002-1.1 20002-2.0 20002-2.1 …
1256847 11/04/2007 03/01/2017 1077 1077 1075
8645816 29/10/2009
4652658 15/08/2009
2328974 12/07/2008 09/03/2013 1002
3315794 22/02/2010 01/12/2012 19/11/2018 1111 1111 1111 1065
9497726 25/02/2006
4582852 06/06/2008 1222 1265
…
The eid is the encoded participant identifier for the project in question. The remaining column headers are in the format F-I.A
where F is the Data-Field number, I is the instance index and A is the array index.
Two Data-Fields are shown in the sample dataset: Data-Field 53 (Date of attending
assessment centre) and Data-Field 20002 (Non-cancer illness code, self-reported). In
each case there are three “instances” of the variable (the first number after the -). Using
the “Instances” tab on the Data-Field 53 page on Showcase, or clicking on the “2” of
“Instancing 2” on the Data-Field 20002 page, we can see that these correspond to the
visit type: 0 for the initial (baseline) visit, 1 for the repeat assessment and 2 for the first
imaging assessment. Instance 3, corresponding to repeat imaging, is omitted here for
space reasons.
The columns 53-0.0, 53-1.0 and 53-2.0 therefore hold the dates each participant
attended that particular type of assessment centre. In the above example data, all
participants attended a baseline assessment centre (this would always be the case), but
only two (2328974 & 3315794) attended the repeat assessment, one of whom (3315794)
also attended an imaging centre. The first participant attended an imaging centre, but did
not attend the repeat assessment.
At each assessment centre visit a participant can self-report illnesses, and these are
recorded in Data-Field 20002. The illnesses are coded using Coding 6, as indicated on
the Data-Field 20002 page on Showcase. Clicking on the “6” of “Coding 6” on that page
gives the meanings of these codes.
For example: looking at the participant with eid 3315794 we see that at each of their
three assessment centre visits they self-reported having asthma (code 1111). As the
“first” condition reported this is assigned to have array index 0 (the final number in 20002-
0.0 etc). At their imaging assessment visit (instance 2) they also report hypertension
(code 1065), and this being the second reported condition at that visit, it is assigned to
array index 1, i.e. in the column with header 20002-2.1.
Note that in reality Data-Field 20002 has array indices running from 0 to 33 (indicating at
least one participant self-reported 34 illness codes), and so the real dataset would be
considerably wider than that shown above, even with only these two Data-Fields in it.
Note also that due to the nature of Data-Field 20002 being a self-report field (i.e. reported
at an assessment centre), it is only possible to have data for a particular instance index
for Data-Field 20002 if that same instance index in Data-Field 53 has a value. For
example, since the participant with eid 4582852 only attended baseline assessment they
can only have values for field 20002 with instance index equal to 0.
The instance index is not exclusively used to refer to the assessment centre visit. For
example, the “Diet by 24-hour recall” fields (see Category 100090) use instance 0 to refer
to the baseline assessment centre (as above), but then instances 1 to 4 refer to the four
on-line cycles of this questionnaire. As another example, reports from the cancer register
(see Category 100092) are given a new instance index for each additional type of cancer
reported.
As indicated above, the column headers appear slightly differently depending on which
package you are using. The various output formats display the headers as follows:
where, as previously, F represents the field number, I the instance index and A the array
index.
The main dataset can be downloaded multiple times without limit, but will "expire" after
three months and become inaccessible. This is in order to prevent the data of
participants who have subsequently withdrawn from the study being released again. (A
keyfile remains valid for bulk downloads for longer; see section Section 3.1.2 for more
information.)
An expired dataset will appear on the main Downloads page with an asterisk next to it, and it will not be possible to click on it to go to the page where it can be "fetched" (see
Section 2.2.2). Note however, that even where a dataset is expired, it will generally still
be possible to create a "refresh" for the basket that the dataset was based on (see the
next section).
A refresh of a basket is a new extraction of the fields in the basket, and will include any
additional data added to the UKB resource since the last extraction of that basket. A
refresh will not contain data for participants who have withdrawn since the basket was
last released.
(iv) Next click "Application" (at the top of the page) where you will be taken to a list of
your baskets.
(v) Now select the basket ID of the basket that you wish to refresh, and then click on
the Refreshes tab as shown below:
(vi) Finally, select “Request Refresh”.
Note that the refresh request may take a few days (or longer at busy times) to be
approved and released, at which point you will receive a new notification email for the
resulting data extraction. This email will contain a new MD5 checksum and keyfile to
download and decrypt the dataset.
It is only possible to refresh a basket that contains new data subsequent to a Showcase
update. If you try to refresh a basket that has been released more recently than the latest
Showcase update to the fields in the basket, you will see a page like this:
If you are unable to request a refresh for a basket using the above method, but
nevertheless wish to receive a refreshed dataset, please contact the Access
Management Team (AMT).
UKB participants are free to withdraw from the study at any time. There are different
levels of withdrawal, which are described here. Where a participant has withdrawn with
"no further use", they have indicated that they no longer wish their data to be used.
Records corresponding to such participants should be removed from further analyses.
Lists of such withdrawn participants are sent out to all collaborators on a project
periodically (usually around the time of a Showcase Update). In addition, it is possible to
access a list of withdrawn participant eids on Showcase by following steps (i) to (v) of
Section 2.6.2 above, selecting the "Withdrawals" tab, and then clicking "Download
withdrawals" as shown below.
3 Download utilities for bulk & genomics data and returns
The UKB resource contains various types of non-tabular data, such as images, genomics
data & returned datasets, which are not suitable for inclusion in a main dataset. The
download utilities: ukbfetch, gfetch & ukblink are used to download some such
items. These utilities are all available for download on the Showcase downloads page.
To use these utilities you need to be able to connect to the UKB remote repositories (see
Section 3.1.1), and to have the keyfile from a notification email for your project in the
same directory as the download utility (see Section 3.1.2).
Note that the utilities will usually not run from shared network drives on Windows systems
due to necessary permissions not being set, and so we would advise that they are run
from a local drive.
The bulk repository consists of a pair of mirrored systems each connected to the UK
JANET network by independent links. The system names are:
biota.ndph.ox.ac.uk
chest.ndph.ox.ac.uk
To use the download utilities your computer must be able to make http (Port 80)
connections to at least one of these systems. Note that navigating to the above websites
is not part of the download process; you simply need to ensure that your computer is able
to connect to them.
It is not possible to use a proxy server when using the download utilities.
In order to use the download utilities it is necessary for you to authenticate yourself to the
system. To do this you will need an authentication “keyfile” which was attached to one of
your notification emails. This will have a filename of the form k56789r23456.key
where 56789 is your project id and 23456 the run id of the data extract.
The keyfile is a simple text file containing your project id on the first line and the 64-
character decryption password for that dataset on the second line.
Note that any keyfile from your project can be used for authentication for up to a year
after the notification email to which it was attached. If a keyfile from more than a year ago
is used, you will receive an error that the "authentication key has expired". See Section
3.1.3 below for how to proceed if you do not have a keyfile that is less than a year old.
The keyfile should be saved in the folder where you will be running the download utility.
The utility expects by default that the authentication keyfile has been renamed as
.ukbkey (i.e. this is its full filename with no other file extension). However, it is still
possible to run the utility with the keyfile named differently by using the -a option (see
below).
If you do not have a keyfile that is less than a year old, you will not be able to use any of
the download utilities. In such a case, it is possible to request a new keyfile without the
need for a new release of a main dataset.
To do this: log on to AMS and navigate to your project, then the Data tab, then to
Downloads page on Showcase by clicking on the “Go to Showcase to refresh or
download data” button. Now click on "Application" at the top, then the "Keys" tab, and
finally click on "Request Key". A new email will then be sent to all collaborators on the
project containing a new keyfile. As such a keyfile is not associated with a run (i.e. main
dataset), its filename will have the form k56789x34567.key where 56789 is your
project id and 34567 is an identifier for the new keyvalue.
3.2 Using ukbfetch
The following sections give general instructions for accessing “bulk” data using the
ukbfetch utility. Both a Windows and a Linux version of this utility exist. These can be
downloaded from the Showcase downloads page. Further details are given in UKB
Resource 644.
Note that ukbfetch creates a temporary file during the download, and then checks the
MD5 checksum of the resulting file against its expected value. If the checksums do not
agree then the download will fail. There is hence no separate validation step needed.
3.2.1 Data downloaded using ukbfetch
The ukbfetch download utility is used to download bulk data, such as imaging data (e.g.
brain MRIs), accelerometer and ECG data, i.e. fields for which each item is a
complex/compound dataset in itself.
The ukbconv utility will also usually be needed to generate a “bulk file” allowing the
download of multiple bulk items at once (see Section 3.2.3).
If you have a bulk data field in your application basket, there will be a column for it in your
main dataset; however only the field ID will be present rather than the actual contents of
the bulk data. The purpose is to indicate which participants have that bulk field available.
The sizes of some of the bulk field files are given for reference in Appendix B.
3.2.2 Downloading a single bulk item
We assume for illustration that a participant with eid 2143432 has data for the bulk Data-
Field 20252 (T1 structural brain images - NIFTI), which was collected at a first imaging
centre visit (i.e. instance 2; see Section 2.5 for more details of instance index).
In a main dataset this will be indicated by the cell corresponding to the row with eid
2143432 and the column 20252-2.0 having the value 20252 in it. (Note that the array
index is 0 because only a single item of data was collected for this Data-Field.)
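To retrieve this single item, ukbfetch is given the participant eid with the -e flag and the field/instance/array identifier; the -d flag shown here for the identifier is an assumption, so check UKB Resource 644 for the definitive option names. A sketch of the call:
ukbfetch -e2143432 -d20252_2_0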
Note that there must be no spaces between the flags (-e, -a etc) and the following
arguments.
The files extracted from the repository are renamed to have the format:
<eid>_<field_id>_<instance_index>_<array_index>.<extension>
where <extension> is the actual extension of the file extracted. So for example, the file
downloaded by the ukbfetch call above would be named:
2143432_20252_2_0.zip
3.2.3 Creating and using a bulk file
To download many bulk data files at once, ukbconv can be used to generate a “bulk file”
which lists participant eids and Data-Field numbers (including instance & array indices;
see Section 2.5) for which that bulk field exists.
For example, let us assume we want to download all the T1 structural brain images, i.e.
Data-Field 20252, for all participants at once.
Firstly, to generate the bulk file we go to the folder where our decrypted main dataset is
stored (i.e. the .enc_ukb file that resulted from running ukbunpack), and run the
command:
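A sketch of the call, using the -s flag from Section 2.3.4 to restrict the bulk file to Data-Field 20252:
ukbconv ukb23456.enc_ukb bulk -s20252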
where ukb23456.enc_ukb is our unpacked (but not converted) main dataset, and
23456 would be replaced by the run id corresponding to your dataset.
The above command would output a text file called ukb23456.bulk, the first few lines
of which would look something like:
3422567 20252_2_0
5321753 20252_2_0
2457842 20252_3_0
i.e. a simple list with each row the eid of a participant and the Field_Instance_Array of the
relevant data.
Note that it is not possible to specify particular instance and array indices in the ukbconv
call as the -s flag does not have this functionality. If this is a problem, the bulk file can be
edited using an appropriate software package to keep only the particular instances/arrays
required.
Note that the -i flag for ukbconv can replace the -s flag to select a group of fields rather than a single one as in the example above. See Section 2.3.4 for how this is done.
Next, using our bulk file, we can now run the command:
ukbfetch -bukb23456.bulk
to download every file for Field 20252. Once again there should be no space between the
-b flag and the filename.
As previously, the above assumes the keyfile has been renamed as .ukbkey. If instead
we had retained the original filename of the keyfile of k56789r23456.key then the
command would be instead:
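Following the pattern used elsewhere in this guide, again with no space between -a and the keyfile name:
ukbfetch -bukb23456.bulk -ak56789r23456.key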
The number of files that should be downloaded at once can be specified using the -s
and -m flags. There is a limit of 1,000 files per ukbfetch call and so this will sometimes
be an essential element of the process.
The flag -s gives the starting row of the bulk file to work from, and the -m flag sets how
many rows from the bulk file are processed.
For example, to download 1000 files at a time for the above field we would run the
following commands one by one:
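A sketch of the sequence, stepping through the bulk file 1,000 rows at a time (the starting rows shown assume that the numbering used by -s begins at 1):
ukbfetch -bukb23456.bulk -s1 -m1000
ukbfetch -bukb23456.bulk -s1001 -m1000
ukbfetch -bukb23456.bulk -s2001 -m1000
ukbfetch -bukb23456.bulk -s3001 -m1000
ukbfetch -bukb23456.bulk -s4001 -m1000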
Assuming that there were fewer than 5000 participants with this field (there are in fact
more than this) this would download all files for Data-Field 20252.
These commands can be added to a batch file or shell script and then run in one go. In
this case there is an -o flag which can be used to specify a different name for the logfile for
each call of the ukbfetch utility.
Note that in order to speed up the download process it is possible to have several files
downloading simultaneously, with up to a maximum of 20 connections to the repository at
one time being permitted.
3.3 Using gfetch
There are a variety of different types of genomics data available through UK Biobank,
and different methods are used for accessing the different types. See the table in Section
1.2, as well as the Cost Tier information for the relevant fields, for further details. Note
that some genomics data, for example exome and whole genome sequences, are only
available in-situ on the RAP and cannot be downloaded at all.
For those genomics files which are downloadable using the Showcase utilities, the
method for individual-level files and population-level files is different. Individual-level files,
i.e. where there is a single file per participant, are downloaded using ukbfetch, while
population-level genetics files, containing the whole cohort or a substantial portion of it,
are downloaded using gfetch. Only a Linux version of gfetch is available (see
Appendix A.5 for a way to proceed if you do not have Linux available).
Note that genetics Data-Fields downloaded using gfetch will have a corresponding
column in your main dataset, but in a similar way as for bulk fields it will only provide a
marker to indicate whether a participant has that Data-Field available.
During the download process, gfetch creates a temporary file, and then checks the
MD5 checksum of the resulting file against its expected value. If the checksums do not
agree then the download will fail. There is hence no separate validation step needed.
Further information about the various options available with gfetch are given in UKB
Resource 668.
3.3.1 Data downloaded using gfetch
Some genotype data fields appear in the main dataset (e.g. sample QC fields), some can
be downloaded using the gfetch utility (for population-level files containing multiple
participants), and some need to be downloaded using the ukbfetch utility (for
individual-level files containing data on a single participant).
Most of the relevant information is shown on the page Category 263 (Genotypes) and in
UKB Resource 668.
Information about the method of download is also usually given on the Notes tab of the
relevant Data-Field. For example, the CEL files, Field 22002, need to be downloaded
using ukbfetch using the same methods as described in Section 3.2.
Further information about the Genotyping, Exome & Whole Genome data, including the
size of the files, is given in the various FAQs links on the page: UKB Genetic data.
The data in the Genotype BED and BGEN files appear in a common order for all
researchers. In order to match your participant eids to the data (which is done by
position) it is necessary to use gfetch to download appropriate FAM and sample files.
The same is true for the other types of genomics data and their corresponding link files.
3.3.2 A gfetch example
A researcher has gained access to the genotype calls by including Data-Field 22418
(Genotype calls) in their project basket, which has subsequently been approved. They
wish to download the chromosome .bed file and its associated .fam file (i.e. the link file
giving the order that their project eids appear in the .bed file).
They have downloaded gfetch from Download 600 by running the wget command
given on a Linux terminal. To make gfetch an executable file they have then run:
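Typically something like the following (chmod +x gfetch is equivalent):
chmod 755 gfetch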
They have also saved their authentication keyfile k56789r23456.key from their
notification email (where 56789 is their application ID) into the same folder as gfetch,
and renamed it as .ukbkey (this being the full filename).
To download the Genotype call .bed file for Chromosome 5, they enter the command:
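A sketch of the pair of calls is given below; the option names (-c for the chromosome and -m for the accompanying .fam link file) are assumptions and should be checked against UKB Resource 668:
gfetch 22418 -c5
gfetch 22418 -c5 -m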
The two files then match by row position, i.e. the data on row 1 in the .bed file
corresponds to the participants whose eid is given on the first row of the .fam file.
Note that sometimes the command ./gfetch needs to be used in place of gfetch
because of the way a Linux system is set up (see Appendix A.1). If the researcher had
not renamed their keyfile, and left it with the filename k56789r23456.key, then they
would have had to add -ak56789r23456.key to the end of each of the above commands.
“Returns” are datasets returned by researchers who have used UKB data in their
research. Some returned datasets are incorporated into the main resource as new Data-
Fields, but those that have not been need to be downloaded from the Returns Catalogue
using the ukblink utility.
The ukblink utility can be downloaded from the File Handlers tab on the Download
section of Showcase. Both a Windows and Linux version of ukblink are available.
Further details for accessing Returns using the ukblink utility are given in UKB
Resource 655.
We assume that we have downloaded the ukblink utility (and if using Linux, made it
executable; see Section A.1), and moved our keyfile into the same folder.
If our project has been granted access to (for example) Return 210, then we can
download it by using the command:
ukblink -r210
assuming our keyfile (placed into the same folder) has been renamed as .ukbkey.
Otherwise we use:
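Following the same pattern as for ukbfetch, this would be something like:
ukblink -r210 -ak56789r23456.key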
assuming the keyfile still has its original filename. (In Linux we may need to replace
ukblink by ./ukblink, as per Appendix A.1.)
Newer Returns are all .zip files and will extract as such. Older returns will extract as
generic .dat files which are in reality either .zip or .7z files. A list of which of the older
Returns has which type of zipped file format is given in Appendix D.
Note that some returns are so large that the Windows version of ukblink will hit a
memory allocation ("malloc") error when you attempt to download them. In such a case,
the Linux version should be used instead.
3.4.3 Using ukblink to create a bridge between participant identifiers
Some Returns provide data on an individual level for each participant, in which case the participants in the Return will be identified by the encoded ids (eids) associated with the original project that generated the return. The ukblink utility therefore also allows the creation of a bridge file to connect your project eids with those used in the Return.
Return 210 is an example of a Return that includes individual-level data (this can be seen
from its Showcase page in the “Personal” row). In order to download the bridge file we
need to know the project that this Return was generated as part of. This can be
determined from the first line of its Showcase page where we can see that it was
generated by Application 38006.
To create the bridge file we then run the command:
ukblink -b38006
(adding the equivalent of -ak56789r23456.key if the keyfile still has its original
filename).
This will create a text file with two columns. The first column will be the eids for the
project creating the bridge (56789 in the above example), and the second column will
consist of the corresponding eids for the application that generated the return (38006 in
the above example).
4 The Data Portal
Record-level data is tabular data that is too complex in structure to fit into a main UKB
dataset. For example, a participant may have a very large number of hospital inpatient
admissions, each with a large amount of accompanying information, and so it is not
possible to fit this into the one-row-per-participant structure of a main dataset.
Record-level data is accessed via the Data Portal, which is accessed through AMS as described in Section 4.2. The record-level data currently available is described in the following Showcase categories and fields:
- Category 2000 for hospital inpatient data. Note that summary information about hospital diagnoses and procedures is available in a main dataset, but the detail of each admission in the record-level tables is not.
- Category 100093 for death data. Note that death data is (uniquely) also available in a main dataset.
- Category 3000 for primary care (GP) data (covering approximately 45% of the cohort).
- Field 40100 for COVID-19 PCR test results data.
- Field 32040 for COVID-19 vaccination data. This field is restricted and is only available for research related to COVID-19; principal investigators must contact the Access Management Team (AMT) to request authorisation to access the data.
- Category 1839 for the Olink proteomics data.
- Field 20142 for the OMOP Common Data Model (CDM) dataset tables.
A list of all Data Portal tables can be found in the Record Tables Catalogue on
Showcase. Clicking on a Table ID will also allow a list of columns for that table to be
displayed, with accompanying information about the data type and notes on the field.
This information on the tables and columns of the record tables is also available in a
downloadable form in Schema 17 and Schema 18 on Showcase (accessed by clicking on
"Catalogues" at the top of Showcase, then "Schema").
4.2 Accessing the Data Portal
For projects on the old-style Material Transfer Agreement (MTA), access to each table on
the Data Portal is granted to a research project on a table-by-table basis, by including a
specific Data-Field in a project basket. For example, including Data-Field 41259 in a
basket will give access to the main HESIN (hospital inpatient) table. Similar fields can be
found in each of the categories above.
For projects on the new-style MTA, access to all (unrestricted) health-linkage tables
(inpatient, death, GP & COVID-19 PCR) is granted as part of Tier 1 and hence is
available to all projects, with access to the Olink and OMOP tables given to Tier 2 and
Tier 3 projects. The COVID-19 vaccination table is part of Tier 1 but is restricted to
COVID-19 research only.
The main dataset will include a column for each field which grants access to each table,
but the values shown in that column will be a count of the number of rows that each
participant has in the corresponding table.
Once access to a table on the Data Portal has been approved, the Data Portal can be reached from the Downloads page of Showcase, accessed via AMS in the same way as described in Section 2.2.2.
4.3 Downloading tables from the Data Portal
Once a researcher has accessed the Data Portal they can download each complete table
to which they have access as shown below, or query the data prior to downloading it (see
Section 4.4).
To download a complete table click on the ‘Table Download’ tab in the bottom panel,
enter the name of the table you wish to download (e.g. hesin_diag) and click on the
‘Fetch Table’ button as shown in Figure 4.1.
Figure 4.1: Table download tab
This will generate a custom download link that you can paste into a web browser, and a
wget command for those using a Linux system. The resulting dataset will be provided as
a tab separated text file (.txt). Please note it can take some time to download the
complete tables.
4.4 Using SQL to query the tables
Each major database vendor uses a slightly different dialect of SQL; however, most common statements are identical across them. The UK Biobank system uses the Ingres platform to host its relational databases. A reference manual is available online and can be located by an internet search for “Ingres 10.2 SQL Reference Guide”.
For each of the main categories of record-level data (hospital inpatient, death, and
primary care), a resource is available which gives some examples of SQL statements
that can be used to investigate the data in that category. This can be found within the
Resource tab of the category pages (given in Section 4.1 above).
Please note that the use of SQL on the Data Portal is intended for simple exploratory
queries. Running complex queries that return large amounts of data, and then attempting
to download the results is likely to cause the system to time out (giving a “Gateway
Timeout” error). We would advise instead downloading the needed tables using the
"Fetch table" method in Section 4.3, and then performing joins and other manipulations
on the downloaded tables.
Appendix A: Troubleshooting guide
This appendix lists a number of problems that have previously been encountered by
researchers when using the various download, decryption & conversion utilities.
A.1 General
If you are trying to run ukbunpack, ukbconv etc in a Windows environment and
receive an “Access is denied” error, then it is likely you do not have permissions set to
run executable files which are unknown to the system. You may need to log on as an
Administrator or contact your local IT support for assistance.
If you are working in a Linux environment then a download utility such as ukbfetch
will not by default be recognised as being executable. To fix this use the command:
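A typical fix, using ukbfetch as the example (the same applies to gfetch and ukblink; chmod +x is equivalent):
chmod 755 ukbfetch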
You may also find that your system cannot locate ukbfetch because it does not
search the current working directory when looking for executable files. The easiest
way around this is to prefix the command with ./, for example running ./ukbfetch rather than ukbfetch.
If when using any of the helper programs (such as ukbmd5) or download utilities
(such as ukbfetch) you receive an error such as:
then it is likely you are attempting to run the Linux utilities on a Mac. Unfortunately,
none of the utilities work on Macs.
The keyfile (received as the attachment to your notification email) needs to be in the same directory as the utility in order to provide authentication to the remote repository. The utilities by default expect it to have been renamed as .ukbkey (note that this is its full name, with no other file extension). This can cause problems in Windows, and hides the file in Linux (the command ls -a will show such “hidden files”). If you prefer to give a different name to the keyfile, then ukbfetch, ukblink & gfetch can still be run, but you will need to add an option of the form:
-ak56789r23456.key
to the command. If you receive an error when doing this, it is usually because you have put a space between the -a and the keyfile name.
When using a Linux system, if you receive an error along the lines of:
it means that your local Linux libraries are not compatible with our standard versions
of the utility ukbfetch (in this example), ukblink or gfetch. In each case it is
possible to create a version of the utility that will run on your system. See Resource
645 (for ukbfetch), Resource 656 (for ukblink) or Resource 669 (for gfetch), for
further details.
If you receive a permissions-related error on a Windows system when using one of the download utilities, it may mean you are trying to run the utility from a shared network drive, but lack all the necessary permissions to do so. You should run the utility from a local drive instead.
If when running one of the download utilities you receive an error indicating that the authentication key has expired, you will need to request a new keyfile, as the one you have used is more than a year old. This is done by navigating to the Downloads page on Showcase via AMS, selecting "Application" at the top, then the "Keys" tab, and finally clicking on "Request Key".
A.2 ukbunpack
When attempting to use ukbunpack to decrypt the dataset, if you receive the error:
the most likely explanation is that you are using the wrong 64-character password.
Please note that the main dataset can only be unpacked using the password from the
keyfile k56789r23456.key contained in the notification email for that particular
dataset, i.e. for the dataset released with run id 23456 in this example. You cannot
reuse the keyfile password from a different data release on the same project.
A.3 ukbconv
While using the ukbconv utility, some researchers, depending on the variables in
their dataset, may see the following error message appear in the command-line
terminal:
This message does not affect the conversion process, and has no consequence on
the data being extracted. Researchers can directly open the files generated by
ukbconv without worrying about these errors.
If using the txt option with ukbconv to create a tab-separated file, each data row will incorrectly start with an extra tab (creating a blank first column), meaning that the header row will be misaligned with the data columns. Assuming you are working in a Linux
environment, one possible way of fixing this once the conversion is complete, is to
use the following sequence of commands on the resulting file (ukb23456.txt in the
example given).
csplit ukb23456.txt 2
cut -f1 --complement xx01 > xx01_1
cat xx00 xx01_1 > ukb23456_repaired.txt
These three commands respectively split off the header row from the remainder of the
dataset, remove the first (blank) column from the dataset, and then reattach the
header to the dataset with the additional column removed. The process creates three
intermediate files: xx00, xx01 & xx01_1, which can then be deleted.
When using the csv option with ukbconv it is important to remember that the
resulting file will be "quoted", i.e. the value for each field will be enclosed in double
quotes. This is necessary to deal with the fact that some text fields might themselves
contain commas. Statistics packages are able to deal with this without a problem,
however other approaches, such as trying to use the Linux command cut to extract
particular columns, will not maintain column alignment and so cannot be used.
When using the sas option for ukbconv you may receive warning messages in the
log file of the following types:
The first of these is not a cause for concern, but is simply an artefact of how the
columns are generated (e.g. if any of instances 0, 2 or 3 of field 84 have an entry for
array-index 4, then a column will be generated for array-index 4 for instance 1, even if
it does not contain any data).
Warnings (2) and (3) are more problematic, and indicate that SAS is unable to import
the data correctly. As a workaround, it is possible to run ukbconv so that it imports
only selected fields: use the error logs to determine the fields whose data cannot be
converted or whose strings are too long, and import those fields separately into a csv
file. Some minor processing will then allow them to be merged back into the main
dataset in SAS as required.
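A rough sketch of this workflow is shown below. The field IDs and file names are purely
illustrative, and the -i (read the field list from a file) and -o (output name) options are
assumptions about the ukbconv syntax, so check the ukbconv options described
earlier in this guide before relying on them:
# List the field IDs flagged in the SAS log (illustrative values only)
printf '123\n456\n' > problem_fields.txt
# Re-run the conversion for just those fields, writing them to csv
# (assumes -i selects fields from a file and -o sets the output name)
ukbconv ukb23456.enc_ukb csv -iproblem_fields.txt -oproblem_fields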
A.4 ukbfetch
If you are running ukbfetch with a bulk file and are receiving an error indicating that
it cannot find data for a particular eid/field combination, then this might be because
you created a bulk file (using ukbconv) containing fields which are not accessed
using ukbfetch. For instance, if ukbconv is run with the bulk option without
specifying a particular field (or set of fields), it will include, amongst the bulk fields,
genomics fields that need to be downloaded using gfetch instead. Hence, most of the
entries appearing in the bulk file will fail to download, because it is not possible to
access their data in this way.
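One practical remedy, sketched below, is to filter the bulk file down to a field that
ukbfetch can serve before running it. The field ID is illustrative, the bulk file is
assumed to have one line per item (participant eid followed by the field identifier, as
described in Section 3.2.3), and the -b option for supplying a bulk file is assumed to
be as described in Section 3.2:
# Keep only the entries for field 20253 (an imaging field served by ukbfetch)
awk '$2 ~ /^20253_/' ukb23456.bulk > ukb23456_20253.bulk
# Download using the filtered bulk file (no space after -b or -a)
ukbfetch -bukb23456_20253.bulk -ak56789r23456.key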
A.5 gfetch
When running gfetch you may find that the data appears to be “fetched” properly,
but then cannot be “written”, causing the process to abort. This is most likely due to
the large size of some of the genetics data (particularly the imputed data)
overwhelming the local storage available during the download. We recommend
contacting your local IT support to deal with this issue.
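As a quick first check, you can confirm how much free space is available on the
filesystem holding your download directory, for example:
# Show free space on the filesystem containing the current directory
df -h .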
If you only have a Windows computer available, it is possible to set up a Linux shell
within it (for example, using the Windows Subsystem for Linux) from which you can
run gfetch. Searching online for "running linux on windows" or similar will provide
instructions describing how to do this.
An error of this kind when using gfetch indicates that a network issue has
interrupted the download process. Such errors are usually transient, so we would
recommend retrying the download when they occur. If the issue persists, we would
advise using a different computer and/or network, as well as ensuring you are using
the most up-to-date versions of the utilities.
A.6 ukblink
Some Returns are so large that the Windows version of ukblink produces the error:
indicating that ukblink has run out of working memory. If this occurs, the Linux
version of ukblink should be used instead.
If you have run an SQL query, then attempted to download the results, but received a
"Gateway Timeout" error, it is because the system (which is intended for small
exploratory queries) is unable to deal with the size of the download. We advise
instead downloading the needed tables using the "Fetch table" method in Section 4.3,
and then performing joins and other manipulations on the downloaded tables.
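For example, one possible local workflow (a sketch only: the table names, column
names and csv file format are illustrative assumptions about what you have fetched)
is to load the downloaded tables into a local SQLite database and perform the join
there:
# Import two fetched tables (assumed to be csv files with header rows)
# into a local SQLite database, then run the join locally
sqlite3 local.db <<'EOF'
.mode csv
.import hesin.csv hesin
.import hesin_diag.csv hesin_diag
SELECT COUNT(*) FROM hesin h
  JOIN hesin_diag d ON h.eid = d.eid AND h.ins_index = d.ins_index;
EOF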
Appendix B: Sizes of bulk files
The following table gives the approximate size, per participant, of some of the bulk fields
available:
20253 T2 FLAIR structural brain images - NIFTI 34
20254 Liver imaging – IDEAL protocol - DICOM 10
20259 Pancreas images – ShMoLLI - DICOM 15
20260 Pancreas images – gradient echo - DICOM 15
20263 T1 surface model files 296
20264 Kidney imaging – gradient echo - DICOM 36
20265 Kidney imaging – T2 haste - DICOM 36
20266 Arterial spin labelling brain images - DICOM 22
20267 Kidney imaging – T2 vibe - DICOM 36
21011 FDA data file (left) 273
21012 FDS data file (left) 273
21013 FDA data file (right) 273
21014 FDS data file (right) 273
21015 Fundus retinal eye image (left) 273
21016 Fundus retinal eye image (right) 273
21017 OCT image slices (left) 273
21018 OCT image slices (right) 273
25747 Eprime advisor file <1
25748 Eprime txt file <1
25749 Eprime ed2 file <1
25750 rfMRI full correlation matrix, dimension 25 <1
25751 rfMRI full correlation matrix, dimension 100 <1
25752 rfMRI partial correlation matrix, dimension 25 <1
25753 rfMRI partial correlation matrix, dimension 100 <1
25754 rfMRI component amplitudes, dimension 25 1083
25755 rfMRI component amplitudes, dimension 100 1083
26301 Quantitative susceptibility mapping images - NIFTI 23
90001 Acceleration data – cwa format 267
90004 Acceleration intensity time-series 267
Appendix C: Size of core dataset
The core dataset consists of the categories shown in the Quick Start section of
Showcase when a basket is first created:
In addition, the R .tab file is accompanied by a 521 KB .r script, the SAS .sd2 file by a
1.4 MB .sas script, and the Stata .raw file by a 533 KB .do script and a 995 KB .dct file.
Note that the large difference in size between the tsv .txt file and the (also tab-separated)
R .tab file is due to empty fields being represented by the empty string in the former and
by NA in the latter. Similarly, all fields are quoted in the .csv file, with empty fields
appearing as "", which accounts for its additional size compared to the .txt file.
Appendix D: File types of older returned datasets
The following table gives the archive format used for older returned datasets, which
download with a .dat extension. Returns later than those listed below will either also
download with a .dat extension but actually be in .zip format, or (for those uploaded
after October 2020) will download with their correct extension (which will be .zip).
Files downloaded by ukblink with a .dat extension should be renamed to the correct
file type, and standard utilities then used to unpack the archive.
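For example (a sketch only; the downloaded file name is illustrative), a Return listed
below as 7z format could be renamed and unpacked as follows:
# Rename the downloaded .dat file to its real archive type, then extract it
# (requires the 7z utility; for Returns listed as zip, rename to .zip and use unzip)
mv 424.dat 424.7z
7z x 424.7z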
424 Characteristics of rheumatoid arthritis and its association 7z
with major comorbid conditions: cross-sectional study of 502
649 UK Biobank participants.
463 Heaviness, health and happiness: a cross-sectional study of 7z
163066 UK Biobank participants
464 Psychiatry Gender differences in the association between 7z
adiposity and probable major depression: a cross-sectional
study of 140,564 UK Biobank participants
473 The effect of functional hearing loss and age on long- and 7z
short-term visuospatial memory: evidence from the UK
Biobank resource
474 Better visuospatial working memory in adults who report 7z
profound deafness compared to those with normal or poor
hearing: Data from the UK Biobank resource
501 Cognitive function and lifetime features of depression and 7z
bipolar disorder in a large population sample: Cross-
sectional study of 143,828 UK Biobank participants
504 Low birth weight and features of neuroticism and mood 7z
disorder in 83545 participants of the UK Biobank cohort
508 Prevalence and Characteristics of Probable Major 7z
Depression and Bipolar Disorder within UK Biobank: Cross-
Sectional Study of 172,751 Participants.
509 Associations between single and multiple cardiometabolic 7z
diseases and cognitive abilities in 474 129 UK Biobank
participants.
511 Adiposity among 132,479 UK Biobank participants; 7z
contribution of sugar intake vs other macronutrients
513 Cognitive Test Scores in UK Biobank: Data Reduction in 7z
480,416 Participants and Longitudinal Stability in 20,346
Participants.
526 Change in commute mode and body mass index: 7z
prospective longitudinal evidence from UK Biobank
527 Active commuting and obesity in mid-life: cross-sectional, 7z
observational evidence from UK Biobank
529 Lifestyle factors and prostate-specific antigen (PSA) testing 7z
in UK Biobank: Implications for epidemiological research
534 Ethnic differences in sleep duration and morning-evening type 7z
in a population
535 Smoking, screen-based sedentary behaviour, and diet 7z
associated with habitual sleep duration and chronotype: data
from the UK Biobank
536 Interactive effects of sleep duration and morning/evening 7z
preference on cardiovascular risk factors
542 Multiple novel gene-by-environment interactions modify the 7z
effect of FTO variants on body mass index
547 The influence of social interaction and physical health on the 7z
association between hearing and depression with age and
gender
584 Genome-wide association analysis identifies novel blood zip
pressure loci and offers biological insights into
cardiovascular risk
702 Case-control association mapping by proxy using family zip
history of disease
717 Genome-wide association study identifies 74 loci associated zip
with educational attainment
718 Genetic variants associated with subjective well-being, zip
depressive symptoms, and neuroticism identified through
genome-wide analysis
723 Genome-wide association meta-analysis of 78,308 zip
individuals identifies new loci and genes influencing human
intelligence
726 Linkage disequilibrium-dependent architecture of human zip
complex traits shows action of negative selection
735 Red blood cell distribution width: Genetic evidence for aging zip
pathways in 116,666 volunteers
736 Mixed model association for biobank-scale data sets. zip
739 Genome-wide association analyses for lung function and zip
chronic obstructive pulmonary disease identify new loci and
potential druggable targets
744 Genome-wide association study reveals ten loci associated zip
with chronotype in the UK Biobank.
745 Genome-wide association analyses of sleep disturbance zip
traits identify new loci and highlight shared genetics with
neuropsychiatric and metabolic traits
749 Genome-wide association study of alcohol consumption and zip
genetic overlap with other health-related traits in UK Biobank
(N=122117)
752 Genome-wide associations for birth weight and correlations zip
with adult disease
760 Genome-wide association study of cognitive functions and zip
educational attainment in UK Biobank (N=112151)
762 Molecular genetic aetiology of general cognitive function is zip
enriched in evolutionarily conserved regions
776 Rare coding variants pinpoint genes that control human zip
hematological traits
777 An erythroid-specific ATP2B4 enhancer mediates red blood zip
cell hydration and malaria susceptibility
783 Cognitive performance among carriers of pathogenic copy 7z
number variants: Analysis of 152,000 UK Biobank subjects
792 The 'Cognitive footprint' of psychiatric and neurological 7z
conditions: cross-sectional study in the UK Biobank Cohort
793 Visualization of cancer and cardiovascular disease co- 7z
occurrence with network methods
796 Psychological distress, neuroticism, and cause-specific 7z
mortality: early prospective evidence from UK Biobank