HMG Advance Access published October 1, 2015
Human Molecular Genetics, 2015, 1–12
doi: 10.1093/hmg/ddv379
Advance Access Publication Date: 16 September 2015
Association Studies Article
A S S O C I AT I O N S T U D I E S A R T I C L E
Gene-based meta-analysis of genome-wide association
studies implicates new loci involved in obesity
Sara Hägg1,2, Andrea Ganna3,4, Sander W. Van Der Laan5, Tonu Esko6,7,8,9,
Tune H. Pers6,7,8,10,11, Adam E. Locke12, Sonja I. Berndt15, Anne E. Justice16,
Bratati Kahali13,14, Marten A. Siemelink5, Gerard Pasterkamp5,18, the GIANT
Consortium, David P. Strachan19, Elizabeth K. Speliotes13,14, Kari E. North16,17,
Ruth J.F. Loos20,21,22,23, Joel N. Hirschhorn6,7,8, Yudi Pawitan1 and Erik
Ingelsson2,24,25,*
1
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden, 2Molecular
Epidemiology and Science for Life Laboratory, Department of Medical Sciences, Uppsala University, SE-751 41
Uppsala, Sweden, 3Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge,
MA, USA, 4Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital
and Harvard Medical School, Boston, MA, USA, 5Laboratory of Experimental Cardiology, Division of Heart and
Lungs, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands, 6Divisions of Endocrinology and
Genetics and Center for Basic and Translational Obesity Research, Boston Children’s Hospital, Boston, MA 02115,
USA, 7Broad Institute of the Massachusetts Institute of Technology and Harvard University, Cambridge, MA
02142, USA, 8Department of Genetics, Harvard Medical School, Boston, MA 02115, USA, 9Estonian Genome Center,
University of Tartu, Tartu 51010, Estonia, 10Department of Epidemiology Research, Statens Serum Institut,
Copenhagen, Denmark, 11Novo Nordisk Foundation Center for Basic Metabolic Research, University of
Copenhagen, Copenhagen, Denmark, 12Center for Statistical Genetics, Department of Biostatistics, 13Department
of Internal Medicine, Division of Gastroenterology, 14Department of Computational Medicine and Bioinformatics,
University of Michigan, Ann Arbor, MI 48109, USA, 15Division of Cancer Epidemiology and Genetics, National
Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA, 16Department of Epidemiology,
17
Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA,
18
Laboratory of Clinical Chemistry and Hematology, Division Laboratories & Pharmacy, UMC Utrecht, Utrecht, The
Netherlands, 19St George’s, University of London, London SW17 0RE, UK, 20MRC Epidemiology Unit, Institute of
Metabolic Science, University of Cambridge, Addenbrooke’s Hospital, Hills Road, Cambridge CB2 0QQ, UK, 21The
Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
10029, USA, 22The Genetics of Obesity and Related Metabolic Traits Program, 23The Mindich Child Health and
Development Institute, 24Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK and
25
Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine,
Stanford, CA 94305, USA
*To whom correspondence should be addressed. Tel: +46 707569422; Fax: +46 18515570; Email: erik.ingelsson@medsci.uu.se
Received: April 30, 2015. Revised and Accepted: September 10, 2015
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
1
2
| Human Molecular Genetics
Abstract
To date, genome-wide association studies (GWASs) have identified >100 loci with single variants associated with body mass index
(BMI). This approach may miss loci with high allelic heterogeneity; therefore, the aim of the present study was to use gene-based
meta-analysis to identify regions with high allelic heterogeneity to discover additional obesity susceptibility loci. We included
GWAS data from 123 865 individuals of European descent from 46 cohorts in Stage 1 and Metabochip data from additional 103 046
individuals from 43 cohorts in Stage 2, all within the Genetic Investigation of ANthropometric Traits (GIANT) consortium. Each
cohort was tested for association between ∼2.4 million (Stage 1) or ∼200 000 (Stage 2) imputed or genotyped single variants and BMI,
and summary statistics were subsequently meta-analyzed in 17 941 genes. We used the ‘VErsatile Gene-based Association Study’
(VEGAS) approach to assign variants to genes and to calculate gene-based P-values based on simulations. The VEGAS method was
applied to each cohort separately before a gene-based meta-analysis was performed. In Stage 1, two known (FTO and TMEM18) and
six novel (PEX2, MTFR2, SSFA2, IARS2, CEP295 and TXNDC12) loci were associated with BMI (P < 2.8 × 10−6 for 17 941 gene tests). We
confirmed all loci, and six of them were gene-wide significant in Stage 2 alone. We provide biological support for the loci by
pathway, expression and methylation analyses. Our results indicate that gene-based meta-analysis of GWAS provides a useful
strategy to find loci of interest that were not identified in standard single-marker analyses due to high allelic heterogeneity.
Introduction
Genome-wide association studies (GWASs) and large-scale collaborations have revolutionized the field of complex disease genetics. Body mass index (BMI) is commonly used to assess obesity
and >100 loci have been discovered to date in studies of individuals of primarily European descent (1,2). Nevertheless, a large
proportion of the genetic variance remains undiscovered, implying that more loci influencing BMI are yet to be found (3). Additional explanations for the missing heritability include, among
others, incomplete tagging of causal variants and allelic heterogeneity—multiple alleles in the same gene region that affect the
same trait (4,5). One approach to detect regions that display substantial allelic heterogeneity is to use gene-based tests of association. Gene-based tests can improve statistical power in the
presence of allelic heterogeneity by combining single variants
from GWAS into a gene-based score, which substantially reduces
the burden of multiple testing and combines signals from multiple associated variants (6–8). One of several methods where significance levels from single variant associations are used to
calculate a gene-based test statistic is the VErsatile Gene-based
Association Study (VEGAS) approach (8). VEGAS performs
equivalently to other gene-based approaches (8), but it is superior
to other methods when used in meta-analyses because it uses
external reference panels to simulate a null distribution, i.e. it
can be run with summary association statistics and does not require individual-level genotypic data.
A gene-based test can also improve the ability to replicate loci
displaying high between-study heterogeneity. Differences in the
underlying linkage disequilibrium (LD) patterns could lead to different tag single nucleotide polymorphisms (SNPs) being linked
to the causal variants in different study populations, a problem
that is lessened in gene-based tests as they are combined into
the same gene signal. Thus, using gene-based methods should
increase statistical power for GWAS regions with allelic and between-study heterogeneity, and applying these techniques
could implicate novel loci without increasing sample sizes or collecting new data. Hence, the aims of the present study were (1) to
identify new loci associated with BMI and (2) to evaluate our
gene-based meta-analysis as a valid approach using a set of sensitivity analyses and comparisons. Toward these aims, we used
existing GWAS data from the GIANT (Genetic Investigation of ANthropometric Traits) consortium. To enhance power by allowing
for allelic heterogeneity between studies, we applied the VEGAS
summary statistics algorithm on each GWAS study separately before combining the results in a single gene-based meta-analysis.
Results
Stage 1 identified six novel loci associated with BMI
We first analyzed the Stage 1 data set comprising 46 studies including up to 123 865 individuals who had undergone GWAS of
BMI, as detailed elsewhere by Speliotes et al. (1). In brief, the
GWAS was done assuming an additive genetic model with inverse
normal transformation of BMI. The VEGAS method was applied to
each of the GWAS result files from the 46 studies individually, before a gene-based meta-analysis using the Fisher method was
conducted on 17 941 genes (Fig. 1; Supplementary Material,
Fig. S1). Each gene was defined with symmetric boundaries
±50 kb (the default in VEGAS); however, results from VEGAS are
in general robust across different gene boundaries chosen (9). A
SNP could contribute to multiple gene signals if the genes are overlapping. Thus, a locus could include multiple neighboring or overlapping genes that could partly rely on the same SNP signals.
Applying a Bonferroni-corrected gene-wide significance
threshold (P < 2.8 × 10−6 for 17 941 gene tests) for gene-based association with BMI, we identified six loci that have previously not
been implicated in relation to BMI (at least 1 Mb from published
genome-wide SNPs); the first locus contained four significant
genes in high LD (Supplementary Material, Fig. S2): thioredoxin
domain containing 12 (TXNDC12); KTI12 homolog, chromatin associated (KTI12); basic transcription factor 3-like 4 (BTF3L4) and
zinc finger, FYVE domain containing 9 (ZFYVE9). The other five
significant loci harbored only one gene each: peroxisomal biogenesis factor 2 (PEX2); mitochondrial fission regulator 2 (MTFR2);
sperm-specific antigen 2 (SSFA2); isoleucyl-tRNA synthetase 2
(IARS2) and centrosomal protein 295 kDa (CEP295), also known
as KIAA1731. In addition, two well-known BMI-associated loci
were confirmed in our analyses; the first locus contained the fat
mass- and obesity-associated gene (FTO) and the second locus
the gene transmembrane protein 18 (TMEM18). Details about
the associations are given in Table 1 and regional plots are
shown in Supplementary Material, Figure S3A–H.
Stage 2 validated the BMI loci found in Stage 1
In order to validate loci that reached gene-wide significance in
the Stage 1 analyses, we performed Stage 2 analyses in 43
Human Molecular Genetics
| 3
Figure 1. Workflow describing the process of gene-based analyses applied on the different data sets. The gene-based meta-analyses were carried out in a Stage 1 material
of 46 different studies from the Speliotes et al. paper. Results were then validated in a Stage 2 phase including 43 studies from the MetaboChip in the Locke et al. paper
depicted in the figure. A joint gene-based meta-analysis of Stage 1 and Stage 2 data sets were subsequently done. The two right panels refer to the gene-based analyses
carried out of the summary statistics from Speliotes et al. and Locke et al. For clarity, the corresponding tables in the paper are illustrated in the figure.
Table 1. Gene-wide significant BMI-associated loci from gene-based meta-analyses of Stage 1, 2 and 1 + 2 data sets
Locus
1
Gene
Chr
Position
Start
TXNDC12
1
52 258 391
KTI12
1
52 270 364
BTF3L4
1
52 294 560
ZFYVE9
1
52 380 633
2
PEX2
8
78 055 048
3
FTO
16
52 295 375
4
MTFR2
6
136 593 860
5
TMEM18
2
657 972
6
SSFA2
2
182 464 926
7
IARS2
1
218 334 077
8
CEP295
11
93 034 463
Additional gene-wide significant hits in Stage 2
9
MC4R
18
58 038 564
10
ADCY3
2
25 042 038
Additional gene-wide significant hits in Stage 1 + 2
11
CBX4
17
75 421 549
12
SPAG6
10
22 674 404
13
SPG11
15
42 642 185
8
TAF1D
11
93 108 745
14
KHDRBS1
1
32 252 077
15
RPL23AP74
17
56 501 992
Stage 1 P-value
Stage 2 P-value
Stage 1 + 2 P-value
52 293 635
52 272 060
52 326 650
52 584 946
78 075 079
52 705 882
136 613 142
667 439
182 503 706
218 388 006
93 103 170
1.3E−14
1.2E−12
1.8E−12
9.7E−07
3.4E−13
1.6E−10
9.7E−08
1.8E−07
6.3E−07
2.1E−06
2.3E−06
2.3E−14
9.3E−13
3.2E−12
1.5E−03
1.7E−07
3.1E−13
4.5E−07
1.9E−08
3.1E−05
1.1E−08
8.1E−05
5.5E−27
2.0E−23
9.7E−23
1.6E−08
1.1E−18
1.1E−21
4.9E−13
4.8E−14
2.0E−10
3.8E−13
1.7E − 09
58 040 001
25 142 055
2.2E−03
2.2E−01
6.4E−07
9.3E−07
5.1E−08
5.5E−05
75 427 808
22 746 545
42 743 168
93 114 310
32 282 059
56 502 554
1.1E−05
3.9E−05
2.4E−05
1.2E−04
2.0E−05
2.3E−04
3.9E−06
4.5E−05
2.2E−04
5.2E−04
3.0E−03
7.5E−04
4.6E−10
1.6E−08
4.5E−08
4.9E−07
5.5E−07
1.3E−06
Stop
additional cohorts including 103 046 individuals with data
from the Metabochip array published in Locke et al. (2) using
the same gene-based approach (Fig. 1; Supplementary Material, Fig. S1). The Metabochip array does not have a dense genome-wide coverage of SNPs; however, the gene coverage as
judged by the VEGAS results is almost as complete (17 833
genes in Stage 2 compared with 17 941 in Stage 1). Moreover,
analyzing the SNPs that contributed to the VEGAS signals
in Stage 1 showed that many SNPs were common between
Stages 1 and 2 (Supplementary Material, Fig. S4). Thus, all
eight loci replicated in Stage 2 with a significance well below
a Bonferroni-corrected threshold (α = 0.00625 for eight tests)
(Table 1).
Although replication of the significant genes was the primary
goal of Stage 2, we also report additional gene-wide significant
hits that arise when using these data to the full extent. Hence,
two loci, both previously established, including the genes melanocortin 4 receptor (MC4R) and adenylate cyclase 3 (ADCY3), respectively, were identified in Stage 2 alone (Table 1). Further, in
a joint Stage 1 + 2 analysis, in addition to all genes in the eight
loci from Stage 1, another five loci also reached the gene-wide
threshold and was represented by the genes chromobox homolog
4 (CBX4), sperm associated antigen 6 (SPAG6), spastic paraplegia
11 (autosomal recessive) (SPG11), KH domain containing, RNA
binding, signal transduction associated 1 (KHDRBS1) and ribosomal protein L23a pseudogene 74 (RPL23AP74) (Table 1).
4
| Human Molecular Genetics
Pathway analyses of BMI-associated genes
Using data from Stage 1, we performed pathway analyses including all genes with nominally significant (P < 0.05) association
with BMI (n = 773) using ConsensusPathDB (10). We observed an
enrichment for pathways disease (Q-value <0.003) and synthesis
of bile acids and bile salts via 24-hydroxycholesterol (Q-value <0.05).
Combining Stages 1 and 2, pathway analyses of the 942 topranking genes (again P < 0.05 for association with BMI) revealed
enrichment in peptide hormone biosynthesis (Q-value <0.002), osteopontin signaling (Q-value <0.004), glycoprotein hormones (Q-value
<0.004) and regulation of cytoplasmic and nuclear SMAD2/3 signaling
(Q-value <0.05).
We further performed pathway and network analysis using
Ingenuity software (11) (Qiagen, fall 2014 version) with the top
genes from the gene-based association analysis, also including
the established Stage 2 hits MC4R and ADCY3 to allow for a
more stable network. The most significant canonical pathway
was Glutathione Redox Reactions II, P = 1.96 × 10−3. The top network
not only involved FTO, MTFR2, ADCY3 and MC4R but also the
novel genes SSFA2 and ZFYVE9. The nuclear FTO was an important hub with many connections centrally regulating many of the
other genes (Fig. 2). Strikingly, one of these genes was leptin (LEP),
a hormone known to be involved in obesity signaling (12). This
finding is especially interesting in the light of recent important
functional studies disentangling the causal variants and genes
in the FTO locus, which have highlighted not only IRX3 and
IRX5 (13,14) but also RPRIP1L (15) and FTO (16).
Gene expression analyses of identified genes
To provide biological support for and to further validate the BMIassociated genes identified in Stage 1, we explored those genes in
relevant expression data sets. To accomplish this aim, we downloaded all data related to obesity from blood or adipose tissue in
human, mouse or rat found in the Gene Expression Omnibus
(GEO) data repository. All data sets were independently crossexamined by two different investigators (S.H. and E.I.) and underwent quality control steps before further inclusion. Altogether, 53
selected experiments including 335 obese cases and 308 controls
were finally used for analyses.
In each data set, although our primary goal was to test the
11 genes identified in Stage 1, all genes were screened for differential expression between obese cases and controls, and
ranked by significance. This allowed for a null distribution to
be generated for each experiment by permuting the ranking
of the gene. We then evaluated if the 11 genes were significantly deviating from the null distribution. Using a Bonferroni
correction (threshold P-value = 0.0045) for the 11 genes tested,
we found 2 genes (TXNDC12, P = 0.00021 and SSFA2, P = 0.00051)
to be significantly differentially expressed (having a lower rank
than expected based on the null) between obese cases and controls in blood and adipose tissue combined (Fig. 3). As judged
from the heat map, both TXNDC12 and BTF3L4 from the same
loci with borderline significance of P = 0.0066 had higher gene
expression ( positive fold change) in obese cases versus controls in most of the studies, whereas SSFA2, on the contrary,
showed lower expression (negative fold change) in obese in
most studies.
Expression and methylation quantitative trait loci
analyses
To further investigate functional aspects of the loci identified in
Stage 1, we performed both expression quantitative trait loci
(eQTL) and methylation QTL (mQTL) analyses (Supplementary
Material, Table S7), including any SNPs from the loci identified
in Stage 1. Such analyses can identify functional SNPs further
emphasizing a role for certain genes in obesity development.
We analyzed the largest number of samples available for either
blood or adipose tissue, and additionally using a smaller data set
from carotid plaques. At first, we used the Genotype-Tissue Expression (GTEx) portal (17) to look for significant cis-eQTL associations within 1 Mb window of the genes’ start/stop positions
in whole-blood samples (n = 168). We found four significant
cis-eQTLs (based on permutation) within the gene TXNDC12
(Supplementary Material, Fig. S5). Further, to calculate gene significance from eQTL data, a gene-specific empirical P-value was
generated, and the corresponding Q-value for that gene was
calculated (where Q-value <0.05 is indicating significance). In
this analysis, the TXNDC12 gene was found to be significant
(Q-value = 0.0065).
Then, we further analyzed cis-eQTLs from adipose tissue (±1 Mb
gene boundaries) by the Genevar (GENe Expression VARiation)
platform (18) using default settings with P-value cutoff of
<0.001. The samples (n = 856) were collected from the TwinsUK
cohort with healthy females of European descent (19). Again,
TXNDC12 was found to have two significant eQTLs (Supplementary Material, Fig. S6), and from the same locus, ZFYVE9 had four
significant eQTLs from two different Illumina expression probes
(Supplementary Material, Fig. S7). Two other genes also revealed
significant eQTLs: PEX2 with 36 significant eQTLs (Supplementary Material, Fig. S8) and FTO with three significant eQTLs
(Supplementary Material, Fig. S9).
For the mQTL analyses, we again used the Genevar platform
to characterize all genes identified in Stage 1 in the same adipose
tissue samples from female twins (n = 856). The degree of DNA
methylation at specific CpG sites was assessed by the Illumina
HumanMethylation450 bead chip. The maximum distance to
probe start was set to 0.1 Mb, and the P-value cutoff level was
<0.001. We found strong enrichment of cis-mQTLs in the locus
of the PEX2 gene; two different CpG sites were identified, which
were associated with 28 and 82 mQTLs, respectively (Supplementary Material, Fig. S10). The FTO gene also showed strong enrichments of multiple cis-mQTLs. We found six different CpG sites to
be enriched, with 26, 23, 16, 16, 30 and 83 mQTLs, respectively
(Supplementary Material, Fig. S11). Lastly, we found five CpG
sites with significant SNPs within TMEM18. Again, the enrichment was very strong in at least four of the five sites and all together, we identified 41, 17, 9, 38 and 101 mQTLs, respectively
(Supplementary Material, Fig. S12).
Finally, we decided to do mQTL follow-up analyses, including
association tests with BMI of identified cis-mQTLs (13 CpGs from
PEX2, FTO and TMEM18) in participants of the Dutch AtheroExpress cohort using carotid plaque (N = 488) and blood (N = 92)
samples. All three genes had at least one significant mQTL in plaques (Supplementary Material, Table S1 and Fig. S13) and in blood
(Supplementary Material, Table S2 and Fig. S13), when adjusting
for multiple testing using false discovery rate (FDR < 0.05). Methylation levels in plaques at the promoter-associated cg01780754 located within the PEX2 gene, a probe that was significant in all
tissue-specific mQTL analyses, was also significantly associated
with BMI (FDR-adjusted P-value = 0.009). Specifically, our data
show that the C allele of rs9643429 located upstream of PEX2 (in
LD with the best mQTL SNP) is linked to higher methylation at
cg01780754 in plaques, and when methylated, cg01780754 is associated with an increase in BMI. The combined functional effects
found for PEX2 is summarized in Supplementary Material,
Figure S14.
Human Molecular Genetics
| 5
Figure 2. The gene network created with the nuclear gene FTO at center, directly and indirectly regulating the other genes in this network. Ingenuity software was used to
establish a gene network including the top genes identified in Stage 1 and Stage 2 analyses of the gene-based meta-analyses. The network involved not only the genes FTO,
MTFR2, ADCY8 and MC4R but also the genes SSFA2 and ZFYVE9. The nuclear FTO was found to be one of the hubs in the network together with the well-established BMI
gene LEP.
Evaluation of our gene-based meta-analysis approach
As a secondary aim of this study, we anticipated to evaluate other
possible approaches that could have been undertaken, as well as
investigating plausible biases from using VEGAS. Our a priori
hypothesis was that performing gene-based analyses in each
individual study as we did would be a more efficient approach
to pick up additionally unknown loci, likely by catching more allelic and between-study heterogeneity, than an approach where
gene-based analysis was applied once to the summary statistics
from the meta-analysis of SNP-based results. To test this
6
| Human Molecular Genetics
Table 2. Gene length in relation to P-value from VEGAS meta-analysis
Figure 3. Gene expression validation of genes identified in Stage 1. We
downloaded expression data from obese individuals/mice/rats in whole blood
or adipose tissue from the GEO repository to look for differential expression of
the 11 genes found in Stage 1. Using a rank-based permutation method to
generate the null distribution, we concluded that the genes TXNDC12 and SSFA2
were significantly differentially expressed between obese cases and controls
using a Bonferroni-corrected threshold of 0.0045. The color bar describes the
log2 fold change value for each gene in each study, where blue represents lower
expression and red higher expression in obese cases versus controls.
P-value cutoff
N
Top genes
Mean gene length (bp)
Top genes
Other genes
P-valuea
<E−6
<0.00001
<0.0001
<0.001
<0.01
<0.05
<0.10
<0.15
<0.20
<0.30
<0.40
<0.50
<0.60
<0.70
<0.80
<0.90
10
10
16
31
145
530
1007
1479
1967
2912
3934
4922
5982
7180
8487
9972
90 603
90 603
73 688
59 548
54 417
49 916
46 644
46 391
46 228
46 576
46 818
49 233
50 339
51 681
52 974
55 084
0.65
0.65
0.51
0.47
0.23
0.06
0.0052
3.3E−05
8.4E−08
4.8E−12
4.3E−22
3.0E−24
9.8E−29
1.6E−34
5.2E−45
3.8E−57
70 571
70 571
70 582
70 612
70 772
71 489
72 655
73 789
75 072
77 771
81 326
84 205
88 783
95 459
106 608
128 585
a
assumption empirically, we performed VEGAS gene-based analysis on the summary statistics of the single-SNP meta-analysis
of BMI by Speliotes et al. (1), which included 123 865 individuals
from 43 studies (Fig. 1). The gene-based analysis based on the
summary statistics generated a list of 20 significant loci, corresponding to 59 genes, all of which have been reported in the original publication (Supplementary Material, Table S3). We then
compared the loci identified in our Stage 1 analysis with this
gene-based analysis of the summary statistics from Speliotes
et al. and found that none of the newly identified loci were
gene-wide significant (Supplementary Material, Table S4) in the
latter analysis, whereas the previously established loci (FTO and
TMEM18) were represented among the top findings, reaching
gene-wide significance.
The same approach was undertaken to examine the genebased analysis of the summary statistics data from Locke et al.
(2), which was the data source used for our Stage 2 analyses
(Fig. 1). The result from the VEGAS analysis revealed 33 genewide significant loci containing 110 genes in total (Supplementary Material, Table S5). All gene-wide significant loci, except
for two, were already reported as being associated with BMI in
the Locke et al. paper. The two newly identified loci were golgin
A2 (GOLGA2) and secretory carrier membrane protein 4
(SCAMP4). When we then compared these results with the
gene-based loci identified in our Stage 1, the only common loci
were again the established ones (FTO and TMEM18) (Supplementary Material, Table S6). Thus, our hypothesis that applying the
VEGAS software on summary statistics would render fewer
novel loci compared with our approach of performing VEGAS in
each substudy before meta-analyzing seemed to be true. The
most likely reason for this observation is that allelic heterogeneity in the gene-based meta-analysis (Stage 1 and 2 results),
where many different SNP signals rather than a strong single driver, gave rise to the gene-wide significant hits. This explanation
was further supported by the variety of top SNPs that contributed
to the VEGAS gene-based signals in Stages 1 and 2 (Supplementary Material, Figs S3 and S4). However, there is yet to be proven
if the novel BMI loci are good biological targets.
Next, we decided to investigate whether the length of the gene
could bias our observed results. The VEGAS method corrects for
the number of SNPs within each gene and also for gene length
as has been shown before (20). Nonetheless, using Stage 1 data,
t-Tests were done on the logarithm of gene length because of the skewed
distribution.
the mean gene length of all genes was 70 582 bp, whereas the median gene length was 28 333 bp (range: 2 304 483 bp), indicating a
skewed distribution. At first, we analyzed the mean gene length
of a varying list of top genes (P < 10−6 to P < 0.05 for inclusion)
comparing with the rest of the genes, but found no significant differences (Table 2). However, as the top gene list increased when
relaxing the P-value threshold even more, there was a significant
difference in mean gene length between the groups, with shorter
genes among the top genes indicating that VEGAS tends to overadjust for gene length (Table 2). To further support this notion,
we performed correlation analysis and found a weak but significant correlation between increased gene length and higher
P-value (Spearman rank ρ = 0.14; P < 10−16).
Finally, we performed sensitivity analyses in the three most
promising novel loci (PEX2, TXNDC12 and SSFA2) in Stage 1 cohorts. For each locus and cohort, we deleted the best SNP as reported by VEGAS and consequently reran VEGAS on all data
sets. The difference in the gene-based P-value from the two
runs (with and without the best SNP) in each cohort and locus
is presented in Supplementary Material, Figure S15. The mean
difference in gene-based P-value in each cohort (PSensitivity −
PNormal ) was 0.015 for PEX2, 0.020 for SSFA2 and 0.022 for
TXNDC12. The overall conclusion when summing up the Fisher
χ 2 statistics over all cohorts was that the final gene-based
P-value changed one 10-fold to the higher for all three loci, still
well below gene-wide detection for PEX2 and TXNDC12, and
reaching borderline significance level for SSFA2.
Discussion
In this study, we present a novel approach utilizing previous
GWAS results by applying gene-based methods on individual
study results followed by meta-analysis in order to find new
loci associated with BMI. The study samples used were all from
the GIANT consortium including Stage 1 (1) and Stage 2 data
(2). We identified six new loci associated with BMI, including several genes with high potential for further functional follow-up
studies (e.g. PEX2, TXNDC12 and SSFA2), and could confirm two
established loci (FTO and TMEM18). All loci were replicated in
Human Molecular Genetics
Stage 2, and we provide further biological support from a range of
bioinformatics approaches, such as expression analyses, as well
as pathways, and eQTL and mQTL analyses (Supplementary
Material, Table S7).
The concept of allelic heterogeneity
Allelic heterogeneity occurs when multiple alleles in the same
locus are associated with the same trait. The phenomenon of allelic heterogeneity is being reported for more and more loci and
traits, and it is a well-known phenomenon in long-established
loci, such as BRCA1 and BRCA2. It is easy to imagine several causal
variants that affect the same gene transcripts through changes in
expression levels or gene structure and function, and thereby in
the end affect the same biological mechanism. Loci that display
high allelic heterogeneity can be missed by conventional GWAS
of individual variants, as the statistics are based on one variant
at the time. Gene-based methods on the other hand combine
all signals for individual variants within the same gene to a single
gene-based signal, hence accounting for allelic heterogeneity.
Moreover, by applying a gene-based meta-analysis on the
study-level data (instead of performing the analyses on the
meta-analysis summary statistics), the allelic heterogeneity is
better accounted for, and moreover, between-study heterogeneity is captured if existing. By comparing the results of the
gene-based analysis done on a cohort level (Table 1) with a
gene-based analysis based on summary statistics (Supplementary Material, Tables S3 and S5), we could conclude that very different set of genes are being identified using the two approaches.
Thus, this observation supports our notion that the heterogeneity is more accurately picked up by VEGAS when run on cohort
level data rather than directly on the GWAS meta-analysis summary data. The likely reason is that all SNPs and cohorts contribute to the result, not just a single significant SNP in a large cohort,
a suggestion that is further supported by the replication in Stage 2
(Table 1), sensitivity analyses without the best SNP (Supplementary Material, Fig. S15) and the variety of lead SNPs from different
studies in the significant loci (Supplementary Material, Figs S3
and S4).
In performing a gene-based association test, one presumes
that the contributions of different causal variants will be associated with the outcome via the same downstream disease
mechanism. However, an important feature in Fisher’s method
for meta-analysis is that it allows effect sizes in both positive
and negative directions, so there is no restriction in the types of
allelic heterogeneity within the gene.
Multivariate methods where independent SNPs are combined
within the same locus have improved the total variance explained, suggesting that allelic heterogeneity could be a major
contributor (4,5). However, independent SNP signals within a
locus are often difficult to identify because standard GWASs are
not designed to do so and there are computational limitations
of multivariate studies. On the contrary, the gene-based approach has the potential to detect loci where independent SNP
signals contribute to the same gene-based signal. Thus, loci
with strong secondary signals are prioritized, and as a consequence, we identified some of the established BMI loci with
known secondary signals among our top findings. FTO and
TMEM18 (identified in Stage 1), as well as MC4R (identified in
Stage 2), have been reported to harbor multiple SNPs associated
with BMI (21).
The loci with allelic heterogeneity are also more likely to contain functional SNPs, as judged by software such as PolyPhen and
by eQTL mapping (4,5). In light of this, it is not surprising that the
| 7
QTL investigations in this study concluded that 5 of the 11 genes
(4 of the 8 loci) that we discovered in Stage 1 showed evidence of
enrichment in eQTLs and/or mQTLs (Supplementary Material,
Table S7). When also considering the expression and pathway
analyses, we found functional evidence in 7 of the 11 genes
(6 of the 8 loci) (Supplementary Material, Table S7).
VEGAS in comparison with other gene-based methods
There is a growing number of gene-based bioinformatics tools
available. Many of them use participant-level genotype data as
input, perform a gene-based association test using permutations
and summarize the results in a list of gene-to-trait associations
(6,22). These methods usually perform well in terms of limiting
false-positive rates; however, they have a great disadvantage
being computationally demanding and needing access to participant-level genotype data, which usually is impossible in the setting of large genetic consortia. Other methods, including VEGAS,
only use a list of SNP P-values as input and then consider the
underlying SNP–SNP correlation pattern using HapMap Project’s
genotype data or other data sources (7,8,23). These methods are
computationally less burdensome and use summary result
files, which is crucial when working within large international
consortia.
The method in VEGAS that we used performs well compared
with other gene-based analyses considering different gene
boundaries (9). Basically, VEGAS is stable across different boundaries chosen and remains powerful even with the inclusion of
non-significant SNPs (9). We used the default gene boundary in
VEGAS (±50 kb), which is appropriate as GWAS hits are enriched
within this distance (24).
GWASs are known to prioritize SNPs in the close vicinity to or
within genes of long length (20,24). The VEGAS method, however,
has been shown to compensate for the gene length bias in the
analysis (20). Our study investigated this further and found that
VEGAS tends to be too stringent when compensating for gene
length, and as a consequence, long genes are less likely to be significant. The fact that we did not identify any genes related to behavior and appetite, which are usually found in genetic studies of
BMI, could be a result of this phenomena, as brain genes tend to
be longer than other genes (25).
Functional aspects of newly identified genes
The protein encoded by TXNDC12 is characterized by a conserved
active motif called the thioredoxin fold that catalyzes disulfide
bond formation and isomerization (26). The txndc12 knockout
model in mice does not show any phenotype changes. However,
a study in grouper fish concluded that TXNDC12 is an important
antioxidant with potential physiological properties (27). They
further noticed that it is predominately expressed in liver, muscle
and brain, and in addition, we found it to be differentially expressed in both blood and adipose tissue (which they did not
test). Another interesting candidate gene from the same locus
is ZFYVE9, which encodes a double zinc finger protein that interacts with SMAD proteins that are signal transducers in the transforming growth factor β (TGF-β) signaling pathway. The relation
to TGF-β is also demonstrated in the network analysis (Fig. 2). A
mediated knockdown model of zfyve9a in zebrafish inhibits the
formation of the liver in early embryogenesis (28). In addition,
we found significant eQTLs in adipose tissue strengthening the
functional aspect of this gene. The four genes located in this
locus at Chromosome 1 are found in a region with strong LD
(Supplementary Material, Fig. S2). Using VEGAS, we cannot
8
| Human Molecular Genetics
compensate for including the same SNPs in several genes; thus,
many SNPs are common between genes in a region like this (Supplementary Material, Fig. S16). Based on the functional data we
present, TXNDC12 seems to be the best candidate for involvement in adiposity among the four, followed by ZFYVE9 that also
had some functional validation and did not share any of the
most strongly associated BMI SNPs with TXNDC12.
The protein encoded by PEX2 belongs to a family of peroxisomal membrane proteins where gene mutations lead to peroxisome biogenesis disorders, such as Zellweger syndrome with
various abnormalities seen in the central nervous system (29).
The pex2 knockout mouse dies before weaning at ∼13 days of
age, which is in line with humans suffering from the Zellweger
syndrome who usually die at adolescence. Decreased levels of triglycerides and high-density lipoprotein cholesterol in plasma are
seen before dying (30), and PEX2 has also been shown to decrease
angiogenesis in ischemia and wound healing (31). We found significant eQTLs in the PEX2 locus and strong enrichment of
mQTLs in adipose tissue, carotid plaques and blood. Moreover,
DNA methylation in carotid plaques was associated with BMI,
indicating an intrinsic SNP–methylation–obesity relationship
(Supplementary Material, Fig. S14). Thus, we highlight a functional role of PEX2 through gene expression and in particular
methylation in multiple tissues.
SSFA2 was first considered a cancer-related gene as its expression level is altered in colon cancer cells (32), but other findings
have described its role in metabolic disturbances through regulation of energy homeostasis and the exocrine system with implications for obesity and diabetes (33–36). SSFA2 is expressed in
brown adipose tissue, liver, pancreas, stomach and kidney. The
ssfa2 gene knockout mouse model exhibit decreased body weight
and adipose tissue, resistance to diet-induced obesity and increased food intake and metabolic rate. Moreover, abnormal glucose homeostasis and hormone levels are detected (33,36) and it
is connected to LEP in the gene network (Fig. 2). Our gene expression analysis further concluded that SSFA2 is associated with
obesity (Fig. 3).
The MTFR2 gene is not well known but is involved in aerobic
respiration in the mitochondria. The Ingenuity analysis highlighted MTFR2 in the cytoplasm where it is regulated by nuclear
genes important for obesity (Fig. 2). IARS2 encodes a tRNA
synthetase for isoleucine found in mitochondria. Isoleucine is
the predominant amino acid in human mitochondria and small
changes of the aminoacylation properties may be of great importance (37). Point mutations found in this gene have been shown to
cause fatal cardiomyopathy, growth retardation, hearing deficiency, neuropathy and skeletal dysplasia (38,39). CEP295, also
called KIAA1731, is a putative centrosomal protein containing
an ALMS motif, also seen in the gene Alström syndrome 1
(ALMS1). As the name imply, mutations in ALMS1 cause Alström
syndrome, a rare progressive condition characterized by neurosensory degeneration and metabolic defects (40).
Strengths and limitations
A main strength of this study was the very large sample size used
for the VEGAS analyses, and that we could do gene-based analyses in each participating study before meta-analyzing (instead
of using the summary statistics from the meta-analysis to do
gene-based analyses). Our approach allowed for the identification of six novel loci associated with BMI. Moreover, as no new
samples were interrogated in this study, this approach highlights
the potential of the gene-based meta-analysis method as such
for reuse of existing data. The soundness of this approach was
confirmed by replication of all loci in an independent validation
stage (Stage 2) as well as by confirmation of well-known BMI
GWAS loci with documented strong secondary signals (FTO,
TMEM18 and MC4R). Furthermore, by using pathway analyses,
gene expression data and eQTL and mQTL analyses, we were
able to functionally validate genes in six of eight loci found.
This also illustrates the possibility of using VEGAS to prioritize
among candidate GWAS genes as the top genes from VEGAS are
more likely to harbor functional variants.
Limitations of the study include the assumptions needed for
the gene-based analysis, such as the one-directional effect size. It
is also worth mentioning that the VEGAS method could be run
with other settings, for example using a different LD panel,
which would most likely render slightly different results. We further concluded that VEGAS overadjusts for gene length, which
may have had an impact on the results. Moreover, the functional
validation of the significant genes was only concentrated to those
specific genes; no additional genes in the close vicinity were considered to limit the extent of multiple testing; hence, further
functional studies are warranted to conclude whether the identified genes are truly linked to obesity. The gene expression analysis was restricted to blood and adipose tissue as these were
the tissues frequently available in the GEO repository; investigating brain tissue in relation to obesity status would most likely
have rendered a different result. Finally, as all results are based
on analyses of individuals of European descent, the generalizability to other ethnicities is unknown.
Conclusions
In this study, we present a novel approach for conducting genebased meta-analysis to reutilize large GWAS data sets in order
to identify new genes involved in complex diseases. Moreover,
we perform extensive bioinformatic functional validation of the
identified genes illustrating the potential of using methods like
VEGAS to prioritize among candidate genes and to detect loci
with high allelic heterogeneity. Using this approach, we highlight
several new promising gene candidates for further validation in
functional studies, such as TXNDC12, PEX2 and SSFA2. Further attempts using gene-based meta-analyses of existing data for other
phenotypes using the methods outlined are encouraged.
Materials and Methods
Study samples
The Stage 1 data set comprised 46 studies with in total 123 865
individuals of European descent. The sample sizes varied from
276 to 26 799 individuals per study, and they were genotyped
using Affymetrix or Illumina technology as described elsewhere
(1). BMI levels were inverse normally transformed and adjusted
for age, age2 and study-specific covariates such as population
stratification by principal components. All genotype data were
imputed to the HapMap CEU panel and badly imputed SNPs
(^r2 < 0:3 in MACH, observed/expected dosage variance <0.3 in
BimBam or proper_info <0.4 in IMPUTE), and those with a
minor allele count <6 per cohort-specific strata were removed.
Analysis using additive models for association between genotype and BMI was performed on cohort level, and each GWAS
was then corrected with lambda genomic control. For the
meta-analysis on individual-variant level, the inverse variance
method in METAL software was used [analyses performed as
previously reported (1)].
Human Molecular Genetics
The Stage 2 data set included 43 additional studies of European ancestry with genotype data available from the Illumina
iSelect Metabochip, the only additional data available to us for
replication at the time of analysis, covering 196 725 SNPs selected
for cardiac and metabolic phenotypes, including 5055 SNPs selected for associations with BMI (2). The Metabochip data were
not imputed, but corrected with lambda genomic control in
each study separately.
The VEGAS method
To yield lists of gene-based P-values for association with BMI, the
VEGAS software (http://gump.qimr.edu.au/VEGAS/) was applied
to each cohort separately. The VEGAS algorithm has been described elsewhere (8), but in brief, it takes the full set of markers
within a gene and accounts for LD by simulations from HapMap
LD structures. A gene-based test statistic is calculated as the sum
of all χ 2-converted SNP P-values within that gene. The null distribution is then calculated using Monte Carlo simulation taking
into account the LD structure that results in a large number of
multivariate normal vectors with the same distribution as our
gene-based test statistic. The simulation approach is computationally faster than using permutations but performs equally
well (8). The assignment of SNPs to genes was done with the
UCSC genome browser hg18 assembly rendering 17 941 genes.
Gene boundaries were defined as ±50 kb of 5′ and 3′ untranslated
regions, which is the default setting in VEGAS. However, a study
investigating different boundaries in relation to the gene-based
result in several methods found the VEGAS sum method to be
stable for different lengths chosen (9). To limit the computational
burden, we set the maximum number of simulations to 1 000 000,
meaning that the lowest P-value that could be estimated was
1 × 10−6. The rationale for this number of simulations was the
Bonferroni-corrected gene-wide significance threshold of P < 2.8
× 10−6 (correcting for 17 941 gene tests). Due to this truncation, the
P-values from this gene-based approach appear higher than
those from single variant-based GWAS for some cohorts (for
example, for the FTO locus).
Meta-analysis using Fisher’s method
In gene-based association analysis, an underlying assumption is
that all causal effects within a gene are having the same direction. Therefore, when performing a meta-analysis on gene
level, a one-sided χ 2 distribution as used in the Fisher method
is preferred. The Fisher formula for meta-analysis is as follows:
χ2 ¼
2
k
X
loge ð pi Þ
| 9
25 networks were generated. Ingenuity calculates the significance values for canonical pathways by Fisher’s exact test right
tailed.
Gene expression analysis methods
Data sets for functional validation of the BMI-associated genes
from Stage 1 were identified in GEO (http://www.ncbi.nlm.nih.
gov/geo/) using the following search term: (‘blood’ OR ‘adipose
tissue’) AND (‘adiposity’ OR ‘obesity’ OR ‘obese’ OR ‘body-mass
index’ OR ‘body mass index’ OR ‘BMI’) AND (Homo sapiens
[Organism] OR Mus musculus[Organism] OR Rattus norvegicus
[Organism] AND (gds[Entry Type] OR gse[Entry Type])), which
rendered 322 hits. Two investigators (S.H. and E.I.) went through
the list independently and selected those data sets that fulfilled
the criteria of having obese cases versus non-obese controls
within similar experimental settings. The final number of data
sets that were selected via consensus for further analysis was
76. The data sets were downloaded from the GEO repository
using R with the GEOquery package and analyzed using the
limma package. Each data set could sometimes be split into multiple subsets depending on experimental designs, e.g. mouse
models with different time intervals, as long as the case–control
criteria were met. All data sets were quantile normalized and
log2-transformed if not done already, and quality control of the
final differential analysis was done using volcano plots. Bad quality data, microRNA-chip data and lack of control samples were
common reasons for exclusions of data sets during the analysis
procedure. The final data set included 335 cases and 308 controls
from 53 experiments. Differential expression analysis was conducted within each experiment, and the transcript with the lowest P-value within the same gene, if multiple transcripts existed,
was selected. Genes were compared between different array platforms using the official human HGNC gene symbol list downloaded from NCBI. All gene aliases and orthologous genes in
mouse and rat were consequently translated to the official
human symbol and accounted for in the search. Every gene in
each data set was then ranked according to P-value, and the
final rank was scaled to the same distribution (rank = position/
number_of_genes). The average ranking for each gene was calculated across all 53 data sets where missing genes in any data set
were assigned the rank 0.5. To calculate the null distribution,
each data set was permuted 300 times and the average gene ranking was computed 300 times for each gene. Then, the 300 permuted gene ranks were compared with the original gene rank
for each gene separately to generate a P-value. The final P-value
was further Bonferroni corrected taking 11 tests into account for
the 11 genes tested.
i¼1
QTL methods
Pathway analysis
Enrichment analysis for pathways was carried out using the webbased tool ConsensusPathDB (10) (http://cpdb.molgen.mpg.de/
CPDB/). Genes were searched for overrepresentation among different pathways, and Q-values were calculated from corrected
P-values using FDR from the hypergeometric distribution. Ingenuity Pathway Analysis (Qiagen, fall 2014 version) was used
for visualizing and analyzing the gene-based results of BMI (11).
We only focused on genes available in the Ingenuity Knowledge
Base. The confidence filter was set to consider only experimentally observed direct and indirect relationships in mice, rats and humans. A maximum of 35 nodes per network and a maximum of
Enrichments of QTL were tested using three approaches. First,
the GTEx portal (17) (http://www.gtexportal.org/home/) was
used via their web interface for screening of eQTLs in whole
blood. Expression data were collected from Affymetrix arrays or
Illumina RNA sequencing, and genotype data from Illumina arrays. We only tested for cis-eQTLs allowing SNPs to reside in a
window of 1 Mb from start or stop of the corresponding gene.
The overall enrichment significance for a gene was calculated
with FDR using permutation and presented in an adjusted
Q-value. The number of significant cis-eQTLs within each gene
was dependent on the permutation-based P-value threshold for
that gene. The samples (n = 168) used for whole-blood analysis
were collected through the Cancer Human Biobank (caHUB) of
10
| Human Molecular Genetics
the National Cancer Institute and came from adult individuals of
any sex and ancestry, taken either during surgery or postmortem.
Specifications on sample quality or data preprocessing can be
found online. The second software used for QTL analyses was
the Genevar platform (18) (http://www.sanger.ac.uk/resources/
software/genevar/) developed at the Wellcome Trust Sanger Institute. The software is a standalone Java application, which
was downloaded to the computer and once running, it connects
to the online databases needed. Again, we defined cis-eQTLs as
SNPs within ±1 Mb of gene boundaries and used the default settings for significance with cutoff for P-value <0.001. The corresponding numbers for the cis-mQTLs were ±0.1 Mb gene
boundaries and a P-value cutoff level <0.001. The Genevar software does not present any overall gene scores as GTEx does.
The samples were from adipose tissue collected from the TwinsUK cohort with 856 healthy female twins of European descent
run on Illumina HumanMethylation450 bead chip (19,41).
Lastly, mQTL analyses were performed in Athero-Express, a
cohort including patients from two Dutch hospitals undergoing
carotid endarterectomy. DNA was isolated from carotid artery
plaque samples of 500 patients and ethylenediaminetetraacetic
acid whole-blood samples of 100 matched patients and checked
for purity and concentration using a Nanodrop system (Thermo
Scientific, Massachusetts, USA). DNA concentrations were equalized, bisulphate converted and used to measure DNA methylation using the HumanMethylation450 bead chip (Illumina, San
Diego, USA), according to the manufacturer’s protocol. Data
were processed using the ‘MethylAid’ in R (42) with normalization and batch-effect correction using ‘Functional Normalization’
with eight principal components of control probes. Invariable
probes (β range: 0–0.1 and 0.9–1) and outliers at >3 SD from median were removed from each probe. After quality control, 488
plaque samples, 92 blood samples and 443 872 probes (41 640
probes were excluded) remained for further analysis. Genotyping, available in 444 plaques and 92 blood samples, has previously been performed and is described elsewhere (43). For the
association with BMI, linear regression modeling was performed
with covariates age and sex. mQTL analysis was performed
in SNPtest (v2.5) using a model with covariates age, sex and
SNP array type. The Benjamini–Hochberg method was used to
account for multiple testing, where a FDR <5% was considered
statistically significant.
cohort and locus was deleted and VEGAS analysis was again
undertaken. The difference between the originally reported
gene-based P-value and the newly generated gene-based Pvalue (without the best SNP in the data set) was calculated and
evaluated.
Gene length analyses
Conflict of Interest statement: None declared.
The information on gene length was extracted and downloaded
from the table browser at UCSC Genome Browser website. The
genes were sorted based on significance from the VEGAS Stage
1 analysis, and the top genes were compared with the complementary gene list using different cutoff values (P < 10−6, <10−5,
<0.0001, <0.001, <0.01, <0.05, <0.10, <0.15, <0.20, <0.30, <0.40,
<0.50, <0.60, <0.70, <0.80, <0.90). A two samples t-test was used
in the calculation and applied to the log-transformed values of
gene lengths due to the skewed distribution of gene length. In
addition, a Spearman rank correlation test between gene length
and P-value was performed.
Sensitivity analyses
The best SNP as judged from the lowest P-value within each gene
is reported by VEGAS for each individual cohort. The stability of
the gene-based P-value can be investigated if the best SNP is deleted from the data set. Consequently, for each of the three loci
(PEX2, TXNDC12 and SSFA2), the best SNP as reported for each
Statistical software
The VEGAS software was used in an offline command version
and additional analyses were carried out using Unix, Perl, R
3.0.0 or 3.1.1 and Mathematica version 8. The LD pattern plot
was created using SNAP software version 2.2 (SNP Annotation
and Proxy Search; http://www.broadinstitute.org/mpg/snap/
ldplot.php) with SNP data set 1000 Genomes Pilot 1 and population panel CEU. The union of all SNPs contributing to the genebased signal from the TXNDC12 locus with four genes
(TXNDC12, BTF3L4, KTI12 and ZFYVE9) was submitted (n = 90
SNPs). Regional plots were made by the Locus Zoom online web
tool (http://csg.sph.umich.edu/locuszoom/) where SNPs contributing to the gene-based signals and the corresponding P-value
from each GWAS were uploaded for each locus separately. The
best SNP in the locus was chosen for each cohort, and if several
cohorts had the same SNP, it was plotted multiple times. The reference panel was 1000 Genomes build hg18 with population
panel CEU.
Ethics statement
We are only using summary statistics from each cohort within
GIANT, and the data have already been published and described
elsewhere (1,2). Thus, this study does not need any additional
ethical permit.
Supplementary material
Supplementary Material is available at HMG online.
Acknowledgements
The authors would like to acknowledge all contributors to the
Genetic Investigation of ANthropometric Traits (GIANT) Consortium and all research participants of studies within GIANT for
their very kind contributions toward science.
Funding
The study was supported by the Foundation for Geriatric Diseases, Karolinska Institutet.
References
1. Speliotes, E.K., Willer, C.J., Berndt, S.I., Monda, K.L., Thorleifsson, G., Jackson, A.U., Allen, H.L., Lindgren, C.M., Luan, J.,
Magi, R. et al. (2010) Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index.
Nat. Genet., 42, 937–948.
2. Locke, A.E., Kahali, B., Berndt, S.I., Justice, A.E., Pers, T.H., Day,
F.R., Powell, C., Vedantam, S. and Buchkovich, M.L. (2015)
Genetic studies of body mass index yield new insights for
obesity biology. Nature, 518, 197–206.
3. Manolio, T.A., Collins, F.S., Cox, N.J., Goldstein, D.B., Hindorff,
L.A., Hunter, D.J., McCarthy, M.I., Ramos, E.M., Cardon, L.R.,
Human Molecular Genetics
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
Chakravarti, A. et al. (2009) Finding the missing heritability of
complex diseases. Nature, 461, 747–753.
Ehret, G.B., Lamparter, D., Hoggart, C.J., Whittaker, J.C., Beckmann, J.S. and Kutalik, Z. (2012) A multi-SNP locus-association method reveals a substantial fraction of the missing
heritability. Am. J. Hum. Genet., 91, 863–871.
Wood, A.R., Hernandez, D.G., Nalls, M.A., Yaghootkar, H.,
Gibbs, J.R., Harries, L.W., Chong, S., Moore, M., Weedon, M.
N., Guralnik, J.M. et al. (2011) Allelic heterogeneity and more
detailed analyses of known loci explain additional phenotypic variation and reveal complex patterns of association. Hum.
Mol. Genet., 20, 4082–4092.
Huang, H., Chanda, P., Alonso, A., Bader, J.S. and Arking, D.E.
(2011) Gene-based tests of association. PLoS Genet., 7, e1002177.
Li, M.X., Gui, H.S., Kwan, J.S. and Sham, P.C. (2011) GATES: a
rapid and powerful gene-based association test using extended Simes procedure. Am. J. Hum. Genet., 88, 283–293.
Liu, J.Z., McRae, A.F., Nyholt, D.R., Medland, S.E., Wray, N.R.,
Brown, K.M., Hayward, N.K., Montgomery, G.W., Visscher, P.
M., Martin, N.G. et al. (2010) A versatile gene-based test for
genome-wide association studies. Am. J. Hum. Genet., 87,
139–145.
Petersen, A., Alvarez, C., DeClaire, S. and Tintle, N.L. (2013) Assessing methods for assigning SNPs to genes in gene-based
tests of association using common variants. PLoS One, 8,
e62161.
Kamburov, A., Stelzl, U., Lehrach, H. and Herwig, R. (2013) The
ConsensusPathDB interaction database: 2013 update. Nucleic
Acids Res., 41, D793–D800.
Calvano, S.E., Xiao, W., Richards, D.R., Felciano, R.M., Baker,
H.V., Cho, R.J., Chen, R.O., Brownstein, B.H., Cobb, J.P.,
Tschoeke, S.K. et al. (2005) A network-based analysis of systemic inflammation in humans. Nature, 437, 1032–1037.
Munzberg, H. and Morrison, C.D. (2015) Structure, production
and signaling of leptin. Metabolism, 64, 13–23.
Claussnitzer, M., Dankel, S.N., Kim, K.H., Quon, G., Meuleman, W., Haugen, C., Glunk, V., Sousa, I.S., Beaudry, J.L., Puviindran, V. et al. (2015) FTO obesity variant circuitry and
adipocyte browning in humans. N. Engl. J. Med., 373, 895–907.
Smemo, S., Tena, J.J., Kim, K.H., Gamazon, E.R., Sakabe, N.J.,
Gomez-Marin, C., Aneas, I., Credidio, F.L., Sobreira, D.R., Wasserman, N.F. et al. (2014) Obesity-associated variants within
FTO form long-range functional connections with IRX3. Nature, 507, 371–375.
Stratigopoulos, G., Martin Carli, J.F., O’Day, D.R., Wang, L.,
Leduc, C.A., Lanzano, P., Chung, W.K., Rosenbaum, M., Egli,
D., Doherty, D.A. et al. (2014) Hypomorphism for RPGRIP1L, a
ciliary gene vicinal to the FTO locus, causes increased adiposity in mice. Cell Metab., 19, 767–779.
Merkestein, M., Laber, S., McMurray, F., Andrew, D., Sachse, G.,
Sanderson, J., Li, M., Usher, S., Sellayah, D., Ashcroft, F.M. et al.
(2015) FTO influences adipogenesis by regulating mitotic clonal expansion. Nat. Commun., 6, 6792.
Lonsdale, J., Thomas, J., Salvatore, M., Phillips, R., Lo, E., Shad,
S. and Hasz, R. (2013) The Genotype-Tissue Expression (GTEx)
project. Nat. Genet., 45, 580–585.
Yang, T.P., Beazley, C., Montgomery, S.B., Dimas, A.S., Gutierrez-Arcelus, M., Stranger, B.E., Deloukas, P. and Dermitzakis,
E.T. (2010) Genevar: a database and Java application for the
analysis and visualization of SNP-gene associations in eQTL
studies. Bioinformatics, 26, 2474–2476.
Grundberg, E., Small, K.S., Hedman, A.K., Nica, A.C., Buil, A.,
Keildson, S., Bell, J.T., Yang, T.P., Meduri, E., Barrett, A. et al.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
| 11
(2012) Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet., 44, 1084–1089.
Mirina, A., Atzmon, G., Ye, K. and Bergman, A. (2012) Gene
size matters. PLoS One, 7, e49093.
Yang, J., Ferreira, T., Morris, A.P., Medland, S.E., Madden, P.A.,
Heath, A.C., Martin, N.G., Montgomery, G.W., Weedon, M.N.,
Loos, R.J. et al. (2012) Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet., 44, 369–375,
S361–S363.
Moskvina, V., O’Dushlaine, C., Purcell, S., Craddock, N., Holmans, P. and O’Donovan, M.C. (2011) Evaluation of an approximation method for assessment of overall significance
of multiple-dependent tests in a genomewide association
study. Genet. Epidemiol., 35, 861–866.
Hong, M.G., Pawitan, Y., Magnusson, P.K. and Prince, J.A.
(2009) Strategies and issues in the detection of pathway enrichment in genome-wide association studies. Hum. Genet.,
126, 289–301.
Leslie, R., O’Donnell, C.J. and Johnson, A.D. (2014) GRASP: analysis of genotype-phenotype results from 1390 genome-wide
association studies and corresponding open access database.
Bioinformatics, 30, i185–i194.
Huang, Y., Xie, C., Ye, A.Y., Li, C.Y., Gao, G. and Wei, L. (2013)
Recent adaptive events in human brain revealed by metaanalysis of positively selected genes. PLoS One, 8, e61280.
Alanen, H.I., Williamson, R.A., Howard, M.J., Lappi, A.K., Jantti, H.P., Rautio, S.M., Kellokumpu, S. and Ruddock, L.W. (2003)
Functional characterization of ERp18, a new endoplasmic reticulum-located thioredoxin superfamily member. J. Biol.
Chem., 278, 28912–28920.
Wei, J., Ji, H., Guo, M. and Qin, Q. (2012) Isolation and characterization of a thioredoxin domain-containing protein 12
from orange-spotted grouper, Epinephelus coioides. Fish Shellfish Immunol., 33, 667–673.
Liu, N., Li, Z., Pei, D. and Shu, X. (2013) Zfyve9a regulates the
proliferation of hepatic cells during zebrafish embryogenesis.
Int. J. Dev. Biol., 57, 773–778.
Mignarri, A., Vinciguerra, C., Giorgio, A., Ferdinandusse, S.,
Waterham, H., Wanders, R., Bertini, E., Dotti, M.T. and Federico, A. (2012) Zellweger spectrum disorder with mild phenotype caused by PEX2 gene mutations. JIMD Rep., 6, 43–46.
Kovacs, W.J., Shackelford, J.E., Tape, K.N., Richards, M.J.,
Faust, P.L., Fliesler, S.J. and Krisans, S.K. (2004) Disturbed
cholesterol homeostasis in a peroxisome-deficient PEX2
knockout mouse model. Mol. Cell. Biol., 24, 1–13.
Nedeau, A.E., Gallagher, K.A., Liu, Z.J. and Velazquez, O.C.
(2011) Elevation of hemopexin-like fragment of matrix metalloproteinase-2 tissue levels inhibits ischemic wound healing
and angiogenesis. J. Vasc. Surg., 54, 1430–1438.
Inokuchi, J., Komiya, M., Baba, I., Naito, S., Sasazuki, T. and
Shirasawa, S. (2004) Deregulated expression of KRAP, a
novel gene encoding actin-interacting protein, in human
colon cancer cells. J. Hum. Genet., 49, 46–52.
Fujimoto, T., Miyasaka, K., Koyanagi, M., Tsunoda, T., Baba, I.,
Doi, K., Ohta, M., Kato, N., Sasazuki, T. and Shirasawa, S.
(2009) Altered energy homeostasis and resistance to dietinduced obesity in KRAP-deficient mice. PLoS One, 4, e4240.
Fujimoto, T. and Shirasawa, S. (2011) KRAS-induced actin-interacting protein: a potent target for obesity, diabetes and
cancer. Anticancer Res., 31, 2413–2417.
Fujimoto, T. and Shirasawa, S. (2012) Identification of KRAPexpressing cells and the functional relevance of KRAP to
12
36.
37.
38.
39.
| Human Molecular Genetics
the subcellular localization of IP3R in the stomach and kidney. Int. J. Mol. Med., 30, 1287–1293.
Miyasaka, K., Fujimoto, T., Kawanami, T., Takiguchi, S., Jimi,
A., Funakoshi, A. and Shirasawa, S. (2011) Pancreatic hypertrophy in Ki-ras-induced actin-interacting protein gene
knockout mice. Pancreas, 40, 79–83.
Degoul, F., Brule, H., Cepanec, C., Helm, M., Marsac, C., Leroux,
J., Giege, R. and Florentz, C. (1998) Isoleucylation properties
of native human mitochondrial tRNAIle and tRNAIle
transcripts. Implications for cardiomyopathy-related point
mutations (4269, 4317) in the tRNAIle gene. Hum. Mol. Genet.,
7, 347–354.
Perli, E., Giordano, C., Tuppen, H.A., Montopoli, M., Montanari, A., Orlandi, M., Pisano, A., Catanzaro, D., Caparrotta,
L., Musumeci, B. et al. (2012) Isoleucyl-tRNA synthetase levels
modulate the penetrance of a homoplasmic m.4277T>C
mitochondrial tRNA(Ile) mutation causing hypertrophic cardiomyopathy. Hum. Mol. Genet., 21, 85–100.
Schwartzentruber, J., Buhas, D., Majewski, J., Sasarman, F.,
Papillon-Cavanagh, S., Thiffaut, I., Sheldon, K.M., Massicotte,
C., Patry, L., Simon, M. et al. (2014) Mutation in the nuclearencoded mitochondrial isoleucyl-tRNA synthetase IARS2 in
patients with cataracts, growth hormone deficiency with
40.
41.
42.
43.
short stature, partial sensorineural deafness, and peripheral
neuropathy or with Leigh syndrome. Hum. Mutat., 35, 1285–
1289.
Knorz, V.J., Spalluto, C., Lessard, M., Purvis, T.L., Adigun, F.F.,
Collin, G.B., Hanley, N.A., Wilson, D.I. and Hearn, T. (2010)
Centriolar association of ALMS1 and likely centrosomal functions of the ALMS motif-containing proteins C10orf90 and
KIAA1731. Mol. Biol. Cell, 21, 3617–3629.
Grundberg, E., Meduri, E., Sandling, J.K., Hedman, A.K., Keildson, S., Buil, A., Busche, S., Yuan, W., Nisbet, J., Sekowska, M.
et al. (2013) Global analysis of DNA methylation variation in
adipose tissue from twins reveals links to disease-associated
variants in distal regulatory elements. Am. J. Hum. Genet., 93,
876–890.
van Iterson, M., Tobi, E.W., Slieker, R.C., den Hollander, W.,
Luijk, R., Slagboom, P.E. and Heijmans, B.T. (2014) MethylAid:
visual and interactive quality control of large Illumina 450k
datasets. Bioinformatics, 30, 3435–3437.
Azghandi, S., Prell, C., van der Laan, S.W., Schneider, M.,
Malik, R., Berer, K., Gerdes, N., Pasterkamp, G., Weber, C.,
Haffner, C. et al. (2015) Deficiency of the stroke relevant
HDAC9 gene attenuates atherosclerosis in accord with allele-specific effects at 7p21.1. Stroke, 46, 197–202.