Abstract
Linkage studies have successfully mapped loci underlying monogenic disorders but have mostly failed when applied to common diseases. Conversely, genome-wide association studies (GWASs) have identified replicable associations between thousands of SNPs and complex traits, yet capture less than half of the total heritability. In the present study, we reconcile these two approaches by showing that linkage signals of height and body mass index (BMI) from 119,000 sibling pairs colocalize with GWAS-identified loci. Consistent with polygenicity, we observed a genome-wide inflation of linkage test statistics, found that GWAS results predict linkage signals, and found that adjusting phenotypes for polygenic scores reduces linkage signals. Finally, we developed a method that uses recombination rate-stratified identity-by-descent (IBD) sharing between siblings to obtain unbiased estimates of the heritability of height (0.76 ± 0.05) and BMI (0.55 ± 0.07). Our results imply that substantial heritability remains unaccounted for by GWAS-identified loci and that this residual genetic variation is polygenic and enriched near these loci.
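The heritability estimate above rests on relating sibling phenotypic similarity to how much of the genome siblings share IBD. As a minimal sketch of that underlying idea only, and not the recombination rate-stratified REML estimator developed in the study, the Python below simulates sib-pairs whose phenotypic correlation is assumed to be pi * h2 + c2 (pi is the genome-wide IBD proportion, c2 a shared-environment term, dominance ignored) and recovers h2 as the slope of the sib cross-product on pi, a Haseman–Elston-style regression in the spirit of Visscher et al. (2006) in the reference list. The IBD standard deviation of 0.04 and all function names are illustrative assumptions.

```python
import math
import random


def simulate_sib_pairs(n_pairs, h2, c2, ibd_sd=0.04, seed=1):
    """Simulate standardized sib-pair phenotypes (hypothetical toy model).

    Assumption: given the genome-wide IBD proportion pi, the sib correlation is
    pi * h2 + c2 (additive genetic plus shared environment; dominance ignored).
    """
    if h2 + c2 >= 1.0:
        raise ValueError("h2 + c2 must be < 1 so the sib correlation stays valid")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Genome-wide IBD proportion varies slightly around its expectation of 0.5.
        pi = min(max(rng.gauss(0.5, ibd_sd), 0.0), 1.0)
        rho = pi * h2 + c2
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Bivariate standard normal pair with correlation rho.
        y1 = z1
        y2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        pairs.append((pi, y1, y2))
    return pairs


def he_regression_h2(pairs):
    """Estimate h2 as the OLS slope of the cross-product y1*y2 on pi."""
    xs = [pi for pi, _, _ in pairs]
    ys = [y1 * y2 for _, y1, y2 in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    pairs = simulate_sib_pairs(200_000, h2=0.8, c2=0.1, seed=7)
    print(f"estimated h2: {he_regression_h2(pairs):.2f}")
```

Because genome-wide IBD varies only slightly around 0.5, the slope is estimated imprecisely unless the number of sib-pairs is large, which is why large sibling samples matter for this kind of estimator.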
Data availability
Individual-level data used in the present study are available through application to the relevant cohort. The individual-level UK Biobank (UKB) data are available upon application to the UKB (http://www.ukbiobank.ac.uk, accessed under project no. 12505). Average IBD status across four groups of loci defined by quartiles of the recombination rate (RR) distribution will be returned to the UKB for the 21,756 sib-pairs analyzed in the present study. These data will be accessible to researchers registered with the UKB. A genetic map for linkage analyses of height and BMI was downloaded from https://github.com/joepickrell/1000-genomes-genetic-maps/tree/master/interpolated_OMNI. A genetic map used in simulations was obtained from BCFtools: https://samtools.github.io/bcftools/bcftools.html. Summary statistics from the GWAS of BMI conducted in the present study are available in Supplementary Data and in the GWAS Catalog (https://www.ebi.ac.uk/gwas) under accession no. GCST90446645.
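The returned data summarize, for each sib-pair, average IBD status within four groups of loci defined by quartiles of the recombination rate distribution. A minimal Python sketch of such a summary follows. The input layout (parallel lists of per-locus recombination rates and IBD estimates for one sib-pair) and the function name are hypothetical, and the study's own pipeline may bin loci differently, for example in how loci tied at a quartile boundary are assigned.

```python
import math
import statistics


def ibd_by_rr_quartile(rr, ibd):
    """Mean IBD sharing of one sib-pair within recombination-rate quartile groups.

    rr  : recombination rate at each locus (hypothetical input)
    ibd : IBD estimate at the same loci (e.g. proportion shared: 0, 0.5 or 1)
    Returns four means, ordered from the lowest- to the highest-RR group.
    """
    if len(rr) != len(ibd):
        raise ValueError("rr and ibd must have the same length")
    # Three cut points split the RR distribution into quartile groups.
    q1, q2, q3 = statistics.quantiles(rr, n=4)
    groups = [[], [], [], []]
    for rate, shared in zip(rr, ibd):
        if rate <= q1:
            k = 0
        elif rate <= q2:
            k = 1
        elif rate <= q3:
            k = 2
        else:
            k = 3
        groups[k].append(shared)
    # An empty group (possible with heavy ties) is reported as NaN.
    return [statistics.fmean(g) if g else math.nan for g in groups]


if __name__ == "__main__":
    rr = [1, 2, 3, 4, 5, 6, 7, 8]
    ibd = [0, 0.5, 0.5, 0.5, 1, 0.5, 1, 1]
    print(ibd_by_rr_quartile(rr, ibd))  # [0.25, 0.5, 0.75, 1.0]
```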
Code availability
The custom code generated in this study (source code of predLINK, an R script to simulate sib-pairs and an R script to run restricted maximum likelihood estimation for quasi-independent sib-pairs (QISPs)) is available via Zenodo at https://doi.org/10.5281/zenodo.10416893 (ref. 62). All other analyses were performed using publicly available software. Statistical analyses were performed using R (v.4.1.0, v.4.2.1; https://cran.r-project.org). MERLIN v.1.1.2 software was used to estimate IBD sharing (https://csg.sph.umich.edu/abecasis/Merlin/download/). KING v.2.2.7 software was used to identify sib-pairs (https://www.kingrelatedness.com/Download.shtml). The GWAS of BMI was performed using BOLT-LMM v.2.4.1 (https://alkesgroup.broadinstitute.org/BOLT-LMM/BOLT-LMM_manual.html). GCTA software (gcta_1.93.1beta, v.1.93.2beta) was used for genotype data quality control (including principal component (PC) calculation, SNP loading calculation and PC projection for ancestry inference), SNP heritability estimation and COJO analysis (https://yanglab.westlake.edu.cn/software/gcta/index.html). Genotype data quality control, including filtering and LD pruning, as well as allelic scoring, was performed with PLINK v.1.90b6.20 (https://www.cog-genomics.org/plink).
References
Polderman, T. J. C. et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat. Genet. 47, 702–709 (2015).
Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
Lynch, M. & Walsh, B. Genetics and Analysis of Quantitative Traits (Sinauer Associates, Inc., 1998).
Botstein, D. & Risch, N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. 33, 228–237 (2003).
Hall, J. M. et al. Linkage of early-onset familial breast cancer to chromosome 17q21. Science 250, 1684–1689 (1990).
Goate, A. et al. Segregation of a missense mutation in the amyloid precursor protein gene with familial Alzheimer’s disease. Nature 349, 704–706 (1991).
Risch, N. J. Searching for genetic determinants in the new millennium. Nature 405, 847–856 (2000).
Weiss, K. M. & Terwilliger, J. D. How many diseases does it take to map a gene with SNPs? Nat. Genet. 26, 151–157 (2000).
Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).
McClellan, J. & King, M. C. Genetic heterogeneity in human disease. Cell 141, 210–217 (2010).
Klein, R. J., Xu, X., Mukherjee, S., Willis, J. & Hayes, J. Successes of genome-wide association studies. Cell 142, 350–351 (2010).
Wang, K., Bucan, M., Grant, S. F. A., Schellenberg, G. & Hakonarson, H. Strategies for genetic studies of complex diseases. Cell 142, 351–353 (2010).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Visscher, P. M. et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2, e41 (2006).
Young, A. I. et al. Relatedness disequilibrium regression estimates heritability without environmental bias. Nat. Genet. 50, 1304–1310 (2018).
Kong, A. et al. The nature of nurture: effects of parental genotypes. Science 359, 424–428 (2018).
Howe, L. J. et al. Within-sibship genome-wide association analyses decrease bias in estimates of direct genetic effects. Nat. Genet. 54, 581–592 (2022).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
Smith, B. H. et al. Cohort profile: Generation Scotland: Scottish family health study (GS: SFHS). The study, its participants and their potential for genetic research on health and illness. Int. J. Epidemiol. 42, 689–700 (2013).
Scholtens, S. et al. Cohort profile: LifeLines, a three-generation cohort study and biobank. Int. J. Epidemiol. 44, 1172–1180 (2015).
Sijtsma, A. et al. Cohort profile update: LifeLines, a three-generation cohort study and biobank. Int. J. Epidemiol. 51, e295–e302 (2022).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Leitsalu, L. et al. Cohort profile: Estonian Biobank of the Estonian Genome Center, University of Tartu. Int. J. Epidemiol. 44, 1137–1147 (2015).
Brumpton, B. M. et al. The HUNT study: a population-based cohort for genetic research. Cell Genom. 2, 100193 (2022).
Åsvold, B. O. et al. Cohort profile update: the HUNT study, Norway. Int. J. Epidemiol. 52, e80–e91 (2023).
Kemper, K. E. et al. Phenotypic covariance across the entire spectrum of relatedness for 86 billion pairs of individuals. Nat. Commun. 12, 1050 (2021).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Schousboe, K. et al. Sex differences in heritability of BMI: a comparative study of results from twin studies in eight countries. Twin Res. 6, 409–421 (2003).
Silventoinen, K. et al. Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res. 6, 399–408 (2003).
Lander, E. & Kruglyak, L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. 11, 241–247 (1995).
Lander, E. S. & Botstein, D. Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185–199 (1989).
Dekkers, J. C. M. & Dentine, M. R. Quantitative genetic variance associated with chromosomal markers in segregating populations. Theor. Appl. Genet. 81, 212–220 (1991).
Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).
Visscher, P. M. Proportion of the variation in genetic composition in backcrossing programs explained by genetic markers. J. Heredity 87, 136–138 (1996).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Hodge, S. E. Linkage analysis versus association analysis: distinguishing between two models that explain disease-marker associations. Am. J. Hum. Genet. 53, 367–384 (1993).
Hemani, G. et al. Inference of the genetic architecture underlying BMI and height with the use of 20,240 sibling pairs. Am. J. Hum. Genet. 93, 865–875 (2013).
Hivert, V. et al. Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals. Am. J. Hum. Genet. 108, 786–798 (2021).
Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, e1000008 (2008).
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
Wainschtein, P. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273 (2022).
Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021).
Akbari, P. et al. Sequencing of 640,000 exomes identifies GPR75 variants associated with protection from obesity. Science 373, eabf8683 (2021).
Tenesa, A., Rawlik, K., Navarro, P. & Canela-Xandri, O. Genetic determination of height-mediated mate choice. Genome Biol. 16, 269 (2016).
Yengo, L. et al. Imprint of assortative mating on the human genome. Nat. Hum. Behav. 2, 948–954 (2018).
Robinson, M. R. et al. Genetic evidence of assortative mating in humans. Nat. Hum. Behav. 1, 16 (2017).
Visscher, P. M. & Haley, C. S. Detection of putative quantitative trait loci in line crosses under infinitesimal genetic models. Theor. Appl. Genet. 93, 691–702 (1996).
Sham, P. C., Cherny, S. S., Purcell, S. & Hewitt, J. K. Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am. J. Hum. Genet. 66, 1616–1630 (2000).
Visscher, P. M. & Hopper, J. L. Power of regression and maximum likelihood methods to map QTL from sib-pair and DZ twin data. Ann. Hum. Genet. 65, 583–601 (2001).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Abecasis, G. R., Cherny, S. S., Cookson, W. O. & Cardon, L. R. Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet. 30, 97–101 (2002).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
Loh, P. R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Zaitlen, N. et al. Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLoS Genet. 9, e1003520 (2013).
Yengo, L., Wray, N. R. & Visscher, P. M. Extreme inbreeding in a European ancestry sample from the contemporary UK population. Nat. Commun. 10, 3719 (2019).
Delaneau, O., Zagury, J.-F. & Marchini, J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013).
Wray, N. R. et al. Pitfalls of predicting complex traits from SNPs. Nat. Rev. Genet. 14, 507–515 (2013).
Yengo, L. Genetic architecture reconciles linkage and association studies of complex traits. Zenodo https://doi.org/10.5281/zenodo.10416893 (2023).
Acknowledgements
We thank the participants and analysts in each cohort contributing to the present study. L.Y. was funded by the Australian Research Council (grant nos DE200100425 and FT220100069). P.M.V. was funded by the Australian Research Council (grant no. FL180100072) and the Australian National Health and Medical Research Council (NHMRC; grant no. 113400). B.C.-D. is supported by NHMRC’s CJ Martin Fellowship (grant no. APP1161356). G.-H.M. is the recipient of an Australian Research Council Discovery Early Career Award (project no. DE220101226) funded by the Australian Government and supported by the Research Council of Norway (project grant no. 325640). D.C. is supported by the Ragnar Söderberg Foundation (grant no. E42/15), D.C. and D.J.B. by Open Philanthropy (grant no. 010623-00001 to D.J.B.) and D.J.B. by the National Institute on Aging/National Institutes of Health (grant nos R24-AG065184 and R01-AG042568). D.M.E. is supported by an Australian NHMRC Investigator Award (no. 2017942). Additional acknowledgements are provided in Supplementary Information.
Author information
Contributions
P.M.V. and L.Y. conceptualized and jointly supervised the study. J.S. conducted statistical analyses (and meta-analyses) of UKB, QIMR, GS, LL and EBB data with assistance or guidance from P.M.V., L.Y. and K.E.K. G.-H.M. and B.B. performed linkage analyses using data from the HUNT study. B.C.-D., A.C., C.H., S.G., A.A., R.W., I.M.N., R.M., B.O.A. and L.B. prepared data in the respective cohorts. D.J.B., D.C., D.M.E., M.E.G. and C.S.H. contributed through suggestions and comments on study design, methods, analyses and their interpretation. D.P., S.E.M., N.G.M., H.S., A.M., K.H. and B.B. contributed to data collection, data management and scientific leadership of the respective cohorts. L.Y., P.M.V. and J.S. wrote the manuscript with the participation of all authors. All the authors approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Luke O’Connor, Daniel Weeks and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Observed and theoretically predicted statistics for locus-specific linkage analysis.
a, Observed and predicted mean linkage (χ²) test statistics for height and BMI. Error bars indicate standard errors (s.e.), calculated as the standard deviation of locus-specific statistics divided by the square root of the effective number of independent markers, that is, ~94 (Supplementary Table 8). Circle size is proportional to sample size. Theoretically predicted values are based on the REML estimates of heritability from genome-wide IBD regression (\(\widehat{{h}_{FS}^{2}}\)) and the observed correlation between siblings. b, Observed proportion of loci with positive estimated linkage (bars and values) and its theoretically predicted value (black rectangles ± s.e.; Methods). The dotted horizontal line represents the proportion (that is, 0.5) expected in the absence of a genetic contribution to the trait. Data are shown for Generation Scotland (GS, number of quasi-independent sib-pairs (n) = 8,368), the Queensland Institute of Medical Research cohort (QIMR, n = 12,844), the Lifelines Cohort (LL, n = 16,581), the UK Biobank (UKB, n = 21,756), the Estonian Biobank (EBB, n = 25,333), the HUNT study (HUNT, n = 34,575) and the meta-analysis combining all cohorts (META, n = 119,457). Numerical values for the mean and median χ² and the proportion of χ² > 0 are presented in Supplementary Table 7a.
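The error bars in panel a follow a simple recipe: the standard deviation of locus-specific statistics divided by the square root of the effective number of independent markers (~94 here). A minimal Python sketch of that summary, together with the proportion of positive statistics shown in panel b, is below. It assumes the locus-specific statistics are available as a plain list and, because the variance components were left unconstrained, that they can take either sign. The function name is hypothetical, and the theoretical predictions based on \(\widehat{{h}_{FS}^{2}}\) are not reproduced.

```python
import math
import statistics


def summarize_linkage(stats, m_eff=94.0):
    """Genome-wide summary of locus-specific linkage statistics.

    stats : locus-specific linkage test statistics (signed; hypothetical input)
    m_eff : effective number of independent markers used to scale the s.e.
    """
    sd = statistics.stdev(stats)  # sample SD across loci
    return {
        "mean": statistics.fmean(stats),
        "median": statistics.median(stats),
        # s.e. of the mean = SD / sqrt(effective number of independent markers)
        "se": sd / math.sqrt(m_eff),
        # Expected to be ~0.5 when the trait has no genetic contribution
        "prop_positive": sum(s > 0 for s in stats) / len(stats),
    }


if __name__ == "__main__":
    print(summarize_linkage([1.0, 2.0, 3.0, -1.0], m_eff=4))
```

Scaling by the effective rather than the actual number of tested loci accounts for neighbouring linkage statistics being strongly correlated, so they carry far fewer independent pieces of information than there are grid positions.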
Extended Data Fig. 2 Effect of polygenicity and sample size of linkage studies on the correlation between predicted and observed linkage signals in simulated data.
Results are shown for eight simulated genetic architectures (polygenicity = 0.1–100%) with a genome-wide h² = 1. a,b, Observed and predicted linkage signals (measured as variance explained) on chromosomes 1 (a) and 22 (b) for one simulation replicate. The simulated causal variants are depicted as green stars. The predicted signal, estimated as a weighted sum of simulated effects (Methods, equation (1)), is depicted by the black curve. The grey and yellow lines show the observed linkage signal from the analysis of 20,000 and 100,000 simulated sib-pairs, respectively, where the phenotypes were simulated using the same causal variants (green stars). The correlations \(\hat{\phi }\) in each polygenicity panel are the chromosome-wide estimates for each linkage sample size (n = 20,000 and n = 100,000). c, Summary of results across 100 replicates. \(\hat{\phi }\) was estimated per chromosome on a 0.5-cM grid, and a chromosome-length-weighted average was then calculated for each replicate. Each symbol represents the mean across 100 simulation replicates, and error bars show the standard deviation across replicates. The enlarged left-most symbols in each polygenicity panel indicate that the true simulated SNP effects were used to predict the linkage signal, that is, the expected prediction accuracy of polygenic scores (\({R}_{g}^{2}\)) built from these causal variants = 1. To approximate estimation errors of SNP effects in a GWAS of finite sample size, \(\hat{\phi }\) was also calculated using causal variants with \({R}_{g}^{2}\) < 1 (regular symbols). For numeric values, see Supplementary Table 9. Estimated variance components were left unconstrained to ensure unbiasedness; therefore, if a region of the genome does not explain any genetic variation, 50% of its estimates are expected to be negative.
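Equation (1) itself is not reproduced in this excerpt, so the Python below is a sketch under a stated assumption rather than the study's formula. It weights each causal variant's variance by (1 − 2θ)², the correlation between full siblings' IBD proportions at two loci separated by recombination fraction θ, with θ taken from Haldane's map function; the paper's equation (1) may use different weights. It then correlates predicted and observed signals on a 0.5-cM grid and forms the chromosome-length-weighted average described for panel c. Function names and toy inputs are hypothetical.

```python
import math


def attenuation(distance_cm):
    """Fraction (1 - 2*theta)**2 of a causal variant's variance seen at a marker.

    Haldane map: theta = (1 - exp(-2d)) / 2 with d in Morgans, so
    1 - 2*theta = exp(-2d) and the squared factor is exp(-4d).
    """
    d_morgans = abs(distance_cm) / 100.0
    return math.exp(-4.0 * d_morgans)


def predicted_linkage(grid_cm, causal_cm, causal_var):
    """Weighted sum of causal variances at each grid position (assumed weights)."""
    return [
        sum(v * attenuation(g - c) for c, v in zip(causal_cm, causal_var))
        for g in grid_cm
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def length_weighted_mean(per_chrom_phi, chrom_lengths_cm):
    """Chromosome-length-weighted average of per-chromosome correlations."""
    total = sum(chrom_lengths_cm)
    return sum(p * w for p, w in zip(per_chrom_phi, chrom_lengths_cm)) / total


if __name__ == "__main__":
    grid = [i * 0.5 for i in range(201)]  # 0.5-cM grid over a toy 100-cM chromosome
    pred = predicted_linkage(grid, causal_cm=[20.0, 70.0], causal_var=[0.01, 0.005])
    # Toy "observed" signal: the prediction plus a small deterministic wiggle.
    obs = [p + 0.001 * math.sin(g) for p, g in zip(pred, grid)]
    print(round(pearson(pred, obs), 3))
    print(length_weighted_mean([0.6, 0.4], [250.0, 60.0]))
```

Under this weighting, a causal variant's contribution decays smoothly over tens of centimorgans, which is why many small, nearby effects can add up to a broad linkage peak.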
Extended Data Fig. 3 Colocalization between GWAS-predicted and observed linkage signals for traits adjusted for polygenic scores (PGS).
a, Correlation between observed linkage signals for PGS-adjusted height and predicted linkage signals from 12,010 height-associated SNPs. b, Correlation between observed linkage signals for PGS-adjusted BMI and predicted linkage signals from 787 BMI-associated SNPs. Height was adjusted using a PGS based on the same 12,010 height-associated SNPs (explaining 38% of height variance), whereas BMI was adjusted using a PGS including 4,582 SNPs (explaining 9% of BMI variance). The x axis in each panel displays the correlation (\(\hat{\phi }\)) between observed and predicted (from GWAS results; Methods) linkage signals. In each panel, the vertical dashed line represents the correlation between observed linkage signals and those predicted from either the 12,010 height-associated SNPs (a) or the 787 BMI-associated SNPs (b). Predicted linkage signals were also obtained under the null hypothesis (that is, 'the correlation between observed and predicted linkage signals is due to the curvature effect') using 1,000 draws of random SNPs with minor allele frequency and linkage disequilibrium properties similar to those of the trait-associated SNPs. The histogram in each panel represents the distribution of correlations under the null between observed linkage for the trait indicated in that panel and predicted linkage obtained from these 1,000 draws. The mean of the correlations obtained under the null hypothesis is denoted \({\hat{\phi }}_{{\rm{CE}}}\). The P values (P) reported in the top-left corner of each panel assess the statistical significance of the difference between \(\hat{\phi }\) and \({\hat{\phi }}_{{\rm{CE}}}\) using a two-sided Wald test. Numeric values are presented in Supplementary Table 10.
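The comparison above pits the observed correlation against a null built from random, frequency- and LD-matched SNP draws. The caption does not say how the Wald standard error was formed, so the Python below is a sketch under an explicit assumption: the s.e. of the difference is taken as the standard deviation of the null correlations, optionally combined with a separately supplied s.e. of \(\hat{\phi }\) treated as independent. The function name and toy inputs are hypothetical.

```python
import math
import random
import statistics


def curvature_effect_test(phi_hat, phi_null, se_phi=0.0):
    """Two-sided Wald-type test of phi_hat against the random-SNP null mean.

    phi_null : correlations from random MAF/LD-matched SNP draws
    se_phi   : optional s.e. of phi_hat itself (assumed independent of the null)
    Returns (phi_CE, z, P).
    """
    phi_ce = statistics.fmean(phi_null)
    var_null = statistics.variance(phi_null)
    se_diff = math.sqrt(var_null + se_phi ** 2)
    z = (phi_hat - phi_ce) / se_diff
    # Two-sided normal P value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return phi_ce, z, p


if __name__ == "__main__":
    rng = random.Random(2024)
    # Toy null: 1,000 correlations from hypothetical random SNP draws.
    null = [rng.gauss(0.30, 0.05) for _ in range(1000)]
    phi_ce, z, p = curvature_effect_test(0.55, null)
    print(f"phi_CE={phi_ce:.3f}, z={z:.2f}, P={p:.2g}")
```

Centering the test on \({\hat{\phi }}_{{\rm{CE}}}\) rather than on zero matters because smooth linkage curves correlate with almost any set of predicted peaks; only the excess over that curvature baseline points to genuine colocalization.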
Extended Data Fig. 4 Correlation between chromosome length and estimates of variance explained from linkage analyses of BMI.
Analyses were based on summary statistics from a linkage meta-analysis of BMI and of BMI adjusted for a polygenic score (PGS). The x axis represents the physical length of each chromosome relative to the total length of the autosomes (~2,879 Mb). The y axis represents the expected variance explained (\(q_{s}^{2}\)) for each chromosome (s = 1–22), estimated as \(q_{s}^{2} = m_{s}\bar{q}^{2}\), where \(\bar{q}^{2}\) is the mean of the locus-specific variance estimates across the chromosome and \(m_{s}\) is the effective number of independent markers on the chromosome (Supplementary Table 8). Error bars around each dot represent \(m_{s}\) times the standard deviation of the linkage estimates across the chromosome. Standard errors (s.e.) of the regression slopes were obtained using a leave-one-chromosome-out jackknife. 95% confidence intervals (CI) were calculated as the slope ± 1.96 × s.e.
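The per-chromosome quantity \(q_{s}^{2} = m_{s}\bar{q}^{2}\) and the jackknife slope can be sketched in a few lines of Python. This assumes an ordinary least-squares slope with an intercept (the caption does not say whether an intercept was fitted) and inputs passed as plain lists, one entry per chromosome; all names and toy values are hypothetical.

```python
import math


def chromosome_variance(m_s, locus_estimates):
    """q_s^2 = m_s * mean of the locus-specific variance estimates on chromosome s."""
    return m_s * sum(locus_estimates) / len(locus_estimates)


def ols_slope(x, y):
    """OLS slope of y on x, fitted with an intercept (an assumption)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def jackknife_slope(rel_length, q2):
    """Slope of q_s^2 on relative chromosome length with a
    leave-one-chromosome-out jackknife s.e. and a 95% CI (inputs: lists)."""
    n = len(rel_length)
    slope = ols_slope(rel_length, q2)
    loo = [
        ols_slope(rel_length[:i] + rel_length[i + 1:], q2[:i] + q2[i + 1:])
        for i in range(n)
    ]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((b - mean_loo) ** 2 for b in loo))
    return slope, se, (slope - 1.96 * se, slope + 1.96 * se)


if __name__ == "__main__":
    rel_len = [0.05 + 0.004 * i for i in range(22)]  # hypothetical relative lengths
    q2 = [0.2 * x + 0.001 * (-1) ** i for i, x in enumerate(rel_len)]  # toy q_s^2
    slope, se, ci = jackknife_slope(rel_len, q2)
    print(f"slope={slope:.3f}, s.e.={se:.4f}, 95% CI={ci[0]:.3f} to {ci[1]:.3f}")
```

Dropping one whole chromosome at a time keeps the correlated loci on a chromosome together, so the jackknife s.e. respects the fact that the 22 chromosomes, not the individual loci, are the independent units.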
Supplementary information
Supplementary Information
Supplementary Figs. 1–8, Tables 1–17, Notes 1–3, Methods, Discussion and references, Acknowledgements and Consortium author list.
Supplementary Code 1
Multiple *.R files and *.Rdata files to generate the corresponding figures. Figure1.R: R script to generate Fig. 1 from Figure1.Rdata. Figure2.R: R script to generate Fig. 2 from Figure2.Rdata. Figure3.R: R script to generate Fig. 3 from Figure3.Rdata. Figure4.R: R script to generate Fig. 4 from Figure4.Rdata. ExtData_Figure1.R: R script to generate Extended Data Fig. 1 from ExtData_Figure1.Rdata. ExtData_Figure2.R: R script to generate Extended Data Fig. 2 from ExtData_Figure2.Rdata. ExtData_Figure3.R: R script to generate Extended Data Fig. 3 from ExtData_Figure3.Rdata. ExtData_Figure4.R: R script to generate Extended Data Fig. 4 from ExtData_Figure4.Rdata.
Supplementary Data
GWAS summary statistics for BMI-associated variants generated in the present study. The list of independent variants was obtained using the GCTA-COJO algorithm.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sidorenko, J., Couvy-Duchesne, B., Kemper, K.E. et al. Genetic architecture reconciles linkage and association studies of complex traits. Nat Genet 56, 2352–2360 (2024). https://doi.org/10.1038/s41588-024-01940-2
DOI: https://doi.org/10.1038/s41588-024-01940-2