Key Points
- Studies of gene–environment (G×E) interactions can be useful for investigating biological pathways, and can reveal genes that act only in particular environments or exposures that are hazardous only to genetically susceptible individuals. Such knowledge can be used for setting environmental safety standards, understanding heterogeneity in genetic associations across populations, predicting the risks and changes to an individual that might result from changes in modifiable risk factors, and choosing the best treatment based on a patient's genotype.
- Basic epidemiological cohort or case–control designs can be used for studying G×E interactions, but more powerful alternatives include case-only, two-phase case–control and counter-matched designs. Case-only substudies within clinical trials are attractive for studying genetic modifiers of treatment response because genotype and treatment can be assumed to be independent through randomization.
- Various exploratory and hypothesis-driven approaches are available for examining the joint effects of multiple genes and exposures in a common pathway. Hierarchical models provide a way to incorporate external knowledge about the pathway into the analysis of complex interactions in the study data.
- Two-step analyses can be used in genome-wide association studies to target a subset of promising interactions and improve the power for testing them in the same data set using an independent test. New methods are being developed that use pathway information to guide the search for novel genes and interactions or that mine agnostic genome scans for novel pathways.
- Comprehensive ontologies that incorporate environmental and toxicological information into genomic and pathway databases will be useful for informing future analyses of complex G×E interactions in both pathway-driven and genome-wide association scans.
- Emerging areas include understanding how the environment influences gene expression through epigenetics, somatic mutations and other mechanisms, and understanding the roles of these effects in disease causation. Various types of biomarkers and high-volume metabolomics methods can be incorporated as intermediate variables in pathway-based analysis methods.
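As an illustration of the case-only design mentioned in the Key Points, the following is a minimal sketch, not the Review's own method. It assumes a binary genotype and a binary exposure, that the two are independent in the source population (as randomization would guarantee for treatment in a trial), and that the disease is rare. Under those assumptions, the genotype–exposure odds ratio among cases alone estimates the multiplicative G×E interaction odds ratio. The function name and the counts are hypothetical and invented purely for illustration.

```python
import math

def case_only_interaction_or(cases):
    """Case-only estimate of the multiplicative GxE interaction odds ratio.

    cases[g][e] is the number of cases with genotype g (0 = non-carrier,
    1 = carrier) and exposure e (0 = unexposed, 1 = exposed).
    Assumes G and E are independent in the source population.
    Returns the point estimate and a Wald-type 95% confidence interval.
    """
    a, b = cases[1][1], cases[1][0]   # carriers: exposed, unexposed
    c, d = cases[0][1], cases[0][0]   # non-carriers: exposed, unexposed
    or_hat = (a * d) / (b * c)
    # Standard error of the log odds ratio for a 2x2 table (Woolf's formula)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - 1.96 * se_log)
    upper = math.exp(math.log(or_hat) + 1.96 * se_log)
    return or_hat, (lower, upper)

# Hypothetical case counts (illustrative only, not from any study)
cases = [[200, 100],   # non-carriers: unexposed, exposed
         [50, 60]]     # carriers:     unexposed, exposed
or_hat, ci = case_only_interaction_or(cases)
print(f"interaction OR = {or_hat:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

No controls enter the calculation, which is the source of the design's efficiency; the price is that any genotype–exposure association in the source population would bias the estimate.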
Abstract
Despite the yield of recent genome-wide association (GWA) studies, the identified variants explain only a small proportion of the heritability of most complex diseases. This unexplained heritability could be partly due to gene–environment (G×E) interactions or more complex pathways involving multiple genes and exposures. This Review provides a tutorial on the available epidemiological designs and statistical analysis approaches for studying specific G×E interactions and choosing the most appropriate methods. I discuss the approaches that are being developed for studying entire pathways and available techniques for mining interactions in GWA data. I also explore methods for marrying hypothesis-driven pathway-based approaches with 'agnostic' GWA studies.
Garcia-Closas, M. et al. NAT2 slow acetylation, GSTM1 null genotype, and risk of bladder cancer: results from the Spanish Bladder Cancer Study and meta-analyses. Lancet 366, 649–659 (2005).
Dearfield, K. L., Benson, W. H., Gallagher, K. & Johnson, J. D. in Genomics and Environmental Regulation: Science, Ethics, and Law (eds Sharp, R. R., Marchant, G. E. & Grodsky, J. A.) 25–34 (Johns Hopkins Univ. Press, Baltimore, 2009).
Lympany, P. A. et al. HLA-DPB polymorphisms: Glu 69 association with sarcoidosis. Eur. J. Immunogenet. 23, 353–359 (1996).
Jacobi, C. E., Nagelkerke, N. J., van Houwelingen, J. H. & de Bock, G. H. Breast cancer screening, outside the population-screening program, of women from breast cancer families without proven BRCA1/BRCA2 mutations: a simulation study. Cancer Epidemiol. Biomarkers Prev. 15, 429–436 (2006).
Ulrich, C. M. & Potter, J. D. Folate supplementation: too much of a good thing? Cancer Epidemiol. Biomarkers Prev. 15, 189–193 (2006).
Acknowledgements
Supported in part by grants 5P30 ES007043, 1U01 ES15090 and 1R01 ES016813 from the US National Institute of Environmental Health Sciences. The author is grateful to D. Conti, W. J. Gauderman, F. Gilliland and R. Haile for many helpful suggestions.
Ethics declarations
Competing interests
The author declares no competing financial interests.
Supplementary information
Supplementary Figure S1
Sample-size requirements for gene–environment interactions. (PDF 240 kb)
Related links
FURTHER INFORMATION
Human Genome Epidemiology Network (HuGENet)
Nature Reviews Genetics series on Genome-wide association studies
Glossary
- Marginal effects
-
The effects of a specific risk factor (gene or exposure) in the population as a whole, averaging over all other variables.
- Genome-wide association study
-
A scan of the entire genome for association with a disease or trait using a standard panel of ∼500,000 to 1 million haplotype-tagging SNPs.
- Gene–environment-wide interaction study
-
A scan of the entire genome for interactions with various environmental exposures.
- Ecologic-level study
-
An observational epidemiology study that relies on comparisons of aggregate disease rates across groups in relation to aggregate exposure information rather than comparisons between individuals.
- Interaction odds ratio
-
The ratio of odds ratios for the relationship of one factor (for example, a gene) with disease across the levels of another factor (for example, an environmental exposure); as such, it is a measure of departure from a multiplicative joint effect.
- Confounder
-
A variable that induces a spurious association between a risk factor (a gene, exposure or interaction) and disease because it is associated with both the risk factor and, independently of the risk factor, the disease. Confounding can also distort the magnitude of the association of a true risk factor with disease or mask it.
- Gene–environment independence
-
The independent distribution of genotype and environment in the source population.
- Empirical Bayes
-
A technique for estimating the effects of each component of a large ensemble of related variables by assuming the ensemble has some common distribution and estimating the parameters of that distribution. Empirical Bayes estimators typically have better prediction error than estimating each one separately.
- Bayes model averaging
-
A technique for accounting for uncertainty about the correct model form (for example, the selection of variables to include in a multiple regression model) by averaging the effects of each possible variable over the set of all plausible models.
- Modified segregation analysis
-
This analysis applies likelihood-based methods to data from a pedigree in which one or more members have genotypes available at a major gene. It derives the genotypes of untyped individuals by summing their conditional genotype probabilities using the genotypes available.
- Population stratification
-
The phenomenon of an apparently homogeneous population that is actually composed of subgroups of individuals with distinct ancestral origins and differing allele frequencies at many loci. This leads to bias in the assessment of the significance of associations of a trait with particular loci.
- Joint segregation and linkage analysis
-
The use of family studies to estimate the parameters of a penetrance model. The parameters could include interactions between the unobserved major gene, which is linked to a marker, and environmental factors.
- Multiple regression
-
A standard statistical technique for relating a single outcome variable to multiple explanatory variables, either all at once or using some variable selection method, such as stepwise forward selection or backward elimination.
- Machine learning
-
Any of many data analysis techniques, derived from the computer science field, for mining large data sets. The techniques are not specifically based on mathematical statistics theory.
- Pattern recognition
-
Any technique from exploratory data analysis or machine learning for discovering non-random patterns in large data sets.
- First-level coefficients
-
In a hierarchical model, the regression coefficients (for example, log relative risks for each variable) for the subject-level data on the association between risk factors and disease. Unlike a non-hierarchical model, these coefficients are treated as random variables with distributions described in the higher level(s) of the model rather than as model parameters to be estimated directly.
- Pathway indicator variables
-
Various types of information that can be used as predictor variables in the higher levels of a hierarchical model, specifically binary variables that indicate whether a particular gene or interaction has a role in a particular pathway.
- Ontology
-
A formal system for organizing knowledge, here used in the context of biological pathways as a means of synthesizing information about the function of genes and exposures and their joint roles in disease causation.
- Reverse causation
-
A bias in the estimation of the causal effect of a biomarker on disease when biospecimens are obtained after diagnosis. The bias occurs because the disease or its treatment alters the underlying intermediate variable or the measurement of it.
- Mendelian randomization
-
A technique for studying the relationship between a biomarker and disease indirectly by studying the relationship of each to a gene that influences the biomarker.
- Instrumental variable
-
In statistics, a variable that can be used to predict the value of an explanatory variable that is measured with error. The instrumental variable thereby indirectly yields an unbiased estimate of the relationship of the explanatory variable with an outcome variable.
- Multiple comparisons penalty
-
The higher degree of statistical significance that is required for a particular association to be considered noteworthy when many possible associations are analysed simultaneously. Several adjustment methods can take account of this penalty, the best known of which is the Bonferroni correction.
- Bonferroni correction
-
A multiple comparisons adjustment for testing at a conventional significance level. It is based on multiplying the p value for a specific test by the total number of tests performed, and approximately controls the overall type I error rate (the probability of at least one false positive association) at the chosen significance level if the predictors are independent.
- DNA bar-coding
-
The addition of a unique molecular tag to each fragment of an individual's DNA so that after pooling with other DNA samples, the genotype of each individual in the pool can be reconstructed.
- Coherence
-
The extent to which the data at hand are concordant with other types of biological knowledge, thereby reinforcing a causal interpretation.
- False discovery rate
-
The proportion of all reported positive associations that are expected to be false positives. Controlling this rate can be used to judge which of many associations are noteworthy.
- Bayesian network analysis
-
A technique for developing a minimal graphical representation of the connections among a large set of variables by examining the conditional independence relationships among pairs of variables given the other variables connected to them within the graph. This technique has been widely used for the analysis of gene co-expression data.
- Challenge studies
-
Various experimental designs for assessing the effects of a noxious agent by exposing individuals to trace amounts in a controlled setting (as in a randomized or crossover trial). For gene–environment interaction studies, the effects can be compared across subgroups with different genotypes, and the efficiency can be improved by stratified sampling based on genotype.
- Latent variable models
-
Models involving one or more unobservable intermediate variables that represent the pathway connecting a cause (for example, exposures and genotypes) to an effect (for example, disease). Identifying the pathways typically requires the use of surrogates for the latent variables (for example, biomarkers) in addition to the observable cause and effect variables.
- 1000 Genomes Project
-
A large-scale effort to obtain and catalogue the full genome-wide DNA sequence of 1,000 individuals selected from a range of races.
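Two of the glossary entries above, the interaction odds ratio and the Bonferroni correction, reduce to simple arithmetic. The following is a minimal sketch in Python rather than a method taken from the article: the function names and all counts and p values are hypothetical and were invented purely for illustration.

```python
def odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """Gene-disease odds ratio from a single 2x2 table of counts."""
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)


def interaction_odds_ratio(exposed_table, unexposed_table):
    """Ratio of the gene-disease odds ratio in exposed individuals
    to the gene-disease odds ratio in unexposed individuals.

    Each table is a tuple:
    (carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls).
    A value of 1 corresponds to an exactly multiplicative joint effect.
    """
    return odds_ratio(*exposed_table) / odds_ratio(*unexposed_table)


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical counts (not data from any study).
exposed = (40, 20, 30, 60)       # gene-disease odds ratio among the exposed
unexposed = (50, 50, 100, 200)   # gene-disease odds ratio among the unexposed

or_exposed = odds_ratio(*exposed)
or_unexposed = odds_ratio(*unexposed)
ior = interaction_odds_ratio(exposed, unexposed)

# Hypothetical p values from three interaction tests.
adjusted = bonferroni([0.001, 0.02, 0.4])

print(f"OR (exposed) = {or_exposed:.2f}")
print(f"OR (unexposed) = {or_unexposed:.2f}")
print(f"Interaction OR = {ior:.2f}")
print("Bonferroni-adjusted p values:", [round(p, 4) for p in adjusted])
```

An interaction odds ratio above 1 in this sketch indicates a joint effect that is greater than multiplicative. The adjusted p values are compared with the conventional significance level in place of the raw ones.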
Cite this article
Thomas, D. Gene–environment-wide association studies: emerging approaches. Nat Rev Genet 11, 259–272 (2010). https://doi.org/10.1038/nrg2764
DOI: https://doi.org/10.1038/nrg2764