Abstract
Tag SNP selection is an important problem in genetic association studies. A class of algorithms for this task, among them the popular tool Tagger, can be described as searching for a minimal vertex cover of a graph. In this article, this approach is contrasted with a recently introduced clustering algorithm based on the graph-theoretical concept of dominant sets. To compare the performance of both procedures, comprehensive simulation studies were performed using SNP data from the ten ENCODE regions included in the HapMap project. Quantitative traits were simulated from additive models with a single causative SNP. Simulation results suggest that clustering always performs at least as well as Tagger, while substantial improvement can be observed in more than a third of the considered instances. Additionally, an extension of the clustering algorithm is described that can be used for larger genomic data sets.
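The clustering side of the comparison can be illustrated with a small sketch of dominant-set extraction by replicator dynamics. Several choices here are assumptions and are not taken from the article. Pairwise r² is used as the similarity matrix. Dominant sets are found with discrete-time replicator dynamics, x_i ← x_i (Ax)_i / (xᵀAx), started at the barycenter of the simplex, and are peeled off one at a time (the general scheme of Pavan and Pelillo). Weights below a hypothetical cut-off `min_weight` count as zero. Each cluster is tagged by the SNP with the largest summed r² to its cluster mates. All function names are hypothetical. This is not the author's implementation.

```python
from typing import List

Matrix = List[List[float]]


def replicator_dynamics(A: Matrix, tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Iterate x_i <- x_i * (A x)_i / (x^T A x), starting from the simplex barycenter."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        fitness = sum(x[i] * Ax[i] for i in range(n))  # x^T A x
        if fitness <= 0.0:
            break  # no positive similarities left to concentrate on
        x_new = [x[i] * Ax[i] / fitness for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if change < tol:
            break
    return x


def dominant_set_clusters(R2: Matrix, min_weight: float = 1e-5) -> List[List[int]]:
    """Peel off dominant sets one at a time until every SNP is assigned to a cluster."""
    remaining = list(range(len(R2)))
    clusters: List[List[int]] = []
    while remaining:
        # Similarities among unassigned SNPs, with zero self-similarity on the diagonal.
        A = [[0.0 if i == j else R2[i][j] for j in remaining] for i in remaining]
        if not any(any(row) for row in A):
            clusters.extend([i] for i in remaining)  # unlinked SNPs become singletons
            break
        x = replicator_dynamics(A)
        # SNPs carrying non-negligible weight form the support, i.e. the dominant set.
        clusters.append([snp for snp, w in zip(remaining, x) if w > min_weight])
        remaining = [snp for snp, w in zip(remaining, x) if w <= min_weight]
    return clusters


def pick_tags(R2: Matrix, clusters: List[List[int]]) -> List[int]:
    """Hypothetical rule: tag each cluster with the SNP of largest summed r^2 to its mates."""
    return [max(c, key=lambda i: sum(R2[i][j] for j in c if j != i)) for c in clusters]


if __name__ == "__main__":
    # Toy r^2 matrix with made-up values: SNPs 0-2 and SNPs 3-4 form two LD groups.
    R2 = [[1.0 if i == j else 0.05 for j in range(5)] for i in range(5)]
    for block, r2 in (([0, 1, 2], 0.9), ([3, 4], 0.8)):
        for i in block:
            for j in block:
                if i != j:
                    R2[i][j] = r2
    clusters = dominant_set_clusters(R2)
    print(clusters, pick_tags(R2, clusters))  # prints [[0, 1, 2], [3, 4]] [0, 3]
```

Because the replicator dynamics never decrease xᵀAx on a symmetric non-negative matrix, weights on SNPs that are only weakly linked to the current group shrink toward zero. What remains is a tightly linked group. Unlike a vertex-cover formulation, this approach partitions all SNPs into clusters before any tags are chosen.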
Cite this article
Frommlet, F. Tag SNP selection based on clustering according to dominant sets found using replicator dynamics. Adv Data Anal Classif 4, 65–83 (2010). https://doi.org/10.1007/s11634-010-0059-2