Tag SNP selection based on clustering according to dominant sets found using replicator dynamics

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

Tag SNP selection is an important problem in genetic association studies. A class of algorithms for this task, among them a popular tool called Tagger, can be described as searching for a minimal vertex cover of a graph. In this article, this approach is contrasted with a recently introduced clustering algorithm based on the graph-theoretical concept of dominant sets. To compare the performance of both procedures, comprehensive simulation studies were performed using SNP data from the ten ENCODE regions included in the HapMap project. Quantitative traits were simulated from additive models with a single causative SNP. Simulation results suggest that clustering always performs at least as well as Tagger, and that it yields a substantial improvement in more than a third of the considered instances. Additionally, an extension of the clustering algorithm is described that can be used for larger genomic data sets.
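To make the idea of dominant sets found by replicator dynamics more concrete, the following is a minimal sketch and not the article's implementation. It assumes a symmetric, non-negative similarity matrix `A` with zero diagonal (for SNPs this could be a pairwise LD measure such as r², which is an assumption here), iterates the standard discrete-time replicator equation x_i ← x_i (Ax)_i / (xᵀAx) from the barycenter of the simplex, and peels off the support of the limit point as one cluster before repeating on the remaining items. The function names, the support threshold `support_eps`, and the choice of the highest-weight member as the cluster's tag are all illustrative assumptions.

```python
def replicator_dynamics(A, tol=1e-12, max_iter=10000):
    """Iterate x_i <- x_i * (A x)_i / (x^T A x), starting from the barycenter."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        avg = sum(x[i] * Ax[i] for i in range(n))  # x^T A x
        if avg <= 0.0:
            break  # no positive similarity: the update is undefined
        x_new = [x[i] * Ax[i] / avg for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if change < tol:
            break
    return x


def dominant_set_clusters(A, support_eps=1e-6):
    """Peel off clusters one at a time; returns a list of (members, tag)."""
    remaining = list(range(len(A)))
    clusters = []
    while remaining:
        sub = [[A[i][j] for j in remaining] for i in remaining]
        if not any(v > 0 for row in sub for v in row):
            # Nothing left is similar to anything else: emit singletons.
            clusters.extend(([i], i) for i in remaining)
            break
        x = replicator_dynamics(sub)
        # The support of the limit point is taken as the cluster.
        members = [remaining[k] for k, w in enumerate(x) if w > support_eps]
        # Assumption: the member with the largest weight serves as the tag.
        tag = remaining[max(range(len(x)), key=x.__getitem__)]
        clusters.append((members, tag))
        remaining = [i for i in remaining if i not in members]
    return clusters


# Toy similarity matrix (made-up values): items 0-2 are strongly
# related to each other, as are items 3-4.
A = [
    [0.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 0.0, 0.85, 0.0, 0.1],
    [0.8, 0.85, 0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1, 0.0, 0.7],
    [0.0, 0.1, 0.0, 0.7, 0.0],
]
clusters = dominant_set_clusters(A)
for members, tag in clusters:
    print("cluster", members, "tag", tag)
```

On this toy matrix the sketch separates items 0, 1, 2 from items 3, 4. The pure-Python loops keep the example dependency-free; for realistic numbers of SNPs a vectorized matrix-vector product would be the natural replacement.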

Author information

Corresponding author

Correspondence to Florian Frommlet.

About this article

Cite this article

Frommlet, F. Tag SNP selection based on clustering according to dominant sets found using replicator dynamics. Adv Data Anal Classif 4, 65–83 (2010). https://doi.org/10.1007/s11634-010-0059-2

  • DOI: https://doi.org/10.1007/s11634-010-0059-2
