Tag SNP selection based on clustering according to dominant sets found using replicator dynamics

Florian Frommlet¹

Abstract

Tag SNP selection is an important problem in genetic association studies. A class of algorithms to perform this task, among them a popular tool called Tagger, can be described as searching for a minimal vertex cover of a graph. In this article, this approach is contrasted with a recently introduced clustering algorithm based on the graph-theoretical concept of dominant sets. To compare the performance of both procedures, comprehensive simulation studies have been performed using SNP data from the ten ENCODE regions included in the HapMap project. Quantitative traits have been simulated from additive models with a single causative SNP. Simulation results suggest that clustering always performs at least as well as Tagger, while in more than a third of the considered instances a substantial improvement can be observed. Additionally, an extension of the clustering algorithm is described which can be used for larger genomic data sets.
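To make the idea of clustering by dominant sets more concrete, the following is a minimal sketch and not the implementation evaluated in the article. It assumes a symmetric, non-negative similarity matrix between SNPs with zero diagonal (the toy values stand in for pairwise r² and are invented for illustration), runs discrete-time replicator dynamics from the barycenter of the simplex, takes the support of the limit point as one cluster, and then repeats on the remaining SNPs. Choosing the highest-weighted SNP of each cluster as its tag, as well as the names `replicator_dynamics`, `dominant_set_clusters`, and `support_cutoff`, are assumptions of this sketch.

```python
def replicator_dynamics(A, max_iter=1000, tol=1e-10):
    """Iterate x_i <- x_i * (A x)_i / (x^T A x), starting from the barycenter."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        xAx = sum(x[i] * Ax[i] for i in range(n))
        if xAx <= 0.0:
            # No similarity among the current vertices: nothing to update.
            break
        new_x = [x[i] * Ax[i] / xAx for i in range(n)]
        change = sum(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if change < tol:
            break
    return x


def dominant_set_clusters(A, support_cutoff=1e-4):
    """Peel off one cluster at a time; return a list of (tag, members) pairs."""
    remaining = list(range(len(A)))
    clusters = []
    while remaining:
        sub = [[A[i][j] for j in remaining] for i in remaining]
        if all(value == 0 for row in sub for value in row):
            # Assumption of this sketch: unrelated SNPs become singleton clusters.
            clusters.extend((v, [v]) for v in remaining)
            break
        x = replicator_dynamics(sub)
        # The support of the limit point is taken as the cluster.
        members = [remaining[k] for k, w in enumerate(x) if w > support_cutoff]
        # Assumption of this sketch: the highest-weighted member is the tag.
        tag = remaining[max(range(len(x)), key=lambda k: x[k])]
        clusters.append((tag, members))
        remaining = [v for v in remaining if v not in members]
    return clusters


# Hypothetical similarity matrix: SNPs 0-2 and SNPs 3-4 form two tight groups.
toy_similarity = [
    [0.00, 0.90, 0.80, 0.10, 0.00],
    [0.90, 0.00, 0.85, 0.05, 0.10],
    [0.80, 0.85, 0.00, 0.00, 0.05],
    [0.10, 0.05, 0.00, 0.00, 0.95],
    [0.00, 0.10, 0.05, 0.95, 0.00],
]

if __name__ == "__main__":
    for tag, members in dominant_set_clusters(toy_similarity):
        print("tag SNP", tag, "represents cluster", members)
```

On the toy matrix the sketch separates the two groups and proposes one tag SNP per group. The cutoff that decides which weights count as part of the support is a tuning choice here, and a real analysis would have to decide how to derive the similarity matrix from linkage disequilibrium estimates.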

Author information

Authors and Affiliations

Department of Statistics, University Vienna, 1010, Vienna, Austria
Florian Frommlet

Corresponding author

Correspondence to Florian Frommlet.

About this article

Cite this article

Frommlet, F. Tag SNP selection based on clustering according to dominant sets found using replicator dynamics. Adv Data Anal Classif 4, 65–83 (2010). https://doi.org/10.1007/s11634-010-0059-2

Received: 22 August 2009
Revised: 31 January 2010
Accepted: 18 February 2010
Published: 18 March 2010
Issue Date: April 2010
DOI: https://doi.org/10.1007/s11634-010-0059-2
