Abstract
In large discrete data sets that require classification into signal and noise components, the distribution of the signal is often very bumpy and does not follow a standard distribution. The signal distribution is therefore commonly modelled as a mixture of component distributions. However, modelling the signal component as a mixture raises two challenges: justifying the number of components, and the label switching problem (caused by multi-modality of the likelihood function). To circumvent these challenges, we propose a non-parametric structure for the signal component. The new method is more efficient, yielding more precise estimates and better classifications. We demonstrate the efficacy of the methodology using a ChIP-sequencing data set.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Alhaji, B.B., Dai, H., Hayashi, Y., Vinciotti, V., Harrison, A., Lausen, B. (2016). Analysis of ChIP-seq Data Via Bayesian Finite Mixture Models with a Non-parametric Component. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_43
DOI: https://doi.org/10.1007/978-3-319-25226-1_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25224-7
Online ISBN: 978-3-319-25226-1
eBook Packages: Mathematics and Statistics (R0)