Analysis of ChIP-seq Data Via Bayesian Finite Mixture Models with a Non-parametric Component

Baba B. Alhaji,
Hongsheng Dai,
Yoshiko Hayashi,
Veronica Vinciotti,
Andrew Harrison &
Berthold Lausen

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In large discrete data sets that require classification into signal and noise components, the distribution of the signal is often highly irregular ("bumpy") and does not follow a standard distribution. The signal distribution is therefore often modelled as a mixture of component distributions. However, modelling the signal component as a mixture raises two challenges: justifying the number of components, and the label switching problem caused by the multi-modality of the likelihood function. To circumvent these challenges, we propose a non-parametric structure for the signal component. The new method is more efficient, giving more precise estimates and better classifications. We demonstrate its efficacy on a ChIP-sequencing data set.
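The abstract does not spell out the model, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' method. It assumes a Poisson noise component, a signal component given by an unrestricted probability mass function `f` on the counts `c, c+1, ..., max(y)` with a Dirichlet prior (one simple "non-parametric" choice), and a hypothetical threshold `c` below which counts are treated as noise. The function name `gibbs_noise_signal`, the threshold, the priors and the synthetic data are all illustrative.

```python
import numpy as np
from math import lgamma


def gibbs_noise_signal(y, c, n_iter=2000, burn_in=500, a=1.0, b=1.0,
                       alpha=1.0, a_pi=1.0, b_pi=1.0, seed=0):
    """Gibbs sampler for a HYPOTHETICAL two-group model
        y_i ~ (1 - pi) * Poisson(lam) + pi * f,
    where f is an unrestricted pmf on the counts c, c+1, ..., max(y).
    Returns the posterior signal probability of each count and the
    posterior means of lam and pi."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=int)
    if c > y.max():
        raise ValueError("threshold c must not exceed max(y)")
    K = y.max() - c + 1                       # size of the signal support
    in_supp = y >= c                          # counts the signal may explain
    log_fact = np.array([lgamma(v + 1) for v in y])

    lam, pi, f = max(float(np.median(y)), 0.5), 0.5, np.full(K, 1.0 / K)
    prob_sum, lam_sum, pi_sum = np.zeros(y.size), 0.0, 0.0

    for it in range(n_iter):
        # 1. P(z_i = signal | lam, pi, f), computed stably in log space.
        log_noise = np.log1p(-pi) + y * np.log(lam) - lam - log_fact
        log_signal = np.full(y.size, -np.inf)
        log_signal[in_supp] = np.log(pi) + np.log(f[y[in_supp] - c])
        p_signal = np.exp(log_signal - np.logaddexp(log_noise, log_signal))
        z = rng.random(y.size) < p_signal     # True = allocated to signal

        # 2. Noise rate: Gamma(shape a, rate b) prior is conjugate to Poisson.
        lam = rng.gamma(a + y[~z].sum(), 1.0 / (b + (~z).sum()))
        # 3. Signal pmf: Dirichlet(alpha) prior is conjugate to the categorical.
        f = rng.dirichlet(alpha + np.bincount(y[z] - c, minlength=K))
        # 4. Signal weight: Beta(a_pi, b_pi) prior.
        pi = rng.beta(a_pi + z.sum(), b_pi + (~z).sum())

        if it >= burn_in:
            prob_sum += p_signal              # Rao-Blackwellised average
            lam_sum += lam
            pi_sum += pi

    m = n_iter - burn_in
    return prob_sum / m, lam_sum / m, pi_sum / m


# Synthetic illustration only: Poisson(2) noise plus a "bumpy" two-peaked signal.
data_rng = np.random.default_rng(1)
noise = data_rng.poisson(2.0, size=900)
signal = np.concatenate([data_rng.poisson(20.0, 60), data_rng.poisson(35.0, 40)])
y = np.concatenate([noise, signal])
prob, lam_hat, pi_hat = gibbs_noise_signal(y, c=10)
print(f"noise rate ~ {lam_hat:.2f}, signal weight ~ {pi_hat:.3f}")
```

In this sketch the two components have different functional forms, so swapping their labels changes the likelihood and the sampler has no label symmetry to undo. The two-peaked signal is absorbed by `f` without choosing a number of signal sub-components.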

Author information

Authors and Affiliations

Department of Mathematical Sciences, University of Essex, Colchester, UK
Baba B. Alhaji, Hongsheng Dai, Yoshiko Hayashi, Veronica Vinciotti, Andrew Harrison & Berthold Lausen

Corresponding author

Correspondence to Baba B. Alhaji.

Editor information

Editors and Affiliations

Jacobs University Bremen, Bremen, Germany
Adalbert F.X. Wilhelm
Institute of Medical Systems Biology, Universität Ulm, Ulm, Baden-Württemberg, Germany
Hans A. Kestler

About this paper

Cite this paper

Alhaji, B.B., Dai, H., Hayashi, Y., Vinciotti, V., Harrison, A., Lausen, B. (2016). Analysis of ChIP-seq Data Via Bayesian Finite Mixture Models with a Non-parametric Component. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_43

DOI: https://doi.org/10.1007/978-3-319-25226-1_43
Published: 04 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25224-7
Online ISBN: 978-3-319-25226-1
eBook Packages: Mathematics and Statistics (R0)