Analysis of ChIP-seq Data Via Bayesian Finite Mixture Models with a Non-parametric Component

  • Conference paper
Analysis of Large and Complex Data

Abstract

In large discrete data sets that require classification into signal and noise components, the distribution of the signal is often very bumpy and does not follow a standard distribution. The signal distribution is therefore usually modelled as a mixture of component distributions. Modelling the signal in this way, however, raises two challenges: justifying the number of components, and the label switching problem (caused by multi-modality of the likelihood function). To circumvent these challenges, we propose a non-parametric structure for the signal component. The new method is more efficient, giving more precise estimates and better classifications. We demonstrate the efficacy of the methodology using a ChIP-sequencing data set.
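
To make the idea concrete, the following is a minimal sketch of one Gibbs sweep for a two-component signal/noise mixture in which the signal is a single free-form probability table rather than a mixture of parametric components. The abstract does not specify the model, so everything here is an assumption made for illustration: the Poisson noise distribution, the Dirichlet prior on the signal table, the Beta(1, 1) and Gamma(1, 1) priors, and all function and variable names. It should not be read as the authors' algorithm. Because the signal is one table, there is no number of signal components to choose and no labels to switch among them.

```python
import math
import random


def poisson_pmf(y, lam):
    """Poisson probability mass at count y (the assumed noise model)."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def signal_posterior(y, weight, lam, q):
    """P(signal | y) when the signal pmf is the free-form table q."""
    s = weight * q[y]
    n = (1.0 - weight) * poisson_pmf(y, lam)
    return s / (s + n)


def gibbs_sweep(counts, z, alpha, rng):
    """Draw weight, lam and q given labels z (1 = signal), then redraw z."""
    k = max(counts) + 1
    n_sig = sum(z)
    n_noise = len(z) - n_sig
    # Assumed Beta(1, 1) prior on the signal weight.
    weight = rng.betavariate(1 + n_sig, 1 + n_noise)
    # Assumed Gamma(shape=1, rate=1) prior on the Poisson noise rate;
    # random.gammavariate takes (shape, scale), hence the reciprocal.
    noise_total = sum(y for y, zi in zip(counts, z) if zi == 0)
    lam = rng.gammavariate(1 + noise_total, 1.0 / (1 + n_noise))
    # Assumed symmetric Dirichlet(alpha) prior on the signal table over
    # 0..k-1, sampled by normalising independent gamma draws.
    tallies = [alpha] * k
    for y, zi in zip(counts, z):
        if zi == 1:
            tallies[y] += 1
    draws = [rng.gammavariate(t, 1.0) for t in tallies]
    total = sum(draws)
    q = [d / total for d in draws]
    # Reallocate each observation to signal or noise.
    new_z = [
        1 if rng.random() < signal_posterior(y, weight, lam, q) else 0
        for y in counts
    ]
    return weight, lam, q, new_z
```

In use, one would start from a rough allocation (for example, labelling high counts as signal), call `gibbs_sweep` repeatedly, and average the sampled labels to estimate each observation's probability of being signal. A fully free-form signal table can also absorb part of the noise, so a practical version would need some constraint or informative prior to keep the two components apart; this sketch does not address that.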

Corresponding author

Correspondence to Baba B. Alhaji.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Alhaji, B.B., Dai, H., Hayashi, Y., Vinciotti, V., Harrison, A., Lausen, B. (2016). Analysis of ChIP-seq Data Via Bayesian Finite Mixture Models with a Non-parametric Component. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_43
