A Method of Combining Gaussian Mixture Model and K-Means for Automatic Audio Segmentation of Popular Music

  • Conference paper
IT Convergence and Security 2012

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 215))

Abstract

This study proposes GMM-kmeans, a hybrid scheme that combines a Gaussian mixture model (GMM) with k-means clustering for automatic audio segmentation (AAS) of popular music. A popular song is generally composed of verse, chorus, and non-repetitive segments (such as the intro, bridge, and outro). The GMM-kmeans scheme, built mainly on two developed algorithms, GMMAAS and SFS, efficiently divides a song into these three parts. First, the GMM classifier recognizes the vocal segments and computes the section boundaries between them and the non-repetitive sections. The vocal segments extracted by the GMM, which contain only the verse and chorus sections, are then analyzed by the k-means clustering algorithm to discriminate the verse sections from the chorus sections. In this k-means classification step, the developed switching frame search (SFS) algorithm, which uses verse group-of-frames (Verse-GoF) and chorus group-of-frames (Chorus-GoF) constructs, accurately estimates the boundary separating the verse and chorus sections. Experimental results on a data set of Chinese popular songs show the superiority of both the proposed GMMAAS and SFS algorithms.
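The two-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's GMMAAS or SFS implementation, whose details the abstract does not give. It assumes one scalar feature per frame, hand-set (not EM-trained) mixture parameters, and a simple majority vote over groups of frames as a stand-in for the Verse-GoF/Chorus-GoF boundary search. All function names and parameter values are hypothetical.

```python
import math

def gmm_loglik(x, components):
    """Log-likelihood of scalar x under a 1-D Gaussian mixture.
    components is a list of (weight, mean, std) tuples."""
    p = sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in components)
    return math.log(p + 1e-300)  # small floor avoids log(0)

def kmeans_two(values, iters=20):
    """Two-cluster k-means on scalars; cluster 0 starts at the minimum."""
    cents = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - cents[0]) <= abs(v - cents[1]) else 1
                  for v in values]
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                cents[k] = sum(members) / len(members)
    return labels

def gof_switches(labels, gof):
    """Majority-vote each group of `gof` frames, then return the frame
    offsets where the group label changes (assumed GoF stand-in)."""
    groups = [1 if 2 * sum(labels[i:i + gof]) > len(labels[i:i + gof]) else 0
              for i in range(0, len(labels), gof)]
    return [i * gof for i in range(1, len(groups)) if groups[i] != groups[i - 1]]

def segment_song(frames, vocal_gmm, nonvocal_gmm, gof=4):
    """Stage 1: keep frames the vocal GMM explains better than the non-vocal one.
    Stage 2: cluster the vocal frames into two groups and locate switches."""
    vocal_idx = [i for i, x in enumerate(frames)
                 if gmm_loglik(x, vocal_gmm) > gmm_loglik(x, nonvocal_gmm)]
    labels = kmeans_two([frames[i] for i in vocal_idx])
    # Map switch offsets (within the vocal frames) back to song frame indices.
    boundaries = [vocal_idx[b] for b in gof_switches(labels, gof)]
    return vocal_idx, boundaries

# Toy song: intro, verse (one outlier frame), chorus (one outlier frame), outro.
frames = ([0.1, -0.2, 0.3, 0.0]
          + [5.1, 4.8, 5.3, 4.9, 5.0, 9.2, 5.2, 4.7]
          + [9.1, 8.8, 9.3, 9.0, 8.9, 5.1, 9.2, 9.1]
          + [0.2, -0.1, 0.0, 0.1])
vocal_gmm = [(0.5, 5.0, 1.0), (0.5, 9.0, 1.0)]   # assumed parameters
nonvocal_gmm = [(1.0, 0.0, 1.0)]                 # assumed parameters

vocal_idx, boundaries = segment_song(frames, vocal_gmm, nonvocal_gmm)
print(vocal_idx[0], vocal_idx[-1], boundaries)
```

In this toy data, the group-level vote absorbs the isolated outlier frames, so a single mislabeled frame does not create a spurious verse/chorus boundary. Real systems would use multi-dimensional spectral features and trained mixtures.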

Acknowledgments

This research is partially supported by the National Science Council (NSC) in Taiwan under grant NSC 101-2221-E-150-084.

Author information

Corresponding author

Correspondence to Ing-Jr Ding.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Ding, IJ. (2013). A Method of Combining Gaussian Mixture Model and K-Means for Automatic Audio Segmentation of Popular Music. In: Kim, K., Chung, KY. (eds) IT Convergence and Security 2012. Lecture Notes in Electrical Engineering, vol 215. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5860-5_94

  • DOI: https://doi.org/10.1007/978-94-007-5860-5_94

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-5859-9

  • Online ISBN: 978-94-007-5860-5

  • eBook Packages: Engineering, Engineering (R0)
