Abstract
In this study, a hybrid scheme that combines a Gaussian mixture model (GMM) with the k-means approach, called GMM-kmeans, is proposed for automatic audio segmentation (AAS) of popular music. The structure of a popular song generally consists of verse, chorus, and non-repetitive segments (such as the intro, bridge, and outro). The GMM-kmeans scheme consists mainly of two developed algorithms, GMMAAS and switching frame search (SFS), which together efficiently divide a song into these three parts. In GMM-kmeans, a GMM classifier first recognizes the vocal segments and determines the section boundaries between them and the non-repetitive sections. The vocal segments extracted by the GMM, which contain only the verse and chorus sections, are then analyzed by the k-means clustering algorithm, which further discriminates the verse section from the chorus section. In this verse/chorus classification step, the SFS algorithm, which uses a verse group-of-frames (Verse-GoF) and a chorus group-of-frames (Chorus-GoF), accurately estimates the separation boundary between the verse and chorus sections. Experimental results on a data set of Chinese popular songs show the superiority of both the proposed GMMAAS and SFS algorithms.
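The abstract describes a two-stage pipeline (GMM vocal detection, then k-means plus a switching-frame search) without implementation details. The sketch below is a minimal, hypothetical illustration under stated assumptions, not the author's GMMAAS or SFS. It assumes: frames are already feature vectors (e.g., MFCCs); the vocal and non-vocal GMMs are pre-trained with diagonal covariances and passed in as `(weight, mean, variance)` tuples; the vocal region runs from the first to the last frame classified as vocal; the verse is the cluster of the first vocal frame; and SFS is read as choosing the frame that best separates a Verse-GoF of `gof` frames before it from a Chorus-GoF of `gof` frames after it. All function names and the `gof` parameter are illustrative.

```python
import math
import random


def diag_gauss_logpdf(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at feature vector x.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_loglik(x, gmm):
    # gmm: list of (weight, mean, var) components; combined with log-sum-exp.
    logs = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(s - top) for s in logs))


def classify_vocal(frames, vocal_gmm, nonvocal_gmm):
    # Per-frame maximum-likelihood decision between the two class GMMs.
    return [gmm_loglik(f, vocal_gmm) > gmm_loglik(f, nonvocal_gmm) for f in frames]


def kmeans(points, k=2, iters=50, seed=0):
    # Plain Lloyd's k-means; returns one cluster label per point.
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def switching_frame_search(labels, gof=4):
    # Hypothetical SFS reading: score every candidate switching frame t by how well
    # the gof frames before t (Verse-GoF) carry the verse label and the gof frames
    # from t onward (Chorus-GoF) carry the other label; keep the best t.
    if len(labels) < 2 * gof:
        raise ValueError("need at least 2 * gof vocal frames")
    verse = labels[0]  # assumption: the vocal part opens with a verse
    best_t, best_score = gof, -1.0
    for t in range(gof, len(labels) - gof + 1):
        hits = sum(lab == verse for lab in labels[t - gof:t])
        hits += sum(lab != verse for lab in labels[t:t + gof])
        score = hits / (2 * gof)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def segment(frames, vocal_gmm, nonvocal_gmm, gof=4):
    # Stage 1: GMM vocal detection; vocal region = first..last vocal frame (assumed).
    is_vocal = classify_vocal(frames, vocal_gmm, nonvocal_gmm)
    vocal_idx = [i for i, v in enumerate(is_vocal) if v]
    # Stage 2: 2-means clustering of the vocal frames, then the switching-frame search.
    labels = kmeans([frames[i] for i in vocal_idx], k=2)
    t = switching_frame_search(labels, gof)
    # Returns (vocal start, verse-to-chorus switch, vocal end) as frame indices.
    return vocal_idx[0], vocal_idx[t], vocal_idx[-1] + 1


if __name__ == "__main__":
    # Toy 2-D "features": intro/outro near (0, 0), verse near (5, 0), chorus near (5, 5).
    frames = [(0.0, 0.0)] * 4 + [(5.0, 0.0)] * 8 + [(5.0, 5.0)] * 8 + [(0.0, 0.0)] * 4
    vocal = [(0.5, (5.0, 0.0), (1.0, 1.0)), (0.5, (5.0, 5.0), (1.0, 1.0))]
    nonvocal = [(1.0, (0.0, 0.0), (1.0, 1.0))]
    print(segment(frames, vocal, nonvocal, gof=4))  # prints (4, 12, 20)
```

The sketch uses only the standard library so it runs without dependencies; in practice the class GMMs would be trained on labeled vocal and non-vocal audio, and a library implementation of GMMs and k-means would normally replace the hand-written versions.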
Acknowledgments
This research is partially supported by the National Science Council (NSC) in Taiwan under grant NSC 101-2221-E-150-084.
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Ding, IJ. (2013). A Method of Combining Gaussian Mixture Model and K-Means for Automatic Audio Segmentation of Popular Music. In: Kim, K., Chung, KY. (eds) IT Convergence and Security 2012. Lecture Notes in Electrical Engineering, vol 215. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5860-5_94
DOI: https://doi.org/10.1007/978-94-007-5860-5_94
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-5859-9
Online ISBN: 978-94-007-5860-5
eBook Packages: Engineering, Engineering (R0)