Abstract
The goal of automatic music segmentation is to compute boundaries between musical parts or sections that are perceived as semantic entities. Such sections are often characterized by specific musical properties such as instrumentation, dynamics, tempo, or rhythm. Recent data-driven approaches often formulate music segmentation as a binary classification problem in which the musical cues for identifying boundaries are learned implicitly. Complementary to such methods, we present in this paper an approach for identifying relevant audio features that explain the presence of musical boundaries. In particular, we describe a multi-objective evolutionary feature selection strategy that simultaneously optimizes two objectives. In the first setting, we reduce the number of features while maximizing the F-measure. In the second setting, we jointly maximize precision and recall. Furthermore, we present extensive experiments based on six feature sets covering different musical aspects. We show that feature selection reduces the overall dimensionality while increasing segmentation quality compared to the full feature sets, with timbre-related features performing best.
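The two optimization settings described in the abstract can be sketched as a Pareto-based evolutionary loop over binary feature masks. The following is a minimal illustrative sketch, not the authors' implementation; the toy evaluation function, mutation rates, and all parameter values are assumptions introduced purely for illustration.

```python
import random

def dominates(a, b):
    """True if score a Pareto-dominates score b.
    Objectives: minimize a[0] (feature count), maximize a[1] (F-measure)."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def pareto_front(scored):
    """Keep only the non-dominated (mask, score) pairs."""
    return [(m, s) for m, s in scored
            if not any(dominates(t, s) for _, t in scored)]

def evolve(n_features, evaluate, generations=100, pop_size=16, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Bit-flip mutation, biased toward dropping features.
        child = list(rng.choice(pop))
        for i in range(n_features):
            if rng.random() < (0.1 if child[i] else 0.02):
                child[i] = not child[i]
        pop.append(child)
        # Environmental selection: keep the non-dominated masks.
        front = pareto_front([(m, evaluate(m)) for m in pop])
        pop = [m for m, _ in front]
        while len(pop) < pop_size:  # refill with copies
            pop.append(list(rng.choice(pop)))
    return pareto_front([(m, evaluate(m)) for m in pop])

# Toy evaluation: only the first 3 of the dimensions carry boundary
# information; every extra dimension slightly reduces the (fake) F-measure.
def toy_evaluate(mask):
    f = sum(mask[:3]) / 3.0 - 0.01 * sum(mask[3:])
    return (sum(mask), max(0.0, f))
```

Calling `evolve(20, toy_evaluate)` returns a set of trade-off solutions, each pairing a feature mask with its (dimensionality, F-measure) score, rather than one single best subset.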
This work was funded by the German Research Foundation (DFG), project 336599081 “Evolutionary optimisation for interpretable music segmentation and music categorisation based on discretised semantic metafeatures”. The experiments were carried out on the Linux HPC cluster at TU Dortmund (LiDO3), partially funded in the course of the Large-Scale Equipment Initiative by the German Research Foundation (DFG) as project 271512359. The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer-Institut für Integrierte Schaltungen IIS.
Notes
1. The terminology used within the scope of this paper is as follows: feature selection keeps individual feature dimensions (e.g., the 2nd MFCC) from feature vectors (e.g., a 13-dimensional MFCC vector), which in turn belong exclusively to feature groups such as timbre. A feature set selected for music segmentation is then constructed from various dimensions of various features, all of which belong to the same group in the current setup; combining features from different groups remains promising future work.
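To make this terminology concrete, the following toy example selects individual dimensions from a sequence of 13-dimensional MFCC vectors. The matrix contents and the chosen indices are illustrative assumptions, not data from the paper.

```python
# Toy feature matrix: 5 analysis frames x 13 MFCC coefficients.
frames = [[float(f * 13 + d) for d in range(13)] for f in range(5)]

# Feature selection keeps individual dimensions, e.g. the 2nd and
# 5th MFCC coefficient (zero-based indices 1 and 4) of every frame.
selected_dims = [1, 4]
reduced = [[frame[d] for d in selected_dims] for frame in frames]
```

The reduced representation still has one vector per frame, but only the selected dimensions survive, which is what lowers the overall dimensionality of the feature set.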
© 2021 Springer Nature Switzerland AG
Vatolkin, I., Koch, M., Müller, M. (2021). A Multi-objective Evolutionary Approach to Identify Relevant Audio Features for Music Segmentation. In: Romero, J., Martins, T., Rodríguez-Fernández, N. (eds.) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2021. Lecture Notes in Computer Science, vol. 12693. Springer, Cham. https://doi.org/10.1007/978-3-030-72914-1_22
Print ISBN: 978-3-030-72913-4
Online ISBN: 978-3-030-72914-1