Modeling Audio Distortion Effects with Autoencoder Neural Networks

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 377)

Included in the following conference series:

International Conference on Intelligent Technologies for Interactive Entertainment

Abstract

Most music production nowadays is carried out using software tools; for this reason, the market demands faithful simulations of audio effects. Traditional methods for modeling nonlinear systems are effect-specific or labor-intensive; however, recent work has yielded promising results through black-box simulation of these effects using neural networks. This work explores two autoencoder-based models of distortion effects: one uses only fully connected layers, and the other employs convolutional layers. Both models were trained with clean sounds as input and distorted sounds as target; the learning method was therefore not self-supervised, as is usually the case with autoencoders. The networks were evaluated by visual inspection of the output spectrograms and by an informal listening test. They performed well in reconstructing the distorted signal spectra, but they also introduced a fair amount of noise.
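
The supervised setup described in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the tanh soft clipper `distort`, the sine-wave frames, the layer sizes, and all function names are hypothetical stand-ins. It shows a tiny fully connected autoencoder whose training target is the distorted frame rather than the input itself.

```python
import math
import random


def distort(x, gain=5.0):
    """Stand-in distortion (assumption): a tanh soft clipper, not the paper's effect."""
    return math.tanh(gain * x)


def make_pairs(n_frames=64, frame_len=8, seed=0):
    """Build (clean, distorted) frame pairs from sine waves with random parameters."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_frames):
        freq = rng.uniform(0.02, 0.2)            # cycles per sample
        phase = rng.uniform(0.0, 2.0 * math.pi)
        amp = rng.uniform(0.1, 1.0)
        clean = [amp * math.sin(2.0 * math.pi * freq * n + phase)
                 for n in range(frame_len)]
        pairs.append((clean, [distort(s) for s in clean]))
    return pairs


def init_matrix(rows, cols, rng, scale=0.3):
    """Small random weights drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def forward(w_enc, w_dec, x):
    """Encoder: h = tanh(W_enc x). Decoder: y = W_dec h (linear)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_enc]
    y = [sum(w * hi for w, hi in zip(row, h)) for row in w_dec]
    return h, y


def mse(w_enc, w_dec, pairs):
    """Mean squared error between the network output and the distorted target."""
    total = 0.0
    for x, t in pairs:
        _, y = forward(w_enc, w_dec, x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(t)
    return total / len(pairs)


def train(pairs, hidden=4, epochs=100, lr=0.05, seed=1):
    """Per-sample gradient descent; returns weights and the MSE before and after."""
    rng = random.Random(seed)
    n = len(pairs[0][0])
    w_enc = init_matrix(hidden, n, rng)
    w_dec = init_matrix(n, hidden, rng)
    before = mse(w_enc, w_dec, pairs)
    for _ in range(epochs):
        for x, t in pairs:
            h, y = forward(w_enc, w_dec, x)
            # Error against the DISTORTED target; the factor 2/n is folded into lr.
            err = [yi - ti for yi, ti in zip(y, t)]
            # Backpropagate through the decoder and tanh before updating weights.
            grad_h = [sum(err[i] * w_dec[i][j] for i in range(n)) * (1.0 - h[j] ** 2)
                      for j in range(hidden)]
            for i in range(n):
                for j in range(hidden):
                    w_dec[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                for k in range(n):
                    w_enc[j][k] -= lr * grad_h[j] * x[k]
    return w_enc, w_dec, before, mse(w_enc, w_dec, pairs)


if __name__ == "__main__":
    _, _, before, after = train(make_pairs())
    print(f"MSE before training: {before:.4f}, after: {after:.4f}")
```

In a self-supervised autoencoder the target `t` would simply be the clean frame `x`; in this sketch only the target changes, while the encoder-decoder structure stays the same.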

Author information

Authors and Affiliations

Aalborg University Copenhagen, Copenhagen, Denmark
Riccardo Russo, Francesco Bigoni & George Palamas

Authors

Riccardo Russo
Francesco Bigoni
George Palamas

Corresponding author

Correspondence to Riccardo Russo.

Editor information

Editors and Affiliations

Santa Clara University, Santa Clara, CA, USA
Navid Shaghaghi
INFN Sezione di Torino, Torino, Italy
Fabrizio Lamberti
Santa Clara University, Santa Clara, CA, USA
Brian Beams
Santa Clara University, Santa Clara, CA, USA
Reza Shariatmadari
Santa Clara University, Santa Clara, CA, USA
Ahmed Amer

About this paper

Cite this paper

Russo, R., Bigoni, F., Palamas, G. (2021). Modeling Audio Distortion Effects with Autoencoder Neural Networks. In: Shaghaghi, N., Lamberti, F., Beams, B., Shariatmadari, R., Amer, A. (eds) Intelligent Technologies for Interactive Entertainment. INTETAIN 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 377. Springer, Cham. https://doi.org/10.1007/978-3-030-76426-5_9

DOI: https://doi.org/10.1007/978-3-030-76426-5_9
Published: 19 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76425-8
Online ISBN: 978-3-030-76426-5
eBook Packages: Computer Science, Computer Science (R0)
