Abstract
This paper presents a system that de-identifies speech signals in order to hide and protect the identity of the speaker. It applies a relatively simple yet effective transformation to the pitch and to the frequency axis of the spectral envelope, made possible by a flexible wideband harmonic model. Moreover, it embeds the parameters of the transformation in the signal by means of watermarking techniques, thus enabling re-identification. Our experiments show that, for adequate modification factors, its performance is satisfactory in terms of quality, degree of de-identification and naturalness. The limitations imposed by the signal processing framework are discussed as well.
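As a rough illustration of the kind of parametric transformation described above, the following sketch scales the pitch of one analysis frame and warps the frequency axis of its spectral envelope, then resamples the envelope at the new harmonic frequencies. It is not the authors' implementation: the function names, the factors `alpha` (pitch) and `beta` (frequency warping), and the choice of a purely linear warping are assumptions made for the example, and the watermarking step is not shown.

```python
import numpy as np

def warp_envelope(freqs, env_db, beta):
    """Linearly warp the frequency axis of a spectral envelope.

    The warped envelope at frequency f takes the original value at f / beta,
    so beta > 1 shifts spectral peaks upwards (assumed linear warping).
    np.interp holds the edge values for frequencies outside the grid.
    """
    return np.interp(freqs / beta, freqs, env_db)

def deidentify_frame(f0, freqs, env_db, alpha, beta):
    """Transform one frame of a harmonic model (hypothetical helper).

    f0      : fundamental frequency of the frame in Hz
    freqs   : increasing frequency grid in Hz on which env_db is sampled
    env_db  : spectral envelope in dB on that grid
    alpha   : pitch modification factor (assumed name)
    beta    : frequency warping factor (assumed name)
    """
    new_f0 = alpha * f0
    new_env = warp_envelope(freqs, env_db, beta)
    # Harmonics of the modified pitch that fit below the top of the grid.
    n_harm = int(freqs[-1] // new_f0)
    harm_freqs = new_f0 * np.arange(1, n_harm + 1)
    # Harmonic amplitudes are read off the warped envelope.
    harm_amps_db = np.interp(harm_freqs, freqs, new_env)
    return new_f0, new_env, harm_freqs, harm_amps_db

if __name__ == "__main__":
    # Toy envelope with a single peak at 1000 Hz.
    freqs = np.linspace(0.0, 8000.0, 801)
    env_db = -((freqs - 1000.0) / 300.0) ** 2
    new_f0, new_env, hf, ha = deidentify_frame(100.0, freqs, env_db, 1.3, 1.2)
    print("new f0:", new_f0)
    print("peak now at:", freqs[np.argmax(new_env)], "Hz")
```

In this simplified setting, a receiver that knows `alpha` and `beta` can approximately undo the envelope warping by calling `warp_envelope` with `1 / beta` and dividing the pitch by `alpha`, except in the band pushed beyond the top of the frequency grid. This is why carrying the factors inside the signal makes re-identification possible.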
Notes
1. Voice conversion can be seen as a particular case of voice transformation where there is a specific target speaker.
2. PESQ predicts the mean opinion score of a distorted signal in comparison with its original clean version.
Acknowledgements
This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R MINECO/FEDER,UE) and the Basque Government (ELKAROLA, KK-2015/00098).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Valdivielso, A., Erro, D., Hernaez, I. (2016). Reversible Speech De-identification Using Parametric Transformations and Watermarking. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_26
DOI: https://doi.org/10.1007/978-3-319-49169-1_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science, Computer Science (R0)