Reversible Speech De-identification Using Parametric Transformations and Watermarking

  • Conference paper
Advances in Speech and Language Technologies for Iberian Languages (IberSPEECH 2016)

Abstract

This paper presents a system capable of de-identifying speech signals in order to hide and protect the identity of the speaker. It applies a relatively simple yet effective transformation of the pitch and the frequency axis of the spectral envelope thanks to a flexible wideband harmonic model. Moreover, it inserts the parameters of the transformation in the signal by means of watermarking techniques, thus enabling re-identification. Our experiments show that for adequate modification factors its performance is satisfactory in terms of quality, de-identification degree and naturalness. The limitations due to the signal processing framework are discussed as well.
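The two operations named in the abstract (scaling the pitch and warping the frequency axis of the spectral envelope) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the linear warping rule, the factor names `alpha` and `beta`, and the 8-bit quantization of the factors into a payload are all hypothetical. The sketch only prepares the bits that a watermarking stage would carry; it does not embed them in the signal, and it does not use the harmonic model described in the paper.

```python
import numpy as np

def scale_pitch(f0, alpha):
    """Multiply voiced f0 values by alpha; unvoiced frames (f0 == 0) stay at 0."""
    f0 = np.asarray(f0, dtype=float)
    return np.where(f0 > 0, f0 * alpha, 0.0)

def warp_envelope(envelope, beta):
    """Linearly warp the frequency axis of one spectral-envelope frame.

    The output at normalized frequency w takes the input value at w / beta,
    so beta > 1 moves spectral peaks upwards (hypothetical warping rule).
    """
    envelope = np.asarray(envelope, dtype=float)
    w = np.linspace(0.0, 1.0, len(envelope))
    src = np.clip(w / beta, 0.0, 1.0)
    return np.interp(src, w, envelope)

def params_to_bits(alpha, beta, lo=0.5, hi=2.0, nbits=8):
    """Quantize both factors uniformly on [lo, hi] and return a flat bit list."""
    levels = (1 << nbits) - 1
    bits = []
    for value in (alpha, beta):
        q = int(round((value - lo) / (hi - lo) * levels))
        q = min(max(q, 0), levels)
        bits.extend((q >> i) & 1 for i in reversed(range(nbits)))
    return bits

def bits_to_params(bits, lo=0.5, hi=2.0, nbits=8):
    """Recover the quantized factors from the bit list."""
    levels = (1 << nbits) - 1
    values = []
    for start in range(0, len(bits), nbits):
        q = 0
        for b in bits[start:start + nbits]:
            q = (q << 1) | b
        values.append(lo + q / levels * (hi - lo))
    return tuple(values)

# De-identify: transform the parameters and prepare the payload.
f0 = np.array([0.0, 100.0, 200.0])
f0_deid = scale_pitch(f0, 1.2)
payload = params_to_bits(1.2, 0.9)

# Re-identify: read the factors back and apply the inverse pitch scaling.
alpha_hat, beta_hat = bits_to_params(payload)
f0_restored = scale_pitch(f0_deid, 1.0 / alpha_hat)
```

In this sketch the recovered pitch matches the original only up to the quantization error of the payload. Inverting the envelope warp with `1.0 / beta` is likewise approximate, because the part of the envelope pushed beyond the upper band edge is clipped and cannot be recovered.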

Notes

  1. Voice conversion can be seen as a particular case of voice transformation where there is a specific target speaker.

  2. PESQ predicts the mean opinion score of a distorted signal in comparison with its original clean version.

Acknowledgements

This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R MINECO/FEDER,UE) and the Basque Government (ELKAROLA, KK-2015/00098).

Author information

Corresponding author

Correspondence to Daniel Erro.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Valdivielso, A., Erro, D., Hernaez, I. (2016). Reversible Speech De-identification Using Parametric Transformations and Watermarking. In: Abad, A., et al. (eds.) Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_26

  • DOI: https://doi.org/10.1007/978-3-319-49169-1_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49168-4

  • Online ISBN: 978-3-319-49169-1

  • eBook Packages: Computer Science (R0)
