A New Algorithm for Speech Feature Extraction Using Polynomial Chirplet Transform

Circuits, Systems, and Signal Processing

Abstract

Time–frequency analysis (TFA) is a powerful tool for signal feature representation. In the time–frequency plane, the primary properties of the data appear both as instantaneous values and as trends in how frequency changes over time. For a complicated, non-stationary signal such as human speech, conventional TFA tools, including the Fourier transform, the wavelet transform, and the linear chirplet transform (LCT), cannot reveal and represent speech behavior well. This research proposes a new method for speech representation from a TFA perspective using the polynomial chirplet transform (PCT). Inspired by the Weierstrass approximation theorem, PCT uses a polynomial function to estimate the instantaneous frequency (IF). The same polynomial also shapes the modulated atom used in the transform. Because a high-degree polynomial can follow nonlinear frequency trajectories, PCT can capture many meaningful features of human speech and make recognition models more robust by improving the feature representation. Experimental results on speech processing tasks demonstrate the potential of PCT. PCT is expected to perform better still when it is optimized with an adaptive strategy for identifying the IF function.
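The role of the IF polynomial in shaping the transform atom can be illustrated with a small NumPy sketch. It follows the general form of the polynomial chirplet transform found in the literature (a frequency-rotation step, a frequency-shift step, then a Gaussian-windowed Fourier transform) and is not the authors' implementation. The function name `polynomial_chirplet_transform`, its parameters, the Gaussian window width, and the assumed IF model f(t) = f0 + sum_k c_k t^k are all illustrative assumptions.

```python
import numpy as np


def polynomial_chirplet_transform(z, fs, coeffs, win_len=129, hop=16, n_fft=512):
    """Magnitude PCT of a complex (analytic) signal z sampled at fs Hz.

    Assumed IF model: f(t) = f0 + sum_k c_k * t**k, with coeffs[k-1] = c_k (k >= 1).
    f0 needs no coefficient: it is what the frequency axis resolves.
    win_len is assumed odd so every frame is centred on a sample.
    """
    z = np.asarray(z, dtype=complex)
    coeffs = np.asarray(coeffs, dtype=float)
    t = np.arange(len(z)) / fs
    orders = np.arange(1, len(coeffs) + 1)

    # Frequency rotation: remove the nonlinear phase 2*pi*sum_k c_k t^(k+1)/(k+1).
    powers = t[None, :] ** (orders[:, None] + 1) / (orders[:, None] + 1)
    rotated = z * np.exp(-2j * np.pi * (coeffs[:, None] * powers).sum(axis=0))

    # Gaussian window; the width (win_len / 6 samples) is an arbitrary choice.
    half = win_len // 2
    n = np.arange(win_len) - half
    window = np.exp(-0.5 * (n / (win_len / 6.0)) ** 2)

    centers = np.arange(half, len(z) - half, hop)
    tfr = np.empty((n_fft // 2, len(centers)))
    for col, c in enumerate(centers):
        seg = slice(c - half, c + half + 1)
        # Frequency shift: move the flattened component back to f(t0).
        shift_hz = np.sum(coeffs * t[c] ** orders)
        frame = rotated[seg] * np.exp(2j * np.pi * shift_hz * t[seg]) * window
        tfr[:, col] = np.abs(np.fft.fft(frame, n_fft))[: n_fft // 2]

    freqs = np.arange(n_fft // 2) * fs / n_fft
    return freqs, t[centers], tfr


# Synthetic test signal (not speech): IF = 50 + 300 * t**2 Hz.
fs = 1000
t = np.arange(fs) / fs
z = np.exp(2j * np.pi * (50 * t + 100 * t ** 3))
freqs, times, pct = polynomial_chirplet_transform(z, fs, coeffs=[0.0, 300.0])
ridge = freqs[pct.argmax(axis=0)]  # estimated IF per frame
```

Passing an empty `coeffs` list reduces the sketch to an ordinary Gaussian-windowed short-time Fourier transform, which gives a baseline for comparing energy concentration. For a real-valued recording, the analytic signal would first have to be obtained, for example with a Hilbert transform; the polynomial coefficients are assumed known here, whereas in practice they must be estimated.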

Data Availability

This work uses the TIMIT (English) and VIVOS (Vietnamese) speech corpora, both of which are commonly used in academic research.

Acknowledgements

Hao Do-Duc was funded by Vingroup JSC and supported by the PhD Scholarship Programme of Vingroup Innovation Foundation (VINIF), Institute of Big Data, code VINIF.2022.TS.037.

Author information

Corresponding author

Correspondence to Hao Do-Duc.

Ethics declarations

Conflicts of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Do-Duc, H., Chau-Thanh, D. & Tran-Thai, S. A New Algorithm for Speech Feature Extraction Using Polynomial Chirplet Transform. Circuits Syst Signal Process 43, 2320–2340 (2024). https://doi.org/10.1007/s00034-023-02561-6

  • DOI: https://doi.org/10.1007/s00034-023-02561-6
