Abstract
Time–frequency analysis (TFA) is a powerful tool for signal feature representation. The time–frequency plane shows a signal's primary properties as both instantaneous frequency values and the trend of frequency change over time. For a complicated, non-stationary signal such as human speech, conventional TFA tools, including the Fourier transform, the wavelet transform, and the linear chirplet transform (LCT), cannot reveal and represent speech behavior well. This research proposes a new method for speech representation from a TFA perspective using the polynomial chirplet transform (PCT). Inspired by the Weierstrass approximation theorem, PCT uses a polynomial function to estimate the instantaneous frequency (IF). This polynomial also shapes the modulated atom used in the transform. With the flexibility of a high-degree polynomial, PCT can capture many meaningful features in human speech and thereby make recognition models more robust by improving the feature representation. Experimental results on speech processing tasks demonstrate the potential of PCT. Furthermore, PCT is expected to perform better when it is optimized with an adaptive strategy for identifying the IF function.
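To make the idea of a polynomial-shaped atom concrete, the following is a minimal NumPy sketch of a polynomial chirplet transform, not the authors' implementation. It assumes an IF model of the form IF(t) = f0 + c1*t + c2*t^2 + ..., removes the polynomial phase from the signal (frequency rotation), shifts each windowed segment back by the polynomial's value at the window center (frequency shift), and then takes a windowed Fourier transform. The function name, its parameters (`coeffs`, `win_len`, `hop`, `n_fft`), the Gaussian window width, and the synthetic test chirp are all illustrative assumptions.

```python
import numpy as np

def polynomial_chirplet_transform(z, fs, coeffs, win_len=255, hop=16, n_fft=512):
    """Sketch of a PCT magnitude map (hypothetical helper, not from the paper).

    coeffs = [c1, c2, ...] models IF(t) = f0 + c1*t + c2*t**2 + ... in Hz;
    the constant term f0 is not needed because it is what the transform reveals.
    """
    z = np.asarray(z, dtype=complex)
    t = np.arange(len(z)) / fs
    orders = range(1, len(coeffs) + 1)

    # Frequency rotation: subtract the polynomial part of the phase,
    # 2*pi * sum_k c_k * t**(k+1) / (k+1).
    rot_phase = 2 * np.pi * sum(c * t ** (k + 1) / (k + 1) for k, c in zip(orders, coeffs))
    z_rot = z * np.exp(-1j * rot_phase)

    half = win_len // 2
    # Gaussian window truncated at about +/- 3 standard deviations (assumed width).
    n = np.arange(win_len) - half
    window = np.exp(-0.5 * (n / (win_len / 6.0)) ** 2)

    centers = np.arange(half, len(z) - half, hop)
    tfr = np.empty((n_fft, len(centers)))
    for col, i in enumerate(centers):
        seg_t = t[i - half:i + half + 1]
        # Frequency shift: move the demodulated tone back to IF(t0).
        shift_hz = sum(c * t[i] ** k for k, c in zip(orders, coeffs))
        seg = z_rot[i - half:i + half + 1] * np.exp(2j * np.pi * shift_hz * seg_t)
        tfr[:, col] = np.abs(np.fft.fft(seg * window, n_fft))

    freqs = np.fft.fftfreq(n_fft, d=1.0 / fs)
    return t[centers], freqs, tfr

# Synthetic complex chirp with IF(t) = 50 + 200*t + 100*t**2 Hz (assumed test signal).
fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
z = np.exp(2j * np.pi * (50 * t + 100 * t ** 2 + (100.0 / 3.0) * t ** 3))

times, freqs, tfr = polynomial_chirplet_transform(z, fs, coeffs=[200.0, 100.0])
ridge = freqs[np.argmax(tfr, axis=0)]  # estimated IF at each window center
```

When the polynomial coefficients match the signal, each windowed segment reduces to a pure tone at IF(t0), so the energy concentrates on a thin ridge. With mismatched coefficients (for example, an empty `coeffs` list, which reduces the sketch to a short-time Fourier transform), the ridge smears, which is why an adaptive strategy for choosing the IF polynomial matters.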








Data Availability
This work uses the TIMIT (English) and VIVOS (Vietnamese) speech corpora, both of which are widely used in academic research.
Acknowledgements
Hao Do-Duc was funded by Vingroup JSC and supported by the PhD Scholarship Programme of Vingroup Innovation Foundation (VINIF), Institute of Big Data, code VINIF.2022.TS.037.
Ethics declarations
Conflicts of interest
The authors declare no conflicts of interest or competing interests.
About this article
Cite this article
Do-Duc, H., Chau-Thanh, D. & Tran-Thai, S. A New Algorithm for Speech Feature Extraction Using Polynomial Chirplet Transform. Circuits Syst Signal Process 43, 2320–2340 (2024). https://doi.org/10.1007/s00034-023-02561-6
DOI: https://doi.org/10.1007/s00034-023-02561-6