A New Algorithm for Speech Feature Extraction Using Polynomial Chirplet Transform

Abstract

Time–frequency analysis (TFA) is a powerful tool for representing signal features. The time–frequency plane shows the main properties of the data as both instantaneous frequency values and trends in how frequency changes over time. For a complicated, non-stationary signal such as human speech, conventional TFA tools, including the Fourier transform, the wavelet transform, and the linear chirplet transform (LCT), cannot reveal and represent speech behavior well. This research proposes a new method for speech representation from a TFA perspective using the polynomial chirplet transform (PCT). Inspired by the Weierstrass approximation theorem, PCT uses a polynomial function to estimate the instantaneous frequency (IF). This polynomial also shapes the modulated atom of the transform. With the flexibility of a high-degree polynomial, PCT can capture many meaningful features of human speech and thereby make recognition models more robust by improving the feature representation. Experimental results on speech processing tasks demonstrate the potential of PCT. Furthermore, PCT is expected to perform better when it is optimized with an adaptive strategy for identifying the IF function.
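The idea of a polynomial IF shaping the transform atom can be illustrated with a minimal NumPy sketch. It follows the general PCT recipe (rotate the signal by the integrated polynomial kernel, shift it by the kernel value at the window centre, then apply a windowed Fourier transform) and is not the authors' implementation. The function name, the parameters (`alphas`, `win_len`, `sigma`, `hop`, `n_fft`), and the fixed, user-supplied coefficients are all assumptions made for illustration.

```python
import numpy as np

def polynomial_chirplet_transform(x, fs, alphas, win_len=255, sigma=40.0,
                                  hop=64, n_fft=512):
    """Illustrative PCT magnitude (hypothetical helper, not from the article).

    x      : complex (analytic) signal; a real signal would first need an
             analytic version, e.g. via scipy.signal.hilbert.
    fs     : sampling rate in Hz.
    alphas : alphas[k-1] multiplies t**k in the IF kernel (Hz), k = 1..n.
             The constant term of the IF is left for the transform to find.
    """
    x = np.asarray(x, dtype=complex)
    alphas = np.asarray(alphas, dtype=float)
    t = np.arange(len(x)) / fs
    k = np.arange(1, len(alphas) + 1)

    # Rotation operator: remove the integrated kernel from the phase,
    # i.e. subtract 2*pi * sum_k alphas[k-1] * t**(k+1) / (k+1).
    powers = t[None, :] ** (k[:, None] + 1) / (k[:, None] + 1)
    rotated = x * np.exp(-2j * np.pi * np.sum(alphas[:, None] * powers, axis=0))

    # Gaussian analysis window (sigma given in samples).
    half = win_len // 2
    n = np.arange(win_len) - half
    window = np.exp(-0.5 * (n / sigma) ** 2)

    centers = np.arange(half, len(x) - half, hop)
    tfr = np.empty((n_fft, len(centers)))
    for col, c in enumerate(centers):
        seg = slice(c - half, c - half + win_len)
        # Shift operator: put the energy back at the kernel value at t0.
        shift_hz = np.sum(alphas * t[c] ** k)
        frame = rotated[seg] * np.exp(2j * np.pi * shift_hz * t[seg]) * window
        tfr[:, col] = np.abs(np.fft.fft(frame, n_fft))

    freqs = np.fft.fftfreq(n_fft, d=1.0 / fs)
    return freqs, t[centers], tfr
```

When the supplied coefficients match the signal's true IF, each frame sees an approximately constant frequency, so the energy in every column concentrates in a narrow ridge at the IF value for that frame. With mismatched coefficients the ridge smears, which is why an adaptive strategy for identifying the IF function matters.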

Data Availability

This work uses the TIMIT (English) and VIVOS (Vietnamese) speech corpora, both of which are commonly used in academic research.

Acknowledgements

Hao Do-Duc was funded by Vingroup JSC and supported by the PhD Scholarship Programme of Vingroup Innovation Foundation (VINIF), Institute of Big Data, code VINIF.2022.TS.037.

Author information

Authors and Affiliations

University of Science, Ho Chi Minh City, Vietnam
Hao Do-Duc, Duc Chau-Thanh & Son Tran-Thai
Vietnam National University, Ho Chi Minh City, Vietnam
Hao Do-Duc, Duc Chau-Thanh & Son Tran-Thai
FPT University, Ho Chi Minh City, Vietnam
Hao Do-Duc

Corresponding author

Correspondence to Hao Do-Duc.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest or competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Do-Duc, H., Chau-Thanh, D. & Tran-Thai, S. A New Algorithm for Speech Feature Extraction Using Polynomial Chirplet Transform. Circuits Syst Signal Process 43, 2320–2340 (2024). https://doi.org/10.1007/s00034-023-02561-6

Received: 29 March 2023
Revised: 07 November 2023
Accepted: 07 November 2023
Published: 12 December 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00034-023-02561-6
