Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a voice signal preprocessing method based on a cochlear nonlinear dynamics mechanism, with the aim of further improving the feature analysis capability and the noise immunity of existing machine voice signal processing technology.
The invention provides a voice signal preprocessing method based on a cochlear nonlinear dynamics mechanism, which comprises the following steps:
(1) establishing a cochlear nonlinear dynamical model;
(2) constructing a nonlinear cochlear array according to a cochlear nonlinear dynamical model;
the nonlinear cochlear array is a group of n active simulation modules with different natural frequencies, and each active simulation module operates on the received input voice signal according to the cochlear nonlinear dynamical model to produce its real-time response output signal;
(3) processing the real-time response output signals of the active simulation modules to obtain voice preprocessing signals;
wherein n is the number of the active simulation modules, and n is an integer greater than or equal to 1.
Further, the cochlear nonlinear dynamical model is:
where $x$ is the displacement of the basilar membrane from its equilibrium position, $t$ is time, $\gamma$ is the damping coefficient, $\gamma_\alpha$ is the adaptive force coefficient, $B$ is the electrostrictive coefficient of the outer hair cells, $x_0$ is the original length of the outer hair cells, $\omega_i$ is the natural circular frequency of the cochlea, $S(t)$ is the input speech signal, $x_i(t)$ is the real-time response output signal of the i-th active simulation module, and $i$ is the index of the active simulation module, $i = 1, 2, 3, \ldots, n$.
Further, the adaptive force coefficient $\gamma_\alpha$ in the cochlear nonlinear dynamical model should satisfy $0 < \gamma_\alpha \le \gamma$; within this range, the larger the value of $\gamma_\alpha$, the stronger the amplification by the active simulation module of voice signals near its natural frequency.
Further, the natural frequencies of the n active simulation modules can be set as follows: for a nonlinear cochlear array covering the natural frequency range from $a$ to $a e^{\varepsilon(n-1)}$ Hz ($\varepsilon < 1$), the natural frequency of the i-th active simulation module is $f_i = a e^{\varepsilon(i-1)}$ Hz, where $i$ is the index of the active simulation module, $i = 1, 2, 3, \ldots, n$,
and 20 Hz ≤ $a$ ≤ 200 Hz.
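The exponential spacing rule above can be sketched in a few lines. This is only an illustration of the formula $f_i = a e^{\varepsilon(i-1)}$; the function name `natural_frequencies` and the example values a = 100 Hz, ε = 0.1 and n = 32 are assumptions chosen for demonstration, not values taken from the invention.

```python
import math

def natural_frequencies(a, eps, n):
    """Return [f_1, ..., f_n] in Hz with f_i = a * exp(eps * (i - 1))."""
    if n < 1:
        raise ValueError("n must be an integer >= 1")
    return [a * math.exp(eps * (i - 1)) for i in range(1, n + 1)]

# Illustrative values only (assumed): a = 100 Hz, eps = 0.1, n = 32 modules.
freqs = natural_frequencies(a=100.0, eps=0.1, n=32)
print(f"lowest: {freqs[0]:.1f} Hz, highest: {freqs[-1]:.1f} Hz")
```

Because consecutive frequencies differ by the constant factor $e^{\varepsilon}$, the modules are evenly spaced on a logarithmic frequency axis.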
Further, in step (3), the real-time response output signals $x_i(t)$ are averaged by energy to obtain a speech frame signal of duration $T$,
where $x_i(t)$ is the real-time response output signal of the i-th active simulation module, $y_i(t)$ is the preprocessed signal, $t$ is time, and $T$ is the duration of a speech frame.
Compared with the prior art, the technical scheme of the invention introduces a nonlinear cochlear array based on the cochlear nonlinear dynamical model to replace the traditional passive filter bank for preprocessing the voice signal. As a result, periodic or quasi-periodic voice signals are amplified by the preprocessing and the combination tones related to pitch appear in the output, which improves the noise immunity and the feature analysis capability of the voice processing.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The cochlea is a nonlinear signal processing system with characteristics such as two-tone suppression and the generation of combination tones, and these characteristics play an important role in signal processing. For example, "combination tone" means that when the cochlea is excited by sounds of two frequencies $f_1$ and $f_2$, frequency components at $p f_1 + m f_2$ (where $p$ and $m$ are integers) appear in the response. Among these, the difference-frequency component $f_1 - f_2$ allows a complex sound that lacks its fundamental frequency but contains higher harmonics to generate the corresponding fundamental frequency when it passes through the cochlea, so that a pitch whose fundamental frequency is physically absent is still perceived. Through this nonlinearity the cochlea also allows the pitch of quasi-periodic signals to be perceived. The nonlinear characteristic of the cochlea can be described by a nonlinear dynamical equation, and the invention designs an active simulation module that performs the mathematical operation defined by this equation. By assigning different values to the parameter $\omega_i$ in the equation (which represents the natural circular frequency of an active simulation module), a nonlinear cochlear array of n active simulation modules with different natural frequencies is designed for voice preprocessing.
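The arithmetic behind combination tones can be illustrated with a short enumeration. The sketch below only lists the frequencies $p f_1 + m f_2$ for small integers; it does not model the cochlea. The function name `combination_tones`, the order limit `max_order`, and the example pair of 400 Hz and 600 Hz (the second and third harmonics of an absent 200 Hz fundamental) are assumptions made for illustration.

```python
def combination_tones(f1, f2, max_order=2):
    """List positive frequencies p*f1 + m*f2 with |p|, |m| <= max_order,
    leaving out the two primary tones themselves."""
    tones = set()
    for p in range(-max_order, max_order + 1):
        for m in range(-max_order, max_order + 1):
            f = p * f1 + m * f2
            if f > 0 and f not in (f1, f2):
                tones.add(f)
    return sorted(tones)

# Assumed example: two harmonics of a 200 Hz fundamental that is itself absent.
tones = combination_tones(400, 600)
print(tones)
```

In this example the difference tone $f_2 - f_1$ falls at 200 Hz, which is the missing fundamental described in the paragraph above.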
Compared with the conventional sound processing strategy that uses a band-pass filter bank, the voice signal processed by the nonlinear cochlear array reflects several nonlinear effects related to the sound processing mechanism of the cochlea, such as nonlinear tuning, multi-tone distortion and two-tone suppression, in a way that resembles the result of auditory processing. In particular, it shows the combination tones related to pitch and an active amplification of periodic or quasi-periodic voice signals close to the natural frequency of an active simulation module, so that the voice processing has better feature analysis capability and noise immunity than the traditional method.
In order to improve the speech recognition and noise immunity of existing machine voice signal processing in real environments, the invention constructs a voice signal preprocessing method based on a cochlear nonlinear dynamical model. The method introduces a nonlinear cochlear array based on the cochlear nonlinear dynamical model to replace the traditional passive filter bank for preprocessing voice signals. Analysis shows that the simulation results of the nonlinear cochlear array are highly consistent with the results of physiological experiments on the cochlear basilar membrane and of psychoacoustic experiments; in particular, several nonlinear effects related to the sound processing mechanism of the cochlea, such as nonlinear tuning, multi-tone distortion and two-tone suppression, are simulated well. Processing the voice signal with this method reveals the combination tones related to pitch, enhances the periodic signal features related to voice, and makes the voice signal stand out from the noise, which improves voice recognizability.
The invention adopts the following specific technical scheme:
(1) establishing a cochlea nonlinear dynamics model:
Taking the dynamics of the cochlear basilar membrane as the basis, a local section of the cochlea is used as an example for force analysis. During sound conduction, this local section of the basilar membrane is subject to the external force $F_s(t)$ caused by the external acoustic stimulus, the elastic force $F_T$ of the basilar membrane itself, the resistance generated by the lymph fluid and the membrane itself, and the nonlinear adaptive force $F_a$ regulated by the electrostriction and ciliary motion of the outer hair cells, whose simplified expression is as follows:
The cochlear nonlinear dynamical model built according to Newton's laws of mechanics is as follows:
where $x$ is the displacement of the basilar membrane from its equilibrium position, $\gamma$ is the damping coefficient, $\gamma_\alpha$ is the adaptive force coefficient, $B$ is the electrostrictive coefficient of the outer hair cells, $x_0$ is the original length of the outer hair cells, $\omega_i$ is the natural circular frequency of the cochlea in this region, and $S(t)$ is the input signal. Solving the nonlinear equation gives the real-time response output $x_i(t)$ of the cochlear basilar membrane.
(2) A voice signal preprocessing method incorporating the cochlear nonlinear dynamical model:
As shown in Fig. 1, the new voice signal preprocessing method requires constructing a nonlinear cochlear array according to the cochlear nonlinear dynamical model in order to simulate how the cochlea processes sound. The nonlinear cochlear array is a nonlinear simulation array formed by a group of n active simulation modules, each implementing the cochlear nonlinear dynamical model with a different natural frequency. With the input speech signal $S(t)$, solving the equation gives the real-time response output $x_i(t)$ of each processing channel. Each channel output signal $x_i(t)$ is then averaged by energy to obtain the voice preprocessing signal $y_i(t)$.
It should be noted that the active simulation module should be designed so that the adaptive force coefficient $\gamma_\alpha$ lies in the range $0 < \gamma_\alpha \le \gamma$. When $\gamma_\alpha = 0$, the adaptive force is 0 and the system becomes a passive system; when $\gamma_\alpha = \gamma$, the system eventually enters self-sustained oscillation. Within the above range, the larger the value of $\gamma_\alpha$, the larger the maximum adaptive force, the smaller the effective damping of the active simulation module, and the greater the response amplitude to a voice signal near the natural frequency of the active simulation module. The frequency response curves of the active simulation module and of the passive system are shown in Fig. 4, from which it can be seen that the active simulation module amplifies voice signals near its natural frequency more strongly.
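The qualitative trend described above (a larger $\gamma_\alpha$ lowers the effective damping and raises the response near the natural frequency) can be illustrated numerically. The model equation is not reproduced in this text, so the sketch below uses a hypothetical van der Pol-style stand-in, $\ddot{x} + \gamma\dot{x} - \gamma_\alpha (1 - B x^2 / x_0^2)\dot{x} + \omega_i^2 x = S(t)$, which is an assumption and not the equation of the invention. The parameter values, the function names `simulate` and `steady_amplitude`, and the choice of a classical fourth-order Runge-Kutta integrator are likewise assumptions; only the 140 Hz natural frequency is borrowed from the description of Fig. 4.

```python
import math

def simulate(gamma_a, gamma=100.0, f0=140.0, B=1.0, x0=1.0,
             drive_amp=1.0, fs=16000, duration=0.5):
    """Integrate the ASSUMED oscillator with classical RK4; return x(t) samples."""
    w = 2.0 * math.pi * f0
    dt = 1.0 / fs

    def accel(t, x, v):
        # Hypothetical adaptive force: negative damping that saturates with x.
        f_adapt = gamma_a * (1.0 - B * x * x / (x0 * x0)) * v
        s = drive_amp * math.sin(w * t)  # test tone at the natural frequency
        return s - gamma * v + f_adapt - w * w * x

    x, v, out = 0.0, 0.0, []
    for k in range(int(duration * fs)):
        t = k * dt
        k1x, k1v = v, accel(t, x, v)
        k2x = v + 0.5 * dt * k1v
        k2v = accel(t + 0.5 * dt, x + 0.5 * dt * k1x, k2x)
        k3x = v + 0.5 * dt * k2v
        k3v = accel(t + 0.5 * dt, x + 0.5 * dt * k2x, k3x)
        k4x = v + dt * k3v
        k4v = accel(t + dt, x + dt * k3x, k4x)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        out.append(x)
    return out

def steady_amplitude(samples, fs=16000, tail=0.1):
    """Peak |x| over the last `tail` seconds, after the transient has decayed."""
    return max(abs(s) for s in samples[-int(tail * fs):])

passive = steady_amplitude(simulate(gamma_a=0.0))   # gamma_a = 0: passive system
active = steady_amplitude(simulate(gamma_a=50.0))   # 0 < gamma_a <= gamma
print(f"active/passive amplitude ratio: {active / passive:.2f}")
```

With these assumed values the actively damped module responds with a clearly larger steady-state amplitude at its natural frequency than the passive one, which is the behavior the paragraph attributes to the active simulation module. The stand-in is not claimed to reproduce the other properties of the actual model.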
The present invention will be described in further detail with reference to the accompanying drawings and specific examples.
Fig. 1 is a block diagram of the new voice signal processing technique implemented with the cochlear nonlinear dynamical model. The specific strategy is as follows: n frequency band channels are designed according to frequency, and each frequency band channel contains an active simulation module with a different natural frequency, together forming a nonlinear cochlear array. The sound signal $S(t)$ recorded by the microphone is processed by the different simulation modules to give the outputs $x_i(t)$; each $x_i(t)$ is then averaged by energy to give $y_i(t)$, which is the preprocessed signal. The preprocessed signal obtained in this way reflects combination tone information that is consistent with the result of cochlear processing and related to perceived pitch, and it enhances the periodic signal features related to voice, so that the voice signal features stand out from the noise and voice recognizability is improved.
Fig. 2 shows the multi-tone distortion effect. As can be seen from the response spectrum of the active simulation module in Fig. 2(a), distortion products that do not occur in a conventional passive filter system, namely combination tones, appear in the response spectrum of the nonlinear cochlear array. Fig. 2(b) shows the response measured on the cochlear basilar membrane in an actual physiological experiment. The comparison shows that the active simulation module simulates the multi-tone distortion effect of the cochlea well: combination tone information related to the pitch of the voice appears in the processed voice signal, and the fundamental frequency component of the voice is enhanced, which is the basis on which this strategy improves the voice features.
Fig. 3(a) shows the Fourier spectrum of a speech signal with added noise; the speech features are almost drowned out by the noise. Fig. 3(b) shows the Fourier spectrum after processing with the model constructed in this work; the speech signal features clearly stand out from the noise.
Fig. 4 compares the frequency response characteristics of an active simulation module and a passive system. The horizontal axis is the sound frequency, and the vertical axis is the response amplitude of the system to sounds of different frequencies. Both the active simulation module and the passive system have a natural frequency of 140 Hz; the solid line is the frequency response of the active simulation module, and the dotted line is that of the passive system. Compared with the passive system, the active simulation module responds with a larger amplitude to voice signals near its natural frequency, which reflects its active amplification of voice signals.
The invention provides a voice signal preprocessing method based on a cochlear nonlinear dynamics mechanism, which comprises the following steps:
(1) establishing a cochlear nonlinear dynamical model;
(2) constructing a nonlinear cochlear array according to the cochlear nonlinear dynamical model, wherein the nonlinear cochlear array is a group of n active simulation modules with different natural frequencies, each of which performs the corresponding mathematical operation according to the nonlinear dynamical model. The natural frequencies of the active simulation modules can be set as follows: for example, for a nonlinear cochlear array covering the natural frequency range from $a$ to $a e^{\varepsilon(n-1)}$ Hz (generally 20 Hz ≤ $a$ ≤ 200 Hz), the natural frequency of the i-th active simulation module is $f_i = a e^{\varepsilon(i-1)}$ Hz ($i = 1, 2, 3, \ldots, n$). The active simulation modules should be designed so that the adaptive force coefficient satisfies $0 < \gamma_\alpha \le \gamma$; within this range, the larger the value of $\gamma_\alpha$, the stronger the amplification by the active simulation module of voice signals near its natural frequency;
with the input voice signal $S(t)$, solving the equation gives the real-time response output $x_i(t)$ of each active simulation module;
(3) processing the output signal $x_i(t)$ of each channel to obtain the voice preprocessing signal,
wherein the channel output signals $x_i(t)$ can be averaged by energy to obtain voice frame signals of duration $T$ for the subsequent voice processing.
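Step (3) can be illustrated with a small framing routine. The exact averaging formula is not reproduced in this text, so the sketch assumes one plausible reading of "averaging by energy": the root-mean-square value of $x_i(t)$ over each non-overlapping frame of duration $T$. That reading, the function name `frame_energy_average`, the 8 kHz sampling rate, and the 20 ms frame length are all assumptions made for illustration.

```python
import math

def frame_energy_average(x, fs, frame_dur):
    """Return one RMS value per non-overlapping frame of `frame_dur` seconds.

    ASSUMPTION: 'energy averaging' is taken here as sqrt(mean(x**2)) per frame.
    """
    n = int(round(frame_dur * fs))
    if n < 1:
        raise ValueError("frame must contain at least one sample")
    frames = [x[k:k + n] for k in range(0, len(x) - n + 1, n)]
    return [math.sqrt(sum(s * s for s in frame) / n) for frame in frames]

# Assumed test signal: 1 s of a unit-amplitude 100 Hz tone sampled at 8 kHz,
# standing in for one channel output x_i(t).
fs = 8000
x = [math.sin(2.0 * math.pi * 100.0 * k / fs) for k in range(fs)]
y = frame_energy_average(x, fs, frame_dur=0.02)  # T = 20 ms (assumed)
print(len(y), "frames")
```

Applying the routine to every channel output yields one value per channel and frame, which is a convenient input layout for later feature extraction.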
The nonlinear cochlear array actively amplifies periodic or quasi-periodic voice signals, so that the desired voice signals stand out from the noise. At the same time, the nonlinear cochlear array simulates the multi-tone distortion effect of the cochlea well, so that the preprocessed signals show the combination tones related to pitch, the voice features are highlighted, and voice recognizability is improved.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.