
WO2022052244A1 - Voice activity detection method for earphone, earphone, and storage medium - Google Patents

Voice activity detection method for earphone, earphone, and storage medium

Info

Publication number
WO2022052244A1
WO2022052244A1 PCT/CN2020/124866 CN2020124866W WO2022052244A1 WO 2022052244 A1 WO2022052244 A1 WO 2022052244A1 CN 2020124866 W CN2020124866 W CN 2020124866W WO 2022052244 A1 WO2022052244 A1 WO 2022052244A1
Authority
WO
WIPO (PCT)
Prior art keywords
bone conduction
domain
signal
frequency
microphone
Prior art date
Application number
PCT/CN2020/124866
Other languages
English (en)
French (fr)
Inventor
陈国明
Original Assignee
歌尔股份有限公司 (Goertek Inc.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 歌尔股份有限公司 (Goertek Inc.)
Priority to US 18/025,876 (published as US20230352038A1)
Publication of WO2022052244A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L25/84Detection of presence or absence of voice signals for discriminating voice from noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/10Earpieces; Attachments therefor ; Earphones; Monophonic headphones
    • H04R1/1083Reduction of ambient noise
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/40Arrangements for obtaining a desired directivity characteristic
    • H04R25/407Circuits for combining signals of a plurality of transducers
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/10Earpieces; Attachments therefor ; Earphones; Monophonic headphones
    • H04R1/1091Details not provided for in groups H04R1/1008 - H04R1/1083
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2201/00Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R2201/10Details of earpieces, attachments therefor, earphones or monophonic headphones covered by H04R1/10 but not provided for in any of its subgroups
    • H04R2201/107Monophonic and stereophonic headphones with microphone for two-way hands free communication
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups
    • H04R2430/03Synergistic effects of band splitting and sub-band processing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2460/00Details of hearing devices, i.e. of ear- or headphones covered by H04R1/10 or H04R5/033 but not provided for in any of their subgroups, or of hearing aids covered by H04R25/00 but not provided for in any of its subgroups
    • H04R2460/13Hearing devices using bone conduction transducers

Definitions

  • the present application relates to the technical field of wireless communication, and in particular, to a method for detecting voice activity of an earphone, an earphone and a storage medium.
  • Speech enhancement is an effective way to deal with noise pollution. It can extract a clean speech signal from noisy speech and reduce listener fatigue, and is widely used in digital mobile phones, hands-free car phone systems, teleconferencing, and in reducing background interference for the hearing impaired.
  • In the prior art, VAD (Voice Activity Detection) is used to determine whether the signal frame currently being processed is speech or noise: sound features are extracted from the sound signal and used to classify it, which suffers from low recognition accuracy.
  • the main purpose of the embodiments of the present application is to provide a voice activity detection method for an earphone, which aims to solve the technical problem of low recognition accuracy in the prior art when a VAD is used to determine whether a sound signal is noise or voice.
  • an embodiment of the present application provides a voice activity detection method for an earphone, including the following content:
  • the step of obtaining a coherence coefficient according to the frequency-domain microphone signal and the frequency-domain bone conduction signal includes:
  • the coherence coefficient is obtained according to the sub-frequency domain microphone signal of each of the sub-bands and the sub-frequency domain bone conduction signal of each of the sub-bands.
  • the step of obtaining the coherence coefficient according to the sub-frequency domain microphone signal of each of the sub-bands and the sub-frequency domain bone conduction signal of each of the sub-bands includes:
  • the coherence coefficient is obtained according to the cross-correlation coefficient of each of the subbands, the energy of the microphone subband, and the energy of the bone conduction subband.
  • the step of obtaining spectral energy according to the spectral bone conduction signal further includes:
  • the spectral energy is acquired according to each of the sub-frequency domain bone conduction signals.
  • the step of determining that the headset detects speech or noise according to the coherence coefficient and the spectral energy includes:
  • when the coherence coefficient is greater than or equal to a preset coherence coefficient and the spectral energy is greater than or equal to a preset spectral energy, it is confirmed that the earphone detects speech; when the coherence coefficient is smaller than the preset coherence coefficient, or the spectral energy is smaller than the preset spectral energy, it is confirmed that the earphone detects noise.
  • optionally, after it is confirmed that the headset detects speech, the method further includes:
  • the second time-domain microphone signal and the second time-domain bone conduction signal are mixed and processed and output.
  • the step of performing noise removal on the frequency-domain microphone signal and the frequency-domain bone conduction signal respectively includes:
  • Noise cancellation is performed on the frequency domain bone conduction signal according to the frequency domain bone conduction signal and the historical bone conduction noise power spectral density.
  • the voice activity detection method for the headset further includes:
  • the microphone noise power spectral density is obtained according to the historical microphone noise power spectral density and the frequency domain microphone signal;
  • the historical bone conduction noise power spectral density is updated to the bone conduction noise power spectral density.
  • an embodiment of the present application further provides an earphone, which includes a microphone, a bone voiceprint sensor, a processor, a memory, and a voice activity detection program of the earphone that is stored on the memory and executable on the processor; when the voice activity detection program of the headset is executed by the processor, the steps of the voice activity detection method of the headset described above are implemented.
  • Embodiments of the present application further provide a computer-readable storage medium on which a voice activity detection program of an earphone is stored; when the voice activity detection program of the earphone is executed by a processor, the steps of the above voice activity detection method of the earphone are implemented.
  • in the voice activity detection method for an earphone, a first time-domain microphone signal is converted into a frequency-domain microphone signal and a first time-domain bone conduction signal is converted into a frequency-domain bone conduction signal; a coherence coefficient is obtained according to the frequency-domain microphone signal and the frequency-domain bone conduction signal, spectral energy is obtained according to the frequency-domain bone conduction signal, and the current speech frame is confirmed as speech or noise according to the coherence coefficient and the spectral energy. The correlation between the microphone signal and the bone conduction signal is judged through the coherence coefficient.
  • when this correlation is judged to be high, it is further determined with reference to the spectral energy whether the earphone has detected speech or noise, so as to prevent a low-energy microphone signal from being judged as speech and to improve the accuracy of judging speech and noise.
  • FIG. 1 is a schematic structural diagram of an earphone of a hardware operating environment involved in a solution according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a first embodiment of a voice activity detection method for a headset of the present application
  • FIG. 3 is a schematic flowchart of the process involved after step S400 in FIG. 2;
  • FIG. 4 is a schematic flowchart of a second embodiment of a voice activity detection method for a headset of the present application
  • FIG. 5 is a schematic diagram of the refinement process of step S230 in FIG. 4;
  • FIG. 6 is a schematic flowchart of a third embodiment of a voice activity detection method for a headset of the present application.
  • FIG. 7 is a schematic flowchart of a fourth embodiment of a voice activity detection method for a headset of the present application.
  • FIG. 8 is a schematic flowchart of a fifth embodiment of a voice activity detection method for an earphone of the present application.
  • the main solution of the embodiments of the present application is: audio acquired by the earphone is processed by the microphone of the earphone and converted from a first time-domain microphone signal into a frequency-domain microphone signal, and the audio acquired by the earphone is processed by the bone voiceprint sensor of the earphone and converted from a first time-domain bone conduction signal into a frequency-domain bone conduction signal; a coherence coefficient is obtained according to the frequency-domain microphone signal and the frequency-domain bone conduction signal; spectral energy is obtained according to the frequency-domain bone conduction signal; and it is determined from the coherence coefficient and the spectral energy that speech or noise has been detected by the earphone.
  • in the prior art, VAD is used to determine whether a sound signal is noise or speech, and there is a technical problem of low recognition accuracy.
  • the embodiments of the present application provide a solution: the first time-domain microphone signal is converted into a frequency-domain microphone signal and the first time-domain bone conduction signal into a frequency-domain bone conduction signal; a coherence coefficient is obtained according to the frequency-domain microphone signal and the frequency-domain bone conduction signal, spectral energy is obtained according to the frequency-domain bone conduction signal, and the current speech frame is confirmed as speech or noise according to the coherence coefficient and the spectral energy. The correlation between the microphone signal and the bone conduction signal is judged through the coherence coefficient; when the correlation is high, the spectral energy is further used to determine whether the audio obtained by the earphone is speech or noise, so as to prevent a low-energy microphone signal from being determined as speech and improve the accuracy of distinguishing speech from noise.
  • FIG. 1 is a schematic structural diagram of an earphone of a hardware operating environment involved in the solution of an embodiment of the present application.
  • the executive body of the embodiment of the present application may be an earphone.
  • the earphones can be wired earphones or wireless earphones such as TWS (True Wireless Stereo, true wireless stereo) Bluetooth earphones.
  • the headset may include: a processor 1001, such as a CPU or an IC chip; a communication bus 1002; a memory 1003; a microphone 1004; and a bone voiceprint sensor 1005.
  • the communication bus 1002 is used to realize the connection and communication between these components.
  • the memory 1003 may be a high-speed RAM memory, or a non-volatile memory, such as a disk memory.
  • the memory 1003 may optionally also be a storage device independent of the aforementioned processor 1001 .
  • the microphone 1004 is used to collect sound signals conducted through the air, and the collected sound signals can be used to implement calls and noise reduction functions.
  • the bone voiceprint sensor 1005 is used to collect vibration signals conducted through the skull, jaws, etc., and the collected vibration signals are used to realize the noise reduction function.
  • the earphone may further include: a battery assembly, a touch assembly, an LED light, a sensor and a speaker.
  • the battery component is used to power the headset;
  • the touch component is used to realize the touch function, which can be a button;
  • the LED light is used to indicate the working status of the headset, such as power-on prompt, charging prompt, terminal connection prompt, etc.;
  • the sensors can include a gravitational acceleration sensor, a vibration sensor, a gyroscope, and the like, and are used to detect the state of the headset so as to determine the physical movement of the user currently wearing it; as for the speakers, there may be two or more, for example each earpiece of the headset is provided with two speakers, a moving-coil speaker and a moving-iron (balanced-armature) speaker, where the moving-coil speaker responds better at low and middle frequencies and the moving-iron speaker responds better in the mid-to-high range.
  • the two speakers are used at the same time: through the frequency-division (crossover) function of the processor, the moving-iron speaker is connected in parallel with the moving-coil speaker, so that the ear hears sound across the entire audio band.
  • the structure of the earphone shown in FIG. 1 does not constitute a limitation on the terminal, and may include more or less components than shown, or combine some components, or arrange different components.
  • the memory 1003 as a computer storage medium may include an operating system and a voice activity detection program of the headset, and the processor 1001 may be used to call the voice activity detection program of the headset stored in the memory 1003 .
  • FIG. 2 is a schematic flowchart of the first embodiment of the voice activity detection method of the headset of the present application.
  • the voice activity detection method of the headset includes the following steps:
  • Step S100 converting the first time-domain microphone signal collected by the microphone of the earphone into a frequency-domain microphone signal, and converting the first time-domain bone conduction signal collected by the bone voiceprint sensor of the earphone into a frequency-domain bone conduction signal, wherein , the acquisition time period of the first time-domain microphone signal and the first time-domain bone conduction signal is the same;
  • Sound waves can reach the inner ear by two paths: air conduction and bone conduction. Air conduction means that sound waves are transmitted via the auricle and the external auditory canal to the middle ear, and then via the ossicular chain to the inner ear, and the resulting speech spectrum is relatively rich.
  • Bone conduction refers to the transmission of sound waves to the inner ear through vibrations of the skull and jaw. In bone conduction, sound waves are transmitted to the inner ear without passing through the outer and middle ear.
  • the bone voiceprint sensor includes a bone conduction microphone, which can only collect sound signals that are in direct contact with it and cause vibration; it cannot collect sound signals transmitted through the air and is not disturbed by environmental noise. Owing to manufacturing process limitations, the bone voiceprint sensor can only collect and transmit relatively low-frequency sound signals, resulting in a dull sound.
  • the earphone converts, in real time, the first time-domain microphone signal collected by the microphone of the earphone into a frequency-domain microphone signal, and converts the first time-domain bone conduction signal collected by the bone voiceprint sensor of the earphone into a frequency-domain bone conduction signal.
  • the headset includes a microphone and a bone voiceprint sensor.
  • the first time-domain microphone signal collected by the microphone and the first time-domain bone conduction signal collected by the bone voiceprint sensor are collected over the same time period, and the microphone and the bone voiceprint sensor are located in the same earphone, so the signals collected by the two come from the same sound source in the environment where the headset is located.
  • that is, the same audio is converted into the first time-domain microphone signal after being collected by the microphone, and into the first time-domain bone conduction signal after being collected by the bone voiceprint sensor.
  • the earphone can use one or more microphones to collect air-conducted sound signals in real time, including ambient noise around the earphone and air-conducted sound signals emitted by the earphone wearer, to obtain the first time-domain microphone signal.
  • the headset includes multiple microphones, the microphone signals collected by each microphone can be subjected to beamforming processing to obtain a first time-domain microphone signal.
  • the earphone collects the vibration signal conducted through the skull, the jaw, etc. in real time through the bone voiceprint sensor to obtain the first time-domain bone conduction signal.
  • Both the first time-domain microphone signal and the first time-domain bone conduction signal are digital signals converted from analog signals.
  • the first time-domain microphone signal is converted from the time-domain to the frequency-domain by Fourier transform to obtain a frequency-domain microphone signal.
  • the first time-domain bone conduction signal is transformed from the time domain into the frequency domain through Fourier transform to obtain the frequency-domain bone conduction signal.
  • Step S200 obtaining a coherence coefficient according to the frequency-domain microphone signal and the frequency-domain bone conduction signal
  • the coherence coefficient is used to reflect the correlation between the frequency-domain microphone signal and the frequency-domain bone conduction signal.
  • the coherence coefficient is positively correlated with the correlation. The larger the coherence coefficient, the higher the correlation.
  • the bone conduction signal collected by the bone voiceprint sensor is not conducted through the air and is not polluted by the environment.
  • the correlation between the microphone signal and the bone conduction signal is high, and the coherence coefficient is large; for noise, the microphone signal contains air-conducted noise, so the correlation between the microphone signal and the bone conduction signal is relatively low and the coherence coefficient is small.
  • the noise signal in the currently acquired frequency-domain microphone signal accounts for a large proportion, the correlation between the frequency-domain microphone signal and the frequency-domain bone conduction signal is low, and the coherence coefficient is small;
  • the speech signal in the frequency-domain microphone signal is relatively pure, so the correlation between the frequency-domain microphone signal and the frequency-domain bone conduction signal is high, and the coherence coefficient is large.
  • the earphone can obtain the coherence coefficient according to the frequency domain microphone signal and the frequency domain bone conduction signal.
  • the cross power spectral density between the frequency-domain microphone signal and the frequency-domain bone conduction signal can be obtained according to the two signals, the power spectral density of the frequency-domain microphone signal and the power spectral density of the frequency-domain bone conduction signal can be obtained, and the coherence coefficient is calculated according to the cross power spectral density, the power spectral density of the frequency-domain microphone signal, and the power spectral density of the frequency-domain bone conduction signal.
  • Step S300 obtaining spectral energy according to the frequency domain bone conduction signal
  • the earphone can obtain spectral energy according to the bone conduction signal in the frequency domain. Spectral energy is used to measure the energy of the frequency domain bone conduction signal in the low frequency band.
  • Step S400 according to the coherence coefficient and the spectral energy, determine that the earphone has detected speech or noise.
  • the degree of correlation between the frequency-domain microphone signal and the frequency-domain bone conduction signal can be determined according to the coherence coefficient.
  • when the correlation is low, it is determined that the currently obtained frequency-domain microphone signal and frequency-domain bone conduction signal are noise, that is, the audio signal detected by the earphone is noise; otherwise, speech or noise is further judged according to the level of the spectral energy: when the spectral energy is low, it is judged that the currently obtained frequency-domain microphone signal and frequency-domain bone conduction signal are noise, that is, the audio signal detected by the earphone is noise.
  • when the correlation is high and the spectral energy is high, it is determined that the currently obtained frequency-domain microphone signal and frequency-domain bone conduction signal are speech, that is, the audio signal detected by the earphone is speech.
  • step S400 includes:
  • the earphone is confirmed to detect speech when the coherence coefficient is greater than or equal to a preset coherence coefficient and the spectral energy is greater than or equal to a preset spectral energy, and confirmed to detect noise when the coherence coefficient is smaller than the preset coherence coefficient or the spectral energy is smaller than the preset spectral energy.
  • the preset coherence coefficient and preset spectral energy can be adjusted according to actual needs or the microphone and bone voiceprint sensor, and can be customized by the designer.
  • the coherence coefficient is greater than or equal to the preset coherence coefficient, and the spectrum energy is greater than or equal to the preset spectrum energy, it can be determined that the audio signal currently detected by the headset is speech, and noise cancellation is performed on the spectrum microphone signal and the spectrum bone conduction signal respectively.
  • the coherence coefficient is smaller than the preset coherence coefficient, or the spectrum energy is smaller than the preset spectrum energy, it may be determined that the audio signal detected by the current earphone is noise.
  • Noise removal for spectral microphone signals and spectral bone conduction signals may include spectral subtraction, Wiener filtering, MMSE minimum mean square error method, subspace method, wavelet transform method, and neural network-based noise reduction algorithm.
  • after step S400, the method also includes:
  • outputting a mute signal when it is confirmed that the earphone detects noise.
  • the coherence coefficient is smaller than the preset coherence coefficient, or the spectrum energy is smaller than the preset spectrum energy, it is determined that the currently detected audio signal is noise, and the mute signal is directly output, wherein the time domain amplitude corresponding to the mute signal is 0. In this way, the impact of noise on the uplink call can be effectively reduced.
  • after step S400, the method further includes:
  • Step S500 respectively performing noise removal on the frequency domain microphone signal and the frequency domain bone conduction signal
  • Step S600 converting the noise-eliminated spectrum microphone signal into a second time-domain microphone signal, and converting the noise-eliminated frequency-domain bone conduction signal into a second time-domain bone conduction signal;
  • Step S700 Mix and output the second time-domain microphone signal and the second time-domain bone conduction signal.
  • the second time-domain microphone signal and the second time-domain bone conduction signal are mixed and processed to obtain a mixed sound signal, and the mixed sound signal is output to be used for an uplink call.
  • the noise-eliminated spectral microphone signal is converted from the frequency domain to the time domain by inverse Fourier transform to obtain a second time-domain microphone signal.
  • the noise-eliminated spectral bone conduction signal is converted from the frequency domain to the time domain by inverse Fourier transform to obtain the second time domain bone conduction signal.
  • Noise cancellation is performed on the frequency-domain microphone signal and the frequency-domain bone conduction signal respectively. While eliminating the environmental noise, under the condition of strong noise, the fidelity of the low-frequency signal of the bone voiceprint sensor is much better than that of the low-frequency signal of the microphone. The quality of the uplink voice and audio signals is improved, the clarity of the low frequency signal is improved, and the output uplink call has the beneficial effect of better recognition.
  • high-pass filtering may be used to process the second time-domain microphone signal, and low-pass filtering may be used to process the second time-domain bone conduction signal;
  • the processed second time-domain microphone signal and the processed second time-domain bone conduction signal may then be mixed to obtain a mixed sound signal, and the mixed sound signal is output.
  • High-pass filtering is used to process the second time-domain microphone signal so as to block and attenuate its low-frequency content; low-pass filtering is used to process the second time-domain bone conduction signal so as to block and attenuate its high-frequency content.
  • the processed second time-domain microphone signal and the processed second time-domain bone conduction signal are mixed to obtain a mixed sound signal, and the mixed sound signal is output for use in an uplink call.
  • the first time-domain microphone signal is converted into a frequency-domain microphone signal
  • the first time-domain bone conduction signal is converted into a frequency-domain bone conduction signal
  • a coherence coefficient is obtained according to the frequency-domain microphone signal and the frequency-domain bone conduction signal, spectral energy is obtained according to the frequency-domain bone conduction signal, and the current speech frame is confirmed as speech or noise according to the coherence coefficient and the spectral energy; the correlation between the microphone signal and the bone conduction signal is judged through the coherence coefficient.
  • when the correlation between the microphone signal and the bone conduction signal is high, it is further determined with reference to the spectral energy whether the earphone has detected speech or noise, so as to prevent a low-energy microphone signal from being judged as speech and improve the accuracy of judging speech and noise.
  • FIG. 4 is a schematic flowchart of the second embodiment of the voice activity detection method for the headset of the present application.
  • Step S200 includes:
  • Step S210 acquiring sub-frequency-domain microphone signals of each sub-band of the frequency-domain microphone signal in the first preset frequency band;
  • Step S220 acquiring sub-frequency-domain bone conduction signals of each sub-band of the frequency-domain bone conduction signal in the first preset frequency band;
  • Step S230 Obtain the coherence coefficient according to the sub-frequency domain microphone signal of each of the sub-bands and the sub-frequency domain bone conduction signal of each of the sub-bands.
  • after the Fourier transform, a spectrum with a preset bandwidth, such as 0-8000 Hz, is obtained.
  • the bandwidth can be divided into sub-bands with equal frequency spacing, for example, the bandwidth of 0-8000 Hz is divided into 128 sub-bands, each 62.5 Hz wide.
  • the first preset frequency band is a part of the preset bandwidth, which can be set according to requirements or effects, such as 0-4000Hz, a total of 64 sub-bands.
  • the coherence coefficient is obtained according to the sub-frequency domain microphone signal of each sub-band and the sub-frequency domain bone conduction signal of each sub-band.
  • step S230 includes:
  • Step S231 obtaining the microphone sub-band energy of the frequency-domain microphone signal in the first preset frequency band according to the sub-frequency-domain microphone signal of each of the sub-bands;
  • Step S232 obtaining the bone conduction subband energy of the frequency domain bone conduction signal in the first preset frequency band according to the subfrequency domain bone conduction signal of each of the subbands;
  • Step S233 obtaining the cross-correlation coefficient of each of the sub-bands according to the sub-frequency domain microphone signal and the sub-frequency domain bone conduction signal corresponding to the same sub-band;
  • Step S234 Obtain the coherence coefficient according to the cross-correlation coefficient of each of the subbands, the energy of the microphone subband, and the energy of the bone conduction subband.
  • the earphone obtains the microphone sub-band energy of the frequency-domain microphone signal in the first preset frequency band according to the sub-frequency-domain microphone signal of each sub-band. Further, the energy of the microphone sub-bands in the first preset frequency band is equal to the sum of the squares of the moduli of the sub-frequency domain microphone signals of each sub-band.
  • the earphone obtains the bone conduction subband energy of the frequency domain bone conduction signal in the first preset frequency band according to the subfrequency domain bone conduction signal of each subband. Further, the bone conduction sub-band energy in the first preset frequency band is equal to the sum of the squares of the modes of the sub-bone conduction signals of each sub-band.
  • the earphone obtains the cross-correlation coefficient of each sub-band in the first preset frequency band according to the sub-frequency domain microphone signal and the sub-frequency domain bone conduction signal corresponding to the same sub-band. Further, the cross-correlation coefficient of the sub-band is equal to the product of the corresponding sub-frequency domain microphone signal and the sub-frequency domain bone conduction signal.
  • the earphone obtains the coherence coefficient according to the cross-correlation coefficient of each sub-band, the energy of the microphone sub-band and the energy of the bone conduction sub-band. Further, the earphone can obtain the sum of the cross-correlation coefficients of the first preset frequency band according to the cross-correlation coefficients of the respective subbands, wherein the sum of the cross-correlation coefficients is equal to the sum of the cross-correlation coefficients of the respective subbands. The earphone can obtain the coherence coefficient according to the sum of the cross-correlation coefficients, the energy of the microphone sub-band and the energy of the bone conduction sub-band.
  • the coherence coefficient is equal to the ratio of the sum of the cross-correlation coefficients to the square root of the product of the microphone sub-band energy and the bone conduction sub-band energy.
  • the coherence coefficient satisfies the following formula: Φ = (Σ_k Y_1(k)*Y_2(k)) / √((Σ_k |Y_1(k)|^2) * (Σ_k |Y_2(k)|^2)), where Φ is the coherence coefficient, k is the sub-band index in the first preset frequency band, Y_1(k) is the sub-frequency-domain microphone signal of sub-band k, and Y_2(k) is the sub-frequency-domain bone conduction signal of sub-band k.
  • FIG. 6 is a schematic flowchart of the third embodiment of the voice activity detection method for the headset of the present application.
  • Step S300 includes:
  • Step S310 acquiring sub-frequency-domain bone conduction signals of each sub-band of the frequency-domain bone conduction signal in the second preset frequency band;
  • Step S320 obtaining the spectral energy according to each of the sub-frequency domain bone conduction signals.
  • the second preset frequency band can be selected from the same preset bandwidth in the second embodiment, such as 0-8000 Hz.
  • the second preset frequency band is a part of the preset bandwidth, which can be set according to requirements or actual effects, such as 0-2000Hz, a total of 32 sub-bands.
  • the spectral energy is equal to the sum of the squares of the moduli of the sub-frequency-domain bone conduction signals of the sub-bands.
  • further, the sub-frequency-domain energy of each sub-band can be obtained from its sub-frequency-domain bone conduction signal, and the spectral energy can be obtained from the sub-frequency-domain energies of the sub-bands, where the sub-frequency-domain energy of a sub-band is equal to the square of the modulus of its sub-frequency-domain bone conduction signal and the spectral energy is equal to the sum of the sub-frequency-domain energies of the sub-bands.
  • the spectral energy satisfies the following formula: E_g = Σ_k |Y_2(k)|^2, where E_g is the spectral energy, k is the sub-band index in the second preset frequency band, and Y_2(k) is the sub-frequency-domain bone conduction signal of sub-band k.
  • the sub-frequency-domain bone conduction signal of each sub-band in the second preset frequency band is obtained, and the spectral energy is obtained from the sub-frequency-domain bone conduction signals of the sub-bands. By setting an appropriate second preset frequency band and obtaining the spectral energy from the sub-frequency-domain bone conduction signals of the sub-bands in the low frequency band, the spectral energy becomes more meaningful in practice and more accurately reflects the true energy level, making speech recognition more accurate; further, when the frequency of the sound signal is low, the coherence coefficient between the frequency-domain microphone signal and the frequency-domain bone conduction signal may also be large, which can easily cause noise to be misjudged as speech, and combining the decision with the spectral energy effectively eliminates such misjudgments at low energy.
  • FIG. 7 is a schematic flowchart of the fourth embodiment of the voice activity detection method for the headset of the present application.
  • Step S500 includes:
  • Step S510 obtaining the historical microphone noise power spectral density and the historical bone conduction noise power spectral density of the earphone;
  • Step S520 performing noise removal on the frequency-domain microphone signal according to the frequency-domain microphone signal and the historical microphone noise power spectral density
  • Step S530 performing noise elimination on the frequency-domain bone conduction signal according to the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density.
  • the earphone stores the last detected microphone noise signal and bone conduction noise signal.
  • the historical microphone noise power spectral density may be the last microphone noise signal identified by the earphone; the historical bone conduction noise power spectral density may be the last bone conduction noise signal identified by the earphone.
  • the headset can cancel and enhance the spectral microphone signal according to the spectral microphone signal and the historical microphone noise power spectral density. Further, a corresponding gain function can be obtained according to the frequency-domain microphone signal and the historical microphone noise power spectral density, and noise elimination and enhancement can be performed on the frequency-domain microphone signal according to the gain function and the spectrum microphone signal.
  • the earphone can cancel and enhance the spectral bone conduction signal according to the spectral bone conduction signal and the power spectral density of the historical bone conduction noise. Further, a corresponding gain function can be obtained according to the frequency domain bone conduction signal and the historical bone conduction noise power spectral density, and noise elimination and enhancement can be performed on the frequency domain bone conduction signal according to the gain function and the spectral bone conduction signal.
  • the noise cancellation and enhancement of the frequency-domain microphone signal or the frequency-domain bone conduction signal satisfies a gain-function formula (rendered as images in the original filing), in which H_t(k) is the gain function, γ_t(k) is the a posteriori signal-to-noise ratio, λ is the over-subtraction factor (a constant, such as 0.9), and P_n(k,t-1) is the historical microphone noise power spectral density or the historical bone conduction noise power spectral density; the result is the noise-cancelled frequency-domain microphone signal or the noise-cancelled frequency-domain bone conduction signal.
  • the frequency-domain microphone signal is noise-cancelled and enhanced according to the frequency-domain microphone signal and the historical microphone noise power spectral density, and the frequency-domain bone conduction signal is noise-cancelled and enhanced according to the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density; the current sound signal is denoised according to the noise signal detected last time, and the noise cancellation takes account of the environmental noise and the characteristics of the bone voiceprint sensor, giving a better noise reduction effect.
  • under strong-noise conditions, the low-frequency fidelity of the bone voiceprint sensor is much better than that of the microphone, which improves the quality of the uplink speech signal and the clarity of the low-frequency signal, so the output uplink call has better intelligibility.
  • FIG. 8 is a schematic flowchart of the fifth embodiment of the voice activity detection method for the headset of the present application. After step S400, the method further includes:
  • Step S800 when it is confirmed that the earphone detects noise, obtain the microphone noise power spectral density according to the historical microphone noise power spectral density and the frequency domain microphone signal;
  • Step S900 obtaining a bone conduction noise power spectral density according to the historical bone conduction noise power spectral density and the frequency domain bone conduction signal;
  • Step S1000 updating the historical microphone noise power spectral density to the microphone noise power spectral density
  • Step S1100 updating the historical bone conduction noise power spectral density to the bone conduction noise power spectral density.
  • when the earphone detects noise, the microphone noise power spectral density is obtained according to the historical microphone noise power spectral density and the frequency-domain microphone signal, and the bone conduction noise power spectral density is obtained according to the historical bone conduction noise power spectral density and the frequency-domain bone conduction signal.
  • the microphone noise power spectral density is obtained according to the square of the modulus of the frequency-domain microphone signal and the historical microphone noise power spectral density
  • the bone conduction noise power spectral density is obtained according to the square of the modulus of the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density.
  • the microphone noise power spectral density satisfies the following formula: P_n1(k,t) = β*P_n1(k,t-1) + (1-β)*|Y_1(k,t)|^2, where P_n1(k,t) is the microphone noise power spectral density, P_n1(k,t-1) is the historical microphone noise power spectral density, β is the iteration factor (a constant, such as 0.9), t is the speech frame number, and k is the sub-band index.
  • the bone conduction noise power spectral density satisfies the following formula: P_n2(k,t) = β*P_n2(k,t-1) + (1-β)*|Y_2(k,t)|^2, where P_n2(k,t) is the bone conduction noise power spectral density, P_n2(k,t-1) is the historical bone conduction noise power spectral density, β is the iteration factor (a constant, such as 0.9), t is the speech frame number, and k is the sub-band index.
  • the historical microphone noise power spectral density is updated to the microphone noise power spectral density
  • the historical bone conduction noise power spectral density is updated to the bone conduction noise power spectral density.
  • the historical microphone noise power spectral density and the historical bone conduction noise power spectral density are obtained; the microphone noise power spectral density is obtained according to the frequency-domain microphone signal and the historical microphone noise power spectral density, and the bone conduction noise power spectral density is obtained according to the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density; the historical microphone noise power spectral density and the historical bone conduction noise power spectral density are then updated, so that the noise estimate is refreshed in time, the current noise is cancelled or enhanced in step with changes in the environmental noise, and a better noise reduction effect is achieved.
  • an embodiment of the present application also provides a headset, the headset includes a microphone, a bone voiceprint sensor, a processor, a memory, and a voice activity detection program for the headset that is stored on the memory and can run on the processor , when the voice activity detection program of the headset is executed by the processor, the content of the above embodiments of the voice activity detection method of the headset is implemented.
  • Embodiments of the present application further provide a computer-readable storage medium, where a voice activity detection program of an earphone is stored on the computer-readable storage medium; when the voice activity detection program of the earphone is executed by a processor, the content of the above embodiments of the voice activity detection method of the earphone is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Otolaryngology (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Neurosurgery (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Details Of Audible-Bandwidth Transducers (AREA)

Abstract

Some embodiments of the present application disclose a voice activity detection method for an earphone, including: converting a first time-domain microphone signal into a frequency-domain microphone signal, and converting a first time-domain bone conduction signal into a frequency-domain bone conduction signal; obtaining a coherence coefficient according to the frequency-domain microphone signal and the frequency-domain bone conduction signal; obtaining spectral energy according to the frequency-domain bone conduction signal; and determining, according to the coherence coefficient and the spectral energy, that the earphone has detected speech or noise. The present application also discloses an earphone and a storage medium. The correlation between the microphone signal and the bone conduction signal is judged by means of the coherence coefficient; when the correlation is judged to be high, whether the audio acquired by the earphone is speech or noise is further determined according to the spectral energy, which prevents a low-energy microphone signal from being judged as speech and improves the accuracy of distinguishing speech from noise.

Description

Voice activity detection method for earphone, earphone, and storage medium
This application claims priority to Chinese patent application No. 202010953526.7, filed with the Chinese Patent Office on September 10, 2020 and entitled "Voice activity detection method for earphone, earphone, and storage medium", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the technical field of wireless communication, and in particular to a voice activity detection method for an earphone, an earphone, and a storage medium.
BACKGROUND
Speech enhancement is an effective way to deal with noise pollution. It can extract a clean speech signal from noisy speech and reduce listener fatigue, and is currently widely used in digital mobile phones, hands-free car phone systems, teleconferencing, and in reducing background interference for the hearing impaired.
In the prior art, VAD (Voice Activity Detection) is used to determine whether the signal frame currently being processed is a speech signal or a noise signal: sound features are extracted from the sound signal by the VAD, and the sound signal is judged to be noise or speech according to these features, which suffers from low recognition accuracy.
The above content is only intended to assist in understanding the technical solution of the present application and does not constitute an admission that it is prior art.
SUMMARY
The main purpose of the embodiments of the present application is to provide a voice activity detection method for an earphone, aiming to solve the technical problem in the prior art that using VAD to determine whether a sound signal is noise or speech has low recognition accuracy.
To solve the above problem, an embodiment of the present application provides a voice activity detection method for an earphone, including the following:
converting a first time-domain microphone signal collected by a microphone of the earphone into a frequency-domain microphone signal, and converting a first time-domain bone conduction signal collected by a bone voiceprint sensor of the earphone into a frequency-domain bone conduction signal, wherein the first time-domain microphone signal and the first time-domain bone conduction signal are collected over the same time period;
obtaining a coherence coefficient according to the frequency-domain microphone signal and the frequency-domain bone conduction signal;
obtaining spectral energy according to the frequency-domain bone conduction signal;
determining, according to the coherence coefficient and the spectral energy, that the earphone has detected speech or noise.
Optionally, the step of obtaining a coherence coefficient according to the frequency-domain microphone signal and the frequency-domain bone conduction signal includes:
obtaining a sub-frequency-domain microphone signal of each sub-band of the frequency-domain microphone signal in a first preset frequency band;
obtaining a sub-frequency-domain bone conduction signal of each sub-band of the frequency-domain bone conduction signal in the first preset frequency band;
obtaining the coherence coefficient according to the sub-frequency-domain microphone signal of each sub-band and the sub-frequency-domain bone conduction signal of each sub-band.
Optionally, the step of obtaining the coherence coefficient according to the sub-frequency-domain microphone signal of each sub-band and the sub-frequency-domain bone conduction signal of each sub-band includes:
obtaining, according to the sub-frequency-domain microphone signal of each sub-band, the microphone sub-band energy of the frequency-domain microphone signal in the first preset frequency band;
obtaining, according to the sub-frequency-domain bone conduction signal of each sub-band, the bone conduction sub-band energy of the frequency-domain bone conduction signal in the first preset frequency band;
obtaining a cross-correlation coefficient of each sub-band according to the sub-frequency-domain microphone signal and the sub-frequency-domain bone conduction signal corresponding to the same sub-band;
obtaining the coherence coefficient according to the cross-correlation coefficient of each sub-band, the microphone sub-band energy, and the bone conduction sub-band energy.
Optionally, the step of obtaining spectral energy according to the frequency-domain bone conduction signal further includes:
obtaining a sub-frequency-domain bone conduction signal of each sub-band of the frequency-domain bone conduction signal in a second preset frequency band;
obtaining the spectral energy according to each sub-frequency-domain bone conduction signal.
Optionally, the step of determining, according to the coherence coefficient and the spectral energy, that the earphone has detected speech or noise includes:
when the coherence coefficient is greater than or equal to a preset coherence coefficient and the spectral energy is greater than or equal to a preset spectral energy, confirming that the earphone has detected speech;
when the coherence coefficient is smaller than the preset coherence coefficient, or the spectral energy is smaller than the preset spectral energy, confirming that the earphone has detected noise.
Optionally, after the step of confirming that the earphone has detected speech, the method further includes:
performing noise cancellation on the frequency-domain microphone signal and the frequency-domain bone conduction signal respectively;
converting the noise-cancelled frequency-domain microphone signal into a second time-domain microphone signal, and converting the noise-cancelled frequency-domain bone conduction signal into a second time-domain bone conduction signal;
mixing the second time-domain microphone signal and the second time-domain bone conduction signal and outputting the result.
Optionally, the step of performing noise cancellation on the frequency-domain microphone signal and the frequency-domain bone conduction signal respectively includes:
obtaining a historical microphone noise power spectral density and a historical bone conduction noise power spectral density of the earphone;
performing noise cancellation on the frequency-domain microphone signal according to the frequency-domain microphone signal and the historical microphone noise power spectral density;
performing noise cancellation on the frequency-domain bone conduction signal according to the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density.
Optionally, after the step of determining, according to the coherence coefficient and the spectral energy, that the earphone has detected speech or noise, the voice activity detection method for the earphone further includes:
when it is confirmed that the earphone has detected noise, obtaining a microphone noise power spectral density according to the historical microphone noise power spectral density and the frequency-domain microphone signal;
obtaining a bone conduction noise power spectral density according to the historical bone conduction noise power spectral density and the frequency-domain bone conduction signal;
updating the historical microphone noise power spectral density to the microphone noise power spectral density;
updating the historical bone conduction noise power spectral density to the bone conduction noise power spectral density.
In addition, to solve the above problem, an embodiment of the present application further provides an earphone, which includes a microphone, a bone voiceprint sensor, a processor, a memory, and a voice activity detection program of the earphone that is stored on the memory and executable on the processor; when the voice activity detection program is executed by the processor, the steps of the voice activity detection method described above are implemented.
An embodiment of the present application further provides a computer-readable storage medium on which a voice activity detection program of an earphone is stored; when the voice activity detection program is executed by a processor, the steps of the voice activity detection method described above are implemented.
In the voice activity detection method proposed by the embodiments of the present application, the first time-domain microphone signal is converted into a frequency-domain microphone signal and the first time-domain bone conduction signal is converted into a frequency-domain bone conduction signal; a coherence coefficient is obtained from the frequency-domain microphone signal and the frequency-domain bone conduction signal, spectral energy is obtained from the frequency-domain bone conduction signal, and the current speech frame is confirmed as speech or noise according to the coherence coefficient and the spectral energy. The coherence coefficient measures the correlation between the microphone signal and the bone conduction signal; when this correlation is judged to be high, whether the earphone has detected speech or noise is further determined with reference to the spectral energy, which prevents a low-energy microphone signal from being judged as speech and improves the accuracy of distinguishing speech from noise.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to describe the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the drawings of the present application, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is a schematic structural diagram of an earphone in the hardware operating environment involved in the solutions of the embodiments of the present application;
FIG. 2 is a schematic flowchart of a first embodiment of the voice activity detection method for an earphone of the present application;
FIG. 3 is a schematic flowchart of the process following step S400 in FIG. 2;
FIG. 4 is a schematic flowchart of a second embodiment of the voice activity detection method for an earphone of the present application;
FIG. 5 is a schematic flowchart of the refinement of step S230 in FIG. 4;
FIG. 6 is a schematic flowchart of a third embodiment of the voice activity detection method for an earphone of the present application;
FIG. 7 is a schematic flowchart of a fourth embodiment of the voice activity detection method for an earphone of the present application;
FIG. 8 is a schematic flowchart of a fifth embodiment of the voice activity detection method for an earphone of the present application.
DETAILED DESCRIPTION
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The main solution of the embodiments of the present application is: audio acquired by the earphone is processed by the microphone of the earphone and converted from a first time-domain microphone signal into a frequency-domain microphone signal, and the audio acquired by the earphone is processed by the bone voiceprint sensor of the earphone and converted from a first time-domain bone conduction signal into a frequency-domain bone conduction signal; a coherence coefficient is obtained according to the frequency-domain microphone signal and the frequency-domain bone conduction signal; spectral energy is obtained according to the frequency-domain bone conduction signal; and it is determined, according to the coherence coefficient and the spectral energy, that the earphone has detected speech or noise.
In the prior art, VAD is used to determine whether a sound signal is noise or speech, which has the technical problem of low recognition accuracy.
The embodiments of the present application provide a solution: the first time-domain microphone signal is converted into a frequency-domain microphone signal and the first time-domain bone conduction signal is converted into a frequency-domain bone conduction signal; a coherence coefficient is obtained according to the frequency-domain microphone signal and the frequency-domain bone conduction signal, spectral energy is obtained according to the frequency-domain bone conduction signal, and the current speech frame is confirmed as speech or noise according to the coherence coefficient and the spectral energy. The coherence coefficient measures the correlation between the microphone signal and the bone conduction signal; when this correlation is judged to be high, whether the audio acquired by the earphone is speech or noise is further determined with reference to the spectral energy, which prevents a low-energy microphone signal from being judged as speech and improves the accuracy of distinguishing speech from noise.
As shown in FIG. 1, FIG. 1 is a schematic structural diagram of an earphone in the hardware operating environment involved in the solutions of the embodiments of the present application.
The executing body of the embodiments of the present application may be an earphone. The earphone may be a wired earphone or a wireless earphone such as a TWS (True Wireless Stereo) Bluetooth earphone.
As shown in FIG. 1, the earphone may include: a processor 1001 such as a CPU or an IC chip, a communication bus 1002, a memory 1003, a microphone 1004, and a bone voiceprint sensor 1005. The communication bus 1002 is used to implement connection and communication between these components. The memory 1003 may be a high-speed RAM memory or a non-volatile memory such as a disk memory. The memory 1003 may optionally also be a storage device independent of the aforementioned processor 1001. The microphone 1004 is used to collect sound signals conducted through the air; the collected sound signals can be used for calls and for the noise reduction function. The bone voiceprint sensor 1005 is used to collect vibration signals conducted through the skull, the jaw, and the like; the collected vibration signals are used for the noise reduction function.
Further, the earphone may also include: a battery assembly, a touch assembly, an LED light, sensors, and speakers. The battery assembly powers the earphone; the touch assembly implements the touch function and may be a button; the LED light indicates the working state of the earphone, such as a power-on prompt, a charging prompt, or a terminal connection prompt; the sensors may include a gravitational acceleration sensor, a vibration sensor, a gyroscope, and the like, and are used to detect the state of the earphone so as to determine the physical movement of the user currently wearing it. As for the speakers, there may be two or more; for example, each earpiece may be provided with two speakers, a moving-coil speaker and a moving-iron (balanced-armature) speaker. The moving-coil speaker responds better at low and middle frequencies, while the moving-iron speaker responds better in the mid-to-high range. The two speakers are used at the same time: through the crossover function of the processor, the moving-iron speaker is connected in parallel with the moving-coil speaker so that the ear hears sound across the entire audio band.
Those skilled in the art will understand that the structure of the earphone shown in FIG. 1 does not constitute a limitation on the terminal; it may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1003, as a computer storage medium, may include an operating system and a voice activity detection program of the earphone, and the processor 1001 may be used to call the voice activity detection program stored in the memory 1003.
Based on the above terminal structure, a first embodiment of the present application is proposed. Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the voice activity detection method for an earphone of the present application. The method includes the following steps:
Step S100: converting a first time-domain microphone signal collected by a microphone of the earphone into a frequency-domain microphone signal, and converting a first time-domain bone conduction signal collected by a bone voiceprint sensor of the earphone into a frequency-domain bone conduction signal, wherein the first time-domain microphone signal and the first time-domain bone conduction signal are collected over the same time period;
Sound waves can reach the inner ear by two paths: air conduction and bone conduction. In air conduction, sound waves travel via the auricle and the external auditory canal to the middle ear, and then via the ossicular chain to the inner ear; the resulting speech spectrum is relatively rich. In bone conduction, sound waves are transmitted to the inner ear through vibrations of the skull, the jaw, and the like; the sound waves reach the inner ear without passing through the outer ear or the middle ear.
The bone voiceprint sensor includes a bone conduction microphone, which can only pick up sound signals that are in direct contact with it and cause vibration; it cannot pick up sound signals propagating through the air and is not disturbed by environmental noise, making it suitable for speech transmission in noisy environments. Owing to manufacturing process limitations, the bone voiceprint sensor can only collect and transmit relatively low-frequency sound signals, so the sound tends to be dull.
In this embodiment, the earphone converts, in real time, the first time-domain microphone signal collected by the microphone of the earphone into a frequency-domain microphone signal, and converts the first time-domain bone conduction signal collected by the bone voiceprint sensor of the earphone into a frequency-domain bone conduction signal. The earphone includes a microphone and a bone voiceprint sensor. The first time-domain microphone signal collected by the microphone and the first time-domain bone conduction signal collected by the bone voiceprint sensor are collected over the same time period, and the microphone and the bone voiceprint sensor are located in the same earphone, so the signals they collect come from the same sound source in the environment where the earphone is located: the same audio is converted into the first time-domain microphone signal after being collected by the microphone, and into the first time-domain bone conduction signal after being collected by the bone voiceprint sensor.
Optionally, the earphone may use one or more microphones to collect, in real time, sound signals conducted through the air, including ambient noise around the earphone and air-conducted sound emitted by the wearer, to obtain the first time-domain microphone signal. When the earphone includes multiple microphones, the microphone signals collected by the individual microphones may be beamformed to obtain the first time-domain microphone signal.
Optionally, the earphone collects, in real time through the bone voiceprint sensor, vibration signals conducted through the skull, the jaw, and the like to obtain the first time-domain bone conduction signal. Both the first time-domain microphone signal and the first time-domain bone conduction signal are digital signals converted from analog signals.
The first time-domain microphone signal is converted from the time domain to the frequency domain by a Fourier transform to obtain the frequency-domain microphone signal. The first time-domain bone conduction signal is converted from the time domain to the frequency domain by a Fourier transform to obtain the frequency-domain bone conduction signal.
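As an illustrative sketch only (not part of the original disclosure), step S100 can be realized by framing each time-domain signal and applying an FFT per frame; the frame length, hop size, and Hann window below are assumed example values rather than parameters specified in this application.

```python
# Illustrative only: convert the two time-domain signals of step S100 into
# per-frame frequency-domain signals. Frame length, hop size and the Hann
# window are assumed values, not parameters taken from this application.
import numpy as np

FRAME = 256   # samples per frame (16 ms at a 16 kHz sampling rate)
HOP = 128     # frame advance

def to_frequency_domain(x, frame=FRAME, hop=HOP):
    """Return one row of windowed FFT coefficients per frame of x."""
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    return np.stack([np.fft.rfft(win * x[i * hop:i * hop + frame])
                     for i in range(n_frames)])

# Y1 = to_frequency_domain(mic_td)   # frequency-domain microphone signal
# Y2 = to_frequency_domain(bone_td)  # frequency-domain bone conduction signal
```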
Step S200: obtaining a coherence coefficient according to the frequency-domain microphone signal and the frequency-domain bone conduction signal;
The coherence coefficient reflects the correlation between the frequency-domain microphone signal and the frequency-domain bone conduction signal; it is positively correlated with that correlation, so the larger the coherence coefficient, the higher the correlation.
A sound signal conducted through the air is inevitably contaminated by environmental noise, whereas the bone conduction signal collected by the bone voiceprint sensor is not conducted through the air and is not contaminated by the environment. For speech, the correlation between the microphone signal and the bone conduction signal is high and the coherence coefficient is large; for noise, the microphone signal contains air-conducted noise, so the correlation between the microphone signal and the bone conduction signal is low and the coherence coefficient is small.
It can be understood that if the noise component of the currently acquired frequency-domain microphone signal is large, the correlation between the frequency-domain microphone signal and the frequency-domain bone conduction signal is low and the coherence coefficient is small; if the speech component of the frequency-domain microphone signal is relatively pure, the correlation between the two is high and the coherence coefficient is large.
The earphone can obtain the coherence coefficient from the frequency-domain microphone signal and the frequency-domain bone conduction signal.
Optionally, the cross power spectral density between the frequency-domain microphone signal and the frequency-domain bone conduction signal can be obtained from the two signals, the power spectral density of the frequency-domain microphone signal and the power spectral density of the frequency-domain bone conduction signal can be obtained, and the coherence coefficient can be calculated from the cross power spectral density and the two power spectral densities.
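A minimal sketch of this cross-power-spectral-density route, using Welch-style magnitude-squared coherence as one standard realization; the sampling rate, segment length, and 0-4000 Hz averaging band are assumptions made for illustration.

```python
# Sketch of the cross-power-spectral-density approach: the coherence between
# the microphone and bone conduction signals is estimated from their cross
# PSD and individual PSDs, then averaged over a low-frequency band.
import numpy as np
from scipy.signal import coherence

def mic_bone_coherence(mic_td, bone_td, fs=16000, nperseg=256, band_hz=4000):
    f, cxy = coherence(mic_td, bone_td, fs=fs, nperseg=nperseg)
    return float(np.mean(cxy[f <= band_hz]))  # average coherence in 0..band_hz
```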
Step S300: obtaining spectral energy according to the frequency-domain bone conduction signal;
The earphone can obtain the spectral energy from the frequency-domain bone conduction signal. The spectral energy measures the energy of the frequency-domain bone conduction signal in the low frequency band.
Step S400: determining, according to the coherence coefficient and the spectral energy, that the earphone has detected speech or noise.
The degree of correlation between the frequency-domain microphone signal and the frequency-domain bone conduction signal can be determined from the coherence coefficient. When the correlation is low, the currently obtained frequency-domain microphone signal and frequency-domain bone conduction signal are judged to be noise, that is, the audio signal detected by the earphone is judged to be noise; otherwise, speech or noise is further judged according to the level of the spectral energy: when the spectral energy is low, the currently obtained signals are judged to be noise, and when the correlation is high and the spectral energy is high, the currently obtained frequency-domain microphone signal and frequency-domain bone conduction signal are judged to be speech, that is, the audio signal detected by the earphone is judged to be speech.
As an optional implementation, step S400 includes:
when the coherence coefficient is greater than or equal to a preset coherence coefficient and the spectral energy is greater than or equal to a preset spectral energy, confirming that the earphone has detected speech;
when the coherence coefficient is smaller than the preset coherence coefficient, or the spectral energy is smaller than the preset spectral energy, confirming that the earphone has detected noise.
The preset coherence coefficient and the preset spectral energy can be adjusted according to actual needs or to the particular microphone and bone voiceprint sensor, and can be customized by the designer. When the coherence coefficient is greater than or equal to the preset coherence coefficient and the spectral energy is greater than or equal to the preset spectral energy, the audio signal currently detected by the earphone can be judged to be speech, and noise cancellation is performed on the frequency-domain microphone signal and the frequency-domain bone conduction signal respectively. When the coherence coefficient is smaller than the preset coherence coefficient, or the spectral energy is smaller than the preset spectral energy, the audio signal currently detected by the earphone can be judged to be noise.
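A sketch of this threshold decision follows; the two preset values are placeholders, since the application leaves them to be tuned for the particular microphone and bone voiceprint sensor.

```python
# Decision of step S400: speech only if both the coherence coefficient and
# the spectral energy reach their preset thresholds. Threshold values are
# placeholders, not values from this application.
PRESET_COHERENCE = 0.6
PRESET_SPECTRAL_ENERGY = 1e-3

def earphone_detects_speech(coherence_coeff, spectral_energy,
                            coh_th=PRESET_COHERENCE,
                            energy_th=PRESET_SPECTRAL_ENERGY):
    return coherence_coeff >= coh_th and spectral_energy >= energy_th
```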
Noise cancellation of the frequency-domain microphone signal and the frequency-domain bone conduction signal may use spectral subtraction, Wiener filtering, the MMSE (minimum mean square error) method, subspace methods, wavelet transform methods, neural-network-based noise reduction algorithms, and the like.
Optionally, after step S400, the method further includes:
outputting a mute signal when it is confirmed that the earphone has detected noise.
When the coherence coefficient is smaller than the preset coherence coefficient, or the spectral energy is smaller than the preset spectral energy, the currently detected audio signal is judged to be noise and a mute signal is output directly, where the time-domain amplitude corresponding to the mute signal is 0. In this way, the impact of noise on the uplink call can be effectively reduced.
As an optional implementation, referring to FIG. 3, after step S400 the method further includes:
Step S500: performing noise cancellation on the frequency-domain microphone signal and the frequency-domain bone conduction signal respectively;
Step S600: converting the noise-cancelled frequency-domain microphone signal into a second time-domain microphone signal, and converting the noise-cancelled frequency-domain bone conduction signal into a second time-domain bone conduction signal;
Step S700: mixing the second time-domain microphone signal and the second time-domain bone conduction signal and outputting the result.
The second time-domain microphone signal and the second time-domain bone conduction signal are mixed to obtain a mixed sound signal, and the mixed sound signal is output for use in the uplink call.
The noise-cancelled frequency-domain microphone signal is converted from the frequency domain to the time domain by an inverse Fourier transform to obtain the second time-domain microphone signal; the noise-cancelled frequency-domain bone conduction signal is converted from the frequency domain to the time domain by an inverse Fourier transform to obtain the second time-domain bone conduction signal.
Performing noise cancellation on the frequency-domain microphone signal and the frequency-domain bone conduction signal separately removes the environmental noise; moreover, under strong-noise conditions the low-frequency fidelity of the bone voiceprint sensor is far better than the low-frequency fidelity of the microphone, which improves the quality of the uplink speech signal, improves the clarity of the low-frequency signal, and gives the output uplink call better intelligibility.
Optionally, high-pass filtering may be used to process the second time-domain microphone signal and low-pass filtering may be used to process the second time-domain bone conduction signal; the processed second time-domain microphone signal and the processed second time-domain bone conduction signal are mixed to obtain a mixed sound signal, and the mixed sound signal is output.
High-pass filtering is applied to the second time-domain microphone signal to block and attenuate its low-frequency content; low-pass filtering is applied to the second time-domain bone conduction signal to block and attenuate its high-frequency content. The processed second time-domain microphone signal and the processed second time-domain bone conduction signal are then mixed to obtain a mixed sound signal, which is output for use in the uplink call.
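A sketch of this optional filtering and mixing step, assuming a Butterworth design and a 1 kHz crossover; both are illustrative choices not specified in this application.

```python
# High-pass the denoised microphone signal, low-pass the denoised bone
# conduction signal, and sum them for the uplink call. Filter order and the
# 1 kHz crossover frequency are assumptions.
from scipy.signal import butter, sosfilt

def mix_uplink(mic_td2, bone_td2, fs=16000, crossover_hz=1000):
    sos_hp = butter(4, crossover_hz, btype="highpass", fs=fs, output="sos")
    sos_lp = butter(4, crossover_hz, btype="lowpass", fs=fs, output="sos")
    mic_part = sosfilt(sos_hp, mic_td2)    # keep mid/high band from the microphone
    bone_part = sosfilt(sos_lp, bone_td2)  # keep low band from the bone sensor
    return mic_part + bone_part
```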
In this embodiment, the first time-domain microphone signal is converted into a frequency-domain microphone signal and the first time-domain bone conduction signal is converted into a frequency-domain bone conduction signal; a coherence coefficient is obtained from the frequency-domain microphone signal and the frequency-domain bone conduction signal, spectral energy is obtained from the frequency-domain bone conduction signal, and the current speech frame is confirmed as speech or noise according to the coherence coefficient and the spectral energy. The coherence coefficient measures the correlation between the microphone signal and the bone conduction signal; when this correlation is judged to be high, whether the earphone has detected speech or noise is further determined with reference to the spectral energy, which prevents a low-energy microphone signal from being judged as speech and improves the accuracy of distinguishing speech from noise.
Based on the above first embodiment, referring to FIG. 4, FIG. 4 is a schematic flowchart of the second embodiment of the voice activity detection method for an earphone of the present application. Step S200 includes:
Step S210: obtaining a sub-frequency-domain microphone signal of each sub-band of the frequency-domain microphone signal in a first preset frequency band;
Step S220: obtaining a sub-frequency-domain bone conduction signal of each sub-band of the frequency-domain bone conduction signal in the first preset frequency band;
Step S230: obtaining the coherence coefficient according to the sub-frequency-domain microphone signal of each sub-band and the sub-frequency-domain bone conduction signal of each sub-band.
After the Fourier transform of the first time-domain microphone signal and the first time-domain bone conduction signal, a spectrum with a preset bandwidth, such as 0-8000 Hz, is obtained. The bandwidth can be divided into sub-bands of equal frequency spacing; for example, the 0-8000 Hz bandwidth is divided into 128 sub-bands, each 62.5 Hz wide. The first preset frequency band is a part of the preset bandwidth and can be set according to requirements or observed performance, for example 0-4000 Hz, a total of 64 sub-bands.
The sub-frequency-domain microphone signal of each sub-band of the frequency-domain microphone signal in the first preset frequency band is obtained; the sub-frequency-domain bone conduction signal of each sub-band of the frequency-domain bone conduction signal in the first preset frequency band is obtained; and the coherence coefficient is obtained from the sub-frequency-domain microphone signal and the sub-frequency-domain bone conduction signal of each sub-band.
As an optional implementation, referring to FIG. 5, step S230 includes:
Step S231: obtaining, according to the sub-frequency-domain microphone signal of each sub-band, the microphone sub-band energy of the frequency-domain microphone signal in the first preset frequency band;
Step S232: obtaining, according to the sub-frequency-domain bone conduction signal of each sub-band, the bone conduction sub-band energy of the frequency-domain bone conduction signal in the first preset frequency band;
Step S233: obtaining the cross-correlation coefficient of each sub-band according to the sub-frequency-domain microphone signal and the sub-frequency-domain bone conduction signal corresponding to the same sub-band;
Step S234: obtaining the coherence coefficient according to the cross-correlation coefficient of each sub-band, the microphone sub-band energy, and the bone conduction sub-band energy.
The earphone obtains the microphone sub-band energy of the frequency-domain microphone signal in the first preset frequency band from the sub-frequency-domain microphone signal of each sub-band. Specifically, the microphone sub-band energy in the first preset frequency band is equal to the sum of the squared moduli of the sub-frequency-domain microphone signals of the individual sub-bands.
The earphone obtains the bone conduction sub-band energy of the frequency-domain bone conduction signal in the first preset frequency band from the sub-frequency-domain bone conduction signal of each sub-band. Specifically, the bone conduction sub-band energy in the first preset frequency band is equal to the sum of the squared moduli of the sub-frequency-domain bone conduction signals of the individual sub-bands.
The earphone obtains the cross-correlation coefficient of each sub-band in the first preset frequency band from the sub-frequency-domain microphone signal and the sub-frequency-domain bone conduction signal corresponding to the same sub-band. Specifically, the cross-correlation coefficient of a sub-band is equal to the product of the corresponding sub-frequency-domain microphone signal and sub-frequency-domain bone conduction signal.
The earphone obtains the coherence coefficient from the cross-correlation coefficients of the sub-bands, the microphone sub-band energy, and the bone conduction sub-band energy. Specifically, the earphone can obtain the sum of the cross-correlation coefficients over the first preset frequency band, which is equal to the sum of the cross-correlation coefficients of the individual sub-bands, and then obtain the coherence coefficient from this sum, the microphone sub-band energy, and the bone conduction sub-band energy.
Specifically, the coherence coefficient is equal to the ratio of the sum of the cross-correlation coefficients to the square root of the product of the microphone sub-band energy and the bone conduction sub-band energy.
Optionally, the coherence coefficient satisfies the following formula:
Φ = (Σ_{k=0}^{63} Y_1(k)*Y_2(k)) / √((Σ_{k=0}^{63} |Y_1(k)|^2) * (Σ_{k=0}^{63} |Y_2(k)|^2))
Taking the first preset frequency band as 0-4000 Hz with 64 sub-bands as an example, Φ is the coherence coefficient, k is the sub-band index within the first preset frequency band, Y_1(k) is the sub-frequency-domain microphone signal of sub-band k, and Y_2(k) is the sub-frequency-domain bone conduction signal of sub-band k.
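The formula above can be transcribed almost directly; the sketch below additionally takes the conjugate of the bone conduction term and the magnitude of the sum so that the result is a real number, which is the usual convention and is an assumption beyond the literal formula.

```python
# Sub-band coherence coefficient over the first preset frequency band
# (here the first 64 bins, standing for 0-4000 Hz).
import numpy as np

def coherence_coefficient(Y1, Y2, n_subbands=64, eps=1e-12):
    y1, y2 = Y1[:n_subbands], Y2[:n_subbands]
    cross = np.sum(y1 * np.conj(y2))        # sum of per-sub-band cross terms
    e_mic = np.sum(np.abs(y1) ** 2)         # microphone sub-band energy
    e_bone = np.sum(np.abs(y2) ** 2)        # bone conduction sub-band energy
    return float(np.abs(cross) / np.sqrt(e_mic * e_bone + eps))
```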
In this embodiment, the sub-frequency-domain microphone signal and sub-frequency-domain bone conduction signal corresponding to each sub-band in the first preset frequency band are obtained, and the coherence coefficient is obtained from them. By choosing an appropriate first preset frequency band and combining the sub-frequency-domain microphone signal and sub-frequency-domain bone conduction signal of each sub-band to obtain the correlation between them, the coherence coefficient is derived from the per-sub-band correlation between the microphone and bone conduction signals, which gives the coherence coefficient more statistical significance, makes it more precise, and makes it better suited in practice for judging noise versus speech.
Based on any of the above embodiments, referring to FIG. 6, FIG. 6 is a schematic flowchart of the third embodiment of the voice activity detection method for an earphone of the present application. Step S300 includes:
Step S310: obtaining a sub-frequency-domain bone conduction signal of each sub-band of the frequency-domain bone conduction signal in a second preset frequency band;
Step S320: obtaining the spectral energy according to each sub-frequency-domain bone conduction signal.
In this embodiment, the second preset frequency band can be selected from the same preset bandwidth as in the second embodiment, such as 0-8000 Hz. The second preset frequency band is a part of the preset bandwidth and can be set according to requirements or actual performance, for example 0-2000 Hz, a total of 32 sub-bands.
The sub-frequency-domain bone conduction signal of each sub-band of the frequency-domain bone conduction signal in the second preset frequency band is obtained, and the spectral energy is obtained from the sub-frequency-domain bone conduction signals of the individual sub-bands. Specifically, the spectral energy is equal to the sum of the squared moduli of the sub-frequency-domain bone conduction signals of the sub-bands. Further, the sub-frequency-domain energy of each sub-band can be obtained from its sub-frequency-domain bone conduction signal, and the spectral energy can be obtained from these sub-frequency-domain energies, where the sub-frequency-domain energy of a sub-band equals the squared modulus of its sub-frequency-domain bone conduction signal and the spectral energy equals the sum of the sub-frequency-domain energies of the sub-bands.
Optionally, the spectral energy satisfies the following formula:
E_g = Σ_{k=0}^{31} |Y_2(k)|^2
Taking the second preset frequency band as 0-2000 Hz with 32 sub-bands as an example, E_g is the spectral energy, k is the sub-band index within the second preset frequency band, and Y_2(k) is the sub-frequency-domain bone conduction signal of sub-band k.
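A one-function sketch of this spectral energy, assuming the first 32 bins stand for the 0-2000 Hz second preset frequency band.

```python
import numpy as np

def spectral_energy(Y2, n_subbands=32):
    # Sum of squared moduli of the bone conduction sub-band signals.
    return float(np.sum(np.abs(Y2[:n_subbands]) ** 2))
```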
In this embodiment, the sub-frequency-domain bone conduction signal of each sub-band in the second preset frequency band is obtained and the spectral energy is computed from it. By setting an appropriate second preset frequency band and obtaining the spectral energy from the sub-frequency-domain bone conduction signals of the sub-bands in the low frequency band, the spectral energy becomes more meaningful in practice and more accurately reflects the true energy level, making speech recognition more accurate. Further, when the frequency of the sound signal is low, the coherence coefficient between the frequency-domain microphone signal and the frequency-domain bone conduction signal may also be large, which can easily cause noise to be misjudged as speech; combining the decision with the spectral energy effectively eliminates such misjudgments at low energy.
Based on any of the above embodiments, referring to FIG. 7, FIG. 7 is a schematic flowchart of the fourth embodiment of the voice activity detection method for an earphone of the present application. Step S500 includes:
Step S510: obtaining a historical microphone noise power spectral density and a historical bone conduction noise power spectral density of the earphone;
Step S520: performing noise cancellation on the frequency-domain microphone signal according to the frequency-domain microphone signal and the historical microphone noise power spectral density;
Step S530: performing noise cancellation on the frequency-domain bone conduction signal according to the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density.
The earphone stores the most recently detected microphone noise signal and bone conduction noise signal. The historical microphone noise power spectral density may be derived from the last microphone noise signal identified by the earphone; the historical bone conduction noise power spectral density may be derived from the last bone conduction noise signal identified by the earphone.
The earphone can cancel noise in, and enhance, the frequency-domain microphone signal according to the frequency-domain microphone signal and the historical microphone noise power spectral density. Specifically, a corresponding gain function can be obtained from the frequency-domain microphone signal and the historical microphone noise power spectral density, and noise cancellation and enhancement performed on the frequency-domain microphone signal according to the gain function and the frequency-domain microphone signal.
The earphone can cancel noise in, and enhance, the frequency-domain bone conduction signal according to the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density. Specifically, a corresponding gain function can be obtained from the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density, and noise cancellation and enhancement performed on the frequency-domain bone conduction signal according to the gain function and the frequency-domain bone conduction signal.
Optionally, the noise cancellation and enhancement of the frequency-domain microphone signal or the frequency-domain bone conduction signal satisfies a set of formulas (rendered as images in the original filing): the enhanced spectrum is obtained by applying the gain function H_t(k) to the frequency-domain signal, where the gain function is computed from the a posteriori signal-to-noise ratio γ_t(k), the over-subtraction factor λ, and the historical noise power spectral density P_n(k,t-1). Here H_t(k) is the gain function; γ_t(k) is the a posteriori signal-to-noise ratio; λ is the over-subtraction factor, a constant such as 0.9; P_n(k,t-1) is the historical microphone noise power spectral density or the historical bone conduction noise power spectral density; and the result is the noise-cancelled frequency-domain microphone signal or the noise-cancelled frequency-domain bone conduction signal.
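Because the gain formula itself appears only as an image in the original filing, the sketch below substitutes a conventional over-subtraction gain built from the variables listed above (a posteriori SNR, over-subtraction factor, historical noise power spectral density); it is an assumption, not the formula of this application.

```python
# Assumed spectral-subtraction-style gain: gamma is the a posteriori SNR of
# each sub-band, lam the over-subtraction factor, P_prev the historical
# noise power spectral density. This is NOT the exact patented formula.
import numpy as np

def noise_suppress(Y, P_prev, lam=0.9, gain_floor=0.05, eps=1e-12):
    gamma = np.abs(Y) ** 2 / (P_prev + eps)              # a posteriori SNR
    gain = np.maximum(1.0 - lam / np.maximum(gamma, eps), gain_floor)
    return gain * Y                                      # enhanced spectrum
```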
在本实施例中,通过获取历史麦克噪音功率谱密度以及历史骨导噪音功率谱密度,根据频域麦克信号以及历史麦克噪音功率谱密度对频域麦克信号进行消除以及增强,根据频域骨导信号以及历史骨导噪音功率谱密度对频域骨导信号进行消除以及增强,根据上一次检测到的噪音信号对当前声音信号进行消除,根据环境噪声以及骨声纹传感器的特点对声音信号进行噪音消除,具有更好的降噪效果,在强噪声条件下,骨声纹传感器低频信号保真度远好于麦克风的低频信号的保真度,从而提高上行语音频信号质量,提高低频信号的清晰度,使输出的上行链路通话具有更好的识别度的有益效果。
Based on the fourth embodiment described above, and referring to Fig. 8, which is a schematic flowchart of a fifth embodiment of the voice activity detection method for earphones of the present application, after step S400 the method further includes:
Step S800: when it is determined that the earphone has detected noise, obtaining a microphone noise power spectral density according to the historical microphone noise power spectral density and the frequency-domain microphone signal;
Step S900: obtaining a bone conduction noise power spectral density according to the historical bone conduction noise power spectral density and the frequency-domain bone conduction signal;
Step S1000: updating the historical microphone noise power spectral density to the microphone noise power spectral density;
Step S1100: updating the historical bone conduction noise power spectral density to the bone conduction noise power spectral density.
When the coherence coefficient is smaller than the preset coherence coefficient, or the spectral energy is smaller than the preset spectral energy, the earphone detects noise. In that case, the microphone noise power spectral density is obtained according to the historical microphone noise power spectral density and the frequency-domain microphone signal, and the bone conduction noise power spectral density is obtained according to the historical bone conduction noise power spectral density and the frequency-domain bone conduction signal.
Further, the microphone noise power spectral density is obtained from the squared magnitude of the frequency-domain microphone signal and the historical microphone noise power spectral density, and the bone conduction noise power spectral density is obtained from the squared magnitude of the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density.
Optionally, the microphone noise power spectral density satisfies the following formula:
P_n1(k,t) = β*P_n1(k,t-1) + (1-β)*|Y_1(k,t)|²
where P_n1(k,t) is the microphone noise power spectral density; P_n1(k,t-1) is the historical microphone noise power spectral density; β is an iteration factor, a constant such as 0.9; t is the speech frame index; and k is the subband index.
Optionally, the bone conduction noise power spectral density satisfies the following formula:
P_n2(k,t) = β*P_n2(k,t-1) + (1-β)*|Y_2(k,t)|²
where P_n2(k,t) is the bone conduction noise power spectral density; P_n2(k,t-1) is the historical bone conduction noise power spectral density; β is an iteration factor, a constant such as 0.9; t is the speech frame index; and k is the subband index.
After the bone conduction noise power spectral density and the microphone noise power spectral density are obtained, the historical microphone noise power spectral density is updated to the microphone noise power spectral density, and the historical bone conduction noise power spectral density is updated to the bone conduction noise power spectral density.
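These recursive updates translate directly into code; the same routine serves both the microphone and the bone conduction branches, with β = 0.9 following the example value above.

```python
import numpy as np

BETA = 0.9  # iteration (smoothing) factor; 0.9 follows the example value above

def update_noise_psd(P_n_prev: np.ndarray, Y_t: np.ndarray) -> np.ndarray:
    """P_n(k,t) = beta * P_n(k,t-1) + (1 - beta) * |Y(k,t)|^2, evaluated per subband."""
    return BETA * P_n_prev + (1.0 - BETA) * np.abs(Y_t) ** 2

# Applied only to frames classified as noise, for both sensors:
# P_n1_hist = update_noise_psd(P_n1_hist, Y1_t)  # microphone noise PSD
# P_n2_hist = update_noise_psd(P_n2_hist, Y2_t)  # bone conduction noise PSD
```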
In this embodiment, when the audio signal currently acquired by the earphone is noise, the historical microphone noise power spectral density and the historical bone conduction noise power spectral density are obtained, the microphone noise power spectral density is obtained from the frequency-domain microphone signal and the historical microphone noise power spectral density, the bone conduction noise power spectral density is obtained from the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density, and the historical microphone and bone conduction noise power spectral densities are updated accordingly. Updating the noise estimate in time makes it possible to cancel the current noise, or enhance the signal, as the ambient noise changes, which yields better noise reduction.
In addition, an embodiment of the present application further provides an earphone. The earphone includes a microphone, a bone voiceprint sensor, a processor, a memory, and a voice activity detection program of the earphone that is stored in the memory and executable on the processor; when the voice activity detection program of the earphone is executed by the processor, the contents of the embodiments of the voice activity detection method for earphones described above are implemented.
An embodiment of the present application further provides a computer-readable storage medium. A voice activity detection program of an earphone is stored on the computer-readable storage medium; when the voice activity detection program of the earphone is executed by a processor, the contents of the embodiments of the voice activity detection method for earphones described above are implemented.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
It should be noted that, herein, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes that element.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a computer-readable storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing an earphone (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not limit the patent scope of the present application. Any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (10)

  1. A voice activity detection method for an earphone, wherein the voice activity detection method for an earphone comprises the following steps:
    converting a first time-domain microphone signal collected by a microphone of the earphone into a frequency-domain microphone signal, and converting a first time-domain bone conduction signal collected by a bone voiceprint sensor of the earphone into a frequency-domain bone conduction signal, wherein the first time-domain microphone signal and the first time-domain bone conduction signal are collected during the same time period;
    obtaining a coherence coefficient according to the frequency-domain microphone signal and the frequency-domain bone conduction signal;
    obtaining a spectral energy according to the frequency-domain bone conduction signal; and
    determining, according to the coherence coefficient and the spectral energy, that the earphone has detected speech or noise.
  2. The voice activity detection method for an earphone according to claim 1, wherein the step of obtaining a coherence coefficient according to the frequency-domain microphone signal and the frequency-domain bone conduction signal comprises:
    obtaining a sub-frequency-domain microphone signal of the frequency-domain microphone signal in each subband of a first preset frequency band;
    obtaining a sub-frequency-domain bone conduction signal of the frequency-domain bone conduction signal in each subband of the first preset frequency band; and
    obtaining the coherence coefficient according to the sub-frequency-domain microphone signal of each subband and the sub-frequency-domain bone conduction signal of each subband.
  3. The voice activity detection method for an earphone according to claim 2, wherein the step of obtaining the coherence coefficient according to the sub-frequency-domain microphone signal of each subband and the sub-frequency-domain bone conduction signal of each subband comprises:
    obtaining a microphone subband energy of the frequency-domain microphone signal in the first preset frequency band according to the sub-frequency-domain microphone signal of each subband;
    obtaining a bone conduction subband energy of the frequency-domain bone conduction signal in the first preset frequency band according to the sub-frequency-domain bone conduction signal of each subband;
    obtaining a cross-correlation coefficient of each subband according to the sub-frequency-domain microphone signal and the sub-frequency-domain bone conduction signal corresponding to the same subband; and
    obtaining the coherence coefficient according to the cross-correlation coefficient of each subband, the microphone subband energy, and the bone conduction subband energy.
  4. The voice activity detection method for an earphone according to claim 1, wherein the step of obtaining a spectral energy according to the frequency-domain bone conduction signal further comprises:
    obtaining a sub-frequency-domain bone conduction signal of the frequency-domain bone conduction signal in each subband of a second preset frequency band; and
    obtaining the spectral energy according to each sub-frequency-domain bone conduction signal.
  5. The voice activity detection method for an earphone according to claim 1, wherein the step of determining, according to the coherence coefficient and the spectral energy, that the earphone has detected speech or noise comprises:
    when the coherence coefficient is greater than or equal to a preset coherence coefficient and the spectral energy is greater than or equal to a preset spectral energy, determining that the earphone has detected speech; and
    when the coherence coefficient is smaller than the preset coherence coefficient, or the spectral energy is smaller than the preset spectral energy, determining that the earphone has detected noise.
  6. The voice activity detection method for an earphone according to claim 5, wherein after the step of determining that the earphone has detected speech, the method further comprises:
    performing noise cancellation on the frequency-domain microphone signal and on the frequency-domain bone conduction signal respectively;
    converting the noise-cancelled frequency-domain microphone signal into a second time-domain microphone signal, and converting the noise-cancelled frequency-domain bone conduction signal into a second time-domain bone conduction signal; and
    mixing the second time-domain microphone signal and the second time-domain bone conduction signal and outputting the result.
  7. The voice activity detection method for an earphone according to claim 6, wherein the step of performing noise cancellation on the frequency-domain microphone signal and on the frequency-domain bone conduction signal respectively comprises:
    obtaining a historical microphone noise power spectral density and a historical bone conduction noise power spectral density of the earphone;
    performing noise cancellation on the frequency-domain microphone signal according to the frequency-domain microphone signal and the historical microphone noise power spectral density; and
    performing noise cancellation on the frequency-domain bone conduction signal according to the frequency-domain bone conduction signal and the historical bone conduction noise power spectral density.
  8. The voice activity detection method for an earphone according to claim 7, wherein after the step of determining, according to the coherence coefficient and the spectral energy, that the earphone has detected speech or noise, the voice activity detection method for an earphone further comprises:
    when it is determined that the earphone has detected noise, obtaining a microphone noise power spectral density according to the historical microphone noise power spectral density and the frequency-domain microphone signal;
    obtaining a bone conduction noise power spectral density according to the historical bone conduction noise power spectral density and the frequency-domain bone conduction signal;
    updating the historical microphone noise power spectral density to the microphone noise power spectral density; and
    updating the historical bone conduction noise power spectral density to the bone conduction noise power spectral density.
  9. An earphone, wherein the earphone comprises a microphone, a bone voiceprint sensor, a processor, a memory, and a voice activity detection program of the earphone that is stored in the memory and executable on the processor, and when the voice activity detection program of the earphone is executed by the processor, the steps of the voice activity detection method for an earphone according to any one of claims 1 to 8 are implemented.
  10. A computer-readable storage medium, wherein a voice activity detection program of an earphone is stored on the computer-readable storage medium, and when the voice activity detection program of the earphone is executed by a processor, the steps of the voice activity detection method for an earphone according to any one of claims 1 to 8 are implemented.
PCT/CN2020/124866 2020-09-10 2020-10-29 耳机的语音活动检测方法、耳机及存储介质 WO2022052244A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/025,876 US20230352038A1 (en) 2020-09-10 2020-10-29 Voice activation detecting method of earphones, earphones and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010953526.7A CN112017696B (zh) 2020-09-10 2020-09-10 耳机的语音活动检测方法、耳机及存储介质
CN202010953526.7 2020-09-10

Publications (1)

Publication Number Publication Date
WO2022052244A1 true WO2022052244A1 (zh) 2022-03-17

Family

ID=73522259

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124866 WO2022052244A1 (zh) 2020-09-10 2020-10-29 耳机的语音活动检测方法、耳机及存储介质

Country Status (3)

Country Link
US (1) US20230352038A1 (zh)
CN (1) CN112017696B (zh)
WO (1) WO2022052244A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114786083A (zh) * 2022-04-21 2022-07-22 歌尔股份有限公司 降噪方法、装置、耳机设备及存储介质

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750464B (zh) * 2020-12-25 2023-05-23 深圳米唐科技有限公司 基于多传感器的人体发声状态检测方法、系统及存储介质
CN112767963B (zh) * 2021-01-28 2022-11-25 歌尔科技有限公司 一种语音增强方法、装置、系统及计算机可读存储介质
CN115132212A (zh) * 2021-03-24 2022-09-30 华为技术有限公司 一种语音控制方法和装置
CN113115190B (zh) * 2021-03-31 2023-01-24 歌尔股份有限公司 音频信号处理方法、装置、设备及存储介质
CN113223561B (zh) * 2021-05-08 2023-03-24 紫光展锐(重庆)科技有限公司 一种语音活动检测的方法、电子设备及装置
CN113113050A (zh) * 2021-05-10 2021-07-13 紫光展锐(重庆)科技有限公司 一种语音活动检测的方法、电子设备及装置
CN113421580B (zh) * 2021-08-23 2021-11-05 深圳市中科蓝讯科技股份有限公司 降噪方法、存储介质、芯片及电子设备
CN113593612B (zh) * 2021-08-24 2024-06-04 歌尔科技有限公司 语音信号处理方法、设备、介质及计算机程序产品
US12052538B2 (en) * 2021-09-16 2024-07-30 Bitwave Pte Ltd. Voice communication in hostile noisy environment
CN114040309B (zh) * 2021-09-24 2024-03-19 北京小米移动软件有限公司 风噪检测方法、装置、电子设备及存储介质
CN115348049B (zh) * 2022-06-22 2024-07-09 北京理工大学 一种利用耳机内向麦克风的用户身份认证方法
CN115457984A (zh) * 2022-07-28 2022-12-09 杭州芯声智能科技有限公司 一种基于骨声纹传感器的vad方法及系统
CN115396776A (zh) * 2022-08-25 2022-11-25 北京小米移动软件有限公司 耳机的控制方法、装置、耳机及计算机可读存储介质
CN115499770A (zh) * 2022-08-29 2022-12-20 歌尔科技有限公司 耳机的语音活动检测方法、装置、耳机及介质
CN119380700A (zh) * 2024-12-24 2025-01-28 绍兴圆方半导体有限公司 关键词识别方法、装置、存储介质及电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109195042A (zh) * 2018-07-16 2019-01-11 恒玄科技(上海)有限公司 低功耗的高效降噪耳机及降噪系统
CN109920451A (zh) * 2019-03-18 2019-06-21 恒玄科技(上海)有限公司 语音活动检测方法、噪声抑制方法和噪声抑制系统
CN110556128A (zh) * 2019-10-15 2019-12-10 出门问问信息科技有限公司 一种语音活动性检测方法、设备及计算机可读存储介质
CN110782912A (zh) * 2019-10-10 2020-02-11 安克创新科技股份有限公司 音源的控制方法以及扬声设备

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09212196A (ja) * 1996-01-31 1997-08-15 Nippon Telegr & Teleph Corp <Ntt> 雑音抑圧装置
JP2021511755A (ja) * 2017-12-07 2021-05-06 エイチイーディ・テクノロジーズ・エスアーエルエル 音声認識オーディオシステムおよび方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109195042A (zh) * 2018-07-16 2019-01-11 恒玄科技(上海)有限公司 低功耗的高效降噪耳机及降噪系统
CN109920451A (zh) * 2019-03-18 2019-06-21 恒玄科技(上海)有限公司 语音活动检测方法、噪声抑制方法和噪声抑制系统
CN110782912A (zh) * 2019-10-10 2020-02-11 安克创新科技股份有限公司 音源的控制方法以及扬声设备
CN110556128A (zh) * 2019-10-15 2019-12-10 出门问问信息科技有限公司 一种语音活动性检测方法、设备及计算机可读存储介质

Also Published As

Publication number Publication date
US20230352038A1 (en) 2023-11-02
CN112017696A (zh) 2020-12-01
CN112017696B (zh) 2024-02-09

Similar Documents

Publication Publication Date Title
WO2022052244A1 (zh) 耳机的语音活动检测方法、耳机及存储介质
US11134330B2 (en) Earbud speech estimation
US11363390B2 (en) Perceptually guided speech enhancement using deep neural networks
US10535362B2 (en) Speech enhancement for an electronic device
WO2022160593A1 (zh) 一种语音增强方法、装置、系统及计算机可读存储介质
US9723422B2 (en) Multi-microphone method for estimation of target and noise spectral variances for speech degraded by reverberation and optionally additive noise
CN103907152B (zh) 用于音频信号噪声抑制的方法和系统
US11109164B2 (en) Method of operating a hearing aid system and a hearing aid system
CN111935584A (zh) 用于无线耳机组件的风噪处理方法、装置以及耳机
US9532149B2 (en) Method of signal processing in a hearing aid system and a hearing aid system
KR20200143255A (ko) 바람 검출을 위한 마이크로폰의 스피커 에뮬레이션
CN104021798A (zh) 用于通过具有可变频谱增益和可动态调制的硬度的算法对音频信号隔音的方法
JP5663112B1 (ja) 音信号処理装置、及び、それを用いた補聴器
CN115298735A (zh) 用于有源干扰噪声抑制的方法、设备、耳机及计算机程序
EP2916320A1 (en) Multi-microphone method for estimation of target and noise spectral variances
US11438712B2 (en) Method of operating a hearing aid system and a hearing aid system
US20230197050A1 (en) Wind noise suppression system
JP2019154030A (ja) 補聴器の作動方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20953029

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20953029

Country of ref document: EP

Kind code of ref document: A1