CN111863018A - Directional sound pickup method and related device under dual microphones - Google Patents

Publication number
CN111863018A
Authority
CN
China
Prior art keywords
frequency
frequency domain
atom
signal
voice
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010704095.0A
Other languages
Chinese (zh)
Inventor
郭颖
金忠孝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SAIC Motor Corp Ltd
Original Assignee
SAIC Motor Corp Ltd
Application filed by SAIC Motor Corp Ltd
Priority to CN202010704095.0A
Publication of CN111863018A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application provides a directional sound pickup method with dual microphones and a related device. The method comprises: determining, according to the sampled speech signals of the dual microphones, the atoms in a preset dictionary matrix that represent the target pickup direction as target atoms; and constructing a frequency-domain filter based on the target atoms in the dictionary matrix, so as to pick up the speech signal in the target pickup direction from the sampled speech signals. Because the frequency-domain filter reflects the frequency distribution of the speech in the target pickup direction within the sampled speech signal, the speech signal obtained by filtering the sampled speech signal with the frequency-domain filter is the speech signal in the target pickup direction; that is, the filtered speech signal does not contain speech signals from other directions. The anti-interference capability of the directional pickup realized by the scheme provided by the application is thereby improved.

Description

Directional sound pickup method and related device under dual microphones

Technical Field

The present application relates to the field of speech recognition, and in particular to a directional sound pickup method with dual microphones and a related device.

Background

With the development of voice interaction technology, traditional speech recognition systems that support only the driver's sound source can no longer meet demand. Speech recognition must now support the driver and the front passenger at the same time; that is, a speech recognition system is needed that can pick up both the driver's sound source and the front passenger's sound source. In in-vehicle scenarios, speech recognition is usually affected by environmental noise such as road noise, wind noise, and tire noise, as well as by interference such as music and other voices, which seriously degrades recognition performance. Therefore, for directional pickup in an in-vehicle scenario with dual microphones, when several people in the vehicle speak at the same time, the driver and front-passenger sound sources must be separated and the interference from rear passengers suppressed, so that the speech recognition system can accurately recognize the instructions contained in the picked-up driver and front-passenger sound sources.

For directional pickup in an in-vehicle scenario with dual microphones, the traditional pickup methods are beamforming methods, for example minimum variance distortionless response (MVDR) beamforming, linearly constrained minimum variance (LCMV) beamforming, and the generalized sidelobe canceller (GSC).

However, when applied to an in-vehicle scenario with dual microphones, the traditional pickup methods have poor anti-interference capability: a considerable amount of interfering speech, music, and other noise remains in the picked-up driver and front-passenger sound sources.

Summary of the Invention

The present application provides a directional sound pickup method with dual microphones and a related device, with the aim of solving the problem that traditional beamforming methods have poor anti-interference capability when picking up the driver and front-passenger sound sources.

To achieve the above aim, the application provides the following technical solutions:

The present application provides a directional sound pickup method with dual microphones, including:

calculating the time delay of each atom of a preset dictionary matrix according to the frequency-domain signals of the sampled speech signals of the dual microphones, wherein the time delay of an atom represents the difference between the arrival times, at the two microphones, of the preset speech component represented by that atom in the sampled speech signal;

taking, as target atoms, those atoms of the dictionary matrix for which the difference between the atom's time delay and the time delay of the speech in the target pickup direction falls within a preset pickup beam range;

for each frequency of the dictionary matrix, calculating the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, and forming, from the frequencies and their corresponding ratios, a frequency-domain filter that characterizes the frequency distribution of the speech in the target pickup direction within the sampled speech signal;

determining, according to the frequency-domain signals and the frequency-domain filter, the speech signal in the target pickup direction within the sampled speech signal.

Optionally, the determining, according to the frequency-domain signals and the frequency-domain filter, of the speech signal in the target pickup direction within the sampled speech signal includes:

filtering the frequency-domain signals with the frequency-domain filter to obtain filtered frequency-domain signals;

performing an inverse time-frequency transform on each of the filtered frequency-domain signals to obtain the speech signal in the target pickup direction within the sampled speech signal.
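
The two steps above can be sketched as follows. This is a minimal illustration, assuming the time-frequency transform is a real FFT of a 512-sample frame and that the filter is a vector of per-frequency gains; the function name `apply_filter_and_invert` and the variable names are hypothetical and do not come from the application.

```python
import numpy as np

def apply_filter_and_invert(X_l, X_r, mask, n_fft=512):
    """Scale every frequency bin of both spectra by the filter gain,
    then return each channel to the time domain with an inverse real FFT."""
    y_l = np.fft.irfft(mask * X_l, n=n_fft)
    y_r = np.fft.irfft(mask * X_r, n=n_fft)
    return y_l, y_r

# Toy input: one 512-sample frame per microphone (random data, illustration only).
rng = np.random.default_rng(0)
frame_l = rng.standard_normal(512)
frame_r = rng.standard_normal(512)
X_l = np.fft.rfft(frame_l)   # 257 frequency bins
X_r = np.fft.rfft(frame_r)

# An all-pass filter (gain 1 everywhere) must return the original frames.
mask = np.ones(257)
y_l, y_r = apply_filter_and_invert(X_l, X_r, mask)
```

With a real filter the gains lie between 0 and 1, so bins dominated by other directions are attenuated before the inverse transform.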

Optionally, the frequency-domain signals include a first frequency-domain signal and a second frequency-domain signal;

the calculating of the time delay of each atom of the preset dictionary matrix according to the frequency-domain signals of the sampled speech signals of the dual microphones includes:

calculating a delay function for each atom from the first frequency-domain signal and the second frequency-domain signal according to

[Figure BDA0002594047710000021: equation image not reproduced in this text]

where f denotes frequency; d denotes the atom; W_fd denotes the dictionary matrix; d denotes the time delay (the same letter is printed for the atom and the delay in this text); X_lf and X_rf denote the first and second frequency-domain signals, respectively; and F denotes the calculated delay function of the atom;

for the delay function of each atom, taking the delay at which the delay function attains its maximum as the time delay of that atom, thereby obtaining the time delay of each atom of the dictionary matrix.

Optionally, the calculating, for each frequency in the dictionary matrix, of the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, and the forming, from the frequencies and their corresponding ratios, of a frequency-domain filter that characterizes the frequency distribution of the speech in the target pickup direction within the sampled speech signal, includes:

if the absolute value of the difference between the time delay of an atom of the dictionary matrix and the time delay of the speech in the target pickup direction is smaller than a preset threshold, setting the binary value of the atom to 1, and otherwise setting it to 0;

for each frequency of the dictionary matrix, calculating the ratio of (a) the sum of the frequency amplitude values of all atoms at that frequency, each weighted by the binary value of its atom, to (b) the sum of the frequency amplitude values of all atoms at that frequency;

forming, from the frequencies and their corresponding ratios, the frequency-domain filter that characterizes the frequency distribution of the speech in the target pickup direction within the sampled speech signal.
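
The three steps above amount to a per-frequency ratio mask, which can be sketched as follows. The function name `build_filter`, the argument names, and the small constant `eps` (added only to avoid division by zero) are assumptions made for illustration; the toy dictionary is not data from the application.

```python
import numpy as np

def build_filter(W, atom_delays, target_delay, threshold, eps=1e-12):
    """W: (n_freq, n_atoms) non-negative dictionary matrix.
    atom_delays: (n_atoms,) estimated delay of each atom.
    Returns one gain per frequency, between 0 and 1."""
    # Binary value per atom: 1 if its delay is within the threshold of the target delay.
    b = (np.abs(atom_delays - target_delay) < threshold).astype(W.dtype)
    numerator = W @ b              # amplitude sum over target atoms, per frequency
    denominator = W.sum(axis=1)    # amplitude sum over all atoms, per frequency
    return numerator / (denominator + eps)

# Toy dictionary: 3 frequencies, 2 atoms; only atom 0 matches the target delay.
W = np.array([[1.0, 3.0],
              [2.0, 2.0],
              [0.0, 4.0]])
atom_delays = np.array([0.0, 5.0])
mask = build_filter(W, atom_delays, target_delay=0.0, threshold=1.0)
```

For this toy input the gains are 1/4, 2/4, and 0/4, that is, the share of each frequency's dictionary energy that belongs to the atom inside the pickup beam.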

Optionally, the process of generating the preset dictionary matrix includes:

performing a time-frequency transform on the preset training data corresponding to each of the two microphones and taking absolute values, to obtain two non-negative magnitude spectrum matrices;

decomposing the magnitude spectrum matrix into a target dictionary matrix and a coefficient matrix by a non-negative matrix factorization algorithm;

iteratively updating the target dictionary matrix and the coefficient matrix according to a preset objective function until the objective function converges, and taking the target dictionary matrix obtained at convergence as the preset dictionary matrix.

The application also provides a directional sound pickup device with dual microphones, including:

a first calculation module, configured to calculate the time delay of each atom in a preset dictionary matrix according to the frequency-domain signals of the sampled speech signals of the dual microphones, wherein the time delay of an atom represents the difference between the arrival times, at the two microphones, of the preset speech component represented by that atom in the sampled speech signal;

a first determination module, configured to take, as target atoms, those atoms of the dictionary matrix for which the difference between the atom's time delay and the time delay of the speech in the target pickup direction falls within a preset pickup beam range;

a second calculation module, configured to calculate, for each frequency of the dictionary matrix, the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, and to form, from the frequencies and their corresponding ratios, a frequency-domain filter that characterizes the frequency distribution of the speech in the target pickup direction within the sampled speech signal;

a second determination module, configured to determine, according to the frequency-domain signals and the frequency-domain filter, the speech signal in the target pickup direction within the sampled speech signal.

Optionally, the second determination module being configured to determine, according to the frequency-domain signals and the frequency-domain filter, the speech signal in the target pickup direction within the sampled speech signal includes:

the second determination module being specifically configured to filter the frequency-domain signals with the frequency-domain filter to obtain filtered frequency-domain signals, and to perform an inverse time-frequency transform on each of the filtered frequency-domain signals to obtain the speech signal in the target pickup direction within the sampled speech signal.

Optionally, the frequency-domain signals include a first frequency-domain signal and a second frequency-domain signal;

the first calculation module being configured to calculate the time delay of each atom in the preset dictionary matrix according to the frequency-domain signals of the sampled speech signals of the dual microphones includes:

the first calculation module being specifically configured to calculate a delay function for each atom from the first frequency-domain signal and the second frequency-domain signal according to

[Figure BDA0002594047710000041: equation image not reproduced in this text]

where f denotes frequency; d denotes the atom; W_fd denotes the dictionary matrix; d denotes the time delay (the same letter is printed for the atom and the delay in this text); X_lf and X_rf denote the first and second frequency-domain signals, respectively; and F denotes the calculated delay function of the atom;

and, for the delay function of each atom, to take the delay at which the delay function attains its maximum as the time delay of that atom, thereby obtaining the time delay of each atom of the dictionary matrix.

Optionally, the second calculation module being configured to calculate, for each frequency in the dictionary matrix, the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, and to form, from the frequencies and their corresponding ratios, a frequency-domain filter that characterizes the frequency distribution of the speech in the target pickup direction within the sampled speech signal, includes:

the second calculation module being specifically configured to: set the binary value of an atom to 1 if the absolute value of the difference between the time delay of that atom of the dictionary matrix and the time delay of the speech in the target pickup direction is smaller than a preset threshold, and to 0 otherwise; calculate, for each frequency of the dictionary matrix, the ratio of (a) the sum of the frequency amplitude values of all atoms at that frequency, each weighted by the binary value of its atom, to (b) the sum of the frequency amplitude values of all atoms at that frequency; and form, from the frequencies and their corresponding ratios, the frequency-domain filter that characterizes the frequency distribution of the speech in the target pickup direction within the sampled speech signal.

The present application further provides a storage medium that includes a stored program, wherein the program executes any one of the directional sound pickup methods with dual microphones described above.

In the directional sound pickup method with dual microphones and the related device described in the present application, the time delay of each atom of the preset dictionary matrix is calculated according to the frequency-domain signals of the sampled speech signals of the dual microphones, where the calculated time delay of an atom represents the difference between the arrival times, at the two microphones, of the preset speech component represented by that atom in the sampled speech signal. The application therefore takes as target atoms those atoms of the dictionary matrix for which the difference between the atom's time delay and the time delay of the speech in the target pickup direction falls within the preset pickup beam range; the target atoms are the atoms that represent the speech in the target pickup direction.

Further, by calculating, for each frequency in the dictionary matrix, the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, the frequency-domain filter formed from the frequencies and their corresponding ratios reflects the frequency distribution of the speech in the target pickup direction within the preset pickup beam range. The speech signal in the target pickup direction obtained by filtering with this frequency-domain filter therefore does not include signals from other directions, and the anti-interference capability of the directional pickup achieved by the dual-microphone scheme provided by the application is improved.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the training process of the trained dictionary matrix disclosed in an embodiment of the present application;

FIG. 2 is a flowchart of a directional sound pickup method with dual microphones disclosed in an embodiment of the present application;

FIG. 3(a) is a time-domain comparison of the sampled speech signal and the directionally picked-up speech signal when sound sources are produced simultaneously at the driver and front-passenger positions, as disclosed in an embodiment of the present application;

FIG. 3(b) is a frequency-domain comparison of the sampled speech signal and the directionally picked-up speech signal when sound sources are produced simultaneously at the driver and front-passenger positions, as disclosed in an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a directional sound pickup device with dual microphones disclosed in an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a device disclosed in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.

FIG. 1 shows a training process for a dictionary matrix provided by an embodiment of the present application, which may include the following steps:

S101. Perform a time-frequency transform on the preset training data corresponding to each of the two microphones and take absolute values, to obtain two non-negative magnitude spectrum matrices.

In this embodiment, the preset training data may be 5 minutes of two-channel noisy speech data. In practice, the duration of the preset training data may be chosen according to the actual situation; this embodiment does not limit the duration.

In this embodiment, a resulting magnitude spectrum matrix can be written as X_ft, where f denotes frequency and t denotes time.

S102. Decompose the magnitude spectrum matrix into a target dictionary matrix and a coefficient matrix by a non-negative matrix factorization algorithm.

In this embodiment, the magnitude spectrum matrix is decomposed into a dictionary matrix and a coefficient matrix by non-negative matrix factorization (NMF). For convenience of description, the dictionary matrix obtained in this step is called the target dictionary matrix. The dictionary matrix may be written as W_fd, where d indexes the columns of the dictionary matrix; the total number of columns is the number of atoms in the dictionary matrix. The coefficient matrix may be written as H_dt.

In practice, the number of atoms in the dictionary matrix may be 2^n, or some other number; this embodiment does not limit the specific value. Increasing the number of atoms improves speech pickup in the target pickup direction and interference suppression, but also increases computational complexity, so the number should be chosen according to the actual application.

The specific implementation of this step is prior art and is not repeated here.

S103. Iteratively update the target dictionary matrix and the coefficient matrix according to a preset objective function until the objective function converges, and take the target dictionary matrix obtained at convergence as the trained dictionary matrix.

In this embodiment, the target dictionary matrix and coefficient matrix obtained by the decomposition are optimized iteratively. Specifically, they are updated iteratively according to a preset objective function until the objective function converges, and the target dictionary matrix obtained at convergence is taken as the trained dictionary matrix. In this embodiment, the trained dictionary matrix may be used as the preset dictionary matrix.

In this embodiment, the objective function may be the Euclidean distance or the KL divergence. In practice, other functions may also be used as the objective function; this embodiment does not limit its specific form.
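
Steps S102 and S103 can be sketched with the standard multiplicative-update rules for NMF under the Euclidean objective. This is a minimal sketch under stated assumptions, not the application's implementation: the application does not give its update rules or say how the two channels' matrices are combined, so the example factorizes a single magnitude matrix, and the names `nmf_euclidean`, `n_atoms`, `tol`, and `eps` are hypothetical.

```python
import numpy as np

def nmf_euclidean(V, n_atoms, n_iter=200, tol=1e-6, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (n_freq, n_frames) as W @ H,
    with W (n_freq, n_atoms) the dictionary and H (n_atoms, n_frames) the coefficients."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + eps
    H = rng.random((n_atoms, n_frames)) + eps
    prev_cost = np.inf
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        cost = 0.5 * np.sum((V - W @ H) ** 2)   # Euclidean objective
        if prev_cost - cost < tol * max(cost, eps):
            break                               # objective has converged
        prev_cost = cost
    return W, H

# Toy "magnitude spectrum": a random non-negative matrix of rank 4.
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 30))
W, H = nmf_euclidean(V, n_atoms=4)
```

In a real pipeline V would be the magnitude spectrum matrix X_ft from S101, and the returned W would serve as the trained dictionary matrix W_fd.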

FIG. 2 shows a directional sound pickup method with dual microphones provided by an embodiment of the present application, which includes the following steps:

S201. Acquire the speech signals collected by each of the two microphones.

In this embodiment, the speech signals collected by the two microphones may include the driver and front-passenger sound sources as well as sound sources from other directions in the vehicle.

In this step, the speech signals collected by each of the two microphones are acquired.

S202. Sample the speech signals collected by the two microphones to obtain the sampled speech signal corresponding to each microphone.

In this embodiment, the sampling frequency may be 16000 Hz, and a frame of 512 sampling points may be taken as the sampled speech signal currently to be processed by the following steps. In practice, a frame shift of 10 ms (160 sampling points) and a Hann window are used; the 512-point frame obtained after each frame shift is taken as the sampled speech signal and is processed by the following steps. For convenience of description, the following procedure is described for a single 512-point frame.

It should be noted that the sampling frequency of 16000 Hz, the frame length of 512 sampling points, and the frame shift of 10 ms are only one specific implementation. In practice these parameters may take other values; this embodiment does not limit them.

S203. Perform a time-frequency transform on the sampled speech signal corresponding to each of the two microphones to obtain a first frequency-domain signal and a second frequency-domain signal.

In this embodiment, the time-frequency transform may be a Fourier transform. In practice, other transforms may also be used; this embodiment does not limit the specific time-frequency transform.

For convenience of description, the signals obtained by applying the time-frequency transform to the sampled speech signals of the two microphones are called the first frequency-domain signal and the second frequency-domain signal.
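
The framing in S202 and the transform in S203 can be sketched as follows, assuming the example parameters given in the text (16000 Hz sampling, 512-point frames, 160-point frame shift, Hann window) and a real FFT as the Fourier transform. The function name `stft_frames` and the synthetic test tone are illustrative assumptions.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=160):
    """Split x into overlapping frames, apply a Hann window, and take the real FFT.
    Returns a complex array of shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)

fs = 16000
t = np.arange(fs) / fs                    # one second of samples
x_l = np.sin(2 * np.pi * 1000 * t)        # synthetic 1 kHz tone as the "left" channel
x_r = np.roll(x_l, 3)                     # "right" channel: same tone, 3 samples later

X_l = stft_frames(x_l)                    # first frequency-domain signal, per frame
X_r = stft_frames(x_r)                    # second frequency-domain signal, per frame
```

Each row of `X_l` and `X_r` is one 512-point frame in the frequency domain (257 bins), which is the unit the later steps operate on.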

S204. Calculate the time delay of each atom of the trained dictionary matrix according to the first frequency-domain signal and the second frequency-domain signal.

In this embodiment, the first frequency-domain signal and the second frequency-domain signal each contain frequency-domain speech signals from different directions; for example, both contain the frequency-domain speech signal from the driver's direction and the frequency-domain speech signal from the front passenger's direction.

The entries of the dictionary matrix are the frequency amplitude values of the preset atoms at the preset frequencies, and the calculated time delay of each atom represents the difference between the arrival times, at the two microphones, of the speech component represented by that atom in the sampled speech signal. In this step, the time delay of each atom of the trained dictionary matrix is calculated from the first and second frequency-domain signals; that is, the time delays of the speech signals arriving from the various directions in the two frequency-domain signals are converted into time delays of the individual atoms.

可选的,计算各个原子的时延的过程,可以包括以下步骤A1~步骤A2:Optionally, the process of calculating the time delay of each atom may include the following steps A1 to A2:

A1、依据第一频域信号和第二频域信号,计算每个原子的时延函数。A1. Calculate the time delay function of each atom according to the first frequency domain signal and the second frequency domain signal.

在本步骤中,可以依据公式(1),计算每个原子的时延函数。In this step, the delay function of each atom can be calculated according to formula (1).

Figure BDA0002594047710000081

In the formula, f represents the frequency, d represents the atom, W_fd represents the trained dictionary matrix, τ represents the time delay, X_lf and X_rf represent the first frequency domain signal and the second frequency domain signal respectively, and F represents the calculated delay function of the atom.

A2. For the delay function of each atom, take the delay at which the delay function reaches its maximum as the delay of that atom, thereby obtaining the delay of each atom in the dictionary matrix.

本步骤的具体实现方式为现有技术,这里不再赘述。The specific implementation manner of this step is in the prior art, and details are not repeated here.
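Formula (1) is available only as an image, so the following is a minimal sketch under a stated assumption: the delay function of each atom is taken to be a phase-transform (GCC-PHAT style) coherence between the two channels, weighted by that atom's column of the dictionary matrix W and evaluated over a grid of candidate delays. The function name `atom_delays`, the candidate grid and the synthetic spectra are hypothetical. Step A2 corresponds to the `argmax` over the candidate delays.

```python
import numpy as np

def atom_delays(W, X_l, X_r, freqs, candidate_delays):
    """For every atom (column of W), return the candidate delay that maximises its delay function.

    Assumed delay function: sum over f of W[f, d] * Re(phase[f] * exp(-j*2*pi*f*tau)).
    """
    cross = X_l * np.conj(X_r)
    phase = cross / (np.abs(cross) + 1e-12)            # keep only the inter-channel phase
    steering = np.exp(-2j * np.pi * np.outer(freqs, candidate_delays))
    coherence = np.real(phase[:, None] * steering)     # shape (frequencies, candidates)
    F = W.T @ coherence                                # shape (atoms, candidates)
    return candidate_delays[np.argmax(F, axis=1)], F

# Synthetic single-frame spectra: low bins arrive 3 samples later at the right
# microphone, high bins arrive 2 samples earlier (hypothetical values).
rng = np.random.default_rng(0)
n_bins = 129
freqs = np.arange(n_bins) / 256.0                      # cycles per sample
low = freqs < 0.25
true_delay = np.where(low, 3.0, -2.0)
X_l = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
X_r = X_l * np.exp(-2j * np.pi * freqs * true_delay)

# Two atoms: one with energy only in the low band, one only in the high band.
W = np.stack([low.astype(float), (~low).astype(float)], axis=1)
candidates = np.arange(-8, 9, dtype=float)

delays, F = atom_delays(W, X_l, X_r, freqs, candidates)
```

In this sketch a positive delay means that the component reaches the right microphone later than the left one; delays are expressed in samples.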

上述第一频域信号和第二频域信号可以统称为频域信号。The above-mentioned first frequency domain signal and second frequency domain signal may be collectively referred to as frequency domain signals.

上述S201~S204的目的是:依据双麦克风的采样语音信号的频域信号,计算预设的字典矩阵中各个原子的时延。The purpose of the above S201-S204 is to calculate the time delay of each atom in the preset dictionary matrix according to the frequency domain signal of the sampled speech signal of the dual microphones.

S205. Among the atoms of the dictionary matrix, take as target atoms those atoms for which the difference between the atom's time delay and the time delay of the speech in the direction to be picked up falls within the preset pickup beam range.

In this step, a target atom is an atom for which the difference between its time delay and the time delay of the speech in the direction to be picked up falls within the preset pickup beam range. The direction to be picked up is determined by the directional speech that actually needs to be extracted; for example, the speech in the direction to be picked up is the sound source at the driver's seat.

Optionally, in this embodiment, the direction to be picked up may be set in advance or determined by this embodiment. In the latter case, sound source localization may be performed using a sound source localization method, and the direction to be picked up is determined from the specified localization result. Sound source localization methods include, but are not limited to, methods based on time difference of arrival (such as GCC-PHAT), methods based on high-resolution spectral estimation (such as MUSIC), and methods based on steerable beamforming.
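As one hedged illustration of the time-difference-of-arrival approach mentioned above, the delay of the speech in the direction to be picked up could be estimated with GCC-PHAT as sketched below. The function name `gcc_phat_delay`, the `max_shift` search range and the synthetic signals are assumptions for the example and are not taken from this embodiment.

```python
import numpy as np

def gcc_phat_delay(x_l, x_r, max_shift):
    """Estimate how many samples x_r lags behind x_l using GCC-PHAT."""
    n = len(x_l) + len(x_r)                       # zero-pad to avoid circular wrap-around
    X_l = np.fft.rfft(x_l, n=n)
    X_r = np.fft.rfft(x_r, n=n)
    cross = X_r * np.conj(X_l)
    cross /= np.abs(cross) + 1e-12                # phase transform: discard magnitudes
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return int(np.argmax(cc)) - max_shift

rng = np.random.default_rng(0)
x_l = rng.standard_normal(1000)
x_r = np.concatenate((np.zeros(5), x_l))          # same signal, arriving 5 samples later
delay = gcc_phat_delay(x_l, x_r, max_shift=8)
```

The returned lag is in samples; dividing by the sampling rate gives a delay in seconds that can serve as τ0.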

S206、分别计算字典矩阵中各个频率下目标原子的频率幅度值之和与该频率下全部原子的频率幅度值之和的比值,将各个频率与对应的比值组成频域滤波器。S206: Calculate the ratio of the sum of the frequency amplitude values of the target atoms at each frequency in the dictionary matrix to the sum of the frequency amplitude values of all atoms at the frequency, and form a frequency domain filter with each frequency and the corresponding ratio.

可选的,在实际中,本步骤的具体实现方式可以包括以下步骤B1~步骤B2:Optionally, in practice, the specific implementation of this step may include the following steps B1 to B2:

B1. If the absolute value of the difference between the time delay of an atom in the dictionary matrix and the time delay of the speech in the direction to be picked up is less than a preset threshold, the binary value of the atom is 1; otherwise it is 0. This yields a binary matrix composed of the binary values of all atoms.

具体的,本步骤可以通过公式(2)实现。Specifically, this step can be implemented by formula (2).

Figure BDA0002594047710000101

In the formula, the symbol shown in Figure BDA0002594047710000102 represents the preset threshold, δ represents the preset pickup beam range, τ0 represents the time delay of the speech in the direction to be picked up, the symbol shown in Figure BDA0002594047710000103 represents the time delay of the atom used in step B1 (calculated in S204), and M_d represents the binary value of the atom.

在本实施实例中,通过调整δ的大小可以调整系统的拾音波束范围。本实施例不对δ的具体取值作限定。In this embodiment, the range of the sound pickup beam of the system can be adjusted by adjusting the size of δ. This embodiment does not limit the specific value of δ.
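The thresholding of step B1 can be sketched as follows. The atom delays, the target delay and the threshold value are hypothetical numbers chosen for illustration, and the function name `binary_mask` is not part of this embodiment.

```python
import numpy as np

def binary_mask(atom_delays, target_delay, threshold):
    """Return 1.0 for atoms whose delay is within `threshold` of the target delay, else 0.0."""
    diff = np.abs(np.asarray(atom_delays, dtype=float) - target_delay)
    return (diff < threshold).astype(float)

# Four atoms with hypothetical delays (in samples); the target direction has delay 3.
M = binary_mask([3.0, -2.0, 2.5, 0.0], target_delay=3.0, threshold=1.0)
```

With these values only the first and third atoms are selected as target atoms; a larger threshold widens the pickup beam and a smaller one narrows it.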

B2. For each frequency of the dictionary matrix, calculate the ratio of (i) the sum of the frequency amplitude values of all atoms at that frequency, each weighted by its binary value, to (ii) the sum of the frequency amplitude values of all atoms at that frequency; the frequencies and their corresponding ratios form the frequency domain filter.

在本步骤中,可以通过公式(3),计算得到频域滤波器中,各频率下的幅度取值。In this step, the value of the amplitude at each frequency in the frequency domain filter can be calculated by formula (3).

$$ I_f = \frac{\sum_d W_{fd} M_d}{\sum_d W_{fd}} \tag{3} $$

In the formula, M_d represents the binary value of the atom, W_fd represents the dictionary matrix, d represents the atom, and I_f represents the amplitude value at frequency f.
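The ratio described in step B2 can be sketched as below. The small constant `eps`, which guards against division by zero at frequencies where every atom has zero amplitude, and the example matrices are assumptions added for illustration.

```python
import numpy as np

def frequency_filter(W, M, eps=1e-12):
    """Per-frequency share of the dictionary amplitude that belongs to the target atoms.

    W: dictionary matrix of shape (frequencies, atoms), non-negative.
    M: binary value of each atom, shape (atoms,).
    """
    W = np.asarray(W, dtype=float)
    M = np.asarray(M, dtype=float)
    return (W @ M) / (W.sum(axis=1) + eps)

# Hypothetical dictionary with 3 frequencies and 2 atoms; only atom 0 is a target atom.
W = np.array([[2.0, 2.0],
              [3.0, 1.0],
              [0.0, 5.0]])
M = np.array([1.0, 0.0])
I = frequency_filter(W, M)
```

Because M is binary and W is non-negative, every filter value lies between 0 and 1: a value of 1 passes the frequency unchanged and a value of 0 removes it.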

In this application, the calculated time delay of each atom represents the difference between the times at which the preset speech component represented by that atom in the sampled speech signal arrives at the two microphones. Therefore, this application takes as target atoms those atoms of the dictionary matrix for which the difference between the atom's time delay and the time delay of the speech in the direction to be picked up falls within the preset pickup beam range; these target atoms are the atoms that represent the speech in the direction to be picked up.

Therefore, this step calculates, for each frequency in the dictionary matrix, the ratio of the sum of the frequency amplitude values of the target atoms to the sum of the frequency amplitude values of all atoms. The frequency domain filter formed by the frequencies and their corresponding ratios reflects the amplitude distribution, at each frequency, of the speech in the direction to be picked up within the preset pickup beam range, that is, the frequency distribution of that speech. Filtering the first frequency domain signal and the second frequency domain signal with this filter therefore yields the frequency domain signals of the speech in the direction to be picked up that are contained in the first frequency domain signal and the second frequency domain signal respectively.

S207. Filter the first frequency domain signal and the second frequency domain signal respectively with the frequency domain filter to obtain the filtered first frequency domain signal and the filtered second frequency domain signal.

在本步骤中,分别对第一频域信号和第二频域信号进行滤波的方式,可以通过公式(4)实现。In this step, the way of filtering the first frequency domain signal and the second frequency domain signal respectively can be implemented by formula (4).

$$ Y_{lf} = I_f X_{lf}, \qquad Y_{rf} = I_f X_{rf} \tag{4} $$

In the formula, I_f represents the frequency domain filter, X_lf and X_rf represent the first frequency domain signal and the second frequency domain signal, and Y_lf and Y_rf represent the filtered first frequency domain signal and the filtered second frequency domain signal.
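A minimal sketch of the filtering step, assuming that filtering means multiplying every frequency bin of each channel by the filter value for that bin. The array shapes and the example values are hypothetical.

```python
import numpy as np

def apply_filter(X_l, X_r, I):
    """Scale every frequency bin of both channels by the filter value for that bin."""
    I = np.asarray(I, dtype=float)
    return X_l * I, X_r * I          # I broadcasts over the frame axis

# Hypothetical spectra: 2 frames x 3 frequency bins per channel.
X_l = np.array([[1 + 1j, 2 + 0j, 0 + 3j],
                [2 - 1j, 1 + 1j, 4 + 0j]])
X_r = 0.5 * X_l
I = np.array([1.0, 0.5, 0.0])

Y_l, Y_r = apply_filter(X_l, X_r, I)
```

The same real-valued filter is applied to both channels, so the phase relationship between the two filtered signals is left unchanged.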

S208、将滤波后的第一频域信号和滤波后的第二频域信号分别进行时频逆变换,得到采样语音信号中待拾音方向的语音信号。S208: Perform inverse time-frequency transform on the filtered first frequency domain signal and the filtered second frequency domain signal, respectively, to obtain a voice signal in the direction to be picked up in the sampled voice signal.

在本步骤中,时频逆变换可以为傅里叶逆变换,当然,在实际中,时频逆变换还可以为其他逆变换形式,本实施例不对时频逆变换的具体形式作限定,只要与时频变换的方式对应即可。In this step, the time-frequency inverse transform can be an inverse Fourier transform. Of course, in practice, the time-frequency inverse transform can also be other inverse transform forms. This embodiment does not limit the specific form of the time-frequency inverse transform, as long as It is sufficient to correspond to the method of time-frequency transformation.
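As a hedged illustration of a matching transform pair, the sketch below implements a short-time Fourier transform and its inverse by weighted overlap-add. The frame length, hop size and Hann window are assumptions; any inverse transform that corresponds to the forward transform actually used would serve.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Forward transform: windowed frames followed by a real FFT per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, frame_len=256, hop=128):
    """Inverse transform: weighted overlap-add, normalised by the summed squared window."""
    window = np.hanning(frame_len)
    n_frames = X.shape[0]
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    norm = np.zeros_like(out)
    frames = np.fft.irfft(X, n=frame_len, axis=1)
    for i in range(n_frames):
        sl = slice(i * hop, i * hop + frame_len)
        out[sl] += frames[i] * window
        norm[sl] += window ** 2
    return out / np.maximum(norm, 1e-8)

rng = np.random.default_rng(0)
x = rng.standard_normal(1280)
y = istft(stft(x))                   # round trip without any filtering
```

Away from the first and last frame, where the window tapers to zero, the round trip reproduces the input, which shows that the inverse corresponds to the forward transform.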

上述S207~S208的目的是:依据频域信号和频域滤波器,确定采样语音信号中待拾音方向的语音信号。The purpose of the above S207-S208 is to determine the voice signal in the direction to be picked up in the sampled voice signal according to the frequency domain signal and the frequency domain filter.

Experiments verify that the dual-microphone directional sound pickup scheme provided in this embodiment can suppress interference from directions other than the pickup direction by 15-18 dB in the in-vehicle scenario where two microphones are used for directional pickup of the driver and the front passenger. The scheme therefore has strong interference suppression capability, that is, strong anti-interference capability.
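For orientation, a suppression figure in decibels can be computed from the interference level before and after processing, as in the sketch below. The measurement procedure used in the experiments is not described here, so the RMS-based definition and the example scaling factor are assumptions.

```python
import numpy as np

def suppression_db(interference_in, interference_out):
    """Interference suppression in dB, computed from RMS levels before and after processing."""
    rms_in = np.sqrt(np.mean(np.square(interference_in)))
    rms_out = np.sqrt(np.mean(np.square(interference_out)))
    return 20.0 * np.log10(rms_in / rms_out)

# Hypothetical case: the interfering signal is attenuated to 15 % of its amplitude.
x = np.ones(100)
value = suppression_db(x, 0.15 * x)  # about 16.5 dB
```

Under this definition, 15 dB and 18 dB correspond to the interference amplitude being reduced to roughly 18 % and 13 % of its original value.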

In a scenario where the driver and the front passenger speak at the same time, Figure 3(a) compares, in the time domain, the input sampled speech signal (top) with the output speech signal after pickup (bottom) obtained with the scheme of this embodiment. Figure 3(b) shows the corresponding comparison in the frequency domain, with the amplitude spectrum of the sampled speech signal on top and the amplitude spectrum of the speech signal after pickup below. Figure 3(a) shows that the speech signal obtained by pickup contains noticeably less content than the sampled speech signal, and Figure 3(b) shows that its amplitude spectrum contains noticeably less content than that of the sampled speech.

本实施例具有以下有益效果:This embodiment has the following beneficial effects:

有益效果一:Beneficial effect one:

In this embodiment, the frequency domain filter is obtained from the target atoms, that is, the atoms for which the difference between the atom's time delay and the time delay of the speech in the direction to be picked up falls within the preset pickup beam range, together with the frequency amplitude values of those target atoms at each frequency in the dictionary matrix. Because the frequency domain filter reflects the frequency distribution of the speech in the direction to be picked up within the preset pickup beam range, this embodiment improves the suppression of speech signals from spatial directions other than the pickup direction, which in turn strengthens the anti-interference capability and robustness of the scheme.

有益效果二:Beneficial effect two:

With the dual-microphone directional sound pickup scheme provided in this embodiment, the speech in the spatial direction to be picked up is obtained using only the speech signals collected by the two microphones and a dictionary matrix trained in advance; that is, this embodiment requires no prior knowledge of the signal. Moreover, because the dictionary matrix is trained in advance, picking up the speech in the target direction with the trained dictionary matrix reduces the amount of computation at inference time. This embodiment therefore has good real-time performance and high practical value.

Figure 4 shows a directional sound pickup device under dual microphones provided by an embodiment of this application. The device may include a first calculation module 401, a first determination module 402, a second calculation module 403 and a second determination module 404, wherein:

The first calculation module 401 is configured to calculate the time delay of each atom in the preset dictionary matrix according to the frequency domain signals of the sampled speech signals of the dual microphones; the time delay of each atom represents the difference between the times at which the preset speech component represented by that atom in the sampled speech signal arrives at the two microphones.

The first determination module 402 is configured to take as target atoms those atoms of the dictionary matrix for which the difference between the atom's time delay and the time delay of the speech in the direction to be picked up falls within the preset pickup beam range.

The second calculation module 403 is configured to calculate, for each frequency in the dictionary matrix, the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, and to form, from the frequencies and their corresponding ratios, a frequency domain filter that characterizes the frequency distribution of the speech in the direction to be picked up in the speech signal.

第二确定模块404,用于依据频域信号和所述频域滤波器,确定采样语音信号中待拾音方向的语音信号。The second determining module 404 is configured to determine, according to the frequency-domain signal and the frequency-domain filter, the speech signal in the direction to be picked up in the sampled speech signal.

可选的,第二确定模块404,用于依据频域信号和所述频域滤波器,确定采样语音信号中待拾音方向的语音信号,包括:Optionally, the second determining module 404 is configured to determine, according to the frequency domain signal and the frequency domain filter, the voice signal in the direction to be picked up in the sampled voice signal, including:

第二确定模块404,具体用于采用频域滤波器对频域信号进行滤波,得到滤波后的频域信号;将滤波后的频域信号分别进行时频逆变换,得到采样语音信号中待拾音方向的语音信号。The second determination module 404 is specifically configured to use a frequency domain filter to filter the frequency domain signal to obtain a filtered frequency domain signal; perform time-frequency inverse transformation on the filtered frequency domain signal respectively to obtain the to-be-picked voice signal in the sampled voice signal. voice signal in the sound direction.

Optionally, the frequency domain signal includes a first frequency domain signal and a second frequency domain signal. The first calculation module 401 being configured to calculate the time delay of each atom in the preset dictionary matrix according to the frequency domain signals of the sampled speech signals of the dual microphones includes:

The first calculation module 401 is specifically configured to calculate the delay function of each atom according to the formula shown in Figure BDA0002594047710000131, the first frequency domain signal and the second frequency domain signal, where f represents the frequency, d represents the atom, W_fd represents the dictionary matrix, τ represents the time delay, X_lf and X_rf represent the first frequency domain signal and the second frequency domain signal respectively, and F represents the calculated delay function of the atom; and, for the delay function of each atom, to take the delay at which the delay function reaches its maximum as the delay of that atom, thereby obtaining the delay of each atom of the dictionary matrix.

可选的,第二计算模块403,用于分别计算字典矩阵中的各个频率下目标原子的频率幅度值之和与该频率下全部原子的频率幅度值之和的比值,将各个频率与对应的比值组成用于表征所述采样语音信号中,所述待拾音方向的语音的频率分布的频域滤波器,包括:Optionally, the second calculation module 403 is used to calculate the ratio of the sum of the frequency amplitude values of the target atoms under each frequency in the dictionary matrix and the sum of the frequency amplitude values of all atoms at the frequency, and compare each frequency with the corresponding value. The ratio constitutes a frequency domain filter used to characterize the frequency distribution of the voice in the direction to be picked up in the sampled voice signal, including:

第二计算模块403,具体用于如果字典矩阵的原子的时延与待拾音方向的语音的时延之间的差值的绝对值小于预设阈值,则原子的二值取值为1,否则二值取值为0;分别计算字典矩阵的各个频率下全部原子的频率幅度值与对应二值取值的加权和,与,该频率下全部原子的频率幅度值之和的比值,将各个频率与对应的比值,组成用于表征所述采样语音信号中,所述待拾音方向的语音的频率分布的频域滤波器。The second calculation module 403 is specifically used for if the absolute value of the difference between the time delay of the atom of the dictionary matrix and the time delay of the voice in the direction to be picked up is less than the preset threshold, then the binary value of the atom is 1, Otherwise, the binary value is 0; calculate the weighted sum of the frequency amplitude values of all atoms under each frequency of the dictionary matrix and the corresponding binary values, and the ratio of the sum of the frequency amplitude values of all atoms at this frequency, and calculate each The frequency and the corresponding ratio constitute a frequency domain filter used to characterize the frequency distribution of the speech in the to-be-picked direction in the sampled speech signal.

双麦克风下的定向拾取装置包括处理器和存储器,上述第一计算模块401、第一确定模块402、第二计算模块403和第二确定模块404等均作为程序单元存储在存储器中,由处理器执行存储在存储器中的上述程序单元来实现相应的功能。The directional pickup device with dual microphones includes a processor and a memory. The above-mentioned first calculation module 401 , first determination module 402 , second calculation module 403 and second determination module 404 are all stored in the memory as program units, and are stored in the memory by the processor. The above-mentioned program elements stored in the memory are executed to realize the corresponding functions.

The processor contains a kernel, which retrieves the corresponding program units from the memory. One or more kernels may be provided, and fast and accurate directional sound pickup under dual microphones is achieved by adjusting kernel parameters.

本发明实施例提供了一种存储介质,其上存储有程序,该程序被处理器执行时实现所述双麦克风下的定向拾音方法。An embodiment of the present invention provides a storage medium on which a program is stored, and when the program is executed by a processor, the directional sound pickup method under the dual microphones is implemented.

本发明实施例提供了一种处理器,所述处理器用于运行程序,其中,所述程序运行时执行所述双麦克风下的定向拾音方法。An embodiment of the present invention provides a processor for running a program, wherein the directional sound pickup method under the dual microphones is executed when the program is running.

An embodiment of the present invention provides a device. As shown in Figure 5, the device includes at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to call program instructions in the memory to execute the above directional sound pickup method under dual microphones. The device herein may be a server, a PC, a tablet, a mobile phone, or the like.

本申请还提供了一种计算机程序产品,当在数据处理设备上执行时,适于执行初始化有如下方法步骤的程序:The application also provides a computer program product that, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:

依据双麦克风的采样语音信号的频域信号,计算预设的字典矩阵的各个原子的时延;所述各个原子的时延表示所述采样语音信号下该原子表示的预设语音成分到达双麦克风的时间差;Calculate the time delay of each atom of the preset dictionary matrix according to the frequency domain signal of the sampled speech signal of the dual microphones; the time delay of each atom indicates that the preset speech component represented by the atom under the sampled speech signal reaches the dual microphones time difference;

将所述字典矩阵的原子中,时延与待拾音方向的语音的时延间的差值属于预设的拾音波束范围的原子,作为目标原子;In the atoms of the dictionary matrix, the difference between the time delay and the time delay of the voice in the direction to be picked up belongs to the atom of the preset sound pickup beam range, as the target atom;

分别计算所述字典矩阵的各个频率下目标原子的频率幅度值之和与该频率下全部原子的频率幅度值之和的比值,将各个频率与对应的比值组成用于表征所述语音信号中,所述待拾音方向的语音的频率分布的频域滤波器;Calculate the ratio of the sum of the frequency amplitude values of the target atoms under each frequency of the dictionary matrix to the sum of the frequency amplitude values of all atoms at the frequency, and form each frequency and the corresponding ratio to characterize the speech signal, A frequency domain filter for the frequency distribution of the voice in the direction to be picked up;

依据所述频域信号和所述频域滤波器,确定所述采样语音信号中所述待拾音方向的语音信号。According to the frequency domain signal and the frequency domain filter, the voice signal in the to-be-picked direction in the sampled voice signal is determined.

本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.

在一个典型的配置中,设备包括一个或多个处理器(CPU)、存储器和总线。设备还可以包括输入/输出接口、网络接口等。In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. Devices may also include input/output interfaces, network interfaces, and the like.

存储器可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM),存储器包括至少一个存储芯片。存储器是计算机可读介质的示例。Memory may include non-persistent memory in computer readable media, random access memory (RAM) and/or non-volatile memory, such as read only memory (ROM) or flash memory (flash RAM), the memory including at least one memory chip. Memory is an example of a computer-readable medium.

Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "comprising", "comprising" or any other variation thereof are intended to encompass a non-exclusive inclusion such that a process, method, article or device comprising a series of elements includes not only those elements, but also Other elements not expressly listed or inherent to such a process, method, article of manufacture or apparatus are also included. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article of manufacture or apparatus that includes the element.

本领域技术人员应明白,本申请的实施例可提供为方法、系统或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。It will be appreciated by those skilled in the art that the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

以上仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。The above are merely examples of the present application, and are not intended to limit the present application. Various modifications and variations of this application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall be included within the scope of the claims of this application.

本申请实施例方法所述的功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算设备可读取存储介质中。基于这样的理解,本申请实施例对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一台计算设备(可以是个人计算机,服务器,移动计算设备或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。If the functions described in the methods of the embodiments of the present application are implemented in the form of software functional units and sold or used as independent products, they may be stored in a readable storage medium of a computing device. Based on this understanding, the part of the embodiments of the present application that contribute to the prior art or the part of the technical solution may be embodied in the form of a software product, and the software product is stored in a storage medium and includes several instructions to make a A computing device (which may be a personal computer, a server, a mobile computing device or a network device, etc.) executes all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes: U disk, mobile hard disk, Read-Only Memory (ROM, Read-Only Memory), Random Access Memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes .

本说明书的各个实施例中记载的特征可以相互替换或者组合,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。The features described in the various embodiments of this specification can be replaced or combined with each other, and each embodiment focuses on the differences from other embodiments, and the same or similar parts of the various embodiments can be referred to each other.

对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本申请。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本申请的精神或范围的情况下,在其它实施例中实现。因此,本申请将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。The above description of the disclosed embodiments enables any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, this application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A directional sound pickup method under dual microphones, characterized by comprising:
calculating, according to the frequency domain signal of the sampled speech signal of the dual microphones, the time delay of each atom of a preset dictionary matrix, wherein the time delay of each atom represents the difference between the times at which the preset speech component represented by that atom arrives at the two microphones in the sampled speech signal;
taking, as target atoms, those atoms of the dictionary matrix for which the difference between the time delay of the atom and the time delay of the speech in the direction to be picked up falls within a preset sound pickup beam range;
calculating, for each frequency of the dictionary matrix, the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, and forming, from the frequencies and their corresponding ratios, a frequency domain filter that characterizes the frequency distribution of the speech in the direction to be picked up in the sampled speech signal;
determining, according to the frequency domain signal and the frequency domain filter, the speech signal in the direction to be picked up in the sampled speech signal.
2. The method according to claim 1, characterized in that said determining, according to the frequency domain signal and the frequency domain filter, the speech signal in the direction to be picked up in the sampled speech signal comprises:
filtering the frequency domain signal with the frequency domain filter to obtain a filtered frequency domain signal;
performing an inverse time-frequency transform on each filtered frequency domain signal to obtain the speech signal in the direction to be picked up in the sampled speech signal.
3. The method according to claim 1, characterized in that the frequency domain signal comprises a first frequency domain signal and a second frequency domain signal;
said calculating, according to the frequency domain signal of the sampled speech signal of the dual microphones, the time delay of each atom of the preset dictionary matrix comprises:
calculating a delay function for each atom according to
Figure FDA0002594047700000011
the first frequency domain signal, and the second frequency domain signal; where f denotes the frequency, d denotes the atom, and W_fd denotes the dictionary matrix; d denotes the time delay; X_lf and X_rf denote the first frequency domain signal and the second frequency domain signal, respectively; and F denotes the calculated delay function of the atom;
for the delay function of each atom, taking the delay at which the delay function attains its maximum as the delay of that atom, thereby obtaining the delay of each atom of the dictionary matrix.
4. The method according to claim 1, characterized in that said calculating, for each frequency of the dictionary matrix, the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, and forming, from the frequencies and their corresponding ratios, the frequency domain filter that characterizes the frequency distribution of the speech in the direction to be picked up in the sampled speech signal comprises:
if the absolute value of the difference between the time delay of an atom of the dictionary matrix and the time delay of the speech in the direction to be picked up is less than a preset threshold, setting the binary value of the atom to 1, and otherwise setting it to 0;
calculating, for each frequency of the dictionary matrix, the ratio of the weighted sum of the frequency amplitude values of all atoms at that frequency, weighted by their corresponding binary values, to the sum of the frequency amplitude values of all atoms at that frequency;
forming, from the frequencies and their corresponding ratios, the frequency domain filter that characterizes the frequency distribution of the speech in the direction to be picked up in the sampled speech signal.
5. The method according to claim 1, characterized in that the process of generating the preset dictionary matrix comprises:
performing a time-frequency transform on the preset training data corresponding to each of the two microphones and taking absolute values, to obtain two non-negative amplitude spectrum matrices;
decomposing the amplitude spectrum matrices into a target dictionary matrix and a coefficient matrix by a non-negative matrix factorization algorithm;
iteratively updating the target dictionary matrix and the coefficient matrix according to a preset objective function, and taking the target dictionary matrix obtained when the objective function converges as the preset dictionary matrix.
6. A directional sound pickup device under dual microphones, characterized by comprising:
a first calculation module, configured to calculate, according to the frequency domain signal of the sampled speech signal of the dual microphones, the time delay of each atom of a preset dictionary matrix, wherein the time delay of each atom represents the difference between the times at which the preset speech component represented by that atom arrives at the two microphones in the sampled speech signal;
a first determination module, configured to take, as target atoms, those atoms of the dictionary matrix for which the difference between the time delay of the atom and the time delay of the speech in the direction to be picked up falls within a preset sound pickup beam range;
a second calculation module, configured to calculate, for each frequency of the dictionary matrix, the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, and to form, from the frequencies and their corresponding ratios, a frequency domain filter that characterizes the frequency distribution of the speech in the direction to be picked up in the sampled speech signal;
a second determination module, configured to determine, according to the frequency domain signal and the frequency domain filter, the speech signal in the direction to be picked up in the sampled speech signal.
7. The device according to claim 6, characterized in that the second determination module being configured to determine, according to the frequency domain signal and the frequency domain filter, the speech signal in the direction to be picked up in the sampled speech signal comprises:
the second determination module being specifically configured to filter the frequency domain signal with the frequency domain filter to obtain a filtered frequency domain signal, and to perform an inverse time-frequency transform on each filtered frequency domain signal to obtain the speech signal in the direction to be picked up in the sampled speech signal.
8. The device according to claim 6, characterized in that the frequency domain signal comprises a first frequency domain signal and a second frequency domain signal;
the first calculation module being configured to calculate, according to the frequency domain signal of the sampled speech signal of the dual microphones, the time delay of each atom of the preset dictionary matrix comprises:
the first calculation module being specifically configured to calculate a delay function for each atom according to
Figure FDA0002594047700000031
the first frequency domain signal, and the second frequency domain signal; where f denotes the frequency, d denotes the atom, and W_fd denotes the dictionary matrix; d denotes the time delay; X_lf and X_rf denote the first frequency domain signal and the second frequency domain signal, respectively; and F denotes the calculated delay function of the atom;
and, for the delay function of each atom, to take the delay at which the delay function attains its maximum as the delay of that atom, thereby obtaining the delay of each atom of the dictionary matrix.
9. The device according to claim 6, characterized in that the second calculation module being configured to calculate, for each frequency of the dictionary matrix, the ratio of the sum of the frequency amplitude values of the target atoms at that frequency to the sum of the frequency amplitude values of all atoms at that frequency, and to form, from the frequencies and their corresponding ratios, the frequency domain filter that characterizes the frequency distribution of the speech in the direction to be picked up in the sampled speech signal comprises:
the second calculation module being specifically configured to: set the binary value of an atom to 1 if the absolute value of the difference between the time delay of that atom of the dictionary matrix and the time delay of the speech in the direction to be picked up is less than a preset threshold, and otherwise set it to 0; calculate, for each frequency of the dictionary matrix, the ratio of the weighted sum of the frequency amplitude values of all atoms at that frequency, weighted by their corresponding binary values, to the sum of the frequency amplitude values of all atoms at that frequency; and form, from the frequencies and their corresponding ratios, the frequency domain filter that characterizes the frequency distribution of the speech in the direction to be picked up in the sampled speech signal.
10. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program executes the directional sound pickup method under dual microphones according to any one of claims 1 to 5.
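The steps recited in claims 1 to 4 can be illustrated with a short Python/NumPy sketch. It is a minimal illustration under stated assumptions, not the applicant's implementation: the formula of claim 3 is available here only as a figure reference, so the delay function below assumes a GCC-PHAT-style form weighted by the dictionary atoms. The function names `atom_delays`, `directional_filter`, and `pick_up`, the candidate delay grid `taus`, the small constant used to avoid division by zero, and the sign convention (a positive delay means the component reaches the second microphone later) are all hypothetical.

```python
import numpy as np


def atom_delays(W, X_l, X_r, freqs_hz, taus):
    """Return, for each dictionary atom, the candidate delay that maximizes
    an assumed GCC-PHAT-style delay function.

    W        : (F, D) non-negative dictionary matrix, one column per atom
    X_l, X_r : (F,) complex spectra of one frame from the two microphones
    freqs_hz : (F,) bin frequencies in Hz
    taus     : (T,) candidate delays in seconds (hypothetical search grid)
    """
    cross = X_l * np.conj(X_r)
    # Keep only the phase of the cross-spectrum (PHAT weighting, an assumption).
    phase = cross / np.maximum(np.abs(cross), 1e-12)
    steer = np.exp(-2j * np.pi * np.outer(freqs_hz, taus))   # (F, T)
    # delay_fn[d, t] = Re( sum_f W[f, d] * phase[f] * steer[f, t] )
    delay_fn = np.real(W.T @ (phase[:, None] * steer))       # (D, T)
    return taus[np.argmax(delay_fn, axis=1)]


def directional_filter(W, delays, target_delay, threshold):
    """Claim 4: binary atom selection, then a per-frequency amplitude ratio."""
    selected = (np.abs(delays - target_delay) < threshold).astype(float)  # (D,)
    target_sum = W @ selected      # summed amplitudes of target atoms per frequency
    total_sum = W.sum(axis=1)      # summed amplitudes of all atoms per frequency
    return target_sum / np.maximum(total_sum, 1e-12)


def pick_up(mask, X, n_fft):
    """Claim 2: filter one frame's spectrum and return it to the time domain."""
    return np.fft.irfft(mask * X, n=n_fft)
```

Because the dictionary is non-negative and the selection weights are 0 or 1, each filter value lies between 0 and 1, so in this sketch the filter acts as a per-frequency soft mask applied to the spectrum of each microphone before the inverse transform.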
CN202010704095.0A 2020-07-21 2020-07-21 Directional sound pickup method and related device under dual microphones Pending CN111863018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010704095.0A CN111863018A (en) 2020-07-21 2020-07-21 Directional sound pickup method and related device under dual microphones

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010704095.0A CN111863018A (en) 2020-07-21 2020-07-21 Directional sound pickup method and related device under dual microphones

Publications (1)

Publication Number Publication Date
CN111863018A true CN111863018A (en) 2020-10-30

Family

ID=73001791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010704095.0A Pending CN111863018A (en) 2020-07-21 2020-07-21 Directional sound pickup method and related device under dual microphones

Country Status (1)

Country Link
CN (1) CN111863018A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708881A (en) * 2022-04-20 2022-07-05 展讯通信(上海)有限公司 Directional and selective sound pickup method, electronic device and storage medium based on dual microphones

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428852A (en) * 2019-08-09 2019-11-08 南京人工智能高等研究院有限公司 Speech separating method, device, medium and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SEAN U. N. WOOD, et al.: "Unsupervised Low Latency Speech Enhancement With RT-GCC-NMF", IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, vol. 13, no. 2, pages 333 - 335 *

Similar Documents

Publication Publication Date Title
JP4283212B2 (en) Noise removal apparatus, noise removal program, and noise removal method
JP2005249816A (en) Device, method and program for signal enhancement, and device, method and program for speech recognition
US10708702B2 (en) Signal processing method and signal processing device
Zhang et al. Multi-channel multi-frame ADL-MVDR for target speech separation
JP6987075B2 (en) Audio source separation
CN101154384A (en) Sound signal correction method, sound signal correction device and computer program
CN112002307B (en) Voice recognition method and device
CN114242104B (en) Speech noise reduction method, device, equipment and storage medium
JP4457221B2 (en) Sound source separation method and system, and speech recognition method and system
CN106031196A (en) Signal-processing device, method, and program
JP2016048872A (en) Sound collection device
US9966081B2 (en) Method and apparatus for synthesizing separated sound source
JP6815956B2 (en) Filter coefficient calculator, its method, and program
WO2020110228A1 (en) Information processing device, program and information processing method
Ueda et al. Environment-dependent denoising autoencoder for distant-talking speech recognition
CN114283832B (en) Processing method and device for multichannel audio signal
CN111863018A (en) Directional sound pickup method and related device under dual microphones
CN113380267B (en) Method and device for positioning voice zone, storage medium and electronic equipment
CN110689900B (en) Signal enhancement method and device, computer readable storage medium and electronic equipment
WO2024158629A1 (en) Guided speech-enhancement networks
JP2007047427A (en) Audio processing device
CN107919136B (en) An estimation method of digital speech sampling frequency based on Gaussian mixture model
CN115831145A (en) Double-microphone speech enhancement method and system
CN115720317A (en) Audio signal squeaking detection and suppression method and device
CN110858485A (en) Voice enhancement method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030