CN1255255A

CN1255255A - Echo reducing phone with state machine controlled switches

Info

Publication number: CN1255255A
Application number: CN 98804832
Authority: CN
Inventors: J·格瑙斯佩刘斯
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 1997-03-11
Filing date: 1998-02-24
Publication date: 2000-05-31
Also published as: JP2001514823A; WO1998040974A1; SE511650C2; TW407435B; AU6426498A; BR9808240A; SE9700873L; SE9700873D0; EP0974205A1; AU735505B2; CA2283590A1

Abstract

The object of the present invention is to reduce the echo introduced by crosstalk. The problem of how to reduce crosstalk-induced echo is solved by introducing switches, controlled by a state machine, for the microphone and the loudspeaker. The state machine takes as input the signal energy and the VAD flag of the signal from the microphone, and the signal energy and the VAD flag of the signal arriving at the loudspeaker.

Description

Echo reducing phone with state machine controlled switches

This invention relates generally to telecommunications, and more particularly to speech processing for voice communication over the Internet.

A typical Internet phone uses a PC with a sound card, a microphone, and two loudspeakers. The microphone and the loudspeakers are usually placed close to each other on a desk. This arrangement produces a considerable amount of crosstalk, which is heard as echo at the receiving end. The echo must be suppressed for Internet telephony to be usable.

In GSM, it is known to use VAD (voice activity detection) to detect whether a mobile phone user is speaking. This information can be used to reduce the bandwidth needed when transmitting speech. In discontinuous speech coding according to the VOX principle (voice-operated transmission), the VAD unit is responsible for detecting whether a received sound sequence represents human speech. The VAD unit can assume two states: the first indicates that the sound sequence is human speech, and the second indicates that it is not.

If the VAD unit detects that a given sound sequence represents human speech, it issues a first state signal to the speech encoding unit, which encodes the sound sequence into speech frames. Conversely, if the sound sequence represents something other than human speech, the VAD unit issues a second state signal to the SID (silence descriptor) unit. The SID unit sends one SID frame every N frames. Nothing is sent on the remaining N-1 occasions on which a frame could have been sent. The SID frame contains information about the background noise estimated at the sending side and about the estimated noise spectrum. This procedure saves battery power and radio bandwidth.

When the VAD unit changes from generating the first state signal to generating the second state signal, that is, at the transition from detected speech to detected non-speech, a so-called release delay is usually applied. During this delay the speech encoding unit continues to send speech frames, as if the received sound sequence were still human speech. If the VAD unit still detects non-speech once the release delay has elapsed, SID frames are generated. The reason for this procedure is that short pauses between words in human speech should not be interpreted as non-speech; the speech frame generator should remain active through them.
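The interplay of the release delay and the SID frames described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `schedule_frames`, the frame-count parameters `release_frames` and `sid_period`, and the choice to send the first SID frame immediately after the release delay are hypothetical and are not taken from the GSM specifications.

```python
from typing import Iterable, List


def schedule_frames(vad_flags: Iterable[bool],
                    release_frames: int = 3,
                    sid_period: int = 4) -> List[str]:
    """Label each frame slot as 'SPEECH', 'SID', or 'NONE'.

    vad_flags      -- one VAD decision per frame (True = human speech)
    release_frames -- assumed release delay, counted in frames
    sid_period     -- assumed N: one SID frame every N silent slots
    """
    out: List[str] = []
    silent_run = 0  # consecutive non-speech frames seen so far
    for flag in vad_flags:
        if flag:
            # Speech detected: reset the pause counter and send a speech frame.
            silent_run = 0
            out.append("SPEECH")
            continue
        silent_run += 1
        if silent_run <= release_frames:
            # Still inside the release delay: keep sending speech frames.
            out.append("SPEECH")
        else:
            # Past the release delay: one SID frame, then N-1 empty slots.
            position = silent_run - release_frames - 1
            out.append("SID" if position % sid_period == 0 else "NONE")
    return out
```

With `release_frames=2`, a single-frame pause between two speech frames is bridged with a speech frame, while a longer silence yields one SID frame followed by empty slots.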

The invention discloses a method and a device for reducing the echo introduced by crosstalk.

It is therefore an object of the present invention to reduce the echo introduced by crosstalk.

The problem of how to reduce crosstalk-induced echo is solved by introducing switches, controlled by a state machine, for the microphone and the loudspeaker. The state machine takes as input the signal energy and the VAD flag of the signal from the microphone, and the signal energy and the VAD flag of the signal to the loudspeaker.

One advantage of the present invention is that crosstalk-induced echo is significantly reduced without requiring additional computing power.

Other advantages will be apparent to those skilled in the art from the detailed description given below.

Further scope of applicability of the present invention will be apparent from the detailed description given below. It should be understood, however, that the preferred embodiment of the invention is illustrative only, since various changes and modifications within the scope of the invention will become apparent to those skilled in the art from this detailed description.

FIG. 1 shows a block diagram of one embodiment of the present invention.

FIG. 2 shows a finite state diagram.

In FIG. 1, a microphone 101 is connected to a GSM encoder 102. Before the signal reaches the GSM encoder 102, it has been sampled and digitized according to known techniques not shown in FIG. 1. The encoded signal is transmitted from the GSM encoder 102 to a receiver (not shown) by way of a switch 103, which can enable or disable the transmission. The autocorrelation coefficients ACF_E are passed from the GSM encoder 102 to the VAD unit 104. The long-term predictor lag value N_E from the GSM frame is also passed to the VAD unit 104. A value P_E representing the energy of the signal is passed from the VAD unit 104 to the finite state machine 105. The VAD unit 104 also computes a flag F_E indicating whether it has detected human speech, and passes this flag to the finite state machine 105. The flag F_E is true if human speech has been detected.

FIG. 1 also shows a sampled, encoded speech signal that is received from a sender (not shown) and passed to the GSM decoder 106. The decoded, sampled speech signal is passed from the GSM decoder 106 to the loudspeaker 107 by way of a switch 108, which can enable or disable the delivery of the speech signal to the loudspeaker. According to known techniques not shown in FIG. 1, a D/A conversion is required for the loudspeaker to function. The long-term predictor lag value N_D is derived from the received encoded speech signal and passed to the VAD unit 109.

Since decoding GSM frames does not normally involve a VAD unit, the GSM decoder lacks the parameters needed to compute the ACF. To make this computation possible, the autocorrelation unit 110 receives data from the GSM decoder 106 and computes ACF_D, which is passed to the VAD unit 109. The autocorrelation unit 110 is part of the GSM encoder as described in the standard. A value P_D indicating the energy of the speech signal arriving at the loudspeaker is passed from the VAD unit 109 to the finite state machine 105. A flag F_D, indicating whether the VAD unit has detected human speech, is also passed from the VAD unit 109 to the finite state machine.
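As a rough illustration of what the autocorrelation unit 110 supplies on the decoder side, the sketch below computes autocorrelation coefficients for one frame of decoded samples. The function names, the default of lags 0 to 8, and the use of the lag-0 coefficient as the energy value are assumptions made for this example; the text above does not specify how P_D is derived from ACF_D.

```python
from typing import List, Sequence


def autocorrelation(samples: Sequence[float], max_lag: int = 8) -> List[float]:
    """Return ACF[k] = sum over i of samples[i] * samples[i - k], for k = 0..max_lag."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i - k] for i in range(k, n))
        for k in range(max_lag + 1)
    ]


def frame_energy(acf: Sequence[float]) -> float:
    """Assumed energy measure: the lag-0 coefficient (sum of squared samples)."""
    return acf[0]
```

Under these assumptions the same routine could serve both directions, yielding ACF_E on the encoder side and ACF_D on the decoder side.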

The finite state machine 105 includes functionality for setting the switches 103 and 108 according to the values input to it.

FIG. 2 shows the states and possible transitions of the finite state machine in FIG. 1.

Transitions between states take place as described below, using the following definitions:

· F_E: VAD flag at encoding

· F_D: VAD flag at decoding

· P_E: signal energy at encoding

· P_D: signal energy at decoding

· Release delay: the time from when the switching direction is decided until the switch is made. This time must be long enough to compensate for room echo.

201. IDLE to TRANSMITTING: (F_E = 1 AND F_D = 0) OR (F_E = 1 AND P_E > P_D), release delay = 0

202. TRANSMITTING to IDLE: F_E = 0, release delay = 600 ms

203. IDLE to RECEIVING: (F_D = 1 AND F_E = 0) OR (F_D = 1 AND P_D > P_E), release delay = 0

204. RECEIVING to IDLE: F_D = 0, release delay = 600 ms

205. TRANSMITTING to RECEIVING: F_D = 1 AND P_D > P_E, release delay = 600 ms

206. RECEIVING to TRANSMITTING: F_E = 1 AND P_E > P_D, release delay = 600 ms
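The transition conditions listed above can be sketched as a small state machine that is stepped once per frame. This is a minimal sketch, not the patented implementation. Several points are assumptions: the source and target state of each transition are taken from claim 6, the frame length of 20 ms, the class and method names, the requirement that a condition hold continuously for the whole release delay, and the priority given to transitions 205 and 206 over 202 and 204 when both apply.

```python
from enum import Enum


class State(Enum):
    TRANSMITTING = 207
    RECEIVING = 208
    IDLE = 209


RELEASE_MS = 600  # release delay used by transitions 202, 204, 205, 206


class EchoSwitchFsm:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.pending = None   # target state waiting for its release delay
        self.waited_ms = 0    # how long the pending condition has held

    def _candidate(self, f_e, f_d, p_e, p_d):
        """Return (target, delay_ms) for the first matching transition, else None."""
        if self.state is State.IDLE:
            if f_e and (not f_d or p_e > p_d):
                return State.TRANSMITTING, 0           # 201
            if f_d and (not f_e or p_d > p_e):
                return State.RECEIVING, 0              # 203
        elif self.state is State.TRANSMITTING:
            if f_d and p_d > p_e:
                return State.RECEIVING, RELEASE_MS     # 205
            if not f_e:
                return State.IDLE, RELEASE_MS          # 202
        elif self.state is State.RECEIVING:
            if f_e and p_e > p_d:
                return State.TRANSMITTING, RELEASE_MS  # 206
            if not f_d:
                return State.IDLE, RELEASE_MS          # 204
        return None

    def step(self, f_e, f_d, p_e, p_d, frame_ms=20):
        """Feed one frame's flags and energies; return the resulting state."""
        cand = self._candidate(f_e, f_d, p_e, p_d)
        if cand is None:
            # No transition condition holds: cancel any pending switch.
            self.pending, self.waited_ms = None, 0
            return self.state
        target, delay = cand
        if target is not self.pending:
            self.pending, self.waited_ms = target, 0
        else:
            self.waited_ms += frame_ms
        if self.waited_ms >= delay:
            self.state, self.pending, self.waited_ms = target, None, 0
        return self.state
```

In this sketch a short pause in the local speech cancels nothing by itself; only a pause lasting the full 600 ms returns the machine to IDLE, which matches the stated purpose of the release delay.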

In the TRANSMITTING state 207, the switch controlling transmission of the speech signal from the microphone is enabled, and the switch controlling transmission of the speech signal to the loudspeaker is disabled. In the RECEIVING state 208, the switch controlling transmission of the speech signal from the microphone is disabled, and the switch controlling transmission to the loudspeaker is enabled. In the IDLE state 209, both switches are disabled.
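The switch settings of the three states amount to a small lookup table. The sketch below is illustrative only: the table name `SWITCHES`, the function `route_frames`, and the use of `None` to represent a blocked frame are hypothetical choices, not part of the described device.

```python
from typing import Optional, Tuple

# State name -> (microphone switch 103 enabled, loudspeaker switch 108 enabled)
SWITCHES = {
    "TRANSMITTING": (True, False),
    "RECEIVING": (False, True),
    "IDLE": (False, False),
}


def route_frames(state: str,
                 mic_frame: bytes,
                 far_frame: bytes) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Return (frame to send to the network, frame to play on the loudspeaker).

    A blocked path is represented by None.
    """
    mic_on, speaker_on = SWITCHES[state]
    return (mic_frame if mic_on else None,
            far_frame if speaker_on else None)
```

Because at most one switch is enabled in any state, the microphone signal is never transmitted while the far-end signal is being played, which is how the arrangement keeps crosstalk from reaching the other party as echo.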

The invention being thus described, it will be obvious that it may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.

Claims

1. A method for reducing echo when transmitting speech in a telephone application, the telephone application comprising a loudspeaker and a microphone, characterized in that a finite state machine influences the switching on or off of the loudspeaker and the microphone according to signal characteristics of the signal from the microphone and signal characteristics of the signal arriving at the loudspeaker.

2. The method according to claim 1, wherein the telephone application comprises at least one VAD unit, a GSM encoder, and a GSM decoder, characterized in that a first VAD flag for the signal from the microphone is passed to the finite state machine; a first value representing the signal energy of the signal from the microphone is passed to the finite state machine; a second VAD flag for the signal arriving at the loudspeaker is passed to the finite state machine; a second value representing the energy of the signal arriving at the loudspeaker is passed to the finite state machine; and, according to the first VAD flag, the second VAD flag, the first value, and the second value, the finite state machine influences a first switch controlling transmission of the signal from the microphone and a second switch controlling transmission of the signal to the loudspeaker.

3. The method according to claim 2, characterized in that a first sampled speech signal from the microphone is passed to the GSM encoder; a first long-term predictor lag value is passed to a first VAD unit; first autocorrelation coefficients are passed from the GSM encoder to the first VAD unit; a first Boolean flag is passed from the first VAD unit to the finite state machine; a first value representing the energy of the signal from the microphone is passed from the first VAD unit to the finite state machine; a second sampled, encoded speech signal is received; the second speech signal is passed to the GSM decoder; a second long-term predictor lag value from the second speech signal is passed to a second VAD unit; second autocorrelation coefficients are computed and passed to the second VAD unit; a second value representing the energy of the second speech signal is passed from the second VAD unit to the finite state machine; a second Boolean flag is passed from the second VAD unit to the finite state machine; and, according to the first Boolean flag, the second Boolean flag, the first value, and the second value, the finite state machine controls a first switch influencing transmission of the first sampled, encoded speech signal from the microphone and a second switch influencing transmission of the second decoded speech signal to the loudspeaker.

4. The method according to claim 2, characterized in that, if the finite state machine assumes a first state, the first switch controlling transmission from the microphone is set to allow that transmission and the second switch controlling transmission to the loudspeaker is set not to allow that transmission; and, if the finite state machine assumes a second state, the first switch controlling transmission from the microphone is set not to allow that transmission and the second switch controlling transmission to the loudspeaker is set to allow that transmission.

5. The method according to claim 4, characterized in that, if the finite state machine assumes a third state, the first and second switches are both set to the same state.

6. The method according to claim 5, characterized in that, if the first flag is true and the second flag is false, or if the first flag is false and the first value is greater than the second value, the finite state machine switches from the third state to the first state; if the first flag is false and a release delay has elapsed, the finite state machine switches from the first state to the third state; if the second flag is true and the first flag is false, or if the second flag is true and the second value is greater than the first value, the finite state machine switches from the third state to the second state; if the second flag is false and the release delay has elapsed, the finite state machine switches from the second state to the third state; if the second flag is true, the second value is greater than the first value, and the release delay has elapsed, the finite state machine switches from the first state to the second state; and, if the first flag is true, the first value is greater than the second value, and the release delay has elapsed, the finite state machine switches from the second state to the first state.

7. The method according to claim 6, characterized in that the release delay is 600 ms.

8. A device for reducing echo when transmitting speech in a telephone application, the telephone application comprising a loudspeaker and a microphone, characterized in that the telephone application comprises a finite state machine configured to influence the switching on or off of the loudspeaker and the microphone according to signal characteristics of the signal from the microphone and signal characteristics of the signal arriving at the loudspeaker.

9. A personal computer configured to transmit and receive speech with a telephone application, the telephone application being arranged to reduce echo in the speech and comprising a loudspeaker and a microphone, characterized in that the telephone application comprises a finite state machine that influences the switching on or off of the loudspeaker and the microphone according to signal characteristics of the signal from the microphone and signal characteristics of the signal arriving at the loudspeaker.