MXPA99008026A

MXPA99008026A - Echo reducing phone with state machine controlled switches

Info

Publication number: MXPA99008026A
Application number: MXPA/A/1999/008026A
Authority: MX
Inventors: Gnosspelius John
Original assignee: Telefonaktiebolaget L M Ericsson
Priority date: 1997-03-11
Filing date: 1999-08-31
Publication date: 2000-01-21

Abstract

The purpose of the present invention is to reduce the echo introduced by crosstalk. This problem is solved by introducing, at the microphone and at the speaker, switches controlled by a state machine that takes as input the signal energy of the signal from the microphone, a VAD flag of the signal from the microphone, the signal energy of the signal to the speaker, and a VAD flag of the signal to the speaker.

Description

"ECHO REDUCING PHONE WITH STATE MACHINE CONTROLLED SWITCHES"

TECHNICAL FIELD OF THE INVENTION

The present invention relates to telecommunication in general, and to speech processing for voice communication over the Internet in particular.

DESCRIPTION OF THE RELATED ART

A typical Internet phone uses a PC with a sound board, a microphone and two loudspeakers. The microphone and loudspeakers are often placed side by side on the desk. This configuration causes a considerable amount of crosstalk, which is heard as an echo at the receiving end. This echo must be removed for the Internet phone to be usable.

In GSM it is known to use VAD (Voice Activity Detection) to detect whether the user of a mobile phone is talking or not. This information is used to reduce the bandwidth needed when speech is transmitted. In discontinuous speech coding according to the VOX (voice-operated transmission) principle, a VAD unit is responsible for detecting whether or not a received sound sequence represents human speech. The VAD unit can take two different states: a first state indicates that the sound sequence was human speech, and the other state indicates that it was not. If the VAD unit detects that a certain sound sequence represents human speech, it issues a first status signal to a speech coding unit, which encodes the sound sequence in a speech frame. If, on the other hand, a sound sequence represents something other than human speech, the VAD unit issues a second status signal to the SID (Silence Descriptor) unit. The SID unit supplies a SID frame every N-th frame. During the remaining N-1 possible occasions to send frames, nothing is sent. A SID frame comprises information about the background noise and the noise spectrum calculated on the sending side. With this procedure, battery power and radio bandwidth can be saved.

When the VAD unit changes from generating the first status signal to generating the second status signal, that is, from detecting speech to detecting non-speech, a time interval is normally applied, a so-called hangover interval, during which the speech coding unit continues to supply speech frames as if the received sound sequence had been human speech. If the VAD unit still detects non-speech after the hangover interval, a SID frame is generated. The reason for this procedure is that short pauses between words in human speech should not be interpreted as non-speech; the speech frame generator should remain active through them.
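The frame scheduling described above can be illustrated with a minimal sketch. It is not taken from the GSM specifications: the function name `schedule_frames`, the parameters `hangover_frames` and `sid_period`, and their default values are assumptions chosen only for illustration. The sketch emits a speech frame while the VAD flag is set and during the hangover interval, then one SID frame every N-th frame and nothing in between.

```python
from typing import Iterable, List


def schedule_frames(vad_flags: Iterable[bool],
                    hangover_frames: int = 4,
                    sid_period: int = 8) -> List[str]:
    """Return 'SPEECH', 'SID' or 'NONE' for each input frame.

    hangover_frames and sid_period (the N of the text) are
    illustrative values, not figures from the source.
    """
    decisions: List[str] = []
    hang_left = 0      # hangover frames still to be sent as speech
    silent_count = 0   # frames elapsed since non-speech mode began
    for is_speech in vad_flags:
        if is_speech:
            # Speech detected: re-arm the hangover and send a speech frame.
            hang_left = hangover_frames
            silent_count = 0
            decisions.append("SPEECH")
        elif hang_left > 0:
            # Short pause: keep sending speech frames during the hangover.
            hang_left -= 1
            decisions.append("SPEECH")
        else:
            # Non-speech: a SID frame every sid_period-th frame, else nothing.
            decisions.append("SID" if silent_count % sid_period == 0 else "NONE")
            silent_count += 1
    return decisions
```

With `hangover_frames=2` and `sid_period=3`, one speech frame followed by six non-speech frames yields three speech frames, then a SID frame, two empty slots and another SID frame.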

SUMMARY OF THE INVENTION

The present invention discloses a method and an apparatus for reducing echoes introduced by crosstalk. The object of the present invention, therefore, is to reduce the echo introduced by crosstalk.

This problem is solved by introducing, at the microphone and at the loudspeaker, switches controlled by a state machine that takes as input the signal energy of the signal from the microphone, a VAD flag of the signal from the microphone, the signal energy of the signal to the loudspeaker and a VAD flag of the signal to the loudspeaker.

One advantage of the present invention is that the echo introduced by crosstalk is significantly reduced without requiring great computing power. Other advantages will be apparent to a person skilled in the art in view of the detailed description provided below.

The further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the preferred embodiments of the invention are given by way of illustration only, since various changes and modifications within the scope of the invention will become apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows a functional diagram of an embodiment of the invention.
Figure 2 shows a finite state diagram.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In Figure 1, a microphone 101 is connected to a GSM encoder 102. Before the signal reaches the GSM encoder 102, it is sampled and digitized according to known technology (not shown in Figure 1). From the GSM encoder 102, the coded signal is transmitted to a receiver (not shown in the figure), first passing through a switch 103 that can enable or disable the transmission. The GSM encoder 102 passes an autocorrelation function value ACFE to a VAD unit 104. A long-term predictor delay value NE, taken from the GSM frames, is also passed to the VAD unit 104. The VAD unit 104 passes a value PE, which represents the energy of the signal, to the finite state machine 105. The VAD unit 104 also calculates a flag FE indicating whether it has detected human speech. The flag FE is passed to the finite state machine 105 and is true if human speech has been detected.

Figure 1 also shows a coded, sampled speech signal that is received from a sender (not shown) and passed to a GSM decoder 106. From the GSM decoder 106, the decoded sampled speech signal is passed to the loudspeaker 107, first passing through a switch 108 that can enable or disable the speech signal reaching the loudspeaker. For the loudspeaker to function properly, a D/A conversion is needed according to known technology (not shown in Figure 1). A long-term predictor delay value ND is derived from the received coded speech signal and passed to a VAD unit 109. Since the decoding of GSM frames does not normally involve the use of a VAD unit, the GSM decoder lacks the parameters necessary to calculate the ACF. To make this calculation possible, an autocorrelation unit 110 receives the data from the GSM decoder 106 and calculates the value ACFD, which is passed to the VAD unit 109. The autocorrelation unit 110 is a part of the GSM encoder, as described in the standards. A value PD, indicating the power in the speech signal to the loudspeaker, is passed from the VAD unit 109 to the finite state machine 105. The VAD unit 109 also passes a flag FD to the finite state machine, indicating whether the VAD unit has detected human speech.
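As a rough illustration of what an autocorrelation unit such as unit 110 computes, the sketch below derives autocorrelation values for a frame of decoded samples. The function names, the default `max_lag` of 8 and the use of the lag-0 value as a frame energy measure are assumptions made for this example; the text does not specify how ACFD or PD are computed.

```python
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], max_lag: int = 8) -> List[float]:
    """Return the autocorrelation values of one frame for lags 0..max_lag.

    max_lag = 8 is an assumed default, not a figure from the text.
    """
    n = len(frame)
    return [
        sum(frame[i] * frame[i - lag] for i in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def frame_energy(frame: Sequence[float]) -> float:
    """Energy of the frame, taken here as the lag-0 autocorrelation (assumption)."""
    return autocorrelation(frame, max_lag=0)[0]
```

The lag-0 value is simply the sum of squared samples, which is why it can serve as an energy measure of the kind the values PE and PD represent.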

The finite state machine 105 comprises the functionality for setting the switches 103 and 108 depending on the values input to the finite state machine. Figure 2 shows the states and the possible transitions of the finite state machine of Figure 1. The transitions between the states are carried out according to the descriptions given below. The following definitions are used:

FE: VAD flag when encoding
FD: VAD flag when decoding
PE: signal energy when encoding
PD: signal energy when decoding
hang: time from the decision to switch direction until the switching is carried out. This time must be long enough to compensate for the room echo.

201. FE = 1 AND FD = 0, OR FE = 1 AND PE > PD; hang = 0
202. FE = 0; hang = 600 ms
203. FD = 1 AND FE = 0, OR FD = 1 AND PD > PE; hang = 0
204. FD = 0; hang = 600 ms
205. FD = 1 AND PD > PE; hang = 600 ms
206. FE = 1 AND PE > PD; hang = 600 ms

In the TRANSMIT state 207, the switch that controls the transmission of the speech signal from the microphone is enabled and the switch that controls the transmission of the speech signal to the loudspeaker is disabled. In the RECEIVE state 208, the switch that controls the transmission of the speech signal from the microphone is disabled and the switch that controls the transmission to the loudspeaker is enabled. In the IDLE state 209, both switches are disabled.

The invention having been described in this way, it will be apparent that it can be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be apparent to a person skilled in the art are intended to be included within the scope of the following claims.
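The state machine can be sketched in a few lines of code. This is a minimal illustration rather than the patented implementation. The pairing of conditions 201 to 206 with source and target states follows claim 6. The class and method names, the 20 ms step size, the rule that a condition must hold continuously for the whole hang time, and the priority given to a direct change of direction (205, 206) over a return to idle (202, 204) are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    TRANSMIT = 207
    RECEIVE = 208
    IDLE = 209


HANG_MS = 600  # hang time given for transitions 202 and 204 to 206


@dataclass
class EchoSwitchFsm:
    state: State = State.IDLE
    pending: Optional[State] = None  # target of a transition waiting out its hang
    waited_ms: int = 0

    def _decide(self, fe: bool, fd: bool, pe: float, pd: float):
        """Return (target state, hang in ms) or (None, 0) if no condition holds."""
        if self.state is State.IDLE:
            if fe and (not fd or pe > pd):   # 201
                return State.TRANSMIT, 0
            if fd and (not fe or pd > pe):   # 203
                return State.RECEIVE, 0
        elif self.state is State.TRANSMIT:
            if fd and pd > pe:               # 205 (checked first: assumption)
                return State.RECEIVE, HANG_MS
            if not fe:                       # 202
                return State.IDLE, HANG_MS
        else:  # State.RECEIVE
            if fe and pe > pd:               # 206 (checked first: assumption)
                return State.TRANSMIT, HANG_MS
            if not fd:                       # 204
                return State.IDLE, HANG_MS
        return None, 0

    def step(self, fe: bool, fd: bool, pe: float, pd: float,
             dt_ms: int = 20) -> Tuple[bool, bool]:
        """Advance by one frame; return (microphone enabled, loudspeaker enabled)."""
        target, hang = self._decide(fe, fd, pe, pd)
        if target is None:
            self.pending, self.waited_ms = None, 0
        else:
            if target is not self.pending:
                # New decision: start timing the hang interval.
                self.pending, self.waited_ms = target, 0
            else:
                self.waited_ms += dt_ms
            if self.waited_ms >= hang:
                self.state, self.pending, self.waited_ms = target, None, 0
        return self.switches()

    def switches(self) -> Tuple[bool, bool]:
        """Switch settings for the current state (switch 103, switch 108)."""
        return (self.state is State.TRANSMIT, self.state is State.RECEIVE)
```

Because at most one of the two switches is enabled at any time, the loudspeaker signal picked up by the microphone is never transmitted back, which is how the scheme suppresses the crosstalk echo without adaptive filtering.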

Claims

1. A method for reducing echo when transmitting speech in a telephone application, the telephone application comprising a loudspeaker and a microphone, characterized in that a finite state machine causes the loudspeaker and the microphone to be connected or disconnected depending on the characteristics of the signal from the microphone and the characteristics of the signal to the loudspeaker.

2. A method according to claim 1, wherein the telephone application comprises at least one VAD unit, a GSM encoder and a GSM decoder, characterized in that a first VAD flag of the signal from the microphone is passed to the finite state machine, that a first value representing the signal energy in the signal from the microphone is passed to the finite state machine, that a second VAD flag of the signal to the loudspeaker is passed to the finite state machine, that a second value representing the energy in the signal to the loudspeaker is passed to the finite state machine, that the finite state machine operates a first switch that controls the transmission of the signal from the microphone, and that the finite state machine operates a second switch that controls the transmission of the signal to the loudspeaker, depending on the values of the first VAD flag, the second VAD flag, the first value and the second value.

3. A method according to claim 2, characterized in that a first sampled speech signal from the microphone is passed to the GSM encoder, that a first long-term predictor delay value is passed to a first VAD unit, that a first autocorrelation coefficient is passed from the GSM encoder to the first VAD unit, that a first Boolean flag is passed from the first VAD unit to the finite state machine, that a first value representing the energy of the signal from the microphone is passed from the first VAD unit to the finite state machine, that a second coded sampled speech signal is received, that the second speech signal is passed to a GSM decoder, that a second long-term predictor delay value from the second speech signal is passed to a second VAD unit, that a second autocorrelation coefficient is calculated and passed to the second VAD unit, that a second value representing the energy in the second speech signal is passed from the second VAD unit to the finite state machine, that a second Boolean flag is passed from the second VAD unit to the finite state machine, and that the finite state machine controls a first switch that affects the transmission of the first coded sampled speech signal from the microphone and a second switch that affects the transmission of the second decoded speech signal to a loudspeaker, depending on the values of the first Boolean flag, the second Boolean flag, the first value and the second value.

4. A method according to claim 2, characterized in that if the finite state machine takes a first state, the first switch that controls the transmission from the microphone is set to allow this transmission and the second switch that controls the transmission to the loudspeaker is set not to allow this transmission, and that if the finite state machine takes a second state, the first switch that controls the transmission from the microphone is set not to allow this transmission and the second switch that controls the transmission to the loudspeaker is set to allow this transmission.

5. A method according to claim 4, characterized in that if the finite state machine takes a third state, the first and second switches are both set to the same state.

6. A method according to claim 5, characterized in that the finite state machine changes from the third state to the first state if the first flag is true and the second flag is false, or if the first flag is true and the first value is larger than the second value, that the finite state machine changes from the first state to the third state if the first flag is false and a hang time has elapsed, that the finite state machine changes from the third state to the second state if the second flag is true and the first flag is false, or if the second flag is true and the second value is larger than the first value, that the finite state machine changes from the second state to the third state if the second flag is false and the hang time has elapsed, that the finite state machine changes from the first state to the second state if the second flag is true, the second value is larger than the first value and the hang time has elapsed, and that the finite state machine changes from the second state to the first state if the first flag is true, the first value is larger than the second value and the hang time has elapsed.

7. A method according to claim 6, characterized in that the hang time is 600 milliseconds.

8. An apparatus for reducing echo when transmitting speech in a telephone application, the telephone application comprising a loudspeaker and a microphone, characterized in that the telephone application comprises a finite state machine arranged to cause the loudspeaker and the microphone to be connected or disconnected depending on the characteristics of the signal from the microphone and the characteristics of the signal to the loudspeaker.

9. A personal computer arranged to transmit and receive speech with a telephone application comprising an apparatus for reducing echo, the telephone application comprising a loudspeaker and a microphone, characterized in that the telephone application comprises a finite state machine arranged to cause the loudspeaker and the microphone to be connected or disconnected depending on the characteristics of the signal from the microphone and the characteristics of the signal to the loudspeaker.