JPH06282297A

JPH06282297A - Voice coding method

Info

Publication number: JPH06282297A
Application number: JP5090501A
Authority: JP
Inventors: Tadayoshi Makino; 忠由牧野
Original assignee: IDOU TSUSHIN SYST KAIHATSU KK
Current assignee: IDOU TSUSHIN SYST KAIHATSU KK
Priority date: 1993-03-26
Filing date: 1993-03-26
Publication date: 1994-10-07

Abstract

PURPOSE: To provide a voice coding method that can reduce the influence of ambient noise superimposed on an input voice signal. CONSTITUTION: The method provides a silence detection unit 3 that detects silent intervals of the input voice signal, an FFT unit 4 that measures the spectrum pattern of the input signal, a noise spectrum pattern group 7 prepared beforehand, a neural network group 9 in which a network has been trained to extract the voice signal for each noise pattern in the group, and a coding unit 12. Each time a silent interval is detected, the input spectrum pattern is compared with the prepared noise patterns and the closest noise pattern is selected. The input signal, converted into the frequency domain, is then fed to the neural network trained beforehand with the selected noise pattern, and is coded after the voice signal has been extracted.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a voice coding method that digitizes a voice signal for transmission or storage and converts transmitted or stored digital signals back into analog signals. The method is applicable to telephone equipment such as telephones, mobile telephones and car telephones, and to voice files, voice memories and the like.

【０００２】[0002]

[Prior Art] Conventionally, in order to reduce the degradation of the voice signal caused by ambient noise, this kind of voice coding method has had a function for measuring the amplitude of the input voice signal and a function for controlling the amplitude of the input voice. It detects portions in which the amplitude of the input voice signal is below a certain threshold (hereinafter "silent intervals") and attenuates the signal level in those intervals. The aim is to reduce the influence of ambient noise by ensuring that, when there is no voice input at all, the ambient noise alone is not coded and emphasized. FIG. 5 shows the configuration of a conventional voice coding method with such a noise countermeasure. In FIG. 5, reference numeral 51 denotes a voice signal input terminal, which is connected to an A/D conversion unit 52. The output of the A/D conversion unit 52 is connected to a silence detection unit 53, a changeover switch 67 and a coding unit 62. The changeover switch 67 is connected to an attenuator 68. The output of the coding unit 62 is connected to a decoding unit 64 via a transmission line 63. The output of the decoding unit 64 is connected to a D/A conversion unit 65, and the output of the D/A conversion unit 65 is connected to a voice signal output terminal 66. With this configuration, the analog voice signal input from the voice signal input terminal 51 is converted into a digital signal by the A/D conversion unit 52, and the silence detection unit 53 then detects silent intervals. During a silent interval the changeover switch 67 turns ON and the voice signal is grounded through the attenuator 68, so the level of the input signal is lowered. This mechanism prevents the situation in which, while the input voice signal is silent, only the noise signal is coded and the noise becomes conspicuous.
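The conventional scheme of FIG. 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame length, the threshold and the attenuation gain are hypothetical values, and peak amplitude is assumed as the amplitude measure; the publication specifies none of them.

```python
def attenuate_silent_frames(samples, frame_len=160, threshold=0.02, gain=0.1):
    """Attenuate every frame whose peak amplitude is below `threshold`.

    Models silence detection unit 53 (threshold test) and changeover
    switch 67 with attenuator 68 (scaling by `gain`). All parameter
    values are assumptions made for this sketch.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        peak = max(abs(x) for x in frame)
        # Silent interval: switch ON, level lowered. Otherwise pass through.
        scale = gain if peak < threshold else 1.0
        out.extend(x * scale for x in frame)
    return out
```

Note that frames containing voice pass through unchanged, noise included, which is exactly the limitation the invention addresses.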

【０００３】[0003]

[Problem to Be Solved by the Invention] However, the conventional voice coding method described above is a noise reduction measure that works only during silence, when there is no input voice signal. When the amplitude of the input voice signal exceeds the fixed value, it cannot reduce the influence of ambient noise. The present invention has been made to solve this problem, and its object is to provide a voice coding method that can reduce the influence of ambient noise superimposed on the input voice signal.

【０００４】[0004]

[Means for Solving the Problem] To solve the above problem, the voice coding method according to the first invention of the present application comprises: silent interval detection means for detecting silent intervals of an input voice signal; input spectrum pattern measurement means for measuring the spectrum pattern of the input voice signal; a noise spectrum pattern group prepared in advance; neural networks each trained to extract the voice signal for one of the noise spectrum patterns in the group; and coding means. Each time a silent interval is detected, the spectrum pattern of the input voice signal at that time is compared with the individual noise spectrum patterns in the group and the closest noise spectrum pattern is selected. The input voice signal, converted into the frequency domain, is input to the neural network trained in advance with the selected noise spectrum pattern, and coding is performed after only the voice signal has been extracted.
The voice coding method according to the second invention of the present application comprises: silent interval detection means for detecting silent intervals of an input voice signal; input spectrum pattern measurement means for measuring the spectrum pattern of the input voice signal; time measurement means; time-zone spectrum pattern storage means, controlled by the time measurement means, for accumulating time-zone spectrum patterns, which are the spectrum patterns observed during silence at fixed time intervals; a comparison voice signal recorded in a noise-free state; a neural network group having a learning function; and coding means. The networks are trained in advance to extract the voice signal from the sum of a time-zone spectrum pattern and the comparison voice signal, so that only the voice signal is extracted adaptively against noise that varies with the time of day, and coding is performed after the extraction.

【０００５】[0005]

[Operation] With the voice coding method according to the first invention configured as above, when the amplitude of the input voice signal is at or below a fixed threshold, that is, during a silent interval, the spectrum of the ambient noise is analyzed to measure the input spectrum pattern, and the noise spectrum pattern closest to the measured input spectrum pattern is selected from the noise spectrum pattern group prepared in advance. Meanwhile, a neural network that has been trained with a signal mixing an assumed noise spectrum pattern and a voice signal as its input, and with the voice signal alone as its teacher signal, is able to extract the voice signal from such a mixed signal. Therefore, a neural network group is prepared in which each network has been trained for voice extraction with one of the prepared noise spectrum patterns as the assumed noise spectrum pattern. One of the prepared noise spectrum patterns is selected from the input signal during silence, the neural network trained in advance with that pattern is selected, and the input signal is passed through that network before coding. This allows voice coding with the influence of ambient noise removed.
With the voice coding method according to the second invention configured as above, the input spectrum pattern of the ambient noise is measured and accumulated at fixed time intervals, that is, for each time zone (period of the day), and a neural network is provided that learns from the accumulated time-zone noise spectrum patterns and the comparison voice signal. The voice signal can therefore be extracted in a way that adapts to the ambient noise characteristics, in each time zone, of the place where the voice coding apparatus is installed. By coding the signal after extraction, voice coding can be performed with the influence of ambient noise removed more effectively.

【０００６】[0006]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the configuration of a voice coding method according to a first embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a voice signal input terminal, which is connected to an A/D conversion unit 2. The A/D conversion unit 2 is connected to a silence detection unit 3, which serves as the silent interval detection means, and to an FFT (Fast Fourier Transform) unit 4, which serves as the input spectrum pattern measurement means. The output of the silence detection unit 3 is connected to a neural network first selection switch 8. The output of the FFT unit 4 is connected to a changeover switch 5 and to the neural network first selection switch 8.

[0007] The changeover switch 5 is controlled by the silence detection unit 3 so that it turns ON during silence, and its output is connected to a selection unit 6. The noise spectrum pattern group 7 is connected to the selection unit 6. For the spectrum pattern supplied from the FFT unit 4 during silence, the selection unit 6 selects the closest of the spectrum patterns prepared in advance in the noise spectrum pattern group 7 and, based on the result, controls the neural network first selection switch 8 and a neural network second selection switch 10.

[0008] Reference numeral 9 denotes a neural network group trained in advance to extract the voice signal from a signal in which each prepared noise spectrum pattern is mixed with a voice signal. It comprises a plurality of neural network units 9_1 to 9_n. The input voice signal, converted into the frequency domain by the FFT unit 4, is routed to one of the neural network units by the neural network first selection switch 8.

[0009] The output of the neural network group 9 is switched by the neural network second selection switch 10, which is controlled by the selection unit 6, and is output to an IFFT (inverse fast Fourier transform) unit 11. The IFFT unit 11 converts the input signal into a time-domain signal and outputs it to a coding unit 12, which serves as the coding means. The output coded by the coding unit 12 is input to a decoding unit 14 via a transmission line 13 and decoded, then converted back into an analog signal by a D/A conversion unit 15 and output from a voice signal output terminal 16.

[0010] Next, the operation of the first embodiment is described. The analog voice signal input from the voice signal input terminal 1 is converted into a digital signal by the A/D conversion unit 2. The digitized signal is input to the silence detection unit 3, which detects the timing of silent intervals. Its output is sent to the changeover switch 5: while a silent interval is being detected, the silence detection unit 3 sends the changeover switch 5 a control signal that turns it ON. While the changeover switch 5 is ON, the input signal converted into the frequency domain by the FFT unit 4 passes through the changeover switch 5 and is input to the selection unit 6.
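The gating described in this paragraph can be sketched as follows. The code is an illustration only: a naive DFT stands in for the FFT unit 4, peak amplitude is assumed as the silence criterion, and the threshold value and all function names are hypothetical.

```python
import cmath


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive stand-in for FFT unit 4)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def is_silent(frame, threshold=0.02):
    """Silence detection unit 3: peak below a fixed threshold (assumed value)."""
    return max(abs(x) for x in frame) < threshold


def spectra_for_selection(frames, threshold=0.02):
    """Changeover switch 5: only spectra of silent frames reach selection unit 6."""
    return [magnitude_spectrum(f) for f in frames if is_silent(f, threshold)]
```

A real implementation would use an FFT library rather than the O(N^2) sum above; the naive form is used here only to keep the sketch free of dependencies.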

[0011] The selection unit 6 compares the input signal with the spectrum patterns prepared in advance in the connected noise spectrum pattern group 7 and selects the closest of those patterns. In this way it identifies which type of noise the input signal during the silent interval, that is, the spectrum of the ambient noise, corresponds to.
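One way to picture the selection unit 6 is as a nearest-pattern search. The sketch below assumes Euclidean distance as the closeness measure, which the publication does not specify, and the four-bin pattern values are invented for illustration; they are not taken from FIG. 2.

```python
import math


def select_noise_pattern(measured, patterns):
    """Return the name of the stored pattern closest to `measured`.

    Euclidean distance is an assumption of this sketch; the source only
    says the "closest" pattern is selected.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(patterns, key=lambda name: distance(measured, patterns[name]))


# Hypothetical stand-ins for noise spectrum pattern group 7.
PATTERNS = {
    "office": [0.9, 0.6, 0.3, 0.1],
    "wind": [1.0, 0.2, 0.05, 0.01],
    "car_cabin": [0.8, 0.8, 0.4, 0.2],
}

selected = select_noise_pattern([0.95, 0.25, 0.05, 0.0], PATTERNS)
```

The returned name is what would drive the two selection switches 8 and 10.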

[0012] FIG. 2 shows examples of the spectrum patterns prepared in advance in the noise spectrum pattern group 7. FIG. 2(A) shows an example of office noise, FIG. 2(B) an example of wind noise, and FIG. 2(C) an example of noise inside a car cabin. Various other noise patterns can also be prepared.

[0013] FIG. 3 shows in more detail the configuration of the portion related to the neural network units 9. As shown in the figure, for a given neural network unit 9_m, a noise signal 22, whose spectrum equals a noise spectrum prepared in the noise spectrum pattern group 7 and selected by the selection unit 6, and an input voice signal 21 are added by an adder 23. The sum is converted by an FFT unit 4b into a frequency-domain signal analyzed into, for example, 256 frequency channels, normalized by a normalization unit 24b, and input to the neural network unit 9_m, whose output is input to a comparator 25. Meanwhile, the voice signal alone is converted into a frequency-domain signal by an FFT unit 4a and normalized by a normalization unit 24a, and the result is input to the comparator 25 as the teacher signal.

[0014] The neural network unit 9_m is a three-layer network consisting of an input layer of, for example, 256 units matching the number of analysis channels of the FFT unit, an intermediate layer of, for example, 200 units, and an output layer of, for example, 256 units matching the number of input units. Each unit has been trained for voice signal extraction with this structure.
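The three-layer structure can be sketched as a plain forward pass. The layer sizes 256, 200 and 256 and the weight names w_ij and w_jk come from the text; the sigmoid activation, the absence of bias terms and the random initial weights are assumptions of this sketch.

```python
import math
import random


def make_network(n_in=256, n_hidden=200, n_out=256, seed=0):
    """Create weight matrices w_ij (input->hidden) and w_jk (hidden->output)."""
    rng = random.Random(seed)
    w_ij = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_jk = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    return w_ij, w_jk


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(network, spectrum):
    """Propagate one normalized spectrum frame; returns (hidden, output)."""
    w_ij, w_jk = network
    hidden = [sigmoid(sum(w * x for w, x in zip(row, spectrum))) for row in w_ij]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_jk]
    return hidden, output


network = make_network()
hidden, output = forward(network, [0.5] * 256)
```

Because the output layer has as many units as the input layer, the network maps one spectrum frame to another frame of the same resolution, which is what allows the IFFT unit to follow it directly.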

[0015] Next, the operation during training with the above configuration is described. The input voice signal 21 in FIG. 3 is a voice signal with no added noise; voice signals of men and women of various age groups, recorded in advance in a noise-free environment, are used. For the noise signal 22 in FIG. 3, an assumed noise such as the office noise or car cabin noise shown in FIG. 2 is selected.

[0016] The voice signal 21 and the noise signal 22 are added by the adder 23 to generate a signal in which voice and noise are mixed. In the FFT unit 4b and the normalization unit 24b, this adder output becomes a frequency-domain signal normalized by the spectrum level of each frequency-analysis frame.
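The mixing and per-frame normalization steps might look like the sketch below. The source does not define "spectrum level", so dividing each bin by the frame's maximum magnitude is an assumption; the function names are hypothetical.

```python
def mix(voice, noise):
    """Adder 23: sample-wise sum of the clean voice and the noise signal."""
    return [v + n for v, n in zip(voice, noise)]


def normalize_frame(spectrum, eps=1e-12):
    """Normalization unit 24a/24b: scale one frame's magnitudes to a peak of 1.

    Using the frame maximum as the "spectrum level" is an assumption.
    An all-zero frame is returned as zeros to avoid division by zero.
    """
    level = max(spectrum)
    if level < eps:
        return [0.0] * len(spectrum)
    return [v / level for v in spectrum]
```

Normalizing each frame separately keeps the network input within a fixed range regardless of how loud the speaker or the noise is.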

[0017] Meanwhile, the voice signal 21 alone is converted by the FFT unit 4a and the normalization unit 24a into a frequency-domain signal normalized by the spectrum level of each frame, and is input to one side of the comparator 25 as the teacher signal. The output of the output layer of the neural network unit 9_m is connected to the other side of the comparator 25. The frequency-domain version of the mixed noise and voice signal is connected to the input layer of the neural network unit 9_m, and the neural network 9_m is trained so that only the voice signal used as the teacher signal is output from the output layer, which determines the weighting coefficients w_ij and w_jk of the units. The configuration of the neural network 9_m follows the frequency resolution of the FFT units 4a and 4b; for example, as described above, it has an input layer of 256 units and an output layer of 256 units.
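The publication does not name the learning rule used to determine w_ij and w_jk. As a hedged illustration, the sketch below assumes ordinary backpropagation on a squared-error loss with sigmoid units and no biases, and uses a tiny 4-3-4 network with invented spectra so that it runs quickly.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(w_ij, w_jk, mixed, teacher, lr=0.5):
    """One gradient step; updates weights in place, returns the prior error."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, mixed))) for row in w_ij]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_jk]
    # Comparator 25: difference between network output and teacher signal.
    err = [o - t for o, t in zip(output, teacher)]
    delta_out = [e * o * (1.0 - o) for e, o in zip(err, output)]
    # Hidden deltas use the output weights from before this step's update.
    delta_hid = [
        h * (1.0 - h) * sum(delta_out[k] * w_jk[k][j] for k in range(len(w_jk)))
        for j, h in enumerate(hidden)
    ]
    for k, row in enumerate(w_jk):
        for j in range(len(row)):
            row[j] -= lr * delta_out[k] * hidden[j]
    for j, row in enumerate(w_ij):
        for i in range(len(row)):
            row[i] -= lr * delta_hid[j] * mixed[i]
    return 0.5 * sum(e * e for e in err)


rng = random.Random(0)
w_ij = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(3)]
w_jk = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(4)]
teacher = [0.2, 0.8, 0.4, 0.1]   # hypothetical clean-voice spectrum
mixed = [0.5, 0.9, 0.5, 0.4]     # hypothetical voice-plus-noise spectrum
errors = [train_step(w_ij, w_jk, mixed, teacher) for _ in range(200)]
```

In the arrangement the text describes, one such network would be trained per noise pattern, each on many voice recordings rather than the single pair used here.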

[0018] Networks trained in this way are used as the neural networks 9_1 to 9_n in FIG. 1. When the selection unit 6 selects, from the spectrum patterns of the noise spectrum pattern group 7, the noise spectrum pattern observed while the voice is silent, the neural network first selection switch 8 and the neural network second selection switch 10 are switched, and the neural network unit trained in advance with the selected noise spectrum pattern (that is, trained to extract the voice signal from a signal mixing voice and noise) is selected from among the neural network units 9_1 to 9_n. Once the switches 8 and 10 have been set in this way, the voice signal component extracted from the mixed signal is input to the IFFT unit 11.
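The role of the two selection switches can be modeled as a small dispatch object. This is a sketch only: the class and method names are hypothetical, and the networks are represented by arbitrary callables rather than trained models.

```python
class NetworkBank:
    """Neural network group 9 with selection switches 8 and 10.

    Both switches always point at the same unit, so a single `selected`
    key models them. `networks` maps a noise pattern name to a callable
    taking and returning a spectrum frame.
    """

    def __init__(self, networks):
        if not networks:
            raise ValueError("at least one network is required")
        self.networks = dict(networks)
        self.selected = next(iter(self.networks))

    def select(self, pattern_name):
        """Called by selection unit 6 after each silent interval."""
        if pattern_name not in self.networks:
            raise KeyError(pattern_name)
        self.selected = pattern_name

    def extract(self, spectrum):
        """Route one frame through the currently selected network."""
        return self.networks[self.selected](spectrum)


# Placeholder callables standing in for trained networks.
bank = NetworkBank({
    "office": lambda s: [v * 0.5 for v in s],
    "wind": lambda s: [v * 0.25 for v in s],
})
```

The selection persists until the next silent interval, so voiced frames in between are processed by the network chosen from the most recent noise measurement.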

[0019] This signal component is inverse Fourier transformed into a time-domain signal by the IFFT unit 11, coded by the coding unit 12, sent to the decoding unit 14 via the transmission line 13 and decoded, then converted back into an analog signal by the D/A conversion unit 15 and output from the voice signal output terminal 16.

[0020] Thus, according to the first embodiment, when the silence detection unit 3 detects a silent interval of the input voice, the spectrum pattern of that interval is compared with the noise spectrum patterns assumed in advance, and the noise characteristic is identified as one of the previously measured patterns. The neural network that has already been trained for that noise characteristic, and can therefore extract voice from a mixed voice and noise signal, is then selected, and the input voice signal is passed through it. This suppresses the degradation of the voice signal caused by noise at coding time and limits the influence of the noise.

[0021] Next, a second embodiment of the present invention is described with reference to FIG. 4. The second embodiment has the same basic configuration as the first embodiment of FIG. 1, with the following additions: a timer 48 serving as the time measurement means; a time-zone spectrum pattern group 46, serving as the time-zone spectrum pattern storage means, in addition to a noise spectrum pattern group (not shown) corresponding to the noise spectrum pattern group 7 of FIG. 1; a time-zone neural network group 47; a comparison voice signal 41 recorded in a noise-free state; and an FFT unit 34a that applies the FFT to that signal. The time-zone neural networks are provided with a self-learning function.

[0022] Next, the operation of the second embodiment is described. Its basic operation is the same as that of the first embodiment, with the following operations added.

[0023] First, because the timer 48 is provided, the average spectrum pattern, for each time zone, of the signal input during silent intervals is accumulated in the time-zone spectrum pattern group 46. The neural network prepared for each time zone in the time-zone neural network group 47 then learns from the noise signal for that time zone accumulated in the time-zone spectrum pattern group 46 and from the signal obtained by converting the comparison voice signal 41, recorded in advance in a noise-free state, into a frequency-domain signal with the FFT unit 34a.
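The accumulation of an average spectrum per time zone can be sketched with running sums. The 24 one-hour zones are an assumption, since the publication does not state the zone length, and the class name is hypothetical.

```python
class TimeZoneSpectrumStore:
    """Sketch of time-zone spectrum pattern group 46 driven by timer 48."""

    def __init__(self, n_bins, zones=24):
        # One running sum and one frame count per zone (zone length assumed).
        self.sums = [[0.0] * n_bins for _ in range(zones)]
        self.counts = [0] * zones

    def add(self, zone, spectrum):
        """Accumulate one silent-interval spectrum measured in `zone`."""
        zone_sum = self.sums[zone]
        for i, value in enumerate(spectrum):
            zone_sum[i] += value
        self.counts[zone] += 1

    def average(self, zone):
        """Average spectrum for `zone`, or None if nothing was measured."""
        n = self.counts[zone]
        if n == 0:
            return None
        return [s / n for s in self.sums[zone]]


store = TimeZoneSpectrumStore(n_bins=2)
store.add(9, [1.0, 3.0])
store.add(9, [3.0, 5.0])
```

Keeping sums and counts rather than raw frames bounds the memory use, which matters for a device that measures during every silent interval.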

[0024] This learning starts once a certain amount of noise spectrum data has been accumulated for each time zone. It ends when the difference found by the comparator 45 between the voice signal extracted from the noisy input and the comparison voice signal 41 falls to or below a fixed value.
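The start and stop conditions for this learning can be expressed as a small control loop. The sketch is generic: `step` is any callable that performs one training update and returns the current comparator error, and the minimum sample count, tolerance and step limit are hypothetical parameters.

```python
def train_until_converged(step, n_samples, min_samples, tolerance, max_steps=10000):
    """Run `step` until its returned error is at or below `tolerance`.

    Returns None without training if fewer than `min_samples` spectra
    have been accumulated, otherwise the number of steps taken.
    Raises RuntimeError if the error never reaches `tolerance`.
    """
    if n_samples < min_samples:
        return None  # not enough noise data accumulated yet
    for count in range(1, max_steps + 1):
        if step() <= tolerance:
            return count
    raise RuntimeError("training did not converge within max_steps")
```

The step limit is an addition of this sketch; the source only gives the error-based stopping condition.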

[0025] After this learning has finished, the degradation of the voice signal due to noise is reduced in one of two ways: the range from which the noise spectrum is selected is extended from the prepared noise spectrum patterns alone, as in the first embodiment, to the prepared noise spectrum patterns plus the time-zone noise patterns; or the selection method is switched from choosing a neural network by noise spectrum pattern to choosing a neural network by time zone.

[0026] Thus, according to the second embodiment, the noise spectrum patterns are measured at the place where the voice coding apparatus is installed. For a voice coding apparatus used at a fixed location in particular, using neural networks trained on this adaptively measured noise reduces the influence of noise on the voice signal efficiently. In general, silent intervals account for about 30% of input speech, so measuring during every silent interval amounts to measuring the ambient noise almost continuously, which allows the apparatus to adapt well to changes in the acoustic environment around it.

[0027] Various coders can be used for the coding unit 12 in the above embodiments, such as ADPCM (Adaptive Differential Pulse Code Modulation), CELP (Code Excited Linear Prediction) and LPC (Linear Predictive Coding) vocoders. The transmission line 13 in FIG. 1 can likewise take various forms, such as a wired transmission line, a wireless transmission line, or a scheme in which the signal is first stored in memory and later reproduced.

[0028] The present invention is not limited to the above embodiments. The embodiments are examples, and anything that has substantially the same configuration as the technical idea described in the claims of the present invention and achieves the same effects falls within the technical scope of the present invention. For example, the above embodiments describe noise reduction performed on the coding side, but the same noise reduction approach can be applied not only in the coding unit but also in the decoding unit.

【００２９】[0029]

[Effects of the Invention] As described above, with the voice coding method according to the first invention configured as above, when the amplitude of the input voice signal is at or below a fixed threshold, that is, during a silent interval, the spectrum of the ambient noise is analyzed to measure the input spectrum pattern, and the noise spectrum pattern closest to the measured input spectrum pattern is selected from the noise spectrum pattern group prepared in advance. Meanwhile, a neural network that has been trained with a signal mixing an assumed noise spectrum pattern and a voice signal as its input, and with the voice signal alone as its teacher signal, is able to extract the voice signal from such a mixed signal. Therefore, a neural network group is prepared in which each network has been trained for voice extraction with one of the prepared noise spectrum patterns as the assumed noise spectrum pattern. One of the prepared noise spectrum patterns is selected from the input signal during silence, the neural network trained in advance with that pattern is selected, and the input signal is passed through that network before coding. This has the advantage that voice coding can be performed with the influence of ambient noise removed.
With the voice coding method according to the second invention configured as above, the input spectrum pattern of the ambient noise is measured and accumulated at fixed time intervals, that is, for each time zone (period of the day), and a neural network is provided that learns from the accumulated time-zone noise spectrum patterns and the comparison voice signal. The voice signal can therefore be extracted in a way that adapts to the ambient noise characteristics, in each time zone, of the place where the voice coding apparatus is installed. By coding the signal after extraction, voice coding can be performed with the influence of ambient noise removed more effectively. Consequently, when the voice coding apparatus is used at a fixed location, there is the further advantage that degradation of the voice signal due to ambient noise can be reduced more effectively.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of a speech coding system according to a first embodiment of the present invention.

FIG. 2 is a diagram showing examples of the noise patterns prepared in the noise spectrum pattern group shown in FIG. 1.

FIG. 3 is a block diagram showing a more detailed configuration of the portion related to the neural network unit shown in FIG. 1.

FIG. 4 is a block diagram showing the overall configuration of a speech coding system according to a second embodiment of the present invention.

FIG. 5 is a diagram showing the configuration of a conventional speech coding system with noise countermeasures.

[Explanation of symbols]

1 Audio signal input terminal
2 A/D conversion unit
3 Silence detection unit
4, 4a, 4b FFT unit
5 Changeover switch
6 Selection unit
7 Noise spectrum pattern group
8 Neural network first selection switch
9 Neural network group
9_1 to 9_n Neural network units
10 Neural network second selection switch
11 IFFT unit
12 Encoding unit
13 Transmission path
14 Decoding unit
15 D/A conversion unit
16 Audio signal output terminal
21 Audio signal
22 Noise signal
23 Adder
24a, 24b Normalization unit
25 Comparator
32 A/D conversion unit
33 Silence detection unit
34a, 34b FFT unit
36 Selection unit
38a, 38b Changeover switch
39 Neural network unit
41 Comparison voice signal
45 Comparator
46 Time zone spectrum pattern group
47 Time zone neural network group
48 Timer
51 Audio signal input terminal
52 A/D conversion unit
53 Silence detection unit
62 Encoding unit
63 Transmission path
64 Decoding unit
65 D/A conversion unit
66 Audio signal output terminal
67 Changeover switch
68 Attenuator

Continuation of the front page: (51) Int. Cl.⁵, identification code, internal reference number, FI, technical display location: H04B 14/04 Z 4101-5K

Claims

[Claims]

1. A speech coding method comprising: silent section detecting means for detecting a silent section of an input voice signal; input spectrum pattern measuring means for measuring a spectrum pattern of the input voice signal; a noise spectrum pattern group prepared in advance; a neural network that has learned voice signal extraction for each noise spectrum pattern in the noise spectrum pattern group; and encoding means, wherein, each time a silent section is detected, the spectrum pattern of the input voice signal at that time is compared with the individual noise spectrum patterns in the noise spectrum pattern group to select the closest noise spectrum pattern, the input voice signal converted into the frequency domain is input to the neural network trained in advance with the selected noise spectrum pattern, and encoding is performed after the voice signal is extracted.

2. A speech coding method comprising: silent section detecting means for detecting a silent section of an input voice signal; input spectrum pattern measuring means for measuring a spectrum pattern of the input voice signal; time measuring means; time zone spectrum pattern accumulating means for accumulating, at fixed time intervals controlled by the time measuring means, a time zone spectrum pattern that is the spectrum pattern during silence; a comparison voice signal recorded in a noiseless state; a neural network group having a learning function; and encoding means, wherein voice signal extraction is learned in advance using the sum of the time zone spectrum pattern and the comparison voice signal, only the voice signal is adaptively extracted against noise that changes with the time zone, and encoding is then performed.