JP3074703B2

JP3074703B2 - Multi-pulse encoder

Info

Publication number: JP3074703B2
Application number: JP02166883A
Authority: JP
Inventors: 直人岩橋
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1990-06-27
Filing date: 1990-06-27
Publication date: 2000-08-07
Anticipated expiration: 2015-08-07
Also published as: JPH0457100A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a multi-pulse encoding device that performs high-efficiency encoding of speech signals.

[Summary of the Invention]

The present invention provides a multi-pulse encoding device that compares input speech information with a plurality of pieces of speech information obtained by supplying drive pulses to a plurality of synthesis filters, selects one pair consisting of a synthesis filter and its corresponding drive pulses, and encodes information related to the synthesis filter and drive pulses of that pair. The speech signal can thereby be compression-encoded at a low bit rate, and synthesized sound of good quality can be obtained by speech synthesis even at that low bit rate.

[Conventional technology]

Conventional speech analysis-synthesis systems (vocoders) include, for example, systems based on so-called linear prediction analysis (LPC). For encoding a speech signal with such an LPC analysis-synthesis system, one technique that yields high-quality synthesized speech is so-called multi-pulse excited linear predictive coding (MPC or MPEC). MPC avoids the pulse-and-noise model of the sound source generally used in LPC analysis-synthesis systems. Instead, it represents the sound source by a plurality of pulses, regardless of whether the sound is voiced or unvoiced, and drives the LPC synthesis filter with these pulses.

FIG. 2 shows a block diagram of a conventional speech synthesis circuit using MPC.

In FIG. 2, a signal $P(n)$ indicating pulse amplitudes and positions, described later, is supplied to an input terminal 101 and sent to a multi-pulse generation circuit 102. The multi-pulse generation circuit 102 generates a multi-pulse train $V(n)$ from the amplitude and position signal $P(n)$. The multi-pulse train $V(n)$ drives a long-term prediction synthesis filter 103 and a short-term prediction synthesis filter 104, which together form the LPC synthesis filter, to produce a synthesized speech signal $Q(n)$. The synthesized speech signal $Q(n)$ is output from an output terminal 105.

Here, the filter characteristic $F_{SL}(Z)$ of the long-term prediction synthesis filter 103 can be expressed as

$$F_{SL}(Z) = \frac{1}{1 + \beta Z^{-M}} \tag{1}$$

where $M$ is the number of delay taps of the long-term prediction synthesis filter 103 and $\beta$ is a prediction coefficient. The filter characteristic $F_{SS}(Z)$ of the short-term prediction synthesis filter 104 can be expressed as

$$F_{SS}(Z) = \frac{1}{1 + \alpha_1 Z^{-1} + \alpha_2 Z^{-2} + \cdots} \tag{2}$$

where the $\alpha_i$ are prediction coefficients.
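The two transfer functions in equations (1) and (2) correspond to simple recursive difference equations. The following is a minimal sketch of that correspondence, not the circuit of the patent: the function names are hypothetical, the filter state before the first sample is assumed to be zero, and the short-term filter order is simply the length of the coefficient list passed in.

```python
def long_term_synthesis(v, beta, M):
    """Apply F_SL(Z) = 1 / (1 + beta * Z^-M), i.e. y(n) = v(n) - beta * y(n - M).

    Assumes M >= 1 and zero filter state before n = 0 (an assumption of this sketch).
    """
    y = []
    for n, x in enumerate(v):
        past = y[n - M] if n >= M else 0.0
        y.append(x - beta * past)
    return y


def short_term_synthesis(x, alphas):
    """Apply F_SS(Z) = 1 / (1 + sum_i alpha_i * Z^-i), i.e.
    y(n) = x(n) - sum_i alpha_i * y(n - i), with alphas = [alpha_1, alpha_2, ...].
    """
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i, a in enumerate(alphas, start=1):
            if n >= i:
                acc -= a * y[n - i]
        y.append(acc)
    return y


def synthesize(v, beta, M, alphas):
    """Cascade of both filters: multi-pulse train V(n) -> synthesized speech Q(n)."""
    return short_term_synthesis(long_term_synthesis(v, beta, M), alphas)
```

Feeding a single unit pulse through `long_term_synthesis` with a negative `beta` produces a decaying echo every `M` samples, which is the long-term (pitch-like) correlation the text refers to.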

In speech synthesis using MPC in general, the long-term prediction synthesis filter 103 is not strictly necessary. Using it as in FIG. 2, however, captures the long-term correlation of speech, which improves sound quality and reduces the energy required of the multi-pulse train.

The filter characteristics $F_{SL}(Z)$ and $F_{SS}(Z)$ of the long-term prediction synthesis filter 103 and the short-term prediction synthesis filter 104 described above are determined using, for example, the configuration shown in FIG. 3.

An input speech signal $S(n)$, which is the original speech signal, is supplied to an input terminal 111 in FIG. 3. The prediction coefficients $\alpha_i$ ($i = 1, 2, 3, \ldots$) of the short-term prediction synthesis filter 104 in FIG. 2 are obtained, for example, by LPC analysis of the input speech signal $S(n)$. The number of delay taps $M$ and the prediction coefficient $\beta$ of the long-term prediction synthesis filter 103 in FIG. 2 are obtained by analyzing the output $R_1(n)$ produced when the input speech signal $S(n)$ is fed to a short-term prediction analysis filter 112 whose filter characteristic $F_{AS}(Z)$ is

$$F_{AS}(Z) = 1 + \alpha_1 Z^{-1} + \alpha_2 Z^{-2} + \cdots \tag{3}$$

As the analysis method, for example, the output $R_1(n)$ of the short-term prediction analysis filter 112 is fed to a long-term prediction analysis filter 113 whose filter characteristic $F_{AL}(Z)$ is

$$F_{AL}(Z) = 1 + \beta Z^{-M} \tag{4}$$

and the prediction coefficient $\beta$ and the number of delay taps $M$ are chosen so that the sum of squares of the resulting output $R_2(n)$, that is,

$$\sum_n R_2(n)^2$$

is minimized. The output $R_2(n)$ is output from an output terminal 114.
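The minimization described above can be sketched as an exhaustive search over candidate delays. From equation (4), $R_2(n) = R_1(n) + \beta R_1(n - M)$, so for a fixed $M$ the least-squares optimum is $\beta = -\sum_n R_1(n) R_1(n-M) / \sum_n R_1(n-M)^2$. This closed form is a standard least-squares result and is not stated in the source. The function name, the lag range, and the choice to sum only over samples where every candidate delay is available are assumptions of this sketch.

```python
def estimate_long_term_predictor(r1, min_lag, max_lag):
    """Search delays M in [min_lag, max_lag] and return (M, beta, error) that
    minimizes sum_n (r1[n] + beta * r1[n - M])**2.

    The sum runs over n >= max_lag so that every candidate M is scored on the
    same samples (an assumption of this sketch, not of the source).
    """
    best = None
    span = range(max_lag, len(r1))
    for M in range(min_lag, max_lag + 1):
        num = sum(r1[n] * r1[n - M] for n in span)
        den = sum(r1[n - M] ** 2 for n in span)
        if den == 0.0:
            continue  # delayed signal is silent here; beta is undefined
        beta = -num / den  # least-squares optimum for this M
        err = sum((r1[n] + beta * r1[n - M]) ** 2 for n in span)
        if best is None or err < best[2]:
            best = (M, beta, err)
    return best
```

For a residual that repeats exactly every 5 samples, the search returns a delay of 5 with $\beta = -1$ and zero error, since $R_1(n) - R_1(n-5)$ vanishes everywhere.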

The MPC scheme drives the long-term prediction synthesis filter 103 and the short-term prediction synthesis filter 104 of FIG. 2, whose filter characteristics $F_{SL}(Z)$ and $F_{SS}(Z)$ have been determined in this way, with the multi-pulse train $V(n)$.

In the speech synthesizer described above, a search is performed for a multi-pulse train $V(n)$ that brings the synthesized speech signal $Q(n)$ close to the desired value, and speech is synthesized from the multi-pulse train $V(n)$ found by this search. One algorithm for this search is based on the principle of the so-called A-b-S (analysis-by-synthesis) method. In this method, the configuration shown in FIG. 4 computes the error between the input speech signal $S(n)$ and the synthesized speech signal $Q(n)$, weights it according to auditory characteristics, and then searches for the multi-pulse train $V(n)$ that minimizes the mean squared error.

Suppose that, in the configuration of FIG. 4, a certain number of pulses (a multi-pulse train $V(n)$) have already been determined as the initial state. This multi-pulse train $V(n)$ is converted into a synthesized speech signal $Q(n)$ by passing through an LPC synthesis filter 123 composed of the long-term prediction synthesis filter and the short-term prediction synthesis filter described above. The synthesized speech signal $Q(n)$ is sent to a subtractor 124, which obtains an error signal $e(n)$ by taking the difference between the input speech signal $S(n)$ supplied from a terminal 126 and the synthesized speech signal $Q(n)$. The error signal $e(n)$ is then sent to a weighting filter 125, which applies perceptual weighting with a weighting characteristic $W(z)$. Based on its output $e_w(n)$, a squared-error minimization circuit 121 determines a new pulse amplitude and position signal $P(n)$ so that the squared error is minimized, and sends it to a multi-pulse generation circuit 122.

In other words, the configuration of FIG. 4 adds new pulses to the multi-pulse train $V(n)$ on the basis of the error signal $e(n)$ between the input speech signal $S(n)$ and the speech signal $Q(n)$ synthesized from the multi-pulse train $V(n)$ determined so far (for example, in the initial state). This process is repeated until the error signal $e(n)$ falls below a preset value or until a preset number of pulses has been determined.
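The loop described for FIG. 4 can be sketched as a greedy search that adds one pulse per iteration. This is an illustrative sketch under several assumptions that the source does not state: the perceptual weighting filter is omitted (equivalent to $W(z) = 1$), the synthesis filter is represented by a truncated impulse response `h` with zero initial state, the search starts from an empty pulse train, and each iteration picks the position whose least-squares amplitude reduces the squared error the most. All names are hypothetical.

```python
def multipulse_search(s, h, max_pulses, err_threshold=0.0):
    """Greedy analysis-by-synthesis search for pulse positions and amplitudes.

    s: target speech frame S(n); h: impulse response of the synthesis filter.
    Returns ({position: amplitude}, final error signal e(n)).
    """
    N = len(s)
    e = list(s)   # e(n) = S(n) - Q(n); Q(n) is zero before any pulse is placed
    pulses = {}
    for _ in range(max_pulses):
        if sum(x * x for x in e) <= err_threshold:
            break  # error is already below the preset value
        best_pos, best_amp, best_gain = None, 0.0, 0.0
        for p in range(N):
            span = range(p, min(N, p + len(h)))
            corr = sum(e[n] * h[n - p] for n in span)
            energy = sum(h[n - p] ** 2 for n in span)
            if energy == 0.0:
                continue
            gain = corr * corr / energy   # squared-error reduction for a pulse at p
            if gain > best_gain:
                best_pos, best_amp, best_gain = p, corr / energy, gain
        if best_pos is None:
            break  # no pulse can reduce the error further
        pulses[best_pos] = pulses.get(best_pos, 0.0) + best_amp
        # Subtract the new pulse's contribution from the error signal.
        for n in range(best_pos, min(N, best_pos + len(h))):
            e[n] -= best_amp * h[n - best_pos]
    return pulses, e
```

The two stopping conditions in the loop mirror the ones in the text: the error dropping below a preset value, or a preset number of pulses being reached.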

[Problems to be solved by the invention]

As described above, the conventional MPC scheme requires the filter characteristics of the long-term and short-term prediction synthesis filters to be determined before the multi-pulse train $V(n)$ is searched for, for example by the A-b-S method. However, there is no optimal technique for determining the filter characteristic of the long-term prediction synthesis filter, and the number of delay taps $M$ is particularly difficult to obtain. The synthesized speech obtained by the speech synthesizer can therefore hardly be called satisfactory.

The present invention has been proposed in view of these circumstances. Its object is to provide a multi-pulse encoding device that can compression-encode a speech signal and can easily obtain an optimal synthesized speech signal.

[Means for solving the problem]

The multi-pulse encoding device of the present invention has been proposed to achieve the above object. It comprises: a plurality of synthesis filters that perform long-term prediction synthesis and short-term prediction synthesis; drive pulse generating means corresponding to the plurality of synthesis filters; comparison and selection means that compares input speech information with the speech information obtained by supplying the drive pulses generated by the drive pulse generating means to the corresponding synthesis filters, and selects a pair consisting of one of the plurality of synthesis filters and its corresponding drive pulses; and encoding means that encodes information related to the synthesis filter and drive pulses selected by the comparison and selection means.

[Action]

According to the present invention, the speech information from the synthesis filters (synthesized speech information) is compared with the input speech information, and information related to the synthesis filter chosen on the basis of this comparison and to its corresponding drive pulses is encoded. The synthesized sound obtained from the encoded output is therefore the best of the candidate synthesized sounds.

[Example]

Embodiments to which the present invention is applied are described below with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of a multi-pulse encoding device according to an embodiment of the present invention.

The encoding device of FIG. 1 is configured as follows. Long-term prediction synthesis filters $3_1$ to $3_m$ ($m$ is an integer) and short-term prediction synthesis filters $4_1$ to $4_m$ serve as the plurality of synthesis filters. Multi-pulse generation circuits $2_1$ to $2_m$ are the drive pulse generating means corresponding to these synthesis filters. The drive pulses they generate (multi-pulse trains $V(n)_1$ to $V(n)_m$) are supplied to the corresponding synthesis filters, which produce the speech information, namely synthesized speech signals $Q(n)_1$ to $Q(n)_m$.

A comparison and selection circuit 10, consisting of a comparison processing unit 11 and a selection processing unit 12, compares the synthesized speech signals $Q(n)_1$ to $Q(n)_m$ with the input speech signal $S(n)$, which is the input speech information, and selects the pair of synthesis filter and corresponding drive pulses that yields the best synthesized speech signal $Q(n)_g$ ($g$ is one of 1 to $m$). An encoding circuit 6 encodes the information related to the synthesis filter and drive pulses selected by the comparison and selection circuit 10, namely the number of delay taps $M_g$ and prediction coefficient $\beta_g$ of the long-term prediction synthesis filter and the multi-pulse train $V(n)_g$, together with the prediction coefficients of the short-term prediction synthesis filter.

That is, the device of this embodiment shown in FIG. 1 performs MPC-based speech synthesis, and uses the long-term prediction synthesis filters $3_1$ to $3_m$ to capture the long-term correlation of speech, which improves sound quality and reduces the energy required of the multi-pulse trains. The long-term prediction synthesis filters $3_1$ to $3_m$ each have a different number of delay taps $M$ and prediction coefficient $\beta$ in equation (1), and therefore have mutually different filter characteristics $F_{SL}(Z)_1$ to $F_{SL}(Z)_m$. The short-term prediction synthesis filters $4_1$ to $4_m$ all have the same filter characteristic $F_{SS}(Z)$, which can be expressed by equation (2).

The multi-pulse generation circuits $2_1$ to $2_m$ generate, in the manner of FIG. 4 described above, the multi-pulse trains $V(n)_1$ to $V(n)_m$ that are optimal for the corresponding long-term prediction synthesis filters $3_1$ to $3_m$. For example, each multi-pulse generation circuit outputs the optimal multi-pulse train selected by the configuration of FIG. 4 when the long-term prediction synthesis filter within the LPC synthesis filter 123 is given the corresponding one of the filter characteristics $F_{SL}(Z)_1$ to $F_{SL}(Z)_m$ of the long-term prediction synthesis filters $3_1$ to $3_m$ of this embodiment.

The multi-pulse trains $V(n)_1$ to $V(n)_m$ obtained in this way pass through the long-term prediction synthesis filters $3_1$ to $3_m$ and the short-term prediction synthesis filters $4_1$ to $4_m$, which form the LPC synthesis filters of FIG. 1, to produce the synthesized speech signals $Q(n)_1$ to $Q(n)_m$. These synthesized speech signals are sent to the comparison and selection circuit 10.

In the device of this embodiment, because the filter characteristics of the long-term prediction synthesis filters $3_1$ to $3_m$ differ from one another as described above, the outputs of the short-term prediction synthesis filters $4_1$ to $4_m$, that is, the synthesized speech signals $Q(n)_1$ to $Q(n)_m$ output by the synthesis filters, also differ from one another. The comparison and selection circuit 10 selects the best of these synthesized speech signals.

The comparison processing unit 11 of the comparison and selection circuit 10 selects the synthesized speech signal $Q(n)_g$ that is closest to the input speech signal $S(n)$. Specifically, for example, it obtains the error signal between each of the synthesized speech signals $Q(n)_1$ to $Q(n)_m$ and the input speech signal $S(n)$ supplied via a terminal 5, and from these error signals selects the synthesized speech signal $Q(n)_g$ with the smallest squared error. The smallest squared error identifies the synthesized speech signal closest to the input speech signal $S(n)$. The comparison result of the comparison processing unit 11 is sent to the selection processing unit 12.

The selection processing unit 12 of the comparison and selection circuit 10 is also supplied with the multi-pulse trains $V(n)_1$ to $V(n)_m$ from the multi-pulse generation circuits $2_1$ to $2_m$, and with the numbers of delay taps $M_1$ to $M_m$ and prediction coefficients $\beta_1$ to $\beta_m$ of the long-term prediction synthesis filters $3_1$ to $3_m$. According to the comparison result from the comparison processing unit 11, the selection processing unit 12 selects and outputs the number of delay taps $M_g$ and prediction coefficient $\beta_g$ of the long-term prediction synthesis filter that yields the optimal synthesized speech signal $Q(n)_g$, together with the multi-pulse train $V(n)_g$ of the corresponding multi-pulse generation circuit.
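The comparison and selection steps just described amount to an argmin over the $m$ branches followed by a lookup of the winning branch's parameters. The sketch below illustrates this under assumptions of its own: each branch is represented as a dictionary with hypothetical keys `M`, `beta`, `pulses`, and `q` (its synthesized frame), the squared error is summed over one frame, and the returned index is zero-based whereas the text numbers branches from 1 to $m$.

```python
def squared_error(s, q):
    """Sum of squared differences between input S(n) and a synthesized Q(n)."""
    return sum((a - b) ** 2 for a, b in zip(s, q))


def compare_and_select(s, candidates):
    """Pick the branch whose synthesized speech is closest to the input.

    candidates: list of dicts with keys 'M', 'beta', 'pulses', 'q'
    (a hypothetical representation chosen for this sketch).
    Returns (g, parameters of branch g), where g is a zero-based index.
    """
    # Role of comparison processing unit 11: one squared error per branch.
    errors = [squared_error(s, c["q"]) for c in candidates]
    g = min(range(len(candidates)), key=errors.__getitem__)
    # Role of selection processing unit 12: pass on only the winning parameters.
    chosen = candidates[g]
    return g, {"M": chosen["M"], "beta": chosen["beta"], "pulses": chosen["pulses"]}
```

Only the selected delay, coefficient, and pulse train are returned, since those are the quantities the text says are forwarded for encoding. The synthesized signal itself is used only for the comparison.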

The number of delay taps $M_g$, the prediction coefficient $\beta_g$, and the multi-pulse train $V(n)_g$, together with the prediction coefficients of the short-term prediction synthesis filter supplied from a terminal 9, are sent to the encoding circuit 6, encoded, and then output from an output terminal 7 as an output signal $C(n)$.

As described above, when the device of this embodiment determines the long-term prediction synthesis filter (that is, its filter characteristic) used in multi-pulse encoding, it prepares a plurality of long-term prediction synthesis filters with mutually different filter characteristics and selects, from the synthesized speech signals obtained with them, the signal that gives the best synthesized sound. It then takes the number of delay taps $M_g$ and prediction coefficient $\beta_g$ of the long-term prediction synthesis filter corresponding to the selected synthesized speech signal $Q(n)_g$, and the output (multi-pulse train $V(n)_g$) of the corresponding multi-pulse generation circuit, and encodes and outputs this information together with the prediction coefficients of the short-term prediction synthesis filter. Speech synthesized later from this encoded output is therefore of good quality. Even at the same bit rate as the conventional example, synthesized sound based on the encoded output of this device has better quality. Moreover, because the best of the filter characteristics of the long-term prediction synthesis filters $3_1$ to $3_m$ is selected, the quality of the synthesized speech improves.

[Effect of the invention]

In the multi-pulse encoding device of the present invention, drive pulses from the drive pulse generating means are supplied to the corresponding ones of a plurality of synthesis filters that perform long-term prediction synthesis and short-term prediction synthesis. The resulting plurality of pieces of speech information is compared with the input speech information, and one pair consisting of a synthesis filter and its corresponding drive pulses is selected. The optimal synthesis filter characteristic can thereby be chosen. By encoding information related to the synthesis filter and drive pulses of the selected pair, the speech signal can be compression-encoded at a low bit rate, and synthesized sound of good quality can easily be obtained by speech synthesis even at that low bit rate.

[Brief description of the drawings]

FIG. 1 is a block circuit diagram showing the schematic configuration of a device according to an embodiment of the present invention. FIG. 2 is a block circuit diagram showing a conventional synthesis circuit. FIG. 3 is a block circuit diagram showing a configuration for determining filter characteristics. FIG. 4 is a block circuit diagram showing a configuration for searching for a multi-pulse train.

$2_1$ to $2_m$: multi-pulse generation circuits
$3_1$ to $3_m$: long-term prediction synthesis filters
$4_1$ to $4_m$: short-term prediction synthesis filters
6: encoding circuit
10: comparison and selection circuit
11: comparison processing unit
12: selection processing unit

Continued from the front page

(58) Fields searched (Int. Cl.7, DB name): G10L 11/00 - 13/08; G10L 19/00 - 21/06; INSPEC (DIALOG); JICST file (JOIS); WPI (DIALOG)

Claims

(57) [Claims]

1. A multi-pulse encoding device comprising: a plurality of synthesis filters that perform long-term prediction synthesis and short-term prediction synthesis; drive pulse generating means corresponding to the plurality of synthesis filters; comparison and selection means that compares input speech information with speech information obtained by supplying the drive pulses generated by the drive pulse generating means to the corresponding synthesis filters, and selects a pair consisting of one of the plurality of synthesis filters and its corresponding drive pulses; and encoding means that encodes information related to the synthesis filter and the drive pulses selected by the comparison and selection means.