JP2615991B2

JP2615991B2 - Linear predictive speech analysis and synthesis device

Info

Publication number: JP2615991B2
Application number: JP1077242A
Authority: JP
Inventors: 哲田口
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-03-28
Filing date: 1989-03-28
Publication date: 1997-06-04
Anticipated expiration: 2012-06-04
Also published as: JPH01315800A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a linear predictive speech analysis/synthesis apparatus.

[Conventional technology]

In a conventional linear predictive speech analysis/synthesis apparatus, when the input speech signal is voiced, the excitation used on the synthesis side is generally an impulse train whose repetition frequency equals the fundamental frequency of the input speech signal. Instead of an impulse train, a pulse train whose envelope waveform has a "shape" repeating at the fundamental frequency has also been used.

[Problems to be solved by the invention]

The conventional apparatus described above has the following drawbacks. When an impulse train is used as the sound source (excitation) signal, its energy is concentrated on the time axis at the pitch excitation points, so the output speech signal sounds unnatural. When a pulse train with a "shape" is used, the concentration of energy is avoided, but the sound source signal becomes colored (its spectrum is no longer flat), so the spectral structure of the output speech signal differs from that of the input speech signal, which again makes the output unnatural.

An object of the present invention is to provide a linear predictive speech analysis/synthesis apparatus with good sound quality that uses a sound source signal whose energy is not concentrated, and in which the spectral structure of the output speech signal matches that of the input speech signal.

[Means for solving the problem]

The linear predictive speech analysis/synthesis apparatus of the present invention has, on the synthesis side, synthesis means in which the cascade of the spectral envelope characteristic of the sound source information and the spectral envelope frequency characteristic of the speech synthesis filter matches the spectral envelope frequency characteristic of the input speech signal.

As one example of concrete means, consider a linear predictive speech analysis/synthesis apparatus in which the analysis side measures, at each predetermined time interval, sound source information (voiced/unvoiced discrimination information for the input speech signal, fundamental frequency information when the signal is voiced, and power information) together with linear prediction coefficients representing the spectral envelope or coefficients equivalent to them, and the synthesis side synthesizes an output speech signal from the sound source information and the linear prediction coefficients (or equivalent coefficients). In this apparatus the synthesis side comprises:
(1) a digital filter that receives a pulse from a pulse generator and the linear prediction coefficients (or equivalent coefficients);
(2) excitation signal generating means that, according to the voiced/unvoiced discrimination information, outputs either a pulse train whose period equals the period of the fundamental frequency or a noise signal;
(3) a transversal filter that receives the output of the digital filter and the output of the excitation signal generating means, and whose impulse response is the impulse response of the digital filter reversed in time;
(4) sound source signal generating means that controls the output of the transversal filter according to the power information and outputs a sound source signal; and
(5) a lossy synthesis filter that receives the sound source signal and whose characteristic is the filter characteristic determined by the linear prediction coefficients (or equivalent coefficients) with a predetermined loss added.
The transfer function of the digital filter is the quotient obtained by dividing the transfer function of the synthesis filter determined by the linear prediction coefficients (or equivalent coefficients) by the transfer function of the lossy synthesis filter.

[Example]

Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing one embodiment of the present invention.

The embodiment shown in FIG. 1 comprises: window cutout processors 1 and 2, which receive the input speech signal; an LPC analyzer 3, which receives the output of window cutout processor 1 and outputs K parameters k_1 to k_p and a power parameter pw; a K quantizer 4, which receives the K parameters k_1 to k_p; a power quantizer 5, which receives the power parameter pw; a pitch extractor 6, which receives the output of window cutout processor 2 and outputs a pitch parameter pt; a pitch quantizer 7, which receives the pitch parameter pt; a multiplexing circuit 8, which receives the outputs of the K quantizer 4, the power quantizer 5 and the pitch quantizer 7; a separation circuit 9, which receives the output of the multiplexing circuit 8 via a transmission line (not shown); a K decoder 10, a power decoder 11 and a pitch decoder 12, whose inputs are each connected to the separation circuit 9; a K/α converter 13, which receives the K parameters k_1 to k_p output by the K decoder 10 and outputs α parameters α_1 to α_p; a sound source signal generation unit 14, which receives the power parameter pw output by the power decoder 11, the pitch parameter pt output by the pitch decoder 12, and the α parameters α_1 to α_p; and a lossy synthesis filter 15, which receives the output of the sound source signal generation unit 14 and the α parameters α_1 to α_p and outputs the output speech signal. The part up to and including the multiplexing circuit 8 is the analysis side, and the part from the separation circuit 9 onward is the synthesis side.

Window cutout processor 1 band-limits the input speech signal with a low-pass filter having a cutoff frequency of 3.4 kHz, samples it at a sampling frequency of 8 kHz, and quantizes each sample to a predetermined number of bits. Then, every 20 ms (the analysis frame period), it cuts out 30 ms of quantized samples and multiplies them by a Hamming window function. Window cutout processor 2 processes the input speech signal in the same way as window cutout processor 1, except that it uses a rectangular window function.
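The framing step of the window cutout processors can be sketched as follows. This is a minimal Python sketch that assumes the input has already been band-limited to 3.4 kHz, sampled at 8 kHz and quantized (those steps are omitted); the names `hamming`, `rectangular` and `cut_out_frames` are illustrative, not taken from the patent. At 8 kHz, the 30 ms window is 240 samples and the 20 ms frame period is 160 samples.

```python
import math

FS = 8000          # sampling frequency in Hz (from the description)
FRAME_SHIFT = 160  # 20 ms analysis frame period at 8 kHz
FRAME_LEN = 240    # 30 ms window length at 8 kHz


def hamming(n):
    """Symmetric Hamming window of length n (window cutout processor 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def rectangular(n):
    """Rectangular window of length n (window cutout processor 2)."""
    return [1.0] * n


def cut_out_frames(samples, window, shift=FRAME_SHIFT):
    """Multiply each window-length segment, taken every `shift` samples, by the window.

    Trailing samples that do not fill a whole window are dropped.
    """
    n = len(window)
    frames = []
    for start in range(0, len(samples) - n + 1, shift):
        segment = samples[start:start + n]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames
```

Processor 1 would correspond to `cut_out_frames(x, hamming(FRAME_LEN))` and processor 2 to `cut_out_frames(x, rectangular(FRAME_LEN))`; the overlapping 30 ms windows with a 20 ms hop mean consecutive frames share 80 samples.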

The LPC analyzer 3 performs linear predictive coding (LPC) analysis on the signal from window cutout processor 1 for each analysis frame, extracts the K parameters k_1 to k_p, which are p-th order partial autocorrelation coefficients, and outputs a power parameter pw representing the square root of the power of the input speech signal in each analysis frame. The pitch extractor 6 determines, for each analysis frame, whether the signal from window cutout processor 2 is voiced or unvoiced, extracts the pitch period if it is voiced, and outputs the decision result and the pitch period together as the pitch parameter pt.
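The patent only states that p-th order K parameters and a power parameter are obtained by LPC analysis. One common way to do this, assumed here rather than stated in the source, is the autocorrelation method solved with the Levinson-Durbin recursion, which yields the reflection (PARCOR) coefficients as a by-product. The sketch also assumes pw is the square root of the mean-square value of the windowed frame. Sign conventions for PARCOR coefficients differ between references; the one used here pairs with a synthesis filter 1/(1 + Σ α_i z^-i). Pitch extraction is not sketched.

```python
import math


def autocorrelation(frame, p):
    """Short-time autocorrelation r[0..p] of one windowed frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(p + 1)]


def levinson_durbin(r, p):
    """Return (k, alpha, residual_energy) for prediction order p.

    Assumed sign convention: inverse filter A(z) = 1 + sum_i alpha_i z^-i,
    so the synthesis filter is 1 / A(z).
    """
    if r[0] <= 0.0:                      # silent frame: nothing to predict
        return [0.0] * p, [0.0] * p, 0.0
    a = []                               # alpha_1 .. alpha_m at the current order m
    k = []
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        km = -acc / err
        # step-up: alpha_i^(m) = alpha_i^(m-1) + k_m * alpha_(m-i)^(m-1)
        a = [a[i] + km * a[m - 2 - i] for i in range(m - 1)] + [km]
        k.append(km)
        err *= 1.0 - km * km
    return k, a, err


def lpc_analyze(frame, p):
    """Hypothetical LPC analyzer: K parameters, alpha parameters and pw."""
    r = autocorrelation(frame, p)
    k, alpha, _ = levinson_durbin(r, p)
    pw = math.sqrt(r[0] / len(frame))    # square root of the mean frame power
    return k, alpha, pw
```

For a stable analysis every |k_i| is below 1, which is one reason K parameters, rather than α parameters, are the ones quantized and transmitted.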

The K parameters k_1 to k_p, the power parameter pw and the pitch parameter pt obtained by the LPC analyzer 3 and the pitch extractor 6 are quantized to predetermined numbers of bits by the K quantizer 4, the power quantizer 5 and the pitch quantizer 7, respectively, multiplexed by the multiplexing circuit 8, and transmitted over the transmission line. After being separated by the separation circuit 9, they are decoded by the K decoder 10, the power decoder 11 and the pitch decoder 12, respectively, and become the K parameters k_1 to k_p, the power parameter pw and the pitch parameter pt again.
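Quantization and multiplexing are described only in terms of "a predetermined number of bits". A hypothetical uniform quantizer with midpoint reconstruction, plus MSB-first packing of one frame's indices into a single word, could look like this; the bit widths and value ranges used in the example are assumptions, not values from the patent.

```python
import math


def quantize(x, bits, lo, hi):
    """Uniform quantizer: map x in [lo, hi) to an index 0 .. 2**bits - 1 (clamped)."""
    levels = 1 << bits
    step = (hi - lo) / levels
    idx = math.floor((x - lo) / step)
    return max(0, min(levels - 1, idx))


def dequantize(idx, bits, lo, hi):
    """Decoder side: reconstruct at the midpoint of the quantization cell."""
    step = (hi - lo) / (1 << bits)
    return lo + (idx + 0.5) * step


def multiplex(fields):
    """Pack (index, bits) pairs MSB-first into one integer frame word."""
    word = 0
    for idx, bits in fields:
        word = (word << bits) | idx
    return word


def demultiplex(word, widths):
    """Inverse of multiplex: recover the indices given their bit widths."""
    out = []
    for bits in reversed(widths):
        out.append(word & ((1 << bits) - 1))
        word >>= bits
    return out[::-1]
```

With 5 bits over (-1, 1), a K parameter of 0.3 maps to index 20 and decodes to 0.28125, an error within half a quantization step (0.03125).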

The K/α converter 13 calculates, by a known method, the α parameters α_1 to α_p (the linear prediction coefficients) from the decoded K parameters k_1 to k_p, and outputs them.
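The conversion is left to "a known method". The usual choice is the step-up recursion, sketched below together with its inverse (the step-down recursion) for a round-trip check. The sketch assumes the sign convention of a synthesis filter 1/(1 + Σ α_i z^-i); the function names are illustrative.

```python
def k_to_alpha(k):
    """Step-up recursion: K parameters k_1..k_p to alpha parameters alpha_1..alpha_p.

    alpha_i^(m) = alpha_i^(m-1) + k_m * alpha_(m-i)^(m-1),  alpha_m^(m) = k_m
    """
    a = []
    for m, km in enumerate(k, start=1):
        a = [a[i] + km * a[m - 2 - i] for i in range(m - 1)] + [km]
    return a


def alpha_to_k(alpha):
    """Step-down recursion (inverse of k_to_alpha); requires every |k_m| < 1."""
    a = list(alpha)
    k = []
    for m in range(len(a), 0, -1):
        km = a[m - 1]
        k.append(km)
        a = [(a[i] - km * a[m - 2 - i]) / (1.0 - km * km) for i in range(m - 1)]
    return k[::-1]
```

For example, k = [0.5, 0.25] gives α = [0.625, 0.25], and converting back recovers the original K parameters.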

As described in detail below, the sound source signal generation unit 14 generates a sound source signal from the reconstructed α parameters α_1 to α_p, the pitch parameter pt and the power parameter pw, and the lossy synthesis filter 15 receives this sound source signal and synthesizes the output speech signal according to the α parameters α_1 to α_p.

First, the lossy synthesis filter 15 will be described.

FIG. 2 is a block diagram of the lossy synthesis filter 15.

The lossy synthesis filter 15 comprises a subtractor 31; p multipliers 32, each receiving at one input a constant γ satisfying 0 < γ < 1; p delay circuits 33, each providing a delay equal to the sampling period of window cutout processors 1 and 2; p multipliers 34, each receiving at one input an α parameter α_i (i is an integer from 1 to p); and an adder 35. The output of each multiplier 32 is connected to the input of one delay circuit 33, and the p pairs so formed, each consisting of one multiplier 32 and one delay circuit 33, are cascaded from the output of the subtractor 31. The output of the i-th delay circuit 33 from the head of the chain is also connected to the other input of the multiplier 34 that receives α parameter α_i. The adder 35 adds the outputs of all the multipliers 34. The subtractor 31 subtracts the output of the adder 35 from the incoming sound source signal, and the output of the subtractor 31 is branched off and taken out as the output speech signal.
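A sample-by-sample model of the FIG. 2 structure, written as a hypothetical Python function, makes the role of the γ multipliers concrete. Because each stage multiplies by γ before its delay, the output of the i-th delay circuit 33 is γ^i y[n-i], so the filter computes y[n] = x[n] - Σ α_i γ^i y[n-i].

```python
def lossy_synthesis_filter(excitation, alpha, gamma):
    """Sample-by-sample model of the FIG. 2 structure.

    taps[i] is the output of the (i+1)-th delay circuit 33; since every stage
    multiplies by gamma (multiplier 32) before delaying, it equals
    gamma**(i+1) * y[n-(i+1)].
    """
    p = len(alpha)
    taps = [0.0] * p
    out = []
    for x in excitation:
        feedback = sum(a * t for a, t in zip(alpha, taps))  # multipliers 34 + adder 35
        y = x - feedback                                     # subtractor 31
        out.append(y)
        # advance the chain: gamma-multiply, then delay, stage by stage
        taps = [gamma * y] + [gamma * t for t in taps[:-1]]
    return out
```

With α_1 = -0.9 and γ = 0.5, the impulse response decays as 0.45^n, whereas with γ = 1 it decays as 0.9^n, which is the damping described in the next paragraph.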

The lossy synthesis filter 15 with the constant γ set to 1, in other words the circuit obtained by removing all the multipliers 32 from it, is a well-known LPC synthesis filter. The lossy synthesis filter 15 applies to each stage of the LPC synthesis filter a loss determined by the constant γ, so its waveform response is a damped version of the waveform response of the LPC synthesis filter.

The transfer function H_1(z) of the lossy synthesis filter 15 is

$$H_1(z) = \frac{1}{1 + \sum_{i=1}^{p} \alpha_i \gamma^{i} z^{-i}}$$

The transfer function H(z) of the LPC synthesis filter normally used in a linear predictive speech analysis/synthesis apparatus is

$$H(z) = \frac{1}{1 + \sum_{i=1}^{p} \alpha_i z^{-i}}$$

FIG. 4 shows an example of the frequency transmission characteristics (spectral envelope characteristics) of H(z) and H_1(z), and FIG. 5 shows an example of their impulse responses; in both figures H_1(z) is shown for γ = 0.8. With γ = 1.0, H_1(z) = H(z); with γ = 0.0, the frequency transmission characteristic of H_1(z) is completely flat and its impulse response is a unit pulse.
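Since α_i γ^i z^{-i} = α_i (z/γ)^{-i}, H_1(z) = H(z/γ): every pole of H(z) at radius r moves to radius γr, which broadens and flattens the spectral peaks. The short sketch below evaluates both magnitude responses on a frequency grid; the example α values describe a single hypothetical resonance with pole radius 0.9 and are not taken from the patent.

```python
import cmath
import math


def lpc_response(alpha, gamma, omega):
    """H_1(e^{j omega}) = 1 / (1 + sum_i alpha_i gamma^i e^{-j omega i}).

    gamma = 1 gives the ordinary LPC synthesis filter H.
    """
    denom = 1.0 + sum(a * gamma ** i * cmath.exp(-1j * omega * i)
                      for i, a in enumerate(alpha, start=1))
    return 1.0 / denom


def magnitude_db(alpha, gamma, n_points=64):
    """Magnitude response in dB on a uniform grid over [0, pi)."""
    return [20 * math.log10(abs(lpc_response(alpha, gamma, math.pi * k / n_points)))
            for k in range(n_points)]
```

For α = [-1.6, 0.81], `magnitude_db(alpha, 0.8)` has a lower, wider peak than `magnitude_db(alpha, 1.0)`, and `magnitude_db(alpha, 0.0)` is 0 dB everywhere, matching the limiting cases stated above.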

Alternatively, a lossy synthesis filter with the same transfer function as the lossy synthesis filter 15 can be constructed by removing all the multipliers 32 from the lossy synthesis filter 15 and supplying each multiplier 34 with the value α_i γ^i instead of the α parameter α_i.
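The equivalence can be checked numerically. This illustrative sketch runs the FIG. 2 structure (a γ multiplier in front of every delay) and the alternative form (a plain LPC synthesis filter with coefficients α_i γ^i) on the same input; the function names and the test signal are assumptions.

```python
def chained_gamma_form(x, alpha, gamma):
    """FIG. 2 as drawn: a gamma multiplier 32 in front of every delay circuit 33."""
    taps = [0.0] * len(alpha)
    y = []
    for xn in x:
        yn = xn - sum(a * t for a, t in zip(alpha, taps))
        y.append(yn)
        taps = [gamma * yn] + [gamma * t for t in taps[:-1]]
    return y


def weighted_coefficient_form(x, alpha, gamma):
    """Multipliers 32 removed; multipliers 34 receive alpha_i * gamma**i instead."""
    w = [a * gamma ** i for i, a in enumerate(alpha, start=1)]
    y = []
    for n, xn in enumerate(x):
        yn = xn - sum(wi * y[n - i] for i, wi in enumerate(w, start=1) if n - i >= 0)
        y.append(yn)
    return y
```

The second form needs only p multipliers instead of 2p, because the weighted coefficients α_i γ^i can be computed once per analysis frame.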

Next, the sound source signal generation unit 14 will be described.

FIG. 3 is a block diagram of the sound source signal generation unit 14.

The sound source signal generation unit 14 comprises: a clock generator 20; a pulse generator 21; a standard digital filter 22, which receives the outputs of the clock generator 20 and the pulse generator 21 and the α parameters α_1 to α_p; a plurality of delay circuits 23 (their number is described later), cascaded from the output of the standard digital filter 22 and each receiving the output of the clock generator 20; a pulse train generator 24, which receives the pitch parameter pt; a noise generator 25; a switch 26, which, under control of the pitch parameter pt, selects and outputs the output of either the pulse train generator 24 or the noise generator 25; delay circuits 27, one fewer in number than the delay circuits 23, each providing a delay equal to the sampling period of window cutout processors 1 and 2 and cascaded from the output of the switch 26; multipliers 28, each taking as its two inputs the outputs of the delay circuits 23 and 27 that occupy the same position counted from the tail of their respective chains, plus one further multiplier 28 taking as its two inputs the output of the head delay circuit 23 and the input of the head delay circuit 27; an adder 29, which adds the outputs of all the multipliers 28; and a multiplier 30, which receives the power parameter pw and the output of the adder 29 and outputs their product as the sound source signal.

The pulse train generator 24 generates a pulse train whose repetition period equals the pitch period contained in the pitch parameter pt. The noise generator 25 outputs white noise, such as an M-sequence. According to the voiced/unvoiced decision contained in the pitch parameter pt, the switch 26 selects the output pulse train of the pulse train generator 24 for voiced speech, or the output noise of the noise generator 25 for unvoiced speech, and outputs it as the excitation signal.
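A minimal sketch of pulse train generator 24, noise generator 25 and switch 26 follows. The patent names an M-sequence only as an example of white noise; the 7-stage register, its feedback (recurrence s[n] = s[n-6] XOR s[n-7], period 127), the ±1 mapping and the unit pulse amplitude are all assumptions made for illustration.

```python
class MSequence:
    """Hypothetical 7-stage maximal-length LFSR, recurrence s[n] = s[n-6] XOR s[n-7].

    Period 127; bits are mapped to +1/-1 so the output is approximately white.
    """

    def __init__(self, seed=1):
        self.state = (seed & 0x7F) or 1   # the all-zero state must be avoided

    def next(self):
        bit = ((self.state >> 6) ^ (self.state >> 5)) & 1
        self.state = ((self.state << 1) | bit) & 0x7F
        return 1.0 if bit else -1.0


def pulse_train(num_samples, pitch_period):
    """Unit pulses every pitch_period samples (pulse train generator 24)."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def excitation(voiced, pitch_period, num_samples, noise):
    """Switch 26: pulse train for voiced frames, M-sequence noise otherwise."""
    if voiced:
        return pulse_train(num_samples, pitch_period)
    return [noise.next() for _ in range(num_samples)]
```

Over one full period the M-sequence contains 64 ones and 63 zeros, so the ±1 output has a mean of 1/127, close to zero.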

The rest of the sound source signal generation unit 14, excluding the pulse train generator 24, the noise generator 25 and the switch 26, is driven by the excitation signal output by the switch 26 and produces the sound source signal as described below.

For the transfer function H(z) of the LPC synthesis filter described above and the transfer function H_1(z) of the lossy synthesis filter 15 (both determined by the α parameters α_1 to α_p), the standard digital filter 22 is configured so that its transfer function is

$$H_2(z) = \frac{H(z)}{H_1(z)} = \frac{1 + \sum_{i=1}^{p} \alpha_i \gamma^{i} z^{-i}}{1 + \sum_{i=1}^{p} \alpha_i z^{-i}}$$

For each analysis frame, the clock generator 20 outputs as many clock pulses as the required impulse response length of the standard digital filter 22. The repetition period of these clock pulses is set sufficiently shorter than the sampling period of window cutout processors 1 and 2. The pulse generator 21 outputs one impulse per analysis frame. Each delay circuit 23 consists, for example, of D-type flip-flops clocked by the clock pulses from the clock generator 20, combined in parallel for the required number of bits; the number of delay circuits 23 equals the number of clock pulses generated by the clock generator 20.

In each analysis frame, the α parameters α_1 to α_p are input and the transfer function H_2(z) of the standard digital filter 22 is set. The pulse generator 21 then supplies an impulse, and the standard digital filter 22 operates on the clock pulses from the clock generator 20. When all the clock pulses have been output, the outputs of the delay circuits 23 hold a signal representing the impulse response of the standard digital filter 22, which is retained until the next analysis frame.
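The per-frame computation of the register contents can be sketched in Python as follows. The sketch writes H_2(z) = H(z)/H_1(z) out in terms of the α parameters, truncates the impulse response to a hypothetical length `n_taps` (the patent only says "the required impulse response length"), and models delay circuits 23 as a shift register clocked once per output sample of filter 22.

```python
def h2_impulse_response(alpha, gamma, n_taps):
    """First n_taps samples of the impulse response of
    H_2(z) = (1 + sum alpha_i gamma^i z^-i) / (1 + sum alpha_i z^-i)."""
    p = len(alpha)
    num = [1.0] + [a * gamma ** i for i, a in enumerate(alpha, start=1)]
    h = []
    for n in range(n_taps):
        acc = num[n] if n <= p else 0.0
        acc -= sum(alpha[i - 1] * h[n - i] for i in range(1, p + 1) if n - i >= 0)
        h.append(acc)
    return h


def clock_into_delay_chain(h):
    """Shift the filter output into delay circuits 23, one sample per clock pulse.

    chain[0] is the head (nearest filter 22); after len(h) clocks the head
    holds h[-1] and the tail holds h[0].
    """
    chain = [0.0] * len(h)
    for sample in h:
        chain = [sample] + chain[:-1]
    return chain
```

With γ = 1 the numerator and denominator cancel and the impulse response is a unit pulse, which is a quick sanity check on the implementation.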

The transversal filter formed by the delay circuits 27, the multipliers 28 and the adder 29 takes each of its tap coefficients from a delay circuit 23. Because of the way the delay circuits 23 are paired with the multipliers 28, its impulse response is the impulse response of the standard digital filter 22 reversed in time. The excitation signal output by the switch 26 is fed to this transversal filter, whose output is scaled by the multiplier 30 to match the power of the input speech signal and is supplied to the lossy synthesis filter 15 as the sound source signal. The multiplier 30 may instead be connected immediately after the switch 26.
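In software, delay circuits 27, multipliers 28, adder 29 and multiplier 30 reduce to an FIR filter whose taps are the register contents, followed by a gain. The sketch below is illustrative; `chain` is assumed to hold the contents of delay circuits 23, head first, so that chain[j] = h[N-1-j] where h is the impulse response of filter 22.

```python
def transversal_filter(excitation, chain, pw):
    """Delay circuits 27, multipliers 28 and adder 29, followed by multiplier 30.

    line[0] is the input of the head delay circuit 27 and line[j] the output of
    the j-th one, so tap j uses chain[j]; since chain[j] = h[N-1-j], the FIR
    impulse response is the time-reversed impulse response of filter 22.
    """
    line = [0.0] * len(chain)
    out = []
    for e in excitation:
        line = [e] + line[:-1]
        acc = sum(c * v for c, v in zip(chain, line))   # multipliers 28 + adder 29
        out.append(pw * acc)                            # multiplier 30
    return out
```

Feeding a single pulse through it returns pw times the reversed impulse response, so each pitch pulse is spread out ahead of the point where the original impulse response would have peaked.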

The spectral structure of the sound source signal output by the sound source signal generation unit 14 is the same as that of the signal obtained by exciting a digital filter with transfer function H_2(z) with the excitation signal from the switch 26. Since this sound source signal becomes the output speech signal after passing through the lossy synthesis filter 15 with transfer function H_1(z), the spectral structure of the output speech signal matches that of the signal obtained by passing the excitation signal through an LPC synthesis filter with transfer function H(z) (= H_1(z)·H_2(z)), and therefore matches the spectral structure of the input speech signal. Moreover, because the impulse response of the transversal filter that produces the sound source signal from the excitation signal is the time-reversed impulse response of the digital filter with transfer function H_2(z), the phase relationships in the process that produces the output speech signal from the excitation signal differ from those of processing by the LPC synthesis filter with transfer function H(z). As a result, even when the excitation signal is a pulse train, the energy of the output speech signal is not concentrated at the pitch excitation points.
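Both claims of this paragraph can be checked numerically under simple assumptions. Time reversal leaves the magnitude spectrum of a finite impulse response unchanged (it only conjugates the phase and adds a linear phase term), so |H_1 G| approximates |H| up to the error from truncating h_2. The sketch below verifies both for a hypothetical first-order α and a 64-tap truncation; it re-derives h_2 so that it is self-contained.

```python
import cmath


def dtft(x, omega):
    """Discrete-time Fourier transform of a finite sequence at one frequency."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(x))


def all_pole(alpha, gamma, omega):
    """1 / (1 + sum alpha_i gamma^i e^{-j omega i}); gamma = 1 gives H, otherwise H_1."""
    return 1.0 / (1.0 + sum(a * gamma ** i * cmath.exp(-1j * omega * i)
                            for i, a in enumerate(alpha, start=1)))


def h2_impulse_response(alpha, gamma, n_taps):
    """Truncated impulse response of H_2(z) = H(z) / H_1(z)."""
    num = [1.0] + [a * gamma ** i for i, a in enumerate(alpha, start=1)]
    h = []
    for n in range(n_taps):
        acc = num[n] if n < len(num) else 0.0
        acc -= sum(alpha[i - 1] * h[n - i] for i in range(1, len(alpha) + 1) if n - i >= 0)
        h.append(acc)
    return h


def cascade_magnitudes(alpha, gamma, n_taps, omegas):
    """Return (|H|, |H_1 * G|) on omegas, where G is the time-reversed truncated h_2."""
    g = h2_impulse_response(alpha, gamma, n_taps)[::-1]
    target = [abs(all_pole(alpha, 1.0, w)) for w in omegas]
    cascade = [abs(all_pole(alpha, gamma, w) * dtft(g, w)) for w in omegas]
    return target, cascade
```

Only the magnitudes agree; the phase of the cascade differs from that of H(z), which is exactly what spreads the energy away from the pitch excitation points.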

[Effect of the invention]

As explained above, the present invention comprises a lossy synthesis filter and sound source signal generating means. The sound source signal generating means produces the sound source signal from an excitation signal (a pulse train or a noise signal) by means of sound source waveform generating means whose impulse response is the time-reversed impulse response of a digital filter; this digital filter, together with the lossy synthesis filter, would synthesize from the excitation signal an output speech signal matching the spectral structure of the input speech signal. As a result, energy is not concentrated at the pitch excitation points, and the spectral structures of the input and output speech signals match, so that a linear predictive speech analysis/synthesis apparatus with good sound quality can be provided.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 is a block diagram of the lossy synthesis filter 15 in the embodiment shown in FIG. 1; FIG. 3 is a block diagram of the sound source signal generation unit 14 in the same embodiment; FIG. 4 is a waveform diagram illustrating the spectral envelope characteristics of the lossy synthesis filter 15; and FIG. 5 is a waveform diagram illustrating the impulse response characteristics of the lossy synthesis filter 15.

1, 2 ... window cutout processor
3 ... LPC analyzer
6 ... pitch extractor
13 ... K/α converter
14 ... sound source signal generation unit
15 ... lossy synthesis filter
20 ... clock generator
21 ... pulse generator
22 ... standard digital filter
23, 27 ... delay circuit
24 ... pulse train generator
25 ... noise generator
26 ... switch
28, 30 ... multiplier
29 ... adder
31 ... subtractor
32, 34 ... multiplier
33 ... delay circuit
35 ... adder

Claims

(57) [Claims]

1. A linear predictive speech analysis/synthesis apparatus that analyzes an input speech signal to extract sound source information, including voiced/unvoiced discrimination information and fundamental frequency information when the signal is voiced, and linear prediction coefficients representing the spectral envelope or coefficients equivalent thereto, transmits them to a synthesis side, and synthesizes a speech signal on the synthesis side, wherein the synthesis side comprises: sound source waveform shaping means that receives an excitation signal generated based on the sound source information and forms a sound source signal by convolving it with an impulse response determined by the linear prediction coefficients or the coefficients equivalent thereto; and a lossy synthesis filter that receives the sound source signal and has a characteristic obtained by adding a predetermined loss to the filter characteristic determined by the linear prediction coefficients or the coefficients equivalent thereto; and wherein the cascade of the spectral envelope frequency characteristic of the sound source signal and the spectral envelope frequency characteristic of the lossy synthesis filter is matched to the spectral envelope frequency characteristic of the input speech signal.

2. A linear predictive speech analysis/synthesis apparatus in which the analysis side measures, at each predetermined time interval, sound source information including voiced/unvoiced discrimination information of an input speech signal, fundamental frequency information when the signal is voiced, and power information, together with linear prediction coefficients representing the spectral envelope or coefficients equivalent thereto, and the synthesis side synthesizes an output speech signal based on the sound source information and the linear prediction coefficients or the coefficients equivalent thereto, wherein the synthesis side comprises: a digital filter that receives a pulse from a pulse generator and the linear prediction coefficients or the coefficients equivalent thereto; excitation signal generating means that, based on the voiced/unvoiced discrimination information, outputs a pulse train having a period equal to the period of the fundamental frequency or a noise signal; a transversal filter that receives the output of the digital filter and the output of the excitation signal generating means and has an impulse response obtained by reversing the impulse response of the digital filter in time; sound source signal generating means that controls the output of the transversal filter based on the power information and outputs a sound source signal; and a lossy synthesis filter that receives the sound source signal and has a characteristic obtained by adding a predetermined loss to the filter characteristic determined by the linear prediction coefficients or the coefficients equivalent thereto; wherein the transfer function of the digital filter is the quotient obtained by dividing the transfer function of the synthesis filter determined by the linear prediction coefficients or the coefficients equivalent thereto by the transfer function of the lossy synthesis filter, and the output speech signal is obtained from the lossy synthesis filter.