JPS60175100A

JPS60175100A - Voice synthesizer

Info

Publication number: JPS60175100A
Application number: JP59030744A
Authority: JP
Inventors: 鈴木　龍司
Original assignee: Sanyo Electric Co Ltd; Sanyo Denki Co Ltd
Current assignee: Sanyo Electric Co Ltd; Sanyo Denki Co Ltd
Priority date: 1984-02-20
Filing date: 1984-02-20
Publication date: 1985-09-09
Anticipated expiration: 2011-10-23
Also published as: JP2547532B2

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Field of industrial application
The present invention relates to a speech synthesis device that synthesizes speech from speech parameters representing the features of the speech.

(b) Prior art
Linear predictive coding methods, typified by the PARCOR method, are currently the mainstream approach in speech synthesis devices. In the PARCOR method, as described for example in the article "Comparing the various speech synthesis methods that have become familiar" in the February 4, 1980 issue of Nikkei Electronics, the PARCOR coefficients, amplitude parameter, and pitch parameter extracted from the speech signal are stored in a parameter memory in units of a frame period of about 10 msec. These parameters are fed to the speech synthesis circuit and updated once per frame period to resynthesize the speech signal. The frame period is set to 10 msec because speech can be regarded as stationary over that interval. In the speech synthesis circuit, however, the frame period is divided into four so that natural changes in the speech are reproduced faithfully: taking 2.5 msec as the unit time, the time series of the parameters is interpolated at this unit time interval, and the speech is resynthesized from the interpolated parameters.
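The fixed-frame interpolation described above can be sketched as follows. This is a minimal illustration, not the circuit of the cited article: it assumes linear interpolation and a single scalar parameter per frame, and the function name `interpolate_frames` is hypothetical.

```python
def interpolate_frames(frames, steps=4):
    """Linearly interpolate between consecutive frame values.

    frames: one parameter value per 10 msec frame.
    steps:  sub-intervals per frame (4 gives the 2.5 msec unit time).
    Linear interpolation is an assumption made for this sketch.
    """
    out = []
    for prev, nxt in zip(frames, frames[1:]):
        for k in range(steps):
            # k = 0 reproduces the stored frame value itself
            out.append(prev + (nxt - prev) * k / steps)
    out.append(frames[-1])
    return out


print(interpolate_frames([0.0, 4.0, 2.0]))
# [0.0, 1.0, 2.0, 3.0, 4.0, 3.5, 3.0, 2.5, 2.0]
```

Every frame is split the same way here, whether or not the parameter actually changes linearly within it.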

However, even for conventional speech synthesis devices of this kind, there is a demand to reduce the amount of data in the time series of speech parameters that make up the speech data, without degrading sound quality, so that the cost of the parameter memory can be lowered. Put another way, there is a demand to improve the quality of the synthesized speech without increasing the amount of speech data.

(c) Object of the invention
The present invention has been made in view of the above points, and provides a speech synthesis device that compresses the speech data while allowing the quality of the synthesized speech to be improved.

(d) Structure of the invention
In the speech synthesis device of the present invention, the parameter memory stores a time series of speech parameters, each having an update time that is an integer multiple of a specific unit time, together with an update time code indicating the update time of each speech parameter, the two being stored in association with each other. The speech synthesis circuit updates the time series of speech parameters obtained from the parameter memory at the update time indicated by the update time code of each speech parameter, and interpolates each speech parameter at the specific unit time interval.

(e) Embodiment
The figure shows the configuration of the speech synthesis device of the present invention. In the figure, (1) is a parameter ROM. One frame of speech data consists of a parameter group of about 50 bits in total, made up of the PARCOR coefficients K1 to K10, a pitch parameter P, and an amplitude parameter A that represent the features of the speech, together with an update time code n of about 1 to 6 bits. The frames are stored in time series. The parameters of a frame do not correspond to a fixed duration of the speech signal. Instead, they correspond to an integer multiple of a basic unit time of, for example, 2.5 msec, and this multiple is indicated by the update time code n. In this example the code n takes the values 0, 1, and 2.

(2) is a readout control circuit that controls the reading of speech data from the parameter ROM (1) one frame at a time. (3) is a buffer memory that temporarily holds the one frame of speech data read from the parameter ROM (1). The time until the readout control circuit (2) reads the next frame, that is, the update time T, is determined by the update time code n held in the buffer memory (3): for example, T = 2.5 msec when n = 0, T = 5 msec when n = 1, and T = 10 msec when n = 2.

(4) is a conversion ROM that applies a nonlinear expansion to each of the parameters K1 to K10, P, and A held in the buffer memory (3), converting them back to their true values. The parameters K1 to K10, P, and A in the parameter ROM (1) are stored in nonlinearly compressed form so that the data can be reduced efficiently.

(51) and (52) are interpolation circuits, one performing 1/2 interpolation and the other 1/4 interpolation, each interpolating the parameter values from the conversion ROM (4) on the basis of the previous sample. The outputs of circuits (51) and (52) and the direct output of the conversion ROM (4) are connected to a multiplexer (6), which selects one of them for output. The multiplexer (6) operates according to the update time code n in the buffer memory (3). When n = 0, it outputs the non-interpolated parameter value once, for 2.5 msec. When n = 1, it outputs the 1/2-interpolated parameter values once every 2.5 msec, twice in total over 5 msec. When n = 2, it outputs the 1/4-interpolated parameter values once every 2.5 msec, four times in total over 10 msec.

(7) is a digital filter that synthesizes the speech signal from the parameter values, which always arrive from the multiplexer (6) once per 2.5 msec unit time. (8) is a D/A conversion circuit that converts the speech signal from the digital filter (7) into analog form and drives the speaker (9).
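The playback path from the buffer memory to the multiplexer output can be sketched in software as below. This is an illustrative model under stated assumptions, not the patented circuit: it handles one scalar parameter, skips the nonlinear expansion of the conversion ROM, generalizes the three listed update times to T = 2.5 × 2^n msec, and assumes that each frame value is the target reached at the end of its update time, starting from the previous frame's value. The names `update_time_ms` and `expand_frames` are hypothetical.

```python
UNIT_MS = 2.5  # basic unit time given in the description


def update_time_ms(n):
    """Update time T for code n: 2.5, 5, 10 msec for n = 0, 1, 2."""
    return UNIT_MS * (2 ** n)


def expand_frames(frames, initial=0.0):
    """Produce one parameter value per 2.5 msec unit time.

    frames:  list of (value, n) pairs, value already expanded to its true value.
    initial: assumed parameter value before the first frame.
    """
    out = []
    prev = initial
    for value, n in frames:
        steps = 2 ** n  # 1, 2 or 4 outputs for this frame
        for k in range(1, steps + 1):
            # n = 0: value passed through; n = 1: 1/2 steps; n = 2: 1/4 steps
            out.append(prev + (value - prev) * k / steps)
        prev = value
    return out


print(expand_frames([(1.0, 0), (3.0, 1), (7.0, 2)]))
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

The three frames in the usage line cover 2.5 + 5 + 10 = 17.5 msec, which matches the seven output values at one per unit time.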

In such a speech synthesis device, the parameter ROM (1) stores the parameter groups obtained by PARCOR analysis of the speech signal. During this analysis, the frame time of a parameter group is shortened where the parameters change nonlinearly over time, and lengthened where they change linearly.

Specifically, suppose that PARCOR analysis yields the parameter group once per 2.5 msec unit time, and let the sample sequence X(t) stand for any one of the parameters K1 to K10, P, and A. First, 1/4 interpolation is performed between X(t) and X(t+4) to obtain interpolated values for X(t+1), X(t+2), and X(t+3). If the errors δ(t+1), δ(t+2), and δ(t+3) between these interpolated values and the actual analysis values X(t+1), X(t+2), and X(t+3) are all within a specified tolerance, the sample sequence is judged to change linearly over this interval, 1/4 interpolation is adopted, and n = 2 is assigned as the update time code.

If even one of the errors δ(t+1), δ(t+2), δ(t+3) falls outside the tolerance, the sample sequence is judged to change nonlinearly over this interval and 1/4 interpolation is not adopted. In that case, 1/2 interpolation is performed between X(t) and X(t+2) to obtain an interpolated value for X(t+1). If the error δ'(t+1) between this value and the actual analysis value X(t+1) is within the specified tolerance, 1/2 interpolation is adopted and n = 1 is assigned as the update time code. If the error δ'(t+1) is outside the tolerance, 1/2 interpolation is not adopted either, and n = 0 is assigned, meaning no interpolation.
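The selection of the update time code during analysis can be sketched as follows. This is a minimal sketch under stated assumptions: it works on a single scalar parameter sequence, takes the error as an absolute difference, and falls back to a shorter update time near the end of the sequence. The function names and the tolerance `tol` are hypothetical.

```python
def choose_update_code(x, t, tol):
    """Return the update time code n (0, 1 or 2) for the interval starting at t.

    x:   analysis values, one per 2.5 msec unit time.
    tol: allowed absolute interpolation error (an assumed error measure).
    """
    # Try 1/4 interpolation between x[t] and x[t+4].
    if t + 4 < len(x):
        errors = [
            abs(x[t] + (x[t + 4] - x[t]) * k / 4 - x[t + k]) for k in (1, 2, 3)
        ]
        if all(e <= tol for e in errors):
            return 2
    # Otherwise try 1/2 interpolation between x[t] and x[t+2].
    if t + 2 < len(x):
        if abs((x[t] + x[t + 2]) / 2 - x[t + 1]) <= tol:
            return 1
    # Neither fits: store this sample with no interpolation.
    return 0


def encode(x, tol):
    """Walk the sequence and return (start_index, n) for each stored frame."""
    frames, t = [], 0
    while t < len(x):
        n = choose_update_code(x, t, tol)
        frames.append((t, n))
        t += 2 ** n  # advance by 1, 2 or 4 unit times
    return frames


samples = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 3.0, 9.0]
print(encode(samples, tol=0.1))
# [(0, 2), (4, 1), (6, 0), (7, 0), (8, 0)]
```

In a full analyzer the test would presumably have to hold for every parameter in the group before a longer update time is assigned, since one code n is stored per frame.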

In this way, the update time code n of each parameter group is set according to how the samples X(t) of its parameters change, so the speech synthesis device in the figure can carry out the interpolation that corresponds to this update time. A further interpolation step may be added to the scheme described above, in which case it is clear that the speech data can be compressed still further.

(f) Effects of the invention
As is clear from the above description, the speech synthesis device of the present invention varies the parameter update time according to the rate of change of the speech signal and sets the number of interpolations to match. This allows substantial compression of the speech data while preventing degradation of the synthesized speech. Conversely, for the same amount of data as a conventional device, the quality of the synthesized speech can be improved.
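The effect of the variable update time on the data rate can be estimated with a short calculation. The sketch below is illustrative only: the 50-bit parameter group comes from the description, while the 2-bit update time code, the sample code sequence, and the function name `average_bit_rate` are assumptions.

```python
def average_bit_rate(codes, param_bits=50, code_bits=2, unit_ms=2.5):
    """Average data rate in bits per second for a sequence of update time codes.

    codes:     update time code n of each stored frame.
    code_bits: assumed width of the code field (the description says about 1 to 6 bits).
    """
    total_bits = len(codes) * (param_bits + code_bits)
    total_ms = sum(unit_ms * 2 ** n for n in codes)
    return total_bits * 1000 / total_ms


# Six frames covering 40 msec of speech
print(average_bit_rate([2, 2, 1, 0, 0, 2]))       # 7800.0
# Storing every 2.5 msec sample of the same 40 msec without a code field
print(average_bit_rate([0] * 16, code_bits=0))    # 20000.0
```

The rate falls as more of the signal qualifies for the longer update times, and rises only where the parameters change nonlinearly.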

[Brief explanation of the drawing]

The figure is a block diagram showing the configuration of the speech synthesis device of the present invention, in which (1) is the parameter ROM, (2) is the readout control circuit, (51) and (52) are the interpolation circuits, (7) is the digital filter, and (9) is the speaker.

Applicant: Sanyo Electric Co., Ltd. Agent: Patent Attorney Shizuo Sano

Claims

[Claims]

1) A speech synthesis device comprising a parameter memory that stores, in time series, speech parameters extracted from a speech signal, and a speech synthesis circuit that synthesizes speech from the time series of speech parameters read out sequentially from the parameter memory, characterized in that the parameter memory stores, in association with each other, a time series of speech parameters whose update time is an integer multiple of a specific unit time and an update time code indicating the update time of each speech parameter, and in that the speech synthesis circuit updates the time series of speech parameters obtained from the parameter memory at each update time indicated by the update time code corresponding to each speech parameter, and interpolates each speech parameter at the specific unit time interval.