JPH03203798A

JPH03203798A - Voice synthesis system

Info

Publication number: JPH03203798A
Application number: JP1343112A
Authority: JP
Inventors: Takashi Aso (麻生 隆); Takeshi Fujita (藤田 武); Yasunori Ohora (大洞 恭則); Katsuhiko Kawasaki (川崎 勝彦)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1989-12-29
Filing date: 1989-12-29
Publication date: 1991-09-05

Abstract

PURPOSE: To secure continuity where VCV (vowel-consonant-vowel) speech segments are joined, and to obtain smooth synthesized speech, by normalizing the power of each speech segment against the mean power of the respective vowels. CONSTITUTION: Because the power is normalized against the mean vowel power when speech segments are concatenated, the standard power values used for normalization are determined before synthesis and stored in a power normalization data storage unit 7. The power of each VCV segment parameter supplied by a parameter reading unit 3 is then normalized by a power normalization unit 6. As a result, continuity at the VCV segment junctions is secured and smooth synthesized speech is obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a speech synthesis method based on the editing of VCV (vowel-consonant-vowel) segments.

[Prior Art] Rule-based speech synthesis is a conventional method of generating speech from character-string data. Guided by the character-string data, it retrieves the feature parameters (LPC, PARCOR, LSP, mel cepstrum, and so on; hereinafter simply "parameters") of the speech segments registered in a speech segment file. Following fixed rules, it stretches or compresses the parameters and the excitation source signal (an impulse train in voiced intervals, noise in unvoiced intervals) according to the speaking rate of the synthesized speech, concatenates them, and feeds them to a speech synthesizer to obtain synthesized speech. The speech segments commonly used are CV (consonant-vowel) segments, VCV (vowel-consonant-vowel) segments, and the like.

Concatenating segments requires interpolating the parameters. Conventionally, even where the parameters change abruptly, the segments were simply joined by a straight line across the interpolation interval, so there is a risk that the original spectral information is lost and the synthesized speech is altered. In addition, the VCV intervals cut out of human speech uttered as words or sentences were used exactly as extracted.

[Problems to Be Solved by the Invention] In the conventional technique, the VCV intervals cut out of human speech uttered as words or sentences are used as they are and merely joined by straight lines when the segments are interpolated. Because the utterance conditions differ from segment to segment, the power varies widely and gaps arise at the junctions, so the result sounds unnatural.

[Means for Solving the Problems] An object of the present invention is to provide a speech synthesis method that normalizes the power of each speech segment against the mean power of each vowel, thereby securing continuity where VCV segments are joined and yielding smooth synthesized speech.

A further object of the present invention is to provide a speech synthesis method that adjusts the mean vowel power according to the power characteristic of a word or sentence before normalizing the power of the VCV segments, thereby yielding smoother, more natural synthesized speech that reflects, for example, the accent of the word or sentence.

[Embodiment 1] FIG. 1 illustrates an embodiment of the present invention. Reference numeral 1 denotes a text input unit for entering the word or sentence to be synthesized; 2, a text analysis unit that analyzes the input text, decomposes it into a phoneme sequence, and interprets the control codes contained in the text (codes that control accent information, speaking rate, and the like); 3, a parameter reading unit that reads the required speech segment parameters from the stored data according to the phoneme sequence information from the text analysis unit 2; and 4, a VCV parameter file that stores the VCV speech segments, whose parameters also include the speech power information. Reference numeral 5 denotes a pitch generation unit that generates pitch from the control information supplied by the text analysis unit 2; 6, a power normalization unit that normalizes the power of the speech segments read by the parameter reading unit 3; 7, a power normalization data storage unit that holds the power reference values used by the power normalization unit 6; 8, a parameter connection unit that concatenates the power-normalized speech segment data; 9, a speech synthesis unit that generates a speech waveform from the concatenated parameter sequence and the pitch information; and 10, output means for outputting the speech waveform.

When speech segments are concatenated, the present invention normalizes their power against the mean vowel power. The standard power values used for normalization must therefore be determined in advance, before synthesis, and stored in the power normalization data storage unit 7.

The method is as follows. FIG. 3 shows how the mean vowel power is determined.

First, the stationary part V° is extracted from the vowel part V on the basis of its power contour, and the feature parameters {b_ij} (1 ≤ i ≤ n, 1 ≤ j ≤ k) are obtained, where k is the analysis order and n is the number of frames in V°. Next, the term that carries the power information (the first-order term in the case of mel cepstrum coefficients) is taken from {b_ij}, summed along the time axis (the i direction), and averaged to give the mean value of the power term. This is repeated for every vowel (and, if necessary, for the moraic nasal as well), and the resulting mean power-term values are stored in the power normalization data storage unit 7.
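The averaging step above can be sketched in Python as follows. This is only an illustration under stated assumptions: the stationary part of each vowel is taken as already extracted, each frame is a plain list of k feature parameters, and the position of the power term inside a frame (`power_index`) is a hypothetical parameter, as are the function names and the sample values.

```python
def mean_power_term(frames, power_index=0):
    """Average the power term over the frames of one vowel's stationary part.

    frames: list of n frames, each a list of k feature parameters b[i][j].
    power_index: position of the power term within a frame (an assumption).
    """
    if not frames:
        raise ValueError("stationary part has no frames")
    return sum(frame[power_index] for frame in frames) / len(frames)


def build_reference_table(stationary_parts):
    """Map each vowel label to the mean power term of its stationary part."""
    return {vowel: mean_power_term(frames)
            for vowel, frames in stationary_parts.items()}


# Made-up frames for two vowels; the first value in each frame is the power term.
table = build_reference_table({
    "a": [[2.0, 0.3], [4.0, 0.1]],
    "i": [[1.0, 0.2], [1.0, 0.4], [4.0, 0.0]],
})
print(table)
```

The resulting table plays the role of the contents of the power normalization data storage unit 7: one reference value per vowel, computed once before synthesis.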

The operation is described below, following the flow of data.

The text to be synthesized is entered through the text input unit 1. The text is assumed to consist of characters that represent the reading, such as romaji or kana, with control codes inserted to control accent and speaking rate. To output speech from text written in mixed kanji and kana, a language analysis unit can be placed before the text input unit 1 to convert the mixed kanji-kana text into its reading.

The text entered through the text input unit 1 is analyzed by the text analysis unit 2 and separated into information representing the reading (phoneme sequence information) and information such as accent position and speaking rate (control information). The phoneme sequence information is passed to the parameter reading unit 3, which reads the specified speech segment parameters from the VCV parameter file 4. The speech segment parameters supplied by the parameter reading unit 3 are then power-normalized by the power normalization unit 6.

FIG. 4 illustrates the method of normalizing vowel power in a VCV segment: (a) shows the power contour of VCV data retrieved from the database, (b) the power normalization function, and (c) the power contour of the VCV data after the normalization function has been applied. Because the utterance conditions differ, the VCV data retrieved from the database show widely varying power values even for the same vowel.

Consequently, as shown in (a), there are gaps at both ends of the VCV data relative to the mean vowel powers stored in the power normalization data storage unit 7. The gaps at the two ends (Δx, Δy) are therefore measured, and a straight line that cancels them is constructed and used as the normalization function. Specifically, as shown in (b), the power normalization function is the straight line that joins the end gaps Δx and Δy across the VCV data.

Applying the normalization function of (b) to the original data of (a), so that no power gap remains at either end of the VCV data, yields the normalized VCV data shown in (c). For parameters expressed as logarithmic values (mel cepstrum parameters, for example), the adjustment reduces to addition and subtraction, so normalization is the simple operation of adding the normalization function of (b) to, or subtracting it from, the original data of (a). For clarity, FIG. 4 shows the mel cepstrum case.
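As a rough illustration of the straight-line normalization described for FIG. 4, the following Python sketch works on a log-domain power contour held as a plain list. The sign convention (gap = segment power minus reference value, correction subtracted from the data) and all function names are assumptions made for this example; the text only states that the function is added to or subtracted from the original data.

```python
def linear_normalizer(dx, dy, n):
    """Straight line joining the end gaps dx (first frame) and dy (last frame)."""
    if n < 2:
        raise ValueError("need at least two frames")
    return [dx + (dy - dx) * i / (n - 1) for i in range(n)]


def normalize_vcv(power, ref_start, ref_end):
    """Remove the end gaps of a log-domain power contour by subtraction.

    power: power term of each frame of one VCV segment.
    ref_start, ref_end: stored mean powers of the leading and trailing vowels.
    """
    dx = power[0] - ref_start   # gap at the leading vowel
    dy = power[-1] - ref_end    # gap at the trailing vowel
    correction = linear_normalizer(dx, dy, len(power))
    return [p - c for p, c in zip(power, correction)]


# Made-up contour: both ends sit above the stored vowel means (4.0 and 5.0).
normalized = normalize_vcv([5.0, 3.0, 2.0, 4.0, 7.0], ref_start=4.0, ref_end=5.0)
print(normalized)
```

After normalization the first and last frames equal the stored reference values, so adjacent segments that share a vowel meet at the same power level. Note that every frame, including the consonant frames, is shifted by this single line.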

Next, the parameter connection unit 8 arranges the VCV data power-normalized by the power normalization unit 6 so that the morae are equally spaced, interpolates across the stationary parts of the vowels, and produces the parameter sequence.

The pitch generation unit 5 produces a pitch sequence according to the control information from the text analysis unit 2.

The synthesis unit 9 generates the speech waveform from this pitch sequence and the parameter sequence obtained by the parameter connection unit 8.

The synthesis unit 9 can be implemented with a digital filter or the like. The generated speech waveform is output as sound by the output means 10.

This embodiment may be controlled by a program running on a CPU (central processing unit).

[Embodiment 2] In Embodiment 1, the power normalization unit 6 uses a single straight line per VCV data interval as the normalization function. With that method the consonant part C is also affected by the normalization, and its power changes. This embodiment therefore describes a method that normalizes only the vowel parts.

As in Embodiment 1, the mean power of each vowel is determined and stored in the power normalization data storage unit 7. In addition, the VCV data used for concatenation are stored with the boundaries between V (vowel) and C (consonant) marked in advance.

FIG. 5 illustrates another embodiment of the power normalization unit 6: (a) shows the power contour of VCV data retrieved from the database, (b) the power normalization function for normalizing the power of the vowel parts, and (c) the power contour of the VCV data after the normalization function has been applied.

As in Embodiment 1, the gaps (Δx, Δy) between the two ends of the VCV data and the mean vowel powers are measured. To cancel the gap Δx in the leading V of the VCV data, the straight line joining Δx and 0 across section A of (a) is used as the normalization function for section A. Likewise, to cancel the gap Δy in the trailing V of the VCV data, the straight line joining 0 and Δy across section C of (a) is used as the power normalization function for section C. No normalization function is defined for the consonant part, section B.

To set the actual power values, the power normalization function of (b) is applied to the original data of (a), as in Embodiment 1, to obtain the normalized VCV data shown in (c). For parameters expressed as logarithmic values (mel cepstrum parameters, for example), the adjustment reduces to addition and subtraction, so normalization is the simple operation of subtracting the normalization function of (b) from the original data of (a). For clarity, FIG. 5 shows the mel cepstrum case.
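The vowel-only normalization of this embodiment can be sketched in Python as below. The sketch assumes that the marked V/C boundaries are available as two frame indices, `vc` (last frame of section A) and `cv` (first frame of section C); these names, the function names, and the sample contour are hypothetical.

```python
def vowel_only_normalizer(dx, dy, n, vc, cv):
    """Piecewise-linear correction: dx -> 0 over section A, 0 over B, 0 -> dy over C.

    vc: index of the last frame of section A (leading vowel).
    cv: index of the first frame of section C (trailing vowel).
    """
    if not 0 < vc <= cv < n - 1:
        raise ValueError("boundaries must satisfy 0 < vc <= cv < n - 1")
    correction = [0.0] * n
    for i in range(vc + 1):          # section A: falls linearly from dx to 0
        correction[i] = dx * (vc - i) / vc
    for i in range(cv, n):           # section C: moves linearly from 0 to dy
        correction[i] = dy * (i - cv) / (n - 1 - cv)
    return correction


def normalize_vowels(power, ref_start, ref_end, vc, cv):
    """Subtract the piecewise correction from a log-domain power contour."""
    dx = power[0] - ref_start
    dy = power[-1] - ref_end
    correction = vowel_only_normalizer(dx, dy, len(power), vc, cv)
    return [p - c for p, c in zip(power, correction)]


# Seven made-up frames: frames 0-2 are section A, frame 3 is B, frames 4-6 are C.
result = normalize_vowels([6.0, 5.0, 4.0, 1.0, 2.0, 1.0, 0.0],
                          ref_start=4.0, ref_end=4.0, vc=2, cv=4)
print(result)
```

Because the correction is zero throughout section B, the consonant frame keeps its original power while both ends are pulled onto the stored vowel means.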

As described above, more natural synthesized speech is obtained by finding a power normalization function that eliminates the gap between the mean vowel power and the power of the VCV data, and normalizing the VCV data with it. Two kinds of power normalization function were described in the embodiments above; functions such as the following are also possible.

[Embodiment 3] FIG. 6 shows another way of realizing the power normalization function. In FIG. 4 the power normalization function was obtained by joining (Δx, Δy) with a straight line; here it is a quadratic curve whose slope is 0 at both ends of the VCV data. The interpolation intervals that precede and follow the VCV data are not adjusted by the normalization function, so letting the slope of the function approach 0 gradually makes the normalized power vary smoothly near the boundary between the VCV data and the mean vowel power in the interpolation interval.
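One possible reading of this curve is sketched below in Python. A single parabola cannot have zero slope at both ends while joining two different values, so the sketch assumes two parabolic arcs joined at the midpoint; that construction, the helper name `ease`, and the sample values are assumptions made for illustration and are not taken from FIG. 6.

```python
def ease(u):
    """Two parabolic arcs rising from 0 to 1 on [0, 1], flat at u = 0 and u = 1."""
    if u <= 0.5:
        return 2.0 * u * u
    return 1.0 - 2.0 * (1.0 - u) ** 2


def quadratic_normalizer(dx, dy, n):
    """Correction curve from dx to dy whose slope is zero at both ends."""
    if n < 2:
        raise ValueError("need at least two frames")
    return [dx + (dy - dx) * ease(i / (n - 1)) for i in range(n)]


curve = quadratic_normalizer(0.0, 8.0, 5)
print(curve)
```

The steps between neighbouring frames are small at the two ends and largest in the middle, which is the behaviour the text attributes to the zero-slope condition. The curve is applied to the data in the same way as the straight line of Embodiment 1.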

The power is normalized in the same way as in Embodiment 1.

[Embodiment 4] FIG. 7 shows another way of realizing the power normalization function. In sections A and C, where the power normalization function is defined (see Embodiment 2), quadratic curves whose slope is 0 at each section boundary are used as the power normalization function. The interpolation intervals that precede and follow the VCV data are not adjusted by the normalization function, so letting the slope of the function approach 0 gradually makes the normalized power vary smoothly near the boundary between the VCV data and the mean vowel power in the interpolation interval. In this way the power can be made to vary smoothly near the boundaries of the VCV data without altering the consonant part.
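A minimal Python sketch of such a sectioned curve follows. It assumes the V/C boundaries are given as frame indices `vc` and `cv`, and it builds each section's curve from two parabolic arcs so that the slope is zero at both section boundaries; the construction and all names are assumptions for illustration, not the curve actually drawn in FIG. 7.

```python
def ease(u):
    """Two parabolic arcs rising from 0 to 1 on [0, 1], flat at u = 0 and u = 1."""
    if u <= 0.5:
        return 2.0 * u * u
    return 1.0 - 2.0 * (1.0 - u) ** 2


def sectioned_quadratic_normalizer(dx, dy, n, vc, cv):
    """Zero-slope curves in sections A and C; no correction in section B.

    vc: index of the last frame of section A; cv: index of the first frame of C.
    """
    if not 0 < vc <= cv < n - 1:
        raise ValueError("boundaries must satisfy 0 < vc <= cv < n - 1")
    correction = [0.0] * n
    for i in range(vc + 1):          # section A: dx eases down to 0
        correction[i] = dx * (1.0 - ease(i / vc))
    for i in range(cv, n):           # section C: 0 eases to dy
        correction[i] = dy * ease((i - cv) / (n - 1 - cv))
    return correction


curve = sectioned_quadratic_normalizer(8.0, -8.0, 11, vc=4, cv=6)
print(curve)
```

The correction stays at zero across section B, so the consonant frames are untouched, while the curve flattens out at the outer ends of the segment and at the V/C boundaries.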

The power is normalized in the same way as in Embodiment 1.

This embodiment may be controlled by a program running on a CPU (central processing unit).

[Embodiment 5] In the embodiments above, the mean power of each vowel is a fixed value per vowel regardless of where the VCV data are joined. When a word or sentence is synthesized, however, varying the vowel power with the position of the VCV segment gives more natural synthesized speech. If power is regarded as moving together with pitch, the mean power of each vowel (hereinafter the reference value of the vowel) can be varied in step with the pitch. In that case, the ratio by which the reference value is raised or lowered (hereinafter the power characteristic) is determined from the pitch pattern to be applied to the synthesized speech, and the power is adjusted by changing the reference value by that ratio. FIG. 8 shows an embodiment of this kind.

In FIG. 8, blocks 1 to 10 have the same functions as the correspondingly numbered blocks in FIG. 1.

Reference numeral 11 denotes a power reference value generation unit that changes the power reference values held in the power normalization data storage unit 7 according to the pitch pattern produced by the pitch generation unit 5.

The difference from Embodiment 1 is the added power reference value generation unit 11, which is described below with reference to FIG. 9.

In FIG. 9, (a) shows the power contour when the VCV data are arranged along the time axis according to the input phoneme sequence, together with the power reference value of each vowel in the interpolation intervals; (b) shows the power characteristic derived from the pitch pattern; (c) shows the reference values modified according to the power characteristic; and (d) shows the power values obtained by normalizing the VCV data of (a) against (c).

When a sentence or word is uttered, the power is high at the beginning and decreases gradually toward the end. The power at a given point can be judged from the number of morae (the syllable count) in the sentence or word and the position of the mora in question within it. The power also rises temporarily at the accent position within a word. A power characteristic can therefore be assumed from the mora count and accent position of the word. Suppose the power characteristic shown in (b) is assumed, and the vowel reference values in the interpolation intervals of (a) are modified according to it. With mel cepstrum coefficients the parameters are logarithmic, so the reference values can be changed by adding or subtracting the correction, as shown in (c). Using the modified reference values, the power of each set of VCV data in (a) is normalized as shown in (d). The normalization methods are those described in Embodiments 1 to 4.
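The adjustment of the reference values can be sketched in Python as follows. The shape of the power characteristic used here (a linear decline from the first mora to the last plus a fixed boost on the accented mora) and the parameters `decline` and `accent_boost` are hypothetical choices for illustration; the text only states that power falls toward the end of the utterance and rises at the accent position.

```python
def power_characteristic(n_morae, accent_pos, decline=1.0, accent_boost=0.5):
    """Log-domain offset per mora: gradual decline plus a boost at the accent.

    decline: total drop from the first to the last mora (assumed shape).
    accent_boost: temporary rise at the accented mora (assumed value).
    """
    offsets = []
    for m in range(n_morae):
        frac = m / (n_morae - 1) if n_morae > 1 else 0.0
        offset = -decline * frac
        if m == accent_pos:
            offset += accent_boost
        offsets.append(offset)
    return offsets


def adjusted_references(vowels, base_ref, offsets):
    """Add each mora's offset to the stored mean power of its vowel."""
    return [base_ref[v] + o for v, o in zip(vowels, offsets)]


# Made-up five-mora word, accent on the second mora, with two stored vowel means.
offsets = power_characteristic(5, accent_pos=1, decline=2.0, accent_boost=1.0)
refs = adjusted_references(["a", "i", "a", "i", "a"], {"a": 10.0, "i": 8.0}, offsets)
print(refs)
```

Each adjusted value then replaces the fixed vowel mean as the target at the corresponding segment boundary, and the normalization itself proceeds as in Embodiments 1 to 4.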

[Effects] By normalizing the power of each speech segment against the mean power of each vowel, the present invention provides a speech synthesis method that secures continuity where VCV segments are joined and yields smooth synthesized speech.

Furthermore, by adjusting the mean vowel power according to the power characteristic of a word or sentence before normalizing the power of the VCV segments, the present invention yields smoother, more natural synthesized speech that reflects, for example, the accent of the word or sentence.

[Brief Description of the Drawings]
FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.
FIG. 2 shows the power gap at a VCV segment junction.
FIG. 3 shows the method of determining the mean vowel power.
FIG. 4 shows the method of normalizing vowel power in a VCV segment.
FIG. 5 shows a second method of normalizing vowel power in a VCV segment.
FIG. 6 shows another embodiment (No. 1) of the power normalization unit 6.
FIG. 7 shows another embodiment (No. 2) of the power normalization unit 6.
FIG. 8 is a block diagram showing the configuration of the fifth embodiment of the present invention.
FIG. 9 shows the method of varying the vowel power reference values.

1 ... text input unit, 2 ... text analysis unit, 3 ... parameter reading unit, 4 ... VCV parameter file, 5 ... pitch generation unit, 6 ... power normalization unit, 7 ... power normalization data storage unit, 8 ... parameter connection unit, 9 ... synthesis unit, 10 ... speaker, 11 ... power reference value generation unit.

[Figure text residue: panels for a VCV segment showing (a) the power contour before normalization, (b) the power normalization function, and (c) the power contour after normalization.]

Claims

(1) A speech synthesis method in which feature parameters and excitation sources registered in a VCV (vowel-consonant-vowel) speech segment file are read out according to the phoneme sequence of the speech to be synthesized, and the read parameters and excitation source information are concatenated in sequence according to fixed rules and supplied to a speech synthesizer to output speech, characterized in that the mean power of each vowel is stored in advance and each VCV segment is normalized so that the power at both of its ends matches the mean power of the respective vowel.

(2) The speech synthesis method according to claim (1), characterized in that the power of the VCV segment is normalized over the entire VCV segment.

(3) The speech synthesis method according to claim (1), characterized in that the power of the VCV segment is normalized only for vowels.

(4) The speech synthesis method according to claim (1), characterized in that the power of the VCV segment is normalized after the mean power of each vowel has been adjusted according to the power characteristic of the word or sentence.