JP3109778B2

JP3109778B2 - Voice rule synthesizer

Info

Publication number: JP3109778B2
Application number: JP05106683A
Authority: JP
Inventors: 治木村; 延佳海木
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1993-05-07
Filing date: 1993-05-07
Publication date: 2000-11-20
Anticipated expiration: 2015-11-20
Also published as: JPH06318094A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech rule synthesis apparatus that generates synthesized speech.

[0002]

[Prior Art] Conventional speech synthesizers that synthesize speech by rule define synthesis units that take into account correspondence with phonemes and coarticulation, such as phonemes, syllables, VCV (vowel-consonant-vowel) sequences, CVC (consonant-vowel-consonant) sequences, and words. They store speech synthesis parameter values created by analyzing natural speech, and synthesize speech by editing, concatenating, and modifying the speech synthesis parameters of the units corresponding to the input character string (hereinafter called synthesis segments).

[0003]

[Problems to Be Solved by the Invention] However, in the conventional speech synthesizers described above, a sound collected by uttering each unit in isolation differs considerably from the same phoneme or syllable as it appears within a sentence, so the synthesized speech lacks naturalness.

[0004] For example, when sentence speech is synthesized from analyzed natural speech that was uttered as isolated monosyllables, the synthesized speech gives the impression that each sound is being pronounced distinctly, one at a time. This tendency becomes stronger as the speaking rate of the synthesized speech increases.

[0005] Alternatively, a large amount of natural speech uttered in units longer than the synthesis unit, such as sentences and words, can be held in advance, and the optimal segments selected from it for use as synthesis segments. Naturalness then improves because coarticulation is already represented in the segments, but rules for selecting the optimal segments have not yet been found.

[0006] In particular, selecting segments while taking into account the spectral distortion at the junctions between synthesis segments, in order to reduce the distortion caused by concatenation, requires calculating the spectral distance between segments. Because of the large number of segment combinations, this requires an enormous amount of computation.

[0007] An object of the present invention is to provide a speech rule synthesis apparatus that can output synthesized speech of high intelligibility and naturalness by selecting, from a large amount of speech data and with a relatively small amount of computation, segments that have little spectral distortion at the segment junctions.

[0008]

[Means for Solving the Problems] The object of the present invention is achieved by a speech rule synthesis apparatus comprising: storage means for storing speech synthesis parameters obtained by analyzing natural speech and labeled for each phoneme; setting means for performing division into synthesis units appropriate for assembling output speech, and for setting prosodic information and the preceding and following phoneme sequences divided into synthesis units; calculation means for calculating, based on the preceding and following phoneme sequences and the prosodic information set by the setting means, a target spectrum, which is the spectrum at the center of the connecting phoneme at a junction; selection means for selecting, from the speech synthesis parameters stored in the storage means, the synthesis segment whose spectral distance from the target spectrum calculated by the calculation means is smallest; and connection means for connecting the synthesis segments selected by the selection means.

[0009]

[Operation] In the speech rule synthesis apparatus of the present invention, the storage means stores speech synthesis parameters obtained by analyzing natural speech and labeled for each phoneme. The setting means performs division into synthesis units appropriate for assembling output speech, and sets the prosodic information and the preceding and following phoneme sequences divided into synthesis units. The calculation means calculates, based on the preceding and following phoneme sequences and the prosodic information set by the setting means, the target spectrum, which is the spectrum at the center of the connecting phoneme at a junction. The selection means selects, from the speech synthesis parameters stored in the storage means, the synthesis segment whose spectral distance from the target spectrum calculated by the calculation means is smallest. The connection means connects the synthesis segments selected by the selection means.

[0010]

[Embodiment] An embodiment of the speech rule synthesis apparatus of the present invention is described below with reference to the drawings.

[0011] FIG. 1 is a block diagram showing the configuration of an embodiment of the speech rule synthesis apparatus of the present invention.

[0012] The speech rule synthesis apparatus of FIG. 1 is made up of the following units.

A text analysis unit 12 is connected to a text input terminal 11. It performs morphological analysis, kanji-to-kana conversion, accent processing, and the like on the text to be converted that is input from the text input terminal, and outputs the result.

A prosody information generation unit 13 is connected to the text analysis unit 12. Based on the analysis information output from the text analysis unit 12, it generates and outputs a pitch pattern, a duration pattern for each phoneme, and an amplitude pattern.

A synthesis unit setting unit 14, which is the setting means, is connected to the text analysis unit 12. Based on the analysis information output from the text analysis unit 12, it performs division into synthesis units for assembling the output speech, and outputs the result.

A target spectrum calculation unit 15, which is the calculation means, is connected to the prosody information generation unit 13 and the synthesis unit setting unit 14. It calculates and outputs an appropriate target spectrum based on the preceding and following phoneme sequences divided into synthesis units by the synthesis unit setting unit 14 and on the information from the prosody information generation unit 13.

A speech parameter file 16, which is the storage means, is connected to the target spectrum calculation unit 15. It holds the acoustic parameters needed for synthesis, which are analyzed and created from a large amount of speech data, and outputs them.

A synthesis segment selection unit 17, which is the selection means, is connected to the target spectrum calculation unit 15 and the speech parameter file 16. It selects from the speech parameter file 16 the synthesis segment whose spectrum at the junction is closest to the target spectrum, and outputs it.

A synthesis segment connection unit 18, which is the connection means, is connected to the synthesis segment selection unit 17. It joins the selected segments together and outputs the result.

A synthesized speech generation unit 19 is connected to the prosody information generation unit 13 and the synthesis segment connection unit 18. It generates synthesized speech based on the synthesis segment sequence obtained by the synthesis segment connection unit 18 and the prosodic information obtained by the prosody information generation unit 13, and outputs it to an output terminal 20.

[0013] Next, the operation of the speech rule synthesis apparatus of FIG. 1 is described.

[0014] When text to be converted into speech is input from the text input terminal 11, the text analysis unit 12 performs syntactic analysis such as dependency analysis, morphological analysis such as part-of-speech analysis, kanji-to-kana conversion, and accent processing, and sends the necessary analysis information to the synthesis unit setting unit 14 and the prosody information generation unit 13. The analysis information sent to the synthesis unit setting unit 14 is a symbol string that identifies the phonemes; the information sent to the prosody information generation unit 13 includes the number of morae in a breath group, the accent type, and the speaking rate.

[0015] Based on this information, the prosody information generation unit 13 generates a pitch pattern, a duration pattern for each phoneme, and an amplitude pattern by rule.

[0016] The synthesis unit setting unit 14 divides the input phoneme symbol string into synthesis units appropriate for assembling the output speech, such as syllables or VCV phoneme sequences, and outputs the divided phoneme sequences to the target spectrum calculation unit 15.

[0017] The target spectrum calculation unit 15 calculates the optimal target spectrum based on the preceding and following phoneme sequences divided into synthesis units and on the information from the prosody information generation unit 13.

[0018] The speech parameter file 16 is created in advance by offline processing from a large amount of speech data. For example, several hours of speech data consisting of words, sentences, and the like spoken by a single announcer are phonemically labeled by visual inspection of digital sonagrams, and the acoustic parameters needed for synthesis are analyzed in advance.

[0019] The synthesis segment selection unit 17 selects from the speech parameter file 16 the synthesis segment whose spectrum at the junction is closest to the target spectrum.

[0020] The synthesis segment connection unit 18 joins the selected segments together and sends the result to the synthesized speech generation unit 19.

[0021] The synthesized speech generation unit 19 generates synthesized speech based on the synthesis segment sequence obtained by the synthesis segment connection unit 18 and the prosodic information obtained by the prosody information generation unit 13, and outputs the generated speech to the output terminal 20.

[0022] The configuration described above includes the text analysis unit 12, but if text analysis is performed in advance and the resulting analysis information is input to the apparatus, the text analysis unit 12 can be omitted.

[0023] Similarly, if the prosodic patterns are generated in advance and input to the apparatus, the prosody information generation unit 13 can be omitted.

[0024] The acoustic parameters used here and the synthesizer used to generate the synthesized speech are not restricted to any particular type; the invention is applicable to all of them.

[0025] Next, the operation of the target spectrum calculation unit 15 is described in detail with reference to the flowchart of FIG. 2.

[0026] FIG. 2 shows an example of calculating the target for /N/ when synthesizing /oNsei/.

[0027] First, the preceding and following synthesis units and the prosodic information are input (step S1), a phoneme sequence centered on the phoneme at the junction is set (step S2), and speech parameters containing that phoneme sequence are searched for in the speech parameter file 16 (step S3).

[0028] If no candidate is found, the search is repeated while phonemes are successively removed from both ends of the search phoneme sequence. For example, if there is no speech parameter containing /oNse/, the search proceeds through /oNs/ → /Ns/ → /N/.
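The back-off order in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the function names, the list-of-phonemes corpus format, and the rule "trim the right end first, then alternate" are hypothetical choices that happen to reproduce the /oNse/ → /oNs/ → /Ns/ → /N/ order given in the text; the patent does not specify a data structure.

```python
def backoff_sequences(phonemes, center):
    """Yield progressively shorter search sequences around the connecting phoneme.

    phonemes: list of phoneme symbols, e.g. ["o", "N", "s", "e"]
    center:   index of the connecting phoneme, which is never removed
    Assumption: trimming alternates between the right and left ends,
    starting from the right.
    """
    lo, hi = 0, len(phonemes)
    trim_right = True
    yield phonemes[lo:hi]
    while hi - lo > 1:
        can_right = hi - 1 > center   # something is left to the right of the center
        can_left = lo < center        # something is left to the left of the center
        if (trim_right and can_right) or not can_left:
            hi -= 1
        else:
            lo += 1
        trim_right = not trim_right
        yield phonemes[lo:hi]


def contains(utterance, seq):
    """True if seq occurs as a contiguous run inside utterance."""
    n = len(seq)
    return any(utterance[i:i + n] == seq for i in range(len(utterance) - n + 1))


def find_candidates(corpus, phonemes, center):
    """Return the first (longest) sequence that has matches, with its matches."""
    for seq in backoff_sequences(phonemes, center):
        hits = [utt for utt in corpus if contains(utt, seq)]
        if hits:
            return seq, hits
    return None, []
```

With a toy corpus containing only "kaNsa" and "oNsui", a search for /oNse/ centered on /N/ falls back once and matches /oNs/ in "oNsui".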

[0029] Next, a pitch condition for the junction is set from the prosodic information (step S4), and the candidates are narrowed down (step S5). The pitch condition is, for example, within ±5% of the pitch. If no candidate satisfies it, the pitch condition is widened to ±10%, ±15%, and so on.
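The widening pitch condition can be illustrated with a short sketch. The candidate format (a dict with a `"pitch"` value in Hz), the function name, and the 100% upper limit are assumptions made for the example; only the ±5%, ±10%, ±15% progression comes from the text.

```python
def narrow_by_pitch(candidates, target_pitch, step_pct=5, max_pct=100):
    """Keep candidates whose pitch is within a tolerance of target_pitch.

    The tolerance starts at step_pct percent and is widened by step_pct
    until at least one candidate qualifies (or max_pct is exceeded).
    Returns (kept_candidates, tolerance_percent_used).
    """
    for pct in range(step_pct, max_pct + 1, step_pct):
        allowed = target_pitch * pct / 100.0
        kept = [c for c in candidates if abs(c["pitch"] - target_pitch) <= allowed]
        if kept:
            return kept, pct
    return [], None
```

For a target of 120 Hz and candidates at 100, 112, and 130 Hz, nothing falls within ±5% (114 to 126 Hz), so the tolerance widens to ±10% and the 112 Hz and 130 Hz candidates are kept.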

[0030] Next, the candidate closest to the duration of the phoneme at the junction is selected from the candidates (step S6), the spectrum at the center of the connecting phoneme is calculated from the selected speech parameter (step S7), and that spectrum is taken as the target spectrum (step S8).

[0031] The target spectrum for the junction is calculated by the above processing.
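Steps S6 to S8 can be sketched as below. The candidate layout is hypothetical: each candidate is assumed to carry the duration of its connecting phoneme and the list of spectral frames covering that phoneme, and "the spectrum at the center" is taken to be the middle frame of that list.

```python
def target_spectrum(candidates, target_duration):
    """Pick the candidate closest in duration and return its center frame.

    candidates: list of dicts with
        "duration": duration of the connecting phoneme (e.g. in ms)
        "frames":   spectral frames covering the connecting phoneme
    """
    # Step S6: candidate whose connecting-phoneme duration is closest to the target
    best = min(candidates, key=lambda c: abs(c["duration"] - target_duration))
    # Steps S7 and S8: the frame at the temporal center becomes the target spectrum
    frames = best["frames"]
    return frames[len(frames) // 2]
```

The sketch assumes pitch narrowing has already been applied to `candidates`, so that only the duration criterion remains.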

[0032] Next, the synthesis processing performed by the speech rule synthesis apparatus of FIG. 1 is described with a concrete example.

[0033] For example, when the word 音声 (onsei, "speech") is input to the text input terminal 11, the text analysis unit 12 generates the phoneme sequence /oNsei/ and prosodic information. If the synthesis unit is VCV, the synthesis unit setting unit 14 divides the sequence into five synthesis units: /So/, /oN/, /Nse/, /ei/, and /iS/, where /S/ denotes silence. A segment is then selected for each synthesis unit; the case of /Nse/ is shown below as an example.
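The division of /oNsei/ into VCV units can be reproduced with a small sketch. It assumes one character per phoneme symbol, pads the sequence with the silence symbol /S/ at both ends, and treats the moraic nasal /N/ and silence as vowel-like unit boundaries; the function name and the boundary set are illustrative, not taken from the patent.

```python
# Symbols that can begin or end a unit: the five vowels, moraic nasal N, silence S
VOWEL_LIKE = {"a", "i", "u", "e", "o", "N", "S"}


def split_vcv(phonemes):
    """Split a phoneme sequence into overlapping vowel-to-vowel units."""
    padded = ["S"] + list(phonemes) + ["S"]
    units = []
    start = 0
    for i in range(1, len(padded)):
        if padded[i] in VOWEL_LIKE:
            # A unit runs from the previous boundary through this one, inclusive
            units.append("".join(padded[start:i + 1]))
            start = i
    return units
```

Adjacent units share their boundary phoneme, which is why each junction has a single connecting phoneme whose spectrum can serve as a target.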

[0034] First, the target spectrum calculation unit 15 calculates the target spectrum for the junction between /oN/ and /Nse/. In this case, speech parameters for the phoneme sequence /oNse/ are searched for in the speech parameter file 16, and the spectrum at the temporal center of /N/ in the speech parameter selected by narrowing down with the prosodic information is taken as target spectrum SP1.

[0035] Target spectrum SP2 for the junction between /Nse/ and /ei/ is calculated in the same way.

[0036] Next, the synthesis segment selection unit 17 searches the speech parameter file 16 for speech parameters having the phoneme sequence /Nse/. Then, for each retrieved candidate, the minimum spectral distance to target spectrum SP1 within the /N/ portion and the minimum spectral distance to target spectrum SP2 within the /e/ portion are calculated, and the candidate with the smallest sum of these minima is selected as the synthesis segment. After a segment has been selected for each synthesis unit in this way, the synthesis segments are connected at the positions where the distance from the target spectrum is smallest, and the synthesized waveform is generated.
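The selection rule for /Nse/ can be sketched as follows. The patent does not name a spectral distance measure, so Euclidean distance between frame vectors is an assumption here, as are the candidate fields `"head_frames"` (frames of the /N/ portion) and `"tail_frames"` (frames of the /e/ portion).

```python
import math


def min_distance(frames, target):
    """Smallest distance between any frame of a phoneme portion and the target.

    Assumption: Euclidean distance stands in for the unspecified spectral distance.
    """
    return min(math.dist(frame, target) for frame in frames)


def select_segment(candidates, sp1, sp2):
    """Choose the candidate with the smallest sum of the two minimum distances."""
    def cost(c):
        return min_distance(c["head_frames"], sp1) + min_distance(c["tail_frames"], sp2)
    return min(candidates, key=cost)
```

Each candidate is scored only against the two fixed targets SP1 and SP2, so candidates for different synthesis units never have to be compared with one another. The frame that attains each minimum also marks the position at which the segment would be joined to its neighbor.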

[0037] By setting the optimal target spectrum at each junction in this way and connecting synthesis segments whose spectra are closest to it, synthesized speech with little concatenation distortion is obtained.

[0038] Conventional methods that achieve low concatenation distortion without setting a target spectrum require an enormous amount of computation because of the large number of combinations of synthesis segments to be connected. The present apparatus, by contrast, can greatly reduce the amount of computation.
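As a rough illustration of the saving (the symbols $M$ and $K$ and the counting itself are assumptions introduced here, not figures from the patent), suppose an utterance is divided into $M$ synthesis units and each unit has $K$ candidate segments. Comparing candidates directly at every junction means one distance per pair of adjacent candidates, whereas with fixed targets each candidate is compared with at most two target spectra:

```latex
\[
  N_{\text{pairwise}} = (M-1)\,K^{2},
  \qquad
  N_{\text{target}} \le 2\,M\,K .
\]
% Example: M = 5 units (as for /oNsei/) and K = 100 candidates per unit
\[
  N_{\text{pairwise}} = 4 \times 100^{2} = 40\,000,
  \qquad
  N_{\text{target}} \le 2 \times 5 \times 100 = 1\,000 .
\]
```

Under these assumptions the number of comparisons grows linearly rather than quadratically in the number of candidates per unit.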

[0039] As a further way to reduce computation and memory, target spectra can be calculated in advance for each phoneme sequence and prosodic condition, and the synthesis segment best suited to each target spectrum registered in a table.

[0040] For example, in VCV-unit synthesis, if the moraic nasal /N/ is also treated as a vowel, there are six kinds of junction: /a/, /i/, /u/, /e/, /o/, and /N/.

[0041] Considering a minimal hardware configuration, the steady-state portions of isolated vowels uttered at a normal pitch are analyzed in advance, and each resulting spectrum is used as a target spectrum.

[0042] Next, using the same algorithm as the synthesis segment selection unit 17, the synthesis segment best suited to the target spectra is selected and registered in a table. At synthesis time, synthesis segments are selected by referring to this table. In this case, a table can be built in which one synthesis segment corresponds to each VCV.
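A minimal sketch of the table-based variant follows, assuming one fixed target spectrum per junction phoneme and single-character phoneme symbols. The names, the candidate fields, the Euclidean distance, and the choice to ignore a unit end that has no target (such as silence /S/) are all illustrative assumptions.

```python
import math


def end_cost(frames, target):
    """Minimum frame-to-target distance; zero if this end has no target."""
    if target is None:
        return 0.0
    return min(math.dist(frame, target) for frame in frames)


def build_segment_table(candidates_by_unit, vowel_targets):
    """Offline step: map each VCV unit to the id of its best segment.

    candidates_by_unit: {"Nse": [candidate, ...], ...}
    vowel_targets:      {"a": spectrum, ..., "N": spectrum}
    """
    table = {}
    for unit, candidates in candidates_by_unit.items():
        head_target = vowel_targets.get(unit[0])
        tail_target = vowel_targets.get(unit[-1])
        best = min(
            candidates,
            key=lambda c: end_cost(c["head_frames"], head_target)
            + end_cost(c["tail_frames"], tail_target),
        )
        table[unit] = best["id"]
    return table


def lookup_segments(units, table):
    """Synthesis-time step: a plain table lookup, with no search."""
    return [table[unit] for unit in units]
```

Only the segments named in the table would need to be kept in the speech parameter file, and synthesis reduces to one dictionary lookup per unit.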

[0043] With this method, the quality of the synthesized speech may be lower than with the method that searches at synthesis time. However, only the synthesis segments listed in the table need to be stored in the speech parameter file, and no search is performed at synthesis time, so both computation and memory can be greatly reduced.

[0044] If a somewhat larger hardware configuration is available, quality can be improved further by using as targets the steady-state portions of isolated vowels uttered at several pitches, or by creating targets from speech containing phoneme sequences that are strongly affected by coarticulation (for example, devoicing or nasalization), and building the synthesis segment table from them.

[0045]

[Effects of the Invention] The speech rule synthesis apparatus of the present invention comprises: storage means for storing speech synthesis parameters obtained by analyzing natural speech and labeled for each phoneme; setting means for performing division into synthesis units appropriate for assembling output speech, and for setting prosodic information and the preceding and following phoneme sequences divided into synthesis units; calculation means for calculating, based on the preceding and following phoneme sequences and the prosodic information set by the setting means, a target spectrum, which is the spectrum at the center of the connecting phoneme at a junction; selection means for selecting, from the speech synthesis parameters stored in the storage means, the synthesis segment whose spectral distance from the target spectrum calculated by the calculation means is smallest; and connection means for connecting the synthesis segments selected by the selection means. The apparatus therefore accumulates a large number of speech parameters and synthesizes output speech by extracting and connecting the synthesis segments appropriate for the speech to be synthesized. As a result, speech of high intelligibility and good naturalness can be obtained with a small amount of computation.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the speech rule synthesis apparatus of the present invention.

FIG. 2 is a flowchart for explaining the target spectrum calculation process performed by the speech rule synthesis apparatus of FIG. 1.

[Explanation of symbols]

11 text input terminal
12 text analysis unit
13 prosody information generation unit
14 synthesis unit setting unit
15 target spectrum calculation unit
16 speech parameter file
17 synthesis segment selection unit
18 synthesis segment connection unit
19 synthesized speech generation unit
20 speech output terminal

Continuation of front page

(56) References
JP-A-3-119394 (JP, A)
JP-A-1-284898 (JP, A)
JP-A-4-281495 (JP, A)
JP-A-5-94199 (JP, A)
JP-A-5-53595 (JP, A)
IEICE Technical Report [Speech], Vol. 89, No. 318, SP89-65, Tetsuya Nomura et al., "Phoneme segment connection method considering continuity of low-order cepstrum coefficients", pp. 9-16 (published November 27, 1989)
Proceedings of the Acoustical Society of Japan 1989 Autumn Meeting, Vol. I, 3-P-4, Tetsuya Nomura et al., "Phoneme segment connection method considering continuity of cepstrum coefficients", pp. 283-284 (published October 4, 1989)

(58) Fields searched (Int. Cl.7, DB name)
G10L 11/00 - 13/08
G10L 19/00 - 21/06
JICST file (JOIS)

Claims

(57) [Claims]

1. A speech rule synthesis apparatus comprising: storage means for storing speech synthesis parameters obtained by analyzing natural speech and labeled for each phoneme; setting means for performing division into synthesis units appropriate for assembling output speech, and for setting prosodic information and the preceding and following phoneme sequences divided into synthesis units; calculation means for calculating, based on the preceding and following phoneme sequences and the prosodic information set by the setting means, a target spectrum, which is the spectrum at the center of the connecting phoneme at a junction; selection means for selecting, from the speech synthesis parameters stored in the storage means, the synthesis segment whose spectral distance from the target spectrum calculated by the calculation means is smallest; and connection means for connecting the synthesis segments selected by the selection means.

2. The speech rule synthesis apparatus according to claim 1, wherein the calculation means receives the prosodic information and the preceding and following phoneme sequences divided into synthesis units, sets a pitch condition for the junction from the prosodic information, narrows down the speech parameter candidates that include the phoneme sequences of the synthesis units to be connected before and after the junction, selects from the candidates the speech parameter closest to the duration of the phoneme at the junction, and calculates the target spectrum from the selected speech parameter.