JP3513071B2 - Speech synthesis method and speech synthesis device

Info

Publication number: JP3513071B2
Application number: JP2000053822A
Authority: JP
Inventors: 岳彦籠嶋; 重宣瀬戸
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2000-02-29
Filing date: 2000-02-29
Publication date: 2004-03-31
Anticipated expiration: 2020-02-29
Also published as: JP2001242882A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech synthesis method and a speech synthesis apparatus for text-to-speech synthesis.

[0002]

[Description of the Related Art] Artificially generating a speech signal from arbitrary text is called text-to-speech synthesis. A text-to-speech system normally consists of three stages: text analysis, synthesis parameter generation, and speech synthesis.

[0003] FIG. 10 shows a configuration example of a conventional, general speech synthesis apparatus. As shown in FIG. 10, a conventional text-to-speech system normally comprises a text analysis unit 1010, a synthesis parameter generation unit 1001, a speech synthesis unit 1013, and a prosody control dictionary 1002. The input text 1102 first undergoes morphological analysis, syntactic analysis, and the like in the text analysis unit 1010, which outputs language information 1104. The language information 1104 includes various information needed to generate synthesis parameters, such as the phonetic symbol string corresponding to the reading of the text 1102, information on accent phrases (the units of prosody control), accent positions, and parts of speech. Next, the synthesis parameter generation unit 1001 performs prosody control based on the language information 1104 with reference to the prosody control dictionary 1002 and generates synthesis parameters 1100. The synthesis parameters 1100 consist of prosody parameters, such as fundamental frequency, phoneme duration, and power, and phonological parameters, such as the phoneme symbol string. The speech synthesis unit 1013 then generates speech information 1108 according to the phonological and prosodic information specified by the synthesis parameters 1100.

[0004] Such synthesis systems have usually synthesized speech in the manner of a person reading text aloud (a so-called read-aloud style), but in recent years methods have been proposed that control the speaking style to generate a variety of synthesized speech. For example, Japanese Unexamined Patent Publication No. 10-11083 discloses a method of controlling the speaking style of synthesized speech using prosody control dictionaries for a plurality of speaking styles, including a reference speaking style (such as a read-aloud style). FIG. 11 shows the configuration of this conventional speech synthesis apparatus.

[0005] This system differs from the general text-to-speech system described above in three respects. It has a plurality of prosody control dictionaries (two in FIG. 11, 2014 and 2015). It generates synthesis parameters 2105 and synthesis parameters 2106 using, respectively, the prosody control dictionary selected according to speaking style designation information 2103 (for example, 2015) and a reference speaking style prosody control dictionary 2016. And it corrects the speaking style in a speaking style emphasis unit 2012 according to emphasis degree designation information 2101. The prosody control dictionaries 2014 and 2015 are dictionaries for speaking styles that differ from the reference speaking style, for example a conversational style or an announcer style. The speaking style emphasis unit 2012 calculates the difference between the prosody parameters of the synthesis parameters 2105 for the speaking style selected by the speaking style designation information 2103 and those of the synthesis parameters 2106 for the reference speaking style. It then corrects the prosody parameters of the synthesis parameters 2106 according to the emphasis degree designation information 2101 and that difference, thereby generating synthesis parameters 2107 with an adjusted speaking style.
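
The correction performed by the speaking style emphasis unit 2012 can be pictured as scaling the difference between two contours. The sketch below is only an illustration: the source does not give the formula, so the linear form `reference + degree * (style - reference)`, the function name `emphasize_style`, and the sample values are all assumptions.

```python
def emphasize_style(reference, style, degree):
    """Move a reference-style contour toward (or past) a selected style.

    Assumed rule (not stated in the source): add `degree` times the
    frame-by-frame difference to the reference contour.
    """
    if len(reference) != len(style):
        raise ValueError("contours must have the same number of frames")
    return [r + degree * (s - r) for r, s in zip(reference, style)]


reference_f0 = [120.0, 130.0, 110.0]   # hypothetical read-aloud contour, Hz
style_f0 = [140.0, 170.0, 100.0]       # hypothetical conversational contour, Hz

halfway = emphasize_style(reference_f0, style_f0, 0.5)
unchanged = emphasize_style(reference_f0, style_f0, 0.0)
```

Under this assumed rule, a degree between 0 and 1 yields a contour that lies between the reference style and the one selected style.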

[0006]

[Problems to Be Solved by the Invention] As described above, the conventional speech synthesis method can only change the speaking style to one intermediate between the reference speaking style and a single selected speaking style, and the speaking style is always constant within a sentence, so there is little freedom for variation. Furthermore, only the speaking style (read-aloud, conversational, announcer, and so on) can be changed; the speaker's individuality (Mr. A's voice, Mr. B's voice, and so on) and emotion (an angry voice, a sad voice, and so on) cannot.

[0007] The present invention has been made in view of the above circumstances, and its object is to provide a speech synthesis method and a speech synthesis apparatus that can increase the prosodic variety of speech synthesized by text-to-speech synthesis.

[0008]

[Means for Solving the Problems] The present invention generates prosody parameters by interpolating, at an arbitrary ratio, the prosody parameters generated with each of a plurality of prosody control dictionaries, so that synthesized speech with a wide variety of prosody can be produced. The plurality of prosody control dictionaries may differ in speaking style, in the speaker's individuality, age, or sex, or in emotion, and dictionaries having various characteristics that combine these may also be used.

[0009] That is, the speech synthesis method according to the present invention (claim 1) is characterized by generating a first prosody parameter with each of a plurality of prosody control dictionaries according to input language information, performing interpolation among the plurality of first prosody parameters according to weight information designated for each prosody control dictionary to generate a second prosody parameter, and generating synthesized speech according to the second prosody parameter.

[0010] Here, language information consists of information obtained by analyzing the text, such as the syllable symbol string corresponding to the reading of the text, information on accent phrases (the units of prosody control), accent positions, parts of speech, and dependency relations, together with additional information that specifies, for example, the average speaking rate and loudness.

[0011] Here, a prosody control dictionary is what is referred to in order to control the prosody of synthesized speech, such as fundamental frequency, phoneme duration, power, and pauses. Examples include typical variation patterns of the fundamental frequency, parameters of statistical models of control quantities such as accent components, phoneme durations, power, and pause lengths, and rules expressed as decision trees.

[0012] Here, prosody parameters are the set of parameters that characterize the prosody of synthesized speech, such as fundamental frequency, phoneme duration, power, and pauses.

[0013] Here, interpolation between prosody parameters is processing that generates a prosody parameter intermediate among a plurality of prosody parameters by an operation such as a weighted average. The interpolation referred to here also includes so-called extrapolation, in which a weight is negative; in that case the generated prosody parameter is not intermediate among the plurality of prosody parameters and may instead emphasize the characteristics of one of them. The interpolation may be applied to all the prosody parameters or only to some of them, for example only to the fundamental frequency. The interpolation weights may also differ between parameters, for example between the fundamental frequency and the phoneme duration.
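
As a rough illustration of this definition, the following Python sketch computes a weighted average of several fundamental frequency contours, frame by frame. It is a sketch under assumptions, not the patented implementation: the function name `interpolate_prosody`, the normalization of the weights by their sum, and the sample contours are hypothetical, and the source does not say whether the average is taken in Hz or on a logarithmic scale.

```python
def interpolate_prosody(contours, weights):
    """Weighted average of equal-length contours, one value per frame.

    Weights are normalized by their sum (an assumption). A negative
    weight gives the extrapolation case described in the text.
    """
    if len(contours) != len(weights):
        raise ValueError("one weight is needed per contour")
    if len({len(c) for c in contours}) != 1:
        raise ValueError("contours must have the same number of frames")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [
        sum(w * c[i] for w, c in zip(weights, contours)) / total
        for i in range(len(contours[0]))
    ]


f0_a = [100.0, 120.0]   # hypothetical contour from dictionary A, Hz
f0_b = [140.0, 160.0]   # hypothetical contour from dictionary B, Hz

midway = interpolate_prosody([f0_a, f0_b], [0.5, 0.5])        # between A and B
exaggerated = interpolate_prosody([f0_a, f0_b], [1.5, -0.5])  # beyond A, away from B
```

With the weights (1.5, -0.5) the result moves away from contour B instead of lying between the two contours, which corresponds to emphasizing the characteristics of A.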

[0014] Preferably, the weight information may vary within a sentence.

[0015] The speech synthesis method according to the present invention (claim 3) is characterized by performing interpolation among a plurality of first prosody control dictionaries to generate a second prosody control dictionary, generating prosody parameters with the second prosody control dictionary according to input language information, and generating synthesized speech according to the prosody parameters.

[0016] Here, interpolation between prosody control dictionaries is processing that generates a prosody control dictionary with characteristics intermediate among a plurality of prosody control dictionaries by an operation such as a weighted average over corresponding information in those dictionaries. As with the interpolation of prosody parameters described above, the interpolation referred to here also includes so-called extrapolation, in which a weight is negative; in that case the characteristics of the generated prosody control dictionary are not intermediate among the plurality of prosody control dictionaries and may instead emphasize the characteristics of one of them. The interpolation may be applied to the entire prosody control dictionary or only to part of it, for example only to the part concerned with fundamental frequency control. The weights may also differ between parts, for example between the part concerned with the fundamental frequency and the part concerned with phoneme duration control.
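
Dictionary-level interpolation can be sketched by averaging corresponding entries. In the hypothetical layout below each dictionary is a plain Python `dict` with an `"f0_patterns"` part and a `"durations"` part; the layout, the key names, and the assumption that all dictionaries share the same keys and pattern lengths are illustrative only and are not taken from the source.

```python
def blend(values, weights):
    """Normalized weighted sum of scalars."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def interpolate_dictionaries(dictionaries, f0_weights, duration_weights):
    """Build a new dictionary from corresponding entries of the inputs.

    Separate weight tuples are used for the F0 part and the duration
    part, mirroring the option of weighting the parts differently.
    """
    first = dictionaries[0]
    f0_patterns = {
        key: [
            blend([d["f0_patterns"][key][i] for d in dictionaries], f0_weights)
            for i in range(len(first["f0_patterns"][key]))
        ]
        for key in first["f0_patterns"]
    }
    durations = {
        key: blend([d["durations"][key] for d in dictionaries], duration_weights)
        for key in first["durations"]
    }
    return {"f0_patterns": f0_patterns, "durations": durations}


# Hypothetical dictionaries: log-frequency patterns per accent type, durations in ms.
dict_a = {"f0_patterns": {"type1": [4.0, 5.0]}, "durations": {"a": 80.0}}
dict_b = {"f0_patterns": {"type1": [6.0, 7.0]}, "durations": {"a": 120.0}}

blended = interpolate_dictionaries([dict_a, dict_b], (0.5, 0.5), (1.0, 0.0))
```

Here the F0 patterns are averaged evenly while the durations are taken entirely from the first dictionary, one example of interpolating only part of a dictionary.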

[0017] The speech synthesis method according to the present invention (claim 4) is characterized by generating prosody parameters according to input language information using a second prosody control dictionary generated by performing interpolation among a plurality of first prosody control dictionaries, and generating synthesized speech according to the prosody parameters.

[0018] Preferably, the prosody control dictionary may include representative patterns that represent typical variation patterns of the fundamental frequency, or information equivalent to them, such as typical variation patterns of the pitch period.

[0019] The speech synthesis apparatus according to the present invention (claim 6) is characterized by comprising: means for generating a first prosody parameter with each of a plurality of prosody control dictionaries according to input language information; means for performing interpolation among the plurality of first prosody parameters according to weight information designated for each prosody control dictionary to generate a second prosody parameter; and means for generating synthesized speech according to the second prosody parameter.

[0020] The speech synthesis apparatus according to the present invention (claim 7) is characterized by comprising: means for performing interpolation among a plurality of first prosody control dictionaries to generate a second prosody control dictionary; means for generating prosody parameters with the second prosody control dictionary according to input language information; and means for generating synthesized speech according to the prosody parameters.

[0021] The prosody control dictionary creation method according to the present invention is characterized by receiving weight information designated for each of a plurality of prosody control dictionaries and performing interpolation among a plurality of first prosody control dictionaries according to the received weight information to generate a second prosody control dictionary.

[0022] The prosody control dictionary creation apparatus according to the present invention is characterized by comprising: means for receiving weight information designated for each of a plurality of prosody control dictionaries; and means for performing interpolation among a plurality of first prosody control dictionaries according to the received weight information to generate a second prosody control dictionary.

[0023] The present invention (claim 10) is also a computer-readable recording medium storing a program for causing a computer to generate a first prosody parameter with each of a plurality of prosody control dictionaries according to input language information, to perform interpolation among the plurality of first prosody parameters according to weight information designated for each prosody control dictionary to generate a second prosody parameter, and to generate synthesized speech according to the second prosody parameter.

[0024] The present invention (claim 11) is also a computer-readable recording medium storing a program for causing a computer to perform interpolation among a plurality of first prosody control dictionaries to generate a second prosody control dictionary, to generate prosody parameters with the second prosody control dictionary according to input language information, and to generate synthesized speech according to the prosody parameters.

[0025] Note that the invention relating to the apparatus also holds as an invention relating to a method, and the invention relating to the method also holds as an invention relating to an apparatus.

[0026] The present invention relating to the apparatus or the method also holds as a computer-readable recording medium storing a program for causing a computer to execute the procedures corresponding to the invention (or for causing a computer to function as the means corresponding to the invention, or for causing a computer to realize the functions corresponding to the invention).

[0027] According to the present invention, prosody parameters generated with a plurality of prosody control dictionaries are interpolated with arbitrary weights to generate prosody parameters, and speech is synthesized with those prosody parameters, so that synthesized speech with a wide variety of prosodic characteristics can be generated.

[0028] Also according to the present invention, a prosody control dictionary is generated in advance by interpolating a plurality of prosody control dictionaries, and speech is synthesized with that prosody control dictionary, so that synthesized speech with a wide variety of prosodic characteristics can be generated without increasing the amount of computation.

[0029]

[Embodiments of the Invention] Embodiments of the invention are described below with reference to the drawings.

[0030] (First Embodiment) FIG. 1 is a block diagram showing a configuration example of a speech synthesis apparatus (or speech synthesis software) that implements the speech synthesis method according to the first embodiment of the present invention.

[0031] As shown in FIG. 1, this speech synthesis apparatus comprises a text analysis unit 10, a synthesis parameter generation unit 20, a synthesis parameter interpolation unit 22, a speech synthesis unit 13, and a plurality of prosody control dictionaries (three in FIG. 1, 24 to 26).

[0032] The basic configuration and operation of each unit are as follows (processing basically proceeds in the order of the text analysis unit 10, the synthesis parameter generation unit 20, the synthesis parameter interpolation unit 22, and the speech synthesis unit 13).

[0033] The text analysis unit 10 performs morphological analysis, syntactic analysis, and the like on the input text 102 and generates language information 104. The language information 104 includes various information needed to generate synthesis parameters, such as the phonetic symbol string corresponding to the reading of the text 102, information on accent phrases (the units of prosody control), accent positions, and parts of speech.

[0034] According to the language information 104, the synthesis parameter generation unit 20 generates synthesis parameters 204 with reference to the prosody control dictionary 24, synthesis parameters 205 with reference to the prosody control dictionary 25, and synthesis parameters 206 with reference to the prosody control dictionary 26.

[0035] The synthesis parameter interpolation unit 22 interpolates the prosody parameters of the synthesis parameters 204, 205, and 206 according to weight information 201 to generate synthesis parameters 207.

[0036] The speech synthesis unit 13 generates speech information 108 according to the phonological and prosodic information specified by the synthesis parameters 207.

[0037] An operation example of this embodiment is described in detail below, taking the fundamental frequency as an example of a prosody parameter.

[0038] Once the text analysis unit 10 has generated the language information 104, the synthesis parameter generation unit 20 refers to the plurality of prosody control dictionaries according to the language information 104 and generates a plurality of synthesis parameters.

[0039] FIG. 2 is a functional block diagram of the processing that generates the fundamental frequency in the synthesis parameter generation unit 20. FIG. 2 is drawn for a single prosody control dictionary (in practice, the processing of FIG. 2 is performed for each of the plurality of prosody control dictionaries).

[0040] The fundamental frequency control dictionary 48 is part of a prosody control dictionary and consists of a representative pattern dictionary 45, representative pattern selection rules 46, and offset generation rules 47. The representative pattern dictionary 45 is a set of typical fundamental frequency variation patterns, each spanning one accent phrase, and stores patterns such as those shown in FIG. 3. The representative pattern selection unit 41 refers to the representative pattern selection rules 46 according to the language information 104 and selects, for each accent phrase, the representative pattern 401 predicted to be most appropriate from the representative pattern dictionary 45. The offset generation unit 44 refers to the offset generation rules 47 according to the language information 104 and generates, for each accent phrase, an offset 404 that specifies the average pitch height of the accent phrase. The offset processing unit 42 translates the representative pattern 401 by the offset 404 along the logarithmic frequency axis to generate a fundamental frequency pattern 402 for each accent phrase. The pattern connection unit 43 smoothly connects the fundamental frequency patterns 402 of the accent phrases and outputs the fundamental frequency pattern 403 for the whole sentence. Taking the text "ただいまマイクのテスト中です" ("The microphone is now being tested") as an example, FIGS. 4(a), 4(b), and 4(c) show the representative patterns 401, the fundamental frequency patterns 402, and the fundamental frequency pattern 403, respectively.
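
The data flow of FIG. 2 can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: patterns are stored as lists of log-frequency values, the selection rule is a plain lookup by accent type, the offset is supplied directly, and the smooth connection is approximated by pulling the two samples at each phrase boundary toward each other. The source does not specify any of these details.

```python
import math


def select_pattern(pattern_dictionary, accent_type):
    """Stand-in for the selection rules: look a pattern up by accent type."""
    return pattern_dictionary[accent_type]


def apply_offset(pattern, offset):
    """Translate a pattern along the log-frequency axis."""
    return [value + offset for value in pattern]


def connect(phrase_patterns):
    """Concatenate phrase patterns and soften each junction (assumed rule)."""
    contour, junctions = [], []
    for pattern in phrase_patterns:
        if contour and pattern:
            junctions.append(len(contour))  # index of the phrase's first sample
        contour.extend(pattern)
    for j in junctions:
        left, right = contour[j - 1], contour[j]
        contour[j - 1] = left + (right - left) / 3
        contour[j] = left + 2 * (right - left) / 3
    return contour


# Hypothetical dictionary: accent type -> log-F0 pattern relative to the phrase mean.
patterns = {"flat": [0.0, 0.0], "falling": [0.2, -0.2]}
phrases = [("flat", math.log(100.0)), ("falling", math.log(150.0))]

per_phrase = [apply_offset(select_pattern(patterns, t), off) for t, off in phrases]
sentence_hz = [math.exp(v) for v in connect(per_phrase)]
```

Adding an offset on the logarithmic axis multiplies every frequency in Hz by the same factor, so the shape of the pattern is preserved in terms of pitch ratios.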

[0041] The synthesis parameter generation unit 20 performs the fundamental frequency pattern generation described above with reference to each of the prosody control dictionaries 24, 25, and 26, and outputs the resulting fundamental frequency patterns as synthesis parameters 204, 205, and 206, respectively.

[0042] Next, the synthesis parameter interpolation unit 22 takes a weighted average of the fundamental frequency patterns of the synthesis parameters 204, 205, and 206 according to the weight information 201 to generate a fundamental frequency pattern, and outputs synthesis parameters 207.

[0043] The weight information 201 is expressed as a set of n weighting coefficients (three in FIG. 1) corresponding to the n prosody control dictionaries (n being two or more; 24 to 26 in FIG. 1). FIG. 5 shows an example of an input means (for example, a GUI) for the weight information 201. In this example, the three prosody control dictionaries 24, 25, and 26 express the prosodic characteristics of the speaking manner of Mr. A, Mr. B, and Mr. C, respectively. The position of the pointer, shown as a black dot, specifies whose prosody the output should resemble and to what degree. FIG. 6 shows the values of the weight information 201 when the pointer is at positions a, b, c, and d in FIG. 5. With the pointer at a, the prosody is intermediate among the three speakers. At b, the prosody most resembles Mr. B's while including a little of the characteristics of Mr. C and Mr. A. At c, the prosody is intermediate between Mr. A's and Mr. C's and includes none of Mr. B's characteristics. At d, the prosody exaggerates the characteristics of Mr. A's prosody. Controlling the weight information in this way makes it possible to generate synthesized speech with a variety of individual characteristics.
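
One way a pointer position could be turned into three weights is through the barycentric coordinates of a triangle whose vertices stand for Mr. A, Mr. B, and Mr. C. This is an assumption about how a GUI such as the one in FIG. 5 might work, since the figure and the values of FIG. 6 are not reproduced here; the function name, the vertex coordinates, and the sample pointer positions below are hypothetical.

```python
def barycentric_weights(p, a, b, c):
    """Return (weight_a, weight_b, weight_c) for point p and triangle a, b, c.

    The weights always sum to 1. They are all non-negative inside the
    triangle, and at least one is negative outside it.
    """
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("the three vertices must not be collinear")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


# Hypothetical screen layout for the three speakers.
vertex_a, vertex_b, vertex_c = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)

between_a_and_c = barycentric_weights((0.0, 2.0), vertex_a, vertex_b, vertex_c)
outside_past_a = barycentric_weights((-1.0, -1.0), vertex_a, vertex_b, vertex_c)
```

A pointer on the edge between A and C gives B a weight of zero, and a pointer outside the triangle beyond A gives A a weight above 1 with negative weights for B and C, which is one way to obtain the exaggerated behavior described for position d.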

[0044] As another configuration example of the synthesis parameter interpolation unit 22, the weight information can be allowed to change within a sentence. FIG. 7 shows an example of weight information that changes along with the text for the sentence "この宝くじ当たっている、これで一生遊んで暮らせるよ。" ("This lottery ticket is a winner; now I can live a life of leisure."). In this example, the three prosody control dictionaries 24, 25, and 26 correspond to the prosody of the same person when calm, when surprised, and when pleased. Performing interpolation according to such changing weight information makes it possible to generate fundamental frequency patterns that express subtle changes in emotion.
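
Weights that change within a sentence can be sketched by supplying one weight tuple per frame instead of a single tuple for the whole utterance. The function name `blend_time_varying`, the frame-level granularity, the per-frame normalization, and all numbers are assumptions for illustration; FIG. 7 itself is not reproduced here.

```python
def blend_time_varying(contours, weight_track):
    """Blend contours using a separate weight tuple for every frame."""
    n_frames = len(weight_track)
    if any(len(c) != n_frames for c in contours):
        raise ValueError("each contour needs one value per weight tuple")
    blended = []
    for i, frame_weights in enumerate(weight_track):
        total = sum(frame_weights)
        value = sum(w * c[i] for w, c in zip(frame_weights, contours))
        blended.append(value / total)
    return blended


# Hypothetical contours (Hz) from the calm, surprised, and pleased dictionaries.
calm = [100.0, 100.0, 100.0]
surprised = [200.0, 200.0, 200.0]
pleased = [150.0, 150.0, 150.0]

# Start calm, jump to surprised, then settle between surprised and pleased.
track = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.5, 0.5)]
f0 = blend_time_varying([calm, surprised, pleased], track)
```

In practice the weight track might be specified per phrase or per word and expanded to frames, but that mapping is left out of this sketch.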

[0045] The weight information 201 can take various forms: it may be entered by the user, supplied by another program (process), attached to each predetermined unit of text (for example, each sentence or each sentence constituent), or generated by the text analysis unit 10 by analyzing the text.

[0046] Although this embodiment has been described using a fundamental frequency control model based on representative patterns, various other fundamental frequency control models can be used, such as models that approximate the pattern with functions, for example the so-called Fujisaki model.

[0047] Although this embodiment has described the fundamental frequency as an example of a prosody parameter, the same approach can be applied to prosody parameters such as phoneme duration, power, and pauses. That is, sequences of phoneme durations, power values, pauses, and so on are each generated with the plurality of prosody control dictionaries and interpolated according to the weight information described above, which yields synthesized speech with a variety of prosodic characteristics.
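
The extension to other prosody parameters can be sketched with a small container type. The sketch below assumes a hypothetical `Prosody` dataclass that holds per-phoneme durations and power values alongside the F0 contour, and it allows a separate weight tuple per field; the field names, units, and values are illustrative and not taken from the source.

```python
from dataclasses import dataclass, fields


@dataclass
class Prosody:
    f0: list         # Hz per frame (assumed unit)
    duration: list   # ms per phoneme (assumed unit)
    power: list      # dB per phoneme (assumed unit)


def weighted(sequences, weights):
    """Normalized weighted average of equal-length sequences."""
    total = sum(weights)
    return [
        sum(w * s[i] for w, s in zip(weights, sequences)) / total
        for i in range(len(sequences[0]))
    ]


def interpolate_all(candidates, weights_by_field):
    """Interpolate every field, each with its own weight tuple."""
    return Prosody(**{
        f.name: weighted([getattr(c, f.name) for c in candidates],
                         weights_by_field[f.name])
        for f in fields(Prosody)
    })


from_24 = Prosody(f0=[100.0], duration=[60.0, 90.0], power=[60.0, 66.0])
from_25 = Prosody(f0=[140.0], duration=[80.0, 110.0], power=[64.0, 70.0])

result = interpolate_all(
    [from_24, from_25],
    {"f0": (0.5, 0.5), "duration": (0.25, 0.75), "power": (1.0, 0.0)},
)
```

Keeping the weights per field makes it possible, for instance, to borrow mostly the timing of one dictionary while averaging the pitch of both.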

[0048] As described above, according to this embodiment, prosody parameters generated with a plurality of prosody control dictionaries are interpolated with arbitrary weights to generate prosody parameters, and speech is synthesized with those prosody parameters, so that synthesized speech with a wide variety of prosodic characteristics can be generated.

[0049] (Second Embodiment) Next, a second embodiment of the present invention is described.

[0050] Whereas the first embodiment interpolates the prosody parameters generated with reference to a plurality of prosody control dictionaries, the second embodiment differs in that it interpolates the plurality of prosody control dictionaries in advance to create a prosody control dictionary with adjusted prosodic characteristics.

[0051] FIG. 8 is a block diagram showing a configuration example of a speech synthesis apparatus (or speech synthesis software) that implements the speech synthesis method according to this embodiment.

【００５２】 As shown in FIG. 8, this speech synthesizer comprises a text analysis unit 10, a synthesis parameter generation unit 30, a prosody control dictionary interpolation unit 31, a speech synthesis unit 13, a plurality of prosody control dictionaries that serve as the sources of interpolation (three in FIG. 8, numbered 24 to 26), and a prosody control dictionary 32 obtained by interpolation.

【００５３】 The following description focuses on the parts that differ from the first embodiment.

【００５４】 Processing in this embodiment divides broadly into two stages: prosody control dictionary interpolation and speech synthesis. First, the prosody control dictionary interpolation unit 31 generates the prosody control dictionary 32 in advance from the plurality of prosody control dictionaries (24 to 26). Thereafter, speech synthesis is performed by the text analysis unit 10, the synthesis parameter generation unit 30, and the speech synthesis unit 13 using the prosody control dictionary 32. To modify the contents of the prosody control dictionary 32 or replace it with a different one, the prosody control dictionary interpolation unit 31 generates the prosody control dictionary 32 again.

【００５５】 The configuration and operation of the text analysis unit 10, the synthesis parameter generation unit 30, and the speech synthesis unit 13 are basically the same as those of the corresponding units in FIG. 10, so a description of each unit is omitted here.

【００５６】 The prosody control dictionary interpolation unit 31 is described below.

【００５７】 The prosody control dictionary interpolation unit 31 performs interpolation on the plurality of prosody control dictionaries 24, 25, and 26 according to weight information 301 to generate the prosody control dictionary 32. (The synthesis parameter generation unit 30 then generates synthesis parameters 305 by referring to this prosody control dictionary 32 according to language information 104.)

【００５８】 An operation example of this embodiment is described in detail below, taking the fundamental frequency as an example of a prosody parameter.

【００５９】 As in the first embodiment, the fundamental frequency control model in the synthesis parameter generation unit 30 is explained using the representative-pattern-based model described with reference to FIG. 2.

【００６０】 In this case, the fundamental frequency control dictionary consists of a representative pattern dictionary, representative pattern selection rules, and offset generation rules. It is assumed, however, that the prosody control dictionaries 24, 25, and 26 have been created so that they all share the same representative pattern selection rules. The representative pattern selection rules of the prosody control dictionary 32 can then simply be a copy of those of any one of the prosody control dictionaries 24, 25, and 26. The processing in the prosody control dictionary interpolation unit 31 therefore consists of two steps: interpolating the representative pattern dictionaries of the prosody control dictionaries 24, 25, and 26 to generate the representative pattern dictionary of the prosody control dictionary 32, and interpolating the offset generation rules of the prosody control dictionaries 24, 25, and 26 to generate the offset generation rules of the prosody control dictionary 32.
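
The three-part dictionary structure and the copying of the shared selection rules could be modeled as in the sketch below. The class, field names, and the encoding of the selection rules as a mapping from a context key to a pattern number are hypothetical; the source does not specify a data format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class F0ControlDictionary:
    """Hypothetical container for one fundamental frequency control dictionary."""
    representative_patterns: list[list[float]]  # N representative patterns
    selection_rules: dict[str, int]             # context key -> pattern number
    offset_coefficients: dict[str, float]       # category -> coefficient


def copy_shared_selection_rules(
        dictionaries: list[F0ControlDictionary]) -> dict[str, int]:
    """Return a copy of the selection rules, checking that all sources agree."""
    if not dictionaries:
        raise ValueError("at least one dictionary is required")
    reference = dictionaries[0].selection_rules
    for d in dictionaries[1:]:
        if d.selection_rules != reference:
            raise ValueError("selection rules must be common to all dictionaries")
    return dict(reference)


# Illustrative data only; keys and values are invented for the example.
rules = {"noun/mid-sentence": 0, "verb/sentence-final": 1}
dict_24 = F0ControlDictionary([[0.0, 1.0], [1.0, 0.0]], dict(rules), {"m": 5.0})
dict_25 = F0ControlDictionary([[0.0, 2.0], [2.0, 0.0]], dict(rules), {"m": 5.5})
dict_26 = F0ControlDictionary([[0.0, 0.5], [0.5, 0.0]], dict(rules), {"m": 4.5})

merged_rules = copy_shared_selection_rules([dict_24, dict_25, dict_26])
```

Because the selection rules are identical, only the representative patterns and the offset generation rules need numerical interpolation.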

【００６１】 First, the interpolation of the representative pattern dictionaries is described.

【００６２】 Each representative pattern dictionary consists of N representative patterns. For each of the representative patterns numbered 1 to N, an interpolated representative pattern is generated by taking the weighted average, according to the weight information 301, of the representative patterns that have the same number in the individual representative pattern dictionaries.
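
A minimal sketch of this same-number weighted averaging is given below. It assumes that patterns with the same number have the same length in every dictionary and that the weights are normalized by their sum; neither detail is stated in the text, and the function name and sample values are illustrative.

```python
from __future__ import annotations

from typing import Sequence

Pattern = Sequence[float]


def interpolate_pattern_dictionaries(
        pattern_dicts: Sequence[Sequence[Pattern]],
        weights: Sequence[float]) -> list[list[float]]:
    """Average the k-th pattern of every dictionary, for k = 1..N."""
    if len(pattern_dicts) != len(weights):
        raise ValueError("one weight is required per dictionary")
    n = len(pattern_dicts[0])
    if any(len(d) != n for d in pattern_dicts):
        raise ValueError("every dictionary must hold N patterns")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    result: list[list[float]] = []
    for k in range(n):
        same_number = [d[k] for d in pattern_dicts]
        length = len(same_number[0])
        # Assumption: same-numbered patterns are sampled at the same points.
        if any(len(p) != length for p in same_number):
            raise ValueError(f"pattern {k + 1} differs in length")
        result.append([
            sum(w * p[t] for w, p in zip(weights, same_number)) / total
            for t in range(length)
        ])
    return result


# Two toy dictionaries with N = 2 patterns of three points each.
patterns_a = [[0.0, 1.0, 0.0], [1.0, 0.0, -1.0]]
patterns_b = [[0.0, 2.0, 0.0], [2.0, 0.0, -2.0]]
patterns_32 = interpolate_pattern_dictionaries(
    [patterns_a, patterns_b], [0.75, 0.25])
```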

【００６３】 Next, the interpolation of the offset generation rules is described.

【００６４】 Offset generation can be performed using quantification theory type I, which is one kind of statistical model. FIG. 9 shows an example of an offset generation rule based on quantification type I. The offset value is given by the sum of the mean value m and the coefficients corresponding to the categories to which the individual items of language information belong. For example, if an accent phrase is located in the middle of the sentence, has four moras, and is a noun, the offset value is m + a_2 + b_4 + c_1. An interpolated offset generation rule is therefore generated by taking the weighted average, according to the weight information 301, of the coefficients of the individual offset generation rules that correspond to the same category.
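
The quantification type I rule and its interpolation might look like the sketch below. The factor and category names, the coefficient values, and the decision to interpolate the mean m together with the coefficients are assumptions for illustration; FIG. 9 is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class OffsetRule:
    """Quantification type I model: a mean plus one coefficient per category."""
    mean: float
    coefficients: dict[str, dict[str, float]]  # factor -> category -> coefficient


def predict_offset(rule: OffsetRule, categories: dict[str, str]) -> float:
    """Offset = m + sum of the coefficients of the categories that apply."""
    return rule.mean + sum(
        rule.coefficients[factor][category]
        for factor, category in categories.items())


def interpolate_offset_rules(rules: Sequence[OffsetRule],
                             weights: Sequence[float]) -> OffsetRule:
    """Weighted average of same-category coefficients (and, by assumption, m)."""
    if len(rules) != len(weights):
        raise ValueError("one weight is required per rule")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    mean = sum(w * r.mean for w, r in zip(weights, rules)) / total
    coefficients = {
        factor: {
            category: sum(w * r.coefficients[factor][category]
                          for w, r in zip(weights, rules)) / total
            for category in table
        }
        for factor, table in rules[0].coefficients.items()
    }
    return OffsetRule(mean, coefficients)


# Hypothetical rules; "mid" ~ a_2, "4" ~ b_4, "noun" ~ c_1 in the text's example.
rule_a = OffsetRule(5.0, {"position": {"mid": 0.25},
                          "moras": {"4": -0.5},
                          "pos": {"noun": 0.125}})
rule_b = OffsetRule(6.0, {"position": {"mid": 0.75},
                          "moras": {"4": 0.5},
                          "pos": {"noun": 0.375}})
rule_32 = interpolate_offset_rules([rule_a, rule_b], [0.5, 0.5])
phrase = {"position": "mid", "moras": "4", "pos": "noun"}
offset = predict_offset(rule_32, phrase)
```

Since the model is a plain sum, the offset predicted by the interpolated rule in this sketch equals the same weighted average of the offsets predicted by the source rules.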

【００６５】 The weight information 301 is expressed as a set of n weight coefficients (three in FIG. 8) corresponding to the n prosody control dictionaries (n being two or more; 24 to 26 in FIG. 8). By generating prosody control dictionaries with different weight coefficients, synthetic speech with various prosodic features can be generated, as in the first embodiment.

【００６６】 Thereafter, speech synthesis can be performed by the text analysis unit 10, the synthesis parameter generation unit 30, and the speech synthesis unit 13 using the prosody control dictionary 32.

【００６７】 It is also possible to prepare several kinds of prosody control dictionary 32, each generated with different weight information 301, and to select among them as appropriate.

【００６８】 Although this embodiment has been described using a fundamental frequency control model based on representative patterns, various other fundamental frequency control models can be used, such as models that approximate the pattern with functions, for example the so-called Fujisaki model.

【００６９】 In this embodiment, fundamental frequency control has been described as an example of a prosody parameter, but other prosody parameters such as phoneme duration, power, and pauses can be handled in the same manner. That is, phoneme duration, power, pauses, and the like can be controlled using a statistical model such as quantification type I, so prosody control information with various prosodic features can be generated by interpolating the model parameters according to the weight information, in the same way as for the offset generation rules described above.

【００７０】 In this embodiment, prosody parameters are generated using only the prosody control dictionary produced by interpolating the plurality of prosody control dictionaries in advance. This has the advantage that the amount of computation needed to generate prosody parameters at synthesis time is small compared with the conventional speech synthesis method.

【００７１】 In this embodiment, it is also possible to configure a speech synthesizer (or speech synthesis software) that includes, of the components in FIG. 8, the text analysis unit 10, the synthesis parameter generation unit 30, the speech synthesis unit 13, and the prosody control dictionary 32. Alternatively, a speech synthesizer (or speech synthesis software) may include the text analysis unit 10, the synthesis parameter generation unit 30, and the speech synthesis unit 13, with the prosody control dictionary 32 supplied separately. Such a configuration can be applied to various devices or application programs. For example, a prosody control dictionary 32 suited to the speaking style of each character in a video game can be created in advance, and the characters made to speak with synthetic speech in the game.

【００７２】 Similarly, it is possible to configure a prosody control dictionary creation device (or prosody control dictionary creation software) that includes the prosody control dictionary interpolation unit 31 and the plurality of prosody control dictionaries that serve as source material for the prosody control dictionary 32. Alternatively, a prosody control dictionary creation device (or prosody control dictionary creation software) may include the prosody control dictionary interpolation unit 31, with the source prosody control dictionaries supplied separately. With such a configuration, various prosody control dictionaries 32 can be created and used by users themselves, or created by a manufacturer and provided to users.

【００７３】 As described above, according to this embodiment, a prosody control dictionary is generated in advance by interpolating a plurality of prosody control dictionaries, and speech synthesis is performed using that dictionary, so that synthetic speech with a wide variety of prosodic features can be generated without increasing the amount of computation.

【００７４】 Each of the functions described above can be implemented in either hardware or software.

【００７５】 This embodiment can also be implemented as a computer-readable recording medium storing a program that causes a computer to execute predetermined means (or causes a computer to function as predetermined means, or causes a computer to realize predetermined functions).

【００７６】 The present invention is not limited to the embodiments described above and can be implemented with various modifications within its technical scope.

【００７７】[0077]

[Effects of the Invention] According to the present invention, prosody parameters generated using a plurality of prosody control dictionaries are interpolated with arbitrary weights to produce new prosody parameters, and speech synthesis is performed using those parameters, so that synthetic speech with a wide variety of prosodic features can be generated.

【００７８】 Further, according to the present invention, a prosody control dictionary is generated in advance by interpolating a plurality of prosody control dictionaries, and speech synthesis is performed using that dictionary, so that synthetic speech with a wide variety of prosodic features can be generated without increasing the amount of computation.

[Brief description of drawings]

FIG. 1 is a diagram showing a configuration example of a speech synthesizer according to the first embodiment of the present invention.

FIG. 2 is a diagram showing a fundamental frequency pattern generation model.

FIG. 3 is a diagram illustrating examples of representative patterns.

FIG. 4 is a diagram illustrating examples of fundamental frequency patterns.

FIG. 5 is a diagram illustrating an example of a means for inputting weight information.

FIG. 6 is a diagram showing an example of weight information specified with the input means of FIG. 5.

FIG. 7 is a diagram illustrating an example of changes in weight information.

FIG. 8 is a diagram showing a configuration example of a speech synthesizer according to the second embodiment of the present invention.

FIG. 9 is a diagram showing an example of an offset generation rule based on quantification type I.

FIG. 10 is a diagram showing a configuration example of a conventional general speech synthesizer.

FIG. 11 is a diagram showing a configuration example of a conventional speech synthesizer that controls speaking style.

[Explanation of symbols]

10 ... text analysis unit
13 ... speech synthesis unit
20, 30 ... synthesis parameter generation unit
22 ... synthesis parameter interpolation unit
24 to 26, 32 ... prosody control dictionary
31 ... prosody control dictionary interpolation unit
41 ... representative pattern selection unit
42 ... offset processing unit
43 ... pattern connection unit
44 ... offset generation unit
45 ... representative pattern dictionary
46 ... representative pattern selection rules
47 ... offset generation rules
48 ... fundamental frequency control dictionary

Continuation of the front page
(56) References: JP-A-10-11083 (JP, A); JP-A-11-231885 (JP, A); JP-A-2000-47681 (JP, A); JP-A-9-244693 (JP, A); JP-A-9-90970 (JP, A); Hideki Banno (坂野秀樹) et al., "Speech morphing by independent manipulation of envelope and sound source," IEICE Transactions A, February 25, 1998, Vol. J81-A, No. 2, pp. 261-268; Takayoshi Yoshimura (吉村貴克) et al., "Speaker interpolation in an HMM-based speech synthesis system," Proceedings of the 1997 Autumn Meeting of the Acoustical Society of Japan, September 1997, 1-P-17, pp. 337-338
(58) Fields searched (Int. Cl.7, DB name): G10L 13/06; G10L 13/08; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech synthesis method characterized by: generating first prosody parameters using each of at least three prosody control dictionaries in accordance with input language information; performing interpolation among the first prosody parameters in accordance with weight information specified for each prosody control dictionary to generate a second prosody parameter; and generating synthetic speech in accordance with the second prosody parameter, wherein the weight information changes within a sentence.

2. The speech synthesis method according to claim 1, characterized in that the prosody control dictionaries include representative patterns representing typical fundamental frequency change patterns.

3. A speech synthesis method in which a second prosody control dictionary is generated by performing interpolation among a plurality of first prosody control dictionaries, and a prosody parameter is generated using the second prosody control dictionary in accordance with input language information, characterized in that the first and second prosody control dictionaries include representative patterns representing typical fundamental frequency change patterns.

4. A speech synthesizer comprising: means for generating first prosody parameters using each of at least three prosody control dictionaries in accordance with input language information; means for performing interpolation among the first prosody parameters in accordance with weight information specified for each prosody control dictionary to generate a second prosody parameter; means for generating synthetic speech in accordance with the second prosody parameter; and means for changing the weight information within a sentence.

5. A speech synthesizer comprising: means for performing interpolation among a plurality of first prosody control dictionaries to generate a second prosody control dictionary; means for generating a prosody parameter using the second prosody control dictionary in accordance with input language information; and means for generating synthetic speech in accordance with the prosody parameter, wherein the first and second prosody control dictionaries include representative patterns representing typical fundamental frequency change patterns.

6. A computer-readable recording medium storing a program for: generating first prosody parameters using each of at least three prosody control dictionaries in accordance with input language information; performing interpolation among the first prosody parameters in accordance with weight information specified for each prosody control dictionary to generate a second prosody parameter; generating synthetic speech in accordance with the second prosody parameter; and changing the weight information within a sentence.

7. A computer-readable recording medium storing a program for: performing interpolation among a plurality of first prosody control dictionaries to generate a second prosody control dictionary that includes representative patterns representing typical fundamental frequency change patterns; generating a prosody parameter using the second prosody control dictionary in accordance with input language information; and generating synthetic speech in accordance with the prosody parameter.