JP2001166788A

JP2001166788A - Voice synthesis method and apparatus

Info

Publication number: JP2001166788A
Application number: JP34788299A
Authority: JP
Inventors: Hirofumi Nishimura (西村洋文); Toshimitsu Minowa (蓑輪利光); Akira Mochizuki (望月亮)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-12-07
Filing date: 1999-12-07
Publication date: 2001-06-22

Abstract

PROBLEM TO BE SOLVED: When a pitch pattern is extracted from natural speech and used to synthesize speech with different wording, no pitch pattern can be extracted from the unvoiced parts of the original natural speech, because those parts have no pitch. SOLUTION: The speech from which the pitch pattern is extracted consists entirely of voiced sounds. When it is difficult to compose a sentence entirely of voiced sounds, a grammatically correct nonsense sentence is used, which makes it easy to construct a sentence consisting only of voiced sounds.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to speech synthesis methods, and more particularly to a speech synthesis method and apparatus for converting text into speech.

[0002]

2. Description of the Related Art
Conventionally, speech synthesis methods that build speech by concatenating speech segments have, as described in Japanese Patent Laid-Open No. 8-44391, extracted a global pitch pattern from speech that also contains unvoiced sounds and used it as the target pitch pattern when synthesizing speech.

[0003]

Problems to Be Solved by the Invention
In the conventional method, however, the pitch pattern of voiced portions near unvoiced sounds fluctuates greatly, and the pitch pattern of unvoiced portions has to be estimated by interpolation. Consequently, when syllables such as "sa" and "sha" occur in succession and the vowel sections become very short, there are not enough voiced sections free of pitch perturbation from the unvoiced consonants, which makes a stable global pitch pattern hard to obtain. And when a vowel is devoiced, the entire pitch pattern of that syllable must be obtained by interpolation. In both cases the intonation of the synthesized speech becomes unnatural.

[0004] The present invention provides a speech synthesis method and apparatus that eliminate this unnatural intonation and thereby improve the quality of synthesized speech.

[0005]

Means for Solving the Problems
To solve the above problems, the speech synthesis method of the present invention extracts the pitch pattern of the speech to be synthesized from natural speech, changes the pitch of speech segments according to that pitch pattern, and concatenates the segments to synthesize arbitrary speech. It is characterized in that meaningful natural speech consisting only of voiced sounds is used as the natural speech from which the pitch pattern is extracted. Because the pitch pattern comes from such all-voiced speech, a stable pitch pattern taken directly from natural speech is available for every section of the utterance, without correcting the pitch pattern near unvoiced sounds or interpolating it across unvoiced portions.
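Claim 1 depends on whether an utterance contains any unvoiced sound. The following is a minimal screening sketch. It assumes utterances are written as phonetically romanized morae (for example, particle は written as "wa"). It also assumes Japanese unvoiced onsets can be approximated by k, s, sh, t, ch, ts, h, f, p, plus the geminate mora, written "Q" here. This approximation is illustrative and is not a procedure given in the publication. The example words are the ones discussed in Embodiments 1 and 3.

```python
from typing import Iterable

# Approximate set of unvoiced consonant onsets (an assumption of this sketch).
# "sh", "ch" and "ts" are covered by their first letters s, c and t.
UNVOICED_ONSETS = ("k", "s", "t", "c", "h", "f", "p")
GEMINATE = "Q"  # sokuon (small tsu), a voiceless closure in natural speech

def is_all_voiced(morae: Iterable[str]) -> bool:
    """Return True if no mora is a geminate or starts with an unvoiced onset."""
    return not any(m == GEMINATE or m.startswith(UNVOICED_ONSETS) for m in morae)

print(is_all_voiced(["a", "o", "mo", "ri"]))         # "aomori"   -> True
print(is_all_voiced(["a", "ki", "ta", "ke", "n"]))   # "akitaken" -> False
print(is_all_voiced(["ra", "i", "o", "n", "ga"]))    # "raion ga" -> True
```

Such a check would only pre-filter candidate recording scripts. Whether a recorded utterance really stays voiced throughout would still have to be confirmed on the audio.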

[0006] The speech synthesis method of the present invention is further characterized in that a grammatically correct nonsense sentence is used as the natural speech from which the pitch pattern is extracted. When speech is synthesized from a pitch pattern extracted from natural speech, a meaningful sentence introduces pitch fluctuations, for example because some parts are emphasized depending on the sentence's meaning. With grammatically correct nonsense sentences, suppose all-voiced speech is prepared for every mora count and accent type of each accent phrase. These recordings can then be combined to compose an all-voiced sentence easily for any required pitch pattern.

[0007] In some cases, all-voiced speech with the same number of phrases, number of morae, and accent types as the speech to be synthesized cannot be prepared. In that case, the speech synthesis method of the present invention modifies part of the pitch pattern of all-voiced speech with a similar number of phrases, morae, and accent types. This produces a pitch pattern with the desired number of accent phrases, number of morae per accent phrase, and accent types, which is then used to synthesize arbitrary speech. Storing all-voiced speech in a database for every mora count and accent type of every accent phrase would make the database very large. Even if only utterances of up to three phrases, each accent phrase having at most nine morae, were considered, 127,550 utterances (50 + 50 × 50 + 50 × 50 × 50) would be needed. Instead, a stable pitch pattern is first obtained from meaningful, familiar speech such as meaningful words, even if the mora count differs somewhat. That pattern is then modified into the target pitch pattern, so any pitch pattern can be obtained.
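The 127,550 figure counts the utterances needed for sentences of one, two, and three accent phrases. The following minimal sketch reproduces that arithmetic. It takes the document's figure of 50 (mora count, accent type) combinations per accent phrase as given rather than deriving it; the function name is hypothetical.

```python
def database_size(patterns_per_phrase: int, max_phrases: int) -> int:
    """Count the utterances needed when each accent phrase of a 1..max_phrases
    phrase sentence may independently take any of patterns_per_phrase
    (mora count, accent type) combinations."""
    return sum(patterns_per_phrase ** k for k in range(1, max_phrases + 1))

print(database_size(50, 3))  # 50 + 2500 + 125000 = 127550
```

The same formula applies to the three accent classes later used for the pitch pattern database (type 0, type 1, type n). With up to five accent phrases, `database_size(3, 5)` gives 363.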

[0008] The speech synthesis apparatus of the present invention comprises the following means:
input means for inputting the reading of the speech to be synthesized;
pitch pattern category determining means for determining the number of phrases, number of morae, and accent types from that input, and thereby determining the category of the pitch pattern;
a pitch pattern database storing pitch patterns extracted from sentences of natural speech;
pitch pattern determining means for selecting from the pitch pattern database a pitch pattern of the category determined by the pitch pattern category determining means;
pitch pattern deforming means for deforming the pitch pattern to match the number of morae and accent type of each accent phrase of the speech to be synthesized;
speech segment selecting means for selecting speech segments according to the reading to be synthesized;
a speech segment database storing the speech segments;
speech segment deforming means for deforming the speech segments selected by the speech segment selecting means based on the pitch pattern determined by the pitch pattern determining means;
speech segment connecting means for concatenating the speech segments to synthesize speech;
and synthesized speech output means for outputting the synthesized speech.
Because speech is synthesized to follow a pitch pattern extracted from natural speech, the apparatus produces speech with a natural pitch pattern.

[0009] Embodiments of the present invention are described below with reference to FIGS. 1 to 8.
(Embodiment 1) The invention of claim 1 is described in detail. FIG. 1 is a conceptual diagram of pitch pattern extraction from speech containing unvoiced sounds. In FIG. 1, 100 denotes a speech waveform, 101, 102, and 103 denote pitch patterns, and 104 to 108 denote the vowel sections at each mora position. Speech waveform 100 is the utterance "akitaken" (Akita Prefecture). Its unvoiced portions are the consonants of "ki", "ta", and "ke" and the devoiced vowel of "ki". Extracting a global pitch pattern from natural speech requires a stable pitch in each vowel section 104 to 108. As FIG. 1 shows, however, the pitch pattern near unvoiced sounds fluctuates strongly, which makes a globally stable pitch difficult to obtain. Specifically, the pitch near an unvoiced consonant tends to rise slightly, as in pitch patterns 102 and 103. A voiced portion sandwiched between unvoiced consonants, as in pitch pattern 102, also becomes short, so not enough stable material remains. Furthermore, when a vowel is devoiced, as in section 105, the pitch pattern of that portion must be interpolated from the surrounding pitch, so it cannot be taken directly from the natural speech.
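By contrast, the conventional approach must bridge unvoiced frames, such as devoiced section 105, by interpolation. The following is a minimal sketch of that step. It assumes a frame-wise F0 track in which 0.0 marks unvoiced frames, a common pitch-tracker convention but an assumption here. It uses plain linear interpolation; the prior-art method may interpolate differently, and the function name is hypothetical.

```python
from typing import List

def interpolate_unvoiced(f0: List[float]) -> List[float]:
    """Fill unvoiced frames (f0 == 0.0) by linear interpolation between the
    nearest voiced frames; leading/trailing gaps copy the nearest voiced value."""
    voiced = [i for i, v in enumerate(f0) if v > 0.0]
    if not voiced:
        raise ValueError("no voiced frames to interpolate from")
    out = list(f0)
    for i, v in enumerate(f0):
        if v > 0.0:
            continue  # measured value, keep it
        left = max((j for j in voiced if j < i), default=None)
        right = min((j for j in voiced if j > i), default=None)
        if left is None:          # gap at the start of the track
            out[i] = f0[right]
        elif right is None:       # gap at the end of the track
            out[i] = f0[left]
        else:                     # gap between two voiced frames
            w = (i - left) / (right - left)
            out[i] = (1 - w) * f0[left] + w * f0[right]
    return out

print(interpolate_unvoiced([120.0, 0.0, 0.0, 150.0]))  # roughly [120, 130, 140, 150]
```

Every value filled in this way is an estimate rather than a measurement. Extracting pitch only from all-voiced speech is meant to remove that weakness.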

[0010] These problems can be solved, however, by extracting the pitch pattern from speech consisting entirely of voiced sounds. FIG. 2 is a conceptual diagram of pitch pattern extraction from all-voiced speech. In FIG. 2, 200 is the speech waveform of the utterance "aomori", 201 is the pitch pattern, and 202 to 205 are the vowel sections at each mora position. As pitch pattern 201 shows, when no unvoiced sounds are present the pitch pattern is continuous over the entire word, so a globally stable pitch can be obtained.

[0011] (Embodiment 2) The invention of claim 2 is described in detail. FIG. 3 shows how a grammatically correct nonsense sentence for pitch pattern extraction according to claim 2 is created. In this example the sentence has three accent phrases: 301 is the first accent phrase section, 302 the second, and 303 the third. Nouns such as those in FIG. 4 are placed in noun substitution areas 304 and 305, and a verb such as those in FIG. 5 is placed in verb substitution area 306. This easily yields a grammatically correct nonsense sentence consisting entirely of voiced sounds, with any mora count and accent type for each accent phrase. Sentences with two accent phrases, or with four or more, can be built the same way from accent phrases of any mora count and accent type. A sentence created this way has no meaning at all, so its pitch pattern is free of fluctuations caused by emphasis and the like.
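The slot-filling of Embodiment 2 can be sketched as follows, under several assumptions. The sentence frame is taken to be "NOUN ga NOUN o VERB", the shape of the example sentence "raion ga jinrui o nonde iru" used in Embodiment 3. The lexicons are hypothetical stand-ins for the word lists of FIGS. 4 and 5, which are not reproduced here. They hold only the words of that example sentence, with accent types inferred from it, and all names are illustrative.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (mora count of the word, accent type)

# Hypothetical lexicons standing in for FIGS. 4 and 5; every word is all-voiced.
NOUNS: Dict[Key, List[str]] = {
    (4, 0): ["raion"],      # "lion": ra-i-o-n, type 0
    (4, 1): ["jinrui"],     # "mankind": ji-n-ru-i, type 1
}
VERBS: Dict[Key, List[str]] = {
    (5, 1): ["nonde iru"],  # "is drinking": no-n-de-i-ru, type 1
}

def build_nonsense_sentence(first: Key, second: Key, verb: Key) -> str:
    """Fill the assumed frame '<noun> ga <noun> o <verb>' (accent phrases 1-3).

    The particles "ga" and "o" are voiced, so the result stays all-voiced.
    Raises KeyError if no word has the requested (mora count, accent type).
    """
    n1 = NOUNS[first][0]
    n2 = NOUNS[second][0]
    v = VERBS[verb][0]
    return f"{n1} ga {n2} o {v}"

print(build_nonsense_sentence((4, 0), (4, 1), (5, 1)))
# raion ga jinrui o nonde iru
```

A fuller lexicon would list several words per key, so the recording script could avoid repeating the same word across sentences.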

[0012] (Embodiment 3) The invention of claim 3 is described in detail. FIG. 6 is an example table for creating the speech used to extract pitch patterns according to claim 3; it covers speech consisting of three accent phrases. In FIG. 6 the accent types are grouped into three classes: type 0, type 1, and type n. FIG. 7 illustrates this classification, with 701 to 703 representing pitch patterns. Type n covers accent type 2 and higher. Swapping these classes independently for each accent phrase gives pitch patterns for 27 (3 × 3 × 3) sentences in total.

For example, consider the three-accent-phrase utterance "Ashita no tenki wa ame desu" ("Tomorrow's weather is rain"). Here "ashita no" is type 0 and "tenki wa" and "ame desu" are type 1, so the sentence "Raion ga jinrui o nonde iru" ("A lion is drinking mankind") is selected from FIG. 6. However, "ashita no" and "raion ga" differ in mora count, so the pitch pattern is modified before use. Both are type-0 accent phrases with a pitch pattern like 701 in FIG. 7. A type-0 pattern changes little from the second mora onward, so expansion or compression is applied within that section (the dotted section). Likewise, for a type-1 accent, expansion or compression is applied in the section shown by the dotted line in 702 of FIG. 7. Type-n accents are handled in the same way. The example used a sentence of three accent phrases, but pitch patterns for sentences with other numbers of accent phrases can be obtained in the same way.
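The classification and the stretching step of Embodiment 3 can be sketched as follows. The sketch rests on assumptions not stated in the publication:
- each accent phrase's pitch pattern is reduced to one value per mora;
- the first mora is kept and the section from the second mora onward is linearly resampled, which matches the dotted section for type 0 and is assumed to match the dotted section for type 1;
- the pitch values in the example are made up.

Type-n handling is not modelled, and the function names are hypothetical.

```python
from typing import List, Tuple

def accent_category(accent_type: int) -> str:
    """Map an accent type to the three classes of FIG. 7: '0', '1' or 'n' (>= 2)."""
    if accent_type < 0:
        raise ValueError("accent type must be non-negative")
    return str(accent_type) if accent_type <= 1 else "n"

def sentence_key(accent_types: List[int]) -> Tuple[str, ...]:
    """Key into the 3**k stored patterns for a sentence of k accent phrases."""
    return tuple(accent_category(a) for a in accent_types)

def resample(values: List[float], length: int) -> List[float]:
    """Linearly resample a sequence to `length` points, keeping both endpoints."""
    if length < 1 or not values:
        raise ValueError("need at least one input value and one output point")
    if len(values) == 1 or length == 1:
        return [values[0]] * length
    step = (len(values) - 1) / (length - 1)
    out = []
    for k in range(length):
        x = k * step
        i = min(int(x), len(values) - 2)  # left neighbour index
        w = x - i                         # weight of the right neighbour
        out.append((1 - w) * values[i] + w * values[i + 1])
    return out

def fit_mora_count(pitch_per_mora: List[float], target_morae: int) -> List[float]:
    """Stretch or compress a type-0/type-1 accent-phrase pattern to a new mora count.

    Keeps mora 1 unchanged and resamples the section from mora 2 onward.
    """
    if target_morae < 2 or len(pitch_per_mora) < 2:
        raise ValueError("need at least two morae")
    head, tail = pitch_per_mora[:1], pitch_per_mora[1:]
    return head + resample(tail, target_morae - 1)

# "ashita no / tenki wa / ame desu" -> accent types 0, 1, 1
print(sentence_key([0, 1, 1]))  # ('0', '1', '1')
# Made-up per-mora values (Hz) for 5-mora "raion ga", fitted to 4-mora "ashita no"
print(fit_mora_count([110.0, 140.0, 140.0, 138.0, 135.0], 4))  # [110.0, 140.0, 139.0, 135.0]
```

Linear resampling is one simple choice here. The text only says the section is expanded or compressed and does not name a method.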

[0013] By generating pitch patterns from natural speech in this way, an arbitrary pitch pattern can be generated from a very small speech database.

[0014] (Embodiment 4) The invention of claim 4 is described in detail. FIG. 8 is a block diagram of the speech synthesis apparatus according to claim 4. It comprises:
reading input means 800 for inputting the reading of the speech to be synthesized;
pitch pattern category determining means 801 for determining the number of accent phrases, number of morae, and accent types of the reading entered through reading input means 800, and thereby determining the pitch pattern category;
pitch pattern database 808 storing pitch patterns extracted from sentences of natural speech;
pitch pattern determining means 802 for selecting from pitch pattern database 808 a pitch pattern of the category determined by pitch pattern category determining means 801;
pitch pattern deforming means 803 for deforming the pitch pattern obtained by pitch pattern determining means 802 so that the mora count and accent type match for each accent phrase;
speech segment selecting means 804 for selecting speech segments according to the reading to be synthesized;
speech segment database 809 storing speech segments;
speech segment deforming means 805 for deforming the speech segments selected by speech segment selecting means 804 based on the pitch pattern deformed by pitch pattern deforming means 803;
speech segment connecting means 806 for concatenating the segments deformed by speech segment deforming means 805 to synthesize speech;
and synthesized speech output means 807 for outputting the synthesized speech.

[0015] As described in Embodiment 3, pitch pattern database 808 stores pitch patterns in which the accent type of each accent phrase is classified as type 0, type 1, or type n. Even if sentences of up to five accent phrases are considered, this gives only 363 patterns (3 + 9 + 27 + 81 + 243), so the database stays small. Pitch pattern category determining means 801 analyzes the reading entered through reading input means 800 and determines which class the accent type of each accent phrase falls into. Pitch pattern determining means 802 then selects a pitch pattern extracted from a single utterance covering the whole sentence. The mora count within each accent phrase of the selected pattern differs from that of the target, so pitch pattern deforming means 803 modifies the pattern until the mora counts match. For type-n accents it also modifies the pattern to match the accent type. Based on the modified pitch pattern, speech segment selecting means 804 selects speech segments from speech segment database 809, and speech segment deforming means 805 deforms them. Speech segment connecting means 806 concatenates them to synthesize speech, and synthesized speech output means 807 outputs the result.
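The data flow just described can be traced in a short sketch. Several things are assumed rather than taken from the text:
- the reading has already been split into accent phrases with known accent types (the text does not say how means 801 obtains them);
- each stored pattern holds one made-up pitch value per mora;
- segments are looked up one per mora;
- the mora-count fit uses nearest-neighbour resampling for brevity.

Waveform pitch modification and concatenation (means 805 to 807) are left out, and the type-n accent-position adjustment is not modelled. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class AccentPhrase:
    morae: List[str]   # romanized morae of the reading, e.g. ["a", "shi", "ta", "no"]
    accent_type: int   # 0 = flat; k >= 1 = pitch falls after mora k

PitchDB = Dict[Tuple[str, ...], List[List[float]]]  # class key -> per-phrase patterns

def accent_category(accent_type: int) -> str:
    """Classify an accent phrase as '0', '1' or 'n' (role of means 801)."""
    return str(accent_type) if accent_type <= 1 else "n"

def fit_pattern(pattern: List[float], n_morae: int) -> List[float]:
    """Keep mora 1, nearest-neighbour resample the rest (simplified means 803)."""
    if len(pattern) < 2 or n_morae < 2:
        raise ValueError("sketch assumes at least two morae on both sides")
    head, tail = pattern[:1], pattern[1:]
    scale = (len(tail) - 1) / max(n_morae - 2, 1)
    return head + [tail[int(k * scale + 0.5)] for k in range(n_morae - 1)]

def synthesize(phrases: List[AccentPhrase], pitch_db: PitchDB,
               segment_db: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return a (segment id, target F0) plan, one entry per mora."""
    key = tuple(accent_category(p.accent_type) for p in phrases)  # means 801
    stored = pitch_db[key]                                         # means 802, database 808
    plan: List[Tuple[str, float]] = []
    for phrase, pattern in zip(phrases, stored):
        targets = fit_pattern(pattern, len(phrase.morae))          # means 803
        for mora, f0 in zip(phrase.morae, targets):
            plan.append((segment_db[mora], f0))                    # means 804, database 809
    return plan  # means 805-807 would pitch-shift, concatenate and output

# Made-up patterns for the (0, 1, 1) sentence "raion ga / jinrui o / nonde iru"
db: PitchDB = {("0", "1", "1"): [[110.0, 140.0, 140.0, 138.0, 135.0],
                                 [150.0, 120.0, 115.0, 112.0, 110.0],
                                 [130.0, 105.0, 100.0, 98.0, 95.0]]}
phrases = [AccentPhrase(["a", "shi", "ta", "no"], 0),
           AccentPhrase(["te", "n", "ki", "wa"], 1),
           AccentPhrase(["a", "me", "de", "su"], 1)]
segments = {m: "seg_" + m for p in phrases for m in p.morae}
plan = synthesize(phrases, db, segments)
print(plan[:4])  # [('seg_a', 110.0), ('seg_shi', 140.0), ('seg_ta', 138.0), ('seg_no', 135.0)]
```

Because the database is keyed only by the per-phrase class tuple, its size grows as 3 to the power of the number of accent phrases. Per-phrase mora-count mismatches are absorbed by the deforming step rather than by extra recordings.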

[0016] Thus, according to this embodiment, speech is synthesized to follow a pitch pattern extracted from natural speech. Speech whose pitch pattern is stable over the whole sentence can therefore be synthesized using a small pitch pattern database.

[0017]

Effects of the Invention
As described above, according to the present invention, speech is synthesized to follow a pitch pattern extracted from natural speech consisting only of voiced sounds. This suppresses the large pitch fluctuations that appear near unvoiced sounds and eliminates pitch pattern interpolation. A stable global pitch pattern can therefore be obtained directly, and speech with natural intonation can be synthesized. A further advantage is a speech synthesis apparatus whose pitch pattern remains stable over an entire synthesized sentence.

[Brief description of the drawings]

FIG. 1 is a conceptual diagram of the pitch pattern of speech containing unvoiced sounds, in Embodiment 1.

FIG. 2 is a conceptual diagram of the pitch pattern of speech consisting only of voiced sounds, in Embodiment 1.

FIG. 3 is a conceptual diagram of creating a grammatically correct nonsense sentence, in Embodiment 2.

FIG. 4 is a list of example nouns consisting only of voiced sounds, for each mora count and accent type, in Embodiment 2.

FIG. 5 is a list of example verbs consisting only of voiced sounds, for each mora count and accent type, in Embodiment 2.

FIG. 6 is a list of examples of speech created for pitch pattern extraction, in Embodiment 3.

FIG. 7 is a conceptual diagram of pitch pattern categorization, in Embodiment 3.

FIG. 8 is a block diagram of a speech synthesis apparatus using a pitch pattern database, in Embodiment 4.

[Explanation of symbols]

100 … speech waveform
101 to 103 … pitch patterns
104 to 108 … vowel sections
200 … speech waveform
201 … pitch pattern
202 to 205 … vowel sections
301 to 303 … accent phrase sections
304, 305 … noun substitution areas
306 … verb substitution area
701 to 703 … pitch patterns
800 … reading input means
801 … pitch pattern category determining means
802 … pitch pattern determining means
803 … pitch pattern deforming means
804 … speech segment selecting means
805 … speech segment deforming means
806 … speech segment connecting means
807 … synthesized speech output means
808 … pitch pattern database
809 … speech segment database

Continued from the front page: (72) Inventor: Ryo Mochizuki, c/o Matsushita Communication Industrial Co., Ltd., 4-3-1 Tsunashima-Higashi, Kohoku-ku, Yokohama-shi, Kanagawa. F-terms (reference): 5D045 AA09

Claims

[Claims]

1. A speech synthesis method in which a pitch pattern of speech to be synthesized is extracted from natural speech, the pitch of speech segments is changed according to the pitch pattern, and the speech segments are concatenated to synthesize arbitrary speech, characterized in that meaningful natural speech consisting only of voiced sounds is used as the natural speech from which the pitch pattern is extracted.

2. The speech synthesis method according to claim 1, wherein a grammatically correct nonsense sentence is used as the natural speech from which the pitch pattern is extracted.

3. The speech synthesis method according to claim 1, wherein, when speech consisting of voiced sounds with the same number of phrases, number of morae, and accent type as the speech to be synthesized cannot be prepared, part of the pitch pattern of voiced speech with a similar number of phrases, number of morae, and accent type is modified to create a pitch pattern with the desired number of accent phrases, number of morae in each accent phrase, and accent type to be synthesized, and arbitrary speech is synthesized using that pitch pattern.

4. A speech synthesis apparatus comprising: input means for inputting the reading of speech to be synthesized; pitch pattern category determining means for determining the number of phrases, number of morae, and accent type from the input and thereby determining a pitch pattern category; a pitch pattern database storing pitch patterns extracted from sentences of natural speech; pitch pattern determining means for selecting from the pitch pattern database a pitch pattern of the category determined by the pitch pattern category determining means; pitch pattern deforming means for deforming the pitch pattern to match the number of morae and accent type of each accent phrase of the speech to be synthesized; speech segment selecting means for selecting speech segments according to the reading to be synthesized; a speech segment database storing speech segments; speech segment deforming means for deforming the speech segments selected by the speech segment selecting means based on the pitch pattern determined by the pitch pattern determining means; speech segment connecting means for concatenating the speech segments to synthesize speech; and synthesized speech output means for outputting the synthesized speech.