JPH0229800A

JPH0229800A - Voice synthesizing device

Info

Publication number: JPH0229800A
Application number: JP63181125A
Authority: JP
Inventors: Yoshiyuki Hara; 義幸原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-07-20
Filing date: 1988-07-20
Publication date: 1990-01-31

Abstract

PURPOSE: To obtain highly natural synthesized speech under simple control by holding the accent level constant in the proper noun speech output mode, and by varying the accent level in the speech output mode for general text. CONSTITUTION: Word information representing encoded proper nouns, as well as sentence information, is input through the symbol string input part 1 of the speech synthesis device. A rule synthesis processing part 2 analyzes the symbol string information input through the input part 1, converts it into a phoneme sequence, and also determines its accent type information. These pieces of information are supplied to synthesis parameter generation parts 3 and 4, which generate parameters for speech synthesis according to the phoneme sequence and accent type information from the processing part 2 and output the synthesis parameters to a speech synthesis part 5, so that rule-based synthesis is performed easily.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to a speech synthesis device capable of synthesizing, with good naturalness, both proper noun sequences and general sentences that are input as encoded code information.

(Prior Art) In recent years, various speech synthesis devices have been developed that analyze an input character code string to obtain its phonological parameter sequence and prosodic parameter sequence, and then generate and output the synthesized speech represented by the input character code string according to predetermined rules applied to these two sequences. Compared with the conventional record-and-edit method, this kind of rule-based speech synthesis device has the significant advantage that it can easily generate arbitrary words and sentences with good naturalness. For this reason, together with the development of speech recognition technology, it is attracting attention as an important technology for realizing highly natural man-machine interfaces.

However, most of this rule synthesis technology has been researched and developed exclusively for converting general text information into speech. Consequently, in applications where only proper nouns are output as speech, as seen in various information retrieval systems, highly natural synthesized speech cannot always be obtained.

Specifically, when a general sentence such as "otoshite / kudasai" is rule-synthesized, the accent level of the word "kudasai" is set lower than that of the word "otoshite", in consideration of the relative importance of the words making up the sentence, so that highly natural synthesized speech is obtained. However, when proper nouns having the same number of morae and the same accent type are synthesized in succession, variably controlling the accent level according to how the words are connected undeniably makes the result sound less natural instead.

(Problem to be Solved by the Invention) As described above, conventional speech synthesis devices target general sentences exclusively, so it is difficult to ensure naturalness when proper nouns are synthesized in succession.

The present invention has been made in view of these circumstances. Its object is to provide a speech synthesis device that can synthesize speech with good naturalness regardless of whether the input character code string is a general sentence or a proper noun.

[Structure of the Invention] (Means for Solving the Problems) The speech synthesis device according to the present invention comprises: means for synthesizing speech by generating synthesis parameters that always keep the accent level of the pitch pattern of the synthesized output speech constant; and means for synthesizing speech by generating synthesis parameters that allow the accent level of the pitch pattern of the synthesized output speech to be varied. The device is characterized in that these means are used selectively depending on whether the input character code string is a general sentence or a proper noun.

(Function) According to the present invention, synthesized speech is obtained with the accent level held constant in the proper noun speech output mode, and with the accent level varied in the general sentence speech output mode. Highly natural synthesized speech can therefore be obtained easily under simple control.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram of the device of the embodiment. Reference numeral 1 denotes a symbol string input section that receives word information representing encoded proper nouns, as well as sentence information. The rule synthesis processing section 2 analyzes the symbol string information input through the symbol string input section 1, converts it into a phoneme sequence, determines its accent type information, and supplies both to the synthesis parameter generation sections 3 and 4. Based on the phoneme sequence and accent type information supplied by the rule synthesis processing section 2, the synthesis parameter generation sections 3 and 4 generate synthesis parameters for speech synthesis and output them to the speech synthesis section 5 for rule-based synthesis of speech.
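
As a rough illustration of the analysis stage described above, the following Python sketch turns a symbol string into a phoneme sequence plus accent type for each accent phrase. The names `AccentPhrase`, `LEXICON`, and `rule_synthesis_processing`, the "/" delimiter, and the accent type values are all assumptions made for illustration; the patent does not specify data formats or analysis rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccentPhrase:
    morae: tuple[str, ...]  # phoneme sequence, split into morae for simplicity
    accent_type: int        # 0 = flat; n = pitch falls after the n-th mora


# Hypothetical toy lexicon standing in for the analysis rules of section 2.
# The accent types are illustrative values, not taken from the patent.
LEXICON = {
    "otoshite": AccentPhrase(("o", "to", "shi", "te"), 2),
    "kudasai": AccentPhrase(("ku", "da", "sa", "i"), 3),
}


def rule_synthesis_processing(symbols: str) -> list[AccentPhrase]:
    """Convert a '/'-delimited symbol string into a list of accent phrases."""
    return [LEXICON[word] for word in symbols.split("/") if word]


phrases = rule_synthesis_processing("otoshite/kudasai")
for phrase in phrases:
    print("".join(phrase.morae), "accent type", phrase.accent_type)
```

Both synthesis parameter generation sections would consume the same list of accent phrases; only the way they assign accent levels differs.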

The synthesis parameter generation section 3, which serves the proper noun utterance mode, generates synthesis parameters that always keep the accent level of the pitch pattern of the synthesized output speech constant, as shown in FIG. 2(a). That is, it generates the synthesis parameters so that the accent level a1 of the first accent phrase equals the accent level a2 of the second accent phrase. In FIG. 2, the broken line indicates the declining component of the intonation, and the solid line indicates the pitch pattern.
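
The constant-level behavior of FIG. 2(a) can be sketched with a simple additive model: a linearly falling baseline (the broken line) plus an accent component scaled by each phrase's accent level (giving the solid line). The additive model, the Tokyo-style high/low mora pattern, and every numeric value below are assumptions for illustration only; the patent does not disclose how the pitch pattern is computed.

```python
def accent_shape(n_morae: int, accent_type: int) -> list[float]:
    """Return 1.0 for high morae and 0.0 for low morae (assumed pattern)."""
    shape = []
    for i in range(1, n_morae + 1):
        if accent_type == 0:      # flat type: low first mora, then high
            high = i > 1
        elif accent_type == 1:    # head-high: only the first mora is high
            high = i == 1
        else:                     # high from the 2nd mora up to the accent nucleus
            high = 1 < i <= accent_type
        shape.append(1.0 if high else 0.0)
    return shape


def constant_accent_levels(n_phrases: int, level: float = 1.0) -> list[float]:
    """Proper noun mode: every accent phrase gets the same level (a1 = a2 = ...)."""
    return [level] * n_phrases


def pitch_pattern(phrases, levels, base_hz=120.0,
                  declination_hz_per_mora=2.0, accent_hz=40.0):
    """Per-mora pitch: falling baseline plus level-scaled accent component."""
    contour = []
    mora_index = 0
    for (n_morae, accent_type), level in zip(phrases, levels):
        for high in accent_shape(n_morae, accent_type):
            baseline = base_hz - declination_hz_per_mora * mora_index
            contour.append(baseline + accent_hz * level * high)
            mora_index += 1
    return contour


# Two proper nouns with the same mora count and the same accent type.
phrases = [(4, 2), (4, 2)]
contour = pitch_pattern(phrases, constant_accent_levels(len(phrases)))
print(contour)
```

With these assumed values, the accent peak of each phrase rises the same 40 Hz above the falling baseline, which is the a1 = a2 condition described for section 3.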

In contrast, the synthesis parameter generation section 4, which serves the sentence utterance mode, generates synthesis parameters that vary the accent level according to the arrangement of the accent phrases, for example as (a1 − a2 > a3), as shown in FIG. 2(b).
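
One possible way to vary the accent level across accent phrases is sketched below. The exact relation among a1, a2, and a3 is not legible in the source text, so this sketch simply assumes that each later accent phrase receives a lower level than the one before it; the function name `variable_accent_levels` and the `decay` factor are hypothetical.

```python
def variable_accent_levels(n_phrases: int, first_level: float = 1.0,
                           decay: float = 0.8) -> list[float]:
    """Sentence mode (assumed rule): each later accent phrase gets a lower level."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be between 0 and 1 (exclusive)")
    return [first_level * decay ** i for i in range(n_phrases)]


levels = variable_accent_levels(3)
print(levels)  # three strictly decreasing accent levels
```

Any other rule, such as weighting each phrase by the importance of its word, would fit the same interface as long as it returns one level per accent phrase.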

The mode switching section 6 selectively uses the synthesis parameter generation sections 3 and 4 described above, in accordance with mode switching information indicating whether the symbol string to be synthesized is a sequence of proper nouns or a general sentence, and thereby controls the speech synthesis. Through this switching control, synthesized speech with a constant accent level is generated and output for proper nouns, and synthesized speech with a varied accent level is generated and output for general sentences.
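
The selection performed by the mode switching section 6 amounts to a dispatch between two generators. The sketch below assumes the mode is given explicitly as an enum value; `Mode`, `GENERATORS`, `accent_levels`, and the two stand-in generator functions are illustrative names and rules, not part of the patent.

```python
from enum import Enum


class Mode(Enum):
    PROPER_NOUN = "proper_noun"  # sequence of proper nouns
    SENTENCE = "sentence"        # general sentence


def generator_3_constant(n_phrases: int) -> list[float]:
    """Stand-in for section 3: the same accent level for every phrase."""
    return [1.0] * n_phrases


def generator_4_variable(n_phrases: int, decay: float = 0.8) -> list[float]:
    """Stand-in for section 4: an assumed decreasing level per phrase."""
    return [decay ** i for i in range(n_phrases)]


# Mode switching section 6: map each mode to the generator it selects.
GENERATORS = {
    Mode.PROPER_NOUN: generator_3_constant,
    Mode.SENTENCE: generator_4_variable,
}


def accent_levels(mode: Mode, n_phrases: int) -> list[float]:
    """Return the accent levels produced by the generator selected for `mode`."""
    return GENERATORS[mode](n_phrases)


print(accent_levels(Mode.PROPER_NOUN, 3))
print(accent_levels(Mode.SENTENCE, 3))
```

Keeping the two generators behind one lookup table means the rest of the pipeline does not need to know which mode is active.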

As a result, sufficiently high naturalness can be achieved both when proper nouns such as addresses, personal names, and company names are synthesized in succession and when general sentences are synthesized.

The present invention is not limited to the embodiment described above. For example, the number of accent phrases to be synthesized is not limited to two or three, and the form of variable control of the accent level is not particularly limited either. It is also possible to switch the generation form of the synthesis parameters automatically in accordance with, for example, the result of part-of-speech analysis of the input symbol string.
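
The automatic switching mentioned above could, under one simple assumption, treat an input as a proper noun sequence only when every word is tagged as a proper noun. The tag name `"PROPN"`, the mode strings, and the function `choose_mode` are hypothetical; the patent does not describe the part-of-speech analysis or the decision rule.

```python
PROPER_NOUN_TAG = "PROPN"  # hypothetical tag name from an assumed POS analyzer


def choose_mode(pos_tags: list[str]) -> str:
    """Pick the generation mode from part-of-speech tags (assumed rule)."""
    if pos_tags and all(tag == PROPER_NOUN_TAG for tag in pos_tags):
        return "proper_noun"  # constant accent level
    return "sentence"         # variable accent level


print(choose_mode(["PROPN", "PROPN", "PROPN"]))
print(choose_mode(["VERB", "AUX"]))
```

Empty input falls back to the sentence mode here, which is an arbitrary choice for the sketch.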

In addition, the present invention can be implemented with various modifications without departing from its gist.

[Effects of the Invention] As described above, the present invention provides a simple and highly practical speech synthesis device that can effectively generate and output highly natural synthesized speech under simple control, among other significant effects.

[Brief Description of the Drawings]

FIG. 1 is a schematic configuration diagram of a speech synthesis device according to an embodiment of the present invention, and FIG. 2 is a diagram schematically showing the forms of control of the accent level of the synthesized output speech.

1 ... symbol string input section, 2 ... rule synthesis processing section, 3, 4 ... synthesis parameter generation sections, 5 ... speech synthesis section, 6 ... mode switching section.

Claims

[Claims] 1. A speech synthesis device that generates and outputs the synthesized speech represented by symbol string information in accordance with an analysis result of the symbol string information and predetermined rules, characterized by comprising: means for synthesizing speech by generating synthesis parameters that always keep the accent level of the pitch pattern of the synthesized output speech constant; and means for synthesizing speech by generating synthesis parameters that allow the accent level of the pitch pattern of the synthesized output speech to be varied.