JPH01211798A

JPH01211798A - Regular synthesizing device for voice

Info

Publication number: JPH01211798A
Application number: JP63037947A
Authority: JP
Inventors: Masanobu Abe; 匡伸阿部; Hisao Kuwabara; 尚夫桑原; Kiyohiro Kano; 清宏鹿野
Original assignee: A T R JIDO HONYAKU DENWA KENKYUSHO KK
Current assignee: A T R JIDO HONYAKU DENWA KENKYUSHO KK
Priority date: 1988-02-19
Filing date: 1988-02-19
Publication date: 1989-08-24
Anticipated expiration: 2014-04-12
Also published as: JP2880508B2

Abstract

PURPOSE: To give synthesized speech the individual characteristics of a second speaker while keeping the speaker-specific voice information that must be prepared to a minimum. This is achieved with two components. A rule synthesis means synthesizes the speech signal of a first, standard speaker. A voice conversion means then converts that speech signal into the speech signal of the second speaker, whose individual characteristics are to be added. CONSTITUTION: A rule synthesis part 20, which includes a speech unit set information file 21, receives a character information signal s1 and synthesizes the speech signal s2 of standard speaker A by referring to file 21. A voice quality conversion part 30 then receives speaker A's speech signal s2 and refers to a previously registered voice personal information file 40 for speaker B, the speaker whose individuality is to be given to the speech. Using this file, part 30 converts the voice quality of speaker A's speech signal (the rule-synthesized speech) into that of speaker B and outputs speaker B's speech signal s4. Consequently, a speaker's individuality is given to rule-synthesized speech while the speaker-specific voice information is kept to a minimum.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application] The present invention relates to a rule synthesis device for speech. In particular, it relates to a rule synthesis device that converts the voice quality of the synthesized speech before outputting it.

[Prior Art and Problems to be Solved by the Invention] Conventional rule synthesis systems output synthesized speech in only one voice quality or a few, such as a male voice, a female voice, a childlike voice, and an elderly voice. When a rule synthesis system is actually used, however, the output synthesized speech should have individual characteristics, just as each person's voice has its own individuality.

Separately, there is also demand for rule-synthesized speech that has the voice quality of a specific, chosen person.

However, conventional rule synthesis systems have the following limitations:

(1) The speech units used for rule synthesis are created from several hundred utterances by one speaker, which places a heavy burden on that speaker.

(2) Creating speech units is difficult to do fully automatically and requires manual work, so creating speech units for many speakers is practically impossible.

(3) If a separate set of speech units for rule synthesis is created for each speaker, the memory needed to store them becomes enormous.

For reasons such as these, the above demands could not be met.

The present invention was made to solve the above problems. Its object is to give rule-synthesized speech the individuality of a speaker's voice while keeping the burden as small as possible on three parties: the speaker, the person who prepares the data for rule synthesis, and the rule synthesis system.

[Means for Solving the Problems] The speech rule synthesis device according to the present invention includes three means:

- Rule synthesis means. It receives a character information signal from outside and synthesizes the speech signal of a first, standard speaker by referring to that speaker's set of speech unit signals.
- Conversion signal file means. It stores the conversion signals needed to convert the first speaker's voice into the voice of a second speaker, whose individual characteristics are to be given to the speech.
- Voice conversion means. It converts the synthesized speech signal of the first speaker into the speech signal of the second speaker, based on the conversion signals stored in the conversion signal file means.

[Operation] In the speech rule synthesis device of this invention, the rule synthesis means first synthesizes the speech signal of the standard first speaker by rule. The voice conversion means then converts that speech signal into the speech signal of the second speaker, who has individual characteristics, according to the conversion signals stored in the conversion signal file means. As a result, there is no need to prepare a separate set of speech units for rule synthesis for each individual whose voice differs.

[Embodiment of the Invention] FIG. 1 is a block diagram showing one embodiment of the rule synthesis device according to the invention.

Referring to FIG. 1, a character information signal s1 is applied from outside to an input unit 10. Signal s1 contains character strings (characters, accent types, and the like) and prosody signals. A rule synthesis unit 20, which includes a speech unit set information file 21, receives the character information signal s1. It then synthesizes the speech signal s2 of standard speaker A by referring to the speech unit set information file 21. File 21 is a database that stores, in advance, information on standard speaker A's speech units, such as phonemes and syllables.

A voice quality conversion unit 30 receives speaker A's speech signal s2. It refers to a pre-registered voice personal information file 40 for speaker B, the speaker whose individuality is to be given to the speech. Using this file, it converts the voice quality of speaker A's speech signal s2 (the rule-synthesized speech) into the voice quality of speaker B. The result is output as speaker B's speech signal s4.
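The two-stage structure of FIG. 1 can be sketched as function composition. This is a minimal illustration only. The names `make_personalized_synthesizer`, `rule_synthesize`, and `convert_voice` are hypothetical, and a signal is assumed to be a plain list of samples. The sketch shows that only the conversion stage depends on speaker B. Supporting a new speaker therefore means supplying a new converter (file 40), not a new speech unit set (file 21).

```python
from typing import Callable, List

Signal = List[float]  # assumption: a speech signal is a list of samples


def make_personalized_synthesizer(
    rule_synthesize: Callable[[str], Signal],   # unit 20 + file 21: s1 -> s2 (speaker A)
    convert_voice: Callable[[Signal], Signal],  # unit 30 + file 40: s2 -> s4 (speaker B)
) -> Callable[[str], Signal]:
    """Chain rule synthesis and voice quality conversion as in FIG. 1."""

    def synthesize(character_info: str) -> Signal:
        s2 = rule_synthesize(character_info)  # standard speaker A's synthesized speech
        return convert_voice(s2)              # converted to speaker B's voice quality

    return synthesize


if __name__ == "__main__":
    # Toy stand-ins: "synthesis" emits one sample per character, "conversion" scales it.
    to_a = lambda text: [1.0] * len(text)
    speaker_b = make_personalized_synthesizer(to_a, lambda s: [0.5 * x for x in s])
    speaker_c = make_personalized_synthesizer(to_a, lambda s: [2.0 * x for x in s])
    print(speaker_b("abc"), speaker_c("abc"))
```

Both personalized synthesizers share the same speaker A stage (`to_a`). Only the cheap, per-speaker conversion function differs.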

The voice quality conversion unit 30 uses a voice conversion method based on vector quantization. This method converts voice quality between two speakers: standard speaker A, on whom the rule synthesis unit 20 is based, and speaker B, whose individuality is to be given to the speech. The conversion uses a conversion codebook, which associates each speaker's codebook with the other's.

The conversion codebook contains the power, pitch frequency, and spectral information of the voice of the speaker whose individuality is to be added. In the codebook, the voice characteristics are represented discretely. The voice personal information file 40 in FIG. 1 holds the contents of this conversion codebook.

FIG. 2 is a flow diagram showing the procedure for creating the conversion codebooks.

The procedure for obtaining the conversion codebooks 41, 42, and 43 is described below with reference to FIG. 2.

First, in steps 301 and 302, LPC analysis is applied to the speech of speaker A and of speaker B to obtain the power, pitch frequency, and spectral parameters. Next, the spectral parameters are vector-quantized in steps 303 and 304, the power is scalar-quantized in steps 305 and 306, and the pitch frequency is scalar-quantized in steps 307 and 308.
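Steps 303 to 308 quantize each parameter against a per-speaker codebook. The text does not say how the codebooks are trained, so the sketch below assumes plain k-means (Lloyd iteration) with a naive initialization. The function names are hypothetical. Scalar quantization of power and pitch is shown as a nearest-level lookup. Vector quantization of the spectral parameters is shown as a nearest-codevector lookup under squared Euclidean distance, which is also an assumption because the distortion measure is not specified.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_index(x: Vector, codebook: Sequence[Vector]) -> int:
    """Vector quantization: index of the closest codevector (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))


def scalar_quantize(value: float, levels: Sequence[float]) -> int:
    """Scalar quantization (power, pitch): index of the nearest level."""
    return min(range(len(levels)), key=lambda i: abs(value - levels[i]))


def train_codebook(data: Sequence[Vector], size: int, iters: int = 20) -> List[List[float]]:
    """Assumed k-means (Lloyd) training; the patent does not specify the method."""
    codebook = [list(v) for v in data[:size]]  # naive init: the first `size` vectors
    for _ in range(iters):
        cells: List[List[Vector]] = [[] for _ in codebook]
        for v in data:
            cells[nearest_index(v, codebook)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codevector
                codebook[i] = [sum(v[d] for v in cell) / len(cell)
                               for d in range(len(cell[0]))]
    return codebook
```

Under this reading, each speaker would get three codebooks (spectrum, power, pitch), each trained on that speaker's own analysis frames.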

To establish the time correspondence between the utterances of speaker A and speaker B, step 309 performs DP matching by the Double Split method on the spectral parameters. Based on the resulting time alignment, steps 310, 311, and 312 determine the correspondence between speaker A and speaker B for each feature and build a histogram of that correspondence. The spectral-parameter and power conversion codebooks 41 and 43 are obtained as linear combinations of speaker B's feature vectors, using this histogram as the weights. The pitch frequency conversion codebook 42 is created from the speaker B feature vector that gives the maximum value of this histogram.
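The codebook-mapping step can be sketched as follows. Two parts are labeled substitutions. First, plain dynamic time warping (DTW) stands in for the Double Split DP matching named in the text. Second, the histogram weights are assumed to be normalized so that they sum to 1 for each speaker A code. The inputs are per-frame code indices for both speakers (from steps 303 to 308), plus the spectral vectors used for alignment. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dtw_path(x: Sequence[Vector], y: Sequence[Vector]) -> List[Tuple[int, int]]:
    """Plain DTW (substitute for Double Split DP matching): aligned frame pairs (i, j)."""
    n, m, inf = len(x), len(y), float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = sq_dist(x[i - 1], y[j - 1]) + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:  # backtrack along the cheapest predecessor
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    return path[::-1]


def build_mapping(codes_a: Sequence[int], codes_b: Sequence[int],
                  path: Sequence[Tuple[int, int]], codebook_b: Sequence[Vector],
                  mode: str = "weighted") -> Dict[int, List[float]]:
    """Map each speaker A code to a speaker B vector via the co-occurrence histogram.

    mode="weighted": histogram-weighted average of B codevectors (spectrum, power).
    mode="max": the B codevector with the highest count (pitch).
    Speaker A codes that never occur on the path get no entry.
    """
    hist: Dict[int, Counter] = defaultdict(Counter)
    for i, j in path:
        hist[codes_a[i]][codes_b[j]] += 1
    mapping: Dict[int, List[float]] = {}
    for a, counts in hist.items():
        if mode == "max":
            mapping[a] = list(codebook_b[max(counts, key=counts.get)])
        else:
            total = sum(counts.values())
            dim = len(codebook_b[0])
            mapping[a] = [sum(c * codebook_b[b][d] for b, c in counts.items()) / total
                          for d in range(dim)]
    return mapping
```

Power and pitch can be passed as one-element vectors. In that case the same `build_mapping` call serves all three conversion codebooks, with `mode="max"` for pitch.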

FIG. 3 is a flow diagram showing the voice quality conversion procedure in the voice quality conversion unit 30.

The voice quality conversion method using the conversion codebooks is described below with reference to FIG. 3. In step 401, speaker A's speech signal s2 is subjected to LPC analysis, which extracts its power, pitch frequency, and spectral parameters.
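Step 401, like steps 301 and 302, extracts power, pitch frequency, and spectral parameters by LPC analysis. The text gives no analysis settings, so this per-frame sketch makes several assumptions:

- a Hamming window;
- the autocorrelation method with Levinson-Durbin recursion for the LPC coefficients, which serve here as the spectral parameters;
- mean-square power;
- an autocorrelation-peak pitch estimate searched between 60 and 400 Hz, with a hypothetical voicing threshold of 0.3;
- an LPC order of 12.

The 12 kHz sampling rate is taken from the hardware description of FIG. 4.

```python
import math
from typing import List, Sequence, Tuple


def autocorr(x: Sequence[float], max_lag: int) -> List[float]:
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Return a = [1, a1, ..., ap] for the inverse filter A(z) = 1 + sum a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. silent) frame: stop the recursion
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def analyze_frame(frame: Sequence[float], fs: int = 12_000, order: int = 12,
                  f_min: float = 60.0, f_max: float = 400.0,
                  voicing_threshold: float = 0.3) -> Tuple[float, float, List[float]]:
    """Return (power, pitch_hz, lpc) for one frame; pitch_hz = 0.0 means unvoiced."""
    n = len(frame)
    power = sum(s * s for s in frame) / n
    # Pitch: peak of the normalized autocorrelation of the raw frame.
    lag_lo, lag_hi = int(fs / f_max), min(int(fs / f_min), n - 1)
    r_raw = autocorr(frame, lag_hi)
    pitch = 0.0
    if r_raw[0] > 0.0:
        best = max(range(lag_lo, lag_hi + 1), key=lambda k: r_raw[k])
        if r_raw[best] / r_raw[0] >= voicing_threshold:
            pitch = fs / best
    # Spectrum: LPC coefficients from the Hamming-windowed frame.
    w = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    lpc = levinson_durbin(autocorr([s * wi for s, wi in zip(frame, w)], order), order)
    return power, pitch, lpc
```

For a 40 ms frame (480 samples) of a 200 Hz sine, the autocorrelation peaks at a lag of 60 samples, so the estimated pitch is 12000 / 60 = 200 Hz.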

Next, speaker A's parameters are quantized with speaker A's codebooks. In step 402, the spectral parameters are vector-quantized with speaker A's spectral codebook. In step 403, the power is scalar-quantized with speaker A's power codebook. In step 404, the pitch frequency is scalar-quantized with speaker A's pitch frequency codebook. The conversion codebooks 41, 42, and 43 described above are used when these quantized parameters are decoded. Step 405 uses the speaker A to speaker B spectrum conversion codebook 41, step 406 uses the power conversion codebook 43, and step 407 uses the pitch frequency conversion codebook 42. Finally, in step 408, speaker B's speech signal s4 is synthesized from the converted parameters.
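Steps 402 to 407 amount to encoding with speaker A's codebooks and decoding with the conversion codebooks. The sketch below makes two assumptions. Codebooks are lists of vectors, with power and pitch stored as one-element vectors so that one nearest-neighbour search serves both scalar and vector quantization. Conversion codebooks are dictionaries keyed by the speaker A code index. The names are hypothetical. Step 408, which resynthesizes a waveform from the converted parameters, is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Codebooks = Dict[str, Sequence[Vector]]       # speaker A: "spectrum", "power", "pitch"
Mappings = Dict[str, Dict[int, List[float]]]  # conversion codebooks 41, 43, 42


def nearest_index(x: Vector, codebook: Sequence[Vector]) -> int:
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))


def convert_frame(power: float, pitch: float, spectrum: Vector,
                  codebooks_a: Codebooks, mappings: Mappings) -> Tuple[float, float, List[float]]:
    """Quantize speaker A's frame parameters, then decode them as speaker B's."""
    spec_code = nearest_index(spectrum, codebooks_a["spectrum"])  # step 402
    power_code = nearest_index([power], codebooks_a["power"])     # step 403
    pitch_code = nearest_index([pitch], codebooks_a["pitch"])     # step 404
    new_spectrum = list(mappings["spectrum"][spec_code])          # step 405, codebook 41
    new_power = mappings["power"][power_code][0]                  # step 406, codebook 43
    new_pitch = mappings["pitch"][pitch_code][0]                  # step 407, codebook 42
    return new_power, new_pitch, new_spectrum
```

In this sketch, a pitch level of 0 in both speakers' codebooks is one possible (assumed) way to carry unvoiced frames through unchanged.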

FIG. 4 is a schematic block diagram showing the hardware configuration of a rule synthesis system that includes the rule synthesis device according to the present invention.

Referring to FIG. 4, this rule synthesis system includes an amplifier 1, a low-pass filter 2, an A/D converter 3, and a computer system 4. The amplifier 1 amplifies the input speech signal. The low-pass filter 2 removes aliasing noise from the amplified signal. The A/D converter 3 samples the speech signal at 12 kHz and converts it into a 16-bit digital signal. The computer system 4 includes a rule synthesis device (arithmetic processing unit) 5, a magnetic disk 6, terminals 7, and a printer 8. The speech rule synthesis device according to the present invention is implemented within the rule synthesis device 5 of FIG. 4.
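The A/D settings in FIG. 4 determine a few simple figures. At a 12 kHz sampling rate the Nyquist frequency is 6 kHz, so the low-pass filter 2 must suppress components above 6 kHz to prevent aliasing. The filter's actual cutoff is not given. At 16 bits per sample, one channel of speech takes 24,000 bytes per second. The sketch below computes these figures. It also shows one common way to map a normalized sample in [-1, 1] to a signed 16-bit integer; that mapping is an assumption, not the converter's specified behaviour.

```python
SAMPLE_RATE_HZ = 12_000  # from the FIG. 4 description
BITS_PER_SAMPLE = 16     # from the FIG. 4 description

NYQUIST_HZ = SAMPLE_RATE_HZ / 2                           # 6000.0 Hz
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8  # 24000 bytes/s (mono)


def to_int16(x: float) -> int:
    """Quantize a normalized sample to a signed 16-bit integer, with clipping."""
    full_scale = 2 ** (BITS_PER_SAMPLE - 1)  # 32768
    return max(-full_scale, min(full_scale - 1, round(x * full_scale)))


if __name__ == "__main__":
    print(NYQUIST_HZ, BYTES_PER_SECOND, [to_int16(v) for v in (-1.0, 0.0, 0.5, 1.0)])
```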

[Effects of the Invention] As described above, the present invention includes two means. A rule synthesis means synthesizes the speech signal of a standard first speaker. A voice conversion means converts that speech signal into the speech signal of a second speaker whose personal characteristics are to be added. The second speaker's personal characteristics can therefore be given to the rule-synthesized speech of the first, standard speaker, while the personal voice information that must be prepared in advance is kept to a minimum.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing one embodiment of the rule synthesis device according to the present invention. FIG. 2 is a flow diagram showing the procedure for creating the conversion codebooks. FIG. 3 is a flow diagram showing the voice quality conversion procedure in the voice quality conversion unit. FIG. 4 is a schematic block diagram showing the hardware configuration of a rule synthesis system that includes the rule synthesis device according to the present invention.

In the figures:
1: amplifier
2: low-pass filter
3: A/D converter
4: computer system
5: rule synthesis device
10: input unit
20: rule synthesis unit
21: speech unit set information file
30: voice quality conversion unit
40: voice personal information file
50: output unit
s1: character information signal
s2: speech signal of speaker A
s3: voice personal information signal of speaker B
s4: speech signal of speaker B

Claims

[Scope of Claims] A speech rule synthesis device comprising:

- rule synthesis means that includes a standard set of speech unit signals of a first speaker, receives a character information signal from outside, and synthesizes the speech signal of the first speaker by referring to that set of speech unit signals;
- conversion signal file means that stores the conversion signals needed to convert the first speaker's voice into the voice of a second speaker, whose personal characteristics are to be given to the speech; and
- voice conversion means that receives the synthesized speech signal of the first speaker and converts it into the speech signal of the second speaker, based on the conversion signals stored in the conversion signal file means.