JP3059751B2

JP3059751B2 - Residual driven speech synthesizer

Info

Publication number: JP3059751B2
Application number: JP2249498A
Authority: JP
Inventors: 徹北村
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1990-09-18
Filing date: 1990-09-18
Publication date: 2000-07-04
Anticipated expiration: 2015-07-04
Also published as: JPH04125699A

Description

[Detailed Description of the Invention]

(A) Field of industrial application

The present invention relates to a speech synthesizer capable of uttering arbitrary words, and more particularly to a residual-driven speech synthesizer, which uses a prediction residual as its driving sound source.

(B) Prior art

In recent years, rule-based synthesis methods for synthesizing speech from arbitrary text have been actively studied, and some have already been prototyped or put to practical use in newspaper proofreading devices, reading machines for the blind, and the like.

A rule-based synthesizer for synthesizing speech from arbitrary text, for example, analyzes the input text to determine its reading and accents; determines the required speech units (for example, CVC units), which are the units of synthesis, from phonological rules and concatenates them; determines the voice pitch and other prosodic features from prosodic rules; generates a time series of speech parameters and a pitch pattern; and then generates synthesized speech by configuring a sound source and a digital filter from these parameters.

LSP parameters and formants are commonly used as the speech parameters for such synthesis methods, while impulses and white noise have been used as the sound source in order to reduce memory and simplify processing.

It has been demonstrated that, in linear-prediction-based speech synthesis such as LSP synthesis, synthesized speech close to the original can be obtained by using the prediction residual as the driving sound source. As shown in "Proceedings of the Autumn Meeting of the Acoustical Society of Japan, 1981 (Showa 56), 1-2-16", a method that uses the residual as the driving sound source has also been proposed for rule-based synthesis. In this method, a residual waveform is stored for every speech unit, alongside the speech units that serve as the synthesis units for rule-based synthesis, and is used as the driving sound source during synthesis.

However, using the residual as the driving sound source in rule-based synthesis raises the following problem. Rule-based synthesis must generate synthesized speech at various pitch periods, so the pitch period of the sound source must be arbitrarily changeable. When impulses and white noise are the sound source, the pitch period can be changed simply by changing the time interval between impulses; when the residual is the driving sound source, the pitch period of the residual itself must be changed by some method.

Accordingly, as also shown in the proceedings cited above, the pitch period is generally changed by padding the extended portion with zeros when the period is lengthened, and by truncating the residual waveform partway when it is shortened. When the residuals are then concatenated at the changed pitch period, the spectrum of the residual is distorted, which degrades sound quality.
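The zero padding and truncation described above can be illustrated with a minimal Python sketch. It assumes that zeros are appended at the end of the stored pitch period and that truncation drops trailing samples; the function name `change_pitch_period` and the sample values are hypothetical and do not come from the cited proceedings.

```python
def change_pitch_period(residual, target_period):
    """Return one pitch period of excitation with exactly target_period samples.

    residual: samples of one stored pitch period.
    Lengthening appends zeros; shortening drops trailing samples
    (both placements are assumptions for this sketch).
    """
    if target_period < 0:
        raise ValueError("target_period must be non-negative")
    current = len(residual)
    if target_period >= current:
        # Lengthen: pad the extended portion with zero data.
        return list(residual) + [0.0] * (target_period - current)
    # Shorten: cut the residual waveform off partway.
    return list(residual[:target_period])


stored = [0.9, -0.4, 0.2, 0.1, -0.05]     # hypothetical 5-sample residual
longer = change_pitch_period(stored, 7)   # two zeros appended
shorter = change_pitch_period(stored, 3)  # last two samples dropped
```

The larger the gap between the stored length and the target length, the more of the waveform is replaced by zeros or discarded, which is the source of the spectral distortion discussed in the text.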

In response, as shown in the more recent "Proceedings of the Spring Meeting of the Acoustical Society of Japan, 1989 (Heisei 1), 2-7-18", a method has been proposed that determines the position at which the residual is cut out so that the spectral distortion caused by the pitch period change is minimized. It is reported that high-quality synthesized speech was obtained for male voices under pitch period changes; for female voices, however, where the effect of changing the pitch period by zero padding and truncation is large, the synthesized speech degrades considerably.

(C) Problems to be solved by the invention

To solve the above problem, the present invention focuses on the correlation between the amount of pitch change and the amount of sound quality degradation. For each speech unit required in rule-based synthesis, it stores a plurality of residuals with different pitch periods, selects from among them the residual whose pitch period is closest to that of the speech to be synthesized, and uses it as the driving sound source. The invention thereby realizes a residual-driven speech synthesizer that can avoid the degradation of synthesized speech caused by changing the pitch period.

(D) Means for solving the problems

The residual-driven speech synthesizer of the present invention comprises: a first memory that stores speech units, each a sequence of the speech parameters required for speech synthesis; a second memory that stores the residuals corresponding to each speech unit; a phoneme symbol string generation unit that generates, from the content to be uttered, a symbol string indicating the required speech units; a pitch pattern generation unit that determines the change in pitch period from the content to be uttered; a speech unit connection unit that sequentially connects the required speech units based on the symbol string generated by the phoneme symbol string generation unit; a speech synthesis filter that synthesizes speech using the speech parameters contained in the connected speech units as coefficients; a driving sound source generation unit that takes the residual corresponding to a speech unit as the driving sound source, changes the pitch period of the residual according to the pitch period at each time point determined by the pitch pattern generation unit, and inputs it to the synthesis filter; and a residual selection circuit that selects a specific residual from among the plurality of residuals stored in the second memory.

(E) Operation

In a residual-driven speech synthesizer, the pitch period has conventionally been changed by inserting zero data into part of the residual or by truncating part of it, which degrades sound quality. Experiments show, however, that when the pitch period of the residual is changed, whether the period is lengthened (the pitch lowered) or shortened (the pitch raised), the larger the change in pitch period, the worse the subjective evaluation score and the greater the degradation in sound quality.

Because the residual-driven speech synthesizer of the present invention is provided with a residual selection circuit that selects a specific residual from among the plurality of residuals stored in the second memory, a plurality of residuals with different pitch periods can be stored in the second memory for each speech unit stored in the first memory. The residual selection circuit selects a residual with an appropriate pitch period from the second memory according to the pitch period at each time point determined by the pitch pattern generation unit, and the driving sound source generation unit applies the necessary pitch period change to the selected residual.

(F) Embodiment

For comparison with the residual-driven speech synthesizer of the present invention, a conventional apparatus is described first.

FIG. 1 shows an example configuration of a general residual-driven rule-based synthesizer. The figure does not include the language processing part; the input consists of kana characters, accent position information, and the like.

In the apparatus of FIG. 1, the input information is first entered into the character string buffer (1). For example, when "た＊べにき＊た。" (ta*beniki*ta.) is entered as input information, the phoneme symbol string generation unit (2) converts the input information stored in the character string buffer (1) into phoneme symbols indicating the required speech units. This example assumes that the synthesis unit is the CV unit, so a phoneme symbol string such as that shown in FIG. 2 is stored in the phoneme symbol string buffer (3).

The speech unit memory (4) stores the speech parameters corresponding to each CV unit, for example LSP coefficients. According to the phoneme symbols stored in the phoneme symbol string buffer (3), the required speech units are read sequentially from the speech unit memory (4) into the speech unit connection unit (5). The speech units read out are connected in the speech unit connection unit (5), subjected to connection length adjustment, interpolation, and similar processing, and then stored in the speech parameter buffer (6).

Meanwhile, the pitch pattern generation unit (7) generates a pitch change pattern from the accent information (＊) and the phrase boundary information (spaces) stored in the character string buffer (1). FIG. 3 illustrates the process of generating the pitch pattern for the example "た＊べにき＊た。". The accent component of FIG. 3(b), which falls immediately after the accent position (＊), is added to the phrase component of FIG. 3(a), which declines over the entire sentence, to generate the pitch change pattern shown in FIG. 3(c).
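As a rough illustration of how the two components combine, the following Python sketch adds a linearly declining phrase component to a rectangular accent component. The linear and rectangular shapes, the addition in Hz, the function name `pitch_pattern`, and all numeric values are assumptions made for illustration; the actual curves are those of FIG. 3.

```python
def pitch_pattern(n_frames, phrase_start_hz, phrase_end_hz, accent_spans, accent_hz):
    """Sum a declining phrase component and a stepped accent component.

    accent_spans: list of (start, end) frame ranges, end exclusive, during
    which the accent component is raised; it falls to zero right after end.
    """
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    step = (phrase_end_hz - phrase_start_hz) / (n_frames - 1)
    # Phrase component: declines over the whole sentence.
    phrase = [phrase_start_hz + step * i for i in range(n_frames)]
    # Accent component: raised up to the accent position, then dropped.
    accent = [
        accent_hz if any(start <= i < end for start, end in accent_spans) else 0.0
        for i in range(n_frames)
    ]
    return [p + a for p, a in zip(phrase, accent)]


# Hypothetical 5-frame sentence with the accent falling after frame 1.
pattern = pitch_pattern(5, 300.0, 200.0, [(0, 2)], 50.0)
```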

The residual waveform memory (8) stores, for each speech unit, a residual waveform and its pitch period. The residual waveform and pitch period corresponding to each sequentially read speech unit are read into the driving sound source generation unit (9), the pitch is changed according to the pitch change pattern generated by the pitch pattern generation unit (7), and the results are concatenated and stored in the driving sound source buffer (10).

The driving sound source stored in the driving sound source buffer (10) is input to the synthesis filter (11) as the sound source, and synthesized speech is generated using the speech parameters stored in the speech parameter buffer (6) as the coefficients of the synthesis filter (11). The synthesized speech is converted into an analog signal by the DA converter (12) and output from the speaker (13).
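The role of the synthesis filter can be sketched as an all-pole recursion driven by the excitation. This is only an illustrative sketch: it assumes the speech parameters have already been converted to direct-form predictor coefficients `a` (the text stores LSP coefficients, and that conversion is not shown), and the function name `all_pole_synthesis` is hypothetical.

```python
def all_pole_synthesis(excitation, a):
    """Filter the excitation through 1 / (1 + a[0] z^-1 + a[1] z^-2 + ...).

    Implements y[n] = x[n] - sum_k a[k] * y[n - 1 - k], with zero initial state.
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                acc -= coeff * out[n - 1 - k]
        out.append(acc)
    return out


# A single impulse through a one-pole filter decays geometrically.
response = all_pole_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5])
```

In the apparatus, the excitation would be the pitch-adjusted residual from the driving sound source buffer (10) rather than a single impulse, and the coefficients would change over time with the speech parameters.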

As shown in FIG. 4, in the driving sound source generation unit (9) of the conventional residual-driven speech synthesizer, the residual waveform read from the residual waveform memory (8) is stored in the residual waveform buffer (91), and the pitch period of that residual waveform is stored in the residual pitch period register (92). Meanwhile, the pitch change pattern generated by the pitch pattern generation unit (7) is stored in the generated pitch period buffer (93) as the pitch period at each time point. Of the pitch periods stored in the generated pitch period buffer (93), the pitch period of the speech to be synthesized at that time point is set in the target pitch period register (94). The difference unit (95) calculates the difference between the pitch period of the residual waveform being read, held in the residual pitch period register (92), and the pitch period of the speech to be synthesized, set in the target pitch period register (94), and stores it in the pitch period change value register (96). Based on the contents of the pitch period change value register (96), the pitch control circuit (97) lengthens the pitch period by inserting zero data equal to the change value into the residual stored in the residual waveform buffer (91) when the pitch period change value is positive, and shortens the pitch period by truncating the residual waveform when the change value is negative. It then stores the residual waveform in the driving sound source buffer (10).

Speech with the desired pitch change pattern can be synthesized with the above configuration. With this conventional method, however, suppose for example that the residual waveform memory (8) holds a residual waveform for the speech unit "ta" with a pitch period of 33, that is, a pitch frequency of 300 Hz (at a sampling frequency of 10 kHz). In "た＊べにき＊た。", the first "た" (ta) must be synthesized at an average of about 400 Hz and the last "た" (ta) at an average of about 220 Hz, so zero padding or truncation of around 10 samples of the pitch period is required (400 Hz = pitch period 25, 220 Hz = pitch period 45). The spectrum of the residual is distorted and the synthesized speech degrades. For longer sentences, the amount of pitch period change may become even larger.
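The pitch periods quoted in this example follow from dividing the sampling frequency by the pitch frequency. A small Python check, assuming the result is truncated to a whole number of samples (the text does not state the rounding rule) and using the hypothetical helper `pitch_period_in_samples`:

```python
def pitch_period_in_samples(f0_hz, sampling_hz=10_000):
    """Pitch period in samples, truncated to an integer (assumed rounding rule)."""
    return int(sampling_hz / f0_hz)


stored = pitch_period_in_samples(300)  # stored "ta" residual
first = pitch_period_in_samples(400)   # first "ta" of the sentence
last = pitch_period_in_samples(220)    # last "ta" of the sentence

# Samples to add (+) or remove (-) when starting from the stored period.
changes = [first - stored, last - stored]
```

The two changes are 8 and 12 samples in magnitude, which is what the text summarizes as "around 10".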

In contrast, FIG. 5 shows an example configuration of the driving sound source generation unit (9) that embodies the present invention.

In the apparatus of the present invention shown in FIG. 5, a plurality of residual waveforms with different pitch periods, for example six, are stored in the residual waveform memory (8) for the same speech unit. For each speech unit, the start addresses at which the six residual waveforms are stored are set in residual address register 1 (981) through residual address register 6 (986). At the same time, the pitch period of each residual waveform is read out and set in residual pitch period register 1 (921) through residual pitch period register 6 (926).

FIG. 6 shows an example of the pitch periods set in residual pitch period register 1 (921) through residual pitch period register 6 (926). As shown in FIG. 6, residual waveforms with various pitch periods are stored in the residual waveform memory (8) for each speech unit.

Meanwhile, the pitch change pattern generated by the pitch pattern generation unit (7) is stored in the generated pitch period buffer (93) as the pitch period at each time point. Of the pitch periods stored in the generated pitch period buffer (93), the pitch period of the speech to be synthesized at that time point is set in the target pitch period register (94).

In the example of FIG. 3, the pitch frequency of 400 Hz for the first "た" (ta) gives a pitch period of 25 (at a sampling frequency of 10 kHz), which is set in the target pitch period register (94). The difference unit (95) first reads the value 20 stored in residual pitch period register 1 (921), takes its difference from 25, the value of the target pitch period register (94), and outputs the difference value 5, which is fed to one input of the comparator (99). The pitch period change value register (96), connected to the output of the comparator (99), holds the smallest difference value found so far; its initial value is set to a large value, 100. The comparator compares the 100 set in the pitch period change value register (96) with the 5 output by the difference unit (95), outputs the value with the smaller absolute value, 5, and sets it in the pitch period change value register (96). At the same time, the contents of residual address register 1 (981), which corresponds to the residual pitch period register 1 (921) currently input to the difference unit (95), are set in the residual address register (98).

Next, the difference unit (95) reads the value 26 stored in residual pitch period register 2 (922), takes its difference from 25, the value of the target pitch period register (94), and outputs the difference value 1, which is fed to one input of the comparator (99). At this point the pitch period change value register (96), connected to the output of the comparator (99), holds the smallest difference value so far, 5. The comparator compares the 5 set in the pitch period change value register (96) with the 1 output by the difference unit (95), outputs the value with the smaller absolute value, 1, and sets it in the pitch period change value register (96). At the same time, the contents of residual address register 2 (982), which corresponds to the residual pitch period register 2 (922) currently input to the difference unit (95), are set in the residual address register (98). Conversely, when the absolute value of the value already set in the pitch period change value register (96) is smaller, the value of the residual address register (98) is left unchanged.

By repeating the above operation, the pitch period 26, which is closest to the pitch period 25 to be synthesized, is selected from residual pitch period registers 1 to 6, and the difference value 1, that is, the amount by which the pitch period must be changed, is set in the pitch period change value register (96). The residual address register (98) is set to the value of residual address register 2 (982), which corresponds to the selected residual pitch period register 2 (922). When the first "た" (ta) is synthesized, the residual is read into the residual waveform buffer (91) from the address stored in the residual address register (98), and the pitch control circuit (97) inserts one sample of zero data.

Similarly, for the last "た" (ta), the start address in residual address register 5 (985), whose pitch period is 44, is stored in the residual address register (98), and the residual waveform is read into the residual waveform buffer (91) from that address. The pitch period to be synthesized is 45 and the pitch period change value register (96) is finally set to -1, so the pitch control circuit (97) truncates the residual waveform by one sample.
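The register-by-register search performed by the difference unit (95) and comparator (99) amounts to a nearest-value selection. The Python sketch below mirrors that loop under stated assumptions: the change value is computed as stored period minus target period (a sign convention chosen so that the results match the values quoted in the worked example), the register contents 32, 38, and 50 are hypothetical fill-ins (only 20, 26, and 44 appear in the text), and the function name `select_residual` is illustrative.

```python
def select_residual(target_period, stored_periods, initial_change=100):
    """Return (register_number, change) for the stored period nearest the target.

    The running best starts at a large initial value and is replaced whenever
    a difference with a smaller absolute value is found, as the comparator does.
    change = stored - target (assumed sign convention).
    """
    best_register = None
    best_change = initial_change
    for register, stored in enumerate(stored_periods, start=1):
        change = stored - target_period
        if abs(change) < abs(best_change):
            best_register, best_change = register, change
    return best_register, best_change


# Registers 1, 2 and 5 hold 20, 26 and 44 in the text; 32, 38 and 50 are assumed.
STORED_PERIODS = [20, 26, 32, 38, 44, 50]

first_ta = select_residual(25, STORED_PERIODS)  # target period 25
last_ta = select_residual(45, STORED_PERIODS)   # target period 45
```

With this table, the first "ta" selects register 2 with a change value of 1 and the last "ta" selects register 5 with a change value of -1, in line with the register numbers and change values stated in the text.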

Because the present invention is configured as described above, when the pitch period of the residual waveform is changed, sufficient pitch control is possible in this embodiment with zero padding or truncation of at most 3 samples of the pitch period.
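The bound of 3 can be checked numerically under an assumed register table. In the sketch below, the six stored periods 20, 26, 32, 38, 44, and 50 are an assumption (the text gives only 20, 26, and 44, and FIG. 6 is not reproduced here), as are the target ranges and the function name `worst_case_change`. The single-residual case uses the stored period 33 and the 25 to 45 target range from the conventional example.

```python
def worst_case_change(stored_periods, targets):
    """Largest distance from any target period to its nearest stored period."""
    return max(min(abs(t - s) for s in stored_periods) for t in targets)


# Six stored residuals per speech unit (assumed table), targets 20..50.
multi = worst_case_change([20, 26, 32, 38, 44, 50], range(20, 51))

# One stored residual of period 33, targets 25..45 (conventional example).
single = worst_case_change([33], range(25, 46))
```

Under these assumptions the worst case drops from 12 samples with a single stored residual to 3 samples with six.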

(G) Effects of the invention

Because the residual-driven speech synthesizer of the present invention stores a plurality of residuals with different pitch periods for the same speech unit, when the pitch period of the residual waveform is changed to the pitch period given by the rule-generated pitch pattern, the driving sound source can be generated from, for example, the residual waveform whose pitch period is closest to the pitch period to be synthesized. The amount of pitch change can therefore be greatly reduced, and the sound quality degradation caused by changing the pitch of the residual waveform can be avoided.

[Brief description of the drawings]

FIG. 1 is a block diagram of a general residual-driven rule-based synthesizer; FIG. 2 is a diagram of a phoneme symbol sequence; FIG. 3 is a diagram of pitch patterns; FIG. 4 is a block diagram of the driving sound source generation unit in a conventional residual-driven speech synthesizer; FIG. 5 is a block diagram of a driving sound source generation unit that embodies the present invention; and FIG. 6 is a diagram of the contents of residual pitch period registers 1 to 6.

(1) character string buffer; (2) phoneme symbol string generation unit; (3) phoneme symbol string buffer; (4) speech unit memory; (5) speech unit connection unit; (6) speech parameter buffer; (7) pitch pattern generation unit; (8) residual waveform memory; (9) driving sound source generation unit; (10) driving sound source buffer; (11) synthesis filter; (12) DA converter; (13) speaker; (91) residual waveform buffer; (92) residual pitch period register; (921) to (926) residual pitch period registers 1 to 6; (93) generated pitch period buffer; (94) target pitch period register; (95) difference unit; (96) pitch period change value register; (97) pitch control circuit; (98) residual address register; (981) to (986) residual address registers 1 to 6; (99) comparator.

Continuation of front page

(56) References
JP-A-2-66599 (JP, A)
IEICE Technical Report [Speech], Vol. 89, No. 388, SP89-112, "Generation of Driving Sound Source and Pitch Control Method Based on a Minimum Spectral Distortion Criterion", pp. 19-26 (issued January 26, 1990)
Proceedings I of the Autumn Meeting of the Acoustical Society of Japan, 1981 (Showa 56), 1-2-16, "Speech Synthesis Using Phoneme Chains and Residual Waveforms", pp. 67-68 (issued October 6, 1981)

(58) Fields searched (Int. Cl.⁷, DB name)
G10L 11/00 - 13/08
G10L 19/00 - 21/06
JICST file (JOIS)

Claims

(57) [Claims]

1. A residual-driven speech synthesizer comprising: a first memory that stores speech units, each a sequence of the speech parameters required for speech synthesis; a second memory that stores the residuals corresponding to each speech unit; a phoneme symbol string generation unit that generates, from the content to be uttered, a symbol string indicating the required speech units; a pitch pattern generation unit that determines the change in pitch period from the content to be uttered; a speech unit connection unit that sequentially connects the required speech units based on the symbol string generated by the phoneme symbol string generation unit; a speech synthesis filter that synthesizes speech using the speech parameters contained in the connected speech units as coefficients; and a driving sound source generation unit that takes the residual corresponding to a speech unit as the driving sound source, changes the pitch period of the residual according to the pitch period at each time point determined by the pitch pattern generation unit, and inputs it to the synthesis filter, characterized in that a residual selection circuit is provided that selects a specific residual from among a plurality of residuals stored in the second memory, a plurality of residuals with different pitch periods are stored in the second memory for each speech unit stored in the first memory, the residual selection circuit selects a residual with an appropriate pitch period from the second memory according to the pitch period at each time point determined by the pitch pattern generation unit, and the driving sound source generation unit applies the necessary pitch period change to the selected residual.

2. The residual-driven speech synthesizer according to claim 1, wherein the residual selection circuit selects, from the residuals in the second memory, the residual whose pitch period is closest to the pitch period at each time point determined by the pitch pattern generation unit.