JPH1115489A - Singing sound synthesizing device

Info

Publication number: JPH1115489A
Application number: JP9181816A
Authority: JP
Inventors: Shinichi Ota (慎一大田); Tetsuo Nishimoto (哲夫西元)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1997-06-24
Filing date: 1997-06-24
Publication date: 1999-01-22
Anticipated expiration: 2017-06-24
Also published as: JP3307283B2

Abstract

PROBLEM TO BE SOLVED: To synthesize a more natural singing sound from lyrics data. SOLUTION: Lyrics data are read out in step with melody data stored in a music information storage unit 5, and voice parameters consisting of phoneme formant data and phoneme coarticulation control data corresponding to the phonemes of the lyrics data are read from a voice quality control information storage unit 6. A voicing parameter supply control unit 7 interpolates these parameters and supplies them at fixed time intervals to formant waveform generation units 81 to 8m, which synthesize and output a singing voice corresponding to the lyrics. The pitch is changed more slowly when the interval rises than when it falls. In addition, the pitch is held while an unvoiced sound is generated and begins to change when a voiced sound is generated. For a consonant-vowel-consonant sequence, the target values of the vowel's formant frequencies are changed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a singing sound synthesizer that sounds the phonemes corresponding to lyrics data so that the lyrics are sung in a human voice.

[0002]

[Prior Art] Formant synthesis is known as one speech synthesis technique. A system of this kind comprises storage means that stores in advance, over a plurality of steps, parameter data on formants that change over time; readout means that reads the parameter data from the storage means in time series over a plurality of steps when a voice is to be uttered; and formant synthesizing means that receives the read parameter data and synthesizes a tone signal having the formant characteristics determined by that data. The formants of the voice signal are thereby varied over time.

[0003] Recently, a singing sound synthesizer (singing synthesizer) has been proposed that applies this speech synthesis technique to music and synthesizes natural singing sounds from lyrics data (see Japanese Patent Laid-Open No. 9-50287). This synthesizer stores lyrics data and melody data, reads the lyrics data as the melody data is read, and sounds the phonemes corresponding to the lyrics data so that the lyrics are sung. It allows several syllables to be sounded within the duration of one note, gives consonants a preset sounding time, and varies the vowel sounding time with the note length, so that a natural singing sound can be synthesized and output.

[0004]

[Problems to be Solved by the Invention] However, making the output sound like a natural singing voice raises further problems specific to singing. For example, because the interval changes in singing, pitch changes naturally appear. FIG. 7(a) shows measured changes in pitch frequency when the sound "a" is uttered while the interval is changed. As the figure shows, the slope of the pitch change differs between a rise from a low note to a high note and a fall from a high note to a low note: the slope when rising to a higher note is gentler than the slope when falling to a lower note. The interval is raised and lowered by tensing and relaxing the muscles of the vocal cords, and the difference is thought to result from their mechanical characteristics.

[0005] In the conventional singing sound synthesizer, as shown in FIG. 7(b), the pitch was changed in the same way whether it was rising or falling. As a result, the output sounded unnatural, unlike a natural utterance, particularly when the pitch fell. To avoid this, the slope of the pitch change must differ between rising and falling pitch, as shown in FIG. 7(c), so that the result does not sound unnatural to the human ear.

[0006] FIG. 8 illustrates how the pitch changes when a voiced sound V is followed by a syllable, at a different pitch, consisting of an unvoiced sound U and a voiced sound V. As shown in FIG. 8(a), the pitch change in such a case normally starts at the onset of the unvoiced sound U. Because the unvoiced sound U has no pitch component, however, no pitch change is output during its sounding period. Consequently, as shown in FIG. 8(b), once the sounding period of the syllable's vowel V begins, a vowel V whose pitch has suddenly jumped is produced, and the pitch transition is discontinuous and unnatural. A sound output free of this unnaturalness, with a natural pitch transition as shown in FIG. 8(c), is therefore desired.

[0007] Furthermore, in actual human speech, when a vowel is sandwiched between consonants, as in consonant C, vowel V, consonant C, the formants move on toward the formant center frequencies of the next consonant without reaching the vowel's own formant center frequencies (so-called undershoot). FIG. 9 illustrates this. For example, when the vowel "u" is uttered alone, the first formant center frequency FF1 and the second formant center frequency FF2 are as shown in FIG. 9(a). When "d-u-d" is uttered, however, as shown in FIG. 9(b), the second formant center frequency FF2 of the vowel "u" moves toward the second formant frequency of the following "d" without reaching the FF2 value shown in FIG. 9(a). A vowel sandwiched between consonants is thus actually uttered at formant center frequencies different from those it has when uttered alone.

[0008] Accordingly, an object of the present invention is to provide a singing sound synthesizer that can produce a more natural singing voice by incorporating the features of actual human utterance described above.

[0009]

[Means for Solving the Problems] To achieve the above object, the singing sound synthesizer of the present invention is one that stores lyrics data and melody data, reads the lyrics data as the melody data is read, and sounds the phonemes corresponding to the lyrics data so that the lyrics are sung, wherein the pitch change speed when the pitch of the sounded phoneme is raised is slower than the pitch change speed when it is lowered. This reproduces the interval changes of human utterance and yields a natural singing sound.
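The asymmetric pitch change described in this paragraph can be illustrated with a minimal Python sketch. The function names and the two step sizes (`rise_step_hz`, `fall_step_hz`) are assumptions chosen for illustration; the patent states only that rising pitch changes more slowly than falling pitch, not how the speeds are realized.

```python
def glide_pitch(current_hz, target_hz, rise_step_hz=2.0, fall_step_hz=6.0):
    """Move current_hz one control tick toward target_hz.

    rise_step_hz is smaller than fall_step_hz, so upward glides are slower
    than downward glides. Both values are illustrative assumptions.
    """
    if target_hz > current_hz:
        return min(current_hz + rise_step_hz, target_hz)
    return max(current_hz - fall_step_hz, target_hz)


def ticks_to_reach(start_hz, target_hz):
    """Count the control ticks needed to glide from start_hz to target_hz."""
    ticks, hz = 0, start_hz
    while hz != target_hz:
        hz = glide_pitch(hz, target_hz)
        ticks += 1
    return ticks
```

With these assumed step sizes, the same 42 Hz interval takes three times as many ticks going up as going down.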

[0010] Another singing sound synthesizer of the present invention is one that stores lyrics data and melody data, reads the lyrics data as the melody data is read, and sounds the phonemes corresponding to the lyrics data so that the lyrics are sung, wherein, when the pitch of the sounded phoneme is to change while a voiced sound, an unvoiced sound, and a voiced sound are sounded in that order, the pitch data of the preceding voiced sound is held during the sounding period of the unvoiced sound, and the pitch change begins when the following voiced sound starts to sound. This makes the pitch transition smooth and the utterance natural.

[0011] Still another singing sound synthesizer of the present invention is one that stores lyrics data and melody data, reads the lyrics data as the melody data is read, and sounds the phonemes corresponding to the lyrics data so that the lyrics are sung, wherein, when a consonant, a vowel, and a consonant are sounded in that order, the formant center frequency of the vowel is turned back toward the formant center frequency of the following consonant before it reaches the formant center frequency the vowel has when sounded alone. This produces phonemes like those uttered by a human, so that a natural singing sound can be synthesized.
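As a rough illustration of undershoot, the following Python sketch tracks one formant frequency through a consonant-vowel-consonant sequence. It assumes a simple first-order approach toward each target with a fixed `rate`; the function name, the rate, and the example frequencies are hypothetical and are not the patent's own interpolation scheme.

```python
def formant_track(start_hz, vowel_target_hz, next_consonant_hz,
                  vowel_ticks, consonant_ticks, rate=0.2):
    """Return a per-tick formant frequency track for consonant-vowel-consonant.

    During the vowel the formant moves toward the vowel's isolated target;
    when the vowel ends it turns toward the next consonant, whether or not
    the vowel target was reached.
    """
    track, f = [], start_hz
    for _ in range(vowel_ticks):
        f += rate * (vowel_target_hz - f)
        track.append(f)
    for _ in range(consonant_ticks):
        f += rate * (next_consonant_hz - f)
        track.append(f)
    return track


# Illustrative second-formant values for "d-u-d": start near the "d" locus,
# head toward an isolated-"u" target, then return toward "d".
track = formant_track(1700.0, 900.0, 1700.0, vowel_ticks=5, consonant_ticks=5)
```

In this example the lowest value in `track` stays above the 900 Hz vowel target, which is the undershoot behavior the paragraph describes.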

[0012]

[Embodiments of the Invention] FIG. 1 shows an example of the system configuration of the singing sound synthesizer of the present invention. In the figure, reference numeral 1 denotes a control unit that controls the entire apparatus and includes a CPU and RAM used for various flags, variable areas, buffers, and the like. 2 denotes a program storage device PRGMEM that stores the control program and the like. The program storage device 2 may instead be a program supply device that supplies the control program to the control unit 1 via a network or the like. 3 denotes an operation setting unit provided with a display for showing the operating state of the device, input data, messages to the operator, and the like, and with various setting controls such as knobs and buttons. 4 denotes a MIDI interface unit that is connected to external MIDI devices, a network, or the like and inputs and outputs MIDI data. 5 denotes a music information storage unit SONGMEM that stores music information consisting of singing data (melody data and lyrics data) and accompaniment data. Besides semiconductor memory, various media such as a floppy disk drive, hard disk drive, MO disk drive, or IC memory card can be used for it.

[0013] 6 denotes a voice quality control information storage unit PHONEMEM. It contains voice parameter memories PMEM, each storing phoneme formant data PHFRMTDATA and phoneme coarticulation control data PHCOMBDATA. A voice parameter memory PMEM is provided for each type of voice to be generated, for example a male voice, a female voice, or a particular singer, and the operator can select the PMEM to use according to the desired voice quality. The phoneme formant data PHFRMTDATA are the parameters for formant-synthesizing each phoneme. For each phoneme they consist of data specifying the shape of each formant used to generate the phoneme (FShape 1 to m), data specifying the center frequency of each formant (FFreq 1 to m), output level data for each formant (FLevel 1 to m), and the like. The phoneme coarticulation control data PHCOMBDATA are the parameters for coarticulation (in particular, formant frequency transitions) when moving from one phoneme to another. They consist of interpolation rates for the individual data items in PHFRMTDATA, constants for correcting those interpolation rates, and the like.
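One way to picture the storage layout described in this paragraph is the Python sketch below. The class names, field names, and dictionary keys are assumptions made for illustration; the patent names only the data items (FShape, FFreq, FLevel, interpolation rates, correction constants) and does not specify a concrete layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PhonemeFormantData:
    """One PHFRMTDATA entry: parameters for formants 1..m of a phoneme."""
    fshape: List[int]      # FShape 1..m, formant shape selectors
    ffreq: List[float]     # FFreq 1..m, center frequencies (Hz assumed)
    flevel: List[float]    # FLevel 1..m, output levels


@dataclass
class CoarticulationData:
    """One PHCOMBDATA entry for the transition from one phoneme to the next."""
    interp_rate: float       # interpolation rate applied to the formant data
    rate_correction: float   # constant used to correct the rate


@dataclass
class VoiceParameterMemory:
    """One PMEM, that is, one voice (male, female, a particular singer)."""
    formants: Dict[str, PhonemeFormantData] = field(default_factory=dict)
    transitions: Dict[Tuple[str, str], CoarticulationData] = field(default_factory=dict)


def select_pmem(phonemem: Dict[str, VoiceParameterMemory],
                voice: str) -> VoiceParameterMemory:
    """Pick the PMEM for the voice quality chosen by the operator."""
    return phonemem[voice]
```

Keying the transitions by a (previous, next) phoneme pair is a design choice of this sketch, motivated by the later description of reading coarticulation data for a preceding and a following phoneme.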

[0014] 7 denotes a voicing parameter supply control unit. It reads the melody data and lyrics data of the song from the music information storage unit SONGMEM 5, reads the phoneme formant data PHFRMTDATA and phoneme coarticulation control data PHCOMBDATA corresponding to the lyrics data from the voice parameter memory PMEM of the voice quality control information storage unit PHONEMEM, performs interpolation, and supplies the results at predetermined intervals to the formant waveform generation units 81 to 8m as sounding control parameters.

[0015] 81 to 8m denote formant waveform generation units FGen 1 to m. Each generates the formant waveform of its order from the sounding control parameters supplied at predetermined intervals by the voicing parameter supply control unit 7 of the control unit 1, namely the formant center frequencies FFreq 1 to m, formant amplitude levels FLevel 1 to m, formant shape information FShape 1 to m, and pitch information PITCH. As illustrated, each formant waveform generation unit FGen 1 to m contains an unvoiced sound source unit UNVOICED TG UNIT 9, which generates unvoiced sounds, and a voiced sound source unit VOICED TG UNIT 10, which generates voiced sounds. The outputs of sound source units 9 and 10 are added and output as the output FORMANT_OUT n (n = 1 to m) of formant waveform generation unit FGen n (n = 1 to m). A speech synthesizer of this kind has already been proposed by the present applicant (Japanese Patent Laid-Open No. 3-200299). The formant waveform generation units 81 to 8m can also generate musical tones, and a sound source not assigned as a sounding channel for voice can be assigned to musical tone generation.

[0016] 11 denotes a signal synthesis unit that adds the output signals FORMANT_OUT 1 to m from the m formant waveform generation units 81 to 8m in accordance with a control signal MIXCONT supplied from the control unit 1, and outputs the result.
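The summing described for the formant waveform generation units and the signal synthesis unit can be sketched per sample as follows. Treating MIXCONT as one gain per formant channel is an assumption of this sketch; the text says only that the outputs are added according to MIXCONT.

```python
def formant_out(unvoiced_sample, voiced_sample):
    """FORMANT_OUT n: sum of the unvoiced and voiced source unit outputs."""
    return unvoiced_sample + voiced_sample


def mix(formant_outs, mixcont):
    """Signal synthesis unit: add FORMANT_OUT 1..m under control of MIXCONT.

    mixcont is assumed here to hold one gain per channel.
    """
    return sum(gain * sample for gain, sample in zip(mixcont, formant_outs))


# Two formant channels, the second attenuated by half.
sample = mix([formant_out(0.1, 0.2), formant_out(0.0, 0.5)], [1.0, 0.5])
```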

[0017] The operation of the singing sound synthesizer configured in this way will now be described. FIG. 2 is a flowchart of the main routine executed by the control unit 1. When operation starts, the system is first initialized in step S10. The process then proceeds to the operation event detection processing of step S20, which determines whether an operation event has occurred at the operation setting unit 3. Next, in step S30, if the operation event detected in S20 is an event selecting the music and song to be performed, the corresponding data (accompaniment data, melody data, and lyrics data) are read from the music information storage unit SONGMEM 5. The melody data are similar to MIDI sequence data and include KEYON, KEYOFF, pitch [PITCH], note length [NOTELENGTH], touch [TOUCH], and similar data. The lyrics data are, for example, data listing the phonemes corresponding to the lyrics, with delimiter codes marking note boundaries added so that notes and lyrics can be associated. The accompaniment data are MIDI sequence data.
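The association of phonemes with notes through delimiter codes might look like the following Python sketch. The token list format and the `"|"` delimiter are hypothetical; the patent does not specify how the lyrics data or the delimiter codes are encoded.

```python
def group_phonemes_by_note(lyric_tokens, delimiter="|"):
    """Split a flat phoneme sequence into one group per note.

    lyric_tokens is a list of phoneme labels in which `delimiter` marks a
    note boundary. Both the list format and the delimiter are assumptions.
    """
    groups, current = [], []
    for token in lyric_tokens:
        if token == delimiter:
            groups.append(current)
            current = []
        else:
            current.append(token)
    if current:
        groups.append(current)
    return groups


# "sa", "i", "ta" sung on three successive notes.
notes = group_phonemes_by_note(["s", "a", "|", "i", "|", "t", "a"])
```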

[0018] The process then proceeds to step S40, which performs the accompaniment data of the selected music read in step S30. This is the same as ordinary musical tone generation processing, and the accompaniment tones are generated by the sound source unit 7. Next, in step S50, the singing sound control processing for the music is performed. Through this processing, the singing sounds corresponding to the melody data and lyrics data are generated by the formant waveform generation units FGen 1 to m, then combined and output by the signal synthesis unit 11. The singing sound control processing S50 is described in detail later. The process then proceeds to step S60, which performs the processing corresponding to the operation when the event detected in step S20 is something other than music/song selection, along with display control processing. The process then returns to step S20 and repeats the above steps in sequence.

[0019] The singing sound control processing of S50 is described next. FIG. 3 is a flowchart of this processing. To simplify the explanation, control of the amplitude of the generated singing sound is omitted from the flowchart. When the singing sound control processing starts, step S301 first determines whether singing sound generation is in progress, by checking whether the flag SINGING_START, held in RAM in the control unit 1, is "1". If SINGING_START is "0", singing sound generation is judged not to be in progress and the process proceeds to step S302, which determines whether a singing sound generation start instruction event has occurred at the operation setting unit 3. If no such event has occurred, the singing sound control processing S50 ends and the process proceeds to the other processing of step S60.

[0020] If a singing sound generation start instruction event has occurred, the result of S302 is YES and the singing sound generation initialization processing S303 is performed, setting the sequence pointer i of the phonemes read from the lyrics data to its initial value "1". Next, in step S304, the flag SINGING_START is set to "1", and this pass of the singing sound control processing ends.

[0021] When the singing sound control processing S50 starts with the flag SINGING_START set to "1" (singing sound generation in progress), the result of step S301 is NO and the process proceeds to step S305, which determines whether a singing sound generation end instruction event has occurred. If the result is YES, step S306 resets the flag SINGING_START to "0" and performs the singing sound generation end processing, and the singing sound control processing ends.

[0022] If the result of step S305 is NO, the process proceeds to step S307, which determines whether it is time to start sounding the i-th phoneme, that is, whether the timing of a note-on event in the melody data has arrived. If the result of S307 is YES, preparation for generating phoneme i is carried out. First, in step S308, the melody and lyrics events are analyzed. Specifically, the analysis covers the pitch information KCPITCH contained in the note-on event of the melody data; the phoneme (i-1) currently sounding, its pitch (i-1), and its sounding time (i-1); and the phoneme i whose sounding is now due to start, its pitch i, and its sounding time i. The pitch i of phoneme i is always the frequency corresponding to the note's pitch information KCPITCH, whether the phoneme is voiced or unvoiced. The sounding time i is determined from the sounding time setting of the phoneme and from the KEYON and KEYOFF information of the melody events, among other things. In Japanese singing, the sounding time i is generally determined so that, in the main, the leading consonant sounds for a predetermined time and the following vowel continues until the key-on of the next melody note.
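The rule for Japanese singing given at the end of this paragraph (a fixed consonant time, with the vowel filling the rest of the note) can be sketched as below. The function name, the time unit (seconds), and the 50 ms default consonant time are assumptions for illustration only.

```python
def sounding_times(note_on_s, next_note_on_s, consonant_s=0.05):
    """Return (consonant_time, vowel_time) for one consonant+vowel syllable.

    The consonant sounds for a preset time and the vowel lasts until the
    key-on of the next note. consonant_s is an illustrative value.
    """
    note_len = next_note_on_s - note_on_s
    consonant = min(consonant_s, note_len)  # never exceed the note itself
    return consonant, note_len - consonant


consonant_time, vowel_time = sounding_times(1.0, 1.5)
```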

[0023] Next, in step S309, the phoneme coarticulation control data PHCOMBDATAx corresponding to the preceding phoneme (i-1) and the following phoneme i are read from the voice parameter memory PMEM of the voice quality control information storage unit PHONEMEM 6. Then, in step S310, the phoneme formant data PHFRMNTDATAy corresponding to the phoneme i whose sounding is to start are read from the voice parameter memory PMEM. The process then proceeds to step S311, which sets the phoneme i sounding-in-progress flag and performs the sounding start processing for phoneme i. If necessary, it also performs end processing for the sounding of the preceding phoneme (i-1). After step S311, this pass of the singing sound control processing ends. The sounding start processing for phoneme i is described in detail later.

[0024] When i = 1, that is, when sounding start processing is performed for the first phoneme of the lyrics, the preceding phoneme (i-1) of step S308 does not exist. In this embodiment, silent intervals arising from rests and from breaths taken during singing are also treated as phonemes. After a rest or a breath, therefore, the preceding phoneme (i-1) likewise does not exist, just as for the first phoneme. To handle these cases, data representing the transition at the start of singing or at the onset of an utterance are stored in the phoneme coarticulation control data PHCOMBDATA, and these data are used as phoneme (i-1) and pitch (i-1).
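The handling of a missing predecessor can be sketched as a lookup that substitutes a dedicated onset entry. The marker `ONSET` and the labels `"rest"` and `"breath"` are hypothetical names; the patent says only that onset transition data are stored in PHCOMBDATA and used in place of phoneme (i-1).

```python
ONSET = "<onset>"             # hypothetical key for the stored onset transition data
SILENT = {"rest", "breath"}   # hypothetical labels for silent "phonemes"


def predecessor_for_lookup(phonemes, i):
    """Return the label to use as phoneme (i-1) when reading PHCOMBDATA.

    i is 1-based, as in the text. The first phoneme, and any phoneme that
    follows a rest or a breath, has no real predecessor, so the onset entry
    is used instead.
    """
    if i == 1:
        return ONSET
    prev = phonemes[i - 2]
    return ONSET if prev in SILENT else prev


lyrics = ["s", "a", "rest", "k", "a"]
```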

[0025] If it is not the sounding start timing of phoneme i and the result of step S307 is NO, the process proceeds to step S312, which determines whether phoneme i is currently being sounded. If the phoneme i sounding-in-progress flag is set, the result is YES and the voicing parameter generation control processing of step S313 is performed. This processing outputs sounding control parameters to the formant waveform generation units 81 to 8m at predetermined intervals, causing those units to actually output the singing sound. It is described in detail later.

[0026] Next, in step S314, the sounding time of phoneme i (sounding time i) is checked to determine whether the sounding start timing of the next phoneme (i+1) has been reached (step S315). If the result is NO, this pass of the singing sound control processing S50 ends and the process proceeds to the other processing of S60. If the sounding start timing of the next phoneme (i+1) has arrived, the process proceeds to step S316, where (i+1) becomes the sequence number i of the sounding phoneme, and then to step S308. Steps S308 to S311 are then executed for the new sequence number i, preparing phoneme i for sounding.
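The control flow of the singing sound control processing (steps S301 to S316) can be summarized as a small state machine. The Python sketch below is an interpretation of the flowchart as described in the text: the class name, the boolean inputs to `step`, and the returned branch labels are all assumptions, and the parameter lookups and parameter output of S308 to S313 are reduced to comments.

```python
class SingingControl:
    """One instance models the state kept between calls of S50."""

    def __init__(self, phonemes):
        self.phonemes = phonemes   # phoneme labels read from the lyrics data
        self.singing_start = 0     # flag SINGING_START
        self.i = None              # sequence pointer, 1-based as in the text
        self.sounding = False      # "phoneme i sounding in progress" flag

    def step(self, start_event=False, end_event=False,
             note_on=False, next_due=False):
        """Run one pass of S50 and return a label for the branch taken."""
        if self.singing_start == 0:        # S301: not yet singing
            if not start_event:            # S302
                return "idle"
            self.i = 1                     # S303: initialize pointer
            self.singing_start = 1         # S304
            return "initialized"
        if end_event:                      # S305
            self.singing_start = 0         # S306
            self.sounding = False
            return "stopped"
        if note_on:                        # S307
            return self._start_phoneme()   # S308 to S311
        if self.sounding:                  # S312
            # S313 would output sounding control parameters here.
            if next_due:                   # S314, S315
                self.i += 1                # S316
                return self._start_phoneme()
            return "sustain"
        return "waiting"

    def _start_phoneme(self):
        # S308 to S310 (analysis and parameter reads) are omitted.
        self.sounding = True               # S311
        return f"start {self.phonemes[self.i - 1]}"
```

A typical call sequence would be `step()` while idle, `step(start_event=True)` to begin, `step(note_on=True)` at the first note, then repeated `step()` calls with `next_due=True` whenever the next phoneme is due. The sketch does not guard against running past the end of the phoneme list.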

[0027] FIG. 4 is a flowchart of the phoneme i sounding start processing of S311. When this processing starts, step S401 first determines whether phoneme i is a phoneme sounded from silence. If the result is YES, there is no pitch for an immediately preceding phoneme, so the process proceeds to step S402, where the melody pitch KCPITCH i corresponding to phoneme i is set as both the pitch data PITCH i of phoneme i and the pitch data PITCH i-1 of the preceding phoneme (i-1). In this case phoneme i starts at pitch i. Instead of setting the pitch data as in step S402, PITCH i-1 may be set to 0 or another predetermined value, for example. Phoneme i then rises to its pitch i starting from that value.

【００２８】また、先行して発音される音韻があり、前
記ステップＳ４０１の判定結果がＮＯとなったときは、
ステップＳ４０７に進み、その音韻ｉが無声音であるか
否かを判定する。音韻ｉが無声音でありこの判定結果が
ＹＥＳのときは、ステップＳ４０８に進み、先行する音
韻（ｉ−１）のピッチデータPITCH i-1を無声音である
音韻ｉのピッチデータPITCH iとする。これにより、無
声音のときに先行する音韻の音高PITCHを保持すること
ができる。また、先行する音韻（ｉ−１）が無声音では
ないときにはその音韻ｉの音高ｉをそのままPITCH iと
する。When there is a phoneme sounded earlier and the result of the determination in step S401 is NO, the flow proceeds to step S407, where it is determined whether the phoneme i is an unvoiced sound. If the phoneme i is unvoiced and the result of this determination is YES, the flow proceeds to step S408, where the pitch data PITCH i-1 of the preceding phoneme (i-1) is set as the pitch data PITCH i of the unvoiced phoneme i. As a result, the pitch PITCH of the preceding phoneme is held during an unvoiced sound. When the preceding phoneme (i-1) is not an unvoiced sound, the pitch i of the phoneme i is used as PITCH i as is.
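
The pitch setup of steps S401, S402, S407 and S408 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the use of `None` to mean "sounded from silence", and the numeric pitch values are assumptions made for the example. The final branch follows the NO branch of step S407, in which the melody pitch of the phoneme i becomes PITCH i.

```python
from typing import Optional, Tuple


def set_pitch_data(melody_pitch: float,
                   prev_pitch: Optional[float],
                   is_unvoiced: bool) -> Tuple[float, float]:
    """Return (PITCH_i_minus_1, PITCH_i) for the phoneme i.

    melody_pitch : melody pitch for the phoneme i (KCPITCH i in the text)
    prev_pitch   : pitch data of the preceding phoneme, or None when the
                   phoneme i is sounded from silence (hypothetical encoding)
    is_unvoiced  : True when the phoneme i is an unvoiced sound
    """
    if prev_pitch is None:
        # S401 YES -> S402: no preceding pitch, so start at the melody pitch.
        return melody_pitch, melody_pitch
    if is_unvoiced:
        # S407 YES -> S408: hold the preceding pitch during the unvoiced sound.
        return prev_pitch, prev_pitch
    # S407 NO: the melody pitch of the phoneme i is used as PITCH i as is.
    return prev_pitch, melody_pitch


# Example: a voiced phoneme following a phoneme at pitch 60.0
start, target = set_pitch_data(64.0, 60.0, is_unvoiced=False)
```

Returning both values keeps the later comparison of PITCH i and PITCH i-1 (step S403) independent of how the pair was chosen.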

【００２９】さて、上述したようにして音韻ｉおよび
（ｉ−１）のピッチデータPITCH iおよびPITCH i-1が設
定された後、ステップＳ４０３において、PITCH iとPIT
CH i-1とが比較される。この結果、PITCH i＞PITCH i-1
であるとき、すなわち、音韻ｉの音高ｉが先行して発音
される音韻（ｉ−１）の音高（ｉ−１）よりも高いとき
には、ステップＳ４０４に進む。このステップＳ４０４
においては、前記ボイスパラメータメモリPMEMに格納さ
れている音韻フォルマントデータPHFRMNTDATA中に含ま
れている当該音韻ｉのピッチの補間レートRpitch iに対
し、係数Ｋｕを乗算して、補間演算において使用するピ
ッチ補間レートR'pitch iとする。また、PITCH i≦PITC
H i-1でピッチが下降するときにはＳ４０３の判定結果
がＮＯとなり、ステップＳ４０９において、音韻ｉの補
間レートRpitch iに対し係数Ｋｄを乗算して、ピッチ補
間レートR'pitch iとする。After the pitch data PITCH i and PITCH i-1 of the phonemes i and (i-1) have been set as described above, PITCH i and PITCH i-1 are compared in step S403. When PITCH i > PITCH i-1, that is, when the pitch i of the phoneme i is higher than the pitch (i-1) of the phoneme (i-1) sounded earlier, the flow proceeds to step S404. In step S404, the pitch interpolation rate Rpitch i of the phoneme i, which is contained in the phoneme formant data PHFRMNTDATA stored in the voice parameter memory PMEM, is multiplied by a coefficient Ku to give the pitch interpolation rate R'pitch i used in the interpolation calculation. When PITCH i ≤ PITCH i-1 and the pitch falls, the result of the determination in S403 is NO, and in step S409 the interpolation rate Rpitch i of the phoneme i is multiplied by a coefficient Kd to give the pitch interpolation rate R'pitch i.

【００３０】ここで、０＜Ｋｕ＜Ｋｄとされており、ピ
ッチ上昇時のピッチ補間レートは、ピッチ下降時の補間
レートよりも小さく設定されている。なお、前記係数Ｋ
ｄ＝１とし、ピッチの変化無しのときおよびピッチが下
降するときの補間レートとして、前記音韻フォルマント
データPHFRMNTDATA中に含まれている補間レートRpitch
iをそのまま用いるようにすればよい。このようにし
て、ピッチが上昇するときと下降するときとでピッチの
遷移レートを変更することができる。Here, 0 < Ku < Kd, so the pitch interpolation rate when the pitch rises is set smaller than the interpolation rate when the pitch falls. The coefficient Kd may be set to 1, so that the interpolation rate Rpitch i contained in the phoneme formant data PHFRMNTDATA is used as is when the pitch does not change and when the pitch falls. In this way, the pitch transition rate can be made different between a rising pitch and a falling pitch.
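
The rate selection of steps S403, S404 and S409 can be sketched as follows. This is a hedged illustration: the function name and the default values Ku = 0.5 and Kd = 1.0 are assumptions chosen for the example, and only the constraint 0 < Ku < Kd comes from the text.

```python
def pitch_interpolation_rate(pitch_i: float,
                             pitch_prev: float,
                             r_pitch: float,
                             ku: float = 0.5,
                             kd: float = 1.0) -> float:
    """Return R'pitch i, the rate used when interpolating toward PITCH i.

    r_pitch is the stored interpolation rate Rpitch i of the phoneme i.
    ku and kd are illustrative defaults; the text only requires 0 < ku < kd.
    """
    if not 0 < ku < kd:
        raise ValueError("coefficients must satisfy 0 < Ku < Kd")
    if pitch_i > pitch_prev:
        # S403 YES -> S404: rising pitch uses the smaller coefficient Ku.
        return r_pitch * ku
    # S403 NO -> S409: falling or unchanged pitch uses Kd.
    return r_pitch * kd


rising = pitch_interpolation_rate(64.0, 60.0, r_pitch=0.2)
falling = pitch_interpolation_rate(60.0, 64.0, r_pitch=0.2)
```

With Kd = 1.0, the falling and unchanged cases reuse the stored rate unmodified, which matches the simplification mentioned in the paragraph above.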

【００３１】なお、前記係数ＫｕおよびＫｄを、音高差
や音高によって異なる値に設定しても良い。例えば、高
い音から更に高い音に変化するときに補間レートが異な
るようにしてもよい。このようにすることにより、より
自然な歌唱音とすることが可能となる。次にステップＳ
４０５に進み、フォルマント中心周波数の補正演算処理
が実行され、次にステップＳ４０６において、音韻ｉの
発音開始指示処理が行われる。この処理は、前記音韻ｉ
発音処理中フラグをセットする処理である。The coefficients Ku and Kd may also be set to different values depending on the pitch difference or the pitch. For example, the interpolation rate may be made different when the pitch changes from a high note to a still higher note. Doing so makes a more natural singing sound possible. Next, the flow proceeds to step S405, where the formant center frequency correction calculation processing is executed, and then, in step S406, the sound generation start instruction processing for the phoneme i is performed. This processing sets the flag indicating that the phoneme i is being sounded.

【００３２】図５は、前記ステップＳ４０５のFFreq補
正演算処理のフローチャートである。この処理は、発音
する音韻が子音（Ｃ）母音（Ｖ）子音（Ｃ）であるとき
に、子音に挟まれて発音される母音のフォルマント中心
周波数を制御する処理である。この処理が開始される
と、まず、ステップＳ５０１において、前記歌詞データ
をチェックし、先行する音韻（ｉ−１）、発音開始タイ
ミングとなった音韻ｉおよび後続する音韻（ｉ＋１）が
母音（Ｖ）あるいは子音（Ｃ）のいずれであるかをチェ
ックする。この結果、ＣＶＣの連続発音となっている場
合には、ステップＳ５０２の判定結果がＹＥＳとなり、
ステップＳ５０３に進む。このステップＳ５０３におい
て、フォルマントの次数を示す変数ｊに初期値「１」を
代入し、以下、ステップＳ５０４〜Ｓ５０５、Ｓ５０
７、Ｓ５０６およびＳ５０９のループにより、各フォル
マント中心周波数の目標値の修正を行う。FIG. 5 is a flowchart of the FFreq correction calculation processing of step S405. This processing controls the formant center frequencies of a vowel sounded between consonants when the phonemes to be sounded are consonant (C), vowel (V), consonant (C). When this processing starts, step S501 first checks the lyrics data to determine whether each of the preceding phoneme (i-1), the phoneme i whose sound generation start timing has been reached, and the following phoneme (i+1) is a vowel (V) or a consonant (C). If the result is a continuous CVC sequence, the result of the determination in step S502 is YES and the flow proceeds to step S503. In step S503, the initial value "1" is assigned to a variable j indicating the formant order, and the target value of each formant center frequency is then corrected by the loop of steps S504 to S505, S507, S506 and S509.

【００３３】すなわち、まず、ステップＳ５０４におい
て、音韻ｉの第ｊフォルマントの目標中心周波数FFreq
ijと先行する音韻（ｉ−１）の第ｊフォルマントの目標
中心周波数FFreq (i-1)jとを比較する。この結果、音韻
ｉの第ｊフォルマントの目標中心周波数FFreq ijが先行
する音韻（ｉ−１）の第ｊフォルマントの目標中心周波
数FFreq (i-1)jよりも低い周波数であるときには、ステ
ップＳ５０５において、音韻ｉの第ｊフォルマントの目
標中心周波数FFreq ijに係数Ｈj1（０＜Ｈj1＜１）を乗
算して、低く設定された新たな第ｊフォルマントの目標
中心周波数FFreq' ijとする。That is, step S504 first compares the target center frequency FFreq ij of the j-th formant of the phoneme i with the target center frequency FFreq (i-1)j of the j-th formant of the preceding phoneme (i-1). If the target center frequency FFreq ij of the j-th formant of the phoneme i is lower than the target center frequency FFreq (i-1)j of the j-th formant of the preceding phoneme (i-1), step S505 multiplies FFreq ij by a coefficient Hj1 (0 < Hj1 < 1) to obtain a new, lower target center frequency FFreq' ij of the j-th formant.

【００３４】一方、音韻ｉの第ｊフォルマントの目標中
心周波数FFreq ijが先行する音韻（ｉ−１）の第ｊフォ
ルマントの目標中心周波数FFreq (i-1)jよりも高い周波
数であるときには、ステップＳ５０７において、音韻ｉ
の第ｊフォルマントの目標中心周波数FFreq ijに係数Ｈ
j2（１≦Ｈj2）を乗算して、高く設定された新たな第ｊ
フォルマントの目標中心周波数FFreq' ijとする。な
お、音韻ｉの第ｊフォルマントの目標中心周波数FFreq
ijと先行する音韻（ｉ−１）の第ｊフォルマントの目標
中心周波数FFreq (i-1)jとが一致しているときは、前記
Ｈj2＝１とし、目標中心周波数の修正を行わないように
する。On the other hand, when the target center frequency FFreq ij of the j-th formant of the phoneme i is higher than the target center frequency FFreq (i-1)j of the j-th formant of the preceding phoneme (i-1), step S507 multiplies FFreq ij by a coefficient Hj2 (1 ≤ Hj2) to obtain a new, higher target center frequency FFreq' ij of the j-th formant. When the target center frequency FFreq ij of the j-th formant of the phoneme i is equal to the target center frequency FFreq (i-1)j of the j-th formant of the preceding phoneme (i-1), Hj2 is set to 1 so that the target center frequency is not corrected.

【００３５】次に、ステップＳ５０６に進み、すべての
フォルマントについての処理が終了した否かを判定し、
終了していないときは、変数ｊをインクリメントして
（Ｓ５０９）、前記ステップＳ５０４からの処理を繰り
返す。また、すべてのフォルマントについての処理が終
了したときには、このFFreq補正演算処理を終了する。Next, the flow proceeds to step S506, where it is determined whether processing has been completed for all formants. If it has not, the variable j is incremented (S509) and the processing from step S504 is repeated. When processing has been completed for all formants, the FFreq correction calculation processing ends.

【００３６】また、前記ステップＳ５０２の判定がＮＯ
となったとき、すなわち、ＣＶＣの連続発音となってい
ないときには、フォルマントの目標中心周波数を変更す
る必要がないのであるから、音韻ｉに対応する各フォル
マントの目標中心周波数[FFreq] iをそのまま目標中心
周波数[FFreq'] iとして、このFFreq補正演算処理を終
了する。以上が、前記ステップＳ３１１の音韻ｉ発音開
始処理の内容である。このようにして、前記図３におい
て音韻ｉの発音開始タイミングとなったときの発音準備
処理が終了する。When the result of the determination in step S502 is NO, that is, when the sequence is not a continuous CVC sequence, there is no need to change the formant target center frequencies, so the target center frequency [FFreq] i of each formant corresponding to the phoneme i is used as is as the target center frequency [FFreq'] i, and the FFreq correction calculation processing ends. This completes the description of the phoneme i sound generation start processing of step S311. In this way, the sound generation preparation processing performed in FIG. 3 when the sound generation start timing of the phoneme i is reached comes to an end.
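
The FFreq correction loop of FIG. 5 (steps S502 to S509) can be sketched as follows. This is an illustrative reading of the paragraphs above rather than the patent's own code: the function name, the list-based representation of formant targets, and all numeric values are assumptions. The caller is assumed to have already classified the phonemes (i-1), i and (i+1) in step S501 and passes the result as `is_cvc`.

```python
from typing import List, Sequence


def correct_formant_targets(prev_targets: Sequence[float],
                            targets: Sequence[float],
                            is_cvc: bool,
                            h1: Sequence[float],
                            h2: Sequence[float]) -> List[float]:
    """Return the corrected targets [FFreq'] i for the phoneme i.

    prev_targets : [FFreq] i-1, one target center frequency per formant
    targets      : [FFreq] i, one target center frequency per formant
    h1[j], h2[j] : coefficients with 0 < h1[j] < 1 and h2[j] >= 1
    """
    if not is_cvc:
        # S502 NO: the targets are used unchanged.
        return list(targets)
    corrected = []
    for j, (prev, cur) in enumerate(zip(prev_targets, targets)):
        if cur < prev:
            corrected.append(cur * h1[j])   # S505: set the target lower
        elif cur > prev:
            corrected.append(cur * h2[j])   # S507: set the target higher
        else:
            corrected.append(cur)           # equal targets: no correction
    return corrected


# Hypothetical three-formant example (frequencies in Hz are made up).
new_targets = correct_formant_targets(
    prev_targets=[500.0, 1500.0, 2500.0],
    targets=[400.0, 1800.0, 2500.0],
    is_cvc=True,
    h1=[0.5, 0.5, 0.5],
    h2=[2.0, 2.0, 2.0],
)
```

The per-formant coefficient lists mirror the subscript j on Hj1 and Hj2, so each formant order can be corrected by a different amount.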

【００３７】次に、前記ステップＳ３１３の発音ボイス
パラメータ発生制御処理について、図６のフローチャー
トを参照して説明する。前述したように、このステップ
Ｓ３１３は音韻ｉの発音処理中に実行される処理であ
り、前記フォルマント波形発生部８１〜８ｍに対し所定
時間毎に各パラメータを供給して、所定の音韻を発生さ
せる処理である。Next, the sounding voice parameter generation control processing of step S313 will be described with reference to the flowchart of FIG. 6. As described above, step S313 is executed while the phoneme i is being sounded, and supplies each parameter to the formant waveform generators 81 to 8m at predetermined time intervals so that the predetermined phoneme is generated.

【００３８】この処理においては、まず、ステップＳ６
０１において、先行する音韻（ｉ−１）が有声音である
か否かが判定される。先行する音韻（ｉ−１）が有声音
の場合にはこの判定結果がＹＥＳとなり、次に、ステッ
プＳ６０２において発音処理中の音韻ｉが有声音である
か否かが判定される。また、先行する音韻（ｉ−１）が
無声音のときにはステップＳ６０６に進み、当該音韻ｉ
が無声音であるか否かが判定される。In this processing, step S601 first determines whether the preceding phoneme (i-1) is a voiced sound. If the preceding phoneme (i-1) is voiced, the result of this determination is YES, and step S602 then determines whether the phoneme i being sounded is a voiced sound. If the preceding phoneme (i-1) is unvoiced, the flow proceeds to step S606, where it is determined whether the phoneme i is an unvoiced sound.

【００３９】そして、前記ステップＳ６０２の判定結果
がＹＥＳのとき、すなわち、先行する音韻（ｉ−１）が
有声音で当該音韻ｉが有声音であり有声音の発音が連続
するとき、および、前記ステップＳ６０６の判定結果が
ＮＯのとき、すなわち、先行する音韻（ｉ−１）が無声
音で当該音韻ｉが有声音のときには、ステップＳ６０３
の処理が実行される。このステップＳ６０３は、先行す
る音韻（ｉ−１）のピッチPITCH i-1と現在発音してい
る音韻ｉのピッチPITCH iとの間を前記ステップＳ４０
４あるいはＳ４０９において設定した補間レートR'pitc
h iで補間し、その結果を所定時間（例えば数ｍｓｅ
ｃ）毎に、前記フォルマント波形発生部８１〜８ｍにPI
TCHデータとして出力する処理である。これにより、各
フォルマント波形発生部８１〜８ｍにおいて生成される
音韻のピッチが設定されたレートで変更される。Step S603 is executed when the result of the determination in step S602 is YES, that is, when the preceding phoneme (i-1) and the phoneme i are both voiced so that voiced sounds continue, and when the result of the determination in step S606 is NO, that is, when the preceding phoneme (i-1) is unvoiced and the phoneme i is voiced. In step S603, the pitch is interpolated between the pitch PITCH i-1 of the preceding phoneme (i-1) and the pitch PITCH i of the phoneme i currently being sounded, at the interpolation rate R'pitch i set in step S404 or S409, and the result is output as PITCH data to the formant waveform generators 81 to 8m at predetermined time intervals (for example, every few msec). As a result, the pitch of the phoneme generated in each of the formant waveform generators 81 to 8m changes at the set rate.

【００４０】また、前記ステップＳ６０２の判定結果が
ＮＯのとき、すなわち、先行する音韻（ｉ−１）が有声
音で後続する音韻ｉが無声音のとき、および、前記ステ
ップＳ６０６の判定結果がＹＥＳのとき、すなわち、先
行する音韻（ｉ−１）が無声音で後続する音韻ｉが無声
音のときは、ステップＳ６０７が実行される。このステ
ップＳ６０７においては、その時点のピッチPITCHを保
持したまま、すなわち、補間演算処理を行うことなく、
所定時間毎にPITCHデータとして、前記フォルマント波
形発生部８１〜８ｍへ出力する。これにより、無声音の
ときはピッチPITCHが保持されることとなる。Step S607 is executed when the result of the determination in step S602 is NO, that is, when the preceding phoneme (i-1) is voiced and the following phoneme i is unvoiced, and when the result of the determination in step S606 is YES, that is, when the preceding phoneme (i-1) is unvoiced and the following phoneme i is also unvoiced. In step S607, the pitch PITCH at that time is held, that is, it is output as PITCH data to the formant waveform generators 81 to 8m at predetermined time intervals without any interpolation calculation. As a result, the pitch PITCH is held during an unvoiced sound.
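
The branching of steps S601, S602 and S606 reduces to a single question: whichever path is taken, a voiced phoneme i leads to step S603 and an unvoiced phoneme i leads to step S607. The per-tick pitch update can therefore be sketched as follows. The text does not specify the form of the interpolation, so a constant step per tick toward the target is assumed here, and the function and parameter names are hypothetical.

```python
def next_pitch(current: float,
               target: float,
               rate: float,
               phoneme_is_voiced: bool) -> float:
    """Return the PITCH value to output at the next tick.

    current : pitch output at the previous tick
    target  : PITCH i of the phoneme being sounded
    rate    : R'pitch i, interpreted here as the pitch change per tick
              (an assumption; the source only calls it an interpolation rate)
    """
    if not phoneme_is_voiced:
        # S607: hold the current pitch, with no interpolation.
        return current
    # S603: move toward the target without overshooting it.
    if current < target:
        return min(current + rate, target)
    return max(current - rate, target)


# Simulate a few ticks of a voiced phoneme rising from 60.0 to 62.0.
pitch = 60.0
trace = []
for _ in range(5):
    pitch = next_pitch(pitch, 62.0, rate=0.5, phoneme_is_voiced=True)
    trace.append(pitch)
```

Clamping with `min` and `max` keeps the output at the target once it has been reached, so later ticks of a long phoneme simply repeat PITCH i.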

【００４１】上記ステップＳ６０３あるいはＳ６０４に
おいてピッチデータの送出を実行した後は、ステップＳ
６０４に進み、各フォルマント周波数の制御データの送
出が行われる。この処理においては、先行する音韻（ｉ
−１）と当該音韻ｉとの間の音韻調音結合制御データPH
COMBDATAxに応じた調音結合特性で、各フォルマント周
波数について、[FFreq] i-1〜[FFreq'] iの間で補間演
算し、所定時間毎に補間値を出力する。ここで、[FFre
q] i-1は先行する音韻（ｉ−１）の各フォルマントの中
心周波数であり、[FFreq'] iは、前記FFreq補正演算処
理Ｓ４０５において演算された当該音韻ｉの各フォルマ
ントの目標中心周波数である。After the pitch data has been sent in step S603 or S607, the flow proceeds to step S604, where the control data for each formant frequency is sent. In this processing, each formant frequency is interpolated between [FFreq] i-1 and [FFreq'] i with an articulation coupling characteristic according to the phoneme articulation coupling control data PHCOMBDATAx between the preceding phoneme (i-1) and the phoneme i, and the interpolated value is output at predetermined time intervals. Here, [FFreq] i-1 is the center frequency of each formant of the preceding phoneme (i-1), and [FFreq'] i is the target center frequency of each formant of the phoneme i calculated in the FFreq correction calculation processing S405.

【００４２】次に、ステップＳ６０５に進み、他の発音
ボイスパラメータ、FShape、FLevel等についても、同様
に補間演算処理を行い、所定時間毎に、前記フォルマン
ト波形発生部８１〜８ｍに出力する。このようにして、
所定時間（例えば、数ｍｓｅｃ）毎に、各フォルマント
波形発生部８１〜８ｍに発音ボイスパラメータが送出さ
れ、各フォルマント波形発生部８１〜８ｍにおいて前述
のようにして当該音韻の対応するフォルマント波形が生
成される。各フォルマント波形発生部８１〜８ｍから出
力される各フォルマントに対応する出力は信号合成部１
１において加算され、当該歌詞に対応した合成された音
韻が発音されることとなる。Next, the flow proceeds to step S605, where interpolation calculation processing is performed in the same way for the other sounding voice parameters, such as FShape and FLevel, and the results are output to the formant waveform generators 81 to 8m at predetermined time intervals. In this way, the sounding voice parameters are sent to the formant waveform generators 81 to 8m at predetermined time intervals (for example, every few msec), and each of the formant waveform generators 81 to 8m generates the corresponding formant waveform of the phoneme as described above. The outputs for the individual formants from the formant waveform generators 81 to 8m are added in the signal synthesizer 11, and the synthesized phoneme corresponding to the lyrics is sounded.

【００４３】なお、以上の説明においては、フォルマン
ト中心周波数、フォルマントレベル、フォルマント帯域
幅およびピッチ周波数などの各パラメータを、前記制御
部１から所定時間間隔で（例えば、数ミリ秒程度の間隔
で）逐次送出して制御するようにしていたが、この時間
間隔をより長くして、各フォルマント波形発生部８１〜
８ｍに含まれているエンベロープジェネレータにより前
記各パラメータを逐次制御させるようにしてもよい。In the above description, parameters such as the formant center frequency, formant level, formant bandwidth and pitch frequency are sent successively from the control unit 1 at predetermined time intervals (for example, at intervals of about several milliseconds). Alternatively, this time interval may be made longer, and the parameters may be controlled successively by envelope generators included in the formant waveform generators 81 to 8m.

【００４４】また、上記においては、前記声質制御情報
記憶部PHONEMEM６中に複数種類の音韻フォルマントデー
タPHFRMNTDATAおよび音韻調音結合制御データPHCOMBDAT
Aを記憶し、発生させたい声質に応じてそれらを選択す
るようにしていたが、記憶されている複数種類のデータ
（例えば、男声、女声あるいは個人の声を分析して得た
各個人に対応するデータ）のうちのいくつかを選択し
て、それらを任意に組み合わせ、補間処理を行うことに
より、それらの中間的な特性あるいは新規な特性を有す
る音韻フォルマントデータあるいは音韻調音結合制御デ
ータを生成し、そのデータを用いて発音制御するように
することもできる。例えば、男声と女声の２種類のデー
タから音韻フォルマントデータを生成し、該データを発
音制御に用いることにより、男女２者の中間的な声質を
持った音韻を発生させることが可能となる。In the above description, a plurality of types of phoneme formant data PHFRMNTDATA and phoneme articulation coupling control data PHCOMBDATA are stored in the voice quality control information storage unit PHONEMEM 6 and are selected according to the voice quality to be generated. Alternatively, several of the stored types of data (for example, data for a male voice, a female voice, or individual voices obtained by analyzing each person's voice) may be selected, combined arbitrarily and interpolated to generate phoneme formant data or phoneme articulation coupling control data having intermediate or novel characteristics, and sound generation may be controlled using that data. For example, by generating phoneme formant data from two types of data, a male voice and a female voice, and using that data for sound generation control, phonemes with a voice quality intermediate between the two can be generated.
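
The blending of two stored voice data sets described above can be sketched as follows. The text states only that an interpolation is performed, so a linear, parameter-wise interpolation is assumed here. The function name, the dictionary layout, the parameter keys and the frequency values are all hypothetical and are not taken from the patent.

```python
from typing import Dict


def blend_formant_data(voice_a: Dict[str, float],
                       voice_b: Dict[str, float],
                       weight: float) -> Dict[str, float]:
    """Linearly interpolate two parameter sets.

    weight = 0.0 returns voice_a, weight = 1.0 returns voice_b, and values
    in between give an intermediate voice quality.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if voice_a.keys() != voice_b.keys():
        raise ValueError("both data sets must define the same parameters")
    return {key: (1.0 - weight) * voice_a[key] + weight * voice_b[key]
            for key in voice_a}


# Made-up formant center frequencies (Hz) for one vowel.
male = {"F1": 700.0, "F2": 1200.0}
female = {"F1": 900.0, "F2": 1400.0}
intermediate = blend_formant_data(male, female, weight=0.5)
```

A weight other than 0.5 would bias the result toward one of the two source voices, which is one way to obtain the "novel characteristics" the paragraph mentions.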

【００４５】本発明の歌唱音合成装置の適用分野として
特に好適な例を挙げれば、歌唱音が出力可能な電子楽器
やコンピュータシステム、音声応答装置、あるいはゲー
ムマシンやカラオケなどのアミューズメント機器などが
考えられる。また、本発明の歌唱音合成装置は、パソコ
ンに代表されるコンピュータシステムのソフトウエアと
いう形態で実施することも可能である。その際、音声波
形合成までＣＰＵにより実行するようにしてもよいし、
あるいは図１に示したように別途音源を設けてもよい。Particularly suitable fields of application for the singing sound synthesizer of the present invention include electronic musical instruments and computer systems capable of outputting a singing sound, voice response devices, and amusement equipment such as game machines and karaoke machines. The singing sound synthesizer of the present invention can also be implemented as software for a computer system such as a personal computer. In that case, everything up to the synthesis of the audio waveform may be executed by the CPU, or a separate sound source may be provided as shown in FIG. 1.

【００４６】[0046]

【発明の効果】以上説明したように、本発明の歌唱音合
成装置によれば、音程が変化するときにおけるピッチの
変化速度を音程の上昇時は遅くし下降時は早くしている
ために、音程が変化するときにおいても自然な歌唱音を
発声することができる。また、音程が変化するときにお
けるピッチのつながりがスムーズになり、自然な発音と
なる。さらに、子音、母音、子音の順に発音させるとき
に、母音のフォルマント中心周波数を、単独で発音させ
る場合のフォルマント中心周波数に達しないうちに後続
の子音のフォルマント中心周波数に戻すようにしている
ために、違和感のない歌唱音を出力することが可能とな
る。As described above, according to the singing sound synthesizer of the present invention, the pitch changes more slowly when the interval rises and more quickly when it falls, so that a natural singing sound can be produced even when the interval changes. In addition, the pitch is connected smoothly when the interval changes, which gives a natural sound. Furthermore, when a consonant, a vowel and a consonant are sounded in this order, the formant center frequency of the vowel is returned to the formant center frequency of the following consonant before it reaches the formant center frequency the vowel would have if sounded alone, so that a singing sound free of unnaturalness can be output.

[Brief description of the drawings]

【図１】本発明の歌唱音合成装置のシステム構成の一
例を示すブロック図である。FIG. 1 is a block diagram showing an example of a system configuration of a singing sound synthesizer of the present invention.

【図２】本発明の歌唱音合成措置におけるメインルー
チンを示すフローチャートである。FIG. 2 is a flowchart showing the main routine of the singing sound synthesizer of the present invention.

【図３】歌唱音制御処理を説明するためのフローチャ
ートである。FIG. 3 is a flowchart illustrating a singing sound control process.

【図４】音韻ｉ発音開始処理を説明するためのフロー
チャートである。FIG. 4 is a flowchart illustrating a phoneme i pronunciation start process.

【図５】 FFReq補正演算処理を説明するためのフロー
チャートである。FIG. 5 is a flowchart illustrating an FFReq correction calculation process.

【図６】発音ボイスパラメータ発生制御処理を説明す
るためのフローチャートである。FIG. 6 is a flowchart illustrating a sound generation voice parameter generation control process.

【図７】音高の上昇、下降時の状態を説明するための
図である。FIG. 7 is a diagram for explaining a state when a pitch rises and falls.

【図８】音高が変化したときの無声音と有声音出力を
説明するための図である。FIG. 8 is a diagram for explaining unvoiced sound and voiced sound output when the pitch changes.

【図９】子音母音子音の発音時のフォルマント周波数
の変動を説明するための図である。FIG. 9 is a diagram for explaining the variation in formant frequency when a consonant-vowel-consonant sequence is sounded.

[Explanation of symbols]

１制御部、２プログラム記憶装置、３操作設定
部、４ＭＩＤＩインターフェース、５楽曲情報記憶
部、６声質制御情報記憶部、７発音ボイスパラメー
タ供給制御部、８１〜８ｍフォルマント波形発生部、
９無声音音源部、１０有声音音源部、１１信号合
成部、１２伴奏用音源部 1: control unit, 2: program storage device, 3: operation setting unit, 4: MIDI interface, 5: music information storage unit, 6: voice quality control information storage unit, 7: sounding voice parameter supply control unit, 81 to 8m: formant waveform generators, 9: unvoiced sound source unit, 10: voiced sound source unit, 11: signal synthesizer, 12: accompaniment sound source unit

Claims

[Claims]

1. A singing sound synthesizer configured to store lyrics data and melody data, read out the lyrics data in accordance with the reading of the melody data, generate phonemes corresponding to the lyrics data, and thereby sing the lyrics, wherein the pitch change speed when raising the pitch of the phoneme to be sounded is set lower than the pitch change speed when lowering the pitch of the phoneme to be sounded.

2. A singing sound synthesizer configured to store lyrics data and melody data, read out the lyrics data in accordance with the reading of the melody data, generate phonemes corresponding to the lyrics data, and thereby sing the lyrics, wherein, when the pitch is changed while phonemes are sounded in the order of a voiced sound, an unvoiced sound and a voiced sound, the pitch data of the preceding voiced sound is held during the sounding period of the unvoiced sound, and the pitch change is started from the start of sounding of the following voiced sound.

3. A singing sound synthesizer configured to store lyrics data and melody data, read out the lyrics data in accordance with the reading of the melody data, generate phonemes corresponding to the lyrics data, and thereby sing the lyrics, wherein, when a consonant, a vowel and a consonant are sounded in this order, the formant center frequency of the vowel is returned to the formant center frequency of the following consonant before it reaches the formant center frequency the vowel would have if sounded alone.