JP2001337688A

JP2001337688A - Voice synthesizer, voice synthesizing method and its storage medium

Info

Publication number: JP2001337688A
Application number: JP2000156974A
Authority: JP
Inventors: Kenichiro Nakagawa; 賢一郎中川; Takashi Aso; 隆麻生
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2000-05-26
Filing date: 2000-05-26
Publication date: 2001-12-07

Abstract

PROBLEM TO BE SOLVED: To synthesize speech that expresses the feelings represented by character strings such as pictographs. SOLUTION: A morphological analysis module 203 detects a prescribed character string in text information. A pictograph analysis module 207 consults a pictograph dictionary 208 to obtain a correction value for the prosody (metrical) parameters corresponding to the detected character string and supplies it to a prosody parameter generation module 204. The prosody parameter generation module 204 uses this correction value to correct the prosody parameters generated from the text information other than the prescribed character string.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech synthesizer that synthesizes speech information from character information, a speech synthesis method, and a storage medium.

[0002]

2. Description of the Related Art

In general, a speech synthesizer that synthesizes speech from text takes text as input and outputs a synthesized waveform. In such a speech synthesizer, a language analysis unit performs linguistic analysis of the text, and an acoustic processing unit selects speech units and synthesizes the waveform.

[0003]

[Problems to be Solved by the Invention]

Text written in e-mails and web pages may contain character strings such as "(^^)" and "(^_^)". Such character strings are called pictographs, emoticons, or smileys, and are one way of expressing the emotion of the author of the text.

[0004] However, a conventional speech synthesizer treats such a character string as an ordinary character string, and can only either utter it as synthesized speech as it is or not utter it at all. In either case, the emotion represented by a character string such as a pictograph cannot be conveyed by the synthesized speech.

[0005] The present invention has been made in view of the above problems, and its object is to provide a speech synthesizer, a speech synthesis method, and a storage medium capable of synthesizing speech that expresses the emotion represented by a character string such as a pictograph.

[0006]

[Means for Solving the Problems]

To achieve the object of the present invention, a speech synthesizer of the present invention has, for example, the following configuration. Namely, it comprises detecting means for detecting a predetermined character string from character information, and control means for controlling, in accordance with the predetermined character string, the prosody of the speech information synthesized from the character information.

[0007] To achieve the object of the present invention, the speech synthesizer of the present invention further has, for example, the following configuration. Namely, it further comprises registration means for registering the predetermined character string.

[0008] To achieve the object of the present invention, the speech synthesizer of the present invention further has, for example, the following configuration. Namely, it further comprises speech synthesis means for synthesizing speech information from the character information, and the speech synthesis means synthesizes the speech information from the character information excluding the predetermined character string.

[0009]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be described in detail below by way of preferred embodiments, with reference to the accompanying drawings.

[0010] [First Embodiment] FIG. 1 is a diagram showing the schematic configuration of a speech synthesizer according to this embodiment. The speech synthesizer of this embodiment is applicable to stationary information terminals such as personal computers, workstations, and car navigation systems, and to portable information terminals such as mobile phones and portable personal computers.

[0011] In FIG. 1, reference numeral 101 denotes an input unit, 102 a speech synthesis unit, 103 an output unit, 104 a storage unit, 105 a network interface, and 106 a control unit.

[0012] The input unit 101 includes a character input device such as a keyboard and a pointing device such as a mouse, and is used to input character information and user operations.

[0013] The speech synthesis unit 102 synthesizes speech information from text (in a language such as Japanese or English) contained in e-mails, web pages, document files, and the like input from the input unit 101, the storage unit 104, or the interface 105.

[0014] The speech synthesis unit 102 of this embodiment can be implemented in either hardware or software. This embodiment describes the case where the speech synthesis unit 102 is implemented in software. In this case, the control program that implements the speech synthesis unit 102 is stored in a storage medium such as the storage unit 104. The control unit 106 reads this control program and executes it according to the processing procedure described later.

[0015] The output unit 103 includes a speaker or the like, and outputs the synthesized speech generated by the speech synthesis unit 102.

[0016] The storage unit 104 consists of a semiconductor memory, a magnetic recording medium, an optical recording medium, a hard disk, or the like, and stores various control programs for controlling the functions of the speech synthesizer. The storage unit 104 also stores e-mails, web pages, document files, and the like that were created using the input unit 101 or received through the interface 105.

[0017] The interface 105 communicates with other information terminals connected to a network such as a telephone network, a mobile communication network, the Internet, or a satellite communication network. It receives e-mails, web pages, document files, and the like, and transmits the speech information synthesized by the speech synthesis unit 102 to an information terminal designated in advance.

[0018] The control unit 106 includes a central processing unit (CPU). The control unit 106 reads the various control programs stored in the storage unit 104 and controls the functions of the speech synthesizer.

[0019] FIG. 2 is a diagram showing the functional configuration of the speech synthesis unit 102 in this embodiment.

[0020] In FIG. 2, reference numeral 201 denotes a text input module, 202 a reading module, 203 a morphological analysis module, 204 a prosody parameter generation module, 205 a phoneme segment selection module, 206 a waveform editing module, 207 a pictograph analysis module, 208 a pictograph dictionary, 209 a phoneme segment dictionary, 210 a word dictionary, 211 grammar rules, and 212 a pictograph registration module.

[0021] The text input module 201 manages the input of text contained in an e-mail, web page, document file, or the like designated by the user. This text is supplied from the input unit 101, the storage unit 104, or the interface 105. The input text is held in a memory managed by the control unit 106.

[0022] The reading module 202 manages the reading of the text held in the memory. It reads character information from that text one sentence at a time and supplies the character information it has read to the morphological analysis module 203.
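The sentence-by-sentence reading performed by the reading module 202 might be sketched as follows. This is only an illustration: the function name read_sentences and the set of sentence terminators are assumptions, since the text does not say how sentence boundaries are found.

```python
import re
from typing import Iterator

# Assumed sentence terminators (full-width and ASCII); not specified in the text.
SENTENCE = re.compile(r"[^。！？.!?]+[。！？.!?]*")

def read_sentences(text: str) -> Iterator[str]:
    """Yield the text one sentence at a time, keeping each terminator."""
    for match in SENTENCE.finditer(text):
        sentence = match.group().strip()
        if sentence:
            yield sentence

sentences = list(read_sentences("今日は晴れ（＾＾）。明日は雨です（泣）。"))
```

A pictograph that itself contains one of the assumed terminators would be split by this simple rule, so a practical reader would need a more careful boundary test.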

[0023] The morphological analysis module 203 uses the word dictionary 210 and the grammar rules 211 to morphologically analyze the character information supplied from the reading module 202, and detects prosodic information such as readings, accent positions, and accent strengths. It also detects character strings presumed to be pictographs or emoticons at the end of the sentence and before and after punctuation marks ("、", "，", "；", "：", and the like). The detected prosodic information is supplied to the prosody parameter generation module 204, and the detected character strings are supplied to the pictograph analysis module 207.
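The position-based detection described above can be illustrated with a small sketch. The candidate shape (a short run enclosed in half-width or full-width parentheses), the delimiter set, and the names detect_candidates and is_punct are all assumptions; the text only says that candidates are looked for at the sentence end and next to punctuation.

```python
import re
from typing import List

# Assumed delimiters: the marks quoted in paragraph [0023] plus sentence-final periods.
PUNCT = "、，；：。,;:."
# Hypothetical candidate shape: 1 to 10 characters inside half- or full-width parentheses.
CANDIDATE = re.compile(r"[（(][^（()）]{1,10}[)）]")

def is_punct(ch: str) -> bool:
    # The empty string stands for "no neighbouring character".
    return ch != "" and ch in PUNCT

def detect_candidates(sentence: str) -> List[str]:
    """Return parenthesised strings found at the sentence end or next to punctuation."""
    found = []
    for m in CANDIDATE.finditer(sentence):
        before = sentence[m.start() - 1] if m.start() > 0 else ""
        after = sentence[m.end()] if m.end() < len(sentence) else ""
        at_end = m.end() == len(sentence)
        if at_end or is_punct(before) or is_punct(after):
            found.append(m.group())
    return found
```

A parenthesised string in the middle of a phrase, such as "（第2回）" in "会議（第2回）の資料", is not reported, because it is neither at the sentence end nor adjacent to punctuation.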

[0024] The word dictionary 210 is a database that holds prosodic information such as the reading, accent position, and accent strength of each word. The grammar rules 211 are a database that holds grammatical connection rules. The word dictionary 210 and the grammar rules 211 are held in the storage unit 104.

[0025] The pictograph analysis module 207 determines whether the character string detected by the morphological analysis module 203 is registered in the pictograph dictionary 208. If it is registered, the module obtains the prosody parameter correction values corresponding to that character string and supplies them to the prosody parameter generation module 204.

[0026] The pictograph dictionary 208 is a database that holds a plurality of pictographs or emoticons in association with emotion types and prosody parameter correction values. FIG. 4 shows an example of the pictographs, emotion types, and prosody parameter correction values held in the pictograph dictionary 208. For example, the pictograph "（笑）" ("laugh") represents the emotion "fun". By controlling the prosody of the speech information based on the prosody parameter correction values corresponding to this emotion, speech information that expresses the emotion "fun" can be generated. The relationship between emotions such as "happy", "fun", and "sadness" and the prosody parameter correction values, and the relationship between pictographs such as "（＾＾）", "（笑）", and "（・＿・；）" and emotions, are set by the pictograph registration module 212. The pictograph dictionary 208 is held in the storage unit 104.
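The two-level association held by the pictograph dictionary 208 (pictograph to emotion, emotion to correction values) and the lookup done by the pictograph analysis module 207 can be sketched as below. FIG. 4 is not reproduced here, so every number is an illustrative assumption, as is the idea that a correction value is a multiplicative factor. Only the pairing of "（笑）" with "fun" comes from the text; the other pairings and the name lookup_correction are hypothetical.

```python
from typing import Dict, Optional

# Emotion -> correction factors. All numbers are illustrative assumptions, not FIG. 4 values.
EMOTION_CORRECTIONS: Dict[str, Dict[str, float]] = {
    "happy": {"pitch": 1.2, "power": 1.1},
    "fun": {"pitch": 1.15, "duration": 0.9},
    "sadness": {"pitch": 0.85, "duration": 1.2, "pause": 1.3},
}

# Pictograph -> emotion. Only （笑） -> "fun" is stated in the text; the rest are assumed.
PICTOGRAPH_EMOTIONS: Dict[str, str] = {
    "（笑）": "fun",
    "（＾＾）": "happy",
    "（泣）": "sadness",
}

def lookup_correction(string: str) -> Optional[Dict[str, float]]:
    """Return the correction values for a registered pictograph, or None if unregistered."""
    emotion = PICTOGRAPH_EMOTIONS.get(string)
    if emotion is None:
        return None
    return EMOTION_CORRECTIONS[emotion]
```

Keeping the emotion as an intermediate key means that retuning the values for one emotion affects every pictograph mapped to it, which matches the way the registration module sets the two relationships separately.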

[0027] The prosody parameter generation module 204 generates prosody parameters such as pitch frequency (the pitch of the voice), phoneme duration (the length of the voice), sound power (the strength of the voice), and pause length (the interval of pauses) based on the prosodic information supplied from the morphological analysis module 203, and corrects these prosody parameters using the correction values obtained by the pictograph analysis module 207.
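As a minimal sketch of the correction step, the four prosody parameters named above can be held in one record and scaled by the correction values. Treating a correction value as a multiplicative factor is an assumption, as are the field names, the units, and the function name apply_correction; the text does not define the arithmetic.

```python
from dataclasses import dataclass, replace
from typing import Dict

@dataclass(frozen=True)
class Prosody:
    pitch_hz: float      # pitch frequency (pitch of the voice)
    duration_ms: float   # phoneme duration (length of the voice)
    power: float         # sound power (strength of the voice)
    pause_ms: float      # pause length

def apply_correction(prosody: Prosody, factors: Dict[str, float]) -> Prosody:
    """Scale each named parameter by its factor; parameters not named are left unchanged."""
    scaled = {name: getattr(prosody, name) * factor for name, factor in factors.items()}
    return replace(prosody, **scaled)

base = Prosody(pitch_hz=200.0, duration_ms=80.0, power=1.0, pause_ms=300.0)
corrected = apply_correction(base, {"pitch_hz": 1.5, "duration_ms": 0.5})
```

Here the hypothetical correction raises the pitch and shortens the duration while leaving power and pause length at their generated values.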

[0028] The phoneme segment selection module 205 selects a plurality of speech units from the phoneme segment dictionary 209. The phoneme segment dictionary 209 is held in the storage unit 104.

[0029] The waveform editing module 206 edits and concatenates the plurality of speech units obtained by the phoneme segment selection module 205, based on the prosody parameters obtained by the prosody parameter generation module 204. Using the PSOLA (pitch-synchronous overlap-add) method or the like, it edits and concatenates the speech units to generate a single set of speech waveform data. The speech waveform data thus obtained is supplied to the output unit 103 or the interface 105. Data supplied to the output unit 103 is output from the speaker, and data supplied to the interface 105 is transmitted to a desired information terminal connected to the network.

[0030] The pictograph registration module 212 controls the procedures for resetting the relationship between emotions and prosody parameter correction values and the relationship between pictographs and emotions, and for registering new pictographs and new emotion types. When the user requests registration of a new pictograph, the pictograph registration module 212 displays the registration screen shown in FIG. 5.

[0031] In FIG. 5, reference numeral 501 denotes an item for entering a new pictograph. The user enters the new pictograph in item 501 using the input unit 201. FIG. 5 shows an example in which "（＾ｏ＾）" is entered. Reference numeral 502 denotes an item for entering the emotion corresponding to the pictograph entered in item 501. The user selects the desired emotion from the emotions registered in advance, using the input unit 201. FIG. 5 shows an example in which "fun" is selected. Reference numeral 503 denotes a confirm button and 504 a cancel button. When the confirm button 503 is pressed, the module 212 registers, in the pictograph dictionary 208, the pictograph entered in item 501 in association with the emotion selected in item 502 and the prosody parameter correction values set for that emotion.

[0032] On the other hand, when the user requests registration of a new emotion, the pictograph registration module 212 displays the registration screen shown in FIG. 6.

[0033] In FIG. 6, reference numeral 601 denotes an item for entering a new emotion. The user enters the new emotion in item 601 using the input unit 201. Reference numeral 602 denotes an item for adjusting the prosody parameter corresponding to the pitch of the voice, 603 an item for adjusting the prosody parameter corresponding to the length of the voice, 604 an item for adjusting the prosody parameter corresponding to the strength of the voice, and 605 an item for adjusting the prosody parameter corresponding to the pause interval.

[0034] Reference numeral 606 denotes an item for entering a sample, and 607 a play button for instructing playback of the sample. When pressing of the play button 607 is detected, the speech synthesis unit 102 synthesizes speech information from the sample entered in item 606 and corrects the prosody parameters of that speech information with the prosody parameter correction values corresponding to the values entered in items 602 to 605. The corrected speech information is output from the speaker of the output unit 103.

[0035] Reference numeral 608 denotes a confirm button and 609 a cancel button. When pressing of the confirm button 608 is detected, the module 212 registers, in the pictograph dictionary 208, the emotion entered in item 601 in association with the prosody parameter correction values corresponding to the values entered in items 602 to 605.
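The two registration operations (a new pictograph for an existing emotion, as in FIG. 5, and a new emotion with its correction values, as in FIG. 6) might be modelled as follows. The class and method names are hypothetical, the screens themselves are not modelled, and rejecting a pictograph whose emotion is unknown is an assumption drawn from the statement that the user selects from emotions registered in advance.

```python
from typing import Dict, Optional

class PictographDictionary:
    """Hypothetical in-memory stand-in for the pictograph dictionary 208."""

    def __init__(self) -> None:
        self.emotions: Dict[str, Dict[str, float]] = {}   # emotion -> correction values
        self.pictographs: Dict[str, str] = {}              # pictograph -> emotion

    def register_emotion(self, name: str, correction: Dict[str, float]) -> None:
        # Corresponds to confirming the FIG. 6 screen (items 601 to 605).
        self.emotions[name] = dict(correction)

    def register_pictograph(self, pictograph: str, emotion: str) -> None:
        # Corresponds to confirming the FIG. 5 screen (items 501 and 502).
        if emotion not in self.emotions:
            raise KeyError(f"emotion not registered: {emotion}")
        self.pictographs[pictograph] = emotion

    def correction_for(self, pictograph: str) -> Optional[Dict[str, float]]:
        emotion = self.pictographs.get(pictograph)
        return None if emotion is None else self.emotions[emotion]

dictionary = PictographDictionary()
dictionary.register_emotion("fun", {"pitch": 1.2})     # illustrative value
dictionary.register_pictograph("（＾ｏ＾）", "fun")
```

Registering "（＾ｏ＾）" under "fun" mirrors the example shown for FIG. 5; the correction value itself is made up for the sketch.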

[0036] FIG. 3 is a flowchart explaining the processing procedure of the speech synthesis unit 102 in this embodiment.

[0037] The text input module 201 inputs the text contained in an e-mail, web page, document file, or the like designated by the user, and holds it in the memory (step S301).

[0038] The reading module 202 reads character information one sentence at a time from the text held in the memory, and supplies the character information it has read to the morphological analysis module 203.

[0039] The morphological analysis module 203 uses the word dictionary 210 and the grammar rules 211 to morphologically analyze the character information read in step S302 and detects prosodic information such as readings, accent positions, and accent strengths. It also detects character strings presumed to be pictographs at the end of the sentence and before and after punctuation marks ("、", "，", "；", "：", and the like) (step S303). For example, when "今日は良い天気ですね（＾＾）。" ("The weather is nice today（＾＾）.") is morphologically analyzed, the character string "（＾＾）" is detected at the end of the sentence. Likewise, when "今日は良い天気ですね（＾＾）、でも明日は雨です（泣）。" ("The weather is nice today（＾＾）, but it will rain tomorrow（泣）.") is morphologically analyzed, the character strings "（＾＾）" and "（泣）" ("cry") are detected before and after the punctuation marks.

[0040] The prosody parameter generation module 204 generates the prosody parameters of the character information that was morphologically analyzed in step S303 (step S304). However, when the morphological analysis module 203 has detected a character string presumed to be a pictograph or emoticon, the prosody parameters are generated for the character information excluding that character string. The prosody parameters generated in step S303 are pitch frequency, phoneme duration, sound power, pause length, and the like.

[0041] If a character string was detected in step S303, that character string is supplied to the pictograph analysis module 207 (step S305). If no character string was detected in step S303, the processing of step S308 is executed.

[0042] The pictograph analysis module 207 refers to the pictograph dictionary 208 to obtain the prosody parameter correction values corresponding to the character string detected in step S303, and supplies the obtained correction values to the prosody parameter generation module 204 (step S306). When a plurality of character strings are detected in one piece of character information, the correction values of each character string are obtained.

[0043] The prosody parameter generation module 204 corrects the prosody parameters generated in step S304 using the correction values obtained in step S306 (step S307).

[0044] When a plurality of character strings are detected in one piece of character information, the character information is divided into a plurality of phrases, and the prosody parameters of each phrase are corrected with the correction values of the corresponding character string. For example, in the case of "今日は良い天気ですね（＾＾）、でも明日は雨です（泣）。" ("The weather is nice today（＾＾）, but it will rain tomorrow（泣）."), the prosody parameters of "今日は良い天気ですね" are corrected using the correction values corresponding to the character string "（＾＾）", and the prosody parameters of "でも明日は雨です" are corrected using the correction values corresponding to the character string "（泣）". With this configuration, a single piece of character information can be expressed with richer emotion.
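The per-phrase pairing in this example can be sketched as follows. Using the pictograph positions themselves as phrase boundaries is an assumption (the text does not say how the phrases are delimited), and the candidate pattern and the name split_by_pictograph are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical candidate shape: 1 to 10 characters inside half- or full-width parentheses.
CANDIDATE = re.compile(r"[（(][^（()）]{1,10}[)）]")
# Punctuation and spaces trimmed from phrase edges (an assumed set).
TRIM = "、，；：。 "

def split_by_pictograph(sentence: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each phrase with the pictograph that follows it (None for trailing text)."""
    pairs: List[Tuple[str, Optional[str]]] = []
    pos = 0
    for m in CANDIDATE.finditer(sentence):
        phrase = sentence[pos:m.start()].strip(TRIM)
        pairs.append((phrase, m.group()))
        pos = m.end()
    tail = sentence[pos:].strip(TRIM)
    if tail:
        pairs.append((tail, None))
    return pairs

pairs = split_by_pictograph("今日は良い天気ですね（＾＾）、でも明日は雨です（泣）。")
```

Each phrase would then have its own prosody parameters corrected with the values looked up for its pictograph, while a phrase paired with None keeps the uncorrected parameters.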

[0045] The phoneme segment selection module 205 selects a plurality of speech units from the phoneme segment dictionary 209 (step S308).

[0046] The waveform editing module 206 edits and concatenates the plurality of speech units selected in step S308, based on the prosody parameters generated in step S304 or the prosody parameters corrected in step S307 (step S309).

[0047] Processing in this way makes it possible to reflect the emotion expressed by a pictograph in the prosody of the synthesized speech obtained from the character information excluding the pictograph. The synthesized speech obtained by the waveform editing module 206 is supplied to the output unit 103 or the interface 105.

[0048] The reading module 202 determines whether any character information remains that has not yet been read (step S310). If so, it reads the next character information and supplies it to the morphological analysis module 203 (step S302). If not, this processing ends.

[0049] As described above, according to this embodiment, when speech information is synthesized from character information containing a character string such as a pictograph, the prosody of the speech information can be controlled according to the type of that character string. With this configuration, text containing character strings such as pictographs can be expressed in synthesized speech with rich emotion.

[0050] [Second Embodiment] The first embodiment described a speech synthesizer that does not correct the prosody of the speech information synthesized from the character information when it detects a character string that is not registered in the pictograph dictionary 208.

[0051] This embodiment describes a speech synthesizer in which part of the speech synthesis unit 102 is modified so that, even when a character string not registered in the pictograph dictionary 208 is detected, prosody parameter correction values are determined from the characters and symbols making up that character string, and the prosody of the speech information synthesized from the character information is corrected.

[0052] FIG. 7 shows the functional configuration of the speech synthesis unit 102 in this embodiment. Modules having the same functions as those in FIG. 2 are given the same numbers. Reference numeral 701 denotes a pictograph parts dictionary, and 702 a pictograph analysis module.

[0053] The parts dictionary 701 is a database that stores predetermined characters and symbols (hereinafter referred to as parts) in association with prosody parameter correction values. FIG. 9 shows an example of the parts dictionary 701. The parts dictionary 701 is held in the storage unit 104.

[0054] The pictograph analysis module 702 determines whether the character string detected by the morphological analysis module 203 is registered in the pictograph dictionary 208. If it is registered, the module obtains the prosody parameter correction values corresponding to that character string. If it is not registered, the module obtains prosody parameter correction values by referring to the parts dictionary 701.

[0055] FIG. 8 is a flowchart explaining the processing procedure of the speech synthesis unit 102 in this embodiment.

[0056] In FIG. 8, the processing in steps S801 to S805 is the same as the processing in steps S301 to S305 in FIG. 3, so its description is omitted here.

[0057] If a character string presumed to be a pictograph is detected in step S804, that character string is supplied to the pictograph analysis module 702 (step S805). If no character string is detected in step S804, the processing of step S812 is executed.

[0058] The pictograph analysis module 702 refers to the pictograph dictionary 208 and determines whether the character string detected in step S804 is registered in the pictograph dictionary 208 (step S806). If it is registered, the module obtains the prosody parameter correction values corresponding to that character string and supplies them to the prosody parameter generation module 204 (step S807). The prosody parameter generation module 204 corrects the prosody parameters generated in step S804 using the correction values obtained in step S810 (step S808).

[0059] On the other hand, if the character string detected in step S804 is not registered in the pictograph dictionary 208, the parts making up that character string are detected (step S809). The pictograph analysis module 702 refers to the pictograph parts dictionary 701, obtains the prosody parameter correction values corresponding to each part, and supplies them to the prosody parameter generation module 204 (step S810). Each time correction values are obtained in step S810, the prosody parameter generation module 204 corrects the prosody parameters generated in step S804 (step S811).
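The branch between the pictograph dictionary and the parts dictionary can be sketched as below. Treating each part as a single character, treating correction values as multiplicative factors, and the names corrections_for and apply_all are assumptions; FIG. 9 is not reproduced here, so the example entries are made up.

```python
from typing import Dict, List

Correction = Dict[str, float]

def corrections_for(string: str,
                    pictograph_dict: Dict[str, Correction],
                    parts_dict: Dict[str, Correction]) -> List[Correction]:
    """Return the corrections to apply, in order, for one detected character string."""
    if string in pictograph_dict:            # registered: use the whole-string entry
        return [pictograph_dict[string]]
    # Not registered: fall back to the parts (assumed to be single characters).
    return [parts_dict[ch] for ch in string if ch in parts_dict]

def apply_all(prosody: Dict[str, float], corrections: List[Correction]) -> Dict[str, float]:
    """Apply each correction in turn, as one is applied each time a part's value is obtained."""
    result = dict(prosody)
    for correction in corrections:
        for name, factor in correction.items():
            result[name] *= factor
    return result

pictograph_dict = {"（＾＾）": {"pitch_hz": 1.5}}                  # illustrative entries
parts_dict = {"＾": {"pitch_hz": 1.5}, "；": {"power": 0.5}}
unregistered = corrections_for("（＾；）", pictograph_dict, parts_dict)
result = apply_all({"pitch_hz": 200.0, "power": 1.0}, unregistered)
```

A string with no known parts yields an empty list, so the generated prosody parameters pass through unchanged.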

[0060] The phoneme segment selection module 205 selects a plurality of speech units from the phoneme segment dictionary 209 (step S812).

[0061] The waveform editing module 206 edits and concatenates the plurality of speech units selected in step S807, based on the prosody parameters generated in step S804, the prosody parameters corrected in step S808, or the prosody parameters corrected in step S811 (step S813).

[0062] By processing in this way, when a pictograph registered in advance is detected, the emotion expressed by the pictograph can be reflected in the prosody of the synthesized speech obtained from the character information excluding the pictograph. Even when no pictograph registered in advance is detected, the prosody of the synthesized speech can be varied by the correction values corresponding to the parts registered in advance.

【００６３】読み出しモジュール２０２は、まだ読み出
していない文字情報が存在するか否かを判別する（ステ
ップＳ８１４）。存在する場合には、次の文字情報を読
み出し、読み出した文字情報を形態素解析モジュール２
０３に供給する（ステップＳ８０２）。一方、存在しな
い場合には、本処理を終了する。The reading module 202 determines whether any character information remains unread (step S814). If so, it reads the next character information and supplies it to the morphological analysis module 203 (step S802). Otherwise, the process ends.

【００６４】以上説明したように本実施形態によれば、
絵文字辞書２０８に登録していない文字列を検出した場
合であっても、文字列を構成するパーツに応じて文字情
報から合成される音声情報の韻律を制御することが可能
となる。As described above, according to the present embodiment, even when a character string that is not registered in the pictograph dictionary 208 is detected, the prosody of the speech information synthesized from the character information can be controlled according to the parts constituting the character string.

【００６５】［第３の実施形態］第２の実施形態では、
絵文字辞書２０８に登録していない文字列を検出した場
合であっても、文字列を構成するパーツに応じて合成さ
れた音声情報の韻律を制御する音声合成装置について説
明した。[Third Embodiment] The second embodiment described a speech synthesizer that controls the prosody of the synthesized speech information according to the parts constituting a character string, even when a character string not registered in the pictograph dictionary 208 is detected.

【００６６】本実施形態では、音声合成ユニット１０２
の一部を変更し、絵文字辞書２０８に登録していない文
字列を検出した場合であっても、ダイナミックプログラ
ミング（以下、ＤＰ）法を用いてこの文字列に最も近い
（似ている）絵文字を絵文字辞書２０８から検出し、検
出された絵文字の修正値を用いて文字情報から合成され
る音声情報の韻律を制御する音声合成装置について説明
する。In this embodiment, part of the speech synthesis unit 102 is modified. The following describes a speech synthesizer that, even when a character string not registered in the pictograph dictionary 208 is detected, uses the dynamic programming (hereinafter, DP) method to find the pictograph in the pictograph dictionary 208 that is closest (most similar) to this character string, and controls the prosody of the speech information synthesized from the character information using the correction value of the pictograph found.

【００６７】本実施形態で使用するＤＰ法について図１
０を用いて説明する。図１０は、ＤＰを行う際に用いる
絵文字平面であり、入力パターン（形態素解析モジュー
ル２０３で検出された文字列）を横軸とし、標準パター
ン（絵文字辞書２０８内の絵文字）を縦軸とする。The DP method used in this embodiment is described with reference to FIG. 10. FIG. 10 shows the pictograph plane used when performing DP. The horizontal axis is the input pattern (the character string detected by the morphological analysis module 203), and the vertical axis is the standard pattern (a pictograph in the pictograph dictionary 208).

【００６８】ＸＹ平面をそのＸ軸，Ｙ軸の各パターン文
字の比較点とし、その各パターン文字間の差異を評価す
る尺度をｄ（ｘ，ｙ）とする。なお、ｄ（ｘ，ｙ）は
ｘ，ｙ間の差異を評価する尺度である。ｄ（ｘ，ｙ）の値
は、図１１に示すテーブルを用いることで得ることがで
きる。同図を用いれば、例えばｄ（"＾"，"＿"）＝２.
０である。このＸＹ平面の中を、左下から右上の終点ま
で進む経路を考える。ただし、格子点（ｘ，ｙ）に到達
できる経路は、真横から来る（ｘ−１，ｙ）、真下から
来る（ｘ，ｙ−１）、斜め左下から来る（ｘ−１，ｙ−
１）の３点からのみとする。Each point of the XY plane is a comparison point between a pattern character on the X axis and a pattern character on the Y axis, and d(x, y) denotes the measure used to evaluate the difference between the pattern characters x and y. The value of d(x, y) can be obtained from the table shown in FIG. 11. According to that table, for example, d("＾", "＿") = 2.0. Consider a path through this XY plane from the lower left to the end point at the upper right. A grid point (x, y) can be reached only from three points: (x-1, y) directly to its left, (x, y-1) directly below it, and (x-1, y-1) diagonally to its lower left.

【００６９】始点から上記３点にいたる最小距離の部分
和がそれぞれＤ（ｘ−１，ｙ），Ｄ（ｘ，ｙ−１），Ｄ
（ｘ−１，ｙ−１）で与えられていると仮定し、格子点
（ｘ，ｙ）にいたる最小距離を次式で定義する。Assume that the partial sums of the minimum distance from the starting point to the three points above are given by D(x-1, y), D(x, y-1), and D(x-1, y-1), respectively. The minimum distance to the grid point (x, y) is then defined by the following equation.

【００７０】D(x,y) = min(D(x-1,y)+d(x,y), D(x,y-1)+d(x,y), D(x-1,y-1)+2d(x,y))
このとき、Ｄ(終点）が入力パターンと標準パターン間
の距離（差異）となる。この距離が小さいものほどパタ
ーンが似ていることになるため、入力パターンと絵文字
辞書２０８内に格納された全ての絵文字との距離を調
べ、最も距離が小さい絵文字を選択する。
$$D(x,y) = \min\bigl( D(x-1,y) + d(x,y),\; D(x,y-1) + d(x,y),\; D(x-1,y-1) + 2\,d(x,y) \bigr)$$
Here, D(end point) is the distance (difference) between the input pattern and the standard pattern. The smaller this distance, the more similar the two patterns are. The distance between the input pattern and every pictograph stored in the pictograph dictionary 208 is therefore computed, and the pictograph with the smallest distance is selected.
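The recurrence above can be illustrated with a short sketch. The text does not state the boundary conditions or reproduce the table of FIG. 11, so the following are assumptions of the example only: a virtual origin with D = 0 placed before the first characters (so that the first grid point costs 2d via the diagonal move), a cost of 0 for identical characters, a hypothetical default cost of 1.0 for pairs missing from the table, and the function names themselves. Only the value d("^", "_") = 2.0 is taken from the text (written here with ASCII characters).

```python
import math


def char_cost(a, b, table, default=1.0):
    """d(x, y): difference between two pattern characters.

    `table` stands in for the table of FIG. 11; `default` is a hypothetical
    fallback for pairs that the stand-in table does not list.
    """
    if a == b:
        return 0.0
    if (a, b) in table:
        return table[(a, b)]
    if (b, a) in table:
        return table[(b, a)]
    return default


def dp_distance(input_pattern, standard_pattern, table):
    """D(end point) between the input pattern and one standard pattern."""
    nx, ny = len(input_pattern), len(standard_pattern)
    if nx == 0 or ny == 0:
        return math.inf  # no path exists through an empty plane
    # D[x][y] with 1-based grid points; row 0 and column 0 are a virtual border.
    D = [[math.inf] * (ny + 1) for _ in range(nx + 1)]
    D[0][0] = 0.0  # assumed virtual origin
    for x in range(1, nx + 1):
        for y in range(1, ny + 1):
            d = char_cost(input_pattern[x - 1], standard_pattern[y - 1], table)
            D[x][y] = min(
                D[x - 1][y] + d,          # arriving from the left
                D[x][y - 1] + d,          # arriving from below
                D[x - 1][y - 1] + 2 * d,  # arriving diagonally
            )
    return D[nx][ny]


table = {("^", "_"): 2.0}  # the single value quoted in the text
distance = dp_distance("(^_^)", "(^^)", table)  # 2.0 under these assumptions
```

Identical patterns give a distance of 0 under these assumptions, and the distance grows as more characters have to be matched against dissimilar ones.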

【００７１】本実施形態における音声合成ユニット１０
２の処理手順を図８及び図１２を用いて説明する。図１
２に示すフローチャートは、図８のステップＳ８０９〜
Ｓ８１１を変更した例である。尚、図８のステップＳ８
０９〜Ｓ８１１以外のステップについては、上記の手順
と同様に処理されるため、その説明を省略する。The processing procedure of the speech synthesis unit 102 in this embodiment is described with reference to FIGS. 8 and 12. The flowchart shown in FIG. 12 is an example in which steps S809 to S811 of FIG. 8 are modified. The steps of FIG. 8 other than steps S809 to S811 are processed in the same manner as described above, so their description is omitted.

【００７２】絵文字解析モジュール７０２は、絵文字辞
書２０８を参照し、ステップＳ８０４で検出された文字
列が絵文字辞書２０８に登録されているか否かを判定す
る（ステップＳ８０６）。ステップＳ８０４で検出され
た文字列が絵文字辞書２０８に登録されていない場合に
は、ステップＳ９０１の処理を開始する。The pictograph analysis module 702 refers to the pictograph dictionary 208 and determines whether or not the character string detected in step S804 is registered in the pictograph dictionary 208 (step S806). If the character string detected in step S804 is not registered in the pictograph dictionary 208, the processing in step S901 is started.

【００７３】まず変数ｅ１に入力パターン（形態素解析
モジュール２０３で形態素解析を行った文字列に絵文字
が含まれている場合のこの絵文字）を格納する（ステッ
プＳ９０１）。一方、変数ｋ'に本実施形態における音
声合成装置が扱える最大の実数（この音声合成装置にと
っては∞）をｅ'にＮＵＬＬを格納し、初期化する（ス
テップＳ９０２）。以降の各処理では、ｋ'には最小距
離、ｅ'には最小距離の絵文字の表記が常に入る変数で
ある。First, the input pattern (the pictograph contained in the character string analyzed by the morphological analysis module 203, if it contains one) is stored in the variable e1 (step S901). Next, as initialization, the largest real number that the speech synthesizer of this embodiment can handle (∞ as far as this speech synthesizer is concerned) is stored in the variable k', and NULL is stored in e' (step S902). In the subsequent processing, k' always holds the minimum distance and e' always holds the notation of the pictograph at that minimum distance.

【００７４】絵文字解析モジュール２０７は絵文字辞書
２０８から一つの絵文字表記を取り出し、変数ｅ２に格
納する（ステップＳ９０３）。次に、変数ｅ１とｅ２の
ＤＰ距離ｋを上述の方法により求め（ステップＳ９０
５）、その値が変数ｋ'に格納された値よりも小さけれ
ば（ステップＳ９０６）、変数ｋ'に変数ｋに格納され
た値を、変数ｅ'に変数ｅ２に格納された値を夫々格納
する（ステップＳ９０７）。この距離計算を絵文字辞書
２０８に格納された全ての絵文字について行い（ステッ
プＳ９０４）、ループを抜ける。The pictograph analysis module 207 takes one pictograph notation from the pictograph dictionary 208 and stores it in the variable e2 (step S903). Next, the DP distance k between the variables e1 and e2 is computed by the method described above (step S905). If k is smaller than the value stored in the variable k' (step S906), the value of k is stored in k' and the value of e2 is stored in e' (step S907). This distance calculation is performed for every pictograph stored in the pictograph dictionary 208 (step S904), after which the loop is exited.
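The search loop of steps S901 to S907 can be sketched as follows. The variable names mirror e1, e2, k, k' and e' from the text; the function name and the idea of passing the distance measure in as a parameter are assumptions of the sketch. Any distance function can be supplied, such as the DP distance described above; the toy measure used in the example is purely hypothetical and is not the DP distance.

```python
import math


def find_nearest(e1, pictographs, distance):
    """Return (k', e'): the minimum distance and the pictograph that attains it."""
    k_min = math.inf  # k': initialized to the largest representable value (S902)
    e_min = None      # e': initialized to NULL (S902)
    for e2 in pictographs:       # take each registered pictograph (S903, S904)
        k = distance(e1, e2)     # distance between e1 and e2 (S905)
        if k < k_min:            # closer than the best so far? (S906)
            k_min, e_min = k, e2  # remember it (S907)
    return k_min, e_min


def toy_distance(a, b):
    """Hypothetical stand-in distance: mismatched positions plus length difference."""
    mismatches = sum(1 for p, q in zip(a, b) if p != q)
    return mismatches + abs(len(a) - len(b))


k_min, e_min = find_nearest("(^o^)", ["(^_^)", "(;_;)"], toy_distance)
```

Because the comparison is strict, a later pictograph that merely ties the current minimum does not replace it, and an empty dictionary leaves e' at its initial NULL value, which is the case handled in step S908.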

【００７５】入力パターンと絵文字辞書２０８に格納さ
れた全ての絵文字との間で距離計算を行った後に、変数
ｅ'に格納された値がＮＵＬＬであれば（絵文字辞書２
０８内に格納された全ての絵文字との距離が出なかっ
た）、韻律パラメータを修正することなくステップＳ８
１２の処理を行う（ステップＳ９０８）。また、変数
ｋ'に格納された値が大きい（同図の例では５より大き
い）場合、入力パターンと似ている絵文字が絵文字辞書
２０８にはなかったとみなし、韻律パラメータを修正せ
ずにステップＳ８１２の処理を行う（ステップＳ９０
９）。After the distance has been calculated between the input pattern and every pictograph stored in the pictograph dictionary 208, if the value stored in the variable e' is NULL (that is, no distance could be obtained for any pictograph stored in the pictograph dictionary 208), the processing of step S812 is performed without correcting the prosody parameters (step S908). Likewise, if the value stored in the variable k' is large (greater than 5 in the example of the figure), it is assumed that the pictograph dictionary 208 contains no pictograph similar to the input pattern, and the processing of step S812 is performed without correcting the prosody parameters (step S909).

【００７６】一方、ｋ'≦５の場合、変数ｅ'に格納され
た絵文字に最も類似した絵文字であるとみなし、変数
ｅ'に格納された絵文字に対応するの修正値を取得する
（ステップＳ９１０）。韻律パラメータ生成モジュール
２０４は、ステップＳ９１０で取得した修正値を用い
て、ステップＳ８０４で生成した韻律パラメータを修正
する（ステップＳ９１１）。On the other hand, when k' ≤ 5, the pictograph stored in the variable e' is regarded as the most similar pictograph, and the correction value corresponding to the pictograph stored in e' is obtained (step S910). The prosody parameter generation module 204 corrects the prosody parameters generated in step S804 using the correction value obtained in step S910 (step S911).
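The decision in steps S908 to S910 can be sketched as a small function. The threshold of 5 comes from the example in the text; the function name, the dictionary layout mapping a pictograph to its correction value, and the sample correction entry are hypothetical.

```python
def select_correction(k_min, e_min, corrections, threshold=5.0):
    """Return the correction value to apply, or None to leave the prosody unchanged."""
    if e_min is None:
        return None  # no distance was obtained for any pictograph (S908)
    if k_min > threshold:
        return None  # nearest pictograph is still too dissimilar (S909)
    return corrections[e_min]  # correction of the most similar pictograph (S910)


# Hypothetical dictionary contents, for illustration only.
corrections = {"(^_^)": {"pitch": 1.2}}

accepted = select_correction(2.0, "(^_^)", corrections)
rejected = select_correction(7.5, "(^_^)", corrections)
```

Returning None in both rejection cases corresponds to proceeding to step S812 without correcting the prosody parameters.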

【００７７】以上説明したように本実施形態によれば、
絵文字辞書２０８に登録していない文字列を検出した場
合であっても、この文字列に最も類似した絵文字に応じ
て文字情報から合成される音声情報の韻律を制御するこ
とが可能となる。As described above, according to the present embodiment, even when a character string that is not registered in the pictograph dictionary 208 is detected, the prosody of the speech information synthesized from the character information can be controlled according to the pictograph most similar to this character string.

【００７８】[0078]

【発明の効果】以上説明したように説明に本発明によれ
ば、絵文字等の文字列が表す感情を表現する音声を合成
することができる。[Effects of the Invention] As described above, according to the present invention, speech that expresses the emotion represented by a character string such as a pictograph can be synthesized.

[Brief description of the drawings]

【図１】本実施形態における音声合成装置の概略構成を
示す図である。FIG. 1 is a diagram illustrating a schematic configuration of a speech synthesis device according to an embodiment.

【図２】第１の実施形態における音声合成ユニット１０
２の機能構成を示す図である。FIG. 2 is a diagram showing the functional configuration of the speech synthesis unit 102 in the first embodiment.

【図３】第１の実施形態における音声合成ユニット１０
２の処理手順を説明するフローチャートである。FIG. 3 is a flowchart illustrating the processing procedure of the speech synthesis unit 102 in the first embodiment.

【図４】絵文字辞書２０８を説明する図である。FIG. 4 is a diagram illustrating a pictograph dictionary 208.

【図５】新規の絵文字を登録する登録画面を説明する図
である。FIG. 5 is a diagram illustrating a registration screen for registering a new pictogram.

【図６】新規の感情を登録する登録画面を説明する図で
ある。FIG. 6 is a diagram illustrating a registration screen for registering a new emotion.

【図７】第２の実施形態における音声合成ユニット１０
２の機能構成を示す図である。FIG. 7 is a diagram showing the functional configuration of the speech synthesis unit 102 in the second embodiment.

【図８】第２の実施形態における音声合成ユニット１０
２の処理手順を説明するフローチャートである。FIG. 8 is a flowchart illustrating the processing procedure of the speech synthesis unit 102 in the second embodiment.

【図９】絵文字パーツ辞書７０１を説明する図である。FIG. 9 is a diagram illustrating a pictograph part dictionary 701.

【図１０】ダイナミックプログラミングを説明する図で
ある。FIG. 10 is a diagram illustrating dynamic programming.

【図１１】各パターン間の差異を評価する尺度を得るた
めのテーブルを示す図である。FIG. 11 is a diagram showing a table for obtaining a scale for evaluating a difference between patterns.

【図１２】第３の実施形態における音声合成ユニット１
０２の処理手順を説明するフローチャートである。FIG. 12 is a flowchart illustrating the processing procedure of the speech synthesis unit 102 in the third embodiment.

Claims

[Claims]

1. A speech synthesizer comprising: detecting means for detecting a predetermined character string from character information; and control means for controlling the prosody of speech information synthesized from the character information according to the predetermined character string.

2. The speech synthesizer according to claim 1, wherein the predetermined character string is a character string expressing a predetermined emotion.

3. The speech synthesizer according to claim 1, wherein the predetermined character string is a pictogram or a face character.

4. The speech synthesizer according to claim 1, wherein said control means controls a prosody of said speech information based on prosody control information corresponding to said predetermined character string.

5. The speech synthesizer according to claim 1, wherein the control means controls at least one of voice pitch, voice intensity, voice length, and pause interval.

6. The speech synthesizer according to claim 1, further comprising a registration unit for registering said predetermined character string.

7. The speech synthesizer according to any one of claims 1 to 6, further comprising speech synthesis means for synthesizing speech information from the character information, wherein the speech synthesis means synthesizes speech information from the character information excluding the predetermined character string.

8. A speech synthesis method comprising: a detection step of detecting a predetermined character string from character information; and a control step of controlling the prosody of speech information synthesized from the character information according to the predetermined character string.

9. The speech synthesis method according to claim 8, wherein the predetermined character string is a character string expressing a predetermined emotion.

10. The speech synthesis method according to claim 8, wherein the predetermined character string is a pictogram or a face character.

11. The speech synthesis method according to claim 8, wherein the control step controls a prosody of the speech information based on prosody control information corresponding to the predetermined character string.

12. The speech synthesis method according to claim 8, wherein the control step controls at least one of voice pitch, voice intensity, voice length, and pause interval.

13. The speech synthesis method according to claim 8, further comprising a registration step of registering said predetermined character string.

14. The speech synthesis method according to any one of claims 8 to 13, further comprising a speech synthesis step of synthesizing speech information from the character information, wherein the speech synthesis step synthesizes speech information from the character information excluding the predetermined character string.

15. A computer-readable storage medium storing a program for executing the method according to any one of claims 8 to 14.