JP2000047681A

JP2000047681A - Information processing method

Info

Publication number: JP2000047681A
Application number: JP10217751A
Authority: JP
Inventors: Takehiko Kagoshima; 岳彦籠嶋; Shigenobu Seto; 重宣瀬戸; Shinko Morita; 眞弘森田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1998-07-31
Filing date: 1998-07-31
Publication date: 2000-02-18
Anticipated expiration: 2018-07-31
Also published as: JP3550303B2

Abstract

(57) [Abstract]
[Problem] When a feature value corresponding to a case is selected, to allow a feature value closer to the correct answer to be selected even when the correct feature value is not selected.
[Means for Solving] An information processing method selects a feature value relating to a case from a plurality of options according to the states of a plurality of attributes of the case. The estimated value y_k of the feature value corresponding to the k-th option is obtained by Equation 1, using functions w_kj(d_j) of the attribute values d_j (each d_j determined by the state of the j-th attribute) and a constant w_k0:

$$y_k = w_{k0} + \sum_{j} w_{kj}(d_j)$$

The feature value relating to the case is then selected from the plurality of options on the basis of these estimated values.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a system that selects a feature value corresponding to a case (or classifies a case) from attribute information.

[0002]

[Description of the Related Art] The problem of determining which class a case belongs to has been studied extensively in fields such as artificial intelligence. For example, the book "Data Analysis by AI" by J. R. Quinlan (Toppan Co., Ltd.) discloses a method of classifying cases with a decision tree. A decision-tree classifier is described by IF-THEN rules on the attributes.

[0003] For example, FIG. 1 shows a classifier that uses weather and temperature as attributes to decide whether a given case belongs to the class "held" or the class "cancelled". Such a decision tree is learned from a large number of training cases whose attribute values and correct classes are already known, so that the classification accuracy on those training cases becomes high.

[0004]

[Problem to Be Solved by the Invention] When cases are classified into three or more classes, not only the classification accuracy but also the way in which errors are made can matter. As an example, consider pitch pattern generation in text-to-speech synthesis, which converts text into speech. Pitch pattern generation produces a pitch pattern, the time-varying pattern of voice pitch, from information obtained by analyzing the text, such as part of speech, accent type, and number of syllables. One generation method prepares several representative patterns, each a typical pitch pattern for one accent phrase, selects a representative pattern using attributes of each accent phrase such as part of speech, accent type, and number of syllables, and generates the pitch pattern from it. In this representative pattern selection, it is important not only that the optimal pattern be selected at a high rate, but also that patterns far from the optimal one be unlikely to be selected.

[0005] Suppose the representative patterns are the four patterns shown in FIG. 2. If the pattern in FIG. 2(a) is optimal, mistakenly selecting the pattern in FIG. 2(b) is not much of a problem, but mistakenly selecting the pattern in FIG. 2(c) makes the accent sound completely different, which is a serious problem.

[0006] When the conventional decision-tree classification method is used for representative pattern selection, learning uses pairs of attributes and the number of the optimal pattern as training cases, with the criterion of maximizing the rate at which the optimal pattern is selected. Consequently, when the optimal pattern is not selected, selecting a pattern far from it cannot be avoided.

[0007] The present invention has been made in view of the above problem. Its object is to provide an information processing method that, when selecting a feature value corresponding to a case (or classifying a case), first estimates from the attribute information of the case an evaluation value for selecting each option and then selects the feature value on the basis of those estimates, so that a feature value closer to the correct answer is selected even when the correct one is not.

[0008]

[Means for Solving the Problem] The information processing method of the present invention selects a feature value relating to a case (for example, a representative pattern in text-to-speech synthesis) from a plurality of options according to the states of a plurality of attributes of the case (for example, in text-to-speech synthesis, the accent type of an accent phrase, its number of morae, its dependency target, and the dependency target of the preceding accent phrase). The estimated value y_k of the feature value corresponding to the k-th option is obtained, using functions w_kj(d_j) of the attribute values d_j determined by the state of the j-th attribute and a constant w_k0, as

[0009]

$$y_k = w_{k0} + \sum_{j} w_{kj}(d_j)$$

and the feature value relating to the case is selected from the plurality of options on the basis of these estimates. Because an evaluation value for selecting each option is estimated from the attribute information and the feature value is selected on that basis, a feature value closer to the correct answer can be selected even when the correct one is not.

[0010] Here, an attribute is a type of information about the case that is available when the case is classified. For example, if the case is a person, attributes such as gender, occupation, height, weight, and age can be considered.

[0011] An attribute value is a number representing the state of an attribute. If the attribute has an order relation (an ordinal scale), such as grades (excellent, good, acceptable, fail) or clothing sizes (LL, L, M, S), suitable values that follow the order can be used as attribute values; for grades, for example, excellent: 4, good: 3, acceptable: 2, fail: 1. If the attribute is measured numerically (an interval or ratio scale), such as height, weight, age, or temperature in degrees Celsius, the measured value itself can be used as the attribute value. Alternatively, representative values obtained by quantizing the measurements may be treated in the same way as an ordinal scale. If the attribute has no order relation (a nominal scale), such as gender, occupation, or color, arbitrary values can be assigned to its categories as attribute values; for colors, for example, red: 1, blue: 2, yellow: 3.
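The three encoding rules above can be sketched in Python as follows. The dictionaries reuse the example values from the text; the helper `quantize` and the age-bracket edges are hypothetical illustrations of "quantizing the measured value", not part of the original.

```python
# Ordinal scale: the codes preserve the order of the categories.
GRADE = {"excellent": 4, "good": 3, "acceptable": 2, "fail": 1}

# Nominal scale: any distinct code per category will do.
COLOR = {"red": 1, "blue": 2, "yellow": 3}

# Hypothetical bracket boundaries for quantizing age (an interval/ratio scale).
AGE_EDGES = [20, 30, 40, 50]


def quantize(value, edges):
    """Map a measured value to a 1-based ordinal code using ascending edges.

    Values below edges[0] map to 1; values >= edges[-1] map to len(edges) + 1.
    """
    code = 1
    for edge in edges:
        if value >= edge:
            code += 1
    return code
```

After quantization, an age of 35 becomes code 3 and can be handled exactly like an ordinal attribute.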

[0012] Preferably, when the attribute value d_j of at least one attribute takes a finite number N of values (d_j1, d_j2, ..., d_jN), the function w_kj(d_j) of that attribute value for the k-th option may be expressed, using coefficients (a_kj1, a_kj2, ..., a_kjN) corresponding to the respective attribute values, as w_kj(d_jm) = a_kjm.

[0013] When an attribute value is unknown, for example because the state of the attribute is unknown or the attribute is in an unanticipated state, the function w_kj(d_j) of the unknown attribute value d_j may be expressed with a constant c_kj as w_kj(d_j) = c_kj.

[0014] Preferably, the options may be options for a feature parameter relating to text-to-speech synthesis, which converts text into speech. Preferably, the options for the feature parameter may be options for a feature parameter of a pitch pattern control model.

[0015] Preferably, the pitch pattern control model may select one pattern from a plurality of representative patterns and use, as the pitch pattern, a pattern obtained by deforming the selected pattern.

[0016] Preferably, the deformation may include at least a parallel shift along the logarithmic frequency axis. Preferably, the attributes may include the number of morae of an accent phrase, the dependency target of the accent phrase, and the dependency target of the preceding accent phrase.

[0017]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. As an example, consider the problem of estimating which of ramen, sushi, and curry rice a person likes best from that person's age, gender, and home region. Here, the results of a questionnaire such as that shown in FIG. 3 are used as training cases. FIG. 4 shows examples of training cases (questionnaire results).

[0018] In this embodiment, age, gender, and home region are the attributes (j), and how much the person likes each of ramen, sushi, and curry rice are the feature values (k) to be estimated. From the attributes shown in FIG. 4 (age, gender, and home region), an estimate of how much the person likes each of ramen, sushi, and curry rice is computed, and the food with the largest estimated preference is selected as the food the person likes best.

[0019] The option numbers of the questionnaire are used directly as attribute values, and the attribute values for age, gender, and home region are denoted d_1, d_2, and d_3, respectively. Using attribute-value functions w_kj defined for each feature (k = 1, 2, 3) and each attribute (j = 1, 2, 3), and a constant w_k0 defined for each feature, the estimated preferences y_1, y_2, y_3 for ramen, sushi, and curry rice are modeled as follows.

[0020]

$$y_k = w_{k0} + \sum_{j=1}^{3} w_{kj}(d_j), \qquad k = 1, 2, 3 \qquad (1)$$

Here, the attribute-value function w_kj is defined as follows.

[0021]

$$w_{kj}(d_j) = a_{kjm} \quad \text{when } d_j = m \ (m = 1, 2, \ldots) \qquad (2)$$
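Equation (1), with each w_kj implemented as the table lookup w_kj(d_j) = a_kjm of paragraph [0012], reduces to one lookup per attribute followed by a sum. The Python sketch below assumes 0-based option and attribute indices and stores each function w_kj as a dictionary {m: a_kjm}; the function name and the coefficient values are made up for illustration.

```python
def estimate(d, w0, a):
    """Return [y_0, ..., y_{K-1}] with y_k = w0[k] + sum_j a[k][j][d[j]].

    d  : attribute values (d_1, ..., d_J), e.g. questionnaire option numbers
    w0 : w0[k] is the constant w_k0 of option k
    a  : a[k][j] is a dict {m: a_kjm} implementing w_kj(d_j) = a_kjm when d_j = m
    """
    return [w0[k] + sum(a[k][j][dj] for j, dj in enumerate(d))
            for k in range(len(w0))]


# Hypothetical coefficients: 2 options, 2 attributes that take the values 1 or 2.
W0 = [3.0, 2.5]
A = [
    [{1: -0.2, 2: 0.3}, {1: 0.5, 2: -0.5}],
    [{1: 0.4, 2: 0.0}, {1: 0.1, 2: 0.2}],
]
```

For the attribute values (2, 1), option 0 scores 3.0 + 0.3 + 0.5 = 3.8 and option 1 scores 2.5 + 0.0 + 0.1 = 2.6.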

[0022] Next, a method of determining the parameters of the estimation model from the training cases is described. Evaluation functions E_1, E_2, and E_3 are defined as follows. Each is the sum, over all training cases, of the squared estimation error of the preference for ramen, sushi, and curry rice, respectively.

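[0023]

$$E_1 = \sum_{i=1}^{M} \Bigl( y_1^{i} - w_{10} - \sum_{j=1}^{3} w_{1j}(d_j^{i}) \Bigr)^2 \qquad (3)$$

$$E_2 = \sum_{i=1}^{M} \Bigl( y_2^{i} - w_{20} - \sum_{j=1}^{3} w_{2j}(d_j^{i}) \Bigr)^2 \qquad (4)$$

$$E_3 = \sum_{i=1}^{M} \Bigl( y_3^{i} - w_{30} - \sum_{j=1}^{3} w_{3j}(d_j^{i}) \Bigr)^2 \qquad (5)$$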

[0024] Here, M is the number of training cases. y_1^i, y_2^i, and y_3^i are the preference values (questionnaire option numbers) for ramen, sushi, and curry rice in the i-th training case, and d_j^i is the j-th attribute value of the i-th training case. It suffices to search for parameters that minimize these evaluation functions, and any known optimization technique may be used for the search.
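Each E_k involves only the parameters of option k, so the three evaluation functions can be minimized independently. With table-lookup functions w_kj the model is linear in its coefficients, so this is an ordinary least-squares problem. The Python sketch below uses backfitting (block coordinate descent: each table entry is set to the mean partial residual of the cases that take that value). That is one possible choice of the "known optimization technique" the text leaves open, not the method the patent prescribes, and the function names are hypothetical.

```python
from collections import defaultdict


def fit_option(cases, targets, n_sweeps=100):
    """Minimise E_k = sum_i (y_k^i - w_k0 - sum_j w_kj(d_j^i))^2 for one option k.

    cases   : list of attribute-value tuples (d_1^i, ..., d_J^i)
    targets : list of observed values y_k^i (same length as cases)
    Returns (w_k0, tables) where tables[j][m] plays the role of a_kjm.
    """
    n_attr = len(cases[0])
    # A constant shift can be absorbed into any table, so fixing w_k0 at the
    # target mean does not change the minimum value of E_k.
    w0 = sum(targets) / len(targets)
    tables = [defaultdict(float) for _ in range(n_attr)]
    for _ in range(n_sweeps):
        for j in range(n_attr):
            sums, counts = defaultdict(float), defaultdict(int)
            for d, y in zip(cases, targets):
                # Partial residual: the target minus every term except attribute j.
                r = y - w0 - sum(tables[o][d[o]] for o in range(n_attr) if o != j)
                sums[d[j]] += r
                counts[d[j]] += 1
            # With the other tables fixed, the least-squares value of a_kjm is
            # the mean partial residual over the cases whose d_j equals m.
            for m in sums:
                tables[j][m] = sums[m] / counts[m]
    return w0, [dict(t) for t in tables]


def sse(cases, targets, w0, tables):
    """The evaluation function E_k for the given parameters."""
    return sum((y - w0 - sum(t[dj] for t, dj in zip(tables, d))) ** 2
               for d, y in zip(cases, targets))
```

For the three foods, `fit_option` would be called once per food with that food's preference column as `targets`.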

[0025] For example, if the coefficients are obtained as a_111 = -0.2, a_112 = -0.1, a_113 = 0.3, a_114 = 0.5, and a_115 = 0.4, then, taking the variable d_1 as the age bracket, the function w_k1(d_1) (here for k = 1) is as shown in FIG. 5. Instead of the function of FIG. 5, a function such as that of FIG. 6 may also be used; it takes d_1 as the age itself and is obtained by smoothly interpolating the function of FIG. 5.
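The two variants of FIG. 5 and FIG. 6 can be sketched in Python as follows, using the coefficients a_111 to a_115 given above. The bracket centre ages are hypothetical, because the questionnaire of FIG. 3 is not reproduced here. Piecewise-linear interpolation also stands in for the "smooth" interpolation of FIG. 6; a spline would give a smoother curve.

```python
from bisect import bisect_right

# Coefficients a_111 .. a_115 from paragraph [0025].
A_11 = [-0.2, -0.1, 0.3, 0.5, 0.4]
# Hypothetical centre ages of the five brackets (not taken from FIG. 3).
CENTRES = [15, 25, 35, 45, 55]


def w11_step(bracket):
    """FIG. 5 style: d_1 is the bracket number 1..5; returns a_11m for bracket m."""
    return A_11[bracket - 1]


def w11_interp(age):
    """FIG. 6 style: d_1 is the age itself.

    Piecewise-linear interpolation between bracket centres, held constant
    below the first centre and above the last one.
    """
    if age <= CENTRES[0]:
        return A_11[0]
    if age >= CENTRES[-1]:
        return A_11[-1]
    i = bisect_right(CENTRES, age) - 1
    x0, x1 = CENTRES[i], CENTRES[i + 1]
    t = (age - x0) / (x1 - x0)
    return A_11[i] + t * (A_11[i + 1] - A_11[i])
```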

[0026] To estimate the most preferred food (feature) with the obtained model, obtain the attribute values d_1, d_2, and d_3 from the person's age, gender, and home region according to FIG. 3, and substitute them into equation (1) to obtain the preference estimates y_1, y_2, and y_3 for each food. The food with the largest estimate is taken to be the favorite food: ramen if the largest estimate is y_1, sushi if it is y_2, and curry rice if it is y_3.
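The argmax step above is sketched below in Python. The estimates are assumed to have already been computed from equation (1). The function name is hypothetical, and ties are resolved in favor of the first option, a choice the text does not address.

```python
FOODS = ["ramen", "sushi", "curry rice"]


def most_liked(estimates, labels=FOODS):
    """Return the label whose estimate y_k is largest (ties go to the first)."""
    best = max(range(len(estimates)), key=lambda k: estimates[k])
    return labels[best]
```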

[0027] The model can also be used to estimate and select all of the favorite foods among ramen, sushi, and curry rice. For example, with a threshold of 3.5, a food may be estimated to be liked if its preference estimate y_k is 3.5 or more. If the estimates are y_1 = 2.5, y_2 = 4.2, and y_3 = 3.8, the favorite foods are estimated to be sushi and curry rice.
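The threshold rule can be sketched in Python as follows. The function name is hypothetical, and the threshold and estimates are the example values from the paragraph above.

```python
FOODS = ["ramen", "sushi", "curry rice"]


def liked_foods(estimates, labels=FOODS, threshold=3.5):
    """Return every label whose estimated preference y_k is at least `threshold`."""
    return [label for label, y in zip(labels, estimates) if y >= threshold]


print(liked_foods([2.5, 4.2, 3.8]))  # ['sushi', 'curry rice']
```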

[0028] If the training cases state only which food is liked best, rather than the preference for each food, the preference value can be set to 5 for the favorite food and 1 for the others. The model parameters are then obtained and estimation is performed in the same way as above. For example, if the favorite food is sushi, the i-th training case has the preference values y_1^i = 1 for ramen, y_2^i = 5 for sushi, and y_3^i = 1 for curry rice.
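The conversion from a "liked best" answer to training targets can be sketched in Python as follows. The function name is hypothetical, and option indices are assumed to be 0-based in the order ramen, sushi, curry rice.

```python
def preference_targets(favourite, n_options=3, high=5.0, low=1.0):
    """Turn 'option `favourite` is liked best' into target values y_1^i .. y_K^i.

    `favourite` is a 0-based index, so with the order (ramen, sushi, curry rice)
    an answer of sushi is index 1.
    """
    return [high if k == favourite else low for k in range(n_options)]


print(preference_targets(1))  # [1.0, 5.0, 1.0]
```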

[0029] To perform estimation for cases that have an unknown attribute value, the value of the function when d_j is unknown is defined as

$$w_{kj}(d_j) = c_{kj} \quad (d_j \text{ unknown}) \qquad (6)$$

c_kj may be, for example, the mean of a_kjm (m = 1, 2, ...). Alternatively, a weighted mean, weighted by the frequency with which each attribute value occurs, may be used.
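Equation (6) and the two choices of c_kj described above can be sketched in Python as follows. The function names are hypothetical, `None` is assumed to mark an unknown value, and the checks reuse the coefficients of paragraph [0025].

```python
from collections import Counter


def unknown_constant(table, observed=None):
    """Return c_kj for a lookup table {m: a_kjm}.

    Without `observed`, c_kj is the plain mean of the coefficients a_kjm.
    With `observed` (the values d_j seen in the training cases), each a_kjm is
    weighted by how often its value m occurs.
    """
    freq = Counter(m for m in (observed or []) if m in table)
    total = sum(freq.values())
    if total == 0:
        return sum(table.values()) / len(table)
    return sum(table[m] * n for m, n in freq.items()) / total


def w_kj(table, c_kj, d_j):
    """Equation (6): fall back to c_kj when d_j is unknown (None)."""
    return c_kj if d_j is None else table[d_j]
```

With the coefficients of paragraph [0025], the plain mean is 0.9 / 5 = 0.18. If only the values 3, 3, 4 were observed, the weighted mean is (2 × 0.3 + 0.5) / 3, or about 0.367.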

[0030] The constant w_k0 may also be fixed at 0 and excluded from optimization. In addition, a combination of two or more attributes may be defined as a single new attribute. For example, the two attributes "gender" and "home region" can be combined into one attribute whose attribute values are assigned as shown in FIG. 7.
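Combining two attributes into one can be sketched in Python by numbering every pair of values. The function name and the value codes (gender 1-2, home region 1-3) are hypothetical, and the numbering need not match the table of FIG. 7.

```python
from itertools import product


def combine_attributes(values_a, values_b):
    """Give every pair (a, b) of attribute values its own new value, numbered from 1."""
    return {pair: code
            for code, pair in enumerate(product(values_a, values_b), start=1)}


# Hypothetical codes: gender 1-2, home region 1-3.
GENDER_REGION = combine_attributes([1, 2], [1, 2, 3])
```

The combined attribute receives its own coefficient for each gender/region pair. This lets the model represent an interaction that the sum of separate gender and region terms in equation (1) cannot express.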

[0031] FIG. 8 is a block diagram showing the configuration of an information processing apparatus that implements the information processing method of the present invention. As an example, the apparatus performs text-to-speech synthesis, converting an input text 101 into synthesized speech 110.

[0032] The language processing unit 10 performs linguistic analysis on the text 101 and outputs the results for each accent phrase: the accent type 102, the number of morae 103, the dependency target 104 of the accent phrase, the dependency target 105 of the preceding accent phrase, and the phoneme symbol string 106.

[0033] For example, for the text 「あらゆる現実をすべて自分の方へねじ曲げたのだ」 (roughly, "[he] twisted every reality entirely toward himself"), the accent phrases are 「あらゆる」 (every), 「現実を」 (reality), 「すべて」 (all), 「自分の」 (one's own), 「方へ」 (toward), and 「ねじ曲げたのだ」 (twisted). For each of these accent phrases, the accent type 102, the number of morae 103, the dependency target 104 of the accent phrase, and the dependency target 105 of the preceding accent phrase are output.

[0034] The prosody generation unit 11 uses the accent type 102, the number of morae 103, the dependency target 104 of the accent phrase, the dependency target 105 of the preceding accent phrase, and the phoneme symbol string 106 to generate a pitch pattern (the time-varying pattern of the fundamental frequency) and the phoneme durations. It outputs them as the output data 107 and 108, respectively.

[0035] The speech signal generation unit 12 generates the synthesized speech 110 from the phoneme symbol string 106, the pitch pattern 107, and the phoneme durations 108. Next, the detailed operation of the prosody generation unit 11 is described with reference to FIG. 9.

[0036] The representative pattern selection unit 21 receives the accent type 102, the number of morae 103, the dependency target 104 of the accent phrase, and the dependency target 105 of the preceding accent phrase from the language processing unit 10. From these it selects a representative pattern suitable for the accent phrase and outputs the representative pattern number 201.

[0037] The representative pattern storage unit 22 stores a plurality of representative patterns in advance, each associated with a representative pattern number. From these, it reads out and outputs the representative pattern 203 corresponding to the representative pattern number 201 output by the representative pattern selection unit 21.

[0038] A representative pattern, as shown for example in FIG. 10, is a typical time-varying pattern of the fundamental frequency of speech whose time axis has been normalized in units of morae. The shift amount generation unit 20 uses the input number of morae 103, the dependency target 104 of the accent phrase, and the dependency target 105 of the preceding accent phrase to compute the shift amount 202. This is the amount by which the representative pattern 203 is shifted in parallel along the logarithmic frequency axis, and the unit outputs it.

[0039] The phoneme duration generation unit 23 computes and outputs the duration 108 of each phoneme according to the phoneme symbol string 106. The pitch pattern generation unit 24 stretches or compresses the representative pattern 203 in the time direction according to the phoneme durations 108, so that the length of each mora equals its duration. It then shifts the result in parallel along the logarithmic frequency axis by the shift amount 202 and outputs it as the pitch pattern 107. FIG. 11 shows an example of pitch pattern generation for the accent phrase 「あらゆる」 (arayuru, "every").

[0040] FIG. 11(a) shows the representative pattern 203, FIG. 11(b) shows the pattern obtained by stretching it in the time direction, and FIG. 11(c) shows the pitch pattern 107 obtained by shifting that pattern in parallel along the logarithmic frequency axis. The vertical axis of FIG. 11 represents logarithmic frequency, and the shift amount 202 is 4.5.
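The stretching and shifting steps can be sketched in Python as follows. The data layout is an assumption: each representative pattern is stored as a list of log-F0 samples per mora on a normalized time axis, and each mora's duration is assumed to have already been summed from its phoneme durations 108. The function name is hypothetical; only the shift value 4.5 comes from the text.

```python
def stretch_and_shift(rep, mora_durations, shift):
    """Turn a normalized representative pattern into a pitch pattern.

    rep            : per-mora lists of log-F0 samples; the samples of each mora are
                     assumed to be evenly spaced over that mora's normalized [0, 1)
    mora_durations : duration of each mora in seconds
    shift          : shift amount 202 on the log-frequency axis
    Returns a list of (time_in_seconds, log_f0) points.
    """
    if len(rep) != len(mora_durations):
        raise ValueError("need exactly one duration per mora")
    points, start = [], 0.0
    for samples, dur in zip(rep, mora_durations):
        for i, value in enumerate(samples):
            # Time-axis stretch: map normalized position i/n onto this mora's span.
            t = start + dur * i / len(samples)
            # Parallel shift along the log-frequency axis.
            points.append((t, value + shift))
        start += dur
    return points
```

Because the shift is applied on a logarithmic axis, it scales the fundamental frequency by a constant factor rather than adding a fixed number of hertz.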

[0041] Next, the detailed operation of the representative pattern selection unit 21 is described. Here, the accent type 102, the number of morae 103, the dependency target 104 of the accent phrase, and the dependency target 105 of the preceding accent phrase are the attributes of the accent phrase, and the representative pattern is the feature value. First, an evaluation value for each representative pattern is estimated from the attribute values. Then the number of the representative pattern with the smallest estimate is output as the representative pattern number 201.

[0042] Here, the estimate for each representative pattern represents the distance between the pitch pattern generated from that representative pattern and the ideal pitch pattern for the given combination of attribute states.

[0043] The distance estimates are obtained as follows. The attribute values for the states of each attribute are defined as shown in FIG. 12. The distance estimate y_k for the k-th representative pattern is given by the following equation, using the attribute-value functions w_kj(d_j) (k = 1, 2, ..., K; j = 1, 2, 3, 4) and a constant w_k0.

[0044]

$$y_k = w_{k0} + \sum_{j=1}^{4} w_{kj}(d_j), \qquad k = 1, 2, \ldots, K \qquad (7)$$

[0045] The functions w_kj(d_j) (k = 1, 2, ..., K; j = 1, 2, 3, 4) are defined as follows, using coefficients a_kjm (k = 1, 2, ..., K; j = 1, 2, 3, 4; m = 0, 1, 2, ...) and constants c_kj.

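[0046]

$$w_{kj}(d_j) = \begin{cases} a_{kjm} & \text{if } d_j = m \ (m = 0, 1, 2, \ldots) \\ c_{kj} & \text{if } d_j \text{ is unknown} \end{cases} \qquad (8)$$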
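Equations (7) and (8) together give the selection rule of paragraph [0041]: estimate a distance for every representative pattern and take the smallest. The Python sketch below assumes 0-based indices and uses `None` to mark an undetermined attribute. The function names and example coefficients are hypothetical, and the example uses two attributes instead of four only to stay short.

```python
UNKNOWN = None  # marks an attribute whose state could not be determined


def w_kj(table, c_kj, d_j):
    """Equation (8): a_kjm when d_j == m, or the constant c_kj when d_j is unknown."""
    return c_kj if d_j is UNKNOWN else table[d_j]


def select_pattern(d, w0, a, c):
    """Return the 0-based index k whose estimated distance y_k is smallest.

    y_k = w0[k] + sum_j w_kj(a[k][j], c[k][j], d[j])   (equation (7))
    d : attribute values of the accent phrase (UNKNOWN where undetermined)
    """
    def distance(k):
        return w0[k] + sum(w_kj(a[k][j], c[k][j], dj) for j, dj in enumerate(d))

    return min(range(len(w0)), key=distance)


# Hypothetical model: 2 representative patterns, 2 attributes taking values 0 or 1.
W0 = [1.0, 1.2]
A = [[{0: 0.3, 1: -0.1}, {0: 0.0, 1: 0.2}],
     [{0: -0.2, 1: 0.4}, {0: 0.1, 1: -0.3}]]
C = [[0.1, 0.1], [0.1, -0.1]]
print(select_pattern([0, 1], W0, A, C))  # 1
```

For the attribute values (0, 1), the estimated distances are 1.5 and 0.7, so pattern 1 is chosen.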

[0047] Here, "d_j is unknown" means that the state of the attribute cannot be determined, for example because language processing failed. Instead of equation (8), a polynomial such as the following may be used.

[0048]

$$w_{kj}(d_j) = b_{kj2}\, d_j^{2} + b_{kj1}\, d_j + b_{kj0} \qquad (9)$$

The constant w_k0 in equation (7) and either the coefficients a_kjm of equation (8) or the coefficients b_kjm of equation (9) are determined so as to minimize the error of the distance estimates over the training cases.
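The polynomial form of equation (9) can be sketched in Python as follows. The function names are hypothetical, and the coefficients are assumed to have been fitted already.

```python
def w_poly(b, d_j):
    """Equation (9): b_kj2 * d_j**2 + b_kj1 * d_j + b_kj0, with b = (b_kj0, b_kj1, b_kj2)."""
    b0, b1, b2 = b
    return b2 * d_j ** 2 + b1 * d_j + b0


def estimate_distance(w0_k, b_k, d):
    """Equation (7) with every w_kj taken from equation (9); assumes no unknown values."""
    return w0_k + sum(w_poly(b, dj) for b, dj in zip(b_k, d))
```

The table of equation (8) needs one coefficient per attribute value. The quadratic uses three coefficients per function and can also be evaluated for numeric values (a mora count, for instance) that never occurred in the training cases, although whether such extrapolation is reasonable depends on the data.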

[0049] A training case combines two kinds of data. The first is the distance between a pitch pattern extracted from real speech and each representative pattern. The second is the corresponding text attributes (accent type, number of morae, dependency target of the accent phrase, and dependency target of the preceding accent phrase). Training cases are obtained by analyzing a large amount of text together with recordings of that text being read aloud.

[0050] The coefficients can be optimized with a known optimization technique, using, for example, the mean squared error of the distance estimates as the evaluation function. The coefficient c_kj of equation (8) may be the mean of a_kj0, a_kj1, ..., or a weighted mean that takes into account how often each attribute value occurs.

[0051] In this embodiment, the text attributes are the accent type, the number of morae, the dependency target of the accent phrase, and the dependency target of the preceding accent phrase. Various other information can also be used as attributes, such as part of speech, position in the sentence, position in the breath group, and phoneme type.

[0052]

[Effects of the Invention] As described above, according to the present invention, when a feature value corresponding to a case is selected, an evaluation value for selecting each option is estimated from the attribute information, and the feature value is selected on the basis of those estimates. A feature value closer to the correct answer can therefore be selected even when the correct one is not.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of a classifier using a conventional decision tree.

FIG. 2 is a diagram showing specific examples of representative patterns in pitch pattern generation.

FIG. 3 is a diagram showing a specific example of a questionnaire for collecting training cases.

FIG. 4 is a diagram showing specific examples of training cases.

FIG. 5 is a diagram showing a specific example of the attribute-value function w_k1(d_1).

FIG. 6 is a diagram showing a specific example of the attribute-value function w_k1(d_1), obtained by smoothly interpolating the function of FIG. 5.

FIG. 7 is a diagram showing an example of a table in which a combination of two attributes is treated as one attribute and assigned attribute values.

FIG. 8 is a diagram showing a configuration example of an information processing apparatus that performs text-to-speech synthesis according to an embodiment of the present invention.

FIG. 9 is a diagram showing a configuration example of the prosody generation unit.

FIG. 10 is a diagram schematically showing representative patterns stored in the representative pattern storage unit.

FIG. 11 is a diagram explaining the process of generating a pitch pattern from a representative pattern selected from a plurality of options.

FIG. 12 is a diagram showing an example of a table defining attribute values for attribute states.

[Explanation of symbols]

10: language processing unit
11: prosody generation unit
12: speech signal generation unit
20: shift amount generation unit
21: representative pattern selection unit
22: representative pattern storage unit
23: phoneme duration generation unit
24: pitch pattern generation unit

Continued from the front page
(72) Inventor: Masahiro Morita, c/o Toshiba Corporation Kansai Research Laboratory, 8-6-26 Motoyama-minamimachi, Higashinada-ku, Kobe, Hyogo
F-terms (reference): 5D045 AA09

Claims

[Claims]

1. An information processing method for selecting a feature value relating to a case from a plurality of options according to the states of a plurality of attributes of the case, wherein an estimated value y_k of the feature value corresponding to the k-th option is obtained, using functions w_kj(d_j) of attribute values d_j determined by the state of the j-th attribute and a constant w_k0, as

$$y_k = w_{k0} + \sum_{j} w_{kj}(d_j)$$

and a feature value relating to the case is selected from the plurality of options on the basis of the estimated values of the feature value.

2. The information processing method according to claim 1, wherein the attribute value d_j of at least one attribute takes a finite number N of values (d_j1, d_j2, ..., d_jN), and the function w_kj(d_j) of the attribute value corresponding to the k-th option is expressed, using coefficients (a_kj1, a_kj2, ..., a_kjN) corresponding to the respective attribute values, as w_kj(d_jm) = a_kjm.

3. The information processing method according to claim 1, wherein, when the attribute value d_j is unknown, the function w_kj(d_j) of the attribute value corresponding to the k-th option is expressed with a constant c_kj as w_kj(d_j) = c_kj.

4. The information processing method according to claim 1, wherein the options are options for a feature parameter relating to text-to-speech synthesis, which converts text into speech.

5. The information processing method according to claim 1, wherein the options are options for a feature parameter of a pitch pattern control model relating to text-to-speech synthesis, which converts text into speech.

6. The information processing method according to claim 1, wherein the options are options for a feature parameter of a pitch pattern control model relating to text-to-speech synthesis, which converts text into speech, and the pitch pattern control model selects one pattern from a plurality of representative patterns and uses, as the pitch pattern, a pattern obtained by deforming the selected pattern.

7. The information processing method according to claim 1, wherein the options are options for a feature parameter of a pitch pattern control model relating to text-to-speech synthesis, which converts text into speech, and the pitch pattern control model selects one pattern from a plurality of representative patterns and uses, as the pitch pattern, a pattern obtained by applying to the selected pattern a deformation that includes at least a parallel shift along the logarithmic frequency axis.

8. The information processing method according to claim 1, wherein, in text-to-speech synthesis, which converts text into speech, the attributes include at least the number of morae of an accent phrase, the dependency target of the accent phrase, and the dependency target of the preceding accent phrase.