JPS59132039A - Evaluating method of kana character string

Info

Publication number: JPS59132039A
Application number: JP58005684A
Authority: JP
Inventors: Yasutaka Morimoto; 森本　恭隆; Yutaka Ooyama; 裕大山
Original assignee: NEC Corp; Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1983-01-17
Filing date: 1983-01-17
Publication date: 1984-07-30

Abstract

PURPOSE: To make kana (Japanese syllabary) to kanji (Chinese character) conversion easier by using the likelihoods of multiple kana character candidates, obtained from speech pronounced one syllable at a time, together with concatenation information between monosyllables, to build m-syllable kana character strings and select those with the highest priority. CONSTITUTION: Text is pronounced in monosyllabic units and input to a voice input device 1. For each input monosyllable, the voice input device 1 outputs the corresponding kana character candidates and their likelihoods and stores them in a storage device 2. A concatenation information storage device 3 stores concatenation information for monosyllabic kana characters. An arithmetic device 4 computes the priorities of partial kana character strings from the kana character candidates in the device 2 and the concatenation information, selects, for example, the 5 kana character strings with the highest priorities, and stores them in a storage device 5. Although 5^2 two-syllable partial kana character strings are generated, only the 5 with the highest priorities are kept; these are then combined with the candidates and likelihoods of the third monosyllable in 5^2 operations, and the process is repeated to select the kana character strings with the highest priority. This reduces the effort involved in kana-kanji conversion and makes the operation easier.

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a kana character string evaluation method that evaluates, in descending order of priority and up to a pre-specified number, kana character strings composed of the one or more kana character candidates obtained for each monosyllable of a Japanese sentence pronounced in separated monosyllabic units.

In recent years, information processing devices such as computers have come to process Japanese text. However, because Japanese uses many kinds of characters, including kanji, hiragana, katakana, alphanumeric characters, and symbols, input has been regarded as the greatest technical obstacle to processing Japanese text. Establishing an efficient and easy input method is currently the most important challenge in Japanese text processing.

At present, kana-kanji conversion is the mainstream Japanese input method. The user types the desired sentence on a kana keyboard exactly as it is read, and the system converts it into mixed kanji-kana text. Its advantage is that the several thousand kanji used in Japanese text can be entered from a keyboard with only a few dozen kana keys. However, for a user who has not been trained in kana typing, input from a kana keyboard is by no means easy; it places a heavy burden on the user, and input is not fast. Kana-kanji conversion from a kana keyboard therefore cannot be said to be sufficiently effective for general users who are not proficient in kana typing.

A kana-kanji conversion system equipped with a monosyllabic voice input device in place of the keyboard is therefore conceivable. With this system, when the user dictates the sentence to be input, the monosyllabic voice input device converts it into a kana character string, and kana-kanji conversion then turns that string into the desired mixed kanji-kana sentence. The user can thus input Japanese sentences easily with almost no training.

In practice, however, the monosyllable recognition rate of a monosyllabic voice input device is not 100%, so the syllable sequence dictated by the user cannot always be converted into the intended kana character string. The result is erroneous conversion into mixed kanji-kana text, or failure to convert at all.

One could have the user check the kana character string output by the monosyllabic voice input device and correct errors with an auxiliary keyboard or the like, but this is awkward to operate and halves the benefit of voice input. From the standpoint of the man-machine interface as well, it is therefore desirable to accept the ambiguity of the speech recognition result, compensate for it automatically, and keep manual correction via the keyboard to a minimum.

Meanwhile, when Japanese is viewed as a sequence of monosyllables, marked regularities are often found in how monosyllables follow one another. For example, the monosyllable 「ア」 is followed relatively often by 「イ」 or 「ン」 (as in 愛情, aijō, and 安心, anshin), but is rarely followed by 「ゾ」 or 「ヨ」. Likewise, 「ア」 is preceded relatively often by 「イ」 or 「リ」 (as in 廃案, haian, and 乗合, noriai), but is almost never preceded by 「ゾ」 or 「ヌ」.

By making effective use of such information about how monosyllables concatenate, the candidate kana character strings formed from the sets of kana character candidates output by the monosyllabic voice input device can be re-prioritized: strings that seem inappropriate as Japanese (that is, strings composed of kana character candidates that are very unlikely to occur next to each other) are given lower priority, and strings that seem appropriate as Japanese (that is, strings composed of candidates that are very likely to occur next to each other) are given higher priority. As a result, more plausible kana character strings can be evaluated.

An object of the present invention is to provide a kana character string evaluation method that, when the recognition result of a monosyllabic voice input device is not uniquely determined, uses concatenation information between multiple monosyllables to evaluate, in descending order of priority and up to a pre-specified number, the kana character strings composed of the kana character candidates obtained for each monosyllable, and thereby to improve the performance and operability of Japanese information processing devices that use voice input, beginning with speech-based kana-kanji conversion.

Inventions related to the present invention include a kana character string determination method (Japanese Patent Application No. Sho 57-092757) and a kana character string priority determination method (Japanese Patent Application No. Sho 57-092755). The kana character string determination method yields only a single candidate kana character string. The kana character string priority determination method examines the kana character strings formed from every combination of kana character candidates (for example, 5^n combinations for an n-character word with 5 kana character candidates per monosyllable), so its processing time is long and its memory requirement is enormous.

The present invention remedies both of these shortcomings and evaluates a pre-specified number of kana character strings in descending order of priority by the following method.

That is, when the kana character candidates for the (m+1)-th syllable are appended to the m-syllable partial kana character strings to generate (m+1)-syllable partial kana character strings, those partial strings whose (m+1)-syllable partial kana character string priority is low (that is, which are very unlikely to be appropriate as Japanese) are discarded. This reduces the number of partial kana character strings to be evaluated, which shortens processing time and reduces memory use.

The present invention is described below with reference to the drawings and a specific embodiment. The device configuration used here may take forms other than this embodiment and does not limit the scope of the present invention.

FIG. 1 is a block diagram showing one embodiment of the present invention. Reference numeral 1 denotes a monosyllabic voice input device. 2 denotes a kana character/likelihood storage device that temporarily stores the kana character candidates output from the monosyllabic voice input device 1 and their likelihoods. 3 denotes a concatenation information storage device that stores monosyllable concatenation information. 4 denotes a kana character string/priority arithmetic device that, using the kana character candidates and likelihoods in the kana character/likelihood storage device 2 and the concatenation information in the concatenation information storage device 3, outputs the partial kana character strings composed of those candidates in descending order of partial kana character string priority. 5 denotes a kana character string/priority storage device that stores the partial kana character strings output from the arithmetic device 4 together with their priorities.

The user inputs a monosyllable sequence to the monosyllabic voice input device 1, either by uttering a Japanese sentence separated into monosyllabic units or by some other means such as playing back speech recorded on a tape recorder. For each input monosyllable, the monosyllabic voice input device 1 outputs to the kana character/likelihood storage device 2 the kana character candidates corresponding to that monosyllable and a likelihood expressing the certainty of each candidate.

FIG. 2 is a conceptual diagram showing an example of how kana character candidates and their likelihoods are stored in the kana character/likelihood storage device 2.

In FIG. 2, A(i, j) (where i and j are natural numbers) is the kana character that is the j-th candidate for the i-th syllable input from the monosyllabic voice input device 1, and B(i, j) is the likelihood, a numerical value expressing the certainty of A(i, j).
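The layout of FIG. 2 can be sketched as a nested list with one inner list per syllable. This is only an illustrative sketch: the kana and likelihood values below are invented for the example, and the accessor names `A` and `B` simply mirror the notation in the text, with 1-based indices.

```python
# candidates[i - 1][j - 1] holds (A(i, j), B(i, j)).
# All kana and likelihood values here are made up for illustration.
candidates = [
    [("ア", 0.6), ("ハ", 0.3), ("カ", 0.1)],  # syllable 1
    [("イ", 0.5), ("リ", 0.4), ("ヒ", 0.1)],  # syllable 2
]


def A(i, j):
    """j-th candidate kana for the i-th syllable (1-based)."""
    return candidates[i - 1][j - 1][0]


def B(i, j):
    """Likelihood of A(i, j)."""
    return candidates[i - 1][j - 1][1]
```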

The monosyllable concatenation information, meanwhile, is stored in the concatenation information storage device 3.

FIGS. 3(a), 3(b), and 3(c) are conceptual diagrams showing examples of how this information is stored.

In FIG. 3(a), M(i) and N(i) (where i is a natural number) are kana characters, and C[M(i), N(i)] is a numerical value expressing the concatenation information for kana character M(i) and kana character N(i); specifically, it is, for example, the frequency with which the two characters occur in sequence.

FIG. 3(b) is an example in which concatenation frequencies are given in the format of FIG. 3(a). It shows that 「アア」 has a weight of 1, 「アイ」 a weight of 117, and 「アラ」 a weight of 6.

FIG. 3(c) is an example in which the data of FIG. 3(b) is organized as an index keyed by kana character. Depending on how the concatenation information is stored, as here, lookups can be made efficient.
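The difference between the flat table of FIG. 3(b) and the indexed organization of FIG. 3(c) can be sketched as follows. This is a minimal sketch assuming Python dictionaries as the store; the names `pair_freq`, `build_index`, and `C`, and the choice of 0 for unseen pairs, are assumptions for illustration. The three weights are the ones quoted for FIG. 3(b).

```python
from collections import defaultdict

# Flat pair table in the style of FIG. 3(b): (first kana, second kana) -> weight.
pair_freq = {("ア", "ア"): 1, ("ア", "イ"): 117, ("ア", "ラ"): 6}


def build_index(pairs):
    """Group the flat table by first kana, in the spirit of FIG. 3(c)."""
    index = defaultdict(dict)
    for (first, second), freq in pairs.items():
        index[first][second] = freq
    return index


index = build_index(pair_freq)


def C(first, second, default=0):
    """Concatenation weight of `first` followed by `second`."""
    # .get() avoids creating empty entries for kana that were never seen.
    return index.get(first, {}).get(second, default)
```

With the index in place, all continuations of a given kana are found with a single key lookup instead of a scan over every pair.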

The kana character string/priority arithmetic device 4 evaluates, in descending order of partial kana character string priority and up to a pre-specified number, the partial kana character strings composed of the kana character candidates, using the kana character candidates and their likelihoods in the kana character/likelihood storage device 2 and the concatenation information in the concatenation information storage device 3. A concrete implementation example of the arithmetic device 4 is as follows.

When the kana character candidates and their likelihoods are given in the format of FIG. 2, an m-syllable partial kana character string always takes the form of equation (1).

$$A(1, x_1)\,A(2, x_2)\,\cdots\,A(m, x_m) \tag{1}$$

where $x_i$ ($i = 1, 2, \ldots, m$) is a kana character candidate number. Equations (2) and (3) below are applied repeatedly, starting from the first syllable, to generate partial kana character strings of the form of equation (1). By processing through the final n-th syllable, strings that seem inappropriate as Japanese (that is, kana character strings composed of kana character candidates that are very unlikely to concatenate) are eliminated, and strings that seem appropriate as Japanese (that is, kana character strings composed of candidates that are very likely to concatenate) are evaluated in descending order of priority, up to a pre-specified number.

$$T_1(P_1) = g'\bigl[B(1, i),\ B(2, j),\ f\bigl(C(A(1, i), A(2, j))\bigr)\bigr] \tag{2}$$

where $i = 1, 2, \ldots, m_1$; $j = 1, 2, \ldots, m_2$; $P_1 = 1, 2, \ldots, m_1 \cdot m_2$; $m_k$ is the number of kana character candidates for the k-th syllable; and $T_k$ is the set of (k+1)-syllable partial kana character string priorities.

$$T_l(P_l) = g\bigl[B(l+1, j),\ f\bigl(C(A^*(l, i), A(l+1, j))\bigr),\ T^*_{l-1}(i)\bigr] \tag{3}$$

where $i = 1, 2, \ldots, m^*_l$ (with $m^*_{l+1} = N$); $j = 1, 2, \ldots, m_{l+1}$; $P_l = 1, 2, \ldots, m^*_l \cdot m_{l+1}$; and $l = 2, 3, \ldots, n-1$. The starred quantities $A^*$, $T^*$, and $m^*$ refer to the partial strings retained after sorting in descending order of priority.

In equations (2) and (3), the function f takes the concatenation frequency of kana characters as its argument. Specifically, if f is, for example, a function that caps the concatenation frequency at a pre-specified threshold, it avoids the situation where an overly large concatenation-frequency factor makes the accuracy rate lower than that of the original data, and accuracy can be improved.
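One way to read "a function that caps the concatenation frequency at a pre-specified threshold" is a simple clip. The sketch below is an assumption about the shape of f, not a form given in the source, and the threshold of 50 is an arbitrary illustrative value.

```python
def f(freq, threshold=50):
    """Cap a concatenation frequency so that very common pairs cannot
    dominate the likelihood terms (threshold is an illustrative value)."""
    return min(freq, threshold)
```

With the weights quoted for FIG. 3(b), the weight 117 for 「アイ」 would be capped to 50, while the weight 6 for 「アラ」 would pass through unchanged.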

The function g determines a priority from the kana character candidate likelihoods $x_i$ ($i = 1, 2, \ldots, j$), the value y obtained by applying the function f to the concatenation frequency, and the priority z of the partial kana character string. Specifically, it can be expressed by a formula such as

$$g(x_1, x_2, \ldots, x_j, y, z) = x_1 + x_2 + \cdots + x_j + y + z.$$

The functions g' and g'' are defined as $g'(x_1, \ldots, x_j, y) \equiv g(x_1, \ldots, x_j, y, 0)$ and $g''(y, z) \equiv g(0, 0, \ldots, 0, y, z)$, respectively.

Equations (2) and (3) can be explained as follows.

First, equation (2) yields the priority set $T_1$ for the $m_1 \cdot m_2$ combinations, using the value obtained by applying f to the concatenation frequency of the first and second characters together with the likelihoods of the first and second characters. $T_1$ is sorted in descending order to obtain the descending priority set $T^*_1$. At the same time, the two-syllable partial kana character strings are reordered to obtain $A^*(i^*, j^*)$ ($i^* = 1, 2$; $j^* = 1, 2, \ldots, m^*_2$). At this stage, the number $m^*_2$ of retained candidates at the second syllable is set to the pre-specified number N.

From the second character onward, equation (3) yields the priority set $T_l$ for the $m^*_l \cdot m_{l+1}$ combinations, using the value obtained by applying f to the concatenation frequency of the l-th and (l+1)-th characters, the likelihood of the (l+1)-th character, and the l-syllable partial kana character string priority. From this, the descending priority set $T^*_l$ and $A^*(i^*, j^*)$ ($i^* = 1, 2, \ldots, l+1$; $j^* = 1, 2, \ldots, m^*_{l+1}$) are obtained.

Here $m^*_{l+1}$ is set to N. By repeating the processing of equation (3) for $l = 2, 3, \ldots, n-1$, a pre-specified number of kana character strings can be evaluated in descending order of priority.
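The procedure of equations (2) and (3) amounts to extending every retained partial string by every candidate of the next syllable, scoring the extensions, and keeping only the N best. It can be sketched as below. This is a sketch under stated assumptions rather than the patented implementation: g is taken to be the additive form given as an example in the text, f is a cap with an arbitrary threshold of 50, and the input likelihoods and most names (`evaluate`, `pair_freq`, `n_keep`) are invented for illustration.

```python
import heapq


def f(freq, threshold=50):
    """Cap the concatenation frequency (threshold value is an assumption)."""
    return min(freq, threshold)


def g(likelihoods, y, z):
    """Additive priority: likelihoods + capped frequency + previous priority."""
    return sum(likelihoods) + y + z


def evaluate(candidates, pair_freq, n_keep):
    """candidates[k] lists (kana, likelihood) pairs for syllable k + 1.

    Returns up to n_keep (priority, kana_tuple) pairs, best first.
    """
    # Seed with every first-syllable candidate; with an additive g this makes
    # the first pass below equal to equation (2), and later passes equation (3).
    beam = [(likelihood, (kana,)) for kana, likelihood in candidates[0]]
    for syllable in candidates[1:]:
        extended = []
        for priority, string in beam:
            for kana, likelihood in syllable:
                y = f(pair_freq.get((string[-1], kana), 0))
                extended.append((g([likelihood], y, priority), string + (kana,)))
        # Keep only the n_keep highest-priority partial strings.
        beam = heapq.nlargest(n_keep, extended)
    return heapq.nlargest(n_keep, beam)


# Illustrative input: the likelihoods and the second frequency are invented.
candidates = [[("ア", 5), ("ハ", 6)], [("イ", 4), ("ヒ", 5)]]
pair_freq = {("ア", "イ"): 117, ("ハ", "イ"): 30}
result = evaluate(candidates, pair_freq, n_keep=2)
```

In this toy input, 「ハ」 and 「ヒ」 have the highest individual likelihoods, yet the top-ranked string is 「アイ」 because its concatenation weight outweighs the likelihood difference.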

By storing the pre-specified number of kana character strings in the kana character string/priority storage device 5 in descending order of priority in this way, the kana character strings corresponding to the input speech are obtained, up to the pre-specified number, in descending order of priority.

The explanation so far has dealt only with concatenation between two kana characters, but concatenation across three or more kana characters can be handled equally well.
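Concatenation over three kana can be looked up in the same way as over two if the table is keyed by triples. The following is a minimal sketch assuming a Python dictionary as the store; the weights are the ones quoted in the text for the FIG. 4 example, while the names `triple_freq`, `C3`, and `rank_third` and the zero default for unseen sequences are assumptions.

```python
# (first, second, third kana) -> concatenation weight, as quoted for FIG. 4.
triple_freq = {
    ("ア", "イ", "ア"): 0,
    ("ア", "イ", "イ"): 5,
    ("ア", "イ", "エ"): 2,
    ("ア", "イ", "カ"): 4,
}


def C3(first, second, third, default=0):
    """Concatenation weight of a three-kana sequence."""
    return triple_freq.get((first, second, third), default)


def rank_third(first, second, options):
    """Order candidate third kana by descending concatenation weight."""
    return sorted(options, key=lambda kana: C3(first, second, kana), reverse=True)
```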

FIG. 4 is an example in which concatenation frequencies across three characters are stored. It shows that 「アイア」 has a weight of 0, 「アイイ」 5, 「アイエ」 2, and 「アイカ」 4. In this case, the priority set T described above can be expressed as follows.

$$T_1(P_1) = g'\bigl[B(1, i),\ B(2, j),\ B(3, k),\ f\bigl(C(A(1, i), A(2, j), A(3, k))\bigr)\bigr] \tag{4}$$

where $i = 1, 2, \ldots, m_1$; $j = 1, 2, \ldots, m_2$; $k = 1, 2, \ldots, m_3$; and $P_1 = 1, 2, \ldots, m_1 \cdot m_2 \cdot m_3$.

$$T_l(P_l) = g\bigl[B(l+2, j),\ f\bigl(C(A^*(l, i), A^*(l+1, i), A(l+2, j))\bigr),\ T^*_{l-1}(i)\bigr] \tag{5}$$

where $i = 1, 2, \ldots, m^*_{l+1}$ (with $m^*_{l+2} = N$); $j = 1, 2, \ldots, m_{l+2}$; $P_l = 1, 2, \ldots, m^*_{l+1} \cdot m_{l+2}$; and $l = 2, 3, \ldots, n-2$.

The concatenation information may also include, in place of a character, a symbol denoting the beginning of a character string or a symbol denoting its end, as in FIGS. 5(a) and 5(b).

FIGS. 5(a) and 5(b) show examples of this. The symbol ■ denotes the beginning of a character string when it precedes a kana character and the end of a character string when it follows one.

FIG. 5(a) shows that 「■」「ア」 (that is, a string beginning with 「ア」) has a weight of 100 and 「ン」「■」 (that is, a string ending with 「ン」) has a weight of 50. FIG. 5(b) shows that 「■」「カ」「ン」 (a string beginning with 「カ」「ン」) has a weight of 45 and 「イ」「リ」「■」 (a string ending with 「イ」「リ」) has a weight of 30. In these cases, the priority T can be expressed as follows for each case.

Two-character concatenation:

$$T_1(P_1) = g'\bigl[B(1, i),\ f\bigl(C(\blacksquare, A(1, i))\bigr)\bigr] \tag{6}$$

where $i = 1, 2, \ldots, m_1$ and $P_1 = 1, 2, \ldots, m_1$.

$$T_l(P_l) = g\bigl[B(l, j),\ f\bigl(C(A^*(l-1, i), A(l, j))\bigr),\ T^*_{l-1}(i)\bigr] \tag{7}$$

where $i = 1, 2, \ldots, m^*_{l-1}$ (with $m^*_l = N$); $j = 1, 2, \ldots, m_l$; $P_l = 1, 2, \ldots, m^*_{l-1} \cdot m_l$; and $l = 2, 3, \ldots, n$.

$$T_{n+1}(P_{n+1}) = g''\bigl[f\bigl(C(A^*(n, i), \blacksquare)\bigr),\ T^*_n(i)\bigr] \tag{8}$$

where $i = 1, 2, \ldots, m^*_n$ and $P_{n+1} = 1, 2, \ldots, m^*_n$.

Three-character concatenation:

$$T_1(P_1) = g'\bigl[B(1, i),\ B(2, j),\ f\bigl(C(\blacksquare, A(1, i), A(2, j))\bigr)\bigr] \tag{9}$$

where $i = 1, 2, \ldots, m_1$; $j = 1, 2, \ldots, m_2$; and $P_1 = 1, 2, \ldots, m_1 \cdot m_2$.

$$T_l(P_l) = g\bigl[B(l+1, j),\ f\bigl(C(A^*(l-1, i), A^*(l, i), A(l+1, j))\bigr),\ T^*_{l-1}(i)\bigr] \tag{10}$$

where $i = 1, 2, \ldots, m^*_l$ (with $m^*_{l+1} = N$); $j = 1, 2, \ldots, m_{l+1}$; $P_l = 1, 2, \ldots, m^*_l \cdot m_{l+1}$; and $l = 2, 3, \ldots, n-1$.

$$T_n(P_n) = g''\bigl[f\bigl(C(A^*(n-1, i), A^*(n, i), \blacksquare)\bigr),\ T^*_{n-1}(i)\bigr] \tag{11}$$

where $i = 1, 2, \ldots, m^*_n$ and $P_n = 1, 2, \ldots, m^*_n$.

FIG. 6(a) is an example showing the data structure in the kana character string/priority arithmetic device 4, in which 6 is the priority rank, 7 is a partial kana character string, and 8 is the priority of the partial kana character string 7.
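The two-character variant with boundary symbols, equations (6) to (8), together with a ranked table in the style of FIG. 6(a) (rank, string, priority), can be sketched as below. Several things here are assumptions made for illustration: the boundary symbols are represented by the hypothetical strings `"<s>"` and `"</s>"`, f is taken as the identity, g is the additive example form, pruning to N is also applied after the first syllable, and the candidate likelihoods are invented. The two weights are the ones quoted for FIG. 5(a).

```python
import heapq

START, END = "<s>", "</s>"  # hypothetical stand-ins for the boundary symbols


def rank_strings(candidates, freq, n_keep):
    """Return (rank, string, priority) rows, best first."""
    def c(a, b):
        return freq.get((a, b), 0)

    # Equation (6): likelihood plus the start-of-string weight.
    beam = heapq.nlargest(
        n_keep, [(b + c(START, a), (a,)) for a, b in candidates[0]])
    # Equation (7): extend by one syllable and keep the best n_keep.
    for syllable in candidates[1:]:
        beam = heapq.nlargest(n_keep, [
            (b + c(s[-1], a) + t, s + (a,))
            for t, s in beam for a, b in syllable])
    # Equation (8): add the end-of-string weight; no likelihood term.
    final = heapq.nlargest(n_keep, [(c(s[-1], END) + t, s) for t, s in beam])
    return [(rank, "".join(s), t) for rank, (t, s) in enumerate(final, start=1)]


freq = {(START, "ア"): 100, ("ン", END): 50}  # weights quoted for FIG. 5(a)
candidates = [[("ア", 3), ("ハ", 4)], [("ン", 2), ("ム", 3)]]  # invented values
rows = rank_strings(candidates, freq, n_keep=2)
```

In this toy input, 「アム」 is ahead of 「アン」 after the second syllable, and the end-of-string weight for 「ン」 reverses the order in the final pass.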
-...1mn Figure 6 (a) is an example showing the data structure in the kana character string/priority calculation device 4. The white mark indicates the priority order, 7 indicates the partial kana character string, and 8 indicates the partial kana character string. It has a priority of 7.

When the pre-specified number (N in the explanation above) is 5, as in FIG. 6(a), suppose the processing of equations (2) and (3) is applied to an n-character word with 5 kana character candidates per monosyllable. Generating the two-syllable partial kana character strings takes 5 × 5 combinations; generating the three-syllable partial kana character strings from the top candidates among the two-syllable strings and the kana character candidates for the third syllable takes another 5 × 5 combinations; and so on. In total, only partial kana character strings from 5 × 5 × (n − 1) combinations are evaluated. Compared with the kana character string priority determination method (Japanese Patent Application No. Sho 57-092755), which evaluates kana character strings from 5^n combinations, processing time is shortened and memory use is reduced.
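The saving can be checked with a little arithmetic. The helper names below are hypothetical; `k` is the number of candidates per syllable and `n_keep` is N.

```python
def pruned_count(n, k=5, n_keep=5):
    """Combinations scored when only n_keep strings survive each step (n >= 2)."""
    first_step = k * k                  # all pairs of syllables 1 and 2
    later_steps = (n - 2) * n_keep * k  # n_keep survivors times k new candidates
    return first_step + later_steps


def exhaustive_count(n, k=5):
    """Combinations scored when every candidate string is enumerated."""
    return k ** n
```

For a 5-character word with 5 candidates per syllable and N = 5, this gives 5 × 5 × 4 = 100 scored combinations against 5^5 = 3125 for exhaustive enumeration.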

The data structure can also take a cell format, as in FIG. 6(b). In the figure, 9 is a candidate table that holds partial kana character strings and 10 is a control cell that holds control information. Each control cell 10 stores, as control information, a table pointer 11 indicating which partial kana character string in the candidate table 9 it refers to, a syllable count 12 indicating how many syllables of partial kana character string are held in the candidate table 9, the priority 13 of that partial kana character string, and a cell pointer 14 pointing to the cell with the next-highest priority. 15 is a maximum-cell pointer to the cell whose priority 13 is highest, and 16 is a control table consisting of the maximum-cell pointer 15 and the control cells 10. Managing partial kana character strings with a table and pointers in this cell-format data structure simplifies reordering and therefore shortens processing time. In addition, fixing the size of the candidate table 9 narrows the number of candidate partial kana character strings (the value of N in the explanation above) as processing proceeds. For example, if the candidate table 9 holds 25 characters and the word has 5 characters, 12 candidates can be stored up to the second character, and as the third, fourth, and fifth characters are processed the number of candidates, taken in descending order of priority, changes to 8, 6, and 5. Many possibilities are thus kept open at first, while toward the end the risk that low-ranked kana character strings (that is, strings composed of kana character candidates that are very unlikely to concatenate) appear among the top candidates is suppressed, and processing time can be shortened.
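The cell format and the length-dependent candidate count can be sketched as follows. The field names follow the reference numerals in the text (11 to 15), but the class, the function names, and the linked-list insertion routine are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    table_pointer: int                  # 11: offset of the string in the candidate table
    syllables: int                      # 12: number of syllables stored
    priority: float                     # 13: priority of the partial string
    next_cell: Optional["Cell"] = None  # 14: cell with the next-highest priority


def insert_by_priority(head, cell):
    """Link `cell` into the descending-priority chain and return the new
    head, which plays the role of the maximum-cell pointer (15)."""
    if head is None or cell.priority > head.priority:
        cell.next_cell = head
        return cell
    node = head
    while node.next_cell is not None and node.next_cell.priority >= cell.priority:
        node = node.next_cell
    cell.next_cell = node.next_cell
    node.next_cell = cell
    return head


def beam_width(table_size, length):
    """Number of partial strings of `length` characters that fit in a
    candidate table holding `table_size` characters."""
    return table_size // length
```

Because only pointers change when a cell is inserted, the strings in the candidate table never have to be moved. With a 25-character table, `beam_width` gives 12, 8, 6, and 5 for lengths 2 through 5, matching the figures in the text.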

FIG. 7 is an example of the kana character candidates and likelihoods output to the kana character/likelihood storage device 2 when a monosyllable sequence uttered by the user passes through the monosyllabic voice input device 1; the numbers in parentheses are likelihoods. FIG. 8 shows the kana character strings that the kana character string/priority arithmetic device 4 composed from the kana character candidates and likelihoods in the kana character/likelihood storage device 2 (FIG. 7) and the concatenation information in the concatenation information storage device 3, and output to the kana character string/priority storage device 5 in descending order of priority. FIG. 8(a) is the result of the processing of equations (6), (7), and (8), and FIG. 8(b) is the result of the processing of equations (9), (10), and (11). The numbers in parentheses in FIG. 8 are priorities.

As FIGS. 7 and 8 show, even when the first candidate of the monosyllable recognition result is wrong, using the concatenation information allows the correct kana character string to appear as the first candidate. Even when it does not appear as the first candidate, it is likely to appear among the top candidates.

As described above, according to the present invention, when the recognition result of a monosyllabic voice input device is not uniquely determined, the kana character strings composed of the kana character candidates can be obtained in descending order of priority. This reduces inefficient work by the user, such as correcting kana characters, and makes it possible to realize an efficient speech-based kana-kanji conversion system and the like.

The storage formats used to explain the present invention for kana characters, likelihoods, concatenation information, and other information may take forms other than those of this embodiment and do not limit the scope of the present invention.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment for realizing the present invention. In the figure, 1 is a monosyllabic voice input device, 2 is a kana character/likelihood storage device, 3 is a concatenation information storage device, 4 is a kana character string/priority arithmetic device, and 5 is a kana character string/priority storage device. FIG. 2 is a conceptual diagram showing an example of the storage format of kana character candidates and likelihoods. FIGS. 3(a), 3(b), 3(c), 4, 5(a), and 5(b) are conceptual diagrams showing examples of the storage format of concatenation information. FIGS. 6(a) and 6(b) are conceptual diagrams showing examples of the data structure in the kana character string/priority arithmetic device 4.

Agent: patent attorney (name illegible in the source)

Claims

[Claims]

(1) A kana character string evaluation method in which a kana character string composed of kana character candidates, and the priority of that kana character string, are evaluated using likelihoods indicating the certainty of one or more kana character candidates for each monosyllable of a Japanese sentence pronounced in monosyllabic units, together with concatenation information between multiple monosyllables stored in advance, characterized in that: the (m+1)-syllable partial kana character string priority is determined using the concatenation information relating the kana character candidates in one or more m-syllable partial kana character strings, each composed of kana character candidates for m syllables, to the kana character candidates from the (m+1)-th syllable onward, the m-syllable partial kana character string priority representing the certainty of the m-syllable partial kana character string, and the likelihoods for the (m+1)-th syllable; and one or more kana character strings and the priorities of those kana character strings are evaluated by repeating a process of generating a pre-specified number of (m+1)-syllable partial kana character strings in descending order of the (m+1)-syllable partial kana character string priority.

(2) The kana character string evaluation method described in (1), wherein, when the one or more partial kana character strings are generated, the number of partial kana character strings to be generated is made variable according to the length of the partial kana character strings.