JP2003296661A

JP2003296661A - Character recognition device, character recognition method, its execution program and recording medium recording it

Info

Publication number: JP2003296661A
Application number: JP2002095511A
Authority: JP
Inventors: Akira Nakamura (中村 明); Hiromitsu Kawajiri (川尻 博光)
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2002-03-29
Filing date: 2002-03-29
Publication date: 2003-10-17
Anticipated expiration: 2022-03-29
Also published as: JP3970075B2

Abstract

PROBLEM TO BE SOLVED: To resolve the problem in conventional character recognition whereby the reliability determination itself degrades recognition accuracy: recognition candidate characters that should be excluded from post-processing are judged highly reliable and passed to post-processing, while candidates that should be post-processed are judged unreliable and excluded.

SOLUTION: The character recognition device has: a character recognition means that recognizes the coordinate point sequence of a handwritten character; a feature extraction means that calculates the average writing speed of that coordinate point sequence as a feature amount for calculating the reliability of the determination-target recognition candidate character group output by the character recognition means; a reliability calculation means that calculates the reliability of the determination-target recognition candidate character group on the basis of the feature amount from the feature extraction means and a statistical tendency of sample data; and a post-processing control means that controls post-processing of the determination-target recognition candidate character group on the basis of the reliability from the reliability calculation means.

COPYRIGHT: (C)2004, JPO

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a character recognition device and a character recognition method that recognize handwritten characters by determining the reliability (likelihood) of a character recognition result, to a program for executing the method, and to a recording medium on which the program is recorded.

[0002]

[Description of the Related Art] In conventional character recognition methods, for example, feature amounts are extracted from a handwritten character and compared with the feature amounts in a recognition dictionary, and the recognition candidate characters for which the similarity between the two is high, or the distance between the two is small, are output (for convenience, both cases are referred to collectively as having a high "certainty factor"). With such character-by-character recognition, a reasonably accurate result is obtained when the written character is close to the feature amounts in the recognition dictionary, but a correct result cannot easily be obtained when the written character is far from them.
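The dictionary matching described above can be sketched as a nearest-feature ranking. This is only an illustration: the two-dimensional feature vectors, the Euclidean distance, and the name `rank_candidates` are assumptions, since the text does not say which features or distance measure a conventional recognizer uses.

```python
import math

def rank_candidates(features, dictionary, top_n=3):
    """Rank dictionary characters by distance to the input features.

    A smaller distance corresponds to a higher certainty factor.
    `dictionary` maps each character to a reference feature vector
    (hypothetical data layout, chosen for this sketch).
    """
    scored = [(ch, math.dist(features, ref)) for ch, ref in dictionary.items()]
    scored.sort(key=lambda pair: pair[1])  # closest reference first
    return scored[:top_n]

# Toy dictionary with made-up 2-D feature vectors.
dictionary = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 3.0)}
candidates = rank_candidates((0.9, 0.1), dictionary, top_n=2)
```

An input that lies far from every reference vector still gets a ranked list, which is why the ranking alone says little about whether the top candidate is actually correct.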

[0003] For this reason, in addition to such character-by-character recognition, so-called post-processing is performed: connection probabilities or co-occurrence probabilities between adjacent characters, words, or phrases are obtained, a matching degree for the character string is calculated from the per-character certainty factors and these probabilities, and recognized character string candidates for the whole string are output according to that matching degree. However, if too many recognition candidate characters are passed to post-processing, its computation time increases. Moreover, if candidates with a low certainty factor are included, post-processing may end up outputting wrong character string candidates instead.
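A minimal sketch of such post-processing follows. It assumes the matching degree is the sum of log certainty factors and log connection (bigram) probabilities, and it searches all candidate combinations exhaustively; neither choice is stated in the text, and the names `best_string`, `bigram_prob`, and `floor` are hypothetical.

```python
import itertools
import math

def best_string(candidate_groups, bigram_prob, floor=1e-6):
    """Pick the candidate string with the highest matching degree.

    candidate_groups: one list of (character, certainty) per written character.
    bigram_prob: dict mapping (previous_char, next_char) to a probability.
    Unseen pairs fall back to `floor` (an assumption of this sketch).
    """
    best, best_score = None, -math.inf
    # The number of combinations is the product of the group sizes,
    # which is why many candidates per character are expensive.
    for combo in itertools.product(*candidate_groups):
        score = sum(math.log(c) for _, c in combo)
        for (a, _), (b, _) in zip(combo, combo[1:]):
            score += math.log(bigram_prob.get((a, b), floor))
        if score > best_score:
            best, best_score = "".join(ch for ch, _ in combo), score
    return best, best_score

groups = [[("c", 0.6), ("e", 0.4)], [("a", 0.5), ("o", 0.5)], [("t", 0.9), ("l", 0.1)]]
bigrams = {("c", "a"): 0.3, ("a", "t"): 0.4, ("e", "a"): 0.2,
           ("c", "o"): 0.1, ("o", "t"): 0.1}
result, _ = best_string(groups, bigrams)
```

With three groups of two candidates the loop evaluates 8 strings; with ten candidates per character it would evaluate 1000, which illustrates the computation-time concern raised above.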

[0004] Various methods of limiting the candidate characters passed to post-processing have therefore been proposed in order to improve post-processing accuracy while holding down computation time. Typical examples are: (a) a method that uses the certainty factor of each candidate character obtained by character recognition directly; (b) a method that estimates the reliability of the recognition result from the certainty factors of the candidate characters and controls the number of candidates according to that reliability; and (c) a method that estimates the reliability of the recognition result from the linguistic connection relationships between adjacent recognition candidate character groups.

[0005] Method (a) is the simplest: the certainty factor of each candidate character is compared with a predetermined threshold, and only candidates whose certainty factor exceeds the threshold are passed to post-processing. One example of how the reliability of the recognition result is obtained in method (b) is Japanese Unexamined Patent Publication No. H09-259226, "Method for evaluating recognition results and recognition device". In that method, the difference between the certainty factor of the first-ranked candidate and that of the second-ranked candidate is obtained, and the linear sum of this difference and the certainty factor of the first-ranked candidate is used as a measure of how likely the recognition result is to be correct. It exploits the tendency that, when the recognition result is correct, the certainty factor of the first-ranked candidate is relatively high and the gap between the first- and second-ranked certainty factors is relatively large. Various other methods have also been proposed, such as using the ratio between the certainty factor of the first-ranked candidate and that of each lower-ranked candidate, or treating the candidates' certainty factors as a multidimensional probability distribution and obtaining the reliability statistically.
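Methods (a) and (b) can be illustrated roughly as follows. The weights `w_top` and `w_gap`, the threshold, and the sample certainty factors are all made up for this sketch; the coefficients of the cited publication are not given in the text.

```python
def filter_by_threshold(candidates, threshold):
    """Method (a): keep only candidates whose certainty exceeds the threshold."""
    return [(ch, c) for ch, c in candidates if c > threshold]

def linear_sum_measure(candidates, w_top=1.0, w_gap=1.0):
    """Method (b), one variant: weighted sum of the top certainty factor
    and the gap between the first- and second-ranked certainty factors.

    `candidates` must be sorted by certainty, highest first.
    """
    top = candidates[0][1]
    second = candidates[1][1]
    return w_top * top + w_gap * (top - second)

# Hypothetical ranked output for one handwritten character.
candidates = [("つ", 0.90), ("フ", 0.85), ("う", 0.40)]
kept = filter_by_threshold(candidates, threshold=0.5)
measure = linear_sum_measure(candidates)
```

In this example the top two candidates are visually similar, so the gap term contributes almost nothing even though the first-ranked certainty factor is high.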

[0006] Method (c) statistically estimates the reliability of the recognition result from the connection probabilities between the characters in the determination-target recognition candidate character group and those in the recognition candidate character group immediately before or after it.

[0007]

[Problems to Be Solved by the Invention] In general, however, the certainty factor of each candidate character obtained from character recognition does not necessarily reflect how likely that candidate is to be correct. For a character written rather sloppily, for example, the certainty factor tends to be low even when the candidate is correct. Conversely, when a candidate resembles the correct character, its certainty factor can be high even though it is wrong. It is therefore difficult to limit the candidates accurately with method (a).

[0008] Furthermore, when the first- and second-ranked candidates are a pair of similar characters, such as 「つ」 and 「フ」 or 「之」 and 「え」, the difference between their certainty factors tends to be small whether or not the recognition result is correct. When hiragana, katakana, kanji, and alphanumeric characters are all recognition targets, such pairs of similar characters occur frequently, so method (b) also has difficulty estimating the reliability of the recognition result accurately.

[0009] Method (c), on the other hand, avoids the problems that stem from the properties of the certainty factor seen in methods (a) and (b). With this method, however, the reliability of the recognition result may not be obtained accurately for character strings with low inter-character connection probabilities (for example, infrequently used technical terms and proper nouns). With the conventional techniques described above, therefore, recognition candidate characters that should be excluded from post-processing are judged highly reliable and passed to it, while candidates that should be post-processed are judged unreliable and excluded, so that the reliability determination ends up lowering recognition accuracy. What these conventional techniques have in common is that the feature amounts used to calculate the reliability are extracted only from the character recognition result.

[0010] In so-called online handwritten character recognition, which recognizes characters handwritten with a stylus or the like, the input data (a time series of coordinate points) can itself contain information useful for estimating the reliability of the recognition result. For example, sloppily written characters generally tend to be written fast, and carefully written characters slowly. Because sloppily written characters are prone to misrecognition, the reliability of their recognition results can be expected to be lower. In addition, when the number of written strokes is small, there are generally many similar characters, so misrecognition is again more likely.

[0011] Further, when the number of written strokes is markedly smaller than the regular stroke count of a candidate character obtained by recognition (for example, one written stroke against a regular stroke count of ten for the first-ranked candidate), it is likely that a character with many strokes was written in a single stroke, that is, in an extremely cursive hand. Misrecognition is likely in this case as well. The relationship between the number of written strokes and the regular stroke counts of the candidate characters therefore also reflects the reliability of the recognition result.

[0012] Accordingly, an object of the present invention is to solve the problems of the prior art by focusing on features, obtainable from the handwritten input data, that are useful for estimating reliability, and thereby to provide a character recognition device, a character recognition method, a program for executing the method, and a recording medium storing the program that can calculate the reliability with relatively good accuracy, and hence improve recognition accuracy, even in cases where the reliability cannot be estimated properly from features of the recognition result alone, such as the certainty factors of candidate characters and the connection probabilities between candidate characters.

[0013]

[Means for Solving the Problems] The invention according to claim 1 comprises: character recognition means for recognizing the coordinate point sequence of a handwritten character and outputting a recognition candidate character group; feature extraction means for calculating the average writing speed of the coordinate point sequence of the handwritten character as a feature amount for calculating the reliability of the determination-target recognition candidate character group output from the character recognition means; reliability calculation means for calculating the reliability of the determination-target recognition candidate character group on the basis of the feature amount from the feature extraction means and a statistical tendency of sample data; and post-processing control means for controlling post-processing of the determination-target recognition candidate character group on the basis of the reliability from the reliability calculation means.

[0014] The invention according to claim 2 is the invention of claim 1, wherein the feature extraction means extracts the average writing speed and the number of written strokes in the coordinate point sequence of the handwritten character as feature amounts for calculating the reliability of the determination-target recognition candidate character group. The invention according to claim 3 is the invention of claim 1 or 2, wherein the feature extraction means further extracts the regular stroke counts of the top N characters (N ≥ 1) in the determination-target recognition candidate character group as feature amounts for calculating the reliability of that group.

[0015] The invention according to claim 4 is the invention of any one of claims 1 to 3, wherein the feature extraction means further extracts the certainty factors of the top M characters (M ≥ 1) in the determination-target recognition candidate character group as feature amounts for calculating the reliability of that group. The invention according to claim 5 is the invention of any one of claims 1 to 4, wherein the feature extraction means further extracts, as feature amounts for calculating the reliability of the determination-target recognition candidate character group, the values of the connection probabilities between each recognition candidate character in that group and the immediately preceding recognition candidate character group (obtained for the immediately preceding handwritten input) and/or the values of the connection probabilities between each such character and the immediately following recognition candidate character group (obtained for the immediately following handwritten input).

[0016] The invention according to claim 6 is the invention of claim 5, wherein the feature extraction means extracts, as feature amounts of the determination-target recognition candidate character group, the values of the connection probabilities between each recognition candidate character in that group and the recognition candidate character with the highest certainty factor in the immediately preceding recognition candidate character group and/or the values of the connection probabilities between each such character and the recognition candidate character with the highest certainty factor in the immediately following recognition candidate character group.

[0017] The invention according to claim 7 is the invention of claim 5, wherein the feature extraction means takes, as the connection probability between one recognition candidate character in the determination-target recognition candidate character group and the immediately preceding or following recognition candidate character group, the highest of the connection probabilities between that character and each recognition candidate character in the preceding or following group.
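The two ways of reducing a neighboring candidate group to a single connection probability (claims 6 and 7) might look like the sketch below. It handles only the immediately preceding group; a following-group variant would swap the order of the character pair. The bigram dictionary, the `floor` default for unseen pairs, and the function names are assumptions for illustration.

```python
def connection_with_top(candidate, preceding_group, bigram_prob, floor=0.0):
    """Claim 6 style: connection probability with the preceding candidate
    that has the highest certainty factor."""
    top_char = max(preceding_group, key=lambda pair: pair[1])[0]
    return bigram_prob.get((top_char, candidate), floor)

def connection_max(candidate, preceding_group, bigram_prob, floor=0.0):
    """Claim 7 style: highest connection probability over every candidate
    in the preceding group."""
    return max(bigram_prob.get((ch, candidate), floor) for ch, _ in preceding_group)

# Hypothetical preceding candidate group and connection probabilities.
preceding = [("t", 0.8), ("f", 0.6)]
bigrams = {("t", "h"): 0.30, ("f", "h"): 0.01, ("t", "l"): 0.05, ("f", "l"): 0.20}
with_top = connection_with_top("l", preceding, bigrams)
best = connection_max("l", preceding, bigrams)
```

The two values differ whenever the best linguistic match in the neighboring group is not its top-ranked candidate, as for "l" in this example.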

[0018] The invention according to claim 8 is the invention of any one of claims 1 to 7, wherein the reliability calculation means includes discrimination score calculation means for calculating, from the feature amounts, the likelihood of one recognition candidate character in the determination-target recognition candidate character group as a discrimination score, and calculates the reliability on the basis of that discrimination score. The invention according to claim 9 is the invention of any one of claims 1 to 8, wherein the post-processing control means limits the recognition candidate characters subjected to post-processing on the basis of the reliability calculated by the reliability calculation means.

[0019] The invention according to claim 10 comprises: a character recognition step of recognizing the coordinate point sequence of a handwritten character and outputting a recognition candidate character group; a feature extraction step of calculating the average writing speed of the coordinate point sequence of the handwritten character as a feature amount for calculating the reliability of the determination-target recognition candidate character group output from the character recognition step; a reliability calculation step of calculating the reliability of the determination-target recognition candidate character group on the basis of the feature amount from the feature extraction step and a statistical tendency of sample data; and a post-processing control step of controlling post-processing of the determination-target recognition candidate character group on the basis of the reliability from the reliability calculation step.

[0020] The invention according to claim 11 is the invention of claim 10, wherein the feature extraction step extracts the average writing speed and the number of written strokes in the coordinate point sequence of the handwritten character as feature amounts for calculating the reliability of the determination-target recognition candidate character group. The invention according to claim 12 is the invention of claim 10 or 11, wherein the feature extraction step further extracts the regular stroke counts of the top N characters (N ≥ 1) in the determination-target recognition candidate character group as feature amounts for calculating the reliability of that group.

[0021] The invention according to claim 13 is the invention of any one of claims 10 to 12, wherein the feature extraction step further extracts the certainty factors of the top M characters (M ≥ 1) in the determination-target recognition candidate character group as feature amounts for calculating the reliability of that group. The invention according to claim 14 is the invention of any one of claims 10 to 13, wherein the feature extraction step further extracts, as feature amounts for calculating the reliability of the determination-target recognition candidate character group, the values of the connection probabilities between each recognition candidate character in that group and the immediately preceding recognition candidate character group (obtained for the immediately preceding handwritten input) and/or the values of the connection probabilities between each such character and the immediately following recognition candidate character group (obtained for the immediately following handwritten input).

[0022] The invention according to claim 15 is the invention of claim 14, wherein the feature extraction step extracts, as feature amounts of the determination-target recognition candidate character group, the values of the connection probabilities between each recognition candidate character in that group and the recognition candidate character with the highest certainty factor in the immediately preceding recognition candidate character group and/or the values of the connection probabilities between each such character and the recognition candidate character with the highest certainty factor in the immediately following recognition candidate character group.

[0023] The invention according to claim 16 is the invention of claim 14, wherein the feature extraction step takes, as the connection probability between one recognition candidate character in the determination-target recognition candidate character group and the immediately preceding or following recognition candidate character group, the highest of the connection probabilities between that character and each recognition candidate character in the preceding or following group.

[0024] The invention according to claim 17 is the invention of any one of claims 10 to 16, wherein the reliability calculation step includes a discrimination score calculation step of calculating, from the feature amounts, the likelihood of one recognition candidate character in the determination-target recognition candidate character group as a discrimination score, and calculates the reliability on the basis of that discrimination score. The invention according to claim 18 is the invention of any one of claims 10 to 17, wherein the post-processing control step limits the recognition candidate characters subjected to post-processing on the basis of the reliability calculated in the reliability calculation step.

[0025] The invention according to claim 19 is a program for causing a computer to execute the processing steps of the character recognition method according to any one of claims 10 to 18. The invention according to claim 20 is a computer-readable recording medium on which is recorded a program for causing a computer to execute the processing steps of the character recognition method according to any one of claims 10 to 18.

[0026]

[Embodiments of the Invention] <First Embodiment> A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a circuit block diagram of the handwritten character recognition device according to the first embodiment.

[0027] In FIG. 1, reference numeral 1 denotes an input unit, which generates and outputs handwriting character information from handwriting entered on a tablet or the like. Reference numeral 2 denotes a character recognition unit, which compares the handwriting character information supplied from the input unit 1 with the character feature amounts in the character recognition dictionary 3 and outputs, as recognition candidate characters for the handwritten character, the dictionary characters ranked first to N-th in closeness (certainty factor). Reference numeral 3 denotes the character recognition dictionary, which stores candidate characters together with their character feature amounts.

[0028] Reference numeral 4 denotes a feature extraction unit, which calculates the average writing speed and obtains the number of written strokes from the handwriting character information obtained by the input unit 1. It also refers to the regular stroke count table 5 to extract the regular stroke counts of the top N characters in the determination-target recognition candidate character group supplied from the character recognition unit 2. Reference numeral 5 denotes the regular stroke count table, which stores the regular stroke count of each recognition target character in association with that character.

[0029] Reference numeral 6 denotes a discrimination score calculation unit, which processes the feature amounts obtained by the feature extraction unit 4 and calculates a correct/incorrect discrimination score for the determination-target recognition candidate character group. Reference numeral 7 denotes a recognition reliability calculation unit, which checks the discrimination score from the discrimination score calculation unit 6 against the discrimination score-reliability conversion table 8 and outputs the reliability of the determination-target recognition candidate character group. Reference numeral 8 denotes the discrimination score-reliability conversion table, which stores the relationship between discrimination score and reliability as a table.

[0030] Reference numeral 9 denotes a recognition candidate count control unit, which checks the reliability from the recognition reliability calculation unit 7 against the reliability-cumulative correct reading rate table 10 and limits the number of recognition candidates for the determination-target character. Reference numeral 10 denotes the reliability-cumulative correct reading rate table, which stores the relationship between reliability and correct reading rate as a table. Reference numeral 21 denotes a language processing unit, which performs post-processing on the number of recognition candidate characters set by the recognition candidate count control unit 9 and outputs character string candidates.
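One way the recognition candidate count control unit 9 could use such a table is sketched below. Everything concrete here is an assumption: the text at this point does not give the table layout, so the reliability bins, the cumulative correct reading rates per rank, the target rate, and the name `candidate_count` are all hypothetical.

```python
import bisect

# Hypothetical table: upper edges of the reliability bins, and for each bin
# the cumulative correct reading rate when the top 1, 2, ... 5 candidates are kept.
RELIABILITY_EDGES = [0.3, 0.6, 0.9]
CUMULATIVE_RATE = [
    [0.40, 0.60, 0.75, 0.85, 0.92],  # reliability < 0.3
    [0.65, 0.82, 0.91, 0.95, 0.97],  # 0.3 <= reliability < 0.6
    [0.85, 0.95, 0.98, 0.99, 0.99],  # 0.6 <= reliability < 0.9
    [0.97, 0.99, 1.00, 1.00, 1.00],  # reliability >= 0.9
]

def candidate_count(reliability, target_rate=0.95):
    """Return the smallest number of candidates whose cumulative correct
    reading rate reaches target_rate, or all of them if none does."""
    row = CUMULATIVE_RATE[bisect.bisect_right(RELIABILITY_EDGES, reliability)]
    for n, rate in enumerate(row, start=1):
        if rate >= target_rate:
            return n
    return len(row)
```

Under these made-up numbers a highly reliable result passes a single candidate to the language processing unit, while an unreliable one passes all five.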

[0031] The "character recognition means" in the claims corresponds to the character recognition unit 2 and the character recognition dictionary 3 of FIG. 1 in the embodiment. The "feature extraction means" in the claims corresponds to the feature extraction unit 4 and the regular stroke count table 5 of FIG. 1. The "reliability calculation means" in the claims corresponds to the discrimination score calculation unit 6, the recognition reliability calculation unit 7, and the discrimination score-reliability conversion table 8 of FIG. 1. The "post-processing control means" in the claims corresponds to the recognition candidate count control unit 9 and the reliability-cumulative correct reading rate table 10 of FIG. 1.

[0032] Next, the processing of each unit shown in the circuit block diagram is described in detail. First, the processing of the feature extraction unit 4 is described with reference to FIG. 2. FIG. 2 shows the coordinate point sequence obtained when the character 「い」 is written on the input unit 1. The feature extraction unit 4 calculates the average writing speed and obtains the number of written strokes from the coordinate point sequence (xi, yi, ti, pi) (i = 1 to K) obtained by the input unit 1. Here, K is the number of coordinate points, xi and yi are the x and y coordinates of the i-th coordinate point, and ti is the time at which the i-th coordinate point was generated. pi indicates the state of the input pen at time ti: pi = 1 when the pen is in contact with the tablet, and pi = 0 when the pen is lifted from the tablet.

[0033] The writing speed vi as the pen moves from the i-th coordinate point to the (i + 1)-th coordinate point is given by Equation 1.

[0034]

[Equation 1]
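The image for Equation 1 is not reproduced in this text. A form consistent with the definitions in paragraph [0032], assuming the speed is the Euclidean distance between successive coordinate points divided by the elapsed time, would be:

```latex
v_i = \frac{\sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}}{t_{i+1} - t_i},
\qquad i = 1, \dots, K - 1
```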

[0035] The average writing speed V of the coordinate point sequence is therefore calculated by Equation 2.

[0036]

[Equation 2]
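The image for Equation 2 is likewise not reproduced in this text. Assuming V is the arithmetic mean of the K - 1 segment speeds vi (the text does not say whether pen-up segments are excluded), it would read:

```latex
V = \frac{1}{K - 1} \sum_{i=1}^{K-1} v_i
```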

[0037] The number of written strokes Sinp is obtained by counting the number of times the value of pi, which indicates the input pen state, changes from 1 to 0. In addition, the regular stroke counts Sn1, Sn2, ..., SnN are obtained by looking up the top N characters of the recognition candidate character group in the regular stroke number table 5. Next, the processing in the discrimination score calculation unit 6 is described with reference to FIG. 3.
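The feature extraction of paragraphs [0032] to [0037] can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `extract_features`, the dictionary `regular_strokes` standing in for the regular stroke number table 5, and the choice to average over every successive point pair (pen-up moves included) are all hypothetical.

```python
import math

def extract_features(points, candidates, regular_strokes, n_top=1):
    """points: list of (x, y, t, p) tuples; p is 1 (pen down) or 0 (pen up)."""
    # Writing speed v_i for each pair of successive coordinate points.
    speeds = []
    for (x0, y0, t0, _), (x1, y1, t1, _) in zip(points, points[1:]):
        dt = t1 - t0
        if dt > 0:  # assumption: skip samples with identical timestamps
            speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    avg_speed = sum(speeds) / len(speeds) if speeds else 0.0

    # Written stroke count Sinp: number of 1 -> 0 transitions of p_i.
    pen = [p for (_, _, _, p) in points]
    stroke_count = sum(1 for a, b in zip(pen, pen[1:]) if a == 1 and b == 0)

    # Regular stroke counts Sn1..SnN of the top-N candidates.
    regular = [float(regular_strokes[c]) for c in candidates[:n_top]]

    # Feature vector of dimension 2 + N.
    return [avg_speed, float(stroke_count)] + regular
```

The returned list has 2 + N elements, matching the (2 + N)-dimensional feature vector used by the discrimination score calculation unit 6.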

[0038] The set consisting of the average writing speed V, the number of written strokes Sinp, and the regular stroke counts Sn1, Sn2, ..., SnN extracted by the feature extraction unit 4 can be expressed as a vector (feature vector) in a (2 + N)-dimensional vector space. The discrimination score calculation unit 6 extracts feature vectors in the same way, in advance, from samples in which the first-ranked recognition candidate character is a correct reading or a misreading, and holds them as learning data. It then compares these with the feature vector of the recognition candidate character group under judgment to calculate the discrimination score of that recognition candidate character group.

[0039] For example, suppose that the learning data of the correct-reading and misreading feature vectors and the feature vector of the recognition candidate character group under judgment are in the state shown in FIG. 3. For simplicity, FIG. 3 illustrates the case of N = 1, that is, a feature vector of three dimensions. From the distribution of the feature vectors of each set (class), correct reading and misreading, the discrimination score calculation unit 6 obtains in advance the mean (centroid vector) and the covariance matrix of the feature vectors of each class and stores them. It then obtains the Mahalanobis distances DMc and DMe between the centroid of each class and the feature vector of the character group under judgment, and uses the ratio of these values, the logarithm of the ratio, or their difference as the discrimination score.

[0040] The Mahalanobis distance DM is calculated as follows. Let m1 be the centroid vector of class C1 and Σ1 the covariance matrix of class C1. The squared Mahalanobis distance DM1 from a given feature vector x to m1 is then defined by Equation 3.

[0041]

[Equation 3]
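The image for Equation 3 is not reproduced in this text. The standard definition of the squared Mahalanobis distance, written with the symbols of paragraph [0040] and assuming column vectors, is:

```latex
D_{M1}^{2} = (x - m_1)^{\mathsf{T}} \, \Sigma_1^{-1} \, (x - m_1)
```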

[0042] Here, the covariance matrix Σ1 is an n × n square matrix (n: the number of dimensions of the feature space) whose (i, j) element is the covariance of the i-th and j-th feature amounts, that is, Σ1(i, j) = σij. In the above, the discrimination score is calculated using the Mahalanobis distance DM. Alternatively, a linear discriminant function may be obtained in advance by linear discriminant analysis from the distributions of the feature vectors of the correct-reading and misreading classes, and the discrimination score may be obtained by applying this linear discriminant function to the feature vector of the recognition candidate character group under judgment.
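The Mahalanobis-based discrimination score of paragraphs [0039] to [0042] can be sketched in plain Python as follows. This is an illustration under assumptions, not the method of the embodiment: the function names, the use of the sample covariance (divisor n - 1), the sign convention log(DMe / DMc), and the small constant `eps` that guards against a zero distance are all choices made here for the example.

```python
import math

def mean_and_cov(samples):
    """Centroid vector and sample covariance matrix of a list of feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - m)^T cov^-1 (x - m)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * zi for di, zi in zip(d, solve(cov, d)))

def discrimination_score(x, correct_stats, error_stats, eps=1e-12):
    """log(DMe / DMc): positive when x lies closer to the correct-reading class."""
    dmc = math.sqrt(mahalanobis_sq(x, *correct_stats))
    dme = math.sqrt(mahalanobis_sq(x, *error_stats))
    return math.log((dme + eps) / (dmc + eps))
```

In use, `mean_and_cov` would be run once per class on the learning data, and the resulting `(mean, cov)` pairs passed to `discrimination_score` for each feature vector under judgment. Solving the linear system instead of inverting the covariance matrix keeps the sketch short and avoids an explicit inverse.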

[0043] Alternatively, a neural network may be trained, using the feature vectors extracted from the correct-reading and misreading learning samples as learning data, to judge whether a given feature vector corresponds to a correct reading or a misreading, and the output value of that neural network for the feature vector under judgment may be used as the discrimination score. Next, the processing of the recognition reliability calculation unit 7 is described with reference to FIG. 4.

[0044] For example, when the ratio of the Mahalanobis distances DMc and DMe, or the logarithm of that ratio, is used as the discrimination score, the relationship between the discrimination score obtained from the correct-reading and misreading learning samples and the reliability is as shown in FIG. 4. Here, the reliability is calculated from the learning samples by Bayes' theorem. Let p(y|X1 = C) be the ratio of the number of correct-reading samples having discrimination score y to the total number of correct-reading samples, p(y|X1 = E) the ratio of the number of misreading samples having discrimination score y to the total number of misreading samples, P(X1 = C) the ratio of the number of correct-reading samples to the total number of samples, and P(X1 = E) the ratio of the number of misreading samples to the total number of samples. The reliability of the recognition candidate character with the highest certainty factor in a recognition candidate character group having discrimination score y can then be calculated by the following equation.

[0045] P(X1 = C|y) = p(y|X1 = C)·P(X1 = C) / [p(y|X1 = C)·P(X1 = C) + p(y|X1 = E)·P(X1 = E)]

Here, X1 denotes the first-ranked recognition candidate character, and X1 = C and X1 = E denote the events that the first-ranked recognition candidate character is correct and incorrect, respectively. A discrimination score-reliability conversion table expressing the relationship between discrimination score and reliability is created in advance from this equation and stored in the discrimination score-reliability conversion table 8. The recognition reliability calculation unit 7 compares the discrimination score from the discrimination score calculation unit 6 with the scores in the discrimination score-reliability conversion table 8 and outputs the corresponding reliability as the reliability of the first-ranked recognition candidate character of the recognition candidate character group.
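The construction and lookup of the conversion table in paragraphs [0044] and [0045] can be sketched as follows. The text does not say how discrimination scores are quantized, so this example assumes fixed score bins given by `bin_edges`; the function names and the handling of empty bins and out-of-range scores are likewise assumptions.

```python
import bisect

def build_conversion_table(correct_scores, error_scores, bin_edges):
    """Return one reliability P(X1=C | y) per score bin.

    bin_edges is ascending; bin k covers [bin_edges[k], bin_edges[k + 1]).
    """
    n_c, n_e = len(correct_scores), len(error_scores)
    prior_c = n_c / (n_c + n_e)  # P(X1 = C)
    prior_e = n_e / (n_c + n_e)  # P(X1 = E)
    table = []
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        # p(y | X1 = C) and p(y | X1 = E): per-class fraction of samples in the bin.
        lik_c = sum(lo <= s < hi for s in correct_scores) / n_c
        lik_e = sum(lo <= s < hi for s in error_scores) / n_e
        evidence = lik_c * prior_c + lik_e * prior_e
        # Bins with no learning samples get None rather than a guessed value.
        table.append(lik_c * prior_c / evidence if evidence > 0 else None)
    return table

def lookup_reliability(score, bin_edges, table):
    """Reliability of the bin containing score; out-of-range scores use the end bins."""
    k = bisect.bisect_right(bin_edges, score) - 1
    k = min(max(k, 0), len(table) - 1)
    return table[k]
```

Building the table once from the learning samples and then doing a bin lookup at recognition time mirrors the split between the stored table 8 and the recognition reliability calculation unit 7.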

[0046] Next, the processing of the recognition candidate number control unit 9 is described with reference to FIG. 5. The table in the upper part of FIG. 5 shows the relationship between the reliability of the recognition candidate character with the highest certainty factor under judgment and the cumulative probability that the correct character is included among the recognition candidate characters up to the N-th rank. The probabilities in this table are calculated in advance from the correct-reading and misreading learning samples.

[0047] This table is stored in the reliability-cumulative correct reading rate table 10. The recognition candidate number control unit 9 compares the reliability of the recognition candidate character group under judgment with the reliability levels in the table and, referring to the cumulative probabilities of the matching reliability level, determines up to which rank the recognition candidate characters are output to the language processing unit 21. The cutoff rank is determined, for example, by whether the cumulative probability at the matching reliability level has reached a predetermined threshold. The threshold may be the same for all reliability levels, or it may be set individually for each reliability level.
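The threshold rule of paragraph [0047] can be sketched as follows. The table layout, the function name `candidates_to_output`, the default threshold, and the fallback of keeping every listed rank when the threshold is never reached are assumptions for illustration; the values of FIG. 5 are not reproduced in this text.

```python
def candidates_to_output(reliability, levels, threshold=0.95):
    """Return the number of top-ranked candidates to pass to post-processing.

    levels: list of (min_reliability, cumulative_rates), highest level first.
    cumulative_rates[n - 1] is the probability that the correct character
    is among the top n candidates at that reliability level.
    """
    for min_rel, cumulative_rates in levels:
        if reliability >= min_rel:
            # Smallest rank whose cumulative correct reading rate reaches the threshold.
            for n, rate in enumerate(cumulative_rates, start=1):
                if rate >= threshold:
                    return n
            return len(cumulative_rates)  # threshold never reached: keep all ranks
    raise ValueError("reliability is below the lowest level in the table")
```

A per-level threshold, which the paragraph also allows, could be supported by storing a third element in each `levels` entry instead of passing a single `threshold`.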

[0048] Alternatively, the number of output candidates for each reliability level may be set in advance based on the table in the upper part of FIG. 5 and stored in the reliability-cumulative correct reading rate table 10. The table in the lower part of FIG. 5 is an example in which the reliability levels and the numbers of output candidates are set in advance. When such a table is stored in advance in the reliability-cumulative correct reading rate table 10, the recognition candidate number control unit 9 reads the matching number of output candidates from the table and limits the recognition candidate characters output to the language processing unit 21 accordingly.
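With preset output counts as in paragraph [0048], the control step reduces to a table lookup. The level boundaries and counts below are hypothetical placeholders, not the values in the lower part of FIG. 5, and `limit_candidates` is a name chosen for this sketch.

```python
import bisect

# Hypothetical lower bounds of the reliability levels (ascending) and the
# preset number of output candidates for each level.
LEVEL_BOUNDS = [0.0, 0.3, 0.6, 0.9]
OUTPUT_COUNTS = [10, 5, 3, 1]

def limit_candidates(candidates, reliability):
    """Keep only as many top-ranked candidates as the matching level allows."""
    level = bisect.bisect_right(LEVEL_BOUNDS, reliability) - 1
    level = max(level, 0)  # reliabilities below the first bound use the lowest level
    return candidates[:OUTPUT_COUNTS[level]]
```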

[0049] In the embodiment described above, feature amounts useful for estimating the reliability of the recognition result are extracted from the handwritten input data. The reliability can therefore be calculated with relatively good accuracy even when it cannot be properly estimated from the certainty factors of the candidate characters, so that recognition candidate characters with a high correct reading rate can be output to the language processing unit.

<Second Embodiment>

Next, a second embodiment according to the present invention is described.

[0050] This embodiment changes the feature extraction processing in the feature extraction unit 4. FIG. 6 shows a circuit block diagram of the handwritten character recognition device according to this embodiment. It differs from FIG. 1 of the first embodiment in that an inter-character connection probability dictionary 11 is added.

[0051] In this embodiment, the feature extraction processing in the feature extraction unit 4 adds, to the processing described in the first embodiment, a step that refers to the inter-character connection probability dictionary 11 to extract the connection probabilities between the recognition candidate characters in the judgment target recognition candidate character group and the recognition candidate character groups immediately before and after it. That is, the first embodiment uses the average writing speed, the number of written strokes, and the regular stroke counts of the top N characters of the judgment target recognition candidate character group as the feature amounts for reliability calculation. The second embodiment additionally uses the connection probability values Pbk (k = 1 to L) between the top L recognition candidate characters in the judgment target recognition candidate character group and the immediately preceding recognition candidate character group, and the connection probability values Pfk (k = 1 to L) between those top L recognition candidate characters and the immediately following recognition candidate character group.

[0052] In this embodiment, the connection probability value Pbk between the k-th ranked candidate character in the judgment target recognition candidate character group and the immediately preceding recognition candidate character group is the maximum of the connection probabilities between the k-th ranked candidate character and the candidate characters ranked first to J-th in the immediately preceding group. Likewise, Pfk is the maximum of the connection probabilities between the k-th ranked candidate character and the candidate characters ranked first to J-th in the immediately following group.

[0053] In the example of FIG. 7, Pb1 for the first-ranked character 「日」 of the recognition candidate characters under judgment is the largest of the connection probabilities P(C1|Cbk) between 「日」 and each of the immediately preceding candidates, from the first-ranked character 「朋」 to the J-th ranked character 「胡」. Pf1 for the first-ranked character 「日」 is the largest of the connection probabilities P(Cfk|C1) between 「日」 and each of the immediately following candidates, from the first-ranked character 「も」 to the J-th ranked character 「亡」. Similarly, Pb2 and Pf2 for the second-ranked character 「月」 of the recognition candidate characters under judgment are the maxima of the connection probabilities with the immediately preceding and immediately following character groups, respectively.
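The maximum-over-neighbors rule of paragraphs [0052] and [0053] can be sketched as follows. The dictionary `bigram` stands in for the inter-character connection probability dictionary 11; its layout, the function name, and the treatment of missing pairs as probability 0.0 are assumptions. In the usage example, the middle candidates 「明」 and 「し」 and all probability values are invented for illustration.

```python
def connection_features(target, previous, following, bigram, top_l=2, top_j=3):
    """Return [Pb1..PbL, Pf1..PfL] for the judgment target candidate group.

    bigram[(a, b)] is the connection probability P(b | a) that b follows a.
    Pairs missing from the dictionary are treated as probability 0.0.
    """
    pb, pf = [], []
    for c in target[:top_l]:
        # Pbk: largest P(c | prev) over the top-J candidates of the preceding group.
        pb.append(max((bigram.get((p, c), 0.0) for p in previous[:top_j]),
                      default=0.0))
        # Pfk: largest P(next | c) over the top-J candidates of the following group.
        pf.append(max((bigram.get((c, f), 0.0) for f in following[:top_j]),
                      default=0.0))
    return pb + pf

# Hypothetical dictionary entries and candidate groups.
bigram = {("明", "日"): 0.30, ("朋", "日"): 0.01, ("日", "も"): 0.05,
          ("明", "月"): 0.02, ("月", "も"): 0.01, ("月", "し"): 0.04}
features = connection_features(["日", "月"], ["朋", "明", "胡"],
                               ["も", "し", "亡"], bigram)
```

The `default=0.0` argument covers the first and last characters of an input, where the preceding or following candidate group is empty.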

[0054] Here, C1 denotes the first-ranked recognition candidate character under judgment, and Cbk and Cfk denote the k-th ranked recognition candidate characters immediately before and immediately after it, respectively. P(Cj|Ci) denotes the connection probability that character Cj appears following character Ci. In the second embodiment, the discriminant space shown in FIG. 3 has (2 + N + 2L) dimensions. For the correct-reading and misreading samples as well, the connection probabilities Pbk and Pfk for the top L characters of each sample's recognition candidate character group are extracted as feature elements, in addition to the average writing speed, the number of written strokes, and the regular stroke counts of the top N characters of that group. The tables stored in the discrimination score-reliability conversion table 8 and the reliability-cumulative correct reading rate table 10 are set according to this sample data.

[0055] Because the second embodiment uses the connection probabilities between characters in adjacent recognition candidate character groups as feature amounts for reliability judgment, in addition to the feature amounts used in the first embodiment, it can judge reliability more accurately than the first embodiment. As a further embodiment, the certainty factors (similarity or distance values) of the recognition candidate characters up to the M-th rank may be added as feature elements alongside the connection probabilities Pbk and Pfk, so that the feature vector of the recognition candidate character group is extracted in a (2 + N + 2L + M)-dimensional vector space. In that case, the discriminant space shown in FIG. 3 also has (2 + N + 2L + M) dimensions. Because such a second embodiment takes into account the certainty factors as well as the connection relationships, it enables more accurate reliability judgment.

[0056] In the embodiments above, the series of processing steps was described block by block according to FIG. 1, but the processing flow can also be executed by a CPU in accordance with a control program. In that case, the processing flow is stored in ROM or RAM as the control program. The reference data of the character recognition dictionary 3, the regular stroke number table 5, the discrimination score-reliability conversion table 8, the reliability-cumulative correct reading rate table 10, and the inter-character connection probability dictionary 11 is also stored in ROM or RAM. The CPU executes the processing described above in accordance with the control program while referring to the reference data.

[0057] FIG. 8 shows the flow of this control program. Step S101 is the processing in the input unit 1, step S102 the processing in the character recognition unit 2, step S103 the processing in the feature extraction unit 4, step S104 the processing in the discrimination score calculation unit 6, step S105 the processing in the recognition reliability calculation unit 7, and step S106 the processing in the recognition candidate number control unit 9.

[0058] The "character recognition step" in the claims corresponds to step S102 of FIG. 7 in the embodiment. The "feature extraction step" in the claims corresponds to step S103 of FIG. 7. The "reliability calculation step" in the claims corresponds to steps S104 and S105 of FIG. 7. The control program and the various reference data can be traded via a recording medium such as a flexible disk or a transmission medium such as the Internet. FIG. 9 shows an example of the file structure of data traded via a recording medium or a transmission medium. Data with this file structure is recorded on the recording medium. In a transaction via a transmission medium, data with this file structure is supplied over the transmission medium.

[0059] Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various other modifications are possible. For example, the embodiments above show examples in which the reliability is calculated using the average writing speed, the number of written strokes, the regular stroke counts of the candidate characters, the connection probabilities between adjacent recognition candidate character groups, and the certainty factors of the candidate characters as feature amounts. Of these feature amounts, only those that are particularly useful in a given implementation may be selected and used.

[0060] The processing in the recognition candidate number control unit 9 of FIG. 1 may also reject (invalidate) the recognition result according to the reliability, instead of limiting the number of recognition candidates. Furthermore, although characters were given as an example of the handwriting input, the input is not limited to characters and may of course be figures.

[0061] In addition, the method of calculating the discrimination score in the discrimination score calculation unit 6 of FIG. 1 and the method of calculating the recognition reliability in the recognition reliability calculation unit 7 of FIG. 1 may be methods other than those shown in the embodiments, namely the method using the Mahalanobis distance DM and the method using Bayes' theorem. The embodiments of the present invention can be modified in various ways as appropriate within the scope of the technical idea of the present invention.

[0062] The embodiments described above are merely embodiments of the present invention, and the meanings of the terms of the present invention and of each constituent element are not limited to those described in the embodiments.

[0063]

[Effects of the Invention] As described above, according to the present invention, features useful for reliability estimation are obtained from the handwritten input data. By using them, the reliability can be calculated with relatively good accuracy even when the reliability of the recognition result cannot be properly estimated from information obtained from the character recognition result alone, such as the certainty factors of candidate characters or the connection probabilities between candidate characters. When this is adopted in a character recognition device, the accuracy of character recognition can therefore be improved.

[Brief description of drawings]

FIG. 1 is a circuit block diagram according to the first embodiment.

FIG. 2 is a diagram for explaining the processing of the feature extraction unit according to the first embodiment.

FIG. 3 is a diagram for explaining the processing of the discrimination score calculation unit according to the first embodiment.

FIG. 4 is a diagram for explaining the processing of the recognition reliability calculation unit according to the first embodiment.

FIG. 5 is a diagram for explaining the processing of the recognition candidate number control unit according to the first embodiment.

FIG. 6 is a circuit block diagram according to the second embodiment.

FIG. 7 is a diagram for explaining the processing of the feature extraction unit according to the second embodiment.

FIG. 8 is an execution flowchart according to the first and second embodiments.

FIG. 9 shows the file structure of the execution program and reference data according to the second embodiment.

[Explanation of symbols]

1 ... Input unit
2 ... Character recognition unit
3 ... Character recognition dictionary
4 ... Feature extraction unit
5 ... Regular stroke number table
6 ... Discrimination score calculation unit
7 ... Recognition reliability calculation unit
8 ... Discrimination score-reliability conversion table
9 ... Recognition candidate number control unit
10 ... Reliability-cumulative correct reading rate table
11 ... Inter-character connection probability dictionary

Continuation of front page
F-terms (reference): 5B064 AB04 BA05 DD05 DD07 EA18; 5B068 AA01 BD02 BD17 CC19 CD02 CD06

Claims

[Claims]

1. A character recognition means for recognizing a coordinate point sequence of characters input by handwriting and outputting a recognition candidate character group, and a reliability of a judgment target recognition candidate character group output by the character recognition means. As a feature amount for the feature extraction means for calculating the average writing speed of the coordinate point sequence of the characters input by handwriting, the feature amount from the feature extraction means, and the statistical tendency of the sample data, based on the It has reliability calculation means for calculating the reliability of the judgment target recognition candidate character group, and post-processing control means for controlling post-processing of the judgment target recognition candidate character group based on the reliability from the reliability calculation means. A character recognition device characterized by the above.

2. The invention according to claim 1, wherein the feature extraction unit determines the average writing speed and the number of writing strokes of the coordinate point sequence of the character input by handwriting as the reliability of the determination target recognition candidate character group. A character recognition device characterized by extracting as a feature amount for calculating a degree.

3. The invention according to any one of claims 1 and 2, wherein the feature extraction means further sets the number of regular strokes of upper N characters (N ≧ 1) in the determination target recognition candidate character group,
A character recognition device characterized by being extracted as a feature amount for calculating the reliability of the judgment target recognition candidate character group.

4. The invention according to any one of claims 1 to 3, wherein the feature extracting means further determines the certainty factor of the upper M characters (M ≧ 1) in the determination target recognition candidate character group. A character recognition device characterized by extracting as a feature amount for calculating the reliability of a target recognition candidate character group.

5. The invention according to claim 1, wherein the feature extracting means further includes each recognition candidate character in the determination target recognition candidate character group and the immediately preceding recognition candidate character for handwriting input immediately before the recognition candidate character. A feature value for calculating the reliability of the determination target recognition candidate character group by determining the value of the connection probability with the group and / or the value of the connection probability with the immediately subsequent recognition candidate character group for the handwriting input immediately after that. Character recognition device characterized by extracting as.

6. The invention according to claim 5, wherein the feature extracting means includes each recognition candidate character in the determination target recognition candidate character group and a recognition candidate character with the highest certainty factor in the immediately preceding recognition candidate character group. And / or extracting the value of the connection probability with the recognition candidate character having the highest certainty factor in the immediately following recognition candidate character group as the feature amount of the determination target recognition candidate character group. Characterized character recognition device.

7. The invention according to claim 5, wherein the feature extracting means includes one recognition candidate character in the determination target recognition candidate character group and each recognition candidate character immediately before or after the recognition candidate character group. A character recognition apparatus, wherein the highest connection probability among the connection probabilities between the one recognition candidate character and the immediately preceding or immediately following recognition candidate character group is used as the connection probability.

8. The invention according to claim 1, wherein the reliability calculation means determines the certainty of one recognition candidate character in the judgment target recognition candidate character group from the feature amount. A character recognition device comprising: a discrimination score calculating means for calculating the reliability, and calculating the reliability based on the discrimination score.

9. The character recognition device according to any one of claims 1 to 8, wherein the post-processing control means restricts, based on the reliability calculated by the reliability calculation means, the recognition candidate characters to be subjected to post-processing.
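The restriction in claim 9 might look like the following sketch. The claim only says that the candidates passed to post-processing are restricted based on the reliability; the specific policy here (keep only the top candidate when reliability is high, a capped number when it is moderate, all candidates otherwise) and the values of `high`, `low`, and `max_kept` are hypothetical.

```python
from typing import List, Tuple

Group = List[Tuple[str, float]]  # (character, certainty factor), assumed layout


def restrict_candidates(candidates: Group, reliability: float,
                        high: float = 0.9, low: float = 0.5,
                        max_kept: int = 5) -> Group:
    """Return the candidates that post-processing is allowed to consider."""
    ranked = sorted(candidates, key=lambda cc: cc[1], reverse=True)
    if reliability >= high:
        return ranked[:1]          # trust the top candidate alone
    if reliability >= low:
        return ranked[:max_kept]   # let post-processing choose among a few
    return ranked                  # low reliability: keep every candidate
```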

10. A character recognition method comprising: a character recognition step of recognizing a coordinate point sequence of a character input by handwriting and outputting a recognition candidate character group; a feature extraction step of calculating the average writing speed of the coordinate point sequence of the character input by handwriting as a feature amount for calculating the reliability of a determination target recognition candidate character group output from the character recognition step; a reliability calculation step of calculating the reliability of the determination target recognition candidate character group based on the feature amount from the feature extraction step and the statistical tendency of sample data; and a post-processing control step of controlling post-processing of the determination target recognition candidate character group based on the reliability from the reliability calculation step.
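The average writing speed used in the feature extraction step could be computed along the following lines. The claim does not define the calculation; this sketch assumes each sampled point carries a timestamp, stores strokes as lists of hypothetical (x, y, t) tuples, and divides the total pen-down path length by the total pen-down time, ignoring the gaps between strokes.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t); timestamps are an assumption
Stroke = List[Point]


def average_writing_speed(strokes: List[Stroke]) -> float:
    """Pen-down path length divided by pen-down time, in units per second."""
    length = 0.0
    duration = 0.0
    for stroke in strokes:
        # Walk consecutive point pairs within one stroke only.
        for (x0, y0, t0), (x1, y1, t1) in zip(stroke, stroke[1:]):
            length += math.hypot(x1 - x0, y1 - y0)
            duration += t1 - t0
    return length / duration if duration > 0 else 0.0
```

With this representation the number of writing strokes is simply `len(strokes)`.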

11. The character recognition method according to claim 10, wherein in the feature extraction step, the average writing speed and the number of writing strokes in the coordinate point sequence of the character input by handwriting are extracted as feature amounts for calculating the reliability of the determination target recognition candidate character group.

12. The character recognition method according to claim 10, wherein in the feature extraction step, the number of regular strokes of the top N characters (N ≧ 1) in the determination target recognition candidate character group is further extracted as a feature amount for calculating the reliability of the determination target recognition candidate character group.

13. The character recognition method according to claim 10, wherein in the feature extraction step, the certainty factor of the top M characters (M ≧ 1) in the determination target recognition candidate character group is further extracted as a feature amount for calculating the reliability of the determination target recognition candidate character group.

14. The character recognition method according to claim 10, wherein in the feature extraction step, for each recognition candidate character in the determination target recognition candidate character group, the value of the connection probability with the immediately preceding recognition candidate character group for the handwriting input immediately before it and/or the value of the connection probability with the immediately following recognition candidate character group for the handwriting input immediately after it is further obtained and extracted as a feature amount for calculating the reliability of the determination target recognition candidate character group.

15. The character recognition method according to claim 14, wherein in the feature extraction step, the value of the connection probability between each recognition candidate character in the determination target recognition candidate character group and the recognition candidate character having the highest certainty factor in the immediately preceding recognition candidate character group, and/or the value of the connection probability with the recognition candidate character having the highest certainty factor in the immediately following recognition candidate character group, is extracted as the feature amount of the determination target recognition candidate character group.

16. The character recognition method according to claim 14, wherein in the feature extraction step, the highest of the connection probabilities between one recognition candidate character in the determination target recognition candidate character group and each recognition candidate character in the immediately preceding or immediately following recognition candidate character group is used as the connection probability between that one recognition candidate character and the immediately preceding or immediately following recognition candidate character group.

17. The character recognition method according to claim 10, wherein the reliability calculation step comprises a discrimination score calculating step of calculating, from the feature amount, the certainty of one recognition candidate character in the determination target recognition candidate character group as a discrimination score, and the reliability is calculated based on the discrimination score.

18. The character recognition method according to claim 10, wherein in the post-processing control step, the recognition candidate characters to be subjected to post-processing are restricted based on the reliability calculated in the reliability calculation step.

19. A program for causing a computer to execute each processing step in the character recognition method according to any one of claims 10 to 18.

20. A computer-readable recording medium recording a program for causing a computer to execute each processing step in the character recognition method according to any one of claims 10 to 18.