JP2003296661A - Character recognition device, character recognition method, its execution program and recording medium recording it - Google Patents
Character recognition device, character recognition method, its execution program and recording medium recording it
- Publication number
- JP2003296661A, JP2002095511A
- Authority
- JP
- Japan
- Prior art keywords
- candidate character
- recognition candidate
- recognition
- reliability
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Character Discrimination (AREA)
Abstract
Description
[0001]
[Technical Field of the Invention] The present invention relates to a character recognition device and a character recognition method that recognize handwritten characters by determining the reliability (likelihood) of a character recognition result, to a program for executing the method, and to a recording medium on which the program is recorded.
[0002]
[Prior Art] In a conventional character recognition method, for example, the features of a handwritten character are extracted and compared with the features stored in a recognition dictionary, and the recognition candidate characters with a high similarity to the input, or a small distance from it, are output (for convenience, both cases are referred to collectively as having a high "confidence"). With such character-by-character recognition, relatively accurate results are obtained when the written character is close to the features in the recognition dictionary, but a correct result cannot easily be obtained when the written character deviates from them.
[0003] Therefore, in addition to such character-by-character recognition, so-called post-processing is performed. Connection probabilities or co-occurrence probabilities between neighboring characters, words, or phrases are determined, a consistency score for the character string is calculated from the per-character confidences and these probabilities, and candidate character strings for the whole string are output according to that score. If too many recognition candidate characters are passed to the post-processing, however, its computation time increases. Moreover, if candidates with low confidence are included, the post-processing may instead output an incorrect character string candidate.
[0004] Various methods have therefore been proposed for limiting the candidate characters passed to post-processing, so as to improve post-processing accuracy while suppressing the increase in computation time. Representative examples include: (a) a method that uses the confidence of each candidate character obtained by character recognition directly; (b) a method that estimates the reliability of the recognition result from the confidences of the candidate characters and controls the number of candidates according to that reliability; and (c) a method that estimates the reliability of the recognition result from the linguistic connection relationship between adjacent recognition candidate character groups.
[0005] Method (a) is the simplest. The confidence of each candidate character is compared with a predetermined threshold, and only candidates whose confidence exceeds the threshold are passed to post-processing. One example of how the reliability of the recognition result is obtained in method (b) is Japanese Unexamined Patent Publication No. H09-259226, "Recognition result evaluation method and recognition device". In that method, the difference between the confidence of the first-ranked candidate and that of the second-ranked candidate is computed, and a linear sum of this difference and the confidence of the first-ranked candidate is used as a measure of how likely the recognition result is to be correct. It relies on the tendency that, when the recognition result is correct, the confidence of the first-ranked candidate is relatively high and its margin over the second-ranked candidate is relatively large. Various other methods have also been proposed, such as one that uses the ratio between the confidence of the first-ranked candidate and that of each lower-ranked candidate, and one that treats the confidences of the candidates as a multidimensional probability distribution and obtains the reliability statistically.
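As an illustration only, methods (a) and (b) might be sketched as follows. The function names, the weights `w_top` and `w_gap`, and the sample confidences are hypothetical; the cited publication's actual coefficients are not given in this text.

```python
def filter_by_threshold(candidates, threshold):
    """Method (a): keep only candidates whose confidence exceeds the threshold.

    candidates is a list of (character, confidence) pairs.
    """
    return [(ch, conf) for ch, conf in candidates if conf > threshold]


def correctness_measure(candidates, w_top=1.0, w_gap=1.0):
    """Method (b): linear sum of the top confidence and the top-two gap.

    The weights are illustrative assumptions, not values from the source.
    """
    ranked = sorted((conf for _, conf in candidates), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidates to compute the gap")
    top, second = ranked[0], ranked[1]
    return w_top * top + w_gap * (top - second)


# A similar-character pair: the gap is small even if the top candidate is right.
candidates = [("つ", 0.90), ("フ", 0.85), ("う", 0.40)]
print(filter_by_threshold(candidates, 0.5))
print(correctness_measure(candidates))
```

The sample input shows the weakness discussed later in the text: a similar-character pair yields a small gap, so the measure stays low whether or not the first-ranked candidate is correct.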
[0006] Method (c) statistically estimates the reliability of the recognition result by focusing on the connection probabilities between the characters in the recognition candidate character group under evaluation and those in the recognition candidate character group immediately before or after it.
[0007]
[Problems to Be Solved by the Invention] In general, however, the confidence of each candidate character obtained from character recognition does not necessarily reflect how likely that candidate is to be correct. For example, for characters written rather sloppily, the confidence tends to be low even when the candidate is correct. Conversely, when a candidate resembles the correct character, its confidence may be high even though it is wrong. It is therefore difficult to limit the candidate characters accurately with method (a).
[0008] Furthermore, when the first- and second-ranked candidates are a pair of similar characters, such as 「つ」 and 「フ」 or 「之」 and 「え」, the difference between their confidences tends to be small regardless of whether the recognition result is correct. When hiragana, katakana, kanji, and alphanumeric characters are all recognition targets, such pairs of similar characters occur frequently, so it is likewise difficult to estimate the reliability of the recognition result accurately with method (b).
[0009] Method (c), on the other hand, avoids the problems caused by the properties of the confidence seen in methods (a) and (b). With this method, however, the reliability of the recognition result may not be obtained accurately for character strings whose inter-character connection probabilities are low (for example, rarely used technical terms or proper nouns). With the conventional techniques described above, therefore, recognition candidate characters that should be excluded from post-processing are passed to it as highly reliable, while candidates that should be passed to post-processing are excluded as unreliable, so that the reliability determination ends up lowering the recognition accuracy instead. What these conventional techniques have in common is that the features used to compute the reliability are extracted only from the character recognition result.
[0010] In so-called online handwritten character recognition, which recognizes characters written with a stylus or the like, the time series of coordinate points that forms the input data may contain information useful for estimating the reliability of the recognition result. For example, sloppily written characters generally tend to be written quickly, whereas carefully written characters tend to be written slowly. Because sloppily written characters are prone to misrecognition, the reliability of their recognition results is expected to be lower. In addition, when the number of written strokes is small there are generally many similar characters, so misrecognition is again more likely.
[0011] Furthermore, suppose the number of written strokes is markedly smaller than the standard stroke count of a candidate character obtained by the recognition processing, for example when the character was written in one stroke while the first-ranked candidate has a standard stroke count of ten. In that case it is likely that a character with many strokes was written in a single stroke, that is, in an extremely cursive, run-together style. Misrecognition is likely in this case as well. The relationship between the number of written strokes and the standard stroke counts of the candidate characters therefore also reflects the reliability of the character recognition result.
[0012] Accordingly, an object of the present invention is to resolve the problems of the prior art by focusing on features, obtainable from the handwritten input data, that are useful for reliability estimation. The invention thereby aims to provide a character recognition device, a character recognition method, a program for executing the method, and a recording medium storing the program that can calculate the reliability with relatively high accuracy, and hence improve recognition accuracy, even when the reliability of the recognition result cannot be estimated properly from features derived solely from the character recognition result, such as the confidences of candidate characters and the connection probabilities between candidate characters.
[0013]
[Means for Solving the Problems] The invention according to claim 1 comprises: character recognition means for recognizing the coordinate point sequence of a handwritten character and outputting a recognition candidate character group; feature extraction means for calculating the average writing speed of the coordinate point sequence of the handwritten character as a feature for calculating the reliability of the recognition candidate character group under evaluation output from the character recognition means; reliability calculation means for calculating the reliability of the recognition candidate character group under evaluation on the basis of the feature from the feature extraction means and the statistical tendencies of sample data; and post-processing control means for controlling the post-processing of the recognition candidate character group under evaluation on the basis of the reliability from the reliability calculation means.
[0014] The invention according to claim 2 is the invention of claim 1, wherein the feature extraction means extracts the average writing speed and the number of written strokes in the coordinate point sequence of the handwritten character as features for calculating the reliability of the recognition candidate character group under evaluation. The invention according to claim 3 is the invention of claim 1 or 2, wherein the feature extraction means further extracts the standard stroke counts of the top N characters (N ≥ 1) in the recognition candidate character group under evaluation as features for calculating the reliability of that group.
[0015] The invention according to claim 4 is the invention of any one of claims 1 to 3, wherein the feature extraction means further extracts the confidences of the top M characters (M ≥ 1) in the recognition candidate character group under evaluation as features for calculating the reliability of that group. The invention according to claim 5 is the invention of any one of claims 1 to 4, wherein the feature extraction means further extracts, as features for calculating the reliability of the recognition candidate character group under evaluation, the values of the connection probabilities between each recognition candidate character in that group and the immediately preceding recognition candidate character group (obtained for the immediately preceding handwritten input), and/or the values of the connection probabilities between each such character and the immediately following recognition candidate character group (obtained for the immediately following handwritten input).
[0016] The invention according to claim 6 is the invention of claim 5, wherein the feature extraction means extracts, as features of the recognition candidate character group under evaluation, the values of the connection probabilities between each recognition candidate character in that group and the candidate with the highest confidence in the immediately preceding recognition candidate character group, and/or the values of the connection probabilities between each such character and the candidate with the highest confidence in the immediately following recognition candidate character group.
[0017] The invention according to claim 7 is the invention of claim 5, wherein the feature extraction means takes, as the connection probability between one recognition candidate character in the recognition candidate character group under evaluation and the immediately preceding or following recognition candidate character group, the highest of the connection probabilities between that character and each recognition candidate character in the preceding or following group.
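The rule in claim 7 amounts to taking a maximum over pairwise connection probabilities. A minimal sketch follows, assuming the probabilities are available as a dictionary keyed by (left character, right character); that representation, the function name, and the sample values are assumptions for illustration and are not specified in the source.

```python
def group_connection_probability(candidate, neighbor_group, bigram_prob,
                                 neighbor_is_previous=True):
    """Highest connection probability between one candidate and a neighbor group.

    bigram_prob maps (left_char, right_char) to a probability; pairs missing
    from the table are treated as probability 0.0 (an assumption).
    """
    probs = []
    for other in neighbor_group:
        # Order the pair by writing order: preceding character comes first.
        pair = (other, candidate) if neighbor_is_previous else (candidate, other)
        probs.append(bigram_prob.get(pair, 0.0))
    return max(probs, default=0.0)


# Hypothetical probabilities for illustration only.
bigram_prob = {("東", "京"): 0.30, ("車", "京"): 0.01, ("京", "都"): 0.20}
print(group_connection_probability("京", ["東", "車"], bigram_prob))
print(group_connection_probability("京", ["都", "郡"], bigram_prob,
                                   neighbor_is_previous=False))
```

Claim 6 differs only in that the neighbor group is reduced to its single highest-confidence candidate before the lookup, which corresponds to passing a one-element list as `neighbor_group`.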
[0018] The invention according to claim 8 is the invention of any one of claims 1 to 7, wherein the reliability calculation means includes discrimination score calculation means for calculating, from the features, the likelihood of one recognition candidate character in the recognition candidate character group under evaluation as a discrimination score, and calculates the reliability on the basis of that discrimination score. The invention according to claim 9 is the invention of any one of claims 1 to 8, wherein the post-processing control means limits the recognition candidate characters to be subjected to post-processing on the basis of the reliability calculated by the reliability calculation means.
[0019] The invention according to claim 10 comprises: a character recognition step of recognizing the coordinate point sequence of a handwritten character and outputting a recognition candidate character group; a feature extraction step of calculating the average writing speed of the coordinate point sequence of the handwritten character as a feature for calculating the reliability of the recognition candidate character group under evaluation output from the character recognition step; a reliability calculation step of calculating the reliability of the recognition candidate character group under evaluation on the basis of the feature from the feature extraction step and the statistical tendencies of sample data; and a post-processing control step of controlling the post-processing of the recognition candidate character group under evaluation on the basis of the reliability from the reliability calculation step.
[0020] The invention according to claim 11 is the invention of claim 10, wherein the feature extraction step extracts the average writing speed and the number of written strokes in the coordinate point sequence of the handwritten character as features for calculating the reliability of the recognition candidate character group under evaluation. The invention according to claim 12 is the invention of claim 10 or 11, wherein the feature extraction step further extracts the standard stroke counts of the top N characters (N ≥ 1) in the recognition candidate character group under evaluation as features for calculating the reliability of that group.
[0021] The invention according to claim 13 is the invention of any one of claims 10 to 12, wherein the feature extraction step further extracts the confidences of the top M characters (M ≥ 1) in the recognition candidate character group under evaluation as features for calculating the reliability of that group. The invention according to claim 14 is the invention of any one of claims 10 to 13, wherein the feature extraction step further extracts, as features for calculating the reliability of the recognition candidate character group under evaluation, the values of the connection probabilities between each recognition candidate character in that group and the immediately preceding recognition candidate character group (obtained for the immediately preceding handwritten input), and/or the values of the connection probabilities between each such character and the immediately following recognition candidate character group (obtained for the immediately following handwritten input).
[0022] The invention according to claim 15 is the invention of claim 14, wherein the feature extraction step extracts, as features of the recognition candidate character group under evaluation, the values of the connection probabilities between each recognition candidate character in that group and the candidate with the highest confidence in the immediately preceding recognition candidate character group, and/or the values of the connection probabilities between each such character and the candidate with the highest confidence in the immediately following recognition candidate character group.
[0023] The invention according to claim 16 is the invention of claim 14, wherein the feature extraction step takes, as the connection probability between one recognition candidate character in the recognition candidate character group under evaluation and the immediately preceding or following recognition candidate character group, the highest of the connection probabilities between that character and each recognition candidate character in the preceding or following group.
[0024] The invention according to claim 17 is the invention of any one of claims 10 to 16, wherein the reliability calculation step includes a discrimination score calculation step of calculating, from the features, the likelihood of one recognition candidate character in the recognition candidate character group under evaluation as a discrimination score, and calculates the reliability on the basis of that discrimination score. The invention according to claim 18 is the invention of any one of claims 10 to 17, wherein the post-processing control step limits the recognition candidate characters to be subjected to post-processing on the basis of the reliability calculated in the reliability calculation step.
[0025] The invention according to claim 19 is a program for causing a computer to execute each processing step of the character recognition method according to any one of claims 10 to 18. The invention according to claim 20 is a computer-readable recording medium on which is recorded a program for causing a computer to execute each processing step of the character recognition method according to any one of claims 10 to 18.
[0026]
[Embodiments of the Invention] <First Embodiment> A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a circuit block diagram of a handwritten character recognition device according to the first embodiment.
[0027] In FIG. 1, reference numeral 1 denotes an input unit, which generates and outputs handwriting character information from handwriting entered on a tablet or the like. Reference numeral 2 denotes a character recognition unit, which compares the handwriting character information supplied from the input unit 1 with the character features in the character recognition dictionary 3, and outputs the dictionary characters ranked first through N-th in closeness (confidence) as the recognition candidate characters for the handwritten character. Reference numeral 3 denotes the character recognition dictionary, in which candidate characters are stored together with their character features.
[0028] Reference numeral 4 denotes a feature extraction unit, which calculates the average writing speed and determines the number of written strokes from the handwriting character information obtained by the input unit 1. It also refers to the standard stroke count table 5 to extract the standard stroke counts of the top N characters in the recognition candidate character group under evaluation supplied from the character recognition unit 2. Reference numeral 5 denotes the standard stroke count table, which stores the standard stroke count of each recognition target character in association with that character.
[0029] Reference numeral 6 denotes a discrimination score calculation unit, which processes the features obtained by the feature extraction unit 4 to calculate a correct/incorrect discrimination score for the recognition candidate character group under evaluation. Reference numeral 7 denotes a recognition reliability calculation unit, which checks the discrimination score from the discrimination score calculation unit 6 against the discrimination score-reliability conversion table 8 and outputs the reliability of the recognition candidate character group under evaluation. Reference numeral 8 denotes the discrimination score-reliability conversion table, which stores the relationship between discrimination score and reliability as a table.
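The lookup performed by the recognition reliability calculation unit 7 might be sketched as a binned table lookup. The table layout (sorted score breakpoints with one reliability per bin) and every numeric value below are hypothetical; this passage does not describe the actual contents or structure of table 8.

```python
from bisect import bisect_right

# Hypothetical stand-in for the discrimination score-reliability table 8:
# three score breakpoints define four bins, each with an assumed reliability.
SCORE_BREAKPOINTS = [-1.0, 0.0, 1.0]
RELIABILITY_PER_BIN = [0.10, 0.40, 0.75, 0.95]


def score_to_reliability(score):
    """Return the reliability of the bin that the discrimination score falls in."""
    return RELIABILITY_PER_BIN[bisect_right(SCORE_BREAKPOINTS, score)]


print(score_to_reliability(-2.0))
print(score_to_reliability(0.5))
```

A binned table keeps the mapping monotone and cheap to evaluate; interpolating between entries would be an equally plausible reading of "conversion table".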
[0030] Reference numeral 9 denotes a recognition candidate count control unit, which checks the reliability from the recognition reliability calculation unit 7 against the reliability-cumulative correct reading rate table 10 and limits the number of recognition candidates for the character under evaluation. Reference numeral 10 denotes the reliability-cumulative correct reading rate table, which stores the relationship between reliability and correct reading rate as a table. Reference numeral 21 denotes a language processing unit, which performs post-processing on the number of recognition candidate characters set by the recognition candidate count control unit 9 and outputs character string candidates.
[0031] The "character recognition means" in the claims corresponds to the character recognition unit 2 and the character recognition dictionary 3 of FIG. 1 in the embodiment. The "feature extraction means" in the claims corresponds to the feature extraction unit 4 and the standard stroke count table 5 of FIG. 1. The "reliability calculation means" in the claims corresponds to the discrimination score calculation unit 6, the recognition reliability calculation unit 7, and the discrimination score-reliability conversion table 8 of FIG. 1. The "post-processing control means" in the claims corresponds to the recognition candidate count control unit 9 and the reliability-cumulative correct reading rate table 10 of FIG. 1.
[0032] Next, the processing of each unit shown in the circuit block diagram is described in detail. First, the processing of the feature extraction unit 4 is described with reference to FIG. 2. FIG. 2 shows the coordinate point sequence obtained when the character "い" is written on the input unit 1. The feature extraction unit 4 calculates the average writing speed and obtains the number of written strokes from the coordinate point sequence (x_i, y_i, t_i, p_i) (i = 1 to K) obtained by the input unit 1. Here, K is the number of coordinate points, x_i and y_i are the x and y coordinates of the i-th coordinate point, and t_i is the time at which the i-th coordinate point was generated. p_i indicates the state of the input pen at time t_i: p_i = 1 when the pen is in contact with the tablet, and p_i = 0 when the pen is away from the tablet.
[0033] The writing speed v_i when the pen moves from the i-th coordinate point to the (i+1)-th coordinate point is given by Equation 1.
[0034]
[Equation 1]
[0035] Accordingly, the average writing speed V of the coordinate point sequence is calculated by Equation 2.
[0036]
[Equation 2]
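Equations 1 and 2 are not reproduced in this text. One reading consistent with the definitions above, assuming that the writing speed is the Euclidean distance between consecutive coordinate points divided by the elapsed time and that V is the arithmetic mean over the K-1 intervals, would be the following. Both forms are assumptions rather than the patent's confirmed formulas.

```latex
v_i = \frac{\sqrt{(x_{i+1}-x_i)^2 + (y_{i+1}-y_i)^2}}{t_{i+1}-t_i},
\qquad i = 1, \dots, K-1

V = \frac{1}{K-1} \sum_{i=1}^{K-1} v_i
```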
[0037] The number of written strokes S_inp is obtained by counting the number of times the pen-state value p_i changes from 1 to 0. In addition, the regular stroke counts Sn_1, Sn_2, ..., Sn_N of the top N characters of the recognition candidate character group are obtained by referring to the regular stroke number table 5. Next, the processing in the discrimination score calculation unit 6 is described with reference to FIG. 3.
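The feature extraction described so far can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the `(x, y, t, p)` tuple layout, and the `stroke_table` dictionary are hypothetical, and the speed formula assumes Euclidean distance over elapsed time averaged across all intervals.

```python
import math

def average_writing_speed(points):
    """Mean of per-interval speeds over a list of (x, y, t, p) samples.

    Assumption: speed is Euclidean distance divided by elapsed time,
    averaged over all intervals (pen-up movement included).
    """
    speeds = []
    for (x0, y0, t0, _), (x1, y1, t1, _) in zip(points, points[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicate timestamps to avoid division by zero
            speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return sum(speeds) / len(speeds) if speeds else 0.0

def written_stroke_count(points):
    """Count transitions of the pen state p from 1 (down) to 0 (up)."""
    states = [p for (_, _, _, p) in points]
    return sum(1 for a, b in zip(states, states[1:]) if a == 1 and b == 0)

def regular_stroke_counts(candidates, stroke_table, n):
    """Look up the regular stroke count of the top-n candidate characters."""
    return [stroke_table[c] for c in candidates[:n]]
```

Note that counting only 1-to-0 transitions, as the text specifies, misses a final stroke if the recording ends while the pen is still down.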
[0038] The set consisting of the average writing speed V, the number of written strokes S_inp, and the regular stroke counts Sn_1, Sn_2, ..., Sn_N extracted by the feature extraction unit 4 can be represented as a single vector (feature vector) in a (2+N)-dimensional vector space. The discrimination score calculation unit 6 extracts, in advance and in the same way, feature vectors from samples whose first-ranked recognition candidate character was correctly read or misread, and keeps them as training data. It compares these with the feature vector of the recognition candidate character group under evaluation to calculate the discrimination score of that group.
[0039] For example, suppose that the training-data feature vectors for correct readings and misreadings and the feature vector of the recognition candidate character group under evaluation are distributed as shown in FIG. 3. For simplicity, FIG. 3 illustrates the case N = 1, in which the feature vector has three dimensions. From the distribution of the feature vectors in each set (class), correctly read and misread, the discrimination score calculation unit 6 obtains in advance the mean (centroid vector) and the covariance matrix of the feature vectors of each class and stores them. It then obtains the Mahalanobis distances DMc and DMe between the centroid of each class and the feature vector of the character group under evaluation, and uses the ratio of these values, the logarithm of the ratio, or their difference as the discrimination score.
[0040] Here, the Mahalanobis distance DM is calculated as follows. Let m1 be the centroid vector of class C1 and Σ1 the covariance matrix of class C1. The squared Mahalanobis distance DM1 from a given feature vector x to m1 is then defined by Equation 3.
[0041]
[Equation 3] $D_{M1} = (x - m_1)^{T} \, \Sigma_1^{-1} \, (x - m_1)$
[0042] Here, the covariance matrix Σ1 is an n × n square matrix (n: number of dimensions of the feature space) whose (i, j) element is the covariance of the i-th and j-th feature quantities, that is, Σ1(i, j) = σij. Although the discrimination score is calculated above using the Mahalanobis distance DM, a linear discriminant function may instead be obtained in advance by linear discriminant analysis from the distributions of the feature vectors of the correctly read and misread classes, and the discrimination score may be obtained by applying this linear discriminant function to the feature vector of the recognition candidate character group under evaluation.
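As a rough illustration of the Mahalanobis-based score, the sketch below computes the squared distance to each class centroid and takes the logarithm of their ratio. The function names, the use of squared distances in the ratio, and the sign convention (larger means more likely correctly read) are assumptions of this sketch. A small Gauss-Jordan solver is used so that the example needs only the standard library.

```python
import math

def solve(a, b):
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, d)  # z = cov^{-1} d without forming the inverse
    return sum(di * zi for di, zi in zip(d, z))

def discrimination_score(x, correct, error):
    """Log ratio of the distance to the misread class over the distance
    to the correctly read class. `correct` and `error` are (mean, cov) pairs.
    """
    d_c = mahalanobis_sq(x, *correct)
    d_e = mahalanobis_sq(x, *error)
    eps = 1e-12  # guards against log(0) when x coincides with a centroid
    return math.log((d_e + eps) / (d_c + eps))
```

Solving the linear system instead of inverting the covariance matrix is a design choice of the sketch; either approach yields the same distance when the matrix is well conditioned.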
[0043] Alternatively, a neural network may be trained, using the feature vectors extracted from the correctly read and misread training samples as training data, to determine whether a given feature vector corresponds to a correct reading or a misreading, and the output value of that neural network for the feature vector under evaluation may be used as the discrimination score. Next, the processing of the recognition reliability calculation unit 7 is described with reference to FIG. 4.
[0044] For example, when the ratio of the Mahalanobis distances DMc and DMe, or the logarithm of that ratio, is used as the discrimination score, the relationship between the discrimination score obtained from the correctly read and misread training samples and the reliability is as shown in FIG. 4. Here, the reliability is calculated from the training samples by Bayes' theorem. Let p(y|X1=C) be the ratio of the number of correctly read samples having discrimination score y to the total number of correctly read samples, p(y|X1=E) the ratio of the number of misread samples having discrimination score y to the total number of misread samples, P(X1=C) the ratio of the number of correctly read samples to the total number of samples, and P(X1=E) the ratio of the number of misread samples to the total number of samples. The reliability of the first-ranked (highest-certainty) recognition candidate character of a recognition candidate character group having discrimination score y can then be calculated by the following equation.
[0045]
P(X1=C|y) = p(y|X1=C)·P(X1=C) / [p(y|X1=C)·P(X1=C) + p(y|X1=E)·P(X1=E)]
Here, X1 denotes the first-ranked recognition candidate character, and X1=C and X1=E denote the events that the first-ranked recognition candidate character is correct and incorrect, respectively. A table showing the relationship between discrimination score and reliability is created in advance from this equation and stored in the discrimination score-reliability conversion table 8. The recognition reliability calculation unit 7 compares the discrimination score from the discrimination score calculation unit 6 with the scores in the discrimination score-reliability conversion table 8 and outputs the matching reliability as the reliability of the first-ranked recognition candidate character of the recognition candidate character group.
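The table construction can be sketched as follows, under the assumption that discrimination scores are quantized into bins so that "samples having discrimination score y" means "samples whose score falls in the same bin". The function names and the binning scheme are hypothetical, and the sketch assumes at least one training sample exists.

```python
def build_score_reliability_table(correct_scores, error_scores, bin_edges):
    """Return one reliability P(X1=C | y) per score bin (None for empty bins)."""
    def histogram(scores):
        counts = [0] * (len(bin_edges) - 1)
        for s in scores:
            for b in range(len(counts)):
                if bin_edges[b] <= s < bin_edges[b + 1]:
                    counts[b] += 1
                    break
        return counts

    n_c, n_e = len(correct_scores), len(error_scores)
    prior_c = n_c / (n_c + n_e)  # P(X1=C)
    prior_e = n_e / (n_c + n_e)  # P(X1=E)
    table = []
    for c, e in zip(histogram(correct_scores), histogram(error_scores)):
        like_c = c / n_c if n_c else 0.0  # p(y | X1=C)
        like_e = e / n_e if n_e else 0.0  # p(y | X1=E)
        evidence = like_c * prior_c + like_e * prior_e
        table.append(like_c * prior_c / evidence if evidence > 0 else None)
    return table

def lookup_reliability(score, bin_edges, table):
    """Return the reliability of the bin containing `score`, or None."""
    for b, value in enumerate(table):
        if bin_edges[b] <= score < bin_edges[b + 1]:
            return value
    return None
```

With this binning, each table entry reduces to the fraction of training samples in that bin that were correctly read, which is a useful sanity check on the Bayes computation.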
[0046] Next, the processing of the recognition candidate number control unit 9 is described with reference to FIG. 5. The table in the upper part of FIG. 5 shows the relationship between the reliability of the first-ranked recognition candidate character under evaluation and the cumulative probability that the correct character is included among the recognition candidate characters up to rank N. The probabilities in this table are calculated in advance from the correctly read and misread training samples.
[0047] This table is stored in the reliability-cumulative correct reading rate table 10. The recognition candidate number control unit 9 compares the reliability of the recognition candidate character group under evaluation with the reliability levels in the table and, referring to the cumulative probabilities for the matching reliability level, determines up to which rank the recognition candidate characters are output to the language processing unit 21. The cutoff rank is determined, for example, by whether the cumulative probability for the matching reliability level has reached a predetermined threshold. The threshold may be the same for all reliability levels or may be set individually for each reliability level.
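A minimal sketch of this cutoff rule is shown below. The table values, the reliability level boundaries, and the 0.95 threshold are invented for illustration and do not come from FIG. 5. Falling back to all candidates when the threshold is never reached is likewise an assumption.

```python
def candidates_to_output(cumulative_rates, threshold):
    """Smallest rank N whose cumulative correct-reading rate reaches threshold.

    cumulative_rates[n-1] is the probability that the correct character is
    among the top n candidates for the matched reliability level.
    """
    for n, rate in enumerate(cumulative_rates, start=1):
        if rate >= threshold:
            return n
    return len(cumulative_rates)  # threshold never reached: keep everything

# Hypothetical table: lower bound of reliability level -> rates for N = 1..5
TABLE = {
    0.9: [0.97, 0.99, 0.995, 0.998, 0.999],
    0.5: [0.70, 0.88, 0.95, 0.98, 0.99],
    0.0: [0.30, 0.55, 0.75, 0.88, 0.94],
}

def select_candidates(candidates, reliability, threshold=0.95, table=TABLE):
    """Keep only as many top candidates as the matched level requires."""
    level = max(lb for lb in table if lb <= reliability)
    return candidates[:candidates_to_output(table[level], threshold)]
```

A high-reliability input therefore passes a single candidate to the language processing stage, while a low-reliability input passes more.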
[0048] Alternatively, the number of output candidates for each reliability level may be set in advance based on the table in the upper part of FIG. 5 and stored in the reliability-cumulative correct reading rate table 10. The table in the lower part of FIG. 5 is an example in which the reliability levels and the numbers of output candidates are set in advance. When such a table is stored in the reliability-cumulative correct reading rate table 10, the recognition candidate number control unit 9 reads the matching number of output candidates from the table and limits the recognition candidate characters output to the language processing unit 21 accordingly.
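Deriving such a precomputed table offline might look like the sketch below. The function name and the dictionary layout are hypothetical. `thresholds` may be a single value applied to every level or a per-level dictionary, mirroring the two options the text allows.

```python
def precompute_output_counts(cumulative_table, thresholds):
    """Map each reliability level to a fixed number of output candidates.

    cumulative_table: level -> list of cumulative rates for N = 1, 2, ...
    thresholds: a float (uniform) or a dict keyed by the same levels.
    """
    counts = {}
    for level, rates in cumulative_table.items():
        target = thresholds[level] if isinstance(thresholds, dict) else thresholds
        counts[level] = next(
            (n for n, rate in enumerate(rates, start=1) if rate >= target),
            len(rates),  # threshold never reached: output all candidates
        )
    return counts
```

At recognition time the control unit then needs only a single dictionary lookup per character.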
[0049] In the embodiment described above, feature quantities useful for estimating the reliability of the recognition result are extracted from the handwritten input data. The reliability can therefore be calculated with relatively high accuracy even when it cannot be estimated properly from the certainty factors of the candidate characters, so that recognition candidate characters with a high correct reading rate can be output to the language processing unit.
<Second Embodiment>
Next, a second embodiment of the present invention is described.
[0050] This embodiment modifies the feature extraction processing in the feature extraction unit 4. FIG. 6 shows a circuit block diagram of the handwritten character recognition device according to this embodiment. It differs from FIG. 1 of the first embodiment in that an inter-character concatenation probability dictionary 11 is added.
[0051] In this embodiment, in addition to the processing described in the first embodiment, the feature extraction unit 4 refers to the inter-character concatenation probability dictionary 11 and extracts the concatenation probabilities between the recognition candidate characters in the recognition candidate character group under evaluation and the recognition candidate character groups immediately before and after it. That is, whereas the first embodiment uses the average writing speed, the number of written strokes, and the regular stroke counts of the top N characters of the group under evaluation as feature quantities for reliability calculation, the second embodiment additionally uses the values Pb_k (k = 1 to L) of the concatenation probability between each of the top L recognition candidate characters in the group under evaluation and the immediately preceding recognition candidate character group, and the values Pf_k (k = 1 to L) of the concatenation probability between each of the top L recognition candidate characters and the immediately following recognition candidate character group.
[0052] In this embodiment, the value Pb_k of the concatenation probability between the k-th ranked candidate character in the recognition candidate character group under evaluation and the immediately preceding recognition candidate character group is the maximum of the concatenation probabilities between the k-th ranked candidate character and the candidate characters ranked 1 through J in the immediately preceding group. Likewise, Pf_k is the maximum of the concatenation probabilities between the k-th ranked candidate character and the candidate characters ranked 1 through J in the immediately following group.
[0053] For example, in FIG. 7, Pb_1 for the first-ranked character "日" of the recognition candidates under evaluation is the largest of the concatenation probabilities P(C1|Cbk) between "日" and each of the immediately preceding candidates, from the first-ranked character "朋" to the J-th ranked character "胡". Pf_1 for the first-ranked character "日" is the largest of the concatenation probabilities P(Cfk|C1) between "日" and each of the immediately following candidates, from the first-ranked character "も" to the J-th ranked character "亡". Similarly, Pb_2 and Pf_2 for the second-ranked character "月" are the maximum concatenation probabilities with respect to the immediately preceding and immediately following character groups, respectively.
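The maximum-over-neighbors rule can be sketched as follows. The `bigram` dictionary stands in for the inter-character concatenation probability dictionary 11, and its `(previous, next)` key layout, the function name, and the choice to treat missing pairs as probability 0.0 are all assumptions of this sketch.

```python
def concat_features(target, prev_group, next_group, bigram, top_l, top_j):
    """Return (Pb, Pf), each a list with one value per top-L target candidate.

    bigram[(a, b)] is the probability that character b follows character a.
    Pairs missing from the dictionary are treated as probability 0.0.
    """
    pb, pf = [], []
    for c in target[:top_l]:
        # Pb_k: best probability that c follows any of the top-J previous candidates
        pb.append(max((bigram.get((p, c), 0.0) for p in prev_group[:top_j]), default=0.0))
        # Pf_k: best probability that any of the top-J next candidates follows c
        pf.append(max((bigram.get((c, n), 0.0) for n in next_group[:top_j]), default=0.0))
    return pb, pf
```

In the FIG. 7 example, `target` would start with "日" and "月", `prev_group` would run from "朋" to "胡", and `next_group` from "も" to "亡". The `default=0.0` argument covers the first and last characters of a string, which have no neighboring group.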
[0054] Here, C1 denotes the first-ranked recognition candidate character under evaluation, and Cbk and Cfk denote the k-th ranked recognition candidate characters of the immediately preceding and immediately following groups, respectively. P(Cj|Ci) denotes the concatenation probability that character Cj appears following character Ci. In the second embodiment, the discriminant space shown in FIG. 3 has (2+N+2L) dimensions. For the correctly read and misread samples as well, the concatenation probabilities Pb_k and Pf_k for the top L characters of each sample's recognition candidate character group are extracted as features in addition to the average writing speed, the number of written strokes, and the regular stroke counts of the top N characters of that group. The tables stored in the discrimination score-reliability conversion table 8 and the reliability-cumulative correct reading rate table 10 are set according to this sample data.
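Assembling the (2+N+2L)-dimensional feature vector could look like the following sketch. The function name and the ordering of the components are assumptions; the text only fixes which quantities are included, not their order.

```python
def build_feature_vector(avg_speed, stroke_count, regular_counts, pb, pf):
    """Concatenate the features into a (2 + N + 2L)-dimensional vector.

    regular_counts has length N; pb and pf each have length L.
    """
    if len(pb) != len(pf):
        raise ValueError("Pb and Pf must both have length L")
    return [float(avg_speed), float(stroke_count),
            *map(float, regular_counts), *pb, *pf]
```

Whatever order is chosen, the same order must be used for the training samples and for the character group under evaluation, since the class centroids and covariance matrices are computed per component.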
[0055] In the second embodiment, the concatenation probabilities between characters in adjacent recognition candidate character groups are used as feature quantities for reliability determination in addition to the feature quantities used in the first embodiment, so the reliability can be determined with higher accuracy than in the first embodiment. As a further embodiment, in addition to the concatenation probabilities Pb_k and Pf_k, the certainty factors (similarity or distance values) of the recognition candidate characters up to rank M may be added as feature elements, and the feature vector of the recognition candidate character group may be extracted in a (2+N+2L+M)-dimensional vector space. In that case, the discriminant space shown in FIG. 3 also has (2+N+2L+M) dimensions. Because this variant takes the certainty factors into account as well as the concatenation relationships, it allows even more accurate reliability determination.
[0056] In the embodiments above, the series of processing steps was described block by block following FIG. 1, but the processing flow can also be executed by a CPU according to a control program. In that case, the processing flow is stored in ROM or RAM as a control program. The reference data of the character recognition dictionary 3, the regular stroke number table 5, the discrimination score-reliability conversion table 8, the reliability-cumulative correct reading rate table 10, and the inter-character concatenation probability dictionary 11 are also stored in ROM or RAM. The CPU executes the processing described above according to the control program while referring to the reference data.
[0057] FIG. 8 shows the flow of this control program. Step S101 is the processing in the input unit 1, step S102 is the processing in the character recognition unit 2, step S103 is the processing in the feature extraction unit 4, step S104 is the processing in the discrimination score calculation unit 6, step S105 is the processing in the recognition reliability calculation unit 7, and step S106 is the processing in the recognition candidate number control unit 9.
[0058] The "character recognition step" in the claims corresponds to step S102 of FIG. 8 in the embodiment. The "feature extraction step" in the claims corresponds to step S103 of FIG. 8. The "reliability calculation step" in the claims corresponds to steps S104 and S105 of FIG. 8. The control program and the various reference data can be distributed via a recording medium such as a flexible disk or a transmission medium such as the Internet. FIG. 9 shows an example of the file structure of data distributed in this way. Data with this file structure is recorded on the recording medium. When distributed via a transmission medium, data with this file structure is supplied over the transmission medium.
[0059] Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various other modifications are possible. For example, the embodiments above show the reliability being calculated using as feature quantities the average writing speed, the number of written strokes, the regular stroke counts of the candidate characters, the concatenation probabilities between adjacent recognition candidate character groups, and the certainty factors of the candidate characters. Of these feature quantities, only those that are particularly useful in a given implementation may be selected and used.
[0060] The processing in the recognition candidate number control unit 9 of FIG. 1 may also, instead of limiting the number of recognition candidates, reject (invalidate) the recognition result depending on the reliability. Furthermore, although characters were given as an example of the handwritten input, the input is not limited to characters and may of course be a figure.
[0061] In addition, the method of calculating the discrimination score in the discrimination score calculation unit 6 of FIG. 1 and the method of calculating the recognition reliability in the recognition reliability calculation unit 7 of FIG. 1 are not restricted to the Mahalanobis distance DM and Bayes' theorem used in the embodiments above; other methods may be adopted. The embodiments of the present invention can be modified in various ways as appropriate within the scope of the technical idea of the present invention.
[0062] The embodiments described above are merely examples of embodiments of the present invention, and the meanings of the terms used for the present invention and for each constituent element are not limited to those described in the embodiments.
[0063]
[Effects of the Invention] As described above, according to the present invention, by using features obtained from the handwritten input data that are useful for reliability estimation, the reliability can be calculated with relatively high accuracy even when the reliability of a recognition result cannot be estimated properly from information obtained from the character recognition result alone, such as the certainty factors of candidate characters or the concatenation probabilities between candidate characters. Adopting this in a character recognition device therefore improves the accuracy of character recognition.
FIG. 1 is a circuit block diagram according to the first embodiment.
FIG. 2 is a diagram for explaining the processing of the feature extraction unit according to the first embodiment.
FIG. 3 is a diagram for explaining the processing of the discrimination score calculation unit according to the first embodiment.
FIG. 4 is a diagram for explaining the processing of the recognition reliability calculation unit according to the first embodiment.
FIG. 5 is a diagram for explaining the processing of the recognition candidate number control unit according to the first embodiment.
FIG. 6 is a circuit block diagram according to the second embodiment.
FIG. 7 is a diagram for explaining the processing of the feature extraction unit according to the second embodiment.
FIG. 8 is an execution flowchart according to the first and second embodiments.
FIG. 9 shows the file structure of the execution program and reference data according to the second embodiment.
1 ... Input unit
2 ... Character recognition unit
3 ... Character recognition dictionary
4 ... Feature extraction unit
5 ... Regular stroke number table
6 ... Discrimination score calculation unit
7 ... Recognition reliability calculation unit
8 ... Discrimination score-reliability conversion table
9 ... Recognition candidate number control unit
10 ... Reliability-cumulative correct reading rate table
11 ... Inter-character concatenation probability dictionary
Continued from front page
F-terms (reference): 5B064 AB04 BA05 DD05 DD07 EA18; 5B068 AA01 BD02 BD17 CC19 CD02 CD06
Claims (20)
1. A character recognition device comprising: character recognition means for recognizing a coordinate point sequence of a handwritten input character and outputting a recognition candidate character group; feature extraction means for calculating an average writing speed of the coordinate point sequence of the handwritten input character as a feature quantity for calculating a reliability of a recognition candidate character group under evaluation output from the character recognition means; reliability calculation means for calculating the reliability of the recognition candidate character group under evaluation based on the feature quantity from the feature extraction means and a statistical tendency of sample data; and post-processing control means for controlling post-processing of the recognition candidate character group under evaluation based on the reliability from the reliability calculation means.
2. The character recognition device according to claim 1, wherein the feature extraction means extracts the average writing speed and the number of written strokes of the coordinate point sequence of the handwritten input character as feature quantities for calculating the reliability of the recognition candidate character group under evaluation.
3. The character recognition device according to claim 1 or 2, wherein the feature extraction means further extracts the regular stroke counts of the top N characters (N ≥ 1) in the recognition candidate character group under evaluation as feature quantities for calculating the reliability of that recognition candidate character group.
4. The character recognition device according to any one of claims 1 to 3, wherein the feature extraction means further extracts the certainty factors of the top M characters (M ≥ 1) in the recognition candidate character group under evaluation as feature quantities for calculating the reliability of that recognition candidate character group.
において、前記特徴抽出手段は、さらに前記判定対象認
識候補文字群中の各認識候補文字とその直前の手書き入
力に対する直前認識候補文字群との間の連接確率の値お
よび/もしくはその直後の手書き入力に対する直後認識
候補文字群との間の連接確率の値を、当該判定対象認識
候補文字群の信頼度を算出するための特徴量として抽出
することを特徴とする文字認識装置。5. The invention according to claim 1, wherein the feature extracting means further includes each recognition candidate character in the determination target recognition candidate character group and the immediately preceding recognition candidate character for handwriting input immediately before the recognition candidate character. A feature value for calculating the reliability of the determination target recognition candidate character group by determining the value of the connection probability with the group and / or the value of the connection probability with the immediately subsequent recognition candidate character group for the handwriting input immediately after that. Character recognition device characterized by extracting as.
抽出手段は、前記判定対象認識候補文字群中の各認識候
補文字と前記直前認識候補文字群中の最上位確信度の認
識候補文字との間の連接確率の値および/もしくは前記
直後認識候補文字群中の最上位確信度の認識候補文字と
の間の連接確率の値を当該判定対象認識候補文字群の特
徴量として抽出することを特徴とする文字認識装置。6. The invention according to claim 5, wherein the feature extracting means includes each recognition candidate character in the determination target recognition candidate character group and a recognition candidate character with the highest certainty factor in the immediately preceding recognition candidate character group. And / or extracting the value of the connection probability with the recognition candidate character having the highest certainty factor in the immediately following recognition candidate character group as the feature amount of the determination target recognition candidate character group. Characterized character recognition device.
7. The character recognition device according to claim 5, wherein the feature extracting means takes, as the connection probability between one recognition candidate character in the determination target recognition candidate character group and the immediately preceding or immediately following recognition candidate character group, the highest of the connection probabilities between that one recognition candidate character and each recognition candidate character in the immediately preceding or immediately following recognition candidate character group.
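The rule in claim 7 can be sketched in Python as a maximum over the neighbouring candidate group. This is a minimal illustration only: the bigram table, its values, and the function name `group_connection_probability` are hypothetical and do not come from the patent, and character pairs missing from the table are assumed to have probability 0.

```python
from typing import Dict, Sequence, Tuple


def group_connection_probability(
    candidate: str,
    neighbour_group: Sequence[str],
    bigram: Dict[Tuple[str, str], float],
    neighbour_is_previous: bool = True,
) -> float:
    """Return the highest connection probability between `candidate`
    and any character in the neighbouring candidate group."""
    best = 0.0
    for other in neighbour_group:
        # Order the pair as (earlier character, later character).
        pair = (other, candidate) if neighbour_is_previous else (candidate, other)
        # Assumption: unseen pairs count as probability 0.
        best = max(best, bigram.get(pair, 0.0))
    return best


# Hypothetical bigram probabilities, for illustration only.
bigram = {("東", "京"): 0.30, ("車", "京"): 0.01, ("京", "都"): 0.25}

preceding = group_connection_probability("京", ["東", "車"], bigram)
following = group_connection_probability(
    "京", ["都", "郡"], bigram, neighbour_is_previous=False
)
```

Here `preceding` is the probability for the best-matching character in the preceding group and `following` the one for the following group; either or both could then serve as feature amounts.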
8. The character recognition device according to claim 1, wherein the reliability calculation means includes discrimination score calculating means for calculating, from the feature amounts, the certainty of one recognition candidate character in the determination target recognition candidate character group as a discrimination score, and calculates the reliability based on the discrimination score.
9. The character recognition device according to any one of claims 1 to 8, wherein the post-processing control means restricts the recognition candidate characters to be subjected to post-processing, based on the reliability calculated by the reliability calculation means.
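One possible reading of the restriction in claim 9 is sketched below in Python: the more reliable the candidate group, the fewer candidates are passed on to post-processing. The claim does not state how the restriction is carried out, so the thresholds (`high`, `low`), the cut-off sizes, and the function name `limit_candidates` are all assumptions made for illustration.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (character, certainty factor)


def limit_candidates(
    candidates: List[Candidate],
    reliability: float,
    high: float = 0.9,   # hypothetical threshold
    low: float = 0.5,    # hypothetical threshold
    mid_kept: int = 3,   # hypothetical cut-off
    max_kept: int = 5,   # hypothetical cut-off
) -> List[Candidate]:
    """Keep fewer candidates for post-processing when reliability is high."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if reliability >= high:
        return ranked[:1]          # trust the top candidate
    if reliability >= low:
        return ranked[:mid_kept]   # keep a short list
    return ranked[:max_kept]       # low reliability: keep a wider list


candidates = [("木", 0.8), ("本", 0.6), ("未", 0.3), ("末", 0.2)]
short_list = limit_candidates(candidates, reliability=0.95)
```

With a high reliability only the top-ranked candidate is kept, which reduces the work that later post-processing has to do.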
10. A character recognition method comprising: a character recognition step of recognizing a coordinate point sequence of handwritten input characters and outputting a recognition candidate character group; a feature extraction step of calculating the average writing speed of the coordinate point sequence of the handwritten input characters as a feature amount for calculating the reliability of the determination target recognition candidate character group output from the character recognition step; a reliability calculation step of calculating the reliability of the determination target recognition candidate character group based on the feature amount from the feature extraction step and on the statistical tendency of sample data; and a post-processing control step of controlling post-processing of the determination target recognition candidate character group based on the reliability from the reliability calculation step.
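The average writing speed used as a feature amount in claim 10 could be computed along the following lines. This Python sketch assumes that each sampled point carries a timestamp and that only pen-down time is counted; neither assumption is stated in the claims, and the names `Point` and `average_writing_speed` are hypothetical.

```python
import math
from typing import List, Tuple

# Assumption: each point is (x, y, t) with t in seconds.
Point = Tuple[float, float, float]


def average_writing_speed(strokes: List[List[Point]]) -> float:
    """Total pen-down path length divided by total pen-down time."""
    length = 0.0
    duration = 0.0
    for stroke in strokes:
        # Walk over consecutive point pairs within one stroke.
        for (x0, y0, t0), (x1, y1, t1) in zip(stroke, stroke[1:]):
            length += math.hypot(x1 - x0, y1 - y0)
            duration += t1 - t0
    return length / duration if duration > 0 else 0.0


strokes = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.5)],
    [(3.0, 4.0, 1.0), (3.0, 10.0, 1.5)],
]
speed = average_writing_speed(strokes)
written_stroke_count = len(strokes)
```

The same stroke list also yields the number of written strokes, which is the second feature amount named in claim 11.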
11. The character recognition method according to claim 10, wherein the feature extraction step extracts the average writing speed and the number of written strokes in the coordinate point sequence of the handwritten input characters as feature amounts for calculating the reliability of the determination target recognition candidate character group.
12. The character recognition method according to claim 10, wherein the feature extraction step further extracts the regular stroke counts of the top N characters (N ≥ 1) in the determination target recognition candidate character group as feature amounts for calculating the reliability of the determination target recognition candidate character group.
13. The character recognition method according to claim 10, wherein the feature extraction step further extracts the certainty factors of the top M characters (M ≥ 1) in the determination target recognition candidate character group as feature amounts for calculating the reliability of the determination target recognition candidate character group.
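The feature amounts named in claims 10 to 13 could be gathered into a single vector as in the Python sketch below. The ordering of the vector, the stroke-count dictionary, and the function name `build_feature_vector` are assumptions for illustration; the claims only list which quantities are extracted.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (character, certainty factor)


def build_feature_vector(
    avg_speed: float,
    written_strokes: int,
    candidates: List[Candidate],
    regular_strokes: Dict[str, int],  # hypothetical stroke-count dictionary
    n: int = 1,
    m: int = 1,
) -> List[float]:
    """Concatenate speed, written stroke count, the regular stroke counts
    of the top N candidates, and the certainty factors of the top M."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    features = [avg_speed, float(written_strokes)]
    # Assumption: characters missing from the dictionary get a count of 0.
    features += [float(regular_strokes.get(ch, 0)) for ch, _ in ranked[:n]]
    features += [certainty for _, certainty in ranked[:m]]
    return features


vector = build_feature_vector(
    avg_speed=12.5,
    written_strokes=4,
    candidates=[("本", 0.6), ("木", 0.8)],
    regular_strokes={"木": 4, "本": 5},
    n=2,
    m=1,
)
```

Comparing the written stroke count with the regular stroke counts of the top candidates gives the later reliability calculation a hint about cursive or abbreviated writing.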
14. The character recognition method according to claim 10, wherein the feature extraction step further extracts, as feature amounts for calculating the reliability of the determination target recognition candidate character group, the value of the connection probability between each recognition candidate character in the determination target recognition candidate character group and the immediately preceding recognition candidate character group obtained for the immediately preceding handwriting input, and/or the value of the connection probability between each such recognition candidate character and the immediately following recognition candidate character group obtained for the immediately following handwriting input.
15. The character recognition method according to claim 14, wherein the feature extraction step extracts, as feature amounts of the determination target recognition candidate character group, the value of the connection probability between each recognition candidate character in the determination target recognition candidate character group and the recognition candidate character with the highest certainty factor in the immediately preceding recognition candidate character group, and/or the value of the connection probability between each such recognition candidate character and the recognition candidate character with the highest certainty factor in the immediately following recognition candidate character group.
16. The character recognition method according to claim 14, wherein the feature extraction step takes, as the connection probability between one recognition candidate character in the determination target recognition candidate character group and the immediately preceding or immediately following recognition candidate character group, the highest of the connection probabilities between that one recognition candidate character and each recognition candidate character in the immediately preceding or immediately following recognition candidate character group.
17. The character recognition method according to claim 10, wherein the reliability calculation step includes a discrimination score calculating step of calculating, from the feature amounts, the certainty of one recognition candidate character in the determination target recognition candidate character group as a discrimination score, and calculates the reliability based on the discrimination score.
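The two stages in claim 17 (a discrimination score computed from the feature amounts, then a reliability derived from that score) might look like the Python sketch below. The claim does not specify either function: the linear score and the logistic mapping are stand-ins chosen for illustration, and the weights and bias are hypothetical values that would in practice be estimated from sample data.

```python
import math
from typing import Sequence


def discrimination_score(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Linear discriminant score (an assumed form, not from the patent)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def reliability_from_score(score: float) -> float:
    """Map a score to (0, 1) with a numerically stable logistic function."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


# Hypothetical weights; real values would be fitted on sample data.
score = discrimination_score([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
reliability = reliability_from_score(score)
```

Any monotonic mapping from score to reliability would fit the wording of the claim equally well; the logistic function is used here only because it keeps the result between 0 and 1.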
18. The character recognition method according to claim 10, wherein the post-processing control step restricts the recognition candidate characters to be subjected to post-processing, based on the reliability calculated in the reliability calculation step.
19. A program for causing a computer to execute each processing step of the character recognition method according to any one of claims 10 to 18.
20. A computer-readable recording medium on which is recorded a program for causing a computer to execute each processing step of the character recognition method according to any one of claims 10 to 18.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2002095511A JP3970075B2 (en) | 2002-03-29 | 2002-03-29 | Character recognition apparatus, character recognition method, execution program thereof, and recording medium recording the same |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2002095511A JP3970075B2 (en) | 2002-03-29 | 2002-03-29 | Character recognition apparatus, character recognition method, execution program thereof, and recording medium recording the same |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2003296661A (en) | 2003-10-17 |
JP3970075B2 (en) | 2007-09-05 |
Family ID: 29387239
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2002095511A Expired - Fee Related JP3970075B2 (en) | 2002-03-29 | 2002-03-29 | Character recognition apparatus, character recognition method, execution program thereof, and recording medium recording the same |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP3970075B2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010537321A (en) * | 2007-08-24 | 2010-12-02 | ロベルト・ボッシュ・ゲゼルシャフト・ミト・ベシュレンクテル・ハフツング | Method and system for optimal selection strategy for statistical classification |
JP2014502399A (en) * | 2010-12-10 | 2014-01-30 | 上海合合信息科技発展有限公司 | Handwriting input method by superimposed writing |
CN105528610A (en) * | 2014-09-30 | 2016-04-27 | 阿里巴巴集团控股有限公司 | Character recognition method and device |
JP2019159374A (en) * | 2018-03-07 | 2019-09-19 | 富士ゼロックス株式会社 | Information processing apparatus and program |
JP2021005159A (en) * | 2019-06-25 | 2021-01-14 | 富士ゼロックス株式会社 | Information processing apparatus and program |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010537321A (en) * | 2007-08-24 | 2010-12-02 | ロベルト・ボッシュ・ゲゼルシャフト・ミト・ベシュレンクテル・ハフツング | Method and system for optimal selection strategy for statistical classification |
JP2014502399A (en) * | 2010-12-10 | 2014-01-30 | 上海合合信息科技発展有限公司 | Handwriting input method by superimposed writing |
CN105528610A (en) * | 2014-09-30 | 2016-04-27 | 阿里巴巴集团控股有限公司 | Character recognition method and device |
CN105528610B (en) * | 2014-09-30 | 2019-05-07 | 阿里巴巴集团控股有限公司 | Character recognition method and device |
JP2019159374A (en) * | 2018-03-07 | 2019-09-19 | 富士ゼロックス株式会社 | Information processing apparatus and program |
JP2021005159A (en) * | 2019-06-25 | 2021-01-14 | 富士ゼロックス株式会社 | Information processing apparatus and program |
US11321955B2 (en) | 2019-06-25 | 2022-05-03 | Fujifilm Business Innovation Corp. | Information processing apparatus and non-transitory computer readable medium |
JP7338265B2 (en) | 2019-06-25 | 2023-09-05 | 富士フイルムビジネスイノベーション株式会社 | Information processing device and program |
Also Published As
Publication number | Publication date |
---|---|
JP3970075B2 (en) | 2007-09-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7756335B2 (en) | Handwriting recognition using a graph of segmentation candidates and dictionary search | |
US7596272B2 (en) | Handling of diacritic points | |
US5841901A (en) | Pattern recognition system | |
US5768417A (en) | Method and system for velocity-based handwriting recognition | |
US20120014601A1 (en) | Handwriting recognition method and device | |
US7903877B2 (en) | Radical-based HMM modeling for handwritten East Asian characters | |
CN113657098B (en) | Text error correction method, device, equipment and storage medium | |
Rashid et al. | Scanning neural network for text line recognition | |
WO1997022947A1 (en) | Method and system for lexical processing | |
US20220375244A1 (en) | Systems and methods for handwriting recognition | |
JP3216800B2 (en) | Handwritten character recognition method | |
CN108694167B (en) | Candidate word evaluation method, candidate word ordering method and device | |
JP3970075B2 (en) | Character recognition apparatus, character recognition method, execution program thereof, and recording medium recording the same | |
JP7095450B2 (en) | Information processing device, character recognition method, and character recognition program | |
CN108628826A (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108681533B (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
Bhattacharya et al. | Cleaning of online Bangla free-form handwritten text | |
CN108664466B (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108733646B (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
CN108647202B (en) | Candidate word evaluation method and device, computer equipment and storage medium | |
JP3374762B2 (en) | Character recognition method and apparatus | |
KR20070090188A (en) | Method and apparatus for recognizing handwriting pattern | |
JP4330296B2 (en) | Character recognition device, character recognition reliability determination method, execution program thereof, and recording medium storing the same | |
Hurst et al. | Error repair in human handwriting: an intelligent user interface for automatic online handwriting recognition | |
CN108681534A (en) | Candidate word evaluation method and device, computer equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20040520 |
RD01 | Notification of change of attorney | Free format text: JAPANESE INTERMEDIATE CODE: A7421; Effective date: 20051227 |
A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20061024 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20061221 |
A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20070206 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20070406 |
TRDD | Decision of grant or rejection written | |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20070508 |
A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20070605 |
FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20100615; Year of fee payment: 3 |
FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20110615; Year of fee payment: 4 |
LAPS | Cancellation because of no payment of annual fees | |