JP2001147988A

JP2001147988A - Method and device for recognizing character

Info

Publication number: JP2001147988A
Application number: JP33116399A
Authority: JP
Inventors: Jutaro Ishioka; 寿太郎石岡
Original assignee: Japan Digital Laboratory Co Ltd
Current assignee: Japan Digital Laboratory Co Ltd
Priority date: 1999-11-22
Filing date: 1999-11-22
Publication date: 2001-05-29

Abstract

PROBLEM TO BE SOLVED: To provide a character recognition method and device that improve the recognition rate when an image reader reads a form or the like on which ruled lines or frame lines are printed in a single non-dropout color. SOLUTION: Ruled lines and the like are removed from the read image and characters are cut out (S1 and S2); ruled line contact information is stored (S3); recognition candidate information is obtained by recognition processing (S4 and S5); the cut-out image is corrected so as to approach a shape predicted from the code Co1 and distance Di1 of the candidate characters obtained by the recognition processing and from registered shape information Fin, and this correction is repeated once for each predicted shape to obtain a group of corrected images Ci2 (S6); the reliability of these corrected images is evaluated to obtain a highly reliable corrected image (S7); and it is then determined whether or not the recognition result is to be output (S8).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to character recognition technology, and more particularly to technology for recognizing characters after ruled lines or frame lines have been removed from a read image obtained by reading, with an image reader, characters written on a form on which the ruled lines and frame lines are printed in a single non-dropout color.

[0002]

2. Description of the Related Art

An image reader such as an OCR device or a scanner reads the characters on a form or document, converts them into electrical signals, and outputs a character image. A form usually has ruled lines or frame lines printed on it, and characters are printed or written along those lines, so the ruled lines and frame lines are printed in a dropout color so that they do not interfere with reading. Consequently, when an image reader reads a form on which characters have been printed or written, the ruled lines and frame lines are not picked up, and only the characters, free of any ruled line or frame line image, are read.

[0003] Conventionally, however, in order to read such a form with an image reader and perform character recognition, guide marks in a non-dropout color indicating the positions of the ruled lines or frame lines had to be printed on the form so that the character entry positions could be determined.

[0004] With this method of printing non-dropout-color guide marks, the form must be printed in two colors: the dropout color for the ruled lines or frame lines and the non-dropout color for the guide marks. This raises the printing cost of the form and therefore the running cost.

[0005]

The rise in running cost caused by two-color forms can be avoided by using forms printed in a single non-dropout color. In that case, however, when a written character touches or overlaps a ruled line or frame line printed in the non-dropout color, the line and the character cannot be distinguished when the form is read by the image reader, which can result in misrecognition or a failure to read.

[0006] To prevent misrecognition or unreadable characters when a character touches a ruled line or frame line, the ruled lines or frame lines can be forcibly removed after the form has been read by the image reader. Simply removing the lines, however, also removes the part of the character that was in contact with them, which lowers the recognition rate of the resulting character image. Conventionally, therefore, the removed portion was estimated from the direction, length, and so on of the strokes on either side of the remaining character image, and the image was corrected accordingly.

[0007] This image correction method, however, cannot correct the image properly when the stroke direction differs from that of the true character image. For example, when the lower part of the character "7" touches a ruled line as in the example of FIG. 3, that information is lost together with the ruled line, and the stroke runs diagonally down to the left over a long distance, as in the example of FIG. 10(a). When the image is corrected by the conventional method, the result is an image whose stroke is extended diagonally to the left, as in FIG. 10(c) (recognized as "7"), so the correction differs from the correct character image (FIG. 10(b)).

[0008] SUMMARY OF THE INVENTION

The present invention has been made to solve the above problems. Its object is to provide a character recognition method and a character recognition device that improve the recognition rate when an image reader reads a form or the like on which ruled lines or frame lines (hereinafter, "ruled lines and the like") are printed in a single non-dropout color.

[0009]

To solve the above problems, the character recognition method of the first invention removes images of ruled lines and the like from a read image of a read document, cuts out character images one character at a time, and performs character recognition. The method is characterized by: acquiring, when a character image is cut out, contact information between the cut-out character image and the ruled lines and the like; performing recognition processing on the cut-out character; obtaining, on the basis of the recognition result of that processing and the contact information, a group of corrected images in which the cut-out character image is corrected so as to approach shapes predicted from pre-registered predicted shape information; selecting a recognition candidate image from the group of corrected images; and evaluating whether or not to output the recognized character corresponding to the recognition candidate image.

[0010] The character recognition method of the second invention likewise removes images of ruled lines and the like from a read image of a read document, cuts out character images one character at a time, and performs character recognition. The method is characterized by: acquiring, when a character image is cut out, contact information between the cut-out character image and the ruled lines and the like; performing recognition processing on the cut-out character; obtaining, on the basis of the recognition result of that processing and the contact information, a group of corrected images in which the cut-out character image is corrected so as to approach shapes predicted from pre-registered predicted shape information; checking whether each corrected image in the group satisfies a predetermined condition; extracting a recognition candidate image from the corrected images that satisfy the predetermined condition; and evaluating whether or not to output the recognized character corresponding to the recognition candidate image.

[0011] In the third invention, in the character recognition method of the second invention, when none of the corrected images in the group satisfies the predetermined condition, the following operation is repeated: on the basis of the recognition result of the character image and the contact information, a further group of corrected images is obtained by correcting the cut-out character image, in a predetermined correction priority order, so as to approach shapes predicted from the pre-registered predicted shape information; it is checked whether each corrected image in this group satisfies the predetermined condition; and a recognition candidate image is extracted from the corrected images that satisfy it.
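The repetition described for the third invention can be pictured as a loop over correction rounds. The following Python fragment is only an illustrative sketch, not part of the disclosure: the name `recognize_with_retries`, the representation of each round as a list, and the `satisfies` predicate are all assumptions made for the example.

```python
def recognize_with_retries(correction_rounds, satisfies):
    """Return the corrected images of the first round that has any acceptable image.

    correction_rounds: groups of corrected images, ordered by correction
        priority (hypothetical representation; any objects will do).
    satisfies: predicate standing in for the "predetermined condition".
    """
    for group in correction_rounds:
        accepted = [image for image in group if satisfies(image)]
        if accepted:
            # At least one corrected image passed, so no further round is needed.
            return accepted
    # Every round was exhausted without an acceptable corrected image.
    return []
```

Passing a generator as `correction_rounds` would let later, lower-priority corrections be computed only when the earlier ones fail.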

[0012] The character recognition device of the fourth invention comprises: ruled line removing means for removing images of ruled lines and the like from a read image of a read document; cutout means for cutting out character images one character at a time from the character image from which the ruled lines and the like have been removed by the ruled line removing means; ruled line contact information acquiring means for acquiring, from the character image from which the ruled lines and the like have been removed by the ruled line removing means, contact information on the portions that had been in contact with the ruled lines and the like; recognition means for performing recognition processing on the character image cut out by the cutout means; image correction means for obtaining, on the basis of the recognition result obtained by the recognition means and the contact information, a group of corrected images in which the cut-out character image is corrected so as to approach shapes predicted from pre-registered predicted shape information; corrected image selection means for selecting a recognition candidate image from the group of corrected images obtained by the image correction means; and recognized character output evaluation means for evaluating whether or not to output the recognized character corresponding to the recognition candidate image obtained by the corrected image selection means.

[0013] The character recognition device of the fifth invention comprises: ruled line removing means for removing images of ruled lines and the like from a read image of a read document; cutout means for cutting out character images one character at a time from the character image from which the ruled lines and the like have been removed by the ruled line removing means; ruled line contact information acquiring means for acquiring, from the character image from which the ruled lines and the like have been removed by the ruled line removing means, contact information on the portions that had been in contact with the ruled lines and the like; recognition means for performing recognition processing on the character image cut out by the cutout means; image correction means for obtaining, on the basis of the recognition result obtained by the recognition means and the contact information, a group of corrected images in which the cut-out character image is corrected so as to approach shapes predicted from pre-registered predicted shape information; corrected image determination means for determining whether the character images corrected by the image correction means satisfy a predetermined condition and extracting a recognition candidate image from the corrected images that satisfy it; and recognized character output evaluation means for evaluating whether or not to output the recognized character corresponding to the recognition candidate image obtained by the corrected image determination means.

[0014] In the sixth invention, in the character recognition device of the fifth invention, the image correction means includes means that, when the corrected image determination means finds that none of the corrected images satisfies the predetermined condition, further corrects the cut-out character image, on the basis of the recognition result of the character image and the contact information and in a predetermined correction priority order, so as to approach shapes predicted from the pre-registered predicted shape information.

[0015]

FIG. 1 is a block diagram showing the configuration of one embodiment of the character recognition device of the present invention. The character recognition device 100 comprises a ruled line removing unit 10, a character cutout unit 20, a ruled line contact information storage unit 30, and a character recognition block 40. Although not shown, the character recognition device 100 also includes a control unit, made up of a CPU and its peripheral circuits, that controls the operation of each of these components and of the device as a whole.

[0016] The ruled line removing unit 10 detects ruled lines and the like (ruled lines or frame lines) in the read image Im1 (FIG. 3) read in from a scanner or the like, and obtains a character image Im2 (FIG. 4) from which the images of the ruled lines and the like have been removed. It also acquires ruled line contact information If1 for the image portions that were in contact with the ruled lines and the like (for example, the direction in which a character makes contact, the number of contact points, the positions (coordinates) of the contact portions, and the thickness of the ruled lines and the like).
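As a rough illustration of what the ruled line removing unit 10 produces, the Python sketch below erases one vertical ruled line from a binary image and records the character pixels that abutted it. It is a simplification under stated assumptions, not the disclosed implementation: the image is a list of rows of 0/1 values, the line's column range is assumed to have been detected already, and the names `Contact` and `remove_vertical_rule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    row: int   # y coordinate of the character pixel that touched the line
    col: int   # x coordinate of that pixel
    side: str  # side of the ruled line on which the pixel lies


def remove_vertical_rule(im1, first_col, last_col):
    """Erase columns first_col..last_col and collect contact information.

    Returns (im2, if1): the image without the ruled line, and the list of
    contacts. The line thickness is last_col - first_col + 1.
    """
    width = len(im1[0])
    im2 = [row[:] for row in im1]          # copy so that im1 is left untouched
    if1 = []
    for y, row in enumerate(im1):
        for x in range(first_col, last_col + 1):
            im2[y][x] = 0                  # forcibly remove the ruled line
        if first_col > 0 and row[first_col - 1]:
            if1.append(Contact(y, first_col - 1, "left"))
        if last_col + 1 < width and row[last_col + 1]:
            if1.append(Contact(y, last_col + 1, "right"))
    return im2, if1
```

The sketch treats every inked pixel adjacent to the line as a contact; a real detector would also have to handle horizontal lines, frame corners, and skew.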

[0017] The character cutout unit 20 cuts out character images one character at a time from the image Im2 from which the ruled lines and the like have been removed by the ruled line removing unit 10, obtains cut-out character images Ci1, and converts the ruled line contact information belonging to each cut-out character image into per-character ruled line contact information If2.

[0018] The ruled line contact information storage unit 30 consists of a temporary storage memory such as a RAM and stores the ruled line contact information If2 obtained by the character cutout unit 20.

[0019] The character recognition block 40 comprises a feature extraction unit 41, a dictionary unit 42, an identification unit 43, a predicted shape information unit 44, an image correction processing unit 45, a corrected image selection unit 46, and a rejection determination unit 47, and performs character recognition after feature extraction, image correction processing, and so on.

[0020] Specifically, in the character recognition block 40, the feature extraction unit 41 calculates a feature quantity Fd1 from the character image Ci1 cut out by the character cutout unit 20.

[0021] The dictionary unit 42 has a template structure made up of, for example, standard feature quantities Fdd and character codes Co for digits, letters of the alphabet, and so on. A plurality of templates for each character type is stored in advance in a storage memory such as a ROM.

[0022] The identification unit 43 calculates the distance between the feature quantity Fd1 calculated by the feature extraction unit 41 and the feature quantity Fdd held by each template in the dictionary unit 42, and acquires recognition candidate information, such as the character codes Co1 and their distances Di1, for the candidates from the smallest distance (that is, the closest features) up to a prescribed rank.

[0023] The predicted shape information unit 44 stores, in a storage memory such as a ROM, information Fin on all the shapes that can arise depending on the state of contact with the ruled lines and the like.

[0024] For the character image Ci1 cut out by the character cutout unit 20, the image correction processing unit 45 corrects the image so that it approaches a shape predicted from the shape information Fin stored in the predicted shape information unit 44, on the basis of the code Co1 and distance Di1 of the corresponding candidate characters obtained by the identification unit 43 and the ruled line contact information If2 stored in the ruled line contact information storage unit 30. It repeats this correction once for each predicted shape and thereby obtains a group of corrected images Ci2.

[0025] The corrected image selection unit 46 selects a reliable corrected character image Ci3 from the whole group of corrected images Ci2 produced by the image correction processing unit 45.

[0026] The rejection determination unit 47 determines how reliable the corrected character image Ci3 selected by the corrected image selection unit 46 is as a character. If the reliability is higher than a predetermined value, it outputs the corresponding character code; otherwise it outputs a reject code.

[0027] FIG. 2 is a flowchart showing an example of the character recognition operation of the character recognition device 100 of FIG. 1; the operation sequence of the steps is controlled by the control unit. FIG. 3 shows an example of a read character image. FIG. 4 shows an example of the character image after the ruled lines and the like have been removed, and FIG. 5 shows a corrected image and recognition result, taking as an example the character "9" of FIG. 4 after removal of the ruled lines and the like.

[0028] Step S1: (Removal of ruled lines and the like)

In FIG. 2, the ruled line removing unit 10 takes the read image Im1 of a non-dropout-color form or document (in the example of FIG. 3, the characters "2" and "9" have been written on it) that has been loaded into a temporary storage memory such as a DRAM, removes the ruled lines and the like (in the example of FIG. 3, the ruled lines denoted 31 and 32 and the frame line denoted 33) to obtain the character image Im2 (FIG. 4), and stores Im2 in a temporary storage memory such as a DRAM.

[0029] Step S2: (Cutting out characters)

Next, the character cutout unit 20 cuts out character images one character at a time from the image Im2 from which the ruled lines and the like were removed in step S1, and obtains the cut-out character images Ci1 (in the example of FIG. 4, the cut-out character images denoted 41 and 42).

[0030] Step S3: (Acquisition and storage of ruled line contact information)

The character cutout unit 20 also converts the ruled line contact information belonging to each character image cut out in step S2 into per-character ruled line contact information If2 (for example, the direction in which the character makes contact, the number of contact points, the positions (coordinates) of the contact portions, and the thickness of the ruled line or the like at the contact portions) and stores it in the ruled line contact information storage unit 30. In the example of FIG. 4, two sets of ruled line contact information are obtained, one for each of the cut-out character images 41 and 42. For a character that does not touch any ruled line or the like, ruled line contact information meaning "no ruled line contact" (for example, number of contact points = 0) is stored.

Take the character image denoted 42, one of the two cut-out images shown in FIG. 4, as an example. In FIG. 3 the character "9" touches the ruled line 32 on its right at three places, 51, 52, and 53, so removing the ruled line in step S1 produces the cut-out image denoted 42 (that is, step S2 cuts out a one-character image that is left open at one place, as shown in FIG. 5). In this example, the character cutout unit 20 stores in the ruled line contact information storage unit 30, as the ruled line contact information of the character image, the positions at which the character image 42 touched the ruled line 32 (that is, the positions 51 and 52 at the two ends of the contact portion between the ruled line 32 and the character "9", and the position 53 at which the character "9" touched the lower part of the ruled line 32). In this embodiment the positions 51, 52, and 53 are expressed as coordinate values (X, Y), but the invention is not limited to this.
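The conversion from page-level contact information to per-character information If2 in step S3 can be sketched as follows. This is a minimal illustration under assumptions: contacts are (X, Y) points, each cut-out character is represented by an inclusive bounding box, and the names `CharContactInfo` and `to_per_character` are hypothetical rather than taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class CharContactInfo:
    """Per-character ruled line contact information (If2 in the text)."""
    positions: list = field(default_factory=list)  # (x, y) contact coordinates
    line_thickness: int = 0                        # thickness of the touched line

    @property
    def count(self):
        # count == 0 encodes "no ruled line contact"
        return len(self.positions)


def to_per_character(contacts, boxes, line_thickness):
    """Assign page-level contact points to character bounding boxes.

    contacts: list of (x, y) points.
    boxes: list of (x0, y0, x1, y1) boxes, inclusive, one per cut-out character.
    """
    result = []
    for x0, y0, x1, y1 in boxes:
        inside = [(x, y) for x, y in contacts if x0 <= x <= x1 and y0 <= y <= y1]
        result.append(CharContactInfo(inside, line_thickness if inside else 0))
    return result
```

With three contacts on the right edge of the second box, the first character gets an empty record and the second gets all three positions, mirroring the "2" and "9" of the example.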

[0031] Step S4: (Feature extraction from the cut-out character image)

The feature extraction unit 41 performs feature extraction on one character image Ci1 cut out in step S2 (for example, the cut-out character image of FIG. 5) and calculates the feature quantity Fd1.

[0032] Step S5: (Acquisition of recognition candidate information for the cut-out character image)

Next, the identification unit 43 calculates the distance between the feature quantity Fd1 of the character image Ci1 calculated in step S4 and the standard feature quantity Fdd stored in each template of the dictionary unit 42, and acquires, as recognition candidate information, the recognition candidate character codes Co1 and distance calculation results Di1 of the top three candidates in order of increasing distance (that is, closest features first). For example, when the identification processing of step S5 is applied to the cut-out character image of FIG. 5, the candidate character codes Co1 up to third place might be obtained such that the first and second candidates are character codes denoting "3" and the third candidate is "9", each with its corresponding distance calculation result. In this embodiment three recognition character codes Co1 (first to third place) are acquired, but the number is not limited to this.
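The candidate ranking of step S5 amounts to a nearest-template search. The sketch below is illustrative only: the text does not state which distance measure is used, so Euclidean distance is an assumption, as are the function name `top_candidates` and the representation of the dictionary as a list of (code, feature vector) pairs.

```python
import heapq
import math


def top_candidates(fd1, templates, k=3):
    """Return the k closest templates as (distance, code) pairs, nearest first.

    fd1: feature vector of the cut-out character image.
    templates: list of (code, fdd) pairs; a code may appear several times
        because several templates are registered per character type.
    """
    # Euclidean distance is an assumption; the source only says "distance".
    scored = [(math.dist(fd1, fdd), code) for code, fdd in templates]
    return heapq.nsmallest(k, scored)
```

Because several templates share one code, the same character can occupy more than one rank, as in the example where the first and second candidates are both "3".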

[0033] Step S6: (Acquisition of the group of corrected images)

The image correction processing unit 45 first retrieves from the predicted shape information unit 44, using the shape information Fin stored there, the shape information that satisfies the conditions acquired so far, namely the ruled line contact information If2 of the cut-out character image (for example, character image 42), the recognition candidate character codes Co1, and the distance calculation results Di1. Next, for the cut-out character image in question, it corrects the image so that it approaches a shape predicted from the shape information Fin, on the basis of the code Co1 and distance Di1 of the candidate characters obtained by the identification unit 43 and the ruled line contact information If2 stored in the ruled line contact information storage unit 30, and repeats this correction once for each predicted shape to obtain the group of corrected images Ci2. For the cut-out character image of FIG. 5, for example, the predicted shape information unit 44 yields a corrected image for the case where the character may be "3" (FIG. 6(a)), one for the case where it may be "8" (FIG. 6(b)), and one for the case where it may be "9" (FIG. 6(c)).
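To make the idea of "one corrected image per predicted shape" concrete, the following Python sketch refills the removed ruled line column between selected contact points. Everything about the data layout is an assumption for illustration: the shape table `fin` is keyed by the number of contact points, each predicted shape is a list of contact index pairs to be joined, and the function name `corrected_group` is hypothetical. The sketch uses only the contact information and does not model how Co1 and Di1 narrow the set of shapes, because the text does not specify that in detail.

```python
def corrected_group(ci1, contact_rows, line_col, fin):
    """Build one corrected image per predicted shape.

    ci1: cut-out binary image as a list of rows of 0/1 values.
    contact_rows: y coordinates at which the character touched the line.
    line_col: column from which the ruled line was removed.
    fin: hypothetical shape table {contact count: [(code, joins), ...]},
        where joins is a list of (i, j) index pairs into contact_rows.
    """
    group = []
    for code, joins in fin.get(len(contact_rows), []):
        ci2 = [row[:] for row in ci1]              # never modify the cut-out image
        for i, j in joins:
            lo, hi = sorted((contact_rows[i], contact_rows[j]))
            for y in range(lo, hi + 1):
                ci2[y][line_col] = 1               # restore the erased stroke
        group.append((code, ci2))
    return group
```

A table entry such as `{3: [("3", []), ("8", [(0, 1), (1, 2)]), ("9", [(0, 1)])]}` would yield three corrected images for a character with three contact points, in the spirit of FIG. 6(a) to 6(c); the particular joins are invented for the example.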

[0034] Step S7: (Corrected image selection processing)

The corrected image selection unit 46 selects, by a predetermined selection method, the corrected image with the most reliable shape from the group of corrected images Ci2 obtained by the image correction processing unit 45 in step S6 (the three corrected images Ci2 described above, in the example of the character image of FIG. 5), and takes it as the corrected image Ci3. In this embodiment the selection is made by running identification processing on each corrected image Ci2. That is, the feature quantity of each corrected image Ci2 is calculated, its distance to the standard feature quantity Fdd stored in each template of the dictionary unit 42 is computed, and, on the basis of the results, the corrected image Ci3 whose distance is smallest and whose distance from the other categories is greatest is selected from the corrected images Ci2. In the example of FIG. 5, the corrected image of FIG. 6(c), which was corrected toward the shape of "9", is selected from the three corrected images because it has the smallest distance and is farthest from the other categories (that is, it is the most reliable). The selection method is not limited to this one.
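One possible reading of the selection rule in step S7 is sketched below. The text gives two criteria (smallest distance, greatest separation from other categories) without saying how they are combined, so the rule used here, maximizing the margin between the nearest and second-nearest category and breaking ties by the smaller distance, is an assumption. Euclidean distance, precomputed feature vectors, and the names `nearest_by_category` and `select_corrected` are likewise hypothetical.

```python
import math


def nearest_by_category(features, templates):
    """Smallest distance from features to the templates of each category."""
    best = {}
    for code, fdd in templates:
        d = math.dist(features, fdd)
        if code not in best or d < best[code]:
            best[code] = d
    return best


def select_corrected(candidates, templates):
    """Pick the most reliable corrected image.

    candidates: list of (label, features), one per corrected image Ci2.
    templates: list of (code, fdd); at least two categories are required.
    Returns (label, code, distance, margin) for the selected image.
    """
    results = []
    for label, features in candidates:
        ranked = sorted(nearest_by_category(features, templates).items(),
                        key=lambda item: item[1])
        (code, d1), (_, d2) = ranked[0], ranked[1]
        results.append((label, code, d1, d2 - d1))
    # Assumed rule: largest margin first, then smallest distance.
    return max(results, key=lambda r: (r[3], -r[2]))
```

Other combinations, such as a ratio of the two distances, would fit the description equally well.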

[0035] Step S8: (Determination of reliability as a character)

Next, the rejection determination unit 47 calculates the distance between the corrected image Ci3 selected by the corrected image selection unit 46 in step S7 and the standard feature quantity Fdd stored in each template of the dictionary unit 42, and compares that distance r with a predetermined threshold δ. If r - δ < 0 (that is, distance < threshold), the image is judged to be absolutely reliable and the process proceeds to S9; otherwise it proceeds to S10.

[0036] Step S9 (output of the recognized character code): When the distance to the feature amount Fdd is smaller than the threshold, the control unit outputs, as the recognized character code, the character code obtained from the dictionary unit 42 in step S7 for the selected corrected image, and ends recognition processing for that character.

[0037] Step S10 (output of the reject code): When the distance to the feature amount Fdd is not smaller than the threshold, the control unit outputs a reject code and ends recognition processing for that character.
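The branch between S9 and S10 follows directly from the comparison in S8 and can be written as a minimal sketch. The function name `decide_output` and the value of `REJECT_CODE` are hypothetical; the text does not say how the reject code is encoded.

```python
REJECT_CODE = "REJECT"  # hypothetical placeholder for the reject code

def decide_output(r, delta, char_code):
    """S8: accept when r - delta < 0, otherwise reject."""
    if r - delta < 0:
        return char_code   # S9: output the recognized character code
    return REJECT_CODE     # S10: output the reject code

print(decide_output(0.5, 1.0, "9"))  # 9
print(decide_output(1.0, 1.0, "9"))  # REJECT
```

Note that a distance exactly equal to the threshold is rejected, because the acceptance condition is the strict inequality r − δ < 0.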

[0038] With this configuration, even if the removal of ruled lines and the like (S1) destroys true image information as in FIG. 5, a corrected image such as that of FIG. 6(c) can be obtained and a highly reliable recognition result can be output.

[0039] FIG. 7 is a block diagram showing the configuration of another embodiment of the character recognition device of the present invention. In this example, the reliability of each acquired corrected image is checked in a predetermined priority order (a priority based on the order in which corrections are made). If the reliability of a corrected image is low, the corrected image of the next rank is checked, and this is repeated so that a corrected image with high reliability as a character image can be obtained.

[0040] In FIG. 7, the character recognition device 100' comprises a ruled line removal unit 10, a character cutout unit 20, a ruled line contact information storage unit 30, and a character recognition block 40'. As with the character recognition device 100 of FIG. 1, the device 100' also has a control unit (not shown), consisting of a CPU and its peripheral circuits, which controls the operation of each of these components and of the device as a whole.

[0041] The configuration, functions, and operation of the ruled line removal unit 10, the character cutout unit 20, and the ruled line contact information storage unit 30 are the same as in the character recognition device 100 of FIG. 1.

[0042] The character recognition block 40' comprises a feature extraction unit 41, a dictionary unit 42, an identification unit 43, a predicted shape information unit 44, an image correction processing unit 45', a corrected image determination unit 46', and a rejection determination unit 47. It performs character recognition after feature extraction, image correction, and related processing.

[0043] Within the character recognition block 40', the configuration, functions, and operation of the feature extraction unit 41, dictionary unit 42, identification unit 43, predicted shape information unit 44, and rejection determination unit 47 are the same as in the character recognition unit 40 of the character recognition device 100 of FIG. 1. The image correction processing unit 45' uses the character codes Co1 and distances Di1 of the candidate characters obtained by the identification unit 43, together with the ruled line contact information fi stored in the ruled line contact information storage unit 30, to correct the cut-out character image Ci1 toward a predicted shape according to a predetermined priority order. It repeats this for every predicted shape to obtain a group of corrected images Ci2. If, after the character image Ci1 cut out by the character cutout unit 20 has been corrected several times in priority order, no further correction can be made, a reject code is output. The priority order here is the order in which corrections are applied to the cut-out character image. For example, when correcting an image that touches the upper, middle, and lower ruled lines, it determines which contact is corrected first (for example, top → middle → bottom, or bottom → middle → top). It forms part of the prediction information.

[0044] The corrected image determination unit 46' judges the reliability, as an image, of the corrected image Ci2 produced by the image correction processing unit 45'. If the image is judged reliable, it is output to the rejection determination unit 47 as character image Ci3; otherwise the corrected image Ci2 is passed back to the image correction processing unit 45' as the cut-out character so that image correction can be performed again. FIG. 8 is a flowchart showing an example of the character recognition operation of the character recognition device 100' of FIG. 7. It provides a corrected image creation step (S6') that corrects the cut-out character image in priority order and a step (S7') that judges the reliability of the corrected image, so that when the reliability of a created corrected image is low, correction is repeated to obtain a new character image. The operation sequence of each step is controlled by the control unit. In FIG. 8, steps S1 to S5 (from removal of ruled lines to identification of the cut-out character) and S8 to S10 (from rejection determination to code output) are the same as in the character recognition operation of FIG. 2. The operations of S6' and S7' are described below.

[0045] Step S6' (acquiring the corrected image group): In FIG. 8, the image correction processing unit 45' takes the conditions acquired so far for the character image Ci1 cut out by the character cutout unit 20 (the ruled line contact information If2 of the cut-out character image, for example character image 42, the recognition candidate characters Co1, and the distance calculation results Di1) and obtains from the predicted shape information unit 44, in a predetermined priority order, the shape information that satisfies those conditions. It then corrects the cut-out character image Ci1 toward the predicted shape in that priority order, repeats this for every predicted shape to obtain a group of corrected images Ci2, and proceeds to S7'. If no corrected image Ci2 can be obtained, processing proceeds to S10 (reject code output).

[0046] Step S7' (judging the reliability of the corrected images): The corrected image determination unit 46' judges the reliability, as an image, of each corrected image in the group Ci2 produced by the image correction processing unit 45'. Of the corrected images judged reliable, the most reliable one is output to the rejection determination unit 47 as the recognition candidate character image Ci3, and processing proceeds to S8. If none is judged reliable, the corrected image Ci2 is passed back to the image correction processing unit 45' as the cut-out character so that image correction can be performed again, and processing returns to S6'. As the method of judging the reliability of the group Ci2, this embodiment calculates the feature amount of each corrected image Ci2 and its distance to the standard feature amount Fdd stored in each template of the dictionary unit 42. Based on the result, it extracts the corrected images whose distance r' is smaller than a threshold δ' (δ' > δ), that is, r' − δ' < 0, and which are far from the other categories. From these it extracts, as the recognition candidate character image Ci3, the corrected image whose upper candidate characters have the largest number of matching character codes Co1. The reliability determination method is not limited to this one. A specific example of character recognition by the character recognition device 100' of FIG. 7 is described below with reference to the flowchart of FIG. 8 (and the flowchart of FIG. 2 for S1 to S5 and S8 to S10).
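The reliability judgment of step S7' can be illustrated with a short sketch. It is an interpretation under assumptions rather than the embodiment itself: each corrected image is assumed to carry its nearest-template distance `r` and a non-empty list `top_codes` of its upper candidate character codes, the "number of matches" is read as how often the most frequent code appears among those candidates, and the "far from other categories" condition is omitted for brevity. All field names and values are hypothetical.

```python
from collections import Counter

def judge_corrected_images(images, delta_prime):
    """Return the most reliable corrected image, or None if none is reliable.

    images: list of dicts with keys "label", "r" (distance r' to the nearest
            template), and "top_codes" (upper candidate character codes).
    """
    # Keep only images that pass the looser threshold: r' - delta' < 0.
    reliable = [img for img in images if img["r"] - delta_prime < 0]
    if not reliable:
        return None  # caller goes back to S6' for another correction

    def agreement(img):
        # How many of the upper candidates share the most frequent code.
        return Counter(img["top_codes"]).most_common(1)[0][1]

    # Prefer the highest agreement; break ties with the smaller distance.
    return max(reliable, key=lambda img: (agreement(img), -img["r"]))

images = [
    {"label": "a", "r": 2.0, "top_codes": ["7", "7", "9"]},
    {"label": "b", "r": 0.5, "top_codes": ["2", "2", "3"]},
    {"label": "c", "r": 0.4, "top_codes": ["2", "3", "8"]},
]
chosen = judge_corrected_images(images, delta_prime=1.0)
print(chosen["label"])  # b
```

Image `a` fails the threshold test, and `b` beats `c` because two of its upper candidates agree, even though `c` is slightly closer to its nearest template.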

[0047] Let the read image Im1 be the image shown in FIG. 3, and let the image Im2, from which the ruled line removal unit 10 has removed the ruled lines and the like, be the image shown in FIG. 4 (S1). The character cutout unit 20 cuts out character images Ci1 for two characters (S2). The following takes as its example image 41 (the character "2", FIG. 9(a)), one of the two cut-out character images, which touched the frame line 32 at one place on its lower side and has lost the information at its bottom.

[0048] For the cut-out image 41 shown in FIG. 9(a), cut out by the character cutout unit 20, the position of the part (81) that touched the lower side of the frame line 32, the number of contact points (one), and the thickness of the ruled line are stored in the ruled line contact information storage unit 30 as ruled line contact information If2 (S3).
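The three items stored as ruled line contact information If2 (contact position, number of contact points, line thickness) could be held in a small record like the one below. This is only a sketch: the class name, field names, pixel units, and the sample values are all assumptions, since the text does not describe the storage format.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class ContactInfo:
    """Ruled line contact information If2 for one cut-out character image."""
    # Each entry: (side of the frame touched, start x, end x), in pixels.
    positions: List[Tuple[str, int, int]]
    # Thickness of the removed ruled line, in pixels.
    line_thickness: int

    @property
    def contact_count(self) -> int:
        """Number of places where the character touched a ruled line."""
        return len(self.positions)

# Hypothetical values for a character touching the lower frame line once.
info = ContactInfo(positions=[("bottom", 12, 20)], line_thickness=3)
print(info.contact_count)  # 1
```

Deriving the count from the list of positions keeps the two values consistent, which is a design choice of this sketch rather than something the text requires.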

[0049] Next, the feature extraction unit 41 performs feature extraction on the cut-out image 41 to calculate the feature amount Fd1 (S4). The identification unit 43 calculates the distance between Fd1 and the standard feature amount Fdd stored in each template of the dictionary unit 42, and acquires, as recognition candidate information, the recognition candidate character codes Co1 and distance calculation results Di1 of the top three candidates in ascending order of distance (that is, in order of feature similarity) (S5). For the purposes of this explanation, assume that among the top three recognition candidate character codes Co1, the first- and second-ranked codes both indicate "7" and the third-ranked code indicates "9".
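The candidate ranking of step S5 amounts to a nearest-template search. The sketch below assumes Euclidean distance and a dictionary that may hold more than one template per character, which is one way the first and second candidates could both be "7"; neither assumption is stated in the text, and the feature vectors are invented for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_candidates(feature, templates, k=3):
    """Return the k closest templates as (character code Co1, distance Di1).

    templates: list of (character code, standard feature vector Fdd) pairs.
    """
    scored = sorted((distance(feature, fdd), code) for code, fdd in templates)
    return [(code, dist) for dist, code in scored[:k]]

# Hypothetical dictionary with two templates for "7".
templates = [
    ("7", [0.0, 0.0]),
    ("7", [1.0, 0.0]),
    ("9", [3.0, 0.0]),
    ("2", [9.0, 9.0]),
]
candidates = top_candidates([0.2, 0.0], templates)
print([code for code, _ in candidates])  # ['7', '7', '9']
```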

[0050] Next, the image correction processing unit 45' obtains from the predicted shape information unit 44 the predicted shape information that satisfies the recognition candidate information obtained in step S5 (recognition candidate character codes Co1 and distance calculation results Di1) and the ruled line contact information If2 of this cut-out character image (FIG. 9(a)) stored in the ruled line contact information storage unit 30. It corrects the cut-out character image toward the predicted shape in the predetermined priority order and repeats this for every predicted shape to obtain a group of corrected images (S6'). For the cut-out image 41, the predicted shape information indicates in the first cycle of S6' and S7' that the character may be "7", and the corrected image Ci2 of FIG. 9(b) is obtained.

[0051] Next, the corrected image determination unit 46' judges the reliability of the corrected image "7" obtained in step S6' (S7'). If the distance between the "7" character image of FIG. 9(b) and the standard feature Fdd held in the dictionary unit 42 is large (r' − δ' > 0), the corrected image determination unit 46' judges the reliability as an image to be low and returns to S6' so that a new corrected image can be judged.

[0052] The image correction processing unit 45' then corrects the cut-out image 41 obtained in step S2 toward the predicted shape of the next priority, repeats this for every predicted shape, and obtains a new corrected image (the corrected image of "2" shown in FIG. 9(c)) (S6').

[0053] In S7', the corrected image determination unit 46' again judges reliability, this time of the corrected image "2" obtained in step S6'. If the distance between the "2" character image of FIG. 9(c) and the standard feature Fdd held in the dictionary unit 42 is small, then, because the character code Co1 of the "2" character image matches as both the first and second candidates (as described above), giving it the largest number of matches among the upper candidates, the corrected image determination unit 46' judges the reliability as an image to be high and proceeds to S8 for rejection determination.

[0054] In step S8 (FIG. 2), the rejection determination unit 47 calculates the distance between the corrected image Ci3 obtained in step S7' and the standard feature amount Fdd stored in each template of the dictionary unit 42, and compares that distance r with the predetermined threshold δ. In this example, if r − δ < 0 for the recognition candidate "2" (corrected image Ci3), that is, distance < threshold, processing proceeds to S9 and the character code of "2" is output; otherwise processing proceeds to S10 and a reject code is output.
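The cycle through S6', S7', and S8 in this worked example can be summarized in a simplified sketch. Several simplifications are assumed and are not in the text: each cycle yields a single corrected image rather than a group, the same dictionary distance is reused for both the S7' test (threshold δ') and the S8 test (threshold δ), and the predicted shape label stands in for the character code. The function and parameter names are hypothetical.

```python
REJECT_CODE = "REJECT"  # hypothetical placeholder for the reject code

def recognize(shapes_by_priority, correct, distance_to_dictionary,
              delta_prime, delta):
    """Try predicted shapes in priority order until one is reliable."""
    for shape in shapes_by_priority:
        corrected = correct(shape)             # S6': correct toward this shape
        r = distance_to_dictionary(corrected)  # distance to the template Fdd
        if r - delta_prime < 0:                # S7': reliable as an image?
            # S8: stricter test decides between S9 (output) and S10 (reject).
            return shape if r - delta < 0 else REJECT_CODE
        # Otherwise fall through and try the shape of the next priority.
    return REJECT_CODE                         # nothing left to correct: S10

# Hypothetical distances: the "7" correction is poor, the "2" correction fits.
fake_distance = {"7": 5.0, "2": 0.3}
result = recognize(["7", "2"],
                   correct=lambda shape: shape,
                   distance_to_dictionary=fake_distance.__getitem__,
                   delta_prime=2.0, delta=1.0)
print(result)  # 2
```

As in the example of FIG. 9, the first cycle ("7") fails the reliability test and the second cycle ("2") passes both tests, so the code for "2" is output.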

[0055] With this configuration, even if ruled line removal causes a character to lose true image information as shown in FIG. 9(a), a corrected image that reproduces the lost character image can be obtained as in FIG. 10(c), so a highly reliable recognition result can be obtained.

[0056] Several embodiments of the present invention have been described above, but the invention is not limited to these embodiments, and various modifications are of course possible.

【００５７】[0057]

[Effects of the Invention] As described above, according to the character recognition methods of the first to sixth inventions and the character recognition device of the fourth invention, information about the parts that were in contact with a ruled line is retained when the ruled line is removed and is then used to correct the character image. The character image can therefore be corrected regardless of stroke direction, and correction is possible even when a character that touched a ruled line has been split into several parts (blocks). As a result, character recognition with a high recognition rate can be achieved even on forms whose ruled lines and the like are printed in a single non-dropout color.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the character recognition device of the present invention.

FIG. 2 is a flowchart showing an example of the character recognition operation of the character recognition device of FIG. 1.

FIG. 3 is a diagram showing an example of a read character image.

FIG. 4 is a diagram showing an example of a character image after removal of ruled lines and the like.

FIG. 5 is a diagram showing an example of a character image after one character has been cut out.

FIG. 6 is a diagram showing image correction and recognition results, using a cut-out character as an example.

FIG. 7 is a block diagram showing the configuration of an embodiment of the character recognition device of the present invention.

FIG. 8 is a flowchart showing an example of the character recognition operation of the character recognition device of FIG. 7.

FIG. 9 is a diagram showing image correction and recognition results, using a cut-out character as an example.

FIG. 10 is a diagram comparing recognition results obtained by the character recognition method of the present invention with those obtained by a conventional character recognition method.

[Explanation of symbols]

10: ruled line removal unit (ruled line removal means)
20: character cutout unit (cutout means, ruled line contact information acquisition means)
41: feature extraction unit (character recognition means)
42: dictionary unit (character recognition means)
43: identification unit (character recognition means)
45, 45': image correction processing unit (image correction means)
46: corrected image selection unit (corrected image selection means)
46': corrected image determination unit (corrected image determination means)
47: rejection determination unit (recognized character output evaluation means)
100, 100': character recognition device

Claims

[Claims]

1. A character recognition method in which images such as ruled lines are removed from a read image of a scanned document and character images are cut out one character at a time for recognition, the method comprising: acquiring, when a character image is cut out, contact information between the cut-out character image and the ruled lines or the like; performing recognition processing on the cut-out character; obtaining, based on the recognition result of that processing and the contact information, a group of corrected images in which the cut-out character image is corrected to approximate shapes predicted from pre-registered predicted shape information; selecting a recognition candidate image from the corrected image group; and evaluating whether to output the recognized character corresponding to the recognition candidate image.

2. A character recognition method in which images such as ruled lines are removed from a read image of a scanned document and character images are cut out one character at a time for recognition, the method comprising: acquiring, when a character image is cut out, contact information between the cut-out character image and the ruled lines or the like; performing recognition processing on the cut-out character; obtaining, based on the recognition result of that processing and the contact information, a group of corrected images in which the cut-out character image is corrected to approximate shapes predicted from pre-registered predicted shape information; checking whether the corrected images in the corrected image group satisfy a predetermined condition; extracting a recognition candidate image from the corrected images that satisfy the predetermined condition; and evaluating whether to output the recognized character corresponding to the recognition candidate image.

3. The character recognition method according to claim 2, wherein, when none of the corrected images in the corrected image group satisfies the predetermined condition, the method further performs an operation of: obtaining, based on the recognition result of the character image and the contact information and according to a predetermined correction priority, a group of corrected images corrected to approximate shapes predicted from the pre-registered predicted shape information; checking whether the corrected images in that group satisfy the predetermined condition; and extracting a recognition candidate image from the corrected images that satisfy the predetermined condition.

4. A character recognition device comprising: ruled line removal means for removing images such as ruled lines from a read image of a scanned document; cutout means for cutting out character images one character at a time from the image from which the ruled lines or the like have been removed by the ruled line removal means; ruled line contact information acquisition means for acquiring, from the character image from which the ruled lines or the like have been removed by the ruled line removal means, contact information about the parts that were in contact with the ruled lines or the like; recognition means for recognizing the character image cut out by the cutout means; image correction means for obtaining, based on the recognition result obtained by the recognition means and the contact information, a group of corrected images in which the cut-out character image is corrected to approximate shapes predicted from pre-registered predicted shape information; corrected image selection means for selecting a recognition candidate image from the corrected images obtained by the image correction means; and recognized character output evaluation means for evaluating whether to output the recognized character corresponding to the recognition candidate image obtained by the corrected image selection means.

5. A character recognition device comprising: ruled line removal means for removing images such as ruled lines from a read image of a scanned document; cutout means for cutting out character images one character at a time from the image from which the ruled lines or the like have been removed by the ruled line removal means; ruled line contact information acquisition means for acquiring, from the character image from which the ruled lines or the like have been removed by the ruled line removal means, contact information about the parts that were in contact with the ruled lines or the like; recognition means for recognizing the character image cut out by the cutout means; image correction means for obtaining, based on the recognition result obtained by the recognition means and the contact information, a group of corrected images in which the cut-out character image is corrected to approximate shapes predicted from pre-registered predicted shape information; corrected image determination means for determining whether the character images corrected by the image correction means satisfy a predetermined condition and extracting a recognition candidate image from the corrected images that satisfy the predetermined condition; and recognized character output evaluation means for evaluating whether to output the recognized character corresponding to the recognition candidate image obtained by the corrected image determination means.

6. The character recognition device according to claim 5, wherein the image correction means further comprises means for correcting, when the determination by the corrected image determination means finds that none of the corrected images satisfies the predetermined condition, the cut-out character image so as to approximate shapes predicted from the pre-registered predicted shape information, based on the recognition result of the character image and the contact information and according to a predetermined correction priority.