JPH06301817A

JPH06301817A - Character recognition device

Info

Publication number: JPH06301817A
Application number: JP5083135A
Authority: JP
Inventors: Masaru Sugioka; 賢杉岡; Koji Ito; 晃治伊東
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1993-04-09
Filing date: 1993-04-09
Publication date: 1994-10-28

Abstract

PURPOSE: To reduce the time and labor required to confirm and correct recognition results by changing the display color or print color of a character whose recognition result has low reliability.

CONSTITUTION: In a line width validity determination unit 72, an average line width calculation unit 74 stores in a line width register 76 the line width W of the input pattern of each character in one field (one item) or one line of a form, as calculated by a line width calculation unit 20. The average line width calculation unit 74 averages the line widths of the characters in the field to obtain an average line width $W_M$, takes the difference between the stored line width W of each character in that field and $W_M$, and compares the difference with line width thresholds $W_H$ and $W_L$ serving as reference line widths. If $W - W_M > W_H$ or $W - W_M < W_L$, it sends the character number and a result indicating that the reliability of the recognition result is low to a control unit 32. Based on this result, the control unit 32 specifies the display color or print color of that character on a display unit 80 or a printing unit 82.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a character recognition device that is easy to operate and highly reliable.

[0002]

[Prior Art] Various character recognition devices have been developed and put into practical use. The work of entering forms and other printed Japanese documents into computers with such devices, in order to build various databases, is expected to keep increasing. To save labor and speed up input, the need for a printed-kanji OCR (OCR: optical character reader) that optically recognizes printed kanji is growing. Before describing the character recognition device of the present invention, an example of a typical conventional character recognition device (hereinafter sometimes simply called an OCR), disclosed in Japanese Examined Patent Publication No. 60-38756, is described with reference to FIG. 2.

[0003] FIG. 2 is a block diagram showing an OCR for explaining the prior art. In the figure, 10 is the reading and recognition unit of the character recognition device. Within this reading and recognition unit 10, 12 is an optical reading input unit that optically reads kanji printed on a form or the like and converts them into an optical signal input; 14 is the optical signal input; 16 is a photoelectric conversion unit; 18 is a pattern register; 20 is a line width calculation unit; 22 is a character frame detection unit; 24 is a sub-pattern extraction unit; 26 is a feature matrix extraction unit; 28 is an identification unit; 30 is a character name output terminal; and 32 is a control unit that controls components 12, 16, 18, 20, 22, 24, 26, and 28 of the reading and recognition unit 10. The optical reading input unit 12 and the photoelectric conversion unit 16 constitute a reading unit 50. The pattern register 18, the line width calculation unit 20, the character frame detection unit 22, the sub-pattern extraction unit 24, the feature matrix extraction unit 26, and the identification unit 28 constitute a recognition unit 60.

[0004] The character recognition operation of the conventional device is described with reference to FIG. 2. First, character information such as printed kanji is read by the optical reading input unit 12 and converted into the optical signal input 14 of the original pattern, and the photoelectric conversion unit 16 converts this optical signal input 14 into a binary quantized electric signal 34. The pattern register 18 stores this electric signal 34. For storage, the character is divided into, for example, 100 × 100 cells, and the binary value of each cell is stored in the pattern register 18. The line width calculation unit 20 calculates the line width W of the input pattern (original pattern) and outputs a line width signal 36. Like a well-known filter circuit, the line width calculation unit 20 has a shift register configuration and calculates the line width W using, for example, the well-known approximation (1):

$$W = \frac{1}{1 - (Q/A)} \tag{1}$$

In equation (1), Q is the number of window positions at which all points in a (2 × 2) window are black when the window is scanned over the pattern register 18, and A is the total number of black points in the pattern register 18.
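As a rough illustration of approximation (1), the following Python sketch counts A and Q in software instead of using the shift register circuit described above. The function name `estimate_line_width`, the list-of-lists pattern layout, and the choice to return 0.0 for an empty pattern are assumptions made for this example, not part of the patent.

```python
def estimate_line_width(pattern):
    """Approximate stroke width W = 1 / (1 - Q/A) for a binary pattern.

    pattern: list of rows, each a list of 0 (white) or 1 (black) cells.
    """
    rows = len(pattern)
    cols = len(pattern[0]) if rows else 0

    # A: total number of black points in the pattern.
    a = sum(sum(row) for row in pattern)
    if a == 0:
        return 0.0  # assumption: report 0 for a pattern with no black points

    # Q: number of 2x2 window positions whose four points are all black.
    q = 0
    for y in range(rows - 1):
        for x in range(cols - 1):
            if (pattern[y][x] and pattern[y][x + 1]
                    and pattern[y + 1][x] and pattern[y + 1][x + 1]):
                q += 1

    return 1.0 / (1.0 - q / a)


# A vertical stroke 3 cells wide in a 100 x 100 pattern.
stroke = [[1 if 10 <= x < 13 else 0 for x in range(100)] for _ in range(100)]
print(estimate_line_width(stroke))  # close to the true width of 3
```

For a long stroke of width w, Q/A is roughly (w − 1)/w, so the expression evaluates to approximately w, which is why the formula works as a width estimate.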

[0005] The sub-pattern extraction unit 24 extracts four sub-patterns from the original pattern, namely the vertical sub-pattern (VSP), the horizontal sub-pattern (HSP), the right diagonal sub-pattern (RSP), and the left diagonal sub-pattern (LSP), and outputs a sub-pattern extraction signal 38. FIG. 3 shows an example of an original pattern and the extracted sub-patterns: FIG. 3(A) is the original pattern, (B) the VSP, (C) the HSP, (D) the RSP, and (E) the LSP. The sub-pattern extraction unit 24 scans the whole pattern register 18 vertically and extracts the vertical sub-pattern (VSP) based on the relationship between the length of each run of consecutive black bits and the line width calculated by the line width calculation unit 20. Likewise, it extracts the horizontal sub-pattern (HSP) by a horizontal scan, the right diagonal sub-pattern (RSP) by a right diagonal (45°) scan, and the left diagonal sub-pattern (LSP) by a left diagonal (45°) scan.
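The text does not state the exact relationship between run length and line width that decides which black runs belong to a sub-pattern. The sketch below therefore assumes one plausible rule for the vertical case: a vertical run of black bits is kept when its length is at least `factor` times the line width. Both the rule and the default `factor=2.0` are hypothetical.

```python
def extract_vertical_subpattern(pattern, line_width, factor=2.0):
    """Keep vertical black runs that are long relative to the line width.

    Assumed rule (not from the patent): a run is part of the vertical
    sub-pattern when its length >= factor * line_width.
    """
    rows = len(pattern)
    cols = len(pattern[0]) if rows else 0
    vsp = [[0] * cols for _ in range(rows)]
    min_run = factor * line_width

    for x in range(cols):          # vertical scan, one column at a time
        y = 0
        while y < rows:
            if pattern[y][x]:
                start = y
                while y < rows and pattern[y][x]:
                    y += 1         # advance to the end of the black run
                if y - start >= min_run:
                    for yy in range(start, y):
                        vsp[yy][x] = 1
            else:
                y += 1
    return vsp


# A plus sign: only the vertical bar should survive.
plus = [[1 if (x == 4 or y == 4) else 0 for x in range(9)] for y in range(9)]
vsp = extract_vertical_subpattern(plus, line_width=1.0)
```

The horizontal and diagonal sub-patterns would follow the same idea with the scan direction changed.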

[0006] The character frame detection unit 22 detects, from the electric signal 34 supplied by the pattern register 18, the character frame circumscribing the character pattern in the pattern register 18, and sends the resulting character frame signal 40 to the feature matrix extraction unit 26. For the sub-pattern registers (not shown) that store the sub-pattern of each direction, the feature matrix extraction unit 26 divides the area corresponding to the character frame of the original pattern into (N × M) regions (for example, N = M = 5). For example, if the character is divided into 100 × 100 cells and N = M = 5, each region has 20 × 20 cells.

[0007] The method of extracting the feature matrix is described here, taking the vertical sub-pattern VSP as an example. The feature matrix extraction unit 26 sets the (N × M) counters (not shown), one per divided region, to "0" and scans the inside of the character frame with the main scanning direction running horizontally from left to right. In each sub-scan it detects the coordinate position $(X_{WB}, Y_n)$ of the black bit where a white bit (character background) changes to a black bit (character stroke), and the coordinate position $(X_{BW}, Y_n)$ of the black bit where a black bit changes to a white bit, and calculates their midpoint $(X_B, Y_n)$ by equation (2):

$$X_B = \frac{X_{WB} + X_{BW}}{2} \tag{2}$$

It then determines which of the divided regions contains this midpoint and increments the counter of that region by 1.
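The counting procedure can be sketched as follows. This is an illustrative reading under stated assumptions: the sub-pattern is already cropped to the character frame, N indexes rows and M indexes columns, and the function name `line_length_matrix` is made up for this example.

```python
def line_length_matrix(subpattern, n=5, m=5):
    """Count stroke midpoints per region for a sub-pattern cropped to its frame.

    Each horizontal black run contributes one count to the region that
    contains its midpoint X_B = (X_WB + X_BW) / 2, as in equation (2).
    """
    height = len(subpattern)
    width = len(subpattern[0])
    counters = [[0] * m for _ in range(n)]   # all counters start at 0

    for y, row in enumerate(subpattern):     # one sub-scan per row
        x = 0
        while x < width:
            if row[x]:
                x_wb = x                      # first black bit of the run
                while x < width and row[x]:
                    x += 1
                x_bw = x - 1                  # last black bit of the run
                x_b = (x_wb + x_bw) / 2       # equation (2)
                i = min(int(y * n / height), n - 1)
                j = min(int(x_b * m / width), m - 1)
                counters[i][j] += 1
            else:
                x += 1
    return counters


# A 2-cell-wide vertical stroke in the middle of a 10 x 10 frame.
bar = [[1 if x in (4, 5) else 0 for x in range(10)] for _ in range(10)]
counts = line_length_matrix(bar)
```

Because each row adds one count per run, the counter of a region approximates the stroke length passing through it, which is why the result is called a line length matrix.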

[0008] This processing is performed over the whole character frame, and the resulting values of the (N × M) counters, one per divided region, form the (N × M)-dimensional line length matrix of the vertical sub-pattern VSP. The same processing is performed for the HSP, RSP, and LSP sub-patterns. For the horizontal sub-pattern HSP, however, the main scanning direction runs vertically from top to bottom and the sub-scanning direction runs horizontally from left to right. For the right and left diagonal sub-patterns RSP and LSP, either the main scanning direction runs vertically from top to bottom with the sub-scanning direction horizontal from left to right, or the main scanning direction runs horizontally from left to right with the sub-scanning direction vertical from top to bottom.

[0009] Next, the feature matrix extraction unit 26 normalizes the extracted line length matrices by the character size to create the feature matrix. Let $k_{ij}$ be an element of a line length matrix before normalization, $L_{ij}$ the corresponding element after normalization, $\Delta X$ the horizontal length of the character frame, and $\Delta Y$ its vertical length. The normalization is performed as follows.

(1) For the vertical sub-pattern (VSP) matrix:

$$L_{ij} = \frac{k_{ij}}{\Delta Y} \tag{3}$$

(2) For the horizontal sub-pattern (HSP) matrix:

$$L_{ij} = \frac{k_{ij}}{\Delta X} \tag{4}$$

(3) For the diagonal sub-pattern (RSP, LSP) matrices:

$$L_{ij} = \frac{k_{ij}}{\sqrt{(\Delta X)^2 + (\Delta Y)^2}} \tag{5}$$

Through this processing, the feature matrix extraction unit 26 finally creates a normalized feature matrix of {(N × M) × 4} dimensions that represents the original pattern, and outputs a feature matrix signal 42.
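Equations (3) to (5) amount to dividing each line length matrix by the frame dimension that runs along the stroke direction. A minimal sketch, assuming matrices are lists of lists and using the hypothetical function name `normalize`:

```python
import math


def normalize(k, direction, dx, dy):
    """Normalize a line length matrix by character size, per equations (3)-(5).

    k: line length matrix (list of rows); dx, dy: frame width and height.
    """
    if direction == "VSP":
        scale = dy                    # equation (3)
    elif direction == "HSP":
        scale = dx                    # equation (4)
    elif direction in ("RSP", "LSP"):
        scale = math.hypot(dx, dy)    # equation (5): frame diagonal
    else:
        raise ValueError(f"unknown sub-pattern direction: {direction}")
    return [[value / scale for value in row] for row in k]
```

For example, `normalize([[10]], "RSP", 3, 4)` divides by the diagonal length 5 and returns `[[2.0]]`. Concatenating the four normalized matrices gives the {(N × M) × 4}-dimensional feature matrix.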

[0010] The identification unit 28 computes the conventionally used distance D of equation (6) between each standard character mask ($f_m$), which holds the feature matrix of a standard character pattern, and the feature matrix ($f_i$) extracted by the feature matrix extraction unit 26. It outputs the category name of the standard character mask that gives the smallest distance from the character name output terminal 30 as the character name output 44.

$$D = \sum (f_m - f_i)^2 \tag{6}$$
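The identification step is a nearest-template search under the squared distance of equation (6). The sketch below assumes the feature matrix has been flattened into a plain list and that the standard character masks are held in a dictionary keyed by category name; both are choices made for this example.

```python
def classify(feature, masks):
    """Return the category whose standard mask minimizes D = sum (f_m - f_i)^2.

    feature: flat list of feature values for the input character.
    masks: dict mapping category name -> flat list of the same length.
    """
    def distance(f_m, f_i):
        return sum((a - b) ** 2 for a, b in zip(f_m, f_i))

    return min(masks, key=lambda name: distance(masks[name], feature))


# Two toy masks with three features each.
masks = {"A": [0.0, 0.0, 1.0], "B": [1.0, 1.0, 0.0]}
print(classify([0.9, 0.8, 0.1], masks))  # the input is far closer to mask "B"
```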

[0011] In the conventional OCR described above, however, if the resolution of the reading unit 50, which consists of the optical reading input unit 12 and the photoelectric conversion unit 16, is insufficient, the pattern is crushed, the extracted features become unstable, and misrecognition results. FIG. 4 shows an example of a pattern crushed by insufficient resolution of the reading unit 50: the character 「轟」 in 10-point medium Gothic type, read at a resolution of 16 lines/mm. To avoid such crushing, users entering a printed Japanese document with a conventional printed-kanji OCR had to confirm before input that the size and typeface of the type in the document were within the proper range.

[0012]

[Problems to Be Solved by the Invention] With a printed-kanji OCR, however, there are many typefaces: Mincho alone comes in light, medium, and bold variants, and the type also differs from one printing manufacturer to another. General users, who are not type specialists, therefore have to let the printed-kanji OCR recognize the printed Japanese document and then carefully inspect the displayed or printed recognition result to confirm and correct it. Consequently, the conventional device has the problem that confirmation and correction take time and effort and operability is poor. This problem is not limited to printed-kanji OCR; it applies equally to all OCR, including handwriting OCR.

[0013] An object of the present invention is to eliminate the conventional problems described above and to provide a character recognition device that is easy to operate and highly reliable.

[0014]

[Means for Solving the Problems] To solve the above problems, the present invention provides a character recognition device comprising a reading unit that photoelectrically converts a character pattern written on an input medium and outputs character pattern data, a recognition unit that recognizes the character pattern data, and a display unit that displays the recognition result or a printing unit, characterized in that a character line width validity determination unit is provided that determines the reliability of the character recognition result based on the line width of the character pattern data, and the method of displaying or printing the recognition result is visually changed based on the reliability of the character recognition result.

[0015]

[Operation] According to the present invention, the reliability of a character recognition result is determined based on the line width of the character, and the display method or printing method of the recognition result is visually changed based on that determination. This reduces the time and effort required for post-processing of the recognition result and thereby solves the above problems.

[0016]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing one embodiment of the present invention. Components identical to those shown in FIG. 2 are given the same reference numerals, and their detailed description is omitted.

[0017] The character recognition device of this invention has a character line width validity determination unit 72 consisting of an average line width calculation unit 74 and a line width register 76. During character recognition processing, the character line width validity determination unit 72 determines the reliability of the recognition result of the character being recognized. In this case, operating the control unit 32 to change the display color on the display unit 80 or the print color on the printing unit 82 makes it possible to inform the operator that the recognition result of the recognized character has low reliability.

[0018] The average line width calculation unit 74 calculates the average line width $W_M$ over one or more characters, based on the line width (W) information in the line width signal 36 that the line width calculation unit 20 of the reading and recognition unit 10 outputs for each input character pattern.

[0019] So that the average line width calculation unit 74 can calculate the average line width $W_M$, the line width register 76 receives the line width signal 36 of each character from the line width calculation unit 20 and temporarily stores the line width W of each character. The average line width calculation unit 74 can then read out, as needed, the line widths W of as many characters as are to be averaged. The control unit 32 also performs the various controls of components 74 and 76 of the character line width validity determination unit 72, the display unit 80, and the printing unit 82.

[0020] Next, the operation of the line width validity determination unit 72, which uses the line width to let the user know that the reliability of a recognition result is low, is described. As an example, the character carrier bearing the characters to be recognized is a form, and the resolution of the reading unit is 16 lines/mm.

[0021] For one field (one item) or one line of the form, the average line width calculation unit 74 stores in the line width register 76 the line width W that the line width calculation unit 20 calculated for the input pattern of each character in that field or line, and advances the address counter (not shown) of the line width register 76. The range and number of characters of one field are assumed to be defined in advance by the format of the form. The average line width calculation unit 74 then averages the line widths of the characters of one field stored in the line width register 76 to obtain the average line width $W_M$, computes the difference between $W_M$ and the line width W of the input pattern of each character in that field stored in the line width register 76, and compares the difference with the line width thresholds $W_H$ and $W_L$ serving as reference line widths ($W_H$ and $W_L$ are constants; in this embodiment $W_H = 1.0$ and $W_L = -1.0$). If equation (7) or equation (8) is satisfied, it sends the control unit 32 the character number and a validity determination result indicating that the reliability of the recognition result is low.

$$W - W_M > W_H \tag{7}$$

$$W - W_M < W_L \tag{8}$$
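The per-field check of equations (7) and (8) can be sketched in a few lines. The threshold values are the ones given for this embodiment; the function name `judge_field`, the labels `"too_thick"` and `"too_thin"`, and the use of a zero-based list index as the character number are assumptions for illustration.

```python
W_H = 1.0    # upper threshold from this embodiment
W_L = -1.0   # lower threshold from this embodiment


def judge_field(widths, w_h=W_H, w_l=W_L):
    """Flag characters whose line width deviates from the field average.

    widths: line width W of each character in one field, in reading order.
    Returns (average W_M, list of (character number, label)).
    """
    if not widths:
        return 0.0, []
    w_m = sum(widths) / len(widths)
    flagged = []
    for number, w in enumerate(widths):
        diff = w - w_m
        if diff > w_h:            # equation (7)
            flagged.append((number, "too_thick"))
        elif diff < w_l:          # equation (8)
            flagged.append((number, "too_thin"))
    return w_m, flagged


# Four characters; the last one is much thicker than the rest.
w_m, flagged = judge_field([3.0, 3.0, 3.0, 7.0])
```

In this example the average is 4.0, so only the fourth character (difference 3.0) is flagged; the others differ by exactly −1.0, which does not satisfy the strict inequality of equation (8).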

[0022] The line width thresholds $W_H$ and $W_L$ serving as these reference values are determined in advance for each field according to the recognition purpose (for example, the character type to be read or the writing instrument) or the recognition conditions (for example, resolution), and are stored so that they can be read out freely in a memory (not shown) provided in the character line width validity determination unit 72.

[0023] Based on the validity determination result associated with the character number of each character, the control unit 32 specifies the display color or print color of that character on the display unit 80 or the printing unit 82. The user then needs to check the recognition result only for characters shown in the specific colors on the display unit or printing unit (two colors in this case, since there are two validity determination results, one for equation (7) and one for equation (8)).
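As a software analogue of the two-color display, the sketch below wraps flagged characters in ANSI terminal color codes. The color assignment (red for equation (7), blue for equation (8)), the labels, and the function name `render` are hypothetical; the patent only states that two distinct colors are used.

```python
# Hypothetical mapping from determination result to ANSI color code.
COLORS = {"too_thick": "\033[31m",   # red: equation (7) satisfied
          "too_thin": "\033[34m"}    # blue: equation (8) satisfied
RESET = "\033[0m"


def render(text, flagged):
    """Return text with low-reliability characters wrapped in color codes.

    flagged: list of (character number, label) pairs, zero-based.
    """
    marks = dict(flagged)
    out = []
    for number, ch in enumerate(text):
        color = COLORS.get(marks.get(number))
        out.append(f"{color}{ch}{RESET}" if color else ch)
    return "".join(out)


print(render("1234", [(3, "too_thick")]))  # only the last digit is colored
```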

[0024] The invention is not limited to the embodiment described above, and it is clear that many modifications and changes are possible. For example, the configuration of the recognition unit is in no way limited to the conventional configuration shown in FIG. 1; any configuration may be used as long as it extracts features using the line width of the character pattern and outputs a recognition result based on the extracted features.

[0025] Furthermore, as an alternative to the embodiment described above, the average over all characters recognized before the current character may be used as the average line width $W_M$. In that case, the number of characters up to the current character is preferably stored in the line width register.
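Keeping the character count alongside the current mean is enough to update the average incrementally without storing every line width. A minimal sketch, with the class name `RunningAverage` chosen for this example:

```python
class RunningAverage:
    """Average line width over all characters recognized so far."""

    def __init__(self):
        self.count = 0     # number of characters seen so far
        self.mean = 0.0    # current average line width W_M

    def update(self, w):
        """Fold the line width of the next character into the average."""
        self.count += 1
        self.mean += (w - self.mean) / self.count
        return self.mean


avg = RunningAverage()
for width in (2.0, 4.0, 6.0):
    avg.update(width)
```

After the three updates above, `avg.mean` is 4.0 and `avg.count` is 3.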

[0026] Furthermore, in the embodiment described above, the difference between the average line width $W_M$ of one field or one line and the line width W of each character in that field is compared with the reference line width thresholds $W_H$ and $W_L$. Alternatively, as shown in equations (9) and (10), the line width of each character can be compared directly with reference line width thresholds $W_{THH}$ and $W_{THL}$ ($W_{THH}$ and $W_{THL}$ are constants; in this embodiment $W_{THH} = 7.0$ and $W_{THL} = 2.0$). In this case, the recognition result is judged to have low reliability if equation (9) or equation (10) is satisfied.

$$W > W_{THH} \tag{9}$$

$$W < W_{THL} \tag{10}$$
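The direct comparison of equations (9) and (10) needs no averaging at all. A short sketch using the threshold values quoted for this embodiment; the function name `is_low_reliability` is an assumption:

```python
W_THH = 7.0   # upper absolute threshold from this embodiment
W_THL = 2.0   # lower absolute threshold from this embodiment


def is_low_reliability(w, w_thh=W_THH, w_thl=W_THL):
    """True if the line width W satisfies equation (9) or equation (10)."""
    return w > w_thh or w < w_thl
```

A width exactly equal to a threshold, such as 7.0, is not flagged because both inequalities are strict.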

[0027] Furthermore, although the embodiment described above uses a form as the character carrier, this character recognition device can also be applied to any suitable character carrier other than a form.

[0028] Furthermore, in the embodiment described above, the line width signal is fed from the line width calculation unit through the average line width calculation unit to the line width register of the character line width validity determination unit. Instead, the signal may be fed directly from the line width calculation unit to the line width register.

[0029] Furthermore, although the embodiment described above uses the display color or print color, the invention is not limited to this. Any visually distinguishable attribute may be used, such as brightness, reverse video, blinking, or underlining.

[0030] The character carrier mentioned above may be anything on which characters are recorded as information, by handwriting, printing, or any other suitable method, on paper, plastic, cloth, wood, metal, or any other suitable material.

[0031]

[Effects of the Invention] As is clear from the above description, according to the present invention, the average line width of the characters in a character string containing the input character is compared with predetermined thresholds, and when the comparison shows that the reliability of the recognition result of a character is low, the display color or print color of that character's recognition result is changed. Compared with the prior art, this reduces the time and effort required to confirm and correct recognition results.

[Brief description of drawings]

FIG. 1 is a block diagram showing one embodiment of the character recognition device of the present invention.

FIG. 2 is a block diagram showing a conventional character recognition device.

FIG. 3 is a diagram showing an example of an original pattern and the extracted sub-patterns.

FIG. 4 is a diagram showing an example of a crushed pattern.

[Explanation of symbols]

10 Reading and recognition unit
12 Optical reading input unit
16 Photoelectric conversion unit
18 Pattern register
20 Line width calculation unit
22 Character frame detection unit
24 Sub-pattern extraction unit
26 Feature matrix extraction unit
28 Identification unit
30 Character name output terminal
32 Control unit
50 Reading unit
60 Recognition unit
72 Character line width validity determination unit
74 Average line width calculation unit
76 Line width register
80 Display unit
82 Printing unit

Claims

[Claims]

1. A character recognition device comprising a reading unit that photoelectrically converts a character pattern written on an input medium and outputs character pattern data, a recognition unit that recognizes the character pattern data, and a display unit that displays the recognition result or a printing unit, characterized in that a character line width validity determination unit is provided that determines the reliability of the character recognition result based on the line width of the character pattern data, and the method of displaying or printing the recognition result is visually changed based on the reliability of the character recognition result.