JP2875330B2

JP2875330B2 - Character recognition method

Info

Publication number: JP2875330B2
Application number: JP2053157A
Authority: JP
Inventors: 浩一樋口; 義征山下
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1990-03-05
Filing date: 1990-03-05
Publication date: 1999-03-31
Anticipated expiration: 2014-03-31
Also published as: JPH03253988A

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a character recognition method used for reading postal codes and the like, and in particular to a character recognition method that is fast and has excellent recognition accuracy.

(Prior Art) A conventional technique in this field is described in, for example, Japanese Patent Application Laid-Open No. 57-23185. It is explained below with reference to the drawings.

FIG. 2 is an explanatory diagram of the conventional character recognition method described in the above document, and FIGS. 3(a) and 3(b) show examples of an information medium.

In this character recognition method, a character on an information medium is first photoelectrically converted to obtain a character pattern A (step 1). From this character pattern A, sub-patterns A1 to A4 representing the line-element components (stroke components) in each direction are extracted (step 2). Each of the sub-patterns A1 to A4 is then divided into N × M regions A1a (for example, N = M = 5) (step 3). Next, the number of black points in each region A1a is counted, a feature value representing the stroke length of the sub-pattern within each region A1a is extracted, and an N × M-dimensional feature matrix is created for each of the sub-patterns A1 to A4. The character is then identified by matching the resulting feature matrices against dictionary masks prepared in advance.

Postal codes on the postcards, sealed letters, and similar items shown in FIGS. 3(a) and 3(b) are sometimes printed in advance together with the address, as in FIG. 3(a), and sometimes written by hand, as in FIG. 3(b). When an individual fills in the code, it is generally handwritten. Printed codes are found on reply postcards and reply envelopes for direct mail, on application postcards inserted in magazines and books, and so on, and their number is enormous. The same applies to forms such as order slips, application forms, and request slips in which handwritten and printed characters are mixed.

In such cases, character recognition has been performed using a large-capacity dictionary memory that covers both handwritten and printed characters.

(Problems to be Solved by the Invention) However, in the above character recognition method, the input must be matched against every dictionary mask in the large-capacity dictionary memory covering both handwritten and printed characters. Matching errors therefore become correspondingly more likely, which lowers recognition accuracy, and the time required for matching increases.

The present invention provides a character recognition method that solves these problems of the prior art, namely the reduction in recognition accuracy and the increase in matching time.

(Means for Solving the Problems) To solve the above problems, the present invention provides a character recognition method in which features of a character are extracted from a multi-valued image obtained by photoelectrically converting the character on an information medium, and the character is identified by matching the extraction result against a plurality of dictionary masks prepared in advance. In this method, a recognition binarization threshold and a dictionary-selection binarization threshold that differ from each other are set for the multi-valued image; binarization is performed with each threshold to obtain a recognition binary pattern and a dictionary-selection binary pattern corresponding to the respective thresholds; a feature of the dictionary-selection binary pattern is then detected; and a predetermined dictionary mask is selected from the plurality of dictionary masks on the basis of the detection result.

Furthermore, the plurality of dictionary masks may be masks for handwritten characters and for printed characters, and the feature of the dictionary-selection binary pattern may be the total number of black points in the binary pattern, the character stroke amount, the line width, or the distribution of the black points.

(Operation) With the character recognition method configured as described above, a recognition binarization threshold and a dictionary-selection binarization threshold are set for the multi-valued image of an information medium such as a postcard, using the density of the multi-valued image or the like as a reference. Binarization is performed with each threshold to obtain the corresponding recognition binary pattern and dictionary-selection binary pattern. A feature of the dictionary-selection binary pattern, such as the total number of black points, the character stroke amount, the line width, or the distribution of black points, is then detected. On the basis of the detection result, a predetermined dictionary mask is selected from a plurality of dictionary masks, such as those for handwritten characters and for printed characters, and matching is performed using the selected mask. The dictionary masks to be matched can thereby be limited.

The above problems can therefore be solved.

(Embodiment) FIG. 1 is a block diagram of a character recognition apparatus for carrying out a character recognition method according to an embodiment of the present invention.

This character recognition apparatus has a photoelectric conversion unit 11, such as a CCD sensor, that converts an optical signal L input from an input terminal 10 into an electrical signal, and an image memory 12 that stores the digital signal obtained by the photoelectric conversion in the photoelectric conversion unit 11 as multi-valued image data.

The density of the multi-valued image data is expressed as a multi-level value of, for example, 4 bits per pixel. With 4 bits, density is expressed in 16 levels, with 0 as the whitest and 15 as the blackest. The image memory 12 has a capacity sufficient to store the multi-valued image data for one character, for example 64 × 64 × 4 bits. It uses a storage format in which the signal of each pixel can be reproduced in correspondence with its two-dimensional coordinates on the information medium, and it is composed of a RAM (random access memory) or the like.
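As a quick check of the figures above, the following minimal sketch computes the image memory capacity and the number of density levels for a given bit depth. The function names are illustrative assumptions and do not come from the patent.

```python
def image_memory_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Bits needed to hold one character of multi-valued image data."""
    return width * height * bits_per_pixel


def density_levels(bits_per_pixel: int) -> int:
    """Number of density levels representable at the given bit depth."""
    return 2 ** bits_per_pixel


bits = image_memory_bits(64, 64, 4)
# 64 x 64 pixels at 4 bits each: 16384 bits (2048 bytes), 16 density levels
print(bits, bits // 8, density_levels(4))
```

The same functions cover other bit depths: a depth of n bits per pixel gives 2^n levels and a proportionally larger memory.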

Connected to the output side of the image memory 12 are a first binarization unit 13 and a second binarization unit 14. The first binarization unit 13 binarizes the multi-valued image data in the image memory 12 using the recognition binarization threshold T1 to obtain a recognition binary pattern corresponding to T1. The second binarization unit 14 similarly binarizes the data using the dictionary-selection binarization threshold T2 to obtain a dictionary-selection binary pattern. The thresholds T1 and T2 are determined according to the characteristics of the photoelectric conversion unit 11. The values "1" and "0" in the recognition binary pattern and the dictionary-selection binary pattern represent black points and white points, respectively.

Connected to the output sides of the first and second binarization units 13 and 14 are, respectively, a first pattern register 15 that stores the recognition binary pattern obtained by the first binarization unit 13, and a second pattern register 16 that stores the dictionary-selection binary pattern obtained by the second binarization unit 14.

Connected in sequence to the output side of the second pattern register 16 are a character presence determination unit 17 and a dictionary memory 18. The character presence determination unit 17 scans the entire register 16, counts the number of black points in it, and outputs a dictionary selection signal DS according to the count. The dictionary memory 18 comprises a printed-character dictionary section holding dictionary masks for printed characters and a handwritten-character dictionary section holding dictionary masks for handwritten characters, and it selects the dictionary section to be used according to the dictionary selection signal DS. The dictionary memory 18 is composed of a RAM or the like. When a dictionary selection signal DS of logic "1" is input, it outputs the dictionary masks of both the handwritten-character and printed-character dictionary sections; when DS is logic "0", it outputs the dictionary masks of the printed-character dictionary section only.

The dictionary memory 18 and the first pattern register 15 are connected to a recognition unit 19. The recognition unit 19 is a circuit that extracts features from the recognition binary pattern stored in the first pattern register 15 to create a feature matrix fi, matches the feature matrix fi against the dictionary masks output from the dictionary memory 18, and outputs the resulting character name to an output terminal 20. Each dictionary mask consists of a dictionary matrix gi in the same format as the feature matrix fi.

The operation of the character recognition apparatus configured as described above is now explained.

When an optical signal L from an information medium such as a postcard, sealed letter, or form bearing characters, figures, and the like is input at the input terminal 10 through a vidicon camera or the like (not shown), it is photoelectrically converted in the photoelectric conversion unit 11, and the resulting digital signal is stored in the image memory 12 as multi-valued image data. The multi-valued image data is then binarized in the first binarization unit 13 and stored in the first pattern register 15 as the recognition binary pattern.

This binarization is performed as follows. The recognition binary pattern f1(x, y) stored in the first pattern register 15 is obtained by the following equation.

$$ f_1(x, y) = \begin{cases} 1 & \text{if } g(x, y) \ge T_1 \\ 0 & \text{if } g(x, y) < T_1 \end{cases} \tag{1} $$

where g(x, y) is the multi-valued image data, taking values from 0 to 15, and T1 is the recognition binarization threshold.

In the second binarization unit 14, as in the first binarization unit 13, the multi-valued image data is binarized according to the following equation, and the resulting dictionary-selection binary pattern f2(x, y) is stored in the second pattern register 16.

$$ f_2(x, y) = \begin{cases} 1 & \text{if } g(x, y) \ge T_2 \\ 0 & \text{if } g(x, y) < T_2 \end{cases} \tag{2} $$

where T2 is the dictionary-selection binarization threshold.

The character presence determination unit 17 scans the entire second pattern register 16 and counts the number of black points BL in the second pattern register 16 according to the following equation.

$$ BL = \sum_{x} \sum_{y} f_2(x, y) \tag{3} $$
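The two binarization steps and the black-point count can be sketched in a few lines. This is a minimal illustration, assuming the image is held as a list of rows of density values. The sample patch and the names `binarize` and `count_black_points` are hypothetical and are not taken from the patent.

```python
from typing import List

Image = List[List[int]]  # rows of density values, 0 (white) to 15 (black)


def binarize(g: Image, threshold: int) -> Image:
    """Return 1 (black point) where g(x, y) >= threshold, else 0 (white point)."""
    return [[1 if value >= threshold else 0 for value in row] for row in g]


def count_black_points(pattern: Image) -> int:
    """Total number of black points BL in a binary pattern."""
    return sum(sum(row) for row in pattern)


# Hypothetical 3 x 4 patch: 10 = red frame, 15 = black stroke, 0 = paper.
g = [
    [10, 10, 10, 10],
    [10, 15, 15, 0],
    [10, 0, 15, 0],
]
f1 = binarize(g, threshold=8)   # recognition binary pattern (T1 = 8)
f2 = binarize(g, threshold=12)  # dictionary-selection binary pattern (T2 = 12)
print(count_black_points(f1), count_black_points(f2))  # 9 3
```

With the lower threshold both the frame and the stroke become black points, while the higher threshold keeps only the darker stroke.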

On the basis of this black point count BL, a dictionary selection signal DS generated by applying the following equation is output to the dictionary memory 18.

$$ DS = \begin{cases} 1 & \text{if } BL \ge BTHL \\ 0 & \text{if } BL < BTHL \end{cases} \tag{4} $$

where BTHL is a threshold.

The dictionary selection signal DS output from the character presence determination unit 17 is input to the dictionary memory 18. When DS = 1, the dictionary memory 18 outputs both the handwritten-character dictionary masks and the printed-character dictionary masks to the recognition unit 19. When DS = 0, it outputs only the printed-character dictionary masks to the recognition unit 19.
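The selection logic just described is small enough to sketch directly. The dictionary layout (a mapping with `"handwritten"` and `"printed"` entries) and the mask labels are assumptions made for illustration; the patent does not specify how the dictionary memory is organized internally.

```python
from typing import Dict, List


def dictionary_selection_signal(bl: int, bthl: int) -> int:
    """DS = 1 when BL >= BTHL, otherwise DS = 0."""
    return 1 if bl >= bthl else 0


def select_dictionary_masks(ds: int, memory: Dict[str, List[str]]) -> List[str]:
    """DS = 1: handwritten and printed masks. DS = 0: printed masks only."""
    if ds == 1:
        return memory["handwritten"] + memory["printed"]
    return memory["printed"]


# Hypothetical dictionary memory; the strings stand in for dictionary masks.
memory = {
    "handwritten": ["hw-0", "hw-1"],
    "printed": ["pr-0", "pr-1"],
}
print(select_dictionary_masks(dictionary_selection_signal(12, 10), memory))
print(select_dictionary_masks(dictionary_selection_signal(0, 10), memory))
```

The recognition unit then only has to match against the returned list, which is where the reduction in matching work comes from when DS = 0.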

The recognition unit 19 performs the following processing.

First, the line width of the character pattern, which is the recognition binary pattern in the first pattern register 15, is calculated. Next, the character pattern is scanned in a plurality of directions, and the number of consecutive black points in each scan line is detected. On the basis of the detected run lengths and the line width, four types of sub-pattern are obtained by scanning in, for example, the four directions horizontal, vertical, right diagonal, and left diagonal.

Next, the region within the character frame of the character pattern is divided, for each sub-pattern, into N × M regions (where N and M are positive integers). The number of black points in each divided region is counted, and a feature value is calculated from the count and the line width. These feature values are then normalized by the character size to create the feature matrix fi. The feature matrix has N × M × 4 dimensions; for example, with N = 4 and M = 4 it has 64 dimensions.
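The region-wise counting can be illustrated as follows. This is a simplified sketch rather than the patent's exact feature calculation: the text does not spell out how the line width enters the feature value or how the normalization is done, so the code below simply divides each region's black-point count by the sub-pattern area. All function names are hypothetical.

```python
from typing import List

Pattern = List[List[int]]  # binary sub-pattern, 1 = black point


def zone_counts(pattern: Pattern, n: int, m: int) -> List[int]:
    """Count black points in each of the n x m regions of one sub-pattern."""
    height, width = len(pattern), len(pattern[0])
    counts = [0] * (n * m)
    for y, row in enumerate(pattern):
        zone_row = y * n // height
        for x, value in enumerate(row):
            zone_col = x * m // width
            counts[zone_row * m + zone_col] += value
    return counts


def feature_vector(sub_patterns: List[Pattern], n: int, m: int) -> List[float]:
    """Concatenate region counts of all sub-patterns, scaled by pattern area.

    Assumption: area scaling stands in for the size normalization in the text.
    """
    features: List[float] = []
    for pattern in sub_patterns:
        area = len(pattern) * len(pattern[0])
        features.extend(count / area for count in zone_counts(pattern, n, m))
    return features


# Hypothetical 8 x 8 sub-pattern containing one horizontal stroke in row 0.
bar = [[1] * 8] + [[0] * 8 for _ in range(7)]
fi = feature_vector([bar, bar, bar, bar], n=4, m=4)
print(len(fi))  # 64
```

With four sub-patterns and N = M = 4, the resulting vector has the 64 dimensions mentioned in the text.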

The feature matrix fi is then matched against the dictionary masks output from the dictionary memory 18. That is, the distance D between each dictionary matrix gi (i = 1, 2, ..., 64) and the feature matrix fi (i = 1, 2, ..., 64) is calculated by applying the following equation, and the character name corresponding to the dictionary matrix gi with the smallest distance D is output to the output terminal 20.
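The distance equation itself does not survive in this text, so the following sketch assumes a squared Euclidean distance purely for illustration; the patent may use a different measure. It shows only the minimum-distance matching step. The names `squared_distance` and `recognize` and the three-element masks are hypothetical.

```python
from typing import List, Sequence, Tuple

DictionaryMask = Tuple[str, Sequence[float]]  # (character name, dictionary matrix g)


def squared_distance(f: Sequence[float], g: Sequence[float]) -> float:
    """Assumed distance D: sum over i of (f_i - g_i)^2."""
    return sum((fi - gi) ** 2 for fi, gi in zip(f, g))


def recognize(f: Sequence[float], masks: List[DictionaryMask]) -> Tuple[str, float]:
    """Return the character name and distance of the closest dictionary mask."""
    best_name, best_g = min(masks, key=lambda mask: squared_distance(f, mask[1]))
    return best_name, squared_distance(f, best_g)


# Hypothetical 3-dimensional masks (the embodiment uses 64 dimensions).
masks = [("0", [0, 0, 4]), ("1", [4, 0, 0])]
print(recognize([3, 0, 1], masks))  # ('1', 2)
```

Because the function takes whatever list of masks it is given, restricting the list to the printed-character masks directly reduces the number of distance evaluations.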

Next, the case of reading postal codes using the above character recognition method is described.

FIG. 4 shows the density when reading a black handwritten postal code, and FIG. 5 shows the density when reading a red printed postal code.

As shown in FIGS. 4 and 5, when postal codes 31 and 32 are read, the entry frame 30 is normally printed in red. Reply postcards enclosed with direct mail, application postcards bound into magazines, and the like carry a pre-printed postal code 32, which for printing convenience is often the same red as the entry frame 30. In contrast, a postal code 31 written by hand is normally black or blue.

In FIG. 4, consider reading a postal code 31 handwritten in black inside the red entry frame 30. As the figure shows, when the medium is scanned in the x direction by the photoelectric conversion unit 11, the density of the entry frame 30 printed in red is lower than that of the black postal code 31, because red has a lower density than black. Assuming that the multi-valued image data takes a value of about 10 for red and about 15 for black, the recognition binarization threshold is set to T1 = 8 and the dictionary-selection binarization threshold to T2 = 12. The first binarization unit 13, which binarizes with reference to T1, then yields by equation (1) a recognition binary pattern in which both the entry frame 30 and the postal code 31 are binarized, and this pattern is stored in the first pattern register 15.

Meanwhile, in the second binarization unit 14, which binarizes with reference to T2, equation (2) yields no binary pattern for the entry frame 30; only the binary pattern of the postal code 31 is obtained. After this binary pattern of the postal code 31 is stored in the second pattern register 16, the character presence determination unit 17 counts its black points BL by equation (3). If BTHL = 10, then BL ≥ BTHL in equation (4), so DS = 1, and a dictionary selection signal DS of logic "1" is output to the dictionary memory 18. As a result, the dictionary memory 18 outputs both the handwritten-character and printed-character dictionary masks to the recognition unit 19. These dictionary masks are then matched against the recognition binary pattern in the register 15, and the resulting character name is output to the output terminal 20.

In contrast, consider reading a postal code 32 printed in red inside the red entry frame 30, as shown in FIG. 5. When the medium is scanned in the x direction by the photoelectric conversion unit 11 as before, the first binarization unit 13 yields by equation (1) a binary pattern in which the entry frame 30 and the postal code 32 are binarized, but the second binarization unit 14 yields by equation (2) no binary pattern for either the entry frame 30 or the postal code 32. Consequently BL < BTHL in equations (3) and (4), so DS = 0, and the character presence determination unit 17 outputs a dictionary selection signal DS of logic "0" to the dictionary memory 18. As a result, only the printed-character dictionary masks are selected in the dictionary memory 18 and output to the recognition unit 19. These dictionary masks are then matched against the recognition binary pattern in the register 15 as before, and the resulting character name is output to the output terminal 20.
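The two cases of FIGS. 4 and 5 can be worked through numerically with the values given in the text (red about 10, black about 15, T1 = 8, T2 = 12, BTHL = 10). The pixel counts below (6 frame pixels, 12 character pixels, 20 paper pixels) are made-up figures for illustration, and `classify` is a hypothetical helper.

```python
T1, T2, BTHL = 8, 12, 10  # thresholds used in the embodiment
RED, BLACK, PAPER = 10, 15, 0  # approximate densities from the text


def classify(densities):
    """Return (black points in recognition pattern, BL, DS) for a pixel list."""
    recognition_black = sum(1 for g in densities if g >= T1)
    bl = sum(1 for g in densities if g >= T2)
    ds = 1 if bl >= BTHL else 0
    return recognition_black, bl, ds


frame = [RED] * 6  # hypothetical: 6 pixels of the red entry frame
handwritten = frame + [BLACK] * 12 + [PAPER] * 20  # FIG. 4: black handwriting
printed = frame + [RED] * 12 + [PAPER] * 20        # FIG. 5: red printing

print(classify(handwritten))  # (18, 12, 1)
print(classify(printed))      # (18, 0, 0)
```

In both cases the recognition pattern contains the same 18 black points, so recognition sees the full character either way; only the dictionary-selection count differs, giving DS = 1 for the handwritten case and DS = 0 for the printed one.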

This embodiment has the following advantages.

When a postal code is entered, printed characters are normally red or black, while handwritten characters are black and red is not used. Taking advantage of this, a binarization threshold is set so that character patterns printed in red do not appear and only character patterns printed or written in black are reproduced. When a character pattern appears, the character is regarded as having been entered in black and is matched against both the printed-character and handwritten-character dictionary masks. When no character pattern appears, the character is regarded as having been entered in red and is matched against the printed-character dictionary masks only. The dictionary masks to be matched can thus be limited, which shortens the time required for matching, and an improvement in recognition accuracy can be expected.

The present invention is not limited to the illustrated embodiment, and various modifications are possible. Examples of such modifications include the following.

(A) In the above embodiment, the recognition binarization threshold T1 and the dictionary-selection binarization threshold T2 are each set as a single threshold, but both T1 and T2 may also be set as a plurality of thresholds.

(B) In the above embodiment, the feature of the dictionary-selection binary pattern is detected as the total number of black points in that binary pattern, but it may instead be detected using, for example, the character stroke amount, the line width, or the distribution of black points.

(C) The dictionary masks of the above embodiment cover handwritten and printed characters, but the invention is not limited to these, and various other kinds of characters may also be covered.

(D) The first and second binarization units 13 and 14, the first and second pattern registers 15 and 16, the character presence determination unit 17, and the recognition unit 19 of the above embodiment may be implemented as individual circuits, or their functions may be executed under program control by a computer.

(E) In the above embodiment, the dictionary selection signal DS is set to logic "1" when BL ≥ BTHL and to logic "0" when BL < BTHL, but the reverse may be used: logic "1" when BL < BTHL and logic "0" when BL ≥ BTHL.

(F) In the dictionary memory 18 of the above embodiment, the printed-character and handwritten-character dictionary masks are output in response to a dictionary selection signal DS of logic "1", and only the printed-character dictionary masks are output in response to a DS of logic "0". The reverse may be used: the printed-character and handwritten-character dictionary masks are output in response to a DS of logic "0", and only the printed-character dictionary masks in response to a DS of logic "1".

(G) In the above embodiment, the values "1" and "0" of the recognition binary pattern and the dictionary-selection binary pattern represent black points and white points, respectively, but the reverse may be used, with black points represented by "0" and white points by "1".

(H) In the above embodiment, the density of the multi-valued image data is expressed with 4 bits per pixel, giving 16 density levels, but this is not a limitation; density can be expressed in various numbers of levels using n bits per pixel (n being a positive integer). In that case, the capacity of the image memory 12 must also be matched to the number of density levels.

(Effects of the Invention) As described in detail above, according to the present invention, a recognition binary pattern and a dictionary-selection binary pattern are obtained from a multi-valued image, produced by photoelectrically converting a character on an information medium, using a recognition binarization threshold and a dictionary-selection binarization threshold, respectively. A feature of the dictionary-selection binary pattern is then detected, and a predetermined dictionary mask is selected from a plurality of dictionary masks on the basis of the detection result. The dictionary masks to be matched can therefore be limited, the time required for matching can be shortened, and an improvement in recognition accuracy can be expected.

[Brief description of the drawings]

FIG. 1 is a block diagram of a character recognition apparatus showing an embodiment of the present invention, FIG. 2 is an explanatory diagram of a conventional character recognition method, FIGS. 3(a) and 3(b) show examples of an information medium, and FIGS. 4 and 5 show the density when a postal code is read.

11: photoelectric conversion unit; 12: image memory; 13, 14: first and second binarization units; 15, 16: first and second pattern registers; 17: character presence determination unit; 18: dictionary memory; 19: recognition unit; DS: dictionary selection signal; T1: recognition binarization threshold; T2: dictionary-selection binarization threshold.

Claims

(57) [Claims]

1. A character recognition method in which features of a character are extracted from a multi-valued image obtained by photoelectrically converting the character on an information medium, and the character is identified by matching the extraction result against a plurality of dictionary masks prepared in advance, the method comprising: setting, for the multi-valued image, a recognition binarization threshold and a dictionary-selection binarization threshold that differ from each other; performing binarization based on the recognition binarization threshold and the dictionary-selection binarization threshold to obtain a recognition binary pattern and a dictionary-selection binary pattern corresponding to the respective thresholds; detecting a feature of the dictionary-selection binary pattern; and selecting a predetermined dictionary mask from the plurality of dictionary masks based on the detection result.

2. The character recognition method according to claim 1, wherein the plurality of dictionary masks use masks for handwritten characters and printed characters.

3. The character recognition method according to claim 1, wherein the feature of the dictionary-selection binary pattern is the total number of black points in the binary pattern, the character stroke amount, the line width, or the distribution of black points.