JPH05108882A

JPH05108882A - Character recognition device

Info

Publication number: JPH05108882A
Application number: JP3269243A
Authority: JP
Inventors: Ryoichi Yushimo; 良一湯下
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1991-10-17
Filing date: 1991-10-17
Publication date: 1993-04-30

Abstract

(57) [Summary]

[Purpose] To improve the recognition rate by reducing the loss of graphic features of characters and the influence of noise.

[Structure] A character area is cut out from a binary image input by an image input unit. The character area is divided into small areas using the vertical and horizontal division counts determined by the aspect ratio of the character area, and the ratio of black pixels to white pixels in each small area is obtained as the graphic feature of the character area. The recognition result is obtained by comparing that graphic feature with a recognition dictionary by a method such as similarity.

[Effect] The division pattern of the character area in feature extraction is determined by the aspect ratio of the character area: vertically elongated characters are divided finely in the vertical direction and coarsely in the horizontal direction, and horizontally elongated characters are divided finely in the horizontal direction and coarsely in the vertical direction. This reduces the loss of graphic features and the influence of noise, and improves the recognition rate.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that recognizes characters.

[0002]

[Prior Art] A conventional character recognition device takes the circumscribed rectangle of a character as the character area, divides the character area into small areas using fixed numbers of divisions in the vertical and horizontal directions, and obtains the ratio of black pixels to white pixels in each small area as the graphic feature of the character area.

[0003] When the numbers of divisions in the vertical and horizontal directions are fixed, the amount of information in the vertical direction differs greatly from that in the horizontal direction for vertically elongated characters such as 'I' and '1' and for horizontally elongated characters such as '-' and '='. This has a considerable adverse effect on the recognition result.

[0004] For example, when recognizing 'i', the key feature is that the upper and lower islands are separated. That feature is vertical information, and it is unlikely to appear unless the number of vertical divisions is increased and the division width is reduced. Conversely, because 'i' has few features in the horizontal direction, the result is easily affected by noise unless the number of horizontal divisions is reduced and the division width is increased.
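The effect of the vertical division count on 'i' can be illustrated with a small worked example. This is only a sketch: the 12x2 glyph, the function name `band_densities`, and the choice of 3 versus 12 bands are assumptions made for illustration and do not come from the source.

```python
def band_densities(glyph, bands):
    """Black-pixel ratio of each horizontal band when `glyph` (rows of 0/1,
    1 = black) is split into `bands` bands. Assumes bands <= len(glyph)."""
    height, width = len(glyph), len(glyph[0])
    densities = []
    for b in range(bands):
        # Integer band boundaries (an assumption; the source does not say
        # how boundaries are rounded).
        y0, y1 = b * height // bands, (b + 1) * height // bands
        black = sum(sum(row) for row in glyph[y0:y1])
        densities.append(black / ((y1 - y0) * width))
    return densities


# Hypothetical glyph for 'i': dot (rows 0-1), gap (rows 2-3), stem (rows 4-11).
GLYPH_I = [[1, 1]] * 2 + [[0, 0]] * 2 + [[1, 1]] * 8

coarse = band_densities(GLYPH_I, 3)   # 4 pixel rows per band
fine = band_densities(GLYPH_I, 12)    # 1 pixel row per band
```

With three bands the gap only lowers the top band to 0.5 and no band is empty, so the separation between the islands is blurred. With twelve bands the gap shows up as two empty bands.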

[0005] Therefore, for 'i' it is desirable to increase the number of vertical divisions and decrease the number of horizontal divisions, whereas exactly the opposite holds for '-'. As long as the same division is applied to all characters, some characters will be adversely affected.

[0006] As a result, if a sufficient number of vertical divisions cannot be set for 'i', 'i' is often misrecognized as 'I' or '1'.

[0007]

[Problems to be Solved by the Invention] As described above, the conventional character recognition device divides the character area in feature extraction in the same way regardless of the target, which causes the loss of important features and increases the influence of noise.

[0008]

[Means for Solving the Problems] The present invention has been made in view of the above problems. The division pattern of the character area in feature extraction is determined by the aspect ratio of the character area: vertically elongated characters are divided finely in the vertical direction and coarsely in the horizontal direction, and horizontally elongated characters are divided finely in the horizontal direction and coarsely in the vertical direction. This reduces the loss of graphic features of characters and the influence of noise, and improves the recognition rate.

[0009]

[Operation] By reducing the loss of graphic features of characters and the influence of noise, misrecognition between similar characters and misrecognition due to noise are suppressed, and the recognition rate is improved.

[0010]

[Embodiment] The present invention will be described with reference to the accompanying drawings, which show one embodiment.

[0011] In FIG. 1, 1 is an image input unit that inputs the document to be recognized as a binary image. 2 is a character area cutout unit that cuts out character areas one character at a time by obtaining the circumscribed rectangle of each character from the connectivity information of black pixels in the input binary image. 3 is a feature extraction unit that divides the character area into small areas using the vertical and horizontal division counts determined by the aspect ratio of the character area, and obtains the ratio of black pixels to white pixels in each small area as the graphic feature of the character area. 4 is a character identification unit that compares the extracted feature with dictionaries prepared in advance by a method such as similarity, and takes the character with the most similar feature as the recognition result for the character area. 5 to 7 are recognition dictionaries that store the graphic features of all characters to be recognized; they are selected according to how the character area was divided by the feature extraction unit 3. These recognition dictionaries are prepared in advance. 8 is an internal bus connecting units 1 to 4, and 9 is an internal bus connecting units 5 to 7.

[0012] The operation of the character recognition device of this embodiment, configured as above, is described below. FIG. 2 shows a flow chart of the overall processing, and FIG. 3 shows examples of character area division. The document to be recognized is input as a binary image by the image input unit 1 (process 10). The character area cutout unit 2 obtains the circumscribed rectangles of the characters from the connectivity information of black pixels in the binary image and cuts out the character areas one character at a time (process 11). The feature extraction unit 3 extracts the graphic feature of each character area (processes 12 to 18). This process is described below.
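The cutout step (process 11) can be sketched as a connected-component search that returns one circumscribed rectangle per group of connected black pixels. This is a minimal sketch under stated assumptions: the source does not specify the connectivity rule (8-connectivity is assumed here), the image representation (a list of rows of 0/1 with 1 = black), or how separate components of one character, such as the dot and stem of 'i', are merged, so no merging is attempted. The name `character_boxes` is hypothetical.

```python
from collections import deque


def character_boxes(image):
    """Return (top, left, bottom, right) boxes, inclusive, one per
    8-connected group of black pixels, in raster-scan order."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected component,
            # tracking the extremes of the visited coordinates.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box gives the character area from which the height and width, and hence the aspect ratio, can be read.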

[0013] First, the aspect ratio of the character area is obtained, and the division pattern used for feature extraction is determined from its value (processes 12 to 15). The character area is divided into a grid, and the numbers of vertical and horizontal divisions are set according to the aspect ratio.

[0014] In this description, three division patterns are used depending on the aspect ratio: "vertical: 12, horizontal: 5", "vertical: 8, horizontal: 8", and "vertical: 5, horizontal: 12". These are referred to as division patterns 1, 2, and 3, respectively. Each division pattern is shown in FIG. 3.

[0015] The division pattern is selected according to the aspect ratio as follows.

    aspect ratio > 2.0          division pattern 1
    0.5 ≤ aspect ratio ≤ 2.0    division pattern 2
    aspect ratio < 0.5          division pattern 3

The character area is divided into small areas using the division pattern so determined (process 16), and the area of black pixels in each small area is obtained (process 17). So that the extracted feature does not depend on the size of the character, the black pixel area is divided by the area of the small area to normalize it (process 18). The normalized values of the small areas constitute the graphic feature.
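Processes 12 to 18 can be sketched as follows. The thresholds and division counts are the ones given in the description. Everything else is an assumption for illustration: the aspect ratio is taken as height divided by width (inferred from tall characters receiving the 12-by-5 pattern), cell boundaries are rounded with integer division, an empty cell yields 0.0, and the names `select_pattern` and `extract_features` are hypothetical.

```python
def select_pattern(height, width):
    """Return (pattern number, (vertical divisions, horizontal divisions))."""
    ratio = height / width  # assumed definition of the aspect ratio
    if ratio > 2.0:
        return 1, (12, 5)   # tall character: fine vertically
    if ratio < 0.5:
        return 3, (5, 12)   # wide character: fine horizontally
    return 2, (8, 8)        # 0.5 <= ratio <= 2.0


def extract_features(char, rows, cols):
    """Split `char` (rows of 0/1, 1 = black) into a rows x cols grid and
    return black area / cell area for each cell, in row-major order."""
    height, width = len(char), len(char[0])
    features = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            area = (y1 - y0) * (x1 - x0)
            black = sum(char[y][x]
                        for y in range(y0, y1) for x in range(x0, x1))
            # Normalizing by the cell area makes the feature independent
            # of character size; a zero-area cell is reported as 0.0.
            features.append(black / area if area else 0.0)
    return features
```

A character area 24 pixels high and 10 pixels wide has ratio 2.4, so `select_pattern(24, 10)` picks pattern 1 and `extract_features` returns 12 x 5 = 60 values.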

[0016] Based on the graphic feature of the character area obtained by the above processing, the character identification unit 4 identifies which character the figure in the character area corresponds to (processes 19 to 23).

[0017] First, a recognition dictionary is selected according to the division pattern chosen during feature extraction (process 19). Three recognition dictionaries are prepared, one for each division pattern. Dictionaries A, B, and C are created in advance by extracting features from all characters to be recognized using division patterns 1, 2, and 3, respectively. Dictionary A is selected when the division pattern chosen during feature extraction is 1, dictionary B when it is 2, and dictionary C when it is 3.

[0018] Next, the graphic feature is compared with the selected dictionary by a method such as similarity, and the character having the most similar feature is taken as the recognition result (processes 20 to 23).
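Dictionary selection and matching can be sketched as below. The source only says "a method such as similarity", so cosine similarity is an assumed stand-in, not the measure the patent specifies. The dictionary layout (pattern number mapped to a character-to-feature-vector table) and the names `cosine_similarity` and `identify` are also hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(features, pattern, dictionaries):
    """Select the dictionary for `pattern` (1 -> A, 2 -> B, 3 -> C) and
    return the character whose stored feature is most similar."""
    dictionary = dictionaries[pattern]
    return max(dictionary,
               key=lambda ch: cosine_similarity(features, dictionary[ch]))
```

Because each dictionary was built with the same division pattern as the input feature, the vectors being compared always have the same length (60 values for patterns 1 and 3, 64 for pattern 2).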

[0019]

[Effects of the Invention] As described above, by reducing the loss of graphic features of characters and the influence of noise, the present invention suppresses misrecognition between similar characters and misrecognition due to noise, and improves the recognition rate.

[Brief description of drawings]

FIG. 1 is a configuration diagram of a character recognition device according to one embodiment of the present invention.

FIG. 2 is a flow chart of the overall character recognition processing of this embodiment.

FIG. 3 is a diagram showing the division patterns of the character area in this embodiment.

[Explanation of symbols]

1 Image input unit
2 Character area cutout unit
3 Feature extraction unit
4 Character identification unit
5-7 Recognition dictionaries
8-9 Internal buses
10-24 Processes in the flow chart
25-27 Character area division patterns

Claims

[Claims]

1. A character recognition device comprising: an image input unit that inputs a document to be recognized as a binary image; a character area cutout unit that obtains the circumscribed rectangles of characters from the connectivity information of black pixels in the input binary image and thereby cuts out character areas one character at a time; a feature extraction unit that divides the character area into small areas using the vertical and horizontal division counts determined by the aspect ratio of the character area, and obtains the ratio of black pixels to white pixels in each small area as the graphic feature of the character area; and a character identification unit that compares the extracted feature with a dictionary prepared in advance by a method such as similarity, and takes the character having the most similar feature as the recognition result for the character area.