JPH05108882A - Character recognition device - Google Patents

Character recognition device

Info

Publication number
JPH05108882A
Authority
JP
Japan
Prior art keywords
character
character area
area
feature
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP3269243A
Other languages
Japanese (ja)
Inventor
Ryoichi Yushimo
良一 湯下
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP3269243A
Publication of JPH05108882A
Legal status: Pending

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

(57) [Abstract]
[Purpose] To improve the recognition rate by reducing the effects of missing graphic features of characters and of noise.
[Constitution] A character area is cut out from a binary image input by an image input unit. The character area is divided into small areas using numbers of vertical and horizontal divisions determined by the aspect ratio of the character area, and the ratio of black pixels to white pixels in each small area is obtained as the graphic feature of the character area. The recognition result is obtained by comparing this graphic feature with a recognition dictionary by a method such as similarity.
[Effect] The division pattern of the character area in feature extraction is determined by the aspect ratio of the character area: vertically elongated characters are divided finely in the vertical direction and coarsely in the horizontal direction, and horizontally elongated characters are divided finely in the horizontal direction and coarsely in the vertical direction. This reduces the effects of missing graphic features and of noise and improves the recognition rate.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a character recognition device that recognizes characters.

[0002]

[Prior Art] A conventional character recognition device takes the circumscribed rectangle of a character as the character area, divides the character area into small areas using fixed numbers of vertical and horizontal divisions, and obtains the ratio of black pixels to white pixels in each small area as the graphic feature of the character area.

[0003] When the numbers of vertical and horizontal divisions are fixed, the amount of information in the vertical direction differs greatly from that in the horizontal direction for vertically elongated characters such as 'I' and '1' and for horizontally elongated characters such as '-' and '='. This has a considerable adverse effect on the recognition result.

[0004] For example, in recognizing 'i', a major feature is that the upper and lower islands are separated. This feature is vertical information, and it tends not to appear unless the number of vertical divisions is increased and the division width is reduced. In addition, because 'i' has few features in the horizontal direction, it becomes susceptible to noise unless the number of horizontal divisions is reduced and the division width is increased.
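The effect of the vertical division count on 'i' can be illustrated with a small sketch. The 12x2 glyphs and the helper `row_band_density` are hypothetical and are not taken from the patent; they only show how a coarse vertical division blurs the gap between the dot and the stem, while a finer one exposes it as an empty band.

```python
def row_band_density(glyph, bands):
    """Fraction of black pixels in each of `bands` horizontal bands of a binary glyph."""
    height = len(glyph)
    width = len(glyph[0])
    densities = []
    for b in range(bands):
        top = b * height // bands
        bottom = (b + 1) * height // bands
        black = sum(sum(row) for row in glyph[top:bottom])
        densities.append(black / ((bottom - top) * width))
    return densities


# Hypothetical 12x2 glyph of 'i': 2 rows of dot, 2 rows of gap, 8 rows of stem.
glyph_i = [[1, 1]] * 2 + [[0, 0]] * 2 + [[1, 1]] * 8
# Hypothetical 12x2 glyph of a plain bar such as 'I' or '1'.
glyph_bar = [[1, 1]] * 12

coarse_i = row_band_density(glyph_i, 3)  # the gap is averaged into a half-filled band
fine_i = row_band_density(glyph_i, 6)    # the gap appears as a completely empty band
coarse_bar = row_band_density(glyph_bar, 3)
```

With 3 bands the only difference between the two glyphs is one band at 0.5 instead of 1.0, which a little noise could also produce. With 6 bands the gap occupies a band of its own.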

[0005] Therefore, for 'i' it is desirable to increase the number of vertical divisions and decrease the number of horizontal divisions, but exactly the opposite holds for '-'. As long as the same division is applied to all characters, some characters will be adversely affected.

[0006] As a result, if a sufficient number of vertical divisions cannot be set for 'i', 'i' is often misrecognized as 'I' or '1'.

[0007]

[Problem to Be Solved by the Invention] As described above, the conventional character recognition device divides the character area in feature extraction in the same way regardless of the target, which leads to the loss of important features and a greater influence of noise.

[0008]

[Means for Solving the Problem] The present invention has been made in view of the above problems. The division pattern of the character area in feature extraction is determined by the aspect ratio of the character area: vertically elongated characters are divided finely in the vertical direction and coarsely in the horizontal direction, and horizontally elongated characters are divided finely in the horizontal direction and coarsely in the vertical direction. This reduces the effects of missing graphic features of characters and of noise and improves the recognition rate.

[0009]

[Operation] By reducing the effects of missing graphic features of characters and of noise, mutual misrecognition of similar characters and misrecognition caused by noise are suppressed, and the recognition rate is improved.

[0010]

[Embodiment] The present invention will be described with reference to the accompanying drawings, which show one embodiment.

[0011] In FIG. 1, 1 is an image input unit that inputs the document to be recognized as a binary image. 2 is a character area cutout unit that cuts out character areas one character at a time by obtaining the circumscribed rectangle of each character from the connectivity information of black pixels in the input binary image. 3 is a feature extraction unit that divides the character area into small areas using the numbers of vertical and horizontal divisions determined by the aspect ratio of the character area, and obtains the ratio of black pixels to white pixels in each small area as the graphic feature of the character area. 4 is a character identification unit that compares the extracted features with a dictionary prepared in advance by a method such as similarity, and takes the character with the most similar features as the recognition result for the character area. 5 to 7 are recognition dictionaries that store the graphic features of all characters to be recognized; the dictionary used depends on how the character area was divided by the feature extraction unit 3. These recognition dictionaries are prepared in advance. 8 is an internal bus connecting units 1 to 4, and 9 is an internal bus connecting units 5 to 7.

[0012] The operation of the character recognition device of this embodiment, configured as described above, is explained below. FIG. 2 shows a flow chart of the overall processing, and FIG. 3 shows examples of character area division. The document to be recognized is input as a binary image by the image input unit 1 (process 10). The character area cutout unit 2 obtains the circumscribed rectangle of each character from the connectivity information of black pixels in the binary image and cuts out character areas one character at a time (process 11). The feature extraction unit 3 extracts the graphic features of the character area (processes 12 to 18). This process is described below.
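The cutout step (process 11) might be sketched as a connected-component search. The text only says that circumscribed rectangles are obtained from the connectivity of black pixels, so the use of 8-connectivity, the breadth-first traversal, and the name `circumscribed_rectangles` are assumptions made for illustration. Characters made of several components, such as 'i', would need an additional merging step that the text does not describe.

```python
from collections import deque


def circumscribed_rectangles(image):
    """Return inclusive (top, left, bottom, right) boxes, one per 8-connected black component.

    `image` is a list of rows of 0 (white) and 1 (black). 8-connectivity is an assumption.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Start a new component and grow its bounding box while traversing it.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# A tall bar next to a wide bar: two separate components.
page = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]
boxes = circumscribed_rectangles(page)
```

Each box gives the height and width from which the aspect ratio of the character area can then be computed.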

[0013] First, the aspect ratio of the character area is obtained, and the division pattern of the character area for feature extraction is determined from the value of the aspect ratio (processes 12 to 15). The character area is divided into a grid, and the numbers of vertical and horizontal divisions are set according to the aspect ratio.

[0014] In this description, three division patterns are used depending on the aspect ratio: "vertical: 12, horizontal: 5", "vertical: 8, horizontal: 8", and "vertical: 5, horizontal: 12". These are referred to as division patterns 1, 2, and 3, respectively. Each division pattern is shown in FIG. 3.
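The three division patterns can be written down as a small lookup table. The division counts come from the text; the names `DIVISION_PATTERNS` and `feature_length` are hypothetical. The sketch also shows that the patterns yield feature vectors of different lengths, so features extracted with different patterns are not directly comparable.

```python
# (vertical divisions, horizontal divisions) for division patterns 1, 2, and 3.
DIVISION_PATTERNS = {
    1: (12, 5),   # vertically elongated characters
    2: (8, 8),    # roughly square characters
    3: (5, 12),   # horizontally elongated characters
}


def feature_length(pattern):
    """Number of small areas, and hence feature values, produced by a pattern."""
    vertical, horizontal = DIVISION_PATTERNS[pattern]
    return vertical * horizontal


lengths = {pattern: feature_length(pattern) for pattern in DIVISION_PATTERNS}
```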

[0015] The division pattern is determined from the aspect ratio as follows.

Aspect ratio > 2.0: division pattern 1
0.5 ≤ aspect ratio ≤ 2.0: division pattern 2
Aspect ratio < 0.5: division pattern 3

The character area is divided into small areas using the division pattern determined above (process 16), and the area of black pixels in each small area is obtained (process 17). So that the extracted features do not depend on the size of the character, the black pixel area is divided by the area of the small area to normalize it (process 18). The normalized value of each small area is used as the graphic feature.
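Processes 12 to 18 might look roughly like the following sketch. The thresholds and division counts come from the text. Treating the aspect ratio as height divided by width, splitting cell boundaries by integer division, and returning 0.0 for an empty cell (when the character area is smaller than the grid) are assumptions, as are the function names.

```python
GRID = {1: (12, 5), 2: (8, 8), 3: (5, 12)}  # pattern -> (vertical, horizontal) divisions


def select_division_pattern(height, width):
    """Pick division pattern 1, 2, or 3 from the aspect ratio (assumed height / width)."""
    aspect_ratio = height / width
    if aspect_ratio > 2.0:
        return 1
    if aspect_ratio < 0.5:
        return 3
    return 2


def extract_features(region):
    """Return (pattern, features) for a binary character area given as rows of 0/1."""
    height, width = len(region), len(region[0])
    pattern = select_division_pattern(height, width)
    rows, cols = GRID[pattern]
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            black = sum(region[y][x] for y in range(top, bottom) for x in range(left, right))
            # Normalize by the cell area so the feature does not depend on character size.
            features.append(black / area if area else 0.0)
    return pattern, features


tall = [[1] * 5 for _ in range(24)]                  # 24 x 5, entirely black
half = [[1] * 8 + [0] * 8 for _ in range(16)]        # 16 x 16, left half black
tall_pattern, tall_features = extract_features(tall)
half_pattern, half_features = extract_features(half)
```

Here `tall` falls under pattern 1 and `half` under pattern 2, and the first row of cells in `half_features` is four values of 1.0 followed by four values of 0.0.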

[0016] Based on the graphic features of the character area obtained by the above processing, the character identification unit 4 identifies which character the figure in the character area corresponds to (processes 19 to 23).

[0017] First, a recognition dictionary is selected according to the division pattern chosen during feature extraction (process 19). Three recognition dictionaries are prepared, one for each division pattern. Dictionaries A, B, and C are created in advance by extracting features from all characters to be recognized using division patterns 1, 2, and 3, respectively. Dictionary A is selected when the division pattern chosen during feature extraction is 1, dictionary B when it is 2, and dictionary C when it is 3.
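Dictionary selection (process 19) amounts to a lookup keyed by the division pattern. In this sketch each dictionary is modeled as a mapping from a character to its stored feature vector; that representation, the tiny placeholder vectors, and the name `select_dictionary` are assumptions for illustration only.

```python
# Division pattern -> dictionary name, as stated in the text.
DICTIONARY_FOR_PATTERN = {1: "A", 2: "B", 3: "C"}


def select_dictionary(pattern, dictionaries):
    """Return the recognition dictionary built with the same division pattern."""
    return dictionaries[DICTIONARY_FOR_PATTERN[pattern]]


# Placeholder dictionaries; real ones would hold 60- or 64-value feature vectors
# for every character to be recognized.
dictionaries = {
    "A": {"i": [1.0, 0.0, 1.0], "l": [1.0, 1.0, 1.0]},
    "B": {"o": [0.5, 0.5, 0.5]},
    "C": {"-": [1.0, 1.0, 0.0]},
}
chosen = select_dictionary(1, dictionaries)
```

Because every dictionary covers all characters under one division pattern, the extracted features are always compared with templates of the same length and layout.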

[0018] Next, the graphic features are compared with the selected dictionary by a method such as similarity, and the character with the most similar features is taken as the recognition result (processes 20 to 23).
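The comparison step might be sketched as a nearest-template search. The text does not specify the similarity measure, so cosine similarity is used here purely as a stand-in, and the four-value vectors and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(features, dictionary):
    """Return the character whose stored features are most similar to `features`."""
    return max(dictionary, key=lambda ch: cosine_similarity(features, dictionary[ch]))


# Hypothetical templates: 'i' has an empty second band, 'l' does not.
dictionary = {
    "i": [1.0, 0.0, 1.0, 1.0],
    "l": [1.0, 1.0, 1.0, 1.0],
}
noisy_i = [0.9, 0.1, 1.0, 0.8]
result = identify(noisy_i, dictionary)
```

Any other measure that ranks templates by closeness, such as Euclidean distance, could be substituted without changing the structure of the search.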

[0019]

[Effect of the Invention] As described above, by reducing the effects of missing graphic features of characters and of noise, the present invention suppresses mutual misrecognition of similar characters and misrecognition caused by noise, and improves the recognition rate.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of a character recognition device according to an embodiment of the present invention.

FIG. 2 is an overall flow chart of the character recognition processing of this embodiment.

FIG. 3 is a diagram showing the division patterns of the character area in this embodiment.

[Explanation of Symbols]

1 Image input unit
2 Character area cutout unit
3 Feature extraction unit
4 Character identification unit
5 to 7 Recognition dictionaries
8 to 9 Internal buses
10 to 24 Processes in the flow chart
25 to 27 Character area division patterns

Claims (1)

[Claims]

[Claim 1] A character recognition device comprising: an image input unit that inputs a document to be recognized as a binary image; a character area cutout unit that cuts out character areas one character at a time by obtaining the circumscribed rectangle of each character from the connectivity information of black pixels in the input binary image; a feature extraction unit that divides the character area into small areas using numbers of vertical and horizontal divisions determined by the aspect ratio of the character area, and obtains the ratio of black pixels to white pixels in each small area as the graphic feature of the character area; and a character identification unit that compares the extracted features with a dictionary prepared in advance by a method such as similarity, and takes the character with the most similar features as the recognition result for the character area.
JP3269243A 1991-10-17 1991-10-17 Character recognition device Pending JPH05108882A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3269243A JPH05108882A (en) 1991-10-17 1991-10-17 Character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3269243A JPH05108882A (en) 1991-10-17 1991-10-17 Character recognition device

Publications (1)

Publication Number Publication Date
JPH05108882A true JPH05108882A (en) 1993-04-30

Family

ID=17469644

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3269243A Pending JPH05108882A (en) 1991-10-17 1991-10-17 Character recognition device

Country Status (1)

Country Link
JP (1) JPH05108882A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6188790B1 (en) 1996-02-29 2001-02-13 Tottori Sanyo Electric Ltd. Method and apparatus for pre-recognition character processing
KR100498683B1 (en) * 1997-12-19 2005-09-09 마츠시타 덴끼 산교 가부시키가이샤 A character recognition apparatus, a character recognition method, and a computer-readable storage medium recording a character recognition apparatus

Similar Documents

Publication Publication Date Title
JPH05108882A (en) Character recognition device
JPH02116987A (en) Character recognizing device
JPH06187489A (en) Character recognizing device
JP2675303B2 (en) Character recognition method
JP2917427B2 (en) Drawing reader
JP2788506B2 (en) Character recognition device
JP2612383B2 (en) Character recognition processing method
JPH0916713A (en) Image area dividing method
JP2995818B2 (en) Character extraction method
JP2902097B2 (en) Information processing device and character recognition device
JP2918363B2 (en) Character classification method and character recognition device
JP2993252B2 (en) Homomorphic character discrimination method and apparatus
JPH0573718A (en) Area attribute identification method
JPH01124082A (en) Character recognizing device
JP3193573B2 (en) Character recognition device with brackets
JPH1021332A (en) Non-linear normalizing method
JP2974396B2 (en) Image processing method and apparatus
JPH03217993A (en) Character size recognizer
JP2972443B2 (en) Character recognition device
JPH06231306A (en) Character recognition device
JP2832035B2 (en) Character recognition device
JPH05108880A (en) English character recognition device
JPH0371380A (en) Character recognizing device
JPH05135204A (en) Character recognition device
JPS63131287A (en) Character recognition system