
JPH0584553B2 - Character recognizing device - Google Patents

Info

Publication number
JPH0584553B2
JPH0584553B2 JP60159035A JP15903585A JPH0584553B2 JP H0584553 B2 JPH0584553 B2 JP H0584553B2 JP 60159035 A JP60159035 A JP 60159035A JP 15903585 A JP15903585 A JP 15903585A JP H0584553 B2 JPH0584553 B2 JP H0584553B2
Authority
JP
Japan
Prior art keywords
character
character pattern
sub
individual
pattern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP60159035A
Other languages
Japanese (ja)
Other versions
JPS6219990A (en)
Inventor
Masahiro Shimizu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1985-07-18
Filing date
1985-07-18
Publication date
1993-12-02
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP60159035A (patent/JPS6219990A/en)
Publication of JPS6219990A (patent/JPS6219990A/en)
Publication of JPH0584553B2 (patent/JPH0584553B2/ja)
Granted


Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application
The present invention relates to a character recognition device that recognizes printed and handwritten characters in newspapers, magazines, and similar documents and converts them into coded information such as JIS codes.

Prior Art
Conventional character recognition devices handle only documents with clearly defined character spacing, that is, documents in which the absolute positions of the characters to be read on the sheet are known in advance. This restricts the documents they can process.

To solve this problem, the present inventor previously proposed a character recognition device that cuts the character string to be recognized out of an input document as a rectangle of width w and height H, and then segments individual character patterns from the string by exploiting the fact that character widths are nearly constant.

Problems to Be Solved by the Invention
In practice, however, many characters contain a left-hand radical (hen, 偏) or a right-hand component (tsukuri, つくり). A method that segments individual characters using only a fixed width as its criterion confuses a hen or tsukuri with the neighboring character, causing segmentation errors. The present invention aims to solve this problem by providing a character recognition device that correctly segments individual characters from a character string, and recognizes them, even when they contain a hen or tsukuri.

Means for Solving the Problems
To solve these problems, the present invention scans the character string, cut out as a rectangle of width w and height H, perpendicular to the string direction to obtain a histogram, and detects the breaks between characters in the histogram to obtain sub-character patterns, the constituent elements of a character pattern. For example, the maximum of the rectangle height H and the widths $w_i$ of the sub-character patterns in the string is taken as the reference width A of a character pattern, and sub-character patterns are combined on the basis of A to extract individual character patterns. In determining the width $C_{wk}$ of each individual character pattern $P_k$, if a sub-character pattern $P_{si}$ is included in both of the adjacent individual character patterns $P_k$ and $P_{k+1}$, the widths $C_{wk}$ and $C_{w(k+1)}$ are compared and $P_{si}$ is assigned to the narrower of the two.

Operation
With the technical means described above, the present invention accurately segments even characters that contain a hen or tsukuri, so that they can be recognized.

Embodiment
An embodiment of the present invention is described below with reference to the drawings.

Fig. 1 is a block diagram of one embodiment of the character recognition device according to the invention. Image input unit 1 scans an image containing the characters to be recognized, inputs it as a binary signal, and stores it in image memory 2. Character string cutting unit 3 scans image memory 2 and cuts out a character string as a rectangle. Sub-character pattern extraction unit 4 scans the string cut out by unit 3 perpendicular to the string direction, computes a histogram of the character pixels, and extracts sub-character patterns, the constituent elements of character patterns. Individual candidate character pattern extraction unit 5 extracts individual candidate character patterns from combinations of the sub-character patterns extracted by unit 4. Individual character pattern determination unit 6 examines the individual character patterns obtained by unit 5 and, when a sub-character pattern is included in two adjacent individual character patterns at once, uniquely determines which of them it belongs to. Recognition unit 7 computes features such as strokes for each individual character pattern obtained by unit 6, compares them with the features of characters registered in advance in dictionary 8, and takes the most similar character as the recognition candidate. Display unit 9 displays the recognition result obtained by unit 7.

The operation of the character recognition device configured as above is described using the input image shown in Fig. 2 as an example.

The image shown in Fig. 2, input through image input unit 1, is binarized and stored in image memory 2. From the input image stored in image memory 2, character string cutting unit 3 cuts out a character string whose absolute position has been determined in advance, as the rectangle R shown in Fig. 3(a).

Next, sub-character pattern extraction unit 4 scans the string cut out by rectangle R perpendicular to the string direction to obtain the histogram shown in Fig. 3(b). It cuts out sub-character patterns, each consisting of a run of consecutive character columns, and finds the width $w_i$ ($i = 1, 2, \ldots, 8$) of each. Fig. 3(c) shows the extracted sub-character patterns $P_{s1}, P_{s2}, \ldots, P_{s8}$.
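The histogram step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes a horizontally written string stored as a list of rows of 0/1 pixels (1 = black), and the function names `column_histogram` and `extract_sub_patterns` are hypothetical. Each sub-character pattern is returned as a half-open column range, so its width $w_i$ is `end - start`.

```python
from typing import List, Optional, Tuple

# Hypothetical helper names; a sketch, not the patented implementation.


def column_histogram(image: List[List[int]]) -> List[int]:
    """Count black pixels (1s) in each column of a binary image.

    Assumes a horizontally written string, so scanning perpendicular to the
    string direction means summing down each column.
    """
    if not image:
        return []
    return [sum(row[x] for row in image) for x in range(len(image[0]))]


def extract_sub_patterns(hist: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges where the histogram is >= 1."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(hist):
        if count >= 1 and start is None:
            start = x                 # a run of character columns begins
        elif count == 0 and start is not None:
            runs.append((start, x))   # the run ended at column x - 1
            start = None
    if start is not None:             # a run touching the right edge
        runs.append((start, len(hist)))
    return runs
```

For example, the two-row image `[[1, 1, 0, 0, 1], [0, 1, 0, 1, 1]]` gives the histogram `[1, 2, 0, 1, 2]` and the sub-character patterns `[(0, 2), (3, 5)]`, i.e. two patterns of width 2.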

Individual candidate character pattern extraction unit 5 compares the widths $w_i$ of the sub-character patterns extracted by unit 4 with the height H of the string cut out by rectangle R, and takes the maximum as the reference value A. In Fig. 3(b), for example, H is the largest, so A = H. Unit 5 then combines adjacent sub-character patterns: if the sub-character pattern widths $w_i$ and the gaps $b_i$ between sub-character patterns satisfy

$\left|\sum w_i + \sum b_i - A\right| \le \alpha$  ($\alpha$: a constant),

the adjacent sub-character patterns are merged into one individual candidate character pattern. This yields the individual candidate character patterns $P_1, P_2, \ldots, P_7$ shown in Fig. 4(a).
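A sketch of the reference width and the merging condition, under stated assumptions: sub-character patterns are given as half-open column ranges (start, end) sorted left to right, $\alpha$ is an integer pixel tolerance chosen by the caller, and, because the text does not say how overlapping combinations are enumerated, the sketch simply lists every run of consecutive sub-character patterns that satisfies the condition. The function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical helper names; the enumeration strategy is an assumption.


def reference_width(widths: List[int], height: int) -> int:
    """Reference width A: the largest of the string height H and every w_i."""
    return max([height, *widths])


def candidate_patterns(runs: List[Tuple[int, int]], height: int,
                       alpha: int) -> List[Tuple[int, int]]:
    """Return (i, j) pairs meaning sub-patterns i..j form one candidate.

    runs are half-open (start, end) column ranges, sorted left to right.
    For consecutive sub-patterns i..j, sum(w) + sum(b) telescopes to
    runs[j][1] - runs[i][0], since the gaps b fill the space between them.
    """
    widths = [end - start for start, end in runs]
    a = reference_width(widths, height)
    candidates: List[Tuple[int, int]] = []
    for i in range(len(runs)):
        for j in range(i, len(runs)):
            span = runs[j][1] - runs[i][0]   # sum of w and b from i to j
            if span > a + alpha:
                break                         # the span only grows with j
            if abs(span - a) <= alpha:
                candidates.append((i, j))
    return candidates
```

With runs `[(0, 3), (4, 7), (8, 11)]`, H = 7 and α = 0, A is 7 and the candidates are `[(0, 1), (1, 2)]`: the middle sub-pattern belongs to both, which is the kind of overlap unit 6 must resolve.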

If, among the individual character patterns obtained by unit 5, a sub-character pattern $P_{si}$ is included in two adjacent individual character patterns at once, individual character pattern determination unit 6 computes the distances between $P_{si}$ and its left and right neighboring sub-character patterns, and assigns $P_{si}$ to the individual character pattern containing the nearer neighbor. For example, in Fig. 4(a), sub-character pattern $P_{s4}$ is included in both candidates $P_3$ and $P_4$; because the width $C_{w3}$ is narrower than $C_{w4}$, $P_{s4}$ is treated as part of $P_3$. Likewise, $P_{s5}$ is included in both $P_4$ and $P_5$; because $C_{w5}$ is narrower than $C_{w4}$, $P_{s5}$ is treated as part of $P_5$. The individual character patterns are thus determined as shown in Fig. 4(b).
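The overlap rule can be sketched as below. It follows the width comparison used in the Fig. 4(a) example and in claim 1 (a shared sub-character pattern joins the narrower candidate). Candidate widths are the original spans $C_{wk}$, and sending ties to the left candidate is an assumption, since the text does not cover equal widths. The function name is hypothetical.

```python
from typing import List, Tuple

# Hypothetical helper name; tie handling is an assumption.


def resolve_shared(runs: List[Tuple[int, int]],
                   candidates: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Give each sub-pattern shared by adjacent candidates to the narrower one.

    runs: half-open (start, end) column ranges of the sub-character patterns.
    candidates: (i, j) index ranges into runs, sorted left to right.
    Widths C_wk are the original candidate spans; ties go left (an assumption).
    """
    widths = [runs[j][1] - runs[i][0] for i, j in candidates]  # C_wk
    result = [list(c) for c in candidates]
    for k in range(len(result) - 1):
        left, right = result[k], result[k + 1]
        if left[1] < right[0]:
            continue                      # no sub-pattern is shared
        if widths[k] <= widths[k + 1]:
            right[0] = left[1] + 1        # shared part stays with the left candidate
        else:
            left[1] = right[0] - 1        # shared part moves to the right candidate
    # Drop any candidate that lost all of its sub-patterns.
    return [(i, j) for i, j in result if i <= j]
```

With runs `[(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]` and candidates `[(0, 1), (1, 3), (3, 4)]` (widths 5, 8 and 5), the middle candidate is the widest, so both of its shared sub-patterns go to its neighbors and the result is `[(0, 1), (2, 2), (3, 4)]`, mirroring the Fig. 4 example.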

For each individual character pattern $P_i$ obtained by unit 6, recognition unit 7 checks whether M or more pixels, including the pixel of interest, are connected in each of the directions indicated by the arrows in Fig. 5(b), and sets direction codes accordingly. It then examines pixel connectivity for each direction code to extract strokes, and extracts features such as the number, position, and length of the strokes. Fig. 5(a) shows the strokes extracted for the character 文 (bun). The extracted features are compared with those registered in dictionary 8, and the most similar character is taken as the recognition candidate and displayed on display unit 9.
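The direction-code step might look like the following sketch. Fig. 5(b) is not reproduced here, so the four line directions in `DIRECTIONS`, the 8-connectivity used to group pixels into strokes, and all function names are assumptions; only the rule that a pixel receives a direction code when at least M black pixels, including itself, are connected in that direction comes from the text.

```python
from typing import Dict, List, Set, Tuple

# Assumed direction set (Fig. 5(b) is not reproduced here): code -> (dx, dy).
# All names below are hypothetical.
DIRECTIONS: Dict[int, Tuple[int, int]] = {
    0: (1, 0),   # horizontal
    1: (0, 1),   # vertical
    2: (1, 1),   # diagonal, down-right
    3: (1, -1),  # diagonal, up-right
}

Pixel = Tuple[int, int]  # (x, y), addressing img[y][x]


def run_length(img: List[List[int]], x: int, y: int, dx: int, dy: int) -> int:
    """Number of consecutive black pixels through (x, y) along +/-(dx, dy)."""
    h, w = len(img), len(img[0])
    n = 1  # the pixel of interest itself
    for sign in (1, -1):
        cx, cy = x + sign * dx, y + sign * dy
        while 0 <= cx < w and 0 <= cy < h and img[cy][cx] == 1:
            n += 1
            cx, cy = cx + sign * dx, cy + sign * dy
    return n


def direction_codes(img: List[List[int]], m: int) -> Dict[Pixel, Set[int]]:
    """Give each black pixel every direction code whose run length is >= m."""
    return {
        (x, y): {c for c, (dx, dy) in DIRECTIONS.items()
                 if run_length(img, x, y, dx, dy) >= m}
        for y, row in enumerate(img)
        for x, v in enumerate(row)
        if v == 1
    }


def count_strokes(codes: Dict[Pixel, Set[int]], code: int) -> int:
    """Count 8-connected groups of pixels carrying the given direction code."""
    pixels = {p for p, cs in codes.items() if code in cs}
    seen: Set[Pixel] = set()
    strokes = 0
    for start in pixels:
        if start in seen:
            continue
        strokes += 1                      # a new, unvisited stroke
        seen.add(start)
        stack = [start]
        while stack:                      # flood-fill the stroke
            x, y = stack.pop()
            for nx in (x - 1, x, x + 1):
                for ny in (y - 1, y, y + 1):
                    q = (nx, ny)
                    if q in pixels and q not in seen:
                        seen.add(q)
                        stack.append(q)
    return strokes
```

On a 5 × 5 plus sign with M = 3, the center pixel receives codes {0, 1}, and there is one horizontal stroke, one vertical stroke, and no diagonal strokes. Stroke counts per code are only one of the features the text mentions; position and length would be computed from the same pixel groups.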

Effects of the Invention
According to the present invention, individual character patterns can be extracted accurately from a character string to be recognized even when the string contains characters with a hen or tsukuri, which improves character recognition accuracy.

For example, in Fig. 6(a), the target characters "1列" are decomposed into four sub-patterns $P_{s10}$, $P_{s11}$, $P_{s12}$, and $P_{s13}$, and the individual candidate character patterns $P_{10}$ and $P_{11}$ are obtained as shown in Fig. 6(b), with sub-character pattern $P_{s2}$ included in both $P_{10}$ and $P_{11}$. In this case, if the distances to the sub-character patterns adjacent to $P_{s2}$ are not compared, the two individual character patterns $P_{12}$ and $P_{13}$ shown in Fig. 6(c) are obtained and a segmentation error results; comparing those distances correctly yields the individual character patterns $P_{14}$ and $P_{15}$ shown in Fig. 6(d).

[Brief Description of the Drawings]

Fig. 1 is a block diagram of a character recognition device according to an embodiment of the present invention; Fig. 2 shows an example of an input image; Fig. 3 illustrates the method of cutting sub-character patterns out of a character string; Fig. 4 shows the result of cutting out individual character patterns; Fig. 5 illustrates the character recognition method; and Fig. 6 shows results of cutting out individual character patterns.

1: image input unit; 2: image memory; 3: character string cutting unit; 4: sub-character pattern extraction unit; 5: individual candidate character pattern extraction unit; 6: individual character pattern determination unit; 7: recognition unit; 8: dictionary; 9: display unit.

Claims (1)

[Claims]
1. A character recognition device comprising:
an image input unit that inputs an image containing characters to be recognized;
a character string cutting unit that cuts out, from the image input by the image input unit, a character string, that is, a set of characters to be recognized, as a rectangle of width w and height H;
a sub-character pattern extraction unit that scans the rectangle perpendicular to the string direction to obtain a histogram of the pixels forming characters, and extracts sub-character patterns each consisting of consecutive character portions in which the histogram value is 1 or more;
an individual candidate character pattern extraction unit that, letting $w_i$ be the width of a sub-character pattern $P_{si}$ obtained by the sub-character pattern extraction unit and $d_i$ be the distance between $P_{si}$ and the adjacent sub-character pattern $P_{s(i+1)}$, treats $P_{si}, P_{s(i+1)}, \ldots, P_{sj}$ as one individual candidate character pattern $P_k$ if the width $w_{i,j} = w_i + d_i + w_{i+1} + \cdots + w_j$ spanning $P_{si}$ through $P_{sj}$ is at most the character reference width A, and takes $w_{i,j}$ as the width $C_{wk}$ of $P_k$;
an individual character pattern determination unit that, when a sub-character pattern $P_{si}$ obtained by the sub-character pattern extraction unit is included in both of the adjacent individual candidate character patterns $P_k$ and $P_{k+1}$, uniquely determines the individual character pattern to which $P_{si}$ belongs by comparing the widths $C_{wk}$ and $C_{w(k+1)}$ of $P_k$ and $P_{k+1}$; and
a recognition unit that computes features of the character patterns obtained by the individual character pattern determination unit and extracts recognition candidate characters by matching those features against a dictionary.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP60159035A JPS6219990A (en) 1985-07-18 1985-07-18 Character recognizing device

Publications (2)

Publication Number Publication Date
JPS6219990A (en) 1987-01-28
JPH0584553B2 1993-12-02

Family

ID=15684825

Family Applications (1)

Application Number Title Priority Date Filing Date
JP60159035A Granted JPS6219990A (en) 1985-07-18 1985-07-18 Character recognizing device

Country Status (1)

Country Link
JP (1) JPS6219990A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63223890A (en) * 1987-03-12 1988-09-19 Toshiba Corp Drawing reader
JPH01271890A (en) * 1988-04-22 1989-10-30 Nec Corp Optical character reading device

Also Published As

Publication number Publication date
JPS6219990A (en) 1987-01-28

Similar Documents

Publication Publication Date Title
EP0138445B1 (en) Method and apparatus for segmenting character images
KR900007009B1 (en) Character recognition device
JPH05242292A (en) Separating method
KR100582039B1 (en) Character recognition device
JPH0584553B2 (en)
JP2661898B2 (en) Character recognition device
JPS6316392A (en) Character recognizing device
JPH0782525B2 (en) Character recognition device
JP3914119B2 (en) Character recognition method and character recognition device
JP2697790B2 (en) Character type determination method
JPH0576671B2 (en)
JP2537973B2 (en) Character recognition device
JPS6316391A (en) Character recognizing device
JP2993533B2 (en) Information processing device and character recognition device
JPS6330991A (en) Character recognizing device
JPH0797390B2 (en) Character recognition device
JPS62219187A (en) Character recognizing device
JPH0584552B2 (en)
JPS63271588A (en) Character recognition device
JP2918363B2 (en) Character classification method and character recognition device
JPH0415776A (en) How to extract font size information
JPS63221495A (en) Character recognizing device
JPS62251888A (en) Character recognizing device
JPH0664628B2 (en) Character recognition device
JPH05174114A (en) Information processor and character recognizing device using the same

Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term