JPH01154296A

JPH01154296A - Character segmenting method

Info

Publication number: JPH01154296A
Application number: JP62314753A
Authority: JP
Inventors: Takafumi Enami; 隆文枝並
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-12-10
Filing date: 1987-12-10
Publication date: 1989-06-16

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

[Table of Contents] Overview; Industrial Field of Application; Prior Art; Problems to Be Solved by the Invention; Means for Solving the Problems; Operation; Embodiments; Effects of the Invention

[Overview] The present invention relates to a character segmenting method that cuts out, for character recognition, printed characters on a paper sheet that have been converted into image data. Its object is to provide a method that improves recognition accuracy. To this end, the positions of both ends of the character contained in a one-character region are detected from the image data of that region; a rectangular cut-out window is set on the image data, with blank space provided outside the character on the side determined by the two detected positions, in an amount that depends on those positions; and the image data inside the window is extracted as cut-out character data.

[Industrial Field of Application] The present invention relates to a character segmenting method for cutting out printed characters on a paper sheet that have been converted into image data.

Characters printed on a paper sheet can be recognized by reading the sheet optically with an image scanner or similar device, cutting each character out of the resulting image data, and matching each cut-out character against a dictionary.

[Prior Art] Fig. 8 illustrates a conventional character segmenting method. The full-width character "Ａ", the half-width character "A", the symbol "′″" and the symbol "〜" shown in Fig. 8 (A), (B), (C) and (D) were each cut out with a rectangular window in which the character is inscribed, touching all four sides, as shown in Fig. 8 (E), (F), (G) and (H).

Because the character to be recognized is thus inscribed in the rectangular cut-out window on all sides, stable character recognition is possible.

[Problems to Be Solved by the Invention] In the conventional method, however, as can be seen from Fig. 8 (E), (F), (G) and (H), the size and position of a character are not contained in the cut-out image data: only the shape of the character inscribed in the window is available for recognition. It is therefore difficult to distinguish, for example, between (G) and (H), and recognition errors occur.

That is, ordinary documents contain more non-kanji characters than kanji, for which a roughly square window gives sufficient results. The non-kanji characters include small kana (撥音便) and symbols whose identity is conveyed by their size and position, and almost all of these are misrecognized.

The present invention was made in view of the above conventional problems. Its object is to provide a character segmenting method that can accurately recognize small kana, symbols and other characters whose identity is conveyed by their size, position and the like.

[Means for Solving the Problems] To achieve the above object, the method according to the present invention is configured as shown in Fig. 1.

First, in step 10 of Fig. 1, the positions of both ends of the character contained in a one-character region are detected from the image data of that region.

In the present invention, the left and right ends, the top and bottom ends, or both pairs of end positions of the character are detected; Fig. 1 shows the case in which both the top/bottom and the left/right end positions are detected.
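As a rough illustration of step 10, the sketch below finds the end positions of a character by looking for the rows and columns of the one-character region that contain black pixels. It assumes a binarized region stored as a list of rows (1 = black, 0 = white) and returns half-open positions; the function name end_positions and this data layout are our own assumptions, not details from the source.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of pixels: 1 = black, 0 = white (assumed layout)


def end_positions(region: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) of the black pixels in a region.

    Positions are half-open: rows top..bottom-1 and columns left..right-1
    contain the character. Returns None when the region is blank.
    """
    rows = [y for y, row in enumerate(region) if any(row)]
    if not rows:
        return None
    width = len(region[0])
    cols = [x for x in range(width) if any(row[x] for row in region)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1
```

Detecting only one pair of ends (top/bottom or left/right), as the text allows, amounts to using just the rows or just the columns result.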

Once the end positions of the character have been detected, in the next step 12 a rectangular cut-out window is set on the image data, with blank space provided outside the character beyond those positions, in an amount that depends on the two positions.

The side on which the blank space is provided is determined by the two detected positions; in Fig. 1 it is provided on the side opposite to the one toward which the character is offset.

In the final step 14, the image data inside the window set in step 12 is extracted as cut-out character data, and the extracted data is used to recognize the character.

[Operation] In the present invention, the two detected positions of a character indicate its size and position, and the blank space provided according to them carries that size and position information. All of the characters shown in Fig. 1 are therefore recognized accurately, regardless of their size or position.

[Embodiments] Preferred embodiments of the method according to the present invention are described below with reference to the drawings.

Fig. 2 illustrates a system to which the present invention is applied: the printed text on a paper sheet 20 is read optically by a scanner 22, and a computer 24 performs character recognition of the printed text from the resulting image data.

For this recognition, each character is cut out as shown in Fig. 2; the procedure is shown in Fig. 3.

In this embodiment, the character height and the horizontal interval of one character are first detected for each line, and the one-character regions are determined from them.

This determination is illustrated in Fig. 4, in which one line contains the characters "・", "、", "−", "A", "　", "B", "、", "c" and "ビ".

In this line, the top position cup and bottom position cdwn of the character "A", which has the largest vertical dimension, are detected and taken as the line top position lup and line bottom position ldwn. Because a character region is in principle square, the distance between lup and ldwn is set both as the height of one character in that line and as its horizontal interval bc-bc.
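The line-level rule can be sketched as follows, assuming the vertical extents (cup, cdwn) of every character in the line are already known. The function name line_region is hypothetical; it takes the tallest character's extents as lup and ldwn and reuses their difference as the width of a square one-character region, as the text describes.

```python
from typing import List, Tuple


def line_region(char_extents: List[Tuple[int, int]]) -> Tuple[int, int, int]:
    """Return (lup, ldwn, h) for one line.

    char_extents holds (cup, cdwn) for each character in the line. The
    character with the largest vertical dimension defines the line top and
    bottom; because a region is assumed square, h = ldwn - lup serves as both
    the character height and the horizontal interval bc-bc.
    """
    lup, ldwn = max(char_extents, key=lambda e: e[1] - e[0])
    return lup, ldwn, ldwn - lup
```

For example, with a small mark at (18, 22), a tall "A" at (4, 36) and a low mark at (30, 38), the line spans 4 to 36 and each one-character region is 32 units wide.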

Next, the top and bottom positions (cup, cdwn) of each character are detected (step 31), and the vertical center between them is obtained (step 32). The positions (ctop, cbottom), each half the character height outward from that center, are then obtained (step 33). It is judged whether these go beyond the line top and bottom positions lup and ldwn (steps 34 and 35), and if they do, they are limited to lup and ldwn, respectively (steps 36 and 37).

The vertical positions (ctop, cbottom) determined in this way then start moving inward (step 38). When it is confirmed that one of them has reached the character's top position cup or bottom position cdwn (YES in step 39), or that the amount of movement has reached half the character height (YES in step 40), the movement is stopped (step 41), and the upper stop position ctop and lower stop position cbottom at that moment are stored as the vertical cut-out positions of the character.
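Read literally, steps 31 to 41 can be sketched as below. This is a minimal sketch under stated assumptions: integer pixel coordinates, a one-pixel step per iteration, both edges moving together, and "half the character height" in step 33 taken as (ldwn - lup) // 2. The source does not pin down the stop distance of step 40, so it is exposed as the hypothetical parameter max_move (defaulting to the same value); other readings change the result for vertically centered marks.

```python
from typing import Optional, Tuple


def vertical_cut(cup: int, cdwn: int, lup: int, ldwn: int,
                 max_move: Optional[int] = None) -> Tuple[int, int]:
    """Return the stored vertical cut-out positions (ctop, cbottom).

    cup, cdwn: top and bottom of the character (step 31)
    lup, ldwn: top and bottom of the line; character height h = ldwn - lup
    max_move:  stop distance of step 40 (assumed h // 2 when not given)
    """
    h = ldwn - lup
    if max_move is None:
        max_move = h // 2
    center = (cup + cdwn) // 2              # step 32 (floor rounding assumed)
    ctop = max(center - h // 2, lup)        # steps 33, 34, 36
    cbottom = min(center + h // 2, ldwn)    # steps 33, 35, 37
    moved = 0
    # Steps 38-41: move both edges inward one pixel at a time; stop as soon
    # as either edge reaches the character or the movement reaches max_move.
    while ctop < cup and cbottom > cdwn and moved < max_move:
        ctop += 1
        cbottom -= 1
        moved += 1
    return ctop, cbottom
```

For a line spanning 0 to 40 and a small mark at 30 to 38 (such as a "、" sitting low in its region), the sketch returns (16, 38): the lower edge touches the mark while 14 pixels of blank remain above it, which is the size and position cue the method relies on. A character spanning the whole line height gets the full window (0, 40).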

The same processing is carried out for the left and right ends (step 43). Once this is confirmed (YES in step 44), the image data inside the rectangular window defined by the stored positions is extracted and output as cut-out character data (step 45).
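Combining both axes with the extraction of step 45 gives a small end-to-end sketch. It assumes a square, binarized one-character region (list of rows, 1 = black) whose own edges play the roles of lup/ldwn and bc, applies the same lockstep inward movement to rows and columns with a stop distance of half the region size, and returns None for a blank region. The names and the blank-region behavior are assumptions, not details from the source.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of pixels: 1 = black, 0 = white (assumed layout)


def extent(profile: List[bool]) -> Tuple[int, int]:
    """Half-open span [first, last + 1) of the True entries (non-empty)."""
    idx = [i for i, v in enumerate(profile) if v]
    return idx[0], idx[-1] + 1


def axis_window(lo: int, hi: int, size: int) -> Tuple[int, int]:
    """Per-axis window for a character spanning [lo, hi) in a region [0, size).

    Start at the character center +/- size // 2, clip to the region, then move
    both edges inward together until one reaches the character or the
    movement reaches size // 2 (assumed stop distance).
    """
    center = (lo + hi) // 2
    a = max(center - size // 2, 0)
    b = min(center + size // 2, size)
    moved = 0
    while a < lo and b > hi and moved < size // 2:
        a, b, moved = a + 1, b - 1, moved + 1
    return a, b


def cut_out(region: Image) -> Optional[Image]:
    """Set the window on both axes (steps 31-44) and crop it (step 45)."""
    if not any(any(row) for row in region):
        return None  # blank region: nothing to cut out (assumed behavior)
    h, w = len(region), len(region[0])  # the source assumes a square, w == h
    top, bottom = extent([any(row) for row in region])
    left, right = extent([any(row[x] for row in region) for x in range(w)])
    y0, y1 = axis_window(top, bottom, h)
    x0, x1 = axis_window(left, right, w)
    return [row[x0:x1] for row in region[y0:y1]]
```

For a 10 x 10 region whose only black pixels form a 2 x 2 mark near the lower-left corner (rows 7 to 8, columns 1 to 2), the result is a 5 x 5 crop with the mark in its bottom-left corner and blank space above and to the right.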

Fig. 5 illustrates the operation of the embodiment. Here the symbol "、" is cut out, as shown in Fig. 5 (A).

Because this character is small and lies toward the lower left, its image data is extracted with a window that has blank space above and to the right of the character, as shown in Fig. 5 (B).

In the case of "−", the character lies at the vertical center of the one-character region, so its image data is extracted with a window that has margins above and below the character, as shown in Fig. 5 (C).

As the above description shows, in this embodiment a character that is not inscribed on all sides of the cut-out window is a small one, and the position of the blank space indicates where in the one-character region the character is offset. Using this, the symbols in Fig. 4 such as "−" and "、" are recognized accurately and without error.

The same applies to small kana; as a result, kana and symbols can be recognized far more accurately than before.

However, when a character is small, like the symbol "・" in Fig. 4, and lies at the center of the one-character region, it is not inscribed in the window, as can also be seen from Fig. 6, and in some cases it may be misrecognized, for example as the symbol "、".

In this case it is preferable, also for stable character recognition, to set the window so that its left and right sides touch the left and right ends of the character, respectively.

Fig. 7 illustrates this method. First, the character's left end position cleft and right end position cright are detected.

Next, their center position cmid is obtained, and the left and right positions chleft and chright, each half the character height away from cmid and coinciding with the left and right boundaries bc of the one-character region, are obtained.

If either of these positions goes beyond a boundary bc, it is limited to that boundary.

The positions chleft and chright are then moved inward, and when they reach both sides of the character "・", those positions, cleft and cright, define the left and right sides of the window, as shown in Fig. 7 (B).

However, when a character is not placed symmetrically, as with the character "、", only one of the window's left and right sides touches the character.

According to this embodiment, the character to be cut out is inscribed on two of the window's four sides, so character recognition can be even more accurate than in the first embodiment.

[Effects of the Invention] As described above, according to the present invention, the size of the character to be cut out and its position within the one-character region can be determined from the blank space in the cut-out window, so small kana, symbol characters and the like can be recognized with very high accuracy.

It has been confirmed that the recognition rate for symbol characters alone can be improved from about 60% to 95%, and that for kana and symbols together from 90% to 95%.

[Brief Description of the Drawings]

Fig. 1 is a diagram explaining the principle of the invention; Fig. 2 is an explanatory diagram of the system of the embodiment; Fig. 3 is an explanatory diagram of the procedure of the embodiment; Fig. 4 is an explanatory diagram of the one-character region determination of the embodiment; Figs. 5 and 6 are explanatory diagrams of the operation of the embodiment; Fig. 7 is an explanatory diagram of another left/right cut-out method; and Fig. 8 is an explanatory diagram of a conventional example.

10 ... end position detection; 12 ... cut-out window setting; 14 ... image data extraction

Claims

[Claims] A character segmenting method in which the positions of both ends of a character contained in a one-character region are detected from the image data of that region (10); a rectangular cut-out window is set on the image data with blank space, in an amount that depends on the two detected positions, provided outside the character on the side determined by those positions (12); and the image data inside the set window is extracted as cut-out character data (14).