JPH05242294A - Drawing reader - Google Patents
Info
- Publication number
- JPH05242294A, JP4039920A, JP3992092A
- Authority
- JP
- Japan
- Prior art keywords
- character
- candidate
- character string
- data
- candidates
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Character Input (AREA)
Abstract
Description
[0001]
[Field of Industrial Application] The present invention relates to a drawing reading apparatus that detects character data in a document or drawing containing a mixture of characters, line segments, symbols and the like, and recognizes those characters.
[0002]
[Prior Art] A conventional drawing reading apparatus that reads characters, figures and the like is described, for example, in Japanese Patent Application No. Hei 1-253451. That apparatus comprises: a contour vectorization processing unit that applies contour vectorization to binarized pixel data, corresponding to black and white, obtained by raster-scanning a subject such as a document or drawing in which characters, line segments and symbols are mixed, and obtains the circumscribed quadrangle data of each contour vector as a character candidate; a registration unit that registers the circumscribed quadrangle data of the character candidates in a tree structure; a character string candidate extraction unit that extracts character string candidates from the registered character candidates; a character string extraction unit that extracts character strings from the character string candidates; and a character cutout unit that has a function of merging multiple character candidates within an extracted character string.
[0003]
[Problems to Be Solved by the Invention] The above drawing reading apparatus is effective when the input drawing contains no isolated figures, other than characters, of about the same size as the characters. When the input drawing does contain such figures, for example the figures representing buildings in a city block map, which are about the same size as the characters, the following problem arises.
[0004] Namely, during element separation, figures no larger than the character size are treated as character candidates, which leads to incorrect recognition results. For example, when a map like the one in FIG. 19(a) is recognized, the correct recognition result is that of FIG. 19(b). With the above drawing reading apparatus, however, figures are misrecognized as characters as in FIG. 20(a), or characters are misrecognized as figures as in FIG. 20(b).
[0005] The present invention has been made in view of the above, and its object is to provide a drawing reading apparatus that can reliably distinguish and recognize characters and figures even when figures of about the same size as the characters exist on the same drawing.
[0006]
[Means for Solving the Problems] The present invention is a drawing reading apparatus that extracts character data from binarized pixel data, corresponding to black and white, obtained by raster-scanning a subject in which characters, line segments and symbols are mixed, and recognizes characters by comparing the extracted character data with a character dictionary. The apparatus comprises: a contour vectorization processing unit that, on the basis of the binarized pixel data obtained by raster-scanning the subject, obtains data of contour vectors, each formed by joining vectors that connect two mutually adjacent black pixels, and obtains the circumscribed quadrangle data of each contour vector as a character candidate; a polygonal figure detection processing unit that detects polygonal figures from the character candidate data obtained by the contour vectorization processing unit and obtains the detected polygonal figure data as character-figure candidates; a registration unit that registers the circumscribed quadrangle data of the character candidates obtained by the contour vectorization processing unit in a tree structure; a character string candidate extraction unit that determines a search range from a character string candidate core set to a predetermined size and extracts as character string candidates, from among the character candidates registered in the registration unit, those whose center coordinates lie within the search range; a character string extraction unit that extracts as character strings, from the data extracted by the character string candidate extraction unit, the data for which the height h_C of the character string candidate core and the distance d between mutually adjacent character candidates within the character string candidate satisfy d < h_C × α (α is a constant); and a character cutout unit that cuts characters out of a character string and has a function of merging multiple neighboring character candidates within the character string extracted by the character string extraction unit. Character recognition is performed on the basis of the character data cut out by the character cutout unit, and, among the character-figure candidates obtained by the polygonal figure detection processing unit, those that have not been made part of a character string are identified as figures.
[0007]
[Operation] When the contour vectorization processing unit obtains the circumscribed quadrangles, character candidate data are obtained from a subject (a document or drawing) in which characters, line segments and symbols are mixed. The circumscribed quadrangle data of these character candidates are registered in the registration unit in a tree structure. The polygonal figure detection processing unit detects polygonal figures from the character candidate data and obtains character-figure candidates.
[0008] The character string candidate extraction unit searches the data in the registration unit within the search range determined by the character string candidate core, and extracts as character string candidates those whose center coordinates lie within the search range. Because the circumscribed quadrangle data are registered in a tree structure, the cost of the neighborhood search over rectangular regions is reduced. Among the character string candidates, the data that satisfy d < h_C × α (d: distance between adjacent character candidates; h_C: height of the character string candidate core; α: a constant) are extracted as character strings by the character string extraction unit. Line segments and symbols mixed into the subject are thus excluded, and only character string data are extracted.
[0009] Next, the character cutout unit cuts characters out of the character string. If the sizes of several neighboring character candidates in the character string and the distances between them satisfy predetermined values, those character candidates are merged and the merged candidate is cut out as a single character. As a result, characters can be read as characters whatever their size or format.
[0010] Further, among the character-figure candidates obtained by the polygonal figure detection processing unit, those that have not been made part of a character string are identified as figures. Therefore, even when figures of about the same size as the characters exist on the same drawing, as in a city map, characters and figures can be reliably distinguished and recognized.
[0011]
[Embodiment] An embodiment of the present invention is described below with reference to the drawings. As shown in FIG. 1, the drawing reading apparatus according to the invention comprises a contour vectorization processing unit 1, a registration unit 2, a character string candidate extraction unit 3, a character string extraction unit 4, a character cutout unit 5 and a character recognition unit 6. The overall processing flow is shown in FIG. 2. In step S1 the contour vectorization processing unit 1 performs preprocessing; in step S2 polygonal figure detection processing is performed; in step S3 the registration unit 2, the character string candidate extraction unit 3 and the character string extraction unit 4 perform character string formation processing; in step S4 the character cutout unit 5 performs character separation processing; and in step S5 the character recognition unit 6 performs character recognition processing.
[0012] The preprocessing performed by the contour vectorization processing unit 1 is represented by the flowchart of FIG. 3: image input processing is performed in step S1, contour vectorization processing in step S2, and element separation processing and polygonal figure detection processing in step S3.
[0013] Of these, the image input processing and the contour vectorization processing are performed, for example, by the means described in FIGS. 4 to 13 of Japanese Patent Application No. Hei 1-253451. That is, on the basis of binarized pixel data, corresponding to black and white, obtained by raster-scanning a subject in which characters, line segments and symbols are mixed, data of contour vectors, each formed by joining vectors that connect two mutually adjacent black pixels, are obtained together with the circumscribed quadrangle data of each contour vector (FIG. 9).
[0014] The polygonal figure detection processing follows the flowchart of FIG. 4. Step S1 determines whether contour vectorization has finished, and step S2 determines whether the item is a character candidate. If it is a character candidate, polygonal figure detection is performed in step S3 and step S4 determines whether the item is a polygonal figure; if it is, the character candidate is changed to a character-figure candidate (step S5). Performing polygonal figure detection in this way divides the items into two classes: character candidates and character-figure candidates.
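The flow of FIG. 4 can be sketched as a simple relabeling loop. The text does not say how a polygonal figure is detected, so in this minimal sketch both tests are passed in as functions; the names `Kind`, `classify`, `is_char_candidate` and `is_polygon` are hypothetical and do not come from the patent.

```python
from enum import Enum
from typing import Callable, List, Sequence


class Kind(Enum):
    OTHER = "other"
    CHARACTER = "character candidate"
    CHAR_FIGURE = "character-figure candidate"


def classify(
    contours: Sequence[object],
    is_char_candidate: Callable[[object], bool],
    is_polygon: Callable[[object], bool],
) -> List[Kind]:
    """Label each contour following the decision order of FIG. 4."""
    kinds = []
    for contour in contours:
        if not is_char_candidate(contour):   # step S2: not a character candidate
            kinds.append(Kind.OTHER)
        elif is_polygon(contour):            # steps S3 and S4: polygon test
            kinds.append(Kind.CHAR_FIGURE)   # step S5: relabel the candidate
        else:
            kinds.append(Kind.CHARACTER)
    return kinds
```

Any polygon test can be plugged in; the loop only fixes the order of the decisions, not the detection method.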
[0015] The extraction of character candidates (element separation processing) is performed as follows. First, short vectors are removed using the contour vector series table of FIG. 5 (identical to FIG. 9 of Japanese Patent Application No. Hei 1-253451). FIG. 6 is a flowchart of the short vector removal means. In step S1, the circumscribed quadrangle of a contour vector (shown in FIG. 7), given by (x_min, y_min) and (x_max, y_max), is calculated from the contour vector series table of FIG. 5. In step S2, dx = x_max - x_min and dy = y_max - y_min are compared with the thresholds DX_th and DY_th. Step S3 is the decision on the comparison of step S2: it tests (dx > DX_th) ∪ (dy > DY_th). If YES, circumscribed quadrangle information is created for the contour vector in step S4. An example of this information is shown in FIG. 8 as the circumscribed quadrangle information table. If NO in step S3, the vectors of that series are deleted from the contour vector series table of FIG. 5 in step S5, which removes noise.
[0016] FIG. 7 is an explanatory diagram of the circumscribed quadrangle of a contour vector. In this figure, x_max, x_min, y_max and y_min are as follows.
x_max = max(…, x_i, …)
x_min = min(…, x_i, …)
y_max = max(…, y_i, …)
y_min = min(…, y_i, …)
dx = x_max - x_min
dy = y_max - y_min
(dx < DX_th) ∩ (dy < DY_th)
where DX_th and DY_th are thresholds.
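The circumscribed quadrangle computation and the short vector test of FIG. 6 can be sketched as follows. This is a minimal illustration that assumes a contour is given as a list of (x, y) vertices; the function names and the use of `None` to mark a removed series are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def circumscribed_box(points: Iterable[Point]) -> Box:
    """Axis-aligned box spanned by the min and max vertex coordinates."""
    pts = list(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def box_or_noise(points: Iterable[Point], dx_th: int, dy_th: int) -> Optional[Box]:
    """Return the box if (dx > DX_th) or (dy > DY_th); otherwise None (noise)."""
    x_min, y_min, x_max, y_max = circumscribed_box(points)
    dx, dy = x_max - x_min, y_max - y_min
    if dx > dx_th or dy > dy_th:   # YES branch of step S3: keep the contour
        return (x_min, y_min, x_max, y_max)
    return None                    # NO branch: the series is deleted as noise
```

A contour is dropped only when both extents are at or below their thresholds, which is the complement of the test applied in step S3.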
[0017] Character candidates are extracted from the circumscribed quadrangle information obtained as described above (FIG. 9) under the following conditions.
Condition 1: w_x < threshold and w_y < threshold (w_x: length of the circumscribed quadrangle in the x direction; w_y: length of the circumscribed quadrangle in the y direction)
Condition 2: the contour vector is an outer contour vector
Outer and inner contour vectors that are completely contained in the circumscribed quadrangle of a character candidate are linked to the contour vector that became the character candidate.
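The two conditions and the linking of contained contours can be sketched as below. This is an illustrative sketch only: the `Contour` record, its field names and the threshold parameters are assumptions, not structures defined in the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


@dataclass
class Contour:
    box: Box
    is_outer: bool                     # True for an outer contour vector
    linked: List["Contour"] = field(default_factory=list)


def is_character_candidate(c: Contour, wx_th: int, wy_th: int) -> bool:
    """Condition 1 (both side lengths below threshold) and condition 2 (outer contour)."""
    wx = c.box[2] - c.box[0]
    wy = c.box[3] - c.box[1]
    return c.is_outer and wx < wx_th and wy < wy_th


def inside(inner: Box, outer: Box) -> bool:
    """True when `inner` is completely contained in `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def link_contained(candidate: Contour, contours: List[Contour]) -> None:
    """Attach every other contour whose box lies fully inside the candidate's box."""
    candidate.linked = [c for c in contours
                        if c is not candidate and inside(c.box, candidate.box)]
```

Linking keeps the inner loops of a glyph (for example the hole of a closed stroke) attached to the candidate instead of treating them as separate candidates.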
[0018] Next, the character string formation processing (step S3 in FIG. 2) is described. This processing is performed after the polygonal figure detection processing of FIG. 4, following the flowcharts of FIGS. 10(a) and (b). Step S1 determines whether contour vectorization has finished, step S2 determines whether the item is a character candidate or a character-figure candidate, and in step S3 the character candidates are registered as a tree structure in the registration unit 2 of FIG. 1. Step S4 determines whether all targets of string formation have been processed, step S5 determines whether the item has not yet been made part of a character string, and step S6 determines whether it can serve as a character string core. If it can, the character string candidate range is searched around the character string core in step S7. Step S8 determines whether anything was found within the range; if so, the characters found are included in the character string (step S9). Then step S10 determines whether all targets of string formation have been processed, step S11 determines whether the item has not yet been made part of a character string, and step S12 determines whether it is a character-figure candidate; if it is, that character-figure candidate is determined to be a figure in step S13.
[0019] Performing the processing of FIG. 4 and FIGS. 10(a) and (b) as described above adds the condition that only a character candidate can be the character string core from which string formation starts. Consequently, no character string consists solely of character-figure candidates, and character-figure candidates that have not been made part of a character string are automatically determined to be figure candidates.
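The rule can be sketched as follows. This is a simplified, hypothetical illustration: candidates are referred to by index, `neighbors` stands in for the result of the range search around each core, and a core with nothing found in range is assumed not to form a string by itself.

```python
from typing import Dict, List, Tuple


def form_strings(
    kinds: List[str],
    neighbors: Dict[int, List[int]],
) -> Tuple[List[List[int]], List[int]]:
    """kinds[i] is 'char' or 'char_figure'; neighbors[i] lists indices in i's search range."""
    in_string = [False] * len(kinds)
    strings: List[List[int]] = []
    for core, kind in enumerate(kinds):
        # Only an unprocessed character candidate may start a string.
        if in_string[core] or kind != "char":
            continue
        found = [j for j in neighbors.get(core, [])
                 if j != core and not in_string[j]]
        if not found:
            continue
        members = [core] + found
        for m in members:
            in_string[m] = True
        strings.append(members)
    # Character-figure candidates left outside every string become figures.
    figures = [i for i, k in enumerate(kinds)
               if k == "char_figure" and not in_string[i]]
    return strings, figures
```

Because a character-figure candidate can join a string but never start one, two adjacent building outlines stay figures even if they are close enough to look like a string.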
[0020] The character string formation processing of FIG. 10 is performed by the registration unit 2, the character string candidate extraction unit 3 and the character string extraction unit 4 of FIG. 1 as follows. First, as shown in FIG. 11, character cutout processing is performed on the character candidates in the horizontal direction (step S1), the vertical direction (step S2) and the diagonal direction (step S3). The processing is the same for each direction: as shown in FIG. 12, character string candidate extraction processing (step S1) is followed by character string extraction processing (step S2) and then character cutout processing (step S3).
[0021] As shown in FIG. 13, the character string candidate extraction processing performs tree structuring of the character candidates (step S1) and then a range search for character string candidates (step S2). In practice, the character candidates are registered in the registration unit (registration unit 2 in FIG. 1) as a tree structure built by repeatedly bisecting space on the basis of the center coordinates of the circumscribed quadrangles obtained as in FIG. 8. Then, taking as the core of a character string candidate a character candidate whose height is comparable to that of the character string being sought, the character candidates whose center coordinates lie within the search range shown in FIG. 14(a) are retrieved from the character candidate tree of the registration unit 2.
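A tree that repeatedly bisects space on center coordinates can be sketched as a k-d tree with a rectangular range search. The patent does not name the data structure, so the k-d tree (alternating x and y splits at the median) is an assumption, as are all identifiers below.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


class Node:
    def __init__(self, point: Point, index: int, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point, self.index, self.axis = point, index, axis
        self.left, self.right = left, right


def _build(items: List[Tuple[Point, int]], depth: int) -> Optional[Node]:
    if not items:
        return None
    axis = depth % 2                       # alternate between x and y splits
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2                  # median item halves the space
    point, index = items[mid]
    return Node(point, index, axis,
                _build(items[:mid], depth + 1),
                _build(items[mid + 1:], depth + 1))


def build_tree(centers: Sequence[Point]) -> Optional[Node]:
    """Register candidate center coordinates; each node keeps the candidate's index."""
    return _build([(p, i) for i, p in enumerate(centers)], 0)


def range_search(node: Optional[Node], x_lo: float, y_lo: float,
                 x_hi: float, y_hi: float) -> List[int]:
    """Indices of all centers inside the closed rectangle [x_lo, x_hi] x [y_lo, y_hi]."""
    if node is None:
        return []
    hits: List[int] = []
    x, y = node.point
    if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
        hits.append(node.index)
    lo, hi = (x_lo, x_hi) if node.axis == 0 else (y_lo, y_hi)
    split = node.point[node.axis]
    if lo <= split:                        # the range reaches into the lower half
        hits += range_search(node.left, x_lo, y_lo, x_hi, y_hi)
    if split <= hi:                        # the range reaches into the upper half
        hits += range_search(node.right, x_lo, y_lo, x_hi, y_hi)
    return hits
```

Subtrees that cannot intersect the search rectangle are skipped, which is how a tree layout reduces the cost of the neighborhood search compared with scanning every candidate.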
[0022] This search is performed by the character string candidate extraction unit 3 of FIG. 1 in the following steps.
(1) To the right of the character string core, the center points of other character candidates are searched for within a search range like that of FIG. 14(b).
(2) Among the center points found in (1), the one farthest from the core becomes the starting point of the next search range. If no center point is found in (1), the rightward search ends.
(3) The leftward direction is processed in the same way as (1) and (2).
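Steps (1) to (3) can be sketched as an iterative window search. The exact shape of the search range in FIG. 14(b) is not given in the text, so this sketch assumes a window that extends `reach` units from the current starting point along x and `band` units above and below the core; both parameters and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def extend_search(centers: Sequence[Point], core_idx: int,
                  reach: float, band: float, direction: int = 1) -> List[int]:
    """Collect candidate indices to the right (direction=1) or left (direction=-1)."""
    core_x, core_y = centers[core_idx]
    start_x = core_x
    seen = {core_idx}
    found: List[int] = []
    while True:
        # Step (1): centers inside the window that starts at start_x.
        hits = [i for i, (x, y) in enumerate(centers)
                if i not in seen
                and 0 < direction * (x - start_x) <= reach
                and abs(y - core_y) <= band]
        if not hits:               # step (2): nothing found, so the search ends
            break
        found.extend(hits)
        seen.update(hits)
        # Step (2): the hit farthest from the core starts the next window.
        far = max(hits, key=lambda i: abs(centers[i][0] - core_x))
        start_x = centers[far][0]
    return found
```

Calling the function once with `direction=1` and once with `direction=-1` covers step (3).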
[0023] The items found by the above search are the character string candidates. By setting the core size of the character string candidate in several steps, starting from the largest, and running the search at each step, character strings of different sizes can also be handled.
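The staged search over core sizes can be sketched as a loop from the largest size down. How close a height must be to count as "comparable" is not stated, so the relative tolerance `tol` below is an assumption, as are the function and parameter names.

```python
from typing import List, Sequence, Tuple


def cores_by_scale(heights: Sequence[float], scales: Sequence[float],
                   tol: float = 0.2) -> List[Tuple[float, List[int]]]:
    """For each core size, largest first, list the unused candidates of comparable height."""
    used = set()
    plan: List[Tuple[float, List[int]]] = []
    for scale in sorted(scales, reverse=True):
        cores = [i for i, h in enumerate(heights)
                 if i not in used and abs(h - scale) <= tol * scale]
        used.update(cores)          # a candidate serves as a core at one scale only
        plan.append((scale, cores))
    return plan
```

Each entry of the returned plan gives one pass of the range search: the cores selected at that size are the starting points for the search at that size.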
[0024] Next, the character string extraction unit 4 extracts as character strings those character string candidates that meet the following condition. That is, as shown in FIG. 15, when the height h_C of the character string candidate core and the distance d between character candidates satisfy
d(i, j) < h_C × constant ... (1)
the character candidates i and j in the illustrated character string candidate form a character string.
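Expression (1) can be sketched as splitting a horizontally ordered candidate list wherever the gap is too large. FIG. 15 is not reproduced in the text, so this sketch assumes d(i, j) is the gap between the right edge of one box and the left edge of the next; the choice to discard single-member runs is also an assumption.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def split_into_strings(boxes: Sequence[Box], core_height: float,
                       alpha: float) -> List[List[Box]]:
    """Group left-to-right neighbours whose gap d satisfies d < h_C * alpha."""
    if not boxes:
        return []
    ordered = sorted(boxes, key=lambda b: b[0])
    runs: List[List[Box]] = []
    current = [ordered[0]]
    for prev, nxt in zip(ordered, ordered[1:]):
        d = nxt[0] - prev[2]              # gap between adjacent candidates
        if d < core_height * alpha:       # expression (1) holds: same string
            current.append(nxt)
        else:                             # gap too wide: start a new run
            runs.append(current)
            current = [nxt]
    runs.append(current)
    # Assumption: a lone candidate is not reported as a string.
    return [run for run in runs if len(run) >= 2]
```

Scaling the threshold by the core height makes the same constant usable for large and small lettering.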
[0025] Next, the character separation processing (step S4 in FIG. 2) is described. The character cutout unit 5 cuts characters out of the extracted character string in units of character candidates. If the condition of expression (2) below is met, height-direction separated character merging is performed as in step S1 of FIG. 16; if the conditions of expressions (3) and (4) are met, width-direction separated character merging is performed as in step S2 of FIG. 16. For example, when character candidates i and j are separated in the height direction as in FIG. 17 and the condition of expression (2) below holds, character candidates i and j are merged as illustrated.
[0026]
max(x_min(i), x_min(j)) ≤ (x_cen(i) or x_cen(j)) ≤ min(x_max(i), x_max(j)) ... (2)
(x_min: minimum x coordinate of the circumscribed quadrangle; x_max: maximum x coordinate of the circumscribed quadrangle; x_cen: center x coordinate of the circumscribed quadrangle)
Likewise, when character candidates j and k are separated in the width direction as in FIG. 18 and the conditions of expressions (3) and (4) below hold, character candidates j and k are merged as illustrated.
d(j, k) < d(average within the character string) × constant ... (3)
w(j) + w(k) + d(j, k) ≤ h × constant ... (4)
(d: distance between character candidates; w: character candidate width; h: character string height)
In the course of the character cutout processing described above (FIGS. 11 and 12), a processed mark is attached each time a character candidate is confirmed as part of a character string, so the number of items left to process decreases. The processing of FIG. 12 has been described for the horizontal direction; for the vertical direction the x and y directions are interchanged. For the diagonal direction a predetermined inclination angle is assumed, and a coordinate transformation by that angle is included in the character string extraction processing and the character cutout processing (steps S2 and S3 of FIG. 12).
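The three merge tests can be sketched as predicates on bounding boxes. This is a minimal sketch under stated assumptions: boxes are (x_min, y_min, x_max, y_max), the "or" in expression (2) is read as "either center satisfies the inequality", and the constants `c3` and `c4` are hypothetical parameters for the unnamed constants in expressions (3) and (4).

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def x_center(b: Box) -> float:
    return (b[0] + b[2]) / 2


def merge_height(bi: Box, bj: Box) -> bool:
    """Expression (2): one center x lies in the x-overlap of the two boxes."""
    lo = max(bi[0], bj[0])
    hi = min(bi[2], bj[2])
    return lo <= x_center(bi) <= hi or lo <= x_center(bj) <= hi


def merge_width(bj: Box, bk: Box, mean_gap: float,
                string_height: float, c3: float, c4: float) -> bool:
    """Expressions (3) and (4): small gap, and the merged width fits the string height."""
    left, right = sorted((bj, bk), key=lambda b: b[0])
    d = right[0] - left[2]                       # gap between the two candidates
    total = (bj[2] - bj[0]) + (bk[2] - bk[0]) + d
    return d < mean_gap * c3 and total <= string_height * c4


def union(a: Box, b: Box) -> Box:
    """Circumscribed box of the merged character candidate."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
```

Two stacked strokes whose x ranges overlap pass `merge_height`, while two full-width characters standing side by side fail `merge_width` because their combined width exceeds the string height.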
[0027] The character recognition processing (step S5 in FIG. 2) is performed as follows. The character recognition unit 6 compares each character cut out by the character cutout unit 5 with a dictionary in which characters are registered in advance, and recognizes the characters one at a time.
[0028]
[Effects of the Invention] As described above, the present invention comprises a contour vectorization processing unit, a polygonal figure detection processing unit, a registration unit, a character string candidate extraction unit, a character string extraction unit and a character cutout unit, performs character recognition on the basis of the character data cut out by the character cutout unit, and identifies as figures those character-figure candidates obtained by the polygonal figure detection processing unit that have not been made part of a character string. The following advantageous effects are therefore obtained.
[0029]
(1) Even when figures of about the same size as the characters exist on the same drawing, as in a city map, characters and figures can be reliably distinguished and recognized.
(2) When characters are read and recognized from a document or drawing in which characters, line segments and symbols are mixed, characters of any size and format can be read easily and accurately without imposing restrictions on size or format.
(3) Because the character candidate data are registered in a tree structure, processing can be made faster.
[FIG. 1] Block diagram of the overall configuration of an embodiment of the present invention.
[FIG. 2] Overall flowchart of an embodiment of the present invention.
[FIG. 3] Flowchart of the contour vectorization processing unit of the present invention.
[FIG. 4] Flowchart of the polygonal figure detection processing of the present invention.
[FIG. 5] Explanatory diagram of the contour vector series table of an embodiment of the present invention.
[FIG. 6] Flowchart of the short vector removal means of an embodiment of the present invention.
[FIG. 7] Explanatory diagram of the circumscribed quadrangle of an embodiment of the present invention.
[FIG. 8] Explanatory diagram of the circumscribed quadrangle information table of an embodiment of the present invention.
[FIG. 9] Explanatory diagram of the contour vectorization processing of an embodiment of the present invention.
[FIG. 10] Flowchart of the character string formation processing of the present invention.
[FIG. 11] Flowchart of the character cutout processing of an embodiment of the present invention.
[FIG. 12] Flowchart of the character cutout processing of an embodiment of the present invention.
[FIG. 13] Flowchart of the character string candidate extraction processing of an embodiment of the present invention.
[FIG. 14] Explanatory diagram of the character string candidate extraction processing of an embodiment of the present invention.
[FIG. 15] Explanatory diagram of the character string extraction processing of an embodiment of the present invention.
[FIG. 16] Flowchart of the character cutout processing of an embodiment of the present invention.
[FIG. 17] Explanatory diagram of the height-direction separated character merging of an embodiment of the present invention.
[FIG. 18] Explanatory diagram of the width-direction separated character merging of an embodiment of the present invention.
[FIG. 19] (a) is an explanatory diagram of an example drawing input to the drawing reading apparatus; (b) is an explanatory diagram of an example of a correct recognition result.
[FIG. 20] (a) is an explanatory diagram of an example in which a conventional drawing reading apparatus misrecognized a figure as a character; (b) is an explanatory diagram of an example in which a conventional drawing reading apparatus misrecognized a character as a figure.
1: contour vectorization processing unit, 2: registration unit, 3: character string candidate extraction unit, 4: character string extraction unit, 5: character cutout unit, 6: character recognition unit.
Claims (1)
1. A drawing reading apparatus that extracts character data from binarized pixel data corresponding to black and white obtained by raster-scanning a subject in which characters, line segments, and symbols are mixed, and that recognizes characters by referring to a character dictionary and comparing it with the extracted character data, the apparatus comprising:
a contour vectorization processing unit that, based on the binarized pixel data corresponding to black and white obtained by raster-scanning the subject, obtains data of contour vectors formed by joining vectors that each connect two mutually adjacent black pixels, and obtains circumscribed quadrangle data circumscribing the contour vectors as character candidates;
a polygon figure detection processing unit that detects polygon figures from the character candidate data obtained by the contour vectorization processing unit, and obtains the detected polygon figure data as character figure candidates;
a registration unit that registers, in a tree structure, the circumscribed quadrangle data of the character candidates obtained by the contour vectorization processing unit;
a character string candidate extraction unit that determines a search range from a character string candidate core set to a predetermined size, and extracts, as a character string candidate, a character string whose center coordinates lie within the search range from among the character candidates registered in the registration unit;
a character string extraction unit that extracts, as a character string, from the data extracted by the character string candidate extraction unit, data in which the height h_C of the character string candidate core and the distance d between mutually adjacent character candidates in the character string candidate satisfy d < h_C × α (α is a constant); and
a character cutout unit that has a function of integrating a plurality of neighboring character candidates in the character string extracted by the character string extraction unit, and that cuts out characters from the character string,
wherein character recognition is performed based on the character data cut out by the character cutout unit, and, among the character figure candidates obtained by the polygon figure detection processing unit, character figure candidates that have not been formed into a character string are identified as figures.
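The string-candidate search and the spacing test d < h_C × α in claim 1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the names `Box`, `candidates_in_range`, and `extract_string` are hypothetical, text is assumed to run horizontally, the search range is assumed to be a band of the core's height extended by `reach` pixels to each side, and d is assumed to be the horizontal gap between neighboring circumscribed quadrangles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Circumscribed quadrangle of one character candidate (pixel coordinates)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    @property
    def height(self):
        return self.y1 - self.y0


def candidates_in_range(core, boxes, reach):
    """Keep the boxes whose center lies inside the search range.

    Assumed range: the core's vertical extent, widened by `reach`
    pixels to the left and right of the core.
    """
    lo_x, hi_x = core.x0 - reach, core.x1 + reach
    picked = []
    for b in boxes:
        cx, cy = b.center
        if lo_x <= cx <= hi_x and core.y0 <= cy <= core.y1:
            picked.append(b)
    return picked


def extract_string(core, candidates, alpha):
    """Grow a string outward from the core while the gap d < h_C * alpha."""
    limit = core.height * alpha  # h_C * alpha
    # Sort left to right; the set union avoids duplicating the core.
    ordered = sorted(set(candidates) | {core}, key=lambda b: b.x0)
    left = right = ordered.index(core)
    # Extend to the right while the gap to the next box stays under the limit.
    while right + 1 < len(ordered) and ordered[right + 1].x0 - ordered[right].x1 < limit:
        right += 1
    # Extend to the left in the same way.
    while left > 0 and ordered[left].x0 - ordered[left - 1].x1 < limit:
        left -= 1
    return ordered[left:right + 1]
```

With a core of height 10 and α = 0.5, for example, neighbors are chained only while the gap between adjacent quadrangles stays below 5 pixels, so a distant symbol on the same line is left out of the string. A real implementation would query the tree-structured registration unit instead of scanning a flat list.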
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP4039920A JPH05242294A (en) | 1992-02-27 | 1992-02-27 | Drawing reader |
Publications (1)
Publication Number | Publication Date |
---|---|
JPH05242294A (en) | 1993-09-21 |
Family
ID=12566376
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP4039920A Pending JPH05242294A (en) | 1992-02-27 | 1992-02-27 | Drawing reader |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH05242294A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010134919A1 (en) * | 2009-05-21 | 2010-11-25 | Hewlett-Packard Development Company, L.P. | Generation of an individual glyph, and system and method for inspecting individual glyphs |
CN102439607A (en) * | 2009-05-21 | 2012-05-02 | 惠普开发有限公司 | Generation of an individual glyph, and system and method for inspecting individual glyphs |
US8818047B2 (en) | 2009-05-21 | 2014-08-26 | Hewlett-Packard Development Company, L.P. | Generation of an individual glyph, and system and method for inspecting individual glyphs |
JP2016018428A (en) * | 2014-07-09 | 2016-02-01 | 株式会社東芝 | Electronic apparatus, method and program |
Similar Documents
Publication | Title |
---|---|
US4903311A (en) | Character region extracting method and apparatus capable of implementing the method |
JPH0418351B2 (en) | |
JPH05242294A (en) | Drawing reader |
JP2917427B2 (en) | Drawing reader |
JP2797523B2 (en) | Drawing follower |
JPH02116987A (en) | Character recognizing device |
JPH06180771A (en) | English letter recognizing device |
JPH0728935A (en) | Document image processor |
JPH03189888A (en) | Kind decision device for character string in drawing reader |
JP3140079B2 (en) | Ruled line recognition method and table processing method |
JPH10222607A (en) | Method for recognizing separation of graphic element from character element |
JPS63269267A (en) | Character recognizing device |
JP2974396B2 (en) | Image processing method and apparatus |
JPH07160810A (en) | Character recognition device |
JPH05282493A (en) | Roman letter recognizing device |
JP3151866B2 (en) | English character recognition method |
JP3193573B2 (en) | Character recognition device with brackets |
JPH08202822A (en) | Character cutting device and character cutting method |
JP2832035B2 (en) | Character recognition device |
JP3411795B2 (en) | Character recognition device |
JPH03163683A (en) | Drawing reader |
JPH07192094A (en) | Character cutout circuit and character cutout method |
JPH0757047A (en) | Character segmentation system |
JPH0334081A (en) | Drawing reader |
JPH0896078A (en) | Character recognizing device |