JPH05242294A

JPH05242294A - Drawing reader

Info

Publication number: JPH05242294A
Application number: JP4039920A
Authority: JP
Inventors: Tetsuya Yasuda; 哲也安田; Takeshi Aizawa; 毅相澤
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 1992-02-27
Filing date: 1992-02-27
Publication date: 1993-09-21

Abstract

PURPOSE: To recognize characters and figures while reliably distinguishing one from the other, even when a figure of about the same size as a character appears on the same drawing, in a drawing reader that extracts character data from black/white binary pixel data obtained by raster-scanning a subject in which characters, line segments, and symbols are mixed. CONSTITUTION: Character candidates are found by contour vectorization processing based on the binary pixel data (step S1). Polygonal figure detection processing is applied to the character candidates, classifying them into character candidates and character-figure candidates (step S2). The character candidates are formed into character strings, and any character-figure candidate that has not been formed into a character string is determined to be a figure (step S3). Characters are segmented from each character string by merging adjacent character candidates within the string (step S4), and character recognition is performed on the segmented character data (step S5).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a drawing reader that detects character data in a document or drawing in which characters, line segments, symbols, and the like are mixed, and recognizes those characters.

[0002]

[Prior Art] A conventional drawing reader that reads characters, figures, and the like is described, for example, in Japanese Patent Application No. Hei 1-253451. That apparatus comprises: a contour vectorization processing unit that performs contour vectorization on black/white binarized pixel data obtained by raster-scanning a subject such as a document or drawing in which characters, line segments, and symbols are mixed, and obtains the circumscribed quadrangle data circumscribing each contour vector as a character candidate; a registration unit that registers the circumscribed quadrangle data of the character candidates in a tree structure; a character string candidate extraction unit that extracts character string candidates from the registered character candidates; a character string extraction unit that extracts character strings from the character string candidates; and a character cutout unit that has a function of merging a plurality of character candidates within an extracted character string.

[0003]

[Problem to Be Solved by the Invention] The drawing reader described above is effective when the input drawing contains no isolated non-character figures of about the same size as the characters. When the input is a drawing such as a city block map, in which figures representing buildings are about the same size as the characters, the following problem arises.

[0004] Specifically, during element separation any figure no larger than the character size is taken as a character candidate, which leads to incorrect recognition results. For example, when a map such as that shown in FIG. 19(a) is recognized, the correct result is FIG. 19(b). The drawing reader described above, however, misrecognizes a figure as a character as in FIG. 20(a), or a character as a figure as in FIG. 20(b).

[0005] The present invention has been made in view of the above. Its object is to provide a drawing reader that can reliably distinguish and recognize characters and figures even when figures of about the same size as the characters exist on the same drawing.

[0006]

[Means for Solving the Problem] The present invention is a drawing reader that extracts character data from black/white binarized pixel data obtained by raster-scanning a subject in which characters, line segments, and symbols are mixed, and recognizes characters by comparing the extracted character data against a character dictionary. The drawing reader comprises: a contour vectorization processing unit that, based on the black/white binarized pixel data obtained by raster-scanning the subject, obtains contour vector data formed by joining vectors each connecting two mutually adjacent black pixels, and obtains the circumscribed quadrangle data circumscribing each contour vector as a character candidate; a polygonal figure detection processing unit that detects polygonal figures from the character candidate data obtained by the contour vectorization processing unit and obtains the detected polygonal figure data as character-figure candidates; a registration unit that registers, in a tree structure, the circumscribed quadrangle data of the character candidates obtained by the contour vectorization processing unit; a character string candidate extraction unit that determines a search range from a character string candidate core set to a predetermined size and extracts, as character string candidates, those character strings among the character candidates registered in the registration unit whose center coordinates lie within the search range; a character string extraction unit that extracts as character strings, from the data extracted by the character string candidate extraction unit, the data for which the height h_C of the character string candidate core and the distance d between mutually adjacent character candidates within the character string candidate satisfy d < h_C × α (α is a constant); and a character cutout unit that has a function of merging a plurality of neighboring character candidates within a character string extracted by the character string extraction unit and cuts out characters from the character string. Character recognition is performed based on the character data cut out by the character cutout unit, and, among the character-figure candidates obtained by the polygonal figure detection processing unit, those that have not been formed into a character string are determined to be figures.

[0007]

[Operation] When the vectorization processing unit obtains the circumscribed quadrangles, character candidate data is obtained from a subject (a document or drawing) in which characters, line segments, and symbols are mixed. The circumscribed quadrangle data of these character candidates is registered in the registration unit in a tree structure. The polygonal figure detection processing unit detects polygonal figures from the character candidate data and obtains character-figure candidates.

[0008] The character string candidate extraction unit searches the data in the registration unit within the search range determined by the character string candidate core and extracts, as character string candidates, the character strings whose center coordinates lie within the search range. Because the circumscribed quadrangle data is registered in a tree structure, the cost of the neighborhood search over region quadrangles is reduced. Among the character string candidates, the data for which d < h_C × α holds (d is the distance between adjacent character candidates, h_C is the height of the character string candidate core, and α is a constant) is extracted as character strings by the character string extraction unit. In other words, the line segments and symbols mixed into the subject are excluded and only the character string data is extracted.

[0009] Next, the character cutout unit cuts out characters from the character string. If the sizes of a plurality of neighboring character candidates within the string and the distances between them meet predetermined values, those character candidates are merged and the merged candidate is cut out as a single character. This allows characters to be read as characters regardless of their size or format.

[0010] In addition, among the character-figure candidates obtained by the polygonal figure detection processing unit, those that have not been formed into a character string are determined to be figures. Therefore, even when figures of about the same size as the characters exist on the same drawing, as in a city map, characters and figures can be reliably distinguished and recognized.

[0011]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. As shown in FIG. 1, the drawing reader according to the present invention comprises a contour vectorization processing unit 1, a registration unit 2, a character string candidate extraction unit 3, a character string extraction unit 4, a character cutout unit 5, and a character recognition unit 6. The overall processing flow is shown in FIG. 2. First, in step S1, the contour vectorization processing unit 1 performs preprocessing; in step S2, polygonal figure detection processing is performed; in step S3, the registration unit 2, the character string candidate extraction unit 3, and the character string extraction unit 4 perform character string formation processing; in step S4, the character cutout unit 5 performs character separation processing; and in step S5, the character recognition unit 6 performs character recognition processing.

[0012] The preprocessing performed by the contour vectorization processing unit 1 is represented by the flowchart of FIG. 3. Image input processing is performed in step S1, contour vectorization processing in step S2, and element separation processing and polygonal figure detection processing in step S3.

[0013] Of these, the image input processing and the contour vectorization processing are performed, for example, by the means described in FIGS. 4 to 13 of Japanese Patent Application No. Hei 1-253451. That is, based on the black/white binarized pixel data obtained by raster-scanning a subject in which characters, line segments, and symbols are mixed, contour vector data formed by joining vectors each connecting two mutually adjacent black pixels is obtained, and the circumscribed quadrangle data circumscribing each contour vector is obtained (FIG. 9).

[0014] The polygonal figure detection processing follows the flowchart of FIG. 4. In step S1 it is determined whether contour vectorization has finished; in step S2 it is determined whether the item is a character candidate; if it is, polygonal figure detection is performed in step S3; in step S4 it is determined whether the item is a polygonal figure; and if it is, the character candidate is changed to a character-figure candidate (step S5). Performing polygonal figure detection in this way divides the items into two classes: character candidates and character-figure candidates.

[0015] The extraction of character candidates (element separation processing) is performed as follows. First, short vectors are removed using the contour vector series table of FIG. 5 (identical to FIG. 9 of Japanese Patent Application No. Hei 1-253451). FIG. 6 is a flowchart explaining the short vector removal means. In step S1, the circumscribed quadrangle (x_min, y_min), (x_max, y_max) of the contour vector (shown in FIG. 7) is calculated from the contour vector series table shown in FIG. 5. Next, in step S2, dx = x_max - x_min and dy = y_max - y_min are compared with the thresholds DX_th and DY_th. Step S3 is the decision on the comparison of step S2: it tests (dx > DX_th) ∪ (dy > DY_th). If YES, circumscribed quadrangle information is created for the contour vector in step S4. An example of this information is shown in FIG. 8 as the circumscribed quadrangle information table. If NO in step S3, the vectors of that series are deleted from the contour vector series table shown in FIG. 5 in step S5, which removes noise.

[0016] FIG. 7 is an explanatory diagram showing the circumscribed quadrangle of a contour vector. In this figure, x_max, x_min, y_max, and y_min are as follows:

x_max = max(..., x_i, ...)
x_min = min(..., x_i, ...)
y_max = max(..., y_i, ...)
y_min = min(..., y_i, ...)
dx = x_max - x_min
dy = y_max - y_min
(dx < DX_th) ∩ (dy < DY_th)

where DX_th and DY_th are thresholds. The last condition corresponds to the NO branch of step S3 in FIG. 6, that is, the case in which the contour vector is removed as a short vector.
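The short vector removal of FIG. 6 can be sketched as below. This is a minimal illustration, assuming a contour is given as a list of (x, y) points; the function names `circumscribed_box` and `remove_short_vector` and the threshold values are hypothetical and do not come from the patent.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def circumscribed_box(contour: List[Point]) -> Box:
    """Circumscribed quadrangle of one contour vector series (cf. FIG. 7)."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return (min(xs), min(ys), max(xs), max(ys))


def remove_short_vector(contour: List[Point], dx_th: int, dy_th: int) -> Optional[Box]:
    """Return the box if the contour survives, or None if it is treated as noise."""
    x_min, y_min, x_max, y_max = circumscribed_box(contour)
    dx = x_max - x_min
    dy = y_max - y_min
    # Step S3 test: keep the contour when either extent exceeds its threshold.
    if dx > dx_th or dy > dy_th:
        return (x_min, y_min, x_max, y_max)
    # Otherwise both extents are small and the series is dropped.
    return None


# Illustrative data only: one stroke-sized contour and one speck.
stroke = [(0, 0), (10, 0), (10, 8), (0, 8)]
speck = [(5, 5), (6, 5), (6, 6)]
kept = remove_short_vector(stroke, dx_th=3, dy_th=3)
dropped = remove_short_vector(speck, dx_th=3, dy_th=3)
```

Here `kept` holds the box of the larger contour, while `dropped` is `None` because both extents of the speck are within the thresholds.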

[0017] Character candidates are extracted from the circumscribed quadrangle information obtained as above (FIG. 9) under the following conditions.

Condition 1: w_x < threshold and w_y < threshold (w_x is the length of the circumscribed quadrangle in the x direction, and w_y is its length in the y direction).
Condition 2: the contour vector is an outer contour vector.

Outer and inner contour vectors that are completely contained in the circumscribed quadrangle of a character candidate are linked to the contour vector that became the character candidate.
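The two conditions and the linking rule can be sketched as follows. This is an illustrative sketch only: the `Contour` record, the `is_outer` flag, and the threshold values are assumptions made for the example, and the source does not say how nested candidates are to be handled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


@dataclass
class Contour:
    box: Box
    is_outer: bool  # True for an outer contour, False for an inner one
    linked: List["Contour"] = field(default_factory=list)


def contains(outer: Box, inner: Box) -> bool:
    """True when `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def extract_character_candidates(contours: List[Contour],
                                 wx_th: int, wy_th: int) -> List[Contour]:
    candidates = []
    for c in contours:
        w_x = c.box[2] - c.box[0]
        w_y = c.box[3] - c.box[1]
        # Condition 1 (size) and condition 2 (outer contour).
        if c.is_outer and w_x < wx_th and w_y < wy_th:
            candidates.append(c)
    # Link every contour fully contained in a candidate's quadrangle to it.
    for cand in candidates:
        for other in contours:
            if other is not cand and contains(cand.box, other.box):
                cand.linked.append(other)
    return candidates


glyph = Contour((0, 0, 10, 12), is_outer=True)
hole = Contour((3, 3, 6, 6), is_outer=False)
road = Contour((0, 0, 200, 5), is_outer=True)
candidates = extract_character_candidates([glyph, hole, road], wx_th=32, wy_th=32)
```

In this example only `glyph` becomes a character candidate; `road` is too wide, and `hole` is an inner contour that ends up linked to `glyph`.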

[0018] Next, the character string formation processing (step S3 of FIG. 2) is described. This processing is performed after the polygonal figure detection processing of FIG. 4, following the flowcharts of FIGS. 10(a) and 10(b). In step S1 it is determined whether contour vectorization has finished; in step S2 it is determined whether the item is a character candidate or a character-figure candidate; and in step S3 the character candidate is registered in the registration unit 2 of FIG. 1 as a tree structure. In step S4 it is determined whether all string formation targets have been processed; in step S5, whether the item has not yet been formed into a string; and in step S6, whether it can serve as a string core. If it can, the character string candidate range is searched around the string core in step S7. In step S8 it is determined whether anything was found within the range, and if so, the found character is included in the character string (step S9). Then, in step S10 it is determined whether all string formation targets have been processed; in step S11, whether the item has not yet been formed into a string; and in step S12, whether it is a character-figure candidate. If it is, that character-figure candidate is determined to be a figure in step S13.

[0019] By performing the processing of FIG. 4 and FIGS. 10(a) and 10(b) as described above, the additional condition is imposed that only a character candidate can serve as the string core from which string formation starts. As a result, no character string consists solely of character-figure candidates, and any character-figure candidate that has not been formed into a string is automatically determined to be a figure candidate.
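The rule that only character candidates may seed a string, and that leftover character-figure candidates become figures, can be sketched as below. This is a simplified sketch, not the flow of FIG. 10 itself: the `in_range` callback stands in for the range search, the labels `"char"` and `"char_figure"` are hypothetical, and a core with no neighbors is kept as a one-element string here because the source does not state how that case is handled.

```python
from typing import Callable, Dict, List, Set, Tuple


def form_strings(kinds: Dict[str, str],
                 in_range: Callable[[str], List[str]]
                 ) -> Tuple[List[List[str]], List[str]]:
    """kinds maps an item id to 'char' or 'char_figure'.

    Returns (strings, figures), where figures are the character-figure
    candidates that were never absorbed into any string.
    """
    used: Set[str] = set()
    strings: List[List[str]] = []
    for item_id, kind in kinds.items():
        # Only an unused character candidate may act as a string core.
        if item_id in used or kind != "char":
            continue
        members = [item_id] + [m for m in in_range(item_id)
                               if m not in used and m != item_id]
        used.update(members)
        strings.append(members)
    figures = [item_id for item_id, kind in kinds.items()
               if kind == "char_figure" and item_id not in used]
    return strings, figures


kinds = {"A": "char", "B": "char_figure", "C": "char_figure"}
neighbours = {"A": ["B"]}  # B lies in A's search range, C is isolated
strings, figures = form_strings(kinds, lambda i: neighbours.get(i, []))
```

With this input, `B` is absorbed into the string seeded by `A`, and the isolated `C` is left over as a figure.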

[0020] The character string formation processing of FIG. 10 is performed by the registration unit 2, the character string candidate extraction unit 3, and the character string extraction unit 4 of FIG. 1 as follows. First, as shown in FIG. 11, character cutout processing is performed on the character candidates in the horizontal direction (step S1), the vertical direction (step S2), and the diagonal direction (step S3). The processing is the same for each direction: as shown in FIG. 12, character string candidate extraction (step S1) is followed by character string extraction (step S2) and then character cutout (step S3).

[0021] In the character string candidate extraction processing, as shown in FIG. 13, the character candidates are first organized into a tree structure (step S1), and then a range search for character string candidates is performed (step S2). In practice, the candidates are registered in the registration unit (registration unit 2 of FIG. 1) as a tree structure built by repeatedly dividing space in two, based on the center coordinates of the circumscribed quadrangles of the character candidates obtained as in FIG. 8. Then, a character candidate whose height is about the same as the height of the character string being sought is taken as the character string candidate core, and the character candidates whose center coordinates lie in the search range shown in FIG. 14(a) are retrieved from the character candidate tree of the registration unit 2.
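One way to realize a tree built by repeated spatial bisection of center coordinates is a two-dimensional k-d tree, sketched below. Treating the registration tree as a k-d tree is an assumption made for illustration; the patent does not name the structure, and `build` and `range_search` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Node:
    point: Point
    axis: int  # 0 splits on x, 1 splits on y
    left: Optional["Node"]
    right: Optional["Node"]


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Split the centers in two at the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def range_search(node: Optional[Node], rect: Rect,
                 out: Optional[List[Point]] = None) -> List[Point]:
    """Collect every center that lies inside rect, pruning whole subtrees."""
    if out is None:
        out = []
    if node is None:
        return out
    x_min, y_min, x_max, y_max = rect
    x, y = node.point
    if x_min <= x <= x_max and y_min <= y <= y_max:
        out.append(node.point)
    lo, hi = (x_min, x_max) if node.axis == 0 else (y_min, y_max)
    # Left subtree holds coordinates <= the split value, right holds >=.
    if lo <= node.point[node.axis]:
        range_search(node.left, rect, out)
    if node.point[node.axis] <= hi:
        range_search(node.right, rect, out)
    return out


centers = [(1, 1), (5, 1), (9, 1), (5, 20), (30, 2)]
tree = build(centers)
hits = sorted(range_search(tree, (0, 0, 10, 3)))
```

The pruning in `range_search` is what makes the neighborhood search cheaper than scanning every candidate, which is the benefit the text attributes to the tree registration.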

[0022] The character string candidate extraction unit 3 of FIG. 1 performs this search by the following procedure.
(1) To the right of the string core, search for the center points of other character candidates within a search range such as that of FIG. 14(b).
(2) Among the center points found in (1), take the one farthest from the core as the start point of the next search range. If no center point is found in (1), the rightward search ends.
(3) Do the same as (1) and (2) for the leftward direction.
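Steps (1) to (3) can be sketched as an iterative one-directional extension. The shape of the search range in FIG. 14(b) is not reproduced in the text, so the window used here (a horizontal `reach` from the current start point and a vertical `band` around the core) is purely an assumption, as are the function and parameter names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def extend(core: Point, centers: List[Point],
           reach: float, band: float, direction: int) -> List[Point]:
    """Collect centers to one side of the core; direction is +1 (right) or -1 (left)."""
    cx, cy = core
    start = cx
    found: List[Point] = []
    remaining = [c for c in centers if c != core]
    while True:
        # Step (1): centers inside the window that begins at `start`.
        hits = [c for c in remaining
                if 0 < (c[0] - start) * direction <= reach
                and abs(c[1] - cy) <= band]
        if not hits:
            break  # Step (2): nothing found, so this direction is finished.
        found.extend(hits)
        remaining = [c for c in remaining if c not in hits]
        # Step (2): the hit farthest from the core starts the next window.
        start = max(hits, key=lambda c: abs(c[0] - cx))[0]
    return found


core = (0, 0)
centers = [(0, 0), (8, 1), (15, -1), (40, 0), (-9, 0)]
right = extend(core, centers, reach=10, band=3, direction=+1)
left = extend(core, centers, reach=10, band=3, direction=-1)  # step (3)
```

In this example the rightward search chains from the core to (8, 1) and then to (15, -1), and stops before (40, 0) because the gap exceeds the assumed reach.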

[0023] The items found by the above search become the character string candidate. By setting the size of the character string candidate core in several steps, starting from the largest, and repeating the search, character strings of different sizes can also be handled.

[0024] Next, the character string extraction unit 4 extracts, from the character string candidates, those that satisfy the following condition as character strings. As shown in FIG. 15, when the height h_C of the character string candidate core and the distance d between character candidates satisfy

d(i, j) < h_C × constant ... (1)

the candidates i and j within the illustrated character string candidate form a character string.
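Expression (1) can be applied to a horizontal string candidate as sketched below. The sketch assumes that d(i, j) is the horizontal gap between the quadrangles of neighboring candidates, which is one reading of FIG. 15 and not confirmed by the text; `split_into_strings` and `alpha` (standing for the constant) are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def split_into_strings(boxes: List[Box], h_c: float, alpha: float) -> List[List[Box]]:
    """Group left-to-right neighbors whose gap d satisfies d < h_c * alpha."""
    ordered = sorted(boxes, key=lambda b: b[0])
    groups: List[List[Box]] = [[ordered[0]]] if ordered else []
    for prev, cur in zip(ordered, ordered[1:]):
        d = cur[0] - prev[2]  # gap between adjacent character candidates
        if d < h_c * alpha:
            groups[-1].append(cur)   # expression (1) holds: same string
        else:
            groups.append([cur])     # gap too large: start a new group
    return groups


boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (60, 0, 70, 10)]
groups = split_into_strings(boxes, h_c=10, alpha=1.0)
```

Scaling the allowed gap by the core height h_C is what lets the same constant serve strings of different character sizes.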

[0025] Next, the character separation processing (step S4 of FIG. 2) is described. The character cutout unit 5 cuts out characters from the extracted character string in units of character candidates. If the condition of expression (2) below is met, characters separated in the height direction are merged as in step S1 of FIG. 16; if the conditions of expressions (3) and (4) are met, characters separated in the width direction are merged as in step S2 of FIG. 16. For example, when character candidates i and j are separated in the height direction as in FIG. 17, they are merged as illustrated if the condition of expression (2) below holds.

[0026] max(x_min(i), x_min(j)) ≤ (x_cen(i) or x_cen(j)) ≤ min(x_max(i), x_max(j)) ... (2)

(x_min: minimum x coordinate of the circumscribed quadrangle; x_max: maximum x coordinate of the circumscribed quadrangle; x_cen: center x coordinate of the circumscribed quadrangle)

Likewise, when character candidates j and k are separated in the width direction as in FIG. 18, they are merged as illustrated if the conditions of expressions (3) and (4) below hold.

d(j, k) < d(average within the character string) × constant ... (3)
w(j) + w(k) + d(j, k) ≤ h × constant ... (4)

(d: distance between character candidates; w: character candidate width; h: character string height)

In the course of the character cutout processing described above (FIGS. 11 and 12), a processed mark is attached each time a character candidate is confirmed as part of a character string, so the number of items left to process decreases. Each process of FIG. 12 has been described for the horizontal direction; for the vertical direction, the x and y directions are swapped. For the diagonal direction, a predetermined inclination angle is assumed, and a coordinate transformation at that angle is included in the character string extraction processing and the character cutout processing (steps S2 and S3 of FIG. 12).
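The merge tests of expressions (2) to (4) can be sketched as below. The constants `c3` and `c4` stand for the unspecified constants of expressions (3) and (4), the gap d(j, k) is assumed to be the horizontal distance between the two quadrangles, and all function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def x_center(b: Box) -> float:
    return (b[0] + b[2]) / 2


def merge_height_direction(bi: Box, bj: Box) -> bool:
    """Expression (2): one center x lies inside the x-overlap of i and j."""
    lo = max(bi[0], bj[0])
    hi = min(bi[2], bj[2])
    return any(lo <= x_center(b) <= hi for b in (bi, bj))


def merge_width_direction(bj: Box, bk: Box, mean_gap: float,
                          h: float, c3: float, c4: float) -> bool:
    """Expressions (3) and (4) for two side-by-side candidates j and k."""
    d = bk[0] - bj[2]                      # gap between j and k
    w_j = bj[2] - bj[0]
    w_k = bk[2] - bk[0]
    return d < mean_gap * c3 and w_j + w_k + d <= h * c4


def union(a: Box, b: Box) -> Box:
    """Circumscribed quadrangle of the merged candidate."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


# A two-part character stacked vertically (cf. FIG. 17).
top, bottom = (0, 0, 10, 3), (2, 5, 8, 12)
stacked = merge_height_direction(top, bottom)
merged = union(top, bottom)
# Two narrow halves of one character side by side (cf. FIG. 18).
halves = merge_width_direction((0, 0, 4, 10), (5, 0, 9, 10),
                               mean_gap=3, h=10, c3=1.0, c4=1.0)
# Two full-width characters must stay separate.
separate = merge_width_direction((0, 0, 8, 10), (9, 0, 17, 10),
                                 mean_gap=3, h=10, c3=1.0, c4=1.0)
```

Expression (4) is what keeps two ordinary neighboring characters apart: their combined width exceeds the string height scaled by the constant, so only narrow fragments are merged.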

[0027] The character recognition processing (step S5 of FIG. 2) is performed as follows. The character recognition unit 6 compares each character cut out by the character cutout unit 5 against a dictionary in which characters are registered in advance, and recognizes the characters one at a time.

[0028]

[Effects of the Invention] As described above, the present invention comprises a contour vectorization processing unit, a polygonal figure detection processing unit, a registration unit, a character string candidate extraction unit, a character string extraction unit, and a character cutout unit. Character recognition is performed based on the character data cut out by the character cutout unit, and, among the character-figure candidates obtained by the polygonal figure detection processing unit, those that have not been formed into a character string are determined to be figures. The following advantageous effects are therefore obtained.

[0029]
(1) Even when figures of about the same size as the characters exist on the same drawing, as in a city map, characters and figures can be reliably distinguished and recognized.
(2) When characters are read and recognized from a document or drawing in which characters, line segments, and symbols are mixed, characters of any size and format can be read easily and accurately without placing restrictions on size or format.
(3) Because the character candidate data is registered in a tree structure, processing can be made faster.

[Brief description of drawings]

FIG. 1 is a block diagram of the overall configuration showing an embodiment of the present invention.

FIG. 2 is an overall flowchart of an embodiment of the present invention.

FIG. 3 is a flowchart of the contour vectorization processing unit of the present invention.

FIG. 4 is a flowchart of the polygonal figure detection processing of the present invention.

FIG. 5 is an explanatory diagram showing the contour vector series table of an embodiment of the present invention.

FIG. 6 is a flowchart of the short vector removal means of an embodiment of the present invention.

FIG. 7 is an explanatory diagram of the circumscribed quadrangle of an embodiment of the present invention.

FIG. 8 is an explanatory diagram of the circumscribed quadrangle information table of an embodiment of the present invention.

FIG. 9 is an explanatory diagram of the contour vectorization processing of an embodiment of the present invention.

FIG. 10 is a flowchart of the character string formation processing of the present invention.

FIG. 11 is a flowchart of the character cutout processing of an embodiment of the present invention.

FIG. 12 is a flowchart of the character cutout processing of an embodiment of the present invention.

FIG. 13 is a flowchart of the character string candidate extraction processing of an embodiment of the present invention.

FIG. 14 is an explanatory diagram of the character string candidate extraction processing of an embodiment of the present invention.

FIG. 15 is an explanatory diagram of the character string extraction processing of an embodiment of the present invention.

FIG. 16 is a flowchart of the character cutout processing of an embodiment of the present invention.

FIG. 17 is an explanatory diagram of the height-direction separated character merging processing of an embodiment of the present invention.

FIG. 18 is an explanatory diagram of the width-direction separated character merging processing of an embodiment of the present invention.

FIG. 19(a) is an explanatory diagram showing an example of a drawing input to the drawing reader, and FIG. 19(b) is an explanatory diagram showing an example of the correct recognition result.

FIG. 20(a) is an explanatory diagram showing an example in which a conventional drawing reader misrecognized a figure as a character, and FIG. 20(b) is an explanatory diagram showing an example in which a conventional drawing reader misrecognized a character as a figure.

[Explanation of symbols]

1: contour vectorization processing unit; 2: registration unit; 3: character string candidate extraction unit; 4: character string extraction unit; 5: character cutout unit; 6: character recognition unit.

Claims

1. A drawing reader that extracts character data from black/white binarized pixel data obtained by raster-scanning a subject in which characters, line segments, and symbols are mixed, and recognizes characters by comparing the extracted character data against a character dictionary, the drawing reader comprising: a contour vectorization processing unit that, based on the black/white binarized pixel data obtained by raster-scanning the subject, obtains contour vector data formed by joining vectors each connecting two mutually adjacent black pixels, and obtains the circumscribed quadrangle data circumscribing each contour vector as a character candidate; a polygonal figure detection processing unit that detects polygonal figures from the character candidate data obtained by the contour vectorization processing unit and obtains the detected polygonal figure data as character-figure candidates; a registration unit that registers, in a tree structure, the circumscribed quadrangle data of the character candidates obtained by the contour vectorization processing unit; a character string candidate extraction unit that determines a search range from a character string candidate core set to a predetermined size and extracts, as character string candidates, those character strings among the character candidates registered in the registration unit whose center coordinates lie within the search range; a character string extraction unit that extracts as character strings, from the data extracted by the character string candidate extraction unit, the data for which the height h_C of the character string candidate core and the distance d between mutually adjacent character candidates within the character string candidate satisfy d < h_C × α (α is a constant); and a character cutout unit that has a function of merging a plurality of neighboring character candidates within a character string extracted by the character string extraction unit and cuts out characters from the character string; wherein character recognition is performed based on the character data cut out by the character cutout unit, and, among the character-figure candidates obtained by the polygonal figure detection processing unit, those that have not been formed into a character string are determined to be figures.