JP2013016168A

JP2013016168A - Method and device for positioning text area in image

Info

Publication number: JP2013016168A
Application number: JP2012145538A
Authority: JP
Inventors: Yie-Hwon Pan; パン・イーフォン; Yuanping Zhu; ジュ・ユアヌピン; Junu Sunu; スヌ・ジュヌ; Satoshi Naoi; 聡直井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-06-30
Filing date: 2012-06-28
Publication date: 2013-01-24
Anticipated expiration: 2032-06-28
Also published as: CN102855478B; JP5939056B2; CN102855478A

Abstract

Problem: To disclose a method and an apparatus for locating a text region in an image.
Solution: A method for locating a text region in an image according to the present invention includes: estimating, for each pixel in an input image, the size of text that may exist around the pixel; extracting candidate stroke regions from the input image based on the text size and the degree of difference of regions; identifying true stroke regions among the candidate stroke regions; and merging the true stroke regions to form a text region.
Selected drawing: FIG. 2

Description

The present invention relates generally to image processing, and in particular to a method and an apparatus for locating a text region in an image.

Applications that index, search, or classify images need to extract information about image content from the images themselves. Images usually contain text, and this text correlates relatively strongly with the image content, so obtaining it is important for such applications. Typically, the text regions in an image are located first, and the image blocks containing them are then extracted and passed to optical character recognition (OCR) to obtain the text. Images can be divided into natural-scene images and images with artificially added text. Because artificially added text involves human intervention, locating text regions in such images is relatively easy. A natural-scene image, by contrast, consists only of pixels, so text regions are hard to distinguish from non-text regions and therefore hard to locate. The present invention focuses on locating text regions in images and can handle relatively complex images, including natural-scene images.

The following is a brief summary of the present invention, intended to provide a basic understanding of some of its aspects. This summary is not exhaustive. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the detailed description given later.

In view of the above problems of the prior art, an object of the present invention is to provide a method and an apparatus capable of locating a text region in an image. This technical solution can locate text regions in an image quickly and accurately and is applicable to any kind of image.

To achieve the above object, one aspect of the present invention provides a method for locating a text region in an image. The method includes: estimating, for each pixel in an input image, the size of text that may exist around the pixel; extracting candidate stroke regions from the input image based on the text size and the degree of difference of regions; identifying true stroke regions among the candidate stroke regions; and merging the true stroke regions to form a text region.

Another aspect of the present invention provides an apparatus for locating a text region in an image. The apparatus includes: an estimation unit for estimating, for each pixel in an input image, the size of text that may exist around the pixel; an extraction unit for extracting candidate stroke regions from the input image based on the text size and the degree of difference of regions; an identification unit for identifying true stroke regions among the candidate stroke regions; and a merging unit for merging the true stroke regions to form a text region.

Another aspect of the present invention provides a storage medium. The storage medium contains machine-readable program code which, when executed on an information processing apparatus, causes the apparatus to perform the above method according to the present invention.

Another aspect of the present invention provides a program. The program contains computer-executable instructions which, when executed on an information processing apparatus, cause the apparatus to perform the above method according to the present invention.

The drawings show, in order:
An example of a natural-scene image to be processed.
A flowchart of a method for locating a text region in an image according to an embodiment of the present invention.
A schematic diagram of the structure of an image pyramid.
A flowchart showing the details of step S201 in FIG. 2.
A flowchart showing the details of step S202 in FIG. 2.
A flowchart showing the details of step S203 in FIG. 2.
A flowchart showing the details of step S204 in FIG. 2.
A flowchart showing the details of step S702 in FIG. 7A.
A schematic diagram of a chain structure connecting all stroke regions.
A schematic diagram of the chain structure after division into lines.
A schematic diagram of the chain structure after division into characters.
A schematic diagram of the processing result of the method for locating a text region in an image according to an embodiment of the present invention.
A block diagram of the configuration of an apparatus for locating a text region in an image according to an embodiment of the present invention.
A block diagram schematically showing a computer for implementing the method and apparatus of the embodiments of the present invention.

Exemplary embodiments of the present invention are described in detail below with reference to the drawings. For clarity and brevity, not all features of an actual implementation are described in this specification. It should be understood, however, that developing any such actual implementation requires many implementation-specific decisions, for example to satisfy system-related and business-related constraints, and that these constraints may vary from one implementation to another. It should also be understood that although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art who have the benefit of this disclosure.

It should further be noted that, to avoid obscuring the present invention with unnecessary detail, the drawings show only the apparatus structures and/or processing steps closely related to the technical solution of the present invention, and omit other details that have little to do with it. In addition, elements and features described in one drawing or one embodiment of the present invention may be combined with elements and features shown in one or more other drawings or embodiments.

The flow of a method for locating a text region in an image according to an embodiment of the present invention is described below with reference to FIG. 2.

FIG. 1 shows one specific example of an image that the present invention can process. The natural-scene image in FIG. 1 has a house as background and a traffic sign as foreground. As noted above, however, the image consists only of pixels and carries no artificially added tags marking its text regions, so extracting the text regions accurately and quickly is difficult.

As shown in FIG. 2, the method for locating a text region in an image according to an embodiment of the present invention includes: estimating, for each pixel in the input image, the size of text that may exist around the pixel (step S201); extracting candidate stroke regions from the input image based on the text size and the degree of difference of regions (step S202); identifying true stroke regions among the candidate stroke regions (step S203); and merging the true stroke regions to form a text region (step S204).

Step S201 of FIG. 2 is described in detail below with reference to FIGS. 3 and 4.

FIG. 3 shows the structure of an image pyramid. Each level L_n (n ≥ 1) holds one pyramid image. The pyramid image of the first level L_1 is the original input image, a specific example of which is shown in FIG. 1. Each level L_n has a scaling factor sc_n relative to the first level L_1. For each level L_n (n > 1), the input image is reduced uniformly by the scaling factor sc_n to obtain the pyramid image of that level. For example, suppose the input image is 8 × 8 pixels. With a step of 1/2, the scaling factor of the second level relative to the first level L_1 is sc_2 = 1/2, and nearest-neighbor interpolation yields a second-level pyramid image of 4 × 4 pixels. Continuing in the same way gives an image pyramid at multiple scales from the input image, as in FIG. 3. For the n-th level (n ≥ 1) with step size step, the scaling factor is clearly sc_n = step^(n-1). The scaling factor of each pixel in the input image is calculated from the confidences of its corresponding pixels in the pyramid images of all levels (including the pixel itself) and the scaling factors of those pyramid images, as described later with reference to Equation 2.
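The pyramid construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names (scaling_factors, resize_nearest, build_pyramid) and the list-of-lists image representation are assumptions made for the example.

```python
def scaling_factors(step, num_levels):
    """Scaling factor of level n (1-based) relative to level 1: step ** (n - 1)."""
    return [step ** (n - 1) for n in range(1, num_levels + 1)]


def resize_nearest(image, scale):
    """Shrink a 2-D list-of-lists image by `scale` using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, int(round(src_h * scale)))
    dst_w = max(1, int(round(src_w * scale)))
    out = []
    for y in range(dst_h):
        src_y = min(src_h - 1, int(y / scale))
        out.append([image[src_y][min(src_w - 1, int(x / scale))]
                    for x in range(dst_w)])
    return out


def build_pyramid(image, step=0.5, num_levels=3):
    """Return [(sc_n, image_n), ...]; level 1 is the input image itself."""
    return [(sc, image if sc == 1 else resize_nearest(image, sc))
            for sc in scaling_factors(step, num_levels)]


if __name__ == "__main__":
    img = [[r * 8 + c for c in range(8)] for r in range(8)]
    for sc, level in build_pyramid(img):
        print(sc, len(level), "x", len(level[0]))
```

With an 8 × 8 input and a step of 1/2, the levels are 8 × 8, 4 × 4, and 2 × 2, matching the example in the text.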

FIG. 4 is a flowchart showing the details of step S201 in FIG. 2.

First, as described above, pyramid images of multiple levels, i.e. an image pyramid, are generated from the input image (step S401).

Next, for the pyramid image of each level, the probability that text exists around each pixel is estimated (step S402). Specifically, the pyramid image of each level is scanned with a scan window of fixed size to obtain, for each pixel of that pyramid image, the probability that text exists around it. In this embodiment the scan window size is the same at every pyramid level, fixed at the size used for the original input image. For each pixel, a local texture descriptor within the scan window centered on the pixel is calculated, for example a HOG (Histograms of Oriented Gradients) feature. The calculated HOG feature is input to a trained classifier, which returns the confidence (probability) that text exists around the pixel.
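As a rough illustration of the kind of local texture descriptor used in step S402, the sketch below computes a magnitude-weighted histogram of gradient orientations inside a square window. It is a simplified stand-in for HOG (no cells, blocks, or block normalization), and the function name, the window parameter half, and the bin count are assumptions for the example. The trained classifier that would consume this feature is not shown.

```python
import math


def orientation_histogram(image, cx, cy, half, bins=8):
    """Histogram of unsigned gradient orientations in a square window.

    The window spans (2 * half + 1) pixels per side, centred on (cx, cy).
    Each pixel votes for its orientation bin with its gradient magnitude,
    and the histogram is normalised to sum to 1 when any gradient exists.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(max(1, cy - half), min(height - 1, cy + half + 1)):
        for x in range(max(1, cx - half), min(width - 1, cx + half + 1)):
            gx = image[y][x + 1] - image[y][x - 1]  # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            hist[min(bins - 1, int(angle / math.pi * bins))] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

An image whose intensity changes only along x puts all weight in the first bin, while one that changes only along y puts it in the middle bin, so the descriptor separates stroke directions as intended.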

When the classifier is trained, it is known in advance whether a text region exists around each pixel, that is, whether the pixel is one of the pixels making up a text region. For example, 1 indicates that a text region exists around the pixel and 0 indicates that none does. For each pixel of a training image, the HOG feature is calculated with a scan window of fixed size, and the feature is input to the classifier together with the label 1 or 0 indicating whether text exists around the pixel. A classifier trained on a large amount of such data can be used in step S402.

Note that the local texture descriptor is not limited to the HOG feature described above. For example, a wavelet feature may be calculated instead: a wavelet transform is applied to the pixels in the scan window centered on a pixel, and statistics of the resulting wavelet coefficients, for example their mean and/or variance, are used as the wavelet feature of that pixel. As the classifier, WaldBoost, AdaBoost, or a similar classifier can be chosen.

In step S403, the coordinates of the corresponding pixel in the pyramid image of each level are first calculated for each pixel of the input image. For example, if the scaling factor sc_2 of the second-level pyramid image is 2, the pixel at coordinates (a, b) in the input image corresponds to the pixel at coordinates (2a, 2b) in the second-level pyramid image. That is, multiplying the abscissa and ordinate of a pixel in the input image by the scaling factor of a pyramid level gives the abscissa and ordinate of the corresponding pixel in that level's pyramid image. If the scaling factor is not an integer, the results are rounded to the nearest integer to give the abscissa and ordinate of the corresponding pixel. For a pixel in the input image, its corresponding pixel in the first pyramid level, i.e. the input image, is clearly the pixel itself. Each pixel in the input image therefore has exactly one corresponding pixel in the pyramid image of each level.
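The coordinate mapping of step S403 can be sketched as follows. The function names are illustrative. Two details are assumptions of this sketch rather than statements from the source: halves are rounded up (Python's built-in round uses banker's rounding, so it is avoided here), and the result is clamped to the bounds of the pyramid level so that rounding cannot index past the last row or column.

```python
import math


def round_half_up(value):
    """Round a non-negative value to the nearest integer, halves going up."""
    return int(math.floor(value + 0.5))


def corresponding_pixel(a, b, sc, level_width, level_height):
    """Map input-image coordinates (a, b) to the pyramid level with factor sc.

    ASSUMPTION: the result is clamped to the level's bounds; the source
    text only specifies the multiplication and the rounding.
    """
    x = min(level_width - 1, round_half_up(a * sc))
    y = min(level_height - 1, round_half_up(b * sc))
    return x, y
```

For instance, with sc = 0.5 and a 4 × 4 level, input pixel (7, 7) maps to 3.5 in each coordinate, which rounds to 4 and is then clamped to (3, 3).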

Let P_i be the set of pixels, one in the pyramid image of each level (including the first level, i.e. the input image itself), that correspond to the i-th pixel s_i of the input image. Suppose j belongs to P_i, that is, j is the pixel corresponding to s_i in the pyramid image of some level, and that this pyramid image has scaling factor sc_j. Suppose further that the confidence, calculated in step S402, that text exists around j is w_j. The probability that text exists around pixel s_i in the input image and the scaling factor sc_i are then calculated according to Equations 1 and 2, each of which includes a normalization factor, one for the confidence and one for the scaling factor.

[Equations 1 and 2, together with the symbols for the probability and the two normalization factors, are images in the original publication and are not reproduced in this text.]

In Equations 1 and 2, w_j serves as a weighting coefficient that projects the information of the corresponding pixel in each pyramid level onto the original input image, i.e. the first-level pyramid image.

From the scaling factor sc_i and the size of the scan window, the size of text that may exist around pixel s_i in the input image can be calculated. For example, if the scan window size is given by its length or width, then a circle of radius length/sc_i or width/sc_i, or a rectangle of length length/sc_i and width width/sc_i, can represent the size of text that may exist around pixel s_i in the input image.

As a variant of Equation 2, replacing sc_j with length/sc_j makes the left-hand side length/sc_i. As the above description shows, the physical meaning of Equation 2 is this: for each pixel s_i in the input image, the size length/sc_i of text that may exist around s_i is calculated from the confidence w_j of each corresponding pixel j in the pyramid and the text size length/sc_j associated with that pixel's pyramid level.
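Since Equations 1 and 2 themselves are not available in this text, the following Python sketch shows only one plausible reading of the description: both the probability and the scaling factor of s_i are taken as means over P_i weighted by the confidences w_j and normalized by the sum of those confidences. That form, the function name project_to_input, and the handling of the all-zero case are assumptions; the original equations may differ.

```python
def project_to_input(candidates):
    """Combine the (w_j, sc_j) pairs of the pixels j in P_i into (w_i, sc_i).

    ASSUMPTION: both results are confidence-weighted means normalised by the
    sum of the confidences. The exact Equations 1 and 2 are not reproduced
    in the text, so this is an illustration, not the patented formula.
    """
    norm = sum(w for w, _ in candidates)
    if norm == 0:
        return 0.0, None  # no pyramid level reports text around this pixel
    w_i = sum(w * w for w, _ in candidates) / norm
    sc_i = sum(w * sc for w, sc in candidates) / norm
    return w_i, sc_i
```

Under this reading, a pixel whose level-2 counterpart (sc = 0.5) is far more confident than its level-1 counterpart ends up with a scaling factor close to 0.5, i.e. an estimated text size close to twice the scan window.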

As an alternative to Equations 1 and 2, among the pixels corresponding to pixel s_i of the input image, the confidence w_j and the scaling factor sc_j of the pixel j with the highest confidence w_j may be taken as the probability that text exists around s_i and as the scaling factor sc_i, respectively.
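The highest-confidence alternative is simple enough to state directly in code. The function name and the (w_j, sc_j) tuple layout are illustrative choices, not taken from the source.

```python
def pick_most_confident(candidates):
    """Return the (w_j, sc_j) pair with the highest confidence w_j.

    `candidates` holds one (w_j, sc_j) pair per pyramid level for the
    pixels in P_i; the returned pair is used directly as (w_i, sc_i).
    """
    return max(candidates, key=lambda pair: pair[0])
```

Compared with a weighted combination, this choice always yields a scaling factor that matches one of the pyramid levels exactly.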

Steps S401 to S403 thus yield, for each pixel of the original input image, the probability that text exists around the pixel and the size of text that may exist around it.

Using an image pyramid to apply scaling makes it possible to detect characters of any size. By comparison, if the image is scanned with a window of fixed size without any scaling, that is, if only the first level of the above embodiment exists, the probability that text exists around each pixel can still be obtained, and the size of text that may exist around each pixel is the size of the scan window. The present invention can still be carried out in this case, but without scaling only characters of one fixed size can be detected.

Note that scaling is not limited to the approach described above, in which the pyramid images are scaled while the scan window stays unchanged. Instead of scaling the input image, the input image may be scanned several times with scan windows of varying size. The results of the individual scans can likewise be used to calculate, for each pixel of the input image, the probability that text exists around it and the size of text that may exist around it.

Step S202 in FIG. 2 is described in detail below with reference to FIG. 5.

FIG. 5 is a flowchart showing the details of step S202 in FIG. 2.

As described above, in step S202 candidate stroke regions are extracted from the input image based on the degree of difference of regions and on the size, calculated in step S201, of text that may exist around each pixel.

Specifically, similar regions in the input image are merged repeatedly, starting from regions that are single pixels, until a predetermined condition is met. The merged regions obtained in this way are the candidate stroke regions. The merging criterion mainly takes into account the intra-region difference, the inter-region difference, and the size of text that may exist around a region.

First, in step S501, the inter-region difference and the intra-region difference are calculated for adjacent regions in the input image. The intra-region difference is, for example, the largest color difference within the region. For a grayscale image, this is the absolute difference between the gray values of the pixel with the highest gray value and the pixel with the lowest gray value in the region. For a color image represented by (Y, Cr, Cb), for example, a color value can be calculated for each pixel from its components, and the largest absolute difference of this color value between pixels in the region is taken as the largest color difference within the region. [The formula for this color value is an image in the original publication and is not reproduced in this text.] The inter-region difference can be the absolute difference between the mean gray values of the two regions for a grayscale image, or the absolute difference between the mean color values of the two regions for a color image. Initially, adjacent regions are adjacent pixels: the intra-region difference of a single pixel is 0, and the inter-region difference of two adjacent pixels is, for example, the absolute difference of their gray values (grayscale image), or the absolute difference of their color values or of their luminance (color image).
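For a grayscale image, the two difference measures of step S501 can be sketched as follows. The function names are illustrative, and each region is represented simply as a list of the gray values of its pixels.

```python
def intra_region_difference(values):
    """Largest gray-value difference inside one region (0 for a single pixel)."""
    return max(values) - min(values)


def inter_region_difference(values_1, values_2):
    """Absolute difference between the mean gray values of two regions."""
    mean_1 = sum(values_1) / len(values_1)
    mean_2 = sum(values_2) / len(values_2)
    return abs(mean_1 - mean_2)
```

For two single-pixel regions, inter_region_difference reduces to the absolute difference of the two gray values, which is the initial case described in the text.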

In step S501, let the adjacent regions be C_1 and C_2. Call the set of pixels of C_1 that are adjacent to pixels of C_2 the boundary set of C_1, and the set of pixels of C_2 that are adjacent to pixels of C_1 the boundary set of C_2. Besides calculating the intra-region and inter-region differences over the whole of C_1 and C_2 as described above, the same method may be applied to the two boundary sets only, and the resulting values may be used to represent the intra-region and inter-region differences of C_1 and C_2.

Next, in step S502, the size of text that may exist around each region is estimated from the per-pixel text sizes estimated for the input image in step S201, and the intra-region difference of each region is adjusted based on this estimated text size.

This step rests on the inventors' finding that deciding whether to merge adjacent regions solely from the intra-region and inter-region differences is not fully accurate. If the text that may exist around adjacent regions C_1 and C_2 is known to be small, C_1 and C_2 most likely should not be merged. If it is known to be large, C_1 and C_2 most likely should be merged. Taking the size of text that may exist around a region into account and adjusting the intra-region difference accordingly therefore allows a more accurate decision on whether adjacent regions C_1 and C_2 should be merged.

If region C_1 contains only one pixel, the size of text that may exist around C_1 in the input image can be calculated from the scaling factor sc_i of that single pixel and the size of the scan window. For example, if the scan window size is given by its length or width, a circle of radius length/sc_i or width/sc_i, or a rectangle of length length/sc_i and width width/sc_i, can represent the size of text that may exist around C_1. If C_1 contains more than one pixel, the text size can be calculated from the mean of the scaling factors sc_i over the pixels of C_1 and the size of the scan window. For example, writing a_sc_i for the mean of sc_i, a circle of radius length/a_sc_i or width/a_sc_i, or a rectangle of length length/a_sc_i and width width/a_sc_i, can represent the size of text that may exist around C_1 in the input image.
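The region-level text size follows directly from this description. In the sketch below the function name and argument names are illustrative. A single-pixel region is just the special case of a one-element list of scaling factors.

```python
def region_text_size(window_length, window_width, scaling_factors):
    """Rectangle (length, width) of text that may exist around a region.

    `scaling_factors` holds sc_i for every pixel of the region; their mean
    corresponds to a_sc_i in the text.
    """
    a_sc = sum(scaling_factors) / len(scaling_factors)
    return window_length / a_sc, window_width / a_sc
```

For a 16 × 8 scan window, a region whose pixels have a mean scaling factor of 0.5 gets an estimated text size of 32 × 16.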

Next, in step S503, it is determined whether the inter-region difference of the adjacent regions is smaller than the minimum of their adjusted intra-region differences. If so, the process proceeds to step S504, where the adjacent regions C_1 and C_2 are merged, and then to step S505. If not, the process proceeds directly to step S505.

Note that steps S501 to S503 are performed for all currently adjacent regions.

In step S505, it is determined whether none of the currently adjacent regions satisfied the merging condition in step S503. If the answer is no, that is, at least one newly merged region exists, the process returns to step S501. If the answer is yes, none of the currently adjacent regions can be merged, which means that all candidate stroke regions have been extracted.
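The loop formed by steps S501 to S505 can be sketched for a grayscale image as below. Several points are assumptions of this sketch and not statements from the source: the adjustment of the intra-region difference is taken to be additive (intra-region difference plus K times the mean per-pixel text size), regions are 4-connected, merges found within one pass are applied together through a small union-find table, and all names (find, extract_candidate_regions, k) are illustrative.

```python
def find(parent, region):
    """Follow parent links to the representative of a merged region."""
    while parent[region] != region:
        region = parent[region]
    return region


def extract_candidate_regions(gray, text_size, k=1.0):
    """Return a label map; pixels sharing a label form one candidate region.

    gray[y][x] is a gray value, text_size[y][x] the estimated text size
    around that pixel, and k the empirical coefficient K.
    """
    height, width = len(gray), len(gray[0])
    label = [[y * width + x for x in range(width)] for y in range(height)]
    while True:
        # S501/S502: mean gray value and adjusted intra-region difference.
        members = {}
        for y in range(height):
            for x in range(width):
                members.setdefault(label[y][x], []).append((y, x))
        mean, adjusted = {}, {}
        for region, pixels in members.items():
            values = [gray[y][x] for y, x in pixels]
            size = sum(text_size[y][x] for y, x in pixels) / len(pixels)
            mean[region] = sum(values) / len(values)
            # ASSUMPTION: the adjustment is additive (Int + K * text size).
            adjusted[region] = (max(values) - min(values)) + k * size
        # S503/S504: test every pair of adjacent regions, merge if allowed.
        parent = {region: region for region in members}
        merged_any = False
        for y in range(height):
            for x in range(width):
                for ny, nx in ((y, x + 1), (y + 1, x)):
                    if ny >= height or nx >= width:
                        continue
                    r1, r2 = label[y][x], label[ny][nx]
                    if r1 == r2:
                        continue
                    dif = abs(mean[r1] - mean[r2])
                    if dif < min(adjusted[r1], adjusted[r2]):
                        root1, root2 = find(parent, r1), find(parent, r2)
                        if root1 != root2:
                            parent[root2] = root1
                            merged_any = True
        # S505: stop once a full pass produced no merge.
        if not merged_any:
            return label
        label = [[find(parent, label[y][x]) for x in range(width)]
                 for y in range(height)]
```

A larger estimated text size raises the merge threshold, so pixels near large text merge more readily, which is the behavior the description asks for.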

Exemplary Equations 3 and 4 for implementing steps S501 to S503 are given below.
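The images of Equations 3 and 4 are not reproduced in this text. One form consistent with the merge test of step S503 and with the symbols D, Dif, MInt, Int, τ and min() used in this description is given below as an assumption. In particular, the additive combination of Int and τ and the strict inequality are inferred, not taken from the published equations.

```latex
\begin{align}
D(C_1, C_2) &=
\begin{cases}
\text{true},  & \text{if } \mathrm{Dif}(C_1, C_2) < \mathrm{MInt}(C_1, C_2) \\
\text{false}, & \text{otherwise}
\end{cases} \tag{3} \\
\mathrm{MInt}(C_1, C_2) &= \min\bigl(\mathrm{Int}(C_1) + \tau(C_1),\ \mathrm{Int}(C_2) + \tau(C_2)\bigr) \tag{4}
\end{align}
```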

Dif(C₁, C₂) denotes the degree of difference between the regions. When D(C₁, C₂) is true, regions C₁ and C₂ are merged; when D(C₁, C₂) is false, they are not merged. min() takes the minimum value, and Int(C₁) and Int(C₂) denote the within-region degrees of difference of C₁ and C₂, respectively. τ(C₁) and τ(C₂) are the scaling regularization terms of C₁ and C₂, respectively, and represent the size of the text that can exist around C₁ and C₂. As described above, step S201 yields the size of the text that can exist around each pixel, and the size of the text that can exist around C₁ and C₂ can be calculated from the sizes obtained for the pixels they contain. Multiplying each of these region-level sizes by an empirically obtained coefficient K gives the scaling regularization terms τ(C₁) and τ(C₂). MInt(C₁, C₂) is an intermediate result: the minimum of the adjusted within-region degrees of difference of the adjacent regions.
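A minimal Python sketch of the merge loop of steps S501 to S505 follows. It rests on several assumptions that are not taken from the specification: the image is reduced to a one-dimensional row of pixels, Int is measured as the intensity range within a region, Dif as the intensity jump across the shared boundary, and the adjusted within-region difference is assumed to be Int + τ. All function names are hypothetical.

```python
from statistics import mean


def within_difference(region):
    """Int(C): here the intensity range inside the region (assumed measure)."""
    values = [v for v, _ in region]
    return max(values) - min(values)


def between_difference(left, right):
    """Dif(C1, C2): here the intensity jump across the boundary (assumed measure)."""
    return abs(left[-1][0] - right[0][0])


def tau(region, k):
    """Scaling regularization term: K times the mean per-pixel text size."""
    return k * mean(size for _, size in region)


def merge_adjacent_regions(pixels, k):
    """Merge a 1-D row of (intensity, text_size) pixels until no pair qualifies."""
    regions = [[p] for p in pixels]            # start from single-pixel regions
    merged_any = True
    while merged_any:                          # S505: repeat while something merged
        merged_any = False
        i = 0
        while i < len(regions) - 1:
            c1, c2 = regions[i], regions[i + 1]
            # Minimum of the adjusted within-region differences (assumed Int + tau).
            m_int = min(within_difference(c1) + tau(c1, k),
                        within_difference(c2) + tau(c2, k))
            if between_difference(c1, c2) < m_int:    # S503: merge condition
                regions[i:i + 2] = [c1 + c2]          # S504: merge C1 and C2
                merged_any = True
            else:
                i += 1
    return regions
```

Because τ grows with the estimated text size, regions lying where large text is expected tolerate larger intensity differences before a boundary is kept.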

Step S203 of FIG. 2 is described in detail below with reference to FIG. 6.

FIG. 6 is a flowchart showing the details of step S203 of FIG. 2.

As described above, a plurality of candidate stroke regions are obtained in step S202. In step S203, candidate stroke regions that were erroneously extracted from the background are filtered out, and the true stroke regions are identified.

In step S203, a conditional random field (CRF) model is adopted, and both the features of individual strokes and the relationships between adjacent strokes are taken into account, so that it can be determined accurately whether each candidate stroke region is a true stroke region.

First, in step S601, the candidate stroke regions that are correlated with one another are identified. Specifically, whether two candidate stroke regions are correlated is determined from the size information of the candidate stroke regions and the distance between them. Each candidate stroke region is one connected component, and the width and height of the bounding rectangle of that connected component are taken as the width w and height h of the candidate stroke region. With dist(r_i, r_j) denoting the distance between the centroids of two candidate stroke regions i and j, and min[] taking the minimum value, Equation 5 below determines whether the two candidate stroke regions are correlated.

Candidate stroke regions that satisfy Equation 5 are regarded as correlated. Applying this determination to all candidate stroke regions yields a neighborhood graph of candidate stroke regions, in which each candidate stroke region is a node and the nodes of correlated candidate stroke regions are connected to each other.
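The construction of the neighborhood graph can be sketched in Python as follows. Equation 5 is not reproduced in this text, so the sketch assumes a test of the form dist(r_i, r_j) < alpha × min[max(w_i, h_i), max(w_j, h_j)]. Both this form and the factor `alpha` are assumptions, as are the function name and the dictionary keys.

```python
import math
from itertools import combinations


def build_neighborhood_graph(regions, alpha=2.0):
    """Return {node: set of correlated nodes} for candidate stroke regions.

    regions: list of dicts with centroid 'cx', 'cy' and bounding-box 'w', 'h'.
    alpha:   hypothetical scale factor of the assumed correlation test.
    """
    graph = {i: set() for i in range(len(regions))}
    for i, j in combinations(range(len(regions)), 2):
        ri, rj = regions[i], regions[j]
        dist = math.hypot(ri["cx"] - rj["cx"], ri["cy"] - rj["cy"])
        limit = alpha * min(max(ri["w"], ri["h"]), max(rj["w"], rj["h"]))
        if dist < limit:            # assumed form of Equation 5
            graph[i].add(j)         # correlated regions are connected
            graph[j].add(i)
    return graph
```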

In step S602, the true stroke regions are identified according to Equation 6 below.

E denotes a confidence and is given by a trained classifier. X denotes the observed feature values. Y denotes whether the regions are true stroke regions. G denotes the neighborhood graph of candidate stroke regions. Λ denotes the parameters. x_i is a feature of a candidate stroke region, such as its size or confidence. In step S201 above, the confidence that a text region exists around each pixel is calculated; from it, the confidence that a text region exists around a candidate stroke region can be calculated (for example, by averaging the confidences of the pixels in the candidate stroke region) and used as the feature x_i of candidate stroke region i. y_i indicates whether candidate stroke region i is a true stroke region: i is a true stroke region when y_i is 1, and is not when y_i is 0. λ_uni and λ_bi are parameters obtained by training the classifier. N_i is the set of all candidate stroke regions correlated with i, and j is one candidate stroke region in N_i. x_j denotes the feature of candidate stroke region j, for example the confidence that a text region exists around the pixels in j. y_j indicates whether candidate stroke region j is a true stroke region: j is a true stroke region when y_j is 1, and is not when y_j is 0. λ_ij is a weighting coefficient that reflects the degree of correlation between j and i. The weighting coefficient λ_ij is calculated using Equations 7 and 8 below.

For each candidate stroke region i, the text line l_i on which i lies is fitted using i itself and all candidate stroke regions correlated with it (that is, N_i). Specifically, each point in the feature space represents the feature of one candidate stroke region. The points corresponding to N_i and i are fitted, and the candidate stroke regions whose points belong to the same fitted curve are identified as belonging to the text line l_i on which i lies. j is one of the candidate stroke regions in N_i, that is, one that is correlated with i. dist(j, l_i) is the distance from the centroid of j to l_i.

The normalization factor used in these equations is obtained empirically, and exp[] is the exponential function with base e, the base of the natural logarithm. The equations also contain a regression error term. As can be seen from the above, the farther j is from l_i, the smaller the weighting coefficient becomes. Using this weighting makes the weighting coefficients of the candidate stroke regions j correlated with i differ from one another, so that a region j belonging to the same text line as i has a greater influence on i. This prevents a candidate stroke region that has similar features but lies far from i from having a large influence on i.

In the above equation, E(x_ij, y_i, y_j, λ_bi) can be used in place of E(x_i, x_j, y_i, y_j, λ_bi), where x_ij can be the absolute value of the difference between the average confidences that a text region exists around the pixels in candidate stroke regions i and j. x_ij can also be the distance between the centroids of candidate stroke regions i and j, which better reflects the relationship between the regions. E(x_i, y_i, λ_uni) is the confidence as to whether a single candidate stroke region is a true stroke region (it depends on the value that y_i takes), E(x_i, x_j, y_i, y_j, λ_bi) denotes the confidence for the case where the values taken are y_i and y_j, and the pairwise term of Equation 6 represents the relationship between correlated candidate stroke regions.
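The idea that the weighting coefficient decays with the distance from the fitted text line can be sketched in Python as follows. Equations 7 and 8 are not reproduced in this text, so everything specific here is an assumption: centroid coordinates stand in for the feature space, a single least-squares (principal-axis) line stands in for the fitted curve, and the weight is taken to be exp(-dist / sigma) with a hypothetical normalization factor `sigma`.

```python
import math


def line_weights(center_i, neighbours, sigma=10.0):
    """Return one weight per neighbour j, decaying with dist(j, l_i).

    center_i:   (x, y) centroid of candidate stroke region i.
    neighbours: list of (x, y) centroids of the regions in N_i.
    sigma:      hypothetical normalization factor.
    """
    pts = [center_i] + list(neighbours)
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    # Direction of the principal axis through the mean point: the fitted line l_i.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    weights = []
    for x, y in neighbours:
        d = abs((x - mx) * dy - (y - my) * dx)   # perpendicular distance to l_i
        weights.append(math.exp(-d / sigma))
    return weights
```

In this sketch a neighbour lying on the fitted line receives a weight of 1, and a neighbour above or below the line receives a smaller weight, which mirrors the behaviour described for λ_ij.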

For each candidate stroke region i, hypotheses are made as to whether i and every j associated with i are true stroke regions; that is, values of y_i and y_j are assumed. Every possible assignment of values, together with the corresponding features (that is, x_i, x_j and/or x_ij), is input to the trained classifier, the classifier returns the values of E(x_i, y_i, λ_uni) and E(x_i, x_j, y_i, y_j, λ_bi), and E(X, Y, G, Λ) is calculated. The assignment of y_i and y_j that maximizes E(X, Y, G, Λ) is taken as the result of identifying the true stroke regions.

That is, in step S602, multiple hypothesis combinations are generated, each specifying whether every candidate stroke region in the input image is a true stroke region. For each hypothesis combination, a first confidence is calculated for each candidate stroke region from the hypothesis combination and the features of the pixels in that region, and a second confidence corresponding to the first confidence is calculated from the hypothesis combination and the features of the pixels in the mutually correlated candidate stroke regions. A confidence for the hypothesis combination as a whole is then calculated from the first and second confidences. The hypothesis combination with the highest confidence is taken as the result of identifying the true stroke regions. When mutually correlated candidate stroke regions belong to the same text line, the corresponding second confidence is given a relatively large weighting coefficient in calculating the confidence of the hypothesis combination.
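The search over hypothesis combinations can be sketched in Python as follows. The sketch assumes that the overall confidence is the sum of the first confidences plus the weighted second confidences, which is an inferred reading of Equation 6. The callables `unary` and `pairwise` are hypothetical stand-ins for the trained classifier, and the function name is likewise hypothetical.

```python
from itertools import product


def best_labelling(features, graph, weights, unary, pairwise):
    """Exhaustively search y in {0, 1}^n for the highest overall confidence.

    features: list of x_i, one per candidate stroke region.
    graph:    dict i -> iterable of correlated regions j (the set N_i).
    weights:  dict (i, j) -> weighting coefficient lambda_ij.
    unary(x_i, y_i) and pairwise(x_i, x_j, y_i, y_j) return confidences.
    """
    n = len(features)
    best, best_score = None, float("-inf")
    for labels in product((0, 1), repeat=n):      # every hypothesis combination
        score = 0.0
        for i in range(n):
            score += unary(features[i], labels[i])               # first confidence
            for j in graph.get(i, ()):
                score += weights[(i, j)] * pairwise(             # weighted second confidence
                    features[i], features[j], labels[i], labels[j])
        if score > best_score:
            best, best_score = labels, score
    return best, best_score
```

Exhaustive enumeration evaluates 2^n combinations, so a sketch of this kind is only practical for a handful of candidate stroke regions.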

With the above, the true stroke regions have been obtained in step S203. Next, in step S204, the true stroke regions are merged to form text regions.

Step S204 of FIG. 2 is described below with reference to FIGS. 7A-7B and 8A-8C.

FIG. 7A is a flowchart showing the details of step S204 of FIG. 2. FIG. 8A is a schematic diagram of a chain structure connecting all true stroke regions. FIG. 8B is a schematic diagram of the chain structures after separation into lines. FIG. 8C is a schematic diagram of the chain structures after separation into characters.

The true stroke regions have already been identified in step S203. In step S204, these stroke regions are merged to form text regions.

First, in step S701, the connection relationships between stroke regions are determined from the distances between them. The distance between two stroke regions can be expressed as the Euclidean distance between their centroids. As shown in FIG. 8A, a minimum spanning tree algorithm can be applied to these distances to connect all stroke regions into a chain structure. The minimum spanning tree algorithm is well known in the art and is not described here.
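As an illustration of step S701, the following Python sketch connects stroke-region centroids with Prim's algorithm, one of several well-known minimum spanning tree algorithms. The specification does not say which algorithm is used, and the function name is hypothetical.

```python
import math


def minimum_spanning_tree(centroids):
    """Return the MST edges (i, j) of the complete graph over the centroids."""
    n = len(centroids)
    if n == 0:
        return []
    # best[j] = (distance from j to the tree, tree vertex giving that distance)
    best = {j: (math.dist(centroids[0], centroids[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda v: best[v][0])   # closest vertex outside the tree
        _, i = best.pop(j)
        edges.append((i, j))
        for v in best:                            # relax distances through j
            d = math.dist(centroids[j], centroids[v])
            if d < best[v][0]:
                best[v] = (d, j)
    return edges
```

The returned edge list plays the role of the connecting edges of the chain structure in FIG. 8A.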

In FIG. 8A, when the relationships between stroke regions are judged from distance alone, stroke regions belonging to different characters on the same line, or to different lines, may clearly end up connected because they are close together. Steps S702 and S703, described below, therefore focus on removing such erroneous connections.

In step S702, the connections between stroke regions belonging to different text lines are removed. FIG. 7B is a detailed flowchart of step S702 of FIG. 7A.

In step S7021, it is determined whether the Euclidean distance between two stroke regions connected by one connecting edge of the chain structure is greater than a threshold th_ed. If the result is negative, the process proceeds directly to step S7023. If the result is affirmative, the connecting edge is cut (step S7022), and the process proceeds to step S7023.

As in the situation described above, erroneous connections may still remain when only distance is considered. Steps S7023 to S7025 therefore perform further detection and cut the remaining erroneous connections.

After steps S7021 and S7022, the single chain structure produced by the original minimum spanning tree algorithm may already have been split into several chain structures. Steps S7023 to S7025, described below, are executed for each chain structure.

In step S7023, the stroke regions belonging to the same chain structure are fitted to a single center line l. For example, the least squares method is used to fit the centroids of the stroke regions belonging to the same chain structure to a single center line l.

It is then determined whether the distance from each stroke region belonging to the chain structure to the center line l is greater than a preset threshold th_le (step S7024).

If the result is affirmative, there is at least one text line on each side of the center line l. The connecting edges of the chain structure that cross the center line l are therefore cut (step S7025). Step S7025 turns one chain structure into two new chain structures, so the process returns to step S7023 and the determination continues.

If the result of step S7024 is negative, the current chain structure contains only one text line. No connecting edges between text lines remain, so step S702 ends and the process proceeds to step S703, which cuts the erroneous connections between characters belonging to the same text line. The result of step S702 is shown in FIG. 8B.
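Steps S7021 to S7025 can be sketched in Python as follows. The sketch assumes that each stroke region is represented by its centroid, that the center line is the least-squares (principal-axis) line through the centroids of a chain, and that th_le is positive. The function names are hypothetical.

```python
import math


def components(n, edges):
    """Connected components (chain structures) of an undirected edge list."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v] - seen:
                seen.add(u)
                stack.append(u)
        comps.append(sorted(comp))
    return comps


def signed_distances(points):
    """Signed distances of the points to their fitted center line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    return [(y - my) * dx - (x - mx) * dy for x, y in points]


def split_into_lines(centroids, edges, th_ed, th_le):
    """Cut over-long edges, then edges that cross a chain's center line."""
    # S7021-S7022: cut edges longer than th_ed.
    edges = [(i, j) for i, j in edges
             if math.dist(centroids[i], centroids[j]) <= th_ed]
    changed = True
    while changed:
        changed = False
        for comp in components(len(centroids), edges):
            # S7023: fit the center line of this chain structure.
            dist = dict(zip(comp, signed_distances([centroids[v] for v in comp])))
            if max(abs(d) for d in dist.values()) > th_le:          # S7024
                crossing = {(i, j) for i, j in edges
                            if i in dist and j in dist
                            and (dist[i] >= 0) != (dist[j] >= 0)}
                if crossing:                                         # S7025
                    edges = [e for e in edges if e not in crossing]
                    changed = True
    return edges
```

Each pass that finds a chain spanning two lines removes at least one edge, so the loop ends once every chain lies within th_le of its own center line.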

In step S703, each chain structure obtained in step S702 represents one text line. Each chain structure contains a plurality of stroke regions, which are connected by connecting edges. The frame distance bd between each pair of connected stroke regions and the average frame distance a_bd of the whole text line (that is, the chain structure) are calculated. The frame distance between two stroke regions connected by a connecting edge is the distance between the adjacent sides of their bounding rectangles. If the frame distance bd between two connected stroke regions is much larger than the average frame distance a_bd of the whole text line (for example, bd > a_bd × ξ, where ξ is a constant preset empirically), the two stroke regions should belong to different characters, and the connecting edge between them is cut. That is, step S703 removes the connections between stroke regions belonging to different characters. The result of step S703 is shown in FIG. 8C.
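The character-splitting test of step S703 can be sketched in Python as follows. The sketch assumes a horizontal text line, so that the frame distance reduces to the horizontal gap between bounding rectangles. The box layout (x0, y0, x1, y1), the function names and the default value of `xi` are hypothetical.

```python
def frame_distance(a, b):
    """Horizontal gap between boxes (x0, y0, x1, y1); 0 if they overlap in x."""
    return max(0.0, max(a[0], b[0]) - min(a[2], b[2]))


def split_into_characters(boxes, edges, xi=1.5):
    """Keep only the edges of one text line whose frame distance bd <= a_bd * xi."""
    if not edges:
        return []
    bd = {e: frame_distance(boxes[e[0]], boxes[e[1]]) for e in edges}
    a_bd = sum(bd.values()) / len(bd)          # average frame distance of the line
    return [e for e in edges if bd[e] <= a_bd * xi]
```

The function is meant to be called once per chain structure, since a_bd is the average over a single text line.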

With the above, a plurality of chain structures have been obtained. Each chain structure represents one character and contains a plurality of stroke regions connected by connecting edges. The bounding rectangle of each chain structure can be taken as the text region of the corresponding character. FIG. 9 is a schematic diagram showing the processing result of the method for positioning a text region in an image according to an embodiment of the present invention.
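Taking the bounding rectangle of each chain structure can be sketched in Python as follows. The sketch groups stroke regions with a small union-find structure and merges their boxes; the box layout (x0, y0, x1, y1) and the function name are hypothetical.

```python
def text_regions(boxes, edges):
    """Return one bounding rectangle per chain structure, sorted by position."""
    parent = list(range(len(boxes)))

    def find(v):
        # Follow parent links to the representative, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in edges:                      # stroke regions joined by an edge
        parent[find(i)] = find(j)

    regions = {}
    for v, (x0, y0, x1, y1) in enumerate(boxes):
        root = find(v)
        if root in regions:
            a, b, c, d = regions[root]
            regions[root] = (min(a, x0), min(b, y0), max(c, x1), max(d, y1))
        else:
            regions[root] = (x0, y0, x1, y1)
    return sorted(regions.values())
```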

The structure of an apparatus for positioning a text region in an image according to an embodiment of the present invention is described below with reference to FIG. 10. FIG. 10 is a block diagram showing the structure of that apparatus. As shown in FIG. 10, the apparatus 100 for positioning a text region in an image according to this embodiment includes: an estimation unit 101 for estimating the size of the text that can exist around each pixel in an input image; an extraction unit 102 for extracting candidate stroke regions from the input image based on the text size and the degree of difference of regions; a specifying unit 103 for identifying true stroke regions among the candidate stroke regions; and a merge unit 104 for merging the true stroke regions to form text regions.

The estimation unit 101 includes: an image pyramid generation unit 1011 for generating pyramid images of a plurality of layers from the input image; an estimation subunit 1012 for estimating the probability that text exists around each pixel in the pyramid image of each layer; and a calculation unit 1013 for calculating the size of the text that can exist around each pixel in the input image, based on the probability and on the text size corresponding to each pyramid layer.

The extraction unit 102 includes: a difference calculation unit 1021 for calculating, for adjacent regions in the input image, the degree of difference between the regions and the degree of difference within each region; an adjustment unit 1022 for estimating the size of the text that can exist around each region from the estimated size of the text that can exist around each pixel in the input image, and for adjusting the within-region degree of difference of each region based on the size estimated for that region; and an adjacent-region merge unit 1023 for merging the adjacent regions based on the degree of difference between them and on their adjusted within-region degrees of difference.

The specifying unit 103 includes: a correlation specifying unit 1031 for identifying the candidate stroke regions that are correlated with one another; a hypothesis combination unit 1032 for generating multiple hypothesis combinations, each specifying whether every candidate stroke region in the input image is a true stroke region; a confidence calculation unit 1033 for calculating, for each hypothesis combination, a first confidence of each candidate stroke region from the hypothesis combination and the features of the pixels in that region, a second confidence corresponding to the first confidence from the hypothesis combination and the features of the mutually correlated candidate stroke regions, and then a confidence for the hypothesis combination as a whole from the first and second confidences; and a specifying subunit 1034 for taking the hypothesis combination with the highest confidence as the result of identifying the true stroke regions. When mutually correlated candidate stroke regions belong to the same text line, the corresponding second confidence is given a relatively large weighting coefficient in calculating the confidence of the hypothesis combination. The correlation specifying unit determines whether candidate stroke regions are correlated based on their size information and the distance between them. The confidence calculation unit fits the candidate stroke regions in the feature space, identifies candidate stroke regions belonging to the same fitted curve as belonging to the same text line, and calculates the weighting coefficient used in the confidence calculation from the regression error.

The merge unit 104 includes: a connection unit 1041 for determining the connection relationships between stroke regions from the distances between them; a line separation unit 1042 for removing the connections between stroke regions belonging to different text lines; and a character separation unit 1043 for removing the connections between stroke regions belonging to different characters.

The processing performed by the estimation unit 101, the extraction unit 102, the specifying unit 103 and the merge unit 104 of the apparatus 100 for positioning a text region in an image according to the present invention is similar to the processing in steps S201 to S204, respectively, of the method described above, so a detailed description of these units is omitted for brevity.

Similarly, the processing in the image pyramid generation unit 1011, the estimation subunit 1012 and the calculation unit 1013 of the estimation unit 101 is similar to the processing in steps S401 to S403 described above; the processing in the difference calculation unit 1021, the adjustment unit 1022 and the adjacent-region merge unit 1023 of the extraction unit 102 is similar to the processing in steps S501 to S505; the processing in the correlation specifying unit 1031, the hypothesis combination unit 1032, the confidence calculation unit 1033 and the specifying subunit 1034 of the specifying unit 103 is similar to the processing in steps S601 to S602; and the processing in the connection unit 1041, the line separation unit 1042 and the character separation unit 1043 of the merge unit 104 is similar to the processing in steps S701 to S703. A detailed description of these units is therefore also omitted for brevity.

It should also be noted that each component module and unit of the above apparatus may be implemented in software, firmware, hardware, or a combination thereof. The specific means or manner of implementation is well known to those skilled in the art and is not described here. When implemented in software or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 1100 shown in FIG. 11), and the computer can perform various functions once the various programs are installed.

FIG. 11 is a block diagram schematically showing a computer that can be used to implement the method and apparatus according to the embodiments of the present invention.

In FIG. 11, a central processing unit (CPU) 1101 executes various kinds of processing according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage unit 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores, as needed, the data required when the CPU 1101 executes the various kinds of processing. The CPU 1101, the ROM 1102 and the RAM 1103 are connected to one another via a bus 1104. An input/output interface 1105 is also connected to the bus 1104.

An input unit 1106 (including a keyboard, a mouse and the like), an output unit 1107 (including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker and the like), a storage unit 1108 (including a hard disk and the like) and a communication unit 1109 (including a network interface card such as a LAN card, a modem and the like) are connected to the input/output interface 1105. The communication unit 1109 performs communication processing via a network such as the Internet. A drive 1110 may also be connected to the input/output interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it is installed in the storage unit 1108 as needed.

When the series of processing described above is implemented in software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1111.

Those skilled in the art should understand that such a storage medium is not limited to the removable medium 1111 shown in FIG. 11, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 1111 include magnetic disks (including floppy (registered trademark) disks), optical disks (including compact disc read-only memory (CD-ROM) and digital versatile discs (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)) and semiconductor memories. Alternatively, the storage medium may be the ROM 1102, a hard disk included in the storage unit 1108, or the like, which stores the program and is distributed to the user together with the device that contains it.

The present invention provides a program in which computer-readable instruction codes are stored. When the instruction codes are read and executed by a computer, the method according to the embodiments of the present invention described above can be executed.

Similarly, a storage medium carrying the program product in which the above computer-readable instruction codes are stored is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy (registered trademark) disk, an optical disk, a magneto-optical disk, a memory card, a memory stick, and the like.

In the above description of specific embodiments of the present invention, features described and/or illustrated for one embodiment may be used in the same or a similar manner in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.

It should be emphasized that the term "comprising/having", when used herein, indicates the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.

Furthermore, the method of the present invention is not limited to being executed in the chronological order described in the specification; it may be executed sequentially, in parallel, or independently in another order. Therefore, the execution order of the methods described herein does not limit the technical scope of the present invention.

Specific embodiments of the present invention have been described above, but it should be understood that all of the above embodiments and examples are illustrative and not limiting. Those skilled in the art may devise various modifications, improvements, or equivalents of the present invention within the spirit and scope of the claims of this application. Such modifications, improvements, or equivalents are also considered to fall within the protection scope of the present invention.
(Appendix 1)
A method for positioning a text area in an image, comprising:
Estimating the size of text that can exist around each pixel in an input image;
Extracting candidate stroke areas from the input image based on the text size and a degree of region difference;
Identifying true stroke areas from the candidate stroke areas; and
Merging the true stroke areas to form a text area.
(Appendix 2)
The method according to appendix 1, wherein estimating the size of text that can exist around each pixel in the input image further comprises:
Generating pyramid images of a plurality of layers based on the input image;
Estimating, for each pixel in the pyramid image of each layer, the probability that text exists around that pixel; and
Calculating the size of text that can exist around each pixel in the input image based on the probabilities and the text size corresponding to each pyramid layer.
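As an illustration of the last step of appendix 2, the following minimal sketch combines per-layer probabilities into one text size per pixel. The combination rule (a probability-weighted average of the per-layer text sizes) is an assumption made for illustration; this passage does not state the formula. The function name `estimate_text_size` and its arguments are hypothetical, and the per-layer probability maps are assumed to have already been resampled to the input resolution.

```python
def estimate_text_size(probs_per_level, level_sizes, eps=1e-9):
    """Return a per-pixel text size as a probability-weighted average.

    probs_per_level[k][y][x]: assumed probability that text exists around
        pixel (x, y) at pyramid layer k, resampled to input resolution
        (an assumption of this sketch).
    level_sizes[k]: nominal text size, in input pixels, tied to layer k.
    """
    height = len(probs_per_level[0])
    width = len(probs_per_level[0][0])
    sizes = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(p[y][x] for p in probs_per_level)
            if total > eps:  # leave 0.0 where no layer reports text
                weighted = sum(
                    p[y][x] * s for p, s in zip(probs_per_level, level_sizes)
                )
                sizes[y][x] = weighted / total
    return sizes
```

A pixel whose probability is split evenly between a layer of size 10 and a layer of size 20 receives a size of 15 under this rule; a pixel with zero probability in every layer keeps size 0.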
(Appendix 3)
The method according to appendix 1, wherein extracting the candidate stroke areas further comprises:
Calculating, for adjacent regions in the input image, the inter-region difference and the intra-region differences;
Estimating the size of text that can exist around each region based on the estimated size of text that can exist around each pixel in the input image, and adjusting the intra-region difference of each region based on the estimated size of text that can exist around that region;
Determining whether the adjacent regions should be merged based on the inter-region difference of the adjacent regions and the adjusted intra-region differences of the adjacent regions; and
Merging the adjacent regions when it is determined that they should be merged, and repeating the above steps,
wherein the candidate stroke areas are taken to have been extracted when it is determined that none of the current adjacent regions should be merged.
(Appendix 4)
The method according to appendix 3, wherein the adjacent regions are merged when the inter-region difference of the adjacent regions is smaller than the minimum of the adjusted intra-region differences of the adjacent regions.
(Appendix 5)
The method according to appendix 3 or 4, wherein the adjacent regions at the start of the extraction step are adjacent pixels, and the difference of the regions includes a color difference.
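The merging rule of appendices 3 to 5 can be sketched as follows. This is a simplified illustration, not the patented procedure: it uses a single pass over pixel edges sorted by weight with a union-find structure (in the style of graph-based segmentation) as a stand-in for the repeated merging, it uses grayscale intensity instead of a color difference, and the adjustment term `k * mean_text_size / region_size` is a hypothetical choice, since the adjustment function is not given in this passage. The names `segment`, `text_size`, and `k` are likewise assumptions.

```python
def segment(gray, text_size, k=1.0):
    """Merge pixels into regions; returns a label (root index) per pixel."""
    h, w = len(gray), len(gray[0])
    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # intra-region difference (largest merged edge)
    # Sum of per-pixel text sizes in each region, used for the region estimate.
    text_sum = [float(text_size[y][x]) for y in range(h) for x in range(w)]

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def adjusted(r):
        # Hypothetical adjustment: larger expected text tolerates more variation.
        mean_text = text_sum[r] / size[r]
        return internal[r] + k * mean_text / size[r]

    # Adjacent regions start out as adjacent pixels (4-neighbourhood).
    edges = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((abs(gray[y][x] - gray[y][x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((abs(gray[y][x] - gray[y + 1][x]), i, i + w))
    edges.sort()

    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge when the inter-region difference is below the smaller of the
        # two adjusted intra-region differences (the rule of appendix 4).
        if weight < min(adjusted(ra), adjusted(rb)):
            parent[rb] = ra
            size[ra] += size[rb]
            text_sum[ra] += text_sum[rb]
            internal[ra] = max(internal[ra], internal[rb], weight)

    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```

With a 2x4 image whose left half is 0 and right half is 100, a uniform text size of 4, and `k=10`, the two halves each collapse into one region and the edge of weight 100 between them is never merged.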
(Appendix 6)
The method according to appendix 1, wherein identifying the true stroke areas further comprises:
Identifying, among the candidate stroke areas, candidate stroke areas that are correlated with each other;
Generating a plurality of hypothetical combinations, each assuming whether each of the candidate stroke areas in the input image is a true stroke area;
For each hypothetical combination, calculating a first certainty factor of each candidate stroke area based on the hypothetical combination and the features of the pixels in that candidate stroke area, calculating a second certainty factor corresponding to the first certainty factor based on the hypothetical combination and the features of the pixels in the mutually correlated candidate stroke areas, and then calculating a certainty factor of the hypothetical combination based on the first certainty factor and the second certainty factor;
Taking the hypothetical combination with the highest certainty factor as the result of identifying the true stroke areas; and
When the mutually correlated candidate stroke areas belong to the same text line, applying a large weighting factor to the corresponding second certainty factor in calculating the certainty factor of the hypothetical combination.
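The selection among hypothetical combinations in appendix 6 can be illustrated with a brute-force sketch. Everything numeric here is an assumption: the first certainty factor is modelled as a given per-area text probability, the second as a given pairwise similarity that rewards equal labels, and the two are combined by a weighted sum. The names `best_hypothesis`, `p_text`, `pairs`, and `same_line_weight` are hypothetical. Exhaustive enumeration grows as 2^n and is only practical for a handful of areas.

```python
from itertools import product


def best_hypothesis(p_text, pairs, same_line_weight=2.0):
    """Pick the true/false labelling with the highest combined score.

    p_text[i]: assumed probability that candidate area i is a true stroke.
    pairs: list of (i, j, similarity, same_line) for correlated areas, where
        similarity is in [0, 1] and same_line is a bool.
    """
    best_labels, best_score = None, float("-inf")
    for labels in product((False, True), repeat=len(p_text)):
        # First certainty factor: how well each label matches its own area.
        first = sum(p if lab else 1.0 - p for p, lab in zip(p_text, labels))
        # Second certainty factor: agreement between correlated areas,
        # weighted more heavily when both lie on the same text line.
        second = 0.0
        for i, j, sim, same_line in pairs:
            agree = sim if labels[i] == labels[j] else 1.0 - sim
            second += (same_line_weight if same_line else 1.0) * agree
        score = first + second
        if score > best_score:
            best_labels, best_score = labels, score
    return list(best_labels), best_score
```

In this sketch an area with a borderline probability of 0.45 is pulled to "true" when it is strongly similar to a confident neighbour on the same text line, which is the effect the weighting in appendix 6 is meant to produce.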
(Appendix 7)
The method according to appendix 6, further comprising determining whether the candidate stroke areas are correlated based on size information of the candidate stroke areas and the distance between the candidate stroke areas.
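A correlation test of the kind described in appendix 7 might look like the sketch below. The specific criteria (a bound on the size ratio and a center distance limited to a multiple of the larger size) and both threshold values are assumptions chosen for illustration; the function name `are_correlated` and the bounding-box representation are hypothetical.

```python
from math import hypot


def are_correlated(box_a, box_b, max_gap_factor=2.0, max_size_ratio=2.0):
    """Decide whether two candidate stroke areas are correlated.

    Each box is (x, y, width, height) with positive width and height.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    size_a, size_b = max(aw, ah), max(bw, bh)
    # Size criterion: strokes of one text tend to have comparable extents.
    ratio = max(size_a, size_b) / min(size_a, size_b)
    # Distance criterion: centers must be close relative to the stroke size.
    dist = hypot((ax + aw / 2) - (bx + bw / 2), (ay + ah / 2) - (by + bh / 2))
    return ratio <= max_size_ratio and dist <= max_gap_factor * max(size_a, size_b)
```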
(Appendix 8)
The method according to appendix 6 or 7, further comprising:
Fitting the candidate stroke areas in a feature space, and identifying candidate stroke areas belonging to the same fitted curve as belonging to the same text line; and
Calculating, based on the regression error, the weighting factor used in calculating the certainty factor of the hypothetical combination.
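To make the link between regression error and weighting factor in appendix 8 concrete, the sketch below fits a straight line to stroke-area centers by ordinary least squares and maps the root-mean-square residual to a weight. The choice of a straight line, of center coordinates as the feature space, and of `exp(-rms / scale)` as the mapping are all assumptions; `fit_line`, `line_weight`, and `scale` are hypothetical names. Near-vertical text would need a different parameterization than y = slope * x + intercept.

```python
from math import exp, sqrt


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept; returns the RMS error too."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0  # degenerate case: all x equal
    intercept = mean_y - slope * mean_x
    rms = sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in points) / n)
    return slope, intercept, rms


def line_weight(points, scale=5.0):
    """Map the regression error to a weight in (0, 1]; smaller error, larger weight."""
    _, _, rms = fit_line(points)
    return exp(-rms / scale)
```

Centers that lie exactly on one line give a weight of 1, and the weight decays as the centers scatter away from the fitted line.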
(Appendix 9)
The method according to appendix 1, wherein merging the true stroke areas further comprises:
Identifying connection relationships between stroke areas based on the distance between the stroke areas;
Removing connection relationships between stroke areas belonging to different text lines; and
Removing connection relationships between stroke areas belonging to different characters.
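The first two steps of appendix 9 can be sketched as a small graph problem: connect stroke areas whose centers are close, drop connections that cross text lines, and take the connected components as merged areas. The text-line assignment is assumed to be given (`line_ids`), the distance threshold `max_dist` is hypothetical, and removing connections between different characters would follow the same pattern with a per-character predicate, which is not shown here.

```python
from math import hypot


def merge_strokes(boxes, line_ids, max_dist):
    """Group stroke boxes (x, y, w, h) and return merged (x0, y0, x1, y1) boxes."""
    n = len(boxes)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def center(box):
        x, y, w, h = box
        return x + w / 2, y + h / 2

    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = center(boxes[i]), center(boxes[j])
            connected = hypot(xi - xj, yi - yj) <= max_dist
            # Keep the connection only if both strokes are on the same text line.
            if connected and line_ids[i] == line_ids[j]:
                parent[find(j)] = find(i)

    groups = {}
    for i, (x, y, w, h) in enumerate(boxes):
        root = find(i)
        x0, y0, x1, y1 = groups.get(root, (x, y, x + w, y + h))
        groups[root] = (min(x0, x), min(y0, y), max(x1, x + w), max(y1, y + h))
    return sorted(groups.values())
```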
(Appendix 10)
An apparatus for positioning a text area in an image, comprising:
An estimation unit for estimating the size of text that can exist around each pixel in an input image;
An extraction unit for extracting candidate stroke areas from the input image based on the text size and a degree of region difference;
A specifying unit for identifying true stroke areas from the candidate stroke areas; and
A merge unit for merging the true stroke areas to form a text area.
(Appendix 11)
The apparatus according to appendix 10, wherein the estimation unit includes:
An image pyramid generation unit for generating pyramid images of a plurality of layers based on the input image;
An estimation subunit for estimating, for each pixel in the pyramid image of each layer, the probability that text exists around that pixel; and
A calculation unit for calculating the size of text that can exist around each pixel in the input image based on the probabilities and the text size corresponding to each pyramid layer.
(Appendix 12)
The apparatus according to appendix 10, wherein the extraction unit includes:
A difference calculation unit for calculating, for adjacent regions in the input image, the inter-region difference and the intra-region differences;
An adjustment unit for estimating the size of text that can exist around each region based on the estimated size of text that can exist around each pixel in the input image, and for adjusting the intra-region difference of each region based on the estimated size of text that can exist around that region; and
An adjacent region merge unit for merging the adjacent regions based on the inter-region difference of the adjacent regions and the adjusted intra-region differences of the adjacent regions.
(Appendix 13)
The apparatus according to appendix 10, wherein the specifying unit includes:
A correlation specifying unit for identifying, among the candidate stroke areas, candidate stroke areas that are correlated with each other;
A hypothetical combination unit for generating a plurality of hypothetical combinations, each assuming whether each of the candidate stroke areas in the input image is a true stroke area;
A certainty factor calculation unit for, for each hypothetical combination, calculating a first certainty factor of each candidate stroke area based on the hypothetical combination and the features of the pixels in that candidate stroke area, calculating a second certainty factor corresponding to the first certainty factor based on the hypothetical combination and the features of the pixels in the mutually correlated candidate stroke areas, and then calculating a certainty factor of the hypothetical combination based on the first certainty factor and the second certainty factor; and
A specifying subunit for taking the hypothetical combination with the highest certainty factor as the result of identifying the true stroke areas,
wherein, when the mutually correlated candidate stroke areas belong to the same text line, a large weighting factor is applied to the corresponding second certainty factor in calculating the certainty factor of the hypothetical combination.
(Appendix 14)
The apparatus according to appendix 13, wherein the correlation specifying unit determines whether or not the candidate stroke areas correlate based on size information of the candidate stroke areas and a distance between the candidate stroke areas.
(Appendix 15)
The apparatus according to appendix 13 or 14, wherein the certainty factor calculation unit fits the candidate stroke areas in a feature space, identifies candidate stroke areas belonging to the same fitted curve as belonging to the same text line, and calculates, based on the regression error, the weighting factor used in calculating the certainty factor of the hypothetical combination.
(Appendix 16)
The apparatus according to appendix 10, wherein the merge unit includes:
A connection unit for identifying connection relationships between stroke areas based on the distance between the stroke areas;
A line dividing unit for removing connection relationships between stroke areas belonging to different text lines; and
A character dividing unit for removing connection relationships between stroke areas belonging to different characters.

100 apparatus
101 estimation unit
102 extraction unit
103 specifying unit
104 merge unit

Claims

A method for positioning a text area in an image, comprising:
Estimating the size of text that can exist around each pixel in an input image;
Extracting candidate stroke areas from the input image based on the text size and a degree of region difference;
Identifying true stroke areas from the candidate stroke areas; and
Merging the true stroke areas to form a text area.

The method according to claim 1, wherein estimating the size of text that can exist around each pixel in the input image comprises:
Generating pyramid images of a plurality of layers based on the input image;
Estimating, for each pixel in the pyramid image of each layer, the probability that text exists around that pixel; and
Calculating the size of text that can exist around each pixel in the input image based on the probabilities and the text size corresponding to each pyramid layer.

The method according to claim 1, wherein extracting the candidate stroke areas includes:
Calculating, for adjacent regions in the input image, the inter-region difference and the intra-region differences;
Estimating the size of text that can exist around each region based on the estimated size of text that can exist around each pixel in the input image, and adjusting the intra-region difference of each region based on the estimated size of text that can exist around that region;
Determining whether the adjacent regions should be merged based on the inter-region difference of the adjacent regions and the adjusted intra-region differences of the adjacent regions; and
Merging the adjacent regions when it is determined that they should be merged, and repeating the above steps,
wherein the candidate stroke areas are taken to have been extracted when it is determined that none of the current adjacent regions should be merged.

The method according to claim 3, wherein the adjacent regions are merged when the inter-region difference of the adjacent regions is smaller than the minimum of the adjusted intra-region differences of the adjacent regions.

5. The method according to claim 3, wherein an adjacent region at the start of the extraction step is an adjacent pixel, and the difference between the regions includes a color difference.

The method according to claim 1, wherein identifying the true stroke areas includes:
Identifying, among the candidate stroke areas, candidate stroke areas that are correlated with each other;
Generating a plurality of hypothetical combinations, each assuming whether each of the candidate stroke areas in the input image is a true stroke area;
For each hypothetical combination, calculating a first certainty factor of each candidate stroke area based on the hypothetical combination and the features of the pixels in that candidate stroke area, calculating a second certainty factor corresponding to the first certainty factor based on the hypothetical combination and the features of the pixels in the mutually correlated candidate stroke areas, and then calculating a certainty factor of the hypothetical combination based on the first certainty factor and the second certainty factor;
Taking the hypothetical combination with the highest certainty factor as the result of identifying the true stroke areas; and
When the mutually correlated candidate stroke areas belong to the same text line, applying a large weighting factor to the corresponding second certainty factor in calculating the certainty factor of the hypothetical combination.

The method according to claim 6, further comprising: determining whether the candidate stroke areas are correlated based on size information of the candidate stroke areas and a distance between the candidate stroke areas.

The method according to claim 6, further comprising:
Fitting the candidate stroke areas in a feature space, and identifying candidate stroke areas belonging to the same fitted curve as belonging to the same text line; and
Calculating, based on the regression error, the weighting factor used in calculating the certainty factor of the hypothetical combination.

The method according to claim 1, wherein merging the true stroke areas includes:
Identifying connection relationships between stroke areas based on the distance between the stroke areas;
Removing connection relationships between stroke areas belonging to different text lines; and
Removing connection relationships between stroke areas belonging to different characters.

An apparatus for positioning a text area in an image, comprising:
An estimation unit for estimating the size of text that can exist around each pixel in an input image;
An extraction unit for extracting candidate stroke areas from the input image based on the text size and a degree of region difference;
A specifying unit for identifying true stroke areas from the candidate stroke areas; and
A merge unit for merging the true stroke areas to form a text area.