JP2021131784A

JP2021131784A - Image processing system, program, and image processing method

Info

Publication number: JP2021131784A
Application number: JP2020027618A
Authority: JP
Inventors: 裕介村松; Yusuke Murakami
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-02-20
Filing date: 2020-02-20
Publication date: 2021-09-09

Abstract

PROBLEM TO BE SOLVED: To extract items and values from an atypical form that contains handwritten characters, and to save them.
SOLUTION: The image processing system of the present invention performs the following steps:
- It extracts handwritten-character pixels from a processing target image obtained by scanning a document. This uses a trained model for extracting handwritten-character pixels from an image.
- It estimates handwritten areas in the processing target image. This uses a trained model for estimating, from an image, the areas in which handwritten characters are written.
- It extracts print areas containing printed characters from the processing target image.
- It identifies the print area that has a predetermined positional relationship with a handwritten area.
- It stores the character recognition result of the handwritten area and the character recognition result of the identified print area in association with each other.
SELECTED DRAWING: FIG. 9

Description

The present invention relates to an image processing system that recognizes handwritten characters written in a scanned image.

In recent years, the spread of computers has changed the working environment, and documents handled in business are increasingly being digitized. This digitization now extends to forms, and technologies for reading form information (items and values), structuring it, and storing it are being studied.

Atypical forms are a particular difficulty. In these forms, items and values appear at varying positions and there is no unified format. Their positions cannot be defined in advance, so the content and structure of each form must be analyzed to find its items and values one by one.

Patent Document 1 discloses the following technique. The content of a form is converted to text by OCR, and the text is searched for character strings that may be values. For each such string, the character string of the corresponding item is identified in its vicinity, and the two are stored in association with each other.

Patent Document 1: JP-A-2018-55255

However, the technique of Patent Document 1 relies on character-string patterns registered in advance. It uses them both to decide whether a character string may be a value and to identify the item string for that value. It therefore cannot handle forms that contain unregistered items or values.

To solve the above problem, the image processing system of the present invention comprises the following means:
- Handwriting extraction means, which extracts handwritten-character pixels from a processing target image obtained by scanning a document. It uses a trained model for extracting handwritten-character pixels from an image.
- Handwritten area estimation means, which estimates a handwritten area in the processing target image. It uses a trained model for estimating, from an image, the handwritten areas in which handwritten characters are written.
- Print area extraction means, which extracts print areas containing printed characters from the processing target image.
- Storage means, which identifies the print area that has a predetermined positional relationship with the handwritten area. It stores the character recognition result of the handwritten area and the character recognition result of the identified print area in association with each other.

According to the present invention, the items and values in a form image containing handwritten characters can be stored in association with each other.

FIG. 1 is a diagram showing the configuration of an image processing system.
FIG. 2A is a diagram showing the configuration of an image processing device. FIG. 2B is a diagram showing the configuration of a learning device. FIG. 2C is a diagram showing the configuration of an image processing server. FIG. 2D is a diagram showing the configuration of an OCR server.
FIG. 3A is a diagram showing a learning sequence of the image processing system. FIG. 3B is a diagram showing a usage sequence of the image processing system.
FIG. 4 is a diagram showing examples of forms.
FIG. 5A is a diagram showing a learning document scan screen. FIG. 5B is a diagram showing a screen for creating ground-truth data for handwriting extraction. FIG. 5C is a diagram showing a screen for creating ground-truth data for handwritten area estimation. FIG. 5D is a diagram showing a form processing screen.
FIG. 6A is a diagram showing the flow of document sample image generation processing. FIG. 6B is a diagram showing the flow of document sample image reception processing. FIG. 6C is a diagram showing the flow of ground-truth data generation processing.
FIG. 7A is a diagram showing the flow of learning data generation processing. FIG. 7B is a diagram showing the flow of learning processing.
FIG. 8A is a diagram showing a configuration example of learning data for handwriting extraction. FIG. 8B is a diagram showing a configuration example of learning data for handwritten area estimation.
FIG. 9A is a diagram showing the flow of form text conversion request processing. FIG. 9B is a diagram showing the flow of form text conversion processing.
FIG. 10 is a diagram showing an outline of the data generation performed in the form text conversion processing.
FIG. 11 is a diagram showing the flow of form text conversion processing in Example 2.
FIG. 12 is a diagram showing an example of a form.

Embodiments for carrying out the present invention are described below with reference to the drawings, using specific configurations given in the Examples. The configuration for realizing the present invention is not limited to the configurations described in the Examples. Part of a described configuration may be omitted, or replaced with an equivalent, as long as the same effect is obtained.

(Example 1)
FIG. 1 is a diagram showing a configuration example of the image processing system of this example. The image processing system 100 includes an image processing device 101, a learning device 102, an image processing server 103, and an OCR server 104. These four are connected to one another via a network 105.

The image processing device 101 is a digital multifunction peripheral, commonly called an MFP. It has a printing function and a scanning function, the latter serving as an image acquisition unit 111. The image processing device 101 scans documents such as forms to generate image data.

The system needs a trained model (also called a learning model) that can identify, in scanned image data, the pixels of handwritten characters and the areas in which they are written. To create this model, the image processing device 101 scans sample documents and generates image data, hereinafter called "document sample images". A plurality of documents is scanned to obtain a plurality of document sample images. These include documents on which the user has written characters, symbols, and the like by hand. The image processing device 101 then transmits the document sample images to the learning device 102 via the network 105 and has it create a trained model.

After the trained model has been created, the image processing device 101 scans a form to be processed. Such a form is a document containing characters, handwritten symbols, or handwritten figures entered by the user. Scanning it yields the image data to be processed, hereinafter called the "processing target image". The image processing device 101 then transmits the processing target image to the image processing server 103 via the network 105.

The learning device 102 has three functions:
- An image storage unit 115, which stores the document sample images generated by the image processing device 101.
- A learning data generation unit 112, which generates learning data from the stored images. The learning data is used to train the neural networks that perform handwriting extraction and handwritten area estimation.
- A learning unit 113, which trains the neural networks using the generated learning data. Its learning process produces a learning result: a trained model including the neural network parameters and the like.

The learning device 102 transmits the learning result (learning model) to the image processing server 103 via the network 105. In this embodiment, the machine learning method is deep learning with a multilayer neural network. The method is not limited to deep learning, however, and other learning methods may be used.

The image processing server 103 functions as an image conversion unit 114, which converts the processing target image and generates from it the images to be subjected to handwriting OCR. To do so, the image conversion unit 114 applies the trained models (trained neural networks) to the processing target image generated by the image processing device 101, extracting both handwritten pixels and handwritten areas.

Handwritten pixels: the image processing server 103 runs inference with the trained neural network generated by the learning device 102. This extracts (identifies) the handwritten pixels (pixel positions) in the processing target image. The server then produces a handwriting-extracted image in which the identified handwritten pixels are black and all other pixels are white.

Handwritten areas: the image processing server 103 also uses a neural network to estimate the handwritten areas of the processing target image. Using the learning result generated by the learning device 102, it infers the areas in which handwritten characters are written and identifies them as handwritten areas. For example, an entry field that contains handwritten-character pixels and is surrounded by ruled lines is estimated as a handwritten area. Handwritten areas are not, however, limited to areas surrounded by ruled lines.

Concretely, a handwritten area is information indicating a partial area of the processing target image that contains handwritten characters. It is expressed, for example, as the position (coordinates) of a specific pixel in the processing target image, together with a width and height measured from that pixel position. Multiple handwritten areas may be obtained, depending on the number of items filled in on the form.
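The following is a minimal sketch of the two outputs described above. It assumes that the handwriting-extraction model emits a per-pixel handwriting probability map. The 0.5 threshold, the list-of-lists grayscale image layout, and the names `Region` and `handwriting_extracted_image` are all hypothetical and do not come from this publication.

```python
from dataclasses import dataclass
from typing import List

BLACK, WHITE = 0, 255  # assumed 8-bit grayscale values


@dataclass(frozen=True)
class Region:
    """Hypothetical partial area: top-left pixel (x, y) plus width and height."""
    x: int
    y: int
    width: int
    height: int


def handwriting_extracted_image(prob_map: List[List[float]],
                                threshold: float = 0.5) -> List[List[int]]:
    """Binarize a per-pixel handwriting probability map (assumed model output):
    pixels judged to be handwriting become black, all others white."""
    return [[BLACK if p >= threshold else WHITE for p in row] for row in prob_map]


# Usage: a 2x3 probability map with two handwritten pixels.
image = handwriting_extracted_image([[0.9, 0.1, 0.2],
                                     [0.3, 0.8, 0.4]])
area = Region(x=0, y=0, width=2, height=2)  # e.g. one entry field
```

Keeping the area as an origin plus a size, rather than as a pixel mask, mirrors the publication's description of a handwritten area as a pixel position together with a width and height.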

The two outputs therefore play different roles. The handwriting-extracted image contains only the pixels of the processing target image that were estimated to be handwriting. A handwritten area indicates where the handwritten entry for one item lies in the processing target image. Using the handwritten areas to define partial areas of the handwriting-extracted image lets the handwritten characters be divided up and handled item by item. The image conversion unit 114 then transmits the handwriting-extracted image and the handwritten areas to the OCR server 104. This instructs the OCR server 104 to perform handwriting OCR on each region of the handwriting-extracted image that corresponds to an estimated handwritten area.
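The following is a minimal sketch of dividing the handwriting-extracted image by handwritten area. It assumes the same hypothetical `Region` representation (top-left pixel plus width and height) and a list-of-lists image. Each crop stands for one handwriting OCR target. The helper names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]


@dataclass(frozen=True)
class Region:
    x: int
    y: int
    width: int
    height: int


def crop(image: Image, region: Region) -> Image:
    """Return rows y..y+height-1 and columns x..x+width-1 of the image.
    Python slicing silently clips regions that extend past the image edge."""
    rows = image[region.y:region.y + region.height]
    return [row[region.x:region.x + region.width] for row in rows]


def handwriting_ocr_targets(extracted: Image,
                            areas: List[Region]) -> List[Tuple[Region, Image]]:
    """Pair every estimated handwritten area with its partial image."""
    return [(area, crop(extracted, area)) for area in areas]


# Usage: a 3x4 image split into two per-item handwriting OCR targets.
extracted = [[255, 0, 0, 255],
             [255, 0, 255, 255],
             [0, 255, 255, 0]]
targets = handwriting_ocr_targets(extracted, [Region(1, 0, 2, 2), Region(0, 2, 4, 1)])
```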

The image conversion unit 114 also generates an image in which the handwritten pixels of the handwriting-extracted image are removed from the processing target image. Pixels other than handwritten characters remain, such as the portions corresponding to printed characters, so this image is hereinafter called the "print image". Next, the image conversion unit 114 generates information on the regions of the print image that contain printed characters to be subjected to print OCR. These regions are hereinafter called "print areas", and their generation is described later. The image conversion unit 114 then transmits the print image and the print areas to the OCR server 104, instructing it to perform print OCR on each print area of the print image.
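The following is a minimal sketch of generating the print image. It assumes that the handwriting extraction result is available as a boolean mask with the same size as the processing target image. That mask form and the function name `print_image` are hypothetical. Handwritten pixels are overwritten with white, so only printed characters, ruled lines, and other non-handwritten content remain. Print area extraction itself is not shown.

```python
from typing import List

WHITE = 255  # assumed 8-bit grayscale background value


def print_image(processed: List[List[int]],
                handwriting_mask: List[List[bool]]) -> List[List[int]]:
    """Copy the processing target image, replacing handwritten pixels with white."""
    if len(processed) != len(handwriting_mask):
        raise ValueError("image and mask must have the same height")
    return [[WHITE if is_hw else px for px, is_hw in zip(img_row, mask_row)]
            for img_row, mask_row in zip(processed, handwriting_mask)]


# Usage: the two handwritten pixels on the diagonal are erased.
result = print_image([[10, 20], [30, 40]],
                     [[True, False], [False, True]])
```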

The image conversion unit 114 also receives the handwriting OCR results and the print OCR results from the OCR server 104. It integrates them and transmits them to the image processing device 101 as text data, hereinafter called "form text data".

The OCR server 104 functions as a handwriting OCR unit 116 and a print OCR unit 117.

On receiving the handwriting-extracted image and the handwritten areas, the OCR server 104 performs OCR suited to recognizing handwritten characters on the handwritten areas of the handwriting-extracted image. This produces text data (OCR results), which the handwriting OCR unit 116 transmits to the image processing server 103.

On receiving the print image and the print areas, the OCR server 104 performs OCR suited to recognizing printed characters on the print areas of the print image. This also produces text data, which the print OCR unit 117 transmits to the image processing server 103.

<Learning sequence>
This section describes the learning sequence of the system. FIG. 3A is a diagram showing the learning sequence of the image processing system.

In step 301 (hereinafter written as S301, and so on), the user prepares a plurality of sample documents and instructs the system to read them. The image acquisition unit 111 reads the sample documents and generates a plurality of document sample images (S302).

The document sample images generated in this way are transmitted to the learning data generation unit 112 (S303). At this point, ID information is preferably attached to the document sample images. The ID information identifies, for example, the image processing device 101 that functions as the image acquisition unit 111. Alternatively, it may be user identification information identifying the user operating the image processing device 101, or group identification information identifying the group to which that user belongs.

When the images arrive, the learning data generation unit 112 stores the document sample images in the image storage unit 115 (S304).

The user then instructs the learning device 102 to attach ground-truth data to a document sample image (S305). The learning data generation unit 112 acquires the ground-truth data, which consists of information on handwritten pixels and information on handwritten areas. It links this data to the document sample image and stores it in the image storage unit 115 (S306). The linked document sample images and ground-truth data are used to train the neural networks. The method of attaching ground-truth data is described later.

Next, the learning data generation unit 112 generates learning data for the learning process from the stored data (S307). The learning data may be generated from only the document sample images that carry specific ID information. The learning data generation unit 112 transmits the generated learning data to the learning unit 113 (S308). If the learning data was generated only from images with specific ID information, that ID information is transmitted as well.

The learning unit 113 performs the learning process on the received learning data and updates the learning model (S309). The learning unit 113 may hold a separate learning model for each piece of ID information and train each model only with the corresponding learning data. Linking ID information to learning models in this way makes it possible to build learning models specialized for a specific usage environment, such as a specific device or user group.
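The ID-based options in S307 to S309, together with the retrieval of the latest model for an ID in S356, can be sketched as follows. This is an illustration under assumptions, not the publication's implementation. `Sample`, `select_learning_samples`, and `ModelRegistry` are hypothetical names, models are treated as opaque objects, and `None` stands for a shared model that is not tied to any ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    """Hypothetical record for one document sample image."""
    name: str
    id_info: Optional[str]      # device, user, or group identifier
    has_ground_truth: bool      # ground-truth data linked in S306


def select_learning_samples(samples: List[Sample],
                            id_info: Optional[str] = None) -> List[Sample]:
    """S307: keep samples with ground truth, optionally only those for one ID."""
    return [s for s in samples
            if s.has_ground_truth and (id_info is None or s.id_info == id_info)]


class ModelRegistry:
    """Keeps a version history of trained models per ID (None = shared model)."""

    def __init__(self) -> None:
        self._history: Dict[Optional[str], List[object]] = {}

    def update(self, model: object, id_info: Optional[str] = None) -> None:
        """S309: record a newly trained model for the given ID."""
        self._history.setdefault(id_info, []).append(model)

    def latest(self, id_info: Optional[str] = None) -> object:
        """S356: return the most recently recorded model for the ID."""
        versions = self._history.get(id_info)
        if not versions:
            raise KeyError(f"no model trained for ID {id_info!r}")
        return versions[-1]


# Usage
samples = [Sample("a.png", "mfp-1", True), Sample("b.png", "mfp-2", True),
           Sample("c.png", "mfp-1", False)]
registry = ModelRegistry()
registry.update("model-v1", "mfp-1")
registry.update("model-v2", "mfp-1")
```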

<Usage sequence>
This section describes the usage sequence of the system. FIG. 3B is a diagram showing the usage sequence of the image processing system.

In S351, the user instructs the system to read a document (form). The image acquisition unit 111 reads the document and generates a processing target image (S352). The image read here is, for example, form 400 or form 450 shown in FIG. 4. These forms have name entry fields 403 and 451, address entry fields 401 and 452, and telephone number entry fields 402 and 453, in which the name, address, and telephone number are handwritten. The arrangement of these entry fields (the form layout) is decided by the user who created the form, however, so it differs from form to form. A form whose layout is not defined in advance is called an atypical form.

The processing target image read in this way is transmitted to the image conversion unit 114 (S353). ID information is preferably attached to the transmitted data at this point.

On receiving the processing target image data, the image conversion unit 114 accepts an instruction to convert the processing target image to text (S354). It records the image acquisition unit 111 as the destination to which the OCR results are to be returned.

Having accepted this instruction, the image conversion unit 114 requests the latest learning model from the learning unit 113 (S355), and the learning unit 113 transmits it (S356). If the request specified ID information, the learning unit 113 transmits the latest of the learning models that correspond to that ID information.

The image conversion unit 114 then processes the processing target image in three steps:
- Using the acquired learning model, it performs handwriting extraction and handwritten area estimation (S357).
- It generates a print image and print areas from the processing target image (S358).
- For each handwritten area, it searches for the closest print area. It then associates the handwritten area as a value area and the print area as the corresponding item area (S359).
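S359 states only that the closest print area is chosen; the distance measure is not specified. The sketch below assumes the Euclidean gap between the nearest rectangle edges, which is zero when the rectangles touch or overlap. `Region`, `rect_gap`, and `pair_values_with_items` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    x: int
    y: int
    width: int
    height: int


def rect_gap(a: Region, b: Region) -> float:
    """Euclidean distance between the closest edges of two rectangles
    (0.0 if they touch or overlap)."""
    dx = max(b.x - (a.x + a.width), a.x - (b.x + b.width), 0)
    dy = max(b.y - (a.y + a.height), a.y - (b.y + b.height), 0)
    return math.hypot(dx, dy)


def pair_values_with_items(handwritten: List[Region],
                           printed: List[Region]) -> List[Tuple[Region, Region]]:
    """For each handwritten (value) area, pick the nearest print (item) area.
    Returns (item_area, value_area) pairs."""
    if not printed:
        return []
    return [(min(printed, key=lambda p: rect_gap(hw, p)), hw) for hw in handwritten]


# Usage: two labels in a left column, two handwritten entries to their right.
name_label, addr_label = Region(0, 0, 40, 10), Region(0, 50, 40, 10)
name_value, addr_value = Region(50, 0, 100, 10), Region(50, 50, 100, 10)
pairs = pair_values_with_items([name_value, addr_value], [name_label, addr_label])
```

Because the search runs once per handwritten area, one print area can end up paired with several handwritten areas. This matches the per-handwritten-area search described for S359.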

Next, the partial images of the handwriting-extracted image that correspond to the value areas are transmitted to the handwriting OCR unit 116 (S360). The handwriting OCR unit 116 performs handwriting OCR on the received images. This yields text data (handwritten), the character recognition result for the handwritten character images (S361). The handwriting OCR unit 116 transmits this text data (handwritten) to the image conversion unit 114 (S362).

Likewise, the partial images of the print image that correspond to the item areas are transmitted to the print OCR unit 117 (S363). The print OCR unit 117 performs print OCR on these images and acquires the text data (printed) for the printed character images (S364). It transmits this text data (printed) to the image conversion unit 114 (S365).

The image conversion unit 114 transmits the resulting text data (printed) for the items and text data (handwritten) for the values to the image acquisition unit 111 as form text data (S366). The image acquisition unit 111 then presents the user with a screen for using the form text data (S367). Finally, the image acquisition unit 111 outputs the form text data according to its intended use, for example by sending it to a separate external business system (not shown) or by printing it.
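This publication does not fix a format for the form text data. One possibility is sketched below. It assumes that the OCR results are keyed by area index and that the S359 associations are stored as (item index, value index) pairs. The data is then assembled into item/value records and serialized to JSON. The function name, record layout, and sample values are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def build_form_text_data(pairs: List[Tuple[int, int]],
                         item_texts: Dict[int, str],
                         value_texts: Dict[int, str]) -> List[Dict[str, str]]:
    """Combine print OCR text (items) and handwriting OCR text (values)
    into one record per associated item/value pair."""
    return [{"item": item_texts[i], "value": value_texts[v]} for i, v in pairs]


# Usage
records = build_form_text_data(
    pairs=[(0, 0), (1, 1)],
    item_texts={0: "Name", 1: "Telephone"},
    value_texts={0: "Hanako Yamada", 1: "03-1234-5678"},
)
form_text_json = json.dumps(records, ensure_ascii=False)
```

`ensure_ascii=False` keeps Japanese item and value text readable in the serialized output instead of escaping it.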

<Device configuration>
To realize the system described above, each device has the configuration shown in FIG. 2:
- FIG. 2A: the image processing device.
- FIG. 2B: the learning device.
- FIG. 2C: the image processing server.
- FIG. 2D: the OCR server.

As shown in FIG. 2A, the image processing device 101 includes the following:
- a CPU 201
- a ROM 202
- a RAM 204
- a printer device 205
- a scanner device 206
- a document transport device 207
- a storage 208
- an input device 209
- a display device 210
- an external interface 211

These devices are connected via a data bus 203 so that they can communicate with one another.

The CPU 201 is the controller responsible for overall control of the image processing device 101. It starts the OS (operating system) using a boot program stored in the ROM 202. A controller program stored in the storage 208, which controls the image processing device 101, runs on this OS. The CPU 201 controls all the devices connected via the data bus 203. The RAM 204 serves as temporary storage for the CPU 201, for example as its main memory and work area.

The printer device 205 prints image data on paper (recording material, sheets). Any printing method may be used. Examples are an electrophotographic method using a photoconductor drum or belt, and an inkjet method that ejects ink from an array of fine nozzles to print directly on the paper.

The scanner device 206 scans a document such as a sheet of paper with an optical reading device such as a CCD, converts the resulting electrical signal data, and generates image data. A document transport device 207 such as an ADF (automatic document feeder) feeds the documents placed on its document tray to the scanner device 206 one sheet at a time.

The storage 208 is a readable and writable non-volatile memory, such as an HDD or SSD. It records various data, including the controller program described above.

The input device 209 consists of a touch panel, hard keys, and the like. It receives the user's operation instructions and passes instruction information, including the indicated position, to the CPU 201. The display device 210 is a display such as an LCD or CRT. It displays display data generated by the CPU 201.

The CPU 201 determines which operation has been performed from two inputs: the instruction information received from the input device 209, and the display data currently shown on the display device 210. Based on this determination, it controls the image processing device 101, generates new display data, and has the display device 210 show it.

The external interface 211 exchanges various data, including image data, with external devices via a network such as a LAN, a telephone line, or short-range wireless communication such as infrared. The external interface 211 receives PDL data from external devices such as the learning device 102 or a PC (not shown). The CPU 201 interprets the PDL data received by the external interface 211 and generates an image, which is then printed by the printer device 205 or stored in the storage 208. The external interface 211 also receives image data from external devices such as the image processing server 103. The received image data is printed by the printer device 205, stored in the storage 208, or transmitted to another external device via the external interface 211.

The learning device 102 in FIG. 2B includes a CPU 231, a ROM 232, a RAM 234, a storage 235, an input device 236, a display device 237, an external interface 238, and a GPU 239. These units exchange data with one another via a data bus 233.

The CPU 231 is a controller that controls the entire learning device 102. The CPU 231 boots the OS with a boot program stored in the ROM 232, which is a non-volatile memory. On this OS, it executes the learning data generation program and the learning program stored in the storage 235. Executing the learning data generation program generates learning data, and executing the learning program trains the neural network that performs handwriting extraction. The CPU 231 controls each unit via a bus such as the data bus 233.

The RAM 234 serves as a temporary storage area for the CPU 231, for example as its main memory and work area. The storage 235 is a readable and writable non-volatile memory that records the learning data generation program and the learning program described above.

The input device 236 is composed of a mouse, a keyboard, and the like. The display device 237 is the same as the display device 210 described with reference to FIG. 2A.

The external interface 238 is the same as the external interface 211 described with reference to FIG. 2A.

The GPU 239 is an image processing processor. It works together with the CPU 231 to generate image data and to train the neural network.

The image processing server 103 in FIG. 2C includes a CPU 261, a ROM 262, a RAM 264, a storage 265, an input device 266, a display device 267, and an external interface 268. These units exchange data with one another via a data bus 263.

The CPU 261 is a controller that controls the entire image processing server 103. The CPU 261 boots the OS with a boot program stored in the ROM 262, which is a non-volatile memory. On this OS, it executes the image processing server program stored in the storage 265. By executing this program, the CPU 261 performs handwriting extraction and handwriting area estimation on the image to be processed. The CPU 261 controls each unit via a bus such as the data bus 263.

The RAM 264 serves as a temporary storage area for the CPU 261, for example as its main memory and work area. The storage 265 is a readable and writable non-volatile memory that records the image processing server program described above.

The input device 266 is the same as the input device 236 described with reference to FIG. 2B. The display device 267 is the same as the display device 210 described with reference to FIG. 2A.

The external interface 268 is the same as the external interface 211 described with reference to FIG. 2A.

The OCR server 104 in FIG. 2D includes a CPU 291, a ROM 292, a RAM 294, a storage 295, an input device 296, a display device 297, and an external interface 298. These units exchange data with one another via a data bus 293.

The CPU 291 is a controller that controls the entire OCR server 104. The CPU 291 boots the OS with a boot program stored in the ROM 292, which is a non-volatile memory. On this OS, it executes the OCR server program stored in the storage 295. By executing this program, the CPU 291 recognizes the handwritten characters in handwriting-extracted images and the printed characters in printed-text images, and converts them into text. The CPU 291 controls each unit via a bus such as the data bus 293.

The RAM 294 serves as a temporary storage area for the CPU 291, for example as its main memory and work area. The storage 295 is a readable and writable non-volatile memory that records the OCR server program described above.

The input device 296 is the same as the input device 236 described with reference to FIG. 2B. The display device 297 is the same as the display device 210 described with reference to FIG. 2A.

The external interface 298 is the same as the external interface 211 described with reference to FIG. 2A.

<Operation screen>
The user gives the instruction shown in S301 on the image processing device 101, using the operation screen described below. FIG. 5A shows the operation screen for scanning sample documents and instructing creation of a learning model (hereinafter, the learning document scan screen).

The learning document scan screen 500 is an example of a screen displayed on the display device 210. As shown in FIG. 5A, the learning document scan screen 500 includes a preview area 501, a scan button 502, and a transmission start button 503.

The scan button 502 starts reading the document set on the scanner device 206. When scanning is complete, a document sample image is generated and displayed in the preview area 501. To hold several document sample images together, the user sets another document on the scanner device 206 and presses the scan button 502 again.

Once a document has been read, the transmission start button 503 becomes available. When the user selects the transmission start button 503, the document sample images are transmitted to the learning device 102.

The user assigns correct answer data, as shown in S305, on the operation screens in FIGS. 5B and 5C. FIG. 5B shows the operation screen for designating the pixels of handwritten character portions in a document sample image (hereinafter, the handwriting extraction correct answer data creation screen). FIG. 5C shows the operation screen for designating handwritten areas, which contain handwritten characters, in a document sample image (hereinafter, the handwriting area estimation correct answer data creation screen). The user creates correct answer data by operating on what these two screens display.

The handwriting extraction correct answer data creation screen 520 is an example of a screen displayed on the display device 237. As shown in FIG. 5B, the handwriting extraction correct answer data creation screen 520 includes an image display area 521, an image selection button 522, an enlarge button 523, a reduce button 524, an extraction button 525, an estimation button 526, and a save button 527.

The image selection button 522 selects a document sample image that was received from the image processing device 101 and stored in the image storage unit 115. Pressing the image selection button 522 displays a selection screen (not shown), from which the user chooses a document sample image. The selected image then appears in the image display area 521. The user creates correct answer data by working on the document sample image shown in the image display area 521.

The enlarge button 523 and the reduce button 524 zoom the document sample image shown in the image display area 521 in and out. This makes correct answer data easier to create.

The extraction button 525 and the estimation button 526 select which correct answer data is being created: data for handwriting extraction or data for handwriting area estimation. The selected button is highlighted. Selecting the extraction button 525 enters the mode for creating handwriting extraction correct answer data. The user creates this data as follows. As shown in FIG. 5B, the user moves the mouse cursor with the input device 236 and traces the handwritten characters in the document sample image shown in the image display area 521. This selects the pixels that belong to the handwritten characters. On receiving this operation, the learning data generation unit 112 records the positions of the selected pixels on the document sample image. In other words, the handwriting extraction correct answer data is the set of positions of the pixels on the document sample image that correspond to handwriting.

Selecting the estimation button 526 instead enters the mode for creating handwriting area estimation correct answer data. The user creates this data as follows. As indicated by the dotted frames in FIG. 5C, the user moves the mouse cursor with the input device 236 and selects regions in the document sample image shown in the image display area 521. Each region is an area enclosed by ruled lines in which handwritten characters are entered: the inside of an entry field, excluding the ruled lines themselves. That is, the user selects one region for each entry field of the form. On receiving this operation, the learning data generation unit 112 records the selected regions. The handwriting area estimation correct answer data is therefore the set of regions inside entry fields on the document sample image, in other words the regions where handwritten characters can be entered. Hereinafter, a region in which handwritten characters are entered is called a "handwritten area".

The save button 527 saves the created correct answer data.

The handwriting extraction correct answer data is stored in the image storage unit 115 as the following image. The image has the same size (width and height) as the document sample image. Pixels at the handwritten character positions selected by the user take a value indicating handwriting (for example, 0; the same applies below). All other pixels take a value indicating non-handwriting (for example, 255; the same applies below). Hereinafter, this image is called a "handwriting extraction correct answer image". FIG. 4C shows an example.

The handwriting area estimation correct answer data is likewise stored in the image storage unit 115 as the following image. The image has the same size (width and height) as the document sample image. Pixels inside the handwritten areas selected by the user take a value indicating a handwritten area (for example, 0; the same applies below). All other pixels take a value indicating that they are not in a handwritten area (for example, 255; the same applies below). Hereinafter, this image is called a "handwriting area estimation correct answer image". FIG. 4D shows an example.

The user gives the instruction shown in S351 on the image processing device 101, using the operation screen described below. FIG. 5D shows the operation screen displayed when a form to be processed is scanned and processed (hereinafter, the form processing screen). As shown in FIG. 5D, the form processing screen 540 includes a preview area 541, a scan button 542, and a transmission start button 543.

The scan button 542 starts reading the document set on the scanner device 206. When scanning is complete, the image to be processed is generated and displayed in the preview area 541.

Once the document has been read, the transmission start button 543 becomes available. When the user presses the transmission start button 543, the image to be processed is transmitted to the image processing server 103.

<Document sample image generation process>
This section describes the document sample image generation process performed by the image processing device 101. FIG. 6A shows the flow of this process. The CPU 201 carries out the process by reading the controller program recorded in the storage 208, loading it into the RAM 204, and executing it. The process starts when the user operates the input device 209 of the image processing device 101.

In S601, the CPU 201 determines whether a document scan has been instructed. If the user has performed the predetermined operation for scanning a document with the input device 209 (pressing the scan button 502), the result is YES and the process proceeds to S602. Otherwise, the result is NO and the process proceeds to S604.

In S602, the CPU 201 controls the scanner device 206 and the document feeding device 207 to scan the sample document and generate a document sample image. Here the document sample image is generated as grayscale image data. This is not a requirement, and the image may instead be a multi-valued color image.

In S603, the CPU 201 transmits the document sample image generated in S602 to the learning device 102 via the external interface 211.

In S604, the CPU 201 determines whether to end the process. If the user has performed the predetermined operation for ending the document sample image generation process, the result is YES and the process ends. Otherwise, the result is NO and the process returns to S601.

Through the above processing, the image processing device 101 generates document sample images and transmits them to the learning device 102. The number of document sample images acquired depends on the user's operations and on the number of documents placed on the document feeding device 207.
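The S601-S604 flow is a simple event loop. The Python sketch below models it under stated assumptions. User operations arrive as a hypothetical stream of strings. `scan_document` and `send_to_learning_device` are hypothetical callbacks standing in for the scanner device 206 and the external interface 211. The sketch illustrates the flowchart only and is not the device's controller program.

```python
from typing import Callable, Iterable


def sample_image_generation(
    events: Iterable[str],
    scan_document: Callable[[], bytes],
    send_to_learning_device: Callable[[bytes], None],
) -> int:
    """Sketch of the S601-S604 loop of FIG. 6A; returns how many images were sent.

    `events` is a hypothetical stream of user operations: "scan" stands for
    pressing the scan button 502, "quit" for the operation that ends the
    process, and anything else for "no relevant operation".
    """
    sent = 0
    for event in events:
        if event == "scan":                 # S601: YES
            image = scan_document()         # S602: scan and generate the image
            send_to_learning_device(image)  # S603: send to the learning device 102
            sent += 1
        if event == "quit":                 # S604: YES -> end the process
            break
        # S604: NO -> back to S601 with the next event
    return sent
```

With this structure, each scan produces exactly one transmission. Any number of scans can happen before the user ends the process.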

<Document sample image reception process>
This section describes the document sample image reception process performed by the learning device 102. FIG. 6B shows the flow of this process. The CPU 231 carries out the process by reading the learning data generation program recorded in the storage 235, loading it into the RAM 234, and executing it. The process starts when the user turns on the power of the learning device 102.

In S621, the CPU 231 determines whether a document sample image has been received. If the CPU 231 has received image data via the external interface 238, the result is YES and the process proceeds to S622. Otherwise, the result is NO and the process proceeds to S623.

In S622, the CPU 231 records the received document sample image in a predetermined area of the storage 235.

In S623, the CPU 231 determines whether to end the process. If the user has performed a predetermined operation for ending the document sample image reception process, such as turning off the power of the learning device 102, the result is YES and the process ends. Otherwise, the result is NO and the process returns to S621.
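The reception flow S621-S623 can be sketched in the same way, as a loop that polls for incoming images. This minimal sketch makes two assumptions: received images and the power-off operation both arrive through a `queue.Queue`, and a hypothetical `POWER_OFF` sentinel marks the end. The list `storage` stands in for the predetermined area of the storage 235.

```python
import queue
from typing import List

POWER_OFF = object()  # hypothetical sentinel for "power OFF" (S623: YES)


def receive_sample_images(inbox: "queue.Queue", storage: List[bytes],
                          poll_seconds: float = 0.1) -> None:
    """Sketch of S621-S623 in FIG. 6B."""
    while True:
        try:
            item = inbox.get(timeout=poll_seconds)  # S621: image received?
        except queue.Empty:
            continue                                # S621: NO -> S623: NO -> S621
        if item is POWER_OFF:                       # S623: YES -> end
            break
        storage.append(item)                        # S622: record the image
```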

<Correct answer data generation process>
This section describes the correct answer data generation process performed by the learning device 102. FIG. 6C shows the flow of this process.

The learning data generation unit 112 of the learning device 102 carries out this process. The process starts when the user performs a predetermined operation with the input device 236 of the learning device 102.

In S641, the CPU 231 determines whether the user has instructed selection of a document sample image. If the user has performed the predetermined operation for selecting a document sample image with the input device 236 (pressing the image selection button 522), the result is YES and the process proceeds to S642. Otherwise, the result is NO and the process proceeds to S643.

In S642, the CPU 231 reads the document sample image that the user selected in S641 from the storage 235 and displays it on the operation screen (in the image display area 521).

In S643, the CPU 231 determines whether the user has entered correct answer data. As described with reference to FIGS. 5B and 5C, this means the user has used the input device 236 either to trace handwritten characters on the document sample image or to designate a handwritten area in which handwritten characters are entered. If so, the result is YES and the process proceeds to S644. Otherwise, the result is NO and the process proceeds to S647.

In S644, the CPU 231 determines whether the user's input creates handwriting extraction correct answer data or handwriting area estimation correct answer data. If the user's operation instructs creation of handwriting extraction correct answer data (the extraction button 525 is selected), the result is YES and the process proceeds to S645. If the operation instructs creation of handwriting area estimation correct answer data (the estimation button 526 is selected), the process proceeds to S646.

In S645, the CPU 231 temporarily stores the handwriting extraction correct answer data entered by the user in the RAM 234. As described above, this data is the position information of the pixels in the document sample image that correspond to handwriting.

In S646, the CPU 231 temporarily stores the handwriting area estimation correct answer data entered by the user in the RAM 234. As described above, this data is the region information for the handwritten areas on the document sample image.
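Steps S644 to S646 route each annotation to one of two temporary buffers, depending on which button is selected. The sketch below uses two hypothetical encodings: traced pixels are `(x, y)` tuples, and handwritten areas are axis-aligned rectangles `(left, top, right, bottom)`. The text says only that the area data is region information, so the rectangle form is an assumption. The mode strings are also hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Pixel = Tuple[int, int]            # (x, y) on the document sample image
Rect = Tuple[int, int, int, int]   # hypothetical: (left, top, right, bottom)


@dataclass
class AnnotationBuffer:
    """Stands in for the temporary storage in the RAM 234."""
    extraction_pixels: List[Pixel] = field(default_factory=list)  # filled in S645
    area_rects: List[Rect] = field(default_factory=list)          # filled in S646

    def record(self, mode: str, data) -> None:
        # S644: which button is selected? "extract" = extraction button 525,
        # "estimate" = estimation button 526 (hypothetical mode names).
        if mode == "extract":
            self.extraction_pixels.extend(data)  # S645: traced pixel positions
        elif mode == "estimate":
            self.area_rects.append(data)         # S646: one entry-field region
        else:
            raise ValueError(f"unknown mode: {mode!r}")
```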

In S647, the CPU 231 determines whether the user has instructed saving of the correct answer data. If the user has performed the predetermined operation for saving correct answer data with the input device 236 (pressing the save button 527), the result is YES and the process proceeds to S648. Otherwise, the result is NO and the process proceeds to S650.

In S648, the CPU 231 generates a handwriting extraction correct answer image and saves it as the handwriting extraction correct answer data. The CPU 231 generates the image as follows. First, it creates an image of the same size as the document sample image read in S642 and sets every pixel to the value indicating non-handwriting. Next, it refers to the position information temporarily stored in the RAM 234 in S645, and changes the pixels at the corresponding positions in the image to the value indicating handwriting. Finally, it associates the resulting handwriting extraction correct answer image with the document sample image read in S642 and saves it in a predetermined area of the storage 235.
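The image construction in S648 amounts to initializing a mask to the non-handwriting value and then marking the recorded positions. The following is a minimal pure-Python sketch. It uses the example values 0 (handwriting) and 255 (non-handwriting) from the text and represents the image as a list of rows. Skipping out-of-range positions is an added safeguard that the text does not specify, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

HANDWRITING = 0        # example value from the text
NOT_HANDWRITING = 255  # example value from the text


def make_extraction_gt(width: int, height: int,
                       pixels: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Sketch of S648: build a handwriting extraction correct answer image."""
    # Same size as the document sample image, all pixels "not handwriting".
    gt = [[NOT_HANDWRITING] * width for _ in range(height)]
    for x, y in pixels:
        if 0 <= x < width and 0 <= y < height:  # ignore positions off the image
            gt[y][x] = HANDWRITING
    return gt
```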

In S649, the CPU 231 generates a handwriting area estimation correct answer image and saves it as the handwriting area estimation correct answer data. The CPU 231 generates the image as follows. First, it creates an image of the same size as the document sample image read in S642 and sets every pixel to the value indicating that the pixel is not in a handwritten area. Next, it refers to the region information temporarily stored in the RAM 234 in S646, and changes the pixels inside the corresponding regions to the value indicating a handwritten area. Finally, it associates the resulting handwriting area estimation correct answer image with the document sample image read in S642 and saves it in a predetermined area of the storage 235.
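S649 applies the same idea to regions instead of single pixels. As a hypothetical encoding, this sketch treats each handwritten area as an axis-aligned rectangle `(left, top, right, bottom)` whose right and bottom edges are exclusive. It uses the example values 0 (handwritten area) and 255 (other). The function name is hypothetical.

```python
from typing import Iterable, List, Tuple

AREA = 0        # example value from the text
NOT_AREA = 255  # example value from the text


def make_area_gt(width: int, height: int,
                 rects: Iterable[Tuple[int, int, int, int]]) -> List[List[int]]:
    """Sketch of S649: build a handwriting area estimation correct answer image."""
    gt = [[NOT_AREA] * width for _ in range(height)]
    for left, top, right, bottom in rects:
        # Fill the rectangle, clipped to the image; right/bottom are exclusive.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                gt[y][x] = AREA
    return gt
```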

In S650, the CPU 231 determines whether to end the process. If the user has performed the predetermined operation for ending the correct answer data generation process, the result is YES and the process ends. Otherwise, the result is NO and the process returns to S641.

<Learning data generation process>
This section describes the learning data generation process performed by the learning device 102. FIG. 7A shows the flow of this process. The learning data generation unit 112 of the learning device 102 carries out the process. The process starts when the user performs a predetermined operation with the input device 209 of the image processing device 101.

First, in S701, the CPU 231 selects and reads a document sample image stored in the storage 235. The processing in S622 of the flowchart in FIG. 6B has recorded multiple document sample images in the storage 235, so the CPU 231 selects one of them at random.

In S702, the CPU 231 reads a handwriting extraction correct answer image from the storage 235. Specifically, it reads the image that the processing in S648 associated with the document sample image read in S701.

In S703, the CPU 231 reads a handwriting area estimation correct answer image from the storage 235. Specifically, it reads the image that the processing in S649 associated with the document sample image read in S701.

In S704, the CPU 231 cuts out a portion of the document sample image read in S701 (for example, 256 × 256 pixels, height × width) to generate an input image for the learning data. The cutout position is chosen at random.

In S705, the CPU 231 cuts out a portion of the handwriting extraction correct answer image read in S702 to generate a correct answer label image (teacher data, or correct answer image data) for the handwriting extraction learning data. Hereinafter, this label image is called a "handwriting extraction correct answer label image". The cutout uses the same position and size that S704 used to cut the input image out of the document sample image.

In S706, the CPU 231 cuts out a portion of the handwriting area estimation correct image read out in S703 to generate a correct label image used as training data for handwriting area estimation (hereinafter referred to as the "handwriting area estimation correct label image"). The cutout position and size are the same as those used in S704 to cut the input image out of the document sample image.

In S707, the CPU 231 associates the input image generated in S704 with the handwriting extraction correct label image generated in S705 and stores the pair in a predetermined area of the storage 235 as training data for handwriting extraction. In this embodiment, training data such as that shown in FIG. 8A is stored.

In S708, the CPU 231 associates the input image generated in S704 with the handwriting area estimation correct label image generated in S706 and stores the pair in a predetermined area of the storage 235 as training data for handwriting area estimation. In this embodiment, training data such as that shown in FIG. 8B is stored.

In S709, the CPU 231 determines whether to end the training data generation process. If the CPU 231 has generated a predetermined number of training data items (determined, for example, by the user via the input device 236 of the learning device 102 at the start of this flowchart), it determines YES and ends the process. Otherwise, it determines NO and returns to S701.
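The sample selection and aligned cropping of S701 to S709 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: images are assumed to be grayscale lists of pixel rows, each sample is assumed to be a hypothetical `(document_image, extraction_label, area_label)` tuple of equal-sized images, and the helper names `crop` and `generate_training_data` are invented here. The point it shows is that one random offset is drawn per sample and reused for the input image and both correct label images, so each label stays pixel-aligned with its input.

```python
import random


def crop(img, top, left, size):
    """Cut out a size x size window whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def generate_training_data(samples, count, size=256, seed=None):
    """Build `count` aligned training pairs for each of the two networks.

    `samples` is a list of (document_image, extraction_label, area_label)
    tuples sharing the same dimensions (hypothetical data layout).
    """
    rng = random.Random(seed)
    extraction_data, area_data = [], []
    while len(extraction_data) < count:                     # S709: stop at `count`
        sample, hw_label, area_label = rng.choice(samples)  # S701-S703
        h, w = len(sample), len(sample[0])
        top = rng.randint(0, h - size)                      # S704: random position
        left = rng.randint(0, w - size)
        x = crop(sample, top, left, size)
        # S705/S706: reuse the same position and size for both label images.
        extraction_data.append((x, crop(hw_label, top, left, size)))  # S707
        area_data.append((x, crop(area_label, top, left, size)))      # S708
    return extraction_data, area_data
```

Passing the same image as input and as both labels is a quick way to confirm the alignment: every stored pair then contains identical crops.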

Through the above processing, training data for the neural network that extracts handwritten pixels and training data for the neural network that estimates handwriting areas are generated. To improve the generality of the neural networks, the training data may be augmented. For example, the input image is scaled by a factor selected at random from a predetermined range (for example, 50% to 150%), and the handwriting extraction and handwriting area estimation correct label images are scaled by the same factor. Alternatively, the input image is rotated by an angle selected at random from a predetermined range (for example, -10 to 10 degrees), and the handwriting extraction and handwriting area estimation correct label images are rotated by the same angle. To allow for scaling and rotation, the input image and the correct label images are cut out in S704, S705, and S706 at a slightly larger size (for example, 512 × 512). After scaling and rotation, the central portion is cut out to the final size of the input image and correct label images (for example, 256 × 256). Alternatively, the input image may be augmented by changing the luminance of each pixel, that is, by gamma correction, with the gamma value selected at random from a predetermined range (for example, 0.1 to 10.0).
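The augmentation above can be sketched as below, assuming 8-bit grayscale images stored as lists of rows. Scaling and rotation are combined into one inverse mapping about the image center followed by the central cut-out. Nearest-neighbour sampling is an assumption chosen here so that 0/1 label images stay binary; the gamma formula `255 * (p / 255) ** gamma` is one common convention, and the label fill value 0 (meaning "not handwriting") is likewise assumed. All function names are hypothetical.

```python
import math
import random


def scale_rotate_center_crop(img, factor, angle_deg, out_size, fill=255):
    """Scale by `factor` and rotate by `angle_deg` about the image center, then
    keep the central out_size x out_size window (nearest-neighbour sampling)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    half = (out_size - 1) / 2
    out = []
    for oy in range(out_size):
        row = []
        for ox in range(out_size):
            dy, dx = oy - half, ox - half  # offset from the output center
            # Inverse mapping: undo the rotation, then undo the scaling.
            sx = (cos_t * dx + sin_t * dy) / factor + cx
            sy = (-sin_t * dx + cos_t * dy) / factor + cy
            iy, ix = round(sy), round(sx)
            row.append(img[iy][ix] if 0 <= iy < h and 0 <= ix < w else fill)
        out.append(row)
    return out


def gamma_correct(img, gamma):
    """Change the luminance of 8-bit pixels with gamma correction."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in img]


def augment(x, labels, rng, out_size=256):
    """Apply one random scale, rotation, and gamma to an input image; the
    labels receive the same geometric transform but no gamma change."""
    factor = rng.uniform(0.5, 1.5)   # 50% to 150%
    angle = rng.uniform(-10, 10)     # -10 to 10 degrees
    gamma = rng.uniform(0.1, 10.0)   # applied to the input image only
    x_aug = gamma_correct(scale_rotate_center_crop(x, factor, angle, out_size), gamma)
    labels_aug = [scale_rotate_center_crop(lbl, factor, angle, out_size, fill=0)
                  for lbl in labels]
    return x_aug, labels_aug
```

Drawing `factor` and `angle` once and applying them to every image of the pair is what keeps the labels aligned with the augmented input.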

<Learning process>
Next, the learning process performed by the learning device 102 will be described. FIG. 7B shows the flow of the learning process. This process is implemented by the learning unit 113 of the learning device 102 and starts when the user performs a predetermined operation via the input device 236 of the learning device 102. In this embodiment, the neural networks are trained by the mini-batch method.

First, in S731, the CPU 231 initializes the neural network for handwriting extraction and the neural network for handwriting area estimation. That is, the CPU 231 constructs two neural networks and initializes each of their parameters with a randomly determined value. Various structures can be used for these neural networks; for example, they may take the form of a fully convolutional network (FCN), a known technique. The neural network for handwriting area estimation may alternatively take the form of YOLO (You Only Look Once), another known technique.

In S732, the CPU 231 acquires training data. The CPU 231 executes the training data generation process shown in the flowchart of FIG. 7A to acquire a predetermined number of training data items (for example, a mini-batch of 10).

In S733, the CPU 231 calculates the error of the handwriting extraction neural network. That is, the input image included in each handwriting extraction training data item is fed into the handwriting extraction neural network to obtain an output. The output is an image of the same size as the input image in which each pixel predicted to be handwriting has a value indicating handwriting and every other pixel has a value indicating non-handwriting. The difference between this output and the handwriting extraction correct label image included in the training data is then evaluated to obtain the error. Cross entropy can be used as the evaluation metric.
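As a minimal sketch of the S733 error calculation, assume the network emits for each pixel a probability that the pixel is handwriting, and that the correct label image stores 1 for handwriting and 0 otherwise. Both are assumptions: the text only says the output holds values indicating handwriting or not. The mean binary cross entropy over all pixels is then computed as below; the same function would serve for S735 with handwriting-area labels. The function name is hypothetical.

```python
import math


def pixelwise_cross_entropy(pred, label, eps=1e-7):
    """Mean binary cross entropy between predicted handwriting probabilities
    and a 0/1 correct label image of the same size."""
    total, n = 0.0, 0
    for prow, lrow in zip(pred, label):
        for p, y in zip(prow, lrow):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            n += 1
    return total / n
```

A prediction of 0.5 everywhere gives log 2 (about 0.693) regardless of the labels, while confident correct predictions drive the value toward 0.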

In S734, the CPU 231 adjusts the parameters of the handwriting extraction neural network. That is, based on the error calculated in S733, it updates the parameter values of the handwriting extraction neural network by backpropagation.

In S735, the CPU 231 calculates the error of the handwriting area estimation neural network. That is, the input image included in each handwriting area estimation training data item is fed into the handwriting area estimation neural network to obtain an output. The output is an image of the same size as the input image in which each pixel predicted to belong to a handwriting area has a value indicating a handwriting area and every other pixel has a value indicating a non-handwriting area. The difference between this output and the handwriting area estimation correct label image included in the training data is then evaluated to obtain the error. As with handwriting extraction, cross entropy can be used as the evaluation metric.

In S736, the CPU 231 adjusts the parameters of the handwriting area estimation neural network. That is, based on the error calculated in S735, it updates the parameter values of the handwriting area estimation neural network by backpropagation.

In S737, the CPU 231 determines whether to end the learning, as follows. The CPU 231 determines whether the processing of S732 to S736 has been performed a predetermined number of times (for example, 60,000 times). This number can be determined, for example, by user input at the start of this flowchart. If the processing has been performed the predetermined number of times, the CPU 231 determines YES and proceeds to S738. Otherwise, it determines NO, returns to S732, and continues training the neural networks.
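The control flow of S731 to S737 (random initialization, fetch a mini-batch, compute the cross-entropy error, update by gradient descent, stop after a fixed number of iterations) can be illustrated with a deliberately tiny stand-in. This is not an FCN and not the embodiment's network: it is a hypothetical one-weight logistic model applied to each pixel's normalized intensity, for which the backpropagated gradient of the cross entropy reduces to `(p - y) * x`. A real implementation would use a deep-learning framework; only the loop structure carries over.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_toy_pixel_classifier(batches, steps, lr=0.5, seed=0):
    """Toy stand-in for S731-S737 (NOT an FCN): a one-weight logistic model
    applied to every pixel, trained by mini-batch gradient descent on the
    pixel-wise cross entropy. `batches` is a list of mini-batches, each a list
    of (image, label) pairs with 8-bit pixels and 0/1 labels."""
    rng = random.Random(seed)
    w, b = rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)  # S731: random init
    for step in range(steps):                               # S737: fixed count
        batch = batches[step % len(batches)]                # S732: next mini-batch
        grad_w = grad_b = 0.0
        n = 0
        for image, label in batch:
            for xrow, yrow in zip(image, label):
                for x, y in zip(xrow, yrow):
                    f = x / 255.0
                    p = sigmoid(w * f + b)   # S733: forward pass
                    grad_w += (p - y) * f    # d(cross entropy)/dw
                    grad_b += p - y          # d(cross entropy)/db
                    n += 1
        w -= lr * grad_w / n                 # S734: gradient-descent update
        b -= lr * grad_b / n
    return w, b
```

With one mini-batch in which white pixels are labeled 1 and black pixels 0, the trained model predicts above 0.5 for white and below 0.5 for black.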

In S738, the CPU 231 transmits to the image processing server 103, as the learning result, the parameters of the handwriting extraction and handwriting area estimation neural networks adjusted in S734 and S736 (that is, the trained models).

<Form text conversion request process>
Next, the form text conversion request process performed by the image processing device 101 will be described. The image processing device 101 scans a form containing handwritten characters to generate an image to be processed, and transmits the image data to the image processing server 103 to request conversion of the form into text. FIG. 9A shows the flow of the form text conversion request process. This process is implemented by the CPU 201 of the image processing device 101 reading the controller program recorded in the storage 208, loading it into the RAM 204, and executing it. It starts when the user performs a predetermined operation via the input device 209 of the image processing device 101.

First, in S901, the CPU 201 controls the scanner device 206 and the document transport device 207 to scan the document and generate an image to be processed. The image to be processed is generated as grayscale image data.

In S902, the CPU 201 transmits the image to be processed generated in S901 to the image processing server 103 via the external interface 211.

In S903, the CPU 201 determines whether a processing result has been received from the image processing server 103. If a processing result has been received from the image processing server 103 via the external interface 211, the CPU 201 determines YES and proceeds to S904. Otherwise, it determines NO and repeats S903.

In S904, the CPU 201 outputs the processing result received from the image processing server 103, that is, the form text data generated by recognizing the handwritten and printed characters contained in the image to be processed generated in S901. For example, the CPU 201 can transmit the form text data via the external interface 211 to a destination that the user has set by operating the input device 209.

<Form text conversion process>
Next, the form text conversion process performed by the image processing server 103 will be described. FIG. 9B shows the flow of the form text conversion process, and FIG. 10 outlines the data generated during it. The image processing server 103, functioning as the image conversion unit 114, receives the image to be processed from the image processing device 101 and obtains text data by performing OCR on the printed and handwritten characters contained in the scanned image data. OCR of printed characters is performed by the print OCR unit 117, and OCR of handwritten characters is performed by the handwriting OCR unit 116. The form text conversion process is implemented by the CPU 261 reading the image processing server program stored in the storage 265, loading it into the RAM 264, and executing it. It starts when the user turns on the image processing server 103.

First, in S951, the CPU 261 loads the neural network (trained model) for handwriting extraction and the neural network (trained model) for handwriting area estimation. The CPU 261 constructs the same neural networks as in S731 of the flowchart of FIG. 7B, and then applies to them the learning results transmitted from the learning device 102 in S738 (the parameters of the handwriting extraction neural network and of the handwriting area estimation neural network), thereby loading the trained neural networks.

In S952, the CPU 261 determines whether an image to be processed has been received from the image processing device 101. If an image to be processed has been received via the external interface 268, the CPU 261 determines YES and proceeds to S953. Otherwise, it determines NO and proceeds to S965. As an example, assume here that the form 400 of FIG. 10 (the form 400 shown in FIG. 4) has been received as the image to be processed.

In S953, the CPU 261 extracts handwritten pixels from the image to be processed received from the image processing device 101. The CPU 261 inputs the image to be processed into the handwriting extraction neural network constructed in S951 and has it estimate the handwritten pixels. The output of the neural network is binary image data of the same size as the image to be processed, in which each pixel predicted to be handwriting holds a value indicating handwriting and every other pixel holds a value indicating non-handwriting. The CPU 261 then extracts from the image to be processed the pixels at the same positions as the pixels holding the handwriting value in this image data, and generates a handwriting extracted image. This yields an image containing only the pixels judged to be handwriting, such as the handwriting extracted image 1001 of FIG. 10.

In S954, the CPU 261 estimates handwriting areas in the image to be processed received from the image processing device 101. The CPU 261 inputs the image to be processed into the handwriting area estimation neural network constructed in S951 and has it estimate the handwriting areas. The output of the neural network is binary image data of the same size as the image to be processed, in which each pixel predicted to belong to a handwriting area holds a value indicating a handwriting area and every other pixel holds a value indicating a non-handwriting area. In S305, the user created the correct data for handwriting area estimation for each entry item of the form, taking the ruled-line frames (entry fields) into account. Because the handwriting area estimation neural network has learned from this data, it outputs the pixels estimated to be handwriting areas for each entry field (entry item). Since the output is a per-pixel prediction, however, the pixels of an estimated handwriting area do not necessarily form a rectangle. To make the handwriting areas easier to handle, this embodiment sets a circumscribed rectangle around each estimated handwriting area. The circumscribed rectangles can be set with a known technique. Each circumscribed rectangle can be expressed as its upper-left corner on the image to be processed together with its width and height. The resulting group of rectangles is used as the handwriting areas. Reference numeral 1002 in FIG. 10 shows, as dotted-line frames (1011 to 1013), the positions of the handwriting areas estimated for the image to be processed (form 400).
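The text leaves the circumscribed-rectangle step to known techniques. One minimal sketch, assuming the network output is a 0/1 mask and that each 8-connected region of handwriting-area pixels corresponds to one entry field, labels the regions with a breadth-first search and records each region's bounding box as (left, top, width, height), matching the upper-left-corner-plus-size representation described above. The function name is hypothetical; libraries such as OpenCV offer equivalent connected-component routines.

```python
from collections import deque


def circumscribed_rectangles(mask):
    """Return (left, top, width, height) for each 8-connected region of 1s."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search over one region, tracking its extent.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            rects.append((left, top, right - left + 1, bottom - top + 1))
    return rects
```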

In S955, the CPU 261 generates a print image by removing from the image to be processed the pixels judged to be handwriting, based on the output of the handwriting extraction neural network obtained in S953. Specifically, the CPU 261 changes to white (RGB = (255, 255, 255)) each pixel of the image to be processed that is located at the same position as a pixel whose value indicates handwriting in the image data output by the neural network. This yields the print image 1003 of FIG. 10.
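Both S953 and S955 are per-pixel selections driven by the same binary network output. A minimal sketch, assuming grayscale pixels where 255 is white and a mask holding 1 for handwriting and 0 otherwise: the text writes white as RGB (255, 255, 255), and a single 255 is the grayscale equivalent assumed here. Filling the non-handwriting pixels of the handwriting extracted image with white is also an assumption, since the text only says that image contains the handwriting pixels. The function name is hypothetical.

```python
WHITE = 255  # assumed grayscale background value


def split_by_mask(image, hw_mask):
    """Return (handwriting_extracted_image, print_image) for a 0/1 mask."""
    # S953: keep handwriting pixels, whiten everything else.
    handwritten = [[p if m else WHITE for p, m in zip(prow, mrow)]
                   for prow, mrow in zip(image, hw_mask)]
    # S955: whiten handwriting pixels, keep everything else.
    printed = [[WHITE if m else p for p, m in zip(prow, mrow)]
               for prow, mrow in zip(image, hw_mask)]
    return handwritten, printed
```

Because the two outputs are complementary, every non-white pixel of the original ends up in exactly one of them.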

In S956, the CPU 261 extracts print areas from the print image generated in S955. As a print area, the CPU 261 extracts a partial area of the print image that contains printed characters. Here, a partial area is a unit (object) of printed content, for example a character line consisting of multiple characters, a sentence consisting of multiple character lines, or an object such as a figure, photograph, table, or graph. Partial areas can be extracted, for example, as follows. The print image is binarized into black and white to generate a binary image. In this binary image, groups of connected black pixels are extracted, and a circumscribed rectangle is created for each. Evaluating the shape and size of these rectangles yields a group of rectangles that are characters or parts of characters. Evaluating the distances between these rectangles and merging those separated by no more than a predetermined threshold yields a group of rectangles that are characters. When rectangles of characters of similar size are lined up close together, they can be merged to obtain character-line rectangles. When character-line rectangles with similar short-side lengths are arranged at equal intervals, they can be merged to obtain sentence rectangles. Rectangles containing objects other than characters, lines, and sentences, such as figures, photographs, tables, and graphs, can also be obtained. Rectangles that are single characters or parts of characters are then excluded from the extracted rectangles, and the remaining rectangles are used as the partial areas. Reference numeral 1004 in FIG. 10 shows, as dotted-line frames (1014 to 1018), the print areas extracted from the print image.
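Below is a simplified sketch of the merging idea, not the full procedure. It starts from the bounding boxes of connected black pixels (assumed already computed), greedily merges boxes whose gap is at most a single threshold, and then drops results that are still a single small component. The multi-stage grouping described above (characters, then lines, then sentences) is collapsed into one threshold here, and the parameters `threshold`, `min_parts`, and `max_char_size`, like the function names, are hypothetical. Large single components such as tables or figures are kept, in line with the text.

```python
def box_gap(a, b):
    """Gap between two (left, top, width, height) boxes: the larger of the
    horizontal and vertical separations, 0 if they overlap."""
    dx = max(b[0] - (a[0] + a[2]), a[0] - (b[0] + b[2]), 0)
    dy = max(b[1] - (a[1] + a[3]), a[1] - (b[1] + b[3]), 0)
    return max(dx, dy)


def box_union(a, b):
    left, top = min(a[0], b[0]), min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (left, top, right - left, bottom - top)


def merge_boxes(boxes, threshold):
    """Greedily merge boxes whose gap is <= threshold until nothing changes.
    Returns (box, component_count) pairs."""
    groups = [(box, 1) for box in boxes]
    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if box_gap(groups[i][0], groups[j][0]) <= threshold:
                    (a, na), (b, nb) = groups[i], groups[j]
                    groups[i] = (box_union(a, b), na + nb)
                    del groups[j]
                    changed = True
                    break
            if changed:
                break
    return groups


def extract_print_areas(char_boxes, threshold, min_parts=2, max_char_size=10):
    """Keep merged boxes built from several components, or single components
    too large to be one character (e.g. tables); drop isolated characters."""
    return [box for box, parts in merge_boxes(char_boxes, threshold)
            if parts >= min_parts or max(box[2], box[3]) > max_char_size]
```

For example, three character boxes two pixels apart merge into one line box, an isolated character box is dropped, and a single large table box survives.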

In S957, the CPU 261 selects one of the handwriting areas estimated in S954 and acquires the print area corresponding to it. Specifically, it searches for the closest print area located to the left of or above the handwriting area and takes that print area as the one corresponding to the handwriting area. In other words, the print area having a predetermined positional relationship with the handwriting area (the closest print area to the left or above) is acquired as the print area corresponding to that handwriting area. For example, in FIG. 10, the print area closest to the handwriting area 1011 is the print area 1016, so the areas 1011 and 1016 are associated with each other. Reference numeral 1005 in FIG. 10 shows how handwriting areas and print areas are associated.
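A minimal sketch of the S957 search, assuming areas are (left, top, width, height) rectangles. The text does not define "to the left" or "above" precisely; this sketch assumes a left candidate must lie entirely to the left and overlap the handwriting area's rows, an upper candidate must lie entirely above and overlap its columns, and distance is the edge-to-edge gap. The `directions` parameter and the function name are hypothetical.

```python
def nearest_print_area(hw, prints, directions=("left", "up")):
    """Return the closest print area lying to the left of or above `hw`,
    or None if there is no candidate."""
    hl, ht, hwid, hhgt = hw
    hr, hb = hl + hwid, ht + hhgt
    best, best_dist = None, float("inf")
    for p in prints:
        pl, pt, pw, ph = p
        pr, pb = pl + pw, pt + ph
        overlaps_rows = pt < hb and ht < pb
        overlaps_cols = pl < hr and hl < pr
        if "left" in directions and pr <= hl and overlaps_rows:
            dist = hl - pr   # horizontal gap to a label on the left
        elif "up" in directions and pb <= ht and overlaps_cols:
            dist = ht - pb   # vertical gap to a label above
        else:
            continue
        if dist < best_dist:
            best, best_dist = p, dist
    return best
```

Restricting `directions` to one side is a natural hook for limiting the search direction, should the search need to be narrowed.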

In S958, the CPU 261 cuts out only the portion of the handwriting extracted image generated in S953 that falls within the handwriting area selected in S957, transmits it to the handwriting OCR unit 116 via the external interface 268, and has handwriting OCR executed on the cut-out image. That is, character recognition is performed on the handwritten pixels contained in the handwriting area. Handwriting OCR can be implemented with a known technique. Performing handwriting OCR on the handwriting extracted image rather than on the scanned image itself reduces the chance that printed content inside a handwriting entry field, such as the printed characters and marks in the address entry field 401 and the telephone number entry field 453 of FIG. 4, is picked up by handwriting OCR.

In S959, the CPU 261 determines whether a handwriting OCR result has been received from the handwriting OCR unit 116. The handwriting OCR result is the text data that the handwriting OCR unit 116 obtains by recognizing the handwritten characters. If the handwriting OCR result has been received from the handwriting OCR unit 116 via the external interface 268, the CPU 261 determines YES and proceeds to S960. Otherwise, it repeats S959.

In S960, the CPU 261 cuts out from the print image generated in S955 only the portion within the print area acquired in S957 as corresponding to the handwriting area, transmits it to the print OCR unit 117 via the external interface 268, and has print OCR executed. Print OCR can be implemented with a known technique.

In S961, the CPU 261 determines whether a print OCR result has been received from the print OCR unit 117. The print OCR result is the text data that the print OCR unit 117 obtains by recognizing the printed characters contained in the print area. If the print OCR result has been received from the print OCR unit 117 via the external interface 268, the CPU 261 determines YES and proceeds to S962. Otherwise, it repeats S961.

In S962, the CPU 261 generates and stores the form text data 1006 shown in FIG. 10, using the print OCR result obtained in S961 as the item and the handwriting OCR result obtained in S959 as the value.

In S963, the CPU 261 determines whether the processing from S957 onward has been performed for all the handwriting areas estimated in S954. If all handwriting areas have been processed, the CPU 261 determines YES and proceeds to S964. Otherwise, it determines NO and returns to S957.
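The per-area loop of S957 to S963 can be summarized in a short orchestration sketch. The three callables are hypothetical hooks standing in for the print-area search (S957), handwriting OCR (S958 to S959), and print OCR (S960 to S961); the OCR units themselves are outside the scope of the sketch. Returning a list of (item, value) pairs rather than a dictionary is an assumption made so that repeated item names are not silently overwritten, and skipping areas with no candidate item is likewise assumed, since the text does not cover that case.

```python
def build_form_text(hw_areas, print_areas, find_item_area, ocr_handwriting, ocr_print):
    """Run S957-S963 over every handwriting area and return (item, value) pairs."""
    form_text = []
    for hw_area in hw_areas:                              # S963: repeat for every area
        item_area = find_item_area(hw_area, print_areas)  # S957
        if item_area is None:
            continue  # assumption: skip areas with no candidate item
        value = ocr_handwriting(hw_area)                  # S958-S959
        item = ocr_print(item_area)                       # S960-S961
        form_text.append((item, value))                   # S962
    return form_text
```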

In S964, the CPU 261 transmits the form text data generated in S962 to the image processing device 101. Although the form text data is transmitted to the image processing device 101 in this embodiment, the present invention is not limited to this; if an external server has been designated in advance as the output destination, the form text data may be transmitted directly to that server.

In S965, the CPU 261 determines whether to end the process. If the user has performed a predetermined operation such as turning off the image processing server 103, the CPU 261 determines YES and ends the process. Otherwise, it determines NO and returns to S952.

As shown in this embodiment, the information (items and values) written on a form that includes handwritten entries can be read from a scanned image of the form without registering the items and values in advance. As a result, even atypical forms whose items are not known beforehand can be digitized without manual work.

In this embodiment, the handwriting extracted image is cut out for each handwriting area to generate a plurality of partial images, which are then transmitted; however, the present invention is not limited to this. The handwriting extracted image generated from the image to be processed may instead be transmitted to the handwriting OCR unit together with the handwriting areas, and the processing from S957 onward may be performed for each handwriting area using the OCR result.

Also, in this embodiment, handwriting extraction is performed on the entire image to be processed to generate the handwriting extracted image; however, the present invention is not limited to this. Handwriting extraction may be limited to the areas estimated by handwriting area estimation. In that case as well, the image subjected to handwriting OCR is a handwriting area corresponding to an entry item and contains only handwritten pixels.

(Embodiment 2)
Forms to be filled out by hand often have densely packed entry fields so that users can easily notice omissions. Among these, forms whose entry fields are arranged vertically often place the item name on the left and the entry field on the right, so the item is likely to be to the left of the handwritten entry. Forms whose entry fields are arranged horizontally, on the other hand, often place the item name above and the entry field below, so the item is likely to be above the handwritten entry.

This embodiment exploits this tendency: when acquiring the print area corresponding to a handwriting area in the form text conversion process, it limits the search direction for the print area according to the positional relationship between handwriting areas. The configuration of the image processing system of this embodiment is the same as that of the first embodiment except for its characteristic features. Identical components are therefore given the same reference numerals, and their detailed description is omitted.

<Form text conversion processing>
The processing in Example 2 is described with reference to FIG. 11. Steps S951 to S956 are the same as the identically numbered steps in Example 1.

In S1101, the CPU 261 selects one of the handwritten areas estimated in S954.

In S1102, the CPU 261 determines whether the handwritten area selected in S1101 belongs to areas packed closely in the left-right direction or in the vertical direction. Specifically, if the other handwritten area closest to the selected one lies to its left or right, the areas are determined to be dense in the left-right direction; if it lies above or below, they are determined to be dense in the vertical direction. The form 1200 shown in FIG. 12 is used as an example. Assume that the form 1200 is received in S952, that areas 1201 to 1204 are estimated as handwritten areas (long dotted lines) in S954, and that areas 1205 to 1211 are extracted as print areas (short dotted lines) in S956. The handwritten area closest to handwritten area 1201 is area 1202, which lies below it, so area 1201 is determined to be dense in the vertical direction. The handwritten area closest to handwritten area 1203 is area 1204, which lies to its right, so area 1203 is determined to be dense in the left-right direction. If the selected area is dense in the left-right direction, the determination is YES and the processing proceeds to S1103; otherwise the determination is NO and the processing proceeds to S1104.
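A minimal sketch of the S1102 decision follows. The text does not say how "closest" or "left/right versus above/below" is measured, so this sketch assumes center-to-center Euclidean distance and classifies the nearest neighbor as left/right when its horizontal center offset is at least as large as its vertical offset. The box format `(x, y, width, height)` and the name `is_dense_left_right` are illustrative.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); y grows downward


def _center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def is_dense_left_right(index: int, handwritten: List[Box]) -> bool:
    """S1102 sketch: True (YES -> S1103) if the nearest other handwritten area
    lies to the left or right of handwritten[index]; False (NO -> S1104) if it
    lies above or below."""
    cx, cy = _center(handwritten[index])
    others = [b for i, b in enumerate(handwritten) if i != index]
    if not others:
        # Assumption: an isolated area takes the NO branch (search to the left).
        return False
    nearest = min(others, key=lambda b: math.dist((cx, cy), _center(b)))
    nx, ny = _center(nearest)
    # Assumption: a horizontal offset at least as large as the vertical one
    # means the neighbor is to the left or right.
    return abs(nx - cx) >= abs(ny - cy)
```

For example, with invented coordinates `[(100, 100, 200, 40), (100, 160, 200, 40), (100, 400, 150, 40), (270, 400, 150, 40)]`, index 0 returns False (its nearest neighbor is directly below, like area 1201) and index 2 returns True (its nearest neighbor is to the right, like area 1203).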

In S1103, the CPU 261 acquires the print area closest to the handwritten area selected in S1101 in the upward direction. In the form 1200 of FIG. 12, handwritten area 1203 is dense in the left-right direction, and the print area closest above it is area 1210. Area 1210 is therefore acquired as the print area corresponding to handwritten area 1203.

In S1104, the CPU 261 acquires the print area closest to the handwritten area selected in S1101 in the leftward direction. In the form 1200 of FIG. 12, handwritten area 1201 is dense in the vertical direction, and the print area closest to its left is area 1207. Area 1207 is therefore acquired as the print area corresponding to handwritten area 1201.
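Steps S1103 and S1104 can be sketched as directional nearest-neighbor searches. The source does not define "upward" or "leftward" precisely; this sketch assumes a print area counts as above a handwritten area when its bottom edge is at or above the handwritten area's top edge and the two overlap horizontally, with the vertical gap between the facing edges as the distance (and symmetrically for the left). All names here are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); y grows downward


def _overlap(a0: int, a1: int, b0: int, b1: int) -> bool:
    """True if the half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a0 < b1 and b0 < a1


def nearest_above(target: Box, prints: List[Box]) -> Optional[Box]:
    x, y, w, h = target
    candidates = [p for p in prints
                  if p[1] + p[3] <= y and _overlap(x, x + w, p[0], p[0] + p[2])]
    # Smallest gap between the print area's bottom and the target's top.
    return min(candidates, key=lambda p: y - (p[1] + p[3]), default=None)


def nearest_left(target: Box, prints: List[Box]) -> Optional[Box]:
    x, y, w, h = target
    candidates = [p for p in prints
                  if p[0] + p[2] <= x and _overlap(y, y + h, p[1], p[1] + p[3])]
    # Smallest gap between the print area's right edge and the target's left.
    return min(candidates, key=lambda p: x - (p[0] + p[2]), default=None)


def find_item_area(target: Box, dense_left_right: bool,
                   prints: List[Box]) -> Optional[Box]:
    """S1103 (search upward) if dense left-right, else S1104 (search left)."""
    if dense_left_right:
        return nearest_above(target, prints)
    return nearest_left(target, prints)
```

The `dense_left_right` flag is meant to come from the S1102 decision; returning `None` leaves room for the fallback behavior the text describes as optional.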

Steps S958 to S967 are the same as the identically numbered steps in Example 1.

According to Example 2, the layout of the handwritten areas (that is, the entry fields of the form) is analyzed, and the search for the item corresponding to each handwritten value is narrowed to the direction in which that item is most likely to be. This makes it easier to associate each handwritten value with a more appropriate item.

Although not described in this embodiment, if no print area is found above the handwritten area in S1103, or to the left of it in S1104, the print area may be searched for again to the left or above, respectively. A threshold may also be set on the distance between a handwritten area and a print area, so that print areas at or beyond the threshold distance from the handwritten area are not associated with it.
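The two optional refinements above, retrying in the other direction and ignoring print areas that are too far away, could be layered on a directional search roughly as follows. This is a sketch under assumptions: boxes are `(x, y, width, height)`, "above" and "left" require overlap on the perpendicular axis, distance is the gap between facing edges, and `max_distance` is a hypothetical parameter for the threshold (print areas at or beyond it are not associated).

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); y grows downward


def nearest_in_direction(target: Box, prints: List[Box], direction: str,
                         max_distance: Optional[int] = None) -> Optional[Box]:
    """Closest print area strictly 'up' or 'left' of target, within the threshold."""
    x, y, w, h = target
    best, best_gap = None, None
    for px, py, pw, ph in prints:
        if direction == "up":
            ok = py + ph <= y and px < x + w and x < px + pw
            gap = y - (py + ph)
        elif direction == "left":
            ok = px + pw <= x and py < y + h and y < py + ph
            gap = x - (px + pw)
        else:
            raise ValueError(f"unknown direction: {direction}")
        if not ok:
            continue
        if max_distance is not None and gap >= max_distance:
            continue  # too far away: do not associate
        if best_gap is None or gap < best_gap:
            best, best_gap = (px, py, pw, ph), gap
    return best


def find_item_area_with_fallback(target: Box, dense_left_right: bool,
                                 prints: List[Box],
                                 max_distance: Optional[int] = None) -> Optional[Box]:
    """Search the preferred direction first, then retry in the other one."""
    first, second = ("up", "left") if dense_left_right else ("left", "up")
    found = nearest_in_direction(target, prints, first, max_distance)
    if found is None:
        found = nearest_in_direction(target, prints, second, max_distance)
    return found
```

Returning `None` when both searches fail lets the caller leave the handwritten value without an associated item instead of pairing it with a distant, probably unrelated label.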

(Other Examples)
The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. For example, in the embodiments, the learning data generation unit 112 and the learning unit 113 are implemented in the learning device 102, but they may be implemented in separate devices. In this case, the device implementing the learning data generation unit 112 transmits the learning data generated by the learning data generation unit 112 to the device implementing the learning unit 113, and the learning unit 113 trains the neural network based on the received learning data. Although the image processing device 101 and the image processing server 103 have been described as separate devices, the image processing device 101 may have the functions of the image processing server 103. Likewise, although the image processing server 103 and the OCR server 104 have been described as separate devices, the image processing server 103 may have the functions of the OCR server 104.

The present invention is not limited to the above embodiments. Various modifications (including organic combinations of the embodiments) are possible based on the gist of the present invention, and they are not excluded from its scope. That is, all configurations that combine the above embodiments and their modifications are also included in the present invention.

In the embodiment, print areas are extracted in S956 by a method based on pixel connectivity, but they may instead be estimated with a neural network, as in handwritten area estimation. In the same way that the correct answer images for handwritten area estimation are created, the user selects print areas, correct answer data is created from that selection, and a new neural network for estimating print OCR areas is constructed and trained with reference to that correct answer data.
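For comparison with the neural-network alternative, the connectivity-based idea can be illustrated with generic 8-connected component labeling that returns one bounding box per group of touching foreground pixels. This is a textbook technique shown under assumptions, not the exact S956 procedure: the input is assumed to be a binarized image (1 = ink) from which handwritten pixels have already been removed, and any merging of components into words or text lines is omitted.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def connected_component_boxes(binary: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected foreground (value 1) components, in scan order."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for sy in range(height):
        for sx in range(width):
            if binary[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            min_x = max_x = sx
            min_y = max_y = sy
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes
```

A learned estimator replaces this hand-written rule with a model trained on user-selected print areas, which can cope with layouts where touching strokes or ruled lines make pure connectivity unreliable.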

In the embodiment, learning data is generated by the learning data generation processing at the time of the learning processing. Alternatively, a large amount of learning data may be generated in advance by the learning data generation processing, and mini-batches may be sampled from it as needed during the learning processing.
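The alternative of pre-generating a pool of learning data and sampling mini-batches from it during training can be sketched as below. The pool contents (for example, pairs of input image and correct answer image), the batch size, and the function name `minibatches` are assumptions for illustration; each batch is drawn without replacement from the pool, independently of the other batches.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def minibatches(pool: Sequence[T], batch_size: int, num_steps: int,
                seed: int = 0) -> Iterator[List[T]]:
    """Yield num_steps mini-batches sampled from a pre-generated pool."""
    rng = random.Random(seed)  # seeded for reproducible sampling
    for _ in range(num_steps):
        # Distinct samples within a batch; batches may share samples.
        yield rng.sample(pool, batch_size)
```

In practice the pool would hold whatever the learning data generation processing produced in advance, and each yielded batch would feed one training step.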

In the embodiment, the input image is generated as a grayscale image, but it may be generated in another format, such as a full-color image.

The abbreviations used in the embodiments are defined as follows. MFP stands for Multi Function Peripheral. ASIC stands for Application Specific Integrated Circuit. CPU stands for Central Processing Unit. RAM stands for Random-Access Memory. ROM stands for Read Only Memory. HDD stands for Hard Disk Drive. SSD stands for Solid State Drive. LAN stands for Local Area Network. PDL stands for Page Description Language. OS stands for Operating System. PC stands for Personal Computer. OCR stands for Optical Character Recognition/Reader. CCD stands for Charge-Coupled Device. LCD stands for Liquid Crystal Display. ADF stands for Auto Document Feeder. CRT stands for Cathode Ray Tube. GPU stands for Graphics Processing Unit. ID stands for Identification.

Claims

An image processing system comprising:
a handwriting extraction means for extracting pixels of handwritten characters from an image to be processed, obtained by scanning an original, using a trained model for extracting pixels of handwritten characters from an image;
a handwritten area estimation means for estimating a handwritten area from the image to be processed, using a trained model for estimating, from an image, a handwritten area in which handwritten characters are written;
a print area extraction means for extracting a print area including printed characters from the image to be processed; and
a storage means for identifying the print area having a predetermined positional relationship with the handwritten area and storing the character recognition result of the handwritten area and the character recognition result of the identified print area in association with each other.

The image processing system according to claim 1, wherein the character recognition result of the handwritten area is obtained by performing handwritten character recognition processing based on the pixels of the handwritten characters included in the estimated handwritten area.

The image processing system according to claim 1 or 2, wherein the print area extraction means extracts the print area from an image generated by removing the pixels of the handwritten characters from the image to be processed.

The image processing system according to any one of claims 1 to 3, wherein the character recognition result of the print area is obtained by performing printed character recognition processing on the printed characters included in the extracted print area.

The image processing system according to any one of claims 1 to 4, wherein the print area having the predetermined positional relationship with the handwritten area is the print area closest to the handwritten area in the leftward or upward direction.

The image processing system according to any one of claims 1 to 4, wherein the print area having the predetermined positional relationship with the handwritten area is the print area closest to the handwritten area in the leftward direction when the handwritten area is densely arranged with another handwritten area in the vertical direction, and is the print area closest to the handwritten area in the upward direction when the handwritten area is densely arranged with another handwritten area in the left-right direction.

The image processing system according to any one of claims 1 to 6, wherein the trained model for extracting pixels of handwritten characters from an image is created by learning based on correct answer data created from the pixels of handwritten character portions specified by a user on a sample image.

The image processing system according to any one of claims 1 to 7, wherein the trained model for estimating, from an image, the handwritten area in which handwritten characters are written is created by learning based on correct answer data created from the positions of entry fields specified by a user on a sample image.

The image processing system according to any one of claims 1 to 8, wherein the trained model for extracting pixels of handwritten characters from an image and the trained model for estimating, from an image, the handwritten area in which handwritten characters are written are trained models constructed by deep learning.

A program for causing a computer to function as each means of the image processing system according to any one of claims 1 to 8.

An image processing method comprising:
a handwriting extraction step of extracting pixels of handwritten characters from an image to be processed, obtained by scanning an original, using a trained model for extracting pixels of handwritten characters from an image;
a handwritten area estimation step of estimating a handwritten area from the image to be processed, using a trained model for estimating, from an image, a handwritten area in which handwritten characters are written;
a print area extraction step of extracting a print area including printed characters from the image to be processed; and
a storage step of identifying the print area having a predetermined positional relationship with the handwritten area and storing the character recognition result of the handwritten area and the character recognition result of the identified print area in association with each other.