JP2011228792A - Image processing device

Info

Publication number: JP2011228792A
Application number: JP2010094200A
Authority: JP
Inventors: Dai Takada
Original assignee: Murata Machinery Ltd
Current assignee: Murata Machinery Ltd
Priority date: 2010-04-15
Filing date: 2010-04-15
Publication date: 2011-11-10

Abstract

PROBLEM TO BE SOLVED: To provide an image processing device capable of determining the character which a user desires to specify, even when the user has marked it roughly. SOLUTION: A network composite machine mounted with an image processing device comprises: a scanner for obtaining image data from a document; a marked area extracting portion 31 for extracting a marked area; a character described area extracting portion 331 for extracting, for each character contained in a character string, a character described area in which the relevant character is described; an overlapped character described area extracting portion 332 for extracting an overlapped character described area in which the character described area and the marked area overlap each other; a character calculating portion 333 for calculating the lengths, in the character string direction, of the overlapped character described area and of the character described area corresponding to it; and a character determining portion 334 for determining whether or not a character to be determined is the character specified by a marking on the document, based on the length of the overlapped character described area in the character string direction relative to the length of the character described area in the character string direction.

Description

The present invention relates to an image processing apparatus.

Patent Document 1 below discloses an image processing apparatus that scans a document on which characters have been marked with a marker pen to generate image data, and applies image processing such as deletion, color change, or emphasis to the marked characters. To identify the region to be processed, this apparatus first obtains the marking portions filled in with the marker pen and the rectangular regions in which characters are written. It then identifies each rectangular region that contains a marking portion and treats that region as the target of image processing.

Patent Document 1: JP 2001-230919 A

When a user marks characters on a document with a marker pen or the like, the marking may spill onto a character adjacent to the one the user wants to designate. In that case, with the image processing apparatus described in Patent Document 1, the marking portion lies inside the rectangular region of the adjacent character, so a character the user did not want to designate is also identified as a target of image processing.

Consequently, when using that apparatus, the user must mark precisely so that only the intended characters are marked and no other characters are touched. In general, however, a thick pen is used for marking so that the region containing a character is easy to fill in, and keeping the marking off the surrounding characters therefore demands considerable care.

The present invention has been made to solve the above problem, and its object is to provide an image processing apparatus that can identify the characters the user wants to designate even when the user has marked them roughly.

An image processing apparatus according to the present invention comprises: acquisition means for acquiring image data of a document; marking extraction means for extracting, from the image data acquired by the acquisition means, a marking region colored on the document; character description region extraction means for extracting from the image data, for each character included in a plurality of characters written side by side on the document, a character description region in which that character is written; overlapping character region extraction means for extracting an overlapping character region in which the character description region extracted by the character description region extraction means and the marking region extracted by the marking extraction means overlap; character calculation means for calculating, for the overlapping character region extracted by the overlapping character region extraction means and for the character description region corresponding to that overlapping character region, the respective lengths in the direction in which the plurality of characters are arranged; and character determination means for determining whether the character is a character designated by marking on the document, based on the length of the overlapping character region relative to the length of the character description region calculated by the character calculation means.

According to the image processing apparatus of the present invention, the marking region colored on the document is extracted from the image data. For each character included in the plurality of characters written on the document, the character description region in which that character is written is also extracted from the image data. An overlapping character region, in which the character description region and the marking region overlap, is then extracted. Next, the length in the direction in which the plurality of characters are arranged is calculated for both the overlapping character region and the character description region corresponding to it. Whether the character is a character designated by marking is then determined based on the length of the overlapping character region relative to the length of the character description region. As a result, even when the user marks characters roughly, for example when the marking spills onto a character next to the intended one or when the intended character is not completely filled in, whether a character is one the user wants to designate can be determined from the degree to which it is marked. The characters the user wants to designate can therefore be identified even when the user has marked them roughly.

Preferably, the image processing apparatus according to the present invention further comprises: column description region extraction means for extracting from the image data, for each character string included in a plurality of character strings arranged on the document, a column description region in which that character string is written; overlapping column region extraction means for extracting an overlapping column region in which the column description region extracted by the column description region extraction means and the marking region extracted by the marking extraction means overlap; column calculation means for calculating, for the overlapping column region extracted by the overlapping column region extraction means and for the column description region corresponding to that overlapping column region, the respective lengths in the orthogonal direction, that is, the direction orthogonal to the direction in which the characters of the character string are arranged; and column determination means for determining whether the character string includes a character designated by marking on the document, based on the orthogonal-direction length of the overlapping column region relative to the orthogonal-direction length of the column description region calculated by the column calculation means.

With this preferred configuration, for each character string included in the plurality of character strings written on the document, the column description region in which that character string is written is extracted from the image data. An overlapping column region, in which the column description region and the marking region overlap, is then extracted. Next, the length in the orthogonal direction, orthogonal to the direction in which the characters of the character string are arranged, is calculated for both the overlapping column region and the column description region corresponding to it. Whether the character string includes a character designated by marking is then determined based on the orthogonal-direction length of the overlapping column region relative to the orthogonal-direction length of the column description region. As a result, even when the user marks roughly, for example when the marking spills onto the character string next to the intended one or when the intended character string is not completely filled in, whether a character string includes a character the user wants to designate can be determined from the degree to which it is marked. A character string that includes the characters the user wants to designate can therefore be identified even when the user has marked roughly.

In the image processing apparatus according to the present invention, the character determination means preferably determines, for the plurality of characters included in a character string that the column determination means has determined to include a character marked on the document, whether each character is a character designated by marking on the document.

With this preferred configuration, the characters to be examined are first narrowed down in units of character strings, and it is then determined whether each character in the extracted character strings is a character designated by marking. Designated characters can therefore be identified more efficiently than when every character is examined one by one.

Preferably, the image processing apparatus according to the present invention further comprises determination means for determining whether the overlapping column regions extracted by the overlapping column region extraction means are single or plural, and, when the determination means determines that there is a single overlapping column region, the character determination means determines, for the plurality of characters included in the character string corresponding to that overlapping column region, whether each character is a character designated by marking on the document.

With this preferred configuration, it is determined whether the overlapping column regions are single or plural, so it can be determined whether the marking extends over a plurality of character strings. When there is a single overlapping column region, it is determined, for the plurality of characters included in the character string corresponding to that overlapping column region, whether each character is a character designated by marking on the document. Thus, when only one character string is marked, the characters in that character string can be made the characters to be examined regardless of the length of the overlapping column region.

According to the present invention, the characters the user wants to designate can be identified even when the user has marked them roughly.

FIG. 1 is a block diagram showing the configuration of a network multifunction peripheral equipped with the image processing apparatus according to the embodiment.

FIG. 2 is a block diagram showing the configuration of the control unit of the network multifunction peripheral.

FIG. 3 is a diagram for explaining the column extraction process that the column extraction unit of the control unit performs on an example of marked character strings.

FIG. 4 is a diagram for explaining the character extraction process that the character extraction unit of the control unit performs on an example of marked characters.

FIG. 5 is a diagram for explaining the character extraction process that the character extraction unit of the control unit performs on another example of marked characters.

FIG. 6 is a flowchart (first half) showing the procedure of the designated character extraction process performed by the network multifunction peripheral.

FIG. 7 is a flowchart (second half) showing the procedure of the designated character extraction process performed by the network multifunction peripheral.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. First, the configuration of a network multifunction peripheral 1 equipped with the image processing apparatus according to the embodiment is described with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing the configuration of the network multifunction peripheral 1. FIG. 2 is a block diagram showing the configuration of the control unit 30 of the network multifunction peripheral 1.

The network multifunction peripheral 1 is configured to exchange information with devices such as a personal computer 5 connected to a LAN (Local Area Network) 90, and provides combined functions including a print function, a scan function, and a FAX function. The network multifunction peripheral 1 also has a function of scanning a document on which the user has marked characters, generating image data, and identifying the designated characters specified by the marking. The identified designated characters are then subjected to, for example, OCR (character recognition) processing or editing such as deletion, color conversion, or emphasis.

To provide these functions, the network multifunction peripheral 1 includes an operation panel 10, a display 11, a scanner 12, a printer 13, an NCU (Network Control Unit) 14, a modem 15, a LAN I/F (LAN interface) 19, and a control unit 30. The NCU 14 controls the connection between the modem 15 and the public switched telephone network (PSTN) 91 and is responsible for FAX transmission and reception.

The scanner 12 reads a document and acquires image data. The scanner 12 functions as the acquisition means recited in the claims. A plurality of character strings are arranged on the document, and some of the characters are marked. Here, marking means coloring a region of the document in a color different from the document's background (ground) color. For example, the user can mark characters by drawing a line over them with a color marker or the like, or by filling in the region in which the characters are written. By marking characters in this way, the user designates the characters to be edited.

The scanner 12 reads the document in color and generates image data containing data for each of the RGB colors. For this purpose, the scanner 12 has, for example, a single-plate CCD fitted with red (R), green (G), and blue (B) color filters and acquires data for each of the RGB colors. Alternatively, a 3CCD system may be used, in which light entering the device is split into the three RGB primary colors by reflection at dichroic films in a spectral prism. The image data generated by the scanner 12 is output to the control unit 30.

The control unit 30 is composed of a CPU 16, a ROM 17, a RAM 18, and the like, and identifies, from the input image data, the designated characters specified by marking. As shown in FIG. 2, the control unit 30 includes, as functional components, a marking extraction unit 31, a column extraction unit 32, and a character extraction unit 33.

The marking extraction unit 31 extracts the marking region from the image data input from the scanner 12. The marking region is a colored region on the document. For example, the marking extraction unit 31 converts the image data from the RGB color space to the HLS color space and, based on the position of each pixel of the image data in the HLS color space, identifies whether that pixel belongs to the marking region. Here, the HLS color space is a color space consisting of three components: hue, lightness (luminance), and saturation.
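The per-pixel test described above can be sketched as follows. This is only an illustration: the source does not state which part of the HLS space counts as marking, so the thresholds `MIN_SATURATION`, `MIN_LIGHTNESS`, and `MAX_LIGHTNESS` and the function name `is_marking_pixel` are assumptions. The conversion itself uses Python's standard `colorsys.rgb_to_hls`.

```python
import colorsys

# Hypothetical thresholds; the source gives no concrete values.
MIN_SATURATION = 0.35   # marker ink is assumed to be clearly saturated
MIN_LIGHTNESS = 0.30    # excludes dark text
MAX_LIGHTNESS = 0.90    # excludes near-white paper


def is_marking_pixel(r: int, g: int, b: int) -> bool:
    """Return True if an 8-bit RGB pixel falls in the assumed marking range."""
    # colorsys works on floats in [0, 1] and returns (hue, lightness, saturation).
    _hue, lightness, saturation = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return (saturation >= MIN_SATURATION
            and MIN_LIGHTNESS <= lightness <= MAX_LIGHTNESS)
```

With these assumed thresholds, a yellow highlighter pixel such as (255, 255, 0) is classified as marking, while white paper and black text are not. A real device would presumably tune the range to its scanner and to the ground color of the document.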

The marking extraction unit 31 treats a connected cluster of colored pixels as a marking region and specifies the marking region by position coordinate data or the like. For example, if the marking region is rectangular, it can be specified by the position coordinates of its four corners. The marking extraction unit 31 outputs the position coordinate data specifying the marking region to the column extraction unit 32 and the character extraction unit 33. The marking extraction unit 31 functions as the marking extraction means recited in the claims.
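As a rough illustration of specifying a marking region by corner coordinates, the sketch below computes the bounding rectangle of all marked pixels in a boolean mask. It assumes the mask contains a single cluster of marked pixels; the `Rect` layout and the name `bounding_rect` are hypothetical and not taken from the source.

```python
from typing import List, Optional, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Rect = Tuple[int, int, int, int]


def bounding_rect(mask: List[List[bool]]) -> Optional[Rect]:
    """Return the rectangle enclosing every True pixel, or None if there is none."""
    xs = [x for row in mask for x, marked in enumerate(row) if marked]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

The four corners of the region follow directly from the returned tuple. Handling several separate markings on one page would require labeling connected clusters first, which this sketch does not attempt.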

The column extraction unit 32 extracts, from the plurality of character strings written on the document, the character strings that include designated characters. To perform this processing, the column extraction unit 32 has a column description region extraction unit 321, an overlapping column region extraction unit 322, a column calculation unit 323, a column determination unit 324, and a determination unit 325.

The column description region extraction unit 321 extracts from the image data, for each character string, the column description region in which that character string is written. A column description region is a rectangular region enclosing the area in which the character string is written. The column description region extraction unit 321 can extract a column description region by identifying the position coordinates of its four corners.

For example, OCR processing generally produces position coordinate data specifying the four corners of the rectangular region in which each character string is written. The column description region extraction unit 321 uses the position coordinate data generated by the OCR processing to identify the position coordinates of the four corners of the rectangular region in which a character string is written. The column description region extraction unit 321 functions as the column description region extraction means recited in the claims.

The overlapping column region extraction unit 322 extracts the overlapping column region in which the marking region and a column description region overlap. The overlapping column region extraction unit 322 functions as the overlapping column region extraction means recited in the claims. The column calculation unit 323 calculates the lengths of the overlapping column region and of the column description region in the orthogonal direction, that is, the direction orthogonal to the character string direction. The character string direction is the direction in which the characters of a character string are arranged; it is taken to be, for example, the longitudinal direction of the column description region. The column calculation unit 323 functions as the column calculation means recited in the claims.

The column determination unit 324 determines that the character string under examination includes a designated character when the orthogonal-direction length of the overlapping column region is equal to or greater than a predetermined proportion of the orthogonal-direction length of the column description region. When the length of the overlapping column region is less than the predetermined proportion of the length of the column description region, it determines that the character string does not include a designated character. The predetermined proportion can be set to, for example, one half. The column determination unit 324 functions as the column determination means recited in the claims.
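The overlap extraction and the proportion test can be sketched together as below. The sketch assumes horizontally written text, so the orthogonal direction is vertical and the relevant length is a height, and it assumes both regions are axis-aligned rectangles. The names `intersect` and `line_contains_designated` are hypothetical; the default proportion of one half is the example value given in the text.

```python
from typing import Optional, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Rect = Tuple[float, float, float, float]


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping rectangle of a and b, or None if they do not overlap."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    if left >= right or top >= bottom:
        return None
    return (left, top, right, bottom)


def line_contains_designated(line_rect: Rect, marking_rect: Rect,
                             ratio: float = 0.5) -> bool:
    """Apply the proportion test in the orthogonal (here vertical) direction."""
    overlap = intersect(line_rect, marking_rect)
    if overlap is None:
        return False  # no overlapping column region at all
    overlap_height = overlap[3] - overlap[1]
    line_height = line_rect[3] - line_rect[1]
    return overlap_height >= ratio * line_height
```

For a column description region 20 pixels high, a marking that covers only its bottom 6 pixels fails the test, while one that covers all 20 pixels passes.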

The determination unit 325 determines whether the overlapping column regions extracted by the overlapping column region extraction unit 322 are single or plural. This makes it possible to determine whether more than one line of character strings overlaps the marking region, that is, whether the marking extends over a plurality of character strings. The determination unit 325 functions as the determination means recited in the claims. When the determination unit 325 determines that there are a plurality of overlapping column regions, the column calculation unit 323 and the column determination unit 324 determine whether each character string includes a designated character.

The column extraction method by which the column extraction unit 32 extracts character strings is described with a concrete example, referring to FIG. 3. As shown in FIG. 3, the document contains, as an example, a plurality of (three) character strings 41 to 43 arranged side by side in an orthogonal direction 45 that is orthogonal to the character string direction 44 of each string. In FIG. 3, the gray rectangular region is the marking region 46. In this example, the marking region 46 extends over the character strings 41 and 42. Although the marking region 46 is drawn in gray in FIG. 3, it may be colored with a pink or yellow highlighter or in any other color. The marking region 46 is extracted by the marking extraction unit 31 described above.

The column description region extraction unit 321 extracts the column description regions 411, 421, and 431 in which the upper character string 41, the middle character string 42, and the lower character string 43 are respectively written. The overlapping column region extraction unit 322 extracts the overlapping column region 412, in which the marking region 46 overlaps the column description region 411 of the upper character string 41. It also extracts the overlapping column region 422, in which the marking region 46 overlaps the column description region 421 of the middle character string 42.

In this case, the determination unit 325 determines that there are a plurality of overlapping column regions, namely 412 and 422. Because the overlapping column regions are plural, the upper character string 41, corresponding to the overlapping column region 412, and the middle character string 42, corresponding to the overlapping column region 422, become the character strings to be examined.

For the upper character string 41, the column calculation unit 323 calculates the length 413 of the overlapping column region 412 in the orthogonal direction 45 and the length 414 of the column description region 411 in the orthogonal direction 45. As the length, for example, the length in the orthogonal direction 45 may be calculated at a plurality of positions along the character string direction 44, and the average of the calculated lengths may be used. The column determination unit 324 then determines whether the character string 41 includes a designated character, based on the length 413 of the overlapping column region 412 relative to the calculated length 414 of the column description region 411.
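The averaging option mentioned above can be sketched as follows for a marking whose edge is ragged. The sketch assumes horizontally written text, a boolean pixel mask of the marking, and sample positions chosen by the caller inside the overlapping region; the source does not say how the positions are chosen, and the name `mean_overlap_length` is hypothetical.

```python
from typing import List, Sequence, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Rect = Tuple[int, int, int, int]


def mean_overlap_length(mask: List[List[bool]], region: Rect,
                        sample_xs: Sequence[int]) -> float:
    """Average, over the sampled x positions, the number of marked pixels
    that fall inside the region's vertical extent."""
    left, top, right, bottom = region
    lengths = []
    for x in sample_xs:
        if left <= x < right:  # ignore samples outside the region
            lengths.append(sum(1 for y in range(top, bottom) if mask[y][x]))
    return sum(lengths) / len(lengths) if lengths else 0.0
```

The resulting average would then play the role of length 413 in the comparison against length 414.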

In the example shown in FIG. 3, the length 413 of the overlapping column region 412 is less than half the length 414 of the column description region 411, so the character string 41 is determined not to include a designated character. By the same method, the column determination unit 324 determines that the middle character string 42 includes a designated character. The lower character string 43 has no overlapping column region and is not marked at all, so it is determined not to include a designated character. Thus, even when the user marks roughly and the marking spills onto the upper character string 41 although only the middle character string 42 was intended, only the character string 42 that the user meant to mark is extracted.

FIG. 3 illustrates the case in which a plurality of character strings overlap the marking region 46. When only one character string overlaps the marking region 46, the determination unit 325 determines that there is a single overlapping column region. In that case, the column determination unit 324 determines that the character string corresponding to the overlapping column region includes a designated character. The column extraction unit 32 extracts the character strings that the column determination unit 324 has determined to include a designated character and outputs them to the character extraction unit 33.
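The single-versus-plural branching can be sketched as below. The sketch assumes horizontally written text and axis-aligned rectangles; `overlap_height` and `select_lines` are hypothetical names, and the proportion of one half is the example value from the text.

```python
from typing import List, Sequence, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Rect = Tuple[float, float, float, float]


def overlap_height(line: Rect, marking: Rect) -> float:
    """Height of the overlap between a line rectangle and the marking, or 0."""
    width = min(line[2], marking[2]) - max(line[0], marking[0])
    height = min(line[3], marking[3]) - max(line[1], marking[1])
    return height if width > 0 and height > 0 else 0.0


def select_lines(lines: Sequence[Rect], marking: Rect,
                 ratio: float = 0.5) -> List[int]:
    """Return indices of the lines judged to contain designated characters."""
    overlaps = []
    for index, line in enumerate(lines):
        height = overlap_height(line, marking)
        if height > 0:
            overlaps.append((index, height))
    if len(overlaps) == 1:
        # Single overlapping region: accept it regardless of its length.
        return [overlaps[0][0]]
    # Plural (or none): apply the proportion test to each overlapping line.
    return [i for i, h in overlaps if h >= ratio * (lines[i][3] - lines[i][1])]
```

With three stacked lines and a marking that grazes the first and fully covers the second, only the second index is returned. A thin stroke touching just one line still selects that line, which matches the single-region case described above.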

The character extraction unit 33 shown in FIG. 2 extracts the designated characters from the plurality of characters included in the character strings extracted by the column extraction unit 32. To extract the designated characters, the character extraction unit 33 has a character description region extraction unit 331, an overlapping character region extraction unit 332, a character calculation unit 333, and a character determination unit 334.

文字記載領域抽出部３３１は、判定対象となる文字列に含まれる文字毎に文字が記載された文字記載領域を画像データから抽出する。例えば、ＯＣＲ処理を行う際に、各文字が記載された矩形状の領域の４隅を特定する位置座標データが生成される。文字記載領域抽出部３３１は、この４隅の位置座標データによって文字記載領域を特定し、抽出することができる。なお、文字記載領域抽出部３３１は、特許請求の範囲に記載の文字記載領域抽出手段として機能する。 The character description area extraction unit 331 extracts, from the image data, a character description area in which a character is described for each character included in the character string to be determined. For example, when performing OCR processing, position coordinate data that specifies four corners of a rectangular area in which each character is described is generated. The character description area extraction unit 331 can specify and extract a character description area based on the position coordinate data of the four corners. The character description area extraction unit 331 functions as a character description area extraction unit described in the claims.

重複文字領域抽出部３３２は、マーキング領域と文字記載領域とが重なった重複文字領域を抽出する。この、重複文字領域抽出部３３２は、特許請求の範囲に記載の重複文字領域抽出手段として機能する。文字算出部３３３は、重複文字領域と文字記載領域との、文字列方向４４の長さをそれぞれ算出する。なお、文字算出部３３３は、特許請求の範囲に記載の文字算出手段として機能する。 The duplicate character region extraction unit 332 extracts a duplicate character region where the marking region and the character description region overlap. The overlapping character area extracting unit 332 functions as an overlapping character area extracting unit described in the claims. The character calculation unit 333 calculates the lengths of the overlapping character area and the character description area in the character string direction 44, respectively. The character calculation unit 333 functions as character calculation means described in the claims.

文字判定部３３４は、重複文字領域の文字列方向４４の長さが、文字記載領域の文字列方向４４の長さに対する所定の割合以上である場合に、判定対象の文字が指定文字であると判定する。そして、重複文字領域の上記長さが、文字記載領域の上記長さに対する所定の割合より小さい場合に、判定対象の文字が指定文字でないと判定する。所定の割合は、例えば、２分の１に設定することができる。なお、文字抽出部３３は、文字判定部３３４によって指定文字であると判定された文字を抽出する。 The character determination unit 334 determines that the character under determination is a designated character when the length of the overlapping character area in the character string direction 44 is equal to or greater than a predetermined ratio of the length of the character description area in the character string direction 44. When the length of the overlapping character area is smaller than the predetermined ratio of the length of the character description area, it determines that the character under determination is not a designated character. The predetermined ratio can be set to 1/2, for example. The character extraction unit 33 extracts the characters that the character determination unit 334 has determined to be designated characters.
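The determination rule above reduces to a single comparison. The following sketch assumes the two lengths have already been measured; the function name and the `ratio` parameter are hypothetical, with the default of 1/2 taken from the example in the text.

```python
def is_designated(overlap_len, char_len, ratio=0.5):
    """True when the overlap covers at least `ratio` of the character length."""
    if char_len <= 0 or overlap_len <= 0:
        return False
    return overlap_len >= ratio * char_len
```

For instance, an overlap of 4 against a character length of 10 is rejected at the default ratio but accepted when the ratio is lowered to 1/3.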

図４を参照して、文字抽出部３３が文字を抽出する文字抽出処理について、具体例を用いて説明する。まず、図４に示す文字列「列文字列文」に含まれる複数の文字のうち、右から１番目の文字「文」５１が判定対象である場合について説明する。 With reference to FIG. 4, the character extraction process in which the character extraction unit 33 extracts characters will be described using a specific example. First, the case will be described in which, among the characters included in the character string “列文字列文” shown in FIG. 4, the first character from the right, “文” 51, is the character under determination.

文字記載領域抽出部３３１は、文字「文」５１が記載された文字記載領域５１１を抽出する。重複文字領域抽出部３３２は、マーキング抽出部３１によって抽出されたマーキング領域４６と、文字記載領域抽出部３３１によって抽出された文字記載領域５１１との重複文字領域５１２を抽出する。重複文字領域５１２は、位置座標データによって特定することができる。 The character description area extraction unit 331 extracts the character description area 511 in which the character “文” 51 is written. The overlapping character area extraction unit 332 extracts the overlapping character area 512 between the marking area 46 extracted by the marking extraction unit 31 and the character description area 511 extracted by the character description area extraction unit 331. The overlapping character area 512 can be specified by position coordinate data.
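When both areas are approximated by axis-aligned rectangles given by corner coordinates, the overlapping character area can be obtained as a rectangle intersection. The sketch below makes that assumption; the function name and the `(left, top, right, bottom)` tuple layout are hypothetical, and a real marking area need not be rectangular.

```python
def intersect(char_box, marking):
    """Return the overlapping rectangle of two boxes, or None if they are disjoint.

    Boxes are (left, top, right, bottom) tuples in image coordinates.
    """
    left = max(char_box[0], marking[0])
    top = max(char_box[1], marking[1])
    right = min(char_box[2], marking[2])
    bottom = min(char_box[3], marking[3])
    if left >= right or top >= bottom:
        return None
    return (left, top, right, bottom)
```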

文字算出部３３３は、重複文字領域５１２の文字列方向４４の長さ５１３と、文字記載領域５１１の文字列方向４４の長さ５１４とを算出する。長さは、例えば、直交方向４５に沿った複数位置について文字列方向４４の長さをそれぞれ算出し、算出した長さの平均値を用いることができる。そして、文字判定部３３４は、算出された重複文字領域５１２と文字記載領域５１１との長さに基づいて、文字「文」５１が指定文字か否かを判定する。 The character calculation unit 333 calculates the length 513 of the overlapping character area 512 in the character string direction 44 and the length 514 of the character description area 511 in the character string direction 44. The length can be obtained, for example, by measuring the length in the character string direction 44 at each of a plurality of positions along the orthogonal direction 45 and taking the average of the measured lengths. The character determination unit 334 then determines whether the character “文” 51 is a designated character based on the calculated lengths of the overlapping character area 512 and the character description area 511.
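The averaging of lengths measured at several positions along the orthogonal direction might look like the sketch below. It assumes the overlapping character area is available as a small binary mask cropped to the character box, one row per position along the orthogonal direction 45. Skipping rows that contain no marked pixels is an assumption of this example (such rows are treated as lying outside the overlapping area), not something the text specifies.

```python
def mean_length(mask_rows):
    """Average marked length per row, measured along the character string direction.

    mask_rows: list of rows, each a list of 0/1 flags (1 = marked pixel).
    """
    lengths = [sum(row) for row in mask_rows if any(row)]
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)
```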

図４に示す例では、重複文字領域５１２の長さ５１３が、文字記載領域５１１の長さ５１４の半分以上でないので、文字５１「文」は、指定文字でないと判定される。上記方法によれば、右から２番目の文字「列」５２及び右から３番目の文字「字」５３は、重複文字領域と文字記載領域とが一致し、重複文字領域の文字列方向４４の長さが、文字記載領域の文字列方向４４の長さの半分以上である。従って、文字「列」５２及び文字「字」５３は、指定文字であると判定される。 In the example shown in FIG. 4, the length 513 of the overlapping character area 512 is less than half the length 514 of the character description area 511, so the character “文” 51 is determined not to be a designated character. By the same method, for the second character from the right, “列” 52, and the third character from the right, “字” 53, the overlapping character area coincides with the character description area, so the length of the overlapping character area in the character string direction 44 is at least half the length of the character description area in the character string direction 44. The character “列” 52 and the character “字” 53 are therefore determined to be designated characters.

また、右から４番目の文字「文」５４は、一部しかマーキングされていないが、重複文字領域５４２の文字列方向４４の長さ５４３が、文字記載領域５４１の文字列方向４４の長さ５４４の半分以上である。従って、文字「文」５４は、指定文字であると判定される。右から５番目の文字「列」５５は、全くマーキングされていないので、重複文字領域がなく、指定文字でないと判定される。この場合、文字抽出部３３は、文字「文」５４、文字「字」５３、文字「列」５２のみを指定文字であるとして抽出する。 The fourth character from the right, “文” 54, is only partially marked, but the length 543 of the overlapping character area 542 in the character string direction 44 is at least half the length 544 of the character description area 541 in the character string direction 44. The character “文” 54 is therefore determined to be a designated character. The fifth character from the right, “列” 55, is not marked at all, so it has no overlapping character area and is determined not to be a designated character. In this case, the character extraction unit 33 extracts only the character “文” 54, the character “字” 53, and the character “列” 52 as designated characters.

図４に示すように、ユーザがラフにマーキングし、マーキングしたい文字「文」５４、文字「字」５３、及び文字「列」５２の右側の文字「文」５１にまでマーキングが掛かり、左側の文字「文」５４は、文字の左側の部分がマーキングされていない。この場合でも、左側の文字「文」５４は指定文字に含まれ、右側の文字「文」５１は指定文字に含まれないと判別し、ユーザのマーキングしたい文字「文」５４、文字「字」５３、文字「列」５２のみを正確に抽出することができる。 As shown in FIG. 4, the user has marked roughly: the marking spills onto the character “文” 51 to the right of the characters the user wanted to mark (the character “文” 54, the character “字” 53, and the character “列” 52), while the left part of the leftmost character “文” 54 is left unmarked. Even in this case, the left character “文” 54 is determined to be a designated character and the right character “文” 51 is determined not to be, so that only the character “文” 54, the character “字” 53, and the character “列” 52 that the user wanted to mark are extracted accurately.
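The FIG. 4 example can be reproduced numerically with made-up coordinates. In the sketch below, each character is given a hypothetical width of 10 units and the marking is assumed to span x = 14 to x = 43; none of these values come from the figure, and the function name is likewise an assumption.

```python
def designated_chars(chars, mark_left, mark_right, ratio=0.5):
    """Return the glyphs whose overlap with the marking reaches `ratio`.

    chars: list of (glyph, left, right) along the character string direction.
    """
    picked = []
    for glyph, left, right in chars:
        overlap = max(0, min(right, mark_right) - max(left, mark_left))
        if overlap > 0 and overlap >= ratio * (right - left):
            picked.append(glyph)
    return "".join(picked)


# Left-to-right layout of the string; reference numerals 55, 54, 53, 52, 51.
chars = [("列", 0, 10), ("文", 10, 20), ("字", 20, 30), ("列", 30, 40), ("文", 40, 50)]
print(designated_chars(chars, 14, 43))  # prints 文字列
```

The leftmost marked character overlaps by 6 of 10 units and is kept, while the rightmost overlaps by only 3 of 10 units and is dropped, matching the outcome described for characters 54 and 51.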

引き続いて図５を参照して、文字抽出部３３が文字を抽出する文字抽出処理について、別の具体例を用いて説明する。図５に示す例は、図４に示す例よりマーキング領域４６の幅が狭くライン状である。 Next, with reference to FIG. 5, the character extraction process in which the character extraction unit 33 extracts characters will be described using another specific example. In the example shown in FIG. 5, the marking area 46 is narrower than in the example shown in FIG. 4 and is line-shaped.

右から１番目の文字「文」６１は、重複文字領域６１２の文字列方向４４の長さ６１３が、文字記載領域６１１の文字列方向４４の長さ６１４の半分より小さい。従って、文字「文」６１は、指定文字でないと判定される。右から２番目の文字「列」６２及び右から３番目の文字「字」６３は、重複文字領域の文字列方向４４の長さが、文字記載領域の文字列方向４４の長さの半分以上である。従って、文字「列」６２及び文字「字」６３は、指定文字であると判定される。 For the first character from the right, “文” 61, the length 613 of the overlapping character area 612 in the character string direction 44 is less than half the length 614 of the character description area 611 in the character string direction 44. The character “文” 61 is therefore determined not to be a designated character. For the second character from the right, “列” 62, and the third character from the right, “字” 63, the length of the overlapping character area in the character string direction 44 is at least half the length of the character description area in the character string direction 44. The character “列” 62 and the character “字” 63 are therefore determined to be designated characters.

また、右から４番目の文字「文」６４は、一部しかマーキングされていないが、重複文字領域６４２の文字列方向４４の長さ６４３が、文字記載領域６４１の文字列方向４４の長さ６４４の半分以上である。従って、文字「文」６４は、指定文字であると判定される。右から５番目の文字「列」６５は、全くマーキングされていないので、重複文字領域がなく、指定文字でないと判定される。この場合、文字抽出部３３は、文字「文」６４、文字「字」６３、文字「列」６２のみを指定文字であるとして抽出する。 The fourth character from the right, “文” 64, is only partially marked, but the length 643 of the overlapping character area 642 in the character string direction 44 is at least half the length 644 of the character description area 641 in the character string direction 44. The character “文” 64 is therefore determined to be a designated character. The fifth character from the right, “列” 65, is not marked at all, so it has no overlapping character area and is determined not to be a designated character. In this case, the character extraction unit 33 extracts only the character “文” 64, the character “字” 63, and the character “列” 62 as designated characters.

この場合もユーザが、ラフにマーキングし、マーキングしたい文字列の右側の文字「文」６１にマーキングが掛かり、左側の文字「文」６４は、文字の左側の部分がマーキングされていない。この場合でも、左側の文字「文」６４は指定文字に含まれ、右側の文字「文」６１は指定文字に含まれないことを判別し、ユーザのマーキングしたい文字「文」６４、文字「字」６３、文字「列」６２のみを正確に抽出することができる。 In this case too, the user has marked roughly: the marking spills onto the character “文” 61 to the right of the characters to be marked, and the left part of the left character “文” 64 is left unmarked. Even so, the left character “文” 64 is determined to be a designated character and the right character “文” 61 is determined not to be, so that only the character “文” 64, the character “字” 63, and the character “列” 62 that the user wanted to mark are extracted accurately.

図５の例では、ライン状のマーキング領域４６が、文字列の直交方向４５における中央部に位置しているが、文字列の下側又は上側に位置している場合であっても、同様の処理を適用することができる。この場合も、ユーザのマーキングしたい文字「文」６４、文字「字」６３、文字「列」６２のみを正確に抽出することができる。 In the example of FIG. 5, the line-shaped marking area 46 is located at the center of the character string in the orthogonal direction 45, but the same processing can be applied when it is located at the lower or upper side of the character string. In that case as well, only the character “文” 64, the character “字” 63, and the character “列” 62 that the user wanted to mark can be extracted accurately.

引き続いて、図６及び図７を参照してネットワーク複合機１の動作について説明する。図６と図７は、ネットワーク複合機１が行う指定文字の抽出処理の処理手順を示すフローチャートの前半部分と後半部分である。 Next, the operation of the network MFP 1 will be described with reference to FIGS. 6 and 7. FIGS. 6 and 7 are the first half and the second half, respectively, of a flowchart showing the procedure of the designated character extraction process performed by the network MFP 1.

まず、ユーザが、マーキングされた原稿を読み取るように、操作パネル１０を用いて操作を行うと、ステップＳ１０１では、ネットワーク複合機１にセットされた原稿がスキャナ１２によって読み取られ、原稿の画像データが取得される。ステップＳ１０２では、マーキング領域４６が画像データから抽出される。ステップＳ１０３では、画像データに対してＯＣＲ処理を行うことにより、文字列が記載された列記載領域が抽出される。 First, when the user operates the operation panel 10 to have a marked document read, in step S101 the document set on the network MFP 1 is read by the scanner 12 and image data of the document is acquired. In step S102, the marking area 46 is extracted from the image data. In step S103, OCR processing is performed on the image data to extract the column description areas in which character strings are written.

続いて、ステップＳ１０４では、ステップＳ１０２において抽出されたマーキング領域４６と、ステップＳ１０３において抽出された列記載領域とが重なった重複列領域が抽出される。なお、ステップＳ１０２で行われるマーキング領域４６の抽出処理、ステップＳ１０３で行われる列記載領域の抽出処理、及びステップＳ１０４で行われる重複列領域の抽出処理は、上述した方法で行うことができるので、ここでは説明を省略する。 Subsequently, in step S104, an overlapping column region in which the marking region 46 extracted in step S102 and the column description region extracted in step S103 overlap is extracted. Note that the marking area 46 extraction process performed in step S102, the column description area extraction process performed in step S103, and the overlapping column area extraction process performed in step S104 can be performed by the above-described method. The description is omitted here.

ステップＳ１０５では、マーキング領域４６と重なる文字列があるか否かが判断される。すなわち、ステップＳ１０４で重複列領域が抽出されたか否かが判断される。重複列領域が抽出されなかった場合は、マーキングされた文字がないので、処理が終了する。重複列領域が抽出された場合は、処理がステップＳ１０６へ進む。 In step S105, it is determined whether there is a character string overlapping the marking area 46. That is, it is determined in step S104 whether or not an overlapping row area has been extracted. If no overlapping row area is extracted, there is no marked character and the process ends. If an overlapping row area is extracted, the process proceeds to step S106.

ステップＳ１０６では、マーキング領域４６と重なる文字列が単ラインか否かが判断される。すなわち、ステップＳ１０４で抽出された重複列領域が、単数か複数かが判断される。重複列領域が単数である場合は、ステップＳ１０８へ処理が進む。重複列領域が複数ある場合は、ステップＳ１０７へ処理が進む。 In step S106, it is determined whether or not the character string overlapping the marking area 46 is a single line. That is, it is determined whether the overlapping row area extracted in step S104 is singular or plural. If there is a single overlapping column area, the process proceeds to step S108. If there are a plurality of overlapping row areas, the process proceeds to step S107.

ステップＳ１０７では、判定対象の文字列について、列記載領域と重複列領域との直交方向４５の長さが算出され、重複列領域の長さが、列記載領域の長さの半分以上か否かが判断される。重複列領域の長さが、列記載領域の長さの半分以上でないと判断された場合は、ステップＳ１０９へ処理が進む。ステップＳ１０９では、判定対象の文字列が、指定文字を含む文字列ではないと判定され、ステップＳ１１６へ処理が進む。 In step S107, for the character string under determination, the lengths of the column description area and of the overlapping column area in the orthogonal direction 45 are calculated, and it is determined whether the length of the overlapping column area is at least half the length of the column description area. If the length of the overlapping column area is determined to be less than half the length of the column description area, the process proceeds to step S109. In step S109, the character string under determination is determined not to be a character string that includes a designated character, and the process proceeds to step S116.

ステップＳ１０７で、重複列領域の長さが、列記載領域の長さの半分以上であると判断された場合は、ステップＳ１０８へ処理が進む。ステップＳ１０８では、判定対象の文字列が指定文字を含む文字列であると判定される。続いて、ステップＳ１１０では、ステップＳ１０８で指定文字を含むと判定された文字列のうち、判定対象となる文字の文字記載領域が抽出される。ステップＳ１１１では、ステップＳ１１０で抽出された文字記載領域とマーキング領域４６とが重なる重複文字領域が抽出される。 If it is determined in step S107 that the length of the overlapping column area is half or more of the length of the column description area, the process proceeds to step S108. In step S108, it is determined that the character string to be determined is a character string including a designated character. Subsequently, in step S110, the character description area of the character to be determined is extracted from the character string determined to include the designated character in step S108. In step S111, an overlapping character area in which the character description area extracted in step S110 and the marking area 46 overlap is extracted.

ステップＳ１１２では、判定対象の文字について、文字記載領域と重複文字領域との文字列方向４４の長さが算出され、重複文字領域の長さが、文字記載領域の長さの半分以上か否かが判断される。重複文字領域の長さが、文字記載領域の長さの半分以上でないと判断された場合は、ステップＳ１１４へ処理が進む。ステップＳ１１４では、判定対象の文字が、指定文字ではないと判定され、ステップＳ１１５へ処理が進む。 In step S112, for the character under determination, the lengths of the character description area and of the overlapping character area in the character string direction 44 are calculated, and it is determined whether the length of the overlapping character area is at least half the length of the character description area. If the length of the overlapping character area is determined to be less than half the length of the character description area, the process proceeds to step S114. In step S114, the character under determination is determined not to be a designated character, and the process proceeds to step S115.

ステップＳ１１２で、重複文字領域の長さが、文字記載領域の長さの半分以上であると判断された場合は、ステップＳ１１３へ処理が進む。ステップＳ１１３では、判定対象の文字が指定文字であると判定される。ステップＳ１１５では、ステップＳ１０８で指定文字を含むと判定された文字列について、次の判定対象の文字があるか否かが判断される。次の判定対象の文字がある場合には、ステップＳ１１０へ戻って、再び判定対象の文字が指定文字か否かの判定処理を行う。 If it is determined in step S112 that the length of the overlapping character area is half or more of the length of the character description area, the process proceeds to step S113. In step S113, it is determined that the character to be determined is a designated character. In step S115, it is determined whether or not there is a next character to be determined for the character string determined to include the designated character in step S108. If there is a next character to be determined, the process returns to step S110 to determine again whether or not the character to be determined is a designated character.

文字列に含まれる全ての文字について判定が終了した場合は、ステップＳ１１６へ処理が進む。ステップＳ１１６では、次の判定対象の文字列があるか否かが判断される。次の判定対象の文字列がある場合は、ステップＳ１０７へ戻って、再び判定対象の文字列が、指定文字を含むか否かの判定処理が行われる。マーキング領域４６と重なる全ての文字列について、判定処理が終了した場合は、一連の抽出処理が終了する。以上の処理により、マーキングにより指定された指定文字が特定される。特定された指定文字は、ディスプレイ１１等に表示される。また、特定された指定文字をＯＣＲ処理してもよいし、各種の編集を行っても良い。 If the determination is completed for all characters included in the character string, the process proceeds to step S116. In step S116, it is determined whether there is a character string to be determined next. If there is a next character string to be determined, the process returns to step S107, and determination processing is performed again to determine whether or not the character string to be determined includes the designated character. When the determination process is completed for all the character strings overlapping the marking region 46, a series of extraction processes is completed. Through the above processing, the designated character designated by the marking is specified. The specified designated character is displayed on the display 11 or the like. Further, the specified designated character may be subjected to OCR processing or various editing may be performed.
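The control flow of steps S104 to S116 can be summarized in a short sketch. It assumes that the OCR stage has already produced bounding boxes for each text line and each character, that the marking area is a single rectangle, and that lengths are taken from box coordinates; the data layout (`"box"`, `"chars"`) and function names are hypothetical. Step numbers in the comments refer to the flowchart described above.

```python
def overlap_1d(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def extract_designated(lines, marking, ratio=0.5):
    """lines: list of {"box": (l, t, r, b), "chars": [(glyph, (l, t, r, b)), ...]}."""
    ml, mt, mr, mb = marking
    # S104: keep the text lines whose box overlaps the marking area.
    hits = [
        line for line in lines
        if overlap_1d(line["box"][0], line["box"][2], ml, mr) > 0
        and overlap_1d(line["box"][1], line["box"][3], mt, mb) > 0
    ]
    if not hits:                       # S105: nothing is marked
        return []
    single = len(hits) == 1            # S106: one overlapping line or several
    designated = []
    for line in hits:                  # loop closed by S116
        _, top, _, bottom = line["box"]
        # S107: the half-height test applies only when several lines overlap.
        if not single and overlap_1d(top, bottom, mt, mb) < ratio * (bottom - top):
            continue                   # S109: line holds no designated character
        # S108: line accepted; S110 to S115 iterate over its characters.
        for glyph, (cl, _ct, cr, _cb) in line["chars"]:
            ov = overlap_1d(cl, cr, ml, mr)          # S111
            if ov > 0 and ov >= ratio * (cr - cl):   # S112
                designated.append(glyph)             # S113
            # otherwise S114: not a designated character
    return designated
```

Passing a wide marking that grazes an upper line and covers a lower one returns characters only from the lower line, and a thin line-shaped marking over a single line bypasses the line-level test entirely.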

以上説明した本実施形態に係るネットワーク複合機１によれば、重複文字領域と文字記載領域との文字列方向４４の長さが算出される。そして、文字記載領域の文字列方向４４の長さに対する重複文字領域の文字列方向４４の長さの割合が所定値以上の場合、判定対象の文字が指定文字であると判定される。このため、ユーザがラフに文字をマーキングして、例えば、指定したい文字の隣の文字にまでマーキングが掛かってしまった場合、又は、指定したい文字を完全に塗り潰していない場合でも、マーキングされた度合いに基づいて、指定文字か否かを判定できる。従って、ユーザがラフにマーキングした場合であっても、ユーザの指定したい文字を特定することが可能となる。 According to the network MFP 1 of the present embodiment described above, the lengths of the overlapping character area and of the character description area in the character string direction 44 are calculated. When the ratio of the length of the overlapping character area in the character string direction 44 to the length of the character description area in the character string direction 44 is equal to or greater than a predetermined value, the character under determination is determined to be a designated character. Consequently, even when the user marks characters roughly, so that, for example, the marking spills onto a character next to the one to be designated, or the character to be designated is not completely filled in, whether a character is a designated character can be determined from the degree to which it is marked. The character that the user wants to designate can therefore be identified even when the user has marked roughly.

また、ネットワーク複合機１によれば、重複列領域と列記載領域との直交方向４５の長さが算出される。そして、列記載領域の直交方向４５の長さに対する重複列領域の直交方向４５の長さの割合が所定値以上の場合、判定対象の文字列が指定文字を含むと判定される。このため、ユーザがラフに文字をマーキングして、例えば、指定したい文字列の隣の文字列にまでマーキングが掛かってしまった場合、又は、指定したい文字列を完全に塗り潰していない場合でも、マーキングされた度合いに基づいて、指定文字を含む文字列か否かを判定できる。従って、ユーザがラフにマーキングした場合であっても、ユーザの指定したい文字を含む文字列を特定することが可能となる。 Further, according to the network MFP 1, the lengths of the overlapping column area and of the column description area in the orthogonal direction 45 are calculated. When the ratio of the length of the overlapping column area in the orthogonal direction 45 to the length of the column description area in the orthogonal direction 45 is equal to or greater than a predetermined value, the character string under determination is determined to include a designated character. Consequently, even when the user marks characters roughly, so that, for example, the marking spills onto a character string adjacent to the one to be designated, or the character string to be designated is not completely filled in, whether a character string includes a designated character can be determined from the degree to which it is marked. A character string that includes the character the user wants to designate can therefore be identified even when the user has marked roughly.

また、ネットワーク複合機１によれば、指定文字を含む文字列が抽出された後、その抽出された文字列に含まれる複数の文字から指定文字が抽出される。すなわち、文字列単位で判定対象となる文字が抽出され、その後、抽出された文字列に含まれる文字がマーキングによって指定された文字か否かが判定される。従って、全ての文字について一つ一つ判定する場合と比較して、マーキングによって指定された文字を効率良く特定することができる。 Further, according to the network MFP 1, after the character string including the designated character is extracted, the designated character is extracted from a plurality of characters included in the extracted character string. That is, a character to be determined is extracted in character string units, and then it is determined whether or not the character included in the extracted character string is a character specified by marking. Therefore, it is possible to efficiently specify the character designated by the marking as compared with the case where all characters are determined one by one.

更に、ネットワーク複合機１によれば、重複列領域が単数か複数かについて判断されるので、複数の文字列に渡ってマーキングがなされているか否かを判断することができる。そして、重複列領域が単数の場合に、その重複列領域に対応する文字列に含まれる複数の文字について、指定文字か否かが判断される。このため、マーキングされている文字列が１列の場合に、重複列領域の上記長さにかかわらず、その文字列に含まれる複数の文字を、判定対象の文字とすることができる。すなわち、図５に示すように、マーキングがライン状になされ、列記載領域の直交方向４５の長さに対する重複列領域の直交方向４５の長さの割合が所定値より小さい場合であっても、その文字列に含まれる複数の文字を、判定対象の文字とすることができる。 Furthermore, according to the network MFP 1, it is determined whether the overlapping column area is singular or plural, so it can be determined whether the marking extends over a plurality of character strings. When the overlapping column area is singular, it is determined, for the plurality of characters included in the character string corresponding to that overlapping column area, whether each is a designated character. Therefore, when only one character string is marked, the characters included in that character string can be treated as characters under determination regardless of the length of the overlapping column area. That is, as shown in FIG. 5, even when the marking is line-shaped and the ratio of the length of the overlapping column area in the orthogonal direction 45 to the length of the column description area in the orthogonal direction 45 is smaller than the predetermined value, the characters included in that character string can be treated as characters under determination.

以上、本発明の実施の形態について説明したが、本発明は、上記実施形態に限定されるものではなく種々の変形が可能である。例えば、上記実施形態では、重複文字領域又は重複列領域の長さが、文字記載領域又は列記載領域の半分以上か否かに基づいて、判定を行ったが、これに限られない。重複文字領域又は重複列領域の長さが、文字記載領域又は列記載領域の３分の１以上等、判定基準は任意に設定することができる。 Although the embodiment of the present invention has been described above, the present invention is not limited to the above embodiment, and various modifications can be made. For example, in the above embodiment, the determination is made based on whether or not the length of the overlapping character region or the overlapping column region is half or more of the character description region or the column description region, but is not limited thereto. The determination criterion can be arbitrarily set such that the length of the overlapping character region or the overlapping column region is one third or more of the character description region or the column description region.

また、上記実施形態では、列抽出部３２を備えることとしたが、列抽出部３２を備えていなくてもよい。この場合、文字抽出部３３が、マーキング領域４６と少なくとも一部が重なる文字についてそれぞれ指定文字か否かを判定する。また、上記実施形態では、スキャナ１２を備えることとしたが、スキャナ１２を備えていなくてもよい。この場合、原稿の画像データを外部装置から取得し、取得した画像データに対して画像処置を施すことにより、マーキングされた文字を抽出する。更に、上記実施形態では、文字が横書きの場合について具体的に説明したが、縦書きであっても本発明を適用することができる。この場合、文字列方向は縦方向となる。 In the above embodiment, the column extraction unit 32 is provided. However, the column extraction unit 32 may not be provided. In this case, the character extraction unit 33 determines whether or not each character that overlaps at least part of the marking region 46 is a designated character. In the above embodiment, the scanner 12 is provided. However, the scanner 12 may not be provided. In this case, the image data of the document is acquired from the external device, and the marked characters are extracted by performing image processing on the acquired image data. Furthermore, in the above-described embodiment, the case where the characters are written horizontally is specifically described. However, the present invention can be applied even when the characters are written vertically. In this case, the character string direction is the vertical direction.
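Support for vertical writing can be pictured as swapping the axis along which lengths are measured. The sketch below is an assumption-laden illustration: boxes are `(left, top, right, bottom)` tuples, and the hypothetical `text_axis` parameter is 0 for horizontal writing and 1 for vertical writing.

```python
def overlap_ratio(char_box, marking, text_axis=0):
    """Overlap length divided by character length along the character string direction.

    text_axis: 0 measures along x (horizontal writing), 1 along y (vertical writing).
    """
    c0, c1 = char_box[text_axis], char_box[text_axis + 2]
    m0, m1 = marking[text_axis], marking[text_axis + 2]
    if c1 <= c0:
        return 0.0
    return max(0.0, min(c1, m1) - max(c0, m0)) / (c1 - c0)
```

The same threshold comparison then applies unchanged; only the axis used for measuring differs between horizontal and vertical text.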

１ネットワーク複合機
１２スキャナ
３０制御部
３１マーキング抽出部
３２１列記載領域抽出部
３２２重複列領域抽出部
３２３列算出部
３２４列判定部
３２５判断部
３３１文字記載領域抽出部
３３２重複文字領域抽出部
３３３文字算出部
３３４文字判定部
DESCRIPTION OF SYMBOLS
1 Network MFP
12 Scanner
30 Control unit
31 Marking extraction unit
321 Column description area extraction unit
322 Overlapping column area extraction unit
323 Column calculation unit
324 Column determination unit
325 Determination unit
331 Character description area extraction unit
332 Overlapping character area extraction unit
333 Character calculation unit
334 Character determination unit

Claims

An image processing apparatus comprising:
acquisition means for acquiring image data of a document;
marking extraction means for extracting, from the image data acquired by the acquisition means, a marking area colored on the document;
character description area extracting means for extracting, from the image data, for each of a plurality of characters written side by side in the document, a character description area in which that character is written;
overlapping character area extracting means for extracting an overlapping character area in which the character description area extracted by the character description area extracting means and the marking area extracted by the marking extraction means overlap;
character calculation means for calculating, for the overlapping character area extracted by the overlapping character area extracting means and for the character description area corresponding to that overlapping character area, the length in the direction in which the plurality of characters are arranged; and
character determination means for determining whether the character is a character designated by marking on the document, based on the length of the overlapping character area relative to the length of the character description area calculated by the character calculation means.

The image processing apparatus according to claim 1, further comprising:
column description area extracting means for extracting, from the image data, for each of a plurality of character strings written side by side in the document, a column description area in which that character string is written;
overlapping column area extracting means for extracting an overlapping column area in which the column description area extracted by the column description area extracting means and the marking area extracted by the marking extraction means overlap;
column calculating means for calculating, for the overlapping column area extracted by the overlapping column area extracting means and for the column description area corresponding to that overlapping column area, the length in the orthogonal direction, which is orthogonal to the direction in which the plurality of characters included in the character string are arranged; and
column determining means for determining whether the character string includes a character designated by marking on the document, based on the length of the overlapping column area in the orthogonal direction relative to the length of the column description area in the orthogonal direction calculated by the column calculating means.

The image processing apparatus according to claim 2, wherein the character determination means determines, for the plurality of characters included in a character string that the column determining means has determined to include a character designated by marking on the document, whether each character is a character designated by marking on the document.

The image processing apparatus according to claim 3, further comprising determination means for determining whether the overlapping column area extracted by the overlapping column area extracting means is singular or plural,
wherein, when the determination means determines that the overlapping column area is singular, the character determination means determines, for the plurality of characters included in the character string corresponding to that overlapping column area, whether each character is a character designated by marking on the document.