JP2006195887A

JP2006195887A - Image processing system

Info

Publication number: JP2006195887A
Application number: JP2005009116A
Authority: JP
Inventors: Kimihide Terao; 仁秀寺尾
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2005-01-17
Filing date: 2005-01-17
Publication date: 2006-07-27

Abstract

PROBLEM TO BE SOLVED: To authenticate, for each user, whether an editable original, a non-editable original, or a simply vectorized document may be acquired.
SOLUTION: After the user logs in to an MFP, the system checks whether the user has the right to acquire the original electronic file from a barcode, the right to search for arbitrary files on a document management server, and the right to vectorize image data.
COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an image processing system that converts image data read by an image processing apparatus such as a copying machine into vector data that can be reused in document creation application software such as Word.

In recent years, as environmental concerns have grown, offices have been moving rapidly toward paperless operation. Paper documents previously kept in binders and the like can be read with a scanner, converted to Portable Document Format (hereinafter PDF), and accumulated as a database in an image storage device to build a document management system. In MFPs with extended functions, meanwhile, when an image is recorded, pointer information locating the image file in the image storage device is recorded beforehand as additional information on the cover or within the content of the document. When the document is later reused, for example copied, the storage location of the original electronic file is detected from this pointer information and the original data of the electronic file is used directly, which reduces the need to keep paper documents at all (see, for example, Patent Document 1).

However, as described above, pointer information must always be printed on images output by the MFP. Barcodes or digital watermarks are generally used as the pointer information, and the pointer information can obscure part of the output image itself.

To solve this problem, the original electronic file can instead be identified from keyword strings, images, layout, and the like in the scanned image, without printing pointer information on the MFP side.
Patent Document 1: JP 2004-46537 A

In the conventional example, however, every user was allowed to acquire the electronic file from the pointer information, to run the search when no electronic file was found from the pointer information, and to run vectorization when no electronic file existed. This was a security problem and also lacked flexibility.

To achieve the above object, in the invention of claim 1, a user logs in to the image processing system through user authentication, and restrictions are applied, for each logged-in user, on executing the functions of the electronic file identifying means and the vectorizing means. Usage limits can thus be set per user on the available functions, which improves security and allows per-user flexibility.

The invention of claim 2 performs user authentication by having the user enter a user name and password.

The invention of claim 3 performs user authentication by biometrics, so the user does not need to enter a user name and password.

In the invention of claim 4, user authentication is performed by inserting an IC card or the like into the main unit, so the user does not need to enter a user name and password.

In the invention of claim 5, in the image processing system of claim 1, the means for identifying the electronic file recognizes additional information, recorded additionally on the document, that indicates where the electronic file is stored, so the original electronic file can be identified easily from the image information.

In the invention of claim 6, in the image processing system of claim 1, the electronic file identifying means has means for searching the files stored in the storage means for specific information appearing in the document, and identifies the electronic file when the specific information matches. The original electronic file can therefore be identified easily even for a document on which no additional information is recorded.

In the invention of claim 7, in the image processing system of claim 1, the vectorizing means uses OCR means, which optically recognizes the character regions of the document, to vectorize the character portions into character font data, so the character portions are of high quality.

In the invention of claim 8, in the image processing system of claim 1, the vectorizing means divides the document into a plurality of objects and vectorizes each object independently, so the vectorized objects can be handled independently.

In the invention of claim 9, in the image processing system of claim 1, the vectorizing means converts the vectorized objects into a format that existing document creation software can handle, for example rtf, so the vectorized objects can be reused in existing document creation applications.

In the invention of claim 10, the image processing system of claim 1 comprises image storage means for storing the vector data produced by the vectorizing means, and information adding means for adding to the vector data, as additional information, the location where the data is stored. When the document is later read again as an original by the image reading means, the electronic file can be identified from this additional information.

The invention of claim 11 has means for reading and scanning a document and means for identifying the electronic file of the document from the image information so obtained. Even when the identified electronic file is one that existing document creation software cannot handle object by object, such as an image file or a PDF, the image information obtained by the image reading and scanning means is converted into vector data by the vectorizing means, so image files too can be converted into reusable vector data.

In the invention of claim 12, in the image processing system of claim 1, the vectorizing means has means for dividing the image information obtained by reading and scanning the document into objects, and means for searching the files stored in the storage means, object by object, for a matching object, and vectorizes using the information obtained by the search. Vectorizing every object becomes unnecessary, so processing is faster and image quality is higher.

In the invention of claim 13, the means for setting per-user restrictions uses, as the user name, the name under which the user logged in through the user authentication means, so that the processes of the electronic file identifying means and the vectorizing means can be executed only by users authorized to execute them.

In the invention of claim 14, the means for setting per-user restrictions uses, instead of the logged-in user name, user information embedded in advance as a digital watermark in the scanned document itself, so that the processes of the electronic file identifying means and the vectorizing means can be executed only when the authority to execute them exists.

In the invention of claim 15, in the electronic file identifying means, access restrictions are set per user on arbitrary files and folders on the document management device where the electronic files are stored, which limits the electronic files that can be acquired.

In the invention of claim 16, the acquirable electronic files of claim 15 include editable documents created with application software, documents stored as image data, and vectorized documents, and the electronic files that can be acquired are limited according to user authority.

In the invention of claim 17, the document management device of claim 15 includes the main body of the image processing system that reads the document and document management servers on the network, so any document management server on the network can be targeted.

According to the invention of claim 1, a user logs in to the image processing system through user authentication, and restrictions are applied, for each logged-in user, on executing the functions of the electronic file identifying means and the vectorizing means. Usage limits can thus be set per user on the available functions, which improves security and allows per-user flexibility.

According to the invention of claim 2, user authentication is performed by having the user enter a user name and password.

According to the invention of claim 3, user authentication is performed by biometrics, so the user does not need to enter a user name and password.

According to the invention of claim 4, user authentication is performed by inserting an IC card or the like into the main unit, so the user does not need to enter a user name and password.

According to the invention of claim 5, in the image processing system of claim 1, the means for identifying the electronic file recognizes additional information, recorded additionally on the document, that indicates where the electronic file is stored, so the original electronic file can be identified easily from the image information.

According to the invention of claim 6, in the image processing system of claim 1, the electronic file identifying means has means for searching the files stored in the storage means for specific information appearing in the document, and identifies the electronic file when the specific information matches. The original electronic file can therefore be identified easily even for a document on which no additional information is recorded.

According to the invention of claim 7, in the image processing system of claim 1, the vectorizing means uses OCR means, which optically recognizes the character regions of the document, to vectorize the character portions into character font data, so the character portions are of high quality.

According to the invention of claim 8, in the image processing system of claim 1, the vectorizing means divides the document into a plurality of objects and vectorizes each object independently, so the vectorized objects can be handled independently.

According to the invention of claim 9, in the image processing system of claim 1, the vectorizing means converts the vectorized objects into a format that existing document creation software can handle, for example rtf, so the vectorized objects can be reused in existing document creation applications.

According to the invention of claim 10, the image processing system of claim 1 comprises image storage means for storing the vector data produced by the vectorizing means, and information adding means for adding to the vector data, as additional information, the location where the data is stored. When the document is later read again as an original by the image reading means, the electronic file can be identified from this additional information.

According to the invention of claim 11, the system has means for reading and scanning a document and means for identifying the electronic file of the document from the image information so obtained. Even when the identified electronic file is one that existing document creation software cannot handle object by object, such as an image file or a PDF, the image information obtained by the image reading and scanning means is converted into vector data by the vectorizing means, so image files too can be converted into reusable vector data.

According to the invention of claim 12, in the image processing system of claim 1, the vectorizing means has means for dividing the image information obtained by reading and scanning the document into objects, and means for searching the files stored in the storage means, object by object, for a matching object, and vectorizes using the information obtained by the search. Vectorizing every object becomes unnecessary, so processing is faster and image quality is higher.

According to the invention of claim 13, the means for setting per-user restrictions uses, as the user name, the name under which the user logged in through the user authentication means, so that the processes of the electronic file identifying means and the vectorizing means can be executed only by users authorized to execute them.

According to the invention of claim 14, the means for setting per-user restrictions uses, instead of the logged-in user name, user information embedded in advance as a digital watermark in the scanned document itself, so that the processes of the electronic file identifying means and the vectorizing means can be executed only when the authority to execute them exists.

According to the invention of claim 15, in the electronic file identifying means, access restrictions are set per user on arbitrary files and folders on the document management device where the electronic files are stored, which limits the electronic files that can be acquired.

According to the invention of claim 16, the acquirable electronic files of claim 15 include editable documents created with application software, documents stored as image data, and vectorized documents, and the electronic files that can be acquired are limited according to user authority.

According to the invention of claim 17, the document management device of claim 15 includes the main body of the image processing system that reads the document and document management servers on the network, so any document management server on the network can be targeted.

The best mode for carrying out the invention is described below.

An embodiment of the present invention is described. FIG. 1 is a block diagram showing a configuration example of the image processing system according to the present invention. The system is realized in an environment in which office 10 and office 20 are connected via the Internet 104. Connected to LAN 107, built in office 10, are MFP 100, management PC 101 that controls MFP 100, client PC (external storage means) 102, document management server 106 with its database 105, and proxy server 103. LAN 107 and LAN 108 in office 20 are connected to the Internet 104 via proxy server 103. In the present invention, MFP 100 handles the reading of paper documents and part of the image processing applied to the read image signal; the image signal is passed to management PC 101 over LAN 109. Management PC 101 is an ordinary PC with internal image storage means, image processing means, display means, and input means, some of which are integrated into MFP 100.

FIG. 2 is a configuration diagram of MFP 100. In FIG. 2, the image reading unit 110, which includes an auto document feeder (hereinafter ADF), illuminates a bundle of originals or a single original with a light source (not shown), forms the reflected image on a solid-state image sensor through a lens, and obtains from the sensor a raster image reading signal as image information at a density of 600 DPI. For the normal copy function, data processing unit 115 processes this image signal into a recording signal; when multiple copies are made, one page of recording data is first held in storage device 111 and then output sequentially to recording device 112 to form images on paper.

Print data output from client PC 102, on the other hand, passes from LAN 107 through network IF 114 to data processing unit 115, which converts it into recordable raster data; the recording device then forms it as a recorded image on paper.

Operator instructions to MFP 100 are given through input device 113, which consists of the key operation unit on the MFP and the keyboard and mouse of the management PC. This series of operations is controlled by a control unit (not shown) in data processing unit 115.

The display device 116 shows the state of operation input and the image data being processed. Storage device 111 is also controlled from the management PC; data exchange and control between the MFP and the management PC take place over network IF 117 and the directly connected LAN 109.

An outline of the processing is described.

The overall image processing according to the present invention is outlined next with reference to FIG. 3.

In FIG. 3, the user is first authenticated and logs in to the MFP (step 140). The image reading unit 110 of MFP 100 then scans one original in raster fashion, and image information input process 120 yields a 600 DPI, 8-bit image signal. Data processing unit 115 preprocesses this signal and stores it in storage device 111 as one page of image data.
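As a rough sense of scale for the 600 DPI, 8-bit signal, the following sketch estimates the uncompressed size of one scanned page. The paper sizes and the single 8-bit channel are assumptions made for illustration; the text states only the resolution and the bit depth. The function name `raster_size_bytes` is hypothetical.

```python
def raster_size_bytes(width_in: float, height_in: float,
                      dpi: int = 600, bits_per_pixel: int = 8) -> int:
    """Uncompressed size in bytes of a raster page of the given physical size."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed paper sizes in inches (not given in the text).
a4_bytes = raster_size_bytes(210 / 25.4, 297 / 25.4)
letter_bytes = raster_size_bytes(8.5, 11.0)
print(f"A4: {a4_bytes / 1e6:.1f} MB, Letter: {letter_bytes / 1e6:.1f} MB")
```

Under these assumptions a single page occupies tens of megabytes before any compression, which is the kind of data volume the later vectorization is meant to reduce.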

The CPU of management PC 101 first separates the stored image signal into character/line-art regions and halftone image regions. Character regions are further segmented into blocks that form paragraph-level clusters, or into tables and figures made up of lines. Halftone image regions are divided into independent objects block by block, such as rectangular image blocks and background portions (step 121).

At this point, any object corresponding to a two-dimensional barcode or a URL recorded in the original image as additional information is detected. A URL is read by OCR and a two-dimensional barcode is decoded (step 122), and the pointer information giving the location in a storage device where the original electronic file of the document is stored is detected (step 123). Pointer information can also be added by so-called digital watermarking, which is not directly visible, for example by embedding information in the spacing between characters or in a halftone image.
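The pointer detection of steps 122 and 123 can be sketched as a scan over the segmented blocks. This is a minimal illustration, not the patented implementation: the `Block` type, the kind labels `"url"` and `"barcode2d"`, and the `ocr` and `decode_barcode` callables are all hypothetical stand-ins for whatever recognizers the system actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Block:
    kind: str       # hypothetical labels: "text", "url", "barcode2d", "photo", ...
    payload: bytes  # raw pixels of the block, treated as opaque here


def detect_pointer(blocks: Iterable[Block],
                   ocr: Callable[[bytes], str],
                   decode_barcode: Callable[[bytes], str]) -> Optional[str]:
    """Return the first non-empty pointer string found, or None (steps 122-123)."""
    for block in blocks:
        if block.kind == "url":
            text = ocr(block.payload).strip()
        elif block.kind == "barcode2d":
            text = decode_barcode(block.payload).strip()
        else:
            continue  # other blocks carry no pointer information
        if text:
            return text
    return None
```

Passing the recognizers in as callables keeps the sketch independent of any particular OCR or barcode library.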

When pointer information is detected, processing branches to step 125, where the original electronic file is searched for at the address indicated by the pointer. The electronic file is stored in one of the following: the hard disk of the client PC in FIG. 1, database 105 of document management server 106 connected to the LAN of office 10 or 20, or storage device 111 of MFP 100 itself. These storage devices are searched according to the address information obtained in step 123.
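The lookup of step 125 might be sketched as follows. The `store:path` pointer syntax, the store names, and the in-memory dictionaries are assumptions chosen for illustration; the text says only that the address information selects one of the storage locations.

```python
from typing import Mapping, Optional


def resolve_pointer(pointer: str,
                    stores: Mapping[str, Mapping[str, bytes]]) -> Optional[bytes]:
    """Look up a file by a hypothetical 'store:path' pointer (step 125).

    Returns None when the pointer is malformed, the store is unknown,
    or the path is not present in that store.
    """
    store_name, sep, path = pointer.partition(":")
    if not sep:
        return None
    store = stores.get(store_name)
    if store is None:
        return None
    return store.get(path)


# Hypothetical stand-ins for the three storage locations named in the text.
stores = {
    "client_pc": {"report.doc": b"editable original"},
    "doc_server": {},
    "mfp_storage": {},
}
print(resolve_pointer("client_pc:report.doc", stores))
```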

Processing branches to step 142 if no electronic file is found in step 125, if the file found is a so-called image file such as PDF or tiff, or if no pointer information exists at all. If an electronic file is found in step 125, step 141 determines whether the user has the authority to acquire an electronic file from pointer information. Without that authority, the display device shows that the user is not authorized (step 145) and processing ends. With it, processing branches to step 134.

Step 142 determines whether the user has the authority to run the file search. If the user lacks file search authority and no image file exists either, the display device shows that the user is not authorized and processing ends (step 145). If the user lacks file search authority but an image file exists, processing branches to step 134. If the user has file search authority, the file search of step 126 is executed.

Step 126 is a so-called document search routine.

First, words are extracted from the results of the OCR performed on each character block in step 122 and a full-text search is run, or a so-called layout search is run on the arrangement and attributes of the objects. If the search finds electronic files with high similarity, thumbnails or the like are displayed (step 127), and if a choice among several candidates is needed, the operator identifies the file by an input operation. If there is only one candidate, processing branches automatically from step 128 to step 134 and the storage address is reported.
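The candidate handling of steps 126 to 128 can be sketched as below. The similarity measure (Jaccard overlap of word sets), the threshold, and the function names are assumptions for illustration; the text does not say how similarity is computed. The `choose` callable stands in for the operator picking from the displayed thumbnails.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def rank_candidates(query_words: Set[str],
                    index: Dict[str, Set[str]],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Score each indexed file by word-set overlap and keep those above threshold."""
    scored = []
    for name, words in index.items():
        union = query_words | words
        if not union:
            continue
        score = len(query_words & words) / len(union)
        if score >= threshold:
            scored.append((name, score))
    # Highest score first; ties broken by file name for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def select_file(candidates: List[Tuple[str, float]],
                choose: Callable[[List[str]], str]) -> Optional[str]:
    """A single candidate is taken automatically; several go to the operator."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0][0]
    return choose([name for name, _ in candidates])
```

A layout search could reuse the same ranking shape with a different scoring function over object positions and attributes.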

If the search of step 126 finds no electronic file, or finds one that is a so-called image file such as PDF or tiff, processing branches to step 143.

Step 143 determines whether the user has the authority to convert the image data obtained by scanning the original into vector data. If the user lacks vectorization authority and no image data exists either, the display device shows that the user is not authorized and processing ends (step 145). If the user lacks vectorization authority but an image file exists, processing branches to step 134. If the user has vectorization authority, the vectorization of step 129 is executed.
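The three authority checks (steps 141, 142, and 143) and their branches can be summarized in one decision function. This is a sketch under stated assumptions, not the patented control logic: the type and function names are hypothetical, and it reads the text as sending an editable file found from the pointer to step 141 and an image file found from the pointer to step 142.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class FileKind(Enum):
    EDITABLE = auto()  # created by an application and reusable as it is
    IMAGE = auto()     # image file such as PDF or tiff


@dataclass(frozen=True)
class Permissions:
    acquire_from_pointer: bool  # checked in step 141
    search_files: bool          # checked in step 142
    vectorize: bool             # checked in step 143


class Outcome(Enum):
    DELIVER_ORIGINAL = auto()    # step 134 with an editable file
    DELIVER_IMAGE_FILE = auto()  # step 134 with an image file that was found
    DELIVER_VECTORIZED = auto()  # step 129 onward, then step 134
    DENIED = auto()              # step 145


def decide(perms: Permissions,
           pointer_hit: Optional[FileKind],
           search: Callable[[], Optional[FileKind]]) -> Outcome:
    # Steps 125 and 141: an editable file located through the pointer.
    if pointer_hit is FileKind.EDITABLE:
        return Outcome.DELIVER_ORIGINAL if perms.acquire_from_pointer else Outcome.DENIED
    image_known = pointer_hit is FileKind.IMAGE

    # Step 142: without search authority, fall back to a known image file.
    if not perms.search_files:
        return Outcome.DELIVER_IMAGE_FILE if image_known else Outcome.DENIED

    # Step 126: the search runs only when the user is allowed to search.
    hit = search()
    if hit is FileKind.EDITABLE:
        return Outcome.DELIVER_ORIGINAL
    image_known = image_known or hit is FileKind.IMAGE

    # Step 143: vectorize if allowed, otherwise fall back again.
    if perms.vectorize:
        return Outcome.DELIVER_VECTORIZED
    return Outcome.DELIVER_IMAGE_FILE if image_known else Outcome.DENIED
```

Passing the search as a callable means it is never invoked for a user without search authority, which matches the order of the checks in the text.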

Step 129 is the conversion from image data to vector data, producing an electronic file close to the original electronic file. First, for the character blocks that were OCRed in step 122, the character size, style, and typeface are also recognized, and the characters are converted into font data visually faithful to the characters obtained by scanning the original. Table and figure blocks made up of lines are converted to outlines. Image blocks are handled as image data, each as a separate JPEG file. This vectorization is performed object by object; the layout information of each object is preserved, and the result is converted to, for example, rtf (step 130) and stored in storage device 111 as an electronic file (step 131).
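The per-object dispatch of step 129 can be sketched as follows. The sketch only records which conversion would be chosen for each object and keeps its layout box; the actual font recognition, outlining, and JPEG encoding are omitted. The `PageObject` type, the attribute labels, and the method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PageObject:
    attribute: str                   # hypothetical labels: "text", "table", "line", "drawing", "photo"
    bbox: Tuple[int, int, int, int]  # (x, y, width, height), the layout information
    data: bytes                      # pixels of the object, unused in this sketch


def vectorize_object(obj: PageObject) -> Dict[str, object]:
    """Choose the conversion for one object; the conversion itself is not performed."""
    if obj.attribute == "text":
        method = "font"     # OCR result plus recognized size, style, and typeface
    elif obj.attribute in ("table", "line", "drawing"):
        method = "outline"  # blocks made up of lines are outlined
    else:
        method = "jpeg"     # image blocks stay image data, one JPEG per block
    return {"method": method, "bbox": obj.bbox}


def vectorize_page(objects: Iterable[PageObject]) -> List[Dict[str, object]]:
    """Process every object independently, preserving order and layout."""
    return [vectorize_object(obj) for obj in objects]
```

Because each object is converted on its own, a later export step (rtf in the text) can place the results using the preserved boxes.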

今ベクトル化した原稿画像は以降同様の処理を行う際に直接電子ファイルとして検索出来るように、先ずステップ１３２において検索の為のインデックス情報を生成して検索用インデックスファイルに追加する。更に、ステップ１３６で今操作者が行いたい処理が記録であると判断されれば、ステップ１３３に分岐し、ポインター情報をイメージデータとしてファイルに付加する。 First, in step 132, index information for search is generated and added to the search index file so that the vectorized document image can be directly searched as an electronic file when the same processing is performed thereafter. Further, if it is determined in step 136 that the process that the operator wants to perform is recording, the process branches to step 133 to add pointer information to the file as image data.

検索処理で電子ファイルが特定できた場合も同様に以降からは直接電子ファイルを特定する為にステップ１２８からステップ１３４に分岐し、格納アドレスを操作者に通知すると共に、今紙に記録する場合は、同様にポインター情報を電子ファイルに付加する。尚、ステップ１２５でポインター情報から電子ファイルが特定できた場合、検索処理で電子ファイルが特定出来た場合、ベクトル化により電子ファイルに変換した場合、ステップ１３４において、該電子ファイルの格納アドレスを操作者に通知する。 Likewise, when the electronic file has been identified by the search process, the process branches from step 128 to step 134 so that the file can be identified directly from then on: the storage address is reported to the operator and, if the document is now to be recorded on paper, pointer information is likewise added to the electronic file. In each of three cases, namely when the electronic file was identified from the pointer information in step 125, when it was identified by the search process, and when it was produced by vectorization, the storage address of the electronic file is reported to the operator in step 134.

尚、以上本発明によって得られた電子ファイル自体を用いて、例えば文書の加工、蓄積、伝送、記録をステップ１３５で行う事が可能になる。これらの処理はイメージデータを用いる場合に比べて情報量が削減され、蓄積効率が高まり、伝送時間が短縮され、又記録表示する際には高品位なデータとして非常に優位となる。 Note that, using the electronic file itself obtained by the present invention as described above, for example, processing, storage, transmission and recording of a document can be performed in step 135. These processes reduce the amount of information, increase the storage efficiency, shorten the transmission time, and are very advantageous as high-quality data when recording and displaying, compared to the case of using image data.

以下、各処理ブロックに対して詳細に説明する。 Hereinafter, each processing block will be described in detail.

先ずステップ１２１で示すブロックセレクション処理について説明する。 First, the block selection process shown in step 121 will be described.

ブロックセレクション処理を説明する。 The block selection process will be described.

ブロックセレクション処理とは、図４の右に示すステップ１２０で読み取った一頁のイメージデータを左に示す様に、各オブジェクト毎の塊として認識し、該ブロック各々を文字／図画／写真／線／表等の属性に判定し、異なる属性を持つ領域に分割する処理である。 Block selection is a process that takes the image data of one page read in step 120, shown on the right of FIG. 4, recognizes it as a cluster for each object as shown on the left, determines the attribute of each block (character, drawing, photograph, line, table, and so on), and divides the page into regions having different attributes.

ブロックセレクション処理の実施例を以下に説明する。 An example of the block selection process will be described below.

先ず、入力画像を白黒に二値化し、輪郭線追跡をおこなって黒画素輪郭で囲まれる画素の塊を抽出する。面積の大きい黒画素の塊については、内部にある白画素に対しても輪郭線追跡をおこない白画素の塊を抽出、さらに一定面積以上の白画素の塊の内部からは再帰的に黒画素の塊を抽出する。 First, the input image is binarized into black and white, and contour tracing is performed to extract clusters of pixels enclosed by black-pixel contours. For black-pixel clusters with a large area, contour tracing is also applied to the white pixels inside them to extract white-pixel clusters, and black-pixel clusters are then extracted recursively from the interior of any white-pixel cluster whose area is at least a certain size.

このようにして得られた黒画素の塊を、大きさおよび形状で分類し、異なる属性を持つ領域へ分類していく。たとえば、縦横比が１に近く、大きさが一定の範囲のものを文字相当の画素塊とし、さらに近接する文字が整列良くグループ化可能な部分を文字領域、扁平な画素塊を線領域、一定大きさ以上でかつ四角系の白画素塊を整列よく内包する黒画素塊の占める範囲を表領域、不定形の画素塊が散在している領域を写真領域、それ以外の任意形状の画素塊を図画領域、などとする。 The black-pixel clusters obtained in this way are classified by size and shape into regions having different attributes. For example, a cluster whose aspect ratio is close to 1 and whose size falls within a certain range is treated as a character-equivalent pixel cluster, and a portion where adjacent characters can be grouped in good alignment is treated as a character region. A flat pixel cluster is treated as a line region; the range occupied by a black-pixel cluster that is at least a certain size and contains rectangular white-pixel clusters in good alignment is treated as a table region; a region where irregularly shaped pixel clusters are scattered is treated as a photograph region; and any other pixel cluster of arbitrary shape is treated as a drawing region.
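
The first two rules above (character-equivalent clusters and flat clusters) can be illustrated with a small sketch. The function name `classify_cluster` and every threshold in it are assumptions chosen for illustration; the description gives no numeric values, and the table, photograph, and drawing rules need more context than a single bounding box and are not covered here.

```python
def classify_cluster(width: float, height: float,
                     char_min: float = 8, char_max: float = 64,
                     flat_ratio: float = 10.0) -> str:
    """Classify one pixel cluster from its bounding-box size (both > 0).

    All thresholds are hypothetical; the description only says "aspect
    ratio close to 1", "size within a certain range", and "flat".
    """
    long_side, short_side = max(width, height), min(width, height)
    aspect = long_side / short_side
    if aspect >= flat_ratio:
        return "line"        # flat pixel cluster
    if aspect <= 1.5 and char_min <= long_side <= char_max:
        return "character"   # aspect ratio near 1, size in range
    return "other"           # left to the remaining rules
```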

ブロックセレクション処理で得られた各ブロックに対するブロック情報を図５に示す。これらのブロック毎の情報は以降に説明するベクトル化、或いは検索の為の情報として用いる。 FIG. 5 shows block information for each block obtained by the block selection process. Information for each block is used as information for vectorization or search described below.

ポインター情報の検出を説明する。 The detection of pointer information will be described.

次に、ステップ１２２で示すファイルの格納位置をイメージ情報から抽出する為のＯＣＲ／ＯＭＲ処理について説明する。 Next, the OCR / OMR process for extracting the file storage position shown in step 122 from the image information will be described.

図６は原稿画像中に付加された２次元バーコード（ＱＲコードシンボル）を復号して、データ文字列を出力する過程を示すフローチャートである。２次元バーコードの付加された原稿３１０の一例を図７に示す。 FIG. 6 is a flowchart showing a process of decoding a two-dimensional barcode (QR code symbol) added to a document image and outputting a data character string. An example of a document 310 to which a two-dimensional barcode is added is shown in FIG.

まず、データ処理装置１１５内のページメモリに格納された原稿３１０を表すイメージ画像をＣＰＵ（不図示）で走査して、先に説明したブロックセレクション処理の結果から所定の２次元バーコードシンボル３１１の位置を検出する。ＱＲコードの位置検出パターンは、シンボルの４隅のうちの３隅に配置される同一の位置検出要素パターンから構成される（ステップ３００）。 First, the image representing the document 310 stored in the page memory of the data processing device 115 is scanned by a CPU (not shown), and the position of the predetermined two-dimensional barcode symbol 311 is detected from the result of the block selection process described above. The position detection pattern of a QR code consists of identical position detection element patterns placed at three of the four corners of the symbol (step 300).

次に、位置検出パターンに隣接する形式情報を復元し、シンボルに適用されている誤り訂正レベルおよびマスクパターンを得る（ステップ３０１）。 Next, the format information adjacent to the position detection pattern is restored, and the error correction level and mask pattern applied to the symbol are obtained (step 301).

シンボルの型番を決定した（ステップ３０２）後、形式情報で得られたマスクパターンを使って符号化領域ビットパターンをＸＯＲ演算することによってマスク処理を解除する（ステップ３０３）。 After determining the symbol model number (step 302), the mask process is canceled by performing an XOR operation on the encoded area bit pattern using the mask pattern obtained from the format information (step 303).
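
The unmasking of step 303 can be sketched as below. The eight mask conditions are written as they are commonly stated for the QR code standard, with `i` as the row and `j` as the column, and should be checked against JIS X 0510 before use. As a simplifying assumption, the sketch XORs every module of the matrix; a real decoder would skip modules outside the encoding region (position detection, timing, and format information modules). The names `MASK_CONDITIONS` and `unmask` are hypothetical.

```python
# Mask conditions per mask pattern reference (i = row, j = column).
MASK_CONDITIONS = {
    0: lambda i, j: (i + j) % 2 == 0,
    1: lambda i, j: i % 2 == 0,
    2: lambda i, j: j % 3 == 0,
    3: lambda i, j: (i + j) % 3 == 0,
    4: lambda i, j: (i // 2 + j // 3) % 2 == 0,
    5: lambda i, j: (i * j) % 2 + (i * j) % 3 == 0,
    6: lambda i, j: ((i * j) % 2 + (i * j) % 3) % 2 == 0,
    7: lambda i, j: ((i * j) % 3 + (i + j) % 2) % 2 == 0,
}


def unmask(modules, mask_pattern):
    """XOR each module with the mask bit; applying it twice restores the input."""
    condition = MASK_CONDITIONS[mask_pattern]
    return [
        [bit ^ int(condition(i, j)) for j, bit in enumerate(row)]
        for i, row in enumerate(modules)
    ]
```

Because XOR is its own inverse, the same function serves for masking and for releasing the mask.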

尚、モデルに対応する配置規則に従い、シンボルキャラクタを読取り、メッセージのデータ及び誤り訂正コード語を復元する（ステップ３０４）。 The symbol character is read in accordance with the arrangement rule corresponding to the model, and the message data and the error correction code word are restored (step 304).

復元されたコード上に、誤りがあるかどうかの検出を行い（ステップ３０５）、誤りが検出された場合、ステップ３０６に分岐し、これを訂正する。 It is detected whether or not there is an error on the restored code (step 305). If an error is detected, the process branches to step 306 and is corrected.

誤り訂正されたデータより、モード指示子および文字数指示子に基づいて、データコード語をセグメントに分割する（ステップ３０７）。 Based on the mode indicator and the character number indicator, the data code word is divided into segments from the error-corrected data (step 307).

最後に、仕様モードに基づいてデータ文字を復号し、結果を出力する（ステップ３０８）。 Finally, the data character is decoded based on the specification mode, and the result is output (step 308).

尚、２次元バーコード内に組み込まれたデータは、対応するファイルのアドレス情報を表しており、例えばファイルサーバー名およびファイル名からなるパス情報で構成される。或いは、対応するファイルへのＵＲＬで構成される。 The data incorporated in the two-dimensional bar code represents the address information of the corresponding file, and is composed of path information including a file server name and a file name, for example. Alternatively, it consists of a URL to the corresponding file.
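
Splitting a decoded pointer string into a server part and a file part might look like the following sketch. The description does not specify the path syntax, so a UNC-style form (`\\server\dir\file`) is assumed for the path case; the function name `parse_pointer` and the example host names are hypothetical.

```python
from urllib.parse import urlsplit


def parse_pointer(pointer: str):
    """Return (server, file_path) for a URL or an assumed UNC-style path."""
    if "://" in pointer:
        parts = urlsplit(pointer)
        return parts.hostname or "", parts.path
    # Assumed form: \\server\dir\file (not specified in the description).
    stripped = pointer.lstrip("\\")
    server, _, path = stripped.partition("\\")
    return server, path
```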

本実施例ではポインター情報が２次元バーコードを用いて付与された原稿３１０について説明したが、直接文字列でポインター情報が記録される場合は所定のルールに従った文字列のブロックを先のブロックセレクション処理で検出し、該、ポインター情報を示す文字列の各文字を文字認識する事で、直接元ファイルのアドレス情報を得る事が可能である。 This embodiment has described the document 310 to which pointer information is attached as a two-dimensional barcode. When the pointer information is instead recorded directly as a character string, a block containing a character string that follows a predetermined rule is detected by the block selection process described earlier, and each character of the string representing the pointer information is recognized, so that the address information of the original file is obtained directly.

又、或いは図７の文書３１０の文字ブロック３１２、或いは３１３の文字列に対して隣接する文字と文字の間隔等に視認し難い程度の変調を加え、該文字間隔に情報を埋め込むことでもポインター情報を付与できる。該、所謂透かし情報は後述する文字認識処理を行う際に各文字の間隔を検出すれば、ポインター情報が得られる。又、自然画３１４の中に電子透かしとしてポインター情報を付加する事も可能である。 Alternatively, pointer information can be attached by modulating the spacing between adjacent characters in the character strings of character block 312 or 313 of the document 310 in FIG. 7 to a degree that is hard to perceive visually, thereby embedding information in the character spacing. Such so-called watermark information yields the pointer information when the spacing of each character is detected during the character recognition processing described later. Pointer information can also be added to the natural image 314 as a digital watermark.

権限があるかどうかを判断する処理を示す。 The processing that determines whether the user has authority is described next.

次に、本実施例ではポインター情報から電子ファイルを取得する権限（ステップ１４１）、電子ファイルが無かった場合にファイル検索処理を行える権限（ステップ１４２）、イメージデータをベクトルデータに変換できる権限（ステップ１４３）をユーザー毎に割り当てることによって、それぞれの処理をユーザー毎に制限している。制限のかけ方としてはステップ１４０でＭＦＰにログインしたユーザー名を用いて、そのユーザー名がそれぞれの処理に対する権限があるかどうかを認証する。 In this embodiment, the authority to acquire an electronic file from pointer information (step 141), the authority to perform file search processing when no electronic file is found (step 142), and the authority to convert image data into vector data (step 143) are assigned to each user, so that each process is restricted on a per-user basis. The restriction is enforced by taking the user name with which the user logged in to the MFP in step 140 and authenticating whether that user name has authority for each process.
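
A per-user rights table of this kind could be modeled as in the sketch below. The user names, the `Operation` enum, and the `is_authorized` function are all hypothetical; the description states only that the three rights are assigned per user and checked against the login name from step 140.

```python
from enum import Enum


class Operation(Enum):
    FETCH_ORIGINAL = 141  # acquire the electronic file from pointer information
    SEARCH_FILES = 142    # search the file servers when no file is found
    VECTORIZE = 143       # convert image data into vector data


# Hypothetical rights table keyed by the MFP login name.
USER_RIGHTS = {
    "alice": {Operation.FETCH_ORIGINAL, Operation.SEARCH_FILES, Operation.VECTORIZE},
    "bob": {Operation.VECTORIZE},
}


def is_authorized(login_name: str, operation: Operation) -> bool:
    """Unknown users are treated as having no rights (an assumption)."""
    return operation in USER_RIGHTS.get(login_name, set())
```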

ポインター情報によるファイル検索の内容を示す。 File search based on pointer information is described next.

次に、ポインター情報からの電子ファイルの検索について図８のフローチャートを使用して説明する。 Next, retrieval of an electronic file from pointer information will be described using the flowchart of FIG.

まず、ポインタ情報に含まれるアドレスに基づいて，ファイルサーバーを特定する（ステップ４００）。 First, the file server is specified based on the address included in the pointer information (step 400).

ここでファイルサーバーとは、クライアントＰＣ１０２や、データベース１０５を内蔵する文書管理サーバー１０６や、記憶装置１１１を内蔵するＭＦＰ１００自身を指す。 Here, the file server refers to the client PC 102, the document management server 106 including the database 105, and the MFP 100 itself including the storage device 111.

ここでアドレスとは、ＵＲＬや、サーバー名とファイル名からなるパス情報である。 Here, the address is a URL or path information consisting of a server name and a file name.

ファイルサーバーが特定できたら、ファイルサーバーに対してアドレスを転送する（ステップ４０１）。ファイルサーバーは，アドレスを受信すると，該当するファイルを検索する（ステップ４０２）。ファイルが存在しない場合（ステップ４０３−Ｎ）には、ＭＦＰに対してその旨通知する。 Once the file server has been identified, the address is transferred to it (step 401). On receiving the address, the file server searches for the corresponding file (step 402). If the file does not exist (step 403-N), the file server notifies the MFP accordingly.

ファイルが存在した場合（ステップ４０３−Ｙ）には、図３で説明した様に、ファイルのアドレスを通知（ステップ１３４）すると共に、ユーザーの希望する処理が画像ファイルデータの取得であれば、ＭＦＰに対してファイルを転送する（ステップ４０８）。 If the file exists (step 403-Y), the address of the file is reported (step 134) as described with reference to FIG. 3, and, if the process desired by the user is acquisition of the image file data, the file is transferred to the MFP (step 408).

ファイル検索処理を示す。 The file search process is described next.

次に、図３のステップ１２６で示すファイル検索処理の詳細について図５、図１０を使用して説明を行う。 Next, the file search process shown in step 126 of FIG. 3 is described in detail with reference to FIGS. 5 and 10.

ステップ１２６の処理は、前述したように、ステップ１２４で入力原稿（入力ファイル）にポインタ情報が存在しなかった場合、または、ポインタ情報は在るが電子ファイルが見つからなかった場合、或いは電子ファイルがイメージファイルであった場合に行われる。 As described above, the processing of step 126 is performed when step 124 finds no pointer information in the input document (input file), when pointer information exists but the electronic file cannot be found, or when the electronic file is an image file.

ここでは、ステップ１２２の結果、抽出された各ブロック及び入力ファイルが、図５に示す情報（ブロック情報、入力ファイル情報）を備えるものとする。情報内容として、属性、座標位置、幅と高さのサイズ、ＯＣＲ情報有無を例としてあげる。 Here, it is assumed that each block and input file extracted as a result of step 122 includes information (block information and input file information) shown in FIG. Examples of information contents include attributes, coordinate positions, width and height sizes, and presence / absence of OCR information.

属性は、文字、線、写真、絵、表その他に分類する。また簡単に説明を行うため、ブロックは座標Ｘの小さい順、即ち（例、Ｘ１＜Ｘ２＜Ｘ３＜Ｘ４＜Ｘ５＜Ｘ６）にブロック１、ブロック２、ブロック３、ブロック４、ブロック５，ブロック６と名前をつけている。ブロック総数は、入力ファイル中の全ブロック数であり、図１０の場合は、ブロック総数は６である。以下、これらの情報を使用して、データベース内から、入力ファイルに類似したファイルのレイアウト検索を行うフローチャートを図１０に示す。ここで、データベースファイルは、図５と同様の情報を備えることを前提とする。 Attributes are classified into character, line, photograph, picture, table, and others. For simplicity, the blocks are named block 1, block 2, block 3, block 4, block 5, and block 6 in ascending order of their X coordinate (for example, X1 < X2 < X3 < X4 < X5 < X6). The total number of blocks is the number of all blocks in the input file; in the case of FIG. 10, the total number of blocks is 6. FIG. 10 shows a flowchart that uses this information to perform a layout search of the database for files similar to the input file. The database files are assumed to carry the same kind of information as shown in FIG. 5.

フローチャートの流れは、入力ファイルとデータベース中のファイルを順次比較するものである。まず、ステップ５１０にて、後述する類似率などの初期化を行う。次に、ステップ５１１にてブロック総数の比較を行い、ここで、真の場合、さらにファイル内のブロックの情報を順次比較する。 The flow of the flowchart is to sequentially compare the input file and the file in the database. First, in step 510, initialization such as a similarity rate described later is performed. Next, in step 511, the total number of blocks is compared. If true, the information of the blocks in the file is further sequentially compared.

ブロックの情報比較では、ステップ５１３，５１５，５１８にて、属性類似率、サイズ類似率、ＯＣＲ類似率をそれぞれ算出し、ステップ５２２にてそれらをもとに総合類似率を算出する。各類似率の算出方法については、公知の技術が用いられるので説明を省略する。 In the block information comparison, the attribute similarity, size similarity, and OCR similarity are calculated in steps 513, 515, and 518, respectively, and the overall similarity is calculated from them in step 522. Known techniques are used to calculate each similarity, so their description is omitted.

ステップ５２３にて総合類似率が、予め設定された閾値Ｔｈより高ければステップ５２４にてそのファイルを類似候補としてあげる。但し、図中のＮ、Ｗ、Ｈは、入力ファイルのブロック総数、各ブロック幅、各ブロック高さとし、ΔＮ、ΔＷ、ΔＨは、入力ファイルのブロック情報を基準として誤差を考慮したものである。ｎ、ｗ、ｈは、データベースファイルのブロック総数、各ブロック幅、各ブロック高さとする。また、不図示ではあるが、ステップ５１４にてサイズ比較時に、位置情報ＸＹの比較などを行ってもよい。 In step 523, if the overall similarity is higher than a preset threshold Th, the file is listed as a similarity candidate in step 524. However, N, W, and H in the figure are the total number of blocks of the input file, each block width, and each block height, and ΔN, ΔW, and ΔH consider errors based on the block information of the input file. n, w, and h are the total number of blocks in the database file, each block width, and each block height. Further, although not shown, the position information XY may be compared at the time of size comparison in step 514.
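
Since the description leaves the individual similarity measures to known techniques, the following is only one possible sketch of the layout search. The size similarity formula, the equal weighting of attribute and size similarity, the omission of the OCR similarity, and all names (`size_similarity`, `overall_similarity`, `find_candidates`) are assumptions made for illustration. Blocks are represented as `(attribute, width, height)` tuples with positive sizes.

```python
def size_similarity(W, H, w, h):
    """1.0 for identical sizes, approaching 0.0 as sizes diverge (assumed formula)."""
    return 1.0 - (abs(W - w) / max(W, w) + abs(H - h) / max(H, h)) / 2.0


def overall_similarity(input_blocks, db_blocks, delta_n=0):
    """Compare block counts first (step 511), then the blocks pairwise."""
    if abs(len(input_blocks) - len(db_blocks)) > delta_n:
        return 0.0
    pairs = list(zip(input_blocks, db_blocks))
    if not pairs:
        return 0.0
    total = 0.0
    for (A, W, H), (a, w, h) in pairs:
        attribute = 1.0 if A == a else 0.0
        total += 0.5 * attribute + 0.5 * size_similarity(W, H, w, h)
    return total / len(pairs)


def find_candidates(input_blocks, database, th):
    """Return names of database files whose overall similarity exceeds Th."""
    return [name for name, blocks in database.items()
            if overall_similarity(input_blocks, blocks) > th]
```

The candidates returned here correspond to the files that would be saved in step 524 and later shown to the operator.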

以上、検索の結果、類似度が閾値Ｔｈより高く、候補として保存されたデータベースファイル（ステップ５２４）をサムネイル等で表示（ステップ１２７）する。複数の中から操作者の選択が必要なら操作者の入力操作よってファイルの特定を行う。 As a result of the search, the database files whose similarity exceeds the threshold Th and that were saved as candidates (step 524) are displayed as thumbnails or the like (step 127). If the operator must choose among several candidates, the file is identified by the operator's input operation.

ベクトル化処理を示す。 A vectorization process is shown.

ファイルサーバーに元ファイルが存在しない場合は、図４に示すイメージデータを各ブロック毎にベクトル化する。次にステップ１２９で示されるベクトル化について詳説する。まず文字ブロックに対しては各文字に対して文字認識処理を行う。 When the original file does not exist in the file server, the image data shown in FIG. 4 is vectorized for each block. Next, the vectorization shown in step 129 will be described in detail. First, character recognition processing is performed on each character in the character block.

『文字認識』 "Character recognition"
文字認識部では、文字単位で切り出された画像に対し、パターンマッチの一手法を用いて認識を行い、対応する文字コードを得る。この認識処理は、文字画像から得られる特徴を数十次元の数値列に変換した観測特徴ベクトルと、あらかじめ字種毎に求められている辞書特徴ベクトルと比較し、最も距離の近い字種を認識結果とする処理である。特徴ベクトルの抽出には種々の公知手法があり、たとえば、文字をメッシュ状に分割し、各メッシュ内の文字線を方向別に線素としてカウントしたメッシュ数次元ベクトルを特徴とする方法がある。 The character recognition unit recognizes each image cut out in units of characters using a pattern matching technique and obtains the corresponding character code. This recognition process compares an observed feature vector, obtained by converting the features of the character image into a numerical sequence of several tens of dimensions, with dictionary feature vectors computed in advance for each character type, and takes the character type with the smallest distance as the recognition result. Various known methods exist for extracting the feature vector; for example, one method divides the character into a mesh and uses as the feature a vector, with as many dimensions as there are mesh cells, obtained by counting the character strokes in each cell as line elements by direction.
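
The nearest-dictionary-vector matching can be sketched as follows. Euclidean distance is an assumption (the description does not name the distance measure), and the function name `recognize` and the toy two-dimensional dictionary in the usage note are hypothetical; real feature vectors would have several tens of dimensions.

```python
import math


def recognize(observed, dictionary):
    """Return (character, distance) for the closest dictionary feature vector.

    `dictionary` maps each character type to its feature vector.
    """
    best = min(dictionary, key=lambda ch: math.dist(observed, dictionary[ch]))
    return best, math.dist(observed, dictionary[best])
```

For example, with `dictionary = {"A": (0.0, 1.0), "B": (1.0, 0.0)}`, the observed vector `(0.1, 0.9)` is matched to `"A"`. Returning the distance alongside the character lets a caller reject matches whose distance is too large.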

ブロックセレクション（ステップ１２１）で抽出された文字領域に対して文字認識を行う場合は、まず該当領域に対し横書き、縦書きの判定をおこない、各々対応する方向に行を切り出し、その後文字を切り出して文字画像を得る。横書き、縦書きの判定は、該当領域内で画素値に対する水平／垂直の射影を取り、水平射影の分散が大きい場合は横書き領域、垂直射影の分散が大きい場合は縦書き領域と判断すればよい。文字列および文字への分解は、横書きならば水平方向の射影を利用して行を切り出し、さらに切り出された行に対する垂直方向の射影から、文字を切り出すことでおこなう。縦書きの文字領域に対しては、水平と垂直を逆にすればよい。尚この時文字のサイズが検出出来る。 When character recognition is performed on a character region extracted by block selection (step 121), the region is first judged to be horizontally or vertically written, lines are cut out in the corresponding direction, and characters are then cut out to obtain character images. To judge the writing direction, horizontal and vertical projections of the pixel values within the region are taken; the region is judged to be horizontally written if the variance of the horizontal projection is large, and vertically written if the variance of the vertical projection is large. To decompose the region into strings and characters in the horizontal case, lines are cut out using the horizontal projection, and characters are then cut out from the vertical projection of each line. For a vertically written region, horizontal and vertical are simply swapped. The character size can also be detected at this stage.
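
The writing-direction test can be sketched as below for a binarized region given as a list of rows of 0/1 pixels. The sketch takes the horizontal projection to be the per-row pixel count and the vertical projection to be the per-column pixel count, and resolves a tie in favor of horizontal writing; both choices, and the function name `writing_direction`, are assumptions.

```python
from statistics import pvariance


def writing_direction(region):
    """Return "horizontal" or "vertical" from projection variances."""
    row_profile = [sum(row) for row in region]        # horizontal projection
    col_profile = [sum(col) for col in zip(*region)]  # vertical projection
    if pvariance(row_profile) >= pvariance(col_profile):
        return "horizontal"
    return "vertical"
```

Horizontally written text alternates dense rows (text lines) with nearly empty rows (line gaps), which is what makes the row profile's variance large.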

『フォント認識』 "Font recognition"
文字認識の際に用いる、字種数ぶんの辞書特徴ベクトルを、文字形状種すなわちフォント種に対して複数用意し、マッチングの際に文字コードとともにフォント種を出力することで、文字のフォントが認識出来る。 The font of a character can be recognized by preparing multiple sets of the dictionary feature vectors used in character recognition, each set covering all character types, one set per character shape type (that is, per font type), and outputting the font type together with the character code at matching time.

『文字のベクトル化』 "Vectorization of characters"
前記文字認識およびフォント認識よって得られた、文字コードおよびフォント情報を用いて、各々あらかじめ用意されたアウトラインデータを用いて、文字部分の情報をベクトルデータに変換する。なお、元原稿がカラーの場合は、カラー画像から各文字の色を抽出してベクトルデータとともに記録する。 Using the character code and font information obtained by the character recognition and font recognition described above, the information of the character portion is converted into vector data by means of outline data prepared in advance for each. When the original document is in color, the color of each character is extracted from the color image and recorded together with the vector data.

以上の処理により文字ブロックに属するイメージ情報をほぼ形状、大きさ、色が忠実なベクトルデータに変換出来る。 Through the above processing, the image information belonging to the character block can be converted into vector data whose shape, size and color are almost faithful.

『文字以外の部分のベクトル化』 "Vectorization of non-character parts"
ブロックセレクション処理（ステップ１２１）で、図画あるいは線、表領域とされた領域を対象に、中で抽出された画素塊の輪郭をベクトルデータに変換する。具体的には、輪郭をなす画素の点列を角と看倣される点で区切って、各区間を部分的な直線あるいは曲線で近似する。角とは曲率が極大となる点であり、曲率が極大となる点は、図１１に図示するように、任意点Ｐｉに対し左右ｋ個の離れた点Ｐｉ−ｋ，Ｐｉ＋ｋの間に弦を引いたとき、この弦とＰｉの距離が極大となる点として求められる。さらに、Ｐｉ−ｋ，Ｐｉ＋ｋ間の弦の長さ／弧の長さをＲとし、Ｒの値が閾値以下である点を角とみなすことができる。角によって分割された後の各区間は、直線は点列に対する最小二乗法など、曲線は３次スプライン関数などを用いてベクトル化することができる。 For regions classified as drawing, line, or table regions by the block selection process (step 121), the contours of the pixel clusters extracted within them are converted into vector data. Specifically, the sequence of points forming a contour is divided at points regarded as corners, and each section is approximated by a partial straight line or curve. A corner is a point of maximal curvature. As illustrated in FIG. 11, such a point is found by drawing a chord between the points Pi-k and Pi+k, which lie k points to either side of an arbitrary point Pi, and taking the point where the distance between this chord and Pi is maximal. In addition, letting R be the ratio of the chord length to the arc length between Pi-k and Pi+k, a point where R is at or below a threshold can be regarded as a corner. Each section obtained by dividing at the corners can be vectorized, using for example the least squares method on the point sequence for straight lines and a cubic spline function for curves.
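
The chord-to-arc ratio test for corners can be sketched as follows for a closed contour given as a list of (x, y) points. Only the R criterion is implemented; the local-maximum test on the chord-to-point distance is not. The function name `find_corners` and the parameter values in the usage note are assumptions.

```python
import math


def find_corners(contour, k, r_threshold):
    """Return indices i where chord(P[i-k], P[i+k]) / arc length <= threshold."""
    n = len(contour)
    corners = []
    for i in range(n):
        a = contour[(i - k) % n]
        b = contour[(i + k) % n]
        chord = math.dist(a, b)
        # Arc length: sum of the 2k segments from P[i-k] to P[i+k].
        arc = sum(
            math.dist(contour[(i + s) % n], contour[(i + s + 1) % n])
            for s in range(-k, k)
        )
        if arc > 0 and chord / arc <= r_threshold:
            corners.append(i)
    return corners
```

On a straight run the chord equals the arc, so R is 1; at a right-angle corner with k = 2 and unit steps, R is about 0.71, so a threshold such as 0.75 separates the two cases in that setting.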

また、対象が内輪郭を持つ場合、ブロックセレクションで抽出した白画素輪郭の点列を用いて、同様に部分的直線あるいは曲線で近似する。 Further, when the target has an inner contour, it is similarly approximated by a partial straight line or a curve using the point sequence of the white pixel contour extracted by block selection.

以上のように、輪郭の区分線近似を用いれば、任意形状の図形のアウトラインをベクトル化することができる。元原稿がカラーの場合は、カラー画像から図形の色を抽出してベクトルデータとともに記録する。 As described above, the outline of a figure having an arbitrary shape can be vectorized by using the contour line approximation. If the original document is in color, the figure color is extracted from the color image and recorded together with vector data.

さらに、図１２に示す様に、ある区間で外輪郭と、内輪郭あるいは別の外輪郭が近接している場合、２つの輪郭線をひとまとめにし、太さを持った線として表現することができる。具体的には、ある輪郭の各点Ｐｉから別輪郭上で最短距離となる点Ｑｉまで線を引き、各距離ＰＱｉが平均的に一定長以下の場合、注目区間はＰＱｉ中点を点列として直線あるいは曲線で近似し、その太さはＰＱｉの平均値とする。線や線の集合体である表罫線は、前記のような太さを持つ線の集合として効率よくベクトル表現することができる。 Furthermore, as shown in FIG. 12, when an outer contour is close to an inner contour or to another outer contour over some section, the two contour lines can be combined and expressed as a single line having thickness. Specifically, a line is drawn from each point Pi on one contour to the point Qi on the other contour at the shortest distance; if the distances PQi are on average at or below a fixed length, the section of interest is approximated by a straight line or curve through the midpoints of PQi as the point sequence, and its thickness is the average of PQi. Lines, and table ruled lines, which are collections of lines, can be expressed efficiently as vectors in the form of such sets of lines having thickness.
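
Merging two nearby contours into one thick line can be sketched as below. The brute-force nearest-point search, the function name `merge_contours`, and the choice to return `None` when the contours are too far apart are assumptions; fitting a line or curve through the returned midpoints is left out.

```python
import math


def merge_contours(p_contour, q_contour, max_mean_distance):
    """Return (midpoints, thickness), or None if the mean PQ distance is too large."""
    midpoints, distances = [], []
    for p in p_contour:
        # Qi: the point on the other contour closest to Pi.
        q = min(q_contour, key=lambda c: math.dist(p, c))
        distances.append(math.dist(p, q))
        midpoints.append(((p[0] + q[0]) / 2, (p[1] + q[1]) / 2))
    thickness = sum(distances) / len(distances)  # average of PQi
    if thickness > max_mean_distance:
        return None
    return midpoints, thickness
```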

尚、先に文字ブロックに対する文字認識処理を用いたベクトル化を説明したが、該文字認識処理の結果、辞書からの距離が最も近い文字を認識結果として用いるが、この距離が所定値以上の場合は、必ずしも本来の文字に一致せず、形状が類似する文字に誤認識している場合が多い。従って本発明では、この様な文字に対しては、上記した様に、一般的な線画と同じに扱い、該文字をアウトライン化する。即ち、従来文字認識処理で誤認識を起こす文字に対しても誤った文字にベクトル化されず、可視的にイメージデータに忠実なアウトライン化によるベクトル化が行える。 The vectorization using the character recognition process for the character block has been described above. As a result of the character recognition process, the character having the closest distance from the dictionary is used as the recognition result, but this distance is not less than a predetermined value. Are not necessarily identical to the original characters and are often erroneously recognized as characters having similar shapes. Therefore, in the present invention, such characters are handled in the same manner as general line drawings as described above, and the characters are outlined. That is, even a character that is erroneously recognized in the conventional character recognition processing is not vectorized into an erroneous character, and can be vectorized by outline rendering that is visually faithful to image data.

又、写真と判定されたブロックに対しては本発明では、ベクトル化出来ない為、イメージデータのままとする。 In the present invention, since it is not possible to vectorize a block determined to be a photograph, it is left as image data.

図形認識を説明する。 Figure recognition will be described.

上述したように任意形状の図形のアウトラインをベクトル化した後、これらベクトル化された区分線を図形オブジェクト毎にグループ化する処理について説明する。 A process of grouping the vectorized dividing lines for each graphic object after vectorizing the outline of the arbitrarily shaped graphic as described above will be described.

図１３は、ベクトルデータを図形オブジェクト毎にグループ化するまでのフローチャートを示している。まず、各ベクトルデータの始点、終点を算出する（７００）。次に各ベクトルの始点、終点情報を用いて、図形要素を検出する（７０１）。図形要素の検出とは、区分線が構成している閉図形を検出することである。検出に際しては、閉形状を構成する各ベクトルはその両端にそれぞれ連結するベクトルを有しているという原理を応用し、検出を行う。次に図形要素内に存在する他の図形要素、もしくは区分線をグループ化し、一つの図形オブジェクトとする（７０２）。また、図形要素内に他の図形要素、区分線が存在しない場合は図形要素を図形オブジェクトとする。 FIG. 13 shows a flowchart until the vector data is grouped for each graphic object. First, the start point and end point of each vector data are calculated (700). Next, a graphic element is detected using the start point and end point information of each vector (701). The detection of a graphic element is to detect a closed graphic formed by a dividing line. In detection, the detection is performed by applying the principle that each vector constituting the closed shape has vectors connected to both ends thereof. Next, other graphic elements or dividing lines existing in the graphic element are grouped to form one graphic object (702). If there is no other graphic element or dividing line in the graphic element, the graphic element is set as a graphic object.

図１４は，図形要素を検出するフローチャートを示している。先ず、ベクトルデータより両端に連結していない不要なベクトルを除去し、閉図形構成ベクトルを抽出する（７１０）。次に閉図形構成ベクトルの中から該ベクトルの始点を開始点とし、時計回りに順にベクトルを追っていく。開始点に戻るまで行い、通過したベクトルを全て一つの図形要素を構成する閉図形としてグループ化する（７１１）。また、閉図形内部にある閉図形構成ベクトルも全てグループ化する。さらにまだグループ化されていないベクトルの始点を開始点とし、同様の処理を繰り返す。最後に、７１０で除去された不要ベクトルのうち、７１１で閉図形としてグループ化されたベクトルに接合しているものを検出し一つの図形要素としてグループ化する（７１２）。 FIG. 14 shows a flowchart for detecting a graphic element. First, unnecessary vectors not connected to both ends are removed from the vector data, and a closed graphic component vector is extracted (710). Next, from the closed figure constituent vectors, the starting point of the vector is used as a starting point, and the vectors are sequentially followed in the clockwise direction. The process is repeated until the start point is returned, and all the passed vectors are grouped as a closed graphic constituting one graphic element (711). In addition, all closed graphic constituent vectors inside the closed graphic are also grouped. Further, the same processing is repeated with the starting point of a vector not yet grouped as a starting point. Finally, among the unnecessary vectors removed at 710, those joined to the vector grouped as a closed figure at 711 are detected and grouped as one figure element (712).
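
Step 710, the removal of vectors that are not connected at both ends, can be sketched as follows. Vectors are represented as `(start, end)` pairs of points. Repeating the removal until nothing changes is an assumption (the description does not say whether one pass suffices), and the function name `closed_figure_vectors` is hypothetical. The clockwise tracing of step 711 and the re-attachment of step 712 are not shown.

```python
from collections import Counter


def closed_figure_vectors(vectors):
    """Drop vectors with an endpoint that no other vector shares."""
    remaining = list(vectors)
    while True:
        # Count how many vector endpoints meet at each point.
        degree = Counter(point for vector in remaining for point in vector)
        kept = [v for v in remaining if degree[v[0]] > 1 and degree[v[1]] > 1]
        if len(kept) == len(remaining):
            return kept
        remaining = kept
```

For a triangle with one extra segment hanging off a vertex, the hanging segment is removed and the three triangle sides remain.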

以上によって図形ブロックを個別に再利用可能な個別の図形オブジェクトとして扱う事が可能になる。 As described above, the graphic block can be handled as an individual graphic object that can be reused individually.

アプリデータへの変換処理を説明する。 The conversion process to application data will be described.

ところで、一頁分のイメージデータをブロックセレクション処理（１２１）し、ベクトル化処理（１２９）した結果は図１５に示す様な中間データ形式のファイルとして変換されているが、このようなデータ形式はドキュメント・アナリシス・アウトプット・フォーマット（ＤＡＯＦ）と呼ばれる。 The result of applying block selection (121) and vectorization (129) to one page of image data is converted into a file in an intermediate data format as shown in FIG. 15. This data format is called the Document Analysis Output Format (DAOF).

図１５はＤＡＯＦのデータ構造を示す図である。 FIG. 15 shows the data structure of DAOF.

図１５において、７９１はＨｅａｄｅｒであり、処理対象の文書画像データに関する情報が保持される。 In FIG. 15, reference numeral 791 denotes a header, which holds information relating to document image data to be processed.

レイアウト記述データ部７９２では、文書画像データ中のＴＥＸＴ（文字）、ＴＩＴＬＥ（タイトル）、ＣＡＰＴＩＯＮ（キャプション）、ＬＩＮＥＡＲＴ（線画）、ＰＩＣＴＵＲＥ（自然画）、ＦＲＡＭＥ（枠）、ＴＡＢＬＥ（表）等の属性毎に認識された各ブロックの属性情報とその矩形アドレス情報を保持する。文字認識記述データ部７９３では、ＴＥＸＴ、ＴＩＴＬＥ、ＣＡＰＴＩＯＮ等のＴＥＸＴブロックを文字認識して得られる文字認識結果を保持する。表記述データ部７９４では、ＴＡＢＬＥブロックの構造の詳細を格納する。画像記述データ部７９５は、ＰＩＣＴＵＲＥやＬＩＮＥＡＲＴ等のブロックのイメージデータを文書画像データから切り出して保持する。 The layout description data portion 792 holds the attribute information and rectangle address information of each block recognized in the document image data, for each attribute such as TEXT (characters), TITLE (title), CAPTION (caption), LINEART (line drawing), PICTURE (natural image), FRAME (frame), and TABLE (table). The character recognition description data portion 793 holds the character recognition results obtained by recognizing TEXT blocks such as TEXT, TITLE, and CAPTION. The table description data portion 794 stores the details of the structure of TABLE blocks. The image description data portion 795 cuts out the image data of blocks such as PICTURE and LINEART from the document image data and holds it.

このようなＤＡＯＦは、中間データとしてのみならず、それ自体がファイル化されて保存される場合もあるが、このファイルの状態では、所謂一般の文書作成アプリケーションで個々のオブジェクトを再利用する事は出来ない。そこで、次にこのＤＡＯＦからアプリデータに変換する処理１３０について詳説する。 Such a DAOF is used not only as intermediate data but is in some cases itself saved as a file. In that file state, however, individual objects cannot be reused in ordinary document creation applications. The process 130 of converting the DAOF into application data is therefore described in detail next.

図１６は、全体の概略フローである。 FIG. 16 is an overall schematic flow.

８０００は、ＤＡＯＦデータの入力を行う。 In 8000, DAOF data is input.

８００２は、アプリデータの元となる文書構造ツリー生成を行う。 8002 generates a document structure tree that is the source of application data.

８００４は、文書構造ツリーを元に、ＤＡＯＦ内の実データを流し込み、実際のアプリデータを生成する。 In step 8004, actual data in the DAOF is poured based on the document structure tree to generate actual application data.

図１７は、８００２文書構造ツリー生成部の詳細フロー、図１８は、文書構造ツリーの説明図である。全体制御の基本ルールとして、処理の流れはミクロブロック（単一ブロック）からマクロブロック（ブロックの集合体）へ移行する。 FIG. 17 is a detailed flow of the document structure tree generation unit 8002, and FIG. 18 is an explanatory diagram of the document structure tree. As a basic rule of overall control, the flow of processing moves from micro blocks (single blocks) to macro blocks (aggregates of blocks).

以後ブロックとは、ミクロブロック、及びマクロブロック全体を指す。 Hereinafter, the block refers to the micro block and the entire macro block.

８１００は、ブロック単位で縦方向の関連性を元に再グループ化する。スタート直後はミクロブロック単位での判定となる。 8100 performs regrouping based on the relevance in the vertical direction in units of blocks. Immediately after the start, judgment is made in units of micro blocks.

ここで、関連性とは、距離が近い、ブロック幅（横方向の場合は高さ）がほぼ同一であることなどで定義することができる。 Here, the relevance can be defined by the fact that the distance is close and the block width (height in the horizontal direction) is substantially the same.

また、距離、幅、高さなどの情報はＤＡＯＦを参照し、抽出する。 Information such as distance, width, and height is extracted by referring to the DAOF.

図１８（ａ）は実際のページ構成、（ｂ）はその文書構造ツリーである。８１００の結果、Ｔ３，Ｔ４，Ｔ５が一つのグループＶ１、Ｔ６，Ｔ７が一つのグループＶ２が同じ階層のグループとしてまず生成される。 FIG. 18A shows an actual page configuration, and FIG. 18B shows its document structure tree. As a result of 8100, T3, T4, and T5 are generated as one group V1, and T6 and T7 are generated as one group V2 in the same hierarchy.

８１０２は、縦方向のセパレータの有無をチェックする。セパレータは、例えば物理的にはＤＡＯＦ中でライン属性を持つオブジェクトである。また論理的な意味としては、アプリ中で明示的にブロックを分割する要素である。ここでセパレータを検出した場合は、同じ階層で再分割する。 8102 checks whether or not there is a separator in the vertical direction. For example, the separator is physically an object having a line attribute in the DAOF. Also, logically, it is an element that explicitly divides a block in the application. If a separator is detected here, it is subdivided at the same level.

８１０４は、分割がこれ以上存在し得ないか否かをグループ長を利用して判定する。 8104 uses the group length to determine whether there are no more divisions.

ここで、縦方向のグループ長がページ高さとなっている場合は、文書構造ツリー生成は終了する。 If the group length in the vertical direction is the page height, the document structure tree generation ends.

図１８の場合は、セパレータもなく、グループ高さはページ高さではないので、８１０６に進む。 In the case of FIG. 18, since there is no separator and the group height is not the page height, the process proceeds to 8106.

８１０６は、ブロック単位で横方向の関連性を元に再グループ化する。ここもスタート直後の第一回目はミクロブロック単位で判定を行うことになる。 In step 8106, regrouping is performed based on the relevance in the horizontal direction in units of blocks. Here too, the first time immediately after the start is determined in units of microblocks.

関連性、及びその判定情報の定義は、縦方向の場合と同じである。 The definition of the relevance and the determination information is the same as in the vertical direction.

図１８の場合は、Ｔ１，Ｔ２でＨ１、Ｖ１，Ｖ２でＨ２、がＶ１，Ｖ２の１つ上の同じ階層のグループとして生成される。 In the case of FIG. 18, H1 is generated from T1 and T2, and H2 from V1 and V2, as groups in the same hierarchy level, one level above V1 and V2.

Step 8108 checks for the presence of a horizontal separator.

In FIG. 18, S1 is present, so it is registered in the tree, and a hierarchy level consisting of H1, S1, and H2 is generated.

Step 8110 uses the group length to determine whether no further division can exist.

If the horizontal group length equals the page width, generation of the document structure tree ends.

Otherwise, the process returns to step 8102 and repeats from the vertical relevance check at the next higher hierarchy level.

In the case of FIG. 18, the division width equals the page width, so the process ends here, and finally V0, the top-level node representing the entire page, is added to the document structure tree.

After the document structure tree is completed, application data is generated in step 8006 based on that information.

For FIG. 18, this proceeds specifically as follows.

H1 contains two blocks, T1 and T2, side by side in the horizontal direction, so it is output as two columns: the internal information of T1 (text from the character recognition result, images, and so on, obtained by referring to the DAOF) is output, the column is changed, the internal information of T2 is output, and then S1 is output.

H2 contains two blocks, V1 and V2, side by side in the horizontal direction, so it is also output as two columns: V1 outputs the internal information of T3, T4, and T5 in that order, the column is changed, and then the internal information of T6 and T7 in V2 is output.

The conversion to application data is performed in this way.
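The output order described for FIG. 18 can be illustrated with a minimal Python sketch. The `Node` class, the `kind` labels, and the `"<column break>"` marker are hypothetical names introduced only for this illustration; the actual DAOF structures and application data format are not specified here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str   # e.g. "T1", "S1", "H1", "V0"
    kind: str   # "block", "separator", "hgroup" or "vgroup" (assumed labels)
    children: List["Node"] = field(default_factory=list)


def emit(node: Node, out: List[str]) -> None:
    """Append output events for `node` in reading order."""
    if node.kind in ("block", "separator"):
        out.append(node.name)
    elif node.kind == "hgroup":
        # Children of a horizontal group sit side by side: one column each.
        for i, child in enumerate(node.children):
            if i > 0:
                out.append("<column break>")
            emit(child, out)
    else:
        # "vgroup": children are stacked from top to bottom.
        for child in node.children:
            emit(child, out)


def block(name: str) -> Node:
    return Node(name, "block")


# Document structure tree of FIG. 18 as described in the text.
v1 = Node("V1", "vgroup", [block("T3"), block("T4"), block("T5")])
v2 = Node("V2", "vgroup", [block("T6"), block("T7")])
h1 = Node("H1", "hgroup", [block("T1"), block("T2")])
h2 = Node("H2", "hgroup", [v1, v2])
v0 = Node("V0", "vgroup", [h1, Node("S1", "separator"), h2])

order: List[str] = []
emit(v0, order)
print(order)
```

Representing the tree recursively keeps the column handling local to each horizontal group, which matches the way H1 and H2 are each described as a two-column output.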

The addition of pointer information is described below.

Next, the pointer information addition processing shown in step 133 is described.

When the document to be processed has been identified by the search processing, or when the original file has been reproduced by vectorization, and the document is then to be recorded, pointer information is added at the time of recording on paper. The original file data can then be acquired easily when various kinds of processing are performed again using this document.

FIG. 19 is a flowchart showing the process of encoding a data character string serving as pointer information into a two-dimensional barcode (QR code symbol: JIS X0510) 311 and adding it to an image.

The data embedded in the two-dimensional barcode represents the address information of the corresponding file and consists, for example, of path information made up of a file server name and a file name. Alternatively, it consists of a URL to the corresponding file, or of a file ID managed in the database 105 in which the corresponding file is stored or in the storage device of the MFP 100 itself.
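As a rough illustration of the three kinds of address information listed above, the following Python sketch serializes a pointer as a tagged string before it is handed to the barcode encoder. The `Pointer` class, the `kind:value` layout, and the example server and file names are assumptions made for this sketch; the text does not define a concrete string format.

```python
from dataclasses import dataclass

# Hypothetical tags for the three address forms named in the text.
KINDS = ("path", "url", "id")


@dataclass(frozen=True)
class Pointer:
    kind: str    # "path" (file server name + file name), "url", or "id" (file ID)
    value: str

    def encode(self) -> str:
        """Serialize as 'kind:value' (an assumed layout, not from the text)."""
        if self.kind not in KINDS:
            raise ValueError(f"unknown pointer kind: {self.kind}")
        return f"{self.kind}:{self.value}"

    @staticmethod
    def decode(text: str) -> "Pointer":
        """Parse a string produced by encode()."""
        kind, sep, value = text.partition(":")
        if not sep or kind not in KINDS:
            raise ValueError(f"not a pointer string: {text!r}")
        return Pointer(kind, value)


path_ptr = Pointer("path", "//fileserver/reports/spec.doc")
id_ptr = Pointer("id", "database105/42")
print(path_ptr.encode())
print(Pointer.decode(id_ptr.encode()))
```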

First, the input data string is analyzed to identify the different kinds of characters to be encoded. An error detection and error correction level is also selected, and the smallest symbol version that can accommodate the input data is selected (step 900).

Next, the input data string is converted into a predetermined bit string, and an indicator representing the data mode (numeric, alphanumeric, 8-bit byte, kanji, etc.) and a terminator pattern are added as needed. The bit string is then converted into predetermined bit codewords (step 901).
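The mode analysis and bit conversion of steps 900 and 901 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the encoder of the embodiment: it handles only the numeric, alphanumeric, and byte modes, uses the character-count field widths for small symbol versions (1 to 9), encodes byte mode as UTF-8, and omits version selection, capacity checks, and pad codewords. The function names are hypothetical.

```python
from typing import List, Tuple

ALNUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
MODE_INDICATOR = {"numeric": "0001", "alphanumeric": "0010", "byte": "0100"}
# Character-count field widths; assumed valid for versions 1 to 9 only.
COUNT_BITS = {"numeric": 10, "alphanumeric": 9, "byte": 8}


def choose_mode(data: str) -> str:
    """Pick the most compact mode that can represent every character."""
    if data and all(c in "0123456789" for c in data):
        return "numeric"
    if data and all(c in ALNUM for c in data):
        return "alphanumeric"
    return "byte"


def encode_payload(data: str, mode: str) -> Tuple[int, str]:
    """Return (character count, data bit string) for the chosen mode."""
    if mode == "numeric":
        # Groups of 3, 2, or 1 digits become 10, 7, or 4 bits.
        groups = [data[i:i + 3] for i in range(0, len(data), 3)]
        return len(data), "".join(
            format(int(g), f"0{3 * len(g) + 1}b") for g in groups)
    if mode == "alphanumeric":
        bits = ""
        for i in range(0, len(data), 2):
            pair = data[i:i + 2]
            if len(pair) == 2:   # two characters -> 11 bits
                value = 45 * ALNUM.index(pair[0]) + ALNUM.index(pair[1])
                bits += format(value, "011b")
            else:                # trailing single character -> 6 bits
                bits += format(ALNUM.index(pair), "06b")
        return len(data), bits
    raw = data.encode("utf-8")   # assumption: byte mode carries UTF-8
    return len(raw), "".join(format(b, "08b") for b in raw)


def to_codewords(data: str) -> List[int]:
    """Mode indicator + count + data + terminator, cut into 8-bit codewords."""
    mode = choose_mode(data)
    count, bits = encode_payload(data, mode)
    stream = MODE_INDICATOR[mode] + format(count, f"0{COUNT_BITS[mode]}b") + bits
    stream += "0000"                      # terminator (capacity limits ignored)
    stream += "0" * (-len(stream) % 8)    # pad to a codeword boundary
    return [int(stream[i:i + 8], 2) for i in range(0, len(stream), 8)]


codewords = to_codewords("HELLO WORLD")
print([f"{c:02X}" for c in codewords])
```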

At this time, to enable error correction, the codeword sequence is divided into a predetermined number of blocks according to the version and the error correction level, error correction codewords are generated for each block, and they are appended after the data codeword sequence (step 902).

The data codewords of the blocks obtained in step 902 are concatenated, followed by the error correction codewords of each block and, if necessary, remainder codewords (step 903).

Next, the codeword modules are placed in the matrix together with the position detection patterns, separator patterns, timing patterns, alignment patterns, and so on (step 904).

Further, the optimal mask pattern is selected for the encoding region of the symbol, and the mask pattern is applied to the modules obtained in step 904 by an XOR operation (step 905).
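The XOR masking of step 905 can be sketched as follows. The eight conditions are the mask patterns commonly listed for QR Code symbols and should be checked against the standard before use; the `reserved` matrix marking function-pattern modules, the function name, and the tiny 4x4 example are assumptions for illustration. Selecting the optimal pattern requires penalty scoring of each masked result, which this sketch omits.

```python
from typing import Callable, Dict, List

Matrix = List[List[int]]

# Mask conditions by pattern reference (i = row, j = column).
MASKS: Dict[int, Callable[[int, int], bool]] = {
    0: lambda i, j: (i + j) % 2 == 0,
    1: lambda i, j: i % 2 == 0,
    2: lambda i, j: j % 3 == 0,
    3: lambda i, j: (i + j) % 3 == 0,
    4: lambda i, j: (i // 2 + j // 3) % 2 == 0,
    5: lambda i, j: (i * j) % 2 + (i * j) % 3 == 0,
    6: lambda i, j: ((i * j) % 2 + (i * j) % 3) % 2 == 0,
    7: lambda i, j: ((i + j) % 2 + (i * j) % 3) % 2 == 0,
}


def apply_mask(modules: Matrix, reserved: List[List[bool]], mask_id: int) -> Matrix:
    """XOR the mask into every module outside the reserved function patterns."""
    cond = MASKS[mask_id]
    return [
        [bit ^ 1 if (not reserved[i][j] and cond(i, j)) else bit
         for j, bit in enumerate(row)]
        for i, row in enumerate(modules)
    ]


# Toy 4x4 example: the first row stands in for a function pattern.
modules = [[0] * 4 for _ in range(4)]
reserved = [[True] * 4] + [[False] * 4 for _ in range(3)]
masked = apply_mask(modules, reserved, 0)
print(masked)
```

Because the mask is applied with XOR, applying the same pattern a second time restores the original modules, which is how a reader removes the mask during decoding.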

Finally, format information and version information are generated and added to the modules obtained in step 905, completing the two-dimensional code symbol (step 906).

The two-dimensional barcode with embedded address information described above is used, for example, when an electronic file from the client PC 102 is sent as print data and formed as a recorded image on paper by the recording device 112. After the data processing device 115 converts the print data into recordable raster data, the barcode is added at a predetermined location on the raster data and the image is formed. A user who receives the paper on which the image has been formed has it read by the image reading unit 110, and the storage location of the original electronic file is then detected from the pointer information in step 123 described above.

Besides the two-dimensional barcode described in this embodiment, means for attaching additional information for the same purpose include, for example, adding the pointer information directly to the document as a character string, embedding information by modulating the character strings in the document (in particular the spacing between characters), and embedding information in a halftone image in the document. Methods generally referred to as digital watermarking can be applied.

[Example 1]
In the preceding embodiment, the user name and password with which the user was authenticated on the MFP in step 140 were used to determine whether the user has authority for each process. Instead of the authenticated user name and password, a user name and password can be embedded as a digital watermark in the document to be read, read out when the document is scanned, and used to determine whether the user has authority for each process.

[Example 2]
In the preceding embodiments, user restrictions were applied to three processes: acquiring an electronic file from pointer information, searching for an electronic file, and performing vectorization. The functions available to each user were restricted in this way.

However, when an electronic file is acquired from pointer information (step 125), or when an electronic file is acquired through a file search (step 126), several electronic files with the same file name may exist. The names may coincide by chance, or the files may have been saved under the same name intentionally.

For example, the content may be exactly the same, but there may be an editable electronic file created with application software, an electronic file of non-editable image data, and an electronic file that has merely been vectorized.

It is also conceivable that, rather than making all of these electronic files available to every user, one wishes to restrict them for each user.

In such cases, the electronic files that can be acquired can be limited by setting an access restriction on these electronic files for each user.

That is, as shown in the preceding embodiments, by using the user name logged in to the MFP or the user information embedded in a digital watermark, only the electronic files that the user is permitted to acquire are provided when an electronic file is acquired.
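The per-user restriction on same-named files can be sketched as a simple filter. The `StoredFile` class, the variant labels, the `ACL` table, and the user names are hypothetical; the text does not specify how the document management server stores its access restrictions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class StoredFile:
    name: str
    variant: str   # "editable", "image", or "vectorized" (assumed labels)


# Hypothetical per-user access table: which variants each user may acquire.
ACL: Dict[str, Set[str]] = {
    "alice": {"editable", "image", "vectorized"},
    "bob": {"image"},
}

FILES: List[StoredFile] = [
    StoredFile("report", "editable"),
    StoredFile("report", "image"),
    StoredFile("report", "vectorized"),
    StoredFile("memo", "image"),
]


def obtainable(user: str, name: str) -> List[StoredFile]:
    """Return the files called `name` that `user` is allowed to acquire.

    `user` may come from the MFP login or from user information read out
    of a digital watermark; unknown users get nothing.
    """
    allowed = ACL.get(user, set())
    return [f for f in FILES if f.name == name and f.variant in allowed]


print([f.variant for f in obtainable("alice", "report")])
print([f.variant for f in obtainable("bob", "report")])
```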

FIG. 1 is a block diagram showing a configuration example of an image processing system according to the present invention.
FIG. 2 is a configuration diagram of the MFP 100.
FIG. 3 is a flowchart showing an outline of the entire image processing according to the present invention.
FIG. 4 is a diagram showing block selection processing.
FIG. 5 is a diagram showing block information for each block obtained by the block selection processing.
FIG. 6 is a flowchart showing the process of decoding a two-dimensional barcode (QR code symbol) added to a document image and outputting a data character string.
FIG. 7 is a diagram showing an example of a document 310 to which a two-dimensional barcode has been added.
FIG. 8 is a flowchart for searching for an electronic file from pointer information.
FIG. 9 is a flowchart according to the present invention.
FIG. 10 is a flowchart for performing a layout search of the database for files similar to an input file.
FIG. 11 is a diagram showing maximum points used when vectorizing portions other than characters.
FIG. 12 is a diagram showing part of the block selection processing.
FIG. 13 is a flowchart showing graphic object recognition processing.
FIG. 14 is a flowchart showing graphic element detection processing.
FIG. 15 is a diagram showing the data structure of the DAOF.
FIG. 16 is a flowchart showing application data conversion.
FIG. 17 is a flowchart showing document structure tree creation.
FIG. 18 is a diagram for explaining the document structure tree.
FIG. 19 is a flowchart showing the process of encoding a data character string serving as pointer information into a two-dimensional barcode (QR code symbol: JIS X0510) 311 and adding it to an image.

Explanation of symbols

10 Office
13 Proxy server
20 Office
100 MFP
101 Management PC
102 Client PC (external storage means)
103 Proxy server
104 Internet
105 Database
106 Document management server
107 LAN
108 LAN
109 LAN

Claims

1. An image processing system comprising: user authentication means for authenticating a user and logging in to the image processing system; means for reading and scanning a document; electronic file specifying means for specifying an electronic file of the document from image information obtained by the reading and scanning means; vectorizing means for converting the image information obtained by the reading and scanning means into vector data when the electronic file cannot be specified; and means for setting a restriction for each user.

2. The image processing system according to claim 1, wherein the user authentication means has the user input a user name and a password.

3. The image processing system according to claim 1, wherein the user authentication means uses a biometric authentication device.

4. The image processing system according to claim 1, wherein the user authentication means uses a memory card such as an IC card or a magnetic card.

5. The image processing system according to claim 1, wherein the electronic file specifying means includes means for recognizing a storage location of the electronic file that is additionally recorded on the document.

6. The image processing system according to claim 1, wherein the electronic file specifying means includes means for searching the files stored in a storage means for specific information described in the document.

7. The image processing system according to claim 1, wherein the vectorizing means includes OCR means for performing OCR on characters in the document.

8. The image processing system according to claim 1, wherein the vectorizing means divides the document into a plurality of objects and vectorizes each object independently.

9. The image processing system according to claim 1, wherein the vectorizing means includes format conversion means for converting a vectorized object into a predetermined format that can be handled by existing document creation software.

10. The image processing system described above, further comprising: image storage means for storing the vector data produced by the vectorizing means; and information addition means for adding, to the vector data, the storage location where the data is stored as additional information.

11. An image processing system comprising: means for reading and scanning a document; and specifying means for specifying an electronic file of the document from image information obtained by the reading and scanning means, wherein, when the electronic file of the document obtained by the specifying means cannot be handled by document creation software, the image information obtained by the reading and scanning means is converted into vector data by vectorizing means.

12. The image processing system according to claim 1, wherein the vectorizing means includes means for dividing the image information obtained by reading and scanning the document into objects, and search means for searching the files stored in a storage means for a matching object in units of the divided objects, and vectorization is performed using information obtained by the search means.

13. The image processing system according to claim 1, wherein the means for setting a restriction for each user uses the user name with which the user logged in through the user authentication means, so that each process of the electronic file specifying means and the vectorizing means can be executed only by a user who has authority to execute that process.

14. The image processing system according to claim 1, wherein the means for setting a restriction for each user uses, instead of the logged-in user name, user information embedded as a digital watermark in the read document itself, so that each process of the electronic file specifying means and the vectorizing means can be executed only when this user information carries authority to execute that process.

15. The image processing system according to claim 1, wherein, in the electronic file specifying means, the electronic files that can be acquired are restricted by setting an access restriction for each user on an arbitrary file or folder on the document management apparatus in which the electronic files are stored.

16. The image processing system according to claim 15, wherein the electronic files that can be acquired include an editable document created by application software, a document stored as image data, and a vectorized document.

17. The image processing system according to claim 15, comprising: an image processing system main body that reads a document; and a document management server existing on a network.