JP2010273088A

JP2010273088A - Paper document history management system

Info

Publication number: JP2010273088A
Application number: JP2009122989A
Authority: JP
Inventors: Yasuhiro Fujii, Susumu Serita, Yoshinori Honda, Hikari Morita
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2009-05-21
Filing date: 2009-05-21
Publication date: 2010-12-02

Abstract

Problem: Logs of printing, copying, discarding, and similar operations cannot be searched by text with high accuracy.
Solution: At print time, text is extracted from the print data and an identifier is written into the print data. At search time, copy and discard logs that carry the same identifier as a log hit by the search keyword are output together with that log. Specifically, the system comprises a paper document log search unit that searches the log information stored in a paper document history DB based on a search condition; an output unit that outputs the retrieved log information; an input unit for entering the search condition and for selecting a paper ID from the output log information; and a paper document history search unit that searches the paper document history DB for log information containing the selected paper ID.
[Selection] Figure 2

Description

The present invention relates to a technique for collecting logs of operations such as printing electronic documents and copying or discarding paper documents, and for searching those logs.

Leaks of paper documents occur frequently, and countermeasures are urgently needed. One known technique for preventing such leaks, described for example in Patent Document 1, embeds identification information in an electronic document when it is printed and uses that information to trace the distribution route of the resulting paper document.

Another known method, described in Patent Document 2, collects the print-source electronic documents and image information of copies, builds a search index from the print-source electronic documents, and applies OCR to the copy images to extract text information, thereby making print logs and copy logs searchable by text.

Patent Document 1: Japanese Patent No. 3265621
Patent Document 2: JP 2007-48236 A

However, even if identification information is embedded by the method of Patent Document 1, a paper document carried out by hand cannot be detected. Moreover, when a paper document carrying embedded identification information leaks outside, its distribution route cannot be determined unless the leaked document is found and the identification information can be extracted from it. The method of Patent Document 1 therefore stops working if the leak itself is never discovered, or if the leaked document has been degraded, for example by repeated copying at the destination, so that the identification information can no longer be extracted.

With the method of Patent Document 2, a leak of a paper document can be detected even when no leaked item is actually found, by periodically searching the logs with keywords contained in the electronic document. Even after a leak, the source can be narrowed down by searching with keywords contained in the leaked material. However, OCR misrecognition makes it quite likely that text information cannot be extracted correctly from a copy, in which case the method of Patent Document 2 cannot search the copy logs correctly.

To solve the above problems, the present invention extracts text from the print data and writes an identifier into the print data at print time. At search time, copy and discard logs that carry the same identifier as a log hit by the search keyword are output together with it. Copy and discard logs can thus be searched with high accuracy using the text extracted at print time from the print-source electronic data, instead of text extracted by OCR from scanned images of copies or discarded documents.

Specifically, one aspect of the paper document history management system of the present invention comprises: a paper document history DB that stores, as log information, (a) the paper ID and the text of the electronic document contained in a print log output from a client PC, (b) the paper ID and text extracted from the image of the paper document corresponding to the electronic document, contained in a copy log output from a multifunction peripheral (MFP), and (c) the paper ID and text extracted from the image of at least one of the paper document and a copy paper document (a copy of the paper document made by the MFP), contained in a discard log output from a shredder; a paper document log search unit that searches the log information stored in the paper document history DB based on a search condition; an output unit that outputs the retrieved log information; an input unit for entering the search condition and for selecting a paper ID from the output log information; and a paper document history search unit that searches the paper document history DB for log information containing the selected paper ID.

Another aspect of the present invention has a text search table that stores each index word in association with the management numbers of the log information whose text contains that word. The paper document log search unit decomposes the search keyword in the search condition into words; for each resulting word that exists as an index in the text search table, it obtains the corresponding management numbers and uses the log information they identify as the retrieved log information output by the output unit.
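
The text search table described above is essentially an inverted index. The following is a minimal Python sketch, assuming a simple regular-expression tokenizer (Japanese text would in practice need a morphological analyzer or n-gram indexing, which the source does not specify) and assuming that the hits for the individual words are combined by union. The function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a stand-in for real word segmentation)."""
    return re.findall(r"\w+", text.lower())


def build_text_search_table(logs: dict[int, str]) -> dict[str, set[int]]:
    """Map each index word to the management numbers of logs whose text contains it."""
    table: dict[str, set[int]] = defaultdict(set)
    for mgmt_no, text in logs.items():
        for word in tokenize(text):
            table[word].add(mgmt_no)
    return dict(table)


def search_logs(table: dict[str, set[int]], keyword: str) -> set[int]:
    """Decompose the keyword into words and union the management numbers of every indexed word."""
    hits: set[int] = set()
    for word in tokenize(keyword):
        hits |= table.get(word, set())
    return hits


logs = {1: "Quarterly budget report", 2: "Budget draft v2", 3: "Meeting minutes"}
table = build_text_search_table(logs)
print(sorted(search_logs(table, "budget minutes")))  # [1, 2, 3]
```

An AND-style search would intersect the per-word sets instead of unioning them; the source leaves this choice open.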

In still another aspect, when the paper document history search unit finds multiple pieces of log information containing the selected paper ID in the paper document history DB, it passes the set of those logs to the output unit as the retrieved log information.

In still another aspect, the paper document history search unit calculates the number of sheets over that set of log information as (number of print logs) + (number of copy logs) - (number of discard logs), and the output unit outputs this number of sheets.
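
The sheet-count formula can be illustrated with a short sketch. It assumes, hypothetically, that each log in the set records exactly one sheet carrying the selected paper ID and that log types are labeled with the strings "print", "copy", and "discard".

```python
from collections import Counter


def estimate_paper_count(log_types: list[str]) -> int:
    """Sheets still in circulation = prints + copies - discards (labels are assumed)."""
    counts = Counter(log_types)  # missing labels count as 0
    return counts["print"] + counts["copy"] - counts["discard"]


# Hypothetical log set for one paper ID: 3 prints, 2 copies, 4 discards.
history = ["print"] * 3 + ["copy"] * 2 + ["discard"] * 4
print(estimate_paper_count(history))  # 1
```

A positive result suggests that sheets bearing the paper ID may still exist outside the shredder's records.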

In still another aspect, when the search of the paper document history DB returns no log information containing the selected paper ID, the paper document history search unit selects log information stored in the paper document history DB whose text is similar to the text of the log information corresponding to the selected paper ID, or whose scan image is similar to the scan image included in that log information.
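
One way to realize the similar-text fallback is sketched below. It uses the ratio from Python's `difflib.SequenceMatcher` as the similarity measure and an arbitrary threshold of 0.8. Both choices are substitutions of my own, because the source does not name a similarity measure. Image-based similarity is not shown. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def find_similar_logs(target_text: str, candidates: dict[int, str],
                      threshold: float = 0.8) -> list[int]:
    """Return management numbers whose text resembles target_text, most similar first.

    SequenceMatcher.ratio() is a stand-in measure; the 0.8 threshold is an assumption.
    """
    scored = []
    for mgmt_no, text in candidates.items():
        score = SequenceMatcher(None, target_text, text).ratio()
        if score >= threshold:
            scored.append((score, mgmt_no))
    scored.sort(reverse=True)
    return [mgmt_no for _, mgmt_no in scored]


# An OCR-damaged copy (digit 1 read in place of l) still clears the threshold.
print(find_similar_logs("confidential budget plan",
                        {7: "confidentia1 budget p1an", 8: "lunch menu"}))  # [7]
```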

Still another aspect of the present invention comprises: a management server including the above paper document history DB, paper document log search unit, output unit, input unit, and paper document history search unit; a printer that prints an electronic document as a paper document bearing a paper ID; a client PC that is connected to the printer, holds the electronic document to be printed, assigns a paper ID to the electronic document, outputs it to the printer, and outputs to the management server a print log containing the paper ID and the text of the electronic document; a multifunction peripheral that at least copies a paper document, outputs the resulting copy paper document, and outputs to the management server a copy log containing the image of the paper document bearing the paper ID; and a shredder that discards at least one of the paper document and the copy paper document and outputs to the management server a discard log containing the image of the discarded document(s) bearing the paper ID.

In still another aspect, the client PC comprises a paper ID generation unit that generates a paper ID, which is an identifier for identifying a printed item; a paper ID writing unit that assigns the generated paper ID to the electronic document output to the printer; a text extraction unit that extracts the text of the electronic document; and a print log registration unit that sends the management server a print log containing the generated paper ID and the extracted text.

In still another aspect, the paper ID generation unit generates as many paper IDs as the electronic document has pages and assigns one to each page of the electronic document.

In still another aspect, the management server comprises a log acquisition unit that acquires log information from at least one of the copy logs from the MFP and the discard logs from the shredder; a paper ID extraction unit that extracts the paper ID from the image contained in the acquired log information; a text extraction unit that extracts text information from the image; and a log registration unit that registers the log information, the extracted paper ID, and the extracted text information in the paper document history DB.

In still another aspect, when the paper ID extraction unit cannot extract a paper ID, it outputs an indication that paper ID extraction has failed.

Still another aspect comprises: a printer that prints an electronic document as a paper document bearing a paper ID; a client PC that is connected to the printer, holds the electronic document to be printed, assigns a paper ID to it, outputs it to the printer, and outputs a print log containing the paper ID and the text of the electronic document; a multifunction peripheral that at least copies a paper document, outputs the copy paper document, and outputs a copy log containing the image of the paper document bearing the paper ID; a shredder that discards at least one of the paper document and the copy paper document and outputs a discard log containing the image of the discarded document(s) bearing the paper ID; and a management server that registers the following in the paper document history DB as log information: the paper ID and electronic document text contained in the print log from the client PC; the paper ID and text extracted from the paper document image in the copy log from the MFP; and the paper ID and text extracted from the image of at least one of the paper document and the copy paper document in the discard log from the shredder.

Restating the above aspects, the present invention provides a paper document history management system with the following components connected to a network: a paper document history management server that manages logs related to paper documents (print logs, copy logs, discard logs, and so on); client PCs that hold the print-source electronic documents; printers that print electronic documents; multifunction peripherals that copy, digitize, and fax paper documents; and shredders that discard paper documents.

The client PC comprises a print data conversion unit that converts the electronic document to be printed into a format suitable for the printer; a paper ID generation unit that issues an identifier for identifying the printed item; a print log collection unit that collects print logs recording, for example, who printed which electronic document and when; a text extraction unit that extracts text information from the print data converted from the electronic document by the print data conversion unit; and a paper ID writing unit that writes the identifier issued by the paper ID generation unit into the print data.

The paper document history management server comprises a paper document history DB that stores logs related to paper documents, such as print, copy, and discard logs; a paper ID extraction unit that extracts the identifier from the scanned paper-document images contained in log information acquired from an MFP or shredder; and a text extraction unit that extracts text information from those images by OCR or similar means.

The paper document history DB comprises a log management table that stores log information related to paper documents, an image log management table that stores scanned images of paper documents, a text search table that stores index information for text search, and a paper ID management table that manages the association between paper document identifiers and the logs stored in the log management table.
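
The four tables can be sketched as a relational schema. This version uses SQLite. All table and column names below are hypothetical. The source names the tables but does not list their columns here, and the actual structure is the one shown in FIG. 4.

```python
import sqlite3

# Hypothetical schema; column choices are illustrative only.
SCHEMA = """
CREATE TABLE log_management (       -- one row per print/copy/discard log
    mgmt_no   INTEGER PRIMARY KEY,
    log_type  TEXT NOT NULL CHECK (log_type IN ('print', 'copy', 'discard')),
    logged_at TEXT NOT NULL,
    user_id   TEXT,
    device_id TEXT,
    text      TEXT                  -- from print data, or from OCR of an image log
);
CREATE TABLE image_log (            -- scanned images attached to copy/discard logs
    mgmt_no INTEGER REFERENCES log_management(mgmt_no),
    page    INTEGER,
    image   BLOB
);
CREATE TABLE text_search (          -- inverted index: index word -> log
    word    TEXT NOT NULL,
    mgmt_no INTEGER REFERENCES log_management(mgmt_no)
);
CREATE TABLE paper_id_management (  -- paper ID <-> log association
    paper_id TEXT NOT NULL,
    mgmt_no  INTEGER REFERENCES log_management(mgmt_no)
);
CREATE INDEX idx_text_word ON text_search(word);
CREATE INDEX idx_paper_id ON paper_id_management(paper_id);
"""


def create_history_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the four tables in a new (by default in-memory) SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_history_db()
names = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(names)  # ['image_log', 'log_management', 'paper_id_management', 'text_search']
```

Keeping paper IDs in their own table lets one log carry several IDs (one per page). It also lets a single indexed lookup find every log that shares an ID.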

A paper document history search program that searches the log information in the paper document history DB consults the text search table to extract and display the log information matching a search keyword. When the user selects a log from the search results, the program further consults the paper ID management table to extract and display the set of logs carrying the same identifier as the selected log. This set includes not only print logs but also copy logs and discard logs.
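
The two-stage search can be sketched as follows, with in-memory lists standing in for the DB tables. The sketch uses a plain substring match for the keyword stage and hypothetical record fields. It shows why a copy log whose OCR text is garbled is still reached through its shared paper ID.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperLog:
    mgmt_no: int
    log_type: str               # "print", "copy" or "discard" (assumed labels)
    paper_ids: tuple[str, ...]
    text: str                   # from print data (print logs) or OCR (copy/discard logs)


def build_paper_id_table(logs: list[PaperLog]) -> dict[str, list[int]]:
    """Paper ID management table: paper ID -> management numbers of logs carrying it."""
    table: dict[str, list[int]] = defaultdict(list)
    for log in logs:
        for pid in log.paper_ids:
            table[pid].append(log.mgmt_no)
    return dict(table)


def keyword_search(logs: list[PaperLog], keyword: str) -> list[PaperLog]:
    """Stage 1: logs whose text contains the keyword (substring match for brevity)."""
    return [log for log in logs if keyword in log.text]


def history_search(logs: list[PaperLog], table: dict[str, list[int]],
                   paper_id: str) -> list[PaperLog]:
    """Stage 2: every log sharing the selected paper ID, including copies and discards."""
    by_no = {log.mgmt_no: log for log in logs}
    return [by_no[no] for no in table.get(paper_id, [])]


logs = [
    PaperLog(1, "print", ("P-001",), "Merger plan, strictly confidential"),
    PaperLog(2, "copy", ("P-001",), "Mergcr p1an, str1ctly confidentia1"),  # OCR errors
    PaperLog(3, "discard", ("P-001",), ""),                                 # no text recovered
]
table = build_paper_id_table(logs)
hit = keyword_search(logs, "Merger")[0]        # only the print log matches the keyword
related = history_search(logs, table, hit.paper_ids[0])
print([(log.mgmt_no, log.log_type) for log in related])
# [(1, 'print'), (2, 'copy'), (3, 'discard')]
```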

According to the present invention, copy and discard logs that carry the same identifier (paper ID) as the log information hit by a search keyword are also output. Logs of electronic document printing and of copying the printed paper documents can therefore be searched with high accuracy.

FIG. 1 is a diagram illustrating the overall configuration of a paper document history management system.
FIG. 2 is a diagram illustrating the configuration of a client PC and a paper document history management server.
FIG. 3 is a diagram illustrating the hardware configuration of the paper document history management system.
FIG. 4 is a diagram illustrating the configuration of the paper document history DB.
FIG. 5 is an example flowchart of paper document history DB update processing.
FIG. 6 is a diagram illustrating a paper document history search interface.
FIG. 7 is a diagram illustrating the configuration of a paper document history search program.
FIG. 8 is an example flowchart of paper document log search processing.
FIG. 9 is an example flowchart of paper document history search processing.
FIG. 10 is a diagram illustrating the overall configuration of a system made up of multiple paper document history search systems.

An embodiment of the paper document history management system will now be described with reference to the drawings.

FIG. 1 is a diagram illustrating the overall configuration of the paper document history management system. The paper document history management system 10 consists of the following devices connected to a network 108:
- a paper document history management server 100 that manages print logs, copy logs, discard logs, and so on for paper documents;
- n client PCs 102-1 to 102-n that hold the print-source electronic documents;
- p printers 103-1 to 103-p that print electronic documents;
- m multifunction peripherals 104-1 to 104-m that copy paper documents;
- s shredders 106-1 to 106-s that discard paper documents.

Hereinafter, "client PC 102" refers to any one of the client PCs 102-1 to 102-n. The same convention applies to the printer 103, the multifunction peripheral 104, and the shredder 106.

In this embodiment, the multifunction peripherals 104-1 to 104-m and the shredders 106-1 to 106-s collect log information whenever a paper document is copied or discarded. That log information includes a scanned image of the paper document, hereinafter called an "image log". A shredder capable of collecting image logs is described, for example, in JP 2004-228684 A.

If other devices capable of collecting image logs (such as fax machines) exist, the paper document history management system 10 may include them. Alternatively, the multifunction peripheral 104 may itself provide functions such as digitizing or faxing paper documents. In that case it is assumed to collect image logs for those operations as well.

The multifunction peripheral 104 may provide a printing function in addition to copying. In that case the printers 103-1 to 103-p may be omitted.

The paper document history management system 10 does not need any devices other than the paper document history management server 100, the client PCs 102-1 to 102-n, and the printers 103-1 to 103-p. Some or all of the multifunction peripherals 104-1 to 104-m and shredders 106-1 to 106-s may therefore be omitted. The case in which none of them is present is described later as Embodiment 2.

When paper documents are handed over manually from another organization, the paper document history management system 10 does not need the client PCs 102-1 to 102-n (which hold the print-source electronic documents) or the printers 103-1 to 103-p. This case is described later as Embodiment 3.

Hereinafter, the log information related to paper documents that the paper document history management server manages, such as print logs, copy logs, and discard logs, is collectively called "paper document logs".

FIG. 2 is a diagram illustrating the configurations of the client PC 102 and the paper document history management server 100. The client PC 102 holds an electronic document 20 to be printed and comprises the following units:
- a print data conversion unit 200 that, in response to a print instruction from a user 2000, converts the electronic document 20 into a format suitable for printing on the printer 103 (or the multifunction peripheral 104);
- a paper ID generation unit 202 that issues information identifying the printed item (hereinafter "paper ID");
- a print log collection unit 204 that collects print logs recording, for example, who printed which electronic document and when;
- a text extraction unit 206 that extracts text information from the print data;
- a paper ID writing unit 208 that writes the paper ID into the print data;
- a print log registration unit 210 that associates the collected print log, the extracted text information, and the paper ID with one another and registers them in the paper document history management server 100;
- an output unit 212 that sends the print data 22, with the paper ID added, to the printer 103 (or the multifunction peripheral 104).

The associated information that the print log registration unit 210 registers in the paper document history management server 100 may itself also be called a print log. The operation of each component is described in detail below.

The print data conversion unit 200 converts the electronic document 20 into print data by rendering it, for example by converting each character code of the text in the electronic document 20 into font data. Print data has a format specific to each destination printer 103-1 to 103-p, so the print data conversion unit has p conversion functions. The user 2000 specifies the destination printer (one of 103-1 to 103-p). If pm of the multifunction peripherals 104-1 to 104-m also have a printing function, the print data conversion unit has p + pm conversion functions, and the user 2000 specifies which of the p + pm devices to print on.
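
The per-device conversion functions can be pictured as a lookup table keyed by output device. In this sketch the device names and converter bodies are hypothetical placeholders. A real converter would render the document into the device's page description language. The sketch only illustrates the dispatch structure.

```python
from typing import Callable

Converter = Callable[[str], bytes]


def to_pcl_stub(text: str) -> bytes:
    # Placeholder: a real converter would render glyphs into device-specific PCL.
    return b"\x1bE" + text.encode("utf-8")


def to_postscript_stub(text: str) -> bytes:
    # Placeholder for a PostScript-producing converter.
    return b"%!PS\n" + text.encode("utf-8")


# One entry per output device: the p printers plus the pm print-capable MFPs.
CONVERTERS: dict[str, Converter] = {
    "printer-103-1": to_pcl_stub,
    "printer-103-2": to_postscript_stub,
    "mfp-104-1": to_postscript_stub,
}


def convert_for_device(device_id: str, text: str) -> bytes:
    """Select the converter for the user-chosen device and produce its print data."""
    try:
        converter = CONVERTERS[device_id]
    except KeyError:
        raise ValueError(f"no converter registered for {device_id!r}") from None
    return converter(text)


print(convert_for_device("mfp-104-1", "hi"))  # b'%!PS\nhi'
```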

The paper ID generation unit 202 generates a paper ID that is unique to each page of the printed item. For example, when 20 copies of a 10-page electronic document 20 are printed, the paper ID generation unit 202 generates 10 paper IDs. Issuing 200 paper IDs instead, one for each individual sheet, would not keep IDs unique in practice, because copying on the multifunction peripheral 104 would still produce multiple sheets with the same paper ID. Paper IDs unique per page are therefore issued, which also avoids exhausting the paper ID space. Paper IDs may be generated with an identifier generation function, such as GUID generation, that guarantees a vanishingly small probability of producing the same identifier twice. Alternatively, they may be obtained by querying a server (not shown) that issues unique paper IDs.

Alternatively, the paper ID generation unit 202 may assign a unique ID to each printed sheet rather than to each page. For example, it may generate 200 IDs when printing 20 copies of a 10-page electronic document. This case is described later with reference to FIG. 4.
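
A sketch of the two ID-assignment policies follows. It uses Python's `uuid.uuid4` as a stand-in for the GUID-style generator mentioned above. The function name and the string format of the IDs are assumptions.

```python
import uuid


def generate_paper_ids(pages: int, copies: int, per_sheet: bool = False) -> list[list[str]]:
    """Return paper IDs indexed as ids[copy][page].

    per_sheet=False: one ID per page, shared by every copy (pages IDs in total).
    per_sheet=True:  a distinct ID for every printed sheet (pages * copies IDs).
    uuid4 is an assumed stand-in for the GUID-style generator.
    """
    if per_sheet:
        return [[str(uuid.uuid4()) for _ in range(pages)] for _ in range(copies)]
    per_page = [str(uuid.uuid4()) for _ in range(pages)]
    return [list(per_page) for _ in range(copies)]


ids = generate_paper_ids(pages=10, copies=20)
print(len({pid for copy in ids for pid in copy}))  # 10
ids = generate_paper_ids(pages=10, copies=20, per_sheet=True)
print(len({pid for copy in ids for pid in copy}))  # 200
```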

As a print log, the print log collection unit 204 collects the print time, identification information of the user 2000 (such as the login user name), and identification information of the client PC 102 (such as its IP address and MAC address). It also collects information about the electronic document 20, such as its file name.
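
A minimal sketch of what the print log collection unit 204 might gather, using only the Python standard library. The record layout and field names are hypothetical. Note that `uuid.getnode()` may return a random 48-bit number when no hardware address can be determined, so the MAC field is best-effort. The user name is passed in rather than looked up, so the sketch stays independent of the OS login mechanism.

```python
import socket
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PrintLog:          # hypothetical record layout
    printed_at: str
    user: str
    hostname: str
    ip_address: str
    mac_address: str
    file_name: str


def collect_print_log(user: str, file_name: str) -> PrintLog:
    """Gather print time, user, and client PC identifiers for one print job."""
    hostname = socket.gethostname()
    try:
        ip = socket.gethostbyname(hostname)
    except OSError:                # name resolution can fail on some hosts
        ip = "unknown"
    node = uuid.getnode()          # best-effort 48-bit hardware address
    mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return PrintLog(
        printed_at=datetime.now(timezone.utc).isoformat(),
        user=user,
        hostname=hostname,
        ip_address=ip,
        mac_address=mac,
        file_name=file_name,
    )


print(collect_print_log("alice", "report.docx"))
```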

The text extraction unit 206 extracts text by hooking the rendering process performed by the print data conversion unit 200. For example, it obtains the source character codes from the font information being rendered.
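
The hooking approach can be illustrated with a wrapper around a renderer. Here `render_glyph` is a hypothetical stand-in for the per-character rendering call of the print data conversion unit. The wrapper records each character code before passing it through, so the text is recovered exactly, without OCR.

```python
from typing import Callable

Renderer = Callable[[int, str], bytes]


def render_glyph(char_code: int, font: str) -> bytes:
    """Hypothetical renderer: turns one character code into device glyph data."""
    return f"<{font}:{char_code:04x}>".encode("ascii")


def hook_renderer(render: Renderer, sink: list[str]) -> Renderer:
    """Wrap the renderer so every character code passing through is also recorded."""
    def hooked(char_code: int, font: str) -> bytes:
        sink.append(chr(char_code))        # capture the source character
        return render(char_code, font)     # rendering itself is unchanged
    return hooked


extracted: list[str] = []
render = hook_renderer(render_glyph, extracted)
print_data = b"".join(render(ord(c), "Mincho") for c in "機密資料")
print("".join(extracted))  # 機密資料
```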

The paper ID writing unit 208 writes the paper ID received from the paper ID generation unit 202 into the print data received from the print data conversion unit 200. The paper ID may be written into the print data as a barcode. It may also be embedded inconspicuously across the whole paper document as a digital watermark. Digital watermarking techniques are described in detail in, for example, Japanese Patent No. 4193665.

The print log registration unit 210 registers three items in the paper document history DB 220 of the paper document history management server 100 via the network 108: the print log collected by the print log collection unit 204, the text information of the print data extracted by the text extraction unit 206, and the paper ID generated by the paper ID generation unit 202. As noted above, this associated information may itself be called a print log. The print data itself may be registered as well. This case is described later with reference to FIG. 4.
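
A sketch of the record that the print log registration unit 210 could send follows. JSON, the field names, and Base64 encoding of the optional print data are all assumptions. The transport to the server (for example an HTTP request) is not specified in the source and is omitted.

```python
import base64
import json
from typing import Optional


def build_print_log_record(log: dict, text: str, paper_ids: list[str],
                           print_data: Optional[bytes] = None) -> str:
    """Bundle print log, extracted text and paper IDs into one JSON record (assumed format)."""
    record = {"log_type": "print", **log, "text": text, "paper_ids": paper_ids}
    if print_data is not None:
        # Optional: register the print data itself, Base64-encoded for JSON transport.
        record["print_data"] = base64.b64encode(print_data).decode("ascii")
    return json.dumps(record, ensure_ascii=False)


payload = build_print_log_record({"user": "alice", "file_name": "plan.docx"},
                                 "機密資料", ["P-001", "P-002"], print_data=b"abc")
print(payload)
```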

The output unit 212 receives the print data 22 with paper ID from the paper ID writing unit 208 and sends it to the printer 103 (or the multifunction peripheral 104) specified by the user 2000. The printing process is complete when the printer 103 (or the multifunction peripheral 104) prints and outputs the received print data 22.

The components 200 to 212 of the client PC 102 may instead be provided in a print server (not shown) connected to the network 108. In that case, a print instruction from the user 2000 sends the electronic document 20 on the client PC 102 over the network 108 to the print data conversion unit 200 of the print server. The components 200 to 212 in the print server then perform the same operations as described above to carry out printing.

The paper document history management server 100 comprises the following components:
- a paper document history DB 220 that stores paper document logs;
- a log acquisition unit 222 that acquires paper document logs 24 (copy logs and discard logs) from the multifunction peripheral 104 or the shredder 106, or from any other device on the network 108 that can acquire image logs;
- a paper ID extraction unit 224 that extracts paper IDs from the image logs contained in the acquired paper document logs 24;
- a text extraction unit 226 that extracts text information by applying OCR to the image logs;
- a log registration unit 228 that registers the acquired logs, the extracted paper IDs, and the text information in the paper document history DB 220;
- a paper document history search program 230 that searches the log information stored in the paper document history DB 220.

The operation of each component is described in detail below.
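
The server-side flow can be sketched as below: acquire a log, extract the paper ID, run OCR, register the result, and report any failed paper ID extraction. The decoding and OCR steps are passed in as callables. The stubs `fake_extract` and `fake_ocr` are hypothetical placeholders for barcode or watermark decoding and a real OCR engine. The in-memory list stands in for the paper document history DB 220.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("paper_history")


@dataclass
class IncomingLog:           # hypothetical shape of a paper document log 24
    log_type: str            # "copy" or "discard"
    device_id: str           # MFP or shredder identifier
    user_id: str
    image: bytes             # image log (scanned page)


@dataclass
class RegisteredLog:
    mgmt_no: int
    source: IncomingLog
    paper_id: Optional[str]
    text: str
    paper_id_failed: bool


def register_log(entry: IncomingLog, db: list[RegisteredLog],
                 extract_paper_id: Callable[[bytes], Optional[str]],
                 ocr: Callable[[bytes], str]) -> RegisteredLog:
    """Extract paper ID and text from the image log, then register; flag ID failures."""
    paper_id = extract_paper_id(entry.image)
    if paper_id is None:
        logger.warning("paper ID extraction failed for %s log from %s",
                       entry.log_type, entry.device_id)
    record = RegisteredLog(len(db) + 1, entry, paper_id, ocr(entry.image),
                           paper_id_failed=paper_id is None)
    db.append(record)
    return record


# Stubs: the "image" is bytes of the form b"<paper id>|<page text>".
def fake_extract(image: bytes) -> Optional[str]:
    head, sep, _ = image.partition(b"|")
    return head.decode() if sep else None


def fake_ocr(image: bytes) -> str:
    return image.partition(b"|")[2].decode(errors="replace")


db: list[RegisteredLog] = []
r1 = register_log(IncomingLog("copy", "mfp-104-1", "card-42", b"P-001|Merger plan"),
                  db, fake_extract, fake_ocr)
r2 = register_log(IncomingLog("discard", "shredder-106-1", "card-7", b"smudged"),
                  db, fake_extract, fake_ocr)
print(r1.paper_id, r1.text, r2.paper_id_failed)  # P-001 Merger plan True
```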

In addition to paper document logs, the paper document history DB 220 stores text information and paper IDs extracted from print data and image logs. As with print logs, the terms "copy log" and "discard log" may refer to the paper document log together with the text and paper ID of the printed matter it concerns. The data structure of the paper document history DB 220 is described later with reference to FIG. 4.

The log acquisition unit 222 acquires paper document logs 24 from devices capable of capturing image logs, such as the multifunction peripheral 104 and the shredder 106. For example, from the multifunction peripheral 104 it receives, as a paper document log 24, the copy time, information identifying the user who made the copy (such as an IC card identifier), identification information of the multifunction peripheral 104 (such as its IP address and MAC address), and a scanned image of the copy (the image log). The paper document log 24 may be sent by the multifunction peripheral 104 or shredder 106 after each operation, or the log acquisition unit 222 may connect to these devices periodically to collect it.

When printing is done on the multifunction peripheral 104 instead of the printer 103, some models also capture an image log of the print job. Because the print log has already been collected on the client PC 102, the log acquisition unit 222 must check the type of each paper document log 24 it acquires and, if it is a print log, discard it without registering it in the paper document history DB 220.
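The type check described above amounts to a filter applied before registration. The following is a minimal Python sketch, assuming each acquired log carries an operation label; the `AcquiredLog` type, its fields, and the label strings are hypothetical and not defined in this specification.

```python
from dataclasses import dataclass


@dataclass
class AcquiredLog:
    """Hypothetical shape of one paper document log 24 as received from a device."""
    operation: str   # assumed labels: "print", "copy" or "discard"
    device: str      # e.g. an MFP or shredder identifier
    image_file: str  # file name of the captured image log


def logs_to_register(acquired: list[AcquiredLog]) -> list[AcquiredLog]:
    """Keep copy/discard logs; drop print logs, which the client PC already recorded."""
    return [log for log in acquired if log.operation != "print"]


incoming = [
    AcquiredLog("copy", "mfp-1", "a.bmp"),
    AcquiredLog("print", "mfp-1", "b.bmp"),  # print job captured by some MFP models
    AcquiredLog("discard", "shredder-1", "c.bmp"),
]
kept = logs_to_register(incoming)
```

Here `kept` retains only the copy and discard logs.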

The paper ID extraction unit 224 obtains an image log from the log acquisition unit 222 and extracts the paper ID written in it. If the paper ID was written as a barcode, it attempts to read the barcode; if it was written as a digital watermark, it attempts to extract the watermark information. If no paper ID can be extracted, for example because the copied or discarded paper document never had a paper ID or because the barcode or watermark has been removed, information indicating this (paper ID extraction failure) is retained as the extraction result.

In this embodiment, the format in which paper IDs are written must be decided in advance. Alternatively, the paper ID extraction unit 224 may try several extraction methods in turn until one succeeds, for example first attempting to read a barcode and, if that fails, attempting to extract a digital watermark.
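The try-each-method alternative can be sketched as an ordered list of decoders, returning the first paper ID found and `None` to stand for "paper ID extraction failure". This is an illustrative Python sketch only: `read_barcode` and `read_watermark` are hypothetical stubs, not real barcode or watermark libraries, and the `wm:` marker is an invented stand-in for embedded watermark data.

```python
from typing import Callable, Optional

# A decoder takes raw image bytes and returns a paper ID, or None if it finds none.
Decoder = Callable[[bytes], Optional[str]]


def extract_paper_id(image: bytes, decoders: list[Decoder]) -> Optional[str]:
    """Try each extraction method in order; None means extraction failed."""
    for decode in decoders:
        paper_id = decode(image)
        if paper_id is not None:
            return paper_id
    return None


# Hypothetical stand-ins for real barcode and watermark readers.
def read_barcode(image: bytes) -> Optional[str]:
    return None  # pretend no barcode could be read


def read_watermark(image: bytes) -> Optional[str]:
    marker = b"wm:"  # invented marker standing in for watermark payload
    if marker in image:
        return image.split(marker, 1)[1].decode()
    return None
```

With `[read_barcode, read_watermark]`, the barcode is tried first and the watermark only as a fallback, matching the order suggested in the text.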

The text extraction unit 226 obtains an image log from the log acquisition unit 222 and applies OCR to it to extract text information. When a paper document without a paper ID is copied or discarded, no source print data exists, so its log can be found by search only through the text information extracted by the text extraction unit 226.

If, in practice, no paper documents without paper IDs exist, or if it is decided not to text-search such documents because OCR is not reliable enough, the text extraction unit 226 is unnecessary.

The log registration unit 228 registers the paper document log 24 acquired by the log acquisition unit 222, the paper ID extracted by the paper ID extraction unit 224, and the text information extracted by the text extraction unit 226 in the paper document history DB 220.

The paper document history search program 230 searches the log information stored in the paper document history DB 220. The search procedure is described in detail later with reference to FIGS. 6 to 8. The paper document history search program 230 may also run on hardware other than the paper document history management server 100; in that case, it searches the paper document logs by accessing the paper document history DB 220 over the network 108.

FIG. 3 is a diagram illustrating the hardware configuration of the paper document history management system 10.

The paper document history management server 100 and the client PC 102 each have a configuration in which a CPU 302a that performs computation, a memory 304a that stores information temporarily, a storage device 306a that stores information, a communication unit 308a that exchanges information over the network 108, a display unit 310a such as a display, and an operation unit 312a, such as a keyboard and mouse, that accepts user operations are connected to a hub 300a.

The multifunction peripheral 104 and the shredder 106 each have a configuration in which a CPU 302b, a memory 304b, a storage device 306b, a communication unit 308b, a display unit 310b such as a touch panel, and an operation unit 312b, together with a scanning unit 314 that reads a paper document to generate an image log and an operating device 316 required for the device's own function, are connected to a hub 300b. Specifically, the operating device 316 is a paper feeder, paper tray, and the like in the multifunction peripheral 104, and a paper shredding mechanism in the shredder 106. The hardware configuration of the printer 103 is not directly related to the present invention, so its description is omitted.

FIG. 4 is a diagram illustrating the configuration of the paper document history DB. The paper document history DB 220 has the following data structures: a log management table 40 that stores print logs, copy logs, and discard logs; an image log management table 42 that stores image logs; a text search table 44 that stores index information for text search; and a paper ID management table 46 that manages the association between paper IDs and the logs stored in the log management table 40. It further includes a text search table update unit 440 that updates the text search table 44 and a paper ID management table update unit 460 that updates the paper ID management table 46. Each table and unit is described in detail below.

The log management table 40 has, among others, the following columns: a log management number column 400 that stores a number identifying each stored log (hereinafter the "log management number"); an operation name column 402 that records the user operation, such as printing or copying; a file name column 404 that identifies the print data or image log; a page column 406 that indicates which page of the print data or image log the entry refers to; a paper ID column 408 that stores the paper ID written on the printed matter; a copies column 410 that records the number of sheets output as printed matter or copies; and a text information column 412 that stores text extracted from the print data or image data. Columns for the time of the logged operation, identification information of the client PC 102, multifunction peripheral 104, shredder 106, etc., and user identification information may also be provided.

When the operation name column 402 indicates printing, the file name column 404 contains the file name of the source electronic document or identification information of the print data. Otherwise, it contains the file name of the corresponding image log stored in the image log management table 42. For example, the logs with log management numbers 1 to 3 in FIG. 4 mean that pages 1 to 3 of the electronic document with file name "bank.doc" were printed in one copy, with paper ID 123 assigned to page 1, paper ID 124 to page 2, and paper ID 125 to page 3. The paper document log with log management number 4 means that an image log with file name "4.bmp" is stored in the image log management table 42. As explained for the paper ID generation unit 202 in FIG. 2, a different paper ID is assigned to each page. If the paper ID generation unit 202 instead generates an ID unique to each individual printed sheet rather than to each page, the paper ID column 408 stores as many paper IDs as the number of copies recorded in the corresponding copies column 410.
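To make the column layout concrete, the following Python sketch models one row of the log management table 40 and builds the three print rows of the FIG. 4 example, one paper ID per page. The `LogRow` type and `print_rows` helper are hypothetical, the sequential numbering of paper IDs is an assumption that merely happens to match 123 to 125, and the page 2 and 3 texts are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRow:
    """One row of the log management table 40 (column numbers in comments)."""
    log_no: int              # log management number column 400
    operation: str           # operation name column 402
    file_name: str           # file name column 404
    page: int                # page column 406
    paper_id: Optional[str]  # paper ID column 408 (None if extraction failed)
    copies: int              # copies column 410
    text: str                # text information column 412


def print_rows(first_log_no: int, file_name: str, page_texts: list[str],
               first_paper_id: int, copies: int = 1) -> list[LogRow]:
    """Per-page paper IDs; sequential numbering is an illustrative assumption."""
    return [
        LogRow(log_no=first_log_no + i, operation="print", file_name=file_name,
               page=i + 1, paper_id=str(first_paper_id + i), copies=copies, text=text)
        for i, text in enumerate(page_texts)
    ]


# "XYZ Bank is …" is an English rendering of the FIG. 4 sample; the others are placeholders.
rows = print_rows(1, "bank.doc", ["XYZ Bank is …", "(page 2 text)", "(page 3 text)"], 123)
```

In the per-sheet variant mentioned above, `paper_id` would instead hold a list with one ID per printed copy.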

The copies column 410 stores the number of printed or copied sheets output or, for a discard, the number of sheets discarded. If several paper documents with the same paper ID are discarded at once, the copy count of the discard log can be more than one.

When the operation name column 402 indicates printing, the text information column 412 contains text extracted from the print data. Otherwise, it contains text extracted from the image log by OCR. For example, the text information "XYZ Bank is …" of the print log with log management number 1 represents the entire text on page 1 of the electronic document "bank.doc". On the other hand, the "○" characters in the text information "XYZ ○○ is …" of the copy log with log management number 4 and "○○ Bank is …" of the discard log with log management number 5 mark characters that OCR could not identify correctly. In this embodiment, text extracted from the print data can be used for search in place of such uncertain OCR-extracted text.

If the paper document history management server 100 in FIG. 2 does not include the text extraction unit 226, no text is extracted from image logs, so the text information column 412 of log management numbers 4 and 5 is empty. In this case, logs of copying or discarding paper documents without paper IDs are no longer found by text search; on the other hand, the loss of search accuracy caused by OCR misrecognition is avoided.

The image log management table 42 stores the image logs themselves. FIG. 4 illustrates the case where the image log corresponding to log management number 4 in the log management table 40 is stored under the file name "4.bmp", and the image log corresponding to log management number 5 under "5.bmp". When multiple pages are copied or discarded together, some multifunction peripherals 104 or shredders 106 generate a single multi-page image file in a format such as TIFF or PDF. Because each page has a different paper ID, in that case the page column 406 of the log management table 40 must state which page of the image log corresponds to each extracted paper ID.
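For a multi-page image log, one way to record the page-to-paper-ID correspondence is to emit one row per page, each pointing at the same file with its own page number. This Python sketch assumes the per-page paper IDs have already been extracted; the helper name, the dictionary row format, and the file name "7.tif" are hypothetical.

```python
from typing import Optional


def rows_for_multipage_image(file_name: str,
                             page_ids: list[Optional[str]]) -> list[dict]:
    """page_ids[i] is the paper ID read from page i + 1 (None if extraction failed)."""
    return [
        {"file_name": file_name, "page": page, "paper_id": paper_id}
        for page, paper_id in enumerate(page_ids, start=1)
    ]


# Hypothetical three-page TIFF whose second page carried no readable paper ID.
rows = rows_for_multipage_image("7.tif", ["123", None, "125"])
```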

Some electronic documents 20 contain no text information at all. So that the print log can be checked visually later in such cases, the print data itself may be stored in the image log management table 42. In that case, the file name column 404 of the log management table 40 holds the identification information (such as the file name) of the print data stored in the image log management table 42.

The text search table 44 consists of an index column 442 that stores index terms used as search keywords and a log list column 444 that stores the list of log management numbers whose text information contains each term. When a new paper document log is registered in the paper document history DB 220, the text search table update unit 440 is started. It splits the text information of that log into words, for example by morphological analysis, and updates the text search table 44 by adding the words to the index column 442 and the log management number of the log to the log list column 444. The detailed operation flow is described later with reference to FIG. 5.

The paper ID management table 46 consists of a paper ID column 462 that stores paper IDs and a log list column 464 that stores the list of log management numbers of the paper document logs containing each paper ID. When a new paper document log is registered in the paper document history DB 220, the paper ID management table update unit 460 is started and updates the paper ID management table 46 by adding the paper ID of that log to the paper ID column 462 and its log management number to the log list column 464. The detailed operation flow is described later with reference to FIG. 5.

FIG. 5 is an example flowchart of the paper document history DB update process. Each step is described below with reference to FIGS. 2 and 4.
(Step 500) The paper document history DB 220 receives the paper document log, text information, paper ID, and image log (or print data). Hereinafter, this text is denoted NewText and this paper ID NewID.
(Step 502) The paper document history DB 220 adds a row with a new log management number NewLogNo to the log management table 40 and stores the received paper document log in it.
(Step 504) The paper document history DB 220 stores the received image log (or print data) in the image log management table 42. If the file name of the image log was determined in step 502, the image log is saved under that file name. Otherwise, the file name column 404 of the log management table 40 is updated with the file name under which the image log was saved in the image log management table 42.
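Steps 502 and 504 can be sketched with in-memory dictionaries standing in for the log management table 40 and the image log management table 42. This is a minimal Python sketch, not the DB 220 itself: the `HistoryDB` class is hypothetical, and naming an image after its log management number (for example "1.bmp") is an assumption modeled on the "4.bmp" naming in FIG. 4.

```python
class HistoryDB:
    """Hypothetical in-memory stand-in for parts of the paper document history DB 220."""

    def __init__(self) -> None:
        self.log_table: dict[int, dict] = {}     # log management table 40
        self.image_table: dict[str, bytes] = {}  # image log management table 42
        self._next_log_no = 1

    def register(self, log: dict, image: bytes) -> int:
        # Step 502: add a row under a fresh log management number NewLogNo.
        new_log_no = self._next_log_no
        self._next_log_no += 1
        row = dict(log)
        self.log_table[new_log_no] = row
        # Step 504: save the image under the given file name, or choose one
        # (assumed naming scheme) and write it back to the file name column 404.
        file_name = row.get("file_name") or f"{new_log_no}.bmp"
        self.image_table[file_name] = image
        row["file_name"] = file_name
        return new_log_no


db = HistoryDB()
copy_no = db.register({"operation": "copy", "paper_id": "123"}, b"scan")
print_no = db.register({"operation": "print", "file_name": "bank.doc"}, b"print-data")
```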

Thereafter, steps 506 to 512, which update the text search table, and steps 514 to 518, which update the paper ID management table, are performed in parallel. Alternatively, one of the two may be performed first, followed by the other.
(Step 506) The text search table update unit 440 splits the text information NewText into words. For example, the text is normalized (such as by removing line breaks and collapsing whitespace), its language is identified, and morphological analysis is then applied.

Steps 508 to 512 are applied to every word obtained by the split. Let the number of words be N and the words be P-1 to P-N.
(Step 508) The text search table update unit 440 checks whether the word P-i (i = 1 to N) already exists in the index column 442 of the text search table 44. If it does, the process proceeds to step 510; otherwise, it proceeds to step 512.
(Step 510) If the word P-i already exists in the index column 442, the text search table update unit 440 adds the log management number NewLogNo to the corresponding log list column 444 of the text search table 44.
(Step 512) If the word P-i does not exist in the index column 442, the text search table update unit 440 adds a new row and stores the word P-i in its index column 442 and the log management number NewLogNo in its log list column 444.

When steps 508 to 512 have been applied to every word, the text search table update of steps 506 to 512 is complete.
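Steps 506 to 512 describe an inverted index: each word maps to the log management numbers whose text contains it. The Python sketch below uses a dictionary for the text search table 44. The regex tokenizer is a swapped-in simplification that only works for space-delimited text such as the English renderings used here; Japanese text would need the morphological analysis the specification calls for. Recording each log once per word, even if the word repeats, is also an assumption.

```python
import re


def split_words(text: str) -> list[str]:
    """Simplified stand-in for step 506: normalize whitespace, lowercase, split on non-word chars."""
    normalized = " ".join(text.split())
    return re.findall(r"\w+", normalized.lower())


def update_text_index(index: dict[str, list[int]], new_text: str, new_log_no: int) -> None:
    """Steps 508-512: add new_log_no under every word of new_text."""
    for word in dict.fromkeys(split_words(new_text)):  # each word once, in order
        if word in index:           # step 508 -> step 510: existing row
            index[word].append(new_log_no)
        else:                       # step 508 -> step 512: new row
            index[word] = [new_log_no]


index: dict[str, list[int]] = {}
update_text_index(index, "XYZ Bank is …", 1)  # print log 1 (text from print data)
update_text_index(index, "XYZ ○○ is …", 4)    # copy log 4 (OCR text)
update_text_index(index, "○○ Bank is …", 5)   # discard log 5 (OCR text)
```

Because "○" is not a word character, the unreadable OCR characters never become index terms, so "bank" maps only to logs 1 and 5.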

Meanwhile, the steps for updating the paper ID management table are as follows.
(Step 514) The paper ID management table update unit 460 checks whether NewID already exists in the paper ID column 462 of the paper ID management table 46. If it does, the process proceeds to step 516; otherwise, it proceeds to step 518.
(Step 516) If NewID already exists in the paper ID column 462, the paper ID management table update unit 460 adds the log management number NewLogNo to the corresponding log list column 464 of the paper ID management table 46.
(Step 518) If NewID does not exist in the paper ID column 462, the paper ID management table update unit 460 adds a new row and stores NewID in its paper ID column 462 and the log management number NewLogNo in its log list column 464.

This completes the paper document history DB update process.
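Steps 514 to 518 have the same shape as the text index update, keyed by paper ID instead of by word. A minimal Python sketch with a dictionary standing in for the paper ID management table 46; skipping logs whose paper ID could not be extracted (`None`) is an assumption, since the flowchart does not say how that case is handled.

```python
from typing import Optional


def update_paper_id_table(table: dict[str, list[int]],
                          new_id: Optional[str], new_log_no: int) -> None:
    """Steps 514-518: associate NewLogNo with NewID."""
    if new_id is None:          # assumption: failed extractions are not indexed
        return
    if new_id in table:         # step 514 -> step 516: existing row
        table[new_id].append(new_log_no)
    else:                       # step 514 -> step 518: new row
        table[new_id] = [new_log_no]


paper_ids: dict[str, list[int]] = {}
update_paper_id_table(paper_ids, "123", 1)
update_paper_id_table(paper_ids, "124", 2)
update_paper_id_table(paper_ids, "123", 4)
update_paper_id_table(paper_ids, None, 6)   # hypothetical log with no readable paper ID
```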

FIG. 6 is a diagram illustrating the paper document history search interface. The paper document history search program 230 has an interface 600, which consists of an area 602 where the user specifies search conditions, an area 604 that displays all logs matching the search conditions, and an area 606 that illustrates the output status of the paper documents related to the selected log.

In the area 602, the user specifies a search keyword, the time period to search, and the types of log to search. In the example of FIG. 6, "bank" is specified as the search keyword, the period from 00:00:00 on January 1, 2001 to 23:59:59 on September 12, 2008 as the search period, and printing, copying, and discarding as the log types to search. The area 602 is only an example of search conditions; a file name, IP address, user name, paper ID, and so on can also serve as search conditions.

The area 604 displays all paper document logs that match the above search conditions. Indeed, the log management table 40 in FIG. 4 shows that the paper document logs matching these conditions are those with log management numbers 1 and 5. The operation flow of the paper document log search is described in detail later with reference to FIG. 8.

When the user selects the paper document log with log management number 1, the output status of the paper for this log is displayed in the area 606. Balloon 608 shows, for reference, the text information extracted from page 1 of "bank.doc". Balloon 610 collectively shows the print, copy, and discard logs for the printed first page of "bank.doc", that is, the paper document bearing paper ID 123. It can be seen that page 1 of the electronic document "bank.doc" was printed in one copy, this printout was copied twice, one of the resulting three sheets was discarded, and two sheets currently remain. The copy log appears in balloon 610 because the log management table 40 in FIG. 4 contains a log of copying the paper with paper ID 123.

Hereinafter, the set of paper document logs sharing the same paper ID is called "paper document history information". Balloon 610 displays the paper document history information. The operation flow for displaying it is described in detail later with reference to FIG. 9.
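The grouping behind balloon 610, together with the sheet count given later for FIG. 9 (print copies plus copy copies minus discarded copies), can be sketched in Python as follows. The `Log` tuple and `paper_history` function are hypothetical, and the rows for logs 4 and 5 (paper ID 123, two copies made, one sheet discarded) are assumed values chosen to reproduce the balloon 610 narrative; log 6 is an invented copy with no readable paper ID.

```python
from typing import NamedTuple, Optional


class Log(NamedTuple):
    operation: str           # "print", "copy" or "discard"
    paper_id: Optional[str]  # None: no paper ID could be extracted
    copies: int


def paper_history(selected: int, log_table: dict[int, Log],
                  paper_id_table: dict[str, list[int]]) -> tuple[list[int], Optional[int]]:
    """Return the logs sharing the selected log's paper ID and the remaining sheet count."""
    paper_id = log_table[selected].paper_id
    if paper_id is None:                 # no paper ID: show only the selected log
        return [selected], None
    history = paper_id_table.get(paper_id, [selected])
    sheets = 0
    for n in history:                    # printed + copied - discarded
        log = log_table[n]
        if log.operation in ("print", "copy"):
            sheets += log.copies
        elif log.operation == "discard":
            sheets -= log.copies
    return history, sheets


log_table = {1: Log("print", "123", 1), 4: Log("copy", "123", 2),
             5: Log("discard", "123", 1), 6: Log("copy", None, 1)}
paper_id_table = {"123": [1, 4, 5]}
```

For log 1 this gives the history [1, 4, 5] and 1 + 2 - 1 = 2 remaining sheets, matching the example above.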

If a text search for "bank" were performed using OCR alone, as in Patent Document 2, the copy log with log management number 4 in the log management table 40 of FIG. 4 would not be found. In this embodiment, assigning a paper ID to printed matter allows logs to be found by text search without relying solely on OCR text extraction results.
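The recall gain can be illustrated by expanding a set of keyword hits with every log that shares a paper ID with one of them. This Python sketch is an illustration of that idea, not the patented search flow; the function name and the paper IDs assumed for logs 4 and 5 are hypothetical example data.

```python
from typing import Optional


def expand_by_paper_id(hits: set[int], log_paper_ids: dict[int, Optional[str]],
                       paper_id_table: dict[str, list[int]]) -> set[int]:
    """Add every log that shares a paper ID with a keyword hit."""
    expanded = set(hits)
    for n in hits:
        paper_id = log_paper_ids.get(n)
        if paper_id is not None:
            expanded.update(paper_id_table.get(paper_id, []))
    return expanded


# Assumed data: logs 4 and 5 carry paper ID 123, like print log 1.
log_paper_ids = {1: "123", 2: "124", 3: "125", 4: "123", 5: "123"}
paper_id_table = {"123": [1, 4, 5], "124": [2], "125": [3]}
keyword_hits = {1, 5}   # "bank" matches logs 1 and 5, but not the OCR text of log 4
expanded = expand_by_paper_id(keyword_hits, log_paper_ids, paper_id_table)
```

Log 4 becomes reachable through paper ID 123 even though its OCR text lost the word "bank".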

FIG. 6 shows only one example of the interface of the paper document history search program 230. For example, thumbnail images may be displayed instead of the icon 612 indicating the file name and the icon 614 indicating the paper ID. The balloons 608 and 610 may also be omitted.

FIG. 7 is a diagram illustrating the configuration of the paper document history search program 230. The paper document history search program 230 consists of an input unit 700 that receives search conditions 70 such as those specified in the area 602 of FIG. 6; a paper document log search unit 702 that searches paper document logs based on the search conditions 70; a paper document history search unit 704 that searches the paper document history of a log selected by a user (not shown); a communication unit 706 that communicates with the paper document history DB 220; and an output unit 708 that displays the search results for paper document logs and paper document history information.

As described with reference to FIG. 2, when the paper document history search program 230 runs on hardware connected to the network 108 other than the paper document history management server 100, the communication unit 706 connects to the paper document history DB 220 via the network 108.

FIG. 8 is an example flowchart of the paper document log search process. Each step is described below with reference to FIGS. 4 and 7.
(Step 800) The input unit 700 receives the search conditions 70 and passes them to the paper document log search unit 702.
(Step 802) The paper document log search unit 702 splits the search keyword contained in the received search conditions 70 into words P-1 to P-s. For example, a search for "XYZ bank" yields the words "XYZ" and "bank".

The following step 804 is applied to every word P-k (k = 1 to s).
(Step 804) The paper document log search unit 702 checks whether the word P-k exists in the index column 442 of the text search table 44 and, if it does, obtains the list of log management numbers from the corresponding log list column 444.

After step 804 has been applied to every word, the process proceeds to step 806.
(Step 806) From the lists of log management numbers obtained in step 804 for all the words, the paper document log search unit 702 narrows down the log management numbers that match the search keyword. For example, if the search keyword is "XYZ bank", the log management numbers matching "XYZ" are 1 and 4, and those matching "bank" are 1 and 5, then taking the intersection of the two sets leaves only log management number 1.

Steps 802 to 806 follow the usual text search flow, so a keyword such as "XYZ or bank" can also be used. In this case the union is taken, and the logs matching the keyword are log management numbers 1, 4, and 5. A regular expression such as "bank*" (any string beginning with "bank") can also be used as a search keyword. In this case, the matching log management numbers are found by analyzing the sentences in the text information column 412 of logs 1 and 5, which match "bank".
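Steps 804 and 806 reduce to set operations over the per-word hit lists: intersection for an ordinary multi-word keyword and union for an "or" query. The Python sketch below assumes the index built from the FIG. 4 sample texts (lowercased English renderings); `lookup` and `match_keyword` are hypothetical names, and the "bank*" pattern search is not covered.

```python
def lookup(index: dict[str, list[int]], word: str) -> set[int]:
    """Step 804: log management numbers whose text contains word (empty if not indexed)."""
    return set(index.get(word.lower(), []))


def match_keyword(index: dict[str, list[int]], words: list[str], mode: str = "and") -> set[int]:
    """Step 806: combine per-word hits by intersection ("and") or union ("or")."""
    hit_sets = [lookup(index, w) for w in words]
    if not hit_sets:
        return set()
    if mode == "and":
        return set.intersection(*hit_sets)
    if mode == "or":
        return set.union(*hit_sets)
    raise ValueError(f"unknown mode: {mode}")


# Text search table 44 for the FIG. 4 sample texts (English renderings, assumed).
index = {"xyz": [1, 4], "bank": [1, 5], "is": [1, 4, 5]}
```

`match_keyword(index, ["XYZ", "bank"])` yields {1}, and the "or" form yields {1, 4, 5}, as in the examples above.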
(Step 808) If at least one log management number matching the search keyword was found in step 806, the process proceeds to step 810; otherwise, it proceeds to step 816.
(Step 810) The paper document log search unit 702 applies the remaining search conditions, such as the search period, to the logs that matched the search keyword in steps 802 to 806, narrowing them down to the logs that match the search conditions 70.
(Step 812) If at least one log remains after step 810, the process proceeds to step 814; otherwise, it proceeds to step 816.
(Step 814) The paper document log search unit 702 sends all paper document logs found in step 810 to the output unit 708, which displays them as in the area 604 of FIG. 6, and the process ends.
(Step 816) The output unit 708 displays a message that no paper document log matches the search conditions 70, and the process ends.
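Step 810 filters the keyword hits by the remaining conditions. This Python sketch assumes the log management table 40 includes the optional time column mentioned for FIG. 4; the `LogEntry` and `OtherConditions` types and the timestamps are made up for illustration, while the period and log types follow the FIG. 6 example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    log_no: int
    operation: str       # "print", "copy" or "discard"
    timestamp: datetime  # time of the logged operation (optional column of table 40)


@dataclass
class OtherConditions:
    start: datetime
    end: datetime
    operations: set[str]


def apply_other_conditions(candidates: set[int], logs: dict[int, LogEntry],
                           cond: OtherConditions) -> list[LogEntry]:
    """Step 810: keep keyword hits whose time and operation type satisfy cond."""
    return [
        logs[n] for n in sorted(candidates)
        if cond.start <= logs[n].timestamp <= cond.end
        and logs[n].operation in cond.operations
    ]


# Made-up timestamps; FIG. 4 does not give them.
logs = {
    1: LogEntry(1, "print", datetime(2008, 3, 3, 10, 0)),
    5: LogEntry(5, "discard", datetime(2008, 6, 30, 17, 30)),
}
period = (datetime(2001, 1, 1, 0, 0, 0), datetime(2008, 9, 12, 23, 59, 59))
all_types = OtherConditions(*period, {"print", "copy", "discard"})
print_only = OtherConditions(*period, {"print"})
```

An empty result here corresponds to the branch from step 812 to step 816.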

If the search conditions 70 contain no search keyword, the text search is skipped and the process proceeds directly from step 800 to step 806.

図9は紙文書来歴検索処理のフローチャートの一例である。図4、図7を参照しつつ各ステップについて説明する。
（ステップ900）図示していないユーザは、検索条件70にヒットした紙文書ログから紙文書来歴を検索するログを、入力部700を介して選択する。以下このログを「選択ログ」とよぶ。
（ステップ902）紙文書来歴検索部704は、ログ管理テーブル40の操作名カラム402を参照して、選択ログの種類を調べる。印刷であった場合ステップ904へ、それ以外の場合906へ飛ぶ。
（ステップ904）紙文書来歴検索部704は、ログ管理テーブル40の紙IDカラム408を参照して、選択ログの紙IDを特定する。選択ログが印刷ログである場合、紙IDは必ず存在する。特定後、ステップ908へ飛ぶ。
（ステップ906）紙文書来歴検索部704は、ログ管理テーブル40の紙IDカラム408を参照して、選択ログの紙IDの有無を調べる。もし紙IDが存在すればステップ908へ、存在しないか無効値が入っている場合にはステップ914へ飛ぶ。後者は、複写物や廃棄物に劣化があり紙IDを抽出できなかったか、もしくは、はじめから紙にIDが書き込まれていなかったことを意味している。
（ステップ908）紙文書来歴検索部704は、紙ID管理テーブル46を参照して、選択ログの紙IDをもつ紙文書ログを特定する。これが求めるべき紙文書来歴情報となる。
（ステップ910）紙文書来歴検索部704は、求めた紙文書来歴に含まれる紙文書ログから紙の出力枚数を計算する。具体的には、印刷ログの部数＋複写ログの部数−廃棄ログの部数が求めるべき出力枚数となる。
（ステップ912）紙文書来歴検索部704は、求めた紙文書来歴情報と紙の出力枚数を出力部708へ送信する。出力部708は、受信した情報を図6の領域606のように表示して処理を終了する。
（ステップ914）出力部708は、選択ログのみ表示して処理を終了する。 FIG. 9 is an example of a flowchart of a paper document history search process. Each step will be described with reference to FIGS.
(Step 900) A user (not shown) selects, via the input unit 700, one of the paper document logs that matched search condition 70 as the log whose paper document history is to be traced. Hereinafter, this log is called the "selected log".
(Step 902) The paper document history search unit 704 checks the type of the selected log by referring to the operation name column 402 of the log management table 40. If it is a print log, the process proceeds to step 904; otherwise, it proceeds to step 906.
(Step 904) The paper document history search unit 704 identifies the paper ID of the selected log by referring to the paper ID column 408 of the log management table 40. A print log always has a paper ID. The process then proceeds to step 908.
(Step 906) The paper document history search unit 704 checks, by referring to the paper ID column 408 of the log management table 40, whether the selected log has a paper ID. If it does, the process proceeds to step 908; if the paper ID is missing or holds an invalid value, the process proceeds to step 914. The latter case means either that the copied or discarded paper had deteriorated so that the paper ID could not be extracted, or that no paper ID was written on the paper in the first place.
(Step 908) The paper document history search unit 704 refers to the paper ID management table 46 and identifies the paper document logs that carry the paper ID of the selected log. These logs form the paper document history information being sought.
(Step 910) The paper document history search unit 704 calculates the number of output sheets from the paper document logs in the obtained paper document history. Specifically, number of output sheets = (copies in the print log) + (copies in the copy logs) − (copies in the discard logs).
(Step 912) The paper document history search unit 704 sends the obtained paper document history information and the number of output sheets to the output unit 708, which displays them as shown in area 606 of FIG. 6, and the process ends.
(Step 914) The output unit 708 displays only the selected log, and the process ends.
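The history trace of steps 902 to 914, including the sheet count of step 910, might look like the following minimal sketch. The record layout and the function `trace_history` are hypothetical; a linear scan over all logs stands in for the paper ID management table 46, and `None` stands in for a missing or invalid paper ID.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PaperLog:
    number: int               # log management number
    operation: str            # "print", "copy" or "discard"
    copies: int               # number of copies handled by the operation
    paper_id: Optional[str]   # None if missing or not extractable


SIGN = {"print": 1, "copy": 1, "discard": -1}


def trace_history(logs: List[PaperLog],
                  selected: PaperLog) -> Tuple[List[PaperLog], Optional[int]]:
    """Sketch of steps 902-914 for one selected log."""
    # Steps 902-906: a print log always carries a paper ID; a copy or
    # discard log may not, e.g. when the scanned image was degraded.
    if selected.paper_id is None:
        return [selected], None  # step 914: show only the selected log
    # Step 908: collect every log that carries the same paper ID.
    history = [log for log in logs if log.paper_id == selected.paper_id]
    # Step 910: print copies + copy copies - discard copies.
    outstanding = sum(SIGN[log.operation] * log.copies for log in history)
    return history, outstanding  # step 912: hand both to the output unit


logs = [
    PaperLog(1, "print", 2, "P-001"),
    PaperLog(4, "copy", 3, "P-001"),
    PaperLog(5, "discard", 1, "P-001"),
    PaperLog(6, "copy", 1, None),
]
history, outstanding = trace_history(logs, logs[1])
print([log.number for log in history], outstanding)  # [1, 4, 5] 4
```

Selecting copy log 4 still yields the full history because the copy carries the paper ID of the original print; log 6 has no paper ID and falls through to step 914.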

If, in step 906, extraction of the paper ID from the copy or discard image log has failed and no paper ID exists, then instead of displaying only the selected log in step 914, the system can also select paper document logs similar to the selected log and display them as paper document history information. Specifically, the text information of every log stored in the log management table 40 is compared with the text information of the selected log, and one or more similar paper document logs are selected as the paper document history. Alternatively, if an image log corresponding to the selected log exists in the image log table 42, that image log is compared with every image log stored in the image log table 42, and one or more paper document logs corresponding to similar image logs are selected as the paper document history. The similarity can be computed, for example, with fuzzy hashing, which is disclosed in Jesse Kornblum, "Identifying almost identical files using context triggered piecewise hashing", Digital Investigation 3S, pp. S91–S97 (2006). The number of paper document logs to select is fixed by a prior setting, for example a preset count, or all paper document logs whose similarity is at or below a threshold. However, paper document history information obtained this way consists only of documents with similar text information or scanned images, with no guarantee that they carry the same paper ID, so the output unit 708 must also display a notice to that effect to alert the user.
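The similarity fallback can be sketched as below. The text names fuzzy hashing; this sketch deliberately swaps in the standard library's `difflib.SequenceMatcher` as a simpler text-similarity measure, so it illustrates the selection policy (preset count or threshold), not fuzzy hashing itself. The function `similar_logs` and its `top_n` and `max_distance` parameters are hypothetical, and the distance `1 - ratio()` is an assumption chosen so that "at or below a threshold" selects the most similar logs.

```python
import difflib


def similar_logs(selected_text, candidates, top_n=None, max_distance=None):
    """Rank candidate logs by text similarity to the selected log.

    candidates: list of (log_number, text) pairs.
    Distance is 1 - SequenceMatcher.ratio(), so 0.0 means identical text.
    """
    scored = []
    for number, text in candidates:
        ratio = difflib.SequenceMatcher(None, selected_text, text).ratio()
        scored.append((1.0 - ratio, number))
    scored.sort()  # most similar first
    if max_distance is not None:
        # Threshold policy: keep logs whose distance is at or below the limit.
        scored = [(d, n) for d, n in scored if d <= max_distance]
    if top_n is not None:
        # Preset-count policy: keep only the N closest logs.
        scored = scored[:top_n]
    # The caller must flag these as "similar", not as proven same paper ID.
    return [n for _, n in scored]


candidates = [
    (2, "quarterly sales plan draft"),
    (3, "cafeteria menu"),
    (7, "quarterly sales plan"),
]
print(similar_logs("quarterly sales plan", candidates, top_n=1))  # [7]
print(similar_logs("quarterly sales plan", candidates, max_distance=0.2))  # [7, 2]
```

Swapping in a real fuzzy-hash comparison would only change how `ratio` is computed; the ranking and the warning to the user stay the same.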

As described above, by assigning a paper ID to the print data and managing paper document logs that share the same paper ID in association with one another, searches can use the text extracted from the print data instead of the unreliable text extracted by OCR from copy or discard image logs.

Next, the case is described in which the paper document history management system 10 of FIG. 1 does not include the multifunction peripherals 104-1 to 104-m or the shredders 106-1 to 106-s. In this case, the paper document history management system 10 manages printing only. Because there are no image logs from the multifunction peripherals 104 or shredders 106 to process, no paper ID is needed. The details of this embodiment are described below, modifying FIGS. 1 to 8 as appropriate.

FIG. 1 illustrates the overall configuration of the paper document history management system 10 of this embodiment, except that the system does not include the multifunction peripherals 104-1 to 104-m or the shredders 106-1 to 106-s. The paper document history management system 10 consists of a paper document history management server 100 that manages print logs of paper documents, n client PCs 102-1 to 102-n that hold the electronic documents to be printed, and p printers 103-1 to 103-p that print the electronic documents, all connected to a network 108.

FIG. 2 illustrates the configuration of the client PC and the paper document history management server. Because this embodiment does not handle image logs, it omits the elements that process paper IDs and image logs. Specifically, the client PC 102 does not include the paper ID generation unit 202 or the paper ID writing unit 208, and the paper document history management server 100 does not include the log acquisition unit 222, the paper ID extraction unit 224, the text extraction unit 226, or the log registration unit 228. The output unit 212 receives print data directly from the print data conversion unit 200.

FIG. 3 illustrates the hardware configuration of the paper document history management system, which is the same as in Embodiment 1.

FIG. 4 illustrates the configuration of the paper document history DB. Because this embodiment does not handle image logs, it omits the elements that process paper IDs and image logs. Specifically, the paper document history DB 220 does not include the image log table 42, the paper ID management table 46, or the paper ID management table update unit 460, and the log management table 40 has no paper ID column 408. In addition, the copy log and discard log with log management numbers 4 and 5 in the example of FIG. 4 do not exist.

FIG. 5 is an example flowchart of the paper document history DB update process. Because this embodiment does not handle paper IDs, steps 514 to 518 are omitted.

FIG. 6 illustrates the paper document history search interface. Because this embodiment handles only print logs, the log-type selector in the search-condition area 602 is unnecessary, and the discard log shown in the search-result area 604 does not appear. Since only print logs are handled, there is no paper document history to process, so area 606 is also unnecessary.

FIG. 7 illustrates the configuration of the paper document history search program. Because this embodiment does not need to handle paper document histories, the paper document history search unit 704 is unnecessary.

FIG. 8 is an example flowchart of the paper document log search process. The operation flow of the paper document log search process in this embodiment is the same as in Embodiment 1.

By extracting text information from print data according to the above configuration and operation flow, print logs become searchable by text.

Next, a method and apparatus are described that make paper document logs searchable when paper documents circulate across systems, for example when a paper document printed or copied within the paper document history management system 10 of Embodiment 1 or 2 is taken to another paper document history management system and copied or discarded there.

FIG. 10 illustrates the overall configuration of a system made up of multiple paper document history management systems. The r paper document history management systems 10-1 to 10-r each correspond to the paper document history management system 10 of Embodiment 1 or 2 and include paper document history management servers 100-1 to 100-r, respectively. The servers 100-1 to 100-r are connected to a network 1002. Also connected to network 1002 is a paper document history DNS server 1000, which manages the connection information for accessing the servers 100-1 to 100-r. Each component is described in detail below.

Each paper document history management system 10-i (i = 1 to r) includes a paper document history management server 100-i along with client PCs, printers, multifunction peripherals, shredders, and so on. For an organization that only prints and has no multifunction peripherals or shredders, the paper document history management system is the one described in Embodiment 2. Conversely, an organization that does no printing and handles only paper documents delivered by hand from other organizations need not equip its paper document management system with client PCs holding source electronic documents or with printers.

The paper document history DNS server 1000 manages connection information, such as IP addresses, for accessing the paper document history DBs of the servers 100-1 to 100-r. Alternatively, each of the servers 100-1 to 100-r may itself hold the connection information of the servers belonging to the other systems, in which case the paper document history DNS server 1000 is unnecessary.

When each of the paper document history management systems 10-1 to 10-r collects the paper document logs of one department and a higher-level organization manages them all, network 1002 is an intranet managed by that organization. Conversely, if each of the systems 10-1 to 10-r is managed by an independent organization such as a company, network 1002 is the Internet.

The operation flow for searching the logs of paper documents that have circulated across systems is described with reference to FIGS. 8 and 9. Hereinafter, the paper document history DB belonging to paper document history management server 100-i is denoted paper document history DB-i (i = 1 to r). The case described is a search, starting from a given paper document history DB-k (k = 1 to r), for paper document logs and paper document history information that match a search condition.

FIG. 8 is an example flowchart of the paper document log search process of this embodiment. In step 804 of this flowchart, when paper document logs containing the search keyword are looked up in the text search table, matching logs are retrieved not only from paper document history DB-k but also from every paper document history DB-j (j = 1 to r, j ≠ k). Specifically, the access information for each paper document history management server 100-j (j ≠ k) is obtained from the paper document history DNS server 1000, and the text search table of each DB-j is consulted to obtain the paper document logs containing the search keyword. All processing other than step 804 is the same as in Embodiment 1.

FIG. 9 is an example flowchart of the paper document history search process of this embodiment. In step 908 of this flowchart, when paper document logs containing the paper ID being traced are looked up in the paper ID management table, matching logs are retrieved not only from paper document history DB-k but also from every paper document history DB-j (j = 1 to r, j ≠ k). Specifically, as above, the access information for each paper document history management server 100-j (j ≠ k) is obtained from the paper document history DNS server 1000, and the paper ID management table of each DB-j is consulted to obtain the paper document logs containing the paper ID being traced. All processing other than step 908 is the same as in Embodiment 1.
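The fan-out used in steps 804 and 908 can be sketched with in-memory stand-ins. `HistoryDB`, `HistoryDirectory`, and `federated` are hypothetical names: the directory plays the role of the paper document history DNS server 1000 but returns the DB objects directly instead of connection information such as IP addresses, and each query is a plain function call rather than a network request. Hits are tagged with the DB name on the assumption that log management numbers are unique only within one DB.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryDB:
    """Hypothetical stand-in for one paper document history DB-i."""
    name: str
    text_index: dict = field(default_factory=dict)  # word -> set of log numbers
    paper_ids: dict = field(default_factory=dict)   # paper ID -> set of log numbers

    def logs_with_word(self, word):
        return {(self.name, n) for n in self.text_index.get(word, set())}

    def logs_with_paper_id(self, paper_id):
        return {(self.name, n) for n in self.paper_ids.get(paper_id, set())}


class HistoryDirectory:
    """Hypothetical stand-in for the paper document history DNS server 1000."""

    def __init__(self, dbs):
        self._dbs = {db.name: db for db in dbs}

    def others(self, own_name):
        # A real directory would return connection information, not objects.
        return [db for name, db in self._dbs.items() if name != own_name]


def federated(local, directory, query):
    """Run `query` on DB-k and on every DB-j (j != k); merge tagged hits."""
    hits = set(query(local))
    for remote in directory.others(local.name):
        hits |= query(remote)
    return sorted(hits)


db1 = HistoryDB("DB-1", {"sales": {1}}, {"P-001": {1, 4}})
db2 = HistoryDB("DB-2", {"sales": {7}}, {"P-001": {3}})
directory = HistoryDirectory([db1, db2])
# Step 804: keyword lookup across all DBs.
print(federated(db1, directory, lambda db: db.logs_with_word("sales")))
# Step 908: paper ID lookup across all DBs.
print(federated(db1, directory, lambda db: db.logs_with_paper_id("P-001")))
```

Passing the lookup as a function keeps one fan-out routine for both the keyword search of step 804 and the paper ID search of step 908; the alternative in which each server holds its peers' connection information would simply replace `HistoryDirectory` with a list kept on each server.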

By following the above configuration and operation flow, the logs of paper documents that circulate across multiple paper document management systems become searchable by text.

According to the embodiments above, copy and discard logs that carry the same identifier (paper ID) as a log matching the search keyword are output as well, so logs of electronic document printing and of copying printed paper documents can be searched with high accuracy. Logs of paper document disposal also become searchable by text. Periodically searching the logs for keywords contained in confidential information makes it possible to detect information leaks. Furthermore, searching the logs using the content of a leaked paper document makes it possible to identify the source of the leak with high accuracy.

10 ... paper document history management system, 100 ... paper document history management server, 102-1 to 102-n ... client PC, 103-1 to 103-p ... printer, 104-1 to 104-m ... multifunction peripheral, 106-1 to 106-s ... shredder, 108 ... network, 20 ... electronic document, 22 ... print data with paper ID, 24 ... paper document log, 200 ... print data conversion unit, 202 ... paper ID generation unit, 204 ... print log collection unit, 206 ... text extraction unit, 208 ... paper ID writing unit, 210 ... print log registration unit, 212 ... output unit, 220 ... paper document history DB, 222 ... log acquisition unit, 224 ... paper ID extraction unit, 226 ... text extraction unit, 228 ... log registration unit, 230 ... paper document history search program, 300a, 300b ... hub, 302a, 302b ... CPU, 304a, 304b ... memory, 306a, 306b ... storage device, 308a, 308b ... storage device, 310a, 310b ... display unit, 312a, 312b ... operation unit, 314 ... scanning unit, 316 ... operating device, 40 ... log management table, 42 ... image log table, 44 ... text search table, 46 ... paper ID management table, 440 ... text search table update unit, 460 ... paper ID management table update unit, 600 ... example interface of the paper document history search program, 70 ... search condition, 72 ... search result, 700 ... input unit, 702 ... paper document log search unit, 704 ... paper document history search unit, 706 ... communication unit, 708 ... output unit, 10-1 to 10-r ... paper document history management system, 100-1 to 100-r ... paper document history management server, 1000 ... paper document history DNS server, 1002 ... network.

Claims

A paper document history management system comprising:
a paper document history DB that stores, as log information, the paper ID and the text of the electronic document included in a print log output from a client PC; the paper ID and the text extracted from the image of the paper document corresponding to the electronic document, included in a copy log output from a multifunction machine; and the paper ID and the text extracted from the image of at least one of the paper document and a copy paper document produced by copying the paper document with the multifunction machine, included in a discard log output from a shredder;
a paper document log search unit that searches the log information stored in the paper document history DB based on a search condition;
an output unit that outputs the retrieved log information;
an input unit that receives the search condition and a selection of the paper ID from the output log information; and
a paper document history search unit that searches the paper document history DB for the log information including the selected paper ID.

The paper document history management system according to claim 1, further comprising a text search table that stores each index term (keyword) in association with the management numbers of the log information whose text contains that index term,
wherein the paper document log search unit:
decomposes a search keyword included in the search condition into words;
when a word obtained by the decomposition exists in the index of the text search table, acquires the corresponding management numbers; and
treats the log information indicated by the acquired management numbers as the retrieved log information output by the output unit.

The paper document history management system according to claim 1, wherein, when the paper document history search unit retrieves a plurality of pieces of log information including the selected paper ID from the paper document history DB, the output unit outputs the set of those pieces of log information as paper document history information.

The paper document history management system according to claim 3, wherein the paper document history search unit calculates, from the set of pieces of log information, the number of paper sheets as (number of copies in the print logs) + (number of copies in the copy logs) − (number of copies in the discard logs), and
the output unit outputs the number of paper sheets.

The paper document history management system according to claim 3, wherein, when no log information including the selected paper ID can be obtained by searching the paper document history DB, the paper document history search unit selects log information stored in the paper document history DB whose text is similar to the text of the log information corresponding to the selected paper ID, or log information stored in the paper document history DB whose image is similar to the image of the log information corresponding to the selected paper ID.

A paper document history management system comprising:
a management server including the paper document history DB, the paper document log search unit, the output unit, the input unit, and the paper document history search unit;
a printer that prints the electronic document as a paper document to which the paper ID is assigned;
a client PC that is connected to the printer, holds the electronic document to be printed, assigns the paper ID to the electronic document, outputs it to the printer, and outputs the print log including the paper ID and the text of the electronic document to the management server;
a multifunction machine that at least copies the paper document, outputs the copy paper document, and outputs to the management server a copy log including the image of the paper document to which the paper ID is assigned; and
a shredder that discards at least one of the paper document and the copy paper document and outputs a discard log to which the paper ID is assigned and which includes the image of at least one of the discarded paper document and copy paper document.

The paper document history management system according to claim 6, wherein the client PC comprises:
a paper ID generation unit that generates the paper ID, which is an identifier for identifying a printed material;
a paper ID writing unit that assigns the generated paper ID to the electronic document to be output to the printer;
a text extraction unit that extracts the text of the electronic document from the electronic document; and
a print log registration unit that transmits the print log including the generated paper ID and the extracted text of the electronic document to the management server.

The paper document history management system according to claim 7, wherein the paper ID generation unit generates as many paper IDs as the electronic document has pages and assigns a generated paper ID to each page of the electronic document.

The paper document history management system according to claim 6, wherein the management server comprises:
a log acquisition unit that acquires the log information of at least one of the copy log from the multifunction machine and the discard log from the shredder;
a paper ID extraction unit that extracts the paper ID from the image included in the acquired log information;
a text extraction unit that extracts text information from the image; and
a log registration unit that registers the log information, the extracted paper ID, and the extracted text information in the paper document history DB.

The paper document history management system according to claim 9, wherein, when the paper ID cannot be extracted, the paper ID extraction unit outputs an indication that extraction of the paper ID has failed.

A paper document history management system comprising:
a printer that prints an electronic document as a paper document to which a paper ID is assigned;
a client PC that is connected to the printer, holds the electronic document to be printed, assigns the paper ID to the electronic document, outputs it to the printer, and outputs a print log including the paper ID and the text of the electronic document;
a multifunction machine that at least copies the paper document, outputs the copy paper document, and outputs a copy log including the image of the paper document to which the paper ID is assigned;
a shredder that discards at least one of the paper document and the copy paper document and outputs a discard log to which the paper ID is assigned and which includes the image of at least one of the discarded paper document and copy paper document; and
a management server that registers in a paper document history DB, as log information, the paper ID and the text of the electronic document included in the print log output from the client PC, the paper ID and the text extracted from the image of the paper document included in the copy log output from the multifunction machine, and the paper ID and the text extracted from the image of at least one of the paper document and the copy paper document included in the discard log output from the shredder.

The paper document history management system according to claim 11, wherein the client PC comprises:
a paper ID generation unit that generates the paper ID, which is an identifier for identifying a printed material;
a paper ID writing unit that assigns the generated paper ID to the electronic document to be output to the printer;
a text extraction unit that extracts the text of the electronic document from the electronic document; and
a print log registration unit that transmits the print log including the generated paper ID and the extracted text of the electronic document to the management server.

The paper document history management system according to claim 12, wherein the paper ID generation unit generates as many paper IDs as the electronic document has pages and assigns a generated paper ID to each page of the electronic document.

The paper document history management system according to claim 11, wherein the management server comprises:
a log acquisition unit that acquires the log information of at least one of the copy log from the multifunction machine and the discard log from the shredder;
a paper ID extraction unit that extracts the paper ID from the image included in the acquired log information;
a text extraction unit that extracts text information from the image; and
a log registration unit that registers the log information, the extracted paper ID, and the extracted text information in the paper document history DB.

The paper document history management system according to claim 14, wherein, when the paper ID cannot be extracted, the paper ID extraction unit outputs an indication that extraction of the paper ID has failed.