JP2006252455A

JP2006252455A - File management device, file management method and file management program

Info

Publication number: JP2006252455A
Application number: JP2005071590A
Authority: JP
Inventors: Hirosuke Takada; 浩祐高田
Original assignee: Fuji Photo Film Co Ltd
Current assignee: Fujifilm Holdings Corp
Priority date: 2005-03-14
Filing date: 2005-03-14
Publication date: 2006-09-21

Abstract

PROBLEM TO BE SOLVED: To automatically assign to images file names suited to the contents of the documents, without the user having to specify the file names one by one.

SOLUTION: A file name assigning unit 110c assigns a file name to a document image Ik in accordance with the page information extracted from the document image Ik. For instance, if the title information extracted from the document image Ik is "Complete Works of Literature" and the page number information is "50", the document image Ik is given the file name "Complete Works of Literature 50". The total number of pages may be included as well. With n=100, for instance, the file name becomes "Complete Works of Literature 50/100". The file name shows at a glance which page of which document the scanned document is.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a file management apparatus, a file management method, and a file management program, and more particularly to the management of documents that have been converted into digital image files.

Conventionally, various techniques have been developed to make scanner images easy to manage when a scanner reads a plurality of documents in succession. For example, according to Patent Document 1, a page generation unit provides a page for designating a scan order based on component information retrieved by a DB search unit. When file name information, such as the component information to be assigned to the files, is rearranged on that page according to the reading order of the documents, a template generation unit generates a renaming job template based on the rearranged file name information. A rename processing unit then follows the renaming job template and sequentially assigns the designated file names to the image files generated in the reading order of the documents.
Patent Document 1: JP 2003-271614 A

With the technique of Patent Document 1, the user must designate, one by one, the file name information to be used for renaming in the reading order of the documents, which is troublesome. The present invention has been made in view of this problem, and its object is to automatically assign to an image a file name suited to the content of the document, without the user having to specify file names one by one.

In order to solve the above problem, a file management apparatus according to the present invention includes an image acquisition unit that acquires an image of a document, an image storage unit that stores the image of the document, a page information extraction unit that extracts page information indicating the affiliation of the document from the image of the document stored in the image storage unit, and a file name assigning unit that assigns a file name to the image of the document according to the page information extracted by the page information extraction unit.

According to the present invention, the extracted page information is incorporated into the file name of the document image, so the page number of the document can be seen at a glance from the file name. This avoids the situation in which file names are given only serial numbers and it becomes unclear which image corresponds to which document page. In addition, the user does not need to specify an appropriate file name each time.

The page information includes information indicating the page number of the document, information indicating the title of the document, the creator of the document, any other information sufficient to identify the affiliation of the document, or information combining some or all of these.

The page information extraction unit may extract the page information from a character string surrounded by margins in the image of the document.

Page information is usually separated from the body text and surrounded by margins, so the page information can be extracted efficiently in this way.
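One way to picture "a character string surrounded by margins" is to work on the bounding boxes of recognized text and keep only those boxes that no other box comes close to. The following is a minimal sketch under that assumption; the `TextBox` type, the `min_gap` threshold, and both function names are hypothetical and do not come from the specification, which does not say how the margin test is performed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextBox:
    """Hypothetical bounding box of one recognized character string.

    Coordinates use an origin at the lower-left corner of the page.
    """
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def gap_between(a: TextBox, b: TextBox) -> float:
    """Larger of the horizontal and vertical separation (0 if the boxes overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return max(dx, dy)


def margin_isolated(boxes: List[TextBox], min_gap: float) -> List[TextBox]:
    """Keep boxes separated from every other box by at least min_gap (assumed rule)."""
    return [
        a for a in boxes
        if all(gap_between(a, b) >= min_gap for b in boxes if b is not a)
    ]


boxes = [
    TextBox("body line 1", 10, 30, 90, 60),
    TextBox("body line 2", 10, 62, 90, 90),
    TextBox("50", 45, 5, 55, 10),
]
isolated = margin_isolated(boxes, min_gap=5)
print([b.text for b in isolated])
```

In this sketch the two body lines are only 2 units apart and are dropped, while the box near the bottom edge is kept as a candidate for page information.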

The file management apparatus may further include a page position information acquisition unit that acquires page position information, which is information on the position of the page information extracted from the image of a certain document, and a page position storage unit that stores the page position information. The page information extraction unit may then extract page information from the image of another document within a predetermined area based on the page position information.

If page information has already been extracted from the image of a certain document, the page information in the images of other documents is likely to be at substantially the same position as that indicated by the page position information. The page information can therefore be extracted more efficiently than by searching the entire area of the image.

The file management apparatus may further include a title information extraction unit that extracts title information indicating the title of the document from the image of the document.

The title information extraction unit may extract the title information from a character string surrounded by margins in the image of the document.

Title information is usually separated from the body text and surrounded by margins, so the title information, which is one form of page information, can be extracted efficiently in this way.

The file management apparatus may further include a title candidate registration unit that registers candidates for the title of the document, and the title information extraction unit may extract, as title information, a character string that matches a title candidate registered in the title candidate registration unit.

If the document has standardized content, as an in-house document does, registering the foreseeable titles in advance as candidates allows title information to be extracted efficiently from the character strings.

The file management apparatus may further include a title designation unit that lets the user designate, from among the extracted character strings, a desired character string to be used as title information.

In this way, even if several kinds of character strings are extracted, the user can freely designate the title information from among them.

The file name assigning unit may assign the file name of the image of the document according to the title information.

Since the extracted title information is incorporated into the file name of the document image, the title of the document can be seen at a glance from the file name.

The file management apparatus may further include a folder creation unit that creates a folder corresponding to the title information in the image storage unit, and a file management unit that stores the image of the document from which the title information was extracted in the folder corresponding to that title information.

In this way, images can be classified into a folder for each title.

The file management apparatus may further include a page format determination unit that determines the format of the page information, and the file management unit may store images whose page information has a common format in a common folder.

In this way, images having page information of the same format are classified into a common folder, which prevents, as far as possible, documents of different formats from being mixed in the same folder.

The present invention also includes a file management method including the steps of: acquiring an image of a document; storing the image of the document; extracting page information indicating the page number of the document from the stored image of the document; and assigning a file name to the image of the document according to the extracted page information.

The present invention also includes a file management program that causes a computer to execute the steps of: acquiring an image of a document; storing the image of the document; extracting page information indicating the page number of the document from the stored image of the document; and assigning a file name to the image of the document according to the extracted page information.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a functional block diagram of a file management apparatus 10 according to a preferred embodiment of the present invention. The file management apparatus 10 includes a touch panel monitor 106 and a communication line interface 108 capable of transmitting and receiving necessary information via a network such as the Internet. The file management apparatus 10 is also provided with a USB interface 206 for data communication with an image acquisition apparatus 200 configured by a scanner or the like. The touch panel monitor 106 displays a list of images and various operation buttons.

The file management apparatus 10 also includes a central processing unit (CPU) 110 that controls the file management apparatus 10 as a whole; a system memory 112 composed of a ROM, in which programs for operating the CPU 110 and the like are written, and a RAM, which serves as a work area when the CPU 110 executes processing; a display controller 114 that outputs the information to be displayed on the touch panel monitor 106; and an input controller 116 that accepts input of various information through presses applied to the touch panel monitor 106.

The file management apparatus 10 further includes a hard disk unit (HDD) 118 that temporarily stores images and the like from the image acquisition apparatus 200, and an HDD controller 119 that controls the storing of information in the HDD 118 and the reading of information from the HDD 118.

The image acquisition apparatus 200 is configured by any of various scanners, such as a line CCD scanner, and is connected to the CPU 110 via the USB interface 206. The image acquisition apparatus 200 successively scans, one sheet at a time, a plurality of documents set on a mount (not shown), reads each document, and converts it into an image file (document image). The document images are stored in the HDD 118.

Note that the image acquisition apparatus 200 may instead be configured by any of various imaging devices, such as a digital camera or a video camera. That is, a document may be photographed with such an imaging device, and the resulting image file may be stored in the HDD 118 as a document image.

FIG. 2 is a main part configuration diagram of the file management apparatus 10. The CPU 110 has, as processing units (modules), a scanner control unit 110a, a page information extraction unit 110b, a file name assigning unit 110c, a page position information acquisition unit 110d, a title information extraction unit 110e, a title candidate registration unit 110f, a folder creation unit 110g, a file management unit 110h, and a page format determination unit 110i. Each processing unit is stored in the ROM as a program. The functions of these processing units are described later.

The system memory 112 has a page position storage unit 112a, which is an area for storing the page position information described later. The HDD 118 serves as the image storage unit that stores the document images.

The processing executed by the CPU 110 will now be described with reference to the flowchart of FIG. 3.

In S1, the scanner control unit 110a controls the image acquisition apparatus 200 to scan the documents and stores the document images in the HDD 118. The document images may be stored in the HDD 118 after various kinds of image processing, such as density adjustment, have been applied. Each image stored in the HDD 118 is provisionally given, as its file name, a serial number k that follows the scan order (k = 1, 2, 3, ..., n, where n is the total number of scanned documents). Hereinafter, each document image is denoted by Ik, and steps S1 to S15 are looped for k = 1 to n.

In S2, the file management unit 110h determines whether page position information (information on the position at which page information is located in a document image; the same applies hereinafter) is stored in the page position storage unit 112a. If page position information is stored, the process proceeds to S3; if not, it proceeds to S5.

The page position information is stored in S8 or S11, described later. Therefore, at least when S2 is executed for the first document image, no page position information has been stored yet, and the process automatically proceeds to S5.

In S3, the page information extraction unit 110b extracts, from the document image Ik, character strings surrounded by margins within a predetermined peripheral area based on the stored page position information.

For example, suppose there is a document image Ik as shown in FIG. 4, and that a pair of diagonal points P0 (X0, Y0) and P1 (X1, Y1), in XY plane coordinates whose origin O is the lower left corner of the document image Ik, is stored in the page position storage unit 112a as page position information. In this case, the page information extraction unit 110b extracts character strings from within the rectangular region R0 defined by the diagonal points P0 and P1.

Alternatively, suppose that a pair of diagonal points P2 (X2, Y2) and P3 (X3, Y3) in the XY plane coordinates is stored in the page position storage unit 112a as page position information. In this case, the page information extraction unit 110b extracts character strings from within the rectangular region R1 defined by the diagonal points P2 and P3.
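The region-limited search of S3 can be sketched as a filter over text bounding boxes, using a stored pair of diagonal points as the search rectangle. This is only an illustration: the `Point` and `TextBox` types, the `boxes_in_region` function, and the `tolerance` parameter (standing in for "substantially the same position") are hypothetical and are not defined in the specification.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    """A diagonal point in page coordinates (origin at the lower-left corner)."""
    x: float
    y: float


class TextBox(NamedTuple):
    """Hypothetical bounding box of one recognized character string."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_in_region(boxes: List[TextBox], p_a: Point, p_b: Point,
                    tolerance: float = 0.0) -> List[TextBox]:
    """Return boxes lying entirely inside the rectangle spanned by p_a and p_b.

    The two points may be given in either order. `tolerance` widens the
    rectangle on every side (an assumed way to allow small position shifts).
    """
    left, right = sorted((p_a.x, p_b.x))
    bottom, top = sorted((p_a.y, p_b.y))
    return [
        b for b in boxes
        if b.x0 >= left - tolerance and b.x1 <= right + tolerance
        and b.y0 >= bottom - tolerance and b.y1 <= top + tolerance
    ]


boxes = [TextBox("50", 45, 5, 55, 10), TextBox("body", 10, 30, 90, 60)]
stored_p0, stored_p1 = Point(40, 2), Point(60, 12)
hits = boxes_in_region(boxes, stored_p0, stored_p1)
print([b.text for b in hits])
```

Only the boxes inside the stored rectangle are examined, which is the efficiency gain over scanning the whole page that the description attributes to the page position information.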

In S4, the title information extraction unit 110e determines whether the character strings extracted from the region R1 of the document image Ik include title information (information indicating the title of the document; one form of page information). This determination is made, for example, as follows. A title is judged to be included when the extracted character strings contain a string that matches one of the predetermined title character strings registered in advance in the HDD 118 by the title candidate registration unit 110f. Alternatively, the title information extraction unit 110e displays an extracted character string on the touch panel monitor 106 and has the user indicate, via the touch panel monitor 106, whether the character string is a title. If the user indicates that it is a title, a title is judged to be included. If a title is included, the process proceeds to S8; if not, it proceeds to S5.

Note that the title candidate registration unit 110f may allow a title character string entered freely from the touch panel monitor 106 to be registered in the HDD 118 as a title candidate. A plurality of title character strings can be registered, for example comprehensive or generic strings such as "literature", "medicine", "science", and "business", or more specific strings such as "tanka" and "poetry".

When several predetermined title character strings match the character strings extracted from the document image Ik, the title information extraction unit 110e may display them on the touch panel monitor 106 as title candidates and determine the title information by having the user select one of the candidates via the touch panel monitor 106.
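The candidate matching of S4 and the selection among several matches can be sketched as follows. The sketch assumes that "matching" means the registered candidate occurs as a substring of an extracted string, and it models the touch panel selection as a callback; both function names and the callback are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def match_title_candidates(extracted: Sequence[str],
                           candidates: Sequence[str]) -> List[str]:
    """Return the registered candidates found in any extracted string.

    Substring matching is an assumption; results keep registration order.
    """
    return [c for c in candidates if any(c in s for s in extracted)]


def decide_title(matches: Sequence[str],
                 choose: Optional[Callable[[Sequence[str]], str]] = None
                 ) -> Optional[str]:
    """Settle on one title: none, the single match, or the user's choice."""
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    if choose is None:
        raise ValueError("several candidates match; a selection callback is needed")
    return choose(matches)


registered = ["literature", "medicine", "poetry"]

single = match_title_candidates(["complete works of literature", "50"], registered)
print(decide_title(single))

several = match_title_candidates(["poetry and literature"], registered)
# Stand-in for the user picking the second candidate on the touch panel.
print(decide_title(several, choose=lambda options: options[1]))
```

Returning `None` when nothing matches corresponds to the "title not included" branch of the flowchart.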

In S5, the page information extraction unit 110b extracts character strings from the regions surrounded by margins within the entire area of the document image Ik. For example, in the case of the document image Ik of FIG. 4, character strings are extracted from all of the margin-surrounded regions R0 to R5.

In S6, the title information extraction unit 110e determines whether the character strings extracted from the document image Ik include title information. This determination is the same as in S4. If title information is included, the process proceeds to S7; if not, it proceeds to S10.

In S7, the folder creation unit 110g creates, in the HDD 118, a folder corresponding to the title information extracted from the document image Ik (for example, a folder whose name is the title information). The file management unit 110h stores the document image Ik, from which the title information was extracted, in this folder. The document images Ik can thereby be classified into folders, one for each piece of title information extracted from the document images.
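The folder handling of S7 maps naturally onto ordinary file system calls. The sketch below uses Python's standard `pathlib` and `shutil` modules; the function name `store_in_title_folder` and the directory layout are hypothetical, and a real title may need sanitizing before it can serve as a folder name.

```python
import shutil
from pathlib import Path


def store_in_title_folder(image_path: Path, root: Path, title: str) -> Path:
    """Create the folder named after the title if needed and move the image into it."""
    folder = root / title
    # exist_ok=True lets later pages of the same title reuse the folder.
    folder.mkdir(parents=True, exist_ok=True)
    destination = folder / image_path.name
    shutil.move(str(image_path), str(destination))
    return destination
```

Calling the function once per document image with the extracted title sorts the images into one folder per title, as the step describes.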

In S8, the page position information acquisition unit 110d acquires page position information from the document image Ik and stores it in the page position storage unit 112a. Next, the page information extraction unit 110b determines whether the extracted character strings include page number information (information indicating the page number; one form of page information). This determination includes distinguishing a page number from a mere numeral. For example, a numeral detected in one of the four corner regions or in the top, bottom, left, or right peripheral region of the document image Ik is judged to be page number information. Alternatively, if a character string that combines the current page number with the total number of pages (for example, 1/5) is included, page number information is judged to be included. If page number information is included, the process proceeds to S14; if not, it proceeds to S9.
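The two page number rules of S8 can be sketched with regular expressions: a "current/total" string is accepted anywhere, while a bare numeral is accepted only when it lies near the page edge. The function names, the regular expressions, and the 10% edge band are assumptions made for illustration; the specification does not give concrete thresholds.

```python
import re
from typing import Optional, Tuple

FRACTION = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*$")  # e.g. "1/5"
BARE_NUMBER = re.compile(r"^\s*(\d+)\s*$")           # e.g. "50"


def is_peripheral(x: float, y: float, page_width: float, page_height: float,
                  band: float = 0.1) -> bool:
    """True if (x, y) lies within `band` of any page edge (assumed threshold)."""
    return (x <= band * page_width or x >= (1 - band) * page_width
            or y <= band * page_height or y >= (1 - band) * page_height)


def parse_page_number(text: str, peripheral: bool
                      ) -> Optional[Tuple[int, Optional[int]]]:
    """Return (page, total) if `text` looks like page number information, else None.

    `total` is None when only a bare numeral was found.
    """
    match = FRACTION.match(text)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = BARE_NUMBER.match(text)
    if match and peripheral:
        return int(match.group(1)), None
    return None


print(parse_page_number("1/5", peripheral=False))
print(parse_page_number("50", peripheral=is_peripheral(50, 5, 100, 140)))
print(parse_page_number("50", peripheral=is_peripheral(50, 70, 100, 140)))
```

A numeral in the middle of the body text is rejected, which is the distinction between a page number and a mere numeral that the step calls for.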

In S9, the file name assigning unit 110c assigns to the document image, as its file name, a predetermined character string containing the title extracted from the document image Ik, the serial number, and a character or symbol indicating that the page number could not be extracted. For example, if the extracted title is "Complete Works of Literature", the serial number is "001", and the character or symbol indicating that the page number could not be extracted is "?", the file name "Complete Works of Literature 001?" is assigned to the document image.
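The fallback name of S9 is a simple concatenation, sketched below. The function name, the space between title and serial number, and the three-digit zero padding are assumptions. Note also that "?" is not permitted in file names on some file systems (Windows, for example), so the marker is left configurable.

```python
def fallback_file_name(title: str, serial: int, marker: str = "?",
                       width: int = 3) -> str:
    """Build 'title + serial + marker' for a page whose number was not found."""
    return f"{title} {serial:0{width}d}{marker}"


print(fallback_file_name("Complete Works of Literature", 1))
print(fallback_file_name("Complete Works of Literature", 12, marker="_nopage"))
```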

In S10, the page information extraction unit 110b determines whether the character strings extracted from the document image Ik include page number information. This determination is the same as in S8.

In S11, the page position information acquisition unit 110d acquires page position information from the document image Ik and stores it in the page position storage unit 112a. Next, the page format determination unit 110i compares the format of the page information extracted from the document image Ik with the format of the page information extracted from each of the other document images Ij (j ≠ k), for all of the other document images Ij.

In S12, based on the result of the comparison, the page format determination unit 110i determines whether another document image Ij (j ≠ k) whose page information has the same format as that extracted from the document image Ik is stored in the HDD 118. Page information has a common format when, for example, features such as the font, size, layout, and format of the page number or title are the same. If such another document image Ij is stored, the process proceeds to S13; if not, it proceeds to S15.
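The format comparison of S11 and S12 can be pictured as comparing small feature records. The sketch below treats two formats as common only when every feature is identical, which is an assumption; the specification lists font, size, layout, and format as example features but does not say how similarity is judged. The `PageFormat` fields and both function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PageFormat:
    """Hypothetical feature record describing how page information appears."""
    font: str
    size_pt: float
    layout: str   # e.g. "bottom-center"
    pattern: str  # e.g. "n/N" or "n"


def images_sharing_format(target: str,
                          formats: Dict[str, PageFormat]) -> List[str]:
    """Names of the other images whose page format equals that of `target`."""
    return [name for name, fmt in formats.items()
            if name != target and fmt == formats[target]]


def group_by_format(formats: Dict[str, PageFormat]
                    ) -> Dict[PageFormat, List[str]]:
    """Group image names by identical page format, e.g. for shared folders."""
    groups: Dict[PageFormat, List[str]] = defaultdict(list)
    for name, fmt in formats.items():
        groups[fmt].append(name)
    return dict(groups)


formats = {
    "001": PageFormat("serif", 9.0, "bottom-center", "n/N"),
    "002": PageFormat("serif", 9.0, "bottom-center", "n/N"),
    "003": PageFormat("sans", 8.0, "top-right", "n"),
}
print(images_sharing_format("001", formats))
print(images_sharing_format("003", formats))
```

The length of the returned list corresponds to the distinction drawn in the following step between no match, one match, and several matches.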

In S13, the file management unit 110h determines whether there is only one such other document image Ij rather than several. If there is only one other document image Ij, the process proceeds to S15; if there are several, it proceeds to S14.

In S14, the file name assigning unit 110c assigns a file name to the document image Ik according to the page information extracted from the document image Ik. For example, if the title information extracted from the document image Ik is "Complete Works of Literature" and the page number information is "50", the file name "Complete Works of Literature 50" is assigned to the document image Ik. The total number of pages may be included as well. For example, if n = 100, the file name "Complete Works of Literature 50/100" is assigned to the document image Ik. This file name shows at a glance which page of which document the scanned document is.
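The naming rule of S14 can be sketched as a small formatting function. The function name and the `separator` parameter are hypothetical. The parameter is included because "/" is a path separator on common file systems, so a practical implementation would probably substitute another string; that substitution is an assumption and is not stated in the specification.

```python
from typing import Optional


def page_file_name(title: str, page: int, total: Optional[int] = None,
                   separator: str = "/") -> str:
    """Build 'title page' or 'title page/total' from extracted page information."""
    suffix = f"{page}{separator}{total}" if total is not None else f"{page}"
    return f"{title} {suffix}"


print(page_file_name("Complete Works of Literature", 50))
print(page_file_name("Complete Works of Literature", 50, total=100))
print(page_file_name("Complete Works of Literature", 50, total=100,
                     separator="_of_"))
```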

In S15, the file name assigning unit 110c assigns a newly generated sequential number as the file name to each of the document image Ik and the other document images Ij, and stores them in a predetermined folder. The sequential numbers are assigned in the order of the previously assigned serial numbers. In this way, a sequential number can be assigned even to document images from which no title was extracted.
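The renumbering of S15 can be sketched as sorting the provisional serial numbers and handing out new consecutive numbers in that order. The function name and the three-digit zero padding are assumptions.

```python
from typing import Dict, Iterable


def renumber(provisional_serials: Iterable[int], width: int = 3) -> Dict[int, str]:
    """Map each provisional serial number to a new consecutive file name.

    New numbers start at 1 and follow the order of the provisional numbers.
    """
    ordered = sorted(provisional_serials)
    return {serial: f"{index:0{width}d}"
            for index, serial in enumerate(ordered, start=1)}


# Images 2, 5, and 7 of the scan run had no extractable title.
print(renumber([7, 2, 5]))
```

Because the new numbers follow the provisional serial numbers, the original scan order of the untitled images is preserved.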

A method and a program that cause the CPU 110 to execute steps S1 to S15 described above are also included in the present invention.

FIG. 1 is a block diagram of a file management apparatus according to a preferred embodiment of the present invention.
FIG. 2 is a main part configuration diagram of the file management apparatus.
FIG. 3 is a flowchart showing the flow of processing executed by the file management apparatus.
FIG. 4 is a diagram showing an example of a document.

Explanation of symbols

110: CPU, 112: System memory, 118: Hard disk unit

Claims

A file management apparatus comprising:
an image acquisition unit that acquires an image of a document;
an image storage unit that stores the image of the document;
a page information extraction unit that extracts page information indicating the affiliation of the document from the image of the document stored in the image storage unit; and
a file name assigning unit that assigns a file name to the image of the document according to the page information extracted by the page information extraction unit.

The file management apparatus according to claim 1, wherein the page information extraction unit extracts page information from a character string surrounded by a margin in an image of the document.

The file management apparatus according to claim 1, further comprising:
a page position information acquisition unit that acquires page position information, which is information on the position of page information extracted from an image of a certain document; and
a page position storage unit that stores the page position information,
wherein the page information extraction unit extracts page information from an image of another document within a predetermined area based on the page position information.

The file management apparatus according to claim 1, further comprising a title information extraction unit that extracts title information indicating a title of the document from the image of the document.

The file management apparatus according to claim 4, wherein the title information extraction unit extracts title information from a character string surrounded by a margin in an image of the document.

The file management apparatus according to claim 4, further comprising a title candidate registration unit that registers candidates for the title of the document,
wherein the title information extraction unit extracts, as title information, a character string that matches a title candidate registered in the title candidate registration unit.

The file management apparatus according to claim 6, further comprising a title designation unit that lets a user designate, from among the extracted character strings, a desired character string to be used as title information.

The file management apparatus according to claim 4, wherein the file name assigning unit assigns a file name of the image of the document according to the title information.

The file management apparatus according to claim 4, further comprising:
a folder creation unit that creates a folder corresponding to the title information in the image storage unit; and
a file management unit that stores the image of the document from which the title information was extracted in the folder corresponding to the title information.

The file management apparatus according to claim 9, further comprising a page format determination unit that determines the format of the page information,
wherein the file management unit stores images whose page information has a common format in a common folder.

A file management method comprising:
acquiring an image of a document;
storing the image of the document;
extracting page information indicating the page number of the document from the stored image of the document; and
assigning a file name to the image of the document according to the extracted page information.

A file management program that causes a computer to execute:
acquiring an image of a document;
storing the image of the document;
extracting page information indicating the page number of the document from the stored image of the document; and
assigning a file name to the image of the document according to the extracted page information.