JP2601139B2 - String search device

Info

Publication number: JP2601139B2
Application number: JP5148737A
Authority: JP
Inventors: 陽子恒元; 睦治垣原
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1993-06-21
Filing date: 1993-06-21
Publication date: 1997-04-16
Anticipated expiration: 2012-04-16
Also published as: JPH0721191A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character string search device that searches a document data file, in which newspapers, magazines, various documents, and the like are registered, for a specified character string and outputs the result.

[0002]

[Prior Art] Conventionally, character string search has focused on the speed of search processing. Keywords could be specified for search only one at a time, and neither the legibility of the device output nor the content displayed for the search results was designed, through editing or processing, to help extract the relevant documents from the document file efficiently.

[0003]

[Means for Solving the Problems] The character string search device of the present invention searches a document data file for specified keywords and outputs the result. It comprises: a keyword correspondence table that stores a table associating each keyword with its identification number; a search result table that stores a table associating keyword identification numbers with the positions of those keywords in the document data file; an answer editing table that stores a table showing the frequency of occurrence of each keyword in each document; a document data file that stores the plurality of document data to be searched; a document ID file that stores the position of each document in the document data file; a keyword input processing unit that creates the keyword correspondence table from the input keywords; a character string search processing unit that searches the document data file and creates the search result table from the search results and the keyword correspondence table; a search result editing processing unit that creates the answer editing table from the keyword positions in the search result table and the document positions in the document ID file; and a search result output processing unit that outputs the contents of the answer editing table as a quantitative expression of the documents the searcher truly seeks.

[0004]

[Means for Solving the Problems] To solve the problems described above, the character string search device according to the present invention searches a document data file for specified character strings (keywords) and outputs the result. It comprises: a keyword correspondence table representing the correspondence between each keyword and its keyword identification number; a search result table storing keyword identification numbers together with the position information of the corresponding character strings in the document data file; an answer editing table that tallies the search results for each document by keyword; a document data file storing the plurality of document data to be searched; a document ID file storing the position information of each document in the document data file; a keyword input processing unit that creates the keyword correspondence table from the input keywords; a character string search processing unit that searches the document data file on the basis of the keyword correspondence table and creates the search result table, in which the search results are expressed as the position information (addresses) where the matching character strings occur; a search result editing processing unit that determines, from the position information in the search result table and the position information of each document held in the document ID file, which documents contain each keyword and how often, and creates the answer editing table from that result; and a search result output processing unit that outputs the contents of the answer editing table to an output device.

[0005]

[Operation] In the present invention, a plurality of keywords can be specified at once as the character strings to be searched for, and the search results are edited and processed by keyword for each document and output to the output device. This improves the operability of the search and shows, from the search results, which keywords appear in which documents and how frequently. Even when many documents contain the keywords, the documents actually needed can be narrowed down further, which eliminates the waste of reading unnecessary documents from the document data file. Increasing the number of keywords specified for the search allows the target documents to be extracted more accurately and efficiently.
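The narrowing step described above is left to the searcher, who reads the frequency table. As a rough illustration only, the following Python sketch shows one way such a table could be used to shortlist documents. The function name `shortlist`, the table layout (document number mapped to per-keyword counts), and the ranking by total count are all assumptions made for this example and are not taken from the patent.

```python
def shortlist(answer_table, required_ids):
    """Return (document number, total hits) pairs, best candidates first.

    answer_table: hypothetical layout {document number: {keyword ID: count}}.
    required_ids: keyword IDs that must all occur in a document.
    Ranking by total hit count is an illustrative choice, not the patent's.
    """
    hits = [
        (doc_no, sum(counts.values()))
        for doc_no, counts in answer_table.items()
        if all(counts.get(k, 0) > 0 for k in required_ids)
    ]
    # Highest total first; ties broken by document number.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


table = {1: {1: 3, 2: 1}, 2: {1: 5}, 3: {1: 1, 2: 4}}
candidates = shortlist(table, [1, 2])
```

Document 2 is dropped here because it never contains keyword 2, which mirrors the idea that adding keywords makes the extraction more selective.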

[0006]

[Embodiment] The present invention will now be described with reference to the drawings.

[0007]

In FIG. 1, the character string search device 1 of the present invention comprises a keyword input processing unit 11, a character string search processing unit 12, a search result editing processing unit 13, and a search result output processing unit 14, together with a keyword correspondence table 15 created by the keyword input processing unit 11, a search result table 16 created by the character string search processing unit 12, an answer editing table 17 created by the search result editing processing unit 13, a document data file 18 storing the document data to be searched, and a document ID file 19 storing the position information (address) of each document in the document data file. Connected to the character string search device 1 are an input device 2, which supplies search character strings to the keyword input processing unit 11, and an output device 3, to which the search result output processing unit 14 outputs the contents of the answer editing table 17 containing the edited search results.
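To make the relationship between the tables and files of FIG. 1 concrete, here is a minimal Python sketch of one possible in-memory layout. The class name `SearchDevice`, the helper `load_documents`, the field types, and the choice of an exclusive end address are assumptions for illustration; the patent does not specify how the tables are stored.

```python
from dataclasses import dataclass, field


@dataclass
class SearchDevice:
    """Hypothetical in-memory layout of items 15 to 19 in FIG. 1."""
    # Table 15: keyword identification number -> keyword
    keyword_table: dict = field(default_factory=dict)
    # Table 16: list of (address, keyword identification number)
    search_results: list = field(default_factory=list)
    # Table 17: document number -> {keyword identification number: count}
    answer_table: dict = field(default_factory=dict)
    # File 18: all documents stored back to back
    document_data: str = ""
    # File 19: (start address, end address) per document; end is exclusive here
    document_ids: list = field(default_factory=list)


def load_documents(texts):
    """Concatenate documents and record each document's address range."""
    device = SearchDevice()
    offset = 0
    for text in texts:
        device.document_ids.append((offset, offset + len(text)))
        offset += len(text)
    device.document_data = "".join(texts)
    return device


device = load_documents(["abc", "defgh"])
```

Keeping the address ranges in a separate structure is what later lets a bare address from the search result table be traced back to its document.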

[0008]

Next, the function of each processing unit and the character string search operation are described. The keyword input processing unit 11 waits for search character strings (more than one may be specified; hereinafter called keywords) to be input from the input device 2. When keywords are input, the keyword input processing unit 11 assigns a keyword identification number to each keyword, associates each keyword with its keyword identification number, creates a keyword correspondence table 15 such as the one illustrated in FIG. 2, and uses it to manage the keywords.
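The keyword numbering described above can be sketched in a few lines of Python. The function name `build_keyword_table`, the numbering from 1, and the removal of blank or repeated keywords are assumptions for this example; the patent only states that each keyword receives an identification number.

```python
def build_keyword_table(keywords):
    """Assign an identification number to each keyword, in input order.

    Returns {keyword identification number: keyword}. Numbering starts
    at 1 and duplicates are dropped; both choices are illustrative.
    """
    table = {}
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword and keyword not in table.values():
            table[len(table) + 1] = keyword
    return table


keyword_table = build_keyword_table(["NEC", "search", "NEC"])
```

Storing only the short identification number in later tables keeps each search result entry small regardless of keyword length.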

[0009]

Once the keyword correspondence table 15 has been created, the character string search processing unit 12 performs the actual search of the document data file 18 on the basis of the contents of the keyword correspondence table 15 and writes the results to the search result table 16. The search does not scan the document data file 18 from the beginning separately for each keyword. Instead, all the specified keywords are set in a keyword buffer, and the search proceeds sequentially from the beginning of the document data file with all of them as targets. Whenever a matching location is found in the document data file, its position information (the address in the document data file) and the keyword identification number of the matching keyword are written to the search result table 16 as a search result, one entry after another. As a result, when the search ends, the search result table 16 holds a table associating keyword position information with keyword identification numbers, as shown in FIG. 3.
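The single sequential pass over the document data can be illustrated as follows. This is a deliberately naive Python sketch: at each address it tests every buffered keyword. The patent does not say which matching algorithm the device uses, and the function name `search_all_keywords` and the data layout are assumptions.

```python
def search_all_keywords(document_data, keyword_table):
    """Scan the document data once and record every keyword occurrence.

    keyword_table: {keyword identification number: keyword}.
    Returns a list of (address, keyword identification number) in
    address order, mirroring the search result table of FIG. 3.
    """
    results = []
    for address in range(len(document_data)):
        for keyword_id, keyword in keyword_table.items():
            # Empty keywords are skipped because they would match everywhere.
            if keyword and document_data.startswith(keyword, address):
                results.append((address, keyword_id))
    return results


results = search_all_keywords("cat dog cat", {1: "cat", 2: "dog"})
```

Because the data is walked front to back only once, the entries come out already ordered by address, whatever the number of keywords.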

[0010]

After the search processing ends, the search result editing processing unit 13 refers to the document ID file 19 and checks, from the position information of each keyword found in the search result table 16, which document that keyword occurrence belongs to. It thereby tallies the frequency of occurrence of each keyword for each document and creates the answer editing table 17. As shown in FIG. 5, the document ID file 19 stores a start position and an end position as the position information of each document in the document data file. The position information of each keyword stored in the search result table 16 therefore shows which document's address range the keyword falls in, so the keywords can be compiled document by document. Because the search is performed for a plurality of keywords at once, and because the document data file 18 may contain multiple matching locations for each keyword, the search results are tallied per document and, within each document, per keyword, and the frequencies are written to the answer editing table 17. The result edited in this way is output to the output device 3 by the search result output processing unit 14. The search result output processing unit 14 normally edits the contents of the answer editing table 17 into an output format before outputting them. By looking at the keyword frequency distribution of each document in the search results, the searcher can easily pick out the documents that appear to be needed. As an option, if the searcher (operator) so specifies, the document data can be output with the matching locations shown in reverse video; in this case the matching locations can also be displayed blinking.
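The editing and output steps can be sketched in Python as below. The sketch assumes that the document ID file is a list of (start, end) address pairs sorted by start, with an exclusive end and no overlaps, and that documents are numbered from 1 in file order. The function names `build_answer_table` and `format_answer_table` and the output format are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def build_answer_table(search_results, document_ids):
    """Tally keyword occurrences per document.

    search_results: list of (address, keyword identification number).
    document_ids: sorted list of (start, end) ranges, end exclusive.
    Returns {document number: Counter({keyword ID: count})}.
    """
    starts = [start for start, _ in document_ids]
    answer = {}
    for address, keyword_id in search_results:
        index = bisect_right(starts, address) - 1  # last range starting at or before address
        if index < 0 or address >= document_ids[index][1]:
            continue  # address lies outside every document range
        answer.setdefault(index + 1, Counter())[keyword_id] += 1
    return answer


def format_answer_table(answer, keyword_table):
    """Render one line per document with its keyword frequencies."""
    lines = []
    for doc_no in sorted(answer):
        counts = ", ".join(
            f"{keyword_table[k]}={answer[doc_no][k]}" for k in sorted(answer[doc_no])
        )
        lines.append(f"document {doc_no}: {counts}")
    return "\n".join(lines)


answer = build_answer_table([(0, 1), (4, 2), (8, 1)], [(0, 7), (7, 11)])
report = format_answer_table(answer, {1: "cat", 2: "dog"})
```

A binary search over the start addresses is used here only because the ranges are assumed sorted; a linear check of each range would give the same table.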

[0011]

[Effects of the Invention] As described above, the character string search device of the present invention creates a keyword correspondence table to manage the specified search character strings and can therefore search for a plurality of keywords at once. Furthermore, by creating the answer editing table, the device lets the searcher look at the keyword frequency distribution of each document in the search results and narrow the retrieved documents down to only those truly needed, without reading the documents. This eliminates the waste of reading unnecessary documents from the document data file and enables efficient, highly accurate document search.

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of the present invention.

FIG. 2 is an explanatory diagram of the keyword correspondence table.

FIG. 3 is an explanatory diagram of the search result table.

FIG. 4 is an explanatory diagram of the answer editing table.

FIG. 5 is an explanatory diagram of the document ID file.

[Explanation of symbols]

1 character string search device
2 input device
3 output device
11 keyword input processing unit
12 character string search processing unit
13 search result editing processing unit
14 search result output processing unit
15 keyword correspondence table
16 search result table
17 answer editing table
18 document data file
19 document ID file

Claims

(57) [Claims]

1. A character string search device for searching a document data file for specified keywords and outputting the result, comprising: a keyword correspondence table that stores a table associating each keyword with an identification number of the keyword; a search result table that stores a table associating the identification number of each keyword with the position of the keyword in the document data file; an answer editing table that stores a table indicating the frequency of occurrence of each keyword in each document; a document data file that stores a plurality of document data to be searched; a document ID file that stores the position of each document in the document data file; a keyword input processing unit that creates the keyword correspondence table from input keywords; a character string search processing unit that searches the document data file and creates the search result table from the search results and the keyword correspondence table; a search result editing processing unit that creates the answer editing table from the positions of the keywords in the search result table and the position of each document in the document ID file; and a search result output processing unit that outputs the contents of the answer editing table as a quantitative expression of the documents that a searcher truly seeks.