JPH11272680A

JPH11272680A - Document data providing device and program recording medium therefor

Info

Publication number: JPH11272680A
Application number: JP10069674A
Authority: JP
Inventors: Akira Ochitani (落谷 亮)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1998-03-19
Filing date: 1998-03-19
Publication date: 1999-10-08

Abstract

(57) [Abstract] [Problem] To provide a document data providing apparatus that groups similar documents in the results of a document data search and displays them together, so that information is presented to the user in a form that can be checked efficiently. [Solution] Common fact information expressed in the group of documents retrieved from a document database 20 is extracted, and a sentence synthesis unit 8 uses it to create a display document summarizing the search results. A composite sentence display unit 9 presents this display document to the user. Based on how much of each information source is used in the display document, a billing information calculation unit 10 calculates the charge to the user for each information source and the charge to be distributed to each information source.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a document data providing apparatus for a database service, and to a program recording medium for that apparatus. The service searches a document database, such as a newspaper article database, in response to a user's search request and presents the results to the user. It also charges the user according to the amount of search result data presented.

[0002]

[Prior art] Conventional document data providing apparatuses are used in newspaper article database services and similar settings. When showing the results of a user's search request, they have provided information by displaying the following items.

[0003] 1. The number of search results
2. Part of each document, such as its title or headline
3. The body text

The reason for this sequence is that presenting the text of search results requires charging the user according to the amount of information, the search time, or the like. The number of results is therefore displayed first. If necessary, the user then browses partial information such as titles and headlines and infers from it which information is likely to be needed. Finally, the user views the document information (body text).

[0004]

[Problems to be solved by the invention] When a document database holds newspaper articles or similar content, articles about the same fact are often stored as several similar texts, for example from different newspaper companies or regional editions. Search results therefore contain several items with similar titles and contents. To pick out the items that contain the facts they need, users must display many titles and body texts and check each one for the required information. Checking the titles and bodies of unneeded items one by one is tedious, and users are also charged for articles they did not need, which is wasteful. The conventional way of providing search results thus had these problems.

[0005] An object of the present invention is to solve these problems. The invention finds parts that are similar across documents in the results of a document database search, then summarizes this common information and displays it together. It also calculates, for each information source, the amount of information quoted from that source as common information, and distributes the user's charge among the sources accordingly. This eliminates waste such as checking duplicate search results, and it enables charging that is fair to both users and information providers.

[0006]

[Means for solving the problems] To solve the above problems, the present invention searches document data stored in advance in response to a user's search request and provides the search results. In doing so, it has means for performing the following processing:
1. extracting common fact information expressed in the group of documents in the search results;
2. selecting the required amount of information from this common fact information;
3. creating a summary from this common fact information;
4. calculating the usage of information gathered from multiple information sources; and
5. calculating charges based on the usage of each information source, and distributing the charges.

[0007] FIG. 1 shows an example system configuration of the document data providing apparatus according to the present invention. The document data providing apparatus 100 consists of a CPU, memory, and other components, and it has the processing means described below. The document database 20 stores the data of the group of documents to be searched. It may reside inside the document data providing apparatus 100 and be managed directly by it, or it may reside outside the apparatus 100 and be accessed over a network. The document database 20 may also consist of multiple independent databases.

[0008] The search condition input unit 1 accepts search conditions entered by the user. The document search unit 2 searches the document database 20 based on the search conditions entered through the search condition input unit 1. Search condition input and document search use the same methods as conventional general-purpose search systems.

[0009] The document division unit 3 divides each document in the search results produced by the document search unit 2. The units of division are based either on document structure (such as paragraphs or sentences) or on meaning (semantic paragraphs).

[0010] The clustering unit 4 clusters the divided document text. It can do so with a well-known clustering technique that uses vocabulary vectors as features, where the vectors are computed from word or character frequencies. The technique calculates the similarity of partial texts cut out of multiple texts and classifies them.

[0011] The sentence structure analysis unit 5 analyzes the noun phrases in each partial text and identifies compound noun phrases, noun phrases with adnominal modifiers, and predicates. It then analyzes the dependency relations between noun phrases and predicates to build a sentence structure. The expression matching unit 6 first performs similarity matching at the lowest, phrase level of the sentence dependency structure. After phrase-level matching, it performs similarity matching on the higher-level dependency relations as well.

[0012] Based on the matching results of the expression matching unit 6, the information amount calculation unit 7 assigns each noun phrase a sharing-degree score. Using these scores, it extends the range of phrases to display until a predetermined condition on the amount of information to display is satisfied. It also calculates, for the phrases within the display range, the information usage (degree of information use) of each information source.

[0013] The sentence synthesis unit 8 creates one sentence from multiple sentences by merging the sentence parts that contain common noun phrases. The composite sentence display unit 9 displays the sentence synthesized by the sentence synthesis unit 8 and presents it to the user. This composite sentence summarizes the text portions that express the same content across the retrieved documents.

[0014] The billing information calculation unit 10 uses the information usage calculated by the information amount calculation unit 7 to work out how the charge collected from the user is distributed among the information sources. The billing processing unit 11 performs billing based on the billing information calculated by the billing information calculation unit 10.

[0015] The user can indicate a portion of the composite sentence displayed by the composite sentence display unit 9. The text selection unit 12 then selects the original retrieved document (body text) that corresponds to the indicated portion. The text display unit 13 displays the body text selected by the text selection unit 12.

[0016] Each of the above processing units is implemented by a computer and a software program executed by the computer. The program can be stored on any suitable computer-readable recording medium, such as a portable medium, semiconductor memory, or hard disk.

[0017] With the present invention, similar information in the results of a search of the document database 20 is merged, and the content is summarized concisely. This reduces the user's effort in selecting information. The invention can also provide a function that calculates the usage of each of multiple information sources, calculates a charge for each source according to its usage, and distributes the charges.

[0018]

[Embodiments of the invention] An embodiment of the present invention is described below. FIG. 2 shows the flow of data in the system shown in FIG. 1.

[0019] As an example, consider searching a newspaper article database for the word "corruption" (汚職) and obtaining multiple results. This case is explained below following the data flow in FIG. 2. When the user enters "汚職" as the search condition through the search condition input unit 1, the document search unit 2 searches the document database 20 with this condition. Assume that the search processing of the document search unit 2 returns the results shown in FIG. 3. FIG. 3 shows the contents of the three documents retrieved with the search condition "汚職": document 1, document 2, and document 3.

[0020] The document division unit 3 divides each of the three documents in FIG. 3 into sentences. Taking document 1 as an example, FIG. 4 shows that it is divided into three partial documents with the part identification numbers part 1, part 2, and part 3. In practicing the present invention, the unit of division is not limited to sentences. Depending on the nature of the text in the target documents, documents may instead be divided into paragraphs. They may also be divided into units such as semantic paragraphs, that is, breaks in meaning computed by semantic processing.
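Below is a minimal Python sketch of sentence-level division. It assumes sentences end with 。, ！, ？, ! or ?. The terminator set, the function name `split_into_parts`, and the sample text are illustrative assumptions, not the patent's implementation.

```python
import re

# Assumed sentence terminators: the Japanese full stop plus
# full-width and half-width exclamation and question marks.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_into_parts(doc_id: int, text: str) -> list[tuple[int, int, str]]:
    """Divide one document into sentence-level partial documents.

    Returns (document id, part id, sentence) triples, with part ids
    numbered from 1 in the style of FIG. 4.
    """
    # Splitting on a zero-width lookbehind keeps each terminator attached
    # to the sentence it closes (zero-width splits need Python 3.7+).
    pieces = _SENTENCE_END.split(text)
    sentences = [p.strip() for p in pieces if p.strip()]
    return [(doc_id, part_id, s) for part_id, s in enumerate(sentences, start=1)]


# Made-up sample text, not the documents of FIG. 3.
for part in split_into_parts(1, "一文目です。二文目です。三文目？"):
    print(part)
```

Paragraph or semantic-paragraph division would replace the regular expression with a different boundary detector while keeping the same (document id, part id, text) output.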

[0021] Next, the clustering unit 4 classifies the partial documents produced by the document division unit 3 into several clusters. It uses a general existing clustering method, such as the single-link method or the single-pass method, with the vocabulary frequency vectors of conventional language processing as feature information. FIG. 5 shows an example of the clustering result. Of the three documents in FIG. 3, part 1 of document 1 and part 1 of document 2 (their first sentences) are classified into the same cluster (cluster 1). Part 1 of document 3 is classified into a different cluster (cluster 2).
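The single-pass method can be sketched as follows, under several assumptions:

- character bigram counts stand in for the vocabulary or character frequency vectors;
- cosine similarity is the similarity measure;
- each cluster is represented by the summed vector of its members;
- the 0.5 threshold, all function names, and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Character-bigram frequency vector (one possible character-frequency feature)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing keys count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def single_pass_cluster(parts: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Single-pass clustering: scan the parts once; each part joins the most
    similar existing cluster if that similarity reaches `threshold`, otherwise
    it starts a new cluster. Returns lists of part indices."""
    clusters: list[list[int]] = []
    centroids: list[Counter] = []
    for idx, text in enumerate(parts):
        vec = char_bigrams(text)
        best, best_sim = -1, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(idx)
            centroids[best].update(vec)  # summed counts; cosine ignores overall scale
        else:
            clusters.append([idx])
            centroids.append(Counter(vec))
    return clusters


parts = ["東京地検特捜部は公団幹部を逮捕した", "東京地検特捜部は公団幹部を逮捕", "今日は晴れて暖かい"]
print(single_pass_cluster(parts))
```

A single-link variant would instead compare a part with every member of a cluster and join on the best pairwise similarity. Single-pass is shown because it needs only one scan over the parts.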

[0022] The clustering may find clusters that contain several similar document parts. For each part in such a cluster, the sentence structure analysis unit 5 analyzes the sentence structure. The analysis first parses phrases such as noun phrases and predicates, producing subphrases like those shown in FIG. 6. FIG. 6 shows example subphrase analysis results for part 1 of document 1 and part 1 of document 2, which were classified into the same cluster. Next, the dependency relations between noun phrases and predicates are analyzed, and a structure representing those dependencies is generated. FIG. 7 shows an example of the dependency structure obtained by analyzing the subphrases of part 1 of document 1.

[0023] Next, the expression matching unit 6 performs similarity matching on the noun phrases and predicates at the lowest level of the dependency structure. FIG. 8 shows example similarity matching pattern rules. When one subphrase matches matching pattern 1 in FIG. 8 and another matches its paired matching pattern 2, the two subphrases are judged similar.

[0024] In this similarity matching, the expression matching unit 6 compares the elements that make up phrases. It classifies element pairs as similar or as exact matches using predetermined similarity matching pattern rules for specific parts of speech and words, such as those in FIG. 8. For example, among the subphrases in FIG. 6, the similar pairs include:
- "６日" = "六日" (the 6th, written with an Arabic numeral and with a kanji numeral);
- "同公団の" = "同公団" (the same public corporation, with and without the particle の).

An exact match is "東京地検特捜部は" = "東京地検特捜部は" (the Tokyo District Public Prosecutors Office Special Investigation Department). Two phrases are also judged similar when the number of such matching elements exceeds a certain ratio of the total number of elements in the phrase. For example, "日本××公団の汚職事件で" (in the corruption case of Japan ×× Public Corporation) and "日本××公団の接待汚職事件で" (in the entertainment-bribery corruption case of Japan ×× Public Corporation) are judged similar by this matching.

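The element-level and phrase-level matching can be sketched as below. The rules in FIG. 8 are not reproduced in the text, so several parts of this sketch are assumptions:

- `normalize` is a hypothetical stand-in for those rules; it unifies digit styles and drops one trailing particle;
- the element segmentation of the example phrases is assumed;
- elements are paired greedily, one to one;
- the 0.7 ratio is a placeholder threshold.

```python
from typing import Optional

# Hypothetical stand-ins for the FIG. 8 pattern rules.
_FULLWIDTH_DIGITS = str.maketrans("０１２３４５６７８９", "0123456789")
# Single kanji digits only; compound numerals such as 十六 are not handled.
_KANJI_DIGITS = str.maketrans("〇一二三四五六七八九", "0123456789")
_TRAILING_PARTICLES = ("の", "は", "が", "を", "で")


def normalize(element: str) -> str:
    """Canonical form of a phrase element, e.g. 六日 -> 6日, 同公団の -> 同公団."""
    s = element.translate(_FULLWIDTH_DIGITS).translate(_KANJI_DIGITS)
    for particle in _TRAILING_PARTICLES:
        # Strip the particle only when something remains afterwards.
        if s.endswith(particle) and len(s) > len(particle):
            return s[: -len(particle)]
    return s


def match_elements(a: str, b: str) -> Optional[str]:
    """Return "exact", "similar", or None for a pair of elements."""
    if a == b:
        return "exact"
    if normalize(a) == normalize(b):
        return "similar"
    return None


def phrases_similar(a: list[str], b: list[str], ratio: float = 0.7) -> bool:
    """Similar when the matched elements reach `ratio` of the longer phrase's length."""
    unused = list(b)
    matched = 0
    for element in a:
        for j, candidate in enumerate(unused):
            if match_elements(element, candidate):
                matched += 1
                del unused[j]  # each element of b is matched at most once
                break
    return matched / max(len(a), len(b), 1) >= ratio


print(match_elements("６日", "六日"))  # similar
a = ["日本", "××", "公団", "の", "汚職", "事件", "で"]
b = ["日本", "××", "公団", "の", "接待", "汚職", "事件", "で"]
print(phrases_similar(a, b))  # True (7 of 8 elements match)
```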
[0025] After matching the lowest-level phrases of the dependency structure, the expression matching unit 6 applies the same similarity matching pattern rules at the higher levels. This lets it judge similarity between specific dependency structures at those levels as well. At each level, it also checks whether the number of matched substructures exceeds a certain ratio of the total number of elements of the dependency structure at that level. If it does, the unit judges that the dependency structures match. This matching process is repeated up to the top-level dependency structure.
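The level-by-level matching can be sketched as follows, under these assumptions:

- a dependency structure is a tree of `Node` objects;
- the elements at one level are the head phrase plus its child substructures;
- children are paired greedily;
- `same_label` (a trailing-の rule) and the 0.6 ratio are hypothetical placeholders for the real pattern rules and threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One phrase in a dependency structure, with the phrases that depend on it."""
    label: str
    children: list["Node"] = field(default_factory=list)


def same_label(x: str, y: str) -> bool:
    """Placeholder phrase rule: equal after dropping a trailing の (hypothetical)."""
    return x.removesuffix("の") == y.removesuffix("の")


def structures_match(a: Node, b: Node, label_match=same_label, ratio: float = 0.6) -> bool:
    """Match two dependency structures, resolving the lowest levels first.

    The elements at a level are the head phrase plus its child substructures.
    The structures match when the matched elements reach `ratio` of the
    larger element count.
    """
    matched = 1 if label_match(a.label, b.label) else 0
    unused = list(b.children)
    for child in a.children:
        for j, candidate in enumerate(unused):
            # The recursive call settles the lower levels before this one.
            if structures_match(child, candidate, label_match, ratio):
                matched += 1
                del unused[j]
                break
    total = 1 + max(len(a.children), len(b.children))
    return matched / total >= ratio


t1 = Node("逮捕した", [Node("特捜部は"), Node("公団の"), Node("六日")])
t2 = Node("逮捕した", [Node("特捜部は"), Node("公団"), Node("7日")])
print(structures_match(t1, t2))  # True: 3 of 4 elements match
```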

[0026] Next, the information amount calculation unit 7 uses the similarity matching results for the noun phrases and predicates at the lowest level of the dependency structure to calculate a sharing degree, as shown in FIG. 9:

sharing degree = α × (number of similar or exactly matching elements)

where α is a constant coefficient. The example of sharing-degree calculation results in FIG. 9 uses α = 2.

[0027] The information amount calculation unit 7 then works upward from the lowest-level dependency structures that were matched in the preceding similarity matching. For each matched dependency structure, it multiplies the sum of the sharing degrees at the level below by a constant coefficient β and adds the sharing degree of the structure being matched. Repeating this level by level toward the top produces a dependency structure annotated with information amounts. The example of FIG. 10 shows the information amount calculated for each dependency structure with β = 2. The number attached to each branch of the dependency structure in FIG. 10 is the calculated information amount.
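The sharing-degree and information-amount calculations can be sketched as a post-order pass over a dependency tree.

The sketch reads "the sharing degrees at the level below" as the information amounts already computed for the child nodes. Under that assumption, I(n) = s(n) + β · Σ I(c), with s(n) = α · (matched elements of n). The original wording can also be read as using only the children's raw sharing degrees.

The values α = β = 2 follow the FIG. 9 and FIG. 10 examples. The class name, function names, tree, and match counts are hypothetical.

```python
from dataclasses import dataclass, field

ALPHA = 2  # coefficient alpha, as in the FIG. 9 example
BETA = 2   # coefficient beta, as in the FIG. 10 example


@dataclass
class MatchedNode:
    """A matched phrase in a dependency structure (hypothetical layout)."""
    label: str
    matched_elements: int                 # similar or exactly matching elements
    children: list["MatchedNode"] = field(default_factory=list)
    info: float = 0.0                     # information amount, set by annotate()


def sharing_degree(node: MatchedNode, alpha: float = ALPHA) -> float:
    """Sharing degree = alpha x number of similar or exactly matching elements."""
    return alpha * node.matched_elements


def annotate(node: MatchedNode, alpha: float = ALPHA, beta: float = BETA) -> float:
    """Post-order pass: I(n) = s(n) + beta * sum of I(c) over the children c."""
    below = sum(annotate(child, alpha, beta) for child in node.children)
    node.info = sharing_degree(node, alpha) + beta * below
    return node.info


root = MatchedNode("逮捕した", 1, [MatchedNode("東京地検特捜部は", 2), MatchedNode("六日", 1)])
print(annotate(root))  # 14 = 2 + 2 * (4 + 2)
```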

[0028] The sentence synthesis unit 8 starts from the upper dependency structure of each substructure and selects substructures in descending order of information amount. It repeats the selection until the required display amount is reached; the display amount (the number of phrases or characters required for display) is given in advance. FIG. 11 shows an example of a composite sentence generated from the dependency structure in FIG. 10 with a maximum display amount of 100 characters.
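The selection step can be sketched as a greedy loop over phrases ranked by information amount. The sketch makes these assumptions:

- each candidate carries its position in the source sentence, so the chosen phrases can be put back in reading order;
- the display amount is measured in characters;
- selection stops at the first phrase that would exceed the limit;
- the function name and sample phrases are hypothetical.

```python
def select_for_display(candidates: list[tuple[int, str, float]], max_chars: int = 100) -> str:
    """Pick phrases in descending order of information amount until the next
    one would exceed `max_chars`, then join them in their original order.

    candidates: (position in sentence, phrase, information amount) triples.
    """
    chosen: list[tuple[int, str]] = []
    used = 0
    for pos, phrase, _info in sorted(candidates, key=lambda c: c[2], reverse=True):
        if used + len(phrase) > max_chars:
            break  # display amount reached
        chosen.append((pos, phrase))
        used += len(phrase)
    # Restore reading order so the merged phrases form one sentence.
    return "".join(phrase for _, phrase in sorted(chosen))


candidates = [(0, "東京地検特捜部は", 14), (1, "六日、", 4), (2, "幹部を逮捕した。", 10)]
print(select_for_display(candidates, max_chars=16))  # 東京地検特捜部は幹部を逮捕した。
```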

[0029] Next, the billing information calculation unit 10 calculates the information usage of each information source used to generate the composite sentence, with the following formula:

Information usage = Σ (portion quoted from the information source) × (1 / sharing degree)

For the composite sentence in FIG. 11, the value "1 / sharing degree" is obtained for each part, as shown in FIG. 12. FIG. 13 shows an example of the resulting information usage. For the composite sentence in FIG. 11, the calculation gives an information usage of 7.5 for document 1 and 3.5 for document 2.
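A sketch of the usage formula follows. It assumes each part of the composite sentence is described by three things: its quoted amount (for example, a phrase count), its sharing degree, and the sources it was quoted from. The data layout, the function name, and the toy values are hypothetical; they are not the contents of FIG. 12 or FIG. 13.

```python
from collections import defaultdict


def information_usage(parts: list[tuple[float, float, list[str]]]) -> dict[str, float]:
    """Information usage per source: sum of quoted amount x (1 / sharing degree).

    parts: (quoted amount, sharing degree, source ids) for each part of the
    composite sentence; every source that supplied a part is credited.
    """
    usage: dict[str, float] = defaultdict(float)
    for amount, sharing, sources in parts:
        for source in sources:
            usage[source] += amount * (1 / sharing)
    return dict(usage)


parts = [(1, 2, ["doc1", "doc2"]), (1, 1, ["doc1"]), (2, 4, ["doc2"])]
print(information_usage(parts))  # {'doc1': 1.5, 'doc2': 1.0}
```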

[0030] The distribution rate of the user's charge is calculated from the ratio of these information usage values (degrees of information use). FIG. 14 shows an example of the calculated billing-information distribution rates. In this example, the charge unit per article for the composite sentence of FIG. 11 is set to 1, and the share of this charge distributed to each information source is calculated. The result is 0.68 for the information source containing document 1 (7.5 / 11 ≈ 0.68) and 0.32 for the information source containing document 2 (3.5 / 11 ≈ 0.32).
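The distribution rates follow directly from the usage values. The sketch below (function name hypothetical) reproduces the FIG. 14 rates from the usage values 7.5 and 3.5 stated in paragraph [0029].

```python
def distribution_rates(usage: dict[str, float], charge: float = 1.0,
                       ndigits: int = 2) -> dict[str, float]:
    """Split `charge` among sources in proportion to their information usage."""
    total = sum(usage.values())
    if total == 0:
        return {source: 0.0 for source in usage}
    return {source: round(charge * u / total, ndigits) for source, u in usage.items()}


print(distribution_rates({"doc1": 7.5, "doc2": 3.5}))  # {'doc1': 0.68, 'doc2': 0.32}
```

Each rate is rounded independently, so the rounded rates need not sum exactly to the charge.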

[0031] When search results are displayed document by document, the composite sentence display unit 9 gathers the preceding composite sentences under their original documents, in descending order of shared information amount. It continues until the display amount required for the display document is reached; this amount (the number of phrases or characters required for display) is given in advance. It then creates and displays the display document. FIG. 15 shows an example of a display document.

[0032] In this display, link information to the original documents (body texts) is shown together with the shared portions, so that a body text can be selected with a pointing device such as a mouse. The text selection unit 12 obtains a document identification number from the selection input made with the mouse or other device. The text display unit 13 then displays the body text the user selected.

[0033] FIG. 16 shows an example of text selection. Suppose the mouse cursor is moved onto the phrase "日本××公団の接待汚職事件で" (in the entertainment-bribery corruption case of Japan ×× Public Corporation) on the display document screen of FIG. 15. Link information to the original documents that correspond to that phrase then pops up, as shown in FIG. 16. The entries "Article 1" and "Article 2" may instead be comments that indicate the information source, such as "Newspaper A (document 1)" and "Newspaper B (document 2)". Suppose the user then moves the mouse cursor 30 and selects, for example, the menu item "Article 2" (indicating document 2). The text selection unit 12 selects the article body of document 2 from the search results in FIG. 3, and the text display unit 13 displays it. With this user interface, the user can read the summarized search results. The user can also efficiently select and read only the desired information from the search results.

[0034]

[Effects of the invention] As described above, the present invention does not present search results one by one. Instead, similar parts are summarized and displayed together, which lets the user check the search results efficiently.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example system configuration of the present invention.

FIG. 2 is a diagram showing the flow of data in the system shown in FIG. 1.

FIG. 3 is a diagram showing an example of search results.

FIG. 4 is a diagram showing an example of partial documents.

FIG. 5 is a diagram showing an example of a clustering result.

FIG. 6 is a diagram showing an example of subphrase analysis results.

FIG. 7 is a diagram showing an example of dependency structure analysis results.

FIG. 8 is a diagram showing an example of similarity matching pattern rules.

FIG. 9 is a diagram showing an example of sharing-degree calculation results.

FIG. 10 is a diagram showing an example of information amount calculation results.

FIG. 11 is a diagram showing an example of a composite sentence.

FIG. 12 is a diagram showing an example of the information usage of each part.

FIG. 13 is a diagram showing an example of information usage calculation results.

FIG. 14 is a diagram showing an example of billing information.

FIG. 15 is a diagram showing an example of a display document.

FIG. 16 is a diagram showing an example of text selection.

[Explanation of symbols]

1 Search condition input unit
2 Document search unit
3 Document division unit
4 Clustering unit
5 Sentence structure analysis unit
6 Expression matching unit
7 Information amount calculation unit
8 Sentence synthesis unit
9 Composite sentence display unit
10 Billing information calculation unit
11 Billing processing unit
12 Text selection unit
13 Text display unit
20 Document database
100 Document data providing apparatus

Claims

1. A document data providing apparatus that searches document data stored in advance in response to a search request from a user and provides search results, comprising: means for extracting common fact information expressed in the group of documents in the search results; means for creating a display document that summarizes the search results based on the extracted common fact information; and means for presenting the created display document to the user.

2. A document data providing apparatus that searches document data stored in advance in response to a search request from a user and provides search results, comprising: means for extracting common fact information expressed in the group of documents in the search results; means for creating a display document that summarizes the search results based on the extracted common fact information; means for presenting the created display document to the user; means for calculating, for each information source, the amount of use of the information gathered from the multiple information sources used in the display document; and means for calculating, based on the amount of use for each information source, the charge to the user for each information source or the charge to be distributed to each information source.

3. The document data providing apparatus according to claim 1, further comprising means for selecting, from the common fact information expressed in the group of documents in the search results, the amount of information required to create the display document.

4. The document data providing apparatus according to claim 1, further comprising: means for selecting the body text corresponding to a designated portion of the display document presented to the user; and means for displaying the contents of the selected document data.

5. A program recording medium for a document data providing apparatus, on which is recorded a program for realizing a document data providing apparatus that searches document data stored in advance in response to a search request from a user and provides search results, the program causing a computer to execute: a process of extracting common fact information expressed in the group of documents in the search results; a process of creating a display document that summarizes the search results based on the extracted common fact information; and a process of presenting the created display document to the user.

6. A program recording medium for a document data providing apparatus, on which is recorded a program for realizing a document data providing apparatus that searches document data stored in advance in response to a search request from a user and provides search results, the program causing a computer to execute: a process of extracting common fact information expressed in the group of documents in the search results; a process of creating a display document that summarizes the search results based on the extracted common fact information; a process of presenting the created display document to the user; a process of calculating, for each information source, the amount of use of the information gathered from the multiple information sources used in the display document; and a process of calculating, based on the amount of use for each information source, the charge to the user for each information source or the charge to be distributed to each information source.