JPH06103311A

JPH06103311A - Text base information retrieving device

Info

Publication number: JPH06103311A
Application number: JP4249672A
Authority: JP
Inventors: Kenji Sato; 研治佐藤; Kazushi Muraki; 一至村木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1992-09-18
Filing date: 1992-09-18
Publication date: 1994-04-15
Anticipated expiration: 2013-05-18
Also published as: JP2752864B2

Abstract

PURPOSE: To enable interactive information gathering and acquisition by treating the entire text data as plain text and performing retrieval under functional conditions.

CONSTITUTION: An information retrieval device equipped with an input means 1, a large-scale text database 6, an information extraction means 5, and an output means 8 further comprises: a functional relation extraction means 4, which extracts functional units as the units that carry information in each text in the large-scale text database 6; a functional unit index 3, which holds the functional units extracted by the functional relation extraction means 4, together with their positions in the texts, as indexes; a functional unit search means 2, which searches the functional unit index 3 for a functional unit matching the input; and a majority decision evaluation means 7, which, when the amount of text output by the information extraction means 5 is large, extracts representative information by taking a majority decision based on how many functional units coincide.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an information retrieval device, and more particularly to an information retrieval device capable of retrieving information that spans multiple texts, independent of units such as documents, chapters, or sections.

[0002]

[Prior Art] In conventional text-based information retrieval, as seen in encyclopedias and dictionaries, explanatory texts about specific items are organized with those items as the index, and a search consists of finding the item name and obtaining information from its explanatory text. Electronic encyclopedias, which digitize this arrangement, allow lookup not only in the conventional way but also, taking advantage of digitization, by partial matches against the index strings. However, this conventional method requires that the information be organized by item, as in an encyclopedia or dictionary, and it has the drawback that this organization can only be done by hand. Moreover, because the information must be organized manually, new vocabulary that already appears in large quantities in texts and is in frequent use cannot be retrieved from the text base until someone has organized it.

[0003] To solve this, as seen in Japanese Unexamined Patent Publication No. H2-253371, a natural-sentence semantic analysis processing device has been proposed that performs semantic analysis of natural-language text, prepares canonical sentence patterns in advance, and retrieves similar sentences by checking for a match with the user's input. However, this device is designed to detect matching input patterns when relatively similar inputs are repeated, and it is not effective in situations, such as with an encyclopedia, where the information users request is wide-ranging.

[0004]

[Problems to Be Solved by the Invention] One retrieval method that avoids the cost of the manual organization described above is full-text search, which performs only word searches over the entire text. However, searching with this method has the following problems.
(1) The necessary information spans multiple texts and cannot be retrieved by words alone.
(2) Large amounts of information of the same kind are retrieved, so unnecessary information is repeated.
(3) The user cannot process the large amount of information and cannot reach the target information.

[Means for Solving the Problems] To solve the problems described above, the information retrieval device of the present invention is an information retrieval device comprising an input means that accepts a user's information retrieval request, a large-scale text database that holds a large amount of text, an information extraction means that extracts the necessary information from the large-scale text database using an index matching the input, and an output means that presents the extracted information to the user, and is characterized by further comprising: a functional relation extraction means that extracts functional units, as the units carrying information, from each text in the large-scale text database; a functional unit index that holds, as an index, the functional units extracted by the functional relation extraction means and their positions in the texts; a functional unit search means that searches the functional unit index for a functional unit matching the input; and a majority decision evaluation means that, when the amount of text output by the information extraction means is large, takes a majority decision by the number of matching functional units and extracts representative information.

[0005]

[Operation] The text-based information retrieval device according to the present invention uses the functional relations between words in a text, which are the units that carry information, as its unit of retrieval, and searches multiple texts simultaneously to provide information. As a result, the minimal piece of information a user needs can be retrieved even if it spans multiple texts. Furthermore, when a large amount of information of the same kind is retrieved, the device groups the results that share the same functional relation, treats the size of each group as a measure of the reliability and importance of that information, and presents the groups in descending order of size. This selection by majority decision prevents unnecessary information from being repeated when large amounts of same-kind information are retrieved. In addition, by using this search, which presents minimal pieces of information, interactively, the user can find the target information without being given more information at once than can be processed.

[0006]

[Embodiment] Next, the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the present invention. Referring to FIG. 1, the embodiment comprises: an input means 1 that accepts a user's information retrieval request; a large-scale text database 6 that holds a large amount of text; a functional relation extraction means 4 that extracts functional units, as the units carrying information, from each text in the large-scale text database 6; a functional unit index 3 that holds, as an index, the functional units extracted by the functional relation extraction means and their positions in the texts; a functional unit search means 2 that searches the functional unit index for a functional unit matching the input; an information extraction means 5 that extracts the necessary information from the large-scale text database 6 using an index matching the input; a majority decision evaluation means 7 that, when the amount of text output by the information extraction means 5 is large, takes a majority decision by the number of matching functional units and extracts representative information; and an output means 8 that presents the extracted information to the user.

[0007] For the texts registered in the large-scale text database 6, the functional relation extraction means 4 does not treat each text as a document unit. It treats the whole as plain text, extracts functional units from it as the units that carry information, and registers them in the functional unit index 3. The functional relation extraction means 4 may be invoked when the functional unit index 3 is accessed, but in practice it is better invoked when a text is registered in the large-scale text database 6. Because entries in the functional unit index 3 are grouped by functional unit, a single functional unit often holds index entries pointing to separate locations in multiple texts.
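The index described above can be pictured as a mapping from each functional unit to every place it occurs. The following is a minimal sketch under stated assumptions, not the patent's implementation: the names `FunctionalUnit`, `Position`, and `FunctionalUnitIndex`, and the choice of a (text id, sentence number) pair as the "position", are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class FunctionalUnit(NamedTuple):
    """A word pair plus the relation between them (hypothetical representation)."""
    head: str      # first independent word
    relation: str  # relation linking the two words
    tail: str      # second independent word


class Position(NamedTuple):
    """Where a unit was found (assumed granularity: text id and sentence number)."""
    text_id: int
    sentence_no: int


class FunctionalUnitIndex:
    """Maps each functional unit to every position at which it occurs."""

    def __init__(self) -> None:
        self._postings = defaultdict(list)

    def register(self, unit: FunctionalUnit, position: Position) -> None:
        # Identical units found in different texts share a single entry.
        self._postings[unit].append(position)

    def positions(self, unit: FunctionalUnit) -> list:
        # Return a copy so callers cannot modify the index by accident.
        return list(self._postings.get(unit, []))


index = FunctionalUnitIndex()
unit = FunctionalUnit("Gulf War", "in", "participated")
index.register(unit, Position(text_id=1, sentence_no=4))
index.register(unit, Position(text_id=7, sentence_no=0))
```

Registering at the time a text is added, as the paragraph recommends, would correspond to calling `register` once for each extracted unit.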

[0008] A functional unit is a ternary relation consisting of a pair of words and the relation between them. This ternary relation is regarded as the unit that expresses the functional meaning of the paired words. In the simplest functional unit, the paired words are two independent words in a dependency relation, and the relation is whatever links those words (a case particle, conjunctive particle, conjunction, and so on). Writing the independent words as A and B and the relation between them as R, a functional unit is expressed as

A-R-B

Examples:
Gulf War - in (に) - participated
Gulf War - of (の) - coalition forces
declared war - and then (そして) - bombed

More complex relations between words can be expressed by concatenating functional units:

A-R1-B-R2-C → A-R3-C (R3 = B)

Example: Gulf War - participated - United States

A match on the word B alone is not sufficient for this concatenation. Units are concatenated only when the context in which the relation between A and B is stated is the same sentence in which the relation between B and C is stated.
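The concatenation rule can be sketched as a small function. This is an illustrative sketch only: the `Unit` type, the `sentence_id` field used to stand in for "the same context", and the romanized particles `ni` and `ga` in the usage lines are assumptions, not details given in the text.

```python
from typing import NamedTuple, Optional


class Unit(NamedTuple):
    head: str         # word A (or B in the second unit)
    relation: str     # relation R
    tail: str         # word B (or C in the second unit)
    sentence_id: int  # hypothetical: identifies the sentence the unit came from


def concatenate(first: Unit, second: Unit) -> Optional[Unit]:
    """Join A-R1-B and B-R2-C into A-R3-C with R3 = B, or return None."""
    if first.tail != second.head:
        return None  # the shared word B must match
    if first.sentence_id != second.sentence_id:
        return None  # both relations must be stated in the same sentence
    return Unit(first.head, first.tail, second.tail, first.sentence_id)


# Usage with glossed example data (particles romanized for illustration).
a = Unit("Gulf War", "ni", "participated", sentence_id=12)
b = Unit("participated", "ga", "United States", sentence_id=12)
joined = concatenate(a, b)
other = concatenate(a, b._replace(sentence_id=13))
```

Here `joined` carries the shared word as its relation, while `other` is `None` because the two units come from different sentences.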

[0009] Next, the operation when a user enters a retrieval request is described. The user's input obtained by the input means 1 is first passed to the functional unit search means 2, which expresses the functional relation the user needs in terms of functional units. Using the index entries of the functional units that match this functional relation, the information extraction means 5 extracts text from the large-scale text database 6. The extracted text should not be the whole of the source text. A string the length of one information-carrying unit is appropriate, and a single sentence is a natural choice. For example, for the retrieval request "Who participated in the Gulf War?", the functional unit "Gulf War - participated - United States" is retrieved, and following its index entry displays the sentence from which the functional unit was derived: "The United States participated in the Gulf War."
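The lookup step can be illustrated with a toy index. This is a sketch under assumptions: treating the unknown part of the request as a wildcard (`None`) is one possible reading of "matching" and is not specified in the text, and `SENTENCES`, `INDEX`, and `search` are hypothetical names holding illustrative data.

```python
from typing import NamedTuple, Optional


class Unit(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical toy data: source sentences, and an index from unit to sentence ids.
SENTENCES = {
    0: "The United States participated in the Gulf War.",
    1: "The United States declared war on Iraq in the Gulf War.",
}
INDEX = {
    Unit("Gulf War", "participated", "United States"): [0],
    Unit("Gulf War", "declared war", "United States"): [1],
}


def search(head: Optional[str], relation: Optional[str], tail: Optional[str]) -> list:
    """Return the source sentence of every unit matching the pattern.

    A slot set to None acts as a wildcard (an assumption of this sketch).
    """
    pattern = (head, relation, tail)
    hits = []
    for unit, sentence_ids in INDEX.items():
        if all(want is None or want == got for want, got in zip(pattern, unit)):
            # Return single sentences rather than whole texts.
            hits.extend(SENTENCES[i] for i in sentence_ids)
    return hits


# "Who participated in the Gulf War?" leaves the last slot open.
answers = search("Gulf War", "participated", None)
```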

[0010] When the amount of text extracted by the information extraction means 5 is too large to be suitable for presenting to the user, the majority decision evaluation means 7 takes a majority decision according to which functional units those texts share, and presents the results to the user starting with the most representative view. If the amount of text extracted by the information extraction means 5 is not that large, the majority decision evaluation means 7 does nothing. This selection by majority decision avoids giving the user more information than can be processed at once and presents the most widely shared information first. In addition, as large amounts of text are added without restriction as information sources, they will inevitably contain incorrect statements and unnecessary noise. Selecting information by this "thickness" (the number of supporting occurrences) also serves to filter these out.
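The majority decision can be sketched as counting how many extracted sentences share each functional unit and ranking the units by that count. This is a minimal illustration, not the patent's procedure: the `max_items` threshold that decides when the result is "too large", and the choice of the first sentence seen as each group's representative, are assumptions.

```python
from collections import Counter


def majority_select(hits, max_items=3):
    """Rank extracted sentences by how many share the same functional unit.

    hits is a list of (unit, sentence) pairs. max_items is a hypothetical
    threshold for "too much text to present".
    """
    if len(hits) <= max_items:
        # Small result sets pass through unchanged.
        return [sentence for _, sentence in hits]

    counts = Counter(unit for unit, _ in hits)

    # Keep one representative sentence per unit (the first one seen).
    representative = {}
    for unit, sentence in hits:
        representative.setdefault(unit, sentence)

    # Most frequently supported units first; ties keep first-seen order.
    return [representative[unit] for unit, _ in counts.most_common()]


u1 = ("Gulf War", "participated", "United States")
u2 = ("Gulf War", "of", "coalition forces")
u3 = ("declared war", "and then", "bombed")
hits = [(u1, "s1"), (u2, "s2"), (u1, "s3"), (u1, "s4"), (u3, "s5"), (u2, "s6")]
ranked = majority_select(hits, max_items=3)
```

A unit supported by only one sentence ends up last, which is how a count-based ranking can push isolated errors or noise out of the user's immediate view.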

[0011] Next, the retrieval process in this text-based retrieval device is described with reference to FIG. 2, a conceptual diagram of the retrieval process. Suppose a user wants to find information about the Gulf War. Knowing that the United States was involved in the Gulf War, the user enters "What did the United States do in the Gulf War?" In response, the functional unit search means retrieves the functional unit "Gulf War - United States - declared war" from the functional unit index. The information extraction means then retrieves the original text from which that functional unit was extracted, "The United States declared war on Iraq in the Gulf War," and presents it to the user. On seeing this text, the user vaguely recalls that the United States was at the center of the coalition and enters that as a question as is. The device then returns "In the Gulf War, the coalition was centered on the United States." By repeating this process, the user can go on to learn even that "electronic weapons were used for the first time in the Gulf War."

[0012]

[Effects of the Invention] As described above, the text-based information retrieval device according to the present invention does not treat text as document units. It treats the whole as plain text and provides an index that pinpoints positions within the text, which makes it possible to retrieve the text that answers a user's retrieval request from all the information held in the database. In addition, because the functional unit index is created automatically by the functional relation extraction means when text is registered in the large-scale text database, no index-creation cost is incurred. A further feature is that simply registering a text containing the latest information in the large-scale text database makes that information available for retrieval and use.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is a conceptual diagram showing the retrieval process according to the present invention.

[Explanation of Reference Numerals]
1 Input means
2 Functional unit search means
3 Functional unit index
4 Functional relation extraction means
5 Information extraction means
6 Large-scale text database
7 Majority decision evaluation means
8 Output means

Claims

1. An information retrieval device comprising an input means that accepts a user's information retrieval request, a large-scale text database that holds a large amount of text, an information extraction means that extracts the necessary information from the large-scale text database using an index matching the input, and an output means that presents the extracted information to the user, the device being characterized by further comprising: a functional relation extraction means that extracts functional units, as the units carrying information, from each text in the large-scale text database; a functional unit index that holds, as an index, the functional units extracted by the functional relation extraction means and their positions in the texts; a functional unit search means that searches the functional unit index for a functional unit matching the input; and a majority decision evaluation means that, when the amount of text output by the information extraction means is large, takes a majority decision by the number of matching functional units and extracts representative information.