JPH0744566A

JPH0744566A - Abstract preparation device

Info

Publication number: JPH0744566A
Application number: JP5186570A
Authority: JP
Inventors: Kenji Ono; 顕司小野; Kazuo Sumita; 一男住田; Seiji Miike; 誠司三池
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-07-29
Filing date: 1993-07-29
Publication date: 1995-02-14

Abstract

PURPOSE: To improve the readability and naturalness of an abstract by ensuring that the abstract contains either the antecedent of each anaphoric expression or an indication of the omitted portion. CONSTITUTION: A format analysis unit 1 analyzes digitized input text to determine sentence boundaries, paragraph ends, the structure of chapters and sections, and so on. A morphological/syntactic analysis unit 3 performs morphological and syntactic analysis, sentence by sentence, on the body text and on the chapter and section titles. An anaphora resolution unit 5 refers to the morphological and syntactic analysis results of each sentence and determines the antecedents of the anaphoric expressions (adnominals and pronouns) contained in the sentences. An important sentence determination unit 9 determines the important sentences on the basis of the morphological and syntactic analysis results of each sentence and the chapter and section titles. With this constitution, when unselected text (for instance a non-important sentence, or an anaphoric expression such as an adnominal or pronoun) lies between words and phrases selected from the input original text according to a prescribed criterion, that fact is indicated between those words and phrases in the abstract, for instance with a prescribed symbol.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an abstract creation apparatus that creates and displays an abstract of a text written in natural language.

[0002]

[Prior Art] The mainstream technique in conventional abstract creation systems selects important sentences from a document and concatenates them to form the abstract. A problem arises in such systems when a sentence selected as important contains an anaphoric expression such as "それ" (it), "その" (its), "それら" (they), "これ" (this), or "同氏" (the same person). If the antecedent (referent) of the expression also happens to be in an important sentence, the word or sentence it points to is present in the abstract and comprehension is unaffected. If the antecedent is not in an important sentence, however, the reader not only cannot tell what the expression refers to, but may also take it to refer to a different word in whatever sentence happens to precede it in the abstract, which can lead to a serious misreading. Conventional systems dealt with this by, for example, deleting such anaphoric expressions, but this approach undeniably impaired the readability and naturalness of the abstract.

[0003] Ideally, the sentence or phrase that an adnominal or pronoun refers to should be determined and identified, and the adnominal or pronoun in the abstract should be replaced with that phrase or sentence. When the antecedent sentence or phrase is already included in the abstract, however, it is preferable to leave the pronoun or adnominal as it is.

[0004] Techniques for identifying antecedents (anaphora resolution techniques) that have been proposed for Japanese include "An anaphora resolution system for Japanese sentences" by Takeshi Yamamura, Noboru Onishi, and Noboru Sugie (Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J73-D-II, No. 6, pp. 887-896, June 1990) and "Analysis of Japanese sentences using semantic and contextual information: processing that takes context into account" by Makoto Nagao, Junichi Tsujii, and Kazutoshi Tanaka (Transactions of the Information Processing Society of Japan, Vol. 17, No. 1, January 1976).

[0005] These techniques, however, presuppose that the technical terms and knowledge of the field of the text being processed, as well as everyday knowledge (common sense), have been prepared in advance in machine-processable form. They therefore cannot be used in a general-purpose natural language processing system, where the field of the text is not known and domain knowledge cannot be prepared beforehand. Preparing "common sense" is, moreover, an unsolved technical problem. Nor has the application of these anaphora resolution techniques to abstract creation systems been studied so far. Conventional document processing and display systems, abstract creation systems included, have also made insufficient use of anaphora analysis results.

[0006] Meanwhile, as documents have become digitized in recent years, new developments have appeared in document utilization technology. Documents may be managed centrally as a text database, for example, or specific symbols may be inserted into an electronic document, as in SGML, to facilitate various kinds of processing. Other approaches, such as hypertext, extend the ways a document can be used by storing documents, or various parts of a document, with electronic links between them. Document retrieval and display have also become more varied and more composite: for example, only the relevant portions of a document are retrieved and displayed rather than the whole.

[0007] In this situation, techniques that supply the antecedent or topic expression of an anaphoric expression, so that a reader can gain a reasonable understanding of one part of a text by reading that part alone, are expected to become increasingly important.

[0008] Accordingly, system functions that go beyond one-way display of processing results and that interactively, in response to user operations, change the displayed portion or display additional information such as the phrase referred to in a sentence, in other words the system's man-machine interface, are becoming more important. There are many possible ways to display an abstract and to use the accompanying anaphora analysis results in this setting, but they cannot yet be said to have been adequately realized.

[0009]

[Problems to Be Solved by the Invention] Conventional abstract generation systems, however, either perform no anaphora analysis or make insufficient use of its results, so the resulting abstracts are hard to read. It has also been difficult to respond to user requests concerning the display of the abstract and the original text, such as changing the length of the abstract, displaying the antecedent of an adnominal or pronoun in the abstract, or displaying a part of the original text that is not included in the abstract.

[0010] The present invention has been made in view of the above problems, and its object is to provide an abstract creation apparatus that improves the readability and naturalness of an abstract by ensuring that the abstract contains either the antecedents of its anaphoric expressions or an indication of the omitted portions.

[0011]

[Means for Solving the Problems] To achieve the above object, the present invention is an abstract creation apparatus that creates an abstract from an input original text, comprising: abstract creation means for selecting words and phrases from the original text according to a predetermined criterion to create an abstract; presentation means for indicating, for example with a predetermined symbol placed between the relevant words and phrases in the abstract, that unselected text (for example a non-important sentence, or an anaphoric expression such as an adnominal or pronoun) exists between words and phrases selected by the abstract creation means; and display means for displaying, as required, the text corresponding to the location indicated by the presentation means (for example the non-important sentence, or the anaphoric expression such as an adnominal or pronoun).

[0012] Preferably, in an abstract creation system that creates an abstract by selecting important sentences from the original text, a specific symbol such as "…" is inserted between selected important sentences that are not contiguous in the original text.
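The gap-marking rule of paragraph [0012] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_abstract`, the constant `GAP_MARK`, and the representation of the selection as a list of sentence indices are all assumptions made for the example.

```python
from typing import List, Optional, Sequence

GAP_MARK = "…"  # the symbol named in the text; any agreed marker would do


def build_abstract(sentences: Sequence[str], selected: Sequence[int]) -> List[str]:
    """Return the selected sentences in original order, inserting GAP_MARK
    wherever two consecutive selected sentences are not adjacent in the original."""
    parts: List[str] = []
    previous: Optional[int] = None
    for index in sorted(set(selected)):
        # A jump of more than one position means unselected text was skipped.
        if previous is not None and index != previous + 1:
            parts.append(GAP_MARK)
        parts.append(sentences[index])
        previous = index
    return parts
```

Keeping the marker as a separate list element, rather than gluing it onto a sentence, would make it easy for a display layer to treat the marker as a clickable item, as paragraph [0013] describes.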

[0013] Preferably, when a specific symbol such as "…" in the display is designated, for example by clicking it with a mouse, the corresponding part of the original text is displayed.

[0014] Preferably, an abstract creation system that creates an abstract by selecting important sentences from the original text includes an anaphora analysis unit and uses its analysis results to create and display the abstract.

[0015] Preferably, when an anaphoric expression such as an adnominal or pronoun in the display is designated, for example by clicking it with a mouse, the antecedent phrase or sentence is displayed. Alternatively, if the antecedent is already displayed, that part of the display is highlighted or otherwise emphasized.

[0016] Preferably, an abstract creation system that creates an abstract by selecting important sentences from the original text includes an anaphora analysis unit, and when a selected important sentence contains an adnominal or pronoun whose antecedent lies in a non-important sentence, that non-important sentence is included in the abstract (as an important sentence).

[0017] Preferably, an abstract creation system that creates an abstract by selecting important sentences from the original text includes an anaphora analysis unit, and when a selected important sentence contains an adnominal or pronoun, the adnominal or pronoun is replaced with the antecedent phrase or sentence.

[0018] Preferably, an abstract creation system that creates an abstract by selecting important sentences from the original text includes an anaphora analysis unit, and when a selected important sentence contains an adnominal or pronoun, the antecedent phrase or sentence is inserted before or after the adnominal or pronoun, together with a suitable symbol marking the insertion.

[0019] Preferably, an abstract creation system that creates an abstract by selecting important sentences from the original text includes an anaphora analysis unit, and when a selected important sentence contains an adnominal or pronoun whose antecedent lies in a non-important sentence, the sentence containing the anaphoric expression is excluded from the abstract.

[0020] Preferably, an abstract creation system that creates an abstract by selecting important sentences from the original text includes an anaphora analysis unit, and when a selected important sentence contains an adnominal whose antecedent lies in a non-important sentence, the adnominal is deleted.

[0021] Preferably, an abstract creation system that creates an abstract by deleting parts of the original text other than main clauses (unimportant parts), such as adverbial clauses and subordinate clauses, includes an anaphora analysis unit, and when the antecedent of an adnominal or pronoun appearing in a main clause is contained in a non-main clause, that clause is not deleted.
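Several of the preferred variants above differ only in what is done with an anaphoric expression whose antecedent is missing from the abstract. The sketch below contrasts four of them (including the antecedent sentence, excluding the dependent sentence, replacing the expression, and inserting the antecedent with a marker). It is an illustration under stated assumptions: the `Anaphor` record, the policy names, the square-bracket insertion marker, and the restriction to one anaphor per sentence are all hypothetical, and the sketch applies every policy only when the antecedent sentence is not already selected.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Anaphor:
    text: str          # surface form of the adnominal or pronoun
    antecedent: str    # resolved antecedent phrase
    source_index: int  # index of the sentence that contains the antecedent


def apply_policy(sentences: List[str], selected: Set[int],
                 anaphors: Dict[int, Anaphor], policy: str) -> List[str]:
    """Build the abstract from `selected`, handling unresolved anaphors by `policy`."""
    chosen = set(selected)
    rewritten: Dict[int, str] = {}
    for idx in sorted(selected):
        a = anaphors.get(idx)
        if a is None or a.source_index in selected:
            continue  # no anaphor, or its antecedent is already in the abstract
        if policy == "include":    # pull in the antecedent sentence
            chosen.add(a.source_index)
        elif policy == "exclude":  # drop the sentence that depends on it
            chosen.discard(idx)
        elif policy == "replace":  # substitute the antecedent for the anaphor
            rewritten[idx] = sentences[idx].replace(a.text, a.antecedent, 1)
        elif policy == "insert":   # keep the anaphor and add a marked gloss
            rewritten[idx] = sentences[idx].replace(
                a.text, f"{a.text} [{a.antecedent}]", 1)
        else:
            raise ValueError(f"unknown policy: {policy}")
    return [rewritten.get(i, sentences[i]) for i in sorted(chosen)]
```

The sketch does not re-check a sentence pulled in by the "include" policy for anaphors of its own; a fuller treatment would repeat the pass until nothing changes.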

[0022]

[Operation] In the present invention, when unselected text (for example a non-important sentence, or an anaphoric expression such as an adnominal or pronoun) exists between words and phrases selected from the input original text according to a predetermined criterion, that fact is indicated between those words and phrases in the abstract, for example with a predetermined symbol. In addition, the text corresponding to the indicated location (for example the non-important sentence, or the anaphoric expression such as an adnominal or pronoun) can be displayed as required, which improves the readability of the abstract.

[0023] Because the present invention generates the abstract on the basis of anaphora analysis results, the abstract is highly readable, and when processing such as full-text search or information extraction is applied to the abstract, fewer keywords are missed than without anaphora processing, so better results can be expected. Some recent information retrieval systems exploit co-occurrence or syntactic relations among several words in a sentence; if anaphora has been handled properly, the retrieval accuracy of such systems is not degraded even when one of the words is a pronoun. Furthermore, in response to user operations, the length of the abstract can be changed interactively, parts of the original text that are not in the abstract can be displayed, and the antecedents of adnominals, pronouns, and the like in the abstract can be displayed.

[0024]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an automatic abstract generation and display system, which is one embodiment of the abstract creation apparatus according to the present invention.

[0025] Referring to FIG. 1, the format analysis unit 1 analyzes the digitized input text to determine sentence boundaries, paragraph breaks, the structure of chapters and sections, and so on. The morphological/syntactic analysis unit 3 performs morphological and syntactic analysis, sentence by sentence, on the chapter and section headings and on the body text. The anaphora resolution unit 5 refers to the morphological and syntactic analysis results of each sentence and determines the antecedents of the anaphoric expressions (adnominals and pronouns) contained in the sentences. The important sentence determination unit 9 determines the important sentences on the basis of the chapter and section headings and the morphological and syntactic analysis results of each sentence.

[0026] Next, the configuration of the anaphora resolution unit 5 is described with reference to FIG. 2. The anaphora resolution unit 5 consists of a topic analysis unit 51 and an anaphora identification unit 55.

[0027] The topic analysis unit 51 extracts, from the morphological and syntactic analysis results of the original text, topic expressions that relate directly to the content of the text. The rhetorical topic expression storage unit 53a stores rhetorical sentence patterns involving topic expressions. A topic expression that matches one of these patterns is a rhetorical usage and is not directly related to the content of the text. The inverted-construction sentence pattern storage unit 53b stores sentence pattern information for sentences, such as inverted constructions, whose topic is not a noun phrase ending in a topic-presenting expression such as "は" (wa), "も" (mo), or "こそ" (koso).

[0028] The anaphora identification unit 55 identifies, from the morphological and syntactic analysis results of the original text and the topic analysis results of the topic analysis unit 51, the anaphora type of each adnominal or pronoun in the original text and its antecedent sentence or phrase. The anaphoric expression sentence pattern storage unit 53c stores information on sentence patterns that contain anaphoric expressions; this information is used to determine the anaphora type of a sentence. The propositional anaphoric expression storage unit 53d registers nouns, such as "こと" (matter), "できごと" (event), and "事態" (situation), that often refer to a whole sentence including its predicate.

[0029] Referring to FIG. 1, the abstract creation/display unit 7 creates the abstract on the basis of the important sentence determination results and the anaphora resolution results. In response to user operations, it can also interactively change the length of the abstract, display parts of the original text that are not in the abstract, and display the antecedents of adnominals, pronouns, and the like in the abstract. FIG. 3 shows the configuration of the abstract generation/display unit.

[0030] The operation of each processing unit is described below using an analysis example. FIG. 18 shows an example input document. (Portions marked Rv in the figure are displayed in reverse video; the same applies to the other figures.) The format analysis unit 1 analyzes the digitized input text to determine sentence boundaries, paragraph breaks, the structure of chapters and sections, and so on. The analysis results are used to select which parts of the text are to be summarized, or to decide the unit of summarization. For example, they are used to produce an abstract for each chapter or section, or to abstract only the chapters or sections headed "Conclusion" or "Overview" and use the result as the abstract of the whole document.

[0031] The morphological/syntactic analysis unit 3 performs morphological and syntactic analysis, sentence by sentence, on the chapter and section headings and on the body text. FIG. 20 shows part of the morphological and syntactic analysis results for the example input document of FIG. 18. In this embodiment, morpheme boundaries are indicated by "+", bunsetsu (phrase) boundaries by "/", and morpheme names by "<…>". The boundaries of phrases and clauses that depend directly on the main predicate of the sentence (hereinafter called main terms) are indicated by "[…]". What is written between these two symbols is the syntactic information for that clause obtained from syntactic analysis, such as subject, complement, or adverbial phrase.

[0032] The important sentence determination unit 9 determines the important sentences on the basis of the chapter and section headings and the morphological and syntactic analysis results of each sentence. Various methods have already been proposed for this processing in the prior art. For example, Japanese Patent Application No. Hei 2-203865 extracts the contextual structure of a text from the way its sentences are connected and the connective relations between them, and determines the important sentences accordingly. The important sentence determination unit 9 normally determines important sentences within a specified number of sentences or a specified number of characters.

[0033] An important sentence determination unit that works to a specified character count usually works to a sentence count internally. The only difference is a post-processing step: the total number of characters in the sentences judged important is calculated, and if it exceeds the specified character count, the internal sentence count is decremented and important sentence determination is performed again. This is repeated until the total is no more than the specified number of characters.
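The post-processing loop of paragraph [0033] can be sketched as below. The names are assumptions for illustration: `select_top` stands for any sentence-count-based selector (the patent does not fix one), and `longest_first` is only a stand-in ranker so the example runs.

```python
from typing import Callable, List, Optional, Sequence


def select_within_char_limit(
    sentences: Sequence[str],
    max_chars: int,
    select_top: Callable[[Sequence[str], int], List[int]],
    start_count: Optional[int] = None,
) -> List[int]:
    """Decrement the internal sentence count until the selection fits max_chars."""
    count = len(sentences) if start_count is None else start_count
    while count > 0:
        chosen = select_top(sentences, count)
        total = sum(len(sentences[i]) for i in chosen)
        if total <= max_chars:
            return sorted(chosen)
        count -= 1  # too long: ask for one sentence fewer and judge again
    return []  # even a single sentence exceeds the limit


def longest_first(sentences: Sequence[str], n: int) -> List[int]:
    """Stand-in ranker (hypothetical): treats longer sentences as more important."""
    ranked = sorted(range(len(sentences)), key=lambda i: -len(sentences[i]))
    return ranked[:n]
```

Re-running the selector on each pass, rather than just dropping the last sentence, matches the wording "important sentence determination is performed again" and allows selectors whose choice for n sentences is not a prefix of their choice for n + 1.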

[0034] The description below therefore continues with the sentence-count method as the example. When a character-count important sentence determination unit is used, only the post-processing described above is added, so the extension is straightforward.

[0035] Conventional methods also include important sentence determination by a specified compression ratio (the number of characters in the abstract divided by the number of characters in the original text, or the number of sentences in the abstract divided by the number of sentences in the original text). This too is equivalent to the preceding methods once a preprocessing step is added that obtains the number of characters or sentences in the original text and calculates the number of characters or sentences allowed in the abstract, so the description of embodiments using these methods is omitted.
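The preprocessing step for the compression-ratio method amounts to one multiplication. The sketch below assumes the result is rounded down, which the text does not specify; the function name `abstract_budget` is hypothetical.

```python
import math


def abstract_budget(original_size: int, compression_ratio: float) -> int:
    """Convert a compression ratio into an allowed size for the abstract.

    `original_size` is the number of characters or sentences in the original
    text; the result is in the same unit. Rounding down is an assumption.
    """
    if original_size < 0 or not 0.0 <= compression_ratio <= 1.0:
        raise ValueError("size must be non-negative and ratio within [0, 1]")
    return math.floor(original_size * compression_ratio)
```

The returned value can then be passed to a sentence-count or character-count selector as its limit.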

[0036] The topic analysis unit 51 shown in FIG. 2 extracts topic expressions from the morphological and syntactic analysis results of the original text. FIG. 4 shows the algorithm of the topic analysis unit. The topic analysis unit first extracts from the original text the noun phrases or clauses and adverbial phrases or clauses that end in a topic-presenting particle or rhetorical expression such as "は" (wa), "も" (mo), or "こそ" (koso) (step S17). Some phrases and clauses containing these topic-presenting expressions are rhetorical and not directly related to the content of the text. Information on such sentence styles is stored in the rhetorical topic expression storage unit; FIG. 16 shows example entries. The patterns are written as regular expressions and are matched against the morpheme sequence of the original text. When a pattern matches, the topic expression is judged to be a rhetorical usage and is rejected as a topic (steps S19 and S21).
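The two stages of topic analysis (collect phrases ending in a topic marker, then reject rhetorical ones) can be sketched as follows. This is a simplified illustration: phrases are assumed to arrive as strings of morphemes joined by "+", the patterns are matched per phrase rather than against the whole morpheme sequence, and the two entries in `RHETORICAL_PATTERNS` are hypothetical stand-ins for the contents of FIG. 16, which are not reproduced in this text.

```python
import re
from typing import List, Sequence

TOPIC_MARKERS = ("は", "も", "こそ")  # topic-presenting particles named in the text

# Hypothetical entries standing in for the rhetorical topic expression storage unit.
# "+" is escaped because Python's re module treats it as a quantifier.
RHETORICAL_PATTERNS = [re.compile(r"^実\+は$"), re.compile(r"^今度\+こそ$")]


def extract_topics(phrases: Sequence[str]) -> List[str]:
    """Return the topic candidates among phrases given as '+'-joined morphemes."""
    topics: List[str] = []
    for phrase in phrases:
        morphemes = phrase.split("+")
        # First stage: the phrase must end in a topic-presenting marker.
        if len(morphemes) < 2 or morphemes[-1] not in TOPIC_MARKERS:
            continue
        # Second stage: reject rhetorical usages.
        if any(p.search(phrase) for p in RHETORICAL_PATTERNS):
            continue
        topics.append("".join(morphemes[:-1]))  # drop the marker itself
    return topics
```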

[0037] A regular expression is a way of specifying character strings that is used on computers based on the UNIX OS. '.' denotes any single character, 'a*' zero or more repetitions of the character a, '^' the beginning of the sentence, '$' the end of the sentence, '[^a]' any character other than a, '(x|y)' either string x or string y, and '(x)?' either the empty string or string x. Here, '…' stands for '.*'. In standard regular expressions '+' denotes one or more repetitions of the preceding character, but in this example it is treated as an ordinary character (the symbol that separates morphemes).
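Because this notation differs from standard regular expressions in two places ('…' as shorthand for '.*' and '+' as a literal separator), a pattern written this way needs a small translation before it can be handed to an ordinary regex engine. The sketch below does this for Python's `re` module; the function name `to_python_regex` and the sample pattern are assumptions for illustration.

```python
import re


def to_python_regex(pattern: str) -> str:
    """Translate the notation described above into Python `re` syntax.

    '+' is a literal morpheme separator, so it is escaped; '…' is shorthand
    for '.*'. All other metacharacters already have their usual meaning.
    """
    return pattern.replace("+", r"\+").replace("…", ".*")


# Hypothetical pattern: a sentence that starts with その or この and ends in は.
TOPIC_SENTENCE = re.compile(to_python_regex("^(その|この)+…は$"))
```

For example, `TOPIC_SENTENCE.search("その+装置+は")` finds a match, while the same search on `"あの+装置+は"` does not.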

[0038] In inverted constructions and the like, the topic of the sentence is not a noun phrase ending in a topic-presenting expression such as "は" (wa) or "も" (mo). Information on such sentence patterns is registered in the inverted-construction sentence pattern storage unit; FIG. 17 shows example entries. These sentence patterns are written as regular expressions, and the part of the original sentence that matches the "〜" portion of a pattern indicates the topic of the sentence. During pattern matching, "〜" is processed as a string equivalent to "…", that is, ".*".
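One way to recover "the part that matches 〜" with an ordinary regex engine is to turn "〜" into a capturing group, which matches the same strings as ".*" but remembers what it matched. The sketch below takes that approach. The capturing group is an implementation choice made for this example, the sample pattern is hypothetical (the entries of FIG. 17 are not reproduced in this text), and each pattern is assumed to contain exactly one "〜".

```python
import re
from typing import Optional


def extract_inverted_topic(pattern: str, morphemes: str) -> Optional[str]:
    """Return the text matched by '〜' in `pattern`, or None if no match.

    `morphemes` is a sentence written as '+'-joined morphemes. The pattern
    must contain exactly one '〜' (an assumption of this sketch).
    """
    regex = (
        pattern.replace("+", r"\+")   # '+' is a literal morpheme separator
               .replace("…", ".*")    # '…' is shorthand for '.*'
               .replace("〜", "(.*)")  # same as '.*', but captured
    )
    match = re.search(regex, morphemes)
    return match.group(1) if match else None


# Hypothetical inverted-construction pattern: "... の が <topic> だ".
INVERTED = "^…の+が+〜+だ$"
```

With this pattern, `extract_inverted_topic(INVERTED, "大切+な+の+が+読解性+だ")` yields `"読解性"`.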

[0039] FIG. 21 shows the result of topic analysis on the example input document of FIG. 18. In sentences 2 and 16, among others, the topic of the preceding sentence is inherited. The information in the inverted-construction sentence pattern storage unit is used to extract the topic expressions from sentences 5 and 7. The number of topics extracted from one sentence is not limited to one; two topics are extracted from each of sentences 9 and 11.

[0040] The anaphora resolution unit 5 refers to the topic analysis results and the morphological and syntactic analysis results of each sentence and determines the antecedents of the anaphoric expressions, such as adnominals and pronouns, contained in the sentences. Here, anaphoric expressions are taken to include not only the obvious ones such as "これ" (this), "それ" (it), "これら" (these), and "どちら" (which), but also expressions such as "同氏" (the same person) and "同社" (the same company), and even personal nouns such as "女史" (Ms.) and "博士" (Dr.). FIGS. 5 and 6 show the antecedent identification algorithm.

[0041] The anaphora resolution unit 5 first extracts the pronouns and adnominals in the original text (steps S47, S49, S51). This extraction, and the determination of whether the antecedent of each extracted adnominal or pronoun lies within the same sentence or in a preceding sentence (step S45), are performed at the same time by matching the sentence patterns registered in the anaphoric expression storage unit against the morphological and syntactic analysis results of the original text. When several templates match, the template that matches closer to the beginning of the sentence takes priority. When the matched portions overlap, the template with the longer match in the sentence takes priority. FIG. 15 shows example entries in the anaphoric expression sentence pattern storage unit. Like the entries of the rhetorical topic expression storage unit 53a and the inverted-construction sentence pattern storage unit 53b, they are written as regular expressions.
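The two priority rules for competing templates can be sketched as a single ordering: earliest match start first, and among those the longest match. This collapses the "overlap" condition of the text into a simple tie-break, which is one possible reading rather than the patent's exact procedure. The template names and patterns below are hypothetical, and patterns are assumed to be already in Python `re` syntax (with "+" escaped).

```python
import re
from typing import List, Optional, Tuple


def best_template(templates: List[Tuple[str, str]], sentence: str) -> Optional[str]:
    """Return the name of the preferred matching template, or None.

    `templates` holds (name, regex) pairs. Preference: earlier match start,
    then longer match; any remaining tie falls back to name order.
    """
    candidates = []
    for name, pattern in templates:
        match = re.search(pattern, sentence)
        if match:
            length = match.end() - match.start()
            candidates.append((match.start(), -length, name))
    if not candidates:
        return None
    return min(candidates)[2]


# Hypothetical templates over a '+' / '/' delimited morpheme string.
TEMPLATES = [
    ("sentence_initial", r"^その"),
    ("contrast", r"が/それ\+は"),
    ("long_initial", r"^その\+.*は"),
]
```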

【００４２】図１６中、は“その”や“それ”、“こ
の”で始まる文に対してマッチするテンプレートであ
る。形態素列の指定があるので、“その為”といった接
続詞とマッチすることがない。このような文頭に存在す
る連体詞・代名詞は、その照応先は前文であると見なせ
る。In FIG. 16, is a template that matches a sentence starting with "that", "that", or "this". Since the morpheme sequence is specified, it does not match a conjunction such as "So". The adjunct / pronoun existing at the beginning of a sentence can be regarded as the destination of the preceding sentence.

【００４３】は、“ＡはＢであるが、それは…。”と
いった文にマッチする。Ａ，Ｂは適当な名詞句であると
する。このような文型では、“その”の照応先はＡか
Ｂ、あるいは“ＡはＢである”という部分全体と見なす
ことができる。Another template matches sentences such as "A is B, but it ...", where A and B are arbitrary noun phrases. In this sentence pattern, the referent of the anaphoric expression can be regarded as A, B, or the whole clause "A is B".

【００４４】は“そのＡがＢであるようなＣは、
…。”という文とマッチする。この場合“その”の照応
先はＣである。Another template matches sentences such as "The C whose (sono) A is B ...". In this case, the referent of "sono" is C.

【００４５】は、“ＡはＢであるのに、そのＣは
…。”といった文とマッチする。この場合、“その”の
照応先はＡかＢ、あるいは“ＡはＢである”という部分
と見なすことができる。Another template matches sentences such as "Although A is B, its (sono) C ...". In this case, the referent of "sono" can be regarded as A, B, or the clause "A is B".

【００４６】照応先が前方である場合、その照応先を決
定する。照応表現が“それ”、“これ”のように代名詞
である場合（ステップＳ５１）は、前文の話題を照応先
とする。照応表現が“その”、“この”など連体詞であ
り、被修飾語が前文に現れている場合は、その被修飾語
を含む前文の名詞句を照応先とする（ステップＳ７
７）。When the referent lies in a preceding sentence, it is determined as follows. When the anaphoric expression is a pronoun such as "sore" (it) or "kore" (this) (step S51), the topic of the preceding sentence is taken as the referent. When the anaphoric expression is an adnominal such as "sono" (that) or "kono" (this) and the modified word appears in the preceding sentence, the noun phrase of the preceding sentence that contains the modified word is taken as the referent (step S77).

【００４７】前文にそのような語句がなく、被修飾語が
“できごと”、“やり方”、“事態”など、オブジェク
トでなくイベントを参照しやすい語句である場合（ステ
ップＳ６３）は、前文の全体を照応先とする（ステップ
Ｓ７９）。どのような語句がそうであるかという情報
は、命題照応表現記憶部５３ｄに登録されている。そう
でない場合は、前文の話題を照応先とする（ステップＳ
６５）。If the preceding sentence contains no such phrase and the modified word is one that tends to refer to an event rather than an object, such as "event", "method", or "situation" (step S63), the entire preceding sentence is taken as the referent (step S79). Information on which words behave this way is registered in the propositional anaphoric expression storage unit 53d. Otherwise, the topic of the preceding sentence is taken as the referent (step S65).
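The decision procedure for a referent that lies in the preceding sentence can be sketched as below. This is only an illustrative sketch: the function and class names are hypothetical, the word set stands in for the contents of the propositional anaphoric expression storage unit 53d, and matching the modified word against whitespace-separated tokens is a simplification of matching against analysed noun phrases.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Stand-in for the propositional anaphoric expression storage unit 53d (assumed contents).
PROPOSITIONAL_WORDS = {"event", "method", "situation", "policy"}


@dataclass
class PrecedingSentence:
    text: str                # full text of the preceding sentence
    topic: str               # topic found by the topic analysis
    noun_phrases: List[str]  # noun phrases found by the syntax analysis


def resolve_forward(kind: str, modified_word: Optional[str],
                    prev: PrecedingSentence) -> Tuple[str, str]:
    """Return (referent type, referent text) for an anaphoric expression
    whose referent lies in the preceding sentence."""
    if kind == "pronoun":
        # Pronouns refer to the topic of the preceding sentence.
        return ("topic", prev.topic)
    if modified_word:
        # Adnominal: look for the modified word in the preceding sentence.
        for phrase in prev.noun_phrases:
            if modified_word in phrase.split():
                return ("noun_phrase", phrase)
        # Event-like modified word: the whole preceding sentence is the referent.
        if modified_word in PROPOSITIONAL_WORDS:
            return ("sentence", prev.text)
    # Otherwise fall back to the topic of the preceding sentence.
    return ("topic", prev.topic)


prev = PrecedingSentence(
    text="The firm never ships an untested product.",
    topic="the firm",
    noun_phrases=["the firm", "an untested product"],
)
```

With this toy input, an adnominal modifying "product" resolves to the noun phrase "an untested product", while one modifying "policy" resolves to the whole preceding sentence.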

【００４８】図１８に示した文章に対する照応解決結果
を図２２に示す。文２、文１２、文１５中の照応表現
は、参照先が自文内であると判定されている。文４、文
１６では、被修飾語が前文に存在するので、それを含む
前文の名詞句が照応先と判定される。文８は、前文の話
題が照応先と判定されている例である。文１７は、“ポ
リシー”という単語が、命題照応型の表現であるので、
前文（１６文）全体が照応先となっている。FIG. 22 shows the anaphora resolution result for the text shown in FIG. 18. The anaphoric expressions in sentences 2, 12, and 15 are determined to have their referents within the same sentence. In sentences 4 and 16, the modified word appears in the preceding sentence, so the noun phrase of the preceding sentence that contains it is determined to be the referent. Sentence 8 is an example in which the topic of the preceding sentence is determined to be the referent. In sentence 17, the word "policy" is a propositional anaphoric expression, so the entire preceding sentence (sentence 16) is the referent.

【００４９】抄録生成・表示部７は、照応解決結果、お
よび、各文の形態素他・構文解析結果、およびユーザ指
定の抄録文数指定を元に、表示する抄録文を作成し、ま
た、ユーザの指示に従って、表示の変更、照応先の語句
の表示、抄録のやり直しなどを行う。図６乃至１５及び
図２４乃至３３は、抄録生成・表示部７の動作を説明す
るものである。The abstract generation/display unit 7 creates the abstract to be displayed on the basis of the anaphora resolution result, the morpheme/syntax analysis result of each sentence, and the number of abstract sentences specified by the user. In accordance with the user's instructions, it also changes the display, displays the referent phrases, redoes the abstract, and so on. FIGS. 6 to 15 and FIGS. 24 to 33 illustrate the operation of the abstract generation/display unit 7.

【００５０】以下の説明では、ユーザの指定した文数が
５であり、抄録生成・表示部７はその数を引き数として
重要文判定部９を起動し、重要文として判定された文
が、１，８，１２，１７の４文であったとする。In the following description, suppose that the number of sentences specified by the user is 5, that the abstract generation/display unit 7 invokes the important sentence determination unit 9 with that number as its argument, and that the four sentences 1, 8, 12, and 17 are determined to be important.

【００５１】図６は、表示を行うためのアルゴリズムで
ある。原文中で連続していない文と文の間には“…”が
挿入され、また、文の接続表現が削除される。そうして
作成された抄録の表示例が図２３である。FIG. 6 shows the display algorithm. The symbol "…" is inserted between sentences that are not adjacent in the original text, and the connective expressions of the sentences are deleted. FIG. 23 shows a display example of an abstract created in this way.
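The insertion of "…" between non-adjacent sentences can be sketched as follows. This is a minimal sketch, assuming the sentences are held in a dictionary keyed by original sentence number; the function name is hypothetical, and the deletion of connective expressions is left out.

```python
from typing import Dict, Iterable, List


def render_abstract(sentences: Dict[int, str], selected: Iterable[int]) -> List[str]:
    """Return the display lines of the abstract, with an omission marker
    between selected sentences that are not adjacent in the original."""
    lines: List[str] = []
    previous = None
    for number in sorted(selected):
        if previous is not None and number != previous + 1:
            lines.append("…")  # sentences were skipped here
        lines.append(sentences[number])
        previous = number
    return lines


# Placeholder text for an original of 17 sentences.
original = {n: f"Sentence {n}." for n in range(1, 18)}
abstract = render_abstract(original, [1, 8, 12, 17])
```

For the important sentences 1, 8, 12, and 17 used in the description, a marker appears in each of the three gaps.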

【００５２】尚、特願平０４−１００１６８号では、選
択された重要文が原文中で不連続の場合、文脈構造を参
照して、文中の接続表現を適宜適切なものに置換する方
式が提案されている。本実施例にその技術を利用するこ
とも容易である。Japanese Patent Application No. 04-100168 proposes a method in which, when the selected important sentences are not adjacent in the original text, the context structure is consulted and the connective expressions in the sentences are replaced with appropriate ones. That technique can easily be used in this embodiment as well.

【００５３】図７は、請求項２記載の動作を行う為のア
ルゴリズムである。FIG. 7 shows an algorithm for performing the operation described in claim 2.

【００５４】図２３に示したような抄録文の表示がされ
ているときに、例えば、表示中の第２文の前の記号
“…”がマウスなどでクリックされたとする。すると、
この場合、第２文から第７文の文章が表示される。図２
４は、その表示例である。この例では、表示中に第２文
から第７文が挿入され、反転（図中、Ｒｖで示す）など
によって挿入された部分が分かり易くなるように示され
ている。Suppose that, while an abstract such as that shown in FIG. 23 is displayed, the symbol "…" in front of the second displayed sentence is clicked with a mouse or the like. In this case, the second to seventh sentences of the original text are displayed. FIG. 24 shows a display example. In this example, the second to seventh sentences are inserted into the display, and the inserted portion is marked by reverse video (indicated by Rv in the figure) or the like so that it is easy to identify.
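Finding which original sentences to reveal when an omission marker is clicked can be sketched as below. This is an illustrative sketch only: the function name and the 1-based gap index (the gap in front of the k-th displayed sentence) are assumptions made for the example.

```python
from typing import Iterable, List


def omitted_before(selected: Iterable[int], displayed_index: int) -> List[int]:
    """Return the original sentence numbers hidden behind the marker that
    sits in front of the displayed_index-th sentence of the abstract
    (1-based; the first sentence has no marker in front of it)."""
    ordered = sorted(selected)
    if not 2 <= displayed_index <= len(ordered):
        raise ValueError("no omission marker in front of that sentence")
    before = ordered[displayed_index - 2]
    after = ordered[displayed_index - 1]
    return list(range(before + 1, after))


hidden = omitted_before([1, 8, 12, 17], 2)
```

With the important sentences 1, 8, 12, and 17, clicking the marker in front of the second displayed sentence yields original sentences 2 to 7, matching the example in the description.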

【００５５】図２５は別の表示例である。ここでは、別
ウインドウに第２文から第７文が表示され、クリックし
た場所とそのウインドウが線で結ばれている。FIG. 25 shows another display example. Here, the second sentence to the seventh sentence are displayed in another window, and the clicked place and the window are connected by a line.

【００５６】図８は表示を行う為のアルゴリズムであ
る。今、図２３に示した抄録文が表示されていたとす
る。ここで、表示中の第２文の照応表現“こちら”をク
リックしたときの表示例が、図２６である。照応先の語
句を含む文が挿入され、反転表示などによって強調され
て示される。FIG. 8 shows the display algorithm. Suppose that the abstract shown in FIG. 23 is displayed. FIG. 26 shows a display example for the case where the anaphoric expression "kochira" (this one) in the second displayed sentence is clicked. The sentence containing the referent phrase is inserted and highlighted by reverse video or the like.

【００５７】同様、表示中の第４文の照応表現“この”
をクリックしたときの表示例が図２７である。この場
合、照応先の語句を含む文は既に表示されているので、
該当部分が反転表示などによって示される。Similarly, FIG. 27 shows a display example for the case where the anaphoric expression "kono" (this) in the fourth displayed sentence is clicked. In this case, the sentence containing the referent phrase is already displayed, so the relevant portion is indicated by reverse video or the like.

【００５８】図９は、表示を行う為のアルゴリズムであ
る。重要文で、非重要文への照応表現を含んでいるもの
は、重要文２（原文番号８）および重要文４（原文番号
１７）でる。両方含めると重要文数が６となり、ユーザ
の指定文数を超過する。この場合、重要文判定部が引き
数４で再起動される。そして、そのとき重要文として判
定された文が１，８，１６の３文であったとする。非重
要文への照応表現を含んでいるのは、先と同じで２文あ
る。今度は、照応先の文を含めても、ユーザの指定文数
以下なので、重要文判定部が再起動されることなく、抄
録文が生成される。図２８は、その表示例である。前述
した表示方式も利用している。FIG. 9 shows a display algorithm. The important sentences that contain anaphoric expressions referring to non-important sentences are important sentence 2 (original sentence number 8) and important sentence 4 (original sentence number 17). Including both referent sentences raises the number of sentences to 6, which exceeds the number specified by the user. In this case, the important sentence determination unit is restarted with the argument 4. Suppose that the sentences then determined to be important are the three sentences 1, 8, and 16. As before, two of them contain anaphoric expressions referring to non-important sentences. This time, even with the referent sentences included, the total does not exceed the number specified by the user, so the abstract is generated without restarting the important sentence determination unit again. FIG. 28 shows a display example. The display method described above is also used.
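The restart loop described for FIG. 9 can be sketched as follows. This is a sketch under stated assumptions: `judge` is a hypothetical stand-in for the important sentence determination unit, `referent_sentence` maps a sentence to the sentence holding its referent, and the toy table simply reproduces the sentence numbers used in the description.

```python
from typing import Callable, Dict, List


def abstract_with_referents(limit: int,
                            judge: Callable[[int], List[int]],
                            referent_sentence: Dict[int, int]) -> List[int]:
    """Lower the argument passed to the judge until the important sentences
    plus the sentences holding their referents fit within the limit."""
    argument = limit
    while argument > 0:
        important = judge(argument)
        needed = {referent_sentence[s] for s in important if s in referent_sentence}
        total = sorted(set(important) | needed)
        if len(total) <= limit:
            return total
        argument -= 1  # restart the judge with a smaller argument
    return []


def toy_judge(argument: int) -> List[int]:
    """Hypothetical judge that reproduces the example in the description."""
    table = {5: [1, 8, 12, 17], 4: [1, 8, 16]}
    return table.get(argument, [1])


# Each referent is assumed to lie in the immediately preceding sentence.
referents = {8: 7, 16: 15, 17: 16}
result = abstract_with_referents(5, toy_judge, referents)
```

The first pass yields six sentences and is rejected; the second pass, with argument 4, yields five sentences and is accepted.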

【００５９】図１０は、表示を行う為のアルゴリズムで
ある。これは、重要文中の照応表現の照応先が（非重要
文で）１文全体である場合は、重要文として抄録中に含
め照応先が（非重要文中の）語句である場合は、重要文
中の照応表現をその語句と置換するというものである。FIG. 10 shows a display algorithm. When the referent of an anaphoric expression in an important sentence is an entire (non-important) sentence, that sentence is included in the abstract as an important sentence. When the referent is a phrase (in a non-important sentence), the anaphoric expression in the important sentence is replaced with that phrase.

【００６０】基本的な動作は図９に示したものとほぼ同
じであるが、今の場合、重要文２（原文番号８）の照応
先は語句であり、その置換に際しては重要文の数は増え
ない、重要文４（原文番号１７）の照応先は文全体であ
り、これは置換でなく挿入によって、抄録文中に加えら
れる。重要文の数はインクリメントされる。結局、照応
処理後の文数は５であり、ユーザの指定文数以下なの
で、重要文判定部は再起動されることなく、そのまま抄
録文として表示される。図２９は、その表示例である。
前述した表示方式も利用している。The basic operation is almost the same as that shown in FIG. 9. In the present case, however, the referent of important sentence 2 (original sentence number 8) is a phrase, so the replacement does not increase the number of sentences. The referent of important sentence 4 (original sentence number 17) is an entire sentence, which is added to the abstract by insertion rather than replacement, and the sentence count is incremented. As a result, the number of sentences after anaphora processing is 5, which does not exceed the number specified by the user, so the result is displayed as the abstract as is, without restarting the important sentence determination unit. FIG. 29 shows a display example. The display method described above is also used.
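The distinction between replacing an anaphoric expression with a referent phrase and inserting a whole referent sentence can be sketched as below. This is an illustrative sketch: the data layout (a tuple of expression, referent type, and referent value per sentence) and the toy English sentences are assumptions, and the check against the user-specified sentence count is left out.

```python
from typing import Dict, List, Tuple, Union

# sentence number -> (anaphoric expression, "phrase" or "sentence", referent)
Anaphora = Dict[int, Tuple[str, str, Union[str, int]]]


def resolve_in_abstract(sentences: Dict[int, str],
                        important: List[int],
                        anaphora: Anaphora) -> List[str]:
    """Replace phrase referents in place and insert sentence referents."""
    output: Dict[int, str] = {}
    for number in important:
        text = sentences[number]
        if number in anaphora:
            expression, kind, referent = anaphora[number]
            if kind == "phrase":
                # Substitution: the sentence count does not change.
                text = text.replace(expression, str(referent), 1)
            elif kind == "sentence" and referent not in important:
                # Insertion: the referent sentence joins the abstract.
                output[int(referent)] = sentences[int(referent)]
        output[number] = text
    return [output[n] for n in sorted(output)]


toy_sentences = {
    7: "The new model is fast.",
    8: "This one is cheap.",
    16: "We never ship late.",
    17: "This policy matters.",
}
toy_anaphora: Anaphora = {
    8: ("This one", "phrase", "The new model"),
    17: ("This policy", "sentence", 16),
}
resolved = resolve_in_abstract(toy_sentences, [8, 17], toy_anaphora)
```

Sentence 8 keeps its place with the phrase substituted, while sentence 16 is inserted in front of sentence 17.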

【００６１】図１１は、表示を行う為のアルゴリズムで
ある。図１０に示す例と違う点は、置換はせず、挿入に
さいして挿入部分を示すような記号‘［’，‘］’を付
加するという点である。図３０は、その表示例である。
前述した表示方式も利用している。FIG. 11 shows a display algorithm. It differs from the example shown in FIG. 10 in that no replacement is performed; instead, the referent is inserted, and the symbols '[' and ']' are added to mark the inserted portion. FIG. 30 shows a display example. The display method described above is also used.

【００６２】図１２は、表示を行う為のアルゴリズムで
ある、これは、非重要文への照応表現を含むような重要
文を抄録から削除するものである。この例では、重要文
２（原文番号８）、重要文４（原文番号１７）が削除さ
れている。図３１は、表示例である。前述した表示方式
も利用している。FIG. 12 shows a display algorithm. It deletes from the abstract any important sentence that contains an anaphoric expression referring to a non-important sentence. In this example, important sentence 2 (original sentence number 8) and important sentence 4 (original sentence number 17) are deleted. FIG. 31 shows a display example. The display method described above is also used.

【００６３】図１３は、表示を行う為のアルゴリズムで
ある。これは、非重要文への照応を含むような連体詞を
重要文から削除するものである。照応表現が代名詞であ
る場合は、通常それを削除すると文の意味がとれなくな
るので、その文全体を削除する。この例では、重要文２
（原文番号８）の“こちらの”が削除される。FIG. 13 shows a display algorithm. It deletes from an important sentence any adnominal that refers to a non-important sentence. When the anaphoric expression is a pronoun, deleting it usually makes the sentence unintelligible, so the entire sentence is deleted instead. In this example, "kochira no" (this) in important sentence 2 (original sentence number 8) is deleted.

【００６４】ところが、重要文４（原文番号１７）に於
いては、“この”を削除しても“ポリシー”だけでは、
文章が掴みにくい。一般に連体詞の被修飾語が“できご
と”、“事態”など命題照応表現のタイプであり、その
語にかかる他の修飾節がない場合は、連体詞を削除する
と文章が掴めなくなってしまう。このような場合には、
代名詞同様その文全体を削除する。In important sentence 4 (original sentence number 17), however, deleting "kono" (this) leaves only "policy", which makes the sentence hard to follow. In general, when the word modified by the adnominal is of the propositional anaphoric type, such as "event" or "situation", and no other modifying clause is attached to that word, deleting the adnominal makes the sentence unintelligible. In such a case, the entire sentence is deleted, as with a pronoun.
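The deletion rules for FIG. 13 can be sketched as follows. This is a minimal sketch with assumed names: the word set stands in for the propositional anaphoric expression storage unit, the caller is assumed to supply the analysis results (expression type, modified word, presence of another modifier), and removal is done on plain strings.

```python
from typing import Optional

# Stand-in for the propositional anaphoric expression storage unit (assumed contents).
PROPOSITIONAL_WORDS = {"event", "method", "situation", "policy"}


def drop_anaphor(sentence: str, expression: str, kind: str,
                 modified_word: Optional[str] = None,
                 has_other_modifier: bool = False) -> Optional[str]:
    """Return the sentence with the adnominal removed, or None when the
    whole sentence should be dropped from the abstract."""
    if kind == "pronoun":
        return None  # removing a pronoun would leave the sentence unintelligible
    if modified_word in PROPOSITIONAL_WORDS and not has_other_modifier:
        return None  # a bare event-like word cannot stand alone
    shortened = sentence.replace(expression, "", 1)
    return " ".join(shortened.split())  # tidy the doubled space left behind


kept = drop_anaphor("Sales of this model rose.", "this", "adnominal", "model")
dropped = drop_anaphor("This policy matters.", "This", "adnominal", "policy")
```

The first call keeps the sentence without the adnominal; the second drops the sentence because "policy" is treated as a propositional word with no other modifier.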

【００６５】図３２は、表示例である。請求項１記載の
表示方式も利用している。FIG. 32 is a display example. The display system according to claim 1 is also used.

【００６６】さて、従来方式として、文中の副詞句や従
属節など重要でない部分を削除して文章の簡約化をはか
る技術がある。その技術に照応解決処理を適用した例に
ついて、以下述べる。As a conventional method, there is a technique that simplifies text by deleting unimportant parts of a sentence, such as adverbial phrases and subordinate clauses. An example in which anaphora resolution is applied to that technique is described below.

【００６７】副詞句や従属節等を削除する際問題となる
のは、主節中に照応表現があり、その照応先がそれら副
詞句や従属節内の語句である場合である。そのような場
合には、その副詞句なし従属節は削除しないようにすべ
きである。A problem arises when deleting adverbial phrases, subordinate clauses, and the like if the main clause contains an anaphoric expression whose referent is a phrase inside one of those adverbial phrases or subordinate clauses. In such a case, that adverbial phrase or subordinate clause should not be deleted.

【００６８】図１４は、表示を行う為のアルゴリズムで
ある。この文章の例では、文番号１２の主節の冒頭の
“これ”は、その前の“文体や文型といった修辞的、実
際的な情報のみである程度の照応解決を行うことができ
るが”という挿入句を指しているので、この節は削除さ
れずに残る。図３３は、表示例である。FIG. 14 shows a display algorithm. In the example text, "kore" (this) at the beginning of the main clause of sentence 12 refers to the preceding inserted clause, "although a certain degree of anaphora resolution is possible using only rhetorical and practical information such as style and sentence pattern", so that clause is kept rather than deleted. FIG. 33 shows a display example.
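The rule that a subordinate clause survives simplification when the main clause refers into it can be sketched as below. This is an illustrative sketch: the `Clause` type, the function name, and the idea of passing the indices of referent-bearing clauses as a set are assumptions for the example.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Clause:
    text: str
    is_main: bool  # False for adverbial phrases and subordinate clauses


def simplify(clauses: List[Clause], referent_clauses: Set[int]) -> str:
    """Drop non-main clauses unless an anaphoric expression in the main
    clause has its referent inside them (indices in referent_clauses)."""
    kept = [clause.text for index, clause in enumerate(clauses)
            if clause.is_main or index in referent_clauses]
    return " ".join(kept)


toy_clauses = [
    Clause("Although style alone resolves some anaphora,", False),
    Clause("this is not enough.", True),
]
with_referent = simplify(toy_clauses, {0})
without_referent = simplify(toy_clauses, set())
```

When the pronoun in the main clause points into the subordinate clause, both clauses are kept; otherwise only the main clause remains.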

【００６９】以上、抄録作成・表示部の動作については
詳しく述べたが、これらの処理は自由に組み合わされて
良い。またこれらの表示処理は、ユーザの指定に応じて
動的に相互に移行して良い。図１４に示す例の処理を、
照応解析部の解析結果を利用して抄録文を作成し表示す
る場合の重要文選択型の自動抄録システムと組み合わせ
ることも有用である。The operation of the abstract creation/display unit has been described in detail above, and these processes may be combined freely. The display processes may also switch dynamically from one to another in accordance with the user's specification. It is also useful to combine the processing of the example shown in FIG. 14 with an automatic abstracting system of the important sentence selection type that creates and displays an abstract using the analysis results of the anaphora analysis unit.

【００７０】尚、本実施例では各処理部や記憶部で処理
されるデータの実際的な構造について言及しなかった
が、全て文字列型のデータであり、１処理単位（１登
録）が１行を構成するテキストとみなしてかまわない。
複数のデータの対応関係を記憶する場合は、適当なフィ
ールドセパレータで区切って、それらの情報を決まった
順番で１行に並べて登録するようにする。This embodiment has not described the actual structure of the data handled by each processing unit and storage unit, but all of it may be regarded as character string data in which one processing unit (one entry) is the text of one line. When the correspondence among several data items is to be stored, the items are separated by a suitable field separator and registered on one line in a fixed order.

【００７１】それらの情報を参照、検索するときは、文
頭からのフィールドセパレータの個数を数え、必要とし
ているフィールド（カラム）のデータを取り出すように
すれば良い。これらのデータに対する登録や検索は、テ
キストファイルに対する追加や検索の処理を用いればよ
く、これは広く行われているどのような方法を用いても
可能である。また、本例は抄録の表示を中心に述べてい
るが、圧縮率１００％、つまり、抄録しない場合を考え
れば、高機能のテキストブラウザとしても利用可能であ
る。To reference or retrieve this information, it suffices to count the field separators from the beginning of the line and extract the data of the required field (column). Registration and retrieval of these data can be done with ordinary append and search operations on a text file, using any widely used method. Although this example focuses on the display of abstracts, with a compression ratio of 100%, that is, with no abstracting, the device can also be used as a high-function text browser.
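The one-line-per-entry storage scheme can be sketched as follows. This is a minimal sketch: the tab character as field separator, the function names, and the three-column layout (sentence number, anaphoric expression, referent sentence number) are assumptions chosen for illustration.

```python
from typing import List

FIELD_SEPARATOR = "\t"  # assumed separator; any character absent from the data works


def make_record(*fields: str) -> str:
    """Join the fields of one entry into a single line in a fixed order."""
    return FIELD_SEPARATOR.join(fields)


def get_field(record: str, column: int) -> str:
    """Extract one column by splitting the line on the field separator."""
    return record.split(FIELD_SEPARATOR)[column]


def search(records: List[str], column: int, value: str) -> List[str]:
    """Return every line whose given column equals the value."""
    return [record for record in records if get_field(record, column) == value]


# Hypothetical entries: sentence number, anaphoric expression, referent sentence.
records = [
    make_record("8", "kochira", "7"),
    make_record("17", "kono", "16"),
]
```

Appending a line to a text file registers an entry, and a scan with `search` retrieves entries by any column.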

【００７２】上述したように、本実施例は照応解析結果
をふまえて抄録を生成しているので、抄録文の読解性が
高い。また、抄録文を対象にフルテキストサーチや情報
抽出などの処理を行う際、キーワードの漏れが照応処理
をしない場合に比べ少ないので、より良い結果を期待で
きる。また最近の情報検索システムの中には１文中の複
数の語の共起関係や構文関係を利用するものがあるが、
照応処理が適切にされていれば、語の１つが代名詞であ
ってもこういった検索システムの検索精度を下げること
がない。また、ユーザの操作に従って、インタラクティ
ブに、抄録の長さを変更したり、抄録中にない原文部分
を表示したり、抄録中の連体詞、代名詞などの照応先を
表示したりすることができる。As described above, this embodiment generates the abstract on the basis of the anaphora analysis result, so the abstract is highly readable. In addition, when processing such as full-text search or information extraction is performed on the abstract, fewer keywords are missed than when no anaphora processing is done, so better results can be expected. Some recent information retrieval systems use co-occurrence relations and syntactic relations among multiple words in a sentence. If anaphora processing is done properly, the retrieval accuracy of such systems is not lowered even when one of the words is a pronoun. Furthermore, in response to user operations, the length of the abstract can be changed interactively, portions of the original text that are not in the abstract can be displayed, and the referents of adnominals, pronouns, and the like in the abstract can be displayed.

【００７３】[0073]

【発明の効果】以上説明したように本発明によれば、抄
録文中の非重要文、例えば照応表現の照応先或いは省略
箇所表示が抄録文中に含まれるようにしたので、抄録文
の読解性、自然性を向上させることができる。As described above, according to the present invention, non-important sentences, for example the referent of an anaphoric expression, or an indication of an omitted portion are included in the abstract, so the readability and naturalness of the abstract can be improved.

[Brief description of drawings]

【図１】本発明に係る抄録作成・表示システムの一実施
例の概略の構成を示すブロック図である。FIG. 1 is a block diagram showing a schematic configuration of an example of an abstract creation / display system according to the present invention.

【図２】図１に示す照応解決処理部の概略の構成を示す
ブロック図である。FIG. 2 is a block diagram showing a schematic configuration of the anaphora resolution processing unit shown in FIG. 1.

【図３】図１に示す抄録生成・表示部の概略の構成を示
すブロック図である。FIG. 3 is a block diagram showing a schematic configuration of the abstract generation/display unit shown in FIG. 1.

【図４】図１に示す照応解析処理部の話題解析部の動作
アルゴリズムを示すフローチャートである。FIG. 4 is a flowchart showing the operation algorithm of the topic analysis unit of the anaphora analysis processing unit shown in FIG. 1.

【図５】図１に示す照応解決処理部の照応固定部の動作
アルゴリズムを示すフローチャートである。FIG. 5 is a flowchart showing the operation algorithm of the anaphora fixing unit of the anaphora resolution processing unit shown in FIG. 1.

【図６】図１に示す抄録生成・表示部のアルゴリズムを
示すフローチャートである。FIG. 6 is a flowchart showing an algorithm of the abstract generation / display unit shown in FIG. 1.

【図７】図１に示す抄録生成・表示部のアルゴリズムを
示すフローチャートである。FIG. 7 is a flowchart showing an algorithm of the abstract generation / display unit shown in FIG. 1.

【図８】図１に示す抄録生成・表示部のアルゴリズムを
示すフローチャートである。FIG. 8 is a flowchart showing an algorithm of the abstract generation/display unit shown in FIG. 1.

【図９】図１に示す抄録生成・表示部のアルゴリズムを
示すフローチャートである。FIG. 9 is a flowchart showing an algorithm of the abstract generation/display unit shown in FIG. 1.

【図１０】図１に示す抄録生成・表示部のアルゴリズム
を示すフローチャートである。FIG. 10 is a flowchart showing an algorithm of the abstract generation/display unit shown in FIG. 1.

【図１１】図１に示す抄録生成・表示部のアルゴリズム
を示すフローチャートである。FIG. 11 is a flowchart showing an algorithm of the abstract generation/display unit shown in FIG. 1.

【図１２】図１に示す抄録生成・表示部のアルゴリズム
を示すフローチャートである。FIG. 12 is a flowchart showing an algorithm of the abstract generation/display unit shown in FIG. 1.

【図１３】図１に示す抄録生成・表示部のアルゴリズム
を示すフローチャートである。FIG. 13 is a flowchart showing an algorithm of the abstract generation / display unit shown in FIG. 1.

【図１４】図１に示す抄録生成・表示部のアルゴリズム
を示すフローチャートである。FIG. 14 is a flowchart showing an algorithm of the abstract generation / display unit shown in FIG. 1.

【図１５】図２に示す照応表現文型記憶部の登録例を示
す図である。FIG. 15 is a diagram showing an example of registration in the anaphoric expression sentence pattern storage unit shown in FIG. 2;

【図１６】図２に示す修辞的話題表現記憶部の登録例を
示す図である。FIG. 16 is a diagram showing an example of registration in the rhetorical topic expression storage unit shown in FIG. 2.

【図１７】図２に示す倒置構文文型記憶部の登録例を示
す図である。FIG. 17 is a diagram showing an example of registration in the inverted syntax sentence pattern storage unit shown in FIG. 2.

【図１８】各処理を説明するための処理文章例を示す図
である。FIG. 18 is a diagram illustrating a processing sentence example for explaining each processing.

【図１９】文章の文番号と照応表現の位置を記したもの
を示す図である。FIG. 19 is a diagram showing the sentence numbers of sentences and the positions of anaphoric expressions.

【図２０】文章の形態素・構文解析結果の一部の表示例
を示す図である。FIG. 20 is a diagram showing a display example of a part of a morpheme / syntactic analysis result of a sentence.

【図２１】文章の話題解析結果を示す図である。FIG. 21 is a diagram showing a topic analysis result of a sentence.

【図２２】文章の照応解決結果を示す図である。FIG. 22 is a diagram showing a result of anaphora resolution of a sentence.

【図２３】表示例を示す図である。FIG. 23 is a diagram showing a display example.

【図２４】表示例を示す図である。FIG. 24 is a diagram showing a display example.

【図２５】表示例を示す図である。FIG. 25 is a diagram showing a display example.

【図２６】表示例を示す図である。FIG. 26 is a diagram showing a display example.

【図２７】表示例を示す図である。FIG. 27 is a diagram showing a display example.

【図２８】表示例を示す図である。FIG. 28 is a diagram showing a display example.

【図２９】表示例を示す図である。FIG. 29 is a diagram showing a display example.

【図３０】表示例を示す図である。FIG. 30 is a diagram showing a display example.

【図３１】表示例を示す図である。FIG. 31 is a diagram showing a display example.

【図３２】表示例を示す図である。FIG. 32 is a diagram showing a display example.

【図３３】表示例を示すである。FIG. 33 is a diagram showing a display example.

[Explanation of symbols]

１書式解析部３形態素・構文解析部５照応解決部７抄録生成・表示部９重要文判定部１１キーボード１３ディスプレイ／プリンタ
1: format analysis unit; 3: morpheme/syntax analysis unit; 5: anaphora resolution unit; 7: abstract generation/display unit; 9: important sentence determination unit; 11: keyboard; 13: display/printer

Claims

[Claims]

1. An abstract creating device for creating an abstract from an input original text, comprising: abstract creating means for creating an abstract by selecting words, phrases, or the like from the original text according to a predetermined criterion; presenting means for presenting, when unselected words or phrases exist between the words and phrases selected by the abstract creating means, an indication of that fact between those words and phrases in the abstract; and display means for displaying, as necessary, the words or phrases corresponding to the portion presented by the presenting means.