JPH05324707A

JPH05324707A - Natural language processor

Info

Publication number: JPH05324707A
Application number: JP4148694A
Authority: JP
Inventors: Takashi Katooka; 隆加登岡; Hiroko Hayashi; 寛子林; Shigeya Senda; 滋也千田; Yoshikazu Shiraishi; 美和白石; Masumi Narita; 真澄成田; Yoshihisa Oguro; 慶久大黒
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1992-05-15
Filing date: 1992-05-15
Publication date: 1993-12-07

Abstract

PURPOSE: To let a user who has knowledge of the target field input information such as dependency relations and translation equivalents, and to obtain more detailed information automatically by analyzing that input. CONSTITUTION: A sample sentence extracting device 3 extracts a sample sentence 5 from a target document 4. A sentence analyzing device 1 analyzes sample sentences taken from documents of the field to be processed, and its analysis results are stored in an analysis result storage device 2. When the sentence analyzing device 1 analyzes a document, a sample whose features are the same as those of a previously analyzed sentence is not analyzed, so information can be gathered efficiently.

Description

Detailed Description of the Invention

[0001]

[Technical Field] The present invention relates to a natural language processing device, and more particularly to a dictionary creation support device for a machine translation device that achieves higher translation accuracy.

[0002]

[Prior Art] In a general-purpose natural language processing system, it is usual to enlarge data such as dictionaries and grammars as much as possible, and to attach usage-selection information for each case, in order to cope with a wide variety of linguistic phenomena. When the target field or document is restricted, however, much of the ambiguity disappears. Most of the usage-selection information is then not only wasted but also harmful, because inappropriately attached information can cause incorrect processing.

[0003] To address this point, Japanese Patent Laid-Open No. 2-67679, for example, proposes a "translation processing method and device with a dictionary creation support function". In that publication, statistical information such as word occurrence frequencies is extracted from a sample text written in the first language, which is the language to be translated. Based on that frequency information the user specifies which information to extract from the general-purpose dictionary, and information suited to the text to be translated is then extracted from the general-purpose dictionary to create a user dictionary. However, the information that a user without knowledge of the system can register directly is limited to translation equivalents and rough parts of speech. Obtaining finer information demands correspondingly greater skill from the user and also increases problems such as incorrect registration.

[0004] Sample sentences are extracted from the document to be translated and analyzed linguistically to create data (dictionary, grammar, etc.) customized for that document. The more sample sentences there are, the better the coverage of the data, but in terms of cost performance it is preferable to customize with as few sample sentences as possible. Sample sentences that carry only information already obtained should therefore, as far as possible, be processed only once. In addition, when word information (part of speech, translation equivalent, etc.) is obtained with a conventional sentence analysis device, the information may be ambiguous. To resolve such ambiguity, statistical information such as frequency is used in the parsing calculation, which raises the parsing success rate. Words judged unambiguous from a certain number of samples need no further samples, whereas ambiguous words need statistical data and therefore more sample sentences. Sample sentences should thus be selected more efficiently. It is likewise required that information for disambiguating ambiguous words be extracted automatically, without demanding advanced skill from the user.

[0005]

[Object] The present invention has been made in view of the above circumstances. Its object is to provide a natural language processing device in which a user with knowledge of the target field inputs information such as dependency relations and translation equivalents, and finer information is obtained automatically by analyzing that input.

[0006]

[Constitution] To achieve the above object, the present invention is characterized by the following.

(1) In a natural language processing device having a sentence analysis device that analyzes extracted sample sentences and an analysis result storage device that stores the analysis results of the sentence analysis device, sample extraction means is provided that, when documents of a given target field are analyzed by the sentence analysis device, does not analyze a sample if the features of the sentence to be analyzed are the same as those of a sentence already analyzed.

(2) In a natural language processing device having a sentence analysis device that analyzes extracted sample sentences and an analysis result storage device that stores the analysis results of the sentence analysis device, sample extraction means is provided that, when documents of a given target field are analyzed by the sentence analysis device and the information for a word becomes ambiguous, further selects and collects sentences containing the ambiguous word.

(3) In (1) or (2) above, when documents of a given target field are analyzed by the sentence analysis device and the translation of a word becomes ambiguous, a dictionary is created in which the words that co-occur in sentences containing the ambiguous word are recorded as information on the translations of that word.

(4) In (3) above, when documents of a given target field are analyzed by the sentence analysis device and the translation of a word becomes ambiguous, a dictionary is created in which, among the words in sentences containing the ambiguous word, those syntactically related to the ambiguous word are recorded as information on the translations of that word.

(5) Alternatively, in a device having a sentence analysis device that analyzes extracted sample sentences and an analysis result storage device that stores the analysis results of the sentence analysis device, when documents of a given target field are analyzed by the sentence analysis device and the information for a word becomes ambiguous, sentences containing the ambiguous word are further selected by a sampling sentence extraction device, and the collected sample sentences are analyzed by the sentence analysis device to collect information for disambiguating the ambiguous word.

The invention is described below with reference to an embodiment.

[0007] FIG. 1 is a block diagram for explaining an embodiment of the natural language processing device according to the present invention. In the figure, 1 is a sentence analysis device, 2 is an analysis result storage device, 3 is a sample sentence extraction device, 4 is a target document, and 5 is a sample sentence. FIG. 2 is a flowchart for explaining the operation of the natural language processing device according to the present invention. The analysis of English sentences is described below as an example. The sentence analysis device 1 analyzes the following English sentences.

This boy is wise. (1)
Wise boy has good brain. (2)
Good brain is heavy. (3)

First, sentence (1) is obtained as a sample sentence from the sample sentence extraction device 3 (step 1). It is split into words and the sentence analysis results are searched for those words. Because the analysis results are initially empty (step 3), the sample sentence is displayed to the user, the user enters the information needed for sentence analysis (dependency relations) through the sentence analysis device, and the sentence is analyzed on the basis of this information (step 4). The user's input is recorded sentence by sentence as shown in FIG. 3 (appended as text information in the order in which the sentences are analyzed). The sentence analysis device then aggregates these records by word (step 5) into the word-level format shown in FIG. 4 and stores them in the analysis result storage device 2 (step 6). Next, sentence (2) is obtained as a sample sentence (step 1). As with sentence (1), it is split into words, and each word is checked against the analysis results so far to see whether it has already been used.
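The regrouping in step 5, from sentence-level records to word-level records, can be sketched as follows. This is only a minimal illustration: the record layout, the function name `to_word_records`, and the dependency pairs are assumptions, because the actual formats are those of FIG. 3 and FIG. 4, which are not reproduced here.

```python
from collections import defaultdict

def to_word_records(sentence_records):
    """Regroup sentence-level analysis results into word-level entries.

    sentence_records: list of (sentence_text, [(dependent, head), ...]).
    Returns {word: [(sentence_text, dependent, head), ...]}.
    """
    by_word = defaultdict(list)
    for text, deps in sentence_records:
        for dependent, head in deps:
            # File each relation under both words it involves.
            for word in {dependent, head}:
                by_word[word].append((text, dependent, head))
    return dict(by_word)

# Hypothetical user input for sentence (1); the pairs are illustrative only.
records = [("This boy is wise.", [("this", "boy"), ("boy", "is"), ("wise", "is")])]
word_records = to_word_records(records)
print(sorted(word_records))  # ['boy', 'is', 'this', 'wise']
```

Keeping the sentence text in every word-level entry makes it possible to trace a word's information back to the sample it came from.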

[0008] To check whether a word has already been used, the words of sentence (2) are taken one at a time and string-matched against the lines beginning with "%" in the sentence analysis result storage device. The lines beginning with "%" hold the character strings of the analyzed sentences; at this point only the string of sentence (1) exists. The first word of sentence (2), "wise", is already used in sentence (1), as is the second word, "boy". The third word, "has", is not yet used in sentence (1), so sentence (2) is adopted as a sample sentence and analyzed by the sentence analysis device, giving the results shown in FIG. 3 and FIG. 4. Sentence (3) is analyzed in the same way: its word "heavy" does not appear in any "%" line of the analysis results for sentences (1) and (2), so sentence (3) is adopted as a sample sentence and analyzed by the sentence analysis device. Suppose that the following sentence is then to be analyzed.

This boy is heavy. (4)

[0009] The words of sentence (4), "this", "boy", "is", and "heavy", all have the feature of being included in the set of words that make up sentences (1), (2), and (3). Each word has therefore already been analyzed and its information obtained, so sentence (4) is not analyzed (step 3). Another example of "the same feature" is exact string identity: when string matching against the original text shows that an incoming English sentence is exactly the same as one already analyzed, it is not selected as a sample sentence. Alternatively, when checking whether to sample a sentence with the same feature, the sentence may still be selected until a certain limit is exceeded and rejected thereafter. For example, until the occurrence count of a word exceeds five, a sentence is adopted as a sample even if it consists only of words already seen; it is rejected only when it consists entirely of words that have each occurred more than five times. This prevents the reliability of the information obtained by analysis from becoming too low because there are too few samples.
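The selection rule described above, including the optional occurrence limit, might be sketched as follows. The names `tokenize` and `select_samples`, the lowercase regular-expression tokenizer, and counting occurrences only over adopted samples are assumptions made for illustration; the text does not specify them.

```python
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())

def select_samples(sentences, limit=0):
    """Adopt a sentence while at least one of its words has been seen
    at most `limit` times in the samples adopted so far.

    limit=0 adopts only sentences that contain an unseen word;
    limit=5 corresponds to the five-occurrence example in the text.
    """
    seen = Counter()
    adopted = []
    for sentence in sentences:
        words = tokenize(sentence)
        if any(seen[w] <= limit for w in words):
            adopted.append(sentence)
            seen.update(words)
    return adopted

corpus = [
    "This boy is wise.",         # (1)
    "Wise boy has good brain.",  # (2)
    "Good brain is heavy.",      # (3)
    "This boy is heavy.",        # (4)
]
strict = select_samples(corpus)            # sentence (4) is skipped
relaxed = select_samples(corpus, limit=5)  # all four sentences are adopted
```

With `limit=0` the sketch reproduces the walk-through for sentences (1) to (4); raising the limit trades extra analysis work for more observations per word.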

[0010] In the termination check (step 7), a limit is set in advance on the number of sample sentences or on the number of distinct words, and processing ends when the limit is exceeded. For example, the distinct words in the target document may be counted beforehand and the limit set to, say, 80% of that number. The limit may also be based on how much the growth in the number of distinct words has slowed. A "sentence" here is not restricted to text delimited by a period in English or by the corresponding punctuation mark in Japanese; it may also be a phrase, chapter, paragraph, or document.
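As a rough sketch of the termination check, the two preset limits could be combined as below. The function name `should_stop`, its parameters, and the default of 100 sample sentences are hypothetical; only the 80% figure comes from the text. The growth-rate criterion is not covered.

```python
def should_stop(num_samples, seen_words, document_words,
                max_samples=100, coverage=0.8):
    """Return True when either preset limit has been exceeded.

    num_samples:    number of sample sentences adopted so far
    seen_words:     words collected from the adopted samples
    document_words: all words of the target document
    max_samples:    assumed cap on sample sentences (not from the text)
    coverage:       fraction of the document's distinct words used as limit
    """
    word_limit = coverage * len(set(document_words))
    return num_samples > max_samples or len(set(seen_words)) > word_limit

doc_words = list("abcdefghij")  # stand-in for a document with 10 distinct words
early = should_stop(3, list("abcde"), doc_words)      # 5 distinct words seen
late = should_stop(3, list("abcdefghi"), doc_words)   # 9 distinct words seen
```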

[0011] When a sample sentence is analyzed, the usage of a word such as "wise" may turn out to be ambiguous. The ambiguity may arise within a single sample sentence, or it may arise in comparison with earlier analyses of other sample sentences. For example, "wise" in sentence (1) means 「賢い」 ("clever"), but in the following sentence

The boy is wise in the brain. (5)

"wise" means 「通じている」 ("well versed in"). When samples with different senses of the same word arrive in this way, the question is which sense of "wise" should be given priority when it is stored. Therefore, when ambiguity arises, samples containing the ambiguous word are searched for in the target document, and several sentences such as

He is wise in the brain. (6)
He is wise in the machine. (7)

are collected and analyzed. Obtaining a large amount of information on "wise" in this way shows how readily each sense is used. In both (6) and (7), "wise" means 「通じている」. When the target field is narrowed to a specific field, information such as the meaning of a word is often determined almost uniquely, so it is often acceptable to decide the meaning of a word with a parameter as simple as frequency.
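The idea of collecting further sentences for an ambiguous word and then preferring its most frequent sense can be illustrated with a short sketch. The function names, the data layout, and the sense labels attached to each sentence are assumptions for illustration; in the device described, the senses would come from the sentence analysis results.

```python
import re
from collections import Counter

def sentences_containing(word, sentences):
    """Select the sentences of the target document that contain `word`."""
    return [s for s in sentences if word in re.findall(r"[a-z]+", s.lower())]

def sense_frequencies(analysed, word):
    """Count how often each sense of `word` was recorded.

    analysed: list of (sentence_text, {word: sense}) pairs.
    """
    counts = Counter()
    for _text, senses in analysed:
        if word in senses:
            counts[senses[word]] += 1
    return counts

document = ["He is wise in the brain.", "Good brain is heavy.",
            "He is wise in the machine."]
extra = sentences_containing("wise", document)  # sentences (6) and (7)

analysed = [
    ("This boy is wise.", {"wise": "賢い"}),
    ("The boy is wise in the brain.", {"wise": "通じている"}),
    ("He is wise in the brain.", {"wise": "通じている"}),
    ("He is wise in the machine.", {"wise": "通じている"}),
]
freq = sense_frequencies(analysed, "wise")
preferred, _count = freq.most_common(1)[0]
```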

[0012] As a way of obtaining information to resolve ambiguity such as in the above example, the following describes creating a dictionary in which the words that co-occur in sentences containing the ambiguous word are recorded as information on each translation of that word. For the ambiguous word "wise", the words used together with it in sentences (1), (5), (6), and (7) are as follows.

When translated as 「賢い」:
is 4, boy 1, this 1

When translated as 「通じている」:
machine 1, problem 1, brain 1, in 3, is 3, he 3, the 4, boy 1

[0013] Accordingly, in the dictionary entry for "wise", the word forms is, boy, and this are stored as they are under the translation 「賢い」 as words used together with it. Likewise machine, problem, brain, to, is, he, the, and this are registered in the dictionary under the translation 「通じている」. With this dictionary, one may for example select the translation whose registered words include the most words appearing in the sentence to be translated. If the occurrence frequencies of the words are also stored and used as priorities for the co-occurring words, the weights can be calculated more accurately.
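A co-occurrence dictionary of this kind, and a frequency-weighted choice of translation, might look like the following sketch. The function names and the scoring rule (sum of stored frequencies of matching words) are assumptions, and the counts are computed from sentences (1), (5), (6), and (7) as printed, so they need not match the tallies listed in the text.

```python
import re
from collections import Counter, defaultdict

def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())

def build_cooccurrence(samples, target):
    """samples: list of (sentence, translation chosen for `target`).
    Returns {translation: Counter of words seen together with target}."""
    table = defaultdict(Counter)
    for sentence, translation in samples:
        table[translation].update(w for w in tokenize(sentence) if w != target)
    return table

def choose_translation(table, sentence, target):
    """Pick the translation whose stored words best match the sentence,
    weighting each matching word by its stored frequency."""
    words = [w for w in tokenize(sentence) if w != target]
    return max(table, key=lambda t: sum(table[t][w] for w in words))

samples = [
    ("This boy is wise.", "賢い"),
    ("The boy is wise in the brain.", "通じている"),
    ("He is wise in the brain.", "通じている"),
    ("He is wise in the machine.", "通じている"),
]
table = build_cooccurrence(samples, "wise")
choice = choose_translation(table, "She is wise in the machine.", "wise")
```

Raw counts favor whichever translation has more samples, so a practical system might normalize the scores; this sketch does not.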

[0014] In the above example, the sentence analysis results give the following words syntactically related to "wise". For 「賢い」: "is" in sentence (1). For 「通じている」: "is" and "in" in sentence (5), "is" and "in" in sentence (6), and "is" and "in" in sentence (7).

[0015] Registering syntactically related words as translation information in this way captures the characteristics of when each translation of an ambiguous word is used and in which sense. This information can be used for disambiguation as co-occurrence information during translation. Using the frequency information from the embodiment of claim 3 further improves disambiguation accuracy. The related words here were chosen from the dependency relations of sentences (1) and (5) (FIG. 5(a), (b)) as the words that depend directly on "wise" and the words on which "wise" directly depends. The range of related words can be set in advance to adjust how many related words are collected.
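Selecting the words directly linked to an ambiguous word can be sketched as below. The representation of a sentence as (dependent, head) pairs and the specific pairs shown are assumptions, since FIG. 5 is not reproduced here; they were chosen so that the result agrees with the related words listed for sentences (1) and (5).

```python
def related_words(dependencies, target):
    """Return the words directly linked to `target`.

    dependencies: list of (dependent, head) pairs for one sentence.
    """
    related = set()
    for dependent, head in dependencies:
        if head == target:
            related.add(dependent)  # word that depends directly on target
        elif dependent == target:
            related.add(head)       # word on which target directly depends
    return related

# Hypothetical dependency pairs for sentences (1) and (5).
deps_1 = [("this", "boy"), ("boy", "is"), ("wise", "is")]
deps_5 = [("the", "boy"), ("boy", "is"), ("wise", "is"),
          ("in", "wise"), ("brain", "in"), ("the", "brain")]

related_1 = related_words(deps_1, "wise")  # {'is'}
related_5 = related_words(deps_5, "wise")  # {'is', 'in'}
```

Widening the range of related words, as the text allows, would correspond to following more than one dependency link from the target.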

[0016] Ambiguity exists not only in translations but also at other levels such as part of speech. To resolve it, the words syntactically related to the word in question are stored together with their frequencies as information for each sense, or information on the inflected form of the ambiguous word (singular or plural, tense, etc.) is stored. When a word is ambiguous at the part-of-speech level, the syntactically related words used together with it are stored separately for its use as a noun and as a verb. Furthermore, by parsing sentences in advance with an existing machine translation system, morphological analyzer, or parser and adopting the sentences that fail to parse as sample sentences, sentences containing information missing from the grammar and dictionary can be collected effectively as samples.
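Adopting the sentences that an existing analyzer fails to parse can be sketched as follows. The `parse` callback is a stand-in for whatever existing system is used; `toy_parse` and its small lexicon are purely hypothetical and only simulate a parser that fails on unknown words.

```python
def failing_sentences(sentences, parse):
    """Return the sentences for which `parse` raises or returns None."""
    failed = []
    for sentence in sentences:
        try:
            ok = parse(sentence) is not None
        except Exception:
            ok = False
        if not ok:
            failed.append(sentence)
    return failed

# Toy stand-in parser: succeeds only if every word is in a small lexicon.
LEXICON = {"this", "boy", "is", "wise"}

def toy_parse(sentence):
    words = sentence.lower().rstrip(".").split()
    return words if all(w in LEXICON for w in words) else None

candidates = ["This boy is wise.", "Good brain is heavy."]
samples = failing_sentences(candidates, toy_parse)
```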

[0017]

[Effects] As is clear from the above description, the present invention has the following effects.

(1) In the sample sentence extraction device of claim 1, a sentence whose content has already been analyzed as a sample sentence is not adopted again, so that it is not analyzed a second time. Sample sentences can therefore be collected so as to obtain data on many distinct words quickly and efficiently.

(2) In the sample sentence extraction device of claim 2, sentences containing ambiguous words are collected and their sample count is increased, so more information is obtained specifically for ambiguous words. This makes sample collection efficient.

(3) In the dictionary creation support device of claim 3, when a word has ambiguous translations, a dictionary is created that stores, for each translation, the words that appeared together with the word. The dictionary thus records, for each translation of an ambiguous word, which words it tends to be used with. Conventionally, highly skilled people created such dictionaries by carefully considering which words tend to be used together; this device does so automatically and can raise accuracy statistically as the amount of information grows.

(4) In claim 4, only the words syntactically related to the ambiguous word are stored for each translation, so less storage capacity is needed than with the dictionary creation support device of claim 3.

(5) In claim 5, ambiguity is resolved not only for translations but also, for example, at the part-of-speech level, by storing the co-occurring, syntactically related words for each ambiguity level and each co-occurrence description level. Compared with entering such disambiguation information by hand, this requires no advanced knowledge and can be done automatically.

[Brief description of drawings]

[FIG. 1] A block diagram for explaining an embodiment of the natural language processing device according to the present invention.

[FIG. 2] A flowchart for explaining the operation of the natural language processing device according to the present invention.

[FIG. 3] A diagram showing sentence analysis results (sentence by sentence) of the present invention.

[FIG. 4] A diagram showing sentence analysis results (aggregated by word) of the present invention.

[FIG. 5] A diagram showing the dependency relations of sentences in the present invention.

[Explanation of symbols]

1: sentence analysis device, 2: analysis result storage device, 3: sample sentence extraction device, 4: target document, 5: sample sentence.

Continuation of the front page
(72) Inventor: Miwa Shiraishi, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Masumi Narita, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Yoshihisa Oguro, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo

Claims

[Claims]

1. A natural language processing device having a sentence analysis device that analyzes extracted sample sentences and an analysis result storage device that stores the analysis results of the sentence analysis device, characterized by having sample extraction means that, when documents of a given target field are analyzed by the sentence analysis device, does not analyze a sample if the features of the sentence to be analyzed are the same as those of a sentence already analyzed.

2. A natural language processing device having a sentence analysis device that analyzes extracted sample sentences and an analysis result storage device that stores the analysis results of the sentence analysis device, characterized by having sample extraction means that, when documents of a given target field are analyzed by the sentence analysis device and the information for a word becomes ambiguous, further selects and collects sentences containing the ambiguous word.

3. The natural language processing device according to claim 1 or 2, characterized in that, when documents of a given target field are analyzed by the sentence analysis device and the translation of a word becomes ambiguous, a dictionary is created in which the words that co-occur in sentences containing the ambiguous word are recorded as information on the translations of that word.

4. The natural language processing device according to claim 3, characterized in that, when documents of a given target field are analyzed by the sentence analysis device and the translation of a word becomes ambiguous, a dictionary is created in which, among the words contained in sentences containing the ambiguous word, those syntactically related to the ambiguous word are recorded as information on the translations of that word.

5. A natural language processing device having a sentence analysis device that analyzes extracted sample sentences and an analysis result storage device that stores the analysis results of the sentence analysis device, characterized in that, when documents of a given target field are analyzed by the sentence analysis device and the information for a word becomes ambiguous, sentences containing the ambiguous word are further selected by a sampling sentence extraction device, and the collected sample sentences are analyzed by the sentence analysis device to collect information for disambiguating the ambiguous word.