JP2007115118A

JP2007115118A - Reputation information extraction method, apparatus and program

Info

Publication number: JP2007115118A
Application number: JP2005307291A
Authority: JP
Inventors: Nobuaki Hiroshima; 伸章廣嶋; Setsuo Yamada; 節夫山田; Kura Furuse; 蔵古瀬; Ryoji Kataoka; 良治片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2005-10-21
Filing date: 2005-10-21
Publication date: 2007-05-10

Abstract

PROBLEM TO BE SOLVED: To acquire attribute expression candidates without depending on an attribute expression list.

SOLUTION: This method comprises dividing an input sentence into words; searching the resulting word string for an evaluation expression, with reference to an evaluation expression list describing evaluation expressions that are evaluations of properties of a target thing, and acquiring the evaluation expression found; acquiring, for the evaluation expression, attribute expression candidates that are properties of the target thing by paying attention to the sentence structure; calculating, for each attribute expression candidate, an attribute likelihood indicating its adequacy for the target thing, with reference to word information about the target thing and the attribute expression candidates in a plurality of documents; and taking each attribute expression candidate whose attribute likelihood is higher than a predetermined threshold as an attribute expression and extracting the pair of this attribute expression and its corresponding evaluation expression as reputation information.

Description

The present invention relates to a reputation information extraction method, apparatus, and program, and more particularly to a method, apparatus, and program for extracting reputation information about a target thing, such as a product, from a sentence.

Many Web pages contain sentences that express a reputation about a product, a person, or the like (hereinafter, a "target thing"). For a mobile phone, for example, there are sentences such as "the battery lasts long" and "the design is cute." If a pair consisting of an attribute expression that denotes a property of the target thing, such as "battery life," and an evaluation expression, such as "good," can be extracted from such sentences as reputation information, the extracted information is useful, for instance, as a reference when purchasing a product. For this reason, research has been conducted on extracting reputation information about a target thing from sentences.

Conventionally, a list of attribute expressions denoting properties of the target thing, such as "color" and "weight," and a list of evaluation expressions denoting evaluations of those attributes, such as "good" and "beautiful," are created (see, for example, Non-Patent Document 1). Pairs of an attribute expression and an evaluation expression are then extracted as reputation information, by pattern matching against the two lists, from sentences that contain a reputation (see, for example, Non-Patent Document 2).

As a method that does not use an attribute expression list, it has also been proposed to take sentences containing expressions related to the target thing and determine whether each sentence contains a reputation (see, for example, Non-Patent Document 3).

Non-Patent Document 1: "Collecting Evaluation Expressions for Opinion Extraction," Journal of the Association for Natural Language Processing, Vol. 12, No. 3, pp. 203-222, 2005
Non-Patent Document 2: "Extraction of Opinion Information from Web Document Sets and Summary Generation Based on Points of Focus," Proceedings of the 10th Annual Meeting of the Association for Natural Language Processing, pp. 644-647, 2004
Non-Patent Document 3: "Extracting Opinion Sentences from Web Bulletin Boards by Automatic Acquisition of Domain Feature Words," Proceedings of the 11th Annual Meeting of the Association for Natural Language Processing, pp. 672-675, 2005

However, attribute expressions are highly diverse and keep increasing as new products and the like appear, so not all attribute expressions can be registered in an attribute expression list even with a method such as that of Non-Patent Document 1. Consequently, with a method such as that of Non-Patent Document 2, reputation information cannot be extracted when the expression denoting a property of the target thing is not registered in the attribute expression list, even though the sentence contains such an expression. For example, suppose there is an attribute expression list for movies, and an actor X who has never appeared in a movie before appears in a new film. Because X is not registered in the attribute expression list, no reputation information can be extracted from the sentence "X is cool," even though the sentence contains reputation information.

Conversely, the method of Non-Patent Document 3, which determines whether a sentence contains a reputation without using an attribute expression list, judges a sentence to contain a reputation about the target thing whenever the sentence includes an expression related to the target thing and an expression that tends to convey a reputation, even if its attribute expression does not denote a property of the target thing. For example, suppose a document about a certain movie contains the sentence "The program X appears in is also interesting." This sentence includes "X," an expression related to the target thing, and "interesting," an expression that tends to convey a reputation. It is therefore judged to contain a reputation about the movie, even though its attribute expression, "program," does not denote a property of the movie. As a result, this method extracts reputation information that is not about the target thing.

The present invention has been made in view of the above points. Its object is to provide a reputation information extraction method, apparatus, and program that acquire attribute expression candidates without depending on an attribute expression list, and can therefore extract reputation information that previously could not be extracted because the attribute expression was missing from the list. In addition, the invention acquires the attribute expression candidates that correspond to an evaluation expression by paying attention to the sentence structure, and checks each candidate's appropriateness as a property of the target thing using word information about the target thing and attribute expressions in a plurality of documents, so that reputation information that is actually about the target thing can be extracted correctly.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention (claim 1) is a reputation information extraction method for extracting, from an input sentence, reputation information about a target thing, including a product, the method performing:
a word dividing step (step 1) of dividing the input sentence into words;
an evaluation expression acquisition step (step 2) of searching the word string produced by the word dividing step for an evaluation expression, with reference to an evaluation expression list that is stored in evaluation expression list storage means and describes evaluation expressions that are evaluations of properties of the target thing, and acquiring the evaluation expression found;
an attribute expression candidate acquisition step (step 3) of acquiring, for the evaluation expression, an attribute expression candidate that is a property of the target thing, with reference to sentence structure storage means storing sentence structures;
an attribute likelihood calculation step (step 4) of calculating, for the attribute expression candidate, an attribute likelihood representing its appropriateness for the target thing, with reference to word information storage means storing word information about the target thing and attribute expression candidates in a plurality of documents; and
a reputation information extraction step (step 5) of taking an attribute expression candidate whose attribute likelihood is higher than a predetermined threshold as an attribute expression, and extracting the pair of the attribute expression and its corresponding evaluation expression as reputation information.

The present invention (claim 2) is the reputation information extraction method of claim 1, wherein the attribute likelihood calculation step uses the appearance frequency of words as the word information.

FIG. 2 shows the basic configuration of the present invention.

The present invention (claim 3) is a reputation information extraction apparatus for extracting, from an input sentence, reputation information about a target thing, including a product, the apparatus comprising:
evaluation expression list storage means 6 storing an evaluation expression list that describes evaluation expressions that are evaluations of properties of the target thing;
sentence structure storage means 7 storing sentence structures;
word information storage means 8 storing word information about the target thing and attribute expression candidates in a plurality of documents;
word dividing means 1 for dividing the input sentence into words;
evaluation expression acquisition means 2 for searching the word string produced by the word dividing means 1 for an evaluation expression, with reference to the evaluation expression list stored in the evaluation expression list storage means 6, and acquiring the evaluation expression found;
attribute expression candidate acquisition means 3 for acquiring, for the evaluation expression, an attribute expression candidate that is a property of the target thing, with reference to the sentence structure storage means 7;
attribute likelihood calculation means 4 for calculating, for the attribute expression candidate, an attribute likelihood representing its appropriateness for the target thing, with reference to the word information storage means 8; and
reputation information extraction means 5 for taking an attribute expression candidate whose attribute likelihood is higher than a predetermined threshold as an attribute expression, and extracting the pair of the attribute expression and its corresponding evaluation expression as reputation information.

The present invention (claim 4) is the reputation information extraction apparatus of claim 3, wherein
the word information storage means 8 stores the appearance frequency of words as the word information, and
the attribute likelihood calculation means 4 calculates the attribute likelihood using the appearance frequencies of words stored in the word information storage means.

The present invention (claim 5) is a reputation information extraction program that causes a computer having
evaluation expression list storage means storing an evaluation expression list that describes evaluation expressions that are evaluations of properties of a target thing,
sentence structure storage means storing sentence structures, and
word information storage means storing word information about the target thing and attribute expression candidates in a plurality of documents
to function as the reputation information extraction apparatus according to claim 3 or 4.

As described above, the present invention acquires attribute expression candidates without depending on an attribute expression list, and can therefore extract more reputation information than the conventional technique that refers to an attribute expression list.

In addition, the invention acquires the attribute expression candidates that correspond to an evaluation expression by paying attention to the sentence structure, and checks each candidate's appropriateness as a property of the target thing using word information about the target thing and attribute expressions in a plurality of documents. It can therefore extract reputation information that is actually about the target thing more accurately than the conventional technique that does not refer to an attribute expression list.

Furthermore, because the invention acquires attribute expressions automatically, it eliminates the cost of creating the attribute expression list that the conventional list-based technique requires.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 3 shows the configuration of a reputation information extraction apparatus according to an embodiment of the present invention.

The reputation information extraction apparatus shown in the figure comprises a word dividing unit 1, an evaluation expression acquisition unit 2, an attribute expression candidate acquisition unit 3, an attribute likelihood calculation unit 4, a reputation information extraction unit 5, an evaluation expression list storage unit 6, a sentence structure database 7, and a word information database 8.

The evaluation expression list storage unit 6 stores an evaluation expression list that describes evaluation expressions, that is, evaluations of properties of the target thing.

The sentence structure database 7 stores sentence structures.

The word information database 8 stores word information about the target thing and attribute expressions in a plurality of documents.

The word dividing unit 1 divides the input sentence into words to obtain a word string, and passes the word string, together with the input target thing, to the evaluation expression acquisition unit 2.

The evaluation expression acquisition unit 2 refers to the evaluation expression list stored in the evaluation expression list storage unit 6, searches the word string of the sentence for an evaluation expression, acquires the evaluation expression found, and passes it to the attribute expression candidate acquisition unit 3.

The attribute expression candidate acquisition unit 3 acquires attribute expression candidates for the evaluation expression by paying attention to the sentence structures stored in the sentence structure database 7, and passes them to the attribute likelihood calculation unit 4.

The attribute likelihood calculation unit 4 refers to the word information database 8, which stores word information about the target thing and attribute expressions in a plurality of documents, calculates for each attribute expression candidate an attribute likelihood representing its appropriateness as an attribute expression, and passes the result to the reputation information extraction unit 5.

The reputation information extraction unit 5 takes each attribute expression candidate whose attribute likelihood is higher than the threshold as an attribute expression, extracts the pair of the attribute expression and its corresponding evaluation expression as reputation information, and outputs it.

The operation of the above configuration is described below.

FIG. 4 is a flowchart of the overall operation of the reputation information extraction apparatus according to the embodiment of the present invention.

Step 100) First, the word dividing unit 1 divides the input sentence into words to obtain a word string.

Step 200) Next, the evaluation expression acquisition unit 2 refers to the evaluation expression list in the evaluation expression list storage unit 6, which describes evaluation expressions that are evaluations of properties of the target thing, searches the word string of the sentence for an evaluation expression, and acquires the evaluation expression found.

Step 300) Next, the attribute expression candidate acquisition unit 3 acquires attribute expression candidates for the evaluation expression obtained in step 200 by paying attention to the sentence structures stored in the sentence structure database 7.

Step 400) Next, the attribute likelihood calculation unit 4 refers to the word information database 8, which stores word information about the target thing and attribute expressions in a plurality of documents, and calculates for each attribute expression candidate an attribute likelihood representing its appropriateness as an attribute expression.

Step 500) Finally, the reputation information extraction unit 5 takes each attribute expression candidate whose attribute likelihood calculated in step 400 is higher than the threshold as an attribute expression, and extracts the pair of the attribute expression and its corresponding evaluation expression as reputation information.
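The flow of steps 100 to 500 can be sketched as a single function that chains five pluggable components. This is only an illustrative outline: the function name `extract_reputation`, its parameters, and the `(surface, part of speech)` token representation are assumptions made for the example and are not defined in the specification.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (surface form, part of speech); an assumed representation


def extract_reputation(
    sentence: str,
    target: str,
    tokenize: Callable[[str], List[Token]],
    find_evaluations: Callable[[List[Token]], List[int]],
    find_candidates: Callable[[List[Token], int], List[str]],
    likelihood: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Return (attribute expression, evaluation expression) pairs."""
    tokens = tokenize(sentence)                         # step 100: word division
    pairs = []
    for idx in find_evaluations(tokens):                # step 200: evaluation expressions
        evaluation = tokens[idx][0]
        for candidate in find_candidates(tokens, idx):  # step 300: candidates
            score = likelihood(target, candidate)       # step 400: attribute likelihood
            if score > threshold:                       # step 500: "higher than" threshold
                pairs.append((candidate, evaluation))
    return pairs
```

Passing the components in as callables keeps the outline independent of any particular morphological analyzer, sentence structure rule, or likelihood measure, all of which the embodiment allows to vary.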

The extracted reputation information may be displayed on the user's display device or stored in storage means.

Examples of the present invention are described below with reference to the drawings, following the above flowchart and using specific cases.

The first example extracts reputation information from the sentence shown in FIG. 5.

The target thing is assumed to be given as "Space Wars," the name of a movie. The target thing may be given in advance in this way, or it may be acquired from the input sentence, for example by referring to a target expression list.

In step 100, the word dividing unit 1 divides the sentence of FIG. 5 into words to obtain a word string. Here, word division is performed using an existing morphological analysis technique, which yields the surface form and part of speech of each word. FIG. 6 shows the sentence divided into words.

In step 200, the evaluation expression acquisition unit 2 refers to the evaluation expression list in the evaluation expression list storage unit 6, searches the word string of the sentence for an evaluation expression, and acquires the evaluation expression found. FIG. 7 shows an example of the evaluation expression list. The unit checks whether the word string of FIG. 6, obtained by the word dividing unit 1, contains any expression from the evaluation expression list of FIG. 7, and acquires each evaluation expression it contains. Here the entire word string of FIG. 6 is checked, but only part of the word string may be checked instead. Because the word string of FIG. 6 contains "きれい" (kirei, "beautiful"), the fifth entry in FIG. 7, "きれい" is acquired as the evaluation expression.
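A minimal sketch of this lookup is shown below. FIG. 6 and FIG. 7 are not reproduced in the text, so the token list and the two-entry evaluation list are assumptions reconstructed from the words the description mentions; the function name `find_evaluation_indices` is likewise hypothetical.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (surface form, part of speech); an assumed representation

# Assumed excerpt of the evaluation expression list (FIG. 7 is not reproduced here).
EVALUATION_LIST: Set[str] = {"おいし", "きれい"}

# Assumed word string for the first example, based on the words named in the text.
TOKENS_FIG6: List[Token] = [
    ("CG", "noun"),
    ("が", "particle"),
    ("とても", "adverb"),
    ("きれい", "adjective"),
]


def find_evaluation_indices(tokens: List[Token], evaluation_list: Set[str]) -> List[int]:
    """Return the positions of all tokens whose surface form is in the list."""
    return [i for i, (surface, _pos) in enumerate(tokens) if surface in evaluation_list]


indices = find_evaluation_indices(TOKENS_FIG6, EVALUATION_LIST)
```

Returning positions rather than strings lets a later step inspect the words that precede each evaluation expression.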

In step 300, the attribute expression candidate acquisition unit 3 acquires attribute expression candidates by paying attention to the sentence structures stored in the sentence structure database 7. FIG. 8 shows an example of a sentence structure stored in the sentence structure database 7. The structure in FIG. 8 is one in which a noun W1 in the sentence is followed by one of the particles "は" (wa), "が" (ga), or "も" (mo), and the evaluation expression W3 appears either immediately afterward or after one or more intervening adverbs W2. When a sentence has this structure, the noun W1 that appears first is acquired as an attribute expression candidate. A single sentence structure may be used, as in this example, or several. The way candidates are acquired is not limited to this as long as it pays attention to the sentence structure. For example, the sentence structure may be analyzed by dependency parsing, and the nouns contained in the phrases that depend on the evaluation expression may be acquired as attribute expression candidates. The word string of FIG. 6 has the structure shown in FIG. 8: the noun "CG" is followed by the particle "が," and the evaluation expression "きれい" appears after one intervening adverb, "とても" (totemo, "very"). The noun that appears first, "CG," is therefore acquired as an attribute expression candidate.
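The structure described for FIG. 8 can be checked by walking backward from the evaluation expression. The sketch below is one possible reading, assuming tokens carry simple part-of-speech labels ("noun", "particle", "adverb"); the labels, the token lists, and the function name `candidate_before` are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (surface form, part of speech); an assumed representation

PARTICLES = {"は", "が", "も"}  # the three particles named for FIG. 8


def candidate_before(tokens: List[Token], eval_idx: int) -> Optional[str]:
    """Return noun W1 if tokens match: W1, particle, zero or more adverbs W2, W3."""
    i = eval_idx - 1
    # Skip any adverbs W2 sitting between the particle and the evaluation expression.
    while i >= 0 and tokens[i][1] == "adverb":
        i -= 1
    # Expect one of the particles, directly preceded by a noun W1.
    if (
        i >= 1
        and tokens[i][1] == "particle"
        and tokens[i][0] in PARTICLES
        and tokens[i - 1][1] == "noun"
    ):
        return tokens[i - 1][0]
    return None


first_example = [("CG", "noun"), ("が", "particle"), ("とても", "adverb"), ("きれい", "adjective")]
candidate = candidate_before(first_example, 3)
```

A dependency-parsing variant, which the description also permits, would replace the backward walk with a lookup of the phrases that depend on the evaluation expression.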

In step 400, the attribute likelihood calculation unit 4 refers to the word information database 8, which stores word information about the target thing and attribute expressions in a plurality of documents, and calculates for each attribute expression candidate an attribute likelihood representing its appropriateness as an attribute expression. The word information database 8 is built by dividing each of a plurality of documents, separate from the input sentence, into words and storing the appearance frequency of each word in each document. FIG. 9 shows an example of the word information database 8. The word information used is not limited to appearance frequency: the position of a word within a document, its part of speech, the category it belongs to, or a combination of these may be used instead.

From the word information database 8, the following four values are obtained: the number A of documents in which both the target thing and the attribute expression candidate have a frequency of 1 or more; the number B of documents in which the target thing has a frequency of 1 or more and the attribute expression candidate has a frequency of 0; the number C of documents in which the target thing has a frequency of 0 and the attribute expression candidate has a frequency of 1 or more; and the number D of documents in which both have a frequency of 0. The attribute likelihood L is calculated from these values by the following equation.

L = 2A / (2A + B + C)   (1)

The method of calculating the attribute likelihood is not limited to the above equation. For example, a concept base as described in H. Schütze, "Dimensions of Meaning," Proceedings of Supercomputing '92, pp. 787-796, 1992, may be built from the frequencies stored in the word information database 8, and the distance between the concept vectors of the target thing and the attribute expression candidate may be used as the attribute likelihood.
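Equation (1) translates directly into a small function. The sketch below uses the hypothetical name `attribute_likelihood`; returning 0.0 when the denominator is zero is an assumption added for robustness, since the specification does not discuss that case.

```python
def attribute_likelihood(a: int, b: int, c: int) -> float:
    """Equation (1): L = 2A / (2A + B + C).

    a: documents containing both the target thing and the candidate
    b: documents containing only the target thing
    c: documents containing only the candidate
    """
    denominator = 2 * a + b + c
    if denominator == 0:
        # Neither word occurs in any document; treating this as 0.0 is an assumption.
        return 0.0
    return 2 * a / denominator
```

Equation (1) has the same form as the Dice coefficient between the set of documents containing the target thing (size A + B) and the set containing the candidate (size A + C). The count D does not enter the formula, so documents mentioning neither word have no effect on the score.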

Specifically, taking the document with document number 2 in FIG. 9 as an example, the table shows that in this document the word "Space Wars" has a frequency of 2, the word "CG" has a frequency of 1, and the words "popcorn," "today," and "scene" each have a frequency of 0. The values in FIG. 10 are calculated from this word information database 8. For each attribute expression candidate, every document is classified as A, B, C, or D, and the number of documents in each class gives the corresponding value in FIG. 10. For "CG," for example, document 1 falls under D, document 2 under A, document 3 under B, document 4 under C, and so on.
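The classification of documents into A to D can be sketched as follows. Only the frequencies of document 2 are stated in the text; the other three toy documents are invented so that they fall into the classes the description assigns them for "CG," and the function name `count_documents` is hypothetical.

```python
from typing import Dict, List, Tuple


def count_documents(
    doc_freqs: List[Dict[str, int]], target: str, candidate: str
) -> Tuple[int, int, int, int]:
    """Count documents by co-occurrence class and return (A, B, C, D)."""
    a = b = c = d = 0
    for freqs in doc_freqs:
        has_target = freqs.get(target, 0) >= 1
        has_candidate = freqs.get(candidate, 0) >= 1
        if has_target and has_candidate:
            a += 1   # class A: both appear
        elif has_target:
            b += 1   # class B: only the target thing appears
        elif has_candidate:
            c += 1   # class C: only the candidate appears
        else:
            d += 1   # class D: neither appears
    return a, b, c, d


# Toy per-document frequency tables; only document 2 follows values given in the text.
documents = [
    {"today": 1},                 # document 1 -> D (assumed contents)
    {"Space Wars": 2, "CG": 1},   # document 2 -> A
    {"Space Wars": 1},            # document 3 -> B (assumed contents)
    {"CG": 3},                    # document 4 -> C (assumed contents)
]
counts = count_documents(documents, "Space Wars", "CG")
```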

The attribute likelihood is calculated for each attribute expression candidate using FIG. 10 and equation (1). For the attribute expression candidate "CG," the attribute likelihood L is

L = 2 × 28 / (2 × 28 + 152 + 93) ≈ 0.19

In step 500, the reputation information extraction unit 5 takes each attribute expression candidate whose attribute likelihood is equal to or greater than the threshold as an attribute expression, and extracts the pair of the attribute expression and its corresponding evaluation expression as reputation information. Here the threshold is 0.15. The threshold may be a fixed value for all input sentences, as in this case, or it may be set differently for each input sentence, for example so that 20% of the attribute expression candidates are at or above it. The attribute likelihood of the candidate "CG" is 0.19, which is at or above the threshold, so "CG" becomes an attribute expression. Accordingly, one pair of reputation information, the attribute expression "CG" and its corresponding evaluation expression "きれい," is extracted from the sentence of FIG. 5.
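Both thresholding options can be sketched briefly. The code uses the "equal to or greater than" comparison stated in this step. The function names are hypothetical, and the way the 20% rule is turned into a cutoff (taking the score of the candidate at the top-20% boundary, keeping at least one candidate) is only one possible interpretation of the description.

```python
import math
from typing import List, Tuple

Scored = Tuple[str, str, float]  # (candidate, evaluation expression, attribute likelihood)


def select_pairs(scored: List[Scored], threshold: float) -> List[Tuple[str, str]]:
    """Keep pairs whose attribute likelihood is at or above the threshold."""
    return [(cand, ev) for cand, ev, score in scored if score >= threshold]


def threshold_for_top_fraction(scores: List[float], fraction: float = 0.2) -> float:
    """Pick a cutoff so that roughly `fraction` of the scores are at or above it."""
    if not scores:
        return float("inf")  # nothing to select; an assumption for the empty case
    keep = max(1, math.ceil(len(scores) * fraction))
    return sorted(scores, reverse=True)[keep - 1]


# Fixed threshold, as in the example: "CG" scores 0.19 against a threshold of 0.15.
pairs = select_pairs([("CG", "きれい", 0.19)], 0.15)
```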

次に、図１１に示す文から評判情報を抽出する場合について説明する。 Next, the case where reputation information is extracted from the sentence shown in FIG. 11 will be described.

対象事物としては、先ほどと同様に「スペースウォーズ」が与えられているものとする。 It is assumed that “Space Wars” is given as the subject matter as before.

ステップ１００では、単語分割部１により、図１１の文を単語分割し、単語列を得る。単語に分割した結果を図１２に示す。 In step 100, the word division unit 1 divides the sentence in FIG. 11 into words to obtain a word string. The result of dividing into words is shown in FIG.

In step 200, the evaluation expression acquisition unit 2 refers to the evaluation expression list in the evaluation expression list storage unit 6, which lists evaluation expressions (evaluations of properties of a target thing), searches the word string of the sentence for evaluation expressions, and acquires any that are found. Because the word string of FIG. 12 contains the third evaluation expression of FIG. 7, “delicious” (“oishi(i)”), “delicious” is acquired as the evaluation expression.
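
The lookup in step 200 can be sketched in Python as a membership test of each word against the evaluation expression list. This is a minimal sketch under stated assumptions: the function name is illustrative, the token list is a hypothetical stand-in for the word string of FIG. 12, and the first two list entries and their order are hypothetical. Only the third entry, おいし (“delicious”), is taken from the description.

```python
def find_evaluation_expressions(words, evaluation_list):
    """Return (position, word) pairs for words that are in the evaluation list."""
    known = set(evaluation_list)  # set membership keeps each lookup cheap
    return [(i, word) for i, word in enumerate(words) if word in known]


# Hypothetical stand-in for the evaluation expression list of FIG. 7.
evaluation_list = ["よい", "きれい", "おいし"]  # good, beautiful, delicious (stem)

# Hypothetical word string for a sentence such as "the popcorn is delicious".
words = ["ポップコーン", "が", "おいし", "い"]

print(find_evaluation_expressions(words, evaluation_list))  # [(2, 'おいし')]
```

Returning the position together with the word lets the next step inspect the words that precede each evaluation expression.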

In step 300, the attribute expression candidate acquisition unit 3 acquires attribute expression candidates by focusing on the sentence structures stored in the sentence structure database 7. The word string for FIG. 11 has the sentence structure shown in FIG. 8: the noun “popcorn” is followed by the particle “ga”, which is immediately followed by the evaluation expression “delicious”. The noun that appears first, “popcorn”, is therefore acquired as an attribute expression candidate.
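
The sentence structure used in step 300 (a noun, then the particle “ga”, then an evaluation expression) can be sketched as a three-token window match. This is an illustration only: the function name, the part-of-speech labels "noun" and "particle", and the tokenized sentence are assumptions made for the example, and the sentence structure database may hold other structures that this sketch does not cover.

```python
def find_attribute_candidates(tokens, evaluation_list):
    """Return (noun, evaluation) pairs matching: noun + particle が + evaluation.

    tokens is a list of (surface, part_of_speech) pairs.
    """
    known = set(evaluation_list)
    candidates = []
    for i in range(len(tokens) - 2):
        (noun, noun_pos), (particle, particle_pos), (evaluation, _) = tokens[i:i + 3]
        if (
            noun_pos == "noun"
            and particle_pos == "particle"
            and particle == "が"
            and evaluation in known
        ):
            candidates.append((noun, evaluation))
    return candidates


# Hypothetical tokenization with hypothetical part-of-speech labels.
tokens = [
    ("ポップコーン", "noun"),      # popcorn
    ("が", "particle"),           # ga
    ("おいし", "adjective"),      # delicious (stem)
    ("い", "suffix"),
]

print(find_attribute_candidates(tokens, ["おいし"]))  # [('ポップコーン', 'おいし')]
```

The match is purely structural, which is why “popcorn” is returned here even though it is not a property of the target thing; filtering such candidates is left to the attribute likelihood.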

In step 400, the attribute likelihood calculation unit 4 refers to the word information database 8, which stores word information on target things and attribute expressions in a plurality of documents, and calculates for the attribute expression candidate an attribute likelihood representing its appropriateness as an attribute expression. Using FIG. 9 and equation (1) above, the attribute likelihood L of the attribute expression candidate “popcorn” is
L = 2 × 10 / (2 × 10 + 170 + 50) ≈ 0.08.

In step 500, the reputation information extraction unit 5 takes each attribute expression candidate whose attribute likelihood is equal to or greater than the threshold as an attribute expression, and extracts the pair of the attribute expression and its corresponding evaluation expression as reputation information. As before, with a threshold of 0.15, the attribute likelihood of the attribute expression candidate “popcorn” is 0.08, which is below the threshold, so “popcorn” does not become an attribute expression. Consequently, no reputation information is extracted from the sentence of FIG. 11.
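
Step 500, together with the two ways of setting the threshold described for the first example, can be sketched as follows. The function names are hypothetical, and the percentage rule is only one possible reading of “20% of the attribute expression candidates are equal to or greater than the threshold”. For brevity the two candidates from the two example sentences are pooled into one list, which the description itself does not do.

```python
def percentile_threshold(likelihoods, percent=20):
    """Return a threshold that about `percent`% of the candidates reach."""
    if not likelihoods:
        raise ValueError("at least one candidate is required")
    ranked = sorted(likelihoods, reverse=True)
    # Integer ceiling division; always keep at least one candidate.
    keep = max(1, (len(ranked) * percent + 99) // 100)
    return ranked[keep - 1]


def extract_reputation(candidates, threshold):
    """Keep (attribute, evaluation) pairs whose likelihood is >= threshold."""
    return [
        (attribute, evaluation)
        for attribute, evaluation, likelihood in candidates
        if likelihood >= threshold
    ]


# Likelihoods taken from the two worked examples.
candidates = [
    ("CG", "beautiful", 0.19),
    ("popcorn", "delicious", 0.08),
]

print(extract_reputation(candidates, 0.15))  # [('CG', 'beautiful')]
print(percentile_threshold([c[2] for c in candidates]))  # 0.19
```

With the fixed threshold of 0.15 only the pair for “CG” survives, matching the outcome of the two examples. When several candidates tie at the cut-off, the percentage rule lets all of them pass, so the share above the threshold can exceed the requested percentage.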

In this way, attribute expression candidates are acquired by focusing on sentence structure, without depending on an attribute expression list, so reputation information is no longer missed because an attribute expression is absent from such a list. In addition, for the acquired candidates, the attribute likelihood is calculated from word information on target things and attribute expressions in a plurality of documents so that candidates appropriate as properties of the target thing, such as “CG”, receive high values, while candidates that are not appropriate, such as “popcorn”, receive low values. Because only candidates with a high attribute likelihood are taken as attribute expressions, the reputation information stated about the target thing can be extracted correctly.

The series of operations described in the above embodiment and examples can also be built as a program, which can be installed on a computer used as the reputation information extraction device and executed by control means such as a CPU, or distributed via a network.

The program can also be stored on a hard disk device connected to the computer used as the reputation information extraction device, or on a portable storage medium such as a flexible disk or CD-ROM, and then installed on the computer and executed.

The present invention is not limited to the above embodiment and examples, and can be modified and applied in various ways within the scope of the claims.

The present invention can be used, for example, for surveys of customer satisfaction with products.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram of the principle configuration of the present invention.
FIG. 3 is a configuration diagram of the reputation information extraction device in one embodiment of the present invention.
FIG. 4 is a flowchart of the overall operation of the reputation information extraction device in one embodiment of the present invention.
FIG. 5 is an example (part 1) of an input sentence in one example of the present invention.
FIG. 6 is an example (part 1) of a word division result produced by the word division unit in one example of the present invention.
FIG. 7 is an example of the evaluation expression list in one example of the present invention.
FIG. 8 is an example of a sentence structure stored in the sentence structure database in one example of the present invention.
FIG. 9 is an example of the word information database in one example of the present invention.
FIG. 10 is an example of the values, obtained from the word information database, that are used for the attribute likelihood calculation in one example of the present invention.
FIG. 11 is an example (part 2) of a sentence in one example of the present invention.
FIG. 12 is an example (part 2) of a word division result produced by the word division unit in one example of the present invention.

Explanation of symbols

1 Word division means, word division unit
2 Evaluation expression acquisition means, evaluation expression acquisition unit
3 Attribute expression candidate acquisition means, attribute expression candidate acquisition unit
4 Attribute likelihood calculation extraction means, attribute likelihood calculation extraction unit
5 Reputation information extraction means, reputation information extraction unit
6 Evaluation expression list storage means, evaluation expression list storage unit
7 Sentence structure storage means, sentence structure database
8 Word information storage means, word information database

Claims

A reputation information extraction method for extracting reputation information about a target thing, including a product, from an input sentence, the method comprising:
a word dividing step of dividing the input sentence into words;
an evaluation expression acquisition step of referring to an evaluation expression list, stored in evaluation expression list storage means, in which evaluation expressions that are evaluations related to properties of the target thing are described, searching the word string divided in the word dividing step for an evaluation expression, and acquiring the evaluation expression found;
an attribute expression candidate acquisition step of acquiring, for the evaluation expression, an attribute expression candidate that is a property of the target thing, with reference to sentence structure storage means that stores sentence structures;
an attribute likelihood calculation step of calculating, for the attribute expression candidate, an attribute likelihood representing appropriateness as the target thing, with reference to word information storage means that stores word information related to target things and attribute expression candidates in a plurality of documents; and
a reputation information extraction step of taking an attribute expression candidate whose attribute likelihood is higher than a predetermined threshold as an attribute expression, and extracting a set of the attribute expression and the evaluation expression corresponding to the attribute expression as reputation information.

The reputation information extraction method according to claim 1, wherein,
in the attribute likelihood calculation step, word appearance frequencies are used as the word information.

A reputation information extraction device that extracts reputation information about a target thing, including a product, from an input sentence, the device comprising:
evaluation expression list storage means for storing an evaluation expression list in which evaluation expressions that are evaluations related to properties of the target thing are described;
sentence structure storage means for storing sentence structures;
word information storage means for storing word information related to target things and attribute expression candidates in a plurality of documents;
word dividing means for dividing the input sentence into words;
evaluation expression acquisition means for referring to the evaluation expression list stored in the evaluation expression list storage means, searching the word string divided by the word dividing means for an evaluation expression, and acquiring the evaluation expression found;
attribute expression candidate acquisition means for acquiring, for the evaluation expression, an attribute expression candidate that is a property of the target thing, with reference to the sentence structure storage means;
attribute likelihood calculating means for calculating, for the attribute expression candidate, an attribute likelihood representing appropriateness as the target thing, with reference to the word information storage means; and
reputation information extraction means for taking an attribute expression candidate whose attribute likelihood is higher than a predetermined threshold as an attribute expression, and extracting a set of the attribute expression and the evaluation expression corresponding to the attribute expression as reputation information.

The reputation information extraction device according to claim 3, wherein
the word information storage means stores word appearance frequencies as the word information, and
the attribute likelihood calculating means calculates the attribute likelihood using the word appearance frequencies in the word information storage means.

A reputation information extraction program that causes a computer having:
evaluation expression list storage means for storing an evaluation expression list in which evaluation expressions that are evaluations related to properties of the target thing are described;
sentence structure storage means for storing sentence structures; and
word information storage means for storing word information related to target things and attribute expression candidates in a plurality of documents,
to function as the reputation information extraction device according to claim 3.