JP6919961B1

JP6919961B1 - Information processing system and information processing method

Info

Publication number: JP6919961B1
Application number: JP2021510244A
Authority: JP
Inventors: 桂太杉原
Original assignee: 桂太杉原
Priority date: 2019-12-20
Filing date: 2020-12-04
Publication date: 2021-08-18
Anticipated expiration: 2040-12-04
Also published as: JPWO2021124933A1; WO2021124933A1

Abstract

According to one aspect of the present disclosure, a document network composed of a plurality of documents that are at least weakly connected is identified. A specific document that is included in the document network and has inlinks from two or more documents is identified. A plurality of subnetworks are identified with reference to the specific document. The score of each of the documents constituting the document network is calculated by executing individual processing on each of the subnetworks. In the individual processing, the score of each document included in the corresponding subnetwork is calculated. For each duplicate document belonging to two or more subnetworks, the scores of that duplicate document in the two or more subnetworks are integrated.

Description

Cross-reference to related applications

This international application claims priority based on Japanese Patent Application No. 2019-230822 filed with the Japan Patent Office on December 20, 2019, and the entire contents of Japanese Patent Application No. 2019-230822 are incorporated into this international application by reference.

The present disclosure relates to an information processing system and an information processing method.

Techniques for ranking web pages are already known (see Patent Document 1). In a simple example of such a technique, a web page is given a higher page rank the more web pages link to it. The page rank calculation uses an adjacency matrix that represents the connection relationships (in other words, the connection states) between web pages in binary form with the values 0 and 1, together with a matrix obtained by modifying the adjacency matrix, whose components include the values 0 and 1 and other real numbers.

Patent Document 1: Japanese Unexamined Patent Publication No. 2017-102712

The conventional method of determining page rank based on the adjacency matrix described above must posit, in addition to the actual connection relationships between web pages, virtual connection relationships from every web page to every web page. This makes it difficult to rank web pages well.

It is therefore desirable, according to one aspect of the present disclosure, to provide a technique capable of scoring a plurality of documents more appropriately than before.

According to one aspect of the present disclosure, an information processing system is provided. The information processing system includes a document network identification unit, a document identification unit, a subnetwork identification unit, and a score calculation unit.

The document network identification unit is configured to identify, based on data representing the connection relationships between documents, a document network composed of a plurality of documents that are at least weakly connected. The document identification unit is configured to identify a specific document that is included in the identified document network and has inlinks from two or more documents.
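As a rough illustration of the document identification step, the following Python sketch finds documents that have inlinks from two or more distinct documents. The function name `find_specific_documents` and the `(source, target)` link format are assumptions made for this example and do not appear in the disclosure.

```python
from collections import defaultdict


def find_specific_documents(links):
    """Return documents that have inlinks from two or more distinct documents.

    `links` is an iterable of (source, target) pairs (hypothetical format).
    """
    sources_by_target = defaultdict(set)
    for source, target in links:
        if source != target:  # assumption: self-links are ignored in this sketch
            sources_by_target[target].add(source)
    return sorted(
        doc for doc, sources in sources_by_target.items() if len(sources) >= 2
    )


# Document 4 is linked from documents 2 and 3, so it is a specific document.
example_links = [(1, 2), (2, 4), (3, 4), (4, 5)]
specific_documents = find_specific_documents(example_links)
```

Counting distinct source documents rather than raw links means that repeated links from the same page do not turn a document into a specific document; whether the disclosure intends this is not stated in the passage above.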

The subnetwork identification unit is configured to identify a plurality of subnetworks included in the document network with reference to the identified specific document. The score calculation unit is configured to calculate the score of each of the documents constituting the document network by executing individual processing on each of the identified subnetworks. In the individual processing, the score of each document included in the corresponding subnetwork is calculated.

The document network includes one or more duplicate documents. Each duplicate document is a document that belongs to two or more of the plurality of subnetworks. For each of the one or more duplicate documents, the score calculation unit calculates a single score for that duplicate document by integrating its scores in the two or more subnetworks.

The information processing system according to one aspect of the present disclosure can appropriately score a plurality of documents in a document network that includes a document having inlinks from two or more documents. The system is therefore very useful for scoring a plurality of documents in a document network having complicated connection relationships.

According to one aspect of the present disclosure, the subnetwork identification unit may identify a plurality of subnetworks that have the specific document as a boundary. The plurality of subnetworks may include at least two or more upstream subnetworks and a downstream subnetwork. The two or more upstream subnetworks correspond to the two or more inlinks of the specific document, and in each upstream subnetwork the specific document has the one corresponding inlink. The downstream subnetwork is connected to the specific document through an outlink of the specific document.

In this case, the specific document is a duplicate document belonging to the two or more upstream subnetworks. The score calculation unit may calculate a single integrated score for the specific document by integrating its scores in the upstream subnetworks, and may calculate the score of each document belonging to the downstream subnetwork with reference to the integrated score of the specific document.

According to one aspect of the present disclosure, the subnetwork identification unit may identify, as the plurality of subnetworks, a subnetwork for each inlink of the specific document. Each subnetwork may include the group of documents located upstream of the corresponding inlink, the specific document, and the group of documents located downstream of the outlinks of the specific document.
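The per-inlink decomposition can be sketched as follows. This is a minimal Python illustration that assumes an acyclic document network with a single specific document; the names `subnetworks_per_inlink` and `_reachable` and the `(source, target)` link format are hypothetical.

```python
from collections import defaultdict, deque


def _reachable(start, neighbors):
    """Return the set of nodes reachable from `start` (inclusive) via `neighbors`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def subnetworks_per_inlink(links, specific):
    """Build one node set per inlink of `specific`.

    Each set holds the documents upstream of that inlink, the specific
    document itself, and the documents downstream of its outlinks.
    Assumption: the network is acyclic, so upstream and downstream are disjoint.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for source, target in links:
        successors[source].add(target)
        predecessors[target].add(source)

    downstream = _reachable(specific, successors)  # includes `specific`
    result = {}
    for source in sorted(predecessors[specific]):
        upstream = _reachable(source, predecessors)
        result[source] = upstream | downstream
    return result


# Document 3 has inlinks from documents 1 and 2 and an outlink to document 4.
example = subnetworks_per_inlink([(1, 3), (2, 3), (3, 4)], specific=3)
```

In this example documents 3 and 4 appear in both subnetworks, so they would be treated as duplicate documents whose per-subnetwork scores need to be integrated.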

According to one aspect of the present disclosure, the subnetwork identification unit may identify, as the plurality of subnetworks, a subnetwork for each combination of an inlink and an outlink of the specific document. Each subnetwork may include the group of documents located upstream of the inlink corresponding to the combination, the specific document, and the group of documents located downstream of the outlink corresponding to the combination.

According to one aspect of the present disclosure, the integration may be achieved by calculating the sum of the scores of the corresponding duplicate document in the two or more subnetworks. According to one aspect of the present disclosure, the integration may be achieved by calculating a representative value of the scores of the corresponding duplicate document in the two or more subnetworks. The representative value may be the average.
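The two integration options named above (sum, or a representative value such as the average) can be sketched in Python as follows. The function name `integrate_scores` and the dictionary-per-subnetwork input format are assumptions made for this example.

```python
from statistics import mean


def integrate_scores(per_subnetwork_scores, method="sum"):
    """Merge per-subnetwork scores into one score per document.

    `per_subnetwork_scores` is a list of {document: score} dicts, one per
    subnetwork (hypothetical format). A document that appears in two or more
    dicts is a duplicate document; its scores are summed or averaged.
    """
    if method not in ("sum", "mean"):
        raise ValueError("method must be 'sum' or 'mean'")
    collected = {}
    for scores in per_subnetwork_scores:
        for doc, score in scores.items():
            collected.setdefault(doc, []).append(score)
    combine = sum if method == "sum" else mean
    return {doc: combine(values) for doc, values in collected.items()}


# Document 3 belongs to both subnetworks; documents 1 and 2 belong to one each.
subnetwork_a = {1: 0.25, 3: 0.5}
subnetwork_b = {2: 0.125, 3: 0.25}
summed = integrate_scores([subnetwork_a, subnetwork_b], method="sum")
averaged = integrate_scores([subnetwork_a, subnetwork_b], method="mean")
```

Documents that belong to only one subnetwork keep their score unchanged under either method, since the sum and the mean of a single value are that value.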

According to one aspect of the present disclosure, the individual processing may include processing that calculates the score of each document included in the corresponding subnetwork using a Hermitian adjacency matrix based on the connection relationships between the documents in that subnetwork.

According to one aspect of the present disclosure, the individual processing may include processing that modifies the corresponding subnetwork by adding a dummy document to a rear-end document, which is a document included in the corresponding subnetwork that has no outlink, so that the rear-end document is virtually given an outlink. The individual processing may include processing that defines a Hermitian adjacency matrix based on the connection relationships between documents in the modified subnetwork.
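A minimal Python sketch of the dummy-document step is given below. It assumes one separate dummy per rear-end document, which the passage above does not specify; the function name `add_dummy_documents` and the `("dummy", node)` labels are hypothetical.

```python
def add_dummy_documents(nodes, links):
    """Attach a dummy document to every document that has no outlink.

    Returns new node and link lists; the inputs are not modified.
    Assumption: each rear-end document receives its own dummy document.
    """
    has_outlink = {source for source, _ in links}
    new_nodes = list(nodes)
    new_links = list(links)
    for node in nodes:
        if node not in has_outlink:
            dummy = ("dummy", node)  # hypothetical label for the dummy document
            new_nodes.append(dummy)
            new_links.append((node, dummy))  # virtual outlink from the rear-end document
    return new_nodes, new_links


# Documents 2 and 3 have no outlinks, so each receives a dummy document.
modified_nodes, modified_links = add_dummy_documents([1, 2, 3], [(1, 2), (1, 3)])
```

The modified link list can then serve as the input from which the Hermitian adjacency matrix of the modified subnetwork is defined.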

In one aspect of the present disclosure, the Hermitian adjacency matrix may be an N-by-N Hermitian matrix based on the connection relationships between the documents D[m] (m is an integer with 1 ≤ m ≤ N) constituting the corresponding subnetwork.

The Hermitian adjacency matrix may correspond to a Hermitian matrix with zero diagonal components in which the component h(p, q) in row p and column q takes the value 1 when both a link from document D[p] to document D[q] and a link from document D[q] to document D[p] exist; the value 0 when neither a link from document D[p] to document D[q] nor a link from document D[q] to document D[p] exists; the value +i (where i is the imaginary unit) when a link from document D[p] to document D[q] exists but no link from document D[q] to document D[p] exists; and the value −i when no link from document D[p] to document D[q] exists but a link from document D[q] to document D[p] exists.
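The four-valued definition above can be written out directly. The following Python sketch builds the matrix as a list of lists of complex numbers from 1-based `(p, q)` link pairs; the function names `hermitian_adjacency` and `is_hermitian` and the input format are assumptions made for this example.

```python
def hermitian_adjacency(n, links):
    """Build the N-by-N Hermitian adjacency matrix h(p, q).

    `links` holds 1-based pairs (p, q) meaning document D[p] links to D[q].
    Mutual links give 1, a one-way link p -> q gives +i at (p, q) and -i at
    (q, p), and absent links give 0. Self-links are ignored so that the
    diagonal stays zero.
    """
    link_set = {(p, q) for p, q in links if p != q}
    h = [[0j] * n for _ in range(n)]
    for p, q in link_set:
        if (q, p) in link_set:
            h[p - 1][q - 1] = 1 + 0j
        else:
            h[p - 1][q - 1] = 1j
            h[q - 1][p - 1] = -1j
    return h


def is_hermitian(h):
    """Check that h equals its own conjugate transpose."""
    n = len(h)
    return all(h[p][q] == h[q][p].conjugate() for p in range(n) for q in range(n))


# D[1] and D[2] link to each other; D[2] links one way to D[3].
h = hermitian_adjacency(3, [(1, 2), (2, 1), (2, 3)])
```

Because every one-way link writes +i and −i to mirrored positions, the result is Hermitian by construction, which is what guarantees real eigenvalues for the later eigenvector computation.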

The individual processing may include processing that modifies the Hermitian adjacency matrix to define a special Hermitian adjacency matrix and calculates the score of each document included in the corresponding subnetwork using an eigenvector of the special Hermitian adjacency matrix.

The Hermitian adjacency matrix may be modified so that, when the components of the eigenvector are placed on the complex plane, all the components fall within an angular range of π/2 radians.
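The angular condition itself can be checked with a few lines of Python. The sketch below only tests whether a given set of complex components fits inside a sector of π/2 radians; it does not perform the modification of the matrix, which this passage does not specify. The function name `within_quarter_turn` and the tolerance parameter are assumptions.

```python
import cmath
import math


def within_quarter_turn(components, tol=1e-12):
    """Return True if all nonzero components fit in an angular range of pi/2.

    Components with magnitude below `tol` have no meaningful angle and are
    skipped (an assumption of this sketch).
    """
    angles = sorted(
        cmath.phase(z) % (2 * math.pi) for z in components if abs(z) > tol
    )
    if len(angles) < 2:
        return True
    # The smallest sector containing all angles is the full circle minus
    # the largest gap between neighbouring angles (including wrap-around).
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])
    spread = 2 * math.pi - max(gaps)
    return spread <= math.pi / 2 + tol


fits = within_quarter_turn([1 + 0j, 1j, 1 + 1j])
does_not_fit = within_quarter_turn([1 + 0j, -1 + 0j])
```

Measuring the spread through the largest gap handles sectors that straddle the positive real axis, where a naive maximum-minus-minimum of the angles would overestimate the range.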

According to one aspect of the present disclosure, a plurality of documents are scored using a special Hermitian adjacency matrix corresponding to a Hermitian adjacency matrix that can express the connection relationships between documents with the four values 1, 0, +i, and −i. This eliminates the need to posit virtual connection relationships from every document to every document, so each document can be scored based on the connection relationships between documents more appropriately than before.

According to one aspect of the present disclosure, a computer program may be provided. The computer program may be a computer program for causing a computer to function as at least one of the document network identification unit, the document identification unit, the subnetwork identification unit, and the score calculation unit included in the information processing system described above.

According to one aspect of the present disclosure, an information processing method executed by a computer may be provided. The information processing method may include identifying, based on data representing the connection relationships between documents, a document network composed of a plurality of documents that are at least weakly connected.

The information processing method may include identifying a specific document that is included in the identified document network and has inlinks from two or more documents. The information processing method may include identifying a plurality of subnetworks included in the document network with reference to the identified specific document.

The information processing method may include calculating the score of each of the documents constituting the document network by executing, as individual processing on each of the identified subnetworks, processing that calculates the score of each document included in the corresponding subnetwork.

The document network may include one or more duplicate documents, each of which is a document that belongs to two or more of the plurality of subnetworks. Calculating the score of each of the documents constituting the document network may include, for each of the one or more duplicate documents, calculating a single score for that duplicate document by integrating its scores in the two or more subnetworks.

According to one aspect of the present disclosure, an information processing method including at least part of the procedures executed by the information processing system described above may be provided.

According to one aspect of the present disclosure, an information processing system may be provided that includes a processor and a memory containing instructions for causing the processor to execute specific processing. The specific processing may be processing corresponding to the information processing method described above.

FIG. 1 is a block diagram showing the configuration of the information processing system of the first embodiment.
FIG. 2 is a functional block diagram of the information processing system.
FIG. 3 is a functional block diagram showing details of the query response unit.
FIG. 4 is a flowchart showing the process executed by the second scoring unit.
FIG. 5 is a diagram showing a first document network that does not include a join node.
FIG. 6 is a diagram showing a second document network that includes join nodes.
FIG. 7 is a flowchart showing the first part of the score calculation process executed by the second scoring unit.
FIG. 8 is a flowchart showing the sub-process executed by the second scoring unit.
FIG. 9 is a diagram showing the first document network with a dummy node added.
FIG. 10 is a diagram showing a first example of the special Hermitian adjacency matrix.
FIG. 11A is an explanatory diagram of the arrangement of the nodes on the complex plane, and FIG. 11B is a diagram explaining the score calculation method.
FIG. 12 is a diagram showing a second example of the special Hermitian adjacency matrix.
FIG. 13 is a diagram showing subgraphs leading from front-end nodes to the layer-0 join nodes.
FIG. 14 is a diagram showing a subgraph with a dummy node added.
FIG. 15 is a flowchart showing the second part of the score calculation process.
FIG. 16 is a diagram showing a subgraph leading to a layer-1 join node.
FIG. 17 is a diagram showing a subgraph leading to a layer-2 join node.
FIG. 18 is a diagram showing a subgraph leading to a layer-3 join node.
FIG. 19 is a flowchart showing the third part of the score calculation process.
FIG. 20 is a diagram showing an acyclic subgraph leading from a join node to a rear-end node.
FIG. 21 is a diagram showing a cyclic subgraph.
FIG. 22 is a flowchart showing the score calculation process of the second embodiment.
FIG. 23 is a diagram showing an example of a third document network.
FIGS. 24A and 24B are diagrams showing subgraphs of the third document network in the second embodiment.
FIG. 25 is a flowchart showing the score calculation process of the third embodiment.
FIGS. 26A and 26B are diagrams showing subgraphs of the third document network in the third embodiment.
FIG. 27 is a diagram showing an example of a fourth document network.
FIG. 28 is a diagram showing an example of a fifth document network.
FIGS. 29A and 29B are diagrams showing subgraphs of the fourth document network.
FIG. 30 is a flowchart showing the sub-process executed by the second scoring unit in the fourth embodiment.
FIG. 31 is a flowchart showing the sub-process executed by the second scoring unit in the fifth embodiment.
FIG. 32 is an explanatory diagram of the special Hermitian adjacency matrix in the fifth embodiment.
FIG. 33 is an explanatory diagram of the special Hermitian adjacency matrix in the fifth embodiment.

1 ... information processing system, 5 ... user terminal, 10 ... arithmetic unit, 11 ... processor, 15 ... memory, 20 ... storage unit, 30 ... communication unit, 110 ... crawler, 120 ... indexer, 130 ... query processing unit, 140 ... query response unit, 141 ... first scoring unit, 143 ... second scoring unit, 145 ... ranking unit, 147 ... output unit, 210 ... page repository, 220 ... index storage unit.

Exemplary embodiments of the present disclosure are described below with reference to the drawings.

[First Embodiment]
The information processing system 1 of the present embodiment shown in FIG. 1 is configured to respond to a search query input from the user terminal 5 by providing the user terminal 5 with a list of documents corresponding to the search query. The documents are web documents, specifically web pages. The information processing system 1 functions as a search engine that can be used from the user terminal 5 through a communication network. The communication network is, for example, the Internet.

The information processing system 1 includes an arithmetic unit 10, a storage unit 20, and a communication unit 30. The arithmetic unit 10 includes a processor 11 and a memory 15. The storage unit 20 stores data and the computer programs executed by the processor 11. The storage unit 20 may include one of a hard disk drive and a solid state drive.

The communication unit 30 includes a communication interface capable of communicating with the user terminal 5. The arithmetic unit 10 realizes a search function by executing processing in accordance with the computer programs stored in the storage unit 20. Specifically, the processing for realizing the search function is executed by the processor 11. The information processing system 1, shown in simplified form in FIG. 1, may be composed of a group of one or more cooperating server devices.

The search function is realized by the arithmetic unit 10 functioning as the crawler 110, the indexer 120, the query processing unit 130, and the query response unit 140 shown in FIG. 2, and by the storage unit 20 functioning as the page repository 210 and the index storage unit 220.

The crawler 110, like well-known crawlers, is configured to collect a plurality of web pages existing on the communication network. The web pages collected by the crawler 110 are accumulated in the page repository 210.

The indexer 120 is configured to analyze and index each web page accumulated in the page repository 210. The indexing generates index data for each web page. The index data for each web page is stored in the index storage unit 220.

Each item of index data includes a content index and a structure index. The content index includes information on the keywords, the title, and the key sentences of the corresponding web page. The structure index includes information representing the hyperlink structure of the corresponding web page. Taken together, the index data represents the connection relationships between web pages.
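One possible in-memory shape for this index data is sketched below in Python. The class and field names (`ContentIndex`, `StructureIndex`, `IndexData`, `outlinks`, and so on) are assumptions chosen for illustration; the disclosure does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass
class ContentIndex:
    keywords: list[str]
    title: str
    key_sentences: list[str]


@dataclass
class StructureIndex:
    outlinks: list[str]  # URLs this page links to (hypothetical representation)


@dataclass
class IndexData:
    url: str
    content: ContentIndex
    structure: StructureIndex


def connection_relations(index_data):
    """Collect (source URL, target URL) pairs from a group of index data."""
    return [
        (item.url, target)
        for item in index_data
        for target in item.structure.outlinks
    ]


pages = [
    IndexData("a", ContentIndex(["graph"], "Page A", []), StructureIndex(["b", "c"])),
    IndexData("b", ContentIndex(["matrix"], "Page B", []), StructureIndex([])),
]
relations = connection_relations(pages)
```

Flattening the structure indexes into link pairs is one way to obtain the connection relationships that the second scoring unit works from.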

The query processing unit 130 accepts a search query from a user and extracts, from all the web pages, a related page group, which is the set of web pages corresponding to the search query. Here, "all the web pages" means the group of web pages that the crawler 110 has found on the communication network and whose index data is registered in the index storage unit 220.

Specifically, based on the content indexes of the web pages stored in the index storage unit 220, the query processing unit 130 extracts from all the web pages, as the related page group, the set of web pages containing vocabulary corresponding to the search query. Information on the extracted related page group is provided to the query response unit 140.
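A very simple version of this extraction step might look like the following Python sketch. It treats a page as related when any query term matches one of its keywords, which is an assumption of this example; the disclosure does not define the matching rule. The function name `extract_related_pages` and the `{url: keywords}` input are hypothetical.

```python
def extract_related_pages(query, content_index):
    """Return URLs whose keywords share at least one term with the query.

    `content_index` maps each URL to its list of keywords (hypothetical format).
    Matching is case-insensitive and on whole terms only.
    """
    terms = {term.lower() for term in query.split()}
    related = []
    for url, keywords in content_index.items():
        vocabulary = {keyword.lower() for keyword in keywords}
        if terms & vocabulary:
            related.append(url)
    return related


index = {"p1": ["graph", "matrix"], "p2": ["cooking"], "p3": ["Matrix"]}
related_pages = extract_related_pages("matrix theory", index)
```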

Based on the provided information on the related page group, the query response unit 140 transmits to the user terminal 5, as response data to the search query, a search result list in which the related pages are arranged in order of page rank.

Each related page is ranked higher, and placed higher in the search result list, the greater its relevance to the search query and its importance. Like the response data from conventional search engines, the search result list is configured as a web page having links to the listed related pages. The links referred to here are so-called hyperlinks.

As shown in FIG. 3, the query response unit 140 includes a first scoring unit 141, a second scoring unit 143, a ranking unit 145, and an output unit 147. For the related page group corresponding to the search query, the first scoring unit 141 scores each related page based on the relevance of its page content to the search query. Specifically, the first scoring unit 141 is configured to give each related page a content score as a first score.

The second scoring unit 143 operates independently of the search query and is configured to give each web page collected by the crawler 110 an importance score, based on the connection relationships between web pages, as a second score.

The second score is calculated so that a web page presumed to be more important from the connection relationships between web pages has a larger value. The second score is larger for a web page that attracts more links, for a web page that is linked from web pages with high importance scores, and for a web page that has links from web pages with few links to other web pages.

The ranking unit 145 is configured to calculate the page rank of each related page based on the first score calculated for each related page by the first scoring unit 141 and the second score calculated for each related page by the second scoring unit 143.

According to one example, the page rank of each related page corresponds to a weighted sum of the first score and the second score. For example, using the first score X1, the second score X2, and a weighting coefficient α that takes a value between 0 and 1, the page rank Y of each related page can be calculated according to the formula Y = α·X1 + (1 − α)·X2. The page rank of each related page may be understood as an overall score whose components are a content score based on the search query and an importance score based on the connection relationships between web pages.
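The weighted sum can be written as a one-line function. In the Python sketch below, the function name `page_rank` is hypothetical, and accepting the endpoints α = 0 and α = 1 is an assumption, since the text only says that α lies between 0 and 1.

```python
def page_rank(x1, x2, alpha):
    """Return Y = alpha * X1 + (1 - alpha) * X2.

    x1 is the content score (first score), x2 the importance score
    (second score). Assumption: alpha may equal 0 or 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * x1 + (1.0 - alpha) * x2


balanced = page_rank(0.5, 0.25, alpha=0.5)
```

A larger α makes relevance to the query dominate, while a smaller α gives more weight to the link-based importance of the page.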

Based on the page rank of each related page calculated by the ranking unit 145, the output unit 147 transmits, as the search result list, a page list in which the related pages corresponding to the search query are arranged in descending order of page rank to the user terminal 5 that sent the search query.
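The ordering performed by the output unit can be sketched as a sort on precomputed page ranks. The function name `build_result_list` and the `{url: page rank}` input are assumptions for this example.

```python
def build_result_list(page_ranks):
    """Return URLs ordered from highest to lowest page rank.

    `page_ranks` maps each related page's URL to its page rank
    (hypothetical format). Ties keep their insertion order.
    """
    return sorted(page_ranks, key=page_ranks.get, reverse=True)


result_list = build_result_list({"a": 0.2, "b": 0.9, "c": 0.5})
```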

Specifically, by periodically executing the process shown in FIG. 4, the second scoring unit 143 calculates, based on the latest index data stored in the index storage unit 220 and for each group of web pages that are at least weakly connected, the second score of each web page belonging to that group.

When the process shown in FIG. 4 starts, the second scoring unit 143 extracts one or more document networks from all the web pages (S110). By referring to the index data stored in the index storage unit 220, the second scoring unit 143 can extract each group of web pages that are at least weakly connected, among all the web pages, as one document network. Each document network is composed of a group of web pages that are at least weakly connected.

A network composed of a group of nodes that are at least weakly connected is a network in which, when the direction of the links between nodes is ignored, every remaining node can be reached by following links from any one node of the group of nodes belonging to the network.

In other words, a document network is composed of a group of web pages in which any one of the web pages belonging to the document network is at least indirectly connected to the remaining web pages when the direction of the links is ignored.
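The extraction of document networks described above amounts to finding weakly connected components of a directed graph. The following Python sketch does this with a breadth-first search over undirected neighbours; the function name `extract_document_networks` and the `(source, target)` link format are assumptions made for this example.

```python
from collections import defaultdict, deque


def extract_document_networks(pages, links):
    """Group pages into weakly connected components.

    Link direction is ignored when deciding connectivity. Isolated pages
    come out as single-element sets; whether such pages are scored is not
    covered by this sketch.
    """
    neighbors = defaultdict(set)
    for source, target in links:
        neighbors[source].add(target)
        neighbors[target].add(source)  # ignore direction

    seen = set()
    networks = []
    for page in pages:
        if page in seen:
            continue
        component = {page}
        queue = deque([page])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        networks.append(component)
    return networks


# Pages 1, 2, 3 are weakly connected through page 2; pages 4 and 5 form a second network.
networks = extract_document_networks([1, 2, 3, 4, 5], [(1, 2), (3, 2), (4, 5)])
```

Note that pages 1 and 3 cannot reach each other along directed links, yet they belong to the same document network because connectivity is judged with the link direction ignored.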

FIGS. 5 and 6 show examples of two different document networks. Each document network is represented as a directed graph. Each circle in FIGS. 5 and 6 corresponds to one node, in other words, one web page. Each arrow in the figures indicates that the web page at the start of the arrow contains a link (hyperlink) to the web page at the end of the arrow. That is, it is possible to move via the link from the web page at the start of the arrow to the web page at the end of the arrow.

Each web page in the document networks shown in FIGS. 5 and 6 is clearly connected at least indirectly to the other web pages in the same document network when the direction of the arrows is ignored. In the following, each of the web pages in one document network is also referred to as the k-th web page, using the number k shown inside the corresponding circle in the figures. Each web page in a document network is also referred to as a node. The k-th node means the k-th web page.

In S120 following S110, the second scoring unit 143 selects one of the one or more extracted document networks as the document network to be processed. The second scoring unit 143 then executes the score calculation process shown in FIG. 7 in order to calculate the second score of each node in the document network to be processed (S130).

The second scoring unit 143 repeats the score calculation process until it has been executed for all the document networks (S120 to S140). That is, the second scoring unit 143 selects each document network in turn as the processing target (S120) and executes the score calculation process for the selected document network (S130).

When the score calculation process has been completed for all the document networks (Yes in S140), the second scoring unit 143 ends the process shown in FIG. 4. In this way, the second scoring unit 143 calculates, for each document network, the second score of each web page constituting that document network. The calculated second scores are provided to the ranking unit 145.

When the score calculation process (S130) shown in FIG. 7 is started, the second scoring unit 143 identifies the tip nodes, that is, the nodes having no inlink, in the document network to be processed (S210).

A node having an inlink is a node to which a link is formed in another node. In other words, a web page having an inlink is a web page for which a link (hyperlink) leading to it is formed in another web page. In the following, a node having no inlink is also referred to as a "tip node".

In FIG. 5, the nodes having an inlink are the second, third, fourth, fifth, sixth, seventh, and eighth nodes, and the node having no inlink is the first node. In FIG. 6, the nodes having no inlink are the first, second, fourth, tenth, thirteenth, and fourteenth nodes.

When the document network to be processed has no tip node, a dummy node DP having no inlink is added to the document network. Specifically, a dummy node DP having outlinks to all the nodes in the document network is added to the document network.

A node having an outlink is a node that has a link to another node. In other words, a web page having an outlink is a web page in which a link (hyperlink) leading to another web page is formed. In the following, a node having no outlink is also referred to as a "rear-end node".

When a dummy node DP having no inlink has been added to the document network, the second scoring unit 143 regards the document network including the dummy node DP as the document network to be processed and identifies the added dummy node DP as a tip node.
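As a non-limiting illustration, S210 together with the dummy node handling described above may be sketched as follows. The function name `find_tip_nodes`, the link representation as `(source, target)` pairs, and the identifier `"DP"` used for the dummy node are assumptions made for this sketch.

```python
def find_tip_nodes(nodes, links, dummy="DP"):
    """Return (nodes, links, tip_nodes), adding a dummy tip node if needed.

    nodes: iterable of node ids; links: iterable of (source, target) pairs.
    `dummy` is a hypothetical id that must not collide with a real node id.
    """
    nodes = list(nodes)
    links = list(links)
    # Every node that appears as a link target has at least one inlink.
    has_inlink = {dst for _, dst in links}
    tips = [n for n in nodes if n not in has_inlink]
    if not tips:
        # No node is free of inlinks: add a dummy node with outlinks
        # to all nodes, and treat the dummy node as the tip node.
        links += [(dummy, n) for n in nodes]
        nodes.append(dummy)
        tips = [dummy]
    return nodes, links, tips
```

For a network that consists only of a cycle, for example two pages linking to each other, the sketch returns the dummy node as the single tip node.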

In S220 following S210, the second scoring unit 143 identifies join nodes, that is, nodes having a plurality of inlinks. One join node means one node having a plurality of inlinks.

The document network shown in FIG. 5 has no join node. The join nodes in the document network shown in FIG. 6 are the third, sixth, seventh, twelfth, and fifteenth nodes, which are drawn as double circles. For example, the third node has an inlink from the first node and an inlink from the second node.
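As a non-limiting illustration, the identification of join nodes in S220 may be sketched as a count of distinct link sources per node. The function name `find_join_nodes` is hypothetical, and the sketch assumes sortable node identifiers and ignores self-links.

```python
from collections import defaultdict


def find_join_nodes(links):
    """Return the nodes that have inlinks from two or more other nodes.

    links: iterable of (source, target) pairs.
    """
    sources = defaultdict(set)
    for src, dst in links:
        if src != dst:  # a self-link is not an inlink from another node
            sources[dst].add(src)
    return sorted(n for n, s in sources.items() if len(s) >= 2)
```

A network for which this returns an empty list corresponds to the case handled in S240, and a non-empty list to the case handled from S250 onward.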

When the processing in S220 reveals that the document network to be processed has a join node (Yes in S230), the second scoring unit 143 executes the processing of S250. When it reveals that the document network to be processed has no join node, the second scoring unit 143 executes the processing of S240.

In S240, the second scoring unit 143 calculates the score of each node in the document network to be processed by using the Hermitian adjacency matrix H corresponding to the document network. The score calculated for each node is based on the connection relationships between the nodes.

The second scoring unit 143 outputs the score of each node calculated in S240 to the ranking unit 145 as the second score of each web page (S245). The score calculation process shown in FIG. 7 then ends.

In S240, the second scoring unit 143 can calculate the score of each node by the same technique as in International Application PCT/JP2018/026560, filed by the same applicant on July 13, 2018. Specifically, the second scoring unit 143 can calculate the score of each node by executing the sub-process shown in FIG. 8.

In the following, in order to explain the score calculation method, each of the nodes constituting the document network to be processed is expressed as a node D[m]. The variable m takes integer values from 1 to N (1 ≤ m ≤ N), where N is the number of nodes in the document network to be processed. The node D[m] corresponds to the m-th node in the corresponding document network, that is, the m-th web page.

When the sub-process shown in FIG. 8 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1010). Specifically, the second scoring unit 143 generates a Hermitian adjacency matrix H that represents the connection relationships between the nodes in the document network to be processed by the values 1, 0, +i, and -i. Here, i denotes the imaginary unit.

The Hermitian adjacency matrix H is a matrix of N rows and N columns (N × N), corresponding to the number N of nodes in the document network to be processed, and each of its components takes one of the values 1, 0, +i, and -i. In the following, the expression "component h(p, q)" denotes the component in the p-th row and q-th column of the Hermitian adjacency matrix H.

When, in the document network to be processed, there is a link from the node D[p] to the node D[q] and also a link from the node D[q] to the node D[p], the corresponding component h(p, q) is set to the value 1.

When neither a link from the node D[p] to the node D[q] nor a link from the node D[q] to the node D[p] exists, the corresponding component h(p, q) is set to the value 0. Accordingly, the diagonal components h(p, p) of the Hermitian adjacency matrix H are zero.

When there is a link from the node D[p] to the node D[q] but no link from the node D[q] to the node D[p], the corresponding component h(p, q) is set to the value +i. When there is no link from the node D[p] to the node D[q] but there is a link from the node D[q] to the node D[p], the corresponding component h(p, q) is set to the value -i.

The second scoring unit 143 sets the value of each component h(p, q) as described above, according to the connection relationships between the nodes of the document network to be processed, and thereby generates the Hermitian adjacency matrix H (S1010).

When the value of each component h(p, q) is set according to the rules described above, the component h(q, p) in the q-th row and p-th column, which is located symmetrically to the component h(p, q) in the p-th row and q-th column with respect to the diagonal, is the complex conjugate of the component h(p, q). The Hermitian adjacency matrix H is therefore a Hermitian matrix.
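As a non-limiting illustration, the rules for the components h(p, q) may be sketched with NumPy as follows. The function name `hermitian_adjacency` is hypothetical, and the sketch uses 0-based node indices (index m - 1 for the node D[m]) and ignores self-links; both are assumptions of the sketch.

```python
import numpy as np


def hermitian_adjacency(n, links):
    """Build the N x N Hermitian adjacency matrix H.

    n: number of nodes N.
    links: iterable of (p, q) pairs meaning "node p links to node q",
           with 0-based indices (an assumption of this sketch).
    """
    link_set = {(p, q) for p, q in links if p != q}
    H = np.zeros((n, n), dtype=complex)
    for p, q in link_set:
        if (q, p) in link_set:
            H[p, q] = 1        # links exist in both directions
        else:
            H[p, q] = 1j       # link from p to q only
            H[q, p] = -1j      # seen from q, the link is incoming
    return H
```

Because each one-way link writes +i and -i at mirrored positions and each two-way link writes 1 at both positions, the result equals its own conjugate transpose.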

In S1010, before the Hermitian adjacency matrix H is generated, a dummy node DP is added to each rear-end node, that is, each node having no outlink, in the document network to be processed.

As shown in FIG. 9, the dummy node DP added to a rear-end node is a node that has one inlink from the rear-end node and no outlink. The document network shown in FIG. 9 is obtained by adding a dummy node DP to each of the fifth and eighth nodes, which have no outlink, in the document network shown in FIG. 5. In S1010, the Hermitian adjacency matrix H is generated for the document network to be processed to which the dummy nodes DP have been added in this way.

In the subsequent S1020, the second scoring unit 143 generates a special Hermitian adjacency matrix H1 by transforming the Hermitian adjacency matrix H generated above. The transformation is performed so that, when the components of the eigenvector V of the special Hermitian adjacency matrix H1 are placed on the complex plane, all of the components fall within an angular range of π/2 radians. For the transformation, the second scoring unit 143 calculates a first correction amount C1 and a second correction amount C2.

The value of the parameter n included in the first correction amount C1 and the second correction amount C2 is set to a natural number equal to or greater than the maximum of the node counts N of the one or more document networks extracted in S110. As described above, dummy nodes DP may be added to a document network. In that case, the node count N is the number of nodes in the document network including the dummy nodes DP. With the value of the parameter n set in this way, all the components of the eigenvector V fall within an angular range of π/2 radians.

The larger the value of the parameter n, the narrower, relative to π/2 radians, the angular range within which the components of the eigenvector V fall. The purpose of keeping all the components within an angular range of π/2 radians is to place all of them within a single quadrant of the complex plane. For good calculation of the second score, the parameter n is set to a small value within the range in which this purpose can be achieved. The second correction amount C2 contributes to the adjustment of the angular range, and the first correction amount C1 serves to prevent the second correction amount C2 from changing the absolute values of the matrix components.

After calculating the first correction amount C1 and the second correction amount C2, the second scoring unit 143 replaces each component having the value +i in the Hermitian adjacency matrix H with the value C1(C2 + i), and replaces each component having the value -i with the value C1(C2 - i). The second scoring unit 143 further changes each component having the value C1(C2 + i) in each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2 + i)/W}, obtained by dividing it by the sum W of the number of components having the value C1(C2 + i) and the number of components having the value 1 in the same row.

The second scoring unit 143 further changes the value C1(C2 - i) of the component located symmetrically, with respect to the diagonal, to each component changed to the value {C1(C2 + i)/W} into the complex conjugate {C1(C2 - i)/W} of the value {C1(C2 + i)/W}. The second scoring unit 143 generates the Hermitian matrix defined by these replacements and changes as the special Hermitian adjacency matrix H1.

A specific example of the procedure for transforming the Hermitian adjacency matrix H into the special Hermitian adjacency matrix H1 is shown in FIG. 10. For example, when, among the N components h(p1, 1), h(p1, 2), ..., h(p1, N) in the p1-th row, a total of W1 components take the value +i or the value 1, each component having the value +i in the p1-th row of the Hermitian adjacency matrix H is changed to the value {C1(C2 + i)/W1}.

When, among the N components h(p2, 1), h(p2, 2), ..., h(p2, N) in the p2-th row, a total of W2 components take the value +i or the value 1, each component having the value +i in the p2-th row of the Hermitian adjacency matrix H is changed to the value {C1(C2 + i)/W2}.

Furthermore, the value of each component having the value -i is changed to the complex conjugate of the component located symmetrically to it with respect to the diagonal. For example, the value of the component h(q1, p1), located symmetrically with respect to the diagonal to the component h(p1, q1) having the value {C1(C2 + i)/W1}, is changed to {C1(C2 - i)/W1}. Similarly, the value of the component h(q2, p2), located symmetrically with respect to the diagonal to the component h(p2, q2) having the value {C1(C2 + i)/W2}, is changed to {C1(C2 - i)/W2}.
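As a non-limiting illustration, the transformation from H to H1 may be sketched as follows. The function name `special_hermitian_h1` is hypothetical. The formulas for C1 and C2 are not reproduced in this passage, so the sketch takes them as arguments `c1` and `c2`; components having the value 1 are left unchanged, since the procedure above describes changes only for the +i and -i components.

```python
import numpy as np


def special_hermitian_h1(H, c1, c2):
    """Transform the Hermitian adjacency matrix H into H1.

    c1, c2: the first and second correction amounts, supplied by the
            caller (their formulas are outside the scope of this sketch).
    """
    H = np.asarray(H, dtype=complex)
    is_out = np.isclose(H, 1j)     # component +i: link from p to q only
    is_mutual = np.isclose(H, 1)   # component 1: links in both directions
    # W[p]: number of components with value +i or 1 in row p.
    W = (is_out | is_mutual).sum(axis=1)
    H1 = H.copy()
    for p, q in zip(*np.nonzero(is_out)):
        value = c1 * (c2 + 1j) / W[p]
        H1[p, q] = value            # {C1(C2 + i)/W}
        H1[q, p] = np.conj(value)   # mirrored component {C1(C2 - i)/W}
    return H1
```

For a trial run, one choice consistent with the stated role of C1 is c1 = 1/sqrt(c2**2 + 1), which keeps the absolute value of C1(C2 + i) equal to 1; this particular formula is an assumption of the illustration and is not taken from the text.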

In the subsequent S1030, the second scoring unit 143 calculates the eigenvalues and the eigenvector V of the special Hermitian adjacency matrix H1 generated in S1020. Since the special Hermitian adjacency matrix H1 is an N × N matrix, the eigenvector V is an N-dimensional vector having N components.

In the following, each component of the eigenvector V corresponding to the eigenvalue having the largest absolute value is denoted by V[m]. The variable m takes integer values from 1 to N. That is, the eigenvector V is V = {V[1], V[2], ..., V[N]}. Each component V[m] (1 ≤ m ≤ N) of the eigenvector V corresponds to the node D[m] (1 ≤ m ≤ N) of the document network.

In the subsequent S1040, the second scoring unit 143 divides each component V[m] (1 ≤ m ≤ N) of the eigenvector V corresponding to the eigenvalue of the special Hermitian adjacency matrix H1 having the largest absolute value by the component E corresponding to the start node of the document network. When the start node is the s-th node D[s], the component E is the s-th component V[s] of the eigenvector V (E = V[s]).

The start node corresponds to the node, among the nodes in the document network, to which the smallest score should be given. The start node may be set to the tip node of the pair that, among the pairs of a tip node and a rear-end node between which movement is possible along the direction of the links in the document network to be processed, has the largest number of nodes from the tip node to the rear-end node.

When each component V[m] (1 ≤ m ≤ N) of the eigenvector V is divided by the component E, the component of the eigenvector V corresponding to the start node is converted to the value 1. In the following, the eigenvector V after the division is expressed as the eigenvector V1. The eigenvector V1 is V1 = {V[1]/E, V[2]/E, ..., V[s]/E = 1, ..., V[N]/E}. As a result of the division, the component of the eigenvector V1 corresponding to the start node lies on the real axis of the complex plane.
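As a non-limiting illustration, S1030 and S1040 may be sketched with `numpy.linalg.eigh`, which computes the eigenvalues and eigenvectors of a Hermitian matrix. The function name `normalized_dominant_eigenvector` and the 0-based index `start` of the start node are assumptions of the sketch. When several eigenvalues share the largest absolute value, `np.argmax` simply picks the first of them; how such ties should be resolved is not addressed here.

```python
import numpy as np


def normalized_dominant_eigenvector(H1, start):
    """Return (eigenvalue, V1) for the eigenvalue of largest absolute value.

    H1: Hermitian matrix (for example, the special Hermitian adjacency matrix).
    start: 0-based index s of the start node (an assumption of this sketch).
    """
    eigenvalues, eigenvectors = np.linalg.eigh(H1)
    k = np.argmax(np.abs(eigenvalues))
    V = eigenvectors[:, k]       # eigenvectors are stored column-wise
    E = V[start]                 # component of the start node
    return eigenvalues[k], V / E  # after division, V1[start] == 1
```

Since a scalar multiple of an eigenvector is still an eigenvector, dividing by E changes neither the eigenvalue nor the relative positions of the components on the complex plane.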

After the processing in S1040, the second scoring unit 143 calculates the score of each node D[m] in the document network on the basis of each component V1[m] = V[m]/E (1 ≤ m ≤ N) of the eigenvector V1 after the division (S1050).

In S1050, the second scoring unit 143 applies a rotation to each component V1[m] (1 ≤ m ≤ N) on the complex plane. Specifically, the second scoring unit 143 rotates each component V1[m] (1 ≤ m ≤ N) of the eigenvector V1 on the complex plane so that the component located closest to the first quadrant lies at an angle θ1 from the real axis toward the fourth quadrant.

In FIG. 11A, the component V1[s] corresponding to the start node lies on the real axis of the complex plane. Each of the black and white circles in FIGS. 11A and 11B corresponds to one component of the eigenvector V1, in other words, one node in the document network, and the black circle corresponds to the start page.

As shown in FIG. 11B, the rotation moves the component corresponding to the start node so that it lies at an angle θ1 from the real axis toward the fourth quadrant on the complex plane. The rotation is performed to shift the start node from the real axis toward the fourth quadrant for the purpose of appropriate scoring. The angle θ1 is set to a small angle such that all the components of the eigenvector V1 are still located in the fourth quadrant after the rotation. The angle θ1 may be zero as long as scoring is not adversely affected.

In the following, the eigenvector V1 after the rotation is expressed as the eigenvector Vc = {Vc[1], Vc[2], ..., Vc[s], ..., Vc[N]}. Each component Vc[m] (1 ≤ m ≤ N) of the eigenvector Vc is a complex number.

In S1050, the score of each node is calculated on the basis of the position of each component Vc[m] (1 ≤ m ≤ N) on the complex plane. In the following, each component Vc[m] (1 ≤ m ≤ N) of the eigenvector Vc after the rotation is also referred to as the score reference value Vc[m] (1 ≤ m ≤ N) of each node. The score reference value Vc[m] is the score reference value of the m-th node and is used for scoring the m-th node. When the angle θ1 is zero, the score reference value Vc[m] (1 ≤ m ≤ N) of each node coincides with V1[m] (1 ≤ m ≤ N).

The function arg(x) used below in this specification may be understood as the argument of the complex number x on the complex plane. Here, x is, for example, Vc[m]. The angle θ[m] of the component Vc[m] shown in FIG. 11B, measured from the real axis toward the fourth quadrant on the complex plane, is equal to {2π - arg(Vc[m])}. The notation |x| used below means the absolute value of the complex number x. When x = Vc[m], |x| corresponds to the length L[m] of Vc[m] on the complex plane shown in FIG. 11B.

In S1050, the second scoring unit 143 calculates, as the score equivalent value Z[m] (1 ≤ m ≤ N) of each node, a value corresponding to the distance of the score reference value Vc[m] (1 ≤ m ≤ N) of the node from the real axis on the complex plane: Z[m] = L[m]·θ[m] = |Vc[m]|·{2π - arg(Vc[m])} (1 ≤ m ≤ N).

As another example, the score equivalent value Z[m] may be calculated according to the formula Z[m] = |Vc[m]|^d1·{2π - arg(Vc[m])}^d2. The values d1 and d2 are arbitrary real numbers greater than zero. The larger d1 is, the more the value of Z[m] increases with the scarcity of outlinks at each point on the way from the start node. The larger d2 is, the more the value of Z[m] increases with the distance from the start node.

The second scoring unit 143 then calculates the score of each node D[m] (1 ≤ m ≤ N) in the document network on the basis of the score equivalent value Z[m]. In S1050, the score X corresponding to the node D[m] is calculated according to X = Z[m] - Z0. Z0 is, for example, the minimum value of Z[m] over the entire document network. In this case, the score of the node D[m] having the smallest Z[m] is zero. Z0 may also be zero; that is, the Z0 term may be omitted.
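As a non-limiting illustration, the calculation of Z[m] and of the score X may be sketched as follows. The function name `scores_from_reference_values` and the flag `subtract_min` are hypothetical. `numpy.angle` returns values in (-π, π], so the sketch obtains θ[m] as the clockwise angle from the positive real axis; for components in the fourth quadrant this equals 2π - arg(Vc[m]). Treating a component lying exactly on the positive real axis as θ = 0 is an assumption of the sketch.

```python
import numpy as np


def scores_from_reference_values(Vc, d1=1.0, d2=1.0, subtract_min=True):
    """Compute the score X = Z[m] - Z0 from the score reference values Vc[m].

    d1, d2: exponents of the generalized formula (d1 = d2 = 1 gives
            Z[m] = |Vc[m]| * theta[m]).
    subtract_min: if True, Z0 is the minimum of Z[m]; otherwise Z0 = 0.
    """
    Vc = np.asarray(Vc, dtype=complex)
    length = np.abs(Vc)                        # L[m] = |Vc[m]|
    # Clockwise angle from the positive real axis, in [0, 2*pi).
    theta = np.mod(-np.angle(Vc), 2 * np.pi)   # theta[m]
    Z = length**d1 * theta**d2
    Z0 = Z.min() if subtract_min else 0.0
    return Z - Z0
```

With `subtract_min=True`, the node with the smallest Z[m] receives a score of zero, matching the case in which Z0 is the minimum of Z[m] over the document network.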

In S245, the score X of each node calculated in this way is output to the ranking unit 145 as the second score of each web page.

As another example, in S1020 the second scoring unit 143 may generate the special Hermitian adjacency matrix H2 shown in FIG. 12 instead of the special Hermitian adjacency matrix H1 described above. The special Hermitian adjacency matrix H2 shown in FIG. 12 is an example corresponding to the Hermitian adjacency matrix H shown in the upper part of FIG. 10.

When generating the special Hermitian adjacency matrix H2, the second scoring unit 143 can replace each component having the value +i in the Hermitian adjacency matrix H with the value C1(C2 + i) and each component having the value -i with the value C1(C2 - i). The second scoring unit 143 can further perform the following Process A and Process B.

(Process A)
The second scoring unit 143 can change the value C1(C2 + i) in the components of each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2 + i)/W}, obtained by dividing it by the number W of components having the value C1(C2 + i) or the value 1 in the same row, and can further change the value C1(C2 - i) in the component located symmetrically, with respect to the diagonal, to each component changed to the value {C1(C2 + i)/W} into the complex conjugate {C1(C2 - i)/W} of the value {C1(C2 + i)/W}.

(Process B)
The second scoring unit 143 can change the value C1(C2 - i) in the components of each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2 - i)Z}, obtained by multiplying it by the number Z of components having the value C1(C2 - i) or the value 1 in the same row, and can further change the value C1(C2 + i) in the component located symmetrically, with respect to the diagonal, to each component changed to the value {C1(C2 - i)Z} into the complex conjugate {C1(C2 + i)Z} of the value {C1(C2 - i)Z}.

The special Hermitian adjacency matrix H2 is generated by these replacements and changes. The second scoring unit 143 may execute Process B after Process A, may execute Process A after Process B, or may execute Process A and Process B concurrently. The same special Hermitian adjacency matrix H2 is generated regardless of the order in which Process A and Process B are executed.
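As a non-limiting illustration, the combined effect of Process A and Process B may be sketched as follows, under one reading of the description: a component for a one-way link from the node of row p to the node of row q ends up as C1(C2 + i)·Z[q]/W[p], and its mirrored component as the complex conjugate. The function name `special_hermitian_h2` is hypothetical, and `c1` and `c2` are supplied by the caller because their formulas are not reproduced in this passage.

```python
import numpy as np


def special_hermitian_h2(H, c1, c2):
    """Transform the Hermitian adjacency matrix H into H2 (Processes A and B)."""
    H = np.asarray(H, dtype=complex)
    is_out = np.isclose(H, 1j)     # component +i
    is_in = np.isclose(H, -1j)     # component -i
    is_mutual = np.isclose(H, 1)   # component 1
    W = (is_out | is_mutual).sum(axis=1)  # per row: count of +i and 1
    Z = (is_in | is_mutual).sum(axis=1)   # per row: count of -i and 1
    H2 = H.copy()
    for p, q in zip(*np.nonzero(is_out)):
        # Process A divides by W of row p; Process B multiplies by Z of row q.
        value = c1 * (c2 + 1j) * Z[q] / W[p]
        H2[p, q] = value
        H2[q, p] = np.conj(value)
    return H2
```

Because the division and the multiplication act as independent scalar factors on each mirrored pair of components, the result does not depend on the order of the two processes, consistent with the statement above.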

The value Z1 in the special Hermitian adjacency matrix H2 shown in FIG. 12 corresponds to the number of components that take the value -i or the value 1 among the components h(p3, 1), h(p3, 2), ..., h(p3, N) in the p3-th row. The value Z2 corresponds to the number of components that take the value -i or the value 1 among the N components h(p4, 1), h(p4, 2), ..., h(p4, N) in the p4-th row. The p3-th row may be understood as the row in which the value {C1(C2 - i)Z1/W1} appears in FIG. 12. The p4-th row may be understood as the row in which the value {C1(C2 - i)Z2/W2} appears in FIG. 12.

The second scoring unit 143 can execute the processing of S1030 to S1050 using the special Hermitian adjacency matrix H2 calculated in this way in place of the special Hermitian adjacency matrix H1.

In S250 (see FIG. 7), the second scoring unit 143 determines the number of layers J of the join nodes included in the document network to be processed. In the present embodiment, the join node that appears first when moving between nodes from a tip node along the direction of the links is defined as a 0th-layer join node.

The join node that appears next after a 0th-layer join node is defined as a 1st-layer join node, and the join node that appears next after a j-th-layer join node is defined as a (j+1)-th-layer join node (j is an integer equal to or greater than 0). According to this definition, when the document network contains join nodes up to the (J-1)-th layer, the number of layers of join nodes in the document network is J.

In the document network shown in FIG. 6, the 0th-layer join nodes are the third node and the twelfth node, the 1st-layer join node is the sixth node, the 2nd-layer join node is the seventh node, and the 3rd-layer join node is the fifteenth node. The number of layers J of the join nodes in the document network shown in FIG. 6 is 4.

この説明から理解できるように、先端ノードに依存して複数の層番号を採り得る結合ノードに関しては、採り得る層番号のうちの最大の層番号が、対応する結合ノードに割り当てられる。第７ノードは、第１層結合ノードではなく、第２層結合ノードである。 As can be understood from this explanation, for a join node that can take a plurality of layer numbers depending on the tip node, the highest layer number among the available layer numbers is assigned to the corresponding join node. The seventh node is not a first layer join node but a second layer join node.
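The layer numbering described above can be sketched as a longest-path count over the links. The following is only a minimal sketch, assuming the network is acyclic (a cycle such as the one handled later in S440 is not covered) and is given as a hypothetical dictionary `links` that maps each node to its outlink targets. The function name `join_layers` and the sample topology are illustrative and are not taken from FIG. 6.

```python
from collections import defaultdict, deque

def join_layers(links):
    """Return {join_node: layer} for an acyclic link graph (assumption)."""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    indeg = defaultdict(int)
    for ts in links.values():
        for t in ts:
            indeg[t] += 1
    is_join = {n: indeg[n] >= 2 for n in nodes}  # join node: two or more inlinks
    # before[n]: largest number of join nodes met on any path from a tip node to n
    before = {n: 0 for n in nodes}
    remaining = {n: indeg[n] for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)  # tip nodes have no inlinks
    while queue:  # Kahn-style topological sweep
        u = queue.popleft()
        for v in links.get(u, ()):
            before[v] = max(before[v], before[u] + (1 if is_join[u] else 0))
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    return {n: before[n] for n in nodes if is_join[n]}

# Hypothetical network: node 3 joins 1 and 2, node 6 joins 3 and 5, node 7 joins 3 and 6.
links = {1: [3], 2: [3], 3: [6, 7], 4: [5], 5: [6], 6: [7]}
layers = join_layers(links)
J = max(layers.values()) + 1  # number of layers of join nodes
```

In this hypothetical network node 7 can be reached directly from node 3 or via node 6, so taking the maximum places it one layer above node 6, which mirrors the rule that the largest candidate layer number is assigned.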

After the processing of S250, the second scoring unit 143 sets j = 0 (S260) and determines the subgraphs leading from the tip nodes to the j-th-layer (that is, 0th-layer) join nodes (S270).

In the example of the document network shown in FIG. 6, the subgraphs determined in S270 are, as shown in FIG. 13, a subgraph SG1 consisting of the 1st node and the 3rd node, a subgraph SG2 consisting of the 2nd node and the 3rd node, a subgraph SG3 consisting of the 10th, 11th, 12th, and 20th nodes, and a subgraph SG4 consisting of the 13th node and the 12th node. In S270, a subgraph is determined for each combination of a tip node and a 0th-layer join node.

The second scoring unit 143 executes processing similar to that shown in FIG. 8 for each of the subgraphs determined in S270 (S280). The score reference value and the score of each node in the subgraph are thereby calculated for each subgraph.

In S280, the second scoring unit 143 can select the determined subgraphs one at a time as the processing target and execute the processing shown in FIG. 8. Here, the subgraph being processed is treated in the same way as the "document network to be processed" in the description of FIG. 8, and the score reference value and the score of each node in the subgraph are calculated.

For example, when the subgraph being processed is the subgraph SG3 consisting of the 10th, 11th, 12th, and 20th nodes, a dummy node DP is added to each of the 12th node and the 20th node, which have no outlinks within this subgraph (see FIG. 14). A node may be given as many dummy nodes DP as the number of outlinks it had before the subgraph was extracted.

In S280, a Hermitian adjacency matrix H corresponding to the subgraph with the dummy nodes DP added in this way is generated. The score reference values and the scores of the 10th, 11th, 12th, and 20th nodes are calculated based on the special Hermitian adjacency matrix H1 or the special Hermitian adjacency matrix H2 corresponding to this Hermitian adjacency matrix H. The value Z0 described above may be set, for example, so that the score of the 10th node, which is the tip node, becomes zero.

After calculating the score reference values and the scores for each subgraph in S280, the second scoring unit 143 integrates the score reference values and the scores of each j-th-layer join node that is shared between subgraphs (S285).

In S285, the second scoring unit 143 combines the score reference values of a j-th-layer join node shared between subgraphs as described below, and calculates a single score reference value and a single score for each j-th-layer join node in the document network to be processed.

Specifically, for one j-th-layer join node, the second scoring unit 143 identifies, among that node's score reference values in the individual subgraphs, the one whose angle θ from the real axis on the complex plane is largest. It then rotates the node's score reference value in each subgraph on the complex plane so that it overlaps the score reference value having the largest angle θ.

The second scoring unit 143 vector-sums the score reference values overlapped on the complex plane and takes the resulting vector as the single score reference value for that j-th-layer join node. Based on this single score reference value Vx, it calculates the score equivalent value Zx = |Vx|^d1 · {2π − arg(Vx)}^d2 for the j-th-layer join node. The score X of the j-th-layer join node can be calculated according to X = Zx − Z0.
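The integration in S285 can be illustrated with a short sketch. It assumes that the angle θ is the argument of the value measured in [0, 2π), which is one reading consistent with the factor {2π − arg(Vx)}, and it uses placeholder defaults d1 = d2 = 1 and Z0 = 0. The function names and the sample values are hypothetical.

```python
import cmath
import math

def angle(v):
    """Angle from the positive real axis, taken in [0, 2*pi) (assumption)."""
    return cmath.phase(v) % (2 * math.pi)

def integrate(values, d1=1.0, d2=1.0, z0=0.0):
    """Merge one join node's per-subgraph score reference values into (Vx, X)."""
    theta = max(angle(v) for v in values)  # largest angle among the values
    # Rotating every value onto that direction and vector-summing them
    # gives a vector whose length is the sum of the individual lengths.
    vx = sum(abs(v) for v in values) * cmath.exp(1j * theta)
    zx = abs(vx) ** d1 * (2 * math.pi - theta) ** d2  # arg(Vx) equals theta here
    return vx, zx - z0

# Two hypothetical score reference values of the same join node.
vx, x = integrate([-1j, cmath.rect(2.0, math.pi)])
```

Because all rotated values point in the same direction, the combined magnitude grows with the number of subgraphs that reach the join node, while the angle is fixed by the subgraph with the largest θ.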

To give a single common score X to a j-th-layer join node shared between subgraphs, the value Z0 may be set to the minimum value of Z[m] in the subgraph in which that join node has the score reference value with the largest angle θ. Alternatively, the value Z0 may be zero, as described above.

As another example, for one j-th-layer join node, the second scoring unit 143 may determine the single score reference value by vector-summing the node's score reference values from the individual subgraphs on the complex plane without first overlapping them.

When the document network contains a plurality of j-th-layer join nodes, the second scoring unit 143 executes the above processing in S285 for each of them and calculates the score reference value Vx and the score X of each j-th-layer join node. In this way, for each j-th-layer join node, the second scoring unit 143 calculates a score reference value Vx that integrates that node's score reference values Vc across subgraphs, together with the corresponding score X.

After finishing the processing in S285, the second scoring unit 143 increments the value of the variable j by 1 (S290). In the subsequent S300, the second scoring unit 143 determines whether the value of the variable j is less than the number of layers J.

When it determines that the value of the variable j is greater than or equal to the number of layers J (No in S300), the second scoring unit 143 executes the processing of S410 (see FIG. 19). When it determines that the value of the variable j is less than the number of layers J (Yes in S300), the second scoring unit 143 executes the processing of S310 (see FIG. 15).

In S310, the second scoring unit 143 determines the subgraphs from the tip nodes to the j-th-layer join nodes. A subgraph determined here contains no other join node between the tip node and the j-th-layer join node. In S310, for each combination of a tip node and a j-th-layer join node, a subgraph containing that one tip node and that one join node is determined.

In the example of the document network shown in FIG. 6, the subgraph determined in S310 is, as shown in FIG. 16, the subgraph SG5 consisting of the 4th, 5th, and 6th nodes.

When the processing in S310 finds that such a subgraph exists (Yes in S320), the second scoring unit 143 executes processing similar to S280 for each of the subgraphs determined in S310, thereby calculating the score reference value and the score of each node in the subgraph for each subgraph (S330). The second scoring unit 143 then executes the processing of S340.

When the processing in S310 finds that no such subgraph exists (No in S320), the second scoring unit 143 skips S330 and executes the processing of S340.

In S340, the second scoring unit 143 sets the variable f to zero. In the subsequent S350, the second scoring unit 143 determines the subgraphs from the f-th-layer join nodes to the j-th-layer join nodes. A subgraph determined here contains no other join node between the f-th-layer join node and the j-th-layer join node.

In S350, for each combination of an f-th-layer join node and a j-th-layer join node, a subgraph containing that one f-th-layer join node and that one j-th-layer join node is determined. In the subgraph, the f-th-layer join node corresponds to the tip node and the j-th-layer join node corresponds to the rear-end node.

In the example of the document network shown in FIG. 6, when f = 0 and j = 1, the subgraph determined in S350 is the subgraph SG6 shown in FIG. 16, consisting of the 3rd node and the 6th node.

When the processing in S350 finds that no such subgraph exists (No in S360), the second scoring unit 143 executes the processing of S380 without executing the processing of S370.

When such a subgraph is found to exist (Yes in S360), the second scoring unit 143 executes processing similar to S280 for each of the subgraphs determined in S350, thereby calculating the score reference value and the score of each node in the subgraph for each subgraph (S370).

In S370, the second scoring unit 143 further corrects the calculated score reference value and score of each node in the subgraph according to the score reference value and the score already calculated for the f-th-layer join node.

The score reference value and the score of the f-th-layer join node in the subgraph have been calculated before the processing of S370. For example, the score reference value and the score of a 0th-layer join node (f = 0) are calculated in S285. In S370, the second scoring unit 143 corrects the score reference values and the scores of the remaining nodes in the subgraph, taking the already calculated score reference value and score of the f-th-layer join node as the reference.

In a first example of S370, when the score reference value of the f-th-layer join node calculated before the processing of S370 is Va, the second scoring unit 143 corrects the score reference value of each node in the subgraph as follows.

The second scoring unit 143 rotates the uncorrected score reference values of the nodes calculated in S370 on the complex plane so that the score reference value of the f-th-layer join node coincides with Va. The score reference value of each node after this rotation is taken as its corrected score reference value.

The second scoring unit 143 can calculate the corrected score X of each node in the subgraph based on the corrected score reference value Vc of that node. The score X may be calculated so that the score X of the f-th-layer join node remains the same as its score before the correction.

In a second example of S370, when the score of the f-th-layer join node calculated before the processing of S370 is Xa, the second scoring unit 143 corrects the score of each node in the subgraph as follows.

The second scoring unit 143 adds, to the uncorrected score of each node calculated in S370, the difference between Xa and the uncorrected score of the f-th-layer join node calculated in S370. The scores of the nodes in the subgraph are thereby corrected so that the score of the f-th-layer join node matches Xa.
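The two correction examples for S370 can be sketched as follows. This is a minimal sketch under stated assumptions: the rotation is taken to align the argument of the f-th-layer join node's value with that of Va while leaving magnitudes unchanged, and the names `rotate_onto`, `shift_scores`, and `anchor` (the f-th-layer join node), as well as the sample values, are hypothetical.

```python
import cmath

def rotate_onto(values, anchor, va):
    """First example: rotate all values so the anchor's argument equals arg(va)."""
    rot = cmath.exp(1j * (cmath.phase(va) - cmath.phase(values[anchor])))
    return {n: v * rot for n, v in values.items()}

def shift_scores(scores, anchor, xa):
    """Second example: add (xa - anchor's score) to every score in the subgraph."""
    delta = xa - scores[anchor]
    return {n: s + delta for n, s in scores.items()}

# Hypothetical subgraph with nodes 3 (f-th-layer join node) and 6.
rotated = rotate_onto({3: 1 + 0j, 6: 0.5 - 0.5j}, anchor=3, va=-1j)
shifted = shift_scores({3: 0.0, 6: 1.2}, anchor=3, xa=2.0)
```

Both variants keep the relationships inside the subgraph intact (relative angles in the first, score differences in the second) and only re-anchor them to the value already fixed for the join node.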

After finishing the processing in S370, the second scoring unit 143 increments the value of the variable f by 1 (S380). The second scoring unit 143 then determines whether the value of the variable f is less than the value of the variable j (S390). If the determination is affirmative (Yes in S390), the second scoring unit 143 executes the processing of S350. If the determination is negative (No in S390), the second scoring unit 143 executes the processing of S400.

In S400, for each j-th-layer join node shared between the subgraphs determined in S310 and S350, the second scoring unit 143 integrates that node's per-subgraph score reference values and scores calculated in S330 and S370, in the same way as in S285.

That is, for each j-th-layer join node, the second scoring unit 143 calculates a score reference value Vx that integrates the node's per-subgraph score reference values Vc, and calculates the score X = Zx − Z0 based on the score equivalent value Zx corresponding to Vx.

The second scoring unit 143 then increments the value of the variable j by 1 (S290) and executes the processing of S300 to S400. By repeating S300 to S400 while incrementing the value of the variable j, the second scoring unit 143 calculates the score reference value and the score of each node up to the (J−1)-th-layer join nodes.

Once the score reference values and the scores of the nodes up to the (J−1)-th-layer join nodes have been calculated, the second scoring unit 143 makes a negative determination in S300 and executes the processing of S410 (see FIG. 19).

The flow of processing before S410 is now described concretely using the example of the document network shown in FIG. 6. When j = 0, the second scoring unit 143 calculates the score reference values and the scores of the 1st, 2nd, and 3rd nodes by executing the processing for the subgraph SG1 (see FIG. 13) and the processing for the subgraph SG2.

The second scoring unit 143 further calculates the score reference values and the scores of the 10th, 11th, 12th, 13th, and 20th nodes by executing the processing for the subgraph SG3 and the processing for the subgraph SG4.

Then, in the pass for j = 1, the second scoring unit 143 executes the processing for the subgraph SG5 (see FIG. 16) and the processing for the subgraph SG6, and calculates the score reference values and the scores of the 4th, 5th, and 6th nodes.

In the pass for j = 2, the second scoring unit 143 executes the processing for the subgraph SG7 (see FIG. 17) and the processing for the subgraph SG8, and calculates the score reference value and the score of the 7th node.

In the pass for j = 3, the second scoring unit 143 executes the processing for the subgraph SG9 (see FIG. 18) and the processing for the subgraph SG10, and calculates the score reference values and the scores of the 8th, 14th, and 15th nodes.

In S410 (see FIG. 19), the second scoring unit 143 determines acyclic subgraphs that start at a join node and end at a rear-end node that is not a join node. In the example of the document network shown in FIG. 6, the subgraph determined in S410 is the subgraph SG11 shown in FIG. 20, consisting of the 15th, 16th, 17th, 18th, and 19th nodes.

When the processing in S410 finds that no such subgraph exists (No in S420), the second scoring unit 143 executes the processing of S440 without executing the processing of S430. When such a subgraph is found to exist (Yes in S420), the second scoring unit 143 calculates, for each subgraph determined in S410, the score reference value and the score of each node in the subgraph (S430).

In S430, as in S370, the second scoring unit 143 corrects the score reference value and the score of each node in the subgraph based on the already calculated score and score reference value of the join node. The score and the score reference value of each node in the subgraph are determined in this way.

In the subsequent S440, the second scoring unit 143 determines cyclic subgraphs. In the example of the document network shown in FIG. 6, the subgraph determined in S440 is the subgraph SG12 shown in FIG. 21, consisting of the 6th, 7th, 8th, and 9th nodes.

When no cyclic subgraph exists (No in S450), the second scoring unit 143 executes the processing of S500 without executing the processing of S460-S490. When a cyclic subgraph exists (Yes in S450), the second scoring unit 143 calculates, in S460-S490, the score reference value and the score of each node in each subgraph determined in S440. After executing the processing of S470-S480 for all of the subgraphs determined in S440 (Yes in S490), the second scoring unit 143 executes the processing of S500.

In S460, the second scoring unit 143 selects one of the subgraphs determined in S440. In S470, the second scoring unit 143 determines an additive score common to all nodes in the selected subgraph, based on the set of scores of the nodes in that subgraph whose scores have already been calculated.

In a first example of S470, the second scoring unit 143 takes the maximum value of that set of scores as the additive score. In a second example of S470, the second scoring unit 143 takes the average value of that set of scores as the additive score.

In S480, the second scoring unit 143 corrects the score of each node in the selected subgraph by adding the determined additive score to it. For a node whose score has not been calculated before the addition, the score can be regarded as zero and the additive score added to it.
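The handling of one cyclic subgraph in S470 and S480 can be sketched as below. The function name `add_cycle_score`, the `mode` parameter, and the sample scores are hypothetical, and returning an additive score of zero when no node in the cycle has a score yet is an assumption that the text does not address.

```python
def add_cycle_score(scores, cycle_nodes, mode="max"):
    """Return a copy of `scores` with a common additive score applied to a cycle."""
    known = [scores[n] for n in cycle_nodes if n in scores]
    if not known:
        bonus = 0.0  # assumption: nothing to add when no score is known yet
    elif mode == "max":
        bonus = max(known)  # first example of S470
    else:
        bonus = sum(known) / len(known)  # second example of S470
    updated = dict(scores)
    for n in cycle_nodes:
        # A node without a score is treated as having score zero (S480).
        updated[n] = scores.get(n, 0.0) + bonus
    return updated

# Hypothetical scores; node 9 lies on the cycle but has not been scored yet.
scores = {6: 1.0, 7: 2.0, 8: 3.0}
by_max = add_cycle_score(scores, [6, 7, 8, 9], mode="max")
by_mean = add_cycle_score(scores, [6, 7, 8, 9], mode="mean")
```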

After correcting the scores of the nodes in each subgraph in this way, the second scoring unit 143 executes the processing of S500. The scores of all nodes in the document network have been determined by the time the processing of S500 is executed.

In S500, the second scoring unit 143 outputs the determined score of each node in the document network to the ranking unit 145 as the second score of each web page. The score calculation processing then ends.

The information processing system 1 of the first embodiment described above may be modified as follows. As a first modification, the second scoring unit 143 may generate the Hermitian adjacency matrix H and calculate the score reference values and the scores without placing dummy nodes DP on nodes that have no outlinks.

As a second modification, the second scoring unit 143 may correct the score reference value Vc[m] of each node to a value from which the influence of the node's outlink destinations has been removed, and may use the corrected score reference value Vc*[m] in place of the uncorrected score reference value Vc[m] to calculate the score equivalent value Z[m] = |Vc*[m]|^d1 · {2π − arg(Vc*[m])}^d2 for each node.

As a third modification, the second scoring unit 143 may correct the score reference value Vc[m] of each node to a value from which the influence of the node's inlink sources has been removed, and may use the corrected score reference value Vc*[m] in place of the uncorrected score reference value to calculate the score equivalent value Z[m] = |Vc*[m]|^d1 · {2π − arg(Vc*[m])}^d2 for each node. Like the first modification, the second and third modifications may be implemented by generating the Hermitian adjacency matrix H without placing dummy nodes DP on nodes that have no outlinks.

According to the information processing system 1 of the present embodiment, a plurality of web pages are scored using the special Hermitian adjacency matrices H1 and H2 corresponding to the Hermitian adjacency matrix H, which expresses the connection relationships between web pages with the four values 1, 0, +i, and −i. It is therefore unnecessary to posit virtual connections from every web page to every other web page, and the scoring and ranking of each web page based on the connection relationships between web pages can be realized more appropriately than before.
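As an illustration of a four-valued link matrix of this kind, the sketch below builds a matrix with entries in {1, 0, +i, −i} from a list of directed links. The sign convention used here (+i for a one-way link seen from its source, −i seen from its target, 1 for links in both directions) is an assumption made for the example and may differ from the convention defined for the matrix H in this description; the function names are hypothetical.

```python
def hermitian_adjacency(n, links):
    """Build an n x n matrix with entries in {0, 1, +i, -i} (sign convention assumed)."""
    link_set = set(links)
    h = [[0j] * n for _ in range(n)]
    for (u, v) in link_set:
        if (v, u) in link_set:
            h[u][v] = 1 + 0j  # links in both directions
        else:
            h[u][v] = 1j      # one-way link u -> v, seen from u (assumed sign)
            h[v][u] = -1j     # the same link seen from v
    return h

def is_hermitian(h):
    """Check that the matrix equals its own conjugate transpose."""
    n = len(h)
    return all(h[r][c] == h[c][r].conjugate() for r in range(n) for c in range(n))

# Hypothetical three-page network: 0 -> 1, and 1 <-> 2.
H = hermitian_adjacency(3, [(0, 1), (1, 2), (2, 1)])
```

Because the direction of each link is encoded in the sign of the imaginary entry, the matrix stays Hermitian without adding any artificial links between unconnected pages.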

According to the present embodiment, the scoring and ranking of a plurality of web pages can be carried out appropriately using the Hermitian adjacency matrix H even in a document network that has join nodes. The output unit 147 can therefore provide the user terminal 5 with an appropriate search result list that reflects the connection relationships between web pages, based on the second scores from the second scoring unit 143.

[Second Embodiment]
Next, the information processing system 1 of the second embodiment is described. The information processing system 1 of the second embodiment is configured in the same way as that of the first embodiment, except that score calculation processing different from that of the first embodiment is executed in S130. Accordingly, only the score calculation processing that the second scoring unit 143 executes in S130 is described below. Any configuration of the information processing system 1 of the second embodiment that is not mentioned below may be understood to be the same as in the first embodiment.

In the second embodiment, when calculating in S130 the second score of each node included in the document network to be processed selected in S120, the second scoring unit 143 executes the score calculation processing shown in FIG. 22.

On starting the score calculation processing shown in FIG. 22, the second scoring unit 143 determines the tip nodes, that is, the nodes in the document network to be processed that have no inlinks (S610). For example, when the document network to be processed is the one illustrated in FIG. 23, the second scoring unit 143 determines the 1st node and the 8th node to be tip nodes.

The second scoring unit 143 then selects one of the tip nodes included in the document network (S620) and determines the subgraph containing the selected tip node (S630). The subgraph determined consists of the selected tip node and all nodes in the document network that can be reached from this tip node by following the direction of the links.

In the example of the document network shown in FIG. 23, when the selected tip node is the 1st node, S630 determines the subgraph SG21 consisting of the 1st to 9th nodes except the 8th node, as shown in FIG. 24A. When the selected tip node is the 8th node, S630 determines the subgraph SG22 consisting of the 3rd to 9th nodes, as shown in FIG. 24B.
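S610 and S630 amount to finding the nodes without inlinks and then collecting everything reachable from each of them. The sketch below shows one way to do this with a breadth-first traversal. The `links` dictionary is a hypothetical topology chosen only so that the resulting node sets agree with those named above; it is not the actual network of FIG. 23, and the function names are illustrative.

```python
from collections import deque

def tip_nodes(links):
    """Nodes that have no inlinks (S610)."""
    targets = {t for ts in links.values() for t in ts}
    nodes = set(links) | targets
    return sorted(nodes - targets)

def reachable_subgraph(links, tip):
    """The tip node plus every node reachable by following links (S630)."""
    seen = {tip}
    queue = deque([tip])
    while queue:
        u = queue.popleft()
        for v in links.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

# Hypothetical topology: a chain 1 -> 2 -> 3 -> ... -> 7 -> 9, with 8 -> 3 joining at node 3.
links = {1: [2], 2: [3], 8: [3], 3: [4], 4: [5], 5: [6], 6: [7], 7: [9]}
tips = tip_nodes(links)
sg21 = reachable_subgraph(links, 1)
sg22 = reachable_subgraph(links, 8)
```

The `seen` set also keeps the traversal finite when the network contains a cycle, since each node is queued at most once.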

The second scoring unit 143 then executes processing similar to that shown in FIG. 8, treating the subgraph determined in S630 as the document network to be processed, and calculates the score reference value and the score of each node in the subgraph (S640).

In S640, the second scoring unit 143 can generate the Hermitian adjacency matrix H and calculate the score reference values and the scores without placing dummy nodes DP at rear-end nodes. The second scoring unit 143 can calculate the score reference value of each node in the subgraph so that the score reference value of the tip node in the subgraph is placed, on the complex plane, at the point with value 1 on the real axis or at a specific point in the fourth quadrant close to the real axis. The specific point may be the point obtained by rotating the point with value 1 on the real axis toward the fourth quadrant by an angle θ1.

In S650 following S640, the second scoring unit 143 determines whether every tip node has been selected and processed in S640. If the determination in S650 is negative, the second scoring unit 143 changes the selected tip node (S620) and determines the subgraph of the newly selected tip node (S630). Then, based on the Hermitian adjacency matrix H of the determined subgraph, it calculates the score reference value and the score of each node in the subgraph (S640).

In this way, for each subgraph corresponding to one of the tip nodes included in the document network, the second scoring unit 143 calculates the score reference value and the score of each node based on the corresponding Hermitian adjacency matrix H. In other words, the second scoring unit 143 determines a subgraph for each inlink of the join node and, for each subgraph, calculates the score reference value and the score of each node based on the corresponding Hermitian adjacency matrix H. After that, the second scoring unit 143 makes an affirmative determination in S650 and executes the processing of S660.

A subgraph contains nodes that also belong to other subgraphs. In S640, a score reference value and a score are calculated for each such overlapping node separately in each subgraph.

In S660, the second scoring unit 143 calculates, as the second score of each node in the document network, a value that integrates the scores of that node across the subgraphs. Specifically, the second scoring unit 143 calculates the second score of a node as the sum of the node's scores in the individual subgraphs.

Alternatively, the second scoring unit 143 may calculate the second score of a node based on the composite vector of the node's score reference values in the individual subgraphs. For each node, the second scoring unit 143 can use the composite vector of the node's score reference values in the subgraphs as the single score reference value Vx of that node and calculate the score equivalent value Zx corresponding to Vx according to the formula Zx = |Vx|^d1 · {2π − arg(Vx)}^d2. The second scoring unit 143 can output the calculated score equivalent value Zx as the second score of the corresponding node.
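The two integration options of S660 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the default exponents d1 = d2 = 1 are arbitrary, and the argument is assumed to be measured in the range [0, 2π), a convention this passage does not state explicitly.

```python
import cmath
import math

def sum_scores(per_subgraph_scores):
    """Option 1: the second score is the sum of the node's per-subgraph scores."""
    return sum(per_subgraph_scores)

def composite_score(reference_values, d1=1.0, d2=1.0):
    """Option 2: Zx = |Vx|^d1 * (2*pi - arg(Vx))^d2, where Vx is the composite
    (vector sum) of the node's per-subgraph score reference values."""
    vx = sum(reference_values)
    # Assumption: arg(Vx) is taken in [0, 2*pi), so a point just below the
    # positive real axis has an argument slightly less than 2*pi.
    angle = cmath.phase(vx) % (2 * math.pi)
    return abs(vx) ** d1 * (2 * math.pi - angle) ** d2

# Two subgraphs give the same reference value, rotated 45 degrees into the
# fourth quadrant; the composite vector has length 2 and the same direction.
refs = [cmath.exp(-1j * math.pi / 4), cmath.exp(-1j * math.pi / 4)]
z = composite_score(refs)
total = sum_scores([0.5, 1.5])
```

In this example the composite vector has magnitude 2 and lies π/4 below the real axis, so with d1 = d2 = 1 the score equivalent value is 2 · π/4.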

According to the second embodiment described above as well, the information processing system 1 can appropriately perform scoring and ranking of a plurality of web pages belonging to a document network having a join node by using the Hermitian adjacency matrix H.

The second embodiment may be modified in the same manner as the first embodiment. That is, the modifications of the score calculation described as the first, second, and third modifications of the first embodiment may be applied to the second embodiment.

[Third Embodiment]
Next, the information processing system 1 of the third embodiment will be described. The information processing system 1 of the third embodiment is configured in the same manner as that of the first embodiment, except that a score calculation process different from that of the first embodiment is executed in S130. Therefore, only the score calculation process executed by the second scoring unit 143 in S130 is described below. Any configuration of the information processing system 1 of the third embodiment that is not mentioned below may be understood to be the same as in the first embodiment.

In the third embodiment, the second scoring unit 143 executes the score calculation process shown in FIG. 25 when calculating, in S130, the second score of each node included in the document network selected for processing in S120.

When the score calculation process shown in FIG. 25 starts, the second scoring unit 143 determines the tip nodes having no inlink and the rear-end nodes having no outlink that are included in the document network to be processed (S710).

When there is no rear-end node without an outlink, the second scoring unit 143 formally determines as rear-end nodes those nodes for which the total number of outlinks and inlinks when the document network is represented as a directed graph differs from the number of links when the document network is represented as an undirected graph. If no such node exists, the second scoring unit 143 adds to the document network a dummy node DP having inlinks from all the nodes in the document network and determines that dummy node DP to be the rear-end node.

When the document network to be processed is the one illustrated in FIG. 23, the second scoring unit 143 determines the first and eighth nodes to be tip nodes and the fifth, seventh, and ninth nodes to be rear-end nodes. When the document network to be processed is the one illustrated in FIGS. 27 and 28, the second scoring unit 143 determines the first node to be the tip node and the ninth node to be the rear-end node.

After that, the second scoring unit 143 determines a subgraph for each combination of a tip node and a rear-end node included in the document network (S720). Each subgraph consists of the group of nodes that can be traversed from the tip node to the rear-end node by following links in their direction. The subgraph for each combination of tip node and rear-end node may also be understood as the subgraph for each combination of an inlink and an outlink of the join node.
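One way to picture S720 is to intersect the nodes reachable from a tip node with the nodes that can reach a rear-end node. The sketch below does this under stated assumptions: the helper names and the edge list are hypothetical (the edges are only chosen to be consistent with the tip and rear-end nodes described for FIG. 23), and the sketch covers only the per-pair case. It does not split a pair further by individual outlinks of a join node, as the FIG. 27 example does.

```python
def reach(adjacency, start):
    """Return all nodes reachable from `start` in the given adjacency map."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def pair_subgraphs(edges):
    """Map each (tip, rear-end) pair to the nodes lying on some directed path
    from the tip node to the rear-end node."""
    forward, backward, nodes = {}, {}, set()
    for src, dst in edges:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)
        nodes.update((src, dst))
    tips = sorted(n for n in nodes if n not in backward)   # no inlink
    rears = sorted(n for n in nodes if n not in forward)   # no outlink
    result = {}
    for tip in tips:
        downstream = reach(forward, tip)
        for rear in rears:
            if rear in downstream:
                result[(tip, rear)] = downstream & reach(backward, rear)
    return result

# Hypothetical links; an assumption, not the actual content of FIG. 23.
EDGES = [(1, 2), (2, 3), (8, 3), (3, 4), (4, 5), (3, 6), (6, 7), (3, 9)]
subgraphs = pair_subgraphs(EDGES)
```

With the assumed links this produces six subgraphs, three per tip node, which mirrors the count of subgraphs SG31 to SG36 described for FIG. 23.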

According to the example document network shown in FIG. 23, in S720 the second scoring unit 143 determines the three subgraphs SG31, SG32, and SG33 shown in FIG. 26A, whose tip node is the first node, and the three subgraphs SG34, SG35, and SG36 shown in FIG. 26B, whose tip node is the eighth node.

According to the example document network shown in FIG. 27, in S720 the second scoring unit 143 determines two subgraphs, each having the first node as its tip node and the ninth node as its rear-end node: the subgraph shown in FIG. 29A, which passes through the first outlink of the third node, and the subgraph shown in FIG. 29B, which passes through the second outlink of the third node.

After that, the second scoring unit 143 selects one of the determined subgraphs (S730), treats the selected subgraph as the document network to be processed, executes processing similar to that shown in FIG. 8, and calculates the first provisional score of each node in the subgraph (S740).

In S740, the second scoring unit 143 can calculate the first provisional score Xp1[m] of the m-th node according to the formula Xp1[m] = {(2π − arg(Vc[m])) / (π/2n)}^d3. Vc[m] is the score reference value of the m-th node, and d3 is an arbitrary real number greater than 0. The larger d3 is, the more strongly the first provisional score Xp1[m] increases with the distance (number of links) from the tip node having no inlink.

Until the processing of S740 has been executed for all the subgraphs (No in S750), the second scoring unit 143 selects the subgraphs one by one (S730) and executes the processing of S740. In this way, the second scoring unit 143 calculates, for each subgraph, the first provisional score of each node in that subgraph (S740).

When the processing of S740 has been executed for all the subgraphs (Yes in S750), the second scoring unit 143 calculates, for each node in the document network, the average of that node's first provisional scores as the node's second provisional score (S760). The second provisional score of a node is the sum of the node's first provisional scores divided by the number of subgraphs to which the node belongs.

After that, the second scoring unit 143 calculates the third provisional score of each node based on the second provisional score of each node in the document network (S770). In S770, the second scoring unit 143 calculates the third provisional score Xp3[m] of the m-th node from the second provisional score Xp2[m] and the score reference value Vc[m] of the m-th node according to the formula Xp3[m] = Xp2[m] · |Vc[m]|.

After that, the second scoring unit 143 calculates the second score of each node in the document network using the third provisional score of each node (S780). Specifically, the second scoring unit 143 calculates the second score of the m-th node in the document network as the value Xp3[m] / M^d4, obtained by dividing the third provisional score Xp3[m] by the value M^d4. Here, M is the product of the numbers of outlinks of the nodes that can reach the m-th node from the rear-end node.
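The chain from S760 to S780 can be summarized in a few lines. This is a sketch under assumptions, not the patented implementation: the function name `second_scores` is hypothetical, and the per-node inputs (the first provisional scores per subgraph, the magnitude |Vc[m]|, and the outlink product M) are taken as already computed rather than derived here.

```python
def second_scores(first_scores, abs_vc, m_product, d4=1.0):
    """Combine per-subgraph first provisional scores into second scores.

    first_scores: node -> list of first provisional scores, one per subgraph
                  the node belongs to
    abs_vc:       node -> |Vc[m]| (magnitude of the score reference value)
    m_product:    node -> M (product of outlink counts, assumed precomputed)
    d4:           exponent applied to M (hypothetical default)
    """
    result = {}
    for node, scores in first_scores.items():
        xp2 = sum(scores) / len(scores)               # S760: average over subgraphs
        xp3 = xp2 * abs_vc[node]                      # S770: Xp3 = Xp2 * |Vc|
        result[node] = xp3 / m_product[node] ** d4    # S780: Xp3 / M^d4
    return result

# Hypothetical inputs for a single node "a" that belongs to two subgraphs.
scores = second_scores({"a": [2.0, 4.0]}, {"a": 0.5}, {"a": 3})
```

Here the average of 2.0 and 4.0 is 3.0, multiplying by 0.5 gives 1.5, and dividing by M = 3 gives 0.5.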

According to the third embodiment described above as well, scoring and ranking of a plurality of web pages belonging to a document network having a join node can be performed appropriately by using the Hermitian adjacency matrix H.

[Fourth Embodiment]
Next, the information processing system 1 of the fourth embodiment will be described. The information processing system 1 of the fourth embodiment differs from the first embodiment in that the second scoring unit 143 executes the sub-processing shown in FIG. 30 instead of the sub-processing shown in FIG. 8. In other respects, the information processing system 1 of the fourth embodiment is basically the same as that of the first embodiment. Therefore, the following description focuses on the configurations of the fourth embodiment that differ from the first embodiment and omits the description of the configurations they share.

The sub-processing shown in FIG. 30 is executed in S240. In addition, in S280, S330, S370, and S430, processing similar to the sub-processing shown in FIG. 30 is executed when calculating the score reference values and scores.

When the sub-processing shown in FIG. 30 starts, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1110), in the same manner as in S1010.

After that, the second scoring unit 143 generates a special Hermitian adjacency matrix H3 by modifying the generated Hermitian adjacency matrix H (S1120). The special Hermitian adjacency matrix H3 is obtained by replacing all the diagonal components of the special Hermitian adjacency matrix H1 generated in S1020 from the value 0 to the value −1.

That is, the second scoring unit 143 can generate the special Hermitian adjacency matrix H1 by modifying the Hermitian adjacency matrix H generated in S1110 in the same manner as in S1020, and can then generate the special Hermitian adjacency matrix H3 by replacing all the diagonal components of H1 from the value 0 to the value −1.

In the subsequent S1130, the second scoring unit 143 generates a column vector B. The special Hermitian adjacency matrix H3 generated in S1120 is a matrix of N rows and N columns (N×N), where N is the number of nodes of the document network to be processed.

The column vector B generated in S1130 corresponds to a matrix of N rows and 1 column. It is generated so that each component takes a value that depends on whether the corresponding node has an inlink.

Specifically, the second scoring unit 143 generates the column vector B by setting the components corresponding to nodes without an inlink to the value −1 and the components corresponding to nodes with an inlink to the value 0.

In the subsequent S1140, the second scoring unit 143 solves the following simultaneous equations, which include the special Hermitian adjacency matrix H3 and the column vector B, to obtain each component u[m] (1 ≤ m ≤ N) of the score reference vector U corresponding to the solution. Like the column vector B, the score reference vector U corresponds to a matrix of N rows and 1 column. In the equation below, the m-th row component of the score reference vector U is denoted by u[m], the m-th row component of the column vector B is denoted by b[m], and the special Hermitian adjacency matrix H3 is denoted by H'.
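The equation itself is not reproduced in this text. Judging from the symbols defined above, and from the statement in the fifth embodiment that the component for a tip node without an inlink equals 1, the system presumably has the following form. This is a reconstruction under that assumption, and the component notation h'(m, k) for the entries of H' is introduced here only for illustration.

$$
H'\,U = B, \qquad \text{that is,} \qquad \sum_{k=1}^{N} h'(m,k)\,u[k] = b[m] \quad (1 \le m \le N)
$$

For a node m without an inlink whose row of H' has no nonzero off-diagonal entries, as is the case for H4 in the fifth embodiment, the row reduces to −u[m] = −1 because the diagonal component is −1 and b[m] = −1, which gives u[m] = 1.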

For example, assume a simple illustrative document network with five nodes in which only the first node has no inlink. In this case, the simultaneous equations can be expressed as follows using an exemplary special Hermitian adjacency matrix H3.

In the subsequent S1150, the second scoring unit 143 determines the score reference value Vc[m] of each node based on each component u[m] of the score reference vector U calculated in S1140.

As with the eigenvector V1 in the first embodiment, the second scoring unit 143 can correct each component u[m] (1 ≤ m ≤ N) of the score reference vector U by dividing it by the component E = u[s] corresponding to the start node of the document network. The corrected value u[m]/E (1 ≤ m ≤ N) can be further corrected by rotating it on the complex plane by the angle θ1 from the real axis toward the fourth quadrant. The resulting value can be determined as the score reference value Vc[m].

In the subsequent S1160, the second scoring unit 143 calculates, as the score equivalent value Z[m] (1 ≤ m ≤ N) of each node, the value Z[m] = |Vc[m]| · {2π − arg(Vc[m])} based on the score reference value Vc[m] (1 ≤ m ≤ N) of that node. As in the first embodiment, the score equivalent value Z[m] may instead be calculated according to the formula Z[m] = |Vc[m]|^d1 · {2π − arg(Vc[m])}^d2.

The second scoring unit 143 further calculates the score X of each node D[m] (1 ≤ m ≤ N) in the document network based on the score equivalent value Z[m] (S1160). As in the first embodiment, the second scoring unit 143 can calculate the score X corresponding to the node D[m] according to X = Z[m] − Z0. Z0 is, for example, the minimum value of Z[m] over the entire document network. Z0 may also be zero.
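The calculation in S1160 can be sketched as follows. The function names are hypothetical, and the argument is assumed to be measured in [0, 2π), which this passage does not state explicitly. The score reference values in the example are made-up inputs, not results from the patent.

```python
import cmath
import math

def score_equivalents(vc, d1=1.0, d2=1.0):
    """Z[m] = |Vc[m]|^d1 * (2*pi - arg(Vc[m]))^d2 for each score reference value."""
    values = []
    for v in vc:
        angle = cmath.phase(v) % (2 * math.pi)  # assumption: arg in [0, 2*pi)
        values.append(abs(v) ** d1 * (2 * math.pi - angle) ** d2)
    return values

def scores(vc, z0=None, d1=1.0, d2=1.0):
    """X = Z[m] - Z0, where Z0 defaults to the minimum Z[m] in the network."""
    z = score_equivalents(vc, d1, d2)
    base = min(z) if z0 is None else z0
    return [value - base for value in z]

# Hypothetical reference values: unit length at 0.1 rad below the real axis,
# and length 2 at 0.5 rad below the real axis.
vc = [cmath.exp(-0.1j), 2 * cmath.exp(-0.5j)]
x = scores(vc)
```

With these inputs Z is approximately [0.1, 1.0], so subtracting the minimum gives scores of approximately 0 and 0.9. Passing `z0=0` reproduces the variant in which Z0 is zero.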

In this embodiment, the score X of each node is calculated in this way. The calculated score X is handled in the same way as in the first embodiment. According to this embodiment, the score X of each node can be calculated using the Hermitian adjacency matrix H without calculating eigenvalues and eigenvectors as in the first embodiment.

[Fifth Embodiment]
Next, the information processing system 1 of the fifth embodiment will be described. The information processing system 1 of the fifth embodiment differs from the first embodiment in that the second scoring unit 143 executes the sub-processing shown in FIG. 31 instead of the sub-processing shown in FIG. 8. In other respects, the information processing system 1 of the fifth embodiment is basically the same as that of the first embodiment. Therefore, the following description focuses on the configurations of the fifth embodiment that differ from the first embodiment and omits the description of the configurations they share.

The sub-processing shown in FIG. 31 is executed in S240. In addition, in S280, S330, S370, and S430, processing similar to the sub-processing shown in FIG. 31 is executed when calculating the score reference values and scores.

When the sub-processing shown in FIG. 31 starts, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1210), in the same manner as in S1010.

After that, the second scoring unit 143 generates a special Hermitian adjacency matrix H4 by modifying the generated Hermitian adjacency matrix H (S1220). To generate the special Hermitian adjacency matrix H4, the second scoring unit 143 can replace all components of value +i in the Hermitian adjacency matrix H with the value 0. Furthermore, it can replace all components of value −i with the value C1(C2 − i). By this replacement, the exemplary Hermitian adjacency matrix H shown in the upper part of FIG. 32 is converted into the matrix H^(1) shown in the lower part of FIG. 32.

Furthermore, in the matrix H^(1), the second scoring unit 143 replaces the components of value C1(C2 − i) with the value C1(C2 − i)/R in every column corresponding to a node having two or more outlinks, in other words, every column in which the Hermitian adjacency matrix H has two or more components of value −i. The value R is the number of outlinks and corresponds to the number of components having the value −i in the corresponding column of the Hermitian adjacency matrix H. By this replacement, the matrix H^(1) shown in the lower part of FIG. 32 is converted into the matrix H^(2) shown in the upper part of FIG. 33.

Furthermore, as shown in the lower part of FIG. 33, the second scoring unit 143 replaces all the diagonal components of the matrix H^(2) with the value −1 to generate the special Hermitian adjacency matrix H4. In the subsequent S1230, the second scoring unit 143 generates the column vector B in the same manner as in S1130.

After that, in the same manner as in S1140, the second scoring unit 143 solves the simultaneous equations including the special Hermitian adjacency matrix H4 and the column vector B to obtain each component u[m] (1 ≤ m ≤ N) of the score reference vector U corresponding to the solution (S1240).

The processing in S1240 is the same as the processing in S1140 except that the special Hermitian adjacency matrix H4 is used instead of the special Hermitian adjacency matrix H3. In the solution of the simultaneous equations, the component corresponding to a tip node having no inlink takes the real value 1.
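Steps S1210 to S1240 can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: the system is taken to have the form H4 · U = B, the entries of H follow the definition given in the claims (+i for a one-way link from D[p] to D[q], −i for the reverse, 1 for mutual links), the constants C1 = C2 = 1 are arbitrary placeholder values, all function names are hypothetical, and the Gauss-Jordan solver is a generic stand-in rather than a method named in the source.

```python
def hermitian_adjacency(n, edges):
    """Build H: +i for a one-way link p->q, -i for q->p, 1 for mutual links."""
    links = set(edges)
    h = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            fwd, back = (p, q) in links, (q, p) in links
            if fwd and back:
                h[p][q] = 1 + 0j
            elif fwd:
                h[p][q] = 1j
            elif back:
                h[p][q] = -1j
    return h

def special_h4(h, c1, c2):
    """+i -> 0, -i -> C1(C2 - i)/R per column, diagonal -> -1."""
    n = len(h)
    # R for column q: number of -i entries in that column (outlinks of node q).
    r = [sum(1 for p in range(n) if h[p][q] == -1j) for q in range(n)]
    h4 = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                h4[p][q] = -1 + 0j
            elif h[p][q] == -1j:
                h4[p][q] = c1 * (c2 - 1j) / r[q]   # dividing by R = 1 is a no-op
            elif h[p][q] != 1j:
                h4[p][q] = h[p][q]                 # keep other entries (0 or 1)
    return h4

def column_b(n, edges):
    """b[m] = -1 for nodes without an inlink, 0 otherwise."""
    has_inlink = {dst for _, dst in edges}
    return [0j if m in has_inlink else -1 + 0j for m in range(n)]

def solve(a, b):
    """Generic Gauss-Jordan elimination with partial pivoting (complex entries)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical four-node network (0-based indices): 0->1, 0->2, 1->3, 2->3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
h4 = special_h4(hermitian_adjacency(4, edges), c1=1.0, c2=1.0)
u = solve(h4, column_b(4, edges))
```

In this example node 0 is the only node without an inlink, and its component of the solution equals 1, in line with the statement above. The component for node 3 accumulates the contributions arriving through both of its inlinks.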

In the subsequent S1250, the second scoring unit 143 determines the score reference value Vc[m] of each node based on each component u[m] of the score reference vector U calculated in S1240. In this embodiment, the second scoring unit 143 can set the score reference value Vc[m] of each node to the same value as the corresponding component u[m] of the score reference vector U.

In the subsequent S1260, the second scoring unit 143 calculates, as the score equivalent value Z[m] (1 ≤ m ≤ N) of each node, the value Z[m] given by the following equation based on the score reference value Vc[m] (1 ≤ m ≤ N) of that node. The value d is an arbitrary real number greater than zero. The value d may be 1.

The second scoring unit 143 further calculates the score X of each node D[m] (1 ≤ m ≤ N) in the document network based on the score equivalent value Z[m] (S1260). As in the first embodiment, the second scoring unit 143 can calculate the score X corresponding to the node D[m] according to X = Z[m] − Z0. Z0 is, for example, the minimum value of Z[m] over the entire document network. Z0 may also be zero.

In this embodiment, the score X of each node is calculated in this way. The calculated score X is handled in the same way as in the first embodiment. According to this embodiment, the score X of each node can be calculated using the Hermitian adjacency matrix H without obtaining eigenvalues and eigenvectors as in the first embodiment.

Exemplary embodiments of the present disclosure have been described above, but the present disclosure is not limited to these embodiments. The present disclosure may be applied to the scoring of any documents that have link or citation relationships, not only web documents. The technical ideas of the fourth and fifth embodiments may be applied to the second or third embodiment.

The technique of adding, to a document network having no tip node, a dummy node DP that has no inlink and has outlinks to all the nodes in the document network can be applied to the first through fifth embodiments. Moreover, such a dummy node DP without an inlink may also be added to a document network that does have a tip node.

Similarly, the technique of adding, to a document network having no rear-end node, a dummy node DP that has inlinks from all the nodes in the document network and has no outlink can be applied to the first through fifth embodiments. Moreover, such a dummy node DP without an outlink may also be added to a document network that does have a rear-end node.

A function of one component in the above embodiments may be distributed among a plurality of components. Functions of a plurality of components may be integrated into one component. Part of the configuration of the above embodiments may be omitted. Every aspect included in the technical idea identified by the wording of the claims is an embodiment of the present disclosure.

Claims

An information processing system comprising:
a document network discrimination unit configured to discriminate, based on data representing connection relationships between documents, a document network composed of a plurality of documents connected at least by weak connection;
a document discrimination unit configured to discriminate a specific document that has inlinks from two or more documents included in the discriminated document network;
a sub-network discrimination unit configured to discriminate, based on the discriminated specific document, a plurality of sub-networks included in the document network; and
a score calculation unit configured to calculate scores of the plurality of documents constituting the document network by executing individual processing for each of the discriminated plurality of sub-networks, the score calculation unit calculating, in the individual processing, the score of each document included in the corresponding sub-network,
wherein the document network includes one or more duplicate documents that belong to two or more of the plurality of sub-networks, and
the score calculation unit calculates, for each of the one or more duplicate documents, one score for the corresponding duplicate document by integrating the scores of the corresponding duplicate document in the two or more sub-networks.

The information processing system according to claim 1, wherein
the sub-network discrimination unit discriminates the plurality of sub-networks with the specific document as a boundary,
the plurality of sub-networks include at least two or more upstream sub-networks, which correspond to the two or more inlinks of the specific document and each of which has the corresponding one inlink to the specific document, and a downstream sub-network that is connected to the specific document through an outlink of the specific document,
the specific document is the duplicate document belonging to the two or more upstream sub-networks, and
the score calculation unit calculates one integrated score for the specific document by integrating the scores of the specific document in the upstream sub-networks, and calculates the score of each document belonging to the downstream sub-network based on the integrated score of the specific document.

The information processing system according to claim 1, wherein
the sub-network discrimination unit discriminates, as the plurality of sub-networks, a sub-network for each inlink of the specific document, each such sub-network including a group of documents located upstream of the corresponding inlink, the specific document, and a group of documents located downstream of the outlinks of the specific document.

The information processing system according to claim 1, wherein
the subnetwork discrimination unit discriminates, as the plurality of subnetworks, a subnetwork for each combination of an inlink and an outlink of the specific document, each such subnetwork including a group of documents located upstream of the inlink corresponding to the combination, the specific document, and a group of documents located downstream of the outlink corresponding to the combination.

The information processing system according to claim 3 or 4, wherein
the integration is achieved by calculating the sum of the scores of the corresponding duplicate document in the two or more subnetworks, or by calculating a representative value of those scores.
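
The integration step can be sketched as follows. This is a minimal example, assuming each subnetwork's scores are held in a dictionary keyed by document; the function name `integrate_scores` and the choice of mean, median, and maximum as candidate representative values are assumptions, since the claim names only "the sum" and "a representative value".

```python
from statistics import mean, median


def integrate_scores(scores_by_subnetwork, method="sum"):
    """Merge the per-subnetwork scores of each document into a single score."""
    collected = {}
    for scores in scores_by_subnetwork:
        for doc, score in scores.items():
            collected.setdefault(doc, []).append(score)
    # "sum" follows the claim; the other reducers are assumed examples
    # of a representative value.
    reducers = {"sum": sum, "mean": mean, "median": median, "max": max}
    reduce = reducers[method]
    return {doc: reduce(values) for doc, values in collected.items()}


# Hypothetical scores from two subnetworks that share documents S and C.
per_subnetwork = [
    {"A": 1.0, "S": 2.0, "C": 3.0},
    {"B": 1.0, "S": 4.0, "C": 5.0},
]
summed = integrate_scores(per_subnetwork, "sum")
averaged = integrate_scores(per_subnetwork, "mean")
```

Documents that belong to only one subnetwork keep their single score under either method.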

The information processing system according to any one of claims 1 to 5, wherein
the individual processing includes processing of calculating the score of each document included in the corresponding subnetwork by using a Hermitian adjacency matrix based on the connection relationships between documents in the corresponding subnetwork.

The information processing system according to claim 6, wherein the individual processing further includes processing of modifying the corresponding subnetwork by adding a dummy document to each terminal document that is included in the corresponding subnetwork and has no outlink, so that the terminal document is virtually provided with an outlink, and defining the Hermitian adjacency matrix based on the connection relationships between documents in the modified subnetwork.
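
The dummy-document modification might look like the sketch below. It assumes one dummy document per terminal document and a hypothetical `dummy:` naming scheme; the claim does not state whether dummies are shared between terminal documents, so this is one possible reading only.

```python
def add_dummy_documents(documents, links):
    """Give every document without an outlink a link to a new dummy document."""
    has_outlink = {src for src, _ in links}
    new_documents = list(documents)
    new_links = list(links)
    for doc in documents:
        if doc not in has_outlink:
            dummy = f"dummy:{doc}"  # hypothetical naming scheme
            new_documents.append(dummy)
            new_links.append((doc, dummy))
    return new_documents, new_links


# Hypothetical subnetwork in which B and C are terminal documents.
docs, edges = add_dummy_documents(["A", "B", "C"], [("A", "B"), ("A", "C")])
```

The adjacency matrix would then be defined over `docs` and `edges` rather than over the original subnetwork.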

The information processing system according to claim 6 or claim 7, wherein the Hermitian adjacency matrix is an N-by-N Hermitian matrix based on the connection relationships among the documents D[m] (m being an integer with 1 ≦ m ≦ N) constituting the corresponding subnetwork, in which the component h(p, q) in row p and column q takes the value 1 when both a link from document D[p] to document D[q] and a link from document D[q] to document D[p] exist, the value 0 when neither a link from document D[p] to document D[q] nor a link from document D[q] to document D[p] exists, the value +i (i being the imaginary unit) when a link from document D[p] to document D[q] exists but no link from document D[q] to document D[p] exists, and the value −i when no link from document D[p] to document D[q] exists but a link from document D[q] to document D[p] exists, and in which the diagonal components are zero.
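
The component rules for h(p, q) can be checked with a short sketch. It uses 0-based document indices rather than the 1-based D[m] of the claim, and the function name `hermitian_adjacency` and the sample links are assumptions for illustration.

```python
def hermitian_adjacency(n, links):
    """Return the N-by-N matrix h for documents numbered 0..n-1."""
    # Self-links are ignored so that the diagonal stays zero.
    link_set = {(p, q) for p, q in links if p != q}
    h = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            forward = (p, q) in link_set   # link from D[p] to D[q]
            backward = (q, p) in link_set  # link from D[q] to D[p]
            if forward and backward:
                h[p][q] = 1 + 0j
            elif forward:
                h[p][q] = 1j
            elif backward:
                h[p][q] = -1j
    return h


# Hypothetical links: 0 -> 1 (one-way), 1 <-> 2 (mutual).
h = hermitian_adjacency(3, [(0, 1), (1, 2), (2, 1)])
# A Hermitian matrix equals its own conjugate transpose.
is_hermitian = all(
    h[p][q] == h[q][p].conjugate() for p in range(3) for q in range(3)
)
```

Because a one-way link contributes +i in one direction and −i in the other, the matrix is Hermitian by construction, which is what guarantees real eigenvalues.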

The information processing system according to claim 8, wherein the individual processing includes processing of transforming the Hermitian adjacency matrix to define a special Hermitian adjacency matrix and calculating the score of each document included in the corresponding subnetwork using an eigenvector of the special Hermitian adjacency matrix, the special Hermitian adjacency matrix being the Hermitian adjacency matrix transformed so that, when the components of the eigenvector are plotted on the complex plane, all of the components fall within an angular range of π/2 radians.

An information processing system comprising:
a processor; and
a memory storing instructions for causing the processor to execute a specific process,
the specific process including:
discriminating, based on data representing the connection relationships between documents, a document network composed of a plurality of documents that are at least weakly connected;
discriminating a specific document that is included in the discriminated document network and has inlinks from two or more documents;
discriminating, based on the discriminated specific document, a plurality of subnetworks included in the document network; and
calculating the score of each of the plurality of documents constituting the document network by executing, as individual processing for each of the discriminated plurality of subnetworks, processing of calculating the score of each document included in the corresponding subnetwork,
wherein
the document network includes one or more duplicate documents, each belonging to two or more of the plurality of subnetworks, and
calculating the score of each of the plurality of documents constituting the document network includes, for each of the one or more duplicate documents, calculating a single score for that duplicate document by integrating the scores of that duplicate document in the two or more subnetworks.
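
The first two steps of the specific process (finding weakly connected document networks and the specific documents within them) can be sketched as below. This is a minimal illustration, assuming links are given as (source, destination) pairs; the function names and sample data are hypothetical, and treating "weakly connected" as connectivity with link direction ignored is the standard graph-theoretic reading rather than something quoted from the specification.

```python
from collections import defaultdict


def document_networks(links):
    """Group documents into weakly connected components (direction ignored)."""
    undirected = defaultdict(set)
    for src, dst in links:
        undirected[src].add(dst)
        undirected[dst].add(src)
    seen, networks = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for nxt in undirected[stack.pop()]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        networks.append(component)
    return networks


def specific_documents(links, network):
    """Return documents in `network` with inlinks from two or more documents."""
    sources = defaultdict(set)
    for src, dst in links:
        if src in network and dst in network and src != dst:
            sources[dst].add(src)
    return {doc for doc, srcs in sources.items() if len(srcs) >= 2}


# Hypothetical link data containing two separate document networks.
links = [("A", "S"), ("B", "S"), ("S", "C"), ("P", "Q")]
networks = document_networks(links)
specific = [specific_documents(links, net) for net in networks]
```

Here `S` is the only document with inlinks from two distinct documents, so it is the only candidate on which subnetworks would be discriminated.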

An information processing method executed by a computer, the method comprising:
discriminating, based on data representing the connection relationships between documents, a document network composed of a plurality of documents that are at least weakly connected;
discriminating a specific document that is included in the discriminated document network and has inlinks from two or more documents;
discriminating, based on the discriminated specific document, a plurality of subnetworks included in the document network; and
calculating the score of each of the plurality of documents constituting the document network by executing, as individual processing for each of the discriminated plurality of subnetworks, processing of calculating the score of each document included in the corresponding subnetwork,
wherein
the document network includes one or more duplicate documents, each belonging to two or more of the plurality of subnetworks, and
calculating the score of each of the plurality of documents constituting the document network includes, for each of the one or more duplicate documents, calculating a single score for that duplicate document by integrating the scores of that duplicate document in the two or more subnetworks.

The information processing method according to claim 11, wherein
discriminating the plurality of subnetworks includes discriminating the plurality of subnetworks with the specific document as a boundary,
the plurality of subnetworks include two or more upstream subnetworks corresponding to the two or more inlinks of the specific document, each upstream subnetwork containing the corresponding one of the inlinks to the specific document, and at least one downstream subnetwork connected to the specific document through an outlink of the specific document,
the specific document is the duplicate document belonging to the two or more upstream subnetworks, and
calculating the score of each of the plurality of documents constituting the document network includes calculating one integrated score for the specific document by integrating the scores of the specific document in the two or more upstream subnetworks, and calculating the score of each document belonging to the downstream subnetwork based on the integrated score of the specific document.

The information processing method according to claim 11, wherein
discriminating the plurality of subnetworks includes discriminating a subnetwork for each inlink of the specific document, each such subnetwork including a group of documents located upstream of the corresponding inlink, the specific document, and a group of documents located downstream of the outlinks of the specific document.

The information processing method according to claim 11, wherein
discriminating the plurality of subnetworks includes discriminating a subnetwork for each combination of an inlink and an outlink of the specific document, each such subnetwork including a group of documents located upstream of the inlink corresponding to the combination, the specific document, and a group of documents located downstream of the outlink corresponding to the combination.

The information processing method according to claim 13 or 14, wherein
the integration is achieved by calculating the sum of the scores of the corresponding duplicate document in the two or more subnetworks, or by calculating a representative value of those scores.

A computer program for causing a computer to function as the document network discrimination unit, the document discrimination unit, the subnetwork discrimination unit, and the score calculation unit included in the information processing system according to any one of claims 1 to 9.