JP2004259083A

JP2004259083A - Method, server and program for retrieving information

Info

Publication number: JP2004259083A
Application number: JP2003050314A
Authority: JP
Inventors: Shigeru Koyanagi; 滋小柳
Original assignee: Individual
Current assignee: Individual
Priority date: 2003-02-27
Filing date: 2003-02-27
Publication date: 2004-09-16

Abstract

PROBLEM TO BE SOLVED: To provide an information retrieval system and method capable of efficiently retrieving information on the WWW, and in particular an information retrieval method, an information retrieval server, and an information retrieval program capable of retrieval that reflects the tastes of the specific user performing the search.

SOLUTION: A page retrieval part 21 uses a keyword/page conversion table to find and output the IDs of WWW pages containing a keyword. A community retrieval part 22 takes the user ID and those WWW page IDs as initial values, performs matrix clustering on a user/page conversion table to find the community of the user performing the retrieval, and outputs to a page list generation part 23 the IDs of the WWW pages frequently accessed by users in that community together with the IDs of the WWW pages containing the keyword. The page list generation part 23 generates a WWW page list in which the WWW page IDs are ranked.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、要求元のクライアントからのキーワード及びユーザＩＤに基づいて、ＷＷＷページ上で提供されている情報からユーザが所望する情報を検索して、該情報を有するＷＷＷページ情報を前記クライアントに送信する情報検索方法、情報検索サーバ、及び情報検索プログラムに関するものである。
【０００２】
【従来の技術】
ＷＷＷの普及により、膨大な量の情報が電子化されてＷＷＷページとしてアクセス可能となっており、大量のＷＷＷページによる情報が様々な分野で広く利用されている。このようなＷＷＷ上の膨大な量の情報の中から、ユーザが所望の目的の情報を効率よく得るためには、ＷＷＷ上で効率よく情報検索を行い、所望のＷＷＷページを素早く表示できることが非常に重要である。このような情報検索を行うために、従来より様々な検索エンジンが利用されている。
【０００３】
従来の検索エンジンによる情報検索システムの概要を説明するに、まず、ユーザがクライアントにおいて、所望の情報を得るためのキーワードを入力し、該キーワードをサーバに送信する。キーワードを受信したサーバは、キーワード・ページ対応表から、該キーワードに該当するＷＷＷページ情報を抽出してＷＷＷページリストを生成し、該ＷＷＷページリストをクライアントに表示する。ユーザは、表示された各ＷＷＷページ情報のＵＲＬを用いて所望のＷＷＷページを表示させることができる。また、表示されたＷＷＷページリストのＷＷＷページ情報数が多い場合には、更にキーワードを入力して絞込みを行う。なお、前記キーワード・ページ対応表は、サーバが予め検索対象となるＷＷＷページを収集し、各ＷＷＷページに含まれるキーワードを抽出することにより作成されている。
【０００４】
一般に、検索エンジンにより検索可能なＷＷＷページ数は膨大であり、慣用されている用語や多用されている用語をキーワードとして情報検索を行うと、該キーワードを有するＷＷＷページが膨大に存在するので、ＷＷＷページリストに表示されるＷＷＷページ情報数も膨大となる。従って、更にキーワードを入力してＷＷＷページの絞込みを行う必要がある。例えば、ユーザが、ある田中氏の情報を得るために従来の検索エンジンで「田中」をキーワードとして情報検索を行うと、百数十万件のＷＷＷページがヒットする。これらＷＷＷページリストの中には、様々な分野における「田中」を氏とする者の情報が混在するので、該ＷＷＷページリストの中から、ユーザが所望の田中氏の情報を有するＷＷＷページを探し出すことは困難である。更に、音楽関係の田中氏の情報を得たいので、「田中」と「音楽」とをキーワードとして絞込みを行っても、二十数万件がヒットし、これらの中から所望の田中氏の情報を有するＷＷＷページを探し出すことも困難である。従って、ＷＷＷ上で情報検索を行う場合には、所望のＷＷＷページに含まれ、且つ慣用語や多用語でないと推測されるキーワードを使うことが望まれるが、該キーワードを的確に想定して効率のよい情報検索を行うことは、ＷＷＷページにおける慣用語や多用語を把握した熟練者でないと難しい。
【０００５】
他方、従来の検索エンジンでは、情報検索に不慣れなユーザや的確なキーワードを幾つも思いつかない場合等でも、ユーザが所望のＷＷＷページを効率よく表示させることができるように、検索結果を順位付けしてＷＷＷページリストに表示するようにしている。このような検索結果の順位付けには、ＷＷＷページのアクセス数や被リンク数を基準としている場合が多く、アクセス数の多いものは、多くのユーザに対して人気が高いので、情報検索を行ったユーザが望むＷＷＷページに該当する可能性が高く、また、被リンク数が多いものは、多くのＷＷＷページからリンクが貼られていることからＷＷＷページの客観的な重用度が高いので、同様に、ユーザの望むＷＷＷページに該当する可能性が高いと考えられる（特許文献１及び特許文献２参照）。
【０００６】
図１０は、従来の情報検索システムの処理の一例を示す図であるが、まず、クライアントに入力されたキーワードにより、サーバが予め作成されたキーワード・ページ対応表を用いて該当するキーワードを有するＷＷＷページを検索する。
【０００７】
図１１は、前記キーワード・ページ対応表であり、縦軸をＷＷＷページ、横軸をキーワードとして、「０」は対応するキーワードを有しないことを、「１」は対応するキーワードを有することを表示している。このようなキーワード・ページ対応表は、ロボット検索等により一定期間毎にＷＷＷページを巡回してキーワードが抽出されて随時更新されている。このようなキーワード・ページ対応表により、クライアントが入力したキーワードを含むＷＷＷページが抽出される。
【０００８】
次に、検索された複数のＷＷＷページに対して、ページ優先度表を用いて各ＷＷＷページの順位付けを行う。このページ優先度表には、ＷＷＷページのアクセス数や被リンク数を基準として各ＷＷＷページ毎に優先度を示す数値が記録されている。図１２は、前記ページ優先度表であり、各ＷＷＷページ毎のアクセス数又は被リンク数に応じてランク付けがされており、数値が大きいもの程、アクセス数又は被リンク数が多く、優先度が高いことを示している。この順位付けに従って、検索されたＷＷＷページ情報を優先度の高い順から降順としてＷＷＷページリストを生成してクライアントに表示する。
【０００９】
例えば、クライアントがキーワード「ｃ」を入力して検索を行った場合、情報検索システムとして機能するサーバは、前記キーワード・ページ対応表よりＷＷＷページＣ，Ｄ，Ｅが抽出する。さらに、前記ページ優先度表から、抽出されたＷＷＷページＣ，Ｅ，Ｄの優先度を得て、優先度の高い順にＣ，Ｅ，Ｄの順序でＷＷＷページリストが生成され、クライアントに表示される。
【００１０】
ユーザはクライアントに表示されたＷＷＷページリストの各ＷＷＷページ情報から任意のＷＷＷページを選択して表示させるが、アクセス数又は被リンク数の多いＷＷＷページは、多くのユーザにとって有用なものであるから、情報検索を行っている当該ユーザにとっても有用である可能性が高い。このように、アクセス数や被リンク数を指標として、広くユーザに有用であるＷＷＷページから順位付けしたＷＷＷページリストを生成することにより、ユーザが望んでいる情報を有するＷＷＷページが上位に含まれる可能性が高くなり、効率のよい情報検索を提供することができる。
【００１１】
【特許文献１】
特開２００２−２０２９９２号公報
【特許文献２】
特開２００２−２１５６７１号公報
【００１２】
【発明が解決しようとする課題】
しかし、ＷＷＷ上で提供される情報の種類が多様化するとともに、情報検索の目的もユーザ毎に多様となっており、更にＷＷＷの利用層が広がることにより、情報検索を利用する各ユーザの嗜好も様々となっている。従って、情報検索を行った特定のユーザが望んでいる情報を有するＷＷＷページが、必ずしも、全ユーザに人気の高いものや重要なものであるとは限らないので、アクセス数や被リンク数を基準とした順位付けは、多種多様なユーザすべての検索効率を高めるものではない。
【００１３】
例えば、若年層のユーザに人気のあるＷＷＷページが、高齢層のユーザにとっても人気があるとは限らないように、特定のユーザが自分の嗜好にあった情報を検索しようとしても、全ユーザのアクセス数や被リンク数を基準として順位付けされたＷＷＷページリストでは、当該ユーザが望む情報を有するＷＷＷページが上位に含まれるとは限らないので、当該ユーザはＷＷＷページリストによる順位付けに関係なく各ＷＷＷページを順次表示させることにより、各々のＷＷＷページに所望の情報が含まれているかを確認せねばならず、結局、情報検索に時間と手間を要することとなり非効率である。
【００１４】
本発明は、これらに鑑みてなされたものであり、ＷＷＷ上で効率のよい情報検索を行うことができる情報検索システム及び情報検索方法、特に情報検索を行う特定のユーザの嗜好を反映させた情報検索を行うことができる情報検索方法、情報検索サーバ、及び情報検索プログラムを提供することを目的とする。
【００１５】
【課題を解決するための手段】
本発明の請求項１に係る情報検索方法は、要求元のクライアントからのキーワード及びユーザＩＤに基づいて、ＷＷＷページ上で提供されている情報からユーザが所望する情報を検索して、該情報を有するＷＷＷページ情報を前記クライアントに送信する情報検索方法であって、前記キーワードに基づいて、ＷＷＷページに含まれるキーワードを各ＷＷＷページに対応させて記録されたキーワード・ページ対応表から、該キーワードを含むＷＷＷページを検索し、前記ユーザＩＤとＷＷＷページ検索結果とを初期値として、各ＷＷＷページと各ユーザとを行成分と列成分として各ユーザの各ＷＷＷページへのアクセス履歴を２次元空間上に表現したユーザ・ページ対応表から、アクセス履歴が密集した領域を抽出することにより、前記ユーザと前記キーワードにおいて類似した傾向を有するユーザのコミュニティを求め、該コミュニティに属するＷＷＷページを選択し、該コミュニティに属するユーザのアクセス頻度に基づいて、前記コミュニティに属するＷＷＷページを順位付けて各ＷＷＷページ情報を示したＷＷＷページリストを生成し、該ＷＷＷページリストを前記クライアントに送信するものである。
【００１６】
また、本発明の請求項２に係る情報検索方法は、要求元のクライアントからのキーワード及びユーザＩＤに基づいて、ＷＷＷページ上で提供されている情報からユーザが所望する情報を検索して、該情報を有するＷＷＷページ情報を前記クライアントに送信する情報検索方法であって、前記キーワードに基づいて、ＷＷＷページに含まれるキーワードを各ＷＷＷページに対応させて記録されたキーワード・ページ対応表から、該キーワードを含むＷＷＷページを検索し、前記ユーザＩＤに基づいて、予め各ユーザの好みを記録したユーザプロファイルから、該ユーザと好みが類似するユーザのコミュニティを求め、前記ＷＷＷページ検索結果と前記コミュニティに属するユーザとを初期値として、各ＷＷＷページと各ユーザとを行成分と列成分として各ユーザの各ＷＷＷページへのアクセス履歴を２次元空間上に表現したユーザ・ページ対応表から、アクセス履歴が密集した領域を抽出することにより、該領域に属するＷＷＷページを選択し、前記領域に属するユーザのアクセス頻度に基づいて、前記領域に属するＷＷＷページを順位付けて各ＷＷＷページ情報を示したＷＷＷページリストを生成し、該ＷＷＷページリストを前記クライアントに送信するものである。
【００１７】
また、本発明（請求項３）は、請求項１又は２に記載の情報検索方法において、前記クライアントからのキーワードが、ユーザが入力したものである。
【００１８】
また、本発明（請求項４）は、請求項１又は２に記載の情報検索方法において、前記クライアントからのキーワードが、クライアントに表示されたキーワード群からユーザが選択したものである。
【００１９】
また、本発明（請求項５）は、請求項１又は２に記載の情報検索方法において、前記クライアントからのキーワードは、クライアントに表示されているＷＷＷページから抽出されたものである。
【００２０】
また、本発明（請求項６）は、請求項１に記載の情報検索方法において、前記ユーザ・ページ対応表から、要求元のユーザがアクセスしたＷＷＷページと前記キーワードを含むＷＷＷページとを次の対象として選択し、対象となったＷＷＷページにアクセスしたユーザを、その次の対象として選択し、これを所定の収束条件を満たすまで繰り返すことにより、アクセス履歴が密集した領域を抽出するものである。
【００２１】
また、本発明（請求項７）は、請求項６に記載の情報検索方法において、前記ユーザ・ページ対応表から、ＷＷＷページ又はユーザを次の対象として選択する際に、各ＷＷＷページ又はユーザのアクセス頻度が所定の閾値以下のものであって前記キーワードを含むＷＷＷページ以外のものを対象外とするものである。
【００２２】
また、本発明（請求項８）は、請求項２に記載の情報検索方法において、前記ユーザ・ページ対応表から、前記コミュニティに属するユーザがアクセスしたＷＷＷページと前記キーワードを含むＷＷＷページとを次の対象として選択し、対象となったＷＷＷページにアクセスしたユーザと前記コミュニティに属するユーザとを、その次の対象として選択し、これを所定の収束条件を満たすまで繰り返すことにより、アクセス履歴が密集した領域を抽出するものである。
【００２３】
また、本発明（請求項９）は、請求項８に記載の情報検索方法において、前記ユーザ・ページ対応表から、ＷＷＷページ又はユーザを次の対象として選択する際に、各ＷＷＷページ又はユーザのアクセス頻度が所定の閾値以下のものであって前記キーワードを含むＷＷＷページ又は前記コミュニティに属するユーザ以外のものを対象外とするものである。
【００２４】
また、本発明（請求項１０）は、請求項１又は２に記載の情報検索方法において、前記ＷＷＷページリストは、前記コミュニティ又は領域に属するＷＷＷページを、前記キーワードを含み且つ前記コミュニティ又は領域のアクセス頻度が高いＷＷＷページと、前記キーワードを含み且つ前記コミュニティ又は領域のアクセス頻度が低いＷＷＷページと、前記キーワードを含まないがコミュニティ又は領域のアクセスが頻度が高いＷＷＷページとに分類し、各分類毎に順位付けて各ＷＷＷページ情報を示したものである。
【００２５】
また、本発明の請求項１１に係る情報検索サーバは、要求元のクライアントからのキーワード及びユーザＩＤに基づいて、ＷＷＷページ上で提供されている情報からユーザが所望する情報を検索して、該情報を有するＷＷＷページ情報をＷＷＷページリストとして前記クライアントに送信する情報検索サーバであって、前記キーワードに基づいて、ＷＷＷページに含まれるキーワードを各ＷＷＷページに対応させて記録されたキーワード・ページ対応表から、該キーワードを含むＷＷＷページを検索して検索結果を出力するページ検索手段と、前記ユーザＩＤとページ検索手段の検索結果とを初期値として、各ＷＷＷページと各ユーザとを行成分と列成分として各ユーザの各ＷＷＷページへのアクセス履歴を２次元空間上に表現したユーザ・ページ対応表から、アクセス履歴が密集した領域を抽出することにより、前記ユーザと前記キーワードにおいて類似した傾向を有するユーザのコミュニティを求め、該コミュニティに属するＷＷＷページを選択して、該ＷＷＷページとコミュニティのアクセス頻度とを出力するコミュニティ検索手段と、前記コミュニティに属するユーザのアクセス頻度に基づき、前記コミュニティ検索手段により選択されたＷＷＷページを順位付けて、各ＷＷＷページ情報を示したＷＷＷページリストを生成して出力するページリスト生成手段と、を具備してなるものである。
【００２６】
また、本発明の請求項１２に係る情報検索サーバは、要求元のクライアントからのキーワード及びユーザＩＤに基づいて、ＷＷＷページ上で提供されている情報からユーザが所望する情報を検索して、該情報を有するＷＷＷページ情報をＷＷＷページリストとして前記クライアントに送信する情報検索サーバであって、前記キーワードに基づいて、ＷＷＷページに含まれるキーワードを各ＷＷＷページに対応させて記録されたキーワード・ページ対応表から、該キーワードを含むＷＷＷページを検索して検索結果を出力するページ検索手段と、前記ユーザＩＤに基づいて、予め各ユーザの好みを記録したユーザプロファイルから、該ユーザと好みが類似するユーザのコミュニティを求め、該コミュニティに属するユーザＩＤを出力するコミュニティ検索手段と、前記検索結果と前記コミュニティに属するユーザＩＤとを初期値として、各ＷＷＷページと各ユーザとを行成分と列成分として各ユーザの各ＷＷＷページへのアクセス履歴を２次元空間上に表現したユーザ・ページ対応表から、アクセス履歴が密集した領域を抽出することにより、該領域に属するＷＷＷページを選択して、該ＷＷＷページとアクセス頻度とを出力するマトリクスクラスタリング手段と、前記アクセス頻度に基づき、前記マトリクスクラスタリング手段により選択されたＷＷＷページを順位付けて、各ＷＷＷページ情報を示したＷＷＷページリストを生成して出力するページリスト生成手段と、を具備してなるものである。
【００２７】
また、本発明（請求項１３）は、請求項１１又は１２に記載の情報検索サーバにおいて、前記ページ検索手段は、ユーザが前記クライアントに入力したキーワードに基づいて、ＷＷＷページを検索するものである。
【００２８】
また、本発明（請求項１４）は、請求項１１又は１２に記載の情報検索サーバにおいて、前記ページ検索手段は、前記クライアントにキーワード群を表示し、該キーワード群からユーザが選択したキーワードに基づいて、ＷＷＷページを検索するものである。
【００２９】
また、本発明（請求項１５）は、請求項１１又は１２に記載の情報検索サーバにおいて、前記ページ検索手段は、前記クライアントに表示されているＷＷＷページからキーワードを抽出し、該キーワードに基づいて、ＷＷＷページを検索するものである。
【００３０】
また、本発明（請求項１６）は、請求項１１に記載の情報検索サーバにおいて、前記コミュニティ検索手段は、前記ユーザ・ページ対応表から、要求元のユーザがアクセスしたＷＷＷページと前記キーワードを含むＷＷＷページとを次の対象として選択し、対象となったＷＷＷページにアクセスしたユーザを、その次の対象として選択し、これを所定の収束条件を満たすまで繰り返すことにより、アクセス履歴が密集した領域を抽出するものである。
【００３１】
また、本発明（請求項１７）は、請求項１６に記載の情報検索サーバにおいて、前記コミュニティ検索手段は、前記ユーザ・ページ対応表から、ＷＷＷページ又はユーザを次の対象として選択する際に、各ＷＷＷページ又はユーザのアクセス頻度が所定の閾値以下のものであって前記キーワードを含むＷＷＷページ以外のものを対象外とするものである。
【００３２】
また、本発明（請求項１８）は、請求項１２に記載の情報検索サーバにおいて、前記マトリクスクラスタリング手段は、前記ユーザ・ページ対応表から、前記コミュニティに属するユーザがアクセスしたＷＷＷページと前記キーワードを含むＷＷＷページとを次の対象として選択し、対象となったＷＷＷページにアクセスしたユーザと前記コミュニティに属するユーザとを、その次の対象として選択し、これを所定の収束条件を満たすまで繰り返すことにより、アクセス履歴が密集した領域を抽出するものである。
【００３３】
また、本発明（請求項１９）は、請求項１８に記載の情報検索サーバにおいて、前記マトリクスクラスタリング手段は、前記ユーザ・ページ対応表から、ＷＷＷページ又はユーザを次の対象として選択する際に、各ＷＷＷページ又はユーザのアクセス頻度が所定の閾値以下のものであって前記キーワードを含むＷＷＷページ又は前記コミュニティに属するユーザ以外のものを対象外とするものである。
【００３４】
また、本発明（請求項２０）は、請求項１１又は１２に記載の情報検索サーバにおいて、前記ページリスト生成手段は、前記コミュニティ又は前記領域に属するＷＷＷページを、前記キーワードを含み且つ前記コミュニティ又は領域のアクセス頻度が高いＷＷＷページと、前記キーワードを含み且つ前記コミュニティ又は領域のアクセス頻度が低いＷＷＷページと、前記キーワードを含まないがコミュニティ又は領域のアクセスが頻度が高いＷＷＷページとに分類し、各分類毎に順位付けて各ＷＷＷページ情報を示したＷＷＷページリストを生成するものである。
【００３５】
また、本発明の請求項２１に係る情報検索プログラムは、コンピュータを、要求元のクライアントからのキーワード及びユーザＩＤに基づいて、ＷＷＷページ上で提供されている情報からユーザが所望する情報を検索して、該情報を有するＷＷＷページ情報をＷＷＷページリストとして前記クライアントに送信する情報検索サーバとして機能させるための情報検索プログラムであって、コンピュータを、前記キーワードに基づいて、ＷＷＷページに含まれるキーワードを各ＷＷＷページに対応させて記録されたキーワード・ページ対応表から、該キーワードを含むＷＷＷページを検索して検索結果を出力するページ検索手段、前記ユーザＩＤとページ検索手段の検索結果とを初期値として、各ＷＷＷページと各ユーザとを行成分と列成分として各ユーザの各ＷＷＷページへのアクセス履歴を２次元空間上に表現したユーザ・ページ対応表から、アクセス履歴が密集した領域を抽出することにより、前記ユーザと前記キーワードにおいて類似した傾向を有するユーザのコミュニティを求め、該コミュニティに属するＷＷＷページを選択して、該ＷＷＷページとコミュニティのアクセス頻度とを出力するコミュニティ検索手段、前記コミュニティに属するユーザのアクセス頻度に基づき、前記コミュニティ検索手段により選択されたＷＷＷページを順位付けて、各ＷＷＷページ情報を示したＷＷＷページリストを生成して出力するページリスト生成手段、として機能させるものである。
【００３６】
また、本発明の請求項２２に係る情報検索プログラムは、コンピュータを、要求元のクライアントからのキーワード及びユーザＩＤに基づいて、ＷＷＷページ上で提供されている情報からユーザが所望する情報を検索して、該情報を有するＷＷＷページ情報をＷＷＷページリストとして前記クライアントに送信する情報検索サーバとして機能させるための情報検索プログラムであって、コンピュータを、前記キーワードに基づいて、ＷＷＷページに含まれるキーワードを各ＷＷＷページに対応させて記録されたキーワード・ページ対応表から、該キーワードを含むＷＷＷページを検索して検索結果を出力するページ検索手段、前記ユーザＩＤに基づいて、予め各ユーザの好みを記録したユーザプロファイルから、該ユーザと好みが類似するユーザのコミュニティを求め、該コミュニティに属するユーザＩＤを出力するコミュニティ検索手段、前記検索結果と前記コミュニティに属するユーザＩＤとを初期値として、各ＷＷＷページと各ユーザとを行成分と列成分として各ユーザの各ＷＷＷページへのアクセス履歴を２次元空間上に表現したユーザ・ページ対応表から、アクセス履歴が密集した領域を抽出することにより、該領域に属するＷＷＷページを選択して、該ＷＷＷページとアクセス頻度とを出力するマトリクスクラスタリング手段、前記アクセス頻度に基づき、前記マトリクスクラスタリング手段により選択されたＷＷＷページを順位付けて、各ＷＷＷページ情報を示したＷＷＷページリストを生成して出力するページリスト生成手段、として機能させるものである。
【００３７】
また、本発明（請求項２３）は、請求項２１又は２２に記載の情報検索プログラムにおいて、前記ページ検索手段を、ユーザが前記クライアントに入力したキーワードに基づいて、ＷＷＷページを検索するものとして機能させるものである。
【００３８】
また、本発明（請求項２４）は、請求項２１又は２２に記載の情報検索プログラムにおいて、前記ページ検索手段を、前記クライアントにキーワード群を表示し、該キーワード群からユーザが選択したキーワードに基づいて、ＷＷＷページを検索するものとして機能させるものである。
【００３９】
また、本発明（請求項２５）は、請求項２１又は２２に記載の情報検索プログラムにおいて、前記ページ検索手段を、前記クライアントに表示されているＷＷＷページからキーワードを抽出し、該キーワードに基づいて、ＷＷＷページを検索するものとして機能させるものである。
【００４０】
また、本発明（請求項２６）は、請求項２１に記載の情報検索プログラムにおいて、前記コミュニティ検索手段を、前記ユーザ・ページ対応表から、要求元のユーザがアクセスしたＷＷＷページと前記キーワードを含むＷＷＷページとを次の対象として選択し、対象となったＷＷＷページにアクセスしたユーザを、その次の対象として選択し、これを所定の収束条件を満たすまで繰り返すことにより、アクセス履歴が密集した領域を抽出するものとして機能させるものである。
【００４１】
また、本発明（請求項２７）は、請求項２６に記載の情報検索プログラムにおいて、前記コミュニティ検索手段を、前記ユーザ・ページ対応表から、ＷＷＷページ又はユーザを次の対象として選択する際に、各ＷＷＷページ又はユーザのアクセス頻度が所定の閾値以下のものであって前記キーワードを含むＷＷＷページ以外のものを対象外とするものとして機能させるものである。
【００４２】
また、本発明（請求項２８）は、請求項２２に記載の情報検索プログラムにおいて、前記マトリクスクラスタリング手段を、前記ユーザ・ページ対応表から、前記コミュニティに属するユーザがアクセスしたＷＷＷページと前記キーワードを含むＷＷＷページとを次の対象として選択し、対象となったＷＷＷページにアクセスしたユーザと前記コミュニティに属するユーザとを、その次の対象として選択し、これを所定の収束条件を満たすまで繰り返すことにより、アクセス履歴が密集した領域を抽出するものとして機能させるものである。
【００４３】
また、本発明（請求項２９）は、請求項２８に記載の情報検索プログラムにおいて、前記マトリクスクラスタリング手段を、前記ユーザ・ページ対応表から、ＷＷＷページ又はユーザを次の対象として選択する際に、各ＷＷＷページ又はユーザのアクセス頻度が所定の閾値以下のものであって前記キーワードを含むＷＷＷページ又は前記コミュニティに属するユーザ以外のものを対象外とするものとして機能させるものである。
【００４４】
また、本発明（請求項３０）は、請求項２１又は２２に記載の情報検索プログラムにおいて、前記ページリスト生成手段を、前記コミュニティ又は前記領域に属するＷＷＷページを、前記キーワードを含み且つ前記コミュニティ又は領域のアクセス頻度が高いＷＷＷページと、前記キーワードを含み且つ前記コミュニティ又は領域のアクセス頻度が低いＷＷＷページと、前記キーワードを含まないがコミュニティ又は領域のアクセスが頻度が高いＷＷＷページとに分類し、各分類毎に順位付けて各ＷＷＷページ情報を示したＷＷＷページリストを生成するものとして機能させるものである。
【００４５】
【発明の実施の形態】
以下、本発明の実施の形態を図面に基づき具体的に説明する。
〔第１の実施の形態〕
図１に示すように、本実施の形態に係る情報検索システム１は、情報検索サーバ２と複数のクライアント３とが、ネットワーク４を介して双方通信可能な状態で接続されて構成されている。情報検索サーバ２及びクライアント３は、計算機とソフトウェア等によって構成されている。該計算機は、例えばパーソナルコンピュータであり、ＣＰＵ、ＲＡＭ、ハードディスク、ＣＲＴ等の表示装置、キーボードやマウス等の入力装置、ＬＡＮボード等の通信装置等から構成される。ネットワーク４は、広域網や公衆網、ＬＡＮ等であり、本実施の形態ではインターネットを例に説明する。
【００４６】
本実施の形態にように、インターネットを介して情報検索サーバ２にクライアント３がアクセスされる場合では、一般に、情報検索サーバ２は検索エンジンを提供するＷＷＷサーバであり、クライアント３はインターネットブラウザ等のインタフェースを具備し、該インタフェースにより検索エンジンやＷＷＷページを閲覧することができるものである。ユーザがクライアント３からＷＷＷ上の情報検索を行う際には、クライアント３のインタフェースに表示された検索エンジンにおいてキーワードを入力することにより、該クライアント３からネットワーク４を介して情報検索サーバ２に前記キーワード及びユーザＩＤが送信され、これに対し、情報検索サーバ２が該キーワード及びユーザＩＤに基づいてＷＷＷページの検索を行い、検索結果を順位付けしてＷＷＷページリストを生成し、クライアント３へ送信する。ユーザはクライアント３のインタフェースに表示されたＷＷＷページリストから所望のＷＷＷページを選択することにより、クライアント３にＷＷＷページを閲覧する。
【００４７】
前記情報検索サーバ２は、管理部２０、ページ検索部２１、コミュニティ検索部２２、ページリスト生成部２３を備えてなるものである。
管理部２０は、ページ検索部２１、コミュニティ検索部２２、ページリスト生成部２３の機能の管理とネットワーク４との接続の確保を行っており、クライアント３から送信されたキーワード及びユーザＩＤは管理部２０が受信して、ページ検索部２１及びコミュニティ検索部２２へ送信するようになっている。また、ページリスト生成部２３により作成されたＷＷＷページリストを所定のクライアント３に送信するようになっている。さらに、管理部２０は、予め定められた一定周期毎にＷＷＷ上のＷＷＷページを巡回して、各ＷＷＷページに含まれるキーワードを抽出し、該キーワードとＷＷＷページＩＤとを対応させたキーワード・ページ対応表として記録し、且つ、ＷＷＷページのＵＲＬやページ運営者、ページ要約等のＷＷＷページ情報をＷＷＷページＩＤと対応させたＷＷＷページ情報データベースとして記録するロボット検索機能と、アクセスログ等から各ユーザのＷＷＷページへのアクセス履歴を、ユーザＩＤとＷＷＷページＩＤとを対応させたユーザ・ページ対応表として記録する機能とを有している。このように機能する管理部２０は、例えばＣＰＵ及び通信装置により実現することができる。
【００４８】
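The page search unit 21 receives the keyword from the management unit 20, uses the keyword/page correspondence table accumulated by robot search or the like to find the WWW page IDs of the pages containing the keyword, and outputs them to the community search unit 22. A WWW page ID may be anything that identifies a WWW page, such as a URL. The keyword/page correspondence table is stored on, for example, a hard disk, and the page search unit 21 can be realized by, for example, a CPU.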
【００４９】
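The community search unit 22 receives the user ID from the management unit 20 and the WWW page search results, that is, the extracted WWW page IDs, from the page search unit 21. Taking the user ID and the WWW page IDs as initial values, it performs matrix clustering on the user/page correspondence table to find the community of the user performing the search, and outputs to the page list generation unit 23 the IDs of the WWW pages frequently accessed by users in the community and the IDs of the WWW pages containing the keyword. Matrix clustering is a technique for extracting a dense submatrix from a matrix of the binary values "1" and "0"; it is described in detail later. A community is the group of users contained in the dense submatrix extracted by matrix clustering, obtained as the set of users who have accessed WWW pages that the searching user has accessed or WWW pages containing the keyword. In other words, a community is a group of users whose preferences resemble those of the user performing the search. The user/page correspondence table is stored on, for example, a hard disk, and the community search unit 22 can be realized by, for example, a CPU.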
【００５０】
ページリスト生成部２３は、前記コミュニティ検索部２２からコミュニティに属するユーザのアクセス頻度が高いＷＷＷページＩＤと前記キーワードを含むＷＷＷページＩＤとを受けて、一定の優先順位に基づいてＷＷＷページＩＤを順位付けし、ＷＷＷページ情報データベースからＷＷＷページ情報を読み出してＷＷＷページリストを生成するものである。前記コミュニティに属するユーザのアクセス頻度が高いＷＷＷページＩＤと前記キーワードを含むＷＷＷページＩＤとから、（１）キーワードを含み且つコミュニティのアクセス頻度が高いＷＷＷページ、（２）キーワードを含み且つコミュニティのアクセス頻度が低いＷＷＷページ、（３）キーワードを含まないがコミュニティのアクセス頻度が高いＷＷＷページの３種類のＷＷＷページに分類することができるので、予め、これらに所定の優先順位を設定しておく。一般に、（１）に分類されるＷＷＷページが情報検索を行っているユーザにとって有用度が高いと考えられるので、（１）を優先し、（２）又は（３）のいずれを優先させるかを設定しておけばよい。また、（１），（２），（３）内夫々におけるＷＷＷページの優先度はコミュニティのアクセス頻度により順位付けする。
【００５１】
次に、本情報検索システム１の処理手順を図２〜４を用いて説明する。
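First, user P accesses the information search server 2 from the client 3 via the network 4, and the search engine page is displayed in the interface of the client 3. To find WWW pages having the desired information, user P enters keywords that seem appropriate on that page. Suppose, for example, that the keywords "c" and "d" are entered as an AND search. The client 3 sends keywords c and d and the user ID of user P to the information search server 2, which then executes the information search process.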
【００５２】
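FIG. 2 shows the search processing procedure of the information search server 2. As shown, on receiving keywords c and d and the user ID of user P, the information search server 2 first performs a page search for WWW pages containing keywords c and d (S1). Specifically, the management unit 20, having received keywords c and d and the user ID of user P, outputs keywords c and d to the page search unit 21, which extracts from the keyword/page correspondence table the IDs of the WWW pages containing keywords c and d. FIG. 3 shows an example of the keyword/page correspondence table, with WWW page IDs on the vertical axis and keywords on the horizontal axis; an entry is "1" if the WWW page has the keyword and "0" if it does not. Since the keywords entered are "c" AND "d", the page search unit 21 extracts the WWW page IDs of WWW pages C and D and outputs them to the community search unit 22.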
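The AND lookup in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the contents of FIG. 3 are not reproduced in the text, so the table values, the keyword list, and the function name `search_pages` are all assumptions, chosen only so that searching for "c" AND "d" returns pages C and D as the description states.

```python
# Hypothetical keyword/page correspondence table (1 = page contains keyword).
# The actual values of FIG. 3 are not given in the text; these are assumed.
KEYWORDS = ["a", "b", "c", "d", "e"]
KEYWORD_PAGE_TABLE = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 1, 0, 0, 1],
    "C": [0, 0, 1, 1, 0],
    "D": [0, 0, 1, 1, 1],
    "E": [1, 0, 1, 0, 0],
}


def search_pages(table, keywords, query):
    """Return the IDs of pages whose row holds a 1 for every query keyword."""
    columns = [keywords.index(k) for k in query]
    return [page for page, row in table.items()
            if all(row[c] == 1 for c in columns)]


print(search_pages(KEYWORD_PAGE_TABLE, KEYWORDS, ["c", "d"]))  # ['C', 'D']
```

The resulting page IDs are what would be handed to the community search unit as seed pages.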
【００５３】
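The community search unit 22, having received the user ID of user P from the management unit 20 and the WWW page IDs of WWW pages C and D from the page search unit 21, executes matrix clustering on the user/page correspondence table with these as initial values (S2). FIG. 4 shows an example of the user/page correspondence table, with user IDs on the vertical axis and WWW page IDs on the horizontal axis; an entry is "1" if the user has accessed the WWW page and "0" if not. For example, user P has viewed WWW pages A and B. Matrix clustering is a technique for extracting, from a matrix of the binary values 1 and 0, a dense submatrix that contains the initial values; it finds the dense submatrix by repeating marker propagation and pruning, starting from the specified rows or columns.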
【００５４】
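The process of matrix clustering, proceeding from the column of user P toward the rows with user P and WWW pages C and D as initial values, is described below with reference to FIG. 5.
First, as shown in FIG. 5(a), markers are propagated in the user/page correspondence table from the column of user P to the rows whose element is 1, namely WWW pages A and B. At this point WWW pages C and D receive markers unconditionally because they contain keywords c and d. WWW pages A, B, C, and D therefore survive. Next, as shown in FIG. 5(b), markers are propagated from the rows of WWW pages A, B, C, and D to the columns whose element is 1, namely users P, Q, R, S, T, U, and V. Then, as shown in FIG. 5(c), the columns are pruned according to the number of markers received. Pruning columns with a marker count of 1 or less eliminates users S, T, U, and V, leaving users P, Q, and R. Rows are pruned in the same way according to the number of markers received, but WWW page C, although its marker count is 0, contains keywords c and d and therefore survives regardless of its marker count. WWW page D has a marker count of 2 and is not subject to pruning, and even if it were, it would survive because it contains keywords c and d. Pruning with a given marker count as the threshold makes it possible to narrow the final submatrix to the desired size, and it also makes matrix clustering more efficient and faster, which is effective when the user/page correspondence table is very large.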
【００５５】
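The other WWW pages A and B have a marker count of 2 and are therefore not pruned. Repeating marker propagation after this does not change the matrix, so the dense submatrix obtained by matrix clustering is the one shown in FIG. 5(c). This submatrix represents the community P, Q, R, whose access histories resemble that of user P with respect to keywords c and d. Even for the same user P, the community differs according to the initial values determined by the keywords. That is, a single user belongs to multiple communities, one for each kind of information sought, and the matrix clustering performed by the community search unit 22 can extract such communities, which are relationships among users that cannot be anticipated in advance, easily and quickly.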
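The marker propagation and pruning loop described above can be sketched as follows. This is an illustrative reading of the procedure, not the patent's implementation. The access table stands in for FIG. 4, which is not reproduced in the text: its values are assumed, chosen to agree with the stated facts (user P viewed pages A and B; users S, T, U, and V are pruned; page C has four accesses overall but none from the community). The function name `matrix_cluster`, the `threshold` parameter, and the convergence test (stop when neither set changes) are likewise assumptions.

```python
from collections import Counter

# Hypothetical access history standing in for FIG. 4 (user -> pages accessed).
ACCESS = {
    "P": {"A", "B"},
    "Q": {"A", "B", "D"},
    "R": {"A", "B", "D"},
    "S": {"C"}, "T": {"C"}, "U": {"C"}, "V": {"C"},
}


def matrix_cluster(access, seed_users, seed_pages, threshold=1, max_rounds=100):
    """Extract a dense user/page block around the seeds.

    Entries whose marker count is <= threshold are pruned, except that
    seed users and seed (keyword) pages always survive.
    """
    seed_users, seed_pages = set(seed_users), set(seed_pages)
    users = set(seed_users)
    # Initial propagation from the seed users to the pages they accessed;
    # keyword pages are marked unconditionally. No pruning at this step.
    pages = set(seed_pages)
    for user in users:
        pages |= access.get(user, set())

    for _ in range(max_rounds):
        # Pages -> users: one marker per surviving page the user accessed.
        user_markers = {u: len(visited & pages) for u, visited in access.items()}
        new_users = {u for u, n in user_markers.items() if n > threshold} | seed_users
        # Users -> pages: one marker per surviving user who accessed the page.
        page_markers = Counter(p for u in new_users for p in access.get(u, ()))
        new_pages = {p for p, n in page_markers.items() if n > threshold} | seed_pages
        if new_users == users and new_pages == pages:
            break  # converged: another round changes nothing
        users, pages = new_users, new_pages

    # Community access frequency for each surviving page.
    frequency = {p: sum(p in access.get(u, ()) for u in users) for p in sorted(pages)}
    return users, frequency


community, frequency = matrix_cluster(ACCESS, {"P"}, {"C", "D"})
print(sorted(community))  # ['P', 'Q', 'R']
print(frequency)          # {'A': 3, 'B': 3, 'C': 0, 'D': 2}
```

With these assumed values the sketch reproduces the community P, Q, R and the per-page access frequencies 3, 3, 0, and 2 that the description reports for pages A, B, C, and D.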
【００５６】
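In this way, the community search unit 22 outputs the WWW page IDs of WWW pages A, B, C, and D as those frequently accessed by users P, Q, and R of the community, together with the community's access frequency for each page: 3 for WWW page A, 3 for WWW page B, 0 for WWW page C, and 2 for WWW page D. It also outputs the WWW page IDs of WWW pages C and D as those containing keywords c and d. Alternatively, the WWW page IDs of WWW pages C and D may be output from the page search unit 21 to the page list generation unit 23.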
【００５７】
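The page list generation unit 23, having received from the community search unit 22 the IDs of the WWW pages with high community access frequency and the IDs of the WWW pages containing the keywords, ranks the WWW pages on that basis (S3). As described above, these WWW pages fall into three classes: (1) WWW page D, which contains keywords c and d and has a high access frequency in community P, Q, R; (2) WWW page C, which contains keywords c and d but has a low access frequency in the community; and (3) WWW pages A and B, which do not contain keywords c and d but have a high access frequency in the community. They are ranked according to a preset order. If, for example, the order is set to (1), (3), (2), the priority order of the WWW pages is D, A, B, C.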
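The three-way classification and ranking in step S3 can be sketched as follows. The frequencies are the ones quoted in the description (A: 3, B: 3, C: 0, D: 2). The function name `rank_pages`, the cutoff `high=2` for what counts as a "high" access frequency, and the alphabetical tie-break between equally frequent pages are assumptions made for the illustration; the text does not specify them.

```python
def rank_pages(frequency, keyword_pages, class_order=(1, 3, 2), high=2):
    """Rank pages by class, then by community access frequency.

    Class 1: contains the keywords, high community access frequency.
    Class 2: contains the keywords, low community access frequency.
    Class 3: lacks the keywords, high community access frequency.
    """
    def page_class(page):
        has_keyword = page in keyword_pages
        is_high = frequency[page] >= high  # assumed cutoff for "high"
        if has_keyword:
            return 1 if is_high else 2
        return 3 if is_high else None  # neither: not listed

    classified = [(page, page_class(page)) for page in frequency]
    listed = [(page, cls) for page, cls in classified if cls is not None]
    listed.sort(key=lambda item: (class_order.index(item[1]),
                                  -frequency[item[0]],
                                  item[0]))
    return [page for page, _ in listed]


community_frequency = {"A": 3, "B": 3, "C": 0, "D": 2}
print(rank_pages(community_frequency, {"C", "D"}))  # ['D', 'A', 'B', 'C']
```

Passing `class_order=(1, 2, 3)` instead would place keyword page C ahead of A and B, which corresponds to the other preset order the description allows.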
【００５８】
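The page list generation unit 23 then reads the WWW page information for WWW pages D, A, B, and C from the WWW page information database, creates the WWW page list, and outputs it to the management unit 20 (S4). The WWW page information includes the URL of each WWW page, and in the WWW page list each URL is displayed as a hyperlink. The management unit 20 transmits the WWW page list to the client 3 (S5), and the series of information search processes ends. By selecting a desired WWW page from the WWW page list displayed in the interface of the client 3, the user can view that WWW page on the client 3.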
【００５９】
本実施の形態において、従来の情報検索のようにＷＷＷページの全アクセス数により順位付けをした場合には、アクセス数が４であるＷＷＷページＣがアクセス数２であるＷＷＷページＤより優先順位が高くなるが、ＷＷＷページＣは、ユーザＰと同様の嗜好を持つユーザＱ，Ｒのアクセス頻度が低い。一方、ＷＷＷページＤはユーザＱ，Ｒのアクセス頻度が高い。従って、ユーザＰの嗜好を考慮すれば、ユーザＰにとって有用な情報はＷＷＷページＤである可能性が高いと考えられ、従来の優先順位ではユーザＰの嗜好を反映していないこととなる。
【００６０】
本実施の形態に係る情報検索システム１によれば、コミュニティＰ，Ｑ，Ｒに属するユーザのアクセス頻度の高いＷＷＷページＤをＷＷＷページリストの上位に優先して表示することができ、ユーザＰの嗜好に適合した情報検索が可能となり、特に、入力されたキーワードを含むＷＷＷページ数が膨大な場合に有用である。また、キーワードｃ，ｄを有しないＷＷＷページＡ，Ｂをも検索結果として表示させることができ、キーワードのゆらぎに対しても強い情報検索が可能となる。
【００６１】
なお、本実施の形態では、ユーザはクライアント３にキーワードを入力するものとしたが、前記ページ検索部２１により、情報検索サーバ２にアクセスしたクライアント３に一定のキーワード群を表示させて、ユーザが、キーワードの入力に代えて、表示されたキーワード群からキーワードを選択するような形態としてもよい。また、前記ページ検索部２１により、ユーザが現在閲覧しているＷＷＷページに含まれる情報をキーワードとして抽出させることも可能である。
【００６２】
〔第２の実施の形態〕
図６に示すように、本実施の形態に係る情報検索システム５は、情報検索サーバ６と複数のクライアント３とが、ネットワーク４を介して双方通信可能な状態で接続されて構成されている。なお、クライアント３及びネットワーク４は前記第１の実施の形態と同様であるので説明を省略し、ここでは、情報検索サーバ６について詳述する。
【００６３】
前記情報検索サーバ６は、管理部６０、ページ検索部６１、コミュニティ検索部６２、マトリクスクラスタリング部６３、ページリスト生成部６４を備えてなるものである。
【００６４】
管理部６０は、ページ検索部６１、コミュニティ検索部６２、マトリクスクラスタリング部６３、ページリスト生成部６４の機能の管理とネットワーク４との接続の確保を行っており、クライアント３から送信されたキーワード及びユーザＩＤは管理部６０が受信して、ページ検索部６１及びコミュニティ検索部６２へ送信するようになっている。また、ページリスト生成部６４により作成されたＷＷＷページリストを所定のクライアント３に送信するようになっている。
【００６５】
さらに、管理部６０は、予め定められた一定周期毎にＷＷＷ上のＷＷＷページを巡回して、各ＷＷＷページに含まれるキーワードを抽出し、該キーワードとＷＷＷページＩＤとを対応させたキーワード・ページ対応表として記録し、且つ、ＷＷＷページのＵＲＬやページ運営者、ページ要約等のＷＷＷページ情報をＷＷＷページＩＤと対応させたＷＷＷページ情報データベースとして記録するロボット検索機能と、アクセスログ等から各ユーザのＷＷＷページへのアクセス履歴を、ユーザＩＤとＷＷＷページＩＤとを対応させたユーザ・ページ対応表として記録する機能とを有している。また、情報検索サーバ６にアクセスしたユーザが新規ユーザか否かを判定し、新規ユーザである場合にはユーザプロファイルの登録画面をクライアント３に表示させ、入力されたユーザプロファイルをデータベースに記録する。また、既登録ユーザのプロファイルの変更等も同様に行う。このように機能する管理部６０は、例えばＣＰＵ及び通信装置により実現することができる。
【００６６】
ページ検索部６１は、前記管理部６０からキーワードを受け、ロボット検索等を用いて蓄積されたキーワード・ページ対応表を用いて、該キーワードを有するＷＷＷページＩＤを検索して、マトリクスクラスタリング部６３へ出力するものである。ＷＷＷページＩＤは、ＷＷＷページを識別できるものであれば、ＵＲＬ等であってもよい。キーワード・ページ対応表は、例えばハードディスクに格納されており、ページ検索部６１は、例えばＣＰＵにより実現することができる。
【００６７】
コミュニティ検索部６２は、前記管理部６０からユーザＩＤを受け、ユーザプロファイルから情報検索を行っているユーザのコミュニティを求め、該コミュニティに属するユーザＩＤをマトリクスクラスタリング部６３へ出力するものである。該コミュニティは、例えば、ユーザプロファイル間の相関係数を求めることにより行うが、これについては後述する。前記ユーザプロファイルは、例えばハードディスクに格納されており、コミュニティ検索部６２は、例えばＣＰＵにより実現することができる。
【００６８】
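The matrix clustering unit 63 receives WWW page IDs from the page search unit 61 and user IDs from the community search unit 62, performs matrix clustering on the user/page correspondence table with these as initial values to extract a dense submatrix, and outputs the WWW page IDs contained in the submatrix to the page list generation unit 64. The user/page correspondence table is stored on, for example, a hard disk, and the matrix clustering unit 63 can be realized by, for example, a CPU.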
【００６９】
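The page list generation unit 64 receives the WWW page IDs from the matrix clustering unit 63, ranks them according to a fixed priority order, reads the WWW page information from the WWW page information database, and generates the WWW page list. From the IDs of the WWW pages frequently accessed by community users and the IDs of the WWW pages containing the keyword, the pages can be divided into three classes: (1) pages that contain the keyword and have a high community access frequency, (2) pages that contain the keyword but have a low community access frequency, and (3) pages that do not contain the keyword but have a high community access frequency. A priority order among these classes is set in advance. Within each of classes (1), (2), and (3), pages are ranked by community access frequency.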
【００７０】
次に、本情報検索システム５の処理手順を図７〜９を用いて説明する。
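First, user P accesses the information search server 6 from the client 3 via the network 4, and the search engine page is displayed in the interface of the client 3. To find WWW pages having the desired information, user P enters keywords that seem appropriate on that page. As in the first embodiment, suppose that the keywords "c" and "d" are entered as an AND search. The client 3 sends keywords c and d and the user ID of user P to the information search server 6, which then executes the information search process.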
【００７１】
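FIG. 7 shows the search processing procedure of the information search server 6. As shown, on receiving keywords c and d and the user ID of user P, the information search server 6 first performs a page search for WWW pages containing keywords c and d (S10). Specifically, the management unit 60, having received keywords c and d and the user ID of user P, outputs keywords c and d to the page search unit 61, which extracts WWW pages C and D containing keywords c and d from the keyword/page correspondence table shown in FIG. 3 and outputs their WWW page IDs to the matrix clustering unit 63.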
【００７２】
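Meanwhile, the community search unit 62, having received the user ID of user P from the management unit 60, extracts the community to which user P belongs from the user profiles (S11). The community is found, for example, by computing correlation coefficients between profiles. There are many ways to compute such a coefficient; the mean squared error method is used here as an example. FIG. 8 shows an example of the user profiles, with user IDs on the vertical axis and preferred fields such as sports, music, and movies on the horizontal axis; each field a user likes is rated on a five-level scale according to the degree of preference, and a field the user does not like is given "0". A user profile is registered against the user ID when the user first accesses the information search server 6 and can be updated as needed.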
【００７３】
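According to the user profiles, user P's preferred fields are イ, ロ, and ヘ. First, the other users who have named any of these categories as a preferred field are identified; users Q and R are found to share preferred fields with user P. Next, the correlation coefficient between the profiles of user P and user Q is obtained from the squared differences in their degrees of preference for fields ロ and ヘ, which both users have rated, as follows.
(3 - 5)^2 + (1 - 1)^2 = 4
Similarly, for user P and user R the shared fields are イ and ロ, so
(5 - 5)^2 + (3 - 1)^2 = 4
is obtained. The two correlation coefficients are equal, so users Q and R can be judged equally similar to user P in their preferences. If many users share preferred fields with user P, the community members may be selected by requiring the correlation coefficient to be at or below a threshold, or by ranking users on the correlation coefficient. The user IDs belonging to the community obtained in this way are output to the matrix clustering unit 63.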
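The profile comparison above can be sketched as follows. Only the ratings that can be recovered from the two worked sums are included (P: イ=5, ロ=3, ヘ=1; Q: ロ=5, ヘ=1; R: イ=5, ロ=1); the rest of FIG. 8 is not given in the text. Note that the worked sums add squared differences over the shared fields without dividing by the number of fields, so the sketch does the same. The function names `profile_distance` and `find_community` and the `max_distance` parameter are assumptions for the illustration.

```python
# Ratings recoverable from the worked sums; other entries of FIG. 8 are unknown.
PROFILES = {
    "P": {"イ": 5, "ロ": 3, "ヘ": 1},
    "Q": {"ロ": 5, "ヘ": 1},
    "R": {"イ": 5, "ロ": 1},
}


def profile_distance(a, b):
    """Sum of squared rating differences over the fields both users rated."""
    shared = a.keys() & b.keys()
    return sum((a[field] - b[field]) ** 2 for field in shared)


def find_community(profiles, user, max_distance):
    """Other users who share a rated field and lie within max_distance."""
    own = profiles[user]
    members = []
    for other, ratings in profiles.items():
        if other == user or not (own.keys() & ratings.keys()):
            continue
        if profile_distance(own, ratings) <= max_distance:
            members.append(other)
    return sorted(members)


print(profile_distance(PROFILES["P"], PROFILES["Q"]))  # 4
print(profile_distance(PROFILES["P"], PROFILES["R"]))  # 4
print(find_community(PROFILES, "P", max_distance=4))   # ['Q', 'R']
```

A smaller value means a closer match, which is why the description speaks of selecting users whose value is at or below a threshold.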
【００７４】
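The matrix clustering unit 63, having received the WWW page IDs of WWW pages C and D from the page search unit 61 and the user IDs of users P, Q, and R, who form the community to which user P belongs, from the community search unit 62, executes matrix clustering on the user/page correspondence table with these user IDs and WWW page IDs as initial values (S12). FIG. 9(a) shows an example of the user/page correspondence table, with user IDs on the vertical axis and WWW page IDs on the horizontal axis; an entry is "1" if the user has accessed the WWW page and "0" if not. With users P, Q, and R and WWW pages C and D as initial values, markers are propagated from the columns of users P, Q, and R to the rows whose element is 1, namely WWW pages A, B, D, and F. At this point WWW page C receives a marker unconditionally because it contains keywords c and d. As shown in FIG. 9(b), WWW pages A, B, C, D, and F therefore survive. Then, as shown in FIG. 9(c), the rows are pruned according to the number of markers received. Pruning rows with a marker count of 1 or less would eliminate WWW pages C and F and leave WWW pages A, B, and D, but WWW page C, although its marker count is 0, contains keywords c and d and therefore survives regardless of its marker count. WWW page D has a marker count of 2 and is not subject to pruning, and even if it were, it would survive because it contains keywords c and d. The other WWW pages A and B have a marker count of 2 and are not pruned. As shown in FIG. 9(c), WWW pages A, B, C, and D therefore survive. When columns are pruned according to the number of markers received, users P, Q, and R, who belong to the community, are made to survive regardless of their marker counts.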
【００７５】
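Repeating marker propagation after this does not change the matrix, so the dense submatrix obtained by matrix clustering is the one shown in FIG. 9(c). The WWW pages A, B, C, and D obtained in this way are output together with the access frequency of users P, Q, and R: 3 for WWW page A, 3 for WWW page B, 0 for WWW page C, and 2 for WWW page D. The WWW page IDs of WWW pages C and D are also output as those containing keywords c and d.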
【００７６】
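The page list generation unit 64, having received from the matrix clustering unit 63 the IDs of the WWW pages with high community access frequency and the IDs of the WWW pages containing the keywords, ranks the WWW pages on that basis (S13). Ranked in the same way as in the first embodiment, the order is D, A, B, C. The page list generation unit 64 then reads the WWW page information for WWW pages D, A, B, and C from the WWW page information database, creates the WWW page list, and outputs it to the management unit 60 (S14). The management unit 60 transmits the WWW page list to the client 3 (S15), and the series of information search processes ends.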
【００７７】
本実施の形態のように、予め登録されたユーザプロファイルによりユーザのコミュニティを抽出することとすれば、各ユーザのＷＷＷページアクセス履歴から検出する場合より精度が高くなり、ユーザの嗜好を的確に反映した情報検索が可能となる。
【００７８】
なお、前記各実施の形態に係る情報検索サーバ２，６は、専用のシステムの他、前述した情報検索方法の各処理ステップを行わせるためのプログラムとして実現し、例えば、該プログラムを記録したＣＤ−ＲＯＭ等の記録媒体を用いて、汎用コンピュータに該プログラムをインストールすることにより実現することも可能である。
【００７９】
【発明の効果】
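As described above, according to the present invention, the users belonging to a specific community that shares the preferences of the user performing the search are found from access histories or user profiles, and WWW pages can be searched and ranked in a way that reflects the community's preferences, so that WWW pages having the information the user prefers can be retrieved efficiently.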
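[Brief description of the drawings]
FIG. 1 is a diagram showing the configuration of an information search system according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing the information search processing procedure.
FIG. 3 is a diagram showing an example of the keyword/page correspondence table.
FIG. 4 is a diagram showing an example of the user/page correspondence table.
FIG. 5 is a diagram showing the matrix clustering process.
FIG. 6 is a diagram showing the configuration of an information search system according to a second embodiment of the present invention.
FIG. 7 is a flowchart showing the information search processing procedure.
FIG. 8 is a diagram showing an example of the user profiles.
FIG. 9 is a diagram showing the matrix clustering process.
FIG. 10 is a flowchart showing an example of a conventional information search processing procedure.
FIG. 11 is a diagram showing an example of a conventional keyword/page correspondence table.
FIG. 12 is a diagram showing an example of a conventional page priority table.
[Explanation of reference numerals]
1, 5 Information search system
2, 6 Information search server
3 Client
4 Network
20, 60 Management unit
21, 61 Page search unit
22, 62 Community search unit
23, 64 Page list generation unit
63 Matrix clustering unit
[0001]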
[Technical field of the invention]
The present invention relates to an information search method, an information search server, and an information search program that, based on a keyword and a user ID from a requesting client, search the information provided on WWW pages for the information the user wants and transmit WWW page information having that information to the client.
[0002]
[Prior art]
With the spread of the WWW, a huge amount of information has been digitized and made accessible as WWW pages, and the information in this large body of WWW pages is widely used in various fields. For users to obtain the information they want efficiently from such a vast amount of information on the WWW, it is very important to be able to search the WWW efficiently and display the desired WWW page quickly. Various search engines have conventionally been used for such information searches.
[0003]
An information search system using a conventional search engine works in outline as follows. First, the user enters at the client a keyword for obtaining the desired information, and the keyword is transmitted to the server. The server that receives the keyword extracts the WWW page information matching the keyword from a keyword/page correspondence table, generates a WWW page list, and displays the list on the client. The user can display a desired WWW page by using the URL in each displayed item of WWW page information. When the displayed WWW page list contains many items, the user enters further keywords to narrow it down. The keyword/page correspondence table is created by the server collecting the WWW pages to be searched in advance and extracting the keywords contained in each page.
[0004]
In general, the number of WWW pages searchable by a search engine is enormous. When a search uses a common or heavily used term as the keyword, a huge number of WWW pages contain it, so the number of items shown in the WWW page list is also huge, and further keywords must be entered to narrow the pages down. For example, when a user searches a conventional search engine with the keyword "Tanaka" to obtain information about a particular Mr. Tanaka, well over a million WWW pages are hit. The resulting list mixes information about people named Tanaka in many different fields, so it is difficult for the user to find in it the WWW page with information about the Mr. Tanaka in question. Even if the user, wanting a Mr. Tanaka connected with music, narrows the search with the keywords "Tanaka" and "music", more than 200,000 pages are hit, and finding the right page among them is still difficult. When searching the WWW it is therefore desirable to use keywords that appear in the desired WWW page and are presumed not to be common or heavily used terms, but choosing such keywords accurately and searching efficiently is difficult for anyone but an expert who knows which terms are common or heavily used in WWW pages.
[0005]
Conventional search engines, on the other hand, rank the search results in the WWW page list so that the desired WWW page can be displayed efficiently even when the user is unfamiliar with information search or cannot think of several suitable keywords. Such ranking is often based on the number of accesses to a WWW page or the number of links pointing to it. A page with many accesses is popular with many users and so is likely to be the page the searching user wants; a page with many inbound links is linked from many WWW pages and so has high objective importance, and is likewise considered likely to be the page the user wants (see Patent Document 1 and Patent Document 2).
[0006]
FIG. 10 shows an example of the processing of a conventional information search system. First, using the keyword entered at the client, the server searches for WWW pages containing that keyword by means of a keyword/page correspondence table created in advance.
[0007]
FIG. 11 shows the keyword/page correspondence table, with WWW pages on the vertical axis and keywords on the horizontal axis; "0" indicates that the page does not have the corresponding keyword and "1" that it does. The table is updated as needed by a robot search or the like that crawls the WWW pages at regular intervals and extracts keywords. The WWW pages containing the keyword entered at the client are extracted from this table.
[0008]
Next, the WWW pages found are ranked using a page priority table. This table records, for each WWW page, a numerical priority based on the page's number of accesses or inbound links. FIG. 12 shows the page priority table, in which each WWW page is ranked according to its number of accesses or inbound links; the larger the value, the greater the number of accesses or inbound links and the higher the priority. Following this ranking, a WWW page list presenting the retrieved WWW page information in descending order of priority is generated and displayed on the client.
[0009]
For example, when a search is performed with the keyword "c" entered at the client, the server functioning as the information search system extracts WWW pages C, D, and E from the keyword/page correspondence table. It then obtains the priorities of the extracted WWW pages C, D, and E from the page priority table, generates a WWW page list ordered C, E, D from highest priority to lowest, and displays it on the client.
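The conventional flow in this example, a keyword lookup followed by a global priority sort, can be sketched as follows. The contents of FIG. 11 and FIG. 12 are not reproduced in the text, so the keyword sets and priority values below are assumptions, chosen only so that keyword "c" yields pages C, D, and E ranked C, E, D. The function name `conventional_search` is also hypothetical.

```python
# Hypothetical stand-ins for FIG. 11 (page -> keywords) and FIG. 12 (page -> priority).
PAGE_KEYWORDS = {
    "A": {"a", "b"},
    "B": {"b"},
    "C": {"c"},
    "D": {"c", "d"},
    "E": {"c", "e"},
}
PAGE_PRIORITY = {"A": 2, "B": 1, "C": 5, "D": 3, "E": 4}


def conventional_search(page_keywords, page_priority, keyword):
    """Find pages containing the keyword, highest global priority first."""
    hits = [page for page, words in page_keywords.items() if keyword in words]
    return sorted(hits, key=lambda page: page_priority[page], reverse=True)


print(conventional_search(PAGE_KEYWORDS, PAGE_PRIORITY, "c"))  # ['C', 'E', 'D']
```

The ordering here depends only on the page, never on who is searching, so every user sees the same list for the same keyword.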
[0010]
The user selects and displays any WWW page from the WWW page information in the list displayed on the client. Because a WWW page with many accesses or inbound links is useful to many users, it is likely to be useful to the user performing the search as well. Generating a WWW page list ranked from the pages most widely useful to users, with the number of accesses or inbound links as the index, thus raises the likelihood that a WWW page having the information the user wants appears near the top, and provides an efficient information search.
[0011]
[Patent Document 1]
JP 2002-202992 A
[Patent Document 2]
JP 2002-215671 A
[0012]
[Problems to be solved by the invention]
However, the kinds of information provided on the WWW have diversified, the purposes of information search vary from user to user, and as the user base of the WWW broadens, the preferences of the users who search have also become varied. The WWW page having the information wanted by a particular searching user is therefore not necessarily one that is popular with, or important to, all users, so ranking based on the number of accesses or inbound links does not raise search efficiency for every one of a diverse population of users.
[0013]
For example, just as a WWW page popular with younger users is not necessarily popular with older users, when a particular user tries to find information that suits his or her own preferences, a WWW page list ranked by the access counts or inbound-link counts of all users will not necessarily place the page with the desired information near the top. The user must then display the WWW pages one by one, regardless of their ranking in the list, and check whether each contains the desired information. The search ends up taking time and effort and is inefficient.
[0014]
The present invention has been made in view of the above. An object of the present invention is to provide an information search system capable of performing efficient information search on the WWW and, in particular, an information search method, an information search server, and an information search program capable of performing an information search that reflects the preferences of the specific user performing the search.
[0015]
[Means for Solving the Problems]
An information search method according to claim 1 of the present invention searches for information desired by a user from information provided on WWW pages based on a keyword and a user ID from a requesting client, and transmits WWW page information having the information to the client. In this method, based on the keyword, WWW pages including the keyword are searched for in a keyword / page correspondence table in which the keywords included in each WWW page are recorded in association with that WWW page. Using the user ID and the WWW page search results as initial values, an area where access histories are dense is extracted from a user / page correspondence table that expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, whereby a community of users whose tendencies are similar to those of the user with respect to the keyword is obtained and the WWW pages belonging to the community are selected. The WWW pages belonging to the community are ranked based on the access frequency of the users belonging to the community, a WWW page list indicating the WWW page information is generated, and the WWW page list is transmitted to the client.
[0016]
The information search method according to claim 2 of the present invention searches for information desired by a user from information provided on WWW pages based on a keyword and a user ID from a requesting client, and transmits WWW page information having the information to the client. In this method, based on the keyword, WWW pages including the keyword are searched for in a keyword / page correspondence table in which the keywords included in each WWW page are recorded in association with that WWW page. Based on the user ID, a community of users whose preferences are similar to those of the user is obtained from a user profile in which the preferences of each user are recorded in advance. Using the WWW page search results and the users belonging to the community as initial values, an area where access histories are dense is extracted from a user / page correspondence table that expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, whereby the WWW pages belonging to the area are selected. The WWW pages belonging to the area are ranked based on the access frequency of the users belonging to the area, a WWW page list indicating the WWW page information is generated, and the WWW page list is transmitted to the client.
[0017]
According to a third aspect of the present invention, in the information search method according to the first or second aspect, the keyword from the client is input by a user.
[0018]
Further, according to the present invention (claim 4), in the information search method according to claim 1 or 2, the keyword from the client is selected by a user from a keyword group displayed on the client.
[0019]
According to the present invention (claim 5), in the information search method according to claim 1 or 2, the keyword from the client is extracted from a WWW page displayed on the client.
[0020]
Further, according to the present invention (claim 6), in the information search method according to claim 1, the WWW pages accessed by the requesting user and the WWW pages including the keyword are selected as targets from the user / page correspondence table, the users who have accessed the target WWW pages are selected as the next targets, and this is repeated until a predetermined convergence condition is satisfied, thereby extracting the area where access histories are dense.
[0021]
Further, according to the present invention (claim 7), in the information search method according to claim 6, when a WWW page or a user is selected as the next target from the user / page correspondence table, any WWW page or user whose access frequency is equal to or lower than a predetermined threshold is excluded, except for WWW pages including the keyword.
[0022]
Further, according to the present invention (claim 8), in the information search method according to claim 2, the WWW pages accessed by the users belonging to the community and the WWW pages including the keyword are selected as targets from the user / page correspondence table, the users who have accessed the target WWW pages and the users belonging to the community are selected as the next targets, and this is repeated until a predetermined convergence condition is satisfied, thereby extracting the area where access histories are dense.
[0023]
Further, according to the present invention (claim 9), in the information search method according to claim 8, when a WWW page or a user is selected as the next target from the user / page correspondence table, any WWW page or user whose access frequency is equal to or lower than a predetermined threshold is excluded, except for WWW pages including the keyword and users belonging to the community.
[0024]
Also, according to the present invention (claim 10), in the information search method according to claim 1 or 2, the WWW page list classifies the WWW pages belonging to the community or the area into WWW pages that include the keyword and are frequently accessed by the community or the area, WWW pages that include the keyword but are infrequently accessed by the community or the area, and WWW pages that do not include the keyword but are frequently accessed by the community or the area, and indicates the WWW page information ranked within each classification.
[0025]
The information search server according to claim 11 of the present invention searches for information desired by a user from information provided on WWW pages based on a keyword and a user ID from a requesting client, and transmits WWW page information having the information to the client as a WWW page list. The server comprises: page search means for searching, based on the keyword, a keyword / page correspondence table in which the keywords included in each WWW page are recorded in association with that WWW page for WWW pages including the keyword, and outputting the search result; community search means for, using the user ID and the search result of the page search means as initial values, extracting an area where access histories are dense from a user / page correspondence table that expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, thereby obtaining a community of users whose tendencies are similar to those of the user with respect to the keyword, selecting the WWW pages belonging to the community, and outputting those WWW pages and the access frequency of the community; and page list generating means for ranking the WWW pages selected by the community search means based on the access frequency of the users belonging to the community, and generating and outputting a WWW page list indicating the WWW page information.
[0026]
The information search server according to claim 12 of the present invention searches for information desired by a user from information provided on WWW pages based on a keyword and a user ID from a requesting client, and transmits WWW page information having the information to the client as a WWW page list. The server comprises: page search means for searching, based on the keyword, a keyword / page correspondence table in which the keywords included in each WWW page are recorded in association with that WWW page for WWW pages including the keyword, and outputting the search result; community search means for obtaining, based on the user ID, a community of users whose preferences are similar to those of the user from a user profile in which the preferences of each user are recorded in advance, and outputting the user IDs belonging to the community; matrix clustering means for, using the search result and the user IDs belonging to the community as initial values, extracting an area where access histories are dense from a user / page correspondence table that expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, thereby selecting the WWW pages belonging to the area, and outputting those WWW pages and their access frequency; and page list generating means for ranking the WWW pages selected by the matrix clustering means based on the access frequency, and generating and outputting a WWW page list indicating the WWW page information.
[0027]
According to a thirteenth aspect of the present invention, in the information search server according to the eleventh or twelfth aspect, the page search means searches for WWW pages based on a keyword input to the client by the user.
[0028]
According to a fourteenth aspect of the present invention, in the information search server according to the eleventh or twelfth aspect, the page search means displays a keyword group on the client and searches for WWW pages based on a keyword selected by the user from the keyword group.
[0029]
Further, according to the present invention (claim 15), in the information search server according to claim 11 or 12, the page search means extracts a keyword from a WWW page displayed on the client and searches for WWW pages based on that keyword.
[0030]
Further, according to the present invention (claim 16), in the information search server according to claim 11, the community search means selects, from the user / page correspondence table, the WWW pages accessed by the requesting user and the WWW pages including the keyword as targets, selects the users who have accessed the target WWW pages as the next targets, and repeats this until a predetermined convergence condition is satisfied, thereby extracting the area where access histories are dense.
[0031]
Further, according to the present invention (claim 17), in the information search server according to claim 16, when the community search means selects a WWW page or a user as the next target from the user / page correspondence table, it excludes any WWW page or user whose access frequency is equal to or lower than a predetermined threshold, except for WWW pages including the keyword.
[0032]
Further, according to the present invention (claim 18), in the information search server according to claim 12, the matrix clustering means selects, from the user / page correspondence table, the WWW pages accessed by the users belonging to the community and the WWW pages including the keyword as targets, selects the users who have accessed the target WWW pages and the users belonging to the community as the next targets, and repeats this until a predetermined convergence condition is satisfied, thereby extracting the area where access histories are dense.
[0033]
Further, according to the present invention (claim 19), in the information search server according to claim 18, when the matrix clustering means selects a WWW page or a user as the next target from the user / page correspondence table, it excludes any WWW page or user whose access frequency is equal to or lower than a predetermined threshold, except for WWW pages including the keyword and users belonging to the community.
[0034]
Further, according to the present invention (claim 20), in the information search server according to claim 11 or 12, the page list generating means classifies the WWW pages belonging to the community or the area into WWW pages that include the keyword and are frequently accessed by the community or the area, WWW pages that include the keyword but are infrequently accessed by the community or the area, and WWW pages that do not include the keyword but are frequently accessed by the community or the area, and generates a WWW page list indicating the WWW page information ranked within each classification.
[0035]
An information search program according to claim 21 of the present invention causes a computer to function as an information search server that searches for information desired by a user from information provided on WWW pages based on a keyword and a user ID from a requesting client, and transmits WWW page information having the information to the client as a WWW page list. The program causes the computer to function as: page search means for searching, based on the keyword, a keyword / page correspondence table in which the keywords included in each WWW page are recorded in association with that WWW page for WWW pages including the keyword, and outputting the search result; community search means for, using the user ID and the search result of the page search means as initial values, extracting an area where access histories are dense from a user / page correspondence table that expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, thereby obtaining a community of users whose tendencies are similar to those of the user with respect to the keyword, selecting the WWW pages belonging to the community, and outputting those WWW pages and the access frequency of the community; and page list generating means for ranking the WWW pages selected by the community search means based on the access frequency of the users belonging to the community, and generating and outputting a WWW page list indicating the WWW page information.
[0036]
Further, the information search program according to claim 22 of the present invention causes a computer to function as an information search server that searches for information desired by a user from information provided on WWW pages based on a keyword and a user ID from a requesting client, and transmits WWW page information having the information to the client as a WWW page list. The program causes the computer to function as: page search means for searching, based on the keyword, a keyword / page correspondence table in which the keywords included in each WWW page are recorded in association with that WWW page for WWW pages including the keyword, and outputting the search result; community search means for obtaining, based on the user ID, a community of users whose preferences are similar to those of the user from a user profile in which the preferences of each user are recorded in advance, and outputting the user IDs belonging to the community; matrix clustering means for, using the search result and the user IDs belonging to the community as initial values, extracting an area where access histories are dense from a user / page correspondence table that expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, thereby selecting the WWW pages belonging to the area, and outputting those WWW pages and their access frequency; and page list generating means for ranking the WWW pages selected by the matrix clustering means based on the access frequency, and generating and outputting a WWW page list indicating the WWW page information.
[0037]
According to a twenty-third aspect of the present invention, in the information search program according to the twenty-first or twenty-second aspect, the page search means is made to function so as to search for WWW pages based on a keyword input to the client by the user.
[0038]
According to the present invention (claim 24), in the information search program according to claim 21 or 22, the page search means is made to function so as to display a keyword group on the client and search for WWW pages based on a keyword selected by the user from the keyword group.
[0039]
According to the present invention (claim 25), in the information search program according to claim 21 or 22, the page search means is made to function so as to extract a keyword from a WWW page displayed on the client and search for WWW pages based on that keyword.
[0040]
According to the present invention (claim 26), in the information search program according to claim 21, the community search means is made to function so as to select, from the user / page correspondence table, the WWW pages accessed by the requesting user and the WWW pages including the keyword as targets, select the users who have accessed the target WWW pages as the next targets, and repeat this until a predetermined convergence condition is satisfied, thereby extracting the area where access histories are dense.
[0041]
According to the present invention (claim 27), in the information search program according to claim 26, the community search means is made to function so that, when a WWW page or a user is selected as the next target from the user / page correspondence table, any WWW page or user whose access frequency is equal to or lower than a predetermined threshold is excluded, except for WWW pages including the keyword.
[0042]
According to the present invention (claim 28), in the information search program according to claim 22, the matrix clustering means is made to function so as to select, from the user / page correspondence table, the WWW pages accessed by the users belonging to the community and the WWW pages including the keyword as targets, select the users who have accessed the target WWW pages and the users belonging to the community as the next targets, and repeat this until a predetermined convergence condition is satisfied, thereby extracting the area where access histories are dense.
[0043]
According to the present invention (claim 29), in the information search program according to claim 28, the matrix clustering means is made to function so that, when a WWW page or a user is selected as the next target from the user / page correspondence table, any WWW page or user whose access frequency is equal to or lower than a predetermined threshold is excluded, except for WWW pages including the keyword and users belonging to the community.
[0044]
According to the present invention (claim 30), in the information search program according to claim 21 or 22, the page list generating means is made to function so as to classify the WWW pages belonging to the community or the area into WWW pages that include the keyword and are frequently accessed by the community or the area, WWW pages that include the keyword but are infrequently accessed by the community or the area, and WWW pages that do not include the keyword but are frequently accessed by the community or the area, and to generate a WWW page list indicating the WWW page information ranked within each classification.
[0045]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be specifically described with reference to the drawings.
[First Embodiment]
As shown in FIG. 1, an information search system 1 according to the present embodiment includes an information search server 2 and a plurality of clients 3 connected via a network 4 so as to be able to communicate with each other. The information search server 2 and the client 3 are configured by a computer, software, and the like. The computer is, for example, a personal computer and includes a CPU, a RAM, a hard disk, a display device such as a CRT, an input device such as a keyboard and a mouse, and a communication device such as a LAN board. The network 4 is a wide area network, a public network, a LAN, or the like. In the present embodiment, the Internet will be described as an example.
[0046]
When the client 3 accesses the information search server 2 via the Internet as in the present embodiment, the information search server 2 is generally a WWW server that provides a search engine, and the client 3 is provided with an interface such as an Internet browser, through which the search engine and WWW pages can be browsed. When a user searches for information on the WWW from the client 3, the user inputs a keyword into the search engine displayed on the interface of the client 3, and the client 3 sends the keyword to the information search server 2 via the network 4. In response, the information search server 2 searches for WWW pages based on the keyword and the user ID, ranks the search results, generates a WWW page list, and transmits the list to the client 3. The user browses a WWW page on the client 3 by selecting the desired WWW page from the WWW page list displayed on the interface of the client 3.
[0047]
The information search server 2 includes a management unit 20, a page search unit 21, a community search unit 22, and a page list generation unit 23.
The management unit 20 manages the functions of the page search unit 21, the community search unit 22, and the page list generation unit 23, and maintains the connection with the network 4. The keyword and the user ID transmitted from the client 3 are received by the management unit 20 and passed to the page search unit 21 and the community search unit 22. The WWW page list created by the page list generation unit 23 is transmitted to the predetermined client 3. The management unit 20 also has a robot search function that traverses WWW pages on the WWW at predetermined intervals, extracts the keywords included in each WWW page, records them as a keyword / page correspondence table in which each keyword is associated with a WWW page ID, and records a WWW page information database in which WWW page information such as the URL of a WWW page, the page operator, and a page summary is associated with the WWW page ID. It further has a function of recording each user's access history to WWW pages as a user / page correspondence table in which user IDs and WWW page IDs are associated with each other. The management unit 20 that functions as described above can be realized by, for example, a CPU and a communication device.
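As a rough illustration of the access-history recording function, the user / page correspondence table can be held as a mapping from user ID to the set of accessed WWW page IDs. The class name `AccessLog` and its methods are hypothetical, and the in-memory representation is an assumption; the text only states that the table associates user IDs with WWW page IDs.

```python
from collections import defaultdict


class AccessLog:
    """Minimal in-memory user / page correspondence table (assumed layout)."""

    def __init__(self):
        # user ID -> set of WWW page IDs that the user has accessed
        self.table = defaultdict(set)

    def record(self, user_id, page_id):
        """Record that the user accessed the page."""
        self.table[user_id].add(page_id)

    def accessed(self, user_id, page_id):
        """Return 1 if the user has accessed the page, otherwise 0."""
        return 1 if page_id in self.table.get(user_id, set()) else 0


log = AccessLog()
log.record("P", "A")
log.record("P", "B")
print(log.accessed("P", "A"), log.accessed("P", "C"))
```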
[0048]
The page search unit 21 receives a keyword from the management unit 20, searches for the WWW page IDs having the keyword by using the keyword / page correspondence table stored by the robot search or the like, and outputs them to the community search unit 22. The WWW page ID may be anything that can identify a WWW page, such as a URL. The keyword / page correspondence table is stored, for example, on a hard disk, and the page search unit 21 can be realized by, for example, a CPU.
[0049]
The community search unit 22 receives the user ID from the management unit 20 and the WWW page search result, that is, the extracted WWW page IDs, from the page search unit 21. Using the user ID and the WWW page IDs as initial values, it performs matrix clustering on the user / page correspondence table to obtain the community of the user performing the information search, and outputs the WWW page IDs frequently accessed by the users belonging to the community and the WWW page IDs including the keyword to the page list generation unit 23. Matrix clustering is a method of extracting a dense sub-matrix from a matrix whose elements take the two values "1" and "0", and is described in detail later. A community is the group of users included in the dense sub-matrix extracted by matrix clustering, that is, a group of users who have accessed the WWW pages accessed by the user performing the information search or the WWW pages including the keyword. In other words, the community is a group of users whose preferences are similar to those of the user performing the information search. The user / page correspondence table is stored, for example, on a hard disk, and the community search unit 22 can be realized by, for example, a CPU.
[0050]
The page list generation unit 23 receives, from the community search unit 22, the WWW page IDs frequently accessed by the users belonging to the community and the WWW page IDs including the keyword, ranks the WWW page IDs based on a predetermined priority, reads the WWW page information from the WWW page information database, and generates a WWW page list. From the WWW page IDs frequently accessed by the community and the WWW page IDs including the keyword, the WWW pages can be classified into three types: (1) WWW pages that include the keyword and are frequently accessed by the community, (2) WWW pages that include the keyword but are infrequently accessed by the community, and (3) WWW pages that do not include the keyword but are frequently accessed by the community. A priority among these types is set in advance. In general, the WWW pages classified into (1) are considered to be highly useful to the user performing the information search, so it suffices to give (1) the highest priority and to set in advance which of (2) and (3) takes precedence. Within each of (1), (2), and (3), the WWW pages are ranked according to the access frequency of the community.
[0051]
Next, a processing procedure of the information search system 1 will be described with reference to FIGS.
First, the user P accesses the information search server 2 from the client 3 via the network 4 and displays the search engine page on the interface of the client 3. To search for a WWW page having the information desired by the user P, a keyword thought to serve as a clue is input on the page. For example, assume that "c" and "d" are input as keywords for an AND search. The keywords c and d and the user ID of the user P are transmitted from the client 3 to the information search server 2, and in response the information search server 2 executes the information search process.
[0052]
FIG. 2 shows the search processing procedure of the information search server 2. As shown in FIG. 2, the information search server 2, having received the keywords c and d and the user ID of the user P, first performs a page search for WWW pages including the keywords c and d (S1). In more detail, the management unit 20, which has received the keywords c and d and the user ID of the user P, outputs the keywords c and d to the page search unit 21, and the page search unit 21 extracts the WWW pages including the keywords c and d from the keyword / page correspondence table. FIG. 3 shows an example of the keyword / page correspondence table. The vertical axis represents the WWW page ID and the horizontal axis represents the keyword; an element is "1" if the WWW page has the keyword and "0" if it does not. Since the input keywords are "c" and "d", the page search unit 21 extracts the WWW page IDs of the WWW pages C and D and outputs them to the community search unit 22.
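The page search step (S1) amounts to an AND lookup over the keyword / page correspondence table. The sketch below uses a hypothetical stand-in for the table of FIG. 3: only the fact that WWW pages C and D contain both keywords c and d is taken from the text, and the remaining rows and the name `page_search` are assumptions.

```python
# Hypothetical stand-in for FIG. 3: WWW page ID -> keywords found on the page.
# A keyword in the set corresponds to "1" in the table, absence to "0".
KEYWORD_PAGE_TABLE = {
    "A": {"a"},
    "B": {"b"},
    "C": {"c", "d"},
    "D": {"c", "d"},
    "E": {"c"},
}


def page_search(keywords):
    """AND search: return the IDs of pages that contain every keyword."""
    wanted = set(keywords)
    return sorted(page for page, found in KEYWORD_PAGE_TABLE.items()
                  if wanted <= found)


hits = page_search(["c", "d"])
print(hits)
```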
[0053]
The community search unit 22, having received the user ID of the user P from the management unit 20 and the WWW page IDs of the WWW pages C and D from the page search unit 21, executes matrix clustering on the user / page correspondence table with the user ID and the WWW page IDs as initial values (S2). FIG. 4 shows an example of the user / page correspondence table. The vertical axis represents the user ID and the horizontal axis represents the WWW page ID; an element is "1" when the user has accessed the WWW page and "0" when the user has not. In this example, the user P has viewed the WWW pages A and B. Matrix clustering is a method for extracting a dense sub-matrix that includes the initial values from a matrix whose elements take the two values 1 and 0. The dense sub-matrix is found by repeating marker propagation and pruning starting from a specified row or column.
[0054]
Hereinafter, a process of performing matrix clustering from the column of the user P in the row direction with the initial values of the user P and the WWW pages C and D will be described with reference to FIG.
First, as shown in FIG. 5A, markers are propagated in the user / page correspondence table from the column of the user P to the rows whose elements are 1, that is, to the WWW pages A and B. At this time, since the WWW pages C and D include the keywords c and d, markers are forcibly propagated to them as well. Therefore, the WWW pages A, B, C, and D survive. Next, as shown in FIG. 5B, markers are propagated from the rows of the WWW pages A, B, C, and D to the columns whose elements are 1, that is, to the users P, Q, R, S, T, U, and V. Thereafter, as shown in FIG. 5C, pruning is performed based on the number of markers received by each column. When columns with one marker or fewer are pruned, the users S, T, U, and V are deleted, and the users P, Q, and R survive. Pruning is likewise performed based on the number of markers received by each row. Although the WWW page C then has zero markers, it includes the keywords c and d and is therefore forcibly retained regardless of its number of markers. The WWW page D has two markers and is not subject to pruning; even if it were, it would be forcibly retained because it includes the keywords c and d. By pruning with a predetermined number of markers as the threshold in this way, the final sub-matrix can be narrowed down to a desired size, and the matrix clustering becomes more efficient and faster. This is effective when the user / page correspondence table is enormous.
[0055]
The other WWW pages A and B have two or more markers and are therefore not pruned. Thereafter, the matrix does not change even if marker propagation is repeated, so the dense sub-matrix obtained by matrix clustering is as shown in FIG. The sub-matrix indicates the community of users P, Q, and R, whose access histories are similar to that of the user P with respect to the keywords c and d. Even for the same user P, this community differs depending on the initial values determined by the keywords. That is, one user belongs to a different community for each piece of information to be searched, and the matrix clustering performed by the community search unit 22 makes it possible to extract, easily and at high speed, communities based on relationships between users that cannot be assumed in advance.
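The marker propagation and pruning walk-through above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the `ACCESS` table is a hypothetical reconstruction chosen to be consistent with the counts given in the text (it is not FIG. 4 itself), the function name `matrix_clustering` and its parameters are invented for the example, and always retaining the requesting user is an assumption of the sketch.

```python
# Hypothetical user / page correspondence table (user ID -> accessed pages),
# chosen so that the walk-through in the text is reproduced.
ACCESS = {
    "P": {"A", "B"},
    "Q": {"A", "B", "D"},
    "R": {"A", "B", "D"},
    "S": {"C"}, "T": {"C"}, "U": {"C"}, "V": {"C"},
}


def matrix_clustering(access, start_user, keyword_pages, threshold=1, max_iter=100):
    """Repeat marker propagation and pruning until the sub-matrix is stable."""
    forced = set(keyword_pages)          # keyword pages are never pruned
    users, pages = {start_user}, set()
    for _ in range(max_iter):
        # Propagate markers from the current users to the pages they accessed;
        # keyword pages receive markers unconditionally.
        cand_pages = set().union(*(access.get(u, set()) for u in users)) | forced
        # Propagate markers from those pages to every user who accessed one.
        cand_users = {u for u, seen in access.items() if seen & cand_pages}
        # Prune users whose marker count is at or below the threshold
        # (the requesting user is always kept: an assumption of this sketch).
        new_users = {u for u in cand_users
                     if len(access[u] & cand_pages) > threshold} | {start_user}
        # Prune pages the same way, except for the forced keyword pages.
        new_pages = {p for p in cand_pages if p in forced or
                     sum(p in access.get(u, set()) for u in new_users) > threshold}
        if new_users == users and new_pages == pages:
            break                        # converged: nothing changed
        users, pages = new_users, new_pages
    # Access frequency of each surviving page within the community.
    freq = {p: sum(p in access.get(u, set()) for u in users) for p in pages}
    return users, pages, freq


community, community_pages, frequency = matrix_clustering(ACCESS, "P", {"C", "D"})
print(sorted(community), sorted(frequency.items()))
```

With the assumed table, the loop stabilizes on users P, Q, and R and pages A, B, C, and D, matching the walk-through; a different threshold or table would of course yield a different sub-matrix.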
[0056]
In this way, the community search unit 22 outputs the WWW page IDs of the WWW pages A, B, C, and D as the WWW pages of the community, together with the access frequency of the users P, Q, and R belonging to the community for each page, namely 3 for WWW page A, 3 for WWW page B, 0 for WWW page C, and 2 for WWW page D. It also outputs the WWW page IDs of the WWW pages C and D as pages including the keywords c and d. Alternatively, the WWW page IDs of the WWW pages C and D may be output from the page search unit 21 to the page list generation unit 23.
[0057]
The page list generation unit 23, having received from the community search unit 22 the WWW page IDs frequently accessed by the community and the WWW page IDs including the keywords, ranks the WWW pages based on them (S3). As described above, these WWW pages can be classified into three types: (1) the WWW page D, which includes the keywords c and d and is frequently accessed by the community of users P, Q, and R; (2) the WWW page C, which includes the keywords c and d but is infrequently accessed by the community; and (3) the WWW pages A and B, which do not include the keywords c and d but are frequently accessed by the community. The types are ordered according to a preset order. For example, if the order is set to (1), (3), (2), the priority order of the WWW pages is D, A, B, C.
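The three-way classification and ranking (S3) can be sketched as below. The access frequencies and keyword pages are the values stated in the text; the function name `rank_pages`, the cut-off `high=2` separating frequent from infrequent access, and the tie-break by page ID are assumptions made for the illustration.

```python
def rank_pages(freq, keyword_pages, class_order=(1, 3, 2), high=2):
    """Rank pages by class first, then by community access frequency."""

    def page_class(page):
        has_keyword = page in keyword_pages
        frequent = freq[page] >= high    # assumed cut-off for "high" frequency
        if has_keyword and frequent:
            return 1                     # (1) keyword, frequently accessed
        if has_keyword:
            return 2                     # (2) keyword, infrequently accessed
        if frequent:
            return 3                     # (3) no keyword, frequently accessed
        return 4                         # neither: placed last (assumption)

    position = {cls: i for i, cls in enumerate(class_order)}
    return sorted(
        freq,
        key=lambda p: (position.get(page_class(p), len(position)), -freq[p], p),
    )


# Community access frequencies and keyword pages as given in the text.
FREQ = {"A": 3, "B": 3, "C": 0, "D": 2}
KEYWORD_PAGES = {"C", "D"}

ranking = rank_pages(FREQ, KEYWORD_PAGES)
print(ranking)
```

Passing `class_order=(1, 2, 3)` instead would place the keyword page C ahead of A and B, which shows how the preset order changes the list.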
[0058]
Further, the page list generation unit 23 reads out the WWW page information of the WWW pages D, C, A, and B from the WWW page information database, creates a WWW page list, and outputs it to the management unit 20 (S4). The WWW page information includes the URL of the WWW page, and the WWW page list is displayed with a hyperlink attached to each URL. The management unit 20 transmits the WWW page list to the client 3 (S5), and the series of information search processing ends. The user can browse a WWW page on the client 3 by selecting the desired WWW page from the WWW page list displayed on the interface of the client 3.
[0059]
In the present embodiment, if ranking were performed based on the total number of accesses to each WWW page, as in conventional information search, the WWW page C with 4 accesses would be given higher priority than the WWW page D with 2 accesses. However, the users Q and R, who have the same preferences as the user P, access the WWW page C infrequently, whereas they access the WWW page D frequently. Considering the preferences of the user P, the WWW page D is therefore more likely to hold useful information for the user P, and the conventional priority order does not reflect the preferences of the user P.
[0060]
According to the information search system 1 according to the present embodiment, the WWW page D, which is frequently accessed by the users P, Q, and R belonging to the community, can be displayed with higher priority in the WWW page list. This makes it possible to search for information that suits the user's preferences, and is particularly useful when the number of WWW pages including the input keyword is enormous. In addition, the WWW pages A and B, which do not include the keywords c and d, can also be displayed as search results, so that information search that is robust against keyword variation can be performed.
[0061]
In the present embodiment, the user inputs a keyword to the client 3. Alternatively, the page search unit 21 may cause the client 3 accessing the information search server 2 to display a group of keywords, and the user may select a keyword from the displayed group instead of inputting one. The page search unit 21 can also extract, as a keyword, information included in the WWW page the user is currently browsing.
[0062]
[Second embodiment]
As shown in FIG. 6, an information search system 5 according to the present embodiment is configured such that an information search server 6 and a plurality of clients 3 are connected via a network 4 so as to be able to communicate with each other. Note that the client 3 and the network 4 are the same as those in the first embodiment, and a description thereof will be omitted. Here, the information search server 6 will be described in detail.
[0063]
The information search server 6 includes a management unit 60, a page search unit 61, a community search unit 62, a matrix clustering unit 63, and a page list generation unit 64.
[0064]
The management unit 60 manages the functions of the page search unit 61, the community search unit 62, the matrix clustering unit 63, and the page list generation unit 64, and secures the connection with the network 4. The user ID is received by the management unit 60 and transmitted to the page search unit 61 and the community search unit 62. Further, the WWW page list created by the page list generation unit 64 is transmitted to a predetermined client 3.
[0065]
Further, the management unit 60 has a robot search function that traverses WWW pages on the WWW at predetermined intervals, extracts the keywords included in each WWW page, and records them as a keyword/page correspondence table in which each keyword is associated with a WWW page ID. The same function also records a WWW page information database in which WWW page information, such as the URL of the WWW page, the page operator, and a page summary, is associated with the WWW page ID. The management unit 60 also has a function of recording each user's access history to WWW pages as a user/page correspondence table in which the user ID and the WWW page ID are associated with each other. In addition, the management unit 60 determines whether a user who has accessed the information search server 6 is a new user. If so, it displays a user profile registration screen on the client 3 and records the input user profile in the database. The profile of a registered user can be changed in the same manner. The management unit 60 that functions as described above can be realized by, for example, a CPU and a communication device.
[0066]
The page search unit 61 receives a keyword from the management unit 60, searches for the WWW page IDs of pages having the keyword by using the keyword/page correspondence table stored by robot search or the like, and outputs them to the matrix clustering unit 63. The WWW page ID may be a URL or the like, as long as it can identify the WWW page. The keyword/page correspondence table is stored in, for example, a hard disk, and the page search unit 61 can be realized by, for example, a CPU.
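As an illustration of the lookup performed by the page search unit 61, the following Python sketch treats the keyword/page correspondence table as a mapping from keyword to the set of page IDs containing it. The table contents are hypothetical (the table of FIG. 3 is not reproduced here), and requiring a page to contain all of the given keywords is an assumption; the source does not state whether multiple keywords are combined with AND or OR.

```python
def search_pages(keyword_page_table, keywords):
    """Return the IDs of pages that contain every keyword (assumed AND semantics)."""
    page_sets = [keyword_page_table.get(k, set()) for k in keywords]
    if not page_sets:
        return set()
    # set.intersection returns a new set, so the table itself is not modified.
    return set.intersection(*page_sets)


# Hypothetical keyword/page correspondence table: keyword -> page IDs.
keyword_page_table = {
    "a": {"A"},
    "b": {"B"},
    "c": {"C", "D"},
    "d": {"C", "D"},
}

hits = search_pages(keyword_page_table, ["c", "d"])
print(sorted(hits))  # ['C', 'D']
```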
[0067]
The community search unit 62 receives the user ID from the management unit 60, finds the community of the user performing the information search from the user profiles, and outputs the user IDs belonging to the community to the matrix clustering unit 63. The community is found, for example, by calculating a correlation coefficient between user profiles, as described later. The user profiles are stored in, for example, a hard disk, and the community search unit 62 can be realized by, for example, a CPU.
[0068]
The matrix clustering unit 63 receives the WWW page IDs from the page search unit 61 and the user IDs from the community search unit 62, performs matrix clustering on the user/page correspondence table using these as initial values to extract a dense submatrix, and outputs the WWW page IDs included in the submatrix to the page list generation unit 64. The user/page correspondence table is stored in, for example, a hard disk, and the matrix clustering unit 63 can be realized by, for example, a CPU.
[0069]
The page list generation unit 64 receives the WWW page IDs from the matrix clustering unit 63, ranks them based on a certain priority, reads out the WWW page information from the WWW page information database, and generates the WWW page list. Based on the WWW page IDs frequently accessed by users belonging to the community and the WWW page IDs including the keyword, the WWW pages can be classified into three types: (1) WWW pages that include the keyword and are frequently accessed by the community; (2) WWW pages that include the keyword but are infrequently accessed by the community; and (3) WWW pages that do not include the keyword but are frequently accessed by the community. A priority order among these types is set in advance. Within each of (1), (2), and (3), the WWW pages are ranked according to the access frequency of the community.
[0070]
Next, a processing procedure of the information search system 5 will be described with reference to FIGS.
First, the user P accesses the information search server 6 from the client 3 via the network 4 and displays a search engine page on the interface of the client 3. To search for a WWW page having the desired information, the user P inputs on that page keywords considered to be clues. Here, it is assumed that “c” and “d” have been entered as keywords, as in the first embodiment. The keywords c and d and the user ID of the user P are transmitted from the client 3 to the information search server 6, and in response the information search server 6 executes the information search process.
[0071]
FIG. 7 shows the search processing procedure of the information search server 6. As shown in FIG. 7, the information search server 6, having received the keywords c and d and the user ID of the user P, first performs a page search for WWW pages including the keywords c and d (S10). More specifically, the management unit 60, having received the keywords c and d and the user ID of the user P, outputs the keywords c and d to the page search unit 61. The page search unit 61 then extracts the WWW pages C and D including the keywords c and d from the keyword/page correspondence table shown in FIG. The WWW page IDs are output to the matrix clustering unit 63.
[0072]
Meanwhile, the community search unit 62, having received the user ID of the user P from the management unit 60, extracts the community to which the user P belongs from the user profiles (S11). The community is found, for example, by calculating a correlation coefficient between profiles. There are various methods for calculating such a correlation coefficient; here, a method based on the mean square error is described as an example. FIG. 8 shows an example of the user profiles. The vertical axis represents the user ID, and the horizontal axis represents favorite fields such as sports, music, and movies. The degree of preference for each field is represented by a numerical value, and a field that is not preferred is represented by “0”. Such a user profile is registered in association with the user ID when the user accesses the information search server 6 for the first time, and can be updated as necessary.
[0073]
From the user profiles, the user P prefers the fields a, b, and f. First, the other users who have registered any of these fields as a favorite field are determined; here, the users Q and R share favorite fields with the user P. Next, the correlation coefficient between the profile of the user P and the profile of the user Q is obtained, as follows, as the sum of the squared differences in the degree of preference over the fields b and f, which both users have evaluated.
(3 - 5)² + (1 - 1)² = 4
Similarly, for the user P and the user R, the common fields are a and b, so that
(5 - 5)² + (3 - 1)² = 4
is obtained. Since the obtained correlation coefficients are the same, the users Q and R can be judged to be equally similar to the user P in their preferences. If a large number of users share a favorite field with the user P, the users belonging to the community may be selected on the condition that the obtained correlation coefficient is equal to or less than a threshold, or by ranking based on the correlation coefficient. The user IDs belonging to the community thus obtained are output to the matrix clustering unit 63.
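The profile comparison of step S11 can be sketched in Python as follows. The profile values for the users P, Q, and R are reconstructed from the two worked calculations above; the user S, the function names, and the threshold `max_distance` are hypothetical. Note that the value the text calls a correlation coefficient behaves as a distance in this sketch: a smaller value means more similar preferences.

```python
def preference_distance(profile_a, profile_b):
    """Sum of squared differences over the fields both users rated above 0.

    Returns None when the two users have no favorite field in common.
    """
    common = [f for f in profile_a
              if profile_a[f] > 0 and profile_b.get(f, 0) > 0]
    if not common:
        return None
    return sum((profile_a[f] - profile_b[f]) ** 2 for f in common)


def find_community(profiles, user, max_distance):
    """Return the user plus every other user within max_distance (assumed threshold rule)."""
    members = [user]
    for other, profile in profiles.items():
        if other == user:
            continue
        d = preference_distance(profiles[user], profile)
        if d is not None and d <= max_distance:
            members.append(other)
    return members


# Field -> degree of preference; fields that are not preferred are simply omitted (0).
profiles = {
    "P": {"a": 5, "b": 3, "f": 1},
    "Q": {"b": 5, "f": 1},
    "R": {"a": 5, "b": 1},
    "S": {"c": 4},  # hypothetical user with no field in common with P
}

print(preference_distance(profiles["P"], profiles["Q"]))  # 4
print(preference_distance(profiles["P"], profiles["R"]))  # 4
print(find_community(profiles, "P", max_distance=4))      # ['P', 'Q', 'R']
```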
[0074]
The matrix clustering unit 63, having received the WWW page IDs of the WWW pages C and D from the page search unit 61 and the user IDs of the users P, Q, and R of the community to which the user P belongs from the community search unit 62, executes matrix clustering of the user/page correspondence table with these user IDs and WWW page IDs as initial values (S12). FIG. 9A shows an example of the user/page correspondence table. The vertical axis represents the user ID and the horizontal axis represents the WWW page ID; an element is “1” when the user has accessed the WWW page and “0” otherwise. With the users P, Q, and R and the WWW pages C and D as initial values, markers are propagated from the columns of the users P, Q, and R to the rows whose elements are 1, that is, to the WWW pages A, B, D, and F. At this time, the marker is forcibly propagated to the WWW page C because it includes the keywords c and d. Therefore, as shown in FIG. 9B, the WWW pages A, B, C, D, and F survive. Thereafter, as shown in FIG. 9C, pruning is performed based on the number of markers each row has received. If rows with one marker or fewer are pruned, the WWW pages C and F would be deleted and the WWW pages A, B, and D would survive; however, because the WWW page C includes the keywords c and d, it survives forcibly regardless of its number of markers. The WWW page D has two markers and is not a pruning target, but even if it were, it would survive forcibly because it includes the keywords c and d. The other WWW pages A and B have at least two markers and are therefore not pruned. As a result, as shown in FIG. 9C, the WWW pages A, B, C, and D survive. Likewise, when pruning is performed based on the number of markers received by each column, the users P, Q, and R belonging to the community survive forcibly regardless of their number of markers.
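The marker propagation and pruning of step S12 can be sketched in Python as below. This is a simplified illustration rather than the implementation of the embodiment: the access table is hypothetical (FIG. 9A is not reproduced here) and was chosen only so that the community's access counts match the values stated in the text, and the function name, the `min_markers` parameter, and the round limit are assumptions.

```python
def cluster_dense_region(access, seed_users, seed_pages, min_markers=2, max_rounds=10):
    """Alternate marker propagation between users and pages until nothing changes.

    access:      user ID -> set of page IDs the user has accessed
    seed_users:  community members; they always survive pruning
    seed_pages:  keyword pages; they always survive pruning
    min_markers: pages/users with fewer markers than this are pruned
    """
    seed_users, seed_pages = set(seed_users), set(seed_pages)
    users, pages = set(seed_users), set(seed_pages)
    for _ in range(max_rounds):
        # Propagate markers from the current users to the pages they accessed.
        page_markers = {}
        for user in users:
            for page in access.get(user, ()):
                page_markers[page] = page_markers.get(page, 0) + 1
        # Prune weakly marked pages; keyword pages survive forcibly.
        new_pages = {p for p, n in page_markers.items() if n >= min_markers} | seed_pages

        # Propagate markers back from the surviving pages to the users who accessed them.
        user_markers = {u: len(visited & new_pages) for u, visited in access.items()}
        # Prune weakly marked users; community members survive forcibly.
        new_users = {u for u, n in user_markers.items() if n >= min_markers} | seed_users

        if new_users == users and new_pages == pages:
            break  # converged: the dense region no longer changes
        users, pages = new_users, new_pages
    return users, pages


# Hypothetical user/page correspondence table (an entry means the element is 1).
access = {
    "P": {"A", "B", "F"},
    "Q": {"A", "B", "D"},
    "R": {"A", "B", "D"},
    "S": {"C"}, "T": {"C"}, "U": {"C"}, "V": {"C"},
}

users, pages = cluster_dense_region(access, {"P", "Q", "R"}, {"C", "D"})
print(sorted(users), sorted(pages))  # ['P', 'Q', 'R'] ['A', 'B', 'C', 'D']
```

In this run the page F receives a single marker and is pruned, while the page C receives none from the community but is kept because it is a keyword page, which mirrors the forced survival described above.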
[0075]
Thereafter, the matrix no longer changes even if marker propagation is repeated, so the dense submatrix obtained by matrix clustering is as shown in FIG. 9C. The WWW pages A, B, C, and D obtained in this manner are output in association with the access frequencies of the users P, Q, and R, that is, 3 for the WWW page A, 3 for the WWW page B, 0 for the WWW page C, and 2 for the WWW page D. The WWW page IDs of the WWW pages C and D are also output as pages that include the keywords c and d.
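The access frequencies attached to the output can be computed by counting, for each surviving page, how many community members accessed it. The Python sketch below uses a hypothetical access table chosen to match the counts stated in the text (3, 3, 0, 2 for the community, and 4 total accesses to page C versus 2 to page D); the function names are assumptions. It also shows the total access count for contrast with conventional ranking.

```python
def community_access_frequency(access, community, pages):
    """Count, per page, how many users in the community accessed it."""
    return {p: sum(1 for u in community if p in access.get(u, set()))
            for p in sorted(pages)}


def total_access_count(access, pages):
    """Count, per page, how many users overall accessed it."""
    return {p: sum(1 for visited in access.values() if p in visited)
            for p in sorted(pages)}


# Hypothetical user/page correspondence table (an entry means the element is 1).
access = {
    "P": {"A", "B", "F"},
    "Q": {"A", "B", "D"},
    "R": {"A", "B", "D"},
    "S": {"C"}, "T": {"C"}, "U": {"C"}, "V": {"C"},
}

community_freq = community_access_frequency(access, ["P", "Q", "R"], {"A", "B", "C", "D"})
totals = total_access_count(access, {"C", "D"})
print(community_freq)  # {'A': 3, 'B': 3, 'C': 0, 'D': 2}
print(totals)          # {'C': 4, 'D': 2}
```

By total count the page C outranks the page D, whereas by community count the page D outranks the page C, which is the difference the embodiment relies on.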
[0076]
The page list generation unit 64, having received from the matrix clustering unit 63 the WWW page IDs with a high community access frequency and the WWW page IDs including the keywords, ranks the WWW pages based on these (S13). If the ranking is performed in the same manner as in the first embodiment, the order is D, A, B, C. Further, the page list generation unit 64 reads out the WWW page information of the WWW pages D, C, A, and B from the WWW page information database, creates a WWW page list, and outputs it to the management unit 60 (S14). The management unit 60 transmits the WWW page list to the client 3 (S15), and the series of information search processing ends.
[0077]
If the user's community is extracted from user profiles registered in advance, as in the present embodiment, the community is identified more accurately than when it is detected from each user's WWW page access history, which enables information search that accurately reflects the user's preferences.
[0078]
The information search servers 2 and 6 according to the embodiments can be realized as dedicated systems. Alternatively, they can be realized by implementing the processing steps of the above-described information search method as a program and installing the program on a general-purpose computer from a recording medium storing the program, such as a CD-ROM.
[0079]
[Effects of the invention]
As described above, according to the present invention, users belonging to a specific community having the same preferences as the user performing an information search are obtained from the access history or the user profiles, and the WWW page search reflects the preferences of that community, so that WWW pages having the information the user prefers can be searched for efficiently.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of an information search system according to a first embodiment of the present invention.
FIG. 2 is a flowchart illustrating a procedure of an information search process.
FIG. 3 is a diagram illustrating an example of a keyword / page correspondence table.
FIG. 4 is a diagram showing an example of a user / page correspondence table.
FIG. 5 is a diagram showing a process of matrix clustering.
FIG. 6 is a diagram showing a configuration of an information search system according to a second embodiment of the present invention.
FIG. 7 is a flowchart illustrating a procedure of an information search process.
FIG. 8 is a diagram illustrating an example of a user profile.
FIG. 9 is a diagram showing a process of matrix clustering.
FIG. 10 is a flowchart illustrating an example of a conventional information search processing procedure.
FIG. 11 is a diagram showing an example of a conventional keyword / page correspondence table.
FIG. 12 is a diagram showing an example of a conventional page priority table.
[Explanation of symbols]
1, 5 Information search system
2, 6 Information search server
3 Client
4 Network
20, 60 Management unit
21, 61 Page search unit
22, 62 Community search unit
23, 64 Page list generation unit
63 Matrix clustering unit

Claims

An information search method for searching, based on a keyword and a user ID from a requesting client, for information desired by a user from information provided on WWW pages, and transmitting WWW page information having the information to the client, the method comprising:
searching, based on the keyword, a keyword/page correspondence table, in which the keywords included in each WWW page are recorded in association with that WWW page, for WWW pages including the keyword;
using the user ID and the WWW page search result as initial values, extracting a region where access histories are dense from a user/page correspondence table, which expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, thereby obtaining a community of users whose tendencies are similar with respect to the user and the keyword, and selecting the WWW pages belonging to the community;
ranking the WWW pages belonging to the community based on the access frequency of the users belonging to the community, and generating a WWW page list indicating the WWW page information of each page; and
transmitting the WWW page list to the client.

An information search method for searching, based on a keyword and a user ID from a requesting client, for information desired by a user from information provided on WWW pages, and transmitting WWW page information having the information to the client, the method comprising:
searching, based on the keyword, a keyword/page correspondence table, in which the keywords included in each WWW page are recorded in association with that WWW page, for WWW pages including the keyword;
obtaining, based on the user ID, a community of users whose preferences are similar to those of the user from user profiles in which the preferences of each user are recorded in advance;
using the WWW page search result and the users belonging to the community as initial values, extracting a region where access histories are dense from a user/page correspondence table, which expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, and selecting the WWW pages belonging to the region;
ranking the WWW pages belonging to the region based on the access frequency of the users belonging to the region, and generating a WWW page list indicating the WWW page information of each page; and
transmitting the WWW page list to the client.

The information search method according to claim 1, wherein the keyword from the client is input by a user.

The information search method according to claim 1, wherein the keyword from the client is selected by a user from a keyword group displayed on the client.

The information search method according to claim 1, wherein the keyword from the client is extracted from a WWW page displayed on the client.

The information search method according to claim 1, wherein the region where access histories are dense is extracted by selecting, from the user/page correspondence table, the WWW pages accessed by the requesting user and the WWW pages including the keyword as next targets, selecting the users who have accessed the target WWW pages as next targets, and repeating this until a predetermined convergence condition is satisfied.

The information search method according to claim 6, wherein, when a WWW page or a user is selected from the user/page correspondence table as a next target, any WWW page or user whose access frequency is equal to or less than a predetermined threshold, other than the WWW pages including the keyword, is excluded.

The information search method according to claim 2, wherein the region where access histories are dense is extracted by selecting, from the user/page correspondence table, the WWW pages accessed by the users belonging to the community and the WWW pages including the keyword as next targets, selecting the users who have accessed the target WWW pages and the users belonging to the community as next targets, and repeating this until a predetermined convergence condition is satisfied.

The information search method according to claim 8, wherein, when a WWW page or a user is selected from the user/page correspondence table as a next target, any WWW page or user whose access frequency is equal to or less than a predetermined threshold, other than the WWW pages including the keyword and the users belonging to the community, is excluded.

The information search method according to claim 1 or 2, wherein the WWW page list is generated by classifying the WWW pages belonging to the community or region into WWW pages that include the keyword and are frequently accessed by the community or region, WWW pages that include the keyword but are infrequently accessed by the community or region, and WWW pages that do not include the keyword but are frequently accessed by the community or region, ranking the WWW pages within each classification, and indicating the WWW page information of each page.

An information search server that searches, based on a keyword and a user ID from a requesting client, for information desired by a user from information provided on WWW pages, and transmits WWW page information having the information to the client as a WWW page list, the server comprising:
page search means for searching, based on the keyword, a keyword/page correspondence table, in which the keywords included in each WWW page are recorded in association with that WWW page, for WWW pages including the keyword, and outputting a search result;
community search means for, using the user ID and the search result of the page search means as initial values, extracting a region where access histories are dense from a user/page correspondence table, which expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, thereby obtaining a community of users whose tendencies are similar with respect to the user and the keyword, selecting the WWW pages belonging to the community, and outputting the WWW pages and the access frequency of the community; and
page list generating means for ranking the WWW pages selected by the community search means based on the access frequency of the users belonging to the community, and generating and outputting a WWW page list indicating the WWW page information of each page.

An information search server that searches, based on a keyword and a user ID from a requesting client, for information desired by a user from information provided on WWW pages, and transmits WWW page information having the information to the client as a WWW page list, the server comprising:
page search means for searching, based on the keyword, a keyword/page correspondence table, in which the keywords included in each WWW page are recorded in association with that WWW page, for WWW pages including the keyword, and outputting a search result;
community search means for obtaining, based on the user ID, a community of users whose preferences are similar to those of the user from user profiles in which the preferences of each user are recorded in advance, and outputting the user IDs belonging to the community;
matrix clustering means for, using the search result and the user IDs belonging to the community as initial values, extracting a region where access histories are dense from a user/page correspondence table, which expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, selecting the WWW pages belonging to the region, and outputting the WWW pages and the access frequency; and
page list generating means for ranking the WWW pages selected by the matrix clustering means based on the access frequency, and generating and outputting a WWW page list indicating the WWW page information of each page.

The information search server according to claim 11, wherein the page search means searches for a WWW page based on a keyword input by the user to the client.

The information search server according to claim 11, wherein the page search means displays a keyword group on the client and searches for a WWW page based on a keyword selected by a user from the keyword group.

The information search server according to claim 11, wherein the page search means extracts a keyword from a WWW page displayed on the client and searches for a WWW page based on the keyword.

The information search server according to claim 11, wherein the community search means extracts the region where access histories are dense by selecting, from the user/page correspondence table, the WWW pages accessed by the requesting user and the WWW pages including the keyword as next targets, selecting the users who have accessed the target WWW pages as next targets, and repeating this until a predetermined convergence condition is satisfied.

The information search server according to claim 16, wherein, when selecting a WWW page or a user from the user/page correspondence table as a next target, the community search means excludes any WWW page or user whose access frequency is equal to or less than a predetermined threshold, other than the WWW pages including the keyword.

The matrix clustering means selects, from the user / page correspondence table, a WWW page accessed by a user belonging to the community and a WWW page including the keyword as a next target, and a user accessing the target WWW page. And selecting a user belonging to the community as a next target and repeating the selection until a predetermined convergence condition is satisfied, thereby extracting an area where access histories are dense. Information search server described in.

The information search server according to claim 18, wherein, when selecting a WWW page or a user from the user/page correspondence table as a next target, the matrix clustering means excludes any WWW page or user whose access frequency is equal to or less than a predetermined threshold, other than the WWW pages including the keyword and the users belonging to the community.

The information search server according to claim 11 or 12, wherein the page list generating means classifies the WWW pages belonging to the community or region into WWW pages that include the keyword and are frequently accessed by the community or region, WWW pages that include the keyword but are infrequently accessed by the community or region, and WWW pages that do not include the keyword but are frequently accessed by the community or region, ranks the WWW pages within each classification, and generates a WWW page list indicating the WWW page information of each page.

An information search program for causing a computer to function as an information search server that searches, based on a keyword and a user ID from a requesting client, for information desired by a user from information provided on WWW pages, and transmits WWW page information having the information to the client as a WWW page list,
the program causing the computer to function as:
page search means for searching, based on the keyword, a keyword/page correspondence table, in which the keywords included in each WWW page are recorded in association with that WWW page, for WWW pages including the keyword, and outputting a search result;
community search means for, using the user ID and the search result of the page search means as initial values, extracting a region where access histories are dense from a user/page correspondence table, which expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, thereby obtaining a community of users whose tendencies are similar with respect to the user and the keyword, selecting the WWW pages belonging to the community, and outputting the WWW pages and the access frequency of the community; and
page list generating means for ranking the WWW pages selected by the community search means based on the access frequency of the users belonging to the community, and generating and outputting a WWW page list indicating the WWW page information of each page.

An information search program for causing a computer to function as an information search server that searches, based on a keyword and a user ID from a requesting client, for information desired by a user from information provided on WWW pages, and transmits WWW page information having the information to the client as a WWW page list,
the program causing the computer to function as:
page search means for searching, based on the keyword, a keyword/page correspondence table, in which the keywords included in each WWW page are recorded in association with that WWW page, for WWW pages including the keyword, and outputting a search result;
community search means for obtaining, based on the user ID, a community of users whose preferences are similar to those of the user from user profiles in which the preferences of each user are recorded in advance, and outputting the user IDs belonging to the community;
matrix clustering means for, using the search result and the user IDs belonging to the community as initial values, extracting a region where access histories are dense from a user/page correspondence table, which expresses the access history of each user to each WWW page in a two-dimensional space with the WWW pages and the users as row components and column components, selecting the WWW pages belonging to the region, and outputting the WWW pages and the access frequency; and
page list generating means for ranking the WWW pages selected by the matrix clustering means based on the access frequency, and generating and outputting a WWW page list indicating the WWW page information of each page.

The information search program according to claim 21, wherein the page search means is caused to search for a WWW page based on a keyword input to the client by a user.

The information search program according to claim 21, wherein the page search means is caused to display a keyword group on the client and to search for a WWW page based on a keyword selected by the user from the keyword group.

The information search program according to claim 21, wherein the page search means is caused to extract a keyword from a WWW page displayed on the client and to search for a WWW page based on the keyword.

The information search program according to claim 21, wherein the community search means is caused to extract the region where access histories are dense by selecting, from the user/page correspondence table, the WWW pages accessed by the requesting user and the WWW pages including the keyword as next targets, selecting the users who have accessed the target WWW pages as next targets, and repeating this until a predetermined convergence condition is satisfied.

The information search program according to claim 26, wherein, when selecting a WWW page or a user from the user/page correspondence table as a next target, the community search means is caused to exclude any WWW page or user whose access frequency is equal to or less than a predetermined threshold, other than the WWW pages including the keyword.

The information search program according to claim 22, wherein the matrix clustering means is caused to extract the region where access histories are dense by selecting, from the user/page correspondence table, the WWW pages accessed by the users belonging to the community and the WWW pages including the keyword as next targets, selecting the users who have accessed the target WWW pages and the users belonging to the community as next targets, and repeating this until a predetermined convergence condition is satisfied.

The information search program according to claim 28, wherein, when selecting a WWW page or a user from the user/page correspondence table as a next target, the matrix clustering means is caused to exclude any WWW page or user whose access frequency is equal to or less than a predetermined threshold, other than the WWW pages including the keyword and the users belonging to the community.

The information search program according to claim 21 or 22, wherein the page list generating means is caused to classify the WWW pages belonging to the community or region into WWW pages that include the keyword and are frequently accessed by the community or region, WWW pages that include the keyword but are infrequently accessed by the community or region, and WWW pages that do not include the keyword but are frequently accessed by the community or region, to rank the WWW pages within each classification, and to generate a WWW page list indicating the WWW page information of each page.