JP2004126975A

JP2004126975A - Preference trend similar user extraction method and apparatus, preference trend similar user extraction program, and recording medium storing the same

Info

Publication number: JP2004126975A
Application number: JP2002290800A
Authority: JP
Inventors: Hiroyuki Takeuchi; 竹内　宏之; Shinji Abe; 安部　伸治; Etsuro Fujita; 藤田　悦郎; Yasuhito Hayashi; 林　泰仁
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2002-10-03
Filing date: 2002-10-03
Publication date: 2004-04-22

Abstract

【課題】選好傾向の類似に基づいたユーザグループを作成し、情報提供者の広告商品などに興味をもちそうなユーザグループを抽出する。
【解決手段】利用者端末１０２を使用する各ユーザのサービス利用履歴から作成された個人の興味を表わすユーザプロファイルを格納するユーザプロファイル格納部１０５と多数の単語に関しての概念ベクトルが格納された知識体系１０６を備えるサーバ１０４において、情報提供者端末１０３から入力された情報提供者の広告商品などを表現したキーワードとユーザプロファイルとの相関を算出することで、情報提供者の広告商品に興味をもちそうなユーザグループを把握することが出来る。また、グルーピング処理部１０８において、情報提供者が望む人数等のパラメータに合わせたユーザ作成を行うことで、情報提供者の意図するユーザグループ（興味の強弱、人数等）を算出することが出来る。
【選択図】　　　　図１
Kind Code: A1
[Problem] To create user groups based on similarity of preference tendencies and to extract a user group likely to be interested in an information provider's advertised product or the like.
[Solution] A server 104 includes a user profile storage unit 105, which stores user profiles representing personal interests created from the service usage history of each user of a user terminal 102, and a knowledge system 106, which stores concept vectors for a large number of words. By calculating the correlation between the user profiles and keywords that express the information provider's advertised product or the like, input from an information provider terminal 103, the server can identify a user group likely to be interested in that product. In addition, a grouping processing unit 108 forms user groups according to parameters such as the number of people desired by the information provider, so that the user group intended by the information provider (strength of interest, number of people, and so on) can be calculated.
[Selection diagram] Fig. 1

Description

【０００１】
【発明の属する技術分野】
本発明は、ネットワーク上で利用されるコンテンツ提供システムから取得される利用履歴を用いて、情報提供者がユーザに対し効果的に広告提供等を行う場合に、利用者を分類し、選好傾向が類似するユーザを抽出することを目的とした技術に関する。
【０００２】
【従来の技術】
ネットワークを利用した情報提供市場は年々拡大している。情報提供者がユーザに対し効果的に広告提供を行うためには、ターゲットユーザを絞り込む必要があると言われている。ターゲットを絞り込む方法として、個々のユーザの特性を把握し、似た特性をまとめてグループ化する手法が挙げられる。
【０００３】
従来、個々のユーザの特性は情報フィルタリング技術により、ユーザの興味という形で抽出してきた。サービス例として、オンラインスーパーマーケットの例が挙げられるが、この場合、個人適応化機能等を用いてユーザのアクセス、購入履歴をもとに個々のユーザに合わせた「情報推薦サービス」を展開したり、キャンペーンの対象にするユーザを絞り込んだりしている。
【０００４】
例えば、特許文献１にはユーザの嗜好情報とそのユーザ嗜好に類似したユーザグループを抽出する手法について記載されている。また、特許文献２にはユーザプロファイルとコンテンツプロファイルとの間の類似度を演算することについて記載されており、特許文献３には２次元マップについて記載されている。
【０００５】
【特許文献１】
特開平１１−１５８４０号公報
【特許文献２】
特開２００１−２２９１６７号公報
【特許文献３】
特開２００２−１７５３２１号公報
【０００６】
【発明が解決しようとする課題】
従来技術は、配信する情報に対して興味を持つ可能性があるユーザを抽出することは可能だが、各ユーザの配信情報に対する興味の強弱を表現することが出来ていないため、興味度に基づくユーザの絞り込みが難しいという状況がある。特許文献１では、ユーザの嗜好情報とそのユーザ嗜好に類似したユーザグループを抽出する手法が述べられているが、情報提供者が意図するユーザグループを算出することや興味の強弱等によりグループ化する人数（ターゲットとする人数）を調整することは出来ない。
【０００７】
本発明の目的は、選好傾向の類似に基づいたユーザグループを作成し、情報提供者の広告商品などに興味をもちそうなユーザグループを抽出する。また、興味度や人数をパラメータとして、ユーザグループを選択することで、情報提供者が広告ターゲットを容易に選定できる選好傾向類似ユーザ抽出方法及び装置、並びに選好傾向類似ユーザ抽出プログラム及びそれを記録した記録媒体を提案することにある。
【０００８】
【課題を解決するための手段】
本発明の選好傾向類似ユーザ抽出方法は、各ユーザのサービス利用履歴から作成された個人の興味を表わすユーザプロファイルと多数の単語に関しての概念ベクトルが格納された知識体系を保持するサーバにおいて、情報提供者の広告商品などを表現したキーワードとユーザプロファイルとの相関を算出することで、情報提供者の広告商品に興味をもちそうなユーザグループを把握することが出来る。
【０００９】
また、情報提供者の広告商品などに関するキーワードやターゲットユーザ数等のユーザグループ可視化に必要なパラメータを入力する手段とグルーピング処理部において、情報提供者が望む人数等のパラメータに合わせたユーザ作成を行うことで、情報提供者の意図するユーザグループ（興味の強弱、人数等）を算出することが出来る。
【００１０】
【発明の実施の形態】
以下、本発明の一実施の形態について説明する。
【００１１】
図１は、本発明の実施形態におけるシステム構成図である。本実施の形態では、インターネットなどのネットワーク１０１を通して、複数の利用者端末１０２や情報提供者端末１０３がサーバ１０４に接続されている。以下、図１、図２、図３を用いて説明する。サーバ１０４は、個人の興味を表わしたユーザプロファイルをユーザプロファイル格納部１０５に保持している。このユーザプロファイルは、各ユーザが利用者端末１０２から利用したサービス利用履歴から作成されている。また、サーバ１０４は、多数の単語に関しての概念ベクトルが格納された知識体系１０６と利用者分類における各種の処理を行う演算部（マッチング処理演算部１０７、グルーピング処理部１０８、可視化処理部１０９、ユーザ間類似度算出処理部１１０）から構成される。ユーザプロファイル及び知識体系の構成、演算部における各処理部の機能詳細に関しては後に記載する。
【００１２】
情報提供者が広告商品などに興味をもちそうなユーザグループを把握したい場合は、情報提供者端末１０３から情報提供者の広告商品などに関するキーワードを入力して、サーバ１０４に送信する。また、結果として出力するユーザ数等のユーザグループ可視化に必要なパラメータを入力することも出来る。
【００１３】
サーバ１０４は、情報提供者端末１０３から入力されたキーワード等のパラメータ２０１に対して、ユーザプロファイルベクトル２０４を検索することで、キーワードと関連性の高い概念ベクトルを抽出することができる。
【００１４】
さらにサーバ１０４では、キーワードを含む概念ベクトルを知識体系から抽出し、ユーザプロファイルに含まれるユーザプロファイルベクトル２０４とのマッチング処理２０５を行うことで、情報提供者から入力されたキーワードに興味を持つ可能性があるユーザグループを算出し、グルーピング処理２０６により、情報提供者が望む人数等のパラメータに合わせたユーザグループの作成を行うことができる。
【００１５】
以上の処理結果を可視化処理２０８により表形式で出力することも出来る。図３の可視化例では、マッチングに使用された概念ベクトルを保持するユーザを、各概念ごとに集計した表３０１としている。また、各ユーザプロファイルから年齢等の属性を抽出して集計した表３０２を出力することも出来る。
【００１６】
また、グルーピング処理２０６から出力されたユーザグループに対して、ユーザ間類似度算出処理２０７によりユーザ間の類似度をベクトルの距離尺度として算出し、結果を可視化処理２０８によりビジュアル化する場合、この可視化された結果は、情報提供者が意図するユーザグループを選好傾向の類似性により可視化されている３０３。また、情報提供者が入力したキーワードとの関連性の強弱を色変化を用いて表現することも考えられる。また、出力されたユーザグループに対して、各ユーザプロファイルをすべて利用して、ユーザ間類似度算出処理２０７によりユーザ間の類似度を算出し、結果を可視化処理２０８によりビジュアル化する場合、先のビジュアル化では発見できなかったユーザグループや属性の傾向を把握することが出来る。また、ユーザプロファイルと同様に、各概念ベクトルを多次元尺度構成法を用いて２次元平面上における配置（座標）を出力すれば、可視化処理２０８により３０３にマッピングすることができる。また、ユーザプロファイルに利用者の連絡先（Ｅ−ｍａｉｌアドレスなど）が含まれ、かつメールサーバ等のユーザに連絡をとることが出来る手段をシステムに含む場合、前記で表示したユーザグループに対しで情報提供者が配信したい情報を届けることが可能である。
【００１７】
次に、本発明における、ユーザプロファイルの構成例、マッチング処理、グルーピング処理について説明する。
【００１８】
〔知識体系〕
コーパスなどより作成された各概念の表現に使用される単語集合を知識体系として利用する。今回の例では、知識体系における一つの概念情報は、一つの興味領域を表現しているとする。図４に知識体系の例を示す。
【００１９】
今回は、上記を用いて、利用者に提供されるコンテンツのジャンル（ニュース、アニメ、音楽、スポーツ、映画）を、概念ベクトルデータベース（Ｗ_１，Ｗ_２・・・Ｗ_５）として表現する場合を具体例として考える。
【００２０】
〔コンテンツプロファイル〕
ネットワーク上で利用される複数のコンテンツ提供システムから取得されるコンテンツのプロファイルを知識体系から抽出した概念ベクトルの組み合わせにより作成する場合を想定する。コンテンツプロファイルは、特願２０００−３５５８７０（特許文献３）により説明されているコンテンツプロファイル作成方法に加え、コンテンツ文章中の出現頻度が上位に位置する単語を数個選択して、選択した名詞を含む概念を知識体系から検索して、各概念において単語が持つ重みから、各概念ベクトルの値を算出する。
【００２１】
コンテンツプロファイルベクトルの例
Ｖ_{ｃｏｎｔｅｎｔｓ}＝＜Ｗ_１，Ｗ_２・・・Ｗ_ｉ＞　Ｗは各概念の重要度
例えば、コンテンツＡにおいて、コンテンツ文章中における出現頻度が高い単語上位５個に（Ｔ_２１，Ｔ_２２，Ｔ_３１，Ｔ_４２，Ｔ_５１）が含まれる場合、Ｗ_２（アニメ）における単語（Ｔ_２１，Ｔ_２２）の重み合計、同じくＷ_３（音楽）・・・Ｗ_５（映画）における単語（Ｔ_３１，Ｔ_４２，Ｔ_５１）の重み合計を算出する。算出された重み合計を用いて、コンテンツプロファイルベクトルを表現すると、下記のように表現できる。
【００２２】
Ｖ_{ｃｏｎｔｅｎｔｓ}　＝＜０，　０．４，　０．０５，　０．０２，　０．２５＞
〔ユーザプロファイル〕
ユーザプロファイルは、各利用者（ユーザ）の興味を表現したキーワードベクトルとして表現される。ユーザプロファイルベクトルの作成方法は、前述のコンテンツプロファイルを利用して、特願２０００−３５５８７０（特許文献３）で説明されている方法などにより作成することが考えられる。また、ユーザプロファイルには、当該ユーザの性別、年齢、居住地域などの他の属性も付加されることも考えられる。
【００２３】
ユーザプロファイルベクトルの例
Ｖ_{ｐｒｏｆｉｌｅ}　＝＜Ｗ_１，Ｗ_２・・・Ｗ_ｎ＞　Ｗはユーザプロファイルとして使用する概念ベクトル
ユーザプロファイルで使用する概念ベクトル（Ｗ_１，Ｗ_２・・・Ｗ_５）は、知識体系から選択することを考える。
【００２４】
以下には、上記を用いて、利用者に提供されるコンテンツのジャンル（ニュース、アニメ、音楽、スポーツ、映画）を、Ｗ_１，Ｗ_２・・・Ｗ_５として利用する場合を具体例として述べる。
【００２５】
まず、各利用者のコンテンツ視聴履歴を用いて、特願２０００−３５５８７０（特許文献３）で説明されている方法などにより、ユーザプロファイルベクトルの各軸（Ｗ_１，Ｗ_２・・・Ｗ_５）の重要度を算出する。
Ｖ_{ｐｒｏｆｉｌｅ}　＝＜Ｗ_１，Ｗ_２・・・Ｗ_５　＞
※各軸を右記とする。（Ｗ_１＝ニュース、Ｗ_２＝アニメ、Ｗ_３＝音楽、Ｗ_４＝スポーツ、Ｗ_５＝映画）
映画のコンテンツ（以下に示すコンテンツベクトルを持つものとする）
Ｖ_{ｃｏｎｔｅｎｔｓ}　＝＜０，　０．４，　０．０５，　０．０２，　０．２５＞
を視聴した利用者は、次のようなプロファイルとして表現されることになっている。
Ｖ_{ｐｒｏｆｉｌｅ}　＝＜０，　０．４，　０．０５，　０．０２，　０．２５＞
連続して、以下に示すコンテンツベクトルを視聴した場合、
Ｖ_{ｃｏｎｔｅｎｔｓ}　＝＜０．０４，　０，　０．１，　０．２，　０．１＞
ユーザプロファイルは、次のようになる。
Ｖ_{ｐｒｏｆｉｌｅ}　＝＜０．０２，　０．２，　０．０３，　０．１１，　０．１８＞
〔マッチング処理〕
マッチング処理では、知識体系から、広告主が入力したキーワードを単語として含む概念ベクトルを抽出する。抽出した概念ベクトルから各ユーザプロファイルの概念ベクトルとのマッチングを行い、広告に興味を持つユーザを抽出する。例えば、キーワード〔チケット〕〔プレゼント〕に対して、広告に興味を持つユーザを抽出する場合を考える。「映画」の概念ベクトルに〔チケット〕が［単語Ｔ_５４：０．０２］として含まれる場合、ユーザプロファイルにおける概念ベクトル「映画」に着目して、利用者をソートする。概念ベクトルが閾値以上のユーザを候補として抽出することなどが考えられる。
【００２６】
また、キーワード〔プレゼント〕は、ユーザプロファイルに使用されている概念ベクトルには含まれないが、知識体系中の概念ベクトルＷ_ｉに［単語Ｔ_ｉｎ：０．３］として含まれる場合、概念ベクトルＷ_ｉに類似するユーザプロファイルに使用されている概念ベクトルを用いる方法も考えられる。左記の方法として、一般的に知られているベクトル空間法を用いて各概念ベクトルとの類似度計算を行うことが考えられる。計算された類似度が閾値以上の場合、ユーザプロファイルに使用されている概念ベクトルを用いて、利用者をソートすることも考えられる。
【００２７】
【数１】

【００２８】
〔グルーピング処理〕
グルーピング処理では、マッチング処理により選択されたユーザと似た興味を持つユーザを選択し、その概念ベクトルがユーザの興味に近いかどうかという判定を行う。そのため、グルーピングを２つの段階に分けて処理する。各段階を経るに従って、広告主が設定するターゲットユーザの数を増やしていく事ができる。
【００２９】
まず、マッチング処理により、広告主が設定したキーワードを含んだ、ユーザプロファイルにおける概念ベクトルが選択される。この選択された概念ベクトル（概念）を含むユーザプロファイルを持つグループが第一グループとなる。次に、広告主が設定したキーワードを含んだ知識体系中の概念ベクトルが存在する場合、左記の概念ベクトルとユーザプロファイルにおける概念ベクトル間での類似度を算出して、類似度が閾値を超える場合、ユーザプロファイルにおける概念ベクトルを含むユーザプロファイルを持つグループが第二グループとなる。
【００３０】
例えば、「チケット」のキーワードを含む概念として「映画」が抽出された場合、この「映画」をある閾値以上の値としてユーザプロファイル中に含むユーザを第一グループとして抽出する。次に、キーワード〔プレゼント〕を含んだ知識体系中の概念ベクトルと類似するユーザプロファイルに使用されている概念ベクトルとの類似度計算を行い、計算された類似度が閾値以上の場合、ユーザプロファイルに使用されている概念ベクトルをある閾値以上の値としてユーザプロファイル中に含むユーザを第二グループとして抽出する。このようにして選択された概念ベクトル（概念）を含むユーザプロファイルを持つグループが、第二グループとなる。ターゲット人数は、第一グループ、第一グループ＋第二グループと拡大していく。上記グループの人数等から、情報提供者が入力した人数等のパラメータに近づけるように概念ベクトルを選択していく。また、グルーピング処理によって得られた第一グループ、第二グループ内の多くのユーザが共通して持っている新たな類似度の高い概念ベクトルや、共通している他の属性を発見することで、第三のグループを算出することもできる。
【００３１】
以下では、例として、広告主が広告を打ち、配信先人数５００人を指定する状況を考える。広告主は広告の特徴を表わすキーワード［例：プレゼント、チケット］と、配信先人数［５００人］を入力する。出力では、キーワード［チケット］と概念ベクトル［映画］に関して、ユーザプロファイル中に、ある閾値以上の値を保持する利用者数と、キーワード［プレゼント］を含む概念と類似の高い概念［音楽］、および左記概念ベクトル値で、ある閾値以上の値を保持する利用者数が表示される。
【００３２】
〔ユーザ間類似度算出処理〕
上記までの処理により抽出されたユーザに対して、各ユーザプロファイルベクトル間の類似度を、２次元平面上の距離尺度として表現する。例として、あるユーザｉの持つユーザプロファイルベクトルを、下記のように表現するとき
Ｖ_{ｐｒｏｆｉｌｅ}　＝＜Ｗ_ｉ，１，Ｗ_ｉ，２・・・Ｗ_ｉ，ｎ＞　Ｗ_ｉ，ｎはユーザｉにおけるｎ番目の概念が持つ重要度
類似度計算には下記に示すベクトルの距離尺度を用いることが考えられる。
【００３３】
【数２】

【００３４】
上記式を用いて、抽出されたユーザに対して、各ユーザプロファイルベクトル間の類似度を、多次元尺度構成法を用いて２次元平面上の距離尺度として表現することで、各ユーザの２次元平面上における配置（座標）を算出する。
【００３５】
〔可視化処理〕
算出された各ユーザの２次元平面上における配置（座標）を用いて、３０３のようにユーザをプロットする。ユーザと共に各概念ベクトルの２次元平面上における配置（座標）を多次元尺度構成法により算出してプロットすることも考えられる。また、キーワードとの関連度の違いを色などにより表現することも考えられる。
【００３６】
以上、本発明者によってなされた発明を、前記実施の形態に基づき具体的に説明したが、本発明は、前記実施の形態に限定されるものではなく、その要旨を逸脱しない範囲において種々変更可能であることは勿論である。
【００３７】
【発明の効果】
以上説明したように、本発明によれば、ユーザの選好傾向に関する類似度に基づいてユーザを算出して、情報提供者の広告商品などに興味をもちそうなユーザグループを作成して、興味度や人数をパラメータとして抽出するユーザグループを選択することで、情報提供者が広告ターゲットを容易に選定できる。
【図面の簡単な説明】
【図１】本発明の実施形態におけるシステム構成図である。
【図２】本発明の実施形態における利用者分類方法を説明するためのフローチャートである。
【図３】本発明の実施形態における利用者分類方法を用いたユーザプロファイル可視化例を示す図である。
【図４】知識体系の例を示す図である。
【符号の説明】
１０１…インターネットなどのネットワーク、１０２…利用者端末、１０３…情報提供者端末、１０４…サーバ、１０５…ユーザプロファイル格納部、１０６…知識体系、１０７…マッチング処理演算部、１０８…グルーピング処理部、１０９…可視化処理部、１１０…ユーザ間類似度算出処理部、２０１…キーワード等のパラメータ、２０４…ユーザプロファイルベクトル、２０５…マッチング処理、２０６…グルーピング処理、２０７…ユーザ間類似度算出処理、２０８…可視化処理。
[0001]
[Technical field of the invention]
The present invention relates to a technique for classifying users and extracting users with similar preference tendencies, using usage histories acquired from content providing systems used on a network, so that an information provider can effectively provide advertisements and the like to users.
[0002]
[Prior art]
The information provision market using networks is expanding year by year. It is said that an information provider needs to narrow down target users in order to effectively provide advertisements to users. As a method of narrowing down the target, there is a method of grasping the characteristics of individual users and grouping similar characteristics together.
[0003]
Conventionally, the characteristics of individual users have been extracted in the form of user interests using information filtering technology. One example service is an online supermarket, which uses personalization functions and the like to offer an "information recommendation service" tailored to each user based on the user's access and purchase history, or to narrow down the users targeted by a campaign.
[0004]
For example, Patent Document 1 describes a method of extracting user preference information and a user group similar to the user preference. Patent Document 2 describes calculating a similarity between a user profile and a content profile, and Patent Document 3 describes a two-dimensional map.
[0005]
[Patent Document 1]
JP-A-11-15840
[Patent Document 2]
JP 2001-229167 A
[Patent Document 3]
JP 2002-175321 A
[0006]
[Problems to be solved by the invention]
The prior art can extract users who may be interested in the information to be distributed, but it cannot express how strongly each user is interested in that information, which makes it difficult to narrow down users by degree of interest. Patent Document 1 describes a method of extracting user preference information and a user group similar to that preference, but it can neither calculate the user group intended by the information provider nor adjust the number of people grouped (the target number) according to the strength of interest and the like.
[0007]
An object of the present invention is to create user groups based on similarity of preference tendencies and to extract a user group likely to be interested in an information provider's advertised product or the like. A further object is to propose a preference-similar-user extraction method and apparatus, a preference-similar-user extraction program, and a recording medium storing the program, with which an information provider can easily select advertisement targets by choosing a user group using the degree of interest and the number of people as parameters.
[0008]
[Means for Solving the Problems]
In the preference-similar-user extraction method of the present invention, a server holds user profiles representing individual interests, created from each user's service usage history, and a knowledge system storing concept vectors for a large number of words. By calculating the correlation between the user profiles and keywords expressing the information provider's advertised product or the like, the server can identify a user group likely to be interested in that product.
[0009]
In addition, with a means for inputting the parameters needed to visualize a user group, such as keywords relating to the information provider's advertised product and the number of target users, and with a grouping processing unit that forms user groups according to parameters such as the number of people desired by the information provider, the user group intended by the information provider (strength of interest, number of people, and so on) can be calculated.
[0010]
[Embodiments of the invention]
Hereinafter, an embodiment of the present invention will be described.
[0011]
FIG. 1 is a system configuration diagram of an embodiment of the present invention. In this embodiment, a plurality of user terminals 102 and information provider terminals 103 are connected to a server 104 via a network 101 such as the Internet. The following description refers to FIGS. 1, 2, and 3. The server 104 holds user profiles representing personal interests in a user profile storage unit 105. Each user profile is created from the history of services the user has used from a user terminal 102. The server 104 also comprises a knowledge system 106, which stores concept vectors for a large number of words, and arithmetic units that perform the various processes of user classification (a matching processing arithmetic unit 107, a grouping processing unit 108, a visualization processing unit 109, and an inter-user similarity calculation processing unit 110). The structure of the user profiles and the knowledge system, and the detailed function of each processing unit, are described later.
[0012]
When the information provider wants to grasp a user group that is likely to be interested in an advertisement product or the like, a keyword related to the advertisement product of the information provider is input from the information provider terminal 103 and transmitted to the server 104. It is also possible to input parameters required for visualizing the user group such as the number of users to be output as a result.
[0013]
The server 104 can extract a concept vector highly relevant to the keyword by searching the user profile vector 204 for the parameter 201 such as a keyword input from the information provider terminal 103.
[0014]
Further, the server 104 extracts concept vectors containing the keywords from the knowledge system and performs matching process 205 against the user profile vectors 204 contained in the user profiles, thereby calculating a user group that may be interested in the keywords input by the information provider. Grouping process 206 can then form a user group according to parameters such as the number of people desired by the information provider.
[0015]
The above processing results can also be output in tabular form by the visualization process 208. In the visualization example of FIG. 3, table 301 tallies, for each concept, the users who hold the concept vectors used for matching. A table 302 that tallies attributes such as age extracted from each user profile can also be output.
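The tallies behind tables 301 and 302 could be sketched as follows. This is only an illustration: the record layout (`age`, `profile`), the ten-year age brackets, the threshold value, and the function name `tabulate` are all assumptions, not details given in the specification.

```python
from collections import Counter

CONCEPTS = ["news", "animation", "music", "sports", "movie"]  # W_1 .. W_5

def tabulate(users, matched_axes, threshold):
    """Count users per matched concept (cf. table 301) and per age bracket (cf. table 302)."""
    per_concept = Counter()
    per_age = Counter()
    for user in users:
        # Concepts used for matching that this user holds at or above the threshold.
        held = [a for a in matched_axes if user["profile"][a] >= threshold]
        for a in held:
            per_concept[CONCEPTS[a]] += 1
        if held:
            per_age[f"{user['age'] // 10 * 10}s"] += 1
    return per_concept, per_age

# Hypothetical users; the profile values are illustrative only.
users = [
    {"age": 24, "profile": [0.02, 0.2, 0.03, 0.11, 0.18]},
    {"age": 37, "profile": [0.3, 0.0, 0.2, 0.05, 0.02]},
    {"age": 29, "profile": [0.0, 0.1, 0.15, 0.0, 0.4]},
]
per_concept, per_age = tabulate(users, matched_axes=[2, 4], threshold=0.1)
print(dict(per_concept), dict(per_age))
```

A user who holds several matched concepts is counted once per concept but only once in the age tally, which is one possible reading of "tallied for each concept".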
[0016]
Further, when the inter-user similarity calculation process 207 calculates the similarity between users in the user group output from the grouping process 206 as a vector distance measure and the visualization process 208 renders the result visually, the resulting display 303 shows the user group intended by the information provider arranged by similarity of preference tendency. The strength of relevance to the keywords input by the information provider may also be expressed by color changes. When the inter-user similarity calculation process 207 instead calculates the similarity between users in the output user group using the whole of each user profile and the visualization process 208 renders the result visually, it becomes possible to identify user groups and attribute tendencies that could not be found in the earlier visualization. Likewise, if the arrangement (coordinates) of each concept vector on a two-dimensional plane is output by multidimensional scaling in the same way as for the user profiles, the concept vectors can be mapped onto 303 by the visualization process 208. Finally, when the user profiles include user contact information (such as e-mail addresses) and the system includes a means of contacting users, such as a mail server, the information that the information provider wants to distribute can be delivered to the user group displayed above.
[0017]
Next, a configuration example of a user profile, matching processing, and grouping processing according to the present invention will be described.
[0018]
[Knowledge system]
A word set used for expressing each concept created from a corpus or the like is used as a knowledge system. In this example, it is assumed that one piece of conceptual information in the knowledge system represents one area of interest. FIG. 4 shows an example of the knowledge system.
[0019]
Here, as a concrete example, consider the case where the genres of content provided to users (news, animation, music, sports, movie) are represented as a concept vector database (W_1, W_2, ..., W_5).
[0020]
[Content Profile]
Assume that the profiles of content obtained from a plurality of content providing systems used on a network are created by combining concept vectors extracted from the knowledge system. In addition to the content profile creation method described in Japanese Patent Application No. 2000-355870 (Patent Document 3), a content profile is created by selecting several of the most frequently occurring words in the content text, retrieving from the knowledge system the concepts that contain the selected nouns, and calculating the value of each concept vector from the weights those words have in each concept.
[0021]
Example of content profile vector
V_{contents} = <W_1, W_2, ..., W_i>, where W is the importance of each concept.
For example, when the five most frequent words in the text of content A are (T_21, T_22, T_31, T_42, T_51), the sum of the weights of the words (T_21, T_22) in W_2 (animation) is calculated, and likewise the sums of the weights of the words (T_31, T_42, T_51) in W_3 (music) ... W_5 (movie). Using the calculated weight sums, the content profile vector can be expressed as follows.
[0022]
V_{contents} = <0, 0.4, 0.05, 0.02, 0.25>
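The weight-sum calculation described above can be sketched as follows, assuming the knowledge system is held as a mapping from each concept to its word weights. The individual word weights, the word T_11, and the names `KNOWLEDGE_SYSTEM` and `content_profile` are hypothetical; the weights were chosen only so that the sums reproduce the example vector.

```python
# Hypothetical knowledge system: concept -> {word: weight}.
# Weights are illustrative values, not taken from the specification
# (except T_54 = 0.02, which the matching example mentions).
CONCEPTS = ["news", "animation", "music", "sports", "movie"]  # W_1 .. W_5
KNOWLEDGE_SYSTEM = {
    "news": {"T_11": 0.2},
    "animation": {"T_21": 0.3, "T_22": 0.1},
    "music": {"T_31": 0.05},
    "sports": {"T_42": 0.02},
    "movie": {"T_51": 0.25, "T_54": 0.02},
}

def content_profile(top_words):
    """For each concept, sum the weights of the top-frequency words it contains."""
    profile = []
    for concept in CONCEPTS:
        words = KNOWLEDGE_SYSTEM[concept]
        total = sum((w for word, w in words.items() if word in top_words), 0.0)
        profile.append(round(total, 6))  # rounding hides floating-point noise
    return profile

v_contents = content_profile({"T_21", "T_22", "T_31", "T_42", "T_51"})
print(v_contents)
```

Concepts that contain none of the selected words, such as news here, simply receive 0.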
[User profile]
The user profile is expressed as a keyword vector representing the interests of each user. The user profile vector may be created from the content profiles described above by, for example, the method described in Japanese Patent Application No. 2000-355870 (Patent Document 3). Other attributes such as the user's gender, age, and area of residence may also be added to the user profile.
[0023]
Example of user profile vector
V_{profile} = <W_1, W_2, ..., W_n>, where each W is a concept vector used in the user profile.
The concept vectors (W_1, W_2, ..., W_5) used in the user profile are assumed to be selected from the knowledge system.
[0024]
The following describes, as a concrete example, the case where the genres of content provided to users (news, animation, music, sports, movie) are used as W_1, W_2, ..., W_5.
[0025]
First, the importance of each axis (W_1, W_2, ..., W_5) of the user profile vector is calculated from each user's content viewing history by, for example, the method described in Japanese Patent Application No. 2000-355870 (Patent Document 3).
V_{profile} = <W_1, W_2, ..., W_5>
* The axes are as follows: W_1 = news, W_2 = animation, W_3 = music, W_4 = sports, W_5 = movie.
A user who has viewed movie content with the content vector
V_{contents} = <0, 0.4, 0.05, 0.02, 0.25>
is represented by the following profile.
V_{profile} = <0, 0.4, 0.05, 0.02, 0.25>
If the user then views content with the content vector
V_{contents} = <0.04, 0, 0.1, 0.2, 0.1>
the user profile becomes the following.
V_{profile} = <0.02, 0.2, 0.03, 0.11, 0.18>
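The update rule itself is defined in the referenced application and is not reproduced in this text. As a rough sketch under the assumption that the profile is the element-wise mean of the content vectors viewed so far, the calculation might look like the following. Four of the five components of the example (0.02, 0.2, 0.11, and 0.18 after rounding 0.175) agree with this assumption; the third component (0.03, against a mean of 0.075) does not, so the actual rule may differ. The function name `mean_profile` is hypothetical.

```python
def mean_profile(content_vectors):
    """Element-wise mean of the viewed content vectors (an assumed update rule)."""
    n = len(content_vectors)
    return [sum(axis) / n for axis in zip(*content_vectors)]

# The two content vectors from the example above.
viewed = [
    [0, 0.4, 0.05, 0.02, 0.25],
    [0.04, 0, 0.1, 0.2, 0.1],
]
profile = mean_profile(viewed)
print([round(x, 3) for x in profile])
```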
[Matching processing]
In the matching process, concept vectors that contain a keyword input by the advertiser as one of their words are extracted from the knowledge system. The extracted concept vectors are then matched against the concept vectors of each user profile to extract users who are likely to be interested in the advertisement. For example, consider extracting such users for the keywords [ticket] and [present]. If [ticket] is included in the "movie" concept vector as [word T_54: 0.02], the users are sorted by the "movie" concept vector in their user profiles. Users whose value for that concept vector is equal to or greater than a threshold may then be extracted as candidates.
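The sort-and-threshold step for a directly matched concept can be sketched as follows. The user names, profile values, threshold, and the function name `match_users` are hypothetical and serve only to illustrate the step.

```python
def match_users(profiles, concept_index, threshold):
    """Return (user, value) pairs whose value on one concept axis is at or above
    the threshold, sorted from strongest to weakest interest."""
    hits = [
        (user, vector[concept_index])
        for user, vector in profiles.items()
        if vector[concept_index] >= threshold
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)

MOVIE = 4  # index of W_5 (movie), the concept that contains the keyword [ticket]

# Hypothetical user profiles over (news, animation, music, sports, movie).
profiles = {
    "user_a": [0.02, 0.2, 0.03, 0.11, 0.18],
    "user_b": [0.3, 0.0, 0.1, 0.05, 0.02],
    "user_c": [0.0, 0.1, 0.05, 0.0, 0.4],
}
candidates = match_users(profiles, MOVIE, threshold=0.1)
print(candidates)
```

Keeping the value alongside each user preserves the strength of interest, which the later grouping and color-coding steps can reuse.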
[0026]
The keyword [present], on the other hand, may not be included in any concept vector used in the user profiles but may be included in a concept vector W_i in the knowledge system as [word T_in: 0.3]. In that case, a concept vector used in the user profiles that is similar to W_i can be used instead. One way to do this is to calculate the similarity between W_i and each concept vector using the generally known vector space method. If the calculated similarity is equal to or greater than a threshold, the users may be sorted using that concept vector from the user profiles.
[0027]
(Equation 1)
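The image for Equation 1 is not reproduced in this text. Assuming that the vector space method here refers to cosine similarity between concept vectors treated as word-weight vectors, the calculation could be sketched as below. The concept contents, their weights other than [present: 0.3], and the function name `cosine_similarity` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse word-weight vectors (dict: word -> weight)."""
    dot = sum(w * b[word] for word, w in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical concept vectors. w_i is the knowledge-system concept that
# contains the keyword [present] but is not used in the user profiles.
w_i = {"present": 0.3, "concert": 0.2, "song": 0.1}
music = {"song": 0.4, "concert": 0.3, "artist": 0.2}
sports = {"match": 0.5, "team": 0.3}

sim_music = cosine_similarity(w_i, music)
sim_sports = cosine_similarity(w_i, sports)
print(round(sim_music, 3), sim_sports)
```

With a threshold between the two values, "music" would be chosen as the user-profile concept that stands in for W_i, in line with the [music] example given later in the description.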

[0028]
[Grouping processing]
In the grouping process, a user having an interest similar to the user selected by the matching process is selected, and it is determined whether or not the concept vector is close to the user's interest. Therefore, the grouping is processed in two stages. Through each step, the number of target users set by the advertiser can be increased.
[0029]
First, the matching process selects the concept vectors in the user profiles that contain the keyword set by the advertiser. The users whose profiles include a selected concept vector (concept) form the first group. Next, if a concept vector in the knowledge system contains the keyword set by the advertiser, the similarity between that concept vector and the concept vectors in the user profiles is calculated. When the similarity exceeds a threshold, the users whose profiles include the corresponding user-profile concept vector form the second group.
[0030]
For example, when "movie" is extracted as a concept containing the keyword [ticket], users whose profiles hold a value for "movie" equal to or greater than a certain threshold are extracted as the first group. Next, the similarity between the concept vector in the knowledge system that contains the keyword [present] and the concept vectors used in the user profiles is calculated. When the calculated similarity is equal to or greater than a threshold, users whose profiles hold a value for that user-profile concept vector equal to or greater than a certain threshold are extracted as the second group. The target number thus expands from the first group to the first group plus the second group. Based on the sizes of these groups, concept vectors are selected so as to approach the parameters, such as the number of people, input by the information provider. A third group can also be calculated by finding a new, highly similar concept vector, or another attribute, that many users in the first and second groups have in common.
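The two-stage expansion toward a target number might be sketched as follows. This is one possible reading, not the specified procedure: the rule of adding the second group only while the first group is below the target, the shared threshold, the sample profiles, and the function name `group_users` are all assumptions.

```python
def group_users(profiles, direct_axes, similar_axes, threshold, target_count):
    """Two-stage grouping: direct matches first, then similar-concept matches."""
    def holders(axes, exclude):
        # Users not yet selected who hold any of the given concepts at/above threshold.
        return [
            user for user, vector in profiles.items()
            if user not in exclude and any(vector[a] >= threshold for a in axes)
        ]

    first = holders(direct_axes, exclude=set())
    second = []
    if len(first) < target_count:
        # Widen the target only when the first group alone is too small.
        second = holders(similar_axes, exclude=set(first))
    return first, second, first + second

MUSIC, MOVIE = 2, 4  # indices of W_3 (music) and W_5 (movie)

# Hypothetical user profiles over (news, animation, music, sports, movie).
profiles = {
    "user_a": [0.02, 0.2, 0.03, 0.11, 0.18],
    "user_b": [0.3, 0.0, 0.2, 0.05, 0.02],
    "user_c": [0.0, 0.1, 0.05, 0.0, 0.4],
    "user_d": [0.5, 0.0, 0.0, 0.3, 0.0],
}
first, second, selected = group_users(
    profiles, direct_axes=[MOVIE], similar_axes=[MUSIC], threshold=0.1, target_count=3
)
print(first, second, selected)
```

A third stage based on concepts or attributes shared by the selected users could be added in the same pattern.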
[0031]
As an example, consider an advertiser who places an advertisement and specifies 500 recipients. The advertiser inputs keywords representing the features of the advertisement [e.g. present, ticket] and the number of recipients [500]. The output shows, for the keyword [ticket] and the concept vector [movie], the number of users whose profiles hold a value equal to or greater than a certain threshold, together with the concept [music], which is highly similar to the concept containing the keyword [present], and the number of users who hold a value equal to or greater than a certain threshold for that concept vector.
[0032]
[User similarity calculation processing]
For the users extracted by the processing described so far, the similarity between user profile vectors is expressed as a distance measure on a two-dimensional plane. As an example, suppose the user profile vector of a user i is expressed as follows.
V_{profile} = <W_{i,1}, W_{i,2}, ..., W_{i,n}>, where W_{i,n} is the importance of the n-th concept for user i.
The vector distance measure shown below may be used for the similarity calculation.
[0033]
(Equation 2)

[0034]
Using the above formula, the similarity between the user profile vectors of the extracted users is expressed as a distance measure on a two-dimensional plane by multidimensional scaling, which yields the arrangement (coordinates) of each user on the two-dimensional plane.
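The image for Equation 2 is not reproduced in this text, and the specification does not say which variant of multidimensional scaling is used. As a sketch under two assumptions, namely that the distance measure is the Euclidean distance and that classical (Torgerson) scaling is acceptable, user coordinates could be computed as below. The sample profiles and the function names are hypothetical; the eigenvectors are found by simple power iteration to keep the example dependency-free.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two vectors (assumed form of Equation 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classical_mds_2d(dist, iters=500):
    """Embed n items in 2-D from an n x n distance matrix (classical MDS)."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into an inner-product matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        # Power iteration for the dominant eigenvector of the (deflated) matrix.
        v = [math.cos(i * (axis + 1) + 1.0) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left along this axis
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # eigenvalue estimate
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate so the next pass finds the second eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

# Hypothetical user profile vectors over (news, animation, music, sports, movie).
users = [
    [0.0, 0.4, 0.05, 0.02, 0.25],
    [0.04, 0.0, 0.1, 0.2, 0.1],
    [0.3, 0.1, 0.0, 0.05, 0.0],
]
dist = [[euclidean(p, q) for q in users] for p in users]
coords = classical_mds_2d(dist)
for xy in coords:
    print([round(c, 3) for c in xy])
```

With three users the two-dimensional layout reproduces the original distances; with more users it is an approximation, and the axes have no meaning beyond relative position.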
[0035]
[Visualization processing]
Using the calculated arrangement (coordinates) of each user on the two-dimensional plane, the users are plotted as indicated by 303. It is also conceivable that the arrangement (coordinates) of each concept vector on a two-dimensional plane is calculated and plotted by a multidimensional scaling method together with the user. It is also conceivable to express the difference in the degree of relevance with the keyword by a color or the like.
[0036]
The invention made by the present inventors has been specifically described above based on the embodiment. Needless to say, however, the present invention is not limited to the embodiment and can be modified in various ways without departing from its gist.
[0037]
[Effects of the invention]
As described above, according to the present invention, users are identified based on the similarity of their preference tendencies, a user group likely to be interested in an information provider's advertised product or the like is created, and the user group to extract is chosen using the degree of interest and the number of people as parameters, so that the information provider can easily select advertisement targets.
[Brief description of the drawings]
FIG. 1 is a system configuration diagram according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining a user classification method according to the embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of visualizing a user profile using a user classification method according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a knowledge system.
[Explanation of symbols]
101: Network such as the Internet, 102: User terminal, 103: Information provider terminal, 104: Server, 105: User profile storage unit, 106: Knowledge system, 107: Matching processing arithmetic unit, 108: Grouping processing unit, 109: Visualization processing unit, 110: Inter-user similarity calculation processing unit, 201: Parameters such as keywords, 204: User profile vector, 205: Matching process, 206: Grouping process, 207: Inter-user similarity calculation process, 208: Visualization process.

Claims

1. A preference-similar-user extraction method for a system having
a user profile storage unit that stores user profiles representing personal interests created from the service usage history of each user, and
a knowledge system that stores concept vectors for a large number of words,
the method comprising:
a matching process that calculates, from the knowledge system, concept vectors similar to a keyword group input by an information provider and matches them against the user profiles;
a grouping process that brings the calculation result closer to parameters input by the information provider; and
a visualization process that visualizes the result of the grouping process.

2. The preference-similar-user extraction method according to claim 1, wherein
the similarity between the user profile vectors held by the users calculated by the matching process is calculated,
the result is converted into coordinates on a two-dimensional plane by multidimensional scaling, and
users with similar preference tendencies are visualized by mapping each user onto the two-dimensional plane based on that user's converted coordinates.

3. The preference-similar-user extraction method according to claim 2, wherein
the concept vectors similar to the keyword group input by the information provider, calculated by the matching process, are converted into coordinates on a two-dimensional plane by multidimensional scaling and are mapped and visualized together with the users.

4. A preference-similar-user extraction device having as its basic configuration:
a user profile storage unit that stores user profiles representing personal interests created from the service usage history of each user;
a knowledge system that stores concept vectors for a large number of words;
a matching processing unit that calculates, from the knowledge system, concept vectors similar to a keyword group input by an information provider and matches them against the user profiles;
a grouping processing unit that brings the calculation result closer to parameters input by the information provider; and
a visualization processing unit that visualizes the result of the grouping processing unit.

5. The preference-similar-user extraction device according to claim 4, including:
an inter-user similarity calculation processing unit that calculates the similarity between the user profile vectors held by the users calculated by the matching processing unit;
an inter-user similarity calculation processing unit that converts the output of the inter-user similarity calculation into coordinates on a two-dimensional plane by multidimensional scaling; and
a visualization processing unit that visualizes users with similar preference tendencies by mapping each user onto the two-dimensional plane based on that user's converted coordinates.

6. The preference-similar-user extraction device according to claim 5, including
a processing unit that converts the concept vectors similar to the keyword group input by the information provider, calculated by the matching processing unit, into coordinates on a two-dimensional plane by multidimensional scaling and maps and visualizes them together with the users.

7. A preference-similar-user extraction program for realizing:
a matching processing function that matches concept vectors similar to a keyword group input by an information provider, extracted from a knowledge system storing concept vectors for a large number of words, against user profiles representing individual interests created from the service usage history of each user;
a grouping processing function that brings the calculation result closer to parameters input by the information provider; and
a visualization processing function that visualizes the result of the grouping processing.

8. The preference-similar-user extraction program according to claim 7, wherein
the similarity between the user profile vectors held by the users calculated by the matching process is calculated,
the result is converted into coordinates on a two-dimensional plane by multidimensional scaling, and
users with similar preference tendencies are visualized by mapping each user onto the two-dimensional plane based on that user's converted coordinates.

9. The preference-similar-user extraction program according to claim 8, wherein
the concept vectors similar to the keyword group input by the information provider are converted into coordinates on a two-dimensional plane by multidimensional scaling and are mapped and visualized together with the users.

10. A computer-readable recording medium on which the preference-similar-user extraction program according to any one of claims 7 to 9 is recorded.