JP2022087409A

JP2022087409A - Data addition system and data addition method

Info

Publication number: JP2022087409A
Application number: JP2020199328A
Authority: JP
Inventors: 江里子佐藤; Eriko Sato; 将樹向尾; Masaki Kobi
Original assignee: Hitachi High Tech Corp
Current assignee: Hitachi High Tech Corp
Priority date: 2020-12-01
Filing date: 2020-12-01
Publication date: 2022-06-13
Also published as: WO2022118860A1

Abstract

[Problem] To add data usable for marketing to potential users who resemble a target user but cannot be individually identified. [Solution] The system has a potential data extraction unit and a potential user data addition unit. From an access history of content that does not include user registration information, the potential data extraction unit outputs, as potential user data, data containing the records of potential users who are similar, to at least a certain degree, to a target user designated by specified conditions, together with the records of the target user. The potential user data addition unit uses related items, specified as conditions for association with the records of the potential user data, to append to those records the related records of a data set that stores an access history different from the above access history. [Selected drawing] Fig. 1

Description

The present invention relates to a technique for extracting potential user data from an access history of content that does not include user registration information, and for adding related data from a separate group of data sets to the potential user data.

In the marketing field, techniques are known for acquiring information on target users from multiple databases by using information such as the attributes and behavior history of the users of content. Patent Document 1 and Patent Document 2, for example, are known as techniques for acquiring user information.

Patent Document 1 discloses a marketing data providing system that provides marketing data to multiple companies. The system includes customer data collection means for collecting customer data containing each company's customer information, and performs name identification on the provided customer data.

Patent Document 2 discloses an information processing apparatus that estimates a user's characteristics from the user's attributes and that includes a supplementing unit that supplements the characteristic information of a target user based on the characteristic information of similar users who resemble the target user.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2003-288464
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2020-035219

A contributor who provides content on a website may intend to create new business according to the users who view that content. These viewing users include targetable users who, for example, register on the website to download materials and thereby provide personal attribute information. They also include potential users, who have similar preferences to those users but have only accessed the website without providing personal attribute information. Here, a potential user is a user whose content access history is similar to that of a target user, and a target user is a targetable user whom the contributor wants to use its own products and services. If information usable for marketing could be obtained for these potential users, in addition to the target user information that has been obtainable so far, new business could be created. Information usable for marketing is not limited to information that identifies an individual user and allows products and services to be sold to that user directly; it also includes other information that reflects a user's preferences, such as search keywords and purchase information. Therefore, even for a potential user for whom the website holds no registration information, adding data related to that potential user from a separate group of data sets can yield information usable for marketing.

Patent Document 1 above adds data by performing name identification on data in which the targeted individual can be identified. It does not consider adding data when the individual cannot be identified.

Patent Document 2 above, for use in optimizing one's own service, supplements information on users whose targeted individual can be identified with information on target users and similar users, using analysis results from other services. It does not supplement similar users who cannot be individually identified.

The present invention has been made in view of the above problems, and its object is to provide a data addition system and a data addition method capable of adding data usable for marketing to potential users who resemble a target user but cannot be individually identified.

The data addition system according to the present invention comprises: a potential data extraction unit that, from an access history of content that does not include user registration information, outputs as potential user data the data containing the records of potential users who are similar, to at least a certain degree, to a target user designated by specified conditions, together with the records of the target user; and a potential user data addition unit that, using related items specified as conditions for association with the records of the potential user data, appends to the records of the potential user data the related records of a data set that stores an access history different from said access history.

According to the present invention, data usable for marketing can be added to potential users who resemble a target user but cannot be individually identified.

Fig. 1 is a block diagram showing an example of the configuration of the potential user data addition system in this embodiment.
Fig. 2 shows an example of a flowchart of the processing performed by the potential user data extraction unit shown in Fig. 1.
Fig. 3 shows an example of a flowchart of the processing performed by the related data analogy unit shown in Fig. 1.
Fig. 4 shows an example of a flowchart of the processing performed by the data addition unit shown in Fig. 1.
Fig. 5 shows an example of the access history data shown in Fig. 1.
Fig. 6 shows an example of the data sets in the data set group shown in Fig. 1.
Fig. 7 shows an example of the input data of the input unit shown in Fig. 1.
Fig. 8 shows an example of the processing result of the potential user data extraction unit shown in Fig. 1.
Fig. 9 shows an example of the relationship between the access history and the target users in this embodiment.
Fig. 10 shows an example of the relationship between the target users and the potential users in this embodiment.
Fig. 11 shows an example of the relationship between the access history and data set 1 in this embodiment.
Fig. 12 shows an example of the relationship between the access history and data set 2 in this embodiment.
Fig. 13 shows an example of the relationship between the access history and data set 3 in this embodiment.
Fig. 14 shows an example of the output result in this embodiment.
Fig. 15 is a schematic diagram of the computer of the system.

Embodiments of the present invention are described below with reference to the drawings. The following description and drawings are examples for explaining the present invention, and are omitted or simplified where appropriate for clarity. The present invention can also be implemented in various other forms. Unless otherwise specified, each component may be singular or plural.

To make the invention easier to understand, the position, size, shape, range, and the like of each component shown in the drawings may not represent the actual position, size, shape, range, and the like. The present invention is therefore not necessarily limited to the position, size, shape, range, and the like disclosed in the drawings.

In the following description, various kinds of information may be described using expressions such as "table" and "list", but such information may be represented by other data structures. To indicate independence from the data structure, an "XX table", "XX list", or the like may be called "XX information". Where expressions such as "identification information", "identifier", "name", "ID", and "number" are used to describe identification information, they are interchangeable.

When there are multiple components with the same or similar functions, they may be described with the same reference numeral and different subscripts. When these components need not be distinguished, the subscripts may be omitted.

In the following description, processing may be described as being performed by executing a program. A program is executed by a processor (for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit)) and performs the defined processing while using storage resources (for example, memory) and/or interface devices (for example, communication ports) as appropriate, so the subject of the processing may be regarded as the processor. Likewise, the subject of processing performed by executing a program may be a controller, apparatus, system, computer, or node that has a processor. The subject of such processing need only be an arithmetic unit, and may include a dedicated circuit that performs specific processing (for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit)).

A program may be installed on an apparatus such as a computer from a program source. The program source may be, for example, a program distribution server or a computer-readable storage medium. When the program source is a program distribution server, the server includes a processor and a storage resource that stores the program to be distributed, and the server's processor may distribute that program to other computers. In the following description, two or more programs may be implemented as one program, and one program may be implemented as two or more programs.

Fig. 1 is a block diagram showing an example of the configuration of the potential user data addition system 1000 in this embodiment. The potential user data addition system 1000 has an input unit 001, a content access history 002, a potential user data extraction unit 003, a data set group 004 containing multiple data sets, and a potential user data addition unit 005, which includes a related data analogy unit 0051 and a data acquisition unit 0052. Their specific functions are described later with reference to flowcharts.

The potential user data addition system 1000 can be implemented by, for example, a general-purpose computer 1500 as shown in Fig. 15 (computer schematic diagram). The computer 1500 includes a CPU 1501; a memory 1502; an external storage device 1503 such as an HDD (Hard Disk Drive); a read/write device 1507 that reads and writes information on a portable storage medium 1508 such as a CD (Compact Disk) or USB memory; an input device 1506 such as a keyboard or mouse; an output device 1505 such as a display; a communication device 1504 such as a NIC (Network Interface Card) for connecting to a communication network; and an internal communication line 1509 (referred to as the system bus) that connects them.

The various data stored in, or used for processing by, each system or apparatus (for example, the production-related data 2 described above) can be made available by the CPU 1501 reading it from the memory 1502 or the external storage device 1503. Each functional unit (described later) of each system or apparatus can be implemented by the CPU 1501 loading a predetermined program stored in the external storage device 1503 into the memory 1502 and executing it.

The predetermined program described above may be stored (downloaded) in the external storage device 1503 from the storage medium 1508 via the read/write device 1507, or from a network via the communication device 1504, and then loaded into the memory 1502 and executed by the CPU 1501. Alternatively, it may be loaded directly into the memory 1502 from the storage medium 1508 via the read/write device 1507, or from a network via the communication device 1504, and executed by the CPU 1501.

The following description illustrates the case in which the potential user data addition system 1000 is configured as a single computer, but its functional units may be spread over multiple computers. Furthermore, all or some of these functional units may be distributed over one or more computers, as in a cloud, and may provide the same functions by communicating with each other over a network.

Fig. 2 shows an example of a flowchart of the processing performed by the potential user data extraction unit shown in Fig. 1. The potential user data extraction unit 003 calculates the similarity between the user of each record in the access history and the target user (step 102), and extracts the data for which that similarity is equal to or greater than a threshold α (step 103). A target user is a targetable user whom the operator wants to use its own products and services; the conditions that define a target user are specified through the input unit 001, as described later with reference to Fig. 7. One possible method of calculating the similarity is for the potential user data extraction unit 003 to cluster the access history in advance and use the distance to the target user as the similarity. The similarity calculation method is not limited to this. The processing result of the potential user data extraction unit 003 is described later with reference to Fig. 8.
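Steps 102 and 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment: the Jaccard overlap of visited pages and tags stands in for the similarity measure (the description names clustering distance only as one example), and the function names, record fields, sample records, and the value of `alpha` are all hypothetical.

```python
def similarity(record, target):
    """Jaccard overlap of the pages and tags of two records (assumed measure)."""
    a = set(record["pages"]) | set(record["tags"])
    b = set(target["pages"]) | set(target["tags"])
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def extract_potential_users(access_history, targets, alpha):
    """Keep records whose best similarity to any target user is at least alpha.

    Assumes `targets` is non-empty.
    """
    extracted = []
    for record in access_history:
        best = max(similarity(record, t) for t in targets)  # step 102
        if best >= alpha:                                    # step 103
            extracted.append({**record, "similarity": best})
    return extracted


# Hypothetical access history; record No. 1 plays the role of the target user.
history = [
    {"no": 1, "pages": ["ai-intro"], "tags": ["AI"]},
    {"no": 2, "pages": ["recipes"], "tags": ["food"]},
    {"no": 5, "pages": ["ai-intro", "ai-cases"], "tags": ["AI"]},
]
targets = [history[0]]
result = extract_potential_users(history, targets, alpha=0.5)
print([r["no"] for r in result])  # prints [1, 5]
```

The target user's own record is kept in the result on purpose, since the extracted potential user data is described as containing both the target user's records and the similar records.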

Fig. 3 shows an example of a flowchart of the processing performed by the related data analogy unit shown in Fig. 1. The related data analogy unit 0051 selects a data set from the data set group (step 202), refers to the related items and processing methods specified through the input unit 001, and determines whether any item has name identification specified as its processing method (step 203). If so (step 203; YES), it performs name identification (step 204). The related data analogy unit 0051 then determines whether any item has relevance calculation specified as its processing method (step 205). If so (step 205; YES), it calculates the relevance (step 206). Next, the related data analogy unit 0051 determines whether all data sets in the data set group have been processed (step 207). If they have (step 207; YES), the processing shown in Fig. 3 ends. If not (step 207; NO), processing returns to the data set selection in step 202. The relevance in step 206 is calculated using, for example, a method of measuring the similarity between words by natural language processing. The relevance calculation method is not limited to this.
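The control flow of steps 202 to 207 can be sketched as below. The sketch makes several assumptions that the description does not state: name identification is reduced to an exact match on one shared item, the relevance degree is a placeholder that returns 1.0 for identical values and 0.0 otherwise (the description mentions word similarity by natural language processing instead), and all function names, field names, and sample values are hypothetical.

```python
def link_by_key(records, users, item):
    """Name identification, assumed here to be an exact match on one shared item."""
    keys = {u[item] for u in users}
    return [r for r in records if r.get(item) in keys]


def score_relevance(records, users, item):
    """Placeholder relevance degree: 1.0 for identical values, otherwise 0.0."""
    return [(u["no"], r["no"], 1.0 if u[item] == r.get(item) else 0.0)
            for u in users for r in records]


def analyze_related_data(dataset_group, input_spec, users):
    """Process every data set with the methods specified for its related items."""
    results = {}
    for name, records in dataset_group.items():          # step 202, repeated until step 207 is YES
        spec = input_spec.get(name, {})
        linked, scored = [], []
        naming = [i for i, m in spec.items() if m == "name_identification"]
        if naming:                                        # step 203: YES
            for item in naming:                           # step 204
                linked.extend(link_by_key(records, users, item))
        relevance = [i for i, m in spec.items() if m == "relevance"]
        if relevance:                                     # step 205: YES
            for item in relevance:                        # step 206
                scored.extend(score_relevance(records, users, item))
        results[name] = {"linked": linked, "scored": scored}
    return results


# Hypothetical potential user and data set records.
users = [{"no": 5, "ip": "192.0.2.5", "keyword": "AI"}]
dataset_group = {
    "dataset1": [
        {"no": 1, "ip": "192.0.2.5", "keyword": "AI"},
        {"no": 2, "ip": "192.0.2.9", "keyword": "cooking"},
    ],
}
input_spec = {"dataset1": {"ip": "name_identification", "keyword": "relevance"}}
results = analyze_related_data(dataset_group, input_spec, users)
```

Passing the two processing methods in as a per-data-set specification keeps the loop itself unchanged when a data set uses only one of them or neither.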

Fig. 4 shows an example of a flowchart of the processing performed by the data addition unit shown in Fig. 1. The data addition unit 0052 appends to the potential user data the records resulting from the name identification performed by the related data analogy unit 0051 (step 302). For the items whose relevance was calculated, it also appends the related results together with the relevance β (step 303).
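Steps 302 and 303 amount to attaching two kinds of result to each potential user record. A minimal sketch follows, assuming the earlier stage delivers its results as tuples keyed by the potential user's record number; that tuple layout, the field names, and the sample values are hypothetical.

```python
def append_results(potential_users, linked, scored):
    """Return copies of the potential user records with both result kinds attached.

    linked: (user_no, record) pairs from name identification.
    scored: (user_no, record, beta) triples from relevance calculation.
    """
    output = []
    for user in potential_users:
        row = dict(user)
        # Step 302: records matched by name identification.
        row["linked_records"] = [r for (no, r) in linked if no == user["no"]]
        # Step 303: related results together with their relevance degree beta.
        row["related"] = [(r, beta) for (no, r, beta) in scored if no == user["no"]]
        output.append(row)
    return output


potential_users = [{"no": 5}, {"no": 7}]
linked = [(5, {"purchase": "GPU server"})]
scored = [(7, {"article": "AI in logistics"}, 0.8)]
rows = append_results(potential_users, linked, scored)
```

Keeping β next to each related result lets a later consumer decide how much weight to give a loosely related record.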

Fig. 5 shows an example of the access history data shown in Fig. 1. The access history 002 shown in Fig. 1 is information indicating the history of access to a website (for example, a blog) that unspecified users can access without registering. The content access history A401, which is an example of the access history 002, is collected by, for example, a web server at predetermined intervals.

As shown in Fig. 5(a), the access history A401 is a table in which each record contains item group A-1 {IP address, access time, number of visits, visited page, viewed tag, search keyword}. "No" is a serial number that identifies a record; in Fig. 5(a), records 1 through N_0 are recorded as the access history A401. The access history A401 is stored for each content item. Among the users who accessed this content, a user who is interested in it and wants more detailed information sends item group A-2 {name, affiliation, survey purpose, download material link}, shown in Fig. 5(b), to the web server in order to download material from the website. The web server that collected the access history A401 treats the data having item group A-2 as additional acquisition data; adding it produces the access history plus additional acquisition data (the additional acquisition data access history) A402. In Fig. 5(b), records No. 1, 3, 4, and N_0 are the access histories of users who accessed the website content and downloaded material, while records No. 2 and 4 are the access histories of users who accessed the website content but did not download material. Like the access history A401, the additional acquisition data access history A402 is collected by, for example, the web server at predetermined intervals.
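The two item groups can be modeled as a record with an optional extension, as in the sketch below. The class names, field names, field types, and sample values are assumptions chosen for illustration; the description only lists the items themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRecord:
    """Item group A-1: recorded for every access."""
    no: int
    ip_address: str
    access_time: str
    visit_count: int
    visited_page: str
    viewed_tag: str
    search_keyword: str


@dataclass
class AdditionalData:
    """Item group A-2: sent only by users who download material."""
    name: str
    affiliation: str
    survey_purpose: str
    download_link: str


@dataclass
class AccessRecordWithExtra:
    """One row of A402: A-1 items plus A-2 items when they exist."""
    access: AccessRecord
    extra: Optional[AdditionalData] = None  # None: accessed but did not download

    @property
    def has_additional_data(self) -> bool:
        return self.extra is not None


# Hypothetical rows.
anonymous = AccessRecordWithExtra(
    AccessRecord(2, "192.0.2.20", "2020-12-01T10:00", 1, "/blog/ai", "AI", "AI use cases"))
registered = AccessRecordWithExtra(
    AccessRecord(1, "192.0.2.10", "2020-12-01T09:00", 3, "/blog/ai", "AI", "AI application"),
    AdditionalData("Sato", "Example Corp", "AI application", "https://example.com/doc.pdf"))
```

Making the A-2 part optional mirrors the fact that a row without it is still a valid access record, which is exactly the case of a user who cannot be individually identified.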

Fig. 6 shows an example of the data sets in the data set group shown in Fig. 1. The data set group 004 shown in Fig. 1 is information indicating the history of access to websites other than the website whose access is stored as the access history 002 (for example, the site of the operator that provides the blog environment, or another operator's site). Three kinds of data set are assumed below.

The first is data set B501 (data set 1), for which the user segment expected for the content of the self-operated website is the same as the user segment expected for the accessed content stored in the access history. Data set B501 contains item group B-1 {IP, access time, number of visits, visited page, search keyword}. "No" is a serial number that identifies a record; in Fig. 6(a), records 1 through N_1 are recorded as data set B501. Data set B501 may be stored for each content item.

The second is data set B502 (data set 2), for which the user segment expected for the content of the self-operated website is likewise the same as the user segment expected for the accessed content stored in the access history, but the content of the self-operated website requires registration. Data set B502 contains item group B-2 {name, affiliation, IP, assigned project, purchase information}. "No" is a serial number that identifies a record; in Fig. 6(b), records 1 through N_2 are recorded as data set B502. Data set B502 may be stored for each content item.

The third is data set B503 (data set 3), which is operated by another party and for which it is unknown whether it includes a user segment similar to the one expected for the content of the self-operated website. Data set B503 contains item group B-3 {name, affiliation, posted article content}. Data set B503 may be stored for each content item.

Fig. 7 shows an example of the input data of the input unit shown in Fig. 1. The input unit 001 accepts, as the data of the access history 002, the additional acquisition data access history A402, which carries the additional acquisition data. As information about the target user, the example specifies access users whose {survey purpose} in item group A-2 of the access history 002 is "AI application". In addition, for each data set, the items related to the access history 002 and the processing method for each are input. The target user information, the related items, and the processing methods are assumed to be determined in advance by the user of this system. Related items may be specified not only one-to-one but also one-to-many, many-to-one, and many-to-many.

In Fig. 7, for data set 1, for example, the input unit 001 has accepted {IP address}, {visited page}, and {search keyword} as the items related to the access history 002; the other items were not specified as related items. As the processing methods for these related items, the input unit 001 has accepted {name identification}, {relevance calculation}, and {relevance calculation}, respectively. The items for data set 2 and data set 3 are specified in the same way.
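The input described for Fig. 7 can be written down as a small configuration structure. The sketch below covers only data set 1, because the description does not list the items for data sets 2 and 3. The class name `RelatedItem`, the key names, and the pairing of each access-history item with a like-named data set item are assumptions.

```python
from dataclasses import dataclass

METHODS = {"name_identification", "relevance"}


@dataclass(frozen=True)
class RelatedItem:
    history_item: str   # item in the access history
    dataset_item: str   # item in the data set (pairing is assumed)
    method: str         # processing method for this pair

    def __post_init__(self):
        if self.method not in METHODS:
            raise ValueError(f"unknown processing method: {self.method}")


input_data = {
    # Target user condition: {survey purpose} is "AI application".
    "target_condition": {"survey_purpose": "AI application"},
    "related_items": {
        "dataset1": [
            RelatedItem("ip_address", "ip", "name_identification"),
            RelatedItem("visited_page", "visited_page", "relevance"),
            RelatedItem("search_keyword", "search_keyword", "relevance"),
        ],
    },
}


def is_target(record, condition):
    """True when the record satisfies every item of the target condition."""
    return all(record.get(key) == value for key, value in condition.items())
```

For example, `is_target({"survey_purpose": "AI application"}, input_data["target_condition"])` is true, while a record without a survey purpose is not a target user. A list of pairs per data set also accommodates the one-to-many and many-to-many designations mentioned above, since the same item may appear in several pairs.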

Fig. 8 shows an example of the processing result of the potential user data extraction unit shown in Fig. 1. In the similarity calculation of step 102 shown in Fig. 2, the potential user data extraction unit 003 calculates the similarity between each user stored in the access history and the target user. In step 103, it extracts the data for which that similarity is equal to or greater than the threshold α.

In Fig. 8, the similarity between each user stored in the access history and the target user is calculated using {number of visits, visited page, viewed tag} of item group A-1 as the key, and the access users with high similarity are extracted as potential user data 701.

More specifically, in FIG. 8, the latent user data extraction unit 003 first extracts, from the records in the additional acquisition data access history A402 shown in FIG. 5(b), the data whose {visit count, visited pages, browsing tags} of item group A-1 are the same (or at least partly overlap, that is, whose item contents are semantically close to at least a certain degree). Item group A-1 is specified in advance as the similarity condition for extracting records of the additional acquisition data access history A402 that resemble one another to at least a certain degree. From the extracted data, the latent user data extraction unit 003 then extracts the records whose {visit count, visited pages, browsing tags} of item group A-1 are similar, at or above a threshold α, to those of the records for which the target-user condition is specified. In this way, data containing the target users' records and the records of latent users who resemble those target users to at least a certain degree can be extracted from the additional acquisition data access history A402 as latent user data 701. One example of the similarity calculation method is to cluster access history 002 in advance and use the closeness of the distance to the target users' cluster as the similarity. However, the similarity calculation method is not limited to this. In FIG. 8, among the records of latent user data 701, the users stored as records No. 1 and No. 3 are users who accessed the website content, entered "AI application" as the {survey purpose} of item group A-2 of access history 002, and downloaded materials; that is, they are the target users who satisfy the condition entered via the input unit 001 in (2) of FIG. 7. The remaining users (No. 5, 7, and 11) are latent users who resemble those target users to at least a certain degree.
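The threshold-based extraction described above can be illustrated with a minimal Python sketch. The description leaves the similarity measure open (clustering distance is given only as one example), so the per-item scores below (a ratio for visit count, Jaccard overlap for visited pages and browsing tags) and all names such as `AccessRecord` and `extract_latent_user_data` are assumptions made for illustration, not the method of the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    no: int                    # record number in the access history
    visit_count: int
    visited_pages: frozenset
    browsing_tags: frozenset


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(u, v):
    """Hypothetical similarity over item group A-1: mean of three per-item scores."""
    hi = max(u.visit_count, v.visit_count)
    count_sim = 1.0 if hi == 0 else min(u.visit_count, v.visit_count) / hi
    return (count_sim
            + jaccard(u.visited_pages, v.visited_pages)
            + jaccard(u.browsing_tags, v.browsing_tags)) / 3


def extract_latent_user_data(history, target_nos, alpha):
    """Keep target records plus records whose similarity to some target is >= alpha."""
    targets = [r for r in history if r.no in target_nos]
    return [r for r in history
            if r.no in target_nos
            or any(similarity(r, t) >= alpha for t in targets)]
```

Because the target records are always kept, the result contains both the target users and the latent users that resemble them, which mirrors the statement that latent user data 701 includes both kinds of records.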

FIG. 9 is a diagram showing an example of the relationship between access history 002 and the target users in this embodiment. FIG. 9 shows that the access users comprise access users U1, who have only item group A-1; additionally acquired users U2, who downloaded materials and for whom additional acquisition data with item group A-2 was obtained (for example, the names "Tanaka" and "Ito" in FIG. 5(b)); and target users U3, who fall within the target range designated as the acquisition goal among the additionally acquired users (for example, the names "Sato" and "Suzuki", whose {survey purpose} in item group A-2 is "AI application"). Here, access users U1 contain additionally acquired users U2, and additionally acquired users U2 contain target users U3. If the target users can be specified using item group A-1, the additional acquisition user data is not necessarily required.

FIG. 10 is a diagram showing an example of the relationship between the target users and the latent users in this embodiment. FIG. 10 shows that, among the access users, there are latent users U4 who are similar to target users U3 with a similarity of α or more under a certain classification X. Classification X refers to {visit count, visited pages, browsing tags} of item group A-1 in FIG. 8. It is not limited to a single item, and may also be a dimensionally reduced representation obtained by, for example, principal component analysis. Note that the target users are included in the latent users.

FIG. 11 is a diagram showing an example of the relationship between access history 002 and data set 1 in this embodiment. It shows the processing result when data set 1 is selected in the data set selection process (step 202) of the related data analogy unit 0051. According to FIG. 7, the related items specified for the name identification process (step 204) include the data set 1 item "IP address", so the related data analogy unit 0051 performs the name identification process (step 204) on data set 1 using the access history item "IP address". Note that different users may share the same IP address. However, if data set 1 comes from a system that manages the content of a website the operator itself runs, and it can be assumed, for example from the structure of that system, that the IP addresses of different users (for example, a user holding multiple handle names) are linked one-to-one, the IP address is treated as an item for the name identification process (step 204). Similarly, the access history item "access time" and the data set 1 item "access time" share a name, but they are not specified as related items unless data set 1 comes from a system that is accessed at the same time. Even when the access times in the access history and in data set 1 are not exactly identical, a fixed time window may be set, for example to allow for delays caused by the communication environment, and if the access times fall within that window, the two may be treated as items for the name identification process.
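The name identification rule discussed above (same IP address, optionally combined with access times that fall within a tolerance window) might be sketched as follows. The record layout, the field names `ip` and `time`, the 5-second window, and the function names are hypothetical; the description does not prescribe any of them.

```python
from datetime import datetime, timedelta


def is_same_user(access_rec, dataset_rec, time_window=None):
    """Treat two records as the same user if the IP addresses match and,
    when a window is given, the access times differ by at most that window."""
    if access_rec["ip"] != dataset_rec["ip"]:
        return False
    if time_window is None:
        return True
    return abs(access_rec["time"] - dataset_rec["time"]) <= time_window


def name_identify(latent_records, dataset_records, time_window=None):
    """Map each latent-user record number to the IDs of matching data set records."""
    return {a["no"]: [d["id"] for d in dataset_records
                      if is_same_user(a, d, time_window)]
            for a in latent_records}


# Illustrative data only (IP addresses from the documentation range 192.0.2.0/24).
latent = [{"no": 7, "ip": "192.0.2.10", "time": datetime(2020, 12, 1, 9, 0, 0)}]
data_set_1 = [
    {"id": "U11", "ip": "192.0.2.10", "time": datetime(2020, 12, 1, 9, 0, 3)},
    {"id": "U12", "ip": "192.0.2.10", "time": datetime(2020, 12, 1, 13, 30, 0)},
]
links = name_identify(latent, data_set_1, time_window=timedelta(seconds=5))
```

Matching on IP address alone would link both data set records to record No. 7. Adding the time window narrows the match to the record accessed within a few seconds, which is the kind of refinement the text allows when the two systems are accessed at the same time.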

In any case, only the user of this system can judge which items should be linked to each other in this way, so the related items and their processing methods must be specified via the input unit 001. The (1) name identification process of FIG. 11 shows that user data U41, one record of latent users U4 in access history 002, was linked to user data U11, a post-name-identification record of access users U1 in data set 1, and determined to be the same user. According to FIG. 7, the specified items also include the data set 1 items "visited page" and "search keyword". Therefore, in relevance calculation (step 206), the related data analogy unit 0051 calculates, for each of the access history items "visited page" and "search keyword" specified for them, the relevance between the latent users and target users in access history 002 and the access users in data set 1.

The (2) relevance calculation of FIG. 11 shows that user data U31, one record of target users U3 in access history 002, was linked with relevance β₁₁ to user data U12, one record of access users U1 in data set 1. It also shows that user data U42, one record of latent users U4 in access history 002, was linked to user data U13 and U14, two records of access users U1 in data set 1, with relevance β₁₂ and β₁₃, respectively. Although (2) of FIG. 11 illustrates the case of the data set 1 item "visited page", "search keyword" can be treated in the same way.
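As an illustration of how one access history record can be linked to several data set records with different relevance values β, the following Python sketch scores the overlap of visited pages. The Jaccard measure, the `min_beta` cut-off, and the field names are assumptions chosen for the example; the description does not specify how β is computed.

```python
def relevance(pages_a, pages_b):
    """Hypothetical relevance β in [0, 1]: Jaccard overlap of two page collections."""
    a, b = set(pages_a), set(pages_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_by_relevance(latent_record, dataset_records,
                      item="visited_pages", min_beta=0.5):
    """Return (data set record ID, β) pairs with β >= min_beta, highest β first."""
    links = []
    for d in dataset_records:
        beta = relevance(latent_record[item], d[item])
        if beta >= min_beta:
            links.append((d["id"], round(beta, 2)))
    return sorted(links, key=lambda pair: -pair[1])
```

The same function can be applied to a search-keyword field by passing a different `item` name, in line with the remark that "search keyword" is handled in the same way as "visited page".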

FIG. 12 is a diagram showing an example of the relationship between access history 002 and data set 2 in this embodiment. It shows the processing result when data set 2 is selected in the data set selection process (step 202) of the related data analogy unit 0051. According to FIG. 7, the related items specified for the name identification process (step 204) include the data set 2 items "name" AND "affiliation", so the related data analogy unit 0051 performs the name identification process (step 204) on data set 2 using the access history items "name" AND "affiliation". The (1) name identification process of FIG. 12 shows that user data U32, one record of target users U3 among latent users U4 in access history 002, was linked to user data U51, one post-name-identification record of registered users U5 in data set 2, and determined to be the same user.

According to FIG. 7, the specified items also include the data set 2 item "assigned project". This item differs in name from the access history item "survey purpose", but when the system is structured to allow the use of a different data set, items with different names may hold similar content. For this reason, as shown in FIG. 7, the user can freely designate such items in the input unit 001 as items containing similar content. Besides manual entry, this designation may be made automatically: for example, the contents of the two items are compared, and if either item is judged to contain content similar to the other's at or above a certain threshold, the pair is designated as related items by default.
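The automatic designation of related items mentioned above could, under simple assumptions, be sketched as a comparison of the value sets of two differently named items. The overlap coefficient (shared values divided by the size of the smaller value set), the 0.5 threshold, and the English field names are hypothetical choices; a real system would likely need fuzzier text matching.

```python
def column_values(records, item):
    """Normalized set of non-empty values that an item takes across records."""
    return {str(r[item]).strip().lower() for r in records if r.get(item)}


def suggest_related_items(history, dataset, history_items, dataset_items,
                          threshold=0.5):
    """Propose (history item, data set item, overlap) triples as default related items."""
    suggestions = []
    for h in history_items:
        hv = column_values(history, h)
        for d in dataset_items:
            dv = column_values(dataset, d)
            if not hv or not dv:
                continue
            # Overlap coefficient: high if either item largely contains the other.
            overlap = len(hv & dv) / min(len(hv), len(dv))
            if overlap >= threshold:
                suggestions.append((h, d, overlap))
    return suggestions
```

The proposals would only serve as default values; consistent with the text, the user of the system keeps the final say on which items are linked.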

The (2) relevance calculation of FIG. 12 shows that user data U33, one record of target users U3 in access history 002, was linked to user data U52 and U53, two records of registered users U5 in data set 2, with relevance β₂₁ and β₂₂, respectively. Because relevance calculation (step 206) here operates on the access history containing item group A-2, only target users having item group A-2 are processed.

Although not shown in the figures, consider the case where data set 2 is data registered in the system of a website the operator itself runs, and that system is used by the same user base as access history 002. In this case, if a related item is specified in item group A-1 and there is a user for whom the relevance between data set 2 and access history 002 on that related item is at or above a certain threshold, the registered user data of that user, among registered user data U5 of data set 2, is added to the user data of a latent user who could not previously be identified, and the individual may thereby become identifiable.

FIG. 13 is a diagram showing an example of the relationship between access history 002 and data set 3 in this embodiment. It shows the processing result when data set 3 is selected in the data set selection process (step 202) of the related data analogy unit 0051. According to FIG. 7, the related items specified for the name identification process (step 204) include the data set 3 items "name" AND "affiliation", so the related data analogy unit 0051 performs the name identification process (step 204) on data set 3 using the access history items "name" AND "affiliation".

The (1) name identification process of FIG. 13 shows that user data U34, one record of target users U3 among latent users U4 in access history 002, was linked to user data U61, one post-name-identification record of registered users U5 in data set 3, and determined to be the same user. According to FIG. 7, the specified items also include the data set 3 item "posted article content". However, this item may belong to registration data with a large number of registered users, such as an SNS (Social Networking Service), in a system that manages the content of a website operated by a third party. In such a case, a large number of similar users may be linked through the item, for example through links that give access to the "posted article content", and calculating the relevance and adding data for every one of them can become unwieldy. Therefore, (2) of FIG. 13 assumes that records are linked group by group, as indicated by the dash-dot outlines in data set 3, and that the characteristics of highly relevant groups are output. The grouping can be done, for example, by having the related data analogy unit 0051 cluster the data of data set 3 in advance, but the method is not limited to this. The (2) relevance calculation of FIG. 13 shows that user data U43 and U44, two records of latent users U4 in access history 002, were linked to the user data of the highly relevant groups G61 and G62 among the registered users in data set 3, with relevance β₃₁ and β₃₂, respectively.
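The group-level linking described for data set 3 can be pictured with the following Python sketch, which assumes the grouping has already been done (for instance by a prior clustering step) and scores a latent user against each group's pooled keywords. The keyword representation, the coverage-style score, and every identifier here are illustrative assumptions rather than details taken from the embodiment.

```python
from collections import defaultdict


def group_profiles(dataset_records, group_of):
    """Pool the post keywords of all records that belong to the same group."""
    profiles = defaultdict(set)
    for rec in dataset_records:
        profiles[group_of[rec["id"]]].update(rec["post_keywords"])
    return dict(profiles)


def best_group(latent_keywords, profiles):
    """Return (group ID, β) for the group whose profile covers most of the
    latent user's keywords; (None, 0.0) if no group overlaps at all."""
    kws = set(latent_keywords)
    best_id, best_beta = None, 0.0
    for gid, profile in sorted(profiles.items()):
        beta = len(kws & profile) / len(kws) if kws else 0.0
        if beta > best_beta:
            best_id, best_beta = gid, beta
    return best_id, best_beta
```

Linking to one group profile instead of to every individual record keeps the output small even when the third-party data set has many registered users, which is the concern the text raises.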

FIG. 14 is a diagram showing an example of an output result in this embodiment. FIG. 14 illustrates only the additional data appended to item group A-1 and item group A-2 shown in FIG. 5; in practice, item group A-1 and item group A-2 are associated with each record of the additional data. FIG. 14 shows, as this additional data, output example 1301 produced when the data addition unit 0052 shown in FIG. 4 processes the latent user data.

The related data analogy unit 0051 appends the records obtained by the name identification process (step 204) for each data set, as additional data, to item group A-1 and item group A-2 shown in FIG. 5. For the related items used in relevance calculation (step 206) for each data set, the related data analogy unit 0051 also outputs the related result and the relevance β. In this example the related result is the content of the related item, but the matched record may be output instead, as in the name identification process.

As a result, as shown in FIG. 14, related data can be added not only to the records of the target users originally designated via the input unit 001 (for example, records No. 1 and No. 3) but also to the records of latent users other than those target users (for example, records No. 7 and No. 11). Adding related data to a latent user's record reveals, for example, that the latent user of No. 7 is interested in material A and material C (from FIG. 8) and, based on posted article content with relevance β = 0.8 (that is, 80 percent relevance), may be a graduate student considering AI applications (from FIG. 14), which makes it possible to anticipate fields of interest to target in marketing. Likewise, the latent user of No. 11 is interested in material A (from FIG. 8) and, based on posted article content with relevance β = 0.8 (that is, 80 percent relevance), may be a worker at a construction site who is considering AI applications (from FIG. 14), which may make it possible to anticipate on-site needs. Thus, by adding related data to the latent user data, this system can obtain information usable for marketing.

More specifically, the latent user data addition system 1000 of this embodiment has: a latent data extraction unit 003 that, from access history 002 of content that does not include user registration information, outputs as latent user data the data containing the records of latent users (for example, records No. 5, 7, and 11 in FIG. 8) who resemble, to at least a certain degree, the target users satisfying a specified condition (for example, the condition in FIG. 7 that the {survey purpose} of item group A-2 of access history 002 is "AI application"), together with the records of those target users (for example, records No. 1 and 3 in FIG. 8); and a latent user data addition unit 005 that, using related items specified as conditions for association with the records of the latent user data (for example, {IP address}, {visited page}, and {search keyword} in FIG. 7), adds to the records of the latent user data the related records of data sets that store an access history different from the above access history (for example, data sets 1 to 3 in FIG. 6). Accordingly, for latent users who resemble the target users in the access history but whose identity cannot be determined, data usable for marketing can be added from a variety of data other than the access history.

The latent user data addition unit 005 has a related data analogy unit 0051 that calculates the relevance between the records of the data set and the records of the latent user data based on the contents of the related items (for example, the content of {visited page}, or the character strings and values of {search keyword}). Accordingly, the relevance between the various records of the data sets and the various records of the latent user data can be ascertained.

The latent user data addition unit 005 also has a data addition unit 0052 that outputs output result data in which the records of the latent user data for which the relevance was calculated are associated with the records of the data set. Accordingly, the records of the latent user data for which the relevance was calculated and the records of the data set can be presented to the user in association with each other.

The latent data extraction unit 003 extracts, from the access history, the records of users whose contents of predetermined items (for example, {visit count, visited pages, browsing tags} of item group A-1, specified in FIG. 8 as the similarity condition for extracting records similar to at least a certain degree) are similar to at least a certain degree. From the extracted records, it outputs data containing the records of latent users who resemble the target users to at least a certain degree and the records of the target users, for example as the latent user data shown in FIG. 8. Accordingly, the user of this system can, for example, designate a desired item group and then have data containing the records of latent users who resemble the target users to at least a certain degree and the records of the target users extracted and output as latent user data.

For each data set in the data set group, the related data analogy unit 0051 performs the name identification process on the records of the latent user data when the name identification process is specified for a related item, and calculates the relevance for the records of the latent user data when relevance calculation is specified for a related item.
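The per-item choice between the name identification process and relevance calculation can be sketched as a small dispatch loop. The tuple format `(history item, data set item, method)`, the method labels `"identify"` and `"relevance"`, exact-match identification, and the Jaccard-based β are all assumptions made for this illustration.

```python
def process_data_set(latent_records, dataset_records, related_items):
    """Apply the processing method specified for each related item.

    related_items: list of (history_item, dataset_item, method) tuples, where
    method is "identify" (name identification) or "relevance" (β calculation).
    """
    output = []
    for rec in latent_records:
        row = {"no": rec["no"], "identified": [], "related": []}
        for h_item, d_item, method in related_items:
            for d in dataset_records:
                if method == "identify":
                    # Exact match on the specified item counts as the same user.
                    if rec.get(h_item) is not None and rec.get(h_item) == d.get(d_item):
                        row["identified"].append(d["id"])
                elif method == "relevance":
                    a, b = set(rec.get(h_item, ())), set(d.get(d_item, ()))
                    beta = len(a & b) / len(a | b) if a | b else 0.0
                    if beta > 0:
                        row["related"].append((d["id"], d_item, beta))
                else:
                    raise ValueError(f"unknown processing method: {method}")
        output.append(row)
    return output
```

Each output row keeps the identified records and the relevance-linked records separate, so a later step can append them to the latent user record together with β and the related item used.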

The data addition unit 0052 outputs, to a display unit (for example, an output device 1505 such as a display), the output result data containing the relevance calculated by the related data analogy unit 0051 (for example, β = 0.8 of record No. 7 in FIG. 14) and the content of the related item used to calculate that relevance (for example, "clustering" of record No. 7 in FIG. 14). Accordingly, the user of this system can see at a glance the details of the relevance, such as what content it is based on and how strong it is.

As described above, by using the access history of content that does not include user registration information and entering the information of target users who exemplify the acquisition goal, together with the related items and processing methods for each data set, it becomes possible to extract latent user data and to add data related to the latent user data from a separate data set group. As a result, beyond information that identifies individual users and allows products and services to be sold to them directly, information usable for marketing, such as preference trends, can be obtained, enabling the creation of new business.

The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment above is described in detail to explain the present invention clearly, and the invention is not necessarily limited to one having all the described configurations. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Furthermore, for part of the configuration of each embodiment, other configurations can be added, deleted, or substituted, alone or in combination.

1000: Latent user data addition system
001: Input unit
002: Access history
003: Latent user data extraction unit
004: Data set group
005: Latent user data addition unit
0051: Related data analogy unit
0052: Data addition unit

Claims

A data addition system comprising:
a latent data extraction unit that, from an access history of content that does not include user registration information, outputs, as latent user data, data including records of latent users who are similar, to at least a certain degree, to a target user satisfying a specified condition, and a record of the target user; and
a latent user data addition unit that, using a related item specified as a condition for association with the records of the latent user data, adds to the records of the latent user data the records of a data set that are related to the records of the latent user data, the data set storing an access history different from said access history.

The data addition system according to claim 1,
wherein the latent user data addition unit has a related data analogy unit that calculates a degree of relevance between the records of the data set and the records of the latent user data based on the contents of the related item.

The data addition system according to claim 2,
wherein the latent user data addition unit has a data addition unit that outputs output result data in which the records of the latent user data for which the degree of relevance was calculated are associated with the records of the data set.

The data addition system according to claim 1,
wherein the latent data extraction unit extracts, from the access history, records of users whose contents of a predetermined item are similar to at least a certain degree, and outputs, as the latent user data, data including, from among the extracted records, the records of latent users who are similar to the target user to at least a certain degree and the record of the target user.

The data addition system according to claim 2,
wherein, for each data set of a data set group, the related data analogy unit performs a name identification process on the records of the latent user data when the name identification process is specified for the related item, and calculates the degree of relevance for the records of the latent user data when calculation of the degree of relevance is specified for the related item.

The data addition system according to claim 3,
wherein the data addition unit outputs, to a display unit, the output result data including the degree of relevance calculated by the related data analogy unit and the contents of the related item used to calculate the degree of relevance.

A data addition method performed by a computer, wherein:
a latent data extraction unit outputs, as latent user data, from an access history of content that does not include user registration information, data including records of latent users who are similar, to at least a certain degree, to a target user satisfying a specified condition, and a record of the target user; and
a latent user data addition unit, using a related item specified as a condition for association with the records of the latent user data, adds to the records of the latent user data the records of a data set that are related to the records of the latent user data, the data set storing an access history different from said access history.

The data addition method according to claim 7,
wherein a related data analogy unit of the latent user data addition unit calculates a degree of relevance between the records of the data set and the records of the latent user data based on the contents of the related item.

The data addition method according to claim 8,
wherein a data addition unit of the latent user data addition unit outputs output result data in which the records of the latent user data for which the degree of relevance was calculated are associated with the records of the data set.

The data addition method according to claim 7,
wherein the latent data extraction unit extracts, from the access history, records of users whose contents of a predetermined item are similar to at least a certain degree, and outputs, as the latent user data, data including, from among the extracted records, the records of latent users who are similar to the target user to at least a certain degree and the record of the target user.

The data addition method according to claim 8,
wherein, for each data set of a data set group, the related data analogy unit performs a name identification process on the records of the latent user data when the name identification process is specified for the related item, and calculates the degree of relevance for the records of the latent user data when calculation of the degree of relevance is specified for the related item.

The data addition method according to claim 9,
wherein the data addition unit outputs, to a display unit, the output result data including the degree of relevance calculated by the related data analogy unit and the contents of the related item used to calculate the degree of relevance.