JP2003173280A - Database generation device, database generation method, and database generation program

Info

Publication number: JP2003173280A
Application number: JP2001371636A
Authority: JP
Inventors: Katsuto Bessho; 克人別所; Shigeto Iwase; 成人岩瀬
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-12-05
Filing date: 2001-12-05
Publication date: 2003-06-20
Anticipated expiration: 2021-12-05
Also published as: JP3803961B2

Abstract

(57) [Abstract] [Problem] To collect data from a plurality of servers that independently manage and operate store information and the like on a network, and to generate a database (DB) that contains no duplicate data and can display the update status of the data. [Solution] The database generation device 10 has: means 11 for collecting, from each server 20 on a network 40, data together with information such as the identification ID and update date and time of the data; means 12 for extracting, from each collected data item, attribute values that characterize the data and generating a new DB 18 consisting of the attribute values, identification ID, update date and time, and so on; means 13 for performing name identification on data in the new DB whose attribute values can be regarded as identical; means 14 for associating data whose attribute values can be regarded as identical between the new DB and the previously generated old DB 17; and means 15 for comparing the identification ID and update date and time of data in the new DB with those of the corresponding data in the old DB and attaching update information to the relevant data in the new DB.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a device and a method for collecting data from a plurality of servers that are distributed over a network such as the Internet and that independently manage and operate guide information for stores and the like, and for generating a database used for search and guidance, as well as to a program therefor.

[0002]

[Prior Art] Data such as store guide information is often created independently by several organizations and updated as needed. Because the data set owned by any one organization does not cover the guide information for every store, integrating these independently created and updated data sets makes a more complete information search service possible. The data set held by each organization is stored in a computer connected to a network such as the Internet and made available for browsing. Hereinafter, such a computer is called an information server.

[0003] In the conventional technique for collecting data sets from a plurality of information servers and generating a database, the data sets collected from the information servers are simply merged to form the database. Each data item in the generated database carries link information pointing to the original data in the information server where it resides. When a user searches the database from a terminal, the link information attached to a displayed data item gives access to its original data. FIG. 11 shows an example of a conventional search result display screen for a search of the database for stores whose business type is "Chinese" (中華) and whose address is "Kagurazaka, Shinjuku-ku" (新宿区神楽坂). When the user clicks the link information on the screen, the detail screen of the linked store is displayed.

[0004]

[Problems to Be Solved by the Invention] In data sets created independently by several organizations, the same store is often registered with its name and address in different formats and notations. Consequently, the conventional technique of simply merging the data sets collected from a plurality of information servers cannot consolidate duplicate entries for the same store, and the same store may appear more than once among the stores in a search result. The search result then becomes redundantly long, and the user is forced into the tedious work of examining the contents of unnecessary data and judging whether each item is the same as one already seen. In the search result display screen of FIG. 11, for example, the first and fourth stores are the same store, and the third and sixth stores are the same store.

[0005] Users who search for information on stores and the like are particularly interested in new information about a store and in information about newly opened stores. When data set collection and database generation are carried out periodically, a user searching the generated database can obtain new information quickly if each displayed data item indicates whether it has been updated or is new. In the conventional technique, however, no such update information is attached to the data.

[0006] An object of the present invention is to provide a database generation device and method, and a database generation program therefor, that collect data sets from a plurality of information servers and the like distributed over a network, integrate the data without redundancy, and generate a database to which data update information is attached.

[0007]

[Means for Solving the Problems] The database generation device of the present invention comprises: storage means for storing a database generated in the past (old database); data collection means for collecting, from a plurality of locations, data containing values of attributes such as name and address, together with information such as the identification ID and update date and time of the data; attribute information extraction means for extracting the attribute values from each collected data item and generating a database (new database) in which each data item consists of the extracted attribute values, the identification ID, the update date and time, and so on; name identification means for classifying sets of data in the generated new database whose attribute values can be regarded as identical into the same group; combining means for judging data whose attribute values can be regarded as identical between the new database and the old database to be the same, and associating the data of the two databases with each other; and update information adding means for attaching update information to the relevant data in the new database by comparing information such as the identification ID and update date and time of the data in the new database with information such as the identification ID and update date and time of the associated data in the old database.

[0008] The name identification means consolidates duplicate data in the generated new database into one. Therefore, when data matching a user's request is retrieved from this database and displayed, multiple entries for the same store are not displayed, and the search result is easier to grasp. In addition, the combining means identifies data for the same store or the like between the generated new database and the previously generated old database, and the update information adding means derives the update information of the data by comparing their identification IDs (for example, names) and update dates and times. The finally generated database can therefore display data with its update information attached.

[0009] Next, the database generation method of the present invention comprises: a data collection step of collecting, from a plurality of locations, data containing values of attributes such as name and address, together with information such as the identification ID and update date and time of the data; an attribute information extraction step of extracting the attribute values from each collected data item and generating a database (new database) in which each data item consists of the extracted attribute values, the identification ID, the update date and time, and so on; a name identification step of classifying sets of data in the generated new database whose attribute values can be regarded as identical into the same group; a combining step of judging data whose attribute values can be regarded as identical between the new database and an old database generated and retained in the past to be the same, and associating the data of the two databases with each other; and an update information adding step of attaching update information to the relevant data in the new database by comparing information such as the identification ID and update date and time of the data in the new database with information such as the identification ID and update date and time of the associated data in the old database.

[0010] Next, the computer-executable program of the present invention comprises: a data collection process of collecting, from a plurality of locations, data containing values of attributes such as name and address, together with information such as the identification ID and update date and time of the data; an attribute information extraction process of extracting the attribute values from each collected data item and generating a database (new database) in which each data item consists of the extracted attribute values, the identification ID, the update date and time, and so on; a name identification process of classifying sets of data in the new database generated by the attribute information extraction process whose attribute values can be regarded as identical into the same group; a combining process of judging data whose attribute values can be regarded as identical between the new database and an old database generated in the past to be the same, and associating the data of the two databases with each other; and an update information adding process of attaching update information to the relevant data in the new database by comparing information such as the identification ID and update date and time of the data in the new database with information such as the identification ID and update date and time of the associated data in the old database.

[0011]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows a configuration example of a database generation device according to an embodiment of the present invention. The database generation device 10 is connected to a network 40 such as the Internet and, through the network 40, to a plurality of information servers 20 that manage and operate data sets such as store guide information and to user terminals 30 used by users. Each information server 20 connected to the network 40 is identified by its URL (uniform resource locator). Each information server 20 holds a data set internally and manages and operates it independently of the other information servers. Guide information for the same store therefore often exists in more than one information server 20. The database generation device 10 itself may also hold, manage, and operate a data set internally. The user terminal 30 is typically a personal computer or mobile terminal with WWW software (a WWW browser) installed. Each user uses the user terminal 30 to search for information and, where necessary, to send requests and the like to the database generation device 10.

[0012] The database generation device 10 comprises the following processing means: data collection means 11, attribute information extraction means 12, name identification means 13, combining means 14, and update information adding means 15. It also comprises a database storage unit 16. The database storage unit 16 stores the database generated in the past (here, in the previous run), called the old database 17, and the newly generated database, called the new database 18. The database generation device 10 is implemented on a computer: the processing means 11 to 15 are realized by the CPU and its internal memory (RAM, ROM, etc.), and the database storage unit 16 by a hard disk or other external storage device.

[0013] The database generation device 10 may itself provide an information search service in response to search requests from the user terminals 30. In that case, although omitted from FIG. 1, the database generation device 10 also comprises information search means. Alternatively, the information search device may be configured separately from the database generation device 10, so that the database generated by the database generation device 10 is used by a separate information search device.

[0014] FIG. 2 is a flowchart of a database generation method according to an embodiment of the present invention. The operation of the database generation device 10 is described in detail below, following the flowchart of FIG. 2.

[0015] In the database generation device 10, the data collection means 11 accesses each information server 20 at fixed intervals or at specific dates and times (for example, every day, every week, or every Monday) and collects the data (data set) in each information server 20 (step 111). Here, each data item is a single file, and all files are assumed to reside under a certain directory. The location of this directory is agreed in advance between the administrator of the database generation device 10 and the administrator of each information server 20. The data collection means 11 downloads the files under that directory on each information server 20 and stores them temporarily in, for example, RAM or a hard disk. Together with each file, it also acquires the file name, which serves as the name of the data (and as the link information of the data), and the update date and time of the file.
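The collection step can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: a local directory stands in for the agreed directory on an information server (the patent downloads the files over the network), and the names `CollectedFile` and `collect_from_directory` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import List


@dataclass
class CollectedFile:
    server: str        # name of the information server the file came from
    link: str          # file name, which also serves as the link information
    updated: datetime  # update date and time of the file
    body: str          # raw file contents, parsed later in step 112


def collect_from_directory(server: str, directory: Path) -> List[CollectedFile]:
    """Gather every file under the agreed directory with its name and update time.

    Assumption: `directory` is a local stand-in for the server-side directory.
    """
    collected = []
    for path in sorted(directory.iterdir()):
        if path.is_file():
            collected.append(
                CollectedFile(
                    server=server,
                    link=path.name,
                    updated=datetime.fromtimestamp(path.stat().st_mtime),
                    body=path.read_text(encoding="utf-8"),
                )
            )
    return collected
```

Keeping the file name and update time next to the raw contents matters because both are compared against the old database when update information is derived.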

[0016] FIG. 3 shows an example of data downloaded from information servers A and B. In this example, data for the same store, "Korantei" (紅蘭亭), is registered in both information server A and information server B, and the figure shows that data. As FIG. 3 shows, information servers A and B register the name, address, and so on of the same store "Korantei" in different formats and notations.

[0017] Next, the attribute information extraction means 12 extracts, from each data item collected by the data collection means 11, the values of attributes that characterize the data, such as name and address (step 112). Each data file is typically an HTML or XML document, whose contents a user can view by accessing the URL of the file from WWW software (a WWW browser) on the user terminal 30. Format information, that is, which attributes the contents of each data file consist of and in what format each attribute is written, is fixed for each information server 20. Here, the attribute information extraction means 12 is assumed to hold a data file format analysis routine for each information server. Using the routine corresponding to each information server, the attribute information extraction means 12 extracts attribute values such as name and address from the data files. It then creates data (records) consisting of the extracted attribute values, the name of the information server where the data resides, the link information of the data, the update date and time, and so on, generates a database in which these records are accumulated, and stores it in the database storage unit 16. This newly generated database is the new database 18. The database that was generated the previous time (one day ago, one week ago, etc.) by collecting data from each information server 20 in the same way, and to which the name identification, combining, and update information adding processes described later were applied, is the old database 17.
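One way to picture the per-server format analysis routines is a dispatch table keyed by server name, as in the Python sketch below. The two file formats, the field names, and the identifiers `Record`, `PARSERS`, and `extract_record` are all hypothetical; the patent only states that each information server has its own fixed format and its own analysis routine.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Optional


@dataclass
class Record:
    server: str                        # information server holding the data
    link: str                          # link information (file name)
    updated: datetime                  # update date and time
    category: str                      # business type as written by the server
    name: str
    address: str
    update_info: Optional[str] = None  # filled in later by means 15


def parse_server_a(text: str) -> Dict[str, str]:
    # Hypothetical format for server A: one "key: value" pair per line.
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return {"category": fields.get("genre", ""),
            "name": fields.get("name", ""),
            "address": fields.get("address", "")}


def parse_server_b(text: str) -> Dict[str, str]:
    # Hypothetical format for server B: simple XML-like tags.
    def tag(name: str) -> str:
        found = re.search(rf"<{name}>(.*?)</{name}>", text, re.S)
        return found.group(1).strip() if found else ""
    return {"category": tag("type"), "name": tag("shop"), "address": tag("addr")}


# One format analysis routine per information server.
PARSERS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "A": parse_server_a,
    "B": parse_server_b,
}


def extract_record(server: str, link: str, updated: datetime, text: str) -> Record:
    """Build one new-database record from one downloaded data file."""
    attributes = PARSERS[server](text)
    return Record(server=server, link=link, updated=updated, **attributes)
```

Adding another information server then amounts to registering one more routine in `PARSERS`; the record layout stays the same.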

[0018] FIG. 4 shows an example of the newly generated database (new database) 18; (a) is the data of information server A and (b) is the data of information server B. Here, business type, name, and address are the extracted attributes. The business type scheme generally differs from one information server 20 to another. Even for data on the same store, the notation of the name and address varies between information servers.

[0019] If an information server 20 has, in addition to the store data files, a file consisting of a list of data that gives only basic information for each store, such as name, address, telephone number, and link information, the data collection means 11 may download that list file instead of the data files. In that case, the attribute information extraction means 12 extracts the name, address, link information, and so on from each entry in the list file, accesses the information server 20 again using the extracted link information, and acquires the update date and time of the data file. It then generates a new database 18 like that of FIG. 4 in the same way.

[0020] Next, the name identification means 13 classifies data (records) in the new database 18 whose attribute values, such as name and address, can be regarded as identical into the same group (step 113). That is, it identifies them as the same store.

[0021] For example, the name and address attribute values of any two data items in the new database 18 shown in FIG. 4 are compared, and records that match are classified into the same group. Possible methods for comparing name strings or address strings include the following. One regards the strings as matching when they are exactly identical (exact match). Another regards them as matching when the proportion of characters contained in both is at or above a certain threshold (character-unit match). A third splits the strings into words and regards them as matching when the proportion of words contained in both is at or above a certain threshold (word-unit match). With any of these methods, comparison accuracy can be improved by first removing notational variation, for example by converting kanji numerals to Arabic numerals or unifying alphabetic characters to upper case. In the example of FIG. 4, the comparison finds that the first and fourth data items (records) match and that the third and sixth match. The matched records are classified into the same group. Each such group is called name identification data to distinguish it from ordinary data.
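The three comparison methods and the preliminary normalization can be sketched in Python as follows. Several details are assumptions, because the text does not fix them: the proportion is taken as shared items divided by the length of the longer string, the default threshold of 0.8 is arbitrary, words are split on whitespace (Japanese text would need a morphological analyzer), and the kanji numeral conversion only handles single digits. The grouping helper is a simple greedy pass, not necessarily the procedure used in the patent.

```python
import unicodedata
from collections import Counter
from typing import Callable, List, Sequence

# Digit-by-digit kanji numeral mapping. Assumption: positional digits only;
# it does not handle 十/百 and would also alter numerals inside store names.
_KANJI_DIGITS = str.maketrans("〇一二三四五六七八九", "0123456789")


def normalize(text: str) -> str:
    """Remove notational variation before comparison."""
    text = unicodedata.normalize("NFKC", text)  # full-width to half-width forms
    return text.translate(_KANJI_DIGITS).upper()


def _shared_ratio(xs: Sequence[str], ys: Sequence[str]) -> float:
    # Shared items (counted with multiplicity) over the longer sequence.
    if not xs or not ys:
        return 0.0
    shared = sum((Counter(xs) & Counter(ys)).values())
    return shared / max(len(xs), len(ys))


def exact_match(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)


def char_match(a: str, b: str, threshold: float = 0.8) -> bool:
    return _shared_ratio(list(normalize(a)), list(normalize(b))) >= threshold


def word_match(a: str, b: str, threshold: float = 0.8) -> bool:
    return _shared_ratio(normalize(a).split(), normalize(b).split()) >= threshold


def group_records(records: Sequence[str],
                  same: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedy grouping: a record joins the first group containing a match.

    Assumption: groups that a later record would bridge are not merged.
    """
    groups: List[List[str]] = []
    for record in records:
        for group in groups:
            if any(same(record, other) for other in group):
                group.append(record)
                break
        else:
            groups.append([record])
    return groups
```

For instance, `group_records(["Korantei", "Otaketei", "KORANTEI"], exact_match)` puts the two spellings of Korantei in one group and leaves Otaketei in its own.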

[0022] As the name and address attribute values of each name identification data, the name identification means 13 selects, for example, just one of the name and address attribute values of the data items contained in that name identification data, and either uses that value as is or converts it to a normalized value. The business type name of each data item is converted to the corresponding business type name in the business type scheme specific to the database generation device 10.

[0023] FIG. 5 shows an example of the new database 18 obtained by updating FIG. 4 in this way. For example, the business type scheme specific to the database generation device 10 has business types such as "Japanese" (和食) and "Chinese" (中華), and the business type names of the data in FIG. 4 are all converted to "Chinese". In FIG. 5, the business type names of the first and fourth data items, which were classified into the same group, are both converted to "Chinese", so the business type name of the name identification data is also "Chinese". The same applies to the third and sixth data items. As the name and address attribute values of the name identification data, for the first and fourth data items the name "Korantei" (紅蘭亭) and the address "1-2-3 Kagurazaka, Shinjuku-ku" (新宿区神楽坂１−２−３) are selected. Similarly, for the third and sixth data items the name "Otaketei" (大竹亭) and the address "3-8-6 Kagurazaka, Shinjuku-ku" (新宿区神楽坂３−８−６) are selected. The "update information" column of the new database 18 in FIG. 5 is rewritten by the update information adding means 15 described later; here it is left entirely empty (NULL).
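The construction of one name identification data from a group of matched records can be sketched in Python as below. The source-side business type labels in `CATEGORY_MAP`, the choice of the first member as the representative, and the names `Member`, `MergedEntry`, and `build_entry` are illustrative assumptions; the text only says that one value is selected and that business types are converted to the device's own scheme.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional

# Hypothetical mapping from each server's business type labels to the
# business type scheme of the database generation device.
CATEGORY_MAP: Dict[str, str] = {
    "Chinese restaurant": "Chinese",
    "Chinese cuisine": "Chinese",
    "Sushi": "Japanese",
}


@dataclass
class Member:
    server: str
    category: str
    name: str
    address: str


@dataclass
class MergedEntry:
    category: str
    name: str
    address: str
    members: List[Member] = field(default_factory=list)
    update_info: Optional[str] = None  # left empty (NULL) at this stage


def to_local_category(source_category: str) -> str:
    # Unknown labels pass through unchanged (an assumption).
    return CATEGORY_MAP.get(source_category, source_category)


def build_entry(members: List[Member]) -> MergedEntry:
    """Merge one group of matched records into a name identification data."""
    converted = [replace(m, category=to_local_category(m.category)) for m in members]
    representative = converted[0]  # assumption: take the first member's values
    return MergedEntry(
        category=representative.category,
        name=representative.name,
        address=representative.address,
        members=converted,
    )
```

The member records are kept inside the merged entry so that the link information of every source remains reachable from the consolidated result.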

[0024] The matching rules, that is, which attribute values are compared by which comparison method, may be written in the program that implements the name identification means 13. Alternatively, they may be written in an external table inside the database generation device 10 that the program references, so that the administrator of the database generation device 10 can change the table freely.

[0025] FIG. 6 shows an example of the contents of such an external table. FIG. 6(a) describes the criteria for judging that data match. In this example, name and address are specified as the comparison items. Two data items are judged to match when the evaluation value of the name comparison is 90 points or more and that of the address comparison is 80 points or more, or when the evaluation value of the name comparison is 80 points or more and that of the address comparison is 90 points or more. FIG. 6(b) describes the comparison methods for the name. Here exact match, character-unit match, and word-unit match are specified, and a comparison is performed with each. The exact match comparison gives an evaluation value of 100 if the strings match and 0 if they do not. The evaluation value of the character-unit match is the proportion of matching characters multiplied by 100. The evaluation value of the word-unit match is likewise the proportion of matching words multiplied by 100. The highest evaluation value returned by any of the comparison methods is taken as the evaluation value of the name. FIG. 6(c) describes the comparison methods for the address in the same way. Here exact match and word-unit match are specified. The highest evaluation value returned by any of the comparison methods is taken as the evaluation value of the address.
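The rule table of FIG. 6 can be sketched in Python as data plus a small evaluator. The thresholds (90/80 and 80/90) and the method lists per attribute come from the text; the dictionary layout of `RULES`, the denominator of the proportion (length of the longer string), and whitespace word splitting are assumptions made for the sketch.

```python
from collections import Counter
from typing import Dict, Sequence


def exact_score(a: str, b: str) -> float:
    return 100.0 if a == b else 0.0


def _overlap_score(xs: Sequence[str], ys: Sequence[str]) -> float:
    # Proportion of shared items times 100; the denominator is an assumption.
    if not xs or not ys:
        return 0.0
    shared = sum((Counter(xs) & Counter(ys)).values())
    return 100.0 * shared / max(len(xs), len(ys))


def char_score(a: str, b: str) -> float:
    return _overlap_score(list(a), list(b))


def word_score(a: str, b: str) -> float:
    return _overlap_score(a.split(), b.split())


# External-table equivalent: editable without touching the evaluator.
RULES = {
    "methods": {                      # FIG. 6(b) and 6(c)
        "name": [exact_score, char_score, word_score],
        "address": [exact_score, word_score],
    },
    "criteria": [                     # FIG. 6(a): any one alternative suffices
        {"name": 90, "address": 80},
        {"name": 80, "address": 90},
    ],
}


def attribute_score(attribute: str, a: str, b: str) -> float:
    """Highest evaluation value among the methods listed for the attribute."""
    return max(method(a, b) for method in RULES["methods"][attribute])


def is_same(rec_a: Dict[str, str], rec_b: Dict[str, str]) -> bool:
    scores = {attr: attribute_score(attr, rec_a[attr], rec_b[attr])
              for attr in RULES["methods"]}
    return any(all(scores[attr] >= minimum for attr, minimum in rule.items())
               for rule in RULES["criteria"])
```

With these rules, a name that scores 87.5 on the character-unit match still counts as the same store when the address matches exactly, because the second alternative (name 80 or more, address 90 or more) is satisfied.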

[0026] Next, the combining means 14 compares the new database 18 after name identification with the old database 17, both held in the database storage unit 16, the latter having been generated from the data collected from each information server 20 the previous time. Name identification data whose attribute values, such as name and address, can be regarded as identical are judged to be the same, linked, and associated with each other (step 114). For example, the two data items judged to be the same in the old and new databases 17 and 18 are linked and associated by attaching to both information indicating that they are the same data.

[0027] Here it is assumed that, in the information servers 20, the link information of the same data can change over time. To derive the update information of each data item, the update dates and times of the new and old data must be compared, which in turn requires determining which data items in the new and old databases are the same. If link information were invariant, this could be decided by checking whether the link information is identical. Under the assumption that link information can change, however, it must be decided by checking whether the name and address attribute values of the data are identical. Because even the same data may have its name or other attributes changed slightly over time, the collation takes notational variation into account. Specifically, in addition to exact matching, collation methods such as character-unit matching and word-unit matching are used. This is basically the same as in name identification. If the items to be collated are limited to, for example, the name only, new and old data match even when the address of the same store has changed. In this way, the conditions under which new and old data are regarded as identical can be adjusted by changing the matching rule. FIG. 7 shows an example of the data matching criterion in a matching rule described in an external table. In this example only the name is specified as the collation item. The name collation method may be described in the same way as in FIG. 6, for example.
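A matching rule of this kind can be pictured as a small table that names the fields to compare and the comparison method for each field. The following is a minimal sketch only: the rule format, the field names `name` and `address`, the method names `exact` and `word`, and the word-containment criterion are all assumptions made for illustration, not the format shown in FIG. 6 or FIG. 7.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial notation differences are ignored."""
    return re.sub(r"\s+", " ", text.strip().lower())


def words(text):
    """Split a normalized string into a set of words."""
    return set(normalize(text).split())


def field_matches(a, b, method):
    """Compare two attribute values under the given (hypothetical) method."""
    if method == "exact":
        return normalize(a) == normalize(b)
    if method == "word":
        # Assumed word-unit criterion: one word set contains the other.
        wa, wb = words(a), words(b)
        return bool(wa) and bool(wb) and (wa <= wb or wb <= wa)
    raise ValueError(f"unknown method: {method}")


def records_match(new, old, rule):
    """True if every field listed in the rule matches under its method."""
    return all(field_matches(new[f], old[f], m) for f, m in rule.items())


# Two example rules: collate on the name only, or on name and address.
NAME_ONLY = {"name": "word"}
NAME_AND_ADDRESS = {"name": "word", "address": "exact"}
```

With `NAME_ONLY`, a store whose address has changed still matches its old entry, whereas `NAME_AND_ADDRESS` treats it as a different store. Changing the rule table is the only adjustment needed to switch between the two behaviors.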

[0028] FIG. 8 shows an example of the old database 17. For convenience, FIG. 8 assumes that none of the data had been updated since the generation before the previous one. The combining means 14 identifies, for each name-identification data item of the new database 18 shown in FIG. 5, the identical name-identification data item of the old database 17 by collating the attribute values of the name only, or of the name and address. As a result, the first, second, and third name-identification data of the new database 18 in FIG. 5 are linked to the first, second, and third name-identification data of the old database 17 in FIG. 8, respectively. No name-identification data linked to the fourth name-identification data of the new database 18 in FIG. 5 exists in the old database 17 in FIG. 8. Within linked name-identification data, data items having the same corresponding information server are also linked to each other as the same data. Hereinafter, each data item in FIGS. 5 and 8 is referred to by its position counted from the top.
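The two-level linking described here, first between name-identification groups and then between the records inside linked groups, can be sketched as follows. The data layout (a database as a list of groups, each with a `records` list whose entries carry a `server` key) and the function names are assumptions chosen for illustration. The `same` argument stands in for whatever matching rule is in force.

```python
def link_groups(new_db, old_db, same):
    """Map each new group index to the index of the first matching old group, or None."""
    links = {}
    for i, new_group in enumerate(new_db):
        links[i] = next(
            (j for j, old_group in enumerate(old_db) if same(new_group, old_group)),
            None,
        )
    return links


def link_records(new_group, old_group):
    """Within two linked groups, pair records that come from the same server.

    Returns (new_record, old_record) pairs; old_record is None when the old
    group holds no record from that server.
    """
    old_by_server = {r["server"]: r for r in old_group["records"]}
    return [(r, old_by_server.get(r["server"])) for r in new_group["records"]]
```

A new record can therefore belong to a group that is linked to an old group and still have no old counterpart of its own, which is the situation of the fifth data item discussed in paragraph [0033].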

[0029] Next, the update information adding means 15 sets and adds update information to each data item in the new database 18 by comparing the link information and update date and time of that data item with the link information and update date and time of the data item in the old database 17 that the combining means 14 judged to be the same (step 115). That is, when a data item in the new database 18 has a linked data item in the old database 17 and either the link information or the update date and time has changed, the data item is judged to have been updated. When neither has changed, the data item is judged not to have been updated. The update information of the data item in the new database 18 is set to "updated" or "no update" accordingly. When a data item in the new database 18 has no linked data item in the old database 17, the data item is judged to be newly created, and its update information in the new database 18 is set to "new".
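The three-way decision of step 115 can be written as a short function. This is a sketch under assumptions: records are dictionaries with hypothetical keys `link` and `updated_at`, and `old_record` is `None` when the combining step found no counterpart.

```python
def update_status(new_record, old_record):
    """Return the update information for a new-database record.

    old_record is the linked old-database record, or None if there is none.
    """
    if old_record is None:
        return "new"
    changed = (
        new_record["link"] != old_record["link"]
        or new_record["updated_at"] != old_record["updated_at"]
    )
    return "updated" if changed else "no update"
```

A change in either the link information or the update date and time is enough to mark a record as updated, so a record whose timestamp is unchanged but whose link moved is still reported as "updated".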

[0030] For example, the first data item of the new database 18 in FIG. 5 has the same link information as the linked first data item of the old database 17 in FIG. 8, but a different update date and time, so the data item is judged to have been updated.

[0031] The second data item of the new database 18 in FIG. 5 has the same link information and the same update date and time as the linked second data item of the old database 17 in FIG. 8, so the data item is judged not to have been updated. The same applies to the fourth data item of the new database 18 in FIG. 5.

[0032] The third data item of the new database 18 in FIG. 5 has the same update date and time as the linked third data item of the old database 17 in FIG. 8, but different link information, so the data item is judged to have been updated.

[0033] The fifth data item of the new database 18 in FIG. 5 belongs to name-identification data that is linked to the third name-identification data of the old database 17 in FIG. 8, but no data item linked to it as an individual data item exists in the old database 17 in FIG. 8, so it is judged to be newly created.

[0034] No data item linked to the sixth data item of the new database 18 in FIG. 5 exists in the old database 17 in FIG. 8, so the data item is judged to be newly created.

[0035] In this way, for the new database 18 of FIG. 5 and the old database 17 of FIG. 8, the new database 18 with update information added, as shown in FIG. 9, is finally generated. The update information adding means 15 overwrites the old database 17 with this finally generated new database 18.

[0036] This completes the generation of the database. When a user terminal 30 accesses the finally generated database and data matching the user's request is searched for and displayed, the business type, name, and address of the name-identification data are displayed together with the link information to the file in the information server 20 where the data resides and the update information. FIG. 10 is a display example of the search results obtained when the generated database of FIG. 9 is searched for stores whose business type is "Chinese" and whose address is "Kagurazaka, Shinjuku-ku". By clicking the link information on the screen, the user can access the detailed store information that is the content of the linked file.
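A search of this kind can be illustrated with a simple filter over the generated database. The record layout and key names (`business_type`, `address`, `records`, `status`) are hypothetical, and matching the address by substring is an assumption. The actual search method is not specified in the text.

```python
def search(db, business_type, address_part):
    """Return the name-identification groups matching a business type and an address fragment."""
    return [
        group
        for group in db
        if group["business_type"] == business_type
        and address_part in group["address"]
    ]


def format_result(group):
    """Render one group as display lines: group attributes, then one line per source record."""
    header = f'{group["business_type"]} / {group["name"]} / {group["address"]}'
    lines = [f'  {r["link"]} [{r["status"]}]' for r in group["records"]]
    return "\n".join([header, *lines])
```

Because each group appears once, with its source records listed beneath it, the same store collected from several servers is shown as a single entry carrying several links.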

[0037] A typical embodiment of the present invention has been described above. Alternatively, the old database before name identification may be retained, and the linking by the combining means 14 may be performed between the new and old databases before name identification (FIG. 4 and an old database equivalent to FIG. 4) rather than between the new and old databases after name identification (FIGS. 5 and 8). In this case, for example, data items having the same corresponding information server are collated with each other.

[0038] Even if the link information of the same data can change over time in the information servers 20, the following processing is possible when each data item contains ID information that is permanently invariant for that data item. The attribute information extraction means 12 extracts this ID information, and the combining means 14 performs the linking by linking each data item in the new database to the data item in the old database that has the same ID information.
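When a permanent ID is available, the linking reduces to a lookup by key and no attribute collation is needed. A minimal sketch, assuming each record is a dictionary with a hypothetical `id` key holding the invariant ID information:

```python
def link_by_id(new_records, old_records):
    """Pair each new record with the old record that has the same invariant ID.

    Returns (new_record, old_record) pairs; old_record is None when the ID
    did not exist in the old database.
    """
    old_by_id = {r["id"]: r for r in old_records}
    return [(r, old_by_id.get(r["id"])) for r in new_records]
```

Building the dictionary once keeps the linking linear in the number of records, and a changed link or a slightly altered name no longer affects whether two records are recognized as the same data.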

[0039] If, in the information servers 20, the link information of the same data is always invariant, the linking by the combining means 14 is unnecessary. In the update information adding means 15, if the update date and time of a data item in the generated database (either before or after name identification) is at or after the time the data set was previously collected, the data item is known to be either updated data or new data. Furthermore, if a data item with the same corresponding information server and the same link information exists in the previously generated database, the data item is updated data. Otherwise it is new data.
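This variant can be sketched as a timestamp check followed by a set lookup. The key names (`server`, `link`, `updated_at`) and the function name are hypothetical. Treating a record whose timestamp precedes the previous collection as "no update" is an assumption implied by, but not stated in, the text.

```python
from datetime import datetime


def status_without_linking(record, old_keys, last_collected_at):
    """Classify a record when link information is invariant.

    old_keys is the set of (server, link) pairs present in the previously
    generated database; last_collected_at is the previous collection time.
    """
    if record["updated_at"] < last_collected_at:
        # Assumed: nothing changed since the previous collection.
        return "no update"
    key = (record["server"], record["link"])
    return "updated" if key in old_keys else "new"
```

Only records modified at or after the previous collection need the lookup, so no attribute collation between the two databases is required.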

[0040] In addition to the above, the present invention can be modified and extended in various ways within the scope of the claims. For example, a configuration is conceivable in which the name identification means or name identification process is omitted and only the update information of each data item is added.

[0041] It goes without saying that the processing functions of some or all of the units of the device shown in FIG. 1 can be implemented as a computer program and the present invention realized by executing that program on a computer, and that the processing procedure shown in FIG. 2 can be implemented as a computer program and executed by a computer. In addition, the program for realizing the processing functions on a computer, or the program for causing a computer to execute the processing procedure, can be recorded on a computer-readable recording medium such as an FD, MO, ROM, memory card, CD, DVD, or removable disk for storage or provision, and can also be distributed over a network such as the Internet.

[0042]

[Effects of the Invention] As described above, according to the present invention, when data matching a user's request is retrieved from the generated database, the search results can be displayed without duplicate data and with update information attached to the data.

[Brief description of drawings]

FIG. 1 is a diagram showing the configuration of a database generation device according to an embodiment of the present invention.

FIG. 2 is a flowchart of a database generation method according to an embodiment of the present invention.

FIG. 3 is a diagram showing an example of data downloaded from an information server.

FIG. 4 is a diagram showing an example of the new database generated by the attribute information extraction means.

FIG. 5 is a diagram showing an example of the new database updated by the name identification means.

FIG. 6 is a diagram showing an example of a matching rule applied by the name identification means.

FIG. 7 is a diagram showing an example of a matching rule applied by the combining means.

FIG. 8 is a diagram showing an example of the previously generated old database.

FIG. 9 is a diagram showing an example of the new database to which update information has been added by the update information adding means.

FIG. 10 is a diagram showing an example of a display screen of search results from a database generated according to the present invention.

FIG. 11 is a diagram showing an example of a display screen of search results from a database generated by a conventional database generation technique.

[Explanation of symbols]

10 Database generation device
11 Data collection means
12 Attribute information extraction means
13 Name identification means
14 Combining means
15 Update information adding means
16 Database storage unit
17 Old database
18 New database
20 Information server
30 User terminal
40 Network

Continued from front page

F-term (reference): 5B075 KK03 KK07 ND20 NK02 NK46 NR03 NR14 NR20 UU40; 5B082 EA08 EA10

Claims

[Claims]

1. A database generation device that collects data from a plurality of points to generate a database, comprising: a storage unit that stores a database generated in the past (hereinafter referred to as an old database); data collection means for collecting, from each of the points, data including attribute values such as a name and an address, together with an identification ID and update date and time information of the data; attribute information extraction means for extracting the attribute values from each of the collected data and generating a database (hereinafter referred to as a new database) in which each data item consists of at least the extracted attribute values, the identification ID, and the update date and time; name identification means for classifying data sets whose attribute values in the generated new database can be regarded as the same into the same group; combining means for determining that data whose attribute values can be regarded as the same between the new database and the old database are the same, and associating the data between the two databases; and update information adding means for adding update information to the corresponding data in the new database by comparing information such as the identification ID and update date and time of data in the new database with information such as the identification ID and update date and time of the data in the old database associated with that data.

2. The database generation device according to claim 1, wherein the update information adding means adds, to the corresponding data in the new database, update information indicating an update if the identification ID and update date and time of the data of the new database do not match those of the associated data of the old database, and update information indicating no update if they match, and adds update information indicating new to the corresponding data in the new database if no data associated with the data of the new database exists in the old database.

3. A database generation method for automatically generating a database by collecting data from a plurality of points, comprising: a data collection process of collecting, from each of the points, data including attribute values such as a name and an address, together with an identification ID and update date and time information of the data; an attribute information extraction process of extracting the attribute values from each of the collected data and generating a database (hereinafter referred to as a new database) in which each data item consists of at least the extracted attribute values, the identification ID, and the update date and time; a name identification process of classifying data sets whose attribute values in the generated new database can be regarded as the same into the same group; a combining process of determining that data whose attribute values can be regarded as the same between the new database and an old database generated and retained in the past are the same, and associating the data between the two databases; and an update information giving process of adding update information to the corresponding data in the new database by comparing information such as the identification ID and update date and time of data in the new database with information such as the identification ID and update date and time of the data in the old database associated with that data.

4. The database generation method according to claim 3, wherein at least one of the name identification process and the combining process is omitted, and, at least when the combining process is omitted, the update information giving process recognizes the correspondence between data of the new database and data of the old database based on invariant information.

5. A computer-executable database generation program for collecting data from a plurality of points to generate a database, the program causing a computer to execute: a data collection process of collecting, from each of the points, data including attribute values such as a name and an address, together with an identification ID and update date and time information of the data; an attribute information extraction process of extracting the attribute values from each of the collected data and generating a database (hereinafter referred to as a new database) in which each data item consists of at least the extracted attribute values, the identification ID, and the update date and time; a name identification process of classifying data sets whose attribute values in the generated new database can be regarded as the same into the same group; a combining process of determining that data whose attribute values can be regarded as the same between the new database and an old database generated in the past are the same, and associating the data between the two databases; and an update information giving process of adding update information to the corresponding data in the new database by comparing information such as the identification ID and update date and time of data in the new database with information such as the identification ID and update date and time of the data in the old database associated with that data.