[go: up one dir, main page]

CN104346444A - Optimum site selection method based on road network reverse spatial keyword query - Google Patents

Optimum site selection method based on road network reverse spatial keyword query Download PDF

Info

Publication number
CN104346444A
CN104346444A CN201410568900.6A CN201410568900A CN104346444A CN 104346444 A CN104346444 A CN 104346444A CN 201410568900 A CN201410568900 A CN 201410568900A CN 104346444 A CN104346444 A CN 104346444A
Authority
CN
China
Prior art keywords
query
merchant
road network
keyword
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410568900.6A
Other languages
Chinese (zh)
Other versions
CN104346444B (en
Inventor
高云君
秦旭
赵靖文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201410568900.6A priority Critical patent/CN104346444B/en
Publication of CN104346444A publication Critical patent/CN104346444A/en
Application granted granted Critical
Publication of CN104346444B publication Critical patent/CN104346444B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

一种基于路网反空间关键字查询的最佳选址方法,对于数据集采用基于连接聚簇的索引结构存储,并利用类迪杰斯特拉搜索方法来遍历索引;在遍历索引时本发明首先计算得到查询商家的潜在竞争商家的候选结果;接着利用提出的相关规则进行验证,判断其是否为真正的竞争商家。本发明结合了空间数据库的现有技术和反空间关键字算法,并且整个查询过程不需要遍历整个数据集,从而提供了最佳性能。

An optimal site selection method based on road network anti-space keyword query, adopts an index structure based on connection clustering for data sets, and uses a Dijkstra-like search method to traverse the index; when traversing the index, the present invention Firstly, the candidate results of the potential competitors of the inquired merchants are calculated; then, the proposed relevant rules are used to verify whether they are real competitive merchants. The invention combines the prior art of the spatial database and the anti-spatial keyword algorithm, and the entire query process does not need to traverse the entire data set, thereby providing the best performance.

Description

一种基于路网反空间关键字查询的最佳选址方法An Optimal Site Selection Method Based on Road Network Anti-Spatial Keyword Query

技术领域technical field

本发明涉及数据库的索引与查询技术,特别是一种基于路网反空间关键字查询的最佳选址方法。The invention relates to database indexing and query technology, in particular to an optimal site selection method based on road network anti-space keyword query.

背景技术Background technique

空间数据库是作为一种应用技术而诞生和发展起来的,其目的是为了存储、管理和检索各种地理空间数据(包括空间数据和非空间数据)。目前,空间数据库被广泛地应用于地理信息系统、计算机辅助设计、多媒体信息系统以及数据仓库,为以上系统提供数据存储和查询解决方案。路网空间数据库作为空间数据库的重要组成部分,由于其广泛的应用得到了越来越多的关注。Spatial database was born and developed as an application technology, its purpose is to store, manage and retrieve various geospatial data (including spatial data and non-spatial data). At present, spatial databases are widely used in geographic information systems, computer-aided design, multimedia information systems and data warehouses, providing data storage and query solutions for the above systems. Road network spatial database, as an important part of spatial database, has received more and more attention due to its wide application.

为了快速、有效的访问路网空间数据,专家学者们提出了大量的空间索引方式。迄今为止,影响最大、应用最广泛的基于路网空间的索引是基于连接聚簇的索引方法。它利用了现有的路网的邻接信息,其构建思想是对任意路网节点的邻接节点以及路网上得数据点分别聚类存储,再用B树进行索引,从而达到最小化存取代价的目的。In order to quickly and effectively access road network spatial data, experts and scholars have proposed a large number of spatial index methods. So far, the most influential and widely used road network space-based index is the index method based on connection clustering. It uses the adjacency information of the existing road network. Its construction idea is to cluster and store the adjacent nodes of any road network node and the data points on the road network, and then use the B-tree to index, so as to minimize the storage cost. Purpose.

在此基础上,专家学者们提出了各种各具特色的基于路网的查询及解决方法,如最近邻查询、连续最近邻查询、反向最近邻查询。其中基于路网的反最近邻查询是最近提出的一种新颖的查询。它主要从商家的角度进行查询,向商家返回基于路网距离影响用户最多的地址,从而向帮助商家进行选址推荐。On this basis, experts and scholars have proposed various road network-based queries and solutions, such as nearest neighbor query, continuous nearest neighbor query, and reverse nearest neighbor query. Among them, the anti-nearest neighbor query based on road network is a novel query proposed recently. It mainly conducts queries from the perspective of merchants, and returns to merchants the addresses that affect users the most based on road network distance, so as to help merchants make site selection recommendations.

目前,针对路网反最近邻查询已有成熟的解决方案。但是在某些情况下,路网反最近邻查询不仅仅需要考虑空间信息,文本描述信息作为数据点的重要组成部分,在已有的方法中却没有得到有效利用。为了更好的帮助商家做出最佳选址决策,这就要求系统能够处理路网环境下带有文本信息的反最近邻查询结果。但是现有的方法都不能有效的解决。At present, there are mature solutions for reverse nearest neighbor query on road networks. However, in some cases, the road network inverse nearest neighbor query not only needs to consider spatial information, but text description information, as an important part of data points, has not been effectively used in existing methods. In order to better help merchants make optimal location decisions, this requires the system to be able to process reverse nearest neighbor query results with text information in the road network environment. But none of the existing methods can effectively solve it.

发明内容Contents of the invention

本发明要克服现有技术不能有效利用文本描述信息的缺点,提供一种基于路网反空间关键字查询的最佳选址方法。The present invention overcomes the shortcoming that the prior art cannot effectively utilize the text description information, and provides an optimal site selection method based on road network anti-space keyword query.

本发明解决其技术问题所采用的技术方案步骤如下:The technical solution steps adopted by the present invention to solve its technical problems are as follows:

步骤(1):收集已有商家信息,对其建立索引,收集查询商家备选地址信息和文本信息;Step (1): Collect existing business information, build an index on it, collect and query business alternative address information and text information;

步骤(2):对于每一个查询商家备选地址,通过类迪杰斯特拉方法,并利用查询文本信息,找到与查询商家有潜在竞争关系的商家;Step (2): For each candidate address of the inquiring merchant, use the Dijkstra-like method and use the query text information to find a merchant that has a potential competitive relationship with the inquiring merchant;

步骤(3):利用判断规则对步骤(2)中得到的每一个查询商家备选地址的潜在竞争商家进行验证;Step (3): Use the judgment rule to verify each potential competitor of the candidate address of the query merchant obtained in step (2);

步骤(4):利用步骤(3)中的结果找到查询商家的最优备选地址。Step (4): Utilize the results in step (3) to find the optimal alternative address of the inquired merchant.

所述的步骤(1)中每一个商家的信息包括地址信息和文本信息,其中地址信息是通过一个地理坐标表示的,文本信息是由一组或多组关键字表示的;所有商家信息是通过基于连接聚簇的索引模型对其建立索引的,索引文件包含了所有的商家地理位置信息以及文本信息,其中地理信息存储于连接表文件中,文本信息存放于数据点文件中;查询商家的所有信息都存储于文本文件中。The information of each merchant in the described step (1) includes address information and text information, wherein the address information is represented by a geographic coordinate, and the text information is represented by one or more groups of keywords; all merchant information is represented by It is indexed based on the index model of connection clustering. The index file contains all the geographic location information and text information of the merchant. The geographic information is stored in the connection table file, and the text information is stored in the data point file; query all merchants. Information is stored in text files.

所述的步骤(2)中对于每一个查询商家备选地址,通过类迪杰斯特拉方法,并利用查询文本信息,找到与查询商家有潜在竞争关系的商家,是通过判断关键字包含性以及路网距离实现的,计算得到的数据点集合也就是对查询商家可能构成竞争关系的商家;在计算的过程中,需要维护一个存放路网节点以及该节点到查询商家地址距离的优先队列,在该优先队列的队首总是当前距离查询点最近的路网节点;通过迪杰斯特拉方法扩展路网的获取数据点的过程中,还需要维护对应于当前路网节点的计数树用于保存关键字计数信息;计数树由2n-1个节点构成,每个节点包含三个计数值分别为c1,c2以及c3,其中c1表示查询节点到当前节点中关键字等于当前关键字的数据点个数,c2表示查询节点到当前节点中关键字包含当前关键字的数据点个数,c3表示查询节点到当前节点中邻接边上关键字包含当前关键字的数据点个数;若所有边界节点计数树的c1或c2数值都达到了给定的k值,则路网扩展结束,因为扩展的路网区域内不可能存在潜在的竞争商家;其中,在判断数据点是否是潜在竞争商家的过程中,分三种情况考虑:In the step (2), for each alternative address of the inquiring merchant, through the Dijkstra-like method and using the query text information, find a merchant that has a potential competitive relationship with the inquiring merchant, by judging the inclusion of the keyword And the road network distance is realized, the calculated data point set is the merchant that may constitute a competitive relationship with the query merchant; in the calculation process, it is necessary to maintain a priority queue that stores the road network node and the distance from the node to the query merchant address The leader of the priority queue is the current road network node closest to the query point; in the process of expanding the road network to obtain data points through the Dijkstra method, it is also necessary to maintain the count tree corresponding to the current road network node. To save keyword counting information; the counting tree is composed of 2 n -1 nodes, each node contains three counting values c 1 , c 2 and c 3 , where c 1 means that the keyword in the query node to the current node is equal to The number of data points of the current keyword, c 2 indicates the number of data points from the query node to the current node whose keywords contain the current keyword, and c 3 indicates the data from the query node to the current node on the adjacent edge whose keywords contain the current keyword number of points; if the values of c 1 or c 2 of all boundary node counting trees reach the given value of k, the road network expansion ends, because there are no potential competitors in the expanded road network area; among them, in In the process of judging whether a data point is a potential competitor, three situations are considered:

2.1)该数据点的关键字集合被查询商家的关键字集合全部包含,这种状况下根据其当前路网节点的计数树中对应关键字的数值分为两种情况分别做出不同处理:2.1) The keyword set of the data point is all included by the keyword set of the query merchant. In this case, according to the value of the corresponding keyword in the counting tree of the current road network node, it is divided into two cases and processed differently:

a)该数据点关键字集合所对应的当前路网节点的计数树的计数值小于给定的k值,则将当前数据点添加至候选集合并更新当前节点的计数树对应数值;a) The count value of the count tree of the current road network node corresponding to the data point keyword set is less than the given k value, then the current data point is added to the candidate set and the corresponding value of the count tree of the current node is updated;

b)该数据点关键字集合所对应的当前路网节点的计数树的计数值大于或等于给定的k值,更新当前节点的计数树对应数值;b) The count value of the count tree of the current road network node corresponding to the data point keyword set is greater than or equal to the given k value, and the corresponding value of the count tree of the current node is updated;

2.2)该数据点的关键字集合被查询商家的关键字集合部分包含,这种状况下需要更新当前节点的计数树对应数值,不将数据点加入候选集合中;2.2) The keyword set of the data point is partly included in the keyword set of the query merchant. In this case, the corresponding value of the count tree of the current node needs to be updated, and the data point is not added to the candidate set;

2.3)该数据点的关键字集合完全不被查询商家的关键字集合包含,这种状况下不需要对数据点以及计数树做任何处理。2.3) The keyword set of the data point is not included in the keyword set of the queried merchant at all. In this case, there is no need to do any processing on the data point and the counting tree.

所述的步骤(3)中利用判断规则对步骤(2)中得到的每一个查询商家备选地址的潜在竞争商家进行验证,是通过验证潜在竞争商家的k个包含其关键字集合的最近商家是否包含查询商家来实现的;对于每一个待验证的潜在竞争商家,首先要在索引中定位该商家的地理位置以及关键字集合,再根据潜在竞争商家到查询商家的距离范围内,关键字集合包含潜在竞争商家的关键字集合的所有商家数量来判断;在判断时,需考虑两种情况:Utilize judgment rule in the described step (3) to verify the potential competitive merchants of each query merchant's alternative address obtained in step (2), by verifying the k nearest merchants of the potential competitive merchants containing its keyword set It is realized by including query merchants; for each potential competitor merchant to be verified, first locate the geographic location and keyword set of the merchant in the index, and then according to the distance range from the potential competitor merchant to the query merchant, the keyword set Judging by the number of all merchants in the keyword set containing potential competing merchants; when judging, two situations need to be considered:

3.1)在潜在竞争商家到查询商家的距离范围内,如果存在多于或等于k个商家,其关键字集合包含潜在竞争商家的关键字集合,那么该潜在竞争商家不是查询商家的真正竞争商家;3.1) Within the distance from a potential competitor to the query merchant, if there are more than or equal to k merchants whose keyword set contains the keyword set of the potential competitor, then the potential competitor is not the real competitor of the query merchant;

3.2)在潜在竞争商家到查询商家的距离范围内,存在小于k个商家,其关键字集合包含潜在竞争商家的关键字集合,那么该潜在竞争商家是查询商家的真正竞争商家。3.2) If there are less than k merchants within the distance from the potential competitor to the query merchant, whose keyword set contains the keyword set of the potential competitor, then the potential competitor is the real competitor of the query merchant.

所述的步骤(4)中利用步骤(3)中的结果找到查询商家的最优备选地址,是根据每一个备选地址所具有的真正竞争商家数量来实现的,所具有的真正竞争商家数量最少的备选地址即是查询商家的最优备选地址。Utilize the result in the step (3) in the described step (4) to find the optimal alternative address of the query merchant, which is realized according to the number of real competitive merchants that each alternative address has, and the real competitive merchants that have The candidate address with the least number is the optimal candidate address for querying the merchant.

本发明具有的有益效果是:The beneficial effects that the present invention has are:

本发明充分利用了空间数据库中现有索引技术,反最近邻查询以及带有关键字查询技术,通过只遍历部分路网范围的商家信息便能得到查询结果,大大降低了I/O时间和CPU时间,提供了最佳性能。The present invention makes full use of the existing indexing technology in the spatial database, the anti-nearest neighbor query and the query technology with keywords, and the query result can be obtained by only traversing the merchant information in a part of the road network range, which greatly reduces the I/O time and CPU time, provides the best performance.

附图说明Description of drawings

图1是本发明的实施步骤流程图。Fig. 1 is a flow chart of the implementation steps of the present invention.

图2为基于路网反空间关键字查询的最佳选址方法的工作原理示意图。Figure 2 is a schematic diagram of the working principle of the optimal location method based on road network anti-space keyword query.

图3为查询示意图。Figure 3 is a schematic diagram of the query.

具体实施方式Detailed ways

现结合附图和具体实施对本发明的技术方案作进一步说明:Now in conjunction with accompanying drawing and concrete implementation technical scheme of the present invention is described further:

如图1,图2所示,本发明具体实施过程和工作原理如下:As shown in Fig. 1 and Fig. 2, the concrete implementation process and working principle of the present invention are as follows:

步骤(1):收集已有商家信息,对其建立索引,收集查询商家备选地址信息和文本信息;Step (1): Collect existing business information, build an index on it, collect and query business alternative address information and text information;

步骤(2):对于每一个查询商家备选地址,通过类迪杰斯特拉方法,并利用查询文本信息,找到与查询商家有潜在竞争关系的商家;Step (2): For each candidate address of the inquiring merchant, use the Dijkstra-like method and use the query text information to find a merchant that has a potential competitive relationship with the inquiring merchant;

步骤(3):利用判断规则对步骤(2)中得到的每一个查询商家备选地址的潜在竞争商家进行验证;Step (3): Use the judgment rule to verify each potential competitor of the candidate address of the query merchant obtained in step (2);

步骤(4):利用步骤(3)中的结果找到查询商家的最优备选地址。Step (4): Utilize the results in step (3) to find the optimal alternative address of the inquired merchant.

步骤(1)中每一个商家的信息包括地址信息和文本信息,其中地址信息是通过一个地理坐标表示的,文本信息是由一组或多组关键字表示的;所有商家信息是通过基于连接聚簇的索引模型对其建立索引的,索引文件包含了所有的商家地理位置信息以及文本信息,其中地理信息存储于连接表文件中,文本信息存放于数据点文件中;查询商家的所有信息都存储于文本文件中。具体如图2中的索引模块所示。The information of each merchant in step (1) includes address information and text information, wherein the address information is represented by a geographic coordinate, and the text information is represented by one or more groups of keywords; all business information is aggregated based on connections The index model of the cluster builds an index for it. The index file contains all the business location information and text information, where the geographic information is stored in the connection table file, and the text information is stored in the data point file; all the information of the query business is stored in the text file. Specifically, the index module in FIG. 2 is shown.

步骤(2)中对于每一个查询商家备选地址,通过类迪杰斯特拉方法,并利用查询文本信息,找到与查询商家有潜在竞争关系的商家,是通过判断关键字包含性以及路网距离实现的,计算得到的数据点集合也就是对查询商家可能构成竞争关系的商家;在计算的过程中,需要维护一个存放路网节点以及该节点到查询商家地址距离的优先队列,在该优先队列的队首总是当前距离查询点最近的路网节点;通过迪杰斯特拉方法扩展路网的获取数据点的过程中,还需要维护对应于当前路网节点的计数树用于保存关键字计数信息;计数树由2n-1个节点构成,每个节点包含三个计数值分别为c1,c2以及c3,其中c1表示查询节点到当前节点中关键字等于当前关键字的数据点个数,c2表示查询节点到当前节点中关键字包含当前关键字的数据点个数,c3表示查询节点到当前节点中邻接边上关键字包含当前关键字的数据点个数;若所有边界节点计数树的c1或c2数值都达到了给定的k值,则路网扩展结束,因为扩展的路网区域内不可能存在潜在的竞争商家。该步骤具体是由图2中的潜在竞争商家计算引擎求得。其中,在判断数据点是否是潜在竞争商家的过程中,分三种情况考虑:In step (2), for each candidate address of the query merchant, through Dijkstra-like method and using the query text information, find the merchant that has potential competitive relationship with the query merchant, by judging the inclusion of the keyword and the road network The distance is realized, and the calculated data point set is the merchant that may constitute a competitive relationship with the query merchant; The head of the queue is always the road network node closest to the query point; in the process of obtaining data points by expanding the road network through the Dijkstra method, it is also necessary to maintain a count tree corresponding to the current road network node to save the key Word counting information; the counting tree is composed of 2 n -1 nodes, each node contains three counting values c 1 , c 2 and c 3 , where c 1 means that the key from the query node to the current node is equal to the current key c 2 indicates the number of data points whose keywords contain the current keyword from the query node to the current node, and c 3 indicates the number of data points whose keywords include the current keyword on the adjacent edge from the query node to the current node ; If the c 1 or c 2 values of all boundary node count trees reach the given k value, the road network expansion ends, because there is no potential competitive merchant in the expanded road network area. This step is specifically determined by the potential competitor calculation engine in FIG. 2 . Among them, in the process of judging whether a data point is a potential competitor, three situations are considered:

2.1)该数据点的关键字集合被查询商家的关键字集合全部包含,这种状况下根据其当前路网节点的计数树中对应关键字的数值分为两种情况分别做出不同处理:2.1) The keyword set of the data point is all included by the keyword set of the query merchant. In this case, according to the value of the corresponding keyword in the counting tree of the current road network node, it is divided into two cases and processed differently:

a)该数据点关键字集合所对应的当前路网节点的计数树的计数值小于给定的k值,则将当前数据点添加至候选集合并更新当前节点的计数树对应数值;a) The count value of the count tree of the current road network node corresponding to the data point keyword set is less than the given k value, then the current data point is added to the candidate set and the corresponding value of the count tree of the current node is updated;

b)该数据点关键字集合所对应的当前路网节点的计数树的计数值大于或等于给定的k值,更新当前节点的计数树对应数值;b) The count value of the count tree of the current road network node corresponding to the data point keyword set is greater than or equal to the given k value, and the corresponding value of the count tree of the current node is updated;

2.2)该数据点的关键字集合被查询商家的关键字集合部分包含,这种状况下需要更新当前节点的计数树对应数值,不将数据点加入候选集合中;2.2) The keyword set of the data point is partly included in the keyword set of the query merchant. In this case, the corresponding value of the count tree of the current node needs to be updated, and the data point is not added to the candidate set;

2.3)该数据点的关键字集合完全不被查询商家的关键字集合包含,这种状况下不需要对数据点以及计数树做任何处理。2.3) The keyword set of the data point is not included in the keyword set of the queried merchant at all. In this case, there is no need to do any processing on the data point and the counting tree.

以图3中查询点q为例,我们可以看到,p4,p6,p7以及p8为舍弃掉的点,故不需要对这些点进行进一步验证。Taking the query point q in Figure 3 as an example, we can see that p 4 , p 6 , p 7 and p 8 are discarded points, so further verification of these points is not required.

步骤(3)中利用判断规则对步骤(2)中得到的每一个查询商家备选地址的潜在竞争商家进行验证,是通过验证潜在竞争商家的k个包含其关键字集合的最近商家是否包含查询商家来实现的;对于每一个待验证的潜在竞争商家,首先要在索引中定位该商家的地理位置以及关键字集合,再根据潜在竞争商家到查询商家的距离范围内,关键字集合包含潜在竞争商家的关键字集合的所有商家数量来判断。具体由图2中潜在竞争商家过滤引擎计算求得,在判断时,需考虑两种情况:In step (3), use the judgment rule to verify each potential competitor of the query merchant’s alternative address obtained in step (2), by verifying whether the k nearest merchants of the potential competitor’s merchant containing its keyword set contain the query merchants; for each potential competitor to be verified, first locate the location and keyword set of the merchant in the index, and then according to the distance from the potential competitor to the query merchant, the keyword set contains potential competitors It is judged by the quantity of all merchants in the keyword set of the merchant. Specifically, it is calculated by the filter engine of potential competitors in Figure 2. When judging, two situations need to be considered:

3.1)在潜在竞争商家到查询商家的距离范围内,如果存在多于或等于k个商家,其关键字集合包含潜在竞争商家的关键字集合,那么该潜在竞争商家不是查询商家的真正竞争商家;3.1) Within the distance from a potential competitor to the query merchant, if there are more than or equal to k merchants whose keyword set contains the keyword set of the potential competitor, then the potential competitor is not the real competitor of the query merchant;

3.2)在潜在竞争商家到查询商家的距离范围内,存在小于k个商家,其关键字集合包含潜在竞争商家的关键字集合,那么该潜在竞争商家是查询商家的真正竞争商家。3.2) If there are less than k merchants within the distance from the potential competitor to the query merchant, whose keyword set contains the keyword set of the potential competitor, then the potential competitor is the real competitor of the query merchant.

步骤(4)中利用步骤(3)中的结果找到查询商家的最优备选地址,是根据每一个备选地址所具有的真正竞争商家数量来实现的,所具有的真正竞争商家数量最少的备选地址即是查询商家的最优备选地址,具体由图2中真正竞争商家计算引擎计算求得。In step (4), use the results in step (3) to find the optimal candidate address of the query merchant, which is realized according to the number of real competitive merchants that each candidate address has, and the one with the least number of real competitive merchants The alternative address is the optimal alternative address of the inquiring merchant, specifically calculated by the calculation engine of the real competing merchants in Figure 2.

本说明书实施例所述的内容仅仅是对发明构思的实现形式的列举,本发明的保护范围不应当被视为仅限于实施例所陈述的具体形式,本发明的保护范围也及于本领域技术人员根据本发明构思所能够想到的等同技术手段。The content described in the embodiments of this specification is only an enumeration of the implementation forms of the inventive concept. The protection scope of the present invention should not be regarded as limited to the specific forms stated in the embodiments. Equivalent technical means that a person can think of based on the concept of the present invention.

Claims (5)

1.一种基于路网反空间关键字查询的最佳选址方法:其特征在于该方法的步骤如下:1. a kind of optimal site selection method based on road network anti-space keyword query: it is characterized in that the step of this method is as follows: 步骤(1):收集已有商家信息,对其建立索引,收集查询商家备选地址信息和文本信息;Step (1): Collect existing business information, build an index on it, collect and query business alternative address information and text information; 步骤(2):对于每一个查询商家备选地址,通过类迪杰斯特拉方法,并利用查询文本信息,找到与查询商家有潜在竞争关系的商家;Step (2): For each candidate address of the inquiring merchant, use the Dijkstra-like method and use the query text information to find a merchant that has a potential competitive relationship with the inquiring merchant; 步骤(3):利用判断规则对步骤(2)中得到的每一个查询商家备选地址的潜在竞争商家进行验证;Step (3): Use the judgment rule to verify each potential competitor of the candidate address of the query merchant obtained in step (2); 步骤(4):利用步骤(3)中的结果找到查询商家的最优备选地址。Step (4): Utilize the results in step (3) to find the optimal alternative address of the inquired merchant. 2.根据权利要求1所述的一种基于路网反空间关键字查询的最佳选址方法,其特征在于:所述的步骤(1)中每一个商家的信息包括地址信息和文本信息,其中地址信息是通过一个地理坐标表示,文本信息是由一组或多组关键字构成;所有商家信息是通过基于连接聚簇的索引模型对其建立索引,索引文件包含了所有的商家地理位置信息以及文本信息,其中地理信息存储于连接表文件,文本信息存放于数据点文件;查询商家的所有信息都存储于文本文件。2. a kind of optimal site selection method based on road network anti-space keyword query according to claim 1, is characterized in that: the information of each merchant in the described step (1) comprises address information and text information, The address information is represented by a geographical coordinate, and the text information is composed of one or more groups of keywords; all business information is indexed through an index model based on connection clustering, and the index file contains all business location information And text information, wherein the geographical information is stored in the connection table file, the text information is stored in the data point file; all the information of the business query is stored in the text file. 3.根据权利要求1所述的一种基于路网反空间关键字查询的最佳选址方法,其特征在于:所述的步骤(2)中对于每一个查询商家备选地址,通过类迪杰斯特拉方法,并利用查询文本信息,找到与查询商家有潜在竞争关系的商家,是通过判断关键字包含性以及路网距离实现的,计算得到的数据点集合也就是对查询商家可能构成竞争关系的商家;在计算的过程中,需要维护一个存放路网节点以及该节点到查询商家地址距离的优先队列,在该优先队列的队首总是当前距离查询点最近的路网节点;通过迪杰斯特拉方法扩展路网来获取数据点的过程中,还需要维护对应于当前路网节点的计数树用于保存关键字计数信息;计数树由2n-1个节点构成,每个节点包含三个计数值分别为c1,c2以及c3,其中c1表示查询节点到当前节点中关键字等于当前关键字的数据点个数,c2表示查询节点到当前节点中关键字包含当前关键字的数据点个数,c3表示查询节点到当前节点中邻接边上关键字包含当前关键字的数据点个数;若所有边界节点计数树的c1或c2数值都达到了给定的k值,则路网扩展结束,因为扩展的路网区域内不可能存在潜在的竞争商家;其中,在判断数据点是否是潜在竞争商家的过程中,分三种情况考虑:3. a kind of best site selection method based on road network anti-space keyword query according to claim 1, is characterized in that: in described step (2), for each inquiry businessman alternative address, by class Di Jestella method, and using the query text information to find businesses that have a potential competitive relationship with the query business, it is realized by judging the inclusion of keywords and the distance of the road network. The calculated set of data points is the possible composition of the query business. Competitive merchants; in the calculation process, it is necessary to maintain a priority queue that stores road network nodes and the distance from the node to the address of the query merchant. The leader of the priority queue is the road network node that is currently closest to the query point; through In the process of Dijkstra method expanding the road network to obtain data points, it is also necessary to maintain a counting tree corresponding to the current road network node to store keyword counting information; the counting tree is composed of 2 n -1 nodes, each The node contains three count values respectively c 1 , c 2 and c 3 , where c 1 indicates the number of data points whose keywords are equal to the current keyword from the query node to the current node, and c 2 indicates the number of data points from the query node to the current node The number of data points containing the current keyword, c 3 indicates the number of data points whose keywords contain the current keyword on the adjacent edge from the query node to the current node; if the values of c 1 or c 2 of all boundary node count trees have reached Given a value of k, the road network expansion ends, because there are no potential competitors in the expanded road network area; among them, in the process of judging whether a data point is a potential competitor, three situations are considered: 2.1)该数据点的关键字集合被查询商家的关键字集合全部包含,这种状况下根据其当前路网节点的计数树中对应关键字的数值分为两种情况分别做出不同处理:2.1) The keyword set of the data point is all included by the keyword set of the query merchant. In this case, according to the value of the corresponding keyword in the counting tree of the current road network node, it is divided into two cases and processed differently: a)该数据点关键字集合所对应的当前路网节点的计数树的计数值小于给定的k值,则将当前数据点添加至候选集合并更新当前节点的计数树对应数值;a) The count value of the count tree of the current road network node corresponding to the data point keyword set is less than the given k value, then the current data point is added to the candidate set and the corresponding value of the count tree of the current node is updated; b)该数据点关键字集合所对应的当前路网节点的计数树的计数值大于或等于给定的k值,更新当前节点的计数树对应数值;b) The count value of the count tree of the current road network node corresponding to the data point keyword set is greater than or equal to the given k value, and the corresponding value of the count tree of the current node is updated; 2.2)该数据点的关键字集合被查询商家的关键字集合部分包含,这种状况下需要更新当前节点的计数树对应数值,不将数据点加入候选集合中;2.2) The keyword set of the data point is partly included in the keyword set of the query merchant. In this case, the corresponding value of the count tree of the current node needs to be updated, and the data point is not added to the candidate set; 2.3)该数据点的关键字集合完全不被查询商家的关键字集合包含,这种状况下不需要对数据点以及计数树做任何处理。2.3) The keyword set of the data point is not included in the keyword set of the queried merchant at all. In this case, there is no need to do any processing on the data point and the counting tree. 4.根据权利要求1所述的一种基于路网反空间关键字查询的最佳选址方法,其特征在于:所述的步骤(3)中利用判断规则对步骤(2)中得到的每一个查询商家备选地址的潜在竞争商家进行验证,是通过验证潜在竞争商家的k个包含其关键字集合的最近商家是否包含查询商家来实现的;对于每一个待验证的潜在竞争商家,首先要在索引中定位该商家的地理位置以及关键字集合,再根据潜在竞争商家到查询商家的距离范围内,关键字集合包含潜在竞争商家的关键字集合的所有商家数量来判断;在判断时,需考虑两种情况:4. a kind of optimal location method based on road network anti-space keyword query according to claim 1, is characterized in that: in described step (3), utilize judgment rule to obtain in step (2) each The verification of a potential competitor of an alternate address of a query merchant is realized by verifying whether the k nearest merchants of the potential competitor that contain its keyword set include the query merchant; for each potential competitor to be verified, first of all Locate the geographic location and keyword set of the merchant in the index, and then judge according to the number of all merchants whose keyword set includes the keyword set of the potential competitor within the distance from the potential competitor to the query merchant; when judging, it is necessary Consider two cases: 3.1)在潜在竞争商家到查询商家的距离范围内,如果存在多于或等于k个商家,其关键字集合包含潜在竞争商家的关键字集合,那么该潜在竞争商家不是查询商家的真正竞争商家;3.1) Within the distance from a potential competitor to the query merchant, if there are more than or equal to k merchants whose keyword set contains the keyword set of the potential competitor, then the potential competitor is not the real competitor of the query merchant; 3.2)在潜在竞争商家到查询商家的距离范围内,存在小于k个商家,其关键字集合包含潜在竞争商家的关键字集合,那么该潜在竞争商家是查询商家的真正竞争商家。3.2) If there are less than k merchants within the distance from the potential competitor to the query merchant, whose keyword set contains the keyword set of the potential competitor, then the potential competitor is the real competitor of the query merchant. 5.根据权利要求1所述的一种基于路网反空间关键字查询的最佳选址方法,其特征在于:所述的步骤(4)中利用步骤(3)中的结果找到查询商家的最优备选地址,是根据每一个备选地址所具有的真正竞争商家数量来实现的,所具有的真正竞争商家数量最少的备选地址即是查询商家的最优备选地址。5. a kind of optimal site selection method based on road network anti-space keyword query according to claim 1, is characterized in that: in the described step (4), utilize the result in the step (3) to find the query merchant's The optimal candidate address is realized according to the number of real competitive merchants that each candidate address has, and the candidate address with the least number of real competitive merchants is the optimal candidate address for the inquiring merchant.
CN201410568900.6A 2014-10-23 2014-10-23 A kind of the best site selection method based on the anti-spatial key inquiry of road network Active CN104346444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410568900.6A CN104346444B (en) 2014-10-23 2014-10-23 A kind of the best site selection method based on the anti-spatial key inquiry of road network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410568900.6A CN104346444B (en) 2014-10-23 2014-10-23 A kind of the best site selection method based on the anti-spatial key inquiry of road network

Publications (2)

Publication Number Publication Date
CN104346444A true CN104346444A (en) 2015-02-11
CN104346444B CN104346444B (en) 2017-07-07

Family

ID=52502035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410568900.6A Active CN104346444B (en) 2014-10-23 2014-10-23 A kind of the best site selection method based on the anti-spatial key inquiry of road network

Country Status (1)

Country Link
CN (1) CN104346444B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404675A (en) * 2015-11-20 2016-03-16 苏州大学 Ranked reverse nearest neighbor space keyword query method and apparatus
CN105868336A (en) * 2016-03-28 2016-08-17 哈尔滨工程大学 Set-oriented space keyword query method for road network
CN107391636A (en) * 2017-07-10 2017-11-24 江苏省现代企业信息化应用支撑软件工程技术研发中心 The anti-neighbour's spatial key querying methods of top m
CN108549690A (en) * 2018-04-12 2018-09-18 石家庄铁道大学 Spatial key querying method and system based on space length constraint
CN108733803A (en) * 2018-05-18 2018-11-02 电子科技大学 A kind of Multi-User Dimension keyword query method under road network
CN112308597A (en) * 2020-08-14 2021-02-02 西安工程大学 Method for selecting facility address according to influence of sport users in competitive environment
CN112883272A (en) * 2021-03-16 2021-06-01 山东大学 Method for determining recommended object

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020319A (en) * 2013-01-11 2013-04-03 江苏大学 Real-time mobile space keyword approximate Top-k query method
US20140181083A1 (en) * 2012-12-21 2014-06-26 Motorola Solutions, Inc. Method and apparatus for multi-dimensional graphical representation of search queries and results

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140181083A1 (en) * 2012-12-21 2014-06-26 Motorola Solutions, Inc. Method and apparatus for multi-dimensional graphical representation of search queries and results
CN103020319A (en) * 2013-01-11 2013-04-03 江苏大学 Real-time mobile space keyword approximate Top-k query method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李艳红等: "路网移动对象空间关键字连续Top-k查询", 《华中科技大学学报(自然科学版)》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404675A (en) * 2015-11-20 2016-03-16 苏州大学 Ranked reverse nearest neighbor space keyword query method and apparatus
CN105868336B (en) * 2016-03-28 2019-11-29 哈尔滨工程大学 Spatial key word querying method in road network towards set
CN105868336A (en) * 2016-03-28 2016-08-17 哈尔滨工程大学 Set-oriented space keyword query method for road network
CN107391636A (en) * 2017-07-10 2017-11-24 江苏省现代企业信息化应用支撑软件工程技术研发中心 The anti-neighbour's spatial key querying methods of top m
CN107391636B (en) * 2017-07-10 2020-06-09 江苏省现代企业信息化应用支撑软件工程技术研发中心 Top-m reverse nearest neighbor space keyword query method
CN108549690A (en) * 2018-04-12 2018-09-18 石家庄铁道大学 Spatial key querying method and system based on space length constraint
CN108549690B (en) * 2018-04-12 2021-07-13 石家庄铁道大学 Spatial Keyword Query Method and System Based on Spatial Distance Constraint
CN108733803A (en) * 2018-05-18 2018-11-02 电子科技大学 A kind of Multi-User Dimension keyword query method under road network
CN108733803B (en) * 2018-05-18 2022-04-29 电子科技大学 A multi-user space keyword query method under road network
CN112308597A (en) * 2020-08-14 2021-02-02 西安工程大学 Method for selecting facility address according to influence of sport users in competitive environment
CN112308597B (en) * 2020-08-14 2023-07-18 西安工程大学 A Method for Selecting Facility Addresses Based on Influence-Motion Users in a Competitive Environment
CN112883272A (en) * 2021-03-16 2021-06-01 山东大学 Method for determining recommended object
CN112883272B (en) * 2021-03-16 2022-04-29 山东大学 Method for determining recommended object

Also Published As

Publication number Publication date
CN104346444B (en) 2017-07-07

Similar Documents

Publication Publication Date Title
Rocha-Junior et al. Top-k spatial keyword queries on road networks
CN104346444B (en) A kind of the best site selection method based on the anti-spatial key inquiry of road network
CN104035949B (en) Similarity data retrieval method based on locality sensitive hashing (LASH) improved algorithm
Deng et al. Best keyword cover search
CN107423368A (en) A kind of space-time data indexing means in non-relational database
CN106951526B (en) Entity set extension method and device
CN107145526B (en) An anti-nearest neighbor query processing method for geo-social keywords under road network
Mahmood et al. FAST: frequency-aware indexing for spatio-textual data streams
CN108182242A (en) A kind of indexing means for the inquiry of magnanimity multi dimensional numerical data area
CN105868336B (en) Spatial key word querying method in road network towards set
CN104376112A (en) Road network space keyword search method
CN106156271A (en) Related information directory system based on distributed storage and foundation thereof and using method
CN104391908B (en) Multiple key indexing means based on local sensitivity Hash on a kind of figure
CN107506490A (en) Preferential search algorithm and system based on position top k keyword queries under sliding window
CN107451302A (en) Modeling method and system based on position top k keyword queries under sliding window
CN110059149A (en) Electronic map spatial key Querying Distributed directory system and method
CN111813778A (en) An approximate keyword storage and query method for large-scale road network data
CN105893486A (en) Large-scale graph shortest distance indexing method based on cluster
Sun et al. On efficient aggregate nearest neighbor query processing in road networks
CN106096065B (en) A kind of similar to search method and device of multimedia object
CN113407669B (en) A Semantic Trajectory Query Method Based on Activity Influence
CN114491056B (en) Method and system for improving POI search in digital policing scenarios
Mahmood et al. Fast: frequency-aware spatio-textual indexing for in-memory continuous filter query processing
CN107526788A (en) The at the uniform velocity searching algorithm of track inquiry based on interest region
Han et al. Efficiently retrieving top-k trajectories by locations via traveling time

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant