[go: up one dir, main page]

CN108304585A - A kind of result data choosing method and relevant apparatus based on spatial key search - Google Patents

A kind of result data choosing method and relevant apparatus based on spatial key search Download PDF

Info

Publication number
CN108304585A
CN108304585A CN201810184309.9A CN201810184309A CN108304585A CN 108304585 A CN108304585 A CN 108304585A CN 201810184309 A CN201810184309 A CN 201810184309A CN 108304585 A CN108304585 A CN 108304585A
Authority
CN
China
Prior art keywords
candidate
text object
spatial
topics
diversity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810184309.9A
Other languages
Chinese (zh)
Other versions
CN108304585B (en
Inventor
钱志虎
许佳捷
郑凯
柳诚飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201810184309.9A priority Critical patent/CN108304585B/en
Publication of CN108304585A publication Critical patent/CN108304585A/en
Application granted granted Critical
Publication of CN108304585B publication Critical patent/CN108304585B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本申请公开了一种基于空间关键字搜索的结果数据选取方法,先通过多样性主题数实现了度量空间文本对象的多样性,再通过距离系数和多样性主题数确定每个候选空间文本对象的边界成本,选取边界成本最小的候选空间文本对象至结果集中,使结果集中的对象与查询对象距离较短并且多样性主题数保持在较高状态,也就是在基于距离系数选择的同时考虑到每个搜索结果的多样性,提高结果集的多样性,满足用户多样化的搜索需求。本申请还公开了一种基于空间关键字搜索的结果数据选取装置、服务器以及计算机可读存储介质,具有上述有益效果。

This application discloses a method for selecting result data based on spatial keyword search. First, the diversity of spatial text objects is measured by the number of diverse topics, and then the number of each candidate spatial text object is determined by the distance coefficient and the number of diverse topics. Boundary cost, select the candidate space text object with the smallest boundary cost to the result set, so that the distance between the object in the result set and the query object is relatively short and the number of diverse topics is kept at a high state, that is, while selecting based on the distance coefficient, each The diversity of individual search results improves the diversity of result sets and meets the diverse search needs of users. The application also discloses a device for selecting result data based on spatial keyword search, a server, and a computer-readable storage medium, which have the above-mentioned advantageous effects.

Description

一种基于空间关键字搜索的结果数据选取方法及相关装置A method and related device for selecting result data based on spatial keyword search

技术领域technical field

本申请涉及计算机技术领域,特别涉及一种基于空间关键字搜索的结果数据选取方法、结果数据选取装置、服务器以及计算机可读存储介质。The present application relates to the field of computer technology, and in particular to a method for selecting result data based on spatial keyword search, a device for selecting result data, a server, and a computer-readable storage medium.

背景技术Background technique

随着定位服务技术的出现,越来越多的应用把现实现象与空间位置关联起来,衍生出应用广泛的空间关键字查询,即结合空间查询和文本查询以寻求最优结果的混合查询。With the emergence of location-based service technology, more and more applications associate real phenomena with spatial locations, resulting in widely used spatial keyword queries, that is, hybrid queries that combine spatial queries and text queries to seek optimal results.

通常空间关键字查询分为大体三个步骤,分别是将空间文本对象以及相关数据进行形式化度量、对所有的空间文本对象建立相应的索引结构以及通过接收的查询关键字进行查询。其中,对于空间文本对象的形式化度量具有统一的度量标准,通过该度量标准可以高效的进行相应的空间文本对象查询。基于上一方式,可以实现对空间关键字进行进一步的查询操作,得到与查询关键词相关的查询结果。Generally, spatial keyword query is divided into three steps, which are formally measuring spatial text objects and related data, establishing corresponding index structures for all spatial text objects, and querying through received query keywords. Among them, there is a unified measurement standard for the formal measurement of the spatial text object, and the corresponding spatial text object query can be performed efficiently through the measurement standard. Based on the above method, it is possible to perform further query operations on spatial keywords, and obtain query results related to the query keywords.

但是,目前针对空间关键字的查询方法一般返回的查询结果即返回的空间文本对象与查询关键词有较高的相似度,而对结果集中的兴趣点之间的关系没有要求,通常这些被返回的点之间都很相似,无法满足用户的多样化需求。例如,用户想要搜索附近的尽可能多的类别的餐馆,以便在不同类型的餐馆中进行选择,而搜索引擎可能只返回附近统一类型的餐馆,无法帮助用户进行选择。However, the current query methods for spatial keywords generally return query results, that is, the returned spatial text objects have a high degree of similarity with the query keywords, and there is no requirement for the relationship between the interest points in the result set, usually these are returned The points are very similar and cannot meet the diverse needs of users. For example, a user wants to search for nearby restaurants of as many categories as possible, so as to choose among different types of restaurants, but the search engine may only return nearby restaurants of the same type, which cannot help the user to make a choice.

因此,如何使空间关键字搜索提高搜索结果的多样性,满足用户的多样化需求,是本领域技术人员所关注的重点问题。Therefore, how to improve the diversity of search results through spatial keyword search and meet the diverse needs of users is a key issue that those skilled in the art are concerned about.

发明内容Contents of the invention

本申请的目的是提供一种基于空间关键字搜索的结果数据选取方法、结果数据选取装置、服务器以及计算机可读存储介质,通过距离系数和多样性主题数确定每个候选空间文本对象的边界成本,选取边界成本最小的候选空间文本对象至结果集中,使结果集中的对象与查询对象距离较短并且多样性主题数保持在较高状态,也就是在基于距离系数选择的同时考虑到每个搜索结果的多样性,提高结果集的多样性,满足用户多样化的搜索需求。The purpose of this application is to provide a method for selecting result data based on spatial keyword search, a device for selecting result data, a server, and a computer-readable storage medium, and determine the boundary cost of each candidate space text object through the distance coefficient and the number of diverse topics , select the candidate space text object with the smallest boundary cost to the result set, so that the distance between the object in the result set and the query object is relatively short and the number of diverse topics is kept high, that is, each search is considered while selecting based on the distance coefficient The diversity of results improves the diversity of result sets and meets the diverse search needs of users.

为解决上述技术问题,本申请提供一种基于空间关键字搜索的结果数据选取方法,包括:In order to solve the above technical problems, the application provides a method for selecting result data based on spatial keyword search, including:

对多个空间文本对象执行索引结构建立操作,得到索引结构;Execute the index structure building operation on multiple spatial text objects to obtain the index structure;

根据得到的查询对象使用所述索引结构选取多个候选空间文本对象,得到候选集;Selecting a plurality of candidate spatial text objects according to the obtained query object using the index structure to obtain a candidate set;

确定每个所述候选空间文本对象和所述查询对象的距离,得到每个所述候选空间文本对象和所述查询对象的距离系数;Determine the distance between each of the candidate spatial text objects and the query object, and obtain a distance coefficient between each of the candidate spatial text objects and the query object;

确定每个所述候选空间文本对象在初始化的所有主题之外包括的主题数量,得到每个所述候选空间文本对象的第一多样性主题数;Determine the number of topics that each of the candidate spatial text objects includes in addition to all the topics initialized, and obtain the first diversity topic number of each of the candidate spatial text objects;

根据所有所述距离系数和所有所述第一多样性主题数确定每个所述候选空间文本对象的第一边界成本;其中,所述距离系数与所述第一边界成本为正比关系,所述第一多样性主题数与所述第一边界成本为反比关系;Determine the first boundary cost of each of the candidate space text objects according to all the distance coefficients and all the first diversity topics; wherein, the distance coefficient is directly proportional to the first boundary cost, so The number of the first diversity topics is inversely proportional to the first boundary cost;

选取所述第一边界成本最小的候选空间文本对象加入到结果集。Select the candidate space text object with the minimum cost of the first boundary and add it to the result set.

可选的,还包括:Optionally, also include:

当选取所述第一边界成本最小的候选空间文本对象加入到结果集时,确定每个所述候选空间文本对象在所述第一边界成本最小的候选空间文本对象加入后的所有主题之外包括的主题数量,得到每个所述候选空间文本对象的第二多样性主题数;When selecting the candidate space text object with the smallest boundary cost to add to the result set, it is determined that each of the candidate space text objects includes The number of topics, obtain the second diversity topic number of each of the candidate spatial text objects;

根据所有所述距离系数和所有所述第二多样性主题数确定对应的所述候选空间文本对象的第二边界成本;其中,所述距离系数与所述第二边界成本为正比关系,所述第二多样性主题数与所述第二边界成本为反比关系;Determine the second boundary cost of the corresponding candidate space text object according to all the distance coefficients and all the second diversity topic numbers; wherein, the distance coefficient is directly proportional to the second boundary cost, so The number of the second diversity topics is inversely proportional to the second boundary cost;

选取所述第二边界成本最小的候选空间文本对象加入到所述结果集。Selecting the candidate space text object with the minimum second boundary cost and adding it to the result set.

可选的,对多个空间文本对象执行索引结构建立操作,得到索引结构,包括:Optionally, perform an index structure building operation on multiple spatial text objects to obtain an index structure, including:

确定每个所述空间文本对象的关键字出现次数;determining the number of keyword occurrences for each of said spatial text objects;

将所述关键字出现次数小于预设次数的空间文本对象设置为块结构,得到多个块结构;Setting the space text object whose occurrence frequency of the keyword is less than the preset number of times as a block structure to obtain multiple block structures;

将所述关键字出现次数大于等于所述预设次数的空间文本对象设置为树结构,得到多个树结构;Setting the space text objects whose keyword occurrence times are greater than or equal to the preset times as a tree structure to obtain multiple tree structures;

将所有所述块结构和所有所述树结构作为所述索引结构。All the block structures and all the tree structures are used as the index structures.

可选的,根据得到的查询对象使用所述索引结构选取多个候选空间文本对象,得到候选集,包括:Optionally, using the index structure to select a plurality of candidate spatial text objects according to the obtained query object to obtain a candidate set, including:

根据得到的查询对象使用所述索引结构按照贪心算法从所有所述空间文本对象中选取多个所述候选空间文本对象,得到所述候选集。According to the obtained query object, the index structure is used to select multiple candidate spatial text objects from all the spatial text objects according to a greedy algorithm to obtain the candidate set.

本申请还提供一种基于空间关键字搜索的结果数据选取装置,包括:The present application also provides a device for selecting result data based on spatial keyword search, including:

索引建立模块,用于对多个空间文本对象执行索引结构建立操作,得到索引结构;An index building module is used to perform an index structure building operation on multiple spatial text objects to obtain an index structure;

候选集获取模块,用于根据得到的查询对象使用所述索引结构选取多个候选空间文本对象,得到候选集;The candidate set acquisition module is used to select a plurality of candidate spatial text objects according to the obtained query object using the index structure to obtain the candidate set;

距离系数获取模块,用于确定每个所述候选空间文本对象和所述查询对象的距离,得到每个所述候选空间文本对象和所述查询对象的距离系数;A distance coefficient acquisition module, configured to determine the distance between each of the candidate spatial text objects and the query object, and obtain the distance coefficient between each of the candidate spatial text objects and the query object;

第一多样性主题数获取模块,用于确定每个所述候选空间文本对象在初始化的所有主题之外包括的主题数量,得到每个所述候选空间文本对象的第一多样性主题数;The first diversity topic number acquisition module is used to determine the number of topics included in each of the candidate space text objects in addition to all the topics initialized, and obtain the first diversity topic number of each of the candidate space text objects ;

第一边界成本获取模块,用于根据所有所述距离系数和所有所述第一多样性主题数确定每个所述候选空间文本对象的第一边界成本;其中,所述距离系数与所述第一边界成本为正比关系,所述第一多样性主题数与所述第一边界成本为反比关系;The first boundary cost acquisition module is used to determine the first boundary cost of each of the candidate space text objects according to all the distance coefficients and all the first diversity topic numbers; wherein, the distance coefficient and the The first boundary cost is in a proportional relationship, and the number of the first diversity topics is inversely proportional to the first boundary cost;

第一结果数据选取模块,用于选取所述第一边界成本最小的候选空间文本对象加入到结果集。The first result data selection module is used to select the candidate space text object with the smallest boundary cost and add it to the result set.

可选的,还包括:Optionally, also include:

第二多样性主题数获取模块,用于当选取所述第一边界成本最小的候选空间文本对象加入到结果集时,确定每个所述候选空间文本对象在所述第一边界成本最小的候选空间文本对象加入后的所有主题之外包括的主题数量,得到每个所述候选空间文本对象的第二多样性主题数;The second diversity topic number acquisition module is used to determine each of the candidate space text objects at the minimum cost of the first boundary when selecting the candidate space text object with the minimum cost of the first boundary and adding it to the result set. After the candidate spatial text object is added, the number of topics included outside all topics is obtained to obtain the second diversity topic number of each of the candidate spatial text objects;

第二边界成本获取模块,用于根据所有所述距离系数和所有所述第二多样性主题数确定对应的所述候选空间文本对象的第二边界成本;其中,所述距离系数与所述第二边界成本为正比关系,所述第二多样性主题数与所述第二边界成本为反比关系;The second boundary cost acquisition module is used to determine the second boundary cost of the corresponding candidate space text object according to all the distance coefficients and all the second diversity topic numbers; wherein, the distance coefficient is the same as the The second boundary cost is in a proportional relationship, and the number of the second diversity topics is inversely proportional to the second boundary cost;

第二结果数据选取模块,用于选取所述第二边界成本最小的候选空间文本对象加入到所述结果集。The second result data selection module is configured to select the candidate space text object with the minimum second boundary cost and add it to the result set.

可选的,所述索引建立模块,包括:Optionally, the index building module includes:

关键字出现次数获取单元,用于确定每个所述空间文本对象的关键字出现次数;A keyword occurrence count acquisition unit, configured to determine the keyword occurrence count of each of the spatial text objects;

块结构获取单元,用于将所述关键字出现次数小于预设次数的空间文本对象设置为块结构,得到多个块结构;A block structure acquisition unit, configured to set the space text object whose occurrence times of the keyword is less than a preset number of times as a block structure to obtain multiple block structures;

树结构获取单元,用于将所述关键字出现次数大于等于所述预设次数的空间文本对象设置为树结构,得到多个树结构;A tree structure acquiring unit, configured to set the space text objects whose keyword occurrence times are greater than or equal to the preset number of times as a tree structure to obtain multiple tree structures;

索引结构获取单元,用于将所有所述块结构和所有所述树结构作为所述索引结构。The index structure acquisition unit is configured to use all the block structures and all the tree structures as the index structures.

可选的,所述候选集获取模块,包括:Optionally, the candidate set acquisition module includes:

候选集获取单元,用于根据得到的查询对象使用所述索引结构按照贪心算法从所有所述空间文本对象中选取多个所述候选空间文本对象,得到所述候选集。The candidate set acquisition unit is configured to select a plurality of candidate spatial text objects from all the spatial text objects according to the obtained query object using the index structure according to a greedy algorithm to obtain the candidate set.

本申请还提供一种服务器,包括:The application also provides a server, including:

存储器,用于存储计算机程序;memory for storing computer programs;

处理器,用于执行所述计算机程序时实现如上所述的结果数据选取方法。A processor, configured to implement the above method for selecting result data when executing the computer program.

本申请还提供一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如上所述的结果数据选取方法。The present application also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the above method for selecting result data is realized.

本申请所提供的一种基于空间关键字搜索的结果数据选取方法,包括:对多个空间文本对象执行索引结构建立操作,得到索引结构;根据得到的查询对象使用所述索引结构选取多个候选空间文本对象,得到候选集;确定每个所述候选空间文本对象和所述查询对象的距离,得到每个所述候选空间文本对象和所述查询对象的距离系数;确定每个所述候选空间文本对象在初始化的所有主题之外包括的主题数量,得到每个所述候选空间文本对象的第一多样性主题数;根据所有所述距离系数和所有所述第一多样性主题数确定每个所述候选空间文本对象的第一边界成本;其中,所述距离系数与所述第一边界成本为正比关系,所述第一多样性主题数与所述第一边界成本为反比关系;选取所述第一边界成本最小的候选空间文本对象加入到结果集。A method for selecting result data based on spatial keyword search provided by the present application includes: performing an index structure establishment operation on multiple spatial text objects to obtain an index structure; using the index structure to select multiple candidates according to the obtained query object Spatial text object, obtain candidate set; Determine the distance of each described candidate spatial text object and described query object, obtain the distance coefficient of each described candidate spatial text object and described query object; Determine each described candidate space The subject quantity that text object comprises outside all subjects of initialization, obtains the first diversity subject number of each described candidate space text object; Determine according to all described distance coefficients and all described first diversity subject numbers The first boundary cost of each of the candidate spatial text objects; wherein, the distance coefficient is in direct proportion to the first boundary cost, and the first diversity topic number is inversely proportional to the first boundary cost ; Select the candidate space text object with the minimum cost of the first boundary and add it to the result set.

可见,先通过多样性主题数实现了度量空间文本对象的多样性,再通过距离系数和多样性主题数确定每个候选空间文本对象的边界成本,选取边界成本最小的候选空间文本对象至结果集中,使结果集中的对象与查询对象距离较短并且多样性主题数保持在较高状态,也就是在基于距离系数选择的同时考虑到每个搜索结果的多样性,提高结果集的多样性,满足用户多样化的搜索需求。It can be seen that the diversity of the metric space text object is realized by the number of diverse topics first, and then the boundary cost of each candidate space text object is determined by the distance coefficient and the number of diversity topics, and the candidate space text object with the smallest boundary cost is selected into the result set , so that the distance between the object in the result set and the query object is short and the number of diverse topics is kept high, that is, the diversity of each search result is considered while selecting based on the distance coefficient, and the diversity of the result set is improved to meet Diversified search needs of users.

本申请还提供一种基于空间关键字搜索的结果数据选取装置、服务器以及计算机可读存储介质,具有上述有益效果,在此不做赘述。The present application also provides a device for selecting result data based on spatial keyword search, a server, and a computer-readable storage medium, which have the above-mentioned beneficial effects, and will not be repeated here.

附图说明Description of drawings

为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only It is an embodiment of the present application, and those skilled in the art can also obtain other drawings according to the provided drawings without creative work.

图1为本申请实施例所提供的一种基于空间关键字搜索的结果数据选取方法的流程图;Fig. 1 is a flow chart of a method for selecting result data based on spatial keyword search provided by an embodiment of the present application;

图2为本申请实施例所提供的基于空间关键字搜索的结果数据选取方法的后续选取过程的流程图;Fig. 2 is a flow chart of the subsequent selection process of the result data selection method based on spatial keyword search provided by the embodiment of the present application;

图3为本申请实施例所提供的基于空间关键字搜索的结果数据选取方法的索引建立的流程图;Fig. 3 is the flow chart of index establishment of the result data selection method based on spatial keyword search provided by the embodiment of the present application;

图4为本申请实施例所提供的一种基于空间关键字搜索的结果数据选取装置的结构示意图。FIG. 4 is a schematic structural diagram of an apparatus for selecting result data based on spatial keyword search provided by an embodiment of the present application.

具体实施方式Detailed ways

本申请的核心是提供一种基于空间关键字搜索的结果数据选取方法、结果数据选取装置、服务器以及计算机可读存储介质,通过距离系数和多样性主题数确定每个候选空间文本对象的边界成本,选取边界成本最小的候选空间文本对象至结果集中,使结果集中的对象与查询对象距离较短并且多样性主题数保持在较高状态,也就是在基于距离系数选择的同时考虑到每个搜索结果的多样性,提高结果集的多样性,满足用户多样化的搜索需求。The core of this application is to provide a method for selecting result data based on spatial keyword search, a device for selecting result data, a server, and a computer-readable storage medium, and determine the boundary cost of each candidate spatial text object through the distance coefficient and the number of diverse topics , select the candidate space text object with the smallest boundary cost to the result set, so that the distance between the object in the result set and the query object is relatively short and the number of diverse topics is kept high, that is, each search is considered while selecting based on the distance coefficient The diversity of results improves the diversity of result sets and meets the diverse search needs of users.

为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments It is a part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the scope of protection of this application.

请参考图1,图1为本申请实施例所提供的一种基于空间关键字搜索的结果数据选取方法的流程图。Please refer to FIG. 1 , which is a flow chart of a method for selecting result data based on spatial keyword search provided by an embodiment of the present application.

本实施例提供一种基于空间关键字搜索的结果数据选取方法,可以提高搜索的多样性,可以包括:This embodiment provides a method for selecting result data based on spatial keyword search, which can improve the diversity of searches, and may include:

S101,对多个空间文本对象执行索引结构建立操作,得到索引结构;S101. Execute an index structure building operation on multiple spatial text objects to obtain an index structure;

本步骤旨在对多个空间文本对象执行索引结构建立操作,得到相应的索引结构,也就是对所有的空间文本对象建立索引结构。在根据查询条件进行相应的查询时,首先要建立索引结构,才可以实现对空间文本对象的搜索。具体的,索引结构的建立方法在此不做限定,只要可以实现对空间文本对象的搜索的建立方法都可以为本实施例中使用的建立方法。This step aims to perform an index structure building operation on multiple spatial text objects to obtain corresponding index structures, that is, to build index structures for all spatial text objects. When performing corresponding queries according to the query conditions, an index structure must be established first to realize the search for spatial text objects. Specifically, the establishment method of the index structure is not limited here, as long as the establishment method that can realize the search for the spatial text object can be the establishment method used in this embodiment.

并且可以针对不同的搜索环境以及查询的特点,建立合适的索引结构,有利于加快搜索的速度,并且简化索引结构,降低维护和更新索引结构的成本。Furthermore, an appropriate index structure can be established according to different search environments and query characteristics, which is conducive to speeding up the search, simplifying the index structure, and reducing the cost of maintaining and updating the index structure.

其中,空间文本对象是将被搜索空间文本数据进行形式化表达后得到的。具体的形式化过程在后续段落中进行说明。Wherein, the spatial text object is obtained by formally expressing the searched spatial text data. The specific formalization process is explained in the following paragraphs.

S102,根据得到的查询对象使用索引结构选取多个候选空间文本对象,得到候选集;S102, using an index structure to select a plurality of candidate spatial text objects according to the obtained query object to obtain a candidate set;

在步骤S101的基础上,本步骤旨在根据得到的索引结构和查询对象选取得到多个候选空间文本对象,得到候选集。这一步骤的目的主要是通过查询对象在所有空间文本对象选取出,也就是通过该索引结果选取出候选空间文本对象,目的是使该候选空间文本对象符合查询对象的限制,因此本步骤中通过查询对象选取出多个候选空间文本对象,并且得到候选集。On the basis of step S101, this step aims to select and obtain multiple candidate spatial text objects according to the obtained index structure and query object, and obtain a candidate set. The purpose of this step is to select all spatial text objects through the query object, that is, to select the candidate spatial text object through the index result, the purpose is to make the candidate spatial text object conform to the restriction of the query object, so in this step through The query object selects multiple candidate spatial text objects and obtains a candidate set.

其中,查询对象与上一步骤中的空间文本对象类似,都是将数据进行形式化表达得到的,具体的形式化过程在后续段落中进行说明。Among them, the query object is similar to the spatial text object in the previous step, and is obtained by formalizing the data, and the specific formalization process is explained in the following paragraphs.

S103,确定每个候选空间文本对象和查询对象的距离,得到每个候选空间文本对象和查询对象的距离系数;S103, determine the distance between each candidate spatial text object and the query object, and obtain the distance coefficient between each candidate spatial text object and the query object;

在步骤S102的基础上,本步骤旨在确定每个候选空间文本对象和查询对象之间的距离,得到对应的距离系数。本步骤中确定距离的计算方法可以与一般的空间距离计算方法相同,其主要目的也是在空间搜索中得到对象与查询对象之间的距离,以使后续的查询过程可以通过距离筛选得到最优的查询结果。On the basis of step S102, this step aims to determine the distance between each candidate spatial text object and the query object, and obtain the corresponding distance coefficient. The calculation method for determining the distance in this step can be the same as the general spatial distance calculation method, and its main purpose is to obtain the distance between the object and the query object in the space search, so that the subsequent query process can obtain the optimal distance through distance screening. search result.

当然,本步骤中确定距离的方法也可以根据候选空间文本对象与查询对象的形式化表达方式的不同进行变化,具体的变化形式和方式应视实际应用环境做选择,在此不做限定。Of course, the method for determining the distance in this step can also be changed according to the difference between the formalized expressions of the candidate spatial text object and the query object. The specific change form and method should be selected according to the actual application environment, and are not limited here.

S104,确定每个候选空间文本对象在初始化的所有主题之外包括的主题数量,得到每个候选空间文本对象的第一多样性主题数;S104, determine the number of topics included in each candidate space text object besides all the topics initialized, and obtain the first diversity topic number of each candidate space text object;

在步骤S102的基础上,本步骤旨在确定每个候选空间文本对象在初始化的所有主题之外包括的主题数量,得到该候选空间文本对象对应的第一多样性主题数。On the basis of step S102, this step aims to determine the number of topics included in each candidate spatial text object besides all the initialized topics, and obtain the first diversity topic number corresponding to the candidate spatial text object.

由于在一般的空间关键字搜索中,选取过程只考虑到候选空间文本对象和查询对象之间的距离,即最终的结果数据为与查询对象最短的候选空间文本对象组成,这样会使结果数据与查询对象非常相似。但是这样所得到的结果无法满足用户对于查询过程的多样化需求,即在该查询对象一定的相似程度下,要求查询结果尽可能的多样化,也就是说结果数据包括跟某个关键词主题相关的多种类型的兴趣点。Because in the general spatial keyword search, the selection process only considers the distance between the candidate spatial text object and the query object, that is, the final result data is composed of the shortest candidate spatial text object with the query object, which will make the result data and the query object shortest. Query objects are very similar. However, the results obtained in this way cannot meet the diverse needs of users for the query process, that is, under a certain degree of similarity of the query objects, the query results are required to be as diverse as possible, that is to say, the result data includes keywords related to a certain keyword topic. Various types of points of interest.

进一步的,本步骤中通过某一个候选空间文本对象在初始化的所有主题之外所包括的主题数量确定该候选空间文本对象的多样性主题数。其中,主题是候选空间文本对象与查询对象所共有的一种属性,也是在空间文本搜索中使用的一般属性。Further, in this step, the number of diverse topics of a candidate spatial text object is determined by the number of topics included in the candidate spatial text object other than all the topics initialized. Among them, topic is an attribute shared by the candidate spatial text object and the query object, and it is also a general attribute used in spatial text search.

其中,初始化的所有主题即为目前初始化得到的结果集中所有空间文本对象所包括的主题,进一步的,就可以通过该所有主题的范围之外的主题数量度量出某个候选空间文本对象的主题多样性。具体的当实际使用时,可以在空间文本对象的搜索过程中通过该多样性主题数选取搜索的结果数据,提高搜索结果的多样性,以使结果数据符合用户的多样化需求。Among them, all the topics initialized are the topics included in all spatial text objects in the result set currently initialized. Further, the topic diversity of a candidate spatial text object can be measured by the number of topics outside the scope of all the topics. sex. Specifically, in actual use, the search result data can be selected according to the diversity topic number during the search process of the spatial text object, so as to increase the diversity of the search results so that the result data meet the diverse needs of users.

所以,上述的结果集为在本实施例中进行初始化处理得到的,并且结果集中所添加的数据是不断循环选取候选集中的数据得到的。如果此时是第一轮循环,那么初始化处理得到的结果集可以不存在空间文本对象,也就没有相应的主题范围。初始化处理也可以是通过其他的搜索方法对结果集中添加一定数量的结果数据,之后再通过本实施例的方法添加更多的空间文本对象,此时初始化处理得到的结果集也就存在一定的主题范围,根据该主题范围就可以确定出候选空间文本对象的第一多样性主题数。Therefore, the above result set is obtained by performing initialization processing in this embodiment, and the data added in the result set is obtained by continuously cyclically selecting data in the candidate set. If it is the first cycle at this time, then the result set obtained by the initialization process may not have a space text object, and there will be no corresponding subject range. The initialization process can also be to add a certain amount of result data to the result set through other search methods, and then add more spatial text objects through the method of this embodiment. At this time, the result set obtained by the initialization process also has a certain theme range, according to the topic range, the first diversity topic number of the candidate spatial text object can be determined.

可以想到的是,本实施例中的初始化处理过程还可以是另一个空间关键字搜索的过程,也就是本实施可以应用在其他的空间关键字搜索过程之后,以提高该空间关键字搜索的多样性。进一步的,可以在本实施例的空间关键字搜索过程之后使用其他的搜索方法,选择更合适的搜索结果。It is conceivable that the initialization process in this embodiment can also be another spatial keyword search process, that is, this implementation can be applied after other spatial keyword search processes to improve the diversity of the spatial keyword search. sex. Further, other search methods may be used after the spatial keyword search process in this embodiment to select more suitable search results.

需要说明的是,本步骤与步骤S103在执行顺序上并无先后关系。It should be noted that there is no sequence relationship between this step and step S103 in execution sequence.

S105,根据所有距离系数和所有第一多样性主题数确定每个候选空间文本对象的第一边界成本;其中,距离系数与第一边界成本为正比关系,第一多样性主题数与第一边界成本为反比关系;S105, determine the first boundary cost of each candidate space text object according to all distance coefficients and all first diversity topics; wherein, the distance coefficient is directly proportional to the first boundary cost, and the first diversity topic number is related to the first - Boundary cost is an inverse relationship;

在步骤S103和S104的基础上,本步骤旨在根据得到的距离系数和第一多样性主题数确定该候选空间文本对象的第一边界成本。也就是在步骤S103和步骤S104所得到的距离系数和第一多样性主题数的基础上,引出边界成本作为统一的度量对象相似性和多样性的度量方式。并且使距离系数与第一边界成本为正比关系,第一多样性主题数与第一边界成本为反比关系,也就是边界成本越小代表距离系数越小同时第一多样性主题数越大。On the basis of steps S103 and S104, this step aims to determine the first boundary cost of the candidate spatial text object according to the obtained distance coefficient and the first number of diverse topics. That is, on the basis of the distance coefficient and the number of first diversity topics obtained in step S103 and step S104, the boundary cost is derived as a unified measurement method for measuring object similarity and diversity. And the distance coefficient is directly proportional to the first boundary cost, and the first diversity topic number is inversely proportional to the first boundary cost, that is, the smaller the boundary cost, the smaller the distance coefficient and the larger the first diversity topic number .

S106,选取第一边界成本最小的候选空间文本对象加入到结果集。S106. Select the candidate space text object with the smallest first boundary cost and add it to the result set.

在步骤S105的基础上,本步骤旨在选取第一边界成本最小的候选空间文本对象加入到结果集中。也就是将候选集中距离系数与第一多样性主题数最好的候选空间文本对象选取作为结果数据的之一,提高了结果数据的多样性,满足了用户的多样性需求。On the basis of step S105, this step aims to select the candidate space text object with the smallest first boundary cost and add it to the result set. That is, the candidate spatial text object with the best distance coefficient and first diversity topic number in the candidate set is selected as one of the result data, which improves the diversity of the result data and meets the diversity needs of users.

综上,本实施例通过距离系数和多样性主题数确定每个候选空间文本对象的边界成本,选取边界成本最小的候选空间文本对象至结果集中,使结果集中的对象与查询对象距离较短并且多样性主题数保持在较高状态,也就是在基于距离系数选择的同时考虑到每个搜索结果的多样性,提高结果集的多样性,满足用户多样化的搜索需求。To sum up, in this embodiment, the boundary cost of each candidate spatial text object is determined by the distance coefficient and the number of diverse topics, and the candidate spatial text object with the smallest boundary cost is selected into the result set, so that the distance between the object in the result set and the query object is relatively short and The number of diverse topics is kept at a high level, that is, the diversity of each search result is considered while selecting based on the distance coefficient, so as to increase the diversity of the result set and meet the diverse search needs of users.

可选的,本实施例中还可以根据得到的查询对象使用索引结构按照贪心算法从所有空间文本对象中选取多个候选空间文本对象,得到候选集。Optionally, in this embodiment, according to the obtained query object, multiple candidate spatial text objects may be selected from all spatial text objects using an index structure according to a greedy algorithm to obtain a candidate set.

本可选方案的主要目的是从所有空间文本对象中选取出候选空间文本对象,得到候选集,特别之处是通过贪心算法选取出合适的候选空间文本对象。主要因为上述实施例中基于边界成本的结果数据选取方法不能保证结果数据的多样化性和距离同时满足一定要求,所以为了提高查询结果的质量,本可选方案中使用贪心算法的选取得到多个候选空间文本对象,得到候选集。The main purpose of this option is to select candidate spatial text objects from all spatial text objects to obtain a candidate set, in particular to select suitable candidate spatial text objects through a greedy algorithm. Mainly because the result data selection method based on the boundary cost in the above embodiment cannot guarantee that the diversity and distance of the result data meet certain requirements at the same time, so in order to improve the quality of the query results, this option uses a greedy algorithm to select multiple Candidate space text object to get the candidate set.

本可选方案的核心思想是按照空间文本对象与查询对象之间的空间距离将他们分层不同的层,然后在每一层选取特定数量的空间文本对象,使这些空间文本对象拥有的未被覆盖主题数比同一层的其他空间文本对象多。The core idea of this optional scheme is to stratify them into different layers according to the spatial distance between the spatial text objects and the query object, and then select a specific number of spatial text objects in each layer, so that the spatial text objects owned by these spatial text objects Covers more topics than other spatial text objects on the same layer.

具体的,给定一个空间对象集合D,对于一个空间关键字查询q,假设在D上能找到同时满足多样化需求即覆盖足够多的主题和最小化距离函数的k个空间对象,它们离查询q的空间距离之和为M。我们假设M的值可能很小,所以可以从距离以查询q的位置为圆心、半径较小的圆形范围内开始找起,如果这个范围内找到的结果不能覆盖足够的主题,则扩大M在更大的范围内搜索。对于每一个圆形搜索范围,将范围内的对象按它们与查询点的距离划分成不同的层,然后在每一层选取合适的对象。Specifically, given a set of spatial objects D, for a spatial keyword query q, it is assumed that k spatial objects can be found on D that meet the diverse requirements at the same time, that is, cover enough topics and minimize the distance function, and they are far from the query The sum of the spatial distances of q is M. We assume that the value of M may be very small, so we can start searching from within a circular range with a small radius centered on the position of the query q. If the results found within this range cannot cover enough topics, then expand M in Search on a larger scale. For each circular search range, objects within the range are divided into different layers according to their distance from the query point, and then appropriate objects are selected in each layer.

进一步的,还可以将该可选方案中使用的贪心算法在获取结果数据之后进行相应的搜索,也就是将本实施例中得到结果集通过贪心算法重新筛选,可以提高原有的结果集的准确率,即使结果集与查询对象的距离更近。Further, the greedy algorithm used in the alternative scheme can also be used for corresponding searches after obtaining the result data, that is, the result set obtained in this embodiment is re-screened by the greedy algorithm, which can improve the accuracy of the original result set. rate, even if the result set is closer to the query object.

基于上述实施例,可以对其中所描述的空间文本对象、查询对象等形式化表达为如下形式。Based on the above embodiments, the spatial text objects, query objects, etc. described therein can be formally expressed in the following forms.

1)空间文本对象形式化表达1) Formal expression of spatial text objects

使用2维空间中的一个带有位置坐标和文本描述的点o={loc,term,topic}来表示一个空间文本对象。其中,loc由经纬度构成,表示对象o所在的位置;term是用来描述对象o的一组关键字;topic表示对象o所覆盖的主题集合。Use a point o={loc,term,topic} with location coordinates and text description in 2D space to represent a spatial text object. Among them, loc is composed of latitude and longitude, indicating the location of the object o; term is a set of keywords used to describe the object o; topic indicates the collection of topics covered by the object o.

例如,在地图应用环境中,一个空间关键字对应了一个兴趣点,即商家或机构,系统记录了它的位置和文本描述,这个兴趣点所覆盖的主题可以由人工标记或者通过自然语言处理技术分析该它的评论信息来获取。为了方便,也可以将空间文本对象称为空间对象。For example, in a map application environment, a spatial keyword corresponds to a point of interest, that is, a business or institution, and the system records its location and text description. The topics covered by this point of interest can be marked manually or through natural language processing technology. Get it by analyzing its comment information. For convenience, a spatial text object may also be referred to as a spatial object.

基于上述定义,我们用D来表示数据库中的所有空间文本对象的集合,即:Based on the above definition, we use D to represent the collection of all spatial text objects in the database, namely:

2)空间关键字查询的形式化表达2) Formal expression of spatial keyword query

空间关键字查询形式化为q={loc,term},q即为查询对象。其中,loc是查询点即用户所在的位置,在二维空间用经纬度坐标表示;term是用户所输入的一组关键字,例如“中餐馆”,用于描述用户的查询意图。The spatial keyword query is formalized as q={loc, term}, and q is the query object. Among them, loc is the query point, that is, the location of the user, which is represented by latitude and longitude coordinates in two-dimensional space; term is a set of keywords entered by the user, such as "Chinese restaurant", which is used to describe the user's query intention.

对给定查询对象q,搜索引擎从数据集D中挑选与q最为匹配的k个最相似的空间文本对象作为返回的结果数据。其中,结果数据为距离很近,文本相关程度大且结果之间多样化程度高的一组空间文本对象的集合。For a given query object q, the search engine selects the k most similar spatial text objects that best match q from the data set D as the returned result data. Among them, the result data is a collection of a group of spatial text objects with close distance, high text correlation and high diversification among results.

3)候选集的形式化表达3) Formal expression of the candidate set

给定一个空间文本对象数据库D,一个空间关键字查询对象q={loc,term}和一个阈值Thre,D的一个子集S(即)被称为候选集。当且仅当满足两个条件:Given a spatial text object database D, a spatial keyword query object q={loc, term} and a threshold Thre, a subset S of D (ie ) is called the candidate set. If and only if two conditions are met:

关键词限制,S中的每一个空间文本对象o包含所有的查询关键字,即 keyword restriction, each spatial text object o in S contains all query keywords, namely

多样化要求,S中所有空间文本对象覆盖不同的主题数之和不小于Thre,即 Diversification requires that the sum of the number of different topics covered by all spatial text objects in S is not less than Thre, that is

4)空间文本对象的距离函数的形式化表达4) Formal expression of the distance function of the spatial text object

给定一个空间文本对象数据库D,一个空间关键字查询对象q={loc,term},对于D的元素个数为k的一个子集我们定义集合R与q的距离函数为:Given a spatial text object database D, a spatial keyword query object q={loc,term}, for a subset of D whose number of elements is k We define the distance function between sets R and q as:

其中Dist(q,o)表示空间文本对象o与查询对象q之间的距离,Distmax表示数据集D中的空间文本对象与查询对象的最远距离。如上所示,查询对象与空间文本对象集合的距离经过归一化处理,即取值是在[0,1]区间。Where Dist(q,o) represents the distance between the spatial text object o and the query object q, and Dist max represents the farthest distance between the spatial text object in the dataset D and the query object. As shown above, the distance between the query object and the spatial text object set is normalized, that is, the value is in the interval [0,1].

5)搜索问题的形式化定义5) Formal definition of the search problem

给定一个空间文本对象数据集D,一个空间关键字查询q={loc,term},一个距离函数f和阈值Thre,考虑空间文本对象与查询查询之间的空间距离、文本阈值以及主题覆盖度,拟返回满足以下两个相似性度量条件的k个空间文本对象:Given a dataset D of spatial text objects, a spatial keyword query q = {loc, term}, a distance function f and a threshold Thre, consider the spatial distance between the spatial text object and the query query, the text threshold, and the topic coverage , intends to return k spatial text objects satisfying the following two similarity measurement conditions:

1、这k个空间文本对象组成候选集R,即 1. These k spatial text objects form a candidate set R, namely and

2、f(q,R)取得最小值。2. f(q, R) obtains the minimum value.

请参考图2,图2为本申请实施例所提供的基于空间关键字搜索的结果数据选取方法的后续选取过程的流程图。Please refer to FIG. 2 , which is a flow chart of the subsequent selection process of the method for selecting result data based on spatial keyword search provided by the embodiment of the present application.

基于上一实施例,本实施例主要针对上一实施例中当第一边界成本最小的候选空间文本对象加入到结果集后做的一个扩充说明,前述部分与上一实施例大体相同,相同部分可以参考上一实施例,在此不做赘述。Based on the previous embodiment, this embodiment is mainly aimed at an extended description of the candidate space text object with the smallest boundary cost in the previous embodiment after it is added to the result set. The foregoing part is substantially the same as the previous embodiment. Reference may be made to the previous embodiment, and details are not repeated here.

本实施例可以包括:This embodiment can include:

S201,当选取第一边界成本最小的候选空间文本对象加入到结果集时,确定每个候选空间文本对象在第一边界成本最小的候选空间文本对象加入后的所有主题之外包括的主题数量,得到每个候选空间文本对象的第二多样性主题数;S201, when selecting the candidate space text object with the minimum first boundary cost to add to the result set, determine the number of topics included in each candidate space text object after the addition of the candidate space text object with the minimum first boundary cost, Obtain the second diversity theme number of each candidate space text object;

本步骤旨在当上一实施例中将第一边界成本最小的候选空间文本对象加入到结果集时,此时相当于对结果集中的空间文本对象进行变动,也就是结果集所包含的主题范围发生了相应的变化,为了继续提高从候选集中选取候选空间文本对象的多样性,就需要确定每个候选空间文本对象在第一边界成本最小的候选空间文本对象加入后的所有主题之外包括的主题数量,得到第二多样性主题数。This step aims to add the candidate spatial text object with the smallest boundary cost to the result set in the previous embodiment. At this time, it is equivalent to changing the spatial text object in the result set, that is, the subject range contained in the result set Corresponding changes have taken place. In order to continue to improve the diversity of candidate spatial text objects selected from the candidate set, it is necessary to determine the number of topics included in each candidate spatial text object after the candidate spatial text object with the smallest boundary cost is added. The number of topics, to get the number of topics in the second diversity.

由于结果集加入了第一实施例的中所选取的候选空间文本数据,相应的结果集中的所包括的主题范围也发了变化,也就是度量的主题范围也发生了变化,因此本步骤的目的主要是在第二结果集的基础上重新计算所有候选空间文本对象的多样性主题数,即第二多样性主题数。Since the candidate space text data selected in the first embodiment is added to the result set, the subject range included in the corresponding result set has also changed, that is, the subject range of measurement has also changed, so the purpose of this step Mainly, the number of diversity topics of all candidate spatial text objects is recalculated on the basis of the second result set, that is, the second number of diversity topics.

S202,根据所有距离系数和所有第二多样性主题数确定对应的候选空间文本对象的第二边界成本;其中,距离系数与第二边界成本为正比关系,第二多样性主题数与第二边界成本为反比关系;S202. Determine the second boundary cost of the corresponding candidate space text object according to all distance coefficients and all second diversity topic numbers; wherein, the distance coefficient is directly proportional to the second boundary cost, and the second diversity topic number is related to the first The two boundary costs are inversely proportional;

在步骤S202的基础上,本步骤旨在根据上一实施例的距离系数和第二多样性主题数确定第二边界成本。具体内容与上一实施例大体相同,可以参考上衣实施例,在此不做赘述。On the basis of step S202, this step aims to determine the second boundary cost according to the distance coefficient and the second number of diverse topics in the previous embodiment. The specific content is substantially the same as that of the previous embodiment, and reference may be made to the embodiment of the upper garment, so details are not repeated here.

S203,选取第二边界成本最小的候选空间文本对象加入到结果集。S203. Select the candidate space text object with the smallest second boundary cost and add it to the result set.

在步骤S202的基础上,本步骤旨在将第二边界成本最小的候选空间文本对象加入到结果集。On the basis of step S202, this step aims to add the candidate space text object with the smallest second boundary cost to the result set.

由于在空间关键字搜索中每次添加结果数据的所包括的主题范围都发生了相应的变化,因此本实施例旨在说明后续如何添加结果数据,从而保持结果数据的多样性。因此,本实施所说明的步骤可以拓展至多次,只需要在本实施例的基础上做适应性的修改,具体的不再赘述。Since the range of subjects included in each addition of result data in the spatial keyword search has a corresponding change, this embodiment aims to explain how to add result data subsequently, so as to maintain the diversity of result data. Therefore, the steps described in this implementation can be extended to multiple times, and it is only necessary to make adaptive modifications on the basis of this embodiment, and details will not be repeated here.

请参考图3,图3为本申请实施例所提供的基于空间关键字搜索的结果数据选取方法的索引建立的流程图。Please refer to FIG. 3 . FIG. 3 is a flow chart of index building in the method for selecting result data based on spatial keyword search provided by the embodiment of the present application.

基于上一实施例,本实施例主要是针对上一实施例中如何建立索引结果做的一个具体说明,其他部分可以参考上一实施例,在此不做赘述。Based on the previous embodiment, this embodiment is mainly a specific description of how to create an index result in the previous embodiment, other parts can refer to the previous embodiment, and will not be repeated here.

本实施例可以包括:This embodiment can include:

S301,确定每个空间文本对象的关键字出现次数;S301. Determine the keyword occurrence times of each spatial text object;

S302,将关键字出现次数小于预设次数的空间文本对象设置为块结构,得到多个块结构;S302, setting the space text object whose occurrence frequency of the keyword is less than the preset number of times as a block structure to obtain multiple block structures;

S303,将关键字出现次数大于等于预设次数的空间文本对象设置为树结构,得到多个树结构;S303, setting the spatial text objects whose keyword occurrence times are greater than or equal to the preset number of times as a tree structure to obtain multiple tree structures;

S304,将所有块结构和所有树结构作为索引结构。S304. Use all block structures and all tree structures as index structures.

在现有的空间关键字查询中,索引结果可划分为三个类别:即以空间优先的索引结构、以文本优先的索引结构及二者紧密结合的索引结构。空间优先索引结构又可以分为基于R树、网格以及空间填充曲线的索引结构;文本优先索引结构主要基于倒排文件和位图;空间文本结合的索引结构同时紧密结合了这些结构来更加有效过滤一些不符合查询对象要求的空间文本对象。但是,随着数据量的增加,这些索引结构都变得异常庞大,这使得索引的空间占用量直线上升,而且更新速度变慢,影响实际应用中的体验。In the existing spatial keyword query, the index results can be divided into three categories: the index structure with space priority, the index structure with text priority and the index structure with the two closely combined. The space-first index structure can be further divided into index structures based on R-trees, grids, and space-filling curves; the text-first index structure is mainly based on inverted files and bitmaps; the index structure combined with space and text is closely combined with these structures to be more effective Filter some spatial text objects that do not meet the requirements of the query object. However, as the amount of data increases, these index structures become extremely large, which makes the space occupied by the index skyrocket, and the update speed slows down, which affects the experience in practical applications.

因此,本实施例通过每个空间文本对象的关键字出现次数对该对象设置不同的索引结构,也就是对空间文本对象的索引结构进行分级处理,将关键字出现频率较低的对象设置为块结构,已于存放数量较多的对象数据。将关键字出现频率较高的对象设置为树结构,方便进行搜索时进行查找到相关的对象。Therefore, in this embodiment, different index structures are set for each spatial text object through the keyword occurrence times of the object, that is, the index structure of the spatial text object is hierarchically processed, and objects with lower frequency of keyword occurrence are set as blocks Structure, used to store a large amount of object data. Objects with high frequency of keywords are set in a tree structure, which is convenient for finding related objects when searching.

并且在搜索对象的过程中,你可以搜索不同的树结构和块结构就可以完成相应的搜索操作。对于符合查询对象条件的树结构,可以以最小边界成本的递增顺序访问其中的对象节点,其中最小边界成本可以定义为:And in the process of searching objects, you can search different tree structures and block structures to complete the corresponding search operations. For a tree structure that meets the query object condition, the object nodes in it can be accessed in the increasing order of the minimum boundary cost, where the minimum boundary cost can be defined as:

其中N表示树结构的节点,Dist(q,N.mbr)是N的最小边界矩形离查询的空间距离,|Occuri=1|是N的索引结构中出现的次数(即N所覆盖的主题数)。Among them, N represents the node of the tree structure, Dist(q,N.mbr) is the spatial distance between the minimum bounding rectangle of N and the query, |Occur i = 1| is the number of occurrences in the index structure of N (that is, the topics covered by N number).

本申请实施例提供了一种基于空间关键字搜索的结果数据选取方法,可以通过距离系数和多样性主题数确定每个候选空间文本对象的边界成本,选取边界成本最小的候选空间文本对象至结果集中,使结果集中的对象与查询对象距离较短并且多样性主题数保持在较高状态,也就是在基于距离系数选择的同时考虑到每个搜索结果的多样性,提高结果集的多样性,满足用户多样化的搜索需求。The embodiment of the present application provides a method for selecting result data based on spatial keyword search, which can determine the boundary cost of each candidate spatial text object through the distance coefficient and the number of diverse topics, and select the candidate spatial text object with the smallest boundary cost to the result Concentration, so that the distance between the object in the result set and the query object is short and the number of diverse topics is kept high, that is, the diversity of each search result is considered while selecting based on the distance coefficient, and the diversity of the result set is improved. Meet the diverse search needs of users.

下面对本申请实施例提供的一种基于空间关键字搜索的结果数据选取装置进行介绍,下文描述的一种基于空间关键字搜索的结果数据选取装置与上文描述的一种基于空间关键字搜索的结果数据选取方法可相互对应参照。The following is an introduction to a result data selection device based on spatial keyword search provided by the embodiment of the present application. The result data selection device based on spatial keyword search described below is the same as the one described above based on spatial keyword search. The result data selection methods can be referred to each other.

请参考图4,图4为本申请实施例所提供的一种基于空间关键字搜索的结果数据选取装置的结构示意图。Please refer to FIG. 4 , which is a schematic structural diagram of an apparatus for selecting result data based on spatial keyword search provided by an embodiment of the present application.

本实施例提供一种基于空间关键字搜索的结果数据选取装置,可以包括:This embodiment provides a device for selecting result data based on spatial keyword search, which may include:

索引建立模块100,用于对多个空间文本对象执行索引结构建立操作,得到索引结构;An index building module 100, configured to perform an index structure building operation on multiple spatial text objects to obtain an index structure;

候选集获取模块200,用于根据得到的查询对象使用索引结构选取多个候选空间文本对象,得到候选集;The candidate set acquisition module 200 is used to select a plurality of candidate space text objects using an index structure according to the obtained query object to obtain a candidate set;

距离系数获取模块300,用于确定每个候选空间文本对象和查询对象的距离,得到每个候选空间文本对象和查询对象的距离系数;The distance coefficient acquisition module 300 is used to determine the distance between each candidate space text object and the query object, and obtain the distance coefficient between each candidate space text object and the query object;

第一多样性主题数获取模块400,用于确定每个候选空间文本对象在初始化的所有主题之外包括的主题数量,得到每个候选空间文本对象的第一多样性主题数;The first diversity topic number acquisition module 400 is used to determine the number of topics that each candidate space text object includes outside all the topics initialized, and obtain the first diversity topic number of each candidate space text object;

第一边界成本获取模块500,用于根据所有距离系数和所有第一多样性主题数确定每个候选空间文本对象的第一边界成本;其中,距离系数与第一边界成本为正比关系,第一多样性主题数与第一边界成本为反比关系;The first boundary cost acquisition module 500 is used to determine the first boundary cost of each candidate space text object according to all distance coefficients and all first diversity topic numbers; wherein, the distance coefficient is proportional to the first boundary cost, and the first boundary cost The number of diverse topics is inversely proportional to the first boundary cost;

第一结果数据选取模块600,用于选取第一边界成本最小的候选空间文本对象加入到结果集。The first result data selection module 600 is configured to select the candidate space text object with the smallest first boundary cost and add it to the result set.

基于上述实施例,还可以包括:Based on the foregoing embodiments, it may also include:

第二多样性主题数获取模块,用于当选取第一边界成本最小的候选空间文本对象加入到结果集时,确定每个候选空间文本对象在第一边界成本最小的候选空间文本对象加入后的所有主题之外包括的主题数量,得到每个候选空间文本对象的第二多样性主题数;The second diversity topic number acquisition module is used to determine that each candidate space text object is added after the candidate space text object with the minimum first boundary cost is added when selecting the candidate space text object with the minimum first boundary cost to add to the result set The number of topics included in all the topics of , get the second diversity topic number of each candidate space text object;

第二边界成本获取模块,用于根据所有距离系数和所有第二多样性主题数确定对应的候选空间文本对象的第二边界成本;其中,距离系数与第二边界成本为正比关系,第二多样性主题数与第二边界成本为反比关系;The second boundary cost acquisition module is used to determine the second boundary cost of the corresponding candidate space text object according to all distance coefficients and all second diversity topics; wherein, the distance coefficient is proportional to the second boundary cost, and the second The number of diverse topics is inversely proportional to the second boundary cost;

第二结果数据选取模块,用于选取第二边界成本最小的候选空间文本对象加入到结果集。The second result data selection module is used to select the candidate space text object with the smallest second boundary cost and add it to the result set.

其中,该索引建立模块100,可以包括:Wherein, the index building module 100 may include:

关键字出现次数获取单元,用于确定每个空间文本对象的关键字出现次数;A keyword occurrence number acquisition unit is used to determine the keyword occurrence number of each space text object;

块结构获取单元,用于将关键字出现次数小于预设次数的空间文本对象设置为块结构,得到多个块结构;A block structure acquiring unit, configured to set a space text object whose keyword occurrence times are less than a preset number of times as a block structure to obtain multiple block structures;

树结构获取单元,用于将关键字出现次数大于等于预设次数的空间文本对象设置为树结构,得到多个树结构;A tree structure acquisition unit, configured to set the spatial text object whose occurrence times of keywords is greater than or equal to the preset number of times as a tree structure to obtain multiple tree structures;

索引结构获取单元,用于将所有块结构和所有树结构作为索引结构。The index structure acquisition unit is used to use all block structures and all tree structures as index structures.

其中,该候选集获取模块200,可以包括:Wherein, the candidate set obtaining module 200 may include:

候选集获取单元,用于根据得到的查询对象使用索引结构按照贪心算法从所有空间文本对象中选取多个候选空间文本对象,得到候选集。The candidate set acquisition unit is used to select a plurality of candidate spatial text objects from all spatial text objects according to the obtained query object using an index structure according to a greedy algorithm to obtain a candidate set.

本申请实施例还提供一种服务器,包括:The embodiment of the present application also provides a server, including:

存储器,用于存储计算机程序;memory for storing computer programs;

处理器,用于执行计算机程序时实现如上述实施例的结果数据选取方法。The processor is configured to implement the method for selecting result data in the above-mentioned embodiments when executing the computer program.

本申请实施例还提供一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,计算机程序被处理器执行时实现如上述实施例的结果数据选取方法。The embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for selecting result data as in the above-mentioned embodiment is implemented.

说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。Each embodiment in the description is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same and similar parts of each embodiment can be referred to each other. As for the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and for the related information, please refer to the description of the method part.

专业人员还可以进一步意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Professionals can further realize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software or a combination of the two. In order to clearly illustrate the possible For interchangeability, in the above description, the composition and steps of each example have been generally described according to their functions. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.

结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块,或者二者的结合来实施。软件模块可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be directly implemented by hardware, software modules executed by a processor, or a combination of both. Software modules can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other Any other known storage medium.

以上对本申请所提供的一种基于空间关键字搜索的结果数据选取方法、结果数据选取装置、服务器以及计算机可读存储介质进行了详细介绍。本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想。应当指出,对于本技术领域的普通技术人员来说,在不脱离本申请原理的前提下,还可以对本申请进行若干改进和修饰,这些改进和修饰也落入本申请权利要求的保护范围内。A method for selecting result data based on spatial keyword search, a device for selecting result data, a server, and a computer-readable storage medium provided by the present application have been described above in detail. In this paper, specific examples are used to illustrate the principles and implementation methods of the present application, and the descriptions of the above embodiments are only used to help understand the methods and core ideas of the present application. It should be pointed out that those skilled in the art can make some improvements and modifications to the application without departing from the principles of the application, and these improvements and modifications also fall within the protection scope of the claims of the application.

Claims (10)

1. a kind of result data choosing method based on spatial key search, which is characterized in that including:
Index structure is executed to multiple space text objects and establishes operation, obtains index structure;
Multiple candidate spatial text objects are chosen using the index structure according to obtained query object, obtain Candidate Set;
The distance for determining each the candidate spatial text object and the query object, obtains each candidate spatial text The distance coefficient of object and the query object;
It determines the theme quantity that each candidate spatial text object includes except all themes of initialization, obtains each First diversity number of topics of the candidate spatial text object;
Each candidate spatial text pair is determined according to all distance coefficients and all first diversity numbers of topics The first boundary cost of elephant;Wherein, the distance coefficient and first boundary cost are proportional relation, first diversity Number of topics is inverse relation with first boundary cost;
The candidate spatial text object for choosing the first boundary cost minimum is added to result set.
2. result data choosing method according to claim 1, which is characterized in that further include:
When the candidate spatial text object for choosing the first boundary cost minimum is added to result set, each time is determined Select the outsourcing of all themes of the space text object after the candidate spatial text object of the first boundary cost minimum is added The theme quantity included obtains the second diversity number of topics of each candidate spatial text object;
The corresponding candidate spatial text is determined according to all distance coefficients and all second diversity numbers of topics The second boundary cost of object;Wherein, the distance coefficient and the second boundary cost are proportional relation, and described second is various Sexual Themes number is inverse relation with the second boundary cost;
The candidate spatial text object for choosing the second boundary cost minimization is added to the result set.
3. result data choosing method according to claim 2, which is characterized in that execute rope to multiple space text objects Guiding structure establishes operation, obtains index structure, including:
Determine the keyword occurrence number of each space text object;
The space text object that the keyword occurrence number is less than to preset times is set as block structure, obtains multiple agllutinations Structure;
The space text object that the keyword occurrence number is more than or equal to the preset times is set as tree construction, obtains more A tree construction;
Using all block structures and all tree constructions as the index structure.
4. result data choosing method according to claim 3, which is characterized in that use institute according to obtained query object It states index structure and chooses multiple candidate spatial text objects, obtain Candidate Set, including:
It is selected from all space text objects according to greedy algorithm using the index structure according to obtained query object Multiple candidate spatial text objects are taken, the Candidate Set is obtained.
5. a kind of result data selecting device based on spatial key search, which is characterized in that including:
Index establishes module, establishes operation for executing index structure to multiple space text objects, obtains index structure;
Candidate Set acquisition module, for choosing multiple candidate spatial texts using the index structure according to obtained query object Object obtains Candidate Set;
Distance coefficient acquisition module, the distance for determining each the candidate spatial text object and the query object, obtains To the distance coefficient of each the candidate spatial text object and the query object;
First diversity number of topics acquisition module, for determining that each the candidate spatial text object is in all masters of initialization The theme quantity for including except topic obtains the first diversity number of topics of each candidate spatial text object;
First boundary cost acquisition module, for true according to all distance coefficients and all first diversity numbers of topics First boundary cost of fixed each candidate spatial text object;Wherein, the distance coefficient and first boundary cost For proportional relation, the first diversity number of topics is inverse relation with first boundary cost;
First result data chooses module, and the candidate spatial text object for choosing the first boundary cost minimum is added to Result set.
6. result data selecting device according to claim 5, which is characterized in that further include:
Second diversity number of topics acquisition module, for when the candidate spatial text object for choosing the first boundary cost minimum When being added to result set, determine that each candidate spatial text object is literary in the candidate spatial of the first boundary cost minimum The theme quantity for including except all themes after the addition of this object, obtains more than the second of each candidate spatial text object Sample Sexual Themes number;
The second boundary cost acquisition module, for true according to all distance coefficients and all second diversity numbers of topics The second boundary cost of the fixed corresponding candidate spatial text object;Wherein, the distance coefficient and the second boundary at This is proportional relation, and the second diversity number of topics is inverse relation with the second boundary cost;
Second result data chooses module, and the candidate spatial text object for choosing the second boundary cost minimization is added to The result set.
7. result data selecting device according to claim 6, which is characterized in that the index establishes module, including:
Keyword occurrence number acquiring unit, the keyword occurrence number for determining each space text object;
Block structure acquiring unit, the space text object for the keyword occurrence number to be less than to preset times are set as block Structure obtains multiple block structures;
Tree construction acquiring unit, the space text object for the keyword occurrence number to be more than or equal to the preset times It is set as tree construction, obtains multiple tree constructions;
Index structure acquiring unit, for using all block structures and all tree constructions as the index structure.
8. result data selecting device according to claim 7, which is characterized in that the Candidate Set acquisition module, including:
Candidate Set acquiring unit, for according to obtained query object using the index structure according to greedy algorithm from all institutes It states and chooses multiple candidate spatial text objects in the text object of space, obtain the Candidate Set.
9. a kind of server, which is characterized in that including:
Memory, for storing computer program;
Processor realizes that Claims 1-4 any one of them result data such as is chosen when for executing the computer program Method.
10. a kind of computer readable storage medium, which is characterized in that be stored with computer on the computer readable storage medium Program realizes such as Claims 1-4 any one of them result data selection side when the computer program is executed by processor Method.
CN201810184309.9A 2018-03-06 2018-03-06 A method and related device for selecting result data based on spatial keyword search Active CN108304585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810184309.9A CN108304585B (en) 2018-03-06 2018-03-06 A method and related device for selecting result data based on spatial keyword search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810184309.9A CN108304585B (en) 2018-03-06 2018-03-06 A method and related device for selecting result data based on spatial keyword search

Publications (2)

Publication Number Publication Date
CN108304585A true CN108304585A (en) 2018-07-20
CN108304585B CN108304585B (en) 2022-05-17

Family

ID=62849191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810184309.9A Active CN108304585B (en) 2018-03-06 2018-03-06 A method and related device for selecting result data based on spatial keyword search

Country Status (1)

Country Link
CN (1) CN108304585B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149005A (en) * 2019-06-27 2020-12-29 腾讯科技(深圳)有限公司 Method, apparatus, device and readable storage medium for determining search results
CN112632267A (en) * 2020-12-04 2021-04-09 中国人民大学 Search result diversification system combining global interaction and greedy selection
CN113065036A (en) * 2021-04-14 2021-07-02 深圳大学 Method and device for measuring performance of space supporting point and related components
CN113158087A (en) * 2021-04-09 2021-07-23 深圳前海微众银行股份有限公司 Query method and device for space text
CN113407799A (en) * 2021-06-22 2021-09-17 深圳大学 Performance measurement method and device for measuring space division boundary and related equipment
CN114595313A (en) * 2020-12-07 2022-06-07 北京达佳互联信息技术有限公司 Information retrieval result processing method and device, server and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834679A (en) * 2015-04-14 2015-08-12 苏州大学 Representation and inquiry method of behavior track and device therefor
CN105069094A (en) * 2015-08-06 2015-11-18 苏州大学 Semantic understanding based space keyword indexing method
CN106503223A (en) * 2016-11-04 2017-03-15 华东师范大学 A kind of binding site and the online source of houses searching method and device of key word information
CN107145545A (en) * 2017-04-18 2017-09-08 东北大学 Top k zone users text data recommends method in a kind of location-based social networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834679A (en) * 2015-04-14 2015-08-12 苏州大学 Representation and inquiry method of behavior track and device therefor
CN105069094A (en) * 2015-08-06 2015-11-18 苏州大学 Semantic understanding based space keyword indexing method
CN106503223A (en) * 2016-11-04 2017-03-15 华东师范大学 A kind of binding site and the online source of houses searching method and device of key word information
CN107145545A (en) * 2017-04-18 2017-09-08 东北大学 Top k zone users text data recommends method in a kind of location-based social networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIABAO SUN等: "Interactive Spatial Keyword Querying with Semantics", 《 PROCEEDINGS OF THE 2017 ACM ON CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT》 *
ZHIHU QIAN等: "On Efficient Spatial Keyword Querying with Semantics", 《INTERNATIONAL CONFERENCE ON DATABASE SYSTEMS FOR ADVANCED APPLICATIONS》 *
ZHIHU QIAN等: "Semantic-aware top-k spatial keyword queries", 《WORLD WIDE WEB》 *
梁银等: "基于对象集合的空间关键词查询", 《计算机应用》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149005A (en) * 2019-06-27 2020-12-29 腾讯科技(深圳)有限公司 Method, apparatus, device and readable storage medium for determining search results
CN112149005B (en) * 2019-06-27 2023-09-01 腾讯科技(深圳)有限公司 Method, apparatus, device and readable storage medium for determining search results
CN112632267A (en) * 2020-12-04 2021-04-09 中国人民大学 Search result diversification system combining global interaction and greedy selection
CN112632267B (en) * 2020-12-04 2023-05-02 中国人民大学 Global interaction and greedy selection combined search result diversification system
CN114595313A (en) * 2020-12-07 2022-06-07 北京达佳互联信息技术有限公司 Information retrieval result processing method and device, server and storage medium
CN113158087A (en) * 2021-04-09 2021-07-23 深圳前海微众银行股份有限公司 Query method and device for space text
CN113065036A (en) * 2021-04-14 2021-07-02 深圳大学 Method and device for measuring performance of space supporting point and related components
CN113407799A (en) * 2021-06-22 2021-09-17 深圳大学 Performance measurement method and device for measuring space division boundary and related equipment

Also Published As

Publication number Publication date
CN108304585B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN108304585B (en) A method and related device for selecting result data based on spatial keyword search
CN110609902B (en) Text processing method and device based on fusion knowledge graph
Guo et al. Efficient selection of geospatial data on maps for interactive and visualized exploration
US9501577B2 (en) Recommending points of interests in a region
US10204142B2 (en) Progressive spatial searching using augmented structures
CN109522465A (en) The semantic searching method and device of knowledge based map
JP4992243B2 (en) Information element processing program, information element processing method, and information element processing apparatus
CN110059264B (en) Location retrieval method, device and computer storage medium based on knowledge map
CN105069094B (en) A kind of spatial key indexing means based on semantic understanding
CN112115227A (en) Data query method and device, electronic equipment and storage medium
CN109033314A (en) The Query method in real time and system of extensive knowledge mapping in the case of memory-limited
CN113254630A (en) Domain knowledge map recommendation method for global comprehensive observation results
CN115544088A (en) Address information query method, device, electronic equipment and storage medium
Sun et al. Interactive spatial keyword querying with semantics
EP3243147A1 (en) Geocoding multi-entity queries
CN116839613A (en) Multi-attribute-oriented dynamic group travel planning method and device
CN116680356A (en) Address data processing method and device, electronic equipment and storage medium
CN104850658B (en) A kind of data filling method and system
CN107562872A (en) Metric space data similarity search method and device based on SQL
CN108287853B (en) Data relation analysis method and system
CN118897866B (en) Data query method and related device based on relational cluster database
CN119441545B (en) Index generation method and query method based on multi-modal database
JP6167531B2 (en) Region search method, region index construction method, and region search device
Rocha-Junior Efficient Processing of Preference Queries in Distributed and Spatial Databases.
CN107741879A (en) A kind of big data processing method and its device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant