CN113849498B

CN113849498B - Index construction and query method

Info

Publication number: CN113849498B
Application number: CN202110950796.7A
Authority: CN
Inventors: 崔栋; 张华�; 温巧燕; 王华伟; 涂腾飞
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-08-18
Filing date: 2021-08-18
Publication date: 2022-08-23
Anticipated expiration: 2041-08-18
Also published as: CN113849498A

Abstract

One or more embodiments of this specification provide an index construction and query method, including constructing a quad-tree structure according to a spatial data set; using a Z curve to perform data dimension reduction processing on the spatial data in a cell to obtain the spatial data of the spatial data. Data representation; sort the spatial data representation according to the Z value, and construct a linked list; perform data segmentation processing based on the linked list to obtain multiple data segments; construct a local model of each data segment, and determine the unit according to the local model of each data segment Lattice query model. On the basis of the constructed quad-tree structure, the data segmentation algorithm is used to divide the data segments, and the query model is constructed to reduce the space storage cost and improve the retrieval performance. Dynamic index building for dynamically updated spatial datasets.

Description

An index construction and query method

技术领域technical field

本说明书一个或多个实施例涉及数据处理技术领域，尤其涉及一种索引构建及查询方法。One or more embodiments of this specification relate to the technical field of data processing, and in particular, to an index construction and query method.

背景技术Background technique

物联网设备会生成大量的地理空间数据，为了有效地访问和处理此类数据，通常会采用树型索引结构存储空间数据。然而，当数据量达到PB级别以上时，树形索引结构急剧变大，严重侵占系统资源；另一方面，传统的树形索引结构在静态数据上性能较好，但无法支持动态的索引构建与更新。IoT devices generate large amounts of geospatial data, and in order to efficiently access and process such data, spatial data is usually stored in a tree-type index structure. However, when the amount of data reaches PB level or more, the tree-shaped index structure grows rapidly, which seriously occupies system resources; on the other hand, the traditional tree-shaped index structure has better performance on static data, but cannot support dynamic index construction and renew.

发明内容SUMMARY OF THE INVENTION

有鉴于此，本说明书一个或多个实施例的目的在于提出一种索引构建及查询方法，能够实现动态索引构建。In view of this, the purpose of one or more embodiments of this specification is to provide an index construction and query method, which can realize dynamic index construction.

基于上述目的，本说明书一个或多个实施例提供了一种索引构建方法，包括：Based on the above purpose, one or more embodiments of this specification provide an index construction method, including:

根据空间数据集，构造包括至少一个单元格的四叉树结构；According to the spatial data set, construct a quadtree structure including at least one cell;

利用Z曲线对所述单元格内的空间数据进行数据降维处理，得到与所述空间数据对应的空间数据表示；其中，所述空间数据表示包括空间数据的坐标及Z值；The spatial data in the cell is subjected to data dimensionality reduction processing by using the Z-curve to obtain a spatial data representation corresponding to the spatial data; wherein, the spatial data representation includes the coordinates and Z values of the spatial data;

根据所述Z值对所述空间数据表示进行排序，并将排序后的空间数据表示保存于链表中；Sorting the spatial data representation according to the Z value, and saving the sorted spatial data representation in a linked list;

基于所述链表进行数据分段处理，得到至少一个数据段；Perform data segmentation processing based on the linked list to obtain at least one data segment;

构建每个数据段的本地模型，根据各数据段的本地模型，确定所述单元格的查询模型。A local model of each data segment is constructed, and a query model of the cell is determined according to the local model of each data segment.

可选的，所述链表的各结点包括空间数据表示及其在链表中的位置。Optionally, each node of the linked list includes a spatial data representation and its position in the linked list.

可选的，基于所述链表进行数据分段处理，得到至少一个数据段为：采用FSW算法对所述链表中的结点进行数据分段，得到至少一个数据段。Optionally, performing data segmentation processing based on the linked list to obtain at least one data segment is: using the FSW algorithm to perform data segmentation on the nodes in the linked list to obtain at least one data segment.

可选的，所述数据段的本地模型包括数据段的起始点的位置及Z值，结束点的位置及Z值，根据所述起始点和结束点计算得到的斜率。Optionally, the local model of the data segment includes the position and Z value of the start point of the data segment, the position and Z value of the end point, and the slope calculated according to the start point and the end point.

可选的，根据各数据段的本地模型，确定所述单元格的查询模型为：对各数据段的本地模型进行数据拟合，生成所述查询模型。Optionally, determining the query model of the cell according to the local model of each data segment is: performing data fitting on the local model of each data segment to generate the query model.

本说明书实施例还提供一种范围查询方法，基于所构建的索引进行查询，包括：The embodiments of this specification also provide a range query method, which performs query based on the constructed index, including:

根据输入的查询矩形查询所述四叉树结构，确定与所述查询矩形具有交集的单元格；Query the quad-tree structure according to the input query rectangle, and determine the cells that have an intersection with the query rectangle;

将所述查询矩形划分为多个子矩形；其中，所述子矩形分为与所述单元格完全重合的子矩形，和与所述单元格部分重合的子矩形；Dividing the query rectangle into a plurality of sub-rectangles; wherein, the sub-rectangles are divided into sub-rectangles that completely coincide with the cells and sub-rectangles that partially coincide with the cells;

对于与单元格完全重合的子矩形，将所述完全重合的单元格中的空间数据作为查询结果；For the sub-rectangle that completely coincides with the cell, use the spatial data in the completely coincident cell as the query result;

对于与单元格部分重合的子矩形，根据所述部分重合的单元格的查询模型进行查询，得到查询结果。For a sub-rectangle partially overlapping with a cell, query is performed according to the query model of the partially overlapping cell, and a query result is obtained.

可选的，根据根据所述部分重合的单元格的查询模型进行查询，得到查询结果，包括：Optionally, query according to the query model of the partially overlapping cells to obtain query results, including:

确定所述与单元格部分重合的子矩形的最小Z值和最大Z值；Determine the minimum Z value and the maximum Z value of the sub-rectangle partially coincident with the cell;

确定所述最大Z值和最小Z值在所述查询模型中的可能位置范围；determining the range of possible positions of the maximum Z value and the minimum Z value in the query model;

判断所述可能位置范围内的空间数据是否在所述查询矩形范围内，将在所述查询矩形范围内的空间数据作为查询结果。It is judged whether the spatial data within the range of possible positions is within the range of the query rectangle, and the spatial data within the range of the query rectangle is used as a query result.

本说明书实施例还提供一种最近邻查询方法，基于所构建的索引进行查询，包括：The embodiments of this specification also provide a nearest neighbor query method, which performs query based on the constructed index, including:

根据输入的查询数据点查询四叉树结构，确定所述查询数据点所在的单元格；Query the quad-tree structure according to the input query data point, and determine the cell where the query data point is located;

根据所述单元格的空间密度和数据点数量，确定初始距离；Determine the initial distance according to the spatial density of the cell and the number of data points;

确定所述初始距离与预设的距离阈值中较小的距离，根据所述查询数据点和所述距离，构建初始查询矩形；determining the smaller distance between the initial distance and a preset distance threshold, and constructing an initial query rectangle according to the query data point and the distance;

以所述初始查询矩形为范围查询方法的查询矩形，获得范围查询结果；Using the initial query rectangle as the query rectangle of the range query method, obtain the range query result;

根据所述范围查询结果确定最近邻查询结果。The nearest neighbor query result is determined according to the range query result.

可选的，根据所述范围查询结果确定最近邻查询结果，包括：Optionally, the nearest neighbor query result is determined according to the range query result, including:

以所述查询数据点为圆心，所述距离为半径，构建初始查询圆形；Constructing an initial query circle with the query data point as the center and the distance as the radius;

判断所述范围查询结果中的空间数据是否在所述初始查询圆形范围之内，将在所述初始查询圆形内的空间数据作为中间查询结果；Judging whether the spatial data in the range query result is within the range of the initial query circle, and using the spatial data within the initial query circle as an intermediate query result;

判断所述中间查询结果中的数据量是否达到预定数量；judging whether the amount of data in the intermediate query result reaches a predetermined amount;

如果达到，按照与所述查询数据点的距离由小到大排列，从排序后的数据点中选取所述预定数量的数据点作为最近邻查询结果；If so, arrange according to the distance from the query data point from small to large, and select the predetermined number of data points from the sorted data points as the nearest neighbor query result;

如果未达到，对所述初始距离进行扩展，根据所述查询数据点和扩展的距离，构建二次查询矩形，根据所述二次查询矩形利用范围查询方法进行查询，得到范围查询结果，基于范围查询结果更新所述中间查询结果，直至所述中间查询结果中的数据量达到所述预定数量为止。If it is not reached, expand the initial distance, construct a secondary query rectangle according to the query data points and the expanded distance, perform a query using a range query method according to the secondary query rectangle, and obtain a range query result, based on the range The query result updates the intermediate query result until the amount of data in the intermediate query result reaches the predetermined amount.

可选的，根据所述二次查询矩形利用范围查询方法进行查询，包括：Optionally, performing a query by using a range query method according to the secondary query rectangle, including:

根据所述初始查询矩形和所述二次查询矩形，确定未查询区域；According to the initial query rectangle and the secondary query rectangle, determine the unsearched area;

以所述未查询区域为范围查询方法的查询矩形进行查询。The query is performed using the unqueried area as the query rectangle of the range query method.

从上面所述可以看出，本说明书一个或多个实施例提供的索引构建及查询方法，通过根据空间数据集，构造四叉树结构；利用Z曲线对单元格内的空间数据进行数据降维处理，得到空间数据的空间数据表示；根据Z值对空间数据表示进行排序，并构建链表；基于链表进行数据分段处理，得到多个数据段；构建每个数据段的本地模型，根据各数据段的本地模型，确定单元格的查询模型。本实施例的方法，在构建的四叉树结构基础上，利用数据分段算法划分数据段，并构建查询模型，降低空间存储代价，提高检索性能，可以一次性的数据遍历快速构建索引，提高索引构建效率，适用于动态更新的空间数据集的动态索引构建。It can be seen from the above that the index construction and query method provided by one or more embodiments of this specification constructs a quad-tree structure according to a spatial data set; uses Z-curve to perform data dimensionality reduction on spatial data in a cell process to obtain the spatial data representation of the spatial data; sort the spatial data representation according to the Z value, and build a linked list; perform data segmentation processing based on the linked list to obtain multiple data segments; build a local model of each data segment, according to each data segment The local model of the segment, which determines the cell's query model. The method of this embodiment, on the basis of the constructed quad-tree structure, uses the data segmentation algorithm to divide the data segments, and constructs the query model, which reduces the space storage cost, improves the retrieval performance, and can quickly build an index by one-time data traversal, and improves the Index construction efficiency, suitable for dynamic index construction of dynamically updated spatial datasets.

附图说明Description of drawings

为了更清楚地说明本说明书一个或多个实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本说明书一个或多个实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate one or more embodiments of the present specification or the technical solutions in the prior art, the following briefly introduces the accompanying drawings used in the description of the embodiments or the prior art. Obviously, in the following description The accompanying drawings are only one or more embodiments of the present specification, and for those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.

图1为本说明书一个或多个实施例的构建方法流程示意图；1 is a schematic flowchart of a construction method according to one or more embodiments of this specification;

图2为本说明书一个或多个实施例的索引构建过程示意图；2 is a schematic diagram of an index building process according to one or more embodiments of the present specification;

图3为本说明书一个或多个实施例的数据划分示意图；3 is a schematic diagram of data partitioning in one or more embodiments of the present specification;

图4A、4B为本说明书一个或多个实施例的数据降维和有序化示意图；4A and 4B are schematic diagrams of dimensionality reduction and ordering of data according to one or more embodiments of this specification;

图5为本说明书一个或多个实施例的数据分段示意图；FIG. 5 is a schematic diagram of data segmentation according to one or more embodiments of the present specification;

图6为本说明书一个或多个实施例的查询模型示意图；FIG. 6 is a schematic diagram of a query model according to one or more embodiments of the present specification;

图7为本说明书一个或多个实施例的范围查询流程示意图；FIG. 7 is a schematic diagram of a scope query process according to one or more embodiments of this specification;

图8A、8B、8C为本说明书一个或多个实施例的范围查询示例图；8A, 8B, and 8C are exemplary diagrams of a range query in one or more embodiments of the present specification;

图9为本说明书一个或多个实施例的KNN查询流程示意图；9 is a schematic diagram of a KNN query process according to one or more embodiments of this specification;

图10A、10B、10C为本说明书一个或多个实施例的KNN查询示例图；10A, 10B, and 10C are exemplary diagrams of KNN queries according to one or more embodiments of this specification;

图11为本说明书一个或多个实施例的装置结构示意图；11 is a schematic structural diagram of an apparatus according to one or more embodiments of the specification;

图12为本说明书一个或多个实施例的电子设备结构示意图。FIG. 12 is a schematic structural diagram of an electronic device according to one or more embodiments of the specification.

具体实施方式Detailed ways

为使本公开的目的、技术方案和优点更加清楚明白，以下结合具体实施例，并参照附图，对本公开进一步详细说明。In order to make the objectives, technical solutions and advantages of the present disclosure clearer, the present disclosure will be further described in detail below with reference to the specific embodiments and the accompanying drawings.

需要说明的是，除非另外定义，本说明书一个或多个实施例使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本说明书一个或多个实施例中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性，而只是用来区分不同的组成部分。“包括”或者“包含”等类似的词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或者物件及其等同，而不排除其他元件或者物件。“连接”或者“相连”等类似的词语并非限定于物理的或者机械的连接，而是可以包括电性的连接，不管是直接的还是间接的。“上”、“下”、“左”、“右”等仅用于表示相对位置关系，当被描述对象的绝对位置改变后，则该相对位置关系也可能相应地改变。It should be noted that, unless otherwise defined, the technical or scientific terms used in one or more embodiments of the present specification shall have the usual meanings understood by those with ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and similar terms used in one or more embodiments of this specification do not denote any order, quantity, or importance, but are merely used to distinguish the various components. "Comprises" or "comprising" and similar words mean that the elements or things appearing before the word encompass the elements or things recited after the word and their equivalents, but do not exclude other elements or things. Words like "connected" or "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "Down", "Left", "Right", etc. are only used to represent the relative positional relationship, and when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

如背景技术部分所述，传统的树形索引结构对于大数据量的空间数据，性能急剧下降，由于未考虑空间数据的分布特点，没有根据数据分布特点选取合适的索引结构。机器学习算法的引入从根本上改变了传统的索引结构，然而，目前已有的索引构建方法，有些不支持特定的操作，有些无法适用于动态更新的空间数据的动态索引构建。As described in the background art section, the performance of the traditional tree index structure for large amount of spatial data drops sharply. Since the distribution characteristics of the spatial data are not considered, an appropriate index structure is not selected according to the data distribution characteristics. The introduction of machine learning algorithms has fundamentally changed the traditional index structure. However, some of the existing index construction methods do not support specific operations, and some are not suitable for dynamic index construction of dynamically updated spatial data.

有鉴于此，本说明书实施例提供一种索引构建方法，根据空间数据集构建四叉树结构，利用Z曲线对四叉树结构的单元格进行数据降维处理，根据Z值对空间数据进行排序，基于排序后的空间数据构建链表，对链表中的各结点进行数据分段，构建各数据段的本地模型，根据各本地模型确定单元格的查询模型；能够快速构建索引，适用于动态更新的空间数据集的动态索引构建。In view of this, an embodiment of this specification provides an index construction method, which constructs a quad-tree structure according to a spatial data set, performs data dimensionality reduction processing on the cells of the quad-tree structure by using a Z curve, and sorts the spatial data according to the Z value. , build a linked list based on the sorted spatial data, segment the data for each node in the linked list, build a local model of each data segment, and determine the query model of the cell according to each local model; can quickly build an index, suitable for dynamic update Dynamic index building of spatial datasets.

如图1、2所示，本说明书实施例提供一种索引构建方法，包括：As shown in Figures 1 and 2, the embodiments of this specification provide an index construction method, including:

S101：根据空间数据集，构造包括至少一个单元格的四叉树结构；S101: Construct a quadtree structure including at least one cell according to the spatial data set;

本实施例中，根据空间数据集中空间数据的坐标，采用四叉树算法将空间数据集划分为包括多个单元格的四叉树结构。In this embodiment, according to the coordinates of the spatial data in the spatial data set, a quad-tree algorithm is used to divide the spatial data set into a quad-tree structure including a plurality of cells.

结合图3所示，具体的，假设原始的空间数据集Data:Data＝{k_i|i＝0,1,…,t}，空间数据k_i的坐标为(x_i,y_i)，利用四叉树算法对空间数据集进行划分的过程为：将空间数据集中所有的数据点映射到二维数据空间V＝[X_l,X_u)×[Y_l,Y_u)∈R²，计算二维数据空间V的数据量V_count；判断数据量V_count是否已达到数据量阈值，若未达到，根据二维数据空间V构建单元格，若已达到，对二维数据空间V进行等距离划分，生成四个子数据空间{NW，NE，SW，SE}，其中，NW＝[X_l,(X_l+X_u)/2)×[(Y_l+Y_u)/2,Y_u)，NE＝[(X_l+X_u)/2,X_u)×[(Y_l+Y_u)/2,Y_u)，SW＝[X_l,(X_l+X_u)/2)×[Y_l,(Y_l+Y_u)/2)，SE＝[(X_l+X_u)/2,X_u)×[Y_l,(Y_l+Y_u)/2)；根据数据点k_i的坐标将其划分到对应的子数据空间中，所有数据点划分之后，得到每个子数据空间的所有数据点。之后，对于任意子数据空间V∈{NW,NE,SW,SE}，再根据子数据空间中的数据量，判断数据量是否达到数据量阈值，若达到，继续对子数据空间进行划分，直至划分出的数据空间内的数据量均小于数据量阈值，停止划分，得到划分出的多个单元格。As shown in Figure 3, specifically, assuming the original spatial data set Data:Data={ki | _i =0,1,...,t}, the coordinates of the spatial data _{ki are (x i , y i} ₎ _, using The process of dividing the spatial data set by the quadtree algorithm is as follows: map all the data points in the spatial data set to the two-dimensional data space V=[X _l ,X _u )×[Y _l ,Y _u )∈R ² , calculate The data volume V _count of the two-dimensional data space V; determine whether the data volume V _count has reached the data volume threshold, if not, construct a cell according to the two-dimensional data space V, if it has reached, carry out equidistant to the two-dimensional data space V Divide to generate four sub-data spaces {NW, NE, SW, SE}, where NW=[X _l ,(X _l +X _u )/2)×[(Y _l +Y _u )/2,Y _u ) , NE=[(X _l +X _u )/2,X _u )×[(Y _l +Y _u )/2,Y _u ), SW=[X _l ,(X _l +X _u )/2)× [Y _l ,(Y _l +Y _u )/2), SE=[(X _l +X _u )/2,X _u )×[Y _l ,(Y _l +Y _u )/2); according to data points The coordinates of k _i are divided into corresponding sub-data spaces. After all data points are divided, all data points in each sub-data space are obtained. After that, for any sub-data space V∈{NW,NE,SW,SE}, according to the data amount in the sub-data space, determine whether the data amount reaches the data amount threshold, if so, continue to divide the sub-data space until If the amount of data in the divided data space is smaller than the data amount threshold, the division is stopped, and multiple divided cells are obtained.

S102：利用Z曲线对单元格内的空间数据进行数据降维处理，得到与空间数据对应的空间数据表示；其中，空间数据表示包括空间数据的坐标及Z值；S102: Perform data dimensionality reduction processing on the spatial data in the cell by using the Z-curve to obtain a spatial data representation corresponding to the spatial data; wherein, the spatial data representation includes coordinates and Z values of the spatial data;

本实施例中，利用Z曲线对每个单元格内的空间数据进行数据降维处理，得到空间数据的一维空间数据表示。In this embodiment, a Z-curve is used to perform data dimension reduction processing on the spatial data in each cell to obtain a one-dimensional spatial data representation of the spatial data.

结合图4A、4B所示，对于单元格内的空间数据k_i(x_i,y_i)，根据Z曲线的映射函数，计算Z值，表示为：4A and 4B, for the spatial data k _i (x _i , y _i ) in the cell, according to the mapping function of the Z curve, the Z value is calculated, which is expressed as:

根据计算出的Z值，确定空间数据k_i的空间数据表示

According to the calculated Z value, determine the spatial data representation of the spatial data _ki

S103：根据Z值对空间数据表示进行排序，并将排序后的空间数据表示保存于链表中；S103: Sort the spatial data representation according to the Z value, and save the sorted spatial data representation in the linked list;

本实施例中，确定空间数据的空间数据表示之后，根据空间数据表示中的Z值从小到大对空间数据表示进行排序，得到排序后的空间数据表示，可表示为有序空间数据表示集

In this embodiment, after the spatial data representation of the spatial data is determined, the spatial data representation is sorted according to the Z value in the spatial data representation from small to large, and the sorted spatial data representation is obtained, which can be represented as an ordered spatial data representation set

获得排序后的空间数据表示之后，基于排序后的空间数据表示，构建链表List＝{(i,z_i,k_i)|i＝0,1,…,n}，其中，i为链表中结点的编号，也即结点对应的空间数据表示在链表中的位置，z_i为第i个结点的Z值，k_i为第i个结点的空间数据。After obtaining the sorted spatial data representation, based on the sorted spatial data representation, construct a linked list List={(i,z _i , _ki )|i=0,1,...,n}, where i is the node in the linked list. The number of the point, that is, the position of the spatial data corresponding to the node in the linked list, _zi is the Z value of the _ith node, and ki is the spatial data of the ith node.

S104：基于链表进行数据分段处理，得到至少一个数据段；S104: Perform data segmentation processing based on the linked list to obtain at least one data segment;

S105：构建每个数据段的本地模型，根据各数据段的本地模型，确定单元格的查询模型。S105: Build a local model of each data segment, and determine a query model of a cell according to the local model of each data segment.

本实施例中，构建链表之后，按照数据分布规律对链表中的各结点进行数据分段，得到多个数据段；对于每个数据段，构建相应的本地模型，根据所有数据段的本地模型确定单元格的查询模型。In this embodiment, after the linked list is constructed, data segments are performed on each node in the linked list according to the data distribution law to obtain multiple data segments; for each data segment, a corresponding local model is constructed, and according to the local models of all data segments Determine the query model for the cell.

一些实施例中，采用FSW算法(feasible space window，可行空间窗)对链表中的结点进行数据分段。FSW算法的原理是，将链表的第一个结点初始化为第一数据段的起始点，第二个结点为第一数据段的初始结束点；按照预定的规则判断第三个结点是否可以加入第一数据段中，如果可以加入则继续判断下一个结点是否可以加入第一数据段中，直至下一个结点不可以加入第一数据段中，得到划分出的第一数据段。之后，以下一个结点为第二数据段的起始点，按照预定的规则继续判断后续结点是否可以加入，直至遍历链表中的所有结点，完成数据分段，得到至少一个数据段。In some embodiments, the FSW algorithm (feasible space window, feasible space window) is used to perform data segmentation on the nodes in the linked list. The principle of the FSW algorithm is to initialize the first node of the linked list as the starting point of the first data segment, and the second node as the initial end point of the first data segment; according to predetermined rules, determine whether the third node is It can be added to the first data segment, and if it can be added, it continues to judge whether the next node can be added to the first data segment, until the next node cannot be added to the first data segment, and the divided first data segment is obtained. Afterwards, the next node is the starting point of the second data segment, and it continues to judge whether subsequent nodes can be added according to predetermined rules, until all nodes in the linked list are traversed, data segmentation is completed, and at least one data segment is obtained.

具体的，利用FSW算法对链表中的结点进行数据分段的过程是：初始化可行区间S＝FSW(k₀,k₁,ε)，其中，ε为最大误差阈值，初始化第一数据段d＝{k₀,k₁}，链表中的第0个结点为第一数据段的起始点，第1个结点为初始结束点；判断第2个结点是否可以加入第一数据段，计算第2个结点的可行区间S’＝FSW(k₁,k₂,ε)，计算S＝S′∩S；Specifically, the process of using the FSW algorithm to segment the data of the nodes in the linked list is: initializing the feasible interval S=FSW(k ₀ , k ₁ , ε), where ε is the maximum error threshold, and initializing the first data segment d ={k ₀ ,k ₁ }, the 0th node in the linked list is the starting point of the first data segment, and the first node is the initial end point; to determine whether the second node can be added to the first data segment, Calculate the feasible interval S'=FSW(k ₁ , k ₂ , ε) of the second node, and calculate S=S′∩S;

若S不为空，表示第2个结点符合线性分布，将第2个结点加入第一数据段，继续判断第3个结点是否可以加入第一数据段，直至下一个结点不可以加入第一数据段，第一数据段分段结束。此时，根据第一数据段构建第一数据段的本地模型L:L＝{l,u,z_l,z_u,θ}，其中，l为第一数据段的起始点的位置，u为第一数据段的结束点的位置，z_l为起始点的Z值，z_u为结束点的Z值，θ为根据起始点的坐标和结束点的坐标计算得到的斜率。If S is not empty, it means that the second node conforms to the linear distribution, add the second node to the first data segment, and continue to judge whether the third node can be added to the first data segment until the next node cannot. The first data segment is added, and the segment of the first data segment ends. At this time, a local model L:L={l,u,z _l ,z _u ,θ} of the first data segment is constructed according to the first data segment, where l is the position of the starting point of the first data segment, and u is The position of the end point of the first data segment, z _l is the Z value of the start point, z _u is the Z value of the end point, and θ is the slope calculated according to the coordinates of the start point and the end point.

若S为空，则第2个结点不符合线性分布，第一数据段分段结束；第一数据段分段结束后，重新开始划分第二数据段，初始化可行区间S＝FSW(k₂,k₃,ε)，初始化第二数据段d＝{k₂,k₃}，以第2个结点为第二数据段的起始点，第3个结点为第二数据段的初始结束点，判断第4个结点是否可以加入第二数据段，计算第4个结点的可行区间S’＝FSW(k₃,k₄,ε)，计算S＝S′∩S。If S is empty, the second node does not conform to the linear distribution, and the first data segment is segmented; after the first data segment is segmented, the second data segment is divided again, and the initialized feasible interval S=FSW(k ₂ ,k ₃ ,ε), initialize the second data segment d={k ₂ ,k ₃ }, take the second node as the starting point of the second data segment, and the third node as the initial end of the second data segment point, determine whether the fourth node can be added to the second data segment, calculate the feasible interval S'=FSW(k ₃ , k ₄ , ε) of the fourth node, and calculate S=S′∩S.

如图5、6所示，按照上述过程，对链表中的各结点进行分段处理，最终得到多个数据段，以及每个数据段所对应的本地模型。之后，基于各数据段的本地模型，通过对Z值进行数据拟合，得到单元格的查询模型。一些方式中，可使用线性分段函数进行数据拟合，与传统树形索引结构相比，能够减少索引的中间节点数量和间接查询次数，减少索引存储代价，提高检索速度。As shown in Figures 5 and 6, according to the above process, each node in the linked list is segmented, and finally a plurality of data segments and a local model corresponding to each data segment are obtained. Then, based on the local model of each data segment, the query model of the cell is obtained by performing data fitting on the Z value. In some ways, a linear piecewise function can be used for data fitting, which can reduce the number of intermediate nodes in the index and the number of indirect queries, reduce the cost of index storage, and improve retrieval speed compared with the traditional tree index structure.

本说明书实施例提供的索引构建方法，包括根据空间数据集，构造包括至少一个单元格的四叉树结构；利用Z曲线对单元格内的空间数据进行数据降维处理，得到与空间数据对应的空间数据表示；根据Z值对空间数据表示进行排序，得到排序后的空间数据表示；根据排序后的空间数据表示，构建链表；基于链表进行数据分段处理，得到至少一个数据段；构建每个数据段的本地模型，根据各数据段的本地模型，确定单元格的查询模型。在构建的四叉树结构基础上，利用数据分段算法划分数据段，并构建查询模型，降低空间存储代价，提高检索性能，可以一次性的数据遍历快速构建索引，提高索引构建效率，适用于动态更新的空间数据集的动态索引构建。The index construction method provided by the embodiments of this specification includes constructing a quad-tree structure including at least one cell according to a spatial data set; performing data dimension reduction processing on the spatial data in the cell by using a Z-curve to obtain an index corresponding to the spatial data. Spatial data representation; sort the spatial data representation according to the Z value to obtain the sorted spatial data representation; construct a linked list according to the sorted spatial data representation; perform data segmentation processing based on the linked list to obtain at least one data segment; construct each The local model of the data segment determines the query model of the cell according to the local model of each data segment. On the basis of the constructed quadtree structure, the data segmentation algorithm is used to divide the data segments, and the query model is constructed to reduce the space storage cost and improve the retrieval performance. Dynamic index building for dynamically updated spatial datasets.

如图7、8A、8B、8C所示，一些实施例中，基于所构建的索引可以进行范围查询，范围查询的方法包括：As shown in FIGS. 7, 8A, 8B, and 8C, in some embodiments, a range query can be performed based on the constructed index, and the range query method includes:

S701：根据查询矩形查询四叉树结构，确定与查询矩形具有交集的单元格；S701: Query the quad-tree structure according to the query rectangle, and determine the cells that have an intersection with the query rectangle;

一些实施方式中，进行范围查询时，给定查询矩形qr＝[x_l，x_u)×[y_l，y_u)，其中，x_l表示x轴方向最小边界值，x_u表示x轴方向最大边界值，y_l表示y轴方向最小边界值，y_u表示y轴方向最大边界值。根据查询矩形查询四叉树结构，确定出与查询矩形具有交集的单元格。In some embodiments, when performing a range query, a given query rectangle qr=[x _l , x _u )×[y _l , _yu ), where x _l represents the minimum boundary value in the x-axis direction, and x _u represents the x-axis direction _The maximum boundary value, yl represents the minimum boundary value in the y-axis direction, and _yu represents the maximum boundary value in the y-axis direction. The quad-tree structure is searched according to the query rectangle, and the cells that have the intersection with the query rectangle are determined.

S702：将查询矩形划分为多个子矩形；其中，子矩形分为与单元格完全重合的子矩形，和与单元格部分重合的子矩形；S702: Divide the query rectangle into a plurality of sub-rectangles; wherein, the sub-rectangles are divided into sub-rectangles that completely overlap with the cells, and sub-rectangles that partially overlap with the cells;

本实施例中，查询矩形qr可由多个子矩形qr’_i的并集表示：In this embodiment, the query rectangle qr can be represented by the union of multiple sub-rectangles qr' _i :

根据子矩形与单元格的位置关系，将查询矩形还可表示为：According to the positional relationship between the sub-rectangle and the cell, the query rectangle can also be expressed as:

其中，cr_i为与单元格完全重合的子矩形，lr_i为与单元格部分重合的子矩形，Qc为与单元格完全重合的子矩形的数量，Ql为与单元格部分重合的子矩形的数量。Among them, cr _i is the sub-rectangle that completely overlaps with the cell, lr _i is the sub-rectangle that partially overlaps the cell, Qc is the number of sub-rectangles that completely overlap the cell, and Ql is the quantity.

S703：对于与单元格完全重合的子矩形，将完全重合的单元格中的空间数据作为查询结果；S703: For the sub-rectangle completely overlapping with the cell, use the spatial data in the completely overlapping cell as the query result;

S704：对于与单元格部分重合的子矩形，根据部分重合的单元格的查询模型进行查询，得到查询结果。S704: For the sub-rectangle partially overlapping with the cell, perform a query according to the query model of the partially overlapping cell to obtain a query result.

对于与单元格部分重合的子矩形，进一步根据部分重合的单元格的查询模型进行查询。查询方法是，先计算子矩形区域内的最小Z值和最大Z值；根据最小Z值Z_min和最大Z值Z_max在查询模型中的范围，确定在查询模型中可能的位置范围[p_min-ε,p_min+ε]，[p_max-ε,p_max+ε]，其中，ε为查询模型的置信度，可以得到子矩形在查询模型中的可能位置范围为[p_min-ε,p_max+ε]。For the sub-rectangle that partially overlaps with the cell, the query is further performed according to the query model of the partially overlapped cell. The query method is to first calculate the minimum Z value and the maximum Z value in the sub-rectangular area; according to the range of the minimum Z value Z _min and the maximum Z value Z _max in the query model, determine the possible position range in the query model [p _min -ε,p _min +ε], [p _max -ε,p _max +ε], where ε is the confidence of the query model, it can be obtained that the possible position range of the sub-rectangle in the query model is [p _min -ε, p _max +ε].

如图8B所示，由于子矩形在查询模型中的可能位置范围大于该子矩形的实际范围，需要对可能位置范围内的空间数据进行校验，将校验通过的数据作为查询结果。校验时，判断可能位置范围内的空间数据是否在查询矩形qr的范围内，将在查询矩形范围内的空间数据作为查询结果，不在查询矩形范围内的空间数据筛选掉；可能位置范围内的所有空间数据都筛选之后，获得查询结果。As shown in FIG. 8B , since the possible position range of the sub-rectangle in the query model is larger than the actual range of the sub-rectangle, the spatial data within the possible position range needs to be verified, and the verified data is used as the query result. During verification, it is judged whether the spatial data within the range of possible positions is within the range of the query rectangle qr, the spatial data within the range of the query rectangle is used as the query result, and the spatial data not within the range of the query rectangle is filtered out; After all spatial data is filtered, the query results are obtained.

一些实施例中，基于所构建的索引进行最近邻查询(KNN查询)。其中，KNN查询的原理是：给定查询数据点q_knn＝(x,y)和距离阈值δ，确定以查询数据点q_knn为圆心、δ为半径的圆，该圆的圆形区域定义为B(q_knn,δ)，确定圆形区域B(q_knn,δ)内包含的数据点，按照与查询数据点q_knn的距离由小到大排序，从排序后的数据点中选取K个数据点作为KNN查询结果。In some embodiments, a nearest neighbor query (KNN query) is performed based on the constructed index. Among them, the principle of KNN query is: given the query data point q _knn = (x, y) and the distance threshold δ, determine a circle with the query data point q _knn as the center and δ as the radius, and the circular area of the circle is defined as B(q _knn ,δ), determine the data points contained in the circular area B(q _knn ,δ), sort them according to the distance from the query data point q _knn from small to large, and select K from the sorted data points Data points as KNN query results.

如图9、10A、10B、10C所示，基于所构建的索引进行KNN查询的方法，包括：As shown in Figures 9, 10A, 10B, and 10C, the method for performing KNN query based on the constructed index includes:

S901：根据查询数据点查询四叉树结构，确定查询数据点所在的单元格；S901: Query the quad-tree structure according to the query data point, and determine the cell where the query data point is located;

S902：根据该单元格的空间密度和数据点数量，确定初始距离；S902: Determine the initial distance according to the spatial density of the cell and the number of data points;

初始距离对KNN查询至关重要，其过大或者过小都会导致查询性能下降。本实施例所构建的索引为四叉树结构，尽可能的将空间数据进行了等量划分，每个单元格包含的数据量不大于数据量阈值。因此可以做出合理假设：单元格内的空间数据大致均匀分布，在此基础上计算单元格

所包含的空间数据的空间密度，表示为：The initial distance is very important for KNN query, if it is too large or too small, the query performance will be degraded. The index constructed in this embodiment is a quad-tree structure, and the spatial data is divided into equal amounts as much as possible, and the amount of data contained in each cell is not greater than the threshold of the amount of data. Therefore, a reasonable assumption can be made: the spatial data within the cell is roughly evenly distributed, and the cell is calculated on this basis.

The spatial density of the contained spatial data, expressed as:

ρ_i＝Cell_i_count/((u_xi-l_xi)×(u_yi-l_yi)) (4)ρ _i =Cell _i _count/((u _xi -l _xi )×(u _yi -l _yi )) (4)

u_xi表示单元格在x轴方向上的最大值，u_yi表示单元格在y轴方向上的最大值、l_xi表示单元格在x轴方向上的最小值、l_yi表示单元格在y轴方向上的最小值。u _xi represents the maximum value of the cell in the x-axis direction, u _yi represents the maximum value of the cell in the y-axis direction, l _xi represents the minimum value of the cell in the x-axis direction, and l _yi represents the cell in the y-axis direction the minimum value in the direction.

当要选取第i个单元格内的K个空间数据时，可以根据空间密度估算初始距离δ₀，表示为：When the K spatial data in the ith cell are to be selected, the initial distance δ ₀ can be estimated according to the spatial density, which is expressed as:

S903：确定初始距离与距离阈值中较小的距离，根据查询数据点和该距离，构建初始查询矩形；S903: Determine the smaller distance between the initial distance and the distance threshold, and construct an initial query rectangle according to the query data point and the distance;

本实施例中，确定初始距离之后，从初始距离和距离阈值中选取较小的距离δ'＝min(δ₀,δ)，根据查询数据点和距离δ'构建初始查询矩形qr'＝[x-δ',x+δ']×[y-δ',y+δ']。In this embodiment, after the initial distance is determined, a smaller distance δ'=min(δ ₀ ,δ) is selected from the initial distance and the distance threshold, and an initial query rectangle qr'=[x -δ',x+δ']×[y-δ',y+δ'].

S904：以初始查询矩形为范围查询方法的查询矩形，获得范围查询结果；S904: Using the initial query rectangle as the query rectangle of the range query method, obtain the range query result;

S905：以查询数据点为圆心，该距离为半径，构建初始查询圆形，判断范围查询结果中的空间数据是否在初始查询圆形范围之内，将在初始查询圆形内的空间数据作为中间查询结果；S905: Taking the query data point as the center and the distance as the radius, construct an initial query circle, determine whether the spatial data in the range query result is within the range of the initial query circle, and take the spatial data within the initial query circle as the center search result;

本实施例中，以查询数据点q_knn为圆心、距离δ'为半径确定初始查询圆形B(q_knn,δ')；若范围查询结果为r＝{k_i|i＝0,1,…,m}，计算空间数据k_i与查询数据点之间的距离，如果二者之间的距离小于等于距离δ'，空间数据k_i在初始查询圆形区域内，将空间数据k_i在作为中间查询结果，如果二者之间的距离大于距离δ'，空间数据k_i在初始查询圆形区域外，将空间数据k_i筛选掉。In this embodiment, the initial query circle B(q _knn ,δ') is determined with the query data point q _knn as the center and the distance δ' as the radius; if the range query result is r={k _i |i=0,1, ...,m}, calculate the distance between the spatial data _ki and the query data point, if the distance between the two is less than or equal to the distance δ', the spatial data _ki is in the initial query circular area, and the spatial data _ki is in the As an intermediate query result, if the distance between the two is greater than the distance δ', the spatial data _ki is outside the initial query circular area, and the spatial data _ki is filtered out.

本实施例中，根据查询数据点q_knn＝(x,y)和距离阈值δ构建矩形区域

圆形区域B(q_knn,δ)为矩形区域的内切圆，这样就将查询圆形区域内的数据点转换为查询矩形区域内的数据点。再利用范围查询方法获得查询矩形的范围查询结果之后，再根据范围查询结果确定出在圆形区域中的查询结果。In this embodiment, a rectangular area is constructed according to the query data point q _knn =(x, y) and the distance threshold δ

The circular area B(q _knn ,δ) is the inscribed circle of the rectangular area, so that the data points in the query circular area are converted into data points in the query rectangular area. After obtaining the range query result of the query rectangle by using the range query method, the query result in the circular area is determined according to the range query result.

S906：判断中间查询结果中的数据量是否达到K个，如果达到，按照与查询数据点的距离由小到大排列，从排序后的数据点中选取K个数据点作为KNN查询结果；如果未达到，对初始距离进行扩展，根据查询数据点和扩展的距离，构建二次查询矩形，根据二次查询矩形利用范围查询方法进行查询，得到范围查询结果，基于范围查询结果更新中间查询结果，直至中间查询结果中的数据量达到K个为止。S906: Determine whether the amount of data in the intermediate query results reaches K, if so, arrange them according to the distance from the query data points from small to large, and select K data points from the sorted data points as the KNN query result; Reach, expand the initial distance, construct a secondary query rectangle according to the query data points and the extended distance, perform the query using the range query method according to the secondary query rectangle, obtain the range query result, and update the intermediate query result based on the range query result, until The amount of data in the intermediate query results reaches K.

其中，根据二次查询矩形利用范围查询方法进行查询，包括：根据初始查询矩形和二次查询矩形，确定未查询区域；以未查询区域为范围查询方法的查询矩形进行查询。Wherein, using the range query method to perform the query according to the secondary query rectangle includes: determining the unqueried area according to the initial query rectangle and the secondary query rectangle;

本实施例中，若根据初始查询矩形获得的数据量K’未达到K个，需要对初始距离进行扩展。在空间密度保持不变的条件下，扩展后的二次查询矩形的勒贝格测度是初始查询矩形的K/K’倍，进而获得扩展后的距离为：In this embodiment, if the amount of data K' obtained according to the initial query rectangle does not reach K, the initial distance needs to be extended. Under the condition that the spatial density remains unchanged, the Lebesgue measure of the expanded quadratic query rectangle is K/K' times that of the initial query rectangle, and the expanded distance is obtained as:

本实施例所构建的索引支持KNN查询，通过将圆形区域查询转换为矩形区域查询，利用范围查询方法获得矩形区域的范围查询结果，进一步根据范围查询结果筛选出KNN查询结果。The index constructed in this embodiment supports KNN query. By converting the circular area query into a rectangular area query, the range query result of the rectangular area is obtained by using the range query method, and the KNN query result is further screened according to the range query result.

需要说明的是，本说明书一个或多个实施例的方法可以由单个设备执行，例如一台计算机或服务器等。本实施例的方法也可以应用于分布式场景下，由多台设备相互配合来完成。在这种分布式场景的情况下，这多台设备中的一台设备可以只执行本说明书一个或多个实施例的方法中的某一个或多个步骤，这多台设备相互之间会进行交互以完成所述的方法。It should be noted that the methods of one or more embodiments of this specification may be executed by a single device, such as a computer or a server. The method in this embodiment can also be applied in a distributed scenario, and is completed by the cooperation of multiple devices. In the case of such a distributed scenario, one device among the multiple devices may only execute one or more steps in the method of one or more embodiments of the present specification, and the multiple devices may perform operations on each other. interact to complete the described method.

需要说明的是，上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下，在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外，在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中，多任务处理和并行处理也是可以的或者可能是有利的。It should be noted that the above describes specific embodiments of the present specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. Additionally, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

如图11所示，本说明书实施例还提供一种索引构建装置，包括：As shown in FIG. 11 , an embodiment of the present specification further provides an index building apparatus, including:

构造树模块，用于根据空间数据集，构造包括至少一个单元格的四叉树结构；a construction tree module for constructing a quadtree structure including at least one cell according to the spatial data set;

降维处理，用于利用Z曲线对单元格内的空间数据进行数据降维处理，得到与空间数据对应的空间数据表示；其中，空间数据表示包括空间数据的坐标及Z值；Dimensionality reduction processing, which is used to perform data dimensionality reduction processing on the spatial data in the cell by using the Z-curve to obtain the spatial data representation corresponding to the spatial data; wherein, the spatial data representation includes the coordinates and the Z value of the spatial data;

排序模块，用于根据Z值对空间数据表示进行排序，并将排序后的空间数据表示保存于链表中；The sorting module is used to sort the spatial data representation according to the Z value, and save the sorted spatial data representation in the linked list;

分段模块，用于基于链表进行数据分段处理，得到至少一个数据段；The segmentation module is used to perform data segmentation processing based on the linked list to obtain at least one data segment;

模型构建模块，用于构建每个数据段的本地模型，根据各数据段的本地模型，确定单元格的查询模型。The model building module is used to construct the local model of each data segment, and determine the query model of the cell according to the local model of each data segment.

为了描述的方便，描述以上装置时以功能分为各种模块分别描述。当然，在实施本说明书一个或多个实施例时可以把各模块的功能在同一个或多个软件和/或硬件中实现。For the convenience of description, when describing the above device, the functions are divided into various modules and described respectively. Of course, when implementing one or more embodiments of this specification, the functions of each module may be implemented in one or more software and/or hardware.

上述实施例的装置用于实现前述实施例中相应的方法，并且具有相应的方法实施例的有益效果，在此不再赘述。The apparatuses in the foregoing embodiments are used to implement the corresponding methods in the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.

图12示出了本实施例所提供的一种更为具体的电子设备硬件结构示意图，该设备可以包括：处理器1010、存储器1020、输入/输出接口1030、通信接口1040和总线1050。其中处理器1010、存储器1020、输入/输出接口1030和通信接口1040通过总线1050实现彼此之间在设备内部的通信连接。FIG. 12 shows a schematic diagram of a more specific hardware structure of an electronic device provided in this embodiment. The device may include: a processor 1010 , a memory 1020 , an input/output interface 1030 , a communication interface 1040 and a bus 1050 . The processor 1010 , the memory 1020 , the input/output interface 1030 and the communication interface 1040 realize the communication connection among each other within the device through the bus 1050 .

处理器1010可以采用通用的CPU(Central Processing Unit，中央处理器)、微处理器、应用专用集成电路(Application Specific Integrated Circuit，ASIC)、或者一个或多个集成电路等方式实现，用于执行相关程序，以实现本说明书实施例所提供的技术方案。The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related program to implement the technical solutions provided by the embodiments of this specification.

存储器1020可以采用ROM(Read Only Memory，只读存储器)、RAM(Random AccessMemory，随机存取存储器)、静态存储设备，动态存储设备等形式实现。存储器1020可以存储操作系统和其他应用程序，在通过软件或者固件来实现本说明书实施例所提供的技术方案时，相关的程序代码保存在存储器1020中，并由处理器1010来调用执行。The memory 1020 may be implemented in the form of a ROM (Read Only Memory, read only memory), a RAM (Random Access Memory, random access memory), a static storage device, a dynamic storage device, and the like. The memory 1020 may store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 1020 and invoked by the processor 1010 for execution.

输入/输出接口1030用于连接输入/输出模块，以实现信息输入及输出。输入输出/模块可以作为组件配置在设备中(图中未示出)，也可以外接于设备以提供相应功能。其中输入设备可以包括键盘、鼠标、触摸屏、麦克风、各类传感器等，输出设备可以包括显示器、扬声器、振动器、指示灯等。The input/output interface 1030 is used to connect the input/output module to realize information input and output. The input/output/module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions. The input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output device may include a display, a speaker, a vibrator, an indicator light, and the like.

通信接口1040用于连接通信模块(图中未示出)，以实现本设备与其他设备的通信交互。其中通信模块可以通过有线方式(例如USB、网线等)实现通信，也可以通过无线方式(例如移动网络、WIFI、蓝牙等)实现通信。The communication interface 1040 is used to connect a communication module (not shown in the figure), so as to realize the communication interaction between the device and other devices. The communication module may implement communication through wired means (eg, USB, network cable, etc.), or may implement communication through wireless means (eg, mobile network, WIFI, Bluetooth, etc.).

总线1050包括一通路，在设备的各个组件(例如处理器1010、存储器1020、输入/输出接口1030和通信接口1040)之间传输信息。Bus 1050 includes a path to transfer information between the various components of the device (eg, processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).

需要说明的是，尽管上述设备仅示出了处理器1010、存储器1020、输入/输出接口1030、通信接口1040以及总线1050，但是在具体实施过程中，该设备还可以包括实现正常运行所必需的其他组件。此外，本领域的技术人员可以理解的是，上述设备中也可以仅包含实现本说明书实施例方案所必需的组件，而不必包含图中所示的全部组件。It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in the specific implementation process, the device may also include necessary components for normal operation. other components. In addition, those skilled in the art can understand that, the above-mentioned device may only include components necessary to implement the solutions of the embodiments of the present specification, rather than all the components shown in the figures.

上述实施例的电子设备用于实现前述实施例中相应的方法，并且具有相应的方法实施例的有益效果，在此不再赘述。The electronic devices in the foregoing embodiments are used to implement the corresponding methods in the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.

本实施例的计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带，磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。The computer readable medium of this embodiment includes both permanent and non-permanent, removable and non-removable media and can be implemented by any method or technology for information storage. Information may be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.

所属领域的普通技术人员应当理解：以上任何实施例的讨论仅为示例性的，并非旨在暗示本公开的范围(包括权利要求)被限于这些例子；在本公开的思路下，以上实施例或者不同实施例中的技术特征之间也可以进行组合，步骤可以以任意顺序实现，并存在如上所述的本说明书一个或多个实施例的不同方面的许多其它变化，为了简明它们没有在细节中提供。It should be understood by those of ordinary skill in the art that the discussion of any of the above embodiments is only exemplary, and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples; under the spirit of the present disclosure, the above embodiments or Technical features in different embodiments may also be combined, steps may be carried out in any order, and there are many other variations of the different aspects of one or more embodiments of this specification as described above, which are not in detail for the sake of brevity supply.

另外，为简化说明和讨论，并且为了不会使本说明书一个或多个实施例难以理解，在所提供的附图中可以示出或可以不示出与集成电路(IC)芯片和其它部件的公知的电源/接地连接。此外，可以以框图的形式示出装置，以便避免使本说明书一个或多个实施例难以理解，并且这也考虑了以下事实，即关于这些框图装置的实施方式的细节是高度取决于将要实施本说明书一个或多个实施例的平台的(即，这些细节应当完全处于本领域技术人员的理解范围内)。在阐述了具体细节(例如，电路)以描述本公开的示例性实施例的情况下，对本领域技术人员来说显而易见的是，可以在没有这些具体细节的情况下或者这些具体细节有变化的情况下实施本说明书一个或多个实施例。因此，这些描述应被认为是说明性的而不是限制性的。Additionally, in order to simplify illustration and discussion, and in order not to obscure one or more embodiments of this specification, the figures provided may or may not be shown in connection with integrated circuit (IC) chips and other components. Well known power/ground connection. Furthermore, devices may be shown in block diagram form in order to avoid obscuring one or more embodiments of this description, and this also takes into account the fact that details regarding the implementation of such block diagram devices are highly dependent on the implementation of the invention (ie, these details should be well within the understanding of those skilled in the art) of the platform describing one or more embodiments. Where specific details (eg, circuits) are set forth to describe exemplary embodiments of the present disclosure, it will be apparent to those skilled in the art that these specific details may be used without or with variations One or more embodiments of this specification are implemented below. Accordingly, these descriptions are to be considered illustrative rather than restrictive.

尽管已经结合了本公开的具体实施例对本公开进行了描述，但是根据前面的描述，这些实施例的很多替换、修改和变型对本领域普通技术人员来说将是显而易见的。例如，其它存储器架构(例如，动态RAM(DRAM))可以使用所讨论的实施例。Although the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations to these embodiments will be apparent to those of ordinary skill in the art from the foregoing description. For example, other memory architectures (eg, dynamic RAM (DRAM)) may use the discussed embodiments.

本说明书一个或多个实施例旨在涵盖落入所附权利要求的宽泛范围之内的所有这样的替换、修改和变型。因此，凡在本说明书一个或多个实施例的精神和原则之内，所做的任何省略、修改、等同替换、改进等，均应包含在本公开的保护范围之内。The embodiment or embodiments of this specification are intended to cover all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present specification should be included within the protection scope of the present disclosure.

Claims

1. A nearest neighbor query method, characterized in that, comprising:

Query the quad-tree structure according to the input query data point, and determine the cell where the query data point is located;

Determine the initial distance according to the spatial density of the cell and the number of data points;

determining the smaller distance between the initial distance and a preset distance threshold, and constructing an initial query rectangle according to the query data point and the distance;

Using the initial query rectangle as the query rectangle of the range query method, obtain the range query result;

Determine the nearest neighbor query result according to the range query result;

Wherein, determining the nearest neighbor query result according to the range query result includes:

Constructing an initial query circle with the query data point as the center and the distance as the radius;

Judging whether the spatial data in the range query result is within the range of the initial query circle, and using the spatial data within the initial query circle as an intermediate query result;

judging whether the amount of data in the intermediate query result reaches a predetermined amount;

If so, arrange according to the distance from the query data point from small to large, and select the predetermined number of data points from the sorted data points as the nearest neighbor query result;

If it is not reached, expand the initial distance, construct a secondary query rectangle according to the query data points and the expanded distance, perform a query using a range query method according to the secondary query rectangle, and obtain a range query result, based on the range The query result updates the intermediate query result until the amount of data in the intermediate query result reaches the predetermined amount.

2. The method according to claim 1, wherein the query is performed by using a range query method according to the secondary query rectangle, comprising:

According to the initial query rectangle and the secondary query rectangle, determine the unsearched area;

The query is performed using the unqueried area as the query rectangle of the range query method.

3. The method according to claim 1, wherein the nearest neighbor query method performs a query based on an index constructed by the following method, comprising:

According to the spatial data set, construct a quadtree structure including at least one cell;

The spatial data in the cell is subjected to data dimensionality reduction processing by using the Z-curve to obtain a spatial data representation corresponding to the spatial data; wherein, the spatial data representation includes the coordinates and Z values of the spatial data;

Sort the spatial data representation according to the Z value, and store the sorted spatial data representation in a linked list;

Perform data segmentation processing based on the linked list to obtain at least one data segment;

A local model of each data segment is constructed, and according to the local model of each data segment, data fitting is performed on the Z value by using the current piecewise function to determine the query model of the cell.

4. The method according to claim 3, wherein each node of the linked list includes a spatial data representation and its position in the linked list.

5. The method according to claim 3, wherein, performing data segmentation processing based on the linked list, and obtaining at least one data segment is: adopting FSW algorithm to perform data segmentation on the nodes in the linked list, obtaining at least one data segment. a data segment.

6. The method according to claim 3 or 5, wherein the local model of the data segment comprises the position and Z value of the starting point of the data segment, the position and Z value of the ending point, according to the starting point and the Z value. The calculated slope of the end point.