[go: up one dir, main page]

CN111373389B - Data storage system and method for providing data storage system - Google Patents

Data storage system and method for providing data storage system Download PDF

Info

Publication number
CN111373389B
CN111373389B CN201780097046.1A CN201780097046A CN111373389B CN 111373389 B CN111373389 B CN 111373389B CN 201780097046 A CN201780097046 A CN 201780097046A CN 111373389 B CN111373389 B CN 111373389B
Authority
CN
China
Prior art keywords
lock
node
symbol
storage system
data storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201780097046.1A
Other languages
Chinese (zh)
Other versions
CN111373389A (en
Inventor
谢尔盖·罗曼诺维奇·巴希罗夫
阿莱克桑德尔·阿列克山德罗维奇·西马克
徐君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN111373389A publication Critical patent/CN111373389A/en
Application granted granted Critical
Publication of CN111373389B publication Critical patent/CN111373389B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9027Trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明提供一种数据存储系统100,具有数据存储器104和数据控制器101,用于实现具有多个节点的前缀树600,其中,所述数据控制器101用于在所述前缀树600中发起写操作;所述数据控制器101用于为写操作下的节点602、603处多个符号中的单个符号提供锁。

The present invention provides a data storage system 100, which has a data storage 104 and a data controller 101, which is used to implement a prefix tree 600 with multiple nodes, wherein the data controller 101 is used to initiate in the prefix tree 600 Write operation; the data controller 101 is used to provide a lock for a single symbol among multiple symbols at nodes 602 and 603 under the write operation.

Description

数据存储系统以及用于提供数据存储系统的方法Data storage system and method for providing data storage system

技术领域Technical field

本发明涉及一种数据存储系统以及用于提供数据存储系统的方法。更具体地,本发明涉及一种数据存储系统和方法,用于提高多个写入器之间的写操作的可扩展性。The present invention relates to a data storage system and a method for providing a data storage system. More specifically, the present invention relates to a data storage system and method for improving the scalability of write operations among multiple writers.

背景技术Background technique

数据查找对于包括信息数据库和存储系统的所有计算机程序都是必不可少的。程序需要为随后的检索或计算快速定位数据。通常,通过索引数据结构提供此类查找功能,这些数据结构包含由某些数据属性组织的存储数据的相应位置或地址。因此,不需要遍历整个存储数据集,而是查找相应的索引记录。常用的索引是一种独立的工具,它具有高效的接口,通过请求的属性快速检索数据位置。Data lookup is essential for all computer programs including information databases and storage systems. Programs need to quickly locate data for subsequent retrieval or calculations. Typically, such lookup capabilities are provided through index data structures that contain the corresponding locations or addresses of stored data organized by certain data attributes. Therefore, instead of traversing the entire stored data set, the corresponding index record is found. Common Index is a stand-alone tool with an efficient interface for quickly retrieving data locations by requested attributes.

在并发存在的情况下,若干参与者同时执行索引修改,总体操作效率很大程度上取决于并发同步的方法。通常,写入器以排它方式或在单独的节点副本中修改索引结构记录。因此,所述索引结构表示存储数据集的完整和正确状态。通常认为基于树的数据结构的同步是一个瓶颈。In the presence of concurrency, several participants perform index modifications at the same time, and the overall operation efficiency depends largely on the method of concurrent synchronization. Typically, writers modify index structure records exclusively or in separate node replicas. Therefore, the index structure represents the complete and correct state of the stored data set. Synchronization of tree-based data structures is often considered a bottleneck.

数十年来,人们已经了解不同的基于树的索引结构。尽管它们的本质不同,但其中许多都具有某些共同的特征,例如单个根节点、相互连接形成树的内部节点以及最后一层节点(称为叶)。通常,有效负载数据保存在树叶内,内部节点存储一些在遍历查找期间进行选择的不同属性值。Different tree-based index structures have been known for decades. Despite their different natures, many of them share certain characteristics, such as a single root node, internal nodes that are connected to each other to form the tree, and the final level of nodes, called leaves. Typically, the payload data is kept inside the leaves, and the internal nodes store some different attribute values that are selected during the traversal lookup.

前缀树是基于树的索引数据结构的特殊情况。并非将属性值保存在内部节点内部,而是将该信息保存在节点互连内。因此,在前缀树遍历期间,不需要逐一查看子节点并比较搜索的属性值,只需在存在时选择与属性值索引对应的子节点即可。尽管这类树明显比较简单且遍历算法简洁,但由于内存开销很大,因此并没有广泛用于通用数据库或存储系统索引结构。实际上,每个前缀树节点都需要保存整个范围的可能属性值,即使实际上只有很少的这类子节点存在。上文暗示了内存消耗呈指数增长。Prefix trees are a special case of tree-based index data structures. Rather than storing attribute values within internal nodes, this information is stored within node interconnections. Therefore, during prefix tree traversal, there is no need to look at the child nodes one by one and compare the searched attribute values, but only select the child node corresponding to the attribute value index when it exists. Despite their apparent simplicity and concise traversal algorithm, such trees are not widely used in general-purpose database or storage system index structures due to their high memory overhead. In practice, each prefix tree node needs to hold the entire range of possible attribute values, even if only few such child nodes actually exist. The above implies that memory consumption increases exponentially.

近年来,前缀树已经用于一般用途。最重要的修改在于提供了可变大小的节点。这意味着,存在不同容量的内部节点,在树修改期间根据所需的子节点数量选择特定数量的子节点。这种方法通常称为水平压缩。另一个有价值的改进是在于跳过了所有仅具有单个子节点的内部节点,从而需要将共同属性保存在中间节点内部,并将搜索属性存储在相应的叶子内。这种方法通常称为垂直压缩或键序跳过。这两种改进都直接带来了更精确、更适度的内存消耗。In recent years, prefix trees have come into use for general purposes. The most important modification is to provide variable-sized nodes. This means that there are internal nodes of different capacities and a specific number of child nodes are selected during tree modification based on the number of child nodes required. This method is often called horizontal compression. Another valuable improvement consists in skipping all internal nodes with only a single child node, thereby requiring the common attributes to be saved inside the intermediate nodes and the search attributes to be stored within the corresponding leaves. This method is often called vertical compression or key-order skipping. Both improvements directly lead to more precise and modest memory consumption.

已知前缀树的另一个重要问题在于并发访问或同步的方法。传统上,同步技术使写入器串联化,从而同时尝试修改同一个树节点,即使它们更改了节点的不同部分。这类解决方案简单,但很难扩展。更先进的技术允许若干写入器并行修改一个树节点。然而,它们需要复杂的数据结构,而且通常实现起来更复杂。可以考虑的替代选择是使用硬件事务内存,但是在写入时呈现适当硬件支持方面仍然存在一些阻碍。Another important issue with known prefix trees is the method of concurrent access or synchronization. Traditionally, synchronization techniques concatenate writers so that they simultaneously try to modify the same tree node, even if they change different parts of the node. Such solutions are simple but difficult to scale. More advanced techniques allow several writers to modify a tree node in parallel. However, they require complex data structures and are generally more complex to implement. An alternative that could be considered is to use hardware transactional memory, but there are still some obstacles in presenting proper hardware support when writing.

通常,原始前缀树设计需要大量内存,因此这类树的使用非常有限。所谓的自适应基数树(Adaptive Radix Tree,ART)是一种针对内存使用前缀树高度优化的先进设计。ART综合了所有已知的前缀树压缩技术:每个内部节点可以包含若干个符号,称为共同前缀,而不是每个内部节点仅一个符号,如果键的其余部分是唯一的,则应用键序跳过以减少内部节点的数量。此外,每个内部节点具有自适应大小,并且根据子节点的数量增加或减少。通常,在前缀树中,修改最多影响两个节点:进行更改的节点及可能其父节点。因此,通常有若干已知的并发控制机制可以有效地应用于ART或前缀树。Typically, primitive prefix tree designs require large amounts of memory, so the use of such trees is very limited. The so-called Adaptive Radix Tree (ART) is an advanced design highly optimized for memory usage prefix trees. ART combines all known prefix tree compression techniques: instead of just one symbol per internal node, each internal node can contain several symbols, called co-prefixes, and key ordering is applied if the rest of the key is unique Skip to reduce the number of internal nodes. Additionally, each internal node has an adaptive size and increases or decreases based on the number of child nodes. Typically, in a prefix tree, modifications affect at most two nodes: the node where the change was made and possibly its parent. Therefore, there are generally several known concurrency control mechanisms that can be effectively applied to ART or prefix trees.

锁耦合是一种用于同步B树的标准方法,可以轻易地应用于前缀树。其思想是在树遍历期间一次最多保持两个锁。从根节点开始,每次在前缀树中向下一层,就会获取对子节点的锁,从而释放父锁。该方法允许在不同前缀树节点具有不同父节点时同时修改这些前缀树节点。锁耦合算法具有树的第一层上争用高的问题。每个参与者都从根节点开始,所以在树的零层上有高争用,在第一层上也有很高的争用概率。Lock coupling is a standard method for synchronizing B-trees and can be easily applied to prefix trees. The idea is to hold at most two locks at a time during tree traversal. Starting from the root node, each time you go down one level in the prefix tree, the lock on the child node is acquired, thereby releasing the parent lock. This method allows simultaneous modification of different prefix tree nodes when they have different parents. Lock coupling algorithms have the problem of high contention on the first level of the tree. Each participant starts at the root node, so there is high contention on level zero of the tree, and also high contention probability on level one.

另一种方法是乐观锁耦合。它是一种性能更优的增强型锁耦合方法。并非在每个步骤都获取锁来防止对节点的修改,而是每个参与者都假设不会有并发修改。使用版本计数器之后检测修改,如果检测到修改,则重启操作。否则,最多需要两个锁来执行修改(针对节点和父节点)。乐观锁耦合显示出更好的性能,因为锁是按需获取的,树节点中的总争用不高。然而,它的主要问题是粗粒度锁。所述粗粒度锁会阻塞实际上没有执行任何修改的多个树节点。例如,如果一个参与者锁定了树节点,那么另一个参与者就不能锁定它的子节点,因为他还需要锁定父节点。Another approach is optimistic lock coupling. It is an enhanced lock coupling method with better performance. Rather than acquiring a lock at each step to prevent modifications to the node, each participant assumes that there will be no concurrent modifications. Detect modifications after using a version counter, and restart the operation if modifications are detected. Otherwise, up to two locks are required to perform modifications (for the node and parent). Optimistic lock coupling shows better performance because locks are acquired on demand and the total contention among tree nodes is not high. However, its main problem is coarse-grained locking. The coarse-grained lock blocks multiple tree nodes that are not actually performing any modifications. For example, if one actor locks a tree node, another actor cannot lock its child nodes because he also needs to lock the parent node.

读优化写入排除(Read-Optimized Write Exclusion,ROWEX)方法显示出更好的性能。对每个内部节点都扩展了层字段,该层字段指示在节点层处键序列的总长度。通过对前缀和前缀长度的原子修改,和对节点指针的原子修改,阅读器可以不受任何阻碍(锁定、重试等)地工作。写入器的工作方式与乐观锁耦合中写入器的工作方式类似,但会执行额外的动作以支持阅读器正确读取状态中的前缀树。对于写入器而言,ROWEX使用与乐观锁耦合类似的锁定方法。每个写入器在需要修改的树节点及其父节点上有两个排他锁。主要的区别在于,写入器执行额外的动作以使阅读器在没有锁的情况下工作。这些额外动作大多来自对树节点字段的原子更新,这些原子更新对于阅读器正确读取树很重要。Read-Optimized Write Exclusion (ROWEX) method shows better performance. Each internal node is extended with a layer field that indicates the total length of the key sequence at the node's layer. Through atomic modification of the prefix and prefix length, and atomic modification of the node pointer, the reader can work without any hindrance (locking, retries, etc.). Writers work similarly to writers in optimistic locking coupling, but perform additional actions to support readers in correctly reading the prefix tree in the state. For writers, ROWEX uses a locking method similar to optimistic lock coupling. Each writer has two exclusive locks on the tree node that needs to be modified and its parent node. The main difference is that the writer performs additional actions to allow the reader to work without locks. Most of these extra actions come from atomic updates to tree node fields that are important for the reader to read the tree correctly.

写时复制(Copy on Write,COW)方法通常用于无锁算法。关键思想是制作所修改节点的隐藏副本,然后对该副本执行所有更改,并使其可见。传统的锁定、比较和交换操作以及节点版本控制都可以用于实现COW。COW方法适合于面向读的工作负载,例如,工作负载中95%的读操作和5%的写操作。写入器在准备树节点复制期间检测更改,重启操作并重做副本。这类额外的内存复制和分配也会导致写性能不佳。另一方面,COW可以与锁一起使用,以避免重启。在这种情况下,写操作的可扩展性相当低。另外,对于写入速度较慢但读取速度接近DRAM的非易失性存储器,额外复制需要更多的时间,因此细粒度锁实际上可能更快。The Copy on Write (COW) method is usually used in lock-free algorithms. The key idea is to make a hidden copy of the modified node, then perform all changes on that copy and make it visible. Traditional locking, comparison and swap operations as well as node versioning can be used to implement COW. The COW method is suitable for read-oriented workloads, for example, a workload with 95% read operations and 5% write operations. The writer detects changes during preparation for tree node replication, restarts the operation, and redoes the copy. This type of additional memory copying and allocation can also lead to poor write performance. COW, on the other hand, can be used with locks to avoid restarts. In this case, the scalability of write operations is quite low. Also, for non-volatile memory, which is slower to write but has read speeds close to DRAM, the extra copy takes more time, so fine-grained locking may actually be faster.

硬件事务内存(Hardware Transactional Memory,HTM)是一种硬件机制,用于检测冲突并撤消对共享数据所做的任何更改。这种硬件的目标是通过强制实现原子性、一致性和隔离来透明地支持标记为事务的代码区域。在此场景中,所有参与者始终观察到树节点的一致状态,因为修改只在事务内执行,因此只有在成功提交之后,才对所有参与者可见。当提交操作失败时,由于节点已经被同时更改,因此需要重启修改。HTM是一种非常特殊的硬件,尚还没有得到很好的支持,如今在生产中使用它的机会不多。Hardware Transactional Memory (HTM) is a hardware mechanism used to detect conflicts and undo any changes made to shared data. The goal of this hardware is to transparently support regions of code marked as transactions by enforcing atomicity, consistency, and isolation. In this scenario, all participants always observe a consistent state of the tree nodes because modifications are only performed within a transaction and are therefore visible to all participants only after a successful commit. When the commit operation fails, the modification needs to be restarted because the nodes have been changed at the same time. HTM is a very specialized piece of hardware that is not well supported yet, and there are not many opportunities to use it in production today.

软件事务内存(Software Transactional Memory,STM)是在没有HTM支持的情况下在机器上对HTM的模拟。然而,最佳地实现STM需要大量的比较和交换指令支持(例如,DCAS)操作,否则这种方法的实现是复杂且异常缓慢的。不幸的是,DCAS操作仅由最新型处理器支持,并不广泛可用。此外,在实践中,与基于细粒度锁的系统相比,STM系统的性能也受到了影响,这主要归因于与维护日志相关的开销和提交事务所花费的时间。Software Transactional Memory (STM) is a simulation of HTM on the machine without HTM support. However, optimally implementing STM requires a large number of compare-and-swap instructions to support (e.g., DCAS) operations, otherwise implementation of this approach is complex and extremely slow. Unfortunately, DCAS operations are only supported by the latest processors and are not widely available. Furthermore, in practice, the performance of STM systems suffers compared to systems based on fine-grained locks, mainly due to the overhead associated with maintaining logs and the time spent committing transactions.

基数树的并发实现是复杂的,因为键压缩可能导致不确定状态持续一段时间,直到树转换完成。阅读器和写入器同时访问节点需要得到适当处理,以避免树的不确定状态。Concurrent implementation of radix trees is complex because key compression can cause an indeterminate state to persist for a period of time until the tree transformation is complete. Simultaneous access to nodes by readers and writers needs to be handled appropriately to avoid an indeterminate state of the tree.

由于键压缩,写操作有必要解压缩现行节点中的键的一部分并分出另一路径。这是最复杂的情况并导致树的结构变化。例如,对于自适应基数树,可能需要对节点扩展进行重新分配。此类操作比较复杂,通常需要一个写入器的等待状态,直到另一个写入器完成其写操作。Due to key compression, it is necessary for a write operation to decompress part of the key in the current node and fork another path. This is the most complex situation and results in changes in the structure of the tree. For example, for adaptive radix trees, node extensions may need to be reallocated. Such operations are complex and typically require one writer to wait in a wait state until the other writer completes its write operation.

发明内容Contents of the invention

鉴于上述问题和缺点,本发明旨在提高前缀树的性能。因此,本发明的目的是提高多个写入器之间的写操作的可扩展性。In view of the above problems and shortcomings, the present invention aims to improve the performance of prefix trees. Therefore, it is an object of the present invention to improve the scalability of write operations between multiple writers.

本发明的目的通过所附独立权利要求中提供的解决方案来实现。本发明的有利实施方式在从属权利要求中进一步定义。The object of the invention is achieved by the solution provided in the appended independent claims. Advantageous embodiments of the invention are further defined in the dependent claims.

具体地,本发明提出了一种包括内存高效的索引数据结构数据存储系统以及用于同步若干个写入器的独特并发控制方法,从而在写入时提供高可扩展性。Specifically, the present invention proposes a data storage system that includes a memory-efficient index data structure and a unique concurrency control method for synchronizing several writers, thereby providing high scalability when writing.

在数据库和存储系统中,所述索引数据结构的性能非常重要,对最终性能有重要的影响。为了支持数据用户的同时访问,索引数据结构配有并发控制机制。由于已知的复杂并发控制机制导致了额外的开销,上述并发控制机制成为系统的瓶颈。甚至并行访问的串联化有时也会造成不利影响,从而限制了系统在多核硬件上的可扩展性。本发明通过改进写操作的并发控制机制,解决了索引数据结构的可扩展性问题。In databases and storage systems, the performance of the index data structure is very important and has a significant impact on the final performance. To support simultaneous access by data users, the index data structure is equipped with a concurrency control mechanism. Due to the extra overhead caused by known complex concurrency control mechanisms, the above concurrency control mechanisms become bottlenecks in the system. Even serialization of parallel access can sometimes have detrimental effects, limiting the scalability of the system on multi-core hardware. The present invention solves the scalability problem of the index data structure by improving the concurrency control mechanism of write operations.

本发明可以看作一种细粒锁定方法,适用作数据库和存储系统中用作索引数据结构的前缀树的并发控制机制。所述前缀树还可以应用垂直压缩和水平压缩、键序跳过、可变大小的节点以及写入器与阅读器之间的先进同步。所述树在有限字母表或字符串中所有字的均匀搜索域内运行。树节点的分支因子由所述字母表的基数定义。它还可以包括例如可变节点大小、公共节点前缀和节点锁等概念,以多符号语义扩展节点锁。可变节点大小技术允许根据实际需求确定特定节点的实际分支因子。公共节点前缀能够在子节点共享其键内的多个符号或前缀的情况下垂直压缩基数树。The present invention can be regarded as a fine-grained locking method and is suitable for use as a concurrency control mechanism for prefix trees used as index data structures in databases and storage systems. The prefix tree can also apply vertical and horizontal compression, key order skipping, variable size nodes, and advanced synchronization between writers and readers. The tree operates within a uniform search domain of all words in a finite alphabet or string. The branching factor of a tree node is defined by the base of the alphabet. It can also include concepts such as variable node sizes, common node prefixes, and node locks, extending node locks with multi-symbol semantics. Variable node size technology allows the actual branching factor for a specific node to be determined based on actual needs. Common node prefixes enable vertical compression of radix trees in situations where child nodes share multiple symbols or prefixes within their keys.

多符号锁允许获取整个节点的排它锁或该节点中字母表的特定符号的选择锁。所述字母表的特定符号的锁提供了只锁住一个相应子节点的工具,而所述排它所锁住所有子节点。然后,写入器能够对同一父节点的不同子节点进行同时修改。此外,如果这些写入器针对所述字母表的不同符号工作,则可以同时在同一节点中插入或删除。在更复杂的情况(通常很少发生)下,例如,当需要更改共同前缀或应扩展或缩小可变节点大小时,写入器仍然能够获取排它锁。所设想的乐观放松锁耦合同步方法是对乐观锁耦合方案的增强,它最多使用两个排它锁进行多符号锁的树修改。这种方法在并发环境中实现了更好的可扩展性和更好的性能。Multi-symbol locks allow obtaining an exclusive lock on an entire node or a selective lock on a specific symbol of the alphabet within that node. The alphabet-specific symbol lock provides the facility to lock only one corresponding child node, while the exclusive locks all child nodes. The writer is then able to make simultaneous modifications to different child nodes of the same parent node. Furthermore, if these writers work for different symbols of the alphabet, insertions or deletions can be made in the same node at the same time. In more complex cases (which usually occur rarely), such as when the co-prefix needs to be changed or the variable node size should be expanded or reduced, the writer is still able to acquire an exclusive lock. The envisaged optimistic relaxed lock coupling synchronization method is an enhancement of the optimistic lock coupling scheme, which uses at most two exclusive locks for tree modifications of multi-symbol locks. This approach enables better scalability and better performance in concurrent environments.

本发明的第一方面提供了一种数据存储系统,具有数据存储器和数据控制器,用于实现具有多个节点的前缀树,其中,所述数据控制器用于在所述前缀树中发起写操作;所述数据控制器用于为写操作下的节点处多个符号中的单个符号提供锁。A first aspect of the present invention provides a data storage system having a data memory and a data controller for implementing a prefix tree with a plurality of nodes, wherein the data controller is used to initiate a write operation in the prefix tree ; The data controller is used to provide a lock for a single symbol among multiple symbols at a node under a write operation.

本发明的一个优点在于,除了获得完全排它锁之外,还有可能仅针对一个符号获取放松锁。对于前缀树而言,这意味着即使多个子节点具有相同的父节点,仍可以同时对它们进行修改,因为它们的前缀实际上是不同的。其他优势在于,相比使用排它锁的乐观锁耦合,本发明具有更好的扩展性,从而实现了平均30%的吞吐量增益,如在基准模型上所测试的。相比写时复制方法,本发明方法的另一优点在于持久内存。One advantage of the invention is that, in addition to obtaining a fully exclusive lock, it is also possible to obtain a relaxed lock for only one symbol. For prefix trees, this means that even if multiple child nodes have the same parent, they can still be modified at the same time because their prefixes are actually different. Other advantages are that the present invention scales better than optimistic lock coupling using exclusive locks, achieving an average throughput gain of 30%, as tested on the benchmark model. Another advantage of the inventive method compared to the copy-on-write method is persistent memory.

根据本发明的数据存储系统能够为并发数据库和存储系统以及高度依赖索引数据结构的其他应用带来若干积极效果,例如吞吐量更高、操作时延降低、系统可扩展性提高以及CPU负载降低。The data storage system according to the present invention can bring several positive effects to concurrent databases and storage systems and other applications that rely heavily on index data structures, such as higher throughput, reduced operation latency, improved system scalability, and reduced CPU load.

在所述第一方面的一种实现方式中,所述数据控制器用于发起对特定节点的插入或删除;所述数据控制器用于为所述特定节点提供锁;为所述特定节点的父节点处多个符号中表示所述特定节点的单个符号提供锁。此操作允许锁定在操作下的子节点,而在父节点处仅锁定单个节点。因此,可以同时修改单个父节点的一个以上子节点。In an implementation of the first aspect, the data controller is used to initiate the insertion or deletion of a specific node; the data controller is used to provide a lock for the specific node; to provide a parent node for the specific node. The lock is provided by a single symbol among multiple symbols that represents the particular node. This operation allows locking the child nodes under the operation, while only locking a single node at the parent node. Therefore, more than one child node of a single parent node can be modified simultaneously.

在所述第一方面的另一种实现方式中,所述数据控制器用于对特定节点的多个符号中的单个符号发起更改;为所述单个符号提供锁。这种单符号锁允许在单个节点处进行多符号操作,因为只有相应的符号被锁定,而不是整个节点被锁定。此外,不需要锁定所述父节点。In another implementation of the first aspect, the data controller is configured to initiate a change to a single symbol among a plurality of symbols of a specific node; and provide a lock for the single symbol. This single-symbol lock allows multi-symbol operations at a single node because only the corresponding symbol is locked, rather than the entire node. Additionally, the parent node does not need to be locked.

在所述第一方面的另一种实现方式中,所述数据控制器用于为节点提供位长为0至n–1的锁结构,其中,n是所述前缀树的分支因子,其中,所述数据控制器用于为所述锁结构的至少一个位提供锁。这种锁结构允许以单个符号的解析对所有符号进行全范围锁定。这种方法可能会增加内存使用率。然而,集成复杂性和执行时间与标准程序大致相同。In another implementation of the first aspect, the data controller is configured to provide a lock structure with a bit length of 0 to n-1 for a node, where n is a branching factor of the prefix tree, and where The data controller is configured to provide a lock for at least one bit of the lock structure. This lock structure allows full-scope locking of all symbols with resolution of a single symbol. This approach may increase memory usage. However, the integration complexity and execution time are approximately the same as the standard program.

在所述第一方面的另一种实现方式中,所述数据控制器用于为节点提供锁结构,其中,所述锁结构包括:符号段,包括长度为0至k–1且用于存储符号的数组;锁掩码段,位长为0至k–1;其中,k<n,n是所述前缀树的所述分支因子,其中,所述数据控制器用于为所述锁掩码段的至少一个位提供锁。这种锁结构允许使用紧凑锁,其可用作传统锁的直接替代。虽然集成复杂性可能更高,但是内存使用率和执行时间与标准程序大致相同。In another implementation of the first aspect, the data controller is configured to provide a lock structure for a node, wherein the lock structure includes: a symbol segment, including a length of 0 to k-1 and used to store symbols. an array; the lock mask segment, the bit length is 0 to k–1; where, k<n, n is the branching factor of the prefix tree, wherein the data controller is used to provide the lock mask segment At least one bit of provides the lock. This lock structure allows the use of compact locks that can be used as a drop-in replacement for traditional locks. Although integration complexity may be higher, memory usage and execution time are approximately the same as the standard program.

在所述第一方面的另一种实现方式中,所述数据控制器用于:对所述前缀树中的第一节点发起第一写操作;为所述第一节点提供锁;为所述第一节点的父节点处表示所述第一节点的单个符号提供锁;其中,所述数据控制器用于对所述前缀树中与所述第一节点具有相同父节点的第二节点发起第二写操作。可以对共同父节点的两个子节点进行两个并行写操作。启动两个写入器以同时写入,使得可以在一个时间点进行多个免等待写操作。In another implementation of the first aspect, the data controller is configured to: initiate a first write operation on a first node in the prefix tree; provide a lock for the first node; and provide a lock for the first node. A single symbol representing the first node at the parent node of a node provides a lock; wherein the data controller is used to initiate a second write to a second node in the prefix tree that has the same parent node as the first node. operate. Two parallel write operations can be performed on two child nodes of a common parent node. Start two writers to write simultaneously, allowing multiple wait-free write operations to occur at one point in time.

在所述第一方面的另一种实现方式中,所述数据控制器用于为所述第二节点提供锁;为所述父节点处表示所述第二节点的单个符号提供锁。对所述父节点处第二符号的锁定操作是可选的,例如,当只有两个并发写操作时,不需要锁定操作。对于两个以上并发写操作,锁定所述父节点处的对应符号。In another implementation of the first aspect, the data controller is configured to provide a lock for the second node; provide a lock for a single symbol at the parent node that represents the second node. The locking operation on the second symbol at the parent node is optional, for example, when there are only two concurrent write operations, the locking operation is not required. For more than two concurrent write operations, the corresponding symbol at the parent node is locked.

在所述第一方面的另一种实现方式中,所述数据控制器用于对所述前缀树中的节点的第一符号发起第一写操作,其中,所述第一符号表示所述节点的第一子节点;为所述第一符号提供锁;其中,所述数据控制器用于对所述节点中的第二符号发起第二写操作,其中,所述第二符号表示所述节点的第二子节点。可以对一个节点处的两个符号进行两个并发写操作。启动两个写入器以同时修改一个节点处的多个符号,使得可以在一个时间点进行多个免等待写操作。In another implementation of the first aspect, the data controller is configured to initiate a first write operation on a first symbol of a node in the prefix tree, wherein the first symbol represents a a first child node; providing a lock for the first symbol; wherein the data controller is configured to initiate a second write operation on a second symbol in the node, where the second symbol represents the node's first Two child nodes. Two concurrent writes can be made to two symbols at a node. Start two writers to modify multiple symbols at a node simultaneously, allowing multiple wait-free writes to occur at a single point in time.

在所述第一方面的另一种实现方式中,所述数据控制器用于为所述第二符号提供锁。对所述父节点处第二符号的锁定操作是可选的,例如,当只有两个并发写操作时,不需要锁定操作。对于两个以上并发写操作,锁定所述父节点处的对应符号。In another implementation of the first aspect, the data controller is configured to provide a lock for the second symbol. The locking operation on the second symbol at the parent node is optional, for example, when there are only two concurrent write operations, the locking operation is not required. For more than two concurrent write operations, the corresponding symbol at the parent node is locked.

在所述第一方面的另一种实现方式中,所述数据控制器用于:为写操作下的子节点提供锁;在所述写操作下的子节点的所有父节点处为表示相应子节点的单个符号提供锁,使得仅锁定到所述写操作下的子节点的单个路径。这种对单符号锁的单路径的定义对树的单节点修改或写操作造成的损害最小。它为更多的并发写操作留下了充足的空间。在现有技术中,需要锁定全部子树。In another implementation of the first aspect, the data controller is configured to: provide a lock for a child node under a write operation; and represent a corresponding child node at all parent nodes of the child node under the write operation. A single symbol provides a lock such that only a single path to the child node under the write operation is locked. This definition of a single path to a single-symbol lock causes minimal damage to single-node modifications or writes to the tree. It leaves plenty of room for more concurrent write operations. In the existing technology, all subtrees need to be locked.

在所述第一方面的另一种实现方式中,所述数据控制器用于在完成所述写操作之后释放符号和/或节点的锁。一旦终止所述修改,所述释放使得有机会对该特定节点或符号进行进一步写操作。In another implementation of the first aspect, the data controller is configured to release a lock of a symbol and/or a node after completing the write operation. Once the modification is terminated, the release gives the opportunity for further writes to that particular node or symbol.

在所述第一方面的另一种实现方式中,所述数据控制器用于通过锁定所述特定节点处的所述多个符号来为特定节点提供排它锁。本申请的锁概念仍然允许使用排它锁,即节点处所有符号的锁。In another implementation of the first aspect, the data controller is configured to provide an exclusive lock for a particular node by locking the plurality of symbols at the particular node. The lock concept of this application still allows the use of exclusive locks, that is, locks of all symbols at the node.

在所述第一方面的另一种实现方式中,所述前缀树是基数树或自适应基数树。本申请的锁概念非常适合这种树,因为对于基数树而言,每个节点可以表示若干符号。In another implementation of the first aspect, the prefix tree is a radix tree or an adaptive radix tree. The lock concept of this application is very suitable for this kind of tree, because for radix trees, each node can represent several symbols.

本发明的第二方面提供了一种用于提供数据存储系统的方法,所述数据存储系统用于实现具有多个节点的前缀树,所述方法包括:在所述前缀树中发起写操作;为写操作下的节点处多个符号中的单个符号提供锁。适用与上述相同的修改和优点。A second aspect of the present invention provides a method for providing a data storage system for implementing a prefix tree having a plurality of nodes, the method comprising: initiating a write operation in the prefix tree; Provides a lock on a single symbol among multiple symbols at the node under write operation. The same modifications and advantages as above apply.

本发明的第三方面提供了一种计算机程序,具有程序代码,用于在所述计算机程序在计算机或如上所述的数据存储系统上运行时,执行如上所述的方法。适用与上述相同的修改和优点。A third aspect of the invention provides a computer program having program code for performing a method as described above when the computer program is run on a computer or a data storage system as described above. The same modifications and advantages as above apply.

应注意,本申请中所述的所有设备、元件、单元和构件都可以在软件或硬件元件或其任何种类的组合中实施。本申请中描述的各种实体执行的所有步骤和所描述的将由各种实体执行的功能旨在表明各个实体适于或用于执行各自的步骤和功能。虽然在以下具体实施例的描述中,由外部实体执行的特定功能或步骤没有在执行特定步骤或功能的该实体的具体元件的描述中反映,但是技术人员应该清楚的是这些方法和功能可以在各自的硬件或软件元件或其任意组合中实现。It should be noted that all devices, elements, units and structures described in this application may be implemented in software or hardware elements or any kind of combination thereof. All steps described in this application as being performed by various entities and functions described as being performed by the various entities are intended to indicate that the respective entities are suitable or useful for performing the respective steps and functions. Although in the following description of specific embodiments, specific functions or steps performed by an external entity are not reflected in the description of specific elements of the entity that perform the specific steps or functions, it should be clear to the skilled person that these methods and functions may be performed in implemented in their respective hardware or software components or any combination thereof.

附图说明Description of the drawings

结合所附附图,下面具体实施例的描述将阐述上述本发明的各方面及其实现方式,其中:In conjunction with the accompanying drawings, the following description of specific embodiments will illustrate various aspects of the invention and its implementation, wherein:

图1示出了具有索引结构的数据存储系统的系统架构图。Figure 1 shows the system architecture diagram of a data storage system with an index structure.

图2示出了全范围锁结构的图。Figure 2 shows a diagram of the full range lock structure.

图3示出了256位全范围锁结构的示例。Figure 3 shows an example of a 256-bit full range lock structure.

图4示出了紧凑锁结构的图。Figure 4 shows a diagram of the compact lock structure.

图5示出了64位大小的紧凑锁结构的示例。Figure 5 shows an example of a compact lock structure of 64-bit size.

图6示出了具有两个并发写操作的索引树。Figure 6 shows an index tree with two concurrent write operations.

图7示出了键插入算法的流程图。Figure 7 shows a flow chart of the key insertion algorithm.

图8示出了多符号锁算法的流程图。Figure 8 shows a flow chart of the multi-symbol lock algorithm.

图9示出了全范围锁算法的流程图。Figure 9 shows a flow chart of the full range lock algorithm.

图10示出了紧凑锁算法的流程图。Figure 10 shows a flow chart of the compact lock algorithm.

图11示出了用于提供数据存储系统的方法。Figure 11 illustrates a method for providing a data storage system.

具体实施方式Detailed ways

图1示出了包括数据控制器101的数据存储系统100。数据控制器101包括针对用作索引数据结构103的具有垂直压缩和键序跳过的前缀树或基数树的并发控制机制102。数据存储系统100还包括数据存储器104。数据控制器101可以使用动态随机存取存储器(Dynamic Random Access Memory,DRAM)等在主存储器中实现,数据存储器104通常包括大容量存储器,如硬盘、固态硬盘(Solid-State-Disk,SSD)等。Figure 1 shows a data storage system 100 including a data controller 101. The data controller 101 includes a concurrency control mechanism 102 for a prefix tree or radix tree with vertical compression and key order skipping used as the index data structure 103 . Data storage system 100 also includes data storage 104 . The data controller 101 can be implemented in the main memory using a dynamic random access memory (Dynamic Random Access Memory, DRAM), etc. The data memory 104 usually includes a large-capacity memory, such as a hard disk, a solid-state disk (Solid-State-Disk, SSD), etc. .

根据本发明的权利要求1,数据存储系统100包括数据控制器101和数据存储器104。According to claim 1 of the present invention, the data storage system 100 includes a data controller 101 and a data memory 104.

因此,为了快速定位数据,只需要搜索索引数据结构103内与请求索引记录对应的索引。通常,索引数据结构将原始数据组织成逻辑单元,并将每个单元的直接地址保留在存储介质上,从而在数据查找期间尽量减少与存储介质的交互。Therefore, in order to quickly locate data, only the index corresponding to the requested index record in the index data structure 103 needs to be searched. Typically, index data structures organize raw data into logical units and retain the direct address of each unit on the storage medium, thereby minimizing interaction with the storage medium during data lookup.

数据用户111可以经由网络或因特网以及低阶设备协议与数据库或存储系统交互,或者可以在同一台机器上运行。每个数据用户都可以用作用于添加和修改数据的写入器112,或用于查找和检索存储数据104的阅读器113。在并发存在的情况下,当若干个参与者111同时执行数据查找和修改时,总体操作效率很大程度上取决于并发同步或控制的方法102,该方法旨在使存储数据保持一致状态。Data users 111 may interact with the database or storage system via the network or the Internet and low-level device protocols, or may run on the same machine. Each data user can function as a writer 112 for adding and modifying data, or a reader 113 for finding and retrieving stored data 104 . In the presence of concurrency, when several participants 111 perform data lookups and modifications simultaneously, the overall operational efficiency depends largely on the method of concurrency synchronization or control 102 that aims to keep the stored data in a consistent state.

数据存储系统100可看作多个写入器112之间的先进同步的变型,所述高级同步适用作数据存储系统100和信息数据库中用作索引数据结构103的索引或前缀树的并发控制机制102。这种基于前缀树的索引数据结构103还可以配有垂直压缩、水平压缩、键序跳过以及写入器112与阅读器113之间的先进同步。The data storage system 100 can be viewed as a variant of advanced synchronization between multiple writers 112, which is applicable as a concurrency control mechanism for the index or prefix tree used as the index data structure 103 in the data storage system 100 and information database. 102. This prefix tree based index data structure 103 can also be equipped with vertical compression, horizontal compression, key order skipping and advanced synchronization between the writer 112 and the reader 113.

本发明提供了一种新的前缀树节点锁定工具:多符号锁及其实现方式:紧凑锁和全范围锁。对于任意字母表,前缀树节点的最大分支因子实际上等于字母表基数。然而,树的修改大多是在树节点中的一个子节点上执行的。因此,在大多数情况下,仅锁定单个子节点的机制是相当有利的。多符号锁就是为此目的而设计的。The present invention provides a new prefix tree node locking tool: multi-symbol lock and its implementation: compact lock and full-range lock. For any alphabet, the maximum branching factor of a prefix tree node is actually equal to the alphabet base. However, tree modifications are mostly performed on a child node in the tree node. Therefore, in most cases, the mechanism of locking only a single child node is quite advantageous. Multi-symbol locks are designed for this purpose.

图2示出了全范围锁结构200。这种全范围锁结构200具有n个位,其中,n是给定字母表中的符号数。例如,对于C样式字节串,使用四个64位无符号整数。全范围锁结构200可以具有位长为0至n–1的数组形式,其中,n是前缀树的分支因子。Figure 2 shows a full range lock structure 200. This full-range lock structure 200 has n bits, where n is the number of symbols in a given alphabet. For example, for C-style byte strings, use four 64-bit unsigned integers. The full range lock structure 200 may be in the form of an array with a bit length of 0 to n-1, where n is the branching factor of the prefix tree.

图3示出了256位C字符串的全范围锁结构200的示例。为了更好的解释,在图3右侧提供了一个ASCII表。在第一示例中,全范围锁结构200a未被锁定。因此,从0到255的所有位都设置为0。在第二示例中,全范围锁结构200b被排它锁定。因此,从0到255的所有位都设置为1。在第三示例中,在全范围锁结构200c中,锁定符号d。因此,除了第100位(表示符号d,见ASCII表)设置为1外,从0到255的所有位都设置为0。在第四示例中,在全范围锁结构200d中,锁定符号d和K。因此,除了第100位(表示符号d,见ASCII表)和第75位(表示符号K,见ASCII表)设置为1外,从0到255的所有位都设置为0。Figure 3 shows an example of a full-range lock structure 200 for a 256-bit C string. For better explanation, an ASCII table is provided on the right side of Figure 3. In the first example, full range lock structure 200a is unlocked. Therefore, all bits from 0 to 255 are set to 0. In the second example, full range lock structure 200b is locked exclusively. Therefore, all bits from 0 to 255 are set to 1. In a third example, in full range lock structure 200c, symbol d is locked. Therefore, all bits from 0 to 255 are set to 0 except bit 100 (representing the symbol d, see the ASCII table) which is set to 1. In a fourth example, in full range lock structure 200d, symbols d and K are locked. Therefore, all bits from 0 to 255 are set to 0 except bit 100 (representing the symbol d, see the ASCII table) and bit 75 (representing the symbol K, see the ASCII table), which are set to 1.

图4示出了紧凑锁结构400。这种紧凑锁结构400包括节点的符号间隔或段401和锁掩码402。符号段401可包括长度为0至k–1的数组,用于存储符号,其中,n是给定字母表中符号的数目或前缀树的分支因子。锁掩码段或锁掩码402的位长可为0至k–1,其中,k是小于n的数字,因为需要一次锁定所有符号的可能性很小。当k等于n时,所述紧凑锁成为所述全范围锁,因为对于每个符号,所述锁掩码中有相应的位,且不需要符号间隔段。对于特殊情况,即锁掩码402使用完毕时,另一写操作发生,该写操作必须等待。紧凑锁结构400需要k位掩码和k个值间隔,k个值各自表示一个符号。它是所述全范围锁的缩短版。例如,对于C样式字节串,最合理的k值为7和14。对于k等于7,可以使用一个64位无符号整数,其中,第一个字节用作位,其他字节用作值。类似地,对于k等于14,可以使用一个128位无符号整数表示所述紧凑锁,其中,前两个字节用作位,其他字节用作值。Figure 4 shows a compact lock structure 400. This compact lock structure 400 includes a symbol interval or segment 401 of nodes and a lock mask 402 . Symbol segment 401 may include an array of length 0 to k-1 for storing symbols, where n is the number of symbols in a given alphabet or the branching factor of a prefix tree. The bit length of the lock mask segment or lock mask 402 may be from 0 to k – 1, where k is a number less than n since there is a small possibility that all symbols need to be locked at once. When k equals n, the compact lock becomes the full-range lock because for each symbol there is a corresponding bit in the lock mask and no symbol gap segment is required. For the special case, that is, when lock mask 402 is used up, another write operation occurs, and the write operation must wait. The compact lock structure 400 requires a k-bit mask and k value intervals, each of which represents a symbol. It is a shortened version of the full range lock described. For example, for C-style byte strings, the most reasonable k values are 7 and 14. For k equal to 7, you can use a 64-bit unsigned integer, where the first byte is used as the bit and the other bytes are used as the value. Similarly, for k equal to 14, the compact lock can be represented using a 128-bit unsigned integer, where the first two bytes are used as bits and the other bytes are used as values.

数据控制器101用于为锁掩码段402的至少一个位提供锁。The data controller 101 is configured to provide a lock for at least one bit of the lock mask segment 402.

图5示出了64位大小的C字符串的紧凑锁结构400的示例。在第一示例中,紧凑锁结构400a未被锁定。因此,从0到6的所有位都设置为0。在第二示例中,紧凑锁结构400b被排它锁定。因此,从0到6的所有位都设置为1。在第三示例中,在紧凑锁结构400c中,锁定符号d。因此,除了第1位(表示符号d)设置为1外,从0到6的所有位都设置为0。锁掩码402的第1位与符号段401的第1间隔对应。在第四示例中,在紧凑锁结构400d中,锁定符号d和K。因此,除了第1位(表示符号d)和第4位(表示符号K)设置为1外,从0到6的所有位都设置为0。Figure 5 shows an example of a compact lock structure 400 for a 64-bit size C string. In the first example, compact lock structure 400a is unlocked. Therefore, all bits from 0 to 6 are set to 0. In a second example, compact lock structure 400b is locked exclusively. Therefore, all bits from 0 to 6 are set to 1. In a third example, in compact lock structure 400c, symbol d is locked. Therefore, all bits from 0 to 6 are set to 0 except bit 1 (representing the symbol d) which is set to 1. The first bit of the lock mask 402 corresponds to the first interval of the symbol segment 401. In a fourth example, in compact lock structure 400d, symbols d and K are locked. Therefore, all bits from 0 to 6 are set to 0 except bit 1 (representing the symbol d) and bit 4 (representing the symbol K) which are set to 1.

对于上文给出的多符号锁实现方式的详细示例,示例中使用的字母表是单字节字符集,每个符号在8位内表示,整个字母表基数为256个符号。所有符号都用于形成输入键字符串,255个非零字节是用于编码信息的符号,零字节符号指示字符串的结束。这种方法称为C样式字符串或以空值终止的字符串。For the detailed example of the multi-symbol lock implementation given above, the alphabet used in the example is a single-byte character set, each symbol is represented in 8 bits, and the entire alphabet base is 256 symbols. All symbols are used to form the input key string, the 255 non-zero bytes are the symbols used to encode the information, and the zero byte symbol indicates the end of the string. This approach is called C-style strings or null-terminated strings.

所述前缀树可以应用键序跳过和垂直压缩技术。在这种情况下,使用共同前缀字段扩展内层节点,而叶子节点则存储全部键。在树遍历期间,如果跳过了某些符号,则需要额外的比较以确保特定叶子节点与预期的键对应。另外,当插入或删除之后共同前缀更改时,内部节点可能会被拆分或合并。通常应用自适应分支因子以显著降低内存消耗。内部节点使用子节点的可变大小间隔进行更新,例如,可以根据需要将它们扩展或收缩到大小4、16、48和256。所有内存优化技术都使内部节点变得复杂,并且不可能在一般通用硬件上自动更新内部节点的所有字段。因此,需要一种先进同步机制来构建多线程前缀树。The prefix tree can apply key order skipping and vertical compression techniques. In this case, the inner nodes are extended with common prefix fields, while the leaf nodes store all keys. During tree traversal, if certain symbols are skipped, additional comparisons are required to ensure that a specific leaf node corresponds to the expected key. Additionally, internal nodes may be split or merged when the common prefix changes after an insertion or deletion. Adaptive branching factors are often applied to significantly reduce memory consumption. Internal nodes are updated with variable size intervals for child nodes, which can, for example, expand or contract to sizes 4, 16, 48, and 256 as needed. All memory optimization techniques make internal nodes complex, and it is not possible to automatically update all fields of an internal node on general purpose hardware. Therefore, an advanced synchronization mechanism is needed to build multi-threaded prefix trees.

图6示出了基数树600,具有根节点601、两个内部节点602和603以及叶子节点604。结合基数树600,解释了多符号锁的优点。并非获取完全排它锁,而是多符号锁有可能仅获取一个符号的放松锁。对于前缀树而言,这意味着即使若干子节点具有相同的父节点,仍可以同时对它们进行修改。实际上只需要它们的前缀不同。这里采用的是多符号锁的紧凑锁变型。Figure 6 shows a radix tree 600 with a root node 601, two internal nodes 602 and 603, and a leaf node 604. Combined with radix tree 600, the advantages of multi-symbol locking are explained. Instead of acquiring a fully exclusive lock, it is possible for a multi-symbol lock to acquire only a relaxed lock of one symbol. For prefix trees, this means that several child nodes can be modified simultaneously even if they have the same parent. Actually only their prefixes need to be different. A compact lock variant of the multi-symbol lock is used here.

假设第一写入器605和第二写入器606同时修改基数树600。在左侧,针对多符号锁显示操作。作为比较,右侧显示了传统的完全排它锁的功能。Assume that first writer 605 and second writer 606 modify radix tree 600 simultaneously. On the left, actions are shown for a multi-symbol lock. For comparison, the functionality of a traditional fully exclusive lock is shown on the right.

第一写入器605修改内部节点602。根据本发明,第一写入器605在根节点601处锁定位607,即将其值设置为1。该位607与符号c(间隔608)对应,即表示写操作下的节点602的符号。此外,第一写入器605按照锁掩码段402中设置为1的所有位的指示,排它地锁定写操作下的节点602的所有符号。First writer 605 modifies internal node 602. According to the present invention, the first writer 605 locks bit 607 at the root node 601, ie sets its value to 1. This bit 607 corresponds to symbol c (interval 608), which represents the symbol of node 602 under write operation. In addition, the first writer 605 exclusively locks all symbols of the node 602 under the write operation as indicated by all bits in the lock mask segment 402 that are set to 1.

在父节点601处,仅设置一个位607。因此,第二写入器606能够对另一内部节点603执行并发的第二写操作。因此,第二写入器606在根节点601处锁定位609,即将其值设置为1。该位609与符号r(间隔610)对应,即表示该第二写操作下的节点603的符号。此外,第二写入器606按照锁掩码段402中设置为1的所有位的指示,排它地锁定写操作下的节点603的所有符号。At parent node 601, only one bit 607 is set. Therefore, the second writer 606 is able to perform a concurrent second write operation to another internal node 603 . Therefore, the second writer 606 locks bit 609 at the root node 601, ie, sets its value to 1. This bit 609 corresponds to the symbol r (interval 610), which represents the symbol of the node 603 under the second write operation. In addition, the second writer 606 exclusively locks all symbols of the node 603 under the write operation as indicated by all bits in the lock mask segment 402 that are set to 1.

右侧所示的完全排它锁的情况则完全不同。其中,首先写入的第二写入器606完全锁定根节点601和内部节点603。因此,第一写入器605需要等待第二写入器606完成操作。与此相反,本发明允许写入器605和606的并行写操作。The situation with the fully exclusive lock shown on the right is completely different. Among them, the second writer 606 that writes first completely locks the root node 601 and the internal node 603. Therefore, the first writer 605 needs to wait for the second writer 606 to complete the operation. In contrast, the present invention allows parallel write operations by writers 605 and 606.

图7描述了用于前缀树插入操作的乐观放松锁耦合并发控制的流程图。为执行修改,写入器最初遵循针对阅读器选择的同步方案,以遍历树并寻找所需的键或插入新节点的位置。为执行插入或删除,所述写入器最多需要两个锁:一个用于已停止遍历的节点,另一个用于该节点的父节点。如果需要,所述写入器会一直针对所述父节点上的相应符号获取单符号锁。因此,其他写入器可以修改同一父节点的不同子节点。当可以在不影响该节点下插入或删除子节点时,也应使用共享字段的单符号锁,例如,以添加新的子叶节点。Figure 7 depicts a flow chart of optimistically relaxed lock-coupling concurrency control for prefix tree insertion operations. To perform modifications, the writer initially follows the synchronization scheme chosen for the reader to traverse the tree and find the required key or location to insert a new node. To perform an insertion or deletion, the writer requires at most two locks: one for the node that has stopped traversing, and one for that node's parent. If necessary, the writer will always acquire a single-symbol lock on the corresponding symbol on the parent node. Therefore, other writers can modify different child nodes of the same parent node. Single-symbol locks on shared fields should also be used when child nodes can be inserted or deleted without affecting that node, for example, to add a new cotyledon node.

为了针对前缀树应用乐观放松锁耦合方法,用多符号锁扩展了所有内部结点。它可以是所提出的实现方式之一:紧凑锁或全范围锁。一般来说,该思想是为了增强利用放松锁的乐观锁耦合算法。也就是说,只要有可能,就仅锁定单个符号,而不是排它锁。该算法要求并发写入器之间同步,以保持前缀树的一致性。为了同步并发写入器和阅读器,可以使用任何已知方法,例如基于原子操作的节点版本控制或无锁方法。此外,阅读器可以应用经典的锁耦合算法,使用单符号锁来而不是排他性更合理。To apply the optimistically relaxed lock coupling approach to prefix trees, all internal nodes are expanded with multi-symbol locks. It can be one of the proposed implementations: compact lock or full-range lock. In general, the idea is to enhance optimistic locking coupling algorithms that exploit relaxed locks. That is, whenever possible, only lock a single symbol, rather than locking exclusively. This algorithm requires synchronization between concurrent writers to maintain the consistency of the prefix tree. To synchronize concurrent writers and readers, any known method can be used, such as node versioning based on atomic operations or a lock-free approach. Additionally, readers can apply classic lock coupling algorithms, where it is more reasonable to use single-symbol locks instead of exclusive ones.

在步骤700中,程序开始并转到步骤702,其中,阅读器查找键,从而用作阅读器。在步骤703中,确定是否找到所述键。如果找到,则在步骤704中更新对应值。随后,在退出步骤705中终止例程。In step 700, the process begins and proceeds to step 702, where the reader looks up the key and thereby acts as a reader. In step 703, it is determined whether the key is found. If found, the corresponding value is updated in step 704. Subsequently, the routine is terminated in exit step 705.

如果在步骤703中未找到所述键,则在步骤706中选择现行节点进行插入。然后,在步骤707中,锁定所述父节点中的对应单个符号。在步骤708中,确定锁定符号处的所述现行节点是否仍然是所述父节点的直接子节点。如果不是,则在步骤709中,插入新节点并解锁所述父节点中的符号,在步骤702中,重新开始所述程序。If the key is not found in step 703, the current node is selected for insertion in step 706. Then, in step 707, the corresponding single symbol in the parent node is locked. In step 708, it is determined whether the current node at the locked symbol is still a direct child of the parent node. If not, in step 709, a new node is inserted and the symbol in the parent node is unlocked, and in step 702, the procedure is restarted.

当所述现行节点仍然是所述锁定符号处所述父节点的直接子节点时,所述例程从节点708转至节点710。其中,确定是否需要节点转换。如果需要,则在步骤711中锁定完整的现行节点,因为整个节点都受到修改的影响。如果不需要,则在步骤712中仅锁定所述现行节点中的符号,因为只有该符号受修改影响。在这两种情况下,在步骤713中执行相应的插入。在步骤705中退出所述程序之前,在步骤714中解锁所有获取的锁。The routine branches from node 708 to node 710 when the current node is still a direct child of the parent node at the locked symbol. Among them, determine whether node conversion is required. If necessary, the complete active node is locked in step 711 since the entire node is affected by the modification. If not required, only the symbol in the current node is locked in step 712, since only this symbol is affected by the modification. In both cases, the corresponding insertion is performed in step 713. Before exiting the program in step 705, all acquired locks are unlocked in step 714.

图8描述了支持四种主要操作的多符号锁算法的四个流程图。所述多符号锁算法可以通过全范围锁算法(稍后结合图9进行解释)或紧凑锁算法(稍后结合图10进行解释)实现。Figure 8 depicts four flowcharts of the multi-symbol lock algorithm supporting four main operations. The multi-symbol lock algorithm can be implemented by a full-range lock algorithm (explained later in conjunction with Figure 9) or a compact lock algorithm (explained later in conjunction with Figure 10).

根据从步骤800开始的第一算法,获取排它锁。在步骤801中,获取或观察锁状态。在步骤802中,确定是否已保持任何锁,即是否设置了某种锁。如果是,则操作转回到步骤801,从而形成观察锁状态的循环。如果否,则在步骤803中设置排它锁,如上所述。最后,在步骤804中,退出例程,其中,在相应节点处具有设置的排它锁。According to the first algorithm starting from step 800, an exclusive lock is acquired. In step 801, the lock status is obtained or observed. In step 802, it is determined whether any locks have been held, ie, whether some kind of lock has been set. If so, the operation returns to step 801, thereby forming a loop for observing the lock status. If not, the exclusive lock is set in step 803, as described above. Finally, in step 804, the routine is exited with the exclusive lock set at the corresponding node.

根据从步骤810开始的第二算法,释放排它锁。在步骤810中,描述了如何释放设置的排它锁。在步骤811中,释放所述排它锁。该操作实际如何执行取决于所使用的算法。可以采用全范围锁算法或紧凑锁算法。最后,在步骤812中,退出例程,其中,在相应节点处具有释放锁。According to the second algorithm starting at step 810, the exclusive lock is released. In step 810, it is described how to release the set exclusive lock. In step 811, the exclusive lock is released. How this operation is actually performed depends on the algorithm used. Either the full-range locking algorithm or the compact locking algorithm can be used. Finally, in step 812, the routine is exited with the lock released at the corresponding node.

根据从步骤820开始的第三算法,获取单符号锁。在步骤821中,获取或观察锁状态。在步骤822中,确定是否已经保持排它锁。如果是,则操作转回到步骤821,从而形成观察所述节点的锁状态的循环。如果否,则转到步骤823。在步骤823中,确定是否已锁定现行符号。如果是,则操作转回到步骤821,从而形成观察所述符号的锁状态的循环。如果否,则在步骤824中设置单符号锁,如上所述。最后,在步骤825中,退出例程,其中,在相应节点处具有设置的单符号锁。According to the third algorithm starting from step 820, a single symbol lock is acquired. In step 821, the lock status is obtained or observed. In step 822, it is determined whether the exclusive lock has been held. If so, the operation returns to step 821, forming a loop that observes the lock status of the node. If not, go to step 823. In step 823, it is determined whether the current symbol has been locked. If so, the operation returns to step 821, forming a loop that observes the lock state of the symbol. If not, the single symbol lock is set in step 824, as described above. Finally, in step 825, the routine is exited with the single symbol lock set at the corresponding node.

根据从步骤830开始的第四算法,释放单符号锁。在步骤831中,释放所述单符号锁。该操作实际如何执行取决于所使用的算法。可以采用全范围锁算法或紧凑锁算法。最后,在步骤832中,退出例程,其中,在相应节点处具有释放锁。According to the fourth algorithm starting at step 830, the single symbol lock is released. In step 831, the single symbol lock is released. How this operation is actually performed depends on the algorithm used. Either the full-range locking algorithm or the compact locking algorithm can be used. Finally, in step 832, the routine is exited with the lock released at the corresponding node.

图9示出了五种全范围锁算法。全范围锁物理上表示为长度为n位的内存块,如图2所示。每个位i(i为0至n–1)与字母表的具有代码i的符号对应。所述全范围锁是一种精确的解决方案,提供了最大的可扩展性,但相比传统的锁可能会消耗更多的内存。Figure 9 shows five full-range locking algorithms. A full-range lock is physically represented as a memory block of length n bits, as shown in Figure 2. Each bit i (i is 0 to n–1) corresponds to a symbol of the alphabet with code i. The full-scope lock is a precise solution that provides maximum scalability, but may consume more memory than traditional locks.

根据从步骤900开始的第一算法,获取排它锁。在步骤901中,读取锁值,在步骤902中,确定该锁值是否等于0。如果否,则操作转回到步骤901,从而形成读取所述锁值的循环。如果是,则在步骤903中,将所有n位设置为锁定值1,从而设置或获取排它锁。最后,在步骤904中,退出例程,其中,在相应节点处具有设置的排它锁。According to the first algorithm starting from step 900, an exclusive lock is acquired. In step 901, the lock value is read, and in step 902, it is determined whether the lock value is equal to 0. If not, the operation returns to step 901, thereby forming a loop for reading the lock value. If so, in step 903, all n bits are set to the lock value 1, thereby setting or acquiring the exclusive lock. Finally, in step 904, the routine is exited with the exclusive lock set at the corresponding node.

根据从步骤910开始的第二算法,主动获取排它锁。该算法在有参与者尝试获取排它锁时不会让其他参与者获取单符号锁,即排它锁比单符号锁具有更高的优先级。在步骤911中,通过将所有n位设置为1来初始化n位的局部变量剩余掩码。剩余掩码位表示仍需要获取的锁的位。在步骤912中,读取锁值,在步骤913中,通过利用所述锁值进行AND操作按位更新所述剩余掩码位。在步骤914中,将所有位设置为锁定值1。在步骤915中,确定剩余掩码值是否等于0,指示保持所述锁的所有位。如果否,则所述操作转回到步骤912以重启所述剩余掩码位的更新。如果是,在步骤916中,退出例程,其中,在相应节点处具有设置的排它锁。According to the second algorithm starting from step 910, the exclusive lock is actively acquired. This algorithm will not allow other participants to acquire a single-symbol lock when a participant attempts to acquire an exclusive lock, that is, an exclusive lock has a higher priority than a single-symbol lock. In step 911, the n-bit local variable remainder mask is initialized by setting all n bits to 1. The remaining mask bits represent the bits of the lock that still need to be acquired. In step 912, the lock value is read, and in step 913, the remaining mask bits are updated bitwise by performing an AND operation using the lock value. In step 914, all bits are set to the lock value of 1. In step 915, it is determined whether the remaining mask value is equal to 0, indicating that all bits of the lock are held. If not, the operation returns to step 912 to restart updating of the remaining mask bits. If so, in step 916, the routine is exited with the exclusive lock set at the corresponding node.

根据从步骤920开始的第三算法,释放排它锁。在步骤921中,释放所述锁。为了释放所述排它锁,n个位全部设置为0。最后,在步骤922中,退出例程,其中,在相应节点处具有释放锁。According to the third algorithm starting at step 920, the exclusive lock is released. In step 921, the lock is released. To release the exclusive lock, n bits are all set to 0. Finally, in step 922, the routine is exited with the lock released at the corresponding node.

根据从步骤930开始的第四算法,获取单符号锁。在步骤931中,将局部变量i设置为符号码,从而选择相应的符号。在步骤932中,读取所述锁值。在步骤933中,确定是否找到i位。如果是,则操作转回到步骤932,从而形成读取所述锁值的循环。如果否,则在步骤934中,将i位设置为锁定值1,从而设置或获取单符号锁。最后,在步骤935中,退出例程,其中,在相应节点处具有设置的单符号锁。According to the fourth algorithm starting from step 930, a single symbol lock is acquired. In step 931, local variable i is set to the symbol code, thereby selecting the corresponding symbol. In step 932, the lock value is read. In step 933, it is determined whether the i bit is found. If so, the operation returns to step 932, forming a loop that reads the lock value. If not, in step 934, the i bit is set to the lock value 1, thereby setting or acquiring the single symbol lock. Finally, in step 935, the routine is exited with the single symbol lock set at the corresponding node.

根据从步骤940开始的第五算法,释放单符号锁。在步骤941中,将局部变量i设置为符号码,从而选择相应的符号。在步骤942中,将所述i位设置为锁定值0,从而释放所述单符号锁。最后,在步骤943中,退出例程,其中,在相应节点处具有释放的单符号锁。According to the fifth algorithm starting at step 940, the single symbol lock is released. In step 941, local variable i is set to the symbol code, thereby selecting the corresponding symbol. In step 942, the i bit is set to a lock value of 0, thereby releasing the single symbol lock. Finally, in step 943, the routine is exited with the single symbol lock released at the corresponding node.

图10示出了五种紧凑锁算法。针对内存消耗优化了紧凑锁,但可能会导致可扩展性略微降低。所述紧凑锁可以利用与传统锁相同的内存量,作为它们的直接替代。图4示出了紧凑锁在内存中的表示形式,作为符号和位掩码的若干间隔。间隔k的数量也是掩码中的位数,其中,k<n。每个间隔都表示存储单个锁定的符号,且必须有足够的大小来表示字母表的符号。掩码中的每个位指示带符号的相应间隔是否被锁定。Figure 10 shows five compact locking algorithms. Compact locks are optimized for memory consumption, but may result in slightly less scalability. The compact locks can utilize the same amount of memory as traditional locks, serving as their drop-in replacement. Figure 4 shows the representation of a compact lock in memory as several intervals of symbols and bit masks. The number of intervals k is also the number of bits in the mask, where k < n. Each interval represents the storage of a single locked symbol and must be large enough to represent the symbols of the alphabet. Each bit in the mask indicates whether the corresponding signed interval is locked.

根据从步骤1000开始的第一算法,获取排它锁。在步骤1001中,读取锁掩码,在步骤1002中,确定该锁掩码是否等于0。如果否,则操作转回到步骤1001,从而形成读取所述锁掩码的循环。如果是,则在步骤1003中,将所述锁掩码的所有n位设置为1,从而设置或获取排它锁。在设置所述掩码中所有k位期间,忽略容纳符号的间隔部分。最后,在步骤1004中,退出例程,其中,在相应节点处具有设置的排它锁。According to the first algorithm starting from step 1000, an exclusive lock is acquired. In step 1001, the lock mask is read, and in step 1002, it is determined whether the lock mask is equal to 0. If not, the operation returns to step 1001, thereby forming a loop for reading the lock mask. If so, in step 1003, all n bits of the lock mask are set to 1, thereby setting or acquiring the exclusive lock. During the setting of all k bits in the mask, the part of the gap that holds the symbol is ignored. Finally, in step 1004, the routine is exited with the exclusive lock set at the corresponding node.

根据从步骤1010开始的第二算法,主动获取排它锁。该算法在有参与者尝试获取排它锁时不会让其他参与者获取单符号锁,即排它锁比单符号锁具有更高的优先级。在步骤1011中,通过将所有k位设置为1来初始化k位的局部变量剩余掩码。剩余掩码位表示仍需要获取的锁掩码的位。在步骤1012中,读取所述锁掩码,在步骤1013中,通过利用所述锁掩码进行AND操作按位更新所述剩余掩码位。在步骤1014中,将所述锁掩码的所有位设置为锁定值1。在设置所述掩码中所有k位期间,忽略容纳符号的间隔部分。在步骤1015中,确定剩余掩码值是否等于0,指示保持锁掩码的所有位。如果否,则所述操作转回到步骤1012以重启所述剩余掩码位的更新。如果是,在步骤1016中,退出例程,其中,在相应节点处具有设置的排它锁。According to the second algorithm starting from step 1010, the exclusive lock is actively acquired. This algorithm will not allow other participants to acquire a single-symbol lock when a participant attempts to acquire an exclusive lock, that is, an exclusive lock has a higher priority than a single-symbol lock. In step 1011, the k-bit local variable remainder mask is initialized by setting all k bits to 1. The remaining mask bits represent the bits of the lock mask that still need to be acquired. In step 1012, the lock mask is read, and in step 1013, the remaining mask bits are updated bitwise by performing an AND operation using the lock mask. In step 1014, all bits of the lock mask are set to a lock value of 1. During the setting of all k bits in the mask, the part of the gap that holds the symbol is ignored. In step 1015, it is determined whether the remaining mask value is equal to 0, indicating that all bits of the lock mask are maintained. If not, the operation returns to step 1012 to restart the updating of the remaining mask bits. If so, in step 1016, the routine is exited with the exclusive lock set at the corresponding node.

根据从步骤1020开始的第三算法,释放排它锁。在步骤1021中,通过将所述掩码中的所有k位调零将所述锁掩码设置为0,从而忽略容纳符号的间隔部分。最后,在步骤1022中,退出例程,其中,在相应节点处具有释放的排它锁。According to the third algorithm starting from step 1020, the exclusive lock is released. In step 1021, the lock mask is set to 0 by zeroing out all k bits in the mask, thereby ignoring the portion of the gap that holds the symbol. Finally, in step 1022, the routine is exited with the exclusive lock released at the corresponding node.

根据从步骤1030开始的第四算法,获取单符号锁。在步骤1031中,将局部变量i设置为符号码,从而选择相应的符号。According to the fourth algorithm starting from step 1030, a single symbol lock is acquired. In step 1031, local variable i is set to the symbol code, thereby selecting the corresponding symbol.

在步骤1032中,读取所述锁掩码,在步骤1033中,迭代所述锁掩码中的每个位,直到有更多位。在步骤1034和步骤1035中,如果设置了所述锁掩码中的现行位(步骤1034)且对应的符号间隔值等于i(步骤1035),则操作转回到步骤1032,形成读取所述锁掩码的循环,否则,所述操作转回到步骤1033,继续到所述锁掩码中的下一个位。如果不再有位(步骤1033),则操作转到步骤1036。在步骤1036中,确定所述锁掩码中是否有任何未设置的j位。如果没有,则所述紧凑锁已满,操作转回到步骤1032以重试锁定。可选地,步骤1036中的操作可以从随机索引或从符号码mod k开始搜索所述锁掩码中的未设置位,以减少当若干写入器试图占用第一个可用零位时可能发生的冲突数量。In step 1032, the lock mask is read, and in step 1033, each bit in the lock mask is iterated until there are more bits. In steps 1034 and 1035, if the current bit in the lock mask is set (step 1034) and the corresponding symbol interval value is equal to i (step 1035), the operation returns to step 1032 to form the read The loop of the lock mask, otherwise, the operation returns to step 1033 and continues to the next bit in the lock mask. If there are no more bits (step 1033), operation proceeds to step 1036. In step 1036, it is determined whether there are any unset j bits in the lock mask. If not, the compact lock is full and operation returns to step 1032 to retry the lock. Optionally, the operation in step 1036 may search for unset bits in the lock mask starting from a random index or starting from the symbol code mod k to reduce what may happen when several writers try to occupy the first available zero bit number of conflicts.

因此,扫描在所述锁掩码中设置相应位的所有间隔,以确保该符号尚未被锁定。相应地,查找在所述锁掩码中未设置的任意位j,j为0至k–1。Therefore, all intervals that have the corresponding bit set in the lock mask are scanned to ensure that the symbol has not been locked. Accordingly, look for any bit j that is not set in the lock mask, j being 0 to k–1.

如果在步骤1036中确定有未设置的j位,则操作进一步进行到步骤1037,其中,设置锁掩码中的j位并且将符号间隔j设置为i。换句话说,设置所述锁掩码中的j位,并将该符号存储在间隔j中。在可选步骤1038中,返回j值,该值可用于稍后释放单符号锁。最后,在步骤1039中,退出例程,其中,在相应节点处具有设置的单符号锁。If it is determined in step 1036 that there are j bits that are not set, the operation proceeds further to step 1037, where the j bits in the lock mask are set and the symbol interval j is set to i. In other words, bit j in the lock mask is set and the symbol is stored in interval j. In optional step 1038, a j value is returned, which can be used to later release the single symbol lock. Finally, in step 1039, the routine is exited with the single symbol lock set at the corresponding node.

根据从步骤1040开始的第五算法,释放单符号锁。在步骤1041中,确定是否提供可选参数索引j。如果不提供,则在步骤1042中,将局部变量i设置为符号码,从而选择相应的符号。在步骤1043中,扫描在所述掩码中设置有对应位的所有间隔,以查找具有符号间隔j的j位,所述符号间隔j中存在符号i,i是0到n-1的符号码。在步骤1044中,通过将找到的所述锁掩码的所述j位设置为零来将其置零,从而忽略符号间隔。在步骤1041中确定提供可选参数索引j之后,操作转到步骤1044。最后,在步骤1045中,退出例程,其中,在相应节点处具有释放的单符号锁。According to the fifth algorithm starting at step 1040, the single symbol lock is released. In step 1041, it is determined whether optional parameter index j is provided. If not provided, in step 1042, the local variable i is set to the symbol code, thereby selecting the corresponding symbol. In step 1043, scan all intervals in which the corresponding bits are set in the mask to find j bits with a symbol interval j in which symbol i is present, i being a symbol code from 0 to n-1 . In step 1044, the j bits of the found lock mask are zeroed by setting them to zero, thereby ignoring the symbol gap. After determining in step 1041 that optional parameter index j is provided, the operation proceeds to step 1044. Finally, in step 1045, the routine is exited with the single symbol lock released at the corresponding node.

图11示出了根据权利要求14所述的用于提供数据存储系统100的方法。Figure 11 shows a method for providing a data storage system 100 according to claim 14.

根据第一步骤1100,提供一种数据存储系统100,用于实现具有多个节点的前缀树。在步骤1101中,在所述前缀树中发起写操作。在步骤1102中,为写操作下的节点处多个符号中的单个符号提供锁。According to a first step 1100, a data storage system 100 is provided for implementing a prefix tree with a plurality of nodes. In step 1101, a write operation is initiated in the prefix tree. In step 1102, a lock is provided for a single symbol of the plurality of symbols at the node under the write operation.

已经结合作为实例的不同实施例以及实现方式描述了本发明。然而,根据对附图、本发明和独立权利要求的研究,本领域技术人员在实践所要求保护的发明时,能够理解和实现其他变化。在权利要求书以及说明书中,词语“包括”不排除其他元件或步骤,且不定冠词“一”或者“一个”不排除多个。单个元件或其他单元可满足权利要求书中所叙述的若干实体或项目的功能。在仅凭某些措施被记载在相互不同的从属权利要求书中这个单纯的事实并不意味着这些措施的结合不能在有利的实现方式中使用。The invention has been described in connection with different embodiments and implementations as examples. However, other variations will be apparent to those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the independent claims. In the claims and description, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in a claim. The mere fact that certain measures are recited in mutually different dependent claims does not mean that a combination of these measures cannot be used to advantageous implementations.

Claims (13)

1.一种数据存储系统(100),其特征在于,具有数据存储器(104)和数据控制器(101),用于实现具有多个节点的前缀树(600),其中1. A data storage system (100), characterized by having a data memory (104) and a data controller (101) for implementing a prefix tree (600) with multiple nodes, wherein 所述数据控制器(101)用于在所述前缀树(600)中发起写操作;The data controller (101) is used to initiate a write operation in the prefix tree (600); 所述数据控制器(101)用于为写操作下的节点(602、603)处多个符号中的单个符号提供锁;The data controller (101) is configured to provide a lock for a single symbol among a plurality of symbols at a node (602, 603) under a write operation; 所述数据控制器(101)具体用于:为节点提供位长为0至n–1的锁结构,其中,n是所述前缀树(600)的分支因子,所述锁结构包括:符号段(401),包括长度为0至k–1且用于存储符号的数组;锁掩码段(402),位长为0至k–1,其中,k<n,所述数据控制器(101)用于为所述锁掩码段(402)的至少一个位提供锁。The data controller (101) is specifically configured to: provide a lock structure with a bit length of 0 to n-1 for a node, where n is a branching factor of the prefix tree (600), and the lock structure includes: a symbol segment (401), including an array with a length of 0 to k–1 for storing symbols; a lock mask segment (402) with a bit length of 0 to k–1, where k<n, the data controller (101 ) is used to provide a lock for at least one bit of the lock mask segment (402). 2.根据权利要求1所述的数据存储系统(100),其特征在于2. The data storage system (100) according to claim 1, characterized in that 所述数据控制器(101)用于发起对特定节点的插入或删除;The data controller (101) is used to initiate the insertion or deletion of a specific node; 所述数据控制器(101)用于为所述特定节点提供锁;为所述特定节点的父节点处多个符号中表示所述特定节点的单个符号提供锁。The data controller (101) is configured to provide a lock for the specific node; provide a lock for a single symbol representing the specific node among multiple symbols at a parent node of the specific node. 3.根据权利要求1或2所述的数据存储系统(100),其特征在于3. Data storage system (100) according to claim 1 or 2, characterized in that 所述数据控制器(101)用于对特定节点的多个符号中的单个符号发起更改;为所述单个符号提供锁。The data controller (101) is configured to initiate changes to a single symbol among multiple symbols of a specific node; and provide a lock for the single symbol. 4.根据权利要求1至3中任一项所述的数据存储系统(100),其特征在于4. Data storage system (100) according to any one of claims 1 to 3, characterized in that 所述数据控制器(101)用于:对所述前缀树(600)中的第一节点发起第一写操作;为所述第一节点提供锁;为所述第一节点的父节点处表示所述第一节点的单个符号提供锁;其中The data controller (101) is configured to: initiate a first write operation on the first node in the prefix tree (600); provide a lock for the first node; and represent the parent node of the first node. A single symbol of the first node provides the lock; where 所述数据控制器(101)用于对所述前缀树(600)中与所述第一节点具有相同父节点的第二节点发起第二写操作。The data controller (101) is configured to initiate a second write operation on a second node in the prefix tree (600) that has the same parent node as the first node. 5.根据权利要求4所述的数据存储系统(100),其特征在于5. The data storage system (100) according to claim 4, characterized in that 所述数据控制器(101)用于为所述第二节点提供锁;为所述父节点处表示所述第二节点的单个符号提供锁。The data controller (101) is configured to provide a lock for the second node; provide a lock for a single symbol representing the second node at the parent node. 6.根据权利要求1至5中任一项所述的数据存储系统(100),其特征在于6. Data storage system (100) according to any one of claims 1 to 5, characterized in that 所述数据控制器(101)用于对所述前缀树(600)中节点的第一符号发起第一写操作,其中,所述第一符号表示所述节点的第一子节点;为所述第一符号提供锁;其中The data controller (101) is configured to initiate a first write operation on the first symbol of a node in the prefix tree (600), where the first symbol represents the first child node of the node; for the The first symbol provides the lock; where 所述数据控制器(101)用于对所述节点中的第二符号发起第二写操作,其中,所述第二符号表示所述节点的第二子节点。The data controller (101) is configured to initiate a second write operation on a second symbol in the node, where the second symbol represents a second child node of the node. 7.根据权利要求6所述的数据存储系统(100),其特征在于7. Data storage system (100) according to claim 6, characterized in that 所述数据控制器(101)用于为所述第二符号提供锁。The data controller (101) is configured to provide a lock for the second symbol. 8.根据权利要求1至7中任一项所述的数据存储系统(100),其特征在于8. Data storage system (100) according to any one of claims 1 to 7, characterized in that 所述数据控制器(101)用于:为写操作下的子节点提供锁;在所述写操作下的子节点的所有父节点处为表示相应子节点的单个符号提供锁,使得仅锁定到所述写操作下的子节点的单个路径。The data controller (101) is configured to: provide a lock for a child node under a write operation; provide a lock for a single symbol representing a corresponding child node at all parent nodes of the child node under the write operation, so that only locks to A single path to the child node under the write operation. 9.根据权利要求1至8中任一项所述的数据存储系统(100),其特征在于9. Data storage system (100) according to any one of claims 1 to 8, characterized in that 所述数据控制器(101)用于在完成所述写操作之后释放符号和/或节点的锁。The data controller (101) is configured to release the lock of the symbol and/or node after completing the write operation. 10.根据权利要求1至9中任一项所述的数据存储系统(100),其特征在于10. Data storage system (100) according to any one of claims 1 to 9, characterized in that 所述数据控制器(101)用于通过锁定特定节点处的多个符号来为所述特定节点提供排它锁。The data controller (101) is configured to provide an exclusive lock for a specific node by locking multiple symbols at the specific node. 11.根据权利要求1至10中任一项所述的数据存储系统(100),其特征在于11. Data storage system (100) according to any one of claims 1 to 10, characterized in that 所述前缀树(600)是基数树或自适应基数树。The prefix tree (600) is a radix tree or an adaptive radix tree. 12.一种用于提供数据存储系统(100)的方法,其特征在于,所述数据存储系统(100)用于实现具有多个节点的前缀树(600),所述方法包括:12. A method for providing a data storage system (100), characterized in that the data storage system (100) is used to implement a prefix tree (600) with a plurality of nodes, the method comprising: 在所述前缀树中发起写操作(600);Initiate a write operation in the prefix tree (600); 为写操作下的节点(602、603)处多个符号中的单个符号提供锁;Provides a lock on a single symbol among multiple symbols at node (602, 603) under write operation; 所述为写操作下的节点(602、603)处多个符号中的单个符号提供锁,包括:为节点提供位长为0至n–1的锁结构,其中,n是所述前缀树(600)的分支因子,所述锁结构包括:符号段(401),包括长度为0至k–1且用于存储符号的数组;锁掩码段(402),位长为0至k–1,其中,k<n,数据控制器(101)用于为所述锁掩码段(402)的至少一个位提供锁。Providing a lock for a single symbol among multiple symbols at a node (602, 603) under a write operation includes: providing a lock structure with a bit length of 0 to n-1 for the node, where n is the prefix tree ( 600), the lock structure includes: a symbol segment (401), including an array with a length of 0 to k–1 and used to store symbols; a lock mask segment (402), with a bit length of 0 to k–1 , where k<n, the data controller (101) is used to provide a lock for at least one bit of the lock mask segment (402). 13.一种计算机可读存储介质,其特征在于,存储有计算机程序,所述计算机程序用于在计算机或权利要求1至11中任一项所述的数据存储系统(100)上运行时,执行权利要求12所述的方法。13. A computer-readable storage medium, characterized by storing a computer program for running on a computer or the data storage system (100) of any one of claims 1 to 11, The method of claim 12 is performed.
CN201780097046.1A 2017-11-20 2017-11-20 Data storage system and method for providing data storage system Active CN111373389B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2017/000856 WO2019098870A1 (en) 2017-11-20 2017-11-20 Data storage system and method of providing a data storage system

Publications (2)

Publication Number Publication Date
CN111373389A CN111373389A (en) 2020-07-03
CN111373389B true CN111373389B (en) 2023-11-17

Family

ID=60766119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780097046.1A Active CN111373389B (en) 2017-11-20 2017-11-20 Data storage system and method for providing data storage system

Country Status (2)

Country Link
CN (1) CN111373389B (en)
WO (1) WO2019098870A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784117B (en) * 2021-01-06 2023-06-02 北京信息科技大学 Advanced radix tree construction method and construction system for mass data
CN113674821B (en) * 2021-10-21 2022-03-22 浙江太美医疗科技股份有限公司 Network interaction method, device, equipment and storage medium
CN118051502B (en) * 2024-04-10 2024-06-28 腾讯科技(深圳)有限公司 Index processing method, device and equipment of database and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0438958A2 (en) * 1990-01-22 1991-07-31 International Business Machines Corporation Byte stream file management using shared and exclusive locks
CN103020060A (en) * 2011-09-20 2013-04-03 佳都新太科技股份有限公司 Number segment matching algorithm based on tree structure and realization method of number segment matching algorithm
CN105117417A (en) * 2015-07-30 2015-12-02 西安交通大学 Read-optimized memory database Trie tree index method
AU2016230539A1 (en) * 2015-03-11 2017-10-05 Ntt Communications Corporation Retrieval device, retrieval method, program, and recording medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7899067B2 (en) * 2002-05-31 2011-03-01 Cisco Technology, Inc. Method and apparatus for generating and using enhanced tree bitmap data structures in determining a longest prefix match
US8868531B2 (en) * 2012-09-10 2014-10-21 Apple Inc. Concurrent access methods for tree data structures
US9208258B2 (en) * 2013-04-11 2015-12-08 Apple Inc. Locking and traversal methods for ordered tree data structures
US9817919B2 (en) * 2013-06-10 2017-11-14 Nvidia Corporation Agglomerative treelet restructuring for bounding volume hierarchies
US10496283B2 (en) * 2016-01-22 2019-12-03 Suraj Prabhakar WAGHULDE Adaptive prefix tree based order partitioned data storage system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0438958A2 (en) * 1990-01-22 1991-07-31 International Business Machines Corporation Byte stream file management using shared and exclusive locks
CN103020060A (en) * 2011-09-20 2013-04-03 佳都新太科技股份有限公司 Number segment matching algorithm based on tree structure and realization method of number segment matching algorithm
AU2016230539A1 (en) * 2015-03-11 2017-10-05 Ntt Communications Corporation Retrieval device, retrieval method, program, and recording medium
CN105117417A (en) * 2015-07-30 2015-12-02 西安交通大学 Read-optimized memory database Trie tree index method

Also Published As

Publication number Publication date
CN111373389A (en) 2020-07-03
WO2019098870A1 (en) 2019-05-23

Similar Documents

Publication Publication Date Title
US6868414B2 (en) Technique for serializing data structure updates and retrievals without requiring searchers to use locks
Leis et al. The ART of practical synchronization
US20190294558A1 (en) Latchless, non-blocking dynamically resizable segmented hash index
EP3047400B1 (en) Multi-version concurrency control on in-memory snapshot store of oracle in-memory database
CN103678556B (en) The method and processing equipment of columnar database processing
US10019382B2 (en) Secondary data structures for storage class memory (scm) enables main-memory databases
CN105868228A (en) In-memory database system providing lockless read and write operations for OLAP and OLTP transactions
JP4833590B2 (en) Concurrent transactions (CONCURRENT TRANSACTIONS) and page synchronization (PAGESYNCHRONIZATION)
CN111316255B (en) Data storage system and method for providing a data storage system
CN104750720B (en) The realization that high-performance data is handled under multi-thread concurrent access environment
US7716182B2 (en) Version-controlled cached data store
EP4009189A1 (en) Querying versioned data cell
WO2017059798A1 (en) Methods, apparatus, and system for serialization and deserialization, and electronic devices
US11860861B2 (en) Growing dynamic shared memory hash table
CN105630862A (en) Set-oriented visibility state retrieval scheme
CN111259071A (en) Concurrent access control method in distributed database system
CN101286160A (en) method of database indexing
US20160239529A1 (en) Methods and systems of splitting database indexes and digests
CN111373389B (en) Data storage system and method for providing data storage system
CN114328500B (en) A data access method, device, equipment and computer readable storage medium
CN115203211A (en) A method and system for generating a unique hash sequence number
US10417215B2 (en) Data storage over immutable and mutable data stages
WO2025001902A1 (en) Skiplist-based data read-write method, system, device and storage medium
WO2017054662A1 (en) Apparatus and method for managing storage of primary database and replica database
CN115168022A (en) Object processing method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant