[go: up one dir, main page]

CN114996275A - Key value storage method based on multi-tree conversion mechanism - Google Patents

Key value storage method based on multi-tree conversion mechanism Download PDF

Info

Publication number
CN114996275A
CN114996275A CN202210711424.3A CN202210711424A CN114996275A CN 114996275 A CN114996275 A CN 114996275A CN 202210711424 A CN202210711424 A CN 202210711424A CN 114996275 A CN114996275 A CN 114996275A
Authority
CN
China
Prior art keywords
tree
key
value data
data
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210711424.3A
Other languages
Chinese (zh)
Other versions
CN114996275B (en
Inventor
潘玉彪
林鑫伟
张惠臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaqiao University
Original Assignee
Huaqiao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202210711424.3A priority Critical patent/CN114996275B/en
Publication of CN114996275A publication Critical patent/CN114996275A/en
Application granted granted Critical
Publication of CN114996275B publication Critical patent/CN114996275B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides a key value storage method based on a multi-tree conversion mechanism, which specifically comprises the following steps: for the written key value data, firstly storing the key value data into a write skip list, and converting the key value data into a read-only skip list to be inserted into a B + tree in the memory device after the size of the key value data reaches the limit; when the size of the B + tree reaches a certain limit, traversing the key value data according to a heat strategy, and persisting the key value data with low heat to a cold tree 0 layer in the external storage device; if the number of the key value data files in the cold tree layer 0 reaches the size limit, triggering the partition operation of the layer 0; when the key value data in the B + tree execute persistence operation, the 0-layer partition only receives the key value data which accords with a set range; if the key value data in the specific range in the cold tree reaches a certain heat degree, transferring the key value data to a hot tree of the external storage device; while key-value data of low heat in the hot tree will be transferred to the cold tree. The method provided by the invention reduces the read amplification by using the heat strategy, ensures the write-in performance and realizes the integral improvement of the performance of the key value storage system.

Description

一种基于多树转换机制的键值存储方法A key-value storage method based on multi-tree conversion mechanism

技术领域technical field

本发明涉及计算机存储领域,特别是指一种基于多树转换机制的键值存储方法。The invention relates to the field of computer storage, in particular to a key-value storage method based on a multi-tree conversion mechanism.

背景技术Background technique

随着大数据时代的到来,数据储量不断增加,数据特点发生了明显变化,非结构化数据占比已达到了数据总量的80%。在这样的数据变化趋势下,键值存储被广泛应用,它对数据结构没有明确限制,具有很高的可扩展性。目前已经有一些优秀的键值存储产品出现且被广泛应用,如Chrome浏览器使用的Leveldb,facebook基于Leveldb进一步提出改进的Rocksdb,Twitter使用的Redis和Memcached等。目前的键值存储所使用的数据结构主要有哈希结构、B树以及日志结构合并树(LSM树),这三种结构分别具有不同的优缺点。With the advent of the era of big data, data reserves have been increasing, and data characteristics have changed significantly. The proportion of unstructured data has reached 80% of the total data. Under such a trend of data changes, key-value storage is widely used, it has no clear restrictions on the data structure, and has high scalability. At present, some excellent key-value storage products have appeared and are widely used, such as Leveldb used by Chrome browser, Rocksdb further improved by facebook based on Leveldb, Redis and Memcached used by Twitter, etc. The data structures used in the current key-value store mainly include hash structure, B-tree and log-structure merge tree (LSM tree). These three structures have different advantages and disadvantages.

由于基于LSM树键值存储系统具有良好的写性能,并且支持范围扫描,因此出现了许多相关产品。但由于LSM树结构固有的结构特点(外存中的层级结构),该结构的维护和使用会对键值存储系性能造成消极影响。以Leveldb为例,首先,Leveldb的写放大的来源主要是层间合并操作中的数据重写。例如,Leveldb中每一层的大小时上一层的10倍,将i层中的一个文件合并到i+1层时,在最坏的情况下,需要读取i+1层的10个文件,合并排序后再将这些数据写回i+1层,写放大倍数高达10,在这种情况下,将一个SST文件从L0层移动到L6层,写放大最高会达到50。其次,Leveldb的读放大的来源主要有两个方面。一方面,在查找一个键值对时,Leveldb可能需要在多个层级中进行查找,在最坏情况下,需要在L0中查找8个文件,在L1-L6层中每层需要查找一个文件,一共需要查找14个文件;另一方面,在一个SST文件中查找一个键值对时,Leveldb需要读取该文件的多个元数据块,实际读取的数据包括索引块、bloom过滤器块和数据块。例如,当查找1KB的键值对时,Leveldb需要读取16KB的索引块,4KB的过滤器块和4KB的数据块,共24KB的数据。在最差的情况下,需要读取14个SST文件,所以读放大会高达336。Since the LSM tree-based key-value storage system has good write performance and supports range scanning, many related products have appeared. However, due to the inherent structural characteristics of the LSM tree structure (hierarchical structure in external memory), the maintenance and use of this structure will have a negative impact on the performance of the key-value storage system. Taking Leveldb as an example, first of all, the source of Leveldb's write amplification is mainly data rewriting in the inter-layer merge operation. For example, the size of each layer in Leveldb is 10 times that of the previous layer. When merging a file in layer i to layer i+1, in the worst case, 10 files in layer i+1 need to be read , and then write the data back to the i+1 layer after merge sorting, and the write magnification is as high as 10. In this case, if an SST file is moved from the L0 layer to the L6 layer, the maximum write magnification will reach 50. Secondly, there are two main sources of Leveldb's read amplification. On the one hand, when looking for a key-value pair, Leveldb may need to look up in multiple levels, in the worst case, it needs to find 8 files in L0, and each layer needs to find one file in L1-L6 layers, A total of 14 files need to be searched; on the other hand, when searching for a key-value pair in an SST file, Leveldb needs to read multiple metadata blocks of the file, and the actual data read includes index blocks, bloom filter blocks and data block. For example, when looking for a 1KB key-value pair, Leveldb needs to read a 16KB index block, a 4KB filter block, and a 4KB data block, for a total of 24KB of data. In the worst case, 14 SST files need to be read, so the read magnification will be as high as 336.

发明内容SUMMARY OF THE INVENTION

本发明的主要目的在于克服现有技术中的上述缺陷,提出一种基于多树转换机制的键值存储方法,使用热度策略减少读放大的同时,保证写入性能,实现键值存储系统性能的整体提升。The main purpose of the present invention is to overcome the above-mentioned defects in the prior art, and propose a key-value storage method based on a multi-tree conversion mechanism, which uses a heat strategy to reduce read amplification while ensuring write performance, and realizes the performance of the key-value storage system. Overall improvement.

本发明采用如下技术方案:The present invention adopts following technical scheme:

一种基于多树转换机制的键值存储方法,包括:A key-value storage method based on a multi-tree conversion mechanism, comprising:

对于写入的键值数据,首先保存至写入跳表中,当大小达到限制后,转换为只读跳表插入至内存设备中的B+树;当B+树的大小达到一定限制时,遍历键值数据;For the written key-value data, it is first stored in the write skip table. When the size reaches the limit, it is converted into a read-only skip table and inserted into the B+ tree in the memory device; when the size of the B+ tree reaches a certain limit, the keys are traversed. value data;

根据热度策略进行判断,将热度低的键值数据持久化至外存设备中的冷树的0层,热度高的键值数据则继续留存在B+树中;Judging according to the heat strategy, the key-value data with low heat will be persisted to the 0th layer of the cold tree in the external storage device, and the key-value data with high heat will continue to remain in the B+ tree;

若冷树的0层中的键值数据文件数量达到大小限制,则触发0层分区操作,0层分区操作依据层中现有的键值数据范围进行均分,若0层中某一分区的键值数据文件数量达到大小限制,则再次触发分区操作;If the number of key-value data files in layer 0 of the cold tree reaches the size limit, the layer-0 partition operation is triggered, and the layer-0 partition operation is equally divided according to the existing key-value data range in the layer. When the number of key-value data files reaches the size limit, the partition operation is triggered again;

当B+树中热度较低的键值数据执行持久化操作时,0层分区只接收符合设定范围的键值数据;若冷树中范围内的键值数据经热度策略判断后达到一定热度时,则将这一范围内的键值数据转移至外存设备的热树中,并使用内存设备中的B+树索引热树中的键值数据;When the low-heat key-value data in the B+ tree performs the persistence operation, the 0-tier partition only receives the key-value data that meets the set range; , then transfer the key-value data in this range to the hot tree of the external memory device, and use the B+ tree in the memory device to index the key-value data in the hot tree;

当热树的大小达到一定限制时,对键值数据根据热度策略进行判断,将一个或多个低热度范围内的键值数据转移至冷树分区中,并从B+树中删除这些数据的索引。When the size of the hot tree reaches a certain limit, the key-value data is judged according to the heat policy, and one or more key-value data in the low-heat range are transferred to the cold tree partition, and the indexes of these data are deleted from the B+ tree. .

具体地,所述热度策略具体为:Specifically, the heat strategy is specifically:

将只读跳表中的键值数据插入至B+树结构中时,保存插入时间字段,用于判断键值数据的热度情况;When the key-value data in the read-only skip table is inserted into the B+ tree structure, the insertion time field is saved to judge the popularity of the key-value data;

根据节点中记录的时间字段判断该键值数据是否需要保留在B+树中,若不需保留,则持久化至外存设备的冷树分区内;According to the time field recorded in the node, determine whether the key-value data needs to be kept in the B+ tree, if not, it will be persisted to the cold tree partition of the external storage device;

对于外存设备中的键值数据,使用全局热度表来记录键值数据的操作数,当执行冷树中的合并操作时,若一个位于冷树内的范围单位的操作数达到阈值,则将此范围单位内的所有键值数据转移至热树中;For the key-value data in the external storage device, the global heat table is used to record the operands of the key-value data. When the merge operation in the cold tree is performed, if the operand of a range unit located in the cold tree reaches the threshold, the All key-value data within this range unit is transferred to the hot tree;

当热树达到一定大小限制时,使用全局热度表对热树内的键值数据执行热度判断,根据键值数据的范围热度情况,判断该范围内的键值数据是否需要保留热树中,若不需保留,则将热树中这些范围内的键值数据转移至冷树中。When the hot tree reaches a certain size limit, use the global heat table to perform heat judgment on the key-value data in the hot tree, and judge whether the key-value data in the range needs to be retained in the hot tree according to the range of the key-value data. If no retention is required, the key-value data in these ranges in the hot tree is transferred to the cold tree.

具体地,还包括数据读取步骤,具体为:Specifically, it also includes a data reading step, specifically:

S11:首先在缓存区中查找目标数据,若找到,到步骤S12;若未找到,到步骤S13;S11: First, search for the target data in the cache area, if found, go to step S12; if not, go to step S13;

S12:系统完成用户的读取请求,向用户发送读取到的目标数据;S12: The system completes the user's read request and sends the read target data to the user;

S13:在B+树中查找目标键及值类型,若找到且值类型为实际值,到步骤S12,若找到且值类型为地址信息,到步骤S14;若未找到,到步骤S15;S13: Find the target key and value type in the B+ tree, if found and the value type is the actual value, go to step S12, if found and the value type is address information, go to step S14; if not found, go to step S15;

S14:在外存设备的热树中根据上步得到的地址读取目标数据,到步骤S12;S14: Read the target data in the hot tree of the external storage device according to the address obtained in the previous step, and go to step S12;

S15:在外存设备的冷树中自顶向下按层查找目标数据,若找到,到步骤S12;若未找到,到步骤S16;S15: Search the target data from top to bottom in the cold tree of the external storage device, if found, go to step S12; if not, go to step S16;

S16:系统完成用户的读取请求,向用户发送结果,未找到目标数据。S16: The system completes the user's read request and sends the result to the user, but the target data is not found.

具体地,还包括数据写入步骤,具体为:Specifically, it also includes a data writing step, specifically:

S21:首先写入跳表结构中,如果达到大小限制,到步骤S22;S21: First write into the skip table structure, if the size limit is reached, go to step S22;

S22:将此跳表的转换为只读跳表,并创建新的写入跳表结构用于数据写入;S22: Convert this skip table to a read-only skip table, and create a new write skip table structure for data writing;

S23:若B+树大小未达到限制,到步骤S25;若达到限制,到步骤S24;S23: If the size of the B+ tree does not reach the limit, go to step S25; if it reaches the limit, go to step S24;

S24:遍历B+树中的键值数据,将热度较低的键值数据持久化至外存设备的冷树中,到步骤S25;S24: Traverse the key-value data in the B+ tree, and persist the key-value data with lower heat to the cold tree of the external storage device, and go to step S25;

S25:将只读跳表结构内的键值数据与当前时间保存至B+树结构中;S25: Save the key-value data and the current time in the read-only skip table structure to the B+ tree structure;

S26:系统完成用户的写入请求,向用户发送写入成功的信息。S26: The system completes the user's writing request, and sends a successful writing information to the user.

具体地,应用的存储结构具体为:Specifically, the storage structure of the application is as follows:

写入跳表、只读跳表、B+树以及磁盘中的冷树结构与热树结构;其中所述冷树结构为3层日志结构合并树,包括0层日志结构、1层日志结构、2层日志结构,且0层日志结构存在分区结构;热树结构为单层日志结构合并树。Write skip table, read-only skip table, B+ tree, and cold tree structure and hot tree structure in disk; wherein the cold tree structure is a 3-layer log structure merge tree, including 0-level log structure, 1-level log structure, 2-level log structure Layer log structure, and the 0-layer log structure has a partition structure; the hot tree structure is a single-layer log structure merge tree.

由上述对本发明的描述可知,与现有技术相比,本发明具有如下有益效果:As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following beneficial effects:

(1)本发明提出一种基于多树转换机制的键值存储方法,具体包括:对于写入的键值数据,首先保存至写入跳表中,当大小达到限制后,转换为只读跳表插入至内存设备中的B+树;当B+树的大小达到一定限制时,遍历键值数据;根据热度策略进行判断,将热度低的键值数据持久化至外存设备中的冷树的0层,热度高的键值数据则继续留存在B+树中;若冷树的0层中的键值数据文件数量达到大小限制,则触发0层分区操作,0层分区操作依据层中现有的键值数据范围进行均分,若0层中某一分区的键值数据文件数量达到大小限制,则再次触发分区操作;当B+树中热度较低的键值数据执行持久化操作时,0层分区只接收符合设定范围的键值数据;若冷树中范围内的键值数据经热度策略判断后达到一定热度时,则将这一范围内的键值数据转移至外存设备的热树中,并使用内存设备中的B+树索引热树中的键值数据;当热树的大小达到一定限制时,对键值数据根据热度策略进行判断,将一个或多个低热度范围内的键值数据转移至冷树分区中,并从B+树中删除这些数据的索引。本发明提供的方法使用热度策略减少读放大的同时,保证写入性能,实现键值存储系统性能的整体提升。(1) The present invention proposes a key-value storage method based on a multi-tree conversion mechanism, which specifically includes: firstly save the key-value data written in the write-in jump table, and when the size reaches the limit, convert it into a read-only jump table Insert the table into the B+ tree in the memory device; when the size of the B+ tree reaches a certain limit, traverse the key-value data; judge according to the heat policy, and persist the key-value data with low heat to 0 of the cold tree in the external storage device Layer, the hot key-value data will continue to remain in the B+ tree; if the number of key-value data files in the cold tree layer 0 reaches the size limit, the 0-layer partition operation is triggered, and the 0-layer partition operation is based on the existing layer in the layer. The range of key-value data is divided equally. If the number of key-value data files in a partition in layer 0 reaches the size limit, the partition operation will be triggered again; The partition only receives key-value data that meets the set range; if the key-value data in the cold tree reaches a certain degree of popularity after being judged by the heat strategy, the key-value data in this range will be transferred to the hot tree of the external storage device and use the B+ tree in the memory device to index the key-value data in the hot tree; when the size of the hot tree reaches a certain limit, the key-value data is judged according to the heat policy, and one or more keys in the low heat range are The value data is moved to the cold tree partition, and the index of this data is removed from the B+ tree. The method provided by the present invention uses a heat strategy to reduce read amplification, and at the same time ensures the write performance, and realizes the overall improvement of the performance of the key-value storage system.

(2)本发明一方面,通过热度策略决定持久化操作,将热数据暂留在内存中,避免从外存设备中读取造成额外IO开销;另一方面,使用全局热度判断策略,索引外存设备中频繁访问的键值数据,加速键值数据的读取过程,提高读取性能。(2) On the one hand, the present invention determines the persistence operation through the heat strategy, and temporarily keeps the hot data in the memory to avoid additional IO overhead caused by reading from the external storage device; It stores frequently accessed key-value data in the device, accelerates the reading process of key-value data, and improves reading performance.

(3)本发明使用内存中的B+树保存热树中的地址信息,能够索引到热树中的所有有效数据,从而减少热数据的读取次数,并且B+树能够保证一定的范围查找性能。通过减少冷树的层级与0层分区的设计,减少冷数据的读取次数,减小了读放大。(3) The present invention uses the B+ tree in the memory to store the address information in the hot tree, and can index all valid data in the hot tree, thereby reducing the number of reads of the hot data, and the B+ tree can ensure a certain range of search performance. By reducing the design of cold tree levels and 0-level partitions, the number of reads of cold data is reduced and read amplification is reduced.

(4)本发明使用B+树保存热数据,能够避免键值数据执行更新操作时导致的多次持久化操作。传统LSM树的层间合并操作能够实现空间回收,但使得相同数据重复写入。使用冷热树之间的转换替代传统的层间合并操作,既能提高读性能又能减少数据写入量。(4) The present invention uses the B+ tree to store hot data, which can avoid multiple persistence operations caused when the key-value data performs an update operation. The inter-layer merge operation of the traditional LSM tree can achieve space reclamation, but it makes the same data repeatedly written. Using the conversion between hot and cold trees to replace the traditional inter-layer merge operation can not only improve the read performance but also reduce the amount of data writing.

附图说明Description of drawings

图1为基于多树转换机制的键值存储方法的系统结构图;Fig. 1 is a system structure diagram of a key-value storage method based on a multi-tree conversion mechanism;

图2为基于多树转换机制的键值存储方法的数据读取流程图;Fig. 2 is the data reading flow chart of the key-value storage method based on multi-tree conversion mechanism;

图3为基于多树转换机制的键值存储方法的数据写入流程图。FIG. 3 is a data writing flow chart of the key-value storage method based on the multi-tree conversion mechanism.

具体实施方式Detailed ways

以下通过具体实施方式对本发明作进一步的描述。The present invention will be further described below through specific embodiments.

如图1,为本发明实施例提供的基于多树转换机制的键值存储方法的系统结构图;应用的存储结构具体为:Figure 1 is a system structure diagram of a key-value storage method based on a multi-tree conversion mechanism provided by an embodiment of the present invention; the applied storage structure is specifically:

写入跳表、只读跳表、B+树以及磁盘中的冷树结构与热树结构;其中所述冷树结构为3层日志结构合并树,包括0层日志结构、1层日志结构、2层日志结构,且0层日志结构存在分区结构;热树结构为单层日志结构合并树;Write skip table, read-only skip table, B+ tree, and cold tree structure and hot tree structure in disk; wherein the cold tree structure is a 3-layer log structure merge tree, including 0-level log structure, 1-level log structure, 2-level log structure Layer log structure, and the 0-layer log structure has a partition structure; the hot tree structure is a single-layer log structure merge tree;

存储方法具体包括:首先保存至写入跳表中,当大小达到限制后,转换为只读跳表插入至内存设备中的B+树;当B+树的大小达到一定限制时,根据热度策略遍历其中的键值数据进行判断;The storage method specifically includes: first saving to the write jump table, when the size reaches the limit, converting it into a read-only jump table and inserting it into the B+ tree in the memory device; when the size of the B+ tree reaches a certain limit, traverse it according to the heat strategy The key-value data to judge;

将热度较低的键值数据持久化至外存设备中的冷树的0层,热度较高的键值数据则继续留存在B+树中;Persist key-value data with low heat to layer 0 of the cold tree in the external storage device, and key-value data with high heat will continue to remain in the B+ tree;

若冷树的0层中的键值数据文件数量达到大小限制,则触发数据合并操作与0层分区操作;0层分区操作依据层中现有的键值数据范围进行均分;若0层中某一分区的键值数据文件数量达到大小限制,则再次触发数据合并操作与分区操作;If the number of key-value data files in the 0th layer of the cold tree reaches the size limit, the data merge operation and the 0th layer partition operation are triggered; the 0th layer partition operation is divided equally according to the existing key value data range in the layer; When the number of key-value data files in a partition reaches the size limit, the data merge operation and partition operation are triggered again;

当B+树中热度较低的键值数据执行持久化操作时,0层分区只接收符合自身范围的键值数据。如果冷树中某一范围内的键值数据经热度策略判断达到一定热度时,那么将这一范围内的所有键值数据转移至外存设备的热树中,并在内存设备中的B+树插入键与地址信息用于索引热树中的键值数据。When the low-heat key-value data in the B+ tree is persisted, the 0-tier partition only receives key-value data that matches its own range. If the key-value data in a certain range in the cold tree reaches a certain degree of heat according to the heat strategy, then all key-value data in this range will be transferred to the hot tree of the external storage device, and the B+ tree in the memory device will be transferred. The insertion key and address information is used to index the key-value data in the hot tree.

当热树的大小达到一定限制时,对其中的键值数据根据热度策略进行遍历判断,将一个或多个低热度范围内的键值数据转移至冷树分区中,并从B+树中删除这些数据的索引;When the size of the hot tree reaches a certain limit, the key-value data in it is traversed and judged according to the heat policy, and one or more key-value data in the low-heat range are transferred to the cold tree partition, and these data are deleted from the B+ tree. the index of the data;

具体地,所述热度策略具体为:Specifically, the heat strategy is specifically:

将只读跳表中的键值数据插入至B+树结构中时,保存插入时间字段,用于判断键值数据的热度情况;When the key-value data in the read-only skip table is inserted into the B+ tree structure, the insertion time field is saved to judge the popularity of the key-value data;

根据节点中记录的时间字段判断该键值数据是否需要保留在B+树中,若不需保留,则持久化至外存设备的冷树分区内;According to the time field recorded in the node, determine whether the key-value data needs to be kept in the B+ tree, if not, it will be persisted to the cold tree partition of the external storage device;

对于外存设备中的键值数据,使用全局热度表来记录键值数据的操作数,当执行冷树中的合并操作时,若一个位于冷树内的范围单位的操作数达到阈值,则将此范围单位内的所有键值数据转移至热树中;For the key-value data in the external storage device, the global heat table is used to record the operands of the key-value data. When the merge operation in the cold tree is performed, if the operand of a range unit located in the cold tree reaches the threshold, the All key-value data within this range unit is transferred to the hot tree;

当热树达到一定大小限制时,使用全局热度表对热树内的键值数据执行热度判断,根据键值数据的范围热度情况,判断该范围内的键值数据是否需要保留热树中,若不需保留,则将热树中这些范围内的键值数据转移至冷树中。When the hot tree reaches a certain size limit, use the global heat table to perform heat judgment on the key-value data in the hot tree, and judge whether the key-value data in the range needs to be retained in the hot tree according to the range of the key-value data. If no retention is required, the key-value data in these ranges in the hot tree is transferred to the cold tree.

如图2为基于多树转换机制的键值存储方法的数据读取流程图。Figure 2 is a data reading flow chart of the key-value storage method based on the multi-tree conversion mechanism.

101、用户向系统发送的读取请求,到102。101. A read request sent by the user to the system, go to 102.

102、在缓存区中查找目标数据,若找到,获取目标数据,到103,否则到104。102. Search for the target data in the buffer area, if found, obtain the target data, go to 103, otherwise go to 104.

103、向用户返回查询结果。103. Return the query result to the user.

104、在B+树中查找目标数据,若未找到,到105,否则到106。104. Search for the target data in the B+ tree, if not found, go to 105, otherwise go to 106.

105、在冷树中自顶向下逐层查找数据,若找到,获取目标数据,到103,否则,返回结果不存在,到103。105. Search for data layer by layer from top to bottom in the cold tree, if found, obtain the target data, go to 103, otherwise, return the result if it does not exist, go to 103.

106、判断值类型是否为地址信息,若不是,获取目标数据,到103,否则到107。106. Determine whether the value type is address information, if not, obtain the target data, go to 103, otherwise go to 107.

107、根据上步找到的地址信息,获取目标数据,到103。107. Obtain target data according to the address information found in the previous step, and go to 103.

如图3为基于多树转换机制的键值存储方法的数据写入流程图。Figure 3 is a data writing flow chart of the key-value storage method based on the multi-tree conversion mechanism.

201、用户向系统发送的写入请求,到202。201. A write request sent by the user to the system, go to 202.

202、判断写入跳表是否已满,若未满,到211,否则到203。202. Determine whether the write skip table is full, if not, go to 211, otherwise go to 203.

203、写入跳表转换为只读跳表,到204。203. Convert the write skip table to a read-only skip table, and go to 204.

204、判断缓存区的大小是否已满,若未满,到205,否则到206。204. Determine whether the size of the buffer area is full, if not, go to 205, otherwise go to 206.

205、创建新的写入跳表,到202。205. Create a new write skip table, go to 202.

206、判断B+树的大小是否达到限制,若是,到207,否则到208。206. Determine whether the size of the B+ tree reaches the limit, if so, go to 207, otherwise go to 208.

207、判断冷树0层分区内文件数是否达到限制,若是,到209,否则到210。207. Determine whether the number of files in the cold tree layer 0 partition reaches the limit, if so, go to 209, otherwise go to 210.

208、将只读跳表插入至B+树中,到204。208. Insert the read-only skip list into the B+ tree, and go to 204.

209、对冷树的0层执行合并操作与分区操作,到210。209. Perform a merge operation and a partition operation on the 0 layer of the cold tree, and go to 210.

210、依据热度策略持久化B+树中低热度数据至冷树中,到206。210. Persist the low-heat data in the B+ tree to the cold tree according to the heat strategy, and go to 206.

211、写入数据到写入跳表中,到212。211. Write data into the write skip table, go to 212.

212、向用户返回写入结果。212. Return the writing result to the user.

(1)本发明提出一种基于多树转换机制的键值存储方法,具体包括:对于写入的键值数据,首先保存至写入跳表中,当大小达到限制后,转换为只读跳表插入至内存设备中的B+树;当B+树的大小达到一定限制时,遍历键值数据;根据热度策略进行判断,将热度低的键值数据持久化至外存设备中的冷树的0层,热度高的键值数据则继续留存在B+树中;若冷树的0层中的键值数据文件数量达到大小限制,则触发0层分区操作,0层分区操作依据层中现有的键值数据范围进行均分,若0层中某一分区的键值数据文件数量达到大小限制,则再次触发分区操作;当B+树中热度较低的键值数据执行持久化操作时,0层分区只接收符合设定范围的键值数据;若冷树中范围内的键值数据经热度策略判断后达到一定热度时,则将这一范围内的键值数据转移至外存设备的热树中,并使用内存设备中的B+树索引热树中的键值数据;当热树的大小达到一定限制时,对键值数据根据热度策略进行判断,将一个或多个低热度范围内的键值数据转移至冷树分区中,并从B+树中删除这些数据的索引。本发明提供的方法使用热度策略减少读放大的同时,保证写入性能,实现键值存储系统性能的整体提升。(1) The present invention proposes a key-value storage method based on a multi-tree conversion mechanism, which specifically includes: firstly save the key-value data written in the write-in jump table, and when the size reaches the limit, convert it into a read-only jump table Insert the table into the B+ tree in the memory device; when the size of the B+ tree reaches a certain limit, traverse the key-value data; judge according to the heat policy, and persist the key-value data with low heat to 0 of the cold tree in the external storage device Layer, the hot key-value data will continue to remain in the B+ tree; if the number of key-value data files in the cold tree layer 0 reaches the size limit, the 0-layer partition operation is triggered, and the 0-layer partition operation is based on the existing layer in the layer. The range of key-value data is divided equally. If the number of key-value data files in a partition in layer 0 reaches the size limit, the partition operation will be triggered again; The partition only receives key-value data that meets the set range; if the key-value data in the cold tree reaches a certain degree of popularity after being judged by the heat strategy, the key-value data in this range will be transferred to the hot tree of the external storage device and use the B+ tree in the memory device to index the key-value data in the hot tree; when the size of the hot tree reaches a certain limit, the key-value data is judged according to the heat policy, and one or more keys in the low heat range are The value data is moved to the cold tree partition, and the index of this data is removed from the B+ tree. The method provided by the present invention uses a heat strategy to reduce read amplification, and at the same time ensures the write performance, and realizes the overall improvement of the performance of the key-value storage system.

(2)本发明一方面,通过热度策略决定持久化操作,将热数据暂留在内存中,避免从外存设备中读取造成额外IO开销;另一方面,使用全局热度判断策略,索引外存设备中频繁访问的键值数据,加速键值数据的读取过程,提高读取性能。(2) On the one hand, the present invention determines the persistence operation through the heat strategy, and temporarily keeps the hot data in the memory to avoid additional IO overhead caused by reading from the external storage device; It stores frequently accessed key-value data in the device, accelerates the reading process of key-value data, and improves reading performance.

(3)本发明使用内存中的B+树保存热树中的地址信息,能够索引到热树中的所有有效数据,从而减少热数据的读取次数,并且B+树能够保证一定的范围查找性能。通过减少冷树的层级与0层分区的设计,减少冷数据的读取次数,减小了读放大。(3) The present invention uses the B+ tree in the memory to store the address information in the hot tree, and can index all valid data in the hot tree, thereby reducing the number of reads of the hot data, and the B+ tree can ensure a certain range of search performance. By reducing the design of cold tree levels and 0-level partitions, the number of reads of cold data is reduced and read amplification is reduced.

(4)本发明使用B+树保存热数据,能够避免键值数据执行更新操作时导致的多次持久化操作。传统LSM树的层间合并操作能够实现空间回收,但使得相同数据重复写入。使用冷热树之间的转换替代传统的层间合并操作,既能提高读性能又能减少数据写入量。(4) The present invention uses the B+ tree to store hot data, which can avoid multiple persistence operations caused when the key-value data performs an update operation. The inter-layer merge operation of the traditional LSM tree can achieve space reclamation, but it makes the same data repeatedly written. Using the conversion between hot and cold trees to replace the traditional inter-layer merge operation can not only improve the read performance but also reduce the amount of data writing.

上述仅为本发明的具体实施方式,但本发明的设计构思并不局限于此,凡利用此构思对本发明进行非实质性的改动,均应属于侵犯本发明保护范围的行为。The above are only specific embodiments of the present invention, but the design concept of the present invention is not limited to this, and any non-substantial modification of the present invention by using this concept should be regarded as an act of infringing the protection scope of the present invention.

Claims (5)

1.一种基于多树转换机制的键值存储方法,其特征在于,包括:1. a key-value storage method based on multi-tree conversion mechanism, is characterized in that, comprises: 对于写入的键值数据,首先保存至写入跳表中,当大小达到限制后,转换为只读跳表插入至内存设备中的B+树;当B+树的大小达到一定限制时,遍历键值数据;For the written key-value data, it is first stored in the write skip table. When the size reaches the limit, it is converted into a read-only skip table and inserted into the B+ tree in the memory device; when the size of the B+ tree reaches a certain limit, the keys are traversed. value data; 根据热度策略进行判断,将热度低的键值数据持久化至外存设备中的冷树的0层,热度高的键值数据则继续留存在B+树中;Judging according to the heat strategy, the key-value data with low heat will be persisted to the 0th layer of the cold tree in the external storage device, and the key-value data with high heat will continue to remain in the B+ tree; 若冷树的0层中的键值数据文件数量达到大小限制,则触发0层分区操作,0层分区操作依据层中现有的键值数据范围进行均分,若0层中某一分区的键值数据文件数量达到大小限制,则再次触发分区操作;If the number of key-value data files in layer 0 of the cold tree reaches the size limit, the layer-0 partition operation is triggered, and the layer-0 partition operation is equally divided according to the existing key-value data range in the layer. When the number of key-value data files reaches the size limit, the partition operation is triggered again; 当B+树中热度较低的键值数据执行持久化操作时,0层分区只接收符合设定范围的键值数据;若冷树中范围内的键值数据经热度策略判断后达到一定热度时,则将这一范围内的键值数据转移至外存设备的热树中,并使用内存设备中的B+树索引热树中的键值数据;When the low-heat key-value data in the B+ tree performs the persistence operation, the 0-tier partition only receives the key-value data that meets the set range; , then transfer the key-value data in this range to the hot tree of the external memory device, and use the B+ tree in the memory device to index the key-value data in the hot tree; 当热树的大小达到一定限制时,对键值数据根据热度策略进行判断,将一个或多个低热度范围内的键值数据转移至冷树分区中,并从B+树中删除这些数据的索引。When the size of the hot tree reaches a certain limit, the key-value data is judged according to the heat policy, and one or more key-value data in the low-heat range are transferred to the cold tree partition, and the indexes of these data are deleted from the B+ tree. . 2.根据权利要求1所述的一种基于多树转换机制的键值存储方法,其特征在于,所述热度策略具体为:2. a kind of key-value storage method based on multi-tree conversion mechanism according to claim 1, is characterized in that, described heat strategy is specifically: 将只读跳表中的键值数据插入至B+树结构中时,保存插入时间字段,用于判断键值数据的热度情况;When the key-value data in the read-only skip table is inserted into the B+ tree structure, the insertion time field is saved to judge the popularity of the key-value data; 根据节点中记录的时间字段判断该键值数据是否需要保留在B+树中,若不需保留,则持久化至外存设备的冷树分区内;According to the time field recorded in the node, determine whether the key-value data needs to be kept in the B+ tree, if not, it will be persisted to the cold tree partition of the external storage device; 对于外存设备中的键值数据,使用全局热度表来记录键值数据的操作数,当执行冷树中的合并操作时,若一个位于冷树内的范围单位的操作数达到阈值,则将此范围单位内的所有键值数据转移至热树中;For the key-value data in the external storage device, the global heat table is used to record the operands of the key-value data. When the merge operation in the cold tree is performed, if the operand of a range unit located in the cold tree reaches the threshold, the All key-value data within this range unit is transferred to the hot tree; 当热树达到一定大小限制时,使用全局热度表对热树内的键值数据执行热度判断,根据键值数据的范围热度情况,判断该范围内的键值数据是否需要保留热树中,若不需保留,则将热树中这些范围内的键值数据转移至冷树中。When the hot tree reaches a certain size limit, use the global heat table to perform heat judgment on the key-value data in the hot tree, and judge whether the key-value data in the range needs to be retained in the hot tree according to the range of the key-value data. If no retention is required, the key-value data in these ranges in the hot tree is transferred to the cold tree. 3.根据权利要求1或2所述的一种基于多树转换机制的键值存储方法,其特征在于,还包括数据读取步骤,具体为:3. a kind of key-value storage method based on multi-tree conversion mechanism according to claim 1 and 2, is characterized in that, also comprises data reading step, is specially: S11:首先在缓存区中查找目标数据,若找到,到步骤S12;若未找到,到步骤S13;S11: First, search for the target data in the cache area, if found, go to step S12; if not, go to step S13; S12:系统完成用户的读取请求,向用户发送读取到的目标数据;S12: The system completes the user's read request and sends the read target data to the user; S13:在B+树中查找目标键及值类型,若找到且值类型为实际值,到步骤S12,若找到且值类型为地址信息,到步骤S14;若未找到,到步骤S15;S13: Find the target key and value type in the B+ tree, if found and the value type is the actual value, go to step S12, if found and the value type is address information, go to step S14; if not found, go to step S15; S14:在外存设备的热树中根据上步得到的地址读取目标数据,到步骤S12;S14: Read the target data in the hot tree of the external storage device according to the address obtained in the previous step, and go to step S12; S15:在外存设备的冷树中自顶向下按层查找目标数据,若找到,到步骤S12;若未找到,到步骤S16;S15: Search the target data from top to bottom in the cold tree of the external storage device, if found, go to step S12; if not, go to step S16; S16:系统完成用户的读取请求,向用户发送结果,未找到目标数据。S16: The system completes the user's read request and sends the result to the user, but the target data is not found. 4.根据权利要求1或2所述的一种基于多树转换机制的键值存储方法,其特征在于,还包括数据写入步骤,具体为:4. a kind of key-value storage method based on multi-tree conversion mechanism according to claim 1 and 2, is characterized in that, also comprises data writing step, is specially: S21:首先写入跳表结构中,如果达到大小限制,到步骤S22;S21: First write into the skip table structure, if the size limit is reached, go to step S22; S22:将此跳表的转换为只读跳表,并创建新的写入跳表结构用于数据写入;S22: Convert this skip table to a read-only skip table, and create a new write skip table structure for data writing; S23:若B+树大小未达到限制,到步骤S25;若达到限制,到步骤S24;S23: If the size of the B+ tree does not reach the limit, go to step S25; if it reaches the limit, go to step S24; S24:遍历B+树中的键值数据,将热度较低的键值数据持久化至外存设备的冷树中,到步骤S25;S24: Traverse the key-value data in the B+ tree, and persist the key-value data with lower heat to the cold tree of the external storage device, and go to step S25; S25:将只读跳表结构内的键值数据与当前时间保存至B+树结构中;S25: Save the key-value data and the current time in the read-only skip table structure to the B+ tree structure; S26:系统完成用户的写入请求,向用户发送写入成功的信息。S26: The system completes the user's writing request, and sends a successful writing information to the user. 5.根据权利要求1或2所述的一种基于多树转换机制的键值存储方法,其特征在于,应用的存储结构具体为:5. a kind of key-value storage method based on multi-tree conversion mechanism according to claim 1 and 2, is characterized in that, the storage structure of application is specifically: 写入跳表、只读跳表、B+树以及磁盘中的冷树结构与热树结构;其中所述冷树结构为3层日志结构合并树,包括0层日志结构、1层日志结构、2层日志结构,且0层日志结构存在分区结构;热树结构为单层日志结构合并树。Write skip table, read-only skip table, B+ tree, and cold tree structure and hot tree structure in disk; wherein the cold tree structure is a 3-layer log structure merge tree, including 0-level log structure, 1-level log structure, 2-level log structure Layer log structure, and the 0-layer log structure has a partition structure; the hot tree structure is a single-layer log structure merge tree.
CN202210711424.3A 2022-06-22 2022-06-22 A key-value storage method based on multi-tree conversion mechanism Active CN114996275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210711424.3A CN114996275B (en) 2022-06-22 2022-06-22 A key-value storage method based on multi-tree conversion mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210711424.3A CN114996275B (en) 2022-06-22 2022-06-22 A key-value storage method based on multi-tree conversion mechanism

Publications (2)

Publication Number Publication Date
CN114996275A true CN114996275A (en) 2022-09-02
CN114996275B CN114996275B (en) 2024-08-16

Family

ID=83036517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210711424.3A Active CN114996275B (en) 2022-06-22 2022-06-22 A key-value storage method based on multi-tree conversion mechanism

Country Status (1)

Country Link
CN (1) CN114996275B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116149550A (en) * 2022-12-29 2023-05-23 北京趋动智能科技有限公司 Data management method, device, storage medium and equipment
CN118092812A (en) * 2024-04-18 2024-05-28 华侨大学 Key value storage and read-write method based on memory table index and iterator reduction mechanism
CN118626685A (en) * 2024-08-09 2024-09-10 杭州新视窗信息技术有限公司 A multi-level data node storage index method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089244A1 (en) * 2016-09-26 2018-03-29 Vmware, Inc. Key-value stores implemented using fragmented log-structured merge trees
CN111026329A (en) * 2019-11-18 2020-04-17 华中科技大学 Key value storage system based on host management tile record disk and data processing method
CN112100293A (en) * 2020-09-23 2020-12-18 腾讯科技(深圳)有限公司 Data processing method, data access method, data processing device, data access device and computer equipment
CN112486994A (en) * 2020-11-30 2021-03-12 武汉大学 Method for quickly reading data of key value storage based on log structure merging tree
CN113626431A (en) * 2021-07-28 2021-11-09 浪潮云信息技术股份公司 LSM tree-based key value separation storage method and system for delaying garbage recovery
CN113760171A (en) * 2020-06-01 2021-12-07 华为技术有限公司 Metadata storage method and device
CN113821171A (en) * 2021-09-01 2021-12-21 浪潮云信息技术股份公司 Key value storage method based on hash table and LSM tree
CN114416646A (en) * 2022-01-20 2022-04-29 上海妃鱼网络科技有限公司 Data processing method and device of hierarchical storage system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089244A1 (en) * 2016-09-26 2018-03-29 Vmware, Inc. Key-value stores implemented using fragmented log-structured merge trees
CN111026329A (en) * 2019-11-18 2020-04-17 华中科技大学 Key value storage system based on host management tile record disk and data processing method
CN113760171A (en) * 2020-06-01 2021-12-07 华为技术有限公司 Metadata storage method and device
CN112100293A (en) * 2020-09-23 2020-12-18 腾讯科技(深圳)有限公司 Data processing method, data access method, data processing device, data access device and computer equipment
CN112486994A (en) * 2020-11-30 2021-03-12 武汉大学 Method for quickly reading data of key value storage based on log structure merging tree
CN113626431A (en) * 2021-07-28 2021-11-09 浪潮云信息技术股份公司 LSM tree-based key value separation storage method and system for delaying garbage recovery
CN113821171A (en) * 2021-09-01 2021-12-21 浪潮云信息技术股份公司 Key value storage method based on hash table and LSM tree
CN114416646A (en) * 2022-01-20 2022-04-29 上海妃鱼网络科技有限公司 Data processing method and device of hierarchical storage system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116149550A (en) * 2022-12-29 2023-05-23 北京趋动智能科技有限公司 Data management method, device, storage medium and equipment
CN116149550B (en) * 2022-12-29 2024-08-02 北京趋动智能科技有限公司 Data management method, device, storage medium and equipment
CN118092812A (en) * 2024-04-18 2024-05-28 华侨大学 Key value storage and read-write method based on memory table index and iterator reduction mechanism
CN118626685A (en) * 2024-08-09 2024-09-10 杭州新视窗信息技术有限公司 A multi-level data node storage index method and system

Also Published As

Publication number Publication date
CN114996275B (en) 2024-08-16

Similar Documents

Publication Publication Date Title
CN110825748B (en) High-performance and easily-expandable key value storage method by utilizing differentiated indexing mechanism
CN114996275B (en) A key-value storage method based on multi-tree conversion mechanism
CN102646069B (en) Method for prolonging service life of solid-state disk
CN110347852B (en) File system and file management method embedded in horizontally scalable key-value storage system
CN105117415B (en) A kind of SSD data-updating methods of optimization
CN102364474B (en) Metadata storage system for cluster file system and metadata management method
CN107368436B (en) Flash memory cold and hot data separated storage method combined with address mapping table
US10740251B2 (en) Hybrid drive translation layer
US20100211616A1 (en) Performance by Avoiding Disk I/O for Deduplicated File Blocks
CN104166634A (en) Management method of mapping table caches in solid-state disk system
CN114356877A (en) A log structure merge tree hierarchical storage method and system based on persistent memory
CN101488153A (en) Method for implementing high-capacity flash memory file system in embedded type Linux
CN113553476B (en) Key value storage method for reducing write pause by utilizing hash
CN114969069B (en) A heat-aware local update method for key-value storage systems
CN112732725B (en) NVM (non volatile memory) hybrid memory-based adaptive prefix tree construction method, system and medium
CN105975215A (en) STL mapping table management method based on Ondemand algorithm
CN115203079A (en) Method for writing data into solid state disk
CN114741028B (en) A persistent key-value storage method, device and system based on OCSSD
CN104504076A (en) Method for implementing distributed caching with high concurrency and high space utilization rate
CN114415966B (en) Method for constructing KV SSD storage engine
CN114691041A (en) Key value storage system and garbage recycling method
Wang et al. Hotkey-lsm: A hotness-aware lsm-tree for big data storage
CN117406923A (en) Data deleting and managing system based on log structure merging tree
CN118312515B (en) Collaborative invalid key-value pair confirmation method and garbage collection method applied to WiscKey
CN114398007B (en) LSM-tree-based caching optimization method for KV storage system read performance

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant