CN120196288B

CN120196288B - Data storage method, system, electronic device and storage medium

Info

Publication number: CN120196288B
Application number: CN202510668306.2A
Authority: CN
Inventors: 刚亚州; 徐冲冲; 孙明刚
Original assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Current assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Priority date: 2025-05-22
Filing date: 2025-05-22
Publication date: 2025-08-01
Anticipated expiration: 2045-05-22
Also published as: CN120196288A

Abstract

The present application discloses a data storage method, system, electronic device and storage medium, and relates to the field of data storage technology. Since the entire data storage system is split into multiple storage units, each storage unit includes a buffer and a storage tree. Newly written data is first stored in the buffer, and then the data is flushed to the storage tree by flushing the buffer, so as to realize batch writing of data in the storage tree and reduce write amplification of the storage tree. Moreover, when the storage tree hierarchy is high, the high-level storage tree is split into multiple storage subtrees, so as to realize distributed storage of data, avoid the situation where the storage tree hierarchy is too high, reduce the data access delay to the storage tree, and lay the foundation for improving the concurrent access volume and access efficiency of data.

Description

Data storage method, system, electronic equipment and storage medium

Technical Field

The present application relates to the field of data storage technologies, and in particular, to a data storage method, a data storage system, an electronic device, and a storage medium.

Background

With the development of information technology, there is an increasing need for digital data to be stored. In the process of data storage, the concurrent access amount and access efficiency of data need to be considered, so how to efficiently store the data becomes important research content.

In the related art, a b+ tree is generally used for data storage, and when data is read and written, the data is searched from the b+ tree step by step down until the leaves. However, when the data amount to be stored is large, the capacity of a single leaf node of the b+ tree is limited, so that node splitting is required to generate new leaf nodes and non-leaf nodes, the newly generated nodes increase the level number of the b+ tree, and when the level number of the b+ tree is large, the data access delay to the b+ tree is increased, so that the concurrent access amount and access efficiency of the data are not guaranteed.

Disclosure of Invention

The application provides a data storage method, a system, electronic equipment and a storage medium, which at least solve the problems that when the number of levels of a B+ tree is large, the data access delay to the B+ tree is increased in the related art, and the concurrent access quantity and the access efficiency of data are not guaranteed.

The application provides a data storage method, which comprises the following steps:

Acquiring a data storage request;

Screening target storage units from a plurality of storage units according to target storage addresses represented by the data storage requests, wherein each storage unit comprises a buffer and a storage tree;

writing data to be stored indicated by the data storage request into a buffer of a target storage unit;

under the condition that the buffer meets the preset brushing conditions, brushing the data to be brushed in the buffer down to a storage tree;

And splitting the storage tree to split the storage tree into a plurality of storage subtrees under the condition that the current level number of the storage tree exceeds the storage tree level number threshold, wherein the level number of the storage subtrees is equal to the storage tree level number threshold.

The application also provides a data storage device, comprising:

the acquisition module is used for acquiring the data storage request;

The screening module is used for screening target storage units from a plurality of storage units according to the target storage addresses represented by the data storage requests, wherein each storage unit comprises a buffer and a storage tree;

The writing module is used for writing the data to be stored indicated by the data storage request into the buffer of the target storage unit;

The brushing module is used for brushing the data to be brushed in the buffer down to the storage tree under the condition that the buffer meets the preset brushing condition;

And the splitting module is used for splitting the storage tree to split the storage tree into a plurality of storage subtrees under the condition that the current level number of the storage tree exceeds the storage tree level number threshold value, wherein the level number of the storage subtrees is equal to the storage tree level number threshold value.

The application also provides a data storage system, which comprises a plurality of storage units and data storage equipment, wherein each storage unit comprises a buffer and a storage tree;

The data storage device adopts any one of the data storage methods to perform data storage processing on the plurality of storage units.

The application also provides an electronic device comprising a memory for storing a computer program and a processor for implementing the steps of any one of the data storage methods when executing the computer program.

The present application also provides a computer readable storage medium having a computer program stored therein, wherein the computer program when executed by a processor implements the steps of any of the data storage methods described above.

The application also provides a computer program product comprising a computer program which when executed by a processor performs the steps of any of the data storage methods described above.

According to the application, the whole data storage system is split into a plurality of storage units, each storage unit comprises a buffer and a storage tree, newly written data is firstly stored into the buffer, then the data is downwards brushed into the storage tree by downwards brushing the data from the buffer, so that batch writing of the data of the storage tree is realized, the write amplification of the storage tree is reduced, and when the storage tree level is higher, the storage tree of a high level is split into a plurality of storage subtrees, so that the scattered storage of the data is realized, the situation that the storage tree level is too high is avoided, the data access delay of the storage tree is reduced, and the foundation is laid for improving the concurrent access quantity and the access efficiency of the data.

Drawings

For a clearer description of embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described, it being apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for those skilled in the art.

FIG. 1 is a schematic diagram of a network architecture on which an embodiment of the present application is based;

FIG. 2 is a schematic flow chart of a data storage method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an exemplary data storage system according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a memory cell according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a memory tree splitting process according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a merging process of memory trees according to an embodiment of the present application;

FIG. 7 is a schematic diagram of another merging process of memory trees according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a request response flow provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of a data storage device according to an embodiment of the present application;

FIG. 10 is a schematic diagram of a data storage system according to an embodiment of the present application;

fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without making any inventive effort are within the scope of the present application.

It should be noted that in the description of the present application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "first," "second," and the like in this specification are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.

In a data storage system, data granularity (minimum data unit) is the basis of data information storage, and is the minimum unit of data. In recent years, with the development of information technology, massive amounts of data have been generated, but how to effectively manage and organize such massive amounts of data has become a prominent problem. For a large amount of stored data, the data content and the data meaning in the data are analyzed by inquiry, so that the data can be more effectively utilized. Efficient organization and management of metadata in a storage system is an effective means to address this problem, enabling the system to manage and maintain the data.

In the full flash data storage, a large number of high concurrent data access and query problems are necessarily involved, and metadata is effectively managed, so that the concurrent access amount and access efficiency can be increased. Therefore, the effective management method for multiple concurrent reading and writing of metadata in the full flash memory is important, so that the large-scale concurrent random access metadata has higher throughput and smaller time delay.

In the related art, a b+ tree is generally used for data storage, and when data is read and written, the data is searched from the b+ tree step by step down until the leaves. However, when the data volume to be stored is large, because the capacity of a single leaf node of the b+ tree is limited, node splitting is required to generate new leaf nodes and non-leaf nodes, the newly generated nodes increase the level number of the b+ tree, when the level number of the b+ tree is large, the data access delay to the b+ tree is increased, especially in the writing process, the newly written metadata can be returned after the b+ tree is operated, in the inserting and deleting operation, KV insertion (key value pair insertion), node splitting and KV deletion (key value pair deletion) are all related to data movement, copying and the like, and the brought CPU consumption is very high in the whole processing path of the I/O, the performance is not high, and the concurrent access volume and access efficiency of the data are not beneficial to being ensured.

The embodiment of the application provides a data storage method, a system, electronic equipment and a storage medium, and the method comprises the steps of obtaining a data storage request, screening target storage units in a plurality of storage units according to target storage addresses represented by the data storage request, writing data to be stored indicated by the data storage request into a buffer of the target storage units, brushing the data to be brushed down in the buffer to a storage tree when the buffer meets a preset brushing condition, and splitting the storage tree to split the storage tree into a plurality of storage subtrees when the current level number of the storage tree exceeds a storage tree level number threshold value, wherein the level number of the storage subtrees is equal to the storage tree level number threshold value. According to the method provided by the scheme, the whole data storage system is split into the plurality of storage units, each storage unit comprises the buffer and the storage tree, newly written data is firstly stored in the buffer, then the data is downwards brushed to the storage tree through the data downwards brushing to the buffer, so that batch writing of the data of the storage tree is realized, the write amplification of the storage tree is reduced, and when the storage tree level is higher, the storage tree of a high level is split into a plurality of storage subtrees, the scattered storage of the data is realized, the situation that the storage tree level is too high is avoided, the data access delay to the storage tree is reduced, and the foundation is laid for improving the concurrent access quantity and the access efficiency of the data.

The present application will be further described in detail below with reference to the drawings and detailed description for the purpose of enabling those skilled in the art to better understand the aspects of the present application.

The specific application environment architecture or specific hardware architecture upon which the execution of the data storage method depends is described herein.

First, a network structure on which the present application is based will be described:

The data storage method, the system, the electronic equipment and the storage medium provided by the embodiment of the application are suitable for carrying out data storage processing on metadata waiting for storage data. As shown in fig. 1, a network structure diagram according to an embodiment of the present application mainly includes a user side and a server side, where the user side is configured to initiate a data storage request or a data reading request to the server side, and the server side is configured to respond to the data storage request or the data reading request initiated by the user side, and perform corresponding data processing based on the data storage method provided by the embodiment of the present application, so as to implement decentralized storage of data, avoid a situation that a storage tree level is too high, and reduce a data access delay to the storage tree.

The embodiment of the application provides a data storage method which is used for carrying out data storage processing on metadata waiting for storage data. The execution body of the embodiment of the application is electronic equipment, such as a server, a desktop computer, a notebook computer, a tablet computer and other electronic equipment which can be used for data storage.

As shown in fig. 2, a flow chart of a data storage method according to an embodiment of the present application is shown, where the method includes:

step 201, a data storage request is obtained.

The data storage request is a write request initiated by the user side and the upper layer application, the request at least comprises data to be stored and a target storage address, and the target storage address at least comprises a logic block address (Logical Block Address, abbreviated as LBA).

Step 202, selecting a target storage unit from a plurality of storage units according to the target storage address represented by the data storage request.

Each storage unit comprises a buffer and a storage tree, wherein the buffer is a temporary storage area in a memory, and the storage tree adopts a tree index structure such as a B+ tree.

Step 203, writing the data to be stored indicated by the data storage request into the buffer of the target storage unit.

It should be noted that, the buffer is located in the memory area, and the leaf node of the storage tree for storing data is located in the external memory area such as the disk, and since the speed of the memory data writing operation is far higher than that of the external data writing operation, the data to be stored indicated by the data storage request is written into the buffer of the target storage unit, so that the data storage efficiency can be improved. In addition, as the data to be stored can be modified in a certain period, the buffer is more convenient for modifying the data, and meanwhile, the condition that the data is frequently modified to cause writing amplification to the storage tree is avoided.

And 204, under the condition that the buffer meets the preset brushing condition, brushing the data to be brushed in the buffer down to the storage tree.

Specifically, when the buffer usage (storage space occupancy) reaches a preset threshold or reaches a timed refresh period, it is determined that the buffer satisfies a preset flush condition, and the storage space of the buffer can be released for storing subsequent data by flushing the data to be flushed in the buffer down to the storage tree.

In step 205, in the case that the current level number of the storage tree exceeds the storage tree level number threshold, splitting processing is performed on the storage tree to split the storage tree into a plurality of storage subtrees.

Wherein the number of levels of the storage subtrees is equal to the storage tree number of levels threshold.

Specifically, when the number of levels of the storage tree is high (the current number of levels exceeds the threshold of the number of levels of the storage tree), the storage tree is split into a plurality of short trees (storage subtrees), so that the number of non-leaf nodes is reduced, the height of the storage tree is controlled, and the query performance is prevented from being reduced due to the fact that the storage tree is too high.

On the basis of the foregoing embodiments, as a practical implementation manner, in an embodiment, selecting a target storage unit from a plurality of storage units according to a target storage address represented by a data storage request includes:

Step 2021, in a preset static index layer, searching a plurality of routing layers layer by layer according to a target storage address represented by a data storage request, so as to determine a target positioning layer table item corresponding to the target storage address;

step 2022, selecting a target storage unit from the plurality of storage units according to the target positioning layer table entry.

The preset static index layer comprises a plurality of routing layers and positioning layers, the positioning layers comprise a plurality of positioning layer table entries, and each positioning layer table entry corresponds to a storage unit one by one.

Exemplary, as shown in fig. 3, a schematic structural diagram of an exemplary data storage system according to an embodiment of the present application is provided, where the system includes a preset static index layer and a slot layer, and the slot layer includes a plurality of storage units. The routing layer comprises a plurality of routing layer table entries, each routing layer table entry records table entry information in a Key Value pair mode, a starting logical address is used as a Key Value (K), an offset is used as a Value (V), and a logical address range corresponding to the routing layer table entry is [ Key, key+value ]. The offset of the routing layer entries of the adjacent routing layers is 10 times different, and the number of the routing layer entries of the adjacent routing layers is 10 times different, namely the logic address range recorded by the routing layer entries is gradually decreased. By searching a plurality of routing layers layer by layer, the last routing layer can precisely locate the logical address range (routing layer list item) corresponding to the target storage address. And finally, determining the target positioning layer list item corresponding to the target storage address according to the corresponding relation between the routing layer list item of the last routing layer and each positioning layer list item in the positioning layer. The locating layer list item also records list item information in a Key Value pair mode, wherein a starting logic address in the locating layer list item is used as a Key Value, and a pointer Ptr is used as a Value. And screening the target storage unit in the slot layer according to the target pointer represented by the Value of the target positioning layer table item. The dynamic index layer (slot layer) is composed of a plurality of storage slot bits (storage units), each storage slot bit stores a certain range of key value pair data, the key value pair data is pointed by an index item of one positioning layer, the data ranges among the storage slot bits have no intersection, and the storage slot bits are arranged in an increasing sequence according to the array subscript sequence of the positioning layer.

The static index layer is used for accelerating positioning and consists of a routing layer and a positioning layer. Each structural extension creates a new static index layer (routing layer and positioning layer) but does not change once generated. The positioning layer is a large enough index item array, each array element stores a Key and a pointer pointing to a corresponding storage slot bit and is used for indexing the storage slot of the dynamic index layer, the index item Key indicates the maximum Key stored in the storage slot, namely the maximum Key of a tree root node stored in the storage slot, the routing layer is a multi-dimensional array based on the positioning layer and is a class jump table structure, each element is a positioning layer array index, and specific slots (target storage units) of the dynamic index layer are located through multi-level binary search.

In the data storage system provided by the embodiment of the application, a hierarchical index organization mode is provided, wherein the hierarchical index organization mode is divided into a static index and a dynamic index (slot level), the static index of an upper layer is used for accelerating positioning, and the dynamic index layer of a lower layer is composed of a plurality of storage units respectively comprising independent buffers and storage trees and is used for being responsible for persistence of data. The hierarchical structure reduces the number of partial non-leaf nodes, and meanwhile, the number of the non-leaf nodes of the storage tree with high access frequency is far smaller than that of the leaf nodes, so that all the non-leaf nodes are cached in a memory, and the data can be obtained only by one-time external memory access when a read request is in cache miss, thereby improving the read performance. Meanwhile, the use of a plurality of dwarf trees and independent scattered buffer buffers greatly reduces the influence of brushing, and meanwhile, the background periodically expands and maintains the index structure to be always composed of the dwarf trees. The system is initialized to only contain a single storage unit, and the expansion is gradually carried out along with the increase of the data volume.

Accordingly, in one embodiment, after splitting the storage tree to split the storage tree into a plurality of storage subtrees, a corresponding number of new storage units may be generated according to the plurality of storage subtrees, a positioning layer table entry is allocated to the new storage units, and a preset static index layer is modified according to an allocation result of the positioning layer table entry.

Specifically, when the splitting process is performed on the storage tree, the data page cached in the buffer corresponding to the storage tree is correspondingly divided, so that corresponding new storage units are generated for each storage subtree, positioning layer table entries are allocated for the new storage units, and a preset static index layer is adaptively modified, so that when the data stored in the storage subtree is accessed later, the corresponding new storage units can be accurately positioned through the preset static index layer, and the quick access of the data is realized.

On the basis of the foregoing embodiment, as an implementation manner, in an embodiment, in a case that the buffer meets a preset brushing condition, brushing data to be brushed in the buffer down to the storage tree includes:

Step 2041, converting the buffer into a read-only state and creating a new buffer for buffering the data to be stored written in the upper stage in the brushing process under the condition that the buffer meets the preset brushing condition;

In step 2042, the data to be flushed in the read-only status buffer is flushed down the memory tree.

It should be noted that, the buffer stores the increment key pair data of the storage slot bit logic address range in an ordered structure, the increment key pair data includes inserted, modified and deleted key pair data, that is, the data to be flushed is the increment key pair data, and the complete increment key pair data is saved by power failure in the buffer to prevent data loss. The buffer data is flushed by sequentially checking all storage tanks (storage units) of the dynamic index layer (slot layer) from left to right whether the buffer cache of the single storage tank meeting the preset flushing condition is flushed. If the buffer is used for carrying out data brushing because the storage space occupancy rate reaches a preset threshold value, the brushing of large data volume can be dispersed to be carried out for a plurality of times, thereby reducing the structural adjustment and the write amplification cost caused by writing the disk of each dirty data, relieving the performance jitter and reducing the tail delay.

Specifically, when the buffer meets the preset brushing condition, the storage unit is locked, the buffer is set to be in a read-only state, new data is prevented from being written into the buffer, the mixing of the new data in the data brushing process is avoided, and the accuracy of the data brushed to the storage tree is ensured. And simultaneously, a new buffer is created and used for receiving the data to be stored which is written in the upper stage in the data brushing process so as to prevent the write operation from being blocked due to the old buffer brushing, and after the creation of the new buffer is completed, the storage unit is unlocked so as to ensure that the storage unit normally provides service.

The data to be flushed in the read-only state buffer is flushed to the storage tree, so that the data is converted from temporary storage of the memory to persistent storage of the external memory, and the reliability and the durability of the data are ensured.

Specifically, in an embodiment, for any data to be flushed, a target leaf node may be determined in a storage tree belonging to the same storage unit as a read-only state buffer according to a target storage address of the data to be flushed, copy processing is performed on the target leaf node to obtain a duplicate leaf node, the data to be flushed is flushed to the duplicate leaf node, the duplicate leaf node is temporarily stored in a memory, when all the data to be flushed in the read-only state buffer has been flushed to the corresponding duplicate leaf node, merging processing is performed on the duplicate leaf nodes generated by copying the same target node to obtain at least one leaf node to be persistence, and at least one leaf node to be persistence is inserted into the storage tree to replace the original target leaf node in the storage tree.

Specifically, for each piece of data to be flushed, the target storage address is first searched for the corresponding target leaf node in the storage tree belonging to the same storage unit as the read-only state buffer. After the target leaf node is found, the target leaf node is subjected to copying operation, and a copy leaf node is generated. In the copying process, the copy leaf node can inherit the attribute and the data content of the original target leaf node, namely, the copy-on-write mode is adopted to perform data downloading, and under the multithread concurrent access scene, the original target leaf node can still be accessed by the read-write operation, the write operation blocking can not occur, the concurrency performance of the leaf node data access is ensured, and the direct modification of the original node is prevented from influencing other ongoing read-write operations.

Specifically, the data to be flushed is written into a duplicate leaf node, and the duplicate leaf node becomes a dirty node containing new data and is temporarily stored in a memory. And after all the data to be flushed in the read-only state buffer are flushed to the corresponding duplicate leaf nodes, merging the duplicate leaf nodes generated by copying the same target node. In the merging process, integrating the data in the duplicate leaf nodes, and removing the repeated or redundant parts to form at least one leaf node to be persisted. And finally, inserting at least one leaf node to be persisted obtained after the merging processing into a storage tree to replace the original target leaf node, and completing the updating and persisting storage of the data in the storage tree, so that the storage tree can reflect the latest data state, and the consistency and the integrity of the data are ensured.

Specifically, in an embodiment, for screening data to be flushed from the buffer, the value of the data in the buffer may be defined, where the value of each data is proportional to the importance of the data, and when the buffer meets a preset flushing condition, the data is sorted in descending order of value, and the part of the data with the highest value is selected as the data to be flushed, so as to ensure that the critical data (the data with higher value) is preferentially and durably stored in the storage tree.

Specifically, in an embodiment, the data in the buffer may be classified into cold data and hot data according to the access frequency of each piece of data in the buffer, and the cold data is used as the data to be flushed, so that the hot data is reserved in the buffer, and when a user subsequently initiates a data reading request to the hot data, the data can be directly read from the buffer, so that the access efficiency of the user to the storage unit is improved.

Further, in an embodiment, a new non-leaf node may be generated according to index information of the data to be flushed stored in the leaf node to be persisted, and the new non-leaf node may be inserted into the storage tree to replace an original non-leaf node in the storage tree.

Wherein the index information of the data to be swiped includes a key value range (logical address range) and a pointer. The non-leaf nodes mainly play roles of indexing and guiding searching in a storage tree structure such as a B+ tree, actual data is not stored, and the positions of the leaf nodes where the data are located are indicated through key value pairs and pointers.

Specifically, the leaf nodes to be persisted store data which is flushed down the storage tree, and a new non-leaf node is constructed according to index information of the data. The construction of the new non-leaf node is based on the data of the leaf node to be persisted, and the index structure of the storage tree is updated and optimized. For example, if the leaf node to be persisted contains a series of consecutive key-value ranges, the new non-leaf node may be reasonably partitioned and organized according to these ranges to more efficiently locate data in subsequent queries. And finally, inserting the generated new non-leaf nodes into the storage tree to replace the original non-leaf nodes.

Further, the new tree root node information of the storage tree is synchronized to the system metadata, so that the preset static index layer can be modified adaptively according to the system metadata. The tree root node information comprises address information of the stored tree root node.

In the data process of the storage unit, the storage node includes a new Buffer (buffer_cache), a read-only state Buffer (buffer_commit) and a storage tree. After the data brushing operation is completed, the storage unit is locked again, the read-only state buffer is deleted from the storage node, the new root node of the storage tree is switched according to the new tree node information represented by the system metadata, and finally the storage unit is unlocked, so that the storage unit can be normally applied.

On the basis of the above embodiment, as a practical implementation, in an embodiment, the method further includes:

Step 301, obtaining the number of data volumes, the total capacity of the data volumes and the written data capacity of a data storage system;

Step 302, determining the structure constraint condition of the storage unit according to the number of data volumes, the total capacity of the data volumes and the written data capacity of the data storage system;

step 303, according to the current number of the storage units and the structural constraint condition of the storage units;

step 304, determining a storage tree level number threshold according to the current number of storage units and the structural constraint condition of the storage units.

The structure constraint conditions of the storage units comprise the minimum height of the storage tree, the maximum height of the storage tree, the expected number of the storage units and the current number of the storage units.

The number of the data volumes is the total number of the logical volumes in the system, the total capacity of the data volumes is the total storage space of all the logical volumes, and the written data capacity is the actual data volume of the system which is written currently. The minimum height of the storage tree is the lowest hierarchical level number to be maintained so as to ensure the query efficiency, the maximum height of the storage tree is the highest hierarchical level number allowed so as to avoid the decrease of the query performance caused by the overhigh tree, and the expected number of the storage units is the optimal number of the storage units calculated according to the data volume and the load. Wherein, as the number of storage units increases, the threshold of the number of storage tree levels will gradually increase, eventually becoming the maximum height of the stored tree.

Specifically, in one embodiment, the storage tree level number threshold may be determined based on the following formula:

Wherein, the Representing a threshold value of the number of levels of the storage tree,Representing the minimum height of the storage tree,Representing the maximum height of the memory tree,Indicating the current number of memory cells,Indicating the desired number of memory cells.

Specifically, as the number of data volumes, the total capacity of the data volumes, the written data capacity and the change of the data storage system and the change of the current number of storage units subjected to splitting and merging of the storage tree, the threshold value of the storage tree level is adaptively adjusted, so that the threshold value of the storage tree level is ensured to be matched with the actual service scene of the data storage system, and the reliability of the splitting processing and the merging processing of the subsequent storage tree is further ensured.

On the basis of the foregoing embodiment, as a practical implementation manner, in one embodiment, splitting the storage tree to split the storage tree into a plurality of storage subtrees includes:

step 2051, determining a splitting strategy of the storage tree according to a difference value between the current level number of the storage tree and the threshold value of the level number of the storage tree;

Step 2052, removing at least one level of upper nodes of the storage tree according to the splitting strategy of the storage tree, so that the plurality of non-leaf nodes become root nodes, and splitting the storage tree into a plurality of storage subtrees.

It should be noted that, the system needs to ensure that the storage tree of each storage unit is always maintained in a relatively low tree height (hierarchy) state, so that the storage tree structure of all storage slot bits can be balanced by designing a background expansion operation based on the average number of leaf child nodes of each storage unit. An extended operation is one that is load friendly to hot spots. When the written data is mainly concentrated in a certain memory unit, the access efficiency of the hot spot data is reduced due to the increase of the memory tree structure, and at the moment, the height of the memory tree in the memory unit in which the hot spot is concentrated can be reduced through expansion operation (memory tree splitting), and the hot spot is dispersed into more memory units, so that the performance is kept stable.

The hierarchical index structure is divided into three sections, namely an expanded area, an expanding area and an unexpanded area, and the system maintains the Key range (logic address range) of the expanding part, when the newly requested Key (destination address) is positioned in the range, the Key cannot be immediately processed, the expansion of the range is required to be waited, if the newly requested Key is smaller than or equal to the system expansion Key, the index of the expanded area part, namely the new version is processed, and if the newly requested Key is larger than the system expansion Key, the index of the unexpanded part, namely the index of the old version is processed.

Specifically, in the process of expanding the storage tree, a new enough empty positioning layer is firstly created, then each storage unit of the original dynamic index layer is sequentially traversed, and balance processing is carried out according to the specific situation of the storage tree in the storage unit. When the height of the storage tree is close to the height threshold t, no processing is performed, and only the new positioning layer index item is pointed to the original storage tank bit. When the height of the storage tree exceeds the height threshold, split (Split) operation is performed, the buffer and the storage tree are Split into a plurality of parts, a plurality of new storage units are formed, and new positioning layer index items are sequentially created to point to the corresponding storage units. And when the height of the storage tree is smaller than the height threshold value, performing merging (Merge) operation, continuously traversing the following storage units until a storage tree with the height not smaller than the threshold value is found, merging all the traversed storage trees with the height smaller than the threshold value, ensuring that the height of the merged storage tree does not exceed the threshold value t, merging a buffer at the same time to form a new storage unit, and finally creating a new positioning layer index item (positioning layer table item). After each new locating layer index item is generated, the information is written into the system metadata file, and a new routing layer index is built in the process of continuously generating the new locating layer index items. And deleting the routing layer and the positioning layer of the old version after all the dynamic index layer storage units are completely expanded, and using the index of the new version as a new core index to realize the adaptive updating of the preset static index layer.

As shown in fig. 5, an exemplary process of splitting a storage tree according to an embodiment of the present application is a process of splitting a storage tree into a plurality of sub-storage trees by using a storage tree level threshold t as a boundary. The splitting operation does not need to update the B+ Tree, but only needs to update the memory direction of the index area, so that no extra node data persistence is brought. Taking t=2 and n=4 as an example, the difference value between the two is 2, and then the splitting strategy of the storage tree is determined by removing the first-level upper node (root node 1) and the second-level upper node (child nodes 2 and 3 of the root node) in the storage, so that the non-leaf nodes 4, 5 and 6 become root nodes, namely splitting expands 3 storage subtrees (storage units).

Correspondingly, in one embodiment, under the condition that the current level of the storage tree is lower than the threshold value of the level of the storage tree, the storage tree is used as the storage tree to be combined, the plurality of storage trees to be combined are combined to obtain the combined storage tree, and meanwhile, the storage units of the storage tree to be combined are combined to obtain the combined storage unit.

As shown in fig. 6, an exemplary merging process of a storage tree provided by the embodiment of the present application is shown in fig. 7, and another exemplary merging process of a storage tree provided by the embodiment of the present application is shown in fig. 6, where merging operations are divided into two types, and one type is that a plurality of storage trees with the same hierarchy number are merged as shown in fig. 6, and since the logical address ranges between the storage units are arranged in an ascending order, root nodes between the storage trees are ordered, and in this case, only one root node needs to be newly generated to organize the plurality of storage trees. The other is that as shown in fig. 7, multiple storage trees with different layers are merged, and the merging process only needs to insert the root node of a shorter tree into the root node of a higher tree as a child node, which also benefits from the ordering of keys (logical addresses) between the trees. Both merge operations generate new dirty root nodes, bringing a small amount of extra node data persistence.

step 401, obtaining a data reading request;

step 402, screening the storage units to be accessed from a plurality of storage units according to the target read address represented by the data storage request;

step 403, traversing the buffer of the storage unit to be accessed to determine whether the buffer stores the data to be read;

step 404, in the case that it is determined that the buffer stores the data to be read, directly reading the data to be read from the buffer;

Step 405, in the case that it is determined that the buffer does not store the data to be read, traversing the storage tree of the storage unit to be accessed to locate the leaf node to be accessed, and reading the data to be read from the leaf node to be accessed.

In the request response flow schematic diagram provided in the embodiment of the present application, as shown in fig. 8, the routing layer and the positioning layer are static structures, and no locking conflict is needed during searching, so that all user request threads can be addressed in parallel. Each storage groove (storage unit) of the dynamic index layer (slot level) is independently managed and key ranges are not intersected, the system distributes threads according to the storage units, one storage unit is used for one thread, the internal access of the storage units does not need locking, and the data dispersion effect of multiple storage units can effectively relieve the influence of conflicts on performance. As in fig. 8, write 2 (write request 2) needs to wait for write 1 (write request 1) to complete and then write, the read requests of the unified memory unit may correspond in parallel, i.e. as in fig. 8, read 1 (read request 1) and read 2 (read request 2) may be performed simultaneously.

Specifically, when a read request (data read request) is obtained, firstly, positioning to a corresponding dynamic index layer storage slot bit through a routing layer and a positioning layer according to a read key value (target read address), firstly, querying a buffer, and if no data to be read is found, continuing to search in a B+ Tree (storage Tree). The lookup in the storage tree starts from the root node, and a binary lookup is performed in each node, layer by layer, down until it is determined whether data is present in the leaf node. If any leaf node can not be obtained from the node cache, the leaf node is obtained from the disk storage file according to the node address and is temporarily stored in the node cache for next inquiry.

The data storage method comprises the steps of obtaining a data storage request, screening target storage units from a plurality of storage units according to target storage addresses represented by the data storage request, writing data to be stored indicated by the data storage request into a buffer of the target storage units, flushing the data to be flushed in the buffer to the storage tree when the buffer meets preset flushing conditions, and splitting the storage tree to split the storage tree into a plurality of storage subtrees when the current level number of the storage tree exceeds a storage tree level number threshold value, wherein the level number of the storage subtrees is equal to the storage tree level number threshold value. According to the method provided by the scheme, the whole data storage system is split into the plurality of storage units, each storage unit comprises the buffer and the storage tree, newly written data is firstly stored in the buffer, then the data is downwards brushed to the storage tree through the data downwards brushing to the buffer, so that batch writing of the data of the storage tree is realized, the write amplification of the storage tree is reduced, and when the storage tree level is higher, the storage tree of a high level is split into a plurality of storage subtrees, the scattered storage of the data is realized, the situation that the storage tree level is too high is avoided, the data access delay to the storage tree is reduced, and the foundation is laid for improving the concurrent access quantity and the access efficiency of the data.

From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment.

The embodiment of the application also provides a data storage device for executing the data storage method provided by the embodiment.

Fig. 9 is a schematic structural diagram of a data storage device according to an embodiment of the present application. The data storage device 90 includes an acquisition module 901, a screening module 902, a writing module 903, a brushing module 904, and a splitting module 905.

The storage tree splitting module is used for splitting the storage tree to split the storage tree into a plurality of storage subtrees under the condition that the current level number of the storage tree exceeds a storage tree level number threshold value, wherein the storage tree splitting module is used for splitting the storage tree into a plurality of storage subtrees under the condition that the level number of the storage subtrees is equal to the storage tree level number threshold value.

The description of the features in the embodiment corresponding to the data storage device may refer to the related description of the embodiment corresponding to the data storage method, which is not described in detail herein.

The embodiment of the application also provides a data storage system for executing the data storage method provided by the embodiment.

Fig. 10 is a schematic structural diagram of a data storage system according to an embodiment of the present application. The data storage system includes a plurality of storage units and a data storage device.

Wherein each storage unit comprises a buffer and a storage tree.

The data storage device performs data storage processing on the plurality of storage units by adopting the data storage method provided by the embodiment.

An electronic device according to an embodiment of the present application is also provided, as shown in fig. 11, and is a schematic structural diagram of the electronic device according to the embodiment of the present application, including a processor 10 and a memory 20, where the memory 20 stores a computer program, and the processor 10 is configured to execute the computer program to perform the steps in any of the data storage method embodiments described above.

Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps of any of the data storage method embodiments described above when run.

In an exemplary embodiment, the computer readable storage medium may include, but is not limited to, a U disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a removable hard disk, a magnetic disk, or an optical disk, etc. various media in which a computer program may be stored.

Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the data storage method embodiments described above.

Embodiments of the present application also provide another computer program product comprising a non-volatile computer readable storage medium storing a computer program which when executed by a processor implements the steps of any of the data storage method embodiments described above.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The data storage method, the system, the electronic equipment and the storage medium provided by the application are described in detail. The principles and embodiments of the present application have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present application and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the application can be made without departing from the principles of the application and these modifications and adaptations are intended to be within the scope of the application as defined in the following claims.

Claims

1. A method of data storage, comprising:

Acquiring a data storage request;

screening target storage units from a plurality of storage units according to the target storage addresses represented by the data storage requests, wherein each storage unit comprises a buffer and a storage tree;

Writing the data to be stored indicated by the data storage request into a buffer of the target storage unit;

under the condition that the buffer meets a preset brushing condition, brushing the data to be brushed in the buffer down to the storage tree;

Splitting the storage tree to split the storage tree into a plurality of storage subtrees under the condition that the current level number of the storage tree exceeds a storage tree level number threshold value, wherein the level number of the storage subtrees is equal to the storage tree level number threshold value;

Under the condition that the buffer meets a preset brushing condition, brushing the data to be brushed in the buffer down to the storage tree comprises the following steps:

under the condition that the buffer meets the preset brushing condition, converting the buffer into a read-only state, and creating a new buffer, wherein the new buffer is used for caching data to be stored which are written in the upper stage in the brushing process;

and brushing the data to be brushed in the read-only state buffer down to the storage tree.

2. The data storage method of claim 1, wherein the selecting a target storage unit from a plurality of storage units according to the target storage address characterized by the data storage request comprises:

in a preset static index layer, searching a plurality of routing layers layer by layer according to a target storage address represented by the data storage request so as to determine a target positioning layer table item corresponding to the target storage address;

Screening target storage units from a plurality of storage units according to the target positioning layer table item;

The preset static index layer comprises a plurality of routing layers and positioning layers, the positioning layers comprise a plurality of positioning layer table entries, and each positioning layer table entry corresponds to the storage unit one by one.

3. The data storage method of claim 2, wherein after splitting the storage tree to split the storage tree into a plurality of storage sub-trees, the method further comprises:

Generating a corresponding number of new storage units according to the plurality of storage subtrees;

distributing a locating layer table item for the new storage unit;

And modifying the preset static index layer according to the allocation result of the positioning layer table item.

4. A data storage method according to claim 3, wherein said swiping data to be swished in a read-only status buffer down to said storage tree comprises:

For any data to be flushed, determining a target leaf node in a memory tree belonging to the same memory unit as the read-only state buffer according to a target memory address of the data to be flushed;

copying the target leaf node to obtain a copy leaf node;

The data to be brushed is brushed down to the duplicate leaf node, and the duplicate leaf node is temporarily stored in a memory;

when all data to be flushed in the read-only state buffer are flushed to the corresponding duplicate leaf nodes, merging the duplicate leaf nodes generated by copying the same target node to obtain at least one leaf node to be persisted;

And inserting the at least one leaf node to be persisted into the storage tree to replace the original target leaf node in the storage tree.

5. The data storage method of claim 4, wherein the method further comprises:

generating a new non-leaf node according to index information of the data to be flushed stored in the leaf node to be durable;

and inserting the new non-leaf node into the storage tree to replace the original non-leaf node in the storage tree.

6. The data storage method of claim 1, wherein the method further comprises:

Acquiring the number of data volumes, the total capacity of the data volumes and the writing data capacity of a data storage system;

Determining the structural constraint condition of the storage unit according to the number of the data volumes, the total capacity of the data volumes and the written data capacity of the data storage system;

According to the current number of the storage units and the structural constraint conditions of the storage units;

determining the threshold value of the storage tree hierarchy number according to the current number of the storage units and the structural constraint condition of the storage units;

7. The method of claim 6, wherein determining the threshold number of storage tree levels based on the current number of storage units and the structural constraints of the storage units comprises:

determining the storage tree level number threshold based on the following formula:

Wherein, the Representing the storage tree level number threshold,Representing the minimum height of the storage tree,Representing the maximum height of the storage tree,Representing the current number of memory cells,Indicating the desired number of memory cells.

8. The data storage method of claim 1, wherein splitting the storage tree to split the storage tree into a plurality of storage sub-trees comprises:

determining a splitting strategy of the storage tree according to the difference value between the current layer number of the storage tree and the storage tree layer number threshold;

And removing at least one level of upper nodes of the storage tree according to the splitting strategy of the storage tree, so that a plurality of non-leaf nodes become root nodes, and splitting the storage tree into a plurality of storage subtrees.

9. The data storage method of claim 1, wherein the method further comprises:

under the condition that the current layer number of the storage tree is lower than a storage tree layer number threshold, taking the storage tree as a storage tree to be combined;

and carrying out merging processing on the plurality of storage trees to be merged to obtain a merged storage tree, and simultaneously carrying out merging processing on the storage units of the storage trees to be merged to obtain a merged storage unit.

10. The data storage method of claim 1, wherein the method further comprises:

Acquiring a data reading request;

Screening the storage units to be accessed from a plurality of storage units according to the target read address represented by the data storage request;

Traversing the buffer of the storage unit to be accessed to judge whether the buffer stores data to be read or not;

Under the condition that the buffer is determined to store the data to be read, the data to be read is directly read from the buffer;

And under the condition that the buffer does not store the data to be read, traversing the storage tree of the storage unit to be accessed so as to locate the leaf node to be accessed, and reading the data to be read from the leaf node to be accessed.

11. A data storage system is characterized by comprising a plurality of storage units and a data storage device, wherein each storage unit comprises a buffer and a storage tree;

the data storage device performs data storage processing on the plurality of storage units using the data storage method according to any one of claims 1 to 10.

12. An electronic device, comprising:

A memory for storing a computer program;

processor for implementing the steps of the data storage method according to any of claims 1 to 10 when executing said computer program.

13. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the data storage method according to any of claims 1 to 10.

14. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the data storage method according to any one of claims 1 to 10.