CN116185284B

CN116185284B - A tiered storage system based on data block activity

Info

Publication number: CN116185284B
Application number: CN202211655199.2A
Authority: CN
Inventors: 田鹏; 赵彬; 刘彬彬; 邓玲; 殷双飞; 陕振; 杨帆
Original assignee: Beijing Institute of Computer Technology and Applications
Current assignee: Beijing Institute of Computer Technology and Applications
Priority date: 2022-12-21
Filing date: 2022-12-21
Publication date: 2025-09-12
Anticipated expiration: 2042-12-21
Also published as: CN116185284A

Abstract

The invention relates to a tiered storage system based on data block activity and belongs to the technical field of data storage. The tiered storage system uses two storage tiers, an NVMe SSD tier and a SATA SSD/HDD tier, where the NVMe SSD tier is the higher-level tier and the SATA SSD/HDD tier is the lower-level tier. Based on the access frequency of data blocks, the system dynamically migrates data between the higher-level and lower-level tiers without human intervention.

Description

Tiered storage system based on data block activity

Technical Field

The invention belongs to the technical field of data storage, and particularly relates to a tiered storage system based on data block activity.

Background

NVMe SSDs offer high-speed storage performance, while SATA SSDs/HDDs offer low price and large capacity; different storage media have different advantages. Tiering hot and cold data makes full use of these advantages: hot data is stored on high-speed devices and cold data on relatively inexpensive devices, so that the performance, capacity and price of the storage system are well balanced.

Disclosure of Invention

(I) Technical problem to be solved

The invention aims to solve the technical problem of realizing dynamic migration of data between a high-level layer and a low-level layer without human intervention.

(II) Technical solution

To solve the above technical problem, the invention provides a design method for a tiered storage system based on data block activity. The tiered storage system uses two storage tiers, an NVMe SSD tier and a SATA SSD/HDD tier, where the NVMe SSD tier is the higher-level tier and the SATA SSD/HDD tier is the lower-level tier. Based on the access frequency of data blocks, the system dynamically migrates data between the higher-level and lower-level tiers without human intervention.

Preferably, the tiered storage system defines two modes according to the initial placement scheme: HotDST (hot dynamic storage tier) priority and ColdDST (cold dynamic storage tier) priority. In ColdDST priority mode, data is initially stored in the lower-level tier and, during use, hot data is dynamically migrated to the higher-level tier; this migration process is also called downgrade migration. In HotDST priority mode, data is initially stored in the higher-level tier and, during use, cold data is dynamically migrated to the lower-level tier; this migration process is called upgrade migration.

Preferably, the tiered storage system automatically selects a corresponding block size value for the user based on the system IOPS and system bandwidth and storage resource size.

Preferably, the tiered storage system performs data tiering based on data blocks and carries out dynamic statistics of data block access frequency and migration; all functions are implemented by inserting an intelligent tiered storage driver above the block device driver, wherein the intelligent tiered storage driver comprises a metadata management module, a data block valuation module, a migration control module, a system monitoring module and a migration module;

the metadata management module retains sampling information (access information of I/O data), namely the access time, access type, location information and storage tier of each data block, together with data heat information identified from the sampling information; the sampling information and the data heat information together form the metadata information, which is updated on every external read or write request;

the data block valuation module evaluates the data blocks before each data migration and determines the value of each data block from its metadata information; the result reflects the activity level of the data block; the data block valuation module sorts the results to form a data block migration schedule and passes it to the migration control module;

the migration control module receives the valuation result of the data block valuation module, namely the data block migration schedule, and controls the migration module to migrate the data blocks; the migration control module determines the migration timing, migration interval and migration mode; the migration timing is determined by system load and remaining storage space; a reasonable migration interval allows hot or cold data to be migrated in time while keeping the impact of migration activity on normal system service within an acceptable range;

the system monitoring module collects performance information of the system, including the current CPU utilization, the memory utilization, the IOPS of the storage devices and the remaining capacity of each tier;

the migration control module adjusts the rate at which the migration module migrates data blocks according to the system performance information provided by the system monitoring module.

Preferably, the intelligent tiered storage driver further comprises an access redirection module, wherein the access redirection module provides a virtual layer, maps physical addresses and provides a unified storage interface.

Preferably, when performing hot data sampling, the metadata management module uses a hot data identification strategy that stores the requested LBA in segments, so as to identify and store data heat information; for each cached hot data entry, the metadata management module maintains an ID, a counter and a recency bit;

wherein the counter tracks the access frequency of the LBA and the recency bit indicates whether the entry has been accessed recently; the low 16 bits of the 32-bit LBA are used to identify the LBA, and this 16-bit ID consists of a primary ID and a secondary ID; two hash functions are used when processing an LBA, one generating the primary ID and the other generating the secondary ID; the hash function generating the secondary ID takes only the 4 least significant bits (LSBs) of the LBA, and the hash function generating the primary ID takes the remaining 12 bits of the low 16; for sequential access, the starting primary ID and secondary ID of the LBAs are stored in sequence, and thereafter only the offset secondary ID is stored, not the offset primary ID.

Preferably, when performing hot data identification, the metadata management module works as follows: whenever a user issues a write request, the requested LBA is hashed by the two hash functions to check whether it is already stored in the cache; if the requested LBA hits the cache, the corresponding counter value is incremented by 1 to capture its frequency and the recency bit is set to 1 to capture its recency; if the counter value is greater than or equal to a predetermined hot threshold, the data is classified as hot data, otherwise as cold data; on a cache miss, a sampling-based approach is used to insert the newly requested LBA into the cache.

Preferably, when the metadata management module samples hot data, LBAs are retained with 50% probability; the aging mechanism periodically divides the access counter value by 2; to age recency, the recency value is reset to 0 whenever the access counter value is divided by 2, and it is set to 1 again on any new data access.

Preferably, when the metadata management module performs hot data identification, entry replacement proceeds as follows: when the cache is full and a newly sampled LBA needs to be inserted into the cache, a replacement entry must be selected; an LBA is classified as a candidate replacement entry if its access counter value is less than the predetermined hot threshold and its recency bit has been reset to 0; when the policy performs the decay process, these candidate replacement entries are stored in a candidate replacement entry list, which is updated in every decay period to reflect the latest information; when a cold entry needs to be removed from the cache, the policy first selects a candidate replacement entry from the list and directly checks whether it can be removed; if the candidate is still cold, it is deleted and the new entry is inserted into the cache; if the candidate has become hot since the last aging period, it is not replaced.

The invention also provides application of the system in the technical field of data storage.

(III) Beneficial effects

The hierarchical storage system based on the data block liveness adopts two storage layers, and realizes the dynamic migration of data between a high-level layer and a low-level layer under the condition of no human intervention according to the access frequency of the data block.

Drawings

FIG. 1 is a schematic diagram of the heat identification of LBA entries according to the present invention;

FIG. 2 is a schematic diagram of the selection of candidate replacement entries according to the present invention.

Detailed Description

To make the objects, contents and advantages of the present invention more apparent, the following detailed description of the present invention will be given with reference to the accompanying drawings and examples.

The tiered storage system based on data block activity uses two storage tiers: an NVMe SSD tier (higher level) and a SATA SSD/HDD tier (lower level). Based on the access frequency of data blocks, the system dynamically migrates data between the higher-level and lower-level tiers without human intervention. According to the initial placement scheme, the system defines two modes: HotDST (Hot Dynamic Storage Tiered, hot dynamic storage tier) priority and ColdDST (Cold Dynamic Storage Tiered, cold dynamic storage tier) priority. In ColdDST priority mode, data is initially stored in the lower-level tier, and during use "hot data" is periodically and dynamically migrated to the higher-level tier; this process is also called downgrade migration. In HotDST priority mode, data is initially stored in the higher-level tier, and during use "cold data" is periodically and dynamically migrated to the lower-level tier; this process is called upgrade migration.
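The two placement modes described above can be summarized in a minimal Python sketch. The names `Tier`, `Mode`, `initial_tier` and `periodic_migration` are hypothetical and chosen only for illustration; the source does not define any programming interface.

```python
from enum import Enum


class Tier(Enum):
    HIGH = "NVMe SSD"
    LOW = "SATA SSD/HDD"


class Mode(Enum):
    HOT_DST = "HotDST priority"
    COLD_DST = "ColdDST priority"


def initial_tier(mode):
    """Tier in which newly written data is first placed (hypothetical helper)."""
    return Tier.HIGH if mode is Mode.HOT_DST else Tier.LOW


def periodic_migration(mode):
    """Return (kind of data moved, source tier, destination tier) for the mode."""
    if mode is Mode.HOT_DST:
        # Data starts in the higher tier; cold data is moved down over time.
        return ("cold", Tier.HIGH, Tier.LOW)
    # Data starts in the lower tier; hot data is moved up over time.
    return ("hot", Tier.LOW, Tier.HIGH)


print(initial_tier(Mode.COLD_DST).value)
print(periodic_migration(Mode.COLD_DST))
```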

The tiering technique based on data block activity first determines the data block size, which should be neither too large nor too small. If the data block is too large, hot and cold data cannot be truly distinguished. If it is too small, data heat can be measured accurately, but the metadata volume becomes too large, which adds management overhead to migration scheduling and increases system complexity.
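The metadata side of this trade-off is simple arithmetic: one metadata record per data block means the record count grows as the block size shrinks. The sketch below illustrates this; the 10 TiB capacity and the three block sizes are assumed values for illustration only and do not come from the source.

```python
def metadata_entries(capacity_bytes, block_size_bytes):
    """Number of per-block metadata records needed to cover the capacity."""
    return -(-capacity_bytes // block_size_bytes)  # ceiling division


TIB = 1 << 40  # bytes in one TiB

# Assumed example: 10 TiB of managed capacity, three candidate block sizes.
for block_size in (64 << 10, 1 << 20, 16 << 20):
    count = metadata_entries(10 * TIB, block_size)
    print(f"{block_size >> 10:>6} KiB blocks -> {count:,} metadata entries")
```

Going from 16 MiB blocks to 64 KiB blocks multiplies the number of records by 256, which is the management cost the paragraph above refers to.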

The system performs data tiering based on data blocks, carries out dynamic statistics of data block access frequency and migration, and implements all functions by inserting an intelligent tiered storage driver above the block device driver. The intelligent tiered storage driver comprises a metadata management module, a data block valuation module, a migration control module, an access redirection module, a system monitoring module and a migration module.

① Metadata management module

The metadata management module retains sampling information (access information of I/O data) such as the access time, access type, location information and storage tier of each data block, together with the data heat information identified from the sampling information. The sampling information and data heat information together form the metadata information, which must be updated on every external read or write request. This information provides the basis on which the data block valuation module makes its assessment.

② Data block valuation module

The data block valuation module evaluates the data blocks before each data migration and determines the value of each data block from its metadata information; the result reflects the activity level of the data block. The module sorts the results to form a data block migration schedule and passes it to the migration control module, whose work depends on this schedule.
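As an illustration of turning metadata into a sorted migration schedule, the following Python sketch uses the access count as the block's value. This scoring rule, the `BlockMeta` record, the `build_schedule` function and the threshold are all assumptions; the source does not specify how the value is computed.

```python
from dataclasses import dataclass


@dataclass
class BlockMeta:
    """Hypothetical per-block metadata record."""
    block_id: int
    tier: str          # "high" or "low"
    access_count: int  # heat information derived from sampled I/O


def build_schedule(blocks, hot_threshold):
    """Return migration tasks (block_id, source, destination), most urgent first.

    Assumption: a block's value is simply its access count.
    """
    # Hot blocks still in the lower tier, hottest first.
    promote = sorted(
        (b for b in blocks if b.tier == "low" and b.access_count >= hot_threshold),
        key=lambda b: b.access_count,
        reverse=True,
    )
    # Cold blocks still in the higher tier, coldest first.
    demote = sorted(
        (b for b in blocks if b.tier == "high" and b.access_count < hot_threshold),
        key=lambda b: b.access_count,
    )
    return ([(b.block_id, "low", "high") for b in promote]
            + [(b.block_id, "high", "low") for b in demote])


blocks = [
    BlockMeta(1, "low", 9),
    BlockMeta(2, "low", 3),
    BlockMeta(3, "high", 1),
    BlockMeta(4, "low", 12),
]
print(build_schedule(blocks, hot_threshold=5))
```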

③ Migration control module

The migration control module works closely with the data block valuation module: it receives the valuation result, namely the data block migration schedule, and controls the migration module to migrate the data blocks. The migration control module determines the migration timing, migration interval and migration mode. The migration timing is determined by factors such as system load and remaining storage space. A reasonable migration interval allows hot or cold data to be migrated in time while keeping the impact of migration activity on normal system service within an acceptable range.

④ Access redirection module

The access redirection module provides a virtual layer, maps physical addresses and presents a unified storage interface to the outside.
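A minimal sketch of such a virtual layer is a table that maps each virtual block to its current tier and physical block, so that a migration changes only the mapping and not the address seen by callers. The class `RedirectionTable` and its methods are hypothetical and not taken from the source.

```python
class RedirectionTable:
    """Hypothetical virtual-to-physical block map kept by the redirection layer."""

    def __init__(self):
        self._map = {}  # virtual block number -> (tier, physical block number)

    def place(self, vblock, tier, pblock):
        """Record where a virtual block is first stored."""
        self._map[vblock] = (tier, pblock)

    def resolve(self, vblock):
        """Translate a virtual block number to its current (tier, physical block)."""
        return self._map[vblock]

    def migrate(self, vblock, new_tier, new_pblock):
        """Point the virtual block at its new location after a migration."""
        self._map[vblock] = (new_tier, new_pblock)


table = RedirectionTable()
table.place(7, "low", 1000)
table.migrate(7, "high", 42)   # callers keep using virtual block 7
print(table.resolve(7))
```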

⑤ System monitoring module

The system monitoring module collects performance information of the system, including the current CPU utilization rate, the memory utilization rate, the IOPS of the storage device, the residual capacity of each layer of resources and the like.

⑥ Migration module

The migration module takes migration tasks from the data block migration schedule and migrates data blocks between storage tiers. The migration control module adjusts the rate at which the migration module migrates data blocks according to the system performance information provided by the system monitoring module.
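One possible way to adjust the migration rate from monitored load is sketched below. The linear scaling rule, the function name `migration_rate` and the `floor_fraction` parameter are assumptions made for illustration; the source states only that the rate is adjusted according to system performance information.

```python
def migration_rate(max_rate_mbps, cpu_util, iops_util, floor_fraction=0.1):
    """Scale the migration rate down as the busiest monitored resource fills up.

    cpu_util and iops_util are utilizations in the range 0.0 to 1.0.
    floor_fraction keeps a small share of bandwidth for migration even under
    full load (an assumed policy, not one stated in the source).
    """
    load = min(max(max(cpu_util, iops_util), 0.0), 1.0)
    return max_rate_mbps * max(floor_fraction, 1.0 - load)


print(migration_rate(200, cpu_util=0.25, iops_util=0.5))  # half-loaded system
print(migration_rate(200, cpu_util=1.0, iops_util=0.2))   # saturated CPU
```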

Online hot data statistics involve two main points: sampling and identifying hot data, and building a hierarchical hash index over the sampled data.

(1) Hot data sampling

Hot data sampling relies on the observation that most I/O exhibits locality and that, under sequential access, only a few least significant bits (LSBs) of the logical block address (LBA) change while most other bits are unaffected. Based on this, the hot data identification strategy stores the requested LBA in segments so that data heat information can be identified and stored efficiently. For each cached hot data entry, the metadata management module maintains an ID, a counter and a recency bit, as shown in FIG. 1.

The counter tracks the access frequency of the LBA, and the recency bit indicates whether the entry has been accessed recently. To reduce memory consumption, the low 16 bits of the 32-bit LBA are used to identify the LBA; this 16-bit ID consists of a primary ID (12 bits) and a secondary ID (4 bits). Two hash functions are used when processing an LBA, one generating the primary ID and the other the secondary ID. The hash function generating the secondary ID takes only the 4 least significant bits of the LBA, while the hash function generating the primary ID uses the remaining 12 bits of the low 16. This two-level hierarchical hash index can significantly reduce cache lookup overhead because LBA information in the cache is accessed directly. Since many access patterns in a workload exhibit high spatial and temporal locality, the design takes full advantage of spatial locality. For sequential access, the starting primary ID and secondary ID of the LBAs are stored in sequence, and thereafter only the offset secondary ID is stored, not the offset primary ID, which significantly reduces memory consumption. Identifying an LBA by only part of its bits can cause LBAs to be falsely identified as hot, but increasing the number of bits in the primary ID can significantly improve identification accuracy.
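The ID layout can be illustrated with a short Python sketch. It assumes the two hash functions are plain bit extraction (shift and mask), which the source does not state; the function name `split_lba` is hypothetical.

```python
def split_lba(lba):
    """Split a 32-bit LBA into (primary ID, secondary ID).

    Assumption: both "hash functions" are simple bit extraction from the
    low 16 bits of the LBA.
    """
    low16 = lba & 0xFFFF
    primary = low16 >> 4      # upper 12 of the low 16 bits
    secondary = low16 & 0xF   # 4 least significant bits
    return primary, secondary


# Sequential LBAs share a primary ID; only the secondary ID changes.
for lba in (0x12345678, 0x12345679, 0x1234567A):
    print(hex(lba), [hex(part) for part in split_lba(lba)])

# Two LBAs that differ only in their high 16 bits map to the same ID,
# which is the false identification risk noted above.
print(split_lba(0x12345678) == split_lba(0xABCD5678))
```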

(2) Hot data identification

Online hot data identification works as follows. Whenever a user issues a write request, the requested LBA is hashed by the two hash functions to check whether it is already stored in the cache. If the requested LBA hits the cache, the corresponding counter value is incremented by 1 to capture its frequency, and the recency bit is set to 1 to capture its recency. If the counter value is greater than or equal to a predetermined hot threshold, the data is classified as hot; otherwise it is classified as cold. On a cache miss, a sampling-based approach is used to insert the newly requested LBA into the cache.
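The write-path logic above can be sketched in Python as follows. The threshold value, the dictionary-based cache, the function name `on_write` and the choice to start a newly sampled entry with a count of 1 are assumptions; the sketch also treats the two hash functions as plain bit extraction, and it ignores cache capacity.

```python
import random

HOT_THRESHOLD = 4   # assumed value; the source only says "predetermined"
SAMPLE_PROB = 0.5   # sampling probability used on a cache miss


def on_write(cache, lba, rng=random.random):
    """Update heat info for one write; return 'hot', 'cold' or 'unsampled'."""
    low16 = lba & 0xFFFF
    key = (low16 >> 4, low16 & 0xF)        # (primary ID, secondary ID)
    entry = cache.get(key)
    if entry is None:                      # cache miss
        if rng() >= SAMPLE_PROB:
            return "unsampled"             # this request is not sampled
        entry = cache[key] = {"count": 0, "recency": 0}
    entry["count"] += 1                    # capture frequency
    entry["recency"] = 1                   # capture recency
    return "hot" if entry["count"] >= HOT_THRESHOLD else "cold"


cache = {}
for _ in range(6):
    print(on_write(cache, 0x00001000))
```

The `rng` parameter exists only so the random sampling decision can be replaced with a fixed function when testing.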

During sampling, LBAs are retained with 50% probability, which reduces both memory consumption and computational overhead. The aging mechanism periodically divides the access counter value by 2. To age recency, the recency value is reset to 0 whenever the access counter value is divided by 2, and it is set to 1 again on any new data access.
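The aging step can be sketched as a single pass over the cache. The use of integer division and the dictionary layout of each entry are assumptions made for this illustration.

```python
def age(cache):
    """One aging period: halve every counter and clear every recency bit."""
    for entry in cache.values():
        entry["count"] //= 2    # assumption: integer halving
        entry["recency"] = 0


# With an assumed hot threshold of 4, an entry at count 7 turns cold after
# one aging period unless it is accessed again.
cache = {(0x567, 0x8): {"count": 7, "recency": 1}}
age(cache)
print(cache)
```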

Entry replacement is shown in FIG. 2. When the cache is full and a newly sampled LBA needs to be inserted, a replacement entry must be selected. To reduce overhead, the policy maintains a list of candidate replacement entries. An LBA is classified as a candidate replacement entry if its access counter value is less than the predetermined hot threshold and its recency bit has been reset to 0. When the policy performs the decay process, it stores these candidate replacement entries in the list. The list is updated in every decay period to reflect the latest information. When a cold entry needs to be removed from the cache, the policy first selects a candidate replacement entry from the list and directly checks whether it can be removed. If the candidate is still cold, it is deleted and the new entry is inserted into the cache; if the candidate has become hot since the last aging period, it is not replaced. The two-level hierarchical hash index lets the candidate replacement entry be located directly, which significantly reduces the overhead of searching for a replacement entry.
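A minimal Python sketch of this replacement procedure follows. The function names, the dictionary-based cache and the behaviour when no candidate can be evicted (the new entry is dropped) are assumptions; the source does not say what happens in that case.

```python
def refresh_candidates(cache, hot_threshold):
    """Run in each decay period: list entries that are cold and not recent."""
    return [key for key, e in cache.items()
            if e["count"] < hot_threshold and e["recency"] == 0]


def insert_with_replacement(cache, candidates, new_key, capacity, hot_threshold):
    """Insert new_key, evicting a still-cold candidate if the cache is full.

    Returns True if the new entry was inserted, False if it was dropped
    because no candidate could be evicted (assumed fallback).
    """
    if len(cache) >= capacity:
        evicted = False
        while candidates and not evicted:
            victim = candidates.pop(0)
            entry = cache.get(victim)
            # Re-check: the candidate may have become hot since the last aging.
            if entry is not None and entry["count"] < hot_threshold:
                del cache[victim]
                evicted = True
        if not evicted:
            return False
    cache[new_key] = {"count": 1, "recency": 1}
    return True


cache = {
    (1, 0): {"count": 6, "recency": 0},
    (2, 0): {"count": 1, "recency": 0},
    (3, 0): {"count": 2, "recency": 0},
}
candidates = refresh_candidates(cache, hot_threshold=4)
cache[(2, 0)]["count"] = 5   # this candidate became hot after the list was built
print(insert_with_replacement(cache, candidates, (9, 9), capacity=3, hot_threshold=4))
print(sorted(cache))
```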

The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims

1. A tiered storage system based on data block activity, characterized by employing two storage tiers: an NVMe SSD tier and a SATA SSD/HDD tier, with the NVMe SSD tier being the higher-level tier and the SATA SSD/HDD tier being the lower-level tier. The tiered storage system dynamically migrates data between the higher-level and lower-level tiers based on data block access frequency, without human intervention.

The tiered storage system defines two modes: Hot DST priority (hot dynamic storage tier) and Cold DST priority (cold dynamic storage tier). In Cold DST priority mode, data is initially stored in the lower tier, and hot data is periodically dynamically migrated to the higher tier during use. This migration process is also called downgrade migration. In Hot DST priority mode, data is initially stored in the higher tier, and cold data is periodically dynamically migrated to the lower tier during use. This migration process is called upgrade migration.

The tiered storage system performs data tiering based on data blocks, dynamically counting and migrating data block access frequencies. All functions are implemented by inserting an intelligent tiered storage driver on top of the block device driver. The intelligent tiered storage driver includes a metadata management module, a data block valuation module, a migration control module, a system monitoring module, and a migration module.

The metadata management module retains sampling information such as the access time, access type, location information, and storage level of the data block, as well as data heat information generated by identification based on the sampling information. The sampling information and data heat information constitute the metadata information, which is updated with each external read and write request;

The data block valuation module evaluates the data block before each data migration and determines the value of the data block based on the metadata information of the data block. The determination result reflects the activity level of the data block. The data block valuation module sorts the determination results to form a data block migration schedule, which is passed to the migration control module.

The migration control module receives the data block valuation results of the data block valuation module, i.e., the data block migration schedule, and controls the migration module to migrate data blocks. The migration control module determines the timing, interval, and method of migration. The timing of migration is determined by the system load and remaining storage space. A reasonable migration interval enables timely migration of hot or cold data, while ensuring that the impact of the system migration activity on the normal operation of the system is within an acceptable range.

The system monitoring module collects system performance information, including the system's current CPU utilization, memory utilization, storage device IOPS, and remaining capacity of each layer of resources;

The migration module takes migration tasks from the data block migration schedule and migrates the data blocks on different storage levels; the migration control module adjusts the data block migration rate of the migration module according to the system performance information provided by the system monitoring module.

2. The system according to claim 1, wherein the tiered storage system automatically selects a corresponding block size value for the user based on system IOPS, system bandwidth, and storage resource size.

3. The system as described in claim 1 is characterized in that the intelligent tiered storage driver also includes an access redirection module, which provides a virtual layer, maps physical addresses, and provides a unified storage interface to the outside world.

4. The system of claim 1, wherein the metadata management module, when performing hot data sampling, employs a hot data identification strategy to segment and store the requested LBA, thereby identifying and preserving data heat information; and the metadata management module maintains an ID, a counter, and a recency bit for each cached hot data entry;

The counter is used to track the frequency information of the LBA, and the recency bit is used to check whether the item has been accessed recently; the last 16 bits of the 32-bit LBA are used to identify the LBA, and the 16-bit ID consists of a primary ID and a secondary ID; in the processing of the LBA, two hash functions are used, one for generating the primary ID and the other for generating the secondary ID; the hash function for generating the secondary ID only obtains the last 4 LSBs of the LBA, while the hash function for generating the primary ID uses the remaining 12 LSBs; for sequential access, the starting primary ID and secondary ID of the LBA will also be stored sequentially, and only the offset sub-ID will be stored, and the offset primary ID will not be stored.

5. The system as described in claim 1 is characterized in that the metadata management module works as follows when performing hot data identification: whenever a user issues a write request, the requested LBA is hashed by two hash functions to check whether the LBA is already stored in the cache; if the requested LBA hits the cache, the corresponding counter value is incremented by 1 to capture its frequency, and the recency bit is set to 1 for recency capture; if the counter value is greater than or equal to a predetermined hot threshold, it is classified as hot data, otherwise it is cold data; in the case of a cache miss, a sampling-based method is used to insert this newly requested LBA into the cache.

6. The system as described in claim 5 is characterized in that when the metadata management module performs hot data sampling, it retains the LBA with a probability of 50%, and the aging mechanism periodically divides the access counter value by 2. For recency aging, when the access counter value is divided by 2, the recency value is reset to 0. For any access to new data, its recency value is set to 1.

7. The system as described in claim 5 is characterized in that when the metadata management module performs hot data identification, during the entry replacement process, the cache is full and the newly sampled LBA needs to be inserted into the cache, then a replacement entry needs to be selected; if the access counter value is less than a predetermined hot threshold and its recency bit is reset to 0, then such LBA is classified as a candidate replacement entry; when the policy performs the decay process, these candidate replacement entries are stored in a candidate replacement entry list; the candidate replacement entry list is regularly updated for each decay period to reflect the latest information, and when a cold item needs to be removed from the cache, it first selects a candidate replacement entry from the list and directly checks whether the candidate can be removed; if the candidate replacement entry is still cold, the candidate replacement entry is deleted and the new item is inserted into the cache; if the candidate replacement entry has become hot data since the last aging period, it is not replaced.