
CN113302597A - Distributed storage system and garbage collection method in distributed storage system - Google Patents


Info

Publication number
CN113302597A
CN113302597A (application number CN201980089025.4A)
Authority
CN
China
Prior art keywords
node
target
data
logical unit
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980089025.4A
Other languages
Chinese (zh)
Inventor
罗小东
陈飘
何益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN113302597A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A distributed storage system and a garbage collection method in the distributed storage system are provided. A master node selects a target node from a plurality of storage nodes according to the amount of valid data of a source logical unit distributed in each storage node, where the amount of first valid data stored in the target node exceeds a set threshold. The master node creates a target logical unit, and the storage nodes across which the target logical unit is distributed include the target node; in other words, at least a part of the storage space occupied by the target logical unit comes from the target node. The master node then instructs the target node to migrate the first valid data from a first source address to a first target address, and the storage space indicated by the first source address is subsequently released. Because the first valid data is migrated inside the target node, network bandwidth between storage nodes is saved.

Description

Distributed storage system and garbage collection method in distributed storage system
Technical Field
The present application relates to the field of storage, and more particularly, to a distributed storage system and a method of garbage collection in a distributed storage system.
Background
In a distributed storage system, data is usually written to the plurality of storage nodes included in the system by way of append writing. Unlike overwriting, append writing does not immediately delete the original data when data is modified, so a large amount of garbage data inevitably accumulates in the system (the data written before the modification becomes garbage, while the modified data is the valid data). In order to release the storage space occupied by the garbage data, the system periodically performs garbage collection. Garbage collection takes a logical unit as its object: a certain number of storage nodes are selected in the distributed storage system, a new logical unit is created on these storage nodes, the valid data in the logical unit to be reclaimed is written into the new logical unit, and the storage space occupied by the logical unit to be reclaimed is then released. Because the storage nodes on which the new logical unit resides are generally selected at random, they are often different from the nodes on which the logical unit to be reclaimed resides. Rewriting the valid data into the new logical unit therefore frequently involves forwarding data between storage nodes, which consumes a large amount of bandwidth resources.
Disclosure of Invention
The present application provides a distributed storage system and a garbage collection method in the distributed storage system, which can ensure that at least a part of the valid data is migrated within the same storage node, reducing cross-node data migration to a certain extent and thereby saving bandwidth.
A first aspect provides a method for garbage collection in a distributed storage system, where the distributed storage system includes a plurality of storage nodes, and one of the storage nodes is a master node. In the method, the master node selects a target node from the plurality of storage nodes according to the amount of valid data of a source logical unit distributed in each storage node, where the amount of first valid data stored in the target node exceeds a set number threshold. The master node creates a target logical unit, and the storage nodes across which the target logical unit is distributed include the target node; in other words, at least a part of the storage space occupied by the target logical unit comes from the target node. The master node then instructs the target node to migrate the first valid data from a first source address to a first target address. The first source address and the first target address are both real addresses and are both located within the target node; the storage space indicated by the first source address belongs to the source logical unit, and the storage space indicated by the first target address belongs to the target logical unit. When the master node confirms that all valid data in the source logical unit has been migrated to the target logical unit, the storage space occupied by the source logical unit is released. The storage space occupied by the source logical unit includes the storage space indicated by the first source address.
According to the garbage collection method provided in the first aspect, a storage node across which the source logical unit is distributed and which stores a relatively large amount of valid data is used as a target node, and the storage nodes across which the target logical unit created by the master node is distributed include the target node. The target node therefore provides storage space both for the source logical unit and for the target logical unit. While migrating the valid data of the source logical unit, the master node can instruct the target node to migrate the first valid data from a first source address within the target node to a first target address within the target node. Because the first valid data is migrated inside the target node, forwarding of data between storage nodes is avoided to a certain extent, and network bandwidth is saved.
In a first implementation of the first aspect, before the master node instructs the target node to migrate the first valid data from a first source address within the target node to a first target address within the target node, the master node creates a migration list, the migration list including the first source address of the first valid data and the first target address of the first valid data. The master node then sends the migration list to the target node. The migration list may be created according to a minimum migration policy.
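For illustration only, a minimal sketch of what such a migration list could look like is given below. The Python structure and the field names (node_id, disk_id, offset, block_id) are assumptions made for this sketch, not the actual data layout used in the patent.

```python
# Hypothetical migration-list structure: each entry pairs a valid block's
# real source address (inside the source logical unit) with its real target
# address (inside the target logical unit).
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalAddress:
    node_id: int      # storage node holding the block
    disk_id: int      # hard disk inside that node
    offset: int       # in-disk offset address

@dataclass
class MigrationEntry:
    block_id: str               # e.g. "D1"
    source: PhysicalAddress     # address belonging to the source logical unit
    target: PhysicalAddress     # address belonging to the target logical unit

# The migration list is simply the collection of entries the master node
# sends to each node that stores valid data.
migration_list = [
    MigrationEntry("D1", PhysicalAddress(2, 0, 2), PhysicalAddress(2, 0, 10)),
]
```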
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, the plurality of storage nodes further includes other storage nodes that do not belong to the storage nodes across which the target logical unit is distributed, and the migration list further includes a second source address of second valid data stored in the other storage nodes and a second target address of the second valid data, where the second source address is located in the other storage nodes and the second target address is located in the target node. The master node also sends the migration list to the other storage nodes. The other storage nodes are nodes across which the source logical unit is distributed but which store relatively little valid data; they are not selected as target nodes and therefore provide no storage space for the target logical unit. In this case, the other storage nodes need to migrate the second valid data they store to the target node. If there are multiple target nodes, the second valid data can be migrated to any one of them. Alternatively, the other storage nodes may migrate the second valid data to nodes of the target logical unit other than the target node.
With reference to the first implementation of the first aspect, in a third implementation of the first aspect, the first source address and the first target address are both located in a first hard disk of the target node. In this case, the target node may send the first source address and the first target address to the first hard disk, and the first hard disk performs the migration operation itself, thereby relieving the burden on the processor of the target node.
With reference to the first implementation of the first aspect, in a fourth implementation of the first aspect, the first source address is located in a first hard disk of the target node, and the first target address is located in a second hard disk of the target node. In this case, the migration operation is that the processor of the target node reads the first valid data from the first source address into a cache and then writes the first valid data from the cache to the first target address.
With reference to the first implementation of the first aspect, in a fifth implementation of the first aspect, the instructing, by the master node, the target node to migrate the first valid data from the first source address in the target node to the first target address in the target node includes instructing the target node to migrate the first valid data from the first source address to the first target address according to the offset of the first valid data in the source logical unit, so that the offset of the first valid data in the target logical unit after migration is the same as its offset in the source logical unit before migration. With this migration mode, the position of the first valid data in the target logical unit is the same as its position in the source logical unit. If all valid data contained in the first stripe in which the first valid data is located is migrated in this manner, the data fragments contained in the first stripe do not change before and after migration, so the check fragments do not need to be recalculated and the original check fragments of the first stripe are retained. The computation load of the master node is therefore reduced, and computing resources are saved.
With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, after all valid data in the source logical unit has been migrated to the target logical unit and the storage space occupied by the source logical unit has been released, the master node may change the identifier of the target logical unit to the identifier of the source logical unit. The logical address of data is composed of the identifier of the logical unit in which the data is located and the offset within that logical unit. Because the target logical unit inherits the identifier of the source logical unit, and, as described in the fifth implementation, the position of the first valid data in the target logical unit is the same as its position in the source logical unit, the logical address of the first valid data does not change before and after migration. Modification of the metadata of the first valid data and forwarding of the modified metadata between storage nodes are therefore avoided, and network bandwidth is further saved.
A second aspect of the present application provides a master node, the master node being located in a distributed storage system, the distributed storage system comprising a plurality of storage nodes, the master node comprising an interface and a processor, wherein the interface is configured to communicate with the plurality of storage nodes; the processor is configured to perform any one of the implementations provided in the first aspect.
A third aspect of the present application provides a garbage collection apparatus, where the apparatus is located in a master node of a distributed storage system, where the distributed storage system includes a plurality of storage nodes, and the master node is one of the plurality of storage nodes, and the garbage collection apparatus is configured to execute any one of the implementations provided in the first aspect.
A fourth aspect of the present application provides a computer program product for garbage collection, comprising a computer readable storage medium storing program code comprising instructions for performing the method described in the first aspect.
Drawings
FIG. 1 is a diagram of an application scenario provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a logical unit provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an effect of a garbage collection method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a garbage collection method according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a migration list provided by an embodiment of the present invention;
FIG. 6 is a schematic flowchart of another garbage collection method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating the effect of another garbage collection method according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a master node according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a garbage collection apparatus for a master node according to an embodiment of the present invention.
Detailed Description
According to the embodiments of the present application, during garbage collection, at least a part of the valid data can be guaranteed to migrate within the same storage node, and cross-node data migration is reduced to a certain extent, thereby saving bandwidth. The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.
The technical solutions of the embodiments of the present application can be applied to various storage systems. They are described below by taking a distributed storage system as an example, but the embodiments of the present invention are not limited thereto. In a distributed storage system, data is stored on a plurality of storage nodes (hereinafter referred to as "nodes") in a distributed manner, and the storage load is shared among the nodes. This storage mode not only improves the reliability, availability, and access efficiency of the system, but is also easy to expand. A storage node is, for example, a server, or a combination of a storage controller and a storage medium.
Fig. 1 is a schematic diagram of a scenario to which the technical solution of this embodiment can be applied. As shown in fig. 1, a plurality of client servers 101 communicate with a storage system 100, and the storage system 100 includes a switch 103, a plurality of storage nodes (or simply "nodes") 104, and the like. The switch 103 is an optional device. Each storage node 104 may include a plurality of mechanical hard disks or other types of storage media (for example, solid state drives or shingled magnetic recording disks) for storing data.
Fig. 2 is an example of a logical unit provided in this embodiment. A logical unit is a segment of logical space whose actual physical space comes from multiple nodes. The number of nodes occupied by a logical unit depends on the Redundant Array of Independent Disks (RAID) type corresponding to the logical unit. As shown in fig. 2, node 2, node 3, node 4, node 5, node 6, and node 7 each provide a part of their storage space to construct logical unit 1 with RAID type "4 + 2", where node 2, node 3, node 4, and node 5 are used to store data fragments, and node 6 and node 7 are used to store check fragments. Among these six nodes, one node (for example, node 2) is elected as the master node. The master node divides received data into four data fragments, calculates two check fragments for the four data fragments, and then sends each data fragment and the check fragments to the corresponding nodes for storage. The master node may be a node on which one of the fragments resides, or it may be a node independent of logical unit 1. When a data fragment is written to a node, it is typically written at a set granularity, for example 8 KB or 16 KB, so one data fragment or check fragment can be divided into multiple data blocks according to this granularity. For example, the data fragment stored in node 2 includes data blocks D1 and D2, the data fragment stored in node 3 includes data blocks D3 and D4, and so on; the check fragment stored in node 6 includes Q1 and Q2, and the check fragment stored in node 7 includes P1 and P2. D1, D2, D3, D4, D5, D6, D7, D8, Q1, Q2, P1, and P2 together form a stripe. When any two data fragments or check fragments are damaged, the remaining fragments can be used for recovery, which ensures data reliability. Logical unit 1 may also include another stripe, composed of D9, D10, D11, D12, D13, D14, D15, D16, Q3, Q4, P3, and P4. For each fragment (data fragment or check fragment), the identifier of the logical unit in which it is located and its position inside the logical unit constitute the logical address of the fragment, while the actual address of the fragment within a node is its physical address.
Each logical unit may include one or more stripes; the number of stripes included in a logical unit is not limited in this embodiment, and fig. 2 is only an example. Logical unit 2 and logical unit 3 are similar to logical unit 1 and are not described in detail here. The number of nodes, the number of logical units, and their corresponding RAID types are not limited in this embodiment.
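The sketch below illustrates the "4 + 2" layout described above: four data fragments and two check fragments per stripe, each placed on a different node, with a block's logical address being the logical-unit identifier plus its offset. The helper names and the single-XOR "check" computation are simplifications assumed for illustration; they are not the erasure code or placement logic of the patent.

```python
# Toy stripe layout: split a payload into 4 data fragments, compute stand-in
# check fragments, and place each fragment on its own node (nodes 2-7).
from functools import reduce

def split_into_fragments(data: bytes, n: int) -> list[bytes]:
    """Split data into n equally sized data fragments (zero-padded)."""
    size = -(-len(data) // n)                     # ceiling division
    data = data.ljust(size * n, b"\x00")
    return [data[i * size:(i + 1) * size] for i in range(n)]

def xor_check(fragments: list[bytes]) -> bytes:
    """Stand-in for the real parity computation (a single XOR fragment)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*fragments))

data_nodes, check_nodes = [2, 3, 4, 5], [6, 7]
fragments = split_into_fragments(b"example stripe payload", len(data_nodes))
placement = dict(zip(data_nodes, fragments))      # data fragments on nodes 2-5
placement[check_nodes[0]] = xor_check(fragments)  # Q-type check fragment on node 6
placement[check_nodes[1]] = xor_check(fragments)  # P-type (same stand-in) on node 7

# Logical address of the first data block of logical unit 1:
logical_address = {"logical_unit": 1, "offset": 0}
```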
In practical applications, the system often writes data into a logical unit in an append-write mode, and when a logical unit is full, the system allocates a new logical unit for data writing. As data is modified, the data written before the modification becomes invalid. Such invalid data is no longer read but still occupies storage space. Therefore, when system space is insufficient, logical units need to be reclaimed to release storage space. Append writing is also referred to as redirect-on-write (ROW).
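A minimal sketch of append writing follows: modifying a key never overwrites the old block; a new block is appended and the old one is marked invalid, which is exactly what produces garbage blocks over time. The log/index structure is an assumption for illustration, not the system's actual layout.

```python
# Append-only (redirect-on-write) log: old versions become garbage blocks.
class AppendOnlyLog:
    def __init__(self):
        self.blocks = []      # entries of [key, value, valid flag]
        self.index = {}       # key -> position of its latest block

    def write(self, key, value):
        if key in self.index:
            self.blocks[self.index[key]][2] = False   # old version becomes garbage
        self.blocks.append([key, value, True])
        self.index[key] = len(self.blocks) - 1

    def garbage_ratio(self):
        invalid = sum(1 for _, _, valid in self.blocks if not valid)
        return invalid / len(self.blocks) if self.blocks else 0.0

log = AppendOnlyLog()
log.write("D1", b"v1")
log.write("D1", b"v2")        # the first version of D1 is now invalid data
print(log.garbage_ratio())    # 0.5
```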
In this embodiment, the logical unit is the basic unit of garbage collection. In other words, when a certain condition is triggered, the system selects one or more logical units to be reclaimed (also called source logical units) from the plurality of logical units, migrates the valid data in these logical units elsewhere, and then releases them, thereby reclaiming storage space.
The garbage collection method provided in this embodiment is described below with reference to fig. 3 and 4. The method can be applied to the distributed storage system shown in fig. 1, and the object of garbage collection is a logical unit as shown in fig. 2. Fig. 3 shows the effect of the garbage collection method, and fig. 4 shows its flow. As shown in fig. 4, the method includes the following steps.
In S401, the master node determines a source logical unit. This step is usually performed under a certain trigger condition, for example, the amount of garbage data in the system reaches a certain amount threshold, the available storage space in the system falls below a certain space threshold, or the number of logical units meeting the reclamation condition reaches a certain amount. The source logical unit also needs to satisfy a certain condition, for example, the amount of garbage data contained in the logical unit reaches a first garbage threshold, or the amount of valid data contained in the logical unit is lower than a second garbage threshold. There may be one or more source logical units. Taking fig. 3 as an example, assume that the determined source logical units are logical unit 1, logical unit 2, and logical unit 3.
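A hedged sketch of this selection step is given below. The threshold values, the block-count units, and the field names are assumptions for illustration; the patent only requires that the source logical units satisfy some garbage or valid-data condition once a trigger fires.

```python
# S401 sketch: pick source logical units whose garbage exceeds the first
# threshold or whose remaining valid data is below the second threshold.
from collections import namedtuple

LogicalUnit = namedtuple("LogicalUnit", "lu_id garbage_blocks valid_blocks")

FIRST_GARBAGE_THRESHOLD = 64    # assumed unit: data blocks of garbage data
SECOND_GARBAGE_THRESHOLD = 16   # assumed unit: data blocks of valid data

def pick_source_logical_units(logical_units):
    return [
        lu for lu in logical_units
        if lu.garbage_blocks >= FIRST_GARBAGE_THRESHOLD
        or lu.valid_blocks <= SECOND_GARBAGE_THRESHOLD
    ]

units = [LogicalUnit(1, 80, 10), LogicalUnit(2, 70, 20),
         LogicalUnit(3, 90, 12), LogicalUnit(4, 5, 200)]
print([lu.lu_id for lu in pick_source_logical_units(units)])  # -> [1, 2, 3]
```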
In S402, the master node determines the nodes where the source logical units are located. Assume that logical unit 1, logical unit 2, and logical unit 3 are all source logical units. As shown in fig. 3, logical unit 1 has RAID type "4 + 2" and is distributed across node 2, node 3, node 4, node 5, node 6, and node 7. The RAID type of logical unit 2 is the same as that of logical unit 1, and it is distributed across node 1, node 2, node 3, node 5, node 6, and node 7. The RAID type of logical unit 3 is the same as that of logical unit 1, and it is distributed across node 1, node 2, node 3, node 4, node 5, and node 7. Thus, the nodes where the source logical units are located include node 1, node 2, node 3, node 4, node 5, node 6, and node 7.
In S403, the master node counts the amount of valid data contained in each node where the source logical units are located, and selects nodes whose amount of valid data exceeds a set number threshold as target nodes. In practical applications, the amount of valid data is often counted at the granularity of the data blocks described above. If a block contains only valid data, it is called a valid block (indicated by the white Dn blocks in fig. 3); if a block contains invalid data, it is called an invalid block (indicated by the gray Dn blocks in fig. 3). The P and Q data blocks in fig. 3 store check data; because the data fragments contained in the original stripes normally change after garbage collection is completed, the check fragments are recalculated and stored to ensure reliability. Thus, "valid" and "invalid" apply only to data blocks in data fragments; the P and Q blocks contain neither valid nor invalid data, and only the valid data blocks contained in data fragments are counted.
For example, as shown in fig. 3, node 2, node 3, node 4, and node 5 each contain 3 or 4 valid data blocks; if the number threshold is 2, each of node 2, node 3, node 4, and node 5 can be a target node. Fig. 3 is only an example: in this embodiment, any node whose amount of valid data exceeds the set number threshold can be a target node, and there may be one or more target nodes; this embodiment is not limited in this respect.
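A sketch of S402–S403 follows: count the valid data blocks per node across the selected source logical units and keep the nodes whose count exceeds the threshold. The data shapes and the example numbers are assumptions for illustration, not the actual contents of fig. 3.

```python
# S403 sketch: per-node valid-block counting and threshold-based selection.
from collections import Counter

def select_target_nodes(source_units, number_threshold):
    """source_units: list of dicts mapping node_id -> list of valid block ids."""
    valid_per_node = Counter()
    for unit in source_units:
        for node_id, valid_blocks in unit.items():
            valid_per_node[node_id] += len(valid_blocks)
    return {node for node, count in valid_per_node.items() if count > number_threshold}

# Hypothetical distribution of valid blocks over nodes 1-5:
source_units = [
    {2: ["D1", "D2", "D7"], 3: ["D3", "D4", "D8"], 4: ["D5"], 5: ["D6"]},
    {2: ["D9"], 4: ["D11", "D12"], 5: ["D13", "D14"], 1: ["D19"]},
]
print(select_target_nodes(source_units, number_threshold=2))  # -> {2, 3, 4, 5}
```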
In S404, the master node creates a target logical unit (for example, logical unit 4 shown in fig. 3), where at least a part of the storage space occupied by the target logical unit comes from the target nodes. The RAID type of the newly created target logical unit is consistent with the RAID type of the source logical units (logical unit 1, logical unit 2, and logical unit 3), so logical unit 4 needs to span 6 nodes. These 6 nodes include the target nodes selected in S403; if the selected target nodes are not enough, additional nodes are selected from the distributed storage system to reach 6. For example, if the number of target nodes in S403 is 4, 2 more nodes need to be selected. As shown in fig. 3, the storage space of logical unit 4 comes from node 2, node 3, node 4, node 5, node 6, and node 7. Node 2, node 3, node 4, and node 5 are the target nodes selected in S403, and node 6 and node 7 are the two additionally selected nodes; the selection policy for the nodes other than the target nodes may be load balancing or random selection, which is not limited in this embodiment. The RAID type of logical unit 4 is "4 + 2": node 2, node 3, node 4, and node 5 may be used to store data fragments, and node 6 and node 7 are used to store check fragments.
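The sketch below illustrates S404 under stated assumptions: the target logical unit must span as many nodes as the RAID type requires (6 for "4 + 2"), target nodes are used first, and the remainder are filled from the other nodes (randomly here, which is one of the policies the text allows). The function name and shapes are illustrative only.

```python
# S404 sketch: assemble the node set for the target logical unit.
import random

def choose_nodes_for_target_unit(target_nodes, all_nodes, raid_width=6):
    chosen = list(target_nodes)[:raid_width]                  # target nodes first
    remaining = [n for n in all_nodes if n not in chosen]
    random.shuffle(remaining)                                 # could also be load-balanced
    chosen += remaining[:raid_width - len(chosen)]
    if len(chosen) < raid_width:
        raise ValueError("not enough storage nodes for the RAID type")
    return chosen

# e.g. target nodes {2, 3, 4, 5} plus two more nodes picked from nodes 1-7
print(choose_nodes_for_target_unit({2, 3, 4, 5}, all_nodes=range(1, 8)))
```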
In S405, the valid data blocks in the source logical units are migrated to the target logical unit (logical unit 4). Because the valid data blocks in the source logical units are distributed over multiple nodes, the master node needs to send an instruction to the node where each valid data block is located, instructing that node to migrate the valid data blocks it stores to the target logical unit. There are two cases. Case 1: for a target node, valid data blocks only need to be migrated inside the node. In this case, the master node may instruct the target node to migrate a valid data block from a first source address in the target node to a first target address in the target node, where the first source address and the first target address are both real addresses, the storage space indicated by the first source address belongs to the source logical unit, and the storage space indicated by the first target address belongs to the target logical unit. For example, data block D1 has both its source address and its target address inside node 2, so D1 only needs to be migrated inside node 2. Case 2: for a node other than the target nodes, the valid data blocks stored in that node need to be sent to one of the target nodes. In this case, the master node instructs that node to migrate the stored valid data block from a second source address to a second target address. For example, data block D19 has its source address in node 1 and its target address in node 2. Because node 1 does not provide storage space for logical unit 4, D19 needs to be sent to one of the target nodes (node 2 in fig. 3), and node 2 stores D19 in logical unit 4.
In an alternative implementation, before S405, the master node allocates a target address to each valid data block according to a minimum migration policy and creates a migration list 50 (as shown in fig. 5). The migration list 50 includes the source address and the target address of each valid data block. The source address of a valid data block is its actual address before migration, and the target address is its actual address after migration. The minimum migration policy is a migration policy that minimizes cross-node data migration, that is, a principle of avoiding data migration across nodes wherever possible. For example, a target node (such as node 2) provides storage space for both the source logical unit and the target logical unit, so valid data blocks in the target node do not need to be migrated to other nodes. A non-target node (such as node 6) does not provide storage space for the target logical unit, so valid data blocks in that node have to be migrated to a node where the target logical unit is located. After the master node creates the migration list 50, it sends the list to the nodes where the valid data blocks are located and instructs those nodes to migrate according to the target addresses in the list. It should be noted that the minimum migration policy is only one possible migration policy, and other migration policies may also be used in the embodiments of the present invention. Because the destination nodes of migration in the prior art are randomly selected, as long as at least one data block is assigned a target address such that it remains in its node after migration, in other words, the source node and the target node are the same node for this data block, the beneficial effect of reducing cross-node migration is produced compared with the prior art, and such a policy therefore falls within the intended scope of the embodiments of the present invention.
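A sketch of the minimum migration policy follows: a valid block stays on its node whenever that node also backs the target logical unit; otherwise it is assigned to one of the target nodes. Free-space bookkeeping on the receiving side is omitted, and all names and the round-robin choice of receiver are assumptions for illustration.

```python
# Minimum-migration assignment: build (block_id, source_node, target_node)
# triples that a master node could place into the migration list.
import itertools

def build_migration_list(valid_blocks, target_unit_nodes):
    """valid_blocks: list of (block_id, source_node) pairs."""
    receivers = itertools.cycle(sorted(target_unit_nodes))   # round-robin for cross-node moves
    migration_list = []
    for block_id, source_node in valid_blocks:
        if source_node in target_unit_nodes:
            destination = source_node          # in-node migration, no network traffic
        else:
            destination = next(receivers)      # must cross the network
        migration_list.append((block_id, source_node, destination))
    return migration_list

blocks = [("D1", 2), ("D3", 3), ("D19", 1)]
print(build_migration_list(blocks, target_unit_nodes={2, 3, 4, 5}))
# D1 and D3 stay on nodes 2 and 3; D19 is sent from node 1 to a target node.
```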
In another optional implementation, the master node does not generate the migration list 50; instead, after allocating a target address to each valid data block according to the minimum migration policy, it directly instructs the node where each valid data block is located to perform data migration according to that target address.
Further, for a target node, valid data blocks are migrated inside the node, but the processing differs in different scenarios. If the source address and the target address of a valid data block point to different hard disks, the valid data block needs to be read from the source address into a cache in the node during migration, and then obtained from the cache and rewritten to the target address. For example, data block D5 has its source address in hard disk 0 of node 4 and its target address in hard disk 1 of node 4; node 4 therefore needs to read D5 from hard disk 0 into the cache, obtain D5 from the cache, and write it to hard disk 1. If the source address and the target address of a valid data block point to the same hard disk, the block does not need to be read from the hard disk into the cache, and the migration can be carried out directly inside the hard disk. In this case, the processor of the node may send a migration instruction to the hard disk where the valid data block is located, the migration instruction including the source address and the target address of the block. The hard disk can read the data directly from the source address and then write it to the target address. For example, data block D3 has both its source address and its target address in hard disk 0 of node 3, so the processor of node 3 sends a migration instruction to the read-write chip of hard disk 0, which writes D3 from in-disk offset address 2 to in-disk offset address 10. In this embodiment, the in-disk offset address indicates the specific location where data is stored in the hard disk.
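The sketch below illustrates the two in-node cases just described: if source and target addresses are on the same hard disk, the move can be pushed down to the disk itself; otherwise the node reads the block into its cache and rewrites it to the other disk. The Disk and Node classes are stand-ins assumed for illustration, not a real storage stack.

```python
# In-node migration dispatch: same-disk offload vs. cross-disk via cache.
class Disk:
    def __init__(self):
        self.cells = {}                              # in-disk offset -> data

    def internal_move(self, src_off, dst_off):       # executed by the disk itself
        self.cells[dst_off] = self.cells.pop(src_off)

class Node:
    def __init__(self, disk_count):
        self.disks = [Disk() for _ in range(disk_count)]
        self.cache = {}

    def migrate(self, src, dst):
        (src_disk, src_off), (dst_disk, dst_off) = src, dst
        if src_disk == dst_disk:
            # same hard disk: offload to the disk, sparing the node's processor
            self.disks[src_disk].internal_move(src_off, dst_off)
        else:
            # different hard disks: go through the node's cache
            self.cache[dst] = self.disks[src_disk].cells.pop(src_off)
            self.disks[dst_disk].cells[dst_off] = self.cache.pop(dst)

node = Node(disk_count=2)
node.disks[0].cells[2] = b"D5"
node.migrate(src=(0, 2), dst=(1, 7))     # cross-disk move, like D5 in the text
node.disks[0].cells[5] = b"D3"
node.migrate(src=(0, 5), dst=(0, 9))     # same-disk move, like D3 in the text
```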
For the target logical unit (for example, logical unit 4 shown in fig. 3), after each data fragment is written into the corresponding node, the master node also needs to compute the check fragments of those data fragments. A check fragment consists of check data blocks, illustrated in fig. 3 as P and Q. After the check data blocks are calculated, the master node sends the check fragments (including the check data blocks) to the corresponding nodes for storage.
After all the valid data in the source logical units has been migrated to the target logical unit, the master node updates the metadata of the data. The metadata includes the logical address and the physical address of the data. The logical address consists of the identifier of the logical unit in which the data is located and the offset within that logical unit. It can be understood that the logical address of a valid data block changes after it is migrated from the source logical unit to the target logical unit; so that the client server 101 can subsequently read the correct data, the master node needs to modify the logical address of the data. The physical address is the physical location where the data is actually stored, and indicates the identifier of the node where the data is located, the identifier of the hard disk in the node, and the in-disk offset address (refer to fig. 5). Whenever data is migrated from one node to another, from one hard disk to another, or within the same hard disk, the changed physical address needs to be recorded in the metadata of the data.
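A sketch of this metadata update is shown below: each block's metadata holds a logical address (logical-unit identifier plus offset) and a physical address (node, disk, in-disk offset), and either or both may need rewriting after migration. The dictionary layout is an assumption for illustration.

```python
# Per-block metadata record and its update after migration.
metadata = {
    "D1": {
        "logical": {"logical_unit": 1, "offset": 0},
        "physical": {"node": 2, "disk": 0, "in_disk_offset": 2},
    },
}

def update_after_migration(block_id, new_logical, new_physical):
    entry = metadata[block_id]
    entry["logical"] = new_logical        # changes when the block moves to a new logical unit
    entry["physical"] = new_physical      # changes whenever the real location moves

update_after_migration(
    "D1",
    new_logical={"logical_unit": 4, "offset": 0},
    new_physical={"node": 2, "disk": 0, "in_disk_offset": 10},
)
```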
In S406, the master node releases the storage space occupied by the source logical units (logical unit 1, logical unit 2, and logical unit 3 shown in fig. 3). Before the release, all data stored in the source logical units, including valid data and invalid data, is deleted. The storage space obtained after the release becomes available to other logical units. It should be noted that, after all valid data in the source logical units has been migrated to the target logical unit, S406 specifically means that the master node separately releases the storage space of the source logical units distributed in each storage node.
According to the garbage collection method shown in fig. 4, the nodes that hold more valid data are selected to continue to provide storage space for the target logical unit, so the valid data stored in these nodes can remain in place, which avoids forwarding between nodes and saves network bandwidth. Even though a small amount of valid data located on other nodes still needs to be sent to the nodes containing more valid data, network bandwidth is still saved to a certain extent compared with the prior art.
Further, S405 has at least two implementations. One implementation is to migrate each valid data block in logical unit 1, logical unit 2, and logical unit 3 into the storage space corresponding to logical unit 4 regardless of the logical address of the block. In other words, the position of a valid data block within its logical unit changes from before to after migration; the data fragments contained in a stripe will then also change, and in this case the check fragments have to be recalculated. In the other implementation, the valid data blocks are migrated from the source logical unit to the target logical unit according to their original logical addresses, so that the offset of each migrated valid data block within logical unit 4 is the same as its original offset. With this migration mode, when a large number of logical units need to be reclaimed, it happens with high probability that the data fragments contained in some stripes are unchanged after the migration, and the check fragments of those stripes do not need to be recalculated. This approach can therefore save the computing resources of the system compared with the former approach. A specific example is described below.
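The sketch below contrasts the two migration styles just discussed. If every valid block keeps its offset, a stripe whose data fragments are unchanged keeps its old check fragments; repacking blocks into new offsets forces a parity recomputation. Stripe membership is simplified to a set of block identifiers, which is an assumed simplification rather than the real layout.

```python
# Offset-preserving vs. repacking migration, and the parity-reuse check.
def offset_preserving(valid_blocks):
    """valid_blocks: dict block_id -> offset in the source logical unit."""
    return dict(valid_blocks)                 # offsets carried over unchanged

def repacking(valid_blocks):
    """Assign new, densely packed offsets regardless of the old ones."""
    return {block: new_off for new_off, block in enumerate(sorted(valid_blocks))}

def needs_new_check_fragment(before, after, stripe_blocks):
    """A stripe keeps its check fragments only if its data blocks are unchanged."""
    return any(before.get(b) != after.get(b) for b in stripe_blocks)

src = {"D33": 8, "D34": 9, "D41": 12}
print(needs_new_check_fragment(src, offset_preserving(src), {"D33", "D34"}))  # False
print(needs_new_check_fragment(src, repacking(src), {"D33", "D34"}))          # True
```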
Referring to fig. 6 and 7, fig. 6 shows another garbage collection method provided in this embodiment. The method can be applied to the distributed storage system shown in fig. 1, and the object of garbage collection is a logical unit as shown in fig. 2. Fig. 6 is a schematic flowchart of the method, and fig. 7 is a schematic diagram of its effect. As shown in fig. 6, the method includes the following steps.
S601, the master node determines at least two source logical units. This step is usually performed under a certain trigger condition, which is consistent with the trigger condition in S401 shown in fig. 4; refer to the description of S401. The at least two source logical units also need to satisfy a condition; optionally, the master node may require each source logical unit to meet a certain condition, and the specific condition may refer to the description of S401.
For example, take the determination of two source logical units, logical unit 22 and logical unit 33. In this embodiment, every source logical unit may be required to satisfy the trigger condition, or only one of them may be required to satisfy it; for example, the amount of garbage data in logical unit 22 reaches the first garbage threshold, while the amount of garbage data in logical unit 33 is lower than the second garbage threshold. This arrangement means that the two source logical units differ in the amount of garbage data they contain. Further, in some scenarios, logical unit 22 may be the logical unit with the largest amount of garbage data, while logical unit 33 is the logical unit with the smallest amount of garbage data. Similarly, this embodiment may also use other equivalent conditions to select the two source logical units.
S602, the master node determines the nodes where the source logical units are located. For the nodes where logical unit 22 and logical unit 33 are located, refer to the example in fig. 7.
S603, the master node counts the amount of valid data contained in each node where the source logical units are located, and selects nodes whose amount of valid data exceeds a set number threshold as target nodes. The specific implementation of S603 is the same as that of S403; refer to the description of S403.
S604, the master node creates a target logical unit (for example, logical unit 44 shown in fig. 7), and at least part of the storage space occupied by the target logical unit comes from the target nodes. The specific implementation of S604 is the same as that of S404; refer to the description of S404. In the example of fig. 7, the nodes where logical unit 44 is located completely coincide with the nodes where logical units 22 and 33 are located, but this is just an example; in an actual application scenario, the nodes where logical unit 44 is located may only partially coincide with the nodes where logical units 22 and 33 are located (as in fig. 3).
S605, the valid data blocks in logical unit 33 are migrated to logical unit 44 without changing the offset of each valid data block within the logical unit. That is, the offset of these valid data blocks within logical unit 33 before migration is the same as their offset within logical unit 44 after migration.
S606, the valid data blocks in logical unit 22 are written into the blank data blocks of logical unit 44. As shown in fig. 7, after the stripe where D50 is located is migrated to logical unit 44, some data fragments may contain blank data blocks, because the corresponding original data blocks had become invalid in logical unit 33. Therefore, when the valid data blocks in logical unit 22 are migrated, the blank data blocks are filled first, and any overflow is written into new stripes. D1, D3, and so on, as shown in fig. 7, are all written into the stripe where D51 is located. To distinguish it from the valid data of logical unit 33, the valid data of logical unit 22 is called padding data in the example of fig. 7 and is indicated by dashed lines.
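A sketch of S605–S606 is shown below: blocks from logical unit 33 keep their offsets in logical unit 44, leaving blanks where invalid blocks used to be; blocks from logical unit 22 then fill those blanks first and overflow into fresh offsets. Viewing a logical unit as a flat "offset to block" map is a simplification assumed for this sketch.

```python
# Merge two source logical units into the target: offsets preserved for one,
# padding data from the other filling blanks first, overflow appended after.
def merge_into_target(keep_offsets, padding_blocks, unit_size):
    target = dict(keep_offsets)                       # logical unit 33's blocks, offsets unchanged
    blanks = [off for off in range(unit_size) if off not in target]
    spill = unit_size
    for block in padding_blocks:                      # logical unit 22's blocks ("padding data")
        if blanks:
            target[blanks.pop(0)] = block             # fill blank data blocks first
        else:
            target[spill] = block                     # overflow into a new stripe
            spill += 1
    return target

unit33 = {0: "D33", 1: "D34", 5: "D41"}               # offsets preserved from logical unit 33
print(merge_into_target(unit33, ["D1", "D3", "D5"], unit_size=6))
```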
With the migration manner of S605-S606, the data fragments contained in some of the stripes of logical unit 33 are unchanged after migration, such as the stripe where D33 is located and the stripe where D41 is located in fig. 7; the check fragments of these stripes therefore do not need to be recalculated. For the stripe where D50 is located and the stripe where D58 is located, new data blocks are filled in and the stripes change, so their check fragments need to be recalculated. It can be seen that even though some stripes still require recalculation of check fragments, the computation of at least a part of the check fragments is avoided, and computing resources are saved. Moreover, because check fragments are normally calculated by the master node and then sent to the corresponding nodes for storage (such as node 5 and node 6 shown in fig. 7), a check fragment that does not need to be calculated is naturally not sent either, which also saves bandwidth resources.
Further, in this embodiment, logical unit 44 may also inherit the identifier of logical unit 33. In that case, for data blocks such as D33 and D34, neither their offset within the logical unit nor the identifier of the logical unit in which they are located changes, so the logical addresses corresponding to these data blocks do not change (a logical address comprises the identifier of the logical unit and the offset within the logical unit). Modification of the metadata of valid data blocks such as D33 and D34, and forwarding of the modified metadata between storage nodes, are therefore avoided, and network bandwidth is further saved.
S607, the master node releases the storage space occupied by logical unit 22 and logical unit 33. For this step, refer to the description of S406 shown in fig. 4.
This embodiment also provides a storage node, which may be a storage array or a server. When the storage node is a storage array, it includes a storage controller and a storage medium; the structure of the storage controller may refer to the structural diagram of fig. 8. When the storage node is a server, the structural diagram of fig. 8 may also be referred to. Thus, regardless of the form of the storage node, it includes at least a processor 801 and a memory 802. The memory 802 stores a program 803. The processor 801, the memory 802, and the interface 804 are connected by a system bus 805 and communicate with each other.
The processor 801 is a single-core or multi-core central processing unit, or an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the present invention. The memory 802 may be a random access memory (RAM) or a non-volatile memory, such as at least one hard disk memory. The memory 802 is used to store computer-executable instructions; specifically, the computer-executable instructions may include the program 803. When the storage node is running, the processor 801 runs the program 803 to execute the method flow of S401-S406 shown in fig. 4 or the method flow of S601-S607 shown in fig. 6.
Referring to fig. 9, the present embodiment further provides a garbage collection apparatus, where the apparatus is located in a master node of a distributed storage system, the distributed storage system includes a plurality of storage nodes, the master node is one of the plurality of storage nodes, and the garbage collection apparatus includes the following modules.
A selecting module 901, configured to select a target node from the multiple storage nodes according to the amount of valid data of the source logical unit distributed in each storage node, where the amount of first valid data stored by the target node exceeds a set number threshold. For the specific functions of this module, refer to S401, S402, and S403 shown in fig. 4, and S601, S602, and S603 shown in fig. 6. In addition, the functions of the module may be performed by the processor 801 shown in fig. 8 running the program 803 in the memory 802.
A creating module 902, configured to create a target logical unit, where a storage node distributed by the target logical unit includes the target node. The specific functions of the module can refer to S404 shown in fig. 4 and S604 shown in fig. 6. In addition, the functions of the modules may be performed by the processor 801 shown in fig. 8 running the program 803 in the memory 802.
An instructing module 903, configured to instruct the target node to migrate the first valid data from a first source address in the target node to a first target address in the target node, where a storage space indicated by the first source address belongs to the source logical unit, and a storage space indicated by the first target address belongs to the target logical unit. The specific functions of the module can refer to S405 shown in fig. 4 and S605 and S606 shown in fig. 6. In addition, the functions of the modules may be performed by the processor 801 shown in fig. 8 running the program 803 in the memory 802.
A releasing module 904, configured to release the storage space indicated by the first source address. The specific functions of the module can refer to S406 shown in fig. 4 and S607 shown in fig. 6. In addition, the functions of the modules may be performed by the processor 801 shown in fig. 8 running the program 803 in the memory 802.
Optionally, the creating module 902 is further configured to create a migration list before the master node instructs the target node to migrate the first valid data from the first source address in the target node to the first target address in the target node, where the migration list includes the first source address of the first valid data and the first target address of the first valid data. The garbage collection apparatus may further include a sending module 905 configured to send the migration list to the target node.
Optionally, the migration list further includes a second source address of second valid data stored in the other storage nodes and a second target address of the second valid data, where the second source address is located in the other storage nodes and the second target address is located in the target node. The sending module 905 is further configured to send the migration list to the other storage nodes. The indicating module 903 is further configured to instruct the other storage nodes to migrate the stored second valid data from the second source address to the second target address, where the storage space indicated by the second source address belongs to the source logical unit and the storage space indicated by the second target address belongs to the target logical unit. The releasing module 904 is further configured to release the storage space indicated by the second source address.
Optionally, the first source address and the first target address are both located in a first hard disk of the target node.
Optionally, the first source address is located in a first hard disk of the target node, and the first target address is located in a second hard disk of the target node.
Optionally, the indicating module 903 is specifically configured to instruct the target node to migrate the first valid data from the first source address to the first target address according to the offset of the first valid data in the source logical unit, so that the offset of the first valid data in the target logical unit after migration is the same as its offset in the source logical unit before migration.
Optionally, the first valid data is distributed in a first stripe of the source logical unit before migration and in a second stripe of the target logical unit after migration. The indicating module 903 is further configured to determine whether the data fragments contained in the first stripe are the same as the data fragments contained in the second stripe, and, when they are the same, to retain the check fragments contained in the first stripe.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a storage node or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), among others.
It should be understood that the terms "first" and the like in the embodiments of the present application are merely for referring to objects, and do not indicate the order of the corresponding objects.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a storage node, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

PCT domestic application; the claims have been published.
CN201980089025.4A 2019-04-23 2019-04-23 Distributed storage system and garbage collection method in distributed storage system Pending CN113302597A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/083960 WO2020215223A1 (en) 2019-04-23 2019-04-23 Distributed storage system and garbage collection method used in distributed storage system

Publications (1)

Publication Number Publication Date
CN113302597A (en) 2021-08-24

Family

Family ID: 72941022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980089025.4A Pending CN113302597A (en) Distributed storage system and garbage collection method in distributed storage system

Country Status (2)

Country Link
CN (1) CN113302597A (en)
WO (1) WO2020215223A1 (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0007493D0 (en) * 2000-03-28 2000-05-17 Tao Group Ltd Garbage collection
CN102024018B (en) * 2010-11-04 2013-03-13 曙光信息产业(北京)有限公司 On-line recovering method of junk metadata in distributed file system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591789A (en) * 2011-12-26 2012-07-18 成都市华为赛门铁克科技有限公司 Storage space recovery method and storage space recovery device
US20150058525A1 (en) * 2013-08-20 2015-02-26 Seagate Technology Llc Garbage collection in hybrid memory system
CN103605726A (en) * 2013-11-15 2014-02-26 中安消技术有限公司 Method and system for accessing small files, control node and storage node
CN103858092A (en) * 2013-12-19 2014-06-11 华为技术有限公司 Data migration method and device
CN109074227A (en) * 2016-11-25 2018-12-21 华为技术有限公司 A kind of method and storage system of data check

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115202590A (en) * 2022-09-15 2022-10-18 中电云数智科技有限公司 Method and device for processing SSD (solid State disk) redirected data
CN115202590B (en) * 2022-09-15 2022-12-16 中电云数智科技有限公司 Method and device for processing SSD (solid state disk) redirection data

Also Published As

Publication number Publication date
WO2020215223A1 (en) 2020-10-29

Similar Documents

Publication Publication Date Title
US12216928B2 (en) Fragment management method and fragment management apparatus
JP6560759B2 (en) Storage system
US8805902B2 (en) Managing snapshot storage pools
EP3352071B1 (en) Data check method and storage system
US20120254513A1 (en) Storage system and data control method therefor
CN110096220B (en) Distributed storage system, data processing method and storage node
US11301137B2 (en) Storage system and data arrangement method of storage system
EP3336706A1 (en) Method and storage device for processing stripes in storage device
US11449402B2 (en) Handling of offline storage disk
CN109725838B (en) Method, apparatus and computer readable medium for managing a plurality of discs
CN111104057B (en) Node capacity expansion method in storage system and storage system
CN113302597A (en) Distributed storage system and garbage recycling method in distributed storage system
US11467907B2 (en) Storage system with multiple storage devices to store data
CN109840217B (en) Cache resource allocation and device
US11093158B2 (en) Sub-lun non-deduplicated tier in a CAS storage to reduce mapping information and improve memory efficiency
JP6605762B2 (en) Device for restoring data lost due to storage drive failure
CN117827757A (en) Distributed file system
CN116225748A (en) Data difference log query method, device, equipment and storage medium
CN117806528A (en) Data storage method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination