CN109840051B - Data storage method and device for a storage system - Google Patents

Info

Publication number: CN109840051B
Authority: CN (China)
Prior art keywords: storage, data, storage partition, storage system, stored
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201811613679.6A
Other languages: Chinese (zh)
Other versions: CN109840051A
Inventors: 董如良, 姬朋立, 付克博, 张进毅
Current assignee: Huawei Technologies Co Ltd (the listed assignees may be inaccurate)
Original assignee: Huawei Technologies Co Ltd
Application filed by: Huawei Technologies Co Ltd
Priority to: CN201811613679.6A
Earlier publication: CN109840051A
Granted publication: CN109840051B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data storage method and device for a storage system are provided. In the method, when the topology of the storage system is a first topology, M first storage partition groups corresponding to the first topology are generated, and each of the M first storage partition groups corresponds to K first storage devices. When the topology of the storage system is updated from the first topology to a second topology different from the first topology, N second storage partition groups corresponding to the second topology are generated, and each of the N second storage partition groups corresponds to P second storage devices. The M first storage partition groups stored in the storage system are then updated with the N second storage partition groups, so that after new data to be stored is received, it is stored in the second storage devices corresponding to at least one of the N second storage partition groups.

Description

Data storage method and device for a storage system

Technical field

The present application relates to the field of storage technologies, and in particular to a data storage method and device for a storage system.

Background

With the rapid growth in Internet users and the diversification of services, more and more data (for example, user data and service configuration data) needs to be stored in storage systems so that it can be analyzed and used to guide services.

To improve the storage performance of a storage system, a method of partitioning hard disks through virtualization has been proposed. First, several partition teams (PTs) are established according to a preset rule (for example, user requirements or the data redundancy mode of the storage pool of the storage system); for example, 2053 PTs are created, each containing 6 partitions (pts). Then, according to principles such as balance and/or location balance, the PTs are scattered across all hard disks of the storage system, so that each PT is fixedly mapped to partitions on 6 hard disks of the storage system. When data is stored, the storage system only needs to select one PT from the multiple PTs and then, according to the mapping between that PT and the hard disks, automatically store the stripe on the corresponding hard disks, which simplifies the data storage steps. Moreover, the storage system only needs to store the index number of the PT corresponding to the stripe; the data can later be retrieved from the hard disks according to that index number, which reduces the overhead of the storage system.
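The PT mapping described above can be illustrated with a minimal Python sketch. The function names, the round-robin placement, and the modulo rule for choosing a PT are assumptions made for illustration; the text only states that PTs are scattered according to balance principles and that a stripe is located by its PT index.

```python
def build_partition_groups(num_groups, disks, width):
    """Map each partition group (PT) to `width` distinct disks.

    Round-robin placement is an assumption made for this sketch; the text only
    says that PTs are scattered according to balance principles.
    """
    if not 0 < width <= len(disks):
        raise ValueError("width must be between 1 and the number of disks")
    groups = []
    for g in range(num_groups):
        start = (g * width) % len(disks)
        # Consecutive disks (wrapping around) are always distinct.
        groups.append([disks[(start + i) % len(disks)] for i in range(width)])
    return groups


def place_stripe(stripe_id, groups):
    """Pick a PT for a stripe; only the PT index needs to be recorded."""
    pt_index = stripe_id % len(groups)  # hypothetical selection rule
    return pt_index, groups[pt_index]


disks = [f"disk{i}" for i in range(9)]
groups = build_partition_groups(10, disks, 6)
pt_index, target_disks = place_stripe(23, groups)
```

Because the PT-to-disk mapping is fixed, recording `pt_index` alone is enough to find the disks that hold the stripe later.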

However, a storage system is dynamic. For example, some storage nodes in the storage system may fail, reducing its storage capacity (capacity reduction), or new storage nodes may be added to expand its capacity. When the storage system changes, for example when its capacity is reduced, some of the pre-created PTs can no longer be used, so the partitions on the other hard disks corresponding to those PTs cannot be used either, which lowers the space utilization of the storage system.

Summary of the invention

The present application provides a data storage method and device for improving the space utilization of a storage system.

In a first aspect, a data storage method for a storage system is provided. In the method, when the topology of the storage system is a first topology, M first storage partition groups corresponding to the first topology are generated, and each of the M first storage partition groups corresponds to K first storage devices. When the topology of the storage system is updated from the first topology to a second topology different from the first topology, N second storage partition groups corresponding to the second topology are generated, and each of the N second storage partition groups corresponds to P second storage devices. The M first storage partition groups stored in the storage system are then updated with the N second storage partition groups, so that after new data to be stored is received, it is stored in the second storage devices corresponding to at least one of the N second storage partition groups. N, P, M and K are positive integers.
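The flow of the first aspect can be sketched in Python as follows. This is only an illustration under stated assumptions: the `StorageSystem` class, the `generate_groups` helper, the choice of one group per disk, and the CRC-based group selection are all hypothetical and are not taken from the patent.

```python
import zlib


def generate_groups(disks, width):
    """Build one partition group per disk, each spanning `width` distinct disks."""
    if not 0 < width <= len(disks):
        raise ValueError("width must be between 1 and the number of disks")
    n = len(disks)
    return [[disks[(start + i) % n] for i in range(width)] for start in range(n)]


class StorageSystem:
    def __init__(self, disks, width):
        self.update_topology(disks, width)

    def update_topology(self, disks, width):
        """Regenerate the groups for the current topology, replacing the old set."""
        self.disks = list(disks)
        self.groups = generate_groups(self.disks, width)

    def store(self, key):
        """Return the group index and devices chosen for newly received data."""
        index = zlib.crc32(key.encode("utf-8")) % len(self.groups)
        return index, self.groups[index]


# First topology: 9 disks, K = 6 devices per group (M = 9 groups in this sketch).
system = StorageSystem([f"disk{i}" for i in range(9)], width=6)
first_groups = system.groups

# Second topology: 15 disks, P = 10 devices per group (N = 15 groups in this sketch).
system.update_topology([f"disk{i}" for i in range(15)], width=10)
index, devices = system.store("new-object")
```

After `update_topology`, new data is placed only on devices of the regenerated groups; migrating data written under the old groups is a separate step that this sketch leaves out.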

In the above technical solution, the storage system supports dynamic storage partition groups: when the topology of the storage system changes, its storage partition groups change as well. For example, suppose the storage system consists of 3 storage nodes with 3 hard disks each. To prevent a single node failure from making the whole storage system unusable, each storage partition group may include 4 data disks and 2 parity disks, in which case the space utilization of the storage system is 4/(4+2) ≈ 66.7%. When the storage system is expanded to 5 storage nodes with 3 hard disks each, it can generate new storage partition groups that each include 8 data disks and 2 parity disks, giving a space utilization of 8/(8+2) = 80%, which improves the space utilization of the storage system. When the capacity of the storage system is reduced because a hard disk in one of the 3 storage nodes fails, the storage system can likewise generate new storage partition groups to fit the reduced system; every one of the new storage partition groups is usable, which improves the space utilization of the storage system.
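The utilization figures in this example follow from the ratio of data disks to all disks in a group. A small Python check, where the helper name `space_utilization` is chosen only for illustration:

```python
from fractions import Fraction


def space_utilization(data_disks, parity_disks):
    """Fraction of a partition group's capacity that holds user data."""
    return Fraction(data_disks, data_disks + parity_disks)


before = space_utilization(4, 2)  # 3 nodes x 3 disks: 4 data + 2 parity
after = space_utilization(8, 2)   # 5 nodes x 3 disks: 8 data + 2 parity
print(f"{float(before):.1%} -> {float(after):.1%}")  # prints "66.7% -> 80.0%"
```

Exact fractions are used so that the two thirds in the 4+2 case is not affected by rounding.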

In a possible design, the value of P is the same as that of the data redundancy mode configured for the storage system under the second topology, and the value of K is the same as that of the data redundancy mode configured for the storage system under the first topology.

In the above technical solution, the number of storage devices in a storage partition group can follow the data redundancy mode configured for the storage system under each topology, so that the hard disk space of the storage system is fully used and hard disk utilization improves.

In a possible design, when P is greater than K, a part of the data stored in each of L first storage devices corresponding to each first storage partition group is the same as the data stored in one of the P second storage devices corresponding to the M second storage partition groups, where L is a positive integer smaller than P.

In the above technical solution, after the storage system is expanded, L first storage devices in each first storage partition group generated under the first topology may be the same as the second storage devices at the same positions in the corresponding second storage partition group. For example, the first storage devices in the first first storage partition group that store the first data shard and the third data shard are the same as the second storage devices in the first second storage partition group that store the first data shard and the third data shard. In other words, the P second storage devices in each second storage partition group may be associated with the K first storage devices in each first storage partition group. After expansion, each storage device in a second storage partition group stores less data. Therefore, when data already stored in the storage system needs to be stored according to the M second storage partition groups, part of the data stored in the L first storage devices is the same as the data stored in the second storage devices at the same positions in the corresponding second storage partition group. That part of the data does not need to be migrated, which reduces the resources consumed by data migration.

In a possible design, the storage system may migrate part of each piece of original data according to one of the N second storage partition groups, where each piece of original data is stored on the K storage devices corresponding to one of the M first storage partition groups, and the migrated part differs from the data stored in each of the L second storage devices.

In the above technical solution, if the L first storage devices in each first storage partition group are the same as the second storage devices at the same positions in the corresponding second storage partition group, the storage system only needs to migrate the data in each first storage partition group other than a part of the data on the L first storage devices, namely the part that is stored in the L second storage devices corresponding to the L first storage devices. This reduces the amount of data to be migrated.

In a possible design, when P is less than K, the data stored in each of S first storage devices corresponding to each first storage partition group is the same as a part of the data stored in one of the P second storage devices corresponding to the M second storage partition groups, where S is a positive integer smaller than P.

In the above technical solution, after the capacity of the storage system is reduced, S first storage devices in each first storage partition group generated under the first topology may be the same as the second storage devices at the same positions in the corresponding second storage partition group. For example, the first storage device in the first first storage partition group that stores the first data shard is the same as the second storage device in the first second storage partition group that stores the first data shard. In other words, the P second storage devices in each second storage partition group may be associated with the K first storage devices in each first storage partition group. After capacity reduction, each storage device in a second storage partition group stores more data. Therefore, when data already stored in the storage system needs to be stored according to the M second storage partition groups, the data stored in the S first storage devices is the same as part of the data stored in the second storage devices at the same positions in the corresponding second storage partition group. The data stored in the S first storage devices does not need to be migrated, which reduces the resources consumed by data migration.

In a possible design, the storage system may migrate part of each piece of original data according to one of the N second storage partition groups, where each piece of original data is stored on the K storage devices corresponding to one of the M first storage partition groups, and the migrated part differs from any part of the data stored in each of the P second storage devices.

In the above technical solution, if the S first storage devices in each first storage partition group are the same as the second storage devices at the same positions in the corresponding second storage partition group, the storage system only needs to migrate the data on the storage devices of each first storage partition group other than the S first storage devices, which reduces the amount of data to be migrated.

In a possible design, the P second storage devices corresponding to any one of the N second storage partition groups differ from the K first storage devices in any one of the M first storage partition groups.

In the above technical solution, whether the storage system is expanded or reduced, the second storage devices in each second storage partition group generated under the second topology may differ from the first storage devices in the corresponding first storage partition group under the first topology. Alternatively, the storage devices at the same positions in each second storage partition group and in the corresponding first storage partition group may differ. In other words, the P second storage devices in each second storage partition group need not be associated with the K first storage devices in each first storage partition group. The storage system can therefore flexibly choose how to generate the M second storage partition groups, which improves its flexibility.

In a possible design, the storage system may migrate the original data stored in at least one of the M first storage partition groups according to one of the N second storage partition groups.

In the above technical solution, after the storage partition groups of the storage system are updated from the M first storage partition groups to the N second storage partition groups, data previously stored according to the M first storage partition groups can be migrated so that it is also stored according to at least one of the N second storage partition groups, which improves the space utilization of the storage system.

In a second aspect, a data storage apparatus for a storage system is provided. The apparatus includes a processor configured to implement the method described in the first aspect. The apparatus may further include a memory for storing program instructions and data. The memory is coupled to the processor, and the processor can invoke and execute the program instructions stored in the memory to implement the method described in the first aspect. The apparatus may further include a communication interface through which the apparatus communicates with other devices. For example, the other device is a storage node.

In a possible design, the apparatus includes a communication interface and a processor, wherein:

the processor is configured to generate, when the topology of the storage system is updated from a first topology to a second topology, N second storage partition groups corresponding to the second topology, where each of the N second storage partition groups corresponds to P second storage devices, N and P are positive integers, and the first topology differs from the second topology;

the processor is further configured to update, with the N second storage partition groups, the M first storage partition groups stored in the storage system, where the M first storage partition groups correspond to the first topology, each of the M first storage partition groups corresponds to K first storage devices, and M and K are positive integers;

the processor is further configured to, after receiving new data to be stored through the communication interface, store the new data in the second storage devices corresponding to at least one of the N second storage partition groups.

In a possible design, the value of P is the same as that of the data redundancy mode configured for the storage system under the second topology, and the value of K is the same as that of the data redundancy mode configured for the storage system under the first topology.

In a possible design, when P is greater than K, a part of the data stored in each of L first storage devices corresponding to each first storage partition group is the same as the data stored in one of the P second storage devices corresponding to the M second storage partition groups, where L is a positive integer smaller than P.

In a possible design, the processor is further configured to:

migrate part of each piece of original data according to one of the N second storage partition groups, where each piece of original data is stored on the K storage devices corresponding to one of the M first storage partition groups, and the migrated part differs from the data stored in each of the L second storage devices.

In a possible design, when P is less than K, the data stored in each of S first storage devices corresponding to each first storage partition group is the same as a part of the data stored in one of the P second storage devices corresponding to the M second storage partition groups, where S is a positive integer smaller than P.

In a possible design, the processor is further configured to:

migrate part of each piece of original data according to one of the N second storage partition groups, where each piece of original data is stored on the K storage devices corresponding to one of the M first storage partition groups, and the migrated part differs from any part of the data stored in each of the P second storage devices.

In a possible design, the P second storage devices corresponding to any one of the N second storage partition groups differ from the K first storage devices in any one of the M first storage partition groups.

In a possible design, the processor is further configured to:

migrate each piece of original data according to one of the N second storage partition groups, where each piece of original data is data that the storage system stored, under the first topology, according to at least one of the M first storage partition groups.

In a third aspect, a data storage apparatus for a storage system is provided. The apparatus may be a controller of the storage system or a component within the controller of the storage system. The apparatus may include a processing module and a communication module, which can carry out the method of any design example of the first aspect. Specifically:

the processing module is configured to generate, when the topology of the storage system is updated from a first topology to a second topology, N second storage partition groups corresponding to the second topology, where each of the N second storage partition groups corresponds to P second storage devices, N and P are positive integers, and the first topology differs from the second topology;

the processing module is further configured to update, with the N second storage partition groups, the M first storage partition groups stored in the storage system, where the M first storage partition groups correspond to the first topology, each of the M first storage partition groups corresponds to K first storage devices, and M and K are positive integers;

the processing module is further configured to, after receiving new data to be stored through the communication module, store the new data in the second storage devices corresponding to at least one of the N second storage partition groups.

In a possible design, the value of P is the same as that of the data redundancy mode configured for the storage system under the second topology, and the value of K is the same as that of the data redundancy mode configured for the storage system under the first topology.

In a possible design, when P is greater than K, a part of the data stored in each of L first storage devices corresponding to each first storage partition group is the same as the data stored in one of the P second storage devices corresponding to the M second storage partition groups, where L is a positive integer smaller than P.

In a possible design, the processing module is further configured to:

migrate part of each piece of original data according to one of the N second storage partition groups, where each piece of original data is stored on the K storage devices corresponding to one of the M first storage partition groups, and the migrated part differs from the data stored in each of the L second storage devices.

In a possible design, when P is less than K, the data stored in each of S first storage devices corresponding to each first storage partition group is the same as a part of the data stored in one of the P second storage devices corresponding to the M second storage partition groups, where S is a positive integer smaller than P.

In a possible design, the processing module is further configured to:

migrate part of each piece of original data according to one of the N second storage partition groups, where each piece of original data is stored on the K storage devices corresponding to one of the M first storage partition groups, and the migrated part differs from any part of the data stored in each of the P second storage devices.

In a possible design, the P second storage devices corresponding to any one of the N second storage partition groups differ from the K first storage devices in any one of the M first storage partition groups.

In a possible design, the processing module is further configured to:

migrate each piece of original data according to one of the N second storage partition groups, where each piece of original data is data that the storage system stored, under the first topology, according to at least one of the M first storage partition groups.

第四方面,提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被计算机执行时,使所述计算机执行第一方面中任意一项所述的方法。In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a computer, cause the computer to execute a first The method of any one of the aspects.

第五方面,提供一种计算机程序产品,所述计算机程序产品存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被计算机执行时,使所述计算机执行第一方面中任意一项所述的方法。In a fifth aspect, a computer program product is provided, the computer program product stores a computer program, and the computer program includes program instructions that, when executed by a computer, cause the computer to execute any one of the first aspects. method described in item.

第六方面,提供了一种芯片系统,该芯片系统包括处理器,还可以包括存储器,用于实现第一方面所述的方法。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。In a sixth aspect, a chip system is provided, the chip system includes a processor, and may further include a memory, for implementing the method described in the first aspect. The chip system can be composed of chips, and can also include chips and other discrete devices.

For the beneficial effects of the second to sixth aspects and their implementations, refer to the description of the beneficial effects of the method of the first aspect and its implementations.

Description of Drawings

FIG. 1 is a schematic diagram of the distribution across storage devices of 10 PTs created by a storage system in an embodiment of this application;

FIG. 2A is a schematic diagram of an example of a view corresponding to a PT in an embodiment of this application;

FIG. 2B is a schematic diagram of another example of a view corresponding to a PT in an embodiment of this application;

FIG. 3A is a schematic diagram of an example of a storage system provided in an embodiment of this application;

FIG. 3B is a schematic diagram of another example of a storage system provided in an embodiment of this application;

FIG. 3C is a schematic diagram of another example of a storage system provided in an embodiment of this application;

FIG. 4 is a flowchart of an example of a data storage method provided in an embodiment of this application;

FIG. 5 is a schematic diagram of an example of 6 PTs of a storage system in an embodiment of this application;

FIG. 6 is a schematic diagram of an example of a first view of a storage system under a first topology in an embodiment of this application;

FIG. 7A is a schematic diagram of an example of a second view of a storage system under a second topology in an embodiment of this application;

FIG. 7B is a schematic diagram of another example of a second view of a storage system under a second topology in an embodiment of this application;

FIG. 7C is a schematic diagram of another example of a second view of a storage system under a second topology in an embodiment of this application;

FIG. 8 is a flowchart of another example of a data storage method in an embodiment of this application;

FIG. 9 is a schematic diagram of another example of a view corresponding to a PT in an embodiment of this application;

FIG. 10 is a schematic diagram of another example of a first view of a storage system in an embodiment of this application;

FIG. 11 is a schematic diagram of another example of a second view of a storage system in an embodiment of this application;

FIG. 12 is a schematic structural diagram of an example of a data storage apparatus of a storage system provided in an embodiment of this application;

FIG. 13 is a schematic structural diagram of another example of a data storage apparatus of a storage system provided in an embodiment of this application.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the embodiments are described in further detail below with reference to the accompanying drawings.

The technical terms used in the embodiments of this application are first explained to aid understanding by those skilled in the art.

1) A partition (pt) is the basic storage unit of a hard disk in the storage system. A PT is a group of pts: each pt belongs to exactly one PT, and each PT contains multiple pts.

2) Data redundancy mode, also called data redundancy level, is a protection mechanism applied when data is stored, in order to prevent data loss or errors. Data redundancy is usually implemented in one of two ways: replication or erasure coding (EC). In replication, the data to be stored is copied several times, for example into 3 copies, and the copies are stored in different partitions, so that if one partition fails and its data is lost, the data can be obtained from another partition. In this case, the number of copies is the data redundancy mode. In EC, the data to be stored is divided into multiple pieces, each piece is encoded to obtain multiple data shards, and parity parts are generated from the data shards. The data shards and parity parts are then stored in different partitions, so that if one partition fails and its data is lost, the data can be reconstructed from the data shards and parity parts stored on other storage nodes. In this case, the sum of the numbers of data shards and parity parts is the data redundancy mode, which may also be called the EC ratio. For example, EC4+2 means 4 partitions for storing data shards and 2 partitions for storing parity parts, and EC8+2 means 8 partitions for storing data shards and 2 partitions for storing parity parts. When data is stored in another way, the data redundancy mode can be interpreted similarly; this is not limited here.
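The reconstruction idea behind EC can be illustrated with a deliberately simplified sketch. It assumes a single parity part computed as the byte-wise XOR of 4 data shards, which tolerates the loss of only one shard. An actual EC4+2 scheme needs a code with two parity parts (for example, a Reed-Solomon code), which this sketch does not implement. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(shards: List[bytes]) -> bytes:
    """Single parity part = XOR of all data shards (simplification)."""
    return reduce(xor_bytes, shards)


def rebuild(shards: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the one missing shard (marked None) from the survivors and the parity."""
    survivors = [s for s in shards if s is not None]
    return reduce(xor_bytes, survivors, parity)


data_shards = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(data_shards)

# Simulate the loss of the partition that held the second shard.
damaged = [data_shards[0], None, data_shards[2], data_shards[3]]
recovered = rebuild(damaged, parity)
```

Because XOR is its own inverse, combining the parity with the three surviving shards yields the missing shard.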

3) A storage device may be, for example, a serial advanced technology attachment (SATA) hard disk, a small computer system interface (SCSI) hard disk, a serial attached SCSI (SAS) hard disk, a fibre channel (FC) hard disk, a hard disk drive (HDD), or a solid state drive (SSD).

4) A view is a mapping table from pts to storage devices (for example, hard disks). FIG. 1 is a schematic diagram of how 10 PTs created by a storage system are distributed across storage devices. In FIG. 1, each PT includes 6 pts, the 6 pts shown in the same color belong to the same PT, and A to J denote different storage devices. For example, the 6 pts of the first PT are located on storage devices A to F, the 6 pts of the second PT are located on storage devices B to G, and so on. Taking EC storage with the EC4+2 data redundancy mode as an example, the view shown in FIG. 2A is obtained for the first PT, where D1 to D4 denote the 4 pts of the first PT that store data shards, P and Q denote the pts that store parity parts, and the character in parentheses denotes the storage device to which the pt belongs. The view shown in FIG. 2B is obtained for the second PT, and the same applies to the other PTs; details are not repeated here. In the views shown in FIG. 2A and FIG. 2B, entries are arranged in ascending order of hard disk number, but an actual view may be unordered; this is not limited here.
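As a minimal sketch of such a view, the following builds the pt-to-device mapping for the 10 PTs of FIG. 1. It assumes that PT i starts one device after PT i-1 and wraps around after device J; the text states this only for the first two PTs, so the wrap-around is an assumption, and the names `ROLES` and `build_views` are hypothetical.

```python
import string

ROLES = ["D1", "D2", "D3", "D4", "P", "Q"]  # EC4+2 positions inside one PT


def build_views(num_pts, devices):
    """Return {pt_number: {role: device}}, with PT i starting at device i (wrapping)."""
    views = {}
    for i in range(num_pts):
        views[i + 1] = {
            role: devices[(i + j) % len(devices)]
            for j, role in enumerate(ROLES)
        }
    return views


devices = list(string.ascii_uppercase[:10])  # storage devices A..J
views = build_views(10, devices)
# views[1] maps D1..D4, P, Q to devices A..F; views[2] maps them to B..G.
```

A dictionary keyed by PT number keeps the lookup from a PT to its storage devices constant-time, which is the role the view plays during reads and writes.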

5) In the embodiments of this application, "a plurality of" means two or more, and may therefore also be understood as "at least two". "At least one" may be understood as one or more, for example one, two, or more. For example, "including at least one" means including one, two, or more, without limiting which ones are included: "including at least one of A, B, and C" may mean including A, B, C, A and B, A and C, B and C, or A and B and C. "And/or" describes an association between associated objects and indicates that three relationships are possible: for example, "A and/or B" may mean that A exists alone, that both A and B exist, or that B exists alone. In addition, unless otherwise specified, the character "/" generally indicates an "or" relationship between the associated objects before and after it.

Unless stated otherwise, ordinal terms such as "first" and "second" in the embodiments of this application are used to distinguish between multiple objects and are not intended to limit the order, time sequence, priority, or importance of those objects.

Some concepts involved in this application have been introduced above. The technical background of this application is described below.

Storage systems originally implemented redundancy in units of storage devices (for example, HDDs). For example, if the data redundancy mode is EC8+2, 8 fixed hard disks in the storage system are used as data disks to store data shards, and 2 fixed hard disks are used as parity disks to store the parity parts generated from the data shards. Because the data disks and parity disks are fixed, the parity parts of all data are written to the same 2 parity disks. As a result, whenever data on any data disk is updated, both parity disks must be updated as well, and the 2 parity disks become the bottleneck of the storage system because they are accessed too frequently.

To prevent particular fixed hard disks from becoming a bottleneck, the industry introduced techniques that scatter data shards and parity parts before storing them. That is, the storage locations of data shards and parity parts are allocated flexibly, so that every hard disk can hold both data shards and parity parts. However, because the storage locations of data shards and parity parts are not fixed, information such as the numbers of the corresponding hard disks must be recorded for every PT, which incurs considerable overhead. To address this problem, the aforementioned method of partitioning hard disks by means of virtualization was proposed: after several PTs are created, views corresponding to those PTs are generated, and data is read and written according to the views.

Because the view is already generated when the PTs are created, space utilization in the storage system is low when storage nodes are added during capacity expansion or when some storage nodes fail.

In view of this, an embodiment of this application provides a data storage method for a storage system. In the method, when the topology of the storage system is a first topology, M first storage partition groups corresponding to the first topology are generated, where each of the M first storage partition groups corresponds to K first storage devices; that is, a first view corresponding to the first topology is obtained. When the topology of the storage system is updated from the first topology to a second topology different from the first topology, N second storage partition groups corresponding to the second topology are generated, where each of the N second storage partition groups corresponds to P second storage devices; that is, a second view corresponding to the second topology is obtained. The N second storage partition groups are then used to update the M first storage partition groups stored in the storage system, so that after new data to be stored is received, the new data is stored in the second storage devices corresponding to at least one of the N second storage partition groups. N, P, M, and K are positive integers. In other words, in the method of this embodiment, the storage system supports dynamic storage partition groups (or views): when the topology of the storage system changes, the storage partition groups (or views) of the storage system change accordingly. For example, suppose the storage system consists of 3 storage nodes with 3 hard disks each. To prevent a single-node failure from making the entire storage system unusable, each storage partition group may include 4 data disks and 2 parity disks, in which case the space utilization of the storage system is 4/(4+2) ≈ 66.7%. When the storage system is expanded to 5 storage nodes with 3 hard disks each, the storage system can generate new storage partition groups, each including 8 data disks and 2 parity disks, so that the space utilization is 8/(8+2) = 80%, which improves the space utilization of the storage system. When the capacity of the storage system is reduced, for example because a hard disk in one of the 3 storage nodes fails, the storage system can likewise generate new storage partition groups adapted to the reduced system; every one of the new storage partition groups is usable, which improves the space utilization of the storage system.
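The arithmetic in this example can be sketched as follows. The sketch assumes that the partitions of one storage partition group are spread as evenly as possible over the storage nodes, and that a mode tolerates a single-node failure when no node holds more partitions than there are parity parts. This criterion and the candidate list are illustrative assumptions, not rules stated in this application.

```python
import math


def utilization(data: int, parity: int) -> float:
    """Fraction of raw capacity that holds user data."""
    return data / (data + parity)


def survives_node_loss(data: int, parity: int, nodes: int) -> bool:
    """True if, with data+parity partitions spread evenly over the nodes,
    the partitions on any single node can be rebuilt from parity."""
    per_node = math.ceil((data + parity) / nodes)
    return per_node <= parity


def pick_mode(nodes, candidates=((4, 2), (8, 2))):
    """Pick the candidate (data, parity) with the best utilization that still
    tolerates the loss of one node; None if no candidate qualifies."""
    ok = [c for c in candidates if survives_node_loss(c[0], c[1], nodes)]
    return max(ok, key=lambda c: utilization(*c)) if ok else None


before = pick_mode(3)  # 3 storage nodes
after = pick_mode(5)   # expanded to 5 storage nodes
```

Under these assumptions, 3 nodes admit only 4+2 (2 partitions per node), while 5 nodes also admit 8+2, which has the higher utilization.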

The technical solutions in the embodiments of this application are applied to a storage system. The storage system may be a file storage system, a block storage system, an object storage system, or a combination of these; this is not limited in the embodiments of this application.

FIG. 3A shows an example of a storage system involved in the embodiments of this application. As shown in FIG. 3A, the storage system includes a controller 301 and a plurality of storage nodes 302 for storing data. The storage nodes form a coupled set of nodes that cooperate to provide services externally; a storage node may also be called a storage rack. As shown in FIG. 3A, the storage system includes storage rack 1 to storage rack 3. Each storage rack includes a plurality of hard disks, which may be HDDs, and a plurality of hard disks form a storage server; in FIG. 3A, 3 hard disks form one storage server. At least one storage server forms a storage rack; in FIG. 3A, 1 storage server forms one storage rack. The controller 301 is configured to generate storage partition groups according to the topology of the storage system and to manage operation requests, for example to process a write request by writing data to the corresponding storage medium according to a storage partition group, or to process a read request by obtaining data from the corresponding storage medium according to a storage partition group. The controller 301 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or the like.

Unlike the architecture shown in FIG. 3A, the architecture shown in FIG. 3B includes multiple controllers, for example controller A and controller B, which can communicate with each other. In this way, when one controller in the storage system fails, the storage system can still serve other devices that interact with it (for example, clients) through the remaining controllers.

It should be noted that the storage system is not limited to the architectures shown in FIG. 3A and FIG. 3B. For example, the storage system may include more storage nodes; as shown in FIG. 3C, it may include 5 storage nodes. The storage system may also include other devices, such as an arbitration server. The storage systems described in the embodiments of this application are intended to explain the technical solutions of the embodiments more clearly and do not limit those technical solutions. A person of ordinary skill in the art will appreciate that, as storage technologies and storage system architectures evolve, the technical solutions provided in the embodiments of this application remain applicable to similar technical problems.

The technical solutions provided in the embodiments of this application are described below with reference to the accompanying drawings.

An embodiment of this application provides a data storage method; FIG. 4 is a flowchart of the method. In the following description, the method is applied to the storage system shown in FIG. 3A as an example; that is, the method is performed by the controller of the storage system shown in FIG. 3A.

S41. The controller of the storage system generates and stores M first storage partition groups corresponding to the first topology, where each of the M first storage partition groups corresponds to K first storage devices.

In this embodiment of this application, the value of K is the same as the data redundancy mode configured for the storage system under the first topology (for example, K is 6 for EC4+2), and M and K are positive integers.

As an example, the storage system shown in FIG. 3A includes 3 storage nodes, each with 4 hard disks, for a total of 12 hard disks labeled hard disk 1 to hard disk 12 in sequence. The data redundancy mode configured for the storage system is EC4+2. Assuming the storage system creates 6 PTs, each PT includes 6 partitions, of which 4 are occupied by data shards and 2 by parity parts, giving the PTs shown in FIG. 5. A data arrangement algorithm is then used to map each of the 6 PTs to partitions on 6 hard disks. The data arrangement algorithm may be a round-robin algorithm: starting from the first hard disk in the storage system, D1 of the first PT is mapped to a partition on hard disk 1, the hard disk number is then incremented and D2 of the first PT is mapped to a partition on hard disk 2, and so on, until every pt of every PT has been mapped to a hard disk partition. Alternatively, the data arrangement algorithm may be a hash algorithm: for example, a hash algorithm such as message-digest algorithm 5 (MD5) or mmh3 is applied to the number of the pt within the PT and the hard disk numbers to compute the number of the hard disk to which each pt of each PT is mapped. Other data arrangement algorithms may also be used; this is not limited here.
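The two arrangement options above can be sketched as follows, as an illustration only. The round-robin function assumes that the disk cursor simply keeps advancing from one PT to the next. The hash function uses `hashlib.md5` and a hypothetical key format (`"pt:position"`), since the text does not specify the exact hash input; it also does not prevent two pts of the same PT from landing on the same hard disk, which a practical scheme would have to resolve.

```python
import hashlib


def round_robin_views(num_pts, pts_per_group, num_disks):
    """Map every pt of every PT to a 1-based disk number, one disk at a time."""
    views, cursor = {}, 0
    for pt in range(1, num_pts + 1):
        disks = []
        for _ in range(pts_per_group):
            disks.append(cursor % num_disks + 1)  # wrap after the last disk
            cursor += 1
        views[pt] = disks
    return views


def hashed_disk(pt_id, position, num_disks):
    """Alternative: derive a 1-based disk number from an MD5 digest of the key."""
    digest = hashlib.md5(f"{pt_id}:{position}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_disks + 1


views = round_robin_views(num_pts=6, pts_per_group=6, num_disks=12)
# PT1, PT3, PT5 land on disks 1..6; PT2, PT4, PT6 land on disks 7..12.
```

With 6 pts per PT and 12 hard disks, the cursor wraps every second PT, so odd-numbered and even-numbered PTs alternate between the two halves of the disk set.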

It should be noted that the data arrangement algorithm needs to satisfy the following conditions:

1) Data security.

When pts are mapped to hard disks, the pts of the same PT should be distributed as far as possible across different storage devices at each security level. The security levels of the various storage devices can be configured by a technician; for example, from high to low, the order is: storage rack > storage server > hard disk. That is, if the storage system contains storage servers and storage racks, the pts of each PT can be distributed across different storage servers or different storage racks, which makes the data more secure. For example, the storage system shown in FIG. 3A includes 3 storage racks, so the 6 pts of one PT can be evenly distributed across the 3 storage racks, that is, 2 pts can be mapped to each storage rack. Moreover, because one storage rack includes 2 storage servers, the 2 pts in one storage rack can be mapped to different storage servers.

2) Data balance.

When pts are mapped to hard disks, the number of pts on each hard disk should be as proportional as possible to the capacity of that hard disk. For example, if the capacity of hard disk 1 is 2M and the capacity of hard disk 2 is 1M, 2 pts of a PT can be mapped to hard disk 1 and 1 pt of the PT can be mapped to hard disk 2, which prevents a hard disk from becoming a hotspot because it is accessed too often. If all hard disks in the storage system have the same capacity, the same number of pts can be placed on each hard disk.

3) Data type balance.

When pts are mapped to hard disks, the number of pts on each hard disk that occupy the same position within their PTs should be as proportional as possible to the capacity of that hard disk. For example, among the 10 PTs shown in FIG. 1, take the first pt of each PT (the pt at position 1): the first pts of the 10 PTs are evenly distributed over the 10 hard disks A to J, that is, each hard disk holds exactly one pt at position 1. The pts at the other positions are likewise evenly distributed over hard disks A to J. This avoids a hotspot problem with small files that do not fill a stripe: storing such a file may use only 2 of the 4 partitions for data shards, so reads and writes of the file access only the hard disks corresponding to the first 2 partitions and never those corresponding to the last 2 partitions, which would turn the hard disks corresponding to the first 2 partitions into hotspots.

4) Data recovery balance.

That is, if the hard disk holding one pt of a PT fails, data must be read from the hard disks holding the other pts of that PT in order to recover the data on the failed hard disk. For a storage system whose data redundancy mode is EC4+2, each PT mapped to the failed hard disk must read data from 5 other hard disks. In this case, the total amount of data that these PTs read from each non-failed hard disk should be as equal as possible.
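Some of the conditions above can be checked mechanically once a view exists. The following sketch counts pts per hard disk (data balance), pts per hard disk at a given position (data type balance), and the racks touched by one PT (data security). The layout is a hypothetical one modeled on FIG. 1, with PT i occupying 6 consecutive hard disks starting at hard disk i and wrapping after J; all function names are illustrative.

```python
from collections import Counter


def pts_per_disk(views):
    """Count how many pts each disk holds across all PTs (data balance)."""
    return Counter(disk for disks in views.values() for disk in disks)


def position_counts(views, position):
    """Count, per disk, the pts sitting at one position within their PT
    (data type balance)."""
    return Counter(disks[position] for disks in views.values())


def racks_used(disks, rack_of):
    """Set of racks touched by the disks of one PT (data security)."""
    return {rack_of[d] for d in disks}


# Hypothetical layout modeled on FIG. 1: 10 PTs of 6 pts over disks A..J.
names = "ABCDEFGHIJ"
views = {i: [names[(i + j) % 10] for j in range(6)] for i in range(10)}

total = pts_per_disk(views)            # pts held by each disk
first = position_counts(views, 0)      # position-1 pts held by each disk
```

In this layout every hard disk holds 6 pts and exactly one pt of each position, which is the even distribution that conditions 2) and 3) describe for equal-capacity hard disks.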

In this embodiment of this application, the controller uses the round-robin algorithm to map the 6 PTs to the 12 hard disks as an example, which gives the first view shown in FIG. 6. In FIG. 6, the 6 pts of each of PT1, PT3, and PT5 are mapped to hard disks 1 to 6, where hard disks 1 to 4 are the 4 data disks and hard disks 5 and 6 are the parity disks. The 6 pts of each of PT2, PT4, and PT6 are mapped to hard disks 7 to 12, where hard disks 7 to 10 are the 4 data disks and hard disks 11 and 12 are the parity disks.

S42. After receiving data to be stored, the controller of the storage system determines to store the data in at least one of the M first storage partition groups.

As an example, assume that each PT can store 128 megabytes (MB) of data. When the controller of the storage system receives 256 MB of data, the controller can select 2 PTs from the multiple PTs according to a preset rule to store the data. The preset rule may be to select the required number of PTs in sequence from the unused PTs. For example, if the data is the first data received by the controller, that is, no data has been stored in the storage system before, the controller can select PT1 and PT2 to store the data. Alternatively, the PTs may be selected at random from the unused PTs; for example, the controller randomly selects PT1 and PT3 to store the data. The controller then divides the 256 MB of data into 2 portions and divides each portion into 4 data shards of 32 MB each. Next, it generates 2 parity parts from the contents of the 4 data shards of each portion. Finally, based on the 4 data shards and 2 parity parts of each portion, it generates the data storage instructions for the data, where each data storage instruction includes the data shard or parity part that the corresponding hard disk needs to store.
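The size arithmetic in this example can be sketched as follows. The function only plans the split (how many PTs are needed and how large each data shard is); it does not compute parity contents. The function name, the returned dictionary keys, and the handling of a final partial portion are assumptions for illustration.

```python
def plan_write(size_mb, pt_capacity_mb=128, data_shards=4, parity_parts=2):
    """Split a write into PT-sized portions and report the shard layout per PT."""
    num_pts = -(-size_mb // pt_capacity_mb)  # ceiling division
    plan = []
    remaining = size_mb
    for _ in range(num_pts):
        portion = min(remaining, pt_capacity_mb)
        plan.append({
            "portion_mb": portion,
            "shard_mb": portion / data_shards,  # size of each data shard
            "data_shards": data_shards,
            "parity_parts": parity_parts,
        })
        remaining -= portion
    return plan


plan = plan_write(256)  # 256 MB of data, 128 MB per PT, EC4+2
```

For 256 MB this yields 2 portions of 128 MB, each split into 4 data shards of 32 MB with 2 parity parts, matching the figures in the example above.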

As another example, the controller of the storage system may store data using a scattering mechanism. For example, when the controller receives 32K of data, it scatters the data at a granularity of 8K, dividing it into 4 pieces of 8K each, and then stores the 4 pieces in different PTs.
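A minimal sketch of this scattering step is shown below. It assumes that the pieces are assigned to the candidate PTs in turn; the text only requires that the pieces go to different PTs, so the assignment order and the function name `scatter` are hypothetical.

```python
def scatter(data: bytes, granularity: int, pt_ids: list) -> list:
    """Cut data into fixed-size pieces and pair each piece with a PT id in turn."""
    pieces = [data[i:i + granularity] for i in range(0, len(data), granularity)]
    return [(pt_ids[i % len(pt_ids)], piece) for i, piece in enumerate(pieces)]


# 32K of data scattered at 8K granularity over four candidate PTs.
placement = scatter(bytes(32 * 1024), 8 * 1024, pt_ids=[1, 2, 3, 4])
```

Each tuple in `placement` pairs a PT number with the 8K piece that would be written to that PT.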

The controller of the storage system may also select the corresponding PTs in other ways; this is not limited here. For ease of description, the following text uses the case in which the controller selects PT1 and PT3 to store the data as an example.

S43. The controller of the storage system sends a data storage instruction to at least one storage node, and the at least one storage node receives and executes the data storage instruction.

As can be seen from FIG. 6, the hard disks corresponding to PT1 are hard disks 1 to 6, and the hard disks corresponding to PT3 are also hard disks 1 to 6. The controller of the storage system therefore sends data storage instructions to the storage nodes corresponding to hard disks 1 to 6 (that is, storage node 1 to storage node 3). After receiving a data storage instruction, each of storage node 1 to storage node 3 stores the data shard or parity part carried in the instruction in the corresponding partition, which completes the data storage process.

S44. Each of the at least one storage node sends a heartbeat to the controller of the storage system, and the controller of the storage system receives the heartbeat.

In this embodiment of the present application, the heartbeat that each storage node sends to the controller of the storage system can be understood as a piece of custom information, such as a heartbeat packet or heartbeat frame, that lets the receiver know the sender is "online" and thus confirms that the connection is still valid. The content of the heartbeat may be content agreed between the DNS and the client, or it may include status information for each server and each hard disk in the node; it may also be an empty packet containing only a header. This is not limited here. Each storage node may send heartbeats to the controller automatically at a certain interval, or the controller may first send query information to each storage node, and each storage node that is "online" returns a heartbeat to the controller after receiving the query. This embodiment takes as an example the case in which each storage node automatically sends heartbeats to the controller at a certain interval.

Step S44 is optional, that is, it does not have to be performed. It is shown with a dashed line in FIG. 4.

S45. The controller of the storage system determines that the storage system has been expanded.

In this embodiment of the present application, expansion of the storage system means that the topology of the storage system is updated from the first topology to a second topology, and the number of storage nodes in the second topology is larger.

After receiving the heartbeat of each storage node, the controller of the storage system can determine the number of storage nodes in the storage system from the number of heartbeats received within a preset time period. For example, in the storage system shown in FIG. 3A, the controller receives heartbeats from 3 storage nodes and thereby determines that the storage system has 3 storage nodes. If each heartbeat also includes the status of the servers and hard disks in the node, the controller can learn the number and status of the storage devices in each storage node. Then, when a storage device in a storage node (for example, a hard disk) fails, the controller can determine from that node's heartbeat that the storage device is unavailable and avoid using the PTs corresponding to that storage device when storing data.

Because a storage system is dynamic, it may be expanded, for example by adding storage racks, storage servers, and hard disks as the amount of data to be stored grows. It may also be reduced, for example when the scale of the storage system shrinks or storage devices are damaged. Taking expansion as an example, suppose the storage system changes from the topology shown in FIG. 3A to the topology shown in FIG. 3C, that is, 2 storage nodes are added, each including 4 hard disks. After the expansion, the controller receives heartbeats from 5 storage nodes within the preset time period and thereby determines that the topology of the storage system has been expanded from the first topology with 3 storage nodes to the second topology with 5 storage nodes.
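The heartbeat-based detection described above might look like the following sketch. It assumes each heartbeat is a `(node_id, timestamp)` pair and that the controller counts distinct senders within the preset period; the function names and the data layout are hypothetical.

```python
def nodes_seen(heartbeats, window_start, window_end):
    """Return the set of node ids that sent a heartbeat in [window_start, window_end)."""
    return {node_id for node_id, ts in heartbeats if window_start <= ts < window_end}


def detect_topology_change(previous_nodes, heartbeats, window_start, window_end):
    """Compare the nodes seen in the window with the previously known nodes."""
    current = nodes_seen(heartbeats, window_start, window_end)
    if len(current) > len(previous_nodes):
        return "expanded", current
    if len(current) < len(previous_nodes):
        return "reduced", current
    return "unchanged", current


# First topology: 3 storage nodes. In the next period, 5 nodes report in.
known = {1, 2, 3}
received = [(1, 10.0), (2, 10.2), (3, 10.4), (4, 10.6), (5, 10.8), (1, 15.0)]
state, current_nodes = detect_topology_change(known, received, 10.0, 20.0)
```

Counting distinct senders rather than raw heartbeats keeps a node that reports twice in one period from being counted as two nodes.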

Of course, the controller of the storage system may also determine in other ways whether the topology of the storage system has changed, which is not limited here. This embodiment takes as an example the case in which the controller determines from heartbeats that the topology has changed.

S46. The controller of the storage system generates N second storage partition groups corresponding to the second topology, where each of the N second storage partition groups corresponds to P second storage devices.

In this embodiment of the present application, the value of P matches the data redundancy mode configured for the storage system under the second topology, and N and P are positive integers.

As an example, the storage system stores the correspondence between topologies and data redundancy modes; see Table 1. In Table 1, when the topology of the storage system has 3 storage nodes, the storage system may use either of two data redundancy modes, EC4+2 or EC3+3; when the topology has 5 storage nodes, it may use any of several modes such as EC8+2, EC6+3, or EC6+2; other topologies correspond to other data redundancy modes, which are not listed one by one here.

Table 1

Figure BDA0001925369030000121

In this way, when the controller of the storage system determines that the topology has changed, for example from the first topology with 3 storage nodes shown in FIG. 3A to the second topology with 5 storage nodes shown in FIG. 3C, it can use the pre-stored correspondence between topologies and data redundancy modes to determine the data redundancy mode configured for the storage system under the second topology, and then generate the N second storage partition groups corresponding to the second topology according to that mode.
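The lookup just described can be sketched as a small table keyed by storage-node count. Only the two rows spelled out in the text are included, and the rule of picking the first listed candidate is an assumption made for illustration; the text does not say how the controller chooses among candidates.

```python
import re

# Candidate data redundancy modes per topology, keyed by number of storage nodes.
# Only the rows named in the text are reproduced here.
REDUNDANCY_MODES = {
    3: ["EC4+2", "EC3+3"],
    5: ["EC8+2", "EC6+3", "EC6+2"],
}


def pt_width(mode):
    """Return (data, parity, total) partitions per PT for a mode such as 'EC8+2'."""
    match = re.fullmatch(r"EC(\d+)\+(\d+)", mode)
    if match is None:
        raise ValueError(f"unrecognized mode: {mode!r}")
    data, parity = int(match.group(1)), int(match.group(2))
    return data, parity, data + parity


def mode_for(node_count):
    # Assumption: take the first listed candidate for the topology.
    return REDUNDANCY_MODES[node_count][0]


# Expanding from 3 to 5 nodes widens each PT from 6 to 10 partitions.
old_width = pt_width(mode_for(3))
new_width = pt_width(mode_for(5))
```

The third element of each tuple is the number of storage devices per partition group, that is, K for the first topology and P for the second.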

It should be noted that the description above uses the number of storage nodes to characterize the topology. In practice, a topology change may involve more than a change in the number of storage nodes, for example a change in the number of storage servers or hard disks in each storage node. In that case, whether the topology has changed, and which data redundancy mode is configured under each topology, may also be defined in terms of the numbers of storage servers and hard disks. The procedure is similar to that described above for the number of storage nodes and is not repeated here.

In this embodiment of the present application, the following is taken as an example: the storage system changes from the first topology with 3 storage nodes shown in FIG. 3A to the second topology with 5 storage nodes shown in FIG. 3C, and the controller determines that the data redundancy mode configured under the second topology is EC8+2. That is, under the second topology one PT includes 10 pts, of which 8 store data fragments and 2 store parity parts, so P is greater than K. The controller may generate the N second storage partition groups corresponding to the second topology in ways including, but not limited to, the following two.

In the first manner, the N second storage partition groups are generated without considering the M first storage partition groups under the first topology.

As an example, the controller first creates 6 PTs, each including 10 partitions, of which data fragments occupy 8 and parity parts occupy 2. The controller may use the same data placement algorithm as in S41, for example a round-robin algorithm or a hash algorithm, to map the 6 PTs onto the 5 storage nodes (20 hard disks in total) of the storage system.

In this manner, the N second storage partition groups corresponding to the second topology may fall into either of the following two cases:

Case a: the first storage devices at L positions in each first storage partition group are the same storage devices as the second storage devices at the corresponding L positions in the corresponding second storage partition group, where L is a positive integer.

Refer to FIG. 7A, an example of the second view, in which N is 6. As shown in FIG. 7A, among the first 4 hard disks of each second storage partition group, the hard disk at one position has the same number as the hard disk at the same position in the first view. For example, in the view shown in FIG. 7A, D3 of PT1 is on hard disk 3, and in the view shown in FIG. 6, D3 of PT1 is also on hard disk 3; in the view shown in FIG. 7A, D3 of PT2 is on hard disk 9, and in the view shown in FIG. 6, D3 of PT2 is also on hard disk 9; D1 of PT3 is on the same hard disk in the view shown in FIG. 7A as in the view shown in FIG. 6. The other PTs are similar and are not listed one by one here. In this case, L is 1. In FIG. 7A, the hard disks that are the same in the first view and the second view are shaded.

Case b: the P second storage devices corresponding to any one of the N second storage partition groups are different from the K first storage devices included in any one of the M first storage partition groups.

Refer to FIG. 7B, an example of the second view, in which N is 6. As shown in FIG. 7B, the 4 hard disks to which the first 4 pts of each second storage partition group are mapped differ, position by position, from the 4 hard disks to which the first 4 pts of the corresponding first storage partition group are mapped in the first view. For example, in the view shown in FIG. 7B, D1 of PT1 is on hard disk 14, D2 on hard disk 15, D3 on hard disk 4, and D4 on hard disk 3, whereas in the view shown in FIG. 6, D1 of PT1 is on hard disk 1, D2 on hard disk 2, D3 on hard disk 3, and D4 on hard disk 4. In the view shown in FIG. 7B, D1 of PT2 is on hard disk 17, D2 on hard disk 19, D3 on hard disk 10, and D4 on hard disk 9, whereas in the view shown in FIG. 6, D1 of PT2 is on hard disk 7, D2 on hard disk 8, D3 on hard disk 9, and D4 on hard disk 10. The other PTs are similar and are not listed one by one here.

In this case, the storage system supports dynamic views: when the topology of the storage system changes, the corresponding view can change as well. For example, increasing the number of pts in each PT can improve the utilization of the storage space of the storage system.

In the second manner, the N second storage partition groups are generated with the M first storage partition groups under the first topology taken into account. That is, the first view, which corresponds to the M first storage partition groups under the first topology, is linked with the second view, which corresponds to the N second storage partition groups under the second topology. The second view takes into account the layout of the pts at the same positions in the first view and, subject to constraints such as reliability, balance, and balanced recovery, keeps the pt at each position of each PT on the same hard disk as far as possible.

As an example, the controller of the storage system may run the same data placement algorithm as in S41, for example a round-robin algorithm or a hash algorithm, several times to obtain multiple candidate second views for the 6 PTs. It then compares each candidate second view with the first view and selects the candidate in which the largest number of pts at the same positions in each PT fall on the same hard disks, and uses it as the second view of the N second storage partition groups under the second topology.
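The selection among candidate second views can be sketched as a scoring pass. The first-view entries and candidate B below use the disk numbers the text quotes for PT1 and PT2 in FIG. 6 and FIG. 7B; the D1 and D2 disks in candidates A and C are made-up placeholders, and only positions D1 to D4 are modelled. The function names are hypothetical.

```python
def overlap(first_view, second_view):
    """Count positions where a PT's pt sits on the same disk in both views."""
    return sum(
        old_disk == new_disk
        for pt, old_disks in first_view.items()
        for old_disk, new_disk in zip(old_disks, second_view[pt])
    )


def pick_second_view(first_view, candidates):
    """Return the name of the candidate sharing the most same-position disks."""
    return max(candidates, key=lambda name: overlap(first_view, candidates[name]))


# Disks holding D1..D4 of each PT.
first_view = {"PT1": [1, 2, 3, 4], "PT2": [7, 8, 9, 10]}
candidates = {
    "A": {"PT1": [13, 14, 3, 15], "PT2": [16, 17, 9, 18]},   # one match per PT
    "B": {"PT1": [14, 15, 4, 3], "PT2": [17, 19, 10, 9]},    # no matches
    "C": {"PT1": [13, 14, 3, 4], "PT2": [16, 17, 9, 10]},    # two matches per PT
}
best = pick_second_view(first_view, candidates)
```

A fuller version would first discard candidates that violate the reliability and balance constraints mentioned above and only then rank the survivors by overlap.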

Refer to FIG. 7C, an example of the second view, in which N is 6. As shown in FIG. 7C, among the first 4 hard disks of each second storage partition group, the hard disks at two positions have the same numbers as the hard disks at the same positions in the first view. For example, in the view shown in FIG. 7C, D3 of PT1 is on hard disk 3 and D4 is on hard disk 4, and in the view shown in FIG. 6, D3 of PT1 is also on hard disk 3 and D4 is also on hard disk 4; in the view shown in FIG. 7C, D3 of PT2 is on hard disk 9 and D4 is on hard disk 10, and in the view shown in FIG. 6, D3 of PT2 is also on hard disk 9 and D4 is also on hard disk 10. The other PTs are similar and are not listed one by one here. In FIG. 7C, the hard disks that are the same in the first view and the second view are shaded.

In this case, because the newly generated second view is an extension of the original first view, the data in the pts that remain at the same position in each PT does not need to be migrated. This reduces the amount of data migrated and avoids consuming large amounts of bandwidth or input/output (I/O) resources, so more bandwidth and I/O resources remain available for service traffic, which improves the service capability of the storage system.

S47. The controller of the storage system uses the N second storage partition groups to update the M first storage partition groups stored in the storage system.

After obtaining the N second storage partition groups under the second topology, the controller of the storage system may replace the M first storage partition groups stored in the storage system with the N second storage partition groups. That is, the storage system stores only one set of storage partition groups, and that set corresponds to the current topology of the storage system.

Alternatively, the controller of the storage system may store several different sets of storage partition groups. For example, the storage system may store two sets: the M first storage partition groups under the first topology and the N second storage partition groups under the second topology. In this way, data already stored according to some of the M first storage partition groups does not need to be migrated, which reduces the resources consumed by data migration.

S48. The controller of the storage system migrates the data that has been stored according to at least one of the M first storage partition groups.

In this embodiment of the present application, the way in which the controller migrates data depends on the way in which the N second storage partition groups were generated in S46.

For case a of the first manner in S46, the first storage devices at L positions in each first storage partition group are the same storage devices as the second storage devices at the corresponding L positions in the corresponding second storage partition group. That is, the part of the data stored in each of those L first storage devices is the same as the data to be stored in one of the P second storage devices corresponding to the second storage partition group, and so does not need to be moved. The controller therefore migrates only part of each piece of original data stored in the K storage devices corresponding to one of the M first storage partition groups, according to one of the N second storage partition groups; the migrated part of each piece of original data is different from the data stored in each of the L second storage devices.

As an example, FIG. 7A shows that in each of the 6 PTs, one pt is on the same hard disk as the corresponding pt of the corresponding PT in the first storage partition groups. For example, D3 of PT1 in FIG. 7A is on hard disk 3, and D3 of PT1 in the view shown in FIG. 6 is also on hard disk 3. For data stored in PT1 as shown in FIG. 6, it is therefore enough to migrate D1, D2, and D4; D3 does not need to be migrated. Each parity part must be regenerated from the data fragments after migration. In this way, when the topology of the storage system is expanded from 3 storage nodes to 5 storage nodes, data migration is reduced by 1/4 = 25%, which saves storage system resources.
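The 25% figure above follows from comparing the two layouts position by position, as in the sketch below. The FIG. 6 disks for PT1 and the fact that D3 stays on hard disk 3 come from the text; the new disks for D1, D2, and D4 are placeholders, and the helper name is hypothetical.

```python
from fractions import Fraction


def fragments_to_migrate(old_disks, new_disks):
    """Return the 1-based data positions whose hard disk differs between the views."""
    return [
        position
        for position, (old, new) in enumerate(zip(old_disks, new_disks), start=1)
        if old != new
    ]


old_pt1 = [1, 2, 3, 4]        # D1..D4 of PT1 in the first view (FIG. 6)
new_pt1 = [13, 14, 3, 15]     # D3 stays on disk 3; the other disks are placeholders

to_move = fragments_to_migrate(old_pt1, new_pt1)
saved = Fraction(len(old_pt1) - len(to_move), len(old_pt1))
```

Here `to_move` lists D1, D2, and D4, and `saved` is the fraction of data fragments that stay in place; parity parts are not counted because they are regenerated after migration.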

For case b of the first manner in S46, the P second storage devices corresponding to any one of the N second storage partition groups are different from the K first storage devices included in any one of the M first storage partition groups. The controller therefore migrates each piece of original data stored according to at least one of the M first storage partition groups, in its entirety, according to one of the N second storage partition groups.

As an example, FIG. 7B shows that in each of the 6 PTs, the pts are on different hard disks from the pts of the corresponding PT in the first storage partition groups. For example, D1 to D4 of PT1 in FIG. 7B are on different hard disks from D1 to D4 of PT1 in the view shown in FIG. 6. All of the data stored in PT1 as shown in FIG. 6 therefore has to be migrated to the hard disks corresponding to PT1 in FIG. 7B, and each parity part must be regenerated from the data fragments after migration.

For the second manner in S46, refer to the description of case a of the first manner in S46; details are not repeated here. It should be noted that, because the M first storage partition groups are taken into account when the N second storage partition groups are generated, more pts at the same positions in each PT fall on the same hard disks. Compared with case a of the first manner in S46, less data needs to be migrated, which further reduces the storage system resources consumed by data migration.

It should be noted that step S48 is optional, that is, it does not have to be performed. For example, some log-structured storage systems may skip data migration after expansion: the newly added storage nodes and hard disks subsequently receive more writes and gradually catch up in used capacity. For instance, the second view is generated from the original first view by adding four partitions, D5, D6, D7, and D8, to the view as it was before the change, and the data stored in the storage system according to the first view is gradually adapted to the second view as garbage collection proceeds. In FIG. 4, step S48 is shown with a dashed line to indicate that it is optional.

S49. After receiving new data to be stored, the controller of the storage system stores the new data in the second storage devices corresponding to at least one of the N second storage partition groups.

After the controller of the storage system has generated the N second storage partition groups, it stores any data it subsequently receives according to the N second storage partition groups. The data storage process is the same as in step S42 and is not repeated here.

In the technical solution above, the storage system supports dynamic storage partition groups (or views): when the topology of the storage system changes, the storage partition groups (or views) of the storage system change as well. For example, after the storage system is expanded, the data in the storage system can be moved in step to a higher data redundancy mode, which improves the space utilization of the storage system.
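The space-utilization gain can be made concrete with a short calculation. The sketch below assumes utilization is measured as the share of each PT's partitions that hold data fragments rather than parity parts, and uses the EC4+2 and EC8+2 modes from the expansion example; the function name is hypothetical.

```python
from fractions import Fraction


def space_utilization(data_fragments, parity_parts):
    """Fraction of a PT's partitions that hold user data."""
    return Fraction(data_fragments, data_fragments + parity_parts)


before = space_utilization(4, 2)   # EC4+2 under the 3-node topology
after = space_utilization(8, 2)    # EC8+2 under the 5-node topology
gain = after - before
```

With these inputs `before` is 2/3 and `after` is 4/5, so the same two parity partitions are amortized over twice as many data partitions.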

The embodiment shown in FIG. 4 describes the processing performed when the storage system is expanded. The processing performed when the storage system is reduced is described next. Refer to FIG. 8, a flowchart of another example of the data storage method in the embodiments of the present application. The following description takes as an example the case in which the method is applied to the storage system shown in FIG. 3C, that is, the method is performed by the controller of the storage system shown in FIG. 3C.

S81. The controller of the storage system generates and stores M first storage partition groups corresponding to the first topology, where each of the M first storage partition groups corresponds to K first storage devices.

As an example, the storage system shown in FIG. 3C includes 5 storage nodes, each with 4 hard disks, for a total of 20 hard disks labeled hard disk 1 to hard disk 20 in sequence. The data redundancy mode configured for the storage system is EC8+2. Assuming the storage system creates 6 PTs, each PT includes 10 pts, of which data fragments occupy 8 partitions and parity parts occupy 2, giving the PTs shown in FIG. 9.

In this embodiment of the present application, the controller is taken to use a round-robin algorithm to map the 6 PTs onto the 20 hard disks, which yields the first view shown in FIG. 10. In FIG. 10, the 10 pts of each of PT1, PT3, and PT5 are mapped to hard disk 1 to hard disk 10, where hard disk 1 to hard disk 8 are the 8 data disks and hard disk 9 and hard disk 10 are the parity disks. The 10 pts of each of PT2, PT4, and PT6 are mapped to hard disk 11 to hard disk 20, where hard disk 11 to hard disk 18 are the 8 data disks and hard disk 19 and hard disk 20 are the parity disks.
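One way a round-robin placement could yield the view described for FIG. 10 is sketched below. The text does not specify the algorithm's details, so the grouping of disks into consecutive blocks of PT width and the function name are assumptions.

```python
def round_robin_view(pt_count, disk_count, data_width, parity_width):
    """Map PTs onto consecutive blocks of disks in round-robin order.

    Assumption: disks are numbered from 1 and split into blocks as wide as one PT.
    """
    width = data_width + parity_width
    if disk_count % width != 0:
        raise ValueError("disk count must be a multiple of the PT width")
    block_count = disk_count // width
    view = {}
    for pt in range(1, pt_count + 1):
        first_disk = ((pt - 1) % block_count) * width + 1
        disks = list(range(first_disk, first_disk + width))
        view[f"PT{pt}"] = {
            "data": disks[:data_width],      # D1..D8
            "parity": disks[data_width:],    # the two parity partitions
        }
    return view


# 6 PTs, 20 hard disks, EC8+2.
view = round_robin_view(pt_count=6, disk_count=20, data_width=8, parity_width=2)
```

Odd-numbered PTs land on hard disks 1 to 10 and even-numbered PTs on hard disks 11 to 20, matching the description of FIG. 10.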

S82. After receiving data to be stored, the controller of the storage system determines to store the data in at least one of the M first storage partition groups.

S83. The controller of the storage system sends a data storage instruction to at least one storage node; the at least one storage node receives and executes the data storage instruction.

S84. Each of the at least one storage node sends a heartbeat to the controller of the storage system, and the controller receives the heartbeat.

S85. The controller of the storage system determines that the storage system has been reduced.

In this embodiment of the present application, reduction of the storage system means that the topology of the storage system is updated from the first topology to a second topology, and the number of storage nodes in the second topology is smaller.

Steps S81 to S85 are similar to steps S41 to S45 and are not described again here.

S86. The controller of the storage system generates N second storage partition groups corresponding to the second topology, where each of the N second storage partition groups corresponds to P second storage devices.

When the controller of the storage system determines that the topology has changed, it can use the correspondence between topologies and data redundancy modes shown in Table 1 to determine the data redundancy mode corresponding to the second topology. This process is similar to the corresponding content in S46 and is not repeated here.

In this embodiment of the present application, the following is taken as an example: the storage system changes from the first topology with 5 storage nodes shown in FIG. 3C to the second topology with 3 storage nodes shown in FIG. 3A, and the controller determines that the data redundancy mode configured under the second topology is EC4+2. That is, under the second topology one PT includes 6 pts, of which 4 store data fragments and 2 store parity parts, so P is less than K. The controller may generate the N second storage partition groups corresponding to the second topology in ways including, but not limited to, two. In the first manner, the N second storage partition groups are generated without considering the M first storage partition groups under the first topology. In the second manner, the N second storage partition groups are generated with the M first storage partition groups under the first topology taken into account. Both manners are similar to the corresponding content in step S46 and are not repeated here.

The second way is taken as an example. In one example, the controller of the storage system may apply the same data arrangement algorithm as in S41, such as a round-robin (polling) algorithm or a hash algorithm, multiple times to obtain multiple second views corresponding to the 6 PTs. Then, according to the linkage between the multiple second views and the first view, the controller selects, from the multiple second views, the second view in which the largest number of pts at the same position in each PT fall on the same hard disk, and uses it as the second view of the M second storage partition groups under the second topology.
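The selection step described above can be sketched as follows. This is an illustrative reading under stated assumptions: a view is modeled as a mapping from PT name to the disk number at each pt position, the candidate views are assumed to have been produced already by the arrangement algorithm, and the names `overlap` and `pick_second_view` as well as all disk numbers are hypothetical.

```python
from typing import Dict, List

# Assumed model of a view: PT name -> disk number at each pt position
View = Dict[str, List[int]]


def overlap(first_view: View, second_view: View) -> int:
    """Count pt positions that fall on the same disk in both views."""
    total = 0
    for pt_name, new_disks in second_view.items():
        old_disks = first_view.get(pt_name, [])
        # zip stops at the shorter PT, so only shared positions are compared
        total += sum(1 for old, new in zip(old_disks, new_disks) if old == new)
    return total


def pick_second_view(first_view: View, candidates: List[View]) -> View:
    """Choose the candidate with the most same-position, same-disk pts."""
    return max(candidates, key=lambda view: overlap(first_view, view))


# Hypothetical disk numbers, for illustration only.
first_view = {
    "PT1": [1, 2, 3, 4, 5, 6, 7, 8],
    "PT2": [7, 8, 9, 10, 11, 12, 1, 2],
}
candidates = [
    {"PT1": [5, 6, 1, 2, 3, 4], "PT2": [9, 10, 7, 8, 1, 2]},
    {"PT1": [11, 12, 3, 4, 9, 10], "PT2": [5, 6, 9, 10, 3, 4]},
]
best = pick_second_view(first_view, candidates)
```

In this toy input the second candidate keeps two positions per PT on their original disks, so it is the one selected. The more positions stay in place, the less data has to move in the later migration step.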

Refer to FIG. 11 for an example of the second view, in which N is 6. As shown in FIG. 11, among the 4 hard disks in each second storage partition group, the hard disk numbers at 2 positions are the same as the hard disk numbers at the same positions in the first view. For example, in the view shown in FIG. 11, D3 of PT1 is on hard disk 3 and D4 is on hard disk 4, and in the view shown in FIG. 10, D3 of PT1 is also on hard disk 3 and D4 is also on hard disk 4. Likewise, in the view shown in FIG. 11, D3 of PT2 is on hard disk 9 and D4 is on hard disk 10, and in the view shown in FIG. 10, D3 of PT2 is also on hard disk 9 and D4 is also on hard disk 10. The other PTs are similar and are not listed one by one here.

S87. The controller of the storage system uses the N second storage partition groups to update the M first storage partition groups stored in the storage system.

S88. The controller of the storage system migrates the data that has already been stored according to at least one first storage partition group in the M first views.

In this embodiment of the present application, the way the controller of the storage system migrates data depends on the way the N second storage partition groups were generated in S86. Take the way shown in FIG. 11 as an example. The first storage devices at S positions in each first storage partition group are the same storage devices as the second storage devices at the corresponding S positions in the corresponding second storage partition group. In other words, the data stored in each of the S first storage devices corresponding to each first storage partition group is the same as a part of the data stored in one of the P second storage devices corresponding to the M second storage partition groups, where S is a positive integer less than P. The controller of the storage system therefore migrates only part of each piece of original data, which is stored in the K storage devices corresponding to one of the M first storage partition groups, according to one of the N second storage partition groups. That part of each piece of original data is different from any part of the data stored in each of the P second storage devices.

As an example, FIG. 11 shows that in each of the 6 PTs, 2 pts are on the same hard disks as the pts of the corresponding PT in the first storage partition group. For example, in the view shown in FIG. 11, D3 of PT1 is on hard disk 3 and D4 is on hard disk 4, and in the view shown in FIG. 10, D3 of PT1 is also on hard disk 3 and D4 is also on hard disk 4. Therefore, for the data stored in PT1 shown in FIG. 10, only D1, D2, and D5 to D8 need to be migrated, and D3 and D4 need not be migrated. Each check part needs to be regenerated from the data shards after migration.
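The PT1 example can be reproduced with a short sketch that compares the old and new placement of each data shard. The function name `shards_to_migrate` and the dictionary model are hypothetical. Only the placement of D3 and D4 on hard disks 3 and 4 comes from the text; every other disk number below is an assumption made so the example runs.

```python
from typing import Dict, List


def shards_to_migrate(old_pt: Dict[str, int], new_pt: Dict[str, int]) -> List[str]:
    """Return the shards of the old PT that do not stay on the same disk."""
    return [
        shard
        for shard, old_disk in old_pt.items()
        # A shard stays put only if the new PT keeps that position on the same disk;
        # positions missing from the new PT (here D5 to D8) must always move.
        if new_pt.get(shard) != old_disk
    ]


# Old PT1 with 8 data shards; disk numbers other than 3 and 4 are assumed.
old_pt1 = {f"D{i}": i for i in range(1, 9)}
# New PT1 under EC4+2: D3 and D4 stay on disks 3 and 4 as in the text,
# while the disks chosen for D1 and D2 are hypothetical.
new_pt1 = {"D1": 11, "D2": 12, "D3": 3, "D4": 4}

moved = shards_to_migrate(old_pt1, new_pt1)
```

With these inputs `moved` lists D1, D2, and D5 through D8, which matches the shards the text says must be migrated. Check parts are not modeled here because the text states they are regenerated after migration rather than moved.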

For the data migration process corresponding to the other ways of generating the N second storage partition groups, refer to step S48. Details are not repeated here.

S89. After receiving new data to be stored, the controller of the storage system stores the new data in a second storage device corresponding to at least one of the N second storage partition groups.
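Steps S87 and S89 together can be pictured with the following minimal sketch. It is not the patent's implementation: the `PartitionGroupTable` class is a hypothetical structure, the rule for picking a partition group for a given key (a byte-sum modulo the number of groups) is an assumption, since this section does not say how data is assigned to a group, and all disk numbers are made up.

```python
from typing import Dict, List


class PartitionGroupTable:
    """Holds the partition groups currently in force (hypothetical structure)."""

    def __init__(self, groups: Dict[str, List[int]]) -> None:
        self.groups = dict(groups)

    def update(self, new_groups: Dict[str, List[int]]) -> None:
        # S87: the second partition groups replace the stored first groups
        self.groups = dict(new_groups)

    def place(self, key: str) -> List[int]:
        # S89: new data is placed using a group from the current table.
        # Assumed selection rule: byte sum of the key, stable across runs.
        names = sorted(self.groups)
        index = sum(key.encode("utf-8")) % len(names)
        return self.groups[names[index]]


# First topology: one hypothetical group of 10 devices.
table = PartitionGroupTable({"PT1": list(range(1, 11))})

# Topology changes; the table is updated with hypothetical second groups (P = 6).
table.update({"PT1": [11, 12, 3, 4, 9, 10], "PT2": [5, 6, 9, 10, 3, 4]})

# New data received after the update lands on devices of a second group.
disks = table.place("object-42")
```

The point of the sketch is the ordering: once the table has been updated, every new write is resolved against the second partition groups only, while data written earlier is handled by the migration in S88.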

In the above technical solution, the storage system supports dynamic storage partition groups (or views): when the topology of the storage system changes, its storage partition groups (or views) change as well. For example, when the capacity of the storage system is reduced because 2 of its 5 storage nodes fail, the storage system can generate new storage partition groups adapted to the reduced system. Every one of the new storage partition groups is usable, which can improve the space utilization of the storage system.

The above embodiments describe the method provided by the embodiments of the present application from the perspective of interaction between the controller of the storage system and the storage nodes. To implement the functions of the method provided by the above embodiments, the controller of the storage system may include a hardware structure and/or a software module, and may implement these functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is performed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.

FIG. 12 is a schematic structural diagram of a data storage device 1200 of a storage system. The data storage device 1200 may be the controller of the storage system and can implement the functions of the controller of the storage system in the method provided by the embodiments of the present application. The data storage device 1200 may also be an apparatus capable of supporting the controller of the storage system in implementing those functions. The data storage device 1200 may be a hardware structure, a software module, or a hardware structure plus a software module, and may be implemented by a chip system. In the embodiments of the present application, a chip system may consist of chips, or may include chips and other discrete devices.

The data storage device 1200 may include a communication module 1201 and a processing module 1202.

The communication module 1201 may be used to perform steps S43 and S44 in the embodiment shown in FIG. 4, or steps S83 and S84 in the embodiment shown in FIG. 8, and/or to support other processes of the techniques described herein. The communication module 1201 is used for communication between the data storage device 1200 and other modules, and may be a circuit, a device, an interface, a bus, a software module, a transceiver, or any other apparatus capable of communication.

The processing module 1202 may be used to perform steps S41 to S42 and steps S45 to S49 in the embodiment shown in FIG. 4, or steps S81 to S82 and steps S85 to S89 in the embodiment shown in FIG. 8, and/or to support other processes of the techniques described herein.

All relevant content of the steps in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules and is not repeated here.

The division of modules in the embodiments of the present application is schematic and is only a division by logical function; other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of the present application may be integrated in one processor, may each exist physically on their own, or two or more modules may be integrated in one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.

FIG. 13 shows a data storage device 1300 of a storage system provided by an embodiment of the present application. The data storage device 1300 may be the controller of the storage system in the embodiment shown in FIG. 4 or FIG. 8 and can implement the functions of the controller of the storage system in the method provided by the embodiment of FIG. 4 of the present application. The data storage device 1300 may also be an apparatus capable of supporting the controller of the storage system in implementing the functions of the controller in the method provided by the embodiment shown in FIG. 4 or FIG. 8. The data storage device 1300 may be a chip system. In the embodiments of the present application, a chip system may consist of chips, or may include chips and other discrete devices.

The data storage device 1300 includes at least one processor 1320, which is used to implement, or to support the data storage device 1300 in implementing, the functions of the controller of the storage system in the method provided by the embodiment shown in FIG. 4 or FIG. 8 of the present application. For example, the processor 1320 may generate the N second storage partition groups corresponding to the second topology. For details, see the detailed description in the method examples, which is not repeated here.

The data storage device 1300 may further include at least one memory 1330 for storing program instructions and/or data. The memory 1330 is coupled to the processor 1320. Coupling in the embodiments of the present application is an indirect coupling or communication connection between apparatuses, units, or modules; it may be electrical, mechanical, or in another form, and is used for information exchange between the apparatuses, units, or modules. The processor 1320 may operate in cooperation with the memory 1330 and may execute the program instructions stored in the memory 1330. At least one of the at least one memory may be included in the processor. When the processor 1320 executes the program instructions in the memory 1330, the method shown in FIG. 4 or FIG. 8 can be implemented.

The data storage device 1300 may further include a communication interface 1310 for communicating with other devices through a transmission medium, so that the apparatus in the data storage device 1300 can communicate with other devices. For example, the other device may be a client. The processor 1320 may send and receive data through the communication interface 1310.

The embodiments of the present application do not limit the specific connection medium between the communication interface 1310, the processor 1320, and the memory 1330. In FIG. 13, the memory 1330, the processor 1320, and the communication interface 1310 are connected by a bus 1340, which is drawn as a thick line. The connections between other components are shown only schematically and are not limiting. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 13, but this does not mean that there is only one bus or one type of bus.

In the embodiments of the present application, the processor 1320 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

In the embodiments of the present application, the memory 1330 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM). The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of the present application may also be a circuit or any other apparatus capable of implementing a storage function, and is used for storing program instructions and/or data.

The embodiments of the present application further provide a computer-readable storage medium including instructions that, when run on a computer, cause the computer to execute the method performed by the controller of the storage system in the embodiment shown in FIG. 4 or FIG. 8.

The embodiments of the present application further provide a computer program product including instructions that, when run on a computer, cause the computer to execute the method performed by the controller of the storage system in the embodiment shown in FIG. 4 or FIG. 8.

The embodiments of the present application provide a chip system that includes a processor, and may further include a memory, for implementing the functions of the controller of the storage system in the foregoing method. The chip system may consist of chips, or may include chips and other discrete devices.

The methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, an SSD).

Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its scope. If these changes and modifications fall within the scope of the claims of the present application and their equivalents, the present application is intended to include them as well.

Claims (17)

1. A data storage method for a storage system, comprising: when the topology of the storage system is updated from a first topology to a second topology, generating N second storage partition groups corresponding to the second topology, wherein each of the N second storage partition groups corresponds to P second storage devices, N and P are positive integers, and the first topology is different from the second topology; updating, by using the N second storage partition groups, M first storage partition groups stored in the storage system, wherein the M first storage partition groups correspond to the first topology, each of the M first storage partition groups corresponds to K first storage devices, M and K are positive integers, and P and K have different values; and after new data to be stored is received, storing the new data to be stored in a second storage device corresponding to at least one of the N second storage partition groups.

2. The method according to claim 1, wherein the value of P is the same as the data redundancy mode configured for the storage system under the second topology, and the value of K is the same as the data redundancy mode configured for the storage system under the first topology.

3. The method according to claim 1 or 2, wherein when P is greater than K, a part of the data stored in each of L first storage devices corresponding to each first storage partition group is the same as the data stored in one of the P second storage devices corresponding to the M second storage partition groups, and L is a positive integer less than P.

4. The method according to claim 3, further comprising: migrating part of each piece of original data according to one of the N second storage partition groups, wherein each piece of original data is stored in the K storage devices corresponding to one of the M first storage partition groups, and the part of each piece of original data is different from the data stored in each of the L second storage devices.

5. The method according to claim 1 or 2, wherein when P is less than K, the data stored in each of S first storage devices corresponding to each first storage partition group is the same as a part of the data stored in one of the P second storage devices corresponding to the M second storage partition groups, and S is a positive integer less than P.

6. The method according to claim 5, further comprising: migrating part of each piece of original data according to one of the N second storage partition groups, wherein each piece of original data is stored in the K storage devices corresponding to one of the M first storage partition groups, and the part of each piece of original data is different from any part of the data stored in each of the P second storage devices.

7. The method according to claim 1 or 2, wherein the P second storage devices corresponding to any one of the N second storage partition groups are different from the K first storage devices included in any one of the M first storage partition groups.

8. The method according to claim 7, further comprising: migrating each piece of original data according to one of the N second storage partition groups, wherein each piece of original data is data that the storage system, when in the first topology, stored in the storage system according to at least one of the M first storage partition groups.

9. A data storage device for a storage system, comprising a communication interface and a processor, wherein: the processor is configured to, when the topology of the storage system is updated from a first topology to a second topology, generate N second storage partition groups corresponding to the second topology, wherein each of the N second storage partition groups corresponds to P second storage devices, N and P are positive integers, and the first topology is different from the second topology; the processor is further configured to update, by using the N second storage partition groups, M first storage partition groups stored in the storage system, wherein the M first storage partition groups correspond to the first topology, each of the M first storage partition groups corresponds to K first storage devices, M and K are positive integers, and P and K have different values; and the processor is further configured to, after receiving new data to be stored through the communication interface, store the new data to be stored in a second storage device corresponding to at least one of the N second storage partition groups.

10. The device according to claim 9, wherein the value of P is the same as the data redundancy mode configured for the storage system under the second topology, and the value of K is the same as the data redundancy mode configured for the storage system under the first topology.

11. The device according to claim 9 or 10, wherein when P is greater than K, a part of the data stored in each of L first storage devices corresponding to each first storage partition group is the same as the data stored in one of the P second storage devices corresponding to the M second storage partition groups, and L is a positive integer less than P.

12. The device according to claim 11, wherein the processor is further configured to: migrate part of each piece of original data according to one of the N second storage partition groups, wherein each piece of original data is stored in the K storage devices corresponding to one of the M first storage partition groups, and the part of each piece of original data is different from the data stored in each of the L second storage devices.

13. The device according to claim 9 or 10, wherein when P is less than K, the data stored in each of S first storage devices corresponding to each first storage partition group is the same as a part of the data stored in one of the P second storage devices corresponding to the M second storage partition groups, and S is a positive integer less than P.

14. The device according to claim 13, wherein the processor is further configured to: migrate part of each piece of original data according to one of the N second storage partition groups, wherein each piece of original data is stored in the K storage devices corresponding to one of the M first storage partition groups, and the part of each piece of original data is different from any part of the data stored in each of the P second storage devices.

15. The device according to claim 9 or 10, wherein the P second storage devices corresponding to any one of the N second storage partition groups are different from the K first storage devices included in any one of the M first storage partition groups.

16. The device according to claim 15, wherein the processor is further configured to: migrate each piece of original data according to one of the N second storage partition groups, wherein each piece of original data is data that the storage system, when in the first topology, stored in the storage system according to at least one of the M first storage partition groups.

17. A computer storage medium, wherein the computer storage medium stores instructions that, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 8.
CN201811613679.6A 2018-12-27 2018-12-27 Data storage method and device for a storage system Active CN109840051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811613679.6A CN109840051B (en) 2018-12-27 2018-12-27 Data storage method and device for a storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811613679.6A CN109840051B (en) 2018-12-27 2018-12-27 Data storage method and device for a storage system

Publications (2)

Publication Number Publication Date
CN109840051A CN109840051A (en) 2019-06-04
CN109840051B true CN109840051B (en) 2020-08-07

Family

ID=66883413

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201811613679.6A Active CN109840051B (en) 2018-12-27 2018-12-27 Data storage method and device for a storage system

Country Status (1)

Country Link
CN (1) CN109840051B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765230B (en) * 2019-09-03 2022-08-09 平安科技(深圳)有限公司 Legal text storage method and device, readable storage medium and terminal equipment
CN112748862B (en) 2019-10-31 2024-08-09 伊姆西Ip控股有限责任公司 Method, electronic device and computer program product for managing a disk
CN114063884B (en) * 2020-07-31 2024-07-12 伊姆西Ip控股有限责任公司 Partitioning method, apparatus and computer program product for an extended storage system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630423A (en) * 2015-12-25 2016-06-01 华中科技大学 Erasure code cluster storage expansion method based on data caching
CN107111450A (en) * 2014-10-24 2017-08-29 微软技术许可有限责任公司 Disk partition stitching and rebalancing using a partition table
KR20180012436A (en) * 2016-07-27 2018-02-06 (주)선재소프트 The database management system and method for preventing performance degradation of transaction when table reconfiguring
CN107943421A (en) * 2017-11-30 2018-04-20 成都华为技术有限公司 Partition division method and device based on a distributed storage system
CN108780386A (en) * 2017-12-20 2018-11-09 华为技术有限公司 A method, device and system for data storage

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7739577B2 (en) * 2004-06-03 2010-06-15 Inphase Technologies Data protection system
WO2015188014A1 (en) * 2014-06-04 2015-12-10 Pure Storage, Inc. Automatically reconfiguring a storage memory topology
EP3208714B1 (en) * 2015-12-31 2019-08-21 Huawei Technologies Co., Ltd. Data reconstruction method, apparatus and system in distributed storage system

Also Published As

Publication number Publication date
CN109840051A (en) 2019-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant