
CN114415968B - Storage system and data writing method thereof - Google Patents


Info

Publication number
CN114415968B
CN114415968B
Authority
CN
China
Prior art keywords
storage
data
storage space
raid
storage system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210091468.0A
Other languages
Chinese (zh)
Other versions
CN114415968A (en)
Inventor
陈飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xiaozhuang University
Original Assignee
Nanjing Xiaozhuang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xiaozhuang University filed Critical Nanjing Xiaozhuang University
Priority to CN202210091468.0A
Publication of CN114415968A
Application granted
Publication of CN114415968B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0629: Configuration or reconfiguration of storage systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16: Error detection or correction of the data by redundancy in hardware
    • G06F 11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/2053: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F 11/2056: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16: Error detection or correction of the data by redundancy in hardware
    • G06F 11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/2053: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F 11/2056: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F 11/2058: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using more than 2 mirrored copies
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638: Organizing or formatting or addressing of data
    • G06F 3/064: Management of blocks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671: In-line storage system
    • G06F 3/0683: Plurality of storage devices
    • G06F 3/0689: Disk arrays, e.g. RAID, JBOD
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a storage system and a data writing method for it. The method comprises: receiving a data write request and selecting a RAID level for storing the data according to the data in the request and the state of the storage system; dividing the physical disks of the storage system into multiple storage blocks, configuring blocks from different physical disks into storage groups according to the storage structure of the corresponding RAID level, and configuring multiple storage groups into a RAID storage group of that level; allocating storage space according to the selected RAID level, determining the stripe relationship of the storage group, and splitting the write request into multiple sub-requests according to the mapping between stripes and storage blocks; and executing the sub-requests to complete the write. With this scheme, data redundancy is preserved while high-performance data writing is still provided.

Description

Storage system and data writing method thereof

Technical field

The present invention relates to the technical field of data storage, and in particular to a storage system and a data writing method thereof.

Background

RAID (Redundant Array of Inexpensive Disks) is a data redundancy and backup technology commonly used in today's storage systems. Depending on how the disks are combined, RAID is divided into levels such as RAID0, RAID1, RAID10, RAID3, RAID5, and RAID6.

With the development of network and information processing technology, storage systems contain more and more physical disks. Because RAID6 can tolerate the simultaneous failure of two disks, it balances performance and space efficiency while better protecting the reliability of user data, and is therefore widely adopted in storage devices.

The problem is that a storage system using RAID6 suffers from write penalties and inconsistent (non-atomic) data updates. The write penalty degrades write performance especially noticeably for small writes and when a disk failure puts the storage system into a degraded state.

Summary of the invention

Purpose of the invention: the present invention provides a storage system and a data writing method for it. By deploying multiple RAID levels across multiple physical disks and adaptively selecting the most suitable level according to the storage space state of the system and the length of the data being written, it aims to maintain data redundancy while still providing high-performance data writing.

Technical solution: the present invention provides a data writing method for a storage system, comprising: receiving a data write request and selecting a RAID level for storing the data according to the data in the request and the state of the storage system; dividing the physical disks of the storage system into multiple storage blocks, configuring blocks from different physical disks into storage groups according to the storage structure of the corresponding RAID level, and configuring multiple storage groups into a RAID storage group of that level; allocating storage space according to the selected RAID level, determining the stripe relationship of the storage group, and splitting the write request into multiple sub-requests according to the mapping between stripes and storage blocks; and executing the sub-requests to complete the write.
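The last step above, splitting a write request into per-stripe sub-requests, can be sketched as follows. This is an illustrative model only; the stripe size and the dictionary fields are assumptions, not values from the patent.

```python
STRIPE_SIZE = 64 * 1024  # bytes per stripe (assumed for illustration)

def split_write_request(offset, length, stripe_size=STRIPE_SIZE):
    """Split the byte range [offset, offset+length) into sub-requests,
    one per stripe, according to the stripe-to-block mapping."""
    subs = []
    end = offset + length
    while offset < end:
        stripe_id = offset // stripe_size
        # bytes of this request that fall inside the current stripe
        chunk = min(end - offset, (stripe_id + 1) * stripe_size - offset)
        subs.append({"stripe": stripe_id, "offset": offset, "length": chunk})
        offset += chunk
    return subs
```

A write of 16 KiB starting 4 KiB before a stripe boundary would, for example, be split into two sub-requests, one per stripe.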

Specifically, the RAID levels configured in the storage system include at least one of RAID0, RAID1, RAID5, and RAID6.

Specifically, selecting the RAID level for data storage includes: if no redundant storage space is configured or the redundant storage space is insufficient, selecting RAID0; if the storage system has single-redundant storage space, choosing between two-copy RAID1 and RAID5 according to the write performance determined by the data length; if the storage system has double-redundant storage space, choosing between three-copy RAID1 and RAID6 according to the write performance determined by the data length. Redundant storage space refers to mirror-copy storage space or parity storage space.
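The selection rule above can be sketched as a small decision function. The four-block threshold is borrowed from the 4+2 example given later in the description; the function and level names are illustrative, not part of the patent.

```python
def select_raid_level(redundancy, data_blocks, small_write_threshold=4):
    """Pick a RAID level from the current redundancy state and write length.

    redundancy: number of redundant copies/disks currently available (0, 1, 2)
    data_blocks: length of the write, in storage-system blocks
    """
    if redundancy == 0:
        # no redundant space configured, or it is insufficient
        return "RAID0"
    if redundancy == 1:
        # single redundancy: mirror small writes, parity for large ones
        return "RAID1-2copy" if data_blocks < small_write_threshold else "RAID5"
    # double redundancy
    return "RAID1-3copy" if data_blocks < small_write_threshold else "RAID6"
```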

Specifically, executing the sub-requests further includes recording the correspondence between the logical address of the written data and the physical address of the storage space.
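A minimal sketch of such a logical-to-physical mapping record, kept deliberately toy-like (an in-memory dictionary; a real system would persist this metadata):

```python
class MappingTable:
    """Toy logical-to-physical address map, updated after each sub-request."""

    def __init__(self):
        self._map = {}

    def record(self, lba, phys):
        """Remember where the data for this logical address was written."""
        self._map[lba] = phys

    def lookup(self, lba):
        """Return the physical location for a logical address, or None."""
        return self._map.get(lba)
```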

Specifically, the physical disks of the storage system are divided into multiple storage blocks of equal size.

Specifically, the method further includes data migration, whose steps are: reading the data from its storage space and reorganizing it according to the stripe relationship; allocating new storage space for the reorganized data according to the corresponding RAID level and writing it; and updating the correspondence between the logical address of the written data and the physical address of the storage space while releasing the storage space occupied before the migration.
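The migration steps above (read, rewrite into new space, remap, free the old space) can be sketched as below. The callback-based interface is an assumption made for illustration; it is not the patent's module API.

```python
def migrate(mapping, lbas, read_fn, write_fn, free_fn):
    """Migrate each logical address to newly allocated storage space."""
    for lba in lbas:
        old_phys = mapping[lba]
        data = read_fn(old_phys)    # read from the current storage space
        new_phys = write_fn(data)   # write into newly allocated space
        mapping[lba] = new_phys     # update the logical->physical mapping
        free_fn(old_phys)           # release the space used before migration
```

With read/write/free wired to an in-memory dictionary, one can check that the mapping is rewritten and the old locations are freed.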

The present invention also provides a storage system comprising a RAID deployment module, a data writing module, and a disk array, wherein: the RAID deployment module receives a data write request and selects a RAID level for storing the data according to the data in the request and the state of the storage system; the data writing module allocates storage space according to the selected RAID level, determines the stripe relationship of the storage group, splits the write request into multiple sub-requests according to the mapping between stripes and storage blocks, and executes the sub-requests to complete the write; and in the disk array, the physical disks are divided into multiple storage blocks, blocks from different physical disks are configured into storage groups according to the storage structure of the corresponding RAID level, and multiple storage groups are configured into a RAID storage group of that level.

Specifically, the RAID deployment module selects RAID0 if no redundant storage space is configured or the redundant storage space is insufficient; if the storage system has single-redundant storage space, it chooses between two-copy RAID1 and RAID5 according to the write performance determined by the data length; if the storage system has double-redundant storage space, it chooses between three-copy RAID1 and RAID6 according to the write performance determined by the data length. Redundant storage space refers to mirror-copy storage space or parity storage space.

Specifically, the system includes a space management module for allocating storage space and recording the correspondence between the logical address of the written data and the physical address of the storage space.

Specifically, the system includes a data migration module for reading data from its storage space, reorganizing it according to the stripe relationship, allocating storage space for the reorganized data according to the corresponding RAID level, and writing the data; the space management module records the correspondence between the logical address of the written data and the physical address of the storage space, and updates and releases the freed storage space.

Beneficial effects: compared with the prior art, the present invention has the significant advantage of maintaining data redundancy while still providing high-performance data writing.

Brief description of the drawings

Figure 1a is a schematic diagram of the RAID6 large-write method.

Figure 1b is a schematic diagram of the RAID6 small-write method.

Figure 2a is a schematic diagram of the large-write method on a degraded stripe.

Figure 2b is a schematic diagram of the small-write method on a degraded stripe.

Figure 3 is a schematic structural diagram of the storage system provided by the present invention.

Figure 4 is a schematic diagram of the data mapping relationship provided by the present invention.

Figure 5 is a schematic diagram of the data migration method provided by the present invention.

Figure 6 is a schematic hardware diagram of the storage system provided by the present invention.

Detailed description

The technical solution of the present invention is further described below with reference to the accompanying drawings.

Before describing the solution provided by the present invention in detail, the relevant terms and principles involved are briefly explained.

Stripe: a RAID array is composed of multiple disks, and striping distributes data across them in blocks so that the data can be processed concurrently. Writes and reads can then proceed on multiple disks in parallel, which scales I/O performance well.
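The block-by-block distribution of striping can be illustrated with a simple round-robin sketch (purely illustrative; real layouts also rotate parity and handle chunk sizes):

```python
def stripe_data(blocks, num_disks):
    """Distribute data blocks round-robin across disks (basic striping)."""
    disks = [[] for _ in range(num_disks)]
    for i, blk in enumerate(blocks):
        disks[i % num_disks].append(blk)
    return disks
```

Eight blocks striped over four disks put blocks 0 and 4 on disk 0, blocks 1 and 5 on disk 1, and so on, so consecutive blocks can be read or written concurrently.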

RAID0: a simple data striping technique with no data redundancy. Data is spread across all disks, and multiple disks are read and written concurrently in an independent-access fashion, multiplying bandwidth; its read and write speed is the fastest of all RAID levels. However, RAID0 has no redundant information such as mirrors or parity, so it provides no data protection: if a disk fails, all data is lost.

Mirrored RAID: mirrored-RAID striping writes identical copies of the data to multiple disks according to the configured number of copies. Mirrored RAID achieves data redundancy by sacrificing storage space: with two mirror copies, overall space utilization is only 50%. Although its space utilization is low, mirrored RAID provides good data protection: a single disk failure causes no data loss and has little impact on read and write performance.

RAID5 and RAID6: these levels encode the data on a stripe in a certain format to compute parity information; the data is written to the corresponding data disks and the parity to the parity disks. RAID5 has one parity disk and tolerates the failure of one disk in the storage pool; RAID6 has two parity disks and tolerates the simultaneous failure of two disks. When configuring a storage pool, the number of data disks is usually set to a power of two: 4+1 denotes a RAID5 pool with four data disks and one parity disk, while 4+2 denotes a RAID6 pool with four data disks and two parity disks. With no failed disk, RAID5 and RAID6 offer the same high read bandwidth as RAID0 and RAID1. When writing, however, the need to compute parity hurts write performance considerably, and if a stripe contains a failed disk, both read and write performance suffer badly. Parity is generally computed in one of two ways, the large write and the small write. To illustrate how each affects performance, the large write, the small write, and reading and writing in the presence of a failed disk are briefly introduced below with reference to the drawings.
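The P parity described above is a byte-wise XOR across the data blocks of a stripe, and a single lost block can be rebuilt from P and the survivors. The sketch below shows only P; the Q parity of RAID6 additionally requires Galois-field GF(2^8) multiplication (the k coefficients in the equations that follow) and is omitted here for brevity.

```python
from functools import reduce

def xor_blocks(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def parity_p(data_blocks):
    """P parity: XOR across all data blocks of one stripe (RAID5-style)."""
    return reduce(xor_blocks, data_blocks)

def recover_block(surviving_blocks, p):
    """Rebuild a single lost data block from P and the surviving blocks."""
    return reduce(xor_blocks, surviving_blocks, p)
```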

Figure 1a shows the principle of a large write on a RAID6 4+2 stripe; Figure 1b shows the small write. As shown in Figures 1a and 1b, when new data is written to block D1, the stripe's new parity values P and Q can be computed with either a large or a small write. The large write first pre-reads the data of D2, D3, and D4 and then computes the parity per RAID6: NewP1 = NewD1 ^ D2 ^ D3 ^ D4; NewQ1 = k1*NewD1 ^ k2*D2 ^ k3*D3 ^ k4*D4. Once computed, NewD1, NewP1, and NewQ1 are written to the stripe to complete the write. The small write first pre-reads D1, P, and Q, then computes the new parity from the equivalent form of the large-write equations: NewP1 = NewD1 ^ D1 ^ P1; NewQ1 = k1*NewD1 ^ k1*D1 ^ Q1.
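That the two P equations above are equivalent follows from XOR cancelling the old D1 out of the old parity. A small sketch over integers (standing in for data blocks) makes this checkable; the Q equations would need the GF(2^8) k-coefficients and are not modelled here.

```python
def new_p_large(new_d1, d2, d3, d4):
    """Large write: recompute P from the new block and the pre-read blocks."""
    return new_d1 ^ d2 ^ d3 ^ d4

def new_p_small(new_d1, old_d1, old_p):
    """Small write: NewP1 = NewD1 ^ D1 ^ P1 (old D1 cancels out of old P)."""
    return new_d1 ^ old_d1 ^ old_p
```

For any stripe contents, both routes produce the same new parity, which is why the storage system is free to pick whichever needs fewer disk I/Os.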

Figure 2 shows how data is written to a RAID6 4+2 stripe that contains a failed disk. There are two cases: in Figure 2a the newly written data falls exactly on the failed disk, in which case only the large write can be used; in Figure 2b the newly written data falls on a healthy disk, in which case only the small write can be used.

From the RAID6 principle above, its two parity disks allow it to tolerate the simultaneous failure of two disks, balancing performance and space efficiency while better protecting the reliability of user data. But whether data is written with the large write or the small write, the old data on the data disks or parity disks must first be pre-read before the new parity can be computed, and every write of new data must also write the parity disks. This extra pre-reading and parity writing is the write penalty of RAID5 and RAID6: the more write penalties incurred, the greater the impact on performance. Moreover, because RAID5/RAID6 writes are spread across different disks, if one disk suffers a write error while the others succeed, the whole stripe's data is corrupted and the correct data can no longer be computed from P or Q. This is the inconsistent-update problem.

Referring to Figure 3, the present invention provides a data writing method for a storage system, comprising: receiving a data write request and selecting a RAID level for storing the data according to the data in the request and the state of the storage system; dividing the physical disks of the storage system into multiple storage blocks, configuring blocks from different physical disks into storage groups according to the storage structure of the corresponding RAID level, and configuring multiple storage groups into a RAID storage group of that level; allocating storage space according to the selected RAID level, determining the stripe relationship of the storage group, and splitting the write request into multiple sub-requests according to the mapping between stripes and storage blocks; and executing the sub-requests to complete the write.

In an embodiment of the present invention, the RAID levels configured in the storage system include at least one of RAID0, RAID1, RAID5, and RAID6.

In a specific implementation, some or all of the storage resources in the system may be selected to create a storage pool, and a data redundancy mode is configured for the pool. For example, the 4+2 mode means the storage system has double redundancy on each stripe and can tolerate the simultaneous failure of two disks.

In an embodiment of the present invention, the physical disks of the storage system are divided into multiple storage blocks of equal size.

In a specific implementation, each physical disk in the storage pool is divided by a fixed size, so that all physical disks are split into equally sized storage blocks (chunks). The fixed size may be set according to actual needs and is not specially limited. The storage blocks from all physical disks are virtualized into one storage pool for subsequent space management and allocation.
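Splitting a disk into fixed-size chunks can be sketched as below; the (offset, size) representation is an assumption for illustration, and a tail remainder smaller than one chunk is simply left unused here.

```python
def split_disk(disk_size, chunk_size):
    """Split one physical disk into equal-size chunks as (offset, size) pairs."""
    return [(i * chunk_size, chunk_size)
            for i in range(disk_size // chunk_size)]
```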

In a specific implementation, space management modules of different RAID levels are deployed according to the redundancy mode (redundant storage space). For example, with the 4+2 redundancy mode, the following RAID levels may be deployed: RAID0; two-copy RAID1; three-copy RAID1; RAID5; and RAID6. RAID0 is used for writes when the whole storage system has no redundant disk; two-copy RAID1 and RAID5 are used when the whole system has only one redundant disk: more specifically, two-copy RAID1 for small writes and RAID5 for large writes. In practice, some or all of these RAID levels may be deployed, and among the levels actually deployed, the one with the highest performance is selected to complete the write.

In a specific implementation, storage blocks from different disks are combined into chunk RAID groups according to the redundancy mode. Taking RAID6 in 4+2 redundancy mode as an example, one storage block is selected from each of six different physical disks to form a chunk group, so that the data lands on six different physical disks and can still be read and recovered normally when two disks fail. Multiple chunk groups make up the RAID storage group of that RAID level.
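The key constraint above, one chunk per distinct disk, can be sketched as a simple allocator. The data structure and the "pick the first disks with free chunks" policy are assumptions; a real allocator would also balance wear and capacity.

```python
def build_chunk_group(free_chunks, width):
    """Pick one free chunk from each of `width` distinct disks.

    free_chunks: dict mapping disk_id -> list of free chunk ids.
    Returns a list of (disk_id, chunk_id) pairs, one per disk.
    """
    candidates = sorted(d for d, chunks in free_chunks.items() if chunks)
    if len(candidates) < width:
        raise ValueError("not enough disks with free chunks")
    return [(d, free_chunks[d].pop()) for d in candidates[:width]]
```

For a 4+2 group (width 6), the result always spans six distinct disks, which is what lets the group survive two simultaneous disk failures.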

In a specific implementation, the storage groups in a RAID storage group are further divided into finer-grained small storage blocks (blocks); the small blocks located on different storage blocks of the same storage group form a stripe.

In a specific implementation, one or more stripes form the basic unit in which a logical unit (LUN) stores data. A LUN is a storage unit that can be mapped directly to a host for data reads and writes; when handling user read/write requests and performing data migration, the LUN requests space from the storage system, releases space, and migrates data in units of the space management module's stripes.

Referring to Figure 4, consider a storage system that deploys both three-copy RAID1 and RAID6. The logical unit (LUN) receives a host write request and, according to the length of the written data and the current state of the storage system, selects an appropriate space management module to allocate storage space; after the write completes, it records the LUN's logical-to-physical address mapping. The space management module obtains storage blocks from different disks in the storage pool, combines them into storage groups according to the RAID level, further divides them into finer-grained small blocks, and allocates these to LUNs. The space management module manages free space and maintains the mapping from storage groups to the storage blocks of each physical disk. The physical disk storage pool is responsible for splitting the physical disks into storage blocks and for allocating and tracking the space usage of each disk's blocks.

In a specific implementation, a data write request is received from the host; the request contains the starting logical block address (LBA) within the LUN and the length of the data to be written. Because the smallest unit of space allocation and release is the small storage block defined by the storage system, before allocating space the system first checks whether the host write request is block-aligned. If the logical address or the data length is unaligned, the head and tail of the request must be padded by pre-reading, to preserve the integrity of the written data.
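The alignment check above amounts to rounding the request out to block boundaries and noting how many head/tail bytes must be pre-read. A minimal sketch (names and return shape are illustrative):

```python
def align_request(lba, length, block_size):
    """Round a host request out to block boundaries.

    Returns (aligned_start, aligned_length, head_pad, tail_pad), where
    head_pad/tail_pad are the byte counts that must be pre-read to fill
    the unaligned edges of the request.
    """
    start = lba - lba % block_size          # round start down
    end = lba + length
    if end % block_size:                    # round end up
        end += block_size - end % block_size
    return start, end - start, lba - start, end - (lba + length)
```

An already-aligned request yields zero padding, so the pre-read step is skipped entirely in that case.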

In this embodiment of the present invention, if the state of the storage system is that no redundant storage space is configured, or the redundant storage space is insufficient, RAID0 is selected; if the state is single-redundancy storage space, the write performance determined by the data length decides between two-copy RAID1 and RAID5; if the state is dual-redundancy storage space, the write performance determined by the data length decides between three-copy RAID1 and RAID6. Redundant storage space refers to mirror-copy storage space or parity storage space.

In a specific implementation, assume a storage pool is created with a 4+2 RAID6 redundancy configuration and all RAID levels are deployed. If the system is currently in the no-redundancy state, storage space is allocated directly from the RAID0 space management module. If the system is in the single-redundancy state, the data length of the write request further determines whether space is allocated from two-copy RAID1 or from RAID5 (4+1); the specific criterion is: if the length of the written data is less than 4 Blocks, space is allocated from two-copy RAID1, otherwise from RAID5. Take a write of 1 Block as an example. Under equal data-redundancy guarantees, the principles of two-copy RAID1 and RAID5 show that two-copy RAID1 requires 2 writes, whereas RAID5 requires pre-reading at least 3 disks and writing 3 disks, a write amplification of 6; allocating from RAID1 therefore gives the highest write performance. Other allocation cases follow the same reasoning: from the storage pool's disk configuration (2+2, 4+2, 8+2, etc.) and the length of the written data, the write amplification is computed, the RAID level with the better write performance is chosen, and the corresponding storage space is allocated.
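The comparison above can be sketched with a simplified I/O-count model (the embodiment's own pre-read/write accounting may differ slightly): a k-copy mirror issues k writes per block, while a parity RAID doing a partial-stripe update pre-reads old data and parity before rewriting them, and a full-stripe write needs no pre-reads. Function names and the cost model are illustrative, not the patented implementation.

```python
def mirror_io(blocks: int, copies: int) -> int:
    """k-copy mirror: each block is written once per copy."""
    return blocks * copies

def parity_io(blocks: int, data_disks: int, parity_disks: int) -> int:
    """Parity RAID: full-stripe write, or read-modify-write otherwise."""
    if blocks >= data_disks:               # full stripe: no pre-reads
        return blocks + parity_disks
    return 2 * (blocks + parity_disks)     # pre-read old data+parity, rewrite

def choose_single_redundancy(blocks: int, data_disks: int = 4) -> str:
    """Single-redundancy state: two-copy RAID1 vs RAID5 (data_disks+1)."""
    if mirror_io(blocks, 2) <= parity_io(blocks, data_disks, 1):
        return "RAID1"
    return "RAID5"
```

With `data_disks = 4` this reproduces the embodiment's threshold: writes shorter than 4 Blocks go to two-copy RAID1, while full-stripe writes go to RAID5.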

In this embodiment of the present invention, the correspondence between the logical address of the written data and the physical address of the allocated storage space is recorded.

In a specific implementation, the stripe relationship is determined from the chosen space management module's RAID level and the allocated storage space; the write of the entire stripe is completed after the multi-copy data writes finish or the P and Q parity codes are computed. The space management module maintains the mapping between RAID storage groups and physical disks; the host's write request is split into sub-requests according to the mapping between stripes and physical disks and issued to each physical disk. The write address in the host's write request is the logical address LBA; the storage space allocated by the space management module is the physical address PBA; both LBA and PBA are aligned to the Block size. Blocks are addressed as follows: the upper 8 bits identify the Chunk RAID Group to which the block belongs, and the lower 56 bits give the offset within that RAID storage group.
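The 8-bit/56-bit address layout above is plain bit packing; a minimal sketch (function names are illustrative):

```python
# 64-bit Block address: upper 8 bits = Chunk RAID Group ID,
# lower 56 bits = offset within that RAID storage group.
GROUP_BITS = 8
OFFSET_BITS = 56
OFFSET_MASK = (1 << OFFSET_BITS) - 1

def pack_pba(group_id: int, offset: int) -> int:
    """Combine group ID and in-group offset into one 64-bit PBA."""
    assert 0 <= group_id < (1 << GROUP_BITS)
    assert 0 <= offset <= OFFSET_MASK
    return (group_id << OFFSET_BITS) | offset

def unpack_pba(pba: int):
    """Split a PBA back into (group_id, offset)."""
    return pba >> OFFSET_BITS, pba & OFFSET_MASK
```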

In a specific implementation, after all split sub-requests have been written, the space management module records the storage space allocated for this write request and manages the free space within the module; the logical volume (LUN) layer records the LBA-to-PBA mapping, completing the write request.

In a specific implementation, the storage system's metadata — the mapping between RAID storage groups and physical disks, the usage of physical disk space, the Block allocation state within the space management modules, and the LBA-to-PBA mapping of the logical volume (LUN) layer — may reside on an independent storage medium, and the metadata structures may use a database or other custom data types.

From the principles of RAID, the effective storage space differs greatly across RAID levels: two-copy RAID1 yields only 50% of the total disk space, and three-copy RAID1 only 33%. For a storage system configured as 4+2 RAID6, the effective storage space is 66% of the total disk space. A mechanism is therefore needed that makes full use of the disks' storage space while preserving write performance.
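The capacity figures quoted above follow from first principles: a k-copy mirror keeps 1/k of the raw space, and an m+n parity scheme keeps m/(m+n). Purely illustrative arithmetic:

```python
def mirror_ratio(copies: int) -> float:
    """Effective-capacity fraction of a k-copy mirror."""
    return 1.0 / copies

def parity_ratio(data_disks: int, parity_disks: int) -> float:
    """Effective-capacity fraction of an m+n parity layout."""
    return data_disks / (data_disks + parity_disks)

print(f"2-copy RAID1: {mirror_ratio(2):.0%}")    # 50%
print(f"3-copy RAID1: {mirror_ratio(3):.0%}")    # 33%
print(f"4+2 RAID6:    {parity_ratio(4, 2):.0%}")  # 67% (patent rounds to 66%)
```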

In this embodiment of the present invention, data migration is also included. The steps comprise: reading the data from its storage space and reorganizing it according to the stripe relationship; allocating new storage space for the reorganized data at the corresponding RAID level and writing the data; updating the correspondence between the logical address of the written data and the physical address of the storage space; and releasing the storage space occupied before the migration.

In a specific implementation, the data migration module can migrate user data from a RAID level with low space utilization to one with high space utilization. Referring to Figure 5, the diagram takes migration from two-copy RAID1 to 4+2 RAID6 as an example; in practice, data can also be migrated from RAID0, three-copy RAID1, or RAID5 storage space to improve redundancy or increase effective storage space.

In a specific implementation, when data migration starts, the list of Blocks to be migrated is obtained from the RAID storage group being migrated; a Block to be migrated is one containing valid user data. The user data in these occupied storage spaces is read out and reorganized according to the 4+2 stripe layout of RAID6. A full stripe of storage space is allocated from the space management module corresponding to RAID6, the P and Q parity codes of the stripe are computed from the reorganized user data, and the write of the entire stripe is completed. After the write completes, the data mapping of the logical volume LUN is updated first, and then the free space of the RAID6 space management module is updated and the storage space occupied by the two-copy RAID1 is released.
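A minimal sketch of this migration loop (two-copy RAID1 to 4+2 RAID6). Here `compute_p` is the genuine byte-wise XOR P parity, but the Q code is left as a placeholder, since real RAID6 Q parity is a Reed-Solomon code over GF(2^8); the callback names (`raid6_alloc`, `write_stripe`, `free_raid1`) are illustrative, not the patented interfaces.

```python
from functools import reduce

def compute_p(blocks):
    """P parity: byte-wise XOR across the data blocks of one stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def migrate(valid_blocks, raid6_alloc, write_stripe, lun_map, free_raid1):
    """valid_blocks: list of (lba, data); consumes full stripes of 4 blocks."""
    for i in range(0, len(valid_blocks) - len(valid_blocks) % 4, 4):
        stripe = valid_blocks[i:i + 4]
        data = [d for _, d in stripe]
        p = compute_p(data)
        q = bytes(len(p))              # placeholder for the GF(2^8) Q code
        pba = raid6_alloc()            # allocate one full 4+2 stripe
        write_stripe(pba, data, p, q)  # full-stripe write: no pre-reads
        for j, (lba, _) in enumerate(stripe):
            lun_map[lba] = (pba, j)    # step 1: update LUN LBA-to-PBA map
        for lba, _ in stripe:
            free_raid1(lba)            # step 2: release old RAID1 space
```

Note the ordering matches the text: the LUN mapping is updated before the old two-copy RAID1 space is released, so a crash between the two steps leaks space rather than losing data.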

The present invention also provides a storage system, comprising a RAID deployment module, a data writing module, and a disk array, wherein:

the RAID deployment module is configured to receive a data write request and to select the RAID level for data storage according to the data corresponding to the write request and the state of the storage system;

the data writing module is configured to allocate storage space according to the RAID level for data storage, determine the stripe relationship of the storage group, split the write request into multiple sub-requests according to the mapping between stripes and storage blocks, and execute the sub-requests to complete the data write;

in the disk array, physical disks are divided into multiple storage blocks; according to the storage structure corresponding to the RAID level, multiple storage blocks from different physical disks are configured into storage groups, and multiple storage groups are configured into RAID storage groups of the corresponding RAID level.

In this embodiment of the present invention, a space management module is included, configured to allocate storage space and record the correspondence between the logical address of the written data and the physical address of the storage space.

In this embodiment of the present invention, a data migration module is included, configured to read data from storage space and reorganize it according to the stripe relationship, then allocate storage space for the reorganized data at the corresponding RAID level and write the data; the space management module is configured to record the correspondence between the logical address of the written data and the physical address of the storage space, and to update and release free storage space.

In this embodiment of the present invention, the RAID deployment module is configured to select RAID0 if the state of the storage system is that no redundant storage space is configured or the redundant storage space is insufficient; to choose between RAID1 and RAID5 based on the write performance determined by the data length if the state is single-redundancy storage space; and to select RAID6 if the state is dual-redundancy storage space. Redundant storage space refers to mirror-copy storage space or parity storage space.

In this embodiment of the present invention, the RAID deployment module selects RAID1 when the data length is less than 4 small storage blocks, and RAID5 when the data length is 4 or more small storage blocks; a small storage block is the smallest-granularity data block supported by the storage system, obtained by further dividing a storage group.

Referring to Figure 6, the hardware of the storage system comprises storage system controllers and a disk array; the controllers form a dual-controller architecture in which the two controllers back each other up and share load. The disk array is a disk enclosure containing several physical disks, connected to the storage system controllers via the system bus. Storage system controller 10 and storage system controller 20 have identical structure and function; by backing each other up and sharing load, they effectively ensure the safety and reliability of the storage system as well as its write performance.

The disk array 30 contains several physical disks and provides the actual physical storage space for the storage system. Various storage media may be used, and different storage media may even be mixed. The storage system creates a storage pool based on this disk array and deploys the corresponding storage system controllers.

Claims (8)

1. A data writing method of a storage system, comprising:
receiving a data writing request, and selecting a RAID level of data storage according to data corresponding to the writing request and the state of a storage system; the physical disk of the storage system is divided into a plurality of storage blocks, the plurality of storage blocks from different physical disks are configured into storage groups according to the storage structures corresponding to the RAID levels, and the plurality of storage groups are configured into RAID storage groups of corresponding RAID levels; if the state of the storage system is that redundant storage space is not configured or insufficient, selecting RAID0 level; if the state of the storage system is a single redundant storage space, determining two copies of RAID1 or RAID5 according to the write-in performance determined by the data length; if the state of the storage system is a dual-redundancy storage space, determining three-copy RAID1 or RAID6 according to the write performance determined by the data length; the redundant storage space refers to a mirror image copy storage space or a check storage space;
allocating a storage space according to the RAID level of data storage, determining the stripe relation of a storage group, and splitting a write request into a plurality of sub-requests according to the mapping relation between a stripe and a storage block;
and executing the sub-request to complete data writing.
2. The data writing method of the storage system according to claim 1, wherein the RAID level of the storage system configuration includes at least one of RAID0, RAID1, RAID5, and RAID6.
3. The data writing method of the storage system according to claim 2, wherein the executing the sub-request further comprises:
and recording the corresponding relation between the logical address written by the data and the physical address of the storage space.
4. The data writing method of the storage system according to claim 3, wherein the physical disk of the storage system is divided into a plurality of storage blocks with equal storage size.
5. The data writing method of the storage system according to claim 4, further comprising data migration, the steps comprising:
reading data from the storage space, and reorganizing the data according to a stripe relation;
distributing new storage space for the organized data according to corresponding RAID level, writing the data, updating the corresponding relation between the logic address written in the data and the physical address of the storage space, and releasing the occupied storage space before migration.
6. A storage system, comprising: a RAID deployment module, a data writing module, and a disk array, wherein:
the RAID deployment module is used for receiving a data write-in request and selecting the RAID level of data storage according to the data corresponding to the write-in request and the state of the storage system; if the state of the storage system is that redundant storage space is not configured or insufficient, selecting RAID0 level; if the state of the storage system is a single redundant storage space, determining two copies of RAID1 or RAID5 according to the write-in performance determined by the data length; if the state of the storage system is a double-redundancy storage space, determining three-copy RAID1 or RAID6 according to the write-in performance determined by the data length; the redundant storage space refers to a mirror image copy storage space or a check storage space;
the data writing module is used for allocating storage space according to the RAID level of data storage, determining the stripe relation of a storage group, and splitting a writing request into a plurality of sub-requests according to the mapping relation between a stripe and a storage block; executing the sub-request to finish data writing;
in the disk array, a physical disk is divided into a plurality of storage blocks, the plurality of storage blocks from different physical disks are configured into storage groups according to a storage structure corresponding to the RAID level, and the plurality of storage groups are configured into RAID storage groups of corresponding RAID levels.
7. The storage system according to claim 6, comprising a space management module for allocating the storage space and recording the correspondence between the logical address of the data write and the physical address of the storage space.
8. The storage system according to claim 7, comprising a data migration module for reading data from the storage space and reorganizing the data according to a stripe relationship; distributing storage space for the organized data according to corresponding RAID levels, and writing the data; the space management module is used for recording the corresponding relation between the logical address written in the data and the physical address of the storage space, and updating and releasing the free storage space.
CN202210091468.0A 2022-01-26 2022-01-26 Storage system and data writing method thereof Active CN114415968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210091468.0A CN114415968B (en) 2022-01-26 2022-01-26 Storage system and data writing method thereof


Publications (2)

Publication Number Publication Date
CN114415968A CN114415968A (en) 2022-04-29
CN114415968B true CN114415968B (en) 2023-04-07

Family

ID=81277765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210091468.0A Active CN114415968B (en) 2022-01-26 2022-01-26 Storage system and data writing method thereof

Country Status (1)

Country Link
CN (1) CN114415968B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115639968B (en) * 2022-11-16 2023-02-28 苏州浪潮智能科技有限公司 Method, device, equipment and medium for allocating RAID (redundant array of independent disks) capacity space
CN118885132B (en) * 2024-09-29 2024-12-20 苏州元脑智能科技有限公司 A data processing method, device, equipment, medium, product and system

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101196797A (en) * 2007-12-07 2008-06-11 华中科技大学 A storage system data distribution and mutual conversion method
CN101196799A (en) * 2008-01-09 2008-06-11 杭州华三通信技术有限公司 Magnetic disk redundant array and its controller and synchronization process
CN101369217A (en) * 2008-08-30 2009-02-18 成都市华为赛门铁克科技有限公司 RAID level transforming method and apparatus
CN101414273A (en) * 2008-11-28 2009-04-22 中国移动通信集团四川有限公司 Method for evaluating storage system RAID redundant data risk
CN101441553A (en) * 2008-12-18 2009-05-27 成都市华为赛门铁克科技有限公司 Method and apparatus for creating redundant space on magnetic disk
CN101510145A (en) * 2009-03-27 2009-08-19 杭州华三通信技术有限公司 Storage system management method and apparatus
CN101587425A (en) * 2009-06-16 2009-11-25 杭州华三通信技术有限公司 A kind of method and device that increases magnetic disc redundant array redundancy
CN101620517A (en) * 2009-08-04 2010-01-06 成都市华为赛门铁克科技有限公司 Data-writing method and data-writing device
CN103049222A (en) * 2012-12-28 2013-04-17 中国船舶重工集团公司第七0九研究所 RAID5 (redundant array of independent disk 5) write IO optimization processing method
CN104267913A (en) * 2014-10-20 2015-01-07 北京北亚时代科技有限公司 Storage method and system allowing dynamic asynchronous RAID level adjustment
CN104615381A (en) * 2015-01-18 2015-05-13 浙江宇视科技有限公司 Redundancy array of independent disks of video monitoring system
CN105549902A (en) * 2015-12-08 2016-05-04 浪潮电子信息产业股份有限公司 Design scheme for automatically making RAID (redundant array of independent disk) during trial production of server
CN106371950A (en) * 2015-07-20 2017-02-01 中兴通讯股份有限公司 Method and device for achieving RAID level conversion
CN111857554A (en) * 2019-04-30 2020-10-30 伊姆西Ip控股有限责任公司 Adaptive change of RAID redundancy level

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622189B (en) * 2011-12-31 2015-11-25 华为数字技术(成都)有限公司 The device of Storage Virtualization, date storage method and system




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant