CN102541472A

CN102541472A - A method and device for rebuilding a RAID array

Info

Publication number: CN102541472A
Application number: CN2011104567385A
Authority: CN
Inventors: 上官应兰
Original assignee: Macrosan Technologies Co Ltd
Current assignee: Macrosan Technologies Co Ltd
Priority date: 2011-12-31
Filing date: 2011-12-31
Publication date: 2012-07-04

Abstract

The invention relates to a method and a device for rebuilding a RAID array, used to perform the rebuild operation of a RAID array that is divided in advance into physical blocks of equal size. The device comprises: a resource allocation unit, which allocates one or more physical blocks to a logical resource when the logical resource is created and records the correspondence between the logical resource and its physical blocks; an access recording unit, which maintains a resource access record table recording whether data has been written to each physical block; and a rebuild processing unit, which, when the RAID array is rebuilt, obtains the physical blocks in the written state from the resource access record table and rebuilds them block by block. The invention can greatly improve rebuild efficiency and keep the impact of the rebuild on services low.

Description

A method and device for rebuilding a RAID array

Technical field

The invention relates to network storage technology, and in particular to a RAID array rebuild technique.

Background art

In network environments where data from many hosts must be stored, network storage technology emerged to improve the reliability and security of data storage while providing scalable and flexible capacity. Generally speaking, the role of a network storage system is to provide usable storage space to client PCs or servers (collectively referred to as hosts).

The front end of a typical network storage system connects to hosts over an IP network or an FC network and provides data storage services to them. For data transfer, taking an IP-based network storage system as an example, a host can read and write data on the network storage system using the standard iSCSI (Internet Small Computer System Interface) protocol. The core of a network storage system is the storage controller, which processes data and writes it to the back-end physical disks.

To improve physical disk write performance and provide data redundancy, storage controllers usually support Redundant Array of Independent Disks technology (RAID, also called a RAID array, or simply an array). RAID combines multiple independent physical disks in different ways into a disk group that offers higher storage performance and reliability than a single disk.

Depending on how data is organized, commonly used RAID levels include RAID0, RAID1, RAID5, RAID6 and RAID10. Different RAID levels provide different levels of performance and reliability. In most cases, when one or more disks fail, the data of the failed disk can be recovered from the data on the remaining member disks using the algorithm corresponding to the RAID level, so that no data is lost. With this algorithm the data of the failed disk is reconstructed and written to a hot spare disk; once reconstruction is complete, the hot spare disk becomes a member disk of the array and the array's redundancy and reliability are restored. This is what is commonly called a RAID array rebuild.

In a traditional network storage system, when an application needs storage space, a sufficiently large portion of the back-end storage system is usually carved out and allocated to that application in advance. The allocation must take into account future service expansion and growth in the volume of service data. As a result of weighing these factors, the logical resource (LUN) is much larger than the storage space actually needed at present, so only a small part of the LUN holds user data and a large part sits idle. On the one hand, this lowers the user's return on investment; on the other hand, as the storage space grows, so does the probability of a rebuild. If another data disk fails during a rebuild, data will be lost. In addition, rebuild IO consumes system resources during the rebuild and degrades the performance of read and write services. Rebuild efficiency and rebuild performance therefore become key factors affecting the reliability of the storage system.

Thin provisioning is a common feature of network storage systems. Its purpose is to solve the storage over-provisioning problem described above by allocating storage space according to actual demand. Its core principle is to "deceive" the client operating system into believing that a very large LUN has been allocated. For example, the client operating system sees a 2 TB LUN, while the storage device has actually allocated only tens or hundreds of GB of physical space to this resource; the rest of the space is virtual. As applications write more data, physical storage utilization rises, and when the physical space actually allocated becomes insufficient, additional physical space is allocated, so that capacity expands on demand.

When a host (usually a server of some kind) discovers a LUN, what it sees is not the real space but the space virtualized by thin provisioning. The physical space actually allocated depends on the resource allocation policy and may be only a quarter of the total space, or even less.

When a LUN with thin provisioning enabled is created, the following must be specified: the total LUN capacity, the size of the LUN's pre-allocated physical space, the RAID it occupies, and the physical space expansion policy for the LUN. The total LUN capacity is the LUN size seen by the client. The pre-allocated physical space size is the physical space actually occupied when the LUN is created. The physical space expansion policy defines the trigger condition and the expansion strategy for growing the LUN's physical space; for example, expansion is triggered when utilization of the pre-allocated physical space reaches 80%, and each expansion step is 5% of the total LUN capacity. The system allocates resources on the specified RAID according to the pre-allocated physical space size and creates a segment table for the LUN that identifies the correspondence between the LUN and the RAID; it also modifies the RAID's segment table to mark these segments as in use.
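The expansion policy in the example above reduces to simple arithmetic. The following minimal sketch assumes the 80% trigger and the 5% step quoted in the text; the function names, the integer-percentage parameters and the sample sizes are illustrative assumptions and do not come from any described implementation.

```python
GIB = 1024 ** 3

def needs_expansion(used, allocated, trigger_pct=80):
    """True once usage of the pre-allocated space reaches the trigger."""
    return used * 100 >= trigger_pct * allocated

def expand(allocated, total, step_pct=5):
    """Grow the allocated space by step_pct of the total LUN capacity,
    never beyond the total capacity the client sees."""
    return min(total, allocated + total * step_pct // 100)

total = 2048 * GIB       # total LUN capacity seen by the client (assumed value)
allocated = 200 * GIB    # pre-allocated physical space (assumed value)
used = 160 * GIB         # exactly 80% of the pre-allocated space

if needs_expansion(used, allocated):
    allocated = expand(allocated, total)
```

Integer percentages are used here only to keep the threshold comparison exact; a real policy engine could express the same rule differently.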

Because the physical space actually allocated to a thin-provisioned LUN does not equal the total space seen by the client, a dedicated LUN linear table must also be maintained to record the correspondence between the LUN's linear space and the RAID's actual physical space. When the LUN receives an IO write request, the free space the application is to access is first allocated from the pre-allocated physical space, the LUN linear table is updated, and the data is written. When the LUN receives an IO read request, the corresponding physical space is accessed directly if the LUN linear table contains it; otherwise all zeros are returned directly.
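The read and write behaviour of the LUN linear table can be illustrated with a toy model. This is a sketch under stated assumptions: the class `ThinLun`, its fields and the 8-byte block size are hypothetical, and the real segment tables and RAID layout are not modelled.

```python
class ThinLun:
    """Toy thin-provisioned LUN: a logical block is mapped on its first write."""

    def __init__(self, total_blocks, preallocated_blocks, block_size=8):
        self.total_blocks = total_blocks
        self.block_size = block_size
        self.free = list(range(preallocated_blocks))  # unused pre-allocated physical blocks
        self.linear_table = {}                        # logical block -> physical block
        self.physical = {}                            # physical block -> stored bytes

    def write(self, logical_block, data):
        if not 0 <= logical_block < self.total_blocks:
            raise IndexError("logical block outside the LUN")
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        if logical_block not in self.linear_table:
            if not self.free:
                raise RuntimeError("pre-allocated space exhausted; expansion required")
            self.linear_table[logical_block] = self.free.pop(0)
        # Pad to a full block so reads always return block_size bytes.
        self.physical[self.linear_table[logical_block]] = data.ljust(self.block_size, b"\0")

    def read(self, logical_block):
        physical_block = self.linear_table.get(logical_block)
        if physical_block is None:
            return bytes(self.block_size)  # unmapped region: return all zeros
        return self.physical[physical_block]

lun = ThinLun(total_blocks=1000, preallocated_blocks=4)
lun.write(500, b"abc")
```

In this model the client sees 1000 blocks while only four physical blocks back the LUN, and every IO pays for a table lookup, which is the overhead the following paragraph criticizes.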

The most notable characteristic of thin provisioning is that storage space is allocated according to the actual needs of the current service. The total storage space shrinks, and so does the space that must be rebuilt; in other words, the risk associated with a rebuild is reduced by minimizing storage space. However, thin provisioning is complex to implement, and an obvious problem it introduces is reduced performance: the LUN maintains a segment table and a linear distribution table, and every IO must look up both to find the corresponding physical space, which lengthens the data path and degrades performance. Thin provisioning is therefore unsuitable for users who have high performance and reliability requirements but are relaxed about cost control.

Another prior-art approach to optimizing the rebuild is to rebuild only the space in the RAID that has already been allocated: the allocated regions are rebuilt according to the allocation information recorded by the RAID, which reduces the rebuild workload, avoids useless work during the rebuild, and shortens the time the rebuild takes. However, as noted above, logical resources in a storage system are usually much larger than the storage space actually needed at present, so only a small part of a logical resource holds user data and a large part is idle. Clearly, rebuilding only the allocated regions of the RAID is not the optimal solution: it cannot minimize the risk of data loss during the rebuild for users who have high performance and reliability requirements but are relaxed about cost control.

Summary of the invention

In view of this, the present invention provides a RAID array rebuild device for performing RAID array rebuild operations in a network storage system, wherein the RAID array is divided in advance into physical blocks of equal size, the device comprising:

a resource allocation unit, configured to allocate one or more physical blocks to a logical resource when the logical resource is created, and to record the correspondence between the logical resource and the physical blocks;

an access recording unit, configured to maintain a resource access record table that records whether data has been written to each physical block; wherein the access recording unit marks the state of a physical block in the resource access record table as written when data is written to that physical block, and, when the logical resource is deleted, marks the state of the physical blocks corresponding to that logical resource in the resource access record table as unwritten; and

a rebuild processing unit, configured to obtain, when the RAID array is rebuilt, the physical blocks whose state is written according to the resource access record table, and to rebuild those physical blocks in units of physical blocks.

The present invention also provides a RAID array rebuild method for performing RAID array rebuild operations in a network storage system, wherein the RAID array is divided in advance into physical blocks of equal size, the method comprising:

A. when a logical resource is created, allocating one or more physical blocks to the logical resource and recording the correspondence between the logical resource and the physical blocks;

B. maintaining a resource access record table that records whether data has been written to each physical block; marking the state of a physical block in the resource access record table as written when data is written to that physical block, and, when the logical resource is deleted, marking the state of the physical blocks corresponding to that logical resource in the resource access record table as unwritten; and

C. when the RAID array is rebuilt, obtaining the physical blocks whose state is written according to the resource access record table, and rebuilding those physical blocks in units of physical blocks.

Because the present invention rebuilds only the physical space that is actually in use, it greatly improves rebuild efficiency and speed compared with the prior art, effectively avoids risks such as data loss caused by the rebuild, and has very little impact on normal data read and write services.

Description of the drawings

FIG. 1 is a logical schematic diagram of the network storage device of the present invention.

FIG. 2 is a flowchart of the data write process in one embodiment of the present invention.

FIG. 3 is a flowchart of the logical resource deletion process in one embodiment of the present invention.

FIG. 4 is a flowchart of the array rebuild process in one embodiment of the present invention.

Detailed description of the embodiments

In general, building on existing data flow processing and RAID rebuild management, the present invention introduces a resource access record table that tracks data writes, and rebuilds only the regions to which data has been written. This minimizes the rebuild workload, improves rebuild efficiency, shortens rebuild time, and reduces the impact of the rebuild on the performance of read and write services.

The present invention requires maintaining a resource access record table. Records may be kept per RAID stripe or per fixed-length RAID resource block (see the related patents previously filed by the applicant), depending on the specific implementation; below, these are uniformly called physical blocks (Blocks), where one Block represents a storage space of a specific length. The resource access record table may have any format and may reside anywhere in the system where it can be implemented, depending mainly on system performance and space requirements. For example, implementing it at a lower layer improves performance but may be slightly more complex; conversely, performance is ordinary but implementation is easy. In one embodiment, the resource access record table is maintained at the level of the storage system's RAID module. In addition, to improve lookup efficiency and reduce the system resources the table consumes, a bitmap may be used, with one bit per Block: for example, a bit value of 1 indicates that data has been written to the corresponding Block, and a bit value of 0 indicates that no data has been written to it.
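A bitmap with one bit per Block can be sketched as follows. The class name `BlockBitmap` and its methods are hypothetical; the text only specifies that 1 means written and 0 means unwritten.

```python
class BlockBitmap:
    """One bit per physical block: 1 = data written, 0 = no data written."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # all blocks start unwritten

    def set(self, block):
        self.bits[block >> 3] |= 1 << (block & 7)

    def clear(self, block):
        self.bits[block >> 3] &= ~(1 << (block & 7)) & 0xFF

    def test(self, block):
        return bool(self.bits[block >> 3] & (1 << (block & 7)))

    def written_blocks(self):
        return [b for b in range(self.num_blocks) if self.test(b)]

bitmap = BlockBitmap(20)
bitmap.set(3)
bitmap.set(17)
```

Twenty blocks fit in three bytes here, which illustrates why a bitmap keeps the table small even for arrays with many blocks.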

Please refer to FIG. 1, FIG. 2 and FIG. 3. The RAID array rebuild device 20 of the present invention is applied in a network storage system 10, which further includes a read-write service processing device 30. The RAID array rebuild device 20 includes a resource allocation unit 22, an access recording unit 24 and a rebuild processing unit 26. These devices are abstractions at the logical level; they are typically implemented by a processor executing program code, but can equally be implemented in hardware, firmware, or a combination of software and hardware. The general processing flow of the RAID array rebuild device 20 is described below.

Step 101: when a logical resource is created, allocate one or more physical blocks to the logical resource and record the correspondence between the logical resource and the physical blocks. This step is performed by the resource allocation unit 22.

In the present invention, the RAID array is divided in advance into physical blocks of equal size. A typical physical block may be a stripe, or a fixed-length resource block as proposed in the applicant's earlier patent applications. The invention is not concerned with how stripes or resource blocks are divided or with their actual size; the essential point is that the physical resources of the RAID array are divided into blocks, which are collectively called physical blocks here.

When the network storage system creates a logical resource, it selects one or more physical blocks from the RAID array and allocates them to the logical resource, so that the logical resource has one or more corresponding physical blocks. The resource allocation unit 22 records this correspondence for use in later processing.

Note that the network storage system 10 may support thin provisioning. Once a user enables thin provisioning, the size of a logical resource may well differ from the combined size of its physical blocks. The correspondence referred to here therefore only identifies which logical resource each physical block is allocated to; it is not concerned with whether the sizes of the logical resource and the physical resources match.
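Step 101 can be pictured as an allocator that records, for each physical block, which logical resource owns it. The sketch below is an assumption-laden illustration: `ResourceAllocator`, its first-fit selection and the resource names are hypothetical, and the text does not prescribe how blocks are chosen.

```python
class ResourceAllocator:
    """Hands out physical blocks and remembers which logical resource owns each."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.owner = {}  # physical block -> logical resource name

    def create(self, name, count):
        if count > len(self.free_blocks):
            raise RuntimeError("not enough free physical blocks")
        blocks = [self.free_blocks.pop(0) for _ in range(count)]
        for block in blocks:
            self.owner[block] = name
        return blocks

    def blocks_of(self, name):
        return sorted(b for b, n in self.owner.items() if n == name)

allocator = ResourceAllocator(8)
lun0_blocks = allocator.create("lun0", 3)
lun1_blocks = allocator.create("lun1", 2)
```

The mapping stores ownership only, so it stays valid whether or not the logical resource's advertised size matches the space behind it.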

Step 102: maintain a resource access record table that records whether data has been written to each physical block; when data is written to a physical block, mark the state of that physical block in the resource access record table as written, and when the logical resource is deleted, mark the state of the physical blocks corresponding to that logical resource in the resource access record table as unwritten. This step is performed by the access recording unit 24.

As noted above, a typical and effective implementation of the resource access record table is a bitmap, a technique that is very widely used in network storage and is not described in detail here. The update of each physical block's entry in the resource access record table can be arranged either serially with the processing of the read-write service device or in parallel with it.

For serial processing, when the read-write service processing device (usually the RAID service processing module) receives a write command from a logical resource or from an application within the network storage system, it handles the command according to the normal write flow. If the data is written successfully, processing passes to step 102; otherwise it returns, for example by notifying the writer that the write operation failed.

For parallel processing, step 102 runs in parallel with the read-write service device's handling of the write command, so step 102 may mark the physical block as written even if the write command ultimately fails. As a result, the state recorded for the physical block in the resource access record table may not match its actual state. This approach is nevertheless more efficient and speeds up the service flow. Even when such a mismatch occurs, the only consequence is a slight increase in later rebuild work: some physical blocks that did not need rebuilding are rebuilt.
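The difference between the two arrangements can be shown with a write that fails. In this minimal sketch the record table is a plain Python set and `failing_write` is a hypothetical stand-in for the read-write service device; the parallel variant always marks the block, whereas the text only says that it may.

```python
from concurrent.futures import ThreadPoolExecutor

def write_serial(write_fn, written, block, data):
    """Mark the block only after the write has succeeded."""
    ok = write_fn(block, data)
    if ok:
        written.add(block)
    return ok

def write_parallel(write_fn, written, block, data):
    """Mark the block and perform the write concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        mark = pool.submit(written.add, block)
        write = pool.submit(write_fn, block, data)
        mark.result()
        return write.result()

def failing_write(block, data):
    return False  # simulated write failure

serial_state, parallel_state = set(), set()
write_serial(failing_write, serial_state, 7, b"x")
write_parallel(failing_write, parallel_state, 7, b"x")
```

After the failed write, the serial table is empty while the parallel table lists block 7, so a later rebuild would copy one block more than strictly necessary.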

A user may delete a logical resource that has been created. Before the logical resource is deleted, the access recording unit 24 first obtains the physical blocks occupied by the logical resource and changes their state to unwritten in the resource access record table; the logical resource is then deleted and its physical space released, completing the deletion. The resource access record table must be updated when a logical resource is deleted because the subsequent rebuild proceeds according to the state recorded for each physical block in the resource access record table; timely updates ensure that the rebuild is limited to the physical blocks actually occupied at the service or application level.
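The deletion order described above (look up the blocks, mark them unwritten, then release them) might look like this. The function and the dictionary-based ownership map are hypothetical illustrations, not the patented implementation.

```python
def delete_logical_resource(name, owner, written, free_blocks):
    """Clear the written state of a resource's blocks, then release them."""
    blocks = sorted(b for b, n in owner.items() if n == name)
    for block in blocks:
        written.discard(block)      # first mark every block as unwritten
    for block in blocks:
        del owner[block]            # then drop the mapping
        free_blocks.append(block)   # and return the space to the pool
    return blocks

owner = {0: "lun0", 1: "lun0", 2: "lun1"}  # physical block -> logical resource
written = {0, 2}                           # blocks currently marked written
free_blocks = [3]
released = delete_logical_resource("lun0", owner, written, free_blocks)
```

Only block 2 remains marked as written afterwards, so a rebuild started at this point would skip the space that belonged to the deleted resource.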

Furthermore, all physical blocks are initially in the unwritten state. Since a physical block may be written repeatedly, the state of the block can be checked first when data is written to it: if it is already written, return; otherwise continue. This amounts to changing the state on the first write and skipping the change on subsequent writes. An optimization is to set the physical block to written directly on every write, which eliminates the read and the comparison and thus improves efficiency.
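Both marking strategies end in the same table state; they differ only in the work done per write. A minimal sketch, with hypothetical function names and a list of booleans standing in for the record table:

```python
def mark_with_check(states, block):
    """Read the state first; change it only on the first write."""
    if states[block]:
        return False  # already written, nothing to do
    states[block] = True
    return True

def mark_unconditionally(states, block):
    """Set the state on every write, with no read or comparison."""
    states[block] = True

checked = [False] * 4
unconditional = [False] * 4
for block in (1, 3, 1, 1):  # block 1 is written three times
    mark_with_check(checked, block)
    mark_unconditionally(unconditional, block)
```

Because setting an already set entry is idempotent, the unconditional variant is safe and trades a read plus a branch for a blind write.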

Furthermore, the resource access record table may become invalid, for example when a bit in the bitmap cannot be set or an update fails. For greater rigor, the validity of the resource access record table can be checked before the state of a physical block is marked: if the table is valid, it is updated; otherwise it is not updated and processing is handed over to a different rebuild processing unit, for example an existing software- or hardware-implemented rebuild processing unit that carries out the conventional rebuild flow. This makes full use of the existing rebuild processing unit as a backup and improves the reliability of the rebuild. Correspondingly, when the state of a physical block is updated, the result of the update can be checked: if the update succeeded, processing continues; otherwise the resource access record table is marked invalid.
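The validity handling can be sketched as a flag on the table that is checked before each update and cleared when an update fails. Everything named here is hypothetical: `_store` stands in for the low-level update, an out-of-range block simulates a failure, and "rebuild every block" is an assumed reading of the conventional rebuild flow.

```python
class AccessRecordTable:
    def __init__(self, num_blocks):
        self.valid = True
        self.states = [False] * num_blocks

    def _store(self, block):
        """Stand-in for the low-level update; False means it could not be applied."""
        if not 0 <= block < len(self.states):
            return False
        self.states[block] = True
        return True

    def mark_written(self, block):
        if not self.valid:
            return False          # invalid table: do not update it
        if not self._store(block):
            self.valid = False    # a failed update invalidates the table
            return False
        return True

def blocks_to_rebuild(table):
    if table.valid:
        return [b for b, written in enumerate(table.states) if written]
    return list(range(len(table.states)))  # assumed fallback: conventional full rebuild

table = AccessRecordTable(4)
table.mark_written(1)
selective = blocks_to_rebuild(table)
table.mark_written(10)            # simulated failed update
fallback = blocks_to_rebuild(table)
```

Once the table can no longer be trusted, falling back to the conventional flow costs time but never skips a block that holds data.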

Step 103: when the RAID array is rebuilt, obtain the physical blocks whose state is written according to the resource access record table, and rebuild those physical blocks in units of physical blocks. This step is performed by the rebuild processing unit 26.

Please refer to FIG. 4. When the RAID array is in a degraded state, the system may prompt the administrator to start a rebuild, or it may trigger the rebuild immediately; once the system finds an available hot spare disk, it can begin the rebuild. A rebuild normally uses the parity algorithm corresponding to the RAID level to compute the data of the failed disk and writes it to the hot spare disk. All the physical blocks obtained by the present invention are blocks that contain data, so every block rebuilt is worth rebuilding and a large amount of unnecessary rebuild work is avoided. Compared with the prior-art approach of rebuilding the allocated physical resources, rebuilding only the physical blocks to which data has been written is much more efficient, because much of the allocated physical space has often never actually been written.
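As an illustration of rebuilding only written blocks, the sketch below assumes single XOR parity (as in RAID 5) and models each disk as a dictionary from block number to bytes. The parity scheme, the function names and the sample data are assumptions; the text leaves the algorithm to the RAID level in use.

```python
from functools import reduce

def xor_bytes(chunks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def rebuild(surviving_disks, written, hot_spare):
    """Recompute the failed member's data for written blocks only."""
    for block in sorted(written):
        hot_spare[block] = xor_bytes([disk[block] for disk in surviving_disks])
    return hot_spare

# Two data members plus parity; block 1 was never written.
disk0 = {0: b"\x01\x02", 1: b"\x00\x00"}
disk1 = {0: b"\x10\x20", 1: b"\x00\x00"}
parity = {b: xor_bytes([disk0[b], disk1[b]]) for b in disk0}

# disk1 fails; only block 0 is marked written in the record table.
spare = rebuild([disk0, parity], written={0}, hot_spare={})
```

Block 1 is never read or recomputed, which is where the saving over an allocation-based rebuild comes from.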

The rebuild processing unit 26 can apply further optimizations. If new data is written to a physical block during the rebuild, the rebuild processing unit 26 can write the new data to both the data disks and the hot spare disk at the same time. The advantage is that service writes and the rebuild process do not interfere with each other during the rebuild: the rebuild of that data is in effect completed at the moment the service write occurs, so the rebuild processing unit does not need to rebuild the newly written data separately.
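The dual write can be sketched for the simplest case, a RAID 1 style mirror in which the surviving member and the hot spare receive identical data. The mirror model, the `pending` set of blocks still awaiting rebuild and all names are assumptions made for illustration.

```python
def write_during_rebuild(block, data, surviving_mirror, hot_spare, written, pending):
    """Send a service write to the surviving member and the hot spare together."""
    surviving_mirror[block] = data
    hot_spare[block] = data
    written.add(block)       # keep the record table up to date
    pending.discard(block)   # this block no longer needs a separate rebuild

surviving = {0: b"old0", 1: b"old1"}
spare = {}
written = {0, 1}
pending = {0, 1}             # written blocks not yet copied to the spare

write_during_rebuild(1, b"new1", surviving, spare, written, pending)
write_during_rebuild(2, b"new2", surviving, spare, written, pending)
```

After these two writes only block 0 remains for the background rebuild, and the newly written block 2 is already present on the spare.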

There are two ways to implement this. In the first, when the rebuild starts, the rebuild processing unit 26 generates a rebuild list (the list of physical blocks whose current state is written) from the current resource access record table and rebuilds the physical blocks in the order of the list.

Alternatively, the rebuild can be driven directly by the resource access record table: the rebuild processing unit 26 takes N physical blocks (N being a natural number) whose state is written each time and rebuilds them, until all such physical blocks have been rebuilt.
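Taking N written blocks at a time maps naturally onto a generator that scans the record table. In this sketch the table is a list of 0/1 flags and the name `written_batches` is hypothetical; collecting all batches up front would correspond to the rebuild-list approach instead.

```python
def written_batches(states, n):
    """Yield lists of up to n block numbers whose state is written."""
    batch = []
    for block, is_written in enumerate(states):
        if is_written:
            batch.append(block)
            if len(batch) == n:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly shorter batch

states = [1, 0, 1, 1, 0, 1, 1]
batches = list(written_batches(states, 2))
```

Because the generator reads the table lazily, a block that becomes written ahead of the scan position during the rebuild is still picked up, which a list snapshot taken at the start would miss.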

Furthermore, when new data is written during the rebuild, the access recording unit 24 updates the state of that physical block to written in the manner of step 102, which keeps the resource access record table complete and up to date. If the hot spare disk fails during the rebuild and is replaced with another hot spare disk, the repeated rebuild is not affected in any way, because the resource access record table is complete (that is, it has been updated in time).

The above is merely a preferred embodiment of the present invention; any equivalent modification made in the spirit of the present invention shall fall within the scope of the claims of the present invention.

Claims

1. A RAID array rebuild device for performing RAID array rebuild operations in a network storage system, wherein the RAID array is divided in advance into physical blocks of equal size, the device comprising:

a resource allocation unit, configured to allocate one or more physical blocks to a logical resource when the logical resource is created, and to record the correspondence between the logical resource and the physical blocks;

an access recording unit, configured to maintain a resource access record table that records whether data has been written to each physical block; wherein the access recording unit marks the state of a physical block in the resource access record table as written when data is written to that physical block, and, when the logical resource is deleted, marks the state of the physical blocks corresponding to that logical resource in the resource access record table as unwritten; and

a rebuild processing unit, configured to obtain, when the RAID array is rebuilt, the physical blocks whose state is written according to the resource access record table, and to rebuild those physical blocks in units of physical blocks.

2. reconstructing device according to claim 1; It is characterized in that; When said reconstruction process unit was further used in finding process of reconstruction, having the new data write state to be the physical block that does not write, the heat that the data that need the Write fault disk is write this failed disk and correspondence was equipped with disk.

3. reconstructing device according to claim 1; It is characterized in that said network store system also comprises the read-write business processing device, it is used for the deal with data read write command; If handle failure then return, if write order and handle and successfully then change the Visitor Logs cell processing over to.

4. reconstructing device according to claim 3; Wherein said Visitor Logs unit was further used for before mark physics bulk state; Check whether said resource access record sheet is effective; If then upgrade said resource access record sheet, otherwise do not upgrade and change other different reconstruction process unit over to; And whether inspection upgrades success when upgrading said physical block state, if upgrades successfully then continuation, otherwise said resource access record sheet is labeled as invalid.

5. reconstructing device according to claim 1, wherein said Visitor Logs unit are further used for obtaining this physical block current state before the mark physics bulk state, if current state is then returned otherwise continuation for writing.

6. A RAID array rebuilding method for performing a RAID array rebuild operation in a network storage system, wherein the RAID array is divided in advance into physical blocks of equal size, the method comprising:

A. when a logical resource is created, allocating one or more physical blocks to the logical resource, and recording the correspondence between the logical resource and the physical blocks;

B. maintaining a resource access record table that records whether each physical block has been written with data; marking the state of a physical block in the resource access record table as written when data is written to that physical block; and, when the logical resource is deleted, marking the state of the physical blocks corresponding to that logical resource in the resource access record table as unwritten; and

C. when the RAID array is rebuilt, obtaining the physical blocks whose state is written according to the resource access record table, and rebuilding those physical blocks in units of physical blocks.

7. The method according to claim 6, characterized in that step C further comprises: upon finding during the rebuild process that new data is being written to a physical block whose state is unwritten, writing the data that needs to be written to the failed disk to both the failed disk and the corresponding hot spare disk.

8. The method according to claim 6, characterized by further comprising:

D. processing data read/write commands; if processing fails, returning; if the command is a write command and processing succeeds, proceeding to step B.

9. The method according to claim 6, wherein step B further comprises: before marking the state of a physical block, checking whether the resource access record table is valid; if it is valid, updating the resource access record table, and otherwise not updating it and passing to a different rebuild processing flow; and, when updating the state of the physical block, checking whether the update succeeded; if the update succeeded, continuing, and otherwise marking the resource access record table as invalid.

10. The method according to claim 6, wherein step B further comprises: before updating the state of a physical block, obtaining the current state of that physical block; if the current state is written, returning, and otherwise continuing.