US20090276566A1 - Creating logical disk drives for raid subsystems - Google Patents
- Publication number
- US20090276566A1 (application US12/112,686)
- Authority
- US
- United States
- Prior art keywords
- disk
- disks
- logical
- raid
- physical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
- G06F11/1092—Rebuilding, e.g. when physically replacing a failing disk
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
- G06F3/0605—Improving or facilitating administration, e.g. storage management by facilitating the interaction with a user or administrator
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0632—Configuration or reconfiguration of storage systems by initialisation or re-initialisation of storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
Definitions
- the present invention generally relates to computer storage systems, and more particularly, to a method for creating logical disk drives for RAID subsystems.
- RAID redundant array of inexpensive disks
- an exemplary RAID system 100 is shown in FIG. 1.
- a RAID array 102 consists of a plurality of disks 104 , 106 , 108 , 110 .
- RAID4 is a form of RAID where the data is striped across multiple data disks to improve performance, and an additional parity disk is used for error detection and recovery from a single disk failure.
- an example of RAID4 striping is shown in FIG. 2.
- a RAID 200 includes a plurality of data disks 202 , 204 , 206 , 208 , and a parity disk 210 .
- the lettered portion of each disk 202 - 208 (e.g., A, B, C, D) is a “stripe.”
- the RAID 200 appears as a single logical disk with the stripes laid out consecutively (A, B, C, etc.).
- a stripe can be any size, but generally is some small multiple of the disk's block size.
- a RAID4 system has a stripe width, which is another way of referring to the number of data disks, and a “slice size”, which is the product of the stripe size and the stripe width.
- a slice 220 consists of a data stripe at the same offset on each disk in the RAID and the associated parity stripe. Performance is improved because each disk only has to record a fraction (in this case, one fourth) of the data.
- Removing and replacing a single disk from a RAID group triggers a reconstruction of the data that was on the removed disk. This data is reconstructed onto the replacement disk using the remaining data disks in the RAID group and the parity disk block by block. If a large disk is removed or a RAID group with a large number of disks has one disk removed, the reconstruction operation can be costly in terms of the time needed to reconstruct the removed disk and the processing resources used during the reconstruction operation. This problem is compounded in a storage system where multiple disks are located on a single carrier, such that all of the disks on the carrier are removed at the same time even if only one of the disks needs to be replaced.
- a tray (also referred to as a sled or a carrier) that has multiple disks associated with it. It is noted that while the storage systems described herein include disks, one skilled in the art can construct similar storage systems with other types of storage devices, such as solid state devices.
- One solution to the above-identified problem is to make the tray, even though it has individual physical disks, appear as one large logical disk. This essentially presents the tray as a LUN (logical unit number) or some other construct that is independent of the other trays.
- LUN logical unit number
- a drive environment has 48 disks located on 16 trays of three disks each. Instead of seeing 48 disks, the RAID would see 16 logical disks which are just larger; i.e., each logical disk has three times the capacity of any one physical disk. Each tray of three disks is sequentially addressed and the system software maps those three disks onto a single logical disk. The single logical disk reports to the RAID subsystem, creating the impression that there is one large capacity disk. When the RAID subsystem starts to write data, it writes it to a logical block address (LBA) range which is three times the space of one of the physical disks. The storage subsystem interprets the LBA range to be accessed as being on the first disk, on the second disk, spanning the first disk and the second disk, on the third disk, etc.
- LBA logical block address
- each disk in the tray is allocated to a different RAID group. This is beneficial because if, for example, each disk in the tray is a one terabyte disk, the system would start to reconstruct multiple terabyte volumes to reconstruct a RAID group because one failed disk was removed, resulting in a large waste of time and system resources.
- the system can attempt to copy the data directly off of the “good” disks to another tray. The system would then be up and running in less time and would be able to handle dense storage trays being plugged in and out of the RAID array.
- FIG. 1 is a diagram of a general RAID configuration
- FIG. 2 is a diagram of a RAID4 system with striping and a parity disk
- FIG. 3 is a diagram of a virtualized RAID configuration
- FIG. 4 is a block diagram of a system showing one disk tray and its connections to a RAID controller.
- trays of one, two, or three disks are used that could be removed at one point in time for replacement.
- This arrangement permits the disks to be placed into a standard 4U type of shelf.
- This type of physical layout makes the disks individually accessible. If a disk needs to be replaced, the entire tray of disks including the disk to be replaced needs to be removed to be able to disconnect the one disk. For example, in a 48 disk unit, three or four disks would have to be removed at once if one disk had to be replaced.
- the present invention can be implemented in any storage environment where there is a physical carrier with two or more disks. It is noted that while the storage systems described herein include disks, one skilled in the art can construct similar storage systems with other types of storage devices, such as solid state devices.
- the disks on a single tray appear as one large logical disk, even though there are multiple physical disks on the tray.
- This arrangement presents the tray as a single LUN (logical unit number) that is independent of the other trays in the storage system.
- the single logical disk reports to the RAID subsystem, creating the impression that there is one large capacity disk.
- each disk in the tray is allocated to a different RAID group. By allocating the disks in a tray to different RAID groups, if the tray is removed, only a portion of several different RAID groups are removed.
- FIG. 3 is a diagram of a virtualized RAID configuration 300 .
- a RAID controller 302 sees a RAID array 304 made up of logical disk 1 306 , logical disk 2 308 , logical disk 3 310 , and a logical parity disk 312 .
- Each of the logical disks 306 - 312 is a single disk tray consisting of four physical disks, such that logical disk 1 includes physical disks P 1 -P 4 320 - 326 , logical disk 2 includes physical disks P 5 -P 8 330 - 336 , logical disk 3 includes physical disks P 9 -P 12 340 - 346 , and the logical parity disk includes physical disks P 13 -P 16 350 - 356 .
- Which physical disks 320 - 356 belong to the logical disks 306 - 312 can be indicated by setting an identifier for each disk by a disk driver located on the filer.
- the RAID controller 302 only sees the logical disks 306 - 312 , and does not know that the physical disks 320 - 356 are present.
- the RAID controller 302 operates in the same manner as it would if there were only four physical disks connected to the controller.
- the number of physical disks per logical disk has no effect on the operation of the RAID controller 302 . It is noted that while four physical disks are shown per logical disk, one skilled in the art can change the number of physical disks per logical disk without altering the operation of the RAID controller 302 .
- RAID group W includes disks 320 , 330 , 340 , and 350 .
- Each logical disk spans several different RAID groups.
- logical disk 306 includes a W RAID group disk 320 , an X RAID group disk 322 , a Y RAID group disk 324 , and a Z RAID group disk 326 .
- the RAID groups are all in a normal RAID group situation, wherein all of the W disks are in one RAID group, all of the X disks are in a second RAID group, all of the Y disks are in a third RAID group, and all of the Z disks are in a fourth RAID group. If the disk tray that contains logical disk 306 is removed with one W disk, one X disk, one Y disk, and one Z disk on it, none of the RAID groups will become doubly degraded, e.g., the entire W RAID group is not removed.
- the problem is that four different RAID groups have to be reconstructed.
- the W RAID group component, the X RAID group component, the Y RAID group component, and the Z RAID group component of the removed tray all need to be rebuilt.
- physical disk P 4 326 (of the Z RAID group) is the physical disk on the tray that failed. Then information on the W RAID group disk 320 , the X RAID group disk 322 , and the Y RAID group disk 324 could be copied to other disks, instead of being reconstructed.
- To the RAID controller 302, it would look like it is communicating with a disk that has an LBA range of four times the size of any physical disk that was actually present. So the RAID would stripe the data across the logical disks that it is aware of. By viewing the physical disks on a tray as a single logical disk, the net effect is adding the LBA ranges of each disk together and the storage subsystem recognizing where a physical location relating to the LBA is; i.e., mapping multiple physical disks into a single logical disk for RAID access.
- When implementing the present invention, there is no change in the way the RAID operates.
- the abstraction is placed below the RAID, so that it is possible for the RAID to handle the removal of a larger number of disks in an easier manner for supportability. Then the RAID does not have to be concerned about the complexities of the layout, because it has already taken care of the layout through the virtualization.
- FIG. 4 is a block diagram of a portion of a system 400 configured to implement the present invention.
- the system 400 includes a filer 402 and a disk shelf 404 .
- the filer 402 includes a RAID controller 410 , a disk driver 412 in communication with the RAID controller 410 , shelf enclosure services 414 in communication with the RAID controller 410 and the disk driver 412 , and an adapter driver 416 in communication with the disk driver 412 and the shelf enclosure services 414 .
- the disk shelf 404 includes a shelf controller 420 and a disk tray 422 .
- the disk tray 422 includes a disk tray adapter 424 and a plurality of physical disks 426 - 432 .
- the disk tray adapter 424 is in communication with each of the physical disks 426 - 432 and the shelf controller 420 .
- the filer 402 communicates with the disk shelf 404 via communication between the adapter driver 416 and the shelf controller 420 .
- the disk driver 412 is the entity in the system 400 that provides the virtualization to the RAID controller 410 .
- the shelf enclosure services 414 receives data from the shelf controller 420 about the configuration of the physical disks 426 - 432 on the disk shelf 404 .
- the configuration information includes environmental information about the physical layout of the shelf and each disk on the shelf.
- each disk on the disk shelf 404 is uniquely identified by its physical connection to the shelf 404 .
- a single disk on the shelf 404 may be identified by the bay where it is physically located and an ID number of the shelf 404 .
- the configuration information is provided to the disk driver 412 , which uses the information to indicate which physical disks 426 - 432 on the disk shelf 404 belong to which logical drive(s).
- the disk driver 412 uses a table to track the assignment of a physical drive to a logical drive. It is noted that one skilled in the art can use other means of identifying each disk on the disk shelf and other means of tracking the assignment of physical disks to logical disks.
- After the physical disks 426-432 are assigned to a logical disk, the disk driver 412 presents the logical disk to the RAID controller 410.
- the elements “above” the RAID controller 410 (i.e., the disk driver 412, the shelf enclosure services 414, and the adapter driver 416) treat the disks 426-432 as individual disk drives.
- the disk driver 412 provides the virtualization layer to the RAID controller 410 , such that the RAID controller 410 only sees the logical disk. In the event that multiple disk trays 422 are present, the disk driver 412 presents multiple logical disks to the RAID controller 410 .
- each disk tray communicates with the shelf controller 420 in a similar manner as shown in FIG. 4 .
- the shelf controllers for each shelf are in communication with each other, in a cascading style such that only one shelf controller communicates directly with the adapter driver 416 .
- while the disk tray 422 is shown with four physical disks 426-432, one skilled in the art can construct a similar disk tray with different numbers of physical disks.
- while the system 400 illustrates a disk shelf, a disk tray, and multiple physical disks, a similar system can be built with other types of storage devices, such as solid state devices. In such circumstances, there would be a storage shelf, a storage tray, and multiple physical storage devices.
- the system 400 operates in a similar manner, regardless of the type of storage device used.
- the RAID controller 410 sends an input/output (I/O) command to be performed on a logical disk of the RAID subsystem. It is noted that the following description relates to a single I/O command for simplicity; the system 400 operates in the same manner for any number of I/O commands issued by the RAID controller 410 .
- the disk driver 412 receives the command from the RAID controller 410 and determines which of the physical disks 426 - 432 should receive the command, based on the mapping of the logical disk to the physical disks 426 - 432 .
- the disk driver 412 forwards the command along with the determined physical disk to the adapter driver 416 .
- the adapter driver 416 forwards the command to the shelf controller 420 , which passes the command to the disk tray adapter 424 .
- the disk tray adapter 424 directs the command to the determined physical disk 426 - 432 . Any response from the physical disk is passed in the reverse direction (through the disk tray adapter 424 , the shelf controller 420 , the adapter driver 416 , and the disk driver 412 ) to the RAID controller 410 .
- the present invention can be implemented in a computer program tangibly embodied in a computer-readable storage medium containing a set of instructions for execution by a processor or a general purpose computer; and method steps of the invention can be performed by a processor executing a program of instructions to perform functions of the invention by operating on input data and generating output data.
- Suitable processors include, by way of example, both general and special purpose processors.
- a processor will receive instructions and data from a ROM, a random access memory (RAM), and/or a storage device.
- Storage devices suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).
- the illustrative embodiments may be implemented in computer software, the functions within the illustrative embodiments may alternatively be embodied in part or in whole using hardware components such as Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other hardware, or in some combination of hardware components and software components.
- ASICs Application Specific Integrated Circuits
- FPGAs Field Programmable Gate Arrays
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
Abstract
Description
- The present invention generally relates to computer storage systems, and more particularly, to a method for creating logical disk drives for RAID subsystems.
- Using RAID (redundant array of inexpensive disks) improves fault tolerance and performance of disks. An exemplary RAID system 100 is shown in FIG. 1. A RAID array 102 consists of a plurality of disks 104, 106, 108, 110.
- One example of a RAID system is a RAID4, which is a form of RAID where the data is striped across multiple data disks to improve performance, and an additional parity disk is used for error detection and recovery from a single disk failure. An example of RAID4 striping is shown in FIG. 2. A RAID 200 includes a plurality of data disks 202, 204, 206, 208, and a parity disk 210. The lettered portion of each disk 202-208 (e.g., A, B, C, D) is a “stripe.” To the user of the RAID 200, the RAID 200 appears as a single logical disk with the stripes laid out consecutively (A, B, C, etc.). A stripe can be any size, but generally is some small multiple of the disk's block size. In addition to the stripe size, a RAID4 system has a stripe width, which is another way of referring to the number of data disks, and a “slice size”, which is the product of the stripe size and the stripe width. A slice 220 consists of a data stripe at the same offset on each disk in the RAID and the associated parity stripe. Performance is improved because each disk only has to record a fraction (in this case, one fourth) of the data.
- Removing and replacing a single disk from a RAID group triggers a reconstruction of the data that was on the removed disk. This data is reconstructed onto the replacement disk using the remaining data disks in the RAID group and the parity disk block by block. If a large disk is removed or a RAID group with a large number of disks has one disk removed, the reconstruction operation can be costly in terms of the time needed to reconstruct the removed disk and the processing resources used during the reconstruction operation. This problem is compounded in a storage system where multiple disks are located on a single carrier, such that all of the disks on the carrier are removed at the same time even if only one of the disks needs to be replaced.
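- The slice-size arithmetic and the block-by-block reconstruction described above can be illustrated with a minimal sketch. It assumes byte-wise XOR parity, which is the usual RAID4 scheme but is not spelled out in this text; the helper names `xor_blocks` and `slice_size` and the four-byte stripes are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def slice_size(stripe_size, stripe_width):
    """Slice size is the product of the stripe size and the stripe width."""
    return stripe_size * stripe_width


# One slice: four data stripes (stripe width 4) plus one parity stripe.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)  # assumed XOR parity, stored on the parity disk

# Simulate losing the third data disk and regenerating its stripe
# from the surviving data stripes and the parity stripe.
lost = 2
survivors = [stripe for i, stripe in enumerate(data) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
```

- Every surviving stripe has to be read to regenerate one missing stripe, which is why reconstruction cost grows with both disk size and the number of disks in the group.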
- When building storage systems, it is possible to build a tray (also referred to as a sled or a carrier) that has multiple disks associated with it. It is noted that while the storage systems described herein include disks, one skilled in the art can construct similar storage systems with other types of storage devices, such as solid state devices. One solution to the above-identified problem is to make the tray, even though it has individual physical disks, appear as one large logical disk. This essentially presents the tray as a LUN (logical unit number) or some other construct that is independent of the other trays.
- For example, a drive environment has 48 disks located on 16 trays of three disks each. Instead of seeing 48 disks, the RAID would see 16 logical disks which are just larger; i.e., each logical disk has three times the capacity of any one physical disk. Each tray of three disks is sequentially addressed and the system software maps those three disks onto a single logical disk. The single logical disk reports to the RAID subsystem, creating the impression that there is one large capacity disk. When the RAID subsystem starts to write data, it writes it to a logical block address (LBA) range which is three times the space of one of the physical disks. The storage subsystem interprets the LBA range to be accessed as being on the first disk, on the second disk, spanning the first disk and the second disk, on the third disk, etc.
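- The LBA interpretation in the example above can be sketched as follows. This is a hedged illustration that assumes the three disks on a tray are the same size and are simply concatenated in order; the function names `locate` and `split_range` and the `blocks_per_disk` parameter are hypothetical and not taken from the described system.

```python
def locate(lba, blocks_per_disk, disks_per_tray=3):
    """Map a logical LBA to (physical disk index, LBA on that disk)."""
    if not 0 <= lba < blocks_per_disk * disks_per_tray:
        raise ValueError("LBA outside the logical disk")
    return divmod(lba, blocks_per_disk)


def split_range(start_lba, count, blocks_per_disk, disks_per_tray=3):
    """Split a logical LBA range into per-disk extents.

    Each extent is (disk index, starting LBA on that disk, block count).
    A range that crosses a disk boundary yields more than one extent.
    """
    extents = []
    lba, remaining = start_lba, count
    while remaining > 0:
        disk, offset = locate(lba, blocks_per_disk, disks_per_tray)
        length = min(remaining, blocks_per_disk - offset)
        extents.append((disk, offset, length))
        lba += length
        remaining -= length
    return extents


# A 20-block access starting 10 blocks before the end of the first disk
# spans the first and second disks of the tray.
spanning = split_range(990, 20, blocks_per_disk=1000)
```

- With 1000 blocks per disk, the logical disk exposes LBAs 0 through 2999, three times the range of one physical disk, and a request past that range is rejected.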
- In one implementation, each disk in the tray is allocated to a different RAID group. This is beneficial because otherwise, if for example each disk in the tray is a one terabyte disk, the system would have to reconstruct multiple terabytes of data to rebuild a RAID group because one failed disk was removed, wasting considerable time and system resources.
- By allocating the disks in a tray to different RAID groups, if the tray is removed, only a portion of several different RAID groups are removed. If the system has the capability to copy data from the “good” disks (i.e., the other disks on the tray that have not failed), called rapid RAID recovery, the system can attempt to copy the data directly off of the “good” disks to another tray. The system would then be up and running in less time and would be able to handle dense storage trays being plugged in and out of the RAID array.
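- The copy-versus-reconstruct choice described above can be sketched as a simple planning step. This is an illustrative sketch only; the function `recovery_plan`, the disk and group labels, and the action strings are hypothetical and do not describe an actual rapid RAID recovery interface.

```python
def recovery_plan(tray_disks, failed):
    """Decide, per disk on a tray, how its RAID group gets its data back.

    tray_disks maps a disk label to the RAID group it belongs to.
    failed is the set of disk labels that have actually failed.
    Healthy ("good") disks are copied to another tray; failed disks
    must be reconstructed from the rest of their RAID group.
    """
    unknown = set(failed) - set(tray_disks)
    if unknown:
        raise ValueError(f"failed disks not on this tray: {sorted(unknown)}")
    return {
        disk: (group, "reconstruct" if disk in failed else "copy")
        for disk, group in tray_disks.items()
    }


# Hypothetical three-disk tray, each disk in a different RAID group.
tray = {"disk-0": "rg-0", "disk-1": "rg-1", "disk-2": "rg-2"}
plan = recovery_plan(tray, failed={"disk-2"})
```

- In this sketch only one RAID group pays the cost of a parity-based rebuild; the other two are restored by copying.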
- A more detailed understanding of the invention may be had from the following description of preferred embodiments, given by way of example, and to be understood in conjunction with the accompanying drawings, wherein:
- FIG. 1 is a diagram of a general RAID configuration;
- FIG. 2 is a diagram of a RAID4 system with striping and a parity disk;
- FIG. 3 is a diagram of a virtualized RAID configuration; and
- FIG. 4 is a block diagram of a system showing one disk tray and its connections to a RAID controller.
- In order to create a dense storage system, trays of one, two, or three disks are used that could be removed at one point in time for replacement. This arrangement permits the disks to be placed into a standard 4U type of shelf. This type of physical layout makes the disks individually accessible. If a disk needs to be replaced, the entire tray of disks including the disk to be replaced needs to be removed to be able to disconnect the one disk. For example, in a 48 disk unit, three or four disks would have to be removed at once if one disk had to be replaced.
- When a tray is removed, access to all disks on the tray is lost. For a RAID subsystem, removing multiple disks presents a problem. For example, assume that a tray is removed that has four disks on it and that the RAID has striped the data so that multiple RAID groups are affected when the tray is removed. If that tray is replaced, then four disks are being replaced, which causes multiple RAID groups to be reconstructed and rebuilt in the worst case scenario. One way to reconstruct a disk missing from a RAID group (in the case of a RAID4 implementation, for example), is to use the remaining disks in the RAID group and the parity disk for the RAID group to regenerate the data on the missing disk block by block. This leads to large amounts of time and system resources being spent on the reconstruction operation. There is therefore a need for a method wherein a tray having multiple disks can be removed without causing multiple RAID groups to be reconstructed.
- The present invention can be implemented in any storage environment where there is a physical carrier with two or more disks. It is noted that while the storage systems described herein include disks, one skilled in the art can construct similar storage systems with other types of storage devices, such as solid state devices.
- In one embodiment, the disks on a single tray appear as one large logical disk, even though there are multiple physical disks on the tray. This arrangement presents the tray as a single LUN (logical unit number) that is independent of the other trays in the storage system. The single logical disk reports to the RAID subsystem, creating the impression that there is one large capacity disk. In one implementation, each disk in the tray is allocated to a different RAID group. By allocating the disks in a tray to different RAID groups, if the tray is removed, only a portion of several different RAID groups are removed.
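- The allocation of each disk in a tray to a different RAID group can be sketched as follows. This is a minimal sketch that assumes every tray holds the same number of disks and that RAID group g takes the disk in slot g of every tray; the names `build_raid_groups` and `disks_lost_per_group` are hypothetical.

```python
def build_raid_groups(num_trays, disks_per_tray):
    """Form one RAID group per tray slot.

    Group g contains the disk in slot g of every tray, so each group
    has exactly one member on any given tray. Disks are (tray, slot).
    """
    return {
        g: [(tray, g) for tray in range(num_trays)]
        for g in range(disks_per_tray)
    }


def disks_lost_per_group(groups, removed_tray):
    """Count how many members each RAID group loses when a tray is pulled."""
    return {
        g: sum(1 for tray, _slot in members if tray == removed_tray)
        for g, members in groups.items()
    }


groups = build_raid_groups(num_trays=4, disks_per_tray=4)
lost = disks_lost_per_group(groups, removed_tray=0)
```

- Under this layout, pulling any single tray removes exactly one disk from each group, so no group loses more than the single disk that parity can cover.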
- FIG. 3 is a diagram of a virtualized RAID configuration 300. A RAID controller 302 sees a RAID array 304 made up of logical disk 1 306, logical disk 2 308, logical disk 3 310, and a logical parity disk 312. Each of the logical disks 306-312 is a single disk tray consisting of four physical disks, such that logical disk 1 includes physical disks P1-P4 320-326, logical disk 2 includes physical disks P5-P8 330-336, logical disk 3 includes physical disks P9-P12 340-346, and the logical parity disk includes physical disks P13-P16 350-356. Which physical disks 320-356 belong to the logical disks 306-312 can be indicated by setting an identifier for each disk by a disk driver located on the filer.
- The RAID controller 302 only sees the logical disks 306-312, and does not know that the physical disks 320-356 are present. The RAID controller 302 operates in the same manner as it would if there were only four physical disks connected to the controller. The number of physical disks per logical disk has no effect on the operation of the RAID controller 302. It is noted that while four physical disks are shown per logical disk, one skilled in the art can change the number of physical disks per logical disk without altering the operation of the RAID controller 302.
- A RAID group can be created using one disk from each drive tray in the system. For example, RAID group W includes disks 320, 330, 340, and 350. Each logical disk spans several different RAID groups. Logical disk 306 includes a W RAID group disk 320, an X RAID group disk 322, a Y RAID group disk 324, and a Z RAID group disk 326.
- The RAID groups are all in a normal RAID group situation, wherein all of the W disks are in one RAID group, all of the X disks are in a second RAID group, all of the Y disks are in a third RAID group, and all of the Z disks are in a fourth RAID group. If the disk tray that contains logical disk 306 is removed with one W disk, one X disk, one Y disk, and one Z disk on it, none of the RAID groups will become doubly degraded, e.g., the entire W RAID group is not removed.
- When a disk tray is removed, the problem is that four different RAID groups have to be reconstructed. In other words, the W RAID group component, the X RAID group component, the Y RAID group component, and the Z RAID group component of the removed tray all need to be rebuilt. For example, assume that physical disk P4 326 (of the Z RAID group) is the physical disk on the tray that failed. Then information on the W RAID group disk 320, the X RAID group disk 322, and the Y RAID group disk 324 could be copied to other disks, instead of being reconstructed. The result of this copy operation is that the W RAID group, the X RAID group, and the Y RAID group could be reconstructed more easily because the missing data from the removed disks 320-324 would not have to be generated from the corresponding parity disks 350-354, and can just be copied.
- However, virtualizing the RAID group and treating each WXYZ drive tray as a larger disk permits three quarters of the LBA range of the virtual disk to be copied to three quarters of the next logical disk. The operation is abstracted in a virtual sense, meaning that this is basically a copy operation; the information does not have to be reconstructed from a parity disk. Then the simplicity from the RAID group standpoint is that it would see fewer large disks. For example, instead of seeing 48 disks, the RAID would see 12 disks, wherein each disk seen by the RAID controller 302 is actually a disk tray with four physical disks.
- To the RAID controller 302, it would look like it is communicating with a disk that has an LBA range of four times the size of any physical disk that was actually present. So the RAID would stripe the data across the logical disks that it is aware of. By viewing the physical disks on a tray as a single logical disk, the net effect is adding the LBA ranges of each disk together and the storage subsystem recognizing where a physical location relating to the LBA is; i.e., mapping multiple physical disks into a single logical disk for RAID access.
- Presenting multiple physical disks as a single logical disk is the opposite of what is traditionally thought of as virtualization. This is virtualization within the RAID subsystem itself, whereas virtualization traditionally occurs external to the RAID subsystem. A RAID traditionally approaches virtualization from the opposite direction, by mapping multiple logical devices onto a single physical device.
- When implementing the present invention, there is no change in the way the RAID operates. The abstraction is placed below the RAID, so that the RAID can handle the removal of a larger number of disks more easily, which improves supportability. The RAID does not have to be concerned with the complexities of the layout, because the layout has already been handled by the virtualization.
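- The tray-replacement case discussed earlier, in which three quarters of a logical disk's LBA range can be copied and only the failed disk's share has to be rebuilt from parity, can be sketched as follows. This is an illustrative sketch only: the function name `plan_tray_replacement`, the concatenated layout, and the block counts are assumptions, not details taken from the description.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open LBA range [start, end)


def plan_tray_replacement(
    disks_per_tray: int, blocks_per_disk: int, failed_disk: int
) -> Tuple[List[Range], List[Range]]:
    """Return (ranges to copy, ranges to rebuild) for one logical disk.

    Assumes a concatenated layout in which physical disk i backs logical
    LBAs [i * blocks_per_disk, (i + 1) * blocks_per_disk).
    """
    if not 0 <= failed_disk < disks_per_tray:
        raise ValueError("failed_disk is not on this tray")
    to_copy: List[Range] = []
    to_rebuild: List[Range] = []
    for i in range(disks_per_tray):
        span = (i * blocks_per_disk, (i + 1) * blocks_per_disk)
        if i == failed_disk:
            to_rebuild.append(span)  # must be regenerated from parity
        else:
            to_copy.append(span)     # healthy disk: plain copy
    return to_copy, to_rebuild


copy_ranges, rebuild_ranges = plan_tray_replacement(4, 1000, failed_disk=3)
copied_blocks = sum(end - start for start, end in copy_ranges)
```

With four disks per tray and one failed disk, `copied_blocks` covers three quarters of the logical disk, and only the remaining quarter appears in `rebuild_ranges`.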
- Exemplary System Construction
- FIG. 4 is a block diagram of a portion of a system 400 configured to implement the present invention. The system 400 includes a filer 402 and a disk shelf 404. The filer 402 includes a RAID controller 410, a disk driver 412 in communication with the RAID controller 410, shelf enclosure services 414 in communication with the RAID controller 410 and the disk driver 412, and an adapter driver 416 in communication with the disk driver 412 and the shelf enclosure services 414. The disk shelf 404 includes a shelf controller 420 and a disk tray 422. The disk tray 422 includes a disk tray adapter 424 and a plurality of physical disks 426-432. The disk tray adapter 424 is in communication with each of the physical disks 426-432 and the shelf controller 420. The filer 402 communicates with the disk shelf 404 via communication between the adapter driver 416 and the shelf controller 420. - The
disk driver 412 is the entity in the system 400 that provides the virtualization to the RAID controller 410. The shelf enclosure services 414 receives data from the shelf controller 420 about the configuration of the physical disks 426-432 on the disk shelf 404. The configuration information includes environmental information about the physical layout of the shelf and each disk on the shelf. In one implementation, each disk on the disk shelf 404 is uniquely identified by its physical connection to the shelf 404. For example, a single disk on the shelf 404 may be identified by the bay where it is physically located and an ID number of the shelf 404. The configuration information is provided to the disk driver 412, which uses the information to indicate which physical disks 426-432 on the disk shelf 404 belong to which logical drive(s). In one implementation, the disk driver 412 uses a table to track the assignment of a physical drive to a logical drive. It is noted that one skilled in the art can use other means of identifying each disk on the disk shelf and other means of tracking the assignment of physical disks to logical disks. - After the physical disks 426-432 are assigned to a logical disk, the
disk driver 412 presents the logical disk to the RAID controller 410. The elements “above” the RAID controller 410 (i.e., the disk driver 412, the shelf enclosure services 414, and the adapter driver 416) treat the disks 426-432 as individual disk drives. The disk driver 412 provides the virtualization layer to the RAID controller 410, such that the RAID controller 410 only sees the logical disk. In the event that multiple disk trays 422 are present, the disk driver 412 presents multiple logical disks to the RAID controller 410. - It is noted that while only one
disk tray 422 and one disk shelf 404 are shown, one skilled in the art can construct a similar system with multiple disk trays and/or multiple disk shelves. When more than one disk tray is present on a single disk shelf, each disk tray communicates with the shelf controller 420 in a similar manner as shown in FIG. 4. When more than one disk shelf is present in the system 400, the shelf controllers for each shelf are in communication with each other, in a cascading style such that only one shelf controller communicates directly with the adapter driver 416. It is also noted that while the disk tray 422 is shown with four physical disks 426-432, one skilled in the art can construct a similar disk tray with different numbers of physical disks. - It is further noted that while the
system 400 illustrates a disk shelf, a disk tray, and multiple physical disks, a similar system can be built with other types of storage devices, such as solid state devices. In such circumstances, there would be a storage shelf, a storage tray, and multiple physical storage devices. The system 400 operates in a similar manner, regardless of the type of storage device used. - In operation, the
RAID controller 410 sends an input/output (I/O) command to be performed on a logical disk of the RAID subsystem. It is noted that the following description relates to a single I/O command for simplicity; the system 400 operates in the same manner for any number of I/O commands issued by the RAID controller 410. - The
disk driver 412 receives the command from the RAID controller 410 and determines which of the physical disks 426-432 should receive the command, based on the mapping of the logical disk to the physical disks 426-432. The disk driver 412 forwards the command along with the determined physical disk to the adapter driver 416. The adapter driver 416 forwards the command to the shelf controller 420, which passes the command to the disk tray adapter 424. The disk tray adapter 424 directs the command to the determined physical disk 426-432. Any response from the physical disk is passed in the reverse direction (through the disk tray adapter 424, the shelf controller 420, the adapter driver 416, and the disk driver 412) to the RAID controller 410. - The present invention can be implemented in a computer program tangibly embodied in a computer-readable storage medium containing a set of instructions for execution by a processor or a general purpose computer; and method steps of the invention can be performed by a processor executing a program of instructions to perform functions of the invention by operating on input data and generating output data. Suitable processors include, by way of example, both general and special purpose processors. Typically, a processor will receive instructions and data from a ROM, a random access memory (RAM), and/or a storage device. Storage devices suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).
In addition, while the illustrative embodiments may be implemented in computer software, the functions within the illustrative embodiments may alternatively be embodied in part or in whole using hardware components such as Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other hardware, or in some combination of hardware components and software components.
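- As one possible software illustration of the command path described for the system 400, the following minimal Python sketch keeps a table that assigns physical disks, identified by shelf ID and bay, to a logical disk, and uses that table to pick the physical disk for an I/O command. The class names, the (shelf ID, bay) key, the concatenated LBA layout, and the block counts are assumptions made for illustration; they are not taken from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PhysicalDisk:
    shelf_id: int  # ID number of the shelf
    bay: int       # bay where the disk is physically located
    blocks: int    # capacity in blocks (illustrative value)


class DiskDriver:
    """Hypothetical stand-in for the virtualization layer (disk driver 412)."""

    def __init__(self) -> None:
        # Assignment table: logical disk name -> ordered physical disks.
        self.table: Dict[str, List[PhysicalDisk]] = {}

    def assign(self, logical: str, disk: PhysicalDisk) -> None:
        self.table.setdefault(logical, []).append(disk)

    def logical_blocks(self, logical: str) -> int:
        # The RAID layer sees only this combined size.
        return sum(d.blocks for d in self.table[logical])

    def route(self, logical: str, lba: int) -> Tuple[PhysicalDisk, int]:
        """Pick the physical disk and local LBA for one command."""
        if lba < 0:
            raise ValueError("LBA must be non-negative")
        remaining = lba
        for disk in self.table[logical]:
            if remaining < disk.blocks:
                return disk, remaining
            remaining -= disk.blocks
        raise ValueError("LBA outside the logical disk")


driver = DiskDriver()
for bay in range(4):
    driver.assign("tray0", PhysicalDisk(shelf_id=1, bay=bay, blocks=1000))

target, local_lba = driver.route("tray0", 2500)
```

In this sketch the caller plays the role of the RAID controller: it addresses only the logical disk `"tray0"`, and the driver alone knows which bay receives the command.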
- While specific embodiments of the present invention have been shown and described, many modifications and variations could be made by one skilled in the art without departing from the scope of the invention. The above description serves to illustrate and not limit the particular invention in any way.
Claims (19)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/112,686 US8612678B2 (en) | 2008-04-30 | 2008-04-30 | Creating logical disk drives for raid subsystems |
PCT/US2009/042414 WO2009135065A2 (en) | 2008-04-30 | 2009-04-30 | Creating logical disk drives for raid subsystems |
US14/078,352 US9811454B2 (en) | 2008-04-30 | 2013-11-12 | Creating logical disk drives for raid subsystems |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/112,686 US8612678B2 (en) | 2008-04-30 | 2008-04-30 | Creating logical disk drives for raid subsystems |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/078,352 Continuation US9811454B2 (en) | 2008-04-30 | 2013-11-12 | Creating logical disk drives for raid subsystems |
Publications (2)
Publication Number | Publication Date |
---|---|
US20090276566A1 (en) | 2009-11-05 |
US8612678B2 (en) | 2013-12-17 |
Family
ID=41255829
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/112,686 Active 2030-05-27 US8612678B2 (en) | 2008-04-30 | 2008-04-30 | Creating logical disk drives for raid subsystems |
US14/078,352 Active 2029-08-15 US9811454B2 (en) | 2008-04-30 | 2013-11-12 | Creating logical disk drives for raid subsystems |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/078,352 Active 2029-08-15 US9811454B2 (en) | 2008-04-30 | 2013-11-12 | Creating logical disk drives for raid subsystems |
Country Status (2)
Country | Link |
---|---|
US (2) | US8612678B2 (en) |
WO (1) | WO2009135065A2 (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100057986A1 (en) * | 2008-08-29 | 2010-03-04 | Mahmoud Jibbe | Storage array boot and configuration |
US20130031247A1 (en) * | 2011-07-27 | 2013-01-31 | Cleversafe, Inc. | Generating dispersed storage network event records |
US8832368B1 (en) * | 2010-02-18 | 2014-09-09 | Netapp, Inc. | Method and apparatus for slicing mass storage devices |
US20150058555A1 (en) * | 2013-08-26 | 2015-02-26 | Vmware, Inc. | Virtual Disk Blueprints for a Virtualized Storage Area Network |
US9104330B1 (en) * | 2012-06-30 | 2015-08-11 | Emc Corporation | System and method for interleaving storage |
US20150279470A1 (en) * | 2014-03-26 | 2015-10-01 | 2419265 Ontario Limited | Solid-state memory device with plurality of memory cards |
US20160179411A1 (en) * | 2014-12-23 | 2016-06-23 | Intel Corporation | Techniques to Provide Redundant Array of Independent Disks (RAID) Services Using a Shared Pool of Configurable Computing Resources |
US9582198B2 (en) | 2013-08-26 | 2017-02-28 | Vmware, Inc. | Compressed block map of densely-populated data structures |
US9658803B1 (en) * | 2012-06-28 | 2017-05-23 | EMC IP Holding Company LLC | Managing accesses to storage |
US9672115B2 (en) | 2013-08-26 | 2017-06-06 | Vmware, Inc. | Partition tolerance in cluster membership management |
US9678680B1 (en) * | 2015-03-30 | 2017-06-13 | EMC IP Holding Company LLC | Forming a protection domain in a storage architecture |
US9811531B2 (en) | 2013-08-26 | 2017-11-07 | Vmware, Inc. | Scalable distributed storage architecture |
US9887924B2 (en) | 2013-08-26 | 2018-02-06 | Vmware, Inc. | Distributed policy-based provisioning and enforcement for quality of service |
US10496316B1 (en) * | 2018-10-31 | 2019-12-03 | EMC IP Holding Company LLC | Forming storage containers from extents having different widths within a group of storage devices |
EP2948950B1 (en) * | 2013-01-23 | 2019-12-11 | Dot Hill Systems Corporation | High density data storage system with improved storage device access |
CN110659160A (en) * | 2019-09-06 | 2020-01-07 | 厦门市美亚柏科信息股份有限公司 | RAID5 data recovery method, device, system and storage medium |
US10678619B2 (en) | 2011-07-27 | 2020-06-09 | Pure Storage, Inc. | Unified logs and device statistics |
US11016820B2 (en) | 2013-08-26 | 2021-05-25 | Vmware, Inc. | Load balancing of resources |
US11016702B2 (en) | 2011-07-27 | 2021-05-25 | Pure Storage, Inc. | Hierarchical event tree |
US12099752B2 (en) | 2011-07-27 | 2024-09-24 | Pure Storage, Inc. | Error prediction based on correlation using event records |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10445018B2 (en) * | 2016-09-09 | 2019-10-15 | Toshiba Memory Corporation | Switch and memory device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5860079A (en) * | 1996-05-10 | 1999-01-12 | Apple Computer, Inc. | Arrangement and method for efficient calculation of memory addresses in a block storage memory system |
US6853546B2 (en) * | 2002-09-23 | 2005-02-08 | Josef Rabinovitz | Modular data storage device assembly |
US20050182898A1 (en) * | 2004-02-12 | 2005-08-18 | International Business Machines Corporation | Method and apparatus for providing high density storage |
US20060179209A1 (en) * | 2005-02-04 | 2006-08-10 | Dot Hill Systems Corp. | Storage device method and apparatus |
US20060242382A1 (en) * | 2005-04-25 | 2006-10-26 | Peter Griess | Apparatus and method for managing of common storage in a storage system |
US20070130424A1 (en) * | 2005-12-02 | 2007-06-07 | Hitachi, Ltd. | Storage system and capacity allocation method therefor |
US7401193B1 (en) * | 2004-10-29 | 2008-07-15 | Promise Technology, Inc. | System for storing data |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6384842B1 (en) * | 1999-05-13 | 2002-05-07 | Lsi Logic Corporation | User interface to provide a physical view of movable physical entities |
WO2007053356A2 (en) * | 2005-10-28 | 2007-05-10 | Network Appliance, Inc. | System and method for optimizing multi-pathing support in a distributed storage system environment |
US7805633B2 (en) * | 2006-09-18 | 2010-09-28 | Lsi Corporation | Optimized reconstruction and copyback methodology for a disconnected drive in the presence of a global hot spare disk |
- 2008-04-30: US application US12/112,686, patent US8612678B2 (en), status Active
- 2009-04-30: WO application PCT/US2009/042414, patent WO2009135065A2 (en), status Application Filing
- 2013-11-12: US application US14/078,352, patent US9811454B2 (en), status Active
Non-Patent Citations (1)
Title |
---|
Creating RAID 50 Volumes: How to combine 2 or more MegaRAID SATA adapters into 1 OS RAID 50 volume [downloaded from www.lsi.com/downloads/Public/Obsolete/Obsolete%20Common%20Files/rsa_os50volume1103.pdf], November 2003 * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8225038B2 (en) * | 2008-08-29 | 2012-07-17 | Netapp, Inc. | Storage array boot and configuration |
US20100057986A1 (en) * | 2008-08-29 | 2010-03-04 | Mahmoud Jibbe | Storage array boot and configuration |
US8832368B1 (en) * | 2010-02-18 | 2014-09-09 | Netapp, Inc. | Method and apparatus for slicing mass storage devices |
US9852017B2 (en) * | 2011-07-27 | 2017-12-26 | International Business Machines Corporation | Generating dispersed storage network event records |
US20130031247A1 (en) * | 2011-07-27 | 2013-01-31 | Cleversafe, Inc. | Generating dispersed storage network event records |
US11593029B1 (en) | 2011-07-27 | 2023-02-28 | Pure Storage, Inc. | Identifying a parent event associated with child error states |
US11016702B2 (en) | 2011-07-27 | 2021-05-25 | Pure Storage, Inc. | Hierarchical event tree |
US10678619B2 (en) | 2011-07-27 | 2020-06-09 | Pure Storage, Inc. | Unified logs and device statistics |
US12099752B2 (en) | 2011-07-27 | 2024-09-24 | Pure Storage, Inc. | Error prediction based on correlation using event records |
US9658803B1 (en) * | 2012-06-28 | 2017-05-23 | EMC IP Holding Company LLC | Managing accesses to storage |
US9104330B1 (en) * | 2012-06-30 | 2015-08-11 | Emc Corporation | System and method for interleaving storage |
EP2948950B1 (en) * | 2013-01-23 | 2019-12-11 | Dot Hill Systems Corporation | High density data storage system with improved storage device access |
US10747475B2 (en) * | 2013-08-26 | 2020-08-18 | Vmware, Inc. | Virtual disk blueprints for a virtualized storage area network, wherein virtual disk objects are created from local physical storage of host computers that are running multiple virtual machines |
US10614046B2 (en) | 2013-08-26 | 2020-04-07 | Vmware, Inc. | Scalable distributed storage architecture |
US12126536B2 (en) | 2013-08-26 | 2024-10-22 | VMware LLC | Distributed policy-based provisioning and enforcement for quality of service |
US9887924B2 (en) | 2013-08-26 | 2018-02-06 | Vmware, Inc. | Distributed policy-based provisioning and enforcement for quality of service |
US9672115B2 (en) | 2013-08-26 | 2017-06-06 | Vmware, Inc. | Partition tolerance in cluster membership management |
US9582198B2 (en) | 2013-08-26 | 2017-02-28 | Vmware, Inc. | Compressed block map of densely-populated data structures |
US11809753B2 (en) | 2013-08-26 | 2023-11-07 | Vmware, Inc. | Virtual disk blueprints for a virtualized storage area network utilizing physical storage devices located in host computers |
US9811531B2 (en) | 2013-08-26 | 2017-11-07 | Vmware, Inc. | Scalable distributed storage architecture |
US20150058555A1 (en) * | 2013-08-26 | 2015-02-26 | Vmware, Inc. | Virtual Disk Blueprints for a Virtualized Storage Area Network |
US11704166B2 (en) | 2013-08-26 | 2023-07-18 | Vmware, Inc. | Load balancing of resources |
US10855602B2 (en) | 2013-08-26 | 2020-12-01 | Vmware, Inc. | Distributed policy-based provisioning and enforcement for quality of service |
US11016820B2 (en) | 2013-08-26 | 2021-05-25 | Vmware, Inc. | Load balancing of resources |
US11249956B2 (en) | 2013-08-26 | 2022-02-15 | Vmware, Inc. | Scalable distributed storage architecture |
US11210035B2 (en) | 2013-08-26 | 2021-12-28 | Vmware, Inc. | Creating, by host computers, respective object of virtual disk based on virtual disk blueprint |
US20150279470A1 (en) * | 2014-03-26 | 2015-10-01 | 2419265 Ontario Limited | Solid-state memory device with plurality of memory cards |
US9177654B2 (en) * | 2014-03-26 | 2015-11-03 | Burst Corporation | Solid-state memory device with plurality of memory cards |
US20160179411A1 (en) * | 2014-12-23 | 2016-06-23 | Intel Corporation | Techniques to Provide Redundant Array of Independent Disks (RAID) Services Using a Shared Pool of Configurable Computing Resources |
US9678680B1 (en) * | 2015-03-30 | 2017-06-13 | EMC IP Holding Company LLC | Forming a protection domain in a storage architecture |
US10496316B1 (en) * | 2018-10-31 | 2019-12-03 | EMC IP Holding Company LLC | Forming storage containers from extents having different widths within a group of storage devices |
CN110659160A (en) * | 2019-09-06 | 2020-01-07 | 厦门市美亚柏科信息股份有限公司 | RAID5 data recovery method, device, system and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US20140229671A1 (en) | 2014-08-14 |
WO2009135065A2 (en) | 2009-11-05 |
US9811454B2 (en) | 2017-11-07 |
US8612678B2 (en) | 2013-12-17 |
WO2009135065A3 (en) | 2010-02-25 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US8612678B2 (en) | Creating logical disk drives for raid subsystems | |
US10210045B1 (en) | Reducing concurrency bottlenecks while rebuilding a failed drive in a data storage system | |
US8839028B1 (en) | Managing data availability in storage systems | |
US7000069B2 (en) | Apparatus and method for providing very large virtual storage volumes using redundant arrays of disks | |
US10289336B1 (en) | Relocating data from an end of life storage drive based on storage drive loads in a data storage system using mapped RAID (redundant array of independent disks) technology | |
US20060218433A1 (en) | Method, apparatus and program storage device for providing intelligent rebuild order selection | |
US20190129614A1 (en) | Load Balancing of I/O by Moving Logical Unit (LUN) Slices Between Non-Volatile Storage Represented by Different Rotation Groups of RAID (Redundant Array of Independent Disks) Extent Entries in a RAID Extent Table of a Mapped RAID Data Storage System | |
US10552078B2 (en) | Determining an effective capacity of a drive extent pool generated from one or more drive groups in an array of storage drives of a data storage system that uses mapped RAID (redundant array of independent disks) technology | |
US10229022B1 (en) | Providing Raid-10 with a configurable Raid width using a mapped raid group | |
CN104813290B (en) | RAID investigation machines | |
US20070113008A1 (en) | Configuring Memory for a Raid Storage System | |
US10353787B2 (en) | Data stripping, allocation and reconstruction | |
US20150286531A1 (en) | Raid storage processing | |
US20100030960A1 (en) | Raid across virtual drives | |
US20100306466A1 (en) | Method for improving disk availability and disk array controller | |
US20110145528A1 (en) | Storage apparatus and its control method | |
US20090265510A1 (en) | Systems and Methods for Distributing Hot Spare Disks In Storage Arrays | |
US10678643B1 (en) | Splitting a group of physical data storage drives into partnership groups to limit the risk of data loss during drive rebuilds in a mapped RAID (redundant array of independent disks) data storage system | |
US10514982B2 (en) | Alternate storage arrangement in a distributed data storage system with key-based addressing | |
US8838889B2 (en) | Method of allocating raid group members in a mass storage system | |
US10592111B1 (en) | Assignment of newly added data storage drives to an original data storage drive partnership group and a new data storage drive partnership group in a mapped RAID (redundant array of independent disks) system | |
US7418550B2 (en) | Methods and structure for improved import/export of raid level 6 volumes | |
CN102164165B (en) | Management method and device for network storage system | |
US10977130B2 (en) | Method, apparatus and computer program product for managing raid storage in data storage systems | |
US10877844B2 (en) | Using deletable user data storage space to recover from drive array failure | |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: NETAPP, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: COATNEY, DOUG; ASTER, RADEK; SIGNING DATES FROM 20080505 TO 20080507; REEL/FRAME: 021294/0113 |
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
CC | Certificate of correction | |
FPAY | Fee payment | Year of fee payment: 4 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |