JP3172872B2 - Storage device and data restoration method thereof

Info

Publication number: JP3172872B2
Application number: JP10418598A
Authority: JP
Inventors: 淳田中; 善久加茂; 仁角田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1991-04-01
Filing date: 1998-04-01
Publication date: 2001-06-04
Anticipated expiration: 2016-06-04
Also published as: JPH11119920A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a storage device that reads and writes data in parallel using a plurality of independent storage media as one set, and to a data correction method for that device.

[0002]

[Prior Art] A known technique for increasing the capacity of a storage device and achieving high-speed data transfer treats a plurality of storage media as one set, divides data into bits, bytes, or arbitrary units, distributes the pieces across the storage media, and reads them from all the media simultaneously. In this scheme, parity-check data is generated from the data distributed to the storage media and is stored on a separate storage medium. When a failure occurs, the data is repaired using the data on the normal storage media together with the parity-check data, which improves the reliability of the storage device. An example of these techniques is described in JP-A-1-250128. A further known technique, used when a storage medium fails, not only repairs data for normal reads but also restores the failed medium's data onto a separately prepared normal storage medium. By storing the repaired data on a spare medium and serving subsequent accesses from that spare medium, this technique can raise the availability of the storage device. An example of this type of storage device is described in JP-A-2-135555.
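The parity-based repair described for the prior art can be illustrated with a minimal sketch. It assumes a single parity block computed by byte-wise XOR, which is one common choice; the cited publications are not quoted here as specifying it, and the function name `xor_blocks` and the sample data are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Data striped across three storage media (sample values only).
data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]

# Parity-check data, stored on a separate medium.
parity = xor_blocks(data)

# Medium 1 fails: rebuild its block from the surviving media plus the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

With XOR parity only one lost block per stripe can be rebuilt, which is why the embodiment described later relies on a stronger code to tolerate two failed disks.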

[0003]

[Problems to Be Solved by the Invention] The known devices above can repair failures of a certain number of storage media because they hold parity data, and can also perform failure recovery because they have spare storage media. However, the failure recovery operation requires reading all the data on the normal storage media and all the parity-check data, repairing the failed data, and writing it to the spare storage medium. During failure recovery every storage medium is therefore occupied, so normal read/write requests from the host device are kept waiting, and the performance of the storage device drops as a result. In addition, although the known devices have redundancy against failures of multiple storage media, they handle recovery from a single-medium failure and recovery from a multiple-media failure without distinction. Consequently, if emphasis is placed on failure recovery, normal read/write processing cannot proceed even when only one medium has failed, and normal read/write performance falls. If emphasis is instead placed on normal read/write processing, the recovery time is not guaranteed when multiple media have failed, and the likelihood that the whole device fails increases. The object of the present invention is, for a storage device that has redundancy against failures of two or more storage media, to minimize the drop in normal read/write processing during a failure, to keep the failure recovery time within a fixed period, and to ensure high reliability.

[0004]

[Means for Solving the Problems] The above object is achieved by a storage device comprising: a storage medium group that treats a plurality of independent storage media as one set and stores data divided into bits, bytes, or arbitrary units; disks that store ECC data corresponding to the divided data; spare storage media that store repaired data; an input/output-failure recovery control circuit that receives input/output commands from a host device and executes them or responds to the host device; a timer for obtaining the failure occurrence time, the elapsed time during failure recovery, the unit time, and the like; a failure recovery table for failed storage media; and a failed-data repair/recovery circuit that detects failed data, repairs it, and writes it to a spare storage disk. When a failure occurs in a storage medium, the failed-data repair/recovery circuit detects the failure by an error check and notifies the input/output-failure recovery control circuit. The control circuit then determines the state of the failure and either selects and executes a priority between normal read/write processing and failure recovery processing suited to that state, or sets the ratio of processing frequency or processing amount between normal read/write and failure recovery.

[0005]

[Operation] When a failure occurs in the storage device, the device determines conditions such as its redundancy, the elapsed recovery time, and the state of normal read/write processing, and selects a failure recovery process (method) suited to them. This prevents a drop in normal read/write performance while ensuring high reliability of the storage device. Specifically, when the number of failed storage media leaves a margin relative to the device's redundancy, a recovery process is selected that gives priority to normal read/write processing and performs recovery in the remaining time, so no load is placed on normal read/write processing. When no margin of redundancy remains, failure recovery is given priority, so the reliability of the storage device against failures is ensured. Even when there is a margin of redundancy, the recovery process (method) is changed according to the cumulative time spent on recovery for each storage medium under recovery, so a drop in normal read/write performance is prevented and the recovery time is kept within a fixed period. Furthermore, recovery can be concentrated in periods with few normal reads and writes, such as at night, which reduces the load on the storage device during periods with many normal reads and writes. Finally, because the frequency of recovery processing, or the ratio of the recovery amount, is set according to the frequency of normal read/write processing, recovery can be carried out efficiently in terms of time.

[0006]

[Embodiments] Embodiments of the present invention are described in detail below, taking magnetic disks as an example of the storage media. FIG. 1 is a flowchart showing the failure recovery procedure of the present invention. FIG. 2 is a configuration diagram of an embodiment in which the invention is applied to a storage device that can tolerate the failure of two data disks. FIG. 3 shows the failure recovery table for failed disks in FIG. 2. FIG. 4 is a flowchart showing the failure recovery procedure in the storage device of FIG. 2. FIGS. 5, 6, 7, 8, and 9 show details of the recovery-process selection block in the procedure of FIG. 4.

[0007] The flowchart of FIG. 1, which shows the failure recovery procedure, is described first. Assume that a failure occurs in the storage device (step 10). First, it is determined whether the failure is recoverable (step 20). If repair is impossible, recovery processing ends there and the data is lost (step 30). If repair is possible, it is determined from the device's redundancy, the elapsed recovery time, and the state of normal read/write processing whether the device should devote itself to recovery (step 40). If there is ample margin and the urgency of recovery is low, then when a normal request such as a read or write arrives from the host device, recovery processing is suspended and the normal request is served first. Recovery is performed in the remaining time, and reads and writes that are part of recovery processing in progress are cancelled or queued (step 50). Conversely, if there is no margin and the urgency of recovery is high, recovery is given priority and all normal processing such as reads and writes is cancelled or queued (step 60). For intermediate cases, where there are several combinations of recovery urgency and importance of normal processing, a recovery process matching each condition is prepared in advance as a program, and when the conditions change the program is swapped so that the device can move to the appropriate process (step 70). Next, when recovery processing has finished or been interrupted, it is checked whether any recovery work remains (step 80). If all recovery has finished, the storage device returns to the normal state. If recovery work remains, the procedure returns to the beginning (step 20), and the above steps are repeated until recovery is complete.

[0008] Next, the configuration diagram of the embodiment in FIG. 2 is described. In FIG. 2, 150 is the input/output-failure recovery control circuit, which receives input/output commands from the host device and executes them or responds to the host device. When a failure has occurred in a storage medium, it also selects an appropriate recovery method based on the number of disks under recovery, the time recovery has taken, and the frequency or amount of recovery. 154 is the failure recovery table for failed storage media, described in detail with FIG. 3. 152 is a timer for obtaining the failure occurrence time, the elapsed time during recovery, and the unit time; the time it measures is one of the conditions used to decide the recovery method. 156 is the failed-data repair/recovery circuit, which detects failed data, repairs it, and writes it to a spare storage disk: it reads data from all disks except the failed disk, uses that data to repair the failed data, and either transfers the repaired data to the host device or writes it to a spare storage disk. 158 through 168 are the group of data disks that store the divided data. FIG. 2 shows six data disks, but in general the number is arbitrary. 170 and 172 are disks that store the ECC data corresponding to the data divided across disks 158 through 168. When a failure occurs, the failed data is repaired using this ECC data and the normal data on disks 158 through 168. However, if more data disks fail than the redundancy of the storage device allows, the data cannot be repaired and is lost. FIG. 2 shows the case of two ECC disks, in which failed data can be repaired even when two data disks have failed. In general there are also ECC generation methods that tolerate two or more disk failures, so the number of failed disks that does not lead to data loss, that is, the redundancy, can be made larger. Concretely, the ECC is generated using a Reed-Solomon code capable of multiple-erasure correction. Reed-Solomon codes and the error correction schemes that use them are prior art (described, for example, in "New Edition Digital Audio" by Doi and Iga, published by Radio Gijutsu-sha), so their description is omitted. 174 and 176 are spare storage media that store repaired data; once the contents of a failed disk have been stored there, subsequent accesses to that data go to the spare storage medium. The number of these disks is also arbitrary in general.

[0009] The failure recovery table for failed disks in FIG. 3 is described next. The failure recovery table 154 consists of the identification number of the spare storage disk (1), the identification number of the failed disk (2), the failure occurrence time (3), the addresses of the failed data (4), and a flag indicating whether each address has been recovered (5).
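The five fields of table 154 can be pictured as a small record type. This is a minimal sketch under stated assumptions: the names `RecoveryEntry`, `new_entry`, and `unrecovered_disks` are hypothetical, and fields (4) and (5) are combined into one address-to-flag mapping, which is a modelling choice rather than something the text specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

@dataclass
class RecoveryEntry:
    spare_disk_id: str    # (1) identification number of the spare disk
    failed_disk_id: str   # (2) identification number of the failed disk
    failure_time: float   # (3) failure occurrence time read from the timer
    # (4) addresses of the failed data, each mapped to (5) its recovery flag
    recovered: Dict[int, bool] = field(default_factory=dict)

def new_entry(spare_id: str, failed_id: str, now: float,
              addresses: Iterable[int]) -> RecoveryEntry:
    """Create an entry with every recovery flag cleared."""
    return RecoveryEntry(spare_id, failed_id, now, {a: False for a in addresses})

def unrecovered_disks(table: List[RecoveryEntry]) -> int:
    """Count the disks that still have at least one unrecovered address."""
    return sum(1 for e in table if not all(e.recovered.values()))

entry = new_entry("S1", "#2", 100.0, range(4))
table = [entry]
```

Counting entries with a cleared flag gives the number of disks whose recovery is unfinished, the quantity the control circuit compares with a threshold when it selects a recovery process.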

[0010] Next, the operation of FIGS. 2 and 3 is described with reference to the flowchart of FIG. 4. First, assume that a failure occurs in data disk 162 of FIG. 2 (step 100). The failed-data repair/recovery circuit 156 detects the failure and reports it to the input/output-failure recovery control circuit 150. On receiving the report, control circuit 150 consults the failure recovery table 154 to check whether it has a free entry (step 102). Next, control circuit 150 checks whether this failure is a new one (step 104). If it is new, control circuit 150 instructs the failed-data repair/recovery circuit 156 to write initial values into the corresponding fields of table 154. Circuit 156 writes the identification number S1 of spare disk 174 into the spare storage medium field, and the identification number #2 of the failed data disk 162 into the failed storage medium field. It then writes the failure occurrence time read from timer 152 into the failure time field, and the addresses of failed disk 162 into the address field. Finally, circuit 156 initializes the recovery flag of each address (step 106). If the failure is not new, step 106 is skipped and the procedure moves to the next step. In the next step, control circuit 150 determines the state of the failure and selects and executes the normal read/write processing or recovery processing suited to it (step 108). The details of this step are given with FIGS. 5 to 9. Next, when recovery processing has finished or been interrupted, it is checked whether any recovery work remains (step 110). If all recovery has finished, the storage device returns to the normal state. If recovery work remains, the procedure returns to the beginning (step 102), and the above steps are repeated until recovery is complete. Whatever recovery method is used, circuit 156 monitors whether recovery is continuing or has finished. If another failure occurs before recovery has finished, circuit 156 starts processing in the same way as above (step 102). However, if the number of failed disks whose recovery is unfinished exceeds the redundancy of the device, recovery is impossible, so control circuit 150 reports to the host device that data has been lost (step 114). When recovery has finished, the unnecessary data in table 154 is erased and the device returns to the normal state (step 112).

[0011] Next, step 108 of FIG. 4 is described with reference to FIG. 5. In FIG. 5, the input/output-failure recovery control circuit 150 consults the failure recovery table 154, counts the disks whose recovery is unfinished, and compares this failed-disk count with a threshold (step 120). If the failed-disk count is smaller than the predetermined threshold, control circuit 150 judges that there is a margin of redundancy, gives priority to normal reads and writes, and performs recovery in the remaining time. Reads and writes that are part of recovery processing in progress are cancelled or queued (step 122). If the failed-disk count is larger than the threshold, control circuit 150 judges that there is no margin of redundancy, gives priority to recovery, and cancels or queues all normal processing such as reads and writes (step 124). Recovery is performed in units for which repair and storage finish in a relatively short time, such as one track, and after each unit the storage device is released for normal processing. However, if a normal read/write command arrives during recovery, the recovery work is stopped immediately and the device is released for normal read/write processing. When data whose recovery is unfinished is read during normal read/write processing, the failed data is repaired from the ECC data and the normal data used to generate it, and is sent to the host device; at the same time the repaired data is stored on the spare disk, and the recovery flag for that address in table 154 is set to recovered. Once this flag is set to recovered, subsequent accesses to the data go to the spare disk. For a data write, after the ECC data is created, the data that would have been stored on the failed disk is written to the spare disk, and the recovery flag is set to recovered. In the example of FIG. 2 the redundancy is two disks, so the threshold is necessarily 1. With a Reed-Solomon code capable of multiple-erasure correction for two or more disks, however, the threshold can be any integer not exceeding the redundancy. Control circuit 150 remembers the address of the data repaired last time, and recovery proceeds from the next address. During recovery, this remembered address is used: if the following address has no flag set in table 154, meaning its recovery is unfinished, the data at that address is repaired. The repair reads the ECC data and, from the normal disks, the normal data that was used to generate the ECC data, and is carried out by the failed-data repair/recovery circuit 156. The repaired data is written to the spare disk, and the flag in table 154 is set to recovered. Accesses to repaired data then go to the spare disk. The address of the repaired data is stored in circuit 156, and control circuit 150 moves on to the next recovery operation. In this embodiment of FIG. 5, normal read/write processing is prioritized over recovery when the failed-disk count is at or below the threshold, so the drop in read/write performance of the storage device is limited. In the state devoted to recovery, repair can be completed in the shortest time, so reliability is maintained. The embodiment above selects the recovery method from the failed-disk count alone, but the cumulative recovery time can be included as a condition in addition to the failed-disk count.
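The selection at step 120 and the resumption of recovery from the address after the last repaired one can be sketched as follows. The text states the comparison both as "smaller than the threshold" and as "at or below the threshold"; this sketch assumes the latter, which is the reading consistent with a threshold of 1 for a redundancy of 2. The names `Priority`, `select_fig5`, and `next_unrecovered` are hypothetical.

```python
from enum import Enum
from typing import List, Optional

class Priority(Enum):
    NORMAL_IO = "normal read/write first, recovery in the remaining time"  # step 122
    RECOVERY = "recovery first, normal read/write cancelled or queued"     # step 124

def select_fig5(unrecovered_disks: int, threshold: int) -> Priority:
    # Step 120: compare the count of disks still under recovery with the
    # threshold (assumed "at or below" means there is a margin of redundancy).
    if unrecovered_disks <= threshold:
        return Priority.NORMAL_IO
    return Priority.RECOVERY

def next_unrecovered(flags: List[bool], last_repaired: int) -> Optional[int]:
    """Return the first address after last_repaired whose flag is not set.

    Addresses already repaired on demand by a host read or write have their
    flag set and are skipped. Returns None when nothing remains.
    """
    for addr in range(last_repaired + 1, len(flags)):
        if not flags[addr]:
            return addr
    return None
```

For the two-ECC-disk configuration of FIG. 2, `select_fig5(1, 1)` yields normal-I/O priority and `select_fig5(2, 1)` yields recovery priority.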

[0012] Next, step 108 in FIG. 4 is described with reference to FIG. 6. In FIG. 6, the input/output-failure recovery control circuit 150 consults the failure recovery table 154, counts the disks whose recovery is unfinished, and compares this failed-disk count with a threshold (step 130). If the count is at or below the threshold, control circuit 150 then reads the current time from timer 152 and compares the cumulative recovery time, computed from the current time and the failure occurrence time in table 154, with a preset time limit (step 132). When the cumulative recovery time is smaller than the preset time limit, there is a margin for recovery, so control circuit 150 instructs the failed-data repair/recovery circuit 156 to give priority to normal read/write processing and, in the remaining time, to repair the data of the failed disk and store it on the spare disk. Reads and writes that are part of recovery processing in progress are cancelled or queued (step 134). If the number of disks whose recovery is unfinished is larger than the threshold, or if the difference between the current time and the failure occurrence time is larger than the preset time limit, there is no margin for recovery, so control circuit 150 cancels or queues normal reads and writes from the host device and instructs circuit 156 to give priority to recovery (step 136). In this embodiment of FIG. 6, the device devotes itself to recovery when the time taken by recovery exceeds the time limit, so the repair time is kept within a fixed period and reliability is improved.
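The two-stage decision of FIG. 6 (steps 130 and 132) can be written as a small function. This is a minimal sketch: the function name, the string return values, and the use of plain numbers for times are assumptions, and the case where the elapsed time exactly equals the limit, which the text leaves open, is treated here as recovery priority.

```python
def select_fig6(unrecovered_disks, threshold, now, failure_time, time_limit):
    """Return 'normal_io' (step 134) or 'recovery' (step 136)."""
    # Step 130: no margin of redundancy left, so recover first.
    if unrecovered_disks > threshold:
        return "recovery"
    # Step 132: cumulative recovery time = current time - failure occurrence time.
    elapsed = now - failure_time
    if elapsed < time_limit:
        return "normal_io"
    # Limit reached or exceeded (equality is an assumption of this sketch).
    return "recovery"
```

Bounding the elapsed time this way is what lets the device promise that repair finishes within a fixed period even under sustained host load.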

[0013] Next, step 108 of FIG. 4 is described with reference to FIG. 7. In FIG. 7, the input/output-failure recovery control circuit 150 obtains the current time from timer 152 and determines whether it falls in a period with many normal reads and writes (step 140). If it does not, control circuit 150 cancels or queues normal reads and writes from the host device and instructs the failed-data repair/recovery circuit 156 to give priority to recovery. Even within such a period, if the failed-disk count checked at step 142 exceeds the threshold, recovery is likewise given priority (step 146). Only when the period has many normal reads and writes and the failed-disk count is at or below the threshold are normal reads and writes given priority, with recovery performed in the remaining time (step 144). In this embodiment of FIG. 7, when it is known in advance that the way the storage device is used differs by time of day, recovery can be assigned to periods with little normal read/write processing, so recovery runs smoothly without normal read/write processing interfering with it. The embodiments of FIGS. 5 to 7 use two kinds of recovery processing, one prioritizing recovery and one prioritizing normal reads and writes, but more may be added depending on the situation.
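The time-of-day selection of FIG. 7 can be sketched as below. The representation of the busy period as a set of hours (`busy_hours`) is an assumption made for illustration; the text only says the control circuit judges whether the current time is in a period with many normal reads and writes.

```python
def select_fig7(hour, busy_hours, unrecovered_disks, threshold):
    """Return 'normal_io' (step 144) or 'recovery' (step 146)."""
    # Step 140: outside the busy period, recovery is given priority.
    if hour not in busy_hours:
        return "recovery"
    # Step 142: inside the busy period, recover first only if redundancy
    # has no margin left.
    if unrecovered_disks > threshold:
        return "recovery"
    return "normal_io"

# Hypothetical schedule: host load is assumed heavy from 09:00 to 17:59.
BUSY_HOURS = range(9, 18)
```

With this schedule, a single failed disk at 14:00 leaves host I/O prioritized, while the same failure at 02:00 is recovered at full speed.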

[0014] Next, step 108 in FIG. 4 will be described with reference to FIG. 8. In FIG. 8, when the number of failed disks checked in step 150 exceeds the threshold value, priority is given to failure recovery and normal reading and writing are stopped (step 158). When the number of failed disks is equal to or less than the threshold value and step 152 determines that the current time is not in a time zone of frequent normal read/write processing, only reads are processed, and failure recovery is given priority for the rest of the time (step 156). When the number of failed disks is equal to or less than the threshold value and the current time is in a time zone of frequent normal read/write processing, normal read/write processing is given priority and failure recovery is performed in the remaining time (step 154). In the embodiment of FIG. 8, when the number of failed disks is equal to or less than the threshold value but the time zone is one of light normal read/write processing, in particular a read-only time zone, exceptionally permitting read processing makes it possible to suppress performance degradation of the storage device without hindering the failure recovery processing.
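The three-way selection of FIG. 8 can be sketched in Python as follows. This is an illustrative sketch only: the function name `select_policy_fig8` and the string labels are hypothetical, and the busy time zone is passed in as a precomputed boolean rather than read from the timer 152.

```python
def select_policy_fig8(failed_disks, threshold, in_busy_zone):
    """Return a label for the recovery mode chosen by the FIG. 8 flow."""
    if failed_disks > threshold:      # step 150: count exceeds the threshold
        return "recovery_only"        # step 158: all normal reads/writes stopped
    if not in_busy_zone:              # step 152: not a heavy read/write time zone
        return "reads_then_recovery"  # step 156: reads allowed, recovery prioritized
    return "io_first"                 # step 154: recovery in the remaining time
```

The middle branch is what distinguishes FIG. 8 from FIG. 7: reads are still served while recovery has priority, instead of all host access being cancelled or queued.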

[0015] Next, step 108 in FIG. 4 will be described with reference to FIG. 9. In FIG. 9, when the number of failed disks checked in step 160 exceeds the threshold value, or when the number of failed disks is equal to or less than the threshold value and the cumulative failure recovery time checked in step 162 exceeds the time limit, priority is given to failure recovery and normal reading and writing are stopped (step 172). When the number of failed disks is equal to or less than the threshold value and the cumulative failure recovery time is less than the time limit, the input/output-failure recovery control circuit 150 reads a unit time from the timer 152 and compares the normal read/write processing frequency within that time with a preset threshold value (step 164). When the normal read/write processing frequency is greater than the threshold value, the recovery is still within the time limit and can be regarded as having a margin, so normal read/write processing is given priority and failure recovery is performed in the remaining time (step 166). On the other hand, when the normal read/write processing frequency is smaller than the threshold value, that frequency may lie anywhere from very close to the threshold value to far below it. The frequency of failure recovery processing per unit time, or the ratio of the amount of failure recovery, is therefore set dynamically according to the magnitude of the normal read/write processing frequency (step 168). The failure recovery processing is then executed according to the frequency or ratio thus set (step 170). In the embodiment of FIG. 9, because the frequency of the failure recovery processing or the ratio of the amount of failure recovery is set according to the magnitude of the normal read/write processing frequency, the failure recovery processing is executed efficiently in terms of time.
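The flow of FIG. 9 can be sketched in Python as follows. The specification does not state how the recovery ratio is derived from the read/write frequency in step 168, so the linear mapping used here is purely an assumption, as are the function name `select_policy_fig9`, the labels, and the handling of exact equality with the time limit and with the frequency threshold.

```python
def select_policy_fig9(failed_disks, disk_threshold,
                       recovery_elapsed, time_limit,
                       io_rate, io_threshold):
    """Return (mode, recovery_share) for one unit time of the FIG. 9 flow.

    recovery_share is the fraction of the unit time given to recovery;
    it is None when recovery simply uses whatever time host I/O leaves free.
    io_threshold must be positive.
    """
    if failed_disks > disk_threshold:     # step 160
        return ("recovery_only", 1.0)     # step 172: host reads/writes stopped
    if recovery_elapsed > time_limit:     # step 162: cumulative recovery time
        return ("recovery_only", 1.0)     # step 172
    if io_rate > io_threshold:            # step 164: host load is high
        return ("io_first", None)         # step 166
    # Step 168 (assumed linear mapping): the lighter the host load,
    # the larger the share of the unit time spent on recovery.
    share = 1.0 - io_rate / io_threshold
    return ("proportional", share)        # step 170
```

With an assumed frequency threshold of 10 requests per unit time, a measured frequency of 5 gives a recovery share of 0.5, and an idle unit time gives a share of 1.0. Any monotonically decreasing mapping would serve the same purpose.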

[0016] In the above-described embodiments, a magnetic disk is taken as an example of the storage medium, but an optical disk, a floppy disk, or a semiconductor memory can also be used as the storage medium. In addition, in place of the conditions used in the above-described embodiments, the condition for selecting the failure recovery method may be the job content of the host device, the importance of a file in the storage device, or the like. Combining these conditions with the failure recovery methods makes flexible failure recovery processing possible.

[0017]

[Effects of the Invention] According to the present invention, when the number of failed storage media is small compared with the redundancy of the storage device, normal read/write processing is given priority over failure recovery, so that the load on the storage device does not become heavy and the decrease in response performance of normal read/write processing is kept to a minimum. Further, when the margin of redundancy becomes small, normal read/write processing is automatically stopped and failure recovery processing is given priority, so that the reliability of the storage device does not decrease. Further, since the failure recovery processing method is switched according to the cumulative failure recovery processing time of each failed storage medium, a storage device with still higher reliability can be realized. Further, since the frequency of the failure recovery processing or the ratio of the amount of failure recovery is set according to the magnitude of the normal read/write processing frequency, the failure recovery processing can be executed efficiently in terms of time. As described above, the present invention can select an appropriate failure recovery method for each of the various conditions relating to failure recovery, and thus has the remarkable effect that optimal failure recovery processing can be executed.

[Brief description of the drawings]

FIG. 1 is a flowchart showing a processing procedure of the present invention.

FIG. 2 is a configuration diagram of a storage device of the present invention.

FIG. 3 is a configuration diagram of a failure recovery table for a failed disk according to the present invention.

FIG. 4 is a flowchart showing a processing procedure of FIG. 2.

FIG. 5 is a flowchart of the failure recovery processing selection block in FIG. 4.

FIG. 6 is another flowchart of the failure recovery processing selection block in FIG. 4.

FIG. 7 is another flowchart of the failure recovery processing selection block in FIG. 4.

FIG. 8 is another flowchart of the failure recovery processing selection block in FIG. 4.

FIG. 9 is another flowchart of the failure recovery processing selection block in FIG. 4.

[Explanation of symbols]

150 input/output-failure recovery control circuit
154 failure recovery table
152 timer
156 failure data restoration/recovery circuit
158 data disk #0
160 data disk #1
162 data disk #2
164 data disk #3
166 data disk #4
168 data disk #5
170 ECC data disk E1
172 ECC data disk E2
174 spare disk S1
176 spare disk S2

Continuation of the front page
(56) References: JP-A-4-49449 (JP, A); JP-A-5-143471 (JP, A); US Patent 4,464,747 (US, A)
(58) Fields searched (Int. Cl.7, DB name): G06F 3/06

Claims

(57) [Claims]

1. A storage device comprising: a plurality of storage devices for storing a plurality of data groups and parity check data for each data group of the plurality of data groups; and a control device for performing a repair process on data stored in a defective data storage device, based on the data, among the data group to which that data belongs, stored in data storage devices having no defect, and on the parity check data of that data group stored in a data storage device having no defect, wherein the control device monitors the occurrence of defects in the plurality of storage devices and, when the number of occurrences of defects exceeds a preset value, performs the repair process in preference to other processing.

2. The storage device according to claim 1, wherein the control device performs the repair process in preference to other processes when the number of defective disks exceeds a preset value.

3. A storage device comprising: a plurality of storage devices for storing a plurality of data groups and parity check data for each data group of the plurality of data groups; a data storage device for storing restored data; and a control device that, when a failure occurs in any of the storage devices, performs a restoration process on the data stored in the failed data storage device, based on the other data in the data group to which that data belongs and on the corresponding parity check data, and stores the restored data in the data storage device for storing restored data, wherein the control device performs the restoration process in preference to an access request from a host unit in accordance with the number of the failed data storage devices.

4. The storage device according to claim 3, wherein, when the number of failed data storage devices is larger than a preset number, the control device performs the restoration process in preference to an access request from a host unit.

5. A storage device comprising: a plurality of storage devices for storing a plurality of data groups and parity check data for each data group of the plurality of data groups; a data storage device for storing restored data; and a control device that, when a failure occurs in any of the storage devices, performs a restoration process on the data stored in the failed data storage device, based on the other data in the data group to which that data belongs and on the corresponding parity check data, and stores the restored data in the data storage device for storing restored data, wherein the control device continues the restoration process if the number of failed data storage devices is larger than a preset number, and permits access from a host unit if the number of failed data storage devices is smaller than the preset number.

6. A storage device comprising: a plurality of storage devices for storing a plurality of data groups and parity check data for each data group of the plurality of data groups; a data storage device for storing restored data; and a control device that, when a failure occurs in any of the storage devices, performs a restoration process on the data stored in the failed data storage device, based on the other data in the data group to which that data belongs and on the corresponding parity check data, and stores the restored data in the data storage device for storing restored data, wherein the control device performs the restoration process in preference to an access request from a host unit in accordance with the margin of redundancy of the storage device.

7. The storage device according to claim 6, wherein, when the number of failed data storage devices is larger than a preset number, the control device performs the restoration process in preference to an access request from a host unit.

8. A storage device comprising: a plurality of storage devices for storing a plurality of data groups and parity check data for each data group of the plurality of data groups; a data storage device for storing restored data; and a control device that, when a failure occurs in any of the storage devices, performs a restoration process on the data stored in the failed data storage device, based on the other data in the data group to which that data belongs and on the corresponding parity check data, and stores the restored data in the data storage device for storing restored data, wherein the control device continues the restoration process if the number of failed data storage devices among the plurality of data storage devices is larger than a preset number, and permits access from a host unit if that number is smaller than the preset number.

9. The storage device according to any one of the above claims, wherein the control device stores the data restored by the restoration process in a spare storage device different from the storage devices.

10. A storage device comprising: a plurality of storage devices for storing data divided into bit units, byte units, or arbitrary units, and parity check data for the divided data; data recovery means for restoring defective divided data stored in one of the plurality of storage devices, based on the divided data stored in the other storage devices having no defect and on the parity check data; monitoring means for monitoring the processing of the plurality of storage devices and outputting a signal when any of the plurality of storage devices has a defect; means for accumulating the total number of the storage devices having the defect; means for storing a set number determined based on the total number of the parity check data; and control means for controlling the storage device so that the restoration is continued if the total number of the storage devices having the defect is larger than the set number, and restoration of the divided data is temporarily suspended so that the storage device can be accessed if the total number of the storage devices having the defect is smaller than the set number.

11. A method of restoring data in a storage device, comprising: monitoring the processing of a plurality of storage devices that store data divided into bit units, byte units, or arbitrary units and parity check data for the divided data; detecting a defect that has occurred in any of the plurality of storage devices; determining the total number of the storage devices having the defect; setting a data restoration frequency for restoring the divided data stored in the storage device having the defect; and, when the total number of the storage devices having the defect is smaller than a predetermined value, restoring the divided data based on the divided data stored in the other storage devices having no defect and on the parity check data, and storing the restored data in at least one of the storage devices, wherein the restoring includes a first step of continuously restoring the divided data and a second step of temporarily suspending restoration of the divided data so that the storage device can be accessed, and the second step is executed based on the total number of the storage devices having the defect and the data restoration frequency.