JP3399398B2

JP3399398B2 - Mirror Disk Recovery Method in Fault Tolerant System

Info

Publication number: JP3399398B2
Application number: JP09702699A
Authority: JP
Inventors: 和浩冨士
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-04-02
Filing date: 1999-04-02
Publication date: 2003-04-21
Anticipated expiration: 2019-04-02
Also published as: JP2000293389A

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to a failure recovery method in a fault-tolerant system that forms a duplexed computer system, and in particular to an online data recovery method in which the two computers back up each other's online data.

[0002]

[Prior art] Disk mirroring in a fault-tolerant system interconnects the hard disks of two computers and writes data to both at the same time, so that data processing can continue even if one computer fails. It is widely used for this purpose. For example, in Japanese Unexamined Patent Publication No. 9-204319, two opposing computers each hold operation information about the opposite system, and data transfer control means transfers the same online data between them to duplicate the data. When either computer recovers from a failure, all online data of the healthy computer is automatically transferred to the opposite computer, so that the online data is restored quickly and easily.

[0003]

[Problem to be solved by the invention] In a fault-tolerant system, the shorter the time from the occurrence of a failure to recovery, the higher the reliability. In the prior art described above, however, the mirroring needed to restore disk data accounts for most of the system recovery time, which lowers the reliability of the system as a whole. In particular, the technique of Japanese Unexamined Patent Publication No. 9-204319 transfers all online data to the opposite computer at recovery, which prolongs the recovery time.

[0004] The present invention has been made in view of these circumstances. Its object is to keep the amount of data transferred at system recovery to the necessary minimum, thereby shortening the time required for mirroring, greatly reducing the recovery time, and making it possible to build a highly reliable fault-tolerant system.

[0005]

[Means for solving the problem] The present invention relates to a mirror disk recovery method in a fault-tolerant system in which two computers with the same hardware configuration are connected by cable and perform the same operations while communicating with each other, thereby improving reliability. The hard disks connected to the respective computers are mirrored so that both hold the same information, and recovery processing is performed after a computer stops because of a failure other than a hard disk failure.

[0006] To solve the above problem, the mirror disk recovery method of the present invention stores, in a special area, information about the disk data that is updated while one computer is stopped for a reason other than a disk failure. When the failed computer has been repaired, returns to the system, and disk data is recovered, only the updated disk data is transferred, that is, only the differences that arose while the one computer was stopped and being restored. The system can thus be restored faster than before without stopping its normal processing.

[0007] That is, the mirror disk recovery method according to claim 1 applies to a fault-tolerant system in which two opposing computers having the same hardware configuration perform the same operations while communicating with each other in order to improve reliability, the disks provided in the respective computers are mirrored so that both hold the same information, and recovery processing is performed after a computer stops because of a failure other than a disk failure. Each computer has a special storage area composed of N elements, each associated with one of the storage areas obtained by dividing the storage area of the disk into N (N being a natural number of 2 or more). While a computer is stopped, the updated difference data information is stored in the element corresponding to each storage area that has been updated. When the failed computer has been repaired, returns to the dual system, and disk data is recovered, the healthy computer transfers to the returning computer only the data on the disk that corresponds to the difference data information stored in its own special storage area, thereby restoring the dual system. Each computer of the dual system comprises: a CPU (central processing unit) that performs the main operations; a disk interface that provides the interface between the disk and other hardware; a disk controller that directly performs data access to the disk; a disk used to store data; data communication means that exchanges data between the computers; failure detection means that monitors the hardware of the whole computer for failures and, on finding one, stops the whole computer; a data access table, provided in the special storage area to record the difference data information created during single-system operation or recovery operation, composed of N elements each associated with one of the N divided storage areas of the disk, which stores the updated difference data information in the element corresponding to each storage area updated while a computer is stopped; and data monitoring means that determines whether an access command to the disk is a data read or a data write and creates the data to be written to the data access table. The data access table stores the difference data information updated while the other computer is stopped or being restored. For a storage area among the N divided storage areas that has already been rewritten, the data monitoring means changes only the checksum calculated when the data is written.

[0008]

[0009] The mirror disk recovery method according to claim 2 is the method of claim 1 in which the data access table is formed of a flash memory storage medium that retains its contents even when the power is turned off.

[0010]

[Embodiments of the invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a computer system in an embodiment of the invention. In the figure, computer 1 and computer 2 have the same hardware configuration. Each consists of: a CPU 11, 21 that performs the main operations; a disk interface 12, 22 that provides the interface between the disk and other hardware; data monitoring means 13, 23 that determines whether a disk access command is a data read or a data write and creates the data to be written to the data access table; a disk controller 14, 24 that directly performs data access to the disk; a disk 15, 25 used to store data; data communication means 16, 26 that exchanges data between computer 1 and computer 2; failure detection means 17, 27 that monitors the hardware of the whole computer for failures and stops the whole computer when one is found; and a data access table 18, 28, a storage medium needed to record the difference information for data created during single-system operation or recovery operation, which stores the write position at the time each disk write command is issued and the status at that time, as required for mirror copy.

[0011] The data access tables 18, 28 are assumed to be a device such as a flash ROM that retains its contents when the power is turned off, or an alternative storage medium such as a disk. Unused areas of disks 15, 25 can also be used to provide the function of data access tables 18, 28. Computer 1 and computer 2 are joined by link cable 3, which is used to exchange processing information, to transfer data between the computers, and to check status.

[0012] Next, the operation of this embodiment is described. In the fault-tolerant system formed by the two computers 1 and 2, when one computer stops because of a failure other than in disks 15, 25 and the remaining computer (1 or 2) has to keep operating alone, the update information for the disk (15 or 25) generated during single-computer operation is saved in the data access table (18 or 28), a specially prepared storage area.

[0013] When the failure has been removed and the stopped computer (1 or 2) restarts, hardware or software is added that uses the update information recorded in the data access table (18 or 28) during the outage to transfer only the contents of the disk (15 or 25) that were updated during the outage. Mirror copy therefore finishes in a shorter time than before. In addition, the data structure of data access tables 18, 28 is designed so that the table can be updated even during mirror copy, so normal work can continue while mirror copy is in progress.

[0014] Next, as an example of operation, system recovery is described in detail using the system and algorithms shown in FIGS. 1, 2, 3, and 4. FIG. 2 shows the algorithm of the processing performed by the data access table of FIG. 1. FIG. 3 is a state transition diagram of the states that the fault-tolerant system of two computers in FIG. 1 can take. FIG. 4 is a flowchart showing the flow of processing from the failure of one computer to recovery in the configuration of FIG. 1.

[0015] First, normal operation is described. The checksums of the computation results of CPU 11 and CPU 21 are passed to data communication means 16, 26, which exchange them over link cable 3 to confirm that the data are consistent. On a disk access, data monitoring means 13, 23 generate a checksum of the data being read or written while performing the read or write. When the read or write finishes, the generated checksums are exchanged over link cable 3 using disk interfaces 12, 22 and data communication means 16, 26 to check whether the data match. If they match, processing moves on to the next operation.
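The consistency check in normal operation can be illustrated with a minimal sketch. It assumes CRC-32 as the checksum, which the text does not specify, and the function names `checksum` and `peers_agree` are hypothetical.

```python
import zlib


def checksum(data: bytes) -> int:
    # CRC-32 is an assumption; the text does not name the checksum algorithm.
    return zlib.crc32(data)


def peers_agree(local_data: bytes, peer_checksum: int) -> bool:
    """Compare the local checksum with the one received over the link cable."""
    return checksum(local_data) == peer_checksum


# Both computers handled the same block, so the exchanged checksums match.
block = b"sector payload"
ok = peers_agree(block, checksum(block))
```

Only checksums cross the link in this sketch, which keeps the comparison cheap relative to exchanging the data itself.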

[0016] The operation of data access tables 18, 28 is now described in detail using the algorithm of FIG. 2. One element of data access table 18, 28 corresponds to one of the areas obtained by dividing the total capacity of disk 15, 25 into N, and holds the disk update information for that area. Addresses are assigned 0, 1, 2, and so on in order from the upper left. On reaching the upper right, assignment continues from the left of the next row down, until address N-1 has been assigned. Each element has the form of data format 1 and consists of: failure occurrence information 211, which indicates that the data was written while a failure was present; checksum 212, calculated when the data was written; and next address pointer 213, which holds the address of the element of data access table 18, 28 that was written next.
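The table element described above can be sketched as a small record, together with the mapping from a disk offset to a table address. This is only an illustration: equal-sized areas, byte offsets, the `-1` placeholder for an unlinked pointer, and the names `TableEntry` and `region_index` are all assumptions, not details given in the text.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    """One element of the data access table (data format 1 in the text)."""
    fault_flag: bool = False  # failure occurrence information 211
    checksum: int = 0         # checksum 212 of the latest write to this area
    next_addr: int = -1       # next address pointer 213 (-1 = unset, an assumption)


def region_index(byte_offset: int, disk_size: int, n_regions: int) -> int:
    """Map a disk byte offset to a table address in 0..N-1 (equal areas assumed)."""
    if not 0 <= byte_offset < disk_size:
        raise ValueError("offset outside the disk")
    region_size = -(-disk_size // n_regions)  # ceiling division
    return byte_offset // region_size


# A table for a disk divided into N = 8 areas.
table = [TableEntry() for _ in range(8)]
```

A larger N gives finer-grained difference tracking at the cost of a larger table.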

[0017] Data access tables 18, 28 are also assigned a data address register 2, which holds the addresses needed for failure recovery. Data address register 2 consists of: last address 221, the table address corresponding to the area most recently updated; pre-last address 222, the table address corresponding to the area updated immediately before that; failure occurrence address 223, the table address corresponding to the area first updated after a failure occurred; and current access address 224, the table address corresponding to the area of the write access now occurring.

[0018] When a write to disk 15, 25 occurs, data monitoring means 13, 23 determine which area of disk 15, 25 the data is written to, convert that area into an address of data access table 18, 28, and store the address in current access address 224 in data address register 2. The checksum calculated during the write access is then written to checksum 212 of the table element indicated by current access address 224.

[0019] Because the write access now occurring is the access that follows the most recent write access, next address pointer 213 of the table element indicated by last address 221 is updated to the current access address. That is, current access address 224 is stored in that next address pointer 213. The area just written now becomes the most recently accessed area, so the contents of last address 221 are saved to pre-last address 222 and the contents of current access address 224 are saved to last address 221. Linking the table elements of written areas by pointer in this way preserves the update history of disks 15, 25.
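The bookkeeping performed on each write in paragraphs [0018] and [0019] can be sketched as follows. The initial register values (all zero), the use of N as an "unlinked" pointer value, and the names `Entry`, `Tracker`, and `record_write` are assumptions made for illustration.

```python
from dataclasses import dataclass, field

N = 8    # number of disk areas (assumed value)
END = N  # pointer value meaning "no successor" (an assumption here)


@dataclass
class Entry:
    fault_flag: bool = False  # failure occurrence information 211
    checksum: int = 0         # checksum 212
    next_addr: int = END      # next address pointer 213


@dataclass
class Tracker:
    table: list = field(default_factory=lambda: [Entry() for _ in range(N)])
    last: int = 0      # last address 221 (initial value assumed)
    pre_last: int = 0  # pre-last address 222 (initial value assumed)
    current: int = 0   # current access address 224

    def record_write(self, region: int, checksum: int) -> None:
        self.current = region                      # store the current access address
        self.table[region].checksum = checksum     # checksum of this write
        self.table[self.last].next_addr = region   # link previous write to this one
        self.pre_last = self.last                  # shift last -> pre-last
        self.last = region                         # current -> last


t = Tracker()
t.record_write(3, 0xAAAA)
t.record_write(5, 0xBBBB)
```

After the two writes, the element for area 3 points to area 5, which is the pointer-linked update history the paragraph describes.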

[0020] Next, the state transitions of the fault-tolerant system formed by the two computers 1 and 2 are described with reference to the state transition diagram of FIG. 3. The system can be represented by four states: normal operation 311, in which computers 1 and 2 are both running, the contents of disks 15 and 25 do not differ, and both perform the same operations; single-system operation 312, in which only one computer (1 or 2) is running; system stop 313, in which the system cannot operate; and recovery operation 314, in which the differences between disks 15 and 25 are being transferred.

[0021] State transitions occur in the direction of the arrows under seven conditions: one computer stops 301, when a computer detects an abnormality and stops; the other computer stops 302, when the remaining computer also stops while one is already stopped; transfer destination stops 303, when the restarted computer stops during recovery operation; one computer restarts 304, when the stopped computer is restarted during single-computer operation; transfer source restarts 305, when the computer that had been running longer restarts from the system stop state; transfer source stops 306, when the computer that has been running longer stops abnormally during recovery operation; and disk copy complete 307, when all disk difference information has been transferred.
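The four states and seven conditions can be written as a transition table. The source state of each condition follows from its description, but the target states below are inferred from those descriptions because FIG. 3 is not reproduced here, so they should be read as assumptions. The names `State`, `Event`, and `step` are hypothetical.

```python
from enum import Enum


class State(Enum):
    NORMAL = 311    # normal operation
    SINGLE = 312    # single-system operation
    STOPPED = 313   # system stop
    RECOVERY = 314  # recovery operation


class Event(Enum):
    ONE_STOPS = 301
    OTHER_STOPS = 302
    DEST_STOPS = 303
    ONE_RESTARTS = 304
    SOURCE_RESTARTS = 305
    SOURCE_STOPS = 306
    COPY_DONE = 307


# Target states are inferred from the text, not taken from FIG. 3.
TRANSITIONS = {
    (State.NORMAL, Event.ONE_STOPS): State.SINGLE,
    (State.SINGLE, Event.OTHER_STOPS): State.STOPPED,
    (State.RECOVERY, Event.DEST_STOPS): State.SINGLE,
    (State.SINGLE, Event.ONE_RESTARTS): State.RECOVERY,
    (State.STOPPED, Event.SOURCE_RESTARTS): State.SINGLE,
    (State.RECOVERY, Event.SOURCE_STOPS): State.STOPPED,
    (State.RECOVERY, Event.COPY_DONE): State.NORMAL,
}


def step(state: State, event: Event) -> State:
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# One failure followed by a successful recovery.
s = State.NORMAL
for e in (Event.ONE_STOPS, Event.ONE_RESTARTS, Event.COPY_DONE):
    s = step(s, e)
```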

[0022] Next, the flow of processing from single-system operation 312 back to normal operation 311, with computer 2 stopped by a failure, is described using the flowchart of FIG. 4. First, when failure detection means 27 detects a hardware failure in a device inside computer 2, it stops computer 2 (Step 1). Computer 1 then shifts to failure avoidance mode. That is, data communication means 16 of computer 1 can no longer communicate with data communication means 26 of computer 2, so it detects that computer 2 has stopped and passes this information to disk interface 12. Disk interface 12 recognizes that only computer 1 is running and notifies data monitoring means 13. On receiving this information, data monitoring means 13 writes pre-last address 222 into failure occurrence address 223 and sets failure occurrence information 211 to ON in the elements of data access table 18 indicated by pre-last address 222 and last address 221. This raises the failure flag, and the system shifts to a mode in which information is written to data access table 18 with failure occurrence information 211 set to ON (Step 2).

[0023] Next, the disk accesses requested by CPU 11 are controlled (Step 3). This control is described by Steps 20 to 26 of a separate flow. When CPU 11 requests access to disk 15, the command is passed through disk interface 12 to data monitoring means 13 (Step 20). The command is then analyzed (Step 21). If it is a read command, disk controller 14 is controlled to read the data from disk 15 (Step 26), and the disk access request ends (Step 25). If the analysis in Step 21 shows a write command, disk controller 14 is controlled to write the data to disk 15 (Step 22). At that time, data monitoring means 13 generates a checksum, determines the address of data access table 18 corresponding to the write address, and writes it to current access address 224.

[0024] Next, data access table 18 is built. Current access address 224 is written to next address pointer 213 of the table element at the address indicated by last address 221, and failure occurrence information 211 of the element indicated by current access address 224 is set to ON to show that the write access occurred during single-system operation 312. The checksum generated in Step 22 is written to checksum 212, and a nonexistent address (N) is written to next address pointer 213 to mark the end of the list (Step 23).

[0025] If failure occurrence information 211 is already ON when data access table 18 is updated in Step 23, a write access to that area has already occurred during single-system operation 312. In this case only checksum 212 is rewritten and processing proceeds to Step 25. That is, only the update data information for that address is rewritten, and the disk access request ends (Step 25). If failure occurrence information 211 is OFF when the data access table is updated in Step 23, last address 221 is written to pre-last address 222 and current access address 224 is written to last address 221 to update the data address register (Step 24). Completion of the command is then reported to disk interface 12, and the disk access ends (Step 25).
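Steps 2 and 22 to 25 can be sketched together as below. The sketch checks the failure flag before linking the element, which is one reading of Steps 23 to 25: an area already in the chain only has its checksum refreshed. The class and method names, the starting register values, and N = 8 are assumptions.

```python
from dataclasses import dataclass, field

N = 8
END = N  # the nonexistent address N marks the end of the list


@dataclass
class Entry:
    fault_flag: bool = False  # failure occurrence information 211
    checksum: int = 0         # checksum 212
    next_addr: int = END      # next address pointer 213


@dataclass
class Monitor:
    table: list = field(default_factory=lambda: [Entry() for _ in range(N)])
    last: int = 0          # last address 221
    pre_last: int = 0      # pre-last address 222
    fault_addr: int = END  # failure occurrence address 223
    current: int = 0       # current access address 224

    def enter_single_operation(self) -> None:
        """Step 2: start the chain at pre-last and flag the two newest elements."""
        self.fault_addr = self.pre_last
        self.table[self.pre_last].fault_flag = True
        self.table[self.last].fault_flag = True

    def on_write(self, region: int, checksum: int) -> None:
        """Steps 22-25 for a write while the peer is stopped."""
        self.current = region
        entry = self.table[region]
        if entry.fault_flag:                      # area already in the chain
            entry.checksum = checksum             # refresh the checksum only
            return
        self.table[self.last].next_addr = region  # link after the last element
        entry.fault_flag = True
        entry.checksum = checksum
        entry.next_addr = END                     # new end of the list
        self.pre_last, self.last = self.last, region  # Step 24


m = Monitor(last=2, pre_last=1)
m.table[1].next_addr = 2  # linked during normal operation
m.enter_single_operation()
m.on_write(5, 111)
m.on_write(5, 222)  # same area again: checksum only
m.on_write(2, 333)  # area flagged in Step 2: checksum only
```

Because a repeated write never relinks an element, the chain holds each updated area once, which bounds the later transfer to at most N areas.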

[0026] When the disk access ends in this way, processing moves from Step 3 to Step 4. Data communication means 16 checks whether computer 2 has restarted (Step 4). If NO, processing returns to Step 3 and Steps 20 to 26 are executed again, updating the records with failure occurrence information 211 of the data access table set to ON, so that the data update history during single-system operation 312 is preserved.

[0027] If the check in Step 4 by data communication means 16 is YES, that is, computer 2 has restarted, processing proceeds to Step 5. When computer 2 restarts, it must be confirmed that the data on disk 25 is the same as before the stop. To do this, checksum 212 is taken from each of the elements of data access table 28 pointed to by last address 221 and pre-last address 222 recorded in data access table 28, the disk data actually recorded is read and a checksum is generated from it, and the two are compared to see whether the checksums match those from before the stop. Data communication means 16 of computer 1 receives the result of this comparison and reports it through disk interface 12 to data monitoring means 13 (Step 5).

[0028] Because the data on disk 25 was not updated during single-system operation 312 while computer 2 was stopped, it is stale and CPU 21 cannot use it. CPU 21 therefore does not use disk interface 22. Instead it uses disk interface 12 by way of data communication means 26, link cable 3, and data communication means 16.

[0029] In Step 6 the system shifts to mirror copy mode, which is recovery operation 314 in FIG. 3. If both checksum comparison results passed to data monitoring means 13 match, the data on disk 25 can be judged to be the same as before computer 2 stopped, so the mode that transfers only the data differences is selected. If even one comparison result does not match, the data on disk 25 can be judged to differ from what it was before computer 2 stopped, for example because the disk was replaced, so the mode that transfers all data written on disk 15 is selected (Step 6).
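The check of Step 5 and the decision of Step 6 can be sketched as a single function. CRC-32 stands in for the unspecified checksum, and the names `CopyMode` and `choose_copy_mode`, as well as the dictionary-based inputs, are assumptions made for illustration.

```python
import zlib
from enum import Enum


class CopyMode(Enum):
    DIFFERENTIAL = "differential"  # transfer only the updated areas
    FULL = "full"                  # transfer the whole disk


def choose_copy_mode(stored_checksums: dict, region_data: dict) -> CopyMode:
    """Compare stored checksums (last and pre-last areas) with the disk contents.

    stored_checksums maps area address -> checksum 212 saved before the stop.
    region_data maps area address -> bytes currently on the restarted disk.
    """
    for region, stored in stored_checksums.items():
        if zlib.crc32(region_data[region]) != stored:
            return CopyMode.FULL  # any mismatch forces a full copy
    return CopyMode.DIFFERENTIAL


# Disk unchanged since the stop: both areas verify, so only differences are sent.
stored = {3: zlib.crc32(b"area three"), 4: zlib.crc32(b"area four")}
on_disk = {3: b"area three", 4: b"area four"}
mode = choose_copy_mode(stored, on_disk)
```

Checking only the two most recently written areas is a cheap heuristic for detecting a replaced or altered disk, not a full verification.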

[0030] If Step 6 selects the mode that transfers only the data differences, the data of disk 15 corresponding to the address indicated by failure occurrence address 223 in data address register 2 is transferred first. Next, the disk data corresponding to the address written in next address pointer 213 of the table element indicated by failure occurrence address 223 is transferred, and the remaining areas follow one after another in the same way. To carry this out, processing moves to the disk access of Step 7.
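The order in which areas are transferred in differential mode amounts to walking the pointer chain from the failure occurrence address until the end marker N is reached. The sketch below assumes the next address pointers are held in a plain list and adds a loop guard that the text does not mention; `regions_to_copy` is a hypothetical name.

```python
def regions_to_copy(next_addr: list, fault_addr: int, n: int):
    """Yield table addresses to transfer, following next address pointer 213.

    next_addr[i] is the pointer of element i; the value n marks the end of the list.
    """
    region = fault_addr
    for _ in range(n):  # a valid chain visits at most n areas
        if region == n:
            return
        yield region
        region = next_addr[region]
    if region != n:
        raise RuntimeError("pointer chain does not terminate")


# Chain recorded during the outage: 1 -> 2 -> 5 -> end.
n = 8
pointers = [n] * n
pointers[1] = 2
pointers[2] = 5
order = list(regions_to_copy(pointers, fault_addr=1, n=n))
```

With N = 8 and three updated areas, three areas are transferred instead of eight.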

[0031] In Step 6, when the mode is the one that transfers all of the data, all of the data on disk 15 must be transferred to disk 25. The difference information in data access table 18 is therefore discarded, and the following processing is performed so that all of the data can be transferred. For an address (i), the failure occurrence information 211 is set to ON and address (i+1) is written to the next address pointer 213; this is done for every address from address (0) to address (N-1). When data access table 18 has been updated, (N-1) is written to the last address 221, (N-2) to the pre-last address 222, and (0) to the failure occurrence address 223, and the process moves to the disk access of Step 7.
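The table rewrite for the full-transfer mode can be sketched as follows. This is a minimal illustration in Python, not the patented implementation; the class and field names (`Entry`, `AddressRegister`, `prepare_full_copy`) are hypothetical, and only the reference numerals in the comments come from the text.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    fault: bool = False   # failure occurrence information 211
    checksum: int = 0     # checksum 212
    next_addr: int = 0    # next address pointer 213


@dataclass
class AddressRegister:
    last: int = 0         # last address 221
    pre_last: int = 0     # pre-last address 222
    fault_addr: int = 0   # failure occurrence address 223
    current: int = 0      # current access address 224


def prepare_full_copy(table, reg):
    """Chain every block 0..N-1 so that the whole disk is transferred."""
    n = len(table)
    for i, entry in enumerate(table):
        entry.fault = True        # mark block i as needing transfer
        entry.next_addr = i + 1   # block N-1 points at N, the end marker
    reg.last = n - 1
    reg.pre_last = n - 2
    reg.fault_addr = 0            # the copy starts from block 0
```

Because the last entry points at N, the same traversal that handles a differential list also terminates correctly for a full copy.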

[0032] Step 7 handles write accesses from CPU 11 or CPU 21 and performs exactly the same processing as Step 3. In other words, the data on disk 15 can still be updated during the mirror copy. If there is no write access request from CPU 11 or CPU 21, the disk access of Step 7 is skipped and the process proceeds to the disk copy of Step 8.

[0033] In Step 8, computer 1 transfers to disk 25 the data on disk 15 corresponding to the data access table entry indicated by the failure occurrence address 223 of data access table 18. In computer 2, data monitoring means 23 performs the same processing as in normal operation: on a write access it generates a checksum and updates the current access address 224. It updates the checksum 212 of the entry of data access table 28 indicated by the current access address 224, and writes the current access address 224 to the next address pointer 213 of the entry indicated by the last address 221. It then copies the contents of the last address 221 to the pre-last address 222 and the contents of the current access address 224 to the last address 221 (Step 8).
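The bookkeeping performed on each write access can be sketched in Python as below. The function name `record_write` and the dictionary layout are hypothetical, and CRC-32 is used only as a stand-in, since the text does not say which checksum algorithm is used.

```python
import zlib


def record_write(table, reg, addr, data):
    """Update the data access table and address register for one write.

    table: list of dicts with keys "checksum" (212) and "next" (213)
    reg:   dict with keys "current" (224), "last" (221), "pre_last" (222)
    """
    reg["current"] = addr                        # current access address 224
    table[addr]["checksum"] = zlib.crc32(data)   # assumed checksum algorithm
    table[reg["last"]]["next"] = addr            # link the previous block to this one
    reg["pre_last"] = reg["last"]                # last address -> pre-last address
    reg["last"] = addr                           # current address -> last address
```

The effect is a singly linked list of written blocks in write order. The sketch does not handle a second write to a block that is already in the chain.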

[0034] Next, in Step 9, the data address register is updated. When the write access to disk 25 of computer 2 finishes normally, data monitoring means 23 notifies data monitoring means 13 of computer 1 of the completion via disk interface 22, data communication means 26, link cable 3, data communication means 16, and disk interface 12. Once computer 1 has confirmed the normal-completion information, it sets to OFF the failure occurrence information 211 in the entry of data access table 18 indicated by the failure occurrence address 223. It also writes the next address pointer 213 of that entry to the failure occurrence address 223 (Step 9).

[0035] Then, in Step 10, it is determined whether mirroring has finished. If the failure occurrence address 223 of data access table 18 is not N (NO), the process returns to the disk access of Step 7. If it is N (YES), the list has no further entries, so the two computers are returned to the normal dual system (Step 11). In other words, by repeating Steps 7 to 9 until the next address pointer of data access table 18 reaches the address (N) that marks the end, the data updated during single-system operation 312 can be transferred.
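The loop formed by Steps 8 to 10 can be sketched in Python as below. This is an illustration under simplifying assumptions: the disks are modeled as plain lists of blocks, the completion notice from computer 2 is assumed to always report success, and the write handling of Step 7 is left out. The name `mirror_copy` and the dictionary keys are hypothetical.

```python
def mirror_copy(table, fault_addr, src_disk, dst_disk):
    """Copy every block on the chain starting at fault_addr.

    table:      list of N dicts with keys "fault" (211) and "next" (213)
    fault_addr: failure occurrence address 223; N marks the end of the list
    Returns the block addresses copied, in transfer order.
    """
    n = len(table)
    copied = []
    while fault_addr != n:                            # Step 10: N means finished
        dst_disk[fault_addr] = src_disk[fault_addr]   # Step 8: transfer one block
        entry = table[fault_addr]
        entry["fault"] = False                        # Step 9: clear the flag
        copied.append(fault_addr)
        fault_addr = entry["next"]                    # Step 9: advance along the chain
    return copied
```

With a five-block disk whose chain is 1, 3, end, calling `mirror_copy(table, 1, src, dst)` copies only blocks 1 and 3 and leaves the other blocks of `dst` untouched, which is the saving that the differential mode is meant to provide.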

[0036] In Step 11, data monitoring means 13 shifts to the normal operation mode. It notifies CPU 21 that the system is in normal mode via disk interface 12, data communication means 16, link cable 3, and data communication means 26. Because the transfer of the differential data between disk 15 and disk 25 has completed, the data on disk 25 is restored. CPU 21 of computer 2 then uses disk interface 22 for disk accesses, and the system returns to the normal dual system.

[0037] The embodiment described above is one example for explaining the present invention. The present invention is not limited to this embodiment, and various modifications are possible within the scope of the gist of the invention. For example, it goes without saying that the present invention can also be applied to a redundant network server system in which the same files are stored on multiple computers.

[0038]

[Effects of the Invention] As described above, in a fault-tolerant system built from duplicated computers, there are situations in which a defective part cannot be replaced without stopping the computer once a hardware failure has been found. In particular, when the hard disks of the two systems are mirrored, restoring a stable system from single-system operation requires making the contents of the two hard disks identical. In the conventional method, when one computer recovered from a down state and its hard disk information was restored, all of the information on the disk of the computer that had kept running was copied to the disk of the recovered computer. Disk mirroring therefore took considerable time to complete, the mirror copy time for disk data recovery accounted for most of the time needed to recover from the failure, and system reliability suffered. By applying the present invention, only the minimum necessary data needs to be transferred when the disk data is restored, so the recovery time is greatly reduced and a fault-tolerant system with higher reliability than before can be built.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a computer system according to an embodiment of the present invention.

FIG. 2 shows the algorithm of the processing performed by the data access table in FIG. 1.

FIG. 3 is a state transition diagram showing the states that the fault-tolerant system of FIG. 1, composed of two computers, can take.

FIG. 4 is a flowchart showing the flow of processing from the failure of one computer to its recovery in the configuration of FIG. 1.

[Explanation of symbols]

1, 2: computer; 11, 21: CPU; 12, 22: disk interface; 13, 23: data monitoring means; 14, 24: disk controller; 15, 25: disk; 16, 26: data communication means; 17, 27: failure detection means; 18, 28: data access table

Claims

(57) [Claims]

1. A mirror disk recovery method in a fault-tolerant system, the fault-tolerant system being configured to improve reliability by having two opposing computers of identical hardware configuration perform the same operation while communicating with each other, wherein the disks provided in the two computers are mirrored so as to hold the same information, and recovery processing is performed after one of the computers has been stopped by a failure other than a disk failure,
wherein a special storage area is provided that consists of N elements associated with the storage areas obtained by dividing the storage area of the disk into N (N being a natural number of 2 or more) and that stores, in each element corresponding to a storage area updated while the computer was stopped, information on the updated differential data, and, when the failed computer recovers and rejoins the dual system and its disk data is restored, the normal computer transfers to the rejoining computer only the data on its disk that corresponds to the differential data information stored in its own special storage area, thereby returning to the dual system,
each computer constituting the dual system comprising:
a CPU (central processing unit) that performs the main computation;
a disk interface that interfaces the disk with other hardware;
a disk controller that accesses the data on the disk directly;
a disk used to store data;
data communication means that enables data to be exchanged between the computers;
failure detection means that monitors the hardware of the entire computer for failures and has a function of stopping the computer itself when a failure is found;
a data access table that serves as the special storage area for recording the differential data information created while the dual system is in single-system operation or recovery operation, that consists of N elements associated with the storage areas obtained by dividing the storage area of the disk into N (N being a natural number of 2 or more), and that stores the updated differential data information in each element corresponding to a storage area updated while the computer was stopped; and
data monitoring means that determines whether a disk access command is a data read or a data write and creates the data to be written to the data access table,
wherein the data access table stores the differential data information updated while the other computer is stopped and during the recovery work, and
for a storage area, among the N divided storage areas of the disk, that has already been rewritten, the data monitoring means changes only the setting of the checksum calculated when the data is written.

2. The mirror disk recovery method in a fault-tolerant system according to claim 1, wherein the data access table is implemented in a flash memory storage medium that retains its contents even when the power is turned off.