JP2953639B2

JP2953639B2 - Backup device and method thereof

Info

Publication number: JP2953639B2
Application number: JP4349854A
Authority: JP
Inventors: 讓真矢; 英明源馬; 俊之木下
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-12-02
Filing date: 1992-12-02
Publication date: 1999-09-27
Anticipated expiration: 2014-09-27
Also published as: JPH06175788A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a backup apparatus and method suitable for a system in which a plurality of active units and a standby unit common to them are connected via a network, and in particular to an apparatus and method capable of achieving high performance and high reliability.

[0002]

[Prior Art] In recent years, computer systems have grown larger in scale and wider in geographic reach, and have shifted to distributed systems built on networks. Processing is now dominated by online transaction processing, and many services are moving toward continuous operation 24 hours a day, 365 days a year. Systems built on such networks are therefore required to deliver both higher performance and higher reliability.

[0003] To achieve high performance and high reliability, it has become common for computer systems to connect a host, which performs general business processing, to a front-end processor (FEP), which performs communication processing, over a high-speed local area network (LAN). High performance and high reliability have then been pursued by implementing the FEP in a hot standby configuration. In the hot standby method, a backup FEP is always kept in a standby state, and when any FEP fails, processing continues on the backup FEP.

[0004] Similarly, some systems connect several main systems and a backup standby system over a high-speed LAN and adopt a backup method in which the standby system continues processing when a main system fails.

[0005] For example, according to the catalog for the HP9000 series (trade name) from HP, the series supports an n-to-1 backup method as its Switch/Over function. In this method, n main systems (active units) and one standby system (standby unit) are connected by a LAN, a duplexed shared disk device is connected to these systems, and when an active unit fails, the standby unit takes over control of that active unit's disks from that point on. The standby unit normally runs processes unrelated to the processing of the active units (for example, batch processing). When an active unit fails, the processes on the standby unit are aborted (shut down) and the standby unit is rebooted. The standby unit then takes over control of the active unit's disks.

[0006] Meanwhile, Nikkei Electronics, May 18, 1992 issue (No. 554), discloses the backup method used in HA/6000 and HANFS/6000 (trade names) from IBM. These are almost identical to HP's Switch/Over function described above: the standby unit runs applications of relatively low importance, and when an active unit fails, the standby unit is first rebooted.

[0007] Some FEPs also employ a fault-tolerant computer (FTC). For example, as described in "Fault-Tolerant Systems" (by Gray, published by Maruzen), Tandem duplexes the shared disk device itself in its fault-tolerant systems.

[0008]

[Problems to Be Solved by the Invention] In the conventional methods described above, the shared disk device is duplexed, and the n active units and the one standby unit all share it. As a result, each unit's accesses to the shared disk device compete with those of the other active units, and the shared disk device is accessed frequently. This lengthens read and write times on the shared disk device and degrades performance.

[0009] Furthermore, in the conventional examples above, the standby unit normally runs processes unrelated to the processing of the active units, and the active units do not capture the mid-execution control data needed to resume processing from as close as possible to the point of failure. Consequently, when an active unit fails, the standby unit has to abort its processes and reboot.

[0010] In addition, because the processes of the failed active unit are aborted and processing is restarted from the beginning, the system is down for a long time, on the order of 15 minutes.

[0011] An object of the present invention is to solve these conventional problems and to provide an n:1 backup apparatus and method that reduce contention for disk access and improve performance.

[0012] A further object of the present invention is to provide an n:1 backup apparatus and method that allow a process in mid-execution to be taken over even when an active unit fails.

[0013]

[Means for Solving the Problems] In a system composed of n active units and a standby unit that serves as their common spare, a disk device shared by all the active units and the standby unit is provided. This shared disk device is divided into n+1 areas, one for each active unit and one for the standby unit. Each active unit reads from its internal disk device and writes to both its internal disk device and the shared disk device.

[0014] When it issues a write I/O, the active unit also stores the received message, the processor registers, the line information, and the process execution state as checkpoint data. If the active unit subsequently fails, the standby unit detects the failure and takes over the active unit's processing from the checkpoint.

[0015]

[Operation] In the present invention, the system preferably connects n active units and one common standby unit by a high-speed LAN. The shared disk device shared by the active units and the standby unit forms a duplexed configuration with the disk devices built into the active units and the standby unit. An active unit can therefore read from its internal disk device and write to both its internal disk device and the shared disk device. Unlike the conventional methods, this avoids frequent contention for access to the shared disk, so higher performance can be achieved.

[0016] When an active unit fails, the standby unit detects the failure from the absence of alive messages. The standby unit resets the failed active unit. The standby unit can then read from the shared disk device and take over the processing.

[0017] By this means, the standby unit refers to the checkpoint data stored in the shared disk device (information on messages being executed, information on messages waiting for I/O, and information on messages whose I/O processing has completed and that are in the ready state) and can thereby resume the messages being executed, the messages waiting for I/O, and the messages in the ready state from the latest checkpoint.

[0018] Even when an active unit fails, the standby unit can obtain all of the messages being executed, the messages waiting for I/O, and the messages in the ready state as of the latest checkpoint by referring to the checkpoint data stored in the shared disk device. As a result, the standby unit can read out the received messages, the processor registers, the line information, and the process execution state, and resume processing of all the messages it takes over.

[0019] In addition, the shared disk device is accessed only for writes, which reduces access contention and improves performance.

[0020]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. First, the system configuration and hardware configuration of this embodiment are described, mainly with reference to FIGS. 1 to 10.

[0021] FIG. 1 is a system configuration diagram of a high-performance, high-reliability system according to an embodiment of the present invention. The system of this embodiment includes seven active units (11 to 17), one standby unit (10) that serves as their common spare, and a high-speed LAN (1) that connects the active units (11 to 17) and the standby unit (10). A communication processing server (2) is also provided and connected to the high-speed LAN (1).

[0022] The communication processing server (2) is used to send and receive messages between the active units (11 to 17) or the standby unit (10) and 21 terminals (7-0 to 7-20).

[0023] The communication processing server (2) consists of an active communication processing server (2-0), a standby communication processing server (2-1), a communication processing server disk device (2-2) that the two share, and a timer (2-3). The timer (2-3) is used to attach a time stamp (reference numeral 73 in FIG. 30, described later) to each received message. The disk device (2-2) of the communication processing server (2) stores the time stamps (73) of the messages being executed on the active units (11 to 17) and a conversion table (2-4) between the terminals (7-0 to 7-20) and the active units (11 to 17). The conversion table (2-4) is described in detail later with reference to FIG. 8.

[0024] The line switching device (3) is used to switch the connection between the many terminals (7) and either the active communication processing server (2-0) or the standby communication processing server (2-1).

[0025] The disk subsystem (4) is connected to the active units (11 to 17) and the standby unit (10) so that the shared disk device (5) can be accessed from them. The shared disk device (5) is a disk device shared by the active units (11 to 17) and the standby unit (10).

[0026] FIG. 2 is a configuration diagram of the active units (11 to 17) and the standby unit (10) of FIG. 1. The active unit (11) is used as the example below.

[0027] The active unit (11) consists of a processor (11-1), a memory (11-2), an input/output processor (hereinafter, IOP) (11-3), a disk controller (11-4), a LAN adapter (11-5), and an internal disk device (11-6).

[0028] The other active units (12 to 17) and the standby unit (10) have the same configuration as the active unit (11). Like the active unit (11), each consists of a processor (12-1 to 17-1, 10-1), a memory (12-2 to 17-2, 10-2), an IOP (12-3 to 17-3, 10-3), a disk controller (12-4 to 17-4, 10-4), a LAN adapter (12-5 to 17-5, 10-5), and an internal disk device (12-6 to 17-6, 10-6).

[0029] FIG. 3 shows how the shared disk device (5) of FIG. 1 is allocated. Because the system of this embodiment has seven active units and one standby unit, the shared disk device (5) is divided into eight areas.

[0030] The first area (5-1) of the shared disk device (5) is for the active unit (11); it stores the same contents as the internal disk device (11-6) of the active unit (11), which makes the pair a duplexed configuration. The next area (5-2) is duplexed with the internal disk device (12-6) of the active unit (12). Likewise, each area (5-x) of the shared disk device stores the same contents as the internal disk device (1x-6) of the active unit (1x), forming a duplexed configuration. The last area (5-0) is duplexed with the internal disk device (10-6) of the standby unit (10).
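The allocation rule of FIG. 3 can be sketched as a small lookup. This is only an illustration, not part of the patent: the function names `shared_area_for_unit` and `internal_disk_for_unit` are hypothetical, and the strings are simply the reference numerals used in the text.

```python
def shared_area_for_unit(unit: int) -> str:
    """Return the reference numeral of the shared-disk area paired with a unit.

    Units 11..17 are the active units and unit 10 is the standby unit, so
    unit 1x maps to area 5-x and the standby unit maps to area 5-0.
    """
    if not 10 <= unit <= 17:
        raise ValueError(f"unknown unit: {unit}")
    return f"5-{unit - 10}"


def internal_disk_for_unit(unit: int) -> str:
    """Return the reference numeral of a unit's internal disk device (1x-6)."""
    if not 10 <= unit <= 17:
        raise ValueError(f"unknown unit: {unit}")
    return f"{unit}-6"


# Each internal disk and its paired area hold the same contents (duplexing).
PAIRS = {internal_disk_for_unit(u): shared_area_for_unit(u) for u in range(10, 18)}
```

With seven active units and one standby unit, `PAIRS` has eight entries, matching the eight areas described for this embodiment.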

[0031] FIG. 4 shows the flow of data reads and writes in the system of the embodiment and illustrates a characteristic feature of this embodiment. The figure shows the active unit (11) and the active unit (17), together with the disk subsystem (4) and the shared disk device (5). The arrows indicate the data flow for reads and the data flow for writes.

[0032] As shown in FIG. 3, the disk devices in this embodiment are duplexed by the shared disk device (5) and the internal disk devices (11-6 to 17-6) of the active units (11 to 17) or the standby unit (10). As shown in FIG. 4, reads are served from the internal disk devices (11-6 to 17-6), which have fast access and involve no contention with the other active units (11 to 17) or the standby unit (10). Writes go to both the internal disk device (11-6 to 17-6) and the shared disk device (5). System performance can be improved in this way.
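The read and write paths described above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions: the class name `DuplexedDisk` and the use of dictionaries in place of disk devices are hypothetical and only serve to show that reads stay local while writes reach both copies.

```python
class DuplexedDisk:
    """Toy model of one unit's internal disk paired with its shared-disk area."""

    def __init__(self):
        self.internal = {}      # stands in for the internal disk device (1x-6)
        self.shared_area = {}   # stands in for the unit's area (5-x) of the shared disk
        self.shared_writes = 0  # number of writes sent to the shared disk

    def read(self, block):
        # Reads are served from the internal disk only, so they never
        # contend with other units for the shared disk.
        return self.internal[block]

    def write(self, block, data):
        # Every write is applied to both copies so that they stay identical.
        self.internal[block] = data
        self.shared_area[block] = data
        self.shared_writes += 1
```

In this model the shared disk is touched only by `write`, which is the property the text relies on to reduce contention.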

[0033] Furthermore, in this embodiment, checkpoint data (FIG. 18) is stored in the disk devices (5, 11-6 to 17-6) so that message processing (reference numeral 730 in FIG. 25) is not interrupted. The checkpoint data is described in detail later; it comprises the data needed to take over the received messages, the processor registers, the line information, the message states, and the process execution state, together with the data to be written to the disk device. A checkpoint is taken whenever a write to the disk devices (5, 11-6 to 17-6) occurs.
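As a rough illustration of the checkpoint contents listed above, the following sketch records a checkpoint each time a write is issued. The field names, the `"checkpoint"` key, the state labels `"running"`, `"io_wait"`, and `"ready"`, and both function names are assumptions made for this example; the actual layout is the one shown in FIG. 18, which is not reproduced here.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class Checkpoint:
    received_messages: list    # messages received so far
    processor_registers: dict  # register name -> value
    line_info: dict            # state of the communication lines
    message_states: dict       # message id -> "running" | "io_wait" | "ready"
    process_state: dict        # execution state of the process
    write_data: tuple          # (block, data) of the write that triggered it


def write_with_checkpoint(unit_state, internal, shared, block, data):
    """Apply one write to both disks and store a checkpoint alongside it."""
    cp = Checkpoint(
        received_messages=deepcopy(unit_state["received_messages"]),
        processor_registers=deepcopy(unit_state["processor_registers"]),
        line_info=deepcopy(unit_state["line_info"]),
        message_states=deepcopy(unit_state["message_states"]),
        process_state=deepcopy(unit_state["process_state"]),
        write_data=(block, data),
    )
    for disk in (internal, shared):
        disk[block] = data
        disk["checkpoint"] = cp  # only the latest checkpoint is kept
    return cp


def messages_to_resume(shared):
    """Group message ids by state, using the latest checkpoint on the shared disk."""
    groups = {"running": [], "io_wait": [], "ready": []}
    for msg_id, state in shared["checkpoint"].message_states.items():
        groups[state].append(msg_id)
    return groups
```

A standby unit taking over would call something like `messages_to_resume` on the failed unit's shared-disk area to find the messages it has to restart.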

[0034] FIG. 5 is a configuration diagram of the disk controller in an active unit or the standby unit of FIG. 2. The disk controllers (10-4 to 17-4) of the active units (11 to 17) and the standby unit (10) all have the same configuration, so the disk controller (17-4) of active unit 7 (17) is used as the example here.

[0035] The disk controller (17-4) consists of a processor (17-4-1), a memory (17-4-2), a buffer (17-4-3), and a disk control unit (17-4-4).

[0036] The disk controller (17-4) holds the read data and write data for the internal disk device (17-6) and the shared disk device (5). Data is normally read from the internal disk device (17-6), and the data read is stored in a receive queue (17-4-5) in the buffer (17-4-3). Data is written to both the internal disk device (17-6) and the shared disk device (5), and the data to be written is stored in a send queue (17-4-6) in the buffer (17-4-3).

[0037] FIG. 6 is a configuration diagram of the LAN adapter in an active unit or the standby unit of FIG. 2. The LAN adapters (10-5 to 17-5) of the active units (11 to 17) and the standby unit (10) all have the same configuration, so the LAN adapter (17-5) of active unit 7 (17) is used as the example here.

[0038] The LAN adapter (17-5) consists of a processor (17-5-1), a memory (17-5-2), a buffer (17-5-3), and a LAN control unit (17-5-4).

[0039] The buffer (17-5-3) of the LAN adapter (17-5) contains a message receive queue (17-5-5) and a message send queue (17-5-6). These hold the messages received from the other active units (12 to 17) or the standby unit (10) and the messages to be sent to the other active units (12 to 17) or the standby unit (10). The receive queue (17-5-5) also stores messages received from the terminals (7), and the send queue (17-5-6) stores messages to be sent to the terminals (7).

[0040] FIG. 7 is a configuration diagram of the communication processing server (2) of FIG. 1. The communication processing server (2) consists of the active communication processing server (2-0), the standby communication processing server (2-1), the communication processing server disk device (2-2) that they share, and the timer (2-3).

[0041] The active communication processing server (2-0) consists of a processor (2-0-1), a memory (2-0-2), a buffer (2-0-3), a line control unit (2-0-4), and a disk controller (2-0-7). The standby communication processing server (2-1) has the same configuration as the active communication processing server (2-0).

[0042] The buffer (2-0-3) of the active communication processing server (2-0) contains a message receive queue (2-0-5) and a message send queue (2-0-6), which hold the messages received from the terminals (7) and the messages to be sent to the terminals (7). That is, the receive queue (2-0-5) stores messages received from the terminals (7), and the send queue (2-0-6) stores messages to be sent to the terminals (7). The same applies to the standby communication processing server (2-1).

[0043] FIG. 8 shows the conversion table between the communication processing server (2) and the terminals (7). This conversion table is stored in the disk device (2-2) of the communication processing server of FIG. 7. The conversion table indicates which terminal is connected to which active unit when the communication processing server (2) connects the terminals (7) to the active units. For example, the active unit (11) is connected to the terminals (7-0, 7-7, 7-14). The other active units (12 to 17) are likewise connected to the terminals (7) as shown in the conversion table.
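The conversion table can be pictured as a mapping from terminal to active unit. The text gives only one row (unit 11 serves terminals 7-0, 7-7, and 7-14); the sketch below assumes, purely for illustration, that the remaining rows in FIG. 8 follow the same round-robin pattern. The function names are hypothetical.

```python
ACTIVE_UNITS = list(range(11, 18))  # active units 11..17
NUM_TERMINALS = 21                  # terminals 7-0 .. 7-20


def build_conversion_table() -> dict:
    """Map each terminal label to an active unit.

    Assumption: terminals are spread round-robin over the active units,
    which reproduces the one row stated in the text (unit 11).
    """
    return {
        f"7-{t}": ACTIVE_UNITS[t % len(ACTIVE_UNITS)]
        for t in range(NUM_TERMINALS)
    }


def terminals_for_unit(table: dict, unit: int) -> list:
    """Return the terminals that the table connects to the given unit."""
    return [terminal for terminal, u in table.items() if u == unit]
```

Under that assumption, `terminals_for_unit(build_conversion_table(), 11)` yields the three terminals named in the text.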

[0044] FIG. 9 is a configuration diagram of the line switching device (3) of FIGS. 1 and 7. The line switching device (3) contains a line switching circuit as shown in the figure. When the standby communication processing server (2-1) detects a failure of the active communication processing server (2-0), it switches the line switching device (3) by means of the contention prevention circuit (3-0).

[0045] FIG. 10 is a configuration diagram of the disk subsystem (4) of FIGS. 1, 4, and 5. The disk subsystem (4) consists of a processor (4-1), a memory (4-2), an IOP (4-3), a shared disk device control unit (4-4), and internal disk device control units (4-5-0 to 4-5-7).

[0046] The shared disk device control unit (4-4) controls writes to and reads from the shared disk device (5) and holds the data for each. The internal disk device control units (4-5-0 to 4-5-7) are connected to the active units (11 to 17) or the standby unit (10), respectively; each receives the data written to the corresponding internal disk device (10-6 to 17-6) or sends read data.

[0047] The hardware configuration of the system of this embodiment has been described above with reference to FIGS. 1 to 10.

[0048] Next, the state transitions of the active units and the standby unit are described with reference to FIG. 11. FIG. 11 is a state transition diagram of the active units and the standby unit having the configuration described above.

[0049] In this embodiment, every active unit and the standby unit (10 to 17) has its own internal disk device (10-6 to 17-6), and a shared disk device (5) shared by all of them is also provided. Failures are therefore classified into system failures and disk failures. Disk failures (failures of a disk device) are further divided into failures of the shared disk device (5) and failures of an internal disk device (10-6 to 17-6).

[0050] A system failure is a failure that affects an active unit (11 to 17) or the standby unit (10). When a system failure occurs in any of the active units (11 to 17), switching to the standby unit (10) is mandatory. Hereinafter, a system failure is simply called a failure. With a disk failure of an internal disk device (10-6 to 17-6), processing can continue as long as access to that internal disk device (1x-6) is suspended.
Hereinafter, a system failure is simply referred to as a failure. In the case of a disk failure in the internal disk device (10-6 to 17-6), the processing can be continued by interrupting access to the internal disk device (1x-7).

[0051] Accordingly, the following five states (150 to 154) are defined for the active units and the standby unit (10 to 17).

[0052] The active state (150) is the state in which the unit is executing processing normally as an active unit. The semi-active state (151) is the state in which either the internal disk device (11-6 to 17-6) or the shared disk device (5) has failed, and processing continues using the remaining disk device. The standby state (152) is the state in which the unit can take over processing immediately if an active unit (11 to 17) fails. The offline state (153) is the state in which the unit is disconnected from the system because of a failure or for maintenance. The repair state (154) is the state in which the unit is recovering from a failure or starting up.

[0053] Next, the state transitions of the active units (11 to 17) and the standby unit (10) are described.

[0054] If the internal disk device (11-6 to 17-6) fails in the active state (150), access to the internal disk device (11-6 to 17-6) is suspended and the unit moves to the semi-active state (151) (state transition 155). If the shared disk device (5) fails in the active state (150), access to the shared disk device (5) is suspended and the unit likewise moves to the semi-active state (151) (state transition 155). If a failure occurs in the active state (150), the unit moves to the offline state (153) (state transition 157), and the standby unit (10) moves from the standby state (152) to the active state (150) (state transition 158).

[0055] If a failure occurs in the semi-active state (151), the unit moves to the offline state (153) (state transition 160), and the standby unit (10) moves from the standby state (152) to the active state (150) (state transition 158). If the internal disk device (11-6 to 17-6) or the shared disk device (5) recovers in the semi-active state (151), the unit moves to the active state (150) (state transition 156).

[0056] If a failure occurs in the standby state (152), the unit moves to the offline state (153) (state transition 159). If a failure occurs in the repair state (154), the unit moves to the offline state (153) (state transition 163). When repair is completed in the offline state (153), the unit moves to the repair state (154) (state transition 164). Once the unit begins receiving alive messages, it moves from the repair state (154) to the standby state (152) (state transition 162).
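The unit states and transitions of FIG. 11 described in paragraphs [0052] to [0056] can be summarized as a transition table. The enum values and transition numbers are the reference numerals from the text; the event names (`"disk_failure"`, `"takeover"`, and so on) and the `step` function are hypothetical labels introduced for this sketch.

```python
from enum import Enum


class UnitState(Enum):
    ACTIVE = 150
    SEMI_ACTIVE = 151
    STANDBY = 152
    OFFLINE = 153
    REPAIR = 154


# (current state, event) -> (next state, state transition numeral)
TRANSITIONS = {
    (UnitState.ACTIVE, "disk_failure"): (UnitState.SEMI_ACTIVE, 155),
    (UnitState.SEMI_ACTIVE, "disk_recovered"): (UnitState.ACTIVE, 156),
    (UnitState.ACTIVE, "failure"): (UnitState.OFFLINE, 157),
    (UnitState.STANDBY, "takeover"): (UnitState.ACTIVE, 158),
    (UnitState.STANDBY, "failure"): (UnitState.OFFLINE, 159),
    (UnitState.SEMI_ACTIVE, "failure"): (UnitState.OFFLINE, 160),
    (UnitState.REPAIR, "alive_received"): (UnitState.STANDBY, 162),
    (UnitState.REPAIR, "failure"): (UnitState.OFFLINE, 163),
    (UnitState.OFFLINE, "repair_done"): (UnitState.REPAIR, 164),
}


def step(state: UnitState, event: str) -> tuple:
    """Return (next state, transition numeral), or raise if the pair is undefined."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None
```

For example, a failed active unit that is later repaired would pass through `OFFLINE`, `REPAIR`, and `STANDBY` by applying `"failure"`, `"repair_done"`, and `"alive_received"` in turn.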

[0057] An alive message is a message that an active unit (11 to 17) sends to the standby unit (10) to indicate that it is in the active state (150). The standby unit (10) continuously checks the alive messages from each active unit, and when they stop arriving, it knows that the active unit has failed. Conversely, the standby unit (10) also sends alive messages to the active units (11 to 17) to indicate that it is operating, and each active unit (11 to 17) checks these to detect a failure of the standby unit (10).
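Failure detection from missing alive messages might look like the following sketch. The text does not state a sending interval or a timeout, so the `timeout` parameter, the class name `AliveMonitor`, and its methods are assumptions made for illustration. Times are passed in explicitly to keep the example deterministic.

```python
class AliveMonitor:
    """Tracks the time of the last alive message seen from each monitored unit."""

    def __init__(self, units, timeout: float, start: float = 0.0):
        self.timeout = timeout  # assumed threshold; not specified in the text
        self.last_seen = {unit: start for unit in units}

    def record_alive(self, unit, now: float) -> None:
        """Note that an alive message from `unit` arrived at time `now`."""
        self.last_seen[unit] = now

    def failed_units(self, now: float) -> list:
        """Return units whose alive messages have been missing longer than the timeout."""
        return sorted(
            unit
            for unit, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

The standby unit would run one such monitor over the active units (11 to 17), and each active unit could run a one-entry monitor for the standby unit (10).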

[0058] Next, the modes of the disk devices and the transitions between them are described with reference to FIG. 12. FIG. 12 shows the modes and mode transitions of the internal disk devices (11-6 to 17-6) and the shared disk device (5).

[0059] Five modes (170 to 174) are defined for the internal disk devices (11-6 to 17-6) and the shared disk device (5): dual mode (170), single mode (171), semi-dual mode (172), repair mode (173), and down mode (174).

【００６０】デュアルモード（１７０）は、内蔵ディス
ク装置（１１−６〜１７−６）と共有ディスク装置
（５）が正常である状態を示す。このモードでは、内蔵
ディスク装置（１１−６〜１７−６）から読み出し、内
蔵ディスク装置（１１−６〜１７−６）と共有ディスク
装置（５）に書き込む。シングルモード（１７１）は、
内蔵ディスク装置（１１−６〜１７−６）あるいは共有
ディスク装置（５）で障害が発生し、一方のディスク装
置のみ（５あるいは１１−６〜１７−６）で処理を実行
する状態を示す。Dual mode (170) indicates a state in which the internal disk devices (11-6 to 17-6) and the shared disk device (5) are both normal. In this mode, data is read from the internal disk devices (11-6 to 17-6) and written to both the internal disk devices (11-6 to 17-6) and the shared disk device (5). Single mode (171) indicates a state in which a failure has occurred in an internal disk device (11-6 to 17-6) or in the shared disk device (5), and processing is executed using only the one remaining disk device (5, or 11-6 to 17-6).

【００６１】準デュアルモード（１７２）は、内蔵ディ
スク装置（１１−６〜１７−６）が内蔵ディスク装置障
害から復旧し、復旧した内蔵ディスク装置（１１−６〜
１７−６）に共有ディスク装置（５）の内容をコピー中
である状態を示す。あるいは、共有ディスク装置（５）
が共有ディスク装置障害から復旧し、復旧した共有ディ
スク装置（５）に内蔵ディスク装置（１１−６〜１７−
６）の内容をコピー中である状態を示す。正常なディス
ク装置から読み出し、内蔵ディスク装置（１１−６〜１
７−６）または共有ディスク装置（５）に書き込むこと
になる。Quasi-dual mode (172) indicates a state in which an internal disk device (11-6 to 17-6) has recovered from an internal disk device failure and the contents of the shared disk device (5) are being copied to the recovered internal disk device (11-6 to 17-6). Alternatively, it indicates a state in which the shared disk device (5) has recovered from a shared disk device failure and the contents of the internal disk devices (11-6 to 17-6) are being copied to the recovered shared disk device (5). In this mode, data is read from the normal disk device and written to the internal disk device (11-6 to 17-6) or the shared disk device (5).

【００６２】修復モード（１７３）は、内蔵ディスク装
置（１１−６〜１７−６）と共有ディスク装置（５）が
共に初期状態あるいは障害から修復した状態である。ダ
ウンモード（１７４）は、障害や保守により、内蔵ディ
スク装置（１１−６〜１７−６）と共有ディスク装置
（５）が共に障害状態であることを示す。Repair mode (173) is a state in which the internal disk devices (11-6 to 17-6) and the shared disk device (5) are both in the initial state or have both been repaired after a failure. Down mode (174) indicates that the internal disk devices (11-6 to 17-6) and the shared disk device (5) are both in a failed state because of a failure or maintenance.
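
The read and write rules of modes 170 to 172 can be sketched as a small lookup. This is only an illustration: the names `DiskMode`, `io_targets`, and `healthy` are hypothetical, and the quasi-dual write rule (writing to both disks while the copy runs) is an assumption, because the text only says that writes go to the internal disk device or the shared disk device.

```python
from enum import Enum


class DiskMode(Enum):
    # Values reuse the reference numerals of the five modes.
    DUAL = 170
    SINGLE = 171
    QUASI_DUAL = 172
    REPAIR = 173
    DOWN = 174


INTERNAL = "internal"
SHARED = "shared"


def io_targets(mode, healthy=None):
    """Return (read_source, write_targets) for one active unit's disk pair.

    `healthy` names the disk that is still good; it is required for the
    single and quasi-dual modes.
    """
    if mode is DiskMode.DUAL:
        # Read from the internal disk, write to both disks.
        return INTERNAL, (INTERNAL, SHARED)
    if mode in (DiskMode.SINGLE, DiskMode.QUASI_DUAL):
        if healthy not in (INTERNAL, SHARED):
            raise ValueError("healthy must name the disk that is still good")
        if mode is DiskMode.SINGLE:
            # Only the surviving disk is used.
            return healthy, (healthy,)
        # Assumption (not stated in the text): while the copy is running,
        # writes go to both disks so the recovering disk stays current.
        return healthy, (INTERNAL, SHARED)
    # REPAIR and DOWN: the text gives no I/O rule, so none is modelled.
    return None, ()
```

For example, `io_targets(DiskMode.SINGLE, healthy=SHARED)` reads from and writes to the shared disk only.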

【００６３】次に、図１３〜図１９を参照して、現用機
（１１〜１７）および予備機（１０）で使用するレジス
タおよびデータエリアについて説明する。Next, referring to FIGS. 13 to 19, the registers and data areas used in the active units (11 to 17) and the standby unit (10) will be described.

【００６４】レジスタとしては、図１３の現用機あるい
は予備機の状態レジスタ（２５０）、図１４の割込みレ
ジスタ（２５１）、図１５の現用機のアライブレジスタ
（２５２）、図１６の予備機のアライブレジスタ（２５
３）、図１７のチェックポイントデータレジスタ（２５
４）、およびコピー中レジスタ（２５５）が必要であ
る。また、データエリアとして、図１９のチェックポイ
ントデータエリア（２７０〜２７７）が必要である。The required registers are the status register (250) of the active or standby unit in FIG. 13, the interrupt register (251) in FIG. 14, the alive register (252) of the active unit in FIG. 15, the alive register (253) of the standby unit in FIG. 16, the checkpoint data register (254) in FIG. 17, and the copy-in-progress register (255). The checkpoint data areas (270 to 277) in FIG. 19 are also required as data areas.

【００６５】これらのレジスタのうち、状態レジスタ
（２５０）、割込みレジスタ（２５１）、現用機のアラ
イブレジスタ（２５２）、予備機のアライブレジスタ
（２５３）、およびチェックポイントデータレジスタ
（２５４）は、現用機（１１〜１７）と予備機（１０）
毎に設ける。具体的には、それぞれのメモリ（１０−２
〜１７−２）上に設ける。Of these registers, the status register (250), the interrupt register (251), the alive register (252) of the active unit, the alive register (253) of the standby unit, and the checkpoint data register (254) are provided for each active unit (11 to 17) and for the standby unit (10). Specifically, they are placed in the respective memories (10-2 to 17-2).

【００６６】コピー中レジスタ（２５５）とチェックポ
イントデータ（２７０〜２７７）は、現用機（１１〜１
７）と予備機（１０）毎に設け、ディスク装置（５，１
０−６〜１７−６）に格納する。The copy-in-progress register (255) and the checkpoint data (270 to 277) are provided for each active unit (11 to 17) and for the standby unit (10), and are stored on the disk devices (5, 10-6 to 17-6).

【００６７】図１３は、現用機あるいは予備機の状態レ
ジスタを示す図である。現用機（１１〜１７）あるいは
予備機（１０）は、それぞれ状態レジスタ（２５０−０
〜２５０−７）を所有する。状態レジスタ（２５０）
は、当該現用機あるいは予備機の状態（図１１）を示
す。FIG. 13 is a diagram showing the status register of an active or standby unit. Each active unit (11 to 17) and the standby unit (10) has its own status register (250-0 to 250-7). The status register (250) indicates the state (FIG. 11) of that active or standby unit.

【００６８】状態レジスタ（２５０）のビット５は現用
状態（１５０）かどうか、ビット４は準現用状態（共有
ディスク装置障害）（１５１）かどうか、ビット３は準
現用状態（内蔵ディスク装置障害）（１５２）かどう
か、ビット２は待機状態（１５２）かどうか、ビット１
は修復状態（１５４）かどうか、ビット０はオフライン
状態（１５３）かどうかを意味する。ビット７〜６は使
用しない。Bit 5 of the status register (250) indicates whether the unit is in the active state (150), bit 4 whether it is in the semi-active state (shared disk device failure) (151), bit 3 whether it is in the semi-active state (internal disk device failure) (152), bit 2 whether it is in the standby state (152), bit 1 whether it is in the repair state (154), and bit 0 whether it is in the offline state (153). Bits 7 and 6 are not used.
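
As a rough illustration of this bit layout, the status register can be modelled as a set of flags. The names `UnitState` and `decode_state` are hypothetical, and the check that exactly one state bit is set is an assumption; the text does not say whether several bits may be set at once.

```python
from enum import IntFlag


class UnitState(IntFlag):
    # Bit positions follow the description of the status register (250).
    OFFLINE = 1 << 0
    REPAIR = 1 << 1
    STANDBY = 1 << 2
    SEMI_ACTIVE_INTERNAL_DISK_FAILURE = 1 << 3
    SEMI_ACTIVE_SHARED_DISK_FAILURE = 1 << 4
    ACTIVE = 1 << 5


def decode_state(register):
    """Decode an 8-bit status register value into a single UnitState."""
    value = register & 0x3F  # bits 7 and 6 are unused, so mask them off
    # Assumption: a unit is in exactly one state, so exactly one bit is set.
    if bin(value).count("1") != 1:
        raise ValueError(f"expected exactly one state bit, got {value:#04x}")
    return UnitState(value)
```

With this sketch, `decode_state(0b00100000)` yields `UnitState.ACTIVE` and `decode_state(0b00000100)` yields `UnitState.STANDBY`.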

【００６９】図１４は、現用機あるいは予備機の割込み
レジスタを示す図である。現用機（１１〜１７）あるい
は予備機（１０）は、それぞれ割込みレジスタ（２５１
−０〜２５１−７）を所有する。割込みレジスタ（２５
１）は、当該現用機（１１〜１７）あるいは予備機（１
０）に割込みが発生したかどうか、およびその割込みの
種別を示すレジスタである。FIG. 14 is a diagram showing the interrupt register of an active or standby unit. Each active unit (11 to 17) and the standby unit (10) has its own interrupt register (251-0 to 251-7). The interrupt register (251) indicates whether an interrupt has occurred in that active unit (11 to 17) or standby unit (10), and the type of the interrupt.

【００７０】割込みレジスタ（２５１）のビット６は、
レベル６の緊急障害割込みの有無を示す。ビット４は、
レベル４の障害割込みの有無を示す。ビット２は、レベ
ル２のタイマ割込みを示す。優先順位はレベル７が一番
高く以下順に低くなる。ここでは、レべル７、レべル
５、レべル３、およびレべル１は使用しない。Bit 6 of the interrupt register (251) indicates the presence or absence of a level 6 emergency failure interrupt. Bit 4 indicates the presence or absence of a level 4 failure interrupt. Bit 2 indicates a level 2 timer interrupt. Level 7 has the highest priority, and priority decreases with each lower level. Levels 7, 5, 3, and 1 are not used here.
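
A minimal sketch of how the pending interrupt with the highest priority could be picked from the interrupt register, assuming bit n of the register corresponds to level n as described. The names `LEVEL_NAMES` and `highest_pending` are hypothetical.

```python
# Interrupt levels that the description actually uses.
LEVEL_NAMES = {
    6: "emergency failure",
    4: "failure",
    2: "timer",
}


def highest_pending(register):
    """Return (level, name) of the highest-priority pending interrupt.

    Scans from level 7 downward, because a higher level has higher
    priority. Bits for the unused levels 7, 5, 3, and 1 are ignored.
    Returns None when no used level is pending.
    """
    for level in range(7, 0, -1):
        if level in LEVEL_NAMES and register & (1 << level):
            return level, LEVEL_NAMES[level]
    return None
```

For instance, when both the timer bit and the failure bit are set, the failure interrupt (level 4) is returned first.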

【００７１】また、ディスクサブシステム（４）のプロ
セッサ（４−１）でも、同じように割込みレジスタ（２
５１−９）を使用する。The processor (4-1) of the disk subsystem (4) uses an interrupt register (251-9) in the same way.

【００７２】図１５は、現用機のアライブレジスタを示
す図である。現用機（１１〜１７）のアライブレジスタ
（２５２−１〜２５２−７）は、各現用機から予備機
（１０）が正常に動作しているか監視するために使用す
る。アライブレジスタ（２５２−１〜２５２−７）は、
ビット０のみ使用するものとし、この値が「１」ならば
アライブメッセージ送信済みを、「０」ならばアライブ
メッセージ未送信を、それぞれ意味する。FIG. 15 is a diagram showing the alive register of an active unit. The alive registers (252-1 to 252-7) of the active units (11 to 17) are used by each active unit to monitor whether the standby unit (10) is operating normally. Only bit 0 of each alive register (252-1 to 252-7) is used: a value of "1" means that an alive message has been sent, and "0" means that no alive message has been sent.

【００７３】現用機のアライブレジスタ（２５２−１〜
２５２−７）による予備機（１０）の動作の監視は、以
下のように行なわれる。すなわち、予備機（１０）は、
現用機（１１〜１７）に周期的にアライブメッセージを
送信する。この場合のアライブメッセージとは、具体的
には、予備機（１０）が現用機（１１〜１７）の各アラ
イブレジスタ（２５２−１〜２５２−７）のビット０を
周期的（この実施例では１秒ごと）に「１」にセットす
ることをいう。各現用機（１１〜１７）は、自機のアラ
イブレジスタ（２５２−１〜２５２−７）を所定の周期
（この実施例では２秒ごと）で参照する。アライブレジ
スタ（２５２−１〜２５２−７）のビット０が「１」な
ら、予備機（１０）からのアライブメッセージが継続的
に到着しているので、予備機（１０）が正常に動作して
いることが分かる。The alive registers (252-1 to 252-7) of the active units are used to monitor the operation of the standby unit (10) as follows. The standby unit (10) periodically sends an alive message to the active units (11 to 17). Concretely, an alive message here means that the standby unit (10) sets bit 0 of each alive register (252-1 to 252-7) of the active units (11 to 17) to "1" periodically (every second in this embodiment). Each active unit (11 to 17) reads its own alive register (252-1 to 252-7) at a predetermined interval (every two seconds in this embodiment). If bit 0 of the alive register (252-1 to 252-7) is "1", alive messages from the standby unit (10) are arriving continuously, which shows that the standby unit (10) is operating normally.

【００７４】図１６は、予備機のアライブレジスタを示
す図である。予備機（１０）のアライブレジスタ（２５
３）は、予備機（１０）から各現用機（１１〜１７）が
正常に動作しているか監視するために使用する。予備機
のアライブレジスタ（２５３）は、各現用機（１１〜１
７）に対応して７つ設けてあり、それぞれ、ビット０の
み使用するものとする。このビット０の値が「１」なら
ばアライブメッセージ送信済みを、「０」ならばアライ
ブメッセージ未送信を、それぞれ意味する。FIG. 16 is a diagram showing the alive register of the standby unit. The alive register (253) of the standby unit (10) is used by the standby unit (10) to monitor whether each active unit (11 to 17) is operating normally. Seven alive registers (253) are provided in the standby unit, one for each active unit (11 to 17), and only bit 0 of each is used. A bit 0 value of "1" means that an alive message has been sent, and "0" means that no alive message has been sent.

【００７５】予備機のアライブレジスタ（２５３）によ
る各現用機（１１〜１７）の動作の監視は、以下のよう
に行なわれる。すなわち、各現用機（１１〜１７）は、
予備機（１０）に周期的にアライブメッセージを送信す
る。この場合のアライブメッセージとは、具体的には、
各現用機（１１〜１７）が予備機（１０）のアライブレ
ジスタ（２５３）の対応ビット０を周期的（この例では
１秒ごと）に「１」にセットすることをいう。予備機
（１０）は、自機のアライブレジスタ（２５３）を所定
の周期（この例では２秒ごと）で参照する。アライブレ
ジスタ（２５３）の各対応ビット０が「１」なら、各現
用機（１１〜１７）からのアライブメッセージが継続的
に到着しているので、各現用機（１１〜１７）が正常に
動作していることが分かる。The alive register (253) of the standby unit is used to monitor the operation of each active unit (11 to 17) as follows. Each active unit (11 to 17) periodically sends an alive message to the standby unit (10). Concretely, an alive message here means that each active unit (11 to 17) sets its corresponding bit 0 in the alive register (253) of the standby unit (10) to "1" periodically (every second in this example). The standby unit (10) reads its own alive register (253) at a predetermined interval (every two seconds in this example). If the bit 0 corresponding to an active unit is "1", alive messages from that active unit (11 to 17) are arriving continuously, which shows that the active unit is operating normally.
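
The monitoring scheme can be sketched with a one-bit register that the sender sets every second and the checker reads every two seconds. The text does not say how the bit returns to "0"; this sketch assumes the checker clears it after each read. The names `AliveRegister` and `first_detection_time` are hypothetical, and time is simulated in whole seconds.

```python
class AliveRegister:
    """One-bit alive register; only bit 0 is used."""

    def __init__(self):
        self.bit0 = 0

    def set(self):
        # Done by the sender (every second in the embodiment).
        self.bit0 = 1

    def check_and_clear(self):
        # Done by the checker (every two seconds in the embodiment).
        alive = self.bit0 == 1
        # Assumption: the checker resets the bit so that a stale "1"
        # cannot hide a sender that has stopped.
        self.bit0 = 0
        return alive


def first_detection_time(sender_fails_at, duration):
    """Simulate whole seconds 1..duration.

    The sender sets the bit each second until `sender_fails_at`; the
    checker reads it on even seconds. Returns the second at which the
    failure is first detected, or None if it is never detected.
    """
    register = AliveRegister()
    for t in range(1, duration + 1):
        if t < sender_fails_at:
            register.set()
        if t % 2 == 0 and not register.check_and_clear():
            return t
    return None
```

Because the sender writes twice per check interval, one healthy write per interval is enough; under these assumptions a sender that stops at second 5 is detected at the check at second 6.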

【００７６】図１７は、チェックポイントデータレジス
タを示す図である。現用機（１１〜１７）あるいは予備
機（１０）は、それぞれ、チェックポイントデータレジ
スタ（２５４−０〜２５４−７）を所有する。FIG. 17 shows the checkpoint data register. Each of the working machines (11 to 17) and the spare machine (10) has a checkpoint data register (254-0 to 254-7).

【００７７】図１８はチェックポイントデータエリアを
示す図である。現用機（１１〜１７）あるいは予備機
（１０）は、それぞれチェックポイントデータエリア
（２７０−０〜２７０−７）を所有する。チェックポイ
ントデータエリアは、８領域に分けてある。上から順
に、チェックポイントデータエリア（２７０）、チェッ
クポイントデータエリア（２７１）、…、チェックポイ
ントデータエリア（２７７）とする。FIG. 18 shows a checkpoint data area. Each of the working machines (11 to 17) and the spare machine (10) has a checkpoint data area (270-0 to 270-7). The checkpoint data area is divided into eight areas. In order from the top, a checkpoint data area (270), a checkpoint data area (271),..., A checkpoint data area (277) are set.

【００７８】チェックポイントデータとは、現用機（１
１〜１７）あるいは予備機（１０）からディスク装置
（５，１１−６〜１７−６）への書込み時（チェックポ
イント）に、チェックポイントデータエリアに格納する
種々のデータをいう。例えば、受信電文、プロセッサレ
ジスタ、回線情報、電文の状態、プロセスの実行状態の
引継ぎ処理のために必要なデータ、およびディスク装置
への書込みデータなどである。これらのチェックポイン
トデータをチェックポイントに至るごとにチェックポイ
ントデータエリアに格納しておき、例えばある現用機に
障害が発生して、予備機がその処理を引き継ぐ場合に、
上記のように格納してあるチェックポイントデータを用
いてチェックポイントから処理を再開することができ
る。Checkpoint data refers to the various data stored in the checkpoint data area at the time of a write (a checkpoint) from an active unit (11 to 17) or the standby unit (10) to the disk devices (5, 11-6 to 17-6). Examples include received messages, processor registers, line information, message states, data needed to take over the execution state of a process, and data to be written to the disk devices. This checkpoint data is stored in the checkpoint data area each time a checkpoint is reached, so that if, for example, a failure occurs in an active unit and the standby unit takes over its processing, the processing can be resumed from the checkpoint using the stored checkpoint data.

【００７９】図１７のチェックポイントデータレジスタ
（２５４−０〜２５４−７）は、前回格納したチェック
ポイントデータが更新されているかどうかを示すレジス
タである。図１８に示すチェックポイントデータエリア
（２７０〜２７７）毎に、「１」ならばそのエリアのチ
ェックポイントデータは更新済みであり、「０」ならば
そのエリアのチェックポイントデータは更新されていな
いことを意味する。The checkpoint data registers (254-0 to 254-7) in FIG. 17 indicate whether the previously stored checkpoint data has been updated. For each checkpoint data area (270 to 277) shown in FIG. 18, "1" means that the checkpoint data in that area has been updated, and "0" means that it has not been updated.

【００８０】なお、すべてのチェックポイントデータを
チェックポイントに至るごとに格納したのでは、処理の
負担が大きくなるので、チェックポイントデータエリア
を８領域に分けて、必要なデータエリアのみ更新するよ
うにしている。Storing all of the checkpoint data every time a checkpoint is reached would impose a heavy processing load, so the checkpoint data area is divided into eight areas and only the areas that need it are updated.
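
The idea of updating only the changed areas can be sketched with an 8-bit "updated" register. This is a hypothetical model, not the patented implementation: it assumes bit i corresponds to area 270 + i and that the register is cleared once the changed areas have been written out. `CheckpointStore` and the `write` callback are illustrative names.

```python
class CheckpointStore:
    """Eight checkpoint areas plus an 8-bit updated register."""

    AREAS = 8
    FIRST_AREA = 270  # reference numeral of the first area

    def __init__(self):
        self.areas = [b""] * self.AREAS
        self.updated = 0  # models the checkpoint data register (254)

    def update(self, index, data):
        """Change one area and mark it as updated."""
        self.areas[index] = data
        self.updated |= 1 << index

    def flush(self, write):
        """Write only the updated areas, then clear the register.

        `write(area_number, data)` stands in for the disk write.
        Returns the list of area numbers that were written.
        """
        written = []
        for i in range(self.AREAS):
            if self.updated & (1 << i):
                write(self.FIRST_AREA + i, self.areas[i])
                written.append(self.FIRST_AREA + i)
        self.updated = 0  # assumption: bits are reset after the write
        return written
```

A second `flush` with no intervening `update` writes nothing, which is the saving the paragraph describes.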

【００８１】図１９は、コピー中レジスタを示す図であ
る。コピー中レジスタ（２５５）は、共有ディスクと内
蔵ディスクとの間でコピー中かどうかを示す。コピー中
レジスタ（２５５）のビット０は、予備機（１０）のデ
ィスク装置（１０−６）がコピー中であるかどうかを示
す。ビット１は、現用機（１１）のディスク装置（１１
−６）がコピー中であるかどうかを示す。以下同様であ
り、ビット７は現用機（１７）のディスク装置（１７−
６）がコピー中であるかどうかを示す。いずれのビット
も、「１」でコピー中であることを、「０」でコピー中
でないことを、それぞれ示す。FIG. 19 is a diagram showing the copy-in-progress register. The copy-in-progress register (255) indicates whether copying is in progress between the shared disk and an internal disk. Bit 0 of the copy-in-progress register (255) indicates whether the disk device (10-6) of the standby unit (10) is being copied. Bit 1 indicates whether the disk device (11-6) of the active unit (11) is being copied. The remaining bits follow the same pattern, up to bit 7, which indicates whether the disk device (17-6) of the active unit (17) is being copied. For every bit, "1" means that copying is in progress and "0" means that it is not.

【００８２】次に、図２０および図２１を参照して、デ
ィスクサブシステム（４）で使用するレジスタについて
説明する。ディスクサブシステム（４）で使用するレジ
スタとしては、ディスクステータスレジスタ（２４
０）、および書込みデータレジスタ（２４１）がある。
これらのレジスタ（２４０，２４１）は、ディスクサブ
システム（４）のメモリ（４−２）に格納する。Next, the registers used in the disk subsystem (4) will be described with reference to FIGS. 20 and 21. The registers used in the disk subsystem (4) are a disk status register (240) and a write data register (241). These registers (240, 241) are stored in the memory (4-2) of the disk subsystem (4).

【００８３】図２０は、ディスクステータスレジスタ
（２４０）を示す図である。ディスクステータスレジス
タ（２４０）のビット０は、予備機（１０）の内蔵ディ
スク装置（１０−６）が障害か正常かを示す。ビット１
は、現用機（１１）の内蔵ディスク装置（１１−６）が
障害か正常かを示す。以下同様であり、ビット７は現用
機（１７）の内蔵ディスク装置（１１−７）が障害か正
常かを示す。いずれのビットも、「０」で正常、「１」
で障害を示す。FIG. 20 is a diagram showing the disk status register (240). Bit 0 of the disk status register (240) indicates whether the internal disk device (10-6) of the standby unit (10) is faulty or normal. Bit 1 indicates whether the internal disk device (11-6) of the active unit (11) is faulty or normal. The remaining bits follow the same pattern, up to bit 7, which indicates whether the internal disk device (17-6) of the active unit (17) is faulty or normal. For every bit, "0" means normal and "1" means a failure.
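
The copy-in-progress register (255) and the disk status register (240) share one layout: bit 0 belongs to the standby unit (10) and bits 1 to 7 to the active units (11 to 17). A small sketch of that mapping follows; the helper names are hypothetical, and unit numbers are taken to be the reference numerals 10 to 17.

```python
def unit_bit(unit):
    """Map a unit's reference numeral (10..17) to its bit position (0..7)."""
    if not 10 <= unit <= 17:
        raise ValueError("unit must be between 10 and 17")
    return unit - 10


def set_flag(register, unit):
    """Return the register value with the unit's bit set to 1."""
    return register | (1 << unit_bit(unit))


def clear_flag(register, unit):
    """Return the register value with the unit's bit cleared to 0."""
    return register & ~(1 << unit_bit(unit)) & 0xFF


def flagged_units(register):
    """List the units whose bit is 1 (faulty disk, or copy in progress)."""
    return [10 + bit for bit in range(8) if register & (1 << bit)]
```

For the disk status register, `flagged_units(0b10000010)` would report the internal disks of units 11 and 17 as faulty.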

【００８４】図２１は、書込みデータレジスタ（２４
１）を示す図である。書込みデータレジスタ（２４１）
のビット０は、予備機（１０）に書込みデータがあるか
どうかを示す。ビット１は、現用機（１１）に書込みデ
ータがあるかどうかを示す。以下同様であり、ビット７
は現用機（１７）に書込みデータがあるかどうかを示
す。いずれのビットも、「０」で書込みデータなしを、
「１」で書込みデータありを、それぞれ示す。書込みデ
ータレジスタ（２４１）は、後述の図３３に示す周期割
込み方式にのみ使用し、図３４に示すイベント割込み方
式では使用しない。FIG. 21 is a diagram showing the write data register (241). Bit 0 of the write data register (241) indicates whether the standby unit (10) has write data. Bit 1 indicates whether the active unit (11) has write data. The remaining bits follow the same pattern, up to bit 7, which indicates whether the active unit (17) has write data. For every bit, "0" means that there is no write data and "1" means that there is write data. The write data register (241) is used only in the periodic interrupt method shown in FIG. 33, described later, and is not used in the event interrupt method shown in FIG. 34.

【００８５】次に、図２２〜図２４を参照して、現用機
（１２〜１７）および予備機（１０）の回路構成などを
さらに詳細に説明する。Next, with reference to FIGS. 22 to 24, the circuit configuration of the working machines (12 to 17) and the spare machine (10) will be described in further detail.

【００８６】図２２は、現用機（１１）のプロセッサ
（１１−１）、メモリ（１１−２）、およびＩＯＰ（１
１−３）の詳細回路図である。現用機（１１）を中心に
記述する。現用機（１２〜１７）および予備機（１０）
は、現用機（１１）と同一構成である。FIG. 22 is a detailed circuit diagram of the processor (11-1), the memory (11-2), and the IOP (11-3) of the active unit (11). The description focuses on the active unit (11). The active units (12 to 17) and the standby unit (10) have the same configuration as the active unit (11).

【００８７】プロセッサ（１１−１）は、例えばモトロ
ーラ社製の商品名：６８０００マイクロプロセッサを用
いる。マイクロプロセッサ（１１−１）の内部レジスタ
は、データレジスタＤＲ０−ＤＲ７（５００−１〜５０
７−１）、アドレスレジスタＡＲ０−ＡＲ６（５１０−
１〜５１６−１）、スタックポインタＡＲ７（５２０−
１）、ステータスレジスタＳＲ（５２１−１）、および
プログラムカウンタＰＣ（５２２−１）で構成する。The processor (11-1) is, for example, a 68000 microprocessor manufactured by Motorola. The internal registers of the microprocessor (11-1) consist of data registers DR0 to DR7 (500-1 to 507-1), address registers AR0 to AR6 (510-1 to 516-1), the stack pointer AR7 (520-1), the status register SR (521-1), and the program counter PC (522-1).

【００８８】マイクロプロセッサ（１１−１）の信号線
は、データ線Ｄ０〜Ｄ７（５４０−１）、アドレス線Ａ
１〜Ａ２３（５４１−１）、および割込み線（ＩＰＬ０
〜２）（５４３−１〜５４５−１）で構成する。Ｗ／Ｒ
線（５４６−１）は、”Ｈ”のときがリードサイク
ル、”Ｌ”のときがライトサイクルである。The signal lines of the microprocessor (11-1) consist of data lines D0 to D7 (540-1), address lines A1 to A23 (541-1), and interrupt lines IPL0 to IPL2 (543-1 to 545-1). The W/R line (546-1) indicates a read cycle when "H" and a write cycle when "L".

【００８９】ＩＯＰ（１１−３）は、プロセッサ（５７
０−１）、バッファ（５７１−１）、ＲＯＭ（５７２−
１）およびＲＡＭ（５７３−１）で構成する。バッファ
（５７１−１）には、プロセッサ（１１−１）から転送
される端末（７）に送信する電文や内蔵ディスク装置
（１１−６）あるいは共有ディスク装置（５）への書込
みデータを格納する。The IOP (11-3) consists of a processor (570-1), a buffer (571-1), a ROM (572-1), and a RAM (573-1). The buffer (571-1) stores messages transferred from the processor (11-1) for transmission to the terminal (7), and data to be written to the internal disk device (11-6) or the shared disk device (5).

【００９０】その他、タイマ（５３０−１）、アドレス
デコーダ（５３１−１）、および割込みエンコーダ（５
３２−１）を設ける。In addition, a timer (530-1), an address decoder (531-1), and an interrupt encoder (532-1) are provided.

【００９１】図２３は、現用機（１１）のタイマ割込み
の制御回路を示す図である。タイマ（５３０−１）に
は、クロック（５５０−１）を設ける。クロック（５５
０−１）は１０ｍ秒毎にカウンタを（＋１）する。例え
ば、１秒経過して割込むものは、カウンタ値が１００に
なれば、プロセッサ（１１−１）に割込みを発生させ
る。FIG. 23 is a diagram showing the timer interrupt control circuit of the active unit (11). The timer (530-1) is provided with a clock (550-1). The clock (550-1) increments a counter by one every 10 ms. For example, for an interrupt that is to occur after one second has elapsed, an interrupt is raised to the processor (11-1) when the counter value reaches 100.

【００９２】このようにして、プロセッサ（１１−１）
は、タイマ割込みを発生することが可能となる。In this way, the processor (11-1) can generate timer interrupts.
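
The counting scheme (one tick every 10 ms, an interrupt when the count reaches 100 for a one-second period) can be sketched as follows. The class name `TickTimer` and the callback are hypothetical, and restarting the counter after each interrupt is an assumption that the text does not spell out.

```python
TICK_MS = 10  # the clock increments the counter every 10 ms


class TickTimer:
    """Raises an interrupt callback every `period_ms` milliseconds."""

    def __init__(self, period_ms, on_interrupt):
        self.limit = period_ms // TICK_MS  # 1000 ms gives a limit of 100
        self.counter = 0
        self.on_interrupt = on_interrupt

    def tick(self):
        """Called once per 10 ms clock pulse."""
        self.counter += 1
        if self.counter >= self.limit:
            # Assumption: the counter restarts so the interrupt repeats.
            self.counter = 0
            self.on_interrupt()
```

Feeding 250 ticks (2.5 seconds) into a timer with a 1000 ms period raises two interrupts and leaves the counter at 50.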

【００９３】図２２と図２３において、現用機（１１）
の（５＊＊−１）のものは、現用機（１２）の（５＊＊
−２）と、以下順に、現用機（１７）の（５＊＊−７）
と、予備機（１０）の（５＊＊−０）に対応する。In FIGS. 22 and 23, the elements labeled (5**-1) in the active unit (11) correspond to (5**-2) in the active unit (12), and so on in order through (5**-7) in the active unit (17), and to (5**-0) in the standby unit (10).

【００９４】図２４は、現用機あるいは予備機のメモリ
マップを示す図である。現用機（１１〜１７）のメモリ
マップ（５８１−１〜５８１−７）と予備機（１０）の
メモリマップ（５８１−０）のメモリマップとは同一で
あり、以下の通りである。
（０）16〜（０ＦＦＦＦＦ）16……監視エリア（２００）
（１０００００）16〜β………ＯＳ領域（２０１）
β〜γ………………………ユーザプログラム領域（２０２）
γ〜（ＦＦＦＦＦＦ）16………リザーブ領域（２０３）
なお、（＊＊＊＊＊＊）16は、１６進表記を示す。
FIG. 24 is a diagram showing the memory map of an active or standby unit. The memory maps (581-1 to 581-7) of the active units (11 to 17) and the memory map (581-0) of the standby unit (10) are identical, and are as follows.
(0)16 to (0FFFFF)16 ..... monitoring area (200)
(100000)16 to β ..... OS area (201)
β to γ ..... user program area (202)
γ to (FFFFFF)16 ..... reserved area (203)
Here, (******)16 denotes hexadecimal notation.

【００９５】次に、図２５〜図２７を参照して、現用機
（１１〜１７）、予備機（１０）およびディスクサブシ
ステム（４）のソフトウェア構成を説明する。Next, with reference to FIGS. 25 to 27, the software configuration of the active units (11 to 17), the spare unit (10) and the disk subsystem (4) will be described.

【００９６】図２５は、現用機のソフトウェアの構成を
示す図である。FIG. 25 is a diagram showing a software configuration of the active device.

【００９７】現用機（１１〜１７）は、割込み（７０
０）を受信し、割込み種別を解析する（７０１）。障害
割込み（７０３）は割込みレベル４で、タイマ割込み
（７０２）は割込みレベル２で、電文処理（７３０）は
レべル０で実行する。Each active unit (11 to 17) receives an interrupt (700) and analyzes the interrupt type (701). A failure interrupt (703) is handled at interrupt level 4, a timer interrupt (702) at interrupt level 2, and message processing (730) runs at level 0.

【００９８】障害割込み（７０３）では、予備機の切り
離し処理（７１１）、予備機の接続処理（７１２）、共
有ディスク装置の障害処理（７１３）、共有ディスク装
置の修復処理（７１４）、内蔵ディスク装置の障害処理
（７１５）、および内蔵ディスク装置の修復処理（７１
６）を実行する。The failure interrupt (703) executes the standby unit disconnection process (711), the standby unit connection process (712), the shared disk device failure process (713), the shared disk device repair process (714), the internal disk device failure process (715), and the internal disk device repair process (716).

【００９９】予備機（１０）で障害が発生すると、現用
機（１１〜１７）は、予備機の切り離し処理（７１１）
を実行し、予備機（１０）をオフライン状態（１５３）
とする。そして、現用機（１１〜１７）のみで処理を継
続する。When a failure occurs in the standby unit (10), the active units (11 to 17) execute the standby unit disconnection process (711) and place the standby unit (10) in the offline state (153). Processing then continues with the active units (11 to 17) alone.

【０１００】予備機（１０）が障害から回復すると、現
用機（１１〜１７）は、予備機の接続処理（７１２）を
実行し、予備機（１０）を待機状態（１５２）とする。
そして、現用系（１１〜１７）と予備機（１０）による
バックアップ運転に戻る。When the standby unit (10) recovers from the failure, the active units (11 to 17) execute the standby unit connection process (712) and place the standby unit (10) in the standby state (152).
Then, the operation returns to the backup operation using the active system (11 to 17) and the standby unit (10).

【０１０１】共有ディスク装置（５）の障害を検出する
と、現用機（１１〜１７）は、共有ディスク装置の障害
処理（７１３）を実行する。これにより、共有ディスク
装置（５）の障害の検出を、他の現用機（１１〜１７）
と予備機（１０）に、通知する。そして、現用機（１１
〜１７）は、共有ディスク装置（５）を閉塞し、内蔵デ
ィスク装置（１１−６〜１７−６）のみを使用して処理
を継続する。When an active unit (11 to 17) detects a failure of the shared disk device (5), it executes the shared disk device failure process (713). This notifies the other active units (11 to 17) and the standby unit (10) that a failure of the shared disk device (5) has been detected. The active units (11 to 17) then block the shared disk device (5) and continue processing using only the internal disk devices (11-6 to 17-6).

【０１０２】共有ディスク装置（５）の修復を受信する
と、現用機（１１〜１７）は、共有ディスク装置の修復
処理（７１４）を実行する。これにより、現用機（１１
〜１７）は、内蔵ディスク装置（１１−６〜１７−６）
から共有ディスク装置（５）にディスク装置の内容をコ
ピーする。On receiving notification that the shared disk device (5) has been repaired, the active units (11 to 17) execute the shared disk device repair process (714). In this process, the active units (11 to 17) copy the contents of the internal disk devices (11-6 to 17-6) to the shared disk device (5).

【０１０３】現用機（１１〜１７）は、内蔵ディスク装
置（１１−６〜１７−６）の障害を検出すると、内蔵デ
ィスク装置の障害処理（７１５）を実行する。これによ
り、現用機（１１〜１７）は、内蔵ディスク装置（１１
−６〜１７−６）を閉塞し、共有ディスク装置（５）の
みを使用して処理を継続する。When an active unit (11 to 17) detects a failure of its internal disk device (11-6 to 17-6), it executes the internal disk device failure process (715). In this process, the active unit (11 to 17) blocks the internal disk device (11-6 to 17-6) and continues processing using only the shared disk device (5).

【０１０４】現用機（１１〜１７）は、内蔵ディスク装
置（１１−６〜１７−６）の修復を受信すると、内蔵デ
ィスク装置の修復処理（７１６）を実行する。これによ
り、現用機（１１〜１７）は、共有ディスク装置（５）
から内蔵ディスク装置（１１−６〜１７−６）にディス
ク装置の内容をコピーする。On receiving notification that its internal disk device (11-6 to 17-6) has been repaired, an active unit (11 to 17) executes the internal disk device repair process (716). In this process, the active unit (11 to 17) copies the contents of the shared disk device (5) to the internal disk device (11-6 to 17-6).

【０１０５】タイマ割込み（７０２）では、予備機（１
０）の障害検出のためのアライブメッセージの送信処理
（７２１）、およびこれに対するアライブメッセージの
受信確認処理（７２２）を実行する。The timer interrupt (702) executes the alive message transmission process (721), which is used for detecting a failure of the standby unit (10), and the corresponding alive message reception confirmation process (722).

【０１０６】予備機のアライブメッセージの送信処理
（７２１）は、周期的にアライブメッセージを予備機
（１０）に転送する処理である。予備機のアライブメッ
セージの受信確認処理（７２２）は、現用機（１１〜１
７）が予備機（１０）からの最終のアライブメッセージ
を受信して一定時間以内にアライブメッセージを受信し
たかどうかチェックする処理である。The standby unit alive message transmission process (721) periodically transfers an alive message to the standby unit (10). The standby unit alive message reception confirmation process (722) checks whether the active unit (11 to 17) has received an alive message within a fixed time after receiving the last alive message from the standby unit (10).

【０１０７】以上の割込みレべル４の処理（７１０）と
割込みレべル２の処理（７２０）の処理が終了すると、
現用機（１１〜１７）は電文処理（７３０）を実行す
る。When the interrupt level 4 processing (710) and the interrupt level 2 processing (720) described above have finished, the active unit (11 to 17) executes message processing (730).

【０１０８】電文処理（７３０）は、ディスク装置
（５，１１−６〜１７−６）への書込み処理をチェック
ポイントとし、チェックポイント毎にチェックポイント
データ（８０）をディスク装置（５，１１−６〜１７−
６）に格納する処理である。Message processing (730) treats each write to the disk devices (5, 11-6 to 17-6) as a checkpoint, and stores the checkpoint data (80) on the disk devices (5, 11-6 to 17-6) at every checkpoint.

【０１０９】図２６は、予備機（１０）のソフトウェア
の構成を示す図である。予備機（１０）は、現用機（１
１〜１７）と同様に割込み（７５０）を受信する。割込
み種別の解析処理（７５１）により、タイマ割込み（７
５２）か障害割込み（７５３）かを解析する。FIG. 26 is a diagram showing the software configuration of the standby unit (10). Like the active units (11 to 17), the standby unit (10) receives an interrupt (750). The interrupt type analysis process (751) determines whether it is a timer interrupt (752) or a failure interrupt (753).

【０１１０】障害割込み（７５３）では、現用機（１１
〜１７）で障害が発生し障害割込み（７５３）を受ける
と、予備機（１０）はチェックポイントデータ（図１
８）を参照して、障害発生した現用機からの引継ぎ処理
（７６１）を実行する。In the failure interrupt (753), when a failure occurs in an active unit (11 to 17) and the standby unit (10) receives the failure interrupt (753), the standby unit (10) refers to the checkpoint data (FIG. 18) and executes the takeover process (761) for the failed active unit.

【０１１１】タイマ割込み（７５２）では、アライブメ
ッセージの送信処理（７７１）、およびアライブメッセ
ージの受信確認処理（７７２）を実行する。アライブメ
ッセージの送信処理（７７１）は、周期的にアライブメ
ッセージを現用機（１１）から現用機（１７）に転送す
る処理である。アライブメッセージの受信確認処理（７
７２）は、予備機（１０）が現用機（１１〜１７）から
の最終のアライブメッセージを受信して一定時間以内に
アライブメッセージを受信するかどうかチェックする処
理である。The timer interrupt (752) executes the alive message transmission process (771) and the alive message reception confirmation process (772). The alive message transmission process (771) periodically transfers an alive message to each of the active units (11) through (17). The alive message reception confirmation process (772) checks whether the standby unit (10) receives an alive message within a fixed time after receiving the last alive message from the active units (11 to 17).

【０１１２】図２７は、ディスクサブシステムのソフト
ウェアの構成を示す図である。FIG. 27 is a diagram showing a software configuration of the disk subsystem.

【０１１３】ディスクサブシステム（４）は、割込み
（８００）を受信し、割込み種別を解析する（８０
１）。緊急障害割込み（８０４）は割込みレベル６で、
障害割込み（８０３）は割込みレベル４で、タイマ割込
み（８０２）は割込みレベル２で、それぞれ実行する。The disk subsystem (4) receives an interrupt (800) and analyzes the interrupt type (801). An emergency failure interrupt (804) is handled at interrupt level 6, a failure interrupt (803) at interrupt level 4, and a timer interrupt (802) at interrupt level 2.

【０１１４】緊急障害割込み（８０４）では、内蔵ディ
スク装置（１１−６〜１７−６）に障害が発生した現用
機の内蔵ディスク装置からの読み出し処理（８１１）を
実行する。障害割込み（８０３）では、イベント割込み
方式による共有ディスク装置（５）への書込み処理（８
２１）を実行する。タイマ割込み（８０２）では、周期
割込み方式による共有ディスク装置（５）への書込み処
理（８２１）を実行する。The emergency failure interrupt (804) executes the read process (811) for the internal disk device of an active unit whose internal disk device (11-6 to 17-6) has failed. The failure interrupt (803) executes the write process (821) to the shared disk device (5) using the event interrupt method. The timer interrupt (802) executes the write process (821) to the shared disk device (5) using the periodic interrupt method.

【０１１５】以上の割込みレべル６の処理（８１５）、
割込みレべル４の処理（８１０）と割込みレべル２の処
理（８２０）の処理が終了すると、ディスクサブシステ
ム（４）は、予備機（１０）が準現用状態ならば、その
内蔵ディスク装置へのコピー処理（８３１）を実行す
る。When the interrupt level 6 processing (815), the interrupt level 4 processing (810), and the interrupt level 2 processing (820) described above have finished, the disk subsystem (4) executes the copy process (831) to the internal disk device of the standby unit (10) if the standby unit (10) is in the semi-active state.

【０１１６】図２８は、現用機（１１〜１７）での電文
処理の概要を示す図である。電文処理（７３０）が実行
開始すると、端末（７）から電文を受信するが、この状
態を受信状態（２６１）という。そして、ディスク装置
（１１−６〜１７−６，５）から必要なデータを読み出
すが、この状態を読出し状態（２６２）という。受信し
た電文に対する所定の処理を実行し、処理結果をディス
ク装置（１１−６〜１７−６，５）に書き込むが、この
状態を書込み状態（２６３）という。最後に、端末
（７）に応答を返すが、端末（７）からＡＣＫを受信し
ていない状態を送信状態（２６４）、ＡＣＫを受信して
いる状態を送信済み状態（２６５）という。FIG. 28 is a diagram showing an outline of the message processing in the active devices (11 to 17). When the execution of the message processing (730) is started, a message is received from the terminal (7). This state is called a reception state (261). Then, necessary data is read from the disk devices (11-6 to 17-6, 5), and this state is called a read state (262). A predetermined process is executed for the received message, and the processing result is written to the disk device (11-6 to 17-6, 5). This state is called a write state (263). Lastly, a response is returned to the terminal (7), but a state in which ACK is not received from the terminal (7) is referred to as a transmission state (264), and a state in which ACK is received is referred to as a transmitted state (265).

【０１１７】図２９は、電文管理テーブル（２６０）を
示す図である。図２８より、電文の状態は、受信状態
（２６１）、読出し状態（２６２）、書込み状態（２６
３）、送信中状態（２６４）および送信済み状態（２６
５）に分けて管理する。この電文管理テーブル（２６
０）は、これら５つの状態を管理する。FIG. 29 is a diagram showing the message management table (260). As FIG. 28 shows, the state of a message is managed as one of five states: the reception state (261), the reading state (262), the writing state (263), the sending state (264), and the sent state (265). The message management table (260) manages these five states.
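
The five message states and their order can be sketched as a small table keyed by message. The layout of the actual table in FIG. 29 is not reproduced here; `MessageTable`, `message_id`, and the strictly linear transition order are illustrative assumptions.

```python
from enum import Enum


class MessageState(Enum):
    # Values reuse the reference numerals of the five states.
    RECEIVING = 261
    READING = 262
    WRITING = 263
    SENDING = 264   # response sent, ACK not yet received
    SENT = 265      # ACK received from the terminal


_ORDER = list(MessageState)  # Enum iteration follows definition order


class MessageTable:
    """Tracks the current state of each message by a hypothetical id."""

    def __init__(self):
        self._states = {}

    def receive(self, message_id):
        """Register a newly received message."""
        self._states[message_id] = MessageState.RECEIVING

    def advance(self, message_id):
        """Move a message to the next state and return that state."""
        current = self._states[message_id]
        if current is MessageState.SENT:
            raise ValueError("message has already been acknowledged")
        nxt = _ORDER[_ORDER.index(current) + 1]
        self._states[message_id] = nxt
        return nxt

    def state(self, message_id):
        return self._states[message_id]
```

A standby unit taking over could, under these assumptions, look up each message's recorded state to decide where to resume.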

【０１１８】図３０は、電文のフォーマット（７０）を
示す図である。電文は、現用機（１１〜１７）の番号
（７１）、電文本体（７２）、および時刻印（７３）で
構成する。FIG. 30 is a diagram showing the message format (70). A message consists of the number (71) of an active unit (11 to 17), the message body (72), and a time stamp (73).

【０１１９】以下、図１〜図３０で説明した本実施例の
システムの３つの動作例（動作例１〜動作例３）につい
て説明する。Hereinafter, three operation examples (operation examples 1 to 3) of the system of the present embodiment described with reference to FIGS. 1 to 30 will be described.

【０１２０】〈動作例１〉まず、現用機（１１〜１７）
で障害が発生し、予備機（１０）への引継ぎが必要な場
合、およびその障害から修復する場合について説明す
る。<Operation Example 1> First, the case where a failure occurs in an active unit (11 to 17) and takeover by the standby unit (10) is required, and the case of recovery from that failure, will be described.

【０１２１】図３１は、現用機（１１）の障害時におけ
る現用機（１１〜１７）、予備機（１０）、通信処理サ
ーバ（２）、および端末（７）の通信処理手順を示す図
である。以下、現用機（１１）で障害が発生した場合の
処理概要を示す。FIG. 31 is a diagram showing the communication processing procedure of the active units (11 to 17), the standby unit (10), the communication processing server (2), and the terminal (7) when the active unit (11) fails. An outline of the processing when a failure occurs in the active unit (11) is given below.

【０１２２】予備機（１０）は、アライブメッセージを
現用機（１１）から現用機（１７）の順に周期的に送る
（処理９００）。現用機（１１）から現用機７（１
７）は、同じように予備機（１０）にアライブメッセー
ジを送る（処理９０１）。（処理９００）と（処理
９０１）にて、互いにアライブメッセージを送ること
により、現用機（１１〜１７）と予備機（１０）は障害
検出が可能になる。The standby unit (10) periodically sends an alive message to the active units (11) through (17) in order (process 900). The active units (11) through (17) likewise send alive messages to the standby unit (10) (process 901). By sending alive messages to each other in (process 900) and (process 901), the active units (11 to 17) and the standby unit (10) can detect failures.

[0123] The terminal (7) sends a message (71) to the communication processing server (2) (process 902). In the communication processing server (2), both the active communication processing server (2-0) and the standby communication processing server (2-1) receive the message. The active communication processing server (2-0) attaches a time stamp (73) to the message and stores the time stamp (73) in the communication processing server disk device (2-2). The active communication processing server (2-0) then refers to the conversion table (2-4) and transmits the message to the designated active unit (here, the active unit (11)) (process 903).
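The dispatch step of processes 902 and 903 can be sketched as follows. This is only an illustration under stated assumptions: the conversion table (2-4) is modeled as a plain mapping from terminal number to active unit number, the time stamp (73) is modeled as an increasing counter, and the names `ActiveCommServer`, `conversion_table`, `stamp_log`, and `dispatch` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ActiveCommServer:
    """Toy model of the active communication processing server (2-0)."""
    conversion_table: dict                          # terminal -> active unit (assumed layout)
    stamp_log: list = field(default_factory=list)   # stands in for disk device (2-2)
    _clock: count = field(default_factory=lambda: count(1))  # assumed stamp source

    def dispatch(self, terminal_id, body):
        stamp = next(self._clock)                   # attach a time stamp (73)
        self.stamp_log.append(stamp)                # store the stamp before forwarding
        unit = self.conversion_table[terminal_id]   # look up the designated active unit
        return unit, {"body": body, "stamp": stamp}


server = ActiveCommServer(conversion_table={7: 11})
unit, message = server.dispatch(7, "example message")
print(unit, message["stamp"], server.stamp_log)
```

Recording the stamp before forwarding is what would later allow unfinished messages to be identified after a failure; the sketch leaves out the standby server (2-1), which receives the same message in parallel.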

[0124] Assume that a failure then occurs in the active unit (11) (process 904). Because of the failure, alive messages are no longer transferred from the active unit (11) to the standby unit (10) (process 905). The standby unit (10) thereby detects the failure of the active unit (11) (process 906). The standby unit (10) notifies all the other active units (11 to 16) that a failure has occurred (process 907).

[0125] The standby unit (10) refers to the checkpoint data (80) stored in the shared disk device (5) and executes the takeover process (761) for the failed active unit (11) (process 908). The checkpoint data (80) consists of the data needed for the takeover, namely the received message, the processor registers, the line information, the message state, and the process execution state, together with the data to be written to the disk device.

[0126] For a received message, the message body (72) and the time stamp (73) are taken as checkpoint data. The message states are as shown in FIGS. 28 and 29. The line information and the process execution state are managed by the OS (operating system).

[0127] With this information, the standby unit (10) can take over the processes.

[0128] When this process (761) is complete, the standby unit (10) sends the outgoing message to the active communication processing server (2-0) (process 909). The processing result is then reported to the terminal (7), and the time stamp (73) of the completed message is stored in the communication processing server disk device (2-2) (process 910). The standby communication processing server (2-1) periodically refers to the communication processing server disk device (2-2) and deletes the time stamps (73) of messages whose processing is complete (process 911).

[0129] When the active unit (11) has been repaired after the failure (904), it transitions from the offline state (153) to the repair state (154) (process 912) and notifies the standby unit (10) that the repair is complete (process 913). The standby unit (10) then notifies the active unit (11) of the repair completion (process 914). The active unit (11) transitions from the repair state (154) to the standby state (152) (process 915). As a result, the unit that was the standby unit (10) becomes an active unit, and the repaired active unit (11) becomes the new standby unit.

[0130] The detailed processing procedures of the failing active unit (11), the other active units (12 to 17), and the standby unit (10) are described below in four parts: the processing procedure during normal operation, the failure detection procedure, the standby unit takeover procedure, and the resynchronization procedure.
Processing procedure during normal operation

[0131] FIG. 32 is a diagram showing the processing procedure during normal operation.

[0132] The terminal (7) transmits a message to the active communication processing server (2-0) and the standby communication processing server (2-1) (process 920). The active communication processing server (2-0) and the standby communication processing server (2-1) each attach a time stamp (73) to the message. The active communication processing server (2-0) then refers to the conversion table (2-4), which relates the terminal (7) to the active units (11 to 17), and transmits the message to the designated active unit (11) (process 921).

[0133] The procedure of the message processing (730) executed by the active unit (11) is as follows.

[0134] On receiving a message, the active unit (11) reads the necessary data from its internal disk device (11-6) (process 923). It then executes the prescribed processing on the received message and performs the write processing to the disk devices (11-6, 5). The data at this point is taken as the checkpoint data (80) and is written to the internal disk device (11-6) and the shared disk device (5). Writing to the shared disk device (5) is performed by transmitting the write data to the disk subsystem (4) (process 926). When the message processing (730) is complete, the active unit (11) reports the processing result to the terminal (7) (processes 927 and 928).

[0135] Next, the processing procedure of the disk subsystem (4) is described.

[0136] The disk subsystem (4) can operate under either of the two schemes described below: a periodic interrupt scheme or an event interrupt scheme. When the processing capacity of the disk subsystem (4) is the priority, the periodic interrupt scheme, which has low overhead, is preferable; when response time is the priority, the event interrupt scheme is preferable. The two schemes are described separately below.

[0137] (a) Periodic interrupt scheme
FIG. 33 is a diagram showing the processing procedure of the disk subsystem (4) during normal operation under the periodic interrupt scheme.

[0138] When a write request occurs, the event interrupt (803) turns on the designated bit of the write data register (241). For the active unit (11), the register is set to (02)₁₆ (process 1000). Next, the write data (89) for the disk device and the checkpoint data are stored in the disk control unit (4-4) of the disk subsystem (4) (process 1001). An ACK is then returned to the active unit (11) (process 1002).

[0139] The disk subsystem (4) also refers to the write data register (241) periodically and repeats the following processing (processes 1011 and 1012) until the register value becomes (00)₁₆ (process 1010). First, the write data (89) for the disk device and the checkpoint data are written from the disk control unit (4-4) of the disk subsystem (4) to the shared disk device (5) (process 1011). The designated bit of the write data register (241) is then turned off (process 1012).

[0140] (b) Event interrupt scheme
FIG. 34 is a diagram showing the processing procedure of the disk subsystem (4) during normal operation under the event interrupt scheme.

[0141] When a write request occurs, the write data and the checkpoint data are stored in the disk control unit (4-4) of the disk subsystem (4) (process 1021) and are then written from the buffer of the disk subsystem (4) to the shared disk device (5) (process 1022). An ACK is then returned to the active unit (11) (process 1023).

[0142] With either of the two schemes above, the active unit (11) can take the checkpoint data (80). Duplicate writing to the internal disk device (11-6) and the shared disk device (5) also becomes possible.
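The difference between the two schemes can be sketched as follows. This is a simplified model, not the patented implementation: the class and method names are hypothetical, the register is modeled as an integer bit mask in which (02)₁₆ is the bit for the active unit (11) as stated above, the bit (04)₁₆ used for a second unit is an assumption, and only one buffered write per unit is kept.

```python
class DiskSubsystem:
    """Toy model of the disk subsystem (4) and its write data register (241)."""

    def __init__(self):
        self.write_data_register = 0x00   # one pending-write bit per unit
        self.buffer = {}                  # disk control unit (4-4): bit -> data
        self.shared_disk = []             # shared disk device (5)

    def write_periodic(self, unit_bit, data):
        """Periodic scheme: set the bit, buffer the data, ACK immediately."""
        self.write_data_register |= unit_bit
        self.buffer[unit_bit] = data
        return "ACK"

    def timer_tick(self):
        """Periodic flush: drain buffered writes until the register is 0x00."""
        while self.write_data_register != 0x00:
            # Isolate the lowest set bit of the register.
            bit = self.write_data_register & -self.write_data_register
            self.shared_disk.append(self.buffer.pop(bit))
            self.write_data_register &= ~bit

    def write_event(self, unit_bit, data):
        """Event scheme: write through to the shared disk, then ACK."""
        self.buffer[unit_bit] = data
        self.shared_disk.append(self.buffer.pop(unit_bit))
        return "ACK"


subsystem = DiskSubsystem()
subsystem.write_periodic(0x02, "checkpoint from unit 11")
subsystem.timer_tick()
print(subsystem.shared_disk)
```

In the periodic scheme the ACK is returned before the data reaches the shared disk, which keeps per-write overhead low; in the event scheme the ACK follows the disk write, matching the order of processes 1021 to 1023.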

[0143] Failure detection procedure
Next, with reference to FIGS. 35 to 37, an outline is given of how the standby unit (10) detects a failure of the active unit (11). Failures of the other active units (12 to 17) can be detected in the same way. The same method also applies when the active units (11 to 17) detect a failure of the standby unit (10).

[0144] FIG. 35 is a diagram showing the method of detecting failures of the active units (11 to 17) by alive messages. Each active unit (11 to 17) transfers an alive message (901) to the standby unit (10) every second. If the standby unit (10) does not receive the next alive message (901) within 2 seconds of receiving the last one (905), it determines that a failure has occurred in that active unit.
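The timing rule above (a 1-second send period and a 2-second silence limit) can be sketched as follows. The class `AliveMonitor` and its methods are hypothetical names used only for illustration, and time is passed in explicitly as a number of seconds rather than read from a clock.

```python
class AliveMonitor:
    """Tracks the last alive message per unit and flags silent units."""

    def __init__(self, units, timeout=2.0, start=0.0):
        self.timeout = timeout                          # 2 seconds in the description
        self.last_seen = {u: start for u in units}

    def on_alive(self, unit, now):
        """Record the arrival time of an alive message from `unit`."""
        self.last_seen[unit] = now

    def failed_units(self, now):
        """Return units from which no alive message arrived within the timeout."""
        return sorted(u for u, t in self.last_seen.items()
                      if now - t >= self.timeout)


monitor = AliveMonitor(range(11, 18))
monitor.on_alive(11, 1.0)
print(monitor.failed_units(3.0))
```

Using a timeout of twice the send period tolerates one delayed message before a unit is declared failed.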

[0145] The detailed procedure by which the standby unit (10) detects a failure (904) occurring in the active unit (11) is described below.

[0146] FIG. 36 is a flowchart of the alive message transmission process of the active unit (11). The active unit (11) starts the alive message transmission process (822) every second by means of the timer interrupt (702). In this process, the active unit (11) changes the register corresponding to the active unit (11) among the alive registers (253) of the standby unit (10) from (00)₁₆ to (01)₁₆ (process 1100).

[0147] FIG. 37 is a flowchart of the alive message reception confirmation process of the standby unit (10). When 2 seconds have elapsed since the standby unit (10) received the last alive message (901) from the active unit (11), it starts this alive message reception confirmation process (772).

[0148] In this process, the standby unit (10) determines whether the register corresponding to the active unit (11) among its own alive registers (253) is (01)₁₆ (process 1110). If the alive register (253) is (01)₁₆, the standby unit judges the active unit (11) to be normal and sets the alive register to (00)₁₆ (process 1111). If the alive register (253) is not (01)₁₆, the standby unit determines that a failure has occurred in the active unit (11) (process 1112). Failures of the other active units (12 to 17) are detected in the same way, as are failures of the standby unit by the active units (11 to 17).
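The register handshake of processes 1100 and 1110 to 1112 can be sketched as a set-and-clear flag. The names `AliveRegisters`, `mark_alive`, and `check_and_clear` are hypothetical; the values (00)₁₆ and (01)₁₆ are those given in the description.

```python
class AliveRegisters:
    """Per-unit alive registers (253) held by the standby unit."""

    def __init__(self, units):
        self.reg = {u: 0x00 for u in units}

    def mark_alive(self, unit):
        """Written by the active unit every second (process 1100)."""
        self.reg[unit] = 0x01

    def check_and_clear(self, unit):
        """Run by the standby unit (processes 1110 to 1112)."""
        if self.reg[unit] == 0x01:
            self.reg[unit] = 0x00     # normal: re-arm for the next interval
            return True
        return False                  # flag was never set: treat as a failure


registers = AliveRegisters([11, 12])
registers.mark_alive(11)
print(registers.check_and_clear(11), registers.check_and_clear(12))
```

Clearing the flag on every successful check is what makes a missing update visible at the next check.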

[0149] Standby unit takeover procedure
Next, with reference to FIG. 38, the takeover procedure of the standby unit (10) when the active unit (11) fails is described.

[0150] The standby unit (10) detects the failure of the active unit (11) by means of alive messages (process 1200) and notifies the active communication processing server (2-0) of the failure (process 1201). Next, the status register (250-1) of the active unit (11) is changed from (20)₁₆ to (01)₁₆, placing the active unit (11) in the offline state (153) (process 1202). The status register (250-0) of the standby unit (10) is then changed from (04)₁₆ to (20)₁₆, placing the standby unit (10) in the active state (150) (process 1203).
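The two register updates of processes 1202 and 1203 can be sketched as follows. The register values are the ones quoted in the description; the enum `UnitState`, the function `take_over`, and the precondition check are illustrative assumptions rather than part of the patent text.

```python
from enum import IntEnum


class UnitState(IntEnum):
    OFFLINE = 0x01    # offline state (153)
    STANDBY = 0x04    # standby state (152)
    ACTIVE = 0x20     # active state (150)


def take_over(status, failed, standby):
    """Apply processes 1202 and 1203 to a dict of status registers."""
    if status[failed] != UnitState.ACTIVE or status[standby] != UnitState.STANDBY:
        raise ValueError("takeover needs an active unit and a standby unit")
    status[failed] = UnitState.OFFLINE     # process 1202: (20) -> (01)
    status[standby] = UnitState.ACTIVE     # process 1203: (04) -> (20)
    return status


registers = {10: UnitState.STANDBY, 11: UnitState.ACTIVE}
take_over(registers, failed=11, standby=10)
print({unit: hex(state) for unit, state in registers.items()})
```

After one takeover there is no unit left in the standby state, so a second call with the same arguments is rejected by the assumed precondition.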

[0151] Next, the standby unit (10) suspends the alive message transmission process (771) and the alive message reception confirmation process (772) (process 1204). It reads the shared disk device (5) and searches for the checkpoint data of the messages that were being executed (process 1205). It then determines whether each message being executed had reached a checkpoint (process 1206). If it had, execution resumes from the checkpoint (process 1207). If it had not, processing restarts from the beginning of the message (process 1208). The standby unit (10) starts copying from the standby unit area (5-0) of the shared disk device (5) to the internal disk device (10-6) of the standby unit (10) (process 1209).
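The decision in processes 1205 to 1208 can be sketched as follows, assuming for illustration that in-flight messages are identified by their time stamps and that checkpoints are held in a dictionary keyed by time stamp. The function `recovery_plan` and both data layouts are hypothetical.

```python
def recovery_plan(in_flight, checkpoints):
    """Decide, per in-flight message, where the standby unit resumes."""
    plan = {}
    for stamp in in_flight:
        checkpoint = checkpoints.get(stamp)         # process 1205: look for a checkpoint
        if checkpoint is not None:                  # process 1206: checkpoint reached?
            plan[stamp] = ("resume", checkpoint)    # process 1207
        else:
            plan[stamp] = ("restart", None)         # process 1208
    return plan


plan = recovery_plan(in_flight=[101, 102],
                     checkpoints={101: {"pc": 0x4000}})
print(plan)
```

A message without a checkpoint has not yet written anything to disk, which is why it can simply be re-run from the start.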

[0152] FIG. 39 is a detailed flowchart of the process of resuming from the checkpoint (process 1207) in FIG. 38.

[0153] In process 1207, the takeover information in the checkpoint data on the shared disk device (5) is first read, and DR0-DR7 (500-507), AR0-AR6 (510-516), AR7 (520), SR (521), and PC (522) are set in the processor (31-1) of the standby unit (10), which is the standby system. Setting SR (521) brings the interrupt level to 0 (process 1220). The values of DR0-DR7 (500-507), AR0-AR6 (510-516), AR7 (520), SR (521), and PC (522) are the values of (process 1206), that is, the values immediately before execution of the write processing to the disk.

[0154] The processor (10-1) of the standby unit (10) resumes from the point of the write processing to the disk (the checkpoint) by means of an RTE instruction (process 1221).

[0155] Meanwhile, the failed active unit (11) is reset and transitions to the offline state (153). It is kept from rebooting until repair of the failed part is complete.

[0156] The standby unit (10) executes the processes of FIGS. 40, 41, and 42.

[0157] FIG. 40 is a flowchart of the copy process of the standby unit (10). First, the copy-in-progress register (255) is set to (01)₁₆ (process 1230), and the copy completion pointer is set to (000000)₁₆ (process 1231). Copying is performed 1 kB at a time (process 1232); when a 1 kB copy completes, the current copy completion pointer is updated to (000200)₁₆ (process 1233).

[0158] When the value of the copy completion pointer reaches (FFFFFF)₁₆ (process 1234), copying is complete (process 1235) and the copy-in-progress register (255) is set to (00)₁₆ (process 1236). If the value of the copy completion pointer has not yet reached (FFFFFF)₁₆, control returns to process 1232 (process 1237) and another copy is performed.
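The copy loop of FIG. 40 can be sketched as follows. This is a simplified model under stated assumptions: the pointer here counts bytes copied and the loop ends at the end of the source area, whereas the description uses the pointer values (000200)₁₆ and (FFFFFF)₁₆, whose unit is not stated and is therefore not reproduced. The function name `copy_area` and the chunk constant are hypothetical.

```python
CHUNK = 1024  # bytes copied per step (the "1 kB" of process 1232)


def copy_area(shared, internal):
    """Copy `shared` into `internal` chunk by chunk.

    Yields (copy_in_progress, pointer) after every step so that a caller
    can interleave other work with the copy.
    """
    copying = 0x01                       # copy-in-progress register (255)
    pointer = 0                          # copy completion pointer
    while pointer < len(shared):
        end = min(pointer + CHUNK, len(shared))
        internal[pointer:end] = shared[pointer:end]
        pointer = end                    # everything below `pointer` is copied
        yield copying, pointer
    copying = 0x00                       # copy finished
    yield copying, pointer


source = bytes(3000)
target = bytearray(len(source))
for in_progress, pointer in copy_area(source, target):
    print(in_progress, pointer)
```

Exposing the pointer after each chunk is what lets reads and writes arriving during the copy be routed correctly.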

[0159] FIG. 41 is a flowchart of the write process to the disk devices of the standby unit. First, the copy completion pointer is consulted to determine whether the target area has already been copied (process 1240). If it has, the data is written to both the shared disk device (5) and the internal disk device (10-6) (process 1241). If it has not, the data is written only to the shared disk device (5) (process 1242).

[0160] FIG. 42 is a flowchart of the read process from the disk devices of the standby unit. First, the copy-in-progress register (255) is consulted (process 1250); if copying is complete, the data is read from the internal disk device (10-6) (process 1251). If copying is not complete, the data is read from the shared disk device (5) (process 1252).
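The routing rules of FIGS. 41 and 42 can be sketched as two small functions. The names `write_targets` and `read_source` are hypothetical, and addresses and the copy completion pointer are modeled as plain integers.

```python
def write_targets(address, copy_pointer):
    """Processes 1240 to 1242: which disks a write at `address` must reach."""
    if address < copy_pointer:               # area already copied
        return ("shared", "internal")
    return ("shared",)                       # area not yet copied


def read_source(copy_in_progress):
    """Processes 1250 to 1252: which disk serves reads."""
    return "shared" if copy_in_progress else "internal"


print(write_targets(0x100, copy_pointer=0x400))
print(write_targets(0x800, copy_pointer=0x400))
print(read_source(copy_in_progress=0x01))
```

One plausible reading of this design is that a write to an already copied area must go to both disks so the internal copy does not fall behind, while a write to an uncopied area will be picked up later by the copy itself.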

[0161] If a failure occurs in the standby unit, an active unit detects it and notifies all the other active units, and the standby unit is disconnected.

[0162] Resynchronization procedure
Finally, with reference to FIG. 43, the procedure followed when the active unit (11) has been repaired after a failure is described.

[0163] FIG. 43 is a flowchart covering the period from recovery of the old active unit (11-a) until it enters backup operation with the new active unit (10-a). Here, the failed active unit (11) is called the old active unit (11-a), and the standby unit (10) is called the new active unit (10-a).

[0164] When the old active unit (11-a) starts repair and the repair is completed, it transitions from the offline state (153) to the repair state (154) (process 1400). It then notifies the new active unit (10-a) that the repair is complete (process 1401). The new active unit (10-a) receives this notification (process 1402) and requests the old active unit (11-a) to transition to the standby state (152) (process 1403). On receiving this request, the old active unit (11-a) changes the status register (250-1) from (01)₁₆ to (02)₁₆, thereby transitioning from the repair state (154) to the standby state (152) (process 1404). The old active unit (11-a) and the new active unit (10-a) then begin exchanging alive messages (process 1405).

[0165] <Operation Example 2> Next, for the system of this embodiment described with reference to FIGS. 1 to 30, the processing performed when a failure occurs in the internal disk device (11-6) of the active unit (11) is explained.

[0166] In this case, if the standby unit (10) is in the standby state (152), processing could be handed over to the standby unit (10). If the standby unit (10) is not in the standby state (152), however, takeover is impossible. For this reason, the processing described here blocks the internal disk device (11-6) of the active unit (11) and continues operation using only the shared disk device (5).

[0167] FIG. 44 is a diagram showing the processing procedure when the internal disk device fails. The active unit (11) detects the failure of the internal disk device (11-6) through an external interrupt (process 1601). The active unit (11) then notifies the disk subsystem (4) that a failure has occurred in the internal disk device (11-6) (process 1602).

[0168] Meanwhile, the disk subsystem (4) sets the disk status register (240-1) to (02)₁₆ to indicate that a failure has occurred in the internal disk device (11-6) of the active unit (11) (process 1610).

[0169] The active unit (11) resets the disk control device (11-4) (process 1603). The status register (250-1) of the active unit (11) is then changed from (20)₁₆ to (08)₁₆, placing the unit in the quasi-active state (internal disk device failure) (151).

[0170] From this point on, in response to read or write requests from the active unit (11), the disk subsystem (4) performs read processing (811) by means of the emergency interrupt (804), as shown in FIG. 27. Write processing (821) is performed by the timer interrupt (802) or the event interrupt (803), just as in the normal state (the state in which no failure has occurred).

[0171] FIG. 45 is a diagram showing the processing procedure when repair of the internal disk device is complete. When repair of the internal disk device (11-6) is completed in the active unit (11) (process 1650), the active unit (11) notifies the disk subsystem (4) that the repair is complete (process 1651).

[0172] Meanwhile, the disk subsystem (4) clears the corresponding bit of the disk status register (240-1). Specifically, it changes the disk status register (240-1) from (02)₁₆ to (00)₁₆ (process 1660).

[0173] The active unit (11) starts copying from the shared disk device (5) to the internal disk device (11-6) (process 1652). When this copy is complete, the active unit (11) starts copying from the disk subsystem (4) (process 1653). When the copy from the disk subsystem (4) is complete, the disk subsystem (4) clears the corresponding bit of the disk status register (240). The status register is also changed from (08)₁₆ to (20)₁₆, and the unit transitions from the quasi-active state (internal disk failure) (151) to the active state (150) (process 1654).

[0174] <Operation Example 3> Next, for the system of this embodiment described with reference to FIGS. 1 to 30, the processing performed when a failure occurs in the shared disk device (5) used by the active units (11 to 17) is explained. The processing described here blocks the shared disk device (5) and continues operation using only the internal disk devices (11-6 to 17-6).

[0175] FIG. 46 is a diagram showing the processing procedure when the shared disk device fails. When the active unit (11) detects a failure of the shared disk device (5) (process 1700), it notifies the other active units (12 to 17) and the standby unit (10) that a failure has occurred in the shared disk device (5) (process 1701).

[0176] Meanwhile, the other active units (12 to 17) change their status registers (250-2 to 250-7) from (20)₁₆ to (10)₁₆ and enter the quasi-active state (shared disk device failure) (151) (process 1710). The standby unit (10) changes its status register (250-0) from (04)₁₆ to (01)₁₆ and enters the offline state (153) (process 1710).

[0177] The active unit (11) resets the disk subsystem (4). The active unit (11) then changes its status register (250-1) from (20)₁₆ to (10)₁₆ and enters the quasi-active state (shared disk device failure) (151) (process 1703).

[0178] FIG. 47 is a diagram showing the processing procedure when repair of the shared disk device is complete. When the active unit (11) detects that repair of the shared disk device (5) is complete (process 1750), it notifies the other active units (12 to 17) that the repair is complete (process 1751).

[0179] The active units (12 to 17) and the standby unit (10) copy the contents of their internal disk devices (12-6 to 17-6) to the designated areas (5-2 to 5-7, 5-0) of the shared disk device (5) (process 1760). When that processing is complete, the active units (12 to 17) change their status registers (250-2 to 250-7) from (10)₁₆ to (20)₁₆ and enter the active state (150) (process 1761). The standby unit (10) changes its status register (250-0) from (01)₁₆ to (04)₁₆ and enters the standby state (process 1710).

[0180] The active unit (11) copies the contents of its internal disk device (11-6) to the designated area (5-1) of the shared disk device (process 1752). When this processing is complete, the active unit (11) changes its status register from (10)₁₆ to (20)₁₆ and transitions from the quasi-active state (shared disk failure) (151) to the active state (150) (process 1753).

[0181] <Other Embodiment> Next, another embodiment of the present invention is described.

[0182] FIG. 48 is a system configuration diagram according to a second embodiment of the present invention. As shown in the figure, the system of this embodiment includes a host computer (8). The host computer (8) consists of an active host (8-0) and a standby host (8-1) and is provided with a shared disk device (8-2).

[0183] FIG. 49 is a configuration diagram of the active units and the standby unit in this embodiment. Compared with FIG. 2, a line adapter (11-7) is added to the active units and the standby unit in this embodiment.

[0184] FIG. 50 is a configuration diagram of the line adapter in this embodiment. Because the line adapters (10-7 to 17-7) all have the same configuration, the line adapter (17-7) of the active unit (17) is described as an example. The line adapter (17-7) consists of a processor (17-7-1), a memory (17-7-2), a buffer (17-7-3), and a LAN control unit (17-7-4).

[0185] The buffer (17-7-3) of the line adapter (17-7) holds a message reception queue (17-7-7) and a transmission queue (17-7-7). The reception queue (17-7-7) stores messages received from the terminal (7), and the transmission queue (17-7-7) stores messages to be sent to the terminal (7).

[0186] The second embodiment is described below with reference to FIGS. 48 to 50.

[0187] The disk subsystem (4) and the shared disk device (5) used in the first embodiment are not used. The disk device (8-2) of the host computer (8) is used instead. The communication processing server (2) used in the first embodiment is also not used. Instead, the active units (11 to 17) and the standby unit (10) are provided with line adapters (10-7 to 17-7), and the line switching device (3) switches directly between the active units (11 to 17) and the standby unit (10).

[0188] As a result, as in the configuration of the first embodiment, the active units (11 to 17) and the standby unit (10) are provided with internal disk devices (10-6 to 17-6). Their contents are kept identical to those of the host's disk device (8-2), giving a duplexed configuration. The processing procedure may be the same as in the first embodiment.

【０１８９】[0189]

【発明の効果】本発明では、ｎ対１バックアップ構成に
おいて、現用機は必要なデータを自身の内蔵ディスクか
ら読み出し、処理結果と引継ぎ情報を共有ディスクと自
身の内蔵ディスクに書き込む。現用機で障害が発生する
と、予備機が共有ディスクから引継ぎ情報を読み出すこ
とにより、障害の発生した現用機の処理を引継ぐことが
できる。さらに、予備機は共有ディスクの内容を自身の
内蔵ディスクにコピーし、終了すると、内蔵ディスク装
置から読み出す。以上より、本発明では、共有ディスク
装置への書込み処理の場合にのみ他の現用機と競合する
が、共有ディスクの読出し処理と書込み処理の両方で競
合する従来の方式より、処理能力を向上させることが可
能となる。According to the present invention, in an n-to-1 backup configuration, each active unit reads the data it needs from its own built-in disk and writes processing results and takeover information to both the shared disk and its own built-in disk. When a failure occurs in an active unit, the standby unit reads the takeover information from the shared disk and can thereby take over the processing of the failed active unit. The standby unit also copies the contents of the shared disk to its own built-in disk and, once the copy is complete, reads from its built-in disk device. Consequently, in the present invention an active unit contends with other active units only when writing to the shared disk device, so processing capability can be improved over the conventional method, in which contention occurs for both reads and writes on the shared disk.
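The access pattern summarized above can be illustrated with a minimal sketch. It is not the implementation described in the embodiments: the class name, the use of in-memory dictionaries in place of disks, the choice of area 0 for the standby unit, and the access counter are all assumptions made for illustration.

```python
class DuplexedStore:
    """Hypothetical model: n + 1 shared-disk areas mirrored by built-in disks."""

    def __init__(self, n):
        # Index 0 stands for the standby unit, 1..n for the active units (assumed numbering).
        self.shared_areas = [dict() for _ in range(n + 1)]
        self.built_in = [dict() for _ in range(n + 1)]
        # Counts accesses to the shared disk, where contention can occur.
        self.shared_accesses = 0

    def write(self, unit, key, value):
        """Write to the unit's built-in disk and to its area of the shared disk."""
        self.built_in[unit][key] = value
        self.shared_areas[unit][key] = value
        self.shared_accesses += 1

    def read(self, unit, key):
        """Read from the built-in disk only, so the shared disk is not touched."""
        return self.built_in[unit][key]

    def take_over(self, failed_unit, standby=0):
        """Standby reads the takeover information, then copies the area locally."""
        takeover_info = dict(self.shared_areas[failed_unit])
        self.shared_accesses += 1
        self.built_in[standby] = dict(self.shared_areas[failed_unit])
        return takeover_info


store = DuplexedStore(n=7)
store.write(3, "checkpoint", "message 42 processed")
value_read = store.read(3, "checkpoint")
accesses_after_read = store.shared_accesses
takeover_info = store.take_over(failed_unit=3)
standby_value = store.read(0, "checkpoint")
```

In this sketch a read never increments the shared-disk counter, which mirrors the point of the paragraph: only writes (and the one-time takeover) go to the shared disk.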

[Brief description of the drawings]

【図１】本発明の第１の実施例に係るシステム構成図で
ある。FIG. 1 is a system configuration diagram according to a first embodiment of the present invention.

【図２】現用機と予備機の構成図である。FIG. 2 is a configuration diagram of a working machine and a spare machine.

【図３】ディスク装置の割当て図である。FIG. 3 is an allocation diagram of disk devices.

【図４】本発明の特徴を示す図である。FIG. 4 is a diagram showing features of the present invention.

【図５】ディスク制御装置の構成図である。FIG. 5 is a configuration diagram of a disk control device.

【図６】ＬＡＮアダプタの構成図である。FIG. 6 is a configuration diagram of a LAN adapter.

【図７】通信処理サーバの構成図である。FIG. 7 is a configuration diagram of a communication processing server.

【図８】通信処理サーバと端末の変換テーブルを示す図
である。FIG. 8 is a diagram showing a conversion table between the communication processing server and terminals.

【図９】回線切替装置の構成図である。FIG. 9 is a configuration diagram of a line switching device.

【図１０】ディスクサブシステムの構成図である。FIG. 10 is a configuration diagram of a disk subsystem.

【図１１】状態遷移図である。FIG. 11 is a state transition diagram.

【図１２】ディスク装置のモードとモード遷移を示す図
である。FIG. 12 is a diagram showing modes and mode transitions of a disk device.

【図１３】状態レジスタを示す図である。FIG. 13 shows a status register.

【図１４】割込みレジスタを示す図である。FIG. 14 is a diagram showing an interrupt register.

【図１５】現用機のアライブレジスタを示す図である。FIG. 15 is a diagram showing an alive register of an active device.

【図１６】予備機のアライブレジスタを示す図である。FIG. 16 is a diagram showing an alive register of a spare machine.

【図１７】チェックポイントデータレジスタを示す図で
ある。FIG. 17 is a diagram showing a checkpoint data register.

【図１８】チェックポイントデータエリアを示す図であ
る。FIG. 18 is a diagram showing a checkpoint data area.

【図１９】コピー中レジスタを示す図である。FIG. 19 is a diagram showing a copy-in-progress register.

【図２０】ディスクステータスレジスタを示す図であ
る。FIG. 20 is a diagram showing a disk status register.

【図２１】書込みデータレジスタを示す図である。FIG. 21 is a diagram showing a write data register.

【図２２】プロセッサ、メモリ、およびＩＯＰの詳細回
路図である。FIG. 22 is a detailed circuit diagram of a processor, a memory, and an IOP.

【図２３】タイメ割込みの制御回路を示す図である。FIG. 23 is a diagram showing a timer interrupt control circuit.

【図２４】メモリマップを示す図である。FIG. 24 is a diagram showing a memory map.

【図２５】現用機のソフトウェア構成を示す図である。FIG. 25 is a diagram showing a software configuration of an active device.

【図２６】予備機のソフトウェア構成を示す図である。FIG. 26 is a diagram showing a software configuration of a spare machine.

【図２７】ディスクサブシステムのソフトウェア構成を
示す図である。FIG. 27 is a diagram showing a software configuration of a disk subsystem.

【図２８】電文処理の概要を示す図である。FIG. 28 is a diagram showing an outline of message processing.

【図２９】電文管理テーブルを示す図である。FIG. 29 is a diagram showing a message management table.

【図３０】電文のフォーマット図である。FIG. 30 is a format diagram of a message.

【図３１】通信処理手順を示す図である。FIG. 31 is a diagram showing a communication processing procedure.

【図３２】通常運転中の処理手順を示す図である。FIG. 32 is a diagram showing a processing procedure during normal operation.

【図３３】通常運転中のディスクサブシステムの処理手
順（周期割込み方式）を示す図である。FIG. 33 is a diagram showing a processing procedure (periodic interrupt method) of the disk subsystem during normal operation.

【図３４】通常運転中のディスクサブシステムの処理手
順（イベント割込み方式）を示す図である。FIG. 34 is a diagram showing a processing procedure (event interrupt method) of the disk subsystem during normal operation.

【図３５】アライブメッセージによる障害検出方式を示
す図である。FIG. 35 is a diagram showing a failure detection method using an alive message.

【図３６】アライブメッセージの送信処理のフローチャ
ート図である。FIG. 36 is a flowchart of an alive message transmission process.

【図３７】アライブメッセージの受信確認処理のフロー
チャート図である。FIG. 37 is a flowchart of an alive message reception confirmation process.

【図３８】予備機の引継ぎ処理のフローチャート図であ
る。FIG. 38 is a flow chart of a takeover process of a spare machine.

【図３９】（処理１２０７）の詳細フローチャート図
である。FIG. 39 is a detailed flowchart of process 1207.

【図４０】予備機のコピー処理のフローチャート図であ
る。FIG. 40 is a flowchart of a copy process performed by a spare machine.

【図４１】予備機のディスク装置のコピー処理のフロー
チャート図である。FIG. 41 is a flowchart of a copy process of a disk device of a spare machine.

【図４２】予備機のディスク装置からの読み出し処理を
示す図である。FIG. 42 is a diagram illustrating a reading process from a disk device of a spare machine.

【図４３】再同期化処理手順を示す図である。FIG. 43 is a diagram showing a resynchronization processing procedure.

【図４４】内蔵ディスク装置障害時の処理手順を示す図
である。FIG. 44 is a diagram showing a processing procedure when an internal disk device fails.

【図４５】内蔵ディスク装置障害修復時の処理手順を示
す図である。FIG. 45 is a diagram showing a processing procedure at the time of repairing a failure in an internal disk device.

【図４６】共有ディスク装置障害時の処理手順を示す図
である。FIG. 46 is a diagram showing a processing procedure when a failure occurs in the shared disk device.

【図４７】共有ディスク装置障害修復時の処理手順を示
す図である。FIG. 47 is a diagram showing a processing procedure when a shared disk device failure is repaired.

【図４８】本発明の第２の実施例に係るシステム構成図
である。FIG. 48 is a system configuration diagram according to a second embodiment of the present invention.

【図４９】第２の実施例のシステム構成図における現用
機と予備機の構成図である。FIG. 49 is a configuration diagram of an active unit and a standby unit in the system configuration diagram of the second embodiment.

【図５０】回線アダプタの構成図である。FIG. 50 is a configuration diagram of a line adapter.

[Explanation of symbols]

１…高速ＬＡＮ、２…通信処理サーバ、２−０…現用通
信処理サーバ、２−１…待機通信処理サーバ、２−２…
通信処理サーバ用ディスク装置、２−３…タイマ、２−
４…端末変換テーブル、３…回線切替装置、３−０…競
合防止回路、４…ディスクサブシステム、５…共有ディ
スク、６…回線、７…端末、８…ホストコンピュータ、
８−０…現用ホスト、８−１…予備ホスト、８−２…ホ
ストの共有ディスク装置、１０…予備機、１１〜１７…
現用機〜現用機、７０…電文フォーマット、７１…現用
機の番号、７２…電文本体、７３…時刻印、２４０…デ
ィスクステータスレジスタ、２４１…書込みデータレジ
スタ、２５０…状態レジスタ、２５１…割込みレジス
タ、２５２…現用機のアライブレジスタ、２５３…予備
機のアライブレジスタ、２５４…チェックポイントデー
タレジスタ、２５５…コピー中レジスタ、２７０〜２７
７…チェックポイントデータエリア。DESCRIPTION OF SYMBOLS 1: high-speed LAN, 2: communication processing server, 2-0: active communication processing server, 2-1: standby communication processing server, 2-2: disk device for communication processing server, 2-3: timer, 2-4: terminal conversion table, 3: line switching device, 3-0: contention prevention circuit, 4: disk subsystem, 5: shared disk, 6: line, 7: terminal, 8: host computer, 8-0: active host, 8-1: standby host, 8-2: shared disk device of host, 10: standby unit, 11 to 17: active units, 70: message format, 71: active unit number, 72: message body, 73: time stamp, 240: disk status register, 241: write data register, 250: status register, 251: interrupt register, 252: alive register of active unit, 253: alive register of standby unit, 254: checkpoint data register, 255: copy-in-progress register, 270 to 277: checkpoint data areas.

フロントページの続き (72)発明者 木下俊之 神奈川県川崎市麻生区王禅寺1099番地 株式会社日立製作所システム開発研究所内 (56)参考文献 特開平２−297643（ＪＰ，Ａ) 特開昭55−115151（ＪＰ，Ａ) (58)調査した分野(Int.Cl.⁶，ＤＢ名) G06F 3/06 Continuation of the front page (72) Inventor: Toshiyuki Kinoshita, c/o Hitachi, Ltd. Systems Development Laboratory, 1099 Ozenji, Asao-ku, Kawasaki-shi, Kanagawa (56) References: JP-A-2-297643 (JP, A); JP-A-55-115151 (JP, A) (58) Field surveyed (Int. Cl.⁶, DB name): G06F 3/06

Claims

(57) [Claims]

1. In a system in which n active units and a standby unit serving as their common spare are connected by a network, a backup device characterized in that: built-in disk devices are provided in all of the active units and the standby unit; a shared disk device shared by all of the active units and the standby unit is provided; the shared disk device is divided into n + 1 areas corresponding to the n active units and the standby unit; and each area of the shared disk device is duplexed with the built-in disk device of the active unit or standby unit corresponding to that area.

2. The backup device according to claim 1, wherein the active device reads data from the internal disk device of the active device and writes the data to the internal disk device and the shared disk device when the active device is normal.

3. The backup device according to claim 1, wherein a received message, processor registers, line information, message status, and process execution state are stored in the built-in disk device and the shared disk device as checkpoint data, that is, as information for taking over processing.

4. The backup device according to claim 1, wherein the n active units send alive messages to the standby unit, and the standby unit detects a failure of each active unit by confirming reception of the alive messages.

5. The backup device according to any one of claims 1 to 4, wherein the standby unit sends an alive message to each of the n active units, and each active unit detects a failure of the standby unit by confirming reception of the alive message.

6. The backup device according to any one of claims 1 to 5, wherein, upon detecting a failure of an active unit, the standby unit reads from the shared disk device the checkpoint data needed to restart the processing of the failed active unit.

7. The backup device according to any one of claims 1 to 6, wherein, during failure recovery processing for an active unit, the standby unit copies the disk contents of the failed active unit from the shared disk device to the built-in disk device of the standby unit.

8. The backup device according to any one of claims 1 to 7, wherein, when an active unit detects a failure of the standby unit, it notifies all other active units that the standby unit has failed and disconnects the standby unit from the system.

9. The backup device according to any one of claims 1 to 8, wherein, when an active unit detects a failure in its built-in disk device, it closes off the built-in disk device and performs both read processing and write processing on the shared disk device.

10. The backup device according to claim 9, wherein the read processing from the shared disk device is performed in priority over other write processing and copy processing to the shared disk device.

11. The backup device according to claim 1, wherein, when an active unit detects a failure in the shared disk device, it notifies all other active units, and all active units continue processing by reading from and writing to their own built-in disk devices.

12. The backup device according to claim 1, wherein, when the active unit in which the failure occurred has been repaired, all other active units are notified of the repair, and the repaired unit transitions to a standby state.

13. The backup device according to claim 1, wherein, when the shared disk device is repaired, all active units copy the contents of their built-in disk devices to the shared disk device.

14. The backup device according to claim 1, wherein the shared disk device is a disk device of a host configured by a hot standby system including an execution host and a standby host.

15. In a system in which n active units and a standby unit serving as their common spare are connected via a network, a backup method characterized by: providing built-in disk devices in all of the active units and the standby unit; providing a shared disk device shared by all of the active units and the standby unit; dividing the shared disk device into n + 1 areas corresponding to the n active units and the standby unit; and duplexing each area of the shared disk device with the built-in disk device of the active unit or standby unit corresponding to that area.
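The alive-message failure detection recited in claims 4 and 5 can be illustrated with the following minimal sketch. It is an assumption-laden illustration, not the claimed implementation: the class name, the timeout value, and the use of explicit timestamps are hypothetical, and the claims state only that a failure is detected by confirming reception of alive messages.

```python
class AliveMonitor:
    """Hypothetical receiver-side check for alive messages from monitored units."""

    def __init__(self, unit_ids, timeout):
        # A unit is treated as failed if no alive message arrives within
        # `timeout` time units (assumed criterion).
        self.timeout = timeout
        self.last_alive = {unit: 0.0 for unit in unit_ids}

    def record_alive(self, unit, now):
        """Record that an alive message from `unit` was received at time `now`."""
        self.last_alive[unit] = now

    def failed_units(self, now):
        """Return the units whose last alive message is older than the timeout."""
        return sorted(
            unit for unit, seen in self.last_alive.items()
            if now - seen > self.timeout
        )


# The standby unit (10) monitors the active units 11 to 17.
monitor = AliveMonitor(unit_ids=range(11, 18), timeout=3.0)
for unit in range(11, 18):
    monitor.record_alive(unit, now=5.0)
for unit in range(11, 18):
    if unit != 13:  # unit 13 stops sending alive messages
        monitor.record_alive(unit, now=10.0)
detected = monitor.failed_units(now=11.0)
```

The same check applies in the other direction, with each active unit monitoring the single standby unit.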