JPH10269097A

JPH10269097A - Fault management method by dual processing

Info

Publication number: JPH10269097A
Application number: JP9075758A
Authority: JP
Inventors: Shuichi Hisama; 修一久間; Minoru Ueda; 穣上田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-03-27
Filing date: 1997-03-27
Publication date: 1998-10-09

Abstract

(57) [Abstract]

PROBLEM TO BE SOLVED: To allow a system to recover autonomously to some extent and restore its functions even when a fault occurs, while keeping its configuration and control simple.

SOLUTION: Timer B recognizes that processor B has failed, resets processor B, and sets a flag indicating startup by reset. Based on this flag, processor B records its own fault occurrence in the phase management information area of the common memory 30. When processor A recognizes the fault of processor B from the phase management information, it stores system programs A and B in the global data area of the common memory 30. After the reset, processor B, when executing the initialization program, reads system programs A and B from the common memory 30 into processor memory B and re-executes processing from the beginning of system program B.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a fault management method for various systems that require high reliability, and more particularly to a fault management method by dual processing that uses two process units.

[0002]

[Prior Art] In various systems, including communication systems, duplicating the system is known as the most common way to improve operational reliability. FIG. 7 is a schematic configuration diagram of a conventional system based on this fault management method. System A (100) and system B (200), which has substantially the same functions as system A, are placed under a control system 300; when a fault occurs in system A, system operation is maintained by activating system B.

[0003] In this conventional fault management method, in addition to system A, which normally functions as the active system, and standby system B, which takes over when a fault occurs in system A, a control system 300 for switching between these systems is indispensable. As a result, the overall system becomes large and its control becomes complicated.

[0004] Another common conventional fault management method is the dual processor method, in which a single system 400 is built from processor A and processor B, as shown in FIG. 8. In this method, processor A, which performs the main processing, and processor B, which performs specific processing, each carry out control independently, and that processing is closed within each processor. Consequently, if a fault occurs in processor A, the system can no longer exchange information with the outside, and processor B inevitably becomes inoperable as well. Likewise, if a fault occurs in processor B, processor A can no longer obtain processing results from processor B, so processor A also ends up inoperable.

[0005] In other words, in the conventional dual processor method the two processors A and B are strictly in a master-slave relationship. When one processor fails, normal operation of the other, non-failed processor cannot be guaranteed, so satisfactory fault management could not be achieved.

[0006]

[Problems to Be Solved by the Invention] As described above, the representative conventional fault management methods are the system duplication method and the dual processor method. The former requires two systems with equivalent functions, so a larger system and more complicated control are unavoidable. In the latter, because the two processors are in a master-slave relationship, when one processor stops due to some fault the other processor stops functioning along with it. Autonomous recovery is then impossible without external intervention, and system reliability suffers.

[0007] The present invention eliminates these problems. Its object is to provide a fault management method by dual processing that, while simple in configuration and control, can recover autonomously to some extent and restore system functions even when a fault occurs.

[0008]

[Means for Solving the Problems] The present invention comprises: two process units; a common memory provided between the process units and having at least a global data area accessible by both process units; and inter-processor information transmitting/receiving means for exchanging, through the global data area of the common memory, the system programs that the processors require of each other. When a fault occurs in either process unit, the necessary system program is transferred from the other, normal process unit to the failed process unit through the global data area of the common memory, and the failed process unit is autonomously restored by re-executing that system program from the beginning in accordance with an initialization program.

[0009] The present invention also comprises: two process units; two processor memories, one provided for each process unit, each storing the system programs needed for the operation of both its own process unit and the partner process unit; a common memory provided between the processor memories and having a phase management information area, a processor information operation area, and a global data area; detection means for detecting that a fault has occurred in either process unit; fault information recording means for recording fault occurrence information for the failed process unit in the phase management information area of the common memory; recognition means for recognizing the occurrence of a fault in either process unit by monitoring the phase management information area of the common memory; writing means for writing the system programs from the processor memory of the process unit in which no fault has been recognized to the global data area of the common memory, based on control information transferred as needed to the processor information operation area of the common memory; and restart means for resetting the failed process unit and causing it to execute an initial program that includes a routine for reading the system programs from the global data area of the common memory.

[0010] By providing two process units and a common memory for exchanging information between them, the present invention avoids the increase in structural size, and the accompanying complexity of control, that is pronounced in the system duplication method, and allows a relatively compact system to be built. In addition, even if one process unit stops functioning, recovery control by the other, normally functioning process unit allows the functions of the whole system to be restored autonomously. This prevents the whole-system failure triggered by the stoppage of a single processor that is seen in the conventional dual processing method.

[0011]

[Embodiments of the Invention] An embodiment of the present invention is described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram showing the functional configuration of a processor system according to an embodiment of the present invention. The processor system comprises: two processors, processor A (10) and processor B (20), each operating independently as an MPU of the system; timer A (11) and timer B (21), which monitor whether processors A and B are functioning normally at the hardware level; processor memory A (12) and processor memory B (22), which store the system programs of processors A and B; and a common memory (30) used to exchange information between processors A and B.

[0012] Processors A and B share the processing of the system between them. Processor memory A is provided for processor A and, as shown in FIG. 2, comprises a processor initialization program 120 common to processors A and B, system program A (121) of its own processor, system program B (122) for restoring partner processor B when it has failed, and a unique data area 123 specific to its own processor.

[0013] Processor memory B, in turn, is provided for processor B and, like processor memory A (see FIG. 2), comprises an initialization program 220, system program B (222) of its own processor, system program A (221) for restoring partner processor A when it has failed, and a unique data area 223 specific to its own processor. In each of processor memories A and B, the initialization program (120, 220) is stored in ROM, while the executable programs, system data, and the like are stored in RAM.
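The mirrored layout of the two processor memories can be sketched as a small data structure. This is only an illustration under assumptions: the class name `ProcessorMemory`, its field names, and the byte-string program images are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorMemory:
    """Illustrative model of one processor memory (FIG. 2); all names are assumed."""
    init_program: str        # initialization program, held in ROM, common to A and B
    own_program: bytes       # system program of the own processor, held in RAM
    partner_program: bytes   # copy of the partner's system program, kept for recovery
    unique_data: dict = field(default_factory=dict)  # processor-specific data area


# Placeholder program images (hypothetical contents).
PROGRAM_A = b"system program A"
PROGRAM_B = b"system program B"

# Processor memory A holds program A as its own and program B for recovery;
# processor memory B is the mirror image.
memory_a = ProcessorMemory("init", own_program=PROGRAM_A, partner_program=PROGRAM_B)
memory_b = ProcessorMemory("init", own_program=PROGRAM_B, partner_program=PROGRAM_A)
```

Because each memory carries the partner's program, either processor can supply what the other needs after a fault.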

[0014] Timers A and B are hardware that monitors whether the adjacent processor A or B, respectively, is functioning at the hardware level. If the timer hardware has not been accessed by its processor for a fixed time or longer, it judges that the processor has stopped functioning, that is, has failed. Each timer has a function of resetting its corresponding processor and a function of making that processor recognize, on startup from the reset, that it is in a "startup by hardware reset" state.
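The timer behaves like a watchdog, and a minimal software sketch of that behavior follows. The class name `WatchdogTimer`, the tick-based time model, and the method names `kick` and `tick` are assumptions made for illustration; the specification describes the timer only as hardware.

```python
class WatchdogTimer:
    """Illustrative watchdog: resets its processor after `timeout` ticks without access."""

    def __init__(self, timeout: int):
        self.timeout = timeout
        self.elapsed = 0
        self.reset_flag = False  # models the "startup by hardware reset" flag

    def kick(self) -> None:
        """Called whenever the processor accesses the timer; restarts the count."""
        self.elapsed = 0

    def tick(self) -> bool:
        """Advance one time unit; return True when the processor must be reset."""
        self.elapsed += 1
        if self.elapsed >= self.timeout:
            self.reset_flag = True  # the processor reads this flag when it restarts
            self.elapsed = 0
            return True
        return False
```

A processor that keeps calling `kick()` is never reset; one that stops calling it is reset after `timeout` ticks and finds `reset_flag` set when it starts up again.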

[0015] The common memory 30 holds the information used by the other, normal processor to restore a failed processor. As shown in FIG. 3, it consists of a phase management information area 300, a processor information operation area 301, and a global data area 302.

[0016] In normal operation, that is, while processors A and B are both functioning normally, the phase management information area 300 of the common memory 30 holds the phase information of both processors, the processor information operation area 301 holds control data exchanged between processors A and B, and the global data area 302 stores data used in common by both processors.

[0017] When a fault occurs in processor A or processor B, by contrast, the area allocation of the common memory 30 changes to that shown in FIG. 4: the processor information operation area 301 is used as a transfer control area for the system programs, and the global data area 302 is used as a storage area for the system program of the failed processor.
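The two uses of the common memory (normal operation in FIG. 3, fault handling in FIG. 4) can be sketched as one structure whose area roles depend on the recorded phase. The names `CommonMemory`, `Mode`, and `area_roles`, and the `"running"` phase encoding, are hypothetical choices for this sketch and are not given in the specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    FAULT = "fault"


@dataclass
class CommonMemory:
    # Area 300: one phase entry per processor ("running" is an assumed encoding).
    phase_info: dict = field(default_factory=lambda: {"A": "running", "B": "running"})
    # Area 301: processor information operation area.
    processor_info: dict = field(default_factory=dict)
    # Area 302: global data area.
    global_data: dict = field(default_factory=dict)

    def mode(self) -> Mode:
        """FAULT as soon as any processor's phase differs from 'running'."""
        healthy = all(phase == "running" for phase in self.phase_info.values())
        return Mode.NORMAL if healthy else Mode.FAULT

    def area_roles(self) -> dict:
        """Describe how areas 301 and 302 are used in the current mode."""
        if self.mode() is Mode.NORMAL:
            return {
                "processor_info": "control data between processors A and B",
                "global_data": "data shared by processors A and B",
            }
        return {
            "processor_info": "transfer control for system programs",
            "global_data": "system program storage for the failed processor",
        }
```

The same physical areas are reused in both modes; only their interpretation changes once a fault is recorded in the phase management information.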

[0018] The processing operation of this processor system is described below. While the system is operating normally, processors A and B each independently execute their assigned processing. During this time the common memory 30 is used as memory for storing the global data referenced by both processors (see FIG. 3). Each processor also monitors the contents of the phase management information area of the common memory 30 at fixed intervals, so that processors A and B check each other's phase.

[0019] If some fault occurs in either processor during system operation, the system recovers autonomously under the control described below. This example assumes two types of fault: a task-level fault that occurs in processor A or B while the operating system (OS) is in a normal state, and a fault of the OS itself rather than of a task.

[0020] First, the fault recovery operation for a task-level fault occurring in processor A or B while the OS is in a normal state is described with reference to the flowchart in FIG. 5.

[0021] During system operation, the OS monitors whether a task fault has occurred in processor A or B, as shown in FIG. 5(a) (step 501). If, for example, the task being executed by processor B fails (YES in step 501), the OS, on recognizing this, records in the phase management information area of the common memory 30 that processor B has failed (step 502).

[0022] Meanwhile, processor A, which is operating normally at that point, accesses the phase management information area of the common memory 30 at fixed intervals to check the phase state of partner processor B (steps 510, 511), as shown in FIG. 5(b), and thereby monitors whether a fault has occurred in processor B (step 512).

[0023] When it recognizes that a fault has occurred in processor B (YES in step 512), the normally operating processor A performs processing that moves the program counter to the very beginning of system program B in processor memory B of the failed processor B (step 513).

[0024] The failed processor B, for its part, repeatedly attempts to access the program counter after falling into the task fault (step 520). Once the access succeeds (YES in step 521), it re-executes its own system program B from the beginning in accordance with that program counter (step 522).

[0025] In summary, for a task-level fault, for example when processor B falls into a task fault, the OS recognizes this and records it in the phase management information area of the common memory 30. The normally operating processor A periodically checks this phase management information to monitor whether processor B has failed. If it has, processor A moves the program counter to the very beginning of system program B (222) in processor memory B, causing processor B to re-execute its system program processing from the beginning and thereby achieving autonomous recovery of function.
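The task-level recovery sequence of FIG. 5 can be traced with a small simulation. This is a sketch under stated assumptions, not the patented mechanism itself: the specification does not say how processor A moves processor B's program counter, so the `restart_pc` slot, the `"task_fault"` phase value, and all function names here are hypothetical.

```python
PROGRAM_B_START = 0  # hypothetical entry address of system program B


class SharedState:
    """Stand-in for the state both processors can see (assumed representation)."""

    def __init__(self):
        self.phase = {"A": "running", "B": "running"}  # phase management information
        self.restart_pc = {"A": None, "B": None}       # assumed hand-over slot for the PC


def os_report_task_fault(shared: SharedState, name: str) -> None:
    """Steps 501-502: the OS records that a processor's task has failed."""
    shared.phase[name] = "task_fault"


def healthy_poll(shared: SharedState, partner: str, start: int) -> bool:
    """Steps 510-513: the healthy processor checks the partner's phase and,
    on a fault, points the partner's program counter at the program start."""
    if shared.phase[partner] == "task_fault":
        shared.restart_pc[partner] = start
        return True
    return False


def failed_try_restart(shared: SharedState, name: str):
    """Steps 520-522: the failed processor retries until the program counter is
    available, then restarts its system program from that address."""
    pc = shared.restart_pc[name]
    if pc is None:
        return None  # access not yet successful; the caller retries
    shared.restart_pc[name] = None
    shared.phase[name] = "running"
    return pc
```

In a run of this sketch, `failed_try_restart` returns `None` until `healthy_poll` has seen the fault, after which it returns the start address and the phase goes back to `"running"`.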

[0026] Next, the fault recovery operation for the case where the OS itself, rather than a task, has stopped is described with reference to the flowchart in FIG. 6. During system operation, timers A and B monitor whether a fixed time or longer has passed without access from processors A and B, respectively, as shown in FIG. 6(a) (steps 601 to 603). If there has been no access for the fixed time or longer (YES in step 603), the timer resets its corresponding processor (step 604).

[0027] For example, when processor B fails, timer B detects this and resets processor B. At that time, timer B sets a flag within itself indicating "startup by hardware reset" (step 605).

[0028] The failed processor B, as shown in FIG. 6(b), on starting up (step 610) following the reset by timer B (step 604), checks the "startup by reset" flag in timer B (step 611). If the flag is on (YES in step 612), processor B records in the phase management information area of the common memory 30 that it has been restarted by a hardware reset (step 613).

[0029] Processor B then executes the initialization program 220 stored in processor memory B (step 614). The initialization program 220 includes a routine that, on startup by hardware reset, reads system programs A and B from the common memory 30. Accordingly, when executing the initialization program 220 in step 614, processor B follows this routine to read system programs A and B from the common memory 30 (step 615), stores them in processor memory B, and then executes processing from the beginning of its own system program B in processor memory B (step 616).

[0030] Meanwhile, the normally functioning processor A accesses the phase management information area of the common memory 30 at fixed intervals to check the phase information (steps 620, 621) and, based on the result, determines whether a fault has occurred in partner processor B (step 622).

[0031] When processor A recognizes that processor B has failed (YES in step 622), from the fault occurrence information that processor B itself recorded in the phase management information area in step 613 to report its restart by hardware reset, processor A stores system programs A and B from processor memory A into the global data area of the common memory 30 (step 623).

[0032] In summary, for a processor fault in which the OS has stopped, for example when processor B fails, the corresponding timer B detects the fault. Timer B restarts the failed processor B by resetting it and then presents the "startup by reset" flag to processor B, so that processor B records its own fault occurrence information in the phase management information area of the common memory 30. The normally operating processor A determines whether processor B has failed by periodically accessing this phase management information; when processor B has failed, processor A stores system programs A and B from its own processor memory A into the global data area of the common memory. Processor B, after recording its fault occurrence information as described above, executes the initialization program. Following the routine contained in that program (the routine that reads system programs A and B from the common memory 30 on startup by hardware reset), it obtains system programs A and B and executes processing from the beginning of system program B, thereby achieving autonomous recovery of function.
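The OS-level recovery sequence of FIG. 6 can be walked through with the sketch below. Several details are assumptions made only for illustration: the dictionary layout of the common memory, the `"hw_reset"` phase value, and in particular the `transfer_ready` entry in the processor information operation area, which stands in for the transfer control that the specification mentions without detailing.

```python
class Timer:
    """Holds only the 'startup by hardware reset' flag (illustrative)."""

    def __init__(self):
        self.reset_flag = False


def new_common_memory() -> dict:
    return {
        "phase": {"A": "running", "B": "running"},  # area 300
        "control": {},                              # area 301 (transfer control)
        "global": {},                               # area 302 (program storage)
    }


def timer_reset(timer: Timer) -> None:
    """Steps 604-605: the timer resets its processor and sets the flag."""
    timer.reset_flag = True


def boot_after_reset(timer: Timer, common: dict, name: str) -> None:
    """Steps 610-613: on startup the processor checks the flag and, if set,
    records its hardware-reset restart in the phase management information."""
    if timer.reset_flag:
        common["phase"][name] = "hw_reset"


def healthy_monitor(common: dict, own_memory: dict, partner: str) -> bool:
    """Steps 620-623: the healthy processor stores both system programs in the
    global data area once it sees the partner's fault."""
    if common["phase"][partner] == "hw_reset":
        common["global"]["program_A"] = own_memory["program_A"]
        common["global"]["program_B"] = own_memory["program_B"]
        common["control"]["transfer_ready"] = True  # assumed handshake signal
        return True
    return False


def init_program(common: dict, own_memory: dict, name: str):
    """Steps 614-616: read both programs from the common memory into the own
    processor memory, then return the own program to run from its beginning."""
    if not common["control"].get("transfer_ready"):
        return None  # programs not stored yet; the caller retries
    own_memory["program_A"] = common["global"]["program_A"]
    own_memory["program_B"] = common["global"]["program_B"]
    common["phase"][name] = "running"
    return own_memory["program_" + name]
```

A typical run calls `timer_reset`, `boot_after_reset`, `healthy_monitor`, and `init_program` in that order; calling `init_program` before `healthy_monitor` returns `None`, which models processor B waiting for processor A to supply the programs.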

[0033]

[Effects of the Invention] As described above, according to the present invention a common memory is provided between two process units. The process unit that recognizes a fault in its partner writes the system program required by the partner into the global data area of the common memory, while the partner process unit, after the fault occurs, goes through reset control and re-executes that system program. Consequently, when a fault occurs in one process unit, the reset process unit can be restored autonomously using the system program supplied by the other, normal process unit. The invention thus has the excellent advantage that, while simple in configuration and control, it can restore system functions and maintain high operational reliability even when a fault occurs.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a dual processor system according to an embodiment of the present invention.

FIG. 2 is a diagram showing the area configuration of the processor memory of the system in FIG. 1.

FIG. 3 is a diagram showing the area allocation of the common memory of the system in FIG. 1 during normal processor operation.

FIG. 4 is a diagram showing the area allocation of the common memory of the system in FIG. 1 when a processor fault occurs.

FIG. 5 is a flowchart showing the processing operation of each part of the system in FIG. 1 for a task-level processor fault.

FIG. 6 is a flowchart showing the processing operation of each part of the system in FIG. 1 for a processor fault in which the OS has stopped.

FIG. 7 is a schematic configuration diagram of a conventional system using the system duplication method.

FIG. 8 is a schematic configuration diagram of a conventional system using the dual processor method.

[Explanation of symbols]

10 Processor A
11 Timer A
12 Processor memory A
120 Initialization program
121 System program A
122 System program B
123 Unique data area
20 Processor B
21 Timer B
22 Processor memory B
220 Initialization program
221 System program A
222 System program B
223 Unique data area
30 Common memory
300 Phase management information area
301 Processor information operation area
302 Global data area

Claims

[Claims]

1. A fault management method by dual processing, comprising: two process units; a common memory provided between the process units and having at least a global data area accessible by both process units; and inter-processor information transmitting/receiving means for exchanging, through the global data area of the common memory, the system programs that the processors require of each other, wherein, when a fault occurs in either one of the process units, the necessary system program is transferred from the other, normal process unit to the failed process unit through the global data area of the common memory, and the failed process unit is autonomously restored by re-executing the system program from the beginning in accordance with an initialization program.

2. A fault management method by dual processing, comprising: two process units; two processor memories, one provided for each process unit, each storing the system programs needed for the operation of both its own process unit and the partner process unit; a common memory provided between the processor memories and having a phase management information area, a processor information operation area, and a global data area; detection means for detecting that a fault has occurred in either process unit; fault information recording means for recording fault occurrence information for the failed process unit in the phase management information area of the common memory; recognition means for recognizing the occurrence of a fault in either process unit by monitoring the phase management information area of the common memory; writing means for writing the system programs from the processor memory of the process unit in which no fault has been recognized to the global data area of the common memory, based on control information transferred as needed to the processor information operation area of the common memory; and restart means for resetting the failed process unit and causing it to execute an initial program that includes a routine for reading the system programs from the global data area of the common memory.