JPS62143141A

JPS62143141A - Detecting system for down of processor

Info

Publication number: JPS62143141A
Application number: JP60283904A
Authority: JP
Inventors: Mitsuaki Shoda; 正田　光明
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1985-12-17
Filing date: 1985-12-17
Publication date: 1987-06-26

Abstract

PURPOSE: To detect that a processor has gone down without placing a heavy load on the CPU during normal operation and without making the bus busy, by providing the system with an auto-reboot notification control device. CONSTITUTION: Processors 11-14 are each connected to the auto-reboot notification control device 2 and a shared storage device 3, and when a fault occurs in a processor, that processor can restart itself automatically by means of its auto-reboot function. When an auto-reboot start notification or an auto-reboot end notification is sent from any of the processors 12-14, the device 2 sends the processor number and the notification to the master processor 11. When the notification is sent from the master processor 11, the device 2 sends it to the master replacement processor 12. The processor 11 or 12 that receives the auto-reboot start notification starts time monitoring, and if it does not receive the auto-reboot end notification within a fixed time, it regards the processor concerned as down and reports the number of the downed processor to all the other processors through inter-processor communication.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a processor down detection method, and more particularly to a processor down detection method for a loosely coupled multiprocessor system.

(Prior Art) Conventionally, to detect that a processor has gone down in a loosely coupled multiprocessor system, a watchdog timer must be provided either in a device shared by the processors or in a specific processor, and each processor must periodically issue a notification so that the watchdog timer can be reset.
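For contrast with the invention, the conventional watchdog scheme described above can be sketched as follows. This is only an illustration: the class name `WatchdogTimer`, the explicit `now` argument, and the timeout value are assumptions, not details from the patent.

```python
class WatchdogTimer:
    """Hypothetical watchdog kept by a shared device or a specific processor."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_reset: dict[int, float] = {}

    def reset(self, processor_id: int, now: float) -> None:
        # Every processor has to call this periodically, even when healthy.
        self.last_reset[processor_id] = now

    def downed(self, now: float) -> list[int]:
        # A processor whose last reset is older than the timeout is presumed down.
        return sorted(p for p, t in self.last_reset.items()
                      if now - t > self.timeout)


wd = WatchdogTimer(timeout=3.0)
for pid in (11, 12, 13, 14):
    wd.reset(pid, now=0.0)
for pid in (11, 12, 13):          # processor 14 stops sending resets
    wd.reset(pid, now=2.0)
print(wd.downed(now=4.0))
```

The point of the sketch is that `reset` is called by every processor on every period, whether or not anything is wrong.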

(Problems to be Solved by the Invention) The conventional processor down detection method described above has the drawback that it places a heavy load on the CPU, because notifications are issued periodically. A further drawback is that issuing these notifications makes the bus busy at regular intervals.

The object of the present invention is to provide a processor down detection method that eliminates the above drawbacks in a loosely coupled multiprocessor system that has a master processor, a master replacement processor, and a plurality of slave processors, and that has an auto-reboot function (that is, the system is automatically initialized and restarted when a failure occurs during operation of the computer system and operation cannot continue). In this method, each processor notifies an auto-reboot notification control device of the start and end of its auto-reboot. The auto-reboot notification control device receives these start and end notifications from each processor; if a notification arrives from a processor other than the master processor, the device sends the processor identification number and the notification to the master processor, and if a notification arrives from the master processor, the device sends the notification to the master replacement processor. If the processor that received an auto-reboot start notification does not receive the corresponding auto-reboot end notification within a fixed period, the processor that sent the start notification can be regarded as having gone down.

(Means for Solving the Problems) A processor down detection method according to the present invention comprises a master processor, a master replacement processor, a plurality of slave processors, and an auto-reboot notification control device.

The master processor, the master replacement processor, and the plurality of slave processors are each connected to a shared storage device and are loosely coupled. Each can give notification of the start and end of an auto-reboot, and each has an auto-reboot function for automatically restarting itself when a failure occurs in it.

The auto-reboot notification control device receives notifications of the start and end of an auto-reboot. If a notification arrives from the master replacement processor or a slave processor, the device sends the notification and the processor identification number to the master processor; if a notification arrives from the master processor, the device sends the notification to the master replacement processor. In this way it controls the processing.

(Embodiment) Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment for implementing the processor down detection method according to the present invention. In FIG. 1, 11 is a master processor, 12 is a master replacement processor, 13 and 14 are slave processors, 2 is an auto-reboot notification control device, and 3 is a shared storage device.

In FIG. 1, processors 11-14 are each connected to the auto-reboot notification control device 2 and the shared storage device 3. When a failure occurs within one of the processors 11-14, that processor can restart itself automatically by means of its auto-reboot function.

In this case, each processor sends an auto-reboot start notification to the auto-reboot notification control device 2 when its auto-reboot starts, and sends an auto-reboot end notification to the auto-reboot notification control device 2 when its auto-reboot ends. When an auto-reboot start notification or an auto-reboot end notification is sent from any of the processors 12 to 14, the auto-reboot notification control device 2 sends the processor number and the notification to the master processor 11. When, on the other hand, the notification is sent from the master processor 11, the device sends it to the master replacement processor 12.
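The forwarding rule of the notification control device 2 can be sketched as a small routing function. This is a minimal illustration under stated assumptions: the processor numbers follow FIG. 1, while the `Forwarded` message layout, the `route` function name, and the `"start"`/`"end"` strings are hypothetical and not taken from the patent.

```python
from typing import NamedTuple, Optional

MASTER = 11              # master processor, numbered as in FIG. 1
MASTER_REPLACEMENT = 12  # master replacement processor


class Forwarded(NamedTuple):
    """What device 2 sends on (a hypothetical message layout)."""
    destination: int
    kind: str                        # "start" or "end"
    processor_number: Optional[int]  # attached only for non-master senders


def route(sender: int, kind: str) -> Forwarded:
    """Apply the forwarding rule of the notification control device."""
    if kind not in ("start", "end"):
        raise ValueError(f"unknown notification kind: {kind!r}")
    if sender == MASTER:
        # From the master: pass the notification to the master replacement.
        return Forwarded(MASTER_REPLACEMENT, kind, None)
    # From the master replacement or a slave: pass the notification and
    # the sender's processor number to the master.
    return Forwarded(MASTER, kind, sender)


print(route(13, "start"))  # a slave starts rebooting
print(route(11, "start"))  # the master starts rebooting
```

Because the device only forwards messages when a reboot actually starts or ends, nothing is sent during normal operation.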

The master processor 11 or the master replacement processor 12 that has received the auto-reboot start notification starts time monitoring and waits for the auto-reboot end notification. If the end notification does not arrive within a fixed time, the processor that sent the start notification is regarded as having suffered a failure during the auto-reboot, become unable to continue the auto-reboot, and gone down.
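The time monitoring performed by the receiving processor can be sketched as a table of deadlines. This is an assumption-laden illustration: the `RebootMonitor` class, its method names, the explicit `now` argument, and the 30-second limit are all hypothetical, since the patent only specifies "a fixed time".

```python
class RebootMonitor:
    """Tracks reboots in progress on the master (or master replacement)."""

    def __init__(self, limit: float) -> None:
        self.limit = limit                      # the "fixed time" (assumed value)
        self.deadlines: dict[int, float] = {}   # processor number -> deadline

    def on_start(self, processor: int, now: float) -> None:
        # A start notification arrived: begin time monitoring.
        self.deadlines[processor] = now + self.limit

    def on_end(self, processor: int) -> None:
        # The end notification arrived in time: stop monitoring.
        self.deadlines.pop(processor, None)

    def expired(self, now: float) -> list[int]:
        # Processors whose end notification never came are regarded as down.
        down = sorted(p for p, d in self.deadlines.items() if now > d)
        for p in down:
            del self.deadlines[p]
        return down


monitor = RebootMonitor(limit=30.0)
monitor.on_start(13, now=0.0)
monitor.on_start(14, now=5.0)
monitor.on_end(13)                 # processor 13 finished rebooting
print(monitor.expired(now=20.0))   # nothing overdue yet
print(monitor.expired(now=40.0))   # processor 14 missed its deadline
```

A real implementation would probably read a monotonic clock rather than take `now` as a parameter; the parameter is used here only to keep the sketch deterministic.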

The master processor 11 or the master replacement processor 12 that has detected that a processor is down notifies all the other processors of the number of the downed processor through inter-processor communication. If the downed processor is the master processor 11, the master replacement processor 12 becomes the master processor and thereafter performs processing such as auto-reboot time monitoring in place of the master processor 11.
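The down notification and the promotion of the master replacement can be sketched as follows. The function name `handle_down` and its return shape are hypothetical; the sketch also assumes the detecting processor does not notify itself or the downed processor, which the text implies but does not state.

```python
def handle_down(downed: int,
                processors: tuple[int, ...] = (11, 12, 13, 14),
                master: int = 11,
                replacement: int = 12) -> tuple[int, list[int]]:
    """Return (acting master, processors to notify) after a down is detected."""
    # The master detects everyone else's failure; the master replacement
    # detects the master's failure and then acts as master.
    detector = replacement if downed == master else master
    recipients = [p for p in processors if p not in (downed, detector)]
    return detector, recipients


print(handle_down(13))  # a slave went down: master 11 notifies 12 and 14
print(handle_down(11))  # the master went down: 12 takes over, notifies 13 and 14
```

The source does not say what happens when the master replacement itself goes down beyond the master notifying the others, so the sketch leaves that case at notification only.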

Here, processors 11-14 successively write their own operating status to the shared storage device 3, so when a processor goes down, its processing is taken over by a processor designated in advance. This takeover takes place at the point when the number of the downed processor is notified.
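The takeover through the shared storage device 3 can be sketched like this. Everything concrete here is an assumption made for illustration: the dictionary standing in for shared storage, the `takeover_plan` assignments, and the contents of the operating status are not specified in the patent.

```python
from typing import Optional

# Stands in for shared storage device 3: processor number -> operating status.
shared_storage: dict[int, dict] = {}

# Hypothetical pre-designated successor for each processor.
takeover_plan = {11: 12, 12: 13, 13: 14, 14: 13}


def write_state(processor: int, state: dict) -> None:
    # Each processor records its own operating status as it runs.
    shared_storage[processor] = dict(state)


def on_down_notice(me: int, downed: int) -> Optional[dict]:
    """Called when processor `me` is told that `downed` has gone down."""
    if takeover_plan.get(downed) != me:
        return None                     # someone else is designated
    # The designated processor resumes from the last recorded status.
    return shared_storage.get(downed)


write_state(13, {"job": "example-job", "step": 4})
print(on_down_notice(14, downed=13))  # 14 is designated, picks up 13's state
print(on_down_notice(12, downed=13))  # 12 is not designated
```

Reading the state only when the down notice arrives matches the statement that the takeover happens at notification time.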

(Effects of the Invention) As explained above, according to the present invention, in a loosely coupled multiprocessor system that has a master processor, a master replacement processor, and a plurality of slave processors and that has an auto-reboot function, each processor sends notifications of the start and end of its auto-reboot to the auto-reboot notification control device. The auto-reboot notification control device receives these notifications from each processor; if a notification arrives from a processor other than the master processor, it sends the processor number and the notification to the master processor, and if a notification arrives from the master processor, it sends the notification to the master replacement processor. This has the effect that a processor going down can be detected without placing a heavy load on the CPU during normal operation and without making the bus busy.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment for implementing the processor down detection method according to the present invention. 11: master processor; 12: master replacement processor; 13, 14: slave processors; 2: auto-reboot notification control device; 3: shared storage device.

Claims

[Claims]

A processor down detection method characterized by comprising: a master processor, a master replacement processor, and a plurality of slave processors, each of which is connected to a shared storage device and loosely coupled, can give notification of the start and end of an auto-reboot, and has an auto-reboot function for automatically restarting itself when a failure occurs in it; and an auto-reboot notification control device that receives the notification of the start or end of the auto-reboot, sends the notification and the processor identification number to the master processor if the notification arrives from the master replacement processor or a slave processor, and sends the notification to the master replacement processor if the notification arrives from the master processor, thereby controlling the processing.