JPH02196341A

JPH02196341A - Fault restoring system for information processor

Info

Publication number: JPH02196341A
Application number: JP1017232A
Authority: JP
Inventors: Katsuhiko Umeda; 克彦梅田
Original assignee: NEC Engineering Ltd
Current assignee: NEC Engineering Ltd
Priority date: 1989-01-26
Filing date: 1989-01-26
Publication date: 1990-08-02

Abstract

PURPOSE: To minimize operator intervention and shorten recovery time, without switching the power supply off and on, by automatically detecting the occurrence of a fault and reloading and restarting the task that caused it. CONSTITUTION: When an error is displayed during processing, input/output is disabled, and normal operation becomes impossible, a task monitor means 13 detects the occurrence of a fault. When an interrupt is raised by the destruction of an area or the execution of an improper instruction, a protection violation processing means 14 checks which task caused the protection violation. A reloading/restarting means 15 is then started by the means 14 or the means 13 to reload or restart the task that caused the fault. Operator intervention is thus minimized and recovery time shortened without switching the power supply off and on.

Description

DETAILED DESCRIPTION OF THE INVENTION [Industrial Field of Application] The present invention is used in information processing devices. It relates to a method for recovering from failures that occur during the operation of information processing devices such as general-purpose terminals (personal computers, office computers, etc.) and dedicated terminals (POS terminals, financial terminals, etc.).

[Overview]

The present invention is a failure recovery method for an information processing device that is connected to a central processing unit and has a plurality of tasks. It automatically detects the occurrence of a failure and reloads and restarts the task that caused it, so that recovery does not require turning the power off and on, operator intervention is kept to the necessary minimum, and recovery time is shortened.

[Conventional technology]

Failures in information processing devices can have many causes, such as memory corruption due to noise and illegal operations. Conventionally, when such a failure occurred, the operator noticed it from an error display or from the device becoming inoperable, and then carried out recovery processing.

However, users currently want to intervene as little as possible, so automatic recovery is desirable.

An example of a conventional recovery method when a failure occurs is shown in FIG. 7 and is described below with reference to that figure.

In the conventional failure recovery method, when memory corruption or an illegal operation due to noise occurs (step 72) during normal business operation (step 71) and an error is displayed (step 73), subsequent operation is often not guaranteed. The power therefore has to be turned off once (step 74) and then turned on again (step 75), after which manual work such as running a recovery utility (step 76) brings the device back to a usable state.

[Problem that the invention seeks to solve]

The conventional failure recovery method described above involves turning the power off and on and requires operator intervention for the recovery work, and it has the drawback that recovery takes time because of that intervention.

The present invention eliminates these drawbacks. Its object is to provide a failure recovery method that avoids turning the power off and on, keeps operator intervention to the necessary minimum, and shortens recovery time.

[Means for solving problems]

The present invention is a failure recovery method for an information processing device that is connected to a central processing unit equipped with basic operating system means and that controls a plurality of tasks. It is characterized in that the basic operating system means includes: task monitoring means that detects the occurrence of a failure when an error is displayed during processing, input/output is disabled, and normal operation is no longer possible; protection violation processing means that checks the task that caused a protection violation when an interrupt occurs due to area destruction or illegal instruction execution; and reload/restart means that is started by the protection violation processing means or the task monitoring means and reloads and restarts the task that caused the failure.

[Effect]

When an error is displayed during processing, input/output is disabled, and normal operation is no longer possible, the task monitoring means detects the occurrence of a failure. When an interrupt is raised by area destruction or illegal instruction execution, the protection violation processing means checks the task that caused the protection violation. The reload/restart means is then started by the protection violation processing means or the task monitoring means, and reloads or restarts the task that caused the failure. As a result, operator intervention is kept to a minimum and recovery time is shortened without turning the power off and on.

[Example]

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention, and FIG. 2 is a flowchart showing the flow of operation of the embodiment.

The information processing device 11 of this embodiment comprises tasks 16-1 to 16-n and basic operating system means 12. The basic operating system means 12 includes: task monitoring means 13, which detects the occurrence of a failure when an error is displayed during processing, input/output is disabled, and normal operation is no longer possible; protection violation processing means 14, which checks the task that caused a protection violation when an interrupt occurs due to area destruction or illegal instruction execution; and reload/restart means 15, which is started by the protection violation processing means 14 or the task monitoring means 13 and reloads or restarts the task that caused the failure.

As shown in FIG. 3, each of the tasks 16-1 to 16-n performs one of processes 33-1 to 33-n for each processing request. Normally, when there is nothing to process, each task 16-1 to 16-n is in a processing wait state; when a processing request arrives, the task leaves this state and performs the corresponding process 33-1 to 33-n.

If the task monitoring means 13 issues a task investigation command at this point, the task passes control to normal operation response processing 36 and replies that it is operating normally.

If the task cannot run because of a failure or the like, it cannot send this reply, and the task monitoring means 13 thereby detects the occurrence of the failure.
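The task-side behaviour of FIG. 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the `Task` class, the `stalled` flag, and the `HEALTH_CHECK` and `NORMAL_RESPONSE` constants are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HEALTH_CHECK = "HEALTH_CHECK"  # hypothetical name for the task investigation command
NORMAL_RESPONSE = "OK"         # hypothetical payload for the normal operation response


@dataclass
class Task:
    name: str
    stalled: bool = False                  # True models a task that can no longer run
    handled: List[str] = field(default_factory=list)

    def handle(self, request: str) -> Optional[str]:
        """Process one request; only the investigation command produces a reply."""
        if self.stalled:
            return None                    # a failed task cannot send anything back
        if request == HEALTH_CHECK:
            return NORMAL_RESPONSE         # normal operation response processing 36
        self.handled.append(request)       # ordinary processing 33-1 ... 33-n
        return None
```

A healthy task answers the investigation command, while a stalled one stays silent; that silence is what the monitoring side interprets as a failure.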

As shown in FIG. 4, the task monitoring means 13 issues a monitoring command to one of the tasks 16-1 to 16-n (step 42). If that task returns normal operation response 36, it is regarded as operating normally; the monitoring timer set in step 41 is cleared (step 46), and a monitoring command is issued to the next task (step 42). If the task is stalled, it cannot return a normal operation response, so the monitoring timer set earlier times out. The task is then judged to be stalled: its state is logged for later investigation (step 43), the information to be used by the reload/restart means 15 is set (step 44), and the reload/restart means 15 is started (step 45).
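One polling pass of the monitoring flow in FIG. 4 might look like the sketch below. It assumes a hypothetical `probe` callable that sends the monitoring command, waits up to `timeout_s`, and returns `None` when the monitoring timer expires; the function and parameter names are illustrative and do not come from the patent.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature: send the monitoring command to a task and wait up to
# timeout_s for its normal operation response; None means the timer expired.
Probe = Callable[[str, float], Optional[str]]


def monitor_pass(
    task_names: List[str],
    probe: Probe,
    start_reload_restart: Callable[[Dict[str, str]], None],
    log: List[str],
    timeout_s: float = 1.0,
) -> List[str]:
    """Poll every task once and return the names judged to be stalled."""
    stalled: List[str] = []
    for name in task_names:
        # Steps 41-42: arm the monitoring timer and issue the monitoring command.
        reply = probe(name, timeout_s)
        if reply is not None:
            # Step 46: a response arrived, so clear the timer and move on.
            continue
        # The timer expired, so the task is judged to be stalled.
        log.append(f"stall detected: {name}")     # step 43: log for later investigation
        info = {"task": name, "reason": "stall"}  # step 44: information for means 15
        start_reload_restart(info)                # step 45: start reload/restart means 15
        stalled.append(name)
    return stalled
```

Passing the probe and the restart hook in as callables keeps the sketch independent of any particular messaging or timer mechanism, which the source text does not specify.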

The protection violation processing means 14, shown in FIG. 5, is part of the interrupt processing function of the basic operating system means 12; control passes to it when an interrupt is raised by area destruction, illegal instruction execution, or the like. It checks which task caused the protection violation (step 51), logs the event (step 52), sets the failed-task information (step 53), and starts the reload/restart means 15 (step 54).
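The handler steps of FIG. 5 can be illustrated with the sketch below. The source does not say how the offending task is identified, so the lookup by faulting address against per-task memory regions is purely an assumption made for this example, as are the `Interrupt` record and all function and field names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Interrupt:
    kind: str     # assumed labels, e.g. "area_destruction" or "illegal_instruction"
    address: int  # assumed field: faulting address reported with the interrupt


def on_protection_violation(
    irq: Interrupt,
    regions: Dict[str, Tuple[int, int]],  # task name -> (start, end) of its area
    start_reload_restart: Callable[[Dict[str, str]], None],
    log: List[str],
) -> Optional[str]:
    """Handle one protection violation interrupt; return the offending task, if any."""
    # Step 51: check which task caused the violation (address lookup is an assumption).
    culprit = next(
        (name for name, (lo, hi) in regions.items() if lo <= irq.address < hi),
        None,
    )
    if culprit is None:
        log.append(f"{irq.kind} at {irq.address:#x}: no owning task found")
        return None
    log.append(f"{irq.kind} in task {culprit}")   # step 52: logging
    info = {"task": culprit, "reason": irq.kind}  # step 53: failed-task information
    start_reload_restart(info)                    # step 54: start reload/restart means 15
    return culprit
```

The handler hands the same kind of information record to the reload/restart hook as the monitoring sketch would, reflecting that means 15 can be started from either path.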

As shown in FIG. 6, the reload/restart means 15 is normally in a start wait state (step 61). When started by the task monitoring means 13 or the protection violation processing means 14, it checks the failed task (step 62), deletes that task (step 63), and reloads it (step 64). Once reloading is complete, it restarts the task (step 65), then checks files and other affected areas and performs recovery processing (step 66).
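Steps 62 to 66 of FIG. 6 could be sketched as follows. The in-memory `task_table`, the `load_image` loader, and the `recover_files` hook are hypothetical stand-ins for whatever the operating system actually uses; the source describes only the order of the steps.

```python
from typing import Callable, Dict, List


def reload_and_restart(
    info: Dict[str, str],
    task_table: Dict[str, Dict[str, object]],
    load_image: Callable[[str], bytes],         # hypothetical loader for a task image
    recover_files: Callable[[str], List[str]],  # hypothetical file check/repair hook
) -> List[str]:
    """Delete, reload, and restart the failed task; return the repaired items."""
    name = info["task"]
    # Step 62: check that the reported task is one we know about.
    if name not in task_table:
        raise KeyError(f"unknown task: {name}")
    # Step 63: delete the failed task once.
    del task_table[name]
    # Step 64: reload a fresh copy of the task.
    task_table[name] = {"image": load_image(name), "state": "loaded"}
    # Step 65: restart the reloaded task.
    task_table[name]["state"] = "running"
    # Step 66: check files and other affected areas and recover them.
    return recover_files(name)
```

Deleting the entry before reloading mirrors the text, which removes the failed task first so that no corrupted state carries over into the restarted instance.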

[Effect of the invention]

As described above, the present invention avoids turning the power off and on during failure recovery, keeps operator intervention to the necessary minimum, and shortens recovery time.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a flowchart of the operation of the embodiment.
FIG. 3 is a diagram showing the flow of operation of each task in the embodiment.
FIG. 4 is a diagram showing the flow of operation of the task monitoring means in the embodiment.
FIG. 5 is a diagram showing the flow of operation of the protection violation processing means in the embodiment.
FIG. 6 is a diagram showing the flow of operation of the reload/restart means in the embodiment.
FIG. 7 is a flowchart showing the flow of operation of the conventional example.
11: information processing device; 12: basic operating system means; 13: task monitoring means; 14: protection violation processing means; 15: reload/restart means; 16-1 to 16-n: tasks.

Claims

[Scope of Claims] 1. A failure recovery method for an information processing device that is connected to a central processing unit equipped with basic operating system means and that controls a plurality of tasks, characterized in that the basic operating system means includes: task monitoring means that detects the occurrence of a failure when an error is displayed during processing, input/output is disabled, and normal operation is no longer possible; protection violation processing means that checks the task that caused a protection violation when an interrupt occurs due to area destruction or illegal instruction execution; and reload/restart means that is started by the protection violation processing means or the task monitoring means and reloads and restarts the task that caused the failure.