JPH02196341A - Fault restoring system for information processor - Google Patents

Fault restoring system for information processor

Info

Publication number
JPH02196341A
Authority
JP
Japan
Prior art keywords
task
protection violation
failure
fault
reloading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1017232A
Other languages
Japanese (ja)
Inventor
Katsuhiko Umeda
克彦 梅田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Engineering Ltd
Original Assignee
NEC Engineering Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Engineering Ltd
Priority to JP1017232A
Publication of JPH02196341A

Landscapes

  • Retry When Errors Occur (AREA)
  • Stored Programmes (AREA)

Abstract

PURPOSE: To minimize operator intervention and shorten recovery time, without switching the power supply off and on, by automatically detecting the occurrence of a fault and reloading and restarting the task that caused it. CONSTITUTION: When an error is displayed during processing, input/output becomes impossible, and normal operation can no longer continue, a task monitoring means 13 detects the occurrence of the fault. When an interrupt is raised by area destruction or by the execution of an illegal instruction, a protection violation processing means 14 checks the task that caused the protection violation. A reloading/restarting means 15 is then started by the means 14 or the means 13 and reloads or restarts the task that caused the fault. Operator intervention is thereby minimized and recovery time shortened without switching the power supply off and on.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Field of Application]

The present invention is used in information processing devices. It relates to a method for recovering from faults that occur during operation of the information processing device of a general-purpose terminal (personal computer, office computer, etc.) or a dedicated terminal (POS terminal, financial terminal, etc.).

[Overview]

The present invention is a fault recovery method for an information processing device that has a plurality of tasks connected to a central processing unit. It automatically detects the occurrence of a fault and reloads and restarts the task that caused it, so that the power does not have to be turned off and on, operator intervention is kept to the necessary minimum, and recovery time is shortened.

[Prior Art]

Faults in an information processing device can have many causes, such as memory corruption or malfunction due to noise. Conventionally, when such a fault occurred, the operator recognized it from an error display or from the device becoming inoperable, and then carried out recovery processing.

However, users currently want to intervene as little as possible, so automatic recovery is desirable.

An example of a conventional recovery procedure after a fault is shown in FIG. 7 and is described below with reference to that figure.

In the conventional fault recovery method, when memory corruption or a malfunction due to noise occurs during normal business operation (step 71, step 72) and an error is displayed (step 73), subsequent operation is often not guaranteed. The power therefore has to be turned off once (step 74) and then turned on again (step 75), after which manual work such as running a recovery utility (step 76) brings the device back to a usable state so that operation can resume.

[Problems to Be Solved by the Invention]

The conventional fault recovery method described above involves turning the power off and on and requires operator intervention for the recovery work, and it has the drawback that recovery takes a long time because of that intervention.

The present invention eliminates these drawbacks. Its object is to provide a fault recovery method that avoids turning the power off and on, keeps operator intervention to the necessary minimum, and shortens recovery time.

[Means for Solving the Problems]

The present invention is a fault recovery method for an information processing device that is connected to a central processing unit provided with basic operating system means and that controls a plurality of tasks. It is characterized in that the basic operating system means includes: task monitoring means that detects the occurrence of a fault when an error is displayed during processing, input/output becomes impossible, and normal operation can no longer continue; protection violation processing means that checks the task that caused a protection violation when an interrupt is raised by area destruction or by execution of an illegal instruction; and reload/restart means that is started by the protection violation processing means or the task monitoring means and reloads and restarts the task that caused the fault.

[Operation]

When an error is displayed during processing, input/output becomes impossible, and normal operation can no longer continue, the task monitoring means detects the occurrence of a fault. When an interrupt is raised by area destruction or by execution of an illegal instruction, the protection violation processing means checks the task that caused the protection violation. In either case, the reload/restart means is started by the protection violation processing means or the task monitoring means and reloads or restarts the task that caused the fault. Operator intervention is thereby kept to a minimum and recovery time is shortened without turning the power off and on.

[Embodiment]

Next, an embodiment of the present invention is described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the embodiment, and FIG. 2 is a flow chart showing its flow of operation.

The information processing device 11 of this embodiment comprises tasks 16-1 to 16-n and basic operating system means 12. The basic operating system means 12 includes: task monitoring means 13, which detects the occurrence of a fault when an error is displayed during processing, input/output becomes impossible, and normal operation can no longer continue; protection violation processing means 14, which checks the task that caused a protection violation when an interrupt is raised by area destruction or by execution of an illegal instruction; and reload/restart means 15, which is started by the protection violation processing means 14 or the task monitoring means 13 and reloads or restarts the task that caused the fault.

As shown in FIG. 3, each of the tasks 16-1 to 16-n performs one of the processes 33-1 to 33-n for each processing request. Ordinarily, when there is nothing to process, each task is in a processing wait state; when a processing request arises, the task leaves this state and performs the processes 33-1 to 33-n.

If the task monitoring means 13 issues a task investigation command at this point, the task passes control to the normal operation response processing 36 and replies that it is operating normally.

If the task is inoperable because of a fault or the like, it cannot send this reply, and the task monitoring means 13 thereby detects that a fault has occurred.
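The task behaviour of FIG. 3 can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the patent's implementation: the queue-based messaging, the `INVESTIGATE` command name, the `run_task` function, and the `handlers` table are all hypothetical, and the sketch returns when the queue is empty where a real task would stay in its wait state.

```python
import queue

INVESTIGATE = "investigate"  # hypothetical name for the task investigation command


def run_task(name, requests, replies, handlers):
    """Serve queued requests; answer investigation commands with a health reply."""
    while True:
        try:
            kind, payload = requests.get_nowait()
        except queue.Empty:
            return  # a real task would block here in its processing wait state
        if kind == INVESTIGATE:
            # Normal operation response (36): report that the task is alive.
            replies.put((name, "ok"))
        else:
            # One of the ordinary processes 33-1 to 33-n.
            handlers[kind](payload)


# Usage: one ordinary request followed by one investigation command.
requests, replies = queue.Queue(), queue.Queue()
done = []
requests.put(("print", "receipt"))
requests.put((INVESTIGATE, None))
run_task("task-16-1", requests, replies, {"print": done.append})
```

A stalled task never reaches the `replies.put` call, which is exactly the condition the task monitoring means relies on.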

As shown in FIG. 4, the task monitoring means 13 sets a monitoring timer (step 41) and issues a monitoring command to one of the tasks 16-1 to 16-n (step 42). If that task returns the normal operation response 36, it is regarded as operating normally; the monitoring timer is cleared (step 46) and a monitoring command is issued to the next task (step 42). If the task has stalled, it cannot return the normal operation response, so the previously set monitoring timer times out. The task monitoring means then judges that the task has stalled, logs its state for later investigation (step 43), sets the information to be used by the reload/restart means 15 (step 44), and starts the reload/restart means 15 (step 45).
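The monitoring pass of FIG. 4 can be sketched in Python as follows. This is only a sketch under stated assumptions: the patent does not specify a mechanism, so the monitoring timer is modelled here by the timeout of `queue.Queue.get`, and `monitor_tasks`, `start_reload`, and the `fault_info` dictionary are hypothetical names.

```python
import queue


def monitor_tasks(tasks, timeout_s, log, start_reload):
    """One monitoring pass over all tasks, loosely following FIG. 4."""
    for name, (commands, replies) in tasks.items():
        commands.put("investigate")  # step 42: issue the monitoring command
        try:
            # Steps 41 and 46: the timeout stands in for the monitoring timer;
            # a reply that arrives in time "clears" it.
            replies.get(timeout=timeout_s)
        except queue.Empty:
            # Timer expired: the task is judged to have stalled.
            log.append(f"{name}: no response within {timeout_s}s")  # step 43
            fault_info = {"task": name, "reason": "stall"}          # step 44
            start_reload(fault_info)                                # step 45


# Usage: one task that has answered and one that never does.
healthy = (queue.Queue(), queue.Queue())
healthy[1].put("ok")  # simulates a normal operation response already queued
stalled = (queue.Queue(), queue.Queue())
log, reloads = [], []
monitor_tasks({"16-1": healthy, "16-2": stalled}, 0.01, log, reloads.append)
```

Passing `start_reload` as a callback keeps the monitor independent of how the reload/restart means is actually implemented.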

The protection violation processing means 14 is shown in FIG. 5. It is part of the interrupt handling function of the basic operating system means 12, and control passes to it when an interrupt is raised by area destruction, execution of an illegal instruction, or the like. It checks which task caused the protection violation (step 51), logs the event (step 52), sets the information on the faulty task (step 53), and starts the reload/restart means 15 (step 54).
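As a rough illustration of FIG. 5, the following Python sketch uses an exception as a stand-in for the hardware interrupt. This substitution is an assumption made for illustration only; `ProtectionViolation`, `run_protected`, and `start_reload` are hypothetical names that do not appear in the patent.

```python
class ProtectionViolation(Exception):
    """Stand-in for the interrupt raised on area destruction or an illegal instruction."""


def run_protected(name, body, log, start_reload):
    """Run a task body; on a protection violation, follow steps 51 to 54."""
    try:
        body()
        return True
    except ProtectionViolation as exc:
        # Step 51: identify the offending task (here, the one being run).
        log.append(f"{name}: protection violation: {exc}")              # step 52
        fault_info = {"task": name, "reason": "protection violation"}   # step 53
        start_reload(fault_info)                                        # step 54
        return False


def bad_body():
    # Simulates a task that writes outside its own area.
    raise ProtectionViolation("write outside task area")


# Usage: one well-behaved task and one that violates protection.
log, reloads = [], []
ok = run_protected("16-1", lambda: None, log, reloads.append)
failed = run_protected("16-2", bad_body, log, reloads.append)
```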

As shown in FIG. 6, the reload/restart means 15 is normally waiting to be started (step 61). When it is started by the task monitoring means 13 or the protection violation processing means 14, it checks which task is faulty (step 62), deletes that task once (step 63), and then reloads it (step 64). After the reload completes, it restarts the task (step 65), checks files and other locations that may have been affected, and carries out recovery processing (step 66).
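The sequence of FIG. 6 can be written as a short Python sketch. It assumes a task table held as a dictionary and two caller-supplied helpers, `load_image` and `check_files`; these names, the `fault_info` layout, and the file name in the usage example are hypothetical and serve only to make the order of steps concrete.

```python
def reload_and_restart(fault_info, task_table, load_image, check_files):
    """Delete, reload, and restart the faulty task, loosely following FIG. 6."""
    name = fault_info["task"]
    if name not in task_table:         # step 62: check the faulty task
        raise KeyError(name)
    del task_table[name]               # step 63: delete the faulty task once
    task_table[name] = load_image(name)  # step 64: reload it from its stored image
    task_table[name]["state"] = "running"  # step 65: restart the reloaded task
    return check_files(name)           # step 66: check affected files and recover


# Usage: recover a stalled task; the helpers below are simple stand-ins.
task_table = {"16-1": {"state": "running"}, "16-2": {"state": "stalled"}}
repaired = reload_and_restart(
    {"task": "16-2", "reason": "stall"},
    task_table,
    load_image=lambda name: {"state": "loaded"},
    check_files=lambda name: ["journal.dat"],
)
```

Only the faulty entry is replaced, so the other tasks keep running, which is the point of avoiding a power cycle.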

[Effects of the Invention]

As described above, the present invention makes it possible, when recovering from a fault, to avoid turning the power off and on, to keep operator intervention to the necessary minimum, and to shorten recovery time.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of the embodiment of the present invention.
FIG. 2 is a flow chart of the operation of the embodiment.
FIG. 3 is a diagram showing the flow of operation of each task in the embodiment.
FIG. 4 is a diagram showing the flow of operation of the task monitoring means in the embodiment.
FIG. 5 is a diagram showing the flow of operation of the protection violation processing means in the embodiment.
FIG. 6 is a diagram showing the flow of operation of the reload/restart means in the embodiment.
FIG. 7 is a flow chart showing the flow of operation of the conventional example.

11: information processing device; 12: basic operating system means; 13: task monitoring means; 14: protection violation processing means; 15: reload/restart means; 16-1 to 16-n: tasks.

Claims (1)

[Claims]

1. A fault recovery method for an information processing device that is connected to a central processing unit provided with basic operating system means and that controls a plurality of tasks, characterized in that the basic operating system means includes:
task monitoring means for detecting the occurrence of a fault when an error is displayed during processing, input/output becomes impossible, and normal operation can no longer continue;
protection violation processing means for checking the task that caused a protection violation when an interrupt is raised by area destruction or by execution of an illegal instruction; and
reload/restart means, started by the protection violation processing means or the task monitoring means, for reloading and restarting the task that caused the fault.
JP1017232A 1989-01-26 1989-01-26 Fault restoring system for information processor Pending JPH02196341A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1017232A JPH02196341A (en) 1989-01-26 1989-01-26 Fault restoring system for information processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1017232A JPH02196341A (en) 1989-01-26 1989-01-26 Fault restoring system for information processor

Publications (1)

Publication Number Publication Date
JPH02196341A true JPH02196341A (en) 1990-08-02

Family

ID=11938202

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1017232A Pending JPH02196341A (en) 1989-01-26 1989-01-26 Fault restoring system for information processor

Country Status (1)

Country Link
JP (1) JPH02196341A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016206735A (en) * 2015-04-16 2016-12-08 株式会社日立製作所 Control server and fault detection method
JP2018072944A (en) * 2016-10-25 2018-05-10 キヤノン株式会社 Program, system and information processing method

Similar Documents

Publication Publication Date Title
JPH07160370A (en) Service interruption controller
JPH02196341A (en) Fault restoring system for information processor
JPH05314075A (en) On-line computer system
JPH0683657A (en) Service processor switching system
JPH06332734A (en) System activation maintaining system
JPH07261888A (en) Blocking method for data processing and data processor
JPS638834A (en) Operating condition control system for automatic trouble recovery in computer system
JP2977705B2 (en) Control system of networked multiplexed computer system
JPS61813A (en) Deciding system for faulty area of sequence controller
JPS62284440A (en) Software resource maintenance system for terminal equipment
JPS6155750A (en) Alarm processing of computer system
JPS58195968A (en) Re-execution controlling system
JPH0395634A (en) Restart control system for computer system
JPS6272038A (en) Testing method for program runaway detecting device
JPH0235528A (en) Control system for virtual computer system
JPH05108389A (en) Test method of power-off / power-on sequence in non-stop computer
JPH0581065A (en) Self diagnostic method for programmable controller system
JP2795246B2 (en) Failure recovery device at the time of interrupt processing in redundant memory system
JPS6077252A (en) Input/output control device
JPH0916201A (en) Automatic resuming device for plant equipment
JPH05204501A (en) Data terminal equipment
JPS6149225A (en) Operation of information processing system
JPS6247722A (en) Starting method for terminal equipment
JPS63299438A (en) Deciding system for faulty device
JPH03257638A (en) Multiprocessor control system