JPH02137035A

JPH02137035A - Computer system failure diagnosis device

Info

Publication number: JPH02137035A
Application number: JP63290021A
Authority: JP
Inventors: Motoki Inoue; 源樹井上; Tetsuhiro Kondo; 近藤　哲啓
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1988-11-18
Filing date: 1988-11-18
Publication date: 1990-05-25

Abstract

PURPOSE: To improve fault diagnosis accuracy and at the same time shorten diagnosis time by updating the confidence of diagnostic rules based on diagnostic result information. CONSTITUTION: When an abnormality occurs in the diagnosis target 1, an abnormality detector 4c automatically detects it, then collects and stores the relevant data. Using a knowledge base 5 and the freeze data secured at detection of the abnormality, a diagnostic part 3 reports the diagnostic result to a user 6. If the initial diagnosis shows that other abnormality analysis data are required, the part 3 requests an automatic data collection part 4 to collect the necessary data, in descending order of confidence of the items conceivable as causes of the abnormality. The part 4 then fetches the analysis data on the diagnosis target 1. The collected data are stored in an abnormality record database 7 and also sent to a knowledge data production part 4a. The part 4a produces new knowledge data from the collected data, adds them to the base 5, and reports this to the part 3. On receiving the report of the addition to the base 5, the part 3 carries out the cause estimation process.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to failure diagnosis of a system composed of a single computer or a plurality of computers, such as a control computer system or a general-purpose computer system, and in particular to a failure diagnosis device equipped with a means of updating the confidence levels in its knowledge base each time a failure occurs.

[Conventional technology]

Conventional computer system failure diagnosis using knowledge processing is configured as follows. When an abnormality occurs in the system, the initial abnormality data is frozen, and diagnostic inference is run on this data to list the items that may be the cause of the abnormality. Based on that inference result, other necessary data is collected, a more accurate diagnostic inference result is derived, and the result is presented to the user as guidance.

Separately, in order to find the cause when an abnormality occurs, confidence levels have been introduced into the knowledge base as a way of expressing uncertain knowledge. However, because the confidence levels are set by the intuition of the knowledge base builder, the inference results sometimes do not match reality.

There is also a method that assigns priority to rule execution according to how frequently each production rule is used (Japanese Unexamined Patent Publication No. Sho 60-24646).

[Problem to be solved by the invention]

In the conventional method using confidence levels described above, the configured confidence levels do not necessarily match the actual phenomena. As a result, poorly chosen confidence values entered by the operator may prevent an accurate diagnosis result from being obtained, or it may take a long time to reach an inference result. With current diagnostic devices, however, it is difficult to discover why the results do not match reality, so confidence levels are rarely changed once set, and any revision has had to rely on intuition.

The method that updates the rule execution order according to the usage frequency of production rules raises the execution priority of rules that were used frequently in the past. In a failure diagnosis device such as that of the present invention, however, the appropriate priority of a rule does not simply depend on how often the rule was executed in the past, which was a problem.

An object of the present invention is to improve the accuracy of inference results and shorten inference time by updating the relevant confidence levels each time a failure occurs.

[Means to solve the problem]

To achieve the above object, the present invention is configured as follows.

First, when inference is performed on the collected data, rules are executed in descending order of confidence, and inference is terminated once the confidence reaches or exceeds a certain value. Each time a failure occurs, the cause of the failure is tracked down by following the guidance output by the diagnostic device. Once the cause is finally determined, the conclusion is entered into the failure diagnosis device, and the corresponding confidence in the knowledge base is re-determined from the history of past failures. A moving average is used for this update, because it reflects the past failure history appropriately and is fast to compute.

This brings the confidence levels that express the situation-cause relationships closer to reality, which improves the accuracy of failure diagnosis and shortens the diagnosis time.

[Effect]

With this configuration, the confidence levels in the knowledge base, which were originally set by the intuition of the knowledge builder, are updated from the actual occurrence history each time a failure is diagnosed. Inference that initially produced wrong results or took a long time therefore improves, and more accurate inference results are obtained in a shorter time.

[Example]

The present invention is described concretely below, centering on an embodiment of failure diagnosis.

First, as a premise for diagnosing abnormalities in the diagnosis target, it is assumed that the attribute relationships between the causes of abnormalities that can occur in the system and the situations that can arise as a result have already been obtained.

FIG. 1 shows the software configuration of this diagnostic device.

In the figure, when an abnormality occurs in the diagnosis target 1, the abnormality detector 4c automatically detects the occurrence, then collects and saves (freezes) data on the affected location. Based on this data, the diagnosis unit 3 performs an initial diagnosis of the abnormality using the knowledge base 5 and the freeze data captured at detection, and reports the result of the initial diagnosis to the user 6 via the man-machine interface 2. The initial diagnosis may show that other abnormality analysis data is required, which means that the confidence of the items considered possible causes of the abnormal situation is at or below a certain value. In that case the diagnosis unit 3 requests the automatic data collection unit 4 to collect the necessary data, in descending order of confidence of the candidate causes. The automatic data collection unit 4 takes in analysis data on the diagnosis target 1 via the system's data communication path. The collected data is stored in the abnormality record database 7 and, at the same time, transferred to the knowledge data creation unit 4a. From this data, the knowledge data creation unit 4a creates new knowledge data, adds it to the knowledge base 5, and reports this to the diagnosis unit 3. On receiving the report of the addition to the knowledge base 5, the diagnosis unit 3 executes the cause estimation process based on the new information.

In the cause estimation process above, when the likelihood of a cause, specifically the confidence of an item as the cause of the abnormality, reaches or exceeds a certain value, the diagnosis unit 3 outputs the inference result, namely the cause of the abnormality and its countermeasure, to the user 6 via the man-machine interface 2. If several causes have a confidence exceeding the certain value, these causes and their countermeasures are listed in descending order of confidence. The user 6 takes countermeasures according to this result. If no more data can be collected automatically and no more inference rules can be executed, the inference result, specifically a list of the possible causes and their countermeasures, is output to the user 6 in descending order of the confidence obtained so far. At this point the user assesses the situation from the diagnosis result and, if a more thorough diagnosis is wanted, enters new situation settings via the input/output unit 2a of the man-machine interface 2. The processing path described above is then executed again, enabling a more accurate estimate of the cause. The estimated cause is also stored in the abnormality record database 7 as historical data for rule learning. The user 6 takes recovery measures based on the inference result and, once the cause of the failure is finally determined, enters the conclusion, whereupon the learning unit 8 re-determines the confidence of the situation-cause rules in the knowledge base 5.

FIG. 2 shows the operating sequence of this diagnostic process. Blocks 10 and 20 of FIG. 2 are performed by block 4c of FIG. 1; blocks 30 and 40 by block 3; block 50 by block 4b; blocks 60 and 60a by blocks 4a and 3; blocks 70 and 80 by block 2b; and blocks 140 to 170 by block 8.

Blocks 10 to 90 of FIG. 2 represent the processing flow of the automatic diagnosis section.

When a system abnormality occurs, the diagnostic process identifies where it occurred and executes abnormality information freeze 20. This saves the instantaneous data at the moment of the abnormality, protecting it from destruction caused by, for example, runaway of the failed part, and from change over time. Once this abnormality information has been collected, abnormality cause estimation 30 is executed using the frozen information from the time of the abnormality, the cause rules, and the factual knowledge in the knowledge base. Specifically, for every cause conceivable from the group of abnormal situations indicated by the freeze information, the confidence that it is the cause of that group is determined using the situation-cause rules with high confidence, and the causes with high resulting confidence become the cause hypotheses. If the confidence of the cause hypotheses from this inference is at or below a certain value, abnormality diagnosis data collection 50 is executed via the system's data communication path.

Here, the cause-situation rules are used to extract the related situations that follow from the cause hypotheses, and the data items relating to these situations that are not included in the freeze information are collected. Specifically, these are fixed data, extended information, and data from parts related to the abnormal location. Based on the factual information in the collected data, abnormality record database update 60 is executed.

This knowledge is created as factual knowledge data in a form that conforms to the existing factual knowledge data.

After the above processing has been repeated, a diagnosis report 70 is issued to the user when the confidence of at least one cause hypothesis exceeds a certain value, or when the automatically collectable information alone cannot yet sufficiently support any hypothesis. The report lists the cause hypotheses obtained so far, their confidence as the cause of the abnormal situation, and their countermeasures, in descending order of confidence. From the confidence of the cause hypotheses and the countermeasures output in this report, the user decides whether to re-diagnose or to end the diagnosis 90. To help with this decision, the user can refer to the data collected so far via the man-machine interface. If re-diagnosis is chosen, the user enters new factual data for re-diagnosis (specifically, information containing ambiguous elements, such as what the user observes visually). Abnormality cause estimation 30 is then executed again and a more thorough diagnosis result is obtained.
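The automatic diagnosis flow described above (freeze, estimate, collect, report) can be sketched as a loop. This is a minimal illustration only: the function names, the dictionary representation of facts and hypotheses, and the default threshold of 0.7 are assumptions made for the example and do not come from the patent text.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: cause name -> confidence in [-1.0, 1.0].
Hypotheses = Dict[str, float]


def diagnose(
    frozen: dict,
    estimate: Callable[[dict], Hypotheses],
    collect_more: Callable[[Hypotheses, dict], dict],
    threshold: float = 0.7,  # assumed value, not taken from the source
) -> List[Tuple[str, float]]:
    """Repeat estimation and data collection, then build the report."""
    facts = dict(frozen)                      # block 20: frozen abnormality data
    while True:
        hypotheses = estimate(facts)          # block 30: cause estimation
        if any(cf > threshold for cf in hypotheses.values()):
            break                             # one hypothesis is confident enough
        # block 50: keep only data items that are not already known
        new_facts = {
            key: value
            for key, value in collect_more(hypotheses, facts).items()
            if key not in facts
        }
        if not new_facts:
            break                             # nothing left to collect automatically
        facts.update(new_facts)               # block 60: record the new facts
    # block 70: report hypotheses in descending order of confidence
    return sorted(hypotheses.items(), key=lambda kv: kv[1], reverse=True)
```

The loop ends either because a hypothesis exceeds the threshold or because automatic collection has nothing new to offer, which mirrors the two conditions under which the text issues diagnosis report 70.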

Next is the learning section. When the result that was finally found to be the true cause, among the results output by the automatic diagnosis section, is entered into the diagnostic device, rule confidence change 150 based on the estimation result is executed.

Here, the situation-cause rules 110 and the estimation results 120 are used.

The situation-cause rules 110 are expressed in the following form.

[Condition part] If situation X is observed
[Inference part] A is the cause, confidence a1
B is the cause, confidence a2
C is the cause, confidence a3
D is the cause, confidence a4

The confidence values a1, a2, a3, and a4 are given fixed values (initial values) when the diagnostic system first starts operating.

Normally these initial values are calculated from past empirical data, but if no empirical data exists at all, a mere guess is acceptable.

Here, causes A, B, C, and D are assumed, on the basis of past cases, to be causes of situation X with the same confidence.
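As an illustration, the situation-cause rule above can be held as a small record that maps each candidate cause to its rule confidence. The class name, field names, and the initial value 0.5 are hypothetical; the text only states that the four causes start with the same confidence.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SituationCauseRule:
    """One situation-cause rule: an observed situation and its candidate causes."""

    situation: str
    # cause name -> rule confidence (a1, a2, ... in the text)
    causes: Dict[str, float] = field(default_factory=dict)


# Assumed starting value; the source does not give a number.
INITIAL_CONFIDENCE = 0.5

rule_110 = SituationCauseRule(
    situation="X",
    causes={name: INITIAL_CONFIDENCE for name in ("A", "B", "C", "D")},
)
```

Keeping the confidences in a mapping keyed by cause makes the later per-cause update and re-sorting steps straightforward.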

When the cause cannot easily be determined, the following cause-situation rules are available to corroborate the confidence values above.

[Condition part] If cause A is present
[Inference part] Situation L is observed, confidence b1
Situation M is observed, confidence b2
Situation N is observed, confidence b3

When the system is in operation and a failure with situation X occurs, the automatic diagnosis section outputs the following inference results to the estimation result file 120, based on the above rule group and the collected data.

[Result 1] A is the cause of situation X with confidence c1.

[Result 2] B is the cause of situation X with confidence c2.

[Result 3] C is the cause of situation X with confidence c3.

These result confidences c1, c2, and c3 are calculated from the confidences of causes A, B, and C and the confidences of the cause-situation rules corroborated by the collected data. They are obtained with the following function (combine function).

(1) If a and b are both positive: $c = a + b - (a \cdot b)$
(2) If either a or b is negative: $c = (a + b) / (1 - \min(|a|, |b|))$
(3) If a and b are both negative: $c = a + b + (a \cdot b)$

This function serves to merge several confidence values into one.
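A minimal sketch of the combine function follows. It reads the garbled second case as the usual certainty-factor form with min(|a|, |b|) in the denominator, which is an assumption; treating zero as non-negative and rejecting the undefined +1/-1 pair are also choices made for this example, and `combine_all` is a hypothetical helper.

```python
from functools import reduce
from typing import Iterable


def combine(a: float, b: float) -> float:
    """Merge two confidence values in [-1, 1] into one."""
    if a >= 0 and b >= 0:            # case (1): both non-negative
        return a + b - a * b
    if a < 0 and b < 0:              # case (3): both negative
        return a + b + a * b
    # case (2): mixed signs
    denominator = 1 - min(abs(a), abs(b))
    if denominator == 0:
        raise ValueError("confidences +1 and -1 cannot be combined")
    return (a + b) / denominator


def combine_all(values: Iterable[float]) -> float:
    """Fold several confidence values into one by repeated combination."""
    return reduce(combine, values)
```

For example, `combine(0.6, 0.5)` gives approximately 0.8: two pieces of supporting evidence reinforce each other, but the result stays below 1.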

Here the result confidences c1, c2, and c3 are assumed to satisfy the following relation.

c1 > c2 > c3, and c1 > the fixed value at which an item is accepted as the cause

Based on these result confidences, the learning section performs update 150 of the confidences (a1, a2, a3) of the corresponding rules.

This update uses a moving average. The reasons for adopting this method include the relatively small number of program steps it needs compared with other methods and its short execution time.

The confidence values a1, a2, and a3 of the inference rule part are calculated as simple averages of past observation results, as follows.

$$a = \frac{1}{N}\sum_{i=1}^{N} c_i \qquad \text{(1)}$$

$N$ = number of times an abnormality due to the cause in question was observed
$c_i$ = result confidence of the i-th observation
$a$ = average confidence of the cause

Accordingly, when the confidence a1 is updated using the current result 120, the following equation is used.

$$a_t = a_{t-1} + \frac{ae_t - a_{t-n}}{n} \qquad \text{(2)}$$

$a_t$ = current cause confidence (updated value)
$a_{t-1}$ = previous cause confidence
$ae_t$ = current cause confidence (estimated value)
$a_{t-n}$ = result confidence from n times earlier
$n$ = window width of the moving average

The $ae_t$ that appears in equation (2) is calculated from the current result confidence. Because the cause confidence is proportional to the result confidence, it is obtained from the following equation.

$$ae_t = a_{t-1} \cdot \frac{c_t}{c_{t-1}}$$

$c_t$ = current result confidence
$c_{t-1}$ = previous result confidence

Using the moving average obtained from equation (2) as the confidence of the situation-cause rule makes the inference rules reflect the current state, so that they represent it more accurately.
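The update of equation (2) can be sketched as follows, reading it as the standard recursive form of a windowed mean: the new estimate enters the window, the oldest one leaves, and the difference divided by the window width n is added to the previous value. The class name and the choice to pre-fill the window with the initial confidence are assumptions made for this example.

```python
from collections import deque


class MovingAverageConfidence:
    """Rule confidence maintained as a moving average over `window` estimates."""

    def __init__(self, initial: float, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        # Assumption: the window starts filled with the initial confidence.
        self.history = deque([initial] * window, maxlen=window)
        self.value = initial

    def update(self, estimate: float) -> float:
        """Add the newest estimate, drop the oldest, and return the new average."""
        oldest = self.history[0]
        self.history.append(estimate)  # maxlen discards `oldest` automatically
        self.value += (estimate - oldest) / self.window
        return self.value
```

Each update costs one subtraction, one division, and one addition regardless of the window width, which is consistent with the text's point that the method needs few program steps.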

A characteristic of this learning part is that making n in equation (2) smaller makes it more sensitive to new information, so that it quickly tracks (learns) the latest state. Conversely, a larger n also gives weight to past information. Learning is then slower, but the fault diagnosis retains the conventional way of thinking and analysis methods.

The graph of FIG. 3 shows the results of a simulation in which n in equation (2) was varied from 2 to 5. The confidence being updated is that of the rule that ultimately led to the correct inference result.

As the graph shows, the smaller n is, the faster the rule confidence is updated (learned), and the more closely the latest state is tracked.

However, with n = 2 the fluctuation is large, and with n ≥ 5 learning becomes extremely slow. Accordingly, n = 3 tracks most smoothly.
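The relation between window width and learning speed can be illustrated with a synthetic step input. The values 0.2 and 0.8 below are made up for the example and are not the data behind FIG. 3; the sketch only counts how many updates a windowed mean needs to settle on a new level, and it does not reproduce the fluctuation seen at n = 2.

```python
from collections import deque


def updates_to_converge(window: int, old: float = 0.2, new: float = 0.8,
                        tol: float = 1e-9) -> int:
    """Count updates until a windowed mean moves from `old` to `new`."""
    history = deque([old] * window, maxlen=window)
    steps = 0
    while abs(sum(history) / window - new) > tol:
        history.append(new)   # every new observation is at the new level
        steps += 1
    return steps


# Number of updates needed for window widths 2, 3 and 5.
steps_by_window = {n: updates_to_converge(n) for n in (2, 3, 5)}
```

With a clean step input the mean settles after exactly n updates, so a narrower window follows a change sooner, at the cost of averaging over fewer observations.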

The graph of FIG. 4 shows the record of hits, that is, how often the correct cause was derived from the situation, using n = 3 in the rule learning formula with past failure data as input. The results of the learning formula with n = 3 and n = 5 are compared with the results without learning. As can be seen, the hit rate at the start of the test is the same with and without learning, but the difference in hit rate becomes marked as the number of diagnoses increases. With the n = 3 learning formula, the cause hit rate reaches 94% at the 10th test, which is almost perfect cause inference, whereas diagnosis without learning stays at a cause hit rate of about 50%, unchanged from the start of the simulation.

Another feature of the present invention is that the number of inference rules executed can be limited by the set value of the confidence at which an item is regarded as the cause of an abnormality.

This is because, as soon as the confidence (as an inference result) of even one candidate cause exceeds a certain value, the inference in pursuit of the cause is stopped and the result is reported to the user of the system.

The confidence of a cause item can be classified as follows.

Confidence        Degree of possibility
1.0 to 0.7        Highly likely to be the cause of situation X
0.7 to 0.3        May be the cause of situation X
0.3 to -0.3       Cannot say either way
-0.3 to -0.7      May not be the cause of situation X
-0.7 to -1.0      Highly likely not to be the cause of situation X

Therefore, if the cause must be known with very high confidence, setting the required result confidence high (e.g. 0.9) makes the inference of the fault diagnosis system take somewhat longer, but the cause reported to the user is assured, with very high probability, to be the true cause of the failure.

Conversely, if the required result confidence is set low (e.g. 0.6), the inference time of the fault diagnosis system becomes shorter, but several items that may be the cause are obtained.
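The confidence bands in the table above can be expressed as a small lookup. Because adjacent bands in the table share their end points, the assignment of the exact boundary values (0.7, 0.3, -0.3, -0.7) below is an assumption, as are the function name and the label strings.

```python
def possibility(confidence: float) -> str:
    """Map a confidence in [-1.0, 1.0] to the band described in the table."""
    if not -1.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [-1.0, 1.0]")
    if confidence >= 0.7:
        return "highly likely to be the cause"
    if confidence >= 0.3:
        return "may be the cause"
    if confidence > -0.3:
        return "cannot say either way"
    if confidence > -0.7:
        return "may not be the cause"
    return "highly likely not to be the cause"
```

A required result confidence of 0.9 falls in the top band, while 0.6 falls in the second band, which matches the text's observation that a lower setting returns several merely possible causes.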

Which setting to adopt depends on factors such as the scale of the system.

When the confidence update of the situation-cause rules is complete, the rules are sorted in descending order of confidence 160. The purpose of this step is to use the learning results so that rules with high confidence are executed first at the next diagnosis. This shortens the inference time.
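The re-sorting step and the early termination it enables can be sketched together. The function names and the callback that evaluates a candidate cause are hypothetical; the sketch only shows why checking high-confidence rules first tends to reach the stopping threshold after fewer evaluations.

```python
from typing import Callable, Dict, List, Tuple


def reorder(rule_confidence: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort rules so that the highest-confidence rule is tried first."""
    return sorted(rule_confidence.items(), key=lambda kv: kv[1], reverse=True)


def infer(
    ordered_rules: List[Tuple[str, float]],
    evaluate: Callable[[str], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Evaluate causes in order and stop once one exceeds the threshold."""
    results: List[Tuple[str, float]] = []
    for cause, _rule_confidence in ordered_rules:
        result_confidence = evaluate(cause)
        results.append((cause, result_confidence))
        if result_confidence > threshold:
            break  # confident enough: skip the remaining rules
    return results
```

If no cause exceeds the threshold, every rule is evaluated and the full list is returned, which corresponds to the case where the device reports all candidates in order.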

[Effect of the invention]

The present invention couples a fault diagnosis section with a rule learning section and adjusts the rules automatically from the diagnosis results. It is therefore expected to bring the following effects in system failure diagnosis.

(1) By updating (learning) the confidence of the diagnostic rules from the diagnosis result information, the diagnostic rules can be fine-tuned without relying on experts.

(2) By updating (learning) the confidence of the diagnostic rules from the diagnosis result information and executing rules with high confidence first at the next diagnosis, the inference time for cause estimation can be shortened.

(3) By setting the window of the moving average used in the learning process as desired, the system can be switched to either fast learning or slow learning. A more flexible diagnostic system can therefore be built.

(4) Automatic learning prevents variation in how well the rules are learned, independently of an expert's personal experience and intuition. The failure diagnosis system can therefore be standardized.

[Brief explanation of the drawing]

FIG. 1 is a software configuration diagram of an embodiment of the present invention, FIG. 2 is a flowchart showing the operating sequence of FIG. 1, FIG. 3 is a graph of rule confidence updates, and FIG. 4 is a graph of the total number of failures and the cause hit rate.

1: diagnosis target, 2: man-machine interface, 3: diagnosis unit, 4: automatic data collection unit, 5: knowledge base, 6: user, 7: abnormality record database, 8: learning unit.

Claims

[Claims]

1. A computer system failure diagnosis device comprising a data collection unit that recognizes the occurrence of a failure in a computer system and collects the data necessary for failure diagnosis, a diagnosis unit that estimates the cause of the failure based on the relevant knowledge base, and a man-machine interface that notifies the operator of the diagnosis results, the device having a function of updating the confidence levels used in the rules in the knowledge base each time a failure occurs.