JP5198154B2

JP5198154B2 - Fault monitoring system, device, monitoring apparatus, and fault monitoring method

Info

Publication number: JP5198154B2
Application number: JP2008146774A
Authority: JP
Inventors: 大輔福井; 真澄川上
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2008-06-04
Filing date: 2008-06-04
Publication date: 2013-05-15
Anticipated expiration: 2028-06-04
Also published as: JP2009294837A

Description

The present invention relates to a fault monitoring system, a device, a monitoring apparatus, and a fault monitoring method, and more particularly to a technique for monitoring the occurrence of failures in a plurality of devices that can communicate directly or indirectly through a network.

In recent years, consumer devices such as mobile phones and televisions have become increasingly multifunctional, and the scale of the software installed in them tends to grow year by year. In general, as software grows larger, the number of latent bugs increases, and defects become more likely to surface after product shipment. To avoid this situation, research is under way on software development methodologies, such as model-based development and software product lines, for developing large-scale software with high quality and high efficiency.
Meanwhile, research is also progressing on elemental technologies for promptly correcting software defects that occur after consumer devices are shipped. Televisions and game consoles with a function for remotely updating software have been commercialized. Systems that detect failures occurring on a device are also generally known; examples include the failure monitoring system described in Patent Document 1 and the monitoring device for embedded devices described in Patent Document 2.
Patent Document 1: JP 9-91219 A
Patent Document 2: JP 2004-187196 A

A typical failure detection system detects failures by having each monitored device transmit failure information to a center server. With this method, the number of communications and the amount of communication data increase as the number of monitored devices grows, leading to congestion and higher server maintenance costs. Patent Document 1 therefore describes means for determining the urgency of failure information, transmitting high-urgency failure information to the center server immediately, and accumulating low-urgency failure information for a certain period before transmitting it in a batch. Patent Document 2 describes means for reducing the number of communications by having monitored devices cooperate with one another and circulate failure information among themselves.
However, although the method of Patent Document 1 can reduce the number of communications, the problem of reducing the amount of communication data remains. The method of Patent Document 2 lightens the load on the center server, but it does not reduce the amount of communication data, and it raises the cost of the monitored devices because they must collect failure information.

Generally conceivable ways of reducing the amount of communication data are to use error codes or to thin out failure information of low importance before transmission, but both degrade the quality of the failure information. Transmitting only an error code makes it difficult to deal with unexpected failures, and thinning out failure information may make the failure information that is actually needed inaccessible.

The first problem to be solved by the present invention is the increase in the amount of failure-information communication data that accompanies an increase in the number of monitored devices. The second problem is the degradation in the quality of failure information caused by reducing the amount of communication data.

To solve these problems, the present invention creates failure information based on the results of tests performed at the software development stage. For example, failure information is created based on the state transition table used in the development methodology called model-based development. Based on the test results, if the location where the failure occurred has already been tested, the failure is treated as unexpected and detailed failure information is created; if it has not been tested, the failure is treated as expected and simple failure information is created to prompt execution of the test; and if testing is in progress, the failure is treated as one currently being addressed and no failure information is created. Adopting this method solves the problems described above.

That is, the present invention is a fault monitoring system comprising a device to be monitored, a monitoring device that monitors the occurrence of failures in the device, and an external storage device that is a database storing failure data of the device, these devices being able to transmit and receive data via a network. The system has: a failure monitoring unit that monitors failures occurring in a program module running on the device; a state transition monitoring unit that monitors state transitions of the program module; a failure information creation unit that, when the failure monitoring unit detects the occurrence of a failure, acquires state transition history information from the state transition monitoring unit, further acquires from the external storage device test data on tests performed on the program module, and creates failure information based on the state transition history information and the test data; and a failure information transmission unit that transmits the failure information created by the failure information creation unit to the monitoring device via the network.

According to the present invention, the amount of failure-information data transmitted from monitored devices to the center server can be reduced, which suppresses the growth in network traffic that accompanies an increase in monitored devices. In addition, because the monitoring method relies on the results of tests performed at the software development stage, the quality of failure information can be improved compared with existing methods, and the cost of introducing the invention can be kept low.

The best mode for carrying out the present invention will now be described.
Embodiments of the fault monitoring system, device, monitoring apparatus, and fault monitoring method of the present invention are described with reference to the drawings.
FIG. 1 is a hardware configuration diagram. It shows the hardware configuration of a typical system according to the present invention and does not limit the hardware configuration. Although the present invention concerns a system that uses a plurality of terminals connected via a network, this drawing shows a single terminal and does not cover every terminal involved in the invention.

The arithmetic device 101 is a central processing unit that interprets the program data loaded into the main storage device 102 and executes processing. For example, a Pentium (registered trademark) processor manufactured by Intel corresponds to the arithmetic device 101.
The main storage device 102 is volatile memory into which the program data recorded in the external storage device 104 is loaded. For example, semiconductor memory such as DRAM corresponds to the main storage device 102.
The communication device 103 is a device for communicating with an external network. For example, a network interface card for connecting to the Internet corresponds to the communication device 103.
The external storage device 104 is nonvolatile memory that stores program data and the like. For example, a hard disk device corresponds to the external storage device 104. The external storage device 104 may also be a device accessed via a network, such as a database.

FIG. 2 is a conceptual diagram of the modules. It shows the conceptual configuration of typical modules according to the present invention and does not limit the module configuration. The containment relationships shown between modules (for example, between the monitored device 202 and the monitoring agent 206) are typical examples and are not prescriptive. Nor does the figure prescribe whether the illustrated modules cooperate over a network or on the same device.

The network 201 is a communication network over which data can be transmitted and received. For example, the Internet or Ethernet (registered trademark) corresponds to the network 201.

The monitored device 202 is a device monitored by the monitoring device 215 via the network 201. For example, a mobile phone or a home gateway corresponds to the monitored device 202.
The monitored program 203 is a program monitored by the monitoring agent 206. For example, an information-appliance control program running on a home gateway corresponds to the monitored program 203.

The state transition notification unit 204 is a module that notifies the state transition monitoring unit 207 of information about state transitions that occur in the monitored program 203. The notification method may be the PUSH type (the state transition notification unit 204 notifies the state transition monitoring unit 207) or the PULL type (the state transition monitoring unit 207 queries the state transition notification unit 204). For example, the state transition notification unit 204 can be implemented using an MBean as defined in the JMX (Java Management Extensions) specification of Java (registered trademark). A specific example is described later.
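The difference between the PUSH and PULL notification types can be illustrated with a minimal sketch. It is written in plain Python rather than with JMX, and the class name `StateTransitionNotifier` and its methods are hypothetical names chosen for illustration; they do not appear in the patent.

```python
from typing import Callable, List, Tuple


class StateTransitionNotifier:
    """Hypothetical stand-in for the state transition notification unit 204."""

    def __init__(self, initial_state: str) -> None:
        self._state = initial_state
        self._listeners: List[Callable[[str, str], None]] = []

    def add_listener(self, listener: Callable[[str, str], None]) -> None:
        # PUSH type: the monitor registers a callback and is told about changes.
        self._listeners.append(listener)

    def current_state(self) -> str:
        # PULL type: the monitor asks for the current state when it chooses to.
        return self._state

    def transition_to(self, new_state: str) -> None:
        old_state, self._state = self._state, new_state
        for listener in self._listeners:
            listener(old_state, new_state)


notifier = StateTransitionNotifier("A")
pushed: List[Tuple[str, str]] = []
notifier.add_listener(lambda old, new: pushed.append((old, new)))  # PUSH
notifier.transition_to("B")
polled = notifier.current_state()                                  # PULL
```

With the PUSH type the monitor sees every transition as it happens, whereas with the PULL type it sees only the state at the moment of each query, so transitions between queries can go unobserved unless the notifier also keeps a history.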

The failure notification unit 205 is a module that notifies the failure monitoring unit 208 of information about failures that occur in the monitored program 203. The notification method may be the PUSH type (the failure notification unit 205 notifies the failure monitoring unit 208) or the PULL type (the failure monitoring unit 208 queries the failure notification unit 205). For example, the failure notification unit 205 can be implemented using an MBean as defined in the Java JMX specification. A specific example is described later.

The monitoring agent 206 is a module that monitors the state transitions of the monitored program 203 and the occurrence of failures in it. When a failure occurs in the monitored program 203, the monitoring agent 206 acquires test data, such as the state transition table test data 213 and the state transition path test data 214, from the database 212 via the network 201, determines the type of the failure based on that test data, creates failure information according to the failure type, and, if necessary, transmits the failure information to the monitoring device 215 via the network 201. The monitoring agent 206 also has a function of receiving the latest test data from the monitoring device 215 via the network 201 and, based on it, updating the test data stored in the database 212 via the network 201, such as the state transition table test data 213 and the state transition path test data 214.
The monitoring agent 206 may reside on the same machine as the monitored program 203 or on a different machine reached via the network 201. Test data other than the state transition table test data 213 and the state transition path test data 214 may also be used. In this embodiment, the state transition table test data 213 and the state transition path test data 214 are used as concrete examples of test data. Specific examples of these test data are described later.

As described above, the state transition monitoring unit 207 is a module that cooperates with the state transition notification unit 204 to monitor the state transitions of the monitored program 203.
As described above, the failure monitoring unit 208 is a module that cooperates with the failure notification unit 205 to monitor failures occurring in the monitored program 203.
The failure information creation unit 209 is a module that, when the failure monitoring unit 208 detects the occurrence of a failure, acquires the history of past state transitions from the state transition monitoring unit 207, acquires the corresponding state transition table test data 213 and state transition path test data 214 from the database 212 via the network 201, creates failure information based on that data, and, if necessary, passes the failure information to the failure information transmission unit 210. The procedure for creating failure information and specific examples of failure information are described later.

The failure information transmission unit 210 is a module that transmits the failure information created by the failure information creation unit 209 to the failure information reception unit 216 of the monitoring device 215 via the network 201.
The test data update unit 211 is a module that receives the test data transmitted via the network 201 by the test data transmission unit 217 of the monitoring device 215 and, based on that test data, updates the state transition table test data 213 and the state transition path test data 214 stored in the database 212.
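As a rough illustration of the update performed by the test data update unit 211, the following sketch merges newly received test statuses into a locally held table. The dictionary layout, the status strings, and the function name `apply_test_data_update` are assumptions made for this example; the patent does not specify a storage format.

```python
from typing import Dict, Tuple

# Hypothetical key: (state, event) cell of the state transition table.
Cell = Tuple[str, str]


def apply_test_data_update(database: Dict[Cell, str], update: Dict[Cell, str]) -> int:
    """Merge newer test statuses into `database`; return how many entries changed."""
    changed = 0
    for cell, status in update.items():
        if database.get(cell) != status:
            database[cell] = status
            changed += 1
    return changed


# Illustrative data only: status strings are placeholders, not the patent's format.
stored = {("C", "C"): "not performed"}
received = {("C", "C"): "in progress", ("B", "A"): "performed"}
changed_count = apply_test_data_update(stored, received)
```

Keeping the stored statuses current matters because the failure classification depends on them: once a cell is marked as being addressed, later failures at that location no longer generate traffic.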

The database 212 is an external storage device 104 that stores the state transition table test data 213 and the state transition path test data 214.
The state transition table test data 213 is data on the state transition table tests performed at the development stage of the monitored program 203.
The state transition path test data 214 is data on the state transition path tests performed at the development stage of the monitored program 203.
The monitoring program 214 is a module that receives failure information from the monitoring agent 206 via the network 201, notifies an administrator or the like of the failure information, and, when newer test data exists, transmits that test data to the test data update unit 211 via the network 201.

The monitoring device 215 is a device that communicates with the monitoring agent 206 via the network 201 and monitors the monitored device 202.
The failure information reception unit 216 is a module that receives failure information from the failure information transmission unit 210 via the network 201 and passes it to the failure information notification unit 218.
The test data transmission unit 217 is a module that transmits, via the network 201, the latest data on tests performed by the developers of the monitored program 203 to the test data update unit 211.
The failure information notification unit 218 is a module that notifies an administrator or the like of the failure information received by the failure information reception unit 216 by its own means (sending e-mail, displaying a dialog, and so on).

FIG. 3 is a flowchart of the creation of failure information.
In step 301, the state transition monitoring unit 207 monitors the state transitions of the monitored program 203. In this step, the state transition monitoring unit 207 keeps a history of the state transitions that have occurred in the monitored program 203. The specific means of monitoring state transitions is described later.
In step 302, the failure monitoring unit 208 monitors failures occurring in the monitored program 203. In this step, the failure monitoring unit 208 keeps a log of failure information generated in the monitored program 203. The specific means of monitoring failures is described later.
Steps 301 and 302 may be performed in either order or executed in parallel.
Step 303 checks whether a failure occurred in step 302. If no failure has occurred, the process returns to step 301. If a failure has occurred, the process proceeds to step 304.

In step 304, the failure information creation unit 209 acquires state transition history information from the state transition monitoring unit 207. A specific example of state transition history information is described later.
In step 305, the failure information creation unit 209 searches the database based on the state transition history information acquired in step 304. In this step, the failure information creation unit 209 retrieves from the database the state transition table test data 213 and the state transition path test data 214 that correspond to the state transition history information.

In step 306, the failure information creation unit 209 verifies the state transition table test data 213 retrieved in step 305. It compares the state transition history information acquired in step 304 with the state transition table test data 213 to determine the current test status.
Step 307 determines, from the verification in step 306, whether the current test status is "countermeasure in progress". If so, the failure information need not be transmitted to the monitoring device 215, and the process ends. Otherwise, the process proceeds to step 308.

Step 308 determines, from the verification in step 306, whether the current test status is "not performed". If the test has not been performed, the process proceeds to step 315. Otherwise, that is, if the test has been performed, the process proceeds to step 309.

In step 309, the failure information creation unit 209 verifies the state transition path test data 214 retrieved in step 305. It compares the state transition history information acquired in step 304 with the state transition path test data 214 to determine the current test status.

Step 310 determines, from the verification in step 309, whether the current test status is "countermeasure in progress". If so, the failure information need not be transmitted to the monitoring device 215, and the process ends. Otherwise, the process proceeds to step 311.

Step 311 determines, from the verification in step 309, whether the current test status is "performed". If so, the process proceeds to step 313. If not, the process proceeds to step 312.
In step 312, path failure information is created when step 311 finds that the state transition path test has not been performed. Path failure information is information on the state transition history up to the time of the failure. Because no state transition path test has been performed for the location of the failure, the amount of information is reduced by creating only the minimum information needed to carry out that test. Details of path failure information are described later.
In step 313, detailed failure information is created when step 311 finds that the state transition path test has been performed. Detailed failure information is the information needed to identify the cause of the failure, such as a memory dump or an error log. Because the state transition path test has been performed for the location of the failure, the failure is judged to be unexpected, and detailed failure information is created.

In step 315, simple failure information is created when step 308 finds that the state transition table test has not been performed. Simple failure information is information on the cell (row and column) of the state transition table that corresponds to the location of the failure. Because no state transition table test has been performed for the location of the failure, the amount of information is reduced by creating only the minimum information needed to carry out that test. Details of simple failure information are described later.

Steps 314 and 316 transmit the failure information created in steps 312, 313, and 315 to the failure information reception unit 216 of the monitoring device 215. Creating failure information by the above procedure allows only the minimum necessary failure information to be transmitted to the monitoring device 215.
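The branching in steps 306 through 315 can be condensed into a short sketch. Everything concrete here is an assumption made for illustration: the names `TestStatus`, `FailureInfo`, and `create_failure_info`, the use of a (state, event) pair to identify the failed cell, the use of the visited-state sequence as the path key, and the choice to treat a missing database entry as "not performed". The example keys are illustrative and are not taken from the figures.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Sequence, Tuple


class TestStatus(Enum):
    NOT_PERFORMED = "not performed"
    IN_PROGRESS = "countermeasure in progress"
    PERFORMED = "performed"


Cell = Tuple[str, str]    # assumed key: (state, event)
Path = Tuple[str, ...]    # assumed key: sequence of visited states


@dataclass
class FailureInfo:
    kind: str             # "simple", "path", or "detailed"
    payload: object


def create_failure_info(
    history: Sequence[str],
    failed_cell: Cell,
    table_tests: Dict[Cell, TestStatus],
    path_tests: Dict[Path, TestStatus],
    collect_details: Callable[[], object],
) -> Optional[FailureInfo]:
    # Steps 306-308: check the state transition table test data first.
    table_status = table_tests.get(failed_cell, TestStatus.NOT_PERFORMED)
    if table_status is TestStatus.IN_PROGRESS:
        return None                                    # step 307: nothing to send
    if table_status is TestStatus.NOT_PERFORMED:
        return FailureInfo("simple", failed_cell)      # step 315: cell only

    # Steps 309-311: then check the state transition path test data.
    path = tuple(history)
    path_status = path_tests.get(path, TestStatus.NOT_PERFORMED)
    if path_status is TestStatus.IN_PROGRESS:
        return None                                    # step 310: nothing to send
    if path_status is TestStatus.PERFORMED:
        return FailureInfo("detailed", collect_details())  # step 313
    return FailureInfo("path", path)                   # step 312: history only


table_tests = {("C", "C"): TestStatus.PERFORMED}
path_tests = {("A", "B", "C"): TestStatus.PERFORMED}
info = create_failure_info(
    ["A", "B", "C"], ("C", "C"), table_tests, path_tests,
    lambda: {"error_log": "example log line"},
)
```

Passing `collect_details` as a callable means the expensive data (a memory dump or error log) is gathered only on the branch that actually needs it, which matches the goal of creating no more information than each case requires.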

FIG. 4 is a sequence diagram showing the state transition monitoring means and the failure monitoring means when the JMX technology mentioned above is used. JMX is a technology that allows the internal state of a program to be monitored from outside by developing program-monitoring modules called MBeans. JMX has been included as a standard technology since Java version 5 (Java SE 5). FIG. 4 shows how the monitored program 203 is monitored when JMX is used. To simplify the flow of processing, JMX-related modules such as the MBeanServer are omitted from the figure.

First, the monitoring agent 206 sends a registration request 403 for a state transition monitoring listener to the state transition monitoring MBean 401. If the listener is registered successfully, the state transition monitoring MBean 401 returns a registration success message 404 without raising an error.
Next, the monitoring agent 206 sends a registration request 405 for a failure monitoring listener to the failure monitoring MBean 402. If the listener is registered successfully, the failure monitoring MBean 402 returns a registration success message 406 without raising an error.

Next, when its internal state changes, the monitored program 203 sends a state transition message 407 to the state transition monitoring MBean 401. On receiving the state transition message 407, the state transition monitoring MBean 401 sends a state transition notification message 408 to the monitoring agent 206 through the state transition monitoring listener. On receiving the state transition notification message 408, the monitoring agent 206 performs processing 409 to update the state transition history (log). In this way, the monitoring agent 206 maintains the state transition history information.

When a failure occurs inside the program, the monitored program 203 sends a failure occurrence message 410 to the failure monitoring MBean 402. On receiving the failure occurrence message 410, the failure monitoring MBean 402 sends a failure occurrence notification message 411 to the monitoring agent 206 through the failure monitoring listener. On receiving the failure occurrence notification message 411, the monitoring agent 206 performs processing 412 to check the state transition history information, then processing 413 to create failure information, and finally processing 414 to transmit the failure information. The procedure for creating failure information and the transmission of failure information are as shown in FIG. 3. In this way, an external monitoring program can be created using JMX or a similar technology. This is, however, only one example of a failure monitoring method; other programs may be written, and monitoring may also be performed with dedicated hardware.
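The message sequence of FIG. 4 can be mimicked in a few lines. This sketch deliberately does not use JMX: `NotificationSource` is a hypothetical plain-Python stand-in for an MBean that broadcasts notifications, `MonitoringAgent` is likewise a hypothetical name, and the network transmission of processing 414 is replaced by appending to a list.

```python
from typing import Callable, Dict, List


class NotificationSource:
    """Plain-Python stand-in for an MBean that broadcasts notifications."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str], None]] = []

    def add_listener(self, listener: Callable[[str], None]) -> bool:
        self._listeners.append(listener)
        return True  # plays the role of the registration success message

    def emit(self, message: str) -> None:
        for listener in list(self._listeners):
            listener(message)


class MonitoringAgent:
    def __init__(self, transitions: NotificationSource, failures: NotificationSource) -> None:
        self.history: List[str] = []
        self.sent: List[Dict[str, object]] = []
        transitions.add_listener(self.on_transition)   # request 403, reply 404
        failures.add_listener(self.on_failure)         # request 405, reply 406

    def on_transition(self, new_state: str) -> None:
        self.history.append(new_state)                 # processing 409

    def on_failure(self, description: str) -> None:
        snapshot = list(self.history)                  # processing 412
        info = {"failure": description, "history": snapshot}  # processing 413
        self.sent.append(info)                         # processing 414 (stand-in)


transition_mbean = NotificationSource()
failure_mbean = NotificationSource()
agent = MonitoringAgent(transition_mbean, failure_mbean)
transition_mbean.emit("B")                             # messages 407 and 408
transition_mbean.emit("C")
failure_mbean.emit("unhandled exception")              # messages 410 and 411
```

In this sketch the failure information always contains the full history; in the scheme described for FIG. 3, processing 413 would instead choose between no information, simple, path, or detailed failure information according to the test data.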

FIG. 5 is a diagram showing an example of the state transition table referred to in the present invention. This is data representing the state transitions of a program, of the kind generally used in the development method called model-based development.
Event A501, event B502, event C503, and event D504 represent the types of events that occur in relation to the program, and state A505, state B506, state C507, state D508, and state E509 represent the states the program can take. In the figure, “×” indicates that the event cannot occur in that state; “/” indicates that the event, if it occurs in that state, is ignored and no processing is performed; and “Transition X” (where X is a letter from A to E) indicates that, when the event occurs in that state, the program transitions to the state denoted by the letter X. For example, the table in FIG. 5 indicates that event B502 cannot occur in state A505, that event A501 is ignored if it occurs in state B506, and that the program transitions to state A505 when event C503 occurs in state C507. Such state transition tables are generally used in the development method called model-based development.

FIG. 6 is a table showing an example of numerical data corresponding to the state transition table shown in FIG. 5. In the figure, e1 (601), e2 (602), e3 (603), and e4 (604) correspond to event A501, event B502, event C503, and event D504, respectively, and s1 (605), s2 (606), s3 (607), s4 (608), and s5 (609) correspond to state A505, state B506, state C507, state D508, and state E509, respectively. Each number in the table corresponds to the entry at the same position in the table of FIG. 5. Using numerical data corresponding to the state transition table, rather than the state transition table itself as character string data, reduces the amount of information. This is a general example of a way to reduce the amount of information.
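As a rough illustration of such numerical data, the sketch below maps each (state, event) pair to a single cell number and back. FIG. 6 is not reproduced in this text, so the numbering scheme is an assumption: cells are counted state by state, four events per state. It was chosen only because it assigns the value 2 to event B in state A, which agrees with the example value given for FIG. 7. The function names are hypothetical.

```python
EVENTS = ["A", "B", "C", "D"]       # e1..e4
STATES = ["A", "B", "C", "D", "E"]  # s1..s5


def cell_number(state: str, event: str) -> int:
    """Assumed numbering: count cells state by state, four events per state."""
    return STATES.index(state) * len(EVENTS) + EVENTS.index(event) + 1


def cell_to_pair(cell: int) -> tuple:
    """Inverse mapping: recover (state, event) from a cell number."""
    state_index, event_index = divmod(cell - 1, len(EVENTS))
    return STATES[state_index], EVENTS[event_index]
```

Under this assumed scheme, `cell_number("A", "B")` evaluates to 2 and `cell_to_pair(20)` to `("E", "D")`.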

FIG. 7 is a table showing an example of state transition history information expressed in the format shown in FIG. 6. The occurrence order 701 in the figure is a numerical value indicating the order in which the state transitions occurred, and the cell number 702 is a numerical value corresponding to the numerical data shown in FIG. 6. In this example, event B occurs in state E, causing a transition to state D; event C occurs in state D, causing a transition to state C; event A occurs in state C, causing a transition to state B; event D occurs in state B, causing a transition to state A; and finally event B occurs in state A. Although the state transition table of FIG. 5 marks event B as unable to occur in state A, this example records that event B did occur in state A.
Here, the simple failure information mentioned in the description of FIG. 3 uses, as the failure information, the numerical data of the cell number 702 corresponding to the state transition immediately before the failure occurred. In the example of FIG. 7, the data corresponding to the simple failure information is “2”. However, this is only one example of simple failure information for reducing the amount of information, and it does not limit the content of that information.
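A minimal sketch of a history log in this format follows. The class name `TransitionHistory` and its methods are hypothetical, and the cell numbers recorded are illustrative: only the final value 2 is taken from the text, while the earlier values depend on a numbering of FIG. 6 that is assumed here.

```python
from typing import List, Tuple


class TransitionHistory:
    """Log of (occurrence order, cell number) rows, as in FIG. 7."""

    def __init__(self) -> None:
        self.rows: List[Tuple[int, int]] = []

    def record(self, cell: int) -> None:
        # Occurrence order starts at 1 and grows by one per transition.
        self.rows.append((len(self.rows) + 1, cell))

    def simple_failure_info(self) -> int:
        # Cell number of the most recent entry, i.e. the transition
        # recorded immediately before the failure.
        return self.rows[-1][1]


history = TransitionHistory()
# Illustrative cell numbers; only the final value 2 is taken from the text.
for cell in (18, 15, 9, 8, 2):
    history.record(cell)
```

With these rows, `history.simple_failure_info()` returns 2, a single integer rather than a memory dump or a full log.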

FIG. 8 is a table showing an example of the state transition table test data 213 expressed in the format shown in FIG. 6. A state transition table test is a test that checks whether each cell of the state transition table operates correctly, and it is a test technique commonly used in model-based development. For example, in FIG. 5, the test checks behavior such as whether the program correctly transitions to state B when event A occurs in state A.
The cell number 801 in the figure is a numerical value corresponding to the numerical data shown in FIG. 6, and the test state 802 is a numerical value representing the execution status of the state transition table test for the cell number 801. For the test state 802, “0” indicates that the test has been performed, “1” indicates that it has not been performed, “2” indicates that it cannot be performed, and “3” indicates that a countermeasure is in progress.
Here, “cannot be performed” means that the test is impossible because the event cannot occur in that state (corresponding to “×” in FIG. 5), and “countermeasure in progress” means that the test is currently under way. By using such state transition table test data 213, it is possible to determine what type of failure information should be created and whether the failure information should be transmitted to the monitoring device 215. An example of the determination procedure is as shown in FIG. 3.
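The four test state values can be sketched as an enumeration with a lookup keyed by cell number. The name `CellTestState`, the sample table contents, and the choice to treat an unlisted cell as not yet tested are assumptions made for this sketch, not details taken from FIG. 8.

```python
from enum import IntEnum


class CellTestState(IntEnum):
    DONE = 0         # test has been performed
    NOT_DONE = 1     # test has not been performed
    IMPOSSIBLE = 2   # event cannot occur in this state ("×" in FIG. 5)
    IN_PROGRESS = 3  # countermeasure in progress


# Hypothetical contents: cell number -> test state.
table_test_data = {1: CellTestState.DONE, 2: CellTestState.IMPOSSIBLE,
                   3: CellTestState.NOT_DONE, 4: CellTestState.IN_PROGRESS}


def lookup(cell: int) -> CellTestState:
    # Treating an unlisted cell as NOT_DONE is an assumption of this sketch.
    return table_test_data.get(cell, CellTestState.NOT_DONE)
```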

FIG. 9 is a table showing an example of the state transition path test data 214 expressed in the format shown in FIG. 6. A state transition path test is a test that checks whether a series of state transitions in the state transition table operates correctly, and it is a test technique commonly used in model-based development. For example, in FIG. 5, the test checks a series of operations such as whether the program correctly transitions to state B when event A occurs in state A, then correctly transitions to state E when event B occurs in state B, and then correctly ignores event A when it occurs in state E.
The state transition path 901 in the figure is data representing a state transition history expressed with the numerical data shown in FIG. 6, and the test state 902 is a numerical value representing the execution status of the state transition path test for the state transition path 901. For the test state 902, “0” indicates that the test has been performed, and “1” indicates that a countermeasure is in progress (for example, the failure has already been accepted). Every state transition path that does not appear in the figure is treated as not yet tested. Untested state transition paths are omitted from the figure because enumerating every value the state transition path 901 can take would make the amount of data enormous. By using such state transition path test data 214, it is possible to determine what type of failure information should be created and whether the failure information should be transmitted to the monitoring device 215. An example of the determination procedure is as shown in FIG. 3.
Here, the path failure information mentioned in the description of FIG. 3 uses, as the failure information, the numerical data of the state transition path 901 corresponding to the series of state transitions before the failure occurred. In the example of FIG. 9, the data corresponding to the path failure information is, for example, “1, 6, 20, 9, 5”. However, this is only one example of path failure information for reducing the amount of information, and it does not limit the content of that information.
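How the two kinds of test data might drive the choice of failure information can be sketched as follows. The determination procedure of FIG. 3 is not reproduced in this passage, so the sketch follows the order recited in the claims (state transition table test first, then state transition path test) and suppresses a report while a countermeasure is in progress. The function name, the tuple return value, the position of the in-progress check, and the placeholder payload are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

DONE, NOT_DONE, IMPOSSIBLE, IN_PROGRESS = 0, 1, 2, 3  # table test states
PATH_DONE, PATH_IN_PROGRESS = 0, 1                    # path test states


def create_failure_info(cells: List[int],
                        table_test: Dict[int, int],
                        path_test: Dict[Tuple[int, ...], int]) -> Optional[tuple]:
    """Return (kind, payload), or None when nothing should be sent."""
    last_cell = cells[-1]
    path = tuple(cells)
    cell_state = table_test.get(last_cell, NOT_DONE)
    path_state = path_test.get(path)  # None means the path test is not done

    if cell_state == IN_PROGRESS or path_state == PATH_IN_PROGRESS:
        return None                    # already reported; do not send again
    if cell_state != DONE:
        return ("simple", last_cell)   # simple failure information
    if path_state != PATH_DONE:
        return ("path", path)          # path failure information
    return ("detailed", "memory dump and failure log")  # placeholder payload
```

The intent is that the most expensive report, the detailed one, is created only when both tests had already passed, which is the case in which the compact cell or path data alone would not point to an untested area.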

FIG. 10 is a sequence diagram showing the procedure for transmitting the failure information mentioned in the description of FIG. 3.
First, the failure information creation unit 209 transmits a transmission request 1001 for the failure information to the failure information transmission unit 210. Next, the failure information transmission unit 210 performs processing 1002 for transmitting the failure information to the failure information reception unit 216 of the monitoring device 215. Next, the failure information reception unit 216 transmits a notification request 1003 for the failure information received from the failure information transmission unit 210 to the failure information notification unit 218. The failure information notification unit 218 performs analysis processing 1004 on the failure information, converts the failure information into a human-readable form, and performs processing 1005 for notifying the administrator of the monitored device 202 by means such as e-mail. Finally, the failure information notification unit 218 transmits a message 1006 indicating that the notification succeeded to the failure information reception unit 216, the failure information reception unit 216 transmits a message 1007 indicating that the failure information was received successfully to the failure information transmission unit 210, and the failure information transmission unit 210 transmits a message 1008 indicating that the failure information was transmitted successfully to the failure information creation unit 209. The failure information is notified to the administrator of the monitored device 202 by the above flow; this describes a general notification method and does not limit the notification method.
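The request and acknowledgement chain can be sketched with nested function calls, where each return value stands in for one of the success messages. The function names, the `(kind, payload)` form of the failure information, the report wording, and the use of a list as a mailbox are assumptions for illustration; no network or mail library is involved.

```python
from typing import List


def to_readable(info: tuple) -> str:
    # Counterpart of analysis 1004: convert to a human-readable form.
    kind, payload = info
    return f"Failure report ({kind}): {payload}"


def notify(info: tuple, mailbox: List[str]) -> bool:
    # Counterpart of 1005; the returned True plays the role of message 1006.
    mailbox.append(to_readable(info))
    return True


def receive(info: tuple, mailbox: List[str]) -> bool:
    # Counterpart of 1003; the return value plays the role of message 1007.
    return notify(info, mailbox)


def send(info: tuple, mailbox: List[str]) -> bool:
    # Counterpart of 1002; the return value plays the role of message 1008.
    return receive(info, mailbox)


mailbox: List[str] = []
ok = send(("simple", 2), mailbox)
```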

FIG. 11 is a sequence diagram showing the procedure for updating the test data. An administrator who has been notified of the failure information can test the failure location, distribute a module in which the bug has been fixed, and update the test state in the test data.

First, the administrator 1101 performs processing 1102 for inputting the latest test data for the monitored program 203 to the monitoring program 214. Next, the monitoring program 214 performs processing 1103 for transmitting a message for updating to the latest test data to the monitoring agent 206. Next, the monitoring agent 206 transmits the received latest test data to the database 212 and performs processing 1104 for updating the test data. Finally, the database 212 transmits a message 1105 indicating that the test data was updated successfully to the monitoring agent 206, the monitoring agent 206 transmits a message 1106 indicating that the test data was received successfully to the monitoring program 214, and the monitoring program 214 transmits a message 1107 indicating that the test data was input successfully to the administrator 1101. The test data is updated by the above flow; this describes a general update method and does not limit the update method. In particular, the test state could also be updated automatically by the monitoring agent 206, but an update performed by the administrator is shown here as a general example.

FIG. 1 is a diagram showing the hardware configuration of a typical system according to the present invention.
FIG. 2 is a diagram showing the conceptual configuration of the modules according to the present invention.
FIG. 3 is a flowchart showing an example of the procedure for creating failure information.
FIG. 4 is a sequence diagram showing an example of state monitoring means using JMX technology.
FIG. 5 is a diagram showing an example of a state transition table used in model-based development.
FIG. 6 is a table showing an example of numerical data corresponding to the state transition table.
FIG. 7 is a table showing an example of state transition history data.
FIG. 8 is a table showing an example of state transition table test data.
FIG. 9 is a table showing an example of state transition path test data.
FIG. 10 is a sequence diagram showing an example of a method for transmitting failure information.
FIG. 11 is a sequence diagram showing an example of a method for updating test data.

Explanation of symbols

101 computing device, 102 main storage device, 103 communication device, 104 external storage device, 201 network, 202 monitored device, 203 monitored program, 204 state transition notification unit, 205 failure notification unit, 206 monitoring agent, 207 state transition monitoring unit, 208 failure monitoring unit, 209 failure information creation unit, 210 failure information transmission unit, 211 test data update unit, 212 database, 213 state transition table test data, 214 state transition path test data, 215 monitoring device, 216 failure information reception unit, 217 test data transmission unit, 218 failure information notification unit.

Claims

A fault monitoring system comprising a device to be monitored, a monitoring device that monitors the occurrence of failures in the device, and an external storage device serving as a database that stores failure data of the device, these being able to send and receive data via a network,
the fault monitoring system comprising: a failure monitoring unit that monitors failures occurring in a program module operating on the device; a state transition monitoring unit that monitors state transitions of the program module; a failure information creation unit that, when the failure monitoring unit detects the occurrence of a failure, acquires state transition history information from the state transition monitoring unit, further acquires test data for the program module from the external storage device, and creates failure information based on the state transition history information and the test data; and a failure information transmission unit that transmits the failure information created by the failure information creation unit to the monitoring device via the network.

The fault monitoring system according to claim 1, wherein the test data is state transition table test data corresponding to a result of a state transition table test performed on the program module, state transition path test data corresponding to a result of a state transition path test, or both.

The fault monitoring system according to claim 2, wherein the failure information creation unit determines, based on the state transition table test data and the state transition history information, whether the program module has already passed the state transition table test, creates detailed failure information such as a memory dump and a failure log if the test has been passed, and creates simple failure information corresponding to the state transition table of the state transition table test data if the test has not been passed.

The fault monitoring system according to claim 3, wherein the failure information creation unit determines, based on the state transition path test data and the state transition history information, whether the program module has already passed the state transition path test, creates detailed failure information if the test has been passed, and creates path failure information corresponding to the state transition path of the state transition path test data if the test has not been passed.

The fault monitoring system according to claim 4, wherein the failure information creation unit determines, based on the state transition table test data and the state transition history information, whether the program module has passed the state transition table test, and creates the simple failure information if the test has not been passed; and if the test has been passed, determines, based on the state transition path test data and the state transition history information, whether the program module has passed the state transition path test, creates the path failure information if the test has not been passed, and creates the detailed failure information if the test has been passed.

The fault monitoring system according to any one of the above claims, wherein, when the failure information is transmitted to the monitoring device, the failure information creation unit stores the test data corresponding to the state transition history information in the external storage device with the status of countermeasure in progress, and, when the failure recurs, does not create the failure information and does not transmit it to the monitoring device if the test data corresponding to the state transition history information has the status of countermeasure in progress.

The fault monitoring system according to any one of claims 1 to 6, further comprising a test data update unit that transmits, to the external storage device, the latest test data and the countermeasure status of the test data input to the monitoring device by an administrator of the device or the like, and updates the test data stored in the external storage device.

A device comprising a program module and capable of sending and receiving data via a network to and from a monitoring device and an external storage device serving as a database that stores failure data of the device,
the device comprising: a failure monitoring unit that monitors failures occurring in the program module; a state transition monitoring unit that monitors state transitions of the program module; a failure information creation unit that, when the failure monitoring unit detects the occurrence of a failure, acquires state transition history information from the state transition monitoring unit, further acquires, from the external storage device, test data for tests performed on the program module, and creates failure information based on the state transition history information and the test data; and a failure information transmission unit that transmits the failure information created by the failure information creation unit to the monitoring device via the network.

A fault monitoring method in a fault monitoring system comprising a device to be monitored, a monitoring device that monitors the occurrence of failures in the device, and an external storage device serving as a database that stores failure data of the device, these being able to send and receive data via a network,
the fault monitoring method comprising: a failure monitoring step of monitoring failures occurring in a program module operating on the device; a state transition monitoring step of monitoring state transitions of the program module; a failure information creation step of, when the occurrence of a failure is detected in the failure monitoring step, acquiring the state transition history information obtained in the state transition monitoring step, further acquiring test data for the program module from the external storage device, and creating failure information based on the state transition history information and the test data; and a failure information transmission step of transmitting the failure information created in the failure information creation step to the monitoring device via the network.

The fault monitoring method according to claim 9, wherein the test data is state transition table test data corresponding to a result of a state transition table test performed on the program module, state transition path test data corresponding to a result of a state transition path test, or both.