JP2013003681A - Service operation management device

Info

Publication number: JP2013003681A
Application number: JP2011131642A
Authority: JP
Inventors: Yoshiaki Takeda; 義聡竹田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2011-06-13
Filing date: 2011-06-13
Publication date: 2013-01-07

Abstract

PROBLEM TO BE SOLVED: To efficiently detect violations of the execution order of services. SOLUTION: A service operation management device includes: a service configuration information management part 111 that stores service execution order definition information defining the execution order of a plurality of services, and stores influence range information indicating the degree of influence of a failure when a failure occurs in a service; an integrated log storage part 117 that stores logs; and a service failure detection part 115 that receives a new input log, detects, from the plurality of logs stored in the integrated log storage part 117, a previous log of the service executed before the service indicated by the input log, and determines whether or not the execution order of the service indicated by the previous log and the service indicated by the input log coincides with the execution order given in the service execution order definition information. The service failure detection part 115 refers to the influence range information stored by the service configuration information management part 111 and detects failure occurrence preferentially for logs of services with a high degree of failure influence.

Description

The present invention relates to a service operation management apparatus that detects failures in services, where a service is computer software that exchanges messages in the process of providing a function to a user.

As shown in Patent Document 1, an apparatus that integrates and manages a plurality of types of log data having different recorded contents has been realized. Using this technology, it is possible to build a service operation management system that monitors, in real time, the logs of an ESB, which is the base middleware of a service system based on Service-Oriented Architecture (SOA), and detects failures of the individual services that constitute the service system. Here, ESB is an abbreviation of “Enterprise Service Bus” and is middleware that performs the routing, format conversion, and other processing necessary for relaying messages between services.

In a prior-art service operation management system composed of services, an ESB, and a service operation management apparatus, a plurality of services exchange messages via the ESB to realize functions for users. The service operation management apparatus monitors the ESB logs (the message log, and the transaction log, which is a record of processing such as message routing performed by the ESB itself) in real time and detects service failures. In some cases one ESB is used per service operation management system; in others, a plurality of ESBs coexist as a result of corporate mergers, partnerships, and the like.

Patent Document 1: JP 2010-182194 A

The prior art has the following problem: in a complex system, detecting a violation of the service execution order requires collating every service execution order that can possibly occur against the logs of the services actually being executed, which makes efficient detection difficult.

In a service operation management system that includes many complex business operations, the service execution order definitions subject to this matching become enormous, and the real-time nature of log monitoring may be impaired, for example when many services run concurrently and a large volume of logs is generated in a short time.

An object of the present invention is to provide a service operation management apparatus that effectively detects violations of the service execution order.

The service operation management apparatus of the present invention is
a service operation management apparatus for detecting a failure of a service provided by computer software executed in a predetermined order, comprising:
a service configuration information management unit that stores service execution order definition information defining the execution order of a plurality of services, and that stores, for each service, influence range information indicating the degree of influence of a failure when a failure occurs in that service;
an integrated log storage unit that stores, as past logs, a plurality of logs, each being at least one of a message log recording messages between services and a transaction log recording the relaying of those messages; and
a service failure detection unit that receives a new input log to be stored in the integrated log storage unit, detects as a previous log, from among the plurality of past logs already stored in the integrated log storage unit, the log of a service executed before the service indicated by the input log, searches the service execution order definition information stored in the service configuration information management unit, and detects the occurrence of a service failure by determining whether or not the execution order of the service indicated by the previous log and the service indicated by the input log matches the service execution order in the service execution order definition information,
wherein the service failure detection unit refers to the influence range information stored in the service configuration information management unit and detects the occurrence of a failure preferentially for input logs of services having a high degree of failure influence.

According to the present invention, because the service configuration information management unit stores the degree of influence of failures, violations of the service execution order can be detected preferentially for services whose failures have a large degree of influence.

Configuration diagram of the service operation management system of Embodiments 1 to 3.
Explanatory diagram of business operations and services.
Diagram showing the hierarchical relationship of services stored in the service configuration information management unit 111.
Diagram showing the service failure influence range information stored in the service configuration information management unit 111.
Operation flowchart of the service failure detection unit 115.
Operation flowchart of the service failure influence range specifying unit 112.
Diagram showing the types of logs.
Diagram showing the structure of the message created by the service unit 101 in Embodiments 2 and 3.
Operation flowchart of the service unit 101 in Embodiment 3.
Operation flowchart of the attribute query interpretation unit 109 in Embodiment 3.
Operation flowchart of the attribute query interpretation unit 109 in Embodiment 3.
Diagram showing an example of the external appearance of the service unit 101 and the service operation management apparatus 103 in Embodiments 1 to 3.
Diagram showing an example of the hardware resources of the service unit 101, the ESB 102, and the service operation management apparatus 103 in Embodiments 1 to 3.

Embodiment 1.
FIG. 1 is a configuration diagram of a service operation management system 100 using a service operation management apparatus 103 according to Embodiments 1 to 3 of the present invention.

<< Description of Service Operation Management System 100 >>
The service operation management system 100 consists of a service unit 101 to be managed, an ESB (Enterprise Service Bus) 102, and a service operation management apparatus 103.

The service unit 101 is composed of a computer and an application program. The service unit 101 provides a service. A service is something accomplished by executing an application program on a computer or a dedicated machine. A business operation is a job accomplished by executing services in sequence, for example inventory management, reservation, or Internet search. A business operation is accomplished by executing a plurality of services in a predetermined execution order.
To realize its functions, the service unit 101 exchanges messages 104 with other services.

The message 104 has a session ID, a business ID, a service ID, a destination address, a sender address, and a message body.
Here, a session is the series of operations and communications, in a computer system or network communication, from connection or login until disconnection or logoff. The session ID is an identifier assigned to identify a session from connection or login until disconnection or logoff.
The business ID is an identifier assigned to identify a business operation.
The service ID is an identifier assigned to identify a service.
In addition, the message 104 carries, as attribute information, the attribute information 105 and the attribute query 106 described later in Embodiments 2 and 3.

The ESB 102 includes a memory device (not shown) that stores a message log 107 and a transaction log 108. The message log 107 records the content of the messages 104 as logs. Each log records a log time, such as the year, month, day, hour, minute, and second.

The transaction log 108 is a record of processing, such as message routing, performed by the ESB 102 itself. The transaction log 108 includes the message 104, the session ID, the business ID, and the service ID.

As shown in FIG. 1, a plurality of service units 101 exchange messages via the ESB 102 to realize functions for users. In some cases one ESB 102 is used per service operation management system, but, as shown in FIG. 1, a plurality of ESBs 102 may also coexist as a result of corporate mergers, partnerships, and the like.

<< Configuration of Service Operation Management Apparatus 103 >>
The service operation management apparatus 103 includes a service configuration information management unit 111, a service failure influence range specifying unit 112, a collection/accumulation unit 114, a service failure detection unit 115, a log management unit 116, and an integrated log storage unit 117. The service operation management apparatus 103 further includes an attribute query interpretation unit 109, an attribute management unit 110, and a service failure definition management unit 113, which are described later in Embodiments 2 and 3.

The service operation management apparatus 103 receives the logs of the ESB 102 (the message log 107, and the transaction log 108, which is a record of processing such as message routing performed by the ESB itself), monitors them in real time, and detects service failures.

The service configuration information management unit 111 is a storage device and stores service execution order definition information that defines the execution order of a plurality of services. The service configuration information management unit 111 identifies a service by its business ID and service ID, and stores the service execution order of the plurality of services in terms of business IDs and service IDs. The service configuration information management unit 111 further stores, for each service, influence range information indicating the degree of influence of a failure when a failure occurs in that service.

The service failure influence range specifying unit 112 searches the service execution order definition information stored in the service configuration information management unit 111 for the service execution order, thereby identifying the range of services affected by a failure, and stores that failure influence range in the service configuration information management unit 111 as influence range information.

The collection/accumulation unit 114 receives from the ESB 102 (via the attribute query interpretation unit 109 in the case of Embodiment 3 described later) at least one of the transaction log 108, which records the relaying of messages between services, and the message log 107, which records the messages between services. The logs are passed to the log management unit 116.

The log management unit 116 stores the logs collected by the collection/accumulation unit 114 in the integrated log storage unit 117 and also hands them over to the service failure detection unit 115.

The integrated log storage unit 117 is a storage device that stores, as past logs, a plurality of logs, each being at least one of the transaction log 108, which records the relaying of messages between services, and the message log 107, which records the messages 104 between services.

The service failure detection unit 115 receives a new input log and detects as the previous log, from among the plurality of past logs stored in the integrated log storage unit 117, the past log of the service executed before the service indicated by the input log.

The service failure detection unit 115 searches the service execution order definition information stored in the service configuration information management unit 111 and detects the occurrence of a service failure by determining whether or not the execution order of the service indicated by the previous log and the service indicated by the input log matches the service execution order in that definition information.

Furthermore, the service failure detection unit 115 refers to the influence range information stored in the service configuration information management unit 111 and detects the occurrence of failures preferentially for logs of services having a large degree of failure influence.

<< Outline of the Service Operation Management Method of Service Operation Management Apparatus 103 >>
The service operation management apparatus 103 according to this embodiment is characterized by including the service configuration information management unit 111 and the service failure influence range specifying unit 112. By providing these two characteristic components, which are absent from the prior art, the service operation management apparatus 103 according to Embodiment 1 preferentially detects service failures with a wide influence range when the service failure detection unit 115 runs.

<< Service Execution Order Definition Information >>
FIG. 2 shows an example of business operations and services.
FIG. 2 shows an example of service execution order definitions, in which each of business operations (1) to (n), each a set of services, executes its services. Letters (optionally combined with subscripts) such as W_1 and Y_1 denote services. An arrow indicates that the service on the left of the arrow calls and executes the service on the right.

In FIG. 2, service X is called in business operations (1) to (n). To strictly check whether an execution order violation has occurred when service X is executed, it is necessary to confirm whether the service executed immediately before service X is one of the services W_1 to W_n. In a service operation management system that includes many business operations with complex definitions, the service execution order definitions subject to this matching become enormous, and the real-time nature of log monitoring may be impaired, for example when many services run concurrently and a large volume of logs is generated in a short time.

In FIG. 3, (1) and (2) are business IDs identifying business operations. A to G are service IDs identifying services. Business (1) is composed of services A, B, C, and D. Business (2) is composed of services E, B, F, and G. Arrows indicate service calls. It is assumed that only businesses (1) and (2) use service B. The call relationships between services are referred to as hierarchy information.

The service configuration information management unit 111 stores the hierarchy information shown in FIG. 3, expressed with business IDs and service IDs, as the service execution order definition information of the plurality of services (for example, the information in FIG. 3). The storage format and data format of the service execution order definition information are not restricted.
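
Since the storage format is left open, one possible representation is a nested mapping from business ID to caller/callee lists. The sketch below is an assumption made for illustration: FIG. 3 is not reproduced here, so the exact arrows (A calls B, B calls C and D; E calls B, B calls F and G) are inferred from the surrounding description, and the names `ORDER_DEFINITIONS`, `services_in`, and `businesses_using` are hypothetical.

```python
# business ID -> {caller service ID -> [callee service IDs]}
# The call arrows are assumed; only the membership of each business
# (A, B, C, D and E, B, F, G) is stated in the text.
ORDER_DEFINITIONS = {
    "1": {"A": ["B"], "B": ["C", "D"]},
    "2": {"E": ["B"], "B": ["F", "G"]},
}


def services_in(business_id):
    """Return the sorted service IDs that make up one business operation."""
    calls = ORDER_DEFINITIONS[business_id]
    names = set(calls)
    for callees in calls.values():
        names.update(callees)
    return sorted(names)


def businesses_using(service_id):
    """Return the sorted business IDs whose definition contains the service."""
    return sorted(b for b in ORDER_DEFINITIONS if service_id in services_in(b))
```

With these definitions, `businesses_using("B")` yields both business IDs, which reflects the statement that only businesses (1) and (2) use service B.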

<< Influence Range Information >>
In FIG. 3, suppose a failure occurs in service B of business (1); the influence of the failure of service B then extends to A, C, and D. The service failure influence range specifying unit 112 calculates the size of this influence range from the service execution order definition information (for example, the information in FIG. 3) held by the service configuration information management unit 111 and stores it in the service configuration information management unit 111 as influence range information. For each service, the influence range information holds the degree of influence of a failure in the case where a failure occurs in that service.

The degree of influence of a failure may, for example, be defined as a fixed value for each service ID. For example, fixed values such as 5 points for service A, 10 points for service B, and 7 points for service C are assigned and stored in the service configuration information management unit 111 as influence range information.

Alternatively, as shown in FIG. 4, it may be assumed that when a failure occurs in a service, all services of the business operations containing that service are affected, regardless of the service ID of the failed service; the analysis priority score described below is then assigned to each service, and the sum of the analysis priority scores of the services is expressed as a total for each business operation.

For example, the analysis priority score may be configured so that the failed service B is given an initial value of 10 points, and the score is decremented for each call for service A, which calls service B, and for services C and D, which are called from service B. The analysis priority score may also be calculated by addition instead of subtraction. FIG. 4 is an example of the influence range information structures obtained when the service failure influence range is calculated for businesses (1) and (2) of FIG. 3, using an analysis priority score whose highest value is 10 and which is decremented by 1 for each call.
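
The scoring rule described above can be sketched as a breadth-first walk outward from the failed service. This is a minimal sketch under stated assumptions: the call graph of business (1) (A calls B, B calls C and D) is inferred from the text because FIG. 3 is not reproduced, the call relation is followed in both directions so that callers and callees are both reached, and the function name `analysis_priority_scores` is hypothetical.

```python
from collections import deque


def analysis_priority_scores(calls, failed, initial=10, step=1):
    """Score services of one business by call distance from the failed one.

    calls: dict mapping a caller service ID to a list of callee service IDs.
    """
    # Treat each call as an undirected link (assumption): the text applies
    # the decrement both to the caller of B and to the services B calls.
    neighbours = {}
    for caller, callees in calls.items():
        for callee in callees:
            neighbours.setdefault(caller, set()).add(callee)
            neighbours.setdefault(callee, set()).add(caller)

    scores = {failed: initial}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for other in sorted(neighbours.get(current, ())):
            if other not in scores:
                # One more call away from the failure: decrement by `step`.
                scores[other] = scores[current] - step
                queue.append(other)
    return scores


business_1 = {"A": ["B"], "B": ["C", "D"]}  # assumed call graph
scores = analysis_priority_scores(business_1, failed="B")
total = sum(scores.values())
```

Under these assumptions service B receives 10 points and A, C, and D receive 9 each, so the total for business (1) is 37, which agrees with the figure quoted for FIG. 4. For a deep call graph the score would eventually go negative; whether to clamp it at zero is a design choice the text leaves open.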

FIG. 4 is an example of the influence range information stored for service B of business (1). In FIG. 4, the influence range information is realized not as a single structure (a single value) but as the following three kinds of structures:
1. a structure that holds the detailed degree of influence for each business operation;
2. a structure that holds the aggregated degree of influence per business operation;
3. a structure that holds the aggregated degree of influence per service.

The “structure that holds the detailed degree of influence for each business operation” holds, for example, when a failure occurs in service B during execution of business (1), the degree of influence assigned to service B of business (1) and the degrees of influence assigned to services A, C, and D of business (1). It also holds, for the same failure, the degree of influence assigned to service B of business (2) and the degrees of influence assigned to services E, F, and G of business (2). In the case of FIG. 4, a failure of service B during execution of business (1) is unrelated to business (2), so service B of business (2) and services E, F, and G of business (2) are unaffected, and a degree of influence of zero is held for them.

The “structure that holds the aggregated degree of influence per business operation” holds, for each business operation, the sum of the degrees of influence of the services held in the “structure that holds the detailed degree of influence for each business operation”.

The “structure that holds the aggregated degree of influence per service” holds, for each service, the sum of the degrees of influence of that service held in the “structure that holds the detailed degree of influence for each business operation”.

From the influence range information of FIG. 4, it can be seen that the influence range of service B while business (1) is being executed is 10 points for service B alone, 37 points for business (1), to which service B belongs, and 0 points for business (2), to which service B also belongs.

From the influence range information of FIG. 4, it can be seen that the influence range of service B while business (2) is being executed is 10 points for service B alone, 0 points for business (1), to which service B belongs, and 37 points for business (2), to which service B also belongs.

As shown in FIG. 4, the service configuration information management unit 111 stores, for service B, influence range information indicating the degree of influence of a failure when a failure occurs in service B.
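
The three structures can be illustrated with plain dictionaries for the case of a failure in service B during business (1). The totals (10, 37, and 0 points) come from the text; the individual values of 9 for A, C, and D are derived from the "10 points, minus 1 per call" rule and are an assumption, since FIG. 4 itself is not reproduced. All variable names are hypothetical.

```python
# Structure 1: detailed degree of influence for each business operation.
# Business (2) is unaffected by a failure during business (1), so zeros.
detail_by_business = {
    "1": {"A": 9, "B": 10, "C": 9, "D": 9},
    "2": {"E": 0, "B": 0, "F": 0, "G": 0},
}

# Structure 2: aggregated degree of influence per business operation.
total_by_business = {
    business_id: sum(per_service.values())
    for business_id, per_service in detail_by_business.items()
}

# Structure 3: aggregated degree of influence per service, summed over
# every business operation in which the service appears.
total_by_service = {}
for per_service in detail_by_business.values():
    for service_id, score in per_service.items():
        total_by_service[service_id] = total_by_service.get(service_id, 0) + score
```

Keeping the aggregates as separate structures means the detection step can look up a single number per service or per business without re-summing the detailed table for each incoming log.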

<< Operation of the Service Execution Order Violation Detection Method of Service Failure Detection Unit 115 >>
FIG. 5 shows an operation flowchart of the service execution order violation detection processing in the service failure detection unit 115.

First, the service failure detection unit 115 receives from the ESB 102 at least one of the transaction log 108, which records the relaying of messages between services, and the message log 107, which records the messages between services.

The service failure detection unit 115 receives a log (the input log) as input (S101) and extracts the service ID, session ID, and business ID from the input log (S102, S103, S104).

The service failure detection unit 115 searches the service configuration information management unit 111 and retrieves the service failure influence range for the input log (S199). If the service failure influence range for the input log cannot be retrieved from the service configuration information management unit 111, the process proceeds to S105. The service failure detection unit 115 holds a preset threshold to be compared with the service failure influence range. If the service failure influence range for the input log can be retrieved and the log belongs to a service whose influence range is equal to or greater than the threshold, the service failure detection unit 115 executes the execution order violation detection processing of FIG. 5 on it with priority over the failures of other services. If the log belongs to a service whose influence range is less than the threshold, the input log is saved in temporary memory and the execution order violation detection processing of FIG. 5 is executed on it at a later time.
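
The threshold handling in S199 can be sketched as follows. This is an illustrative sketch, not the patented implementation: the threshold value of 30, the dictionary-based log records, the `influence` lookup table keyed by (business ID, service ID), and the choice to run deferred checks once the current batch is finished are all assumptions.

```python
from collections import deque


def process_logs(logs, influence, check, threshold):
    """Check high-influence logs immediately and defer the others."""
    deferred = deque()  # stands in for the "temporary memory" in the text
    for log in logs:
        score = influence.get((log["business_id"], log["service_id"]))
        if score is None or score >= threshold:
            # No influence range found (proceed directly to S105), or the
            # influence range is at or above the threshold: check now.
            check(log)
        else:
            deferred.append(log)
    # "At a later time" is modelled here as "after the current batch".
    while deferred:
        check(deferred.popleft())


checked = []  # records the order in which logs were checked
influence = {("1", "B"): 37, ("1", "A"): 5}
logs = [
    {"business_id": "1", "service_id": "A"},  # below threshold: deferred
    {"business_id": "1", "service_id": "B"},  # above threshold: immediate
    {"business_id": "9", "service_id": "Z"},  # unknown: immediate
]
process_logs(logs, influence, lambda log: checked.append(log["service_id"]), threshold=30)
```

In this run the checks happen in the order B, Z, A, showing that the low-influence log for service A is handled only after the others.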

In S105, the service failure detection unit 115 retrieves from the integrated log storage unit 117 the most recent log associated with the session ID extracted in S103 (the immediately preceding log, referred to as the previous log) (S105, S106). That is, the service failure detection unit 115 retrieves as the previous log the log that has the same session ID and the most recent log time.
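
The previous-log lookup of S105 and S106 amounts to filtering by session ID and taking the entry with the latest log time. The sketch below assumes that past logs are held as a list of dictionaries with `session_id` and `time` keys; these names, the sample records, and the in-memory list are hypothetical stand-ins for the integrated log storage unit 117.

```python
from datetime import datetime


def find_previous_log(past_logs, session_id):
    """Return the latest past log of the given session, or None if absent."""
    same_session = [log for log in past_logs if log["session_id"] == session_id]
    return max(same_session, key=lambda log: log["time"], default=None)


# Sample records are invented for the example.
past_logs = [
    {"session_id": "s1", "service_id": "A", "time": datetime(2011, 6, 13, 9, 0, 0)},
    {"session_id": "s2", "service_id": "E", "time": datetime(2011, 6, 13, 9, 0, 5)},
    {"session_id": "s1", "service_id": "B", "time": datetime(2011, 6, 13, 9, 0, 9)},
]
previous = find_previous_log(past_logs, "s1")
```

A linear scan is used for clarity; a real store would more likely index logs by session ID so that the lookup does not grow with the total number of stored logs.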

If the retrieval in S105 and S106 finds a previous log associated with the session ID, the service failure detection unit 115 performs the processing from S107 onward.

First, the service failure detection unit 115 searches the service execution order definition information stored in the service configuration information management unit 111 by service ID and business ID, and extracts as candidate services the services that may be executed next after the service indicated by the previous log. The service failure detection unit 115 creates a list of the service IDs and business IDs of the candidate services that may be executed next, associated with the previous log retrieved in S105 (S107). As described later for S114, if a list of service IDs and business IDs already accompanies the previous log retrieved in S105, the list creation in S107 can be omitted.

そして、Ｓ１０２、Ｓ１０３で取り出した入力ログのサービスＩＤ・業務ＩＤが、Ｓ１０７で取り出し作成した候補サービスのサービスＩＤ・業務ＩＤのリストに含まれるかどうか調べる（Ｓ１０９、Ｓ１１０）。サービス障害検出部１１５は、Ｓ１０９、Ｓ１１０で、候補サービスのサービスＩＤ・業務ＩＤが一致する場合、サービス実行順序不正は発生していないと判断し、Ｓ１１３以下の処理を行う。 Then, it is checked whether the service ID / business ID of the input log extracted in S102 and S103 is included in the list of service IDs / business IDs of candidate services extracted and created in S107 (S109, S110). In S109 and S110, the service failure detection unit 115 determines that the service execution order is not invalid when the service ID and the business ID of the candidate services match, and performs the processing from S113 onward.

Ｓ１０８〜Ｓ１１０で、候補サービスのサービスＩＤまたは業務ＩＤがＳ１０２、Ｓ１０３で取り出した入力ログのサービスのサービスＩＤ・業務ＩＤのいずれかと一致しない場合、サービス障害検出部１１５は、Ｓ１０９、Ｓ１１０の処理を、Ｓ１０７で取り出し作成した候補サービスのサービスＩＤ・業務ＩＤのリストに対して繰り返し実行し、候補サービスのサービスＩＤ・業務ＩＤのリストの中から、Ｓ１０２、Ｓ１０３で取り出した入力ログのサービスのサービスＩＤ・業務ＩＤと一致するものを探す。Ｓ１１１で、サービスＩＤ・業務ＩＤが候補サービスのサービスＩＤ・業務ＩＤのリストの最後の要素である場合、サービス障害検出部１１５は、サービス実行順序不正を検出したと判断し（Ｓ１１２）、あらかじめ定めたサービス実行順序不正への対処処理を行う。すなわち、サービス障害検出部１１５は、サービス障害影響範囲特定部１１２に対して、サービス障害情報を送る。このサービス障害情報は、障害を起こしたサービスの業務ＩＤ、サービスＩＤ、障害の原因を示す障害原因ＩＤなどを含む。 In S108 to S110, if the service ID or business ID of a candidate service does not match that of the input log extracted in S102 and S103, the service failure detection unit 115 repeats S109 and S110 over the list of candidate service IDs and business IDs created in S107, looking for an entry that matches the service ID and business ID of the input log. If, in S111, the entry just compared is the last element of the list (that is, no entry matched), the service failure detection unit 115 determines that an invalid service execution order has been detected (S112) and performs the predetermined handling for it: it sends service failure information to the service failure influence range specifying unit 112. This service failure information includes the business ID and service ID of the failed service, a failure cause ID indicating the cause of the failure, and the like.

Ｓ１０８〜Ｓ１１０の処理でサービス実行順序不正を検出しなかった場合、またはＳ１０５、Ｓ１０６でセッションＩＤに関連付けられた前ログがない場合は、サービス障害検出部１１５は、Ｓ１１３以下の処理を行う。 When the service execution order fraud is not detected in the processing of S108 to S110, or when there is no previous log associated with the session ID in S105 and S106, the service failure detection unit 115 performs the processing of S113 and subsequent steps.

まず、サービス障害検出部１１５は、サービス構成情報管理部１１１より、当該入力ログを生成したサービスの次に実行する可能性のある候補サービスのサービスＩＤ・業務ＩＤのリストを取り出し（Ｓ１１３）、Ｓ１０１で入力された入力ログと、Ｓ１１３で取り出した候補サービスのサービスＩＤ・業務ＩＤのリストを関連付け、ログ管理部１１６を通して統合ログ記憶部１１７に過去ログとして保管し（Ｓ１１４）、処理を完了する。 First, the service failure detection unit 115 retrieves from the service configuration information management unit 111 the list of service IDs and business IDs of candidate services that may be executed next after the service that generated the input log (S113). It then associates the input log received in S101 with the list retrieved in S113, stores them as a past log in the integrated log storage unit 117 through the log management unit 116 (S114), and completes the process.

このように、統合ログ記憶部１１７に記憶された入力ログは、次回、過去ログとして検索されることになるが、過去ログに次に実行する可能性のある候補サービスのサービスＩＤ・業務ＩＤのリストが関連付けられて記憶されている場合はその過去ログが前ログとして検索された場合には、前ログと次に実行する可能性のあるサービスＩＤ・業務ＩＤのリストとが同時に検索できるのでＳ１０７の候補サービスのサービスＩＤ・業務ＩＤのリスト作成動作を省略することができる。 An input log stored in the integrated log storage unit 117 in this way will later be searched as a past log. If the past log was stored together with its list of candidate service IDs and business IDs, then when it is retrieved as a previous log, the previous log and the list of service IDs and business IDs that may be executed next are obtained at the same time, so the list creation in S107 can be omitted.

なお、サービス運用管理システム１００の実現上の都合によっては、サービス実行順序不正を検出したログに対しても、Ｓ１１３、Ｓ１１４の処理を行い、サービス実行順序不正を検出したログに候補サービスのサービスＩＤ・業務ＩＤのリストを関連付けて統合ログ記憶部１１７に記憶するようにしてもよい。 Depending on the convenience of the service operation management system 100, the processes of S113 and S114 are also performed on the log in which the service execution order is detected to be invalid, and the service ID of the candidate service is added to the log in which the service execution order is detected in an incorrect manner. A list of business IDs may be associated and stored in the integrated log storage unit 117.

また、図５において、サービス運用管理システム１００の実現上の都合によっては、例えば、ユーザＩＤをセッションＩＤの代わりに用いることもできる。また、ユーザがサービスの集合としての業務単位でなく個別のサービスを実行するようなシステムの場合は、業務ＩＤの取り出しや比較を省略することもできる。 In FIG. 5, for example, a user ID can be used instead of a session ID depending on the implementation of the service operation management system 100. Further, in the case of a system in which a user executes an individual service instead of a business unit as a set of services, it is possible to omit extraction and comparison of business IDs.

あるいは、例えば、Ｓ１０９・Ｓ１１０においてサービスＩＤ・業務ＩＤをペアとして比較し、いずれか一方が異なるときは一致しないと判断するよう構成することもできる。 Alternatively, for example, in S109 and S110, the service ID and the business ID may be compared as a pair, and if either one is different, it may be determined that they do not match.

以上のように、サービス障害検出部１１５は、統合ログ記憶部１１７に記憶した複数のログから、入力ログが有するセッションＩＤと同じセッションＩＤを有する過去ログであって入力ログが示すサービスの直前に実行されたサービスの過去ログを前ログとして検出し、サービス構成情報管理部１１１に記憶したサービス実行順序情報を検索して、前ログが示すサービスの後に実行する可能性のあるサービスを候補サービスとして取得し、取得した候補サービスの業務ＩＤとサービスＩＤとが、入力ログが有する業務ＩＤとサービスＩＤとに一致するか判断してサービスの障害発生を検出する。 As described above, the service failure detection unit 115 detects, from the logs stored in the integrated log storage unit 117, the past log that has the same session ID as the input log and belongs to the service executed immediately before the service indicated by the input log, and takes it as the previous log. It then searches the service execution order information stored in the service configuration information management unit 111 to obtain, as candidate services, the services that may be executed after the service indicated by the previous log, and detects a service failure by determining whether the business ID and service ID of any candidate service match the business ID and service ID of the input log.
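The check summarized above (S105 to S114) can be sketched in a few lines. The sketch makes several assumptions that are not in the source: the order definition is a plain dictionary named `ORDER_DEFINITION` with invented service names, the log store (`LogStore`) keeps only the most recently saved log per session and relies on logs arriving in time order, and a log with an invalid order is not stored (the text also allows storing it).

```python
# Hypothetical order definition: (business ID, service ID) -> allowed next pairs.
ORDER_DEFINITION = {
    ("order", "login"): [("order", "cart")],
    ("order", "cart"): [("order", "payment"), ("order", "cart")],
    ("order", "payment"): [],
}


class LogStore:
    """Stand-in for the integrated log storage unit 117."""

    def __init__(self):
        self._latest = {}  # session ID -> (log, candidate list)

    def previous(self, session_id):
        return self._latest.get(session_id)

    def save(self, log, candidates):
        self._latest[log["session_id"]] = (log, candidates)


def check_order(log, store):
    """Return True when an invalid execution order is detected."""
    key = (log["job_id"], log["service_id"])
    prev = store.previous(log["session_id"])  # S105, S106
    # S107 to S112: the candidate list was attached when prev was stored,
    # so it does not have to be rebuilt here.
    violation = prev is not None and key not in prev[1]
    if not violation:
        # S113, S114: store the log together with its own candidate list.
        store.save(log, ORDER_DEFINITION.get(key, []))
    return violation
```

Comparing the business ID and service ID as a pair corresponds to the variant mentioned for S109 and S110, where a mismatch in either ID counts as no match.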

≪サービス障害影響範囲特定部１１２のサービス障害影響範囲特定方法の動作≫ << Operation of the Service Failure Influence Range Specifying Method of the Service Failure Influence Range Specifying Unit 112 >>
サービス障害影響範囲特定部１１２は、サービス障害影響範囲を特定するものである。以下の説明では、サービス障害影響範囲特定部１１２が、サービス障害影響範囲を、サービス障害情報をサービス障害検出部１１５より受け取るたびに計算する場合を説明する。 The service failure influence range specifying unit 112 specifies the service failure influence range. The following description covers the case in which the service failure influence range specifying unit 112 calculates the service failure influence range every time it receives service failure information from the service failure detection unit 115.

図６に、サービス障害影響範囲特定部１１２の処理の動作フローチャートを示す。 FIG. 6 shows an operation flowchart of the processing of the service failure influence range specifying unit 112.
サービス障害影響範囲特定部１１２は、まず、サービス障害情報をサービス障害検出部１１５より受け取る（Ｓ２０１）。このサービス障害情報は、障害を起こしたサービスの業務ＩＤ、サービスＩＤ、障害原因ＩＤなどを含む。 The service failure influence range specifying unit 112 first receives service failure information from the service failure detection unit 115 (S201). This service failure information includes the business ID, service ID, failure cause ID, and the like of the service that caused the failure.

次に、サービス障害影響範囲特定部１１２は、障害を起こしたサービスのそれぞれについて業務ＩＤ、サービスＩＤを取り出し（Ｓ２０２）、サービス構成情報管理部１１１より、当該サービスの親サービス・親業務情報を取り出す（Ｓ２０３）。この親サービス情報は、親サービスのサービスＩＤを含み、親業務情報は親業務の業務ＩＤを含む。 Next, the service failure influence range specifying unit 112 extracts a business ID and a service ID for each of the failed services (S202), and extracts the parent service / parent business information of the service from the service configuration information management unit 111. (S203). The parent service information includes the service ID of the parent service, and the parent business information includes the business ID of the parent business.

サービス障害影響範囲特定部１１２は、Ｓ２０３で取り出した親業務のそれぞれに対して、図４に示すような、業務ごとの影響範囲情報を組み立て、業務ごとの影響範囲情報を当該サービス情報の一部として、障害を起こしたサービスに対応させてサービス構成情報管理部１１１に追加記憶する（Ｓ２０４）。 The service fault influence range identification unit 112 assembles the impact range information for each business as shown in FIG. 4 for each parent business extracted in S203, and sets the impact range information for each business as part of the service information. Are additionally stored in the service configuration information management unit 111 in correspondence with the failed service (S204).

図６において、サービス障害影響範囲特定部１１２は、図６のＳ２０４の処理を、Ｓ２０３で取り出した全ての親業務について繰り返し（Ｓ２０５）、更に、Ｓ２０３で取り出した全ての親サービスについて、サービス間の階層情報を計算し、中間情報として記録する（Ｓ２０６）。サービス間の階層情報とは、あるサービス（親サービス）が他のサービス（子サービス）を呼び出す関係、および同じ親サービスに呼び出されたサービス群（兄弟サービス）の関係の情報、および、障害を起こしたサービスから各サービスまでの階層の深さを含む情報である。 In FIG. 6, the service failure influence range specifying unit 112 repeats S204 for all parent businesses retrieved in S203 (S205). Then, for all parent services retrieved in S203, it calculates the hierarchy information between services and records it as intermediate information (S206). The hierarchy information between services covers the relationship in which one service (the parent service) calls another (a child service), the relationship among services called by the same parent service (sibling services), and the depth of the hierarchy from the failed service to each service.

サービス障害影響範囲特定部１１２は、Ｓ２０３〜Ｓ２０７までの処理を、Ｓ２０３で取り出した全ての親サービスおよびそれらから親子関係を辿れる全ての親サービス・親業務に対して再帰的に実行する（Ｓ２０７）。 The service failure influence range identification unit 112 recursively executes the processing from S203 to S207 for all parent services extracted in S203 and all parent services / businesses that can follow the parent-child relationship from them (S207). .

また、サービス障害影響範囲特定部１１２は、Ｓ２０１で入力された全ての障害を起こしたサービスについて、当該サービスが呼び出す子サービスの情報をサービス構成情報管理部１１１より取り出し（Ｓ２０８）、それら全ての子サービスについて、サービス間の階層情報を計算し、中間情報に記録する（Ｓ２０９、Ｓ２１０）。 In addition, the service failure influence range specifying unit 112 extracts, from the service configuration information management unit 111, information on child services called by the service for all of the failed services input in S201 (S208), For services, hierarchical information between services is calculated and recorded in intermediate information (S209, S210).

サービス障害影響範囲特定部１１２は、Ｓ２０８〜Ｓ２１０の処理を、Ｓ２０８で取り出した各サービスの全ての子サービスおよびそれらから親子関係を辿れる全ての子サービスに対し再帰的に実行する（Ｓ２１１）。 The service failure influence range specifying unit 112 recursively executes the processing of S208 to S210 for all the child services of each service extracted in S208 and all the child services that can follow the parent-child relationship from them (S211).

以上の処理を完了すると、サービス障害影響範囲特定部１１２は、Ｓ２０６およびＳ２０９で組み立てた中間情報をもとに、業務ごとに、また、更にサービス毎に、影響度を算出し（Ｓ２１２、Ｓ２１３）処理を完了する。 When the above processing is completed, the service failure influence range specifying unit 112 calculates the degree of influence for each business and for each service based on the intermediate information assembled in S206 and S209 (S212, S213). Complete the process.
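The traversal in S203 to S211 can be sketched as follows. The text describes a recursive walk; the sketch below uses a breadth-first loop instead, which visits the same services and yields the depth from the failed service directly. The function names and the dictionary-based parent and child tables are assumptions, and because this section does not give the formula for the influence degree of S212 and S213, the sketch stops at the hierarchy depths.

```python
from collections import deque


def walk(start, edges):
    """Return {service: depth from start} for everything reachable via edges."""
    depth = {}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in edges.get(node, []):
            # Skip services already visited so that cycles terminate.
            if nxt not in depth and nxt != start:
                depth[nxt] = d + 1
                queue.append((nxt, d + 1))
    return depth


def hierarchy_info(failed, parents, children):
    """Intermediate information for one failed service (S206, S209)."""
    return {
        "ancestors": walk(failed, parents),      # S203 to S207
        "descendants": walk(failed, children),   # S208 to S211
    }
```

For example, with `parents = {"pay": ["checkout"]}` and `children = {"pay": ["card"]}`, `hierarchy_info("pay", parents, children)` reports `checkout` as an ancestor and `card` as a descendant, each at depth 1.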

この影響範囲情報の構造体の詳細例は、図４に示したとおりである。図４の例では、更に業務ごと、且つ、業務を構成するサービスごとの詳細影響度を算出し、影響範囲情報構造体に加えている。サービス障害影響範囲は、障害を検出するたびに計算され、障害が発生したサービスに対してのみサービス障害影響範囲が計算されて記憶されることになる。 A detailed example of the structure of the influence range information is as shown in FIG. In the example of FIG. 4, the detailed influence degree is further calculated for each service and for each service constituting the business, and added to the influence range information structure. The service failure influence range is calculated every time a failure is detected, and the service failure influence range is calculated and stored only for the service in which the failure has occurred.

障害が発生したサービスに対してのみサービス障害影響範囲が計算されて記憶されている場合には、図５のＳ１９９において、すべての入力ログについてはサービス障害影響範囲を検索することができない。 If the service failure influence range is calculated and stored only for services in which a failure has occurred, then in S199 of FIG. 5 the influence range cannot be retrieved for every input log; it is available only for logs of services that have already failed.

図６においては、サービス障害検出部１１５が障害を検出するたびにサービス障害影響範囲特定部１１２がサービス障害影響範囲を計算する場合を示したが、サービス障害影響範囲特定部１１２は、サービス障害影響範囲をログ収集・蓄積のたびに計算してもよい。 FIG. 6 shows the case where the service failure influence range specifying unit 112 calculates the service failure influence range every time the service failure detection unit 115 detects a failure. The range may be calculated for each log collection / accumulation.

また、例えば、サービス障害影響範囲特定部１１２は、サービス構成情報管理部１１１のサービス構成情報が更新されるたびにすべてのサービスについてサービス障害影響範囲を算出し、サービス障害影響範囲を業務・サービスと関連付けてサービス構成情報管理部１１１にあらかじめ記憶させておいてもよい。またはサービス障害影響範囲特定部１１２が、サービス障害影響範囲を業務・サービスと関連付けてメモリ装置に一時保管（キャッシュ）しておくように構成してもよい。すべてのサービスについてサービス障害影響範囲をあらかじめ算出しサービス構成情報管理部１１１にあらかじめ記憶させておく場合には、サービス障害検出部１１５は、図５のＳ１９９において、すべての入力ログについてサービス障害影響範囲を検索することができ、優先的なサービス実行順序不正の検出をするか否かをすべてのサービスについて判断することができる。 Alternatively, for example, the service failure influence range specifying unit 112 may calculate the service failure influence range for all services every time the service configuration information in the service configuration information management unit 111 is updated, and store the result in advance in the service configuration information management unit 111 in association with the business and service. The service failure influence range specifying unit 112 may also be configured to temporarily store (cache) the service failure influence range in a memory device in association with the business and service. When the influence range is calculated in advance for all services and stored in the service configuration information management unit 111, the service failure detection unit 115 can retrieve the influence range for every input log in S199 of FIG. 5, and can therefore decide for every service whether to detect an invalid service execution order with priority.

以上の構成および動作により、この実施の形態では以下のような効果を奏する。 With the above configuration and operation, this embodiment has the following effects.
サービス障害検出部１１５、サービス構成情報管理部１１１、サービス障害影響範囲特定部１１２の動作により、サービス障害影響範囲が広いものについて優先的にサービス実行順序不正を検出することを可能にする。 Through the operations of the service failure detection unit 115, the service configuration information management unit 111, and the service failure influence range specifying unit 112, an invalid service execution order can be detected preferentially for services with a wide service failure influence range.

実施の形態２． Embodiment 2.
実施の形態１の説明ではサービス実行順序不正の検出について述べてきたが、サービス障害影響範囲による優先度付けは、サービス実行順序不正以外の障害検出についても適用可能である。 Embodiment 1 described the detection of an invalid service execution order, but prioritization based on the service failure influence range can also be applied to the detection of failures other than an invalid service execution order.

以下に、サービス運用管理装置１０３が、メッセージログ１０７およびトランザクションログ１０８から検出可能なサービス障害を示す。 The service failures that can be detected from the message log 107 and the transaction log 108 by the service operation management apparatus 103 are shown below.

１．実行順序不正（実施の形態１の場合） 1. Invalid execution order (the case of Embodiment 1)
実行順序不正とは、サービスの実行順序定義から逸脱したサービスの呼び出しが行われた場合をいう。 An invalid execution order is a case where a service is called in a way that deviates from the service execution order definition.

２．タイミングの不正 2. Invalid timing
タイミングの不正とは、指定された時間間隔で応答されていない（早すぎ、遅すぎ）場合、あるいは、応答が欠落した（通信エラーを含む）場合をいう。 Invalid timing is a case where a response is not returned within the specified time interval (too early or too late), or a response is missing (including communication errors).

３．サービスエラー 3. Service error
サービスエラーとは、サービス自身にて検出済みのエラーをいう。ログに特定のエラーコードおよびメッセージが含まれている。 A service error is an error already detected by the service itself. The log contains a specific error code and message.

４．値の不正 4. Invalid value
値の不正とは、応答された値が正しい範囲内に収まっていない場合、又は、フォーマット不正の場合をいう。 An invalid value is a case where the returned value is not within the correct range, or its format is invalid.

これらの障害を検出するために、この実施の形態のサービス運用管理システムを構成するサービス運用管理装置１０３は、サービス障害定義管理部１１３を備えている。 In order to detect these failures, the service operation management apparatus 103 constituting the service operation management system of this embodiment includes a service failure definition management unit 113.
サービス障害定義管理部１１３は、サービスの実行タイミング、パラメータの値のとりうる範囲の制限などのサービス障害定義（障害基準値）をあらかじめ保管する記憶部である。 The service failure definition management unit 113 is a storage unit that stores in advance service failure definitions (failure reference values) such as the service execution timing and limits on the range of values that parameters can take.

これらの障害を検出するために、サービス部１０１は、障害を検出するのに必要な属性情報１０５の属性値を、メッセージ１０４に付加する。この属性情報１０５としては、一連の業務かどうかを示すセッションＩＤや業務ＩＤ、サービスの実行時刻などを付加することができる。例えば、サービス部１０１は、メッセージ１０４がＸＭＬである場合、後述する図８において、＜ｑｕｅｒｙ＞〜＜／ｑｕｅｒｙ＞で囲まれた部分（点線枠内）で示したＸＰａｔｈ記法の条件式として、サービスの実行タイミング、パラメータの値、サービスの戻り値などを属性として定義することができる。 To detect these failures, the service unit 101 adds to the message 104 the attribute values of the attribute information 105 needed to detect a failure. The attribute information 105 can include, for example, a session ID and a business ID that indicate whether messages belong to the same series of business operations, and the execution time of the service. For example, when the message 104 is XML, the service unit 101 can define the service execution timing, parameter values, service return value, and so on as attributes, using conditional expressions in XPath notation such as those shown in the part enclosed by <query> and </query> (inside the dotted frame) in FIG. 8, described later.

サービス運用管理装置１０３のサービス障害検出部１１５は、入力ログに記録されたこれら属性情報１０５の属性値が、サービス障害定義管理部１１３が記憶したサービス障害定義情報の異常範囲に違反しているか否かにより、障害を検出することができる。 The service failure detection unit 115 of the service operation management apparatus 103 determines whether or not the attribute value of the attribute information 105 recorded in the input log violates the abnormal range of the service failure definition information stored by the service failure definition management unit 113. Therefore, a failure can be detected.

サービス障害検出部１１５は、入力ログに属性情報１０５として記録されたサービスの実行タイミングが、サービスの実行タイミングと一致しない場合は、そのサービスに障害が発生したものと判定する。 If the service execution timing recorded as attribute information 105 in the input log does not match the service execution timing given in the service failure definition, the service failure detection unit 115 determines that a failure has occurred in that service.

サービス障害検出部１１５は、入力ログに属性情報１０５として記録されたパラメータの値が、パラメータの値のとりうる範囲と一致しない場合は、そのサービスに障害が発生したものと判定する。 If a parameter value recorded as attribute information 105 in the input log falls outside the range of values that the parameter can take, the service failure detection unit 115 determines that a failure has occurred in that service.

サービス障害検出部１１５は、入力ログに属性情報１０５として記録されたサービスの戻り値が、パラメータの値のとりうる範囲と一致しない場合は、そのサービスに障害が発生したものと判定する。 If the service return value recorded as attribute information 105 in the input log falls outside the range of values that parameters can take, the service failure detection unit 115 determines that a failure has occurred in that service.
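The three checks above amount to comparing extracted attribute values against stored reference ranges. A minimal sketch follows, assuming a hypothetical `FailureDefinition` record with numeric limits and an attribute dictionary with the invented keys `interval`, `param`, and `return_value`; the actual structure of the service failure definition is not given in this section.

```python
from dataclasses import dataclass


@dataclass
class FailureDefinition:
    """Hypothetical failure reference values for one service."""
    min_interval: float   # earliest acceptable response time, in seconds
    max_interval: float   # latest acceptable response time, in seconds
    param_range: tuple    # (low, high) for the parameter value
    return_range: tuple   # (low, high) for the return value


def violations(attrs, definition):
    """List which checks the attributes of one input log fail."""
    found = []
    if not definition.min_interval <= attrs["interval"] <= definition.max_interval:
        found.append("timing")          # too early or too late
    low, high = definition.param_range
    if not low <= attrs["param"] <= high:
        found.append("parameter")
    low, high = definition.return_range
    if not low <= attrs["return_value"] <= high:
        found.append("return_value")
    return found
```

An empty result means that no failure is detected for that log; any entry in the list would be reported as a service failure.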

サービス障害検出部１１５は、サービス障害影響範囲特定部１１２の算出したサービス障害の影響範囲の広いサービスであって、且つ、サービス障害定義管理部１１３が記憶したサービス障害定義情報に違反しているサービスの障害を優先的に検出する。サービス障害検出部１１５は、サービス障害影響範囲特定部１１２の算出したサービス障害の影響範囲にかかわりなく、サービス障害定義管理部１１３が記憶したサービス障害定義情報に違反しているサービスの障害を優先的に検出するようにしてもよい。 The service failure detection unit 115 preferentially detects failures of services that both have a wide service failure influence range, as calculated by the service failure influence range specifying unit 112, and violate the service failure definition information stored in the service failure definition management unit 113. Alternatively, the service failure detection unit 115 may preferentially detect failures of services that violate the stored service failure definition information regardless of the calculated influence range.

以上の構成および動作により、この実施の形態のサービス運用管理システム１００は以下のような効果を奏する。 With the above configuration and operation, the service operation management system 100 of this embodiment has the following effects.

サービス障害検出部１１５が、メッセージログ１０７から属性情報１０５を抽出し、当該属性情報１０５の値を、サービス障害定義管理部１１３の保管するサービス障害定義情報と照合するので、サービス障害定義管理部１１３が記憶したサービス障害定義に違反しているサービスの障害を優先的に検出することができる。また、サービス障害影響範囲特定部１１２の算出するサービス障害の影響範囲をあわせて用いれば、さらに、重要なサービスの障害を優先的に検出することを可能にする。 Because the service failure detection unit 115 extracts the attribute information 105 from the message log 107 and checks its values against the service failure definition information stored in the service failure definition management unit 113, failures of services that violate the stored service failure definition can be detected preferentially. If the service failure influence range calculated by the service failure influence range specifying unit 112 is used as well, failures of important services can also be detected preferentially.

実施の形態３． Embodiment 3.
実施の形態２において、メッセージからの属性抽出はＥＳＢ１０２のログの形式に依存する形で実現する必要がある。このため、企業合併や提携などで、図１のように、サービス運用管理システム１００が複数のＥＳＢ１０２のトランザクションログ１０８に対応しようとする場合、ＥＳＢ１０２を監視対象として追加したり、ＥＳＢ１０２をリプレースしたりするたびにサービス運用管理システム１００の修正が必要となり、ＳＯＡの目的である柔軟かつ迅速なサービス構成変更に支障をきたすという課題がある。 In Embodiment 2, attribute extraction from messages has to be implemented in a way that depends on the log format of the ESB 102. Consequently, when the service operation management system 100 must handle the transaction logs 108 of multiple ESBs 102 as in FIG. 1, for example after a corporate merger or alliance, the system has to be modified every time an ESB 102 is added as a monitoring target or replaced. This hinders the flexible and rapid service configuration changes that SOA aims for.

このＥＳＢ１０２の処理の記録は、ＥＳＢ１０２の種類により様々な形式がある。これらの例を図７に示す。 The ESB 102 processing record has various formats depending on the type of the ESB 102. Examples of these are shown in FIG.

１．特殊な形式のＥＳＢログ（図７右側） 1. ESB log in a special format (right side of FIG. 7)
ＥＳＢ１０２の処理をテキスト形式のトランザクションログ１０８に記録し、これとは別にサービス間のメッセージをファイルとして保管し、トランザクションログ１０８中にメッセージのファイル名とメッセージセッションＩＤなどを記録するもの。 The processing of the ESB 102 is recorded in a text-format transaction log 108. Separately, the messages between services are stored as files, and the message file names, message session IDs, and so on are recorded in the transaction log 108.

２．一般的なＥＳＢログ（図７左側） 2. General ESB log (left side of FIG. 7)
ＥＳＢ１０２のトランザクションログ１０８がサービス間のメッセージ全文を包含するもの。 The transaction log 108 of the ESB 102 contains the full text of the messages between services.

従来の技術によるサービス運用管理装置では、このようなＥＳＢ１０２のログの形式に応じて、メッセージのログまたはトランザクションログ１０８から属性を抽出するための仕組み（ロジック）を用意する必要があった。 In the service operation management apparatus according to the prior art, it is necessary to prepare a mechanism (logic) for extracting attributes from the message log or the transaction log 108 according to the ESB 102 log format.

図１において、この実施の形態のサービス運用管理システム１００ではサービス部１０１がメッセージ１０４に属性情報１０５を付加するほか、メッセージ中の属性抽出のための属性クエリ１０６を添付することにより対処する。また、サービス運用管理装置１０３には、ログ内のメッセージ中の属性クエリ１０６を解釈してメッセージ中の属性を抽出する属性クエリ解釈部１０９と、抽出した属性を保管する属性管理部１１０とを備えるようにする。 In FIG. 1, in the service operation management system 100 according to this embodiment, the service unit 101 adds the attribute information 105 to the message 104 and attaches an attribute query 106 for extracting the attribute in the message. In addition, the service operation management apparatus 103 includes an attribute query interpretation unit 109 that interprets the attribute query 106 in the message in the log and extracts the attribute in the message, and an attribute management unit 110 that stores the extracted attribute. Like that.

属性クエリ１０６の例を図８に示すメッセージ１０４に示す。図８において、＜ｑｕｅｒｙ＞〜＜／ｑｕｅｒｙ＞で囲まれた部分（点線枠内）が、ｓｅｓｓｉｏｎ（セッションＩＤ）、ｊｏｂ（業務ＩＤ）、ｓｅｒｖｉｃｅ（サービスＩＤ）、ｔｉｍｅ（実行時刻）の属性を抽出するための属性クエリである。この例では、メッセージ１０４がＸＭＬ形式であるため、ＸＭＬ文書内における情報の位置を記述する標準的なＸＰａｔｈ記法で記述したクエリを示す。属性には、当該メッセージが高重要度であることを示す値を含ませてもよい。その場合には、サービス部１０１は、重要度の属性を抽出するための属性クエリを付加する。 An example of the attribute query 106 is shown in the message 104 shown in FIG. In FIG. 8, the portion surrounded by <query> to </ query> (within the dotted frame) indicates the attributes of session (session ID), job (business ID), service (service ID), and time (execution time). This is an attribute query for extraction. In this example, since the message 104 is in the XML format, a query written in a standard XPath notation describing the position of information in the XML document is shown. The attribute may include a value indicating that the message is of high importance. In that case, the service unit 101 adds an attribute query for extracting the attribute of importance.
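To illustrate how an attribute query embedded in a message can drive the extraction, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports a subset of XPath. FIG. 8 is not reproduced in this text, so the message layout below (the `attr` elements, the `header` block, and all values) is invented for illustration and is not the format shown in the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: the <query> block carries one path per attribute.
MESSAGE = """<?xml version="1.0"?>
<message>
  <query>
    <attr name="session">./header/session</attr>
    <attr name="job">./header/job</attr>
    <attr name="service">./header/service</attr>
    <attr name="time">./header/time</attr>
  </query>
  <header>
    <session>s-001</session>
    <job>order</job>
    <service>payment</service>
    <time>2011-06-13T10:00:00</time>
  </header>
  <body>ok</body>
</message>"""


def extract_attributes(xml_text):
    """Evaluate each embedded query against the message that carries it."""
    root = ET.fromstring(xml_text)
    return {
        q.get("name"): root.findtext(q.text.strip())
        for q in root.findall("./query/attr")
    }
```

Because the paths travel with the message, the extractor needs no knowledge of where each service places its attributes, which is the point of attaching the attribute query 106.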

≪サービス部１０１の動作≫ << Operation of Service Unit 101 >>
図９に、サービス部１０１の処理の動作フローチャートを示す。まず、サービス部１０１は、ユーザに提供する機能（企業の業務サービスなど）を実行し（Ｓ３０１）、この結果をもとにメッセージ１０４を生成する（Ｓ３０２）。このメッセージ１０４の生成に当たり、サービス部１０１は、メッセージ１０４に属性情報１０５を添付する（Ｓ３０３、Ｓ３０４）。更に、サービス部１０１は、属性情報１０５をメッセージ１０４から取り出すための属性クエリ１０６を生成し、属性クエリ１０６をメッセージ１０４に添付する（Ｓ３０５、Ｓ３０６）。 FIG. 9 shows an operation flowchart of the processing of the service unit 101. First, the service unit 101 executes a function provided to a user (such as a business service of a company) (S301), and generates a message 104 based on the result (S302). In generating the message 104, the service unit 101 attaches the attribute information 105 to the message 104 (S303, S304). Further, the service unit 101 generates an attribute query 106 for extracting the attribute information 105 from the message 104, and attaches the attribute query 106 to the message 104 (S305, S306).

なお、サービス運用管理システム実現上の都合によっては、属性クエリ１０６は属性情報１０５の全てに対して添付してもよく、一部に対して添付してもよい。 Note that the attribute query 106 may be attached to all of the attribute information 105 or may be attached to a part depending on the convenience in realizing the service operation management system.

これらの処理を完了すると、サービス部１０１は、ＥＳＢ１０２または他のサービスに向けて当該メッセージ１０４を送付し（Ｓ３０７）、処理を終了する。 When these processes are completed, the service unit 101 sends the message 104 to the ESB 102 or another service (S307), and ends the process.

≪属性クエリ解釈部１０９の動作≫ << Operation of Attribute Query Interpretation Unit 109 >>
図１０と図１１に、サービス運用管理装置１０３の属性クエリ解釈部１０９の処理の動作フローチャートを示す。 FIGS. 10 and 11 show operation flowcharts of the processing of the attribute query interpretation unit 109 of the service operation management apparatus 103.

サービス運用管理装置１０３は、ＥＳＢ１０２のメッセージログ１０７またはトランザクションログ１０８を入力ログの入力として受け取り、入力した入力ログが、メッセージログ１０７であるか、トランザクションログ１０８であるかを識別する。 The service operation management apparatus 103 receives the message log 107 or transaction log 108 of the ESB 102 as input log input, and identifies whether the input log input is the message log 107 or the transaction log 108.

入力ログがメッセージログ１０７である場合、Ｓ５０１〜Ｓ５０８の処理を行う。 When the input log is the message log 107, the processing of S501 to S508 is performed.

入力ログが一般的な形式のトランザクションログ１０８（図７左側）である場合、Ｓ４１０〜Ｓ４１７の処理を行う。 When the input log is a transaction log 108 in a general format (left side in FIG. 7), the processing of S410 to S417 is performed.

入力ログが特殊な形式のトランザクションログ１０８（図７右側）である場合、Ｓ４０４〜Ｓ４０７の処理を行う。 When the input log is a special type transaction log 108 (right side in FIG. 7), the processing of S404 to S407 is performed.

入力ログがトランザクションログ１０８である識別は、例えば、メッセージログ１０７（及びメッセージ１０４）がＸＭＬであり、トランザクションログ１０８が構造を持たないテキスト情報である場合、図８のメッセージの例において１行目が「＜？ＸＭＬｖｅｒｓｉｏｎ …？＞」であるかどうか判定することにより行うことができる。１行目が「＜？ＸＭＬｖｅｒｓｉｏｎ …？＞」であれば、メッセージログ１０７（及びメッセージ１０４）であると判断する。１行目が「＜？ＸＭＬｖｅｒｓｉｏｎ …？＞」でなければ、トランザクションログ１０８であると判断する。 Whether the input log is a transaction log 108 can be identified as follows. For example, when the message log 107 (and the message 104) is XML and the transaction log 108 is unstructured text, it suffices to check whether the first line is "<?xml version ...?>", as in the message example of FIG. 8. If the first line is "<?xml version ...?>", the input is judged to be a message log 107 (and message 104); otherwise, it is judged to be a transaction log 108.
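The first-line test described above is simple enough to sketch directly. The function name and return labels are hypothetical, and two details are assumptions: the comparison ignores case, and empty input is treated as a transaction log.

```python
def classify_log(text):
    """Classify an input log by its first non-empty line.

    Returns "message_log" when the input starts with an XML declaration,
    and "transaction_log" otherwise.
    """
    stripped = text.lstrip()
    first_line = stripped.splitlines()[0] if stripped else ""
    if first_line.lower().startswith("<?xml version"):
        return "message_log"
    return "transaction_log"
```

A transaction log that embeds XML messages (the general format on the left of FIG. 7) still starts with plain text, so it is classified as a transaction log here and separated further in S403.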

サービス運用管理装置１０３は、ＥＳＢ１０２のメッセージログ１０７またはトランザクションログ１０８を入力ログの入力として受け取り（Ｓ４０１）、入力した入力ログがメッセージログ１０７であるか否かを識別する。 The service operation management apparatus 103 receives the message log 107 or transaction log 108 of the ESB 102 as input log input (S401), and identifies whether the input log input is the message log 107 or not.

次に、入力ログが、メッセージログ１０７であるか、トランザクションログ１０８であるかを識別する（Ｓ４０２）。 Next, it is identified whether the input log is the message log 107 or the transaction log 108 (S402).

Ｓ４０２において入力ログがメッセージログ１０７でない場合、属性クエリ解釈部１０９は、図７の一般的なＥＳＢ１０２のログのようにトランザクションログ１０８がメッセージログ１０７（ＸＭＬ）を包含するかどうかを識別する（Ｓ４０３）。メッセージログ１０７（ＸＭＬ）を包含しない場合（特殊な形式のトランザクションログ１０８の場合）、属性クエリのうち、トランザクションログ１０８を対象としたものと解釈し、メッセージに含まれない属性情報を切り出し（Ｓ４０４）、入力ログに記録されているサービスを取り出し（Ｓ４０５）、サービス障害検出（優先実行されていないサービス障害の検出を含む）をサービス障害検出部１１５に実行させる（Ｓ４０６）。あるいは、属性クエリ解釈部１０９は、自らサービス障害検出部１１５と同様の処理を実行してサービス障害を検出する。 If the input log is not the message log 107 in S402, the attribute query interpretation unit 109 identifies whether the transaction log 108 includes the message log 107 (XML) as in the general ESB 102 log of FIG. 7 (S403). ). If the message log 107 (XML) is not included (in the case of the transaction log 108 in a special format), the attribute query is interpreted as targeting the transaction log 108, and attribute information not included in the message is extracted (S404). ), The service recorded in the input log is extracted (S405), and service failure detection (including detection of a service failure that is not preferentially executed) is executed by the service failure detection unit 115 (S406). Alternatively, the attribute query interpretation unit 109 performs the same process as the service failure detection unit 115 and detects a service failure.

属性クエリ解釈部１０９は、Ｓ４０５で取り出したサービスが、入力した入力ログに記録されている最後のサービスかどうか判断し（Ｓ４０７）、最後のサービスでない場合、入力した入力ログに記録されている次のサービスについてＳ４０６を実行する。Ｓ４０７で最後のサービスである場合、属性クエリ解釈部１０９は、当該入力をログ管理部１１６のログ保管機能により統合ログ記憶部１１７に格納し（Ｓ４０８）、処理を終了する。 The attribute query interpretation unit 109 determines whether the service extracted in S405 is the last service recorded in the input log (S407). If the service is not the last service, the attribute query interpreter 109 determines the next service recorded in the input log. Step S406 is executed for these services. If it is the last service in S407, the attribute query interpretation unit 109 stores the input in the integrated log storage unit 117 by the log storage function of the log management unit 116 (S408), and ends the process.

In S403, when the transaction log 108 contains the message log 107 (XML), which is the case for a transaction log 108 in the general format, the attribute query interpretation unit 109 cuts out the message logs (XML) from the input log (S410) and cuts out the attribute queries 106 from each message log (S411). This can be realized, for example, by having the service unit 101 insert reserved words into the message that mark the start and end points of the attribute query 106. The attribute query interpretation unit 109 determines whether the attribute query 106 cut out in S411 is the last attribute query 106 in the input log (S412). If it is not, the unit repeats S411, extracting the next attribute query 106 from the message log cut out in S410. If it is the last attribute query in the input, the attribute query interpretation unit 109 executes attribute query interpretation (S413). This can be realized, for example, by starting an XPath processor and feeding it the attribute query and message log shown in FIG. 8. Based on the result, the attribute query interpretation unit 109 checks whether the attributes include a value indicating that the message is of high importance (S414). If they do, the service operation management apparatus 103 preferentially executes service failure detection (detection of an invalid service execution order) by the service failure detection unit 115 (S416). Alternatively, the attribute query interpretation unit 109 may itself perform the same processing as the service failure detection unit 115 to detect the service failure.
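The attribute query interpretation of S413 and the importance check of S414 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the message layout, the element names, and the function names are all hypothetical (FIG. 8 is not reproduced here), and Python's standard `xml.etree.ElementTree` module stands in for the XPath processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical message log. The element names are illustrative only and are
# not taken from FIG. 8.
MESSAGE_LOG = """<message>
  <header>
    <serviceId>S2</serviceId>
    <importance>high</importance>
  </header>
  <body>order accepted</body>
</message>"""

# Hypothetical attribute queries: attribute name -> XPath expression.
ATTRIBUTE_QUERIES = {
    "service_id": "./header/serviceId",
    "importance": "./header/importance",
}


def interpret_attribute_queries(message_xml, queries):
    """S413: evaluate every attribute query against one message log."""
    root = ET.fromstring(message_xml)
    # findtext() returns None when the path matches no element.
    return {name: root.findtext(path) for name, path in queries.items()}


def is_high_importance(attributes):
    """S414: does the message carry a value marking it as high importance?"""
    return attributes.get("importance") == "high"


attributes = interpret_attribute_queries(MESSAGE_LOG, ATTRIBUTE_QUERIES)
print(attributes, is_high_importance(attributes))
```

Because the queries travel with the message, the same function works for any message layout as long as the queries match it.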

If, in S414, the attributes do not include a value indicating that the message is of high importance, the attribute query interpretation unit 109 has the service failure influence range specifying unit 112 calculate the service failure influence range of the service and determines whether the resulting value exceeds a threshold (S415). If the threshold is exceeded, the service operation management apparatus 103 executes S416; that is, it preferentially executes service failure detection by the service failure detection unit 115. Alternatively, the attribute query interpretation unit 109 may itself perform the same processing as the service failure detection unit 115 to detect the service failure.

If the threshold is not exceeded in S415, or when the processing of S416 has finished, the attribute query interpretation unit 109 checks whether the message extracted in S410 is the last message body in the input (S417). If it is not the last message, the processing of S410 to S417 is repeated. If it is the last message, the attribute query interpretation unit 109 stores the input in the integrated log storage unit 117 using the log storage function of the log management unit 116 (S408) and ends the process.
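The per-message decision of S414 to S417 can be sketched as a simple loop. This is an illustrative sketch under assumptions: the dictionary layout of a message, the `impact_range_of` callback (standing in for the service failure influence range specifying unit 112), and the threshold value are all hypothetical.

```python
def needs_priority_detection(message, impact_range_of, threshold):
    """Decide whether failure detection should run first for this message."""
    # S414: a high-importance attribute is sufficient on its own.
    if message["attributes"].get("importance") == "high":
        return True
    # S415: otherwise compare the failure influence range with the threshold.
    return impact_range_of(message["service_id"]) > threshold


def process_transaction_log(messages, impact_range_of, threshold):
    """Loop of S410 to S417: return the services checked with priority (S416)."""
    prioritized = []
    for message in messages:
        if needs_priority_detection(message, impact_range_of, threshold):
            prioritized.append(message["service_id"])
    # Storing the whole input (S408) would follow here; it is omitted.
    return prioritized


messages = [
    {"service_id": "S1", "attributes": {"importance": "high"}},
    {"service_id": "S2", "attributes": {}},
    {"service_id": "S3", "attributes": {"importance": "low"}},
]
impact = {"S1": 0, "S2": 5, "S3": 1}
result = process_transaction_log(messages, impact.get, threshold=3)
print(result)
```

The influence range is computed only when the importance check fails, which mirrors the order of S414 and S415.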

In S402, when the input log is a message log 107, the attribute query interpretation unit 109 performs the processing of FIG. 11. In FIG. 11, the attribute query interpretation unit 109 first cuts out an attribute query from the input log (S501). This can be realized as the same processing as S411. This step is repeated until the last attribute query in the input log has been extracted (S502). Next, the attribute query interpretation unit 109 executes attribute query interpretation on each extracted attribute query (S503). This can be realized as processing similar to S413, and it is repeated until the last attribute query has been handled (S504). Based on the result, the attribute query interpretation unit 109 determines whether the attributes contained in the message log 107 include a value indicating high importance (S505). If they do, the service operation management apparatus 103 preferentially executes service failure detection (S507). If the attributes do not include a value indicating high importance in S505, the attribute query interpretation unit 109 calculates the influence range of the service and determines whether the value exceeds the threshold (S506). If the threshold is exceeded, the service operation management apparatus 103 preferentially executes service failure detection (S507). If the threshold is not exceeded in S506, the message log 107 is held in the collection/accumulation unit 114 or the log management unit 116, and the attribute values are held in the attribute management unit 110, until a transaction log 108 containing information about this message's log is input (S508). After S507 or S508, the attribute query interpretation unit 109 performs the processing from S409 onward in FIG. 10.
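The branch of FIG. 11 that either runs detection at once (S507) or holds the message log until its transaction log arrives (S508) might look like the sketch below. The class, the `message_id` key used to match a message log with its later transaction log, and the callback are assumptions made for illustration; the text does not say how the two logs are matched.

```python
class PendingMessageLogs:
    """Holds message logs and attribute values until the matching
    transaction log is input (S508)."""

    def __init__(self):
        self._pending = {}

    def hold(self, message_id, message_log, attributes):
        self._pending[message_id] = (message_log, attributes)

    def release(self, message_id):
        """Return and forget the held entry, or None if nothing was held."""
        return self._pending.pop(message_id, None)


def handle_message_log(message_id, message_log, attributes, impact_range,
                       threshold, pending, detect_failure):
    high = attributes.get("importance") == "high"       # S505
    if high or impact_range > threshold:                # S506
        detect_failure(message_log)                     # S507
        return "detected"
    pending.hold(message_id, message_log, attributes)   # S508
    return "held"


pending = PendingMessageLogs()
detected = []
first = handle_message_log("m1", "<message/>", {"importance": "high"},
                           0, 3, pending, detected.append)
second = handle_message_log("m2", "<message/>", {}, 1, 3,
                            pending, detected.append)
print(first, second)
```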

Note that FIG. 10 is an example configured so that failure detection for a service of high importance is executed with even higher priority than failure detection for a service with a wide service failure influence range. Depending on the implementation, the configuration may instead give failure detection for services with a wide service failure influence range priority over failure detection for services of high importance.
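One way to make the precedence between the two criteria configurable is to sort pending logs with a composite key. This is only a sketch of the idea: the flowchart of FIG. 10 checks the criteria in sequence rather than sorting, and the field names and the `importance_first` flag are hypothetical.

```python
def detection_order(logs, importance_first=True):
    """Order logs for failure detection, highest priority first."""
    def key(log):
        important = 1 if log["importance"] == "high" else 0
        impact = log["impact_range"]
        # The first tuple element dominates the comparison.
        return (important, impact) if importance_first else (impact, important)
    return sorted(logs, key=key, reverse=True)


logs = [
    {"service_id": "A", "importance": "high", "impact_range": 1},
    {"service_id": "B", "importance": "low", "impact_range": 5},
]
by_importance = [log["service_id"] for log in detection_order(logs)]
by_impact = [log["service_id"] for log in detection_order(logs, False)]
print(by_importance, by_impact)
```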

The service ID, business ID, and session ID extracted from the input log in S102 to S104 of FIG. 5 may instead be added to the message 104 as message attributes and extracted as attributes. Alternatively, an attribute query that extracts the service ID, session ID, and business ID may be added to the message, and the extraction in S102 to S104 of FIG. 5 may be realized by executing that attribute query according to the procedures of FIG. 10 and FIG. 11.

With the above configuration and operation, this embodiment provides the following effects.
Through the operation of the attribute query interpretation unit 109, attribute information can be extracted without depending on the log format, which differs with the type of ESB 102, and the priority of service failure detection can be determined using, as parameters, either or both of the service importance indicated by the attributes in the message and the service failure influence range calculated by the service failure influence range specifying unit 112.

As described above, Embodiments 1 to 3 have described the service operation management apparatus 103, which detects failures in services, a service being computer software that exchanges messages in the process of providing a function to a user. The service operation management apparatus 103 is characterized by comprising:
a service configuration information management unit 111 that manages the definition of the service execution order and the definition of the correspondence between services and businesses, a business being a set of services;
a service failure influence range specifying unit 112 that, based on the information in the service configuration information management unit, calculates the range of services affected when a failure occurs in a given service;
a collection/accumulation unit 114 that collects and accumulates the transaction log 108, which records the processing of the middleware that relays messages between services, and the logs of messages between services;
a log management unit 116 that accumulates and manages logs of different types in association with the attribute information of those logs; and
a service failure detection unit 115 that detects the order in which services were executed from the logs and detects an invalid service execution order by comparing that order with the definition of the service execution order held by the service configuration information management unit.

In addition, the service operation management apparatus 103 stores, in association with the service configuration information management unit 111, the log body, information identifying the service that generated the log, session information identifying a series of services executed in succession, and a list of identification information of the services that the business to which the log-generating service belongs may execute next.

When detecting an invalid service execution order from a given log, the service failure detection unit 115 retrieves from the service configuration information management unit 111 the list of identification information of the services and businesses that may be executed next after the immediately preceding service. This list is associated with the log of the service executed immediately before, in the same session as the log under inspection. The unit then compares the list with the service and business identification information of the log under inspection, and thereby detects an invalid service execution order.
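The comparison described above can be sketched as a lookup followed by a membership test. The data layout, the `NEXT_SERVICES` table, and the decision to skip the check for the first log of a session are assumptions made for illustration.

```python
# Hypothetical execution order definition:
# (business ID, service ID) -> services that may be executed next.
NEXT_SERVICES = {
    ("B1", "S1"): [("B1", "S2")],
    ("B1", "S2"): [("B1", "S3"), ("B2", "S1")],
}


def find_previous_log(past_logs, session_id):
    """Most recent stored log that belongs to the same session, if any."""
    for log in reversed(past_logs):
        if log["session_id"] == session_id:
            return log
    return None


def is_order_violation(input_log, past_logs, next_services):
    previous = find_previous_log(past_logs, input_log["session_id"])
    if previous is None:
        # Assumption: the first log of a session has nothing to compare with.
        return False
    candidates = next_services.get(
        (previous["business_id"], previous["service_id"]), [])
    current = (input_log["business_id"], input_log["service_id"])
    return current not in candidates


past_logs = [
    {"session_id": "s1", "business_id": "B1", "service_id": "S1"},
    {"session_id": "s2", "business_id": "B1", "service_id": "S2"},
]
skipped = {"session_id": "s1", "business_id": "B1", "service_id": "S3"}
expected = {"session_id": "s1", "business_id": "B1", "service_id": "S2"}
print(is_order_violation(skipped, past_logs, NEXT_SERVICES),
      is_order_violation(expected, past_logs, NEXT_SERVICES))
```

Filtering by session ID keeps logs of other sessions, which are interleaved in the store, from being mistaken for the preceding service.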

In Embodiments 1 to 3, the service operation management system 100 has:
the service unit 101, which is computer software and hardware that exchanges messages in the process of providing a function to a user;
middleware that relays messages between services and records each processing step; and
the service operation management apparatus described above,
and is characterized in that the service operation management apparatus 103 collates the transaction log 108 and the message logs with the service execution order definition information stored in the service configuration information management unit 111, and preferentially detects an invalid execution order for services whose service failure influence range, as calculated by the service failure influence range specifying unit 112, is wide.

Based on the information in the service configuration information management unit 111, the service failure influence range specifying unit 112 calculates the range of services affected when a failure occurs in a given service. The service failure influence range specifying unit 112 specifies the service failure influence range based on hierarchy information between services, that is, information on the relationship in which one service (the parent service) calls another service (the child service) and on the relationship among the group of services called by the same parent service (sibling services).
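The parent, child, and sibling relationships can all be derived from a single table of call relations, as the sketch below shows. The `CALLS` table and the function names are hypothetical and serve only to illustrate the hierarchy information.

```python
# Hypothetical call relations: caller (parent) -> called services (children).
CALLS = {"A": ["B", "C"], "B": ["D"]}


def children_of(service, calls):
    """Services that the given service calls."""
    return sorted(calls.get(service, []))


def parents_of(service, calls):
    """Services that call the given service."""
    return sorted(p for p, callees in calls.items() if service in callees)


def siblings_of(service, calls):
    """Other services called by a parent of the given service."""
    siblings = set()
    for callees in calls.values():
        if service in callees:
            siblings.update(c for c in callees if c != service)
    return sorted(siblings)


print(parents_of("B", CALLS), children_of("B", CALLS), siblings_of("B", CALLS))
```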

The service failure influence range specifying unit 112 is characterized in that it calculates the service failure influence range based on an analysis priority score, which is constructed by assigning an initial value to the service in which the failure occurred and then subtracting from or adding to that value at each call for the services that call the failed service and the services called from it.
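A score of this kind can be sketched as a breadth-first walk over the call relations. The concrete choices below are assumptions, since the text does not fix them: the initial value of 3, a decrement of 1 per call in either direction, stopping once the score reaches zero, and taking the set of scored services as the influence range.

```python
from collections import deque


def analysis_priority_scores(failed, calls, initial=3, step=1):
    """Score the failed service and the services reachable through calls."""
    # Build an undirected view: callers and callees are both affected.
    neighbours = {}
    for caller, callees in calls.items():
        neighbours.setdefault(caller, set())
        for callee in callees:
            neighbours[caller].add(callee)
            neighbours.setdefault(callee, set()).add(caller)

    scores = {failed: initial}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        next_score = scores[current] - step   # one call further away
        if next_score <= 0:
            continue                          # outside the influence range
        for service in neighbours.get(current, ()):
            if service not in scores:         # first visit has the best score
                scores[service] = next_score
                queue.append(service)
    return scores


CALLS = {"A": ["B", "C"], "B": ["D"], "D": ["E"], "E": ["F"]}
scores = analysis_priority_scores("B", CALLS)
affected_others = len(scores) - 1
print(scores, affected_others)
```

The count of affected services is one possible scalar to compare against the threshold used in S415 and S506.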

The service operation management apparatus comprises:
a service failure definition management unit 113 that stores the service failure definitions needed to detect service failures from the middleware logs and message attributes described above; and
a service failure detection unit 115 that retrieves service failure definition information from the service failure definition management unit 113 and collates it with the logs, the messages, and their attributes to detect failures that have occurred in services.
The service unit 101 adds to the message 104 attribute information 105, which is additional information needed to detect a service failure. The attribute query interpretation unit 109 extracts attributes from the message log and collates the attribute information, the transaction log 108, the message log, and the attribute values with the service failure definition information stored in the service failure definition management unit 113 and with the service execution order definition information stored in the service configuration information management unit 111, so that failures of services whose service failure influence range, as calculated by the service failure influence range specifying unit 112, is wide are detected preferentially.
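The claims describe the service failure definition as an abnormal range of attribute values that serves as a failure reference value. Under that reading, collating attributes with the definitions could be sketched as follows; the attribute names and the ranges are invented for illustration.

```python
# Hypothetical service failure definitions:
# attribute name -> (lower bound, upper bound) of the abnormal range.
FAILURE_DEFINITIONS = {
    "response_time_ms": (3000, float("inf")),
    "retry_count": (3, float("inf")),
}


def detect_failures(attributes, definitions):
    """Return the names of attributes whose value lies in an abnormal range."""
    failures = []
    for name, (low, high) in definitions.items():
        value = attributes.get(name)
        if value is not None and low <= value <= high:
            failures.append(name)
    return failures


observed = {"response_time_ms": 4500, "retry_count": 1}
print(detect_failures(observed, FAILURE_DEFINITIONS))
```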

The attributes that the service unit 101 adds to a message include information on the importance of the service. When detecting an invalid service execution order or any other service failure, the importance of the service is used, in addition to the service failure influence range, as a parameter for determining the priority of service failure detection processing.

The attribute query interpretation unit 109 extracts, from a message exchanged between services, the attributes added to the message and the attribute query used to extract those attributes from the message. It then interprets the attribute query to extract the attribute values.

The attribute query interpretation unit 109 extracts the attributes from a message by interpreting the attribute query even when the middleware includes the message log in the body of the transaction log 108. Likewise, when the message log is managed as a file separate from the transaction log 108, the unit extracts the attributes from the message without any explicit configuration change.
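A single extraction routine can serve both log layouts if it looks only for the message delimiters. The sketch below assumes that each message is wrapped in a `<message>` element and that the transaction log is line-oriented text; both are hypothetical, and a regular expression is used only for brevity.

```python
import re

# Hypothetical delimiter: each message is wrapped in a <message> element.
MESSAGE_PATTERN = re.compile(r"<message>.*?</message>", re.DOTALL)


def extract_message_logs(input_log):
    """Cut out every message, whether embedded in a transaction log or not."""
    return MESSAGE_PATTERN.findall(input_log)


transaction_log = (
    "10:00:00 relay S1->S2 <message><id>1</id></message>\n"
    "10:00:01 relay S2->S3 <message><id>2</id></message>\n"
)
standalone_message_log = "<message><id>3</id></message>"

embedded = extract_message_logs(transaction_log)
separate = extract_message_logs(standalone_message_log)
print(len(embedded), separate)
```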

The attribute query interpretation unit 109 detects service failures according to a priority calculated using, as a parameter, the service importance indicated by the attributes in the message or the service failure influence range calculated by the service failure influence range specifying unit.

FIG. 12 is a diagram showing an example of the external appearance of the service unit 101 and the service operation management apparatus 103 in Embodiments 1 to 3.
In FIG. 12, the service unit 101 or the service operation management apparatus 103 includes hardware resources such as a system unit 910, a display device 901 having a CRT (Cathode Ray Tube) or LCD (liquid crystal display) screen, a keyboard 902 (K/B), a mouse 903, an FDD 904 (Flexible Disk Drive), a compact disc device 905 (CDD), a printer device 906, and a scanner device 907, which are connected by cables and signal lines.

The system unit 910 is a computer. It is connected by cable to a facsimile machine 932 and a telephone 931, and is connected to the Internet 940 via a local area network 942 (LAN) and a gateway 941.

FIG. 13 is a diagram showing an example of the hardware resources of the service unit 101, the ESB 102, and the service operation management apparatus 103 in Embodiments 1 to 3.
In FIG. 13, the service unit 101, the ESB 102, and the service operation management apparatus 103 include a CPU 911 (Central Processing Unit, also called a processing device, arithmetic device, microprocessor, microcomputer, or processor) that executes programs. The CPU 911 is connected via a bus 912 to a ROM 913, a RAM 914, a communication board 915, the display device 901, the keyboard 902, the mouse 903, the FDD 904, the CDD 905, the printer device 906, the scanner device 907, and a magnetic disk device 920, and controls these hardware devices. A storage device such as an optical disc device or a memory card reader/writer may be used in place of the magnetic disk device 920.

The RAM 914 is an example of volatile memory. The storage media of the ROM 913, the FDD 904, the CDD 905, and the magnetic disk device 920 are examples of non-volatile memory. These are examples of a storage device or storage unit.

The communication board 915, the keyboard 902, the scanner device 907, the FDD 904, and the like are examples of an input unit or input device.
The communication board 915, the display device 901, the printer device 906, and the like are examples of an output unit or output device.

The communication board 915 is connected to the facsimile machine 932, the telephone 931, the LAN 942, and so on. The communication board 915 is not limited to the LAN 942 and may be connected to the Internet 940 or to a WAN (wide area network) such as ISDN. When it is connected to the Internet 940 or to a WAN such as ISDN, the gateway 941 is unnecessary.

The magnetic disk device 920 stores an operating system 921 (OS), a window system 922, a program group 923, and a file group 924. The programs in the program group 923 are executed by the CPU 911, the operating system 921, and the window system 922.

The program group 923 stores programs that execute the functions described as "... unit" in the description of Embodiments 1 to 3. The programs are read and executed by the CPU 911.

The file group 924 stores the information, data, signal values, variable values, and parameters described in Embodiments 1 to 3 as "determination result of ...", "calculation result of ...", or "processing result of ..." as items of a "... file" or "... database". A "... file" or "... database" is stored on a recording medium such as a disk or memory. The information, data, signal values, variable values, and parameters stored on such a storage medium are read into main memory or cache memory by the CPU 911 via a read/write circuit and are used in CPU operations such as extraction, search, reference, comparison, arithmetic, calculation, processing, output, printing, and display. During these CPU operations, the information, data, signal values, variable values, and parameters are temporarily stored in main memory, cache memory, or buffer memory.

The arrows in the flowcharts described in Embodiments 1 to 3 mainly indicate the input and output of data and signals. The data and signal values are recorded on a recording medium such as the memory of the RAM 914, the flexible disk of the FDD 904, the compact disc of the CDD 905, the magnetic disk of the magnetic disk device 920, or another optical disc, mini disc, or DVD (Digital Versatile Disk). Data and signals are also transmitted online via the bus 912, signal lines, cables, or other transmission media.

What is described as a "... unit" in Embodiments 1 to 3 may be a "... circuit", "... device", "... equipment", or "... means", and may also be a "... step", "... procedure", or "... process". That is, what is described as a "... unit" may be realized by firmware stored in the ROM 913. Alternatively, it may be implemented by software only, by hardware only (elements, devices, boards, wiring, and the like), by a combination of software and hardware, or further in combination with firmware. Firmware and software are stored as programs on a recording medium such as a magnetic disk, flexible disk, optical disc, compact disc, mini disc, or DVD. The programs are read and executed by the CPU 911. In other words, a program causes a computer to function as a "... unit" described in Embodiments 1 to 3, or causes a computer to execute the procedure or method of a "... unit" described in Embodiments 1 to 3.

DESCRIPTION OF SYMBOLS: 100 service operation management system, 101 service unit, 102 ESB, 103 service operation management apparatus, 104 message, 105 attribute information, 106 attribute query, 107 message log, 108 transaction log, 109 attribute query interpretation unit, 110 attribute management unit, 111 service configuration information management unit, 112 service failure influence range specifying unit, 113 service failure definition management unit, 114 collection/accumulation unit, 115 service failure detection unit, 116 log management unit, 117 integrated log storage unit, 901 display device, 902 keyboard, 903 mouse, 904 FDD, 905 compact disc device, 906 printer device, 907 scanner device, 910 system unit, 911 CPU, 912 bus, 913 ROM, 914 RAM, 915 communication board, 920 magnetic disk device, 921 operating system, 922 window system, 923 program group, 924 file group, 931 telephone, 932 facsimile machine, 940 Internet, 941 gateway, 942 local area network.

Claims

1. A service operation management apparatus for detecting a failure in services that are provided by computer software and executed in a predetermined order, comprising:
a service configuration information management unit that stores service execution order definition information defining the execution order of a plurality of services, and stores influence range information indicating, for each service, the degree of influence of a failure when a failure occurs;
an integrated log storage unit that stores, as past logs, a plurality of logs, each being at least one of a message log recording a message between services and a transaction log recording the relay of the message; and
a service failure detection unit that receives a new input log to be stored in the integrated log storage unit, detects as a previous log, from the plurality of past logs already stored in the integrated log storage unit, the log of a service executed before the service indicated by the input log, searches the service execution order definition information stored in the service configuration information management unit, and determines whether the execution order of the service indicated by the previous log and the service indicated by the input log matches the service execution order in the service execution order definition information,
wherein the service failure detection unit refers to the influence range information stored in the service configuration information management unit and preferentially detects the occurrence of a failure for an input log of a service with a large degree of influence of the failure.

2. The service operation management apparatus, wherein
the log has a session ID, a business ID, and a service ID,
the service configuration information management unit defines a service by a business ID and a service ID and stores the service execution order of a plurality of services by business ID and service ID, and
the service failure detection unit detects as the previous log, from the plurality of past logs already stored in the integrated log storage unit, a past log that has the same session ID as the input log and that is the log of the service executed immediately before the service indicated by the input log, searches the service execution order in the service execution order definition information stored in the service configuration information management unit, acquires as candidate services the services that can be executed after the service indicated by the previous log, and detects the occurrence of a service failure by determining whether the business ID and service ID of an acquired candidate service match the business ID and service ID of the input log.

3. The service operation management apparatus, further comprising a service failure influence range specifying unit that, by searching the service execution order in the service execution order definition information stored in the service configuration information management unit, identifies the range of other services affected by a failure in the failed service in which the failure occurred, and stores the failure influence range of those other services in the service configuration information management unit as influence range information in association with the failed service.

4. The service operation management apparatus according to claim 1, wherein the service failure influence range specifying unit treats the call relationships between services as hierarchy information and specifies the service failure influence range based on that hierarchy information.

5. The service operation management apparatus according to claim 4, wherein the service failure influence range specifying unit assigns an initial value to the failed service, assigns to each service that calls the failed service and each service called from the failed service, as an analysis priority score, a value that is subtracted from or added to at each call, and calculates the failure influence range of the service based on the analysis priority score.

6. The service operation management apparatus according to claim 1, further comprising
a service failure definition management unit that stores service failure definition information in which an abnormal range of attribute values of the attribute information contained in a service message is defined as a failure reference value, wherein
the message has attribute information including attribute values of the service, and
the service failure detection unit refers to the service failure definition information stored in the service failure definition management unit, compares the failure reference value defined in the service failure definition information with the attribute values of the attribute information contained in the message of the log, and detects the occurrence of a service failure.

7. The service operation management apparatus according to any one of claims 1 to 6, wherein
the message has attribute information including an attribute value indicating the importance of the service, and
the service failure detection unit refers to the importance indicated by the attribute information contained in the message of the log and preferentially detects the occurrence of a service failure for a message whose attribute information indicates high importance.

8. The service operation management apparatus according to claim 6, wherein
the message has an attribute query for extracting attribute information from the message,
the service operation management apparatus further comprises:
an attribute query interpretation unit that executes the attribute query contained in the message of the log and extracts attribute information from the message of the log; and
an attribute management unit that stores the attribute information extracted by the attribute query interpretation unit, and
the service failure detection unit performs failure detection with reference to the attribute information stored in the attribute management unit.