CN113986677B

CN113986677B - A method and device for monitoring business resources

Info

Publication number: CN113986677B
Application number: CN202111302291.6A
Authority: CN
Inventors: 白石
Original assignee: Jingdong Technology Information Technology Co Ltd
Current assignee: Jingdong Technology Information Technology Co Ltd
Priority date: 2021-11-04
Filing date: 2021-11-04
Publication date: 2025-10-21
Anticipated expiration: 2041-11-04
Also published as: CN113986677A

Abstract

Disclosed are a method and a device for monitoring service resources. The method comprises: determining a service resource domain to be monitored; receiving, through a distributed monitoring platform peer-to-peer center, monitoring data of the service resources sent by the service resource domain, and judging whether the service resource domain is abnormal according to the monitoring data; and, when the service resource domain is determined to be abnormal, calling the distributed monitoring platform peer-to-peer center to convert the abnormal monitoring data into abnormal alarm data. By establishing a distributed monitoring platform, full-coverage monitoring of multiple service resources is realized.

Description

Method and device for monitoring service resources

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for monitoring service resources.

Background

Existing mainstream monitoring technology targets the resources at each level of a cloud platform. It mainly monitors independent service resources or service systems and is implemented by deploying agent programs on key nodes. However, there is no unified method for monitoring these independent service resources, so when the cloud platform itself fails, for example when its control plane fails, the monitoring system fails with it.

Disclosure of Invention

The present disclosure provides a method and an apparatus for monitoring service resources, which implement full coverage monitoring for multiple service resources by establishing a distributed monitoring platform.

In a first aspect, the present disclosure provides a method for monitoring service resources, which is applied to a distributed monitoring platform, where the distributed monitoring platform includes at least one distributed monitoring platform peer-to-peer center, and each distributed monitoring platform peer-to-peer center corresponds to at least one service resource domain;

the method specifically comprises the following steps:

determining a service resource domain to be monitored;

receiving, through the distributed monitoring platform peer-to-peer center, monitoring data of the service resources sent by the service resource domain, and judging whether the service resource domain is abnormal according to the monitoring data;

and, when the service resource domain is determined to be abnormal, calling the distributed monitoring platform peer-to-peer center to convert the abnormal monitoring data into abnormal alarm data.

The method for monitoring the service resources provided by the present disclosure further comprises:

when a service resource domain is added, determining the correspondence between the service resource domain and the distributed monitoring platform peer-to-peer center, and publishing the information of the service resource domain to the distributed monitoring platform.

According to the method for monitoring service resources provided by the present disclosure, before the determining of the service resource domain to be monitored, the method includes:

generating a random traffic reservation table based on a data receiving index of the distributed monitoring platform peer-to-peer center, wherein the random traffic reservation table comprises a time and a sending data volume of the service resource domain;

and sending the random traffic reservation table to the service resource domain, and calling the service resource domain to divide the service resources in the service resource domain into levels according to the time in the random traffic reservation table and the sending data volume of the service resource domain, so as to determine each level of the service resource domain, wherein the levels of the service resource domain have a priority order.

According to the method for monitoring the service resources, the distributed monitoring platform peer-to-peer center comprises a primary shunting node, a secondary shunting node and a parallel data processing node;

the receiving, through the distributed monitoring platform peer-to-peer center, of monitoring data of the service resources sent by the service resource domain, and the judging of whether the service resource domain is abnormal according to the monitoring data, includes:

calling the primary shunting node to receive the monitoring data, and adding the monitoring data to a high-speed processing queue in the primary shunting node;

calling the secondary shunting node to perform position hash processing on the monitoring data in the high-speed processing queue, and determining a corresponding first parallel data processing node;

and calling the secondary shunting node to send the monitoring data to the first parallel data processing node, and judging, through the first parallel data processing node, whether the service resource domain is abnormal based on the monitoring data.

According to the method for monitoring the service resources provided by the disclosure, the distributed monitoring platform peer-to-peer center further comprises a fault collection node;

the judging, through the first parallel data processing node, of whether the service resource domain is abnormal based on the monitoring data includes:

analyzing the monitoring data through the first parallel data processing node, and judging whether the monitoring data is abnormal;

if the monitoring data is abnormal, reporting the monitoring data to the fault collection node;

and sending, through the fault collection node, an abnormal inquiry message to the service resource domain corresponding to the monitoring data, and judging whether the service resource domain is abnormal according to the returned response message.

According to the method for monitoring service resources provided by the present disclosure, the analyzing of the monitoring data through the first parallel data processing node to judge whether the monitoring data is abnormal includes:

uniformly analyzing, through the first parallel data processing node, the monitoring data sent by each level of the service resource domain, and judging whether the monitoring data in each level is abnormal;

the judging of whether the service resource domain is abnormal according to the returned response message includes:

judging whether the level of the service resource domain is abnormal according to the returned response message.

According to the method for monitoring service resources provided by the present disclosure, the receiving, through the distributed monitoring platform peer-to-peer center, of monitoring data of the service resources sent by the service resource domain, and the judging of whether the service resource domain is abnormal according to the monitoring data, further includes:

sending the monitoring data corresponding to high-priority service resources directly to the secondary shunting node;

calling the secondary shunting node to perform position hash processing on the monitoring data corresponding to the high-priority service resources, and acquiring a second parallel data processing node;

and calling the secondary shunting node to send the monitoring data to the second parallel data processing node, and judging, through the second parallel data processing node, whether the service resource domain is abnormal based on the monitoring data.

According to the method for monitoring service resources provided by the present disclosure, the judging of whether the service resource domain is abnormal according to the returned response message includes:

if no feedback data is obtained in the response message, confirming that an abnormality has occurred;

and if feedback data is obtained in the response message, confirming that no abnormality has occurred.

In a second aspect, the present disclosure provides a device for monitoring service resources, which is disposed on a distributed monitoring platform, where the distributed monitoring platform includes at least one distributed monitoring platform peer-to-peer center, and each distributed monitoring platform peer-to-peer center corresponds to at least one service resource domain;

the device specifically comprises:

a determining module, used for determining a service resource domain to be monitored;

a receiving module, used for receiving, through the distributed monitoring platform peer-to-peer center, monitoring data of the service resources sent by the service resource domain, and judging whether the service resource domain is abnormal according to the monitoring data;

and a conversion module, used for calling the distributed monitoring platform peer-to-peer center to convert the abnormal monitoring data into abnormal alarm data when the service resource domain is determined to be abnormal.

In a third aspect, the present disclosure provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the steps of the method for monitoring service resources according to any implementation of the first aspect.

In a fourth aspect, the present disclosure provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for monitoring service resources according to any implementation of the first aspect.

In a fifth aspect, the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method for monitoring service resources according to any implementation of the first aspect.

The present disclosure provides a method and a device for monitoring service resources. First, a service resource domain to be monitored is determined, where the service resource domain contains all of its service resources. Monitoring data of the service resources sent by the service resource domain is then received through a distributed monitoring platform peer-to-peer center in the distributed monitoring platform, and the distributed monitoring platform peer-to-peer center is called to judge whether the service resource domain is abnormal according to the monitoring data. Because each distributed monitoring platform peer-to-peer center corresponds to at least one service resource domain, judging abnormality through the peer-to-peer centers makes it possible to monitor a massive number of service resource domains. When the service resource domain is determined to be abnormal, the distributed monitoring platform peer-to-peer center is called to convert the abnormal monitoring data into abnormal alarm data. By establishing a distributed monitoring platform, the method and the device realize full-coverage monitoring of multiple service resources.

Drawings

In order to illustrate the technical solutions of the present disclosure or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.

FIG. 1 is an overall layout of a distributed monitoring platform provided by embodiments of the present disclosure;

FIG. 2 is a flow chart of a method for monitoring a business resource domain provided by an embodiment of the present disclosure;

FIG. 3 is a level block diagram of various levels of a financial resource domain A provided by an embodiment of the present disclosure;

FIG. 4 is a level block diagram of various levels of a financial resource domain B provided by an embodiment of the present disclosure;

FIG. 5 is a first flowchart for determining whether the service resource domain is abnormal, provided by an embodiment of the present disclosure;

FIG. 6 is a second flowchart for determining whether the service resource domain is abnormal, provided by an embodiment of the present disclosure;

FIG. 7 is an overall flowchart of a method for monitoring a service resource domain provided by an embodiment of the present disclosure;

FIG. 8 is a schematic overall flow diagram of a special case of a method for monitoring a service resource domain provided by an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of a device for monitoring service resources provided by an embodiment of the present disclosure;

FIG. 10 is a schematic structural diagram of an electronic device provided by the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments that a person of ordinary skill in the art can obtain from the disclosed embodiments without inventive effort fall within the scope of the present disclosure.

The distributed monitoring platform comprises at least one distributed monitoring platform peer-to-peer center. If each distributed monitoring platform peer-to-peer center is regarded as a node, the distributed monitoring platform can be understood as a monitoring platform formed by nodes that communicate through a network and work cooperatively to complete common tasks. The purpose of building the distributed monitoring platform is to use more distributed monitoring platform peer-to-peer centers to monitor more service resource domains.

Correspondingly, each distributed monitoring platform peer-to-peer center may correspond to one service resource domain or to multiple service resource domains.

Referring to FIG. 1, an overall layout of a distributed monitoring platform according to an embodiment of the present disclosure is shown. In the layout of FIG. 1, by way of example, the distributed monitoring platform includes three distributed monitoring platform peer-to-peer centers, and each distributed monitoring platform peer-to-peer center corresponds to three service resource domains.

When there are multiple distributed monitoring platform peer-to-peer centers, they exchange data over a ring network structure. When one of the distributed monitoring platform peer-to-peer centers is abnormal, the service resource domains corresponding to the abnormal center can be shunted, by network switching, to the adjacent upstream and downstream distributed monitoring platform peer-to-peer centers. In this way, the service resource domains corresponding to a distributed monitoring platform peer-to-peer center can still be monitored even when that center is abnormal.
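The ring-based failover described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the present disclosure: the function name `reassign_domains`, the list representation of the ring, and the alternating split of orphaned domains between the two neighbours are hypothetical. The description only states that the domains are shunted to the adjacent upstream and downstream centers.

```python
def reassign_domains(ring, assignments, failed):
    """Move the domains of an abnormal peer-to-peer center to its ring neighbours.

    ring        -- center names in ring order (hypothetical representation)
    assignments -- dict: center name -> list of service resource domains
    failed      -- name of the abnormal center
    Returns a new dict; the input mapping is left unchanged.
    """
    if len(ring) < 2:
        raise ValueError("failover needs at least one healthy neighbour")
    i = ring.index(failed)
    upstream = ring[(i - 1) % len(ring)]
    downstream = ring[(i + 1) % len(ring)]
    result = {c: list(d) for c, d in assignments.items() if c != failed}
    # Assumption: orphaned domains alternate between the two neighbours.
    for n, domain in enumerate(assignments.get(failed, [])):
        target = upstream if n % 2 == 0 else downstream
        result.setdefault(target, []).append(domain)
    return result
```

Alternating between neighbours is one simple way to avoid doubling the load of a single center; any other split that keeps every domain monitored would fit the description equally well.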

It can be appreciated that the setting of the distributed monitoring platform peer-to-peer center and the service resource domain can be set autonomously by those skilled in the art according to actual requirements or application scenarios, which is not limited by the present disclosure.

Referring to fig. 2, a flowchart of a method for monitoring a service resource domain according to an embodiment of the disclosure is shown, where the method includes:

Step 210, determining a service resource domain to be monitored.

In this step, the service resource domain may be understood as a general term for all resources in any monitored service system. The service resource domain to be monitored may be of any kind, such as a financial resource domain.

Step 220, receiving, through the distributed monitoring platform peer-to-peer center, monitoring data of the service resources sent by the service resource domain, and judging whether the service resource domain is abnormal according to the monitoring data.

In this step, taking a financial resource domain as an example, the financial resource domain includes all financial resources in the monitored financial system, where the financial resources may be one or more of server resources, virtual machine resources, network resources or storage system resources.

An abnormality may be understood as an abnormal event that occurs during the operation of the financial resource domain, such as a downtime event.

Specifically, the distributed monitoring platform peer-to-peer center receives the monitoring data of the financial resources sent by the financial resource domain, and judges whether the financial resource domain is abnormal according to the monitoring data.

Step 230, when the service resource domain is determined to be abnormal, calling the distributed monitoring platform peer-to-peer center to convert the abnormal monitoring data into abnormal alarm data.

In this step, taking a financial resource domain as an example, when an abnormal event occurs during the operation of the financial resource domain, the distributed monitoring platform peer-to-peer center is called to convert the monitoring data corresponding to the abnormal event into abnormal alarm data.

The present disclosure provides a method for monitoring service resources. First, a service resource domain to be monitored is determined, where the service resource domain contains all of its service resources. Monitoring data of the service resources sent by the service resource domain is then received through a distributed monitoring platform peer-to-peer center in the distributed monitoring platform, and the distributed monitoring platform peer-to-peer center is called to judge whether the service resource domain is abnormal according to the monitoring data. Because each distributed monitoring platform peer-to-peer center corresponds to at least one service resource domain, judging abnormality through the peer-to-peer centers makes it possible to monitor a massive number of service resource domains. When the service resource domain is determined to be abnormal, the distributed monitoring platform peer-to-peer center is called to convert the abnormal monitoring data into abnormal alarm data. By establishing a distributed monitoring platform, the method realizes full-coverage monitoring of multiple service resources.
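Steps 210 to 230 can be illustrated with the following minimal sketch. The class `PeerCenter`, the `Alarm` record, the dictionary layout of a monitoring sample, and the rule that any status other than "ok" is abnormal are all assumptions made for illustration; the disclosure does not fix a data format or a specific abnormality criterion.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    domain: str
    resource: str
    detail: str


class PeerCenter:
    """Hypothetical stand-in for a distributed monitoring platform peer-to-peer center."""

    def is_abnormal(self, sample):
        # Assumption: a sample is abnormal when its status is anything but "ok".
        return sample.get("status") != "ok"

    def to_alarm(self, domain, sample):
        # Step 230: convert abnormal monitoring data into abnormal alarm data.
        return Alarm(domain, sample["resource"], f"status={sample.get('status')}")


def monitor(center, domain, samples):
    """Steps 220-230 for one already-determined domain (step 210)."""
    return [center.to_alarm(domain, s) for s in samples if center.is_abnormal(s)]
```

For example, calling `monitor` with one healthy and one down virtual machine yields a single alarm for the down machine.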

The method provided by the embodiment of the present disclosure further comprises the following step:

when a service resource domain is added, determining the correspondence between the service resource domain and the distributed monitoring platform peer-to-peer center, and publishing the information of the service resource domain to the distributed monitoring platform.

In this step, when there are multiple distributed monitoring platform peer-to-peer centers, different distributed monitoring platform peer-to-peer centers may correspond to different service resource domains. Therefore, when a service resource domain is newly added, the correspondence between the newly added service resource domain and a distributed monitoring platform peer-to-peer center is determined, and the information of the newly added service resource domain is published to the distributed monitoring platform, so that the distributed monitoring platform monitors the newly added service resource domain.
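The registration of a newly added service resource domain can be sketched as below. The class `Platform`, the method `add_domain`, and the choice of the least-loaded center are assumptions for illustration only; the disclosure states that a correspondence is determined and published, not how the center is chosen.

```python
class Platform:
    """Hypothetical registry of peer-to-peer centers and their domains."""

    def __init__(self, centers):
        self.assignments = {c: [] for c in centers}  # center -> domain names
        self.published = {}                          # domain name -> info

    def add_domain(self, name, info):
        # Assumption: pick the center that currently monitors the fewest domains.
        center = min(self.assignments, key=lambda c: len(self.assignments[c]))
        self.assignments[center].append(name)
        # Publish the domain information so the platform starts monitoring it.
        self.published[name] = {"center": center, **info}
        return center
```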

The method provided by the embodiment of the present disclosure, before step 210, includes the following steps 211 to 212:

Step 211, generating a random traffic reservation table based on the data receiving index of the distributed monitoring platform peer-to-peer center, wherein the random traffic reservation table comprises a time and a sending data volume of the service resource domain.

In this step, generating the random traffic reservation table based on the data receiving index of the distributed monitoring platform peer-to-peer center can be understood as follows: the idle capacity of the service resources in the service resource domain within a time window T is calculated; data receiving indexes that are random and uniformly distributed in time are generated for the next time window T; and the distributed monitoring platform peer-to-peer center collects the generated data receiving indexes to produce the random traffic reservation table.

Step 212, sending the random traffic reservation table to the service resource domain, and calling the service resource domain to divide the service resources in the service resource domain into levels according to the time in the random traffic reservation table and the sending data volume of the service resource domain, so as to determine each level of the service resource domain, wherein the levels of the service resource domain have a priority order.

In this step, the random traffic reservation table may be sent to the service resource domain through a network, and the random traffic reservation table also allows the monitoring data to be balanced in time.
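One possible reading of steps 211 and 212 is sketched below. The function `build_reservation_table`, its parameters, and the equal split of the spare capacity across slots are hypothetical; the description only requires that the table contain a time and a sending data volume, and that the times be random and uniform over the next window T.

```python
import random


def build_reservation_table(domains, window_seconds, slots_per_domain,
                            spare_bytes, seed=None):
    """Return rows of (time_offset_s, domain, bytes_allowed), sorted by time."""
    rng = random.Random(seed)
    total_slots = len(domains) * slots_per_domain
    if total_slots == 0:
        return []
    # Assumption: spare receiving capacity is divided equally among all slots.
    per_slot = spare_bytes // total_slots
    rows = []
    for domain in domains:
        for _ in range(slots_per_domain):
            # Send times are drawn uniformly over the next time window T.
            rows.append((rng.uniform(0, window_seconds), domain, per_slot))
    rows.sort(key=lambda row: row[0])
    return rows
```

Each service resource domain would then send at most `bytes_allowed` at its assigned offsets, which spreads the load on the peer-to-peer center over the window instead of concentrating it at one instant.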

The levels of the service resource domain have a priority order. The priority order can be set according to the specific application scenario and is not limited by the present disclosure.

Correspondingly, the service resources in the service resource domain are divided into levels and each level of the service resource domain is determined. The following takes a financial resource domain as an example of the service resource domain.

Referring to FIG. 3, which is a level block diagram of each level of a financial resource domain A provided by an embodiment of the present disclosure, financial resource domain A may adopt a virtualized or cloud platform architecture and be divided into an available area collector, a fault domain collector, a cabinet collector, a server collector and a virtual machine collector. Referring to FIG. 4, which is a level block diagram of each level of a financial resource domain B provided by an embodiment of the present disclosure, financial resource domain B may adopt a distributed material resource node architecture and be divided into an available area collector, a fault domain collector, a cabinet collector and a server collector. The collector of monitoring data at each level is designed for the financial resources of that level; for example, the available area collector may collect the power supply state of the data center, the temperature and humidity state, the on-duty state of personnel, and the like. It can be understood that the architecture adopted by each financial resource domain and the content covered by each collector can be set by those skilled in the art according to actual requirements or application scenarios, and the present disclosure is not limited thereto.
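The level structure of the two example domains can be written down as ordered lists. In this sketch the identifiers and the convention that list position encodes priority (index 0 is the highest) are assumptions; the level names themselves follow the collectors named for financial resource domains A and B.

```python
# Levels of the two example domains, listed from highest to lowest priority
# (assumption: the order in which the collectors are named is the priority order).
DOMAIN_A_LEVELS = ["available_area", "fault_domain", "cabinet",
                   "server", "virtual_machine"]
DOMAIN_B_LEVELS = ["available_area", "fault_domain", "cabinet", "server"]


def priority(levels, level):
    """Return the rank of a level within a domain; 0 is the highest priority."""
    return levels.index(level)


def higher_priority(levels, first, second):
    """Return whichever of the two levels ranks higher in the given domain."""
    return min(first, second, key=lambda lv: priority(levels, lv))
```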

The distributed monitoring platform peer-to-peer center comprises a primary shunting node, a secondary shunting node and a parallel data processing node.

Specifically, the distributed monitoring platform peer-to-peer center can be implemented with cloud platform technology in a layered, hierarchical manner, and is correspondingly divided into a primary shunting node, a secondary shunting node and a parallel data processing node.

Referring to FIG. 5, a first flowchart for determining whether the service resource domain is abnormal according to an embodiment of the present disclosure includes:

Step 510, calling the primary shunting node to receive the monitoring data, and adding the monitoring data to a high-speed processing queue in the primary shunting node.

In this step, the primary shunting node is provided with a processing system for receiving the monitoring data and a high-capacity memory buffer system. The processing system receives the monitoring data, and the high-capacity memory buffer system adds the monitoring data to the high-speed processing queue in the primary shunting node.

Providing the processing system for receiving the monitoring data and the high-capacity memory buffer system in the primary shunting node effectively prevents a peak in network traffic from putting excessive pressure on, or damaging, the distributed monitoring platform peer-to-peer center.
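The buffering role of the primary shunting node can be sketched with a bounded in-memory queue. The class `PrimaryShuntingNode`, its capacity parameter, and the policy of rejecting samples when the queue is full are assumptions; the description does not say what happens when the buffer overflows.

```python
import queue


class PrimaryShuntingNode:
    """Hypothetical primary shunting node with a bounded high-speed queue."""

    def __init__(self, capacity):
        self.high_speed_queue = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def receive(self, sample):
        """Buffer one monitoring sample; return False if the buffer is full."""
        try:
            self.high_speed_queue.put_nowait(sample)
            return True
        except queue.Full:
            # Assumption: overflow is counted and dropped rather than blocking.
            self.rejected += 1
            return False

    def drain(self, limit):
        """Hand up to `limit` buffered samples to the secondary shunting node."""
        batch = []
        while len(batch) < limit:
            try:
                batch.append(self.high_speed_queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

Because the downstream side pulls at its own pace through `drain`, a burst of incoming data fills the buffer instead of reaching the processing nodes directly.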

Step 520, calling the secondary shunting node to perform position hash processing on the monitoring data in the high-speed processing queue, and determining a corresponding first parallel data processing node.

In this step, the secondary shunting node is called to receive the monitoring data from the primary shunting node. Because the monitoring data has already been balanced in time by the random traffic reservation table, the monitoring data in the secondary shunting node is time-balanced monitoring data.

Position hashing means storing the monitoring data in a hash table and establishing a mapping between each piece of monitoring data and its storage position in the hash table, so that each piece of monitoring data corresponds to a unique position. When a piece of monitoring data needs to be retrieved, it is mapped into the hash table through a hash function, and the resulting storage position is the position hash of that monitoring data.

Position hash processing allows the monitoring data to be balanced in space.
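The selection of a parallel data processing node by position hashing can be illustrated as follows. The function `pick_processing_node`, the use of SHA-256, the modulo mapping, and the string key format are assumptions; the disclosure does not name a particular hash function.

```python
import hashlib


def pick_processing_node(key, nodes):
    """Map a monitoring-data key to one parallel data processing node.

    key   -- string identifying the data, e.g. "domain/level/resource"
             (hypothetical key format)
    nodes -- non-empty list of processing node names
    """
    if not nodes:
        raise ValueError("at least one processing node is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a node index.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

The same key always maps to the same node, and different keys spread across the nodes, which is the spatial balancing effect described above.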

Step 530, calling the secondary shunting node to send the monitoring data to the first parallel data processing node, and judging, through the first parallel data processing node, whether the service resource domain is abnormal based on the monitoring data.

In this step, the first parallel data processing node may be a running virtual machine. The first parallel data processing node is responsible for uniformly processing the received monitoring data and determining whether the service resource domain corresponding to the monitoring data is abnormal.

In the method provided by the embodiment of the present disclosure, the distributed monitoring platform peer-to-peer center further comprises a fault collection node.

The fault collection node is responsible for collecting the fault analysis results of the parallel data processing nodes.

Specifically, step 530 includes the following steps 531 to 533:

Step 531, analyzing the monitoring data through the first parallel data processing node, and judging whether the monitoring data is abnormal.

Step 532, if the monitoring data is abnormal, reporting the monitoring data to the fault collection node.

Step 533, sending, through the fault collection node, an abnormal inquiry message to the service resource domain corresponding to the monitoring data, and judging whether the service resource domain is abnormal according to the returned response message.

Steps 531 to 533 are further described in the following example:

Taking the monitoring data of financial resource domain B as an example, the monitoring data of financial resource domain B is analyzed through the first parallel data processing node to judge whether it is abnormal. If the monitoring data corresponding to the cabinet collector in financial resource domain B indicates downtime, the monitoring data corresponding to the cabinet collector is reported to the fault collection node. The fault collection node sends an abnormal inquiry message, which may be a network connection probe data message, to financial resource domain B, and whether the cabinet collector in financial resource domain B is really down is determined according to the returned response message.
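Steps 531 to 533 can be sketched as a two-stage check in which the fault collection node confirms a suspected fault with a probe. The class `FaultCollectionNode`, the injected `probe` callable, the sample layout, and the status test are assumptions; the rule that a missing reply confirms the abnormality follows the summary of the disclosure.

```python
class FaultCollectionNode:
    """Hypothetical fault collection node that double-checks reported faults."""

    def __init__(self, probe):
        # probe(domain, level) returns feedback data, or None when no reply arrives.
        self.probe = probe
        self.reports = []

    def report(self, sample):
        """Step 533: query the domain and confirm or dismiss the abnormality."""
        self.reports.append(sample)
        reply = self.probe(sample["domain"], sample["level"])
        # No feedback data means the abnormality is confirmed.
        return reply is None


def process(sample, collector):
    """Steps 531-532 on a parallel data processing node."""
    # Assumption: any status other than "ok" marks the sample as abnormal.
    if sample.get("status") == "ok":
        return False
    return collector.report(sample)
```

In a deployment the `probe` callable could wrap a network connection probe; it is passed in here so that the sketch stays independent of any particular network library.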

Step 531 specifically includes:

uniformly analyzing, through the first parallel data processing node, the monitoring data sent by each level of the service resource domain, and judging whether the monitoring data in each level is abnormal.

In this step, financial resource domain B is taken as an example. Financial resource domain B is divided into an available area collector, a fault domain collector, a cabinet collector and a server collector, so the monitoring data corresponding to each of these collectors is uniformly analyzed to judge whether the monitoring data of each level is abnormal.

Judging whether the monitoring data of each level of the service resource domain is abnormal makes it possible to locate the fault accurately.
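The per-level analysis can be illustrated by grouping abnormal samples by the level that produced them. The function `abnormal_levels`, the sample fields, and the status test are assumptions used only to show how a per-level result localizes a fault.

```python
from collections import defaultdict


def abnormal_levels(samples):
    """Return {level: [abnormal resource names]} for one service resource domain."""
    found = defaultdict(list)
    for sample in samples:
        # Assumption: any status other than "ok" marks the sample as abnormal.
        if sample["status"] != "ok":
            found[sample["level"]].append(sample["resource"])
    return dict(found)
```

A result such as `{"cabinet": ["cab-1"]}` points directly at the affected level and resource rather than at the domain as a whole.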

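Step 533 specifically includes:

judging whether the level of the service resource domain is abnormal according to the returned response message.

In this step, the response message may be understood as the message with which the financial resource domain replies or responds to the abnormal inquiry message that was sent.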

Referring to FIG. 6, a second flowchart for determining whether the service resource domain is abnormal according to an embodiment of the present disclosure further includes:

Step 610, sending the monitoring data corresponding to high-priority service resources directly to the secondary shunting node.

In this step, financial resource domain A is used for illustration. The priority order of financial resource domain A, from high to low, is the available area collector, the fault domain collector, the cabinet collector, the server collector and the virtual machine collector, so the available area collector has high priority and the virtual machine collector has low priority.

The monitoring data corresponding to the high-priority available area collector in financial resource domain A skips the primary shunting node and is sent directly to the secondary shunting node.

Step 620, calling the secondary shunting node to perform position hash processing on the monitoring data corresponding to the high-priority service resources, and acquiring a second parallel data processing node.

In this step, the secondary shunting node performs position hash processing on the received monitoring data corresponding to the high-priority available area collector to acquire the second parallel data processing node.

Step 630, calling the secondary shunting node to send the monitoring data to the second parallel data processing node, and judging, through the second parallel data processing node, whether the service resource domain is abnormal based on the monitoring data.

In this step, the second parallel data processing node receives the monitoring data corresponding to the high-priority available area collector and judges whether the corresponding service resource domain is abnormal.
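The bypass for high-priority data in step 610 amounts to a routing decision taken before the primary shunting node. In this sketch the function `route`, the two deques standing in for the node inboxes, and the explicit set of high-priority levels are assumptions.

```python
from collections import deque


def route(sample, primary_queue, secondary_inbox, high_priority_levels):
    """Send a sample to the secondary shunting node directly if it is high priority.

    Returns the name of the node that received the sample.
    """
    if sample["level"] in high_priority_levels:
        # Step 610: high-priority data skips the primary shunting node.
        secondary_inbox.append(sample)
        return "secondary"
    primary_queue.append(sample)
    return "primary"
```

With `high_priority_levels={"available_area"}`, data from the available area collector of financial resource domain A goes straight to the secondary shunting node, while virtual machine data waits in the primary queue.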

In the method provided by the embodiments of the present disclosure, step 533 further includes:

if no feedback data is obtained in the response message, confirming that an abnormality has occurred; and if feedback data is obtained in the response message, confirming that no abnormality has occurred.

Further, the implementation of the present disclosure is described with reference to FIG. 7, which is a schematic overall flowchart of a method for monitoring a service resource domain according to an embodiment of the present disclosure, and specifically includes steps 710 to 780:

The method for monitoring service resources provided by the embodiment of the present disclosure is applied to a distributed monitoring platform that includes a distributed monitoring platform peer-to-peer center. Taking one service resource domain as an example, the distributed monitoring platform peer-to-peer center includes a primary shunting node, a secondary shunting node, a parallel data processing node and a fault collection node.

Before the service resource domain is monitored, the monitoring data of the service resource domain needs to be collected, and the collection of the monitoring data is based on a hierarchical structure. Taking financial resource domain A as an example, financial resource domain A adopts a virtualization architecture and is divided into an available area collector, a fault domain collector, a cabinet collector, a server collector and a virtual machine collector, where the available area collector has high priority and the virtual machine collector has low priority.

Step 710: determining financial resource domain A and the monitoring data of the service resources in financial resource domain A.

Step 720: calling the primary shunting node to receive the monitoring data, and adding the monitoring data to the high-speed processing queue of the primary shunting node.

Step 730: calling the secondary shunting node to perform position hash processing on the monitoring data in the high-speed processing queue, and determining the corresponding first parallel data processing node.

Step 740: uniformly analyzing, by the first parallel data processing node, the monitoring data sent by each level of service resource domain A, and determining whether the monitoring data of each level is abnormal, so that the fault can be located accurately.

Step 750: if the monitoring data is abnormal, reporting the monitoring data to the fault aggregation node.

Step 760: calling the fault aggregation node to send an abnormality query message to service resource domain A corresponding to the monitoring data, and determining whether service resource domain A is abnormal according to the returned response message.

Step 770: if the response message contains no feedback data, confirming that service resource domain A is abnormal.

Step 780: if the response message contains feedback data, confirming that service resource domain A is not abnormal.
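For illustration only, the flow of steps 720 to 780 may be sketched as below. The threshold rule in `is_abnormal`, the CRC32 position hash, the node count, and the `probe` callback (standing in for the abnormality query message) are hypothetical choices for this sketch and are not taken from the present disclosure.

```python
import queue
import zlib
from typing import Callable, Dict, Iterable, Optional

NODE_COUNT = 4  # assumed number of parallel data processing nodes


def pick_node(position: str) -> int:
    # Position hash; CRC32 is an illustrative stand-in for the unspecified hash.
    return zlib.crc32(position.encode("utf-8")) % NODE_COUNT


def is_abnormal(record: dict) -> bool:
    # Assumed rule: a metric above its threshold counts as abnormal.
    return record["value"] > record["threshold"]


def run_pipeline(
    records: Iterable[dict],
    probe: Callable[[str], Optional[dict]],
) -> Dict[str, bool]:
    """Return {domain: confirmed_abnormal} for every domain that was reported."""
    fast_queue = queue.Queue()          # high-speed processing queue (step 720)
    for record in records:
        fast_queue.put(record)

    reported = []                       # what reaches the fault aggregation node
    while not fast_queue.empty():
        record = fast_queue.get()
        node = pick_node(record["position"])        # step 730
        if is_abnormal(record):                     # step 740, run on that node
            reported.append((node, record["domain"]))  # step 750

    # Steps 760 to 780: query each reported domain; a reply without feedback
    # data (None here) confirms the abnormality, any feedback clears it.
    return {domain: probe(domain) is None for _, domain in reported}
```

The second check through `probe` reflects the point of steps 760 to 780: abnormal monitoring data alone does not mark the domain as abnormal until the domain fails to answer the query.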

Fig. 8 is an overall flow diagram of a special case of a method for monitoring a service resource domain according to an embodiment of the present disclosure. In this special case, if monitoring data is marked as high priority while the service resources are being monitored (for example, the availability zone collector in financial resource domain A is high priority), steps 810 to 870 are executed:

Step 810: determining the monitoring data corresponding to the high-priority availability zone collector in financial resource domain A.

Step 820: calling the secondary shunting node to perform position hash processing on the monitoring data corresponding to the high-priority availability zone collector, and determining the corresponding second parallel data processing node.

Step 830: calling the secondary shunting node to send the monitoring data to the second parallel data processing node, and determining, by the second parallel data processing node, whether service resource domain A is abnormal based on the monitoring data.

Step 840: if the monitoring data is abnormal, reporting the monitoring data to the fault aggregation node.

Step 850: calling the fault aggregation node to send an abnormality query message to service resource domain A corresponding to the monitoring data, and determining whether service resource domain A is abnormal according to the returned response message.

Step 860: if the response message contains no feedback data, confirming that service resource domain A is abnormal.

Step 870: if the response message contains feedback data, confirming that service resource domain A is not abnormal.

Based on any of the foregoing embodiments, fig. 9 is a schematic structural diagram of a device for monitoring service resources provided by the embodiments of the present disclosure. The device is disposed on a distributed monitoring platform, where the distributed monitoring platform includes at least one distributed monitoring platform peer-to-peer center, and each distributed monitoring platform peer-to-peer center corresponds to at least one service resource domain.

The device specifically comprises:

A determining module 910, configured to determine a service resource domain to be monitored.

A receiving module 920, configured to receive, through the distributed monitoring platform peer-to-peer center, the monitoring data of the service resources sent by the service resource domain, and determine whether the service resource domain is abnormal according to the monitoring data.

A conversion module 930, configured to call the distributed monitoring platform peer-to-peer center to convert the abnormal monitoring data into abnormal alarm data when it is determined that the service resource domain is abnormal.

The device for monitoring service resources provided by the present disclosure first determines a service resource domain to be monitored, which includes all service resources in that domain. It then receives, through a distributed monitoring platform peer-to-peer center in the distributed monitoring platform, the monitoring data of the service resources sent by the service resource domain, and calls the distributed monitoring platform peer-to-peer center to determine whether the service resource domain is abnormal according to the monitoring data. Because each distributed monitoring platform peer-to-peer center corresponds to at least one service resource domain, determining abnormality through the peer-to-peer centers allows a massive number of service resource domains to be monitored. When the service resource domain is determined to be abnormal, the distributed monitoring platform peer-to-peer center is called to convert the abnormal monitoring data into abnormal alarm data. By establishing a distributed monitoring platform, the method and the device achieve full-coverage monitoring of multiple service resources.
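The conversion performed by the conversion module 930 is not detailed in the present disclosure. As a minimal sketch under stated assumptions, it could resemble the following, where the field names, the severity rule (critical at 1.5 times the threshold), and the message format are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MonitoringData:
    domain: str       # service resource domain that sent the data
    resource: str     # service resource the metric belongs to
    metric: str
    value: float
    threshold: float  # assumed per-metric limit


@dataclass(frozen=True)
class AlarmData:
    domain: str
    resource: str
    severity: str
    message: str
    raised_at: str    # ISO 8601 timestamp


def to_alarm(data: MonitoringData, now: Optional[datetime] = None) -> AlarmData:
    """Convert one abnormal monitoring record into an alarm record."""
    now = now or datetime.now(timezone.utc)
    # Assumed severity rule; the disclosure does not define alarm levels.
    severity = "critical" if data.value >= 1.5 * data.threshold else "warning"
    message = f"{data.metric}={data.value} exceeds {data.threshold} on {data.resource}"
    return AlarmData(data.domain, data.resource, severity, message, now.isoformat())
```

Keeping the alarm record separate from the monitoring record lets downstream alerting consume a stable format even if the collectors change what they report.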

Based on any of the above embodiments, the apparatus further includes a new module configured to:

Based on any of the above embodiments, the apparatus further comprises:

A generation module, configured to generate a random flow reservation table based on the data receiving index of the distributed monitoring platform peer-to-peer center, wherein the random flow reservation table comprises the time and the sending data volume of the service resource domain.

A dividing module, configured to send the random flow reservation table to the service resource domain, and call the service resource domain to hierarchically divide the service resources in the service resource domain according to the time in the random flow reservation table and the sending data volume of the service resource domain, so as to determine each level of the service resource domain, wherein the levels of the service resource domain have a priority order.
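The present disclosure does not state how the random flow reservation table is computed. One possible sketch of the generation step only, assuming the data receiving index is expressed as a per-slot byte capacity and that time is divided into a fixed number of slots per cycle, is given below; the function name, the slot model, and the table layout are hypothetical.

```python
import random
from collections import Counter


def build_reservation_table(domains, slots, capacity_per_slot, seed=None):
    """Assign each service resource domain a sending slot and a data volume.

    domains: names of the service resource domains served by one center
    slots: number of time slots per reporting cycle (assumption)
    capacity_per_slot: bytes the center can receive per slot (assumption)
    """
    rng = random.Random(seed)
    order = list(domains)
    rng.shuffle(order)  # random order, so no domain always gets the first slot

    # Round-robin over the shuffled order spreads domains evenly across slots.
    slot_of = {domain: i % slots for i, domain in enumerate(order)}
    load = Counter(slot_of.values())

    # Domains sharing a slot split that slot's capacity equally.
    return {
        domain: {"slot": slot, "max_bytes": capacity_per_slot // load[slot]}
        for domain, slot in slot_of.items()
    }
```

Under these assumptions the total volume reserved in any slot never exceeds the center's capacity, which is one way to spread the monitoring data evenly over time.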

Based on any of the above embodiments, the distributed monitoring platform peer-to-peer center includes a primary shunting node, a secondary shunting node, and a parallel data processing node.

The receiving module 920 specifically includes:

An adding unit, configured to call the primary shunting node to receive the monitoring data, and add the monitoring data to a high-speed processing queue in the primary shunting node.

A processing unit, configured to call the secondary shunting node to perform position hash processing on the monitoring data in the high-speed processing queue, and determine the corresponding first parallel data processing node.

A judging unit, configured to call the secondary shunting node to send the monitoring data to the first parallel data processing node, and determine, by the first parallel data processing node, whether the service resource domain is abnormal based on the monitoring data.

Based on any of the above embodiments, the distributed monitoring platform peer-to-peer center further comprises a fault aggregation node.

The judging unit includes:

An analysis subunit, configured to analyze the monitoring data through the first parallel data processing node and determine whether the monitoring data is abnormal.

A reporting subunit, configured to report the monitoring data to the fault aggregation node if the monitoring data is abnormal.

A judging subunit, configured to send, through the fault aggregation node, an abnormality query message to the service resource domain corresponding to the monitoring data, and determine whether the service resource domain is abnormal according to the returned response message.

Based on any of the above embodiments, the analysis subunit is specifically configured to:

the judging subunit is specifically configured to:

Based on any of the above embodiments, the receiving module 920 further includes:

A sending unit, configured to directly send the monitoring data corresponding to the high-priority service resources to the secondary shunting node.

An acquisition unit, configured to call the secondary shunting node to perform position hash processing on the monitoring data corresponding to the high-priority service resources, to obtain a second parallel data processing node.

A calling unit, configured to call the secondary shunting node to send the monitoring data to the second parallel data processing node, and determine, by the second parallel data processing node, whether the service resource domain is abnormal based on the monitoring data.

Based on any of the above embodiments, when determining whether the level of the service resource domain is abnormal according to the returned response message, the judging subunit is specifically configured to:

if the response message contains no feedback data, confirm that an abnormality has occurred.

Fig. 10 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 10, the electronic device may include a processor 1001, a communication interface (Communications Interface) 1002, a memory 1003, and a communication bus 1004, where the processor 1001, the communication interface 1002, and the memory 1003 communicate with each other through the communication bus 1004. The processor 1001 can call logic instructions in the memory 1003 to execute a method for monitoring service resources. The method comprises: determining a service resource domain to be monitored; receiving, through the distributed monitoring platform peer-to-peer center, monitoring data of the service resources sent by the service resource domain, and determining whether the service resource domain is abnormal according to the monitoring data; and calling the distributed monitoring platform peer-to-peer center to convert the abnormal monitoring data into abnormal alarm data when the service resource domain is determined to be abnormal.

Further, the logic instructions in the memory 1003 described above may be implemented in the form of software functional units and, when sold or used as a separate product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

In another aspect, the disclosure also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the method of monitoring a service resource provided by the methods described above, the method comprising determining a service resource domain to be monitored, receiving, by a distributed monitoring platform peer center, monitoring data of the service resource sent by the service resource domain, and determining whether an anomaly has occurred in the service resource domain according to the monitoring data, and invoking the distributed monitoring platform peer center to convert the monitoring data of the anomaly to anomaly alarm data if it is determined that the anomaly has occurred in the service resource domain.

In yet another aspect, the disclosure further provides a non-transitory computer readable storage medium having stored thereon a computer program which when executed by a processor performs the method of monitoring service resources provided above, the method comprising determining a service resource domain to be monitored, receiving, by the distributed monitoring platform peer-to-peer center, monitoring data of service resources sent by the service resource domain, and determining whether an abnormality occurs in the service resource domain according to the monitoring data, and calling the distributed monitoring platform peer-to-peer center to convert the monitoring data that is abnormal into abnormality alarm data if it is determined that an abnormality occurs in the service resource domain.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that the foregoing embodiments are merely illustrative of the technical solutions of the present disclosure, and not limiting thereof, and although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments or equivalents may be substituted for some of the technical features thereof, and these modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure in essence.

Claims

1. The method for monitoring the service resources is characterized by being applied to a distributed monitoring platform, wherein the distributed monitoring platform comprises at least one distributed monitoring platform peer-to-peer center, and each distributed monitoring platform peer-to-peer center corresponds to at least one service resource domain;

the method specifically comprises the following steps:

Determining a service resource domain to be monitored;

under the condition that the service resource domain is determined to be abnormal, calling the distributed monitoring platform peer-to-peer center to convert the abnormal monitoring data into abnormal alarm data;

before the service resource domain to be monitored is determined, the method comprises the following steps:

Generating a random flow reservation table based on data receiving indexes of a distributed monitoring platform peer-to-peer center, wherein the random flow reservation table comprises time and the sending data volume of a service resource domain;

Transmitting the random flow reservation table to the service resource domain, calling the service resource domain to hierarchically divide the service resources in the service resource domain according to the time in the random flow reservation table and the sending data volume of the service resource domain, and determining each level of the service resource domain, wherein the levels of the service resource domain have a priority order, and the random flow reservation table balances the monitoring data over time.

2. The method for monitoring service resources according to claim 1, wherein the method further comprises:

3. The method for monitoring service resources according to claim 1, wherein the distributed monitoring platform peer-to-peer center comprises a primary shunting node, a secondary shunting node, and a parallel data processing node;

4. The method for monitoring service resources according to claim 3, wherein the distributed monitoring platform peer-to-peer center further comprises a fault aggregation node;

5. The method for monitoring service resources according to claim 4, wherein the analyzing, by the first parallel data processing node, the monitoring data to determine whether an abnormality occurs in the monitoring data includes:

6. The method for monitoring service resources according to claim 3, wherein the receiving, by the distributed monitoring platform peer-to-peer center, monitoring data of service resources sent by the service resource domain, and determining whether an abnormality occurs in the service resource domain according to the monitoring data, further comprises:

7. The method for monitoring service resources according to claim 4, wherein the determining whether the service resource domain is abnormal according to the returned response message comprises:

8. The device for monitoring the service resources is characterized by being arranged on a distributed monitoring platform, wherein the distributed monitoring platform comprises at least one distributed monitoring platform peer-to-peer center, and each distributed monitoring platform peer-to-peer center corresponds to at least one service resource domain;

The device specifically comprises:

A conversion module, configured to call the distributed monitoring platform peer-to-peer center to convert the abnormal monitoring data into abnormal alarm data when it is determined that the service resource domain is abnormal;

A generation module, configured to generate a random flow reservation table based on the data receiving index of the distributed monitoring platform peer-to-peer center, wherein the random flow reservation table comprises time and the sending data volume of a service resource domain; send the random flow reservation table to the service resource domain; and call the service resource domain to hierarchically divide the service resources in the service resource domain according to the time in the random flow reservation table and the sending data volume of the service resource domain, so as to determine each level of the service resource domain, wherein the levels of the service resource domain have a priority order, and the random flow reservation table balances the monitoring data over time.

9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method for monitoring service resources according to any of claims 1 to 7.

10. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for monitoring service resources according to any of claims 1 to 7.

11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for monitoring service resources according to any of claims 1 to 7.