
CN106649055A - Domestic CPU (central processing unit) and operating system based software and hardware fault alarming system and method - Google Patents


Info

Publication number
CN106649055A
Authority
CN
China
Prior art keywords
module
information
fault message
failure
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710015718.1A
Other languages
Chinese (zh)
Inventor
朱宪
李超
孙元田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Cloud Service Information Technology Co Ltd
Original Assignee
Shandong Inspur Cloud Service Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Cloud Service Information Technology Co Ltd
Priority to CN201710015718.1A
Publication of CN106649055A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a software and hardware fault alarm system and method based on a domestic CPU (central processing unit) and operating system. The system comprises a fault detection platform and an alarm service platform. The fault detection platform runs on a host equipped with a domestic CPU and operating system; it shields the differences among domestic CPUs and operating systems, performs fault detection on the host's software and hardware information, and sends the corresponding fault information to the alarm service platform. The alarm service platform receives the fault information sent by the fault detection platform, displays it, and raises an alarm. Compared with the prior art, the system and method are efficient, simple, and easily extensible. New fault detection items can be added through simple configuration, so the system and method are highly practical and easy to promote.

Description

A software and hardware fault alarm system and method based on a domestic CPU and operating system
Technical field
The present invention relates to the field of computer technology, and specifically to a highly practical software and hardware fault alarm system and method based on a domestic CPU and operating system.
Background art
With the development of enterprise informatization and e-government in China, large and medium-sized enterprises and government agencies increasingly rely on information systems to run their business, and the operation and maintenance of those systems has become correspondingly more important. Monitoring host software and hardware information is an important part of information system operation and maintenance.
Servers and clients based on domestic CPUs and operating systems are now deployed at scale in some key fields, but compared with the mainstream x86 environment, the stability of their software and hardware is still being improved. This places higher demands on monitoring software and hardware fault information for hosts based on domestic CPUs and operating systems, yet no mature, efficient software and hardware fault alarm system or method currently exists for the safe and reliable environment. In view of this, a software and hardware fault alarm system and method based on a domestic CPU and operating system is provided.
Summary of the invention
The technical task of the present invention is to address the above shortcomings by providing a highly practical software and hardware fault alarm system and method based on a domestic CPU and operating system.
A software and hardware fault alarm system based on a domestic CPU and operating system comprises a fault detection platform, which runs on a host equipped with a domestic CPU and operating system, shields the differences among domestic CPUs and operating systems, performs fault detection on the host's software and hardware information, and sends the corresponding fault information to the alarm service platform;
and an alarm service platform, which receives the fault information sent by the fault detection platform, displays it, and raises an alarm.
The fault detection platform consists of a core scheduling module, an information collection module, a fault analysis module, a configuration module, a communication module, a third-party fault information access module, and a preprocessing module. The core scheduling module is responsible for overall flow scheduling and processing. The configuration module is responsible for policy configuration: it receives configuration information forwarded by the communication module, including fault detection policies, fault preprocessing mechanisms, and the fault index table, applies updates promptly, and keeps its configuration synchronized with the alarm service platform. The information collection module collects the host's software and hardware information in real time. The fault analysis module detects, analyzes, and filters fault information as it arrives. The third-party fault information access module receives fault information from third-party applications as it arrives. For fault information that matches a preprocessing mechanism, the preprocessing module immediately executes a preset shell script or handler program to resolve the fault, then continues to track the fault after handling completes; if the fault persists, it sends the fault information and the handling information to the alarm service platform. The communication module packs detected fault information as {key, value} pairs and sends it to the alarm service platform over a socket connection.
The host software and hardware information collected by the information collection module includes: real-time monitoring data for the host's file system, CPU load, memory load, swap load, disk I/O load, network interface traffic, processes, and services; the host's hardware status, including machine temperature and fan status, obtained in real time through IPMI; the status of middleware installed on the host, monitored through the JMX protocol; and database operating information, obtained by accessing the databases on the host through a database connection plug-in.
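As a rough illustration of the collection step, the sketch below gathers a few of the listed metrics using only the Python standard library. It is not the patent's implementation: the function name `collect_host_metrics`, the dictionary keys, and the reliance on `/proc/meminfo` (Linux only) are assumptions made for this example, and IPMI, JMX, and database collection are left out.

```python
import os
import shutil
import time


def collect_host_metrics(path="/"):
    """Return a snapshot of a few host metrics (hypothetical key names)."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    snapshot = {
        "timestamp": time.time(),
        "fs_used_percent": round(used_percent, 2),
    }

    # os.getloadavg() exists only on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_1min"] = os.getloadavg()[0]
        except OSError:
            pass

    # /proc/meminfo is Linux-specific; values are reported in kB.
    try:
        meminfo = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, _, rest = line.partition(":")
                fields = rest.split()
                if fields:
                    meminfo[key] = int(fields[0])
        total = meminfo.get("MemTotal")
        available = meminfo.get("MemAvailable")
        if total and available is not None:
            snapshot["mem_used_percent"] = round(
                100.0 * (total - available) / total, 2)
        swap_total = meminfo.get("SwapTotal")
        swap_free = meminfo.get("SwapFree")
        if swap_total and swap_free is not None:
            snapshot["swap_used_percent"] = round(
                100.0 * (swap_total - swap_free) / swap_total, 2)
    except (OSError, ValueError):
        pass  # not Linux, or an unexpected file layout

    return snapshot
```

Metrics that the platform cannot provide are simply omitted from the snapshot, so a caller should treat every key other than `timestamp` and `fs_used_percent` as optional.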
The fault analysis module analyzes in real time the host software and hardware information obtained by the information collection module. It sets initial multi-level thresholds for each type of software and hardware information, performs classified statistical analysis on the host's software and hardware information, and dynamically adjusts the thresholds. Specifically, when the fault analysis module detects that a monitored value for some type of software or hardware information exceeds its threshold, it continues to monitor that type of information for a short period; if the monitored value still exceeds the threshold, fault information is generated, otherwise the reading is treated as a false alarm. The fault analysis module divides fault information into four levels according to severity, importance, and urgency: slight fault, minor fault, major fault, and critical fault. It sets the fault level of a given piece of fault information according to the importance of its category and the threshold level exceeded. Correspondingly, the alarm service platform defines four alarm levels: prompt alarm, minor alarm, major alarm, and critical alarm.
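The threshold check with short-term reconfirmation can be sketched as follows. This is only one possible reading of the text: the level names, the rule that every follow-up sample must still exceed a threshold, and the choice to report the lowest level observed are assumptions for illustration, and the weighting by category importance is omitted.

```python
import time
from typing import Callable, Optional, Sequence

# Hypothetical names for the four fault levels, lowest to highest.
LEVELS = ("slight", "minor", "major", "critical")


def threshold_level(value: float, thresholds: Sequence[float]) -> Optional[str]:
    """Return the highest level whose threshold the value exceeds, or None.

    `thresholds` holds one ascending limit per entry in LEVELS.
    """
    level = None
    for name, limit in zip(LEVELS, thresholds):
        if value > limit:
            level = name
    return level


def confirm_fault(sample: Callable[[], float],
                  thresholds: Sequence[float],
                  retries: int = 3,
                  interval: float = 0.0) -> Optional[str]:
    """Re-sample a metric before reporting a fault.

    Returns a level only if the first reading and all follow-up readings
    exceed a threshold; a single normal reading marks a false alarm.
    """
    first = threshold_level(sample(), thresholds)
    if first is None:
        return None
    seen = [first]
    for _ in range(retries):
        time.sleep(interval)
        level = threshold_level(sample(), thresholds)
        if level is None:
            return None  # value dropped back: treat as a false alarm
        seen.append(level)
    # Conservative choice: report the lowest level seen across all samples.
    return min(seen, key=LEVELS.index)
```

For example, with thresholds `[70, 80, 90, 95]` a sustained series of readings 95, 96, 97, 98 is reported as `"major"`, while a single spike to 99 followed by 10 is discarded.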
The preprocessing module handles fault information that matches a preprocessing mechanism. Specifically, fault information that meets the preprocessing conditions is first classified, and each category can have a corresponding shell script or handler program configured in advance. After the fault analysis module detects fault information, the core scheduling module first checks whether a corresponding preprocessing mechanism exists: if so, the fault information is passed to the preprocessing module for handling; otherwise it is passed to the communication module.
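A minimal sketch of this dispatch decision is shown below, assuming a dictionary that maps a fault category to a command line. The `HANDLERS` table, the `disk_full` category, the `still_present` callback, and the returned status strings are all hypothetical; the handler here is a harmless Python one-liner standing in for a real shell script.

```python
import subprocess
import sys

# Hypothetical handler table: fault category -> command to run.
HANDLERS = {
    "disk_full": [sys.executable, "-c", "print('cleaning temp files')"],
}


def dispatch_fault(fault, handlers, still_present, send):
    """Route one fault: preprocess it if a handler exists, else forward it.

    fault         -- dict with at least a "category" key
    handlers      -- mapping from category to a command (list of strings)
    still_present -- callable(fault) -> bool, re-checks the fault afterwards
    send          -- callable(fault) that forwards to the communication module
    """
    command = handlers.get(fault["category"])
    if command is None:
        send(fault)  # no preprocessing mechanism: forward directly
        return "forwarded"

    result = subprocess.run(command, capture_output=True, text=True, timeout=60)
    if still_present(fault):
        # Forward the fault together with what was already tried.
        send({**fault, "handler_exit_code": result.returncode})
        return "forwarded_after_handling"
    return "resolved"
```

Passing `still_present` and `send` as callables keeps the sketch independent of how the fault is re-checked and transmitted.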
The third-party fault information access module accepts third-party application fault information in two ways. In the first, the third-party application exports fault information as encrypted XML files into a designated directory; the access module scans this directory periodically, parses the encrypted XML files to obtain the fault information, and passes it to the communication module, which forwards it to the alarm service platform. In the second, the third-party application interacts directly with the communication module of the alarm service platform over a socket connection, packing the fault data in XML or JSON format according to an agreed standard specification and sending it.
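The first access path (periodic directory scanning) might look like the sketch below. The XML layout with `<fault>` elements and the function name `scan_fault_directory` are assumptions, since the patent does not specify a schema, and decryption is deliberately left out: the files are assumed to be plain XML by the time they are parsed.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def scan_fault_directory(directory, seen):
    """Parse new *.xml files in `directory` and return their fault records.

    `seen` is a set of file names already processed; it is updated in place.
    Each <fault> element becomes a dict of its child tags (assumed layout).
    """
    faults = []
    for path in sorted(Path(directory).glob("*.xml")):
        if path.name in seen:
            continue
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            # Possibly still being written; retry on the next scan.
            continue
        seen.add(path.name)
        for node in root.iter("fault"):
            faults.append({child.tag: (child.text or "").strip()
                           for child in node})
    return faults
```

A caller would invoke this on a timer and hand each returned record to the communication module.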
The alarm service platform consists of a core scheduling module, a communication module, a configuration module, a front-end display module, an alarm module, and a linkage module, wherein:
The communication module receives in real time the fault information sent by the fault detection platform or by third-party applications and passes it to the core scheduling module;
The core scheduling module analyzes the fault information, generates the corresponding alarm information, and distributes it to the storage module, the front-end display module, and the alarm module;
The front-end display module presents alarm information on the front-end web page in two ways: it lists all currently active alarms in an alarm list, and it shows where a fault occurred by changing the endpoint marker on the network topology diagram, with different blinking indicator colors reflecting different alarm levels;
The storage module stores received alarm information in real time for querying, analysis, and statistics; after receiving alarm information, the alarm module raises alarms according to the configured alarm policy;
The alarm service platform is deployed separately on the internal and external networks. The alarm module of the internal-network alarm service platform passes alarm information to the linkage module, which encrypts it and transfers it to the linkage module of the external-network alarm service platform by SMS or by manually exporting and importing an encrypted XML file. After receiving the alarm information, the external-network linkage module decrypts it and passes it to the alarm module, which executes the corresponding alarm policy, so that the alarm notification is delivered through the external-network alarm service platform.
A software and hardware fault alarm method based on a domestic CPU and operating system comprises the following steps:
I. Fault detection is performed first. The fault detection process includes fault information collection, fault information analysis, fault information preprocessing, and fault information transmission, wherein:
Fault information collection means collecting the host's software and hardware information and operating status in real time by combining several methods (parsing system files, executing shell scripts, the JMX protocol, IPMI, and a database connection plug-in), while also collecting fault information from third-party applications on the host;
Fault information analysis means determining whether fault information is a false alarm and whether it can be preprocessed;
Fault information preprocessing means handling fault information that can be preprocessed; if the handling is unsuccessful, the fault information is transmitted;
II. Fault alarming is performed. This process includes receiving fault information, displaying fault information, and raising alarms.
The detailed fault detection process is:
1) The information collection module first collects the host's software and hardware information and operating status in real time. The fault analysis module analyzes this information in real time: when it detects that a monitored value for some type of software or hardware information exceeds its threshold, it continues to monitor that type of information for a short period; if the monitored value still exceeds the threshold, fault information is generated, the corresponding fault level is set, and step 2) is executed; otherwise the reading is treated as a false alarm. The third-party fault information access module monitors the designated directory in real time and parses the encrypted XML files in it to obtain fault information from third-party applications on the local host; as soon as fault information is detected, step 2) is executed;
2) The core scheduling module analyzes the fault information and checks whether a preprocessing mechanism exists for the fault; if so, step 3) is executed, otherwise step 4) is executed;
3) The core scheduling module passes the fault information to the preprocessing module, which executes the shell script or handler program corresponding to the fault and continues to track the fault after handling completes; if the same fault information still occurs, step 4) is executed, otherwise the process returns to step 1);
4) The core scheduling module passes the fault information to the communication module. The communication module packs the fault information as {key, value} pairs according to a defined rule (for example, in XML or JSON format), establishes a TCP connection with the communication module of the alarm service platform over a socket, and sends the packed data to it; after the acknowledgment is received, the process returns to step 1).
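The packing and transmission in step 4) can be illustrated with a small JSON-over-TCP sketch. The wire format is an assumption, not something the patent defines: here each message is a 4-byte big-endian length followed by UTF-8 JSON, and the receiver confirms with the three bytes `ACK`. All function names are hypothetical.

```python
import json
import socket
import struct


def pack_fault(fault: dict) -> bytes:
    """Serialize key/value fault data as length-prefixed UTF-8 JSON."""
    body = json.dumps(fault, ensure_ascii=False).encode("utf-8")
    return struct.pack("!I", len(body)) + body


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def read_fault(sock: socket.socket) -> dict:
    """Receive one length-prefixed JSON message (alarm service side)."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def send_fault(sock: socket.socket, fault: dict) -> bool:
    """Send one fault and wait for the assumed 3-byte b"ACK" confirmation."""
    sock.sendall(pack_fault(fault))
    return recv_exact(sock, 3) == b"ACK"
```

On the detection side, `sock` would come from `socket.create_connection((host, port))`; the length prefix is there because TCP is a byte stream and does not preserve message boundaries on its own.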
The detailed fault alarming process is:
a. The communication module listens continuously for fault information. When it receives fault information sent by the fault detection platform or a third-party application, it passes the parsed fault information to the core scheduling module, and step b is executed;
b. After receiving the fault information, the core scheduling module parses it, assigns the corresponding alarm category and alarm level, generates alarm information, and distributes it to the front-end display module, the alarm module, and the storage module; step c is executed;
c. After receiving the alarm information, the storage module saves it to the database for querying, analysis, and statistics. After receiving the alarm information, the front-end display module presents it in the front-end interface in two ways: first, it lists all currently active alarms in an alarm list that supports querying by relevant conditions; second, it shows where the fault occurred by changing the endpoint marker on the network topology diagram, with different blinking indicators denoting different alarm levels. After receiving the alarm information, the alarm module analyzes the current alarm mode: if the network is isolated and the alarm must be sent through the external-network alarm service platform, step d is executed; otherwise step e is executed;
d. The alarm module sends the alarm information to the linkage module. The linkage module packs and encrypts it in the prescribed format and sends it to the linkage module of the external-network alarm service platform by SMS or by exporting an encrypted XML file. After receiving the encrypted alarm information, the external-network linkage module decrypts it to obtain the alarm information and passes it to the alarm module of the external-network alarm service platform, which raises the alarm according to the corresponding alarm policy; the process then returns to step a;
e. The alarm module raises the alarm according to the alarm policy corresponding to the alarm information; the process then returns to step a.
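Steps b and c (generate an alarm, store it, and fan it out to the other modules) can be sketched with an in-memory SQLite database. The table layout, the fault-to-alarm level mapping, and the subscriber callbacks are assumptions for this example; the network-isolation branch of steps d and e is not modeled.

```python
import sqlite3
import time

# Hypothetical mapping from fault level to alarm level.
ALARM_LEVEL = {"slight": "prompt", "minor": "minor",
               "major": "major", "critical": "critical"}


def open_store(path=":memory:"):
    """Open the alarm store and create its (assumed) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        "id INTEGER PRIMARY KEY, host TEXT, category TEXT, level TEXT, "
        "message TEXT, created REAL, active INTEGER)")
    return conn


def handle_fault(conn, fault, subscribers):
    """Turn a fault into an alarm, persist it, and notify each subscriber."""
    alert = {
        "host": fault["host"],
        "category": fault.get("category", "unknown"),
        "level": ALARM_LEVEL.get(fault.get("level"), "prompt"),
        "message": fault.get("message", ""),
        "created": time.time(),
    }
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO alerts (host, category, level, message, created, active) "
            "VALUES (?, ?, ?, ?, ?, 1)",
            (alert["host"], alert["category"], alert["level"],
             alert["message"], alert["created"]))
    for notify in subscribers:  # e.g. display module, alarm module
        notify(alert)
    return alert


def active_alerts(conn):
    """Return the rows behind the 'all currently active alarms' list."""
    return conn.execute(
        "SELECT host, category, level, message FROM alerts "
        "WHERE active = 1 ORDER BY id").fetchall()
```

The display and alarm modules appear here only as callables in `subscribers`, which keeps the storage step testable on its own.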
The software and hardware fault alarm system and method based on a domestic CPU and operating system of the present invention has the following advantages:
(1) Deploying the fault detection platform on the safe and reliable host shields the differences among domestic CPUs and operating systems. The fault detection platform runs on a host based on a domestic CPU and operating system and consists of an information collection module, a fault analysis module, a configuration module, a communication module, a third-party fault information access module, and a preprocessing module. The information collection module combines several methods (parsing system files, executing shell scripts, the JMX protocol, IPMI, and a safe and reliable database connection plug-in) to collect, in real time, the software and hardware information and operating status of the safe and reliable host and the operating information of applications such as databases and middleware. The fault analysis module analyzes the collected monitoring information in real time to find fault information promptly, and filters faults to reject false alarms. The fault detection platform thus monitors and analyzes the host in real time and finds host fault information promptly and accurately, ensuring that fault information is obtained in a timely and accurate manner.
(2) Third-party application fault information can be accepted either through interaction with the fault detection platform or by direct transmission to the alarm service platform, ensuring that the fault information obtained is complete.
(3) A fault information preprocessing mechanism is supported. For fault information that meets the preprocessing conditions, a preset script or handler program is executed immediately to resolve the fault, and the fault is then tracked; if the problem is found to persist, the fault information and the handling performed are sent to the alarm service platform immediately.
(4) Alarm information is displayed in real time in the web front end, and the endpoint marker on the network topology diagram can be changed to show where a fault occurred, with different blinking indicator colors reflecting different alarm levels.
(5) Flexibly defined alarm policies are supported, with multiple alarm channels such as email, SMS, WeChat, and telephone. For example, the policy can specify that an email and a WeChat message are sent to the user immediately after an alarm is generated; if the alarm is still unacknowledged after a specified time, an SMS is sent; and if it is still unacknowledged after a further period, the user is telephoned. Alarm policy escalation is also supported: the policy specifies who is notified when the alarm first occurs, and if the alarm has not been handled within a specified time, the message can be sent to higher-level managers.
(6) Alarm transmission under network isolation is supported. Alarm service platforms are deployed separately on the internal and external networks, and alarm information is transferred between them by encrypted SMS or by manually exporting and importing encrypted XML files. This improves the adaptability and flexibility of the alarm service platform across environments, making it highly practical, widely applicable, and easy to promote.
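The staged notification policy described in advantage (5) can be expressed as a small table of time offsets and channels. The sketch below is illustrative only: the delays of 300 and 900 seconds, the channel names, and the `due_channels` function are hypothetical, and actual message delivery is out of scope.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EscalationStep:
    after_seconds: float        # delay since the alarm was raised
    channels: Tuple[str, ...]   # channels to use once the delay has passed


# Hypothetical policy: mail and WeChat at once, SMS after 5 minutes,
# a phone call after 15 minutes if the alarm is still unacknowledged.
POLICY = (
    EscalationStep(0, ("mail", "wechat")),
    EscalationStep(300, ("sms",)),
    EscalationStep(900, ("phone",)),
)


def due_channels(policy, elapsed, acknowledged, already_sent=()):
    """Return the channels that should fire now for one alarm.

    elapsed      -- seconds since the alarm was generated
    acknowledged -- True once a user has confirmed the alarm
    already_sent -- channels that have already been used for this alarm
    """
    if acknowledged:
        return []
    due = []
    for step in policy:
        if elapsed >= step.after_seconds:
            for channel in step.channels:
                if channel not in already_sent and channel not in due:
                    due.append(channel)
    return due
```

Escalation to higher-level managers could be modeled the same way by attaching a recipient list to each step.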
Description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is the end host monitoring architecture diagram.
Figure 2 is the logical structure diagram of the fault detection program.
Figure 3 is the logical structure diagram of the alarm service platform.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the invention.
As shown in Figures 1, 2, and 3, the software and hardware fault alarm system based on a domestic CPU and operating system described in this patent comprises two parts: a fault detection platform running on a host based on a domestic CPU and operating system, and an alarm service platform.
The fault detection platform runs on a host equipped with a domestic CPU and operating system, shields the differences among domestic CPUs and operating systems, performs fault detection on the host's software and hardware information, and sends the corresponding fault information to the alarm service platform;
The alarm service platform receives the fault information sent by the fault detection platform, displays it, and raises an alarm.
The present invention shields the differences among domestic CPUs and operating systems by deploying the fault detection platform on the safe and reliable host. The fault detection platform is a computer program that runs on a host based on a domestic CPU and operating system; the fault detection program shown in Figure 1 is this platform. It consists of a core scheduling module, an information collection module, a fault analysis module, a configuration module, a communication module, a third-party fault information access module, and a preprocessing module. The core scheduling module is responsible for overall flow scheduling and processing; the configuration module is responsible for policy configuration; the information collection module collects all kinds of host software and hardware information in real time; the fault analysis module detects, analyzes, and filters fault information as it arrives; and the third-party fault information access module receives third-party application fault information as it arrives. For fault information that matches a preprocessing mechanism, the preprocessing module immediately executes a preset shell script or handler program to resolve the fault, then continues to track the fault after handling completes; if the fault persists, it sends the fault information and the handling information to the alarm service platform. The communication module packs detected fault information as {key, value} pairs according to a defined rule (for example, in XML or JSON format) and sends it to the alarm service platform over a socket connection.
The alarm service platform consists of a core scheduling module, a communication module, a configuration module, a front-end display module, an alarm module, and a linkage module. The communication module receives in real time the fault information sent by the fault detection program or by third-party applications and passes it to the core scheduling module. The core scheduling module analyzes the fault information, generates the corresponding alarm information, and distributes it to the storage module, the front-end display module, and the alarm module. The front-end display module presents alarm information on the front-end web page in two ways, and can change the endpoint marker on the network topology diagram to show where a fault occurred, with different blinking indicator colors reflecting different alarm levels. The storage module stores received alarm information in real time for querying, analysis, and statistics. After receiving alarm information, the alarm module raises alarms according to the configured alarm policy. For network-isolated environments, alarm service platforms can be deployed separately on the internal and external networks: the alarm module passes alarm information to the linkage module, which encrypts it and transfers it to the linkage module of the external-network alarm service platform by SMS or similar means; after receiving the alarm information, the external-network linkage module passes it to the alarm module, which executes the corresponding alarm policy, so that the alarm notification is delivered through the external-network alarm service platform.
Because of the particular nature of safe and reliable software and hardware platforms, fault information is detected by deploying a fault detection program on the safe and reliable host. The fault detection program runs on a host based on a domestic CPU and operating system and consists of an information collection module, a fault analysis module, a configuration module, a communication module, a third-party fault information access module, and a preprocessing module. The information collection module combines several methods, such as parsing system files and executing shell scripts, to monitor in real time the safe and reliable host's file system information, CPU load, memory load, swap load, disk I/O load, network interface traffic, and similar data; it obtains the host's hardware status (such as machine temperature and fan status) in real time through IPMI; it monitors Java applications, such as safe and reliable middleware installed on the host, through the JMX protocol; and it accesses the safe and reliable database on the host through an independently developed safe and reliable database connection plug-in to obtain database operating information. The fault analysis module analyzes in real time the host software and hardware information obtained by the information collection module, sets initial multi-level thresholds for each type of software and hardware information, performs classified statistical analysis on the host's software and hardware information, and dynamically adjusts the thresholds within a reasonable range.
When the fault analysis module detects that a monitored value for some type of software or hardware information exceeds its threshold, it continues to monitor that type of information for a short period; if the monitored value still exceeds the threshold, fault information is generated, otherwise the reading is treated as a false alarm. The method described in this patent divides fault information into four levels according to severity, importance, and urgency: slight fault, minor fault, major fault, and critical fault. The fault analysis module sets the fault level of a given piece of fault information according to the importance of its category and the threshold level exceeded. In addition, the fault detection program communicates with the server-side alarm service platform in real time. When communication between the alarm service platform and the fault detection program fails and still cannot be restored after a certain period, the alarm service platform raises a network interruption fault for the host on which the fault detection program runs and, based on the fault information received from that host before the interruption, intelligently decides whether to also raise fault information such as a host crash.
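The communication-timeout check on the alarm service side can be sketched as a simple heartbeat monitor. The class name `HeartbeatMonitor`, the per-host timestamp table, and the injectable clock are assumptions for illustration; the "intelligent" decision about whether the host has crashed is not modeled here.

```python
import time


class HeartbeatMonitor:
    """Track the last contact time per host and report silent hosts."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout   # seconds of silence tolerated
        self.clock = clock       # injectable clock, useful for testing
        self.last_seen = {}

    def beat(self, host):
        """Record that a message from `host` has just arrived."""
        self.last_seen[host] = self.clock()

    def silent_hosts(self):
        """Return hosts whose last message is older than the timeout."""
        now = self.clock()
        return sorted(host for host, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```

The alarm service would call `beat` whenever any message arrives from a detection program and periodically raise a network interruption fault for each host returned by `silent_hosts`.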
The system supports a preprocessing mechanism for fault alarm information. Drawing on years of operations and maintenance experience, non-critical fault information is catalogued, the faults that meet the conditions of the preprocessing mechanism are classified, and each class can be given a preset shell script or handler program. When fault information is detected, the core scheduling module first checks whether it qualifies for preprocessing. If it does, the fault information is handed to the preprocessing module, which automatically executes the corresponding shell script or handler program to resolve the fault as early as possible and then keeps monitoring the problem. If the fault persists after the handler has run, the fault information is sent to the alarm service platform together with the handling strategy already applied.
Third-party application alarm information can be accessed in two ways. In the first, the third-party application exports fault information in XML format to a specified directory; the third-party fault access module of the fault detection program monitors that directory in real time and parses the XML to obtain the third-party fault information. In the second, the application interacts directly with the alarm service platform over socket communication, packing and sending the data in XML or JSON format according to the established standard specification.
The alarm service platform raises alarms for fault information and consists of a core scheduling module, a communication module, a configuration module, a front-end display module, an alarm module, and a linkage module. The communication module receives in real time the fault information sent by the fault detection program or by third-party applications and passes it to the core scheduling module. The core scheduling module analyzes the fault information, generates the corresponding alarm information, and distributes it to the storage module, the front-end display module, and the alarm module. The front-end display module presents alarm information on the front-end web page in two ways. The storage module stores received alarm information in real time for querying, analysis, and statistics. After receiving alarm information, the alarm module raises alarms according to the configured alarm policy.
Flexibly defined alarm policies are supported, along with multiple alarm channels such as email, SMS, WeChat, and telephone. For example, a policy can send email and WeChat messages to the user as soon as an alarm occurs, send an SMS if the alarm is still unacknowledged after a specified time, and telephone the user if it remains unacknowledged after a further period. Alarm policy escalation is also supported: the policy defines who is notified when the alarm first occurs and, if the alarm has not been handled within the specified time, the message can be sent to the next level of management.
Alarm information can also be delivered across isolated networks. Alarm service platforms are deployed on both the internal and the external network, and alarm information is passed between them by means such as encrypted SMS or manually imported and exported encrypted XML files. When the alarm service platform on the internal network receives alarm information, it raises alarms according to the alarm policy. When an alarm must be raised through the external network, the alarm content is encrypted and sent by SMS to the external alarm service platform, which decrypts the message, obtains the alarm information, and sends it to the relevant personnel according to the preset alarm policy.
Based on the above, the fault detection platform and the alarm service platform of the present invention are described in detail below.
I. Fault detection platform.
The fault detection platform runs on a host based on a domestic CPU and operating system, shields the differences among domestic CPUs and operating systems, and monitors fault information in real time. The fault detection program comprises a core scheduling module, an information collection module, a fault analysis module, a configuration module, a communication module, a third-party fault information access module, and a preprocessing module.
Core scheduling module. Responsible for scheduling the overall fault detection flow of the fault detection platform.
Configuration module. Receives configuration information sent through the communication module, such as the fault detection policy, the fault preprocessing mechanism, and the fault index table, applies updates promptly, and stays synchronized with the configuration of the alarm service platform.
Information collection module. Combines several methods, such as parsing system files and executing shell scripts, to monitor in real time the host's file system information, CPU load, memory load, swap load, disk I/O load, network interface traffic, process information, service information, and similar data. It obtains the host's hardware status (for example machine temperature and fan status) in real time through IPMI, monitors Java applications such as the secure and reliable middleware installed on the host through the JMX protocol, and accesses the secure and reliable database on the host through an independently developed secure and reliable database connection plug-in to obtain database runtime information.
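As an illustration of the "parsing system files" approach, the following is a minimal sketch that assumes the domestic operating system is Linux-based and exposes `/proc/meminfo` and `/proc/loadavg`; the patent does not name the files it reads, and the function names here are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_and_swap_usage(meminfo):
    """Return (memory_used_percent, swap_used_percent) from parsed meminfo."""
    total = meminfo["MemTotal"]
    # Fall back to MemFree if the kernel does not report MemAvailable.
    available = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    mem_pct = 100.0 * (total - available) / total
    swap_total = meminfo.get("SwapTotal", 0)
    swap_free = meminfo.get("SwapFree", 0)
    swap_pct = 100.0 * (swap_total - swap_free) / swap_total if swap_total else 0.0
    return mem_pct, swap_pct


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def read_text(path):
    """Read a system file; on a real host, path could be '/proc/meminfo'."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

On a Linux host, `memory_and_swap_usage(parse_meminfo(read_text("/proc/meminfo")))` would yield the two percentages that the fault analysis module compares with its thresholds. Hardware status via IPMI and Java application monitoring via JMX need external tooling and are outside this sketch.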
Fault analysis module. Analyzes in real time the host hardware and software information obtained by the information collection module, sets initial multi-level thresholds for each kind of information, performs classified statistical analysis of the host's hardware and software information, and dynamically adjusts the thresholds within a reasonable range. When the fault analysis module detects that the monitored value of some class of hardware or software information exceeds its configured threshold, it continues to monitor that class of information for a short period. If the monitored value still exceeds the threshold, fault information is generated; otherwise the reading is treated as a false positive. The method described in this patent divides fault information into four levels according to severity, importance, and urgency: slight fault, minor fault, major fault, and critical fault. The fault analysis module sets the fault level of each piece of fault information according to the importance of its category and the threshold level exceeded. In addition, the fault detection platform communicates in real time with the server-side alarm service platform. When communication between them fails and cannot be restored within a certain time, the alarm service platform raises a network interruption fault for the host on which the fault detection platform runs and, based on the fault information received from that host before the interruption, intelligently determines whether to also raise fault information such as a host crash.
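The threshold check with short-term re-monitoring can be sketched as follows. This is an illustrative assumption rather than the patent's implementation: the number of re-checks, the four threshold values, and the rule of reporting the lowest level sustained across all readings are hypothetical, and the sketch ignores category importance and dynamic threshold adjustment.

```python
import time

# Four fault levels, from least to most severe.
LEVELS = ["slight", "minor", "major", "critical"]


def exceeded_level(value, thresholds):
    """Return the index of the highest threshold exceeded, or -1 if none."""
    level = -1
    for index, threshold in enumerate(thresholds):
        if value > threshold:
            level = index
    return level


def analyze(metric, read_value, thresholds, rechecks=3, interval=0.0):
    """Return a fault record if the metric stays above a threshold, else None.

    read_value is a callable that returns the current monitored value;
    thresholds is an ascending list with one entry per fault level.
    """
    first = exceeded_level(read_value(), thresholds)
    if first < 0:
        return None  # value is within the normal range
    levels = [first]
    for _ in range(rechecks):
        time.sleep(interval)  # short follow-up monitoring window
        level = exceeded_level(read_value(), thresholds)
        if level < 0:
            return None  # value recovered: treat the first reading as a false positive
        levels.append(level)
    # Report the level that was sustained across every reading.
    return {"metric": metric, "level": LEVELS[min(levels)]}
```

A transient spike is filtered out because a single in-range reading during the follow-up window cancels the fault.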
Preprocessing module. The fault detection platform supports a fault information preprocessing mechanism. Faults are catalogued on the basis of years of operations and maintenance experience, and a preprocessing mechanism is defined for faults that are neither urgent nor important and that can be resolved by automatically running a script or handler program, such as insufficient disk space, excessive memory usage, or excessive CPU load. Fault information that meets the conditions of the preprocessing mechanism is classified, and each class can be given a preset shell script or handler program. After the fault analysis module detects fault information, the core scheduling module first checks whether a corresponding preprocessing mechanism exists: if so, the fault information is passed to the preprocessing module; otherwise it is passed to the communication module. On receiving fault information, the preprocessing module automatically executes the corresponding shell script or handler program to deal with the fault as early as possible (for example, when an insufficient disk space alarm occurs, it automatically runs the preset disk cleanup procedure to remove temporary files and other discarded records from the system) and then keeps monitoring the problem. If the fault persists after the handler has run, the fault information and the handling strategy already applied are passed to the communication module.
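The dispatch decision described above can be sketched as a lookup from fault class to handler. The fault record layout, the `handlers` mapping, and the three return labels are hypothetical names chosen for illustration; in a real deployment a handler would typically launch the preset shell script, for example through `subprocess.run`.

```python
def handle_fault(fault, handlers, still_failing, send):
    """Route one fault record through the preprocessing mechanism.

    handlers maps a fault type to a callable that attempts an automatic fix.
    still_failing re-checks the fault after the handler has run.
    send forwards a record to the communication module.
    """
    handler = handlers.get(fault["type"])
    if handler is None:
        send(fault)  # no preprocessing mechanism: forward directly
        return "forwarded"
    handler(fault)  # run the preset script or handler program
    if still_failing(fault):
        # Forward the fault together with the handling already attempted.
        send(dict(fault, handled_by=handler.__name__))
        return "escalated"
    return "resolved"
```

Passing `still_failing` and `send` as callables keeps the routing logic independent of how re-monitoring and transmission are actually done.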
Third-party fault access module. The method described in this patent supports two ways of accessing third-party application fault information. In the first, the fault information is exported as an encrypted XML file and placed in a specified directory; the third-party fault information access module of the fault detection platform periodically scans that directory and parses the encrypted XML files in it to obtain the fault information. After receiving fault information, the access module passes it to the communication module, which forwards it to the alarm service platform. In the second, the application interacts directly with the communication module of the alarm service platform over socket communication, packing and sending the fault data in XML or JSON format according to the established standard specification.
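The directory-scanning variant might look like the sketch below. The XML layout (`<faults>` containing `<fault key=... source=...>` elements) is a hypothetical schema, since the patent does not publish one, and the sketch assumes the files have already been decrypted because the encryption scheme is not specified.

```python
import os
import xml.etree.ElementTree as ET


def parse_fault_xml(xml_text):
    """Extract fault records from one exported XML document (assumed schema)."""
    root = ET.fromstring(xml_text)
    return [
        {
            "key": element.get("key"),
            "source": element.get("source"),
            "detail": (element.text or "").strip(),
        }
        for element in root.iter("fault")
    ]


def scan_directory(path, seen):
    """Parse every .xml file in path that has not been processed before.

    seen is a set of file names that is updated in place, so repeated
    scans only return faults from newly exported files.
    """
    faults = []
    for name in sorted(os.listdir(path)):
        if not name.endswith(".xml") or name in seen:
            continue
        with open(os.path.join(path, name), encoding="utf-8") as handle:
            faults.extend(parse_fault_xml(handle.read()))
        seen.add(name)
    return faults
```

Calling `scan_directory` on a timer reproduces the periodic scan; each returned record would then be handed to the communication module.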
Communication module. The fault detection platform and the alarm service platform maintain identical copies of a fault information index table, in which each kind of fault information corresponds to a unique key. After receiving fault information, the communication module packs the detected fault information according to the {key, value} correspondence and a defined rule (for example, in XML or JSON format), establishes a TCP connection with the alarm service platform over socket communication, and transmits the packed data. After receiving the fault information, the alarm service platform stores it in the database and sends a receipt confirmation to the communication module of the fault detection platform, informing the client that the fault information was received successfully. The alarm service platform then raises alarms according to the alarm policy that corresponds to the fault information.
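A minimal sketch of the packing and transmission step, using JSON over a TCP socket. The index table entries, the message fields, the newline framing, and the `OK` acknowledgement are all assumptions made for illustration; the patent only states that a shared index table, XML or JSON packing, a TCP connection, and a receipt confirmation are used.

```python
import json
import socket

# Hypothetical fault index table shared by both platforms: fault type -> unique key.
FAULT_INDEX = {"disk_space_low": "F001", "cpu_load_high": "F002"}


def pack_fault(fault_type, level, detail, host):
    """Pack one fault as a newline-terminated JSON message keyed by the index table."""
    message = {
        "key": FAULT_INDEX[fault_type],
        "level": level,
        "host": host,
        "detail": detail,
    }
    return (json.dumps(message, sort_keys=True) + "\n").encode("utf-8")


def send_fault(payload, address, timeout=5.0):
    """Send a packed fault over TCP and report whether an 'OK' receipt came back."""
    with socket.create_connection(address, timeout=timeout) as connection:
        connection.sendall(payload)
        acknowledgement = connection.recv(64)
    return acknowledgement.strip() == b"OK"
```

Sending only the index key rather than a full description keeps messages small and lets both sides interpret the fault from their own copy of the table.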
II. Alarm service platform.
The alarm service platform consists of a core scheduling module, a communication module, a configuration module, a front-end display module, an alarm module, and a linkage module.
Core scheduling module. Responsible for scheduling the overall alarm handling of the alarm service platform, parsing fault information, and generating alarm information.
Configuration module. Supports configuring the fault detection policy, fault preprocessing policy, fault information index table, alarm policy, monitored host information, and so on through the front-end interface. Configuration updates are stored in the database in real time, and the configuration related to the fault detection platform is passed to the host's fault detection platform through the communication module to keep the configuration consistent.
Communication module. Responsible for communicating with the communication module of the fault detection platform and with third-party applications, receiving fault information in real time, and transferring configuration information between the alarm service platform and the fault detection platform so that their configurations stay synchronized.
Front-end display module. Supports functions such as real-time fault information display, parameter configuration, and a monitoring topology view. Faults are displayed in two ways. The first lists all currently active alarms in an alarm list and supports queries by relevant conditions. The second shows where a fault has occurred by changing the terminal marker on the network topology diagram, with different flashing indicator colors reflecting different alarm levels.
Alarm module. Corresponding to the four fault levels of the fault detection platform, the alarm service platform defines four alarm levels: prompt alarm, minor alarm, major alarm, and critical alarm. The alarm module supports setting a different alarm policy for each alarm level. Alarm policies support flexibly defined rules that combine channels such as email, SMS, WeChat, and telephone. For example: a prompt alarm is shown on the web interface; a minor alarm is shown on the web interface and sent by email; a major alarm is shown on the web interface and sent by email, SMS, and WeChat; a critical alarm is shown on the web interface and sent by email, SMS, WeChat, and telephone call. Alarm policy escalation is also supported: a policy can send email and WeChat messages to the user as soon as an alarm occurs, send an SMS if the alarm is still unacknowledged after a specified time, and telephone the user if it remains unacknowledged after a further period. The policy also defines who is notified when the alarm first occurs and, if the alarm has not been handled within the specified time, the message can be sent to the next level of management.
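The example policy above can be written down as data plus a small lookup. The level-to-channel mapping follows the example in the text; the escalation delays of 300 and 600 seconds are hypothetical, since the patent only speaks of "a specified time".

```python
# Channels used for each alarm level, following the example policy in the text.
CHANNELS_BY_LEVEL = {
    "prompt": ["web"],
    "minor": ["web", "email"],
    "major": ["web", "email", "sms", "wechat"],
    "critical": ["web", "email", "sms", "wechat", "phone"],
}

# Escalation steps: (seconds since the alarm was raised, channels to add).
# The delays are assumed values, not taken from the patent.
ESCALATION = [
    (0, ["email", "wechat"]),
    (300, ["sms"]),
    (600, ["phone"]),
]


def channels_due(elapsed_seconds, acknowledged):
    """Return the channels that should have fired for an unacknowledged alarm."""
    if acknowledged:
        return []  # acknowledgement stops further escalation
    due = []
    for delay, channels in ESCALATION:
        if elapsed_seconds >= delay:
            due.extend(channels)
    return due
```

Keeping the policy as plain data matches the requirement that it be configurable from the front-end interface without code changes.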
Linkage module. In current secure and reliable environments, hosts based on domestic CPUs and operating systems are often deployed on an isolated internal network. Network isolation limits the variety of alarm channels and makes it difficult to deliver important alarm information to operations and maintenance personnel outside working hours. To solve this problem, the method described in this patent supports delivering alarm information across isolated networks: alarm service platforms are deployed on both the internal and the external network, and alarm information is passed between them by means such as encrypted SMS or manually imported and exported encrypted XML files. When the alarm service platform on the internal network receives alarm information, it raises alarms according to the alarm policy. When an alarm must be raised through the external network, the alarm module passes the alarm information to the linkage module, which encrypts it and sends the encrypted content to the linkage module of the external alarm service platform by SMS, by manually imported and exported encrypted XML files, or by similar means. After receiving the encrypted SMS or encrypted XML file, the external linkage module decrypts it, obtains the alarm information, and passes it to the alarm module, which sends the alarm information to operations and maintenance personnel according to the preset alarm policy.
A hardware and software fault alarm method based on a domestic CPU and operating system supports configuring the fault detection policy, fault preprocessing mechanism, fault index table, alarm categories, alarm policy, and so on through the front-end interface. The configuration module receives the configuration information, updates it, and stores it in the database, and passes the relevant configuration, such as the fault detection policy, fault preprocessing mechanism, and fault index table, through the communication module to the configuration module of the fault detection platform to complete the configuration update.
The method comprises the following steps:
I. Fault detection is carried out first. The fault detection process includes fault information collection, fault information analysis, fault information preprocessing, and fault information transmission, wherein:
Fault information collection means collecting the host's hardware and software information and running status in real time by combining several methods, including parsing system files, executing shell scripts, the JMX protocol, IPMI, and a database connection plug-in, while also collecting the fault information of third-party applications on the host;
Fault information analysis means determining whether fault information is a false positive and whether it can be preprocessed;
Fault information preprocessing means handling the fault information through the preprocessing mechanism; if the handling is unsuccessful, the fault information is transmitted;
II. Fault information alarming. This process includes receiving fault information, displaying fault information, and raising alarms.
The detailed process of fault detection is:
1) The information collection module collects the host's hardware and software information and running status in real time. The fault analysis module analyzes in real time the information obtained by the information collection module. When it detects that the monitored value of some class of hardware or software information exceeds its configured threshold, it continues to monitor that class of information for a short period. If the monitored value still exceeds the threshold, fault information is generated, the corresponding fault level is set, and the process proceeds to step 2); otherwise the reading is treated as a false positive. The third-party fault information access module monitors the specified directory in real time, parses the encrypted XML files in it, and obtains the fault information of third-party applications on the host; as soon as fault information is detected, the process proceeds to step 2);
2) The core scheduling module analyzes the fault information and checks whether a preprocessing mechanism exists for the fault. If so, proceed to step 3); otherwise proceed to step 4);
3) The core scheduling module passes the fault information to the preprocessing module, which executes the shell script or handler program corresponding to the fault and keeps tracking the fault after handling completes. If the same fault information is still present, proceed to step 4); otherwise return to step 1);
4) The core scheduling module passes the fault information to the communication module, which packs it according to the {key, value} correspondence and a defined rule, the rule including packing in XML or JSON format, establishes a TCP connection with the communication module of the alarm service platform over socket communication, and transmits the packed data to it; after the receipt confirmation arrives, return to step 1).
The detailed process of fault information alarming is:
a. The communication module listens continuously to receive fault information. When it receives fault information sent by the fault detection platform or a third-party application, it passes the parsed fault information to the core scheduling module; proceed to step b;
b. After receiving the fault information, the core scheduling module parses it, assigns the corresponding alarm category and alarm level, generates alarm information, and distributes the alarm information to the front-end display module, the alarm module, and the storage module; proceed to step c;
c. After receiving the alarm information, the storage module stores it in the database for querying, analysis, and statistics. After receiving the alarm information, the front-end display module presents it in the front-end interface in two ways: the first lists all currently active alarms in an alarm list and supports queries by relevant conditions; the second shows where the fault has occurred by changing the terminal marker on the network topology diagram, with different flashing indicators for different alarm levels. After receiving the alarm information, the alarm module determines the applicable alarm mode: if the network is isolated and the alarm must be sent through the external alarm service platform, proceed to step d; otherwise proceed to step e;
d. The alarm module sends the alarm information to the linkage module. After receiving it, the linkage module packs and encrypts it in the specified format and sends it to the linkage module of the external alarm service platform by SMS or by exporting an encrypted XML file. After receiving the encrypted alarm information, the external linkage module decrypts it to obtain the alarm information and passes it to the alarm module of the external alarm service platform, which raises the alarm reminder according to the alarm policy corresponding to the alarm information; return to step a;
e. The alarm module raises the alarm according to the alarm policy corresponding to the alarm information; return to step a.
The specific embodiments above are only particular examples of the present invention. The scope of patent protection of the present invention includes but is not limited to these embodiments. Any appropriate change or substitution that conforms to the claims of the hardware and software fault alarm system and method based on a domestic CPU and operating system of the present invention, made by a person of ordinary skill in the art, shall fall within the scope of patent protection of the present invention.

Claims (10)

1. A hardware and software fault alarm system based on a domestic CPU and operating system, characterized in that it comprises:
A fault detection platform, which runs on a host equipped with a domestic CPU and operating system, shields the differences among domestic CPUs and operating systems, performs fault detection on the host's hardware and software information, and sends the corresponding fault information to an alarm service platform;
An alarm service platform, which, after receiving the alarm information sent by the fault detection platform, displays the fault information and raises alarms.
2. The hardware and software fault alarm system based on a domestic CPU and operating system according to claim 1, characterized in that the fault detection platform consists of a core scheduling module, an information collection module, a fault analysis module, a configuration module, a communication module, a third-party fault information access module, and a preprocessing module, wherein: the core scheduling module is responsible for overall flow scheduling and processing; the configuration module is responsible for configuring the relevant policies, receives configuration information sent through the communication module, including the fault detection policy, the fault preprocessing mechanism, and the fault index table, applies updates promptly, and stays synchronized with the configuration of the alarm service platform; the information collection module collects the host's hardware and software information in real time; the fault analysis module detects, analyzes, and filters fault information in real time; the third-party fault access module receives third-party application fault information in real time; the preprocessing module, for fault information that meets the preprocessing mechanism, executes the preset shell script or handler program to resolve the fault as early as possible, keeps tracking the fault information after handling completes, and, if the fault information is still present, sends the fault information and the handling information to the alarm service platform; the communication module packs the detected fault information according to the {key, value} correspondence and sends it to the alarm service platform by socket communication.
3. The hardware and software fault alarm system based on a domestic CPU and operating system according to claim 2, characterized in that the collection of host hardware and software information by the information collection module includes: monitoring in real time the host's file system information, CPU load, memory load, swap load, disk I/O load, network interface traffic, process information, and service information; obtaining the host's hardware status in real time through IPMI, including machine temperature and fan status; monitoring the middleware installed on the host through the JMX protocol; and accessing the database on the host through a database connection plug-in to obtain database runtime information.
4. The hardware and software fault alarm system based on a domestic CPU and operating system according to claim 2, characterized in that the fault analysis module analyzes in real time the host hardware and software information obtained by the information collection module, sets initial multi-level thresholds for each kind of hardware and software information, performs classified statistical analysis of the host's hardware and software information, and dynamically adjusts the thresholds, specifically: when the fault analysis module detects that the monitored value of some class of hardware or software information exceeds its configured threshold, it continues to monitor that class of information for a short period; if the monitored value still exceeds the threshold, fault information is generated, otherwise the reading is treated as a false positive; the fault analysis module divides fault information into four levels according to severity, importance, and urgency: slight fault, minor fault, major fault, and critical fault, and sets the fault level of each piece of fault information according to the importance of its category and the threshold level exceeded; correspondingly, the alarm service platform defines four alarm levels: prompt alarm, minor alarm, major alarm, and critical alarm.
5. The hardware and software fault alarm system based on a domestic CPU and operating system according to claim 2, characterized in that the preprocessing module handles fault information that meets the preprocessing mechanism, specifically: fault information that meets the conditions of the preprocessing mechanism is first classified, and each class can be given a preset shell script or handler program; after the fault analysis module detects fault information, the core scheduling module first checks whether a corresponding preprocessing mechanism exists: if so, the fault information is passed to the preprocessing module for handling; otherwise it is passed to the communication module.
6. The hardware and software fault alarm system based on a domestic CPU and operating system according to claim 2, characterized in that the third-party fault access module accesses third-party application fault information in two ways: in the first, the fault information is exported as an encrypted XML file and placed in a specified directory, the third-party fault access module periodically scans the specified directory and parses the encrypted XML files in it to obtain the fault information, and, after receiving the fault information, passes it to the communication module, which forwards it to the alarm service platform; in the second, the application interacts directly with the communication module of the alarm service platform over socket communication, packing and sending the fault data in XML or JSON format according to the established standard specification.
7. The hardware and software fault alarm system based on a domestic CPU and operating system according to claim 2, characterized in that the alarm service platform consists of a core scheduling module, a communication module, a configuration module, a front-end display module, an alarm module, and a linkage module, wherein:
The communication module receives in real time the fault information sent by the fault detection platform or by third-party applications and passes it to the core scheduling module;
The core scheduling module analyzes the fault information, generates the corresponding alarm information, and distributes it to the storage module, the front-end display module, and the alarm module;
The front-end display module presents alarm information on the front-end web page in two ways: listing all currently active alarms in an alarm list; and showing where a fault has occurred by changing the terminal marker on the network topology diagram, with different flashing indicator colors reflecting different alarm levels;
The storage module stores received alarm information in real time for querying, analysis, and statistics; after receiving alarm information, the alarm module raises alarms according to the configured alarm policy;
The alarm service platform is deployed on both the internal and the external network. The alarm module of the internal alarm service platform passes alarm information to the linkage module, which encrypts it and passes it to the linkage module of the external alarm service platform by SMS or by manually importing and exporting encrypted XML files. After receiving the alarm information, the external linkage module decrypts it to obtain the alarm information and passes it to the alarm module, which executes the corresponding alarm policy, so that the alarm reminder is raised through the external alarm service platform.
8. A software and hardware fault alarm method based on a domestic CPU and operating system, characterized by comprising the following steps:
First, fault detection is performed; the fault detection process includes fault information acquisition, fault information analysis, fault information pre-processing and fault information transmission, wherein:
Fault information acquisition means collecting the host's software and hardware information and running status in real time by combining several approaches, including parsing system files, executing shell scripts, the JMX protocol, IPMI and database connection plug-ins, while also collecting fault information from third-party applications on the host;
Fault information analysis means determining whether fault information is a false alarm and whether the fault can be pre-processed;
Fault information pre-processing means pre-processing the fault; if the processing is unsuccessful, fault information transmission is carried out;
Second, fault information alarming is performed; this process includes receiving fault information, displaying fault information and sending alarms.
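One of the acquisition approaches named in claim 8, parsing a system file, could be sketched as follows. This is only an illustration: the claim does not name any specific file, so the Linux-style `/proc/meminfo` layout is an assumption, and `parse_meminfo` and `memory_used_percent` are hypothetical helper names.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse 'Key:   value kB' lines (assumed /proc/meminfo layout) into a dict."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a 'Key:' prefix
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(info: Dict[str, int]) -> float:
    """Derive one monitored value from the parsed fields."""
    total = info["MemTotal"]
    available = info["MemAvailable"]
    return 100.0 * (total - available) / total


# Hypothetical sample; on a real host the text would be read from the file.
sample = "MemTotal:       1000 kB\nMemAvailable:    250 kB\n"
usage = memory_used_percent(parse_meminfo(sample))
```

A value such as `usage` would then be handed to the fault analysis stage for comparison against its threshold.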
9. The software and hardware fault alarm method based on a domestic CPU and operating system according to claim 8, characterized in that the detailed fault detection process is:
1) The information acquisition module first collects the host's software and hardware information and running status in real time; the fault analysis module analyses the collected information in real time; when the fault analysis module detects that the monitored value of some class of software or hardware information exceeds its configured threshold, that class of information is monitored continuously for a short period; if a monitored value still exceeds the configured threshold, fault information is generated, the corresponding fault level is set, and step 2) is executed; otherwise the reading is treated as a false alarm. The third-party fault information access module monitors a designated directory in real time and parses the encrypted XML files in that directory to obtain fault information of third-party applications on the local machine; once fault information is detected, step 2) is executed promptly;
2) The core scheduling module analyses the fault information and checks whether the fault has a pre-processing mechanism; if so, step 3) is executed; otherwise step 4) is executed;
3) The core scheduling module passes the fault information to the pre-processing module, which executes the shell script or handler program corresponding to the fault and keeps tracking the fault after processing completes; if the same fault information still appears, step 4) is executed; otherwise the process returns to step 1);
4) The core scheduling module passes the fault information to the communication module; the communication module packages the fault information as {key, value} pairs according to a given rule, the rule including packaging in XML or JSON format, establishes a TCP connection with the communication module of the alarm service platform through socket communication, and passes the packaged data to the communication module of the alarm service platform; after the acknowledgement of receipt arrives, the process returns to step 1).
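The threshold re-check of step 1) and the packaging and transmission of step 4) can be illustrated with a minimal sketch. Several details are assumptions rather than part of the claim: the function names, the fixed `"major"` level, the reading of "still exceeds" as "any follow-up reading exceeds", the newline-terminated JSON framing, and the literal `ACK` reply.

```python
import json
import socket
from typing import Callable, Optional


def detect_fault(metric: str, read_value: Callable[[], float],
                 threshold: float, rechecks: int = 3) -> Optional[dict]:
    """Return a fault record, or None for a normal reading or a false alarm."""
    if read_value() <= threshold:
        return None  # first reading is within limits
    # Re-sample for a short period; any reading still above the threshold
    # confirms the fault (an assumed interpretation of the claim wording).
    if any(read_value() > threshold for _ in range(rechecks)):
        # The level is a placeholder; the claim leaves level assignment open.
        return {"metric": metric, "threshold": threshold, "level": "major"}
    return None  # transient spike, treated as a false alarm


def pack_fault(fault: dict) -> bytes:
    """Serialize the {key, value} pairs as one JSON line (assumed framing)."""
    return json.dumps(fault, sort_keys=True).encode("utf-8") + b"\n"


def send_fault(host: str, port: int, payload: bytes,
               timeout: float = 5.0) -> bool:
    """Send the packed fault over TCP and wait for an assumed b'ACK' reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        return conn.recv(16).startswith(b"ACK")
```

A real agent would also pause between re-samples and retry when no acknowledgement arrives; both are omitted here to keep the control flow visible.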
10. The software and hardware fault alarm method based on a domestic CPU and operating system according to claim 8 or 9, characterized in that the detailed fault information alarming process is:
a. The communication module listens continuously in order to receive fault information; when fault information sent by the fault detection platform or a third-party application is received, the parsed fault information is passed to the core scheduling module, and step b is executed;
b. After receiving the fault information, the core scheduling module parses it, assigns the corresponding alarm category and alarm level, generates alarm information, and distributes the generated alarm information to the foreground display module, the alarm module and the storage module; step c is executed;
c. After receiving the alarm information, the storage module stores it in the database for query, analysis and statistics; after receiving the alarm information, the foreground display module reflects it in the foreground interface in two ways: first, all currently active alarms are listed in an alarm list, with support for querying by relevant conditions; second, the location of the fault is shown by changing the terminal marker on the network topology diagram, with different flashing indications for different alarm levels; after receiving the alarm information, the alarm module analyses the current alarm mode: if the network is isolated and alarm information must be sent through the external-network alarm service platform, step d is executed; otherwise step e is executed;
d. The alarm module sends the alarm information to the linkage module; after receiving it, the linkage module packages and encrypts it in the prescribed format and sends it to the linkage module of the external-network alarm service platform by SMS or by exporting an encrypted XML file; after receiving the encrypted alarm information, the linkage module of the external-network alarm service platform decrypts it to obtain the alarm information and passes it to the alarm module of the external-network alarm service platform, which issues the alarm reminder according to the alarm policy corresponding to the alarm information; the process returns to step a;
e. The alarm module issues the alarm according to the alarm policy corresponding to the alarm information; the process returns to step a.
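The routing in steps b to e can be sketched as a small dispatcher. All names here (`Alarm`, `AlarmPlatform`, the list attributes, the default category and level) are hypothetical, and in-memory lists stand in for the database, the web front end, the relay to the external network and the local notifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alarm:
    host: str
    category: str
    level: str


@dataclass
class AlarmPlatform:
    network_isolated: bool
    stored: List[Alarm] = field(default_factory=list)     # stands in for the database
    displayed: List[Alarm] = field(default_factory=list)  # stands in for the web front end
    relayed: List[Alarm] = field(default_factory=list)    # handed over for step d
    notified: List[Alarm] = field(default_factory=list)   # alerted locally, step e

    def handle(self, fault: dict) -> Alarm:
        # Step b: assign category and level, then build the alarm.
        alarm = Alarm(host=fault["host"],
                      category=fault.get("category", "hardware"),
                      level=fault.get("level", "minor"))
        # Step c: storage and display always receive the alarm.
        self.stored.append(alarm)
        self.displayed.append(alarm)
        # Step c decision: an isolated network relays the alarm (step d),
        # otherwise the alarm is raised directly (step e).
        if self.network_isolated:
            self.relayed.append(alarm)
        else:
            self.notified.append(alarm)
        return alarm
```

Encryption and the SMS or XML-file transfer of step d are deliberately left out; the sketch only shows where that hand-off would occur.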

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710015718.1A CN106649055A (en) 2017-01-10 2017-01-10 Domestic CPU (central processing unit) and operating system based software and hardware fault alarming system and method

Publications (1)

Publication Number Publication Date
CN106649055A true CN106649055A (en) 2017-05-10

Family

ID=58842824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710015718.1A Pending CN106649055A (en) 2017-01-10 2017-01-10 Domestic CPU (central processing unit) and operating system based software and hardware fault alarming system and method

Country Status (1)

Country Link
CN (1) CN106649055A (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107733688A (en) * 2017-09-14 2018-02-23 国网湖北省电力公司孝感供电公司 A kind of warning system based on mobile terminal
CN107832200A (en) * 2017-10-24 2018-03-23 平安科技(深圳)有限公司 Alert processing method, device, computer equipment and storage medium
CN108833190A (en) * 2018-07-27 2018-11-16 郑州云海信息技术有限公司 A kind of NFS service failure warning method, device and storage medium
CN108959025A (en) * 2018-06-27 2018-12-07 郑州云海信息技术有限公司 A kind of server alarm method, device and server
CN109039774A (en) * 2018-09-11 2018-12-18 郑州云海信息技术有限公司 The management method and device of warning information in openstack platform
CN109144798A (en) * 2018-08-13 2019-01-04 清华大学 Intelligent management system with machine learning function
CN109992486A (en) * 2019-04-02 2019-07-09 北京睿至大数据有限公司 A kind of IT failure methods of exhibiting based on timing and thermodynamic chart
CN110365631A (en) * 2018-04-11 2019-10-22 北京视联动力国际信息技术有限公司 A kind of data processing method and view networked system
CN110470948A (en) * 2019-08-15 2019-11-19 国网四川省电力公司电力科学研究院 A kind of fault location system and method based on platform area circuit topology relationship
CN111158768A (en) * 2019-12-25 2020-05-15 浪潮商用机器有限公司 A server switch control method, device, equipment and storage medium
CN112131579A (en) * 2020-09-30 2020-12-25 中孚安全技术有限公司 Security check method and system for shielding difference between bottom CPU and operating system
CN112988523A (en) * 2021-03-09 2021-06-18 杭州电魂网络科技股份有限公司 Multi-dimensional game system warning method and system
CN113342609A (en) * 2021-06-10 2021-09-03 重庆科创职业学院 Computer obstacle removing system
CN113448763A (en) * 2021-07-16 2021-09-28 广东电网有限责任公司 Dynamic expansion grouping alarm service method for full life cycle management
CN113692573A (en) * 2019-04-11 2021-11-23 微软技术许可有限责任公司 Hierarchically deploying packages to devices in a cluster
CN113791959A (en) * 2021-08-13 2021-12-14 济南浪潮数据技术有限公司 Alarm push method, system, terminal and storage medium of service platform
CN114168404A (en) * 2021-11-04 2022-03-11 济南浪潮数据技术有限公司 Alarm processing method of monitoring platform in data center and monitoring platform
CN114584455A (en) * 2022-03-04 2022-06-03 吉林大学 Small and medium-sized high-performance cluster monitoring system based on enterprise WeChat
CN114660988A (en) * 2022-03-25 2022-06-24 佛山市博顿光电科技有限公司 Troubleshooting method and device
CN115035698A (en) * 2022-06-06 2022-09-09 大牧人机械(胶州)有限公司 Pig farm centralized alarm equipment redundancy system and self-checking method
CN116016262A (en) * 2022-12-28 2023-04-25 天翼云科技有限公司 Method and device for detecting call chain consistency in real time based on union

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101201786A (en) * 2006-12-13 2008-06-18 中兴通讯股份有限公司 Method and device for monitoring fault log
CN103475544A (en) * 2013-09-18 2013-12-25 浪潮电子信息产业股份有限公司 Service monitoring method based on cloud resource monitoring platform
CN104331354A (en) * 2014-11-20 2015-02-04 普华基础软件股份有限公司 Real-time comprehensive monitoring method for cloud computing
CN104486106A (en) * 2014-12-04 2015-04-01 珠海金山网络游戏科技有限公司 Grading warning service system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
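普华基础软件股份有限公司: "普华综合监控平台担纲电信级云平台运维监控重任" [Puhua integrated monitoring platform takes on operations and maintenance monitoring for a carrier-grade cloud platform], 《HTTP://WWW.I-SOFT.COM.CN/ARTICLE/10038.JHTML》 *
普华基础软件股份有限公司: "普华综合监控平台软件" [Puhua integrated monitoring platform software], 《HTTPS://WEB.ARCHIVE.ORG/WEB/20161221003147/HTTP://WWW.I-SOFT.COM.CN/ARTICLE/39.JHTML》 *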

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107733688A (en) * 2017-09-14 2018-02-23 国网湖北省电力公司孝感供电公司 A kind of warning system based on mobile terminal
CN107832200A (en) * 2017-10-24 2018-03-23 平安科技(深圳)有限公司 Alert processing method, device, computer equipment and storage medium
CN110365631A (en) * 2018-04-11 2019-10-22 北京视联动力国际信息技术有限公司 A kind of data processing method and view networked system
CN108959025A (en) * 2018-06-27 2018-12-07 郑州云海信息技术有限公司 A kind of server alarm method, device and server
CN108833190A (en) * 2018-07-27 2018-11-16 郑州云海信息技术有限公司 A kind of NFS service failure warning method, device and storage medium
CN109144798A (en) * 2018-08-13 2019-01-04 清华大学 Intelligent management system with machine learning function
CN109039774B (en) * 2018-09-11 2022-06-07 郑州云海信息技术有限公司 Method and device for managing alarm information in openstack platform
CN109039774A (en) * 2018-09-11 2018-12-18 郑州云海信息技术有限公司 The management method and device of warning information in openstack platform
CN109992486A (en) * 2019-04-02 2019-07-09 北京睿至大数据有限公司 A kind of IT failure methods of exhibiting based on timing and thermodynamic chart
CN113692573A (en) * 2019-04-11 2021-11-23 微软技术许可有限责任公司 Hierarchically deploying packages to devices in a cluster
CN110470948A (en) * 2019-08-15 2019-11-19 国网四川省电力公司电力科学研究院 A kind of fault location system and method based on platform area circuit topology relationship
CN111158768A (en) * 2019-12-25 2020-05-15 浪潮商用机器有限公司 A server switch control method, device, equipment and storage medium
CN112131579A (en) * 2020-09-30 2020-12-25 中孚安全技术有限公司 Security check method and system for shielding difference between bottom CPU and operating system
CN112988523A (en) * 2021-03-09 2021-06-18 杭州电魂网络科技股份有限公司 Multi-dimensional game system warning method and system
CN113342609A (en) * 2021-06-10 2021-09-03 重庆科创职业学院 Computer obstacle removing system
CN113448763A (en) * 2021-07-16 2021-09-28 广东电网有限责任公司 Dynamic expansion grouping alarm service method for full life cycle management
CN113791959A (en) * 2021-08-13 2021-12-14 济南浪潮数据技术有限公司 Alarm push method, system, terminal and storage medium of service platform
CN114168404A (en) * 2021-11-04 2022-03-11 济南浪潮数据技术有限公司 Alarm processing method of monitoring platform in data center and monitoring platform
CN114584455A (en) * 2022-03-04 2022-06-03 吉林大学 Small and medium-sized high-performance cluster monitoring system based on enterprise WeChat
CN114660988A (en) * 2022-03-25 2022-06-24 佛山市博顿光电科技有限公司 Troubleshooting method and device
CN115035698A (en) * 2022-06-06 2022-09-09 大牧人机械(胶州)有限公司 Pig farm centralized alarm equipment redundancy system and self-checking method
CN115035698B (en) * 2022-06-06 2024-04-26 大牧人机械(胶州)有限公司 Pig farm centralized alarm equipment redundancy system and self-checking method
CN116016262A (en) * 2022-12-28 2023-04-25 天翼云科技有限公司 Method and device for detecting call chain consistency in real time based on union
CN116016262B (en) * 2022-12-28 2024-05-24 天翼云科技有限公司 A method and device for real-time detection of call chain consistency based on union-find

Similar Documents

Publication Publication Date Title
CN106649055A (en) Domestic CPU (central processing unit) and operating system based software and hardware fault alarming system and method
US10104095B2 (en) Automatic stability determination and deployment of discrete parts of a profile representing normal behavior to provide fast protection of web applications
CA2578957C (en) Agile information technology infrastructure management system
CN110232006B (en) Equipment alarm method and related device
US20060282886A1 (en) Service oriented security device management network
CN110224865A (en) A kind of log warning system based on Stream Processing
CN110209518A (en) A kind of multi-data source daily record data, which is concentrated, collects storage method and device
CN110968479B (en) Service level full-link monitoring method and server for application program
US11023304B2 (en) System and method for data error notification in interconnected data production systems
CN113452607A (en) Distributed link acquisition method and device, computing equipment and storage medium
CN112667660B (en) Enterprise internal information system data leakage identification method based on complex event identification
CA2361003C (en) System for data capture, normalization, data event processing, communication and operator interface
US20080243872A1 (en) Computer network security data management system and method
CN111782481A (en) Universal data interface monitoring system and monitoring method
US7499937B2 (en) Network security data management system and method
CN106411566A (en) MIB alarm analysis method and system based on XML technology
CN114338347A (en) Ampere platform-based fault information out-of-band acquisition method and device
CN114006940A (en) Building integrated management information pushing method, system, computer and storage medium
CN109507711A (en) A kind of radioactive source monitoring management system and method
CN117914511A (en) Security audit system based on data exchange and log analysis
CN115865623B (en) Multi-platform-oriented alarm data processing method and related equipment
CN112422349B (en) Network management system, method, equipment and medium for NFV
CN109905391A (en) A kind of business network secure data acquisition management system
CN117557211B (en) Intelligent financial business processing method, platform and medium based on flow automation
CN117667565B (en) Business abnormality monitoring method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170510
