CN115087000B - Fault determination method and device, nonvolatile storage medium and computer terminal - Google Patents

Fault determination method and device, nonvolatile storage medium and computer terminal

Info

Publication number
CN115087000B
CN115087000B CN202110235249.0A CN202110235249A CN115087000B CN 115087000 B CN115087000 B CN 115087000B CN 202110235249 A CN202110235249 A CN 202110235249A CN 115087000 B CN115087000 B CN 115087000B
Authority
CN
China
Prior art keywords
infrastructure services
fault
target application
infrastructure
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110235249.0A
Other languages
Chinese (zh)
Other versions
CN115087000A (en
Inventor
杨光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Innovation Private Ltd
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Alibaba Innovation Private Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Singapore Holdings Pte Ltd and Alibaba Innovation Private Ltd
Priority to CN202110235249.0A
Publication of CN115087000A
Application granted
Publication of CN115087000B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application discloses a fault determination method and apparatus, a non-volatile storage medium, and a computer terminal. The method comprises: obtaining a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is used to reflect the operating status of the target application, the ICT operation platform including multiple infrastructure services, the multiple infrastructure services including at least CT infrastructure services and IT infrastructure services; determining whether a target application has failed based on the target detection indicator; if the target application has failed, obtaining the operating status of the multiple infrastructure services in the ICT operation platform; and determining the cause of the target application failure based on the operating status of the multiple infrastructure services.

Description

Fault determination method and device, nonvolatile storage medium and computer terminal
Technical Field
The present application relates to the field of information and communication technology (ICT), and in particular to a fault determination method and device, a nonvolatile storage medium, and a computer terminal.
Background
In a 5G private network, the private network equipment is hosted in the cloud on a unified infrastructure, so network quality depends not only on the communication functions but also on the infrastructure. When a fault occurs in a 5G private network, its cause therefore cannot be determined by examining communication indexes alone. In addition to the communication functions of the private network, the infrastructure also has to carry the industry applications of the private network, so the dynamic resource guarantee status of the infrastructure also needs to be monitored.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a fault determining method and device, a nonvolatile storage medium and a computer terminal, which at least solve the technical problem that infrastructure is not detected in the related technology.
According to one aspect of the embodiments of the application, a fault determination method is provided, comprising: obtaining a target detection index of a target application provided in an ICT operation platform, wherein the target detection index reflects the operation state of the target application, and the ICT operation platform comprises a plurality of infrastructure services including at least CT infrastructure services and IT infrastructure services; and determining whether the target application has a fault according to the target detection index. It should be noted that, because the infrastructure services of the ICT platform include both IT infrastructure services and CT infrastructure services, accurately determining whether it is the IT infrastructure services or the CT infrastructure services that have failed requires troubleshooting both at the same time. That is, when determining whether the target application has a fault according to the target detection index, the embodiments of the present application use linkage analysis of the IT and CT equipment. Through linked fault investigation of the IT equipment and the CT equipment, faults are located efficiently and accurately, and unified operation and maintenance of the IT/CT equipment is achieved.
According to another aspect of the embodiments of the application, a fault processing method is further provided, comprising: when at least one fault occurs in an ICT operation platform, collecting current operation state information of the ICT operation platform and generating operation log information; determining, based on the operation log information, the fault type of each of the at least one fault and locating each fault, wherein the fault types include IT faults and CT faults; and isolating the infrastructure services corresponding to each fault using the fault isolation strategy that corresponds to its fault type.
According to another aspect of the embodiments of the application, a fault determination device is further provided, comprising a first acquisition module, a first determination module, a second acquisition module, and a second determination module. The first acquisition module is used for acquiring a target detection index of a target application provided in an ICT operation platform, wherein the target detection index reflects the operation state of the target application, and the ICT operation platform comprises a plurality of infrastructure services including at least CT infrastructure services and IT infrastructure services. The first determination module is used for determining whether the target application has a fault according to the target detection index. The second acquisition module is used for acquiring the operation states of the plurality of infrastructure services in the ICT operation platform when the target application has a fault. The second determination module is used for determining the fault cause of the target application according to the operation states of the plurality of infrastructure services.
According to another aspect of the embodiments of the present application, there is also provided a nonvolatile storage medium including a stored program, wherein, when the program runs, the device in which the nonvolatile storage medium is located is controlled to execute the above fault determination method.
According to another aspect of the embodiments of the application, a computer terminal is provided, comprising a processor and a memory, wherein the memory is connected with the processor and is used for providing the processor with instructions for the following processing steps: obtaining a target detection index of a target application provided in an ICT operation platform, wherein the target detection index reflects the operation state of the target application, and the ICT operation platform comprises a plurality of infrastructure services including at least CT infrastructure services and IT infrastructure services; determining whether the target application has a fault according to the target detection index; when the target application has a fault, obtaining the operation states of the plurality of infrastructure services in the ICT operation platform; and determining the fault cause of the target application according to the operation states of the plurality of infrastructure services.
In the embodiment of the application, under the condition that the fault is determined according to the target detection index of the target application in the ICT operation platform, the fault cause is determined based on the operation states of various infrastructure services in the ICT operation platform, so that the detection of the infrastructure is realized, and the technical problem that the infrastructure is not detected in the related technology is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a diagram of a conventional network operation and maintenance architecture of 5G to C according to the related art;
FIG. 2 is a schematic diagram of an ICT unified operation and maintenance architecture for an industry private network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of ICT unified operation and maintenance for an industry private network according to an embodiment of the present application;
FIG. 4 is a schematic structural view of a computer terminal according to an embodiment of the present application;
FIG. 5 is a flow chart of a fault determination method according to an embodiment of the present application;
FIG. 6 is a flow chart of a fault handling method according to an embodiment of the present application;
FIG. 7 is a schematic structural view of a fault determination device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an interactive interface according to an embodiment of the present application;
FIG. 9 is a flow chart of another fault determination method according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms appearing in the description of the embodiments of the application are explained as follows:
Private network communication refers to services such as emergency communication, command scheduling, daily work communication and the like provided for government and public security, public utilities, industry and commerce and the like. The communication network is built in some industries, departments or units to meet the requirements of organization management, safe production, dispatching command and the like.
ICT: a combination of information technology and communication technology. Information technology refers to the various technologies used to manage and process information, and generally means applying computer science and communication technology to design, develop, install, and implement information systems and application software. Communication technology includes transmission access, network switching, mobile communication, wireless communication, optical communication, satellite communication, support management, private network communication, and the like; current examples include 5G, LTE, IPTV, VoIP, NGN, and IMS.
Resource orchestrator: cloud vendors have successively launched their own resource orchestration services (Resource Orchestration, hereinafter ROS). The idea behind ROS is infrastructure as code. On the one hand, changes to the infrastructure are recorded through code-style version management; on the other hand, operation and maintenance are automated through code while the complexity of writing that code is reduced. A user describes the configuration, dependencies, and so on of multiple cloud computing resources (such as ECS, RDS, and SLB) in a JSON/YAML template, and the deployment and configuration of all cloud resources across multiple regions and multiple accounts are then completed automatically, so that operation and maintenance personnel can build an environment as easily as stacking building blocks.
Computing-network integration: cloud-network collaboration is the integration of cloud and network services. There, the cloud and the network remain relatively independent; only cloud computing and network services are offered together, and the network architecture itself is not integrated. Computing-network integration is version 2.0 of cloud-network integration: it breaks through the network bottleneck of computing and links up the tidal effect of computing power. Combined with 5G, MEC, and AI technologies, the network serves computing, improvements in computing capability in turn change the network, and computing and the network are deeply fused.
Simple Network Management Protocol (SNMP): a standard application-layer protocol designed specifically for managing network nodes (servers, workstations, routers, switches, hubs, etc.) in an IP network. SNMP enables network administrators to manage network performance, discover and solve network problems, and plan network growth. By receiving unsolicited messages (event reports) through SNMP, the network management system learns that a problem has occurred in the network.
DPI (Deep Packet Inspection): a packet-based deep inspection technology that inspects the application-layer payloads of different network applications (such as HTTP and DNS) and determines the legitimacy of a message by examining its payload.
A DPI system is mainly responsible for parsing binary network traffic into readable messages, performing layer-by-layer feature analysis on massive numbers of messages, and finally presenting the results visually in software to the operator's network management and service operation units, helping the operator manage network traffic and other related services at a finer granularity.
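As a rough illustration of payload-based inspection, the sketch below labels a packet payload by looking for an HTTP request line. The function name `classify_payload`, the method list, and the returned labels are hypothetical and are not taken from the application; a real DPI system relies on far richer signatures and protocol parsers.

```python
# Hypothetical signature list: common HTTP request methods followed by a space.
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"OPTIONS ", b"PATCH ")


def classify_payload(payload: bytes) -> str:
    """Return a coarse application-layer label for a raw payload."""
    # Only the first line is inspected: an HTTP request line looks like
    # "GET /path HTTP/1.1".
    first_line = payload.split(b"\r\n", 1)[0]
    if first_line.startswith(HTTP_METHODS) and b" HTTP/" in first_line:
        return "http-request"
    return "unknown"
```

For example, `classify_payload(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")` yields the label `"http-request"`, while arbitrary binary data falls through to `"unknown"`.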
As shown in FIG. 1, the conventional operation and maintenance architecture of the 5G to C (5G technology for users) large network mainly includes three layers: a dedicated hardware infrastructure layer, a network device layer, and a professional operation and maintenance platform layer (the unified operation and maintenance center in FIG. 1). The professional operation and maintenance platform layer includes a wireless workbench, a core network workbench, a transmission workbench, a wireless Operation and Maintenance Center (OMC), a core network OMC, and a transmission OMC. Correspondingly, the network device layer includes a wireless network, a core network, and a transmission network, and the dedicated hardware infrastructure layer includes dedicated hardware for the wireless network, the core network, and the transmission network. In this conventional architecture, the network devices of the wireless network, transmission network, and core network all have dedicated hardware to guarantee performance and are deployed completely independently of upper-layer applications, so network operation and maintenance only needs to attend to communication network indexes. In a 5G private network, however, everything is carried on a unified infrastructure, so network quality depends not only on the communication functions but also on the infrastructure. Besides the private network communication functions, the infrastructure must also carry the industry applications of the private network, so the dynamic resource guarantee status of the infrastructure also needs to be monitored.
In addition, applying this operation and maintenance system to a 5G to B private network raises the following problems. First, the IT infrastructure has a resource scheduling problem: because the infrastructure of the private network carries the 5G industry applications and the 5G communication functions at the same time, resources must be orchestrated uniformly by the resource orchestrator of the integrated computing network, and conflicts or other faults can occur. Second, IT infrastructure problems are likely to affect the communication indexes: the communication indexes originally monitored (e.g., registration success rate and session establishment success rate) were related only to communication service logic or changes in the terminal environment, because the infrastructure was dedicated and did not affect the indexes. Since a private network is likely to run on a general-purpose hardware platform, however, faults of that platform also affect the communication indexes.
In order to solve the problems, the embodiment of the application provides a corresponding solution, namely, under the condition that the fault is determined according to the target detection index of the target application in the ICT operation platform, the fault cause is determined based on various infrastructure service operation states in the ICT operation platform, so that the detection of the infrastructure is realized, and the detailed description is given below.
Example 1
Fig. 2 is a schematic diagram of an ICT unified operation and maintenance architecture for an industry private network according to an embodiment of the present application. As shown in fig. 2, the operation and maintenance architecture mainly comprises three layers, namely a general infrastructure, service functions and a professional operation and maintenance platform (namely a unified operation and maintenance center in fig. 2), wherein the professional operation and maintenance platform comprises a CT infrastructure service (wireless operation OMC, a core network OMC and transmission OMC) and an IT infrastructure detection. Accordingly, the service functions include wireless network functions, core network functions and industry applications, and the general infrastructure includes infrastructure services such as computing, storage and networking.
As can be seen from FIG. 2, the 5G communication functions (including the 5G RAN and 5GC) and the industry applications are carried uniformly on a general-purpose hardware platform, and each is managed by its own professional operation and maintenance system. Compared with the conventional large network, the general infrastructure is additionally monitored and managed, so that the upper-layer ICT converged operation and maintenance center can monitor indexes and perform operation and maintenance management in a unified way. If a communication index shows a problem or fault, linked troubleshooting is carried out across the professional operation and maintenance systems, so that communication index problems possibly caused by infrastructure faults or insufficient resources can be located.
The interaction flow of the above layers is shown in fig. 3, and can be roughly divided into the following parts:
Fault discovery: logs collected by multiple means are merged into a unified log based on a given identifier, and faults are screened according to the different rules for IT and CT. For example, IT devices use syslog or SNMP, and CT devices use DPI or proprietary protocols.
Syslog, often called the system log or system record, is a standard for delivering log messages over an Internet protocol (TCP/IP) network. The term is often used to refer to the syslog protocol itself, or to the applications or databases that submit syslog messages. The syslog protocol is a client-server protocol: a syslog sender sends a small text message (less than 1024 bytes) to a syslog receiver. The receiver is commonly called "syslogd", the "syslog daemon", or the syslog server. Syslog messages may be transmitted over UDP and/or TCP, and the data is sent in clear text. Although SSL wrappers (e.g., stunnel, sslio, or sslwrap) are not part of the syslog protocol itself, they can be used to provide a layer of encryption via SSL/TLS.
Fault localization: IT faults and CT faults are analyzed jointly in a log with unified timestamps.
Fault isolation: all CT service states are stored persistently, so that IT can be isolated at any time.
Fault recovery: the IT water level is monitored and capacity is expanded on demand, ensuring elastic capacity expansion for CT.
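The fault discovery and localization steps above can be sketched as merging IT and CT log records onto one timeline and pairing each CT fault with the IT events that closely precede it. This is a minimal sketch under stated assumptions: the `LogRecord` fields, the function names `unify` and `correlate`, and the 5-second window are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LogRecord:
    timestamp: float  # seconds on a clock shared by IT and CT collectors (assumed)
    domain: str       # "IT" or "CT"
    source: str       # hypothetical name of the emitting device or service
    message: str


def unify(it_records: List[LogRecord], ct_records: List[LogRecord]) -> List[LogRecord]:
    """Merge IT and CT records into one log ordered by the unified timestamp."""
    return sorted([*it_records, *ct_records], key=lambda r: r.timestamp)


def correlate(unified: List[LogRecord], window_s: float = 5.0) -> List[Tuple[LogRecord, List[LogRecord]]]:
    """Pair each CT record with IT records seen at most window_s seconds earlier."""
    pairs = []
    for ct in (r for r in unified if r.domain == "CT"):
        related = [
            r for r in unified
            if r.domain == "IT" and 0 <= ct.timestamp - r.timestamp <= window_s
        ]
        pairs.append((ct, related))
    return pairs
```

A CT record such as a failed session setup would then be listed together with, say, a CPU saturation event logged by the host two seconds earlier, which is the kind of linkage the unified timestamp is meant to enable.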
Based on the above principles, embodiments of the present application provide a method embodiment of a fault determination method, it being noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as a set of computer executable instructions, and, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.
The method embodiments provided by the embodiments of the present application may be performed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 4 shows a hardware block diagram of a computer terminal (or mobile device) for implementing the fault determination method. As shown in FIG. 4, the computer terminal 40 (or mobile device 40) may include one or more processors 402 (shown in the figure as 402a, 402b, ..., 402n; the processor 402 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 404 for storing data, and a transmission module 406 for communication functions. In addition, the computer terminal may include a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in FIG. 4 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the computer terminal 40 may also include more or fewer components than shown in FIG. 4, or have a different configuration than shown in FIG. 4.
It should be noted that the one or more processors 402 and/or other data processing circuits described above may be referred to herein generally as "data processing circuits". The data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Furthermore, the data processing circuit may be a single stand-alone processing module, or may be incorporated, in whole or in part, into any of the other elements in the computer terminal 40 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a kind of processor control (e.g., selection of the path of the variable resistor termination connected to the interface).
The memory 404 may be used to store software programs and modules of application software, such as the program instructions/data storage devices corresponding to the methods in the embodiments of the present application. The processor 402 executes the software programs and modules stored in the memory 404, thereby performing various functional applications and data processing, that is, implementing the above fault determination method. Memory 404 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 404 may further include memory located remotely from processor 402, which may be connected to computer terminal 40 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 406 is used to receive or transmit data via a network. The specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 40. In one example, the transmission module 406 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission module 406 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 40 (or mobile device).
In the above operating environment, as shown in fig. 5, the fault determining method provided by the embodiment of the present application includes the following processing steps:
Step S502, obtaining a target detection index of a target application provided in an ICT operation platform, wherein the target detection index is used for reflecting the operation state of the target application, the ICT operation platform comprises a plurality of infrastructure services, the plurality of infrastructure services at least comprise CT infrastructure services and IT infrastructure services, and the target application comprises but is not limited to various types of industry applications.
The ICT fusion device refers to a fusion of an IT (Information Technology) device and a CT (Communication Technology) device. The IT device may be a device such as a server, and the CT device may be a router, a switch, or the like. The ICT fusion device may adopt the hardware architecture of the CT device, which comprises an Ethernet switching chip and a CPU. The Ethernet switching chip implements layer-2 switching; the CPU runs software that forms a router operating system, which implements layer-3 functions such as routing. By running software on the CPU, the ICT fusion device also hosts virtual-machine-based virtual applications that implement the functions of the IT device.
The target detection index may be obtained in various manners. For example, a request message may be sent to the user side, and the user equipment responds by feeding back the target detection index of the target application to the ICT operation platform; alternatively, the target detection index may be received from the user side at regular intervals. The target detection index includes, but is not limited to, communication quality parameters such as the application's data transmission rate and bit error rate.
Step S504, determining whether the target application fails according to the target detection index;
In some embodiments, this step may be implemented by comparing the target detection index with a preset threshold and determining from the comparison result whether a fault has occurred: when the target detection index is smaller than the preset threshold, the target application is determined to have failed; when the target detection index is greater than the preset threshold, the target application is determined not to have failed. For example, when the data transmission rate of the target application is less than the rate threshold, the target application is determined to be malfunctioning.
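The threshold comparison can be sketched in a few lines. The function name `has_failed` and the example values are hypothetical; treating a value exactly equal to the threshold as healthy is also an assumption, since the text only covers the smaller-than and greater-than cases.

```python
def has_failed(indicator_value: float, threshold: float) -> bool:
    """Return True when the target detection index falls below the preset threshold.

    Assumption: a value equal to the threshold counts as healthy.
    """
    return indicator_value < threshold


# Example: a data transmission rate of 80 against a rate threshold of 100
# (units are illustrative) is flagged as a fault.
rate_fault = has_failed(80.0, 100.0)
```

This form suits indexes where a lower value is worse, such as a data transmission rate; an index where a higher value is worse would need the comparison reversed, which the text does not spell out.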
In some embodiments of the present application, the fault may also be determined by a fault determination method as shown in fig. 9, which includes, as shown in fig. 9:
In step S902, a target detection index of a target application provided in an ICT operation platform is obtained, wherein the target detection index is used for reflecting an operation state of the target application, the ICT operation platform includes a plurality of infrastructure services, and the plurality of infrastructure services at least include a CT infrastructure service and an IT infrastructure service, and the target application includes, but is not limited to, various types of industry applications.
Step S904, determining whether the target application fails according to the target detection index;
It should be noted that steps S902 and S904 above correspond one-to-one to steps S502 and S504 in FIG. 5, so the explanations of steps S502 and S504 also apply to steps S902 and S904 and are not repeated here.
Step S906, under the condition that the target application fails, acquiring the running states of various infrastructure services in the ICT running platform;
In some embodiments, running logs of the multiple infrastructure services are obtained, and the running states of the multiple infrastructure services are determined based on the running logs. Different infrastructure services adopt different log collection manners, that is, the log collection manner of each infrastructure service may be independent of the others. For example, IT devices collect logs using syslog/SNMP, while CT devices collect logs using a DPI proprietary protocol.
The running log may be determined as follows: determining time information of when the target application fails and a log identifier of the target application, where the log identifier identifies the logs generated by the IT infrastructure services and CT infrastructure services associated with the target application; determining a time stamp corresponding to the time information and a log set corresponding to the time stamp; and determining, from the log set, the log corresponding to the log identifier and taking that log as the running log.
The log identifier may be determined based on a first identifier of the IT infrastructure service and a second identifier of the CT infrastructure service associated with the target application. This can be done in various ways: for example, the first identifier and the second identifier may be combined to form the log identifier, or a hash operation may be performed on the first identifier and the second identifier and the result taken as the log identifier.
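The log identifier and the selection of the running log can be sketched as follows. This is an illustrative sketch only: the `LogEntry` type, the `:` separator, the choice of SHA-256 as the hash operation, and the 60-second window used to form the log set around the failure time stamp are all assumptions, since the application does not specify them.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LogEntry:
    timestamp: int  # seconds since the epoch
    log_id: str
    message: str


def combined_log_id(it_id: str, ct_id: str) -> str:
    """Form the log identifier by combining the first and second identifiers."""
    return f"{it_id}:{ct_id}"


def hashed_log_id(it_id: str, ct_id: str) -> str:
    """Form the log identifier by hashing the two identifiers (SHA-256 assumed)."""
    return hashlib.sha256(f"{it_id}:{ct_id}".encode("utf-8")).hexdigest()


def select_running_logs(logs: Iterable[LogEntry], failure_time: int,
                        log_id: str, window_s: int = 60) -> List[LogEntry]:
    """Pick the log set near the failure time stamp, then filter by identifier."""
    log_set = [e for e in logs if abs(e.timestamp - failure_time) <= window_s]
    return [e for e in log_set if e.log_id == log_id]
```

Filtering by time first and by identifier second mirrors the order described above; either identifier scheme works as long as the collectors and the lookup use the same one.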
Step S908, determining the failure cause of the target application according to the running states of the multiple infrastructure services.
In some embodiments, this step may be implemented by evaluating the running states of the multiple infrastructure services to obtain an evaluation index for each infrastructure service, where the evaluation index is used to evaluate the running state of that infrastructure service; determining a target infrastructure service among the multiple infrastructure services according to the evaluation indexes; and determining that the fault cause is a fault caused by the target infrastructure service. When the running states are evaluated, the evaluation mode corresponding to each infrastructure service first needs to be determined, that is, different infrastructure services have different evaluation modes, and each infrastructure service is evaluated using its own evaluation mode. The evaluation index includes, but is not limited to, communication rate, error rate, and the like.
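A minimal sketch of per-service evaluation is given below. It assumes that each evaluation mode maps a running state to a score between 0 and 1, with higher meaning healthier, and that the target infrastructure service is the one with the lowest score; the scoring functions and state fields are hypothetical.

```python
from typing import Callable, Dict, Mapping

Evaluator = Callable[[Mapping[str, float]], float]


def evaluate_it(state: Mapping[str, float]) -> float:
    """Hypothetical IT evaluation mode: score falls as the error rate rises."""
    return 1.0 - state["error_rate"]


def evaluate_ct(state: Mapping[str, float]) -> float:
    """Hypothetical CT evaluation mode: achieved rate relative to expected rate."""
    return min(state["rate_mbps"] / state["expected_rate_mbps"], 1.0)


# Each infrastructure service is evaluated with its own evaluation mode.
EVALUATORS: Dict[str, Evaluator] = {"IT": evaluate_it, "CT": evaluate_ct}


def find_target_service(states: Mapping[str, Mapping[str, float]]) -> str:
    """Return the service with the worst evaluation index (assumed fault source)."""
    scores = {name: EVALUATORS[name](state) for name, state in states.items()}
    return min(scores, key=lambda name: scores[name])
```

With an IT error rate of 0.05 and a CT rate of 20 Mbps against an expected 100 Mbps, this sketch would single out the CT service as the target infrastructure service.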
In some embodiments of the present application, the user may select an appropriate evaluation index as required, as shown in fig. 8. Fig. 8 is an interactive interface according to an embodiment of the present application. The upper left of fig. 8 is a display interface showing the topology diagram of a private network. During normal operation the topology may be shown in green or another user-specified color. When a fault occurs and its cause has been determined, the color of the node corresponding to the faulty equipment changes, and the color differs according to the severity of the fault. The upper right of fig. 8 shows multiple preset evaluation criteria, from which the user can directly select at least one to evaluate the infrastructure services of the private network according to his or her own needs, while the lower right shows the evaluation indexes obtained by evaluating the infrastructure services of the private network based on the selected evaluation criteria. The lower left of fig. 8 shows a summary of faults: all faults are classified into IT faults and CT faults according to fault category, and important information such as the fault cause, fault level (severity), and duration of each fault can be displayed, so that the user can conveniently operate and maintain the private network.
In other embodiments of the present application, after determining the cause of the failure of the target application based on the operational status of various infrastructure services, the failure may be isolated, for example:
When the fault is caused by a CT infrastructure service, running state indication information of the CT infrastructure service is stored for a first time period, where the running state indication information indicates that the CT infrastructure service is in an unavailable state. When the fault is caused by an IT infrastructure service, running state indication information of the IT infrastructure service is stored for a second time period, where the running state indication information indicates that the IT infrastructure service is in an unavailable state. A fault of a CT infrastructure service has a larger impact and a more complex detection process, so the service needs to remain isolated until the fault is recovered. A fault of the IT infrastructure is usually a fault of a certain program segment or script and is easy to detect, so it can be isolated at any time. Therefore, the duration of the first time period needs to be greater than the duration of the second time period.
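The two isolation periods can be sketched as a small registry that stores an "unavailable" mark with an expiry time. The durations, the `IsolationRegistry` class, and the service names are hypothetical; the only property taken from the description is that the first (CT) period is longer than the second (IT) period.

```python
from typing import Dict

# Assumed durations in seconds; only their ordering comes from the description.
ISOLATION_SECONDS = {"CT": 24 * 3600, "IT": 300}


class IsolationRegistry:
    """Stores running state indication information marking services unavailable."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def isolate(self, service: str, fault_type: str, now: float) -> None:
        if fault_type not in ISOLATION_SECONDS:
            raise ValueError(f"unknown fault type: {fault_type}")
        self._expires_at[service] = now + ISOLATION_SECONDS[fault_type]

    def is_unavailable(self, service: str, now: float) -> bool:
        expires = self._expires_at.get(service)
        return expires is not None and now < expires

    def recover(self, service: str) -> None:
        """Clear the mark once the fault has been recovered."""
        self._expires_at.pop(service, None)
```

Passing `now` explicitly instead of reading the clock inside the class keeps the sketch deterministic and easy to test.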
In some embodiments, the fault of the target application may be caused by insufficient capacity of an infrastructure service. After the fault cause of the target application is determined according to the running states of the multiple infrastructure services, if the fault cause is insufficient capacity of the infrastructure services, prompt information prompting capacity expansion is generated. Both IT infrastructure services and CT infrastructure services may cause the target application to fail because of insufficient capacity. Such capacity includes, but is not limited to, the remaining operating resources of the infrastructure corresponding to each infrastructure service, such as remaining memory space.
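As a rough illustration of the capacity check, the sketch below generates a prompt for every service whose remaining capacity falls under a minimum fraction. The 10% cutoff, the message wording, and the representation of capacity as a fraction are assumptions made for the example.

```python
from typing import List, Mapping


def capacity_prompts(remaining: Mapping[str, float],
                     min_fraction: float = 0.1) -> List[str]:
    """Return expansion prompts for services with too little remaining capacity.

    `remaining` maps a service name to its remaining capacity as a fraction
    between 0 and 1 (for example, remaining memory space).
    """
    return [
        f"Capacity of {name} is low ({fraction:.0%} remaining); consider expanding it."
        for name, fraction in remaining.items()
        if fraction < min_fraction
    ]
```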
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
Example 2
According to an embodiment of the present application, there is also provided a fault handling method as shown in fig. 6, including:
Step S602, when at least one fault occurs in the ICT operation platform, current operation state information in the ICT operation platform is collected, and operation log information is generated;
In some embodiments, running logs of the multiple infrastructure services are obtained, and the running states of the multiple infrastructure services are determined based on the running logs. Different infrastructure services adopt different log collection manners, that is, the log collection manner of each infrastructure service may be independent of the others. For example, IT devices collect logs using syslog/SNMP, while CT devices collect logs using a DPI proprietary protocol.
The running log may be determined as follows: determining time information of when the target application fails and a log identifier of the target application, where the log identifier identifies the logs generated by the IT infrastructure services and CT infrastructure services associated with the target application; determining a time stamp corresponding to the time information and a log set corresponding to the time stamp; and determining, from the log set, the log corresponding to the log identifier and taking that log as the running log.
The log identifier may be determined based on a first identifier of the IT infrastructure service and a second identifier of the CT infrastructure service associated with the target application. This can be done in various ways: for example, the first identifier and the second identifier may be combined to form the log identifier, or a hash operation may be performed on the first identifier and the second identifier and the result taken as the log identifier.
Step S604, determining a fault type of each fault in at least one fault based on the operation log information, and positioning each fault, wherein the fault type comprises an IT fault and a CT fault;
In some embodiments, this step may be implemented by evaluating the running states of the multiple infrastructure services to obtain an evaluation index for each infrastructure service, where the evaluation index is used to evaluate the running state of that infrastructure service; determining a target infrastructure service among the multiple infrastructure services according to the evaluation indexes; and determining that the fault cause is a fault caused by the target infrastructure service. When the running states are evaluated, the evaluation mode corresponding to each infrastructure service first needs to be determined, that is, different infrastructure services have different evaluation modes, and each infrastructure service is evaluated using its own evaluation mode. The evaluation index includes, but is not limited to, communication rate, error rate, and the like.
Step S606, adopting a fault isolation strategy corresponding to the fault type of each fault to isolate the infrastructure service corresponding to each fault.
In other embodiments of the present application, after locating the fault, the fault may be isolated, for example:
When the fault is caused by a CT infrastructure service, running state indication information of the CT infrastructure service is stored for a first time period, where the running state indication information indicates that the CT infrastructure service is in an unavailable state. When the fault is caused by an IT infrastructure service, running state indication information of the IT infrastructure service is stored for a second time period, where the running state indication information indicates that the IT infrastructure service is in an unavailable state. A fault of a CT infrastructure service has a larger impact and a more complex detection process, so the service needs to remain isolated until the fault is recovered. A fault of the IT infrastructure is usually a fault of a certain program segment or script and is easy to detect, so it can be isolated at any time. Therefore, the duration of the first time period needs to be greater than the duration of the second time period.
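Step S606 selects an isolation strategy by fault type, which can be sketched as a simple dispatch table. The `Fault` type, the strategy functions, and the returned descriptions are hypothetical placeholders for whatever isolation actions a real platform would perform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Fault:
    service: str
    fault_type: str  # "IT" or "CT"


def isolate_ct(fault: Fault) -> str:
    """Hypothetical CT strategy: long isolation, lifted only after recovery."""
    return f"{fault.service}: keep isolated until recovery"


def isolate_it(fault: Fault) -> str:
    """Hypothetical IT strategy: short isolation of the faulty program or script."""
    return f"{fault.service}: isolate for a short period"


STRATEGIES: Dict[str, Callable[[Fault], str]] = {"CT": isolate_ct, "IT": isolate_it}


def handle_faults(faults: Iterable[Fault]) -> List[str]:
    """Apply the isolation strategy matching each fault's type."""
    return [STRATEGIES[fault.fault_type](fault) for fault in faults]
```

Keeping the strategies in a table means a further fault type could be supported by adding one entry, without touching `handle_faults`.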
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
Example 3
There is further provided in accordance with an embodiment of the present application a fault determining apparatus as shown in fig. 7, which includes a first obtaining module 70 configured to obtain a target detection indicator of a target application provided in an ICT operation platform, where the target detection indicator is configured to reflect an operation state of the target application, and the ICT operation platform includes a plurality of infrastructure services, where the plurality of infrastructure services includes at least a CT infrastructure service and an IT infrastructure service, a first determining module 72 configured to determine whether the target application has a fault according to the target detection indicator, a second obtaining module 74 configured to obtain an operation state of the plurality of infrastructure services in the ICT operation platform when the target application has a fault, and a second determining module 76 configured to determine a cause of the fault of the target application according to the operation states of the plurality of infrastructure services.
In some embodiments of the present application, the first obtaining module 70 may obtain the target detection index in various manners. For example, a request message may be sent to the user side, and the user equipment responds to the request message by feeding back the target detection index of the target application to the ICT operation platform; alternatively, the target detection index may be received from the user side at regular intervals. The target detection index includes, but is not limited to, communication quality parameters such as the data transmission rate and the bit error rate of the application.
In some embodiments of the present application, the first determination module 72 may determine whether the target application has failed by comparing the target detection index with a preset threshold and determining from the comparison result whether a failure has occurred. Depending on the index concerned, the target application is determined to have failed when the comparison result indicates that the target detection index is smaller than the preset threshold, or when it indicates that the target detection index is greater than the preset threshold. For example, when the data transmission rate of the target application is less than the rate threshold, the target application is determined to have failed.
In some embodiments, the second obtaining module 74 may obtain running logs of the multiple infrastructure services and determine the running states of the multiple infrastructure services based on the running logs. Different infrastructure services adopt different log collection manners, that is, the log collection manner of each infrastructure service may be independent of the others. For example, IT devices collect logs using syslog/SNMP, while CT devices collect logs using a DPI proprietary protocol.
The running log may be determined as follows: determining time information of when the target application fails and a log identifier of the target application, where the log identifier identifies the logs generated by the IT infrastructure services and CT infrastructure services associated with the target application; determining a time stamp corresponding to the time information and a log set corresponding to the time stamp; and determining, from the log set, the log corresponding to the log identifier and taking that log as the running log.
The log identifier may be determined based on a first identifier of the IT infrastructure service and a second identifier of the CT infrastructure service associated with the target application. This can be done in various ways: for example, the first identifier and the second identifier may be combined to form the log identifier, or a hash operation may be performed on the first identifier and the second identifier and the result taken as the log identifier.
In some embodiments of the present application, the second determination module 76 may be implemented by evaluating the operational status of the plurality of infrastructure services to obtain an evaluation index for each infrastructure service, where the evaluation index is used to evaluate the operational status of each infrastructure service, determining a target infrastructure service of the plurality of infrastructure services based on the evaluation index, and determining a cause of the fault as the fault caused by the target infrastructure service. When evaluating the operation states of the plurality of infrastructure services, the evaluation modes corresponding to the infrastructure services in the plurality of infrastructure services are required to be determined, namely, different evaluation modes exist for different infrastructure services, and the plurality of infrastructure services are evaluated by adopting the evaluation modes corresponding to the infrastructure services. The evaluation index includes, but is not limited to, communication rate, error rate, etc.
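The cooperation of the four modules can be sketched as a single object that wires them together. This is only an illustration of the data flow between modules 70, 72, 74, and 76; the class name, the callable signatures, and the use of a plain score per service as the running state are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class FaultDeterminingApparatus:
    obtain_index: Callable[[], float]                      # first obtaining module 70
    has_failed: Callable[[float], bool]                    # first determining module 72
    obtain_states: Callable[[], Mapping[str, float]]       # second obtaining module 74
    determine_cause: Callable[[Mapping[str, float]], str]  # second determining module 76

    def run(self) -> Optional[str]:
        """Return the fault cause, or None when the target application is healthy."""
        index = self.obtain_index()
        if not self.has_failed(index):
            return None
        # Running states are collected only after a failure has been detected.
        return self.determine_cause(self.obtain_states())
```

For example, wiring in a rate check such as `lambda v: v < 10.0` and a cause function that picks the lowest-scoring service yields a complete, if simplified, fault determination loop.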
Example 3
Embodiments of the present application may provide a computer terminal, which may be any one of a group of computer terminals. Alternatively, in the present embodiment, the above-described computer terminal may be replaced with a terminal device such as a mobile terminal.
Alternatively, in this embodiment, the above-mentioned computer terminal may be located in at least one network device among a plurality of network devices of the computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the fault determination method: obtaining a target detection index of a target application provided in an ICT operation platform, where the target detection index is used to reflect an operation state of the target application, the ICT operation platform includes multiple infrastructure services, and the multiple infrastructure services include at least a CT infrastructure service and an IT infrastructure service; determining whether the target application fails according to the target detection index; obtaining the operation states of the multiple infrastructure services in the ICT operation platform in the case that the target application fails; and determining the failure cause of the target application according to the operation states of the multiple infrastructure services.
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the fault determining method and apparatus in the embodiments of the present application, and the processor executes the software programs and modules stored in the memory, thereby executing various functional applications and data processing, that is, implementing the fault determining method described above. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located with respect to the processor, which may be connected to terminal a through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call information and application programs stored in the memory through the transmission device to acquire target detection indexes of target applications provided in the ICT operation platform, wherein the target detection indexes are used for reflecting the operation states of the target applications, the ICT operation platform comprises multiple infrastructure services, at least the CT infrastructure services and the IT infrastructure services, whether the target applications are faulty or not is determined according to the target detection indexes, the operation states of the multiple infrastructure services in the ICT operation platform are acquired under the condition that the target applications are faulty, and the fault reasons of the target applications are determined according to the operation states of the multiple infrastructure services.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program for instructing a terminal device related hardware, and the program may be stored in a computer readable storage medium, where the storage medium may include a flash disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, etc.
Example 4
Embodiments of the present application also provide a nonvolatile storage medium. Alternatively, in the present embodiment, the above-described nonvolatile storage medium may be used to store the program code executed by the failure determination method provided in the above-described embodiment.
Alternatively, in this embodiment, the storage medium may be located in any one of the computer terminals in the computer terminal group in the computer network, or in any one of the mobile terminals in the mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for obtaining a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is configured to reflect an operation state of the target application, the ICT operation platform includes a plurality of infrastructure services, and the plurality of infrastructure services include at least a CT infrastructure service and an IT infrastructure service, determining whether the target application fails according to the target detection indicator, obtaining an operation state of the plurality of infrastructure services in the ICT operation platform in case of failure of the target application, and determining a cause of failure of the target application according to the operation state of the plurality of infrastructure services.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium that can store the program code.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (16)

1. A fault determination method, comprising:
obtaining a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is used to reflect an operation status of the target application, the ICT operation platform includes multiple infrastructure services, and the multiple infrastructure services include at least a CT infrastructure service and an IT infrastructure service;
determining whether the target application fails based on the target detection indicator;
in the event that the target application fails, obtaining the operating status of the multiple infrastructure services in the ICT operation platform, including:
obtaining operation logs of the multiple infrastructure services, including: determining time information of when the target application fails and a log identifier of the target application, wherein the log identifier is used to identify logs generated by IT infrastructure services and CT infrastructure services associated with the target application; and determining the operation logs based on the time information and the log identifier of the target application;
determining the operating status of the multiple infrastructure services based on the operation logs; and
determining the cause of the failure of the target application based on the operating status of the multiple infrastructure services.

2. The method according to claim 1, wherein different infrastructure services use different log collection methods.

3. The method according to claim 1, wherein the operation logs include logs generated by IT infrastructure services and CT infrastructure services.

4. The method according to claim 1, wherein determining the operation logs comprises:
determining a timestamp corresponding to the time information, and determining a log set corresponding to the timestamp; and
determining, from the log set according to the log identifier, the log corresponding to the log identifier, and using the log corresponding to the log identifier as the operation log.

5. The method according to claim 1, wherein determining the cause of the failure of the target application based on the operating status of the multiple infrastructure services comprises:
evaluating the operating status of the multiple infrastructure services to obtain an evaluation index for each infrastructure service, wherein the evaluation index is used to evaluate the operating status of each infrastructure service; and
determining a target infrastructure service among the multiple infrastructure services based on the evaluation index, and determining that the fault cause is a fault caused by the target infrastructure service.

6. The method according to claim 5, wherein after determining that the fault cause is a fault caused by the target infrastructure service, the method further comprises:
displaying to a target object, by category, the fault cause, fault duration, and fault level of each fault, wherein the fault level is used to characterize the severity of the fault, and faults of different fault levels are marked with different labels.

7. The method according to claim 5, wherein evaluating the operating status of the multiple infrastructure services comprises:
determining an evaluation method corresponding to each of the multiple infrastructure services; and
evaluating the multiple infrastructure services respectively using the evaluation methods corresponding to the respective infrastructure services.

8. The method according to claim 6, wherein determining the evaluation method corresponding to each of the multiple infrastructure services comprises:
displaying multiple preset evaluation methods in an interactive interface; and
in response to a selection instruction of the target object received through the interactive interface, selecting, from the multiple evaluation methods, the evaluation method corresponding to each infrastructure service.

9. The method according to claim 7, wherein the method further comprises: displaying the evaluation results of the multiple infrastructure services in an interactive interface, and displaying, by category, the fault cause, fault duration, and fault level of each fault, wherein the fault level is used to characterize the severity of the fault, and faults of different fault levels are marked with different labels.

10. The method according to claim 1, wherein after determining the cause of the failure of the target application based on the operating status of the multiple infrastructure services, the method further comprises:
in a case where the fault cause is a fault caused by a CT infrastructure service, storing operation status indication information of the CT infrastructure service within a first time period, wherein the operation status indication information is used to indicate that the CT infrastructure service is in an unavailable state; and
in a case where the fault cause is a fault caused by an IT infrastructure service, storing operation status indication information of the IT infrastructure service within a second time period, wherein the operation status indication information is used to indicate that the IT infrastructure service is in an unavailable state;
wherein the duration corresponding to the first time period is greater than the duration corresponding to the second time period.

11. The method according to any one of claims 1 to 9, wherein after determining the cause of the failure of the target application based on the operating status of the multiple infrastructure services, the method further comprises:
in a case where the fault cause is a fault caused by insufficient capacity of the multiple infrastructure services, generating prompt information for prompting capacity expansion.

12. A fault determination method, comprising:
obtaining a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is used to reflect an operation status of the target application, the ICT operation platform includes multiple infrastructure services, and the multiple infrastructure services include at least a CT infrastructure service and an IT infrastructure service;
determining whether the target application fails based on the target detection indicator;
in the event that the target application fails, obtaining the operating status of the multiple infrastructure services in the ICT operation platform, including:
obtaining operation logs of the multiple infrastructure services, including: determining time information of when the target application fails and a log identifier of the target application, wherein the log identifier is used to identify logs generated by IT infrastructure services and CT infrastructure services associated with the target application; and determining the operation logs based on the time information and the log identifier of the target application;
determining the operating status of the multiple infrastructure services based on the operation logs; and
determining the cause of the failure of the target application based on the operating status of the multiple infrastructure services.

13. 
A fault handling method, comprising: 在ICT运行平台中发生至少一个故障时,采集所述ICT运行平台中当前的运行状态信息,并生成运行日志信息,包括:When at least one fault occurs in the ICT operation platform, current operation status information of the ICT operation platform is collected and operation log information is generated, including: 获取多种基础设施服务的运行日志,包括:Obtain operation logs for various infrastructure services, including: 确定目标应用发生故障时的时间信息和所述目标应用的日志标识,其中,所述日志标识用于标识与所述目标应用关联的IT基础设施服务和CT基础设施服务所产生的日志;依据所述时间信息和所述目标应用的日志标识,确定所述运行日志;Determining time information when a target application fails and a log identifier of the target application, wherein the log identifier is used to identify logs generated by IT infrastructure services and CT infrastructure services associated with the target application; and determining the operation log based on the time information and the log identifier of the target application; 基于所述运行日志确定所述ICT运行平台中当前的运行状态信息;Determining current operating status information in the ICT operating platform based on the operating log; 基于所述运行日志信息,确定所述至少一个故障中每个故障的故障类型,并对所述每个故障进行定位,其中,所述故障类型包括IT故障和CT故障;Determine a fault type of each of the at least one fault based on the operation log information, and locate each of the faults, wherein the fault type includes an IT fault and a CT fault; 采用与所述每个故障的故障类型对应的故障隔离策略,对所述每个故障对应的基础设施服务进行隔离。A fault isolation strategy corresponding to the fault type of each fault is adopted to isolate the infrastructure service corresponding to each fault. 14.一种故障确定装置,其中,包括:14. 
A fault determination device, comprising: 第一获取模块,用于获取ICT运行平台中提供的目标应用的目标检测指标,其中,所述目标检测指标用于反映所述目标应用的运行状态,所述ICT运行平台中包括多种基础设施服务,该多种基础设施服务中至少包括:CT基础设施服务和IT基础设施服务;a first acquisition module, configured to acquire a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is used to reflect an operation status of the target application, wherein the ICT operation platform includes multiple infrastructure services, and the multiple infrastructure services include at least CT infrastructure services and IT infrastructure services; 第一确定模块,用于依据所述目标检测指标确定所述目标应用是否发生故障;A first determination module is configured to determine whether a fault occurs in the target application based on the target detection indicator; 第二获取模块,用于在所述目标应用发生故障的情况下,获取所述ICT运行平台中的所述多种基础设施服务的运行状态,包括:The second acquisition module is configured to acquire the operating status of the multiple infrastructure services in the ICT operating platform when the target application fails, including: 获取所述多种基础设施服务的运行日志,包括:Obtain operation logs for the various infrastructure services, including: 确定所述目标应用发生故障时的时间信息和所述目标应用的日志标识,其中,所述日志标识用于标识与所述目标应用关联的IT基础设施服务和CT基础设施服务所产生的日志;依据所述时间信息和所述目标应用的日志标识,确定所述运行日志;Determining time information when the target application fails and a log identifier of the target application, wherein the log identifier is used to identify logs generated by IT infrastructure services and CT infrastructure services associated with the target application; determining the operation log based on the time information and the log identifier of the target application; 基于所述运行日志确定所述多种基础设施服务的运行状态;determining the operating status of the plurality of infrastructure services based on the operating log; 第二确定模块,用于依据所述多种基础设施服务的运行状态确定所述目标应用的故障原因。The second determining module is configured to determine a cause of the failure of the target application according to the operating status of the multiple infrastructure services. 
15.一种非易失性存储介质,其中,所述非易失性存储介质包括存储的程序,其中,在所述程序运行时控制所述非易失性存储介质所在设备执行权利要求1至7中任意一项所述的故障确定方法。15. A non-volatile storage medium, wherein the non-volatile storage medium includes a stored program, wherein when the program is executed, the device where the non-volatile storage medium is located is controlled to execute the fault determination method according to any one of claims 1 to 7. 16.一种计算机终端,其中,包括:16. A computer terminal, comprising: 处理器;以及processor; and 存储器,与所述处理器连接,用于为所述处理器提供处理以下处理步骤的指令:A memory, connected to the processor, configured to provide the processor with instructions for processing the following processing steps: 获取ICT运行平台中提供的目标应用的目标检测指标,其中,所述目标检测指标用于反映所述目标应用的运行状态,所述ICT运行平台中包括多种基础设施服务,该多种基础设施服务中至少包括:CT基础设施服务和IT基础设施服务;Obtaining a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is used to reflect an operation status of the target application, wherein the ICT operation platform includes multiple infrastructure services, and the multiple infrastructure services include at least: a CT infrastructure service and an IT infrastructure service; 依据所述目标检测指标确定所述目标应用是否发生故障;Determining whether a failure occurs in the target application based on the target detection indicator; 在所述目标应用发生故障的情况下,获取所述ICT运行平台中的所述多种基础设施服务的运行状态,包括:In the event that the target application fails, obtaining the operating status of the multiple infrastructure services in the ICT operating platform includes: 获取所述多种基础设施服务的运行日志,包括:Obtain operation logs for the various infrastructure services, including: 确定所述目标应用发生故障时的时间信息和所述目标应用的日志标识,其中,所述日志标识用于标识与所述目标应用关联的IT基础设施服务和CT基础设施服务所产生的日志;依据所述时间信息和所述目标应用的日志标识,确定所述运行日志;Determining time information when the target application fails and a log identifier of the target application, wherein the log identifier is used to identify logs generated by IT infrastructure services and CT infrastructure services associated with the target application; determining the operation log based on the 
time information and the log identifier of the target application; 基于所述运行日志确定所述多种基础设施服务的运行状态;determining the operating status of the plurality of infrastructure services based on the operating log; 依据所述多种基础设施服务的运行状态确定所述目标应用的故障原因。The cause of the failure of the target application is determined based on the operating status of the multiple infrastructure services.
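
For illustration only, and not as part of the claims, the log selection described in claims 4 and 12 and the evaluation step of claim 5 might be sketched in Python as follows. The claims do not specify data structures or an evaluation formula, so everything named here is an assumption: the `LogRecord` fields, the 60-second window used to form the log set around the fault timestamp, the service names, and the use of an error ratio as the evaluation index.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record; the field names are assumptions."""
    timestamp: float  # seconds since some epoch
    log_id: str       # log identifier of the target application
    service: str      # e.g. "CT:network" or "IT:database" (made-up names)
    level: str        # "INFO" or "ERROR"


def select_operation_logs(records: List[LogRecord], fault_time: float,
                          log_id: str, window: float = 60.0) -> List[LogRecord]:
    """Pick the operation logs for one fault, in two steps."""
    # Step 1: the log set corresponding to the fault timestamp
    # (assumed here to mean records within +/- window seconds).
    log_set = [r for r in records if abs(r.timestamp - fault_time) <= window]
    # Step 2: keep only records carrying the target application's log identifier.
    return [r for r in log_set if r.log_id == log_id]


def evaluate_services(logs: List[LogRecord]) -> Dict[str, float]:
    """Assumed evaluation index: share of ERROR records per service."""
    totals: Dict[str, int] = {}
    errors: Dict[str, int] = {}
    for r in logs:
        totals[r.service] = totals.get(r.service, 0) + 1
        if r.level == "ERROR":
            errors[r.service] = errors.get(r.service, 0) + 1
    return {s: errors.get(s, 0) / n for s, n in totals.items()}


def find_target_service(scores: Dict[str, float]) -> Optional[str]:
    """Return the service with the worst index, or None if no service shows errors."""
    if not scores:
        return None
    worst = max(scores, key=scores.get)
    return worst if scores[worst] > 0 else None
```

In this sketch a caller would pass the fault time and the application's log identifier to `select_operation_logs`, feed the result to `evaluate_services`, and treat the service returned by `find_target_service` as the candidate fault cause. A real system could use a different evaluation method per service, as claim 7 describes.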
CN202110235249.0A 2021-03-03 2021-03-03 Fault determination method and device, nonvolatile storage medium and computer terminal Active CN115087000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110235249.0A CN115087000B (en) 2021-03-03 2021-03-03 Fault determination method and device, nonvolatile storage medium and computer terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110235249.0A CN115087000B (en) 2021-03-03 2021-03-03 Fault determination method and device, nonvolatile storage medium and computer terminal

Publications (2)

Publication Number Publication Date
CN115087000A CN115087000A (en) 2022-09-20
CN115087000B true CN115087000B (en) 2025-08-19

Family

ID=83240481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110235249.0A Active CN115087000B (en) 2021-03-03 2021-03-03 Fault determination method and device, nonvolatile storage medium and computer terminal

Country Status (1)

Country Link
CN (1) CN115087000B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111901171A (en) * 2020-07-29 2020-11-06 腾讯科技(深圳)有限公司 Anomaly detection and attribution method, device, equipment and computer readable storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102158360B (en) * 2011-04-01 2013-10-30 华中科技大学 Network fault self-diagnosis method based on causal relationship positioning of time factors
CN104732447B (en) * 2014-04-23 2019-03-22 国家电网公司 A method for establishing vulnerability index system of important infrastructure of power grid
US9553997B2 (en) * 2014-11-01 2017-01-24 Somos, Inc. Toll-free telecommunications management platform
CN104618479B (en) * 2015-01-29 2018-10-09 深圳市布谷鸟科技有限公司 A method of realizing the vehicle-mounted cloud service terminal information transfer of commercial car to be layered communication modes
CN204374756U (en) * 2015-02-03 2015-06-03 福建金科信息技术股份有限公司 Integration O&M supervisory system
CN105335819B (en) * 2015-10-21 2019-08-02 国家电网公司 A method for building an information system risk early warning model based on big data
CN105407011B (en) * 2015-10-26 2018-10-19 贵州电网公司信息通信分公司 A kind of IT basic platforms monitor control index acquisition system and acquisition method
CN106804054B (en) * 2015-11-26 2020-11-20 中兴通讯股份有限公司 A method and device for virtualized base station access network to share transmission resources
KR101853746B1 (en) * 2016-09-12 2018-05-03 주식회사 보성산업롤 Roller Abnomal Checking Device
CN107102931A (en) * 2017-04-11 2017-08-29 深信服科技股份有限公司 IT operation management method, device and computer-readable recording medium
CN107360253A (en) * 2017-08-18 2017-11-17 上海盈联电信科技有限公司 A kind of middleware system for Internet of things
CN111175634A (en) * 2019-12-16 2020-05-19 珠海博杰电子股份有限公司 ICT test platform
CN111516548B (en) * 2020-04-23 2021-11-23 华南理工大学 Cloud platform-based charging pile system for realizing power battery fault diagnosis
CN111984999B (en) * 2020-08-20 2021-11-30 海南电网有限责任公司信息通信分公司 Safety management and control method and system for power failure first-aid repair system
CN112231174B (en) * 2020-09-30 2024-02-23 中国银联股份有限公司 Abnormality warning method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115087000A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
US12375340B2 (en) Data driven systems and methods to isolate network faults
US11469992B2 (en) Systems and methods for managing multi-layer communication networks
US8370466B2 (en) Method and system for providing operator guidance in network and systems management
US9225554B2 (en) Device-health-based dynamic configuration of network management systems suited for network operations
US7756046B2 (en) Apparatus and method for locating trouble occurrence position in communication network
US10318335B1 (en) Self-managed virtual networks and services
CN107733672A (en) Fault handling method, device and controller
EP2681870A1 (en) Technique for determining correlated events in a communication system
EP2586158B1 (en) Apparatus and method for monitoring of connectivity services
CN116074178A (en) Network digital twin architecture, network session processing method and device
CN118075160A (en) Method, system and device for determining network link quality
CN112671586B (en) Automatic migration and guarantee method and device for service configuration
CN115087000B (en) Fault determination method and device, nonvolatile storage medium and computer terminal
Levin et al. Network Monitoring in Federated Cloud Environment
US12101241B1 (en) Mechanism for intelligent and comprehensive monitoring system using peer-to-peer agents in a network
JP2016146519A (en) Network monitoring system, monitoring device, and monitoring method
US7792045B1 (en) Method and apparatus for configuration and analysis of internal network routing protocols
CN118590425A (en) Dial-up test method, device, system and computing device cluster
CN119450749B (en) Data processing method and device, non-volatile storage medium, and electronic device
CN120075092A (en) Method and device for diagnosing unmanagability of home gateway, electronic equipment and computer program product
CN119316319B (en) Network quality determination method, storage medium, electronic device, and program product
CN115277133B (en) Equipment management methods and devices
US8572235B1 (en) Method and system for monitoring a complex IT infrastructure at the service level
Savoor et al. Network Measurements
Arnold Understanding Cloud Network Performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240313

Address after: # 03-06, Lai Zan Da Building 1, 51 Belarusian Road, Singapore

Applicant after: Alibaba Innovation Co.

Country or region after: Singapore

Address before: Room 01, 45th Floor, AXA Building, 8 Shanton Road, Singapore

Applicant before: Alibaba Singapore Holdings Ltd.

Country or region before: Singapore

GR01 Patent grant