CN115087000B - Fault determination method and device, nonvolatile storage medium and computer terminal

Info

Publication number: CN115087000B
Application number: CN202110235249.0A
Authority: CN
Inventors: 杨光
Original assignee: Alibaba Singapore Holdings Pte Ltd; Alibaba Innovation Private Ltd
Current assignee: Alibaba Innovation Private Ltd
Priority date: 2021-03-03
Filing date: 2021-03-03
Publication date: 2025-08-19
Anticipated expiration: 2041-03-03
Also published as: CN115087000A

Abstract

The present application discloses a fault determination method and apparatus, a non-volatile storage medium, and a computer terminal. The method comprises: obtaining a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is used to reflect the operating status of the target application, the ICT operation platform including multiple infrastructure services, the multiple infrastructure services including at least CT infrastructure services and IT infrastructure services; determining whether a target application has failed based on the target detection indicator; if the target application has failed, obtaining the operating status of the multiple infrastructure services in the ICT operation platform; and determining the cause of the target application failure based on the operating status of the multiple infrastructure services.

Description

Fault determination method and device, nonvolatile storage medium and computer terminal

Technical Field

The present application relates to the field of information and communication technology (ICT), and in particular to a fault determination method and device, a nonvolatile storage medium, and a computer terminal.

Background

In a 5G private network, the private network equipment is hosted in the cloud on a unified infrastructure, so network quality depends not only on the communication functions but also on the infrastructure. Consequently, when a fault occurs in the 5G private network, its cause cannot be determined by checking communication indexes alone. Besides the communication functions of the private network, the infrastructure also has to carry the industry applications of the private network, so the dynamic resource guarantee status of the infrastructure also needs to be detected.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiments of the application provide a fault determination method and device, a nonvolatile storage medium and a computer terminal, which at least solve the technical problem in the related art that the infrastructure is not detected.

According to one aspect of the embodiments of the application, a fault determination method is provided, comprising: obtaining a target detection index of a target application provided in an ICT operation platform, wherein the target detection index reflects the operation state of the target application, and the ICT operation platform comprises a plurality of infrastructure services, including at least CT infrastructure services and IT infrastructure services; and determining whether the target application has a fault according to the target detection index. It should be noted that, because the infrastructure services of the ICT platform include both IT infrastructure services and CT infrastructure services, accurately determining whether it is an IT infrastructure service or a CT infrastructure service that has failed requires troubleshooting both at the same time. That is, when determining whether the target application has failed according to the target detection index, the embodiments of the application adopt linkage analysis of the IT and CT equipment. Through linked fault investigation of the IT equipment and the CT equipment, faults are located efficiently and accurately, and unified operation and maintenance of the IT/CT equipment is achieved.

According to another aspect of the embodiments of the application, a fault handling method is further provided, comprising: when at least one fault occurs in an ICT operation platform, collecting current running state information in the ICT operation platform and generating running log information; determining the fault type of each of the at least one fault based on the running log information, and locating each fault, wherein the fault types comprise IT faults and CT faults; and isolating the infrastructure service corresponding to each fault by adopting the fault isolation strategy corresponding to the fault type of that fault.

According to another aspect of the embodiments of the application, a fault determination device is further provided, comprising a first acquisition module, a first determination module, a second acquisition module and a second determination module. The first acquisition module is used for obtaining a target detection index of a target application provided in an ICT operation platform, wherein the target detection index reflects the operation state of the target application, and the ICT operation platform comprises a plurality of infrastructure services, including at least CT infrastructure services and IT infrastructure services. The first determination module is used for determining whether the target application has a fault according to the target detection index. The second acquisition module is used for obtaining the operation states of the plurality of infrastructure services in the ICT operation platform when the target application has a fault. The second determination module is used for determining the fault cause of the target application according to the operation states of the plurality of infrastructure services.

According to another aspect of the embodiments of the present application, a nonvolatile storage medium is also provided, the nonvolatile storage medium including a stored program, wherein, when the program runs, the device in which the nonvolatile storage medium is located is controlled to execute the above fault determination method.

According to another aspect of the embodiments of the application, a computer terminal is provided, comprising a processor and a memory, wherein the memory is connected to the processor and provides the processor with instructions for the following processing steps: obtaining a target detection index of a target application provided in an ICT operation platform, wherein the target detection index reflects the operation state of the target application, and the ICT operation platform comprises a plurality of infrastructure services, including at least CT infrastructure services and IT infrastructure services; determining whether the target application has a fault according to the target detection index; obtaining the operation states of the plurality of infrastructure services in the ICT operation platform when the target application has a fault; and determining the fault cause of the target application according to the operation states of the plurality of infrastructure services.

In the embodiment of the application, under the condition that the fault is determined according to the target detection index of the target application in the ICT operation platform, the fault cause is determined based on the operation states of various infrastructure services in the ICT operation platform, so that the detection of the infrastructure is realized, and the technical problem that the infrastructure is not detected in the related technology is solved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a diagram of a conventional 5G to C network operation and maintenance architecture according to the related art;

FIG. 2 is a schematic diagram of an ICT unified operation and maintenance architecture for an industry private network according to an embodiment of the present application;

FIG. 3 is a schematic diagram of ICT unified operation and maintenance for an industry private network according to an embodiment of the present application;

FIG. 4 is a schematic structural view of a computer terminal according to an embodiment of the present application;

FIG. 5 is a flow chart of a fault determination method according to an embodiment of the present application;

FIG. 6 is a flow chart of a fault handling method according to an embodiment of the present application;

FIG. 7 is a schematic structural view of a fault determination device according to an embodiment of the present application;

FIG. 8 is a schematic diagram of an interactive interface according to an embodiment of the present application;

FIG. 9 is a flowchart of another fault determination method according to an embodiment of the present application.

Detailed Description

To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the present application, not all of them. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, some of the terms appearing in the description of the embodiments of the application are explained as follows:

Private network communication: services such as emergency communication, command and dispatch, and daily work communication provided for government and public security, public utilities, industry and commerce, and the like. A private network is a communication network built within certain industries, departments or units to meet the needs of organizational management, safe production, dispatch and command, and the like.

ICT: a combination of information technology and communication technology. Information technology refers to the various technologies employed to manage and process information, and generally means applying computer science and communication technology to design, develop, install and implement information systems and application software. Communication technologies include transmission access, network switching, mobile communication, wireless communication, optical communication, satellite communication, support management, private network communication, and the like; current examples include 5G, LTE, IPTV, VoIP, NGN and IMS.

Resource orchestrator: cloud vendors have successively launched their own resource orchestration services (Resource Orchestration, hereinafter ROS). The concept behind ROS is infrastructure as code. On the one hand, changes to the infrastructure are recorded through version management in the same way as code; on the other hand, operation and maintenance is automated through code while the complexity of writing that code is reduced. The user describes the configuration, dependencies and so on of multiple cloud computing resources (such as ECS, RDS and SLB) in a JSON/YAML template, and the deployment and configuration of all cloud resources across multiple regions and multiple accounts are completed automatically, so that operation and maintenance personnel can assemble an environment as easily as stacking building blocks.

Computing-network convergence: cloud-network collaboration is the integration of cloud and network services. In that model the cloud and the network remain relatively independent; only the integration of cloud computing and network services is provided, and the network architectures are not integrated. Computing-network convergence is version 2.0 of cloud-network integration: it breaks through the network bottleneck of computing and links up the tidal effect of computing power. Combined with 5G+MEC+AI technology, the network serves computing, improvements in computing capability in turn change the network, and computing and the network are deeply fused.

Simple Network Management Protocol (SNMP): a standard application-layer protocol designed specifically for managing network nodes (servers, workstations, routers, switches, hubs, etc.) in an IP network. SNMP enables network administrators to manage network performance, discover and solve network problems, and plan network growth. The network management system learns that the network has problems by receiving unsolicited messages (and event reports) through SNMP.

DPI (Deep Packet Inspection): a packet-based deep inspection technology that inspects the payloads of different network application-layer protocols (such as HTTP and DNS) in depth and determines the legitimacy of a message by examining its payload.

The DPI system is mainly responsible for parsing binary network transmission data into readable messages, performing layer-by-layer feature analysis on massive numbers of messages, and finally presenting the results in visual form through software to the operator's network management and operation service units, helping the operator carry out more refined network traffic management and management of other related services.

As shown in FIG. 1, the conventional operation and maintenance architecture of the 5G to C (5G technology for users) large network mainly includes three layers: a dedicated hardware infrastructure layer, a network device layer, and a professional operation and maintenance platform layer (i.e., the unified operation and maintenance center in FIG. 1). The professional operation and maintenance platform layer includes a wireless workbench, a core network workbench, a transmission workbench, a wireless operation and maintenance center (OMC, Operation and Maintenance Center), a core network OMC, and a transmission OMC. Correspondingly, the network device layer includes a wireless network, a core network and a transmission network, and the dedicated hardware infrastructure layer includes dedicated hardware for the wireless network, the core network and the transmission network. In this conventional architecture, the network devices of the wireless network, the transmission network and the core network all have dedicated hardware to guarantee performance and are deployed completely independently of upper-layer applications, so network operation and maintenance only attends to communication network indexes. In a 5G private network, however, everything is carried on a unified infrastructure, so network quality depends not only on the communication functions but also on the infrastructure. Moreover, the infrastructure has to carry the industry applications of the private network in addition to the private network communication functions, so the dynamic resource guarantee status of the infrastructure also needs to be detected.

In addition, applying the above operation and maintenance system to a 5G to B private network raises the following problems. First, the IT infrastructure has a resource scheduling problem: because the infrastructure of the private network carries both the 5G industry applications and the 5G communication functions, it must be orchestrated uniformly by a resource orchestrator for computing-network convergence, and conflicts or other faults may occur. Second, IT infrastructure problems are likely to affect the communication indexes: originally, the detected communication indexes (e.g., registration success rate, session establishment success rate) were related only to the communication service logic or to changes in the terminal environment, because the infrastructure was dedicated and did not affect the indexes. However, since a private network is likely to be carried on a general hardware platform, faults of the general hardware platform also affect the communication indexes.

In order to solve the problems, the embodiment of the application provides a corresponding solution, namely, under the condition that the fault is determined according to the target detection index of the target application in the ICT operation platform, the fault cause is determined based on various infrastructure service operation states in the ICT operation platform, so that the detection of the infrastructure is realized, and the detailed description is given below.

Example 1

FIG. 2 is a schematic diagram of an ICT unified operation and maintenance architecture for an industry private network according to an embodiment of the present application. As shown in FIG. 2, the operation and maintenance architecture mainly comprises three layers: a general infrastructure, service functions, and a professional operation and maintenance platform (i.e., the unified operation and maintenance center in FIG. 2). The professional operation and maintenance platform comprises CT infrastructure services (a wireless OMC, a core network OMC and a transmission OMC) and IT infrastructure detection. Correspondingly, the service functions include wireless network functions, core network functions and industry applications, and the general infrastructure includes infrastructure services such as computing, storage and networking.

As can be seen from FIG. 2, the 5G communication functions (including 5G RAN and 5GC) and the industry applications are uniformly carried on a general hardware platform, and each is managed by its own professional operation and maintenance system. Compared with the traditional large network, detection and management of the general infrastructure are added, so that the upper-layer ICT converged operation and maintenance center can detect indexes and perform operation and maintenance management in a unified manner. If a problem or fault involving the communication indexes arises, linked investigation is carried out across the professional operation and maintenance systems, so that communication index problems that may be caused by infrastructure faults or insufficient resources can be located.

The interaction flow of the above layers is shown in fig. 3, and can be roughly divided into the following parts:

Fault discovery: a unified log is formed through multiple collection means and on the basis of a certain identifier, and faults are screened according to the different rules of IT and CT. For example, IT devices use syslog and SNMP, while CT devices use DPI and proprietary protocols.

Syslog, often referred to as system log or system record, is a standard for delivering log messages over an Internet protocol (TCP/IP) network. The term is often used to refer to the actual syslog protocol, or to the applications or databases that submit syslog messages. The syslog protocol is a client-server protocol: a syslog sender can send a small text message (less than 1024 bytes) to a syslog receiver. The receiver is commonly named "syslogd", "syslog daemon" or syslog server. Syslog messages may be transmitted over UDP and/or TCP. The data is sent in clear text. However, although they are not part of the syslog protocol itself, SSL wrappers (e.g., stunnel, sslio or sslwrap) can be used to provide a layer of encryption over SSL/TLS.
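As a small illustration of how a receiving end might begin to screen such messages, the sketch below decodes the priority value that conventionally prefixes a syslog message (in the BSD syslog format, the number in angle brackets equals facility × 8 + severity). The function name `parse_pri` and the sample message are illustrative assumptions and are not part of the application.

```python
import re

# Leading "<PRI>" field of a syslog message, e.g. "<34>".
_PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_pri(message: str) -> tuple[int, int]:
    """Return (facility, severity) decoded from the leading <PRI> field."""
    match = _PRI_RE.match(message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    return divmod(int(match.group(1)), 8)


# Hypothetical sample message: PRI 34 decodes to facility 4, severity 2.
facility, severity = parse_pri("<34>Oct 11 22:14:15 host-1 su: authentication failure")
```

A lower severity number denotes a more serious event, so a screening rule for IT devices could, for instance, flag messages whose severity falls below a chosen level.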

Fault locating: linkage analysis is performed on IT faults and CT faults in the log under a unified timestamp.

Fault isolation: all CT service states are stored persistently, and IT can be isolated at any time.

Fault recovery: the IT water level (resource usage level) is detected and capacity is expanded on demand, ensuring elastic capacity expansion of CT.
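The fault discovery and fault locating parts of this flow can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the `LogRecord` fields, the metric names, the rule thresholds and the 60-second correlation window are all hypothetical and are not specified by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    timestamp: int   # seconds on the unified time base
    domain: str      # "IT" or "CT"
    source: str      # hypothetical device or service name
    metric: str
    value: float


# Hypothetical screening rules: IT and CT each have their own fault predicate.
RULES = {
    "IT": lambda r: r.metric == "cpu_util" and r.value > 0.9,
    "CT": lambda r: r.metric == "session_setup_success" and r.value < 0.95,
}


def screen_faults(records):
    """Fault discovery: keep the records flagged by their domain's rule."""
    return [r for r in records if RULES[r.domain](r)]


def correlate(faults, window=60):
    """Fault locating: pair each CT fault with IT faults close in time."""
    it_faults = [f for f in faults if f.domain == "IT"]
    ct_faults = [f for f in faults if f.domain == "CT"]
    return [
        (ct, it)
        for ct in ct_faults
        for it in it_faults
        if abs(ct.timestamp - it.timestamp) <= window
    ]


unified_log = [
    LogRecord(100, "IT", "server-1", "cpu_util", 0.97),
    LogRecord(130, "CT", "upf-1", "session_setup_success", 0.80),
    LogRecord(500, "IT", "server-2", "cpu_util", 0.95),
    LogRecord(140, "CT", "gnb-1", "session_setup_success", 0.99),
]
pairs = correlate(screen_faults(unified_log))
```

In this example the only pair is the CT fault on `upf-1` and the IT fault on `server-1`, which would suggest checking whether the degraded communication index stems from the overloaded server.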

Based on the above principles, the embodiments of the present application provide an embodiment of a fault determination method. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that given here.

The method embodiments provided by the embodiments of the present application may be performed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 4 shows a hardware block diagram of a computer terminal (or mobile device) for implementing the fault determination method. As shown in FIG. 4, the computer terminal 40 (or mobile device 40) may include one or more processors 402 (shown in the figure as 402a, 402b, ..., 402n; the processor 402 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 404 for storing data, and a transmission module 406 for communication functions. In addition, the computer terminal may further include a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in FIG. 4 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the computer terminal 40 may include more or fewer components than shown in FIG. 4, or have a different configuration from that shown in FIG. 4.

It should be noted that the one or more processors 402 and/or other data processing circuits described above may be referred to herein generally as "data processing circuits". The data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Furthermore, the data processing circuit may be a single stand-alone processing module, or may be incorporated, in whole or in part, into any of the other elements in the computer terminal 40 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a kind of processor control (e.g., selection of the path of a variable resistor termination connected to the interface).

The memory 404 may be used to store software programs and modules of application software, such as the program instructions/data storage devices corresponding to the methods in the embodiments of the present application. The processor 402 executes the software programs and modules stored in the memory 404, thereby performing various functional applications and data processing, that is, implementing the above-mentioned fault determination method. The memory 404 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 404 may further include memory located remotely from the processor 402, which may be connected to the computer terminal 40 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module 406 is used to receive or transmit data via a network. The specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 40. In one example, the transmission module 406 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission module 406 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.

The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 40 (or mobile device).

In the above operating environment, as shown in fig. 5, the fault determining method provided by the embodiment of the present application includes the following processing steps:

Step S502, obtaining a target detection index of a target application provided in an ICT operation platform, wherein the target detection index is used for reflecting the operation state of the target application, the ICT operation platform comprises a plurality of infrastructure services, the plurality of infrastructure services at least comprise CT infrastructure services and IT infrastructure services, and the target application comprises but is not limited to various types of industry applications.

An ICT fusion device refers to the fusion of an IT (Information Technology) device and a CT (Communication Technology) device. The IT device may be a device such as a server, and the CT device may be a router, a switch, or the like. The ICT fusion device may adopt the hardware architecture of a CT device, which comprises an Ethernet switching chip and a CPU. The Ethernet switching chip implements layer-2 switching functions, while the CPU runs software that forms a router operating system; that is, the CPU hosts the router operating system, which implements layer-3 functions such as routing. In addition, by running software on the CPU, the ICT fusion device forms virtual-machine-based virtual applications and thereby implements the related functions of an IT device.

The target detection index may be obtained in various ways. For example, a request message may be sent to the user side, and the user equipment responds to the request message by feeding back the target detection index of the target application to the ICT operation platform; alternatively, the target detection index may be received from the user side at regular intervals. The target detection index includes, but is not limited to, communication quality parameters such as the data transmission rate and bit error rate of the application.

Step S504, determining whether the target application fails according to the target detection index;

In some embodiments, this step may be implemented by comparing the target detection index with a preset threshold and determining whether a fault has occurred according to the comparison result: when the comparison result indicates that the target detection index is smaller than the preset threshold, it is determined that the target application has failed; when the comparison result indicates that the target detection index is greater than the preset threshold, it is determined that the target application has not failed. For example, when the data transmission rate of the target application is less than the rate threshold, it is determined that the target application has failed.
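The threshold comparison for the data transmission rate example can be sketched as follows. The function name `has_failed`, the threshold value and its unit are hypothetical; how an index exactly equal to the threshold is treated is also an assumption, since the text does not cover that case.

```python
def has_failed(index_value: float, threshold: float) -> bool:
    """Return True when the detection index is below the preset threshold."""
    # Assumption: a value exactly equal to the threshold counts as healthy.
    return index_value < threshold


RATE_THRESHOLD_MBPS = 50.0  # hypothetical preset threshold

print(has_failed(12.5, RATE_THRESHOLD_MBPS))  # True: rate below threshold
print(has_failed(80.0, RATE_THRESHOLD_MBPS))  # False: rate above threshold
```

For an index where a larger value is worse, such as a bit error rate, the direction of the comparison would presumably be reversed.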

In some embodiments of the present application, the fault may also be determined by a fault determination method as shown in fig. 9, which includes, as shown in fig. 9:

In step S902, a target detection index of a target application provided in an ICT operation platform is obtained, wherein the target detection index is used for reflecting an operation state of the target application, the ICT operation platform includes a plurality of infrastructure services, and the plurality of infrastructure services at least include a CT infrastructure service and an IT infrastructure service, and the target application includes, but is not limited to, various types of industry applications.

Step S904, determining whether the target application fails according to the target detection index;

It should be noted that steps S902 and S904 above correspond one-to-one to steps S502 and S504 in FIG. 5, so the explanations of steps S502 and S504 also apply to steps S902 and S904 and are not repeated here.

Step S906, under the condition that the target application fails, acquiring the running states of various infrastructure services in the ICT running platform;

In some embodiments, running logs of the multiple infrastructure services are obtained, and the running states of the multiple infrastructure services are determined based on the running logs. Different infrastructure services adopt different log collection modes; that is, the log collection modes adopted by different infrastructure services can be independent of each other. For example, IT devices collect logs using syslog and SNMP, while CT devices collect logs using DPI and proprietary protocols.

The running log can be determined as follows: determine the time information of when the target application failed and the log identifier of the target application, wherein the log identifier identifies the logs generated by the IT infrastructure services and CT infrastructure services associated with the target application; determine the timestamp corresponding to the time information and the log set corresponding to that timestamp; and select from the log set, according to the log identifier, the logs corresponding to the log identifier as the running log.
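The selection just described can be sketched as a two-stage lookup: first by timestamp, then by log identifier. The dictionary field names, the 60-second timestamp granularity and the sample entries are hypothetical assumptions introduced for illustration.

```python
from collections import defaultdict

BUCKET_SECONDS = 60  # hypothetical timestamp granularity


def to_timestamp(time_seconds: float) -> int:
    """Map a point in time to the start of its timestamp bucket."""
    return int(time_seconds) // BUCKET_SECONDS * BUCKET_SECONDS


def index_by_timestamp(logs):
    """Group log entries into log sets keyed by timestamp bucket."""
    log_sets = defaultdict(list)
    for entry in logs:
        log_sets[to_timestamp(entry["time"])].append(entry)
    return log_sets


def select_run_logs(log_sets, failure_time, log_id):
    """From the log set at the failure timestamp, keep entries with log_id."""
    log_set = log_sets.get(to_timestamp(failure_time), [])
    return [entry for entry in log_set if entry["log_id"] == log_id]


logs = [
    {"time": 125.0, "log_id": "app-a", "source": "IT", "text": "disk latency high"},
    {"time": 170.0, "log_id": "app-a", "source": "CT", "text": "session setup failed"},
    {"time": 150.0, "log_id": "app-b", "source": "IT", "text": "ok"},
    {"time": 300.0, "log_id": "app-a", "source": "CT", "text": "ok"},
]
run_logs = select_run_logs(index_by_timestamp(logs), failure_time=150.7, log_id="app-a")
```

Here the failure time falls in the bucket starting at 120 seconds, so the running log consists of the two `app-a` entries in that bucket, one from IT and one from CT.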

The log identifier may be determined based on a first identifier of the IT infrastructure service and a second identifier of the CT infrastructure service associated with the target application, and the specific determination manners are various, for example, the first identifier and the second identifier may be combined to form the log identifier, or the first identifier and the second identifier may be subjected to hash operation, and a result obtained by the hash operation is determined to be the log identifier.
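Both ways of deriving the log identifier can be sketched as follows. The separator character, the choice of SHA-256 as the hash operation and the sample identifiers are assumptions; the application does not name a specific combination format or hash function.

```python
import hashlib


def combined_log_id(it_id: str, ct_id: str) -> str:
    """Form the log identifier by combining the two service identifiers."""
    return f"{it_id}:{ct_id}"


def hashed_log_id(it_id: str, ct_id: str) -> str:
    """Form the log identifier by hashing the two service identifiers."""
    # Assumption: SHA-256 over the combined string stands in for the
    # unspecified "hash operation".
    data = combined_log_id(it_id, ct_id).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


plain_id = combined_log_id("it-7", "ct-3")   # "it-7:ct-3"
digest_id = hashed_log_id("it-7", "ct-3")    # 64 hexadecimal characters
```

The combined form stays human-readable, while the hashed form has a fixed length regardless of how long the underlying identifiers are.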

Step S908, determining the failure cause of the target application according to the running states of the multiple infrastructure services.

The fault cause may be determined as follows: the running states of the multiple infrastructure services are evaluated to obtain an evaluation index for each infrastructure service, wherein the evaluation index is used to evaluate the running state of that infrastructure service; a target infrastructure service is determined among the multiple infrastructure services according to the evaluation indexes; and the fault cause is determined to be a fault caused by the target infrastructure service. When evaluating the running states, the evaluation mode corresponding to each infrastructure service must first be determined, since different infrastructure services have different evaluation modes, and each infrastructure service is then evaluated using its corresponding evaluation mode. The evaluation index includes, but is not limited to, communication rate, error rate, and the like.
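A minimal sketch of this evaluation, assuming each service kind has its own scoring function and that the service with the lowest score is taken as the target infrastructure service. The scoring formulas, the field names `cpu_util` and `error_rate`, and the lowest-score selection rule are all hypothetical.

```python
def evaluate_it(state: dict) -> float:
    """Hypothetical IT evaluation mode: more free CPU means a higher score."""
    return 1.0 - state["cpu_util"]


def evaluate_ct(state: dict) -> float:
    """Hypothetical CT evaluation mode: a lower error rate means a higher score."""
    return 1.0 - state["error_rate"]


# Each kind of infrastructure service is evaluated with its own mode.
EVALUATORS = {"IT": evaluate_it, "CT": evaluate_ct}


def find_target_service(states: dict):
    """Score every service and return the worst-scoring one with all scores."""
    scores = {
        name: EVALUATORS[state["kind"]](state) for name, state in states.items()
    }
    target = min(scores, key=scores.get)
    return target, scores


states = {
    "compute-node": {"kind": "IT", "cpu_util": 0.35},
    "core-network": {"kind": "CT", "error_rate": 0.40},
    "radio-access": {"kind": "CT", "error_rate": 0.02},
}
target, scores = find_target_service(states)
```

With these sample states, `core-network` receives the lowest score and would be reported as the service causing the fault.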

In some embodiments of the present application, the user may select an appropriate evaluation index as required, as shown in fig. 8. Fig. 8 is an interactive interface according to an embodiment of the present application. The upper left of fig. 8 is a display interface showing a topology diagram of the private network. During normal operation, the topology may be shown in green or another user-specified color. When a fault occurs and its cause has been determined, the color of the node corresponding to the faulty equipment changes, and the color differs according to the severity of the fault. The upper right of fig. 8 shows multiple preset evaluation criteria, from which the user can directly select at least one according to his or her needs to evaluate the infrastructure services of the private network, while the lower right shows the evaluation indexes obtained by evaluating the infrastructure services of the private network with the selected evaluation criteria. The lower left of fig. 8 shows a summary of faults: all faults are classified by fault category into IT faults and CT faults, and important information such as the fault cause, fault level (severity) and duration of each fault can be displayed, which makes it convenient for the user to operate and maintain the private network.

In other embodiments of the present application, after determining the cause of the failure of the target application based on the operational status of various infrastructure services, the failure may be isolated, for example:

In the case where the fault cause is a fault caused by a CT infrastructure service, running state indication information of the CT infrastructure service is stored within a first time period, wherein the running state indication information is used to indicate that the CT infrastructure service is in an unavailable state. In the case where the fault cause is a fault caused by an IT infrastructure service, running state indication information of the IT infrastructure service is stored within a second time period, wherein the running state indication information is used to indicate that the IT infrastructure service is in an unavailable state. A fault of a CT infrastructure service has a larger impact and a more complex detection process, so the service needs to remain isolated until the fault is recovered. A fault of the IT infrastructure is typically a fault in a certain program segment or script and is easy to detect, so it can be isolated at any time. The duration corresponding to the first time period therefore needs to be greater than the duration corresponding to the second time period.
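One way to picture this isolation scheme is a small store that records an "unavailable" mark for each faulty service and lets the mark expire after a period that depends on the fault type. The sketch below is hypothetical: the class name, the in-memory dictionary, and the concrete durations are assumptions, and the only constraint taken from the description is that the CT period is longer than the IT period.

```python
import time
from typing import Callable, Dict

# Assumed durations in seconds; the description only requires CT > IT.
ISOLATION_SECONDS: Dict[str, float] = {"CT": 3600.0, "IT": 300.0}


class IsolationStore:
    """Keeps 'unavailable' indications for isolated infrastructure services."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._unavailable_until: Dict[str, float] = {}

    def isolate(self, service: str, kind: str) -> None:
        # Store the indication for the period matching the fault type.
        self._unavailable_until[service] = self._clock() + ISOLATION_SECONDS[kind]

    def is_unavailable(self, service: str) -> bool:
        deadline = self._unavailable_until.get(service)
        return deadline is not None and self._clock() < deadline
```

Injecting the clock keeps the store testable; a real deployment would more likely persist the indication and clear a CT mark explicitly once recovery is confirmed.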

In some embodiments, the fault of the target application may be caused by insufficient capacity of an infrastructure service. After the fault cause of the target application is determined according to the running states of the multiple infrastructure services, if the fault cause is insufficient capacity of the multiple infrastructure services, prompt information for prompting capacity expansion is generated. Both IT infrastructure services and CT infrastructure services may cause the target application to fail because of insufficient capacity. Such capacity includes, but is not limited to, the remaining operating resources of the infrastructure corresponding to each infrastructure service, such as remaining memory space.
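As a rough illustration of the capacity check, the following sketch compares each service's remaining memory against a required minimum and produces a prompt for every service that falls short. The function name, the megabyte units, the per-service minimum, and the wording of the prompt are assumptions made for the example.

```python
from typing import Dict, List


def capacity_prompts(remaining_mb: Dict[str, int],
                     minimum_mb: Dict[str, int]) -> List[str]:
    """Return one expansion prompt per service whose free memory is too low."""
    prompts = []
    for service in sorted(remaining_mb):
        free = remaining_mb[service]
        required = minimum_mb[service]
        if free < required:
            prompts.append(
                f"Capacity expansion recommended for {service}: "
                f"{free} MB free, {required} MB required"
            )
    return prompts
```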

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.

From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.

Example 2

According to an embodiment of the present application, there is also provided a fault handling method as shown in fig. 6, including:

Step S602, when at least one fault occurs in the ICT operation platform, current operation state information in the ICT operation platform is collected, and operation log information is generated;

Step S604, determining a fault type of each fault in at least one fault based on the operation log information, and positioning each fault, wherein the fault type comprises an IT fault and a CT fault;

Step S606, adopting a fault isolation strategy corresponding to the fault type of each fault to isolate the infrastructure service corresponding to each fault.
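Steps S604 and S606 amount to classifying each fault as IT or CT and then dispatching to the isolation strategy registered for that type. The sketch below assumes that each operation log record carries hypothetical `service` and `source` fields, and the two strategy functions are placeholders that return a description instead of performing real isolation.

```python
from typing import Callable, Dict, List, Tuple


def classify_fault(record: dict) -> str:
    # Assumed: the log record names which side (ct / it) produced it.
    return "CT" if record["source"] == "ct" else "IT"


def isolate_ct(service: str) -> str:
    # Placeholder strategy for CT faults.
    return f"isolate {service} until recovery"


def isolate_it(service: str) -> str:
    # Placeholder strategy for IT faults.
    return f"isolate {service} for a short period"


STRATEGIES: Dict[str, Callable[[str], str]] = {"CT": isolate_ct, "IT": isolate_it}


def handle_faults(records: List[dict]) -> List[Tuple[str, str, str]]:
    """Classify each fault, then apply the strategy for its fault type."""
    actions = []
    for record in records:
        fault_type = classify_fault(record)
        action = STRATEGIES[fault_type](record["service"])
        actions.append((record["service"], fault_type, action))
    return actions
```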

In other embodiments of the present application, after locating the fault, the fault may be isolated, for example:

Example 3

There is further provided in accordance with an embodiment of the present application a fault determining apparatus as shown in fig. 7, which includes a first obtaining module 70 configured to obtain a target detection indicator of a target application provided in an ICT operation platform, where the target detection indicator is configured to reflect an operation state of the target application, and the ICT operation platform includes a plurality of infrastructure services, where the plurality of infrastructure services includes at least a CT infrastructure service and an IT infrastructure service, a first determining module 72 configured to determine whether the target application has a fault according to the target detection indicator, a second obtaining module 74 configured to obtain an operation state of the plurality of infrastructure services in the ICT operation platform when the target application has a fault, and a second determining module 76 configured to determine a cause of the fault of the target application according to the operation states of the plurality of infrastructure services.

In some embodiments of the present application, the first obtaining module 70 may obtain the target detection index in various ways. For example, a request message may be sent to the user side, and the user equipment responds to the request message by feeding back the target detection index of the target application to the ICT operation platform; alternatively, the target detection index may be received from the user side at regular intervals. The target detection index includes, but is not limited to, communication quality parameters such as the data transmission rate and the bit error rate of the application.

In some embodiments of the present application, the first determination module 72 may determine whether the target application has failed by comparing the target detection index with a preset threshold and deciding according to the comparison result: when the comparison result indicates that the target detection index is less than the preset threshold, the target application is determined to have failed, and when the comparison result indicates that the target detection index is greater than the preset threshold, the target application is determined to be operating normally. For example, when the data transmission rate of the target application is less than the rate threshold, it is determined that the target application has failed.
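The data transmission rate example can be written as a one-line comparison. The function and parameter names below are hypothetical, and the sketch covers only the rate example given in the text, not other detection indexes.

```python
def has_failed(data_rate_mbps: float, rate_threshold_mbps: float) -> bool:
    """Report a fault when the application's data rate is below the threshold."""
    return data_rate_mbps < rate_threshold_mbps
```

With a threshold of 10 Mbps, a measured rate of 5 Mbps is reported as a fault and a measured rate of 20 Mbps is not.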

In some embodiments, the second obtaining module 74 may obtain running logs of the multiple infrastructure services and determine the running states of the multiple infrastructure services based on the running logs. Different infrastructure services use different log collection methods, that is, the log collection methods used by different infrastructure services may be independent of one another; for example, IT devices collect logs using syslog/SNMP, while CT devices collect logs using a DPI proprietary protocol.
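Because each kind of device has its own collection method, the second obtaining module can be pictured as a registry that maps a device kind to its collector and then derives a running state from the collected records. In the sketch below the two collectors are stand-ins that return canned records instead of speaking syslog/SNMP or a DPI protocol, and the record fields and the rule "the latest record gives the state" are assumptions.

```python
from typing import Callable, Dict, List


def collect_it_logs(device: str) -> List[dict]:
    # Stand-in for a syslog/SNMP-based collector.
    return [{"device": device, "via": "syslog/snmp", "state": "ok"}]


def collect_ct_logs(device: str) -> List[dict]:
    # Stand-in for a collector using a DPI proprietary protocol.
    return [{"device": device, "via": "dpi", "state": "degraded"}]


# Independent collection method per kind of infrastructure service.
COLLECTORS: Dict[str, Callable[[str], List[dict]]] = {
    "IT": collect_it_logs,
    "CT": collect_ct_logs,
}


def running_states(devices: Dict[str, str]) -> Dict[str, str]:
    """Collect logs per device kind and read the state from the latest record."""
    states = {}
    for device, kind in devices.items():
        records = COLLECTORS[kind](device)
        states[device] = records[-1]["state"]  # assumed: last record is newest
    return states
```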

In some embodiments of the present application, the second determination module 76 may operate as follows: evaluating the operational states of the multiple infrastructure services to obtain an evaluation index for each infrastructure service, wherein the evaluation index is used to evaluate the operational state of that infrastructure service; determining a target infrastructure service among the multiple infrastructure services based on the evaluation indexes; and determining that the fault cause is a fault caused by the target infrastructure service. When evaluating the operational states of the multiple infrastructure services, the evaluation mode corresponding to each infrastructure service must first be determined, that is, different infrastructure services have different evaluation modes, and each infrastructure service is evaluated using its corresponding evaluation mode. The evaluation index includes, but is not limited to, communication rate and error rate.

Example 3

Embodiments of the present application may provide a computer terminal, which may be any one of a group of computer terminals. Alternatively, in the present embodiment, the above-described computer terminal may be replaced with a terminal device such as a mobile terminal.

Alternatively, in this embodiment, the above-mentioned computer terminal may be located in at least one network device among a plurality of network devices of the computer network.

In this embodiment, the computer terminal may execute program code for the following steps of the fault determination method: obtaining a target detection index of a target application provided in an ICT operation platform, wherein the target detection index is used to reflect an operation state of the target application, the ICT operation platform includes multiple infrastructure services, and the multiple infrastructure services include at least a CT infrastructure service and an IT infrastructure service; determining whether the target application fails according to the target detection index; obtaining the operation states of the multiple infrastructure services in the ICT operation platform in the case where the target application fails; and determining the failure cause of the target application according to the operation states of the multiple infrastructure services.

The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the fault determining method and apparatus in the embodiments of the present application, and the processor executes the software programs and modules stored in the memory, thereby executing various functional applications and data processing, that is, implementing the fault determining method described above. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located with respect to the processor, which may be connected to terminal a through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The processor can call information and application programs stored in the memory through the transmission device to acquire target detection indexes of target applications provided in the ICT operation platform, wherein the target detection indexes are used for reflecting the operation states of the target applications, the ICT operation platform comprises multiple infrastructure services, at least the CT infrastructure services and the IT infrastructure services, whether the target applications are faulty or not is determined according to the target detection indexes, the operation states of the multiple infrastructure services in the ICT operation platform are acquired under the condition that the target applications are faulty, and the fault reasons of the target applications are determined according to the operation states of the multiple infrastructure services.

Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing hardware related to a terminal device, and the program may be stored in a computer readable storage medium, where the storage medium may include a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or the like.

Example 4

Embodiments of the present application also provide a nonvolatile storage medium. Alternatively, in the present embodiment, the above-described nonvolatile storage medium may be used to store the program code executed by the failure determination method provided in the above-described embodiment.

Alternatively, in this embodiment, the storage medium may be located in any one of the computer terminals in the computer terminal group in the computer network, or in any one of the mobile terminals in the mobile terminal group.

Optionally, in this embodiment, the storage medium is configured to store program code for obtaining a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is configured to reflect an operation state of the target application, the ICT operation platform includes a plurality of infrastructure services, and the plurality of infrastructure services include at least a CT infrastructure service and an IT infrastructure service, determining whether the target application fails according to the target detection indicator, obtaining an operation state of the plurality of infrastructure services in the ICT operation platform in case of failure of the target application, and determining a cause of failure of the target application according to the operation state of the plurality of infrastructure services.

The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary. For example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between the parts may be through some interfaces, units or modules, and may be in electrical or other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes any medium that can store program code, such as a U disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims

1. A fault determination method, comprising:

Obtaining a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is used to reflect an operation status of the target application, wherein the ICT operation platform includes multiple infrastructure services, and the multiple infrastructure services include at least: a CT infrastructure service and an IT infrastructure service;

Determining whether a failure occurs in the target application based on the target detection indicator;

In the event that the target application fails, obtaining the operating status of the multiple infrastructure services in the ICT operating platform includes:

Obtain operation logs for the various infrastructure services, including:

Determining time information when the target application fails and a log identifier of the target application, wherein the log identifier is used to identify logs generated by IT infrastructure services and CT infrastructure services associated with the target application; determining the operation log based on the time information and the log identifier of the target application;

determining the operating status of the plurality of infrastructure services based on the operating log;

The cause of the failure of the target application is determined based on the operating status of the multiple infrastructure services.

2. The method according to claim 1, wherein different infrastructure services use different log collection methods.

3. The method according to claim 1, wherein the operation logs include logs generated by IT infrastructure services and CT infrastructure services.

4. The method according to claim 1, wherein determining the operation log comprises:

Determining a timestamp corresponding to the time information, and determining a log set corresponding to the timestamp;

The log corresponding to the log identifier is determined from the log set according to the log identifier, and the log corresponding to the log identifier is used as the running log.

5. The method according to claim 1, wherein determining the cause of the failure of the target application based on the operating status of the multiple infrastructure services comprises:

Evaluating the operating status of the multiple infrastructure services to obtain an evaluation index for each infrastructure service, wherein the evaluation index is used to evaluate the operating status of each infrastructure service;

A target infrastructure service among the multiple infrastructure services is determined based on the evaluation index, and the cause of the fault is determined to be a fault caused by the target infrastructure service.

6. The method according to claim 5, wherein after determining that the fault cause is a fault caused by the target infrastructure service, the method further comprises:

Displaying to the target object, by category, the fault cause, fault duration, and fault level of the fault, wherein the fault level is used to characterize the severity of the fault, and faults of different fault levels are marked with different labels.

7. The method according to claim 5, wherein evaluating the operating status of the plurality of infrastructure services comprises:

determining an evaluation method corresponding to each of the plurality of infrastructure services;

The plurality of infrastructure services are evaluated respectively using evaluation methods corresponding to the respective infrastructure services.

8. The method according to claim 6, wherein determining the evaluation method corresponding to each of the plurality of infrastructure services comprises:

Display multiple preset evaluation methods in the interactive interface;

In response to a selection instruction received by the target object through the interactive interface, an evaluation method corresponding to each infrastructure service is selected from the multiple evaluation methods.

9. The method according to claim 7, wherein the method further comprises: displaying the evaluation results of the multiple infrastructure services in an interactive interface, and categorizing and displaying the fault causes, fault durations, and fault levels of the faults, wherein the fault levels are used to characterize the severity of the faults, and faults of different fault levels are marked with different labels.

10. The method according to claim 1, wherein after determining the cause of the failure of the target application based on the operating status of the multiple infrastructure services, the method further comprises:

In a case where the fault cause is a fault caused by a CT infrastructure service, storing operation status indication information of the CT infrastructure service within a first time period, wherein the operation status indication information is used to indicate that the CT infrastructure service is in an unavailable state;

If the fault is caused by an IT infrastructure service, storing operation status indication information of the IT infrastructure service within a second time period, wherein the operation status indication information is used to indicate that the IT infrastructure service is in an unavailable state;

The duration corresponding to the first time period is greater than the duration corresponding to the second time period.

11. The method according to any one of claims 1 to 9, wherein after determining the cause of the failure of the target application based on the operating status of the multiple infrastructure services, the method further comprises:

When the fault is caused by insufficient capacity of the multiple infrastructure services, prompt information is generated to prompt capacity expansion.

12. A fault determination method, comprising:

Obtain operation logs for the various infrastructure services, including:

13. A fault handling method, comprising:

When at least one fault occurs in the ICT operation platform, current operation status information of the ICT operation platform is collected and operation log information is generated, including:

Obtain operation logs for various infrastructure services, including:

Determining time information when a target application fails and a log identifier of the target application, wherein the log identifier is used to identify logs generated by IT infrastructure services and CT infrastructure services associated with the target application; and determining the operation log based on the time information and the log identifier of the target application;

Determining current operating status information in the ICT operating platform based on the operating log;

Determine a fault type of each of the at least one fault based on the operation log information, and locate each of the faults, wherein the fault type includes an IT fault and a CT fault;

A fault isolation strategy corresponding to the fault type of each fault is adopted to isolate the infrastructure service corresponding to each fault.

14. A fault determination device, comprising:

a first acquisition module, configured to acquire a target detection indicator of a target application provided in an ICT operation platform, wherein the target detection indicator is used to reflect an operation status of the target application, wherein the ICT operation platform includes multiple infrastructure services, and the multiple infrastructure services include at least CT infrastructure services and IT infrastructure services;

A first determination module is configured to determine whether a fault occurs in the target application based on the target detection indicator;

The second acquisition module is configured to acquire the operating status of the multiple infrastructure services in the ICT operating platform when the target application fails, including:

Obtain operation logs for the various infrastructure services, including:

The second determining module is configured to determine a cause of the failure of the target application according to the operating status of the multiple infrastructure services.

15. A non-volatile storage medium, wherein the non-volatile storage medium includes a stored program, wherein when the program is executed, the device where the non-volatile storage medium is located is controlled to execute the fault determination method according to any one of claims 1 to 7.

16. A computer terminal, comprising:

a processor; and

A memory, connected to the processor, configured to provide the processor with instructions for processing the following processing steps:

Obtain operation logs for the various infrastructure services, including: