
CN110225540A - Fault detection method for a centralized access network - Google Patents


Info

Publication number
CN110225540A
CN110225540A CN201910383058.1A CN201910383058A CN110225540A CN 110225540 A CN110225540 A CN 110225540A CN 201910383058 A CN201910383058 A CN 201910383058A CN 110225540 A CN110225540 A CN 110225540A
Authority
CN
China
Legal status
Pending
Application number
CN201910383058.1A
Other languages
Chinese (zh)
Inventor
叶冠文
王园园
张宗帅
孙茜
Current Assignee
Beijing Zhongke Polytron Technologies Inc
Original Assignee
Beijing Zhongke Polytron Technologies Inc
Application filed by Beijing Zhongke Polytron Technologies Inc
Publication of CN110225540A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method for performing anomaly detection on a centralized access network, comprising: 1) for one layer in the network architecture of the centralized access network, detecting abnormal events with a first fault detector corresponding to that layer, wherein the first fault detector is trained with a negative selection algorithm on the layer's own operating data; 2) when the first fault detector detects multiple items of fault information on a component, using a second fault detector to perform an intersection operation on the multiple items of fault information of the component to determine the fault of the component, wherein each item of fault information is represented as the set consisting of one abnormal event occurring on the component and all abnormal events associated with that event.

Description

Fault detection method for a centralized access network
Technical Field
The present invention relates to fault detection in wireless communication systems, and more particularly to fault detection for centralized access networks.
Background
A centralized access network (C-RAN) is a new type of resource management and control system that, through a uniform and open interface, creates large numbers of virtual base stations on demand in a large-scale centralized resource pool, so that resources are shared among many virtual base stations. With such a shared pool, however, a single problem in the resource pool can cause every base station associated with it to fail, disrupting service for access users over a wide area and possibly bringing down the entire network. A fault management system for centralized access networks is therefore desirable.
Fault detection is the first step of fault management, and how well it works directly determines how well fault management works. Traditional fault detection involves a great deal of manual work, depends heavily on the experience of operation and maintenance personnel, and is prone to false and missed judgments. As new services emerge, centralized access network equipment grows ever more complex and networks keep growing in scale, which makes accurate and efficient fault detection increasingly difficult.
Several newer fault detection methods that reduce human involvement have been proposed, and they fall roughly into two categories. 1. Probe-based methods periodically send probe data into the network and judge whether a fault has occurred from the network's responses. Their detection rate is high, but probe packets must be sent continuously; given the huge data scale of a centralized access network, sending probe data adds overhead to the system. 2. Data mining and neural network methods train fault data samples into rules or models for fault detection, and the more fault samples there are, the more accurate the rules or models become. For most devices, however, a large number of fault samples is hard to obtain at once, so such methods are not suitable for centralized access networks.
Disclosure of Invention
Therefore, an object of the present invention is to overcome the above drawbacks of the prior art and to provide a method for performing anomaly detection on a centralized access network, comprising:
1) for one layer in the network architecture of the centralized access network, detecting abnormal events with a first fault detector corresponding to that layer, wherein the first fault detector is trained with a negative selection algorithm on the layer's own operating data;
2) when the first fault detector detects multiple items of fault information on a component, using a second fault detector to perform an intersection operation on the multiple items of fault information of the component to determine the fault of the component, wherein one item of fault information is represented as the set consisting of one abnormal event occurring on the component and all abnormal events associated with that event.
Preferably, the method comprises: performing step 1) and/or step 2) above for one layer in the network architecture of the centralized access network and, if a component of that layer is found to have a failure, performing step 1) and/or step 2) for an adjacent layer, until it is determined that no component of the current layer has a failure or all layers have been traversed.
Preferably, the method comprises: performing step 1) and/or step 2) above for a layer other than the lowest layer in the network architecture of the centralized access network and, if a component of that layer is found to have a failure, performing step 1) and/or step 2) for the layer below it, until it is determined that no component of the current layer has a failure or the lowest layer has been examined.
Preferably, for the following layers in the network architecture of the centralized access network, the operating data used by the negative selection algorithm and by the first fault detector are selected from the corresponding sets:
computing resource layer: device temperature, device voltage, CPU (central processing unit) utilization, memory utilization, network interface traffic, and network interface rate;
virtualization layer: virtual CPU utilization, virtual memory utilization, and virtual baseband utilization;
network element layer: number of access users of a virtual base station, uplink rate, downlink rate, signal strength, delay, packet loss rate, and reference signal received power (RSRP);
network layer: network performance parameters of the area.
Preferably, the first fault detector is trained by a method comprising:
I) for one layer in the network architecture of the centralized access network, collecting n-dimensional operating data $D = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ of the layer as the self sample set;
II) generating a candidate first fault detector with a negative selection algorithm based on the self sample set and, if the minimum distance $d_{\min}$ between the candidate and the self samples is greater than or equal to the self-sample affinity radius $r_s$, adding the candidate to the fault detector set;
III) outputting the fault detector set when it reaches the set coverage rate, and otherwise repeating step II).
Preferably, determining in step III) whether the obtained fault detector set has reached the set coverage rate comprises:
determining that the set coverage rate has been reached if $z > z_\alpha$, where $z = \frac{x - np}{\sqrt{np(1-p)}}$, x is the number of the n test samples covered by the fault detector set, p is the predetermined coverage rate, and $z_\alpha$ is the critical value corresponding to the chosen significance level α.
Preferably, the following is performed between steps II) and III):
if $r_{di} > d_{\min} - r_s$, where $d_{\min} - r_s$ is the detection radius of the candidate fault detector and $r_{di}$ is the detection radius of an existing fault detector i, the candidate fault detector is excluded from the fault detector set.
Preferably, the n dimensions of the self sample set $D = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ are determined by reducing the m-dimensional operating data available for the layer with a principal component analysis algorithm, where n < m.
A method of training a fault detector, comprising:
I) collecting n-dimensional operating data $D = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ generated by the system during operation as the self sample set;
II) generating a candidate fault detector with a negative selection algorithm based on the self sample set and, if the minimum distance $d_{\min}$ between the candidate and the self samples is greater than or equal to the self-sample affinity radius $r_s$, adding the candidate to the fault detector set;
III) outputting the fault detector set when it reaches the set coverage rate, and otherwise repeating step II).
Preferably, determining in step III) whether the obtained fault detector set has reached the set coverage rate comprises:
determining that the set coverage rate has been reached if $z > z_\alpha$, where $z = \frac{x - np}{\sqrt{np(1-p)}}$, x is the number of the n test samples covered by the fault detector set, p is the predetermined coverage rate, and $z_\alpha$ is the critical value corresponding to the chosen significance level α.
Preferably, the following is performed between steps II) and III):
if $r_{di} > d_{\min} - r_s$, where $d_{\min} - r_s$ is the detection radius of the candidate fault detector and $r_{di}$ is the detection radius of an existing fault detector i, the candidate fault detector is excluded from the fault detector set.
Preferably, the n dimensions of the self sample set $D = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ are determined by reducing the m-dimensional operating data available for the layer with a principal component analysis algorithm, where n < m.
A computer-readable storage medium storing a computer program that, when executed, carries out any of the methods above.
Compared with the prior art, the embodiment of the invention has the advantages that:
1. The invention continuously and automatically detects abnormal conditions in the centralized access network, reducing manual involvement and increasing the degree of system automation.
2. The negative selection algorithm used in first-level fault detection needs only samples of normal operating parameters to train the anomaly detection model, not large numbers of fault samples, so the method is easy to implement.
3. The second-level fault detection builds an anomaly association mapping table for anomaly reasoning, so large numbers of anomalies can be handled effectively and fault detection efficiency improves.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a diagram of a multi-layer network architecture of a super base station according to one embodiment of the present invention;
FIG. 2 is a flow diagram of a method for performing anomaly detection on a centralized access network according to one embodiment of the present invention;
FIG. 3 is a flow diagram of a method for training an anomaly detector using a negative selection algorithm, according to one embodiment of the present invention;
FIG. 4 is a flow diagram of a method of checking detector coverage according to one embodiment of the invention;
FIG. 5 shows test results of the fault detection rate of the scheme of the present invention;
FIG. 6 shows test results of the detection time analysis of the scheme of the present invention.
Detailed Description
While studying fault detection in centralized access networks, the inventors found that current fault detection mechanisms suffer from a low detection rate, heavy manual involvement, and insufficient automation, and proposed fault detection based on the negative selection algorithm (NSA) from the field of artificial immune systems. NSA borrows the 'negative selection' process by which immune cells mature: it trains anomaly detectors on 'self' data in order to detect 'non-self' conditions, where 'self' data are normal data and 'non-self' denotes abnormal conditions. The method does not need large numbers of fault samples for training, since the anomaly detectors are generated mainly from the system's own normal operating parameters, and it sends no packets into the network, so it adds no burden to the system. Anomaly detection for centralized access networks can therefore be built on this approach.
However, an anomaly detected by NSA does not necessarily mean that a fault has occurred, so a further fault determination is required. In a centralized access network, one fault may be associated with one or more abnormal conditions, so fault association analysis must be performed on multiple abnormal conditions to find the fault source behind them.
In view of this, the present invention proposes a multi-level fault detection mechanism (MFDM): first-level fault detection (anomaly detection) determines whether the network is abnormal, and second-level fault detection (anomaly association analysis) analyzes the associations among the anomalies to find the fault source behind them.
In summary, for first-level fault detection, an anomaly detector set based on the artificial immune negative selection algorithm is trained on normal data of the centralized access network, and the trained detector set is used to perform anomaly detection on the network and judge whether an abnormal condition has occurred.
For second-level fault detection, an intersection operation is performed on multiple anomalies of a component to determine the fault information of that component, where each anomaly is represented as the set consisting of one abnormal event occurring on the component and all abnormal events associated with it. The abnormal events occurring on the component are those determined by the first-level fault detection.
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
<Example 1>
The method of training the first fault detector will be described below by way of one embodiment.
Most centralized access network architectures can be abstracted into multiple layers, each with different characteristics. According to an embodiment of the present invention, a first fault detector can therefore be trained separately for each layer of the network architecture, and abnormal event detection on a layer is then performed with the first fault detector corresponding to that layer.
In this embodiment, when training the first fault detector, the operating data of each layer are chosen according to that layer's characteristics and used as training samples for the negative selection algorithm; when step 1 is carried out, data of the same categories as the training samples are selected for anomaly detection.
Referring to fig. 1, taking a super base station (a centralized access network) as an example, the network architecture is divided into, from top to bottom:
Network layer: a network area formed by multiple base stations serving together, such as the network area of Haidian District in Beijing. Given the characteristics of the network layer, data reflecting the network performance of the area, such as its throughput, packet loss rate, and delay, can be selected as training samples for this layer.
Network element layer: the various virtual base stations established according to user requirements. A single cell of the super base station is equivalent to a traditional base station, for example one supporting 2G, 3G, and 4G. A virtual base station has all the logical functions of a conventional base station but, unlike it, has no physical form. For the network element layer, the number of access users, uplink and downlink rates, signal strength, delay, packet loss rate, reference signal received power (RSRP), and so on can be selected as training samples.
Virtualization layer: uses real-time virtualization technology to abstract the underlying physical computing resources, such as memory, CPU, and baseband, into logical computing resources that can be invoked directly. Most virtualization-layer anomalies are caused either by insufficient physical computing resources (which call for migration or additional physical devices) or by software configuration errors. For the virtualization layer, therefore, virtual CPU, virtual memory, and virtual baseband usage can be selected as training samples.
Computing resource layer: the hardware layer, corresponding to the remote radio units, centralized baseband pool, centralized general-purpose servers, and so on that provide software and hardware support. Anomalies in the computing resource layer are mainly caused by hardware, so the temperature, voltage, CPU and memory utilization, network interface traffic, rate, and so on of the hardware devices can be selected as training samples for this layer.
According to one embodiment of the invention, a method of training a first fault detector comprises:
Step 1: for one layer of the network architecture of the centralized access network, collect n-dimensional operating data $D = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ of the layer as the self sample set. Each dimension corresponds to one kind of operating data. Taking the hardware layer as an example, six kinds of operating data (hardware device temperature, voltage, CPU utilization, memory utilization, network interface traffic, and rate) can be selected as training samples for the layer, that is, n = 6.
In one embodiment of the present invention, an administrator configures the managed objects in a detection agent, and the data that the management and control software, protocol stack software, and baseband software report for those objects are collected via the Simple Network Management Protocol (SNMP).
For example, for the management and control software, the configuration management module collects the configured information to judge whether the configuration information of the base station input by the administrator is reasonable or not; collecting data of hardware such as related computing resources and the like based on a board management module, such as temperature, voltage, CPU and utilization rate of an internal memory; collecting information of network calls (FTP, HTTP, SNMP and the like) based on the network port management module, such as uplink and downlink rates, packet loss rate, TCP/UDP packets and the like; collecting cell-related information such as the number of users, bandwidth, cell load, etc. of a cell based on a service management module; the RRU-based management module is responsible for collecting information of the remote radio units, such as transmission power, signal strength, coverage rate, etc. of signals.
For the protocol stack software, the software collects layer 2 protocol processing information (MAC, RLC, PDCP, RRC, etc.), such as packet transmission/reception rate, packet flooding rate, and jitter delay.
For the baseband software, the processing information of the physical layer PHY, such as data volume of the network port, throughput, baseband packet receiving and transmitting rate, power consumption, error rate, etc., is collected by the software.
Because a centralized access network such as a super base station is divided into multiple layers, each with many parameters to monitor and correspondingly large volumes of data, redundancy can be removed from the collected parameters, for example by reducing the data dimensionality with a linear transformation, which shortens training time and reduces the complexity of data processing.
In one embodiment of the invention, the measurable parameters of one layer of the super base station network architecture are collected as the m-dimensional parameters $D = (x^{(1)}, x^{(2)}, \ldots, x^{(m)})$ and converted with principal component analysis (PCA) into the n-dimensional parameters $D' = (x^{(1)}, x^{(2)}, \ldots, x^{(n)})$, where n < m. In PCA, a smaller n gives a shorter training time but lower accuracy, while a larger n gives a longer training time, which in turn hurts fault detection efficiency. The value of n is tied to the contribution rate of the data:
$$p = \frac{\sum_{i=1}^{n} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \tag{1}$$
where p is the contribution rate of the data and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ are the eigenvalues of the covariance matrix $XX^T$ of the original samples. Typically, a contribution rate above 90% is considered to cover the original data space well.
For the specific implementation of PCA dimensionality reduction, refer to the prior art. Taking self samples obtained from the protocol stack software as an example, assume that 20 groups of data with m = 5, shown in Table 1, are captured at the current layer of the network architecture according to an embodiment of the present invention.
Table 1 Summary of data for the current layer
Following the PCA algorithm, the data are centered and the corresponding covariance matrix $XX^T$ is computed, giving the results in Table 2.
Table 2 Covariance matrix $XX^T$
0.9444 -0.3232 0.0592 0.0138 -0.0008
0.3278 0.9126 -0.2434 -0.0177 -0.0082
0.0035 0.0037 -0.0126 -0.0372 0.9992
-0.0059 0.0330 0.0420 0.9979 0.0093
0.0250 0.2481 0.9671 -0.0491 0.0093
Continuing with the PCA algorithm, the eigenvalues and contribution rates of this covariance matrix are computed. Sorted in descending order, the eigenvalues are $\lambda = [36.7635, 10.3131, 0.4373, 0.0275, 0.0000]^T$. The first 2 eigenvalues together contribute 99.02%, which meets the criterion that the contribution rate exceed 90%, so n = 2 is selected.
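The contribution-rate rule can be checked with a short Python sketch. The helper `select_n_components` and its `threshold` parameter are hypothetical names introduced here; the sketch sorts the eigenvalues in descending order, accumulates their share of the total, and returns the first n whose cumulative contribution exceeds 90%. Fed the eigenvalues listed above, it reproduces n = 2.

```python
def select_n_components(eigenvalues, threshold=0.9):
    """Smallest n whose cumulative contribution rate exceeds threshold (hypothetical helper)."""
    lam = sorted(eigenvalues, reverse=True)  # eigenvalues in descending order
    total = sum(lam)
    cumulative = 0.0
    for n, value in enumerate(lam, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return n, cumulative / total
    # Threshold never exceeded (e.g. threshold >= 1): keep every dimension.
    return len(lam), 1.0

# Eigenvalues of the covariance matrix reported in the text.
eigenvalues = [36.7635, 10.3131, 0.4373, 0.0275, 0.0000]
n, rate = select_n_components(eigenvalues)
print(n, round(rate, 4))  # 2 0.9902
```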
Taking the eigenvectors corresponding to the first n = 2 eigenvalues and computing the principal component vectors with the PCA algorithm gives the results in Table 3.
Table 3 Principal component data with n = 2
For ease of calculation, feature 1 and feature 2 above are normalized into dimensionless indices in this embodiment with the following equation:
$$x_{norm} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$
where x is the original data, $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the original data set, and $x_{norm}$ is the normalized data.
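A minimal Python sketch of min-max normalization, $x_{norm} = (x - x_{\min}) / (x_{\max} - x_{\min})$, applied to one feature column. The function name and the sample values are hypothetical (the Table 3 data are not reproduced in the text), and mapping a constant column to 0 is an assumption, since the formula is undefined when $x_{\max} = x_{\min}$.

```python
def min_max_normalize(column):
    """Rescale one feature column to [0, 1] with (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Constant column: the formula is undefined; map everything to 0.0 (assumption).
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]

# Hypothetical values for one principal-component feature.
feature_1 = [2.0, 5.0, 3.5, 8.0]
print(min_max_normalize(feature_1))  # [0.0, 0.5, 0.25, 1.0]
```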
The results of the normalization process on the data in table 3 are shown in table 4.
Table 4 Normalized data
Thus the self sample set $D' = (z^{(1)}, z^{(2)})$ (n = 2) for one layer of the network architecture of the centralized access network is obtained.
Step 2: generate candidate first fault detectors with the negative selection algorithm based on the self sample set; if the minimum distance $d_{\min}$ between a candidate and the self samples is greater than or equal to the self-sample affinity radius $r_s$, add the candidate to the fault detector set.
According to one embodiment of the invention, Euclidean distance can be used to calculate the affinity radius $r_s$ of the self samples. Suppose the self samples obtained in step 1 are as shown in Table 4, and let $x = [0.4446, 0]$ and $y = [0.3419, 0.0722]$. The distance between these 2 real-valued vectors represents their affinity: the smaller the distance, the better they match. The Euclidean distance between vector x and vector y is:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{3}$$
where $x_i$ and $y_i$ are the i-th components of x and y, respectively, and n is the number of parameter types.
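The distance computation can be sketched in Python as follows, using the two vectors x and y quoted above. `euclidean_distance` is an illustrative name; Python 3.8+ also offers `math.dist`, which computes the same quantity.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two real-valued vectors; smaller means a closer match."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of parameter types")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

x = [0.4446, 0.0]
y = [0.3419, 0.0722]
print(round(euclidean_distance(x, y), 4))  # 0.1255
```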
FIG. 3 illustrates the process of training an anomaly detector with the negative selection algorithm according to one embodiment of the present invention. Referring to FIG. 3, the training process is as follows. A candidate detector is generated at random, and its minimum distance $d_{\min}$ to the self samples is computed (for example with the Euclidean distance formula). If $d_{\min} < r_s$, the detector is rejected; if $d_{\min} > r_s$, it can serve as a candidate detector with detection radius $L = d_{\min} - r_s$. Let $r_{di}$ denote the detection radius of existing anomaly detector i. To reduce overlap between detectors, L is also compared with every $r_{di}$: if $L < r_{di}$ for some i, the candidate detector is discarded; if no such i exists, the candidate is added to the mature detector set.
In this way, a set of first fault detectors is obtained.
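The detector-generation loop described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: candidates are drawn uniformly from the unit hypercube (plausible because the self samples are min-max normalized, but not stated in the text), a fixed budget `n_candidates` replaces the coverage-based stopping rule of step 3, and the self affinity radius `r_s = 0.05` is a hypothetical value. The comparison with existing radii follows the text literally.

```python
import math
import random

def generate_detectors(self_samples, r_s, n_candidates, seed=0):
    """Generate (center, radius) detectors by negative selection (sketch).

    Assumes self samples are normalized to [0, 1] in every dimension.
    """
    rng = random.Random(seed)
    dim = len(self_samples[0])
    detectors = []  # accepted detectors as (center, detection radius r_di)
    for _ in range(n_candidates):
        candidate = [rng.random() for _ in range(dim)]  # random point in [0, 1]^dim
        d_min = min(math.dist(candidate, s) for s in self_samples)
        if d_min <= r_s:
            continue  # candidate matches self data: rejected
        radius = d_min - r_s  # candidate detection radius L = d_min - r_s
        if any(radius < r_di for _, r_di in detectors):
            continue  # L < r_di for some existing detector: discarded, as stated in the text
        detectors.append((candidate, radius))
    return detectors

self_samples = [[0.4446, 0.0], [0.3419, 0.0722]]  # vectors x and y quoted above
r_s = 0.05  # hypothetical self affinity radius
detectors = generate_detectors(self_samples, r_s, n_candidates=200)
print(len(detectors), [round(r, 3) for _, r in detectors])
```

Read literally, the last check admits a candidate only if its radius is at least as large as every existing radius, so accepted radii never decrease. If the intended goal is to limit overlap between detectors, a test involving the distance between detector centers would be needed instead; the text does not specify one, so the sketch keeps the stated rule.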
Step 3: output the fault detector set when it reaches the set coverage rate; otherwise, repeat step 2.
In this embodiment, generation of the detector set stops when the detectors reach a predetermined coverage. The estimated coverage of the current fault detector set can be evaluated on sampled test samples with the following equation:
$$\hat{p} = \frac{x}{n} \tag{4}$$
where $\hat{p}$ is the estimated coverage, x is the number of test samples covered by the detectors, and n is the number of test samples drawn.
Let p be the predetermined coverage and σ the standard deviation. By the central limit theorem, when the number of test samples n is large enough, the error z of the estimated coverage approximately follows a standard normal distribution:
$$z = \frac{\hat{p} - p}{\sigma}, \qquad \sigma = \sqrt{\frac{p(1-p)}{n}} \tag{5}$$
From equations (4) and (5), it follows that:
$$z = \frac{x - np}{\sqrt{np(1-p)}} \tag{6}$$
When the z given by equation (6) satisfies $z > z_\alpha$, the hypothesis that coverage p has been reached is accepted and training stops; when $z \le z_\alpha$, the hypothesis is rejected and training continues. Here α is the significance level; the smaller α is, the more reliable the conclusion. The significance level is typically chosen as α = 0.05, giving a confidence level of $1 - \alpha = 0.95$, and the corresponding value $z_\alpha$, found from the standard normal distribution table, is 1.645.
Referring to FIG. 4, a desired coverage p, a significance level α, and a number of test samples n are chosen as needed, and self samples are randomly selected for testing. For each of the n selected test samples, it is determined whether the sample is covered by a detector, and z is calculated from the result with equation (6). If z is greater than the value $z_\alpha$ determined by α, the expected coverage is considered reached; otherwise it is not, and training of the first fault detector continues.
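The coverage test of equations (4) to (6) can be sketched in Python with the standard library's `statistics.NormalDist`, which supplies $z_\alpha$ as the $(1-\alpha)$ quantile of the standard normal distribution. The function names are hypothetical, detectors are assumed to be (center, radius) pairs that cover points strictly within their radius, and the formula assumes 0 < p < 1.

```python
import math
from statistics import NormalDist

def coverage_reached(covered, n, p, alpha=0.05):
    """One-sided z test: has the detector set reached coverage p? Requires 0 < p < 1."""
    z = (covered - n * p) / math.sqrt(n * p * (1 - p))  # equation (6)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = 0.05
    return z > z_alpha, z, z_alpha

def count_covered(test_points, detectors):
    """Number of test points lying inside at least one (center, radius) detector."""
    return sum(
        any(math.dist(point, center) < radius for center, radius in detectors)
        for point in test_points
    )

reached, z, z_alpha = coverage_reached(covered=100, n=100, p=0.9)
print(reached, round(z, 3), round(z_alpha, 3))  # True 3.333 1.645
```

With 92 of 100 test samples covered instead, z = 2/3, which is below $z_\alpha$, so training would continue.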
Through Example 1, a set of first fault detectors for a layer in the network architecture of a centralized access network can be obtained from the layer's own operating data, with the multiple fault detectors together providing the required coverage.
<Example 2>
According to embodiment 2 of the present invention, there is provided a method for performing anomaly detection on a centralized access network, and referring to fig. 2, the method includes:
Step 1: for the centralized access network, detect abnormal events with a first fault detector, wherein the first fault detector is trained with a negative selection algorithm on the operating data of the centralized access network itself.
According to an embodiment of the invention, the first fault detector is trained using the method of embodiment 1.
The inventors found that faults across layers follow a pattern: if the current layer has a fault, the layer below it is very likely to be abnormal as well. For example, if an index of the network layer is abnormal, the network element layer below it is usually abnormal too. Anomaly detection performed directly on a higher layer therefore often cannot pinpoint the fault, and the lower layers must be analyzed as well. For example, according to an embodiment of the present invention, when an index of the network layer is abnormal, fault detection is performed on the network element layer to find the abnormal network element (virtual base station service segment); the virtual resources mapped to it in the virtualization layer are then used to determine whether the network element has a virtualization-layer software fault, and if not, fault detection continues on the next lower layer until the fault source is found. Because the closer a layer is to the bottom, the more devices and indexes must be checked, and because the fault range should be narrowed step by step for fast fault location, fault detection is preferably performed on the layers of the network architecture of the centralized access network from high to low: when a component of a higher layer is found to have a fault, fault detection is performed on the next lower layer, until it is determined that no component of the current layer has a fault or all layers have been traversed. It should be understood, however, that in other embodiments of the present invention fault detection may also be performed on the layers from low to high.
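The high-to-low search can be sketched as a simple loop. `detect_faulty_components` is a hypothetical callable standing in for steps 1 and 2 on one layer, and the layer names follow FIG. 1. The sketch stops at the first layer with no faulty component; it ignores the mapping from an abnormal network element to its specific virtual resources described above, so it only illustrates the stopping rule.

```python
# Layers ordered from top to bottom, following FIG. 1.
LAYERS_TOP_DOWN = ["network", "network element", "virtualization", "computing resource"]

def locate_fault_layers(detect_faulty_components, layers=LAYERS_TOP_DOWN):
    """Run detection layer by layer, top down, until a layer shows no faulty component.

    detect_faulty_components is a hypothetical callable standing in for steps 1 and 2:
    it takes a layer name and returns the list of faulty components found in that layer.
    """
    findings = []
    for layer in layers:
        faulty = detect_faulty_components(layer)
        if not faulty:
            break  # no fault in the current layer: stop descending
        findings.append((layer, faulty))
    return findings  # last entry is the lowest layer where a fault was confirmed

# Hypothetical detection results for one incident.
results = {"network": ["area-1"], "network element": ["vBS-3"], "virtualization": []}
print(locate_fault_layers(lambda layer: results.get(layer, [])))
# [('network', ['area-1']), ('network element', ['vBS-3'])]
```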
Step 2: a second-stage fault detector (also called an intersection operation detector) performs an intersection operation on the multiple anomalies detected on one component to determine that component's fault information. Each anomaly is represented as the set consisting of one abnormal event occurring on the component and all abnormal events related to it.
According to an embodiment of the present invention, when step 1 uses the first fault detector to detect abnormal events, step 2 can further perform correlation analysis on the abnormal events that the first-stage fault detection found on a component. Suppose the possible faults in one layer of the centralized access network are numbered $F_1, \ldots, F_i$ ($i$ being the total number of faults), and the first fault detector for that layer detects fault $F_1$. If abnormal event $F_1$ can be caused by abnormal events $F_2$ and $F_3$, the detected anomaly is denoted $F_1F_2F_3$. When the first fault detector detects more than one fault, each is represented in the same way. An intersection operation is then performed over all of these anomalies, and its result determines the set of abnormal events.
Taking the computing resource layer as an example, suppose that this layer has 4 kinds of faults, numbered as follows: $F_1$, over-temperature; $F_2$, excessive memory utilization; $F_3$, network interface overload; $F_4$, device overload. The layer has 3 devices: $B_1$, baseband processing unit 1; $S_1$, general-purpose server 1; and $R_1$, radio frequency unit 1. In this layer, excessive memory utilization and network interface overload can each cause over-temperature, and network interface overload can cause excessive memory utilization. An anomaly of a device can therefore be written as follows: over-temperature on server 1 is $S_1F_1F_2F_3$, excessive memory utilization on server 1 is $S_1F_2F_3$, and network interface overload on server 1 is $S_1F_3$.
If step 1 detects that server 1 has both over-temperature and excessive memory utilization, the two anomalies are intersected: $S_1F_1F_2F_3 \cap S_1F_2F_3 = S_1F_2F_3$. From this result it can be determined that the fault of server 1 is excessive memory utilization ($S_1F_2F_3$).
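The intersection step in the server 1 example can be reproduced with Python sets. The sketch assumes that "related events" are listed directly for each fault, as in the example. It also omits the component label $S_1$, since every set being intersected belongs to the same component. The names `RELATED`, `anomaly_set`, `intersect_anomalies` and `matching_events` are hypothetical.

```python
from functools import reduce
from operator import and_
from typing import Dict, FrozenSet, Iterable, List

# Relations from the computing-resource-layer example: each event maps to the
# events that can cause it.  F1 over-temperature, F2 excessive memory
# utilization, F3 network interface overload, F4 device overload.
RELATED: Dict[str, FrozenSet[str]] = {
    "F1": frozenset({"F2", "F3"}),
    "F2": frozenset({"F3"}),
    "F3": frozenset(),
    "F4": frozenset(),
}


def anomaly_set(event: str) -> FrozenSet[str]:
    """An anomaly = the detected event plus every event related to it."""
    return frozenset({event}) | RELATED[event]


def intersect_anomalies(events: Iterable[str]) -> FrozenSet[str]:
    """Second-stage step: intersect the anomaly sets detected on one component.

    `events` must contain at least one event.
    """
    return reduce(and_, (anomaly_set(e) for e in events))


def matching_events(result: FrozenSet[str]) -> List[str]:
    """Events whose own anomaly set equals the intersection result."""
    return sorted(e for e in RELATED if anomaly_set(e) == result)


if __name__ == "__main__":
    # Server S1 shows over-temperature (F1) and excessive memory utilization (F2).
    common = intersect_anomalies(["F1", "F2"])
    print(sorted(common), matching_events(common))  # ['F2', 'F3'] ['F2']
```

The result `{F2, F3}` is exactly the anomaly set of $F_2$, which is why the example attributes the fault to excessive memory utilization.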
Based on embodiment 1, multi-stage fault detection can be performed on the centralized access network through the two-stage fault detectors of the MFDM. This reduces manual involvement and increases the degree of automation of the system.

The first fault detector is trained with a negative selection algorithm on data generated while the centralized access network architecture is running, so no large set of training samples has to be prepared specially. Because a first fault detector is trained separately for each layer of the architecture, fewer training samples are needed. This also reduces the uncertainty introduced by collecting training samples across the whole network, and makes the resulting first fault detectors more targeted and more accurate.

The second fault detector establishes direct association mappings between abnormal events: each anomaly is represented as the set of one abnormal event occurring on a component and all abnormal events associated with it. When the first fault detector detects multiple anomalies on one component, the second fault detector determines that component's fault. This allows large-scale abnormal conditions to be handled and improves fault detection efficiency.
< Performance test >
To verify how well the scheme of the invention performs in super base station fault detection, the inventors carried out tests. The evaluation metrics were training time, fault detection rate, and detection time of the first fault detector. Two experiments were performed:

(1) The training time of EFDM was analyzed by simulation. Training time depends on the configured anomaly detector coverage and on the size of the data set, so it was analyzed for data sets of different sizes under 3 coverage settings: 90%, 95%, and 99%.

(2) The detection rate and detection time of EFDM were verified by fault injection, using between 100 and 1000 parameter samples containing fault data.
Table 5 lists the parameter settings of the test experiments.
TABLE 5 Experimental test parameters
(1) Training time analysis
In the experiments, PCA was applied to protocol stack software data. To compare the effect of PCA more clearly, the original data were processed into 20 groups of n-dimensional data with n = 3. The reduced self samples were used for detector training, and training time was recorded both before and after PCA processing.
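Claims 8 and 12 describe the PCA step, which reduces a layer's m-dimensional operating data to n < m dimensions before training. A minimal NumPy sketch follows. It assumes plain mean-centering followed by projection onto the top-n right singular vectors. Any rescaling applied before negative-selection training is left out, and `pca_reduce` is a hypothetical name.

```python
import numpy as np


def pca_reduce(data: np.ndarray, n: int) -> np.ndarray:
    """Project rows of an (samples x m) array onto the first n principal components."""
    m = data.shape[1]
    if not 0 < n < m:
        raise ValueError("expected 0 < n < m")
    centered = data - data.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(size=(200, 6))  # stand-in for 6-dimensional operating data
    reduced = pca_reduce(raw, 3)
    print(reduced.shape)  # (200, 3)
```

SVD of the centered data is used here instead of an explicit covariance eigendecomposition. The two give the same principal directions, but SVD avoids forming the covariance matrix.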
The training time of the EFDM algorithm depends on the coverage of the anomaly detectors. Three cases were chosen, with anomaly detector coverage set to 99%, 95%, and 90%, and the data samples were trained to generate an anomaly detector set. Results were averaged over 10 repeated training runs:

- At 99% coverage, training produced 777 anomaly detectors, with an average training time of 2.6509 s.
- At 95% coverage, training produced 194 anomaly detectors, with an average training time of 1.4937 s.
- At 90% coverage, training produced 54 anomaly detectors, with an average training time of 0.4665 s.
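The coverage values above are the target $p$ that step III of the training must reach. Claims 6 and 10 decide this by checking whether a statistic $z$ exceeds $z_\alpha$, where $x$ of $n$ test samples are covered by the detector set. The text does not give the formula for $z$. This Python sketch therefore assumes the common one-sided normal approximation to the binomial, $z = (x - np)/\sqrt{np(1-p)}$. The approximation is only reasonable when $np$ and $n(1-p)$ are both comfortably large. `coverage_reached` is a hypothetical name.

```python
import math
from statistics import NormalDist


def coverage_reached(x: int, n: int, p: float, alpha: float = 0.05) -> bool:
    """Decide whether a detector set's coverage exceeds the target p.

    x: number of the n random test samples covered by the detector set.
    Uses the normal approximation z = (x - n*p) / sqrt(n*p*(1 - p)) and
    accepts the coverage as reached when z > z_alpha.
    """
    if n <= 0 or not 0 < p < 1:
        raise ValueError("need n > 0 and 0 < p < 1")
    z = (x - n * p) / math.sqrt(n * p * (1 - p))
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return z > z_alpha


if __name__ == "__main__":
    print(coverage_reached(990, 1000, 0.95))  # True
    print(coverage_reached(955, 1000, 0.95))  # False
```

With $n = 1000$ and $p = 0.95$, 955 covered samples are above the 950 expected, but not by enough to reject "coverage is at most 95%" at $\alpha = 0.05$.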
To analyze how the choice of n affects training time, the inventors also measured training time for n = 2 and n = 5; the results are shown in Table 6.
TABLE 6 training time comparison
The experimental results show the following:

(i) With other conditions unchanged, the number of generated anomaly detectors increases as the anomaly detector coverage increases, and the training time grows accordingly. At 90% coverage the training time is short, but the region not covered by detectors is large, so the reliability of anomaly detection cannot be guaranteed. At 99% coverage the training time is relatively long. The results therefore suggest that 95% is a reasonable coverage setting for the anomaly detectors.

(ii) Training time decreases as data dimensionality decreases.

(iii) EFDM substantially reduces algorithm training time, by more than 20%.
(2) Analysis of fault detection rate and detection time
To test the fault detection rate, the inventors selected a set of anomaly detectors trained at 95% coverage. They injected known test samples in quantities of 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1000 (half of each set being abnormal data) to measure the fault detection rate and detection time of EFDM on data reduced to 2 and 3 dimensions. Here, the detection rate is the probability that abnormal data is detected, and the detection time is the time taken to examine the data. The results, averaged over several tests, are shown in Fig. 5 and Fig. 6.
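The two metrics just defined can be sketched in Python. The sketch assumes the usual real-valued negative-selection matching rule: a sample is flagged abnormal if it lies within the detection radius of any detector. The detection rate is the share of truly abnormal samples that get flagged. The function names and toy data are hypothetical, and nothing here reproduces the reported figures.

```python
import math
import time
from typing import List, Sequence, Tuple

Detector = Tuple[Sequence[float], float]  # (center, detection radius)


def is_abnormal(sample: Sequence[float], detectors: List[Detector]) -> bool:
    """A sample is flagged abnormal if it lies inside any detector's radius."""
    return any(math.dist(sample, c) <= r for c, r in detectors)


def detection_rate_and_time(
    samples: Sequence[Sequence[float]],
    labels: Sequence[bool],
    detectors: List[Detector],
) -> Tuple[float, float]:
    """Return (share of truly abnormal samples flagged, elapsed seconds)."""
    start = time.perf_counter()
    flags = [is_abnormal(s, detectors) for s in samples]
    elapsed = time.perf_counter() - start
    hits = [flag for flag, bad in zip(flags, labels) if bad]
    rate = sum(hits) / len(hits) if hits else float("nan")
    return rate, elapsed


if __name__ == "__main__":
    dets = [((0.9, 0.9), 0.2)]
    samples = [(0.1, 0.1), (0.85, 0.95), (0.5, 0.5), (0.95, 0.8)]
    labels = [False, True, True, False]
    print(detection_rate_and_time(samples, labels, dets)[0])  # 0.5
```

In the toy run, the last sample is flagged even though it is labeled normal. That false positive does not affect the detection rate as defined above, which only counts abnormal samples.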
The detection rate curves in Fig. 5 show that once there are more than 500 test samples, the fault detection rates with and without PCA processing both stabilize. The detection rate after PCA processing is slightly lower than without it but remains within an acceptable error range; a possible reason is that data compression introduces errors into the data.

The detection time curves in Fig. 6 show that with fewer than 400 test samples, detection times with and without PCA differ little. With more than 400 samples, detection on the compressed data is significantly faster than on the uncompressed data. Since the volume of fault detection data on a live super base station is far larger than 400 samples, EFDM can be considered feasible.
Based on these two tests, with the anomaly detector coverage set to 95%, EFDM proved more effective than the original negative selection algorithm, reducing training time by more than 20%. Compared with the ordinary negative selection algorithm, the EFDM proposed by the invention reduces detection time and substantially improves fault detection efficiency without compromising detection accuracy.
It should be noted that not all of the steps described in the above embodiments are necessary; those skilled in the art may make appropriate substitutions, replacements, modifications, and the like according to actual needs.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (13)

1. A method of performing anomaly detection for a centralized access network, comprising:
1) for one layer in the network architecture of the centralized access network, detecting abnormal events with a first fault detector corresponding to that layer, wherein the first fault detector is trained with a negative selection algorithm on the layer's own operating data;
2) when the first fault detector detects multiple pieces of fault information for a component, performing an intersection operation on those pieces of fault information with a second fault detector to determine the fault of the component, wherein each piece of fault information is represented as the set consisting of one abnormal event occurring on the component and all abnormal events related to that event.
2. The method of claim 1, comprising: performing step 1) and/or step 2) above for one layer in the network architecture of the centralized access network, and, if a fault is found in a component of that layer, performing step 1) and/or step 2) for an adjacent layer, until it is determined that no component of the current layer is faulty or all layers have been traversed.
3. The method of claim 2, comprising: performing step 1) and/or step 2) above for a layer other than the lowest layer in the network architecture of the centralized access network, and, if a fault is found in a component of that layer, performing step 1) and/or step 2) for the layer below it, until it is determined that no component of the current layer is faulty or the lowest layer has been examined.
4. The method of claim 1, wherein, for each of the following layers in the network architecture of the centralized access network, the layer's own operating data used by the negative selection algorithm and the first fault detector is selected from the corresponding set:
computing resource layer: device temperature, device voltage, CPU (central processing unit) utilization, memory utilization, network interface traffic, and network interface rate;
virtualization layer: virtual CPU utilization, virtual memory utilization, and virtual baseband utilization;
network element layer: number of users accessing the virtual base station, uplink rate, downlink rate, signal strength, latency, packet loss rate, and reference signal received power;
network layer: network performance parameters for the area.
5. The method of claim 1, wherein the first fault detector is trained by a method comprising:
I) for one layer in the network architecture of the centralized access network, collecting n-dimensional operating data $D = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ of the layer to serve as a self sample set;
II) generating a candidate first fault detector with a negative selection algorithm based on the self sample set, and, if the minimum distance $d_{min}$ between the candidate first fault detector and the self samples is greater than or equal to the self-sample affinity radius $r_s$, adding the candidate to a fault detector set;
III) outputting the fault detector set when it reaches the set coverage; otherwise, repeating step II).
6. The method of claim 5, wherein determining in step III) whether the obtained fault detector set has reached the set coverage comprises:
determining whether the obtained fault detector set has reached the set coverage based on whether $z > z_\alpha$, where $z$ is the test statistic computed from $x$, $n$ and $p$; $x$ is the number of the $n$ test samples covered by the fault detector set, $p$ is the predetermined coverage, and $z_\alpha$ is the critical value corresponding to the chosen significance level $\alpha$.
7. The method of claim 5, further comprising, between steps II) and III):
if the detection radius $r_{d_i}$ of the candidate fault detector satisfies $r_{d_i} > d_{min} - r_s$, excluding the candidate fault detector from the fault detector set.
8. The method of claim 5, wherein the self sample set $D = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ corresponds to n dimensions obtained by reducing the layer's selectable m-dimensional operating data with a principal component analysis algorithm, where $n < m$.
9. A method of training a fault detector, comprising:
I) collecting n-dimensional operating data $D = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ generated by the system during operation to serve as a self sample set;
II) generating a candidate fault detector with a negative selection algorithm based on the self sample set, and, if the minimum distance $d_{min}$ between the candidate fault detector and the self samples is greater than or equal to the self-sample affinity radius $r_s$, adding the candidate fault detector to a fault detector set;
III) outputting the fault detector set when it reaches the set coverage; otherwise, repeating step II).
10. The method of claim 9, wherein determining in step III) whether the obtained fault detector set has reached the set coverage comprises:
determining whether the obtained fault detector set has reached the set coverage based on whether $z > z_\alpha$, where $z$ is the test statistic computed from $x$, $n$ and $p$; $x$ is the number of the $n$ test samples covered by the fault detector set, $p$ is the predetermined coverage, and $z_\alpha$ is the critical value corresponding to the chosen significance level $\alpha$.
11. The method of claim 9, further comprising, between steps II) and III):
if the detection radius $r_{d_i}$ of the candidate fault detector satisfies $r_{d_i} > d_{min} - r_s$, excluding the candidate fault detector from the fault detector set.
12. The method of claim 9, wherein the self sample set $D = (z^{(1)}, z^{(2)}, \ldots, z^{(n)})$ corresponds to n dimensions obtained by reducing the layer's selectable m-dimensional operating data with a principal component analysis algorithm, where $n < m$.
13. A computer-readable storage medium storing a computer program which, when executed, carries out the method of any one of claims 1-12.
CN201910383058.1A 2019-01-30 2019-05-09 A kind of fault detection method towards centralization access net Pending CN110225540A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910091232 2019-01-30
CN2019100912325 2019-01-30

Publications (1)

Publication Number Publication Date
CN110225540A true CN110225540A (en) 2019-09-10

Family

ID=67820742

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190910