
WO2024205504A1 - Anomaly signal detection method and system - Google Patents

Anomaly signal detection method and system

Info

Publication number
WO2024205504A1
WO2024205504A1 PCT/SG2024/050191 SG2024050191W WO2024205504A1 WO 2024205504 A1 WO2024205504 A1 WO 2024205504A1 SG 2024050191 W SG2024050191 W SG 2024050191W WO 2024205504 A1 WO2024205504 A1 WO 2024205504A1
Authority
WO
WIPO (PCT)
Prior art keywords
outlier
abnormal
nearest neighbor
sensor
time series
Prior art date
Application number
PCT/SG2024/050191
Other languages
French (fr)
Chinese (zh)
Inventor
邓锦浩
昂以豪
黄强
Original Assignee
新加坡国立大学
重庆新国大研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 新加坡国立大学, 重庆新国大研究院
Publication of WO2024205504A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01D MEASURING NOT SPECIALLY ADAPTED FOR A SPECIFIC VARIABLE; ARRANGEMENTS FOR MEASURING TWO OR MORE VARIABLES NOT COVERED IN A SINGLE OTHER SUBCLASS; TARIFF METERING APPARATUS; MEASURING OR TESTING NOT OTHERWISE PROVIDED FOR
    • G01D18/00 Testing or calibrating apparatus or arrangements provided for in groups G01D1/00 - G01D15/00
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2123/00 Data types
    • G06F2123/02 Data types in the time domain, e.g. time-series data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Definitions

  • the present application relates to signal detection, and more particularly to a method and a system for abnormal signal detection.
  • Background: Sensor networks are widely used in industry, for example to monitor the manufacturing process of assembly lines, to record the operating status of wind turbines, and in other critical infrastructure.
  • an important function of sensor networks is to promote predictive maintenance by detecting anomalies in sensor-based multivariate time series as early as possible, which is crucial for timely detection and recovery of faults before serious damage occurs. It may be very challenging to achieve such detection goals.
  • the present application provides a method for abnormal signal detection.
  • The method comprises: dividing a sensor signal including a multivariate time series into a plurality of sub-matrices, wherein the sub-matrices respectively correspond to sliding windows w over the sensor signal in the time series; converting each sub-matrix into a corresponding k-nearest neighbor graph, wherein each k-nearest neighbor graph comprises a plurality of vertices and a plurality of edges connecting adjacent vertices; pruning each k-nearest neighbor graph to remove the edges whose weights are less than a weight threshold; splitting each k-nearest neighbor graph into a plurality of non-intersecting subgraphs; obtaining the number of outlier variations by comparing two consecutive k-nearest neighbor graphs, wherein the number of outlier variations corresponds to the number of outliers, and the outliers are placed in different subgraphs in the two consecutive k-nearest neighbor graphs; and determining, according to the number of outlier variations and a first threshold, whether the sensor signal has an abnormal signal.
  • The method further comprises, in response to the number of outlier variations being greater than or equal to the first threshold, determining that the sensor signal is abnormal.
  • The method further comprises obtaining, from at least one outlier, the abnormal time and abnormal sensor corresponding to each outlier.
  • The method further comprises obtaining the number of co-occurrences and the corresponding co-occurrence rate of each vertex by comparing two consecutive k-nearest neighbor graphs, and adding each vertex whose co-occurrence rate is lower than a second threshold to the outlier set.
  • The method further comprises obtaining the number of outlier variations by comparing two consecutive outlier sets.
  • The method further comprises obtaining the average value and standard deviation of the number of outlier variations from historical sensor signals.
  • The method further comprises obtaining the first threshold from the standard deviation of the number of outlier variations.
  • The method further comprises determining whether the sensor signal has the abnormal signal according to the first threshold, the number of outlier variations, and the average value of the number of outlier variations.
  • the method further comprises initializing the outlier set to an empty set.
  • the present application provides a system capable of performing abnormal signal detection in real time.
  • The system includes at least one sensor, which can provide a sensor signal including a multivariate time series, and a computing device in signal connection with the at least one sensor, wherein the computing device is configured to execute the method for abnormal signal detection described above.
  • FIG. 1 is a schematic diagram of a system that can perform abnormal signal detection in real time according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a multivariate time series according to an example of a sensor signal;
  • FIG. 3 is a schematic diagram of a series of time series graphs of the multivariate time series according to FIG. 2;
  • FIG. 4 is a flow chart of a method for abnormal signal detection according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a method for abnormal signal detection according to another embodiment;
  • FIG. 6 shows the difference between point adjustment and delay-point adjustment;
  • FIG. 7 is an evaluation on SMD of the method for detecting abnormal signals according to FIG. 5;
  • FIG. 8 is a scalability test on data sets IS-1 to IS-5 of the method for detecting abnormal signals according to FIG. 5;
  • FIG. 9 is a time series of some sensors marked with abnormalities according to an example.
  • The present invention discloses a system for detecting abnormal signals, which can be used to detect, in real time, abnormal signals in a sensor signal such as a multivariate time series.
  • the system 100 for detecting abnormal signals includes a plurality of sensors 110 and a computing device 120 connected to the plurality of sensors 110.
  • The sensors 110 may include sensors with different uses, different installation positions, and different measured properties.
  • the plurality of sensors 110 may include multiple sensors in a robotic arm, such as a laser sensor, a sonar sensor, a temperature sensor, a pressure sensor, etc.
  • the plurality of sensors 110 may provide signals to the computing device 120.
  • The system 100 for abnormal signal detection, whose sensors send or provide a sensor signal 200 including a multivariate time series, can implement an unsupervised abnormal signal detection method based on association analysis.
  • the abnormal signal detection method is a deterministic method that can obtain stable output without model training. In addition, the method does not require pre-set assumptions about data distribution, and can therefore be easily extended to the processing of streaming data.
  • the abnormal signal detection method identifies anomalies by observing the correlation between sensors or the correlation between sensor data, mining the changes in the correlation between the sensor/sensor data, and obtaining (1) the time period when the abnormality occurs (abnormal time) and (2) the affected components (abnormal sensors with abnormal readings) of the sensor signal. Combining these two aspects can provide valuable information for predictive maintenance.
  • In short, the method for detecting abnormal signals identifies anomalies by observing the correlation between sensors and mining the changes in the correlation between sensor signals.
  • the multivariate time series is first converted into a series of time series graphs (Time-Series Graph).
  • the time series graph is a k-nearest neighbor graph. Each vertex in the k-nearest neighbor graph corresponds to a sensor, and the edge between two vertices represents the correlation between the time series of two sensors over a period of time.
  • the time series graph only connects sensors to highly correlated neighbors.
  • the method of abnormal signal detection can detect abnormal time periods and affected sensors.
  • the method of abnormal signal detection is practical and reliable for early anomaly detection of noisy data.
  • FIGS. 2 to 4 show a method 300 for detecting abnormal signals.
  • The method 300 can be used to detect abnormalities in a sensor signal 200 such as a multivariate time series T.
  • the operating state in the multivariate time series T can be divided into two types: normal state and abnormal state (abnormal).
  • the abnormality includes two parts: abnormal sensors and abnormal time.
  • Abnormal sensors refer to sensors that cause detection abnormalities and are fundamentally related to the occurrence of abnormalities, while abnormal time corresponds to sensor readings, usually referring to the time period when the abnormality occurs.
  • T denotes the multivariate time series.
  • The goal of abnormality detection is to find the sub-matrices of T that are abnormal, one for each abnormality.
  • Each sub-matrix represents an abnormality Z, where each column represents a single abnormal time point and each row corresponds to the time series of an abnormal sensor that caused the abnormality.
  • The sub-matrix can be composed of non-adjacent rows of T, but its columns are adjacent so that the abnormal time stays continuous.
  • the method 300 for abnormal signal detection includes in box 310, first dividing the sensor signal including the multivariate time series into multiple sub-matrices 210/220/230.
  • the sub-matrices 210/220/230 correspond to respective sliding windows w relative to the sensor signal 200 in the time series.
  • multiple sub-matrices 210/220/230 may overlap.
  • the step size s is preferably smaller than the sliding window w. A larger sliding window w and a smaller step size s are helpful for fine-grained tracking of changes in correlation.
  • Each sub-matrix T_i has the same length w. It is worth noting that when the length of the multivariate time series does not divide evenly into sliding windows, the last few time points of T are deleted to make it divisible. For ease of explanation, let R denote the resulting number of rounds.
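The window division described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_into_windows`, the list-of-rows layout (one row per sensor), and the toy signal are assumptions made for the example.

```python
def split_into_windows(series, w, s):
    """Cut an n-by-m series (one row per sensor) into sub-matrices of width w, step s."""
    m = len(series[0])
    windows = []
    # The range stops at the last start index that still leaves a full window,
    # so trailing time points that cannot fill a window are dropped.
    for start in range(0, m - w + 1, s):
        windows.append([row[start:start + w] for row in series])
    return windows


# Toy signal (hypothetical data): 3 sensors, 11 time points.
T = [[10 * i + t for t in range(11)] for i in range(3)]
windows = split_into_windows(T, w=4, s=2)
print(len(windows))    # number of rounds
print(windows[1][0])   # readings of sensor 0 in the second window
```

With a step s smaller than the window w, consecutive sub-matrices overlap, which is what allows fine-grained tracking of correlation changes.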
  • each vertex is connected to its k highest correlation neighbors.
  • setting a larger k value may include a weaker correlation with its k neighbors, resulting in additional noise and affecting the abnormality detection result.
  • Setting a smaller k value may not include enough correlation information between sensors. Therefore, the k value can be set according to the sensor signal.
  • some vertices in the k-nearest neighbor graph G still have weak correlation among k neighbors. These weakly correlated edges may change frequently in consecutive rounds or sequences, resulting in additional noise in the actual correlation between sensors.
  • Each of the k-nearest neighbor graphs is pruned to remove the edges whose weight is less than a weight threshold τ, that is, every edge e with weight ω(e) < τ.
  • The edge pruning ensures that the method for abnormal signal detection tracks only the strong correlations between sensors while performing real-time anomaly detection.
  • The weight threshold τ affects the sparsity of the time series graph. A larger τ may leave some vertices without enough neighbors, and thus without sufficient correlation information to detect anomalies. A smaller τ causes the time series graph to include many weakly correlated edges, resulting in additional noise.
  • A moderate weight threshold τ is selected, between 0.4 and 0.6.
  • The pruned k-nearest neighbor graph for each sub-matrix T_r can be called a time series graph G_r.
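A minimal sketch of building one pruned k-nearest neighbor graph from a sub-matrix is given here. The text does not say which correlation measure is used, so the absolute Pearson correlation is an assumption of this example, as are the names `pearson` and `build_time_series_graph` and the toy readings.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def build_time_series_graph(window, k, tau):
    """Link each sensor to its k most correlated sensors, then drop edges below tau."""
    n = len(window)
    edges = {}
    for i in range(n):
        # Assumption: edge weight is the absolute Pearson correlation.
        scored = [(abs(pearson(window[i], window[j])), j) for j in range(n) if j != i]
        scored.sort(reverse=True)
        for weight, j in scored[:k]:
            if weight >= tau:          # pruning step: keep only strong edges
                edges[frozenset((i, j))] = weight
    return edges


# Toy sub-matrix (hypothetical data): 4 sensors, 5 time points.
window = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [3, 1, 4, 1, 5],
    [2, 7, 1, 8, 2],
]
edges = build_time_series_graph(window, k=2, tau=0.5)
for edge, weight in sorted(edges.items(), key=lambda item: sorted(item[0])):
    print(sorted(edge), round(weight, 3))
```

Raising `tau` makes the graph sparser and lowering it admits weakly correlated edges, which mirrors the trade-off described for the weight threshold.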
  • the affected vertex 238 in FIG. 3 represents an abnormal sensor, and the continuous rounds of the abnormal sensor are used to represent the abnormal time.
  • the method 300 for detecting anomaly signals also includes, in block 340, splitting each of the k-nearest neighbor graphs into a plurality of non-intersecting subgraphs.
  • Definition 1 illustrates the overall representation of anomalies in a time series graph. For a certain round r, the time series graph G_r can be split into non-intersecting subgraphs that are only loosely connected to one another; these subgraphs are called communities. Time series graphs constructed from sensor networks usually show such community structures.
  • the changes in the corresponding communities of different vertices can be detected through two consecutive rounds or two consecutive time series graphs to determine whether the connectivity of different vertices is stable.
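The idea of splitting a graph into non-intersecting subgraphs and watching a vertex change subgraph between two rounds can be sketched as follows. Note the substitution: the description elsewhere names a community detection algorithm such as Louvain, whereas this example uses plain connected components as a simpler stand-in available without third-party libraries. The function name and the edge lists are hypothetical.

```python
def connected_components(n, edges):
    """Split vertices 0..n-1 into non-intersecting subgraphs (connected components)."""
    adjacency = {v: set() for v in range(n)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # iterative depth-first search
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        components.append(component)
    return components


# Two consecutive rounds over 6 sensors (hypothetical edge lists).
round_1 = connected_components(6, [(0, 1), (1, 2), (3, 4), (4, 5)])
round_2 = connected_components(6, [(0, 1), (2, 3), (3, 4), (4, 5)])
print(round_1)
print(round_2)   # vertex 2 has left its earlier subgraph and joined the other one
```

A real deployment would more likely use a modularity-based method, since connected components cannot separate communities that are joined by even a single weak edge.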
  • Specific constraints are added to (V_Z, R_Z), the abnormal sensors and the abnormal time, to obtain Definition 2 and Definition 3.
  • Definition 2 (affected vertices): For a time series graph G_r(V, E_r) in a given round r ∈ (1, R] with its communities, the vertex set V_r is the set of affected vertices.
  • When a vertex v ∈ V_r changes between two consecutive rounds, that is, moves into or out of its community, the change is counted as a variation.
  • Each vertex v that changes may correspond to a corresponding outlier.
  • The number of outlier variations is obtained by comparing two consecutive k-nearest neighbor graphs.
  • The number of outlier variations is the total number of outliers in the round, in other words, the number of affected vertices in the vertex subset V_r. There may be no outliers between two consecutive k-nearest neighbor graphs, in which case the number of outlier variations is zero. When it is not zero, there is at least one outlier, and each such outlier is placed in different subgraphs in the two consecutive k-nearest neighbor graphs. Consecutive rounds that contain more variations than other rounds can thus be found, and that period is regarded as the abnormal time.
  • Definition 3 (Abnormal time and abnormal sensor): Given a first threshold, such as the abnormal time threshold, and R rounds of affected vertices {V_1, ..., V_R}, the abnormal time R_Z consists of the rounds with more affected vertices; such a round r is regarded as abnormal. In some cases, all vertices may be affected, so the abnormal time threshold can range from 1 to n, where n is the number of vertices. Therefore, in box 360, the method 300 for detecting abnormal signals also includes determining whether the sensor signal has an abnormal signal based on the number of outlier variations and the first threshold, and determining that the sensor signal is abnormal when the number of outlier variations is greater than or equal to the abnormal time threshold.
  • Each abnormal round corresponds to a time period of the sequence T in which the anomaly occurs. Therefore, the anomaly 238 in the time series graph corresponds to the abnormal sub-matrix in the multivariate time series 200.
  • the abnormal sensor is S4
  • FIG. 5 shows an embodiment of the method for abnormal signal detection.
  • The method for abnormal signal detection is divided into three stages. In the first stage 410, the sensor signal is divided and converted into a plurality of corresponding time series graphs, such as k-nearest neighbor graphs, and different communities/subgraphs are detected from the time series graphs according to the correlation of the vertices. In the second stage 420, the co-occurrence relationship between each vertex and other vertices is mined in every two consecutive rounds; for example, the co-occurrence number and the corresponding co-occurrence rate of each vertex are obtained, and the outliers are obtained. In the third stage 430, the number of outlier variations is defined and its properties are analyzed. Pseudo code 1 describes this anomaly detection process and returns the outlier set and the number of outlier variations. Based on the analysis of the number of outlier variations, the abnormal signal detection method can detect abnormal time and abnormal sensors.
  • the first stage 410 is community detection.
  • The method of abnormal signal detection may include first dividing the sensor signal into multiple sub-matrices and converting each of the sub-matrices into a corresponding time series graph, such as a k-nearest neighbor graph. Different communities/subgraphs are detected from the time series graphs according to the correlation of vertices. For each round r ∈ [1, R], the time series graph G_r is first divided into communities using an algorithm such as Louvain, so that vertices that are more strongly correlated with each other than with other vertices fall into the same community (pseudocode 1, line 2). Through the communities, abnormal correlation changes of vertices can be tracked.
  • FIG. 5 shows the first 10 rounds of time series graphs with 12 sensors. After the first stage, each time series graph is divided into three communities, but the communities in different rounds can differ. When an anomaly occurs, although the sensor may not initially show obvious abnormal behavior, the correlation of the sensor is likely to break down, resulting in a change in the community. Next, the co-occurrence relationship of the vertices is mined from the sensors' communities.
  • Stage 2: Co-occurrence mining. First, co-appearance is defined.
  • Co-occurrence number: for each vertex v ∈ V in a round r, the co-occurrence number S_r(v) is the number of vertices that co-occur with v in the same community.
  • The co-occurrence number S_r(v) may differ from round to round for each vertex v ∈ V, especially when there are anomalies. Therefore, the co-appearance ratio (Ratio of Co-appearance Number, RC) is defined to track the co-appearance number of a vertex over all r rounds so far.
  • Definition 6 (Co-appearance ratio): For each v ∈ V, the co-appearance ratio RC_{v,r} of v with other vertices in round r ∈ (1, R] is calculated from the co-appearance numbers of v.
  • If a vertex v always co-occurs with a certain set of vertices in the same community across rounds, it has a high RC and can be considered normal. However, if in a particular round r vertex v suddenly becomes a member of another community, then vertex v should be considered an outlier, because this community may contain only a few (or no) other vertices that co-occur with vertex v. For example, the RC value is compared with an outlier threshold to determine whether the vertex v is an outlier in round r.
  • The overall steps of the co-occurrence mining stage are as follows. First, the outlier set O_r is initialized to an empty set. Then, for each vertex v ∈ V, the number of co-occurrences S_r(v) and the co-occurrence rate RC_{v,r} are calculated (pseudocode 1 line 5), and each vertex v whose co-occurrence rate RC_{v,r} is lower than the outlier threshold θ is added to the outlier set (pseudocode 1 line 6). An outlier threshold that is too low will exclude many vertices that should be considered outliers, resulting in low detection accuracy.
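The co-occurrence mining steps above can be illustrated with a small sketch. The exact formula for the co-appearance ratio is not preserved in this text, so the ratio used here is an assumption chosen for illustration: the fraction of a vertex's previous-round community partners that are still in its community in the current round. The function names and the community labels are hypothetical; 0.3 is the threshold value mentioned in the description.

```python
def co_appearance_ratio(prev, curr, v):
    """Assumed ratio: share of v's previous community partners still grouped with v."""
    partners = [u for u in prev if u != v and prev[u] == prev[v]]
    if not partners:
        return 1.0                     # no partners to lose, treat as stable
    kept = sum(1 for u in partners if curr[u] == curr[v])
    return kept / len(partners)


def find_outliers(prev, curr, theta=0.3):
    """Start from an empty outlier set and add every vertex whose ratio is below theta."""
    outliers = set()
    for v in curr:
        if co_appearance_ratio(prev, curr, v) < theta:
            outliers.add(v)
    return outliers


# Community label of each vertex in two consecutive rounds (hypothetical data).
prev = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "B"}
curr = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B", 7: "B"}
print(find_outliers(prev, curr))       # vertex 3 changed community
```

Under this assumed ratio, the vertices that merely lost vertex 3 as a partner keep a ratio of 2/3 and stay normal, while vertex 3 itself drops to 0 and is flagged.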
  • The outlier threshold θ is set to about 0.3. It is worth noting that, since multivariate time series usually contain noise, the presence of (a few) outliers cannot in some cases reliably determine that a round is abnormal. Therefore, the following stage describes how to analyze outlier variations and mitigate the impact of noise.
  • Stage 3: Variation analysis.
  • A vertex in the time series graph can be in one of three states over two consecutive rounds: (1) normal in both rounds, (2) an outlier in both rounds, or (3) in transition between the normal state and the outlier state.
  • For example, vertex b and vertex c keep the same state in both the 9th and 10th rounds, while vertex a becomes an outlier in the 9th round and returns to normal in the 10th round.
  • Definition 8 (number of outlier variations n_r): n_r is obtained from the outlier sets O_{r-1} and O_r corresponding to round r-1 and round r, respectively.
  • Anomaly detection. Referring to Definition 3, given a round r, after obtaining the number of outlier variations n_r, it is necessary to determine whether the current round is abnormal.
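A minimal sketch of counting outlier variations between two consecutive rounds follows. The formula of Definition 8 is not preserved in this text; the example assumes that a variation is a vertex that is an outlier in exactly one of the two rounds (a transition between normal and outlier), which corresponds to the size of the symmetric difference of the two outlier sets. The function name and the sample sets are hypothetical.

```python
def outlier_variation_count(outliers_prev, outliers_curr):
    """Assumed n_r: number of vertices whose outlier status differs between two rounds."""
    return len(set(outliers_prev) ^ set(outliers_curr))


# Hypothetical outlier sets for three consecutive rounds.
O_8 = set()
O_9 = {"a"}        # vertex a becomes an outlier
O_10 = set()       # vertex a returns to normal
print(outlier_variation_count(O_8, O_9))
print(outlier_variation_count(O_9, O_10))
print(outlier_variation_count({"a", "d"}, {"a", "d"}))   # no status change
```

Vertices that are outliers in both rounds, or normal in both, contribute nothing under this assumption, which matches the three states listed for the variation analysis stage.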
  • μ and σ are the average value and standard deviation of the number of outlier variations, respectively, and the factor applied to σ is a constant.
  • In that case, the round r is also abnormal.
  • The number of outlier variations n_r of the round r, the first threshold, and the average value μ of the number of outlier variations are used to determine whether the round has an abnormal signal.
  • the method for abnormal signal detection can be described by the following pseudo code 2 for abnormal detection.
  • Z is used to store abnormalities, and the affected sensors and abnormal rounds of each abnormality are stored separately in V_Z and R_Z (pseudo code 2 line 2).
  • The method in pseudo code 1 is first called to perform outlier detection, and the outlier set O_r and the number of outlier variations n_r are obtained (pseudo code 2 line 6).
  • N is used to store the series of outlier variation numbers n_r.
  • μ and σ are the two parameters for determining the abnormal time. If |n_r - μ| ≥ 3σ, the round is determined to be abnormal, so r is added to R_Z and V_Z is updated with O_r (pseudo code 2 line 8). Otherwise, if R_Z is not empty, the current abnormality has ended, so (V_Z, R_Z) is added to Z and V_Z and R_Z are reinitialized to empty sets (pseudo code 2 line 11). Then, n_r is added to N (pseudocode 2 line 12) and μ and σ are updated accordingly (pseudocode 2 line 13), and all detected anomalies Z are returned as output (pseudocode 2 line 14).
  • With the test |n_r - μ| ≥ 3σ, noisy sensor signals/data are processed more reliably. Because the noise of a given machine usually stays at the same level, using μ reduces the impact of noise, and 3σ distinguishes actual anomalies from noisy data.
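The round-by-round decision and the grouping of consecutive abnormal rounds into one abnormality can be sketched as below. This is an illustration under stated assumptions, not the patent's pseudo code 2: the population standard deviation, the guard for a zero standard deviation, the union used to update the sensor set, the final flush of an open abnormality, and all names and sample values are choices made for this example.

```python
import statistics


def is_abnormal_round(n_r, history, multiplier=3.0):
    """Three-sigma test |n_r - mu| >= multiplier * sigma over past variation counts."""
    mu = statistics.mean(history)
    sigma = statistics.pstdev(history)   # assumption: population standard deviation
    if sigma == 0:
        return n_r != mu                 # guard added for a perfectly flat history
    return abs(n_r - mu) >= multiplier * sigma


def detect(counts, outlier_sets, warmup_counts):
    """Group consecutive abnormal rounds into (sensors, rounds) pairs."""
    history = list(warmup_counts)        # counts gathered during warm-up
    anomalies, sensors, rounds = [], set(), []
    for r, (n_r, o_r) in enumerate(zip(counts, outlier_sets), start=1):
        if is_abnormal_round(n_r, history):
            rounds.append(r)
            sensors |= o_r               # assumption: accumulate outliers by union
        elif rounds:
            anomalies.append((sensors, rounds))   # the current abnormality has ended
            sensors, rounds = set(), []
        history.append(n_r)              # mean and deviation follow the growing history
    if rounds:                           # flush an abnormality still open at the end
        anomalies.append((sensors, rounds))
    return anomalies


# Hypothetical data: a quiet warm-up history, then two noisy rounds.
warmup = [1, 2] * 10
counts = [9, 9, 2, 1]
outlier_sets = [{"s4"}, {"s4", "s7"}, set(), set()]
print(detect(counts, outlier_sets, warmup))
```

Because every count is appended to the history, a long abnormality gradually inflates σ, so a sufficiently long warm-up history helps keep the test sensitive.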
  • A warm-up process is introduced using a historical multivariate time series (pseudocode 2 lines 16-23) to obtain enough values of n_r.
  • The input of the warm-up process comes from historical sensor signals of the same data source as T, such as the historical multivariate time series T_his.
  • The inputs are the MTS T, the historical MTS T_his, the sliding window, the step size, and the outlier threshold.
  • The method for detecting abnormal signals disclosed herein can detect abnormalities in sensor-based multivariate time series. It is worth noting that the method can be extended to process streaming data. For example, when a new round of data arrives, the method can be used to detect abnormalities in real time. Since the detection is performed for each round, the detection process can run simultaneously with the collection of new data. As long as the running time of each round is less than the time period of one step s, abnormalities can be detected in real time. In addition, in the process of abnormality detection, μ and σ can be obtained by maintaining the series of n_r values.
  • Under the condition of real-time detection, the number of n_r values increases as more and more data streams in, so the values of μ and σ can be estimated more accurately, and anomalies can in turn be detected more accurately. Real-time detection can therefore be applied to various industrial scenarios, such as server nodes, commercial servers, water supply networks, power sensors, and industrial assembly lines.
  • the method of abnormal signal detection can detect anomalies at an earlier stage. The main reason is that when an anomaly occurs, although the sensor may not show obvious abnormal behavior, the correlation between the affected sensors is likely to change, so this change can be captured by this method.
  • The evaluation of time series anomaly detection is usually based on point adjustment (PA).
  • Under point adjustment, if any time point within an anomaly segment is detected, the whole anomaly is considered to be detected. After that, the adjusted result is compared directly with the label values to calculate the value of F1_PA. Since point adjustment does not consider the time point at which the anomaly is first detected, the time order of detection is ignored and not evaluated. To solve this problem, this application provides a new evaluation scheme, delay-point adjustment (DPA), which reflects the time order in which the anomaly is detected.
  • Delay-point adjustment (DPA) is a stricter evaluation, namely F1_DPA ≤ F1_PA.
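The difference between the two evaluation schemes can be illustrated with a short sketch. Point adjustment follows its usual meaning (one hit inside a labelled anomaly segment marks the whole segment as detected). The exact definition of delay-point adjustment is not preserved in this text, so the version here is an assumption: only the points from the first detection onward are marked as detected. All function names and the sample labels are hypothetical.

```python
def segments(labels):
    """Return (start, end) index pairs of consecutive runs of 1 in labels (end exclusive)."""
    runs, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(labels)))
    return runs


def adjust(pred, labels, delayed=False):
    """Point adjustment; with delayed=True, adjust only from the first hit onward."""
    out = list(pred)
    for a, b in segments(labels):
        hits = [i for i in range(a, b) if pred[i]]
        if hits:
            first = hits[0] if delayed else a
            for i in range(first, b):
                out[i] = 1
    return out


def f1(pred, labels):
    tp = sum(1 for p, y in zip(pred, labels) if p and y)
    fp = sum(1 for p, y in zip(pred, labels) if p and not y)
    fn = sum(1 for p, y in zip(pred, labels) if not p and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


labels = [0, 0, 1, 1, 1, 1, 0, 0]
pred = [0, 0, 0, 0, 1, 0, 0, 0]      # the anomaly is first detected two points late
f1_pa = f1(adjust(pred, labels), labels)
f1_dpa = f1(adjust(pred, labels, delayed=True), labels)
print(f1_pa, f1_dpa)
```

Under this assumed definition a late detection is penalised for the points it missed before the first hit, so the delayed score can never exceed the point-adjusted one.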
  • two relative evaluation methods are proposed: Ahead and Miss, which are used for relative comparison between the detection results of the two methods.
  • Other methods include three data mining-based methods: LOF, ECOD and IForest; two deep learning methods: USAD and RCoders; and four variable methods: S2G, SAND, its online version SAND* and NormA.
  • Sensor datasets with label information are used: PSM, SWaT and SMD, as well as two private datasets from industry, IS-1 and IS-2.
  • The SMD dataset consists of 28 different subsets, and the method evaluation is performed directly on each subset without warm-up.
  • Table II summarizes the statistics of the eight datasets.
  • Table III shows the results of abnormal time detection on the sensor datasets PSM, SWaT, IS-1 and IS-2, where the standard deviation (SD) is obtained from F1_PA and F1_DPA over 10 repeated runs. Since the results of the proposed method CAD, LOF, ECOD and S2G do not change across repetitions, the corresponding standard deviation is 0.
  • The proposed method CAD outperforms all baselines on IS-1 and achieves the highest average ranking in F1_PA and F1_DPA.
  • Table IV shows the results of the SMD dataset.
  • the proposed method CAD outperforms the other methods of USAD, RCoders, S2G, SAND, SAND* and NormA in at least 17 of the 28 subsets, and is comparable to the LOF, ECOD and IForest methods.
  • Table II Statistics of the eight sensor datasets
  • Table V shows the evaluation of Ahead and Miss.
  • Compared with the other methods, the CAD method described in this application achieves an Ahead of at least 50% (on the PSM and SWaT sensor datasets), while its Miss remains below 50% (except against SAND). The results show that, among the detected anomalies, CAD detects at least half of them earlier than the other methods; among the anomalies missed by CAD, the other methods detect at most half. Referring to FIG. 7, on the 28 subsets of the SMD dataset the CAD method is superior to the other methods: it achieves Ahead > 50% in most subsets, and more than half of the subsets have Miss < 50%. This verifies that the CAD method can detect abnormal time as early as possible with almost no omissions.
  • Table V Evaluation of Ahead and Miss
  • the CAD method can not only detect abnormal time, but also reveal abnormal sensors.
  • The last column of Table IV shows the abnormal sensor detection results on SMD. The CAD method outperforms the ECOD and RCoders methods, the only two baseline methods that report abnormal sensors, on 28 of the 28 subsets. On the sensor datasets IS-1 and IS-2, the CAD method is also better than ECOD and RCoders, and its F1_sensor exceeds 60%. Therefore, at least half of the normal sensors can be excluded, which greatly reduces the pressure on operators performing predictive maintenance.
  • Table VI shows the detection time of the CAD method in each round, that is, the time per round (TPR for short).
  • The time per round TPR should be less than the time corresponding to the step size s, that is, TPR < s/f, where f is the sampling frequency of the sensor. Therefore, f < s/TPR Hz.
  • the CAD method can support real-time anomaly detection, that is, the maximum frequency supported by the method on SWaT and IS-2 is approximately 43 kHz and 331 Hz, respectively, which is much larger than the corresponding actual sampling frequency of 1 Hz and 1/900 Hz.
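The real-time condition can be checked with simple arithmetic. The sketch assumes that the step size s is counted in samples, so that one step lasts s divided by the sampling frequency; the function names and the numbers in the example are hypothetical and are not taken from Table VI.

```python
def max_supported_frequency_hz(step_size, tpr_seconds):
    """Highest sampling frequency at which one round still finishes within one step."""
    return step_size / tpr_seconds


def is_real_time(step_size, sampling_hz, tpr_seconds):
    """True if the time per round is shorter than the duration of one step."""
    step_duration = step_size / sampling_hz   # seconds needed to collect s samples
    return tpr_seconds < step_duration


# Hypothetical numbers: one-sample steps and 0.5 s of computation per round.
print(max_supported_frequency_hz(step_size=1, tpr_seconds=0.5))
print(is_real_time(step_size=1, sampling_hz=1.0, tpr_seconds=0.5))
print(is_real_time(step_size=1, sampling_hz=4.0, tpr_seconds=0.5))
```

The slower the sensor samples relative to this limit, the more headroom each round has, which is why very low sampling rates leave ample room for per-round computation.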
  • the proposed CAD method is run on a large real-world labeled dataset (i.e., IS-1-IS-5) for anomaly detection by increasing the number of sensors.
  • The left side of Figure 8 shows the real-time anomaly detection results. Even though the IS-5 dataset contains more than one thousand sensors, the proposed method still ensures an anomaly detection accuracy of F1_DPA > 85%.
  • the right side of Figure 8 shows the TPR of each round of anomaly detection on IS-1 to IS-5, which shows that the proposed CAD method is scalable to a large number of sensors, and real-time anomaly detection can be achieved as long as the TPR of each round is less than one step time (as in the case of IS-1-IS-5).
  • FIG9 shows the time series of some sensors of the subset SMD 1_6 in the SMD, which covers the abnormal time period of the second abnormality, indicating the sensors affected by the abnormality.
  • the time series of the abnormal sensors e.g., sensors 2-4, 9, 12, and 13
  • the normal sensors e.g., sensors 19-21
  • because the CAD method uses the edges of the time series graphs to represent strong correlations between sensors, abnormal changes in some sensors are immediately reflected as changes in the edges. These sensors can therefore be reported as abnormal earlier and more accurately. In contrast, when an anomaly is in its initial stage, the changes in the time series themselves may be negligible, so other methods based only on specific rules may fail to detect it early.
  • the first row of FIG. 9 shows the time point at which each method detects this anomaly. It can be seen that when an anomaly occurs, the present method can detect the anomaly immediately, so it is suitable for the industrial environment of the real world, and can avoid the propagation of faulty components to other adjacent components over time due to untimely maintenance.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The present application relates to an anomaly signal detection method, comprising: dividing a sensor signal into a plurality of sub-matrixes, wherein the plurality of sub-matrixes respectively correspond to sliding windows relative to the sensor signal on a time sequence; converting each sub-matrix into a corresponding k-nearest neighbor graph, wherein the k-nearest neighbor graph comprises a plurality of vertexes and a plurality of edges connected to the adjacent vertexes; trimming each k-nearest neighbor graph to remove edges having weights smaller than a weight threshold among the plurality of edges; splitting each k-nearest neighbor graph into a plurality of non-intersecting sub-graphs; obtaining the number of outlier variations corresponding to the number of outliers by comparing two continuous k-nearest neighbor graphs, wherein the outliers are respectively placed in different sub-graphs in the two continuous k-nearest neighbor graphs; and according to the number of outlier variations and a first threshold, determining whether there is an anomaly signal.

Description

Method and system for abnormal signal detection

Technical Field

The present application relates to signal detection, and more particularly to a method and a system for abnormal signal detection.

Background

Sensor networks are widely used in industry, for example to monitor the manufacturing process of assembly lines and to record the operating status of wind turbines and other critical infrastructure. In these applications, an important function of a sensor network is to facilitate predictive maintenance by detecting anomalies in sensor-based multivariate time series as early as possible, which is crucial for discovering and recovering from faults before serious damage occurs. Achieving this detection goal can be very challenging.

Summary of the Invention

In one aspect, the present application provides a method for abnormal signal detection.
The method comprises: dividing a sensor signal including a multivariate time series into a plurality of sub-matrices, wherein the sub-matrices respectively correspond to sliding windows over the sensor signal in the time series; converting each sub-matrix into a corresponding k-nearest neighbor graph, wherein each k-nearest neighbor graph comprises a plurality of vertices and a plurality of edges connecting adjacent vertices; pruning each k-nearest neighbor graph to remove the edges whose weights are less than a weight threshold; splitting each k-nearest neighbor graph into a plurality of disjoint sub-graphs; obtaining a number of outlier variations by comparing two consecutive k-nearest neighbor graphs, wherein the number of outlier variations corresponds to the number of outliers, and the outliers are placed in different sub-graphs in the two consecutive k-nearest neighbor graphs; and determining whether the sensor signal contains an abnormal signal according to the number of outlier variations and a first threshold. Preferably, the method further comprises determining that the sensor signal is abnormal when the number of outlier variations is greater than or equal to the first threshold. Preferably, the method further comprises obtaining, according to at least one outlier, the abnormal time and abnormal sensor corresponding to each of the at least one outlier. In one embodiment, the method further comprises obtaining the co-occurrence number and the corresponding co-occurrence ratio of each vertex by comparing two consecutive k-nearest neighbor graphs, and adding each vertex whose co-occurrence ratio is lower than a second threshold to an outlier set.
Preferably, the method further comprises obtaining the number of outlier variations by comparing two consecutive outlier sets. Preferably, the method further comprises obtaining the mean and standard deviation of the number of outlier variations from historical sensor signals. Preferably, the method further comprises obtaining the first threshold according to the standard deviation of the number of outlier variations. Preferably, the method further comprises determining whether the sensor signal contains the abnormal signal according to the first threshold, the number of outlier variations, and the mean of the number of outlier variations. Preferably, the method further comprises initializing the outlier set to an empty set.

In another aspect, the present application provides a system capable of real-time abnormal signal detection. The system includes at least one sensor, wherein the at least one sensor can provide a sensor signal including a multivariate time series; and a computing device in signal connection with the sensors, wherein the computing device is configured to execute the abnormal signal detection method described above.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of a system capable of real-time abnormal signal detection according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a multivariate time series as an example of a sensor signal;
FIG. 3 is a schematic diagram of a series of time series graphs based on the multivariate time series of FIG. 2;
FIG. 4 is a flowchart of an abnormal signal detection method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an abnormal signal detection method according to another embodiment;
FIG. 6 shows the difference between point adjustment and delay-point adjustment;
FIG. 7 is an evaluation on SMD of the abnormal signal detection method of FIG. 5;
FIG. 8 is a scalability test of the abnormal signal detection method of FIG. 5 on data sets IS-1 to IS-5; and
FIG. 9 shows, according to an example, the time series of some sensors with anomalies marked.

Detailed Description

Disclosed herein is a system for abnormal signal detection that can be used in real time to detect abnormal signals in sensor signals, such as multivariate time series. As shown in FIG. 1, in one embodiment, the abnormal signal detection system 100 includes a plurality of sensors 110 and a computing device 120 in signal connection with the sensors 110. The sensors 110 may include sensors with various uses, installation positions, and measurement properties. As an example, the sensors 110 may include the various sensors in a robotic arm, such as laser sensors, sonar sensors, temperature sensors, and pressure sensors. The sensors 110 may send or provide a sensor signal 200 including a multivariate time series to the computing device 120. The abnormal signal detection system 100 can implement an unsupervised abnormal signal detection method based on correlation analysis.
The abnormal signal detection method is deterministic and produces stable output without model training. In addition, the method makes no prior assumptions about the data distribution, and can therefore be easily extended to streaming data. The method identifies anomalies by observing the correlations between sensors (or between sensor data) and mining changes in those correlations, and it obtains both (1) the time period in which the anomaly occurs (abnormal time) and (2) the affected components (abnormal sensors with abnormal readings). Together, these provide valuable information for predictive maintenance.

According to one embodiment, the abnormal signal detection method identifies anomalies by observing the correlations between sensors and mining changes in the correlations between sensor signals. To this end, the multivariate time series is first converted into a series of time series graphs. In one embodiment, the time series graph is a k-nearest neighbor graph. Each vertex in the k-nearest neighbor graph corresponds to a sensor, and the edge between two vertices represents the correlation between the time series of the two sensors over a period of time. The time series graph connects each sensor only to its highly correlated neighbors. To detect anomalies, a small number of affected vertices whose correlations with their neighbors have changed are first found; the abnormal correlation changes of these affected vertices are then tracked across the series of time series graphs. Finally, based on a correlation change analysis with theoretical guarantees, the method detects the abnormal time periods and the affected sensors.

In addition, the method is practical and reliable for early anomaly detection on noisy data.
Once the time series changes abnormally at a specific time point, the correlations between the affected sensors and the other sensors change accordingly, and the edges incident to the affected vertices in the time series graph change immediately. Therefore, as soon as an anomaly occurs, even a small anomaly in a sensor-based multivariate time series is reflected by the affected vertices in the time series graph. However, since multivariate time series usually contain noise, reporting anomalies based on (a few) affected vertices may not be very accurate. Therefore, the correlation changes are tracked over a series of time series graphs, and an anomaly is reported only when the number of vertices with significant changes exceeds a threshold. In this way, the method achieves rapid and reliable anomaly detection on noisy real-world data.
Figure imgf000006_0001
According to one embodiment, FIGS. 2 to 4 show an abnormal signal detection method 300 that can be used to detect anomalies in a sensor signal 200, such as a multivariate time series T. Multivariate time
Figure imgf000007_0001
The operating states in a multivariate time series T can be divided into two types: normal states and abnormal states (anomalies). An anomaly has two parts: abnormal sensors and abnormal time. Abnormal sensors are the sensors that cause the anomaly to be detected and are fundamentally related to its occurrence, while abnormal time corresponds to the sensor readings and usually refers to the time period during which the anomaly occurs. Given a multivariate time series T with n sensors, the goal of anomaly detection is to find l sub-matrices of T that are abnormal, where l is the number of anomalies. Each sub-matrix represents an anomaly Z, where each column represents a single abnormal time point and each row corresponds to the time series of an abnormal sensor that caused the anomaly. A sub-matrix may consist of non-adjacent rows of T, and of adjacent columns so that the abnormal time remains continuous.

Multivariate time series are usually long, whereas anomalies often occur within a very short span, so it is difficult to detect anomalies directly from the multivariate time series. Therefore, the abnormal signal detection method 300 includes, in block 310, first dividing the sensor signal including the multivariate time series into a plurality of sub-matrices 210/220/230. The sub-matrices 210/220/230 respectively correspond to sliding windows w over the sensor signal 200 in the time series. In some embodiments, the sub-matrices 210/220/230 may overlap. In addition to the sliding window w, a step size s is given; preferably, s is smaller than w. A larger sliding window w and a smaller step size s help track changes in correlation at a fine granularity. However, a smaller s/w ratio produces more windows and is therefore more expensive. Preferably, w ∈ [0.01|T|, 0.03|T|] and s ∈ [0.01w, 0.02w] are set.
Figure imgf000007_0002
Each sub-matrix T_r has the same length w. Note that when the multivariate time series cannot be divided evenly, that is, (|T| − w) mod s ≠ 0, then, since s < w ≪ |T|, the last few time points of T are deleted to make it divisible. For ease of explanation, let R = (|T| − w)/s + 1, so that the multivariate time series T is converted into R overlapping sub-matrices {T_1, ..., T_R}. Each sub-matrix T_r (1 ≤ r ≤ R) shows only the time series of each individual sensor.

To monitor the correlations between sensors and detect anomalies through abnormal correlation changes, each sub-matrix 210/220/230 is first converted into a corresponding time series graph. In block 320, each sub-matrix 210/220/230 is converted into a corresponding k-nearest neighbor graph 212/222/232. As shown in FIG. 3, each k-nearest neighbor graph may include a plurality of vertices 214/224/234 and a plurality of edges 216/226/236 connecting adjacent vertices. For example, each sub-matrix T_r is converted into a corresponding k-nearest neighbor graph G_r = (V, E_r), where V is a set of vertices, each corresponding to one sensor, that is, n = |V|, and E_r is a set of edges. Each vertex is connected to its k most highly correlated neighbors according to the Pearson correlation coefficient. In addition, for each sub-matrix T_r, G_r is represented as a weighted graph, where the weight ω(e) of each edge e = (u, v) ∈ E_r is the Pearson correlation coefficient of the time series of the two sensors u and v.

When forming the k-nearest neighbor graph, a larger k may include weaker correlations among the k neighbors, which introduces additional noise and affects the anomaly detection result. Conversely, a smaller k may not capture enough correlation information between sensors. The value of k can therefore be set according to the sensor signal. In some embodiments, some vertices in the k-nearest neighbor graph G_r still have weak correlations with some of their k neighbors. These weakly correlated edges may change frequently over consecutive rounds or sequences, adding noise to the actual correlations between sensors.
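As a rough illustration of the windowing and graph construction described above, the following Python sketch splits a sensor matrix into overlapping windows and links each sensor to its k most correlated sensors. The function names, the NumPy dependency, and the choice to rank neighbors by the signed (rather than absolute) Pearson coefficient are assumptions made for this example, not details taken from the application.

```python
import numpy as np


def sliding_windows(T, w, s):
    """Split an (n_sensors x length) array into overlapping sub-matrices.

    Each window has width w and consecutive windows start s columns apart.
    Trailing columns that do not fill a whole window are dropped.
    """
    _, length = T.shape
    R = (length - w) // s + 1  # number of rounds
    return [T[:, r * s: r * s + w] for r in range(R)]


def knn_graph(window, k):
    """Build a weighted k-nearest neighbor graph for one window.

    Returns a dict {(u, v): weight} with u < v, where the weight is the
    Pearson correlation of the two sensors' time series in the window.
    Assumption: no sensor is constant within the window (otherwise the
    correlation is undefined).
    """
    corr = np.corrcoef(window)  # rows are sensors
    n = corr.shape[0]
    edges = {}
    for u in range(n):
        # Sort candidate neighbors by decreasing correlation, skipping u itself.
        ranked = [int(v) for v in np.argsort(-corr[u]) if v != u]
        for v in ranked[:k]:
            edges[(min(u, v), max(u, v))] = float(corr[u, v])
    return edges
```

With |T| = 8, w = 4 and s = 2, as in the worked example of this description, `sliding_windows` yields R = (8 − 4)/2 + 1 = 3 windows.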
To detect anomalies accurately, preferably, in block 330, each k-nearest neighbor graph is pruned to remove the edges whose weights are less than a weight threshold τ, that is, |ω(e)| < τ. When there is a large number of sensors, edge pruning lets the abnormal signal detection method track only the strong correlations between sensors and perform anomaly detection in real time. In some embodiments, the weight threshold τ affects the sparsity of the time series graph. A larger τ may leave some vertices of the time series graph without enough neighbors, and thus without sufficient correlation information to detect anomalies. A smaller τ may cause the time series graph to include many weakly correlated edges, resulting in additional noise. Preferably, a moderate weight threshold τ between 0.4 and 0.6 is selected. For each sub-matrix T_r, the pruned k-nearest neighbor graph may be called the time series graph G_r. In this way, a long multivariate time series T can be converted into R rounds of time series graphs {G_1, ..., G_R}.

In the time series graph representation, for example, the affected vertex 238 in FIG. 3 represents an abnormal sensor, and the consecutive rounds containing abnormal sensors represent the abnormal time. An anomaly is first defined in terms of the time series graphs.
Figure imgf000009_0001
The anomaly Z is defined by a pair (V_Z, R_Z), that is, Z = (V_Z, R_Z), where V_Z is the set of all vertices affected by the anomaly and R_Z is the set of consecutive rounds (consecutive time series graphs) during which the anomaly occurs. The abnormal signal detection method 300 also includes, in block 340, splitting each k-nearest neighbor graph into a plurality of disjoint sub-graphs. Definition 1 gives the overall representation of an anomaly in the time series graphs. For a given round r, the time series graph G_r can be split into c_r disjoint sub-graphs
Figure imgf000009_0002
loosely connected, then the c_r sub-graphs are called c_r communities. Sensor networks usually exhibit community structure, so the time series graphs constructed from sensor networks usually exhibit community structure as well. With the community structure, changes in the communities of individual vertices can be detected across two consecutive rounds (two consecutive time series graphs) to judge whether the connectivity of those vertices is stable. Based on this idea, specific constraints are added to (V_Z, R_Z) to obtain Definition 2 and Definition 3.

Definition 2 (affected vertices): For a time series graph G_r = (V, E_r) with c_r communities in a given round r ∈ (1, R], the vertex subset V_r ⊆ V is the subset of affected vertices. An affected vertex is a vertex v ∈ V that moves into or out of a community C_{r,c} over two consecutive rounds, that is, V_r = {v ∈ V | (v ∉ C_{r−1,c} and v ∈ C_{r,c}) or (v ∈ C_{r−1,c} and v ∉ C_{r,c})}.

According to Definition 2, if a vertex v ∈ V_r changes over two consecutive rounds, that is, moves into or out of the corresponding community C_{r,c}, this is denoted as a variation. Each vertex v that changes may correspond to an outlier. In block 350, the number of outlier variations is obtained by comparing two consecutive k-nearest neighbor graphs. In one embodiment, the number of outlier variations may be the total number of outliers in the sequence, in other words, the number of affected vertices in the vertex subset V_r. It can be understood that there may be no outliers between two consecutive k-nearest neighbor graphs, in which case the number of outlier variations is zero. When the number of outlier variations is not zero, at least one outlier is included. The at least one outlier may be placed in different sub-graphs in the two consecutive k-nearest neighbor graphs. Therefore, consecutive rounds that contain more variations than other rounds can be obtained, and this period is regarded as the abnormal time.
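The set of affected vertices in Definition 2 can be sketched in Python as follows. This is only an illustrative sketch under a strong assumption that the application does not state: community identifiers are taken to be comparable between round r−1 and round r, so that a vertex "moves" exactly when its identifier changes. The function name and the dictionary representation (vertex to community identifier) are hypothetical.

```python
def affected_vertices(prev_comm, curr_comm):
    """Return the vertices whose community differs between two consecutive rounds.

    prev_comm and curr_comm map each vertex to its community identifier in
    round r-1 and round r. Assumption: identifiers are aligned across rounds.
    """
    return {v for v in curr_comm if prev_comm.get(v) != curr_comm[v]}


# Sensor s4 leaves community 0 and joins community 1 in the later round.
prev_round = {"s1": 0, "s2": 0, "s3": 1, "s4": 0}
curr_round = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
moved = affected_vertices(prev_round, curr_round)
variation_count = len(moved)  # number of affected vertices in this round
```

Because real community detection output is not guaranteed to keep identifiers stable between rounds, a comparison based on shared membership avoids this assumption.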
Definition 3 (abnormal time and abnormal sensors): Given a first threshold, for example an abnormal time threshold between 1 and n, and R rounds of affected vertices {V_1, ..., V_R}, the abnormal time R_Z is
Figure imgf000010_0001
In Definition 3, if a round r has a large number of affected vertices, the round is regarded as abnormal. In some cases, all vertices may be affected. Therefore, the abnormal time threshold can range from 1 to n. Accordingly, in block 360, the abnormal signal detection method 300 also includes determining whether the sensor signal contains an abnormal signal according to the number of outlier variations and the first threshold, for example the abnormal time threshold, and determining that the sensor signal is abnormal when the number of outlier variations is greater than or equal to the abnormal time threshold. The method 300 also includes obtaining, according to the at least one outlier, the abnormal time and abnormal sensor corresponding to each of the at least one outlier. How the abnormal time threshold is set is described later in this disclosure.

FIGS. 2 and 3 show an example of the abnormal signal detection method 300. The example shows a multivariate time series T with n = 4 sensors, where |T| = 8. Compared with the sensor readings at t1 to t6, the readings of sensor s4 at t7 and t8 drop significantly. Therefore, the sub-matrix (20 20) is the anomaly Z, where the abnormal time is t7 to t8 and the abnormal sensor is s4. Here, the sliding window is w = 4, the step size is s = 2, the number of rounds is R = 3, and the weight threshold is τ = 0.8. FIG. 2 shows an example of converting the multivariate time series T into R = 3 rounds of time series graphs {G_1, G_2, G_3}. Given a sub-matrix T_r, when one or more rows (here, the time series of sensor s4) change abnormally, the correlations between that sensor and the other sensors change accordingly, so the edges associated with these vertices in the time series graph G_r, here the edge e = (s2, s4), change immediately. Therefore, the time series graph representation achieves the same detection goal as the multivariate time series representation,
Figure imgf000011_0001
a row of the series T, and R_Z is the time period during which the anomaly occurs. Therefore, the anomaly 238 in the time series graph corresponds to the abnormal sub-matrix in the multivariate time series 200. In the multivariate time series 200, the abnormal sensor is s4 and the abnormal time is t7 to t8, while in the time series graph 232, V_Z = {s4} and R_Z = {r3}.

Referring to FIG. 5, an embodiment of the abnormal signal detection method is shown. The method is divided into three stages. In the first stage 410, the sensor signal is divided and converted into a plurality of corresponding time series graphs, such as k-nearest neighbor graphs, and different communities/sub-graphs are detected from the different time series graphs according to the correlations of the vertices. In the second stage 420, the co-occurrence relationship between each vertex and the other vertices is mined over every two consecutive rounds or sequences, for example by obtaining the co-occurrence number and the corresponding co-occurrence ratio of each vertex, and the outliers are obtained. In the third stage 430, the number of outlier variations is defined and its properties are analyzed. Pseudocode 1 of the anomaly detection process is shown below; it returns the outlier set and the number of outlier variations. Based on the analysis of the number of outlier variations, the method can detect the abnormal time and the abnormal sensors.

Pseudocode 1
Figure imgf000011_0002

2 Apply Louvain on G_r to obtain c_r communities;
3 O_r ← ∅;
4 foreach v ∈ V do
Figure imgf000011_0003

8 return {O_r, n_r};

The first stage 410 is community detection. The abnormal signal detection method may first divide the sensor signal into a plurality of sub-matrices, convert each sub-matrix into a corresponding time series graph, such as a k-nearest neighbor graph, and detect different communities/sub-graphs from the different time series graphs according to the correlations of the vertices. For each round r ∈ [1, R], an algorithm such as Louvain is first used to divide the time series graph G_r into c_r communities, so that vertices that are more correlated with one another than with other vertices are placed in the same community (Pseudocode 1, line 2). Through the communities, the abnormal correlation changes of the vertices can be tracked. Among the many community detection algorithms, the Louvain algorithm is preferably selected because it is effective and requires only O(n log n) time. FIG. 5 shows the first 10 rounds of time series graphs with 12 sensors. After stage 1, each time series graph is divided into three communities, but the communities may differ between rounds. When an anomaly occurs, a sensor may not initially show obvious abnormal behavior, but its correlations are likely to break down, causing the communities to change. Next, the co-occurrence relationships of the vertices are mined from the sensors' communities.

Stage 2: Co-occurrence mining. First, co-occurrence (co-appearance) is defined. If two vertices are in the same community in both of two consecutive rounds/sequences, for example rounds r−1 and r, they are said to co-occur.

Definition 4 (co-occurrence): Given a round r ∈ (1, R] and a vertex v together with its previous community and its current community, the co-occurrence S_r(v, u) is computed as
Figure imgf000012_0001
Definition 4 describes the co-occurrence of two vertices in two consecutive rounds/sequences. By calculating the co-occurrence of each vertex with other vertices, we can monitor whether the community to which all vertices belong changes or modifies in each round. Then, we define the co-appearance number 5r(v)

(根据顶点 v) 为计算其相应顶点 v共现顶点的总数。 定义 5 (共现数) : 共现数》(v) 在每个顶点 v er中 (一轮 r W

Figure imgf000012_0002
在不同轮次中的共现数% (v) 对每个顶点 v &K而言可能不同, 尤其是 存在异常时。 因此, 定义共现率(Ratio of Co-appearance Number) 以获 得通过某一个顶点的共现数, 基于目前为止的所有 r轮次的变化。 定 义 6 (共现率): 对于每个 v &K, v在轮 r W(l,R]与其他顶点的共现率 计算如下: (According to vertex v) is the total number of vertices that co-occur with its corresponding vertex v. Definition 5 (Co-occurrence number): Co-occurrence number (v) is the number of vertices in each vertex v er (a round r W
Figure imgf000012_0002
The co-occurrence % (v) in different rounds may be different for each vertex v & K, especially When there are anomalies. Therefore, define the co-appearance ratio (Ratio of Co-appearance Number) to obtain the co-appearance number through a vertex, based on the changes in all r rounds so far. Definition 6 (Co-appearance ratio): For each v &K, the co-appearance ratio of v with other vertices in round r W(l,R] is calculated as follows:

(3)

Figure imgf000013_0001
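Definitions 4 and 5 can be sketched in Python under one reading of the prose: two vertices co-appear in round r when they share a community in both round r-1 and round r, and the co-appearance number of v counts such vertices. The exact formulas are in the equation images, so this reading, the function names, the dict-based community labelling, and the sensor labels are all illustrative assumptions rather than part of the disclosure.

```python
from typing import Dict, Hashable

# Hypothetical representation: vertex -> community label within one round.
Partition = Dict[Hashable, int]


def co_appearance(prev: Partition, curr: Partition, v: Hashable, u: Hashable) -> int:
    """Return 1 if u shares v's community in both rounds, else 0 (assumed reading of Definition 4)."""
    same_before = prev[v] == prev[u]
    same_now = curr[v] == curr[u]
    return int(same_before and same_now)


def co_appearance_number(prev: Partition, curr: Partition, v: Hashable) -> int:
    """Count the vertices other than v that co-appear with v (assumed reading of Definition 5)."""
    return sum(co_appearance(prev, curr, v, u) for u in curr if u != v)


# Toy example: sensor "c" leaves the community it shared with "a" and "b".
prev_round = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
curr_round = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
numbers = {v: co_appearance_number(prev_round, curr_round, v) for v in curr_round}
print(numbers)  # {'a': 1, 'b': 1, 'c': 0, 'd': 1, 'e': 1}
```

Labels are only compared within a round, so the sketch does not require community labels to be stable from one round to the next.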
If a vertex $v$ always co-appears with the same set of vertices in the same community over the $r$ rounds, it has a high $RC_{v,r}$ and can be considered normal. If, however, $v$ suddenly becomes a member of another community in a particular round $r$, $v$ should be regarded as an outlier, because that community may contain only a few (or none) of the other vertices that co-appear with $v$. [Text: image imgf000013_0002] The ratio is compared with a threshold, for example the outlier threshold $\theta$, to determine whether $v$ is an outlier in round $r$. [Text: image imgf000013_0003] The outlier set $O_r$ of a round $r \in (1, R]$ is defined as follows:

$O_r = \{\, v \in V \mid RC_{v,r} < \theta \,\}$ (4)

The overall steps of the co-appearance mining stage are as follows. The outlier set $O_r$ is first initialized to the empty set. [Text: image imgf000013_0004] For each vertex $v \in V$, the co-appearance number $S_r(v)$ and the ratio $RC_{v,r}$ are computed (Pseudocode 1, line 5). Each vertex $v$ whose ratio $RC_{v,r}$ is below the outlier threshold $\theta$ is added to the outlier set (Pseudocode 1, line 6). A higher outlier threshold $\theta$ excludes many vertices that should be regarded as outliers, resulting in low detection accuracy. Preferably, for most datasets, $\theta$ is set to about 0.3. Note that because multivariate time series usually contain noise, the presence of (a few) outliers cannot, in some cases, reliably establish that a round is abnormal. The next stage therefore analyzes the variation of the outliers in order to mitigate the effect of noise.

Stage 3: Variation analysis. A vertex in the time series graph can be in one of three states over two consecutive rounds (sequences): (1) normal in both rounds, (2) an outlier in both rounds, or (3) in transition between the normal state and the outlier state. For example, in Figure 5, vertices b and c are normal in both round 9 and round 10, whereas vertex a becomes an outlier in round 9 and returns to normal in round 10. To improve the accuracy and reliability of determining the anomaly and the anomaly time, attention is paid to the transition state, and the number of outlier variations $n_r$ is defined.

Definition 8 (number of outlier variations): For a round $r \in (1, R]$, the number of outlier variations $n_r$ is the number of vertices in the transition state, that is,

$n_r = |\{\, v \mid (v \notin O_{r-1} \text{ and } v \in O_r) \text{ or } (v \in O_{r-1} \text{ and } v \notin O_r) \,\}|$

According to Definition 8, $n_r$ is obtained by comparing the outlier sets of two consecutive rounds or sequences, for example the outlier sets $O_{r-1}$ and $O_r$ corresponding to rounds $r-1$ and $r$. [Text: image imgf000014_0001] Referring to Definition 3, whether the sensor signal contains an anomaly signal is determined from the number of outlier variations and a first threshold, for example the anomaly-time threshold. As an example, if the number of outlier variations is greater than or equal to the first threshold, the sensor signal is determined to be abnormal. [Text: image imgf000014_0002]
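The outlier set of equation (4) and the number of outlier variations of Definition 8 can be illustrated with a short Python sketch. The co-appearance ratios are taken as given inputs here because equation (3) is not reproduced in the text; the function names and the sample values are hypothetical.

```python
from typing import Dict, Hashable, Set


def outlier_set(ratios: Dict[Hashable, float], theta: float = 0.3) -> Set[Hashable]:
    """Vertices whose co-appearance ratio is below the outlier threshold theta (equation (4))."""
    return {v for v, rc in ratios.items() if rc < theta}


def outlier_variations(prev_outliers: Set[Hashable], curr_outliers: Set[Hashable]) -> int:
    """Number of vertices in the transition state between two consecutive rounds (Definition 8)."""
    # A vertex is in transition when it belongs to exactly one of the two
    # outlier sets, which is the symmetric difference of the sets.
    return len(prev_outliers ^ curr_outliers)


# Hypothetical ratios for five sensors in rounds r-1 and r.
rc_prev = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7, "e": 0.2}
rc_curr = {"a": 0.2, "b": 0.8, "c": 0.1, "d": 0.7, "e": 0.6}
o_prev = outlier_set(rc_prev)
o_curr = outlier_set(rc_curr)
n_r = outlier_variations(o_prev, o_curr)
print(sorted(o_prev), sorted(o_curr), n_r)  # ['c', 'e'] ['a', 'c'] 2
```

Sensor "c" stays an outlier in both rounds and therefore does not count; only "a" (normal to outlier) and "e" (outlier to normal) are in transition.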
Anomaly detection. Given a round $r$, once the number of outlier variations $n_r$ has been obtained, it must be determined whether the current round $r$ is abnormal. In one embodiment, Chebyshev's inequality gives:

$P(|n_r - \mu| \ge h\sigma) \le \dfrac{1}{h^2}$

where $h$ is a constant, and $\mu$ and $\sigma$ are the mean and the standard deviation of the number of outlier variations $n_r$, respectively. As a preferred example, $h = 3$ is used by default in order to detect anomalies precisely. The anomaly-time threshold is therefore $3\sigma$ and can be obtained from the standard deviation $\sigma$ of the number of outlier variations $n_r$. [Text: image imgf000014_0004] That is, if $|n_r - \mu| \ge 3\sigma$, round $r$ is regarded as abnormal. Whether round $r$ contains an anomaly signal can thus be determined from the anomaly-time threshold, the number of outlier variations $n_r$ of that round, and the mean $\mu$ of the number of outlier variations.

In one embodiment, the anomaly detection part of the method can be described by Pseudocode 2 below. $Z$ stores the anomalies, and $V_Z$ and $R_Z$ store, for each anomaly, the affected sensors and the anomalous rounds, respectively (Pseudocode 2, line 2). In one embodiment, for each round $r \in [1, R]$, the method of Pseudocode 1 is first called to perform outlier detection and to obtain the outlier set $O_r$ and the number of outlier variations $n_r$ (Pseudocode 2, line 6). $N$ stores the series of values $n_r$. Note that $\mu$ and $\sigma$ are the two parameters that determine the anomaly time. If $|n_r - \mu| \ge 3\sigma$, the round is judged abnormal, so $r$ is added to $R_Z$ and $V_Z$ is updated with $O_r$ (Pseudocode 2, line 8). Otherwise, if the pending anomaly $(V_Z, R_Z)$ is not empty, the current anomaly has ended: $(V_Z, R_Z)$ is added to $Z$, and $V_Z$ and $R_Z$ are reinitialized to empty sets (Pseudocode 2, line 11). Then $n_r$ is added to $N$ (Pseudocode 2, line 12), $\mu$ and $\sigma$ are updated accordingly (Pseudocode 2, line 13), and all detected anomalies $Z$ are returned as output (Pseudocode 2, line 14).

Using the inequality $|n_r - \mu| \ge 3\sigma$ makes the method fairly reliable on noisy sensor signals. Because the noise of a given machine usually stays at the same level, using $|n_r - \mu|$ reduces the effect of noise, and $3\sigma$ separates actual anomalies from noisy data. In some embodiments, to avoid the deviation in $\mu$ and $\sigma$ that may arise when they are obtained from only a few values of $n_r$, especially when $r$ is small, a warm-up process that uses a historical multivariate time series is introduced (Pseudocode 2, lines 16-23) to obtain enough values of $n_r$. The input of the warm-up process is a historical sensor signal from the same data source as $T$, for example a historical multivariate time series $T_{his}$. Through a process similar to anomaly detection, the value of each round is obtained [Text: image imgf000015_0001], giving an accurate mean $\mu$ and standard deviation $\sigma$ of the number of outlier variations for the subsequent anomaly detection.

Pseudocode 2

Input: MTS T, historical MTS T_his, sliding window w, step size s, outlier threshold θ
1 N, μ, σ ← WarmUp(T_his, w, [illegible]);
2 Z ← V_Z ← R_Z ← ∅; O_0 ← ∅;
3 R = (|T| − w)/s + 1;
4 for r = 1 to R do
[Remaining lines of Pseudocode 2: image imgf000016_0001]
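The main loop of Pseudocode 2, as described in the prose, can be sketched in Python. Several points are assumptions made only for this illustration: the per-round outlier sets and variation counts are passed in precomputed, the population standard deviation is used, V_Z is updated by taking the union with O_r, and an anomaly that is still open after the last round is also reported. The function and variable names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Hashable, List, Sequence, Set, Tuple

# (affected sensors V_Z, anomalous rounds R_Z)
Anomaly = Tuple[Set[Hashable], List[int]]


def detect_anomalies(
    rounds: Sequence[Tuple[Set[Hashable], int]],  # per round: (outlier set O_r, variations n_r)
    warmup: Sequence[int],                        # n_r values from a warm-up run on historical data
    h: float = 3.0,
) -> List[Anomaly]:
    history = list(warmup)                        # N
    mu, sigma = mean(history), pstdev(history)
    anomalies: List[Anomaly] = []                 # Z
    sensors: Set[Hashable] = set()                # V_Z
    abnormal_rounds: List[int] = []               # R_Z
    for r, (outliers, n_r) in enumerate(rounds, start=1):
        if abs(n_r - mu) >= h * sigma:
            # Round r is abnormal: record it and the sensors involved.
            abnormal_rounds.append(r)
            sensors |= outliers
        elif abnormal_rounds:
            # A normal round after abnormal ones closes the current anomaly.
            anomalies.append((sensors, abnormal_rounds))
            sensors, abnormal_rounds = set(), []
        history.append(n_r)
        mu, sigma = mean(history), pstdev(history)
    if abnormal_rounds:                           # assumption: report an anomaly still open at the end
        anomalies.append((sensors, abnormal_rounds))
    return anomalies


warmup_counts = [1, 0, 1, 2, 1, 0, 1, 1]
stream = [(set(), 1), ({"s2", "s3"}, 6), (set(), 1)]
result = detect_anomalies(stream, warmup_counts)
print([(sorted(v), r) for v, r in result])  # [(['s2', 's3'], [2])]
```

Because every n_r, including the abnormal ones, is added to the history, a large n_r raises σ for the following rounds; the sketch keeps that behavior as described. A warm-up history with zero variance would make every round satisfy the inequality, which is one reason a sufficiently long warm-up matters.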
The anomaly signal detection method disclosed herein can detect anomalies in sensor-based multivariate time series. Notably, the method can be extended to streaming data. For example, when the data of a new round arrives, anomalies can be detected in real time with the present method. Because detection is performed round by round, it can proceed at the same time as new data is collected. As long as the running time of each round is shorter than the time span of one step $s$, anomalies can be detected in real time. In addition, $\mu$ and $\sigma$ are obtained during detection by maintaining the series of values $n_r$. Under real-time detection, the number of values $n_r$ grows as more data streams in, so $\mu$ and $\sigma$ can be estimated more precisely and anomalies can be detected more accurately. Real-time detection can therefore be applied in various industrial scenarios, such as server nodes, commercial servers, water supply networks, power sensors, and industrial assembly lines. Moreover, the method can discover anomalies at an early stage. The main reason is that when an anomaly occurs, the sensors may not show obviously abnormal behavior, but the correlations between the affected sensors are likely to change, and the method captures this change.

Time series anomaly detection is usually evaluated with Point Adjustment (PA). Referring to Figure 6, given the anomaly labels of a time series, an anomaly is considered detected as soon as a single time point within its range is detected. The adjusted result is then compared directly with the label values to compute F1. Because PA does not consider the time point at which the anomaly is first detected, it ignores how early within the anomaly the detection occurs. To address this problem, a new evaluation scheme, Delay-Point Adjustment (DPA), is provided to reflect the time at which an anomaly is detected. Compared with PA, for each anomaly DPA adjusts only the false negatives that follow the first true positive. In Figure 6, for the same anomaly detection method $M_1$, DPA adjusts two false negatives [time points illegible] to true positives, while the false negatives in the range [time points illegible] remain unchanged. Hence $F1_{DPA} = 72.7\%$. DPA is a stricter evaluation than PA, that is, $F1_{DPA} \le F1_{PA}$. In addition, two relative measures, Ahead and Miss, are proposed for comparing the detection results of two methods.
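The difference between PA and DPA described above can be illustrated with a small Python sketch. It is a minimal reading of the prose (PA marks the whole labelled anomaly as detected once any point in it is detected; DPA marks only the points from the first detection onward), with hypothetical function names and toy data; it does not reproduce Figure 6.

```python
from typing import List


def adjust(labels: List[int], preds: List[int], delayed: bool) -> List[int]:
    """Apply PA (delayed=False) or DPA (delayed=True) to binary predictions."""
    adjusted = list(preds)
    i, n = 0, len(labels)
    while i < n:
        if labels[i] != 1:
            i += 1
            continue
        j = i
        while j < n and labels[j] == 1:  # [i, j) is one labelled anomaly segment
            j += 1
        hits = [t for t in range(i, j) if preds[t] == 1]
        if hits:
            # PA fills the whole segment; DPA fills only from the first true positive.
            start = hits[0] if delayed else i
            for t in range(start, j):
                adjusted[t] = 1
        i = j
    return adjusted


def f1(labels: List[int], preds: List[int]) -> float:
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


labels = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
preds = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # first anomaly detected late, second one missed
f1_pa = f1(labels, adjust(labels, preds, delayed=False))
f1_dpa = f1(labels, adjust(labels, preds, delayed=True))
print(f1_pa, f1_dpa)  # 0.8 0.5
```

A detector that fires late inside an anomaly keeps its full PA score but loses DPA score for every point before the first detection, which is why the DPA score never exceeds the PA score.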
[Equation: image imgf000017_0001]
Here the first quantity is the number of anomalies that anomaly detection method $M_1$ detects ahead of (Ahead) anomaly detection method $M_2$, while $l_{miss}$ is the number of anomalies that $M_1$ does not detect (Miss) but $M_2$ detects. [Text: image imgf000017_0002] No anomaly is missed. Therefore Ahead = 50% and Miss = 0. In the ideal case, Ahead = 100% and Miss = 0.

Experiments evaluate the present anomaly signal detection method (CAD for short) against other methods. These methods are run on each time series, and the average of the anomaly scores is taken as the output. The other methods comprise three data-mining-based methods (LOF, ECOD, and IForest), two deep learning methods (USAD and RCoders), and four univariate methods (S2G, SAND, its online version SAND*, and NormA). Sensor datasets with label information are used: PSM, SWaT, and SMD, together with two private datasets from industry, IS-1 and IS-2. The SMD dataset consists of 28 different subsets; each subset is evaluated directly, without a warm-up process. Table II summarizes the statistics of the eight datasets. Table III shows the anomaly-time detection results on the PSM, SWaT, IS-1, and IS-2 sensor datasets, where the standard deviation (SD) is obtained from $F1_{PA}$ and $F1_{DPA}$ over 10 repetitions. Because the results of CAD, LOF, ECOD, and S2G do not change across repetitions, their standard deviation is 0. As Table III shows, CAD outperforms all baselines on IS-1 and achieves the best average rank on $F1_{PA}$ and $F1_{DPA}$. Table IV shows the results on the SMD dataset. CAD outperforms USAD, RCoders, S2G, SAND, SAND*, and NormA on at least 17 of the 28 subsets and is comparable to LOF, ECOD, and IForest.

Table II. Statistics of the eight sensor datasets
[Table II: image imgf000018_0001]

[Table III header: image imgf000019_0001]

Method    PSM F1_PA      PSM F1_DPA     SWaT F1_PA     SWaT F1_DPA
CAD       95.0           89.7           83.8           78.2
LOF       76.2           [illegible]    80.1           74.1
ECOD      87.1           80.5           84.0           77.6
IForest   91.4 ± 0.9     85.7 ± 1.7     84.9 ± 0.5     76.9 ± 0.4
USAD      [illegible]    [illegible]    80.3 ± 0.3     [illegible] ± 2.4
RCoders   94.2 ± 1.1     90.6 ± 1.2     81.9 ± 1.0     76.3 ± 1.2
S2G       93.6           85.0           84.9           [illegible]
SAND      85.7 ± 0.6     73.5 ± 2.4     80.5 ± 4.2     [illegible] ± 3.2
SAND*     [illegible]    72.8 ± 1.3     [illegible] ± 0.0     77.8 ± 0.8
NormA     85.7 ± [illegible]     76.7 ± 0.6     80.4 ± 0.6     75.2 ± 1.3

[Table III header, continued: image imgf000019_0002]

Method    IS-1 F1_PA     IS-1 F1_DPA    IS-2 F1_PA     IS-2 F1_DPA    Average rank
CAD       100.0          97.4           98.2           91.1           1.6
LOF       100.0          83.8           99.9           70.3           6.1
ECOD      99.9           [illegible]    71.1           64.0           4.3
IForest   99.1 ± 1.3     94.4 ± 2.4     71.3 ± [illegible]     67.3 ± 1.9     3.9
USAD      99.8 ± 0.0     84.0 ± 0.0     82.6 ± [illegible]     60.2 ± 0.0     6.3
RCoders   99.4 ± 0.8     89.8 ± 4.9     64.6 ± 2.3     62.4 ± 1.9     4.6
S2G       95.3           85.9           72.7           67.9           [illegible]
SAND      78.6 ± 4.3     67.2 ± 2.3     68.7 ± 2.7     50.5 ± 0.7     8.6
SAND*     93.2 ± [illegible]     87.5 ± 2.6     [illegible] ± 1.2     [illegible] ± 1.0     7.6
NormA     95.3 ± 1.5     82.2 ± [illegible]     55.5 ± [illegible]     52.3 ± 0.1     7.8

Table IV. Anomaly-time and anomalous-sensor detection on SMD

(OP is the number of subsets, out of the 28, on which CAD outperforms the listed method)

[Table IV header: image imgf000019_0003]

Method    OP    F1_PA          OP    F1_DPA         OP (sensor)
CAD             80.7 ± 14.4          71.7 ± 15.9
LOF       16    75.0 ± 24.4    13    65.5 ± 26.0    /
ECOD      14    83.0 ± 14.5    12    73.2 ± 16.8    28
IForest   7     86.5 ± 13.7    8     76.5 ± 18.4    /
USAD      20    75.3 ± 18.4    17    [illegible] ± 20.7    /
RCoders   21    75.3 ± 17.4    19    63.6 ± 20.2    28
S2G       25    64.0 ± 21.4    23    52.5 ± 21.4    /
SAND      26    45.3 ± 30.3    26    33.5 ± 25.7    /
SAND*     26    51.1 ± 28.8    24    38.7 ± 25.8    /
NormA     25    55.3 ± 29.3    26    42.6 ± 26.5    /

When the number of sensors increases, for example from IS-1 to IS-2, the results of some methods (such as LOF and USAD) drop sharply under DPA. This is partly because multivariate time series with more sensors usually contain more complex relationships and more noise. The performance of the four univariate methods degrades markedly under both PA and DPA, which indicates that simply extending these methods to multivariate time series may be ineffective. Because most baseline methods find outliers based on certain assumptions, they can become unstable and more sensitive to noise. By contrast, CAD detects anomalies from abnormal changes in the correlations between sensors, using a series of time series graphs, which reduces the effect of noise and makes it more stable.

Table V shows the Ahead and Miss evaluation. Compared with each of the other methods, CAD reaches an Ahead of at least 50% (on the PSM and SWaT sensor datasets), while its Miss remains below 50% (except against SAND). The results show that, among the detected anomalies, CAD detects at least half earlier than the other methods, and that, among the anomalies CAD misses, the other methods detect at most half. Referring to Figure 7, CAD outperforms the other methods across all 28 subsets of the SMD dataset. Compared with the other methods, CAD reaches Ahead > 50% on most SMD subsets, and more than half of the subsets have Miss < 50%, which verifies that the method finds anomaly times early and with almost no omissions.

Table V. Evaluation of Ahead and Miss

CAD vs.         PSM             SWaT            IS-1            IS-2
other methods   Ah      Ms      Ah      Ms      Ah      Ms      Ah      Ms
LOF             60.0    40.4    100.0   0.0     100.0   [illegible]    77.8    [illegible]
ECOD            73.3    19.3    100.0   35.3    100.0   0.0     66.7    0.0
IForest         50.0    40.4    100.0   37.9    100.0   0.0     36.7    0.0
USAD            66.4    17.5    100.0   9.7     100.0   0.0     77.8    0.0
RCoders         64.7    31.8    100.0   13.5    100.0   0.0     43.3    0.0
S2G             73.3    12.3    100.0   31.2    100.0   0.0     100.0   0.0
SAND            96.0    8.2     100.0   59.1    100.0   [missing]      100.0   [illegible]
SAND*           86.7    12.8    100.0   4.7     100.0   0.0     100.0   0.0
NormA           82.0    19.5    100.0   15.7    100.0   0.0     100.0   0.0

CAD not only detects the anomaly time but also reveals the anomalous sensors. The last column of Table IV shows the anomalous-sensor detection results on SMD. CAD outperforms ECOD and RCoders on 28 of the 28 subsets; these are the only two baseline methods that can identify anomalous sensors. On the IS-1 and IS-2 sensor datasets, CAD also outperforms ECOD and RCoders, with an $F1_{sensor}$ above 60%. At least half of the normal sensors can therefore be ruled out, which greatly eases the burden on operators who perform predictive maintenance.

Table VI shows the detection time of CAD in each round, that is, the Time Per Round (TPR). To detect anomalies in real time, the TPR must be less than the time corresponding to the step size $s$, that is, $TPR < s / f_{req}$, where $f_{req}$ is the sampling frequency of the sensors. Hence $f_{req} < s / TPR$ Hz. According to the results in Table VI, CAD supports real-time anomaly detection: the maximum frequencies it supports on SWaT and IS-2 are about 43 kHz and 331 Hz, respectively, far above the actual sampling frequencies of 1 Hz and 1/900 Hz.
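The real-time condition above reduces to simple arithmetic, sketched below in Python. The function names and the example numbers are hypothetical and are not taken from Table VI; the step size is assumed to be a number of samples and the TPR is assumed to be expressed in seconds.

```python
def max_realtime_frequency_hz(step: int, tpr_seconds: float) -> float:
    """Upper bound on the sampling frequency for which one round finishes within one step (f < s / TPR)."""
    if step <= 0 or tpr_seconds <= 0:
        raise ValueError("step and tpr_seconds must be positive")
    return step / tpr_seconds


def is_realtime(step: int, tpr_seconds: float, sampling_hz: float) -> bool:
    """True when the sensor sampling frequency stays strictly below the bound."""
    return sampling_hz < max_realtime_frequency_hz(step, tpr_seconds)


# Hypothetical setting: a step of 4 samples and 0.5 s of processing per round.
bound = max_realtime_frequency_hz(step=4, tpr_seconds=0.5)
print(bound, is_realtime(4, 0.5, 1.0), is_realtime(4, 0.5, 10.0))  # 8.0 True False
```

With these numbers a sensor sampled at 1 Hz leaves ample headroom, whereas a 10 Hz sensor would deliver a new step of data before the previous round has finished.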
[Table VI: image imgf000021_0001; the caption, column headers, and first rows are not reproduced]

IForest   1.5    11.3    0.5    4.3
USAD      0.6    3.1     0.2    [illegible]    0.2
RCoders   [illegible]    11.0    [illegible]    47.0    [illegible]
S2G       6.4    34.3    [illegible]    2.8    1.9
SAND      12.7   20.3    3.8    5.0    4.6
SAND*     7.9    21.7    1.7    [illegible]    3.2
NormA     8.6    105.5   1.3    4.6    3.2

Referring to Figure 8, CAD is run for anomaly detection on five large real-world labeled datasets (IS-1 to IS-5) with an increasing number of sensors. The left side of Figure 8 shows the anomaly-time detection results. Even though the IS-5 dataset contains more than one thousand sensors, the method still achieves $F1_{DPA} > 85\%$. The right side of Figure 8 shows the time per round (TPR) of anomaly detection on IS-1 to IS-5, which indicates that CAD scales to a large number of sensors: real-time anomaly detection is possible as long as the TPR is less than the time of one step (as is the case for IS-1 to IS-5).

Figure 9 shows the time series of some of the sensors in the SMD subset SMD 1_6, with the anomalous time period of this anomaly marked to indicate the sensors affected by it. When the anomaly occurs, the time series of the anomalous sensors (for example, sensors 2-4, 9, 12, and 13) differ (greatly) from their earlier behavior, whereas the normal sensors (for example, sensors 19-21) keep their original trend and are unaffected by the anomaly. Sensors 2 to 4 initially behave like sensors 9, 12, and 13 and are therefore strongly correlated with them. When the anomaly occurs, some of the time series change, and sensors 2 to 4 are no longer correlated with sensors 9, 12, and 13. Because CAD uses the edges of the time series graph to represent strong correlations between sensors, abnormal changes in some sensors are reflected immediately in changes to the edges, so the anomalous sensors can be reported early and accurately. In the initial stage of such an anomaly, however, the changes in the time series may be slight, so other methods that rely only on specific rules may fail to detect it early. In addition, the first row of Figure 9 shows the time point at which each method detects this anomaly. CAD detects the anomaly as soon as it occurs, which makes it suitable for real-world industrial environments and helps prevent a faulty component from propagating its fault to adjacent components over time because of late maintenance.

As used herein, unless expressly stated otherwise, the singular forms "a" and "an" may be interpreted as including the plural "one or more".

The present disclosure has been presented above for purposes of illustration and description, but it is not intended to be exhaustive or limiting. Many modifications and variations will be apparent to those of ordinary skill in the art. The example embodiments were chosen and described in order to explain the principles and the practical application, and to enable those of ordinary skill in the art to understand the various embodiments of the present disclosure with various modifications suited to the particular use contemplated.

Accordingly, although illustrative example embodiments are described herein with reference to the accompanying drawings, it should be understood that the description is not limiting, and that those skilled in the art may make various other changes and modifications without departing from the scope, inventive concept, and technical solutions of the present disclosure.

Claims

Claims

1. A method for detecting an abnormal signal, characterized in that the method comprises: dividing a sensor signal comprising a multivariate time series into a plurality of sub-matrices, wherein the sub-matrices respectively correspond to sliding windows over the sensor signal along the time series; converting each of the sub-matrices into a corresponding k-nearest neighbor graph, wherein each k-nearest neighbor graph comprises a plurality of vertices and a plurality of edges connecting adjacent vertices; pruning each k-nearest neighbor graph to remove, from the plurality of edges, those edges whose weight is less than a weight threshold; splitting each k-nearest neighbor graph into a plurality of disjoint subgraphs; obtaining an outlier variation number by comparing two consecutive k-nearest neighbor graphs, wherein the outlier variation number corresponds to the number of outliers, the outliers being placed in different subgraphs in the two consecutive k-nearest neighbor graphs; and determining, according to the outlier variation number and a first threshold, whether the sensor signal contains an abnormal signal.

2. The method according to claim 1, characterized in that the method further comprises determining that the sensor signal is abnormal in response to the outlier variation number being greater than or equal to the first threshold.

3. The method according to claim 2, characterized in that the method further comprises obtaining, based on at least one outlier, an abnormal time and an abnormal sensor corresponding to each of the at least one outlier.

4. The method according to claim 1, characterized in that the method further comprises: obtaining a co-occurrence count and a corresponding co-occurrence rate for each vertex by comparing two consecutive k-nearest neighbor graphs; and adding each vertex whose co-occurrence rate is lower than a second threshold to an outlier set.

5. The method according to claim 4, characterized in that the method further comprises obtaining the outlier variation number by comparing two consecutive outlier sets.

6. The method according to claim 5, characterized in that the method further comprises obtaining, from historical sensor signals, a mean and a standard deviation of the outlier variation number.

7. The method according to claim 6, characterized in that the method further comprises obtaining the first threshold according to the standard deviation of the outlier variation number.

8. The method according to claim 7, characterized in that the method further comprises determining whether the sensor signal contains the abnormal signal according to the first threshold, the outlier variation number, and the mean of the outlier variation number.

9. The method according to claim 4, characterized in that the method further comprises initializing the outlier set as an empty set.

10. A system capable of detecting abnormal signals in real time, characterized in that the system comprises: at least one sensor, wherein the at least one sensor is capable of providing a sensor signal comprising a multivariate time series; and a computing device in signal connection with a plurality of sensors, wherein the computing device is configured to execute the method of claims 1 to 9.
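The steps recited in claims 1 to 9 can be illustrated with a minimal Python sketch. It is not the applicants' implementation, and several choices are assumptions made only for illustration, because the claims leave them open: each vertex is taken to be one sensor, each edge weight is the absolute Pearson correlation between two sensors within a window, the disjoint subgraphs are the connected components of the pruned graph, the co-occurrence rate is the shared-peer count divided by the size of the union of peers, the outlier variation number is the size of the symmetric difference of consecutive outlier sets, and the first threshold is a multiple `n_sigma` of the historical standard deviation. All function and parameter names are hypothetical.

```python
import numpy as np

def sliding_windows(signal, width, stride):
    """Split a (time, sensors) array into one sub-matrix per sliding window."""
    return [signal[s:s + width] for s in range(0, len(signal) - width + 1, stride)]

def knn_graph(window, k, weight_threshold):
    """Pruned k-NN graph over sensors; returns undirected edges (i, j), i < j.
    Assumption: edge weight = absolute Pearson correlation inside the window."""
    corr = np.nan_to_num(np.abs(np.corrcoef(window, rowvar=False)))
    np.fill_diagonal(corr, -1.0)  # a vertex is never its own neighbor
    edges = set()
    for i in range(corr.shape[0]):
        for j in np.argsort(corr[i])[::-1][:k]:  # k most correlated sensors
            j = int(j)
            if j != i and corr[i, j] >= weight_threshold:  # pruning step
                edges.add((min(i, j), max(i, j)))
    return edges

def component_labels(n, edges):
    """Label each vertex with its connected component (assumed 'subgraph')."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    for i, j in edges:
        parent[find(i)] = find(j)
    return [find(v) for v in range(n)]

def outlier_set(prev_labels, curr_labels, rate_threshold):
    """Vertices whose co-occurrence rate across two graphs is below the threshold."""
    n = len(curr_labels)
    outliers = set()
    for v in range(n):
        prev_peers = {u for u in range(n) if prev_labels[u] == prev_labels[v]}
        curr_peers = {u for u in range(n) if curr_labels[u] == curr_labels[v]}
        count = len(prev_peers & curr_peers)        # co-occurrence count
        rate = count / len(prev_peers | curr_peers)  # assumed normalization
        if rate < rate_threshold:
            outliers.add(v)
    return outliers

def outlier_variations(signal, width, stride, k, weight_threshold, rate_threshold):
    """Outlier variation number for each pair of consecutive windows."""
    n = signal.shape[1]
    prev_labels, prev_out, counts = None, set(), []  # outlier set starts empty
    for window in sliding_windows(signal, width, stride):
        labels = component_labels(n, knn_graph(window, k, weight_threshold))
        if prev_labels is not None:
            out = outlier_set(prev_labels, labels, rate_threshold)
            counts.append(len(out ^ prev_out))  # assumed: symmetric difference
            prev_out = out
        prev_labels = labels
    return counts

def is_anomalous(count, history, n_sigma=3.0):
    """Flag a window when its count exceeds the historical mean by n_sigma stds."""
    mean, std = np.mean(history), np.std(history)
    return bool(count - mean >= n_sigma * std)
```

In this sketch, `history` would hold outlier variation numbers computed from historical sensor signals assumed to be normal, and the members of each outlier set indicate which sensors changed subgraph, which is one way the abnormal sensor of claim 3 could be read off.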
PCT/SG2024/050191 2023-03-30 2024-03-28 Anomaly signal detection method and system WO2024205504A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310327059.0 2023-03-30
CN202310327059.0A CN118734208A (en) 2023-03-30 2023-03-30 Abnormal signal detection method and system

Publications (1)

Publication Number Publication Date
WO2024205504A1 true WO2024205504A1 (en) 2024-10-03

Family

ID=92844438

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2024/050191 WO2024205504A1 (en) 2023-03-30 2024-03-28 Anomaly signal detection method and system

Country Status (2)

Country Link
CN (1) CN118734208A (en)
WO (1) WO2024205504A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1926879A (en) * 2004-03-01 2007-03-07 皇家飞利浦电子股份有限公司 A video signal encoder, a video signal processor, a video signal distribution system and methods of operation therefor
CN112214499A (en) * 2020-12-03 2021-01-12 腾讯科技(深圳)有限公司 Graph data processing method and device, computer equipment and storage medium
US20230014068A1 (en) * 2019-12-20 2023-01-19 Brightsign Technology Limited Method and system for gesture recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAUTAMAKI V., KARKKAINEN I., FRANTI P.: "Outlier detection using k-nearest neighbour graph", PATTERN RECOGNITION, 2004. ICPR 2004. PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON CAMBRIDGE, UK AUG. 23-26, 2004, PISCATAWAY, NJ, USA,IEEE, LOS ALAMITOS, CA, USA, vol. 3, 23 August 2004 (2004-08-23) - 26 August 2004 (2004-08-26), US, pages 430 - 433, XP010724690, ISBN: 978-0-7695-2128-2, DOI: 10.1109/ICPR.2004.1334558 *
PAUL BONIOL; THEMIS PALPANAS: "Series2Graph: Graph-based Subsequence Anomaly Detection for Time Series", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 25 July 2022 (2022-07-25), US, XP091279850, DOI: 10.14778/3407790.3407792 *

Also Published As

Publication number Publication date
CN118734208A (en) 2024-10-01

Similar Documents

Publication Publication Date Title
US7716011B2 (en) Strategies for identifying anomalies in time-series data
JP2019061565A (en) Diagnostic method and diagnostic device
CN111931834B (en) Anomaly detection method, equipment and storage medium for aluminum profile extrusion process flow data based on isolated forest algorithm
JP2018535501A (en) Periodic analysis of heterogeneous logs
KR102440335B1 (en) Anomaly detection and management method and device therefor
JP6915693B2 (en) System analysis method, system analyzer, and program
CN118378155B (en) A fault detection method and system for intelligent middleware
CN110113368B (en) A network behavior anomaly detection method based on sub-trajectory pattern
FR3074316A1 (en) METHOD AND DEVICE FOR MONITORING A DATA GENERATION PROCESS OF A METRIC FOR PREDICTING ANOMALIES
CN117194994A (en) Anomaly detection method based on unbalanced countermeasure training convolution self-encoder
CN118174788A (en) Fault detection method, device and equipment of optical fiber wiring cabinet and storage medium
Ang et al. A stitch in time saves nine: Enabling early anomaly detection with correlation analysis
CN116248532A (en) Network abnormality detection method, network abnormality detection device and electronic equipment
WO2024205504A1 (en) Anomaly signal detection method and system
Feng et al. RelSen: An optimization-based framework for simultaneously sensor reliability monitoring and data cleaning
JP4112584B2 (en) Abnormal traffic detection method and apparatus
Chen et al. Frequency-domain spectrum discrepancy-based fast anomaly detection for IIoT sensor time-series signals
CN118673373A (en) Fault prediction algorithm for on-orbit motion mechanism
CN112231341A (en) Cloud platform anomaly detection method and system based on multi-data channel analysis
CN115836306A (en) Data amount adequacy judging device, data amount adequacy judging method, data amount adequacy judging program, learning model generating system, learning completed learning model generating method, and completed learning learning model generating program
CN114026513B (en) Fault prediction diagnosis device and method
Yasaei et al. IoT-GRAF: IoT Graph Learning-Based Anomaly and Intrusion Detection Through Multi-Modal Data Fusion
Tekeoglu et al. Unsupervised time-series based anomaly detection in ics/scada networks
US20180157718A1 (en) Episode mining device, method and non-transitory computer readable medium of the same
Shi et al. Constraint-based learning for sensor failure detection and adaptation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24781422

Country of ref document: EP

Kind code of ref document: A1