CN110619345B

CN110619345B - Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity

Info

Publication number: CN110619345B
Application number: CN201910661200.4A
Authority: CN
Inventors: 梁宗保; 柴洁; 唐朝霞; 唐玉
Original assignee: Chongqing Jiaotong University
Current assignee: Chongqing Jiaotong University
Priority date: 2019-07-22
Filing date: 2019-07-22
Publication date: 2022-12-06
Anticipated expiration: 2039-07-22
Also published as: CN110619345A

Abstract

The invention discloses a comprehensive label-credibility verification method for the validity of cable-stayed bridge monitoring data. Building on grey correlation degree theory, the method comprehensively verifies the confidence of data-validity labels by means of several outlier detection techniques. First, Deng's correlation degree is used to correlate the daily data of each bridge monitoring point in pairs. Then the 3σ algorithm, the LOF algorithm and the isolation forest algorithm are each applied to detect outliers in the labeled data segments obtained in the first step, and the credibility of each segment's validity label is verified according to the probability with which outliers occur across the detectors. With outlier detection that includes the isolation forest algorithm, the collected data can be diagnosed before evaluation so that hidden invalid data is eliminated or repaired into valid data. This provides a scientific data basis for the subsequent safety evaluation, and the method has practical application prospects.

Description

Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity

Technical field

The invention relates to the field of label-credibility verification for cable-stayed bridge monitoring data, and in particular to a comprehensive label-credibility verification method for the validity of cable-stayed bridge monitoring data.

Background

Cable-stayed bridges are widely used because of their light structure, attractive appearance, large span and good economy, so their monitoring and safety evaluation have long been a research focus. All of this rests on being able to collect accurate and valid data. Owing to various factors, however, the collected data is often not true and valid; it may be riddled with errors and omissions or be completely invalid.

A data segment may fail because consecutive data points fail, or because several scattered single-point failures invalidate the whole segment. Common failure signatures include data segment drift, abnormal changes in data frequency, sudden jumps in a data segment, and constant data values. These types of invalid data can be judged visually from a plot. There is, however, a kind of invalid data segment that looks no different from many normal segments, shows no obvious characteristic fluctuation, and is difficult to judge directly. Such data accounts for most invalid data and has a great impact on the safety evaluation of cable-stayed bridges. The collected data must therefore be diagnosed before evaluation, so that this hidden invalid data is eliminated or repaired into valid data, providing a scientific data basis for the subsequent safety evaluation.

At present, artificial intelligence methods centered on deep learning are among the main recognized approaches to diagnosing data validity, and the credibility of the data-validity labels is an important basis for such methods. This application therefore proposes a comprehensive label-credibility verification method for the validity of cable-stayed bridge monitoring data.

Summary of the invention

The technical problem to be solved by the invention is to overcome existing defects by providing a comprehensive label-credibility verification method for the validity of cable-stayed bridge monitoring data, which can effectively solve the problems described in the background.

To achieve the above object, the invention provides the following technical solution: a comprehensive label-credibility verification method for the validity of cable-stayed bridge monitoring data, comprising the following steps:

Step 1: Use Deng's correlation degree to correlate the daily data of each bridge monitoring point in pairs:

(1) Standardize the data collected at each measuring point;

(2) Compute the pairwise correlation of the measuring-point data with Deng's correlation degree algorithm;

(3) Automatically label the validity of each data segment according to the position of the measuring point and the calculated correlation degree;

① Label the measuring-point data: if r_oi ∈ [0.8, 1], the association is strong and the segment is labeled as a valid data segment; if r_oi ∈ [0.6, 0.8), the association is moderate and the segment is labeled as a generally valid data segment; if r_oi ∈ [0, 0.6), the association is very weak and the segment is labeled as an invalid data segment;

② If the data sequence of the reference measuring point is strongly associated with all the data sequences of the surrounding measuring points and of the same section, the reference sequence is valid; compute the mean correlation degree between the reference sequence and those sequences, and label the segment according to the standard intervals;

③ If the data sequence of the reference measuring point is strongly associated with most of the data sequences of the surrounding measuring points and of the same section and weakly associated with a few, remove the correlation degrees of the weakly associated sequences, compute the mean correlation degree, and label the segment according to the standard intervals. In addition, each weakly associated sequence is itself suspect: take it as the reference sequence, compare its correlation degree with the data sequences of its adjacent monitoring points and of the same section, and analyze it again according to ②, ③ and ④;

④ If the data sequence of the reference measuring point is weakly associated with most of the data sequences of the surrounding measuring points and of the same section, it can essentially be judged an invalid data segment; the mean correlation degree is still computed to determine the label of the segment;

Step 2: Apply the 3σ algorithm, the LOF algorithm and the isolation forest algorithm separately to detect outliers in the labeled data segments obtained in Step 1, and verify the credibility of each segment's validity label according to the probability with which outliers occur across the detectors;

(1) Data point detection based on the 3σ algorithm;

① First, standardize the data so that it satisfies a normal distribution;

② Assign the data parameters obtained by the algorithm to the single-point data and adapt the formula accordingly; ξ > 0 indicates that the point is abnormal, where ξ = |x − μ| − 3σ, μ is the mean of the data sequence, σ is the standard deviation of the data sequence, and x is a data point;

(2) Data point detection based on the LOF algorithm: the local outlier factor (LOF) algorithm is a density-based algorithm. C1 is one cluster, C2 is another cluster, and O1 and O2 are isolated points far from both clusters, also called outliers;

The local outlier factor LOF_k(p) is the average of the ratios of the local reachability density of each neighborhood point of p to the local reachability density of p itself. If this ratio is greater than 1, the density of p is lower than that of its neighborhood points and p may be an outlier, where:

lrd_k(p) is the local reachability density, the reciprocal of the average reachability distance from the points in the k-th neighborhood of p to p. It represents the density of points around p: the higher the density, the more likely p lies inside a cluster; the lower the density, the farther p is from the cluster and the more likely it is an outlier. reach-dist_k(p, o) is the k-th reachability distance from point o to point p, and N_k(p) is the k-distance neighborhood of p, that is, all points within the k-th distance of p;

(3) Outlier detection with the isolation forest: the isolation forest splits the space in two with a hyperplane and keeps splitting until every region holds a single data point. A dense cluster needs many cuts to be separated, whereas a sparse cluster, or a point that lies far from any cluster, is cut off quickly. The isolation forest consists of t isolation trees, each of which is a binary tree;

After the t isolation trees are obtained, training of the isolation forest is complete and the trees can be used to evaluate the test data. Each data point a traverses every tree, and the layer of each tree in which a finally lands is recorded, giving the average height of the point. A threshold is set, and data below the threshold is abnormal. Here a threshold of −0.05 is applied to the average height value of each data point: if the average height value is less than −0.05, the data point is abnormal;

Step 3: Comprehensive verification of label credibility. Outliers are detected in the data segments of each measuring point with the statistical algorithm, the LOF algorithm and the isolation forest algorithm (ISO) separately, and the validity of the label is verified as follows;

(1) Compute the outlier probabilities P1, P2 and P3 of the three algorithms and their average probability P;

(2) If the average probability P is greater than 10%, the data segment is considered abnormal with high probability and is labeled as an invalid data segment; if P is less than or equal to 5%, the segment is not considered abnormal and is labeled as a valid data segment; if P is greater than 5% and less than or equal to 10%, the segment is labeled as a generally valid data segment;

Compared with the prior art, the beneficial effects of the invention are as follows:

Building on grey correlation degree theory, the invention provides a comprehensive method for verifying the confidence of data-validity labels that combines several outlier detection techniques. Deng's correlation degree is used to correlate the daily data of each bridge monitoring point in pairs; the 3σ algorithm, the LOF algorithm and the isolation forest algorithm are then applied separately to detect outliers in the labeled data segments obtained in Step 1; and the credibility of each segment's validity label is verified according to the probability with which outliers occur across the detectors. With outlier detection that includes the Isolation Forest algorithm, the collected data can be diagnosed before evaluation so that hidden invalid data is eliminated or repaired into valid data, providing a scientific data basis for the subsequent safety evaluation and making the method convenient to use.

Description of drawings

Fig. 1 is a flow chart of the isolation forest algorithm used in the invention;

Fig. 2 is a schematic diagram of distance-based outliers in the invention;

Detailed description

To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

As shown in Figs. 1-2, the comprehensive label-credibility verification method for the validity of cable-stayed bridge monitoring data proceeds as follows. Step 1: use Deng's correlation degree to correlate the daily data of each bridge monitoring point in pairs:

Standardize the data collected at each measuring point;

Compute the pairwise correlation of the measuring-point data with Deng's correlation degree algorithm;

The correlation coefficient is then calculated by the following formula:

ρ is the resolution coefficient, which balances the judgment of the correlation coefficient and generally lies in the interval (0, 1).

The correlation degree is calculated by the following formula:

r_oi reflects the similarity of variation between the sequences.
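For reference, Deng's correlation coefficient and correlation degree are conventionally written as follows. Only ρ, r_oi, X₀, X_i and the component count n appear in the text; the symbol γ_{0i}(k) for the point-wise coefficient and the component index k are assumptions made here, and the patent's own notation may differ.

```latex
\begin{align}
\gamma_{0i}(k) &=
  \frac{\min_i \min_k \lvert x_0(k) - x_i(k) \rvert
        + \rho \, \max_i \max_k \lvert x_0(k) - x_i(k) \rvert}
       {\lvert x_0(k) - x_i(k) \rvert
        + \rho \, \max_i \max_k \lvert x_0(k) - x_i(k) \rvert} \\
r_{oi} &= \frac{1}{n} \sum_{k=1}^{n} \gamma_{0i}(k)
\end{align}
```

With ρ in (0, 1), every coefficient lies in (0, 1], so r_oi also lies in (0, 1], which is consistent with the labeling intervals [0, 0.6), [0.6, 0.8) and [0.8, 1].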

② In practical application, X₀ is the data sequence of a given measuring point, X = {X_i | i = 1, 2, ..., m} is the set of data sequences of the other measuring points in the same section, m is the length of the data sequence, and each sequence has n components.
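A minimal sketch of the pairwise calculation, assuming the conventional form of Deng's correlation degree. The function names, the z-score standardization and the default ρ = 0.5 are illustrative assumptions; the text only says that the data is standardized and that ρ lies in (0, 1).

```python
from typing import List, Sequence


def standardize(seq: Sequence[float]) -> List[float]:
    """Z-score standardization (assumed; the source does not name the method)."""
    n = len(seq)
    mean = sum(seq) / n
    std = (sum((v - mean) ** 2 for v in seq) / n) ** 0.5
    if std == 0.0:
        return [0.0] * n  # a constant sequence carries no shape information
    return [(v - mean) / std for v in seq]


def deng_correlation_degrees(
    reference: Sequence[float],
    others: Sequence[Sequence[float]],
    rho: float = 0.5,  # resolution coefficient, assumed default
) -> List[float]:
    """Return r_oi for each comparison sequence X_i against the reference X_0."""
    x0 = standardize(reference)
    # Absolute differences |x_0(k) - x_i(k)| for every sequence i and component k.
    diffs = [[abs(a - b) for a, b in zip(x0, standardize(xi))] for xi in others]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if d_max == 0.0:
        return [1.0] * len(others)  # every sequence coincides with the reference
    degrees = []
    for row in diffs:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))  # mean coefficient = r_oi
    return degrees


if __name__ == "__main__":
    x0 = [1.0, 2.0, 3.0, 4.0, 5.0]
    neighbours = [[2.0, 4.0, 6.0, 8.0, 10.0], [5.0, 1.0, 4.0, 2.0, 3.0]]
    print(deng_correlation_degrees(x0, neighbours))
```

The first neighbour has the same shape as the reference after standardization, so its degree is close to 1, while the shuffled second neighbour scores noticeably lower.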

The validity of each data segment is labeled automatically according to the position of the measuring point and the calculated correlation degree.

① Label the measuring-point data: if r_oi ∈ [0.8, 1], the association is strong and the segment is labeled as a valid data segment; if r_oi ∈ [0.6, 0.8), the association is moderate and the segment is labeled as a generally valid data segment; if r_oi ∈ [0, 0.6), the association is very weak and the segment is labeled as an invalid data segment.

② If the data sequence of the reference measuring point is strongly associated with all the data sequences of the surrounding measuring points and of the same section, the reference sequence is valid; compute the mean correlation degree between the reference sequence and those sequences, and label the segment according to the standard intervals.

③ If the data sequence of the reference measuring point is strongly associated with most of the data sequences of the surrounding measuring points and of the same section and weakly associated with a few, remove the correlation degrees of the weakly associated sequences, compute the mean correlation degree, and label the segment according to the standard intervals. In addition, each weakly associated sequence is itself suspect: take it as the reference sequence, compare its correlation degree with the data sequences of its adjacent monitoring points and of the same section, and analyze it again according to ②, ③ and ④.

④ If the data sequence of the reference measuring point is weakly associated with most of the data sequences of the surrounding measuring points and of the same section, it can essentially be judged an invalid data segment; the mean correlation degree is still computed to determine the label of the segment.
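Rules ① to ④ can be sketched as a small labeling routine. The interval bounds come from the text; the function names, the label strings and the reading of "most" as "more than half" are assumptions made for illustration.

```python
from typing import List, Tuple

STRONG, GENERAL = 0.8, 0.6  # interval bounds stated in rule 1


def label_from_degree(r: float) -> str:
    """Rule 1: map a correlation degree to a validity label."""
    if r >= STRONG:
        return "valid"
    if r >= GENERAL:
        return "generally valid"
    return "invalid"


def label_reference_segment(degrees: List[float]) -> Tuple[str, float, List[int]]:
    """Apply rules 2-4 to the correlation degrees of one reference sequence.

    Returns (label, mean_degree, suspects). `suspects` holds the indices of
    weakly associated neighbours that rule 3 says to re-examine as references.
    """
    weak = [i for i, r in enumerate(degrees) if r < GENERAL]
    mostly_weak = len(weak) > len(degrees) / 2  # assumed reading of "most"
    if mostly_weak or not weak:
        # Rule 2 (no weak neighbours) and rule 4 (mostly weak): average everything.
        kept, suspects = degrees, []
    else:
        # Rule 3: drop the few weak degrees before averaging, flag them as suspect.
        kept = [r for r in degrees if r >= GENERAL]
        suspects = weak
    mean = sum(kept) / len(kept)
    return label_from_degree(mean), mean, suspects


if __name__ == "__main__":
    print(label_reference_segment([0.90, 0.85, 0.30]))
```

In the usage line the single weak neighbour (index 2) is excluded from the mean and returned as a suspect, so the reference segment is still labeled valid.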

Step 2: Apply the 3σ algorithm, the LOF algorithm and the isolation forest algorithm separately to detect outliers in the labeled data segments obtained in Step 1, and verify the credibility of each segment's validity label according to the probability with which outliers occur across the detectors.

Data point detection based on the 3σ algorithm.

② Assign the data parameters obtained by the algorithm to the single-point data and adapt the formula accordingly; ξ > 0 indicates that the point is abnormal:

ξ = |x − μ| − 3σ (3)

where μ is the mean of the data sequence, σ is the standard deviation of the data sequence, and x is a data point.
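Formula (3) translates directly into code. The sketch below assumes the population standard deviation and a hypothetical function name; the text does not say whether the population or sample form is intended.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def three_sigma_outliers(data: Sequence[float]) -> List[int]:
    """Return the indices of points with xi = |x - mu| - 3*sigma > 0."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (assumed choice)
    xi = [abs(x - mu) - 3.0 * sigma for x in data]
    return [i for i, value in enumerate(xi) if value > 0]


if __name__ == "__main__":
    segment = [10.0 + 0.1 * (i % 5) for i in range(30)] + [100.0]
    print(three_sigma_outliers(segment))
```

A single extreme point can only exceed 3σ when the segment is long enough, because the point itself inflates σ; short segments may therefore report no outliers at all.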

Data point detection based on the LOF algorithm: the local outlier factor (LOF) algorithm is a density-based algorithm. C1 is one cluster, C2 is another cluster, and O1 and O2 are isolated points far from both clusters, also called outliers.

The local outlier factor LOF_k(p) is the average of the ratios of the local reachability density of each neighborhood point of p to the local reachability density of p itself. If this ratio is greater than 1, the density of p is lower than that of its neighborhood points and p may be an outlier, where:

lrd_k(p) is the local reachability density, the reciprocal of the average reachability distance from the points in the k-th neighborhood of p to p. It represents the density of points around p: the higher the density, the more likely p lies inside a cluster; the lower the density, the farther p is from the cluster and the more likely it is an outlier. reach-dist_k(p, o) is the k-th reachability distance from point o to point p, and N_k(p) is the k-distance neighborhood of p, that is, all points within the k-th distance of p.
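The definitions above can be sketched in plain Python for one-dimensional monitoring values. This assumes the standard LOF formulation with reach-dist_k(p, o) = max(k-distance(o), d(p, o)) and absolute difference as the distance; the function name and the choice k = 3 are illustrative, not taken from the source.

```python
from typing import List


def lof_scores(values: List[float], k: int = 3) -> List[float]:
    """Local outlier factor of every value; scores well above 1 suggest outliers."""
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(values)")

    def dist(i: int, j: int) -> float:
        return abs(values[i] - values[j])

    k_dist: List[float] = []
    neigh: List[List[int]] = []
    for p in range(n):
        ordered = sorted((dist(p, o), o) for o in range(n) if o != p)
        kd = ordered[k - 1][0]  # k-distance of p
        k_dist.append(kd)
        neigh.append([o for d, o in ordered if d <= kd])  # N_k(p), ties included

    def reach_dist(p: int, o: int) -> float:
        return max(k_dist[o], dist(p, o))

    inf = float("inf")
    lrd: List[float] = []
    for p in range(n):
        mean_reach = sum(reach_dist(p, o) for o in neigh[p]) / len(neigh[p])
        lrd.append(inf if mean_reach == 0 else 1.0 / mean_reach)

    scores: List[float] = []
    for p in range(n):
        if lrd[p] == inf:
            scores.append(1.0)  # p sits among exact duplicates: as dense as its neighbours
            continue
        ratios = [lrd[o] / lrd[p] for o in neigh[p]]
        scores.append(sum(ratios) / len(ratios))
    return scores


if __name__ == "__main__":
    data = [1.0, 1.1, 1.2, 0.9, 1.05, 0.95, 1.15, 10.0]
    for value, score in zip(data, lof_scores(data)):
        print(f"{value:6.2f}  LOF = {score:.2f}")
```

Points inside the cluster around 1.0 receive scores near 1, while the isolated value 10.0 receives a much larger score.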

Outlier detection with the isolation forest: the isolation forest splits the space in two with a hyperplane and keeps splitting until every region holds a single data point. A dense cluster needs many cuts to be separated, whereas a sparse cluster, or a point that lies far from any cluster, is cut off quickly. The isolation forest consists of t isolation trees, each of which is a binary tree.

After the t isolation trees are obtained, training of the isolation forest is complete and the trees can be used to evaluate the test data. Each data point a traverses every tree, and the layer of each tree in which a finally lands is recorded, giving the average height of the point. A threshold is set, and data below the threshold is abnormal. Here a threshold of −0.05 is applied to the average height value of each data point: if the average height value is less than −0.05, the data point is abnormal.
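A compact sketch of an isolation forest over one-dimensional values, following the usual formulation with anomaly score s = 2^(−E[h]/c(ψ)). The source does not define the scale on which its −0.05 threshold applies; this sketch assumes, purely for illustration, a shifted score 0.5 − s so that negative values indicate outliers. All function names and parameter defaults are hypothetical.

```python
import math
import random
from typing import List, Tuple


def _c(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(values: List[float], depth: int, limit: int, rng: random.Random) -> Tuple:
    lo, hi = min(values), max(values)
    if depth >= limit or len(values) <= 1 or lo == hi:
        return ("leaf", len(values))
    split = rng.uniform(lo, hi)  # random cut between the current extremes
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    if not left or not right:
        return ("leaf", len(values))
    return ("node", split, _build(left, depth + 1, limit, rng),
            _build(right, depth + 1, limit, rng))


def _path_length(x: float, node: Tuple) -> float:
    depth = 0
    while node[0] == "node":
        node = node[2] if x < node[1] else node[3]
        depth += 1
    return depth + _c(node[1])  # credit for the unsplit points left in the leaf


def isolation_scores(values: List[float], n_trees: int = 100,
                     sample_size: int = 256, seed: int = 0) -> List[float]:
    """Anomaly score in (0, 1]; shallow average depth gives a score close to 1."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    rng = random.Random(seed)
    psi = min(sample_size, len(values))
    limit = math.ceil(math.log2(psi))
    trees = [_build(rng.sample(values, psi), 0, limit, rng) for _ in range(n_trees)]
    scores = []
    for x in values:
        mean_depth = sum(_path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / _c(psi)))
    return scores


def flag_outliers(scores: List[float], threshold: float = -0.05) -> List[int]:
    """Flag points whose shifted score 0.5 - s falls below the threshold (assumed scale)."""
    return [i for i, s in enumerate(scores) if 0.5 - s < threshold]


if __name__ == "__main__":
    data = [1.0, 1.1, 1.2, 0.9, 1.05, 0.95, 1.15, 10.0]
    s = isolation_scores(data, n_trees=50, seed=1)
    print(flag_outliers(s))
```

Because the isolated value is separated after very few cuts, its average depth is small and its score is the highest in the segment; the exact numbers depend on the random seed.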

Step 3: Comprehensive verification of label credibility. Outliers are detected in the data segments of each measuring point with the statistical algorithm, the LOF algorithm and the isolation forest algorithm (ISO) separately, and the validity of the label is verified as follows.

(1) Compute the outlier probabilities P1, P2 and P3 of the three algorithms and their average probability P.

(2) If the average probability P is greater than 10%, the data segment is considered abnormal with high probability and is labeled as an invalid data segment; if P is less than or equal to 5%, the segment is not considered abnormal and is labeled as a valid data segment; if P is greater than 5% and less than or equal to 10%, the segment is labeled as a generally valid data segment.
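The decision rule in Step 3 can be sketched as follows. The 5% and 10% thresholds come from the text; treating each outlier probability as the fraction of points a detector flags within the segment is an assumption, as are the function names and label strings.

```python
from typing import Iterable, Tuple


def outlier_probability(outlier_indices: Iterable[int], segment_length: int) -> float:
    """Fraction of points in the segment flagged by one detector (assumed definition)."""
    return len(set(outlier_indices)) / segment_length


def verify_label(p1: float, p2: float, p3: float) -> Tuple[str, float]:
    """Average P1, P2, P3 and map the mean P to a validity label."""
    p = (p1 + p2 + p3) / 3.0
    if p > 0.10:
        label = "invalid"
    elif p <= 0.05:
        label = "valid"
    else:
        label = "generally valid"
    return label, p


if __name__ == "__main__":
    p_sigma = outlier_probability([3, 7], 100)           # 3-sigma flags 2 of 100 points
    p_lof = outlier_probability([3, 7, 42, 58], 100)     # LOF flags 4 of 100 points
    p_iso = outlier_probability([3, 7, 42, 58, 60, 61], 100)
    print(verify_label(p_sigma, p_lof, p_iso))
```

One plausible way to use the result is to compare it with the Step 1 label for the same segment: agreement raises confidence in the label, while disagreement marks the segment for review.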

Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A comprehensive label-credibility verification method for the validity of cable-stayed bridge monitoring data, comprising the following steps:

step one: performing pairwise correlation on the daily data of each monitoring point of the bridge by using the Deng correlation degree calculation method:

standardizing the data collected at each measuring point;

computing the pairwise correlation of the data of each measuring point by using the Deng correlation degree algorithm;

the correlation coefficient is calculated by the following formula:

ρ is the resolution coefficient, which balances the judgment of the correlation coefficient and generally lies in the interval (0, 1);

the correlation degree is calculated by the following formula:

r_oi reflects the similarity of variation between the sequences;

② in practical application, X₀ is the data sequence of a given measuring point, X = {X_i | i = 1, 2, ..., m} is the set of data sequences of the other measuring points of the same section, m is the length of the data sequence, and each sequence has n components;

① labeling the measuring-point data: if r_oi ∈ [0.8, 1], the association is strong and the label is a valid data segment; if r_oi ∈ [0.6, 0.8), the association is moderate and the label is a generally valid data segment; if r_oi ∈ [0, 0.6), the association is weak and the label is an invalid data segment;

② if the data sequence of the reference measuring point is strongly associated with all the data sequences of the surrounding measuring points and of the same section, the reference sequence is valid; the mean correlation degree between the reference data sequence and those data sequences is calculated, and labeling is carried out according to the standard intervals;

③ if the data sequence of the reference measuring point is strongly associated with most of the data sequences of the surrounding measuring points and of the same section and weakly associated with a few, the correlation degrees of the weakly associated data sequences are removed, the mean correlation degree is calculated, and labeling is carried out according to the standard intervals; in addition, each weakly associated data sequence is itself suspect and is taken as the reference data sequence, its correlation degree is compared with the data sequences of its adjacent monitoring points and of the same section, and it is analyzed again according to ②, ③ and ④;

④ if the data sequence of the reference measuring point is weakly associated with most of the data sequences of the surrounding measuring points and of the same section, it is determined to be an invalid data segment, and the mean correlation degree is still calculated to determine the label of the data segment;

step two: applying the 3σ algorithm, the LOF algorithm and the isolation forest algorithm separately to detect outliers in the labeled data segments obtained in step one, and verifying the credibility of the validity labels of the data segments according to the probability with which outliers occur across the algorithms;

(1) data point detection based on the 3σ algorithm;

① first, standardizing the data so that the data satisfies a normal distribution;

② assigning the data parameters obtained by the algorithm to the single-point data and adapting the formula accordingly, where ξ > 0 indicates that the point is abnormal:

ξ = |x − μ| − 3σ

wherein μ is the mean of the data sequence, σ is the standard deviation of the data sequence, and x is a data point;

(2) data point detection based on the LOF algorithm: the local outlier factor (LOF) algorithm is a density-based algorithm, wherein C1 is one cluster, C2 is another cluster, and O1 and O2 are isolated points far from both clusters, also called outliers;

(3) the local outlier factor LOF_k(p) represents the average of the ratios of the local reachability density of each neighborhood point of point p to the local reachability density of point p; if the ratio is greater than 1, the density of p is lower than that of its neighborhood points, and p is considered a possible outlier;

wherein:

lrd_k(p) is the local reachability density, the reciprocal of the average reachability distance from the points in the k-th neighborhood of point p to point p, and represents the density of the points around point p; the higher the density, the more likely p lies inside a cluster; the lower the density, the farther p is from the cluster and the more likely it is an outlier; reach-dist_k(p, o) is the k-th reachability distance from point o to point p, and N_k(p) is the k-distance neighborhood of point p, namely all points within the k-th distance of point p;

(4) outlier detection with the isolation forest: the isolation forest splits the space in two with a hyperplane and keeps splitting until every region holds a single data point; a dense cluster needs many cuts to be separated, whereas a sparse cluster, or a data point far from any cluster, is cut off quickly; the isolation forest consists of t isolation trees, and each isolation tree is a binary tree;

after the t isolation trees are obtained, training of the isolation forest is complete, and the generated isolation trees are used to evaluate the test data; each data point a traverses every tree, the layer of each tree in which a finally lands is calculated, and the average height of the data point is obtained; a threshold is set, and data below the threshold is abnormal; here a threshold of −0.05 is applied to the average height value of each data point, and if the average height value is less than −0.05, the data point is abnormal;

step three: comprehensively verifying the credibility of the label by detecting outliers in the data segments of each measuring point with the statistical algorithm, the LOF algorithm and the isolation forest algorithm (ISO) separately, and verifying the validity of the label according to the following steps;

(1) calculating the outlier probabilities P1, P2 and P3 of the three algorithms and their average probability P;

(2) if the average probability P is greater than 10%, the data segment is considered abnormal data and the label is an invalid data segment; if the average probability P is less than or equal to 5%, the data segment is not considered abnormal data and the label is a valid data segment; if the average probability P is greater than 5% and less than or equal to 10%, the label is a generally valid data segment.