
CN110619345B - Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity - Google Patents


Info

Publication number
CN110619345B
Authority
CN
China
Prior art keywords
data
point
algorithm
abnormal
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910661200.4A
Other languages
Chinese (zh)
Other versions
CN110619345A (en)
Inventor
梁宗保
柴洁
唐朝霞
唐玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Jiaotong University
Original Assignee
Chongqing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Jiaotong University
Priority to CN201910661200.4A
Publication of CN110619345A
Application granted
Publication of CN110619345B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08 Construction

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Educational Administration (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a comprehensive label credibility verification method for the validity of cable-stayed bridge monitoring data. Building on grey relational theory, the method comprehensively verifies the confidence of data validity labels by combining several outlier detection algorithms. Deng's correlation degree is used to correlate, pairwise, the daily data of each monitoring point of the bridge and to label the data segments. The 3σ algorithm, the LOF algorithm, and the isolation forest algorithm are then each applied to detect abnormal points in the labeled data segments obtained in the first step, and the confidence of each segment's validity label is verified according to the probability with which abnormal points occur across the algorithms. With outlier detection that includes the isolation forest algorithm, the collected data can be diagnosed before evaluation, so that hidden invalid data is eliminated or repaired into valid data. This provides a sound data basis for the subsequent safety evaluation, and the method has some prospect of practical use.

Description

Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity

Technical field

The invention relates to the field of label credibility verification for cable-stayed bridge monitoring data, and in particular to a comprehensive verification method of label credibility for the validity of cable-stayed bridge monitoring data.

Background

Cable-stayed bridges are widely used because of their light structure, attractive appearance, long spans, and good economy. Monitoring and safety evaluation of cable-stayed bridges has therefore long been a research focus. All of this depends on being able to collect accurate and valid data. Owing to various factors, however, the collected data is often not reliable, and may be riddled with errors and omissions or be completely invalid.

A data segment may be invalid because consecutive data points fail, or because several scattered single-point failures invalidate the whole segment. Common failure signatures include data segment drift, abnormal changes in data frequency, sudden jumps, and constant data values. These types of invalid data can be identified visually from a plot. There is, however, another kind of invalid data segment that looks no different from many normal segments, shows no obvious characteristic fluctuation, and is difficult to judge directly. Such data makes up most of the invalid data and strongly affects the safety evaluation of cable-stayed bridges. The collected data must therefore be diagnosed before evaluation, so that this hidden invalid data is eliminated or repaired into valid data, providing a sound data basis for the subsequent safety evaluation.

At present, artificial intelligence methods centered on deep learning are among the main recognized methods for diagnosing data validity, and the credibility of the data validity labels is an important basis for such methods. This application therefore proposes a comprehensive verification method of label credibility for the validity of cable-stayed bridge monitoring data.

Summary of the invention

The technical problem to be solved by the invention is to overcome the existing shortcomings and provide a comprehensive verification method of label credibility for the validity of cable-stayed bridge monitoring data, which effectively solves the problems described in the background.

To achieve this, the invention provides the following technical solution: a comprehensive verification method of label credibility for the validity of cable-stayed bridge monitoring data, comprising the following steps:

Step 1: Use Deng's correlation degree to correlate, pairwise, the daily data of each monitoring point of the bridge:

(1) Standardize the data collected at each measuring point;

(2) Compute the pairwise correlation of the data of each measuring point with Deng's correlation degree algorithm;

(3) Automatically label the validity of each data segment according to the position of the measuring point and the computed correlation degree;

① Label the measuring point data: if r_oi ∈ [0.8, 1], the correlation is strong and the segment is labeled a valid data segment; if r_oi ∈ [0.6, 0.8), the correlation is general and the segment is labeled a generally valid data segment; if r_oi ∈ [0, 0.6), the correlation is weak and the segment is labeled an invalid data segment;

② If the data sequence of the reference measuring point is strongly correlated with all data sequences of the surrounding measuring points and of the same section, the reference sequence is valid. Compute the mean correlation degree between the reference sequence and those sequences, and label it according to the standard intervals;

③ If the data sequence of the reference measuring point is strongly correlated with most data sequences of the surrounding measuring points and of the same section but weakly correlated with a few, discard the correlation degrees of the weakly correlated sequences, compute the mean correlation degree, and label according to the standard intervals. In addition, each weakly correlated sequence is itself suspect: take it as the reference sequence, compare its correlation degree with the sequences of its adjacent monitoring points and of the same section, and analyze it again according to ②, ③, and ④;

④ If the data sequence of the reference measuring point is weakly correlated with most data sequences of the surrounding measuring points and of the same section, it can essentially be judged an invalid data segment. The mean correlation degree is still computed to determine the label of the data segment;

Step 2: Apply the 3σ algorithm, the LOF algorithm, and the isolation forest algorithm separately to detect abnormal points in the labeled data segments obtained in Step 1, and verify the credibility of each segment's validity label according to the probability with which abnormal points occur across the algorithms;

(1) Data point detection based on the 3σ algorithm;

① First, standardize the data so that it follows a normal distribution;

② Assign the data parameters obtained by the algorithm to each single data point and adapt the formula accordingly: with ξ = |x - μ| - 3σ, ξ > 0 indicates that the point is abnormal, where μ is the mean of the data sequence, σ is the standard deviation of the data sequence, and x is a data point;

(2) Data point detection based on the LOF algorithm: the local outlier factor (LOF) algorithm is a density-based algorithm. C1 is one cluster, C2 is another cluster, and O1 and O2 are isolated points far from both clusters, also called outliers;

(3) The local outlier factor of point p is the average of the ratios of the local reachability density of each neighbor of p to the local reachability density of p itself. If this ratio is greater than 1, the density of p is lower than that of its neighbors, and p may be an outlier; where:

Figure GDA0003681432670000031

Here the local reachability density of point p is the reciprocal of the average reachability distance from the points in the k-distance neighborhood of p to p, and it represents the density of points around p. The higher the density, the more likely p lies inside a cluster; the lower the density, the farther p is from the cluster and the more likely it is an outlier. The definition uses the k-th reachability distance from point o to point p and the k-distance neighborhood of p, that is, all points within the k-th distance of p;

(4) Outlier detection with the isolation forest: the isolation forest splits the space in two with a hyperplane and keeps splitting until every data point is isolated. A dense cluster needs many cuts before its points are isolated, whereas a low-density cluster, or a point far from any cluster, is isolated after only a few cuts. The isolation forest consists of t isolation trees, each of which is a binary tree;

Once the t isolation trees have been built, training is complete and the trees can be used to evaluate the test data. Each data point a is passed through every tree, and the layer on which a finally lands in each tree is recorded, giving the average height of the data point. A threshold is set, and data below the threshold is abnormal. Here the threshold on the average height value of each data point is taken as -0.05: if the average height value is below -0.05, the data point is abnormal;

Step 3: Comprehensive verification of label credibility. The statistical (3σ) algorithm, the LOF algorithm, and the isolation forest algorithm (ISO) are each used to detect abnormal points in the data segments of every measuring point, and the validity of the label is verified as follows;

(1) Compute the outlier probabilities P1, P2, and P3 of the three algorithms and their average probability P;

(2) If the average probability P is greater than 10%, the data segment is considered abnormal with high probability and is labeled an invalid data segment; if P is less than or equal to 5%, the segment is not considered abnormal and is labeled a valid data segment; if P is greater than 5% and less than or equal to 10%, the segment is labeled a generally valid data segment;

Compared with the prior art, the beneficial effects of the invention are as follows:

Building on grey relational theory, the invention provides a comprehensive method for verifying the confidence of data validity labels that combines several outlier detection algorithms. Deng's correlation degree is used to correlate, pairwise, the daily data of each monitoring point of the bridge. The 3σ algorithm, the LOF algorithm, and the isolation forest algorithm are each applied to detect abnormal points in the labeled data segments obtained in Step 1, and the credibility of each segment's validity label is verified according to the probability with which abnormal points occur across the algorithms. With outlier detection that includes the Isolation Forest algorithm, the collected data can be diagnosed before evaluation, so that hidden invalid data is eliminated or repaired into valid data, providing a sound data basis for the subsequent safety evaluation.

Description of the drawings

Fig. 1 is a flow chart of the isolation forest algorithm used in the invention;

Fig. 2 is a schematic diagram of distance-based abnormal points in the invention;

Detailed description

To make the technical means, inventive features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

As shown in Figs. 1-2, the comprehensive verification method of label credibility for the validity of cable-stayed bridge monitoring data proceeds as follows. Step 1: use Deng's correlation degree to correlate, pairwise, the daily data of each monitoring point of the bridge:

Standardize the data collected at each measuring point;

Compute the pairwise correlation of the data of each measuring point with Deng's correlation degree algorithm;

The correlation coefficient is then calculated by the following formula:

Figure GDA0003681432670000051

ρ is the resolution coefficient, which balances the determination of the correlation coefficient and generally lies in the interval (0, 1).

The correlation degree is calculated by the following formula:

Figure GDA0003681432670000061

r_oi reflects the similarity of variation between the sequences.

② In practical application, X_0 is the data sequence of a certain measuring point and X = {X_i | i = 1, 2, ..., m} are the data sequences of the other measuring points of the same section, where m is the number of these data sequences and each sequence has n components.
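The correlation coefficient and correlation degree described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation. It assumes the standard form of Deng's grey relational coefficient, ξ_i(k) = (Δmin + ρ·Δmax) / (Δ_i(k) + ρ·Δmax) with Δ_i(k) = |x_0(k) - x_i(k)|, and takes r_oi as the mean of ξ_i(k) over the n components. The function names, the z-score standardization, and the default ρ = 0.5 are assumptions introduced for the example.

```python
def standardize(series):
    """Z-score standardization (an assumed choice; the text only says 'standardize')."""
    n = len(series)
    mean = sum(series) / n
    std = (sum((v - mean) ** 2 for v in series) / n) ** 0.5
    if std == 0:
        return [0.0] * n  # constant series: no variation to scale
    return [(v - mean) / std for v in series]


def grey_relational_degrees(reference, others, rho=0.5):
    """Return the correlation degree r_oi of `reference` (X_0) with each X_i in `others`."""
    if any(len(seq) != len(reference) for seq in others):
        raise ValueError("all sequences must have the same number of components")
    # Absolute differences |x_0(k) - x_i(k)| for every sequence i and component k.
    diffs = [[abs(x0 - xi) for x0, xi in zip(reference, seq)] for seq in others]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if d_max == 0:
        return [1.0] * len(others)  # every sequence coincides with the reference
    degrees = []
    for row in diffs:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))  # mean coefficient = r_oi
    return degrees
```

In this sketch a sequence identical to the reference receives a degree of 1, and the degree decreases as the sequences diverge. In practice each sequence would be passed through `standardize` first.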

The validity of each data segment is labeled automatically according to the position of the measuring point and the computed correlation degree.

① Label the measuring point data: if r_oi ∈ [0.8, 1], the correlation is strong and the segment is labeled a valid data segment; if r_oi ∈ [0.6, 0.8), the correlation is general and the segment is labeled a generally valid data segment; if r_oi ∈ [0, 0.6), the correlation is weak and the segment is labeled an invalid data segment.

② If the data sequence of the reference measuring point is strongly correlated with all data sequences of the surrounding measuring points and of the same section, the reference sequence is valid. Compute the mean correlation degree between the reference sequence and those sequences, and label it according to the standard intervals.

③ If the data sequence of the reference measuring point is strongly correlated with most data sequences of the surrounding measuring points and of the same section but weakly correlated with a few, discard the correlation degrees of the weakly correlated sequences, compute the mean correlation degree, and label according to the standard intervals. In addition, each weakly correlated sequence is itself suspect: take it as the reference sequence, compare its correlation degree with the sequences of its adjacent monitoring points and of the same section, and analyze it again according to ②, ③, and ④.

④ If the data sequence of the reference measuring point is weakly correlated with most data sequences of the surrounding measuring points and of the same section, it can essentially be judged an invalid data segment. The mean correlation degree is still computed to determine the label of the data segment.
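The labeling rules ① to ④ can be sketched as a small Python function. This is an illustration under stated assumptions, not the patented procedure: the text does not define "most" or "a few", so the sketch treats fewer than half of the sequences being weakly correlated as "a few", and it keeps generally correlated sequences (0.6 to 0.8) in the mean. The function names and the dictionary input are hypothetical.

```python
def label_from_degree(r):
    """Map a correlation degree to a label using the intervals in rule 1."""
    if r >= 0.8:
        return "valid"
    if r >= 0.6:
        return "generally valid"
    return "invalid"


def label_reference(degrees):
    """Label a reference sequence from its degrees with neighbouring sequences.

    `degrees` maps a measuring-point id to r_oi. Returns (label, suspects),
    where `suspects` lists the weakly correlated points that rule 3 says
    should be re-analyzed as reference sequences in their own right.
    """
    if not degrees:
        raise ValueError("at least one comparison sequence is required")
    weak = [point for point, r in degrees.items() if r < 0.6]
    if 2 * len(weak) < len(degrees):
        # Rules 2 and 3: no or few weak correlations, so drop them from the mean.
        kept = [r for point, r in degrees.items() if point not in weak]
        suspects = weak
    else:
        # Rule 4: mostly weak, so the mean is taken over all sequences.
        kept = list(degrees.values())
        suspects = []
    mean_degree = sum(kept) / len(kept)
    return label_from_degree(mean_degree), suspects
```

For example, a reference sequence with degrees 0.9 and 0.85 against two neighbours and 0.4 against a third would be labeled valid, with the third neighbour returned as a suspect for further analysis.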

Step 2: Apply the 3σ algorithm, the LOF algorithm, and the isolation forest algorithm separately to detect abnormal points in the labeled data segments obtained in Step 1, and verify the credibility of each segment's validity label according to the probability with which abnormal points occur across the algorithms.

Data point detection based on the 3σ algorithm.

① First, standardize the data so that it follows a normal distribution;

② Assign the data parameters obtained by the algorithm to each single data point and adapt the formula accordingly; ξ > 0 indicates that the point is abnormal:

ξ = |x - μ| - 3σ    (3)

where μ is the mean of the data sequence, σ is the standard deviation of the data sequence, and x is a data point.
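Formula (3) translates directly into code. The following minimal Python sketch flags every point with ξ > 0 and reports the fraction of flagged points in a segment. The function names are hypothetical, and using the population standard deviation for σ is an assumption, since the text does not say which estimator is intended.

```python
from statistics import mean, pstdev


def three_sigma_outliers(data):
    """Return the indices of points with xi = |x - mu| - 3*sigma > 0."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (assumed)
    return [i for i, x in enumerate(data) if abs(x - mu) - 3 * sigma > 0]


def outlier_fraction(data):
    """Fraction of points in the segment flagged by the 3-sigma rule."""
    return len(three_sigma_outliers(data)) / len(data)
```

The fraction returned by `outlier_fraction` is one plausible reading of the per-algorithm outlier probability used later in Step 3.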

Data point detection based on the LOF algorithm: the local outlier factor (LOF) algorithm is a density-based algorithm. C1 is one cluster, C2 is another cluster, and O1 and O2 are isolated points far from both clusters, also called outliers.

The local outlier factor of point p is the average of the ratios of the local reachability density of each neighbor of p to the local reachability density of p itself. If this ratio is greater than 1, the density of p is lower than that of its neighbors, and p may be an outlier, where:

Figure GDA0003681432670000071

Here the local reachability density of point p is the reciprocal of the average reachability distance from the points in the k-distance neighborhood of p to p, and it represents the density of points around p. The higher the density, the more likely p lies inside a cluster; the lower the density, the farther p is from the cluster and the more likely it is an outlier. The definition uses the k-th reachability distance from point o to point p and the k-distance neighborhood of p, that is, all points within the k-th distance of p.
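The LOF computation described above can be sketched in plain Python. The sketch assumes the commonly used definitions, which may differ in detail from the formula in the figure: the reachability distance of p with respect to o is max(k-distance(o), d(p, o)), the local reachability density is the reciprocal of the mean reachability distance over the k-distance neighborhood, and the LOF is the mean ratio of the neighbors' densities to the point's own density. The function name, the default k, and the handling of duplicate points are assumptions.

```python
import math


def lof_scores(points, k=3):
    """Return the local outlier factor of every point (tuples of coordinates)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neighbors = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neighbors.append([j for j in order if dist[i][j] <= kd])  # ties included

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        total = sum(max(k_dist[o], dist[i][o]) for o in neighbors[i])
        lrd.append(len(neighbors[i]) / total if total > 0 else math.inf)

    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            scores.append(1.0)  # point coincides with its neighbours: treat as inlier
        else:
            ratio_sum = sum(lrd[o] for o in neighbors[i])
            scores.append(ratio_sum / (len(neighbors[i]) * lrd[i]))
    return scores
```

Scores close to 1 indicate a point whose density matches that of its neighbors, while scores well above 1 indicate a likely outlier. The cut-off used to turn scores into flagged points is left to the caller.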

Outlier detection with the isolation forest: the isolation forest splits the space in two with a hyperplane and keeps splitting until every data point is isolated. A dense cluster needs many cuts before its points are isolated, whereas a low-density cluster, or a point far from any cluster, is isolated after only a few cuts. The isolation forest consists of t isolation trees, each of which is a binary tree.

Once the t isolation trees have been built, training is complete and the trees can be used to evaluate the test data. Each data point a is passed through every tree, and the layer on which a finally lands in each tree is recorded, giving the average height of the data point. A threshold is set, and data below the threshold is abnormal. Here the threshold on the average height value of each data point is taken as -0.05: if the average height value is below -0.05, the data point is abnormal.
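The tree-building and traversal idea can be illustrated with a simplified isolation forest in plain Python. This sketch makes several assumptions: every tree is built on the full data set without subsampling, cuts are axis-parallel at a uniformly random position, and the raw average depth is returned without the usual path-length normalization. Because raw depths are never negative, the -0.05 threshold in the text presumably applies to a shifted score rather than to these depths, so the sketch does not reproduce that threshold. All names and defaults are hypothetical.

```python
import random


def build_tree(points, rng, depth=0, max_depth=12):
    """Recursively split `points` with random axis-parallel cuts."""
    if len(points) <= 1 or depth >= max_depth:
        return None  # leaf: point isolated or depth limit reached
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return None  # no spread along this dimension: stop splitting
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return (dim, split,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))


def path_length(tree, point):
    """Number of cuts a point passes before it reaches a leaf."""
    depth = 0
    while tree is not None:
        dim, split, left, right = tree
        tree = left if point[dim] < split else right
        depth += 1
    return depth


def average_depths(points, n_trees=50, seed=0):
    """Average depth of each point over `n_trees` isolation trees."""
    rng = random.Random(seed)
    trees = [build_tree(points, rng) for _ in range(n_trees)]
    return [sum(path_length(t, p) for t in trees) / n_trees for p in points]
```

A point far from the rest of the data is separated after very few cuts, so its average depth is much smaller than that of points inside a dense cluster. A cut-off on this depth, or on a score derived from it, then decides which points are flagged.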

Step 3: Comprehensive verification of label credibility. The statistical (3σ) algorithm, the LOF algorithm, and the isolation forest algorithm (ISO) are each used to detect abnormal points in the data segments of every measuring point, and the validity of the label is verified as follows.

(1) Compute the outlier probabilities P1, P2, and P3 of the three algorithms and their average probability P.

(2) If the average probability P is greater than 10%, the data segment is considered abnormal with high probability and is labeled an invalid data segment; if P is less than or equal to 5%, the segment is not considered abnormal and is labeled a valid data segment; if P is greater than 5% and less than or equal to 10%, the segment is labeled a generally valid data segment.
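The decision rule of Step 3 can be written as a short Python function. The sketch assumes that P1, P2, and P3 are given as fractions between 0 and 1, for example the share of points in the segment that each algorithm flags. The function name and the returned label strings are hypothetical.

```python
def verify_label(p1, p2, p3):
    """Average the three outlier probabilities and apply the 5% / 10% thresholds."""
    p = (p1 + p2 + p3) / 3
    if p > 0.10:
        label = "invalid"
    elif p <= 0.05:
        label = "valid"
    else:
        label = "generally valid"
    return p, label
```

The label returned here can then be compared with the label assigned in Step 1: agreement supports the credibility of the original label, while disagreement marks the segment for closer inspection.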

Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (1)

1. A comprehensive label credibility verification method for the validity of cable-stayed bridge monitoring data, comprising the following steps:
step one: performing pairwise correlation on the daily data of each monitoring point of the bridge by using the Deng correlation degree calculation method:
standardizing the data collected at each measuring point;
computing the pairwise correlation of the data of each measuring point by using the Deng correlation degree algorithm;
the correlation coefficient is calculated by the following formula:
Figure FDA0003864592640000011
ρ is the resolution coefficient, which balances the determination of the correlation coefficient and generally lies in the interval (0, 1);
the correlation degree is calculated by the following formula:
Figure FDA0003864592640000012
r_oi reflects the similarity of variation between the sequences;
(2) in practical application, X_0 is the data sequence of a certain measuring point, X = {X_i | i = 1, 2, ..., m} are the data sequences of the other measuring points of the same section, m is the number of these data sequences, and each sequence has n components;
(1) labeling the measuring point data: if r_oi ∈ [0.8, 1], the correlation is strong and the label is a valid data segment; if r_oi ∈ [0.6, 0.8), the correlation is general and the label is a generally valid data segment; if r_oi ∈ [0, 0.6), the correlation is weak and the label is an invalid data segment;
(2) if the data sequence of the reference measuring point is strongly correlated with all data sequences of the surrounding measuring points and of the same section, the reference sequence is valid, the mean correlation degree between the reference data sequence and those data sequences is calculated, and labeling is carried out according to the standard intervals;
(3) if the data sequence of the reference measuring point is strongly correlated with most data sequences of the surrounding measuring points and of the same section and weakly correlated with a few, the correlation degrees of the weakly correlated data sequences are removed, the mean correlation degree is calculated, and labeling is carried out according to the standard intervals; in addition, each weakly correlated data sequence is itself suspect and is taken as the reference data sequence, its correlation degree is compared with the data sequences of its adjacent monitoring points and of the same section, and it is analyzed again according to (2), (3) and (4);
(4) if the data sequence of the reference measuring point is weakly correlated with most data sequences of the surrounding measuring points and of the same section, it is determined to be an invalid data segment, and the mean correlation degree is still calculated to determine the data segment label;
step two: applying a 3σ algorithm, an LOF algorithm and an isolation forest algorithm separately to detect abnormal points in the labeled data segments obtained in step one, and verifying the credibility of the validity labels of the data segments according to the probability with which abnormal points occur across the algorithms;
(1) data point detection based on the 3σ algorithm;
(1) first, standardizing the data so that the data follows a normal distribution;
(2) assigning the data parameters obtained by the algorithm to each single data point and adapting the formula accordingly, where ξ > 0 indicates that the point is abnormal:
ξ = |x - μ| - 3σ
where μ is the mean of the data sequence, σ is the standard deviation of the data sequence, and x is a data point;
(2) data point detection based on the LOF algorithm: the local outlier factor (LOF) algorithm is a density-based algorithm, wherein C1 is one cluster, C2 is another cluster, and O1 and O2 are isolated points far from both clusters, also called outliers;
(3) the local outlier factor of point p is the average of the ratios of the local reachability density of each neighbor of p to the local reachability density of p itself; if this ratio is greater than 1, the density of p is lower than that of its neighbors, and p is considered a possible outlier;
wherein:
Figure FDA0003864592640000031
the local reachable density of the point p is the reciprocal of the average reachable distance from the points in the k-th neighborhood of p to p, and represents the density of the points around p: the higher the density, the more likely p belongs to a cluster; the lower the density, the farther p is from any cluster and the more likely it is to be an abnormal point; the reachable distance used here is the k-th reachable distance from a point o to the point p, and the k-th neighborhood of p is the set of all points within the k-th distance of p;
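The quantities described above can be sketched in plain Python for one-dimensional data. This is a minimal sketch that assumes the standard LOF definitions (reachable distance from o to p taken as the larger of the k-th distance of o and the distance between p and o); the formula image of the claim is not reproduced in this text, so that choice is an assumption. The function name `lof_scores` and the parameter `k` are hypothetical, and all values are assumed to be distinct.

```python
def lof_scores(values, k=2):
    """Local outlier factor of each 1-D point, using |a - b| as the distance."""
    n = len(values)

    def dist(i, j):
        return abs(values[i] - values[j])

    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist(i, j), j) for j in range(n) if j != i)
        kd = others[k - 1][0]          # k-th distance of point i
        k_dist.append(kd)
        # k-th neighborhood: every point within the k-th distance of i
        neigh.append([j for d, j in others if d <= kd])

    def lrd(p):
        # Reciprocal of the average reachable distance over the neighborhood.
        # Assumed reachable distance: max(k-th distance of o, dist(p, o)).
        total = sum(max(k_dist[o], dist(p, o)) for o in neigh[p])
        return len(neigh[p]) / total

    lrds = [lrd(p) for p in range(n)]
    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrds[o] / lrds[p] for o in neigh[p]) / len(neigh[p])
            for p in range(n)]
```

With `values = [10.0, 11.0, 12.0, 13.0, 14.0, 50.0]` and `k=2`, the isolated value 50.0 receives a score of about 24.3, while the clustered points score between about 0.67 and 1.25. Points on the edge of a cluster can therefore score slightly above 1, which is worth keeping in mind when the "greater than 1" criterion is applied strictly.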
(4) Abnormal point detection based on the isolated forest algorithm: the isolated forest cuts the data space in two with a hyperplane and keeps cutting the resulting subspaces until each subspace contains a single data point; a high-density cluster requires many cuts before its points are separated, whereas low-density clusters or data far from the clusters are separated after only a few cuts; the isolated forest is composed of t isolated trees, and each isolated tree has a binary tree structure;
after the t isolated trees are obtained, training of the isolated forest is finished, and the generated isolated trees are then used to evaluate the test data: each data point a traverses every tree, the layer of each tree on which a finally falls is computed, and the average height of the data point over all trees is obtained; a threshold is set, and data below the threshold are abnormal; the threshold on the average height value of each data point is set to -0.05, so that a data point whose average height value is less than -0.05 is abnormal;
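The tree-building and traversal described above can be sketched in plain Python for one-dimensional data. This is a simplified illustration under assumptions, not the claimed implementation: cuts are drawn uniformly at random between the minimum and maximum of each subset, no subsampling is used, and the function names (`build_tree`, `path_length`, `average_heights`) and the parameters `n_trees`, `max_depth` and `seed` are hypothetical. The sketch returns raw average depths; the -0.05 threshold of the claim refers to a differently scaled score and is not reproduced here.

```python
import random

def build_tree(points, depth, max_depth, rng):
    """Recursively cut the points in two until each leaf holds one point."""
    if len(points) <= 1 or depth >= max_depth or min(points) == max(points):
        return {"size": len(points)}          # leaf node
    cut = rng.uniform(min(points), max(points))
    left = [p for p in points if p < cut]
    right = [p for p in points if p >= cut]
    return {"cut": cut,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, x):
    """Layer of the tree on which the data point x finally falls."""
    depth = 0
    while "cut" in tree:
        tree = tree["left"] if x < tree["cut"] else tree["right"]
        depth += 1
    return depth

def average_heights(values, n_trees=100, max_depth=12, seed=0):
    """Average landing depth of every point over n_trees isolated trees."""
    rng = random.Random(seed)
    forest = [build_tree(list(values), 0, max_depth, rng)
              for _ in range(n_trees)]
    return [sum(path_length(t, x) for t in forest) / n_trees for x in values]
```

For twenty readings spaced between 10.0 and 10.95 plus one reading of 100.0, the isolated reading is almost always separated by the first cut, so its average height is close to 1, while the clustered readings need several more cuts.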
step three: comprehensively verifying the reliability of the label: abnormal points in the data segment of each measuring point are detected using the statistical (3σ) algorithm, the LOF algorithm and the isolated forest algorithm (ISO) respectively, and the validity of the label is verified according to the following steps;
(1) The abnormal point probabilities P1, P2 and P3 of the three algorithms and their average probability P are calculated;
(2) If the average probability P is greater than 10%, the data segment is considered abnormal data and is labeled an invalid data segment; if the average probability P is less than or equal to 5%, the data segment is not considered abnormal data and is labeled a valid data segment; if the average probability P is greater than 5% and less than or equal to 10%, the data segment is labeled a generally valid data segment.
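The decision rule of step three can be written as a small Python function. This is a minimal sketch, assuming the three probabilities are given as fractions between 0 and 1; the function name `verify_label` and the label strings are hypothetical.

```python
def verify_label(p1, p2, p3):
    """Return (P, label) from the abnormal-point probabilities of the
    3-sigma, LOF and isolated forest detectors (fractions in [0, 1])."""
    p = (p1 + p2 + p3) / 3          # average probability P
    if p > 0.10:
        return p, "invalid"
    if p <= 0.05:
        return p, "valid"
    return p, "generally valid"     # 5% < P <= 10%
```

For example, detector probabilities of 2%, 3% and 4% average to 3% and give a valid segment, whereas 20%, 10% and 15% average to 15% and give an invalid segment.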
CN201910661200.4A 2019-07-22 2019-07-22 Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity Expired - Fee Related CN110619345B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910661200.4A CN110619345B (en) 2019-07-22 2019-07-22 Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity

Publications (2)

Publication Number Publication Date
CN110619345A (en) 2019-12-27
CN110619345B (en) 2022-12-06

Family

ID=68921642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910661200.4A Expired - Fee Related CN110619345B (en) 2019-07-22 2019-07-22 Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity

Country Status (1)

Country Link
CN (1) CN110619345B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111710373A (en) * 2020-07-20 2020-09-25 中科三清科技有限公司 Detection method, device, equipment and medium of volatile organic compound observation data
CN112035919A (en) * 2020-08-24 2020-12-04 山东高速工程检测有限公司 Bridge in-service performance safety assessment method and system, storage medium and equipment
CN116776274B (en) * 2023-08-25 2023-10-17 北京点聚信息技术有限公司 Electronic seal data management system based on data analysis
CN116881819B (en) * 2023-09-07 2023-11-14 成都理工大学 A method for monitoring the working status of cable-stayed cables based on isolated forests
CN119416886A (en) * 2025-01-08 2025-02-11 汕头大学医学院附属肿瘤医院 A method and system for building a medical accelerator quality control knowledge base using a large language model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016174264A1 (en) * 2015-04-30 2016-11-03 Marel Iceland Ehf. A method of handling weight data in a data processing system
CN106598822A (en) * 2015-10-15 2017-04-26 华为技术有限公司 Abnormal data detection method and device applied to capacity estimation
CN109739849A (en) * 2019-01-02 2019-05-10 山东省科学院情报研究所 A kind of network sensitive information of data-driven excavates and early warning platform

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101382473B (en) * 2008-10-08 2011-04-20 重庆大学 EWMA control chart method for bridge structure safety alarm
CN101382474B (en) * 2008-10-08 2011-04-20 重庆大学 Alarming method for bridge structure safety
CN102436530A (en) * 2011-11-15 2012-05-02 东南大学 Sensor distribution method for bowstring arc bridge structure made of special-shaped steel tube concrete
CN105067206B (en) * 2015-07-16 2017-09-26 长安大学 A kind of deflection of bridge structure linear measurement method
US9896836B1 (en) * 2015-11-09 2018-02-20 Iowa State University Research Foundation, Inc. Apparatus, method, and system for high capacity band brake type variable friction damping of movement of structures
CN106223189B (en) * 2016-07-18 2018-01-23 深圳市市政设计研究院有限公司 Lead rubber laminated bearing, intelligent bearing and bearing monitoring system
CN106529062B (en) * 2016-11-20 2019-10-11 重庆交通大学 A health diagnosis method for bridge structures based on deep learning
CN109214355B (en) * 2018-09-29 2020-05-15 西安交通大学 Mechanical monitoring data abnormal section detection method based on kernel estimation LOF
CN109556897A (en) * 2018-11-16 2019-04-02 王玉波 A kind of bridge construction system in science of bridge building field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
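Local distance-based anomaly detection method based on density-biased sampling; Fu Peiguo et al.; Journal of Software (软件学报); 2017-10-15 (No. 10); pp. 105-119 *
Numerical simulation of the temperature gradient of a steel-concrete composite deck system of a suspension bridge and study of its effects (in English); Wang Da et al.; Journal of Central South University; 2018-01-15 (No. 01); pp. 189-199 *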

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20221206