CN110619345B - Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity - Google Patents
- Publication number
- CN110619345B
- Authority
- CN
- China
- Prior art keywords
- data
- point
- algorithm
- abnormal
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/08—Construction
Abstract
Description
Technical Field
The invention relates to label-credibility verification for cable-stayed bridge monitoring data, and in particular to a comprehensive method for verifying the credibility of the validity labels assigned to cable-stayed bridge monitoring data.
Background
Cable-stayed bridges are widely used because of their light structure, attractive appearance, long span, and good economy. The monitoring and safety evaluation of cable-stayed bridges has therefore long been an active research topic. All of this depends on collecting accurate and valid data. In practice, owing to a variety of factors, the collected data are often unreliable, and may be riddled with errors and omissions or be entirely invalid.
A data segment may become invalid because consecutive data points fail, or because several scattered single-point failures invalidate the whole segment. Common signatures of invalid data include drift of the data segment, abnormal changes in data frequency, sudden jumps, and constant values. Invalid data of these types can be identified visually from a plot. Another type of invalid data segment, however, looks no different from many normal segments, shows no obvious characteristic fluctuation, and is difficult to identify directly. Such data make up the majority of invalid data and strongly affect the safety evaluation of cable-stayed bridges. The collected data must therefore be diagnosed before evaluation, so that this hidden invalid data is either removed or repaired into valid data, providing a sound data basis for the subsequent safety evaluation.
At present, artificial-intelligence methods centered on deep learning are among the main accepted approaches to diagnosing data validity, and such methods depend heavily on how credible the data-validity labels are. This application therefore proposes a comprehensive method for verifying label credibility with respect to the validity of cable-stayed bridge monitoring data.
Summary of the Invention
The technical problem to be solved by the invention is to overcome the existing shortcomings by providing a comprehensive method for verifying label credibility with respect to the validity of cable-stayed bridge monitoring data, which effectively addresses the problems described in the background.
To achieve this, the invention provides the following technical solution: a comprehensive method for verifying label credibility with respect to the validity of cable-stayed bridge monitoring data, comprising the following steps:
Step 1: Using Deng's relational degree, correlate the daily data of the bridge's monitoring points pairwise:
(1) Standardize the data collected at each measuring point;
(2) Compute the pairwise relational degree between measuring points with Deng's relational degree algorithm;
(3) Automatically label the validity of each data segment according to the measuring-point locations and the computed relational degrees;
① Label the measuring-point data: if r_oi ∈ [0.8, 1], the relation is strong and the segment is labeled a valid data segment; if r_oi ∈ [0.6, 0.8), the relation is moderate and the segment is labeled a generally valid data segment; if r_oi ∈ [0, 0.6), the relation is weak and the segment is labeled an invalid data segment;
② If the reference measuring point's data sequence is strongly related to the data sequences of all surrounding measuring points and of the same cross-section, the reference sequence is valid. Compute the mean relational degree between the reference sequence and those sequences, and label the segment according to the standard intervals;
③ If the reference sequence is strongly related to most of the surrounding and same-cross-section sequences but weakly related to a few, discard the relational degrees of the weakly related sequences, compute the mean of the remaining relational degrees, and label the segment according to the standard intervals. In addition, each weakly related sequence is itself suspect: take it as the reference sequence, compare it with the sequences of its adjacent monitoring points and of the same cross-section, and analyze it again according to ②, ③ and ④;
④ If the reference sequence is weakly related to most of the surrounding and same-cross-section sequences, it can essentially be judged an invalid data segment; the mean relational degree is still computed to determine the segment's label;
Step 2: Apply the 3σ algorithm, the LOF algorithm, and the isolation forest algorithm separately to detect abnormal points in the labeled data segments obtained in Step 1, and verify the credibility of each segment's validity label according to the probability with which abnormal points occur across the algorithms;
(1) Data point detection based on the 3σ algorithm;
① First standardize the data so that they satisfy a normal distribution;
② Apply the parameters obtained by the algorithm to individual data points, adapting the formula to ξ = |x − μ| − 3σ; ξ > 0 indicates that the point is abnormal, where μ is the mean of the data sequence, σ is the standard deviation of the data sequence, and x is a data point;
(2) Data point detection based on the LOF algorithm: the local outlier factor (LOF) algorithm is density-based. C1 is one cluster, C2 is another cluster, and O1 and O2 are isolated points far from both clusters, also called abnormal points;
The local outlier factor of a point p is the average of the ratios of the local reachability density of each point in the neighborhood of p to the local reachability density of p itself. If this ratio is greater than 1, the density of p is lower than that of its neighboring points, and p may be an abnormal point;
The local reachability density of p is the reciprocal of the average reachability distance from the points in the k-th neighborhood of p to p, and expresses the density of points around p: the higher the density, the more likely p lies inside a cluster; the lower the density, the farther p is from any cluster and the more likely it is an abnormal point. The definition relies on the k-th reachability distance from a point o to p and on the k-th distance neighborhood of p, that is, all points within the k-th distance of p;
(3) Abnormal point detection based on the isolation forest algorithm: an isolation forest splits the space in two with a hyperplane and keeps splitting until every data point is isolated. Points in a dense cluster need many cuts to isolate, whereas a low-density cluster, or a point that lies far from any cluster, is isolated after only a few cuts. An isolation forest consists of t isolation trees, each of which is a binary tree;
Once the t isolation trees have been built, training is complete and the trees can be used to evaluate test data. Each data point a is passed through every tree, the level at which a finally lands in each tree is recorded, and the average height of a over all trees is obtained. A threshold is set, and data below the threshold are abnormal. Here the threshold on the average height value of each data point is set to −0.05: if the average height value is < −0.05, the data point is abnormal;
Step 3: Comprehensive verification of label credibility. Abnormal points in the data segment of each measuring point are detected separately with the statistical algorithm, the LOF algorithm, and the isolation forest algorithm (ISO), and the validity of the label is verified as follows;
(1) Compute the abnormal-point probability of each algorithm, P1, P2 and P3, and their average probability P;
(2) If the average probability P is greater than 10%, the data segment is considered abnormal with high probability and is labeled an invalid data segment; if P is less than or equal to 5%, the segment is not considered abnormal and is labeled a valid data segment; if P is greater than 5% and less than or equal to 10%, the segment is labeled a generally valid data segment;
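The decision rule of Step 3 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes that each "abnormal-point probability" is the fraction of points in the segment flagged by one algorithm, and the function name `label_from_outlier_rates` is hypothetical.

```python
from statistics import mean


def label_from_outlier_rates(p1, p2, p3):
    """Average the three abnormal-point fractions and map P to a segment label.

    Assumption: p1, p2, p3 are fractions in [0, 1] (flagged points / all points)
    for the 3-sigma, LOF and isolation forest detectors respectively.
    """
    p = mean([p1, p2, p3])
    if p > 0.10:
        return p, "invalid"
    if p <= 0.05:
        return p, "valid"
    return p, "generally valid"  # 5% < P <= 10%


# Example: three detectors flag 6%, 8% and 7% of the points in a segment.
p, label = label_from_outlier_rates(0.06, 0.08, 0.07)
print(round(p, 4), label)
```

One plausible reading is that a Step 1 label is treated as credible when it agrees with the label produced here; the text does not spell out the comparison, so that part is left out of the sketch.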
Compared with the prior art, the beneficial effects of the invention are as follows:
Building on grey relational degree theory, the invention provides a comprehensive method for verifying the confidence of data-validity labels that combines several abnormal-point detection techniques. Deng's relational degree is used to correlate the daily data of the bridge's monitoring points pairwise. The 3σ, LOF and isolation forest algorithms are then each used to detect abnormal points in the labeled data segments obtained in Step 1, and the credibility of each segment's validity label is verified according to the probability with which abnormal points occur across the algorithms. With abnormal-point detection by the Isolation Forest algorithm, the collected data can be diagnosed before evaluation, so that hidden invalid data is removed or repaired into valid data. This provides a sound data basis for the subsequent safety evaluation and benefits practical use.
Brief Description of the Drawings
Fig. 1 is a flow chart of the isolation forest algorithm used in the invention;
Fig. 2 is a schematic diagram of distance-based abnormal points in the invention;
Detailed Description
To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Figs. 1-2, the comprehensive method for verifying label credibility with respect to the validity of cable-stayed bridge monitoring data proceeds as follows. Step 1: using Deng's relational degree, correlate the daily data of the bridge's monitoring points pairwise:
Standardize the data collected at each measuring point;
Compute the pairwise relational degree between measuring points with Deng's relational degree algorithm;
In the formula for the relational coefficient, ρ is the resolution coefficient, which balances the determination of the relational coefficient and generally lies in the interval (0, 1).
The relational degree r_oi, computed from the relational coefficients, reflects the similarity of variation between the sequences.
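The formulas for the relational coefficient and the relational degree are not reproduced in this text. As a reference point, the standard form of Deng's grey relational analysis is sketched below. The symbol \gamma_{oi}(k) for the relational coefficient and the sample index k are assumptions, since the source's own notation cannot be recovered; \rho and r_{oi} are the quantities named in the text, and x_0(k), x_i(k) are the k-th components of the standardized reference and comparison sequences.

```latex
\[
\gamma_{oi}(k) =
\frac{\min_i \min_k \lvert x_0(k) - x_i(k) \rvert + \rho \, \max_i \max_k \lvert x_0(k) - x_i(k) \rvert}
     {\lvert x_0(k) - x_i(k) \rvert + \rho \, \max_i \max_k \lvert x_0(k) - x_i(k) \rvert}
\]
\[
r_{oi} = \frac{1}{n} \sum_{k=1}^{n} \gamma_{oi}(k)
\]
```

With this form every coefficient lies in (0, 1], so r_oi does as well, which is consistent with the labeling intervals [0, 0.6), [0.6, 0.8) and [0.8, 1] used in the text.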
② In practice, X_0 is the data sequence of a given measuring point and X = {X_i | i = 1, 2, ..., m} is the set of data sequences from the other measuring points in the same cross-section, where m is the number of comparison sequences; every sequence has n components.
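A minimal Python sketch of this computation follows, assuming the standard form of Deng's grey relational degree. The z-score standardization, the default ρ = 0.5, and the function names `standardize` and `grey_relational_degrees` are illustrative assumptions; the text fixes none of them.

```python
def standardize(seq):
    """Z-score standardization (one common choice; the method is not named in the text)."""
    n = len(seq)
    mu = sum(seq) / n
    sd = (sum((v - mu) ** 2 for v in seq) / n) ** 0.5
    if sd == 0:
        return [0.0] * n  # constant sequence: no variation to scale
    return [(v - mu) / sd for v in seq]


def grey_relational_degrees(x0, others, rho=0.5):
    """Return r_oi for each comparison sequence X_i against the reference X_0.

    x0     : reference sequence with n components
    others : list of m comparison sequences, each with n components
    rho    : resolution coefficient in (0, 1); 0.5 is an assumed default
    """
    diffs = [[abs(a - b) for a, b in zip(x0, xi)] for xi in others]
    d_min = min(min(row) for row in diffs)  # two-level minimum over i and k
    d_max = max(max(row) for row in diffs)  # two-level maximum over i and k
    if d_max == 0:
        return [1.0 for _ in others]  # every sequence equals the reference
    degrees = []
    for row in diffs:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))  # mean coefficient = r_oi
    return degrees


# Example: one comparison sequence tracks the reference, the other runs opposite to it.
x0 = standardize([1.0, 2.0, 3.0, 4.0])
others = [standardize([2.0, 4.0, 6.0, 8.0]), standardize([4.0, 3.0, 2.0, 1.0])]
print(grey_relational_degrees(x0, others))
```

Because the two-level minimum and maximum are taken over all comparison sequences, each r_oi depends on which neighbors are included in the comparison, not only on the pair (X_0, X_i).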
The validity of each data segment is labeled automatically according to the measuring-point locations and the computed relational degrees.
① Label the measuring-point data: if r_oi ∈ [0.8, 1], the relation is strong and the segment is labeled a valid data segment; if r_oi ∈ [0.6, 0.8), the relation is moderate and the segment is labeled a generally valid data segment; if r_oi ∈ [0, 0.6), the relation is weak and the segment is labeled an invalid data segment.
② If the reference measuring point's data sequence is strongly related to the data sequences of all surrounding measuring points and of the same cross-section, the reference sequence is valid. Compute the mean relational degree between the reference sequence and those sequences, and label the segment according to the standard intervals.
③ If the reference sequence is strongly related to most of the surrounding and same-cross-section sequences but weakly related to a few, discard the relational degrees of the weakly related sequences, compute the mean of the remaining relational degrees, and label the segment according to the standard intervals. In addition, each weakly related sequence is itself suspect: take it as the reference sequence, compare it with the sequences of its adjacent monitoring points and of the same cross-section, and analyze it again according to ②, ③ and ④.
④ If the reference sequence is weakly related to most of the surrounding and same-cross-section sequences, it can essentially be judged an invalid data segment; the mean relational degree is still computed to determine the segment's label.
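Rules ① to ④ can be sketched as a small labeling function. This is an illustrative reading rather than the patent's implementation: "most" is interpreted as more than half, cases the text does not cover (for example, mostly moderate relations) fall back to the plain mean, and the names `interval_label` and `label_reference` are hypothetical.

```python
def interval_label(r):
    """Rule 1: map a relational degree to a label using the standard intervals."""
    if r >= 0.8:
        return "valid"
    if r >= 0.6:
        return "generally valid"
    return "invalid"


def label_reference(degrees, weak=0.6, strong=0.8):
    """Label a reference segment from its relational degrees to neighboring sequences.

    Returns (label, mean_degree, recheck), where recheck lists the indices of
    weakly related neighbors that rule 3 says should be re-analyzed as references.
    """
    weak_idx = [i for i, r in enumerate(degrees) if r < weak]
    n_strong = sum(1 for r in degrees if r >= strong)
    if n_strong > len(degrees) / 2 and weak_idx:
        # Rule 3: mostly strong, a few weak -> average without the weak ones.
        kept = [r for r in degrees if r >= weak]
        recheck = weak_idx
    else:
        # Rules 2 and 4 (and uncovered cases, by assumption): average everything.
        kept = list(degrees)
        recheck = []
    mean_r = sum(kept) / len(kept)
    return interval_label(mean_r), mean_r, recheck


# Example: three strongly related neighbors and one weakly related neighbor.
print(label_reference([0.90, 0.85, 0.30, 0.88]))
```

Returning the indices to recheck, instead of recursing, keeps the sketch independent of how neighboring sequences are looked up for each measuring point.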
Step 2: Apply the 3σ algorithm, the LOF algorithm, and the isolation forest algorithm separately to detect abnormal points in the labeled data segments obtained in Step 1, and verify the credibility of each segment's validity label according to the probability with which abnormal points occur across the algorithms.
Data point detection based on the 3σ algorithm.
① First standardize the data so that they satisfy a normal distribution;
② Apply the parameters obtained by the algorithm to individual data points, adapting the formula as follows; ξ > 0 indicates that the point is abnormal:
ξ = |x − μ| − 3σ (3)
where μ is the mean of the data sequence, σ is the standard deviation of the data sequence, and x is a data point.
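Equation (3) translates directly into code. The sketch below assumes the population standard deviation (the text does not say whether the population or sample form is meant) and uses the hypothetical names `three_sigma_flags` and `outlier_fraction`.

```python
from statistics import fmean, pstdev


def three_sigma_flags(segment):
    """Flag each point x for which xi = |x - mu| - 3*sigma is greater than 0."""
    mu = fmean(segment)
    sigma = pstdev(segment)  # population standard deviation (assumption)
    return [abs(x - mu) - 3 * sigma > 0 for x in segment]


def outlier_fraction(flags):
    """Share of flagged points in the segment."""
    return sum(flags) / len(flags)


# Example: thirty steady readings followed by one spike.
segment = [10.0] * 30 + [100.0]
flags = three_sigma_flags(segment)
print(flags[-1], outlier_fraction(flags))
```

If the segment has already been z-score standardized, μ is 0 and σ is 1, so the test reduces to |x| > 3. Note that with very short segments no point can exceed 3σ, so the rule only makes sense for reasonably long data segments.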
Data point detection based on the LOF algorithm: the local outlier factor (LOF) algorithm is density-based. C1 is one cluster, C2 is another cluster, and O1 and O2 are isolated points far from both clusters, also called abnormal points.
The local outlier factor of a point p is the average of the ratios of the local reachability density of each point in the neighborhood of p to the local reachability density of p itself. If this ratio is greater than 1, the density of p is lower than that of its neighboring points, and p may be an abnormal point.
The local reachability density of p is the reciprocal of the average reachability distance from the points in the k-th neighborhood of p to p, and expresses the density of points around p: the higher the density, the more likely p lies inside a cluster; the lower the density, the farther p is from any cluster and the more likely it is an abnormal point. The definition relies on the k-th reachability distance from a point o to p and on the k-th distance neighborhood of p, that is, all points within the k-th distance of p.
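The definitions above can be made concrete with a brute-force sketch for one-dimensional readings. It follows the standard LOF formulation, in which the reachability distance from p to a neighbor o is max(k-distance(o), d(p, o)); the function name `lof_scores`, the choice of k, and the handling of duplicate values are assumptions made for illustration.

```python
import math


def lof_scores(values, k=3):
    """Local outlier factor of each value in a 1-D sequence (O(n^2); needs len(values) > k)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]

    k_dist, neigh = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # k-th distance of point i
        k_dist.append(kd)
        # k-th distance neighborhood: all points within the k-th distance (ties included)
        neigh.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: reciprocal of the mean reachability distance.
    lrd = []
    for i in range(n):
        total = sum(max(k_dist[o], dist[i][o]) for o in neigh[i])
        lrd.append(len(neigh[i]) / total if total > 0 else math.inf)

    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            scores.append(1.0)  # all neighbors are duplicates: treated as an inlier (assumption)
            continue
        # LOF: mean ratio of the neighbors' density to the point's own density.
        scores.append(sum(lrd[o] / lrd[i] for o in neigh[i]) / len(neigh[i]))
    return scores


# Example: five closely spaced readings and one distant reading.
print(lof_scores([1.0, 1.1, 1.2, 1.3, 1.4, 10.0], k=2))
```

Scores near 1 indicate a point about as dense as its neighbors, while scores well above 1 indicate a candidate abnormal point. The cut-off used to turn scores into flags is a tuning choice that the text does not specify beyond "greater than 1".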
Abnormal point detection based on the isolation forest algorithm: an isolation forest splits the space in two with a hyperplane and keeps splitting until every data point is isolated. Points in a dense cluster need many cuts to isolate, whereas a low-density cluster, or a point that lies far from any cluster, is isolated after only a few cuts. An isolation forest consists of t isolation trees, each of which is a binary tree.
Once the t isolation trees have been built, training is complete and the trees can be used to evaluate test data. Each data point a is passed through every tree, the level at which a finally lands in each tree is recorded, and the average height of a over all trees is obtained. A threshold is set, and data below the threshold are abnormal. Here the threshold on the average height value of each data point is set to −0.05: if the average height value is < −0.05, the data point is abnormal.
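The tree-building and traversal described above can be sketched for one-dimensional readings as follows. This is a simplified illustration under stated assumptions: tree depth is capped at ceil(log2(sample size)), unresolved leaves receive no path-length correction, and the names `build_tree`, `leaf_depth` and `average_depths` are hypothetical. The −0.05 threshold in the text appears to apply to a shifted score whose definition is not given, so the sketch stops at the average depth.

```python
import math
import random


def build_tree(values, depth, max_depth, rng):
    """Recursively split a 1-D sample at random cut points until points are isolated."""
    if len(values) <= 1 or depth >= max_depth or min(values) == max(values):
        return {"size": len(values)}  # leaf node
    split = rng.uniform(min(values), max(values))
    return {
        "split": split,
        "left": build_tree([v for v in values if v < split], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= split], depth + 1, max_depth, rng),
    }


def leaf_depth(x, tree):
    """Level at which x lands when passed down one isolation tree."""
    depth = 0
    while "split" in tree:
        tree = tree["left"] if x < tree["split"] else tree["right"]
        depth += 1
    return depth


def average_depths(values, t=100, sample_size=256, seed=0):
    """Average landing depth of every value over t isolation trees."""
    rng = random.Random(seed)
    size = min(sample_size, len(values))
    max_depth = max(1, math.ceil(math.log2(size)))  # depth cap (assumption)
    forest = [build_tree(rng.sample(values, size), 0, max_depth, rng) for _ in range(t)]
    return [sum(leaf_depth(x, tree) for tree in forest) / t for x in values]


# Example: fifty evenly spaced readings and one distant reading.
values = [i / 50 for i in range(50)] + [100.0]
depths = average_depths(values)
print(depths[-1], min(depths[:-1]))
```

Points with a small average depth are isolated after few cuts and are therefore the candidates for abnormal points; converting the depth into a score and choosing the cut-off are left open here because the text does not define them.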
Step 3: Comprehensive verification of label credibility. Abnormal points in the data segment of each measuring point are detected separately with the statistical algorithm, the LOF algorithm, and the isolation forest algorithm (ISO), and the validity of the label is verified as follows.
(1) Compute the abnormal-point probability of each algorithm, P1, P2 and P3, and their average probability P.
(2) If the average probability P is greater than 10%, the data segment is considered abnormal with high probability and is labeled an invalid data segment; if P is less than or equal to 5%, the segment is not considered abnormal and is labeled a valid data segment; if P is greater than 5% and less than or equal to 10%, the segment is labeled a generally valid data segment.
Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910661200.4A CN110619345B (en) | 2019-07-22 | 2019-07-22 | Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110619345A CN110619345A (en) | 2019-12-27 |
CN110619345B true CN110619345B (en) | 2022-12-06 |
Family
ID=68921642
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910661200.4A Expired - Fee Related CN110619345B (en) | 2019-07-22 | 2019-07-22 | Comprehensive verification method of tag credibility for cable-stayed bridge monitoring data validity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110619345B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111710373A (en) * | 2020-07-20 | 2020-09-25 | 中科三清科技有限公司 | Detection method, device, equipment and medium of volatile organic compound observation data |
CN112035919A (en) * | 2020-08-24 | 2020-12-04 | 山东高速工程检测有限公司 | Bridge in-service performance safety assessment method and system, storage medium and equipment |
CN116776274B (en) * | 2023-08-25 | 2023-10-17 | 北京点聚信息技术有限公司 | Electronic seal data management system based on data analysis |
CN116881819B (en) * | 2023-09-07 | 2023-11-14 | 成都理工大学 | A method for monitoring the working status of cable-stayed cables based on isolated forests |
CN119416886A (en) * | 2025-01-08 | 2025-02-11 | 汕头大学医学院附属肿瘤医院 | A method and system for building a medical accelerator quality control knowledge base using a large language model |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016174264A1 (en) * | 2015-04-30 | 2016-11-03 | Marel Iceland Ehf. | A method of handling weight data in a data processing system |
CN106598822A (en) * | 2015-10-15 | 2017-04-26 | 华为技术有限公司 | Abnormal data detection method and device applied to capacity estimation |
CN109739849A (en) * | 2019-01-02 | 2019-05-10 | 山东省科学院情报研究所 | A kind of network sensitive information of data-driven excavates and early warning platform |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101382473B (en) * | 2008-10-08 | 2011-04-20 | 重庆大学 | EWMA control chart method for bridge structure safety alarm |
CN101382474B (en) * | 2008-10-08 | 2011-04-20 | 重庆大学 | Alarming method for bridge structure safety |
CN102436530A (en) * | 2011-11-15 | 2012-05-02 | 东南大学 | Sensor distribution method for bowstring arc bridge structure made of special-shaped steel tube concrete |
CN105067206B (en) * | 2015-07-16 | 2017-09-26 | 长安大学 | A kind of deflection of bridge structure linear measurement method |
US9896836B1 (en) * | 2015-11-09 | 2018-02-20 | Iowa State University Research Foundation, Inc. | Apparatus, method, and system for high capacity band brake type variable friction damping of movement of structures |
CN106223189B (en) * | 2016-07-18 | 2018-01-23 | 深圳市市政设计研究院有限公司 | Lead rubber laminated bearing, intelligent bearing and bearing monitoring system |
CN106529062B (en) * | 2016-11-20 | 2019-10-11 | 重庆交通大学 | A health diagnosis method for bridge structures based on deep learning |
CN109214355B (en) * | 2018-09-29 | 2020-05-15 | 西安交通大学 | Mechanical monitoring data abnormal section detection method based on kernel estimation LOF |
CN109556897A (en) * | 2018-11-16 | 2019-04-02 | 王玉波 | A kind of bridge construction system in science of bridge building field |
Non-Patent Citations (2)
Title |
---|
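Local distance-based anomaly detection method based on density-biased sampling; Fu Peiguo et al.; Journal of Software (软件学报); 20171015 (No. 10); pp. 105-119 * |
Numerical simulation and effect study of the temperature gradient in the steel-concrete composite deck system of a suspension bridge (in English); Wang Da et al.; Journal of Central South University; 20180115 (No. 01); pp. 189-199 * |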
Also Published As
Publication number | Publication date |
---|---|
CN110619345A (en) | 2019-12-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20221206 |