CN112036075A - Abnormal data judgment method based on environmental monitoring data association relation - Google Patents
- Publication number
- CN112036075A (application CN202010801821.0A)
- Authority
- CN
- China
- Legal status
- Pending
Classifications
- G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/045: Combinations of networks
- G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08: Learning methods
Abstract
The invention relates to a method for judging abnormal data based on the association relationships within environmental monitoring data. First, the monitoring data are divided into training data, validation data and test data; a model is built with the training data, and the validation data are used to select the model's best parameters according to the mean absolute error (MAE). Once built and tuned, the model is tested on the test data and embedded in an environmental monitoring platform. On the platform, the model uses real-time monitoring data to give the predicted value for the next moment; the absolute difference bia between the prediction and the measured value is computed and compared with MAE ± 30% of the measured value to judge whether the measurement is abnormal. The invention takes full account of the influence of meteorological conditions on the monitoring data and of the temporal continuity and variation characteristics of the data. It addresses the lack of automated quality-control means for multi-source monitoring data, screens and judges suspicious data automatically and intelligently, safeguards data quality, and provides strong support for later data use and for environmental forecasting and early warning.
Description
Technical Field
The invention relates to the technical field of data quality control for real-time environmental monitoring and is mainly used to automatically judge abnormal values in real-time monitoring data of particulate matter and gaseous pollutants.
Background
At present, most data-screening methods used to control and monitor the quality of atmospheric environmental data are manual: analysts plot daily-average and monthly-average charts to judge the abnormal fluctuation and degree of outliers of each monitoring indicator. This approach consumes considerable human resources, and with massive volumes of monitoring data, manual review often misses problems. Because environmental monitoring instruments usually report pollutant concentrations at minute or hourly resolution, manual review also lags behind the data, whereas an automated review mechanism can control data quality in real time.
To address the lack of automated quality-control means for atmospheric monitoring data, an algorithm is designed according to the data monitoring and review scheme adopted by the China National Environmental Monitoring Centre. It provides automated, intelligent quality control of atmospheric environmental monitoring data, solves the lack of automated quality-control means for multi-source monitoring data, brings the quality control of atmospheric monitoring equipment under a single methodological system, and advances remote automated quality-control technology for monitoring equipment.
Summary of the Invention
The purpose of the invention is to provide a method for judging abnormal data based on the association relationships within environmental monitoring data, so as to solve problems such as the lack of automated quality-control means for multi-source monitoring data.
A method for judging abnormal data based on the association relationships within environmental monitoring data comprises the following steps:
S1. Preprocess the historical data and the environmental monitoring data to be analyzed: use the data-acquisition software to identify missing values and abnormal values in both data sets, then replace those missing and abnormal values;
S2. Divide the data into training data and validation data, and convert both into the sequence data required by the model. The training and validation sets both contain normal data and data manually labelled as abnormal; the reasons for abnormality include sudden rises or drops, the absence of diurnal variation, and persistently low values, so judging an abnormality always depends on the continuity of the monitoring data before and after it. The ratio of training to validation data can be 7:3;
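A minimal sketch of the 7:3 split, assuming the records are already in time order and that the split is chronological (the earlier 70% for training) so that the model is never validated on data older than its training data. The function name `split_train_validation` is hypothetical; Example 1 below instead splits by calendar month.

```python
def split_train_validation(records, train_ratio=0.7):
    """Split time-ordered records into training and validation parts.

    No shuffling is done (an assumption): the validation part is always
    the most recent stretch of data.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    cut = int(round(len(records) * train_ratio))
    return records[:cut], records[cut:]


hours = list(range(10))  # stand-in for 10 hourly records
train, valid = split_train_validation(hours)
print(len(train), len(valid))  # 7 3
```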
S3. Build the model with the training data, then use the validation data to select the model's best parameters according to the mean absolute error (MAE). Once built and tuned, the model is embedded in the environmental monitoring platform: the environmental monitoring data to be analyzed at time $t-1$, $c_{t-1}^{\text{true}}$, are used as input, and the model continuously yields the predicted value $c_t^{\text{pre}}$ for time $t$;
S4. Compare the predicted value $c_t^{\text{pre}}$ with the environmental monitoring data $c_t^{\text{true}}$ to be analyzed at time $t$ and compute the absolute difference bia. Compare bia with the empirical error range MAE ± 30% of the true value to judge abnormality: data outside this range are marked as abnormal. Because the MAE changes with the input data, the threshold is dynamic.
Here $\text{bia} = \left|c_t^{\text{pre}} - c_t^{\text{true}}\right|$.
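A minimal sketch of the S4 decision rule. The text does not say how the MAE is updated as data arrive, so this sketch assumes a rolling MAE over the last `window` judged-normal points, seeded with the validation MAE; the class name, `window=24` and the `two_sided` switch are hypothetical. By default only the upper bound MAE + 30% × $c^{\text{true}}$ is applied, since bia is non-negative; `two_sided=True` applies the literal MAE ± 30% band.

```python
from collections import deque


class DynamicThresholdJudge:
    """Flag c_true as abnormal when bia = |c_pre - c_true| leaves the band
    MAE +/- ratio * c_true, with MAE taken as a rolling mean of recent errors.
    """

    def __init__(self, initial_mae, window=24, ratio=0.3, two_sided=False):
        self.errors = deque([initial_mae], maxlen=window)  # seed with validation MAE
        self.ratio = ratio
        self.two_sided = two_sided

    @property
    def mae(self):
        return sum(self.errors) / len(self.errors)

    def judge(self, c_pre, c_true):
        bia = abs(c_pre - c_true)
        margin = self.ratio * abs(c_true)
        upper = self.mae + margin
        lower = max(0.0, self.mae - margin)
        abnormal = bia > upper or (self.two_sided and bia < lower)
        if not abnormal:
            # Only normal points update the rolling MAE (an assumption), so a
            # burst of faulty readings does not widen the threshold.
            self.errors.append(bia)
        return bia, abnormal


judge = DynamicThresholdJudge(initial_mae=6.85)
print(judge.judge(52, 40))  # (12, False)
print(judge.judge(62, 40))  # (22, True)
```

With a seed MAE of 6.85 (the NO2 value reported in Example 1) and a measured value of 40, a prediction of 52 gives bia = 12, inside the bound 18.85; the rolling MAE then becomes 9.425, so a later prediction of 62 (bia = 22) exceeds 21.425 and is flagged.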
Preferably, the method for replacing missing values and abnormal values in step S1 comprises:
S11. Linear interpolation uses a first-degree polynomial as the interpolation function. Suppose the values of a known function $y = f(x)$ at $n+1$ distinct points $x_i$ ($i = 0, 1, 2, \ldots, n$) on the interval $[a, b]$ are $y_i$; find a polynomial $P(x)$
that satisfies $P(x_i) = y_i$ ($i = 0, 1, \ldots, n$). For linear interpolation, $n = 1$ and the two nodes are $x_0$ and $x_1$.
From analytic geometry, the straight line through $(x_0, y_0)$ and $(x_1, y_1)$ gives:
$$y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}\,(x - x_0)$$
where $x_0$, $x_1$, $y_0$, $y_1$: known statistical data;
$x$: any point between $x_0$ and $x_1$;
$y$: the interpolated value corresponding to $x$;
S12. Using the data at the moments immediately before and after the missing or abnormal value, perform linear interpolation according to the formula above.
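A minimal sketch of S11 and S12 on an hourly series, assuming the values flagged as missing or abnormal by the data-acquisition software have already been set to `None` and using the hour index as $x$. For runs of several consecutive gaps it interpolates between the nearest valid neighbours; gaps at either end of the series are left as `None` because the text does not say how to handle them. The function name is hypothetical.

```python
def fill_gaps_linear(values):
    """Fill None entries with y = y0 + (y1 - y0) * (x - x0) / (x1 - x0),
    where (x0, y0) and (x1, y1) are the nearest valid points before and
    after the gap and x is the position (hour index) in the series.
    """
    filled = list(values)
    valid = [i for i, v in enumerate(values) if v is not None]
    for x0, x1 in zip(valid, valid[1:]):
        y0, y1 = values[x0], values[x1]
        for x in range(x0 + 1, x1):  # only positions strictly inside the gap
            filled[x] = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return filled


print(fill_gaps_linear([10, None, None, 40]))  # [10, 20.0, 30.0, 40]
```

pandas users could get the same interior filling from `Series.interpolate(method="linear", limit_area="inside")`.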
Preferably, the environmental monitoring data comprise the five meteorological parameters (air pressure, temperature, humidity, wind direction and wind speed) and pollutant concentrations.
Preferably, the historical data and the environmental monitoring data to be analyzed in step S1 are grouped by season and region, and a two-dimensional table is built for each pollutant.
The existing monitoring data are hourly. Because environmental monitoring data have strong regional and seasonal characteristics, the national data are divided by region (Northeast, Northwest, North China, Central China, South China, Southeast, Southwest) and by season (spring, summer, autumn, winter). Abnormal and missing values in the hourly data, as identified by the data-acquisition software, are automatically replaced by linear interpolation. Finally, a two-dimensional table is built for each pollutant in each region and season and used to build the corresponding model.
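A minimal sketch of the grouping step. The month-to-season mapping (meteorological seasons, December to February as winter) is an assumption, since the text names the four seasons but not their boundaries; the record layout, field names, the function name `build_tables` and the toy values are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumed month-to-season mapping; not specified in the text.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def build_tables(records, pollutants, met_fields):
    """Group hourly records into one 2-D table per (region, season, pollutant).

    Each record is a dict holding 'region', 'time' (a datetime), one value per
    pollutant and one per meteorological field. Rows are kept in time order
    and laid out as [pollutant value, *meteorological values].
    """
    tables = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["time"]):
        season = SEASON_OF_MONTH[rec["time"].month]
        met = [rec[m] for m in met_fields]
        for pol in pollutants:
            tables[(rec["region"], season, pol)].append([rec[pol]] + met)
    return dict(tables)


records = [  # toy values for illustration only
    {"region": "North China", "time": datetime(2019, 9, 1, 1),
     "PM2.5": 38.0, "NO2": 22.0, "pressure": 1011.8, "temperature": 21.9},
    {"region": "North China", "time": datetime(2019, 9, 1, 0),
     "PM2.5": 35.0, "NO2": 20.0, "pressure": 1012.0, "temperature": 22.5},
]
tables = build_tables(records, ["PM2.5", "NO2"], ["pressure", "temperature"])
print(tables[("North China", "autumn", "PM2.5")])
# [[35.0, 1012.0, 22.5], [38.0, 1011.8, 21.9]]
```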
Preferably, building the model with the training data in step S3 comprises: during modelling, the concentration of a given pollutant and the meteorological conditions at time $t$, denoted $x_t$, are used as input, and those at time $t+1$, $x_{t+1}$, as output; after learning from the training set $\{(x_t, x_{t+1})\}$, the model yields the pollutant concentration and meteorological conditions at time $t+1$;
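A minimal sketch of turning one time-ordered table into (input, next-step target) training pairs. It assumes the target is only the pollutant value at $t+1$, matching the single output node used in Example 1 (the general description also lists the meteorological conditions as output). The look-back length `seq_len` is a hypothetical parameter; `seq_len=1` reproduces the plain $x_t \to x_{t+1}$ pairing.

```python
def make_sequences(table, seq_len=1, target_col=0):
    """Build (input sequence, next-step target) pairs from a time-ordered table.

    Each input is `seq_len` consecutive rows ending at time t; the target is
    column `target_col` (the pollutant value) of the row at time t+1.
    """
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    inputs, targets = [], []
    for nxt in range(seq_len, len(table)):
        inputs.append(table[nxt - seq_len:nxt])
        targets.append(table[nxt][target_col])
    return inputs, targets


table = [[35.0, 1012.0], [38.0, 1011.8], [41.0, 1011.5], [39.0, 1011.9]]
X, y = make_sequences(table, seq_len=2)
print(len(X), y)  # 2 [41.0, 39.0]
```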
During model construction the data pass through an input layer, a hidden layer and an output layer. The hidden layer can retain the information from times $t_1, t_2, \ldots, t$ and apply it as input information at time $t+1$. Each unit is computed as follows:
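Input gate: $i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i\right)$
Forget gate: $\varphi_t = \sigma\left(W_{x\varphi} x_t + W_{h\varphi} h_{t-1} + W_{c\varphi} c_{t-1} + b_\varphi\right)$
Output gate: $\omega_t = \sigma\left(W_{x\omega} x_t + W_{h\omega} h_{t-1} + W_{c\omega} c_t + b_\omega\right)$
Candidate memory-cell value of the hidden layer at the current moment: $\tilde{c}_t = \theta\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right)$
Memory-cell state of the hidden layer at the current moment: $c_t = \varphi_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
Output of the hidden layer at the current moment: $h_t = \omega_t \odot \theta(c_t)$, where $\odot$ denotes element-wise multiplication.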
Here $i$, $\varphi$ and $\omega$ are the input gate, forget gate and output gate; $h$ is the hidden-layer output; $c$ is the hidden-layer memory-cell value; $\theta$ and $\sigma$ are the nonlinear activation functions of the gates, with $\theta$ usually the tanh function and $\sigma$ usually the logistic sigmoid function. $W_{xi}$ is the input-to-input-gate weight matrix, $W_{hi}$ the weight matrix from the previous moment's hidden-layer units to the input gate, and $W_{ci}$ the weight matrix between the hidden-layer memory cell and the input gate; $W_{x\varphi}$ is the input-layer-to-forget-gate weight matrix, $W_{h\varphi}$ the weight matrix from the previous moment's hidden-layer units to the forget gate, and $W_{c\varphi}$ the memory-cell-to-forget-gate weight matrix; $W_{x\omega}$ is the input-layer-to-output-gate weight matrix, $W_{h\omega}$ the weight matrix from the previous moment's hidden-layer units to the output gate, and $W_{c\omega}$ the memory-cell-to-output-gate weight matrix; $W_{xc}$ is the input-layer-to-memory-cell weight matrix and $W_{hc}$ the weight matrix from the previous moment's hidden-layer units to the memory cell; $b_i$, $b_\varphi$, $b_\omega$ and $b_c$ are the biases of the input gate, forget gate, output gate and hidden-layer memory cell, respectively.
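A plain-Python (no NumPy) sketch of one forward step of the peephole LSTM cell described here, with the gates $i$, $\varphi$, $\omega$ renamed `i`, `f`, `o`. It assumes the common peephole form in which the input and forget gates read the previous memory state $c_{t-1}$ and the output gate reads the updated state $c_t$, and it stores the memory-cell weights as full matrices as the text's wording suggests (many implementations use diagonal ones). The dictionary keys, the random initialisation and the toy inputs are hypothetical; training and the output layer are omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Matrix (list of rows) times vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vsum(*vecs):
    """Element-wise sum of equally long vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def lstm_step(x, h_prev, c_prev, p):
    """One peephole-LSTM step: i = input gate, f = forget gate (phi),
    o = output gate (omega), g = candidate memory value."""
    i = [sigmoid(z) for z in vsum(matvec(p["W_xi"], x), matvec(p["W_hi"], h_prev),
                                  matvec(p["W_ci"], c_prev), p["b_i"])]
    f = [sigmoid(z) for z in vsum(matvec(p["W_xf"], x), matvec(p["W_hf"], h_prev),
                                  matvec(p["W_cf"], c_prev), p["b_f"])]
    # Candidate memory value, theta = tanh
    g = [math.tanh(z) for z in vsum(matvec(p["W_xc"], x), matvec(p["W_hc"], h_prev), p["b_c"])]
    # New cell state: keep part of the old state, add part of the candidate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # The output gate reads the updated cell state c_t
    o = [sigmoid(z) for z in vsum(matvec(p["W_xo"], x), matvec(p["W_ho"], h_prev),
                                  matvec(p["W_co"], c), p["b_o"])]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (an arbitrary demo choice)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in ("i", "f", "o", "c"):
        p["W_x" + g] = mat(n_hidden, n_in)
        p["W_h" + g] = mat(n_hidden, n_hidden)
        p["b_" + g] = [0.0] * n_hidden
    for g in ("i", "f", "o"):  # memory-cell (peephole) weights; the candidate has none
        p["W_c" + g] = mat(n_hidden, n_hidden)
    return p


# 6 features per hour: 1 pollutant + 5 meteorological values (toy, pre-scaled)
params = init_params(n_in=6, n_hidden=4)
h, c = [0.0] * 4, [0.0] * 4
for x_t in ([0.35, 0.10, 0.22, 0.60, 0.50, 0.20], [0.38, 0.11, 0.21, 0.62, 0.45, 0.25]):
    h, c = lstm_step(x_t, h, c, params)
print(len(h), len(c))  # 4 4
```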
Given the strong temporal correlation between the data in the samples, the RNN-LSTM model was chosen from several common machine-learning algorithms, because it can exploit the temporal information in the input sequence to improve prediction accuracy on time-series data. Compared with an ordinary multilayer BP (back-propagation) neural network, it adds lateral connections between the hidden-layer units: through a weight matrix, the output of the neural units at the previous time step becomes an input to the current units, which gives the network a memory function.
Preferably, the training set data are used as the model input to complete model construction. In step S3, the best model parameters are selected from the validation data according to the mean absolute error (MAE) as follows: the validation data are fed into the model, the predicted values $c^{\text{pre}}$ are compared with the true values $c^{\text{true}}$ using the MAE, and the parameters giving the smallest MAE are taken as optimal:
$$\text{MAE} = \frac{1}{n}\sum_{k=1}^{n}\left|c_k^{\text{true}} - c_k^{\text{pre}}\right|$$
where $n$ is the number of samples, $c^{\text{true}}$ is the true value and $c^{\text{pre}}$ is the predicted value.
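A minimal sketch of the MAE formula and of picking the parameter setting with the smallest validation MAE. The `predict_validation` callback, which would train a model with a given setting and return its validation predictions, is a hypothetical stand-in; the toy predictor in the usage lines just makes the error shrink as the hidden size grows.

```python
def mean_absolute_error(c_true, c_pre):
    """MAE = (1/n) * sum(|c_true_k - c_pre_k|)."""
    if len(c_true) != len(c_pre) or not c_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(c_true, c_pre)) / len(c_true)


def select_best(candidates, predict_validation, c_true):
    """Return (params, mae) for the candidate with the lowest validation MAE."""
    scored = [(mean_absolute_error(c_true, predict_validation(p)), p) for p in candidates]
    best_mae, best_params = min(scored, key=lambda s: s[0])
    return best_params, best_mae


c_true = [20.0, 25.0, 30.0]
fake_predict = lambda hidden: [v + 100.0 / hidden for v in c_true]  # toy stand-in
print(select_best([100, 200, 300], fake_predict, c_true))
```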
Compared with existing means of detecting abnormal data, the invention uses manually reviewed historical data as training data to build a neural network model, then uses validation data to select the model's best parameters according to the MAE. Once built and tuned, the model is embedded in the environmental monitoring platform, where it predicts the value for the next moment from real-time monitoring data in order to judge whether the next measured value is abnormal. The invention applies machine-learning and data-mining methods to environmental monitoring, combining computer data science with environmental monitoring, and builds an analysis system by learning from monitoring data that are large in scale and governed by complex internal patterns. When building the model, the influence of meteorological conditions on the monitoring data and the temporal continuity and variation characteristics of the data are fully considered, and the test results show that the model's predictions fit the measured values well. The invention thus addresses the lack of automated quality-control means for multi-source monitoring data, screens and judges suspicious data automatically and intelligently, brings the quality control of atmospheric monitoring equipment under a single methodological system, safeguards the quality of monitoring data, advances remote automated quality-control technology for monitoring equipment, and provides strong support for later data use and for environmental forecasting and early warning.
Description of Drawings
Fig. 1 is a flow chart of the method of the invention for judging abnormal data based on the association relationships within environmental monitoring data;
Fig. 2 is a schematic diagram of building the RNN-LSTM model;
Fig. 3 shows the prediction results.
Detailed Description
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Example 1
In this example, anomaly detection is performed on the November monitoring data using air-pollutant monitoring data from a certain location for September to October 2019 (part of the pollutant and meteorological data is shown in Table 1).
The model is executed in the following steps:
Step 1: Preprocess the air-pollutant data and the corresponding meteorological data. This includes interpolating the missing values and abnormal values (caused by insufficient sampling time) identified automatically by the data-acquisition software, and labelling the data judged abnormal in manual review; see Table 2 for details.
Automatic replacement of missing and abnormal values comprises:
S11. Suppose the values of a known function $y = f(x)$ at $n+1$ distinct points $x_i$ ($i = 0, 1, 2, \ldots, n$) on the interval $[a, b]$ are $y_i$; find a polynomial $P(x)$
that satisfies $P(x_i) = y_i$ ($i = 0, 1, \ldots, n$). For linear interpolation, $n = 1$ and the two nodes are $x_0$ and $x_1$.
From analytic geometry, the straight line through $(x_0, y_0)$ and $(x_1, y_1)$ gives:
$$y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}\,(x - x_0)$$
where $x_0$, $x_1$, $y_0$, $y_1$: known statistical data;
$x$: any point between $x_0$ and $x_1$;
$y$: the interpolated value corresponding to $x$.
Step 2: Use the September data above as training data, the October data as validation data and the November data as test data, and convert the training and validation data into the sequence format required by the model.
Step 3: Build a neural network based on RNN-LSTM and feed the training data into it; the model automatically adjusts the network parameters (including the weights and thresholds, i.e. biases, of the network nodes) according to the prediction error on the training data until training ends. Training uses a standard three-layer network: one input layer (6 input nodes), one hidden layer and one output layer (1 output node). The model hyperparameters (including the number of hidden-layer nodes and the learning rate) are tuned according to performance on the validation data: the hidden layer has 500 nodes in the PM10 model, 300 in the PM2.5 model, 200 in the CO model, and 100 each in the NO2, O3 and SO2 models. Figure 2 shows the model-building flow.
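Example 1 gives only the hidden-layer sizes; the sketch below converts them into trainable-parameter counts to show how much larger the PM10 model is than the gaseous-pollutant models. It assumes a standard one-layer LSTM with one bias vector per gate and a dense single-node output layer, and it leaves out the memory-cell-to-gate weights listed in the description, which would add $3H^2$ weights if stored as full $H \times H$ matrices ($3H$ if diagonal). The dictionary and function names are illustrative.

```python
# Hidden-layer sizes reported in Example 1 (6 input nodes, 1 output node).
HIDDEN_UNITS = {"PM10": 500, "PM2.5": 300, "CO": 200, "NO2": 100, "O3": 100, "SO2": 100}
N_INPUTS = 6
N_OUTPUTS = 1


def lstm_param_count(n_in, n_hidden, n_out=1):
    """Parameters of a one-layer standard LSTM plus a dense output layer.

    4 gate blocks (input, forget, output, candidate), each with an
    input matrix, a recurrent matrix and one bias vector.
    """
    gates = 4 * (n_hidden * (n_in + n_hidden) + n_hidden)
    dense = n_out * (n_hidden + 1)
    return gates + dense


for pollutant, n_hidden in HIDDEN_UNITS.items():
    print(pollutant, lstm_param_count(N_INPUTS, n_hidden, N_OUTPUTS))
```

Under these assumptions the PM10 model has about 1.01 million parameters against about 43 thousand for each 100-node model. Frameworks that keep two bias vectors per gate, such as PyTorch's `nn.LSTM` (`bias_ih` and `bias_hh`), add another $4H$.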
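The model formulas are:
Input gate: $i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i\right)$
Forget gate: $\varphi_t = \sigma\left(W_{x\varphi} x_t + W_{h\varphi} h_{t-1} + W_{c\varphi} c_{t-1} + b_\varphi\right)$
Output gate: $\omega_t = \sigma\left(W_{x\omega} x_t + W_{h\omega} h_{t-1} + W_{c\omega} c_t + b_\omega\right)$
Candidate memory-cell value of the hidden layer at the current moment: $\tilde{c}_t = \theta\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right)$
Memory-cell state of the hidden layer at the current moment: $c_t = \varphi_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
Output of the hidden layer at the current moment: $h_t = \omega_t \odot \theta(c_t)$, where $\odot$ denotes element-wise multiplication.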
Step 4: Feed the November test data into the trained model and compare the predictions with the measured values; the results for part of the test data are shown in Figure 3.
Evaluated by mean absolute error, the MAE is about 0.08 for CO, 6.85 for NO2, 10.45 for O3, 7.80 for PM10, 4.97 for PM2.5 and 3.38 for SO2. Because the size of the MAE depends on the scale of each pollutant's data, these values are not directly comparable across pollutants; overall, the test results for the November data are good.
Step 5: Judge abnormality from the absolute difference between the predicted and measured values: data whose difference falls outside the model prediction error MAE ± 30% of the measured value are judged abnormal by the model. Table 3 lists some of the predicted values, bia and MAE values output by the model, and Table 4 lists some of the model's abnormality judgments.
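As an illustration only, the Step 5 rule can be worked through with the NO2 MAE of about 6.85 from Step 4 and a hypothetical measured value $c^{\text{true}} = 40$ with two hypothetical predictions:

```latex
\begin{aligned}
\text{margin} &= 0.3 \times c^{\text{true}} = 0.3 \times 40 = 12,\\
\text{band} &= [\,\text{MAE} - 12,\ \text{MAE} + 12\,] = [-5.15,\ 18.85],\\
c^{\text{pre}} = 52 &\;\Rightarrow\; \text{bia} = |52 - 40| = 12 \le 18.85 \;\Rightarrow\; \text{normal},\\
c^{\text{pre}} = 62 &\;\Rightarrow\; \text{bia} = |62 - 40| = 22 > 18.85 \;\Rightarrow\; \text{abnormal}.
\end{aligned}
```

Here the lower end of the band is negative, so only the upper bound can be exceeded; because the margin is 30% of the measured value, the allowed error also grows with the concentration.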
Table 1 Part of the pollutant and meteorological data for a certain location
Note: NA denotes missing data; RM denotes data judged abnormal in manual review
Table 2 Data after preprocessing
Table 2 (continued)
Table 3 NO2 predicted data, measured data, MAE and bia values at selected times, as judged by the model
Note: a 1 in the NO2_Mark column means the model judged the data abnormal
Table 4 Part of the abnormal pollutant data finally identified by the model
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010801821.0A CN112036075A | 2020-08-11 | 2020-08-11 | Abnormal data judgment method based on environmental monitoring data association relation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112036075A | 2020-12-04 |
Family ID: 73578362
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101149819A (en) * | 2007-10-31 | 2008-03-26 | 山东省科学院海洋仪器仪表研究所 | Meteorological element real time data singular value elimination method |
US20120226653A1 (en) * | 2009-09-24 | 2012-09-06 | Mclaughlin Michael John | Method of contaminant prediction |
CN103336906A (en) * | 2013-07-15 | 2013-10-02 | 哈尔滨工业大学 | Sampling GPR method of continuous anomaly detection in collecting data flow of environment sensor |
CN105303051A (en) * | 2015-11-11 | 2016-02-03 | 中国科学院遥感与数字地球研究所 | Air pollutant concentration prediction method |
CN108900546A (en) * | 2018-08-13 | 2018-11-27 | 杭州安恒信息技术股份有限公司 | The method and apparatus of time series Network anomaly detection based on LSTM |
CN109302410A (en) * | 2018-11-01 | 2019-02-01 | 桂林电子科技大学 | A method, system and computer storage medium for detecting abnormal behavior of internal users |
CN109738939A (en) * | 2019-03-21 | 2019-05-10 | 蔡寅 | A kind of Precursory Observational Data method for detecting abnormality |
CN110008979A (en) * | 2018-12-13 | 2019-07-12 | 阿里巴巴集团控股有限公司 | Abnormal data prediction technique, device, electronic equipment and computer storage medium |
CN111144286A (en) * | 2019-12-25 | 2020-05-12 | 北京工业大学 | An urban PM2.5 concentration prediction method integrating EMD and LSTM |
CN111241673A (en) * | 2020-01-07 | 2020-06-05 | 北京航空航天大学 | Health state prediction method for industrial equipment in noisy environment |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112613233A (en) * | 2020-12-18 | 2021-04-06 | 中国环境监测总站 | Algorithm for discovering environmental monitoring abnormal data based on single-classification support vector machine model |
CN112767653A (en) * | 2020-12-21 | 2021-05-07 | 武汉达梦数据技术有限公司 | Geological disaster professional monitoring data acquisition method and system |
CN114826988A (en) * | 2021-01-29 | 2022-07-29 | 中国电信股份有限公司 | Method and device for anomaly detection and parameter filling of time sequence data |
CN113569324A (en) * | 2021-08-03 | 2021-10-29 | 招商局重庆交通科研设计院有限公司 | Anomaly data analysis and optimization method for slope deformation monitoring |
CN113434854A (en) * | 2021-08-26 | 2021-09-24 | 中国电子信息产业集团有限公司 | Method for generating data element based on sandbox environment and storage medium |
CN114035553A (en) * | 2021-11-16 | 2022-02-11 | 湖南机电职业技术学院 | Control system fault detection method and device based on system identification and fitting degree |
CN114035553B (en) * | 2021-11-16 | 2023-11-24 | 湖南机电职业技术学院 | Control system fault detection method and device based on system identification and fitting degree |
CN114511784A (en) * | 2022-02-16 | 2022-05-17 | 平安国际智慧城市科技股份有限公司 | Environment monitoring and early warning method, device, equipment and storage medium |
CN114638290A (en) * | 2022-03-07 | 2022-06-17 | 廖彤 | Environment monitoring instrument fault prediction method based on edge calculation and BP neural network |
CN114638290B (en) * | 2022-03-07 | 2024-04-30 | 廖彤 | Environment monitoring instrument fault prediction method based on edge calculation and BP neural network |
CN114648238A (en) * | 2022-03-31 | 2022-06-21 | 广西博世科环保科技股份有限公司 | Artificial intelligent automatic dosing method in sewage treatment scene |
CN115080909A (en) * | 2022-07-15 | 2022-09-20 | 深圳市城市交通规划设计研究中心股份有限公司 | Analysis method for influencing data of internet of things sensing equipment, electronic equipment and storage medium |
CN115080909B (en) * | 2022-07-15 | 2022-11-25 | 深圳市城市交通规划设计研究中心股份有限公司 | Analysis method for influencing data of internet of things sensing equipment, electronic equipment and storage medium |
CN116576553A (en) * | 2023-07-11 | 2023-08-11 | 韦德电子有限公司 | Data optimization acquisition method and system for air conditioner |
CN116576553B (en) * | 2023-07-11 | 2023-09-22 | 韦德电子有限公司 | Data optimization acquisition method and system for air conditioner |
CN117074627A (en) * | 2023-10-16 | 2023-11-17 | 三科智能(山东)集团有限公司 | Medical laboratory air quality monitoring system based on artificial intelligence |
CN117074627B (en) * | 2023-10-16 | 2024-01-09 | 三科智能(山东)集团有限公司 | Medical laboratory air quality monitoring system based on artificial intelligence |
CN118013212A (en) * | 2024-01-19 | 2024-05-10 | 中移雄安信息通信科技有限公司 | Regional environment monitoring method and device based on space-time remote sensing |
Similar Documents
Publication | Title |
---|---|
CN112036075A (en) | Abnormal data judgment method based on environmental monitoring data association relation | |
CN109063366B (en) | Building performance data online preprocessing method based on time and space weighting | |
Lei et al. | A comprehensive evaluation method for indoor air quality of buildings based on rough sets and a wavelet neural network | |
CN107577910B (en) | A vehicle exhaust concentration inversion method based on deep neural network | |
CN107436277B (en) | A single-index data quality control method based on similarity distance discrimination | |
CN108764601A (en) | A kind of monitoring structural health conditions abnormal data diagnostic method based on computer vision and depth learning technology | |
CN111191855B (en) | Water quality abnormal event identification and early warning method based on pipe network multi-element water quality time sequence data | |
CN113987908A (en) | Natural gas pipe network leakage early warning method based on machine learning method | |
CN114023399A (en) | Air particulate matter analysis early warning method and device based on artificial intelligence | |
CN111179591A (en) | Road network traffic time sequence characteristic data quality diagnosis and restoration method | |
CN110824914A (en) | Intelligent wastewater treatment monitoring method based on PCA-LSTM network | |
CN114169254A (en) | Abnormal energy consumption diagnosis method and system based on short-term building energy consumption prediction model | |
CN110570013B (en) | Single-station online wave period data prediction diagnosis method | |
CN113988210A (en) | Distorted data restoration method, device and storage medium for structural monitoring sensor network | |
CN113688506B (en) | Potential atmospheric pollution source identification method based on multi-dimensional data such as micro-station and the like | |
CN117520989A (en) | Natural gas pipeline leakage detection method based on machine learning | |
CN118861957A (en) | An air quality detection method based on multi-sensor monitoring | |
Liu et al. | Research on data correction method of micro air quality detector based on combination of partial least squares and random forest regression | |
CN114819102A (en) | GRU-based air conditioning equipment fault diagnosis method | |
CN117109582A (en) | Air pollution source positioning system and method combining sensor network and machine learning | |
CN108257365A (en) | A kind of industrial alarm designs method based on global nonspecific evidence dynamic fusion | |
CN113836813B (en) | Blast furnace tuyere water leakage detection method based on data analysis | |
KR20240059742A (en) | Air pollutant concentration correction and error expectation system using artificial intelligence | |
Hu et al. | The early warning model of dust concentration in smart construction sites based on long short term memory network | |
CN115526330A (en) | Organic matter navigation data calibration method and device, computer equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201204 |