CN115470850A - A water quality anomaly event recognition and early warning method based on water quality spatiotemporal data of pipe network - Google Patents
- Publication number
- CN115470850A (application number CN202211104588.6A)
- Authority
- CN
- China
- Prior art keywords
- water quality
- model
- data
- probability
- event
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/18—Water
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A20/00—Water conservation; Efficient water supply; Efficient water use
- Y02A20/152—Water filtration
Landscapes
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Pathology (AREA)
- Data Mining & Analysis (AREA)
- Analytical Chemistry (AREA)
- Immunology (AREA)
- Medicinal Chemistry (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Biochemistry (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Food Science & Technology (AREA)
- Testing And Monitoring For Control Systems (AREA)
Abstract
Description
Technical field
The invention belongs to the technical field of water treatment in water supply networks and relates to a method for identifying and giving early warning of abnormal water quality events based on spatiotemporal water quality data from the pipe network.
Background art
A water supply network is critical urban infrastructure that delivers water to users safely and reliably. However, particularly in developing countries such as China, contamination incidents often occur in water supply networks because of aging pipes, inadequate scheduling and maintenance management, and poor construction quality. When a contamination incident occurs, the contaminated water spreads rapidly through the entire network and creates a system-wide risk unless the contamination is quickly detected and removed. This causes not only supply interruptions and heavy economic losses but also environmental damage and public health problems. Rapid and accurate detection of contamination events is therefore essential to the safety of urban water supply: it helps water utilities formulate remedial measures, reduce the losses caused by contamination events, and improve their service level and public standing.
Traditionally, specific substances in drinking water are detected by on-site sampling and laboratory analysis. This approach can test many types of water quality parameters or identify contaminants directly. For large water supply networks, however, it is time-consuming and labor-intensive and, most importantly, cannot provide timely early warning of contamination events. At present, water utilities in some Chinese cities have established relatively complete sensor monitoring networks (SCADA systems) in their local water supply systems. Online water quality sensors monitor changes in routine water quality parameters at key nodes of the network in real time. On this basis, water quality incidents in the network can be analyzed and responded to quickly, improving the utility's level of service.
Research on identifying water contamination incidents falls into three main groups: methods based on statistical analysis models, methods based on hydraulic models, and methods based on machine learning models. Statistical methods usually detect contamination from the distribution of water quality parameter data in the network. Because water quality data are nonlinear and non-stationary, statistical methods are generally unsuited to detecting small abnormal changes in network water quality. Hydraulic-model-based methods detect contamination by comparing observed real-time water quality data with values predicted by a hydraulic and water quality model of the network. However, because of the complexity of the network topology and limits on available data, an accurate hydraulic and water quality model is hard to obtain in practice. Machine learning algorithms are regarded as an alternative for predicting changes in water quality parameters and identifying contamination in real time. Various algorithms have been applied to water quality anomaly detection in supply networks, including artificial neural networks (ANN), support vector machines (SVM), and long short-term memory (LSTM) networks; these models capture the features of the water quality time series at a single sensor station. They do not, however, exploit the spatial relationships among sensor data from multiple stations, so they produce many false alarms when routine hydraulic operations cause large changes at a monitoring station during normal operation. Overall, existing models suffer in practice from low accuracy, low detection rates for contamination events, and many false alarms.
Summary of the invention
To address the above shortcomings, the problem to be solved by the invention is to provide a method for identifying and giving early warning of abnormal water quality incidents in a water supply network. In complex water supply systems for which an accurate hydraulic and water quality model cannot be obtained, the method performs correlation analysis on multivariate water quality data across a spatial range, captures the spatiotemporal distribution characteristics of water quality in the network, automates contamination monitoring in real time, raises the model's detection rate for contamination incidents, and reduces the number of false alarms.
To achieve the above object, the invention adopts the following technical solution:
A method for identifying and giving early warning of abnormal water quality events based on spatiotemporal water quality data of a pipe network, comprising the following steps:
(1) Select the water quality parameters to monitor and the water quality sensor stations. By analyzing the time interval at which sensors receive contaminated water and the trend of change, adjacent monitoring stations in the network that show the same trend are grouped for analysis. The N sensor stations are divided into N+1 groups: each of the first N groups contains the multi-parameter water quality monitoring data of a single sensor station, and one further group contains the multi-parameter water quality monitoring data of all N sensor stations.
(2) Standardize the water quality data of the N+1 groups, then convert the preprocessed water quality time series in each group into superimposed images. Assume each station collects Nr water quality parameters; at each time step, V water quality values (V = Nr × N) are collected from the N sensor stations under analysis. At time t, the superimposed image of each group can be expressed as

$$m_t = \left[ m_t^{i,j} \right]_{V \times V}, \qquad m_t^{i,j} = x_t^{i} + x_t^{j} \tag{1}$$

where $m_t^{i,j}$ is the element in row i and column j of the superimposed image $m_t$ at time t, calculated as the sum of the i-th and j-th of the V water quality values collected at time t, written here as $x_t^{i}$ and $x_t^{j}$. When N = 1, the superimposed image conversion only fuses the multi-parameter data of a single sensor station; when N > 1, it fuses the spatiotemporal distribution data of multiple stations. To reduce the influence of noise, the superimposed image at each time step is taken as the average over the past d time steps.
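The image conversion in step (2) can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the sketch assumes that "the past d time steps" includes the current step.

```python
import numpy as np

def overlay_image(x):
    """Build the V x V superimposed image with m[i, j] = x[i] + x[j].

    x: 1-D array of the V standardized water quality values at one time step.
    """
    x = np.asarray(x, dtype=float)
    return x[:, None] + x[None, :]

def averaged_overlay_images(series, d):
    """Average superimposed images over windows of d consecutive time steps.

    series: (T, V) array of standardized values.
    Returns an array of shape (T - d + 1, V, V).
    Assumption: each window ends at, and includes, the current time step.
    """
    series = np.asarray(series, dtype=float)
    n_steps = series.shape[0]
    if not 1 <= d <= n_steps:
        raise ValueError("d must be between 1 and the number of time steps")
    images = series[:, :, None] + series[:, None, :]          # (T, V, V)
    # Running mean via cumulative sums along the time axis.
    csum = np.cumsum(images, axis=0)
    csum = np.concatenate([np.zeros((1,) + images.shape[1:]), csum], axis=0)
    return (csum[d:] - csum[:-d]) / d
```

Because the image is symmetric, every pair of parameters (within a station or across stations) contributes one pixel pair, which is how the multi-station group exposes spatial relationships to the convolutional networks.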
(3) Construct an adversarial learning network model (GAN).
The GAN model consists of a generator G and a discriminator D. The generator G is a convolutional neural network comprising an encoder that compresses image information and a decoder that restores it; the discriminator D is a classification convolutional neural network that compresses and extracts image features. First, step (2) is used to convert the multivariate time series of the analyzed water quality monitoring stations under normal operating conditions into superimposed images. The historical superimposed images of the past K time steps are then fed to the GAN model, and the generator network G generates the superimposed image $\hat{m}$ for the next time step. Next, the generated image $\hat{m}$ and the superimposed image m built from measured data are fed to the discriminator network, which distinguishes the measured superimposed image m (real) from the superimposed image $\hat{m}$ reconstructed by the generator G (fake).
The GAN model is trained with the improved W-GAN loss as the discriminator training loss $L_D$, which stabilizes the training process. It is expressed as

$$L_D = \mathbb{E}\left[D(\hat{m})\right] - \mathbb{E}\left[D(m)\right] + \lambda_{GP}\,\mathbb{E}\left[\left(\left\lVert \nabla_{\tilde{m}} D(\tilde{m}) \right\rVert_2 - 1\right)^2\right] \tag{2}$$

where $\mathbb{E}$ denotes the mathematical expectation; $D(\cdot)$ is the feature vector finally output by the discriminator D; $\hat{m}$ is the superimposed image reconstructed by the generator; m is the superimposed image built from measured data; $\tilde{m}$ is the interpolated image formed from a pair of real and reconstructed superimposed images; $\nabla_{\tilde{m}} D(\tilde{m})$ is the gradient with respect to the interpolated image $\tilde{m}$; and $\lambda_{GP}$ is the coefficient of the gradient penalty term, set to the empirical value 10.
The generator is trained with the reconstruction loss between the real and generated superimposed images as the generator loss $L_G$, which helps the generator learn the distribution of normal water quality data. It is expressed as

$$L_G = \mathbb{E}\left[\left\lVert m - \hat{m} \right\rVert\right] \tag{3}$$

where m is the real superimposed image and $\hat{m}$ is the generated one.
In the GAN model, the two networks G and D are trained and updated simultaneously. The goal of training is not to minimize the loss of either network alone but to reach an equilibrium in which the losses of both G and D have converged to stable values.
(4) Construct the anomaly score ψ from the generator and discriminator of the GAN model trained in step (3). The GAN-based anomaly score has two parts: anomaly identification with the generator network and anomaly identification with the discriminator network. The generator part is the reconstruction loss between the image $\hat{m}$ generated by the generator network and the superimposed image m built from measured data. The discriminator part is the feature loss between the final output feature vectors $D(m)$ and $D(\hat{m})$ obtained by feeding the measured superimposed image and the generated image into the discriminator network separately. The final GAN-based anomaly score ψ(t) combines these two losses, weighted by $\lambda_s$ (equation (4)).
Here ψ(t) denotes the anomaly score computed by the GAN model at time t. The closer the score is to 0, the more normal the current water quality state. Conversely, the larger the score, the more the water quality at that time differs from water quality during normal operation, and the more likely it is that the water quality is abnormal. $\lambda_s$ is a weighting parameter that adjusts the relative importance of the reconstruction loss and the feature loss in the anomaly score.
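The source text does not preserve the exact expression for ψ(t), so the following is only a sketch of how the two parts described in step (4) might be combined. It assumes a mean absolute difference for both losses and a convex weighting by `lambda_s`; both choices, and the function name, are assumptions rather than the patented formula.

```python
import numpy as np

def anomaly_score(m, m_hat, feat_real, feat_fake, lambda_s):
    """Combine reconstruction loss and discriminator feature loss.

    m, m_hat: measured and generated superimposed images (same shape).
    feat_real, feat_fake: discriminator feature vectors D(m) and D(m_hat).
    lambda_s: weight of the feature loss, in [0, 1] (assumed convex weighting).
    """
    # Reconstruction loss (assumed L1): how far the generated image is from the data.
    recon = np.abs(np.asarray(m, float) - np.asarray(m_hat, float)).mean()
    # Feature loss (assumed L1): how differently the discriminator sees the two images.
    feat = np.abs(np.asarray(feat_real, float) - np.asarray(feat_fake, float)).mean()
    return (1.0 - lambda_s) * recon + lambda_s * feat
```

With either weighting form, a score of 0 means the generator reproduced the measured image exactly and the discriminator features coincide, which matches the interpretation given for ψ(t).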
(5) Select a suitable anomaly score threshold $\psi_{thre}$; time steps whose anomaly score from step (4) exceeds $\psi_{thre}$ are identified as initial outliers. Because the GAN model in step (3) is built and trained on water quality data from normal operation, the threshold is chosen mainly from the distribution of anomaly scores over the GAN training data set. The anomaly scores of the water quality data set used to build and train the GAN model in step (3) are computed with step (4) and sorted in ascending order; depending on the actual distribution, a point between the 96% and 99% quantiles is chosen as the threshold $\psi_{thre}$, so that most points in the training data set fall within the normal range.
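A minimal sketch of the quantile-based threshold in step (5), using NumPy. The default quantile of 0.98 is simply one value inside the stated 96% to 99% range, and the function names are hypothetical.

```python
import numpy as np

def score_threshold(train_scores, quantile=0.98):
    """Pick the anomaly score threshold as a quantile of the training scores.

    quantile: a value in the 0.96-0.99 range described in step (5);
    0.98 is an illustrative default, not a value from the source.
    """
    return float(np.quantile(np.asarray(train_scores, dtype=float), quantile))

def flag_outliers(scores, threshold):
    """Return a boolean array marking time steps whose score exceeds the threshold."""
    return np.asarray(scores, dtype=float) > threshold
```

By construction, roughly a fraction `1 - quantile` of the training time steps exceed the threshold, which is the quantity the method later uses as the false positive rate.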
(6) Use the sequential Bayesian principle to calculate the probability of an abnormal water quality event. Based on the outlier identification results from step (5), the probability P(t) that a contamination event is occurring at each time step is calculated as

$$P(t) = \begin{cases} \dfrac{TPR \cdot P(t-1)}{TPR \cdot P(t-1) + FPR \cdot \left(1 - P(t-1)\right)}, & \text{if time } t \text{ is identified as an outlier} \\[2ex] \dfrac{(1 - TPR) \cdot P(t-1)}{(1 - TPR) \cdot P(t-1) + (1 - FPR) \cdot \left(1 - P(t-1)\right)}, & \text{otherwise} \end{cases} \tag{5}$$

where TPR is the true positive rate, calculated as the ratio of the number of time steps correctly classified as abnormal while the network is contaminated to the total number of contaminated time steps. When there is no prior knowledge of contamination events, TPR is set to 0.5. FPR is the false positive rate, calculated as the ratio of the number of time steps the model identifies as outliers while the network is operating normally without contamination to the total number of time steps of normal operation; this is equivalent to the ratio of the number of time steps in the training set that exceed the anomaly score threshold to the total number of time steps in the training set.
The probability of a contamination event at the initial time is P(0). Because contamination events are rare in practice, a small value $P(0) \in [10^{-6}, 10^{-4}]$ is used. When the calculated probability P(t) exceeds a threshold $P_{thre}$, the model identifies a contamination event. A low probability threshold raises the event detection rate but may increase the number of false alarms; a high threshold makes alarms more reliable and reduces false alarms but may detect fewer contamination events. The threshold $P_{thre}$ is set according to the decision maker's trade-off between detection rate and false alarm rate and is usually set to a probability above 70%.
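The probability update in step (6) can be sketched as below, assuming the standard sequential Bayesian form in which an outlier multiplies the odds of an event by TPR/FPR and a normal point multiplies them by (1 − TPR)/(1 − FPR). The default `fpr=0.02` is an illustrative value (it would correspond to a 98% quantile threshold), and the function names are hypothetical.

```python
def bayes_update(p_prev, is_outlier, tpr=0.5, fpr=0.02):
    """One sequential Bayesian update of the contamination event probability.

    p_prev: event probability at the previous time step, P(t-1).
    is_outlier: True if the current time step exceeded the score threshold.
    tpr: true positive rate (0.5 when no prior knowledge is available).
    fpr: false positive rate (illustrative default; estimated from training data).
    """
    if is_outlier:
        num = tpr * p_prev
        den = tpr * p_prev + fpr * (1.0 - p_prev)
    else:
        num = (1.0 - tpr) * p_prev
        den = (1.0 - tpr) * p_prev + (1.0 - fpr) * (1.0 - p_prev)
    return num / den

def event_probabilities(flags, p0=1e-5, tpr=0.5, fpr=0.02):
    """Run the update over a sequence of outlier flags, starting from P(0) = p0."""
    probs, p = [], p0
    for flag in flags:
        p = bayes_update(p, bool(flag), tpr, fpr)
        probs.append(p)
    return probs
```

With these illustrative values, a single outlier barely moves the probability from 1e-5, while several consecutive outliers push it past a 70% threshold, so isolated noisy points do not trigger an alarm. In a long run of normal time steps the probability shrinks geometrically; the source does not specify a lower bound, so a practical implementation may need one.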
(7) Routine hydraulic changes in the operation of the water supply network can cause abrupt short-term changes in water quality parameters. To distinguish normal water quality changes from contamination events, the calculated probability is smoothed with a simple exponential smoothing model, which replaces the probability calculated for time t by its smoothed value:

$$P(t) \leftarrow \alpha P(t) + (1 - \alpha) P(t-1) \tag{6}$$

where α is the smoothing parameter, which determines how much weight is given to the most recently updated event probability, α ∈ [0.3, 0.9]. The smaller α is, the more slowly the event probability is updated and the more outliers must be identified before the alarm threshold is reached.
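A short sketch of the exponential smoothing in step (7). It assumes that P(t−1) on the right-hand side is the previously smoothed value and that smoothing is applied to a series of raw probabilities; the source does not spell out either point, and the function name is hypothetical.

```python
def smooth_probabilities(raw_probs, alpha=0.6, p0=1e-5):
    """Exponentially smooth a series of event probabilities.

    raw_probs: probabilities from the Bayesian update, one per time step.
    alpha: smoothing parameter in [0.3, 0.9]; 0.6 is the embodiment's value.
    p0: initial probability P(0), used as the first "previous" value.
    """
    smoothed, prev = [], p0
    for p in raw_probs:
        prev = alpha * p + (1.0 - alpha) * prev   # weight recent value by alpha
        smoothed.append(prev)
    return smoothed
```

A smaller `alpha` keeps more of the previous value, so a brief spike caused by a routine hydraulic change decays before it reaches the alarm threshold.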
(8) Calculate the probability of an abnormal water quality incident with the single-station model and with the multi-station model. The single-station model applies the anomaly detection method of steps (3)-(7) separately to each of the first N groups of water quality data from steps (1)-(2): a GAN model is built for each station, and the maximum of the contamination event probabilities calculated by the N GAN models is taken as the event probability $P_{single}$ of the single-station model. The multi-station model applies the anomaly detection method of steps (3)-(7) to the (N+1)-th group of water quality data from steps (1)-(2), which contains the data monitored at all N water quality monitoring stations: one GAN model is built, giving the contamination event probability $P_{multi}$ of the multi-station model.
(9) To make full use of the water quality relationships both between and within monitoring stations, the event probabilities calculated by the single-station and multi-station models are fused. Based on the multivariate water quality parameters of all monitoring stations, this gives a combined event probability $P_{all}$ that reflects the likelihood of a contamination event:

$$P_{all}(t) = \eta P_{single}(t) + (1 - \eta) P_{multi}(t) \tag{7}$$

where $P_{single}(t)$ and $P_{multi}(t)$ are the contamination event probabilities at time t calculated by the single-station model and the multi-station model, respectively, and η is the importance weight that adjusts the influence of the two models on the joint decision identifying a contamination event. Both models in the invention are unsupervised and have no contamination information in advance, so η = 0.5 is used, giving the single-station and multi-station models equal importance in the final contamination alarm. When the combined event probability $P_{all}$ exceeds the probability threshold $P_{thre}$ preset in step (6), the final model raises an alarm and reports the probability $P_{all}$ of an abnormal water quality incident.
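Steps (8) and (9) reduce to a maximum over the single-station probabilities followed by a weighted sum, as in this sketch. The function name is hypothetical; the default threshold of 0.8 is the value used in the embodiment.

```python
def fuse_probabilities(p_single_per_station, p_multi, eta=0.5, p_thre=0.8):
    """Fuse single-station and multi-station event probabilities at one time step.

    p_single_per_station: probabilities from the N single-station GAN models.
    p_multi: probability from the multi-station GAN model.
    eta: importance weight of the single-station result (0.5 = equal weight).
    p_thre: alarm threshold (0.8 in the embodiment; the text suggests above 0.7).
    Returns (p_all, alarm).
    """
    p_single = max(p_single_per_station)            # step (8): worst-case station
    p_all = eta * p_single + (1.0 - eta) * p_multi  # step (9): equation (7)
    return p_all, p_all > p_thre
```

For example, with station probabilities 0.2, 0.9 and 0.5 and a multi-station probability of 0.8, the combined probability is 0.85 and an alarm is raised; if the multi-station model reported 0.1 instead, the combined value of 0.5 would stay below the threshold.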
The beneficial effects of the invention are:
(1) The proposed method for detecting abnormal water quality incidents takes into account the correlated water quality data of multiple stations and improves the detection accuracy for contamination events by fusing the results of the single-station and multi-station anomaly detection models.
(2) The proposed method is robust: it tolerates a certain amount of noise and non-stationary water quality data, and it places no specific requirements on the types of water quality indicators or the number of monitoring stations, which broadens the scope of application of the invention.
(3) Most existing methods for identifying contamination incidents in pipe networks need actual contamination data for model training or parameter setting, yet in practice it is difficult to collect a large number of contamination events for training. By contrast, the proposed method is unsupervised: building and training the model requires only water quality data from normal operation of the network, so the model is more widely applicable and more practical.
Description of drawings
Figure 1 is a flow chart of model construction;
Figure 2 is a layout of a city's water supply network and sensor monitoring stations;
Figure 3 is a schematic diagram of the structure of the adversarial learning model (GAN);
Figure 4 shows the final identification and early warning results for abnormal water quality events on the test set, obtained with the single-station model, the multi-station model, and the combined single-station and multi-station model.
Detailed description
To present the technical solution and advantages of the invention more clearly, the invention is described in detail below with reference to the drawings and an embodiment. The embodiment is only a specific illustration of the invention, and the implementation of the invention is not limited to it.
Embodiment 1.
With reference to Figure 1, the specific implementation steps of the invention are as follows:
S1, data preparation and processing. The model is built and trained on water quality data from sensor monitoring stations under normal conditions, and its performance is evaluated on data containing contamination events. A hydraulic and water quality model of the pipe network is used to simulate the water quality data of multiple sensor monitoring stations in the network. The N selected sensors are divided into N+1 groups: each of the first N groups contains the water quality parameter data of one station, and the last group contains the water quality parameter data of all N stations. Two kinds of data are required, water quality data from normal operation and water quality data with contamination events, prepared as follows:
S11, normal water quality data. Time series of several water quality indicators under normal operating conditions are input at the water source; the indicators include, but are not limited to, residual chlorine, pH, temperature, conductivity, turbidity, and TOC (total organic carbon). The hydraulic and water quality model of the network is run, and the water quality data at the selected monitoring stations are collected. The original normal data are split into a 70% training set and a 30% test set.
S12, water quality data containing contamination events. To ensure that contamination events affect the selected monitoring stations, the events are placed near the water source. Few abnormal water quality events are recorded during network operation, and their occurrence depends strongly on the network environment, so the invention follows the event simulation approach used in related research: changes in water quality parameters are simulated with a Gaussian-shaped profile, and the duration of each contamination event is set to 10 hours. For each event, the number of affected water quality indicators (3 to 6), the direction of change of each affected indicator (increase or decrease), and the magnitude of change (1.0 to 2.5) are chosen at random. The contamination is added to the test set data, and the hydraulic and water quality model is used to obtain the resulting changes in water quality parameters at each station.
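The Gaussian-shaped perturbation described in S12 might be sketched as follows. This only illustrates the shape of the injected signal; it does not reproduce the hydraulic and water quality simulation that propagates the event to each station. The function names are hypothetical, the 120-step duration assumes 10 hours at a 5-minute sampling interval, and the Gaussian width and the units of the magnitude (here, units of the standardized data) are assumptions.

```python
import numpy as np

def gaussian_event(n_steps=120, amplitude=1.5, direction=1):
    """Gaussian-shaped deviation lasting n_steps (120 steps = 10 h at 5 min).

    Assumption: the width is chosen so the bump spans about +/- 3 sigma.
    """
    t = np.arange(n_steps)
    center = (n_steps - 1) / 2.0
    sigma = n_steps / 6.0
    return direction * amplitude * np.exp(-0.5 * ((t - center) / sigma) ** 2)

def inject_event(series, start, n_affected, rng, n_steps=120):
    """Add one simulated event to a (T, V) array; returns (new_series, columns)."""
    series = np.array(series, dtype=float)          # work on a copy
    if start < 0 or start + n_steps > series.shape[0]:
        raise ValueError("event does not fit inside the series")
    cols = rng.choice(series.shape[1], size=n_affected, replace=False)
    for c in cols:
        amplitude = rng.uniform(1.0, 2.5)           # magnitude range from S12
        direction = rng.choice([-1, 1])             # decrease or increase
        series[start:start + n_steps, c] += gaussian_event(n_steps, amplitude, direction)
    return series, cols
```

A call such as `inject_event(data, start=50, n_affected=3, rng=np.random.default_rng(0))` perturbs three randomly chosen indicators and leaves the rest of the series unchanged.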
S2, superimposed image conversion. The normal and contaminated water quality data collected at sensor stations 1, 2, and 3 are converted into images. To keep the single-station and multi-station superimposed images the same size, after the time series are converted into superimposed images, image padding is used to bring all final images to the same size (32×32), with the outer border of each image filled with zeros. To reduce the influence of noise, the superimposed image at each time step is taken as the average over the past 5 time steps.
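The zero padding in S2 can be sketched with `numpy.pad`. The sketch assumes the original image is centered inside the 32×32 frame, which the text suggests ("outer border filled with zeros") but does not state explicitly; the function name is hypothetical.

```python
import numpy as np

def pad_to_size(image, size=32):
    """Zero-pad a square V x V image to size x size, keeping it centered.

    With 6 parameters per station, single-station images are 6 x 6 and the
    three-station image is 18 x 18; both end up 32 x 32.
    """
    image = np.asarray(image, dtype=float)
    v = image.shape[0]
    if image.shape != (v, v) or v > size:
        raise ValueError("expected a square image no larger than the target size")
    before = (size - v) // 2
    after = size - v - before
    return np.pad(image, ((before, after), (before, after)),
                  mode="constant", constant_values=0.0)
```

Padding to a common size lets every GAN share the same network structure and parameter settings regardless of how many stations feed it.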
S3, construct the adversarial learning models and use the trained models to calculate the anomaly score at each time step. For the superimposed images converted from the water quality data of the N single stations and the one multi-station group, a separate adversarial learning model (GAN) is built and trained for each; all of the models share the same structure and parameter settings. The generator uses an autoencoder structure with a convolutional neural network: it takes the 30 historical images (2.5 hours) as input and outputs the estimated superimposed image for the current time step. The discriminator D uses a convolutional neural network structure: it takes the 30 historical images as conditional input together with either the superimposed image built from real data at the current time step or the superimposed image generated by the generator, and outputs a feature vector that evaluates the authenticity of the input image. The anomaly score of the water quality at each time step is calculated with equation (4), which combines the reconstruction loss of the generator and the feature loss of the discriminator.
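Preparing the generator's inputs in S3 amounts to slicing the image sequence into windows of K = 30 past images, each paired with the image that follows. The sketch below shows only this data preparation, not the GAN itself; the function name is hypothetical.

```python
import numpy as np

def make_windows(images, k=30):
    """Pair each run of k consecutive images with the image that follows it.

    images: (T, H, W) array of superimposed images in time order.
    Returns (inputs, targets) with shapes (T - k, k, H, W) and (T - k, H, W).
    """
    images = np.asarray(images)
    n = images.shape[0] - k
    if n <= 0:
        raise ValueError("need more than k images to build a window")
    inputs = np.stack([images[i:i + k] for i in range(n)])  # history for each target
    targets = images[k:]                                    # image to be predicted
    return inputs, targets
```

At a 5-minute sampling interval, `k=30` corresponds to the 2.5 hours of history stated in the embodiment.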
S4,对于S3中训练的对抗学习模型计算的异常分数,需要确定不同模型计算的异常分数的正常值阈值范围。对于单站点和多站点模型,其阈值的选择与训练集中计算的异常分数的分布情况有关,总体原则是让大部分的训练集中的计算的异常分数值处于正常的范围。S4, for the abnormal scores calculated by the adversarial learning model trained in S3, it is necessary to determine the normal value threshold range of the abnormal scores calculated by different models. For single-site and multi-site models, the selection of the threshold is related to the distribution of abnormal scores calculated in the training set. The general principle is to keep the calculated abnormal scores in most training sets in the normal range.
S5,水质异常事件发生概率更新及预警。利用时序贝叶斯原理进行水质异常事件的概率更新,当概率超过某一阈值时则该模型识别为是污染事故报警。污染事件报警的概率阈值设置为Pthre=80%,初始时刻的污染事件概率设置为较低的值P(0)=10-5,平滑参数取值为α=0.6。利用公式(5)和(6)对N个单站点模型和一个多站点模型分别进行水质异常事件发生概率更新。S5, the probability update and early warning of abnormal water quality events. Time-series Bayesian principle is used to update the probability of water quality abnormal events. When the probability exceeds a certain threshold, the model recognizes it as a pollution accident and alarms. The probability threshold of the pollution event alarm is set to P thre =80%, the pollution event probability at the initial moment is set to a relatively low value P(0)=10 -5 , and the value of the smoothing parameter is α=0.6. Using formulas (5) and (6) to update the probability of occurrence of water quality abnormal events for N single-site models and a multi-site model respectively.
S6. Combine the alarm results of the models at each time step. The contamination event probability of the final model, P_all, is computed from the event probabilities obtained by the multi-site model and the single-site models. When P_all exceeds the alarm probability threshold P_thre = 80%, the final model raises an alarm and can output P_all together with the sensor sites at which contamination was detected and the corresponding water quality parameters.
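The shape of the S6 output can be illustrated with a short sketch. The rule for computing P_all is not given in this section, so the `combine` function below defaults to taking the maximum purely as a placeholder; that default, the site names, and the function name are all assumptions. Reporting the affected water quality parameters is omitted here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def fuse_alarms(
    single_site_probs: Dict[str, float],
    multi_site_prob: float,
    p_thre: float = 0.8,
    combine: Callable[[Sequence[float]], float] = max,  # placeholder rule
) -> Tuple[float, bool, List[str]]:
    """Return (P_all, final alarm flag, sites whose own model exceeds p_thre)."""
    all_probs = list(single_site_probs.values()) + [multi_site_prob]
    p_all = combine(all_probs)
    # Sites whose single-site model individually crosses the alarm threshold.
    flagged_sites = sorted(
        site for site, p in single_site_probs.items() if p > p_thre
    )
    return p_all, p_all > p_thre, flagged_sites
```

For example, `fuse_alarms({"site1": 0.95, "site2": 0.2, "site3": 0.85}, 0.6)` returns `(0.95, True, ["site1", "site3"])` under the placeholder rule; a different `combine` function can be passed in once the actual rule for P_all is known.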
The method of the present invention was applied to the water supply network of a city in China (Fig. 2). A total of 33 sensor monitoring sites are deployed in the network; sensor sites 1, 2, and 3, which are close to the contamination, were selected for building the model and evaluating its performance. Two months of water quality records at 5-minute intervals were collected from the three sites, covering residual chlorine, pH, temperature, conductivity, turbidity, and total organic carbon. The data were divided into a training set and a test set. The training set contains only data from normal operation of the network and is used to build and train the models; the test set contains data with added contamination events and is used to evaluate how well the model detects them. Figure 4 compares the final detection results with those of the single-site models and of the multi-site model used alone. The proposed model combines the advantages of both: of the 7 contamination events, it detected 6, with only 2 short-lived false alarms, which is better than using only the single-site models or only the multi-site model. The proposed method can therefore improve detection accuracy and increase the detection rate of contamination events. This example also shows that the method is practical, with a high effective alarm rate and a low false alarm rate, and has good application value in real water supply networks.
The embodiments described above merely illustrate implementations of the present invention and shall not be construed as limiting the scope of the patent. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present invention, all of which fall within its scope of protection.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211104588.6A CN115470850B (en) | 2022-09-09 | 2022-09-09 | A water quality abnormality event identification and early warning method based on water quality spatiotemporal data of pipe network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115470850A (en) | 2022-12-13 |
CN115470850B (en) | 2025-05-30 |
Family
ID=84369608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211104588.6A Active CN115470850B (en) | 2022-09-09 | 2022-09-09 | A water quality abnormality event identification and early warning method based on water quality spatiotemporal data of pipe network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115470850B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106872657A (en) * | 2017-01-05 | 2017-06-20 | 河海大学 | A kind of multivariable water quality parameter time series data accident detection method |
CN111191855A (en) * | 2020-01-13 | 2020-05-22 | 大连理工大学 | A method for identifying and early warning of abnormal water quality events based on multivariate water quality time series data of pipeline network |
US20200231466A1 (en) * | 2017-10-09 | 2020-07-23 | Zijun Xia | Intelligent systems and methods for process and asset health diagnosis, anomoly detection and control in wastewater treatment plants or drinking water plants |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116186547A (en) * | 2023-04-27 | 2023-05-30 | 深圳市广汇源环境水务有限公司 | Method for rapidly identifying abnormal data of environmental water affair monitoring and sampling |
CN117235661A (en) * | 2023-08-30 | 2023-12-15 | 广州怡水水务科技有限公司 | AI-based direct drinking water quality monitoring method |
CN117235661B (en) * | 2023-08-30 | 2024-04-12 | 广州怡水水务科技有限公司 | AI-based direct drinking water quality monitoring method |
CN117193399A (en) * | 2023-09-27 | 2023-12-08 | 安徽省高迪科技有限公司 | Solar energy sewage purification cloud intelligence control system |
CN118570211A (en) * | 2024-08-05 | 2024-08-30 | 江西洪城水业环保有限公司 | Water quality abnormality detection analysis method and system based on deep learning |
CN119809452A (en) * | 2025-03-06 | 2025-04-11 | 沈阳水务集团有限公司 | Water service performance data matching method and system |
CN119809452B (en) * | 2025-03-06 | 2025-06-17 | 沈阳水务集团有限公司 | Water service performance data matching method and system |
Also Published As
Publication number | Publication date |
---|---|
CN115470850B (en) | 2025-05-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||