CN120559180B - A method and device for environmental monitoring and pollution tracing in chemical parks - Google Patents

A method and device for environmental monitoring and pollution tracing in chemical parks

Info

Publication number
CN120559180B
CN120559180B CN202511054980.8A CN202511054980A CN120559180B CN 120559180 B CN120559180 B CN 120559180B CN 202511054980 A CN202511054980 A CN 202511054980A CN 120559180 B CN120559180 B CN 120559180B
Authority
CN
China
Prior art keywords
data
pollutant
model
tracing
xgboost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202511054980.8A
Other languages
Chinese (zh)
Other versions
CN120559180A (en
Inventor
房春生
阿巴西·阿萨德
王菊
阿赫塔·阿尼斯
陈嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University
Priority to CN202511054980.8A
Publication of CN120559180A
Application granted
Publication of CN120559180B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/0004 Gaseous mixtures, e.g. polluted air
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/18 Water
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/24 Earth materials
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00 Information sensed or collected by the things
    • G16Y20/10 Information sensed or collected by the things relating to the environment, e.g. temperature; relating to location

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Immunology (AREA)
  • General Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Environmental & Geological Engineering (AREA)
  • Toxicology (AREA)
  • Computing Systems (AREA)
  • Remote Sensing (AREA)
  • Geology (AREA)
  • General Life Sciences & Earth Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This invention provides a method and a device for environmental monitoring and pollution tracing in chemical industry parks, in the field of environmental monitoring and pollution tracing. The method collects pollutant data through IoT sensors and preprocesses it to obtain processed pollutant data. Combining edge computing with cloud computing, it uses federated learning to train tracing XGBoost sub-models, optimizing pollution source localization and type classification. With a VAE for anomaly detection, Kalman filtering for data fusion, and a Transformer for propagation path prediction, it enables real-time, efficient, and secure pollution early-warning management.

Description

Method and device for environmental monitoring and pollution tracing in a chemical industry park
Technical Field
The invention relates to the field of environmental monitoring and pollution tracing, and in particular to a method and a device for environmental monitoring and pollution tracing in a chemical industry park.
Background
With accelerating industrialization, chemical industry parks, as core areas for chemical production and processing, face increasingly prominent environmental pollution problems. Contaminants in the atmosphere, water and soil, such as PM2.5, volatile organic compounds (VOCs), chemical oxygen demand (COD) and heavy metals, pose risks not only to the ecological environment but also to the health of nearby residents. Timely and accurate monitoring of contaminant concentrations, detection of abnormal events, localization of pollution sources and prediction of contaminant propagation paths are key requirements for environmental management and pollution control in chemical industry parks.
In the prior art, environmental monitoring relies mainly on fixed-site sensor networks that collect air, water-quality and soil data for pollution analysis. For example, conventional monitoring systems use laser scattering, electrochemical or spectroscopic techniques to collect contaminant concentration data in real time and combine it with meteorological data for preliminary analysis. However, these systems have the following limitations:
Existing monitoring systems generally process sensor, meteorological and remote sensing data separately and lack an efficient fusion method, so analysis accuracy suffers from data noise and missing data. For example, data from a single sensor cannot adequately capture pollution dispersion under complex meteorological conditions (such as strong winds or temperature inversions).
Anomaly detection capability is limited. Traditional anomaly detection methods are mostly based on fixed thresholds or simple statistical models, which adapt poorly to dynamic environmental changes (such as seasonal weather or fluctuations in production activity), leading to high false-alarm rates or many missed detections. Dynamic thresholding and deep learning models (e.g., variational autoencoders) are not yet widely used.
Pollution tracing is difficult. Existing tracing techniques, such as backward trajectory models (e.g., HYSPLIT), rely on high-resolution meteorological data and adapt poorly to multi-source data integration and complex pollution scenarios. Traditional machine learning models (such as support vector machines) handle high-dimensional, multi-scale features poorly and lack real-time performance; their localization error often exceeds 200 meters, which does not meet the needs of precise remediation.
Traditional centralized training requires raw data to be uploaded to the cloud. The data volume is large (hundreds of MB per site per day), which increases transmission cost and the risk of data leakage. The application of distributed learning (e.g., federated learning) to environmental monitoring is still being explored.
Real-time performance and computational efficiency are insufficient. When existing systems process high-dimensional data (such as minute-level multi-pollutant concentrations) with complex models (such as deep neural networks), the computation latency is high (>500 milliseconds), which makes real-time early warning and tracking difficult. Edge computing and model compression techniques are not yet widely used in environmental monitoring.
In recent years, artificial intelligence (AI) and Internet of Things (IoT) technologies have offered new opportunities for environmental monitoring. The literature reports that IoT-based sensor networks can achieve minute-level data acquisition and, combined with edge computing devices (such as NVIDIA Jetson), perform local inference to reduce cloud load. However, existing AI models (e.g., random forests or gradient-boosted trees) are mostly trained centrally, leave the computing power of edge devices unused, and do not adequately optimize multi-objective tasks (e.g., pollution source coordinate regression and type classification). Federated learning, as a distributed training paradigm, significantly reduces communication overhead and protects privacy by transmitting only model parameters rather than raw data, but its application to pollution tracing in chemical industry parks still faces challenges such as multi-scale feature processing, uncertainty quantification and dynamic weight adjustment.
In addition, pollution propagation prediction must combine high-dimensional meteorological data with spatiotemporal features, and traditional models (such as convolutional neural networks) capture long-term dependencies poorly. The success of the Transformer model in sequence prediction suggests its potential for predicting pollutant propagation paths, but its real-time performance and computational efficiency still need to be optimized.
Disclosure of Invention
In view of the above, the invention provides a method and a device for environmental monitoring and pollution tracing in a chemical industry park, which optimize pollution source localization and type classification and enable real-time, efficient and secure pollution early-warning management.
A method for environmental monitoring and pollution tracing in a chemical industry park comprises the following steps:
Step 1, collecting pollutant data of air, water and soil in real time through an Internet of Things sensor network, and deploying edge computing devices and a cloud server;
Step 2, preprocessing the pollutant data to obtain processed pollutant data;
Step 3, training tracing XGBoost sub-models on the edge computing devices through federated learning, the cloud server performing weighted aggregation of the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model;
Step 4, dynamically adjusting an anomaly threshold using a random forest model, detecting the pollutant concentration corresponding to the processed pollutant data with a variational autoencoder, determining an anomaly score for the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold;
Step 5, when an early warning has been issued, using a 4D meteorological database, integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT backward trajectory model with the global tracing XGBoost model to locate the pollution source coordinates and pollution source type, and predicting the subsequent propagation path of the pollutant with a Transformer model;
Step 6, outputting the prediction result to a central control platform, storing real-time prediction result data in MongoDB, and storing historical prediction result data in HDFS.
In particular, step 1 comprises:
deploying environmental monitoring stations with air sensors, water quality sensors and soil sensors to form the Internet of Things sensor network, collecting pollutant data of air, water and soil in real time through that network, and deploying edge computing devices and a cloud server.
In particular, step 2 comprises processing the pollutant data stream in real time using the Apache Kafka stream-computing framework, the computed pollutant statistics including the pollutant mean, the pollutant median and the pollutant variance.
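As a rough illustration of the statistics named in step 2, the sketch below computes the mean, median and variance over a sliding window of readings. It uses only the Python standard library in place of a Kafka consumer; the class name `WindowStats`, the window length and the sample readings are assumptions made for illustration and do not come from the patent.

```python
from collections import deque
from statistics import mean, median, pvariance


class WindowStats:
    """Sliding-window statistics for one pollutant channel (hypothetical helper)."""

    def __init__(self, size=5):
        # size is an assumed window length in samples (one sample per minute)
        self.window = deque(maxlen=size)

    def update(self, value):
        """Add one reading and return the statistics of the current window."""
        self.window.append(float(value))
        values = list(self.window)
        return {
            "mean": mean(values),
            "median": median(values),
            "variance": pvariance(values),  # population variance of the window
        }


# Hypothetical readings; the oldest value drops out once the window is full.
stats = WindowStats(size=3)
for reading in [10.0, 12.0, 14.0, 16.0]:
    latest = stats.update(reading)
```

In a deployment along the lines the patent describes, `update` would be called from the stream consumer once per incoming message for each pollutant dimension.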
In particular, said step 3 comprises:
the tracing XGBoost sub-models are trained on the edge computing devices through federated learning, and the output data of each tracing XGBoost sub-model is sent to the cloud server through a gRPC channel; the cloud server performs weighted aggregation of the plurality of tracing XGBoost sub-models to form the global tracing XGBoost model, the weighted aggregation being realized by the following expression:
[formula not preserved in the source text]
The quantities in the expression are: the data volume of the n-th monitoring site; the error of the XGBoost sub-model on the n-th edge computing device; a penalty parameter; the data volume processed by the m-th edge computing device; and the weight of the tracing XGBoost sub-model on the n-th edge computing device. Here n denotes the n-th edge computing device, with one edge computing device deployed at each monitoring site, and m is the index over edge computing devices. The cloud server distributes the global tracing XGBoost model daily through an OTA mechanism.
In particular, the method further comprises performance optimization of the tracing XGBoost sub-model, the variational autoencoder and the Transformer model, as follows:
pruning and quantization are applied to the tracing XGBoost sub-model, the variational autoencoder and the Transformer model to reduce the number of parameters, parallel computing is used to reduce computation latency, and model parameters are updated daily by incremental learning;
the tracing XGBoost sub-model is optimized using a multi-objective loss function.
In particular, in step 4, the loss function of the variational autoencoder is:
[formula not preserved in the source text]
The quantities in the expression are: the normalized pollutant concentration; the reconstructed concentration; N, the number of pollutant species; the mean of the latent variable; the variance of the latent variable; t, the current time step; i and j, index variables; and the loss function value of the variational autoencoder.
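The patent's own expression is not legible in this text, so the following is only a sketch of the textbook variational autoencoder objective that matches the listed quantities: a reconstruction error over the N pollutant species plus the KL divergence of the latent distribution from a standard normal prior. Averaging the reconstruction error over N, leaving the KL term unweighted and parameterizing the variance as a log-variance are all assumptions.

```python
import math


def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction error plus KL divergence to a standard normal prior.

    x, x_hat    : normalized and reconstructed concentrations (length N)
    mu, log_var : latent mean and log-variance (length = latent dimension)
    The 1/N averaging and the unit weight on the KL term are assumptions.
    """
    n = len(x)
    # Mean squared reconstruction error over the N pollutant species
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / n
    # KL( N(mu, sigma^2) || N(0, 1) ) summed over latent dimensions
    kl = 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))
    return recon + kl
```

A sample that reconstructs poorly yields a large first term, which is the usual basis for turning the loss into an anomaly score.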
In particular, in step 5, using the 4D meteorological database, integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, and combining the HYSPLIT backward trajectory model with the global tracing XGBoost model to locate the pollution source coordinates and pollution source type comprises:
fusing the processed pollutant data, 4D meteorological data and remote sensing data through Kalman filtering and outputting a fused concentration; the HYSPLIT backward trajectory model uses Lagrangian particle tracking to analyze the pollutant propagation path and determine the area in which the pollution source lies; within that area, the global tracing XGBoost model locates the pollution source coordinates and determines the pollution source type.
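To make the fusion step concrete, the sketch below applies the standard scalar Kalman measurement update to combine a prior concentration estimate with two further observations, each weighted by its variance. The patent does not give its state model, so the scalar state, the function names and all numbers are assumptions chosen for illustration.

```python
def kalman_update(estimate, variance, measurement, meas_variance):
    """One scalar Kalman measurement update; returns (estimate, variance)."""
    gain = variance / (variance + meas_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance


def fuse(prior, prior_var, observations):
    """Sequentially fuse (value, variance) observations into the prior."""
    est, var = prior, prior_var
    for value, meas_var in observations:
        est, var = kalman_update(est, var, value, meas_var)
    return est, var


# Hypothetical numbers: a prior of 50 fused with a sensor reading and a
# remote-sensing estimate, each given as (value, variance).
fused, fused_var = fuse(prior=50.0, prior_var=100.0,
                        observations=[(60.0, 100.0), (40.0, 50.0)])
```

The fused variance is smaller than that of any single input, which is the reason for fusing noisy sources before tracing.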
In particular, in step 5, predicting the subsequent propagation path of the pollutant with the Transformer model comprises: taking the pollution source coordinates and the pollution source type as initial conditions of the Transformer model, and using the Transformer model to predict the subsequent propagation path and the concentration distribution of the pollutant over the next 24 hours.
In particular, said step 6 comprises:
Outputting the prediction result to a central control platform, storing real-time data by using MongoDB, storing historical data by using HDFS, simulating pollutant leakage by using MATLAB, and verifying positioning errors and type classification errors.
The invention also discloses a device for environmental monitoring and pollution tracing in a chemical industry park, which comprises:
a data acquisition and hardware deployment module, used for collecting pollutant data of air, water and soil in real time through an Internet of Things sensor network and deploying edge computing devices and a cloud server;
a software configuration module, used for preprocessing the pollutant data to obtain processed pollutant data;
a model training and optimization module, used for training tracing XGBoost sub-models on the edge computing devices through federated learning, the cloud server performing weighted aggregation of the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model;
a variational autoencoder anomaly detection module, used for dynamically adjusting an anomaly threshold using a random forest model, detecting the pollutant concentration corresponding to the processed pollutant data with the variational autoencoder, determining an anomaly score for the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold;
a pollution tracing and tracking module, used for, when an early warning has been issued, using a 4D meteorological database, integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT backward trajectory model with the global tracing XGBoost model to locate the pollution source coordinates and pollution source type, and predicting the subsequent propagation path of the pollutant with a Transformer model;
and a result output and storage module, used for outputting the prediction result to the central control platform, storing real-time data in MongoDB and storing historical data in HDFS.
The beneficial effects are as follows:
The technical solution achieves high-precision pollution source localization. Using the tracing XGBoost model (100 boosted trees, with 25 trees trained on the edge) and a multi-objective loss function, the coordinate localization error is kept within an effective range, outperforming traditional methods (such as HYSPLIT, with an error of 200 m) and providing reliable support for accurately identifying pollution sources.
The technical solution significantly improves pollution source type classification accuracy. The classification performance of the tracing XGBoost sub-model is optimized through the corresponding term of the multi-objective loss function, reaching 94% type accuracy (the types being factory emission, pipeline leakage, diffuse pollution and others), which improves on a traditional support vector machine and effectively guides differentiated remediation measures.
The technical solution enhances the stability of model predictions and reduces prediction uncertainty: the standard deviation of the coordinates is less than 45 meters, the standard deviation of the type probability is less than 0.07, and events with confidence below 0.8 account for only 2%. A term of the multi-objective loss function regularizes the tracing XGBoost sub-model, ensuring robustness under changing environmental and weather conditions.
The technical solution greatly reduces the data transmission volume and protects privacy. Federated learning transmits only the tracing XGBoost sub-model parameters, reducing the transmitted volume relative to raw data by more than 90%; combined with TLS 1.3 encryption, security is significantly improved over traditional centralized training.
The technical solution achieves real-time, efficient environmental monitoring and early warning. Through inference on edge computing devices, cloud aggregation and minute-level data acquisition, combined with VAE anomaly detection and API alarms, it meets the real-time early-warning requirements of a chemical industry park and outperforms traditional systems.
The technical solution improves data fusion and pollutant propagation prediction. The processed pollutant data, 4D meteorological data and remote sensing data are fused by Kalman filtering and combined with a Transformer model to predict the pollutant propagation path over the next 24 hours (100 m grid, mean square error <0.1), which improves on traditional convolutional networks and provides high-precision support for preventing and controlling pollution dispersion.
The technical solution optimizes computational efficiency and system scalability. Memory usage on the edge computing devices is <450 MB and cloud aggregation time is <1 minute; the system covers 40 sites (10 km²) and is easy to extend to a larger area. Model pruning (30% fewer parameters), 16-bit floating-point quantization and CUDA (Compute Unified Device Architecture) parallel computing (50% lower latency) make it more efficient than traditional centralized systems.
Drawings
FIG. 1 is a schematic diagram of a method for chemical industry park environmental monitoring and pollution tracing in accordance with the present invention;
FIG. 2 is a schematic diagram of an apparatus for environmental monitoring and pollution tracing in a chemical industrial park according to the present invention.
Detailed Description
The invention will now be described in detail by way of example with reference to the accompanying drawings.
The invention provides a method for environmental monitoring and pollution tracing in a chemical industry park which, as shown in FIG. 1, comprises the following steps:
Step 1, collecting pollutant data of air, water and soil in real time through an Internet of Things sensor network, and deploying edge computing devices and a cloud server;
Step 2, preprocessing the pollutant data to obtain processed pollutant data;
Step 3, training tracing XGBoost (Extreme Gradient Boosting) sub-models on the edge computing devices through federated learning, the cloud server performing weighted aggregation of the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model;
Step 4, dynamically adjusting an anomaly threshold using a random forest model, detecting the pollutant concentration corresponding to the processed pollutant data with a variational autoencoder, determining an anomaly score for the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold;
Step 5, when an early warning has been issued, using a 4D meteorological database, integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT (Hybrid Single-Particle Lagrangian Integrated Trajectory) backward trajectory model with the global tracing XGBoost model to locate the pollution source coordinates and pollution source type, and predicting the subsequent propagation path of the pollutant with a Transformer model;
Step 6, outputting the prediction result to a central control platform, storing real-time prediction result data in MongoDB, and storing historical prediction result data in HDFS (Hadoop Distributed File System).
In this embodiment, step 1 specifically comprises collecting pollutant data of air, water and soil in real time through an Internet of Things (IoT) sensor network and deploying edge computing devices and a cloud server. The IoT sensor network collects 60-dimensional pollutant data (including PM2.5, 57 volatile organic compounds and chemical oxygen demand) of the air, water and soil in real time. Forty edge computing devices (NVIDIA Jetson Nano) and a cloud server (AWS EC2, NVIDIA A100 GPU) are deployed, together with air sensors (PM2.5, accuracy ±1 μg/m³), water quality sensors (COD, accuracy ±0.5 mg/L) and soil sensors (heavy metals, detection limit <1 ppm). The 60-dimensional pollutant concentration data is collected at minute-level frequency (once per minute) at 40 monitoring sites, with a total data volume of about 100 MB per day per site.
Minute-level data transmission uses a 5G module (transmission rate >100 Mbps, delay <100 ms) or a low-power wide-area network (LPWAN, power consumption <10 mW). The edge computing devices (NVIDIA Jetson Nano, 4 GB RAM) are configured to support real-time inference, and the cloud server (AWS EC2, NVIDIA A100 GPU) supports model training and aggregation. The system is deployed in a chemical industry park in the Jilin economic and technological development zone, covers an area of 64 km² and comprises 40 monitoring sites.
In this embodiment, step 2 specifically comprises configuring the stream-computing framework, the artificial intelligence models and the databases, preprocessing the pollutant data and performing real-time statistical analysis. The configuration covers the Apache Kafka stream-computing framework, the artificial intelligence (AI) models (variational autoencoder, gradient boosting decision tree, XGBoost, Transformer) and the MongoDB/HDFS databases.
In this embodiment, the edge computing devices are configured to receive the processed pollutant data, meteorological data from the 4D meteorological database (100-meter spatial resolution, 1-minute temporal resolution, wind speed ±0.1 m/s) and remote sensing data (10-meter resolution, updated daily).
In this embodiment, step 3 specifically comprises training a tracing XGBoost sub-model (for example, comprising 25 decision trees) on each edge computing device through federated learning, with the cloud server performing weighted aggregation into a global tracing XGBoost model (comprising 100 trees), and optimizing AI model performance through model compression, GPU parallel computing and incremental learning. Transmitting only the sub-model parameters reduces the transmitted data volume by 90%.
Historical data is acquired, and the processed pollutant data, 4D meteorological data and remote sensing data in the historical data are integrated by Kalman filtering to obtain training samples. This integration is the same as the Kalman filtering integration of processed pollutant data, 4D meteorological data and remote sensing data in step 5, where the method is described in detail, and it yields a 200-dimensional feature vector.
Principal component analysis (PCA) reduces the features of the training samples to 80 dimensions, retaining 95% of the variance, with a computation time of <30 milliseconds. Data preprocessing includes wavelet denoising (Daubechies, 3 levels) and spatiotemporal KNN imputation (k=5), with a data loss rate of <1%.
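The spatiotemporal KNN imputation is named but not specified, so the sketch below shows one plausible reading: a missing reading is replaced by the mean of the k nearest known readings, with distance measured jointly over site position and time. The tuple layout, the function name `knn_fill` and the `time_scale` factor that converts minutes into a distance are assumptions.

```python
import math


def knn_fill(samples, k=5, time_scale=1.0):
    """Fill missing values with the mean of the k nearest known samples.

    samples    : list of (x_m, y_m, t_min, value); value may be None
    time_scale : assumed factor making minutes comparable to metres
    Assumes at least one sample has a known value.
    """
    known = [s for s in samples if s[3] is not None]
    filled = []
    for x, y, t, value in samples:
        if value is None:
            # Rank known samples by joint space-time distance to this one
            nearest = sorted(
                known,
                key=lambda s: math.dist((x, y, t * time_scale),
                                        (s[0], s[1], s[2] * time_scale)),
            )[:k]
            value = sum(s[3] for s in nearest) / len(nearest)
        filled.append((x, y, t, value))
    return filled
```

With k=5 as in the text, each gap is filled from the five closest readings; a distant outlier site is ignored as long as enough nearer readings exist.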
In the model training stage, each edge computing device trains a tracing XGBoost sub-model (comprising 25 boosted trees) that predicts the pollution source coordinates and type.
Input: 80-dimensional feature vectors (after PCA dimension reduction).
Output: pollution source coordinates and pollution source type (4 types in total).
The model parameters are as follows:
Maximum depth 6, learning rate 0.1, L2 regularization lambda = 1.
Objective function: custom multi-objective loss function.
Training data: 1 year of historical data.
The multi-objective loss function is used to optimize the tracing XGBoost sub-model. Its expression is:
[formula not preserved in the source text]
The coordinate term is the coordinate mean square error, which optimizes the prediction of the pollution source coordinates. Num is the number of samples and i1 is the sample index. The true coordinates are the actual pollution source coordinates of the i1-th pollution event (in meters, based on the geographic coordinate system of the chemical industry park, e.g., a UTM projection).
The predicted coordinates are those output by the tracing XGBoost sub-model from the 80-dimensional multi-scale feature vector (minute-scale, 5-minute-scale and 60-minute-scale concentrations, meteorological parameters, spatial correlation and remote sensing data, after PCA dimension reduction). The squared Euclidean distance between the predicted and true coordinates measures the localization error.
The type term is the type cross-entropy, which optimizes pollution source type classification. The number of categories is set to 4: factory emission, pipeline leakage, diffuse pollution and others.
The true type label of the i1-th sample is one-hot encoded (e.g., [1, 0, 0, 0] represents "factory emission").
The predicted type probability of the i1-th sample is output by the tracing XGBoost sub-model; the cross-entropy of the i1-th sample measures the difference between the predicted probability and the true label.
The variance term is the prediction variance, which reduces model uncertainty.
T is the number of boosted trees in the sub-model, set to 25 (25 trees trained per edge computing device). The prediction of the k-th boosted tree covers the coordinates and the type probabilities, and the average prediction is taken over the 25 trees of the sub-model; the global model comprises 100 trees. The weight coefficients balance coordinate prediction, type classification and stability.
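Since the expression itself is missing from this text, the sketch below assembles the three described terms in their textbook forms: mean squared Euclidean coordinate error, mean categorical cross-entropy, and the variance of per-tree predictions around their mean, combined with three weights. The weight values, the scalar per-tree predictions and the function name are placeholders, not values from the patent.

```python
import math


def multi_objective_loss(true_xy, pred_xy, true_onehot, pred_prob, tree_preds,
                         w_coord=1.0, w_type=1.0, w_var=1.0):
    """Weighted sum of coordinate MSE, type cross-entropy and tree variance.

    tree_preds[i] holds the per-tree scalar predictions for sample i.
    The three weights are placeholders; their values are not given here.
    """
    num = len(true_xy)
    # Mean squared Euclidean distance between true and predicted coordinates
    coord = sum((tx - px) ** 2 + (ty - py) ** 2
                for (tx, ty), (px, py) in zip(true_xy, pred_xy)) / num
    eps = 1e-12  # avoids log(0)
    # Mean cross-entropy between one-hot labels and predicted probabilities
    ctype = -sum(y * math.log(p + eps)
                 for ys, ps in zip(true_onehot, pred_prob)
                 for y, p in zip(ys, ps)) / num
    # Mean variance of the per-tree predictions around their average
    var = 0.0
    for preds in tree_preds:
        avg = sum(preds) / len(preds)
        var += sum((f - avg) ** 2 for f in preds) / len(preds)
    var /= num
    return w_coord * coord + w_type * ctype + w_var * var
```

XGBoost's custom-objective interface expects per-sample gradients and Hessians rather than a scalar, so a function of this shape would more naturally serve as an evaluation metric unless its derivatives are supplied separately.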
The cloud collects the 40 tracing XGBoost sub-models, which were trained on the edge computing devices through federated learning, and forms the global tracing XGBoost model by weighted-average aggregation. The weighted aggregation is realized by the following expression:
[formula not preserved in the source text]
The quantities in the expression are: the data volume of the n-th monitoring site; the error of the XGBoost sub-model on the n-th edge computing device; a penalty parameter; the data volume processed by the m-th edge computing device; and the weight of the tracing XGBoost sub-model on the n-th edge computing device. Here n denotes the n-th edge computing device, with one edge computing device deployed at each monitoring site, and m is the index over edge computing devices. The cloud server distributes the global tracing XGBoost model daily through an OTA (Over The Air) mechanism.
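The aggregation expression is not legible here, so the following sketch only illustrates one form consistent with the listed quantities: each sub-model weight grows with its site's data volume, shrinks exponentially with its error scaled by the penalty parameter, and is normalized over all devices. The exponential form, the function names and the use of a weighted average of predictions are assumptions.

```python
import math


def aggregation_weights(data_volumes, errors, penalty=1.0):
    """Normalized sub-model weights from data volume and error (assumed form).

    Assumption: w_n is proportional to D_n * exp(-penalty * e_n), normalized
    over all edge devices; the patent's exact expression is not preserved.
    """
    raw = [d * math.exp(-penalty * e) for d, e in zip(data_volumes, errors)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(predictions, weights):
    """Weighted average of per-sub-model predictions for one sample."""
    return sum(w * p for w, p in zip(weights, predictions))
```

Under this form, two sites with equal data volume and equal error receive equal weight, and a site with a larger error contributes less to the global prediction.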
The weights of the dynamic trees, which come from the tracing XGBoost sub-models, in the global tracing XGBoost model are updated by the following formula:
[formula not preserved in the source text]
Here k denotes the k-th dynamic tree (from 1 to 100), a decision tree of the tracing XGBoost model, and j1 is the index over all trees (from 1 to 100) used for normalization. The formula further uses the current time step (minute-level, updated hourly), the prediction error of the k-th tree at the previous time step, and the correlation of the k-th tree with the current weather. α is the error penalty parameter, which may be 0.5, and β is the correlation enhancement parameter, which may be 0.3. The result is the weight of the k-th tree used for dynamic prediction.
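Because the update formula is missing from this text, the sketch below shows one form that fits the description: a softmax-style weight that falls with a tree's previous-step error (scaled by α), rises with its weather correlation (scaled by β), and is normalized over all trees. The exponential form and the function name are assumptions; the defaults 0.5 and 0.3 are the values the text suggests for α and β.

```python
import math


def dynamic_tree_weights(prev_errors, weather_corr, alpha=0.5, beta=0.3):
    """Softmax-style tree weights, normalized over all trees (assumed form).

    prev_errors  : prediction error of each tree at the previous time step
    weather_corr : correlation of each tree with the current weather
    """
    scores = [math.exp(-alpha * e + beta * c)
              for e, c in zip(prev_errors, weather_corr)]
    total = sum(scores)
    return [s / total for s in scores]
```

With equal errors, the tree better correlated with the current weather gets the larger weight, and a tree with a large recent error is down-weighted regardless of correlation.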
In this embodiment, federated learning transmits only the parameters of the tracing XGBoost sub-models, and the raw data remain on the edge computing devices, which reduces the transmission volume by 90%. Each tracing XGBoost sub-model is uploaded to the cloud via the gRPC protocol with an encryption delay of <10 milliseconds. The cloud distributes the global XGBoost model daily through OTA, which takes <5 minutes.
Performance optimization is performed on the tracing XGBoost sub-model, the variational autoencoder and the Transformer model. The performance optimization method comprises the following steps:
Pruning and quantization are applied to the tracing XGBoost sub-model, the variational autoencoder and the Transformer model to reduce the number of parameters, parallel computing is used to reduce computation delay, and the model parameters are updated daily by incremental learning;
the tracing XGBoost sub-model is optimized using a multi-objective loss function.
In the embodiment of the invention, step 4 is specifically as follows: an anomaly threshold is dynamically adjusted by using a random forest model; the pollutant concentration corresponding to the processed pollutant data is detected by a variational autoencoder (VAE); an anomaly score corresponding to the processed pollutant data is determined based on the anomaly threshold and that pollutant concentration; and an early warning is issued when the anomaly score exceeds a preset threshold.
In one embodiment, if the anomaly score is >0.9, an early warning is issued and manual verification is triggered.
In step 4, the variational autoencoder (VAE) processes the 60-dimensional normalized contaminant concentration and generates a reconstructed concentration, from which abnormal events are detected. The model architecture of the variational autoencoder is as follows:
Input layer: 60 dimensions (normalized concentration).
Encoder: 3-layer fully connected network (128, 64, 10 dimensions), which outputs the mean and the variance of the latent variable.
Decoder: 3-layer fully connected network (10, 64, 60 dimensions).
In step 4, anomaly detection with the variational autoencoder comprises deploying the variational autoencoder for anomaly detection and generating anomaly scores. The loss function of the variational autoencoder is:
;
In this expression, N is the number of contaminant species, t is the current time step, and i and j are index variables. The other quantities are: the normalized contaminant concentration; the reconstructed concentration; the mean of the latent variable; the variance of the latent variable; and the loss function value of the variational autoencoder.
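The loss of a variational autoencoder is conventionally a reconstruction term plus a Kullback-Leibler term on the latent variable. The sketch below assumes that standard form (mean squared reconstruction error over the N species plus the KL divergence to a standard normal prior, equally weighted); the exact expression and any weighting between the two terms are not reproduced in this text, and the function name `vae_loss` is hypothetical.

```python
import math

def vae_loss(x, x_hat, mu, var):
    """Standard VAE loss for one time step (assumed form).

    x, x_hat : normalized and reconstructed concentrations (length N)
    mu, var  : mean and variance of the latent variable (same length)
    """
    if len(x) != len(x_hat) or len(mu) != len(var):
        raise ValueError("mismatched input lengths")
    n = len(x)
    # Reconstruction term: mean squared error over the N species.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / n
    # KL divergence between N(mu, var) and the standard normal prior.
    kl = -0.5 * sum(1.0 + math.log(v) - m * m - v for m, v in zip(mu, var))
    return recon + kl

# 60 species and a 10-dimensional latent variable, as in the architecture above.
perfect = vae_loss([0.5] * 60, [0.5] * 60, [0.0] * 10, [1.0] * 10)
degraded = vae_loss([0.5] * 60, [0.9] * 60, [0.0] * 10, [1.0] * 10)
```

A sample that the model reconstructs poorly yields a larger loss, which is the signal an anomaly score can be built on.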
In step 4, the anomaly threshold is dynamically adjusted by using a random forest model. Specifically, a random forest (100 trees, maximum depth 20) predicts the anomaly threshold from meteorological data (3 dimensions, including wind speed and humidity). The parameters of the random forest model are as follows:
Input: meteorological data + historical anomaly score (4 dimensions).
Output: threshold (scalar, e.g., 0.9).
Update frequency: once per hour.
Training time: <5 minutes (Jetson Nano).
An anomaly score >0.9 triggers an early warning; the false alarm rate is <2%.
In the embodiment of the invention, step 5 is specifically as follows: when an early warning is issued, a 4D meteorological database (spatial resolution of 100 meters, minute-level time resolution) is used, and the processed pollutant data, the meteorological data and the remote sensing data are integrated through Kalman filtering; the HYSPLIT backward trajectory model and the global tracing XGBoost model are combined to locate the pollution source coordinates and determine the pollution source type; and a Transformer model is used to predict the subsequent propagation path of the pollutants. The pollution source coordinates, the pollution source type and the pollutant propagation path are taken as the prediction result.
Processed pollutant data (180 dimensions): concentration sequences (PM2.5, 57 VOCs, COD, etc.) at the 1-minute, 5-minute and 60-minute time scales.
Meteorological features (4 dimensions): wind speed (m/s), wind direction (degrees), temperature (°C) and humidity (%), taken from the 4D meteorological database.
Spatial correlation (12 dimensions): concentration covariance matrix between the 40 sites, based on the Kalman-filter-fused data.
Remote sensing features (4 dimensions): contaminant distribution from satellite images (100-meter resolution).
In total, these form a 200-dimensional feature vector.
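The assembly of the 200-dimensional feature vector from its four groups (180 + 4 + 12 + 4) can be sketched as a simple ordered concatenation with a length check. The group names, the ordering and the function `build_feature_vector` are hypothetical; only the group sizes come from the text.

```python
# Group sizes stated in the text: 180 + 4 + 12 + 4 = 200.
FEATURE_LAYOUT = [
    ("pollutant", 180),  # concentration sequences at 1-, 5- and 60-minute scales
    ("weather", 4),      # wind speed, wind direction, temperature, humidity
    ("spatial", 12),     # spatial-correlation features between sites
    ("remote", 4),       # satellite-derived contaminant distribution
]

def build_feature_vector(groups):
    """Concatenate the named feature groups in a fixed order.

    `groups` maps each group name to a list of numbers. A ValueError is
    raised if a group is missing or does not have the expected length.
    """
    vector = []
    for name, size in FEATURE_LAYOUT:
        values = groups.get(name)
        if values is None or len(values) != size:
            raise ValueError(f"group {name!r} must have {size} values")
        vector.extend(float(v) for v in values)
    return vector

# Placeholder values only; real inputs would come from the fused data.
example = build_feature_vector({
    "pollutant": [0.0] * 180,
    "weather": [3.2, 270.0, 18.5, 60.0],
    "spatial": [0.0] * 12,
    "remote": [0.0] * 4,
})
```

Validating the group sizes at assembly time catches a missing time scale or sensor channel before the vector reaches the model.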
The processed pollutant data, the meteorological data and the remote sensing data are integrated through Kalman filtering. For example, for the processed pollutant data (e.g., PM2.5 or COD), the meteorological data and the remote sensing data, the Kalman filter output is:
;
The quantities in this expression are as follows.
The fusion concentration of the ith contaminant at the current time step t.
The integrated observation at the current time step t, obtained by normalizing the processed pollutant data, the meteorological data and the remote sensing data separately and then aligning the normalized data, together with the fusion concentration of the ith contaminant at time step (t-1).
The Kalman gain, which dynamically adjusts the relative weights of the sensor observation and the prediction.
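A scalar Kalman update of the kind described here can be sketched as below. It assumes a random-walk state model (the prediction for time t is simply the fused value at t-1) and a direct observation of the concentration; the noise variances and the name `kalman_step` are assumptions for illustration, since the text does not specify them.

```python
def kalman_step(prev_estimate, prev_var, observation, process_var, obs_var):
    """One scalar Kalman filter step under a random-walk state model.

    Returns the new fused estimate, its variance, and the Kalman gain.
    """
    # Predict: the state is carried over and its uncertainty grows.
    pred_var = prev_var + process_var
    # Gain: how much to trust the new observation relative to the prediction.
    gain = pred_var / (pred_var + obs_var)
    # Update: move the previous estimate toward the observation.
    estimate = prev_estimate + gain * (observation - prev_estimate)
    new_var = (1.0 - gain) * pred_var
    return estimate, new_var, gain

# Hypothetical values: previous fused value 10.0, new integrated observation 12.0.
fused, fused_var, k_gain = kalman_step(10.0, 1.0, 12.0, 0.0, 1.0)
```

When the prediction and the observation are equally uncertain, the gain is 0.5 and the fused value lies halfway between them; a noisier sensor lowers the gain and keeps the estimate closer to the previous fused value.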
Based on the Kalman filtering output, the HYSPLIT backward trajectory model uses Lagrangian particle tracking to analyze the pollutant propagation path and generates a first location area of the pollution source. The first location area is a coarse area, which narrows the prediction range of the global tracing XGBoost model. The processed pollutant data, meteorological data and remote sensing data corresponding to the first location area are then acquired and input into the global tracing XGBoost model, which locates the pollution source coordinates and determines the pollution source type.
The global tracing XGBoost model predicts the pollution source as follows:
;
The quantities in this expression are: the input of the global tracing XGBoost model; the prediction function of the kth boosting tree; the uncertainty regularization coefficient; the dynamic tree weights; and the uncertainty quantification function, which is evaluated, for example, by Monte Carlo sampling with 50 samples.
Monte Carlo samples are used to calculate the prediction variance:
;
The quantities in this expression are: the feature vector of the mth sample, obtained by adding Gaussian noise to the input; the number of samples, M=50; and the predicted mean.
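The Monte Carlo procedure can be sketched as follows: the input is perturbed with Gaussian noise M=50 times, the model is evaluated on each perturbed copy, and the mean and variance of the predictions are computed. The noise standard deviation, the scalar-valued `predict` callable and the name `mc_uncertainty` are assumptions; the real model outputs coordinates and a type, and the text does not state the noise level.

```python
import random
import statistics

def mc_uncertainty(predict, features, n_samples=50, noise_std=0.01, seed=0):
    """Estimate the prediction mean and variance by input perturbation.

    predict   : callable taking a list of floats and returning a float
    features  : the unperturbed feature vector
    noise_std : standard deviation of the added Gaussian noise (assumed)
    """
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_samples):
        noisy = [f + rng.gauss(0.0, noise_std) for f in features]
        predictions.append(predict(noisy))
    mean = statistics.fmean(predictions)
    # Population variance of the M predictions around their mean.
    variance = statistics.pvariance(predictions, mu=mean)
    return mean, variance

# Hypothetical stand-in for the model: the sum of the features.
mc_mean, mc_var = mc_uncertainty(sum, [1.0, 2.0], noise_std=0.1)
```

A prediction that barely changes under small input perturbations has low variance and can be reported with high confidence.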
λ=0.1 is the uncertainty regularization coefficient, which balances the prediction against the uncertainty. The confidence is calculated as follows:
;
The HYSPLIT backward trajectory model cooperates with the global tracing XGBoost model: the HYSPLIT backward trajectory model provides a candidate region (e.g., a 1 km grid), and the global tracing XGBoost model then precisely locates the coordinates and determines the type within this region. For example, HYSPLIT might identify "a certain industrial area" as the pollution source area, and XGBoost then determines the coordinates and type of a particular plant or pipeline.
In the embodiment of the invention, step 6 is specifically as follows: the prediction result is output to a central control platform, MongoDB is used to store real-time prediction result data, and HDFS is used to store historical prediction result data. In this embodiment, ECharts is used to generate a pollution source heat map. The displayed content comprises the pollution source coordinates, the pollution source type, the prediction confidence and the prediction uncertainty.
If the confidence is >0.8 and the type is "factory emission" or "pipe leak", an alert is sent to the regulatory agency through the API with a delay of <1 second.
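The alert rule stated above is simple enough to express directly. The sketch below encodes it as a predicate; the function and constant names are hypothetical, and the actual API call to the regulatory agency is deliberately left out because the text does not describe it.

```python
# Values taken from the rule in the text.
ALERT_TYPES = {"factory emission", "pipe leak"}
CONFIDENCE_THRESHOLD = 0.8

def should_notify_regulator(confidence, source_type):
    """Return True when a prediction meets the alert rule.

    The rule requires a confidence strictly above 0.8 and a pollution
    source type of "factory emission" or "pipe leak".
    """
    return confidence > CONFIDENCE_THRESHOLD and source_type in ALERT_TYPES

# Example predictions; the "unknown" type is made up for the negative case.
decisions = [
    should_notify_regulator(0.90, "pipe leak"),
    should_notify_regulator(0.80, "factory emission"),
    should_notify_regulator(0.95, "unknown"),
]
```

Keeping the rule in one small function makes the threshold and the list of alert types easy to audit and to change.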
MongoDB: real-time data (ttl=1 hour), concurrent query delay <10 milliseconds.
HDFS: historical data (data stored for 1 year, data volume about 20 TB), supporting offline analysis.
The invention also includes a simulation test procedure using MATLAB to simulate VOCs leakage (concentration surge 50%, coordinates (1000, 2000), type "factory emissions"). The simulation test results are shown in table 1:
TABLE 1
Index | Result
Positioning error | 75 meters
Type accuracy | 94%
Confidence | 0.90
Uncertainty | Standard deviation of coordinates 45 m; standard deviation of type probability 0.07
F1 score | 0.94
Inference time | 140 ms (Jetson Nano)
In the field test stage, the system was deployed in the Jilin open areas for 30 days and 1500 pollution events were recorded. The field test results are shown in Table 2:
TABLE 2
Index | Result
Positioning error | Average 76 meters; 97% of events <100 meters
Type accuracy | 94% (4 types)
Confidence | Average 0.89; <0.8 in 2% of events
Uncertainty | Standard deviation of coordinates <50 meters; standard deviation of type probability <0.08
Communication delay | <500 ms per round
System stability | >99.9%
CPU utilization | <75% (Jetson Nano)
Memory occupancy | <450 MB
Prometheus is used to monitor the performance of the edge computing devices (CPU, memory), and Grafana generates a real-time dashboard that displays the prediction confidence and uncertainty.
In terms of implementation effect, this embodiment achieves efficient and privacy-preserving pollution monitoring and management for the chemical industry park:
Accuracy: positioning error <80 m and type F1 >0.93, superior to traditional methods (e.g., single-sensor tracing, error >150 m).
Real-time performance: inference <150 ms, communication <500 ms, alarm <1 second.
Privacy protection: federated learning reduces data transmission by 90%, and TLS 1.3 encryption ensures security.
Scalability: 40 sites cover 10 km², and the system can easily be expanded to larger areas.
Robustness: incremental learning adapts to environmental changes, and stability exceeds 99.9%.
Compared with the prior art, the method and the device significantly improve tracing accuracy and real-time performance through joint edge-cloud training, XGBoost multi-objective optimization and complex prediction functions.
As shown in fig. 2, the invention also discloses a device for environmental monitoring and pollution tracing in a chemical industry park, which comprises:
a data acquisition and hardware deployment module, used for collecting pollutant data of air, water and soil in real time through an Internet of Things sensor network and for deploying edge computing devices and a cloud server;
a software configuration module, used for preprocessing the pollutant data to obtain processed pollutant data;
a model training and optimization module, used for training tracing XGBoost sub-models on the edge computing devices through federated learning, the cloud server performing weighted aggregation on the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model;
a variational autoencoder anomaly detection module, used for dynamically adjusting an anomaly threshold by using a random forest model, detecting the pollutant concentration corresponding to the processed pollutant data through the variational autoencoder, determining an anomaly score corresponding to the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold;
a pollution tracing and tracking module, used for, when an early warning is issued, using a 4D meteorological database and integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT backward trajectory model and the global tracing XGBoost model to locate the pollution source coordinates and determine the pollution source type, and predicting the subsequent propagation path of the pollutants by using a Transformer model;
and a result output and storage module, used for outputting the prediction result to the central control platform, storing real-time data by using MongoDB and storing historical data by using HDFS.
It will be evident to those skilled in the art that the embodiments of the invention are not limited to the details of the foregoing illustrative embodiments, and that the embodiments of the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of embodiments being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units, modules or means recited in a system, means or terminal claim may also be implemented by means of software or hardware by means of one and the same unit, module or means. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the embodiment of the present invention, and not for limiting, and although the embodiment of the present invention has been described in detail with reference to the above-mentioned preferred embodiments, it should be understood by those skilled in the art that modifications and equivalent substitutions can be made to the technical solution of the embodiment of the present invention without departing from the spirit and scope of the technical solution of the embodiment of the present invention.

Claims (10)

1. A method for environmental monitoring and pollution tracing in a chemical industry park, characterized by comprising the following steps:
Step 1, collecting pollutant data of air, water and soil in real time through an Internet of Things sensor network, and deploying edge computing devices and a cloud server;
Step 2, preprocessing the pollutant data to obtain processed pollutant data;
Step 3, training tracing XGBoost sub-models on the edge computing devices through federated learning, and performing, by the cloud server, weighted aggregation on the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model;
Step 4, dynamically adjusting an anomaly threshold by using a random forest model, detecting the pollutant concentration corresponding to the processed pollutant data through a variational autoencoder, determining an anomaly score corresponding to the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold;
Step 5, when an early warning is issued, using a 4D meteorological database and integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT backward trajectory model and the global tracing XGBoost model to locate the pollution source coordinates and determine the pollution source type, and predicting the subsequent propagation path of the pollutants by using a Transformer model;
Step 6, outputting the prediction result to a central control platform, storing real-time prediction result data by using MongoDB, and storing historical prediction result data by using HDFS.
2. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 1, wherein step 1 comprises:
deploying environmental monitoring stations, and deploying air sensors, water quality sensors and soil sensors to form an Internet of Things sensor network, collecting pollutant data of air, water and soil in real time based on the Internet of Things sensor network, and deploying edge computing devices and a cloud server.
3. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 2, wherein step 2 comprises processing the pollutant data stream in real time using the Apache Kafka stream computing framework, the computed pollutant data comprising the pollutant mean, the pollutant median and the pollutant variance.
4. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 3, wherein step 3 comprises:
training the tracing XGBoost sub-models on the edge computing devices through federated learning, sending the output data of the tracing XGBoost sub-models to the cloud server through a gRPC channel, and performing, by the cloud server, weighted aggregation on the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model, the weighted aggregation being given by the following expression:
;
In this expression, n denotes the nth edge computing device (one edge computing device is deployed at each monitoring site) and m is the index over the edge computing devices. The quantities involved are: the data volume of the nth monitoring site; the error of the XGBoost sub-model on the nth edge computing device; a penalty parameter; the data volume processed by the mth edge computing device; and the weight of the tracing XGBoost sub-model on the nth edge computing device. The cloud server distributes the global tracing XGBoost model daily through an OTA mechanism.
5. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 4, further comprising performance optimization of the tracing XGBoost sub-model, the variational autoencoder and the Transformer model, the performance optimization method comprising:
applying pruning and quantization to the tracing XGBoost sub-model, the variational autoencoder and the Transformer model to reduce the number of parameters, using parallel computing to reduce computation delay, and updating the model parameters daily by incremental learning;
optimizing the tracing XGBoost sub-model using a multi-objective loss function.
6. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 4, wherein in step 4, the loss function of the variational autoencoder is:
;
In this expression, N is the number of contaminant species, t is the current time step, and i and j are index variables. The other quantities are: the normalized contaminant concentration; the reconstructed concentration; the mean of the latent variable; the variance of the latent variable; and the loss function value of the variational autoencoder.
7. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 6, wherein in step 5, integrating the processed pollutant data, the meteorological data and the remote sensing data through Kalman filtering and combining the HYSPLIT backward trajectory model and the global tracing XGBoost model to locate the pollution source coordinates and the pollution source type comprises:
fusing the processed pollutant data, the 4D meteorological data and the remote sensing data through Kalman filtering and outputting the fusion concentration; analyzing, by the HYSPLIT backward trajectory model, the pollutant propagation path using Lagrangian particle tracking and determining the area of the pollution source; and locating, by the global tracing XGBoost model, the pollution source coordinates within the area of the pollution source and determining the pollution source type.
8. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 7, wherein in step 5, predicting the subsequent propagation path of the pollutants by using a Transformer model comprises: taking the pollution source coordinates and the pollution source type as initial conditions for the propagation path prediction of the Transformer model, and predicting, by the Transformer model, the subsequent propagation path of the pollutants and the concentration distribution for the next 24 hours.
9. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 8, wherein said step 6 comprises:
Outputting the prediction result to a central control platform, storing real-time data by using MongoDB, storing historical data by using HDFS, simulating pollutant leakage by using MATLAB, and verifying positioning errors and type classification errors.
10. A device for environmental monitoring and pollution tracing in a chemical industry park, characterized in that the device comprises:
a data acquisition and hardware deployment module, used for collecting pollutant data of air, water and soil in real time through an Internet of Things sensor network and for deploying edge computing devices and a cloud server;
a software configuration module, used for preprocessing the pollutant data to obtain processed pollutant data;
a model training and optimization module, used for training tracing XGBoost sub-models on the edge computing devices through federated learning, the cloud server performing weighted aggregation on the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model;
a variational autoencoder anomaly detection module, used for dynamically adjusting an anomaly threshold by using a random forest model, detecting the pollutant concentration corresponding to the processed pollutant data through the variational autoencoder, determining an anomaly score corresponding to the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold;
a pollution tracing and tracking module, used for, when an early warning is issued, using a 4D meteorological database and integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT backward trajectory model and the global tracing XGBoost model to locate the pollution source coordinates and determine the pollution source type, and predicting the subsequent propagation path of the pollutants by using a Transformer model;
and a result output and storage module, used for outputting the prediction result to the central control platform, storing real-time data by using MongoDB and storing historical data by using HDFS.
CN202511054980.8A 2025-07-30 2025-07-30 A method and device for environmental monitoring and pollution tracing in chemical parks Active CN120559180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511054980.8A CN120559180B (en) 2025-07-30 2025-07-30 A method and device for environmental monitoring and pollution tracing in chemical parks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202511054980.8A CN120559180B (en) 2025-07-30 2025-07-30 A method and device for environmental monitoring and pollution tracing in chemical parks

Publications (2)

Publication Number Publication Date
CN120559180A CN120559180A (en) 2025-08-29
CN120559180B true CN120559180B (en) 2025-09-30

Family

ID=96833407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511054980.8A Active CN120559180B (en) 2025-07-30 2025-07-30 A method and device for environmental monitoring and pollution tracing in chemical parks

Country Status (1)

Country Link
CN (1) CN120559180B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729536A (en) * 2012-07-31 2014-04-16 通用电气公司 Method and apparatus for providing in-flight weather data
CN113009086A (en) * 2021-03-08 2021-06-22 重庆邮电大学 Method for exploring urban atmospheric pollutant source based on backward trajectory mode

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407800A (en) * 2023-09-11 2024-01-16 北京工商大学 A social media robot detection method and system based on random forest and XGBoost model
CN119129414B (en) * 2024-09-06 2025-11-11 武汉市三藏科技有限责任公司 Pollution source information display method, pollution source information display device, electronic equipment and computer readable medium
CN119763660B (en) * 2025-03-07 2025-07-01 同济大学 A method for tracing the source of water pollutants based on environmental DNA and machine learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729536A (en) * 2012-07-31 2014-04-16 通用电气公司 Method and apparatus for providing in-flight weather data
CN113009086A (en) * 2021-03-08 2021-06-22 重庆邮电大学 Method for exploring urban atmospheric pollutant source based on backward trajectory mode

Also Published As

Publication number Publication date
CN120559180A (en) 2025-08-29

Similar Documents

Publication Publication Date Title
Kang et al. Air quality prediction: Big data and machine learning approaches
CN114926749B (en) Near-surface atmospheric pollutant inversion method and system based on remote sensing image
Wang et al. Fine-scale estimation of carbon monoxide and fine particulate matter concentrations in proximity to a road intersection by using wavelet neural network with genetic algorithm
CN117538503A (en) Real-time intelligent soil pollution monitoring system and method
Liu et al. A novel method for regional NO2 concentration prediction using discrete wavelet transform and an LSTM network
Gogikar et al. Seasonal prediction of particulate matter over the steel city of India using neural network models
Kumar et al. Prediction and examination of seasonal variation of ozone with meteorological parameter through artificial neural network at NEERI, Nagpur, India
Jonnalagadda et al. Forecasting atmospheric visibility using auto regressive recurrent neural network
CN112016696B (en) PM integrating satellite observation and ground observation 1 Concentration inversion method and system
CN119005757B (en) Environment variable driven perchlorate point source pollution treatment method and system
CN118538329B (en) An outdoor air data processing system based on data analysis
CN120446400A (en) Air pollution monitoring method based on UAV remote sensing and machine learning
CN119207627B (en) A dynamic monitoring method and system for river micropollutants based on artificial intelligence
CN117665974A (en) Sea fog and low cloud coverage two-dimensional distribution field forecasting method and system
CN119721356A (en) An ecological environment inspection and testing information management system based on environmental AI big model
CN116013426A (en) A Prediction Method of Ozone Concentration at Sites with High Temporal and Spatial Resolution
Salcedo-Bosch et al. Forecasting particulate matter concentration in Shanghai using a small-scale long-term dataset
CN119807603A (en) A method and system for rapid sewage tracing
Huan et al. River water quality forecasting: a novel LSTM-Transformer approach enhanced by multi-source data
CN117216480A (en) A remote sensing estimation method of near-surface ozone based on deeply coupled geographical spatiotemporal information
CN120559180B (en) A method and device for environmental monitoring and pollution tracing in chemical parks
Chiang et al. Deep-learning-based multi-timestamp multi-location PM 2.5 prediction: Verification by using a mobile monitoring system with an IoT framework deployed in the urban zone of a metropolitan area
Dragomir et al. Prediction of the NO2 concentration data in an urban area using multiple regression and neuronal networks
Hassan et al. Intelligent dust monitoring system based on IoT
CN120496292A (en) Methane change dynamic traceability analysis and early warning system based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant