CN120559180B

CN120559180B - A method and device for environmental monitoring and pollution tracing in chemical parks

Info

Publication number: CN120559180B
Application number: CN202511054980.8A
Authority: CN
Inventors: 房春生; 阿巴西·阿萨德; 王菊; 阿赫塔·阿尼斯; 陈嘉
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2025-07-30
Filing date: 2025-07-30
Publication date: 2025-09-30
Anticipated expiration: 2045-07-30
Also published as: CN120559180A

Abstract

This invention proposes a method and apparatus for environmental monitoring and pollution source tracing in chemical parks, relating to the fields of environmental monitoring and pollution source tracing. This method collects pollutant data through IoT sensors, preprocesses the data, and generates processed pollutant data. Combining edge computing with cloud computing, it employs federated learning to train an XGBoost sub-model for source tracing, optimizing pollution source location and classification. Using VAE for anomaly detection, Kalman filtering for data fusion, and Transformer for propagation path prediction, it enables real-time, efficient, and secure pollution early warning management.

Description

Method and device for environmental monitoring and pollution tracing of chemical industry park

Technical Field

The invention relates to the field of environmental monitoring and pollution tracing, in particular to a method and a device for environmental monitoring and pollution tracing of a chemical industry park.

Background

With the acceleration of industrialization, environmental pollution in chemical parks, which are core areas for chemical production and processing, has become increasingly prominent. Contaminants in the atmosphere, water and soil, such as PM2.5, volatile organic compounds (VOCs), chemical oxygen demand (COD) and heavy metals, pose a risk not only to the ecological environment but also to the health of surrounding residents. Timely and accurate monitoring of contaminant concentrations, detection of abnormal events, location of pollution sources and prediction of contaminant propagation paths are key requirements for environmental management and pollution control in chemical parks.

In the prior art, environmental monitoring mainly relies on a fixed site sensor network to perform pollution analysis by collecting air, water quality and soil data. For example, conventional monitoring systems utilize laser scattering, electrochemical or spectroscopic analysis techniques to collect contaminant concentration data in real-time and combine the meteorological data for preliminary analysis. However, these systems have the following limitations:

Data fusion is insufficient: existing monitoring systems generally process sensor, meteorological and remote sensing data independently and lack an efficient fusion method, so analysis accuracy is degraded by data noise and missing data. For example, single-sensor data can hardly cope with pollutant dispersion under complex meteorological conditions (such as strong winds or temperature inversions).

Anomaly detection capability is limited: traditional anomaly detection methods are mostly based on fixed thresholds or simple statistical models and adapt poorly to dynamic environmental changes (such as seasonal weather or fluctuations in production activity), so the false alarm rate is high or many events are missed. Dynamic thresholds and deep learning models (e.g., variational autoencoders) are not yet widely used.

Pollution tracing is difficult: existing tracing techniques, such as backward trajectory models (e.g., HYSPLIT), rely on high-resolution meteorological data and adapt poorly to multi-source data integration and complex pollution scenarios. Traditional machine learning models (such as support vector machines) perform poorly on high-dimensional, multi-scale features and in real time; the positioning error often exceeds 200 meters, which cannot meet the requirements of precise treatment.

Data transmission cost and privacy risk are high: traditional centralized training requires raw data to be uploaded to the cloud, and the data volume is large (hundreds of MB per site per day), which increases transmission cost and the risk of leakage. The use of distributed learning (e.g., federated learning) in environmental monitoring is still being explored.

Real-time performance and computational efficiency are insufficient: when existing systems process high-dimensional data (such as minute-level multi-pollutant concentrations) with complex models (such as deep neural networks), the computation latency is high (>500 milliseconds), which cannot meet the requirements of real-time early warning and tracking. Edge computing and model compression techniques are not yet widely used in environmental monitoring.

In recent years, artificial intelligence (AI) and Internet of Things (IoT) technologies have offered new opportunities for environmental monitoring. The literature reports that IoT-based sensor networks can achieve minute-level data acquisition and, combined with edge computing devices (such as NVIDIA Jetson) that perform local inference, can reduce the load on the cloud. However, existing AI models (e.g., random forests or gradient-boosted trees) mostly use centralized training, neglect the computing power of edge computing devices, and do not adequately optimize multi-objective tasks (e.g., pollution source coordinate regression and type classification). Federated learning, as a distributed training paradigm, significantly reduces communication overhead and protects privacy by transmitting only model parameters instead of raw data, but its application to pollution tracing in chemical parks still faces challenges such as multi-scale feature processing, uncertainty quantification and dynamic weight adjustment.

In addition, pollution propagation prediction needs to combine high-dimensional meteorological data with spatiotemporal features, and traditional models (such as convolutional neural networks) capture long-term dependencies poorly. The success of the Transformer model in sequence prediction suggests its potential for predicting pollutant propagation paths, but its real-time performance and computational efficiency still need to be optimized.

Disclosure of Invention

In view of the above, the invention provides a method and a device for environmental monitoring and pollution tracing in a chemical industry park, which can optimize pollution source positioning and type classification and realize real-time, efficient and safe pollution early warning management.

A method for environmental monitoring and pollution tracing of a chemical industry park comprises the following steps:

Step 1, collecting pollutant data of air, water and soil in real time through an Internet of Things sensor network, and deploying edge computing devices and a cloud server;

Step 2, preprocessing the pollutant data to obtain processed pollutant data;

Step 3, training a tracing XGBoost sub-model on each edge computing device through federated learning, the cloud server performing weighted aggregation of the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model;

Step 4, dynamically adjusting an anomaly threshold using a random forest model, examining the pollutant concentration corresponding to the processed pollutant data with a variational autoencoder, determining an anomaly score corresponding to the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold;

Step 5, when an early warning is issued, using a 4D meteorological database, integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT backward trajectory model and the global tracing XGBoost model to locate the pollution source coordinates and pollution source type, and predicting the subsequent propagation path of the pollutants using a Transformer model;

Step 6, outputting the prediction result to a central control platform, storing real-time prediction result data in MongoDB, and storing historical prediction result data in HDFS.

In particular, said step 1 comprises:

The method comprises the steps of deploying an environment monitoring station, deploying an air sensor, a water quality sensor and a soil sensor to form an Internet of things sensor network, collecting pollutant data of air, water and soil in real time based on the Internet of things sensor network, and deploying edge computing equipment and a cloud server.

In particular, step 2 comprises processing the pollutant data stream in real time using the Apache Kafka stream computing framework, the processed pollutant data including a pollutant mean, a pollutant median and a pollutant variance.
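As an illustration of the statistics named in step 2, the following minimal sketch computes the mean, median and variance over a sliding window of readings. It uses only the Python standard library rather than Apache Kafka, and the window length, the sample readings and the function name `window_stats` are assumptions made for illustration.

```python
from collections import deque
from statistics import mean, median, pvariance


def window_stats(readings, window=5):
    """Yield (mean, median, variance) for each full sliding window."""
    buf = deque(maxlen=window)  # oldest reading drops out automatically
    for value in readings:
        buf.append(value)
        if len(buf) == window:
            yield mean(buf), median(buf), pvariance(buf)


# Hypothetical PM2.5 readings; the last value is a spike.
stats = list(window_stats([10.0, 12.0, 11.0, 13.0, 14.0, 50.0], window=5))
```

In this sketch the spike in the last reading pulls the mean of the second window far more than the median, which is one reason to track both.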

In particular, said step 3 comprises:

The tracing XGBoost sub-model is trained on the edge computing device through federated learning, the output data of the tracing XGBoost sub-model is sent to the cloud server through a gRPC channel, and the cloud server performs weighted aggregation of the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model, the weighted aggregation being achieved through the following expression:

;

where $D_n$ denotes the data volume of the n-th monitoring site; $E_n$ denotes the error of the XGBoost sub-model on the n-th edge computing device; n denotes the n-th edge computing device, one edge computing device being deployed at each monitoring site; m denotes the index over the edge computing devices; $\gamma$ denotes a penalty parameter; $D_m$ denotes the data volume processed by the m-th edge computing device; and $w_n$ denotes the weight of the tracing XGBoost sub-model on the n-th edge computing device. The cloud server distributes the global tracing XGBoost model daily through an OTA mechanism.

In particular, the method further comprises performing performance optimization on the tracing XGBoost sub-model, the variational autoencoder and the Transformer model, the performance optimization comprising:

applying pruning and quantization to the tracing XGBoost sub-model, the variational autoencoder and the Transformer model to reduce the number of parameters, using parallel computing to reduce computation latency, and updating model parameters daily by incremental learning; and

optimizing the tracing XGBoost sub-model using a multi-objective loss function.

In particular, in step 4, the loss function of the variational autoencoder is:

$$L_{VAE} = \sum_{i=1}^{N} \left(c_{i,t} - \hat{c}_{i,t}\right)^2 - \frac{1}{2} \sum_{j} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right);$$

where $c_{i,t}$ denotes the normalized concentration of the i-th pollutant at time step t, $\hat{c}_{i,t}$ denotes the reconstructed concentration, N denotes the number of pollutant species, $\mu_j$ denotes the mean of the j-th latent variable, $\sigma_j^2$ denotes the variance of the j-th latent variable, t is the current time step, i and j are summation indices, and $L_{VAE}$ is the loss function value of the variational autoencoder.

In particular, in step 5, using the 4D meteorological database, integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, and combining the HYSPLIT backward trajectory model and the global tracing XGBoost model to locate the pollution source coordinates and pollution source type comprises:

fusing the processed pollutant data, the 4D meteorological data and the remote sensing data through Kalman filtering and outputting a fused concentration, wherein the HYSPLIT backward trajectory model uses Lagrangian particle tracking to analyze the pollutant propagation path and determine the area in which the pollution source lies, and the global tracing XGBoost model locates the pollution source coordinates within that area and determines the pollution source type.

In particular, in step 5, predicting the subsequent propagation path of the pollutants using the Transformer model comprises taking the pollution source coordinates and the pollution source type as initial conditions of the Transformer model, and using the Transformer model to predict the subsequent propagation path of the pollutants and the pollutant concentration distribution over the next 24 hours.

In particular, said step 6 comprises:

Outputting the prediction result to a central control platform, storing real-time data by using MongoDB, storing historical data by using HDFS, simulating pollutant leakage by using MATLAB, and verifying positioning errors and type classification errors.

The invention also discloses a device for environmental monitoring and pollution tracing in the chemical industry park, which comprises:

the data acquisition and hardware deployment module is used for acquiring pollutant data of air, water and soil in real time through an Internet of things sensor network and deploying edge computing equipment and a cloud server;

The software configuration module is used for preprocessing the pollutant data to obtain processed pollutant data;

the model training and optimizing module is used for training the tracing XGBoost sub-models on the edge computing devices through federated learning, the cloud server performing weighted aggregation of the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model;

the variational autoencoder anomaly detection module is used for dynamically adjusting an anomaly threshold using a random forest model, examining the pollutant concentration corresponding to the processed pollutant data with the variational autoencoder, determining an anomaly score corresponding to the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold;

the pollution tracing and tracking module is used for, when an early warning is issued, using a 4D meteorological database, integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT backward trajectory model and the global tracing XGBoost model to locate the pollution source coordinates and pollution source type, and predicting the subsequent propagation path of the pollutants using a Transformer model;

and the result output and storage module is used for outputting the prediction result to the central control platform, storing real-time data by using the MongoDB and storing historical data by using the HDFS.

The beneficial effects are that:

According to this technical scheme, high-precision pollution source positioning is achieved: using the tracing XGBoost model (100 boosted trees in total, with 25 trees trained on each edge device) and a multi-objective loss function, the pollution source coordinate positioning error is kept within an effective range, which is superior to traditional methods (such as HYSPLIT, with an error of 200 m) and provides reliable support for accurately identifying pollution sources.

According to this technical scheme, the pollution source type classification accuracy is markedly improved: a term of the multi-objective loss function optimizes the classification performance of the tracing XGBoost sub-model, achieving 94% type accuracy (types include factory emission, pipeline leakage, diffusion pollution and others), which improves on a traditional support vector machine and effectively guides differentiated treatment measures.

According to this technical scheme, the stability of model prediction is enhanced and prediction uncertainty is reduced: the coordinate standard deviation is <45 meters, the type probability standard deviation is <0.07, and events with a confidence below 0.8 account for only 2%. A term of the multi-objective loss function regularizes the tracing XGBoost sub-model, ensuring robustness under dynamic environmental and weather changes.

According to this technical scheme, the volume of transmitted data is greatly reduced and privacy is protected: federated learning transmits only the tracing XGBoost sub-model parameters, which reduces the raw data transmission volume by more than 90% and, combined with TLS 1.3 encryption, markedly improves security compared with traditional centralized training.

According to this technical scheme, real-time and efficient environmental monitoring and early warning are achieved: inference on edge computing devices, cloud aggregation and minute-level data acquisition, combined with VAE anomaly detection and API alarms, meet the real-time early warning requirements of a chemical park and outperform traditional systems.

According to this technical scheme, data fusion and pollutant propagation prediction capabilities are improved: the processed pollutant data, 4D meteorological data and remote sensing data are fused through Kalman filtering and combined with a Transformer model to predict the pollutant propagation path over the next 24 hours (100 m grid, mean square error <0.1), which improves on traditional convolutional networks and provides high-precision support for the prevention and control of pollution dispersion.

According to this technical scheme, computational efficiency and system scalability are optimized: the memory footprint on the edge computing device is <450 MB, the cloud aggregation time is <1 minute, and the system covers 40 sites (10 km²) and is easily extended to a larger area. Through model pruning (30% fewer parameters), 16-bit floating-point quantization and CUDA (Compute Unified Device Architecture) parallel computing (50% lower latency), the system is more efficient than traditional centralized systems.

Drawings

FIG. 1 is a schematic diagram of a method for chemical industry park environmental monitoring and pollution tracing in accordance with the present invention;

FIG. 2 is a schematic diagram of an apparatus for environmental monitoring and pollution tracing in a chemical industrial park according to the present invention.

Detailed Description

The invention will now be described in detail by way of example with reference to the accompanying drawings.

The invention provides a method for environmental monitoring and pollution tracing in a chemical industry park, which is shown in figure 1 and comprises the following steps:

Step 1, collecting pollutant data of air, water and soil in real time through an Internet of Things sensor network, and deploying edge computing devices and a cloud server;

Step 2, preprocessing the pollutant data to obtain processed pollutant data;

Step 3, training a tracing XGBoost (Extreme Gradient Boosting) sub-model on each edge computing device through federated learning, the cloud server performing weighted aggregation of the plurality of tracing XGBoost sub-models to form a global tracing XGBoost model;

Step 4, dynamically adjusting an anomaly threshold using a random forest model, examining the pollutant concentration corresponding to the processed pollutant data with a variational autoencoder, determining an anomaly score corresponding to the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold;

Step 5, when an early warning is issued, using a 4D meteorological database, integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT (Hybrid Single-Particle Lagrangian Integrated Trajectory) backward trajectory model and the global tracing XGBoost model to locate the pollution source coordinates and pollution source type, and predicting the subsequent propagation path of the pollutants using a Transformer model;

Step 6, outputting the prediction result to a central control platform, storing real-time prediction result data in MongoDB, and storing historical prediction result data in HDFS (Hadoop Distributed File System).

In this embodiment of the invention, step 1 specifically comprises collecting pollutant data of air, water and soil in real time through an Internet of Things (IoT) sensor network, and deploying edge computing devices and a cloud server. Specifically, 60-dimensional pollutant data of the air, water and soil (including PM2.5, 57 volatile organic compounds and chemical oxygen demand) are collected in real time through the IoT sensor network, and 40 edge computing devices (NVIDIA Jetson Nano) and a cloud server (AWS EC2, NVIDIA A100 GPU) are deployed. Air sensors (PM2.5, accuracy ±1 μg/m³), water quality sensors (COD, accuracy ±0.5 mg/L) and soil sensors (heavy metals, detection limit <1 ppm) are deployed, and 60-dimensional pollutant concentration data are collected at 40 monitoring sites at minute-level frequency (once per minute), the total data volume being about 100 MB/day/site.

Minute-level data transmission is achieved using a 5G module (transmission rate >100 Mbps, latency <100 ms) or a low-power wide-area network (LPWAN, power consumption <10 mW). The edge computing devices (NVIDIA Jetson Nano, 4 GB RAM) are configured to support real-time inference, and the cloud server (AWS EC2, NVIDIA A100 GPU) supports model training and aggregation. The system is deployed in a chemical park in the Jilin Economic and Technological Development Zone, covers an area of 64 km² and comprises 40 monitoring sites.

In this embodiment, step 2 specifically comprises configuring a stream computing framework, artificial intelligence models and databases, preprocessing the pollutant data and performing real-time statistical analysis; that is, configuring the Apache Kafka stream computing framework, the artificial intelligence (AI) models (variational autoencoder, gradient boosting decision tree, XGBoost, Transformer) and the MongoDB/HDFS databases.

In this embodiment, the edge computing device is configured to receive the processed pollutant data, meteorological data from the 4D meteorological database (100-meter spatial resolution, 1-minute temporal resolution, wind speed accuracy ±0.1 m/s) and remote sensing data (10-meter resolution, updated daily).

In this embodiment, step 3 specifically comprises training a tracing XGBoost sub-model on each edge computing device through federated learning, the cloud server performing weighted aggregation to form a global tracing XGBoost model, and optimizing the performance of the artificial intelligence models through model compression, GPU parallel computing and incremental learning. For example, each edge computing device trains a tracing XGBoost sub-model comprising 25 decision trees, the cloud server aggregates the sub-models by weighting into a global tracing XGBoost model comprising 100 trees, and transmitting only the sub-model parameters reduces the data volume by 90%.

The invention acquires historical data and integrates the processed pollutant data, 4D meteorological data and remote sensing data in the historical data using Kalman filtering to obtain training samples. This integration is performed in the same way as the Kalman filtering integration in step 5 and yields a 200-dimensional feature vector; the integration method is described in detail under step 5.

The invention uses principal component analysis (PCA) to reduce the features of the training samples to 80 dimensions while retaining 95% of the variance, with a computation time of <30 milliseconds. Data preprocessing includes wavelet denoising (Daubechies, 3 levels) and spatiotemporal KNN imputation (k=5), with a loss rate of <1%.
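To make the KNN imputation step concrete, the following minimal sketch fills one missing reading with the mean of the k nearest sites that do have a value. It covers only the spatial part of the spatiotemporal scheme; the Euclidean distance, the site layout, the unweighted mean and the function name `knn_fill` are assumptions for illustration and are not taken from the patent.

```python
import math


def knn_fill(sites, target, k=5):
    """Estimate a missing value at `target` (x, y) as the mean of the
    k nearest sites with a valid reading.

    `sites` is a list of ((x, y), value) pairs; value None marks a gap.
    """
    known = [(math.dist(pos, target), val)
             for pos, val in sites if val is not None]
    known.sort(key=lambda pair: pair[0])  # nearest first
    nearest = known[:k]
    if not nearest:
        raise ValueError("no valid neighbours available")
    return sum(val for _, val in nearest) / len(nearest)


# Hypothetical site layout; the site at (1, 1) has lost its reading.
sites = [((0, 0), 10.0), ((1, 0), 12.0), ((0, 1), 14.0),
         ((5, 5), 100.0), ((1, 1), None), ((2, 0), 16.0), ((0, 2), 18.0)]
filled = knn_fill(sites, target=(1, 1), k=5)
```

With k=5 the distant site at (5, 5) is excluded, so its outlying value does not affect the filled reading.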

In the model training stage, each edge computing device trains a tracing XGBoost sub-model (comprising 25 boosted trees) that predicts the pollution source coordinates and type.

Input: 80-dimensional feature vector (after PCA dimensionality reduction).

Output: coordinates $(\hat{x}, \hat{y})$ and type (4 types in total).

The model parameters are as follows:

Maximum depth 6, learning rate 0.1, L2 regularization lambda = 1.

Objective function: custom multi-objective loss function.

Training data: 1 year of historical data.

The multi-objective loss function is used to optimize the tracing XGBoost sub-model and is expressed as:

$$L = \lambda_1 L_{coord} + \lambda_2 L_{type} + \lambda_3 L_{var};$$

where $L_{coord} = \frac{1}{Num}\sum_{i1=1}^{Num}\left[(x_{i1}-\hat{x}_{i1})^2 + (y_{i1}-\hat{y}_{i1})^2\right]$ is the coordinate mean square error, which optimizes the prediction of the pollution source coordinates $(\hat{x}, \hat{y})$; Num is the number of samples and i1 is the sample index. $(x_{i1}, y_{i1})$ denotes the actual pollution source coordinates of the i1-th pollution event (in meters, based on the geographic coordinate system of the chemical park, e.g., UTM projection).

$(\hat{x}_{i1}, \hat{y}_{i1})$ denotes the pollution source coordinates predicted by the tracing XGBoost sub-model from the 80-dimensional multi-scale feature vector (comprising minute-level, 5-minute-level and 60-minute-level concentrations, meteorological parameters, spatial correlation and remote sensing data, after PCA dimensionality reduction). $(x_{i1}-\hat{x}_{i1})^2 + (y_{i1}-\hat{y}_{i1})^2$ is the squared Euclidean distance between the predicted and true coordinates and measures the positioning error.

$L_{type} = -\frac{1}{Num}\sum_{i1=1}^{Num}\sum_{c=1}^{C} q_{i1,c}\log p_{i1,c}$ is the type cross entropy, which optimizes the pollution source type classification; C is the number of categories, set to 4: factory emission, pipeline leakage, diffusion pollution and others;

$q_{i1,c}$ denotes the true type label of the i1-th sample, one-hot encoded (e.g., [1, 0, 0, 0] represents "factory emission").

$p_{i1,c}$ denotes the type probability predicted by the tracing XGBoost sub-model for the i1-th sample; $-\sum_{c} q_{i1,c}\log p_{i1,c}$ is the cross entropy of the i1-th sample and measures the difference between the predicted probability and the true label.

$L_{var} = \frac{1}{T}\sum_{k=1}^{T}\left(f_k - \bar{f}\right)^2$ is the prediction variance, which reduces model uncertainty;

where T denotes the number of boosted trees of the sub-model, set to 25 (25 trees trained per edge computing device; the global model comprises 100 trees). $f_k$ denotes the prediction of the k-th boosted tree (coordinates or type probability); $\bar{f}$ denotes the average prediction of the 25 trees of the sub-model; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are weight coefficients balancing coordinate prediction, type classification and stability.
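The three terms described above (coordinate mean square error, type cross entropy and prediction variance across trees) can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the terms are combined as a plain weighted sum, each tree's prediction is reduced to a single scalar, and the function name `multi_objective_loss`, the default weights and the sample values are hypothetical. It does not show how such a loss would be wired into XGBoost training.

```python
import math


def multi_objective_loss(true_xy, pred_xy, true_onehot, pred_prob,
                         tree_preds, weights=(1.0, 1.0, 1.0)):
    """Return (total, (coord_mse, cross_entropy, variance))."""
    n = len(true_xy)
    # Coordinate term: mean squared Euclidean distance (metres squared).
    coord = sum((tx - px) ** 2 + (ty - py) ** 2
                for (tx, ty), (px, py) in zip(true_xy, pred_xy)) / n
    # Type term: mean cross entropy against one-hot labels.
    eps = 1e-12  # avoids log(0)
    ce = -sum(q * math.log(p + eps)
              for qs, ps in zip(true_onehot, pred_prob)
              for q, p in zip(qs, ps)) / n
    # Stability term: variance of per-tree predictions around their mean.
    mean_pred = sum(tree_preds) / len(tree_preds)
    var = sum((f - mean_pred) ** 2 for f in tree_preds) / len(tree_preds)
    w1, w2, w3 = weights
    return w1 * coord + w2 * ce + w3 * var, (coord, ce, var)


total, parts = multi_objective_loss(
    true_xy=[(0.0, 0.0), (10.0, 10.0)],
    pred_xy=[(3.0, 4.0), (10.0, 10.0)],
    true_onehot=[[1, 0, 0, 0], [0, 1, 0, 0]],
    pred_prob=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    tree_preds=[1.0, 3.0],
)
```

Returning the individual terms alongside the total makes it easier to see which objective dominates when the weight coefficients are tuned.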

The cloud collects the 40 tracing XGBoost sub-models, which are trained on the edge computing devices through federated learning, and forms the global tracing XGBoost model through weighted-average aggregation. The weighted aggregation is achieved through the following expression:

;

where $D_n$ denotes the data volume of the n-th monitoring site; $E_n$ denotes the error of the XGBoost sub-model on the n-th edge computing device; n denotes the n-th edge computing device, one edge computing device being deployed at each monitoring site; m denotes the index over the edge computing devices; $\gamma$ denotes a penalty parameter; $D_m$ denotes the data volume processed by the m-th edge computing device; and $w_n$ denotes the weight of the tracing XGBoost sub-model on the n-th edge computing device. The cloud server distributes the global tracing XGBoost model daily through an OTA (Over The Air) mechanism.
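The aggregation expression itself does not appear in this text, so the exact weighting rule is not known. The sketch below assumes one plausible form consistent with the listed quantities: each sub-model's weight is proportional to its site's data volume multiplied by an exponential penalty on its error, normalized over all edge devices. The exponential form, the function name `aggregation_weights` and the sample values are assumptions, not the patented formula.

```python
import math


def aggregation_weights(data_volumes, errors, penalty=1.0):
    """Assumed rule: weight_n ~ D_n * exp(-penalty * E_n), normalized to sum to 1."""
    scores = [d * math.exp(-penalty * e)
              for d, e in zip(data_volumes, errors)]
    total = sum(scores)
    return [s / total for s in scores]


# Three hypothetical sites: equal volume but different error (0 vs 1),
# and double volume with the same error (2 vs 0).
weights = aggregation_weights([100.0, 100.0, 200.0], [0.1, 0.5, 0.1])
```

Under this assumption a site with more data or a lower sub-model error contributes more to the global model, which matches the roles the text assigns to data volume and the penalty parameter.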

The weights of the dynamic trees in the global tracing XGBoost model, which come from the tracing XGBoost sub-models, are updated according to the following formula:

;

where k denotes the k-th dynamic tree (k = 1 to 100), a decision tree of the global tracing XGBoost model; j1 denotes the index over all trees (j1 = 1 to 100), used for normalization; t denotes the current time step (minute level, updated hourly); $e_k(t-1)$ denotes the prediction error of the k-th tree at the previous time step; and $r_k(t)$ denotes the correlation of the k-th tree with the current weather. α denotes an error penalty parameter, which may be 0.5, and β denotes a correlation enhancement parameter, which may be 0.3. $\omega_k(t)$ denotes the weight of the k-th tree and is used for dynamic prediction.
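The update formula is not reproduced in this text. As a hedged illustration only, the sketch below assumes a softmax-style rule in which a tree's weight falls with its previous-step error (scaled by the error penalty parameter, 0.5) and rises with its correlation to the current weather (scaled by the correlation enhancement parameter, 0.3), normalized over all trees. The softmax form and the function name `dynamic_tree_weights` are assumptions.

```python
import math


def dynamic_tree_weights(prev_errors, weather_corr, alpha=0.5, beta=0.3):
    """Assumed rule: weight_k ~ exp(-alpha * error_k + beta * corr_k),
    normalized over all trees."""
    logits = [-alpha * e + beta * c
              for e, c in zip(prev_errors, weather_corr)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


# Three hypothetical trees: low error, high error, low error + high correlation.
tree_w = dynamic_tree_weights([0.1, 0.9, 0.1], [0.2, 0.2, 0.8])
```

Under this assumed rule the third tree receives the largest weight and the second the smallest.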

In this embodiment, federated learning transmits only the tracing XGBoost sub-model parameters, and the raw data remain on the edge computing devices, so the transmission volume is reduced by 90%. The tracing XGBoost sub-model is uploaded to the cloud via the gRPC protocol, with an encryption delay of <10 milliseconds. The cloud distributes the global XGBoost model daily via OTA, which takes <5 minutes.

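Performance optimization is performed on the tracing XGBoost sub-model, the variational autoencoder and the Transformer model as follows: pruning and quantization are applied to the three models to reduce the number of parameters, parallel computing is used to reduce computation latency, and model parameters are updated daily by incremental learning; the tracing XGBoost sub-model is optimized using the multi-objective loss function.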

In this embodiment of the invention, step 4 specifically comprises dynamically adjusting the anomaly threshold using a random forest model, examining the pollutant concentration corresponding to the processed pollutant data with a variational autoencoder (VAE), determining the anomaly score corresponding to the processed pollutant data based on the anomaly threshold and that pollutant concentration, and issuing an early warning when the anomaly score exceeds a preset threshold.

In one embodiment, if the anomaly score is >0.9, an early warning is issued and manual verification is initiated.

In step 4, the variational autoencoder (VAE) processes the 60-dimensional normalized pollutant concentration $c_t$ and generates a reconstructed concentration $\hat{c}_t$ in order to detect abnormal events. The model architecture of the variational autoencoder is as follows:

Input layer: 60 dimensions (normalized concentration).

Encoder: 3-layer fully connected network (128, 64, 10 dimensions), outputting the latent variable mean $\mu$ and variance $\sigma^2$.

Decoder: 3-layer fully connected network (10, 64, 60 dimensions).

In step 4, the anomaly detection by the variational autoencoder comprises deploying the variational autoencoder for anomaly detection and generating anomaly scores, the loss function of the variational autoencoder being:

$$L_{VAE} = \sum_{i=1}^{N} \left(c_{i,t} - \hat{c}_{i,t}\right)^2 - \frac{1}{2} \sum_{j} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right);$$

where $c_{i,t}$ denotes the normalized concentration of the i-th pollutant at time step t, $\hat{c}_{i,t}$ denotes the reconstructed concentration, N denotes the number of pollutant species, $\mu_j$ denotes the mean of the j-th latent variable, $\sigma_j^2$ denotes the variance of the j-th latent variable, t is the current time step, i and j are summation indices, and $L_{VAE}$ is the loss function value of the variational autoencoder.
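For readers who want to check the two parts of a VAE loss numerically, the following minimal sketch evaluates a squared reconstruction error plus the standard Gaussian KL divergence for one time step. It assumes the common unweighted sum of the two terms, which may differ from the patent's exact scaling, and the function name `vae_loss` and the sample values are hypothetical.

```python
import math


def vae_loss(conc, recon, mu, var):
    """Squared reconstruction error plus KL divergence to a unit Gaussian.

    conc / recon: normalized and reconstructed concentrations (same length).
    mu / var: latent means and variances (same length, var > 0).
    """
    recon_err = sum((a - b) ** 2 for a, b in zip(conc, recon))
    kl = -0.5 * sum(1.0 + math.log(v) - m ** 2 - v
                    for m, v in zip(mu, var))
    return recon_err + kl


# Perfect reconstruction with a unit-Gaussian latent gives zero loss.
zero = vae_loss([0.2, 0.4], [0.2, 0.4], [0.0, 0.0], [1.0, 1.0])
# A poorly reconstructed reading raises the loss.
spike = vae_loss([0.2, 0.4], [0.2, 0.9], [0.0, 0.0], [1.0, 1.0])
```

A reading that the trained model reconstructs poorly yields a large reconstruction term, which is the signal an anomaly score would be built on.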

In step 4, dynamically adjusting the anomaly threshold using a random forest model specifically comprises predicting the anomaly threshold with a random forest (100 trees, maximum depth 20) from meteorological data (3 dimensions, including wind speed and humidity). The parameters of the random forest model are as follows:

Input: meteorological data + historical anomaly score (4 dimensions).

Output: threshold (scalar, e.g., 0.9).

Update frequency: once per hour.

Training time: <5 minutes (Jetson Nano).

An anomaly score >0.9 triggers an early warning; the false alarm rate is <2%.

In this embodiment of the invention, step 5 specifically comprises, when an early warning is issued, using a 4D meteorological database (100-meter spatial resolution, minute-level temporal resolution), integrating the processed pollutant data, meteorological data and remote sensing data through Kalman filtering, combining the HYSPLIT backward trajectory model and the global tracing XGBoost model to locate the pollution source coordinates and pollution source type, and predicting the subsequent propagation path of the pollutants using a Transformer model, wherein the pollution source coordinates, the pollution source type and the pollutant propagation path constitute the prediction result.

The processed pollutant data (180 dimensions) are concentration sequences (PM2.5, 57 VOCs, COD, etc.) at minute-level, 5-minute, and 60-minute time scales.

Meteorological features (4 dimensions): wind speed (m/s), wind direction (degrees), temperature (°C), and humidity (%), taken from the 4D meteorological database.

Spatial correlation (12 dimensions): the concentration covariance matrix between the 40 sites, based on the Kalman-filter fused data.

Remote sensing features (4 dimensions): the pollutant distribution from satellite images (100-meter resolution).

Together these form a 200-dimensional feature vector.
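The dimension bookkeeping above (180 + 4 + 12 + 4 = 200) can be checked mechanically when the feature vector is assembled. The following Python sketch is illustrative only; the group names and the helper `build_feature_vector` are hypothetical and are not part of the patent.

```python
# Dimensions as listed in the text; the group names are illustrative.
FEATURE_GROUPS = {
    "pollutant": 180,
    "meteorological": 4,
    "spatial_correlation": 12,
    "remote_sensing": 4,
}


def build_feature_vector(groups):
    """Concatenate the feature groups in a fixed order, validating each size."""
    vector = []
    for name, expected in FEATURE_GROUPS.items():
        values = list(groups[name])
        if len(values) != expected:
            raise ValueError(
                f"{name}: expected {expected} values, got {len(values)}"
            )
        vector.extend(values)
    return vector


# Placeholder zeros stand in for real measurements.
example = {name: [0.0] * size for name, size in FEATURE_GROUPS.items()}
print(len(build_feature_vector(example)))  # 200
```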

The processed pollutant data, meteorological data, and remote sensing data are integrated through Kalman filtering. For example, for processed pollutant data (e.g., PM2.5 or COD), meteorological data, and remote sensing data, the Kalman filter output is:

$$\hat{x}_{i,t} = \hat{x}_{i,t-1} + K_t \left(z_t - \hat{x}_{i,t-1}\right)$$

where $\hat{x}_{i,t}$ denotes the fused concentration of the i-th pollutant at the current time step t.

$z_t$ denotes the processed pollutant data, meteorological data, and remote sensing data integrated at the current time step t. The three data sources are each normalized, and $z_t$ is obtained by aligning the normalized data. $\hat{x}_{i,t-1}$ is the fused concentration of the i-th pollutant at time step t-1.

$K_t$ is the Kalman gain, which dynamically adjusts the relative weights of the sensor observation and the prediction.
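To make the role of the Kalman gain concrete, here is a minimal scalar sketch in Python. It assumes a single pollutant, an identity state transition (the previous fused value is the prediction), and known noise variances; the function name `kalman_update` and the variance parameters are hypothetical and are not specified in the patent.

```python
def kalman_update(prev_estimate, prev_var, observation, obs_var, process_var=0.0):
    """One scalar Kalman step: predict from the previous estimate, then correct.

    Returns the fused estimate, its variance, and the gain that was applied.
    """
    # Prediction: the state carries over, its uncertainty grows by process_var.
    pred_var = prev_var + process_var
    # Gain near 1 trusts the observation; gain near 0 trusts the prediction.
    gain = pred_var / (pred_var + obs_var)
    estimate = prev_estimate + gain * (observation - prev_estimate)
    new_var = (1.0 - gain) * pred_var
    return estimate, new_var, gain


# Equal uncertainty in prediction and observation gives a gain of 0.5.
print(kalman_update(10.0, 1.0, 12.0, 1.0))  # (11.0, 0.5, 0.5)
```

A noisier sensor (larger `obs_var`) lowers the gain, so the fused value stays closer to the previous estimate.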

Based on the Kalman filter output, the HYSPLIT reverse trajectory model uses Lagrangian particle tracking to analyze the pollutant propagation path and generates a first location area for the pollution source. This first location area is a coarse region, which narrows the search range of the global tracing XGBoost model. The processed pollutant data, meteorological data, and remote sensing data corresponding to the first location area are then acquired and input into the global tracing XGBoost model, which locates the pollution source coordinates and identifies the pollution source type.

The global traceability XGBoost model predicts the pollution source as follows:

;

The terms of this expression are the input to the global tracing XGBoost model, the prediction function of the k-th boosted tree, the uncertainty regularization coefficient λ, the dynamic tree weights, and the uncertainty quantification function, which is evaluated, for example, by Monte Carlo sampling with 50 draws.

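Monte Carlo sampling is used to calculate the prediction variance:

$$\sigma^2 = \frac{1}{M} \sum_{m=1}^{M} \left(\hat{y}(\tilde{x}_m) - \bar{y}\right)^2$$

where $\tilde{x}_m$ is the feature vector of the m-th sample with Gaussian noise added and $\hat{y}(\tilde{x}_m)$ is the model prediction for that sample.

M = 50 is the number of samples, and $\bar{y}$ is the predicted mean.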
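The sampling procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions: the noise standard deviation, the fixed seed, and the stand-in `toy_model` are hypothetical, since the patent only specifies Gaussian noise and M = 50 samples.

```python
import random


def mc_prediction_variance(predict, features, noise_std=0.05, n_samples=50, seed=0):
    """Estimate prediction mean and variance by perturbing the input features.

    predict:   any callable mapping a feature list to a scalar prediction.
    noise_std: standard deviation of the added Gaussian noise (assumed value).
    """
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_samples):
        # Add independent Gaussian noise to every feature.
        noisy = [value + rng.gauss(0.0, noise_std) for value in features]
        predictions.append(predict(noisy))
    mean = sum(predictions) / n_samples
    variance = sum((p - mean) ** 2 for p in predictions) / n_samples
    return mean, variance


def toy_model(x):
    """Stand-in for the trained tracing model; simply sums the features."""
    return sum(x)


mean, variance = mc_prediction_variance(toy_model, [1.0, 2.0, 3.0])
print(mean, variance)
```

A model whose output barely moves under input noise yields a small variance, which is what makes the variance usable as an uncertainty measure.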

λ = 0.1 is the uncertainty regularization coefficient, which balances the prediction against its uncertainty. The confidence is calculated as follows:

;

The HYSPLIT reverse trajectory model works together with the global tracing XGBoost model: HYSPLIT provides a candidate region (e.g., a 1 km grid), and the global tracing XGBoost model then locates the precise coordinates and the type within this region. For example, HYSPLIT might identify "a certain industrial area" as the pollution source area, and XGBoost then determines the coordinates and type of a particular plant or pipeline.

In this embodiment of the invention, step 6 is specifically as follows: the prediction result is output to a central control platform, MongoDB stores the real-time prediction result data, and HDFS stores the historical prediction result data. In this embodiment, ECharts is used to generate a pollution source heat map. The displayed content comprises the pollution source coordinates, the pollution source type, the confidence of the prediction result, and the uncertainty of the prediction result.

If the confidence is >0.8 and the type is "factory emission" or "pipe leak," an alert is sent to the regulatory agency through the API with a delay <1 second.
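The alerting rule above is a simple conjunction of a confidence check and a type check. A minimal Python sketch follows; the names `should_alert`, `ALERT_TYPES`, and `CONFIDENCE_THRESHOLD` are illustrative, and the call to the regulatory agency's API is deliberately left out because its interface is not described.

```python
# Values taken from the rule in the text; the names are illustrative.
ALERT_TYPES = {"factory emission", "pipe leak"}
CONFIDENCE_THRESHOLD = 0.8


def should_alert(confidence, source_type):
    """Return True when the prediction qualifies for a regulatory alert."""
    return confidence > CONFIDENCE_THRESHOLD and source_type in ALERT_TYPES


print(should_alert(0.90, "factory emission"))  # True
print(should_alert(0.80, "pipe leak"))         # False: threshold is strict
```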

MongoDB: real-time data (TTL = 1 hour), concurrent query delay <10 milliseconds.

HDFS: historical data (data stored for 1 year, data volume about 20 TB), supporting offline analysis.

The invention also includes a simulation test procedure in which MATLAB is used to simulate a VOC leak (concentration surge of 50%, coordinates (1000, 2000), type "factory emissions"). The simulation test results are shown in Table 1:

TABLE 1

Index	Result
Positioning error	75 meters
Type accuracy	94%
Confidence	0.90
Uncertainty	Standard deviation of coordinates 45 m; standard deviation of type probability 0.07
F1 score	0.94
Inference time	140 ms (Jetson Nano)

In the field test stage, the system was deployed for 30 days in Jilin open areas, during which 1,500 pollution events were recorded. The field test results are shown in Table 2:

TABLE 2

Index	Result
Positioning error	Average 76 meters; 97% of events <100 meters
Type accuracy	94% (4 types)
Confidence	Average 0.89; <0.8 in 2% of events
Uncertainty	Standard deviation of coordinates <50 meters; standard deviation of type probability <0.08
Communication delay	<500 ms per round
System stability	>99.9%
CPU utilization	<75% (Jetson Nano)
Memory occupancy	<450 MB

Prometheus is used to monitor the performance of the edge computing devices (CPU, memory), and Grafana generates a real-time dashboard that displays prediction confidence and uncertainty.

Implementation effect: this embodiment achieves efficient, privacy-preserving pollution monitoring and management for the chemical industry park:

Accuracy: positioning error <80 m and type F1 >0.93, superior to traditional methods (e.g., single-sensor tracing, error >150 m).

Real-time performance: inference <150 ms, communication <500 ms, alarm <1 second.

Federated learning reduces data transmission by 90%, and TLS 1.3 encryption ensures security.

Scalability: 40 sites cover 10 km², and the system is easily extended to larger areas.

Robustness: incremental learning adapts to environmental changes, and stability is >99.9%.

Compared with the prior art, the method and the device significantly improve tracing accuracy and real-time performance through joint edge-cloud training, XGBoost multi-objective optimization, and complex prediction functions.

As shown in fig. 2, the invention also discloses a device for environmental monitoring and pollution tracing in the chemical industry park, which comprises:

It will be evident to those skilled in the art that the embodiments of the invention are not limited to the details of the foregoing illustrative embodiments, and that the embodiments of the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of embodiments being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units, modules or means recited in a system, means or terminal claim may also be implemented by means of software or hardware by means of one and the same unit, module or means. The terms first, second, etc. are used to denote a name, but not any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the embodiment of the present invention, and not for limiting, and although the embodiment of the present invention has been described in detail with reference to the above-mentioned preferred embodiments, it should be understood by those skilled in the art that modifications and equivalent substitutions can be made to the technical solution of the embodiment of the present invention without departing from the spirit and scope of the technical solution of the embodiment of the present invention.

Claims

1. The method for environmental monitoring and pollution tracing of the chemical industry park is characterized by comprising the following steps:

step 2, preprocessing pollutant data to obtain processed pollutant data;

2. The method for environmental monitoring and pollution tracing in chemical industrial park according to claim 1, wherein said step 1 comprises:

3. The method for environmental monitoring and pollution tracing of chemical industrial park according to claim 2, wherein said step 2 comprises real-time processing of the pollutant data stream using the Apache Kafka stream computing framework, the computed pollutant data comprising the pollutant mean, the pollutant median, and the pollutant variance.

4. The method for environmental monitoring and pollution tracing of chemical industrial park according to claim 3, wherein said step 3 comprises:

;

5. The method for environmental monitoring and pollution tracing of chemical industrial park according to claim 4, further comprising performance optimization of the tracing XGBoost sub-model, the variational autoencoder, and the Transformer model, the performance optimization method comprising:

6. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 4, wherein in step 4, the loss function of the variational autoencoder is:

;

7. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 6, wherein in step 5, integrating the processed pollutant data, the meteorological data, and the remote sensing data through Kalman filtering, and locating the pollution source coordinates and the pollution source type by combining the HYSPLIT reverse trajectory model with the global tracing XGBoost model, comprises:

8. The method for environmental monitoring and pollution tracing in chemical industrial park according to claim 7, wherein in step 5, predicting the subsequent propagation path of the pollutant by using a Transformer model comprises using the pollution source coordinates and the pollution source type as initial conditions for the propagation path prediction, and predicting, with the Transformer model, the subsequent propagation path and concentration distribution of the pollutant over the next 24 hours.

9. The method for environmental monitoring and pollution tracing in a chemical industry park according to claim 8, wherein said step 6 comprises:

10. A device for environmental monitoring and pollution tracing of a chemical industry park, characterized in that the device comprises: