CN112446168A

CN112446168A - Effluent BOD concentration soft measurement method based on MIC and RBFNN

Info

Publication number: CN112446168A
Application number: CN202011169471.7A
Authority: CN
Inventors: 乔俊飞; 石文强; 李文静
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-10-28
Filing date: 2020-10-28
Publication date: 2021-03-05

Abstract

The invention discloses a soft measurement method for effluent BOD concentration based on MIC and RBFNN. It addresses the long waiting time and high instrument cost of effluent BOD measurement in current sewage treatment processes, and the need to develop, deploy and maintain soft-measurement software and hardware separately. Based on the biochemical reaction characteristics of sewage treatment, the method uses the maximal information coefficient (MIC) and a radial basis function neural network (RBFNN) to realize soft measurement of effluent BOD concentration, a key water quality parameter. This solves the problems that effluent BOD concentration is difficult to measure and that soft-measurement software and hardware must be developed and deployed separately. The results show that the radial basis function neural network can predict the effluent BOD concentration with reasonable accuracy, which helps improve the monitoring of effluent BOD concentration and effluent quality in sewage treatment.

Description

Effluent BOD concentration soft measurement method based on MIC and RBFNN

Technical Field

Based on the biochemical reaction characteristics of sewage treatment, the method uses the maximal information coefficient (MIC) and a radial basis function neural network (RBFNN) to predict the effluent BOD concentration, a key water quality parameter in the sewage treatment process. The effluent BOD concentration is an important parameter characterizing water pollution and the degree of sewage treatment, and has an important influence on the environment. Online prediction of the effluent BOD concentration is an important link in sewage treatment. The invention belongs to the fields of artificial intelligence and sewage treatment.

Background

The urban sewage treatment process is a complex and large-lag biochemical reaction process, has the characteristics of diversity, randomness, uncertainty, strong coupling, high nonlinearity, large time variation and the like, and the detection and control of key water quality parameters are important preconditions for stable and efficient operation of sewage treatment plants.

The effluent BOD is one of the key parameters describing sewage characteristics and an important index of the overall performance of sewage treatment. However, traditional effluent BOD detection is performed offline, and a measured value is available only after several days. Because the sewage treatment process is also strongly nonlinear and time-varying, effluent BOD is difficult to measure accurately.

The effluent BOD concentration can be obtained by manual laboratory assay, but the assay is complicated to perform and takes 5 days from sampling to result. This time lag can seriously impair the sewage treatment effect and easily causes secondary pollution. Compared with manual sampling and assay, online detection instruments shorten the detection time and avoid accidental errors caused by manual operation, but their purchase and maintenance costs are very high.

To measure the BOD concentration of water quickly and accurately, many researchers have proposed soft measurement methods. Soft measurement establishes a mathematical relationship between process variables that are easy to measure and a variable that is difficult to measure directly, and estimates the latter through mathematical calculation and estimation methods. Soft measurement can thus provide values for variables that, for technical or economic reasons, are currently impossible or difficult to detect directly with sensors.

On this basis, the invention designs a soft measurement method for effluent BOD concentration based on the maximal information coefficient and a radial basis function neural network, and realizes online prediction of the effluent BOD concentration.

Disclosure of Invention

The invention designs an effluent BOD concentration prediction method based on the maximal information coefficient and a radial basis function neural network. The method trains the radial basis function neural network on production data from a sewage treatment plant and corrects the network parameters, thereby realizing real-time measurement of the effluent BOD concentration, solving the problem that effluent BOD concentration is difficult to measure in real time during sewage treatment, and reducing the production cost of sewage treatment;

the invention adopts the following technical scheme: a method for predicting effluent BOD concentration based on the maximal information coefficient and a radial basis function neural network, comprising the following steps:

step 1, determining auxiliary variables: perform correlation analysis on the raw water quality data collected at the sewage treatment plant using the maximal information coefficient (MIC), calculating the correlation coefficient between each water quality parameter and the effluent BOD as shown in formula (1); select the variables whose correlation coefficient is greater than 0.5, giving the following auxiliary variables that are strongly correlated with the effluent BOD concentration: effluent total nitrogen concentration, effluent ammonia nitrogen concentration, influent total nitrogen concentration, influent BOD concentration, influent ammonia nitrogen concentration, biochemical tank DO concentration and influent tank phosphate concentration;
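The formulas for this step are not present in the surviving text. As a hedged reconstruction, the standard definitions of mutual information and of the maximal information coefficient, written with the symbols defined in the clause that follows, are given here. The grid sizes |X| and |Y| (numbers of bins along each axis) are an assumed notation that does not appear in the text, and the equation numbering of the original is not asserted.

```latex
\begin{align}
I(X;Y) &= \iint p(x,y)\,\log_2\frac{p(x,y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y \\
\mathrm{MIC}(X;Y) &= \max_{|X|\,|Y| < B(N)} \frac{I(X;Y)}{\log_2 \min\bigl(|X|,\,|Y|\bigr)},
\qquad B(N) = N^{0.6}
\end{align}
```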

wherein I(X, Y) represents the mutual information of X and Y, p(X, Y) represents the joint probability density function of X and Y, p(X) and p(Y) represent the probability density functions of X and Y respectively, N represents the number of samples, and B(N) is a function of the number of samples whose value is N^0.6;
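The screening in step 1 can be illustrated with a simplified sketch. It is not the patent's implementation: a true MIC search optimizes the grid partitions, whereas this sketch only tries equal-frequency grids subject to the bound B(N) = N^0.6. The function names (`mic_approx`, `select_auxiliary`) are hypothetical.

```python
import numpy as np

def equal_frequency_bins(v, n_bins):
    """Assign each value to one of n_bins bins holding roughly equal counts."""
    ranks = np.argsort(np.argsort(v))
    return ranks * n_bins // len(v)

def grid_mutual_information(x, y, nx, ny):
    """Mutual information (in bits) of x and y on an nx-by-ny grid."""
    bx, by = equal_frequency_bins(x, nx), equal_frequency_bins(y, ny)
    joint = np.zeros((nx, ny))
    np.add.at(joint, (bx, by), 1.0)      # count samples per grid cell
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                       # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def mic_approx(x, y, exponent=0.6):
    """Approximate MIC: best normalized MI over grids with nx * ny < N**exponent."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b = len(x) ** exponent
    best = 0.0
    for nx in range(2, int(b) + 1):
        for ny in range(2, int(b) + 1):
            if nx * ny < b:
                mi = grid_mutual_information(x, y, nx, ny)
                best = max(best, mi / np.log2(min(nx, ny)))
    return best

def select_auxiliary(features, target, threshold=0.5):
    """Indices of feature columns whose approximate MIC with target exceeds threshold."""
    return [j for j in range(features.shape[1])
            if mic_approx(features[:, j], target) > threshold]
```

With a feature matrix holding one column per water quality parameter, `select_auxiliary(features, effluent_bod)` would return the indices of the columns retained as auxiliary variables under the 0.5 threshold.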

step 2, determining the initial clustering centers of the K-means clustering algorithm on the basis of the feature data screened in step 1: the sample densities and the distances between samples are used to determine K initial cluster centers for the K-means algorithm; step 2 comprises the following steps:

step 2.1 data normalization: normalizing the training data and the test data according to the formula (3) to reduce the influence of different dimensions on the result;

x_normal = (x - min) / (max - min) (3)

wherein x_normal represents the normalized data, min represents the minimum value of the variable over all samples, max represents the maximum value of the variable over all samples, and x represents the original value of the data;
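A minimal sketch of the min-max normalization of step 2.1 follows, together with the inverse mapping that the method later applies to the network output in formula (15). The forward formula is inferred as the inverse of formula (15); the function names are hypothetical, and this form maps each variable to [0, 1].

```python
import numpy as np

def fit_min_max(samples):
    """Per-variable minimum and maximum over all samples (rows)."""
    return samples.min(axis=0), samples.max(axis=0)

def normalize(x, lo, hi):
    """Min-max normalization; assumes hi > lo for every variable."""
    return (x - lo) / (hi - lo)

def denormalize(x_normal, lo, hi):
    """Inverse mapping, matching x_real = x_normal * (max - min) + min."""
    return x_normal * (hi - lo) + lo
```

The same `lo` and `hi` must be reused for training data, test data and the final denormalization, otherwise the predicted values will not be on the original mg/L scale.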

step 2.2, determining the candidate samples for clustering: calculate the Euclidean distances between all pairs of samples, sort the distances in ascending order, and take the mean of the upper and lower quartiles of the distances as the distance threshold R; calculate the density Density_i of the ith sample according to formula (4); sort all densities in ascending order, take the mean of the upper and lower quartiles of the density values of all samples as the density threshold, and select the samples whose density is greater than or equal to this threshold as candidate samples;

wherein ||·|| denotes the norm (modulus) operation, N denotes the number of input samples, and χ(x) is a threshold function whose value is 0 or 1;
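The candidate selection of step 2.2 can be sketched as follows. Because formula (4) is not available in the surviving text, the density is assumed here to be the number of other samples lying within distance R of a sample, i.e. the threshold function is taken as 1 when R minus the distance is non-negative and 0 otherwise. The function name is hypothetical.

```python
import numpy as np

def candidate_samples(X):
    """Return (candidate indices, densities, distance threshold R)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise Euclidean distances
    pair = dist[np.triu_indices(len(X), k=1)]         # each pair counted once
    q1, q3 = np.percentile(pair, [25, 75])
    R = (q1 + q3) / 2.0                               # mean of lower and upper quartile
    density = (dist <= R).sum(axis=1) - 1             # assumed density; excludes the sample itself
    d1, d3 = np.percentile(density, [25, 75])
    density_threshold = (d1 + d3) / 2.0
    candidates = np.where(density >= density_threshold)[0]
    return candidates, density, R
```

Samples in sparse regions, such as isolated outliers, receive a low density and are excluded, so they cannot later be chosen as initial cluster centers.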

step 2.3, determining the initial clustering centers of the K-means clustering algorithm: determine the number K of final cluster centers; take the two candidate samples that are farthest apart as initial cluster centers, denoted C₁ and C₂, and delete them from the candidate set; assign each remaining candidate sample to the nearest center according to the shortest Euclidean distance, forming two sample clusters S₁ and S₂; calculate the distances from the samples in S₁ to C₁ and from the samples in S₂ to C₂; in each cluster take the sample farthest from its center, denoted C₁₁ and C₂₁, with the corresponding farthest distances denoted d₁ and d₂; if d₁ ≥ d₂, remove C₁₁ from the sample set and add it to the initial cluster centers as C₃; otherwise, remove C₂₁ from the sample set and add it to the initial cluster centers as C₃;

step 2.4, calculating the remaining initial cluster centers: assign the remaining samples to the existing initial cluster centers according to the closest-distance principle, and denote the resulting clusters S₁, S₂, ..., S_m; calculate the distance from the sample points in each cluster to its cluster center, recording the distances for the respective clusters as d₁, d₂, ..., d_m, where m is the number of existing clusters; take d_{m+1} = max{d₁, d₂, ..., d_m}; h is an empirical value in the range [0, 1]; take the sample corresponding to d_{m+1} as a new cluster center, denoted C_{m+1}; if m + 1 = K, all initial cluster centers have been determined and this step ends; if m + 1 < K, repeat this step;
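Steps 2.3 and 2.4 can be read as a farthest-point seeding over the candidate samples, sketched below. This is an interpretation, not the patent's exact procedure: the role of the empirical value h is not recoverable from the text and is omitted, and ties are broken by lowest index rather than by the d₁ ≥ d₂ rule. The function name is hypothetical.

```python
import numpy as np

def initial_centers(candidates, K):
    """Pick K initial cluster centers from the candidate samples (requires K <= len(candidates))."""
    dist = np.linalg.norm(candidates[:, None] - candidates[None, :], axis=-1)
    # Step 2.3: start from the two candidates that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    chosen = [int(i), int(j)]
    # Steps 2.3-2.4: repeatedly add the sample farthest from its own (nearest) center.
    while len(chosen) < K:
        remaining = [k for k in range(len(candidates)) if k not in chosen]
        nearest = dist[np.ix_(remaining, chosen)].min(axis=1)
        chosen.append(remaining[int(np.argmax(nearest))])
    return candidates[chosen]
```

Assigning each sample to its nearest center and then taking the largest within-cluster distance is equivalent to taking the largest nearest-center distance over all remaining samples, which is what the loop computes.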

step 3, determining the center, width and weight parameters of the radial basis function neural network in the soft measurement model: the K initial cluster centers obtained in step 2 are fed into the original K-means algorithm to obtain the clustering result; the clustering result of the K-means algorithm is taken as the center parameters of the radial basis functions, and the initial weight between each hidden layer node and the output layer node is set to 1; step 3 comprises the following steps:

step 3.1, calculating Euclidean distance between the samples in the data set and the existing clustering centers, and distributing the samples to the clustering centers closest to the samples to form clustering clusters;

step 3.2, solving the mean value of all samples in each cluster, and taking the mean value as a new cluster center;

step 3.3, repeating the steps 3.1 and 3.2, ending clustering when the clustering centers are not changed or the cycle number reaches a specified upper limit, and obtaining K clustering centers;

step 3.4, taking the K cluster centers as the centers of the radial basis functions, and taking the shortest Euclidean distance from each center to the other centers as the width parameter of the corresponding radial basis function.
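Steps 3.1 to 3.4 can be sketched as a plain K-means loop started from given initial centers, followed by the nearest-center width rule. The function names and the `max_iter` default are assumptions made for illustration.

```python
import numpy as np

def kmeans(X, centers, max_iter=100):
    """K-means from given initial centers (steps 3.1 to 3.3)."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # Step 3.1: assign each sample to its nearest center.
        labels = np.linalg.norm(X[:, None] - centers[None, :], axis=-1).argmin(axis=1)
        # Step 3.2: new center is the cluster mean (an empty cluster keeps its center).
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        # Step 3.3: stop when the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def rbf_widths(centers):
    """Step 3.4: width of each basis function is the distance to the nearest other center."""
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)   # ignore the zero distance of a center to itself
    return d.min(axis=1)
```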

Step 4, determining a topological structure of a radial basis function neural network for predicting the BOD concentration of the effluent, wherein the step 4 comprises the following steps;

step 4.1, determining the number of input layer nodes: this layer has n neurons, where n is the number of auxiliary variables determined in step 1; each node represents one input variable x_i, where i is the index of the input variable; the purpose of this layer is to pass the input values directly to the hidden layer;

x_i, i = 1, 2, ..., n (6)

step 4.2, determining the number of hidden layer nodes and their widths and centers: this layer has m neurons in total, where m is the number K of cluster centers determined by the K-means algorithm in step 2; the centers of the radial basis functions are the clustering results determined in step 2, and each width is the shortest Euclidean distance from that cluster center to the other cluster centers; the hidden layer transfer function is a radial basis function, usually the standard Gaussian function shown in formula (7);

wherein c_i represents the center parameter of the ith hidden layer node and σ_i represents the width parameter of the ith hidden layer node;

step 4.3, determining the output layer connection weights: the output layer has one node, whose output is given by formula (8); the initial connection weight between each hidden layer node and the output node is set to 1;

wherein y_j represents the network output corresponding to the jth input sample x_j, and w_i represents the connection weight between the ith hidden layer node and the output node.
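The network described in step 4 can be sketched as a forward pass. Formulas (7) and (8) are not present in the surviving text, so the Gaussian is assumed here in the common form exp(-||x - c_i||² / (2σ_i²)) and the output is assumed to be the weighted sum of the hidden layer outputs. The function names are hypothetical.

```python
import numpy as np

def hidden_outputs(X, centers, widths):
    """Hidden layer outputs, shape (samples, m), using an assumed Gaussian form."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * widths ** 2))

def rbf_predict(X, centers, widths, weights):
    """Single output node: weighted sum of the hidden layer outputs."""
    return hidden_outputs(X, centers, widths) @ weights
```

With all weights initialized to 1, as the method specifies, the untrained output is simply the sum of the hidden activations; step 5 then adjusts only these output weights.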

Step 5, adjusting radial basis function neural network parameters of the soft measurement model, wherein the step 5 comprises the following steps;

step 5.1, determining the parameters to be updated: the parameters to be adjusted are the output weights of the radial basis function neural network; all weight parameters are arranged into a row vector, denoted Δ, whose value is shown in formula (9);

Δ = [w₁, w₂, ..., w_m] (9)

wherein w_m represents the connection weight between the mth hidden layer node and the output node;

step 5.2, iteratively adjusting the weight parameters of the radial basis function neural network using the LM (Levenberg-Marquardt) algorithm: calculate the gradient vector, the Jacobian matrix and the quasi-Hessian matrix from the current network input, wherein the gradient vector g is calculated as shown in formula (10), the Jacobian matrix j_p as shown in formula (11), and the quasi-Hessian matrix Q as shown in formula (12);

e_p＝y_d-y_o (13)

where P denotes the total number of samples, p denotes the index of the sample currently input into the network, y_d is the desired output of the network, y_o is the actual output of the network, and j_p^T denotes the transpose of the Jacobian matrix j_p; updating the parameters: the output weights of the radial basis function neural network are updated according to formula (14);

Δ_{k+1} = Δ_k - (Q_k + μ_k I)^{-1} g_k (14)

wherein k denotes the current training iteration and μ denotes the learning rate, with initial value 1; when the network error decreases, μ is reduced to 1/10 of its value in the previous iteration, otherwise it is increased to 10 times its previous value; the upper limit of μ is 10^15 and the lower limit is 10^-15;

if the absolute change in error between two successive parameter updates is less than 10^-10, or the number of iterations in a single adjustment cycle reaches its upper limit, end the parameter adjustment for the current sample, input the next sample, and repeat step 5.2;

if all training samples have been traversed but the error is still not below the target value and the number of traversals has not reached its upper limit, input the first sample of the training set again and repeat step 5.2; otherwise, end the parameter adjustment process;
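The weight update of step 5.2 can be sketched as follows. Several assumptions are made: formulas (10) to (12) are taken to be the usual sums g = Σ j_pᵀ e_p and Q = Σ j_pᵀ j_p with j_p the derivative of e_p with respect to the weights; the update is applied in batch over all samples rather than sample by sample; a step is kept only if it lowers the error; and μ is bounded by 1e-15 and 1e15. The function name and the `Phi` matrix of hidden layer outputs are hypothetical.

```python
import numpy as np

def lm_train(Phi, y_d, w, mu=1.0, max_iter=50, tol=1e-10):
    """Levenberg-Marquardt update of RBF output weights w.

    Phi: hidden layer outputs, shape (P, m); y_d: desired outputs, shape (P,).
    """
    def sse(weights):
        e = y_d - Phi @ weights
        return float(e @ e)

    err = sse(w)
    for _ in range(max_iter):
        e = y_d - Phi @ w                 # e_p = y_d - y_o
        J = -Phi                          # d e_p / d w, since y_o is linear in w
        g = J.T @ e                       # gradient vector
        Q = J.T @ J                       # quasi-Hessian matrix
        w_new = w - np.linalg.solve(Q + mu * np.eye(len(w)), g)
        err_new = sse(w_new)
        if err_new < err:                 # error decreased: accept and lower mu
            converged = (err - err_new) < tol
            w, err = w_new, err_new
            mu = max(mu / 10.0, 1e-15)
            if converged:
                break
        else:                             # error did not decrease: raise mu and retry
            mu = min(mu * 10.0, 1e15)
    return w, err
```

Because the output is linear in the weights, the Jacobian does not depend on the current weights, so the loop converges in a handful of iterations as μ shrinks.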

step 5.3, the test sample is used as the input of the radial basis function neural network to obtain the predicted value of the BOD concentration of the normalized effluent, and the result is subjected to reverse normalization according to the formula (15) to obtain the actual predicted value of the BOD concentration of the effluent;

x_real＝x_normal*(max-min)+min (15)

wherein x_real represents the actual predicted value and x_normal represents the normalized predicted value;

step 6, packaging the soft measurement model obtained in step 5 into a jar file and importing it into a JavaWeb project; the service is deployed on a cloud server and the project is accessed through a browser; production data are uploaded, the server calls the radial basis function neural network program to make the prediction, and the predicted result is returned to the client.

The invention is mainly characterized in that:

(1) aiming at the problem that the effluent BOD concentration of current sewage treatment plants cannot be measured in real time, the invention uses the maximal information coefficient algorithm to extract 7 variables that are strongly correlated with the effluent BOD concentration, which simplifies the input of the neural network and improves the processing speed of the radial basis function neural network;

(2) the urban sewage treatment process is a complex biochemical reaction process with large time lag, characterized by diversity, randomness, uncertainty, strong coupling, high nonlinearity and strong time variation; the invention therefore predicts the effluent BOD concentration with a radial basis function neural network trained on measured data from an actual sewage treatment plant, and features high prediction accuracy and strong adaptability to complex working conditions;

it should be noted in particular that the 7 auxiliary variables screened by the maximal information coefficient algorithm and the radial basis function neural network initialization based on the improved K-means algorithm both fall within the scope of the invention;

Drawings

FIG. 1 is a structural diagram of the radial basis function neural network of the present invention

FIG. 2 is a training result chart of the effluent BOD concentration prediction method of the present invention

FIG. 3 is a training error chart of the effluent BOD concentration prediction method of the present invention

FIG. 4 is a test result chart of the effluent BOD concentration prediction method of the present invention

FIG. 5 is a test error chart of the effluent BOD concentration prediction method of the present invention

Detailed Description

The invention provides a soft measurement method for effluent BOD concentration based on the maximal information coefficient and a radial basis function neural network. Auxiliary variables are screened using the maximal information coefficient, the radial basis function neural network is initialized using an improved K-means algorithm, and the output weights of the network are adjusted using the second-order LM algorithm, thereby realizing real-time measurement of the effluent BOD concentration and solving the problem that it is difficult to measure in real time during sewage treatment;

the experimental data come from the production operation data of a sewage treatment plant in Beijing; measured values of effluent total nitrogen concentration, effluent ammonia nitrogen concentration, influent total nitrogen concentration, influent BOD concentration, influent ammonia nitrogen concentration, biochemical tank DO concentration and influent tank phosphate concentration are selected as experimental sample data; there are 365 groups of samples in total, divided into two parts: the first 280 groups are used as training samples and the remaining 85 groups as test samples;

a method for predicting BOD concentration of effluent based on maximum information number and radial basis function neural network is characterized by comprising the following steps:

step 1, determining auxiliary variables: perform correlation analysis on the raw water quality data collected at the sewage treatment plant using the maximal information coefficient (MIC), calculating the correlation coefficient between each water quality parameter and the effluent BOD as shown in formula (16); select the variables whose correlation coefficient is greater than 0.5, giving the following auxiliary variables that are strongly correlated with the effluent BOD concentration: effluent total nitrogen concentration, effluent ammonia nitrogen concentration, influent total nitrogen concentration, influent BOD concentration, influent ammonia nitrogen concentration, biochemical tank DO concentration and influent tank phosphate concentration;

step 2.1 data normalization: normalizing the training data and the test data according to the formula (18) to reduce the influence of different dimensions on the result;

wherein x_normal represents the normalized data, min represents the minimum value of the variable over all samples, max represents the maximum value of the variable over all samples, and x represents the original value of the data;

step 2.2, determining the candidate samples for clustering: calculate the Euclidean distances between all pairs of samples, sort the distances in ascending order, and take the mean of the upper and lower quartiles of the distances as the distance threshold R; calculate the density Density_i of the ith sample according to formula (19); sort all densities in ascending order, take the mean of the upper and lower quartiles of the density values of all samples as the density threshold, and select the samples whose density is greater than or equal to this threshold as candidate samples;

step 2.3, determining the initial clustering centers of the K-means clustering algorithm: determine the number K of final cluster centers; take the two candidate samples that are farthest apart as initial cluster centers, denoted C₁ and C₂, and delete them from the candidate set; assign each remaining candidate sample to the nearest center according to the shortest Euclidean distance, forming two sample clusters S₁ and S₂; calculate the distances from the samples in S₁ to C₁ and from the samples in S₂ to C₂; in each cluster take the sample farthest from its center, denoted C₁₁ and C₂₁, with the corresponding farthest distances denoted d₁ and d₂; if d₁ ≥ d₂, remove C₁₁ from the sample set and add it to the initial cluster centers as C₃; otherwise, remove C₂₁ from the sample set and add it to the initial cluster centers as C₃;

x_i,i＝1,2,...,n (21)

step 4.2, determining the number of hidden layer nodes and their widths and centers: this layer has m neurons in total, where m is the number K of cluster centers determined by the K-means algorithm in step 2; the centers of the radial basis functions are the clustering results determined in step 2, and each width is the shortest Euclidean distance from that cluster center to the other cluster centers; the hidden layer transfer function is a radial basis function, usually the standard Gaussian function shown in formula (22);

step 4.3, determining the output layer connection weights: the output layer has one node, whose output is given by formula (23); the initial connection weight between each hidden layer node and the output node is set to 1;

wherein y_j represents the network output corresponding to the jth input sample x_j, and w_i represents the connection weight between the ith hidden layer node and the output node.

step 5, adjusting the parameters of the radial basis function neural network of the soft measurement model; step 5 comprises the following steps:

step 5.1, determining the parameters to be updated: the parameters to be adjusted are the output weights of the radial basis function neural network; all weight parameters are arranged into a row vector, denoted Δ, whose value is shown in formula (24);

Δ = [w₁, w₂, ..., w_m] (24)

wherein w_m represents the connection weight between the mth hidden layer node and the output node;

step 5.2, iteratively adjusting the weight parameters of the radial basis function neural network using the LM (Levenberg-Marquardt) algorithm: calculate the gradient vector, the Jacobian matrix and the quasi-Hessian matrix from the current network input, wherein the gradient vector g is calculated as shown in formula (25), the Jacobian matrix j_p as shown in formula (26), and the quasi-Hessian matrix Q as shown in formula (27);

e_p＝y_d-y_o (28)

where P denotes the total number of samples, p denotes the index of the sample currently input into the network, y_d is the desired output of the network, y_o is the actual output of the network, and j_p^T denotes the transpose of the Jacobian matrix j_p; updating the parameters: the output weights of the radial basis function neural network are updated according to formula (29);

Δ_{k+1} = Δ_k - (Q_k + μ_k I)^{-1} g_k (29)

step 6, inputting the test data into the trained radial basis function neural network to obtain the predicted effluent BOD concentration; the MATLAB program is packaged into a jar file using MATLAB and added to a Java project, and soft measurement of the effluent BOD is realized in Java by calling the corresponding API;

the training results of the radial basis function neural network are shown in FIG. 2, X-axis: sample number, in samples; Y-axis: effluent BOD concentration, in mg/L; the dotted line is the measured effluent BOD concentration and the solid line is the output of the radial basis function neural network; the error between the measured effluent BOD concentration and the output of the radial basis function neural network is shown in FIG. 3, X-axis: sample number, in samples; Y-axis: effluent BOD concentration error, in mg/L;

the prediction results are shown in FIG. 4, X-axis: sample number, in samples; Y-axis: effluent BOD concentration, in mg/L; the dotted line is the measured effluent BOD concentration and the solid line is the predicted effluent BOD concentration; the error between the measured and the predicted effluent BOD concentration is shown in FIG. 5, X-axis: sample number, in samples; Y-axis: effluent BOD concentration prediction error, in mg/L;

Tables 1-18 show the experimental data of the present invention; the auxiliary variables have been normalized (normalization interval [-1, 1]). Tables 1 to 7 show the auxiliary variable data used in training, Table 8 the measured training outputs, Table 9 the outputs of the radial basis function neural network during training, Tables 10 to 16 the auxiliary variable data of the test samples, Table 17 the measured test outputs, and Table 18 the effluent BOD concentration values predicted by the present invention.

TABLE 1 Auxiliary variable: effluent total nitrogen concentration

TABLE 2 Auxiliary variable: effluent ammonia nitrogen concentration

TABLE 3 Auxiliary variable: influent total nitrogen concentration

-0.529	0.740	0.047	-0.017	0.339	0.044	0.666	-0.018	-0.522	0.083	0.012	0.065	-0.352	-0.326	0.273	-0.312	-0.270	-0.803
0.049	0.669	-0.714	-0.018	0.058	-0.295	-0.207	-0.531	0.820	0.037	-0.809	0.026	-0.226	-0.360	-0.148	0.162	-0.250	0.056
-0.299	-0.377	-0.177	-0.258	0.073	-0.022	-0.477	-0.322	-0.109	0.499	0.016	-0.686	-0.738	-0.015	-0.544	0.290	-0.276	-0.224
-0.273	-0.514	0.106	-0.377	-0.529	-0.873	-0.201	0.833	-0.418	-0.268	-0.298	-0.061	-0.609	0.753	-0.566	0.522	0.120	-0.501
-0.051	-0.253	0.317	0.002	0.027	-0.008	-0.417	-0.161	0.378	-0.436	-0.528	-0.212	-0.509	-0.396	0.042	-0.737	-0.381	-1.000
0.371	-0.558	-0.566	-0.534	0.658	0.367	-0.268	-0.258	0.060	-0.398	-0.317	0.147	-0.521	0.008	-0.588	0.049	0.052	-0.091
0.793	-0.242	-0.283	-0.479	0.198	-0.320	-0.533	-0.682	0.309	-0.462	-0.246	-0.400	-0.229	-0.227	0.013	0.004	-0.297	-0.448
-0.495	-0.352	-0.869	0.780	-0.036	-0.252	0.092	-0.589	-0.558	-0.236	-0.143	-0.057	-0.711	0.835	-0.234	-0.934	0.415	-0.257
-0.285	0.248	0.382	-0.521	0.059	-0.105	0.749	-0.194	0.037	-0.276	-0.263	0.259	-0.530	-0.544	0.032	-0.512	-0.336	-0.209
-0.606	-0.196	0.228	-0.551	-0.090	-0.349	0.326	-0.565	-0.275	0.004	-0.478	-0.701	0.040	-0.518	-0.033	-0.206	-0.415	0.504
-0.746	-0.212	-0.363	-0.146	0.075	0.601	0.206	-0.580	-0.085	-0.391	-0.308	-0.419	-0.257	-0.680	-0.604	0.019	-0.240	0.590
-0.638	-0.282	-0.327	-0.524	-0.267	-0.334	-0.553	-0.233	0.257	-0.243	0.191	-0.283	-0.374	0.284	-0.339	-0.516	-0.188	-0.191
-0.037	-0.613	-0.268	0.259	0.806	-0.363	-0.495	-0.631	-0.422	-0.274	0.070	-0.322	-0.529	0.087	-0.323	-0.613	0.208	0.374
-0.266	0.334	0.234	0.183	-0.536	-0.135	-0.303	0.386	-0.237	0.241	-0.281	-0.253	-0.664	-0.007	-0.646	-0.400	-0.259	-0.217
-0.420	-0.260	0.060	0.115	-0.033	0.066	0.167	-0.256	0.012	-0.425	-0.582	0.029	-0.440	-0.148	-0.490	-0.350	-0.282	0.916
0.363	-0.538	-0.272	-0.220	-0.216	0.727	0.015	-0.555	-0.018	0.009

TABLE 4 Auxiliary variable: influent BOD concentration

TABLE 5 Auxiliary variable: influent ammonia nitrogen concentration

TABLE 6 Auxiliary variable: biochemical tank DO concentration

TABLE 7 Auxiliary variable: influent tank phosphate concentration

TABLE 8 Measured effluent BOD concentration (mg/L)

10.371	12.957	12.529	14.829	12.871	14.143	14.700	12.600	12.929	13.029	12.729	13.843	10.800	10.557	14.671	11.543	11.686	11.857
12.971	13.386	11.700	12.857	12.543	11.600	11.314	11.029	12.100	13.829	11.171	13.114	10.843	11.071	12.386	11.929	10.857	12.843
12.000	12.380	12.029	11.543	12.557	12.343	10.700	11.486	12.900	15.100	12.914	12.171	10.100	12.714	10.857	14.800	11.814	10.986
11.386	13.100	13.943	11.686	10.900	11.214	11.800	14.300	10.843	10.971	10.200	12.814	11.114	12.814	10.943	14.214	13.871	10.686
12.800	10.671	15.329	12.686	14.800	12.643	10.943	12.271	13.457	10.900	11.443	10.457	11.200	11.129	12.857	12.043	10.729	11.300
13.314	12.071	11.900	10.614	13.471	13.243	11.371	10.629	13.971	10.986	11.514	13.886	10.543	12.357	11.029	12.771	12.900	12.200
12.386	10.800	11.629	10.600	14.371	12.100	11.000	11.086	13.043	11.600	11.357	10.400	10.900	11.786	14.486	12.671	11.571	12.730
11.057	10.814	11.671	12.529	12.400	11.100	13.857	12.457	10.243	11.814	10.500	11.871	12.100	13.643	12.243	11.486	15.300	11.057
11.086	15.700	13.529	12.271	12.786	12.457	14.500	11.300	12.943	11.029	11.200	14.657	10.214	12.414	12.814	10.714	11.400	11.143
12.414	11.957	14.514	12.243	12.157	12.240	13.800	12.529	11.671	13.457	11.400	10.171	12.586	12.686	15.000	10.700	11.729	13.129
11.129	10.600	12.310	12.000	13.800	12.043	14.200	10.600	12.857	12.450	10.500	12.590	11.743	11.657	10.400	12.600	11.000	13.843
12.314	10.371	11.457	11.857	11.129	12.170	10.457	11.343	14.543	11.800	14.029	10.357	10.971	13.014	11.643	10.529	11.957	12.314
12.771	11.571	10.514	12.986	12.243	10.671	11.757	11.200	10.900	10.457	12.614	11.157	12.757	14.600	11.457	12.386	14.157	13.386
10.543	13.071	12.957	12.900	12.586	12.586	11.200	13.600	10.914	14.414	10.414	11.629	10.243	13.629	11.614	10.786	10.914	10.800
10.329	11.371	12.600	12.714	12.757	12.671	14.229	11.743	12.686	10.500	10.186	14.314	11.443	12.429	11.814	10.371	10.700	14.100
13.171	10.200	11.286	11.329	10.771	13.100	13.286	11.000	13.800	13.814

TABLE 9 Training output of the radial basis function neural network (mg/L)

11.711	12.926	13.370	13.929	13.699	13.212	14.130	12.596	12.069
									13.417	12.604	13.309	10.690	11.376	13.536	11.372	10.925	11.351
13.296	14.083	11.259	13.129	12.622	12.221	11.249	11.771	13.480
									13.327	11.299	13.104	10.996	11.283	12.688	12.881	10.652	12.557
11.263	12.554	12.803	10.952	12.542	13.136	10.521	11.135	13.428
									13.989	13.263	11.682	11.011	12.879	10.772	14.258	12.404	10.910
11.571	11.901	12.963	11.333	11.032	10.899	12.253	13.645	11.359
									11.012	10.870	13.063	10.709	13.230	10.518	13.390	13.167	11.098
12.941	11.116	13.569	12.553	13.914	12.522	11.365	12.842	13.206
									11.348	11.943	10.524	10.675	11.095	12.534	10.581	11.326	10.577
13.518	11.389	11.323	12.048	12.790	13.604	11.021	10.792	13.069
									11.454	11.291	13.087	11.746	13.370	10.581	13.157	12.545	12.558
13.577	10.654	10.970	10.706	13.727	12.598	11.036	10.612	13.768
									11.094	11.254	11.060	10.928	12.616	13.605	12.921	10.943	12.575
11.789	11.277	10.593	13.383	12.791	11.060	13.261	12.126	10.491
									12.628	10.749	12.553	11.833	13.749	12.657	10.812	13.862	11.063
11.014	13.550	13.474	11.854	12.603	12.708	14.024	11.055	13.258
									10.700	11.427	14.058	11.018	11.642	12.545	12.078	11.168	11.059
11.452	12.894	13.927	11.601	12.810	12.472	13.531	11.350	12.449
									13.174	11.718	11.338	12.726	11.838	14.069	10.940	11.306	13.732
10.400	11.028	12.472	12.586	12.759	13.561	13.795	10.607	13.396
									12.553	11.405	12.599	10.757	11.502	10.595	13.058	10.984	13.026
11.774	10.872	11.104	11.812	11.101	12.428	11.396	11.357	13.879
									10.696	13.344	11.197	11.343	13.617	11.358	10.699	12.540	12.725
13.232	11.660	11.075	13.674	13.182	11.403	11.877	10.821	10.640
									10.919	12.613	10.844	11.675	13.709	11.138	11.585	13.583	13.514
10.716	13.744	13.595	13.583	11.681	13.053	11.125	13.541	10.716
									13.845	10.926	10.971	11.333	13.113	11.511	11.417	10.930	11.109
10.886	11.259	13.202	13.448	12.912	12.524	13.670	12.504	13.304
									10.730	10.846	13.289	11.187	13.062	11.458	10.817	10.799	13.946
13.435	11.462	11.080	11.299	10.900	12.797	13.324	10.729	13.217
13.194

Test samples

TABLE 10 Auxiliary variable: effluent total nitrogen concentration

TABLE 11 Auxiliary variable: effluent ammonia nitrogen concentration

0.484	0.029	0.282	-0.800	-0.245	-0.938	-0.386	-0.580	-0.399
									-0.688	-0.679	0.362	0.565	-0.097	-0.789	-0.951	-0.643	0.289
0.273	-0.545	-1.000	-0.841	-0.080	-0.437	0.180	0.321	-0.179
									0.343	-0.628	0.354	-0.713	0.427	0.183	-0.617	-0.529	0.584
0.403	-0.461	-0.383	-0.716	-0.630	0.432	0.256	0.430	0.208
									-0.299	0.825	-0.506	-0.792	-0.761	-0.825	0.597	0.214	0.494
0.987	-0.567	-0.989	-0.369	-0.094	-0.594	0.305	-0.940	0.266
									0.357	-0.670	-0.469	0.273	0.610	0.344	-0.756	0.255	0.224
0.805	0.591	0.412	-0.950	-0.114	0.266	-0.675	0.268	0.394
									-0.555	0.393	-0.485	-0.443

TABLE 12 Auxiliary variable: influent total nitrogen concentration

-0.374	-0.310	-0.489	0.224	0.077	-0.216	0.183	-0.579	0.150
									-0.122	-0.545	-0.379	-0.634	-0.277	-0.013	0.147	-0.302	-0.503
-0.478	0.767	-0.157	0.266	-0.433	0.073	-0.561	-0.628	0.022
									-0.686	0.453	-0.627	1.000	-0.540	-0.405	-0.486	0.022	-0.451
-0.290	0.116	-0.512	-0.175	-0.504	-0.556	-0.491	-0.662	-0.263
									-0.331	-0.672	-0.307	0.136	0.359	0.208	-0.426	-0.248	-0.936
-0.618	0.385	0.174	0.063	-0.462	0.582	-0.330	-0.002	-0.286
									-0.250	0.332	0.128	-0.238	0.111	-0.344	0.036	-0.294	-0.274
-0.523	-0.590	-0.341	0.175	-0.061	-0.247	-0.453	-0.277	-0.418
									-0.528	-0.485	-0.158	0.382

TABLE 13 Auxiliary variable: influent BOD concentration

TABLE 14 Auxiliary variable: influent ammonia nitrogen concentration

-0.240	-0.305	-0.356	0.101	0.140	-0.248	0.460	-0.493	0.675
									-0.049	-0.427	-0.446	-0.316	0.026	0.166	0.105	-0.448	-0.478
-0.279	0.885	-0.040	0.448	-0.092	0.067	-0.417	-0.446	0.071
									-0.533	0.346	-0.519	0.706	-0.260	-0.051	-0.504	0.037	0.057
-0.406	0.355	-0.392	0.451	-0.410	-0.441	-0.487	-0.383	-0.368
									-0.348	0.020	-0.400	0.425	0.433	0.040	-0.276	-0.359	0.099
-0.514	0.295	0.263	0.173	-0.348	0.464	-0.351	0.013	-0.402
									-0.421	0.452	0.470	-0.442	-0.492	-0.438	0.306	-0.584	-0.415
-0.433	-0.248	-0.446	0.363	-0.019	-0.447	-0.417	-0.395	-0.237
									-0.298	-0.345	-0.233	0.448

TABLE 15 Auxiliary variable: biochemical tank DO concentration

-0.325	0.564	0.342	-0.654	-0.218	-0.169	-0.383	0.111	-0.300
									-0.342	0.539	-0.259	0.523	-0.374	-0.177	-0.243	0.704	0.169
-0.029	0.169	-0.276	-0.300	-0.202	0.251	0.267	0.449	-0.193
									-0.119	0.103	0.070	-0.128	0.350	-0.218	0.358	-0.399	0.037
0.152	-0.457	0.440	-0.259	0.539	0.424	0.004	0.350	0.646
									0.597	0.613	0.473	-0.366	-0.597	-0.473	0.514	0.671	0.259
0.383	0.556	-0.185	-0.358	-0.202	-0.169	0.202	-0.259	0.572
									0.399	0.012	-0.383	0.564	0.630	0.235	-0.111	0.556	0.556
0.070	0.152	0.440	-0.366	-0.366	0.218	0.358	0.309	0.193
									0.712	0.407	-0.333	0.572

TABLE 16 Auxiliary variable: influent tank phosphate concentration

-0.832	-0.802	-0.835	-0.204	-0.646	-0.085	-0.527	-0.822	-0.572
									-0.591	-0.808	-0.916	-0.952	0.772	-0.547	-0.042	-0.796	-0.867
-0.880	-0.686	0.080	-0.220	0.345	-0.572	-0.892	-0.881	-0.610
									-0.976	-0.719	-0.936	-0.363	-0.761	0.509	-0.790	-0.731	-0.853
-0.789	-0.618	-0.793	-0.654	-0.811	-0.846	-0.863	-0.783	-0.794
									-0.815	-0.829	-0.800	-0.204	-0.451	-0.193	-0.856	-0.769	-0.983
-0.898	-0.724	-0.047	-0.631	0.181	-0.569	-0.858	0.087	-0.774
									-0.795	-0.693	-0.549	-0.770	-0.668	-0.889	-0.534	-0.869	-0.784
-0.870	-0.914	-0.898	-0.310	-0.369	-0.828	-0.865	-0.819	-0.820
									-0.899	-0.812	-0.261	-0.729

TABLE 17 Actual effluent BOD concentration (mg/L)

10.300	11.514	10.286	14.286	12.500	11.886	13.200	11.529	13.143
									12.743	11.486	11.029	10.157	12.171	12.729	14.400	11.600	10.800
10.243	12.671	12.100	14.000	12.660	12.800	11.100	10.200	12.771
									10.129	14.586	10.314	13.900	12.600	12.520	11.229	12.629	10.600
10.286	13.086	11.443	12.114	10.886	10.800	11.000	12.243	11.457
									11.429	12.229	11.571	14.086	13.100	12.929	10.271	11.714	11.257
11.043	14.957	12.614	12.729	12.800	14.900	10.657	14.657	11.400
									10.714	15.500	13.000	10.829	11.900	10.614	12.643	11.143	11.300
10.771	10.386	11.114	13.900	12.529	10.986	11.771	11.200	11.286
									11.857	11.400	11.971	11.986

TABLE 18 Prediction output of the radial basis function neural network (mg/L)

10.649	11.010	10.688	13.822	12.576	12.594	13.666	11.784	13.533
									13.225	11.761	11.020	11.259	12.607	13.251	13.677	11.374	10.608
10.738	13.392	12.955	13.725	12.588	13.029	10.772	10.861	12.520
									10.871	13.602	10.879	14.053	11.343	12.553	11.908	12.818	10.370
10.906	13.547	12.161	12.669	11.883	10.968	10.718	11.676	11.012
									11.091	11.312	11.351	13.434	13.887	13.655	11.092	10.960	10.553
11.108	13.339	13.508	12.608	12.563	14.101	11.334	13.538	11.498
									10.825	13.643	13.498	11.225	10.513	11.421	13.242	11.130	11.409
10.817	10.779	11.174	13.111	12.648	10.864	11.229	10.909	11.042
									11.390	11.018	12.642	13.101
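The actual and predicted test values in Tables 17 and 18 can be compared sample by sample. The sketch below uses the root-mean-square error and the mean absolute error, which are common choices assumed here rather than metrics stated in this section. The function names are hypothetical, and only the first nine values of each table are copied in, for brevity.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mean_absolute_error(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# First nine test samples of Table 17 (actual) and Table 18 (predicted), in mg/L.
actual_first_row = [10.300, 11.514, 10.286, 14.286, 12.500,
                    11.886, 13.200, 11.529, 13.143]
predicted_first_row = [10.649, 11.010, 10.688, 13.822, 12.576,
                       12.594, 13.666, 11.784, 13.533]

print("RMSE (mg/L):", rmse(actual_first_row, predicted_first_row))
print("MAE  (mg/L):", mean_absolute_error(actual_first_row, predicted_first_row))
```

Passing all values of both tables instead of the first nine gives the error over the whole test set.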

Claims

1. A soft measurement method for effluent BOD concentration based on MIC and RBFNN, characterized by comprising the following steps:

step 1, determining auxiliary variables: performing correlation analysis on the raw water quality data collected from the sewage treatment plant using the maximum information coefficient (MIC); calculating the correlation coefficient between each water quality parameter and the effluent BOD according to formula (1); and selecting the variables whose correlation coefficient is greater than 0.5, which yields the following auxiliary variables strongly correlated with the effluent BOD concentration: the effluent total nitrogen concentration, the effluent ammonia nitrogen concentration, the influent total nitrogen concentration, the influent BOD concentration, the influent ammonia nitrogen concentration, the biochemical tank DO concentration and the influent tank phosphate concentration;
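The selection rule of step 1 can be sketched as follows. Formula (1) is not reproduced in this section, so the code substitutes a crude MIC estimate: the largest normalised mutual information over equal-frequency grids whose size is limited by n^0.6. This is an approximation, not the full MIC grid optimisation, and all function names, the exponent 0.6 and the synthetic data are assumptions made for illustration.

```python
import math
import random


def equal_frequency_bins(values, n_bins):
    """Map each value to a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(bx, by, nx, ny):
    """Mutual information (in bits) of two discretised variables."""
    n = len(bx)
    joint = [[0] * ny for _ in range(nx)]
    for i, j in zip(bx, by):
        joint[i][j] += 1
    px = [sum(row) / n for row in joint]
    py = [sum(joint[i][j] for i in range(nx)) / n for j in range(ny)]
    mi = 0.0
    for i in range(nx):
        for j in range(ny):
            if joint[i][j]:
                pxy = joint[i][j] / n
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


def approx_mic(x, y, alpha=0.6):
    """Crude MIC estimate: best normalised MI over equal-frequency grids."""
    budget = len(x) ** alpha            # grid-size limit n^alpha (assumed)
    best = 0.0
    nx = 2
    while nx * 2 <= budget:
        bx = equal_frequency_bins(x, nx)
        ny = 2
        while nx * ny <= budget:
            by = equal_frequency_bins(y, ny)
            mi = mutual_information(bx, by, nx, ny)
            best = max(best, mi / math.log2(min(nx, ny)))
            ny += 1
        nx += 1
    return best


def select_auxiliary(columns, target, threshold=0.5):
    """Return the names of columns whose score against target exceeds threshold."""
    return [name for name, col in columns.items()
            if approx_mic(col, target) > threshold]


# Synthetic illustration only: one column tied to the target, one unrelated.
random.seed(0)
bod = [random.uniform(10.0, 16.0) for _ in range(200)]
columns = {
    "related": [2.0 * v + random.gauss(0.0, 0.05) for v in bod],
    "unrelated": [random.uniform(0.0, 1.0) for _ in range(200)],
}
selected = select_auxiliary(columns, bod)
```

With real plant data, each candidate water quality parameter would be one entry of `columns` and the measured effluent BOD would be `target`.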

step 2, determining the initial clustering centers of the K-means clustering algorithm on the basis of the feature data selected in step 1: determining the K initial clustering centers of the K-means algorithm using the sample density and the distances between samples;

step 3, determining the center, width and weight parameters of the radial basis function neural network: substituting the K initial clustering centers obtained in step 2 into the original K-means algorithm to obtain a clustering result, taking the clustering result of the K-means clustering algorithm as the center parameters of the radial basis functions, and setting the initial weight between each hidden layer node and the output layer node to 1;
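A minimal sketch of step 3, assuming the original K-means algorithm means the standard Lloyd iteration, started from caller-supplied initial centers. The function name, the iteration cap and the toy samples are illustrative assumptions.

```python
import math


def kmeans(samples, initial_centers, max_iter=100):
    """Lloyd's K-means started from caller-supplied initial centers."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center.
        clusters = [[] for _ in centers]
        for s in samples:
            nearest = min(range(len(centers)),
                          key=lambda k: math.dist(s, centers[k]))
            clusters[nearest].append(s)
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append([sum(m[d] for m in members) / len(members)
                                    for d in range(dim)])
            else:
                new_centers.append(centers[k])  # empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy data only: two well-separated groups and two initial centers.
samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = kmeans(samples, initial_centers=[(0.0, 0.0), (10.0, 11.0)])

# Initial hidden-to-output weights are all set to 1, as stated in step 3.
weights = [1.0] * len(centers)
```

The returned `centers` serve as the radial basis function centers, so the number of hidden nodes equals the number of initial centers supplied.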

step 4, determining a topological structure of a radial basis function neural network for predicting the BOD concentration of the effluent;

step 5, adjusting the radial basis function neural network parameters of the soft measurement model;

step 6, packaging the soft measurement model obtained in step 5 into a jar file, importing the jar file into a Java Web project, and deploying the service on a cloud server; the project is accessed through a browser and production data are uploaded, the server calls the radial basis function neural network program to make the prediction, and the predicted result is returned to the client.

2. The soft measurement method for effluent BOD (biochemical oxygen demand) concentration based on MIC (maximum information coefficient) and RBFNN (radial basis function neural network) as claimed in claim 1, wherein step 2 comprises the following steps:

step 2.3, determining the initial clustering centers of the K-means clustering algorithm: determining the number K of final clustering centers; taking the two candidate samples with the largest mutual distance as initial clustering centers, denoted C₁ and C₂, and deleting these two samples from the candidate set; assigning each remaining candidate sample to its nearest center according to the shortest-Euclidean-distance principle, thereby forming two sample clusters S₁ and S₂; calculating the distances from the samples in S₁ to C₁ and from the samples in S₂ to C₂; taking the sample in each cluster that is farthest from its existing initial cluster center, denoted C₁₁ and C₂₁, with the two farthest distances denoted d₁ and d₂; if d₁ ≥ d₂, removing C₁₁ from the original sample set and adding it to the initial cluster centers as C₃; otherwise, removing C₂₁ from the original sample set and adding it to the initial cluster centers as C₃;

h is an empirical value in the range [0, 1]; the sample corresponding to $d_{m+1}$ is taken as a new clustering center, denoted $C_{m+1}$; if m+1 = K, all initial cluster centers have been determined and step 2.4 ends; if m+1 < K, step 2.4 continues.
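The choice of the first three initial centers in step 2.3 can be sketched as below. The candidate set is assumed to be given already (the density-based screening that produces it is not shown in this section), and the rule of step 2.4 involving h is not reproduced because its formula is absent here. The function name and sample points are hypothetical.

```python
import math
from itertools import combinations


def first_three_centers(candidates):
    """Pick C1 and C2 (the farthest pair) and then C3, following step 2.3."""
    if len(candidates) < 3:
        raise ValueError("at least three candidate samples are required")
    c1, c2 = max(combinations(candidates, 2),
                 key=lambda pair: math.dist(pair[0], pair[1]))
    remaining = [s for s in candidates if s != c1 and s != c2]
    # Assign every remaining sample to the nearer of C1 and C2.
    s1 = [s for s in remaining if math.dist(s, c1) <= math.dist(s, c2)]
    s2 = [s for s in remaining if math.dist(s, c1) > math.dist(s, c2)]
    # Farthest member of each cluster from its own center (None if empty).
    c11 = max(s1, key=lambda s: math.dist(s, c1), default=None)
    c21 = max(s2, key=lambda s: math.dist(s, c2), default=None)
    d1 = math.dist(c11, c1) if c11 is not None else -1.0
    d2 = math.dist(c21, c2) if c21 is not None else -1.0
    # The sample with the larger distance becomes the third center.
    c3 = c11 if d1 >= d2 else c21
    return c1, c2, c3


candidates = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (6.0, 3.0)]
c1, c2, c3 = first_three_centers(candidates)
```

Here (0, 0) and (10, 0) form the farthest pair. The point (6, 3) lies 5 units from its center (10, 0), while (1, 0) lies 1 unit from (0, 0), so (6, 3) becomes C₃.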

3. The soft measurement method for effluent BOD (biochemical oxygen demand) concentration based on MIC (maximum information coefficient) and RBFNN (radial basis function neural network) as claimed in claim 1, wherein step 3 comprises the following steps:

4. The soft measurement method for effluent BOD (biochemical oxygen demand) concentration based on MIC (maximum information coefficient) and RBFNN (radial basis function neural network) as claimed in claim 1, wherein step 4 comprises the following steps:

$$x_i, \quad i = 1, 2, \ldots, n \tag{6}$$

step 4.2, determining the number of hidden layer nodes and their widths and centers: the hidden layer has m neurons in total, where m equals the number K of clustering centers determined by the K-means algorithm in step 2; the centers of the radial basis functions are the clustering results determined in step 2; the width of each node is the Euclidean distance from its clustering center to the nearest other clustering center; the hidden layer transfer function is a radial basis function, usually the standard Gaussian function shown in formula (7);
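The hidden layer of step 4.2 can be sketched as follows. Formula (7) is not reproduced in this section, so the Gaussian form exp(-||x - c||² / (2σ²)) is an assumption, as is the weighted-sum output layer. All function names and the example centers are illustrative.

```python
import math


def nearest_center_widths(centers):
    """Width of each hidden node: distance to its nearest other center."""
    return [min(math.dist(c, other)
                for j, other in enumerate(centers) if j != i)
            for i, c in enumerate(centers)]


def hidden_outputs(x, centers, widths):
    """Gaussian activations exp(-||x - c||^2 / (2 * sigma^2)) (assumed form)."""
    return [math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2))
            for c, sigma in zip(centers, widths)]


def rbf_output(x, centers, widths, weights):
    """Network output as a weighted sum of hidden activations (assumed)."""
    phi = hidden_outputs(x, centers, widths)
    return sum(w * p for w, p in zip(weights, phi))


# Illustrative centers only; in the method they come from K-means.
centers = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
widths = nearest_center_widths(centers)
phi = hidden_outputs((0.0, 0.0), centers, widths)
y = rbf_output((0.0, 0.0), centers, widths, weights=[1.0, 1.0, 1.0])
```

For these centers the widths come out as 3, 4 and 3. An input placed exactly on the first center gives that node an activation of 1.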

5. The soft measurement method for effluent BOD (biochemical oxygen demand) concentration based on MIC (maximum information coefficient) and RBFNN (radial basis function neural network) as claimed in claim 1, wherein:

$$\Delta = [w_1, w_2, \ldots, w_m] \tag{9}$$

$$e_p = y_d - y_o \tag{13}$$

$$\Delta_{k+1} = \Delta_k - (Q_k + \mu_k I)^{-1} g_k \tag{14}$$
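One iteration of update rule (14) can be sketched as below for the weight vector of formula (9). The definitions of Q_k and g_k (formulas (10) to (12)) do not appear in this section, so the code assumes the usual Levenberg-Marquardt forms: Q_k is the sum of jᵀj and g_k is the sum of jᵀe_p, where j is the derivative of e_p with respect to the weights. The helper names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lm_step(weights, phis, targets, mu):
    """One update of equation (14), applied to the output weights only."""
    m = len(weights)
    q = [[0.0] * m for _ in range(m)]   # quasi-Hessian Q_k (assumed form)
    g = [0.0] * m                       # gradient vector g_k (assumed form)
    for phi, y_d in zip(phis, targets):
        y_o = sum(w * p for w, p in zip(weights, phi))
        e = y_d - y_o                   # equation (13)
        jac = [-p for p in phi]         # d e / d w_j = -phi_j
        for i in range(m):
            g[i] += jac[i] * e
            for j in range(m):
                q[i][j] += jac[i] * jac[j]
    for i in range(m):
        q[i][i] += mu                   # Q_k + mu_k * I
    step = solve(q, g)
    return [w - s for w, s in zip(weights, step)]


def sum_squared_error(weights, phis, targets):
    return sum((y_d - sum(w * p for w, p in zip(weights, phi))) ** 2
               for phi, y_d in zip(phis, targets))


# Toy data: hidden activations for two samples and their desired outputs.
phis = [[1.0, 0.0], [0.0, 1.0]]
targets = [2.0, 3.0]
old_weights = [1.0, 1.0]
new_weights = lm_step(old_weights, phis, targets, mu=0.01)
```

A larger μ shortens the step towards plain gradient descent. With μ = 0 the update is a Gauss-Newton step, which solves this toy problem in one iteration because the output is linear in the weights.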

$$x_{\text{real}} = x_{\text{normal}} \cdot (\max - \min) + \min \tag{15}$$

wherein $x_{\text{real}}$ represents the true (de-normalized) prediction data and $x_{\text{normal}}$ represents the corresponding normalized value.
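Formula (15) can be sketched as a small helper together with its inverse. The function and parameter names are assumptions, and max and min are taken to be the extremes of the effluent BOD values used when the data were normalized.

```python
def denormalize(x_normal, min_value, max_value):
    """Equation (15): x_real = x_normal * (max - min) + min."""
    return x_normal * (max_value - min_value) + min_value


def normalize(x_real, min_value, max_value):
    """Inverse of equation (15), mapping [min, max] onto [0, 1]."""
    return (x_real - min_value) / (max_value - min_value)


# Illustrative bounds only, not taken from the tables.
bod_min, bod_max = 10.0, 16.0
restored = denormalize(0.5, bod_min, bod_max)
```

With these bounds a normalized output of 0.5 maps back to 13.0 mg/L, the midpoint of the range.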