Comprehensive poison monitoring method of microbial electrochemical sensor based on machine learning model
Technical Field
The invention belongs to the technical field of environmental detection, and particularly relates to a comprehensive poison detection method using machine-learning-based data analysis, a microbial electrochemical sensor for implementing the method, and a method of using the sensor.
Background
With the acceleration of urbanization and industrialization, water pollution is becoming increasingly serious. To cope with complex types of pollutants, high-precision and rapid water quality detection means are urgently needed. Existing water quality detection means, including physical and chemical methods, suffer from high cost, long detection time, the need for off-site analysis in a laboratory, and the like. Therefore, on the basis of achieving high-precision, high-accuracy water quality detection of comprehensive poisons, it is particularly important to develop an online, low-cost, in-situ, real-time detection technology. In recent years, microbial electrochemical toxicity sensors built around a microbial electrochemical system (MES) have been widely studied because they can achieve low-cost, broad-spectrum, real-time monitoring of poisons. MES sensors generally use an anodic electroactive biofilm as the sensing unit: when the biofilm is impacted by a poison, the metabolism of the electroactive microorganisms is affected, the electron transfer rate decreases, and the electrical signal generated by the system changes accordingly. However, because the electroactive biofilm reflects all toxic impacts together as a change in a single electrical signal, researchers can hardly obtain information on each individual poison by analyzing that signal directly when dealing with an actual water body containing complex pollutants. This prevents the MES sensor from being further applied to toxicity early warning and water quality monitoring of actual water bodies.
Machine learning is a process of discovering underlying patterns through data mining, and has been widely used in water treatment and environmental monitoring in recent years. Within a machine learning framework, a reasonable statistical model is built by selecting a suitable algorithm and parameters; one part of the data set is used as sample data (the training set) and the other part as a validation set to check the accuracy of the model, so that accurate prediction and decision-making on newly input data can finally be achieved. Through regression modeling, machine learning can analyze in depth the relationship between different types of poisons and the electrical signals produced in response, and thereby enable the MES sensor to quantify multiple poisons.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a comprehensive poison monitoring method for a microbial electrochemical sensor using machine-learning-based data analysis, in which a Microbial Electrolytic Cell (MEC) sensor is constructed with an anodic electroactive biofilm as the sensing element, and rapid and accurate monitoring of multiple poisons in the same batch of MECs is achieved by combining different machine learning algorithms with electrical signal characteristic data.
The technical scheme of the invention is a method for a comprehensive poison monitoring system of a microbial electrochemical sensor for data analysis based on machine learning, which specifically comprises the following steps:
1. The comprehensive poison monitoring method of the microbial electrochemical sensor based on the machine learning model is characterized by comprising the following steps of:
1) Performing data processing on the acquired electrical signal data before and after the microbial electrochemical sensors receive toxic impact, and finally obtaining electrical signal data of each microbial electrochemical sensor arranged according to time sequence;
2) For each reactor, the characteristic value of the descriptive electric signal and the characteristic value responding to poison impact are obtained through processing experimental data and collected electric signal data. The characteristic values of the descriptive electrical signals are the maximum current (I_max) generated during the operation of the MEC reactor, the domestication maturity of the biological film in the reactor, the steady-state current (I_stable) of the system reaching steady state, the time (t_stable) required for reaching steady state, and the time point (t_object) of injecting poison;
3) Training a machine learning model, selecting a specific algorithm according to the type of the poison, selecting a specific descriptive electric signal characteristic value and a response poison impact characteristic value as input, carrying out regression model training by taking the concentration of the poison as output, taking one part of the processed electric signal data set as a training set and the other part as a verification set, and predicting the poison concentration of a verification set sample through the model;
4) Machine learning model evaluation
The precision of the machine learning model is evaluated; once the model is sufficiently accurate, its predicted value can be taken as the actual poison concentration, so that the concentrations of the comprehensive poisons are detected.
2. Optionally, the characteristic values of the response to poison impact are the current and the current drop rate at 6-hour intervals after the poison impact, namely the current (I_6h) and current drop rate (DropRatio_6h) at 6 hours, the current (I_12h) and current drop rate (DropRatio_12h) at 12 hours, the current (I_18h) and current drop rate (DropRatio_18h) at 18 hours, and the current (I_24h) and current drop rate (DropRatio_24h) at 24 hours.
3. Optionally, the maximum current (I_max) generated during the operation of the reactor is taken as the third-highest current value in the electrical signal data of the reactor, so as to eliminate possible measurement deviations and outliers caused by unexpected disconnection during operation. The reactor is considered to be in a steady state when its current reaches 90% of I_max, and that time point is the time (t_stable) required to reach steady state. The average current from entering the steady state until the poison impact is the steady-state current (I_stable). The reactor is subjected to the poison impact from the moment the poison is injected, and the current drop rate (DropRatio) is expressed as follows:
$$\mathrm{DropRatio} = \frac{I_{\mathrm{stable}} - I_{\mathrm{drop}}}{I_{\mathrm{stable}}}$$
Wherein DropRatio is DropRatio_6h, DropRatio_12h, DropRatio_18h or DropRatio_24h, and correspondingly I_drop is I_6h, I_12h, I_18h or I_24h.
4. Optionally, the feature group for training the manganese chloride quantitative model is I_max, I_stable, t_stable and DropRatio_12h;
the feature group for training the sodium nitrite quantitative model is I_max, I_stable, t_stable and DropRatio_24h; and the feature group for training the tetracycline hydrochloride quantitative model is I_max, I_stable, t_stable, DropRatio_6h and DropRatio_12h.
5. Optionally, 80% of the data is used for the training set and 20% of the data is used for the validation set.
6. Optionally, the machine learning models for training manganese chloride, sodium nitrite and tetracycline hydrochloride are, respectively, a partial least squares algorithm (Partial Least Squares, PLS), a K-nearest-neighbor algorithm (K-Nearest Neighbors, KNN) and a neural network algorithm (Neural Network, NNET).
Advantageous effects
Compared with the prior art, the invention combines microbial electrochemical water quality monitoring at the front end with machine learning data analysis at the back end. By operating reactors of the same batch and subjecting them to poison impact, it overcomes the inability of the traditional microbial electrochemical sensor to identify comprehensive poisons simultaneously, achieves simultaneous quantification of the concentrations of multiple poisons, and provides a new technology for further applying microbial electrochemical sensors to water quality detection and toxicity early warning in water bodies with complex pollutants. The machine learning model is assembled from models that each quantify one specific poison and that use a different algorithm and feature group for each poison, which makes it possible to customize the model for a specific water body and to build a broad-spectrum poison analysis database. The wireless data acquisition system offers broad application prospects for making microbial electrochemical online water quality monitoring and early warning systems intelligent and connecting them to the Internet of Things.
Drawings
FIG. 1 is a flow chart of water quality detection of a microbial electrochemical sensor for data analysis based on machine learning.
FIG. 2 is a current-time curve of an MEC reactor impacted with manganese chloride at 9 mg/L, sodium nitrite at 9 mg/L and tetracycline hydrochloride at 6 mg/L.
FIG. 3 is a schematic diagram illustrating the definition of the characteristic values of the electrical signal data.
FIG. 4 is a graph of machine learning models versus predicted results of three poison concentrations in a validation set reactor.
Detailed Description
Example 1 microbial electrochemical sensor for data analysis based on machine learning simultaneously detects manganese chloride, sodium nitrite, tetracycline hydrochloride
1. Constructing a microbial electrolytic cell reactor with an anodic electroactive biomembrane as a sensing element and performing comprehensive poison monitoring
1) Construction of microbial electrolytic cell reactor with anode electroactive biomembrane as sensing element
Electroactive biofilms are enriched in a two-electrode MEC system. The main body of the reactor is a 100 mL blue-cap reagent bottle (the size is not limited thereto). A stainless steel mesh of 1.5 cm × 1.5 cm is placed perpendicular to the bottom of the reagent bottle and serves as the cathode. A graphite rod with a diameter of 1 cm and a height of 1.5 cm serves as the anode; one of its end faces is parallel to the stainless steel mesh, and its other surfaces are covered with silicone rubber, so that this end face is the only surface on which biofilm is enriched. An external voltage of 0.7 V is applied to the reactor to enrich the anodic biofilm, and the state of the biofilm is monitored through the current.
2) Microbial electrolytic cell sensor using anodic electroactive biomembrane as sensing element for toxicity monitoring
As shown in FIG. 2, after all reactors have reached their steady-state current, poison is added to each reactor. Manganese chloride, sodium nitrite and tetracycline hydrochloride are added to simulate the effects of comprehensive poisons, including heavy metals, nitrite and antibiotics, on the electroactive biofilm. To ensure that the electrical signal data are sufficiently representative and general as sample data for the machine learning model, the concentration range of each of the three poisons is 1-10 mg/L, and the concentration of each poison is randomly generated for each reactor and differs from reactor to reactor. The three poisons and 50 mmol/L phosphate buffer solution (PBS) are prepared into 1 mL of mixed poison solution, which is injected into the reactor so that the pH and substrate concentration of the reactor are not affected, and the change in reactor current after the poison impact is continuously monitored. Finally, electrical signal data from 23 MEC reactors impacted by different poison concentrations are obtained.
2. Data processing and machine learning modeling of collected sample data
1) Data cleaning processing is carried out on the acquired electric signal data
The electrical signal data stored by the computer carry labels such as the sampling timestamp, the IP address of the data acquisition system and the device number. The data are processed into a form in which the electrical signal data of each reactor are arranged in time order, which facilitates the subsequent characteristic value extraction and machine learning modeling.
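The arrangement step described above can be sketched as follows. The record layout and the field names (`timestamp`, `ip`, `device`, `current`) are assumptions made for illustration only; the text does not specify the storage format.

```python
from collections import defaultdict

def arrange_by_reactor(records):
    """Group raw records by device number and sort each group by timestamp.

    Each record is assumed to be a dict with the hypothetical keys
    'timestamp', 'ip', 'device' and 'current'.
    Returns {device: [(timestamp, current), ...]} in time order.
    """
    series = defaultdict(list)
    for rec in records:
        # Keep only what modeling needs; labels such as the IP address are dropped.
        series[rec["device"]].append((rec["timestamp"], rec["current"]))
    return {dev: sorted(points) for dev, points in series.items()}

# Made-up records arriving out of order from two reactors.
raw = [
    {"timestamp": 20, "ip": "10.0.0.5", "device": "R2", "current": 0.41},
    {"timestamp": 10, "ip": "10.0.0.5", "device": "R1", "current": 0.30},
    {"timestamp": 10, "ip": "10.0.0.5", "device": "R2", "current": 0.39},
    {"timestamp": 20, "ip": "10.0.0.5", "device": "R1", "current": 0.33},
]
by_reactor = arrange_by_reactor(raw)
```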
2) Feature set extraction and feature value selection for electrical signal data
The electrical signal data of the 23 reactors after data arrangement are used as the database. As shown in FIG. 3, for each MEC reactor, a series of characteristic values is obtained by processing the experimental data and the collected electrical signal data. The characteristic values describing the electrical signal comprise the maximum current (I_max) generated during the operation of the MEC reactor, the domestication maturity of the biofilm in the reactor, the steady-state current (I_stable) of the system, the time (t_stable) required to reach steady state, and the time point (t_object) of injecting the poison. The characteristic values of the response to poison impact comprise the current (I_6h) and current drop rate (DropRatio_6h) 6 h after the poison impact, the current (I_12h) and current drop rate (DropRatio_12h) 12 h after the poison impact, the current (I_18h) and current drop rate (DropRatio_18h) 18 h after the poison impact, and the current (I_24h) and current drop rate (DropRatio_24h) 24 h after the poison impact. These characteristic values were not defined in previous methods.
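A minimal sketch of this feature extraction for a single reactor is given below. It follows the definitions stated in the disclosure (I_max as the third-highest reading, steady state at 90% of I_max, I_stable as the mean current up to injection). Two points are assumptions: the drop rate is taken to be the relative drop from the steady-state current, and the function name, argument names and sample data are hypothetical. The domestication maturity feature is not computed because the text does not define how it is measured.

```python
def extract_features(times_h, currents, t_inject_h):
    """Compute descriptive and response features for one reactor.

    times_h    : sample times in hours, ascending
    currents   : current reading at each sample time
    t_inject_h : time of poison injection in hours
    """
    # I_max: third-highest reading, to discount outliers.
    i_max = sorted(currents, reverse=True)[2]
    # t_stable: first time the current reaches 90% of I_max.
    t_stable = next(t for t, i in zip(times_h, currents) if i >= 0.9 * i_max)
    # I_stable: mean current from entering steady state up to the injection.
    steady = [i for t, i in zip(times_h, currents) if t_stable <= t < t_inject_h]
    i_stable = sum(steady) / len(steady)
    features = {"I_max": i_max, "t_stable": t_stable, "I_stable": i_stable}
    for h in (6, 12, 18, 24):
        # Current at the first sample taken at or after h hours post-injection.
        i_drop = next(i for t, i in zip(times_h, currents) if t >= t_inject_h + h)
        features[f"I_{h}h"] = i_drop
        # ASSUMPTION: drop rate = relative drop from the steady-state current.
        features[f"DropRatio_{h}h"] = (i_stable - i_drop) / i_stable
    return features

# Made-up trace sampled every 6 h, with the poison injected at 48 h.
times_h = list(range(0, 78, 6))
currents = [10, 50, 100, 120, 100, 80, 100, 100, 100, 80, 60, 50, 50]
feats = extract_features(times_h, currents, 48)
```

Using the third-highest value rather than the maximum means the single spike of 120 in the sample trace does not set I_max.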
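All characteristic value data are then normalized, mapping each characteristic value data set to the (0, 1) interval. The normalization formula is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
wherein x is the original characteristic value, x_min and x_max are the minimum and maximum of that characteristic value over all reactors, and x' is the normalized value.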
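As an illustration, the normalization of one feature across reactors might look like the sketch below. It assumes min-max scaling, which maps the smallest value to 0 and the largest to 1; the function name and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of feature values to the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature cannot be scaled; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up I_stable values for three reactors.
i_stable = [2.0, 5.0, 8.0]
scaled = min_max_normalize(i_stable)
```

Each feature is scaled separately, so that features with large numeric ranges (such as times in hours) do not dominate distance-based algorithms.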
The data sets of 20 reactors are randomly selected as the training set to train the machine learning model, and the data sets of the other 3 reactors are used as the validation set to verify that the model can predict concentrations and to evaluate its prediction accuracy.
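The random division of the 23 reactors into 20 for training and 3 for validation can be sketched as follows. The reactor numbering, the function name and the fixed seed are assumptions added for reproducibility of the example.

```python
import random

def split_reactors(reactor_ids, n_train=20, seed=0):
    """Randomly split reactor IDs into a training list and a validation list."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(reactor_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Reactors are numbered 1..23 here purely for illustration.
train_ids, val_ids = split_reactors(range(1, 24))
```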
The feature group for training the manganese chloride concentration model is I_max, I_stable, t_stable and DropRatio_12h; the feature group for training the sodium nitrite concentration model is I_max, I_stable, t_stable and DropRatio_24h; and the feature group for training the tetracycline hydrochloride concentration model is I_max, I_stable, t_stable, DropRatio_6h and DropRatio_12h.
3) Machine learning modeling using different algorithms for different poisons
The machine learning models for training manganese chloride, sodium nitrite and tetracycline hydrochloride are, respectively, the partial least squares algorithm (PLS), the K-nearest-neighbor algorithm (KNN) and the neural network algorithm (NNET). Because the poison concentration is a continuous variable, a regression model is established by supervised machine learning on the electrical signal characteristic value data of the training set, and the electrical signal characteristic value data of the validation set are then input into the model to predict the concentration of each poison as a continuous variable.
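To make the regression setting concrete, the sketch below shows K-nearest-neighbor regression, one of the three algorithms named above, in plain Python. It is an illustration only: the value K = 3, the two-feature inputs and the concentrations are made up, and the text does not state which K or which software was used.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a continuous value as the mean target of the k nearest samples.

    Distances are Euclidean, computed on (already normalized) feature vectors.
    """
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Made-up normalized feature vectors and poison concentrations (mg/L).
train_x = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (1.0, 1.0)]
train_y = [1.0, 2.0, 3.0, 9.0, 10.0]
predicted = knn_predict(train_x, train_y, query=(0.1, 0.1), k=3)
```

The three samples nearest to the query carry concentrations 1.0, 2.0 and 3.0, so the prediction is their mean.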
Partial least squares regression is a regression analysis method for multiple characteristic values and multiple output values; it combines multiple regression with principal component analysis and canonical correlation analysis, thereby simplifying the data while analyzing the correlation between variables. By integrating and screening the information in the electrical signal characteristic data, the PLS algorithm can extract from the whole feature group new composite variables that best explain the poison concentration, and use them for regression modeling.
The K-nearest-neighbor method finds, among the trained electrical signal data samples, a predetermined number (K) of points closest in distance (standard Euclidean distance) to the new point, and then predicts from these points. Because the electrical signal feature groups carry continuous labels, KNN can be used to train a regression model.
In the regression model, the neural network is trained as a multi-layer perceptron (MLP). The multi-layer perceptron under the NNET framework comprises an input layer, a nonlinear hidden layer and an output layer. The input layer consists of a set of neurons $\{x_i \mid x_1, x_2, \ldots, x_m\}$ representing the input features, i.e., the series of electrical signal characteristic values used for modeling. Each neuron in the hidden layer transforms the values of the previous layer by a weighted linear sum $w_1 x_1 + w_2 x_2 + \cdots + w_m x_m$ followed by a nonlinear activation function $g(\cdot): \mathbb{R} \to \mathbb{R}$. Finally, the output layer receives the values transformed by the hidden layer and outputs the continuous variable (in the regression model, the output transformation is the identity).
4) Evaluating machine learning model prediction accuracy
After the regression models for manganese chloride, sodium nitrite and tetracycline hydrochloride were trained with the corresponding algorithms and feature groups, the characteristic value sets of each reactor in the validation set were input into the models to obtain the predicted concentrations of the different poisons, as shown in FIG. 4. It can be seen that the concentrations predicted by the models differ little from the concentrations of poison actually injected, which preliminarily indicates that the quantitative models of the three poisons are sufficiently accurate.
Model accuracy was further assessed by calculating the root mean square error (Root Mean Square Error, RMSE) and the mean absolute error (Mean Absolute Error, MAE) of each model.
The MAE measures the average magnitude of all errors in the prediction process and is calculated as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat{Y}_i\right|$$
wherein n is the number of samples, $Y_i$ is the actual value, and $\hat{Y}_i$ is the predicted value output given the feature set of sample i.
The RMSE measures the standard deviation of the prediction errors and is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}$$
wherein n is the number of samples, $Y_i$ is the actual value, and $\hat{Y}_i$ is the predicted value output given the feature set of sample i.
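Both metrics can be computed directly from the actual and predicted concentrations, as in the short sketch below. The three sample values are made up for illustration and are not results from the validation reactors.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |Y_i - Yhat_i|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Made-up actual and predicted concentrations (mg/L) for three samples.
actual = [2.0, 5.0, 9.0]
predicted = [2.5, 4.5, 9.0]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

For any set of errors the RMSE is at least as large as the MAE, because squaring weights large errors more heavily; this gives a quick consistency check on reported values.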
On the validation set, the MAE of the concentration quantitative models for manganese chloride, sodium nitrite and tetracycline hydrochloride is as low as 0.20, 0.18 and 0.26, respectively, and the RMSE is as low as 0.21, 0.20 and 0.23, respectively.
This demonstrates that, using a large amount of electrical signal data that reflect toxicity and models built with different machine learning algorithms, multiple poisons can be quantitatively detected simultaneously on the same batch of microbial electrochemical sensors, which lays a foundation for the further adoption of microbial electrochemical sensing.
The foregoing is only illustrative of the present invention and is not to be construed as limiting it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present invention shall fall within its scope of protection.