CN118503787B - Data analysis method and device based on electroactive microorganism detection - Google Patents
- Publication number: CN118503787B
- Application number: CN202410766467.0A
- Authority: CN (China)
- Prior art keywords: data, model, characteristic, analysis, electrochemical
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/27: Regression, e.g. linear or logistic regression
- G06N20/20: Ensemble learning
- G06N5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
The application relates to a data analysis method and device based on electroactive microorganism detection. The method comprises the following steps: preprocessing the raw data; performing correlation analysis on the preprocessed data and extracting relevant features; constructing, training, and optimizing models; and, when electroactive microorganism detection data are received, preprocessing the detection data and inputting them into the corresponding model for analysis. The collected raw data from electroactive microorganism detection and the raw data from detection of the environment and the electrochemical system are preprocessed and subjected to correlation analysis, features related to power output and the like are extracted, and a plurality of models relating to electroactive microorganism detection are constructed from the preprocessed data. This enables efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data, improving the accuracy and adaptability of electroactive microorganism detection data analysis.
Description
Technical Field
The application relates to the technical field of data analysis, and in particular to a data analysis method and device based on electroactive microorganism detection.
Background
With the rapid development of environmental monitoring and bioenergy, electroactive microorganism detection technology has been widely applied as an important technical means. However, because the detection data are complex and diverse, conventional data analysis methods often fail to meet practical requirements. On the one hand, conventional methods rely mainly on manual experience and suffer from low efficiency and strong subjectivity; on the other hand, the detection data are high-dimensional, noisy, and contain many missing values, which greatly increases the difficulty of analysis.
Therefore, how to automatically extract key features from massive electroactive microorganism detection data and achieve efficient and accurate identification and classification of electroactive microorganism types, so that the detection data can be analyzed effectively, is a technical problem that urgently needs to be solved.
Disclosure of Invention
In order to realize efficient and accurate identification and classification of electroactive microorganism types and analysis of detection data and improve the accuracy and adaptability of the electroactive microorganism detection data analysis, the application provides a data analysis method and device based on electroactive microorganism detection.
The first object of the present application is achieved by the following technical solutions:
An electroactive microorganism detection-based data analysis method, comprising the steps of:
preprocessing the collected raw data from electroactive microorganism detection and the raw data from detection of the environment and the electrochemical system to obtain electrochemical system data, effective current data, and an input data set;
performing correlation analysis on the electrochemical system data and the effective current data, extracting power output related features, and screening electrochemical characteristic data from the electrochemical system data;
constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectral analysis model, a correlation model of gene sequences and metabolic activities, and a microorganism classification model based on the input data set, the power output related features, and the electrochemical characteristic data;
training and optimizing the prediction model, the comprehensive characteristic analysis model, the partial least squares regression spectral analysis model, the correlation model of gene sequences and metabolic activities, and the microorganism classification model respectively;
when electroactive microorganism detection data input by a user terminal are received, preprocessing the detection data according to a preset data processing flow, and inputting the preprocessed detection data into the corresponding model for analysis.
By adopting the above technical solution, the collected raw data from electroactive microorganism detection and the raw data from detection of the environment and the electrochemical system are preprocessed to eliminate noise, outliers, missing values, and the like, reducing their influence on data analysis. Correlation analysis is then performed on the electrochemical system data and the effective current data to extract the features related to power output, and electrochemical characteristic data that reflect the electrochemical activity of the microorganisms and reveal the interaction between the microorganisms and the environment are screened from the electrochemical system data as the basis for subsequent model construction. Based on the input data set, the power output related features, and the electrochemical characteristic data, a plurality of models are constructed for predicting microbial activity, analyzing the relationship between the microorganisms and the environment, analyzing spectral data, analyzing the association between genes and metabolism, and classifying the microorganisms. After construction, each model is trained and optimized so that it fits the data better, its precision and generalization ability improve, and it accurately captures the internal rules and features of the data. After training and optimization, the electroactive microorganism detection data input by the user terminal are preprocessed according to the preset data processing flow and input into the corresponding model for analysis, completing the data analysis based on electroactive microorganism detection.
According to the application, the collected raw data from electroactive microorganism detection and the raw data from detection of the environment and the electrochemical system are preprocessed and subjected to correlation analysis, so that key features related to power output and the like are extracted automatically, and a plurality of models relating to electroactive microorganism detection are constructed from the processed data. This enables efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data, and improves the accuracy and adaptability of electroactive microorganism detection data analysis.
The present application may be further configured in a preferred example to: the step of preprocessing the collected raw data based on detection of electroactive microorganisms and raw data based on detection of an environment and an electrochemical system to obtain electrochemical system data, effective current data and an input data set comprises the steps of:
collecting raw current data obtained by detecting electroactive microorganisms, the raw current data being a mixed signal containing the electrical signal generated by microbial metabolism and environmental noise, and collecting experimental data of the electrochemical system;
performing noise reduction on the raw current data to remove clutter and interference signals, obtaining noise-reduced current data;
normalizing the noise-reduced current data and the experimental data of the electrochemical system to obtain the effective current data and the electrochemical system data.
By adopting the above technical solution, the raw current signal obtained by detecting electroactive microorganisms and the experimental data of the electrochemical system are collected, yielding a mixed signal that contains the electrical signal generated by microbial metabolism and environmental noise, together with information about the electrochemical system. Noise reduction is performed on the raw current data to remove clutter and interference signals, and the noise-reduced current data and the experimental data of the electrochemical system are normalized to obtain the effective current data and the electrochemical system data, which serve as the basis for subsequent data analysis and model construction.
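The noise reduction and normalization steps can be illustrated with a minimal sketch. The application does not name a specific filter or normalization scheme, so the median filter, the min-max scaling, the function names, and the sample readings below are all assumptions chosen for illustration only.

```python
from statistics import median

def median_filter(signal, window=3):
    """Replace each sample by the median of a centered window (shorter at the edges)."""
    half = window // 2
    return [median(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]

def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up raw current readings; 0.95 stands in for an interference spike.
raw_current = [0.10, 0.12, 0.95, 0.11, 0.13, 0.14, 0.12]

denoised = median_filter(raw_current)            # hypothetical noise reduction step
effective_current = min_max_normalize(denoised)  # hypothetical normalization step
```

A median filter is used here only because it suppresses isolated spikes without needing any tuning beyond the window size; any other denoising method would slot into the same place.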
The present application may be further configured in a preferred example to: the step of performing correlation analysis on the electrochemical system data and the effective current data, extracting power output related characteristics, and screening electrochemical characteristic data from the electrochemical system data comprises the steps of:
calculating the correlation coefficient between each parameter of the effective current data and the power output, and the correlation coefficient between the power output and each preset parameter other than the power output in the electrochemical system data;
if the correlation coefficient between any parameter and the power output is higher than a preset threshold, determining that parameter to be a power output related feature.
By adopting the above technical solution, the correlation coefficient between each parameter of the effective current data and the power output, and the correlation coefficient between the power output and each preset parameter other than the power output in the electrochemical system data, are calculated respectively. When the correlation coefficient between any one or more parameters and the power output exceeds the preset threshold, those parameters are determined to be related features affecting the power output, which completes the automatic extraction of key features.
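The thresholding rule described above can be sketched as follows. The application does not state which correlation coefficient is used or whether negative correlations count, so the Pearson coefficient, the comparison on its absolute value, the threshold of 0.8, and all parameter names and values are assumptions for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series.
    Raises ZeroDivisionError if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def select_power_features(parameters, power_output, threshold=0.8):
    """Keep parameters whose |r| with the power output exceeds the threshold.
    Using the absolute value is an assumption, so strong negative correlations are kept."""
    selected = {}
    for name, series in parameters.items():
        r = pearson(series, power_output)
        if abs(r) > threshold:
            selected[name] = r
    return selected

# Made-up measurements for five runs.
power_output = [1.0, 2.0, 3.0, 4.0, 5.0]
parameters = {
    "current_density": [0.9, 2.1, 2.9, 4.2, 5.0],
    "ph": [7.0, 7.1, 6.9, 7.0, 7.1],
    "internal_resistance": [50, 40, 30, 20, 10],
}
selected = select_power_features(parameters, power_output)
```

In this made-up data, `current_density` and `internal_resistance` pass the threshold while `ph` does not.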
The present application may be further configured in a preferred example to: the input data set comprises biofilm activity data, microbial gene sequence data, metabolite data, water sample data, colony characteristic data, electron transfer information data, and spectral characteristic data, and the step of constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectral analysis model, a correlation model of gene sequences and metabolic activities, and a microorganism classification model based on the input data set, the power output related features, and the electrochemical characteristic data comprises the steps of:
constructing, based on the power output related features, a prediction model for predicting the power output under different conditions;
performing dimensionality reduction on the water sample data and the electrochemical characteristic data, and constructing a comprehensive characteristic analysis model;
constructing a partial least squares regression spectral analysis model based on the biofilm activity data and the spectral characteristic data;
constructing a correlation model of gene sequences and metabolic activities based on the microbial gene sequence data and the metabolite data;
constructing a microorganism classification model based on the electron transfer information data and the colony characteristic data.
By adopting the above technical solution, a prediction model for predicting the power output under different conditions is constructed based on the power output related features. A comprehensive characteristic analysis model for analyzing the combined influence of the water sample and the electrochemical characteristics is constructed based on the water sample data and electrochemical characteristic data after dimensionality reduction. A partial least squares regression spectral analysis model for analyzing the relationship between biofilm activity and spectral characteristics is constructed based on the biofilm activity data and the spectral characteristic data. A correlation model for analyzing the association between gene sequences and metabolic activities is constructed based on the microbial gene sequence data and the metabolite data. A microorganism classification model for identifying and classifying microorganisms is constructed based on the electron transfer information data and the colony characteristic data. Constructing these models based on electroactive microorganism detection enables efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data.
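As one illustration of the models listed above, the following is a minimal sketch of partial least squares regression with a single response and a single latent component (PLS1), relating spectral features to an activity value. The application does not specify the number of components or the fitting algorithm; the function names and the spectra are made up, and a practical model would normally use several components chosen by validation.

```python
def pls1_one_component(X, y):
    """Fit a one-component PLS1 model; returns (x_means, y_mean, weights, q)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centered spectra
    yc = [v - y_mean for v in y]                                 # centered response
    # Weight vector: the direction in feature space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5  # zero only if X and y are uncorrelated
    w = [v / norm for v in w]
    # Latent scores t = Xc w, then regress the centered response on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_means, y_mean, w, q

def pls1_predict(model, spectrum):
    """Project a new spectrum onto the latent direction and map it to the response."""
    x_means, y_mean, w, q = model
    t = sum((spectrum[j] - x_means[j]) * w[j] for j in range(len(w)))
    return y_mean + q * t

# Made-up data: each spectrum is the activity value times a fixed profile (1.0, 2.0, 0.5).
spectra = [[0.2, 0.4, 0.1], [0.4, 0.8, 0.2], [0.6, 1.2, 0.3], [0.8, 1.6, 0.4]]
activity = [0.2, 0.4, 0.6, 0.8]

model = pls1_one_component(spectra, activity)
predicted = pls1_predict(model, [0.5, 1.0, 0.25])
```

Because the made-up spectra vary along a single direction, one latent component is enough to recover the activity of the new spectrum (0.5) up to rounding error.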
The present application may be further configured in a preferred example to: the water sample data comprises chemical oxygen demand data, and the step of performing dimensionality reduction on the water sample data and the electrochemical characteristic data and constructing a comprehensive characteristic analysis model comprises the steps of:
performing data cleaning and preprocessing on the chemical oxygen demand data and the electrochemical characteristic data, wherein the data cleaning and preprocessing comprises outlier removal and standardization;
performing dimensionality reduction, based on principal component analysis, on the chemical oxygen demand data and the electrochemical characteristic data after data cleaning and preprocessing to obtain a plurality of principal component features, and constructing a principal component loading matrix which shows the correlation between each principal component and the chemical oxygen demand data and the electrochemical characteristic data;
calculating the score of each sample on each principal component based on the principal component loading matrix and on the chemical oxygen demand data and the electrochemical characteristic data before data cleaning and preprocessing, wherein a sample is the chemical oxygen demand data and the corresponding electrochemical characteristic data in a single piece of water sample data;
constructing a comprehensive characteristic analysis model based on the score of each sample on each principal component and the chemical oxygen demand data.
By adopting the above technical solution, data cleaning and preprocessing, including outlier removal and standardization, are performed on the chemical oxygen demand data and the electrochemical characteristic data. Dimensionality reduction based on principal component analysis is then applied to the cleaned and preprocessed data to obtain a plurality of principal component features, and a principal component loading matrix is constructed, which represents the weight of each original feature on each principal component. The score of each sample on each principal component is calculated from the loading matrix, and a comprehensive characteristic analysis model for analyzing the combined influence of the water sample and the electrochemical characteristics is constructed from the calculated scores and the original chemical oxygen demand data.
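The standardization, loading matrix, and score calculation can be sketched in plain Python as follows. This is a sketch under stated assumptions: power iteration with deflation stands in for whatever eigen-solver an implementation would actually use, the scores are computed from the standardized data, and the column choice (chemical oxygen demand, peak current, redox potential) and all values are made up.

```python
def standardize(rows):
    """Z-score each column using the population standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]

def covariance(Z):
    """Covariance matrix of already-centered data (a correlation matrix for z-scores)."""
    n, p = len(Z), len(Z[0])
    return [[sum(z[a] * z[b] for z in Z) / n for b in range(p)] for a in range(p)]

def leading_eigenpair(C, iterations=500):
    """Power iteration: returns (eigenvalue, unit eigenvector) of the dominant component."""
    p = len(C)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
    return eigenvalue, v

def pca(rows, n_components=2):
    """Return (loadings, explained variances, per-sample scores)."""
    Z = standardize(rows)
    C = covariance(Z)
    p = len(C)
    loadings, variances = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(C)
        loadings.append(v)   # one row of the loading matrix per principal component
        variances.append(lam)
        # Deflation: remove the component just found so the next one can emerge.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(z[j] * v[j] for j in range(p)) for v in loadings] for z in Z]
    return loadings, variances, scores

# Made-up samples: [chemical oxygen demand (mg/L), peak current, redox potential].
samples = [
    [120, 0.8, -0.30], [200, 1.3, -0.28], [310, 2.1, -0.35],
    [150, 1.0, -0.25], [400, 2.6, -0.33], [260, 1.7, -0.31],
]
loadings, variances, scores = pca(samples, n_components=2)
```

Each row of `loadings` holds the weights of the original features on one principal component, and each row of `scores` holds one sample's coordinates on those components; the sign of a component is arbitrary.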
The present application may be further configured in a preferred example to: the step of constructing a correlation model of gene sequences and metabolic activities based on microbial gene sequence data and metabolite data comprises the steps of:
acquiring gene sequences of known microorganisms, and constructing an evolutionary tree based on the microbial gene sequence data and the gene sequences of the known microorganisms;
analyzing the microbial gene sequence data with a genetic variation analysis tool to obtain genetic characteristic data, and determining the gene expression levels of the microorganisms under different conditions by a transcriptomic method to obtain gene regulation data;
And constructing a correlation model of the gene sequence and the metabolic activity based on the microorganism gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.
By adopting the above technical solution, the gene sequences of known microorganisms are acquired and an evolutionary tree is constructed, which shows the evolutionary relationships among the microorganisms. A genetic variation analysis tool is used to analyze the microbial gene sequence data in depth to obtain the genetic characteristic data of the microorganisms, and a transcriptomic method is used to measure the gene expression levels of the microorganisms under different environmental conditions to obtain the gene regulation data. A correlation model for analyzing the association between gene sequences and metabolic activities is then constructed based on the microbial gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data, and the gene regulation data.
The present application may be further configured in a preferred example to: the step of constructing a microorganism classification model based on the electron transfer information data and the colony characteristic data comprises the steps of:
performing data cleaning on the electron transfer information data and the colony characteristic data to obtain a comprehensive data set of extracellular electron transfer and colony characteristics;
extracting classification features related to the classification of electroactive microorganisms from the comprehensive data set based on feature engineering, and scoring the extracted classification features with a preset scoring strategy;
selecting, by a preset screening method, the classification features whose scores fall within a specific scoring range, and inputting them into a random forest model to construct the microorganism classification model.
By adopting the above technical solution, data cleaning is performed on the electron transfer information data and the colony characteristic data to obtain a comprehensive data set of extracellular electron transfer and colony characteristics. Feature engineering is used to extract classification features related to the classification of electroactive microorganisms from the comprehensive data set, and the extracted classification features are scored. The features whose scores fall within a specific scoring range are screened out by the preset screening method and input into a random forest model, thereby constructing a microorganism classification model for identifying and classifying microorganisms.
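The scoring and screening of classification features can be sketched as follows. The application leaves the scoring strategy and the scoring range unspecified, so the Fisher-style score (between-class scatter divided by within-class scatter), the lower bound of 1.0, and the feature names, labels, and values are all assumptions. Training the random forest itself is not shown; the retained features would be handed to a random forest implementation such as scikit-learn's `RandomForestClassifier`.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Between-class scatter of the class means divided by the within-class scatter."""
    overall = mean(values)
    between, within = 0.0, 0.0
    for c in sorted(set(labels)):
        group = [v for v, lab in zip(values, labels) if lab == c]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else float("inf")

def screen_features(table, labels, low, high=float("inf")):
    """Score every feature column and keep those whose score lies in [low, high]."""
    scores = {name: fisher_score(col, labels) for name, col in table.items()}
    kept = [name for name, s in scores.items() if low <= s <= high]
    return scores, kept

# Made-up comprehensive data set: three features for six isolates of two hypothetical types.
labels = ["A", "A", "A", "B", "B", "B"]
table = {
    "eet_rate": [5.1, 4.9, 5.3, 2.0, 2.2, 1.9],
    "colony_diameter_mm": [1.0, 1.4, 1.2, 1.1, 1.3, 1.2],
    "redox_peak_mV": [-210, -205, -215, -120, -118, -125],
}
scores, kept = screen_features(table, labels, low=1.0)
```

Here `colony_diameter_mm` has the same mean in both types, so its score is close to zero and it is screened out, while the other two features are kept.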
The second object of the present application is achieved by the following technical solutions:
A data analysis device based on electroactive microorganism detection, comprising:
The data processing module is used for preprocessing the collected raw data based on the detection of the electroactive microorganisms and the raw data based on the detection of the environment and the electrochemical system to obtain electrochemical system data, effective current data and an input data set;
The feature extraction module is used for carrying out correlation analysis on electrochemical system data and effective current data, extracting power output related features and screening electrochemical feature data from the electrochemical system data;
The model construction module is used for constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of a gene sequence and metabolic activity and a microorganism classification model based on the input data set, the power output related characteristics and the electrochemical characteristic data;
the model optimization module is used for respectively training and optimizing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of a gene sequence and metabolic activity and a microorganism classification model;
the data input module is used for preprocessing the detection data based on a preset data processing flow when receiving the detection data of the electroactive microorganisms input by the user terminal, and inputting the preprocessed detection data into the corresponding model for analysis.
By adopting the above technical solution, the data processing module preprocesses the collected raw data from electroactive microorganism detection and the raw data from detection of the environment and the electrochemical system to obtain the electrochemical system data, the effective current data, and the input data set. The feature extraction module performs correlation analysis on the electrochemical system data and the effective current data, extracts the power output related features, and screens the electrochemical characteristic data from the electrochemical system data. The model construction module constructs the prediction model, the comprehensive characteristic analysis model, the partial least squares regression spectral analysis model, the correlation model of gene sequences and metabolic activities, and the microorganism classification model based on the input data set, the power output related features, and the electrochemical characteristic data. The model optimization module trains and optimizes each of these models. The data input module, on receiving electroactive microorganism detection data input by the user terminal, preprocesses the detection data according to the preset data processing flow and inputs the preprocessed detection data into the corresponding model for analysis.
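The routing performed by the data input module can be sketched as a small registry that maps a kind of detection data to its preprocessing flow and its model. The class layout, the method names, the `"current"` key, and the stand-in preprocessing and model functions are hypothetical; the application does not describe how the corresponding model is selected.

```python
from typing import Callable, Dict, List, Tuple

Preprocessor = Callable[[List[float]], List[float]]
Model = Callable[[List[float]], float]

class DataInputModule:
    """Routes incoming detection data to the preprocessing flow and model registered for its kind."""

    def __init__(self) -> None:
        self._routes: Dict[str, Tuple[Preprocessor, Model]] = {}

    def register(self, kind: str, preprocess: Preprocessor, model: Model) -> None:
        """Associate a data kind with its preset preprocessing flow and trained model."""
        self._routes[kind] = (preprocess, model)

    def analyze(self, kind: str, data: List[float]) -> float:
        """Preprocess the data, then pass the result to the corresponding model."""
        if kind not in self._routes:
            raise KeyError(f"no model registered for data kind {kind!r}")
        preprocess, model = self._routes[kind]
        return model(preprocess(data))

# Stand-ins for a real preprocessing flow and a trained model (both hypothetical).
def scale_to_peak(xs: List[float]) -> List[float]:
    peak = max(xs)
    return [x / peak for x in xs]

def mean_level(xs: List[float]) -> float:
    return sum(xs) / len(xs)

module = DataInputModule()
module.register("current", scale_to_peak, mean_level)
result = module.analyze("current", [1.0, 2.0, 4.0])
```

Keeping the preprocessing flow and the model together in one registry entry means data submitted by the user terminal are always prepared the same way as the data the model was trained on.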
In summary, the present application includes at least one of the following beneficial technical effects:
1. By preprocessing and performing correlation analysis on the collected raw data from detection of the electroactive microorganisms and the raw data from detection of the environment and the electrochemical system, the application automatically extracts key characteristics such as those related to power output, and constructs a plurality of models related to the detection of the electroactive microorganisms from the processed data. This realizes efficient and accurate identification and classification of the types of electroactive microorganisms and analysis of the detection data, and improves the accuracy and adaptability of the analysis of the detection data of the electroactive microorganisms;
2. After the models are built, each model is trained and optimized so that it fits the data better, which improves its precision and generalization capability. Each model can then accurately capture the internal rules and characteristics of the data, which further improves the accuracy and adaptability of each model in analyzing the electroactive microorganism detection data.
Drawings
FIG. 1 is a flowchart of an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 2 is a flowchart showing an implementation of step S10 in an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 3 is a flowchart showing an implementation of step S30 in an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 4 is a flowchart showing an implementation of step S32 in an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 5 is a flowchart showing an implementation of step S34 in an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 6 is a flowchart showing an implementation of step S35 in an embodiment of the method for analyzing detection data based on electroactive microorganisms according to the present application.
Detailed Description
The application is described in further detail below with reference to fig. 1-6.
In one embodiment, as shown in fig. 1, the application discloses a method for analyzing detection data based on electroactive microorganisms, which specifically comprises the following steps:
S10: preprocessing the collected raw data based on detection of electroactive microorganisms and the raw data based on detection of environments and an electrochemical system to obtain electrochemical system data, effective current data and an input data set;
In the present embodiment, the raw data based on the detection of the electroactive microorganisms is a raw data set that is directly measured or observed, without any treatment or analysis, by using the electroactive microorganisms as a biosensor or detector, and that is used for evaluating the activity, metabolic state, genetic composition, metabolic products and electrochemical process-related characteristics of the electroactive microorganisms; the raw data based on the detection of the environment and the electrochemical system is raw data collected by an environment detection station or an electrochemical detection instrument, covering environmental monitoring (such as water quality, air quality, spectral characteristics, etc.) or electrochemical parameters (such as current, voltage, resistance, etc.); preprocessing is a series of operations performed on the raw data prior to data analysis; electrochemical system data is data about an electrochemical system, obtained by an electrochemical method or technique, that reflects the state, performance or reaction process of the electrochemical system; the effective current data is current data that, after proper treatment, truly reflects the state of an electrochemical process or system; the input data set is the set of data, other than the electrochemical system data and the effective current data, that remains after preprocessing the collected raw data based on the detection of electroactive microorganisms and the raw data based on the detection of the environment and the electrochemical system.
Specifically, the collected raw data about the detection of the electroactive microorganisms and the raw data detected based on the environment and the electrochemical system are preprocessed, so that noise, abnormal values, missing values and the like are eliminated, the influence on data analysis is reduced, and electrochemical system data, effective current data and an input data set are obtained.
S20: carrying out correlation analysis on the electrochemical system data and the effective current data, extracting power output related characteristics, and screening electrochemical characteristic data from the electrochemical system data;
In the present embodiment, correlation analysis is a statistical method for studying the strength and direction of the relationships among a plurality of variables; extracting the power output related characteristics means extracting, from the original data, the parameters or indexes closely related to the power output; electrochemical characteristic data are the data or parameters in the electrochemical system data that are closely related to the electrochemical process or to system performance.
Specifically, correlation analysis is performed on the electrochemical system data and the effective current data to judge the correlation between the power output and each parameter of the effective current data, and between the power output and the other parameters in the electrochemical system data; the closely related parameters are determined as power output related characteristics and extracted, and the parameters closely related to the electrochemical process are screened from the electrochemical system data and determined as electrochemical characteristic data.
S30: constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of gene sequences and metabolic activities and a microorganism classification model based on the input data set, the power output related characteristics and the electrochemical characteristic data;
In this embodiment, the prediction model is a mathematical model for predicting a target variable or result based on an input data set and related features, and specifically predicts the power output under different conditions based on each related feature; the comprehensive characteristic analysis model is a model that jointly considers a plurality of electrochemical characteristic data and related characteristics in order to analyze the overall performance or behavior of an electrochemical system, and is specifically used for analyzing the combined influence of water sample and electrochemical characteristics; the partial least squares regression spectrum analysis model is a model that combines spectral data with a target variable and establishes the relationship between spectral characteristics and the target variable so as to realize quantitative or qualitative analysis of the spectral data, and specifically analyzes the relationship between biofilm activity and spectral characteristics; the correlation model of the gene sequence and the metabolic activity is an association rule model established between the characteristics of the gene sequence and the metabolic activity, and specifically analyzes the correlation between the two; the microorganism classification model is a model for classifying microorganisms based on characteristic data of the microorganisms.
Specifically, based on the input data set, the power output related characteristics and the electrochemical characteristic data, the following models are constructed: a prediction model for predicting power output under different conditions based on each related feature, a comprehensive characteristic analysis model for analyzing the combined influence of water sample and electrochemical characteristics, a partial least squares regression spectrum analysis model for analyzing the relationship between biofilm activity and spectral characteristics, a correlation model for analyzing the correlation between gene sequences and metabolic activity, and a microorganism classification model for classifying microorganisms.
S40: training and optimizing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of gene sequences and metabolic activities and a microorganism classification model respectively;
In this embodiment, training the prediction model includes training it with known input data and the corresponding power outputs, typically through machine learning algorithms such as linear regression, decision trees and neural networks; optimizing the prediction model includes adjusting its parameters or structure to minimize the prediction error or to improve its performance on unseen data, which typically involves techniques such as hyperparameter tuning, feature selection and model selection.
Training the comprehensive characteristic analysis model includes searching for the most representative system performance index by means such as weight distribution and feature combination or selection; optimizing the comprehensive characteristic analysis model includes adding new features, adjusting the weights between features, or optimizing the mathematical expression of the model.
Training the partial least squares regression spectrum analysis model includes training with the spectral data and the corresponding target variables (e.g., concentration, activity, etc.) to obtain an optimal linear relationship between the spectral data and the target variables; optimizing the partial least squares regression spectrum analysis model includes optimizing parameters such as the number of principal components and regularization parameters to improve the predictive power and generalization performance of the model.
Training the correlation model of the gene sequence and the metabolic activity includes training with the gene sequence data of a microorganism and the corresponding metabolic activity data to obtain the key regions or patterns in the gene sequence that are significantly correlated with metabolic activity; optimizing the correlation model of the gene sequence and the metabolic activity includes optimizing the parameters and algorithms of the model to improve its accuracy in predicting the metabolic activity of a new sample, and specifically includes feature selection, algorithm adjustment, data enhancement techniques and the like.
Training the microorganism classification model includes training with class-labeled microorganism characteristic data (e.g., gene sequences, metabolite profiles, electrochemical characteristics, etc.) to obtain the characteristics or patterns that distinguish between different microorganism classes; optimizing the microorganism classification model includes optimizing the parameters, structure or algorithm of the classification model to improve classification accuracy and generalization ability, and specifically includes adjusting the hyperparameters of the classifier, using more complex feature extraction methods or trying different classification algorithms.
Further, optimizing the partial least squares regression spectrum analysis model also includes preprocessing the spectral data (e.g., smoothing, baseline correction, etc.);
Further, optimizing the microorganism classification model also includes adopting an ensemble learning technique (such as random forest, gradient boosting machine and the like) to improve classification performance.
Specifically, a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of gene sequences and metabolic activities and a microorganism classification model are respectively trained and optimized based on different modes, so that the accuracy and generalization of each model are improved.
S50: when the detection data of the electroactive microorganisms input by the user terminal are received, preprocessing the detection data based on a preset data processing flow, and inputting the preprocessed detection data into a corresponding model for analysis.
In this embodiment, the preset data processing flow is a data preprocessing flow, set in advance, that is matched to the type of the input detection data.
Specifically, after model construction, training and optimization are completed, when detection data of electroactive microorganisms input by a user terminal are received, the type of the detection data is identified, the detection data is preprocessed by a preset data processing flow based on the identification result, and the detection data is input to a corresponding model for analysis after preprocessing.
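The type-based routing of step S50 can be illustrated with a minimal sketch. The patent does not specify how the data type is identified or which flows exist, so the type keys, the two preprocessing functions and the type-to-model mapping below are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def preprocess_current(values: List[Optional[float]]) -> List[float]:
    """Hypothetical flow for current data: drop missing readings."""
    return [v for v in values if v is not None]


def preprocess_sequence(seq: str) -> str:
    """Hypothetical flow for gene sequence data: strip whitespace, upper-case."""
    return "".join(seq.split()).upper()


# Preset processing flows keyed by detection-data type (illustrative keys).
PRESET_FLOWS: Dict[str, Callable] = {
    "current": preprocess_current,
    "gene_sequence": preprocess_sequence,
}

# Which trained model receives each data type (illustrative mapping).
MODEL_FOR_TYPE: Dict[str, str] = {
    "current": "prediction_model",
    "gene_sequence": "gene_metabolism_correlation_model",
}


def route(data_type: str, payload) -> Tuple[str, object]:
    """Preprocess the payload with the flow matched to its type and
    return the name of the target model together with the cleaned data."""
    if data_type not in PRESET_FLOWS:
        raise ValueError(f"no preset processing flow for type {data_type!r}")
    cleaned = PRESET_FLOWS[data_type](payload)
    return MODEL_FOR_TYPE[data_type], cleaned
```

In this sketch the caller supplies the data type directly; a real system would first have to identify the type from the submitted data, which is outside what the sketch covers.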
In one embodiment, as shown in fig. 2, step S10 includes the steps of:
S11: collecting original current data obtained by detecting electroactive microorganisms, obtaining a mixed signal containing an electric signal generated by microorganism metabolism and environmental noise, and collecting experimental data of an electrochemical system;
S12: carrying out noise reduction processing on the original current data to remove clutter and interference signals, and obtaining noise reduction current data;
S13: carrying out normalization processing on the noise reduction current data and the experimental data of the electrochemical system to obtain effective current data and electrochemical system data.
In this embodiment, the original current data is a mixed signal containing the electrical signal generated by microbial metabolism, environmental noise and system noise; noise reduction processing is the process of removing unwanted clutter, interference signals or noise from a signal to improve its signal-to-noise ratio, and specifically includes filtering, wavelet transform, Fourier transform and the like; normalization processing scales the data according to a preset rule so that the data fall within a specific range, which facilitates comparison and analysis.
Specifically, the original current signal obtained by detecting the electroactive microorganism and the experimental data of the electrochemical system are collected to obtain the mixed signal containing the electric signal generated by microorganism metabolism and environmental noise and the information of the electrochemical system, noise reduction processing is carried out on the original current data to remove clutter and interference signals, normalization processing is carried out on the noise reduction current data and the experimental data of the electrochemical system to obtain effective current data and electrochemical system data, and the effective current data and the electrochemical system data are used as the basis for subsequent data analysis and model construction.
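As an illustration of the normalization in step S13, the following sketch scales a series into a fixed range with min-max scaling. The patent only requires that the data be scaled by a preset rule into a specific range; min-max scaling and the default range [0, 1] are assumptions made here.

```python
from typing import List


def min_max_normalize(values: List[float],
                      lower: float = 0.0,
                      upper: float = 1.0) -> List[float]:
    """Scale values linearly so that they fall within [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map every value to `lower`.
        return [lower for _ in values]
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]
```

Applying the same rule to both the noise reduction current data and the electrochemical system data puts parameters with different units on a comparable scale before correlation analysis.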
Further, the noise reduction processing is performed by adopting a wavelet transformation technology, and the method comprises the following steps:
performing time-frequency analysis on the mixed signal containing the electrical signal generated by microbial metabolism and the environmental noise based on a wavelet transformation technology, and performing multi-scale decomposition on the signal according to a wavelet transformation result to identify high-frequency noise and low-frequency useful signals;
Removing high-frequency noise and retaining low-frequency useful signals;
the low frequency useful signal is further optimized based on a signal processing algorithm to ensure its authenticity.
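The multi-scale decomposition and noise removal described above can be sketched as follows. The patent does not name a wavelet family or a thresholding rule, so this sketch assumes the Haar wavelet and soft thresholding of the detail (high-frequency) coefficients; the function names, the number of levels and the threshold value are illustrative only.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Multi-level Haar decomposition.
    Returns the coarsest approximation and the detail coefficients,
    ordered from the finest scale to the coarsest."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])  # high-frequency part
        approx = [(a + b) / SQRT2 for a, b in pairs]         # low-frequency part
    return approx, details


def haar_reconstruct(approx: List[float], details: List[List[float]]) -> List[float]:
    """Invert haar_decompose, starting from the coarsest scale."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(signal, detail):
            rebuilt.append((s + d) / SQRT2)
            rebuilt.append((s - d) / SQRT2)
        signal = rebuilt
    return signal


def soft_threshold(coeffs: List[float], t: float) -> List[float]:
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal: List[float], levels: int = 2, threshold: float = 0.5) -> List[float]:
    """Suppress high-frequency noise while keeping the low-frequency signal."""
    approx, details = haar_decompose(signal, levels)
    details = [soft_threshold(d, threshold) for d in details]
    return haar_reconstruct(approx, details)
```

Leaving the approximation coefficients untouched is what retains the low-frequency useful signal; the threshold would in practice be chosen from an estimate of the noise level rather than fixed.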
In one embodiment, step S20 includes the steps of:
S21: calculating the correlation coefficient between the power output and each parameter of the effective current data, and between the power output and each of the other preset parameters in the electrochemical system data, respectively;
S22: if the correlation coefficient between any parameter and the power output is higher than a preset threshold value, determining the parameter as a power output related characteristic.
In this embodiment, the correlation coefficient is a statistic for measuring the degree of correlation between each parameter in the effective current data and the power output in the electrochemical system data, and between each parameter other than the power output in the electrochemical system data and the power output.
Specifically, the correlation coefficients between each parameter and power output of the effective current data and the correlation coefficients between the power output and all preset parameters except the power output in the electrochemical system data are calculated respectively, and when the correlation coefficient between any parameter or a plurality of parameters and the power output is larger than a preset threshold value, the parameter is determined to be the correlation feature affecting the power output, so that the automatic extraction of the key feature is completed.
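Steps S21 and S22 can be sketched as follows. The patent does not say which correlation coefficient is used, so the Pearson coefficient is assumed here; comparing the absolute value against the threshold (so that strongly negative correlations also qualify) and the threshold value 0.8 are likewise assumptions, and the parameter names in the usage note are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation; treat as none
    return cov / math.sqrt(var_x * var_y)


def select_power_features(params: Dict[str, List[float]],
                          power: List[float],
                          threshold: float = 0.8) -> Dict[str, float]:
    """Return the parameters whose |r| with the power output exceeds the threshold."""
    selected = {}
    for name, series in params.items():
        r = pearson(series, power)
        if abs(r) > threshold:
            selected[name] = r
    return selected
```

For example, with `params = {"current": [1, 2, 3, 4], "ph": [7.0, 7.1, 6.9, 7.0]}` and `power = [2, 4, 6, 8]`, only `"current"` is retained as a power output related characteristic.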
In one embodiment, the input data set includes biofilm activity data, microbial gene sequence data, metabolite data, water sample data, colony characterization data, electronic transmission information data, and spectral characteristics data, as shown in FIG. 3, step S30 includes the steps of:
S31: constructing, based on the related features, a prediction model for predicting power output under different conditions;
S32: performing dimension reduction treatment on the water sample data and the electrochemical characteristic data, and constructing a comprehensive characteristic analysis model;
S33: constructing a partial least squares regression spectrum analysis model based on the biofilm activity data and the spectral characteristic data;
S34: constructing a correlation model of gene sequences and metabolic activities based on microbial gene sequence data and metabolite data;
S35: constructing a microorganism classification model based on the electronic transmission information data and the colony characteristic data.
In this embodiment, the dimension reduction process is a process of converting original high-dimensional data into low-dimensional data through mathematical transformation, and specifically includes principal component analysis, linear discriminant analysis, t-SNE, and the like.
Specifically, a prediction model for predicting power output under different conditions is constructed based on the related features; a comprehensive characteristic analysis model for analyzing the combined influence of water sample and electrochemical characteristics is constructed based on the water sample data and the electrochemical characteristic data after dimension reduction treatment; a partial least squares regression spectrum analysis model for analyzing the relationship between biofilm activity and spectral characteristics is constructed based on the biofilm activity data and the spectral characteristic data; a correlation model of gene sequences and metabolic activity, for analyzing the correlation between the two, is constructed based on the microbial gene sequence data and the metabolite data; and a microorganism classification model for identifying and classifying microorganisms is constructed based on the electronic transmission information data and the colony characteristic data. By constructing these different models for electroactive microorganism detection, efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data are realized.
In one embodiment, as shown in FIG. 4, the water sample data includes chemical oxygen demand data, and step S32 includes the steps of:
S321: performing data cleaning and preprocessing on the chemical oxygen demand data and the electrochemical characteristic data, wherein the data cleaning and preprocessing comprises abnormal value removal and standardization processing;
S322: performing dimension reduction treatment on the chemical oxygen demand data and the electrochemical characteristic data subjected to data cleaning and pretreatment based on a principal component analysis technology to obtain a plurality of principal component characteristics and constructing a principal component load matrix which displays the correlation between each principal component and the chemical oxygen demand data and the electrochemical characteristic data;
S323: calculating the score of each sample on each main component based on the main component load matrix, the chemical oxygen demand data and the electrochemical characteristic data before data cleaning and pretreatment, wherein the samples are chemical oxygen demand data and corresponding electrochemical characteristic data in single water sample data;
S324: constructing a comprehensive characteristic analysis model based on the score of each sample on each principal component and the chemical oxygen demand data.
In this embodiment, data cleaning and preprocessing is a method of removing interference data, including removing abnormal values and performing standardization processing; principal component analysis transforms the raw data, by orthogonal transformation, into a set of linearly uncorrelated variables (i.e., principal component features) that preserve as much as possible of the variation information in the raw data; the principal component load matrix is an important output of principal component analysis that represents the correlation between the original variables and the principal components.
Specifically, the data cleaning and preprocessing are carried out on the chemical oxygen demand data and the electrochemical characteristic data, the abnormal values are removed, the standardized processing is carried out, the dimensionality reduction processing is carried out on the cleaned and preprocessed data based on a principal component analysis technology, a plurality of principal component characteristics are obtained, a principal component load matrix is constructed, the principal component load matrix represents the weight of each original characteristic on each principal component, the score of each sample on each principal component is calculated based on the principal component load matrix result, and a comprehensive characteristic analysis model for analyzing the comprehensive influence of the water sample and the electrochemical characteristics is constructed based on the calculated score and the original chemical oxygen demand data.
Further, the water sample data includes conductivity, pH, oxidation-reduction potential, dissolved oxygen, and Chemical Oxygen Demand (COD); and analyzing the conductivity, the pH value, the oxidation-reduction potential, the dissolved oxygen and the electrochemical characteristic data based on a principal component analysis technology to obtain a plurality of principal component characteristics.
Further, goodness-of-fit analysis and significance testing are carried out on the constructed comprehensive characteristic analysis model to judge the explanatory power and prediction accuracy of the model and to verify its validity.
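The principal component analysis of steps S322 and S323 can be sketched for the first principal component only. The sketch standardizes the columns, forms their covariance (here, correlation) matrix, and finds the dominant loading vector by power iteration; power iteration, the starting vector and the restriction to one component are assumptions made for brevity, and no column may be constant.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(rows: Matrix) -> Matrix:
    """Center each column and scale it to unit sample standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance(rows: Matrix) -> Matrix:
    """Sample covariance matrix of already centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov: Matrix, iterations: int = 500) -> Tuple[List[float], float]:
    """Dominant eigenvector (loading vector) and eigenvalue by power iteration."""
    p = len(cov)
    v = [1.0 / (i + 1) for i in range(p)]  # asymmetric start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector is orthogonal to every component")
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return v, eigenvalue


def scores(rows: Matrix, loading: List[float]) -> List[float]:
    """Score of each sample on the component: projection onto the loading."""
    return [sum(a * b for a, b in zip(row, loading)) for row in rows]
```

Each row would hold one water sample (for instance its conductivity, pH value and electrochemical characteristic values); the resulting scores, together with the chemical oxygen demand data, are the inputs from which step S324 builds the comprehensive characteristic analysis model.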
In one embodiment, as shown in fig. 5, step S34 includes the steps of:
S341: acquiring the gene sequence of a known microorganism, and constructing an evolutionary tree based on the gene sequence data of the microorganism and the gene sequence of the known microorganism;
S342: analyzing the gene sequence data of the microorganism by a genetic variation analysis tool to obtain genetic characteristic data, and determining the gene expression level of the microorganism under different conditions by a transcriptome method to obtain gene regulation data;
S343: constructing a correlation model of the gene sequence and the metabolic activity based on the microorganism gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.
In this embodiment, the gene sequence of a known microorganism is the nucleotide sequence of the DNA or RNA in the genome of that microorganism, which contains the information required for vital activities such as growth, reproduction and metabolism of the microorganism; the evolutionary tree displays the evolutionary relationship of genes in the form of a dendrogram; genetic variation analysis tools are software or algorithms for analyzing genetic variation in the genome of an organism; transcriptomics is the study of all RNA molecules of a particular cell or tissue under a particular physiological or pathological condition; the gene regulation data are data describing the regulation mechanism of gene expression, including information on gene transcription level, post-transcriptional regulation, translational regulation and the like.
Specifically, the gene sequences of known electroactive microorganisms are obtained and an evolutionary tree is constructed, which displays the evolutionary relationship among the electroactive microorganisms; a genetic variation analysis tool is used to analyze the gene sequence data of the microorganisms in depth to obtain their genetic characteristic data; a transcriptome method is used to measure the gene expression level of the microorganisms under different environmental conditions to obtain gene regulation data; and a correlation model for analyzing the correlation between the gene sequence and the metabolic activity is constructed based on the microbial gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.
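The patent does not state how the evolutionary tree of step S341 is built. As one possible illustration, the sketch below assumes the sequences are already aligned to equal length, measures their pairwise p-distance (fraction of differing positions), and clusters them with UPGMA into a Newick-style string without branch lengths. The method choice and all names are assumptions.

```python
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: Dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA; returns a nested topology string."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    names = sorted(seqs)
    dist = {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    while len(sizes) > 1:
        # Pick the closest pair of clusters (ties broken by label order).
        a, b = sorted(min(dist, key=lambda k: (dist[k], sorted(k))))
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        for c in sizes:
            if c in (a, b):
                continue
            # Size-weighted average distance from the merged cluster to c.
            d = (dist[frozenset((a, c))] * sizes[a]
                 + dist[frozenset((b, c))] * sizes[b]) / new_size
            dist[frozenset((merged, c))] = d
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes))
```

Here the sequences of known microorganisms and the sequence of the sample under test would be passed together, so that the position of the sample in the resulting topology indicates its closest known relatives.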
In one embodiment, as shown in fig. 6, step S35 includes the steps of:
S351: data cleaning is carried out on the electronic transmission information data and the colony characteristic data, and a comprehensive data set of extracellular electronic transmission and colony characteristics is obtained;
S352: extracting classification features related to the classification of the electroactive microorganisms from the comprehensive data set based on a feature engineering technology, and scoring the extracted classification features by a preset scoring strategy;
S353: and selecting specific characteristics in a specific scoring range from the classification characteristics by a preset screening mode, and inputting a random forest model to construct a microorganism classification model.
In this embodiment, the electronic transmission information data is data describing the extracellular electron transfer activity of microorganisms (e.g., certain bacteria or fungi), where the electron transfer activity is generally related to the metabolic processes, energy production and energy transfer of the microorganisms; colony characteristic data is data describing characteristics such as the size, shape, color and texture of colonies formed by microorganisms grown on solid media, which are often used for the classification and identification of microorganisms; data cleaning is a data preprocessing step that identifies and corrects errors, anomalies, missing values or inconsistent information in the data set to ensure the accuracy, integrity and consistency of the data; feature engineering is the process of extracting, constructing or selecting features from raw data, where the selected features help the machine learning model better understand and predict the target variable; classification features are features closely related to the classification task (e.g., microorganism classification) that help the machine learning model distinguish between different classes; the preset scoring strategy is a method or standard for evaluating the importance of features in the classification task, and scoring each feature quantifies its influence on the performance of the classification model; the random forest model is an ensemble learning method based on decision trees, which improves the stability and accuracy of the model by constructing a plurality of decision trees and integrating their prediction results.
Specifically, the electronic transmission information data and the colony characteristic data are subjected to data cleaning to obtain a comprehensive data set of extracellular electronic transmission and colony characteristics, classification characteristics related to the classification of electroactive microorganisms are extracted from the comprehensive data set by using a characteristic engineering technology, the extracted classification characteristics are scored, specific characteristics in a specific scoring range are screened out through a preset screening mode and are used for inputting a random forest model, and therefore a microorganism classification model for identifying and classifying microorganisms is constructed.
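The scoring and screening of steps S352 and S353 can be sketched as follows. The patent leaves the scoring strategy and the screening mode open, so this sketch assumes a Fisher score (between-class scatter divided by within-class scatter) and a simple score range; the feature names in the usage note are hypothetical, and the random forest training itself is not shown.

```python
from collections import defaultdict
from typing import List, Sequence


def fisher_scores(samples: List[List[float]], labels: Sequence[str]) -> List[float]:
    """Score each feature by how well it separates the class means."""
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)
    result = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        overall = sum(column) / len(column)
        between = within = 0.0
        for rows in by_class.values():
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            between += len(vals) * (mean - overall) ** 2
            within += sum((v - mean) ** 2 for v in vals)
        if within > 0:
            result.append(between / within)
        else:
            # No spread inside any class: perfectly separating or uninformative.
            result.append(float("inf") if between > 0 else 0.0)
    return result


def screen(names: List[str], scores: List[float],
           low: float, high: float = float("inf")) -> List[str]:
    """Keep the features whose score falls inside [low, high]."""
    return [n for n, s in zip(names, scores) if low <= s <= high]
```

For example, scoring two features named `"eet_rate"` and `"colony_diameter"` and calling `screen(names, scores, low=1.0)` keeps only those that discriminate between the labeled microorganism classes; the retained columns would then be passed to a random forest classifier.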
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present application in any way.
In one embodiment, a detection data analysis device based on electroactive microorganisms is provided, which corresponds one-to-one to the detection data analysis method based on electroactive microorganisms in the above embodiments.
An electroactive microorganism-based detection data analysis device, comprising:
The data processing module is used for preprocessing the collected raw data based on the detection of the electroactive microorganisms and the raw data based on the detection of the environment and the electrochemical system to obtain electrochemical system data, effective current data and an input data set;
The feature extraction module is used for carrying out correlation analysis on electrochemical system data and effective current data, extracting power output related features and screening electrochemical feature data from the electrochemical system data;
The model construction module is used for constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of a gene sequence and metabolic activity and a microorganism classification model based on the input data set, the power output related characteristics and the electrochemical characteristic data;
the model optimization module is used for respectively training and optimizing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of a gene sequence and metabolic activity and a microorganism classification model;
the data input module is used for preprocessing the detection data based on a preset data processing flow when receiving the detection data of the electroactive microorganisms input by the user terminal, and inputting the preprocessed detection data into the corresponding model for analysis.
Optionally, the data processing module includes:
a data acquisition sub-module, configured to acquire raw current data obtained by detecting the electroactive microorganisms, acquire a mixed signal containing the electrical signal generated by microbial metabolism and environmental noise, and acquire experimental data of the electrochemical system;
a noise reduction sub-module, configured to perform noise reduction on the raw current data to remove clutter and interference signals and obtain noise-reduced current data;
and a normalization sub-module, configured to normalize the noise-reduced current data and the experimental data of the electrochemical system to obtain the effective current data and the electrochemical system data.
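The noise reduction and normalization steps can be sketched as follows. The patent does not name a specific filter or normalization method, so the sliding-median filter, the min-max scaling, the window size and the sample values below are all assumptions chosen for illustration.

```python
from statistics import median
from typing import List


def denoise(current: List[float], window: int = 3) -> List[float]:
    """Sliding-median filter (assumed method); the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(current)):
        lo = max(0, i - half)
        hi = min(len(current), i + half + 1)
        out.append(median(current[lo:hi]))
    return out


def min_max_normalize(values: List[float]) -> List[float]:
    """Min-max scaling to [0, 1] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant signal: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


# Made-up raw current readings with one interference spike at index 2.
raw_current = [0.10, 0.12, 5.00, 0.11, 0.13, 0.12]
denoised_current = denoise(raw_current)
effective_current = min_max_normalize(denoised_current)
```

A median filter is used here because it suppresses isolated spikes without smearing them into neighbouring samples; a different filter may suit real measurements better.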
Optionally, the feature extraction module includes:
a correlation coefficient calculation sub-module, configured to calculate the correlation coefficient between the power output and each parameter of the effective current data, and between the power output and each of the other preset parameters in the electrochemical system data;
and a parameter comparison sub-module, configured to determine that a parameter is a power-output-related feature when the correlation coefficient between that parameter and the power output is higher than a preset threshold.
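The threshold comparison described above can be sketched as follows. The Pearson coefficient, the threshold of 0.8, the parameter names and the sample values are assumptions; the patent only requires a correlation coefficient and a preset threshold.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_power_features(
    parameters: Dict[str, List[float]],
    power_output: List[float],
    threshold: float = 0.8,  # hypothetical preset threshold
) -> Dict[str, float]:
    """Keep parameters whose correlation with power output exceeds the threshold."""
    selected = {}
    for name, values in parameters.items():
        r = pearson(values, power_output)
        if r > threshold:
            selected[name] = r
    return selected


# Made-up measurements for illustration only.
power_output = [1.0, 2.0, 3.0, 4.0, 5.0]
parameters = {
    "effective_current": [0.9, 2.1, 2.9, 4.2, 5.1],
    "ph": [7.0, 7.1, 6.9, 7.0, 7.05],
}
power_features = select_power_features(parameters, power_output)
```

The comparison uses the signed coefficient, following the wording "higher than a preset threshold"; comparing the absolute value instead would also retain strongly negative correlations.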
Optionally, the model construction module includes:
a prediction model construction module, configured to construct, based on the power-output-related features, a prediction model for predicting the power output under different conditions;
a comprehensive characteristic analysis model construction module, configured to perform dimensionality reduction on the water sample data and the electrochemical characteristic data and construct a comprehensive characteristic analysis model;
a partial least squares regression spectral analysis model construction module, configured to construct a partial least squares regression spectral analysis model based on the biofilm activity data and the spectral characteristic data;
a correlation model construction module, configured to construct a correlation model of gene sequence and metabolic activity based on the microbial gene sequence data and the metabolite data;
and a microorganism classification model construction module, configured to construct a microorganism classification model based on the electron transfer information data and the colony characteristic data.
Optionally, the comprehensive characteristic analysis model construction module includes:
a preprocessing sub-module, configured to perform data cleaning and preprocessing on the chemical oxygen demand data and the electrochemical characteristic data, wherein the data cleaning and preprocessing include outlier removal and standardization;
a principal component loading matrix construction sub-module, configured to perform dimensionality reduction, based on principal component analysis, on the cleaned and preprocessed chemical oxygen demand data and electrochemical characteristic data to obtain a plurality of principal component features, and to construct a principal component loading matrix showing the correlation between each principal component and the chemical oxygen demand data and electrochemical characteristic data;
a principal component score calculation sub-module, configured to calculate the score of each sample on each principal component based on the principal component loading matrix and on the chemical oxygen demand data and electrochemical characteristic data prior to data cleaning and preprocessing;
and a comprehensive characteristic analysis model construction sub-module, configured to construct the comprehensive characteristic analysis model based on the score of each sample on each principal component and on the chemical oxygen demand data.
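The principal component steps above can be sketched with NumPy as follows. The sample data, the variable names, the choice of two components and the final least-squares fit are assumptions; in particular, this sketch computes scores by projecting the standardized data onto the principal axes, which is one common convention and not necessarily the exact computation intended by the patent.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Zero mean and unit sample variance per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca(Xs: np.ndarray, n_components: int):
    """PCA of standardized data via SVD; returns (loadings, scores)."""
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    axes = Vt[:n_components]                          # principal axes, shape (k, p)
    eigvals = S[:n_components] ** 2 / (Xs.shape[0] - 1)
    # For standardized data, each loading is the correlation between
    # an input variable and a principal component.
    loadings = axes.T * np.sqrt(eigvals)              # shape (p, k)
    scores = Xs @ axes.T                              # shape (n, k)
    return loadings, scores


# Made-up data: chemical oxygen demand (mg/L) and two electrochemical features.
rng = np.random.default_rng(0)
n = 20
cod = rng.uniform(100.0, 500.0, size=n)
peak_current = 0.01 * cod + rng.normal(0, 0.2, n)
resistance = 50.0 - 0.05 * cod + rng.normal(0, 1.0, n)

Xs = standardize(np.column_stack([cod, peak_current, resistance]))
loadings, scores = pca(Xs, n_components=2)

# Assumed final model: ordinary least squares of COD on the component scores.
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, cod, rcond=None)
cod_fitted = design @ coef
```

Outlier removal is omitted here for brevity; it would be applied before `standardize`.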
Optionally, the correlation model construction module for gene sequence and metabolic activity includes:
an evolutionary tree construction sub-module, configured to acquire the gene sequences of known microorganisms and construct an evolutionary tree based on the microbial gene sequence data and the gene sequences of the known microorganisms;
a data acquisition sub-module, configured to analyze the microbial gene sequence data with a genetic variation analysis tool to obtain genetic characteristic data, and to measure the gene expression levels of the microorganisms under different conditions by a transcriptomic method to obtain gene regulation data;
and a correlation model construction sub-module, configured to construct the correlation model of gene sequence and metabolic activity based on the microbial gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.
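The evolutionary tree step can be illustrated with a simple distance-based sketch. The patent does not state which tree-building method is used; the example below assumes pre-aligned sequences of equal length, uses the proportion of mismatched positions as the distance, and builds a UPGMA tree through SciPy's average-linkage clustering. The sequences and names are made up.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Made-up, pre-aligned sequences: three known microorganisms and one sample.
sequences = {
    "known_A": "ACGTACGTAC",
    "known_B": "ACGTACGTTC",
    "known_C": "TTGTACCTAA",
    "sample_1": "ACGTACGAAA",
}
names = list(sequences)

# Encode each base as an integer so that pdist can compare positions.
encoded = np.array([[ord(base) for base in sequences[name]] for name in names])

# Hamming distance = fraction of aligned positions that differ.
distances = pdist(encoded, metric="hamming")

# Average linkage on pairwise distances is the UPGMA algorithm.
# Each row of `tree` records one merge: (cluster i, cluster j, height, size).
tree = linkage(distances, method="average")
```

Real analyses would normally start from a multiple sequence alignment and use an evolutionary distance model rather than the raw mismatch fraction.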
Optionally, the microorganism classification model construction module includes:
a data cleaning sub-module, configured to clean the electron transfer information data and the colony characteristic data to obtain a comprehensive data set of extracellular electron transfer and colony characteristics;
a feature evaluation sub-module, configured to extract, based on feature engineering, classification features related to the classification of electroactive microorganisms from the comprehensive data set, and to score the extracted classification features with a preset scoring strategy;
and a microorganism classification model construction sub-module, configured to select, by a preset screening method, the features whose scores fall within a specific range from the classification features, and to input them into a random forest model to construct the microorganism classification model.
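The scoring, screening and random forest steps above can be sketched with scikit-learn. The ANOVA F-score used as the scoring strategy, the minimum score of 10, the feature names and the synthetic data are all assumptions; the patent leaves the scoring strategy and screening range as presets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif

# Hypothetical feature names for extracellular electron transfer and colony data.
feature_names = [
    "electron_transfer_rate",
    "redox_peak_potential",
    "colony_diameter",
    "unrelated_a",
    "unrelated_b",
]

# Made-up data: two classes, with features 0 and 2 shifted between classes.
rng = np.random.default_rng(0)
n = 60
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, len(feature_names)))
X[:, 0] += 3.0 * labels
X[:, 2] += 2.0 * labels

# Score every feature (assumed strategy: one-way ANOVA F-statistic).
f_scores, _ = f_classif(X, labels)

# Keep only features whose score lies in the assumed range [min_score, inf).
min_score = 10.0
selected = [i for i, score in enumerate(f_scores) if score >= min_score]
selected_names = [feature_names[i] for i in selected]

# Train the random forest on the selected features only.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X[:, selected], labels)
predicted = model.predict(X[:, selected])
```

Fixing `random_state` makes the forest reproducible; evaluating on held-out samples rather than the training data would be needed to estimate real accuracy.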
For specific limitations on the device for analyzing detection data based on electroactive microorganisms, refer to the above description of the method for analyzing detection data based on electroactive microorganisms; they are not repeated here. Each module of the device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and all fall within the scope of protection of the present application.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410766467.0A CN118503787B (en) | 2024-06-14 | 2024-06-14 | Data analysis method and device based on electroactive microorganism detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118503787A (en) | 2024-08-16 |
CN118503787B (en) | 2024-11-08 |
Family
ID=92240992
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410766467.0A Active CN118503787B (en) | 2024-06-14 | 2024-06-14 | Data analysis method and device based on electroactive microorganism detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118503787B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114813873A (en) * | 2022-04-18 | 2022-07-29 | 中国科学院重庆绿色智能技术研究院 | Microbial electrochemical analysis device and analysis method thereof |
CN117951584A (en) * | 2024-03-13 | 2024-04-30 | 青岛启弘信息科技有限公司 | Ocean data processing and information scheduling system based on AI and Internet of things technology |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
MX388995B (en) * | 2015-06-25 | 2025-03-20 | Native Microbials Inc | Methods, apparatuses, and systems for analyzing microorganism strains from complex heterogeneous communities, predicting and identifying functional relationships and interactions thereof, and selecting and synthesizing microbial ensembles based thereon |
CA3068084C (en) * | 2017-09-09 | 2023-11-28 | Neil Gordon | Bioanalyte signal amplification and detection with artificial intelligence diagnosis |
WO2022179444A1 (en) * | 2021-02-25 | 2022-09-01 | 华谱科仪(大连)科技有限公司 | Chromatographic analysis system, method for detecting and analyzing chromatogram, and electronic device |
CN113820376B (en) * | 2021-09-14 | 2025-03-04 | 南开大学 | A comprehensive toxicant monitoring method for microbial electrochemical sensors based on machine learning models |
CN117349782B (en) * | 2023-12-06 | 2024-02-20 | 湖南嘉创信息科技发展有限公司 | Intelligent data early warning decision tree analysis method and system |
CN118051859B (en) * | 2024-04-15 | 2024-08-06 | 深圳市俊元生物科技有限公司 | Automatic analysis system for microorganism culture result |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||