
CN118503787B - Data analysis method and device based on electroactive microorganism detection - Google Patents


Info

Publication number
CN118503787B
Authority
CN
China
Prior art keywords
data
model
characteristic
analysis
electrochemical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410766467.0A
Other languages
Chinese (zh)
Other versions
CN118503787A (en)
Inventor
王樊
冯国仁
韦雪柠
黄子其
陆心卉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Gdh Water Co ltd
Guangdong Yuehai Water Inspection Technology Co ltd
Original Assignee
Guangdong Gdh Water Co ltd
Guangdong Yuehai Water Inspection Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Gdh Water Co ltd and Guangdong Yuehai Water Inspection Technology Co ltd
Priority to CN202410766467.0A
Publication of CN118503787A
Application granted
Publication of CN118503787B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Bioethics (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application relates to a method and device for analyzing electroactive microorganism detection data. The method comprises the following steps: preprocessing the raw data; performing correlation analysis on the preprocessed data and extracting relevant features; constructing, training and optimizing models; and, when electroactive microorganism detection data are received, preprocessing the detection data and inputting them into the corresponding model for analysis. By preprocessing and performing correlation analysis on the collected raw data from electroactive microorganism detection and the raw data detected from the environment and the electrochemical system, the application extracts features related to power output and the like, and constructs a plurality of models related to electroactive microorganism detection from the preprocessed data. This enables efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data, and improves the accuracy and adaptability of electroactive microorganism detection data analysis.

Description

Data analysis method and device based on electroactive microorganism detection
Technical Field
The application relates to the technical field of data analysis, and in particular to a data analysis method and device based on electroactive microorganism detection.
Background
With the rapid development of environmental monitoring and bioenergy, electroactive microorganism detection technology has become widely used as an important technical means. However, because the detection data are complex and diverse, conventional data analysis methods often fail to meet practical requirements. On the one hand, conventional methods rely mainly on manual experience and suffer from low efficiency and strong subjectivity; on the other hand, the detection data are high-dimensional, noisy and contain many missing values, which greatly increases the difficulty of analysis.
Therefore, how to automatically extract key features from massive electroactive microorganism detection data, identify and classify electroactive microorganism types efficiently and accurately, and thereby analyze the detection data effectively is a technical problem that urgently needs to be solved.
Disclosure of Invention
To achieve efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data, and to improve the accuracy and adaptability of electroactive microorganism detection data analysis, the application provides a data analysis method and device based on electroactive microorganism detection.
The first object of the present application is achieved by the following technical solution:
A data analysis method based on electroactive microorganism detection, comprising the steps of:
preprocessing the collected raw data from electroactive microorganism detection and the raw data detected from the environment and the electrochemical system to obtain electrochemical system data, effective current data and an input data set;
performing correlation analysis on the electrochemical system data and the effective current data, extracting power output related characteristics, and screening electrochemical characteristic data from the electrochemical system data;
constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectral analysis model, a correlation model of gene sequences and metabolic activities, and a microorganism classification model based on the input data set, the power output related characteristics and the electrochemical characteristic data;
training and optimizing the prediction model, the comprehensive characteristic analysis model, the partial least squares regression spectral analysis model, the correlation model of gene sequences and metabolic activities, and the microorganism classification model respectively;
when electroactive microorganism detection data input by a user terminal are received, preprocessing the detection data based on a preset data processing flow, and inputting the preprocessed detection data into the corresponding model for analysis.
With this technical solution, the collected raw data from electroactive microorganism detection and the raw data detected from the environment and the electrochemical system are preprocessed to eliminate noise, outliers, missing values and the like, reducing their influence on data analysis. Correlation analysis is then performed on the electrochemical system data and the effective current data to extract the features related to power output; at the same time, electrochemical characteristic data that reflect the electrochemical activity of the microorganisms and reveal the interaction between the microorganisms and the environment are screened from the electrochemical system data as the basis for subsequent model construction. Based on the input data set, the power output related features and the electrochemical characteristic data, a plurality of models are constructed for predicting microbial activity, analyzing the relationship between the microorganisms and the environment, analyzing spectral data, analyzing the association between genes and metabolism, and classifying the microorganisms. After construction, each model is trained and optimized so that it fits the data better, which improves its precision and generalization ability and allows it to capture the internal regularities and features of the data accurately. After training and optimization, the electroactive microorganism detection data input by the user terminal are preprocessed according to a preset data processing flow, and the preprocessed detection data are input into the corresponding model for analysis, completing the analysis of the electroactive microorganism detection data.
In the application, the collected raw data from electroactive microorganism detection and the raw data detected from the environment and the electrochemical system are preprocessed and subjected to correlation analysis, so that key features related to power output and the like are extracted automatically, and a plurality of models related to electroactive microorganism detection are constructed from the processed data. This enables efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data, and improves the accuracy and adaptability of electroactive microorganism detection data analysis.
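The final step, routing preprocessed detection data to "a corresponding model", can be pictured as a lookup keyed by the kind of data received. The following is only a minimal sketch under stated assumptions: the keys `"current"` and `"spectrum"`, the `preprocess` and `analyze` functions and the lambda stand-ins for trained models are all hypothetical names introduced here for illustration, not part of the application.

```python
def preprocess(values):
    """Placeholder for the preset data processing flow: drop missing readings (None)."""
    return [v for v in values if v is not None]

# Hypothetical stand-ins for trained models, keyed by the kind of data they accept.
MODELS = {
    "current": lambda xs: {"model": "prediction", "mean_input": sum(xs) / len(xs)},
    "spectrum": lambda xs: {"model": "pls_spectral", "n_points": len(xs)},
}

def analyze(data_kind, values):
    """Preprocess the detection data, then hand it to the model registered for its kind."""
    if data_kind not in MODELS:
        raise ValueError(f"no model registered for {data_kind!r}")
    return MODELS[data_kind](preprocess(values))

result = analyze("current", [0.2, None, 0.4])
```

Keeping the routing table separate from the models means a retrained model can replace an entry without changing the receiving code.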
In a preferred example, the present application may be further configured such that the step of preprocessing the collected raw data from electroactive microorganism detection and the raw data detected from the environment and the electrochemical system to obtain electrochemical system data, effective current data and an input data set comprises:
collecting the raw current data obtained by detecting electroactive microorganisms, which form a mixed signal containing the electrical signal generated by microbial metabolism and environmental noise, and collecting experimental data of the electrochemical system;
performing noise reduction on the raw current data to remove clutter and interference signals, obtaining noise-reduced current data;
normalizing the noise-reduced current data and the experimental data of the electrochemical system to obtain the effective current data and the electrochemical system data.
With this technical solution, the raw current signal obtained by detecting electroactive microorganisms and the experimental data of the electrochemical system are collected, yielding a mixed signal that contains the electrical signal generated by microbial metabolism and environmental noise, together with information about the electrochemical system. Noise reduction is performed on the raw current data to remove clutter and interference signals, and the noise-reduced current data and the experimental data of the electrochemical system are normalized to obtain the effective current data and the electrochemical system data, which serve as the basis for subsequent data analysis and model construction.
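The noise reduction and normalization steps can be sketched as follows. The source does not name a specific filter or normalization formula, so the sliding median filter and min-max scaling used here are assumptions chosen for illustration, and the sample values are made up.

```python
from statistics import median

def median_filter(signal, window=3):
    """Sliding-window median; the window shrinks at the edges of the signal."""
    half = window // 2
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]

def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up raw current samples containing one upward and one downward spike.
raw_current = [0.10, 0.11, 0.95, 0.12, 0.13, 0.14, 0.02, 0.15, 0.16]
denoised = median_filter(raw_current, window=3)
effective_current = min_max_normalize(denoised)
```

A median filter is used in this sketch because it suppresses isolated spikes without smearing them into neighbouring samples; any other filter suited to the actual noise would slot into the same place.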
In a preferred example, the present application may be further configured such that the step of performing correlation analysis on the electrochemical system data and the effective current data, extracting power output related characteristics, and screening electrochemical characteristic data from the electrochemical system data comprises:
calculating the correlation coefficient between each parameter of the effective current data and the power output, and the correlation coefficient between the power output and each preset parameter other than the power output in the electrochemical system data;
if the correlation coefficient between any parameter and the power output is higher than a preset threshold, determining that parameter to be a power output related characteristic.
With this technical solution, the correlation coefficients between each parameter of the effective current data and the power output, and between the power output and each preset parameter other than the power output in the electrochemical system data, are calculated respectively. When the correlation coefficient between any one or more parameters and the power output is greater than the preset threshold, those parameters are determined to be related characteristics affecting the power output, which completes the automatic extraction of key features.
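The threshold rule above can be sketched in a few lines. The source does not say which correlation coefficient is used or whether strongly negative correlations count, so this sketch assumes the Pearson coefficient and compares its absolute value with the threshold; the parameter names, values and the threshold of 0.8 are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)

def select_power_features(params, power, threshold=0.8):
    """Keep parameters whose |r| with the power output exceeds the threshold (assumption: absolute value)."""
    selected = {}
    for name, values in params.items():
        r = pearson(values, power)
        if abs(r) > threshold:
            selected[name] = r
    return selected

# Made-up measurements for five runs.
power = [1.0, 2.0, 3.0, 4.0, 5.0]
params = {
    "current_density": [0.9, 2.1, 2.9, 4.2, 5.0],
    "ph": [7.1, 6.9, 7.0, 7.1, 6.9],
    "internal_resistance": [10.0, 8.0, 6.1, 4.0, 2.1],
}
selected = select_power_features(params, power)
```

In this made-up data, `ph` is only weakly correlated with the power output and is dropped, while `internal_resistance` is kept despite its negative sign because the comparison uses the absolute value.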
In a preferred example, the present application may be further configured such that the input data set comprises biofilm activity data, microbial gene sequence data, metabolite data, water sample data, colony characteristic data, electron transfer information data and spectral characteristic data, and the step of constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectral analysis model, a correlation model of gene sequences and metabolic activities, and a microorganism classification model based on the input data set, the power output related characteristics and the electrochemical characteristic data comprises:
constructing, based on the related characteristics, a prediction model for predicting the power output under different conditions;
performing dimension reduction on the water sample data and the electrochemical characteristic data, and constructing a comprehensive characteristic analysis model;
constructing a partial least squares regression spectral analysis model based on the biofilm activity data and the spectral characteristic data;
constructing a correlation model of gene sequences and metabolic activities based on the microbial gene sequence data and the metabolite data;
constructing a microorganism classification model based on the electron transfer information data and the colony characteristic data.
With this technical solution, five models are constructed. A prediction model for predicting the power output under different conditions is built from the related characteristics. A comprehensive characteristic analysis model for analyzing the combined influence of the water sample and the electrochemical characteristics is built from the dimension-reduced water sample data and electrochemical characteristic data. A partial least squares regression spectral analysis model for analyzing the relationship between biofilm activity and spectral characteristics is built from the biofilm activity data and the spectral characteristic data. A correlation model for analyzing the association between gene sequences and metabolic activities is built from the microbial gene sequence data and the metabolite data. A microorganism classification model for identifying and classifying microorganisms is built from the electron transfer information data and the colony characteristic data. Constructing these models enables efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data.
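As an illustration of the partial least squares regression model, the sketch below fits a single-response PLS regression (often called PLS1) with a NIPALS-style deflation loop. The source names the technique but not the algorithm, so this particular variant is an assumption, and the "spectra" and "biofilm activity" values are synthetic.

```python
import numpy as np

def fit_pls1(X, y, n_components):
    """Fit single-response PLS regression; returns (coef, intercept) for y ~ X @ coef + intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # centered working copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                          # weight: direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                       # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                             # latent score vector
        tt = t @ t
        p = Xk.T @ t / tt                      # X loading
        c = (yk @ t) / tt                      # y loading
        Xk = Xk - np.outer(t, p)               # deflate X
        yk = yk - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # map latent model back to original features
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic data: 30 "spectra" with 4 bands and a noise-free linear "biofilm activity".
rng = np.random.default_rng(0)
spectra = rng.normal(size=(30, 4))
true_beta = np.array([0.5, -1.0, 2.0, 0.25])
activity = spectra @ true_beta + 3.0
coef, intercept = fit_pls1(spectra, activity, n_components=4)
```

With as many components as features the fit coincides with ordinary least squares; in practice fewer components would be kept, which is what makes the method usable on spectra with many collinear bands.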
In a preferred example, the present application may be further configured such that the water sample data comprise chemical oxygen demand data, and the step of performing dimension reduction on the water sample data and the electrochemical characteristic data and constructing a comprehensive characteristic analysis model comprises:
performing data cleaning and preprocessing, including outlier removal and standardization, on the chemical oxygen demand data and the electrochemical characteristic data;
performing dimension reduction, based on principal component analysis, on the cleaned and preprocessed chemical oxygen demand data and electrochemical characteristic data to obtain a plurality of principal component features, and constructing a principal component loading matrix that shows the correlation between each principal component and the chemical oxygen demand data and electrochemical characteristic data;
calculating the score of each sample on each principal component based on the principal component loading matrix and on the chemical oxygen demand data and the electrochemical characteristic data before data cleaning and preprocessing, wherein each sample is the chemical oxygen demand data and the corresponding electrochemical characteristic data of a single water sample;
constructing a comprehensive characteristic analysis model based on the score of each sample on each principal component and the chemical oxygen demand data.
With this technical solution, the chemical oxygen demand data and the electrochemical characteristic data are cleaned and preprocessed, including outlier removal and standardization. The cleaned and preprocessed data are then reduced in dimension by principal component analysis to obtain a plurality of principal component features, and a principal component loading matrix is constructed, which represents the weight of each original feature on each principal component. The score of each sample on each principal component is calculated from the loading matrix, and a comprehensive characteristic analysis model for analyzing the combined influence of the water sample and the electrochemical characteristics is constructed from the calculated scores and the original chemical oxygen demand data.
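The principal component step can be sketched with a singular value decomposition of the standardized data. The column names and the synthetic values are assumptions, and the sketch stops at the loading matrix and the sample scores because the source does not specify how the final comprehensive model combines the scores with the chemical oxygen demand data. For simplicity the scores here are computed from the standardized data.

```python
import numpy as np

def pca_loadings_and_scores(data, n_components):
    """PCA on standardized columns; returns (loadings, scores, explained variance ratio)."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardization
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    eigvals = s ** 2 / (len(z) - 1)                 # variance along each component
    directions = vt[:n_components].T                # unit eigenvectors, features x components
    # For standardized data, eigenvector * sqrt(eigenvalue) is the correlation
    # between an original feature and a principal component.
    loadings = directions * np.sqrt(eigvals[:n_components])
    scores = z @ directions                         # one row of scores per sample
    explained = eigvals[:n_components] / eigvals.sum()
    return loadings, scores, explained

# Synthetic water samples: COD plus two made-up electrochemical features.
rng = np.random.default_rng(1)
cod = rng.uniform(50.0, 300.0, size=20)
peak_current = 0.02 * cod + rng.normal(0.0, 0.5, size=20)
resistance = 100.0 - 0.2 * cod + rng.normal(0.0, 5.0, size=20)
samples = np.column_stack([cod, peak_current, resistance])

loadings, scores, explained = pca_loadings_and_scores(samples, n_components=2)
```

Each row of `loadings` corresponds to one original column and each column to one principal component, so a large absolute entry marks a feature that dominates that component.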
In a preferred example, the present application may be further configured such that the step of constructing a correlation model of gene sequences and metabolic activities based on the microbial gene sequence data and the metabolite data comprises:
acquiring gene sequences of known microorganisms, and constructing an evolutionary tree based on the microbial gene sequence data and the gene sequences of the known microorganisms;
analyzing the microbial gene sequence data with a genetic variation analysis tool to obtain genetic characteristic data, and determining the gene expression levels of the microorganisms under different conditions by a transcriptomic method to obtain gene regulation data;
constructing a correlation model of gene sequences and metabolic activities based on the microbial gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.
With this technical solution, gene sequences of known microorganisms are acquired and an evolutionary tree is constructed, which shows the evolutionary relationships among the microorganisms. A genetic variation analysis tool is used to analyze the microbial gene sequence data in depth to obtain genetic characteristic data, and a transcriptomic method is used to measure the gene expression levels of the microorganisms under different environmental conditions to obtain gene regulation data. A correlation model for analyzing the association between gene sequences and metabolic activities is then constructed from the microbial gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.
In a preferred example, the present application may be further configured such that the step of constructing a microorganism classification model based on the electron transfer information data and the colony characteristic data comprises:
performing data cleaning on the electron transfer information data and the colony characteristic data to obtain a comprehensive data set of extracellular electron transfer and colony characteristics;
extracting classification features related to the classification of electroactive microorganisms from the comprehensive data set based on feature engineering, and scoring the extracted classification features with a preset scoring strategy;
selecting, by a preset screening method, the classification features whose scores fall within a specific range, and inputting them into a random forest model to construct the microorganism classification model.
With this technical solution, the electron transfer information data and the colony characteristic data are cleaned to obtain a comprehensive data set of extracellular electron transfer and colony characteristics. Feature engineering is used to extract, from the comprehensive data set, classification features related to the classification of electroactive microorganisms, and the extracted features are scored. The features whose scores fall within a specific range are screened out by the preset screening method and input into a random forest model, thereby constructing a microorganism classification model for identifying and classifying microorganisms.
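The scoring and screening of classification features can be sketched as below. The source leaves the "preset scoring strategy" and the score range open, so the Fisher score (between-class scatter divided by within-class scatter) and the lower bound of 1.0 are assumptions, and the rows and labels are made up. The random forest itself is not shown; the selected columns would be passed to one, for example scikit-learn's `RandomForestClassifier`.

```python
from collections import defaultdict

def fisher_scores(rows, labels):
    """Per-feature ratio of between-class scatter to within-class scatter."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    scores = []
    for j in range(len(rows[0])):
        overall = sum(r[j] for r in rows) / len(rows)
        between = within = 0.0
        for members in by_class.values():
            col = [r[j] for r in members]
            mean = sum(col) / len(col)
            between += len(col) * (mean - overall) ** 2
            within += sum((v - mean) ** 2 for v in col)
        scores.append(between / within if within > 0 else float("inf"))
    return scores

def screen_features(scores, low, high=float("inf")):
    """Indices of features whose score lies in the closed range [low, high]."""
    return [j for j, s in enumerate(scores) if low <= s <= high]

# Made-up data: three features per sample, two hypothetical microorganism types.
rows = [
    [1.0, 5.0, 0.2], [1.2, 3.0, 0.3], [0.8, 4.0, 0.1],
    [3.0, 4.0, 0.9], [3.2, 5.0, 1.0], [2.8, 3.0, 0.8],
]
labels = ["type_a"] * 3 + ["type_b"] * 3
scores = fisher_scores(rows, labels)
selected = screen_features(scores, low=1.0)
reduced_rows = [[row[j] for j in selected] for row in rows]
```

In this made-up data the second feature has the same mean in both classes, so its score is zero and it is screened out before any classifier is trained.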
The second object of the present application is achieved by the following technical solution:
A detection data analysis device based on electroactive microorganisms, comprising:
a data processing module, configured to preprocess the collected raw data from electroactive microorganism detection and the raw data detected from the environment and the electrochemical system to obtain electrochemical system data, effective current data and an input data set;
a feature extraction module, configured to perform correlation analysis on the electrochemical system data and the effective current data, extract power output related features, and screen electrochemical characteristic data from the electrochemical system data;
a model construction module, configured to construct a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectral analysis model, a correlation model of gene sequences and metabolic activities, and a microorganism classification model based on the input data set, the power output related features and the electrochemical characteristic data;
a model optimization module, configured to train and optimize the prediction model, the comprehensive characteristic analysis model, the partial least squares regression spectral analysis model, the correlation model of gene sequences and metabolic activities, and the microorganism classification model respectively;
a data input module, configured to, when electroactive microorganism detection data input by a user terminal are received, preprocess the detection data based on a preset data processing flow and input the preprocessed detection data into the corresponding model for analysis.
With this technical solution, the data processing module preprocesses the collected raw data from electroactive microorganism detection and the raw data detected from the environment and the electrochemical system to obtain the electrochemical system data, the effective current data and the input data set. The feature extraction module performs correlation analysis on the electrochemical system data and the effective current data, extracts the power output related features, and screens the electrochemical characteristic data from the electrochemical system data. The model construction module constructs the prediction model, the comprehensive characteristic analysis model, the partial least squares regression spectral analysis model, the correlation model of gene sequences and metabolic activities, and the microorganism classification model based on the input data set, the power output related features and the electrochemical characteristic data. The model optimization module trains and optimizes each of these models. The data input module, on receiving electroactive microorganism detection data input by the user terminal, preprocesses the detection data based on the preset data processing flow and inputs the preprocessed detection data into the corresponding model for analysis.
In summary, the present application provides at least one of the following beneficial technical effects:
1. By preprocessing and performing correlation analysis on the collected raw data from electroactive microorganism detection and the raw data detected from the environment and the electrochemical system, the application automatically extracts key features related to power output and the like, and constructs a plurality of models related to electroactive microorganism detection from the processed data. This enables efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data, and improves the accuracy and adaptability of electroactive microorganism detection data analysis;
2. After the models are constructed, each model is trained and optimized so that it fits the data better, which improves its precision and generalization ability, allows it to capture the internal regularities and features of the data accurately, and further improves the accuracy and adaptability of each model in analyzing electroactive microorganism detection data.
Drawings
FIG. 1 is a flowchart of an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 2 is a flowchart showing an implementation of step S10 in an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 3 is a flowchart showing an implementation of step S30 in an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 4 is a flowchart showing an implementation of step S32 in an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 5 is a flowchart showing an implementation of step S34 in an embodiment of a method for analyzing detection data based on electroactive microorganisms according to the present application;
FIG. 6 is a flowchart showing an implementation of step S35 in an embodiment of the method for analyzing detection data based on electroactive microorganisms according to the present application.
Detailed Description
The application is described in further detail below with reference to FIGS. 1-6.
In one embodiment, as shown in FIG. 1, the application discloses a method for analyzing detection data based on electroactive microorganisms, which specifically comprises the following steps:
S10: preprocessing the collected raw data from electroactive microorganism detection and the raw data detected from the environment and the electrochemical system to obtain electrochemical system data, effective current data and an input data set;
In the present embodiment, the raw data based on the detection of the electroactive microorganism is a raw data set directly measured or observed without any treatment or analysis by using the electroactive microorganism as a biosensor or a detector for evaluating the activity, metabolic state, genetic composition, metabolic product, and electrochemical process-related characteristics of the electroactive microorganism; raw data detected based on the environment and the electrochemical system are raw data collected by an environment detection station or an electrochemical detection instrument on environmental monitoring (such as water quality, air quality, spectral characteristics, etc.) or electrochemical parameters (such as current, voltage, resistance, etc.); preprocessing is a series of preprocessing operations performed on raw data prior to data analysis; electrochemical system data is data about an electrochemical system obtained by an electrochemical method or technique that reflects the state, performance, or reaction process of the electrochemical system; the effective current data is the current data which can truly reflect the state of an electrochemical process or a system after proper treatment in the electrochemical system; the input data set is a set of data other than electrochemical system data and effective current data after preprocessing the collected raw data based on the detection of electrically active microorganisms and raw data based on the detection of environmental and electrochemical systems.
Specifically, the collected raw data about the detection of the electroactive microorganisms and the raw data detected based on the environment and the electrochemical system are preprocessed, so that noise, abnormal values, missing values and the like are eliminated, the influence on data analysis is reduced, and electrochemical system data, effective current data and an input data set are obtained.
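The cleaning part of step S10 can be sketched as follows. This is a minimal illustration and not the method of the application: the function name clean_series, the use of None to mark a missing reading, the median fill and the median-absolute-deviation rule with factor k are all assumptions chosen for the example.

```python
from statistics import median

def clean_series(values, k=3.0):
    """Fill missing readings (None) with the median, then drop outliers.

    Assumption: a reading is an outlier when it lies more than k median
    absolute deviations (MAD) away from the median of the series.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    filled = [med if v is None else v for v in values]
    mad = median(abs(v - med) for v in filled)
    if mad == 0:
        return filled  # no spread, so nothing can be flagged
    return [v for v in filled if abs(v - med) <= k * mad]

# Made-up current readings with one gap and one spike.
raw_current = [1.0, 1.1, None, 0.9, 9.0, 1.0]
cleaned = clean_series(raw_current)
```

In this toy series the gap is filled with the median and the 9.0 spike is discarded, while the ordinary readings are kept unchanged.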
S20: carrying out correlation analysis on the electrochemical system data and the effective current data, extracting power output related characteristics, and screening electrochemical characteristic data from the electrochemical system data;
In the present embodiment, correlation analysis is a statistical method for studying the strength and direction of the relationships among a plurality of variables; extracting the power output related characteristics means extracting, from the raw data, the parameters or indexes closely related to the power output; electrochemical characteristic data are the data or parameters in the electrochemical system data that are closely related to the electrochemical process or to system performance.
Specifically, correlation analysis is performed on the electrochemical system data and the effective current data to judge the correlation between the power output and each parameter of the effective current data, and between the power output and each of the other parameters in the electrochemical system data; closely related parameters are determined to be power output related characteristics and are extracted, and parameters closely related to the electrochemical process are screened from the electrochemical system data and determined to be electrochemical characteristic data.
S30: constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of gene sequences and metabolic activities and a microorganism classification model based on the input data set, the power output related characteristics and the electrochemical characteristic data;
In this embodiment, the prediction model is a mathematical model for predicting a target variable or result based on an input data set and related features, specifically predicting power output under different conditions based on each related feature; the comprehensive characteristic analysis model is a model for comprehensively considering a plurality of electrochemical characteristic data and related characteristics so as to analyze the overall performance or behavior of an electrochemical system, and is particularly used for analyzing the comprehensive influence of water samples and electrochemical characteristics; the partial least square regression spectrum analysis model is a model for realizing quantitative or qualitative analysis of spectrum data by combining spectrum data and target variables and establishing a relation between spectrum characteristics and the target variables, and particularly analyzes the relation between the activity of a biological film and the spectrum characteristics; the correlation model of the gene sequence and the metabolic activity is a correlation rule model established between the characteristics of the gene sequence and the metabolic activity, and specifically, the correlation of the gene sequence and the metabolic activity is analyzed; the microorganism classification model is a model for classifying microorganisms based on characteristic data of the microorganisms.
Specifically, a prediction model for predicting power output under different conditions based on each relevant characteristic, a comprehensive characteristic analysis model for analyzing comprehensive influences of water sample and electrochemical characteristics, a partial least squares regression spectral analysis model for analyzing the relation between biological film activity and spectral characteristics, a correlation model for analyzing the correlation between gene sequences and metabolic activity, and a microorganism classification model for classifying microorganisms are constructed based on an input data set, power output relevant characteristics and electrochemical characteristic data.
S40: training and optimizing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of gene sequences and metabolic activities and a microorganism classification model respectively;
In this embodiment, training the prediction model includes fitting it to known input data and the corresponding power outputs, typically with machine learning algorithms such as linear regression, decision trees, neural networks, and the like.
Optimizing the prediction model includes adjusting its parameters or structure to minimize the prediction error or to improve its performance on unseen data, which typically involves techniques such as hyperparameter tuning, feature selection, model selection, etc. Training the comprehensive characteristic analysis model includes searching for the most representative system performance indexes by means such as weight distribution and feature combination or selection; optimization of the comprehensive characteristic analysis model includes adding new features, adjusting the weights between features, or optimizing the mathematical expression of the model. Training of the partial least squares regression spectral analysis model includes training with the spectral data and the corresponding target variables (e.g., concentration, activity, etc.) to obtain an optimal linear relationship between the spectral data and the target variables; optimization of the partial least squares regression spectral analysis model includes optimizing parameters, such as the number of principal components, regularization parameters, etc., to improve the predictive power and generalization performance of the model. Training the correlation model of gene sequence and metabolic activity includes training with the gene sequence data of the microorganisms and the corresponding metabolic activity data to identify key regions or patterns in the gene sequence that are significantly correlated with metabolic activity; optimization of the correlation model of gene sequence and metabolic activity includes optimizing the parameters and algorithms of the model to improve its accuracy in predicting the metabolic activity of new samples, specifically through feature selection, algorithm adjustment, data augmentation techniques and the like.
Training of the microorganism classification model includes training using microorganism characteristic data (e.g., gene sequences, metabolite profiles, electrochemical characteristics, etc.) with class labels to obtain characteristics or patterns that can distinguish between different microorganism classes; optimization of the microbial classification model includes optimizing parameters, structures or algorithms of the classification model to improve classification accuracy and generalization ability, including specifically adjusting hyper-parameters of the classifier, using more complex feature extraction methods or trying different classification algorithms.
Further, optimization of the partial least squares regression spectral analysis model also includes preprocessing of the spectral data (e.g., smoothing, baseline correction, etc.);
Further, optimization of the microorganism classification model also includes adopting ensemble learning techniques (such as random forests, gradient boosting machines and the like) to improve classification performance.
Specifically, a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of gene sequences and metabolic activities and a microorganism classification model are respectively trained and optimized based on different modes, so that the accuracy and generalization of each model are improved.
S50: when the detection data of the electroactive microorganisms input by the user terminal are received, preprocessing the detection data based on a preset data processing flow, and inputting the preprocessed detection data into a corresponding model for analysis.
In this embodiment, the preset data processing flow is a preset data preprocessing flow based on type matching of input detection data.
Specifically, after model construction, training and optimization are completed, when detection data of electroactive microorganisms input by a user terminal are received, the type of the detection data is identified, the detection data is preprocessed by a preset data processing flow based on the identification result, and the detection data is input to a corresponding model for analysis after preprocessing.
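The routing described in step S50 can be sketched as a lookup from the identified data type to a preset preprocessing flow and its model. The sketch is illustrative only: it assumes the data type has already been identified and is supplied as a string label, and the names PIPELINES, analyze, center, power_model and spectral_model, as well as the placeholder models, are hypothetical.

```python
def center(values):
    """Example preprocessing flow: subtract the mean from every reading."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def identity(values):
    """Example preprocessing flow that leaves the data unchanged."""
    return list(values)

def power_model(values):
    # Placeholder standing in for the trained prediction model.
    return f"power-prediction on {len(values)} points"

def spectral_model(values):
    # Placeholder standing in for the trained PLS spectral analysis model.
    return f"PLS spectral analysis on {len(values)} points"

# Preset data processing flows, keyed by the identified data type (assumed labels).
PIPELINES = {
    "current": (center, power_model),
    "spectrum": (identity, spectral_model),
}

def analyze(data_type, values):
    """Preprocess the detection data with the matching flow, then run its model."""
    if data_type not in PIPELINES:
        raise ValueError(f"no preset processing flow for data type {data_type!r}")
    preprocess, model = PIPELINES[data_type]
    return model(preprocess(values))

result = analyze("current", [1.0, 2.0, 3.0])
```

Keeping the flows in one table means that supporting a new type of detection data only requires registering one more entry, and unknown types fail explicitly instead of being sent to the wrong model.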
In one embodiment, as shown in fig. 2, step S10 includes the steps of:
S11: collecting original current data obtained by detecting electroactive microorganisms, thereby obtaining a mixed signal containing the electrical signal generated by microbial metabolism and environmental noise, and collecting experimental data of the electrochemical system;
S12: performing noise reduction processing on the original current data to remove clutter and interference signals, obtaining noise reduction current data;
S13: performing normalization processing on the noise reduction current data and the experimental data of the electrochemical system to obtain effective current data and electrochemical system data.
In this embodiment, the original current data is a mixed signal containing the electrical signal generated by microbial metabolism, environmental noise and system noise; noise reduction processing is the process of removing unwanted clutter, interference signals or noise from the signal to improve its signal-to-noise ratio, and specifically includes filtering, wavelet transform, Fourier transform and the like; normalization processing scales the data according to a preset rule so that the data fall within a specific range, which facilitates comparison and analysis.
Specifically, the original current signal obtained by detecting the electroactive microorganism and the experimental data of the electrochemical system are collected to obtain the mixed signal containing the electric signal generated by microorganism metabolism and environmental noise and the information of the electrochemical system, noise reduction processing is carried out on the original current data to remove clutter and interference signals, normalization processing is carried out on the noise reduction current data and the experimental data of the electrochemical system to obtain effective current data and electrochemical system data, and the effective current data and the electrochemical system data are used as the basis for subsequent data analysis and model construction.
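The normalization in step S13 can be illustrated with min-max scaling. This is only one possible "preset rule"; the application does not state which rule is used, so the function name min_max_normalize and the default target range of 0 to 1 are assumptions.

```python
def min_max_normalize(values, lower=0.0, upper=1.0):
    """Scale values linearly so that they fall within [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range to scale; map everything to lower.
        return [lower for _ in values]
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]

# Made-up noise reduction current data.
denoised_current = [2.0, 4.0, 6.0, 10.0]
effective_current = min_max_normalize(denoised_current)
```

After scaling, current data and other electrochemical parameters share the same range, so they can be compared directly in the later correlation analysis.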
Further, the noise reduction processing is performed by adopting a wavelet transformation technology, and the method comprises the following steps:
Performing time-frequency analysis, based on the wavelet transform, on the mixed signal containing the electrical signal generated by microbial metabolism and the environmental noise, and performing multi-scale decomposition of the signal according to the wavelet transform result to identify high-frequency noise and low-frequency useful signals;
Removing the high-frequency noise and retaining the low-frequency useful signals;
Further optimizing the low-frequency useful signals based on a signal processing algorithm to ensure their authenticity.
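The decompose, discard and reconstruct sequence above can be sketched with a hand-written Haar wavelet transform. The choice of the Haar wavelet, the number of levels and the decision to zero every detail coefficient are assumptions made for illustration; the application does not name a wavelet or a thresholding rule, and the sketch requires the signal length to be divisible by 2 raised to the number of levels.

```python
import math

def haar_decompose(signal, levels):
    """Multi-level Haar transform: returns (approximation, details per level)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])  # high frequency
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]         # low frequency
    return approx, details

def haar_reconstruct(approx, details):
    """Inverse of haar_decompose."""
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(approx, detail):
            rebuilt.append((s + d) / math.sqrt(2))
            rebuilt.append((s - d) / math.sqrt(2))
        approx = rebuilt
    return approx

def denoise(signal, levels=2):
    """Drop all detail coefficients and keep only the low-frequency approximation."""
    approx, details = haar_decompose(signal, levels)
    zeroed = [[0.0] * len(d) for d in details]
    return haar_reconstruct(approx, zeroed)

# Made-up signal: a slow step from 1 to 2 plus alternating +/-0.1 noise.
noisy = [1.1, 0.9, 1.1, 0.9, 2.1, 1.9, 2.1, 1.9]
smooth = denoise(noisy, levels=2)
```

For this toy signal the alternating component lives entirely in the first-level detail coefficients, so discarding them recovers the underlying step. In practice a soft or hard threshold on the detail coefficients is usually gentler than zeroing them all.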
In one embodiment, step S20 includes the steps of:
S21: calculating the correlation coefficient between the power output and each parameter of the effective current data, and between the power output and each of the other preset parameters in the electrochemical system data;
S22: if the correlation coefficient between any parameter and the power output is higher than a preset threshold value, determining that parameter to be a power output related characteristic.
In this embodiment, the correlation coefficient is a statistic for measuring the degree of correlation between each parameter in the effective current data and the power output in the electrochemical system data, and between each parameter other than the power output in the electrochemical system data and the power output.
Specifically, the correlation coefficients between the power output and each parameter of the effective current data, and between the power output and every preset parameter other than the power output in the electrochemical system data, are calculated respectively; when the correlation coefficient between one or more parameters and the power output is larger than a preset threshold value, each such parameter is determined to be a related characteristic affecting the power output, which completes the automatic extraction of the key characteristics.
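Steps S21 and S22 can be sketched with the Pearson correlation coefficient. The application does not say which correlation coefficient is used, so Pearson is an assumption, as are the threshold of 0.8, the comparison on the absolute value (so that strong negative correlations are also kept) and the names pearson and select_power_features.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)

def select_power_features(params, power, threshold=0.8):
    """Return the names of parameters whose |r| with power output exceeds threshold."""
    return [name for name, values in params.items()
            if abs(pearson(values, power)) > threshold]

# Made-up measurements for five runs.
power = [1.0, 2.0, 3.0, 4.0, 5.0]
params = {
    "current": [0.1, 0.2, 0.3, 0.4, 0.5],
    "temperature": [25.0, 24.0, 26.0, 25.0, 25.0],
    "internal_resistance": [50.0, 40.0, 30.0, 20.0, 10.0],
}
selected = select_power_features(params, power)
```

Here the current and the internal resistance track the power output closely (positively and negatively, respectively) and are kept, while the temperature, which barely varies with power, is not.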
In one embodiment, the input data set includes biofilm activity data, microbial gene sequence data, metabolite data, water sample data, colony characteristic data, electronic transmission information data, and spectral characteristic data. As shown in fig. 3, step S30 includes the following steps:
S31: constructing, based on the power output related characteristics, a prediction model for predicting power output under different conditions;
S32: performing dimension reduction processing on the water sample data and the electrochemical characteristic data, and constructing a comprehensive characteristic analysis model;
S33: constructing a partial least squares regression spectral analysis model based on the biofilm activity data and the spectral characteristic data;
S34: constructing a correlation model of gene sequence and metabolic activity based on the microbial gene sequence data and the metabolite data;
S35: constructing a microorganism classification model based on the electronic transmission information data and the colony characteristic data.
In this embodiment, the dimension reduction process is a process of converting original high-dimensional data into low-dimensional data through mathematical transformation, and specifically includes principal component analysis, linear discriminant analysis, t-SNE, and the like.
Specifically, a prediction model for predicting power output under different conditions is constructed based on the power output related characteristics. A comprehensive characteristic analysis model for analyzing the combined influence of the water sample and the electrochemical characteristics is constructed based on the water sample data and the electrochemical characteristic data after dimension reduction processing. A partial least squares regression spectral analysis model for analyzing the relationship between biofilm activity and spectral characteristics is constructed based on the biofilm activity data and the spectral characteristic data. A correlation model for analyzing the correlation between gene sequence and metabolic activity is constructed based on the microbial gene sequence data and the metabolite data. A microorganism classification model for identifying and classifying microorganisms is constructed based on the electronic transmission information data and the colony characteristic data. By constructing these different models on the basis of electroactive microorganism detection, efficient and accurate identification and classification of electroactive microorganism types and analysis of the detection data are realized.
In one embodiment, as shown in fig. 4, the water sample data includes chemical oxygen demand data, and step S32 includes the following steps:
S321: performing data cleaning and preprocessing on the chemical oxygen demand data and the electrochemical characteristic data, wherein the data cleaning and preprocessing comprises abnormal value removal and standardization processing;
S322: performing dimension reduction processing, based on principal component analysis, on the chemical oxygen demand data and the electrochemical characteristic data that have undergone data cleaning and preprocessing, to obtain a plurality of principal component characteristics and to construct a principal component load matrix, which displays the correlation between each principal component and the chemical oxygen demand data and the electrochemical characteristic data;
S323: calculating the score of each sample on each principal component based on the principal component load matrix and on the chemical oxygen demand data and the electrochemical characteristic data before data cleaning and preprocessing, wherein a sample is the chemical oxygen demand data and the corresponding electrochemical characteristic data in a single piece of water sample data;
S324: constructing a comprehensive characteristic analysis model based on the score of each sample on each principal component and the chemical oxygen demand data.
In this embodiment, the data cleaning and preprocessing is a method of removing the interference data including removing abnormal values and performing normalization processing; principal component analysis is the transformation of raw data into a set of linearly uncorrelated variables (i.e., principal component features) by orthogonal transformation, which preserve as much as possible the variation information in the raw data; the principal component load matrix is an important output in principal component analysis that represents the correlation between the original variable and the principal component.
Specifically, the data cleaning and preprocessing are carried out on the chemical oxygen demand data and the electrochemical characteristic data, the abnormal values are removed, the standardized processing is carried out, the dimensionality reduction processing is carried out on the cleaned and preprocessed data based on a principal component analysis technology, a plurality of principal component characteristics are obtained, a principal component load matrix is constructed, the principal component load matrix represents the weight of each original characteristic on each principal component, the score of each sample on each principal component is calculated based on the principal component load matrix result, and a comprehensive characteristic analysis model for analyzing the comprehensive influence of the water sample and the electrochemical characteristics is constructed based on the calculated score and the original chemical oxygen demand data.
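The standardization, loading and scoring steps can be sketched for the first principal component only. This is a simplified illustration under several assumptions: the leading loading vector is found by power iteration rather than a full eigendecomposition, the scores are computed from the standardized data, and the three example features and all values are made up.

```python
import math

def standardize(rows):
    """Column-wise z-score standardization (population standard deviation)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(m)] for i in range(m)]

def leading_loading(matrix, iterations=200):
    """Loading vector of the first principal component, by power iteration."""
    m = len(matrix)
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-uniform starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical samples: [conductivity, current density, internal resistance].
samples = [
    [120.0, 0.8, 95.0],
    [180.0, 1.1, 80.0],
    [240.0, 1.5, 70.0],
    [300.0, 1.8, 55.0],
    [360.0, 2.2, 40.0],
]
z = standardize(samples)
loading = leading_loading(correlation_matrix(z))
pc1_scores = [sum(x * l for x, l in zip(row, loading)) for row in z]
```

The loading vector corresponds to one column of the principal component load matrix, and pc1_scores holds the score of each sample on that component. A regression of the chemical oxygen demand on such scores would then play the role of the comprehensive characteristic analysis model.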
Further, the water sample data includes conductivity, pH, oxidation-reduction potential, dissolved oxygen, and Chemical Oxygen Demand (COD); and analyzing the conductivity, the pH value, the oxidation-reduction potential, the dissolved oxygen and the electrochemical characteristic data based on a principal component analysis technology to obtain a plurality of principal component characteristics.
Further, fitting goodness analysis and significance test are carried out on the constructed comprehensive characteristic analysis model, so that the interpretation power and prediction accuracy of the model are judged, and the effectiveness of the model is verified.
In one embodiment, as shown in fig. 5, step S34 includes the steps of:
S341: acquiring the gene sequences of known microorganisms, and constructing an evolutionary tree based on the microbial gene sequence data and the gene sequences of the known microorganisms;
S342: analyzing the microbial gene sequence data with a genetic variation analysis tool to obtain genetic characteristic data, and determining the gene expression levels of the microorganisms under different conditions by a transcriptomic method to obtain gene regulation data;
S343: constructing a correlation model of gene sequence and metabolic activity based on the microbial gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.
In this embodiment, the gene sequence of a known microorganism is the nucleotide sequence of the DNA or RNA in the genome of that microorganism, which contains the information required for vital activities such as growth, reproduction and metabolism; the evolutionary tree displays the evolutionary relationships of genes in the form of a dendrogram; genetic variation analysis tools are software or algorithms for analyzing genetic variation in the genome of an organism; transcriptomics is the study of all RNA molecules of a particular cell or tissue under a particular physiological or pathological condition; the gene regulation data are data describing the mechanisms that regulate gene expression, including information on transcription level, post-transcriptional regulation, translational regulation and the like.
Specifically, the gene sequences of known electroactive microorganisms are obtained and an evolutionary tree is constructed, which displays the evolutionary relationships among the electroactive microorganisms; the microbial gene sequence data is analyzed in depth with genetic variation analysis tools to obtain the genetic characteristic data of the microorganisms; the gene expression levels of the microorganisms under different environmental conditions are measured by a transcriptomic method to obtain gene regulation data; and a correlation model for analyzing the correlation between gene sequence and metabolic activity is constructed based on the microbial gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.
In one embodiment, as shown in fig. 6, step S35 includes the steps of:
S351: data cleaning is carried out on the electronic transmission information data and the colony characteristic data, and a comprehensive data set of extracellular electronic transmission and colony characteristics is obtained;
S352: extracting classification features related to the classification of the electroactive microorganisms from the comprehensive data set based on a feature engineering technology, and scoring the extracted classification features by a preset scoring strategy;
S353: and selecting specific characteristics in a specific scoring range from the classification characteristics by a preset screening mode, and inputting a random forest model to construct a microorganism classification model.
In this embodiment, the electronic transmission information data is data describing the extracellular electron transmission activity of microorganisms (e.g., certain bacteria or fungi), wherein the electron transmission activity is generally related to the metabolic processes, energy production and energy transmission of the microorganisms; colony characteristic data is data describing characteristics such as the size, shape, color and texture of the colonies formed by microorganisms grown on solid media, which are often used for the classification and identification of microorganisms; data cleaning is a data preprocessing step that identifies and corrects errors, anomalies, missing values or inconsistent information in the data set to ensure the accuracy, integrity and consistency of the data; feature engineering is the process of extracting, constructing or selecting features from raw data, where the selected features help the machine learning model to better understand and predict the target variables; classification features are features closely related to the classification task (e.g., microbial classification) that help the machine learning model distinguish between different classes; the preset scoring strategy is a method or standard for evaluating the importance of features in the classification task, and scoring each feature quantifies its influence on the performance of the classification model; the random forest model is an ensemble learning method based on decision trees, which improves the stability and accuracy of the model by constructing a plurality of decision trees and combining their prediction results.
Specifically, the electronic transmission information data and the colony characteristic data are subjected to data cleaning to obtain a comprehensive data set of extracellular electronic transmission and colony characteristics, classification characteristics related to the classification of electroactive microorganisms are extracted from the comprehensive data set by using a characteristic engineering technology, the extracted classification characteristics are scored, specific characteristics in a specific scoring range are screened out through a preset screening mode and are used for inputting a random forest model, and therefore a microorganism classification model for identifying and classifying microorganisms is constructed.
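The scoring and screening in steps S352 and S353 can be sketched as follows. The application leaves both the scoring strategy and the screening mode open, so the ratio of between-class to within-class variance used here, the score range and the names fisher_score and screen_features are assumptions, and the feature values are made up. Training the random forest on the retained features is not shown.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Ratio of between-class to within-class variance for one feature."""
    overall = mean(values)
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return between / within if within > 0 else float("inf")

def screen_features(table, labels, low, high):
    """Score every feature and keep those whose score lies within [low, high]."""
    scores = {name: fisher_score(vals, labels) for name, vals in table.items()}
    kept = [name for name, s in scores.items() if low <= s <= high]
    return scores, kept

# Hypothetical cleaned data set: three samples each of two microorganism classes.
labels = ["A", "A", "A", "B", "B", "B"]
table = {
    "peak_current": [5.0, 5.2, 4.8, 2.0, 2.2, 1.8],     # separates the classes well
    "colony_diameter": [2.0, 3.0, 4.0, 2.1, 3.1, 3.9],  # overlaps between classes
}
scores, kept = screen_features(table, labels, low=1.0, high=float("inf"))
```

Only the features in kept would be passed on as inputs to the random forest, which keeps the classifier from being fed features that do not help to separate the classes.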
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the order of execution of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.
In one embodiment, an electroactive microorganism-based detection data analysis device is provided, which corresponds one-to-one to the electroactive microorganism-based detection data analysis method in the above embodiments.
An electroactive microorganism-based detection data analysis device, comprising:
The data processing module is used for preprocessing the collected raw data based on the detection of the electroactive microorganisms and the raw data based on the detection of the environment and the electrochemical system to obtain electrochemical system data, effective current data and an input data set;
The feature extraction module is used for carrying out correlation analysis on electrochemical system data and effective current data, extracting power output related features and screening electrochemical feature data from the electrochemical system data;
The model construction module is used for constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectrum analysis model, a correlation model of a gene sequence and metabolic activity and a microorganism classification model based on the input data set, the power output related characteristics and the electrochemical characteristic data;
The model optimization module is used for respectively training and optimizing the prediction model, the comprehensive characteristic analysis model, the partial least squares regression spectral analysis model, the correlation model of gene sequence and metabolic activity and the microorganism classification model;
The data input module is used for preprocessing the detection data based on a preset data processing flow when receiving the detection data of the electroactive microorganisms input by the user terminal, and inputting the preprocessed detection data into the corresponding model for analysis.
Optionally, the data processing module includes:
The data acquisition sub-module is used for acquiring original current data obtained by detecting the electroactive microorganisms, acquiring mixed signals containing electric signals generated by microorganism metabolism and environmental noise, and acquiring experimental data of an electrochemical system;
The noise reduction processing sub-module is used for carrying out noise reduction processing on the original current data so as to remove clutter and interference signals and obtain noise reduction current data;
And the normalization processing sub-module is used for carrying out normalization processing on the noise reduction current data and the experimental data of the electrochemical system to obtain effective current data and electrochemical system data.
Optionally, the feature extraction module includes:
The correlation coefficient calculation sub-module is used for calculating the correlation coefficient between the power output and each parameter of the effective current data, and between the power output and each of the other preset parameters in the electrochemical system data;
And the parameter comparison sub-module is used for determining a parameter to be a power output related characteristic when the correlation coefficient between that parameter and the power output is higher than a preset threshold value.
Optionally, the model building module includes:
The prediction model construction module is used for constructing, based on the power output related characteristics, a prediction model for predicting power output under different conditions;
The comprehensive characteristic analysis model construction module is used for carrying out dimension reduction processing on the water sample data and the electrochemical characteristic data and constructing a comprehensive characteristic analysis model;
The partial least squares regression spectral analysis model construction module is used for constructing a partial least squares regression spectral analysis model based on the biofilm activity data and the spectral characteristic data;
The correlation model construction module is used for constructing a correlation model of gene sequence and metabolic activity based on the microbial gene sequence data and the metabolite data;
And the microorganism classification model construction module is used for constructing a microorganism classification model based on the electronic transmission information data and the colony characteristic data.
Optionally, the comprehensive characteristic analysis model building module includes:
The preprocessing sub-module is used for carrying out data cleaning and preprocessing on the chemical oxygen demand data and the electrochemical characteristic data, wherein the data cleaning and preprocessing comprises abnormal value removal and standardization processing;
The main component load matrix construction submodule is used for carrying out dimension reduction treatment on the chemical oxygen demand data and the electrochemical characteristic data subjected to data cleaning and pretreatment based on a main component analysis technology, obtaining a plurality of main component characteristics and constructing a main component load matrix which displays the correlation between each main component and the chemical oxygen demand data and the electrochemical characteristic data;
A principal component score computation sub-module for computing a score for each sample on each principal component based on the principal component load matrix and the chemical oxygen demand data and electrochemical characteristic data prior to data cleaning and preprocessing;
And the comprehensive characteristic analysis model construction submodule is used for constructing a comprehensive characteristic analysis model based on the score of each sample on each main component and the chemical oxygen demand data.
Optionally, the correlation model construction module of the gene sequence and the metabolic activity comprises:
The evolutionary tree construction submodule is used for acquiring the gene sequences of known microorganisms and constructing an evolutionary tree based on the microbial gene sequence data and the gene sequences of the known microorganisms;
The data acquisition submodule is used for analyzing the microorganism gene sequence data through a genetic variation analysis tool to acquire genetic characteristic data, and measuring the gene expression level of microorganisms under different conditions through a transcriptome method to acquire gene regulation data;
And the correlation model construction submodule is used for constructing a correlation model of the gene sequence and the metabolic activity based on the microorganism gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.
Optionally, the microorganism classification model construction module includes:
The data cleaning sub-module is used for performing data cleaning on the electron transfer information data and the colony characteristic data to obtain a comprehensive data set of extracellular electron transfer and colony characteristics;
The feature evaluation sub-module is used for extracting, based on feature engineering, classification features related to the classification of electroactive microorganisms from the comprehensive data set, and scoring the extracted classification features with a preset scoring strategy;
The microorganism classification model construction sub-module is used for selecting, through a preset screening mode, the features whose scores fall within a specific scoring range from the classification features, and inputting them into a random forest model to construct the microorganism classification model.
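The scoring and screening steps above can be sketched as follows. The text leaves both the "preset scoring strategy" and the "preset screening mode" unspecified, so this sketch assumes a two-class Fisher-style score and a simple score interval; the function names, feature names and values are hypothetical. Training the random forest itself is not shown.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Two-class Fisher-style score (assumed strategy): squared gap between
    class means divided by the summed within-class variances."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    gap = (mean(class0) - mean(class1)) ** 2
    spread = pvariance(class0) + pvariance(class1)
    if spread == 0:
        return 0.0 if gap == 0 else float("inf")
    return gap / spread

def screen_features(features, labels, low, high=float("inf")):
    """Score every feature and keep those whose score lies in [low, high]."""
    scores = {name: fisher_score(vals, labels) for name, vals in features.items()}
    kept = [name for name, s in scores.items() if low <= s <= high]
    return scores, kept

# Hypothetical cleaned data set: label 1 = electroactive, 0 = not electroactive.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "eet_rate": [0.9, 1.1, 1.0, 3.0, 3.2, 2.8],
    "colony_diameter_mm": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "pigment_index": [0.2, 0.3, 0.25, 0.6, 0.7, 0.65],
}
scores, kept = screen_features(features, labels, low=10.0)
```

Only the columns named in `kept` would be passed on as inputs to the random forest model, for example scikit-learn's `RandomForestClassifier` if that library is the chosen implementation.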
For specific limitations of the data analysis device based on electroactive microorganism detection, reference may be made to the above description of the data analysis method based on electroactive microorganism detection, which is not repeated here. Each module of the device may be implemented in whole or in part by software, by hardware, or by a combination of the two. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (6)

1. A data analysis method based on electroactive microorganism detection, characterized by comprising the steps of:
pre-processing the collected raw data based on electroactive microorganism detection and the raw data based on environment and electrochemical system detection to obtain electrochemical system data, effective current data and an input data set;
performing correlation analysis on the electrochemical system data and the effective current data, extracting power output related features, and screening electrochemical characteristic data from the electrochemical system data;
constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectral analysis model, a correlation model between gene sequence and metabolic activity, and a microorganism classification model based on the input data set, the power output related features and the electrochemical characteristic data;
training and optimizing the prediction model, the comprehensive characteristic analysis model, the partial least squares regression spectral analysis model, the correlation model between gene sequence and metabolic activity, and the microorganism classification model respectively;
when detection data of electroactive microorganisms input by a user terminal is received, preprocessing the detection data based on a preset data processing flow, and inputting the preprocessed detection data into the corresponding model for analysis;
wherein the step of performing correlation analysis on the electrochemical system data and the effective current data, extracting power output related features, and screening electrochemical characteristic data from the electrochemical system data comprises the steps of:
calculating, respectively, the correlation coefficient between the power output and each parameter of the effective current data and each of the remaining preset parameters in the electrochemical system data;
if the correlation coefficient between any parameter and the power output is higher than a predetermined threshold, determining that parameter to be a power output related feature;
wherein the input data set includes biofilm activity data, microbial gene sequence data, metabolite data, water sample data, colony characteristic data, electron transfer information data and spectral characteristic data, and the step of constructing the prediction model, the comprehensive characteristic analysis model, the partial least squares regression spectral analysis model, the correlation model between gene sequence and metabolic activity, and the microorganism classification model based on the input data set, the power output related features and the electrochemical characteristic data comprises the steps of:
constructing, based on the power output related features, a prediction model that predicts the power output under different conditions of each related feature;
performing dimensionality reduction on the water sample data and the electrochemical characteristic data, and constructing the comprehensive characteristic analysis model;
constructing the partial least squares regression spectral analysis model based on the biofilm activity data and the spectral characteristic data;
constructing the correlation model between gene sequence and metabolic activity based on the microbial gene sequence data and the metabolite data;
constructing the microorganism classification model based on the electron transfer information data and the colony characteristic data.

2. The data analysis method based on electroactive microorganism detection according to claim 1, characterized in that the step of pre-processing the collected raw data based on electroactive microorganism detection and the raw data based on environment and electrochemical system detection to obtain electrochemical system data, effective current data and an input data set comprises the steps of:
collecting the raw current data obtained from electroactive microorganism detection, so as to obtain a mixed signal containing the electrical signal generated by microbial metabolism and environmental noise, and collecting the experimental data of the electrochemical system;
performing noise reduction on the raw current data to remove clutter and interference signals, obtaining noise-reduced current data;
normalizing the noise-reduced current data and the experimental data of the electrochemical system to obtain the effective current data and the electrochemical system data.

3. The data analysis method based on electroactive microorganism detection according to claim 1, characterized in that the water sample data includes chemical oxygen demand data, and the step of performing dimensionality reduction on the water sample data and the electrochemical characteristic data and constructing the comprehensive characteristic analysis model comprises the steps of:
performing data cleaning and preprocessing on the chemical oxygen demand data and the electrochemical characteristic data, wherein the data cleaning and preprocessing include removing outliers and standardization;
performing dimensionality reduction, based on principal component analysis, on the chemical oxygen demand data and electrochemical characteristic data that have undergone data cleaning and preprocessing, obtaining several principal component features, and constructing a principal component loading matrix that shows the correlation between each principal component and the chemical oxygen demand data and electrochemical characteristic data;
calculating the score of each sample on each principal component based on the principal component loading matrix and the chemical oxygen demand data and electrochemical characteristic data before data cleaning and preprocessing, wherein a sample is the chemical oxygen demand data in a single set of water sample data together with the corresponding electrochemical characteristic data;
constructing the comprehensive characteristic analysis model based on the score of each sample on each principal component and the chemical oxygen demand data.

4. The data analysis method based on electroactive microorganism detection according to claim 1, characterized in that the step of constructing the correlation model between gene sequence and metabolic activity based on the microbial gene sequence data and the metabolite data comprises the steps of:
acquiring the gene sequences of known microorganisms, and constructing an evolutionary tree based on the microbial gene sequence data and the gene sequences of the known microorganisms;
analyzing the microbial gene sequence data with a genetic variation analysis tool to obtain genetic characteristic data, and measuring the gene expression levels of the microorganisms under different conditions by a transcriptomics method to obtain gene regulation data;
constructing the correlation model between gene sequence and metabolic activity based on the microbial gene sequence data, the metabolite data, the evolutionary tree, the genetic characteristic data and the gene regulation data.

5. The data analysis method based on electroactive microorganism detection according to claim 1, characterized in that the step of constructing the microorganism classification model based on the electron transfer information data and the colony characteristic data comprises the steps of:
performing data cleaning on the electron transfer information data and the colony characteristic data to obtain a comprehensive data set of extracellular electron transfer and colony characteristics;
extracting, based on feature engineering, classification features related to the classification of electroactive microorganisms from the comprehensive data set, and scoring the extracted classification features with a preset scoring strategy;
selecting, through a preset screening mode, the features whose scores fall within a specific scoring range from the classification features, and using them as the input of a random forest model to construct the microorganism classification model.

6. A data analysis device based on electroactive microorganism detection, characterized by comprising:
a data processing module, used for pre-processing the collected raw data based on electroactive microorganism detection and the raw data based on environment and electrochemical system detection to obtain electrochemical system data, effective current data and an input data set;
a feature extraction module, used for performing correlation analysis on the electrochemical system data and the effective current data, extracting power output related features, and screening electrochemical characteristic data from the electrochemical system data;
a model construction module, used for constructing a prediction model, a comprehensive characteristic analysis model, a partial least squares regression spectral analysis model, a correlation model between gene sequence and metabolic activity, and a microorganism classification model based on the input data set, the power output related features and the electrochemical characteristic data;
a model optimization module, used for training and optimizing the prediction model, the comprehensive characteristic analysis model, the partial least squares regression spectral analysis model, the correlation model between gene sequence and metabolic activity, and the microorganism classification model respectively;
a data input module, used for preprocessing, when detection data of electroactive microorganisms input by a user terminal is received, the detection data based on a preset data processing flow, and inputting the preprocessed detection data into the corresponding model for analysis;
wherein the feature extraction module includes:
a correlation coefficient calculation sub-module, used for calculating, respectively, the correlation coefficient between the power output and each parameter of the effective current data and each of the remaining preset parameters in the electrochemical system data;
a parameter comparison sub-module, used for determining, when the correlation coefficient between any parameter and the power output is higher than a predetermined threshold, that parameter to be a power output related feature;
wherein the input data set includes biofilm activity data, microbial gene sequence data, metabolite data, water sample data, colony characteristic data, electron transfer information data and spectral characteristic data, and the model construction module includes:
a prediction model construction module, used for constructing, based on the power output related features, a prediction model that predicts the power output under different conditions of each related feature;
a comprehensive characteristic analysis model construction module, used for performing dimensionality reduction on the water sample data and the electrochemical characteristic data, and constructing the comprehensive characteristic analysis model;
a partial least squares regression spectral analysis model construction module, used for constructing the partial least squares regression spectral analysis model based on the biofilm activity data and the spectral characteristic data;
a construction module for the correlation model between gene sequence and metabolic activity, used for constructing the correlation model between gene sequence and metabolic activity based on the microbial gene sequence data and the metabolite data;
a microorganism classification model construction module, used for constructing the microorganism classification model based on the electron transfer information data and the colony characteristic data.
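
The correlation-threshold screening recited in claims 1 and 6 can be illustrated with a minimal sketch. The claims do not name the correlation coefficient, so the use of the Pearson coefficient is an assumption, as are the function names, the parameter names, the threshold of 0.8 and the sample values. The claims compare the coefficient itself with the threshold; the optional `use_abs` flag is an added variant that also keeps strongly negative correlations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def select_power_features(params, power, threshold=0.8, use_abs=False):
    """Return {parameter name: coefficient} for parameters whose correlation
    with the power output is higher than the threshold."""
    selected = {}
    for name, values in params.items():
        r = pearson(values, power)
        if (abs(r) if use_abs else r) > threshold:
            selected[name] = r
    return selected

# Hypothetical normalized measurements, one value per experiment run.
power_output = [0.5, 1.1, 1.4, 2.1, 2.4]
parameters = {
    "current_mA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ph": [7.1, 6.9, 7.0, 7.2, 6.8],
    "resistance_ohm": [50.0, 40.0, 30.0, 20.0, 10.0],
}
related = select_power_features(parameters, power_output)
related_abs = select_power_features(parameters, power_output, use_abs=True)
```

With these values only `current_mA` passes the literal test of the claims, while `resistance_ohm` is also kept when the absolute value is compared.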
CN202410766467.0A 2024-06-14 2024-06-14 Data analysis method and device based on electroactive microorganism detection Active CN118503787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410766467.0A CN118503787B (en) 2024-06-14 2024-06-14 Data analysis method and device based on electroactive microorganism detection

Publications (2)

Publication Number Publication Date
CN118503787A CN118503787A (en) 2024-08-16
CN118503787B true CN118503787B (en) 2024-11-08

Family

ID=92240992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410766467.0A Active CN118503787B (en) 2024-06-14 2024-06-14 Data analysis method and device based on electroactive microorganism detection

Country Status (1)

Country Link
CN (1) CN118503787B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114813873A (en) * 2022-04-18 2022-07-29 中国科学院重庆绿色智能技术研究院 Microbial electrochemical analysis device and analysis method thereof
CN117951584A (en) * 2024-03-13 2024-04-30 青岛启弘信息科技有限公司 Ocean data processing and information scheduling system based on AI and Internet of things technology

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MX388995B (en) * 2015-06-25 2025-03-20 Native Microbials Inc Methods, apparatuses, and systems for analyzing microorganism strains from complex heterogeneous communities, predicting and identifying functional relationships and interactions thereof, and selecting and synthesizing microbial ensembles based thereon
CA3068084C (en) * 2017-09-09 2023-11-28 Neil Gordon Bioanalyte signal amplification and detection with artificial intelligence diagnosis
WO2022179444A1 (en) * 2021-02-25 2022-09-01 华谱科仪(大连)科技有限公司 Chromatographic analysis system, method for detecting and analyzing chromatogram, and electronic device
CN113820376B (en) * 2021-09-14 2025-03-04 南开大学 A comprehensive toxicant monitoring method for microbial electrochemical sensors based on machine learning models
CN117349782B (en) * 2023-12-06 2024-02-20 湖南嘉创信息科技发展有限公司 Intelligent data early warning decision tree analysis method and system
CN118051859B (en) * 2024-04-15 2024-08-06 深圳市俊元生物科技有限公司 Automatic analysis system for microorganism culture result

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant