
CN114863994B - Pollution assessment method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114863994B
Authority
CN
China
Prior art keywords
cell
pollution
evaluated
sample
sample matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210785147.0A
Other languages
Chinese (zh)
Other versions
CN114863994A (en)
Inventor
黄万翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Singleron Nanjing Biotechnologies Ltd
Original Assignee
Singleron Nanjing Biotechnologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Singleron Nanjing Biotechnologies Ltd
Priority to CN202210785147.0A
Publication of CN114863994A
Application granted
Publication of CN114863994B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physiology (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Algebra (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a pollution assessment method, a pollution assessment device, electronic equipment and a storage medium. The method comprises: when the contamination information of the cells to be evaluated meets an evaluation condition, acquiring a cell sample matrix of the cells to be evaluated; determining a cell contamination score based on that cell sample matrix; inputting the cell contamination score into a trained contamination evaluation model to obtain a contamination evaluation result, the model having been trained on the sample scores of cell samples in a training data set together with the labeling results corresponding to those sample scores; and determining, based on the contamination evaluation result, whether to perform decontamination on the cell sample matrix. With this technical solution, an accurate contamination evaluation result is obtained from the contamination evaluation model, so that decontamination is applied only when that result warrants it.

Description

Pollution assessment method, device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of cell sequencing, and in particular to a pollution assessment method, a pollution assessment device, electronic equipment and a storage medium.
Background
Single-cell RNA sequencing is now widely used because it allows complex biological systems to be studied at single-cell resolution, and microwell-based single-cell sequencing platforms can profile large numbers of cells at low cost.
However, microwell-based single-cell transcriptome sequencing is also challenged by environmental (ambient) contamination. Environmental RNA may originate from apoptotic cells, the rupture of large cells, and so forth. When environmental RNA enters a microwell, it is captured along with the cell's own transcripts, so the barcode of that microwell appears to express environmental genes. In particular, after clustering, some highly specific classical marker genes are found to be expressed ubiquitously across cell populations, which severely interferes with manual annotation based on marker genes.
In the process of implementing the invention, the inventor found the following technical problem in the prior art: during decontamination, even a sample that is essentially free of contamination (apart from the presence of doublets) still has some high-scoring cells that are flagged as contaminated. In actual research, however, such contamination may be too slight to affect subsequent manual annotation and downstream analysis, whereas forced decontamination destroys some of the biological differences between samples. Decontamination is therefore applied where it is not warranted.
Disclosure of Invention
The invention provides a pollution assessment method, a pollution assessment device, electronic equipment and a storage medium, with the aim of solving the problem of unwarranted decontamination.
According to one aspect of the present invention, a contamination evaluation method is provided, including:
when the contamination information of the cells to be evaluated meets an evaluation condition, acquiring a cell sample matrix of the cells to be evaluated;
determining a cell contamination score based on the cell sample matrix of the cells to be evaluated;
inputting the cell contamination score into a trained contamination evaluation model to obtain a contamination evaluation result, wherein the contamination evaluation model is trained on the sample scores of cell samples in a training data set and the labeling results corresponding to those sample scores;
and determining, based on the contamination evaluation result, whether to perform decontamination on the cell sample matrix.
According to another aspect of the present invention, a contamination evaluation device is provided, including:
a sample matrix acquisition module, configured to acquire a cell sample matrix of the cells to be evaluated when the contamination information of the cells to be evaluated meets an evaluation condition;
a contamination score determination module, configured to determine a cell contamination score based on the cell sample matrix of the cells to be evaluated;
an evaluation result determination module, configured to input the cell contamination score into a trained contamination evaluation model to obtain a contamination evaluation result, wherein the contamination evaluation model is trained on the sample scores of cell samples in a training data set and the labeling results corresponding to those sample scores;
and a decontamination processing module, configured to determine, based on the contamination evaluation result, whether to perform decontamination on the cell sample matrix.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a contamination assessment method according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to perform the contamination assessment method according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical solution of the embodiments of the invention, when the contamination information of the cells to be evaluated meets the evaluation condition, a cell contamination score is determined from the cell sample matrix of the cells to be evaluated, thereby scoring those cells; the score is then input into the trained contamination evaluation model to obtain an accurate contamination evaluation result, so that decontamination is applied only when that result warrants it.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a contamination evaluation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a contamination evaluation method according to a second embodiment of the present invention;
FIG. 3 is a flow chart of a contamination evaluation method according to a second embodiment of the present invention;
FIG. 4 is a flow chart of a contamination evaluation method according to a third embodiment of the present invention;
FIG. 5 is a schematic flow chart of cell type annotation provided according to the third embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a contamination evaluation device according to a fourth embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device implementing the pollution evaluation method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a contamination evaluation method according to an embodiment of the present invention, which may be applied to automatically evaluate a cell contamination condition, and the method may be performed by a contamination evaluation device, which may be implemented in a form of hardware and/or software, and the contamination evaluation device may be configured in a terminal and/or a server. As shown in fig. 1, the method includes:
S110, acquiring a cell sample matrix of the cells to be evaluated when the contamination information of the cells to be evaluated meets the evaluation condition.
In the embodiment of the present invention, the cell to be evaluated refers to a cell tissue sample to be subjected to contamination evaluation, and the cell tissue sample may include one or more cells, which is not limited herein. Alternatively, the cells to be evaluated may be a tissue sample for gene sequencing, for example, the cells to be evaluated may be single cells for RNA sequencing.
It should be noted that contamination conditions differ between cells to be evaluated, so applying one uniform evaluation procedure to all of them may distort the contamination evaluation result. For example, existing contamination treatment methods are insensitive to large-area contamination, so in that case the contamination of the cells to be evaluated cannot be determined. Therefore, before contamination evaluation, the contamination information of the cells to be evaluated can be checked and only the cells that meet the evaluation condition are retained, which improves the reliability of the evaluation result.
In the embodiment of the present invention, the cell sample matrix refers to a gene sequencing matrix of the cell to be evaluated, and can be used for characterizing the gene distribution condition at different spatial positions in the cell to be evaluated. In this embodiment, the cell sample matrix may be a preprocessed sample matrix or an initial sample matrix, which is not limited herein.
Specifically, a cell tissue sample is sequenced on a sequencing instrument, producing raw R1 and R2 files. Each R1 read carries the information to be extracted, such as the cell barcode and the unique molecular identifier (UMI); each R2 read corresponds to an R1 read and is aligned to the genome for quantification. Finally, an expression matrix representing the gene expression levels of the different cells in the tissue sample is constructed.
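A minimal sketch of the final matrix-building step, assuming the barcode and UMI have already been extracted from each R1 read and the paired R2 read has already been assigned to a gene. The record format and the function name `build_expression_matrix` are hypothetical, and real pipelines additionally correct barcode and UMI sequencing errors:

```python
from collections import defaultdict


def build_expression_matrix(records):
    """Build a cell-by-gene UMI count matrix.

    records: iterable of (cell_barcode, umi, gene) tuples (hypothetical format).
    Returns (cells, genes, matrix) with matrix[i][j] = UMI count of genes[j] in cells[i].
    """
    # Reads sharing barcode, UMI and gene come from one molecule (PCR duplicates),
    # so collapsing them to a set counts each molecule once.
    molecules = set(records)
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, gene in molecules:
        counts[barcode][gene] += 1
    cells = sorted(counts)
    genes = sorted({g for per_cell in counts.values() for g in per_cell})
    matrix = [[counts[c].get(g, 0) for g in genes] for c in cells]
    return cells, genes, matrix
```

For example, two reads with the same barcode, UMI and gene contribute a single count, while a second UMI for the same gene adds another.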
On the basis of the above embodiment, acquiring the cell sample matrix of the cells to be evaluated includes: acquiring an initial sample matrix of the cells to be evaluated; and performing doublet removal on the initial sample matrix to obtain the cell sample matrix.
Here, the initial sample matrix refers to a gene sequencing matrix that has not been preprocessed. Doublet removal is a preprocessing step that reduces interference from doublets (two cells captured under the same barcode) with the cells to be evaluated. Optionally, doublet removal is applied to the cell sample matrix.
Illustratively, after the initial gene sequencing matrix is obtained, the transcriptomes of randomly sampled single cells may be linearly combined to generate simulated doublets, which are used to identify and remove the actual doublets in the initial sample matrix, thereby reducing the interference caused by doublets.
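The simulated-doublet idea can be sketched as follows. This is an assumption-laden illustration similar in spirit to tools such as Scrublet, not the patent's own procedure: doublets are simulated by summing two random cell profiles, and each real cell is scored by the fraction of its k nearest neighbours that are simulated doublets. The function names, the normalisation and the default `k` and `threshold` values are hypothetical:

```python
import math
import random


def simulate_doublets(matrix, n_doublets, seed=0):
    """Linearly combine (here: sum) the profiles of two randomly chosen cells."""
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(matrix)), 2)
        doublets.append([a + b for a, b in zip(matrix[i], matrix[j])])
    return doublets


def _normalize(profile):
    # Library-size normalisation plus log1p, so a doublet (roughly twice the
    # counts) is compared with real cells on expression shape, not depth.
    total = sum(profile) or 1
    return [math.log1p(1e4 * x / total) for x in profile]


def doublet_scores(matrix, n_doublets=None, k=5, seed=0):
    """For each real cell, the fraction of its k nearest neighbours that are simulated doublets."""
    sims = simulate_doublets(matrix, n_doublets or len(matrix), seed)
    pool = [(_normalize(p), False) for p in matrix]
    pool += [(_normalize(p), True) for p in sims]
    scores = []
    for idx, cell in enumerate(matrix):
        x = _normalize(cell)
        neighbours = sorted(
            (math.dist(x, vec), is_sim)
            for j, (vec, is_sim) in enumerate(pool)
            if j != idx  # exclude the cell itself
        )
        scores.append(sum(is_sim for _, is_sim in neighbours[:k]) / k)
    return scores


def remove_doublets(matrix, threshold=0.5, **kwargs):
    """Keep only cells whose doublet score is below a (hypothetical) threshold."""
    scores = doublet_scores(matrix, **kwargs)
    return [cell for cell, s in zip(matrix, scores) if s < threshold]
```

A real cell surrounded mostly by simulated doublets looks like a mixture of two cell types and is removed; the threshold would need to be tuned per dataset.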
In some embodiments, the preprocessing may further include: screening the cells to be evaluated according to one or more of their mitochondrial content, gene content and UMI (unique molecular identifier) content, so as to remove cells that do not meet the evaluation standard and improve the quality of the cells to be evaluated.
Exemplarily, a cell to be evaluated does not meet the evaluation standard if its mitochondrial content exceeds a preset mitochondrial content threshold, and meets it if its mitochondrial content is below that threshold. Alternatively, a cell does not meet the evaluation standard if its gene content is below a preset gene content threshold, and meets it if its gene content exceeds that threshold. Alternatively, a cell does not meet the evaluation standard if its UMI content is below a preset UMI content threshold, and meets it if its UMI content exceeds that threshold. Each preset threshold can be determined experimentally; no specific values are prescribed here.
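A minimal sketch of these three screening rules applied together to one cell. The default thresholds are placeholders only (the text deliberately leaves the values open), and identifying mitochondrial genes by an `MT-` name prefix is an assumption that holds for common human and mouse gene annotations:

```python
def qc_metrics(counts, genes, mito_prefix="MT-"):
    """Return (mitochondrial fraction, number of detected genes, total UMIs) for one cell."""
    n_umis = sum(counts)
    n_genes = sum(1 for c in counts if c > 0)
    mito = sum(c for c, g in zip(counts, genes) if g.upper().startswith(mito_prefix))
    mito_fraction = mito / n_umis if n_umis else 0.0
    return mito_fraction, n_genes, n_umis


def passes_qc(counts, genes, max_mito=0.2, min_genes=200, min_umis=500):
    """True if the cell meets the evaluation standard (placeholder thresholds)."""
    mito_fraction, n_genes, n_umis = qc_metrics(counts, genes)
    return mito_fraction <= max_mito and n_genes >= min_genes and n_umis >= min_umis
```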
S120, determining a cell contamination score based on the cell sample matrix of the cells to be evaluated.
The cell contamination score is a contamination score of the cell to be evaluated, and can be used for representing the contamination degree of the cell to be evaluated.
Specifically, the cell sample matrix of the cells to be evaluated is analyzed to obtain gene expression information, i.e., the expression distribution of each gene in the cells to be evaluated. This may include, but is not limited to, the cells' own expression information and contaminant gene expression information; for example, it may indicate which genes are highly or lowly expressed in a cell. A cell contamination score may then be determined from the cells' own expression information and the contaminant gene expression information.
Illustratively, the cells to be evaluated may include a cell population A and a cell population B. If genes specifically expressed in cell population B are superimposed on the gene expression profile of cell population A, those population-B genes may be regarded as contamination. The cell contamination score may then be determined from the proportion of population-B-specific genes superimposed on the gene expression profile of cell population A.
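The text does not give a formula for this proportion. One possible reading, sketched here purely as an assumption, averages over the population-A cells the fraction of each cell's UMIs that come from population-B-specific marker genes; the function name, its arguments and the choice of a per-cell UMI fraction are all hypothetical:

```python
def contamination_score(matrix, genes, labels, host, source_markers):
    """Mean fraction of UMIs in `host` cells that come from another population's markers.

    matrix: cell-by-gene counts; labels: population label per cell;
    source_markers: genes specific to the (hypothetical) contaminating population B.
    """
    marker_set = set(source_markers)
    marker_idx = [i for i, g in enumerate(genes) if g in marker_set]
    fractions = []
    for row, label in zip(matrix, labels):
        if label != host:
            continue
        total = sum(row)
        if total == 0:
            continue  # skip empty profiles rather than divide by zero
        fractions.append(sum(row[i] for i in marker_idx) / total)
    return sum(fractions) / len(fractions) if fractions else 0.0
```

For instance, if population A cells carry on average 5% of their UMIs from a keratinocyte marker such as KRT14, the score is 0.05.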
S130, inputting the cell pollution score into a trained pollution evaluation model to obtain a pollution evaluation result, wherein the pollution evaluation model is obtained by training sample scores of cell samples in a training data set and labeling results corresponding to the sample scores of the cell samples.
The pollution evaluation model refers to a neural network model trained in advance.
Specifically, the neural network model may be trained in advance on the sample scores of cell samples in a large training data set and the labeling results corresponding to those sample scores. During training, the model classifies the sample scores of the cell samples to produce predicted classification results; the model parameters are then adjusted according to the discrepancy between the predicted classification results and the corresponding labeling results, and through repeated adjustment this discrepancy gradually decreases and stabilizes.
S140, judging whether to carry out decontamination treatment on the cell sample matrix based on the contamination evaluation result.
It should be noted that, after the pollution evaluation result of the cell to be evaluated is obtained through the pollution evaluation model prediction, the embodiment of the invention can judge whether to perform decontamination treatment on the cell sample matrix according to the pollution evaluation result, so as to achieve reasonable decontamination.
Specifically, when the contamination evaluation result indicates contamination, the cell sample matrix is decontaminated to obtain a decontaminated sample matrix, i.e., a cell sample matrix from which the contamination has been removed, which is provided for subsequent cell sequencing analysis. When the contamination evaluation result is normal, the cell sample matrix is not decontaminated, which avoids the damage that forced decontamination would do to the cell data.
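The text does not say how decontamination itself is performed. Purely to illustrate the gating step, the sketch below pairs it with a deliberately simplified ambient-profile subtraction (loosely in the spirit of tools such as SoupX, but not their actual algorithm); `ambient_profile`, `rho` and the 0.5 probability threshold are all hypothetical:

```python
def subtract_ambient(row, ambient_profile, rho):
    """Remove an assumed ambient fraction `rho` of a cell's counts, spread by the ambient profile."""
    total = sum(row)
    return [max(0.0, c - rho * total * p) for c, p in zip(row, ambient_profile)]


def maybe_decontaminate(matrix, contaminated_prob, ambient_profile, rho=0.1, threshold=0.5):
    """Decontaminate only when the model judges the sample contaminated.

    Returns (matrix, was_decontaminated).
    """
    if contaminated_prob < threshold:
        # Result is "normal": keep the original matrix untouched.
        return matrix, False
    return [subtract_ambient(r, ambient_profile, rho) for r in matrix], True
```

The design point is that the original matrix is returned unchanged whenever the evaluation result is normal, so no signal is removed from samples that do not need it.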
In some embodiments, the contamination assessment results may also include the type of cell that is contaminated and the environmental genes that caused the contamination.
According to the technical solution of this embodiment, when the contamination information of the cells to be evaluated meets the evaluation condition, a cell contamination score is determined from the cell sample matrix of the cells to be evaluated, thereby scoring those cells; the score is then input into the trained contamination evaluation model to obtain an accurate contamination evaluation result, so that decontamination is applied only when that result warrants it.
Example two
Fig. 2 is a flowchart of a contamination evaluation method according to the second embodiment of the present invention; on the basis of the above embodiments, this embodiment introduces the training steps of the contamination evaluation model. Optionally, the contamination evaluation model is pre-trained as follows: for each cell sample matrix belonging to a contaminated sample and each cell sample matrix belonging to a normal sample, the sample score of that sample and the labeling result corresponding to the sample score are taken as one group of contamination assessment samples; and an original assessment model is trained on multiple groups of contamination assessment samples to obtain the contamination evaluation model, which includes a logistic regression network.
As shown in fig. 2, the method includes:
S210, for each cell sample matrix belonging to a contaminated sample or a normal sample, taking the sample score of that sample and the labeling result corresponding to the sample score as one group of contamination assessment samples.
S220, training an original assessment model based on a plurality of groups of pollution assessment samples to obtain the pollution assessment model, wherein the pollution assessment model comprises a logistic regression network.
S230, acquiring a cell sample matrix of the cells to be evaluated when the contamination information of the cells to be evaluated meets the evaluation condition.
S240, determining a cell contamination score based on the cell sample matrix of the cells to be evaluated.
S250, inputting the cell contamination score into the trained contamination evaluation model to obtain a contamination evaluation result.
S260, determining, based on the contamination evaluation result, whether to perform decontamination on the cell sample matrix.
In an embodiment of the present invention, the training data set of the contamination evaluation model may include the sample scores of contaminated samples and of normal samples, where a contaminated sample is a cell sample with contamination and a normal sample is a cell sample with no or negligible contamination. The labeling result corresponding to a sample score is the label of that cell sample's score and serves as the supervision signal for training the model. It should be noted that the logistic regression network is a binary classification model that classifies cell samples by their sample scores into a normal class and a contaminated class, which determines whether decontamination is performed.
Specifically, the activation function of the original evaluation model may be the sigmoid (S-shaped) function, and the loss function may be the logarithmic (log) loss. Multiple groups of contamination assessment samples are input into the original evaluation model, and its model parameters are optimized iteratively against the log loss so that the loss value gradually decreases; when the loss value no longer changes significantly, the model parameters are considered optimal, i.e., training of the contamination evaluation model is complete. The trained contamination evaluation model can then be used to evaluate the contamination of previously unevaluated cell samples.
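A minimal sketch of this training loop for a single-feature logistic regression (sample score in, probability of contamination out), using plain gradient descent on the mean log loss and stopping once the loss no longer changes noticeably. The learning rate, tolerance, iteration cap and 0.5 decision threshold are assumptions, not values from the text:

```python
import math


def sigmoid(z):
    # Numerically stable logistic (S-shaped) function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(y, p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def train_logistic(scores, labels, lr=0.5, tol=1e-7, max_iter=10000):
    """Fit p(contaminated | score) = sigmoid(w * score + b) by gradient descent on mean log loss."""
    w, b, prev, n = 0.0, 0.0, float("inf"), len(scores)
    for _ in range(max_iter):
        preds = [sigmoid(w * x + b) for x in scores]
        loss = sum(log_loss(y, p) for y, p in zip(labels, preds)) / n
        if abs(prev - loss) < tol:  # loss no longer changes noticeably: stop
            break
        prev = loss
        # Gradients of the mean log loss with respect to w and b.
        grad_w = sum((p - y) * x for x, y, p in zip(scores, labels, preds)) / n
        grad_b = sum(p - y for y, p in zip(labels, preds)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, score, threshold=0.5):
    return "contaminated" if sigmoid(w * score + b) >= threshold else "normal"
```

Trained on labeled sample scores (label 1 for contaminated, 0 for normal), `predict` then gives the binary result used to decide whether to decontaminate.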
For example, Fig. 3 is a flowchart of a contamination evaluation method according to an embodiment of the present invention. Doublet removal may be performed on a single-sample matrix (i.e., the initial sample matrix) to obtain a cell sample matrix free of doublet interference; the cell sample matrix is then scored to obtain a cell contamination score, which is input into the trained contamination evaluation model to predict the contamination status of the cells to be evaluated. When the contamination evaluation result is contaminated, the cell sample matrix is decontaminated to obtain a decontaminated sample matrix, which is provided for subsequent cell sequencing analysis; when the result is normal, the cell sample matrix is not decontaminated and the original matrix is retained, avoiding the damage that forced decontamination would cause. Subsequent analysis may include, but is not limited to, downstream quality control, matrix integration, normalization, centering, dimensionality reduction and clustering, and cell type annotation.
According to the technical solution of this embodiment, multiple groups of contamination assessment samples are input into the original evaluation model, and its parameters are optimized iteratively against the log loss until the loss value no longer changes significantly, at which point training is complete. The trained model can perform binary contamination evaluation on previously unevaluated cell samples to determine the contamination status of the cells to be evaluated, so that decontamination is applied only where the evaluation result warrants it.
Example three
Fig. 4 is a flowchart of a contamination evaluation method according to the third embodiment of the present invention, which adds a technical feature to the above embodiments. Optionally, before acquiring the cell sample matrix of the cells to be evaluated, the method further includes: determining, based on environmental gene information, whether the cells to be evaluated meet the evaluation condition; if any gene in the environmental gene information appears in the pre-collected annotation information, determining that the contamination information of the cells to be evaluated does not meet the evaluation condition, and if none of them does, determining that it meets the evaluation condition.
As shown in fig. 4, the method includes:
S310, determining, based on the environmental gene information, whether the cells to be evaluated meet the evaluation condition.
S320, under the condition that the pollution information of the cells to be evaluated meets the evaluation conditions, obtaining a cell sample matrix of the cells to be evaluated.
S330, determining a cell contamination score based on the cell sample matrix of the cell to be evaluated.
S340, inputting the cell pollution score into a trained pollution evaluation model to obtain a pollution evaluation result, wherein the pollution evaluation model is obtained by training the sample score of the cell sample in a training data set and a labeling result corresponding to the sample score of the cell sample.
S350, judging whether to carry out decontamination treatment on the cell sample matrix based on the contamination evaluation result.
In the embodiments of the present invention, the environmental gene information refers to information about the environmental genes, i.e., the genes that characterize contamination in the cells to be evaluated.
It can be understood that if the cells to be evaluated are contaminated over a large area, genes of certain cell types inevitably become environmental genes and are expressed ubiquitously in most cells; whether the cells to be evaluated meet the evaluation condition can therefore be judged from the environmental gene information.
Specifically, if any gene in the environmental gene information appears in the pre-collected annotation information, it is determined that the contamination information of the cells to be evaluated does not satisfy the evaluation condition; if no gene in the environmental gene information appears in the pre-collected annotation information, it is determined that the contamination information of the cells to be evaluated satisfies the evaluation condition. The pre-collected annotation information is an annotation list covering a plurality of cell types and is used to judge whether the cells to be evaluated satisfy the evaluation condition. For example, the annotation information may consist of previously collected classical markers and algorithm markers of cell types.
If a gene in the environmental gene information appears in the pre-collected annotation information, this indicates that the cells to be evaluated are contaminated over a large area (for example, the cell sample may be extensively contaminated by keratinocytes), so the evaluation condition is not satisfied.
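The evaluation-condition check reduces to a set-membership test between the environmental genes and the pre-collected markers. A minimal sketch, assuming the annotation information is held as a mapping from cell type to a set of marker gene symbols; the function name and data layout are hypothetical. The returned (gene, cell type) pairs correspond to what the annotation record text is described as containing.

```python
from typing import Dict, Iterable, List, Set, Tuple


def check_evaluation_condition(
    environmental_genes: Iterable[str],
    annotation_markers: Dict[str, Set[str]],
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Return (condition_met, hits).

    annotation_markers maps each cell type to its pre-collected marker genes
    (classical or algorithm markers). hits lists every (gene, cell_type)
    pair where an environmental gene is a known marker.
    """
    hits: List[Tuple[str, str]] = []
    for gene in environmental_genes:
        for cell_type, markers in annotation_markers.items():
            if gene in markers:
                hits.append((gene, cell_type))
    # Any hit suggests large-area contamination, so the condition fails.
    return len(hits) == 0, hits
```

For instance, with the illustrative markers `{"keratinocyte": {"KRT14", "KRT5"}}` and environmental genes `["ACTB", "KRT14"]`, the function returns `(False, [("KRT14", "keratinocyte")])`, i.e. the evaluation condition is not met.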
On the basis of the above embodiment, before judging whether the cells to be evaluated satisfy the evaluation condition based on the environmental gene information, the method further includes: determining the expression rate of each gene in the initial sample matrix; and ranking the genes according to the expression rate of each gene in the cell sample matrix, and selecting a preset number of genes as environmental genes according to the ranking result.
Illustratively, as shown in fig. 5, the dropout rate of each gene in the initial sample matrix (the complement of its expression rate, i.e. the fraction of cells in which the gene is not detected) is calculated and the genes are sorted by dropout rate; the top 20 most ubiquitously expressed genes are selected as environmental genes and used as the scope for investigation and comparison. If a pre-collected classical marker or algorithm marker appears among these top 20 genes, the cells to be evaluated are contaminated over a large area. After judging whether the cells to be evaluated satisfy the evaluation condition, an annotation record text is generated, which may include the markers among the environmental genes and the corresponding cell types.
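Selecting environmental genes amounts to computing a per-gene dropout rate and keeping the genes with the lowest values. A sketch in plain Python, assuming a cells-by-genes count matrix given as nested lists and treating a zero count as a dropout; `top_n=20` mirrors the top-20 example above, while the function names are hypothetical.

```python
from typing import List, Sequence


def dropout_rates(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of cells (rows) in which each gene (column) has a zero count."""
    n_cells = len(matrix)
    n_genes = len(matrix[0])
    return [
        sum(1 for row in matrix if row[j] == 0) / n_cells
        for j in range(n_genes)
    ]


def select_environmental_genes(
    matrix: Sequence[Sequence[float]],
    gene_names: Sequence[str],
    top_n: int = 20,
) -> List[str]:
    """Return the top_n most ubiquitously expressed genes (lowest dropout)."""
    rates = dropout_rates(matrix)
    # sorted() is stable, so genes with equal dropout keep their input order.
    order = sorted(range(len(gene_names)), key=lambda j: rates[j])
    return [gene_names[j] for j in order[:top_n]]
```

Sorting by ascending dropout is equivalent to sorting by descending expression rate, so either quantity can drive the ranking.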
According to the technical solution of this embodiment of the invention, whether the cells to be evaluated are contaminated on a large scale is determined by judging whether the environmental gene information contains pre-collected annotation information, which improves the sensitivity of detecting cells with large-scale contamination.
Example four
Fig. 6 is a schematic structural diagram of a contamination evaluation apparatus according to a fourth embodiment of the present invention. As shown in fig. 6, the apparatus includes:
a sample matrix obtaining module 410, configured to obtain a cell sample matrix of a cell to be evaluated, when contamination information of the cell to be evaluated satisfies an evaluation condition;
a contamination score determining module 420 for determining a cell contamination score based on the cell sample matrix of the cells to be assessed;
an evaluation result determining module 430, configured to input the cell contamination score into a trained contamination evaluation model to obtain a contamination evaluation result, where the contamination evaluation model is obtained by training a sample score of a cell sample in a training data set and a labeling result corresponding to the sample score of the cell sample;
and a decontamination processing module 440, configured to determine whether to perform decontamination processing on the cell sample matrix based on the contamination evaluation result.
According to the technical solution of this embodiment of the invention, when the contamination information of the cells to be evaluated satisfies the evaluation condition, a cell contamination score is determined from the cell sample matrix of the cells to be evaluated, thereby scoring those cells; the score is then input into the trained contamination assessment model to obtain an accurate contamination assessment result, so that decontamination can be applied appropriately based on that result.
In some optional implementations of the embodiments of the present disclosure, the sample matrix obtaining module 410 is specifically configured to:
obtaining an initial sample matrix of the cells to be evaluated;
and performing doublet removal on the initial sample matrix of the cells to be evaluated to obtain the cell sample matrix.
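The text does not say how doublets are removed; dedicated tools such as Scrublet or DoubletFinder are common choices. As a clearly swapped-in stand-in, the sketch below applies a crude library-size heuristic: cells whose total count exceeds `max_fold` times the median total are flagged as putative doublets and dropped. The threshold of 2.5 and the function name are assumptions, and this heuristic is far less reliable than a dedicated doublet detector.

```python
import statistics
from typing import List, Sequence, Tuple


def remove_putative_doublets(
    matrix: Sequence[Sequence[float]],
    max_fold: float = 2.5,
) -> Tuple[List[List[float]], List[int]]:
    """Crude library-size filter (a stand-in, not a real doublet detector).

    Cells whose total count exceeds max_fold times the median total are
    treated as putative doublets and dropped. Returns the filtered matrix
    and the indices of the retained cells.
    """
    totals = [sum(row) for row in matrix]
    threshold = max_fold * statistics.median(totals)
    keep = [i for i, total in enumerate(totals) if total <= threshold]
    return [list(matrix[i]) for i in keep], keep
```

Returning the retained indices lets downstream steps keep cell barcodes or metadata aligned with the filtered matrix.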
In some optional implementations of embodiments of the present disclosure, the pollution score determining module 420 is specifically configured to:
analyzing the cell sample matrix of the cells to be evaluated to obtain gene expression information, wherein the gene expression information comprises cell self-expression information and contaminating gene expression information;
determining a cell contamination score based on the cell self-expression information and the contaminating gene expression information.
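The text does not give a formula for combining self-expression and contaminating-gene expression into a score. One plausible, purely hypothetical choice is the fraction of each cell's counts that come from a given set of contaminating genes, averaged over cells for a sample-level score. The gene set, the ratio, and the averaging are all assumptions made for illustration.

```python
from typing import Collection, List, Sequence


def cell_contamination_scores(
    matrix: Sequence[Sequence[float]],
    gene_names: Sequence[str],
    contaminating_genes: Collection[str],
) -> List[float]:
    """Per-cell share of counts from contaminating genes (hypothetical score)."""
    contam_idx = [j for j, g in enumerate(gene_names) if g in contaminating_genes]
    scores = []
    for row in matrix:
        contam = sum(row[j] for j in contam_idx)  # contaminating-gene expression
        self_expr = sum(row) - contam             # the cell's own expression
        total = contam + self_expr
        scores.append(contam / total if total > 0 else 0.0)
    return scores


def sample_contamination_score(
    matrix: Sequence[Sequence[float]],
    gene_names: Sequence[str],
    contaminating_genes: Collection[str],
) -> float:
    """Mean per-cell score, used here as one sample-level contamination score."""
    scores = cell_contamination_scores(matrix, gene_names, contaminating_genes)
    return sum(scores) / len(scores)
```

Cells with no counts at all receive a score of 0.0 rather than raising a division error.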
In some optional implementations of embodiments of the present disclosure, the pollution assessment model is pre-trained by:
for a cell sample matrix belonging to a contaminated sample and a cell sample matrix belonging to a normal sample, taking the sample scores of the contaminated sample and the normal sample, together with the labeling results corresponding to those sample scores, as a group of contamination assessment samples;
training an original assessment model on a plurality of groups of contamination assessment samples to obtain the contamination assessment model, wherein the contamination assessment model comprises a logistic regression network.
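The text states only that the model comprises a logistic regression network trained on labeled sample scores. A minimal sketch under that description, assuming a single scalar score per sample and labels of 1 (contaminated) and 0 (normal), fits a one-feature logistic regression by batch gradient descent in plain Python; the learning rate, epoch count, and 0.5 threshold are assumptions. In practice a library implementation such as scikit-learn's `LogisticRegression` would be the usual choice.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    scores: Sequence[float],
    labels: Sequence[int],
    lr: float = 1.0,
    epochs: int = 5000,
) -> Tuple[float, float]:
    """Fit p(contaminated) = sigmoid(w * score + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(scores, labels):
            err = sigmoid(w * x + b) - y  # gradient of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_contaminated(score: float, w: float, b: float, threshold: float = 0.5) -> bool:
    """Label a sample contaminated when the predicted probability reaches threshold."""
    return sigmoid(w * score + b) >= threshold
```

Without regularization, perfectly separable training scores make `w` keep growing with more epochs; library implementations typically add a penalty term to avoid this.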
In some optional implementations of embodiments of the present disclosure, the apparatus is further configured to:
when the contamination evaluation result is contaminated, performing decontamination on the cell sample matrix to obtain a decontaminated sample matrix.
In some optional implementations of embodiments of the present disclosure, the apparatus is further configured to:
judging, based on the environmental gene information, whether the cells to be evaluated satisfy the evaluation condition: if a gene in the environmental gene information appears in the pre-collected annotation information, determining that the contamination information of the cells to be evaluated does not satisfy the evaluation condition; if no gene in the environmental gene information appears in the pre-collected annotation information, determining that the contamination information of the cells to be evaluated satisfies the evaluation condition.
In some optional implementations of embodiments of the present disclosure, the apparatus is further configured to:
determining the expression rate of each gene in the initial sample matrix;
and ranking the genes according to the expression rate of each gene in the cell sample matrix, and selecting a preset number of genes as environmental genes according to the ranking result.
The contamination evaluation device provided by this embodiment of the invention can execute the contamination evaluation method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example five
FIG. 7 illustrates a block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the Random Access Memory (RAM) 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, Read Only Memory (ROM) 12 and Random Access Memory (RAM) 13 are connected to each other by a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to an input/output (I/O) interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. Processor 11 performs the various methods and processes described above, such as the contamination assessment method, which includes:
when the contamination information of the cells to be evaluated satisfies the evaluation condition, acquiring a cell sample matrix of the cells to be evaluated;
determining a cell contamination score based on the cell sample matrix of the cells to be evaluated;
inputting the cell contamination score into a trained contamination evaluation model to obtain a contamination evaluation result, wherein the contamination evaluation model is trained on sample scores of cell samples in a training data set and the labeling results corresponding to those sample scores;
and determining, based on the contamination evaluation result, whether to perform decontamination on the cell sample matrix.
In some embodiments, the contamination assessment method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via a Read Only Memory (ROM) 12 and/or the communication unit 19. When the computer program is loaded into Random Access Memory (RAM) 13 and executed by processor 11, one or more steps of the contamination assessment method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the contamination assessment method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine; partly on a machine; as a stand-alone software package, partly on a machine and partly on a remote machine; or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the difficult management and weak service scalability of traditional physical hosts and VPS services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A contamination evaluation method, comprising:
when the contamination information of the cells to be evaluated satisfies the evaluation condition, obtaining a cell sample matrix of the cells to be evaluated, wherein the cell sample matrix is a gene sequencing matrix of the cells to be evaluated;
determining a cell contamination score based on the cell sample matrix of the cells to be evaluated;
inputting the cell contamination score into a trained contamination evaluation model to obtain a contamination evaluation result, wherein the contamination evaluation model is trained on sample scores of cell samples in a training data set and the labeling results corresponding to those sample scores;
determining, based on the contamination evaluation result, whether to perform decontamination on the cell sample matrix;
wherein determining the cell contamination score based on the cell sample matrix of the cells to be evaluated comprises:
analyzing the cell sample matrix of the cells to be evaluated to obtain gene expression information, wherein the gene expression information comprises cell self-expression information and contaminating gene expression information;
and determining the cell contamination score based on the cell self-expression information and the contaminating gene expression information.
2. The method of claim 1, wherein said obtaining a cell sample matrix of said cells to be assessed comprises:
obtaining an initial sample matrix of a cell to be evaluated;
and performing doublet removal on the initial sample matrix of the cells to be evaluated to obtain the cell sample matrix.
3. The method of claim 1, wherein the contamination evaluation model is pre-trained by:
for a cell sample matrix belonging to a contaminated sample and a cell sample matrix belonging to a normal sample, taking the sample scores of the contaminated sample and the normal sample, together with the labeling results corresponding to those sample scores, as a group of contamination assessment samples;
training an original assessment model on a plurality of groups of contamination assessment samples to obtain the contamination evaluation model, wherein the contamination evaluation model comprises a logistic regression network.
4. The method of claim 1, wherein after obtaining the contamination assessment results, the method further comprises:
when the contamination evaluation result is contaminated, performing decontamination on the cell sample matrix to obtain a decontaminated sample matrix.
5. The method of claim 1, wherein prior to obtaining the cell sample matrix of the cells to be evaluated, the method further comprises:
judging, based on the environmental gene information, whether the cells to be evaluated satisfy the evaluation condition: if a gene in the environmental gene information appears in the pre-collected annotation information, determining that the contamination information of the cells to be evaluated does not satisfy the evaluation condition; if no gene in the environmental gene information appears in the pre-collected annotation information, determining that the contamination information of the cells to be evaluated satisfies the evaluation condition.
6. The method according to claim 5, wherein before judging whether the cells to be evaluated satisfy the evaluation condition based on the environmental gene information, the method further comprises:
determining the expression rate of each gene in the initial sample matrix;
and ranking the genes according to the expression rate of each gene in the cell sample matrix, and selecting a preset number of genes as environmental genes according to the ranking result.
7. A contamination evaluation device, comprising:
a sample matrix acquisition module, configured to acquire a cell sample matrix of the cells to be evaluated when the contamination information of the cells to be evaluated satisfies an evaluation condition, wherein the cell sample matrix is a gene sequencing matrix of the cells to be evaluated;
a contamination score determination module, configured to determine a cell contamination score based on the cell sample matrix of the cells to be evaluated;
an assessment result determination module, configured to input the cell contamination score into a trained contamination assessment model to obtain a contamination assessment result, wherein the contamination assessment model is trained on sample scores of cell samples in a training data set and the labeling results corresponding to those sample scores;
a decontamination processing module, configured to determine, based on the contamination assessment result, whether to perform decontamination on the cell sample matrix;
wherein the contamination score determination module is specifically configured to:
analyze the cell sample matrix of the cells to be evaluated to obtain gene expression information, wherein the gene expression information comprises cell self-expression information and contaminating gene expression information;
determine the cell contamination score based on the cell self-expression information and the contaminating gene expression information.
8. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the contamination assessment method of any one of claims 1-6.
9. A computer-readable storage medium having stored thereon computer instructions for causing a processor to execute the contamination assessment method of any one of claims 1-6.
CN202210785147.0A 2022-07-06 2022-07-06 Pollution assessment method, device, electronic equipment and storage medium Active CN114863994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210785147.0A CN114863994B (en) 2022-07-06 2022-07-06 Pollution assessment method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114863994A CN114863994A (en) 2022-08-05
CN114863994B true CN114863994B (en) 2022-09-30

Family

ID=82626410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210785147.0A Active CN114863994B (en) 2022-07-06 2022-07-06 Pollution assessment method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114863994B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115691676A (en) * 2022-11-16 2023-02-03 北京昌平实验室 Method, device and storage medium for analyzing tissue cell components

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3009088B1 (en) * 2013-07-24 2016-11-25 Assist Publique - Hopitaux De Marseille METHOD FOR DETECTING MICROBIAL CONTAMINATION OF A SAMPLE BY ANALYSIS OF MASS SPECTRUM
US20160177396A1 (en) * 2014-12-22 2016-06-23 Enzo Biochem, Inc. Comprehensive and comparative flow cytometry-based methods for identifying the state of a biological system
US11326211B2 (en) * 2015-04-17 2022-05-10 Merck Sharp & Dohme Corp. Blood-based biomarkers of tumor sensitivity to PD-1 antagonists
WO2019079493A2 (en) * 2017-10-17 2019-04-25 President And Fellows Of Harvard College Methods and systems for detection of somatic structural variants
CN109785898B (en) * 2019-01-14 2021-03-16 清华大学 A method for assessing environmental pollution risks based on microbial networks
CN110334565A (en) * 2019-03-21 2019-10-15 江苏迪赛特医疗科技有限公司 A kind of uterine neck neoplastic lesions categorizing system of microscope pathological photograph
US20230026559A1 (en) * 2019-12-10 2023-01-26 Novigenix Sa Analysis of cell signatures for disease detection
EP4107262A4 (en) * 2020-02-20 2024-03-27 The Regents of the University of California Methods of spatially resolved single cell rna sequencing
CN111583226B (en) * 2020-05-08 2023-06-30 上海杏脉信息科技有限公司 Cell pathological infection evaluation method, electronic device and storage medium
CN113838531B (en) * 2021-09-19 2024-03-29 复旦大学 A method to assess the degree of cellular senescence based on transcriptomic data and machine learning strategies

Also Published As

Publication number Publication date
CN114863994A (en) 2022-08-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant