CN112599218B - Training method and prediction method of drug sensitivity prediction model and related device - Google Patents
- Publication number
- CN112599218B (application CN202011492075.8A)
- Authority
- CN
- China
- Prior art keywords
- cell line
- cell lines
- metabolite
- training
- drug
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The application provides a training method and a prediction method for a drug sensitivity prediction model, and a related device. Using a modified bootstrap sampling method, a first set number of cell lines is sampled from a plurality of cell lines as training cell lines, generating multiple training data sets for each set of training cell lines. An important feature screening process is executed on each training data set to obtain an important feature set. The number of times each metabolite feature appears across all rounds of feature screening is counted, the metabolite features are ranked from high to low importance by the number of times they were selected, and a set number of the most important metabolite features is chosen. This ensures that the metabolite features to be used are both more important and more robust, which improves their effectiveness. The selected metabolite features are then used to construct a prediction model on the training cell lines with an ensemble method, and the model is used to predict new test cell lines, which can improve prediction accuracy.
Description
Technical Field
The application relates to the field of data processing, in particular to a training method, a prediction method and a related device of a drug sensitivity prediction model.
Background
Tumors are a complex class of heterogeneous diseases: even patients with tumors of the same pathological type can respond very differently to antitumor drugs. Oncology has therefore become one of the important fields of precision medicine, since precise drug selection can achieve better treatment outcomes and reduce side effects. One way to implement precise tumor therapy is to transplant tumors into animals, administer drugs to the animals, observe the effect of the drugs on tumor growth, and determine efficacy. This approach is costly, time-consuming, and has a low success rate. In light of these challenges, human cancer cell lines provide a new vehicle for screening candidate drugs for the treatment of cancer. Cancer cell lines cultured with existing cell line culture technology can approximate the growth environment of cancer cells in cancer patients, and the cell lines closely resemble the patients' cancer cells at each omics level. Thus, by analyzing the molecular data of cancer cell lines to predict drug response, the response of a patient to the drug can be predicted.
How to predict drug response from the molecular data of cancer cell lines is therefore a problem to be solved.
Disclosure of Invention
To solve the above technical problem, embodiments of the application provide a training method and a prediction method for a drug sensitivity prediction model, and a related device, so as to improve the effectiveness of the metabolite features to be used and to ensure the effectiveness of prediction model training. The technical solution is as follows:
A method of training a drug sensitivity prediction model, comprising:
obtaining metabolite features of each of a plurality of cell lines and a drug response parameter IC50 of each of the cell lines;
determining, for each of the cell lines, a drug response class based on the drug response parameter IC50;
for each constructed cancer cell line drug sensitivity prediction model, sampling a first set number of cell lines from the plurality of cell lines as training cell lines, and executing an important feature screening process multiple times on each training cell line;
each important feature screening process includes: inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important feature set output by the model;
for each metabolite feature of each training cell line, counting the number of times the metabolite feature appears in the important feature sets output over the multiple runs of each cancer cell line drug sensitivity prediction model, as its selected number of times;
for each metabolite feature of each training cell line, taking the maximum of the plurality of selected numbers of times of the metabolite feature as its target number of times;
sorting the plurality of target numbers of times from largest to smallest to obtain a target sorting result, and taking the metabolite features corresponding to the first through m-th target numbers of times in the target sorting result as the metabolite features to be used;
training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which those features belong.
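The counting, maximum, and top-m steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `select_features_to_use`, the input layout (a mapping from model name to one important feature set per screening round), the metabolite names, and the tie-breaking by feature name are all hypothetical and are not specified in the application.

```python
from collections import Counter


def select_features_to_use(rounds_by_model, m):
    """Return the top-m metabolite features and every feature's target count.

    rounds_by_model maps a model name to a list of important feature sets,
    one set per screening round (hypothetical input layout).
    """
    target_counts = {}
    for feature_sets in rounds_by_model.values():
        # Selected number of times: rounds of this model that kept the feature.
        selected = Counter(f for fs in feature_sets for f in set(fs))
        for feature, count in selected.items():
            # Target number of times: maximum selected count over all models.
            target_counts[feature] = max(target_counts.get(feature, 0), count)
    # Sort from largest to smallest; ties are broken by name (an assumption).
    ranked = sorted(target_counts, key=lambda f: (-target_counts[f], f))
    return ranked[:m], target_counts


# Toy data: two models, three screening rounds each (names are made up).
rounds_by_model = {
    "model_a": [{"lactate", "citrate"}, {"lactate"}, {"lactate", "alanine"}],
    "model_b": [{"citrate"}, {"citrate", "alanine"}, {"citrate"}],
}
features_to_use, target_counts = select_features_to_use(rounds_by_model, m=2)
print(features_to_use, target_counts)
```

Taking the maximum over models means that a feature consistently selected by any single model is retained even if the other models rarely select it.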
Inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the model includes:
normalizing each metabolite feature of the training cell line using the normalization relation (x - min_x)/(max_x - min_x) to obtain normalized metabolite features;
where x is the content of the metabolite feature in the cell line, min_x is the minimum content of the metabolite feature across the plurality of cell lines, and max_x is the maximum content of the metabolite feature across the plurality of cell lines;
inputting the normalized metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the model.
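The normalization relation (x - min_x)/(max_x - min_x) can be illustrated with a short Python sketch. The data layout (a dictionary from cell line to metabolite contents), the metabolite names, and the handling of a feature whose content is identical in every cell line (mapped to 0.0 to avoid division by zero) are assumptions made for this example only.

```python
def min_max_normalize(values):
    """Apply (x - min_x) / (max_x - min_x) to one metabolite feature."""
    min_x, max_x = min(values), max(values)
    if max_x == min_x:
        # Assumption: a constant feature is mapped to 0.0 for every cell line.
        return [0.0 for _ in values]
    return [(x - min_x) / (max_x - min_x) for x in values]


def normalize_profiles(profiles):
    """Normalize every metabolite feature across all cell lines."""
    cell_lines = list(profiles)
    features = list(profiles[cell_lines[0]])
    normalized = {cell_line: {} for cell_line in cell_lines}
    for feature in features:
        column = [profiles[cell_line][feature] for cell_line in cell_lines]
        for cell_line, value in zip(cell_lines, min_max_normalize(column)):
            normalized[cell_line][feature] = value
    return normalized


# Toy contents for three cell lines (values are made up).
profiles = {
    "a1": {"lactate": 2.0, "citrate": 10.0},
    "a2": {"lactate": 4.0, "citrate": 30.0},
    "a3": {"lactate": 6.0, "citrate": 20.0},
}
normalized = normalize_profiles(profiles)
print(normalized["a2"])
```

After this step every metabolite feature lies in the range 0 to 1, so features measured on very different scales contribute comparably to the model.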
After training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which those features belong, the method further includes:
sampling a second set number of cell lines from the plurality of cell lines as test cell lines, and predicting each test cell line multiple times with the cancer cell line drug sensitivity prediction model to be trained to obtain prediction results;
evaluating each prediction result to obtain an evaluation result;
judging, based on the plurality of evaluation results, whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement;
if yes, finishing training;
if not, returning to the step of obtaining the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each of the cell lines.
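The evaluate-and-judge loop above can be sketched as follows. The application does not state which evaluation metric or which set requirement is used, so this example assumes accuracy as the metric and a minimum mean accuracy as the requirement; the function names and the threshold values are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of test cell lines whose predicted class matches the label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def meets_requirement(prediction_runs, actual, required_mean_accuracy):
    """Evaluate each prediction run and judge the model on all evaluations."""
    scores = [accuracy(run, actual) for run in prediction_runs]
    mean_score = sum(scores) / len(scores)
    return mean_score >= required_mean_accuracy, scores


# Toy labels for four test cell lines and two prediction runs (made up).
actual = ["sensitive", "insensitive", "sensitive", "insensitive"]
prediction_runs = [
    ["sensitive", "insensitive", "sensitive", "insensitive"],
    ["sensitive", "insensitive", "insensitive", "insensitive"],
]
finished, scores = meets_requirement(prediction_runs, actual, 0.8)
print(finished, scores)
```

If `finished` is false, the procedure described above returns to the data acquisition step rather than accepting the model.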
Before evaluating each prediction result to obtain an evaluation result, the method further includes:
judging, based on the plurality of prediction results for each of the metabolite features, whether an abnormal cell line exists among the plurality of training cell lines;
if an abnormal cell line exists, removing the abnormal cell lines from the plurality of training cell lines, taking the remaining cell lines as the training cell lines, and returning to the step of executing the important feature screening process multiple times on each training cell line;
if no abnormal cell line exists, evaluating each prediction result to obtain an evaluation result;
judging, based on the evaluation result, whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement;
if yes, finishing training;
if not, returning to the step of obtaining the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each of the cell lines.
Determining the drug response class based on the drug response parameter IC50 includes:
grouping the cell lines that have the same drug response parameter IC50 among the plurality of cell lines to obtain cell line groups, and counting the number of cell lines in each cell line group;
searching the drug response parameters IC50 of the cell line groups for a target drug response parameter IC50, such that the difference between the total number of cell lines in the cell line groups whose drug response parameter IC50 is smaller than the target drug response parameter IC50 and the total number of cell lines in the cell line groups whose drug response parameter IC50 is larger than the target drug response parameter IC50 is within a set threshold range;
taking the target drug response parameter IC50 as a preset drug response parameter IC50 threshold;
judging whether the drug response parameter IC50 is larger than the preset drug response parameter IC50 threshold;
if yes, determining the drug response class to be insensitive;
if not, determining the drug response class to be sensitive.
A method of drug sensitivity prediction, comprising:
obtaining metabolite features of a cell line to be processed;
invoking a cancer cell line drug sensitivity prediction model to process the metabolite features of the cell line to be processed and obtain a drug response class;
wherein the cancer cell line drug sensitivity prediction model is trained using the training method of the drug sensitivity prediction model described above.
A training device for a drug sensitivity prediction model, comprising:
an acquisition module for obtaining metabolite features of each of a plurality of cell lines and a drug response parameter IC50 of each of the cell lines;
a first determining module for determining, for each of the cell lines, a drug response class based on the drug response parameter IC50;
an important feature screening module for sampling, for each constructed cancer cell line drug sensitivity prediction model, a first set number of cell lines from the plurality of cell lines as training cell lines, and executing an important feature screening process multiple times on each training cell line;
each important feature screening process includes: inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important feature set output by the model;
a statistics module for counting, for each metabolite feature of each training cell line, the number of times the metabolite feature appears in the important feature sets output over the multiple runs of each cancer cell line drug sensitivity prediction model, as its selected number of times;
a second determining module for taking, for each metabolite feature of each training cell line, the maximum of the plurality of selected numbers of times of the metabolite feature as its target number of times;
a third determining module for sorting the plurality of target numbers of times from largest to smallest to obtain a target sorting result, and taking the metabolite features corresponding to the first through m-th target numbers of times in the target sorting result as the metabolite features to be used;
a training module for training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which those features belong.
The important feature screening module is specifically configured to: normalize each metabolite feature of the training cell line using the normalization relation (x - min_x)/(max_x - min_x) to obtain normalized metabolite features;
where x is the content of the metabolite feature in the cell line, min_x is the minimum content of the metabolite feature across the plurality of cell lines, and max_x is the maximum content of the metabolite feature across the plurality of cell lines;
and input the normalized metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the model.
The device further comprises:
a test module for evaluating each prediction result to obtain an evaluation result;
judging, based on the plurality of evaluation results, whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement;
if yes, finishing training;
if not, returning to the step of obtaining the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each of the cell lines.
The test module is further configured to:
before each prediction result is evaluated to obtain an evaluation result, judge, based on the plurality of prediction results for each of the metabolite features, whether an abnormal cell line exists among the plurality of training cell lines;
if an abnormal cell line exists, remove the abnormal cell lines from the plurality of training cell lines, take the remaining cell lines as the training cell lines, and return to the step of executing the important feature screening process multiple times on each training cell line;
if no abnormal cell line exists, evaluate each prediction result to obtain an evaluation result;
judge, based on the evaluation result, whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement;
if yes, finish training;
if not, return to the step of obtaining the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each of the cell lines.
The first determining module is specifically configured to:
group the cell lines that have the same drug response parameter IC50 among the plurality of cell lines to obtain cell line groups, and count the number of cell lines in each cell line group;
search the drug response parameters IC50 of the cell line groups for a target drug response parameter IC50, such that the difference between the total number of cell lines in the cell line groups whose drug response parameter IC50 is smaller than the target drug response parameter IC50 and the total number of cell lines in the cell line groups whose drug response parameter IC50 is larger than the target drug response parameter IC50 is within a set threshold range;
take the target drug response parameter IC50 as a preset drug response parameter IC50 threshold;
judge whether the drug response parameter IC50 is larger than the preset drug response parameter IC50 threshold;
if yes, determine the drug response class to be insensitive;
if not, determine the drug response class to be sensitive.
A drug sensitivity prediction device, comprising:
an acquisition module for obtaining metabolite features of a cell line to be processed;
a calling module for invoking a cancer cell line drug sensitivity prediction model to process the metabolite features of the cell line to be processed and obtain a drug response class;
wherein the cancer cell line drug sensitivity prediction model is trained using the training method of the drug sensitivity prediction model described above.
Compared with the prior art, the application has the beneficial effects that:
In the application, a first set number of cell lines is sampled from a plurality of cell lines as training cell lines, and the important feature screening process is executed multiple times on each training cell line to obtain important feature sets. The number of times each metabolite feature appears in the important feature sets output over the multiple runs of each cancer cell line drug sensitivity prediction model is counted as its selected number of times, and the metabolite features to be used are chosen based on the selected numbers of times. This ensures that the metabolite features to be used are more important and more frequently selected, which improves their effectiveness and, in turn, ensures the effectiveness of prediction model training. Predicting from the metabolite features of a cell line with the trained cancer cell line drug sensitivity prediction model can therefore improve prediction accuracy.
Moreover, because the metabolite features of a cell line are the end products of various biological processes in the cell and reflect the organism's final response to genetic, pathophysiological, and environmental stimuli, training the cancer cell line drug sensitivity prediction model with these features can improve training precision and thereby ensure the prediction accuracy of the trained model.
Drawings
To illustrate the technical solutions of the embodiments of the application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the application; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of embodiment 1 of a training method for a drug sensitivity prediction model provided by the application;
FIG. 2 is a flowchart of embodiment 1 of a training method for a drug sensitivity prediction model provided by the application;
FIG. 3 is a flowchart of embodiment 1 of a training method for a drug sensitivity prediction model provided by the application;
FIG. 4 is a flowchart of a drug sensitivity prediction method provided by the application;
FIG. 5 is a schematic diagram of the logical structure of a training device for a drug sensitivity prediction model provided by the application;
FIG. 6 is a schematic diagram of the logical structure of a drug sensitivity prediction device provided by the application.
Detailed Description
The embodiments of the application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the application.
Currently, most drug sensitivity research focuses on the genome, and the biomarkers in clinical use are mainly single genes or a small number of genes; for example, sensitivity to gefitinib in the treatment of lung cancer is predicted from EGFR mutation. However, the etiology of some tumors is not determined by a single major oncogene: for example, in nearly half of the patients who are positive for the BRAF (V600E) mutation, BRAF inhibitors are nevertheless ineffective. In addition, many drugs still have no clinically usable biomarkers for personalized medicine. There is an urgent need for new methods and technologies that better predict the response (sensitivity or resistance) of cancer patients to drugs. Against this background, the inventors observed that metabolites are the end products of various biological processes within cells, are the final response of organisms to genetic, pathophysiological, and environmental stimuli, and act as collectors and amplifiers of upstream signals from the genome, transcriptome, and proteome. The metabolome is the omics layer closest to the biological phenotype, which gives it advantages over other omics as a marker of drug response. The inventors therefore provide a training method for a drug sensitivity prediction model based on metabolite features.
In order that the above objects, features, and advantages of the application may be more readily understood, the application is described in further detail below with reference to the drawings and specific embodiments.
FIG. 1 is a flowchart of embodiment 1 of the training method for a drug sensitivity prediction model provided by the application. As shown in FIG. 1, the method includes the following steps:
Step S11, obtaining metabolite features of each of a plurality of cell lines and the drug response parameter IC50 of each cell line.
In this embodiment, the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each cell line can be obtained from the CCLE database. For example, if 75 cell lines are required, the metabolite features of each of the 75 cell lines and the drug response parameter IC50 of each of the 75 cell lines are obtained from the CCLE database.
Here, a metabolite feature can be understood as the quantitative value of a metabolite, and the drug response parameter IC50 can be understood as the drug concentration at which the drug response reaches an absolute inhibition of 50%.
Step S12, determining, for each cell line, the drug response class based on the drug response parameter IC50.
In this embodiment, the drug response class of each cell line may be determined against a specific drug response parameter threshold from the prior art. Specifically, the drug response parameter IC50 is compared with that threshold: if the drug response parameter IC50 is greater than the threshold, the drug response class is determined to be insensitive; if the drug response parameter IC50 is less than the threshold, the drug response class is determined to be sensitive.
However, determining the drug response class from a specific prior-art drug response parameter threshold is not very accurate. This embodiment therefore provides another method for determining the drug response class, which may specifically include:
S121, grouping the cell lines that have the same drug response parameter IC50 to obtain cell line groups, and counting the number of cell lines in each cell line group.
For example, suppose the metabolite features and drug response parameters IC50 of 75 cell lines are obtained from the CCLE database, and the drug response parameter IC50 is 1 for cell lines a1-a20, 2 for cell lines a21-a36, 3 for cell lines a37-a50, and 4 for cell lines a51-a75. Then cell lines a1-a20 form cell line group 1, cell lines a21-a36 form cell line group 2, cell lines a37-a50 form cell line group 3, and cell lines a51-a75 form cell line group 4.
S122, searching the drug response parameters IC50 of the cell line groups for a target drug response parameter IC50, such that the difference between the total number of cell lines in the cell line groups whose drug response parameter IC50 is not greater than the target drug response parameter IC50 and the total number of cell lines in the cell line groups whose drug response parameter IC50 is greater than the target drug response parameter IC50 is within a set threshold range.
The search for the target drug response parameter IC50 is illustrated with the grouping from step S121. After cell line groups 1 to 4 are obtained, the drug response parameter IC50 is 1 for cell line group 1, 2 for cell line group 2, 3 for cell line group 3, and 4 for cell line group 4. The cell line groups whose drug response parameter IC50 is not greater than 2 are cell line groups 1 and 2, which together contain 36 cell lines. The cell line groups whose drug response parameter IC50 is greater than 2 are cell line groups 3 and 4, which together contain 39 cell lines. The difference between 39 and 36 is 3, which lies within the set threshold range of 1 to 10, so the drug response parameter IC50 of cell line group 2 is determined to be the target drug response parameter IC50.
S123, taking the target drug response parameter IC50 as a preset drug response parameter IC50 threshold.
S124, judging whether the drug response parameter IC50 is larger than a preset drug response parameter IC50 threshold.
The larger the drug response parameter IC50, the less sensitive the cell line is to the drug; conversely, the smaller the drug response parameter IC50, the more sensitive the cell line is to the drug.
If yes, step S125; if not, step S126 is performed.
S125, determining the drug response type as insensitive.
S126, determining the drug response type as sensitive.
In this embodiment, the cell lines having the same drug response parameter IC50 among the plurality of cell lines are divided into one group to obtain cell line groups, the number of cell lines in each cell line group is counted, the target drug response parameter IC50 is searched for among the drug response parameters IC50 of the cell line groups, and the target drug response parameter IC50 is used as the preset drug response parameter IC50 threshold, so that the numbers of cell lines on the two sides of the preset threshold are balanced as far as possible. Classifying the drug response categories based on this preset threshold therefore keeps the numbers of drug-sensitive and drug-insensitive cell lines balanced, which supports the reliability of the training data.
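Steps S121 to S126 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function names `find_target_ic50` and `classify_response`, the `min_diff` and `max_diff` parameters, and the choice to return the first candidate that satisfies the range are assumptions made for the example. The data reproduce the 75-cell-line example described for step S121.

```python
from collections import Counter


def find_target_ic50(ic50_values, min_diff, max_diff):
    """Return the first IC50 value that splits the cell lines in a balanced way.

    Hypothetical helper: the text only requires the difference to lie within
    a set threshold range; returning the first such candidate is an assumption.
    """
    group_sizes = Counter(ic50_values)  # S121: one group per distinct IC50
    total = len(ic50_values)
    not_greater = 0
    for candidate in sorted(group_sizes):  # S122: scan candidates in ascending order
        not_greater += group_sizes[candidate]
        greater = total - not_greater
        if min_diff <= abs(greater - not_greater) <= max_diff:
            return candidate
    return None  # no candidate satisfies the range


def classify_response(ic50, threshold):
    """S124 to S126: above the threshold is insensitive, otherwise sensitive."""
    return "insensitive" if ic50 > threshold else "sensitive"


# Cell lines a1-a20, a21-a36, a37-a50 and a51-a75 from the example.
ic50_values = [1] * 20 + [2] * 16 + [3] * 14 + [4] * 25
threshold = find_target_ic50(ic50_values, min_diff=1, max_diff=10)  # S123
labels = [classify_response(value, threshold) for value in ic50_values]
```

With these inputs the sketch selects 2 as the threshold, which labels 36 cell lines as sensitive and 39 as insensitive.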
Step S13, sampling a first set number of cell lines from a plurality of cell lines as training cell lines for each constructed cancer cell line drug sensitivity prediction model, and executing a plurality of important feature screening processes for each training cell line, wherein each important feature screening process comprises the following steps: inputting the metabolite characteristics of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important characteristic set output by the cancer cell line drug sensitivity prediction model.
In this example, a plurality of different cancer cell line drug sensitivity predictive models can be constructed.
A plurality of different cancer cell line drug sensitivity prediction models may include: at least any two of a cancer cell line drug sensitivity prediction model based on ExtraTreesClassifier algorithm, a cancer cell line drug sensitivity prediction model based on GaussianProcessClassifier algorithm, a cancer cell line drug sensitivity prediction model based on NuSVC algorithm, a cancer cell line drug sensitivity prediction model based on RidgeClassifierCV algorithm, a cancer cell line drug sensitivity prediction model based on GaussianNB algorithm, a cancer cell line drug sensitivity prediction model based on RandomForestClassifier algorithm and a cancer cell line drug sensitivity prediction model based on XGBClassifier algorithm.
In this embodiment, the first set number of cell lines may be sampled from the plurality of cell lines based on a modified bootstrapping sampling algorithm. Specifically, a sampling mode without replacement is adopted, a first set number of cell lines are obtained by sampling from a plurality of cell lines, and each sampled cell line is different. For example, one cell line is sampled from 75 cell lines, then one cell line is sampled from the remaining 74 cell lines, …, and one cell line is sampled from the remaining (75-i) cell lines until a first set number of cell lines are sampled.
Wherein the first set number is less than the total number of the plurality of cell lines.
Sampling without replacement ensures that no cell line is sampled more than once, which improves the diversity of the training data.
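The sampling without replacement described above can be sketched with the Python standard library. The function name `sample_training_cell_lines`, the `seed` parameter and the value 50 for the first set number are assumptions for illustration only.

```python
import random


def sample_training_cell_lines(cell_lines, first_set_number, seed=None):
    """Draw `first_set_number` distinct cell lines (sampling without replacement)."""
    if not 0 < first_set_number < len(cell_lines):
        # The first set number must be smaller than the total number of cell lines.
        raise ValueError("first_set_number must be in (0, len(cell_lines))")
    rng = random.Random(seed)
    # random.sample never returns the same population element twice.
    return rng.sample(cell_lines, first_set_number)


cell_lines = [f"a{i}" for i in range(1, 76)]  # 75 cell lines, as in the example
training_cell_lines = sample_training_cell_lines(cell_lines, 50, seed=0)
```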
The cancer cell line drug sensitivity prediction model can be understood as a machine learning model for predicting whether a cell line is sensitive to a drug. The cancer cell line drug sensitivity prediction model can evaluate the importance of the metabolite features to obtain an importance index value for each metabolite feature. The importance index value of a metabolite feature characterizes the importance of that feature in the prediction process. The higher the importance index value of a metabolite feature, the greater the influence of that feature on the prediction result.
The important feature set is obtained as follows: the importance index values of the metabolite features of the training cell line are sorted from large to small, and the metabolite features corresponding to the first to the n-th importance index values in the sorting result form the important feature set, where n is smaller than the total number of metabolite features of the training cell line.
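The construction of an important feature set from importance index values can be sketched as follows. The function name `important_feature_set`, the metabolite names and the importance values are hypothetical and serve only to illustrate the sorting and truncation.

```python
def important_feature_set(importance_by_feature, n):
    """Return the n features with the largest importance index values."""
    if not 0 < n < len(importance_by_feature):
        # n must be smaller than the total number of metabolite features.
        raise ValueError("n must be in (0, number of features)")
    # Sort feature names by importance index value, from large to small.
    ranked = sorted(importance_by_feature, key=importance_by_feature.get, reverse=True)
    return ranked[:n]


# Hypothetical importance index values for four metabolite features.
importances = {
    "metabolite_1": 0.10,
    "metabolite_2": 0.45,
    "metabolite_3": 0.05,
    "metabolite_4": 0.40,
}
top_features = important_feature_set(importances, n=2)
```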
In this embodiment, inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the cancer cell line drug sensitivity prediction model may include:
S131, normalizing each metabolite characteristic of the training cell line by using a normalization relation (x-min_x)/(max_x-min_x) to obtain normalized metabolite characteristics.
The x represents the content of the metabolite features in the cell lines, min_x is the minimum value of the content of the metabolite features in the plurality of the cell lines, and max_x is the maximum value of the content of the metabolite features in the plurality of the cell lines.
S132, inputting the normalized metabolite characteristics of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important characteristic set output by the cancer cell line drug sensitivity prediction model.
In this embodiment, each metabolite feature of the training cell line is normalized by using a normalization relation (x-min_x)/(max_x-min_x), so as to obtain normalized metabolite features, which can improve the operation speed and the efficiency of outputting an important feature set by using a cancer cell line drug sensitivity prediction model.
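The normalization relation (x-min_x)/(max_x-min_x) of step S131 can be sketched for a single metabolite feature as follows. The function name `min_max_normalize` and the sample contents are hypothetical; mapping a constant feature to 0.0 when max_x equals min_x is an assumption, since the text does not address that case.

```python
def min_max_normalize(contents):
    """Normalize the contents of one metabolite feature across cell lines."""
    min_x = min(contents)
    max_x = max(contents)
    if max_x == min_x:
        # Assumption: a constant feature is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in contents]
    return [(x - min_x) / (max_x - min_x) for x in contents]


# Hypothetical contents of one metabolite feature in four cell lines.
contents = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(contents)
```

Here min_x is 2.0 and max_x is 10.0, so the normalized values are 0.0, 0.25, 0.5 and 1.0.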
In this embodiment, inputting the metabolite characteristics of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the cancer cell line drug sensitivity prediction model may also include:
S133, performing quality control and cleaning on the metabolite characteristics of the training cell line to obtain preprocessed metabolite characteristics, and normalizing each preprocessed metabolite characteristic of the training cell line by using a normalization relation (x-min_x)/(max_x-min_x) to obtain normalized metabolite characteristics.
The x represents the content of the metabolite features in the cell lines, min_x is the minimum value of the content of the metabolite features in the plurality of the cell lines, and max_x is the maximum value of the content of the metabolite features in the plurality of the cell lines.
S132, inputting the normalized metabolite characteristics of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important characteristic set output by the cancer cell line drug sensitivity prediction model.
In this embodiment, the metabolite features are more reliable by performing quality control and cleaning on the metabolite features, so that the normalization efficiency and the reliability of training data are improved.
Step S14, counting the occurrence times of the metabolite characteristics in the important characteristic sets output by the drug sensitivity prediction model of each cancer cell line for each metabolite characteristic of each training cell line as the selected times.
The counting, for each metabolite feature of each training cell line, of the number of times the metabolite feature appears in the important feature sets output multiple times by each cancer cell line drug sensitivity prediction model is now described by example. Suppose there are 80 cell lines in total, of which 60 are extracted as a training set. The non-replacement bootstrap algorithm is then used to extract 50 cell lines from the training set of 60 cell lines, 100 times in total. (1) In each round, model training and feature screening are performed on the extracted cell lines using 5 different RFECV (recursive feature elimination with cross-validation) algorithms. (2) A total of 100 rounds are performed. (3) The number of times each metabolite feature is selected over the 100 rounds is counted separately for each of the five algorithms.
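The counting in step S14 can be sketched as follows, assuming the important feature sets of every round have already been collected per algorithm. The function name `count_selected_times`, the algorithm labels, the metabolite names and the three-round data are hypothetical; the example in the text uses 5 algorithms and 100 rounds.

```python
from collections import Counter


def count_selected_times(feature_sets_by_algorithm):
    """Count, per algorithm, how many rounds selected each metabolite feature."""
    selected_times = {}
    for algorithm, feature_sets in feature_sets_by_algorithm.items():
        tally = Counter()
        for feature_set in feature_sets:
            tally.update(set(feature_set))  # a feature counts once per round
        selected_times[algorithm] = tally
    return selected_times


# Hypothetical important feature sets from three rounds of two algorithms.
feature_sets_by_algorithm = {
    "algorithm_1": [{"m1", "m2"}, {"m1", "m3"}, {"m1", "m2"}],
    "algorithm_2": [{"m2", "m3"}, {"m2"}, {"m1", "m2"}],
}
selected_times = count_selected_times(feature_sets_by_algorithm)
```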
Step S15, regarding each metabolite characteristic of each training cell line, setting the maximum value of the selected times of the metabolite characteristics as a target time.
Still referring to the example in step S14, taking the maximum value among the plurality of selected times of a metabolite feature as its target number of times is described as follows. If the selected times y1 of the metabolite feature c1 of the training cell line b1 is greater than the selected times y2, y1 is taken as the target number of times of the metabolite feature c1 of the training cell line b1. If the selected times y3 of the metabolite feature c2 of the training cell line b2 is smaller than the selected times y4, y4 is taken as the target number of times of the metabolite feature c2 of the training cell line b2. If the selected times y5 of the metabolite feature c3 of the training cell line b3 is greater than the selected times y6, y5 is taken as the target number of times of the metabolite feature c3 of the training cell line b3. If the selected times y7 of the metabolite feature c4 of the training cell line b4 is smaller than the selected times y8, y8 is taken as the target number of times of the metabolite feature c4 of the training cell line b4. If the selected times y9 of the metabolite feature c5 of the training cell line b5 is smaller than the selected times y10, y10 is taken as the target number of times of the metabolite feature c5 of the training cell line b5.
Step S16, sorting the multiple target times from large to small to obtain a target sorting result, and taking metabolite features corresponding to the first to the m-th target times in the target sorting result as metabolite features to be used.
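Steps S15 and S16 can be sketched as follows: the target number of times of a feature is the maximum of its selected times across the algorithms, and the m features with the largest target times are kept. The function name `select_features_to_use`, the sample counts and the tie-break by feature name are assumptions; the text does not specify how ties are ordered.

```python
def select_features_to_use(selected_times_by_algorithm, m):
    """Return (target_times, top-m features ranked by target times)."""
    target_times = {}
    for counts in selected_times_by_algorithm.values():
        for feature, times in counts.items():
            # S15: keep the maximum selected times seen for this feature.
            target_times[feature] = max(times, target_times.get(feature, 0))
    # S16: sort from large to small; ties are broken by name (an assumption).
    ranked = sorted(target_times, key=lambda f: (-target_times[f], f))
    return target_times, ranked[:m]


# Hypothetical selected times from two algorithms.
selected_times_by_algorithm = {
    "algorithm_1": {"m1": 3, "m2": 2, "m3": 1},
    "algorithm_2": {"m1": 1, "m2": 3, "m3": 1},
}
target_times, features_to_use = select_features_to_use(selected_times_by_algorithm, m=2)
```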
Step S17, training a cancer cell line drug sensitivity prediction model to be trained by using the metabolite features to be used and the drug response category of the cell line to which the metabolite features to be used belong.
The cancer cell line drug sensitivity prediction model to be trained may be one of the plurality of cancer cell line drug sensitivity prediction models constructed in step S13. Of course, the cancer cell line drug sensitivity prediction model to be trained may also be a model obtained by combining several of the plurality of cancer cell line drug sensitivity prediction models constructed in step S13.
In the case where the cancer cell line drug sensitivity prediction model to be trained is a model obtained by combining several of the plurality of cancer cell line drug sensitivity prediction models constructed in step S13, when it is used for prediction, each prediction model it contains can be used to predict the target data to obtain a plurality of prediction results, and the final prediction result is then determined by voting. For example, if the cancer cell line drug sensitivity prediction model to be trained contains 7 prediction models, the 7 prediction models each predict the target data to give 7 prediction results; if the number of sensitive results among them is greater than the number of insensitive results, the final prediction result is determined to be sensitive.
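The voting step for a combined model can be sketched as follows. The function name `vote` and the seven sample predictions are hypothetical; resolving a tie as insensitive is an assumption, and a tie cannot occur with an odd number of models such as 7.

```python
from collections import Counter


def vote(predictions):
    """Return the final drug response category by majority vote."""
    tally = Counter(predictions)
    # Assumption: a tie, possible only with an even number of models,
    # is resolved as insensitive.
    if tally["sensitive"] > tally["insensitive"]:
        return "sensitive"
    return "insensitive"


# Hypothetical outputs of 7 prediction models for one cell line.
predictions = [
    "sensitive", "insensitive", "sensitive", "sensitive",
    "insensitive", "sensitive", "insensitive",
]
final_prediction = vote(predictions)
```

In this sample, four models vote sensitive and three vote insensitive, so the final prediction is sensitive.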
According to the application, a first set number of cell lines are sampled from a plurality of cell lines as training cell lines, and the important feature screening process is executed multiple times on each training cell line to obtain important feature sets. The number of times each metabolite feature appears in the important feature sets output multiple times by each cancer cell line drug sensitivity prediction model is counted as its selected times, and the metabolite features to be used are selected based on the selected times. This ensures that the metabolite features to be used are of higher importance and are selected more often, which improves their effectiveness. On this basis, the effectiveness of training the prediction model is ensured, and predicting from the metabolite features of a cell line with the trained cancer cell line drug sensitivity prediction model can improve the accuracy of prediction.
And because the metabolite characteristics corresponding to the cell line are the final products of various biological processes in the cell, and are the final reactions of organisms to genetic, pathophysiological and environmental stimuli, the metabolite characteristics corresponding to the cell line are utilized to train the cancer cell line drug sensitivity prediction model, so that the training precision of the cancer cell line drug sensitivity prediction model can be improved, and the prediction accuracy of the trained cancer cell line drug sensitivity prediction model is ensured.
As another alternative embodiment of the present application, fig. 2 is a flow chart of embodiment 2 of the training method of the drug sensitivity prediction model provided by the present application. This embodiment is mainly an extension of the training method of the drug sensitivity prediction model described in embodiment 1 above. As shown in fig. 2, the method may include, but is not limited to, the following steps:
Step S21, obtaining metabolite characteristics of each cell line in the plurality of cell lines and drug response parameters IC50 of each cell line.
Step S22, determining the drug response category based on the drug response parameter IC50 for each cell line.
Step S23, sampling a first set number of cell lines from a plurality of cell lines for each constructed cancer cell line drug sensitivity prediction model as training cell lines, and executing a plurality of important feature screening processes for each training cell line.
Each time the important feature screening process includes: inputting the metabolite characteristics of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important characteristic set output by the cancer cell line drug sensitivity prediction model.
Step S24, counting the occurrence times of the metabolite characteristics in the important characteristic sets output by the drug sensitivity prediction model of each cancer cell line for each metabolite characteristic of each training cell line as the selected times.
Step S25, regarding each metabolite characteristic of each training cell line, setting the maximum value of the selected times of the metabolite characteristics as a target time.
Step S26, sorting the multiple target times from large to small to obtain a target sorting result, and taking metabolite features corresponding to the first to the m-th target times in the target sorting result as metabolite features to be used.
Step S27, training a cancer cell line drug sensitivity prediction model to be trained by using the metabolite features to be used and the drug response category of the cell line to which the metabolite features to be used belong.
The detailed process of steps S21-S27 can be referred to in the description of steps S11-S17, and will not be described herein.
Step S28, sampling a second set number of cell lines from the plurality of cell lines to serve as test cell lines, and predicting each metabolite feature of each test cell line multiple times by using the cancer cell line drug sensitivity prediction model to be trained, to obtain prediction results.
The detailed process of sampling the second set number of cell lines from the plurality of cell lines may refer to the description of sampling the first set number of cell lines from the plurality of cell lines in step S23, and is not repeated here.
Step S29, evaluating each prediction result to obtain an evaluation result.
Evaluating each of the prediction results to obtain an evaluation result may include: comparing whether the prediction result is consistent with the drug response category with which the metabolite features of the test cell line are labeled, to obtain a comparison result, and taking the comparison result as the evaluation result.
Step S210, judging, based on a plurality of the evaluation results, whether the cancer cell line drug sensitivity prediction model to be trained meets the set requirement.
Judging, based on a plurality of the evaluation results, whether the cancer cell line drug sensitivity prediction model to be trained meets the set requirement may include:
Counting, among the plurality of evaluation results, the number of evaluation results that indicate a correct prediction, and judging whether this number reaches the set number.
If the number reaches the set number, the cancer cell line drug sensitivity prediction model to be trained meets the set requirement.
If an evaluation result shows that the prediction result is consistent with the drug response category with which the metabolite features of the test cell line are labeled, the evaluation result indicates that the cancer cell line drug sensitivity prediction model to be trained predicted correctly.
If yes, go to step S211; if not, the process returns to step S21.
Step S211, training is ended.
In this embodiment, test cell lines are used to evaluate the cancer cell line drug sensitivity prediction model to be trained, and when the set requirement is not met, training of the model continues, which improves the training precision and supports the accuracy of the model once training has ended.
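The evaluation in steps S29 and S210 can be sketched as follows. The function name `meets_set_requirement`, the sample predictions and labels, and the set number of 3 are hypothetical; the text does not fix a particular value for the set number.

```python
def meets_set_requirement(predictions, labels, set_number):
    """Return True if the count of correct predictions reaches the set number."""
    # S29: a prediction is correct when it matches the labeled drug response category.
    correct = sum(1 for predicted, label in zip(predictions, labels) if predicted == label)
    # S210: the model meets the requirement when enough predictions are correct.
    return correct >= set_number


# Hypothetical predictions and labeled categories for four test cell lines.
predictions = ["sensitive", "insensitive", "sensitive", "insensitive"]
labels = ["sensitive", "insensitive", "insensitive", "insensitive"]
training_finished = meets_set_requirement(predictions, labels, set_number=3)
```

In this sample, three of the four predictions match the labels, so the set number of 3 is reached and training would end (step S211).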
As another alternative embodiment of the present application, fig. 3 is a flow chart of embodiment 3 of the training method of the drug sensitivity prediction model provided by the present application. This embodiment is mainly an extension of the training method of the drug sensitivity prediction model described in embodiment 2 above. As shown in fig. 3, the method may include, but is not limited to, the following steps:
Step S31, obtaining metabolite characteristics of each cell line in the plurality of cell lines and drug response parameters IC50 of each cell line.
Step S32, determining the drug response category based on the drug response parameter IC50 for each cell line.
Step S33, sampling a first set number of cell lines from a plurality of cell lines for each constructed cancer cell line drug sensitivity prediction model as training cell lines, and executing a plurality of important feature screening processes for each training cell line.
Each time the important feature screening process includes: inputting the metabolite characteristics of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important characteristic set output by the cancer cell line drug sensitivity prediction model.
Step S34, counting the occurrence times of the metabolite characteristics in the important characteristic sets outputted by the drug sensitivity prediction model of each cancer cell line for each metabolite characteristic of each training cell line as the selected times.
Step S35, regarding each metabolite characteristic of each training cell line, setting the maximum value of the selected times of the metabolite characteristics as a target time.
Step S36, sorting a plurality of target times from large to small to obtain a target sorting result, and taking metabolite features corresponding to the first to the m-th target times in the target sorting result as metabolite features to be used.
Step S37, training a cancer cell line drug sensitivity prediction model to be trained by using the metabolite features to be used and the drug response category of the cell line to which the metabolite features to be used belong.
Step S38, sampling a second set number of cell lines from a plurality of cell lines to serve as test cell lines, and predicting each metabolite characteristic of each test cell line for multiple times by using the cancer cell line drug sensitivity prediction model to be trained to obtain a prediction result.
The detailed procedure of steps S31-S38 can be referred to in the related description of steps S21-S28 in embodiment 2, and will not be described herein.
Step S39, judging whether an abnormal cell line exists in a plurality of training cell lines based on a plurality of prediction results of each metabolite characteristic.
Based on a plurality of said predictions for each of said metabolite characteristics, determining the presence or absence of an abnormal cell line in a plurality of said training cell lines can be understood as:
Judging whether a preset number of prediction results with wrong prediction exists in a plurality of prediction results of each metabolite characteristic;
If present, this cell line is an abnormal cell line.
If the predicted result does not match the drug response class of the metabolite signature, a prediction error is indicated.
If yes, go to step S310; if not, step S311 is performed.
Step S310, eliminating the abnormal cell lines from the plurality of training cell lines, taking the cell lines that remain after elimination as the training cell lines, and returning to the step of executing the important feature screening process multiple times on each training cell line.
Step S311, evaluating each prediction result to obtain an evaluation result;
Step S312, judging, based on the evaluation result, whether the cancer cell line drug sensitivity prediction model to be trained meets the set requirement.
If yes, go to step S313; if not, the process returns to step S31.
Step S313, the training is ended.
In this embodiment, whether an abnormal cell line exists among the plurality of training cell lines is judged based on the plurality of prediction results of each metabolite feature, and if so, the abnormal cell lines are eliminated from the plurality of training cell lines, so that the training data are more accurate and the accuracy of training the cancer cell line drug sensitivity prediction model to be trained is improved.
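The abnormal cell line check of steps S39 and S310 can be sketched as follows. This is a simplified illustration under stated assumptions: predictions are keyed per cell line rather than per metabolite feature, "a preset number of wrong predictions exists" is read as "at least the preset number", and the function name `find_abnormal_cell_lines` and all sample data are hypothetical.

```python
def find_abnormal_cell_lines(predictions_by_cell_line, label_by_cell_line, preset_number):
    """Return the cell lines whose wrong predictions reach the preset number."""
    abnormal = []
    for cell_line, predictions in predictions_by_cell_line.items():
        label = label_by_cell_line[cell_line]
        # A prediction is wrong when it differs from the labeled response category.
        wrong = sum(1 for predicted in predictions if predicted != label)
        if wrong >= preset_number:  # assumption: "at least" the preset number
            abnormal.append(cell_line)
    return abnormal


# Hypothetical repeated predictions and labels for two cell lines.
predictions_by_cell_line = {
    "b1": ["sensitive", "sensitive", "sensitive"],
    "b2": ["sensitive", "sensitive", "insensitive"],
}
label_by_cell_line = {"b1": "sensitive", "b2": "insensitive"}
abnormal = find_abnormal_cell_lines(predictions_by_cell_line, label_by_cell_line, preset_number=2)
# S310: keep only the cell lines that were not flagged as abnormal.
remaining = [c for c in predictions_by_cell_line if c not in abnormal]
```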
In another embodiment of the present application, a method for predicting drug sensitivity is provided, please refer to fig. 4, which includes:
Step S41, obtaining metabolite features of the cell line to be processed.
Step S42, calling a cancer cell line drug sensitivity prediction model to process the metabolite features of the cell line to be processed, so as to obtain a drug response category.
The cancer cell line drug sensitivity prediction model is trained based on the training method of the drug sensitivity prediction model described in any one of embodiments 1-3.
In this embodiment, prediction is performed with a model trained by the training method of the drug sensitivity prediction model described in the foregoing embodiments, which can improve the accuracy of the prediction result.
The training device of the drug sensitivity prediction model provided by the application is described next. The training device described below and the training method of the drug sensitivity prediction model described above may be referred to in correspondence with each other.
Referring to fig. 5, the training device for the drug sensitivity prediction model includes: the system comprises an acquisition module 100, a first determination module 200, an important feature screening module 300, a statistics module 400, a second determination module 500, a third determination module 600 and a training module 700.
An acquisition module 100 for acquiring metabolite characteristics of each of a plurality of cell lines and a drug response parameter IC50 of each of the cell lines;
a first determination module 200 for determining, for each of the cell lines, a drug response class based on the drug response parameter IC50;
the important feature screening module 300 is configured to sample a first set number of cell lines from a plurality of cell lines for each constructed cancer cell line drug sensitivity prediction model, and perform a plurality of important feature screening processes on each of the training cell lines as a training cell line;
each time the important feature screening process includes: inputting the metabolite characteristics of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important characteristic set output by the cancer cell line drug sensitivity prediction model;
a statistics module 400, configured to count, for each metabolite feature of each of the training cell lines, the number of occurrences of the metabolite feature in the set of important features that are outputted multiple times by each of the cancer cell line drug sensitivity prediction models, as the selected number;
a second determination module 500, configured to, for each metabolite feature of each of the training cell lines, take the maximum value among the plurality of selected times of the metabolite feature as a target number of times;
a third determining module 600, configured to rank the multiple target times from large to small to obtain a target ranking result, and use metabolite features corresponding to the first to the m-th target times in the target ranking result as metabolite features to be used;
The training module 700 is configured to train the cancer cell line drug sensitivity prediction model to be trained by using the metabolite feature to be used and the drug response class of the cell line to which the metabolite feature to be used belongs.
In this embodiment, the important feature screening module 300 may be specifically configured to: normalizing each metabolite characteristic of the training cell line by using a normalization relation (x-min_x)/(max_x-min_x) to obtain normalized metabolite characteristics;
said x represents the content of said metabolite features in said cell lines, min_x is the minimum value of the content of said metabolite features in a plurality of said cell lines, max_x is the maximum value of the content of said metabolite features in a plurality of said cell lines;
inputting the normalized metabolite characteristics of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important characteristic set output by the cancer cell line drug sensitivity prediction model.
In this embodiment, the training device of the drug sensitivity prediction model may further include:
The test module is used for evaluating each prediction result to obtain an evaluation result;
Judging whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement or not based on a plurality of evaluation results;
If yes, finishing training;
If not, returning to the step of obtaining the metabolite characteristics of each of the plurality of cell lines and the drug response parameters IC50 of each of the cell lines.
In this embodiment, the test module may be further configured to:
Before evaluating each of the predicted results to obtain an evaluation result, determining whether an abnormal cell line exists in a plurality of the training cell lines based on a plurality of the predicted results for each of the metabolite features in each of the test cell lines;
If the abnormal cell lines exist, eliminating the abnormal cell lines in the plurality of training cell lines, taking the cell lines with the abnormal cell lines eliminated as training cell lines, and returning to execute the step of executing a plurality of important feature screening processes on each training cell line;
If no abnormal cell line exists, evaluating each prediction result to obtain an evaluation result;
judging whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement or not based on the evaluation result;
If yes, finishing training;
If not, returning to the step of obtaining the metabolite characteristics of each of the plurality of cell lines and the drug response parameters IC50 of each of the cell lines.
In this embodiment, the first determining module 200 may specifically be configured to:
Dividing the cell lines with the same IC50 of the drug response parameters in a plurality of cell lines into a group to obtain cell line groups, and counting the number of the cell lines in each cell line group;
Searching a target drug response parameter IC50 in the drug response parameters IC50 of a plurality of cell line groups, wherein the difference between the sum of the numbers of cell lines in the cell line groups to which the drug response parameter IC50 not greater than the target drug response parameter IC50 belongs and the sum of the numbers of cell lines in the cell line groups to which the drug response parameter IC50 larger than the target drug response parameter IC50 belongs is within a set threshold value range;
taking the target drug response parameter IC50 as a preset drug response parameter IC50 threshold;
judging whether the drug response parameter IC50 is larger than a preset drug response parameter IC50 threshold;
if yes, determining the drug response type as insensitive;
If not, the drug response class is determined to be sensitive.
In another embodiment of the present application, a drug susceptibility prediction apparatus is provided, please refer to fig. 6, the drug susceptibility prediction apparatus includes: an acquisition module 800 and a call module 900.
The acquisition module is used for acquiring metabolite characteristics of the cell line to be processed;
The calling module is used for calling a cancer cell line drug sensitivity prediction model and processing the metabolite characteristics of the cell line to be processed so as to obtain a drug response type;
the cancer cell line drug sensitivity prediction model is trained based on the training method of the drug sensitivity prediction model described in any one of embodiments 1-3.
It should be noted that, in each embodiment, the differences from the other embodiments are emphasized, and the same similar parts between the embodiments are referred to each other. For the apparatus class embodiments, the description is relatively simple as it is substantially similar to the method embodiments, and reference is made to the description of the method embodiments for relevant points.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above apparatuses are described with their functions divided into separate units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
The training method, the prediction method, and the related apparatuses of the drug sensitivity prediction model provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the application, and the description of the embodiments is intended only to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.
Claims (12)
1. A method of training a drug sensitivity prediction model, comprising:
obtaining metabolite features of each of a plurality of cell lines and a drug response parameter IC50 of each of the cell lines;
determining, for each of the cell lines, a drug response class based on the drug response parameter IC50;
for each constructed cancer cell line drug sensitivity prediction model, sampling a first set number of cell lines from the plurality of cell lines as training cell lines, and executing an important feature screening process a plurality of times on each training cell line;
each important feature screening process comprising: inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important feature set output by the cancer cell line drug sensitivity prediction model;
for each metabolite feature of each training cell line, counting the number of times the metabolite feature appears in the important feature sets output over the plurality of times by each cancer cell line drug sensitivity prediction model, as a selected number of times;
for each metabolite feature of each training cell line, taking the maximum of the plurality of selected numbers of times of the metabolite feature as a target number of times;
sorting the plurality of target numbers of times in descending order to obtain a target sorting result, and taking the metabolite features corresponding to the first through m-th target numbers of times in the target sorting result as metabolite features to be used;
and training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which the metabolite features to be used belong.
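The counting and ranking steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `select_features_to_use`, the input layout (a dictionary mapping a hypothetical model name to the list of important feature sets it output over repeated screening runs), and the alphabetical tie-break are all assumptions made for illustration.

```python
from collections import Counter


def select_features_to_use(important_sets_by_model, m):
    """Rank metabolite features by their target number of times and keep the top m.

    important_sets_by_model: dict mapping a (hypothetical) model name to a list of
    important feature sets, one set per screening run of that model.
    """
    target_counts = {}
    for runs in important_sets_by_model.values():
        # Selected number of times: how often each feature appears in this
        # model's important feature sets across its screening runs.
        selected_counts = Counter()
        for important_set in runs:
            selected_counts.update(set(important_set))
        # Target number of times: the maximum selected count over all models.
        for feature, count in selected_counts.items():
            target_counts[feature] = max(target_counts.get(feature, 0), count)
    # Sort in descending order of target count; ties are broken alphabetically
    # (an assumption, the source does not say how ties are handled).
    ranked = sorted(target_counts, key=lambda f: (-target_counts[f], f))
    return ranked[:m], target_counts
```

The value of m and the number of screening runs per model are left to the caller, since the claim does not fix them.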
2. The method of claim 1, wherein inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the cancer cell line drug sensitivity prediction model comprises:
normalizing each metabolite feature of the training cell line using the normalization relation (x - min_x) / (max_x - min_x) to obtain normalized metabolite features;
wherein x represents the content of the metabolite feature in the cell line, min_x is the minimum content of the metabolite feature across the plurality of cell lines, and max_x is the maximum content of the metabolite feature across the plurality of cell lines;
inputting the normalized metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the cancer cell line drug sensitivity prediction model.
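The normalization relation in claim 2 is ordinary min-max scaling. A minimal sketch, applied to one metabolite feature across all cell lines, is shown below. The function name `min_max_normalize` is hypothetical, and mapping a constant feature (max_x equal to min_x) to 0.0 is an assumption, since the claim does not address that case.

```python
def min_max_normalize(values):
    """Scale one metabolite feature across cell lines with (x - min_x) / (max_x - min_x)."""
    min_x, max_x = min(values), max(values)
    if max_x == min_x:
        # Assumption: a feature with identical content in every cell line
        # carries no information, so it is mapped to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(x - min_x) / (max_x - min_x) for x in values]
```

After scaling, every feature lies in the interval [0, 1], so metabolites measured on very different scales contribute comparably to the model.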
3. The method of claim 1, further comprising, after training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which the metabolite features to be used belong:
sampling a second set number of cell lines from the plurality of cell lines as test cell lines, and predicting each metabolite feature of each test cell line a plurality of times using the cancer cell line drug sensitivity prediction model to be trained, to obtain prediction results;
evaluating each prediction result to obtain an evaluation result;
judging, based on the plurality of evaluation results, whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement;
if yes, finishing training;
if not, returning to the step of obtaining the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each of the cell lines.
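The test-and-evaluate loop of claim 3 might be sketched as follows. The claim does not name an evaluation metric or the form of the set requirement, so this sketch assumes accuracy as the metric and a minimum mean accuracy as the requirement. The names `evaluate_model`, `predict`, and `min_mean_accuracy` are hypothetical.

```python
import random


def accuracy(predicted, actual):
    """Fraction of test cell lines whose predicted class matches the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def evaluate_model(predict, features, labels, n_test, n_rounds, min_mean_accuracy, seed=0):
    """Sample test cell lines, predict them several times, and check a set requirement.

    predict: callable mapping one cell line's feature vector to a drug response class.
    Returns (passed, scores), where scores holds one evaluation result per round.
    """
    rng = random.Random(seed)
    # Sample the second set number of cell lines to act as test cell lines.
    test_idx = rng.sample(range(len(labels)), n_test)
    actual = [labels[i] for i in test_idx]
    scores = []
    for _ in range(n_rounds):
        # Each round yields one prediction result, which is then evaluated.
        predicted = [predict(features[i]) for i in test_idx]
        scores.append(accuracy(predicted, actual))
    # Assumed requirement: the mean evaluation result reaches a chosen minimum.
    passed = sum(scores) / len(scores) >= min_mean_accuracy
    return passed, scores
```

When `passed` is false, the caller would return to the data acquisition step, as the claim describes; that outer loop is omitted here.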
4. The method of claim 3, further comprising, before evaluating each prediction result to obtain an evaluation result:
determining whether an abnormal cell line is present among the plurality of training cell lines, based on the plurality of prediction results for each metabolite feature of each test cell line;
if an abnormal cell line is present, removing the abnormal cell line from the plurality of training cell lines, taking the remaining cell lines as the training cell lines, and returning to the step of executing the important feature screening process a plurality of times on each training cell line;
if no abnormal cell line is present, evaluating each prediction result to obtain an evaluation result;
judging, based on the evaluation result, whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement;
if yes, finishing training;
if not, returning to the step of obtaining the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each of the cell lines.
5. The method of claim 1, wherein determining the drug response class based on the drug response parameter IC50 comprises:
grouping the cell lines of the plurality of cell lines that have the same drug response parameter IC50 to obtain cell line groups, and counting the number of cell lines in each cell line group;
searching the drug response parameters IC50 of the plurality of cell line groups for a target drug response parameter IC50, such that the difference between (i) the total number of cell lines in the cell line groups whose drug response parameter IC50 is smaller than the target drug response parameter IC50 and (ii) the total number of cell lines in the cell line groups whose drug response parameter IC50 is larger than the target drug response parameter IC50 is within a set threshold range;
taking the target drug response parameter IC50 as a preset drug response parameter IC50 threshold;
judging whether the drug response parameter IC50 is larger than the preset drug response parameter IC50 threshold;
if yes, determining the drug response class to be insensitive;
if not, determining the drug response class to be sensitive.
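The threshold search of claim 5 amounts to finding an IC50 value that splits the cell lines into roughly balanced halves. The sketch below is one possible reading, not the patented implementation. It assumes that "within a set threshold range" means the absolute difference between the two counts is at most `max_imbalance`, and that the smallest qualifying IC50 value is taken. The function names are hypothetical.

```python
from collections import Counter


def find_ic50_threshold(ic50_values, max_imbalance):
    """Return the smallest IC50 whose lower and upper cell line counts are balanced.

    Returns None when no IC50 value satisfies the (assumed) balance criterion.
    """
    # Group cell lines with identical IC50 and count each group.
    groups = Counter(ic50_values)
    total = len(ic50_values)
    below = 0  # cell lines with IC50 strictly smaller than the candidate
    for candidate in sorted(groups):
        above = total - below - groups[candidate]  # strictly larger IC50
        if abs(below - above) <= max_imbalance:
            return candidate
        below += groups[candidate]
    return None


def drug_response_class(ic50, threshold):
    """IC50 above the threshold is insensitive; otherwise sensitive."""
    return "insensitive" if ic50 > threshold else "sensitive"
```

Choosing a threshold near the median in this way keeps the sensitive and insensitive classes close in size, which helps avoid a heavily imbalanced training set.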
6. A method of predicting drug sensitivity, comprising:
obtaining metabolite features of a cell line to be processed;
invoking a cancer cell line drug sensitivity prediction model to process the metabolite features of the cell line to be processed, to obtain a drug response class;
wherein the cancer cell line drug sensitivity prediction model is trained based on the training method of the drug sensitivity prediction model according to any one of claims 1-5.
7. A training apparatus for a drug sensitivity prediction model, comprising:
an acquisition module for acquiring metabolite features of each of a plurality of cell lines and a drug response parameter IC50 of each of the cell lines;
a first determination module for determining, for each of the cell lines, a drug response class based on the drug response parameter IC50;
an important feature screening module for, for each constructed cancer cell line drug sensitivity prediction model, sampling a first set number of cell lines from the plurality of cell lines as training cell lines, and executing an important feature screening process a plurality of times on each training cell line;
each important feature screening process comprising: inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important feature set output by the cancer cell line drug sensitivity prediction model;
a statistics module for counting, for each metabolite feature of each training cell line, the number of times the metabolite feature appears in the important feature sets output over the plurality of times by each cancer cell line drug sensitivity prediction model, as a selected number of times;
a second determination module for, for each metabolite feature of each training cell line, taking the maximum of the plurality of selected numbers of times of the metabolite feature as a target number of times;
a third determination module for sorting the plurality of target numbers of times in descending order to obtain a target sorting result, and taking the metabolite features corresponding to the first through m-th target numbers of times in the target sorting result as metabolite features to be used;
a training module for training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which the metabolite features to be used belong.
8. The apparatus of claim 7, wherein the important feature screening module is specifically configured for: normalizing each metabolite feature of the training cell line using the normalization relation (x - min_x) / (max_x - min_x) to obtain normalized metabolite features;
wherein x represents the content of the metabolite feature in the cell line, min_x is the minimum content of the metabolite feature across the plurality of cell lines, and max_x is the maximum content of the metabolite feature across the plurality of cell lines;
inputting the normalized metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the cancer cell line drug sensitivity prediction model.
9. The apparatus of claim 7, further comprising:
a test module for sampling a second set number of cell lines from the plurality of cell lines as test cell lines, and predicting each metabolite feature of each test cell line a plurality of times using the cancer cell line drug sensitivity prediction model to be trained, to obtain prediction results;
evaluating each prediction result to obtain an evaluation result;
judging, based on the plurality of evaluation results, whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement;
if yes, finishing training;
if not, returning to the step of obtaining the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each of the cell lines.
10. The apparatus of claim 9, wherein the test module is further configured for:
before evaluating each prediction result to obtain an evaluation result, determining whether an abnormal cell line is present among the plurality of training cell lines, based on the plurality of prediction results for each metabolite feature of each test cell line;
if an abnormal cell line is present, removing the abnormal cell line from the plurality of training cell lines, taking the remaining cell lines as the training cell lines, and returning to the step of executing the important feature screening process a plurality of times on each training cell line;
if no abnormal cell line is present, evaluating each prediction result to obtain an evaluation result;
judging, based on the evaluation result, whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement;
if yes, finishing training;
if not, returning to the step of obtaining the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each of the cell lines.
11. The apparatus of claim 7, wherein the first determination module is specifically configured for:
grouping the cell lines of the plurality of cell lines that have the same drug response parameter IC50 to obtain cell line groups, and counting the number of cell lines in each cell line group;
searching the drug response parameters IC50 of the plurality of cell line groups for a target drug response parameter IC50, such that the difference between (i) the total number of cell lines in the cell line groups whose drug response parameter IC50 is smaller than the target drug response parameter IC50 and (ii) the total number of cell lines in the cell line groups whose drug response parameter IC50 is larger than the target drug response parameter IC50 is within a set threshold range;
taking the target drug response parameter IC50 as a preset drug response parameter IC50 threshold;
judging whether the drug response parameter IC50 is larger than the preset drug response parameter IC50 threshold;
if yes, determining the drug response class to be insensitive;
if not, determining the drug response class to be sensitive.
12. A drug sensitivity prediction apparatus, comprising:
an acquisition module for acquiring metabolite features of a cell line to be processed;
a calling module for calling a cancer cell line drug sensitivity prediction model to process the metabolite features of the cell line to be processed, so as to obtain a drug response class;
wherein the cancer cell line drug sensitivity prediction model is trained based on the training method of the drug sensitivity prediction model according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011492075.8A CN112599218B (en) | 2020-12-16 | 2020-12-16 | Training method and prediction method of drug sensitivity prediction model and related device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011492075.8A CN112599218B (en) | 2020-12-16 | 2020-12-16 | Training method and prediction method of drug sensitivity prediction model and related device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112599218A (en) | 2021-04-02 |
CN112599218B (en) | 2024-06-18 |
Family
ID=75196643
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011492075.8A Active CN112599218B (en) | 2020-12-16 | 2020-12-16 | Training method and prediction method of drug sensitivity prediction model and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112599218B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113782089B (en) * | 2021-11-15 | 2022-02-18 | Zhejiang University | Drug sensitivity prediction method and device based on multi-omics data fusion |
CN114255886B (en) * | 2022-02-28 | 2022-06-14 | Zhejiang University | Drug sensitivity prediction method and device based on multi-omics similarity guidance |
CN114613512A (en) * | 2022-03-01 | 2022-06-10 | Wuhan Institute of Technology | Screening method, device, equipment and storage medium for anti-breast cancer candidate drugs |
CN116110509B (en) * | 2022-11-15 | 2023-08-04 | Zhejiang University | Drug sensitivity prediction method and device based on omics consistency pre-training |
CN115631871B (en) * | 2022-12-22 | 2023-03-24 | Peking University Third Hospital (Peking University Third Clinical Medical College) | Method and device for determining drug interaction grade |
CN116564514A (en) * | 2023-03-30 | 2023-08-08 | Shenzhen Children's Hospital | Multi-model-based method for predicting the curative effect of treatment for epilepsy caused by tuberous sclerosis |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2401696B1 (en) * | 2009-02-26 | 2017-06-21 | Intrexon CEU, Inc. | Mammalian cell line models and related methods |
US20150254433A1 (en) * | 2014-03-05 | 2015-09-10 | Bruce MACHER | Methods and Models for Determining Likelihood of Cancer Drug Treatment Success Utilizing Predictor Biomarkers, and Methods of Diagnosing and Treating Cancer Using the Biomarkers |
CN109308545B (en) * | 2018-08-21 | 2023-07-07 | Ping An Life Insurance Company of China, Ltd. | Method, device, computer equipment and storage medium for predicting diabetes probability |
EP3963589A4 (en) * | 2019-04-30 | 2023-01-25 | Amgen Inc. | DATA-DRIVEN PREDICTIVE MODELING FOR SELECTION OF CELL LINES IN BIOPHARMACEUTICAL PRODUCTION |
CN111524554B (en) * | 2020-04-24 | 2023-03-24 | Shanghai Ocean University | Cell activity prediction method based on LINCS-L1000 perturbation signal |
Non-Patent Citations (2)
Title |
---|
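(reaction or response or sensitive); Gao Hongjie; Master's thesis; full text * |
Coal mine water inrush prediction method based on machine learning; Tong Rou; Xie Tianbao; Computer Systems & Applications (No. 12); full text * |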
Also Published As
Publication number | Publication date |
---|---|
CN112599218A (en) | 2021-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112599218B (en) | Training method and prediction method of drug sensitivity prediction model and related device | |
US20210383890A1 (en) | Systems and methods for classifying, prioritizing and interpreting genetic variants and therapies using a deep neural network | |
Green et al. | Cellular communities reveal trajectories of brain ageing and Alzheimer’s disease | |
CA2894317C (en) | Systems and methods for classifying, prioritizing and interpreting genetic variants and therapies using a deep neural network | |
RU2517286C2 (en) | Classification of samples data | |
Le et al. | Machine learning for cell type classification from single nucleus RNA sequencing data | |
US20220035892A1 (en) | Statistical analysis system and statistical analysis method using conversational interface | |
Elosua et al. | SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes | |
US20110087436A1 (en) | Method and system for analysis of time-series molecular quantities | |
Lu | Multi-omics Data Integration for Identifying Disease Specific Biological Pathways | |
CN117612712A (en) | Method and system for detecting and improving cognition evaluation diagnosis precision | |
Guan et al. | Cell type-specific predictive models perform prioritization of genes and gene sets associated with autism | |
WO2021224916A1 (en) | Prediction of biological role of tissue receptors | |
EP4386767A1 (en) | Characteristics of patient influencing disease progession | |
CN101517579A (en) | Method of searching for protein and apparatus therefor | |
US20130218581A1 (en) | Stratifying patient populations through characterization of disease-driving signaling | |
KR20210059325A (en) | Model for Predicting Cancer Prognosis using Deep learning | |
EP4143848B1 (en) | Patient stratification using latent variables | |
Burke et al. | Clinical validation of molecular biomarkers in translational medicine | |
CN109643584A (en) | For predicting the system, method and gene label of individual biological aspect | |
Nguyen et al. | Polar Gini Curve: a technique to discover gene expression spatial patterns from single-cell RNA-seq data | |
CN111584085A (en) | Subarachnoid hemorrhage prediction model establishment method and system based on genes and signal paths | |
CN118711751B (en) | Training program analysis system for renal rehabilitation | |
US20230116904A1 (en) | Selecting a cell line for an assay | |
Lipkovich et al. | Statistical methods for biomarker and subgroup evaluation in oncology trials |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||