CN116334225A

CN116334225A - Non-small cell lung cancer PD-1 immune therapy response prediction method for non-disease diagnosis or treatment

Info

Publication number: CN116334225A
Application number: CN202310326560.5A
Authority: CN
Inventors: 胡学达; 徐玄昊; 李勇; 张海满; 李辰威; 郑良涛
Original assignee: Beijing Baiaozhihui Technology Co ltd
Current assignee: Beijing Baiaozhihui Technology Co ltd
Priority date: 2023-03-30
Filing date: 2023-03-30
Publication date: 2023-06-27

Abstract

The invention provides a method for predicting the response of non-small cell lung cancer to PD-1 immunotherapy, for purposes other than disease diagnosis or treatment. The method comprises two main parts: making full use of immunotherapy single-cell sequencing data to establish a stable prediction model, and accurately acquiring transcriptome information of key peripheral blood cells by full-length transcriptome sequencing. The invention identifies cell subtypes and characteristic genes in peripheral blood that are associated with tumor immunotherapy, with statistically significant support; establishes a machine-learning prediction model with high prediction accuracy and stability; and optimizes the transcriptome library construction workflow, broadening the range of application of the technique. A small pre-treatment peripheral blood sample from a patient is sufficient to give a relatively accurate prediction of that patient's response to immunotherapy, and the detection is low in cost and easy to popularize.

Description

Non-small cell lung cancer PD-1 immune therapy response prediction method for non-disease diagnosis or treatment

Technical Field

The invention belongs to the technical field of biomedicine, and particularly relates to a method for predicting the response of non-small cell lung cancer to PD-1 immunotherapy for purposes other than disease diagnosis or treatment.

Background

Currently, the companion diagnostic methods for lung cancer immunotherapy commonly available on the market fall into two major categories: imaging-based PD-L1 immunohistochemical staining and tumor mutational burden analysis based on high-throughput sequencing. Immune checkpoint therapies target specific protein interactions on the cell surface (e.g., PD-1/PD-L1 antibody drugs), thereby relieving the inhibited state of immune cells and enhancing anti-tumor function. Immunohistochemical staining directly measures the amount of PD-L1 protein expressed by tumor cells in a tissue sample, from which the patient's potential clinical benefit is judged.

Internationally, PD-L1 immunohistochemical detection methods have been approved by the United States FDA, and existing commercial reagents include PD-L1 IHC 22C3 pharmDx (Dako), PD-L1 IHC 28-8 pharmDx (Dako), VENTANA PD-L1 SP142 (Roche) and VENTANA PD-L1 SP263 (Roche).

However, PD-L1 detection by immunohistochemistry has many problems in practice, including:

(1) The test is difficult to perform because it requires a tumor tissue sample from the patient in relatively large quantity, and in practice sampling is difficult for some patients.

(2) Quantification requires counting the proportion of tumor cells expressing the surface protein (tumor proportion score, TPS). Because tumor tissue samples are heterogeneous, PD-L1 expression in tumor cells at the biopsied site may not represent the microenvironment as a whole, which biases the final treatment prediction for the patient.

(3) Some patients who test positive for PD-L1 are nevertheless refractory to immunotherapy, i.e., they have primary resistance; other patients respond well initially but eventually develop acquired resistance and derive little benefit from treatment. These facts also make patient stratification based on immunohistochemical analysis inaccurate.

(4) Different testing institutions and hospital pathology departments choose different commercial kits, and because the kits use antibodies that recognize different regions, their quantitative results differ. This introduces uncertainty into detection, and repeated testing is time-consuming and ultimately affects clinical work.

For the reasons described above, the sensitivity and accuracy of PD-L1 detection by immunohistochemistry are subject to bias, and PD-L1 detection can hardly serve as the sole criterion for enrolling patients in tumor immunotherapy.

Mutations in the genome of tumor cells change the sequence of the encoded proteins, which become immunogenic after antigen presentation, i.e., they give rise to tumor neoantigens. The tumor mutational burden can therefore be analyzed by high-throughput sequencing to estimate how well a patient is likely to respond to immunotherapy. At present, several companies internationally provide detection kits for tumor mutational burden, including GH Omni 500 (Guardant Health), FoundationOne CDx (Foundation Medicine) and PlasmaSELECT (Personal Genome Diagnostics).

However, the analysis of tumor mutational burden (TMB) is also affected by a number of factors, including: (1) The range of TMB values varies with cancer type; for example, high TMB values are most common in cutaneous squamous cell carcinoma, melanoma and non-small cell lung cancer, and least common in papillary thyroid carcinoma. (2) The patient's living environment and lifestyle have an effect; for example, TMB is generally high in smokers, which biases the analysis. (3) TMB detection is affected by the method and technical platform: different TMB products cover different gene panels, and sequencing approaches are divided into targeted sequencing and whole-exome sequencing; in addition, results differ between tissue TMB and plasma TMB. (4) Threshold selection is difficult at the time of analysis; for example, in the KEYNOTE-158 study, 102 patients were defined as TMB-high (≥10 mut/Mb), accounting for 13%; in the B-F1RST (NCT02848651) study the TMB threshold was adjusted (≥14.5 mut/Mb), but the difference in progression-free survival (PFS) between the two patient groups was not statistically significant. (5) Even at the same threshold, TMB statistics can be contradictory. For example, in the NEPTUNE study of durvalumab combined with tremelimumab, TMB-H (≥20 mut/Mb) was not associated with clinical benefit, whereas in the MYSTIC (NCT02453282) study patients with TMB ≥20 mut/Mb had longer overall survival (OS) and progression-free survival (PFS). These problems and the discrepancies in clinical data make it difficult for TMB to serve as an accurate predictor.

Given these complications in clinical practice, the identification and development of new detection markers and companion diagnostic strategies is urgent, and it is therefore important to provide a more accurate companion diagnostic scheme.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide a method for predicting the response of non-small cell lung cancer to PD-1 immunotherapy for purposes other than disease diagnosis or treatment. The method is intended to address the problems of accuracy, stability and convenience in companion diagnostics for immunotherapy.

In order to achieve the aim of the invention, the invention adopts the following technical scheme:

in a first aspect, the present invention provides a method for predicting the response of non-small cell lung cancer PD-1 immunotherapy for the purpose of non-disease diagnosis or treatment, said method comprising:

(1) Collecting single-cell transcriptome data of tumor tissue, peripheral blood and lymph nodes of cancer patients, and filtering and normalizing the data; analyzing differences among T cell populations in tissue distribution, expression profile, cell proliferation, migration capacity and clonal expansion to identify tumor-responsive T cell subsets, including terminally differentiated exhausted T cells and TNFRSF9-expressing regulatory T cells; and, by analyzing the TCR sequences of the tumor-responsive T cell subsets, finding T cells of the same type in peripheral blood, which are designated peripheral blood tumor-responsive T cells;

(2) Collecting publicly available single-cell transcriptome datasets and clinical treatment information from melanoma, triple-negative breast cancer and non-small cell lung cancer patients, and carrying out data filtering, normalization and annotation;

(3) Stratifying the single-cell transcriptome data obtained in step (2), randomly dividing it into a training set and a test set, and using the clinical response information as the label of the single-cell transcriptome data;

(4) Training a logistic regression model on the single-cell transcriptome data of the training set, fitting a parameter vector from the gene expression of the sample cells and the corresponding clinical response information to construct the logistic regression model;

(5) Validating the model with the test set data, and averaging the response probabilities predicted by the model to calculate the corresponding clinical response probability;

(6) Collecting a peripheral blood sample from a non-small cell lung cancer patient receiving PD-1 immunotherapy, sorting out the peripheral blood tumor-responsive T cells, performing high-throughput sequencing to obtain transcriptome information, filtering and normalizing the transcriptome data, inputting it into the logistic regression model, averaging the response probabilities predicted by the model to calculate the corresponding clinical response probability, and predicting the clinical response from the clinical response probability.

In the invention, in step (1), the gene expression characteristics of the peripheral blood tumor-responsive T cells are similar to those of terminally differentiated memory T cells and are closely related to immunotherapy. The terminally differentiated memory T cells are designated Temra (terminally differentiated effector memory or effector T cells).
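The TCR-based matching described in step (1) can be illustrated with a minimal sketch. It assumes that each cell carries a clonotype identifier derived from its TCR sequences and that a blood cell is treated as tumor-responsive when its clonotype also occurs in a tumor-responsive cell from the tumor; the record layout, subset labels and clonotype strings are hypothetical and are not taken from the source.

```python
# Hypothetical per-cell records: (cell_id, tissue, subset_label, clonotype).
# The clonotype string stands in for a paired TCR alpha/beta CDR3 identity.
cells = [
    ("c1", "tumor", "exhausted_T", "CASSLGQ|CAVRD"),
    ("c2", "tumor", "TNFRSF9_Treg", "CASSPGT|CAASG"),
    ("c3", "blood", "unknown", "CASSLGQ|CAVRD"),
    ("c4", "blood", "unknown", "CASSQDR|CALSE"),
    ("c5", "lymph_node", "unknown", "CASSPGT|CAASG"),
]

# Subset labels assumed to mark tumor-responsive T cells in the tumor.
TUMOR_RESPONSIVE = {"exhausted_T", "TNFRSF9_Treg"}


def shared_clonotype_blood_cells(records):
    """Return ids of blood cells whose clonotype also occurs in a
    tumor-responsive cell sampled from the tumor."""
    tumor_clonotypes = {
        clono
        for _, tissue, subset, clono in records
        if tissue == "tumor" and subset in TUMOR_RESPONSIVE
    }
    return sorted(
        cell_id
        for cell_id, tissue, _, clono in records
        if tissue == "blood" and clono in tumor_clonotypes
    )


matched = shared_clonotype_blood_cells(cells)
print(matched)
```

Exact string matching of clonotypes is the simplest possible criterion; the source does not state how TCR identity was defined.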

Preferably, in step (1), the characteristic genes identified for the tumor-responsive T cell subpopulation comprise: CX3CR1, GZMB, GZMH, KLRD1, NKG7, GNLY, and FGFBP2.

Preferably, in step (3), the training set comprises 80% single cell transcriptome data and the test set comprises 20% single cell transcriptome data.

Preferably, in step (4), the logistic regression model is represented by the following formula:

wherein $X$ is the gene expression, $W^T$ is the parameter vector, and $W_0$ is the bias parameter.
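For reference, a logistic regression model over these symbols is conventionally written as below. This is the textbook form, given here as an assumption rather than as a reproduction of the patent's own formula; the intermediate $z$ and the function $\sigma$ are notation introduced only for this illustration.

```latex
z = W^{T} X + W_{0}, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```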

Preferably, in step (5), the method for calculating the response probability is as follows:

wherein $P$ is the probability of the corresponding clinical response, $x$ is the feature value, $y$ is the clinical response, and $\theta^T$ is the parameter vector.
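With these symbols, the class probabilities of a binary logistic regression are usually expressed as follows. This is the standard form and is assumed here for illustration only; it is not confirmed to be the exact expression used in the patent.

```latex
P(y = 1 \mid x; \theta) = \frac{1}{1 + e^{-\theta^{T} x}}, \qquad
P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)
```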

Preferably, in step (5), the method for calculating the clinical response probability is as follows:

where y=1 indicates clinical response, y=0 indicates clinical non-response, n is the number of cells, and response is the predicted value: response > 0 indicates clinical response, response < 0 indicates clinical non-response, and response = 0 is indeterminate and requires retesting.
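The decision rule above (a score compared against zero, averaged over n cells) can be sketched as follows. The per-cell term used here, P(y=1) minus P(y=0), is an assumption chosen because it is centered on zero and so matches the stated threshold; the source does not give the exact expression, and the function names are hypothetical.

```python
def clinical_response_score(cell_probs):
    """Average a zero-centered per-cell score over all cells.

    Assumption: each cell contributes P(y=1) - P(y=0) = 2p - 1,
    where p is the predicted response probability for that cell.
    """
    if not cell_probs:
        raise ValueError("at least one cell is required")
    n = len(cell_probs)
    return sum(2.0 * p - 1.0 for p in cell_probs) / n


def call_response(score):
    """Apply the sign rule stated in the text to the averaged score."""
    if score > 0:
        return "clinical response"
    if score < 0:
        return "clinical non-response"
    return "indeterminate; retest"


probs = [0.9, 0.7, 0.4, 0.8]  # toy per-cell probabilities, not real data
score = clinical_response_score(probs)
print(round(score, 3), call_response(score))
```

Under this reading, a patient is called a responder when the average predicted probability across cells exceeds 0.5.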

Preferably, in step (6), the high throughput sequencing obtaining transcriptome information is performed using a method comprising:

cells are subjected to lysis and reverse transcription, cDNA amplification and fragmentation are performed, library amplification and purification are performed after fragmentation, and high-throughput sequencing is performed after purification to obtain transcriptome information.

Preferably, the cDNA is fragmented using a Tn5 cleavage system in which the final concentration of Tn5 enzyme is 0.001-0.01 μM, for example 0.001 μM, 0.005 μM or 0.01 μM, preferably 0.005 μM.

Preferably, the Tn5 cleavage system further comprises 5%-20% dimethylformamide, for example 5%, 10%, 15% or 20%, preferably 10%.

Preferably, the pH of the Tn5 cleavage system is 7.0-8.5, for example 7.0, 7.5, 8.0 or 8.5, preferably 7.3.

Preferably, the library amplification is performed using an amplification system to which 0.01%-0.012% Tween-20 is added (for example 0.01%, 0.011% or 0.012%).

Preferably, the purification employs the following strategy: 0.7X (retain beads) + 0.6X (retain supernatant) + 0.15X (retain beads) + 0.6X (retain beads).
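The preferred reaction conditions listed above can be collected into a small configuration check. The sketch below only encodes the ranges and preferred values stated in the text; the class and field names are hypothetical, and the Tween-20 default of 0.01% is simply the lower bound of the stated range.

```python
from dataclasses import dataclass

# Inclusive (low, high) ranges as stated in the text.
RANGES = {
    "tn5_um": (0.001, 0.01),           # Tn5 final concentration, in µM
    "dmf_percent": (5.0, 20.0),        # dimethylformamide, in %
    "ph": (7.0, 8.5),                  # pH of the Tn5 cleavage system
    "tween20_percent": (0.01, 0.012),  # Tween-20 in library amplification, in %
}


@dataclass(frozen=True)
class LibraryPrepConditions:
    """Library preparation conditions; defaults are the preferred values."""
    tn5_um: float = 0.005
    dmf_percent: float = 10.0
    ph: float = 7.3
    tween20_percent: float = 0.01

    def out_of_range(self):
        """Return the names of parameters outside the stated ranges."""
        return [
            name
            for name, (low, high) in RANGES.items()
            if not low <= getattr(self, name) <= high
        ]


preferred = LibraryPrepConditions()
print(preferred.out_of_range())
print(LibraryPrepConditions(tn5_um=0.02, ph=6.5).out_of_range())
```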

In a second aspect, the present invention provides a non-small cell lung cancer PD-1 immunotherapeutic response prediction system based on peripheral blood detection, the system comprising:

a tumor-responsive T cell subpopulation screening module for screening a tumor-responsive T cell subpopulation from single cell transcriptome data of tumor tissue, peripheral blood, and lymph nodes of a cancer patient;

the data acquisition module is used for acquiring publicly available single-cell transcriptome datasets and clinical treatment information from melanoma, triple-negative breast cancer and non-small cell lung cancer patients;

the data dividing module is used for stratifying the obtained single-cell transcriptome data and randomly dividing it into a training set and a test set, with the clinical response information used as the label of the single-cell transcriptome data;

the logistic regression model construction module is used for fitting the parameter vector according to the gene expression condition of the sample cell and the corresponding clinical response information to construct a logistic regression model;

the logistic regression model verification module is used for verifying the obtained logistic regression model, and calculating the clinical response probability by adding and averaging the response probability obtained by model prediction;

the clinical response prediction module is used for filtering and normalizing the transcriptome information of the peripheral blood tumor-responsive T cells of the subject to be predicted, inputting it into the logistic regression model, and averaging the response probabilities predicted by the model to calculate the clinical response probability, so as to predict the clinical response from the clinical response probability.

Preferably, in the tumor-responsive T cell subset screening module, the tumor-responsive T cell subset is screened by the following method: collecting single-cell transcriptome data of tumor tissue, peripheral blood and lymph nodes of cancer patients, and filtering and normalizing the data; analyzing differences among T cell populations in tissue distribution, expression profile, cell proliferation, migration capacity and clonal expansion to identify tumor-responsive T cell subsets, including terminally differentiated exhausted T cells and TNFRSF9-expressing regulatory T cells; and, by analyzing the TCR sequences of the tumor-responsive T cell subsets, finding T cells of the same type in peripheral blood, which are designated peripheral blood tumor-responsive T cells.

In the invention, the gene expression characteristics of the selected peripheral blood tumor-responsive T cells are similar to those of terminally differentiated memory T cells and are closely related to immunotherapy. The terminally differentiated memory T cells are designated Temra (terminally differentiated effector memory or effector T cells).

Preferably, the characteristic genes identified for the tumor-responsive T cell subpopulation comprise: CX3CR1, GZMB, GZMH, KLRD1, NKG7, GNLY, and FGFBP2.

Preferably, the data acquisition module further performs data filtering, normalization and annotation after acquiring the data.

Preferably, in the data partitioning module, the training set includes 80% single cell transcriptome data, and the test set includes 20% single cell transcriptome data.

Preferably, the logistic regression model is represented by the following formula:

wherein $X$ is the gene expression, $W^T$ is the parameter vector, and $W_0$ is the bias parameter.

Preferably, the calculation method of the response probability is as follows:

Preferably, the calculation method of the clinical response probability is as follows:

In a third aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the computer program is executed.

In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of the first aspect.

The numerical ranges recited herein include not only the recited point values but also any unrecited point values between them; for reasons of space and brevity, the invention does not exhaustively list the specific point values included in the ranges.

Compared with the prior art, the invention has the following beneficial effects:

(1) Based on large-cohort single-cell omics data, the invention identifies cell subtypes and characteristic genes in peripheral blood that are associated with tumor immunotherapy, with statistically significant support.

(2) The invention establishes a prediction model based on machine learning, and can improve the accuracy and stability of treatment response prediction by continuously accumulating the data and repeatedly iterating the model.

(3) The experimental method of the invention is based on SMART-seq full-length transcriptome amplification technology, and has higher gene detection capability, so the detection resolution is higher.

(4) The invention optimizes the construction flow of transcriptome library, can successfully perform experiments in the initial quantity range of 1-1000 cells, and improves the technical application range.

(5) The detection adopts a conventional transcriptome method (bulk RNA-seq) rather than single-cell transcriptome sequencing, which greatly reduces experimental workload and cost and favors clinical practice and popularization.

(6) The invention collects a pre-treatment peripheral blood sample from the immunotherapy patient; no more than 4 milliliters (mL) is used for detection, the sampling process is convenient and quick and causes little harm to the patient, which favors clinical practice and popularization.

Drawings

FIG. 1 is a flow chart of machine-learning prediction model construction based on single-cell data and of companion diagnosis for lung cancer immunotherapy;

FIG. 2 is transcriptome library information at different Tn5 enzyme concentrations;

FIG. 3 is transcriptome library information at different buffer pH values;

FIG. 4 shows the purification results before and after adjustment of the magnetic bead screening strategy;

FIG. 5 is the results of construction of full length transcriptome library at a cell starting amount of 10;

FIG. 6 is the results of construction of full length transcriptome libraries at a cell starting amount of 100;

FIG. 7 is the results of construction of a full length transcriptome library at a cell starting amount of 200;

FIG. 8 is the results of construction of a full length transcriptome library at a cell starting amount of 300;

FIG. 9 is the results of construction of full length transcriptome libraries at a cell initiation amount of 1000;

FIG. 10 shows the peripheral-blood-based prediction results of immunotherapy response in lung cancer patients.

Detailed Description

The technical scheme of the invention is further described by the following specific embodiments. It will be apparent to those skilled in the art that the examples are merely to aid in understanding the invention and are not to be construed as a specific limitation thereof.

Where specific techniques or conditions are not indicated in the examples, they follow those described in the literature in the field or the product instructions. Reagents or instruments for which no manufacturer is indicated are conventional products commercially available through regular channels.

The materials and solution formulation methods used in the following embodiments are as follows:

Example 1

To predict the drug response of non-small cell lung cancer patients receiving PD-1 immunotherapy, this example builds a prediction model from immunotherapy transcriptome data and develops a peripheral blood sequencing technique based on the SMART-seq full-length transcriptome.

1. To improve the accuracy of model building, single-cell data were collected and curated:

(1) Using unified filtering criteria and data normalization, single-cell transcriptome data of tumor tissue, peripheral blood and lymph nodes from more than 300 patients across 21 cancer types were integrated; differences among T cell populations in tissue distribution, expression profile, cell proliferation, migration capacity and clonal expansion were analyzed, and tumor-responsive T cell subsets were identified, including terminally differentiated exhausted T cells and TNFRSF9-expressing regulatory T cells; by analyzing the TCR sequences of the tumor-responsive T cell subsets, T cells of the same type were found in peripheral blood and designated peripheral blood tumor-responsive T cells.

The gene expression characteristics of the peripheral blood tumor-responsive T cells are similar to those of terminally differentiated memory T cells and are closely related to immunotherapy.

(2) Based on the latest single-cell immunotherapy research, publicly available single-cell datasets and clinical treatment information from an additional 38 cancer patients (covering melanoma, triple-negative breast cancer and non-small cell lung cancer) were collected; after data filtering, normalization and annotation, the analysis focused on the T cell populations.

(3) The single-cell data were stratified by patient ID and randomly divided at a 4:1 ratio into a training set (80% of the single-cell data) and a test set (20% of the single-cell data), with the clinical response information used as the label of the single-cell data.
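The 4:1 split can be sketched as below. It takes one possible reading of the stratification, namely that the cells of each patient are shuffled and divided 80/20 so that every patient is represented in both sets; the function name, record layout and toy data are hypothetical.

```python
import random
from collections import defaultdict


def split_cells_by_patient(cells, test_fraction=0.2, seed=0):
    """Split (patient_id, cell_id, label) records 4:1 within each patient."""
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for record in cells:
        by_patient[record[0]].append(record)

    train, test = [], []
    for patient_id in sorted(by_patient):
        group = by_patient[patient_id][:]
        rng.shuffle(group)  # random assignment within the patient stratum
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data: 3 hypothetical patients with 10 cells each.
# Each cell inherits its patient's clinical response label (1 = responder).
cells = [(f"P{p}", f"P{p}_cell{i}", p % 2) for p in range(3) for i in range(10)]
train, test = split_cells_by_patient(cells)
print(len(train), len(test))
```

Splitting within each patient keeps the per-patient proportions equal in both sets; a split that holds out whole patients would be a different design and is not what this sketch does.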

(4) A logistic regression model was trained with default parameters in Python 3.9 on the single-cell data of the training set; a parameter vector was first fitted from the gene expression of the sample cells and the corresponding clinical response information. The logistic regression model is shown in formula 1:

(5) The trained model was validated with the test set data. The single-cell response probabilities predicted by the model were averaged to calculate the clinical response probability of the corresponding patient, and the patient's clinical response was predicted from that probability, as shown in formula 2 and formula 3:

wherein, in formula 2, $P$ is the probability of the corresponding clinical response, $x$ is the feature value, $y$ is the clinical response, and $\theta^T$ is the parameter vector.

In formula 3, y=1 indicates clinical response, y=0 indicates clinical non-response, n is the number of cells, and response is the predicted value: response > 0 indicates clinical response, response < 0 indicates clinical non-response, and response = 0 is indeterminate and requires retesting.

(6) Validated on single-cell immunotherapy data from melanoma, triple-negative breast cancer and non-small cell lung cancer, this model construction strategy effectively predicts the clinical response of patients, with a prediction accuracy above 90%.
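The model fitting step can be illustrated with a minimal, dependency-free sketch. It fits a weight vector and a bias by plain batch gradient descent on the log loss; this is a stand-in chosen for self-containment, not the library or solver used by the authors, which the source does not name. The gene columns and expression values are toy data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a cell comes from a responder."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy expression matrix: rows are cells, columns are two hypothetical genes.
X = [[2.0, 0.1], [1.8, 0.3], [2.2, 0.0], [0.2, 1.9], [0.1, 2.1], [0.3, 1.7]]
y = [1, 1, 1, 0, 0, 0]  # per-cell label inherited from the patient's response

w, b = fit_logistic(X, y)
print(predict_proba(w, b, [2.0, 0.2]) > 0.5, predict_proba(w, b, [0.2, 2.0]) < 0.5)
```

The per-cell probabilities returned by `predict_proba` are the quantities that are then averaged per patient to obtain the clinical response prediction.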

A flow chart of machine-learning prediction model construction based on single-cell data and of companion diagnosis for lung cancer immunotherapy is shown in FIG. 1, in which the left panel is the prediction model construction workflow and the right panel is the clinical sample processing and therapy response prediction workflow.

2. Development of the peripheral blood sequencing technique:

(1) Peripheral blood samples of non-small cell lung cancer patients receiving PD-1 immunotherapy were collected and PBMCs were extracted from them (required sample amount: no more than 4 mL of fresh blood, or 10⁶ cryopreserved PBMCs).

(2) CD3⁺ T cells in the PBMCs were labeled with fluorescent antibodies, and specific T cell subsets were obtained by flow cytometric sorting (i.e., CD3⁺CX3CR1⁺ peripheral blood tumor-responsive T cells and, as background, CD3⁺CX3CR1⁻ non-tumor-responsive peripheral blood T cells); to ensure the stability of the experimental results, three tubes were collected for each cell subset as technical replicates.

(3) As reported in the literature, the original SMART-seq full-length transcriptome technique is mainly used to construct next-generation sequencing libraries from single cells; in this example the experimental reagents and steps were adjusted so that it is suitable for cell processing and library construction on the order of 1-10³ cells. The process involves three main steps: cell lysis and reverse transcription, cDNA amplification and fragmentation, and library preparation and purification.

(4) Cell lysis and reverse transcription are routine experimental procedures. To better achieve the subsequent cDNA fragmentation and library preparation, conditions were optimized for the key steps, including the Tn5 cleavage system, the reaction buffer, and the library purification strategy.

(5) To improve experimental efficiency, different final reaction concentrations of the Tn5 enzyme complex were tested (0.001 μM-0.01 μM). Transcriptome library information and statistical results at the different Tn5 enzyme concentrations are shown in FIG. 2 and Table 1: FIG. 2 shows the transcriptome library information, and Table 1 summarizes the statistics. The results indicate that the library concentration and total amount are highest at a Tn5 concentration of 0.005 μM, so the optimal condition was finally set at 0.005 μM. Compared with the commercial kit, the proportion of small fragments in the library is reduced, library quality is improved, and experimental cost is markedly reduced.

TABLE 1

Sample name	Tn5 enzyme concentration	Library concentration	Total library amount
1	0.01 μM	8.76 ng/μL	87.6 ng
2	0.0075 μM	10.64 ng/μL	106.4 ng
3	0.005 μM	12.40 ng/μL	124.0 ng
4	0.004 μM	10.50 ng/μL	105.0 ng
5	0.003 μM	6.62 ng/μL	66.2 ng
6	0.001 μM	1.28 ng/μL	12.8 ng
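The values in Table 1 can be checked with a few lines of arithmetic. The sketch below transcribes the table, derives the library volume implied by each row (total amount divided by concentration, a quantity the text does not state explicitly) and selects the concentration with the highest yield.

```python
# Rows transcribed from Table 1:
# (sample, Tn5 concentration in µM, library concentration in ng/µL, total in ng).
table1 = [
    (1, 0.01, 8.76, 87.6),
    (2, 0.0075, 10.64, 106.4),
    (3, 0.005, 12.40, 124.0),
    (4, 0.004, 10.50, 105.0),
    (5, 0.003, 6.62, 66.2),
    (6, 0.001, 1.28, 12.8),
]

# Library volume implied by each row (total / concentration), in µL.
volumes = [round(total / conc, 2) for _, _, conc, total in table1]

# Row with the highest total library amount.
best = max(table1, key=lambda row: row[3])

print(volumes)
print(best[1])
```

Every row implies the same volume, so ranking by concentration and ranking by total amount give the same optimum.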

(6) To improve the quality and stability of the experimental results, the pH of the reaction buffer (7.0-8.5) and the concentration of dimethylformamide (DMF, 0%-20%) were adjusted, and the optimal combination was finally determined to be pH 7.3 with 10% DMF, which gives the best cDNA fragmentation, a higher library concentration and assured quality. Transcriptome library information and statistical results at the different buffer pH values are shown in FIG. 3 and Table 2: FIG. 3 shows the transcriptome library information, and Table 2 summarizes the statistics. The results indicate that the library concentration and total amount are highest at pH 7.3. In addition, 0.01% Tween-20 was added in the library amplification step after fragmentation; this modification neutralizes the effect of SDS and improves the efficiency of the amplification enzyme.

TABLE 2

Sample name	pH value	Library concentration	Total library amount
1	7.2	7.86 ng/μL	78.6 ng
2	7.3	10.30 ng/μL	103.0 ng
3	8.4	4.68 ng/μL	46.8 ng

(7) To improve the quality of the sequencing data, the magnetic bead selection step in library purification was changed to the following strategy: 0.7X (retain beads) + 0.6X (retain supernatant) + 0.15X (retain beads) + 0.6X (retain beads). This scheme effectively removes small fragments, so that the library size distribution is more concentrated and data quality is improved. The purification results before and after adjusting the bead selection strategy are shown in FIG. 4; adjusting the strategy and adding one extra purification step effectively reduces the proportion of small fragments and improves library quality. The left panel shows the original literature method, 1X (retain beads) + 0.6X (retain supernatant) + 0.15X (retain beads), which consumes more beads and has difficulty removing small fragments completely; the right panel shows the purification result after the strategy was adjusted, in which small fragments around 200 bp are markedly reduced.

(8) Transcriptome information of the corresponding cells was then obtained by high-throughput sequencing on an Illumina NovaSeq 6000 (PE150).

(9) The transcriptome data were filtered and normalized and then input into the prediction model to obtain the treatment response probability of the corresponding patient.

In conclusion, the invention makes full use of immunotherapy single-cell sequencing data to establish a stable prediction model, and accurately obtains transcriptome information of key peripheral blood cells by full-length transcriptome sequencing:

(1) The invention has a solid theoretical basis and abundant data. It integrates single-cell transcriptome data from more than 300 patients across 21 cancer types and identifies the cell type and gene expression characteristics in peripheral blood that are associated with immunotherapy response, namely peripheral blood tumor-responsive T cells, whose characteristic genes include CX3CR1, GZMB, GZMH, KLRD1, NKG7, GNLY, FGFBP2 and the like;

(2) The invention collects and integrates immunotherapy single-cell datasets from an additional 38 cancer patients and, focusing on the CD8⁺ T lymphocytes therein, establishes a stable machine-learning prediction model by logistic regression; in subsequent cross-validation with the test set data, the overall prediction accuracy exceeds 90%;

(3) The experimental method requires only 2-4 milliliters (mL) of peripheral blood or 10⁶ PBMCs, from which the specific peripheral blood tumor-responsive T cells are enriched by antibody labeling and flow sorting; this is less than the sample amount required for tumor mutational burden detection or other diagnostic methods;

(4) Unlike existing single-cell approaches, the invention improves the SMART-seq full-length transcriptome library construction workflow so that it supports a starting input of 1-1000 cells, with an overall experimental success rate of at least 95%. The results are shown in FIG. 5, FIG. 6, FIG. 7, FIG. 8 and FIG. 9: the specific T cell population is flow sorted and used for transcriptome library construction, and library fragment analysis shows a clear signal peak in the 400-600 range. The SMART-seq method thus detects more genes and improves data accuracy, while basing the result on transcriptome data from multiple cells ensures its stability;

(5) The sequencing library construction process uses an optimized enzymatic fragmentation system (comprising the enzyme complex and the reaction buffer system), which markedly improves library quality and markedly reduces experimental cost;

(6) The invention has collected pre-treatment blood samples from more than 50 non-small cell lung cancer patients receiving PD-1 immunotherapy, of whom 16 currently have paired clinical follow-up information. Tested with the present method, the overall prediction accuracy exceeds 80%. The result is shown in FIG. 10, where R denotes clinical treatment response and non-R denotes clinical non-response; data points above 0 on the y-axis indicate that the model predicts response, and data points below 0 indicate that the model predicts non-response. The number of data points corresponds to the number of computational simulations, and the result shown covers 10 simulations. The current overall prediction accuracy is 81.3%, which demonstrates that the method is effective in practical application.
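As a rough check on the accuracy figure in item (6), overall accuracy can be computed as the fraction of patients whose predicted label matches the clinical label. The sketch below assumes the accuracy is counted over the 16 patients with follow-up; under that assumption 13 correct predictions give 13/16 = 81.25%, which is consistent with the reported 81.3%. The label lists are hypothetical and are not the patients' actual outcomes.

```python
def accuracy(predicted, observed):
    """Fraction of predictions that match the observed clinical labels."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical labels for 16 patients (not data from the source):
# 8 responders (R) and 8 non-responders (non-R), 13 predicted correctly.
observed = ["R"] * 8 + ["non-R"] * 8
predicted = ["R"] * 7 + ["non-R"] * 1 + ["non-R"] * 6 + ["R"] * 2

overall_accuracy = accuracy(predicted, observed)  # 13 / 16
```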

Based on these features, the invention can use a small amount of a patient's pre-treatment peripheral blood to give a relatively accurate prediction of the patient's response after immunotherapy.
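The logistic regression model of item (2), with the 80%/20% training and test split recited in the claims, can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic, the three features are placeholders, and plain batch gradient descent stands in for whatever fitting procedure was actually used. The names `train_logistic` and `predict_proba` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Response probability for one feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic stand-in data: 200 "cells", 3 placeholder expression features,
# label 1 = response, 0 = non-response. Not data from the source.
rng = random.Random(0)
data = []
for _ in range(200):
    label = rng.randint(0, 1)
    data.append(([rng.gauss(2.0 if label else -2.0, 1.0) for _ in range(3)], label))
rng.shuffle(data)

split = int(0.8 * len(data))  # 80% training set, 20% test set
train, test = data[:split], data[split:]
w, b = train_logistic([f for f, _ in train], [l for _, l in train])

correct = sum((predict_proba(w, b, f) >= 0.5) == bool(l) for f, l in test)
test_accuracy = correct / len(test)
```

The 0.5 decision threshold is also an assumption; the accuracy obtained on this synthetic data says nothing about the accuracy reported in the text.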

The applicant declares that the above is only a specific embodiment of the present invention and that the scope of protection of the present invention is not limited thereto. It should be apparent to those skilled in the art that any change or substitution readily conceivable within the technical scope disclosed by the present invention falls within the scope of protection and disclosure of the present invention.

Claims

1. A method of predicting a non-small cell lung cancer PD-1 immunotherapeutic response for the purposes of non-disease diagnosis or treatment, the method comprising:

(6) Collecting a peripheral blood sample from a non-small cell lung cancer patient receiving PD-1 immunotherapy, sorting tumor-responsive T cells from the peripheral blood, performing high-throughput sequencing to obtain transcriptome information, filtering and normalizing the transcriptome data, inputting the data into the logistic regression model, summing and averaging the response probabilities predicted by the model to calculate the corresponding clinical response probability, and predicting the clinical response according to the clinical response probability.
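The final part of step (6), summing and averaging the model's response probabilities into one clinical response probability per patient, can be sketched as below. The 0.5 cut-off used to turn the averaged probability into an R / non-R call is an assumption, as are the function names and the example probabilities.

```python
def patient_response_probability(cell_probabilities):
    """Mean of the per-sample response probabilities predicted by the model."""
    if not cell_probabilities:
        raise ValueError("at least one predicted probability is required")
    return sum(cell_probabilities) / len(cell_probabilities)


def predict_response(cell_probabilities, threshold=0.5):
    """Return (label, probability); the threshold is an assumed cut-off."""
    probability = patient_response_probability(cell_probabilities)
    label = "R" if probability >= threshold else "non-R"
    return label, probability


# Hypothetical model outputs for two patients.
label_a, prob_a = predict_response([0.9, 0.7, 0.8])
label_b, prob_b = predict_response([0.2, 0.4])
```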

2. The method of predicting the response of non-small cell lung cancer PD-1 immunotherapy for non-disease diagnosis or treatment according to claim 1, wherein in step (1), the identified characteristic genes of the tumor-responsive T cell subset comprise: CX3CR1, GZMB, GZMH, KLRD1, NKG7, GNLY, and FGFBP2.

3. The method of predicting the response of non-small cell lung cancer PD-1 immunotherapy for non-disease diagnosis or treatment according to claim 1 or 2, wherein in step (3), the training set comprises 80% single cell transcriptome data and the test set comprises 20% single cell transcriptome data.

4. The method of predicting the PD-1 immunotherapy response of non-small cell lung cancer for the purpose of non-disease diagnosis or treatment according to any one of claims 1-3, wherein in step (4), the logistic regression model is represented by the following formula:

5. The method of predicting the response of non-small cell lung cancer PD-1 immunotherapy for the purpose of non-disease diagnosis or treatment according to any one of claims 1 to 4, wherein in step (5), the method of calculating the response probability is represented by the following formula:

wherein P is the probability corresponding to the clinical response, x is the feature value, y is the clinical response, and θ^T is the parameter vector;
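The formula images referenced in claims 4 and 5 are not reproduced in this text. Assuming the standard logistic regression form, and using only the symbols P, x, y and θ defined above, the response probability would read as follows; this is a sketch of the conventional expression, not a transcription of the claimed formula.

```latex
P(y = 1 \mid x; \theta) = \frac{1}{1 + e^{-\theta^{T} x}}, \qquad
P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)
```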

6. The method of predicting the response to non-small cell lung cancer PD-1 immunotherapy for the purpose of non-disease diagnosis or treatment according to any one of claims 2 to 5, wherein in step (6), the high throughput sequencing obtaining transcriptome information is performed by a method comprising the steps of:

the cell is subjected to lysis and reverse transcription, cDNA amplification and fragmentation are carried out, library amplification and purification are carried out after fragmentation, and high-throughput sequencing is carried out after purification to obtain transcriptome information;

preferably, the cDNA is fragmented by using a Tn5 cleavage system, wherein the final concentration of Tn5 enzyme in the Tn5 cleavage system is 0.001-0.01 μM, preferably 0.005 μM;

preferably, the Tn5 cleavage system further comprises 5%-20% dimethylformamide, preferably 10% dimethylformamide;

preferably, the pH of the Tn5 cleavage system is 7.0-8.5, preferably 7.3;

preferably, the library amplification is performed using an amplification system with 0.01-0.012% Tween-20 added;
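The preferred Tn5 conditions above translate into pipetting volumes through the usual dilution relation C1·V1 = C2·V2. The sketch below assumes a 20 µL reaction, a 0.1 µM Tn5 working stock and neat (100%) dimethylformamide; none of these three values comes from the text, and the helper names are hypothetical. Only the ranges and preferred values are taken from the claim.

```python
def stock_volume(final_conc, total_volume, stock_conc):
    """Volume of stock needed so that final_conc is reached in total_volume.

    Uses C_stock * V_stock = C_final * V_total; concentrations must share units.
    """
    if stock_conc <= 0 or not 0 <= final_conc <= stock_conc:
        raise ValueError("need 0 <= final_conc <= stock_conc and stock_conc > 0")
    return final_conc * total_volume / stock_conc


def in_range(value, low, high):
    """True if value lies within the closed interval [low, high]."""
    return low <= value <= high


REACTION_UL = 20.0   # assumed reaction volume, not from the source
TN5_STOCK_UM = 0.1   # assumed Tn5 working stock concentration, not from the source

tn5_final_um = 0.005   # preferred final Tn5 concentration (claimed range 0.001-0.01 uM)
dmf_final_pct = 10.0   # preferred dimethylformamide content (claimed range 5%-20%)
ph = 7.3               # preferred pH (claimed range 7.0-8.5)

tn5_ul = stock_volume(tn5_final_um, REACTION_UL, TN5_STOCK_UM)
dmf_ul = stock_volume(dmf_final_pct, REACTION_UL, 100.0)
settings_ok = (
    in_range(tn5_final_um, 0.001, 0.01)
    and in_range(dmf_final_pct, 5.0, 20.0)
    and in_range(ph, 7.0, 8.5)
)
```

Under these assumed volumes the reaction would take about 1 µL of Tn5 stock and 2 µL of dimethylformamide, with the remainder made up of cDNA and buffer.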

7. A non-small cell lung cancer PD-1 immunotherapeutic response prediction system based on peripheral blood detection, the system comprising:

8. The peripheral blood detection-based non-small cell lung cancer PD-1 immunotherapeutic response prediction system of claim 7, wherein the logistic regression model is represented by the formula:

wherein X is the gene expression, W^T is the parameter vector, and W₀ is the bias parameter;

preferably, the calculation method of the response probability is as follows:
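The formula images for this claim are not reproduced in this text. Assuming the standard logistic regression form with the symbols X, W and W₀ named in the claim, and the averaging of predicted probabilities described for the method, the model and the patient-level probability could be written as below. The index i over the n sequenced samples of one patient and the symbol for the averaged probability are notation introduced here for illustration.

```latex
P_i = \frac{1}{1 + e^{-\left(W^{T} X_i + W_0\right)}}, \qquad
\bar{P} = \frac{1}{n} \sum_{i=1}^{n} P_i
```

With θ = (W₀, W) and x = (1, X), this is the same expression as a logistic model written with a single parameter vector θ.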

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-5 when the computer program is executed by the processor.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-5.