CN116334225A - Non-small cell lung cancer PD-1 immune therapy response prediction method for non-disease diagnosis or treatment - Google Patents

Info

Publication number
CN116334225A
Authority
CN
China
Prior art keywords
response
cell
clinical
tumor
transcriptome
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310326560.5A
Other languages
Chinese (zh)
Inventor
胡学达
徐玄昊
李勇
张海满
李辰威
郑良涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baiaozhihui Technology Co ltd
Original Assignee
Beijing Baiaozhihui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baiaozhihui Technology Co ltd
Priority to CN202310326560.5A
Publication of CN116334225A
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/27: Regression, e.g. linear or logistic regression
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/106: Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Zoology (AREA)
  • Pathology (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Genetics & Genomics (AREA)
  • Immunology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Epidemiology (AREA)
  • Theoretical Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Oncology (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Hospice & Palliative Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for predicting the response of non-small cell lung cancer to PD-1 immunotherapy for purposes other than disease diagnosis or treatment. The method has two main parts: building a stable prediction model by making full use of single-cell sequencing data from immunotherapy studies, and accurately obtaining transcriptome information from key peripheral blood cells by full-length transcriptome sequencing. The cell subtypes and characteristic genes in peripheral blood that are related to tumor immunotherapy are identified with clear statistical significance; a machine-learning prediction model with high prediction accuracy and stability is established; the transcriptome library construction workflow is optimized, widening the range of application of the technique; and a small amount of a patient's pre-treatment peripheral blood is sufficient to give a relatively accurate prediction of the patient's response to immunotherapy, at low detection cost and in a form that is easy to popularize.

Description

Non-small cell lung cancer PD-1 immune therapy response prediction method for non-disease diagnosis or treatment
Technical Field
The invention belongs to the technical field of biomedicine, and in particular relates to a method for predicting the response of non-small cell lung cancer to PD-1 immunotherapy for purposes other than disease diagnosis or treatment.
Background
Currently, the companion diagnostic methods for lung cancer immunotherapy that are common on the market fall into two major categories: imaging-based PD-L1 immunohistochemical staining and tumor mutational burden analysis based on high-throughput sequencing. Immune checkpoint therapies target specific protein interactions on the cell surface (e.g., PD-1/PD-L1 antibody drugs), thereby relieving the inhibition of immune cells and enhancing their anti-tumor function. Immunohistochemical staining directly measures the amount of PD-L1 protein expressed by tumor cells in a tissue sample, from which the patient's potential clinical benefit is judged.
Internationally, PD-L1 immunohistochemical assays have been approved by the United States FDA, and existing commercial reagents include PD-L1 IHC 22C3 pharmDx (Dako), PD-L1 IHC 28-8 pharmDx (Dako), VENTANA PD-L1 SP142 (Roche), and VENTANA PD-L1 SP263 (Roche).
However, PD-L1 detection by immunohistochemistry has many problems in practice, including:
(1) The assay requires a tumor tissue sample from the patient, and a relatively large one, yet in practice sampling is difficult for some patients.
(2) Quantification requires counting the proportion of tumor cells expressing the protein on their surface (tumor proportion score, TPS). Because tumor tissue samples are heterogeneous, PD-L1 expression in the tumor cells at the biopsy site may not represent the microenvironment as a whole, which biases the final treatment prediction for the patient.
(3) Some patients do not respond to immunotherapy even though PD-L1 detection is positive, i.e., they have primary resistance; other patients respond well initially but eventually develop acquired resistance and struggle to benefit from treatment. These facts also make patient stratification based on immunohistochemical analysis inaccurate.
(4) Different testing institutions and hospital pathology departments select different commercial kits, and because the kits use antibodies that recognize different regions, their quantitative results differ. This introduces uncertainty into the test, repeated testing is time-consuming, and clinical work is ultimately affected.
For the reasons above, the sensitivity and accuracy of PD-L1 detection by immunohistochemistry are subject to bias, and PD-L1 detection is poorly suited to serving as the sole criterion for enrolling patients in tumor immunotherapy.
Mutations in the genome of tumor cells change the sequence of the encoded proteins, which become immunogenic after antigen presentation, i.e., they give rise to tumor neoantigens. Tumor mutational burden can therefore be analyzed by high-throughput sequencing to estimate how well a patient will respond to immunotherapy. At present, several companies internationally provide detection kits for tumor mutational burden, including GH Omni 500 (Guardant Health), FoundationOne CDx (Foundation Medicine), and PlasmaSELECT (Personal Genome Diagnostics).
However, the analysis of tumor mutational burden (TMB) is also affected by a number of factors, including:
(1) The range of TMB values varies with cancer type; for example, high TMB values are most common in cutaneous squamous cell carcinoma, melanoma, and non-small cell lung cancer, and least common in papillary thyroid carcinoma.
(2) The patient's living environment and lifestyle have an effect; for example, TMB is generally high in smokers, which biases the analysis.
(3) TMB detection is affected by the method and technical platform: different TMB products cover different sets of genes, sequencing approaches are divided into targeted sequencing and whole-exome sequencing, and results for tissue TMB and plasma TMB also differ.
(4) Threshold selection is difficult. For example, in the KEYNOTE-158 study, 102 patients (13%) were defined as TMB-high (≥10 mut/Mb); in the B-F1RST study (NCT02848651), the TMB threshold was adjusted (≥14.5 mut/Mb), but the difference in progression-free survival (PFS) between the two groups of patients was not statistically significant.
(5) Even at the same threshold, the findings for TMB are contradictory. For example, in the NEPTUNE study of durvalumab combined with tremelimumab, TMB-H (≥20 mut/Mb) was not associated with clinical benefit, whereas in the MYSTIC study (NCT02453282), patients with TMB ≥20 mut/Mb had longer overall survival (OS) and PFS.
These problems and discrepancies in the clinical data make it difficult for TMB to serve as an accurate predictor.
Given these complications in clinical practice, identifying and developing new detection markers and companion diagnostic strategies is urgent, and providing a more accurate companion diagnostic scheme is therefore important.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a method for predicting the response of non-small cell lung cancer to PD-1 immunotherapy for purposes other than disease diagnosis or treatment. The method is intended to address the accuracy, stability, and convenience problems of companion diagnostics for immunotherapy.
To achieve this aim, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for predicting the response of non-small cell lung cancer to PD-1 immunotherapy for purposes other than disease diagnosis or treatment, said method comprising:
(1) Collecting single-cell transcriptome data from tumor tissue, peripheral blood, and lymph nodes of cancer patients, and filtering and normalizing the data; analyzing differences among T cell populations in tissue distribution, expression profile, cell proliferation, migration capacity, and clonal expansion to identify tumor-responsive T cell subsets, including terminally differentiated exhausted T cells and regulatory T cells expressing TNFRSF9; and, by analyzing the TCR sequences of the tumor-responsive T cell subsets, finding T cells of the same type in peripheral blood, which are designated peripheral blood tumor-responsive T cells;
(2) Collecting publicly available single-cell transcriptome data sets and clinical treatment information from melanoma, triple-negative breast cancer, and non-small cell lung cancer patients, and filtering, normalizing, and annotating the data;
(3) Stratifying the single-cell transcriptome data obtained in step (2), randomly dividing it into a training set and a test set, and using the clinical response information as the label of the single-cell transcriptome data;
(4) Training a logistic regression model on the single-cell transcriptome data of the training set, fitting a parameter vector to the gene expression of the sample cells and the corresponding clinical response information to construct the model;
(5) Validating the model with the test set data, and summing and averaging the response probabilities predicted by the model to calculate the corresponding clinical response probability;
(6) Collecting a peripheral blood sample from a non-small cell lung cancer patient receiving PD-1 immunotherapy, sorting out the peripheral blood tumor-responsive T cells, performing high-throughput sequencing to obtain transcriptome information, filtering and normalizing the transcriptome data, inputting it into the logistic regression model, summing and averaging the response probabilities predicted by the model to calculate the corresponding clinical response probability, and predicting the clinical response from that probability.
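The text does not specify the filtering criteria or the normalization used on the transcriptome data. As a minimal sketch only, assuming a detected-gene cutoff and library-size scaling followed by a log transform (a common convention, not confirmed by the source), the step could look like this; `filter_and_normalize`, `min_genes`, and `scale` are hypothetical names and values.

```python
import math

def filter_and_normalize(counts, min_genes=2, scale=1e4):
    """counts maps cell_id -> {gene: raw count}.

    Cells with fewer than `min_genes` detected genes are dropped
    (hypothetical threshold). Remaining cells are scaled to `scale`
    total counts and log1p-transformed (assumed normalization).
    """
    normalized = {}
    for cell, genes in counts.items():
        detected = sum(1 for c in genes.values() if c > 0)
        total = sum(genes.values())
        if detected < min_genes or total == 0:
            continue  # low-quality cell: filtered out
        normalized[cell] = {
            gene: math.log1p(count / total * scale)
            for gene, count in genes.items()
        }
    return normalized

# Toy counts for three of the characteristic genes named in the text.
raw = {
    "cell_a": {"GZMB": 5, "NKG7": 15, "GNLY": 0},
    "cell_b": {"GZMB": 0, "NKG7": 1, "GNLY": 0},  # one detected gene only
}
norm = filter_and_normalize(raw)
```

In this toy input, `cell_b` is removed by the filter and `cell_a` keeps all three genes on a log scale.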
In the invention, the gene expression characteristics of the peripheral blood tumor-responsive T cells in step (1) are similar to those of terminally differentiated memory T cells and are closely related to immunotherapy. The terminally differentiated memory T cells are designated Temra (terminally differentiated effector memory or effector T cells).
Preferably, in step (1), the characteristic genes identified for the tumor-responsive T cell subsets include: CX3CR1, GZMB, GZMH, KLRD1, NKG7, GNLY, and FGFBP2.
Preferably, in step (3), the training set comprises 80% of the single-cell transcriptome data and the test set comprises 20% of the single-cell transcriptome data.
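A minimal sketch of a stratified 80/20 split follows. It assumes the stratification key is the patient identifier, so that each patient contributes cells to both sets in the same proportion; the function name, tuple layout, and seed are illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import defaultdict

def stratified_split(cells, train_fraction=0.8, seed=0):
    """cells is a list of (cell_id, patient_id, label) tuples.

    Cells are grouped by patient, shuffled within each group, and cut at
    `train_fraction`, so every patient appears in both sets.
    """
    by_patient = defaultdict(list)
    for cell in cells:
        by_patient[cell[1]].append(cell)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for patient in sorted(by_patient):
        group = by_patient[patient][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Toy data: 2 patients with 10 cells each; the label is the patient's
# clinical response (1 = responder, 0 = non-responder).
cells = [(f"c{i}", f"P{i % 2}", i % 2) for i in range(20)]
train, test = stratified_split(cells)
```

With 10 cells per patient, each patient contributes 8 cells to the training set and 2 to the test set.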
Preferably, in step (4), the logistic regression model is represented by the following formula:
Figure BDA0004153428020000041
where X is the gene expression, W^T is the parameter vector, and W_0 is the bias parameter.
Preferably, in step (5), the method for calculating the response probability is as follows:
Figure BDA0004153428020000051
where P is the probability of the corresponding clinical response status, x is the feature value, y is the clinical response status, and θ^T is the parameter vector.
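The formula images themselves are not reproduced in this text. As a reading aid only, the standard textbook form of logistic regression written with the symbols defined above is sketched here; the exact expressions in the original figures may differ, and treating θ as the combined parameters (W, W_0) is an assumption.

```latex
z = W^{T} X + W_{0}, \qquad
P(y = 1 \mid x; \theta) = \frac{1}{1 + e^{-\theta^{T} x}}, \qquad
P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)
```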
Preferably, in step (5), the method for calculating the clinical response probability is as follows:
Figure BDA0004153428020000052
where y=1 indicates clinical response, y=0 indicates clinical non-response, n is the number of cells, and response is the predicted value: response > 0 indicates clinical response, response < 0 indicates clinical non-response, and response = 0 is indeterminate and requires retesting.
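The aggregation formula is only available as an image, so the following is a sketch under a stated assumption: the patient-level score is taken to be the mean over the n cells of P(y=1) minus P(y=0), which is one form consistent with the sign convention described above (positive means response, negative means non-response, zero means retest). The function names are hypothetical.

```python
def clinical_response_score(cell_probs):
    """cell_probs holds the per-cell P(y=1) values predicted for one patient.

    Returns the mean of P(y=1) - P(y=0) over all cells (assumed form of
    the aggregation; the source formula is not reproduced in the text).
    """
    n = len(cell_probs)
    if n == 0:
        raise ValueError("at least one cell is required")
    return sum(p - (1.0 - p) for p in cell_probs) / n

def call_response(score):
    """Map the patient-level score to the three outcomes in the text."""
    if score > 0:
        return "clinical response"
    if score < 0:
        return "clinical non-response"
    return "indeterminate, retest"

responder_score = clinical_response_score([0.9, 0.7, 0.4])
tie_score = clinical_response_score([0.5, 0.5])
```

Averaging over cells means a few low-confidence cells do not flip the call when most cells from the same patient point the same way.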
Preferably, in step (6), the high-throughput sequencing to obtain transcriptome information is performed by a method comprising:
lysing the cells and performing reverse transcription, amplifying and fragmenting the cDNA, amplifying and purifying the library after fragmentation, and performing high-throughput sequencing after purification to obtain the transcriptome information.
Preferably, the cDNA is fragmented using a Tn5 tagmentation system in which the final concentration of Tn5 enzyme is 0.001-0.01 μM, for example 0.001 μM, 0.005 μM, or 0.01 μM, and preferably 0.005 μM.
Preferably, the Tn5 tagmentation system also includes 5%-20% dimethylformamide, for example 5%, 10%, 15%, or 20%, and preferably 10%.
Preferably, the pH of the Tn5 tagmentation system is 7.0-8.5, for example 7.0, 7.5, 8.0, or 8.5, and preferably 7.3.
Preferably, the library amplification is performed using an amplification system to which 0.01%-0.012% Tween-20 is added (for example 0.01%, 0.011%, or 0.012%).
Preferably, the purification uses the following bead strategy: 0.7X (retain beads) + 0.6X (retain supernatant) + 0.15X (retain beads) + 0.6X (retain beads).
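The stated ranges for the Tn5 system can be checked mechanically, and the stock volume needed for a target final concentration follows from C1·V1 = C2·V2. The sketch below illustrates both; the 20 µL reaction volume and 0.1 µM working stock are hypothetical values chosen for the arithmetic and do not come from the source.

```python
def out_of_range(tn5_um=0.005, dmf_percent=10.0, ph=7.3):
    """Return the names of parameters outside the ranges stated in the text.

    Defaults are the preferred values: 0.005 uM Tn5, 10% DMF, pH 7.3.
    """
    checks = {
        "tn5_um": 0.001 <= tn5_um <= 0.01,
        "dmf_percent": 5.0 <= dmf_percent <= 20.0,
        "ph": 7.0 <= ph <= 8.5,
    }
    return [name for name, ok in checks.items() if not ok]

def stock_volume_ul(final_um, reaction_ul, stock_um):
    """Volume of stock to add so the reaction reaches `final_um` (C1*V1 = C2*V2)."""
    if stock_um <= final_um:
        raise ValueError("stock must be more concentrated than the target")
    return final_um * reaction_ul / stock_um

preferred_problems = out_of_range()
acidic_problems = out_of_range(ph=6.5)
# Hypothetical volumes: 20 uL reaction, 0.1 uM Tn5 working stock.
volume = stock_volume_ul(0.005, reaction_ul=20.0, stock_um=0.1)
```

Under these assumed volumes, about 1 µL of the working stock would bring the reaction to the preferred 0.005 µM.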
In a second aspect, the present invention provides a system for predicting the response of non-small cell lung cancer to PD-1 immunotherapy based on peripheral blood detection, the system comprising:
a tumor-responsive T cell subset screening module, used to screen tumor-responsive T cell subsets from single-cell transcriptome data of tumor tissue, peripheral blood, and lymph nodes of cancer patients;
a data acquisition module, used to obtain publicly available single-cell transcriptome data sets and clinical treatment information from melanoma, triple-negative breast cancer, and non-small cell lung cancer patients;
a data partitioning module, used to stratify the obtained single-cell transcriptome data and randomly divide it into a training set and a test set, with the clinical response information as the label of the single-cell transcriptome data;
a logistic regression model construction module, used to fit a parameter vector to the gene expression of the sample cells and the corresponding clinical response information to construct a logistic regression model;
a logistic regression model validation module, used to validate the resulting logistic regression model, and to sum and average the response probabilities predicted by the model to calculate the clinical response probability;
a clinical response prediction module, used to filter and normalize the transcriptome information of the peripheral blood tumor-responsive T cells of the person to be predicted, input it into the logistic regression model, sum and average the response probabilities predicted by the model to calculate the clinical response probability, and predict the clinical response from that probability.
Preferably, in the tumor-responsive T cell subset screening module, the tumor-responsive T cell subsets are screened by the following method: collecting single-cell transcriptome data from tumor tissue, peripheral blood, and lymph nodes of cancer patients, and filtering and normalizing the data; analyzing differences among T cell populations in tissue distribution, expression profile, cell proliferation, migration capacity, and clonal expansion to identify tumor-responsive T cell subsets, including terminally differentiated exhausted T cells and regulatory T cells expressing TNFRSF9; and, by analyzing the TCR sequences of the tumor-responsive T cell subsets, finding T cells of the same type in peripheral blood, which are designated peripheral blood tumor-responsive T cells.
In the invention, the gene expression characteristics of the selected peripheral blood tumor-responsive T cells are similar to those of terminally differentiated memory T cells and are closely related to immunotherapy. The terminally differentiated memory T cells are designated Temra (terminally differentiated effector memory or effector T cells).
Preferably, the characteristic genes identified for the tumor-responsive T cell subsets include: CX3CR1, GZMB, GZMH, KLRD1, NKG7, GNLY, and FGFBP2.
Preferably, the data acquisition module also performs data filtering, normalization, and annotation after obtaining the data.
Preferably, in the data partitioning module, the training set comprises 80% of the single-cell transcriptome data and the test set comprises 20% of the single-cell transcriptome data.
Preferably, the logistic regression model is represented by the following formula:
Figure BDA0004153428020000071
where X is the gene expression, W^T is the parameter vector, and W_0 is the bias parameter.
Preferably, the calculation method of the response probability is as follows:
Figure BDA0004153428020000072
where P is the probability of the corresponding clinical response status, x is the feature value, y is the clinical response status, and θ^T is the parameter vector.
Preferably, the calculation method of the clinical response probability is as follows:
Figure BDA0004153428020000073
where y=1 indicates clinical response, y=0 indicates clinical non-response, n is the number of cells, and response is the predicted value: response > 0 indicates clinical response, response < 0 indicates clinical non-response, and response = 0 is indeterminate and requires retesting.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the computer program is executed.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of the first aspect.
The numerical ranges recited herein include not only the point values listed above but also any unlisted point values within those ranges; for reasons of space and brevity, the invention does not exhaustively list the specific point values each range includes.
Compared with the prior art, the invention has the following beneficial effects:
(1) Based on large-cohort single-cell omics data, the invention identifies the cell subtypes and characteristic genes in peripheral blood that are related to tumor immunotherapy, with clear statistical significance.
(2) The invention establishes a machine-learning prediction model whose accuracy and stability in predicting treatment response can be improved by continuously accumulating data and iterating the model.
(3) The experimental method of the invention is based on SMART-seq full-length transcriptome amplification, which detects more genes and therefore gives higher detection resolution.
(4) The invention optimizes the transcriptome library construction workflow so that experiments succeed with starting amounts of 1-1000 cells, widening the range of application of the technique.
(5) The invention uses conventional transcriptome sequencing (bulk RNA-seq) rather than single-cell transcriptome sequencing, which greatly reduces experimental workload and cost and favors clinical practice and wider adoption.
(6) The invention collects a pre-treatment peripheral blood sample from the immunotherapy patient and uses no more than 4 milliliters (mL) for detection; sampling is convenient and quick and causes little harm to the patient, which favors clinical practice and wider adoption.
Drawings
FIG. 1 is a flow chart of machine-learning prediction model construction based on single-cell data and of companion diagnosis for lung cancer immunotherapy;
FIG. 2 shows transcriptome library information at different Tn5 enzyme concentrations;
FIG. 3 shows transcriptome library information at different buffer pH values;
FIG. 4 shows the purification results before and after adjustment of the magnetic bead selection strategy;
FIG. 5 shows the full-length transcriptome library construction results at a starting amount of 10 cells;
FIG. 6 shows the full-length transcriptome library construction results at a starting amount of 100 cells;
FIG. 7 shows the full-length transcriptome library construction results at a starting amount of 200 cells;
FIG. 8 shows the full-length transcriptome library construction results at a starting amount of 300 cells;
FIG. 9 shows the full-length transcriptome library construction results at a starting amount of 1000 cells;
FIG. 10 shows the peripheral-blood-based prediction of immunotherapy response in lung cancer patients.
Detailed Description
The technical scheme of the invention is further described through the following specific embodiments. Those skilled in the art will understand that the examples only aid understanding of the invention and should not be construed as specifically limiting it.
Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature of the field or the product specifications. Reagents or instruments for which no manufacturer is indicated are conventional products commercially available through regular channels.
The materials and solution formulation methods used in the following embodiments are as follows:
Example 1
To predict the drug response of non-small cell lung cancer patients receiving PD-1 immunotherapy, this example builds a prediction model based on immunotherapy transcriptome data and develops a peripheral blood sequencing technique based on the SMART-seq full-length transcriptome.
1. To improve the accuracy of model building, single-cell data were collected and curated:
(1) Using unified filtering criteria and data normalization, single-cell transcriptome data from tumor tissue, peripheral blood, and lymph nodes of more than 300 patients across 21 cancer types were integrated; differences among T cell populations in tissue distribution, expression profile, cell proliferation, migration capacity, and clonal expansion were analyzed, and tumor-responsive T cell subsets were identified from them, including terminally differentiated exhausted T cells and regulatory T cells expressing TNFRSF9; by analyzing the TCR sequences of the tumor-responsive T cell subsets, T cells of the same type were found in peripheral blood and designated peripheral blood tumor-responsive T cells.
The gene expression characteristics of the peripheral blood tumor-responsive T cells are similar to those of terminally differentiated memory T cells and are closely related to immunotherapy.
(2) Based on the latest single-cell immunotherapy research, publicly available single-cell data sets and clinical treatment information from a further 38 cancer patients (covering melanoma, triple-negative breast cancer, and non-small cell lung cancer) were collected; after data filtering, normalization, and annotation, the analysis focused on the T cell populations.
(3) The single-cell data were stratified by patient ID and randomly divided at a ratio of 4:1 into a training set (80% of the single-cell data) and a test set (20% of the single-cell data), with the clinical response information as the label of the single-cell data.
(3) Selecting a default parameter training logistic regression model based on single cell data of a training set by using Python3.9, firstly fitting a parameter vector according to the gene expression condition of a sample cell and corresponding clinical response information, wherein the logistic regression model is shown in a formula 1:
Figure BDA0004153428020000101
wherein X is the gene expression, W T As parameter vector, W 0 Is a bias parameter.
(4) And verifying the model obtained by training by adopting test set data, and adding and averaging single cell response probabilities obtained by model prediction to calculate clinical response probabilities of corresponding patients, wherein the clinical response probabilities of the patients are predicted according to the clinical response probabilities, and the clinical response probabilities are specifically shown as a formula 2 and a formula 3:
Figure BDA0004153428020000111
Figure BDA0004153428020000112
where, in formula 2, P is the probability of the corresponding clinical response status, x is the feature value, y is the clinical response status, and θ^T is the parameter vector.
In formula 3, y=1 denotes clinical response, y=0 denotes clinical non-response, n denotes the number of cells, and response is the predicted value: response > 0 indicates clinical response, response < 0 indicates clinical non-response, and response = 0 is indeterminate and requires re-testing.
(5) Validated on single-cell immunotherapy data for melanoma, triple-negative breast cancer and non-small cell lung cancer, this model construction strategy effectively predicts the clinical response of patients, with a prediction accuracy above 90%.
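The patient-stratified 4:1 division described in step (3) can be sketched as follows. This is a minimal illustration, not the applicant's code: the function name `stratified_split`, the tuple layout of each cell record and the toy patients are all assumptions, and the source does not state which library or random seed was used.

```python
import random
from collections import defaultdict


def stratified_split(cells, train_fraction=0.8, seed=0):
    """Split cells 4:1 within each patient so both sets cover every patient.

    `cells` is a list of (patient_id, expression_vector, label) tuples, where
    label is 1 for clinical response and 0 for non-response (assumed layout).
    """
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for cell in cells:
        by_patient[cell[0]].append(cell)

    train, test = [], []
    for patient_id in sorted(by_patient):
        group = by_patient[patient_id][:]
        rng.shuffle(group)  # random assignment inside each patient stratum
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Toy data (hypothetical): 3 patients with 10 cells each.
toy_cells = [
    (f"P{p}", [float(p), float(c)], p % 2) for p in range(3) for c in range(10)
]
train_set, test_set = stratified_split(toy_cells)
print(len(train_set), len(test_set))
```

Splitting inside each patient stratum keeps every patient represented in both sets, which matches the 80%/20% proportions given in the text.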
A flow chart of the single-cell-data-based machine learning prediction model construction and lung cancer immunotherapy companion diagnosis is shown in FIG. 1, in which the left panel is the prediction model construction flow and the right panel is the clinical sample processing and therapy response prediction flow.
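The per-cell scoring and patient-level averaging of step (4) can be sketched as below. Formulas 1 to 3 are only available as images in this text, so the forms used here are assumptions: a standard logistic function of W^T X + W_0 for each cell, and a patient score equal to the mean probability of response minus the mean probability of non-response, which is positive for a predicted response as the text requires. The weights, bias and cell vectors are hypothetical.

```python
import math


def cell_response_probability(expression, weights, bias):
    """Logistic probability that one cell comes from a responder (assumed form)."""
    z = bias + sum(w * x for w, x in zip(weights, expression))
    return 1.0 / (1.0 + math.exp(-z))


def patient_response_score(cell_vectors, weights, bias):
    """Average per-cell probabilities; return (mean P(y=1), score).

    ASSUMPTION: score = mean P(y=1) - mean P(y=0), so score > 0 means response.
    """
    probs = [cell_response_probability(x, weights, bias) for x in cell_vectors]
    p_response = sum(probs) / len(probs)
    score = p_response - (1.0 - p_response)
    return p_response, score


def classify(score):
    """Map the score to the three outcomes named in the text."""
    if score > 0:
        return "response"
    if score < 0:
        return "non-response"
    return "indeterminate"


# Hypothetical two-gene model and three cells from one patient.
weights = [1.0, -1.0]
bias = 0.0
cells = [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
p, s = patient_response_score(cells, weights, bias)
print(round(p, 3), classify(s))
```

Averaging over many cells is what makes the patient-level call less sensitive to any single misclassified cell.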
2. In the development stage of the peripheral blood sequencing technology:
(1) Peripheral blood samples were collected from non-small cell lung cancer patients receiving PD-1 immunotherapy, and PBMC cells were extracted from them (sample size required: no more than 4 mL of fresh blood, or 10^6 cryopreserved PBMC cells).
(2) CD3+ T cells in the PBMCs were labeled with fluorescent antibodies, and specific T cell subsets were obtained by flow cytometric sorting (namely CD3+CX3CR1+ peripheral-blood tumor-responsive T cells and, as background, CD3+CX3CR1- non-peripheral-blood-tumor-responsive T cells). To ensure the stability of the experimental results, three tubes were collected for each cell subset as technical replicates.
(3) According to the literature, the prototype SMART-seq full-length transcriptome technique is mainly used to construct next-generation sequencing libraries from single cells. In this embodiment, the experimental reagents and steps were adjusted so that it is suitable for cell processing and library construction on the order of 1 to 10^3 cells. The process comprises three main steps: cell lysis and reverse transcription, cDNA amplification and fragmentation, and library preparation and purification.
(4) Cell lysis and reverse transcription follow routine experimental procedures. To better achieve the subsequent cDNA fragmentation and library preparation, conditions were optimized in the key steps, including the Tn5 cleavage system, the reaction buffer and the library purification strategy.
(5) To improve experimental efficiency, different final reaction concentrations of the Tn5 enzyme complex were tested (0.001 μM to 0.01 μM). Transcriptome library information at the different Tn5 enzyme concentrations is shown in FIG. 2, and the summary statistics are given in Table 1. The results indicate that the library concentration and total yield are highest at a Tn5 concentration of 0.005 μM, so 0.005 μM was chosen as the optimal condition. Compared with a commercial kit, the proportion of small fragments in the library is reduced, library quality is improved, and experimental cost is markedly reduced.
TABLE 1
Sample | Tn5 enzyme concentration | Library concentration | Total library yield
1 | 0.01 μM | 8.76 ng/μL | 87.6 ng
2 | 0.0075 μM | 10.64 ng/μL | 106.4 ng
3 | 0.005 μM | 12.40 ng/μL | 124.0 ng
4 | 0.004 μM | 10.50 ng/μL | 105.0 ng
5 | 0.003 μM | 6.62 ng/μL | 66.2 ng
6 | 0.001 μM | 1.28 ng/μL | 12.8 ng
(6) To improve the quality and stability of the experimental results, the pH of the reaction buffer (7.0-8.5) and the concentration of dimethylformamide (DMF, 0%-20%) were adjusted. The combination of pH 7.3 and 10% DMF was finally determined to be optimal, giving the best cDNA fragmentation, a higher library concentration and reliable quality. Transcriptome library information at the different buffer pH values is shown in FIG. 3, and the summary statistics are given in Table 2; the results indicate that the library concentration and total yield are highest at pH 7.3. In addition, 0.01% Tween-20 was added in the library amplification step after fragmentation, which neutralizes the effect of SDS and improves the efficiency of the amplification enzyme.
TABLE 2
Sample | pH | Library concentration | Total library yield
1 | 7.2 | 7.86 ng/μL | 78.6 ng
2 | 7.3 | 10.30 ng/μL | 103.0 ng
3 | 8.4 | 4.68 ng/μL | 46.8 ng
(7) To improve the quality of the sequencing data, the bead-based size selection in the library purification process was changed to a strategy of 0.7X (retain beads) + 0.6X (retain supernatant) + 0.15X (retain beads) + 0.6X (retain beads). This scheme effectively removes small fragments, so that the library size distribution is more concentrated and data quality is improved. The purification results before and after adjusting the bead selection strategy are shown in FIG. 4: with the adjusted strategy and one additional purification step, the proportion of small fragments is effectively reduced and library quality is improved. The left panel shows the original literature method, 1X (retain beads) + 0.6X (retain supernatant) + 0.15X (retain beads), which consumes more beads and has difficulty removing small fragments completely; the right panel shows the purification result after the adjustment, in which small fragments around 200 bp are markedly reduced.
(8) Transcriptome information for the corresponding cells was then obtained by high-throughput sequencing on the Illumina NovaSeq 6000 (PE150).
(9) The transcriptome data were filtered and normalized and then input into the prediction model to obtain the treatment response probability of the corresponding patient.
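The filtering and normalization in step (9) could look like the sketch below. The source does not specify the filter thresholds or the normalization scheme, so everything here is an assumption: samples with too few detected genes are dropped, counts are scaled to a fixed library size of 10,000 and then log-transformed, a convention common in transcriptome analysis. The function name, parameters and toy counts are hypothetical.

```python
import math


def filter_and_normalize(counts, min_genes=2, scale=10_000):
    """Drop low-complexity samples, then library-size normalize and log-transform.

    `counts` maps a sample name to a {gene: raw_count} dict (assumed layout).
    ASSUMPTION: min_genes and scale are illustrative, not taken from the source.
    """
    normalized = {}
    for sample, gene_counts in counts.items():
        detected = sum(1 for c in gene_counts.values() if c > 0)
        total = sum(gene_counts.values())
        if detected < min_genes or total == 0:
            continue  # sample fails the quality filter
        normalized[sample] = {
            gene: math.log1p(c / total * scale) for gene, c in gene_counts.items()
        }
    return normalized


# Hypothetical raw counts for two sorted-cell tubes.
toy = {
    "tube_1": {"CX3CR1": 30, "GZMB": 50, "NKG7": 20},
    "tube_2": {"CX3CR1": 0, "GZMB": 1, "NKG7": 0},
}
result = filter_and_normalize(toy)
print(sorted(result))
```

Whatever scheme is used, it has to match the one applied to the training data, since the fitted parameter vector is only meaningful on the same expression scale.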
In conclusion, the invention makes full use of immunotherapy single-cell sequencing data to establish a robust prediction model, and accurately obtains transcriptome information of key peripheral blood cells by a full-length transcriptome sequencing technique:
(1) The invention rests on a solid theoretical basis and abundant data: it integrates single-cell transcriptome data from more than 300 patients across 21 cancer types and identifies the cell type and gene expression signature in peripheral blood associated with immunotherapy response, namely peripheral-blood tumor-responsive T cells, whose characteristic genes include CX3CR1, GZMB, GZMH, KLRD1, NKG7, GNLY and FGFBP2, among others;
(2) The invention collects and integrates immunotherapy single-cell datasets from an additional 38 cancer patients and, for the CD8+ T lymphocytes therein, establishes a robust machine learning prediction model by logistic regression; subsequent validation with the test-set data shows an overall prediction accuracy above 90%;
(3) The experimental method requires only 2-4 milliliters (mL) of peripheral blood or 10^6 cryopreserved PBMC cells, from which the specific peripheral-blood tumor-responsive T cells are enriched by antibody labeling and flow sorting; this is less than the sample amount required for tumor mutational burden testing or other diagnostic methods;
(4) Unlike existing single-cell methods, the invention improves the SMART-seq full-length transcriptome library construction process so that the input can range from 1 to 1000 cells, with an overall experimental success rate of not less than 95%. The results are shown in FIG. 5, FIG. 6, FIG. 7, FIG. 8 and FIG. 9: the specific T cell populations were obtained by flow sorting and used for transcriptome library construction, and library fragment analysis shows a clear signal peak in the 400-600 range. The SMART-seq method thus detects more genes and improves data accuracy, while basing the result on transcriptome data from multiple cells ensures its stability;
(5) The sequencing library construction process uses an optimized and improved enzymatic fragmentation system (comprising the enzyme complex and the reaction buffer system), which markedly improves library quality and markedly reduces experimental cost;
(6) More than 50 pre-treatment blood samples from non-small cell lung cancer patients receiving PD-1 immunotherapy have been collected, of which 16 patients currently have paired clinical follow-up information. Tested with this method, the overall prediction accuracy exceeds 80%. The results are shown in FIG. 10, where R denotes clinical treatment response and non-R denotes clinical treatment non-response; data points above 0 on the y-axis are predicted by the model as response, and data points below 0 as non-response. The number of data points corresponds to the number of computational simulations, and the results shown are for 10 simulations. The current overall prediction accuracy is 81.3%, demonstrating that the method is effective in practical application.
Based on these features, the invention can give a relatively accurate prediction of a patient's post-treatment response from a small pre-treatment peripheral blood sample.
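As a check on the figures reported in item (6), the overall accuracy is simply the fraction of patients whose predicted label matches the clinical follow-up. The sketch below is illustrative only: the source gives 16 patients and 81.3% but not the per-patient labels, so the R/non-R lists are hypothetical and were chosen so that 13 of 16 predictions agree, which is the count consistent with the reported percentage.

```python
def prediction_accuracy(predicted, observed):
    """Fraction of patients whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Hypothetical labels for 16 patients (the real per-patient data are not given).
observed = ["R"] * 8 + ["non-R"] * 8
predicted = ["R"] * 7 + ["non-R"] + ["non-R"] * 6 + ["R"] * 2

accuracy = prediction_accuracy(predicted, observed)
print(accuracy)
```

With 16 patients each single misclassification moves the accuracy by 6.25 percentage points, so the estimate is still coarse at this cohort size.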
The applicant declares that the above are only specific embodiments of the present invention and that the scope of protection of the present invention is not limited thereto. It should be apparent to those skilled in the art that any change or substitution readily conceivable within the technical scope disclosed by the present invention falls within the scope of protection and disclosure of the present invention.

Claims (10)

1. A method of predicting a non-small cell lung cancer PD-1 immunotherapeutic response for the purposes of non-disease diagnosis or treatment, the method comprising:
(1) Collecting single-cell transcriptome data of tumor tissue, peripheral blood and lymph nodes of cancer patients, and filtering and standardizing the single-cell transcriptome data; analyzing the differences in tissue distribution, expression profile, cell proliferation, migration capacity and clonal expansion among T cell populations to identify tumor-responsive T cell subsets, including terminally differentiated exhausted T cells and regulatory T cells expressing TNFRSF9; finding T cells of the same type in peripheral blood by analyzing the TCR sequences of the tumor-responsive T cell subsets, and designating them peripheral-blood tumor-responsive T cells;
(2) Collecting publicly available single-cell transcriptome datasets and clinical treatment information of melanoma, triple-negative breast cancer and non-small cell lung cancer patients, and performing data filtering, standardization and annotation;
(3) Stratifying the single-cell transcriptome data obtained in step (2), randomly dividing it into a training set and a test set, and using the clinical response information as the label of the single-cell transcriptome data;
(4) Training a logistic regression model on the single-cell transcriptome data of the training set, fitting a parameter vector from the gene expression of the sample cells and the corresponding clinical response information, and thereby constructing the logistic regression model;
(5) Validating the logistic regression model with the test-set data, and summing and averaging the response probabilities predicted by the model to calculate the corresponding clinical response probability;
(6) Collecting a peripheral blood sample from a non-small cell lung cancer patient receiving PD-1 immunotherapy, sorting the peripheral-blood tumor-responsive T cells, performing high-throughput sequencing to obtain transcriptome information, filtering and standardizing the transcriptome data, inputting them into the logistic regression model, summing and averaging the response probabilities predicted by the model to calculate the corresponding clinical response probability, and predicting the clinical response according to the clinical response probability.
2. The method of predicting the response of non-small cell lung cancer PD-1 immunotherapy for non-disease diagnosis or treatment according to claim 1, wherein in step (1), the characteristic genes of the identified tumor-responsive T cell subset comprise: CX3CR1, GZMB, GZMH, KLRD1, NKG7, GNLY, and FGFBP2.
3. The method of predicting the response of non-small cell lung cancer PD-1 immunotherapy for non-disease diagnosis or treatment according to claim 1 or 2, wherein in step (3), the training set comprises 80% single cell transcriptome data and the test set comprises 20% single cell transcriptome data.
4. The method of predicting the PD-1 immunotherapy response of non-small cell lung cancer for the purpose of non-disease diagnosis or treatment according to any one of claims 1-3, wherein in step (4), the logistic regression model is represented by the following formula:
Figure FDA0004153428010000021
where X is the gene expression, W^T is the parameter vector, and W_0 is the bias parameter.
5. The method of predicting the response of non-small cell lung cancer PD-1 immunotherapy for the purpose of non-disease diagnosis or treatment according to any one of claims 1 to 4, wherein in step (5), the method of calculating the response probability is represented by the following formula:
Figure FDA0004153428010000022
where P is the probability of the corresponding clinical response status, x is the feature value, y is the clinical response status, and θ^T is the parameter vector;
preferably, in step (5), the method for calculating the clinical response probability is as follows:
Figure FDA0004153428010000023
where y=1 denotes clinical response, y=0 denotes clinical non-response, n denotes the number of cells, and response is the predicted value: response > 0 indicates clinical response, response < 0 indicates clinical non-response, and response = 0 is indeterminate and requires re-testing.
6. The method of predicting the response to non-small cell lung cancer PD-1 immunotherapy for the purpose of non-disease diagnosis or treatment according to any one of claims 2 to 5, wherein in step (6), the high throughput sequencing obtaining transcriptome information is performed by a method comprising the steps of:
the cells are lysed and reverse transcribed, the cDNA is amplified and fragmented, the library is amplified and purified after fragmentation, and high-throughput sequencing is performed after purification to obtain the transcriptome information;
preferably, the cDNA is fragmented using a Tn5 cleavage system in which the final concentration of Tn5 enzyme is 0.001-0.01 μM, preferably 0.005 μM;
preferably, the Tn5 enzyme cutting system also comprises 5% -20% of dimethylformamide, preferably 10% of dimethylformamide;
preferably, the pH value of the Tn5 enzyme cutting system is 7.0-8.5, preferably 7.3;
preferably, the library amplification is performed using an amplification system with 0.01-0.012% Tween-20 added;
preferably, the purification uses the following strategy: 0.7X (retain beads) + 0.6X (retain supernatant) + 0.15X (retain beads) + 0.6X (retain beads).
7. A non-small cell lung cancer PD-1 immunotherapeutic response prediction system based on peripheral blood detection, the system comprising:
a tumor-responsive T cell subpopulation screening module for screening a tumor-responsive T cell subpopulation from single cell transcriptome data of tumor tissue, peripheral blood, and lymph nodes of a cancer patient;
the data acquisition module is used for acquiring a single-cell transcriptome data set and clinical treatment information disclosed by the existing melanoma, triple-negative breast and non-small cell lung cancer patients;
the data dividing module is used for layering the obtained single-cell transcriptome data and randomly dividing a training set and a testing set, and taking clinical response information as a single-cell transcriptome data tag;
the logistic regression model construction module is used for fitting the parameter vector according to the gene expression condition of the sample cell and the corresponding clinical response information to construct a logistic regression model;
the logistic regression model verification module is used for verifying the obtained logistic regression model, and calculating the clinical response probability by adding and averaging the response probability obtained by model prediction;
the clinical response prediction module is used for filtering and standardizing the transcriptome information of the peripheral-blood tumor-responsive T cells of the person to be predicted, inputting it into the logistic regression model, and summing and averaging the response probabilities predicted by the model to calculate the clinical response probability, so as to predict the clinical response according to the clinical response probability.
8. The peripheral blood detection-based non-small cell lung cancer PD-1 immunotherapeutic response prediction system of claim 7, wherein the logistic regression model is represented by the formula:
Figure FDA0004153428010000041
where X is the gene expression, W^T is the parameter vector, and W_0 is the bias parameter;
preferably, the calculation method of the response probability is as follows:
Figure FDA0004153428010000042
where P is the probability of the corresponding clinical response status, x is the feature value, y is the clinical response status, and θ^T is the parameter vector;
preferably, the calculation method of the clinical response probability is as follows:
Figure FDA0004153428010000043
where y=1 denotes clinical response, y=0 denotes clinical non-response, n denotes the number of cells, and response is the predicted value: response > 0 indicates clinical response, response < 0 indicates clinical non-response, and response = 0 is indeterminate and requires re-testing.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-5 when the computer program is executed by the processor.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310326560.5A CN116334225A (en) 2023-03-30 2023-03-30 Non-small cell lung cancer PD-1 immune therapy response prediction method for non-disease diagnosis or treatment


Publications (1)

Publication Number Publication Date
CN116334225A true CN116334225A (en) 2023-06-27

Family

ID=86891064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310326560.5A Pending CN116334225A (en) 2023-03-30 2023-03-30 Non-small cell lung cancer PD-1 immune therapy response prediction method for non-disease diagnosis or treatment

Country Status (1)

Country Link
CN (1) CN116334225A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117290817A (en) * 2023-10-18 2023-12-26 浙江省立同德医院(浙江省精神卫生研究院) Marker combination of product for relieving hypercoagulability state of lung cancer, method for establishing and applying curative effect discrimination model and traditional Chinese medicine combination
CN119170087A (en) * 2024-11-20 2024-12-20 北京大学人民医院 Predictive model and construction method for predicting prognosis after neoadjuvant immunotherapy by metastatic lymph nodes
CN119170087B (en) * 2024-11-20 2025-02-18 北京大学人民医院 Predictive model for predicting prognosis after neoadjuvant immunotherapy through metastatic lymph node and construction method thereof


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination