
CN119716028A - A blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, and its screening method and application - Google Patents

A blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, and its screening method and application

Info

Publication number
CN119716028A
Authority
CN
China
Prior art keywords
ratio
lung
early
lung cancer
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411832721.9A
Other languages
Chinese (zh)
Inventor
李琰
何芸
崔惠娜
于书棋
黄桂红
史菲
文赫
韦青燕
胡隽源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Metanotitia Inc
Original Assignee
Metanotitia Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Metanotitia Inc
Priority to CN202411832721.9A
Publication of CN119716028A

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, together with its screening method and application, and belongs to the technical field of in vitro diagnosis. By analyzing clinical indexes and metabolomics data from blood samples of patients with non-cancerous lung diseases and patients with early lung cancer, the invention identifies a combination of 27 specific lung cancer-related blood markers. The combination can be used to develop a noninvasive, highly accurate diagnostic product for distinguishing non-cancerous lung diseases from early lung cancer, and it provides effective clinical strategies and guidance for the early detection, early diagnosis and early intervention of lung cancer.

Description

Blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, and screening method and application thereof
Technical Field
The invention belongs to the technical field of in vitro diagnosis, and in particular relates to a blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, and to a screening method and application thereof.
Background
Lung cancer is a malignant tumor. Most patients have no obvious clinical symptoms in the early stages of the disease, and about 75% are already at an advanced stage when diagnosed. Without timely surgical intervention, patients miss the optimal treatment window, which is also why the 5-year postoperative survival rate of most patients with advanced lung cancer is low. Early diagnosis and early treatment are therefore important for improving the prognosis and quality of life of lung cancer patients.
At present, the techniques commonly used for lung cancer diagnosis mainly include imaging examination, sputum cytology, pathological examination and serum tumor markers. Imaging examination such as low-dose CT (LDCT) is the most commonly used technique, but because LDCT images lack sufficient resolution it is suitable only for primary screening and detection of lung nodules, and fewer than 10% of the suspicious nodules detected by LDCT are finally diagnosed as lung cancer. Sputum cytology is one of the simplest and most convenient noninvasive methods for diagnosing central lung cancer, but it carries a risk of false positives and false negatives and makes typing difficult. Pathological examination is regarded as the gold standard for the clinical diagnosis of many cancers, but it is relatively invasive, harms the patient's body and is expensive, so it is unsuitable for early screening of lung cancer in large populations. The primary lung cancer serum markers in common clinical use are carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), cytokeratin 19 fragment (CYFRA 21-1), pro-gastrin-releasing peptide (Pro-GRP) and squamous cell carcinoma antigen (SCC). NSE and Pro-GRP are associated with small cell lung cancer, and an elevated CEA level helps in diagnosing lung adenocarcinoma. However, the sensitivity and specificity of serum markers for early lung cancer are not ideal.
Because existing techniques for diagnosing and screening early lung cancer have these limitations, a noninvasive, highly accurate early screening method is urgently needed to further improve the survival rate of lung cancer patients; such a method is of great significance for improving treatment outcomes and patient prognosis.
Disclosure of Invention
The invention aims to provide a blood biomarker combination for distinguishing non-cancerous lung diseases from early lung cancer, together with its screening method and application, thereby providing noninvasive, highly accurate blood biomarkers that can be used for diagnosing and screening non-cancerous lung diseases and early lung cancer.
The invention provides a blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, which comprises 13 clinical indexes and 14 metabolome features;
The clinical indexes comprise the following ratios: monocyte ratio to potassium concentration; monocyte ratio to activated partial thromboplastin time; erythrocyte count to activated partial thromboplastin time; potassium concentration to international normalized ratio; potassium concentration to thrombin time; activated partial thromboplastin time to thrombin time; carcinoembryonic antigen concentration to serum globulin concentration; international normalized ratio to activated partial thromboplastin time; carcinoembryonic antigen concentration to potassium concentration; carcinoembryonic antigen concentration to prothrombin time; carcinoembryonic antigen concentration to activated partial thromboplastin time; carcinoembryonic antigen concentration to lactate dehydrogenase concentration; erythrocyte count to potassium concentration;
The metabolome features comprise the following peak intensity ratios: phosphatidylethanolamine 38:3p to sebacic acid; phosphatidylinositol 38:3 to sebacic acid; sphingomyelin D38:5 to sebacic acid; phosphatidylethanolamine 38:5 to L-prolyl-L-threonine; gluconolactone to hexadecanedioic acid; phosphatidylcholine 38:5 to sebacic acid; gluconolactone to trans-cyclohexane-1,2-dicarboxylic acid; phosphatidylcholine 38:6e to trans-cyclohexane-1,2-dicarboxylic acid; phosphatidylethanolamine 36:1p to trans-cyclohexane-1,2-dicarboxylic acid; phosphatidylethanolamine 38:5 to nerolidol; phosphatidylserine 38:0 to sebacic acid; dimethyl phosphatidylethanolamine 38:4 to sebacic acid; phosphatidylethanolamine 38:6e to D-glucosamine-6-phosphate.
The invention also provides use of a reagent or kit for detecting the blood marker combination described above in the preparation of a product for distinguishing non-cancerous lung diseases from early lung cancer.
Preferably, the non-cancerous lung diseases include one or more of pneumonia, tuberculosis and pulmonary infection.
The invention also provides use of a reagent or kit for detecting the blood marker combination described above in the preparation of a lung disease diagnostic product.
Preferably, the reagent or kit for detecting the blood marker combination described above includes liquid chromatography-mass spectrometry detection reagents.
The invention also provides a screening method for a blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, which comprises the following steps:
collecting plasma samples from patients with non-cancerous lung diseases and from patients with early lung cancer, respectively;
separating an organic phase and an aqueous phase from each plasma sample, wherein the organic phase contains the nonpolar metabolites and the aqueous phase contains the polar metabolites;
detecting the polar metabolites in the aqueous phase and the nonpolar metabolites in the organic phase separately by liquid chromatography-mass spectrometry to obtain metabolome detection data for the plasma samples of both patient groups;
preprocessing the metabolome detection data and identifying the metabolites to obtain metabolomics data;
collecting clinical indexes of both patient groups to obtain clinical index data;
standardizing the metabolomics data and the clinical index data separately to obtain standardized metabolomics data and standardized clinical index data;
for the standardized metabolomics data and the standardized clinical index data separately, calculating the ratio between every possible pair of variables, ranking the resulting feature pairs by P-value from smallest to largest, selecting the top 100 feature pairs, and forming a new data set of ratio-based clinical indexes and a new data set of ratio-based metabolome features;
performing Pearson correlation analysis followed by Lasso regression feature screening on each new data set to obtain the blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer.
Preferably, the clinical index tests include routine blood tests and blood biochemistry tests.
Preferably, the Pearson correlation analysis comprises extracting all clinical index pairs and metabolome feature pairs whose correlation coefficient R has an absolute value of at least 0.9, then removing features one at a time, starting with the feature that appears in the largest number of such pairs, until the correlation between every remaining pair of clinical indexes and of metabolome features is below the threshold of 0.9.
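The correlation-based pruning described above can be sketched as a greedy loop. This is a minimal illustration and not the applicant's implementation: the function names, the dictionary input format and the alphabetical tie-break between equally frequent features are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature: treat as uncorrelated (assumption)
    return sxy / sqrt(sxx * syy)


def prune_correlated(features, threshold=0.9):
    """features: dict mapping feature name -> list of values (hypothetical format).

    Repeatedly removes the feature that takes part in the most pairs with
    |R| >= threshold until no such pair remains; returns the names kept.
    """
    kept = dict(features)
    while True:
        counts = {}
        for a, b in combinations(kept, 2):
            if abs(pearson_r(kept[a], kept[b])) >= threshold:
                counts[a] = counts.get(a, 0) + 1
                counts[b] = counts.get(b, 0) + 1
        if not counts:
            return list(kept)
        # Tie-break by name so the result is deterministic (assumption).
        worst = max(counts, key=lambda name: (counts[name], name))
        del kept[worst]
```

Recomputing the pair counts after every removal keeps the loop faithful to the "until no pair exceeds the threshold" stopping rule, at the cost of repeated correlation calculations.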
The invention also provides a method for constructing a diagnostic model for non-cancerous lung diseases and early lung cancer, which comprises the following step:
performing multivariate ROC curve analysis on the blood biomarkers obtained by the screening method described above.
Preferably, the multivariate ROC curve analysis comprises selecting 3/4 of the samples in the modeling set as a training set and the remaining 1/4 as a test set, repeating this random split for 1000 iterations with a support vector machine, and constructing the diagnostic model for non-cancerous lung diseases and early lung cancer by averaging the accuracy of the resulting models.
The invention provides a blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer. By analyzing clinical indexes and metabolomics data from blood samples of patients with non-cancerous lung diseases and patients with early lung cancer, the invention identifies a combination of 27 specific lung cancer-related blood markers. The combination can be used to develop a noninvasive, highly accurate diagnostic product for distinguishing non-cancerous lung diseases from early lung cancer, and it provides effective clinical strategies and guidance for the early detection, early diagnosis and early intervention of lung cancer.
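The repeated 3/4 to 1/4 random split with accuracy averaging can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the support vector machine named in the text; that substitution, the function names and the fixed random seed are all assumptions of this sketch.

```python
import random


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT the SVM named in the text): per-class feature means."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def mean_split_accuracy(X, y, n_iter=1000, train_fraction=0.75, seed=0):
    """Average test accuracy over n_iter random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_train = int(round(train_fraction * len(X)))
    accuracies = []
    for _ in range(n_iter):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)
```

In practice the stand-in would be replaced by an SVM from a machine learning library, and the per-iteration predictions could also feed an ROC curve rather than accuracy alone.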
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described here show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 shows the result of the multivariate ROC curve analysis, based on the 27 blood biomarkers, distinguishing the non-cancerous lung disease group from the early lung cancer group in the modeling set;
FIG. 2 shows the result of the multivariate ROC curve analysis, based on the 27 blood biomarkers, distinguishing the non-cancerous lung disease group from the early lung cancer group in the validation set.
Detailed Description
The invention provides a blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, which comprises 13 clinical indexes and 14 metabolome features;
The clinical indexes comprise the following ratios: monocyte ratio to potassium concentration; monocyte ratio to activated partial thromboplastin time; erythrocyte count to activated partial thromboplastin time; potassium concentration to international normalized ratio; potassium concentration to thrombin time; activated partial thromboplastin time to thrombin time; carcinoembryonic antigen concentration to serum globulin concentration; international normalized ratio to activated partial thromboplastin time; carcinoembryonic antigen concentration to potassium concentration; carcinoembryonic antigen concentration to prothrombin time; carcinoembryonic antigen concentration to activated partial thromboplastin time; carcinoembryonic antigen concentration to lactate dehydrogenase concentration; erythrocyte count to potassium concentration;
The metabolome features comprise the following peak intensity ratios: phosphatidylethanolamine 38:3p to sebacic acid; phosphatidylinositol 38:3 to sebacic acid; sphingomyelin D38:5 to sebacic acid; phosphatidylethanolamine 38:5 to L-prolyl-L-threonine; gluconolactone to hexadecanedioic acid; phosphatidylcholine 38:5 to sebacic acid; gluconolactone to trans-cyclohexane-1,2-dicarboxylic acid; phosphatidylcholine 38:6e to trans-cyclohexane-1,2-dicarboxylic acid; phosphatidylethanolamine 36:1p to trans-cyclohexane-1,2-dicarboxylic acid; phosphatidylethanolamine 38:5 to nerolidol; phosphatidylserine 38:0 to sebacic acid; dimethyl phosphatidylethanolamine 38:4 to sebacic acid; phosphatidylethanolamine 38:6e to D-glucosamine-6-phosphate.
In the practice of the invention, the blood marker combination consists of the 13 clinical indexes and 14 metabolome features described above.
In the practice of the invention, the clinical indexes are expressed in the following units: monocyte ratio, %; erythrocyte count, ×10^12/L; potassium concentration, mmol/L; international normalized ratio, unitless; activated partial thromboplastin time, s; thrombin time, s; carcinoembryonic antigen, ng/mL; serum globulin, g/L; prothrombin time, s; lactate dehydrogenase, U/L.
In the practice of the invention, the monocyte ratio is synonymous with the monocyte percentage.
In the practice of the invention, the peak intensity represents the relative concentration level.
The invention also provides use of a reagent or kit for detecting the blood marker combination described above in the preparation of a product for distinguishing non-cancerous lung diseases from early lung cancer.
In the practice of the invention, the non-cancerous lung diseases include one or more of pneumonia, tuberculosis and pulmonary infection.
In the practice of the invention, the product comprises a reagent or a kit.
The invention also provides use of a reagent or kit for detecting the blood marker combination described above in the preparation of a lung disease diagnostic product.
In the practice of the invention, the reagent or kit for detecting the blood marker combination described above includes liquid chromatography-mass spectrometry detection reagents.
In the practice of the invention, the monocyte ratio, potassium concentration, activated partial thromboplastin time, erythrocyte count, international normalized ratio, thrombin time, carcinoembryonic antigen concentration, serum globulin concentration, prothrombin time and lactate dehydrogenase concentration are measured by clinical laboratory testing;
The peak intensities of phosphatidylethanolamine 38:3p, sebacic acid, phosphatidylinositol 38:3, sphingomyelin D38:5, phosphatidylethanolamine 38:5, L-prolyl-L-threonine, gluconolactone, hexadecanoic acid, phosphatidylcholine 38:5, trans-cyclohexane-1,2-dicarboxylic acid, phosphatidylcholine 38:6e, phosphatidylethanolamine 36:1p, nerolidol, phosphatidylserine 38:0, dimethyl phosphatidylethanolamine 38:4 and D-glucosamine-6-phosphate are measured by liquid chromatography-mass spectrometry.
The invention also provides a screening method for a blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, which comprises the following steps:
collecting plasma samples from patients with non-cancerous lung diseases and from patients with early lung cancer, respectively;
separating an organic phase and an aqueous phase from each plasma sample, wherein the organic phase contains the nonpolar metabolites and the aqueous phase contains the polar metabolites;
detecting the polar metabolites in the aqueous phase and the nonpolar metabolites in the organic phase separately by liquid chromatography-mass spectrometry to obtain metabolome detection data for both patient groups;
preprocessing the metabolome detection data and identifying the metabolites to obtain metabolomics data;
collecting clinical indexes of both patient groups to obtain clinical index data;
standardizing the metabolomics data and the clinical index data separately to obtain standardized metabolomics data and standardized clinical index data;
for the standardized metabolomics data and the standardized clinical index data separately, calculating the ratio between every possible pair of variables, ranking the resulting feature pairs by P-value from smallest to largest, selecting the top 100 feature pairs, and forming a new data set of ratio-based clinical indexes and a new data set of ratio-based metabolome features;
performing Pearson correlation analysis followed by Lasso regression feature screening on each new data set to obtain the blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer.
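The pairwise-ratio step can be illustrated with a short sketch. The text does not say which statistical test produces the P-values, so this example assumes a two-sided Mann-Whitney U test with a normal approximation (no tie correction); it also assumes the standardized values are strictly positive so that ratios are defined. The function names and input format are hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt


def mann_whitney_p(a, b):
    """Two-sided p-value via normal approximation (assumed test, no tie correction)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank for tied values
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(a), len(b)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return erfc(abs(z) / sqrt(2))


def top_ratio_pairs(data, labels, top_n=100):
    """data: dict name -> list of positive values; labels: 0/1 per sample.

    Returns up to top_n (p_value, "numerator/denominator") tuples,
    sorted by ascending p-value.
    """
    scored = []
    for num, den in combinations(sorted(data), 2):
        ratio = [x / y for x, y in zip(data[num], data[den])]
        group0 = [r for r, lab in zip(ratio, labels) if lab == 0]
        group1 = [r for r, lab in zip(ratio, labels) if lab == 1]
        scored.append((mann_whitney_p(group0, group1), f"{num}/{den}"))
    scored.sort()
    return scored[:top_n]
```

Each unordered pair is evaluated once here; whether the original method also evaluates the inverse ratio is not stated in the text.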
The invention first collects plasma samples from patients with non-cancerous lung diseases and from patients with early lung cancer.
In the practice of the invention, the inclusion criteria for both patient groups are: (1) age of at least 18 years; (2) for early lung cancer, a diagnosis confirmed by histopathological examination, the gold-standard method, with stage I disease regarded as early lung cancer; and (3) for benign lung disease, a clinical diagnosis based on biopsy or postoperative pathology or on a clinician's comprehensive evaluation, covering patients with pneumonia, tuberculosis, pulmonary infection and similar conditions. The exclusion criteria for both groups are: (1) pregnancy or lactation; (2) emergency status or need for resuscitation; (3) a history of malignancy or any antitumor therapy before sampling; and (4) multiple simultaneous primary malignancies. Patient clinical data: when a subject's plasma sample is collected, information on the subject's basic condition and health status, including basic demographic characteristics, lifestyle, disease history and medication history, is collected by questionnaire. Specific clinical index data (including routine blood tests, blood biochemistry and tumor markers) are collected at the same time for subsequent statistical analysis and model construction.
In the practice of the invention, the plasma is preferably obtained by thawing a frozen plasma sample. The frozen plasma sample is stored at -80 °C. Thawing is preferably performed on ice, followed by vortex mixing for 10 s.
In the practice of the invention, blood is collected, analyzed and stored as follows. Fasting morning blood samples are collected from each patient with a purple-top EDTA anticoagulant tube and a red-top tube. The sample in the purple-top tube is gently inverted several times so that the blood mixes fully with the anticoagulant and does not clot, and is then centrifuged at 3500 rpm and 4 °C for 10 min to separate plasma for routine blood analysis; after analysis, the plasma is stored in a -80 °C freezer for subsequent metabolomics detection. The sample in the red-top tube is left at room temperature for 30 min until the blood has fully clotted, and is then centrifuged at 3500 rpm and 4 °C for 10 min to separate serum for blood biochemistry analysis.
In the invention, an organic phase and an aqueous phase are separated from the plasma samples of patients with non-cancerous lung diseases and patients with early lung cancer, respectively; the organic phase contains the nonpolar metabolites and the aqueous phase contains the polar metabolites.
In the practice of the invention, separating the organic phase and the aqueous phase from the plasma comprises mixing the plasma with a solvent and extracting to obtain an extract, adding a methanol-water mixture to the extract, centrifuging, and separating the upper organic phase from the lower aqueous phase. The volume ratio of plasma to solvent is 1:10; the solvent is a mixture of methyl tert-butyl ether and methanol at a volume ratio of 3:1. For a plasma volume of 100 µL, the volume of the methanol-water mixture is 500 µL, the volume ratio of methanol to water is 3:1, and centrifugation is performed at 12700 rpm for 5 min.
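As a worked example of the volumes implied by these ratios, the short calculation below splits each mixture into its components. It assumes that "1:10" means the solvent volume is ten times the plasma volume; the helper name is hypothetical.

```python
def split_by_ratio(total_ul, parts):
    """Split a total volume (µL) into components according to a volume ratio."""
    whole = sum(parts)
    return [total_ul * p / whole for p in parts]


plasma_ul = 100.0
solvent_ul = plasma_ul * 10  # plasma:solvent = 1:10 (assumed reading)
mtbe_ul, methanol_ul = split_by_ratio(solvent_ul, (3, 1))  # MTBE:methanol = 3:1
wash_methanol_ul, wash_water_ul = split_by_ratio(500.0, (3, 1))  # methanol:water = 3:1

print(f"Extraction solvent: {mtbe_ul:.0f} µL MTBE + {methanol_ul:.0f} µL methanol")
print(f"Methanol-water mixture: {wash_methanol_ul:.0f} µL methanol + {wash_water_ul:.0f} µL water")
```

Under these assumptions, 100 µL of plasma is extracted with 750 µL of methyl tert-butyl ether plus 250 µL of methanol, and the 500 µL methanol-water mixture contains 375 µL of methanol and 125 µL of water.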
After the aqueous phase is obtained, the invention obtains the polar metabolites from it.
In the practice of the invention, the polar metabolites are prepared as follows: the aqueous phase is mixed with ice-cold methanol, shaken, incubated and centrifuged to precipitate protein; the supernatant is collected and dried in a polar-fraction centrifuge tube; mass spectrometry grade water is added to the dried sample for reconstitution; and the sample is vortex mixed, sonicated and centrifuged in sequence to obtain the polar metabolites of the aqueous phase.
In the practice of the invention, the volume ratio of the aqueous phase to ice-cold methanol is 4:11, and the two are mixed in a centrifuge tube by shaking, incubation and centrifugation; the incubation lasts 60 min at -20 °C, the centrifugation lasts 10 min, and drying is performed in a vacuum concentrator (Speed-Vac). The volume ratio of the supernatant to the mass spectrometry grade water used to reconstitute the dried sample is 5:1; after reconstitution the sample is sonicated for 5 min and then centrifuged at 12700 rpm for 5 min.
After the organic phase is obtained, the invention obtains the nonpolar metabolites from it.
In the practice of the invention, the nonpolar metabolites are prepared as follows: the organic phase is dried, reconstituted in an acetonitrile-isopropanol mixture, then vortex mixed, sonicated and centrifuged in sequence, and the supernatant is collected to obtain the nonpolar metabolites of the organic phase.
In the practice of the invention, the volume of the organic phase is 350 µL and the volume of the acetonitrile-isopropanol mixture is 200 µL; drying is performed in a vacuum concentrator (Speed-Vac); the volume ratio of acetonitrile to isopropanol in the mixture is 3:1; sonication lasts 5 min; and centrifugation is performed at 4 °C and 12700 rpm for 5 min.
The invention detects the polar metabolites in the aqueous phase and the nonpolar metabolites in the organic phase separately by liquid chromatography-mass spectrometry to obtain metabolome detection data for patients with non-cancerous lung diseases and patients with early lung cancer.
In the practice of the invention, the liquid chromatography-mass spectrometry is preferably high-resolution liquid chromatography-mass spectrometry (UHPLC-MS).
In the practice of the invention, 180 µL of the polar metabolite solution from the aqueous phase is loaded into a 2 mL glass sample vial and analyzed on the instrument (LC-MS).
In the practice of the invention, the liquid chromatography conditions for detecting the polar metabolites in the aqueous phase are as follows. The column is a Waters ACQUITY HSS T3 column (1.8 µm, 2.1 × 100 mm) on an ACQUITY UPLC I-Class liquid chromatography system (Waters), used for small-molecule separation at a column temperature of 40 °C. Mobile phase A is water containing 0.1% formic acid by volume and mobile phase B is acetonitrile containing 0.1% formic acid by volume, at a flow rate of 0.4 mL/min. The elution gradient, expressed as the volume percentage of mobile phase B, is: 0 min, 1%; 11 min, 40%; 13 min, 70%; 15 min, 99%; 18 min, 99%; 19 min, 1%. The injection volume is 3 µL and the autosampler temperature is 10 °C. The mass spectrometer is a Q-Exactive mass spectrometry system (Thermo Fisher Scientific).
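The polar-metabolite gradient can be written as a small time table and queried for the mobile phase composition at any time point. The sketch below uses the time points read from the paragraph above (0, 11, 13, 15, 18 and 19 min) and assumes linear ramps between them, which the text does not state explicitly; the names are hypothetical.

```python
# (time in min, % mobile phase B) for the polar-metabolite method, as read from the text
POLAR_GRADIENT = [(0.0, 1.0), (11.0, 40.0), (13.0, 70.0),
                  (15.0, 99.0), (18.0, 99.0), (19.0, 1.0)]


def percent_b(t_min, gradient=POLAR_GRADIENT):
    """Volume percentage of mobile phase B at time t_min, assuming linear ramps."""
    if t_min <= gradient[0][0]:
        return gradient[0][1]
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return gradient[-1][1]  # hold the final composition after the last time point


def percent_a(t_min, gradient=POLAR_GRADIENT):
    """Mobile phase A makes up the remainder of the binary mixture."""
    return 100.0 - percent_b(t_min, gradient)
```

For example, halfway through the 11 to 13 min ramp, `percent_b(12.0)` evaluates to 55% B under the linear-ramp assumption.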
In the practice of the invention, the liquid chromatography conditions for detecting the nonpolar metabolites (lipid metabolites) in the organic phase are as follows. The column is a Waters ACQUITY BEH C8 column (1.7 µm, 2.1 × 100 mm) on an ACQUITY UPLC I-Class liquid chromatography system (Waters), used for small-molecule separation at a column temperature of 60 °C. Mobile phase A is water and mobile phase B is an acetonitrile/isopropanol solution (acetonitrile to isopropanol volume ratio of 7:3) containing 0.1% acetic acid and 1% ammonium acetate, at a flow rate of 0.4 mL/min. The elution gradient, expressed as the volume percentage of mobile phase B, is: 0 to 1 min, 55%; 4 min, 75%; 12 min, 89%; 15 min, 100%; 19.5 min, 55%; 24 min, 55%. The injection volume is 2 µL and the autosampler temperature is 10 °C. The mass spectrometer is a Q-Exactive mass spectrometry system (Thermo Fisher Scientific).
In the practice of the invention, the mass spectrometry conditions for detecting the polar metabolites in the aqueous phase and the nonpolar metabolites in the organic phase are as follows. The instrument is a Q-Exactive mass spectrometry system (Thermo Fisher Scientific), an Orbitrap high-resolution mass spectrometer (Thermo Fisher) equipped with an electrospray ionization (ESI) source. Primary and secondary mass spectrometry data are acquired by full scan and data-dependent acquisition. The full-scan mass range is 100-1500 Da; the scan ranges of the secondary scan mode (Full MS/dd-MS2) are 100-310, 300-710 and 700-1500 Da; data are acquired in both positive and negative ionization modes. The automatic gain control (AGC) target is 3E+6, the maximum IT is 200, and the resolution of the primary full scan is 70,000 FWHM (@ 200 m/z); the resolution of the secondary scanning mode (Full MS/dd-MS2) is 1.75, the isolation window is 1.5 m/z, the automatic gain control eV is 1 eV/300-300V, the maximum impact time of the gas sheath is 20-300V, and the gas sheath is 20-300V, and the maximum impact time of the gas sheath is 50-300V.
After the metabolome detection data of the lung non-cancer patients and the early lung cancer patients are obtained, the metabolome detection data of the lung non-cancer patients and the early lung cancer patients are preprocessed and the metabolome detection data of the lung non-cancer patients and the early lung cancer patients are identified, so that metabolome data are obtained.
In the practice of the invention, the preprocessing comprises peak extraction, peak alignment, peak filtering and missing-value imputation. First, peak extraction, peak alignment and peak filtering are performed on the raw mass spectrometry data to reduce systematic interference caused by drift in instrument detection time. The polar metabolite platform and the lipid metabolite platform are processed separately to avoid mutual interference between detection platforms. The retention times of the dataset are calibrated against the instrument's historical detection records. After calibration, filtering criteria are applied to remove interfering peaks, namely (i) isotope peaks, (ii) in-source fragments produced by the ionization method, and (iii) redundant peaks such as additional low-intensity adducts and redundant derivatives of the same analyte, so as to ensure the quality of the dataset; characteristic peaks with more than 50% missing values across all samples are also rejected. Missing values are imputed by multiple imputation by chained equations with random forests (MICE Forest). The processed data matrix serves as the raw data for statistical analysis and may be further normalized and standardized to obtain the final mass spectrometry data matrix.
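As a minimal sketch of the missing-value filter described above, the following Python function drops any feature whose fraction of missing values across samples exceeds 50%. The data layout (a dict mapping a hypothetical feature name to per-sample intensities, with `None` marking a missing peak) is an assumption made for illustration, and the MICE Forest imputation step is not shown.

```python
def filter_missing(features, max_missing_fraction=0.5):
    """Keep features whose share of missing values is at most the cutoff."""
    kept = {}
    for name, values in features.items():
        missing = sum(1 for v in values if v is None)
        if missing / len(values) <= max_missing_fraction:
            kept[name] = values
    return kept


# Hypothetical peak table: feature name -> intensity per sample.
peaks = {
    "feature_A": [1200.0, None, 980.0, 1105.0],  # 25% missing, kept
    "feature_B": [None, None, None, 450.0],      # 75% missing, rejected
}
filtered = filter_missing(peaks)
```

Features that pass this filter would then go on to imputation.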
In the practice of the invention, metabolite identification proceeds as follows. From the raw data, software analysis yields the spectral information of the compound precursor ions and secondary fragment ions, such as the mass-to-charge ratio (m/z) of the primary mass spectrum, the secondary fragment ions and the retention time. Metabolites are annotated by matching this information against the primary and secondary spectra in databases. Commonly used metabolite databases include HMDB (www.hmdb.ca), PubChem (https://pubchem.ncbi.nlm.nih.gov), MassBank (http://www.massbank.jp) and MassBank of North America (https://massbank.us), as well as the lipid databases LIPID MAPS (https://www.lipidmaps.org) and LipidBlast (https://fiehnlab.ucdavis.edu/projects/LipidBlast). Metabolites identified from these databases are finally verified against the retention time, MS1 and MS2 information of reference standards separated under the same chromatographic column and mass spectrometry conditions. The identification criteria are a retention time deviation within 0.1 min and a mass accuracy deviation of less than 10 ppm.
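The two identification criteria can be illustrated with a short Python sketch. The function names and the example m/z and retention-time values are hypothetical; the retention-time tolerance is left as a parameter, with the 0.1 min value from this paragraph as its default.

```python
def ppm_error(observed_mz, reference_mz):
    """Mass accuracy deviation in parts per million."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6


def matches_standard(observed_mz, observed_rt, reference_mz, reference_rt,
                     rt_tolerance_min=0.1, ppm_tolerance=10.0):
    """True when both the retention-time and the mass-accuracy criteria hold."""
    rt_ok = abs(observed_rt - reference_rt) <= rt_tolerance_min
    mass_ok = ppm_error(observed_mz, reference_mz) < ppm_tolerance
    return rt_ok and mass_ok


# Hypothetical reference standard: m/z 201.1132 eluting at 5.47 min.
accepted = matches_standard(201.1140, 5.42, 201.1132, 5.47)  # about 4 ppm
rejected = matches_standard(201.1200, 5.42, 201.1132, 5.47)  # about 34 ppm
```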
Clinical indexes of the lung non-cancer patients and the early lung cancer patients are collected to obtain the clinical index data of both groups. In the practice of the invention, clinical index detection comprises routine blood tests, blood biochemistry, tumor markers and the like.
The invention performs data standardization separately on the metabolomics data and on the clinical index data of the lung non-cancer patients and the early lung cancer patients to obtain standardized metabolomics data and standardized clinical index data.
In the practice of the invention, logarithmic transformation (Log-transformation) and standard score (Z-score) processing are applied to the metabolomics data and to the clinical index data separately. The log transformation compresses the raw data into a narrower interval and eliminates order-of-magnitude differences between samples; the Z-score converts each raw value into a standard score, which makes originally incomparable values comparable. This preprocessing yields two standardized data matrices.
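A minimal Python sketch of the two standardization steps follows. The logarithm base (base 10) and the use of the sample standard deviation are assumptions, since the text does not specify them; the input values are invented.

```python
import math
import statistics


def log_transform(values):
    """Log-transform strictly positive raw values (base 10 assumed)."""
    return [math.log10(v) for v in values]


def z_score(values):
    """Convert values to standard scores: (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]


# Hypothetical intensities spanning three orders of magnitude.
raw = [10.0, 100.0, 1000.0]
standardized = z_score(log_transform(raw))
```

After both steps the three values are evenly spaced around zero, whereas the raw values differ by factors of ten.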
After the standardized metabolomics data and clinical index data are obtained, the invention calculates, separately for each dataset, the ratios between all possible pairs of variables, ranks the pairs by P-value from smallest to largest, selects the top 100 feature pairs, and forms one new dataset from the clinical index ratios and another from the metabolome feature ratios.
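The pairwise-ratio step can be sketched in Python as below. The text does not name the statistical test behind the P-value, so this sketch assumes a two-sided P-value from a normal approximation to Welch's t statistic; the function names, the feature names and the rule of skipping pairs with a zero denominator are likewise assumptions.

```python
import itertools
import math
import statistics


def welch_p_value(group_a, group_b):
    """Two-sided P-value from a normal approximation to Welch's t statistic."""
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    if se == 0:
        return 0.0 if diff != 0 else 1.0
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(diff / se)))


def top_ratio_pairs(features, labels, top_n=100):
    """Rank all two-feature ratios by ascending P-value between the classes."""
    ranked = []
    for a, b in itertools.combinations(sorted(features), 2):
        if any(v == 0 for v in features[b]):
            continue  # ratio undefined for this pair
        ratio = [x / y for x, y in zip(features[a], features[b])]
        cases = [r for r, lab in zip(ratio, labels) if lab == 1]
        controls = [r for r, lab in zip(ratio, labels) if lab == 0]
        ranked.append((welch_p_value(cases, controls), f"{a}/{b}"))
    ranked.sort()
    return [name for _, name in ranked[:top_n]]


# Hypothetical data: 1 = early lung cancer, 0 = lung non-cancerous disease.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "A": [4.0, 4.2, 3.9, 2.0, 2.1, 1.9],
    "B": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    "C": [1.0, 3.0, 2.0, 1.0, 3.0, 2.0],
}
best_pairs = top_ratio_pairs(features, labels, top_n=2)
```

Passing `top_n=100` corresponds to the top 100 feature pairs described in the text.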
The invention then applies Pearson correlation analysis followed by Lasso regression feature screening, separately to the new clinical index dataset and the new metabolome feature dataset, to obtain the blood marker combination for distinguishing lung non-cancerous diseases from early lung cancer.
In the practice of the invention, highly correlated variables can cause the model to overfit and to fluctuate considerably in practical application. To avoid overfitting and improve prediction stability, features whose correlation exceeds a preset threshold are eliminated, so that the model retains good discrimination and stability. The criterion is $|R_{x,y}| \geq \beta$ for $x, y = 1, 2, \ldots, n$, where the $n$ features are denoted $K_1, K_2, \ldots, K_n$ and $R_{x,y}$ is the correlation coefficient between features $K_x$ and $K_y$. The threshold $\beta$ is set on the correlation coefficient: when the absolute value of $R_{x,y}$ is greater than or equal to $\beta$, $K_x$ and $K_y$ are regarded as a highly correlated feature pair. The Pearson correlation analysis comprises extracting all clinical index pairs and metabolic feature pairs whose correlation coefficient R has an absolute value of at least 0.9, and removing features starting with the one that participates in the most such pairs, until the correlation of every remaining pair of clinical indexes or metabolome features is below the threshold of 0.9.
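The elimination procedure can be sketched in Python as a greedy loop. The function names and the alphabetical tie-breaking rule are assumptions, and the sketch expects every feature to be non-constant so that the correlation is defined.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def prune_correlated(features, beta=0.9):
    """Repeatedly drop the feature involved in the most |R| >= beta pairs."""
    kept = dict(features)
    while True:
        counts = {}
        names = sorted(kept)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if abs(pearson(kept[a], kept[b])) >= beta:
                    counts[a] = counts.get(a, 0) + 1
                    counts[b] = counts.get(b, 0) + 1
        if not counts:
            return kept  # no pair reaches the threshold any more
        worst = max(sorted(counts), key=counts.get)  # ties: alphabetical
        del kept[worst]


# Hypothetical features: K1, K2 and K3 are nearly collinear, K4 is not.
data = {
    "K1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "K2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "K3": [1.1, 2.0, 2.9, 4.2, 5.0],
    "K4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
remaining = prune_correlated(data, beta=0.9)
```

Here one feature from the collinear group survives alongside K4.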
In the practice of the invention, Lasso regression, also called the Least Absolute Shrinkage and Selection Operator, adds a penalty term to model estimation that shrinks the regression coefficients of unnecessary variables to zero so that they are removed from the model. This achieves variable screening and effectively reduces dimensionality in high-dimensional data analysis.
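To make the shrink-to-zero behaviour concrete, here is a small coordinate-descent sketch in Python. It assumes a squared-error loss with objective (1/2n)·‖y − Xβ‖² + λ‖β‖₁; the text does not state which loss, solver or penalty strength was used, so the function names, the value of `lam` and the toy data are illustrative only.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Toy data: y depends only on the first column; the second is uninformative.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
coefficients = lasso_coordinate_descent(X, y, lam=0.5)
```

The uninformative column receives a coefficient of exactly zero, while the informative one is shrunk from 2.0 to 1.5; features with nonzero coefficients are the ones retained.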
The invention also provides a method for constructing a diagnosis model of lung non-cancer diseases and early lung cancer, which comprises the following steps:
performing multivariate ROC curve analysis on the blood biomarkers obtained by the screening method described above.
In the practice of the invention, the multivariate ROC curve analysis comprises randomly selecting 3/4 of the samples in the modeling group as a training set and using the remaining 1/4 as a test set, repeating this random split for 1000 iterations with a support vector machine (SVM), and constructing the diagnostic model of lung non-cancerous diseases and early lung cancer by averaging the accuracy of the final model over the iterations.
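The repeated random 3/4 to 1/4 split can be sketched in Python as follows. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the SVM; this is a swapped-in technique, not the classifier named in the text. All function names and the toy data are hypothetical.

```python
import random


def nearest_centroid_fit(X, y):
    """Stand-in classifier (not an SVM): store the mean vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def repeated_holdout_accuracy(X, y, n_iter=1000, train_fraction=0.75, seed=0):
    """Average test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_train = int(len(X) * train_fraction)
    accuracies = []
    for _ in range(n_iter):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy data: two well-separated groups of eight samples each.
X = [[0.1 * i, 0.0] for i in range(8)] + [[10.0 + 0.1 * i, 0.0] for i in range(8)]
y = [0] * 8 + [1] * 8
mean_accuracy = repeated_holdout_accuracy(X, y, n_iter=50)
```

Swapping the two `nearest_centroid_*` helpers for an SVM implementation would bring the sketch closer to the procedure described.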
Based on the preliminarily screened clinical index data and metabolome feature data, the invention calculates, separately for each, the ratios between all possible pairs of variables, ranks the pairs by significance from the smallest P-value to the largest, selects the top 100 feature pairs, and forms a dataset from the resulting clinical index ratios and metabolome feature ratios.
In the practice of the invention, the ratios between all possible pairs of clinical indexes and of metabolome features are calculated separately using the Biomarker Analysis module of the MetaboAnalyst website.
After the data set is obtained, the invention uses Lasso regression to screen blood marker combinations from the data set for distinguishing lung non-cancerous disease from early stage lung cancer.
The invention uses Lasso regression to reduce the dimension of the characteristic variable in the data set, and further screens out the optimal blood marker combination.
For further explanation of the present invention, a blood marker combination for distinguishing lung non-cancerous diseases from early stage lung cancer, and a screening method and application thereof, provided by the present invention, will be described in detail with reference to the accompanying drawings and examples, which should not be construed as limiting the scope of the present invention.
Example 1
Step one, detecting object and sample information
1. Subject condition
1) Inclusion criteria for patients with lung non-cancerous disease and early stage lung cancer are as follows:
(1) Early lung cancer patients are confirmed by histopathological examination, the gold standard method; (2) early lung cancer patients are those staged as stage I lung cancer; (3) lung non-cancerous disease patients have a clinical diagnosis of benign lung disease, such as pneumonia, tuberculosis or pulmonary infection, established by biopsy/postoperative pathology or by comprehensive evaluation by a clinician.
2) The criteria for the exclusion of patients with lung non-cancerous disease and early stage lung cancer are as follows:
(1) Pregnancy or lactation; (2) requiring emergency treatment or resuscitation; (3) a history of malignant tumor or any anti-tumor treatment received before sampling; (4) multiple simultaneous primary malignant tumors.
3) Acquisition of subject clinical information: when a subject's plasma sample was collected, information on the subject's basic condition and health status, including basic demographic characteristics, lifestyle, disease history and medication history, was collected by questionnaire. Specific clinical index data (including routine blood tests, blood biochemistry and tumor markers) were collected at the same time for subsequent statistical analysis and model construction.
Step two, plasma sample pretreatment and metabolite detection
1. Plasma sample pretreatment
After the plasma samples were removed from the -80 °C freezer, they were thawed on ice and vortexed for 10 s. Subsequently, 100 μL of plasma was added to 1000 μL of a pre-cooled mixture of methyl tert-butyl ether and methanol and vortex-mixed to obtain a sample extract. Next, 500 μL of a methanol/water mixture (methanol to water volume ratio 3:1) was added to the sample extract, and the mixture was centrifuged at 12700 rpm for 5 min to separate an upper organic phase and a lower aqueous phase.
The lower aqueous phase is transferred to a centrifuge tube, 1100 μL of ice-cold methanol is added, and the tube is shaken, incubated at -20 °C for 60 min and centrifuged for 10 min to precipitate protein. Then 1000 μL of the supernatant is transferred to the corresponding polar centrifuge tube and dried in a vacuum concentrator (Speed-Vac). The dried sample is redissolved in 200 μL of mass-spectrometry-grade water, vortex-mixed, sonicated for 5 min and centrifuged at 4 °C for 5 min (12700 rpm); 180 μL of the supernatant is transferred to a 2 mL glass injection vial for LC-MS detection of polar metabolites.
The upper organic phase is transferred to a centrifuge tube and dried in a vacuum concentrator (Speed-Vac). The dried sample is redissolved in 200 μL of acetonitrile/isopropanol (v/v, 3:1), vortex-mixed, sonicated at room temperature for 5 min and centrifuged at 4 °C for 5 min (12700 rpm); 180 μL of the supernatant is transferred to a 2 mL glass injection vial for LC-MS detection of nonpolar metabolites.
2. High resolution liquid chromatography mass spectrometry (UHPLC-MS) detection:
(1) Polar metabolite chromatographic parameters: polar metabolites are separated on a Waters ACQUITY HSS T3 column (2.1 mm, 1.8 μm) at a column temperature of 40 °C. The liquid chromatograph is an ACQUITY UPLC I-Class liquid chromatography system (Waters) and the mass spectrometer is a Q-Exactive mass spectrometry system (Thermo Fisher Scientific). Mobile phase A is an aqueous solution containing 0.1% formic acid and mobile phase B is an acetonitrile solution containing 0.1% formic acid, at a flow rate of 0.4 mL/min. The elution gradient is: 0-1 min, 1% mobile phase B; 11-13 min, 40%-70% mobile phase B; 13-15 min, 70%-99% mobile phase B; 15-18 min, 99% mobile phase B; 19 min, 1% mobile phase B. The injection volume is 3 μL and the autosampler temperature is 10 °C;
(2) Nonpolar metabolite chromatographic parameters: lipid metabolites are separated on a Waters ACQUITY BEH C8 column (2.1 × 100 mm, 1.7 μm) at a column temperature of 60 °C. The liquid chromatograph is an ACQUITY UPLC I-Class liquid chromatography system (Waters) and the mass spectrometer is a Q-Exactive mass spectrometry system (Thermo Fisher Scientific). Mobile phase A is water and mobile phase B is acetonitrile/isopropanol (7/3, v/v), both containing 0.1% acetic acid and 1% ammonium acetate, at a flow rate of 0.4 mL/min. The elution gradient is: 0-1 min, 55% mobile phase B; 1-4 min, 55%-75% mobile phase B; 4-12 min, 75%-89% mobile phase B; 12-15 min, 89%-100% mobile phase B; 15-19.5 min, 100% mobile phase B; 19.51-24 min, 55% mobile phase B. The injection volume is 2 μL and the autosampler temperature is 10 °C;
(3) LC-MS mass spectrometry parameters: for both polar and nonpolar metabolites, primary (MS1) and secondary (MS2) mass spectrometry data are acquired by full scan and data-dependent acquisition (DDA). The full-scan mass range is 100-1500 Da. The scan ranges of the secondary scan mode (Full MS/dd-MS2) are 100-310, 300-710 and 700-1500 Da. The instrument is an Orbitrap high-resolution mass spectrometer (Thermo Fisher) equipped with an electrospray ionization (ESI) source, and data are acquired in positive and negative ionization modes respectively. The specific parameters are as follows. For the primary full scan: automatic gain control (AGC) 3E+6, maximum IT 200 ms, resolution 70,000 FWHM (@ 200 m/z). For the secondary scan mode (Full MS/dd-MS2): resolution 17,500, quadrupole window 1.5 m/z, automatic gain control (AGC) 1E+5, maximum ion injection time 50 ms, relative collision energy (HCD) 30 eV. Ion source voltage: +3,500 V in positive ion mode and 3,500 V in negative ion mode; nebulizer pressure 20 psi; sheath gas temperature 400 °C; sheath gas flow 10 L/min.
Step three, metabonomics data preprocessing and metabolite identification
1. Metabonomic data pretreatment:
The metabolomics data preprocessing comprises peak extraction, peak alignment, peak filtering and missing-value imputation. First, peak extraction, peak alignment and peak filtering are performed on the raw mass spectrometry data to reduce systematic interference caused by drift in instrument detection time. The polar metabolite platform and the lipid metabolite platform are processed separately to avoid mutual interference between detection platforms. The retention times of the dataset are calibrated against the instrument's historical detection records. After calibration, filtering criteria are applied to remove interfering peaks, namely (i) isotope peaks, (ii) in-source fragments produced by the ionization method, and (iii) redundant peaks such as additional low-intensity adducts and redundant derivatives of the same analyte, so as to ensure the quality of the dataset; characteristic peaks with more than 50% missing values across all samples are also rejected. Missing values are imputed by multiple imputation by chained equations with random forests (MICE Forest). The processed data matrix serves as the raw data for statistical analysis and may be further normalized and standardized to obtain the final mass spectrometry data matrix.
2. Identification of metabolites:
From the raw data, software analysis yields the spectral information of the compound precursor ions and secondary fragment ions, such as the mass-to-charge ratio (m/z) of the primary mass spectrum, the secondary fragment ions and the retention time. Metabolites are annotated by matching this information against the primary and secondary spectra in databases. Commonly used metabolite databases include HMDB (www.hmdb.ca), PubChem (https://pubchem.ncbi.nlm.nih.gov), MassBank (http://www.massbank.jp) and MassBank of North America (https://massbank.us), as well as the lipid databases LIPID MAPS (https://www.lipidmaps.org) and LipidBlast (https://fiehnlab.ucdavis.edu/projects/LipidBlast). Metabolites identified from these databases are finally verified against the retention time, MS1 and MS2 information of reference standards separated under the same chromatographic column and mass spectrometry conditions. The identification criteria are a retention time deviation within 0.2 min and a mass accuracy deviation of less than 10 ppm.
Step four, diagnostic method for distinguishing lung non-cancerous diseases from early lung cancer based on clinical indexes and metabolomic features
1. Subject information
A total of 556 subject plasma samples were included. All samples were collected at two medical centers, and the modeling group and the validation group consisted of different samples. The modeling group had 417 plasma samples, comprising 222 early lung cancer and 195 lung non-cancerous disease subjects; the validation group had 139 plasma samples, comprising 74 early lung cancer and 65 lung non-cancerous disease subjects (Table 1). All plasma samples were collected in the early morning from fasting subjects and stored in a -80 °C freezer.
Table 1. Sample information
Characteristic | Category | Modeling group | Validation group
Number of subjects | Total (n=556) | 417 | 139
Group | Early lung cancer group | 222 | 74
Group | Lung non-cancerous disease group | 195 | 65
Sex | Male (%) | 54.9% | 56.1%
Sex | Female (%) | 45.1% | 43.9%
Age | Mean ± standard deviation | 56.94±11.55 | 58.33±11.74
2. Blood marker composition screening method
1) Data standardization: logarithmic transformation (Log-transformation) and standard score (Z-score) processing were applied to the metabolomics data and to the clinical index data separately. The log transformation compresses the raw data into a narrower interval and eliminates order-of-magnitude differences between samples; the Z-score converts each raw value into a standard score, which makes originally incomparable values comparable. This preprocessing yields two standardized data matrices.
2) Calculation of feature ratios: for the standardized metabolomics data and clinical index data separately, the ratios between all possible pairs of variables were calculated, the pairs were ranked by P-value from smallest to largest, the top 100 feature pairs were selected, and a new dataset was formed from the clinical index ratios and another from the metabolome feature ratios.
3) Pearson correlation analysis: Pearson correlation analysis was performed on the clinical indexes and on the metabolome features separately. All clinical index pairs and metabolic feature pairs whose correlation coefficient R had an absolute value of at least 0.9 were extracted, and features were removed starting with the one participating in the most such pairs, until the correlation of every remaining pair of clinical indexes or metabolome features was below the threshold of 0.9.
4) Lasso regression feature screening.
The metabolomics data and the clinical index data were each subjected to feature screening by Lasso regression. This finally yielded 14 metabolomic feature pairs (phosphatidylethanolamine 38:3p/sebacic acid, phosphatidylinositol 38:3/sebacic acid, sphingomyelin D38:5/sebacic acid, phosphatidylethanolamine 38:5/L-prolyl-L-threonine, gluconolactone/hexadecanoic acid, phosphatidylcholine 38:5/sebacic acid, gluconolactone/trans-cyclohexane-1,2-dicarboxylic acid, phosphatidylcholine 38:6e/trans-cyclohexane-1,2-dicarboxylic acid, phosphatidylethanolamine 36:1p/trans-cyclohexane-1,2-dicarboxylic acid, phosphatidylethanolamine 38:5/nerolidol, phosphatidylserine 38:0/sebacic acid, dimethyl phosphatidylethanolamine 38:4/sebacic acid, phosphatidylcholine 38:6e/D-glucosamine-6-phosphoric acid) and 13 clinical index pairs (monocyte ratio/potassium, monocyte ratio/activated partial thromboplastin time, erythrocyte count/activated partial thromboplastin time, potassium/international standardized ratio, potassium/thrombin time, activated partial thromboplastin time/thrombin time, carcinoembryonic antigen/serum globulin, international standardized ratio/activated partial thromboplastin time, carcinoembryonic antigen/potassium, carcinoembryonic antigen/prothrombin time, carcinoembryonic antigen/activated partial thromboplastin time, carcinoembryonic antigen/lactate dehydrogenase, erythrocyte count/potassium), for a total of 27 feature variable pairs forming the blood biomarker combination (Table 2).
Table 2. 27 blood marker combinations for distinguishing the lung non-cancerous disease group from the early lung cancer group
3. Construction of a diagnostic model for non-cancerous lung disease and early stage lung cancer
To evaluate how well the 27 screened blood biomarkers distinguish the lung non-cancerous disease group from the early lung cancer group, a multivariate ROC curve analysis was performed on them. In each iteration, 3/4 of the samples in the modeling group were randomly selected as the training set and the remaining 1/4 were used as the test set; the random split was repeated for 1000 iterations with an SVM, and the diagnostic model of lung non-cancerous disease and early lung cancer was constructed by averaging the accuracy of the final model. The ROC analysis (Figure 1) gave AUC = 0.816 (sensitivity: 74.6%, specificity: 71.4%, accuracy: 73.1%, precision: 74.6%), indicating that the constructed model has good predictive performance.
Model quality is generally evaluated by the area under the ROC curve (AUC), which usually lies between 0.5 and 1: the closer the AUC is to 1, the better the model performance and diagnostic effect, whereas an AUC below 0.5 indicates poor model accuracy. Besides the receiver operating characteristic (ROC) curve and the area under the curve (AUC), a ROC classification prediction model is also described by sensitivity (Sensitivity), specificity (Specificity), accuracy (Accuracy) and precision (Precision). Sensitivity and specificity reflect the model's ability to recognize positive and negative samples, respectively.
Sensitivity, also called the true positive rate (TPF or TPR), is the probability that the model correctly diagnoses a member of the case group as positive. Specificity, also called the true negative rate (TNF or TNR), is the probability that the model correctly diagnoses a member of the control group as negative. The higher the sensitivity, the stronger the model's ability to recognize positive samples; the higher the specificity, the stronger its ability to recognize negative samples.
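The sensitivity is: $\text{Sensitivity} = \dfrac{TP}{TP + FN}$
The specificity is: $\text{Specificity} = \dfrac{TN}{TN + FP}$
The accuracy is: $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
The precision is: $\text{Precision} = \dfrac{TP}{TP + FP}$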
Wherein,
TP (True Positive): the number of samples that are actually positive and are correctly predicted as positive;
TN (True Negative): the number of samples that are actually negative and are correctly predicted as negative;
FP (False Positive): the number of samples that are actually negative but are incorrectly predicted as positive;
FN (False Negative): the number of samples that are actually positive but are incorrectly predicted as negative.
4. External validation of lung non-cancerous disease and early stage lung cancer diagnostic models
To further verify the predictive performance of the diagnostic model for lung non-cancerous disease and early lung cancer built on the 27 blood biomarkers, the validation group dataset was treated as unknown samples and fed into the diagnostic model constructed from the modeling group, in order to evaluate independent validation on data outside the modeling group. For the external validation dataset, the model outputs a probability value (Probability) for each sample. Taking any probability value as a diagnostic threshold, the sensitivity and specificity can be calculated from the formulas. In ROC analysis the ordinate is sensitivity and the abscissa is 1-specificity, so each pair of sensitivity and 1-specificity corresponds to one point on the ROC curve. Using the probability value of each sample in turn as the diagnostic threshold yields a series of points, which are connected by line segments of differing slopes to draw the final ROC curve. The diagnostic threshold of the model is then selected according to the optimal sensitivity and specificity; the result indicates that the diagnostic threshold of the model is 0.5531. The confusion matrix in Table 3 shows that, of the 74 early lung cancer patients, 58 were classified into the early lung cancer group and 16 were misclassified into the lung non-cancerous disease group; of the 65 lung non-cancerous disease patients, 44 were classified into the lung non-cancerous disease group and 21 were misclassified into the early lung cancer group. The ROC analysis of the external validation (Fig. 2) gave AUC = 0.815 (sensitivity: 78.4%, specificity: 67.7%, accuracy: 73.4%), indicating that the diagnostic model built on the 27 screened clinical index and metabolome feature pairs also has good predictive ability.
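The threshold sweep described above can be sketched in Python. The text says only that the threshold is chosen from the optimal sensitivity and specificity; maximising Youden's J (sensitivity + specificity - 1) is an assumption here, as are the function names and the toy probabilities.

```python
def roc_points(probabilities, labels):
    """Return (threshold, sensitivity, specificity) for each observed probability."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(probabilities)):
        tp = sum(1 for p, lab in zip(probabilities, labels)
                 if p >= threshold and lab == 1)
        tn = sum(1 for p, lab in zip(probabilities, labels)
                 if p < threshold and lab == 0)
        points.append((threshold, tp / positives, tn / negatives))
    return points


def best_threshold(points):
    """Pick the threshold that maximises Youden's J (assumed criterion)."""
    return max(points, key=lambda pt: pt[1] + pt[2] - 1.0)[0]


# Hypothetical model outputs: 1 = early lung cancer, 0 = non-cancerous disease.
probabilities = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 1, 0, 1, 0]
points = roc_points(probabilities, labels)
threshold = best_threshold(points)
```

Plotting sensitivity against 1-specificity for each entry of `points` gives the ROC curve.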
Table 3. Confusion matrix of the diagnostic model for distinguishing lung non-cancerous disease from early lung cancer
Actual group | Predicted: early lung cancer | Predicted: lung non-cancerous disease
Early lung cancer, n=74 | 58 (TP) | 16 (FN)
Lung non-cancerous disease, n=65 | 21 (FP) | 44 (TN)
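As a check on the reported figures, the following Python sketch computes the four metrics from the counts in Table 3 using their standard definitions. The function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix metrics, returned as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
    }


# Counts taken from Table 3 (external validation group).
metrics = classification_metrics(tp=58, fn=16, fp=21, tn=44)
percentages = {name: round(value * 100, 1) for name, value in metrics.items()}
```

The sensitivity (58/74), specificity (44/65) and accuracy (102/139) computed this way round to the 78.4%, 67.7% and 73.4% reported for the external validation.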
In summary, compared with the prior art, the invention uses not only plasma metabolome features but also blood clinical indexes obtained from routine blood tests and blood biochemistry. Compared with other methods, it reflects the physiological and pathological state of the body more comprehensively, reduces the likelihood of biased results, and, by adopting multidimensional blood biomarkers, can distinguish patients with lung non-cancerous diseases from early lung cancer patients more accurately.
The foregoing describes some, but not all, embodiments of the invention; it should be understood that other embodiments may be devised from the present embodiments without departing from the spirit and scope of the invention.

Claims (10)

1. A blood marker combination for distinguishing between non-cancerous lung disease and early stage lung cancer, comprising 13 clinical indicators and 14 metabolome features;
The clinical indexes comprise a monocyte ratio and potassium concentration ratio, a monocyte ratio and activated partial thromboplastin time ratio, a erythrocyte count and activated partial thromboplastin time ratio, a potassium concentration and international standardized ratio, a potassium concentration and thrombin time ratio, an activated partial thromboplastin time and thrombin time ratio, a carcinoembryonic antigen and serum globulin concentration ratio, an international standardized ratio and activated partial thromboplastin time ratio, a carcinoembryonic antigen and potassium concentration ratio, a carcinoembryonic antigen concentration and prothrombin time ratio, a carcinoembryonic antigen concentration and activated partial thromboplastin time ratio, a carcinoembryonic antigen and lactate dehydrogenase concentration ratio, a erythrocyte count and potassium concentration ratio;
The metabonomics features include a peak intensity ratio of phosphatidylethanolamine 38:3p to sebacic acid, a peak intensity ratio of phosphatidylinositol 38:3 to sebacic acid, a peak intensity ratio of sphingomyelin D38:5 to sebacic acid, a peak intensity ratio of phosphatidylethanolamine 38:5 to L-prolyl-L-threonine, a peak intensity ratio of gluconolactone to hexadecanedioic acid, a peak intensity ratio of phosphatidylcholine 38:5 to sebacic acid, a peak intensity ratio of gluconolactone to trans-cyclohexane-1, 2-dicarboxylic acid, a peak intensity ratio of phosphatidylcholine 38:6e to trans-cyclohexane-1, 2-dicarboxylic acid, a peak intensity ratio of phosphatidylethanolamine 36:1p to trans-cyclohexane-1, 2-dicarboxylic acid, a peak intensity ratio of phosphatidylethanolamine 38:5 to nerolidol, a peak intensity ratio of phosphatidylserine 38:0 to sebacic acid, a peak intensity ratio of dimethyl phosphatidylethanolamine 38:4 to sebacic acid, a peak intensity ratio of phosphatidylethanolamine 38:6e to D-glucosamine-6.
2. Use of a reagent or kit for detecting a blood marker combination according to claim 1 for the preparation of a product for distinguishing between a non-cancerous lung disease and early stage lung cancer.
3. The use according to claim 1, wherein the non-cancerous lung disease comprises one or more of pneumonia, tuberculosis and pulmonary infection.
4. Use of a reagent or kit for detecting a blood marker combination according to claim 1 for the preparation of a diagnostic product for pulmonary diseases.
5. The use according to any one of claims 2 to 4, wherein the reagent or kit for detecting the blood marker combination according to claim 1 comprises a liquid chromatography mass spectrometry detection reagent.
6. A screening method for a blood marker combination for distinguishing non-cancerous lung disease from early-stage lung cancer, comprising the steps of:
collecting plasma samples from patients with non-cancerous lung disease and from patients with early-stage lung cancer, respectively;
separating an organic phase and an aqueous phase from each plasma sample, wherein the organic phase contains the non-polar metabolites and the aqueous phase contains the polar metabolites;
detecting the polar metabolites in the aqueous phase and the non-polar metabolites in the organic phase separately by liquid chromatography-mass spectrometry to obtain metabolome detection data for the plasma samples of both patient groups;
preprocessing the metabolome detection data of both patient groups and identifying the metabolites to obtain metabonomics data;
collecting clinical indexes of both patient groups to obtain clinical index data;
standardizing the metabonomics data and the clinical index data of both patient groups separately to obtain standardized metabonomics data and standardized clinical index data;
calculating, separately for the standardized metabonomics data and the standardized clinical index data, the ratio between every possible pair of variables, ranking the resulting feature pairs by P-value in ascending order, selecting the top 100 feature pairs, and forming new data sets from the clinical index ratios and the metabolome feature ratios, respectively; and
performing Pearson correlation analysis followed by Lasso regression feature screening on the new data set composed of clinical index and metabolome features to obtain the blood marker combination for distinguishing non-cancerous lung disease from early-stage lung cancer.
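The ratio step of claim 6 can be illustrated with a minimal sketch. The claim does not say which statistical test produces the P-values, so this example assumes a two-sided Mann-Whitney U test with a normal approximation and no tie correction. It also assumes that the standardized values are strictly positive so that every ratio is defined. The function names `rank_average`, `mann_whitney_p` and `top_ratio_features` are hypothetical and do not come from the patent.

```python
import math
from itertools import combinations
from statistics import NormalDist


def rank_average(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided p-value, normal approximation (ASSUMED test, not from the claim)."""
    n1, n2 = len(x), len(y)
    ranks = rank_average(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def top_ratio_features(data, labels, top_k=100):
    """Rank all pairwise ratios by p-value and keep the top_k smallest.

    data:   dict mapping variable name -> list of positive values per sample
    labels: list of 0 (non-cancerous lung disease) / 1 (early lung cancer)
    Returns a list of (p_value, "A/B", ratio_values) tuples.
    """
    scored = []
    for a, b in combinations(sorted(data), 2):
        ratio = [va / vb for va, vb in zip(data[a], data[b])]
        group0 = [r for r, lab in zip(ratio, labels) if lab == 0]
        group1 = [r for r, lab in zip(ratio, labels) if lab == 1]
        scored.append((mann_whitney_p(group0, group1), f"{a}/{b}", ratio))
    scored.sort(key=lambda item: item[0])  # smallest p-value first
    return scored[:top_k]
```

Because the assumed test is rank based, the ratio A/B and its reciprocal B/A yield the same p-value, so the sketch evaluates each unordered pair once. It would be applied separately to the metabonomics variables and to the clinical index variables.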
7. The screening method of claim 6, wherein the clinical index tests comprise routine blood tests and blood biochemistry tests.
8. The method according to claim 6, wherein the Pearson correlation analysis comprises extracting all clinical index pairs and metabolic feature pairs whose correlation coefficient R has an absolute value of 0.9 or greater, and eliminating features, starting with the one involved in the most such correlations, until the correlation between any remaining clinical index pairs and metabolic feature pairs is below the threshold of 0.9.
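One way to read the elimination rule in claim 8 is as a greedy loop: count, for each feature, how many other features it correlates with at |R| ≥ 0.9, drop the feature with the highest count, and repeat until no such pair remains. The sketch below follows that reading. The names `pearson_r` and `prune_correlated` are hypothetical, the tie-breaking rule (alphabetical order) is an assumption, and features are assumed to be non-constant so that R is defined.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(features, threshold=0.9):
    """Greedily remove features until every pairwise |R| is below threshold.

    features: dict mapping feature name -> list of values per sample
    Returns the sorted list of feature names that were kept.
    """
    kept = dict(features)
    while kept:
        names = sorted(kept)
        # Count how many high-correlation pairs each feature takes part in.
        counts = {name: 0 for name in names}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if abs(pearson_r(kept[a], kept[b])) >= threshold:
                    counts[a] += 1
                    counts[b] += 1
        worst = max(names, key=lambda name: counts[name])
        if counts[worst] == 0:
            return names  # no pair reaches the threshold any more
        del kept[worst]  # drop the most-involved feature and recount
    return []
```

Recounting after every removal keeps the loop simple at the cost of recomputing correlations; for 100 feature pairs that cost is negligible.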
9. A method of constructing a diagnostic model for non-cancerous lung disease and early-stage lung cancer, comprising the step of:
performing multivariate ROC curve analysis on the blood markers obtained by the screening method according to any one of claims 6-8.
10. The method of claim 9, wherein the multivariate ROC curve analysis comprises randomly selecting 3/4 of the samples in the modeling set as a training set and the remaining 1/4 as a test set, repeating this random split for 1000 iterations with a support vector machine, and constructing the diagnostic model for non-cancerous lung disease and early-stage lung cancer by calculating the average accuracy of the resulting models.
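The repeated 3/4 to 1/4 split of claim 10 might be sketched as follows with NumPy and scikit-learn. Several details are assumptions that the claim does not specify: the linear kernel, the per-split feature scaling, the stratified sampling, and the reporting of mean ROC AUC alongside mean accuracy. The function name `repeated_svm_evaluation` is hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def repeated_svm_evaluation(X, y, n_iter=1000, seed=0):
    """Average test accuracy and ROC AUC over repeated random 3/4-1/4 splits.

    X: array of shape (n_samples, n_features) holding the selected markers
    y: array of 0 (non-cancerous lung disease) / 1 (early lung cancer)
    """
    rng = np.random.RandomState(seed)
    accuracies, aucs = [], []
    for _ in range(n_iter):
        # Stratification is an ASSUMPTION; it keeps both classes in each split.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, stratify=y,
            random_state=rng.randint(2**31 - 1),
        )
        # Linear kernel and scaling are ASSUMPTIONS, not stated in the claim.
        model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
        model.fit(X_train, y_train)
        accuracies.append(accuracy_score(y_test, model.predict(X_test)))
        aucs.append(roc_auc_score(y_test, model.decision_function(X_test)))
    return float(np.mean(accuracies)), float(np.mean(aucs))
```

Fitting the scaler inside the pipeline means it only ever sees the training portion of each split, which avoids leaking test-set information into the model.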
CN202411832721.9A 2024-12-13 2024-12-13 A blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, and its screening method and application Pending CN119716028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411832721.9A CN119716028A (en) 2024-12-13 2024-12-13 A blood marker combination for distinguishing non-cancerous lung diseases from early lung cancer, and its screening method and application

Publications (1)

Publication Number Publication Date
CN119716028A true CN119716028A (en) 2025-03-28

Family

ID=95087405

Country Status (1)

Country Link
CN (1) CN119716028A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120214168A (en) * 2025-04-21 2025-06-27 北京工业大学 A biomarker combination and application for early screening of lung cancer

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170097355A1 (en) * 2015-10-06 2017-04-06 University Of Washington Biomarkers and methods to distinguish ovarian cancer from benign tumors
US20180156774A1 (en) * 2016-12-05 2018-06-07 Korea Institute Of Science And Technology Kit for diagnosis of coronary heart disease using multi-metabolites and clinical parameters, and method for diagnosis of coronary heart disease using the same
WO2023082820A1 (en) * 2021-11-09 2023-05-19 上海市第一人民医院 Marker for lung adenocarcinoma diagnosis and application thereof
CN117990922A (en) * 2024-03-01 2024-05-07 哈尔滨脉图精准技术有限公司 Metabolic markers and their applications in the diagnosis of diabetes mellitus combined with coronary heart disease
CN118191321A (en) * 2024-05-14 2024-06-14 哈尔滨脉图精准技术有限公司 Plasma metabolic markers for gastric cancer detection and their applications
CN118348131A (en) * 2023-12-12 2024-07-16 广西精准医学科技有限公司 Combined marker for diagnosing early bladder cancer
CN118348143A (en) * 2024-04-12 2024-07-16 哈尔滨脉图精准技术有限公司 Metabolic marker composition for distinguishing health from non-colorectal cancer diseases and its application

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
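GUANGXI WANG et al.: "Lung cancer scRNA-seq and lipidomics reveal aberrant lipid metabolism for early-stage diagnosis", SCI TRANSL MED, vol. 14, no. 630, 2 February 2022 (2022-02-02) *
张曦 et al.: "Introduction to Modern Clinical Medicine, Volume 1" (现代临床医学导论 上), 31 January 2024, Zhengzhou University Press (郑州大学出版社), pages: 46 *
胡树珍; 尚彦彦; 闵玲; 李娟; 席红利; 张其超: "The role of tumor markers and coagulation indexes in the auxiliary diagnosis of lung cancer" (肿瘤标志物和凝血指标在肺癌辅助诊断中的作用), Journal of Tropical Medicine (热带医学杂志), no. 11, 28 November 2017 (2017-11-28) *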

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination