CN118421799A - A group of specific methylation diagnostic markers related to lung adenocarcinoma and their application - Google Patents
- Publication number
- CN118421799A (application No. CN202410669622.7A)
- Authority
- CN
- China
- Prior art keywords
- lung adenocarcinoma
- cpg sites
- c21orf88
- lung
- methylation
- Legal status (as listed by Google Patents; an assumption, not a legal conclusion)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
Description
Technical Field
The present invention relates to the technical field of biological diagnostic reagents, and in particular to a group of specific methylation diagnostic markers related to lung adenocarcinoma and applications thereof.
Background Art
According to the latest estimates of the International Agency for Research on Cancer, there were about 2.2 million new cases of lung cancer worldwide in 2020, accounting for about 11.4% of all new cancer cases. There were about 1.8 million lung cancer deaths, accounting for 18% of cancer deaths. Lung cancer therefore ranks second in incidence and first in mortality worldwide. According to the "2016 China Cancer Statistics" released by the National Cancer Center, the incidence of lung cancer in China has risen year by year over the past two decades. In 2016, lung cancer ranked first in both incidence and mortality in China, with about 828,000 new cases and about 657,000 deaths. Lung cancer incidence in China also differs markedly by sex, age group and region, and in recent years the increase has been concentrated among women, people over 45 years old and urban residents. Besides age, a number of lung cancer risk factors have been identified, including:
- smoking;
- air pollution;
- occupational exposure, for example to silica;
- a history of lung disease, such as chronic obstructive pulmonary disease or pulmonary fibrosis;
- a family history of lung cancer.
However, many factors influence the development of lung cancer, and its pathogenesis remains to be elucidated. Lung adenocarcinoma is the most common subtype of lung cancer, accounting for about 60% of all lung cancers, and its incidence is clearly rising. The incidence of lung adenocarcinoma can be expected to keep rising, making it one of the serious challenges to the health of Chinese residents.
Timely and accurate diagnosis is the prerequisite for treating lung adenocarcinoma and one of the most effective ways to reduce its mortality. Commonly used tissue biopsy methods include percutaneous lung biopsy guided by CT or color Doppler ultrasound, and bronchoscopy. These methods are invasive and carry potential complications. In addition, the sampled tissue is subject to tumor heterogeneity, which affects diagnostic accuracy. Biopsy is also difficult to apply in patients with early-stage cancer, postoperative residual disease or recurrence. At present, chest low-dose computed tomography (LDCT) is the recognized lung cancer screening method. Several randomized controlled trials have shown that LDCT screening reduces overall lung cancer mortality by increasing the detection of early lung cancer. However, because LDCT is highly sensitive, its false-positive rate in clinical trials has been as high as 23.3%. This indicates that imaging-based judgment of whether lung nodules are benign or malignant does not always match the actual nature of the lesions, and its accuracy still needs to be improved. Moreover, concerns about false-positive findings and radiation exposure limit adherence to CT screening among high-risk populations. Two less invasive approaches are exfoliative cytology of sputum and detection of nucleic acid molecules (RNA, DNA and microRNA) in epithelial cells from bronchoalveolar lavage fluid. Both are simple, easy to perform and repeatable, but their sensitivity for early lung cancer is low, and further research is needed to establish their clinical validity and utility.
In recent years, many studies have explored the diagnostic performance of plasma cell-free DNA (cfDNA) methylation markers in lung cancer, non-small cell lung cancer and lung adenocarcinoma. These studies suggest that cfDNA methylation markers have potential value for the screening and early diagnosis of lung adenocarcinoma. Early studies focused on the diagnostic performance of methylation of a single gene in lung cancer cfDNA (e.g., SHOX2, SEPT9 and DCLK1), with sensitivities ranging from 44.3% to 60%. Subsequent studies screened methylation markers from genome-wide methylation data:
- In 2017, Ooki A et al. identified six methylation markers (CDO1, HOXA9, AJAP1, PTGDR, UNCX and MARCH11) from TCGA 450K array data of lung cancer tissue and adjacent non-tumor tissue. In cfDNA (N=85), the six markers combined detected early-stage non-small cell lung cancer with a sensitivity of 72.1% and a specificity of 71.4%.
- Also in 2017, Hulbert A et al. identified six partially overlapping methylation markers (including SOX17 and TAC1) from TCGA lung cancer and adjacent tissue samples. In non-small cell lung cancer cfDNA (N=210), sensitivities ranged from 65% to 76% and specificities from 74% to 84%.
- In 2021, Liang N et al. identified a set of pan-cancer markers, also from TCGA tissue data, with a sensitivity of 52-81% in cfDNA from patients with stage I-III lung cancer.

These studies selected methylation markers in one of two ways. Some compared only lung cancer tissue or plasma with adjacent tissue or healthy-control cfDNA and kept the markers with the largest between-group difference. Others used pan-cancer markers and known tumor suppressor genes for subsequent non-invasive diagnosis. Neither approach considered whether the markers were tumor-type specific. Some other studies have taken the specific methylation profiles of different cell or cancer types into account when searching for cancer markers, but their sensitivity in early-stage lung cancer is limited (21.9% (14.8-31.1%)). Currently, cfDNA methylation detection relies mainly on two kinds of method: high-throughput sequencing, such as whole-genome methylation sequencing and targeted bisulfite sequencing, and low-throughput assays. High-throughput sequencing has undoubtedly accelerated marker development, but it requires larger blood draws to obtain enough input DNA, which may reduce participant compliance. Its complex equipment and high cost also limit its use in large populations.
Summary of the Invention
One object of the present invention concerns high-throughput sequencing in the prior art. Such sequencing requires larger blood draws to meet input DNA requirements, which may reduce participant compliance, and its complex equipment and high cost limit its use in large populations.
Another object of the present invention is to address a gap in the prior art: there is no screening workflow or preliminary validation for a panel of lung adenocarcinoma-specific methylation markers.
To achieve the above objects, the present invention adopts the following technical solutions:
A diagnostic marker for lung adenocarcinoma, comprising at least one of cg24996482 (NA), cg07077277 (HIST1H2AA), cg05148722 (HIST1H2BF), cg00608169 (NA), cg15435730 (TFAP2D), cg21174344 (NA), cg15803869 (NA), cg23943136 (GATA3), cg03933026 (NA), cg27630311 (TBX3), cg08448665 (C21orf88) and cg17900854 (C21orf88). The gene annotation of each CpG site is given in parentheses, and NA indicates that no gene is annotated.
The present application also provides the use of CpG sites as lung adenocarcinoma tissue-specific methylation markers in the preparation of lung adenocarcinoma diagnostic products. The CpG sites are at least one of cg24996482 (NA), cg07077277 (HIST1H2AA), cg05148722 (HIST1H2BF), cg00608169 (NA), cg15435730 (TFAP2D), cg21174344 (NA), cg15803869 (NA), cg23943136 (GATA3), cg03933026 (NA), cg27630311 (TBX3), cg08448665 (C21orf88) and cg17900854 (C21orf88).
Preferably, the CpG sites are a combination of six CpG sites: cg07077277 (HIST1H2AA), cg08448665 (C21orf88), cg15435730 (TFAP2D), cg17900854 (C21orf88), cg27630311 (TBX3) and cg03933026 (NA).
Preferably, the lung adenocarcinoma diagnostic product is a reagent or a kit.
The present application also provides a lung adenocarcinoma diagnostic product, comprising at least one of cg24996482 (NA), cg07077277 (HIST1H2AA), cg05148722 (HIST1H2BF), cg00608169 (NA), cg15435730 (TFAP2D), cg21174344 (NA), cg15803869 (NA), cg23943136 (GATA3), cg03933026 (NA), cg27630311 (TBX3), cg08448665 (C21orf88) and cg17900854 (C21orf88).
Preferably, the product further comprises other medically acceptable reagents.
Preferably, the product comprises cg07077277 (HIST1H2AA), cg08448665 (C21orf88), cg15435730 (TFAP2D), cg17900854 (C21orf88), cg27630311 (TBX3) and cg03933026 (NA).
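The marker panel and its preferred six-site subset can be written as a simple lookup table. The following Python sketch is illustrative only. The names `FULL_PANEL`, `PREFERRED_SIX`, `is_claimed_panel` and `annotated_genes` are hypothetical, and "NA" is read as "no annotated gene".

```python
"""Hypothetical representation of the CpG marker panel listed above."""
from __future__ import annotations

# The 12 CpG probe IDs listed in the disclosure, mapped to the gene
# annotation given in parentheses ("NA" is read here as "no annotated gene").
FULL_PANEL: dict[str, str] = {
    "cg24996482": "NA",
    "cg07077277": "HIST1H2AA",
    "cg05148722": "HIST1H2BF",
    "cg00608169": "NA",
    "cg15435730": "TFAP2D",
    "cg21174344": "NA",
    "cg15803869": "NA",
    "cg23943136": "GATA3",
    "cg03933026": "NA",
    "cg27630311": "TBX3",
    "cg08448665": "C21orf88",
    "cg17900854": "C21orf88",
}

# The preferred six-site combination.
PREFERRED_SIX: tuple[str, ...] = (
    "cg07077277", "cg08448665", "cg15435730",
    "cg17900854", "cg27630311", "cg03933026",
)


def is_claimed_panel(probes) -> bool:
    """True if `probes` is non-empty and drawn only from the 12 listed sites,
    i.e. it is "at least one of" the listed CpG sites."""
    probe_set = set(probes)
    return bool(probe_set) and probe_set <= set(FULL_PANEL)


def annotated_genes(probes) -> list[str]:
    """Sorted distinct gene annotations covered by `probes`, ignoring "NA"."""
    return sorted({FULL_PANEL[p] for p in probes} - {"NA"})


if __name__ == "__main__":
    print(is_claimed_panel(PREFERRED_SIX))
    print(annotated_genes(PREFERRED_SIX))
```

Writing the panel down this way makes the "at least one of" wording concrete. It says nothing about how the sites are measured.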
In summary, specific validation experiments in this application show that the 12 lung adenocarcinoma-specific CpG sites provided above are already abnormally methylated in patients with early-stage lung adenocarcinoma. Their methylation changes are also specific to lung adenocarcinoma compared with lung squamous cell carcinoma, large cell lung cancer and small cell lung cancer. The 12 CpG sites were first evaluated individually in the TCGA lung adenocarcinoma data, giving AUCs of 0.652-0.911, sensitivities of 54.9-83.1% and specificities of 81.2-100%. A random forest model built on the TCGA lung adenocarcinoma data was then used to evaluate the combined diagnostic performance of the 12 CpG sites. It achieved an accuracy of 99.7% on the TCGA data, and accuracies of 83.9% and 97.9% in validation on the GEO datasets. These results show that the CpG sites provided in this application discriminate lung adenocarcinoma tissue well and consistently across different datasets.
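The per-site figures above (AUC, sensitivity, specificity) can be computed from methylation beta values, for example as in the following Python sketch. It uses only the standard library and computes AUC as the Mann-Whitney probability that a case outranks a control. The cutoff is chosen by the Youden index, which is an assumption: the source does not state which cutoff rule was used. The function names and toy values are hypothetical and are not data from this application.

```python
"""Sketch: single-CpG diagnostic metrics from beta values (toy data only)."""
from __future__ import annotations


def _oriented(values, higher_in_cases):
    # Flip the sign for hypomethylated markers so "higher = more case-like".
    return list(values) if higher_in_cases else [-v for v in values]


def auc(cases, controls, higher_in_cases=True) -> float:
    """Mann-Whitney AUC: P(case score > control score), ties counted as 0.5."""
    xs = _oriented(cases, higher_in_cases)
    ys = _oriented(controls, higher_in_cases)
    wins = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(xs) * len(ys))


def sens_spec(cases, controls, cutoff, higher_in_cases=True):
    """Call a sample positive when beta >= cutoff (or <= cutoff if hypo)."""
    xs = _oriented(cases, higher_in_cases)
    ys = _oriented(controls, higher_in_cases)
    c = cutoff if higher_in_cases else -cutoff
    sens = sum(x >= c for x in xs) / len(xs)
    spec = sum(y < c for y in ys) / len(ys)
    return sens, spec


def youden_cutoff(cases, controls, higher_in_cases=True) -> float:
    """Observed beta value maximising sensitivity + specificity - 1."""
    candidates = sorted(set(cases) | set(controls))
    return max(
        candidates,
        key=lambda t: sum(sens_spec(cases, controls, t, higher_in_cases)) - 1,
    )


if __name__ == "__main__":
    tumor = [0.8, 0.7, 0.6, 0.3]      # toy beta values, not study data
    adjacent = [0.2, 0.1, 0.4, 0.35]
    print(auc(tumor, adjacent))                  # 0.875
    print(sens_spec(tumor, adjacent, 0.5))       # (0.75, 1.0)
    print(youden_cutoff(tumor, adjacent))        # 0.6
```

On real data, the same functions would be applied per CpG site to tumor and adjacent-tissue beta values. A multi-site model such as the random forest described above would need a machine-learning library and is not sketched here.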
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows results from Example 1 of the present invention: (A) principal component analysis of the 30 internal samples; (B) differential methylation analysis of tumor and adjacent tissue from 10 lung adenocarcinoma patients; and the differentially hypermethylated (C) and hypomethylated (D) sites remaining after removal of background noise attributable to leukocyte DNA from internal healthy controls;
FIG. 2 shows, for one embodiment of the present invention, a comparison between CpG sites differentially methylated in the internal lung adenocarcinoma samples and those differentially methylated in the TCGA lung adenocarcinoma data: the correlation of their methylation levels (A) and a Venn diagram (B);
FIG. 3 shows, for one embodiment of the present invention, the screening workflow for lung adenocarcinoma tissue-specific methylation markers (A), and the methylation levels of candidate markers in lung adenocarcinoma (B), healthy human whole blood (C) and ten other tumor types (D);
FIG. 4 shows the methylation levels of the 12 CpG sites of the present invention in lung adenocarcinoma of different TNM stages and in adjacent tissue;
FIG. 5 shows the methylation levels of the lung adenocarcinoma tissue-specific CpG sites of the present invention in lung cancer tissues. Note: LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; LCC, large cell lung cancer; SCLC, small cell lung cancer; Normal, adjacent non-tumor tissue;
FIG. 6 compares the diagnostic performance of the 12 CpG sites of the present invention in the TCGA lung adenocarcinoma data;
FIG. 7 shows principal component analysis of the TCGA and GEO datasets (A) and the AUC values of models based on different CpG site combinations in the three datasets (B).
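FIG. 1 and FIG. 3 describe two filtering ideas. The first removes candidate sites whose signal would be masked by leukocyte DNA, which is generally understood to make up most plasma cfDNA. The second keeps sites whose methylation is specific to lung adenocarcinoma relative to whole blood and other tumor types. The following Python sketch illustrates both steps under stated assumptions. The thresholds (`delta`, `bg_low`, `bg_high`, `min_margin`), the use of means and medians, and all function names are hypothetical and are not taken from the source.

```python
"""Sketch of two marker-screening filters (assumed thresholds, toy data)."""
from __future__ import annotations

from statistics import mean, median


def leukocyte_filter(tumor, adjacent, leukocyte,
                     delta=0.2, bg_low=0.2, bg_high=0.8):
    """Classify one CpG from beta values in three sample groups.

    Returns "hyper" if the tumor is more methylated than adjacent tissue by
    at least `delta` and leukocyte DNA is nearly unmethylated (<= bg_low).
    Returns "hypo" for the mirror case, with leukocyte DNA nearly fully
    methylated (>= bg_high). Otherwise returns None, because the site's
    signal would be masked by blood-derived DNA. All thresholds are assumed.
    """
    diff = mean(tumor) - mean(adjacent)
    wbc = mean(leukocyte)
    if diff >= delta and wbc <= bg_low:
        return "hyper"
    if diff <= -delta and wbc >= bg_high:
        return "hypo"
    return None


def specificity_margin(luad, others):
    """LUAD median beta minus the highest median among the other tissues.

    Written for hypermethylated markers; hypomethylated ones need the mirror.
    """
    return median(luad) - max(median(v) for v in others.values())


def rank_specific(candidates, min_margin=0.2):
    """Keep hypermethylated candidates whose margin >= min_margin, best first.

    `candidates` maps a site ID to (luad_betas, {tissue: betas}).
    """
    kept = []
    for cg, (luad, others) in candidates.items():
        m = specificity_margin(luad, others)
        if m >= min_margin:
            kept.append((cg, round(m, 3)))
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy values only; "x", "y", "z" are hypothetical site IDs.
    print(leukocyte_filter([0.7, 0.8], [0.2, 0.3], [0.05, 0.1]))
    toy = {
        "x": ([0.6, 0.7, 0.8], {"whole_blood": [0.05, 0.1, 0.1],
                                "BRCA": [0.2, 0.3, 0.25]}),
        "y": ([0.6, 0.7, 0.8], {"whole_blood": [0.1],
                                "COAD": [0.6, 0.65, 0.7]}),
        "z": ([0.5, 0.55, 0.6], {"whole_blood": [0.1], "LIHC": [0.3]}),
    }
    print(rank_specific(toy))
```

Using the maximum over all other tissues makes the filter strict: a single tumor type with similar methylation is enough to drop a site.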
Detailed Description of the Embodiments
The present invention is further described in detail below with reference to specific embodiments.
A lung adenocarcinoma diagnostic marker, comprising at least one of cg24996482 (NA), cg07077277 (HIST1H2AA), cg05148722 (HIST1H2BF), cg00608169 (NA), cg15435730 (TFAP2D), cg21174344 (NA), cg15803869 (NA), cg23943136 (GATA3), cg03933026 (NA), cg27630311 (TBX3), cg08448665 (C21orf88) and cg17900854 (C21orf88).
Use of CpG sites as lung adenocarcinoma tissue-specific methylation markers in the preparation of lung adenocarcinoma diagnostic products. The CpG sites are at least one of cg24996482 (NA), cg07077277 (HIST1H2AA), cg05148722 (HIST1H2BF), cg00608169 (NA), cg15435730 (TFAP2D), cg21174344 (NA), cg15803869 (NA), cg23943136 (GATA3), cg03933026 (NA), cg27630311 (TBX3), cg08448665 (C21orf88) and cg17900854 (C21orf88).
Preferably, the CpG sites are a combination of six CpG sites: cg07077277 (HIST1H2AA), cg08448665 (C21orf88), cg15435730 (TFAP2D), cg17900854 (C21orf88), cg27630311 (TBX3) and cg03933026 (NA).
The above is explained below with reference to specific validation experiments.
实施例1:肺腺癌组织DNA全基因甲基化检测:Example 1: Detection of DNA whole gene methylation in lung adenocarcinoma tissue:
A. Study subjects: The first phase comprised 10 lung adenocarcinoma cases and 10 healthy controls. Their baseline characteristics are given in Table 1.
Table 1 Characteristics of the subjects included in this study
Case group: newly diagnosed lung adenocarcinoma cases from Nantong First People's Hospital. Inclusion criteria: treatment-naive patients presenting for the first time with pathologically confirmed primary lung adenocarcinoma. All 10 cases were newly diagnosed at Nantong First People's Hospital, Jiangsu Province, between October and November 2022, and all were confirmed by histopathology. Patients with a history of malignancy, metastasis to other organs, or chemotherapy or radiotherapy before blood collection were excluded.
Control group: healthy individuals attending routine physical examinations. Inclusion criteria for controls were no history of any tumor and no nodules on lung imaging. The 10 healthy controls were randomly selected from individuals undergoing routine health examinations at Nantong First People's Hospital; individuals with a history of respiratory disease or malignancy were excluded.
B. Biospecimen collection and 850K methylation array detection
Fresh tumor tissue and matched adjacent tissue were collected from patients with lung adenocarcinoma and stored at -80°C. For healthy controls, 10 mL of peripheral venous blood was drawn into an Ardent Cell-Free DNA Blood Collection Tube and centrifuged at 1,900 g (3,000 rpm) for 10 min at room temperature to separate plasma, leukocytes and the remaining blood cells. The plasma was centrifuged again at 16,000 g for 10 min at 4°C, and the supernatant was transferred to a clean EP tube and stored at -80°C.
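The speed pair "1,900 g (3,000 rpm)" only holds for a particular rotor radius, which the text does not state. The standard conversion is RCF = 1.118 × 10^-5 × r × rpm², with r in cm. A minimal sketch, assuming that relation; the helper names are hypothetical:

```python
def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g): RCF = 1.118e-5 * r_cm * rpm**2."""
    return 1.118e-5 * radius_cm * rpm ** 2


def radius_for(rcf: float, rpm: float) -> float:
    """Rotor radius (cm) at which the given speed produces the given RCF."""
    return rcf / (1.118e-5 * rpm ** 2)


# Radius implied by the protocol's 1,900 g at 3,000 rpm (about 18.9 cm).
implied_radius_cm = radius_for(1900, 3000)
```

On a rotor with a different radius, the rpm setting would need to be recomputed to keep the force at 1,900 g.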
DNA was extracted from the tumor tissues of lung adenocarcinoma patients and from the blood leukocytes of healthy controls and profiled on the genome-wide 850K methylation array.
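The array reports, for each CpG probe, a methylated (M) and an unmethylated (U) signal intensity, and the methylation levels used in the examples below are on the beta scale (0 to 1). A minimal sketch of the usual Illumina-style beta value, M / (M + U + 100), assuming raw intensities are already available as arrays; the offset of 100 is the conventional default rather than something this text specifies, and `beta_values` is a hypothetical name:

```python
import numpy as np


def beta_values(meth, unmeth, offset=100.0):
    """Illumina-style beta value M / (M + U + offset), in [0, 1).

    Negative intensities are clipped to 0; the offset stabilises the
    ratio when both signals are low (assumed default of 100).
    """
    m = np.clip(np.asarray(meth, dtype=float), 0.0, None)
    u = np.clip(np.asarray(unmeth, dtype=float), 0.0, None)
    return m / (m + u + offset)
```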
Specifically, genome-wide methylation profiling (850K) was performed on DNA from matched tumor and adjacent normal tissue of 10 patients with early-stage lung adenocarcinoma and on peripheral blood leukocyte DNA from 10 healthy subjects. After quality control, one tumor sample with abnormal results was excluded (Figure 1A). Differential methylation analysis was then performed between the 9 remaining early-stage tumor samples and the 10 adjacent normal tissue samples. At FDR < 0.05, 57,659 CpG sites were hypermethylated and 95,768 were hypomethylated in early-stage lung adenocarcinoma (Figure 1B). Hypermethylated sites with a high methylation background in leukocyte DNA (median methylation level > 0.1; 49,639 sites) were removed, leaving 8,020 candidate hypermethylated CpG sites for lung adenocarcinoma (Figure 1C). Hypomethylated sites with a low methylation background in leukocyte DNA (median methylation level < 0.9; 77,988 sites) were removed, leaving 17,781 candidate hypomethylated CpG sites for lung adenocarcinoma (Figure 1D).
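The two-step screen above (FDR-controlled tumor versus adjacent comparison, then removal of sites with a leukocyte background) can be sketched as follows. The text does not name the per-site statistical test or the FDR procedure, so this sketch takes per-site p-values as input and assumes Benjamini-Hochberg adjustment; `bh_fdr` and `tissue_candidates` are hypothetical names. The leukocyte cut-offs (median > 0.1 removed for hypermethylated sites, median < 0.9 removed for hypomethylated sites) are the ones stated above.

```python
import numpy as np


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Step-up: each adjusted value is the minimum over all larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


def tissue_candidates(delta_beta, pvals, leuk_median,
                      fdr_cut=0.05, hyper_bg=0.1, hypo_bg=0.9):
    """Return indices of candidate hyper- and hypomethylated sites.

    delta_beta: tumor minus adjacent-normal beta difference per site.
    pvals: per-site p-values from a tumor-vs-adjacent test (test not specified).
    leuk_median: median beta of each site in healthy leukocyte DNA.
    """
    delta = np.asarray(delta_beta, dtype=float)
    leuk = np.asarray(leuk_median, dtype=float)
    significant = bh_fdr(pvals) < fdr_cut
    # Hypermethylated in tumor and essentially unmethylated in leukocytes.
    hyper = significant & (delta > 0) & (leuk <= hyper_bg)
    # Hypomethylated in tumor and essentially fully methylated in leukocytes.
    hypo = significant & (delta < 0) & (leuk >= hypo_bg)
    return np.flatnonzero(hyper), np.flatnonzero(hypo)
```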
Example 2: Screening of lung adenocarcinoma-specific methylation markers
Differential analysis was performed on the TCGA lung adenocarcinoma genome-wide methylation array data and combined with our group's earlier 850K array results for lung adenocarcinoma; candidate CpG sites that were statistically significant with a consistent direction of change in both were retained. Genome-wide methylation array data from tumor tissue and adjacent tissue of ten high-incidence malignancies (breast cancer, lung squamous cell carcinoma, colorectal cancer, prostate cancer, gastric cancer, liver cancer, esophageal cancer, cervical cancer, thyroid cancer and bladder cancer) and from healthy human blood were then incorporated. Among the lung adenocarcinoma candidates, sites showing interfering methylation background in these ten tumor types or in healthy whole blood were removed, yielding a lung adenocarcinoma tissue-specific DNA methylation map and, from it, lung adenocarcinoma tissue-specific methylation markers.
Starting from these candidate CpG sites, a pan-cancer analysis was performed on 450K methylation array data from tumor tissues of 11 cancer types (including lung adenocarcinoma) and from healthy whole blood in the TCGA and GEO databases (Table 2), to construct the tissue-specific methylation map of lung adenocarcinoma.
Table 2 Demographic and clinical information of the GEO samples
Stricter criteria (|β(tumor) - β(adjacent)| > 0.2 and FDR < 0.05) were used to screen differentially methylated sites in 455 lung adenocarcinoma tissues and 32 adjacent tissues from TCGA. The methylation levels of the differentially methylated CpG sites were highly consistent between the two datasets (TCGA and the 850K array data) (Figure 2A). Intersecting these sites with the candidate markers above yielded 3,255 hypermethylated and 837 hypomethylated CpG sites (Figure 2B).
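The TCGA step keeps only candidates that also pass |β(tumor) - β(adjacent)| > 0.2 and FDR < 0.05 in TCGA, with the same direction of change as in the 850K screen. A minimal sketch of that intersection, assuming per-site TCGA results are already computed; `tcga_filter` and its dictionary layout are hypothetical:

```python
def tcga_filter(tcga, candidates, min_abs_delta=0.2, fdr_cut=0.05):
    """Keep candidates that pass the TCGA thresholds in the same direction.

    tcga: dict cpg_id -> (delta_beta, fdr), delta_beta = beta_tumor - beta_adjacent.
    candidates: dict cpg_id -> +1 (hypermethylated) or -1 (hypomethylated)
                from the 850K screen.
    Returns (hyper_ids, hypo_ids), each sorted.
    """
    hyper, hypo = [], []
    for cpg, direction in candidates.items():
        if cpg not in tcga:
            continue  # e.g. an 850K probe not present on the 450K array
        delta, fdr = tcga[cpg]
        if fdr >= fdr_cut or abs(delta) <= min_abs_delta:
            continue  # fails the stricter TCGA thresholds
        if direction > 0 and delta > 0:
            hyper.append(cpg)
        elif direction < 0 and delta < 0:
            hypo.append(cpg)
        # Sites whose direction disagrees between datasets are dropped.
    return sorted(hyper), sorted(hypo)
```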
Finally, CpG sites that could cause background interference in whole-blood DNA from 1,501 healthy individuals, and sites with aberrant methylation in 4,633 tissue samples from ten other cancer types, were removed. This left 12 CpG sites specifically methylated in lung adenocarcinoma tissue, annotated to 6 known genes (Table 3, Figure 3A). Compared with adjacent tissue (Figure 3B), healthy peripheral blood leukocytes (Figure 3C) and ten other malignant tumor tissues (Figure 3D), these 12 sites are specifically hypermethylated or hypomethylated in lung adenocarcinoma tissue.
Table 3 Genomic information of the 12 lung adenocarcinoma tissue-specific methylated CpG sites
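The final specificity filter removes sites with background signal in healthy whole blood or aberrant methylation in the ten other cancer types. The text does not give the numeric criteria for this step, so the sketch below summarizes each group by its median and uses `hyper_max=0.2` and `hypo_min=0.8`, which are hypothetical placeholders chosen only for illustration; `is_specific` is likewise a hypothetical name.

```python
import numpy as np


def is_specific(direction, blood_betas, other_cancer_betas,
                hyper_max=0.2, hypo_min=0.8):
    """True if a candidate shows no background in blood or other cancers.

    direction: +1 for a site hypermethylated in LUAD, -1 for hypomethylated.
    blood_betas: beta values of the site in healthy whole-blood samples.
    other_cancer_betas: dict mapping cancer type -> beta values in that type.
    hyper_max, hypo_min: HYPOTHETICAL cut-offs; the source gives no numbers.
    """
    groups = [blood_betas, *other_cancer_betas.values()]
    medians = np.array([np.median(np.asarray(g, dtype=float)) for g in groups])
    if direction > 0:
        # A LUAD-hypermethylated marker must stay low in every other group.
        return bool(np.all(medians <= hyper_max))
    # A LUAD-hypomethylated marker must stay high in every other group.
    return bool(np.all(medians >= hypo_min))
```

A per-group median is a lenient summary; a stricter variant could test an upper or lower quantile instead.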
Example 3: Validation of lung adenocarcinoma tissue-specific methylation diagnostic markers
To make a preliminary assessment of how well the markers distinguish lung adenocarcinoma tissue from adjacent normal tissue, the TCGA lung adenocarcinoma 450K data were used as the training set, and three GEO lung adenocarcinoma tissue 450K methylation datasets (GSE39279, GSE52401 and GSE56044) were used for external validation. ROC analysis was performed for each CpG site to obtain the AUC and the optimal cutoff (the point of maximum Youden index). A confusion matrix was generated at the optimal cutoff, and the sensitivity, specificity and accuracy of each CpG site were calculated. To evaluate the combined predictive performance of the selected markers, a random forest model was built using all CpG sites; it was validated internally by 10-fold cross-validation on the training set and externally on the GEO datasets, and its diagnostic performance was summarized by accuracy, sensitivity, specificity and AUC.
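The per-site evaluation described above (AUC, cutoff at the maximum Youden index J = sensitivity + specificity - 1, then sensitivity, specificity and accuracy from the resulting confusion matrix) can be sketched with NumPy alone. This assumes labels 1 = lung adenocarcinoma and 0 = adjacent normal, with both classes present; for hypomethylated sites the score is negated so that a higher score always means more tumor-like. AUC is computed as the Mann-Whitney probability that a random tumor sample scores above a random normal one, counting ties as one half. `site_metrics` is a hypothetical name.

```python
import numpy as np


def site_metrics(beta, is_luad, hypermethylated=True):
    """ROC summary for one CpG site with a Youden-optimal cutoff."""
    beta = np.asarray(beta, dtype=float)
    y = np.asarray(is_luad, dtype=bool)
    score = beta if hypermethylated else -beta  # higher = more LUAD-like

    # AUC as P(score_tumor > score_normal), ties counted as 0.5.
    diff = score[y][:, None] - score[~y][None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))

    best = None
    for t in np.unique(score):
        pred = score >= t  # confusion matrix at threshold t
        tp = int(np.sum(pred & y)); fn = int(np.sum(~pred & y))
        tn = int(np.sum(~pred & ~y)); fp = int(np.sum(pred & ~y))
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        if best is None or sens + spec - 1 > best[0]:
            best = (sens + spec - 1, t, sens, spec, (tp + tn) / y.size)

    _, t, sens, spec, acc = best
    return {
        "auc": auc,
        "cutoff": float(t if hypermethylated else -t),  # back on beta scale
        "rule": "beta >= cutoff" if hypermethylated else "beta <= cutoff",
        "sensitivity": sens, "specificity": spec, "accuracy": acc,
    }
```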
To examine the methylation levels of the 12 lung adenocarcinoma-specific CpG sites across TNM stages, the sites were compared among patients at different stages; their methylation levels were already altered in patients with early-stage lung adenocarcinoma (Figure 4). In addition, the changes were specific to lung adenocarcinoma relative to lung squamous cell carcinoma, large cell lung cancer and small cell lung cancer (Figure 5).
In the TCGA lung adenocarcinoma data, the individual diagnostic performance of the 12 CpG sites ranged from 0.652 to 0.911 for AUC, from 54.9% to 83.1% for sensitivity and from 81.2% to 100% for specificity (Figure 6).
A random forest model built on the TCGA lung adenocarcinoma data to evaluate the combined diagnostic performance of the 12 CpG sites achieved an accuracy of 99.7%; in validation on the GEO datasets, the accuracies were 83.9% and 97.9%, respectively (Table 4).
Table 4 Random forest confusion matrices for the 12 CpG sites
Note: LUAD, lung adenocarcinoma.
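The random forest workflow (train on TCGA, internal 10-fold cross-validation, external confusion matrix on GEO) might look like the following in scikit-learn. The text does not say which software or hyperparameters were used; `n_trees=500`, the shuffled stratified folds and the function name `train_and_validate` are assumptions for this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score


def train_and_validate(X_train, y_train, X_ext, y_ext,
                       n_trees=500, n_splits=10, seed=0):
    """Random forest on CpG beta values (rows = samples, cols = CpG sites).

    Labels: 1 = lung adenocarcinoma, 0 = adjacent normal tissue.
    n_trees and the fold settings are assumptions, not from the source.
    """
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)

    # Internal validation: stratified k-fold cross-validation on training data.
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    cv_acc = cross_val_score(model, X_train, y_train, cv=folds, scoring="accuracy")

    # Refit on the full training set, then predict the external dataset.
    model.fit(X_train, y_train)
    pred = model.predict(X_ext)

    # Confusion matrix: rows = true class, columns = predicted class (0, 1).
    cm = confusion_matrix(y_ext, pred, labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    return {
        "cv_accuracy": float(np.mean(cv_acc)),
        "external_accuracy": float(accuracy_score(y_ext, pred)),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "confusion_matrix": cm,
    }
```

Each external GEO dataset would be passed separately as `X_ext`, `y_ext`, with its columns in the same CpG order as the training matrix.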
Next, recursive feature elimination (RFE) based on the random forest model (10-fold cross-validation repeated 5 times) was used to build models containing from 1 to 12 CpG sites, and the best model was chosen as the one with the highest AUC. With only six CpG sites in the model, cg07077277 (HIST1H2AA), cg08448665 (C21orf88), cg15435730 (TFAP2D), cg17900854 (C21orf88), cg27630311 (TBX3) and cg03933026 (NA), the combined model already reached an AUC of 0.985 (Figure 7). Applied to the three GEO datasets, the model gave AUCs of 0.995 for GSE39279 & GSE52401 and 0.994 for GSE56044. These results indicate that these CpG sites discriminate lung adenocarcinoma tissue well and consistently across datasets.
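Recursive feature elimination with a random forest, scored by AUC over 10-fold cross-validation repeated 5 times, has a close analogue in scikit-learn's `RFECV`. The sketch below assumes that analogue; the source does not name its software, and `RFECV` may not reproduce its exact selection. `select_sites` is a hypothetical helper, and `n_trees=200` is an assumed setting.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import RepeatedStratifiedKFold


def select_sites(X, y, site_ids, n_splits=10, n_repeats=5, n_trees=200, seed=0):
    """Hypothetical helper: RF-based RFE over 1..n_sites features, scored by AUC.

    X: samples x CpG sites (beta values); y: 1 = LUAD, 0 = adjacent normal.
    site_ids: CpG IDs in the same order as the columns of X.
    """
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                 random_state=seed)
    # step=1 drops the least important site (by RF feature importance) each
    # round, so every panel size from len(site_ids) down to 1 is scored.
    selector = RFECV(rf, step=1, min_features_to_select=1, cv=cv,
                     scoring="roc_auc")
    selector.fit(np.asarray(X, dtype=float), np.asarray(y))
    chosen = [s for s, keep in zip(site_ids, selector.support_) if keep]
    return chosen, selector
```

`selector.n_features_` gives the selected panel size; on the TCGA data the text reports that a six-site model gave the best AUC.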
In summary, this study first explored candidate methylation diagnostic markers for lung adenocarcinoma by genome-wide methylation array profiling, then integrated these results with large-sample, multi-cancer genome-wide methylation array data, and identified methylation markers in lung adenocarcinoma tissue that distinguish it from adjacent tissue, lung squamous cell carcinoma and other tumor tissues. Based on the theory of tumor tissue-specific methylation maps, and assuming that some of the DNA fragments released actively or passively by lung adenocarcinoma cells into the circulation carry this specific methylation pattern, plasma cfDNA testing may allow these signals to be traced specifically to lung adenocarcinoma and thereby support its diagnosis.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410669622.7A CN118421799A (en) | 2024-05-28 | 2024-05-28 | A group of specific methylation diagnostic markers related to lung adenocarcinoma and their application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118421799A true CN118421799A (en) | 2024-08-02 |
Family
ID=92306941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410669622.7A Pending CN118421799A (en) | 2024-05-28 | 2024-05-28 | A group of specific methylation diagnostic markers related to lung adenocarcinoma and their application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118421799A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150197813A1 (en) * | 2012-08-10 | 2015-07-16 | Trustees Of Dartmouth College | Method for determining sensitivity to decitabine treatment |
CN111742062A (en) * | 2017-10-06 | 2020-10-02 | 优美佳肿瘤技术有限公司 | Methylation markers for cancer diagnosis |
US20210087630A1 (en) * | 2018-02-18 | 2021-03-25 | Yissum Research Development Company Of The Hebrew University Of Jerusalmem Ltd. | Cell free dna deconvolusion and use thereof |
CN113960215A (en) * | 2021-11-09 | 2022-01-21 | 上海市第一人民医院 | Marker for lung adenocarcinoma diagnosis and application thereof |
CN117143994A (en) * | 2023-10-12 | 2023-12-01 | 上海交通大学医学院附属第九人民医院黄浦分院 | Peripheral blood circulation DNA methylation marker for diagnosing Alzheimer disease, kit and identification method thereof |
2024
- 2024-05-28: CN202410669622.7A (CN118421799A), active, pending
Non-Patent Citations (1)
Title |
---|
NCBI: "Platform GPL21145", Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL21145> * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||