CN118792426A - Pancreatic ductal adenocarcinoma prediction method and system based on oral flora - Google Patents
- Publication number
- CN118792426A (application number CN202410930065.XA)
- Authority
- CN
- China
- Prior art keywords
- biomarker
- ductal adenocarcinoma
- pancreatic ductal
- pancreatic
- patients
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C12Q1/689: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes, for detection or identification of organisms, for bacteria
- C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes, for diseases caused by alterations of genetic material, for cancer
- G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems
- G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for calculating health indices; for individual health risk assessment
- G16H70/40: ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
- C12Q2600/106: Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
- C12Q2600/118: Prognosis of disease development
Abstract
The present invention belongs to the field of pancreatic ductal adenocarcinoma prediction and provides a method and system for predicting pancreatic ductal adenocarcinoma based on oral flora. The method includes: obtaining, for a subject to be tested, the relative abundance of each biomarker in a set of candidate diagnostic biomarkers for pancreatic ductal adenocarcinoma; and using those relative abundances together with a pre-trained multivariate statistical model to obtain a probability value of having pancreatic ductal adenocarcinoma. The biomarker set is obtained by performing between-group microbiome difference analysis on saliva, duodenal fluid and pancreatic tissue samples from patients with pancreatic ductal adenocarcinoma and patients with benign pancreatic disease, and then taking the intersection of the differential bacteria found in those analyses.
Description
Technical Field
The present invention belongs to the field of pancreatic ductal adenocarcinoma prediction, and in particular relates to a method and system for predicting pancreatic ductal adenocarcinoma based on oral flora.
Background Art
The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.
The pancreas is anatomically hidden, early lesions are difficult to detect, most cases of PDAC (pancreatic ductal adenocarcinoma) lack clear symptoms, and few biomarkers are available for diagnosis. As a result, approximately 80-85% of patients already have unresectable or metastatic disease at the time of diagnosis. Even for the small proportion of patients diagnosed with locally resectable tumors, the prognosis remains poor, with a 5-year survival rate of only 20%. The etiology of PDAC is complex, and microbial flora, such as that of the oral cavity, may also be related to its onset. Existing studies have shown that the oral microbial composition differs between healthy controls and pancreatic cancer patients. However, the relationship between the oral flora and the pancreatic flora is unclear, and no biomarker combination for predicting PDAC has been determined jointly with the flora of the pancreas itself. Consequently, the probability of pancreatic ductal adenocarcinoma cannot be predicted quantitatively from the oral flora, and doctors cannot be given more quantitative and precise decision support.
Summary of the Invention
To solve the technical problems described in the background above, the present invention provides a method and system for predicting pancreatic ductal adenocarcinoma based on oral flora, which can give doctors more quantitative and precise decision support.
To achieve the above object, the present invention adopts the following technical solutions.
A first aspect of the present invention provides a method for predicting pancreatic ductal adenocarcinoma based on oral flora.
A method for predicting pancreatic ductal adenocarcinoma based on oral flora, comprising:
obtaining, for a subject to be tested, relative abundance information for each biomarker in a set of candidate diagnostic biomarkers for pancreatic ductal adenocarcinoma;
using the relative abundance information of each biomarker in the biomarker set and a pre-trained multivariate statistical model to obtain a probability value of having pancreatic ductal adenocarcinoma;
wherein the set of candidate diagnostic biomarkers is obtained by performing between-group microbiome difference analysis on saliva, duodenal fluid and pancreatic tissue samples from a group of patients with pancreatic ductal adenocarcinoma and a group of patients with benign pancreatic disease, and taking the intersection of the differential bacteria found;
and wherein, in training the multivariate statistical model, each training sample consists of the relative abundance information of each biomarker in the biomarker set and its corresponding attribute label, the attribute label being either pancreatic ductal adenocarcinoma patient or benign pancreatic disease patient.
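The prediction step of the method can be illustrated with a minimal sketch. It assumes a logistic model as a simple stand-in for the "multivariate statistical model"; the text does not specify the model form at this point, and the taxon names, coefficients and intercept below are hypothetical values chosen only for illustration, not values from the patent.

```python
import math

# Hypothetical biomarker names and coefficients (assumptions for illustration only).
COEFFICIENTS = {"taxon_a": 4.0, "taxon_b": -2.5, "taxon_c": 1.5}
INTERCEPT = -0.5


def predict_pdac_probability(relative_abundance):
    """Map per-biomarker relative abundances to a probability in (0, 1)."""
    missing = set(COEFFICIENTS) - set(relative_abundance)
    if missing:
        raise ValueError(f"missing biomarkers: {sorted(missing)}")
    # Linear score over the biomarker set, then the logistic function.
    score = INTERCEPT + sum(
        weight * relative_abundance[name] for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


# Relative abundances for one hypothetical subject.
sample = {"taxon_a": 0.20, "taxon_b": 0.10, "taxon_c": 0.05}
probability = predict_pdac_probability(sample)
```

In practice the coefficients would come from training on labelled samples (pancreatic ductal adenocarcinoma versus benign pancreatic disease), as the method describes; any other probabilistic classifier could take the place of the logistic function here.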
A second aspect of the present invention provides a system for predicting pancreatic ductal adenocarcinoma based on oral flora.
A system for predicting pancreatic ductal adenocarcinoma based on oral flora, comprising:
a relative abundance information acquisition module, configured to obtain, for a subject to be tested, relative abundance information for each biomarker in a set of candidate diagnostic biomarkers for pancreatic ductal adenocarcinoma;
a pancreatic ductal adenocarcinoma prediction module, configured to use the relative abundance information of each biomarker in the biomarker set and a pre-trained multivariate statistical model to obtain a probability value of having pancreatic ductal adenocarcinoma;
wherein the set of candidate diagnostic biomarkers is obtained by performing between-group microbiome difference analysis on saliva, duodenal fluid and pancreatic tissue samples from a group of patients with pancreatic ductal adenocarcinoma and a group of patients with benign pancreatic disease, and taking the intersection of the differential bacteria found;
and wherein, in training the multivariate statistical model, each training sample consists of the relative abundance information of each biomarker in the biomarker set and its corresponding attribute label, the attribute label being either pancreatic ductal adenocarcinoma patient or benign pancreatic disease patient.
A third aspect of the present invention provides a computer-readable storage medium.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for predicting pancreatic ductal adenocarcinoma based on oral flora described above.
A fourth aspect of the present invention provides a computer device.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for predicting pancreatic ductal adenocarcinoma based on oral flora described above.
Compared with the prior art, the present invention has the following beneficial effects:
(1) The present invention uses the relative abundance information of each biomarker in the set of candidate diagnostic biomarkers for pancreatic ductal adenocarcinoma, together with a multivariate statistical model, to predict the probability of having pancreatic ductal adenocarcinoma. It establishes the relationship between the oral flora and the pancreatic flora, determines a biomarker combination for predicting pancreatic ductal adenocarcinoma jointly with the flora of the pancreas itself, and quantitatively predicts the probability of pancreatic ductal adenocarcinoma from the oral flora. This gives doctors more quantitative and precise decision support and provides data to back their diagnosis of pancreatic ductal adenocarcinoma.
(2) The biomarkers disclosed in the present invention have high accuracy and specificity and good prospects for development into a diagnostic method, thereby providing a basis for PDAC risk assessment, diagnosis and early diagnosis, and for the search for potential drug targets. The oral-flora-based PDAC biomarker combination can be used as a detection target in the preparation of detection kits, and as a target in screening drugs for the treatment and/or prevention of PDAC. Changes in the relative abundance of the biomarker combination provide a basis for determining whether a candidate drug is effective.
Advantages of additional aspects of the present invention will be given in part in the following description, and in part will become apparent from that description or be learned through practice of the present invention.
Brief Description of the Drawings
The accompanying drawings, which form a part of the present invention, are provided for a further understanding of the invention. The exemplary embodiments of the present invention and their descriptions are used to explain the invention and do not constitute an undue limitation on it.
FIG. 1 is a flow chart of a method for predicting pancreatic ductal adenocarcinoma based on oral flora according to an embodiment of the present invention;
FIG. 2 shows a set of candidate diagnostic biomarkers for pancreatic ductal adenocarcinoma according to an embodiment of the present invention;
FIG. 3 shows the performance of the biomarkers in diagnosing pancreatic ductal adenocarcinoma, as tested with an RF model, according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a system for predicting pancreatic ductal adenocarcinoma based on oral flora according to an embodiment of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present invention. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present invention belongs.
It should also be noted that the terms used herein are only for describing specific embodiments and are not intended to limit the exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular form is intended to include the plural form. In addition, when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
Explanation of terms:
PDAC, pancreatic ductal adenocarcinoma: the most common type of pancreatic cancer, formed by malignant transformation of the exocrine tissue of the pancreas (including acinar cells and ductal cells); it originates mainly from pancreatic intraepithelial neoplasia or cystic tumors. Compared with healthy people, PDAC patients show varying degrees of dysbiosis, mainly a decrease in probiotic bacteria and an increase in opportunistic pathogens.
Biomarker: "a characteristic that can be objectively measured and evaluated as an indicator of normal biological processes, pathological processes, or pharmacological responses to a therapeutic intervention". Examples include nucleic acid markers (also called gene markers, such as DNA), protein markers, cytokine markers, chemokine markers, carbohydrate markers, antigen markers, antibody markers, species markers (species/genus markers) and functional markers (KO/OG markers). The meaning of nucleic acid marker is not limited to known genes that can be expressed as biologically active proteins. It also includes any nucleic acid fragment, whether DNA or RNA, modified or unmodified, and any collection of such fragments. Nucleic acid markers are sometimes also referred to herein as characteristic fragments.
In the present invention, biomarkers may also be called "oral markers", because the PDAC-associated biomarkers found are all present in the oral cavity of the subject. Biomarkers are measured and evaluated, are often used to examine normal biological processes, pathogenic processes, or pharmacological responses to therapeutic interventions, and are useful in many scientific fields.
Embodiment 1
FIG. 1 is a flow chart of a method for predicting pancreatic ductal adenocarcinoma based on oral flora according to an embodiment of the present invention. As shown in FIG. 1, this embodiment provides a method for predicting pancreatic ductal adenocarcinoma based on oral flora, which specifically includes the following steps:
S101: Obtain, for a subject to be tested, relative abundance information for each biomarker in a set of candidate diagnostic biomarkers for pancreatic ductal adenocarcinoma.
The set of candidate diagnostic biomarkers is obtained by performing between-group microbiome difference analysis on saliva, duodenal fluid and pancreatic tissue samples from a group of patients with pancreatic ductal adenocarcinoma and a group of patients with benign pancreatic disease, and taking the intersection of the differential bacteria found in the three sample types.
The relative abundance of each biomarker in the biomarker set is obtained by sequencing, using a second-generation or third-generation sequencing method. The sequencing means is not particularly limited; second- or third-generation sequencing allows fast and efficient sequencing. For example, sequencing may be performed with at least one of HiSeq 2000, SOLiD, 454 and a single-molecule sequencing device. The high-throughput, deep-sequencing characteristics of these devices benefit the analysis of the resulting sequencing data, in particular the precision and accuracy of statistical tests.
In some specific embodiments, saliva, duodenal fluid and pancreatic tissue samples are collected, transported frozen and quickly transferred to -80°C for storage, and DNA is then extracted to obtain DNA samples. The saliva, duodenal fluid and pancreatic tissue samples of the pancreatic ductal adenocarcinoma patient group and the benign pancreatic disease patient group come from China and total 239 samples: 85 saliva samples (22 patients with benign pancreatic lesions and 63 PDAC patients), 69 duodenal fluid samples (19 patients with benign pancreatic lesions and 50 PDAC patients), and 85 pancreatic tissue samples (22 patients with benign pancreatic lesions and 63 PDAC patients). All samples were obtained with the consent of the subjects and from lawful sources.
Using the extracted DNA as template, the V3-V4 variable region of the 16S rRNA gene is amplified by PCR with the barcoded forward primer 338F (5'-ACTCCTACGGGAGGCAGCAG-3') and the reverse primer 806R (5'-GGACTACHVGGGTWTCTAAT-3'). Libraries are constructed from the purified PCR products with the NEXTFLEX Rapid DNA-Seq Kit, and sequencing is performed on the Illumina MiSeq PE300/NovaSeq PE250 platform.
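The reverse primer 806R contains the degenerate IUPAC bases H, V and W, so it stands for a small pool of concrete oligonucleotides. The following minimal Python sketch, which is illustrative only and not part of the disclosed method, expands both primers using the standard IUPAC code table; the function name `expand_degenerate` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the text (5' to 3').
PRIMER_338F = "ACTCCTACGGGAGGCAGCAG"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"

forward_variants = expand_degenerate(PRIMER_338F)
reverse_variants = expand_degenerate(PRIMER_806R)

print(len(forward_variants), len(reverse_variants))
```

Because H and V each stand for three bases and W for two, 806R expands to 3 x 3 x 2 concrete sequences, whereas 338F contains no degenerate positions.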
The raw paired-end reads are quality-controlled and merged. Using UPARSE (version 7.1, maxee=3, minlength=370), the quality-controlled, merged sequences are clustered into operational taxonomic units (OTUs) at 97% similarity, and chimeras are removed. The representative sequences of the 7,219 OTUs are compared against the Ribosomal Database Project database (release 18) to obtain the taxonomy of the microbiome at 80% similarity to species in the reference database.
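Step S101 works with relative abundances rather than raw counts. As a minimal sketch, assuming that relative abundance means each taxon's read count divided by the total read count of the sample (the source does not spell out the normalisation), the conversion could look as follows. The counts are invented for illustration.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts of one sample into fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical read counts for one saliva sample (not data from the patent).
sample_counts = {
    "Prevotella buccalis": 120,
    "Akkermansia muciniphila": 30,
    "Slackia exigua": 50,
}

abundance = relative_abundance(sample_counts)
print(abundance)
```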
MaAsLin2 (version 1.14.1) is used for the analysis, adjusting for the effects of age and sex, and genera or species with a false discovery rate (FDR)-adjusted p-value below 0.001 are selected. Between-group differential microbiome analysis is performed at the species level on the saliva, duodenal fluid and pancreatic tissue samples of PDAC patients and patients with benign pancreatic lesions. The differential bacteria found in the different sample types are intersected and clustered with K-means, giving a panel of 25 microorganisms (4 clusters), shown in Fig. 2, as potential biomarkers for diagnosing PDAC. The biomarker set consists of the following biomarkers:
Biomarker 1 is Bifidobacterium bifidum;
Biomarker 2 is Sutterella massiliensis;
Biomarker 3 is Herbaspirillum huttiense;
Biomarker 4 is Prevotella buccalis;
Biomarker 5 is Phocaeicola abscessus;
Biomarker 6 is Prevotella dentalis;
Biomarker 7 is Peptoanaerobacter stomatis;
Biomarker 8 is Schwartzia succinivorans;
Biomarker 9 is Anaeroglobus geminatus;
Biomarker 10 is Olsenella uli;
Biomarker 11 is Slackia exigua;
Biomarker 12 is Arachnia rubra;
Biomarker 13 is Leptotrichia goodfellowii;
Biomarker 14 is Propionibacterium acidifaciens;
Biomarker 15 is Fusobacterium mortiferum;
Biomarker 16 is Acidaminococcus fermentans;
Biomarker 17 is Loigolactobacillus coryniformis;
Biomarker 18 is Bacteroides caecigallinarum;
Biomarker 19 is Caldicoprobacter faecalis;
Biomarker 20 is Atopostipes suicloacalis;
Biomarker 21 is Akkermansia muciniphila;
Biomarker 22 is Phocaeicola vulgatus;
Biomarker 23 is Bacteroides acidifaciens;
Biomarker 24 is Lactiplantibacillus plantarum;
Biomarker 25 is Faecalibacterium prausnitzii.
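The selection logic that leads to this panel (an FDR cut-off per sample type, followed by an intersection across sample types) can be sketched as below. This is a minimal illustration under stated assumptions: the Benjamini-Hochberg procedure is assumed for the FDR adjustment, the taxon names and p-values are invented, and the covariate-adjusted modelling done by MaAsLin2 and the subsequent K-means clustering are not reproduced.

```python
def benjamini_hochberg(pvalues: dict[str, float]) -> dict[str, float]:
    """Return Benjamini-Hochberg adjusted p-values keyed by taxon."""
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted: dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        taxon, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[taxon] = running_min
    return adjusted


def differential_taxa(pvalues: dict[str, float], alpha: float = 0.001) -> set[str]:
    """Taxa whose adjusted p-value is below alpha (0.001 in the text)."""
    return {t for t, q in benjamini_hochberg(pvalues).items() if q < alpha}


# Hypothetical raw p-values per sample type (placeholders, not patent data).
saliva = {"taxon_A": 1e-6, "taxon_B": 2e-5, "taxon_C": 0.2, "taxon_D": 1e-4}
duodenal_fluid = {"taxon_A": 5e-5, "taxon_B": 0.03, "taxon_C": 0.5, "taxon_D": 2e-4}
pancreatic_tissue = {"taxon_A": 1e-5, "taxon_B": 1e-5, "taxon_C": 0.4, "taxon_D": 0.01}

# Candidate panel: taxa that are differential in all three sample types.
panel = (
    differential_taxa(saliva)
    & differential_taxa(duodenal_fluid)
    & differential_taxa(pancreatic_tissue)
)
print(sorted(panel))
```

Requiring a taxon to be differential in saliva, duodenal fluid and pancreatic tissue at once is what ties the oral markers to the flora of the pancreas itself.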
It will be understood that the number of biomarkers in the biomarker set can be chosen by those skilled in the art according to the accuracy actually required, and this is not described in detail here.
The PDAC-associated biomarkers proposed by the present invention are valuable for early diagnosis and have high specificity and sensitivity. Analysis of the oral flora provides accuracy, safety, affordability and patient compliance, and throat swab samples are transportable. Tests based on the polymerase chain reaction (PCR) are comfortable and non-invasive, so people are more willing to take part in a given screening program. The markers of the present invention can also be used as a tool for treatment monitoring of PDAC patients, to detect the response to treatment. Because of the way abundance is measured, the combination of the above 25 markers is suited to cases in which abundance is measured by marker-gene alignment.
S102: Use the relative abundance of each biomarker in the biomarker set and a pre-trained multivariate statistical model to obtain the probability of having pancreatic ductal adenocarcinoma.
In S102, the multivariate statistical model is preferably a random forest (RF) model.
The 25 microorganisms selected in step S101 are input into the RF model. An RF model is a classifier comprising multiple decision trees, whose output class is the mode of the classes output by the individual trees. The RF model is constructed as follows:
(1) Let N denote the number of training cases (samples) and M the number of features.
(2) Specify the number of features m used to determine the decision at a node of a decision tree, where m should be much smaller than M.
(3) Sample N times with replacement from the N training cases (samples) to form a training set (i.e. bootstrap sampling), and use the cases (samples) that were not drawn to make predictions and estimate the error.
(4) At each node, randomly select m features; the decision at each node of the decision tree is based on these features. Compute the best split from these m features.
(5) Each tree is grown fully without pruning (pruning is something that may be applied after building an ordinary tree classifier). The classifier is evaluated with 5-fold cross-validation: the relative abundances of the species selected for the RF model are used to compute the PDAC risk of each individual, the ROC curve is plotted, and the AUC is calculated as the parameter for evaluating the performance of the discriminant model, as shown in Fig. 3.
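The three mechanics that distinguish the construction above from a single decision tree, namely bootstrap sampling with out-of-bag cases (step 3), per-node random feature subsets (step 4) and voting by the mode (the classifier definition), can be sketched in plain Python as follows. This is not a full random forest: tree growing and the 5-fold cross-validation are omitted. The value N = 85 is borrowed from the saliva sample count purely for illustration, m = 5 is an assumed value (roughly the square root of M), and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_indices(n: int, rng: random.Random) -> tuple[list[int], list[int]]:
    """Draw n indices with replacement; return (in-bag, out-of-bag) indices."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def random_feature_subset(n_features: int, m: int, rng: random.Random) -> list[int]:
    """Pick m distinct candidate features for one node split."""
    return sorted(rng.sample(range(n_features), m))


def majority_vote(tree_predictions: list[str]) -> str:
    """The forest's output class is the mode of the individual tree outputs."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
N, M = 85, 25            # samples (illustrative) and features (25 markers)
m = 5                    # assumed number of features tried per node

in_bag, out_of_bag = bootstrap_indices(N, rng)
node_features = random_feature_subset(M, m, rng)
forest_label = majority_vote(["PDAC", "benign", "PDAC"])

print(len(in_bag), len(out_of_bag), node_features, forest_label)
```

The out-of-bag cases are the ones step (3) uses to estimate the error of the tree trained on the corresponding bootstrap sample.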
In other embodiments, the multivariate statistical model may also be implemented with other existing statistical models, which are not described in detail here.
In one or more embodiments, in training the multivariate statistical model, each training sample consists of the relative abundance of each biomarker in the biomarker set together with its attribute label; the attribute labels are "patient with pancreatic ductal adenocarcinoma" and "patient with benign pancreatic disease".
In some embodiments, the probability of having pancreatic ductal adenocarcinoma is compared with a preset probability threshold (e.g. 0.5). If the probability is greater than the threshold, the subject is predicted to be a patient with pancreatic ductal adenocarcinoma; otherwise, the subject is predicted to be a patient with benign pancreatic disease.
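The threshold rule, together with the AUC used as the evaluation parameter, can be sketched as follows. The labels and probabilities are invented for illustration; the AUC is computed here by its pairwise definition (the fraction of positive-negative pairs in which the PDAC case receives the higher score, ties counting one half), which is an assumption about the computation, not a statement of how the applicants computed it.

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classify(probability: float, threshold: float = 0.5) -> str:
    """Apply the rule from the text: strictly greater than the threshold means PDAC."""
    if probability > threshold:
        return "PDAC"
    return "benign pancreatic disease"


# Hypothetical labels (1 = PDAC, 0 = benign) and model probabilities.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2]

model_auc = auc(labels, scores)
predictions = [classify(s) for s in scores]
print(round(model_auc, 3), predictions)
```

A probability exactly equal to the threshold falls on the benign side, since the text requires the probability to be greater than the threshold.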
This embodiment uses the relative abundance of each biomarker in the set of potential biomarkers for diagnosing pancreatic ductal adenocarcinoma, together with a multivariate statistical model, to predict the probability of having pancreatic ductal adenocarcinoma. It establishes the relationship between the oral flora and the pancreatic flora, determines a biomarker combination for predicting pancreatic ductal adenocarcinoma by taking into account the flora of the pancreas itself, and quantitatively predicts the probability of pancreatic ductal adenocarcinoma from the oral flora, which can give physicians more quantitative and precise decision support.
The biomarkers disclosed in the present invention have high accuracy and specificity and good prospects for development into a diagnostic method, thereby providing a basis for risk assessment, diagnosis and early diagnosis of PDAC and for the search for potential drug targets. The invention further covers the use of the oral-flora-based PDAC biomarker combination as a detection target in the preparation of a detection kit, and its use as a target in screening drugs for treating and/or preventing PDAC; changes in the relative abundance of the biomarker combination provide a basis for determining whether a candidate drug is effective.
Embodiment 2
Fig. 4 is a schematic structural diagram of the pancreatic ductal adenocarcinoma prediction system based on oral flora according to an embodiment of the present invention.
As shown in Fig. 4, the embodiment provides a pancreatic ductal adenocarcinoma prediction system based on oral flora, which comprises a relative abundance information acquisition module 401 and a pancreatic ductal adenocarcinoma prediction module 402.
In a specific implementation, the relative abundance information acquisition module 401 is used to obtain the relative abundance of each biomarker in the set of potential biomarkers for diagnosing pancreatic ductal adenocarcinoma in the subject to be tested.
The biomarker set is obtained by performing between-group differential microbiome analysis on saliva, duodenal fluid and pancreatic tissue samples from a group of patients with pancreatic ductal adenocarcinoma and a group of patients with benign pancreatic disease, and taking the intersection of the differential bacteria found in these analyses.
The biomarker set here includes, but is not limited to, the following biomarkers:
Biomarker 1 is Bifidobacterium bifidum;
Biomarker 2 is Sutterella massiliensis;
Biomarker 3 is Herbaspirillum huttiense;
Biomarker 4 is Prevotella buccalis;
Biomarker 5 is Phocaeicola abscessus;
Biomarker 6 is Prevotella dentalis;
Biomarker 7 is Peptoanaerobacter stomatis;
Biomarker 8 is Schwartzia succinivorans;
Biomarker 9 is Anaeroglobus geminatus;
Biomarker 10 is Olsenella uli;
Biomarker 11 is Slackia exigua;
Biomarker 12 is Arachnia rubra;
Biomarker 13 is Leptotrichia goodfellowii;
Biomarker 14 is Propionibacterium acidifaciens;
Biomarker 15 is Fusobacterium mortiferum;
Biomarker 16 is Acidaminococcus fermentans;
Biomarker 17 is Loigolactobacillus coryniformis;
Biomarker 18 is Bacteroides caecigallinarum;
Biomarker 19 is Caldicoprobacter faecalis;
Biomarker 20 is Atopostipes suicloacalis;
Biomarker 21 is Akkermansia muciniphila;
Biomarker 22 is Phocaeicola vulgatus;
Biomarker 23 is Bacteroides acidifaciens;
Biomarker 24 is Lactiplantibacillus plantarum;
Biomarker 25 is Faecalibacterium prausnitzii.
In a specific implementation, the pancreatic ductal adenocarcinoma prediction module 402 is used to obtain the probability of having pancreatic ductal adenocarcinoma from the relative abundance of each biomarker in the biomarker set and a pre-trained multivariate statistical model.
In training the multivariate statistical model, each training sample consists of the relative abundance of each biomarker in the biomarker set together with its attribute label; the attribute labels are "patient with pancreatic ductal adenocarcinoma" and "patient with benign pancreatic disease".
In the pancreatic ductal adenocarcinoma prediction module 402, the probability of having pancreatic ductal adenocarcinoma is compared with a preset probability threshold. If the probability is greater than the threshold, the subject is predicted to be a patient with pancreatic ductal adenocarcinoma; otherwise, the subject is predicted to be a patient with benign pancreatic disease.
It will be understood that the multivariate statistical model is an RF model or another existing multivariate statistical model.
It should be noted that the relative abundance information acquisition module 401 and the pancreatic ductal adenocarcinoma prediction module 402 of the prediction system in this embodiment correspond one-to-one to steps S101 and S102 of Embodiment 1, and their specific implementation is the same, so it is not repeated here.
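The two-module structure can be sketched as a pair of small Python classes. This is a minimal illustration only: the class names, the `acquire` and `predict` methods, and the stand-in `toy_model` are hypothetical, and a real system would plug the pre-trained multivariate statistical model in place of `toy_model`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass
class RelativeAbundanceModule:
    """Counterpart of module 401: extract marker abundances in a fixed order."""
    marker_names: Sequence[str]

    def acquire(self, profile: Mapping[str, float]) -> list[float]:
        # Markers absent from the subject's profile are treated as abundance 0.
        return [float(profile.get(name, 0.0)) for name in self.marker_names]


@dataclass
class PredictionModule:
    """Counterpart of module 402: map abundances to a probability and a label."""
    model: Callable[[Sequence[float]], float]
    threshold: float = 0.5

    def predict(self, features: Sequence[float]) -> tuple[float, str]:
        probability = self.model(features)
        if probability > self.threshold:
            return probability, "PDAC"
        return probability, "benign pancreatic disease"


def toy_model(features: Sequence[float]) -> float:
    """Placeholder for the pre-trained model; NOT the model of the patent."""
    return min(1.0, sum(features))


module_401 = RelativeAbundanceModule(["Prevotella buccalis", "Slackia exigua"])
module_402 = PredictionModule(model=toy_model)

# Hypothetical relative abundance profile of one subject.
subject_profile = {"Prevotella buccalis": 0.4, "Akkermansia muciniphila": 0.1}
features = module_401.acquire(subject_profile)
probability, label = module_402.predict(features)
print(features, probability, label)
```

Keeping the marker order inside module 401 ensures that the feature vector passed to module 402 always matches the order used when the model was trained.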
Embodiment 3
This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for predicting pancreatic ductal adenocarcinoma based on oral flora described in Embodiment 1.
Embodiment 4
This embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the steps of the method for predicting pancreatic ductal adenocarcinoma based on oral flora described in Embodiment 1.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be carried out by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may be subject to various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410930065.XA CN118792426A (en) | 2024-07-11 | 2024-07-11 | Pancreatic ductal adenocarcinoma prediction method and system based on oral flora |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118792426A true CN118792426A (en) | 2024-10-18 |
Family
ID=93030895
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410930065.XA Pending CN118792426A (en) | 2024-07-11 | 2024-07-11 | Pancreatic ductal adenocarcinoma prediction method and system based on oral flora |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118792426A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009009163A1 (en) * | 2007-07-09 | 2009-01-15 | Micropure, Inc. | Composition and method for the prevention of oral disease |
CN106202846A (en) * | 2015-04-30 | 2016-12-07 | 中国科学院青岛生物能源与过程研究所 | The construction method of oral microbial community detection model and application thereof |
CN111261222A (en) * | 2018-12-03 | 2020-06-09 | 中国科学院青岛生物能源与过程研究所 | Construction method and application of oral microbial community detection model |
-
2024
- 2024-07-11 CN CN202410930065.XA patent/CN118792426A/en active Pending
Non-Patent Citations (2)
Title |
---|
KELLEY N L MCKINLEY ET AL.: "Translocation of Oral Microbiota into the Pancreatic Ductal Adenocarcinoma Tumor Microenvironment", MICROORGANISMS., vol. 11, no. 6, 31 May 2023 (2023-05-31), pages 1 - 5 * |
陈恬: "基于微生物组学的胰腺癌患者口腔及肠道菌群研究", 中国博士学位论文全文数据库(电子期刊), no. 2, 15 February 2021 (2021-02-15), pages 072 - 93 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||