CN110904213A - Intestinal flora-based ulcerative colitis biomarker and application thereof - Google Patents
- Publication number: CN110904213A (application number CN201911267223.3A)
- Authority: CN (China)
- Prior art keywords: ulcerative colitis; biomarker; sequencing; relative abundance; sample
- Legal status: Granted (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/689: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
- C12Q2600/136: Screening for pharmacological compounds
Abstract
The invention provides an intestinal flora-based ulcerative colitis biomarker and its application, and belongs to the technical field of biomedicine. The biomarker can overcome the shortcomings of existing ulcerative colitis diagnosis, which cannot provide early warning or predict disease onset and progression, and can support pathological typing of the disease, research on drug targets, precision medication, research on pathogenesis and the like. The biomarker therefore has good practical value.
Description
Technical Field
The invention belongs to the technical field of biomedicine, and particularly relates to an intestinal flora-based ulcerative colitis biomarker and its application.
Background
The information in this background section is only for enhancement of understanding of the general background of the invention and is not necessarily to be construed as an admission or any form of suggestion that this information forms the prior art that is already known to a person of ordinary skill in the art.
Ulcerative colitis (UC), also called nonspecific ulcerative colitis, is a chronic disease that causes inflammation and ulceration of the colon and rectum. The main symptoms during an attack are abdominal pain and diarrhea with bloody stool; weight loss, fever and anemia may also occur. Symptoms usually develop gradually and range from mild to severe. They appear intermittently, typically with symptom-free periods between attacks.
Ulcerative colitis can occur at any age but usually begins before the age of 30; in a small number of patients the initial onset occurs between the ages of 50 and 70. Although previous studies show that ulcerative colitis results from the combined action of genetic and environmental factors, its diagnosis still depends on symptomatic evaluation, and no reliable biomarker has been identified. In addition, the existing diagnostic criteria cannot predict onset, treatment efficacy or prognosis at an early stage.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides an intestinal flora-based ulcerative colitis biomarker and its application. The biomarker can overcome the inability of existing ulcerative colitis diagnosis to provide early warning or to predict onset and progression, and can support pathological typing of the disease, research on drug targets, precision medication, research on pathogenesis and the like.
In order to achieve the technical purpose, the technical scheme adopted by the invention is as follows:
In a first aspect of the invention, there is provided a biomarker comprising at least one selected from the group consisting of:
Blautia and/or analogues thereof;
unclassified_Ruminococcaceae and/or analogues thereof;
Clostridium XlVa and/or analogues thereof;
Lachnospiraceae_incertae_sedis and/or analogues thereof;
Roseburia and/or analogues thereof;
Bifidobacterium and/or analogues thereof;
Gemmiger and/or analogues thereof;
Enterococcus and/or analogues thereof;
Adlercreutzia and/or analogues thereof;
Clostridium IV and/or analogues thereof.
Further, analogues of Blautia have an alignment similarity of 85% or more to Blautia, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
Analogues of unclassified_Ruminococcaceae have an alignment similarity of 85% or more to unclassified_Ruminococcaceae, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
Analogues of Clostridium XlVa have an alignment similarity of 85% or more to Clostridium XlVa, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
Analogues of Lachnospiraceae_incertae_sedis have an alignment similarity of 85% or more to Lachnospiraceae_incertae_sedis, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
Analogues of Roseburia have an alignment similarity of 85% or more to Roseburia, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
Analogues of Bifidobacterium have an alignment similarity of 85% or more to Bifidobacterium, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
Analogues of Gemmiger have an alignment similarity of 85% or more to Gemmiger, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
Analogues of Enterococcaceae have an alignment similarity of 85% or more to Enterococcaceae, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
Analogues of Adlercreutzia have an alignment similarity of 85% or more to Adlercreutzia, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
Analogues of Clostridium IV have an alignment similarity of 85% or more to Clostridium IV, for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
These biomarkers can be used to detect ulcerative colitis. By determining the presence or absence of one, two or more of these markers in a subject's gut flora, it can be determined whether the subject has or is susceptible to ulcerative colitis (i.e., the risk of ulcerative colitis can be predicted), and the efficacy of treatment in patients with ulcerative colitis can further be monitored. In addition, given a sufficient number of healthy samples, those skilled in the art can obtain the normal value or normal range of each biomarker in the intestinal tract using standard test and calculation methods, which indicates the content of each biomarker in healthy individuals. Whether a subject has or is susceptible to ulcerative colitis can therefore be determined by measuring the content of at least one of the biomarkers in the intestinal flora of a test sample, and the effect of treatment in patients with ulcerative colitis can be monitored in the same way. Furthermore, it is known to those skilled in the art that when certain gene sequences of an unknown microorganism, or of nucleic acid of unknown origin, share an alignment similarity of 85% or more with the corresponding gene sequences of a known strain, the microorganism can be considered to belong to the same genus as that strain, or the sequences can be assigned to that genus. Because microorganisms of the same genus generally have the same or similar functions, such analogues can also be used as markers of ulcerative colitis.
Alignment similarity in the present invention (also referred to as sequence similarity) refers to the proportion of identical bases or amino acid residues between a target sequence (the sequence to be determined) and a reference sequence (a known sequence) in a sequence alignment.
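As an illustration of the definition above, the following Python sketch computes alignment similarity for two sequences that have already been aligned (equal length, '-' marking gaps) and applies the 85% cut-off. It is a sketch under stated assumptions: the function names are hypothetical, producing the alignment itself is out of scope, and conventions for the denominator vary between tools (here, columns that are gaps in both sequences are skipped, while a gap in only one sequence counts as a mismatch).

```python
# Hypothetical helpers illustrating "alignment similarity": the fraction of
# aligned columns in which the target and reference carry the same base or
# residue. Both inputs are assumed to be pre-aligned, with '-' for gaps.

GAP = "-"


def alignment_similarity(target: str, reference: str) -> float:
    """Return identical positions / informative aligned columns."""
    if len(target) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    identical = 0
    columns = 0
    for a, b in zip(target.upper(), reference.upper()):
        if a == GAP and b == GAP:
            continue  # gap in both sequences: column carries no information
        columns += 1
        if a == b:
            identical += 1  # a gap in only one sequence never matches
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return identical / columns


def is_analogue(target: str, reference: str, threshold: float = 0.85) -> bool:
    """Apply the 85%-or-more cut-off used in the text for calling an analogue."""
    return alignment_similarity(target, reference) >= threshold


if __name__ == "__main__":
    ref = "ACGTACGTACGTACGTACGT"
    tgt = "ACGTACGTACGAACGTAC-T"  # one mismatch and one single-sided gap
    print(round(alignment_similarity(tgt, ref), 2), is_analogue(tgt, ref))
    # prints: 0.9 True
```

Skipping double-gap columns keeps the score unaffected by gaps introduced only to accommodate other sequences in a multiple alignment.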
In a second aspect of the invention, there is provided a method of diagnosing whether a subject has ulcerative colitis or a related disease, or predicting the risk thereof. The method comprises the following steps: (1) collecting a sample from the subject; (2) determining relative abundance information of the biomarkers in the sample obtained in step (1), the biomarkers being the biomarkers according to the first aspect of the invention; and (3) comparing the relative abundance information obtained in step (2) with a reference data set and a reference value.
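Step (2) can be illustrated with a minimal Python sketch that turns per-taxon read counts for one sample into relative abundances and extracts the ten genera of the first aspect. The input format (a dictionary of counts keyed by taxon name) and the function names are assumptions made for illustration; relative abundance is taken over all counted taxa in the sample, and markers that were not detected are reported as 0.

```python
from typing import Dict, Iterable

# The ten genera of the first aspect, spelled as in the list above.
MARKER_GENERA = (
    "Blautia", "unclassified_Ruminococcaceae", "Clostridium XlVa",
    "Lachnospiraceae_incertae_sedis", "Roseburia", "Bifidobacterium",
    "Gemmiger", "Enterococcus", "Adlercreutzia", "Clostridium IV",
)


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert per-taxon read counts into fractions of the sample total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counted reads")
    return {taxon: n / total for taxon, n in counts.items()}


def marker_profile(counts: Dict[str, float],
                   markers: Iterable[str] = MARKER_GENERA) -> Dict[str, float]:
    """Relative abundance of each marker genus; undetected markers get 0.0."""
    ra = relative_abundance(counts)
    return {m: ra.get(m, 0.0) for m in markers}
```

For example, a sample with hypothetical counts {"Blautia": 30, "Roseburia": 10, "Bacteroides": 60} yields a Blautia relative abundance of 0.3, and Gemmiger is reported as 0.0.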
It should be noted that the method is not limited to disease diagnosis in the patent-law sense; it can also be used for non-diagnostic purposes, such as scientific research, supplementing personal genetic information, or enriching genetic information databases. The relative abundance information of each biomarker in the test subject is compared with a reference data set or reference value to determine whether the subject has, or is at risk of, ulcerative colitis or a related disease.
The reference data set refers to the relative abundance information of each biomarker obtained from samples of individuals diagnosed as diseased and of healthy individuals, and serves as a reference for the relative abundance of each biomarker. In one embodiment of the invention, the reference data set is a training data set. The terms training set and validation set have the meanings well known in the art.
In one embodiment of the present invention, the training set is a data set comprising samples from a number of ulcerative colitis subjects and non-ulcerative colitis subjects, together with the content of each biomarker in each sample. The validation set is an independent data set used to test the performance of the model built from the training set.
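A minimal Python sketch of how an independent validation set might be held out and a model's calls scored against it. The stratified split, the 30% validation fraction, the fixed random seed and the choice of sensitivity and specificity as performance measures are illustrative assumptions, not values taken from the text; labels are 1 for ulcerative colitis and 0 otherwise.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], validation_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, validation_indices) with similar UC/non-UC ratios."""
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # At least one sample of each class goes to validation.
        n_valid = max(1, round(len(idx) * validation_fraction))
        valid.extend(idx[:n_valid])
        train.extend(idx[n_valid:])
    return sorted(train), sorted(valid)


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred differ in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("validation set needs both UC and non-UC samples")
    return tp / (tp + fn), tn / (tn + fp)
```

Keeping the validation indices disjoint from the training indices is what makes the reported performance an estimate on unseen samples.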
The reference value in the present invention refers to a reference value or normal value from healthy controls. It is known to those skilled in the art that, when the sample size is sufficiently large, a normal range (absolute values) for each biomarker in the sample can be obtained using well-known detection and calculation methods. When the biomarker level is measured with an assay, the absolute value in the sample can be compared directly with the reference value, optionally with the aid of statistical methods, to assess disease risk and to diagnose, or diagnose early, ulcerative colitis or related diseases.
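One way to derive a normal range from healthy controls and compare a test sample against it is sketched below in Python. The use of empirical 2.5th to 97.5th percentiles as the range, the linear-interpolation percentile and the function names are assumptions for illustration; the text does not specify how the normal range is calculated, and a value outside the range is a flag for further assessment rather than a diagnosis by itself.

```python
from typing import Dict, List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    if not values:
        raise ValueError("percentile of empty data")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def normal_ranges(healthy: List[Dict[str, float]], lower_q: float = 2.5,
                  upper_q: float = 97.5) -> Dict[str, Tuple[float, float]]:
    """Empirical (lower, upper) range of each marker across healthy controls."""
    if not healthy:
        raise ValueError("need at least one healthy control")
    ranges = {}
    for marker in healthy[0]:
        vals = [sample[marker] for sample in healthy]
        ranges[marker] = (percentile(vals, lower_q), percentile(vals, upper_q))
    return ranges


def out_of_range(sample: Dict[str, float],
                 ranges: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Markers whose value in the test sample lies outside the healthy range."""
    return {m: sample[m] for m, (lo, hi) in ranges.items()
            if not lo <= sample[m] <= hi}
```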
Ulcerative colitis-related diseases in the present invention are diseases associated with ulcerative colitis. They include preceding symptoms or diseases that give rise to ulcerative colitis and subsequent or concurrent symptoms or diseases caused by it, as well as the various types of ulcerative colitis, such as chronic recurrent, chronic persistent, acute fulminant and initial-onset ulcerative colitis.
According to an embodiment of the invention, the reference data set comprises relative abundance information of biomarkers according to the first aspect of the invention in samples from a plurality of ulcerative colitis patients and a plurality of healthy controls.
According to an embodiment of the present invention, the step of comparing the relative abundance information obtained in step (2) with the reference data set further comprises running a multivariate statistical model to obtain a disease probability. Using a multivariate statistical model enables rapid and efficient detection.
According to an embodiment of the invention, the multivariate statistical model is a random forest model.
According to an embodiment of the invention, a disease probability greater than a threshold indicates that the subject has, or is at risk of, ulcerative colitis or a related disease.
According to an embodiment of the invention, the threshold is 0.5.
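The random forest step with a 0.5 threshold can be sketched in dependency-free Python. To stay short, this hypothetical implementation grows each tree only to depth 1 (a decision stump chosen by Gini impurity on a bootstrap sample with a random feature subset) and averages the trees' leaf class fractions into a disease probability; a practical model would normally use full-depth trees from an established library such as scikit-learn's RandomForestClassifier. The tree count, feature-subset size and seed are illustrative choices, and the feature rows are assumed to be marker relative abundances.

```python
import random
from typing import List, Optional, Sequence, Tuple

# One depth-1 tree: (feature index, split threshold, P(UC) left, P(UC) right).
Stump = Tuple[int, float, float, float]


def _gini(positives: int, n: int) -> float:
    """Gini impurity of a node holding `positives` UC samples out of n."""
    if n == 0:
        return 0.0
    p = positives / n
    return 2.0 * p * (1.0 - p)


def _fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
               features: Sequence[int]) -> Stump:
    """Pick the (feature, threshold) split with the lowest weighted Gini."""
    n = len(y)
    p_all = sum(y) / n
    best_score, best = float("inf"), None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * _gini(sum(left), len(left))
                     + len(right) * _gini(sum(right), len(right))) / n
            if score < best_score:
                p_left = sum(left) / len(left)  # never empty: t is a data value
                p_right = sum(right) / len(right) if right else p_all
                best_score, best = score, (f, t, p_left, p_right)
    return best


def fit_forest(X: Sequence[Sequence[float]], y: Sequence[int], n_trees: int = 50,
               max_features: Optional[int] = None, seed: int = 0) -> List[Stump]:
    """Bagged stumps, each fitted on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max_features or max(1, int(d ** 0.5))  # about sqrt(d) features per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(d), k)
        forest.append(_fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest: List[Stump], x: Sequence[float]) -> float:
    """Disease probability: mean of the per-tree leaf fractions of UC."""
    votes = [p_l if x[f] <= t else p_r for f, t, p_l, p_r in forest]
    return sum(votes) / len(votes)


def classify(forest: List[Stump], x: Sequence[float], threshold: float = 0.5) -> int:
    """1 (has or is at risk of UC) if the probability is greater than the threshold."""
    return int(predict_proba(forest, x) > threshold)
```

The comparison is strictly "greater than", matching the wording that a probability greater than the threshold indicates disease or risk.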
According to an embodiment of the present invention, the relative abundance information of the biomarkers in step (2) is obtained by a sequencing method, comprising: isolating a nucleic acid sample from the sample of the subject; constructing a DNA library from the nucleic acid sample; sequencing the DNA library to obtain a sequencing result; and aligning the sequencing result to a reference gene set to determine the relative abundance information of the biomarkers. According to an embodiment of the invention, at least one of SOAP2 and MAQ may be used to align the sequencing result to the reference gene set, which improves alignment efficiency and hence the efficiency of ulcerative colitis detection. According to an embodiment of the invention, multiple (at least two) biomarkers can be detected simultaneously, which further improves the efficiency of ulcerative colitis detection.
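After reads have been aligned to the reference gene set (for example with SOAP2 or MAQ, as stated above), the per-gene read counts still have to be converted into biomarker abundances. The Python sketch below shows one common approach in metagenomic studies, assumed here rather than taken from the text: divide each gene's read count by its length, normalise so the values sum to 1, then sum the genes annotated to each genus. The gene-to-genus annotation table and all function names are hypothetical inputs.

```python
from collections import defaultdict
from typing import Dict


def gene_relative_abundance(read_counts: Dict[str, int],
                            gene_lengths: Dict[str, int]) -> Dict[str, float]:
    """Length-normalised relative abundance of each reference gene with reads."""
    # Reads per base corrects for longer genes attracting more reads.
    per_base = {g: n / gene_lengths[g] for g, n in read_counts.items() if n > 0}
    total = sum(per_base.values())
    if total == 0:
        raise ValueError("no reads aligned to the reference gene set")
    return {g: v / total for g, v in per_base.items()}


def genus_relative_abundance(gene_abundance: Dict[str, float],
                             gene_to_genus: Dict[str, str]) -> Dict[str, float]:
    """Sum gene abundances per genus; genes without an annotation are skipped."""
    genus: Dict[str, float] = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        taxon = gene_to_genus.get(gene)
        if taxon is not None:
            genus[taxon] += abundance
    return dict(genus)
```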
According to an embodiment of the present invention, the reference gene set is obtained by performing metagenomic sequencing on samples from a plurality of ulcerative colitis patients and a plurality of healthy controls to obtain a non-redundant gene set, and then combining the non-redundant gene set with an intestinal microbial gene set. The reference gene set in the present invention may be an existing gene set, such as a published gut microbial reference gene set; alternatively, it may be constructed as just described, so that the resulting reference gene set carries more comprehensive information and gives more reliable detection results.
The non-redundant gene set described in the present invention has the meaning generally understood by those skilled in the art; simply put, it is the set of genes remaining after redundant genes are removed. Redundant genes generally refer to multiple copies of a gene that appear on a chromosome.
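In metagenomic gene catalogues, redundancy is usually removed by clustering genes at an identity and coverage cut-off with a dedicated tool such as CD-HIT. The Python sketch below is a deliberately crude stand-in, assumed for illustration only: it keeps the longest sequence and drops any sequence identical to, or contained in, an already kept one, then merges the result with an existing gut gene catalogue as described for the reference gene set. The function names are hypothetical, and gene IDs are assumed to be unique across the two sources.

```python
from typing import Dict


def non_redundant(genes: Dict[str, str]) -> Dict[str, str]:
    """Greedy de-duplication: longest first, drop exact or contained copies."""
    kept: Dict[str, str] = {}
    # Longest sequences first so that shorter copies are tested against them.
    for gid, seq in sorted(genes.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        s = seq.upper()
        if any(s in other for other in kept.values()):
            continue  # identical to, or a substring of, a kept gene
        kept[gid] = s
    return kept


def build_reference(study_genes: Dict[str, str],
                    existing_catalogue: Dict[str, str]) -> Dict[str, str]:
    """Merge study-specific non-redundant genes with an existing catalogue."""
    study_nr = non_redundant(study_genes)
    clash = study_nr.keys() & existing_catalogue.keys()
    if clash:
        raise ValueError(f"gene IDs must be unique across sources: {sorted(clash)}")
    # De-duplicate the union so genes shared by both sources appear once.
    return non_redundant({**existing_catalogue, **study_nr})
```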
According to an embodiment of the invention, the sample is a stool sample.
According to an embodiment of the invention, the sequencing is performed by a second-generation or third-generation sequencing method. The sequencing method is not particularly limited; second- or third-generation sequencing enables rapid and efficient sequencing.
According to an embodiment of the present invention, the sequencing is performed with at least one device selected from HiSeq 2000, SOLiD, 454 and single-molecule sequencing devices. The high-throughput, deep-sequencing capability of these devices benefits the subsequent analysis of the sequencing data, in particular the precision and accuracy of statistical tests.
In a third aspect of the invention, a kit is provided comprising reagents for detecting a biomarker comprising a biomarker according to the first aspect of the invention. With the kit, the relative abundance of these markers in the intestinal flora can be determined, and thus, the relative abundance value obtained can be used to determine whether a subject suffers from or is susceptible to ulcerative colitis, and to monitor the treatment effect of patients with ulcerative colitis.
According to an embodiment of the invention, the kit comprises a set of reference data sets or reference values for referencing the relative abundance of each biomarker. The reference data set or reference values may preferably be attached to a physical carrier, e.g. an optical disc, such as a CD-ROM or the like.
According to an embodiment of the invention, the kit further comprises a first computer program product for performing the obtaining of the reference data set or reference value. I.e. the first computer program product is used to perform the obtaining of a set of reference data sets or reference values for diagnosing whether a subject suffers from ulcerative colitis or a related disease or predicting whether a subject suffers from ulcerative colitis or a related disease.
According to an embodiment of the invention, the kit further comprises a second computer program product, which may also be used to perform a method of diagnosing whether a subject has ulcerative colitis or a related disease or predicting the risk of whether a subject has ulcerative colitis or a related disease according to the second aspect of the invention.
In a fourth aspect of the invention there is provided the use of a biomarker in the manufacture of a kit for diagnosing whether a subject has, or is at risk of having, ulcerative colitis or a related disease. According to an embodiment of the invention, the diagnosis or prediction comprises the steps of: 1) collecting a sample from the subject; 2) determining the relative abundance of the biomarkers in the sample obtained in step 1), the biomarkers being those according to the first aspect of the invention; and 3) comparing the relative abundance information from step 2) with a reference data set or reference values. Using the kit, the relative abundance of these markers in the intestinal flora can be determined, and the resulting values can be used to determine whether a subject suffers from or is susceptible to ulcerative colitis and to monitor treatment efficacy in patients with ulcerative colitis.
According to embodiments of the present invention, the use of the above biomarkers in the preparation of the kit may further include the following technical features:
According to an embodiment of the invention, in the above use, the reference data set comprises the relative abundance of the biomarkers according to the first aspect of the invention in samples from a plurality of ulcerative colitis patients and a plurality of healthy controls.
According to an embodiment of the present invention, in the above use, comparing the relative abundance information from step 2) with the reference data set further comprises running a multivariate statistical model to obtain a disease probability; preferably, the multivariate statistical model is a random forest model.
According to an embodiment of the invention, in the above use, a disease probability greater than a threshold indicates that the subject suffers from, or is at risk of suffering from, ulcerative colitis or a related disease; preferably, the threshold is 0.5.
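The decision rule above can be sketched as follows. This is an illustrative sketch, not the inventors' implementation: it assumes the convention of the R randomForest package, in which the class probability reported for a sample is the proportion of trees voting for that class, and it represents each fitted tree by a hypothetical callable. The comparison uses "greater than" as stated in this paragraph.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a fitted decision tree: maps a vector of marker
# relative abundances to a vote, 1 = ulcerative colitis, 0 = healthy.
Tree = Callable[[Sequence[float]], int]


def disease_probability(trees: Sequence[Tree], abundances: Sequence[float]) -> float:
    """Proportion of trees in the forest that vote for ulcerative colitis."""
    if not trees:
        raise ValueError("the forest must contain at least one tree")
    votes = [tree(abundances) for tree in trees]
    return sum(votes) / len(votes)


def classify(probability: float, threshold: float = 0.5) -> str:
    """Probability strictly above the threshold -> UC or at risk of UC."""
    return "UC or at risk" if probability > threshold else "not UC"


# Three toy "trees" (decision stumps) with made-up cut-offs, for illustration only.
toy_forest = [
    lambda x: int(x[0] < 0.01),
    lambda x: int(x[1] < 0.02),
    lambda x: int(x[2] > 0.05),
]
p = disease_probability(toy_forest, [0.005, 0.03, 0.10])  # 2 of 3 trees vote UC
print(p, classify(p))
```

In a real forest each tree is learned from bootstrap samples of the training set; only the aggregation and thresholding step is shown here.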
According to an embodiment of the present invention, in the above use, determining the relative abundance of the biomarkers in step 2) by sequencing further comprises: isolating a nucleic acid sample from the subject's sample, constructing a DNA library from the nucleic acid sample, sequencing the DNA library to obtain sequencing reads, and aligning the reads to a reference gene set to determine the relative abundance of the biomarkers.
According to an embodiment of the present invention, in the above use, the reference gene set is obtained by performing metagenomic sequencing on samples from multiple ulcerative colitis patients and multiple healthy controls to obtain a non-redundant gene set, and then combining this non-redundant gene set with an intestinal microbial gene set.
According to an embodiment of the invention, in the above use, the sample is a stool sample.
According to an embodiment of the present invention, in the above use, the sequencing method is performed by a second generation sequencing method or a third generation sequencing method.
According to an embodiment of the present invention, in the above use, the sequencing is performed with at least one platform selected from HiSeq 2000, SOLiD, 454 and single-molecule sequencing devices.
In a fifth aspect of the invention, there is provided the use of the biomarker as a target for screening a medicament for treating or preventing ulcerative colitis or a related disease. According to an embodiment of the invention, the biomarker is a biomarker according to the first aspect of the invention. According to embodiments of the present invention, the change in these biomarkers before and after administration of a candidate drug can be used to determine whether the candidate drug can be used to treat or prevent ulcerative colitis.
That is, according to embodiments of the present invention, the change in relative abundance of the biomarker provides a basis for determining whether the drug candidate is effective.
Beneficial technical effects of the invention:
The present invention is based on the inventors' discovery and recognition of the following facts and problems: the intestinal flora, a general term for the large microbial community residing in the human intestinal tract, has a complex relationship with human health. Imbalance of the intestinal flora is closely related to diseases such as malnutrition, obesity and diabetes, and has been one of the most active research topics in microbiology, medicine, genetics and related fields in recent years.
The intestinal flora is closely related to the onset of ulcerative colitis, chiefly through a reduction in symbiotic bacteria, and its pathogenic role is linked to intestinal mucosal immunity. Although the pathogenesis of ulcerative colitis is not yet clear, the relationship between the intestinal flora and the disease has been a research hotspot in recent years, and a series of clinical and animal experiments show that the intestinal flora of patients with ulcerative colitis differs from that of healthy people. The structural diversity of the intestinal flora is markedly reduced in patients with active ulcerative colitis; moreover, the flora varies considerably between individual patients, whereas inter-individual differences in the healthy control group are smaller, indicating that the intestinal flora of patients with ulcerative colitis is unstable.
Based on a comparative analysis of the intestinal flora of ulcerative colitis patients and healthy people, the invention identifies a number of related intestinal microorganisms. Using the relative abundance data of high-quality species from ulcerative colitis patients and healthy people as a training set, risk assessment and early diagnosis of ulcerative colitis can be performed accurately. Compared with conventional diagnostic methods, the method is convenient and rapid.
The ulcerative colitis-related biomarkers provided by the invention are valuable for early diagnosis. First, the markers of the present invention have high specificity and sensitivity. Second, stool analysis is accurate, safe, affordable and well accepted by patients, and stool samples are easy to transport. Stool-based assays, such as polymerase chain reaction (PCR)-based assays, are comfortable and non-invasive, so people are more likely to participate in a given screening program. Third, the markers of the invention can also be used to monitor the response of ulcerative colitis patients to therapy. In terms of abundance measurement, the combination of 10 markers of the present invention is particularly suitable when abundance is measured by aligning reads to marker genes. The invention therefore has good practical application value.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
FIG. 1 shows the ROC curve and AUC of the random forest model (10 gut markers) distinguishing ulcerative colitis patients from healthy controls based on OTU data, according to Example 1 of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise. It is to be understood that the scope of the invention is not to be limited to the specific embodiments described below; it is also to be understood that the terminology used in the examples is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention.
The terms used herein have meanings commonly understood by those of ordinary skill in the relevant art. However, for a better understanding of the present invention, some definitions and related terms are explained as follows:
Ulcerative colitis is a disease characterized by recurrent intestinal ulceration that mainly manifests as diarrhea, bloody stool, abdominal pain and other symptoms. Its pathogenesis is thought to be associated with host genetic susceptibility, mucosal immunity and the intestinal flora. Compared with healthy people, patients with ulcerative colitis show varying degrees of dysbiosis, mainly a reduction in probiotic bacteria and an increase in opportunistic pathogens.
"Biomarker" refers to "a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention". Examples include nucleic acid markers (also called gene markers, e.g., DNA), protein markers, cytokine markers, chemokine markers, carbohydrate markers, antigen markers, antibody markers, species markers (species/genus markers) and functional markers (KO/OG markers). A nucleic acid marker is not limited to an existing gene that is expressed as a biologically active protein; it includes any nucleic acid fragment, which may be DNA, RNA, modified or unmodified DNA or RNA, or a collection of these. Nucleic acid markers may also be referred to herein as signature fragments. In the present invention, biomarkers may also be called "intestinal markers" because all biomarkers found to be associated with ulcerative colitis are present in the intestinal tract of the subject. Biomarkers are measured and evaluated to examine normal biological processes, pathogenic processes or pharmacologic responses to therapeutic intervention, and are useful in many scientific fields.
The biomarkers can be identified by batch high-throughput sequencing of fecal samples from healthy people and ulcerative colitis patients: the sequencing data of the healthy population are compared with those of the ulcerative colitis patient population to determine specific nucleic acid sequences associated with ulcerative colitis.
Briefly, the procedure is as follows:
Sample collection and processing: fecal samples are collected from healthy people and ulcerative colitis patients, and DNA is extracted with a DNA extraction kit to obtain nucleic acid samples;
Library construction and sequencing: DNA libraries are constructed and sequenced by high-throughput sequencing to obtain the nucleic acid sequences of the intestinal microorganisms in the fecal samples;
Bioinformatic analysis: the specific intestinal microbial nucleic acid sequences associated with ulcerative colitis are determined by bioinformatic analysis. First, the sequencing reads are aligned to a reference gene set (which may be a newly constructed gene set or any database of known sequences, e.g., a known non-redundant gene catalogue of the human intestinal microbiota). Next, based on the alignment results, the relative abundance of each gene is determined in the nucleic acid samples from the fecal samples of the healthy population and of the ulcerative colitis patient population. Alignment establishes a correspondence between each read and a gene in the reference gene set, so the number of reads assigned to a given gene effectively reflects that gene's relative abundance in the nucleic acid sample; the relative abundance can therefore be determined from the alignment results by conventional statistical analysis. Finally, the relative abundances of each gene in the two populations are compared by a statistical test to judge whether any gene differs significantly between the healthy population and the ulcerative colitis patient population; a gene with a significant difference is regarded as a biomarker of the abnormal state, i.e., a nucleic acid marker.
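The statistical test in the last step is not specified further in the text. A common choice for microbiome abundance data is the two-sided Wilcoxon rank-sum (Mann-Whitney U) test; the sketch below assumes that test, uses the normal approximation without tie correction, and omits the multiple-testing correction (e.g., false discovery rate control) that a real genome-wide screen would need. The function names and the 0.05 cut-off are illustrative.

```python
import math


def rank_with_ties(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_p_value(cases, controls):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(cases), len(controls)
    ranks = rank_with_ties(list(cases) + list(controls))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for the cases
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def screen_genes(abundance, is_uc, alpha=0.05):
    """Return {gene: p} for genes whose abundance differs between UC and controls.

    abundance: gene -> list of relative abundances, one per sample
    is_uc: list of booleans in the same sample order (True = UC patient)
    """
    hits = {}
    for gene, values in abundance.items():
        cases = [v for v, uc in zip(values, is_uc) if uc]
        controls = [v for v, uc in zip(values, is_uc) if not uc]
        p = rank_sum_p_value(cases, controls)
        if p < alpha:
            hits[gene] = p
    return hits
```

With small groups or many tied zero abundances (common for rare genes), an exact test or a tie-corrected variance would be more appropriate than this approximation.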
In addition, a known or newly constructed reference gene set usually contains species information and functional annotations for each gene, so once gene relative abundances are determined, they can be grouped by species and by functional annotation to determine the species relative abundance and functional relative abundance of each microorganism in the intestinal flora, and thus to identify species markers and functional markers of the abnormal state.
Briefly, the method of determining species markers and functional markers further comprises: aligning the sequencing reads of the healthy population and the ulcerative colitis patient population to a reference gene set; determining, based on the alignment results, the species relative abundance and functional relative abundance in the nucleic acid samples of each population; performing statistical tests on these species and functional relative abundances between samples from healthy people and ulcerative colitis patients; and identifying species markers and functional markers whose relative abundance differs significantly between the two populations. According to embodiments of the present invention, the species relative abundance and functional relative abundance can be obtained by aggregating (e.g., summing, averaging or taking the median of) the relative abundances of genes from the same species or with the same functional annotation.
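The aggregation step can be sketched as a group-by over the gene annotations of one sample. The function and argument names are illustrative; `how` selects summing, averaging or the median, matching the options listed in the text, and genes without an annotation are simply skipped (an assumption).

```python
from collections import defaultdict
from statistics import mean, median

AGGREGATORS = {"sum": sum, "mean": mean, "median": median}


def aggregate(gene_abundance, annotation, how="sum"):
    """Collapse gene-level relative abundances into per-label abundances.

    gene_abundance: gene id -> relative abundance in one sample
    annotation: gene id -> species name or functional label (e.g., a KO id)
    """
    groups = defaultdict(list)
    for gene, value in gene_abundance.items():
        label = annotation.get(gene)
        if label is not None:  # unannotated genes are ignored
            groups[label].append(value)
    reduce = AGGREGATORS[how]
    return {label: reduce(values) for label, values in groups.items()}


# Toy sample: the same gene table collapsed by species and by function.
genes = {"g1": 0.1, "g2": 0.2, "g3": 0.3, "g4": 0.4}
species = {"g1": "Blautia", "g2": "Blautia", "g3": "Roseburia"}
functions = {"g1": "K00001", "g3": "K00001", "g4": "K00002"}
print(aggregate(genes, species), aggregate(genes, functions, how="mean"))
```

The same per-sample tables can then be passed, label by label, to the statistical test used for individual genes.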
Finally, biomarkers whose relative abundance differs significantly between stool samples from healthy people and ulcerative colitis patients were identified, comprising the following microbial taxa: Blautia and/or an analogue thereof; unclassified_Ruminococcaceae and/or an analogue thereof (an unclassified member, or unclassified genus, of the family Ruminococcaceae); Clostridium XIVa and/or an analogue thereof; Lachnospiraceae_incertae_sedis and/or an analogue thereof; Roseburia and/or an analogue thereof; Bifidobacterium and/or an analogue thereof; Gemmiger and/or an analogue thereof; Enterococcus and/or an analogue thereof; Adlercreutzia and/or an analogue thereof; and Clostridium IV and/or an analogue thereof. By detecting the presence of at least one of these microorganisms, it can be effectively determined whether a subject suffers from or is susceptible to ulcerative colitis, and the treatment efficacy in patients with ulcerative colitis can be monitored. The term "presence" as used herein is to be understood broadly: it covers both qualitative detection of the target in a sample and its quantitative analysis, as well as statistical analysis or any known mathematical operation applied to the quantitative results with reference to, e.g., results obtained by parallel tests on samples of known status. One skilled in the art can readily select an appropriate approach according to need and experimental conditions. According to embodiments of the present invention, whether a subject suffers from or is susceptible to ulcerative colitis can also be determined, and treatment efficacy monitored, by determining the relative abundance of these microorganisms in the gut flora.
The presence in the subject's intestinal flora of at least one of the above microbial taxa, or of two or more of them (i.e., a biomarker combination), can be used to effectively determine whether the subject has or is susceptible to ulcerative colitis, and to monitor treatment efficacy in patients with ulcerative colitis. In the present invention, the term "biomarker combination" refers to a set of two or more biomarkers.
One skilled in the art can also determine the presence or absence of the species markers and functional markers in the gut flora by conventional species identification and biological activity assays. For example, species can be identified by 16S rRNA gene sequencing.
Apparatus for detecting whether a subject has ulcerative colitis or a related disease or predicting whether a subject has ulcerative colitis or a related disease
According to a further aspect of the present invention, there is provided an apparatus for detecting whether a subject has ulcerative colitis or a related disease or predicting whether a subject will have ulcerative colitis or a related disease, the apparatus comprising a sample acquisition device, a biomarker relative abundance determination device and a disease probability determination device.
The sample acquisition device is adapted to acquire a sample from the subject. The biomarker relative abundance determination device is connected to the sample acquisition device and is adapted to determine the relative abundance of the biomarkers according to the first aspect of the invention in the obtained sample. The disease probability determination device is connected to the biomarker relative abundance determination device and compares the relative abundance information obtained by the latter with a reference data set or reference values.
According to a specific embodiment of the invention, the reference data set comprises relative abundance information of the biomarkers according to the first aspect of the invention in samples from a plurality of ulcerative colitis patients and a plurality of healthy controls.
According to an embodiment of the present invention, the disease probability determination device further comprises a multivariate statistical model for obtaining the disease probability; preferably, the multivariate statistical model is a random forest model. According to a preferred embodiment of the invention, a disease probability greater than a threshold indicates that the subject suffers from, or is at risk of suffering from, ulcerative colitis or a related disease; preferably, the threshold is 0.5.
According to an embodiment of the present invention, the biomarker relative abundance determination apparatus further comprises: a nucleic acid sample separation unit, a sequencing unit and an alignment unit. According to an embodiment of the invention, the nucleic acid sample separation unit is adapted to separate a nucleic acid sample from said sample of said subject, the sequencing unit is connected to the nucleic acid sample separation unit and, based on the obtained nucleic acid sample, constructs a DNA library, sequences said DNA library so as to obtain a sequencing result, the alignment unit is connected to the sequencing unit and, based on said sequencing result, aligns the sequencing result with a reference gene set to determine the relative abundance information of said biomarker.
According to a specific embodiment of the present invention, the reference gene set comprises a non-redundant gene set obtained by metagenomic sequencing from samples of a plurality of ulcerative colitis patients and a plurality of healthy controls, and then the non-redundant gene set is combined with an intestinal microorganism gene set to obtain the reference gene set.
According to embodiments of the present invention, the sequencing unit is not particularly limited. Preferably, the sequencing unit uses a second-generation or third-generation sequencing method, and is preferably at least one selected from HiSeq 2000, SOLiD, 454 and single-molecule sequencing devices. The high throughput and sequencing depth of these devices support the downstream analysis of the sequencing data and, in particular, improve the accuracy and precision of the statistical tests.
According to an embodiment of the invention, the alignment unit performs the alignment using at least one tool selected from SOAP2 and MAQ, which improves alignment efficiency and thus the efficiency of detecting ulcerative colitis.
In addition, according to embodiments of the invention, a drug screening method is also provided: markers closely related to ulcerative colitis are used as drug targets for screening, which promotes the discovery of new drugs for treating ulcerative colitis. For example, whether a candidate drug can treat or prevent ulcerative colitis can be determined by detecting changes in biomarker levels before and after exposure to the candidate drug, e.g., by testing whether the levels of harmful markers decrease and the levels of beneficial markers increase after contact with the candidate drug. Alternatively, whether a candidate compound can be used as a medicament for treating or preventing ulcerative colitis can be screened by its direct or indirect effect on the biological activity of at least one of: Blautia and/or an analogue thereof; unclassified_Ruminococcaceae and/or an analogue thereof; Clostridium XIVa and/or an analogue thereof; Lachnospiraceae_incertae_sedis and/or an analogue thereof; Roseburia and/or an analogue thereof; Bifidobacterium and/or an analogue thereof; Gemmiger and/or an analogue thereof; Enterococcus and/or an analogue thereof; Adlercreutzia and/or an analogue thereof; and Clostridium IV and/or an analogue thereof. Thus, according to an embodiment of the present invention, the invention also proposes the use of the ulcerative colitis biomarkers for screening medicaments for the treatment or prevention of ulcerative colitis.
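The before/after comparison described above can be sketched as follows. The text does not state which of the 10 markers count as "harmful" or "beneficial", nor what size of change is meaningful; the direction labels and the 1.5-fold cut-off in this sketch are hypothetical and would have to come from the patient/control comparison. A small pseudocount avoids division by zero for taxa that are absent before treatment.

```python
def improved_markers(before, after, direction, min_fold=1.5, pseudocount=1e-9):
    """List markers whose abundance moved in the desired direction after treatment.

    before, after: marker -> mean relative abundance before/after exposure
    direction: marker -> "beneficial" (should rise) or "harmful" (should fall)
    """
    improved = []
    for marker, kind in direction.items():
        fold = (after[marker] + pseudocount) / (before[marker] + pseudocount)
        if kind == "beneficial" and fold >= min_fold:
            improved.append(marker)
        elif kind == "harmful" and fold <= 1 / min_fold:
            improved.append(marker)
    return improved


# Hypothetical direction labels and abundances, for illustration only.
direction = {"Bifidobacterium": "beneficial", "Enterococcus": "harmful", "Roseburia": "beneficial"}
before = {"Bifidobacterium": 0.01, "Enterococcus": 0.02, "Roseburia": 0.03}
after = {"Bifidobacterium": 0.02, "Enterococcus": 0.005, "Roseburia": 0.03}
print(improved_markers(before, after, direction))
```

A fold-change rule is only a first filter; with replicate samples, the before/after values would normally also be compared with a paired statistical test.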
Unless otherwise indicated, the techniques used in the examples are conventional techniques well known to those skilled in the art and may be performed according to Molecular Cloning: A Laboratory Manual (3rd edition) or the instructions of the relevant products; the reagents and products used are commercially available. Procedures and methods not described in detail are conventional methods well known in the art. The source, trade name and composition of each reagent are given at its first appearance, and the same reagent used thereafter is identical to that first indicated unless otherwise specified.
The invention uses a metagenome-wide association study (MWAS) approach to analyze differences in flora composition and function between fecal samples by sequencing, and uses a random forest discrimination model to distinguish the ulcerative colitis population from the non-ulcerative colitis population and obtain a disease probability. The disease probability is used for risk assessment, diagnosis and early diagnosis of ulcerative colitis, or for identifying potential drug targets.
According to the present invention, the term "individual" refers to an animal, in particular a mammal, such as a primate, preferably a human.
According to the present invention, terms such as "a," "an," and "the" do not refer only to a singular entity, but also include the general class that may be used to describe a particular embodiment.
In the present invention, sequencing (next-generation sequencing) and MWAS are well known in the art and can be adapted by those skilled in the art as appropriate. According to embodiments of the present invention, they can be performed as described in the literature (Wang J, Jia H. Metagenome-wide association studies: fine-mining the microbiome. Nature Reviews Microbiology, 2016, 14(8): 508-.
In the present invention, random forest models and ROC curves are well known in the art, and those skilled in the art can set and adjust their parameters as appropriate. According to embodiments of the present invention, they can be used as described in the literature (Drogan D, Dunn WB, Lin W, et al. Untargeted metabolic profiling identifies altered serum metabolites of type 2 diabetes mellitus in a prospective, nested case control study. Clin Chem 2015, 61: 487-.
In the invention, a training set of biomarkers of ulcerative colitis subjects and non-ulcerative colitis subjects is constructed, and the content value of the biomarkers in a sample to be tested is evaluated by taking the training set as a reference.
Those skilled in the art will appreciate that, as the sample size is further expanded, the normal range (absolute values) of each biomarker in samples can be derived using well-known sample detection and calculation methods. The detected absolute content of a biomarker can then be compared with this normal range, optionally combined with statistical methods, for risk assessment and diagnosis of ulcerative colitis and for monitoring treatment efficacy in patients with ulcerative colitis.
Without wishing to be bound by any theory, the inventors note that these biomarkers are members of the human intestinal flora. When the method of the invention is used for correlation analysis of a subject's intestinal flora, the biomarkers of the ulcerative colitis population fall within characteristic content ranges. Moreover, the biomarkers of the invention are highly specific for ulcerative colitis and strongly distinguish it from ischemic colitis (IC), which has very similar clinical symptoms and is easily misdiagnosed, so the invention enables specific diagnosis of ulcerative colitis with intestinal microbial markers.
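One conventional way to derive such a normal range is a nonparametric reference interval, i.e., the 2.5th to 97.5th percentiles of the values observed in healthy subjects. The source does not specify the method, so the percentiles, the linear interpolation between order statistics and the function names below are assumptions for illustration.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    if not sorted_values:
        raise ValueError("empty reference sample")
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def reference_interval(healthy_values, lower_q=2.5, upper_q=97.5):
    """Central 95% range of a biomarker's content in healthy subjects."""
    ordered = sorted(healthy_values)
    return percentile(ordered, lower_q), percentile(ordered, upper_q)


def flag(value, interval):
    """Place a subject's measured content relative to the normal range."""
    low, high = interval
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "within"


# Toy healthy reference values 0, 1, ..., 100 (made up for illustration).
interval = reference_interval(range(101))
print(interval, flag(1, interval), flag(50, interval), flag(99, interval))
```

Reliable 2.5th/97.5th percentile estimates need a reasonably large healthy cohort, which is consistent with the text's remark that the normal range becomes derivable as the sample size grows.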
The invention is further illustrated by the following examples, which are not to be construed as limiting the invention thereto. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention.
Example 1
1.1 Sample collection
Stool samples were collected according to the method described in the reference (A metagenome-wide association study of gut microbiota in type 2 diabetes, Nature, 2012, 490(7418): 55-60), frozen for transport, rapidly transferred to -80 ℃ for storage, and subjected to DNA extraction to obtain DNA samples. The stool samples from ulcerative colitis and non-ulcerative colitis subjects of the present invention were collected in China: 99 samples in total, 44 from healthy subjects and 55 from ulcerative colitis patients. The study protocol was approved by the Ethics Committee of Qilu Hospital of Shandong University.
1.2 Metagenomic sequencing and assembly
The extracted DNA samples were used to construct sequencing libraries, and paired-end metagenomic sequencing (insert size 350 bp, read length 100 bp) was performed on the Illumina HiSeq 2000 platform. The raw sequencing data were quality-filtered to remove adapter contamination, low-quality reads and host genome contamination, yielding high-quality reads.
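The read-level part of this quality control can be sketched as below. The text gives no thresholds, so the Phred+33 encoding, the Q20 cut-off, the 50% low-quality fraction, the maximum of 3 ambiguous bases and the example adapter prefix are all assumptions; host-genome removal, which requires aligning reads to the human reference genome, is not shown.

```python
# Example adapter prefix (the start of the common Illumina TruSeq adapter);
# the adapters actually used depend on the library kit.
ADAPTER = "AGATCGGAAGAGC"


def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]


def passes_qc(seq: str, quality: str, min_q: int = 20,
              max_low_frac: float = 0.5, max_n: int = 3,
              adapter: str = ADAPTER) -> bool:
    """Return True if a single read survives the illustrative filters."""
    if not seq or len(seq) != len(quality):
        raise ValueError("sequence and quality must be non-empty and of equal length")
    if adapter in seq.upper():          # adapter contamination
        return False
    if seq.upper().count("N") > max_n:  # too many ambiguous bases
        return False
    low = sum(1 for q in phred_scores(quality) if q < min_q)
    return low / len(seq) <= max_low_frac  # too many low-quality bases -> drop


def filter_pairs(pairs):
    """Keep a read pair ((seq1, qual1), (seq2, qual2)) only if both mates pass."""
    return [(r1, r2) for r1, r2 in pairs if passes_qc(*r1) and passes_qc(*r2)]


good = ("ACGT" * 25, "I" * 100)  # 'I' = Q40 under Phred+33
bad = ("ACGT" * 25, "#" * 100)   # '#' = Q2
print(len(filter_pairs([(good, good), (good, bad)])))  # only the first pair is kept
```

Dedicated tools usually trim adapters and low-quality tails rather than discarding whole reads; whole-read filtering is used here only to keep the sketch short.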
1.3 Genome alignment and abundance calculation
The relative abundance of species was calculated by inputting the high-quality reads from "1.2 Metagenomic sequencing and assembly" into MetaPhlAn2 (http://segatalab.cibio.unitn.it/tools/metaphlan2/), following the methods described in the literature (Truong D T, Franzosa E A, Tickle T L, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nature Methods, 2015, 12(10): 902-. The abundance is calculated as follows: 1) align the high-quality reads to the reference marker genes; 2) count the number of inserts (read pairs) assigned to each marker gene according to the alignment results; 3) normalize the insert counts by marker gene length (normalizing by the average gene length and rounding down) to obtain the abundance of the corresponding species.
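The three steps above can be illustrated with a simplified calculation. This is not MetaPhlAn2's actual algorithm (which, among other things, uses clade-specific markers and robust averaging of marker coverage); it only shows the idea of counting inserts per marker gene, dividing by marker length, averaging over each species' markers and rescaling so that species abundances sum to 1. All names are hypothetical, and the rounding-down step mentioned in the text is omitted.

```python
from collections import Counter, defaultdict


def species_relative_abundance(insert_hits, marker_length, marker_species):
    """Simplified marker-gene based species relative abundance.

    insert_hits: marker-gene id for each aligned insert (read pair)
    marker_length: marker-gene id -> length in bp
    marker_species: marker-gene id -> species name
    """
    counts = Counter(insert_hits)
    normalized = defaultdict(list)
    for marker, length in marker_length.items():
        # Inserts per base of marker gene, so long markers are not over-counted
        normalized[marker_species[marker]].append(counts[marker] / length)
    # Average the normalized counts over all markers of each species
    depth = {sp: sum(vals) / len(vals) for sp, vals in normalized.items()}
    total = sum(depth.values())
    if total == 0:
        return {sp: 0.0 for sp in depth}
    return {sp: d / total for sp, d in depth.items()}


# Toy data: two Blautia markers of different length and one Roseburia marker.
hits = ["m1"] * 10 + ["m2"] * 5 + ["m3"] * 30
lengths = {"m1": 1000, "m2": 500, "m3": 1000}
taxa = {"m1": "Blautia", "m2": "Blautia", "m3": "Roseburia"}
print(species_relative_abundance(hits, lengths, taxa))
```

Without the length normalization, the 500 bp marker would appear half as abundant as the 1000 bp marker of the same species even at equal coverage.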
1.4 Screening of potential ulcerative colitis biomarkers using random forest (ROC/AUC)
To further screen potential intestinal disease biomarkers, this example constructs a training set from ulcerative colitis and non-ulcerative colitis subjects and uses it as a reference to evaluate the biomarker content in the samples to be tested. The terms training set and validation set have their usual meanings in the art. In this embodiment, the training set is a data set comprising samples from ulcerative colitis subjects and non-ulcerative colitis subjects together with the content of each biomarker in each sample, and the validation set is an independent data set used to test the performance of the model built on the training set. The non-ulcerative colitis subjects are in good general condition; subjects may be humans or model animals, and in this embodiment human subjects were used.
The method specifically comprises the following steps:
In the invention, 55 ulcerative colitis patients and 44 healthy people were selected as the training set.
1.4.1 Screening biomarkers using the training set data
First, the relative abundance of species in each sample of the training set was calculated as described in 1.3. The species abundances of the training set were then input into a random forest (RF) classifier (version 4.6-12, R 3.2.5), which was run with five repeats of 10-fold cross-validation. The ulcerative colitis risk of each individual was calculated from the relative abundances of the species selected by the RF model, an ROC curve was drawn, and the AUC was calculated as the parameter for evaluating the performance of the discrimination model. The combination of fewer than 30 markers with the best discrimination performance was selected as the marker combination of the invention. The importance index of each species in the model was output; the higher the importance index, the more important the marker is for distinguishing ulcerative colitis from non-ulcerative colitis.
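The evaluation scheme above (five repeats of 10-fold cross-validation, with AUC as the performance measure) can be sketched without any machine-learning library. The sketch only generates stratified folds and computes the AUC from scores; fitting the random forest itself (the inventors used R) is deliberately left out, and the fold assignment and per-repeat seeding are illustrative assumptions. The AUC uses its rank interpretation: the probability that a randomly chosen UC sample scores higher than a randomly chosen control, with ties counted as one half.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold number 0..k-1, spreading each class evenly."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            fold_of[i] = position % k
    return fold_of


def repeated_cv_splits(labels, k=10, repeats=5):
    """Yield (train_indices, test_indices) for `repeats` rounds of k-fold CV."""
    for r in range(repeats):
        fold_of = stratified_folds(labels, k, seed=r)  # new shuffle each repeat
        for f in range(k):
            test = [i for i, g in enumerate(fold_of) if g == f]
            train = [i for i, g in enumerate(fold_of) if g != f]
            yield train, test


def auc(scores, labels):
    """AUC as P(score of a random UC sample > score of a random control); ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


labels = [1] * 55 + [0] * 44  # 55 UC patients, 44 healthy controls
splits = list(repeated_cv_splits(labels))
print(len(splits), auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
```

In use, a model would be fitted on each `train` index set, its UC probabilities collected for the matching `test` samples, and `auc` applied to the out-of-fold probabilities; repeating this for marker subsets of increasing size, ranked by importance, is one possible way to choose the best combination of fewer than 30 markers.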
The resulting RF classifier of the present invention contains 10 biomarkers, the details of which are shown in Table 1. Figure 1 shows the ROC curve and AUC of the random forest model (10 biomarkers) for the training set consisting of ulcerative colitis patients and healthy controls. Here, specificity is the probability that a non-diseased subject is classified correctly, and sensitivity is the probability that a diseased subject is classified correctly. The discriminative performance on the training set samples is AUC = 0.97 (95% confidence interval, CI: 0.94-1.00). These results show that the marker combination obtained by the model can serve as a potential biomarker panel for distinguishing ulcerative colitis from non-ulcerative colitis.
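The AUC reported above can be computed directly from per-individual risk scores through its Mann-Whitney interpretation: the probability that a randomly chosen patient scores higher than a randomly chosen control. The sketch below uses hypothetical helpers, not the authors' code. The document does not say how its 95% CI was obtained, so the percentile bootstrap here (resampling cases and controls separately) is an assumption made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case (1) and one control (0)")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    scores: Sequence[float], labels: Sequence[int],
    n_boot: int = 2000, alpha: float = 0.05, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC, resampling cases and controls separately."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case (1) and one control (0)")
    aucs: List[float] = []
    for _ in range(n_boot):
        boot_pos = [rng.choice(pos) for _ in pos]
        boot_neg = [rng.choice(neg) for _ in neg]
        aucs.append(roc_auc(boot_pos + boot_neg, [1] * len(boot_pos) + [0] * len(boot_neg)))
    aucs.sort()
    lower = aucs[int(alpha / 2 * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

For example, with scores `[0.9, 0.7, 0.6, 0.4, 0.35, 0.2]` and labels `[1, 1, 0, 1, 0, 0]`, eight of the nine case/control pairs are ordered correctly, so `roc_auc` returns 8/9.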
TABLE 1 Details of the 10 biomarkers
OTU | Biomarker information | Importance |
---|---|---|
OTU617 | Blautia | 6.94707 |
OTU3035 | unclassified_Ruminococcaceae | 5.53507 |
OTU659 | Clostridium XlVa | 5.32256 |
OTU3974 | Lachnospiracea_incertae_sedis | 4.9148 |
OTU3056 | Roseburia | 4.34222 |
OTU1360 | Bifidobacterium | 4.30603 |
OTU1995 | Gemmiger | 3.99925 |
OTU86 | Enterococcaceae | 3.64026 |
OTU8 | Adlercreutzia | 3.44398 |
OTU3791 | Adlercreutzia | 3.43459 |
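The importance values in Table 1 can be turned into ranked marker panels. The document reports importance indices but does not state how smaller panels (such as the 5- to 9-marker panels in Table 3) were chosen, so building them as nested top-k sets by importance is an assumption. `TABLE_1`, `top_k_markers` and `nested_panels` are illustrative names.

```python
from typing import Dict, Iterable, List, Tuple

# OTU id -> (taxon, importance index), copied from Table 1.
TABLE_1: Dict[str, Tuple[str, float]] = {
    "OTU617": ("Blautia", 6.94707),
    "OTU3035": ("unclassified_Ruminococcaceae", 5.53507),
    "OTU659": ("Clostridium XlVa", 5.32256),
    "OTU3974": ("Lachnospiracea_incertae_sedis", 4.9148),
    "OTU3056": ("Roseburia", 4.34222),
    "OTU1360": ("Bifidobacterium", 4.30603),
    "OTU1995": ("Gemmiger", 3.99925),
    "OTU86": ("Enterococcaceae", 3.64026),
    "OTU8": ("Adlercreutzia", 3.44398),
    "OTU3791": ("Adlercreutzia", 3.43459),
}


def top_k_markers(table: Dict[str, Tuple[str, float]], k: int) -> List[str]:
    """Return the k OTU ids with the highest importance index, most important first."""
    ranked = sorted(table, key=lambda otu: table[otu][1], reverse=True)
    return ranked[:k]


def nested_panels(table: Dict[str, Tuple[str, float]], sizes: Iterable[int]) -> Dict[int, List[str]]:
    """Hypothetical: build one top-k panel per requested size (e.g. 5 to 10 markers)."""
    return {k: top_k_markers(table, k) for k in sizes}


print(top_k_markers(TABLE_1, 3))  # ['OTU617', 'OTU3035', 'OTU659']
```

Keying by OTU id rather than taxon keeps the two Adlercreutzia OTUs (OTU8 and OTU3791) as separate markers.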
1.4.2 Validation of the screened biomarkers using validation set data
The model was validated on a randomly selected, independent group of subjects. An individual with a predicted probability of ulcerative colitis of 0.5 or higher was predicted to be at risk of, or to have, ulcerative colitis. First, the relative abundance of each biomarker in each validation-set sample was calculated as described in 1.3. The validation set data were then evaluated with the random forest model according to the method of 1.4.1. The predicted disease probabilities for the validation set based on the 10-marker combination are shown in Table 2 below.
TABLE 2 Probability of illness in the validation set based on the 10-marker combination
U: patients with ulcerative colitis; N: healthy controls
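The 0.5 decision rule, together with the sensitivity and specificity definitions given for Figure 1, can be sketched as follows. The function names are hypothetical, and the probabilities in the usage example are invented for illustration; they are not the Table 2 data.

```python
from typing import Dict, Sequence


def classify(prob_uc: float, threshold: float = 0.5) -> str:
    """'U' (has, or is at risk of, UC) if probability >= threshold, otherwise 'N'."""
    return "U" if prob_uc >= threshold else "N"


def sensitivity_specificity(
    probs: Sequence[float], truth: Sequence[str], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity: share of true 'U' called 'U'. Specificity: share of true 'N' called 'N'."""
    if "U" not in truth or "N" not in truth:
        raise ValueError("need both 'U' and 'N' subjects")
    pairs = [(classify(p, threshold), t) for p, t in zip(probs, truth)]
    tp = sum(1 for pred, t in pairs if t == "U" and pred == "U")
    fn = sum(1 for pred, t in pairs if t == "U" and pred == "N")
    tn = sum(1 for pred, t in pairs if t == "N" and pred == "N")
    fp = sum(1 for pred, t in pairs if t == "N" and pred == "U")
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}
```

For invented probabilities `[0.92, 0.61, 0.45, 0.30, 0.12, 0.10]` with true labels U, U, U, N, N, N, the third patient falls below 0.5, giving a sensitivity of 2/3 and a specificity of 1.0.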
1.4.3 Specificity testing of the intestinal biomarkers
Ischemic colitis (IC) is a group of syndromes whose main feature is insufficient blood supply to the colon, caused by occlusive or non-occlusive disease of the colonic vessels. Its clinical symptoms closely resemble those of ulcerative colitis, so it is easily misdiagnosed.
To verify the specificity of the intestinal biomarkers of the invention for ulcerative colitis, an additional 32 patients with ischemic colitis and 41 patients with ulcerative colitis were selected to evaluate the diagnostic efficacy of the intestinal biomarker combinations. The model was rebuilt using the above microbial markers and evaluated on the validation set; the results are shown in Table 3 below. Surprisingly, when the model was built with all 10 biomarkers, it distinguished ulcerative colitis from ischemic colitis with 100% accuracy: the predicted AUC reached 1.00, with a 95% confidence interval (CI) of 1.00-1.00. The intestinal microbial markers of the invention therefore enable specific diagnosis of ulcerative colitis.
TABLE 3 Specificity assay of intestinal biomarker combinations
Number of biomarkers | AUC | 95% confidence interval (CI) |
---|---|---|
5 | 0.88 | 0.81-0.95 |
6 | 0.92 | 0.88-0.96 |
7 | 0.95 | 0.92-0.98 |
8 | 0.96 | 0.94-0.98 |
9 | 0.98 | 0.97-0.99 |
10 | 1.00 | 1.00-1.00 |
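The selection rule stated in 1.4.1 (the best-performing combination among those with fewer than 30 markers) can be illustrated on AUC-versus-panel-size pairs such as those in Table 3. This is a sketch with a hypothetical helper; breaking AUC ties in favor of the smaller panel is our assumption, since the document does not address ties.

```python
from typing import Sequence, Tuple

# (number of markers, AUC) pairs copied from Table 3.
TABLE_3 = [(5, 0.88), (6, 0.92), (7, 0.95), (8, 0.96), (9, 0.98), (10, 1.00)]


def best_panel_size(results: Sequence[Tuple[int, float]], max_markers: int = 30) -> int:
    """Size of the best-AUC panel among those with fewer than max_markers markers.

    Ties on AUC are broken in favor of the smaller panel (an assumption).
    """
    eligible = [(n, auc) for n, auc in results if n < max_markers]
    if not eligible:
        raise ValueError("no panel is below the size limit")
    # Maximize AUC first, then prefer fewer markers (hence -n).
    n_best, _ = max(eligible, key=lambda r: (r[1], -r[0]))
    return n_best


print(best_panel_size(TABLE_3))  # 10
```

Applied to the Table 3 values, the rule picks the 10-marker panel; lowering `max_markers` to 8 would instead pick the 7-marker panel (AUC 0.95).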
The results show that the biomarkers disclosed in the invention have high accuracy and specificity and good prospects for development into a diagnostic method, thereby providing a basis for risk assessment, diagnosis and early diagnosis of ulcerative colitis and for the search for potential drug targets.
The invention therefore proposes the following applications:
Use of the intestinal flora-based ulcerative colitis biomarker combination as a detection target, or as the detection target in the preparation of a detection kit.
Use of the intestinal flora-based ulcerative colitis biomarker combination as a target in screening drugs for the treatment and/or prevention of ulcerative colitis.
Changes in the relative abundance of the biomarker combination provide a basis for judging whether a candidate drug is effective.
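The document does not specify how the change in the panel's relative abundance should be quantified for drug screening. One common choice, assumed here, is a per-marker log2 fold change of relative abundance before versus after treatment, with a small pseudocount so that a marker absent from one sample does not produce log(0). Relative abundance is taken as read count divided by the sample's total count, which is also an assumption since method 1.3 is not reproduced in this section. The OTU ids come from Table 1; the helper names, counts and pseudocount are hypothetical, and which direction of change indicates efficacy is left open.

```python
import math
from typing import Dict, Iterable


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Per-OTU read counts of one sample -> relative abundances summing to 1 (assumed definition)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: c / total for otu, c in counts.items()}


def panel_log2_change(
    before: Dict[str, int], after: Dict[str, int], panel: Iterable[str], pseudo: float = 1e-6
) -> Dict[str, float]:
    """log2(after / before) of each panel OTU's relative abundance.

    A small pseudocount keeps markers absent from one sample from giving log(0).
    """
    ra_before = relative_abundance(before)
    ra_after = relative_abundance(after)
    return {
        otu: math.log2((ra_after.get(otu, 0.0) + pseudo) / (ra_before.get(otu, 0.0) + pseudo))
        for otu in panel
    }


# Hypothetical read counts for one subject before and after a candidate drug.
before = {"OTU617": 10, "OTU1360": 40, "other": 50}
after = {"OTU617": 20, "OTU1360": 40, "other": 40}
print(panel_log2_change(before, after, ["OTU617", "OTU1360"]))
```

In this invented example the relative abundance of OTU617 roughly doubles (log2 change close to 1) while OTU1360 is unchanged (log2 change 0).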
It should be noted that the above examples only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the examples given, those skilled in the art may modify the technical solutions of the present invention as needed or make equivalent substitutions without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911267223.3A CN110904213B (en) | 2019-12-11 | 2019-12-11 | An ulcerative colitis biomarker based on intestinal flora and its application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911267223.3A CN110904213B (en) | 2019-12-11 | 2019-12-11 | An ulcerative colitis biomarker based on intestinal flora and its application |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110904213A true CN110904213A (en) | 2020-03-24 |
CN110904213B CN110904213B (en) | 2023-09-26 |
Family
ID=69824601
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911267223.3A Active CN110904213B (en) | An ulcerative colitis biomarker based on intestinal flora and its application | 2019-12-11 | 2019-12-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110904213B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111440884A (en) * | 2020-04-22 | 2020-07-24 | 中国医学科学院北京协和医院 | Gut-derived flora for the diagnosis of sarcopenia and its use |
CN111808939A (en) * | 2020-03-23 | 2020-10-23 | 昆明医科大学第一附属医院 | A diagnostic marker to aid in the diagnosis of ulcerative colitis |
CN114107484A (en) * | 2021-12-08 | 2022-03-01 | 上海锐翌生物科技有限公司 | Ulcerative colitis marker gene and application thereof |
CN114292930A (en) * | 2021-12-14 | 2022-04-08 | 上海交通大学医学院附属瑞金医院 | An application of fecal microbiota-based detection in children with inflammatory bowel disease |
CN117949648A (en) * | 2024-03-26 | 2024-04-30 | 中国医学科学院北京协和医院 | A marker for detecting ulcerative colitis and its use |
CN118995910A (en) * | 2024-08-06 | 2024-11-22 | 广州市第一人民医院(广州消化疾病中心、广州医科大学附属市一人民医院、华南理工大学附属第二医院) | Terrisporobacter spp application of marker in auxiliary prediction of curative effect of VDZ on ulcerative colitis |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105368944A (en) * | 2015-11-23 | 2016-03-02 | 广州基迪奥生物科技有限公司 | Biomarker capable of detecting diseases and application of biomarker |
JP2017122603A (en) * | 2016-01-05 | 2017-07-13 | 哲朗 高山 | Mapping method for diagnosis and/or prognosis prediction of ulcerative colitis |
JP2018198560A (en) * | 2017-05-26 | 2018-12-20 | 国立大学法人神戸大学 | Ulcerative colitis examination method and device, and therapeutic agent screening method |
CN110541026A (en) * | 2019-08-17 | 2019-12-06 | 昆明医科大学第一附属医院 | A biomarker for detecting ulcerative colitis and its application |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111808939A (en) * | 2020-03-23 | 2020-10-23 | 昆明医科大学第一附属医院 | A diagnostic marker to aid in the diagnosis of ulcerative colitis |
CN111440884A (en) * | 2020-04-22 | 2020-07-24 | 中国医学科学院北京协和医院 | Gut-derived flora for the diagnosis of sarcopenia and its use |
CN111440884B (en) * | 2020-04-22 | 2021-03-16 | 中国医学科学院北京协和医院 | Gut-derived flora for the diagnosis of sarcopenia and its use |
CN114107484A (en) * | 2021-12-08 | 2022-03-01 | 上海锐翌生物科技有限公司 | Ulcerative colitis marker gene and application thereof |
CN114107484B (en) * | 2021-12-08 | 2024-03-22 | 上海锐翌生物科技有限公司 | Ulcerative colitis marker genes and their applications |
CN114292930A (en) * | 2021-12-14 | 2022-04-08 | 上海交通大学医学院附属瑞金医院 | An application of fecal microbiota-based detection in children with inflammatory bowel disease |
CN114292930B (en) * | 2021-12-14 | 2023-12-26 | 上海交通大学医学院附属瑞金医院 | Application of fecal flora-based detection in children inflammatory bowel disease |
CN117949648A (en) * | 2024-03-26 | 2024-04-30 | 中国医学科学院北京协和医院 | A marker for detecting ulcerative colitis and its use |
CN118995910A (en) * | 2024-08-06 | 2024-11-22 | 广州市第一人民医院(广州消化疾病中心、广州医科大学附属市一人民医院、华南理工大学附属第二医院) | Terrisporobacter spp application of marker in auxiliary prediction of curative effect of VDZ on ulcerative colitis |
Also Published As
Publication number | Publication date |
---|---|
CN110904213B (en) | 2023-09-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110904213B (en) | An ulcerative colitis biomarker based on intestinal flora and its application | |
CN112119167B (en) | Biomarker for depression and application thereof | |
WO2020244018A1 (en) | Small-scale schizophrenia biomarker combination, application thereof and metaphlan2 screening method therefor | |
CN111440884A (en) | Gut-derived flora for the diagnosis of sarcopenia and its use | |
CN111020020A (en) | A biomarker combination for schizophrenia, its application and metaphlan2 screening method | |
CN112877417A (en) | Screening and application of polycystic ovarian syndrome intestinal flora biomarker | |
CN111020021A (en) | Intestinal flora-based small-scale schizophrenia biomarker combination, application thereof and mOTU screening method | |
CN112384634B (en) | Osteoporosis biomarkers and their uses | |
CN115786556A (en) | Application of Megasphaera micturition intestinal strain | |
CN110358849A (en) | Derived from the biomarker of the Diagnosis of Pancreatic inflammation of enteron aisle, screening technique and application thereof | |
CN110396538B (en) | Migraine Biomarkers and Their Uses | |
CN114657270B (en) | Alzheimer disease biomarker based on intestinal flora and application thereof | |
CN116926187A (en) | Application of intestinal microbial marker | |
CN110396537B (en) | Asthma Biomarkers and Their Uses | |
CN112063709A (en) | Diagnostic kit for myasthenia gravis by taking microorganisms as diagnostic marker and application | |
CN116904575B (en) | Biomarkers associated with physical decline in silicosis patients and their uses | |
CN111996248B (en) | Reagent for detecting microorganism and application thereof in diagnosis of myasthenia gravis | |
HK40042727A (en) | Biomarker for depression and use thereof | |
CN119162304A (en) | A Crohn's disease intestinal microbial marker and its application and method for constructing a Crohn's disease detection model | |
CN115478100A (en) | Application of intestinal flora marker as marker for autism risk screening, detection kit and detection system | |
CN119560000A (en) | An intestinal microbial marker for irritable bowel syndrome and its application and method for constructing a detection model | |
CN119360942A (en) | A ulcerative colitis intestinal microbial marker and its application and ulcerative colitis detection model | |
WO2024155681A1 (en) | Methods and systems for detecting and assessing liver conditions | |
HK40072663B (en) | Biomarkers for alzheimer's disease based on intestinal flora and application thereof | |
HK40072663A (en) | Biomarkers for alzheimer's disease based on intestinal flora and application thereof |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |