WO2023128419A1 - Method for screening colorectal cancer and colorectal polyps or advanced adenomas and application thereof - Google Patents
- Publication number
- WO2023128419A1 (PCT/KR2022/020461)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- seq
- nos
- genes
- primers
- colorectal cancer
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
Definitions
- the present invention relates to a method for screening colon cancer and colon polyps or advanced adenomas and a kit used for the method.
- Colorectal cancer is a malignant tumor that occurs in the colon and rectum, which constitute the large intestine.
- colorectal cancer is the third most common of all cancers, with 1.9 million new cases worldwide, and 935,000 people died from it, which ranks it third in cancer mortality. It is a major cancer
- the 5-year relative survival rate of colorectal cancer is significantly lowered according to the degree of progression of the cancer. In Stage I, the 5-year relative survival rate reaches 90%, but in Stage IV, the 5-year survival rate rapidly decreases to 14%. However, only 37% of cases are diagnosed at Stage I because most of them do not show symptoms in Stage I. Therefore, early diagnosis of colorectal cancer through regular screening is important in increasing the survival rate of colorectal cancer.
- the adenoma-carcinoma sequence refers to the process in which normal epithelial cells of the colon progress to colorectal cancer via non-advanced adenoma and advanced adenoma.
- Advanced adenomas are adenomas that are highly likely to develop into colorectal cancer. Histologically, they are larger than 10 mm, contain more than 25% villous components, have high-grade dysplastic lesions, or number three or more.
- a 13-year follow-up study revealed that subjects with advanced adenomas were 2.7 times more likely to develop colorectal cancer and 2.6 times more likely to die from colorectal cancer than control subjects and subjects with non-advanced adenomas. Therefore, in order to lower the incidence of colorectal cancer, it is important to detect and remove advanced adenomas at that stage.
- colonoscopy and fecal occult blood test are performed for the diagnosis of colorectal cancer.
- Colonoscopy can examine the entire large intestine at once and can remove adenomas or some early cancers found during the examination. Through a meta-analysis, it has been reported that colonoscopy reduces both the incidence and mortality of colorectal cancer by about 70%.
- Colonoscopy requires a bowel preparation process before examination, and the degree of bowel preparation has a very important effect on the accuracy and quality of the examination.
- intestinal perforation, which can occur as a complication, appears with a frequency of about 0.09%. As a regular colorectal cancer screening test targeting a large population, colonoscopy therefore has limits: the number of doctors who can perform it is limited, patient compliance may be poor because of pain during the examination and discomfort during bowel preparation, and complications may occur relatively often.
- Fecal occult blood test is a method of diagnosis by detecting bleeding from a mass in the large intestine in the stool. It is a non-invasive method, has no special side effects, is relatively easy to implement, and has the advantage of low cost. Compared to those who did not perform fecal occult blood testing, those who underwent fecal occult blood testing had a 10-40% lower mortality rate.
- the sensitivity of the fecal occult blood test for colorectal cancer and advanced adenoma was 56-74% and 23-31%, respectively, and the specificity was 90-95%. Since bleeding from colon tumors is often intermittent, the accuracy of the test may vary depending on whether or not the specimen is properly collected, and the compliance of test subjects in using a stool sample may be low.
- the present invention has been made to solve the above problems and to meet the above need. An object of the present invention is to provide an information provision method for developing a molecular diagnostic test for colorectal cancer and advanced adenoma or colon polyps with high sensitivity and specificity, using quantitative reverse transcription polymerase chain reaction on a blood sample, which is relatively easy to collect, and an artificial intelligence prediction model produced from the results.
- Another object of the present invention is to provide a composition for molecular diagnostic testing of colorectal cancer and colorectal polyps or advanced adenomas with high sensitivity and specificity, using quantitative reverse transcription polymerase chain reaction on a blood sample, which is relatively easy to collect, and an artificial intelligence prediction model produced from the results.
- the present invention provides a composition capable of distinguishing among a colorectal cancer group, a colorectal cancer high-risk group, a low-risk group, and a normal group, containing as active ingredients primers or probes involved in measuring the relative expression levels of the IL1B, LTF, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, and MCAM genes.
- the primers and probes preferably consist of the sequences shown in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58, but are not limited thereto.
- the normal group is a case in which colonoscopy shows no lesions in the colon
- the low-risk group is a case in which there are fewer than three low-risk adenomas
- the high-risk group preferably includes cases in which colonoscopy shows three or more low-risk adenomas, one or more high-risk adenomas, or carcinoma in situ, but is not limited thereto.
- the present invention provides a composition capable of classifying a colorectal cancer group and colorectal cancer high-risk, low-risk, and normal groups, containing as an active ingredient primers or probes involved in measuring the relative expression levels of the CES1, IL1B, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, and MCAM genes.
- the primers and probes are SEQ ID NOs: 4 to 6, SEQ ID NOs: 7 to 9, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 31 to 33, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45 and SEQ ID NOs: 56 to 58, but is not limited thereto.
- the present invention provides a composition capable of distinguishing a high-risk group from among a colorectal cancer high-risk group, a low-risk group, and a normal group, comprising as an active ingredient primers or probes involved in measuring the relative expression levels of the IL1B, LTF, TNFSF13B, ITIH4, CXCL11, and MAPK6 genes.
- the primers and probes preferably consist of the sequences shown in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, and SEQ ID NOs: 40 to 42, but are not limited thereto.
- the present invention a) measuring the relative expression levels of IL1B, LTF, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, and MCAM genes using primers and probes through polymerase chain reaction,
- the primers and probes used in a) consist of the sequences set forth in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58,
- the primers and probes used in b) consist of the sequences set forth in SEQ ID NOs: 4 to 6, SEQ ID NOs: 7 to 9, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 31 to 33, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58,
- the primers and probes used in c) preferably consist of the sequences shown in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, and SEQ ID NOs: 40 to 42, but are not limited thereto.
- the present invention provides a) primers or probe sets involved in measuring the relative expression levels of IL1B, LTF, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, and MCAM genes,
- a kit for screening for colorectal cancer and colorectal polyps including primers or probe sets involved in measuring the relative expression levels of IL1B, LTF, TNFSF13B, ITIH4, CXCL11, and MAPK6 genes is provided.
- the primers and probes used in a) consist of the sequences set forth in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58;
- the primers and probes used in b) consist of the sequences set forth in SEQ ID NOs: 4 to 6, SEQ ID NOs: 7 to 9, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 31 to 33, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58,
- the primers and probes used in c) preferably consist of the sequences shown in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, and SEQ ID NOs: 40 to 42, but are not limited thereto.
- the present invention provides a composition for screening for colorectal cancer and advanced adenoma comprising, as active ingredients, primers or probes involved in measuring the relative expression levels of the CCR1, CES1, IL1B, ITGA2, LTF, TNFSF13B, PTGES, ITIH4, TUG1, NME1, PTGS2, CXCL11, MAPK6, GK, KRT19, EpCAM, MCAM, PPARG, ANKHD1-EIF4EBP3, GPR15, MMP23B, TAS2R10, TYMS, FOXA2, MKi67, ERBB2, NPTN, SNAI2, TERT and VIM genes.
- the primers and probes preferably consist of the sequences shown in SEQ ID NOs: 1 to 91, but all mutant sequences that achieve the effect of the present invention through one or more substitutions, deletions, additions, etc. to the sequences are also included in the scope of the present invention.
- the present invention provides a kit for screening for colorectal cancer and advanced adenoma comprising, as active ingredients, primers or probes involved in measuring the relative expression levels of the CCR1, CES1, IL1B, ITGA2, LTF, TNFSF13B, PTGES, ITIH4, TUG1, NME1, PTGS2, CXCL11, MAPK6, GK, KRT19, EpCAM, MCAM, PPARG, ANKHD1-EIF4EBP3, GPR15, MMP23B, TAS2R10, TYMS, FOXA2, MKi67, ERBB2, NPTN, SNAI2, TERT and VIM genes.
- the primers and probes preferably consist of the sequences shown in SEQ ID NOs: 1 to 91, but all mutant sequences that achieve the effect of the present invention through one or more substitutions, deletions, additions, etc. to the sequences are also included in the scope of the present invention.
- the present invention provides a method for providing information for prediction or diagnosis of colorectal cancer or advanced adenoma, comprising measuring the relative expression levels of the CCR1, CES1, IL1B, ITGA2, LTF, TNFSF13B, PTGES, ITIH4, TUG1, NME1, PTGS2, CXCL11, MAPK6, GK, KRT19, EpCAM, MCAM, PPARG, ANKHD1-EIF4EBP3, GPR15, MMP23B, TAS2R10, TYMS, FOXA2, MKi67, ERBB2, NPTN, SNAI2, TERT and VIM genes.
- the expression level is measured using primers and probes, and the primers and probes preferably consist of the sequences shown in SEQ ID NOs: 1 to 91, but all mutant sequences that achieve the effect of the present invention through one or more substitutions, deletions, additions, etc. to the sequences are also included in the scope of the present invention.
- when one or more of the following 1) to 3) is confirmed, the subject is determined to have colorectal cancer:
- the levels of the CCR1, CES1, GK, IL1B, KRT19, LTF, PPARG, PTGES, PTGS2, TAS2R10, TNFSF13B and TYMS genes, or of the proteins encoded by the genes, show a change in expression level compared with the levels of the corresponding genes or encoded proteins in a sample from a normal control subject;
- the levels of the ANKHD1-EIF4EBP3, CCR1, MCAM, MMP23B, TAS2R10, TNFSF13B, TUG1 and TYMS genes, or of the proteins encoded by the genes, show a change in expression level compared with the levels of the corresponding genes or encoded proteins in a sample from an individual with advanced adenoma.
- the following 1) or 2) is additionally confirmed to determine the pathological characteristics of an individual having colorectal cancer:
- if the levels of the CXCL11 and PTGS2 genes, or of the proteins encoded by the genes, show a change in expression level compared with the levels of the corresponding genes or encoded proteins in a sample from an individual with low TNM stage colorectal cancer, the subject is judged to have high TNM stage colorectal cancer; or
- the present invention provides blood gene marker combinations (Table 1) for the purpose of screening for colorectal cancer and advanced adenoma composed of the following gene groups.
- the present invention provides an artificial intelligence algorithm-based classification model for colorectal cancer and advanced adenoma screening tests prepared by substituting the expression levels of the 30 markers.
- primer and probe sequences are provided to indicate the relative expression levels of corresponding biomarkers in blood.
- the present invention provides an artificial intelligence prediction model for colorectal cancer, advanced adenoma, and/or colorectal polyps screening test prepared by substituting the expression levels of the 18 markers.
- Isolation of total RNA and synthesis of cDNA from it can be performed by commonly used, known methods. Detailed descriptions of these processes are disclosed in Joseph Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001), and in Noonan, K.F. et al., and may be incorporated by reference into the present invention.
- the primers of the present invention can be chemically synthesized using the phosphoramidite solid support method, or other well-known methods. Such nucleic acid sequences can also be modified using a number of means known in the art.
- Non-limiting examples of such modifications include methylation, "capping", substitution of one or more natural nucleotides with an analogue, and modifications between nucleotides, such as modification to uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.) or to charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.).
- a nucleic acid can contain one or more additional covalently linked moieties, such as proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), intercalators (e.g., acridine, psoralen, etc.), chelating agents (e.g., metals, radioactive metals, iron, oxidizing metals, etc.), and alkylating agents.
- a nucleic acid sequence of the present invention may also be modified with a label capable of providing, directly or indirectly, a detectable signal.
- labels include radioactive isotopes, fluorescent molecules, and biotin.
- the amplified target sequence (CCR1, GAPDH gene, etc.) may be labeled with a detectable labeling substance.
- the label material may be a material that emits fluorescence, phosphorescence, chemiluminescence, or radioactivity, but is not limited thereto.
- the labeling material may be fluorescein, phycoerythrin, rhodamine, lissamine, Cy-5 or Cy-3.
- when a radioactive isotope such as 32P or 35S is used, radioactivity is incorporated into the amplification product as it is synthesized, so that the amplification product can be radioactively labeled.
- One or more oligonucleotide primer sets used to amplify the target sequence may be used.
- Labeling is performed by various methods commonly practiced in the art, such as the nick translation method, the random priming method (Multiprime DNA labeling systems booklet, “Amersham” (1989)), and the kination method (Maxam & Gilbert, Methods in Enzymology, 65:499 (1986)).
- the label provides a signal that can be detected by fluorescence, radioactivity, colorimetry, gravimetry, X-ray diffraction or absorption, magnetism, enzymatic activity, mass analysis, binding affinity, hybridization, radiofrequency, or nanocrystals.
- the expression level is measured at the mRNA level through RT-PCR.
- novel primer pairs and fluorescently labeled probes that specifically bind to the CCR1 and GAPDH genes are required. In the present invention, the corresponding primers and probes specified by specific nucleotide sequences can be used, but the invention is not limited thereto; anything that can specifically bind to these genes and provide a detectable signal for RT-PCR can be used without limitation.
- FAM and Quen (Quencher) mean fluorescent dyes.
- the RT-PCR method applied to the present invention may be performed through a known process commonly used in the art.
- the step of measuring the mRNA expression level may use, without limitation, any method commonly used to measure mRNA expression levels, and may be performed through radioactivity measurement, fluorescence measurement, or phosphorescence measurement depending on the type of probe label used, but is not limited thereto.
- the fluorescence measurement method is to label the 5'-end of the primer with Cy-5 or Cy-3 and perform real-time RT-PCR to label the target sequence with a detectable fluorescent label. And the fluorescence thus labeled can be measured using a fluorescence meter.
- the radioactivity measurement method is to add a radioactive isotope such as 32P or 35S to the PCR reaction solution during RT-PCR to label the amplification product, and then to measure radioactivity using a radioactivity measurement instrument, for example, a Geiger counter or a liquid scintillation counter.
- a fluorescence-labeled probe is attached to the PCR product amplified through the RT-PCR to emit fluorescence of a specific wavelength, and at the same time as amplification, the fluorescence of the genes of the present invention is measured in the fluorescence meter of the PCR device.
- the mRNA expression level is measured in real time, and the measured value is calculated and visualized through a PC, so that the inspector can easily check the expression level.
- the screening kit may be a kit for diagnosing colorectal cancer and colorectal polyps, characterized in that it includes essential elements necessary for carrying out a reverse transcription polymerase reaction.
- the reverse transcription polymerase reaction kit may include each primer pair specific for the gene of the present invention.
- the primer is a nucleotide having a sequence specific to the nucleic acid sequence of each marker gene, and may have a length of about 7 bp to 50 bp, more preferably about 10 bp to 30 bp.
- reverse transcription polymerase reaction kits include a test tube or other suitable container, reaction buffer (with varying pH and magnesium concentration), deoxynucleotides (dNTPs), enzymes such as Taq-polymerase and reverse transcriptase, DNAse, RNAse inhibitors, DEPC-water, sterile water, and the like.
- kit of the present invention may further include a user guide describing optimal reaction performance conditions.
- the guide is a printed matter that explains how to use the kit, eg, how to prepare a buffer solution, suggested reaction conditions, and the like.
- the guide may include a brochure in the form of a pamphlet or leaflet, a label affixed to the kit, and instructions on the surface of the package containing the kit.
- the guide may include information disclosed or provided through an electronic medium such as the Internet.
- the term "information provision method for diagnosing colon cancer and colon polyps" is a preliminary step for diagnosis and provides objective basic information necessary for diagnosis of cancer, and clinical judgment or opinion of a doctor is excluded.
- the term "information provision method for screening for colorectal cancer and advanced adenoma" is a preliminary step for diagnosis and provides objective basic information necessary for diagnosis of cancer, and clinical judgment or opinion of a doctor is excluded.
- primer refers to a short nucleic acid sequence having a free 3'-terminal hydroxyl group that is capable of forming base pairs with a complementary template and serves as a starting point for copying the template strand.
- Primers can initiate DNA synthesis in the presence of reagents for polymerization (i.e., DNA polymerase or reverse transcriptase) and four different nucleoside triphosphates in an appropriate buffer and temperature.
- the primers of the present invention are sense and antisense nucleic acids having sequences of 7 to 50 nucleotides specific to each marker gene.
- a primer may incorporate additional features that do not alter the basic properties of the primer that serve as the starting point of DNA synthesis.
- probe is a single-stranded nucleic acid molecule and contains a sequence complementary to a target nucleic acid sequence.
- real-time RT-PCR refers to a molecular biological polymerization method in which RNA is reverse transcribed into complementary DNA (cDNA) using reverse transcriptase, the target is amplified from the cDNA template using target primers and a labeled target probe, and the signal generated from the label of the target probe on the amplified target is quantitatively detected at the same time.
- a data mining method capable of diagnosing colon cancer and colon polyps through information learning can be used for diagnosing or predicting colon cancer and colon polyps of the present invention, and in particular, it can be effectively improved through AI analysis. Therefore, a method capable of measuring the relative expression levels of diagnostic markers for colon cancer and colon polyps and/or an AI analysis method may be preferably used in the method for diagnosing or predicting colon cancer and colon polyps of the present invention.
- when AI analysis is used for colorectal cancer and colorectal polyps prediction models, various interpretable models can be used without limitation, such as linear regression, logistic regression, neural network analysis, decision tree, decision rule, rule fit, and support vector machine; in a preferred embodiment of the present invention, logistic regression analysis, decision tree, neural network analysis, and support vector machine are used in particular.
- the prediction model of the present invention may include a colorectal cancer and colon polyps diagnosis unit, a classification unit, and a weighting unit.
- the colon-related disease classification unit may perform a process of classifying colon cancer and colon polyps using a neural network as a classifier, and the weighting unit assigns a weight to the classification result, so that colorectal cancer and colon polyps can be screened.
- Neural network analysis refers to a system that constructs one or more layers to make a decision based on a plurality of data.
- the input layer is a layer that inputs relative expression level information of gene markers as data into a neural network analysis model
- the output layer is a layer that gives a result from which the presence or absence of colorectal cancer and colon polyp disease can be determined based on the various input information.
- the hidden layer is a layer that proceeds with the process of determining whether or not there is a patient by assigning weights to various criteria (gene mutation information).
- the method for predicting colorectal cancer and colorectal polyps using an AI analysis technique estimates a neural network analysis model having the number of hidden nodes using an MLP neural network.
- the neural network model with the highest accuracy estimated from each model is determined as the final neural network model for colorectal disease prediction.
- the AI analysis may be composed of an input layer, a hidden layer, and an output layer, and the neural network analysis model through the neural network analysis step may be a neural network model having several hidden nodes in several hidden layers.
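The input, hidden, and output layers described above can be sketched as a minimal forward pass. This is an illustration only, not the model of the invention: the layer sizes, the sigmoid activation, and every weight and input value below are hypothetical, and training is not shown.

```python
from math import exp


def sigmoid(x: float) -> float:
    """Logistic activation; assumed here, the source does not name one."""
    return 1.0 / (1.0 + exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(expression_levels, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer -> single output node (score in 0..1)."""
    hidden = dense(expression_levels, hidden_w, hidden_b)
    return dense(hidden, out_w, out_b)[0]


# Hypothetical relative expression levels for three markers (input layer).
x = [0.031, 0.120, 0.008]
# Hypothetical weights: two hidden nodes, one output node.
hidden_w = [[0.5, -1.2, 2.0], [-0.7, 0.9, 0.3]]
hidden_b = [0.1, -0.2]
out_w = [[1.5, -0.8]]
out_b = [0.05]

score = mlp_forward(x, hidden_w, hidden_b, out_w, out_b)
print(f"predicted score: {score:.3f}")
```

In practice the weights would be fitted on a training set, and the number of hidden nodes would be chosen by comparing the accuracy of candidate models, as the text describes.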
- the present invention can help screen for colorectal cancer and colorectal polyps by using a blood sample, which is relatively easy to collect, and inputting the expression patterns of genetic markers expressed in blood into an artificial intelligence algorithm.
- the present invention can help screen for colorectal cancer and advanced adenoma by using a relatively easy-to-extract blood sample and substituting the expression patterns of genetic markers expressed in blood into an artificial intelligence algorithm.
- Figure 2 is the number of samples by group in which the experiment and analysis were performed
- Figure 3 shows the primer and probe nucleotide sequences prepared for detecting the genetic biomarkers
- Figure 9 shows the results of t-test statistical analysis for each group using all samples (selection of biomarkers for Model A production),
- Figure 11 shows the results of t-test statistical analysis by group using negative samples in Model A and B (selection of biomarkers for Model C production),
- Figure 15 is a schematic diagram of the final result of sequentially applying Models A, B, and C;
- Figure 16 shows the process of building an artificial intelligence algorithm-based classification model and verifying model performance
- Examples 1 to 5 are for screening colon cancer and colon polyps, and 18 gene markers [C-C motif chemokine receptor 1 (CCR1), Carboxylesterase 1 (CES1), Interleukin 1 beta (IL1B), Integrin alpha 2 (ITGA2), Lactotransferrin (LTF), Tumor necrosis factor superfamily 13b (TNFSF13B), Prostaglandin E synthase (PTGES), Inter-alpha-trypsin inhibitor heavy chain H4 (ITIH4), Taurine upregulated gene 1 (TUG1), Nucleoside diphosphate kinase 1 (NME1), Prostaglandin-endoperoxide synthase 2 (PTGS2), C-X-C motif chemokine 11 (CXCL11), Mitogen-activated protein kinase 6 (MAPK6), Glycerol kinase (GK), Keratin 19 (KRT19), Epithelial cell adhesion molecule (EpCAM), Melanoma
- Examples 6 to 10 are for diagnosis of colorectal cancer and its precancerous stage, and five new markers [ANKHD1-EIF4EBP3 Readthrough (ANKHD1-EIF4EBP3), G Protein-Coupled Receptor 15 (GPR15), Matrix Metallopeptidase 23B (MMP23B), Taste 2 Receptor Member 10 (TAS2R10), and Thymidylate Synthetase (TYMS)] were added to perform experiments on 23 genetic markers,
- the number of samples analyzed was 112 in the colorectal cancer group, 178 in the advanced adenoma group, 104 in the non-advanced adenoma group, and 203 in the control group.
- Examples 11 to 15 are for colorectal cancer and advanced adenoma screening, and 7 new markers [Forkhead box A2 (FOXA2), Marker of proliferation Ki-67 (MKi67), Erb-B2 Receptor Tyrosine Kinase 2 (ERBB2), Neuroplastin (NPTN), Snail family transcriptional repressor 2 (SNAI2), Telomerase reverse transcriptase (TERT) and Vimentin (VIM)] were added and experiments were performed on 30 genetic markers. The number of samples analyzed was 148 in the colorectal cancer group, 197 in the advanced adenoma group, and 143 in the control group.
- Circulating tumor cells may exist in the blood in colorectal cancer or advanced adenoma, a precursor of colorectal cancer, and 7 genes ( FOXA2, MKi67, MUC1, NPTN, SNAI2, TERT, VIM ) were used as targets, and the relative expression levels of the corresponding genes were compared by group using blood from normal, advanced adenoma, and colorectal cancer groups.
- the relative expression level (2^-ΔCq) of the target gene was calculated from the Cq value of the target gene relative to the Cq value of the GAPDH gene.
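As a minimal sketch of this calculation, assuming the usual convention ΔCq = Cq(target) - Cq(reference) with GAPDH as the reference gene, the relative expression level can be computed as follows. The function name and the Cq values are hypothetical and are not data from the study.

```python
def relative_expression(cq_target: float, cq_reference: float) -> float:
    """Return 2^-dCq, where dCq = Cq(target) - Cq(reference gene)."""
    delta_cq = cq_target - cq_reference
    return 2.0 ** (-delta_cq)


# Hypothetical Cq values for illustration only.
cq_ccr1 = 27.0   # target gene
cq_gapdh = 22.0  # reference gene (GAPDH)

level = relative_expression(cq_ccr1, cq_gapdh)
print(level)  # dCq = 5, so the level is 2^-5 = 0.03125
```

A target that crosses the threshold five cycles later than GAPDH is thus reported at about 3% of the GAPDH level, under the simplifying assumption of perfect doubling in every cycle.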
- the fold change value was obtained from the relative expression level ratio of the normal group to the colorectal cancer group, and the p-value of the difference between the two groups was obtained through Student's t-test analysis.
- the fold change value was calculated as the relative expression level ratio of the normal group to the advanced adenoma group, and the p-value of the difference between the two groups was obtained through Student's t-test analysis.
- the fold change value was calculated as the ratio of the relative expression level of the advanced adenoma group to the colorectal cancer group, and the p-value of the difference between the two groups was obtained through Student's t-test analysis.
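The group comparisons above can be sketched with the standard library alone. This sketch makes two assumptions that the text does not settle: the fold change is taken as mean(case) / mean(control), and the classic equal-variance (pooled) form of Student's t-test is used. It returns the t statistic and degrees of freedom; the p-value would then be read from the t distribution with a statistics package. All values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(case, control):
    """Ratio of mean relative expression, case over control (assumed direction)."""
    return mean(case) / mean(control)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical relative expression levels (2^-dCq) for one gene.
normal_group = [1.0, 2.0, 3.0]
cancer_group = [2.0, 4.0, 6.0]

fc = fold_change(cancer_group, normal_group)
t_stat, dof = student_t(cancer_group, normal_group)
print(f"fold change = {fc:.2f}, t = {t_stat:.3f}, df = {dof}")
```

Repeating this per gene and per pair of groups gives the kind of table the text refers to, with genes retained when the p-value is 0.05 or less.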
- the relative expression levels of FOXA2, MKi67, MUC1, NPTN, SNAI2, TERT, and VIM genes in the advanced adenoma group and colorectal cancer group compared to the normal group showed a statistically significant difference with a p-value of 0.05 or less.
- a statistically significant difference was confirmed with a p-value of 0.05 or less (Table 2).
- Table 2 is a comparison of differences in relative expression between groups. Accordingly, a total of 30 genes, consisting of the genes that were significant in distinguishing between colorectal cancer and advanced adenoma in the study (Examples 11-15) together with the above 7 genes, were used to construct a classification model for the purpose of screening for colorectal cancer and advanced adenoma.
- Table 3 is a list of primer and probe sequences for all 30 markers used in the present invention.
- Table 3 shows the base sequences of each primer and probe used in the present invention.
- high-risk groups including those with 3 or more low-risk adenomas, 1 or more high-risk adenomas, and those with carcinoma in situ
- Total RNA is isolated from a blood sample collected with a Tempus tube using the Tempus blood RNA isolation kit (Applied Biosystems®).
- Example 3 Construction of cDNA from isolated total RNA and real-time PCR
- ABSI thermocycler
- The real-time PCR reaction was performed on a CFX96 (Bio-Rad) under the following temperature conditions: 95°C for 3 minutes, followed by 40 cycles of 95°C for 3 seconds and 60°C for 30 seconds. Fluorescence was measured at every annealing step (60°C, 30 seconds) to record the increase in fluorescence with each cycle.
- The relative expression level (2^-ΔCq) of the target gene is calculated using the Cq value of the target gene.
- Example 5 Production of predictive model for diagnosis of colon cancer and colon polyps
- Genetic biomarkers to be entered into the diagnosis prediction model for colorectal cancer and colorectal polyps are selected, and the relative expression levels of the selected genetic biomarkers are entered to produce the diagnosis prediction model.
- The SPSS statistical analysis package was used, and the statistical analysis for producing the colorectal cancer and colorectal polyp diagnosis prediction models from the relative expression levels of the selected genetic biomarkers was performed using the R package.
- The colorectal cancer and colorectal polyp diagnosis prediction models were produced by decision tree (DT), logistic regression (LR), neural network (NN), and support vector machine (SVM) methods, but are not limited thereto.
- The artificial intelligence prediction model is created from the results of a training set composed of part of the total sample results. A validation set is then constructed from samples not included in the training set, and the accuracy of the model built with the training set is verified on the results of the validation set. Here, accuracy means how often the predictions of the prediction model are correct.
- A total of four types of models (DT, LR, NN, and SVM) were produced, and the training and validation procedure was repeated 1,000 times.
- The model with the highest accuracy on the validation set is reported as the final result.
- The model type with the highest sensitivity or specificity on the total set including all samples (491 in total) was selected.
- Model A is constructed by substituting the relative expression levels of a total of 8 corresponding genetic markers.
- an SVM model was selected that differentiates the colorectal cancer group from the high-risk/low-risk/normal group with a sensitivity of 92.9% and a specificity of 65.0%.
- Model A has high sensitivity for identifying the colorectal cancer group, but up to 40% of the remaining groups are also classified as colorectal cancer. Therefore, a further model that distinguishes the colorectal cancer group from the remaining groups must be built using the samples that are positive in Model A.
- Model B is produced by substituting the relative expression levels of a total of 9 corresponding genetic markers.
- an SVM model was selected that differentiates the colorectal cancer group from the high-risk/low-risk/normal group with a sensitivity of 94.9% and a specificity of 87.9%.
- Because models A and B include a high-risk group that requires colonoscopy, a model that distinguishes the high-risk group from the low-risk/normal group must be created.
- Model C is constructed by substituting the relative expression levels of a total of six corresponding gene markers.
- an SVM model was selected that distinguished the high-risk group from the low-risk group/normal group with a sensitivity of 91.3% and specificity of 81.9%.
- the biomarkers used in the models A, B and C are as follows.
- Table 4 shows the classification of subjects and the number of samples according to the results of colonoscopy.
- Total RNA is isolated from a blood sample collected with a Tempus tube using the Tempus blood RNA isolation kit (Applied Biosystems®).
- Example 8 cDNA construction and qPCR from isolated total RNA
- a thermocycler (Applied Biosystems)
- To THUNDERBIRD® Probe qPCR Mix (TOYOBO), forward/reverse primers, and 1 µL of probe (10 pmol/µL), add 2 µL of the synthesized cDNA and ultrapure water to a final volume of 20 µL, then mix.
- The qPCR reaction was performed on a CFX96 (Bio-Rad) under the following temperature conditions: 95°C for 3 minutes, followed by 40 cycles of 95°C for 3 seconds and 60°C for 30 seconds. Fluorescence was measured at every annealing step (60°C, 30 seconds) to record the increase in fluorescence with each cycle. A fixed fluorescence value was set as the threshold, and the Cq value, the cycle number at which the threshold is reached, was derived.
- The relative expression level (2^-ΔCq) of the target gene is calculated using the Cq value of the target gene.
- the list of genes targeted is as follows (Table 5).
- Table 5 is a list of target blood genetic markers
- Example 10 Establishment of a classification model for the purpose of screening for colorectal cancer and advanced adenoma by substituting the relative expression level of the target gene
- An artificial intelligence algorithm-based classification model was constructed using the H2O package (version 3.32.1.3) for the R statistical software (version 3.6.3).
- The colorectal cancer and advanced adenoma diagnosis prediction models were based on deep neural network (DNN), generalized linear model (GLM), and random forest (RF) algorithms, and additionally several types of models (GLM, RF, DNN, GBM, stacked ensemble (SE)) were built, but the models are not limited thereto.
- An artificial intelligence algorithm-based classification model that can distinguish the normal group from the colorectal cancer group and the advanced adenoma group was constructed, and the performance of the built model was evaluated using the test set.
- When building a model with the training set, 5-fold cross-validation was applied: the training set was divided into 5 folds so that the model could be trained and its performance verified on each fold, with the aim of building a high-performance model.
- The performance of the artificial intelligence classification model was judged by its AUROC and AUPRC values, which are representative performance indicators for classification models, on the training set and the test set. The best-performing model was selected based on its performance on the new test set, which was not used for model training.
- The AUROC and AUPRC values of the DNN, GLM, and RF models built with each algorithm and of the SE model built through AutoML are as follows (Table 6). On the test set, the SE model had the highest AUROC and AUPRC.
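| Model | Training set AUROC | Training set AUPRC | Test set AUROC | Test set AUPRC |
|---|---|---|---|---|
| DNN | 0.87 | 0.88 | 0.75 | 0.73 |
| GLM | 0.78 | 0.79 | 0.75 | 0.71 |
| RF | 0.80 | 0.79 | 0.76 | 0.76 |
| SE | 1.00 | 1.00 | 0.80 | 0.80 |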
- Table 6 shows AUROC and AUPRC performance indicators in the training set and test set. As a result, as shown in Table 7, the sensitivity for classifying the colorectal cancer group was 89.3% and the sensitivity for classifying the advanced adenoma group was 74.5%. The specificity to distinguish the control group was 72.0%.
- Table 7 shows the sensitivity and specificity results for each group of the SE model.
- Table 8 shows the classification of subjects and the number of specimens according to colonoscopy results.
- Total RNA is isolated from a blood sample collected with a Tempus tube using the Tempus blood RNA isolation kit (Applied Biosystems®).
- Example 13 cDNA construction and qPCR from isolated total RNA
- a thermocycler (Applied Biosystems)
- To THUNDERBIRD® Probe qPCR Mix (TOYOBO), forward/reverse primers, and 1 µL of probe (10 pmol/µL), add 2 µL of the synthesized cDNA and ultrapure water to a final volume of 20 µL, then mix.
- The qPCR reaction was performed on a CFX96 (Bio-Rad) under the following temperature conditions: 95°C for 3 minutes, followed by 40 cycles of 95°C for 3 seconds and 60°C for 30 seconds. Fluorescence was measured at every annealing step (60°C, 30 seconds) to record the increase in fluorescence with each cycle. A fixed fluorescence value was set as the threshold, and the Cq value, the cycle number at which the threshold is reached, was derived.
- The relative expression level (2^-ΔCq) of the target gene is calculated using the Cq value of the target gene.
- The relative expression ratio of the colorectal cancer group compared to the normal group, the relative expression ratio of the advanced adenoma group compared to the normal group, and the relative expression ratio of the colorectal cancer group compared to the advanced adenoma group were calculated and are shown in the table below (Table 9).
- Table 9 compares the relative expression of 30 genes between groups.
- Example 15 Establishment of a classification model for the purpose of screening colorectal cancer and advanced adenoma by substituting the relative expression level of target genes
- An artificial intelligence algorithm-based classification model was constructed using the H2O package (version 3.32.1.3) for the R statistical software (version 3.6.3).
- The colorectal cancer and advanced adenoma diagnosis prediction models were based on deep neural network (DNN), generalized linear model (GLM), gradient boosting machine (GBM), and random forest (RF) algorithms. In addition, an automated machine learning (AutoML) method, which builds several types of models (GLM, RF, DNN, GBM, stacked ensemble (SE)), was applied to build a model suited to the data, but the models are not limited thereto.
- An artificial intelligence algorithm-based classification model that can distinguish the normal group from the colorectal cancer group and the advanced adenoma group was constructed, and the performance of the built model was evaluated using the test set.
- FIG. 12: When building a model with the training set, 5-fold cross-validation was applied: the training set was divided into 5 folds so that the model could be trained and its performance verified on each fold, with the aim of building a high-performance model (FIG. 16).
- The performance of the artificial intelligence classification model was judged by its AUROC and AUPRC values, which are representative performance indicators for classification models, on the training set and the test set. The best-performing model was selected based on its performance on the new test set, which was not used for model training.
- The AUROC and AUPRC values of the DNN, GBM, GLM, and RF models built with each algorithm and of the SE model built through AutoML are as follows (Table 10). On the test set, the GBM model and the SE model built through AutoML had the highest AUROC and AUPRC.
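| Model | Training set AUROC | Training set AUPRC | Test set AUROC | Test set AUPRC |
|---|---|---|---|---|
| GLM | 0.93 | 0.98 | 0.92 | 0.98 |
| DNN | 0.95 | 0.98 | 0.91 | 0.97 |
| RF | 0.93 | 0.98 | 0.96 | 0.99 |
| GBM | 1.00 | 1.00 | 0.97 | 0.99 |
| SE | 1.00 | 1.00 | 0.97 | 0.99 |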
- Table 10 shows the AUROC and AUPRC indicators in the training and test sets for each model. The test set results for each group were then examined for the GBM model and the SE model.
- the sensitivity for distinguishing the colorectal cancer group was 94.6%
- the sensitivity for distinguishing the advanced adenoma group was 97.5%
- the specificity for distinguishing the normal group was 80.6% (Table 11).
- the sensitivity was 91.9%
- the sensitivity to distinguish the advanced adenoma group was 95.1%
- the specificity to distinguish the normal group was 80.6% (Table 12). Therefore, the GBM model showing higher sensitivity was finally selected.
- Table 11 shows the sensitivity and specificity results for each group of the GBM model.
- Table 12 shows the sensitivity and specificity results for each group of the SE model.
- In colorectal cancer or advanced adenoma, a precursor of colorectal cancer, circulating tumor cells may be present in the blood. Ten genes (EpCAM, ERBB2, FOXA2, KRT19, MCAM, MKi67, NPTN, SNAI2, TERT, VIM) were therefore used as targets, their relative expression levels were calculated by group, and an artificial intelligence algorithm-based model was constructed to distinguish colorectal cancer or advanced adenoma from the normal group.
- Table 13 shows the classification of subjects and the number of samples according to the results of colonoscopy.
- Total RNA is isolated from a blood sample collected with a Tempus tube using the Tempus blood RNA isolation kit (Applied Biosystems®).
- a thermocycler (Applied Biosystems)
- To THUNDERBIRD® Probe qPCR Mix (TOYOBO), forward/reverse primers, and 1 µL of probe (10 pmol/µL), add 2 µL of the synthesized cDNA and ultrapure water to a final volume of 20 µL, then mix.
- The qPCR reaction was performed on a CFX96 (Bio-Rad) under the following temperature conditions: 95°C for 3 minutes, followed by 40 cycles of 95°C for 3 seconds and 60°C for 30 seconds. Fluorescence was measured at every annealing step (60°C, 30 seconds) to record the increase in fluorescence with each cycle. A fixed fluorescence value was set as the threshold, and the Cq value, the cycle number at which the threshold is reached, was derived.
- The relative expression level (2^-ΔCq) of the target gene is calculated using the Cq value of the target gene.
- the list of genes targeted is as follows (Table 14).
- An artificial intelligence algorithm-based classification model was constructed using the H2O package (version 3.32.1.3) for the R statistical software (version 3.6.3).
- The colorectal cancer and advanced adenoma diagnosis prediction models were based on deep neural network (DNN), generalized linear model (GLM), gradient boosting machine (GBM), and random forest (RF) algorithms. In addition, an automated machine learning (AutoML) method, which builds several types of models (GLM, RF, DNN, GBM, stacked ensemble (SE)), was applied to build a model suited to the data, but the models are not limited thereto.
- An artificial intelligence algorithm-based classification model that can distinguish the normal group from the colorectal cancer group and the advanced adenoma group was constructed, and the performance of the built model was evaluated using the test set.
- When building a model with the training set, 5-fold cross-validation was applied: the training set was divided into 5 folds so that the model could be trained and its performance verified on each fold, with the aim of building a high-performance model.
- The performance of the artificial intelligence classification model was judged by its AUROC and AUPRC values, which are representative performance indicators for classification models, on the training set and the test set. The best-performing model was selected based on its performance on the new test set, which was not used for model training.
- The AUROC and AUPRC values of the GLM, DNN, RF, and GBM models built with each algorithm and of the GBM model built through AutoML are as follows (Table 15). On the test set, the GBM model had the highest AUROC and AUPRC.
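| Model | Training set AUROC | Training set AUPRC | Test set AUROC | Test set AUPRC |
|---|---|---|---|---|
| GLM | 0.91 | 0.96 | 0.86 | 0.96 |
| DNN | 0.99 | 1.00 | 0.92 | 0.97 |
| RF | 0.90 | 0.96 | 0.94 | 0.98 |
| GBM | 1.00 | 1.00 | 0.94 | 0.98 |
| GBM (AutoML) | 0.98 | 0.99 | 0.91 | 0.97 |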
- Table 15 shows AUROC and AUPRC performance indicators in the training set and test set.
- the sensitivity to distinguish the colorectal cancer group was 78.4%
- the sensitivity to distinguish the advanced adenoma group was 88.9%
- the specificity to distinguish the normal group was 80.6%.
- Table 16 shows the sensitivity and specificity results for each group of the GBM model.
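The relative expression level that recurs throughout the examples above can be sketched in a few lines of Python. This is only an illustration, assuming ΔCq = Cq(target) - Cq(GAPDH) as the description implies. The function names and the Cq values are hypothetical, and the direction of the fold-change ratio (case group over control group, using group means) is an assumption rather than something the text states.

```python
from statistics import mean


def relative_expression(cq_target: float, cq_reference: float) -> float:
    """Return 2^-dCq, where dCq = Cq(target) - Cq(reference gene, e.g. GAPDH)."""
    delta_cq = cq_target - cq_reference
    return 2.0 ** (-delta_cq)


def fold_change(case_levels: list[float], control_levels: list[float]) -> float:
    """Ratio of mean relative expression, case over control (assumed direction)."""
    return mean(case_levels) / mean(control_levels)


# Hypothetical (target Cq, GAPDH Cq) pairs; these are not data from the patent.
cancer = [relative_expression(t, g) for t, g in [(22.0, 20.0), (23.0, 20.0)]]
normal = [relative_expression(t, g) for t, g in [(24.0, 20.0), (25.0, 20.0)]]

print(cancer)                       # [0.25, 0.125]
print(fold_change(cancer, normal))  # 4.0
```

A lower Cq for the target relative to GAPDH gives a larger 2^-ΔCq, so in this toy data the cancer group shows four times the mean relative expression of the normal group.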
Abstract
Description
The present invention relates to a method for screening colorectal cancer and colorectal polyps or advanced adenomas, and to a kit used for the method.
Colorectal cancer is a malignant tumor that arises in the colon and rectum, which make up the large intestine. Most cases are adenocarcinomas arising from the mucosal cells of the large intestine and are known to originate from adenomatous polyps.
According to Globocan data, as of 2020 colorectal cancer accounted for 1.9 million new cases worldwide, making it the third most commonly diagnosed cancer, and 935,000 people died of it, making it a major cancer that ranks third in cancer mortality. The 5-year relative survival rate of colorectal cancer falls markedly as the cancer progresses: it reaches 90% at Stage I but drops sharply to 14% at Stage IV. However, because Stage I is mostly asymptomatic, only 37% of cases are diagnosed at Stage I. Early diagnosis through regular screening is therefore important for improving colorectal cancer survival.
Most colorectal cancers develop through a long-term adenoma-carcinoma sequence. The adenoma-carcinoma sequence is the process by which normal epithelial cells of the large intestine progress through non-advanced adenoma and advanced adenoma to colorectal cancer. An advanced adenoma is an adenoma with a high likelihood of developing into colorectal cancer: one that is 10 mm or larger, contains 25% or more villous component, shows high-grade dysplasia, or occurs where three or more adenomas are present. A 13-year follow-up study found that subjects with advanced adenomas were 2.7 times more likely to develop colorectal cancer and 2.6 times more likely to die of it than controls and subjects with non-advanced adenomas. Detecting and removing lesions at the advanced adenoma stage is therefore important for lowering the incidence of colorectal cancer.
In most countries, colonoscopy and the fecal occult blood test are used to diagnose colorectal cancer. Colonoscopy can examine the entire large intestine in a single session and can remove adenomas or some early cancers found during the examination. A meta-analysis has reported that colonoscopy reduces both the incidence and the mortality of colorectal cancer by about 70%. Colonoscopy requires bowel preparation beforehand, and the quality of bowel preparation strongly affects the accuracy and quality of the examination. In addition, bowel perforation, a possible complication, occurs at a frequency of about 0.09%. As a regular colorectal cancer screening test for a large population, colonoscopy therefore has drawbacks: the number of physicians able to perform it is limited, compliance may be poor because of pain during the examination and the discomfort of preparation, and complications can occur relatively often.
The fecal occult blood test diagnoses disease by detecting, in the stool, blood that seeps from a mass in the large intestine. It is non-invasive, has no particular side effects, and is relatively easy and inexpensive to perform. People who undergo fecal occult blood testing show about 10-40% lower mortality than those who do not.
According to a meta-analysis, the sensitivity of the fecal occult blood test for colorectal cancer and advanced adenoma is 56-74% and 23-31%, respectively, and its specificity is 90-95%. Because bleeding from colorectal tumors is often intermittent, the accuracy of the test can vary with whether the specimen was collected properly, and compliance may be low because the test uses a stool specimen.
It is therefore necessary to develop a molecular diagnostic test that uses a blood specimen, which carries less risk during testing than colonoscopy and is easy to collect, and that has sensitivity and specificity high enough to be useful as a screening test.
[Prior Patent Literature]
US Patent Publication No. 20180238893
The present invention was devised to solve the above problems and to meet the above need. An object of the present invention is to provide an information provision method for developing a molecular diagnostic test for colorectal cancer and advanced adenoma or colorectal polyps with high sensitivity and specificity, using reverse transcription quantitative polymerase chain reaction on a blood specimen, which is relatively easy to collect, and an artificial intelligence prediction model built from the results.
Another object of the present invention is to provide a composition for molecular diagnostic testing of colorectal cancer and colorectal polyps or advanced adenomas with high sensitivity and specificity, using reverse transcription quantitative polymerase chain reaction on a blood specimen, which is relatively easy to collect, and an artificial intelligence prediction model built from the results.
To achieve the above objects, the present invention provides a composition capable of distinguishing the colorectal cancer group from the colorectal cancer high-risk, low-risk, and normal groups, comprising as an active ingredient primers or probes used to measure the relative expression levels of the IL1B, LTF, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, and MCAM genes.
In one embodiment of the present invention,
the primers and probes preferably consist of the sequences set forth in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58, but are not limited thereto.
In another embodiment of the present invention,
the normal group preferably refers to cases with no lesions in the large intestine on colonoscopy, the low-risk group to cases with fewer than three low-risk adenomas, and the high-risk group to cases found on colonoscopy to have three or more low-risk adenomas, one or more high-risk adenomas, or carcinoma in situ, but the groups are not limited thereto.
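The group definitions above amount to a small decision rule, which can be sketched as follows. This is a minimal illustration, not the applicants' procedure: the function name and argument names are hypothetical, and it assumes that adenoma counts and a carcinoma-in-situ flag are the only colonoscopy findings considered.

```python
def assign_group(n_low_risk: int, n_high_risk: int, carcinoma_in_situ: bool = False) -> str:
    """Map colonoscopy findings to the normal / low-risk / high-risk groups."""
    # High-risk: 3 or more low-risk adenomas, 1 or more high-risk adenomas,
    # or carcinoma in situ.
    if carcinoma_in_situ or n_high_risk >= 1 or n_low_risk >= 3:
        return "high-risk"
    # Low-risk: fewer than 3 low-risk adenomas (at least one present).
    if n_low_risk >= 1:
        return "low-risk"
    # Normal: no lesions found (assumed to mean no adenomas of either kind).
    return "normal"


print(assign_group(0, 0))        # normal
print(assign_group(2, 0))        # low-risk
print(assign_group(3, 0))        # high-risk
print(assign_group(0, 0, True))  # high-risk
```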
The present invention also provides a composition capable of distinguishing the colorectal cancer group from the colorectal cancer high-risk, low-risk, and normal groups, comprising as an active ingredient primers or probes used to measure the relative expression levels of the CES1, IL1B, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, and MCAM genes.
In one embodiment of the present invention,
the primers and probes preferably consist of the sequences set forth in SEQ ID NOs: 4 to 6, SEQ ID NOs: 7 to 9, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 31 to 33, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58, but are not limited thereto.
The present invention also provides a composition capable of distinguishing the colorectal cancer high-risk group from the low-risk and normal groups, comprising as an active ingredient primers or probes used to measure the relative expression levels of the IL1B, LTF, TNFSF13B, ITIH4, CXCL11, and MAPK6 genes.
In one embodiment of the present invention,
the primers and probes preferably consist of the sequences set forth in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, and SEQ ID NOs: 40 to 42, but are not limited thereto.
The present invention also provides a method for screening colorectal cancer and colorectal polyps, comprising: a) measuring, by polymerase chain reaction on a sample using primers and probes, the relative expression levels of the IL1B, LTF, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, and MCAM genes;
b) for samples found positive for colorectal cancer in a), measuring, by polymerase chain reaction using primers and probes, the relative expression levels of the CES1, IL1B, TNFSF13B, ITIH4, PTGS2, CXCL11, MAPK6, GK, and MCAM genes, and judging the sample positive for colorectal cancer if the result is positive; and
c) for samples judged negative in a) and b), measuring, by polymerase chain reaction using primers and probes, the relative expression levels of the IL1B, LTF, TNFSF13B, ITIH4, CXCL11, and MAPK6 genes, and judging the sample positive for colorectal cancer if the result is positive.
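The flow of steps a) to c) can be sketched as a three-stage cascade. The sketch below is illustrative only. The `screen` and `threshold_model` names are hypothetical, each model is reduced to a toy single-gene cutoff standing in for a trained classifier, and it assumes that "negative in a) and b)" covers both samples negative in a) and samples positive in a) but negative in b).

```python
from typing import Callable, Mapping

Expression = Mapping[str, float]      # gene symbol -> relative expression (2^-dCq)
Model = Callable[[Expression], bool]  # True means the model calls the sample positive


def screen(sample: Expression, model_a: Model, model_b: Model, model_c: Model) -> str:
    # Steps a) and b): the step b) model is consulted only for samples positive in a).
    if model_a(sample) and model_b(sample):
        return "positive (step b)"
    # Step c): every sample not confirmed in b) goes to the third model (assumed reading).
    if model_c(sample):
        return "positive (step c)"
    return "negative"


def threshold_model(gene: str, cutoff: float) -> Model:
    """Toy stand-in for a trained classifier: positive when one gene exceeds a cutoff."""
    return lambda sample: sample[gene] > cutoff


# Hypothetical cutoffs; the genes are taken from the marker lists in steps a) to c).
model_a = threshold_model("IL1B", 0.10)
model_b = threshold_model("CES1", 0.20)
model_c = threshold_model("LTF", 0.30)

print(screen({"IL1B": 0.5, "CES1": 0.4, "LTF": 0.1}, model_a, model_b, model_c))
# positive (step b)
```

Running the second model only on first-stage positives mirrors the motivation of step b): the first model is sensitive but not very specific, so its positives are re-examined before a sample is called positive.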
In one embodiment of the present invention, the primers and probes used in a) consist of the sequences set forth in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58,
the primers and probes used in b) consist of the sequences set forth in SEQ ID NOs: 4 to 6, SEQ ID NOs: 7 to 9, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 31 to 33, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58, and
the primers and probes used in c) preferably consist of the sequences set forth in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, and SEQ ID NOs: 40 to 42, but are not limited thereto.
또한 본 발명은 a) IL1B, LTF, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, 및 MCAM 유전자의 상대적 발현량을 측정하는데 관여하는 프라이머 또는 프로브 세트, In addition, the present invention provides a) primers or probe sets involved in measuring the relative expression levels of IL1B, LTF, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, and MCAM genes,
b) CES1, IL1B, TNFSF13B, ITIH4, PTGS2, CXCL11, MAPK6, GK, 및 MCAM 유전자의 상대적 발현량을 측정하는데 관여하는 프라이머 또는 프로브 세트, 및 b) primers or probe sets involved in measuring the relative expression levels of CES1, IL1B, TNFSF13B, ITIH4, PTGS2, CXCL11, MAPK6, GK, and MCAM genes, and
c) IL1B, LTF, TNFSF13B, ITIH4, CXCL11, 및 MAPK6 유전자의 상대적 발현량을 측정하는데 관여하는 프라이머 또는 프로브 세트를 포함하는 대장암 및 대장용종 선별 키트를 제공한다.c) A kit for screening for colorectal cancer and colorectal polyps including primers or probe sets involved in measuring the relative expression levels of IL1B, LTF, TNFSF13B, ITIH4, CXCL11, and MAPK6 genes is provided.
In one embodiment of the present invention, the primers and probes used in a) consist of the sequences set forth in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58;
the primers and probes used in b) consist of the sequences set forth in SEQ ID NOs: 4 to 6, SEQ ID NOs: 7 to 9, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 31 to 33, SEQ ID NOs: 34 to 36, SEQ ID NOs: 40 to 42, SEQ ID NOs: 43 to 45, and SEQ ID NOs: 56 to 58; and
the primers and probes used in c) preferably consist of the sequences set forth in SEQ ID NOs: 7 to 9, SEQ ID NOs: 13 to 15, SEQ ID NOs: 16 to 18, SEQ ID NOs: 22 to 24, SEQ ID NOs: 34 to 36, and SEQ ID NOs: 40 to 42, but are not limited thereto.
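The three kit components can be written down as plain data, which makes it easy to check that each panel lists as many SEQ ID NO ranges as genes. The following Python sketch is illustrative only: the names `PANELS` and `shared_genes` are hypothetical, each range is assumed to be one primer/probe group, and because this passage does not state which range belongs to which gene, the two lists are kept separate rather than paired.

```python
# Gene panels and SEQ ID NO ranges as listed for kit components a), b), and c).
# Assumption: each (start, end) range is one primer/probe group for one gene.
# The gene-to-range pairing is NOT given in this passage, so none is implied here.
PANELS = {
    "a": {
        "genes": ["IL1B", "LTF", "TNFSF13B", "ITIH4", "CXCL11", "MAPK6", "GK", "MCAM"],
        "seq_ids": [(7, 9), (13, 15), (16, 18), (22, 24), (34, 36), (40, 42), (43, 45), (56, 58)],
    },
    "b": {
        "genes": ["CES1", "IL1B", "TNFSF13B", "ITIH4", "PTGS2", "CXCL11", "MAPK6", "GK", "MCAM"],
        "seq_ids": [(4, 6), (7, 9), (16, 18), (22, 24), (31, 33), (34, 36), (40, 42), (43, 45), (56, 58)],
    },
    "c": {
        "genes": ["IL1B", "LTF", "TNFSF13B", "ITIH4", "CXCL11", "MAPK6"],
        "seq_ids": [(7, 9), (13, 15), (16, 18), (22, 24), (34, 36), (40, 42)],
    },
}


def shared_genes():
    """Return the genes measured by all three panels."""
    gene_sets = [set(panel["genes"]) for panel in PANELS.values()]
    return set.intersection(*gene_sets)


def counts_match(name):
    """Check that a panel lists one SEQ ID NO range per gene."""
    panel = PANELS[name]
    return len(panel["genes"]) == len(panel["seq_ids"])
```

Keeping the panels as data rather than prose lets a reader verify the listings mechanically before any assay design is attempted.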
The present invention also provides a composition for screening for colorectal cancer and advanced adenoma, comprising, as an active ingredient, primers or probes for measuring the relative expression levels of the CCR1, CES1, IL1B, ITGA2, LTF, TNFSF13B, PTGES, ITIH4, TUG1, NME1, PTGS2, CXCL11, MAPK6, GK, KRT19, EpCAM, MCAM, PPARG, ANKHD1-EIF4EBP3, GPR15, MMP23B, TAS2R10, TYMS, FOXA2, MKi67, ERBB2, NPTN, SNAI2, TERT, and VIM genes.
In one embodiment of the present invention,
the primers and probes preferably consist of the sequences set forth in SEQ ID NOs: 1 to 91, but any variant sequence that achieves the effect of the present invention through one or more substitutions, deletions, additions, or the like to those sequences also falls within the scope of the present invention.
The present invention also provides a kit for screening for colorectal cancer and advanced adenoma, comprising, as an active ingredient, primers or probes for measuring the relative expression levels of the CCR1, CES1, IL1B, ITGA2, LTF, TNFSF13B, PTGES, ITIH4, TUG1, NME1, PTGS2, CXCL11, MAPK6, GK, KRT19, EpCAM, MCAM, PPARG, ANKHD1-EIF4EBP3, GPR15, MMP23B, TAS2R10, TYMS, FOXA2, MKi67, ERBB2, NPTN, SNAI2, TERT, and VIM genes.
In one embodiment of the present invention,
the primers and probes preferably consist of the sequences set forth in SEQ ID NOs: 1 to 91, but any variant sequence that achieves the effect of the present invention through one or more substitutions, deletions, additions, or the like to those sequences also falls within the scope of the present invention.
The present invention also provides a method of providing information for the prediction or diagnosis of colorectal cancer or advanced adenoma, the method comprising measuring, in a sample, the relative expression levels of the CCR1, CES1, IL1B, ITGA2, LTF, TNFSF13B, PTGES, ITIH4, TUG1, NME1, PTGS2, CXCL11, MAPK6, GK, KRT19, EpCAM, MCAM, PPARG, ANKHD1-EIF4EBP3, GPR15, MMP23B, TAS2R10, TYMS, FOXA2, MKi67, ERBB2, NPTN, SNAI2, TERT, and VIM genes by polymerase chain reaction using primers and probes.
In one embodiment of the present invention, the expression levels are measured using primers and probes, and the primers and probes preferably consist of the sequences set forth in SEQ ID NOs: 1 to 91, but any variant sequence that achieves the effect of the present invention through one or more substitutions, deletions, additions, or the like to those sequences also falls within the scope of the present invention.
In another embodiment of the present invention, the subject is determined to have colorectal cancer when one or more of the following 1) to 3) is confirmed:
1) a change in expression level is observed when the levels of the CCR1, CES1, GK, IL1B, KRT19, LTF, PPARG, PTGES, PTGS2, TAS2R10, TNFSF13B, and TYMS genes, or of the proteins encoded by those genes, are compared with the levels of the corresponding genes or encoded proteins in a sample from a normal control subject;
2) a change in expression level is observed when the levels of the ANKHD1-EIF4EBP3, ITIH4, MCAM, PPARG, TAS2R10, TNFSF13B, TUG1, and TYMS genes, or of the proteins encoded by those genes, are compared with the levels of the corresponding genes or encoded proteins in a sample from a subject with non-advanced adenoma; and
3) a change in expression level is observed when the levels of the ANKHD1-EIF4EBP3, CCR1, MCAM, MMP23B, TAS2R10, TNFSF13B, TUG1, and TYMS genes, or of the proteins encoded by those genes, are compared with the levels of the corresponding genes or encoded proteins in a sample from a subject with advanced adenoma.
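The "one or more of 1) to 3)" rule above is a logical OR over three comparisons. A minimal Python sketch of that rule follows. The names `COMPARISONS` and `has_colorectal_cancer` are hypothetical, and how a "change in expression level" is established for each comparison (threshold, statistical test, all genes versus any gene) is not specified in this passage, so the sketch takes each comparison's outcome as a boolean supplied by the caller.

```python
# Gene sets for the three reference comparisons, as listed in items 1) to 3).
COMPARISONS = {
    "vs_normal_control": [
        "CCR1", "CES1", "GK", "IL1B", "KRT19", "LTF",
        "PPARG", "PTGES", "PTGS2", "TAS2R10", "TNFSF13B", "TYMS",
    ],
    "vs_non_advanced_adenoma": [
        "ANKHD1-EIF4EBP3", "ITIH4", "MCAM", "PPARG",
        "TAS2R10", "TNFSF13B", "TUG1", "TYMS",
    ],
    "vs_advanced_adenoma": [
        "ANKHD1-EIF4EBP3", "CCR1", "MCAM", "MMP23B",
        "TAS2R10", "TNFSF13B", "TUG1", "TYMS",
    ],
}


def has_colorectal_cancer(findings):
    """Apply the 'one or more of 1) to 3)' rule.

    `findings` maps a comparison name to True when an expression-level change
    was confirmed for that comparison. The criterion for 'change' is left to
    the caller because the text does not define it. Missing keys count as False.
    """
    unknown = set(findings) - set(COMPARISONS)
    if unknown:
        raise ValueError(f"unknown comparison(s): {sorted(unknown)}")
    return any(findings.get(name, False) for name in COMPARISONS)
```

For example, `has_colorectal_cancer({"vs_advanced_adenoma": True})` evaluates to `True`, since a single confirmed comparison is enough under the stated rule.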
In yet another embodiment of the present invention, the pathological characteristics of the subject having colorectal cancer are further determined by confirming the following 1) or 2):
1) the subject is determined to have high TNM stage colorectal cancer when a change in expression level is observed on comparing the levels of the CXCL11 and PTGS2 genes, or of the proteins encoded by those genes, with the levels of the corresponding genes or encoded proteins in a sample from a subject with low TNM stage colorectal cancer; or
2) the subject is determined to have moderately or poorly differentiated colorectal cancer when a change in expression level is observed on comparing the level of the EpCAM gene, or of the protein encoded by that gene, with the level of the corresponding gene or encoded protein in a sample from a subject with well-differentiated colorectal cancer.
The present invention also provides a combination of blood gene markers (Table 1), composed of the following gene groups, for screening for colorectal cancer and advanced adenoma.
The present invention also provides an artificial-intelligence-algorithm-based classification model for colorectal cancer and advanced adenoma screening, built by inputting the expression levels of the 30 markers.
The present invention is described below.
The present invention provides primer and probe sequences for determining the relative expression levels of the relevant biomarkers in blood.
The present invention also provides an artificial intelligence prediction model for screening for colorectal cancer, advanced adenoma, and/or colorectal polyps, built by inputting the expression levels of the 18 markers.
Commonly used methods for isolating total RNA and for synthesizing cDNA from it can be carried out by known procedures. Detailed descriptions are given in Joseph Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001), and in Noonan, K.F. et al., which may be incorporated herein by reference.
The primers of the present invention can be chemically synthesized using the phosphoramidite solid support method or other well-known methods. Such nucleic acid sequences can also be modified by many means known in the art.
Non-limiting examples of such modifications include methylation, "capping", substitution of one or more natural nucleotides with an analog, and internucleotide modifications, for example, modification to uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.) or to charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.). A nucleic acid may contain one or more additional covalently linked moieties, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), intercalators (e.g., acridine, psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, oxidative metals, etc.), and alkylating agents.
The nucleic acid sequences of the present invention may also be modified with a label capable of providing a detectable signal, directly or indirectly. Examples of labels include radioisotopes, fluorescent molecules, and biotin.
In the method of the present invention, the amplified target sequences (the CCR1 and GAPDH genes, etc.) may be labeled with a detectable labeling substance. In one embodiment, the labeling substance may be a fluorescent, phosphorescent, chemiluminescent, or radioactive substance, but is not limited thereto. Preferably, the labeling substance may be fluorescein, phycoerythrin, rhodamine, lissamine, Cy-5, or Cy-3. When the target sequence is amplified by RT-PCR with a primer labeled with Cy-5 or Cy-3 at its 5'-end and/or 3'-end, the target sequence is labeled with a detectable fluorescent label.
For labeling with a radioactive substance, a radioisotope such as ³²P or ³⁵S is added to the PCR reaction mixture during RT-PCR; the radioisotope is incorporated into the amplification product as it is synthesized, so that the product is radiolabeled. One or more sets of the oligonucleotide primers used to amplify the target sequence may be used.
Labeling can be performed by various methods commonly practiced in the art, such as nick translation, random priming (Multiprime DNA labelling systems booklet, "Amersham" (1989)), and kination (Maxam & Gilbert, Methods in Enzymology, 65:499 (1986)). The label provides a signal detectable by fluorescence, radioactivity, colorimetry, gravimetry, X-ray diffraction or absorption, magnetism, enzymatic activity, mass analysis, binding affinity, hybridization radiofrequency, or nanocrystals.
According to one aspect of the present invention, expression is measured at the mRNA level by RT-PCR. This requires novel primer pairs that bind specifically to the CCR1 and GAPDH genes and the other target genes, together with fluorescently labeled probes. The primers and probes specified by particular nucleotide sequences in the present invention may be used, but the invention is not limited to them: any primers and probes that bind specifically to these genes and provide a detectable signal so that RT-PCR can be performed may be used without limitation. In the above, FAM and Quen (quencher) denote the fluorescent dye and the quencher, respectively.
The RT-PCR method applied in the present invention can be carried out by known procedures commonly used in the art.
The mRNA expression level may be measured by any conventional method capable of measuring mRNA expression, without limitation. Depending on the type of probe label used, the measurement may be performed by radioactivity, fluorescence, or phosphorescence measurement, but is not limited thereto.
As one method of detecting the amplification product, fluorescence measurement works as follows: the 5'-end of the primer is labeled with Cy-5 or Cy-3 and real-time RT-PCR is performed, so that the target sequence is labeled with a detectable fluorescent label, and the resulting fluorescence is measured with a fluorometer.
In radioactivity measurement, a radioisotope such as ³²P or ³⁵S is added to the PCR reaction mixture during RT-PCR to label the amplification product, and the radioactivity is then measured with a radiation measuring instrument, for example, a Geiger counter or a liquid scintillation counter.
According to a preferred embodiment of the present invention, a fluorescently labeled probe binds to the PCR product amplified by RT-PCR and emits fluorescence at a specific wavelength. During amplification, the fluorometer of the PCR instrument measures the mRNA expression levels of the genes of the present invention in real time, and the measured values are calculated and visualized on a PC, so that the operator can readily check the expression level.
According to another aspect of the present invention, the screening kit may be a kit for diagnosing colorectal cancer and colorectal polyps that includes the essential components required to perform a reverse transcription polymerase reaction. The reverse transcription polymerase reaction kit may include a primer pair specific for each of the genes of the present invention. Each primer is a nucleotide having a sequence specific to the nucleic acid sequence of a marker gene and may be about 7 bp to 50 bp long, more preferably about 10 bp to 30 bp long.
In addition, the reverse transcription polymerase reaction kit may include test tubes or other suitable containers, reaction buffers (of varying pH and magnesium concentration), deoxynucleotides (dNTPs), enzymes such as Taq polymerase and reverse transcriptase, DNase, RNase inhibitors, DEPC-treated water (DEPC-water), sterile water, and the like.
The kit of the present invention may further include a user guide describing the optimal reaction conditions.
The guide is printed material that explains how to use the kit, for example, how to prepare the buffers and which reaction conditions are recommended.
The guide may take the form of a pamphlet or leaflet, a label attached to the kit, or instructions printed on the surface of the package containing the kit. The guide may also include information published or provided through an electronic medium such as the Internet.
In the present invention, the term "method of providing information for diagnosing colorectal cancer and colorectal polyps" refers to a preliminary step toward diagnosis that provides the objective basic information needed to diagnose cancer; it excludes the clinical judgment or opinion of a physician.
In the present invention, the term "method of providing information for screening for colorectal cancer and advanced adenoma" refers to a preliminary step toward diagnosis that provides the objective basic information needed to diagnose cancer; it excludes the clinical judgment or opinion of a physician.
The term "primer" refers to a short nucleic acid sequence that has a free 3'-terminal hydroxyl group, can form base pairs with a complementary template, and serves as the starting point for copying the template strand. A primer can initiate DNA synthesis in the presence of reagents for polymerization (i.e., DNA polymerase or reverse transcriptase) and the four different nucleoside triphosphates, in an appropriate buffer and at an appropriate temperature. The primers of the present invention are sense and antisense nucleic acids of 7 to 50 nucleotides that are specific to each marker gene. A primer may incorporate additional features that do not change its basic property of acting as the initiation point of DNA synthesis.
The term "probe" refers to a single-stranded nucleic acid molecule that contains a sequence complementary to a target nucleic acid sequence.
The term "real-time reverse transcription polymerase chain reaction (real-time RT-PCR)" refers to a molecular biological amplification method in which RNA is reverse-transcribed into complementary DNA (cDNA) using reverse transcriptase, the resulting cDNA is used as a template to amplify the target with target primers and a labeled target probe, and the signal generated by the label of the target probe on the amplified target is detected quantitatively at the same time.
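This passage does not state how a "relative expression level" is computed from real-time RT-PCR readings. One widely used convention is the 2^-ΔΔCt method, in which the target gene's threshold cycle (Ct) is normalized to a reference gene and then to a control sample. The sketch below assumes that convention and assumes GAPDH serves as the reference gene because it is named alongside the targets; the function and parameter names are illustrative, not the patent's.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt convention (an assumption).

    Each argument is a threshold cycle (Ct) value from real-time RT-PCR.
    `ref` is the reference gene (assumed here to be GAPDH) and `control`
    is the comparison sample, such as a normal control.
    """
    # Normalize the target to the reference gene within each sample.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values and convert cycles to a fold change.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# A target that crosses threshold 2 cycles earlier in the sample than in the
# control, with identical reference Ct values, is about 4-fold higher.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

The formula assumes roughly 100% amplification efficiency for both target and reference; assays with different efficiencies would need an efficiency-corrected variant.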
For diagnosing or predicting colorectal cancer and colorectal polyps according to the present invention, a data mining method that learns from data to diagnose colorectal cancer and colorectal polyps may be used, and performance can be improved effectively through AI analysis in particular. Accordingly, the method of diagnosing or predicting colorectal cancer and colorectal polyps of the present invention preferably uses a method capable of measuring the relative expression levels of the diagnostic markers and/or an AI analysis method.
When AI analysis is used for the colorectal cancer and colorectal polyp prediction model of the present invention, a variety of interpretable models can be used without limitation, including linear regression, logistic regression, neural network analysis, decision trees, decision rules, RuleFit, and support vector machines. In a preferred embodiment of the present invention, logistic regression, decision trees, neural network analysis, and support vector machines in particular were used.
The prediction model of the present invention may include a colorectal cancer and colorectal polyp diagnosis unit, a classification unit, and a weighting unit. The diagnosis unit takes as input the relative expression level information of the patient's disease-related gene markers received by a receiving unit; the classification unit classifies colorectal cancer and colorectal polyps using a neural network as the classifier; and the weighting unit assigns weights to the classification results so that colorectal cancer and colorectal polyps can be screened.
Neural network analysis according to embodiments of the present invention refers to a system that builds one or more layers and makes a decision based on multiple data inputs. For example, the input layer feeds the relative expression level information of the gene markers into the neural network model as data, and the output layer produces a result, based on the input information, that indicates whether the subject is a colorectal cancer or colorectal polyp patient. The hidden layer carries out the process of deciding patient status by assigning weights to the various decision criteria (gene mutation information).
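The roles of the input, hidden, and output layers described above can be illustrated with a forward pass through a network with one hidden layer. This is a minimal pure-Python sketch, not the patent's trained model: the sigmoid activation, the layer sizes, and all weight values are assumptions chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic activation mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer network.

    inputs         : relative expression levels, one value per gene marker
    hidden_weights : one weight row per hidden node (len(row) == len(inputs))
    hidden_biases  : one bias per hidden node
    output_weights : one weight per hidden node
    Returns a score in (0, 1) that could be read as the probability of a
    positive call under the assumed sigmoid output.
    """
    # Hidden layer: each node weights every input marker.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # Output layer: combine hidden activations into a single score.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Illustrative call: two markers, two hidden nodes, made-up weights.
score = mlp_forward(
    [1.2, 0.8],
    [[0.5, -0.3], [0.1, 0.9]],
    [0.0, -0.2],
    [1.0, -1.0],
    0.1,
)
```

In practice the weights would be learned from labeled samples; here they are fixed so that the data flow through the layers is easy to follow.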
The method of predicting colorectal cancer and colorectal polyps using an AI analysis technique according to an embodiment of the present invention uses a multilayer perceptron (MLP) neural network to fit a neural network model with a given number of hidden nodes. Among the several neural network models built through various transformations of the input and output variables, the model with the highest estimated accuracy is chosen as the final neural network model for predicting colorectal disease. The AI analysis may consist of an input layer, hidden layers, and an output layer, and the resulting neural network model may have any number of hidden nodes in any number of hidden layers.
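Choosing the final model as the candidate with the highest estimated accuracy can be sketched as follows. The candidates here are toy threshold rules standing in for trained networks, and the sample values, labels, and names (`accuracy`, `select_best`, `candidates`) are all hypothetical. In a real workflow the accuracy would normally be estimated on held-out data rather than on the training samples, which is an assumption this passage does not spell out.

```python
def accuracy(model, samples, labels):
    """Fraction of samples for which the model's prediction equals the label."""
    correct = sum(1 for x, y in zip(samples, labels) if model(x) == y)
    return correct / len(labels)


def select_best(models, samples, labels):
    """Return the name of the candidate with the highest accuracy."""
    return max(models, key=lambda name: accuracy(models[name], samples, labels))


# Toy candidates: each maps a feature vector to 0 (negative) or 1 (positive).
candidates = {
    "threshold_1": lambda x: int(x[0] > 1.0),
    "threshold_5": lambda x: int(x[0] > 5.0),
}
validation_samples = [[0.5], [2.0], [3.0]]
validation_labels = [0, 1, 1]

best = select_best(candidates, validation_samples, validation_labels)
```

With these toy inputs `threshold_1` classifies all three samples correctly while `threshold_5` gets only one right, so `best` is `"threshold_1"`.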
As can be seen from the present disclosure, the present invention can assist in screening for colorectal cancer and colorectal polyps by using blood samples, which are relatively easy to collect, and feeding the expression patterns of gene markers expressed in blood into an artificial intelligence algorithm.
The present invention can likewise assist in screening for colorectal cancer and advanced adenoma by using blood samples, which are relatively easy to collect, and feeding the expression patterns of gene markers expressed in blood into an artificial intelligence algorithm.
FIG. 1 is a summary diagram of the present invention;
FIG. 2 shows the number of samples in each group used for the experiments and analysis;
FIG. 3 shows the primer and probe nucleotide sequences designed for detecting the gene biomarkers;
FIGS. 4 to 8 show the relative expression levels of each gene biomarker by group, obtained by RT-qPCR;
FIG. 9 shows the results of the group-wise t-test statistical analysis using all samples (biomarker selection for building Model A);
FIG. 10 shows the results of the group-wise t-test statistical analysis using the samples positive in Model A (biomarker selection for building Model B);
FIG. 11 shows the results of the group-wise t-test statistical analysis using the samples negative in Models A and B (biomarker selection for building Model C);
FIG. 12 shows the accuracy, sensitivity, and specificity of the four artificial intelligence prediction models finally derived when building Model A;
FIG. 13 shows the accuracy, sensitivity, and specificity of the four artificial intelligence prediction models finally derived when building Model B;
FIG. 14 shows the accuracy, sensitivity, and specificity of the four artificial intelligence prediction models finally derived when building Model C;
FIG. 15 is a schematic diagram of the final result of applying Models A, B, and C sequentially; and
FIG. 16 shows the process of building the artificial-intelligence-algorithm-based classification model and verifying its performance.
Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. These examples serve only to describe the present invention more specifically, and it will be apparent to those of ordinary skill in the art that, in accordance with the gist of the present invention, its scope is not limited by these examples.
For reference, this patent consists of examples from three broad studies, designated 1), 2), and 3) below.
1) 실시예 1 내지 5는 대장암 및 대장 용종을 선별하는 내용으로 유전자 마커는 18개[C-C motif chemokine receptor 1 (CCR1), Carboxylesterase 1 (CES1), Interleukin 1 beta (IL1B), Integrin alpha 2 (ITGA2), Lactotransferrin (LTF), Tumor necrosis factor superfamily 13b (TNFSF13B), Prostaglandin E synthase (PTGES), Inter-alpha-trypsin inhibitor heavy chain H4 (ITIH4), Taurine upregulated gene1 (TUG1), Nucleoside diphosphate kinase 1 (NME1), Prostaglandin-endoperoxide synthase 2 (PTGS2), C-X-C motif chemokine 11 (CXCL11), Mitogen-activated protein kinase 6 (MAPK6), Glycerol kinase (GK), Keratin 19 (KRT19), Epithelial cell adhesion molecule (EpCAM), Melanoma Cell Adhesion Molecule (MCAM) 및 Peroxisome proliferator activated receptor gamma (PPARG)]를 사용하였으며, 대장암군 42, 고위험선종군 156, 저위험선종군 110 및 대조군 183으로 이루어진 분석 검체수를 통하여 대장암 민감도: 95.2%, 고위험선종 민감도: 91.7% 및 특이도: 78.5%의 분석결과를 나타내었으며,1) Examples 1 to 5 are for screening colon cancer and colon polyps, and 18 gene markers [C-C motif chemokine receptor 1 (CCR1), Carboxylesterase 1 (CES1), Interleukin 1 beta (IL1B), Integrin alpha 2 ( ITGA2), Lactotransferrin (LTF), Tumor necrosis factor superfamily 13b (TNFSF13B), Prostaglandin E synthase (PTGES), Inter-alpha-trypsin inhibitor heavy chain H4 (ITIH4), Taurine upregulated gene1 (TUG1), Nucleoside diphosphate kinase 1 (NME1) ), Prostaglandin-endoperoxide synthase 2 (PTGS2), C-X-C motif chemokine 11 (CXCL11), Mitogen-activated protein kinase 6 (MAPK6), Glycerol kinase (GK), Keratin 19 (KRT19), Epithelial cell adhesion molecule (EpCAM), Melanoma Cell Adhesion Molecule (MCAM) and Peroxisome proliferator activated receptor gamma (PPARG)] were used, and colorectal cancer sensitivity: 95.2% through the number of analysis samples consisting of 42 colorectal cancer groups, 156 high-risk adenoma groups, 110 low-risk adenoma groups, and 183 control groups. , high-risk adenoma sensitivity: 91.7% and specificity: 78.5% of the analysis results,
2) Examples 6 to 10 concern diagnosis of colorectal cancer and its precancerous stage. Five new markers [ANKHD1-EIF4EBP3 Readthrough (ANKHD1-EIF4EBP3), G Protein-Coupled Receptor 15 (GPR15), Matrix Metallopeptidase 23B (MMP23B), Taste 2 Receptor Member 10 (TAS2R10), and Thymidylate Synthetase (TYMS)] were added to the 18 markers above, and experiments were performed on 23 gene markers. The analyzed specimens comprised 112 in the colorectal cancer group, 178 in the advanced adenoma group, 104 in the non-advanced adenoma group, and 203 in the control group. The analysis showed a colorectal cancer sensitivity of 89.3%, an advanced adenoma sensitivity of 74.5%, and a specificity of 72.0%.
3) Examples 11 to 15 concern screening for colorectal cancer and advanced adenoma. Seven new markers [Forkhead box A2 (FOXA2), Marker Of proliferation Ki-67 (MKi67), Erb-B2 Receptor Tyrosine Kinase 2 (ERBB2), Neuroplastin (NPTN), Snail family transcriptional repressor 2 (SNAI2), Telomerase reverse transcriptase (TERT), and Vimentin (VIM)] were added to the 23 markers of category 2) above, and experiments were performed on 30 gene markers. The analyzed specimens comprised 148 in the colorectal cancer group, 197 in the advanced adenoma group, and 143 in the control group. The analysis showed a colorectal cancer sensitivity of 91.9%, an advanced adenoma sensitivity of 94.2%, and a specificity of 85.0%.
The relationship between 2) and 3) above is as follows.
In study 2) (Examples 6 to 10), a model using the genes that were significant for distinguishing colorectal cancer and advanced adenoma yielded a colorectal cancer sensitivity of 89.3%, an advanced adenoma sensitivity of 74.5%, and a specificity of 72.0%. To improve the specificity, genes whose relative expression level changes only in colorectal cancer or advanced adenoma were additionally searched for, and this work was named study 3).
In colorectal cancer or advanced adenoma, a precursor lesion of colorectal cancer, circulating tumor cells may be present in the blood. Accordingly, seven genes known to change in relative expression level in circulating tumor cells (FOXA2, MKi67, MUC1, NPTN, SNAI2, TERT, VIM) were targeted, and their relative expression levels were compared by group using blood from the normal group, the advanced adenoma group, and the colorectal cancer group.
The relative expression level ($2^{-\Delta Cq}$) of each target gene was calculated from the Cq value of the target gene and the Cq value of the GAPDH gene. To compare the normal group with the colorectal cancer group, the fold change was obtained as the ratio of the relative expression level of the colorectal cancer group to that of the normal group, and the p-value of the difference between the two groups was obtained by Student's t-test. To compare the normal group with the advanced adenoma group, the fold change was obtained as the ratio of the relative expression level of the advanced adenoma group to that of the normal group, and the p-value was obtained by Student's t-test. To compare the advanced adenoma group with the colorectal cancer group, the fold change was obtained as the ratio of the relative expression level of the colorectal cancer group to that of the advanced adenoma group, and the p-value was obtained by Student's t-test.
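The group comparison described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: it assumes the fold change is the ratio of group means of the relative expression values, and it computes only the pooled-variance (Student's) t statistic and its degrees of freedom, leaving the p-value lookup to a statistics package. The function names `fold_change` and `student_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(group, reference):
    """Ratio of mean relative expression: `group` relative to `reference`.

    Assumption: group means are compared; the source does not state
    which summary statistic was used.
    """
    return mean(group) / mean(reference)


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Toy relative expression values (not data from the study).
normal = [1.0, 2.0, 3.0]
cancer = [2.0, 4.0, 6.0]
fc = fold_change(cancer, normal)
t_stat, dof = student_t(cancer, normal)
```

The same two calls would be repeated for each of the three group pairs (normal vs. colorectal cancer, normal vs. advanced adenoma, advanced adenoma vs. colorectal cancer).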
As a result, the relative expression levels of all of the FOXA2, MKi67, MUC1, NPTN, SNAI2, TERT, and VIM genes differed significantly (p-value of 0.05 or less) in the advanced adenoma group and the colorectal cancer group compared with the normal group. The TERT (hTERT) gene also showed a statistically significant difference (p-value of 0.05 or less) when the relative expression levels of the advanced adenoma group and the colorectal cancer group were compared (Table 2).
Table 2. Comparison of differences in relative expression level between groups.

Accordingly, in study 3) (Examples 11 to 15), a classification model for screening colorectal cancer and advanced adenoma was to be constructed using a total of 30 genes: the genes that had been significant for distinguishing colorectal cancer and advanced adenoma, plus the seven genes above.
For reference, Table 3 lists the primer and probe sequences for all 30 markers used in the present invention.
Table 3. Base sequences (5' --> 3') of each primer and probe used in the present invention. [Table body not reproduced.]
Hereinafter, the present invention is described in detail through examples.
Example 1: Materials
Blood samples from people who visited the gastroenterology departments of Shinchon Severance Hospital, Gangnam Severance Hospital, and Kangbuk Samsung Hospital from 2017 to 2019 were used. Blood was collected in Tempus blood tubes (Applied Biosystems®) and transported to the Molecular Diagnostics Laboratory at Yonsei University Wonju Campus.
Based on the colonoscopy results, the subjects were classified into:
- a normal group of 183 subjects with no lesions in the colon,
- a low-risk group of 110 subjects with fewer than three low-risk adenomas,
- a high-risk group of 156 subjects, including those with three or more low-risk adenomas, one or more high-risk adenomas, or carcinoma in situ, and
- a colorectal cancer group of 42 subjects.
Example 2: Isolation of total RNA from blood specimens
Total RNA is isolated from blood specimens collected in Tempus tubes using the Tempus blood RNA isolation kit (Applied Biosystems®).
Example 3: cDNA synthesis from isolated total RNA and real-time PCR
i. cDNA synthesis
To 1.5 to 4.5 µg of isolated total RNA, 0.625 µg of random primer (Invitrogen), dNTP (Intron), and 500 units of MMLV reverse transcriptase (Invitrogen) were added, DEPC-treated distilled water was added to a final volume of 50 µL, and the mixture was mixed well. The synthesis reaction mixture was then incubated in a thermocycler (ABI) at 25°C for 30 minutes, 37°C for 50 minutes, and 70°C for 15 minutes to synthesize cDNA.
ii. qPCR
The real-time PCR reaction mixture consisted of 10 µL of THUNDERBIRD® Probe qPCR Mix (TOYOBO), 10 pmol each of forward and reverse primer, 10 pmol of probe, and 15 µg of the synthesized cDNA, brought to a final volume of 20 µL with ultrapure water and mixed. The primer and probe base sequences are listed in Table 3.
Real-time PCR was run on a CFX96 (Bio-Rad) under the following temperature conditions: 95°C for 3 minutes, followed by 40 cycles of 95°C for 3 seconds and 60°C for 30 seconds. Fluorescence was measured each time the annealing step (60°C, 30 seconds) was performed, giving the fluorescence value as it increased cycle by cycle.
Table 3. Base sequences of each primer and probe used in the present invention.
Example 4: Confirmation of results and calculation of the relative expression level of target genes
The relative expression level ($2^{-\Delta Cq}$) of each target gene is calculated from the Cq value of the target gene and the Cq value of the GAPDH gene, which was used as the endogenous control.
[Relational expression]
$2^{-\Delta Cq} = 2^{-(\text{target gene Cq} - \text{GAPDH gene Cq})}$
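The relational expression above translates directly into code. The sketch below is illustrative only; the function name `relative_expression` and the example Cq values are hypothetical and are not taken from the study.

```python
def relative_expression(cq_target, cq_gapdh):
    """Relative expression level 2^(-dCq), with dCq = target Cq - GAPDH Cq."""
    delta_cq = cq_target - cq_gapdh
    return 2.0 ** (-delta_cq)


# A target that crosses the threshold 5 cycles later than GAPDH
# is expressed at 2^-5 of the GAPDH level.
example = relative_expression(cq_target=25.0, cq_gapdh=20.0)
```

Because Cq falls as template abundance rises, a larger ΔCq corresponds to a lower relative expression level.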
Example 5: Construction of a diagnosis prediction model for colorectal cancer and colorectal polyps
Gene biomarkers to be entered into the diagnosis prediction model for colorectal cancer and colorectal polyps are selected, and the model is built by entering the relative expression levels of the selected gene biomarkers.
The SPSS statistical analysis package was used to select the gene biomarkers, and the R package was used for the statistical analysis that builds the prediction model from the relative expression levels of the selected biomarkers.
The diagnosis prediction models were built using decision tree (DT), logistic regression (LR), neural network (NN), and support vector machine (SVM) methods, but the methods are not limited thereto.
The artificial intelligence prediction model is built from the results of a training set, which consists of a subset of all sample results. A validation set is then formed from the samples not included in the training set, and the accuracy of the model built on the training set is checked against the validation set. Here, accuracy means how accurate the predictions of the prediction model are.
A total of four types of models (DT, LR, NN, SVM) were built, and the derivation of results using the training set and validation set was repeated 1000 times. Among the models produced over the 1000 repetitions, the one with the highest accuracy on the validation set is taken as the final result for that model type. Of the four final models, the model type with the highest sensitivity or specificity on the total set containing all samples (491 in total) was selected.
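The repeated training/validation procedure can be sketched as below. This is a simplified stand-in, not the SPSS/R workflow of the example: a one-feature cut-off classifier replaces the DT, LR, NN, and SVM models, and the 70% training fraction and the random seed are assumptions, since the source does not state how the split was made. All function names are hypothetical.

```python
import random


def fit_threshold(train):
    """Stand-in model: choose the cut-off that best separates the training labels."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(x for x, _ in train):
        acc = sum((x >= cut) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def accuracy(cut, samples):
    """Fraction of samples whose label matches the cut-off prediction."""
    return sum((x >= cut) == bool(y) for x, y in samples) / len(samples)


def best_of_repeated_splits(samples, repeats=1000, train_fraction=0.7, seed=0):
    """Repeat random train/validation splits and keep the model with the
    highest validation accuracy."""
    rng = random.Random(seed)
    best_cut, best_acc = None, -1.0
    for _ in range(repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_train = int(len(shuffled) * train_fraction)
        train, validation = shuffled[:n_train], shuffled[n_train:]
        cut = fit_threshold(train)
        acc = accuracy(cut, validation)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc
```

Selecting the best of many splits by validation accuracy tends to give an optimistic estimate, which is one reason the later examples evaluate on a separate test set.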
i. Construction of artificial intelligence prediction model A, which distinguishes the colorectal cancer group from the remaining groups (high-risk/low-risk/normal)
Gene markers that show statistical significance in distinguishing the groups are identified by t-test. Model A is built by entering the relative expression levels of the eight gene markers that meet this criterion.
- As a result, an SVM model was selected that distinguishes the colorectal cancer group from the high-risk/low-risk/normal groups with a sensitivity of 92.9% and a specificity of 65.0%.
ii. Construction of artificial intelligence prediction model B, which distinguishes the colorectal cancer group from the remaining groups (high-risk/low-risk/normal)
- Model A has high sensitivity for the colorectal cancer group but also classifies up to 40% of the remaining groups as colorectal cancer. The samples that test positive in model A must therefore be used to build a further model that again distinguishes the colorectal cancer group from the remaining groups.
- For the samples that test positive in model A, gene markers that show statistical significance in distinguishing the groups are identified by t-test. Model B is built by entering the relative expression levels of the nine gene markers that meet this criterion.
- As a result, an SVM model was selected that distinguishes the colorectal cancer group from the high-risk/low-risk/normal groups with a sensitivity of 94.9% and a specificity of 87.9%.
iii. Construction of artificial intelligence prediction model C, which distinguishes the high-risk group from the low-risk/normal groups
- The samples that test negative in models A and B include high-risk subjects who need colonoscopy, so a model that distinguishes the high-risk group from the low-risk/normal groups must be built.
- For the samples that test negative in models A and B, gene markers that show statistical significance in distinguishing the groups are identified by t-test. Model C is built by entering the relative expression levels of the six gene markers that meet this criterion.
- As a result, an SVM model was selected that distinguishes the high-risk group from the low-risk/normal groups with a sensitivity of 91.3% and a specificity of 81.9%.
iv. Overall result of applying artificial intelligence prediction models A, B, and C sequentially
When the relative expression levels of the gene markers of all samples were entered sequentially into the artificial intelligence prediction models A, B, and C built in i, ii, and iii, 95.2% of the colorectal cancer group and 91.7% of the high-risk group tested positive, and 78.5% of the low-risk and normal groups tested negative.
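One way to read the sequential A, B, C procedure is sketched below. This is an interpretation rather than the documented decision logic: it assumes a sample is reported positive if models A and B both call it colorectal cancer, and that every other sample is passed to model C, whose positive call (high-risk) also counts as positive. The models are represented as plain callables; `cascade_predict` and the toy scoring rules are hypothetical.

```python
def cascade_predict(sample, model_a, model_b, model_c):
    """Return True if the sample is called positive by the A -> B -> C cascade.

    Assumed flow:
      1. Model A screens for colorectal cancer.
      2. Model B re-checks the samples that A called positive.
      3. Samples negative in A or in B go to model C, which looks for
         the high-risk group.
    """
    if model_a(sample) and model_b(sample):
        return True
    return bool(model_c(sample))


# Toy stand-ins operating on a dict of relative expression values.
model_a = lambda s: s["IL1B"] > 1.0
model_b = lambda s: s["CES1"] > 1.0
model_c = lambda s: s["LTF"] > 2.0

calls = [
    cascade_predict({"IL1B": 1.5, "CES1": 1.2, "LTF": 0.1}, model_a, model_b, model_c),
    cascade_predict({"IL1B": 1.5, "CES1": 0.2, "LTF": 3.0}, model_a, model_b, model_c),
    cascade_predict({"IL1B": 0.5, "CES1": 0.2, "LTF": 0.1}, model_a, model_b, model_c),
]
```

Under this reading, the combined specificity depends on the false positives of both the A and B branch and the C branch.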
The biomarkers used in models A, B, and C are as follows.
Model A: IL1B, LTF, TNFSF13B, ITIH4, CXCL11, MAPK6, GK, MCAM (8 markers),
Model B: CES1, IL1B, TNFSF13B, ITIH4, PTGS2, CXCL11, MAPK6, GK, MCAM (9 markers), and
Model C: IL1B, LTF, TNFSF13B, ITIH4, CXCL11, MAPK6 (6 markers)
Example 6: Collection of clinical specimens
From 2017 to 2022, with the approval of the Institutional Review Board (IRB) of each institution, blood samples were collected from subjects scheduled for colonoscopy at the gastroenterology departments of Shinchon Severance Hospital (approval no. 4-2017-0148), Gangnam Severance Hospital (approval no. 3-2017-0024), and Kangbuk Samsung Hospital (approval no. 2017-02-022-009), and at the Health Examination Center of Wonju Severance Christian Hospital (approval no. CR319115). A total of 3 mL of blood was drawn using a Tempus blood tube (Applied Biosystems®). The subjects were classified as follows based on the colonoscopy results (Table 4).
Table 4. Classification of subjects and number of specimens by colonoscopy result.
Example 7: Isolation of total RNA from blood specimens
Total RNA is isolated from blood specimens collected in Tempus tubes using the Tempus blood RNA isolation kit (Applied Biosystems®).
Example 8: cDNA synthesis from isolated total RNA and qPCR
i. Complementary DNA (cDNA) synthesis
To 1.5 to 4.5 µg of isolated total RNA were added 2.5 µL of random primer (3 µg/µL) (Invitrogen), 2.5 µL of dNTP mixture (2.5 mM each) (Intron), 2.5 µL of M-MLV reverse transcriptase (200 U/µL) (Invitrogen), 10 µL of 5× First-strand buffer (250 mM Tris-HCl) (Invitrogen), and 5 µL of dithiothreitol (0.1 M) (Invitrogen). Ultrapure water was added to a final volume of 50 µL and the mixture was mixed well. The synthesis reaction mixture was then incubated in a thermocycler (Applied Biosystems) at 25°C for 30 minutes, 37°C for 50 minutes, and 70°C for 15 minutes to synthesize cDNA.
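Since the reagent volumes are fixed and only the RNA volume varies, the amount of ultrapure water needed to reach 50 µL follows by subtraction. The sketch below works this out. The RNA stock concentration is a hypothetical input (the source gives only the RNA mass range), and `water_volume_ul` is an illustrative name.

```python
# Fixed reagent volumes from the reaction described above, in microliters.
FIXED_COMPONENTS_UL = {
    "random primer (3 ug/uL)": 2.5,
    "dNTP mixture (2.5 mM each)": 2.5,
    "M-MLV reverse transcriptase (200 U/uL)": 2.5,
    "5x first-strand buffer": 10.0,
    "dithiothreitol (0.1 M)": 5.0,
}
FINAL_VOLUME_UL = 50.0


def water_volume_ul(rna_mass_ug, rna_conc_ug_per_ul):
    """Volume of ultrapure water needed to bring the reaction to 50 uL.

    rna_conc_ug_per_ul is an assumed stock concentration, not a value
    given in the source.
    """
    if not 1.5 <= rna_mass_ug <= 4.5:
        raise ValueError("RNA input outside the stated 1.5 to 4.5 ug range")
    rna_volume = rna_mass_ug / rna_conc_ug_per_ul
    water = FINAL_VOLUME_UL - sum(FIXED_COMPONENTS_UL.values()) - rna_volume
    if water < 0:
        raise ValueError("RNA stock too dilute to fit in a 50 uL reaction")
    return water
```

The fixed reagents total 22.5 µL, so at most 27.5 µL is available for RNA plus water. The enzyme volume of 2.5 µL at 200 U/µL corresponds to the 500 units stated in Example 3.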
ii. Quantitative polymerase chain reaction (qPCR)
The qPCR reaction mixture consisted of 10 µL of THUNDERBIRD® Probe qPCR Mix (TOYOBO), 1 µL of forward/reverse primer and probe (10 pmol/µL), and 2 µL of the synthesized cDNA, brought to a final volume of 20 µL with ultrapure water and mixed. qPCR was run on a CFX96 (Bio-Rad) under the following temperature conditions: 95°C for 3 minutes, followed by 40 cycles of 95°C for 3 seconds and 60°C for 30 seconds. Fluorescence was measured each time the annealing step (60°C, 30 seconds) was performed, giving the fluorescence value as it increased cycle by cycle. A fixed fluorescence value was set as the threshold, and the Cq value, the cycle number at which the threshold is reached, was derived.
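The threshold-based Cq derivation can be illustrated with a short sketch. It assumes one fluorescence reading per cycle and uses linear interpolation between the two readings that bracket the threshold; the instrument software may use a different method, and `cq_from_curve` is a hypothetical name.

```python
def cq_from_curve(fluorescence, threshold):
    """Fractional cycle at which the fluorescence first reaches the threshold.

    fluorescence[i] is the reading taken after cycle i + 1.
    Returns None if the threshold is never reached within the run.
    """
    for i, value in enumerate(fluorescence):
        if value >= threshold:
            if i == 0:
                return 1.0
            prev = fluorescence[i - 1]
            # prev < threshold <= value, so the denominator is positive.
            return i + (threshold - prev) / (value - prev)
    return None


# Toy amplification curve doubling each cycle (not measured data).
curve = [10.0, 20.0, 40.0, 80.0]
cq = cq_from_curve(curve, threshold=30.0)
```

A sample that never reaches the threshold within the 40 cycles has no Cq, which is why the function returns `None` rather than a number in that case.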
Example 9: Confirmation of results and analysis of the relative expression level of target genes
The relative expression level ($2^{-\Delta Cq}$) of each target gene is calculated from the Cq value of the target gene and the Cq value of the GAPDH gene, which was used as the endogenous control. The target genes are listed in Table 5.
[Relational expression]
$2^{-\Delta Cq} = 2^{-(\text{target gene Cq} - \text{GAPDH gene Cq})}$
Table 5. List of target blood gene markers.
Example 10: Construction of a classification model for screening colorectal cancer and advanced adenoma from the relative expression levels of target genes
An artificial intelligence algorithm-based classification model was constructed using the H2O package (version 3.32.1.3) for the R statistical software (version 3.6.3). The diagnosis prediction models for colorectal cancer and advanced adenoma were based on the deep neural network (DNN), generalized linear model (GLM), and random forest (RF) algorithms. In addition, an automated machine learning (AutoML) method was applied, which builds the model best suited to the data from among several model types (GLM, RF, DNN, GBM, stacked ensemble (SE)). The methods are not limited thereto.
All samples were divided into a training set and a test set. An artificial intelligence algorithm-based classification model that distinguishes the colorectal cancer group and the advanced adenoma group from the normal group was built from the training set results, and the performance of the resulting model was evaluated on the test set. When building the model on the training set, 5-fold cross-validation was applied: the training set was divided into five folds so that the model was trained and, at the same time, its performance was verified on each fold, with the aim of building a high-performance model.
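The 5-fold cross-validation step can be sketched as an index split. The example used the H2O package in R; the Python code below is only a language-neutral illustration of how five folds partition the training set, and `k_fold_indices` and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, held_out_indices) for each of k folds.

    Every sample is held out exactly once; the remaining k - 1 folds
    are used for fitting in that round.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


splits = list(k_fold_indices(n_samples=23, k=5))
```

The test set is kept entirely outside these folds, so it remains unseen until the final evaluation.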
The performance of the artificial intelligence classification models was judged by the AUROC and AUPRC values, representative performance metrics for classification models, on the training set and the test set. The best-performing model was selected on the basis of performance on the test set, which was not used for model training. The AUROC and AUPRC values of the DNN, GBM, and RF models built with each algorithm and of the SE model built through AutoML are shown in Table 6. The SE model had the highest AUROC and AUPRC on the test set.
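For readers unfamiliar with AUROC, the following sketch computes it from first principles as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as half. It is not the H2O implementation, the scores are toy values, and AUPRC is not covered here.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy scores: three of the four positive/negative pairs are ranked correctly.
value = auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few hundred subjects.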
Table 6. AUROC and AUPRC performance metrics on the training set and test set.

As a result, as shown in Table 7, the sensitivity for identifying the colorectal cancer group was 89.3%, the sensitivity for identifying the advanced adenoma group was 74.5%, and the specificity for the non-advanced adenoma group and the control group was 72.0%.
Table 7. Sensitivity and specificity of the SE model by group. [Table body not reproduced; surviving values: total 157 subjects, with group sizes n = 75, n = 28, n = 47, and n = 82.]
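Per-group sensitivity and specificity of the kind reported in Table 7 can be derived from a list of (group, prediction) pairs, as in the sketch below. The records are toy values, not the study's counts, and `positive_rate_by_group` is a hypothetical helper.

```python
from collections import defaultdict


def positive_rate_by_group(records):
    """Fraction of samples predicted positive within each group.

    records: iterable of (group_name, predicted_positive) pairs.
    For disease groups this fraction is the sensitivity; for control
    groups, specificity is 1 minus this fraction.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, predicted_positive in records:
        totals[group] += 1
        positives[group] += bool(predicted_positive)
    return {group: positives[group] / totals[group] for group in totals}


# Toy predictions (not study data).
records = (
    [("colorectal cancer", True)] * 9 + [("colorectal cancer", False)] * 1
    + [("advanced adenoma", True)] * 3 + [("advanced adenoma", False)] * 1
    + [("control", True)] * 1 + [("control", False)] * 4
)
rates = positive_rate_by_group(records)
control_specificity = 1 - rates["control"]
```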
Example 11: Collection of clinical specimens
From 2017 to 2022, with the approval of the Institutional Review Board (IRB) of each institution, blood samples were collected from subjects scheduled for colonoscopy at the gastroenterology departments of Shinchon Severance Hospital (approval no. 4-2017-0148), Gangnam Severance Hospital (approval no. 3-2017-0024), and Kangbuk Samsung Hospital (approval no. 2017-02-022-009), and at the Health Examination Center of Wonju Severance Christian Hospital (approval no. CR319115). A total of 3 mL of blood was drawn using a Tempus blood tube (Applied Biosystems®). The subjects were classified as follows based on the colonoscopy results (Table 8).
Table 8. Classification of subjects and number of specimens by colonoscopy result. [Table body not reproduced; surviving row text: "No gastrointestinal symptoms and no family history of colorectal cancer"; "Subjects with no lesions in the large intestine as a result of colonoscopy".]
Example 12: Isolation of total RNA from blood specimens
Total RNA is isolated from blood specimens collected in Tempus tubes using the Tempus blood RNA isolation kit (Applied Biosystems®).
Example 13: cDNA synthesis from isolated total RNA and qPCR
i. Complementary DNA (cDNA) synthesis
To 1.5 to 4.5 µg of isolated total RNA were added 2.5 µL of random primer (3 µg/µL) (Invitrogen), 2.5 µL of dNTP mixture (2.5 mM each) (Intron), 2.5 µL of M-MLV reverse transcriptase (200 U/µL) (Invitrogen), 10 µL of 5× First-strand buffer (250 mM Tris-HCl) (Invitrogen), and 5 µL of dithiothreitol (0.1 M) (Invitrogen). Ultrapure water was added to a final volume of 50 µL and the mixture was mixed well. The synthesis reaction mixture was then incubated in a thermocycler (Applied Biosystems) at 25°C for 30 minutes, 37°C for 50 minutes, and 70°C for 15 minutes to synthesize cDNA.
ii. Quantitative polymerase chain reaction (qPCR)
The qPCR reaction mixture consisted of 10 µL of THUNDERBIRD® Probe qPCR Mix (TOYOBO), 1 µL of forward/reverse primer and probe (10 pmol/µL), and 2 µL of the synthesized cDNA, brought to a final volume of 20 µL with ultrapure water and mixed. qPCR was run on a CFX96 (Bio-Rad) under the following temperature conditions: 95°C for 3 minutes, followed by 40 cycles of 95°C for 3 seconds and 60°C for 30 seconds. Fluorescence was measured each time the annealing step (60°C, 30 seconds) was performed, giving the fluorescence value as it increased cycle by cycle. A fixed fluorescence value was set as the threshold, and the Cq value, the cycle number at which the threshold is reached, was derived.
Example 14: Confirmation of results and analysis of the relative expression level of target genes
The relative expression level ($2^{-\Delta Cq}$) of each target gene is calculated from the Cq value of the target gene and the Cq value of the GAPDH gene, which was used as the endogenous control. To compare the relative expression levels of the 30 genes between groups, the ratio of the colorectal cancer group to the normal group, the ratio of the advanced adenoma group to the normal group, and the ratio of the colorectal cancer group to the advanced adenoma group were calculated and are shown in Table 9.
[Relational expression]
$2^{-\Delta Cq} = 2^{-(\text{target gene Cq} - \text{GAPDH gene Cq})}$
Table 9. Comparison of the relative expression levels of the 30 genes between groups.
실시예 15 : 표적 유전자의 상대발현양을 대입한 대장암 및 진행선종 스크리닝 목적의 분류 모델 구축Example 15: Establishment of a classification model for the purpose of screening colorectal cancer and advanced adenoma by substituting the relative expression level of target genes
Statistical R software (version 3.6.3)의 H2O package (version 3.32.1.3)를 이용하여 인공지능 알고리즘 기반의 분류 모델을 구축하였다. 대장암 및 진행선종 진단 예측 모델의 제작은 Deep neural network (DNN), Generalized linear model (GLM), Gradient boosting machine (GBM), Random forest (RF) 알고리즘을 기반으로 하였고 추가적으로 여러 종류의 모델 (GLM, RF, DNN, GBM, stacked ensemble (SE)) 중 데이터에 적합한 모델을 구축하는 Automated machine learning (AutoML) 방법을 접목하여 수행되었으나 이에 한정되지 않는다. An artificial intelligence algorithm-based classification model was constructed using the H2O package (version 3.32.1.3) of Statistical R software (version 3.6.3). The production of colorectal cancer and advanced adenoma diagnosis prediction models was based on Deep neural network (DNN), Generalized linear model (GLM), Gradient boosting machine (GBM), and Random forest (RF) algorithms, and additionally several types of models (GLM, RF, DNN, GBM, stacked ensemble (SE)) was performed by grafting Automated machine learning (AutoML) method to build a model suitable for data, but is not limited thereto.
전체 샘플을 Training set와 Test set으로 나누고 Training set 결과를 대입하여 정상군 대비 대장암군과 진행선종군을 구분할 수 있는 인공지능 알고리즘 기반 분류모델을 구축하고 구축된 모델의 성능을 Test set을 이용하여 평가한다(도 12). Training set를 이용하여 모델을 구축할 때 5-fold cross-validation 기법을 접목하여 Training set가 5개의 영역으로 구분되어 모델을 학습함과 동시에 각 영역을 이용하여 모델의 성능을 검증하여 높은 성능의 모델을 구축하고자 하였다(도 16). By dividing the entire sample into a training set and a test set, and substituting the results of the training set, an artificial intelligence algorithm-based classification model that can distinguish between a normal group and a colorectal cancer group and an advanced cancer group was constructed, and the performance of the built model was evaluated using the test set. (FIG. 12). When building a model using a training set, a 5-fold cross-validation technique is applied so that the training set is divided into 5 areas to learn the model and at the same time verify the performance of the model using each area to provide a high-performance model. It was intended to build (FIG. 16).
The performance of the classification models was judged by the AUROC and AUPRC values, which are representative performance indicators for classification models, on both the training set and the test set. The best-performing model was selected based on its performance on the test set, which was not used for model training. The AUROC and AUPRC values of the DNN, GBM, GLM, and RF models built with each algorithm and of the SE model built through AutoML are as follows (Table 9). As a result, the GBM model and the SE model built through AutoML showed the highest AUROC and AUPRC values on the test set.
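For readers unfamiliar with the two indicators, the following Python sketch shows one common way to compute them from predicted scores. It is an assumption-laden illustration: the function names and the toy data are hypothetical, average precision is used here as one standard estimator of AUPRC, and the H2O package used in the source may compute both values differently.

```python
def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values taken at the rank of each positive sample."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (hypothetical): 1 = patient, 0 = normal.
y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
toy_auroc = auroc(y_true, scores)              # 5 of 6 positive/negative pairs ordered correctly
toy_auprc = average_precision(y_true, scores)  # (1/1 + 2/2 + 3/4) / 3
```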
Table 10 shows the AUROC and AUPRC indicators on the training and test sets for each model.
The test set results of the GBM and SE models were then examined by group. In the GBM model, the sensitivity for distinguishing the colorectal cancer group was 94.6%, the sensitivity for distinguishing the advanced adenoma group was 97.5%, and the specificity for distinguishing the normal group was 80.6% (Table 11). In the SE model, the sensitivity for distinguishing the colorectal cancer group was 91.9%, the sensitivity for distinguishing the advanced adenoma group was 95.1%, and the specificity for distinguishing the normal group was 80.6% (Table 12). Therefore, the GBM model, which showed the higher sensitivity, was selected as the final model.
Table 11 shows the sensitivity and specificity results for each group of the GBM model (group sizes listed in the table: n = 118, n = 37, n = 36).
Table 12 shows the sensitivity and specificity results for each group of the SE model (group sizes listed in the table: n = 118, n = 37, n = 36).
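The sensitivity and specificity figures reported for each group follow the usual definitions, sketched below in Python. The counts in the example are hypothetical and are not taken from the source, which reports only the resulting percentages.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of diseased subjects that the model classifies as diseased."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of normal subjects that the model classifies as normal."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
example_sensitivity = sensitivity(true_pos=45, false_neg=5)   # 45 of 50 patients detected
example_specificity = specificity(true_neg=40, false_pos=10)  # 40 of 50 normals cleared
```

Because sensitivity is computed only over diseased subjects, it can be reported separately for the colorectal cancer group and the advanced adenoma group, while specificity is computed over the normal group alone.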
Comparative Example
In colorectal cancer or advanced adenoma, a precursor lesion of colorectal cancer, circulating tumor cells may be present in the blood. Accordingly, 10 genes whose relative expression is known to change in circulating tumor cells (EpCAM, ERBB2, FOXA2, KRT19, MCAM, MKi67, NPTN, SNAI2, TERT, VIM) were targeted, their relative expression levels were determined for each group, and an artificial intelligence algorithm-based model was built to distinguish colorectal cancer or advanced adenoma from the normal group.
1) Specimen collection
From 2017 to 2022, blood samples were collected from subjects scheduled for colonoscopy at the departments of gastroenterology of Shinchon Severance Hospital (approval No. 4-2017-0148), Gangnam Severance Hospital (approval No. 3-2017-0024), and Kangbuk Samsung Hospital (approval No. 2017-02-022-009), and at the health examination center of Wonju Severance Christian Hospital (approval No. CR319115), with the approval of the Institutional Review Board (IRB) of each institution. A total of 3 mL of blood was drawn using a Tempus blood tube (Applied Biosystems®). Subjects were classified as follows according to their colonoscopy results (Table 13).
No gastrointestinal symptoms and no family history of colorectal cancer
Subjects with no lesions in the large intestine on colonoscopy
Table 13 shows the classification of subjects and the number of specimens according to the colonoscopy results.
2) Isolation of total RNA from blood specimens
Total RNA is isolated from the blood specimens collected in Tempus tubes using the Tempus blood RNA isolation kit (Applied Biosystems®).
3) cDNA synthesis from the isolated total RNA and qPCR
i. Complementary DNA (cDNA) synthesis
To 1.5 to 4.5 µg of the isolated total RNA were added 2.5 µL of random primer (3 µg/µL) (Invitrogen), 2.5 µL of dNTP mixture (2.5 mM each) (Intron), 2.5 µL of M-MLV reverse transcriptase (200 U/µL) (Invitrogen), 10 µL of 5× First-strand buffer (250 mM Tris-HCl) (Invitrogen), and 5 µL of dithiothreitol (0.1 M) (Invitrogen). Ultrapure water was added to a final volume of 50 µL and the mixture was mixed well. The reaction mixture was then incubated in a thermocycler (Applied Biosystems) at 25°C for 30 minutes, 37°C for 50 minutes, and 70°C for 15 minutes to synthesize cDNA.
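The amount of ultrapure water needed to reach the 50 µL final volume follows from simple arithmetic, sketched below in Python. The reagent volumes come from the paragraph above; the RNA input volume is an assumption, because the source states the RNA amount only as a mass (1.5 to 4.5 µg), and the function and variable names are hypothetical.

```python
# Reagent volumes in microliters, as listed in the cDNA synthesis step.
REAGENT_VOLUMES_UL = {
    "random primer": 2.5,
    "dNTP mixture": 2.5,
    "M-MLV reverse transcriptase": 2.5,
    "5x first-strand buffer": 10.0,
    "dithiothreitol": 5.0,
}
FINAL_VOLUME_UL = 50.0


def water_volume_ul(rna_volume_ul):
    """Ultrapure water needed to bring the reaction to the final volume."""
    used = sum(REAGENT_VOLUMES_UL.values()) + rna_volume_ul
    if used > FINAL_VOLUME_UL:
        raise ValueError("components exceed the final reaction volume")
    return FINAL_VOLUME_UL - used


# Hypothetical RNA input volume of 10 uL: 50 - (22.5 + 10) = 17.5 uL of water.
water_for_example = water_volume_ul(10.0)
```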
ii. Quantitative polymerase chain reaction (qPCR)
The qPCR reaction mixture consisted of 10 µL of THUNDERBIRD® Probe qPCR Mix (TOYOBO), 1 µL of forward/reverse primers and probe (10 pmol/µL), and 2 µL of the synthesized cDNA, brought to a final volume of 20 µL with ultrapure water and mixed. The qPCR was run on a CFX96 (Bio-Rad) under the following temperature conditions: 95°C for 3 minutes, followed by 40 cycles of 95°C for 3 seconds and 60°C for 30 seconds. Fluorescence was measured at each annealing step (60°C, 30 seconds) to record the increase in fluorescence from cycle to cycle. A fixed fluorescence value was set as the threshold, and the Cq value, the cycle number at which the threshold is reached, was derived.
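The threshold-based Cq derivation described above can be illustrated with a short Python sketch. It assumes one fluorescence reading per cycle and linear interpolation between the two readings that bracket the threshold; the instrument software used in the source may apply a different algorithm (for example baseline subtraction), and the function name and readings are hypothetical.

```python
def cq_from_curve(fluorescence, threshold):
    """Return the fractional cycle at which the curve first reaches the threshold.

    fluorescence[i] is the reading taken after cycle i + 1.
    Returns None if the threshold is never reached.
    """
    if fluorescence and fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # Crossing lies between cycle i (prev) and cycle i + 1 (curr).
            return i + (threshold - prev) / (curr - prev)
    return None


# Hypothetical readings for six cycles with a threshold of 60.
readings = [10.0, 12.0, 15.0, 30.0, 90.0, 250.0]
example_cq = cq_from_curve(readings, threshold=60.0)  # crossing halfway between cycles 4 and 5
```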
4) Confirmation of results and analysis of the relative expression of target genes
The relative expression level ($2^{-\Delta Cq}$) of each target gene is calculated from the Cq value of the target gene and the Cq value of the GAPDH gene, which was used as the endogenous control, where $\Delta Cq = Cq_{\text{target}} - Cq_{\text{GAPDH}}$. The target genes are listed below (Table 14).
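As a worked example of the relative expression calculation, the Python sketch below applies the formula to a pair of Cq values. The Cq values and the function name are hypothetical and serve only to show the arithmetic.

```python
def relative_expression(cq_target, cq_reference):
    """Relative expression 2^(-dCq), with dCq = Cq(target) - Cq(reference)."""
    delta_cq = cq_target - cq_reference
    return 2.0 ** (-delta_cq)


# Hypothetical Cq values: target gene 28.0, GAPDH 20.0, so dCq = 8.
example_expression = relative_expression(cq_target=28.0, cq_reference=20.0)  # 2^-8
```

A target that reaches the threshold 8 cycles later than GAPDH is therefore expressed at 1/256 of the GAPDH level under the assumption of perfect doubling per cycle.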
5) Construction of a classification model for colorectal cancer and advanced adenoma screening using the relative expression levels of the target genes
A classification model based on artificial intelligence algorithms was built using the H2O package (version 3.32.1.3) for the R statistical software (version 3.6.3). The prediction models for diagnosing colorectal cancer and advanced adenoma were built with the Deep neural network (DNN), Generalized linear model (GLM), Gradient boosting machine (GBM), and Random forest (RF) algorithms. In addition, the Automated machine learning (AutoML) method, which builds the model best suited to the data from among several model types (GLM, RF, DNN, GBM, and stacked ensemble (SE)), was applied, although the approach is not limited thereto.
All samples were divided into a training set and a test set. The training set results were used to build an artificial intelligence algorithm-based classification model that distinguishes the colorectal cancer group and the advanced adenoma group from the normal group, and the performance of the resulting model was evaluated using the test set. When building the model on the training set, 5-fold cross-validation was applied: the training set was divided into five partitions, and the model was trained while each partition was used in turn to validate its performance, with the aim of obtaining a high-performance model.
The performance of the classification models was judged by the AUROC and AUPRC values, which are representative performance indicators for classification models, on both the training set and the test set. The best-performing model was selected based on its performance on the test set, which was not used for model training. The AUROC and AUPRC values of the DNN, GBM, and RF models built with each algorithm and of the GBM model built through AutoML are as follows (Table 3). As a result, the GBM model showed the highest AUROC and AUPRC values on the test set.
Table 15 shows the AUROC and AUPRC performance indicators on the training set and the test set.
When the test set results of the GBM model were then examined by group, the sensitivity for distinguishing the colorectal cancer group was 78.4%, the sensitivity for distinguishing the advanced adenoma group was 88.9%, and the specificity for distinguishing the normal group was 80.6%.
Table 16 shows the sensitivity and specificity results for each group of the GBM model (total 157 subjects; group sizes listed in the table: n = 118, n = 37, n = 36).
Claims (19)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20210193852 | 2021-12-31 | ||
KR10-2021-0193852 | 2021-12-31 | ||
KR10-2022-0163338 | 2022-11-29 | ||
KR1020220163338A KR20230104517A (en) | 2021-12-31 | 2022-11-29 | A method for sorting colorectal cancer and colon polyp or advanced neoplasia and use of the same |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023128419A1 true WO2023128419A1 (en) | 2023-07-06 |
Family
ID=86999714
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2022/020461 WO2023128419A1 (en) | 2021-12-31 | 2022-12-15 | Method for screening colorectal cancer and colorectal polyps or advanced adenomas and application thereof |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023128419A1 (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007112330A2 (en) * | 2006-03-24 | 2007-10-04 | Diadexus, Inc. | Compositions and methods for detection, prognosis and treatment of colon cancer |
KR20080102360A (en) * | 2005-12-23 | 2008-11-25 | 퍼시픽 에지 바이오테크놀로지 엘티디. | Prognosis for Colorectal Cancer |
KR20110052642A (en) * | 2008-07-18 | 2011-05-18 | 오라제닉스, 인코포레이티드 | Compositions for the Detection and Treatment of Colorectal Cancer |
JP2012517607A (en) * | 2009-02-20 | 2012-08-02 | オンコノム,インコーポレイテッド | Equipment set and method for colorectal cancer diagnosis and prognosis determination |
JP2013533977A (en) * | 2010-07-14 | 2013-08-29 | コモンウェルス サイエンティフィック アンド インダストリアル リサーチ オーガニゼイション | Diagnosis of colorectal cancer |
WO2014068124A1 (en) * | 2012-11-05 | 2014-05-08 | Diagnoplex Sa | Biomarker combinations for colorectal tumors |
KR20170094786A (en) * | 2014-12-11 | 2017-08-21 | 위스콘신 얼럼나이 리서어치 화운데이션 | Methods for detection and treatment of colorectal cancer |
WO2020225426A1 (en) * | 2019-05-08 | 2020-11-12 | Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts | Colorectal cancer screening examination and early detection method |
KR20220116931A (en) * | 2021-02-16 | 2022-08-23 | 연세대학교 원주산학협력단 | A method for sorting colon polyp and colorectal cancer and use of the same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22916562 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22916562 Country of ref document: EP Kind code of ref document: A1 |