CN116287222A

CN116287222A - Methylation marker for diagnosis of benign and malignant thyroid cancer nodules and application thereof

Info

Publication number: CN116287222A
Application number: CN202111496935.XA
Authority: CN
Inventors: 苏志熙; 刘轶颖; 徐敏杰; 马成城; 刘蕊
Original assignee: Jiangsu Huayuan Biotechnology Co ltd
Current assignee: Jiangsu Huayuan Biotechnology Co ltd
Priority date: 2021-12-09
Filing date: 2021-12-09
Publication date: 2023-06-23
Also published as: WO2023104136A1

Abstract

The invention relates to a methylation marker for diagnosing benign and malignant thyroid nodules and the application thereof. In particular, it relates to the use of a reagent for detecting the methylation state or level of at least one CpG dinucleotide of one or more target markers in the preparation of a detection reagent or diagnostic kit for diagnosing whether a thyroid nodule of an individual is benign or malignant. It further relates to the use of a device for determining the methylation state or level of at least one CpG dinucleotide of one or more target markers in the preparation of a diagnostic kit for the same purpose. The target markers comprise the PRDM16 sequence of the PRDM16 gene or genome, the CAMK2N1 sequence of the CAMK2N1 gene or genome, the TACSTD2 sequence of the TACSTD2 gene or genome, and the like. The invention also includes a diagnostic reagent or diagnostic kit that detects the methylation state or methylation level of at least one CpG dinucleotide in the target markers, in order to diagnose whether a thyroid nodule is benign or malignant.

Description

Methylation marker for diagnosis of benign and malignant thyroid cancer nodules and application thereof

Technical Field

The invention relates to a methylation marker for distinguishing benign from malignant thyroid nodules and the application thereof.

Background

Thyroid cancer is a malignancy that originates in the epithelium of the thyroid follicles. It is more common in women, with a male-to-female incidence ratio of 1:2 to 1:4, and onset generally occurs between 21 and 40 years of age. Papillary thyroid carcinoma (PTC) is the most common type and accounts for approximately 80% of all thyroid carcinomas. In recent years, the incidence of thyroid cancer in China has been rising. When thyroid cancer is detected early and treated promptly, the prognosis is good and the 10-year survival rate can exceed 90%. However, if the diagnosis is missed at an early stage and the disease progresses to a locally advanced stage, patients may lose the opportunity for surgery and cannot be cured, and the 5-year survival rate falls markedly.

The routine clinical diagnostic method is imaging. Thyroid nodules that are highly suspicious for malignancy on ultrasound require further fine-needle aspiration (FNA) cytology to confirm the diagnosis. Malignant and benign nodules have similar cytological features, which makes PTC difficult to diagnose, and up to 40% of thyroid nodules cannot be accurately diagnosed from cytological features alone. Current molecular diagnostic methods improve diagnostic accuracy, but their sensitivity still needs to be improved.

The Gene Expression Classifier (GEC) is commonly used, but its positive predictive value (PPV) is only 47%. It can also be performed only on fresh aspirate samples, which limits its wider application. ThyroSeq v2 detects H/K/NRAS gene mutations and RET/PTC gene rearrangements, which are frequently carried by benign nodules, and its PPV is only 42-77%. In addition, the diagnostic DNA Methylation Signature approach (DDMS) distinguishes benign from malignant thyroid tissue on the basis of DNA methylation characteristics. Although this method is highly accurate, some samples cannot be tested with it for technical reasons [John H. Yim, Audrey H. Choi, Arthur X. Li, et al. Identification of Tissue-Specific DNA Methylation Signatures for Thyroid Nodule Diagnostics. Clin Cancer Res. 2019 Jan 15;25(2):544-551].

Disclosure of Invention

In a first aspect, the present invention provides two uses for diagnosing whether a thyroid nodule in an individual is benign or malignant. The first is the use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more markers of interest in the manufacture of a detection reagent or diagnostic kit. The second is the use of a device for determining the methylation status or level of at least one CpG dinucleotide of one or more markers of interest in the manufacture of a diagnostic kit. The one or more markers of interest are selected from the group consisting of the corresponding sequence of each of the following genes or genomic regions (for example, the PRDM16 sequence of the PRDM16 gene or genome): PRDM16, CAMK2N1, TACSTD2, CRABP2, IER5, ITPKB, ITGB1BP1, MTHFD2, BIN1, DNASE1L3, LSG1, SH3BP2, SLC12A7, NR2F1, EGR1, LARP1, RARS, TTBK1, FAM20C, CREB5, LIMK1, PRKAG2, SLC39A14, EGR3, DUSP26, AGPAT2, NRARP, EGR2, PPIF, CHID1, ADM, NAV2, EHBP1L1, PHLDB1, PARP11, ANO6, PLXNC1, ZNF219, FOXA1, PAPLN, UACA, PGPEP1L, ITPRIPL2, TNK1, RPL19, ICAM2, TMC6, CEP295NL, BAIAP2, TBCD, METRNL, MED16, SBNO2, CIRBP, KLF16, C19orf77, SNAPC2, ICAM1, ICAM5, IER2, ASF1B, CRTC1, ZNF536, LTBP4, NOL4L-DT, KCNK15, UCKL1, RTN4R, BCR and TEF.

In one or more embodiments, the one or more markers of interest are selected from the group consisting of: the PRDM16 sequence of the PRDM16 gene or genome, the BIN1 sequence of the BIN1 gene or genome, the LIMK1 sequence of the LIMK1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the PPIF sequence of the PPIF gene or genome, the ZNF219 sequence of the ZNF219 gene or genome, the UACA sequence of the UACA gene or genome, the TNK1 sequence of the TNK1 gene or genome, the CEP295NL sequence of the CEP295NL gene or genome, the SBNO2 sequence of the SBNO2 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the ICAM5 sequence of the ICAM5 gene or genome, the CRTC1 sequence of the CRTC1 gene or genome, the RTN4R sequence of the RTN4R gene or genome, the CAMK2N1 sequence of the CAMK2N1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, a cstr 2 sequence of a csag 2 gene or genome, a cstr 2 sequence of a cstr 2 gene or genome, a cstr 2 or a cstr 2 gene or genome.

In one or more embodiments, the one or more target markers include at least one of the following target markers: the EGR3 sequence of the EGR3 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, and the BCR sequence of the BCR gene or genome.

In one or more embodiments, the one or more target markers comprise: the PRDM16 gene or genome PRDM16 sequence, BIN1 sequence of a BIN1 gene or genome, LIMK1 sequence of a LIMK1 gene or genome, EGR3 sequence of an EGR3 gene or genome, PPIF sequence of a PPIF gene or genome, ZNF219 sequence of a ZNF219 gene or genome, UACA sequence of a UACA gene or genome, TNK1 sequence of a TNK1 gene or genome, CEP295NL sequence of a CEP295NL gene or genome, SBNO2 sequence of a SBNO2 gene or genome, C19orf77 sequence of a C19orf77 gene or genome, ICAM5 sequence of an ICAM5 gene or genome, CRTC1 sequence of a CRTC1 gene or genome, and RTN4R sequence of an RTN4R gene or genome.

In one or more embodiments, the one or more target markers comprise: the CAMK2N1 sequence of the CAMK2N1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the ICAM2 sequence of the ICAM2 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, and the NOL4L-DT sequence of the NOL4L-DT gene or genome.

In one or more embodiments, the one or more target markers comprise: the TACSTD2 sequence of a TACSTD2 gene or genome, the CRABP2 sequence of a CRABP2 gene or genome, the DNASE1L3 sequence of a DNASE1L3 gene or genome, the LSG1 sequence of an LSG1 gene or genome, the EGR3 sequence of an EGR3 gene or genome, the TNK1 sequence of a TNK1 gene or genome, the BAIAP2 sequence of a BAIAP2 gene or genome, the NOL4L-DT sequence of a NOL4L-DT gene or genome, and the BCR sequence of a BCR gene or genome.

In one or more embodiments, the one or more target markers comprise: the TACSTD2 sequence of a TACSTD2 gene or genome, the CRABP2 sequence of a CRABP2 gene or genome, the DNASE1L3 sequence of a DNASE1L3 gene or genome, the EGR3 sequence of an EGR3 gene or genome, the DUSP26 sequence of a DUSP26 gene or genome, the TNK1 sequence of a TNK1 gene or genome, the BAIAP2 sequence of a BAIAP2 gene or genome, the NOL4L-DT sequence of a NOL4L-DT gene or genome, and the BCR sequence of a BCR gene or genome.

In one or more embodiments, the one or more target markers comprise: the TACSTD2 sequence of a TACSTD2 gene or genome, the DNASE1L3 sequence of a DNASE1L3 gene or genome, the EGR3 sequence of an EGR3 gene or genome, the DUSP26 sequence of a DUSP26 gene or genome, the TNK1 sequence of a TNK1 gene or genome, the BAIAP2 sequence of a BAIAP2 gene or genome, the MED16 sequence of a MED16 gene or genome, the NOL4L-DT sequence of a NOL4L-DT gene or genome, and the BCR sequence of a BCR gene or genome.

In one or more embodiments, the Hg19 coordinates of the one or more target markers are as follows: PRDM16 gene: chr1:3155051:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:5904615:59042314; CRABP2 gene: chr1:15667274:156676773; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3BP2 gene: chr4:2795932:2795331; SLC12A7 gene: chr5:1117661:1118360; NR2F1 gene: chr5:92914797:92915496; EGR1 gene: chr5:137802399:137803098; LARP1 gene: chr5:154133955:154134654; RARS gene: chr5:167837780:167838499; TTBK1 gene: chr6:43215063:43215762; FAM20C gene: chr7:193512:194211; CREB5 gene: chr7:28449041:28449740; LIMK1 gene: chr7:73508743:73509442; PRKAG2 gene: chr7:151424814:151425513; SLC39A14 gene: chr8:22236914:22237613; EGR3 gene: chr8:22547976:22549090; DUSP26 gene: chr8:34104888:34105587; AGPAT2 gene: chr9:139581855:139582554; NRARP gene: chr9:140205734:140206433; EGR2 gene: chr10:64578269:64578968; PPIF gene: chr10:81001706:81002405; CHID1 gene: chr11:911289:911988; ADM gene: chr11:10328946:10329645; NAV2 gene: chr11:19734801:19736359; EHBP1L1 gene: chr11:65343387:65344086; PHLDB1 gene: chr11:118479144:118479843; PARP11 gene: chr12:4139935:4140634; ANO6 gene: chr12:45610331:45611030; PLXNC1 gene: chr12:94544076:94544775; ZNF219 gene: chr14:21559748:21560447; FOXA1 gene: chr14:3806876:38065575; PAPLN gene: chr14:73704629:73705328; UACA gene: chr15:70766881:70767580; PGPEP1L gene: chr15:99466242:99466941; ITPRIPL2 gene: chr16:19125694:19126393; TNK1 gene: chr17:7286958:7287657; RPL19 gene: chr17:37366033:37366732; ICAM2 gene: chr17:6207008:6207677; TMC6 gene: chr17:7613226:7624091; CEP295NL gene: chr17:7689761:768880460; BAIAP2 gene: chr17:79060865:79061564; TBCD gene: chr17:80744791:80745490; METRNL gene: chr17:81083812:81084511; MED16 gene: chr19:883793:884492; SBNO2 gene: chr19:1177275:1177974; CIRBP gene: chr19:1265690:1266389; KLF16 gene: chr19:1860343:1861042; C19orf77 gene: chr19:34666:3435687; SNAPC2 gene: chr19:7985709:7986108; ICAM1 gene: chr19:10381317:10382016; ICAM5 gene: chr19:10404832:10405531; IER2 gene: chr19:13266647:13267346; ASF1B gene: chr19:14248133:14248172; CRTC1 gene: chr19:18770961:18771660; ZNF536 gene: chr19:31039247:31039946; LTBP4 gene: chr19:41105706:41106405; NOL4L-DT gene: chr20:31162101:31162800; KCNK15 gene: chr20:43374048:43374747; UCKL1 gene: chr20:62588113:62588812; RTN4R gene: chr22:20226373:20227274; BCR gene: chr22:23624092:23624791; TEF gene: chr22:41771229:41771928.
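The coordinates above use a chromosome:start:end notation. The sketch below parses such entries into records so that region lengths can be checked. It assumes both ends are inclusive, which the text does not state. The helper names `Region`, `parse_regions` and `inverted` are hypothetical. The parser tolerates the "chr 4" spacing and "," separators that occur in the text. A simple check then flags any entry whose end coordinate precedes its start, because such an entry cannot describe a valid interval.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    gene: str
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: both coordinates are inclusive (not stated in the source).
        return self.end - self.start + 1


# Matches entries such as "CAMK2N1 gene: chr1:20813203:20813902".
# Also tolerates "chr 4" spacing and "," separators.
ENTRY = re.compile(
    r"(?P<gene>[A-Za-z0-9-]+) gene:\s*chr\s*(?P<chrom>[0-9XY]+)"
    r"[:,]\s*(?P<start>\d+)[:,]\s*(?P<end>\d+)",
    re.IGNORECASE,
)


def parse_regions(text: str) -> list[Region]:
    """Extract every 'GENE gene: chrN:start:end' entry from free text."""
    return [
        Region(m["gene"], "chr" + m["chrom"], int(m["start"]), int(m["end"]))
        for m in ENTRY.finditer(text)
    ]


def inverted(regions: list[Region]) -> list[Region]:
    """Return entries whose end precedes the start (not a valid interval)."""
    return [r for r in regions if r.end < r.start]


text = (
    "PRDM16 gene: chr1:3155051:3155760; CAMK2N1 gene: chr1:20813203:20813902; "
    "BIN1 gene: chr2, 127822196, 127822895; SH3BP2 gene: chr 4:2795932:2795331"
)
regions = parse_regions(text)
print([(r.gene, r.length) for r in regions])
print([r.gene for r in inverted(regions)])
```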

In one or more embodiments, the Hg19 coordinates of the one or more target markers are as follows: PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041685:59042064; CRABP2 gene: chr1:15667684:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH3BP2 gene: chr4:279282:2795581; SLC12A7 gene: chr5:1117911:1118110; NR2F1 gene: chr5:92915047:92915246; EGR1 gene: chr5:137802649:137802848; LARP1 gene: chr5:154134205:154134404; RARS gene: chr5:167838020:167838129; TTBK1 gene: chr6:43215313:43215512; FAM20C gene: chr7:193762:193961; CREB5 gene: chr7:28449291:28449490; LIMK1 gene: chr7:73508993:73509192; PRKAG2 gene: chr7:151425064:151425263; SLC39A14 gene: chr8:22237164:22237363; EGR3 gene: chr8:22548226:22548425; EGR3 gene: chr8:22548641:22548840; DUSP26 gene: chr8:3405138:34105337; AGPAT2 gene: chr9:139582105:13958234; NRARP gene: chr9:140205984:140206183; EGR2 gene: chr10:64578519:6457878; PPIF gene: chr10:81001956:81002155; CHID1 gene: chr11:911539:911738; ADM gene: chr11:1032996:10329395; NAV2 gene: chr11:19735051:19735250; NAV2 gene: chr11:19735910:19736109; EHBP1L1 gene: chr11:65343637:65343836; PHLDB1 gene: chr11:118479394:118479593; PARP11 gene: chr12:4140185:4140384; ANO6 gene: chr12:45610581:45610780; PLXNC1 gene: chr12:94544326:94544525; ZNF219 gene: chr14:21559998:2156097; FOXA1 gene: chr14:380665126:380565325; PAPLN gene: chr14:73704879:73705078; UACA gene: chr15:70767131:70767330; PGPEP1L gene: chr15:99466492:466691; ITPRIPL2 gene: chr16:19125944:19126143; TNK1 gene: chr17:7287208:7287407; RPL19 gene: chr17:37366283:37366482; ICAM2 gene: chr17:62076858:62057657; TMC6 gene: chr17:7613476:7613675; TMC6 gene: chr17:7623642:7623841; CEP295NL gene: chr17:768880011:768880210; BAIAP2 gene: chr17:79061115:79061314; TBCD gene: chr17:80745041:80745240; METRNL gene: chr17:81084062:81084261; MED16 gene: chr19:884043:884242; SBNO2 gene: chr19:1177525:1177724; CIRBP gene: chr19:1265940:1266139; KLF16 gene: chr19:1860593:1860792; C19orf77 gene: chr19:34916:3435115; C19orf77 gene: chr19:3435238:3435437; SNAPC2 gene: chr19:7985959:7986158; ICAM1 gene: chr19:10381567:10381766; ICAM5 gene: chr19:10405082:10405281; IER2 gene: chr19:13266897:13267096; ASF1B gene: chr19:14248383:14248582; CRTC1 gene: chr19:18771211:18771410; ZNF536 gene: chr19:31039497:31039696; LTBP4 gene: chr19:41105956:4106155; NOL4L-DT gene: chr20:31162351:31162550; KCNK15 gene: chr20:43374298:43374497; UCKL1 gene: chr20:62588863:6258562; RTN4R gene: chr22:20226623:20226822; RTN4R gene: chr22:20226825:20227024; BCR gene: chr22:23624342:23624541; TEF gene: chr22:41771479:41771678.
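Several genes, for example EGR3, have one broader region in the first coordinate list and one or more narrower regions in the second. The sketch below checks whether each narrower EGR3 region lies within the broader one, using the values given in the text. It assumes inclusive coordinates. The `Interval` class is hypothetical and not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # assumed inclusive (not stated in the source)
    end: int    # assumed inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    def contains(self, other: "Interval") -> bool:
        """True if `other` lies entirely within this interval."""
        return (self.chrom == other.chrom
                and self.start <= other.start
                and other.end <= self.end)


# Broader EGR3 region and the two narrower EGR3 regions listed in the text.
egr3_region = Interval("chr8", 22547976, 22549090)
egr3_subregions = [Interval("chr8", 22548226, 22548425),
                   Interval("chr8", 22548641, 22548840)]

for sub in egr3_subregions:
    print(sub.length, egr3_region.contains(sub))
```

Both narrower regions are 200 bp long under the inclusive assumption. The same check can be applied to every gene that appears in both lists.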

In one or more embodiments, the reagents include primer and/or probe molecules; preferably, the primer molecule is identical, complementary or hybridizes under stringent conditions to the one or more target markers and comprises at least 9 consecutive nucleotides, and the probe molecule hybridizes under stringent conditions to the amplification product of the one or more target markers.
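The primer criterion can be checked computationally for exact matches. This sketch assumes that "identical or complementary" means an exact substring match against the target strand or against its reverse complement. Hybridization under stringent conditions, which tolerates some mismatches, is not modeled. The function names and the target sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_matches(primer: str, target: str, min_len: int = 9) -> bool:
    """True if the primer (at least min_len nt) is identical to a stretch of
    the target, or is the reverse complement of one (exact matches only)."""
    primer, target = primer.upper(), target.upper()
    if len(primer) < min_len:
        return False
    return primer in target or reverse_complement(primer) in target


target = "TTGACGCGTACCGGATCGA"  # hypothetical target sequence
print(primer_matches("GACGCGTAC", target))  # identical stretch
print(primer_matches("TCCGGTACG", target))  # complementary stretch
print(primer_matches("GACG", target))       # shorter than 9 nt
```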

In one or more embodiments, the reagents include those required to implement genome-simplified (reduced-representation) methylation sequencing. In one or more embodiments, these include reagents for enzymatic digestion, reagents for library construction (e.g., end repair, A-tailing and adapter ligation), reagents for cytosine conversion, reagents for PCR amplification, and the like. One or more of these reagents may be included in the detection reagent or diagnostic kit of the present invention.
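The text does not name the restriction enzyme used in the digestion step. Typical reduced-representation bisulfite sequencing uses MspI, which recognizes C^CGG, and that enzyme is assumed here purely for illustration. The sketch performs an in-silico digestion of one strand followed by size selection. The function names, the size range and the input sequence are hypothetical.

```python
import re


def digest(seq: str, site: str = "CCGG", cut_offset: int = 1) -> list[str]:
    """Cut `seq` (one strand) `cut_offset` bases into each occurrence of `site`.

    The defaults model MspI (C^CGG), an assumption; the source does not name the enzyme.
    """
    seq = seq.upper()
    # A zero-width lookahead finds every occurrence of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep fragments whose length falls within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


fragments = digest("AACCGGTTTCCGGAA")
print(fragments)
print(size_select(fragments, 4, 10))
```

Each internal fragment starts with CGG, so every retained fragment carries a CpG at its end. That is the property that enriches reduced-representation libraries for CpG-containing sequence.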

In a second aspect, the invention provides a diagnostic reagent or diagnostic kit for distinguishing benign from malignant thyroid nodules. It detects the methylation state or methylation level of at least one CpG dinucleotide of one or more markers of interest according to any of the embodiments herein, and it comprises a reagent for detecting the methylation state or level of at least one CpG dinucleotide of the one or more markers of interest.

In one or more embodiments, the diagnostic reagent or diagnostic kit comprises a primer and/or probe molecule. The primer molecule is identical or complementary to, or hybridizes under stringent conditions to, the one or more target markers and comprises at least 9 consecutive nucleotides. The probe molecule hybridizes under stringent conditions to the amplification products of the one or more target markers. Optionally, the diagnostic reagent or diagnostic kit further comprises a primer molecule and/or a probe molecule for detecting the reference gene ACTB.

In one or more embodiments, the diagnostic reagent or diagnostic kit further comprises one or more substances selected from the group consisting of: PCR buffer, polymerase, dNTPs, restriction endonuclease, digestion buffer, fluorescent dye, fluorescence quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard, control, KCl, MgCl₂ and (NH₄)₂SO₄.

In one or more embodiments, the reagent for detecting methylation further comprises reagents used in one or more of the following methods: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction enzyme analysis, fluorescence quantitative PCR, methylation-sensitive high-resolution melting (MS-HRM) analysis, chip-based methylation profiling, and mass spectrometry.

In one or more embodiments, the agent is selected from one or more of the following: bisulfite and derivatives thereof, fluorescent dyes, fluorescence quenchers, fluorescence reporters, internal standards and controls.

In a third aspect, the invention provides the use of at least one reagent or set of reagents that distinguishes between methylated and unmethylated CpG dinucleotides within at least one target region of genomic DNA, in a method of detecting and/or classifying benign and malignant thyroid nodules in an individual. The method comprises contacting genomic DNA isolated from a biological sample of the individual with the at least one reagent or set of reagents. The target region is identical or complementary to a sequence of at least 16 consecutive nucleotides of one or more markers of interest as described in any of the embodiments herein, and those consecutive nucleotides comprise at least one CpG dinucleotide. The method thereby at least partially provides for the detection and/or classification of benign and malignant thyroid nodules in the individual.
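The requirement that a target region span at least 16 consecutive nucleotides and include at least one CpG can be illustrated by enumerating all qualifying 16-nt windows of a sequence. The function name and the demo sequence are hypothetical. In practice, the input would be a marker sequence retrieved for one of the Hg19 regions listed above.

```python
def cpg_windows(seq: str, k: int = 16) -> list[tuple[int, str]]:
    """Return (0-based start, window) for every k-mer containing at least one CpG."""
    seq = seq.upper()
    return [(i, seq[i:i + k])
            for i in range(len(seq) - k + 1)
            if "CG" in seq[i:i + k]]


demo = "A" * 20 + "CG" + "A" * 20  # hypothetical sequence with a single CpG
windows = cpg_windows(demo)
print(len(windows), windows[0][0], windows[-1][0])
```

Only windows that cover both the C and the G of the CpG qualify. For a single CpG, that gives k - 1 windows.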

In a fourth aspect, the invention provides the use of three components in the preparation of a kit for use in a method of detecting and/or classifying benign and malignant thyroid nodules in an individual. The components are: one or more reagents that convert cytosine bases unmethylated at the 5-position to uracil, or to another base that is detectably different from cytosine in its hybridization properties; an amplification enzyme; and at least one primer comprising at least 9 consecutive nucleotides. The method comprises: a) isolating genomic DNA from a biological sample of the individual; b) treating the genomic DNA of a), or a fragment thereof, with the one or more reagents; c) contacting the treated genomic DNA, or a treated fragment thereof, with the amplification enzyme and the at least one primer, where the primer is identical or complementary to, or hybridizes under stringent conditions to, one or more markers of interest as described in any of the embodiments herein, whereby the treated genomic DNA or fragment thereof either is amplified to produce at least one amplification product or is not amplified; and d) determining, based on the presence or properties of the amplification product, the methylation status or level of at least one CpG dinucleotide of the one or more markers of interest, or a mean value, or a value reflecting the mean methylation status or level, of a plurality of CpG dinucleotides of the one or more markers of interest, thereby at least partially detecting and/or classifying the thyroid nodule of the individual as benign or malignant.
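Steps b) and d) can be sketched in silico. The sketch assumes a single DNA strand whose methylated cytosine positions are known. Every unmethylated C reads as T after conversion and PCR, while 5-methylcytosine remains C. For step d), the sketch takes the mean as the unweighted mean of per-CpG methylation levels derived from (methylated, total) read counts. That is only one of several possible definitions. The function names and read counts are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate bisulfite treatment of one strand.

    Every C whose 0-based position is not in `methylated` becomes T (it is
    converted to uracil and read as thymine after PCR); 5-methylcytosines stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def mean_methylation(counts: list[tuple[int, int]]) -> float:
    """Unweighted mean of per-CpG levels from (methylated reads, total reads)."""
    levels = [m / t for m, t in counts if t > 0]
    if not levels:
        raise ValueError("no CpG with read coverage")
    return sum(levels) / len(levels)


print(bisulfite_convert("ACGTCCGA", {1}))    # CpG at position 1 methylated
print(bisulfite_convert("ACGTCCGA", set()))  # fully unmethylated
print(mean_methylation([(8, 10), (2, 10)]))  # hypothetical read counts
```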

In one or more embodiments, in step b), the genomic DNA or fragment thereof is treated with an agent selected from the group consisting of bisulfites, acid sulfites, metabisulfites, and combinations thereof.

In one or more embodiments, the contacting or amplifying in c) is performed by one or more of the following: using a thermostable DNA polymerase as the amplification enzyme, using a polymerase lacking 5'-3' exonuclease activity, using the polymerase chain reaction, and/or producing an amplification product carrying a detectable label.

In one or more embodiments, the contacting or amplifying in c) comprises using methylation specific primers.
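The sketch below derives the sequences of a methylation-specific PCR (MSP) primer pair from a genomic primer-binding region. It assumes forward primers on the bisulfite-converted top strand; reverse primers would be designed on the reverse complement. A C at the 3' end, whose next base is unknown, is treated as non-CpG. The function name and region are hypothetical. Practical MSP design also places CpGs near the primer 3' end and checks melting temperature, which this sketch omits.

```python
def msp_primers(region: str) -> tuple[str, str]:
    """Return (methylated-specific, unmethylated-specific) forward primers.

    A C followed by G (CpG) stays C in the methylated version and becomes T
    in the unmethylated one; every other C becomes T in both versions.
    """
    region = region.upper()
    m_primer, u_primer = [], []
    for i, base in enumerate(region):
        if base != "C":
            m_primer.append(base)
            u_primer.append(base)
        elif region[i + 1:i + 2] == "G":  # CpG cytosine
            m_primer.append("C")
            u_primer.append("T")
        else:  # non-CpG cytosine (or a 3'-terminal C with unknown next base)
            m_primer.append("T")
            u_primer.append("T")
    return "".join(m_primer), "".join(u_primer)


methylated_fw, unmethylated_fw = msp_primers("GCTCGTTCGAT")  # hypothetical region
print(methylated_fw, unmethylated_fw)
```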

In a fifth aspect, the present invention provides the use of one or more methylation-sensitive restriction enzymes, an amplification enzyme and at least one primer comprising at least 9 consecutive nucleotides in the preparation of a kit for use in a method of detecting and/or classifying benign and malignant thyroid nodules in an individual. The primer is identical or complementary to, or hybridizes under stringent conditions to, one or more of the markers of interest described in any of the embodiments herein. The method comprises: a) isolating genomic DNA from a biological sample of the individual; b) digesting the genomic DNA of a), or a fragment thereof, with the one or more methylation-sensitive restriction enzymes, and contacting the resulting digestion product with the amplification enzyme and the at least one primer; and c) determining the methylation status or level of at least one CpG dinucleotide of the one or more markers of interest based on the presence or properties of the amplification product, thereby at least partially detecting and/or classifying the thyroid nodule of the individual as benign or malignant.
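The text does not name the methylation-sensitive restriction enzyme. For illustration, the sketch assumes HpaII, which recognizes CCGG and does not cut when the internal CpG cytosine is methylated. Under that assumption, an amplification product is expected only when every site within the amplicon is methylated. An amplicon with no site always amplifies, so it can serve as a digestion-independent control. The function name and sequences are hypothetical.

```python
import re


def msre_amplifies(amplicon: str, methylated: set[int]) -> bool:
    """Predict whether an amplicon survives digestion by an HpaII-like enzyme.

    Assumes cutting at CCGG unless the internal CpG cytosine (position +1
    within CCGG) is methylated. `methylated` holds 0-based positions of
    5-methylcytosines in the amplicon.
    """
    amplicon = amplicon.upper()
    for m in re.finditer("(?=CCGG)", amplicon):
        if m.start() + 1 not in methylated:
            return False  # unmethylated site is cut, so no product forms
    return True


amplicon = "AACCGGTTCCGGAA"  # hypothetical amplicon with two CCGG sites
print(msre_amplifies(amplicon, {3, 9}))  # both sites methylated
print(msre_amplifies(amplicon, {3}))     # second site unmethylated
```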

In one or more embodiments, the presence or absence of an amplification product is determined by hybridizing at least one nucleic acid or peptide nucleic acid that is identical or complementary to a fragment, at least 16 bases long, of a sequence selected from the one or more markers of interest.

In a sixth aspect, the invention provides the use of a treated nucleic acid derived from one or more markers of interest as described in any of the embodiments herein in the manufacture of a kit for distinguishing benign from malignant thyroid nodules. The treatment is suitable for converting at least one unmethylated cytosine base of the one or more markers of interest to uracil, or to another base that is detectably different from cytosine in its hybridization properties.

A seventh aspect of the present invention provides an apparatus for detecting and diagnosing benign and malignant thyroid nodules in an individual. The apparatus comprises a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the program, the processor performs the following steps: (1) obtaining the methylation level or methylation state of at least one CpG dinucleotide of one or more of the markers of interest described in any of the embodiments herein in a sample; and (2) determining whether the thyroid nodule is benign or malignant according to the methylation level or methylation state obtained in (1).
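The text does not disclose how step (2) maps methylation levels to a benign or malignant call. The sketch shows one possibility: a logistic score compared against a threshold. The marker names, weights, intercept and 0.5 threshold are all hypothetical placeholders, not values from the invention. A real model would be fitted on labelled samples.

```python
import math

# Hypothetical model parameters for illustration only; not from the source.
WEIGHTS = {"marker_A": 4.0, "marker_B": -2.0}
INTERCEPT = -1.0


def malignancy_probability(levels: dict[str, float],
                           weights: dict[str, float] = WEIGHTS,
                           intercept: float = INTERCEPT) -> float:
    """Step (2) sketch: logistic score from per-marker methylation levels (0-1)."""
    missing = [name for name in weights if name not in levels]
    if missing:
        raise ValueError(f"missing methylation levels for: {missing}")
    z = intercept + sum(w * levels[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def interpret(levels: dict[str, float], threshold: float = 0.5) -> str:
    """Return 'malignant' if the score reaches the threshold, else 'benign'."""
    return "malignant" if malignancy_probability(levels) >= threshold else "benign"


print(interpret({"marker_A": 0.9, "marker_B": 0.1}))
print(interpret({"marker_A": 0.1, "marker_B": 0.8}))
```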

Drawings

Fig. 1: ROC curves of the model constructed with the marker combination of Example 1 for diagnosing malignant nodules in the training set and in two validation sets.

Fig. 2: ROC curves of the model constructed with the marker combination of Example 2 for diagnosing malignant nodules in the training set and in two validation sets.

Fig. 3: ROC curves of the model constructed with the marker combination of Example 3 for diagnosing malignant nodules in the training set and in two validation sets.

Fig. 4: ROC curves of the model constructed with the marker combination of Example 4 for diagnosing malignant nodules in the training set and in two validation sets.

Fig. 5: ROC curves of the model constructed with the marker combination of Example 5 for diagnosing malignant nodules in the training set and in two validation sets.
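The figures report ROC curves, but the text does not state how the area under the curve (AUC) was computed. The sketch uses the standard equivalence between ROC AUC and the Mann-Whitney statistic. The AUC equals the probability that a randomly chosen malignant sample scores higher than a randomly chosen benign one, with ties counted as one half. The scores and labels are hypothetical.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC via pairwise comparison (Mann-Whitney); label 1 = malignant, 0 = benign."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # perfect separation
print(roc_auc([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))  # one mis-ordered pair
```

The pairwise loop is quadratic in sample size. For large cohorts, a rank-based computation gives the same value more efficiently.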

Detailed Description

While various aspects and embodiments of the present application have been disclosed, various equivalent changes or modifications can be made by those skilled in the art without departing from the spirit and scope of the present application. The various aspects and embodiments disclosed herein are illustrative and not intended to limit the scope of the application, which is defined in the claims. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. All references, patents and patent applications cited in this application are incorporated by reference into this application.

It is noted that, in the specification and claims of this application, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "an agent" includes a plurality of agents.

In the description and claims of this application, unless otherwise indicated, the terms "comprising," "including," or "containing" are intended to include the listed values, steps, or components, but do not exclude the presence of other values, steps, or components.

As a result of intensive studies, the present inventors have found target markers associated with malignant thyroid nodules, namely the corresponding sequence of each of the following genes or genomic regions: PRDM16, CAMK2N1, TACSTD2, CRABP2, IER5, ITPKB, ITGB1BP1, MTHFD2, BIN1, DNASE1L3, LSG1, SH3BP2, SLC12A7, NR2F1, EGR1, LARP1, RARS, TTBK1, FAM20C, CREB5, LIMK1, PRKAG2, SLC39A14, EGR3, DUSP26, AGPAT2, NRARP, EGR2, PPIF, CHID1, ADM, NAV2, EHBP1L1, PHLDB1, PARP11, ANO6, PLXNC1, ZNF219, FOXA1, PAPLN, UACA, PGPEP1L, ITPRIPL2, TNK1, RPL19, ICAM2, TMC6, CEP295NL, BAIAP2, TBCD, METRNL, MED16, SBNO2, CIRBP, KLF16, C19orf77, SNAPC2, ICAM1, ICAM5, IER2, ASF1B, CRTC1, ZNF536, LTBP4, NOL4L-DT, KCNK15, UCKL1, RTN4R, BCR and TEF. Benign and malignant thyroid nodules can be distinguished by detecting the methylation level of one or more of these target markers in a DNA-containing biological sample from the individual.

I. Target marker and target region thereof

As used herein, the term "target marker" refers to a nucleic acid or gene region of interest whose methylation level indicates whether a thyroid nodule is benign or malignant. The term "target marker" shall be considered to include all transcript variants of the genes described herein and all of their promoter and regulatory elements. As will be appreciated by those skilled in the art, certain genes are known to exhibit allelic variation or single nucleotide polymorphisms ("SNPs") between individuals. Such polymorphisms also include insertions and deletions in simple repeat sequences of different lengths (e.g., dinucleotide and trinucleotide repeats). The present application should therefore be understood to extend to all forms of the markers/genes arising from any other mutation, polymorphism or allelic variation. In addition, the term "target marker" shall include both the sense strand sequence and the antisense strand sequence of a marker or gene.

The term "target marker" as used herein is to be construed broadly to include both 1) the original marker found in a biological sample or genomic DNA (in a specific methylation state) and 2) its treated sequence (e.g., the corresponding region after bisulfite conversion or the corresponding region after treatment with a methylation-sensitive restriction enzyme (MSRE)). The corresponding region after bisulfite conversion differs from the target marker in the genomic sequence in that one or more unmethylated cytosine residues are converted to uracil, thymine, or other bases that differ from cytosine in hybridization behavior. The MSRE-treated corresponding region differs from the target marker in the genomic sequence in that the sequence is cleaved at one or more MSRE cleavage sites.

The molecular diagnosis of the invention comprises early and late diagnosis of malignant thyroid tumors, as well as screening, risk assessment, prognosis and disease identification for malignant thyroid tumors. Early diagnosis refers to detecting cancer before metastasis, preferably before morphological changes in the tissue or cells can be observed.

In this context, it should be understood that the target markers PRDM16, CAMK2N1, TACSTD2, CRABP2, IER5, ITPKB, ITGB1BP1, MTHFD2, BIN1, DNASE1L3, LSG1, SH3BP2, SLC12A7, NR2F1, EGR1, LARP1, RARS, TTBK1, FAM20C, CREB5, LIMK1, PRKAG2, SLC39A14, EGR3, DUSP26, AGPAT2, NRARP, EGR2, PPIF, CHID1, ADM, NAV2, EHBP1L1, PHLDB1, PARP11, ANO6, PLXNC1, ZNF219, FOXA1, PAPLN, UACA, PGPEP1L, ITPRIPL2, TNK1, RPL19, ICAM2, TMC6, CEP295NL, BAIAP2, TBCD, METRNL, MED16, SBNO2, CIRBP, KLF16, C19orf77, SNAPC2, ICAM1, ICAM5, IER2, ASF1B, CRTC1, ZNF536, LTBP4, NOL4L-DT, KCNK15, UCKL1, RTN4R, BCR and TEF may be referred to either by their gene names or by their chromosomal coordinates, and the two designations are equivalent for the products, uses and methods described herein. The chromosomal coordinates correspond to the Hg19 version of the human genome assembly released in February 2009 (referred to herein as "Hg19 coordinates"). It is understood that the sequence of a gene and its genome as described herein also includes fragments of that gene containing at least one CpG dinucleotide. In some embodiments, the fragment is a region of interest of each gene described herein.

In some embodiments, the Hg19 coordinates of each gene mentioned herein are as follows: PRDM16 gene: chr1:3155051:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:5904615:59042314; CRABP2 gene: chr1:15667274:156676773; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3BP2 gene: chr4:2795932:2795331; SLC12A7 gene: chr5:1117661:1118360; NR2F1 gene: chr5:92914797:92915496; EGR1 gene: chr5:137802399:137803098; LARP1 gene: chr5:154133955:154134654; RARS gene: chr5:167837780:167838499; TTBK1 gene: chr6:43215063:43215762; FAM20C gene: chr7:193512:194211; CREB5 gene: chr7:28449041:28449740; LIMK1 gene: chr7:73508743:73509442; PRKAG2 gene: chr7:151424814:151425513; SLC39A14 gene: chr8:22236914:22237613; EGR3 gene: chr8:22547976:22549090; DUSP26 gene: chr8:34104888:34105587; AGPAT2 gene: chr9:139581855:139582554; NRARP gene: chr9:140205734:140206433; EGR2 gene: chr10:64578269:64578968; PPIF gene: chr10:81001706:81002405; CHID1 gene: chr11:911289:911988; ADM gene: chr11:10328946:10329645; NAV2 gene: chr11:19734801:19736359; EHBP1L1 gene: chr11:65343387:65344086; PHLDB1 gene: chr11:118479144:118479843; PARP11 gene: chr12:4139935:4140634; ANO6 gene: chr12:45610331:45611030; PLXNC1 gene: chr12:94544076:94544775; ZNF219 gene: chr14:21559748:21560447; FOXA1 gene: chr14:3806876:38065575; PAPLN gene: chr14:73704629:73705328; UACA gene: chr15:70766881:70767580; PGPEP1L gene: chr15:99466242:99466941; ITPRIPL2 gene: chr16:19125694:19126393; TNK1 gene: chr17:7286958:7287657; RPL19 gene: chr17:37366033:37366732; ICAM2 gene: chr17:6207008:6207677; TMC6 gene: chr17:7613226:7624091; CEP295NL gene: chr17:7689761:768880460; BAIAP2 gene: chr17:79060865:79061564; TBCD gene: chr17:80744791:80745490; METRNL gene: chr17:81083812:81084511; MED16 gene: chr19:883793:884492; SBNO2 gene: chr19:1177275:1177974; CIRBP gene: chr19:1265690:1266389; KLF16 gene: chr19:1860343:1861042; C19orf77 gene: chr19:34666:3435687; SNAPC2 gene: chr19:7985709:7986108; ICAM1 gene: chr19:10381317:10382016; ICAM5 gene: chr19:10404832:10405531; IER2 gene: chr19:13266647:13267346; ASF1B gene: chr19:14248133:14248172; CRTC1 gene: chr19:18770961:18771660; ZNF536 gene: chr19:31039247:31039946; LTBP4 gene: chr19:41105706:41106405; NOL4L-DT gene: chr20:31162101:31162800; KCNK15 gene: chr20:43374048:43374747; UCKL1 gene: chr20:62588113:62588812; RTN4R gene: chr22:20226373:20227274; BCR gene: chr22:23624092:23624791; TEF gene: chr22:41771229:41771928.

In some embodiments, the EGR3 gene, NAV2 gene, TMC6 gene, C19orf77 gene and RTN4R gene may each comprise the following two regions (Hg19 coordinates):

EGR3 gene: chr8:22547976:22548675; chr8:22548391:22549090;

NAV2 gene: chr11:19734801:19735500; chr11:19735660:19736359;

TMC6 gene: chr17:7613226:76101925; chr17:7623399:7624091;

C19orf77 gene: chr19:34666:3435365; chr19:34988:3435687;

RTN4R gene: chr22:20226373:20227072; chr22:20226575:20227274.

In a further preferred embodiment, the Hg19 coordinate regions of one or more target markers described herein are respectively: PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041685:59042064; CRABP2 gene: chr1:15667684:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH3BP2 gene: chr4:279282:2795581; SLC12A7 gene: chr5:1117911:1118110; NR2F1 gene: chr5:92915047:92915246; EGR1 gene: chr5:137802649:137802848; LARP1 gene: chr5:154134205:154134404; RARS gene: chr5:167838020:167838129; TTBK1 gene: chr6:43215313:43215512; FAM20C gene: chr7:193762:193961; CREB5 gene: chr7:28449291:28449490; LIMK1 gene: chr7:73508993:73509192; PRKAG2 gene: chr7:151425064:151425263; SLC39A14 gene: chr8:22237164:22237363; EGR3 gene: chr8:22548226:22548425; EGR3 gene: chr8:22548641:22548840; DUSP26 gene: chr8:3405138:34105337; AGPAT2 gene: chr9:139582105:13958234; NRARP gene: chr9:140205984:140206183; EGR2 gene: chr10:64578519:6457878; PPIF gene: chr10:81001956:81002155; CHID1 gene: chr11:911539:911738; ADM gene: chr11:1032996:10329395; NAV2 gene: chr11:19735051:19735250; NAV2 gene: chr11:19735910:19736109; EHBP1L1 gene: chr11:65343637:65343836; PHLDB1 gene: chr11:118479394:118479593; PARP11 gene: chr12:4140185:4140384; ANO6 gene: chr12:45610581:45610780; PLXNC1 gene: chr12:94544326:94544525; ZNF219 gene: chr14:21559998:2156097; FOXA1 gene: chr14:380665126:380565325; PAPLN gene: chr14:73704879:73705078; UACA gene: chr15:70767131:70767330; PGPEP1L gene: chr15:99466492:466691; ITPRIPL2 gene: chr16:19125944:19126143; TNK1 gene: chr17:7287208:7287407; RPL19 gene: chr17:37366283:37366482; ICAM2 gene: chr17:62076858:62057657; TMC6 gene: chr17:7613476:7613675; TMC6 gene: chr17:7623642:7623841; CEP295NL gene: chr17:768880011:768880210; BAIAP2 gene: chr17:79061115:79061314; TBCD gene: chr17:80745041:80745240; METRNL gene: chr17:81084062:81084261; MED16 gene: chr19:884043:884242; SBNO2 gene: chr19:1177525:1177724; CIRBP gene: chr19:1265940:1266139; KLF16 gene: chr19:1860593:1860792; C19orf77 gene: chr19:34916:3435115; C19orf77 gene: chr19:3435238:3435437; SNAPC2 gene: chr19:7985959:7986158; ICAM1 gene: chr19:10381567:10381766; ICAM5 gene: chr19:10405082:10405281; IER2 gene: chr19:13266897:13267096; ASF1B gene: chr19:14248383:14248582; CRTC1 gene: chr19:18771211:18771410; ZNF536 gene: chr19:31039497:31039696; LTBP4 gene: chr19:41105956:4106155; NOL4L-DT gene: chr20:31162351:31162550; KCNK15 gene: chr20:43374298:43374497; UCKL1 gene: chr20:62588863:6258562; RTN4R gene: chr22:20226623:20226822; RTN4R gene: chr22:20226825:20227024; BCR gene: chr22:23624342:23624541; TEF gene: chr22:41771479:41771678.

The target markers of the present invention also include the sequence extending 5 kb upstream of the start site and 5 kb downstream of the end site of each region described above. The specific nucleotide sequences of the above Hg19 regions, including these 5 kb upstream and downstream flanks, can be obtained from public databases (e.g., the UCSC Genome Browser, Ensembl and NCBI).
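To make the 5 kb flank extension concrete, the following Python sketch parses a region written in the document's "chrN:start:end" notation and returns the extended interval in the "chrN:start-end" form used by genome browsers. The helper names (parse_region, extend_region) are hypothetical, coordinates are assumed to be 1-based and inclusive, and the sketch does not clip the end at the chromosome length, which would require an Hg19 chromosome-size table.

```python
import re

# Hypothetical helper names; regions are written "chrN:start:end" as in the text.
REGION_RE = re.compile(r"^(chr[0-9XYM]+):(\d+):(\d+)$")
FLANK_BP = 5_000  # 5 kb upstream of the start and 5 kb downstream of the end


def parse_region(text: str) -> tuple[str, int, int]:
    """Parse 'chr1:3155051:3155760' (stray spaces allowed) into (chrom, start, end)."""
    match = REGION_RE.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError(f"end precedes start: {text!r}")
    return chrom, start, end


def extend_region(text: str, flank: int = FLANK_BP) -> str:
    """Return the flank-extended region as 'chrN:start-end' (1-based, inclusive)."""
    chrom, start, end = parse_region(text)
    new_start = max(1, start - flank)  # do not run past the chromosome start
    return f"{chrom}:{new_start}-{end + flank}"


print(extend_region("chr1:20813203:20813902"))  # chr1:20808203-20818902
```

Rejecting entries whose end precedes the start is a deliberate choice: a malformed coordinate should be checked against the source database rather than silently extended.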

The target markers of the invention (e.g., the sequence of a gene and its genome, a fragment of each gene containing at least one CpG dinucleotide, or a sequence comprising an intergenic region) also include the corresponding regions obtained after non-enzymatic conversion (e.g., bisulfite conversion) and after enzymatic treatment (e.g., MSRE digestion).

In some embodiments, the target markers also include variants of each of the genes described above. Variants include nucleic acid sequences from the same region that have at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to a gene or region described herein (i.e., that contain one or more deletions, insertions, substitutions, inversions, etc.). The present disclosure should therefore be understood to extend to such variants that achieve the same result, even though the actual nucleic acid sequences differ slightly between individuals because of minor genetic variation.

As used herein, the term "percent (%) sequence identity" refers to the percentage of amino acid (or nucleic acid) residues in a candidate sequence that are identical to those in a reference sequence after the sequences are aligned, with gaps introduced where necessary to maximize the number of identical residues. In other words, the percent (%) sequence identity of an amino acid sequence (or nucleic acid sequence) can be calculated by dividing the number of residues (or bases) identical to the reference sequence by the total number of residues (or bases) in the candidate sequence or the reference sequence, whichever is shorter, and multiplying by 100. Conservative amino acid substitutions may or may not be counted as identical residues. Percent amino acid (or nucleic acid) sequence identity can be determined, for example, using published tools such as BLASTN and BLASTP (available on the National Center for Biotechnology Information (NCBI) website; see also Altschul S.F. et al., J. Mol. Biol., 215:403-410 (1990); Altschul S.F. et al., Nucleic Acids Res., 25:3389-3402 (1997)), ClustalW2 (available on the European Bioinformatics Institute website; see also Higgins D.G. et al., Methods in Enzymology, 266:383-402 (1996); Larkin M.A. et al., Bioinformatics (Oxford, England), 23(21):2947-8 (2007)), and ALIGN or MegAlign (DNASTAR) software. A person skilled in the art may use the default parameters provided by the tool or customize the alignment parameters (e.g., by selecting an appropriate algorithm).
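The identity calculation can be sketched in Python for a pair of sequences that have already been aligned (for example, by BLASTN or ClustalW2); the sketch only scores an existing alignment and does not perform the alignment itself. The function name percent_identity and the use of "-" as the gap character are assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    A position counts as identical when both sequences carry the same
    non-gap residue. The denominator is the ungapped length of the shorter
    sequence ("whichever is shorter").
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("empty sequence")
    return 100.0 * matches / shorter


print(percent_identity("ACGTACGT", "ACGAACGT"))  # 87.5 (one substitution in 8)
print(percent_identity("ACGT-CGT", "ACGTACGT"))  # 100.0 (7 of 7 in the shorter)
```

The second example shows how the "whichever is shorter" rule treats a single-base deletion: the shorter sequence is fully contained in the longer one, so identity is 100%.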

The target markers of the present invention also include the corresponding regions, extending 5 kb upstream of the start site and 5 kb downstream of the end site of each of the above genes, obtained after non-enzymatic conversion (e.g., bisulfite conversion) or after enzymatic treatment (e.g., methylation-sensitive restriction enzyme treatment).

II. Source and preparation of target markers

Herein, the target marker may be derived from a biological sample of any individual of interest. The term "individual" as used herein includes both human and non-human animals. Non-human animals include all vertebrates, such as mammals and non-mammals. An "individual" may also be livestock, such as cattle, pigs, sheep, poultry and horses; a rodent, such as a rat or mouse; a non-human primate, such as an ape, monkey or rhesus monkey; or a domestic animal such as a dog or cat. In some embodiments, the individual is a human or a non-human primate. In some embodiments, the individual is a human. In this application, "individual" and "subject" are used interchangeably.

It will be appreciated that the sequences given in section I above are human sequences. When referring to the sequence of a non-human animal, the corresponding position and corresponding sequence of the above-described genes in the genome of the non-human animal can be readily determined using prior art techniques.

The term "biological sample" as used herein refers to a biological composition obtained or derived from an individual, comprising cells and/or other molecular entities (e.g., DNA) to be characterized or identified based on physical, biochemical, chemical and/or physiological characteristics. Biological samples include, but are not limited to, cells, tissues, organs and/or biological fluids of an individual obtained by any method known to those of skill in the art. In some embodiments, the biological sample is selected from the group consisting of: histological sections, tissue biopsies, paraffin-embedded tissues, body fluids, surgical resection samples, isolated blood cells, cells isolated from blood, and any combination thereof. In some embodiments, the body fluid is selected from the group consisting of: whole blood, serum, plasma, and any combination thereof. The most suitable sample type will depend on the specific application. In some embodiments, the biological sample is whole blood of an individual. In some embodiments, the biological sample is plasma of an individual. Those skilled in the art are aware of various methods for preparing plasma from whole blood; for example, in some embodiments, plasma is obtained by centrifuging whole blood from an individual one, two, three, four, five or more times. In some embodiments, the biological sample is a thyroid nodule biopsy, preferably a fine-needle aspiration biopsy.

The DNA to be detected may be isolated and purified from the biological sample using various methods known in the art, including commercial kits. For example, DNA may be isolated from cells and tissues by lysing the starting material under strongly denaturing and reducing conditions, optionally with the aid of proteolytic enzymes; purifying the nucleic acid fraction by phenol/chloroform extraction; and recovering the nucleic acids from the aqueous phase by dialysis or ethanol precipitation (see, e.g., Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning, Cold Spring Harbor Laboratory Press, 1989). As another example, many reagent systems are now available that are particularly suitable for purifying DNA fragments from agarose gels, isolating plasmid DNA from bacterial lysates, and isolating longer nucleic acids (genomic DNA, total cellular RNA) from blood, tissue or cell cultures. Many of these commercial purification systems rely on the well-known principle of binding nucleic acids to mineral supports in the presence of solutions of various chaotropic salts, using suspensions of finely ground glass powder, diatomaceous earth or silica gel as the support material. Other methods of isolating and purifying DNA from biological samples are described, for example, in US7888006B2 and EP1626085A1. The choice among these methods will depend on several factors, including time, cost and the amount of DNA required.

In some embodiments, the DNA contained in the biological sample comprises genomic DNA. The term "genomic DNA" as used herein refers to DNA comprising the complete genome of a cell or organism, as well as fragments or portions thereof. Genomic DNA is a large piece of DNA (e.g., longer than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, or 300 kb) derived from an individual and may have natural modifications, such as DNA methylation.

In some embodiments, the DNA contained in the biological sample comprises cellular DNA. The term "cellular DNA" as used herein refers to DNA present within a cell in vivo, as well as DNA that was located within a cell in vivo and has been isolated or otherwise manipulated in vitro, provided that the DNA was not released from the cell in vivo.

In some embodiments, the DNA contained in the biological sample comprises extracellular free DNA. The term "extracellular free DNA" as used herein refers to DNA fragments present outside cells in vivo. The term may also refer to DNA fragments obtained from an extracellular source in vivo and then isolated or otherwise manipulated in vitro. Fragments of extracellular free DNA are generally about 100 to 200 bp long, which is thought to reflect the length of DNA wrapped around nucleosomes. Extracellular free DNA (cfDNA) includes, for example, cell-free fetal DNA and circulating tumor DNA. Cell-free fetal DNA circulates in the body fluids (e.g., blood) of a pregnant woman and represents the fetal genome, while circulating tumor DNA circulates in the body fluids (e.g., blood) of a cancer patient. In some embodiments, the extracellular free DNA may be substantially free of cellular DNA of the individual. For example, the extracellular free DNA may contain less than about 1,000 ng/mL, less than about 100 ng/mL, less than about 10 ng/mL, or less than about 1 ng/mL of cellular DNA.

Extracellular free DNA can be prepared using conventional techniques known in the art. For example, extracellular free DNA of a blood sample may be obtained by centrifuging the blood sample at about 200-20,000 × g, about 200-10,000 × g, about 200-5,000 × g, about 300-4,000 × g, etc., for about 3-30 minutes, about 3-15 minutes, about 3-10 minutes, or about 3-5 minutes. In some embodiments, extracellular free DNA of a blood sample may be obtained by centrifuging the plasma or serum of an individual one, two, three, four, five or more times. In some embodiments, the biological sample may be processed by microfiltration to separate cells and cell fragments from the cell-free fraction containing soluble DNA. Microfiltration is typically performed using membrane filters of 0.1 to 0.45 micron, for example 0.22 micron.

In some embodiments, extracellular free DNA is extracted from whole blood, serum or plasma for analysis using commercially available DNA extraction products. Such extraction methods are reported to give high recovery (>50%) of circulating DNA, and some products (e.g., the QIAamp Circulating Nucleic Acid Kit from Qiagen) are reported to recover small DNA fragments. Typical sample volumes are 1-5 mL of serum or plasma.

In some embodiments, the extracellular free DNA comprises circulating tumor DNA. Circulating tumor DNA ("ctDNA") is fragmented DNA of tumor origin found in the cell-free fraction of body fluids (e.g., blood, urine, saliva, sputum, stool, pleural fluid, cerebrospinal fluid, etc.). ctDNA is typically highly fragmented, with an average length of about 150 base pairs, and generally makes up a very small fraction of the extracellular free DNA in a body fluid (e.g., plasma); for example, ctDNA may constitute less than about 10% of plasma DNA, and typically less than about 1%, such as less than about 0.5% or less than about 0.01%. In addition, the total amount of plasma DNA is typically very low, e.g., about 10 ng/mL of plasma. The amount of ctDNA varies from person to person and depends on the type and location of the tumor and, for malignant tumors, on the stage of the cancer. Because ctDNA is generally very rare in body fluids, it can be detected only by highly sensitive and specific techniques. Detection of ctDNA may be useful for detecting and diagnosing tumors, guiding tumor-specific therapy, monitoring treatment, and monitoring cancer remission.
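A back-of-the-envelope calculation shows why such sensitivity is needed. The Python sketch below converts a cfDNA concentration into haploid genome equivalents, assuming roughly 3.3 pg of DNA per haploid human genome and ignoring losses during extraction and conversion. The 4 mL plasma volume and 0.1% tumor fraction in the example are illustrative values chosen within the ranges given above, not measured data, and the function names are hypothetical.

```python
# Approximate mass of one haploid human genome; an assumption for this sketch.
HAPLOID_GENOME_PG = 3.3


def genome_equivalents(dna_ng: float) -> float:
    """Convert a DNA mass in nanograms to haploid genome equivalents."""
    return dna_ng * 1000.0 / HAPLOID_GENOME_PG  # 1 ng = 1000 pg


def expected_tumor_copies(plasma_ml: float, cfdna_ng_per_ml: float,
                          tumor_fraction: float) -> float:
    """Expected tumor-derived copies of one locus in the whole sample,
    assuming 100% recovery and one copy of the locus per haploid genome."""
    total_copies = genome_equivalents(plasma_ml * cfdna_ng_per_ml)
    return total_copies * tumor_fraction


copies = expected_tumor_copies(plasma_ml=4.0, cfdna_ng_per_ml=10.0, tumor_fraction=0.001)
print(f"{copies:.1f} tumor-derived copies per locus")  # 12.1 tumor-derived copies per locus
```

Under these assumptions only about a dozen tumor-derived copies of a given locus are present in the entire sample, against roughly 12,000 background copies, so any assay must reliably detect a handful of target molecules.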

III. Base conversion

Herein, DNA methylation is a biological process in which a methyl group is added to a DNA molecule (e.g., to one or more cytosine bases of the DNA molecule), for example by the action of a DNA methyltransferase. In mammals, DNA methylation occurs predominantly at the C5 position of the cytosine in cytosine-phosphate-guanine (CpG) dinucleotides (i.e., "CpG sites"). Methylation of 5'-CpG-3' dinucleotides in the promoter or first exon of a gene can lead to epigenetic silencing of that gene. DNA methylation is well documented to play an important role in regulating gene expression, in tumorigenesis, and in other genetic and epigenetic diseases.

As used herein, the term "methylated cytosine residue" refers to a derivative of a cytosine residue in which a methyl group is attached to a carbon atom (e.g., C5) of the cytosine ring. The term "unmethylated cytosine residue" refers to an underivatized cytosine residue which, in contrast to a "methylated cytosine residue", carries no methyl group on a carbon atom (e.g., C5) of the cytosine ring. A CpG site whose cytosine residue is methylated is a methylated CpG site, and a CpG site whose cytosine residue is unmethylated is an unmethylated CpG site.
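Because each target marker is assessed at its CpG sites, a small Python sketch for locating CpG dinucleotides in a sequence may be useful. It reports the 1-based position of the C in each 5'-CG-3' on the strand given; since CG is its own reverse complement, every such site is also a CpG on the opposite strand. The function name cpg_positions is hypothetical.

```python
def cpg_positions(sequence: str) -> list[int]:
    """Return 1-based positions of the cytosine in every 5'-CG-3' dinucleotide."""
    seq = sequence.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


print(cpg_positions("ACGTTCGACG"))  # [2, 6, 9]
```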

As described herein, conversion can occur between bases of DNA or RNA. As used herein, "conversion", "cytosine conversion" or "C-to-T conversion" is a process in which DNA is treated by a non-enzymatic or enzymatic method so that an unmodified cytosine base (C) is converted to a base that does not pair with guanine (G) (e.g., a uracil base (U)). Some agents can distinguish between unmethylated and methylated CpG sites in DNA, producing treated DNA. Such an agent may act selectively on unmethylated cytosine residues without significantly affecting methylated cytosine residues or, alternatively, act selectively on methylated cytosine residues without significantly affecting unmethylated cytosine residues. For example, some reagents selectively convert unmethylated cytosine residues to uracil, thymine or another base that differs from cytosine in hybridization behavior, while methylated cytosine residues remain unconverted; other reagents selectively cleave at methylated residues, or selectively cleave at unmethylated residues. The original DNA is thus converted into treated DNA in a methylation-dependent manner, so that the treated DNA can be distinguished from the original DNA by its hybridization behavior.

As used herein, "treated DNA", "treated sequence" and "treated fragment" refer, respectively, to DNA, a nucleic acid sequence and a gene fragment that have been treated with an agent capable of distinguishing between unmethylated and methylated CpG sites.

More specifically, cytosine conversion can be performed using non-enzymatic or enzymatic methods. Illustratively, the non-enzymatic method includes bisulfite (hydrogen sulfite) treatment. In some embodiments, the reagent used in the non-enzymatic method comprises a bisulfite reagent. As used herein, the term "bisulfite reagent" refers to a reagent comprising bisulfite, bisulfite ions, or any combination thereof, such as those disclosed herein, that can be used to distinguish between methylated and unmethylated CpG dinucleotide sequences. In this application, treatment of DNA with a bisulfite reagent is also described as a "bisulfite reaction" or "bisulfite treatment", which refers to a reaction, carried out in the presence of bisulfite ions, that converts unmethylated cytosine residues in a nucleic acid to uracil, thymine or other bases that differ from cytosine in hybridization behavior, while methylated cytosine residues are not significantly converted. In other words, bisulfite treatment can be used to distinguish between methylated and unmethylated CpG dinucleotides. The bisulfite reaction for detecting methylated cytosine residues is described in detail in Frommer, M., et al., Proc Natl Acad Sci USA 89 (1992) 1827-31 and Grigg, G., Clark, S., BioEssays 16 (1994) 431-6. The bisulfite reaction includes a deamination step and a desulfonation step (see Grigg and Clark, supra).
The statement that "methylated cytosine residues are not significantly converted" does not exclude that a very small percentage (e.g., less than 0.1%, less than 0.2%, less than 0.3%, less than 0.4%, less than 0.5%, less than 0.6%, less than 0.7%, less than 0.8%, less than 0.9%, less than 1%, less than 2%, less than 3%, less than 4%, less than 5%, less than 6%, less than 7%, less than 8%, less than 9%, less than 10%, less than 11%, less than 12%, less than 13%, less than 14%, less than 15%, less than 16%, less than 17%, less than 18%, less than 19%, less than 20%) of the methylated cytosine residues are converted to uracil, thymine, or other bases that differ in hybridization behavior, although it is intended to convert only unmethylated cytosine residues.
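The effect of bisulfite treatment on a single strand can be simulated in silico, as in the Python sketch below. It assumes complete conversion of every unmethylated cytosine (ignoring the small residual conversion of methylated cytosine described above), works on one strand only, and writes converted bases as T, as they appear after PCR amplification. The function names and the example methylation pattern are hypothetical.

```python
def bisulfite_convert(sequence: str, methylated_positions: set[int]) -> str:
    """Simulate complete bisulfite conversion of one DNA strand.

    Each C whose 1-based position is not in methylated_positions is written
    as T (uracil after deamination, read as thymine after PCR); methylated
    C residues are left unchanged.
    """
    converted = []
    for pos, base in enumerate(sequence.upper(), start=1):
        if base == "C" and pos not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_cpg_methylation(reference: str, converted: str) -> dict[int, bool]:
    """Call each reference CpG as methylated (C retained) or unmethylated (C read as T)."""
    ref, conv = reference.upper(), converted.upper()
    if len(ref) != len(conv):
        raise ValueError("reference and converted read must have equal length")
    calls = {}
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":
            if conv[i] == "C":
                calls[i + 1] = True
            elif conv[i] == "T":
                calls[i + 1] = False
    return calls


reference = "ACGTTCGACG"
read = bisulfite_convert(reference, methylated_positions={2})
print(read)                                   # ACGTTTGATG
print(call_cpg_methylation(reference, read))  # {2: True, 6: False, 9: False}
```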

With reference to, for example, Frommer, M., et al. (supra) or Grigg and Clark (supra), which disclose the basic parameters of bisulfite treatment, including the effect of incubation time and temperature on deamination efficiency and the parameters affecting DNA degradation, a person skilled in the art knows how to carry out the bisulfite treatment, in particular the deamination and desulfonation steps.

In some embodiments, the bisulfite reagent is selected from the group consisting of: ammonium bisulfite, sodium bisulfite, potassium bisulfite, calcium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ions, and any combination thereof. In some embodiments, the bisulfite reagent is sodium bisulfite. In some embodiments, the bisulfite reagent is commercially available, e.g., the MethylCode™ Bisulfite Conversion Kit, EpiMark™ Bisulfite Conversion Kit, EpiJET™ Bisulfite Conversion Kit or EZ DNA Methylation-Gold™ Kit. In some embodiments, the bisulfite reaction is performed according to the kit instructions.

Exemplary enzymatic methods include deaminase treatment and the use of a reagent that selectively cleaves unmethylated but not methylated residues, or methylated but not unmethylated residues. Preferably, the reagent is a methylation-sensitive restriction enzyme (MSRE).

The term "methylation-sensitive restriction enzyme" refers to an enzyme that selectively digests nucleic acid according to the methylation state of its recognition site. For restriction enzymes that cleave only when the recognition site is unmethylated or hemimethylated, cleavage does not occur, or occurs with significantly reduced efficiency, when the recognition site is methylated. For restriction enzymes that cleave only when the recognition site is methylated, cleavage does not occur, or occurs with significantly reduced efficiency, when the recognition site is unmethylated. In some embodiments, the recognition sequence of the methylation-sensitive restriction enzyme contains a CG dinucleotide (e.g., CGCG or CCCGGG). In some embodiments, the methylation-sensitive restriction enzyme does not cleave when the cytosine in the CG dinucleotide is methylated at the C5 carbon atom.
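As an illustration of how an MSRE reads out methylation, the Python sketch below models HpaII, one of the exemplary MSREs named in this section, which recognizes CCGG and is blocked when the internal CpG cytosine is methylated. It is a simplification under stated assumptions: digestion is treated as complete, hemimethylation and the opposite strand are ignored, and a fragment counts as "cut" if it contains at least one unmethylated site, in which case PCR across it would fail. The function names are hypothetical.

```python
HPAII_SITE = "CCGG"  # HpaII cuts C^CGG; methylation of the inner (CpG) C blocks it


def hpaii_sites(sequence: str) -> list[int]:
    """1-based start positions of every CCGG recognition site."""
    seq = sequence.upper()
    return [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == HPAII_SITE]


def is_cut_by_hpaii(sequence: str, methylated_positions: set[int]) -> bool:
    """True if at least one CCGG site has an unmethylated inner cytosine."""
    for start in hpaii_sites(sequence):
        inner_c = start + 1  # position of the C in the CpG of C-C-G-G
        if inner_c not in methylated_positions:
            return True
    return False


amplicon = "AACCGGTTCCGGAA"
print(hpaii_sites(amplicon))               # [3, 9]
print(is_cut_by_hpaii(amplicon, {4, 10}))  # False: both sites protected
print(is_cut_by_hpaii(amplicon, {4}))      # True: the site at 9 is unmethylated
```

Under these assumptions, an intact (amplifiable) template after HpaII digestion indicates methylation at every HpaII site in the amplicon.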

Exemplary MSREs are selected from the group consisting of: HpaII, SalI, ScrFI, BbeI, NotI, SmaI, XmaI, MboI, BstBI, ClaI, MluI, NaeI, NarI, PvuI, SacII, HhaI, and any combination thereof.

Methylation may be determined by methods known in the art that use a methylation-sensitive restriction enzyme, or a set of restriction enzymes including a methylation-sensitive restriction enzyme, capable of distinguishing between methylated and unmethylated CpG dinucleotides within a target region, such as, but not limited to, differential methylation hybridization ("DMH").

In some embodiments, DNA in the biological sample may be fragmented before treatment with the methylation-sensitive restriction enzyme. Such methods are known in the art and include both physical and enzymatic means. It is particularly preferred to use one or more methylation-insensitive restriction enzymes whose recognition sites are AT-rich and contain no CG dinucleotide, since such enzymes leave CpG sites and CpG-rich regions intact within the DNA fragments. In some embodiments, such restriction enzymes are selected from the group consisting of MseI, BfaI, Csp6I, Tru1I, Tru9I, MaeI, XspI, and any combination thereof.
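The claim that AT-rich, CG-free recognition sites leave CpG sites intact can be checked with a short in silico digest. The Python sketch below uses MseI, which recognizes TTAA and cuts between the two T residues (T^TAA); because the cut always falls between two T bases, it can never split a CG dinucleotide. The function names are hypothetical, and the digest covers one strand only.

```python
MSEI_SITE = "TTAA"  # MseI recognition site; cleavage is T^TAA


def msei_digest(sequence: str) -> list[str]:
    """Split one strand at every MseI site, cutting after the first T."""
    seq = sequence.upper()
    fragments = []
    last_cut = 0
    site = seq.find(MSEI_SITE)
    while site != -1:
        cut = site + 1
        fragments.append(seq[last_cut:cut])
        last_cut = cut
        site = seq.find(MSEI_SITE, site + 1)
    fragments.append(seq[last_cut:])
    return [f for f in fragments if f]


def count_cpg(sequence: str) -> int:
    """Number of CG dinucleotides in a sequence."""
    seq = sequence.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


dna = "CCGGTTAACGCGTTAAGC"
pieces = msei_digest(dna)
print(pieces)  # ['CCGGT', 'TAACGCGT', 'TAAGC']
print(sum(count_cpg(p) for p in pieces), count_cpg(dna))  # 3 3
```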

The converted DNA is optionally purified. DNA purification methods suitable for use herein are well known in the art.

IV. Quantitative analysis

The methylation state or methylation level of at least one CpG dinucleotide in any 1, any 2, any 3, any 4, any 5, any 6, any 7, any 8, any 9, any 10, any 11, any 12, any 13, any 14, any 15, any 16, any 17, any 18, any 19, any 20 or more of the target markers described herein can be detected to determine whether a thyroid nodule is benign or malignant. The detection reagents and diagnostic kits described herein can be used to detect this methylation state or methylation level.

Herein, the terms "benign" and "malignant" refer to the nature of a thyroid nodule. In general, benign nodules grow slowly and have a uniform texture, good mobility and a smooth surface, and may show cystic change, without lymph node enlargement or calcification. Malignancy manifests as uncontrolled growth, spread and tissue infiltration of malignant cells. Ultrasound features suggesting that a thyroid nodule is malignant include a taller-than-wide shape, absence of a halo, microcalcifications, irregular margins, hypoechogenicity, solid composition, and abundant intranodular blood flow. In some embodiments, the malignant thyroid nodule includes thyroid cancer.

Herein, "methylation state" refers to the presence or absence of one or more methylated nucleotide bases in a nucleic acid molecule. For example, a nucleic acid molecule containing a methylated cytosine is considered methylated (e.g., the methylation state of the nucleic acid molecule is methylated). Nucleic acid molecules that do not contain any methylated nucleotides are considered unmethylated. In some embodiments, a nucleic acid may be characterized as "unmethylated" if it is not methylated at a particular locus (e.g., a locus of a particular single CpG dinucleotide) or a particular combination of loci, even if it is methylated at other loci of the same gene or molecule.

Thus, methylation state describes the state of methylation of a nucleic acid (e.g., a genomic sequence or a target marker described herein). In addition, methylation state refers to methylation-related characteristics of a nucleic acid segment at a particular genomic locus. Such characteristics include, but are not limited to, whether any cytosine (C) residue within the DNA sequence is methylated, the position of one or more methylated C residues, the frequency or percentage of methylated C throughout any particular region of the nucleic acid, and allelic differences in methylation arising, for example, from differences in allele origin. "Methylation state" also refers to the relative concentration, absolute concentration or pattern of methylated or unmethylated C throughout any particular region of a nucleic acid in a biological sample. For example, a nucleic acid sequence may be described as "hypermethylated" or as having "increased methylation" if one or more cytosine (C) residues within it are methylated, and as "hypomethylated" or as having "decreased methylation" if one or more cytosine (C) residues within it are unmethylated. Likewise, if one or more cytosine (C) residues within a nucleic acid sequence are methylated compared with another nucleic acid sequence (e.g., from a different region or a different individual), the sequence is considered hypermethylated, or to have increased methylation, relative to the other sequence. Conversely, if one or more cytosine (C) residues within a DNA sequence are unmethylated compared with another nucleic acid sequence (e.g., from a different region or a different individual), the sequence is considered hypomethylated, or to have decreased methylation, relative to the other sequence.

Herein, methylation level represents the proportion (or percentage, fraction, ratio, degree) of one or more sites that are in the methylated state. The methylation level of a region (or group of sites) is the average of the methylation levels of all sites in the region (or group). Thus, an increase or decrease in the methylation level of a region does not necessarily indicate an increase or decrease in the methylation level of every methylation site in the region. Methods for converting the results of DNA methylation assays (e.g., reduced representation bisulfite sequencing) into methylation levels are known in the art. Methylation levels can be determined, for example, by quantitative analysis of the amount of intact DNA remaining after digestion with a methylation-dependent restriction enzyme, which cleaves methylated DNA. In this example, if quantitative PCR is used to quantify a particular sequence in the DNA, an amount of template DNA approximately equal to that of a mock-treated control indicates that the sequence is not highly methylated, whereas an amount of template significantly less than that of the mock-treated sample indicates that methylated DNA is present in the sequence. The methylation level in such examples can therefore serve as a quantitative indicator of methylation state, which is particularly useful when the methylation level of a sequence in a sample is to be compared with a threshold level.
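As a minimal sketch of the averaging rule above (Python; the function name and the per-site values are hypothetical), the following shows how a region's methylation level can rise even though one of its sites becomes less methylated:

```python
from statistics import mean


def region_methylation_level(site_levels):
    """Methylation level of a region: the average of its per-site levels (each 0-1)."""
    if not site_levels:
        raise ValueError("a region must contain at least one site")
    return mean(site_levels)


# Hypothetical per-site methylation levels of the same three CpG sites in two samples
sample_a = [0.10, 0.80, 0.30]
sample_b = [0.40, 0.60, 0.50]

level_a = region_methylation_level(sample_a)  # about 0.4
level_b = region_methylation_level(sample_b)  # about 0.5

# The region level increases although the second site is less methylated in sample_b
print(level_b > level_a, sample_b[1] < sample_a[1])
```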

In one or more embodiments, the methylation level of the target marker (e.g., as reflected by a Ct value) is increased or decreased compared with a reference level. When the methylation level of the marker (e.g., the Ct value) meets a certain threshold, the thyroid nodule is identified as malignant. Alternatively, the methylation levels of the target markers may be analyzed mathematically to obtain a score; for a test sample, a score above (or, depending on the model, below) the threshold is judged positive, i.e., the thyroid nodule is malignant. Conventional mathematical analysis methods and procedures for determining thresholds are known in the art; one exemplary method is a support vector machine (SVM) model. For example, for differentially methylated markers, an SVM is trained on the training-set samples, prediction scores are computed for the test-set samples, and the model's performance on the test set is assessed by accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve (AUC).
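The evaluation step can be illustrated with a small, dependency-free Python sketch. It does not train an SVM: it assumes prediction scores have already been produced by some model, treats a score above the threshold as malignant (label 1), and computes accuracy, sensitivity, specificity and a rank-based AUC (the share of malignant/benign pairs in which the malignant sample scores higher). All function names, scores and labels are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN, calling a sample malignant when score > threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_malignant = score > threshold
        if predicted_malignant and label == 1:
            tp += 1
        elif predicted_malignant:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(scores, labels, threshold):
    """Return accuracy, sensitivity and specificity at a fixed threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    accuracy = (tp + tn) / len(labels)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def auc(scores, labels):
    """Rank-based AUC: share of (malignant, benign) pairs ordered correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set scores (1 = malignant, 0 = benign)
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(evaluate(scores, labels, threshold=0.5))
print(auc(scores, labels))
```

Whether "positive" means above or below the threshold depends on the direction of the model; the sketch fixes it to "above" for simplicity.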

The methylation level/state of one or more CpG dinucleotide sequences within a DNA sequence (e.g., a target marker) can be determined by various analytical methods known in the art, preferably quantitative ones. Exemplary methods include: polymerase chain reaction, including real-time PCR, digital PCR, and bisulfite conversion-based PCR (e.g., methylation-specific PCR (MSP)); nucleic acid sequencing; whole-genome methylation sequencing; reduced representation bisulfite sequencing (RRBS, also called simplified methylation sequencing); mass-based separation (e.g., electrophoresis, mass spectrometry); target capture (e.g., hybridization, microarray); methylation-sensitive restriction enzyme assays; methylation-sensitive high-resolution melting; chip-based methylation profiling; mass spectrometry; and fluorescence quantification. Herein, detection includes detecting either strand at a gene or locus.

In some embodiments, the quantitative analysis is performed by real-time PCR. Non-limiting examples of real-time PCR include HeavyMethyl™ PCR described by Cottrell et al., Nucl. Acids Res. 32:e10, 2003; MethyLight™ PCR described by Eads et al., Cancer Res. 59:2302-2306, 1999; and Headloop PCR described by Rand et al., Nucleic Acids Res. 33:e127, 2005.

As used herein, the term "HeavyMethyl™ PCR" refers to an art-recognized real-time PCR technique in which one or more non-extendable nucleic acid (e.g., oligonucleotide) blockers bind to bisulfite-treated nucleic acid in a methylation-specific manner (i.e., the blocker binds specifically to bisulfite-converted DNA under moderate to high stringency conditions). The amplification reaction is performed using one or more primers, which may optionally be methylation specific, that flank the one or more blockers. In the presence of unmethylated nucleic acid (i.e., converted DNA), the blocker binds and no PCR product is produced. The methylation level of the nucleic acid in the sample is determined, for example, by a TaqMan™ assay essentially as described in Holland et al., Proc. Natl. Acad. Sci. USA, 88:7276-7280, 1991.

As used herein, the term "MethyLight™ PCR" refers to an art-recognized fluorescence-based real-time PCR technique in which a dual-labeled fluorescent oligonucleotide probe, referred to as a TaqMan™ probe, is designed to hybridize to a CpG-rich sequence located between the forward and reverse amplification primers. The TaqMan probe comprises a fluorescent "reporter moiety" and a "quencher moiety", each covalently bound via a linker moiety (e.g., a phosphoramidite) to a nucleotide of the TaqMan oligonucleotide. During PCR amplification, TaqMan probes hybridized to the CpG-rich sequence are cleaved by the 5' nuclease activity of Taq polymerase, generating a signal that is detected in real time during the reaction. Alternatively, molecular beacons can be used as the detectable probes, in which case the system does not depend on the 5'-3' exonuclease activity of the DNA polymerase used (see Mhlanga and Malmberg, Methods 25:463-471, 2001).

As used herein, the term "Headloop PCR" refers to an art-recognized real-time PCR that selectively amplifies a target nucleic acid while suppressing amplification of non-target variants: on templates derived from a non-target variant, the 3' end is extended to form a stem-loop (hairpin) structure that cannot serve as a template for further amplification.

In some embodiments, the real-time PCR is multiplex real-time PCR. As used herein, the term "multiplex" may refer to an assay or other analytical method in which the presence and/or amount of multiple targets (e.g., multiple nucleic acid sequences) can be determined simultaneously by using more than one label, each label having at least one distinct detection characteristic, such as a fluorescence characteristic (e.g., excitation wavelength, emission intensity, FWHM (full width at half maximum), or fluorescence lifetime) or a unique nucleic acid or protein sequence characteristic.

In some embodiments, the quantitative analysis is performed by nucleic acid sequencing. Exemplary nucleic acid sequencing methods are known in the art; see, e.g., Frommer et al., Proc. Natl. Acad. Sci. USA 89:1827-1831, 1992; Clark et al., Nucleic Acids Res. 22:2990-2997, 1994. For example, methylated cytosines in a DNA sequence can be identified by comparing the sequence obtained from a bisulfite-treated sample with the sequence obtained from an untreated sample, or with the known nucleotide sequence of the region of interest. A thymine detected at a cytosine position in the bisulfite-treated sample, compared with the untreated sample, can be attributed to conversion of an unmethylated cytosine by the bisulfite treatment; a cytosine that remains a cytosine after treatment indicates that a methylated cytosine is present at that position.
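A minimal Python sketch of this comparison, assuming a bisulfite read that is already aligned without gaps to the top strand of the reference (the function name and sequences are hypothetical):

```python
def call_cpg_methylation(reference, bisulfite_read):
    """Call each reference CpG as methylated (C retained) or unmethylated (C read as T).

    Assumes the read is aligned to the top strand of the reference without gaps.
    Positions showing any other base are left uncalled (e.g. sequencing error or SNP).
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must be aligned to the same length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            base = bisulfite_read[i]
            if base == "C":
                calls[i] = True    # protected from conversion -> methylated
            elif base == "T":
                calls[i] = False   # converted by bisulfite -> unmethylated
    return calls


# Hypothetical reference with CpGs at positions 1, 5 and 8, and one bisulfite read
print(call_cpg_methylation("ACGTTCGACG", "ACGTTTGATG"))  # {1: True, 5: False, 8: False}
```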

Methods for sequencing DNA are known in the art and include, for example, the dideoxy chain termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989)), pyrosequencing (see Uhlmann et al., Electrophoresis, 23:4072-4079, 2002), solid-phase pyrosequencing (see Landegren et al., Genome Res., 8(8):769-776, 1998), solid-phase micro-sequencing (see, e.g., Southern et al., Genomics, 13:1008-1017, 1992), micro-sequencing using FRET (see, e.g., Chen and Kwok, Nucleic Acids Res. 25:347-353, 1997), sequencing by ligation, or ultra-deep sequencing (see Margulies et al., Nature 437(7057):376-80 (2005)).

In some embodiments, the quantitative analysis is performed by mass-based separation (e.g., electrophoresis, mass spectrometry). For example, the presence of methylated cytosine residues can be detected by combined bisulfite restriction analysis (COBRA), essentially as described by Xiong and Laird, Nucleic Acids Res., 25:2532-2534, 2001. This method exploits differences in restriction enzyme recognition sites between methylated and unmethylated nucleic acids after treatment with a compound that selectively converts unmethylated cytosine residues (e.g., bisulfite). For example, the restriction endonuclease TaqI cleaves the sequence TCGA; after bisulfite treatment of unmethylated nucleic acid this sequence becomes TTGA and is therefore not cleaved. The digested and/or undigested nucleic acids are then detected using means known in the art, such as electrophoresis and/or mass spectrometry. As another example, differences in the amplification products, arising from differences in nucleotide sequence and/or secondary structure after treatment with a compound that selectively converts unmethylated cytosine residues, can be detected using techniques such as methylation-specific single-strand conformation analysis (MS-SSCA) (Bianco et al., Hum. Mutat., 14:289-293, 1999), methylation-specific denaturing gradient gel electrophoresis (MS-DGGE) (Abrams and Stanton, Methods Enzymol., 212:71-74, 1992), and methylation-specific denaturing high-performance liquid chromatography (MS-DHPLC) (Deng et al., Chin. J. Cancer Res., 12:171-191, 2000).
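The TaqI example can be checked in silico with a short Python sketch. It is a simplification under stated assumptions: only the top strand is modeled, bisulfite conversion is assumed complete, and the fragment and function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment plus PCR on the top strand.

    Every C becomes T unless its index is in methylated_positions
    (intended to be the C of a methylated CpG), in which case it stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def count_taqi_sites(seq):
    """Number of TaqI recognition sites (TCGA) in the sequence."""
    return sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TCGA")


genomic = "GGTCGACC"  # hypothetical fragment with one CpG (C at index 3)
methylated = bisulfite_convert(genomic, {3})      # "GGTCGATT": TCGA kept
unmethylated = bisulfite_convert(genomic, set())  # "GGTTGATT": site becomes TTGA
print(count_taqi_sites(methylated), count_taqi_sites(unmethylated))  # 1 0
```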

In some embodiments, the quantitative analysis is performed by target capture (e.g., hybridization, microarray). Suitable hybridization-based detection methods are known in the art, for example Southern blot, dot blot, slot blot, or other nucleic acid hybridization techniques (Kawai et al., Mol. Cell. Biol. 14:7421-7427, 1994; Gonzalgo et al., Cancer Res. 57:594-599, 1997). In some embodiments, the probe used in the hybridization assay is detectably labeled. In some embodiments, the nucleic acid-based probes used in the hybridization assay are unlabeled; such unlabeled probes may be immobilized on a solid support such as a microarray and hybridized to detectably labeled target nucleic acid molecules. One example of a microarray is a methylation-specific microarray, which can be used to distinguish sequences with converted cytosine residues from sequences with unconverted cytosine residues (see Adorjan et al., Nucleic Acids Res. 30:e21, 2002). Hybridization-based assays can also be applied to nucleic acids after treatment with methylation-sensitive restriction enzymes. As another example, the methylation state of CpG dinucleotide sequences within a DNA sequence can be determined with an oligonucleotide probe that hybridizes to the bisulfite-treated DNA concurrently with the PCR amplification primers (which may be methylation-specific primers or standard primers).

In some embodiments, the quantitative analysis is performed in the presence of a detection reagent. As used herein, the term "detection reagent" is a reagent used to detect the presence, absence, or amount of nucleic acid in a quantitative analysis step. Various detection reagents known in the art may be used in the present application. In some embodiments, the detection reagent is selected from the group consisting of: fluorescent probes, intercalating dyes, chromophore-labeled probes, radioisotope-labeled probes, and biotin-labeled probes.

In some embodiments, the quantitative analysis comprises amplifying the treated DNA using a quantitative primer pair and a DNA polymerase. As used herein, the term "quantitative primer pair" refers to one or more primer pairs used in a quantitative analysis step. Preferably, the quantitative primer pair is capable of hybridizing to at least 9 consecutive nucleotides of the treated DNA under stringent, moderately stringent or highly stringent conditions.

In some embodiments, the quantitative analysis comprises determining the methylation level of one or more target markers based on the presence or level of a plurality of CpG dinucleotides, TpG dinucleotides, or CpA dinucleotides in the treated DNA. In some embodiments, the quantitative analysis comprises determining the methylation level of cytosine residues based on the presence or level of one or more CpG dinucleotides in the treated DNA. In some embodiments, the quantitative analysis comprises determining the methylation level of cytosine residues based on the presence or level of one or more TpG dinucleotides in the treated DNA. In some embodiments, the quantitative analysis comprises determining the methylation level of cytosine residues based on the presence of CpA dinucleotides in the treated DNA.

In some embodiments, the quantification step is performed by dividing the treated DNA product into a plurality of portions. In some embodiments, different quantitative assays are performed on the plurality of portions, each portion being used to quantify a different combination of target markers in the treated DNA product (if present in that portion). In some embodiments, the control marker is quantified in each portion.

In some embodiments, the methylation level of each target marker is quantified separately by MSP (see Herman, supra) on the pre-amplified DNA. For example, by using one or more primers that hybridize specifically to unconverted sequences under moderate and/or high stringency conditions, amplification products are produced only when the template contains methylated cytosines at the CpG sites.

In some embodiments, the quantitative primer pair is designed to amplify at least a portion of the treated DNA product, i.e., the quantitative analysis is designed as nested PCR. Nested PCR is a modification of PCR intended to improve sensitivity and specificity. It uses two primer sets in two successive PCR reactions: a first round of amplification generates a first amplicon, and a second round is performed with a second primer pair in which one or both primers anneal to sites within the region defined by the first primer pair, i.e., the second primer pair is "nested" within the first. In this way, background amplification products from the first PCR reaction that do not contain the correct internal sequence are not further amplified in the second reaction.

Typically, the PCR reaction solution comprises Taq DNA polymerase, PCR buffer, primers, probes, dNTPs, and Mg²⁺. Preferably, the Taq DNA polymerase is a hot-start Taq DNA polymerase. Illustratively, the final Mg²⁺ concentration is 1.0-20.0 mM, the concentration of each primer is 100-500 nM, and the concentration of each probe is 100-500 nM. Exemplary PCR reaction conditions are: pre-denaturation at 95 ℃ for 5 min, followed by 50 cycles of denaturation at 95 ℃ for 15 s and annealing at 60 ℃ for 60 s.
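The exemplary thermal program can be written down as data, which also makes its timing explicit. This Python sketch is illustrative only: the `Step` class and function name are hypothetical, and the total counts hold times only, ignoring instrument ramp and fluorescence-read times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temperature_c: float
    hold_s: int


# Exemplary program from the text: 95 ℃ 5 min, then 50 x (95 ℃ 15 s, 60 ℃ 60 s)
PRE_DENATURATION = Step("pre-denaturation", 95.0, 5 * 60)
CYCLE = (Step("denaturation", 95.0, 15), Step("annealing", 60.0, 60))
N_CYCLES = 50


def total_hold_time_s(pre, cycle, n_cycles):
    """Sum of hold times only; ramping and plate-read times are not included."""
    return pre.hold_s + n_cycles * sum(step.hold_s for step in cycle)


total = total_hold_time_s(PRE_DENATURATION, CYCLE, N_CYCLES)
print(total, total / 60)  # 4050 67.5
```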

In some embodiments, the methods of the invention comprise a pre-amplification step. One purpose of pre-amplifying the target markers is to increase the copy number of the target markers in the treated DNA. As used herein, the term "amplification" refers generally to any process that results in an increase in the copy number of a molecule or group of related molecules. When used with respect to a polynucleotide molecule, "amplification" means that multiple copies of the polynucleotide molecule, or of portions of it, are produced, typically starting from a small amount of the polynucleotide, and the amplified material (amplicon, PCR amplicon) is typically detectable. Amplification of polynucleotides encompasses a variety of chemical and enzymatic processes. Forms of amplification include generating multiple DNA copies from one or a few copies of a template RNA or DNA molecule by the polymerase chain reaction (PCR, including reverse transcription PCR), strand displacement amplification (SDA), transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), or the ligase chain reaction (LCR).

The target markers in the treated DNA may be pre-amplified with pre-amplification primers. As used herein, the term "primer" refers to a single-stranded oligonucleotide capable of acting as a point of initiation of template-directed DNA synthesis in the presence of four different nucleoside triphosphates and a polymerization agent (e.g., DNA polymerase) under suitable conditions (e.g., buffer and temperature). In any given case, the length of the primer depends on, for example, its intended use, and is typically in the range of 15 to 30 nucleotides. Short primer molecules generally require lower temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact sequence of the template, but must be sufficiently complementary to hybridize to it. The primer site is the region of the template to which the primer hybridizes. A primer pair is a set of primers comprising a 5' forward primer that hybridizes to the 5' end of the sequence to be amplified and a 3' reverse primer that hybridizes to the complementary strand at the 3' end of the sequence to be amplified. The skilled person can design primers for the markers to be amplified based on common knowledge in the art (see, e.g., PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratories, NY, 1995). In addition, software packages for designing optimal probes and/or primers for a wide variety of assays are publicly available, such as Primer3, available from the Center for Genome Research (Cambridge, Massachusetts, USA). The intended use of a probe or primer should also be considered when designing it. For example, a primer designed for the purposes of the present invention may comprise at least one CpG site, or the amplification product obtained with the primer may comprise at least one CpG site.
Tools for designing primers for detecting the methylation state of DNA are also known in the art, for example MethPrimer (Li LC and Dahiya R. MethPrimer: designing primers for methylation PCRs. Bioinformatics. 2002 Nov;18(11):1427-31). In this application, by using the pre-amplification primers as a primer pool, any target marker in the treated DNA (in each case at least a portion of the target marker, or a sub-region of it) can be pre-amplified.
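The two design constraints stated above (a typical length of 15 to 30 nucleotides and at least one CpG site in the primer) can be screened with a few lines of Python. This is a hypothetical pre-filter, not a replacement for Primer3 or MethPrimer; the function names and the example sequence are invented for illustration and are not primers from this application.

```python
def count_cpg(seq):
    """Number of CpG dinucleotides in the sequence."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def gc_fraction(seq):
    """Fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def primer_issues(seq, min_len=15, max_len=30, min_cpg=1):
    """List reasons why a candidate primer fails the length and CpG criteria."""
    issues = []
    if not min_len <= len(seq) <= max_len:
        issues.append(f"length {len(seq)} outside {min_len}-{max_len} nt")
    if count_cpg(seq) < min_cpg:
        issues.append("contains no CpG site")
    return issues


candidate = "TTCGGTTTCGTAGTTCGAG"  # hypothetical methylated-template primer, 19 nt, 3 CpGs
print(count_cpg(candidate), round(gc_fraction(candidate), 3), primer_issues(candidate))
print(primer_issues("AAAAATTTTT"))  # too short and no CpG
```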

As used herein, the term "complementary" refers to hybridization or base pairing between nucleotides or nucleic acids, e.g., between the two strands of a double-stranded DNA molecule, or between an oligonucleotide primer and a primer binding site on a single-stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are typically A and T (or A and U), or C and G. Two single-stranded RNA or DNA molecules are said to be complementary when, with the nucleotides of one strand optimally aligned and compared and with appropriate nucleotide insertions or deletions, at least about 80% (typically at least about 90% to 95%, more preferably about 98% to 100%) of them pair with the nucleotides of the other strand. Alternatively, complementarity exists when an RNA or DNA strand hybridizes to its complement under selective hybridization conditions. Typically, selective hybridization occurs when there is at least about 65% (preferably at least about 75%, more preferably at least about 90%) complementarity over a stretch of at least 14 to 25 nucleotides. See M. Kanehisa, Nucleic Acids Res. 12:203 (1984), incorporated herein by reference.
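Percent complementarity between two equal-length strands can be computed as sketched below (Python, hypothetical function names and sequences). The sketch deliberately omits the insertions and deletions the definition allows: it lays the strands antiparallel without gaps and counts Watson-Crick pairs only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(strand1, strand2):
    """Percent of positions where strand1 (5'->3') pairs with strand2 (5'->3')
    when the two are laid antiparallel with no gaps (Watson-Crick pairs only)."""
    if len(strand1) != len(strand2):
        raise ValueError("this gapless sketch needs strands of equal length")
    target = reverse_complement(strand2)
    matches = sum(a == b for a, b in zip(strand1.upper(), target))
    return 100.0 * matches / len(strand1)


s1 = "ACGTACGTAC"
print(percent_complementarity(s1, reverse_complement(s1)))  # 100.0
print(percent_complementarity(s1, "GTACGTACGA"))            # 90.0 (one mismatch)
```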

In some embodiments, the pre-amplification primer pool comprises at least one methylation-specific primer pair. In some embodiments, the pre-amplification primer pool comprises a plurality of methylation-specific primer pairs. In some embodiments, the pre-amplification step is performed by methylation-specific PCR ("MSP"), i.e., PCR using methylation-specific primers. This technique (MSP) has been described in Herman et al., Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands, Proc Natl Acad Sci USA 1996 Sep 3;93(18):9821-6, and in United States Patent No. 6,265,171.

As used herein, the term "methylation-specific primer pair" refers to a primer pair specifically designed to recognize CpG sites so as to exploit differences in methylation to amplify a particular target marker in the treated DNA. Such a primer acts only on molecules that have, or only on molecules that lack, a specific methylation state. For example, the primer may be an oligonucleotide that, under stringent, moderately stringent, or highly stringent conditions, hybridizes specifically to a particular CpG site when it is methylated but not when it is unmethylated; such primers specifically amplify a target marker that is methylated at that CpG site. As another example, the primer may be an oligonucleotide that, under the same conditions, hybridizes specifically to a particular CpG site when it is unmethylated but not when it is methylated; such primers specifically amplify target markers that are not methylated at that CpG site. Thus, in the present application, methylation-specific primers are used in the pre-amplification of at least one target marker in the treated DNA, making it possible to distinguish methylated from unmethylated CpG sites. The methylation-specific primer pairs of the present application comprise at least one primer that hybridizes to a bisulfite-treated CpG dinucleotide. Accordingly, the sequence of a primer specific for methylated DNA comprises at least one CpG dinucleotide, whereas the sequence of a primer specific for unmethylated DNA comprises a "T" at the position of the C in CpG and/or an "A" at the position of the G in CpG.
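The difference between methylated-specific and unmethylated-specific primers follows from the two converted templates they must match. A minimal Python sketch, assuming complete conversion and modeling the top strand only (the function name and fragment are hypothetical): a primer copied from the methylated template keeps CG, one copied from the unmethylated template has T at the former CpG cytosine, and a primer annealing to the converted opposite strand would correspondingly carry an A at the position of the G.

```python
def converted_templates(top_strand):
    """Return (fully methylated, fully unmethylated) bisulfite-converted top strands.

    Methylated version: C in CpG is kept, every other C becomes T.
    Unmethylated version: every C becomes T.
    """
    seq = top_strand.upper()
    methylated, unmethylated = [], []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            methylated.append("C" if in_cpg else "T")
            unmethylated.append("T")
        else:
            methylated.append(base)
            unmethylated.append(base)
    return "".join(methylated), "".join(unmethylated)


m_template, u_template = converted_templates("ACGCATCGG")  # hypothetical fragment
print(m_template)  # ACGTATCGG: CpG cytosines kept, so an M-primer here contains CG
print(u_template)  # ATGTATTGG: CpG cytosines read as T, so a U-primer has T here
```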

Methylation specific primer pairs typically comprise a forward primer and a reverse primer, each comprising an oligonucleotide sequence that hybridizes to at least 9 consecutive nucleotides of one of the markers (or a subregion of the marker) of interest under stringent conditions, moderately stringent conditions, or highly stringent conditions, wherein at least 9 consecutive nucleotides of one of the markers (or a subregion of the marker) of interest comprise at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) CpG site.

As used herein, the term "hybridization" may refer to a process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide. In one aspect, the resulting double-stranded polynucleotide may be a "hybrid" or "duplex". The salt concentration under "hybridization conditions" is typically less than about 1 M, often less than about 500 mM, and may be less than about 200 mM. A "hybridization buffer" is a buffered saline solution, such as 5× SSPE, or another such buffer known in the art. Hybridization temperatures can be as low as 5 ℃, but are typically above 22 ℃, more typically above about 30 ℃, and preferably above 37 ℃. Hybridization is usually performed under stringent conditions, i.e., conditions under which a sequence hybridizes to its target sequence but not to other, non-complementary sequences. Stringent conditions are sequence dependent and differ from case to case; for example, longer fragments may require higher hybridization temperatures than shorter fragments to hybridize specifically. Because other factors, including the base composition and length of the complementary strands, the presence of organic solvents, and the degree of base mismatch, can affect hybridization stringency, the combination of parameters is more important than the absolute value of any single one. Typically, stringent conditions are selected to be about 5 ℃ below the melting temperature (Tm) of the specific sequence at a defined ionic strength and pH. The Tm may be the temperature at which half of a population of double-stranded nucleic acid molecules has dissociated into single strands. Several equations for calculating the Tm of a nucleic acid are well known in the art.
As indicated in standard references, a simple estimate of Tm can be calculated by the formula Tm = 81.5 + 0.41 (% G+C) when the nucleic acid is in 1 M aqueous NaCl (see, e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)). Other references (e.g., Allawi and SantaLucia, Jr., Biochemistry, 36:10581-94 (1997)) include alternative calculation methods that take structural and environmental, as well as sequence, characteristics into account when calculating Tm.
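The simple Tm estimate and the "about 5 ℃ below Tm" rule for stringent conditions can be expressed directly in Python. This sketch implements only the quoted formula, which has no length or salt term; for short oligonucleotides the nearest-neighbor methods mentioned above are more appropriate, so the numbers here illustrate the arithmetic rather than a realistic Tm. The function names and sequence are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def simple_tm(seq):
    """Rough Tm estimate for a duplex in 1 M NaCl: Tm = 81.5 + 0.41 (% G+C)."""
    return 81.5 + 0.41 * gc_percent(seq)


def stringent_temperature(seq, degrees_below_tm=5.0):
    """Stringent hybridization temperature taken as about 5 ℃ below Tm."""
    return simple_tm(seq) - degrees_below_tm


probe = "ATGCATGCAT"  # hypothetical sequence, 40% G+C
print(gc_percent(probe), round(simple_tm(probe), 2), round(stringent_temperature(probe), 2))
```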

Generally, the stability of a hybrid is a function of ion concentration and temperature. Typically, the hybridization reaction is performed under conditions of lower stringency, followed by washes of varying but higher stringency. Exemplary stringent conditions include a pH of about 7.0 to about 8.3, a temperature of at least 25 ℃, and a sodium ion (or other salt) concentration of at least 0.01 M and not more than 1 M. For example, 5× SSPE (750 mM NaCl, 50 mM sodium phosphate, 5 mM EDTA, pH 7.4) at a temperature of about 30 ℃ is suitable for allele-specific hybridization, although the suitable temperature depends on the length and/or GC content of the hybridization region. In one aspect, the hybridization stringency used to determine the percentage of mismatches can be as follows: 1) high stringency: 0.1× SSPE, 0.1% SDS, 65 ℃; 2) medium stringency (also referred to as moderate stringency): 0.2× SSPE, 0.1% SDS, 50 ℃; 3) low stringency: 1.0× SSPE, 0.1% SDS, 50 ℃. It will be appreciated that equivalent stringency can be achieved using alternative buffers, salts, and temperatures. For example, moderately stringent hybridization may refer to conditions that allow a nucleic acid molecule (e.g., a probe) to bind to a complementary nucleic acid molecule. The hybridized nucleic acid molecules typically have at least 60% identity, including, for example, at least 70%, 75%, 80%, 85%, 90%, or 95% identity. Moderately stringent conditions may be conditions equivalent to hybridization at 42 ℃ in 50% formamide, 5× Denhardt's solution, 5× SSPE, 0.2% SDS, followed by washing in 0.2× SSPE, 0.2% SDS at 42 ℃. High stringency conditions can be provided by, for example, hybridization at 42 ℃ in 50% formamide, 5× Denhardt's solution, 5× SSPE, 0.2% SDS, followed by washing in 0.1× SSPE, 0.1% SDS at 65 ℃.
Low stringency hybridization may be performed under conditions equivalent to the following: hybridization at 22 ℃ in 10% formamide, 5× Denhardt's solution, 6× SSPE, 0.2% SDS, followed by washing in 1× SSPE, 0.2% SDS at 37 ℃. Denhardt's solution contains 1% polysucrose, 1% polyvinylpyrrolidone, and 1% bovine serum albumin (BSA). 20× SSPE (sodium chloride, sodium phosphate, EDTA) contains 3 M sodium chloride, 0.2 M sodium phosphate, and 0.025 M EDTA. Other suitable moderately stringent and highly stringent hybridization buffers and conditions are well known to those skilled in the art and are described, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainview, N.Y. (1989) and Ausubel et al., Short Protocols in Molecular Biology, 4th ed., John Wiley & Sons (1999).

In some embodiments, the pre-amplification primer pool further comprises a control primer pair for amplifying a control marker. Typically, a control marker is a nucleic acid with known characteristics (e.g., a known sequence or a known copy number per cell) used for comparison with an experimental target (e.g., a nucleic acid of unknown concentration). The control may be an endogenous nucleic acid, preferably an invariant gene, against which the test or target nucleic acid under analysis can be normalized. Such controls normalize for sample-to-sample variability, which may arise, for example, from sample processing or assay efficiency, and allow accurate sample-to-sample data comparison and quantitative analysis of amplification efficiency and bias.
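One common way to use such a control in real-time PCR is ΔCt normalization. The text does not specify which normalization is used, so the Python sketch below is an assumption: it supposes that target and control amplify with the same efficiency (1.0 meaning perfect doubling per cycle), and the function names and Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_control):
    """Delta Ct of a target marker relative to the control marker in the same sample."""
    return ct_target - ct_control


def relative_amount(ct_target, ct_control, efficiency=1.0):
    """Target amount relative to control, assuming both amplify with the same
    efficiency (1.0 = perfect doubling per cycle)."""
    return (1.0 + efficiency) ** (-delta_ct(ct_target, ct_control))


# Hypothetical Ct values for one sample
print(delta_ct(30.0, 27.0), relative_amount(30.0, 27.0))  # 3.0 0.125
```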

In some embodiments, the invention uses RRBS technology to detect the methylation level of CpG sites of a target marker of interest, and then calculates the average methylation ratio (average methylation fraction, AMF) of the marker as the DNA methylation level of the marker. The calculation of AMF may be performed as described in the embodiments of the present application.

V. Identification of benign and malignant thyroid nodules

The present invention finds that the methylation level of one or more target markers described herein can be used to determine whether a thyroid nodule in an individual is benign or malignant. In one or more embodiments, the methylation levels of the CpG sites in a target marker described herein can be detected, and the average methylation ratio (AMF) of the target marker can then be calculated as the DNA methylation level of the marker. Herein, the AMF can be calculated from the following formula:

where M is the total number of CpG sites in the marker, i denotes one of the CpG sites, N_C,i is the number of sequencing reads in which CpG site i is methylated, and N_T,i is the number of sequencing reads in which CpG site i is unmethylated.
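The AMF formula itself is not reproduced in this text, so the exact aggregation is an assumption. Two readings are consistent with the variable definitions above: pooling reads across all M sites, or averaging the per-site fractions N_C,i / (N_C,i + N_T,i). The hypothetical Python sketch below implements both; they agree when coverage is uniform and diverge when it is not.

```python
def amf_pooled(methylated_reads, unmethylated_reads):
    """Reading 1: total methylated reads / total reads over all M CpG sites."""
    if len(methylated_reads) != len(unmethylated_reads):
        raise ValueError("one methylated and one unmethylated count per CpG site")
    total_methylated = sum(methylated_reads)
    total = total_methylated + sum(unmethylated_reads)
    return total_methylated / total


def amf_site_mean(methylated_reads, unmethylated_reads):
    """Reading 2: mean over CpG sites of N_C,i / (N_C,i + N_T,i); uncovered sites skipped."""
    if len(methylated_reads) != len(unmethylated_reads):
        raise ValueError("one methylated and one unmethylated count per CpG site")
    fractions = [c / (c + t) for c, t in zip(methylated_reads, unmethylated_reads) if c + t > 0]
    if not fractions:
        raise ValueError("no CpG site has read coverage")
    return sum(fractions) / len(fractions)


# Hypothetical read counts for a marker with M = 2 CpG sites
n_c = [18, 1]  # N_C,i: reads with CpG site i methylated
n_t = [2, 9]   # N_T,i: reads with CpG site i unmethylated
print(amf_pooled(n_c, n_t), amf_site_mean(n_c, n_t))
```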

The malignancy prediction probability of the sample is then calculated with the constructed mathematical model, a logistic regression model. First, the input z of the sigmoid function is calculated by the following formula:

$z = \sum_{j} w_j x_j + w_0$

Then the sigmoid function is calculated as follows:

$\sigma(z) = \dfrac{1}{1 + e^{-z}}$

where w_j is the regression coefficient of marker j, w_0 is the intercept, and x_j is the DNA methylation level (i.e., AMF) calculated for marker j; the sum runs over all markers in the model. The value σ(z) is the malignancy prediction probability.
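The two formulas can be combined into one scoring function. In this Python sketch the two-marker weights, intercept and threshold are invented for illustration and are not the coefficients or thresholds of this application; the function names are hypothetical.

```python
import math


def malignancy_probability(amf_values, weights, intercept):
    """sigma(z) with z = sum_j w_j * x_j + w_0, where x_j is the AMF of marker j."""
    if len(amf_values) != len(weights):
        raise ValueError("one weight per marker is required")
    z = sum(w * x for w, x in zip(weights, amf_values)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold):
    """Malignant if the predicted probability is above the threshold, otherwise benign."""
    return "malignant" if probability > threshold else "benign"


# Hypothetical two-marker model; weights, intercept and threshold are NOT from the patent
weights = [2.0, -1.0]
intercept = -0.5
p = malignancy_probability([0.8, 0.3], weights, intercept)  # z is about 0.8
print(round(p, 3), classify(p, threshold=0.5))
```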

A model is constructed from the DNA methylation level of each marker in the training-set samples, and the threshold that maximizes the Youden index on the training set is used as the malignancy prediction threshold. The malignancy prediction threshold obtained for each marker is shown in Table 6.
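Selecting the threshold by the Youden index (J = sensitivity + specificity - 1) can be sketched in Python as follows. Assumptions: a score above the threshold is called malignant, candidate thresholds are the observed training scores (midpoints between scores are another common choice), and ties keep the lowest threshold. The data and function name are hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is called malignant (label 1) when its score is above the threshold.
    Candidate thresholds are the observed scores; ties keep the lowest threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both malignant and benign training samples are required")
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical training-set predicted probabilities (1 = malignant, 0 = benign)
train_scores = [0.9, 0.8, 0.7, 0.3, 0.75, 0.2, 0.1, 0.05]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(youden_threshold(train_scores, train_labels))  # (0.2, 0.75)
```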

In some embodiments, the malignancy prediction probability of each sample is calculated according to the above formulas from the methylation level of a single target marker described herein; if the value is above the threshold for that target marker shown in Table 6, the nodule is determined to be malignant, and otherwise benign. In preferred embodiments, the target marker is the PRDM16 sequence of the PRDM16 gene or genome, the BIN1 sequence of the BIN1 gene or genome, the LIMK1 sequence of the LIMK1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the PPIF sequence of the PPIF gene or genome, the ZNF219 sequence of the ZNF219 gene or genome, the UACA sequence of the UACA gene or genome, the TNK1 sequence of the TNK1 gene or genome, the CEP295NL sequence of the CEP295NL gene or genome, the SBNO2 sequence of the SBNO2 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the ICAM5 sequence of the ICAM5 gene or genome, the CRTC1 sequence of the CRTC1 gene or genome, the RTN4R sequence of the RTN4R gene or genome, the CAMK2N1 sequence of the CAMK2N1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the ICAM2 sequence of the ICAM2 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the LSG1 sequence of the LSG1 gene or genome, or the BCR sequence of the BCR gene or genome.

In other embodiments, any combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more of the target markers described herein may be used as a marker panel for diagnosing whether a thyroid nodule is benign or malignant. The malignancy prediction threshold for the panel is determined using the methods described herein, and the malignancy prediction probability of the panel in a sample from the individual (preferably thyroid nodule tissue, such as a puncture sample) is compared with that threshold: a value above the threshold indicates malignancy, and a value below it indicates that the nodule is benign.

In some embodiments, the one or more target markers include one or more of the following target markers: the EGR3 sequence of the EGR3 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, and the BCR sequence of the BCR gene or genome.

In some embodiments, the one or more target markers comprise: the PRDM16 sequence of the PRDM16 gene or genome, the BIN1 sequence of the BIN1 gene or genome, the LIMK1 sequence of the LIMK1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the PPIF sequence of the PPIF gene or genome, the ZNF219 sequence of the ZNF219 gene or genome, the UACA sequence of the UACA gene or genome, the TNK1 sequence of the TNK1 gene or genome, the CEP295NL sequence of the CEP295NL gene or genome, the SBNO2 sequence of the SBNO2 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the ICAM5 sequence of the ICAM5 gene or genome, the CRTC1 sequence of the CRTC1 gene or genome, and the RTN4R sequence of the RTN4R gene or genome; and the threshold is 0.49.

In some embodiments, the one or more target markers comprise: the CAMK2N1 sequence of the CAMK2N1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the ICAM2 sequence of the ICAM2 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, and the NOL4L-DT sequence of the NOL4L-DT gene or genome; and the threshold is 0.58.

In some embodiments, the one or more target markers comprise: the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the LSG1 sequence of the LSG1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, and the BCR sequence of the BCR gene or genome; and the threshold is 0.52.

In some embodiments, the one or more target markers comprise: the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, and the BCR sequence of the BCR gene or genome; and the threshold is 0.52.

In some embodiments, the one or more target markers comprise: the TACSTD2 sequence of the TACSTD2 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome and the BCR sequence of the BCR gene or genome; and the threshold is 0.52.

Particularly preferably, the Hg genomic coordinates of each target marker are as described herein, in particular as shown in Table 6.

In addition to the above comparisons, one of skill in the art can determine whether an individual's thyroid nodule is malignant or at risk of malignancy based on various factors, such as age, gender, medical history, family history, symptoms, and the like.

VI Compositions and kits

The present invention provides a methylation detection or diagnostic kit, and a diagnostic reagent or diagnostic composition, for distinguishing benign from malignant thyroid nodules, comprising reagents for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers described herein. Depending on the target marker to be detected, the kit and composition may contain primer and/or probe molecules. Preferably, the primers comprise a primer pair capable of hybridizing to the target marker to be detected, or a target region thereof, under stringent, moderately stringent or highly stringent conditions. The primers may also include primers for detecting an internal reference gene such as ACTB.

In some embodiments, the primers are packaged in a single container or in separate containers. In some embodiments, the kit further comprises one or more blocking oligonucleotides.

In some embodiments, the kits and compositions further comprise a detection reagent. In some embodiments, the detection reagent is selected from the group consisting of: fluorescent probes, intercalating dyes, chromophore-labeled probes, radioisotope-labeled probes, and biotin-labeled probes.

In some embodiments, the kit may further comprise a DNA polymerase and/or a container suitable for storing a biological sample obtained from an individual. In some embodiments, the kit further comprises instructions for use and/or interpretation of the detection results of the kit.

In some embodiments, the kits and compositions may further comprise reagents for enzymatic or non-enzymatic conversion. In a preferred embodiment, the kit further comprises a bisulfite reagent or a methylation-sensitive restriction enzyme (MSRE). In some embodiments, the bisulfite reagent is selected from the group consisting of: ammonium bisulfite, sodium bisulfite, potassium bisulfite, calcium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ions, and any combination thereof. In some embodiments, the bisulfite reagent is sodium bisulfite. In some embodiments, the MSRE is selected from the group consisting of: HpaII enzyme, SalI enzyme, ...

The kits and compositions may also include a positive standard for conversion, in which unmethylated cytosines are converted to a base that does not pair with guanine. The positive standard may be fully methylated.
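The conversion principle behind the bisulfite reagent and the fully methylated positive standard can be illustrated with a minimal in-silico sketch. It assumes complete conversion and models only the top strand; the function names `bisulfite_convert` and `cpg_positions` are hypothetical and not part of the kit. Unmethylated cytosines are read as thymine after conversion and PCR, while methylated cytosines remain cytosine:

```python
def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide of seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Return the converted top strand, assuming complete conversion.

    Every C whose 0-based index is not in methylated_positions is written as T
    (unmethylated C -> U -> read as T after PCR); methylated C stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    ref = "ACGTTCGACCA"
    cpgs = cpg_positions(ref)                  # CpG cytosines at indices 1 and 5
    # Fully methylated standard: every CpG cytosine is methylated.
    print(bisulfite_convert(ref, set(cpgs)))   # → ACGTTCGATTA
    # Fully unmethylated DNA: every cytosine converts.
    print(bisulfite_convert(ref, set()))       # → ATGTTTGATTA
```

In the fully methylated case only the CpG cytosines survive, so under the sketch's assumption that non-CpG cytosines are unmethylated, any remaining non-CpG C in reads from such a standard would suggest incomplete conversion.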

The kits and compositions may also include PCR reaction reagents. Preferably, the PCR reagents comprise Taq DNA polymerase, PCR buffer, dNTPs and Mg²⁺.

In some embodiments, the kits and compositions further comprise standard reagents useful for performing CpG site-specific methylation assays, wherein the assays include one or more of the following techniques: MS-SNuPE, MSP, MethyLight™, HeavyMethyl™, COBRA and nucleic acid sequencing.

In some embodiments, the kits and compositions may comprise additional reagents selected from the group consisting of: buffers (e.g., restriction enzyme, PCR, storage or washing buffers), DNA recovery reagents or kits (e.g., precipitation, ultrafiltration, affinity columns), DNA recovery components, and the like.

The kit of the present application may further comprise one or more of the following components known in the DNA enrichment art: a protein component that selectively binds methylated DNA; a triplex-forming nucleic acid component; one or more linkers, optionally in a suitable solution; substances or solutions for performing ligation, e.g., ligase and buffers; substances or solutions for performing column chromatography; substances or solutions for performing an immunological enrichment (e.g., immunoprecipitation); substances or solutions for performing nucleic acid amplification, such as PCR; one or more dyes, if applicable with a coupling agent, if applicable in solution; substances or solutions for performing hybridization; and/or substances or solutions for performing the washing steps.

In other embodiments, the compositions of the invention contain an isolated nucleic acid molecule selected from one or more of the following: PRDM16 gene: chr1:3155061:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:59041615:59042314; CRABP2 gene: chr1:15667274:156676773; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3BP2 gene: chr4:2795932:2795331; SLC12A7 gene: chr5:1117661:1118360; NR2F1 gene: chr5:92914797:92915496; EGR1 gene: chr5:137802399:137803098; LARP1 gene: chr5:154133955:154134654; RARS gene: chr5:167837780:167838499; TTBK1 gene: chr6:43215063:43215762; FAM20C gene: chr7:193512:194211; CREB5 gene: chr7:28449041:28449740; LIMK1 gene: chr7:73508743:73509442; PRKAG2 gene: chr7:151424814:151425513; SLC39A14 gene: chr8:22236914:22237613; EGR3 gene: chr8:22547976:22549090; DUSP26 gene: chr8:34104888:34105587; AGPAT2 gene: chr9:139581855:139582554; NRARP gene: chr9:140205734:140206433; EGR2 gene: chr10:64578269:64578968; PPIF gene: chr10:81001706:81002405; CHID1 gene: chr11:911289:911988; ADM gene: chr11:10328946:10329645; NAV2 gene: chr11:19734801:19736359; EHBP1L1 gene: chr11:65343387:65344086; PHLDB1 gene: chr11:118479144:118479843; PARP11 gene: chr12:4139935:4140634; ANO6 gene: chr12:45610331:45611030; PLXNC1 gene: chr12:94544076:94544775; ZNF219 gene: chr14:21559748:21560447; FOXA1 gene: chr14:38064876:38065575; PAPLN gene: chr14:73704629:73705328; UACA gene: chr15:70766881:70767580; PGPEP1L gene: chr15:99466242:99466941; ITPRIPL2 gene: chr16:19125694:19126393; TNK1 gene: chr17:7286958:7287657; RPL19 gene: chr17:37366033:37366732; ICAM2 gene: chr17:6207008:6207677; TMC6 gene: chr17:7613226:7624091; CEP295NL gene: chr17:76879761:76880460; BAIAP2 gene: chr17:79060865:79061564; TBCD gene: chr17:80744791:80745490; METRNL gene: chr17:81083812:81084511; MED16 gene: chr19:883793:884492; SBNO2 gene: chr19:1177275:1177974; CIRBP gene: chr19:1265690:1266389; KLF16 gene: chr19:1860343:1861042; C19orf77 gene: chr19:3434666:3435687; SNAPC2 gene: chr19:7985709:7986408; ICAM1 gene: chr19:10381317:10382016; ICAM5 gene: chr19:10404832:10405531; IER2 gene: chr19:13266647:13267346; ASF1B gene: chr19:14248133:14248832; CRTC1 gene: chr19:18770961:18771660; ZNF536 gene: chr19:31039247:31039946; LTBP4 gene: chr19:41105706:41106405; NOL4L-DT gene: chr20:31162101:31162800; KCNK15 gene: chr20:43374048:43374747; UCKL1 gene: chr20:62588113:62588812; RTN4R gene: chr22:20226373:20227274; BCR gene: chr22:23624092:23624791; TEF gene: chr22:41771229:41771928.

In other embodiments, the compositions of the invention contain an isolated nucleic acid molecule selected from one or more of the following: PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041865:59042064; CRABP2 gene: chr1:15667684:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH3BP2 gene: chr4:279282:2795581; SLC12A7 gene: chr5:1117911:1118110; NR2F1 gene: chr5:92915047:92915246; EGR1 gene: chr5:137802649:137802848; LARP1 gene: chr5:154134205:154134404; RARS gene: chr5:167838020:167838129; TTBK1 gene: chr6:43215313:43215512; FAM20C gene: chr7:193762:193961; CREB5 gene: chr7:28449291:28449490; LIMK1 gene: chr7:73508993:73509192; PRKAG2 gene: chr7:151425064:151425263; SLC39A14 gene: chr8:22237164:22237363; EGR3 gene: chr8:22548226:22548425; EGR3 gene: chr8:22548641:22548840; DUSP26 gene: chr8:34105138:34105337; AGPAT2 gene: chr9:139582105:139582304; NRARP gene: chr9:140205984:140206183; EGR2 gene: chr10:64578519:64578718; PPIF gene: chr10:81001956:81002155; CHID1 gene: chr11:911539:911738; ADM gene: chr11:10329196:10329395; NAV2 gene: chr11:19735051:19735250; NAV2 gene: chr11:19735910:19736109; EHBP1L1 gene: chr11:65343637:65343836; PHLDB1 gene: chr11:118479394:118479593; PARP11 gene: chr12:4140185:4140384; ANO6 gene: chr12:45610581:45610780; PLXNC1 gene: chr12:94544326:94544525; ZNF219 gene: chr14:21559998:21560197; FOXA1 gene: chr14:38065126:38065325; PAPLN gene: chr14:73704879:73705078; UACA gene: chr15:70767131:70767330; PGPEP1L gene: chr15:99466492:99466691; ITPRIPL2 gene: chr16:19125944:19126143; TNK1 gene: chr17:7287208:7287407; RPL19 gene: chr17:37366283:37366482; ICAM2 gene: chr17:62076858:62057657; TMC6 gene: chr17:7613476:7613675; TMC6 gene: chr17:7623642:7623841; CEP295NL gene: chr17:76880011:76880210; BAIAP2 gene: chr17:79061115:79061314; TBCD gene: chr17:80745041:80745240; METRNL gene: chr17:81084062:81084261; MED16 gene: chr19:884043:884242; SBNO2 gene: chr19:1177525:1177724; CIRBP gene: chr19:1265940:1266139; KLF16 gene: chr19:1860593:1860792; C19orf77 gene: chr19:3434916:3435115; C19orf77 gene: chr19:3435238:3435437; SNAPC2 gene: chr19:7985959:7986158; ICAM1 gene: chr19:10381567:10381766; ICAM5 gene: chr19:10405082:10405281; IER2 gene: chr19:13266897:13267096; ASF1B gene: chr19:14248383:14248582; CRTC1 gene: chr19:18771211:18771410; ZNF536 gene: chr19:31039497:31039696; LTBP4 gene: chr19:41105956:41106155; NOL4L-DT gene: chr20:31162351:31162550; KCNK15 gene: chr20:43374298:43374497; UCKL1 gene: chr20:62588363:62588562; RTN4R gene: chr22:20226623:20226822; RTN4R gene: chr22:20226825:20227024; BCR gene: chr22:23624342:23624541; TEF gene: chr22:41771479:41771678.

The present application also includes a medium bearing the sequence of the isolated nucleic acid molecules described herein and optionally methylation information thereof for comparison with gene methylation sequencing data to determine the presence, amount, and/or methylation level of the nucleic acid molecules. Preferably, the medium is a card, such as paper, plastic, metal, glass card, printed with the sequence and optionally its methylation information. Preferably, the medium is a computer readable medium storing the sequence and optionally methylation information thereof and a computer program which, when executed by a processor, performs the steps of: comparing methylation sequencing data of a sample to said sequence, thereby obtaining the presence, amount and/or methylation level of a nucleic acid molecule comprising said sequence in said sample.
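The comparison step performed by such a program can be sketched as assigning per-CpG methylation calls to the stored regions. This is a minimal illustration only: the tuple layout `(chrom, pos, n_meth, n_unmeth)`, the helper names `parse_region` and `summarize_regions`, and the assumption that the `chrN:start:end` coordinates are 1-based and inclusive are all hypothetical, since the text does not specify a file format or coordinate convention.

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def parse_region(name: str, text: str) -> Region:
    """Parse the 'chrN:start:end' notation used in this document."""
    chrom, start, end = text.replace(" ", "").split(":")
    start_i, end_i = int(start), int(end)
    if start_i > end_i:
        raise ValueError(f"start > end in {text!r}")
    return Region(name, chrom, start_i, end_i)


def summarize_regions(regions, calls):
    """Group per-CpG calls (chrom, pos, n_meth, n_unmeth) by region.

    Returns {region name: (covered CpG sites, total reads, methylated reads)}.
    Overlapping regions (e.g. the two EGR3 regions) each receive the shared sites.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, n_meth, n_unmeth in calls:
        by_chrom[chrom].append((pos, n_meth, n_unmeth))
    summary = {}
    for r in regions:
        sites = [c for c in by_chrom[r.chrom] if r.start <= c[0] <= r.end]
        total = sum(m + u for _, m, u in sites)
        meth = sum(m for _, m, _ in sites)
        summary[r.name] = (len(sites), total, meth)
    return summary


regions = [parse_region("EGR3_a", "chr8:22547976:22548675"),
           parse_region("EGR3_b", "chr8:22548391:22549090")]
calls = [("chr8", 22548000, 9, 1), ("chr8", 22548500, 4, 6), ("chr8", 22549000, 0, 12)]
print(summarize_regions(regions, calls))  # → {'EGR3_a': (2, 20, 13), 'EGR3_b': (2, 22, 4)}
```

A linear scan per region is used for clarity; with many regions, sorting the calls by position and using binary search would be the obvious refinement.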

The present application also includes an apparatus for identifying benign or malignant thyroid nodules, the apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the program: (1) obtaining, in the sample, the methylation level of one or more target markers described herein or of target regions thereof; and (2) determining whether the thyroid nodule is benign or malignant according to the methylation level obtained in (1). Preferably, the obtaining step is performed using any of the methods described in section IV of the present application; preferably, the determination is performed by any of the methods described in section V of the present application.

VII Use

The present application also provides the use of the isolated nucleic acid molecules described herein as detection targets in the diagnosis of benign versus malignant thyroid nodules.

In validation set 1, the methylation markers of the invention (methylation marker combination 1) identified thyroid cancer with 100% sensitivity; more importantly, they also identified malignant thyroid nodules with indeterminate cytological classification with 100% sensitivity. Compared with existing molecular techniques for diagnosing benign versus malignant thyroid nodules, the methylation markers and technical solution provided by the invention address the low sensitivity of existing diagnostic techniques and facilitate the early diagnosis and early treatment of thyroid cancer, thereby improving the cure rate.

The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit its scope. Experimental methods in the following examples for which specific conditions are not stated were generally performed under conventional conditions or under the conditions recommended by the manufacturer. Percentages and parts are by weight unless otherwise indicated.

Examples

The method of the invention comprises the following steps:

1. The methylation level of each CpG site in the above markers is detected in the sample by reduced representation bisulfite sequencing (RRBS), and the average methylation fraction (AMF) of each marker is then calculated as the DNA methylation level of that marker:

$$\mathrm{AMF} = \frac{1}{m}\sum_{i=1}^{m}\frac{N_{C,i}}{N_{C,i}+N_{T,i}}$$

where m is the total number of CpG sites in the marker, i indexes the CpG sites, $N_{C,i}$ is the number of sequencing reads in which CpG site i is methylated, and $N_{T,i}$ is the number of sequencing reads in which CpG site i is unmethylated.
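The AMF definition translates directly into code. In this sketch each CpG site is given as a pair `(n_c, n_t)` of methylated and unmethylated read counts; the function name is hypothetical, and the optional `min_depth` filter is an assumption added to mirror the sequencing-depth cut-off applied in the examples:

```python
def average_methylation_fraction(sites, min_depth=0):
    """AMF of one marker: the mean over CpG sites of n_c / (n_c + n_t).

    sites: iterable of (n_c, n_t) pairs, one per CpG site, where n_c is the number
    of reads methylated at the site and n_t the number unmethylated.
    Sites with no reads, or with fewer than min_depth reads, are skipped.
    Returns None if no site passes the filter.
    """
    fractions = [n_c / (n_c + n_t) for n_c, n_t in sites
                 if n_c + n_t > 0 and n_c + n_t >= min_depth]
    if not fractions:
        return None
    return sum(fractions) / len(fractions)


sites = [(8, 2), (3, 7), (10, 10)]   # hypothetical read counts for three CpG sites
print(average_methylation_fraction(sites))                 # mean of 0.8, 0.3, 0.5
print(average_methylation_fraction(sites, min_depth=15))   # only (10, 10) passes → 0.5
```

AMF averages per-site fractions; pooling the reads first (21/40 = 0.525 for the three sites above) would give a slightly different value, so the per-site form is the one that matches the definition.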

2. The malignancy prediction probability of the sample is calculated with the constructed mathematical model, a logistic regression model. First, the input z of the sigmoid function is calculated:

$$z = \sum_{j} w_j x_j + w_0$$

Then the sigmoid function is calculated:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $w_j$ is the regression coefficient of marker j, $w_0$ is the intercept, and $x_j$ is the DNA methylation level of marker j in the sample.

3. The thyroid nodule is classified as benign or malignant according to the sample's malignancy prediction probability. A malignancy prediction threshold is determined for the model built on each methylation marker combination; if the sample's malignancy prediction probability is greater than this threshold, the sample is classified as malignant, and otherwise as benign.
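Steps 2 and 3 can be sketched as follows. The marker names, coefficients and methylation levels in the usage lines are hypothetical placeholders rather than values from this document, and each marker's level is assumed to be its AMF:

```python
import math


def malignancy_probability(levels, coefficients, intercept):
    """Step 2: sigma(z) with z = sum_j w_j * x_j + w0.

    levels: marker name -> DNA methylation level (AMF) of the sample.
    coefficients: marker name -> logistic regression coefficient w_j.
    """
    z = intercept + sum(w * levels[name] for name, w in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold):
    """Step 3: greater than the threshold -> malignant, otherwise benign."""
    return "malignant" if probability > threshold else "benign"


# Hypothetical two-marker model and sample, for illustration only.
coef = {"marker_A": 2.0, "marker_B": -1.0}
sample = {"marker_A": 0.5, "marker_B": 1.0}
p = malignancy_probability(sample, coef, intercept=0.0)  # z = 0, so p = 0.5
print(p, classify(p, 0.49), classify(p, 0.52))  # → 0.5 malignant benign
```

The strict "greater than" comparison follows the wording of step 3, so a probability exactly equal to the threshold is called benign.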

Publicly available raw reduced representation bisulfite sequencing (RRBS) data from a published study [Guerra A, Carrano M, Angrisani E, et al. Detection of RAS mutation by pyrosequencing in thyroid cytology samples. Int J Surg, 2014, 12 Suppl 1: S91-4] were analyzed for all benign and malignant thyroid nodule samples, namely 145 surgical samples (65 benign nodules, 80 malignant nodules), to obtain the methylation levels of CpG sites with a sequencing depth above 10x; the AMF of each methylation marker was then calculated from the CpG sites detected in that marker and used as the marker's DNA methylation level. To allow comparison with the benign/malignant thyroid nodule classification reported in a published article [Valderrabano P, Khazai L, Leon ME, et al. Evaluation of ThyroSeq v2 performance in thyroid nodules with indeterminate cytology. Endocr Relat Cancer, 2017, 24: 127-136], the samples were grouped as in that article: the Developing cohort served as the training set (28 benign nodules, 39 malignant nodules) and the Testing cohort as validation set 1 (37 benign nodules, 41 malignant nodules). In addition, 74 thyroid surgical samples from Chinese patients were collected as validation set 2 (37 benign nodules, 37 malignant nodules); for each sample, the CpG sites detected in each methylation marker were obtained with the same RRBS technique and analysis workflow, and the AMF was calculated as the marker's DNA methylation level. In the following examples, the mathematical model constructed from the training-set samples was applied to both validation sets, and its performance was assessed by the area under the receiver operating characteristic (ROC) curve (AUC).
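The text does not state how the AUC values were computed. One standard, equivalent formulation is the Mann-Whitney statistic: the fraction of (malignant, benign) sample pairs in which the malignant sample has the higher predicted probability, with ties counted as one half. A minimal sketch (confidence intervals are not computed here):

```python
def roc_auc(scores, labels):
    """AUC as P(random malignant sample outscores random benign sample).

    labels: 1 = malignant, 0 = benign. Ties count as one half (Mann-Whitney form).
    Runs in O(n_pos * n_neg), which is adequate for cohorts of this size.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for two malignant and two benign samples.
print(roc_auc([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))  # → 0.75
```

An AUC of 1.00, as reported for some combinations in validation set 1, corresponds to every malignant sample scoring above every benign one.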

Example 1

A model was constructed with the combination of methylation markers chr1:3155061:3155760, chr2:127822196:127822895, chr7:73508743:73509442, chr8:22547976:22548675, chr8:22548391:22549090, chr10:81001706:81002405, chr14:21559748:21560447, chr15:70766881:70767580, chr17:7286958:7287657, chr17:76879761:76880460, chr19:1177275:1177974, chr19:3434666:3435365, chr19:10404832:10405531, chr19:18770961:18771660 and chr22:20226373:20227072 (methylation marker combination 1), and its AUC was tested in both validation sets. The logistic regression coefficient w of each marker is shown in Table 1-1; the intercept w0 of the logistic regression model is 0.305.

Table 1-1: logistic regression model coefficients for each marker

Methylation marker	Gene name	Logistic regression coefficient
chr1:3155061:3155760	PRDM16	0.273
chr2:127822196:127822895	BIN1	-0.347
chr7:73508743:73509442	LIMK1	-0.258
chr8:22547976:22548675	EGR3	0.373
chr8:22548391:22549090	EGR3	0.239
chr10:81001706:81002405	PPIF	-0.228
chr14:21559748:21560447	ZNF219	0.413
chr15:70766881:70767580	UACA	-0.172
chr17:7286958:7287657	TNK1	-0.143
chr17:76879761:76880460	CEP295NL	-0.170
chr19:1177275:1177974	SBNO2	-0.230
chr19:3434666:3435365	C19orf77	-0.184
chr19:10404832:10405531	ICAM5	0.423
chr19:18770961:18771660	CRTC1	-0.251
chr22:20226373:20227072	RTN4R	-0.180

The results are shown in FIG. 1. The area under the ROC curve was 0.98 (95% CI 0.97-0.99) in validation set 1 and 0.95 (95% CI 0.93-0.97) in validation set 2. The malignancy prediction threshold was set to 0.49, the operating point at which the training set had 86% specificity and 92% sensitivity: a sample whose malignancy prediction probability is greater than 0.49 is classified as malignant, and otherwise as benign. At this threshold, the diagnosis of malignant thyroid nodules in validation set 1 reached 100% sensitivity, 76% specificity, 82% positive predictive value (PPV) and 100% negative predictive value (NPV); in validation set 2 it reached 87% sensitivity, 84% specificity, 84% PPV and 86% NPV. The prediction results of methylation marker combination 1 for the two validation sets are shown in Table 1-2 and Table 1-3, respectively.
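The four performance figures follow from the 2x2 table of predictions against pathology. The text reports only percentages; the counts used below for validation set 1 (41 malignant and 37 benign nodules, split as TP = 41, FN = 0, TN = 28, FP = 9) are a hypothetical reconstruction that reproduces the reported percentages after rounding, not data taken from Table 1-2:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, NPV and accuracy from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),   # malignant nodules called malignant
        "specificity": tn / (tn + fp),   # benign nodules called benign
        "ppv": tp / (tp + fp),           # malignant calls that are truly malignant
        "npv": tn / (tn + fn),           # benign calls that are truly benign
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


counts = dict(tp=41, fn=0, tn=28, fp=9)  # hypothetical, see the lead-in
metrics = diagnostic_metrics(**counts)
reported = {k: round(100 * metrics[k]) for k in ("sensitivity", "specificity", "ppv", "npv")}
print(reported)  # → {'sensitivity': 100, 'specificity': 76, 'ppv': 82, 'npv': 100}
```

The same function gives the accuracy figure used for the indeterminate-cytology subset in Example 7.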

Table 1-2: Results for validation set 1 samples predicted with methylation marker combination 1

Table 1-3: Results for validation set 2 samples predicted with methylation marker combination 1

Example 2

A model was constructed with the combination of methylation markers chr1:20813203:20813902, chr3:58153211:58153910, chr8:22547976:22548675, chr8:34104888:34105587, chr17:6207008:62076775, chr17:79060865:79061564, chr19:883793:884492, chr19:3434988:3435687 and chr20:31162101:31162800 (methylation marker combination 2), and its AUC was tested in both validation sets. The logistic regression coefficients of the markers are shown in Table 2-1; the intercept of the logistic regression model is 1.212.

Table 2-1: logistic regression model coefficients for each marker

The results are shown in FIG. 2. The area under the ROC curve was 1.00 (95% CI 1.00-1.00) in validation set 1 and 0.96 (95% CI 0.95-0.98) in validation set 2. The malignancy prediction threshold was set to 0.58, the operating point at which the training set had 96% specificity and 85% sensitivity: a sample whose malignancy prediction probability is greater than 0.58 is classified as malignant, and otherwise as benign. At this threshold, the diagnosis of malignant thyroid nodules in validation set 1 reached 88% sensitivity, 100% specificity, 100% PPV and 88% NPV; in validation set 2 it reached 76% sensitivity, 95% specificity, 93% PPV and 80% NPV. The prediction results of methylation marker combination 2 for the two validation sets are shown in Table 2-2 and Table 2-3, respectively.

Table 2-2: Results for validation set 1 samples predicted with methylation marker combination 2

Table 2-3: Results for validation set 2 samples predicted with methylation marker combination 2

Example 3

A model was constructed with the combination of methylation markers chr1:59041615:59042314, chr1:15667274:1566767973, chr3:58153211:58153910, chr3:194408527:194409226, chr8:22547976:22548675, chr17:7286958:7287657, chr17:79060865:79061564, chr20:31162101:31162800 and chr22:23624092:23624791 (methylation marker combination 3), and its AUC was tested in both validation sets. The logistic regression coefficients of the markers are shown in Table 3-1; the intercept of the logistic regression model is 1.681.

Table 3-1: logistic regression model coefficients for each marker

The results are shown in FIG. 3. The area under the ROC curve was 1.00 (95% CI 0.99-1.00) in validation set 1 and 0.97 (95% CI 0.95-0.98) in validation set 2. The malignancy prediction threshold was set to 0.52, the operating point at which the training set had 93% specificity and 95% sensitivity: a sample whose malignancy prediction probability is greater than 0.52 is classified as malignant, and otherwise as benign. At this threshold, the diagnosis of malignant thyroid nodules in validation set 1 reached 98% sensitivity, 100% specificity, 100% PPV and 97% NPV; in validation set 2 it reached 92% sensitivity, 87% specificity, 87% PPV and 91% NPV. The prediction results of methylation marker combination 3 for the two validation sets are shown in Table 3-2 and Table 3-3, respectively.

Table 3-2: Results for validation set 1 samples predicted with methylation marker combination 3

Table 3-3: Results for validation set 2 samples predicted with methylation marker combination 3

Example 4

A model was constructed with the combination of methylation markers chr1:59041615:59042314, chr1:15667274:1566767973, chr3:58153211:58153910, chr8:22547976:22548675, chr8:34104888:34105587, chr17:7286958:7287657, chr17:79060865:79061564, chr20:31162101:31162800 and chr22:23624092:23624791 (methylation marker combination 4), and its AUC was tested in both validation sets. The logistic regression coefficients of the markers are shown in Table 4-1; the intercept of the logistic regression model is 1.358.

Table 4-1: logistic regression model coefficients for each marker

The results are shown in FIG. 4. The area under the ROC curve was 1.00 (95% CI 0.99-1.00) in validation set 1 and 0.97 (95% CI 0.95-0.98) in validation set 2. The malignancy prediction threshold was set to 0.52, the operating point at which the training set had 93% specificity and 95% sensitivity: a sample whose malignancy prediction probability is greater than 0.52 is classified as malignant, and otherwise as benign. At this threshold, the diagnosis of malignant thyroid nodules in validation set 1 reached 95% sensitivity, 97% specificity, 98% PPV and 95% NPV; in validation set 2 it reached 92% sensitivity, 87% specificity, 87% PPV and 91% NPV. The prediction results of methylation marker combination 4 for the two validation sets are shown in Table 4-2 and Table 4-3, respectively.

Table 4-2: Results for validation set 1 samples predicted with methylation marker combination 4

Table 4-3: Results for validation set 2 samples predicted with methylation marker combination 4

Example 5

A model was constructed with the combination of methylation markers chr1:59041615:59042314, chr3:58153211:58153910, chr8:22547976:22548675, chr8:34104888:34105587, chr17:7286958:7287657, chr17:79060865:79061564, chr19:883793:884492, chr20:31162101:31162800 and chr22:23624092:23624791 (methylation marker combination 5), and its AUC was tested in both validation sets. The logistic regression coefficients of the markers are shown in Table 5-1; the intercept of the logistic regression model is 1.447.

Table 5-1: logistic regression model coefficients for each marker

Methylation marker	Gene name	Logistic regression coefficient
chr1:59041615:59042314	TACSTD2	-1.122
chr3:58153211:58153910	DNASE1L3	-0.724
chr8:22547976:22548675	EGR3	2.143
chr8:34104888:34105587	DUSP26	0.893
chr17:7286958:7287657	TNK1	-1.212
chr17:79060865:79061564	BAIAP2	-0.717
chr19:883793:884492	MED16	-0.411
chr20:31162101:31162800	NOL4L-DT	-1.258
chr22:23624092:23624791	BCR	-0.682
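Because the sigmoid is strictly increasing, the probability cut-off of 0.52 is equivalent to a cut-off on the linear score z, namely logit(0.52) = ln(0.52/0.48) ≈ 0.080. The sketch below encodes combination 5 with the Table 5-1 coefficients and the stated intercept (1.447); it assumes each x is the marker's AMF on a 0-1 scale, as the text implies, and the uniform AMF of 0.5 in the usage lines is a hypothetical input, not a real sample:

```python
import math

# Methylation marker combination 5: coefficients from Table 5-1, keyed by gene name.
COEFFICIENTS = {
    "TACSTD2": -1.122, "DNASE1L3": -0.724, "EGR3": 2.143, "DUSP26": 0.893,
    "TNK1": -1.212, "BAIAP2": -0.717, "MED16": -0.411, "NOL4L-DT": -1.258,
    "BCR": -0.682,
}
INTERCEPT = 1.447   # intercept stated for combination 5
THRESHOLD = 0.52    # malignancy prediction threshold stated for combination 5


def linear_score(levels):
    """z = sum_j w_j * x_j + w0 for one sample (levels: gene name -> AMF)."""
    return INTERCEPT + sum(w * levels[gene] for gene, w in COEFFICIENTS.items())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


# "sigmoid(z) > THRESHOLD" is the same test as "z > Z_CUTOFF".
Z_CUTOFF = logit(THRESHOLD)  # about 0.080

levels = {gene: 0.5 for gene in COEFFICIENTS}  # hypothetical sample
z = linear_score(levels)
print(round(z, 3), round(sigmoid(z), 3), z > Z_CUTOFF)  # → -0.098 0.476 False
```

Working on the z scale makes each coefficient directly interpretable: for example, under these assumptions a 0.1 increase in EGR3 AMF raises z by about 0.214.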

The results are shown in FIG. 5. The area under the ROC curve was 1.00 (95% CI 0.99-1.00) in validation set 1 and 0.97 (95% CI 0.96-0.99) in validation set 2. The malignancy prediction threshold was set to 0.52, the operating point at which the training set had 93% specificity and 95% sensitivity: a sample whose malignancy prediction probability is greater than 0.52 is classified as malignant, and otherwise as benign. At this threshold, the diagnosis of malignant thyroid nodules in validation set 1 reached 95% sensitivity, 97% specificity, 98% PPV and 95% NPV; in validation set 2 it reached 92% sensitivity, 87% specificity, 87% PPV and 91% NPV. The prediction results of methylation marker combination 5 for the two validation sets are shown in Table 5-2 and Table 5-3, respectively.

Table 5-2: Results for validation set 1 samples predicted with methylation marker combination 5

Table 5-3: Results for validation set 2 samples predicted with methylation marker combination 5

Example 6

AUC was tested in both validation sets using a model constructed separately for each methylation marker. The threshold determined by the Youden index on the training set was used as the malignancy prediction threshold: a sample above the threshold was classified as malignant, and otherwise as benign. The predictive performance of each methylation marker in the two validation sets is shown in Table 6.
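Threshold selection by the Youden index (J = sensitivity + specificity - 1) on the training set can be sketched as below. The candidate cut-offs are the observed training scores, and the rule "score above the cut-off means malignant" follows the text; the tie-breaking choice (keep the lowest cut-off reaching the maximum) and the toy scores are assumptions for illustration:

```python
def youden_threshold(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called malignant when score > cutoff. labels: 1 malignant, 0 benign.
    Candidate cutoffs are the distinct observed scores, tried in ascending order;
    on ties the first (lowest) cutoff reaching the best J is kept.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both malignant and benign training samples")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


scores = [0.10, 0.30, 0.35, 0.80, 0.90]  # hypothetical training-set probabilities
labels = [0, 0, 1, 1, 1]
print(youden_threshold(scores, labels))  # → (0.3, 1.0)
```

Because only training-set scores are used, the chosen cut-off can then be applied unchanged to the validation sets, as the text describes.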

Table 6: Predictive performance of each methylation marker on the validation set samples

Example 7

Up to 30% of thyroid fine-needle aspiration samples cannot be diagnosed accurately by cytology alone. According to the Bethesda System for Reporting Thyroid Cytopathology, 25 samples in validation set 1 had indeterminate cytological classification; for these samples, methylation marker combination 1 of Example 1 gave a prediction accuracy of 84%, with 100% sensitivity and 73% specificity. The prediction results are shown in Table 7-1.

Table 7-1: Results predicted with methylation marker combination 1 for validation set 1 samples with indeterminate cytological classification

For the same 25 validation set 1 samples with indeterminate cytological classification, methylation marker combination 2 of Example 2 gave a prediction accuracy of 96%, with 90% sensitivity and 100% specificity. The prediction results are shown in Table 7-2.

Table 7-2: Results predicted with methylation marker combination 2 for validation set 1 samples with indeterminate cytological classification

For the same 25 cytologically ambiguous samples in validation set 1, methylation marker combination 3 of Example 3 achieved a prediction accuracy of 96%, a sensitivity of 90%, and a specificity of 100%. The prediction results are shown in Table 7-3.

Table 7-3: Prediction results of methylation marker combination 3 for validation set 1 samples with ambiguous cytological classification

For the same 25 cytologically ambiguous samples in validation set 1, methylation marker combination 4 of Example 4 achieved a prediction accuracy of 96%, a sensitivity of 90%, and a specificity of 100%. The prediction results are shown in Table 7-4.

Table 7-4: Prediction results of methylation marker combination 4 for validation set 1 samples with ambiguous cytological classification

For the same 25 cytologically ambiguous samples in validation set 1, methylation marker combination 5 of Example 5 achieved a prediction accuracy of 96%, a sensitivity of 90%, and a specificity of 100%. The prediction results are shown in Table 7-5.

Table 7-5: Prediction results of methylation marker combination 5 for validation set 1 samples with ambiguous cytological classification

Sample ID	Bethesda classification	Sample type	Predicted probability of malignancy	Prediction result
138T	SM	Malignant	0.661	Malignant
141T	SM	Malignant	0.848	Malignant
148T	AUS	Malignant	0.900	Malignant
179T	SM	Malignant	0.624	Malignant
181T	SM	Malignant	0.653	Malignant
608T	AUS	Malignant	0.758	Malignant
610T	SM	Malignant	0.668	Malignant
612T	SM	Malignant	0.751	Malignant
613T	SM	Malignant	0.769	Malignant
616T	SFN	Malignant	0.477	Benign
516B	SFN	Benign	0.352	Benign
519B	FLUS/AUS	Benign	0.331	Benign
525B	SM	Benign	0.226	Benign
531B	FN	Benign	0.508	Benign
545B	SFN	Benign	0.346	Benign
559B	SFN	Benign	0.078	Benign
564B	SFN	Benign	0.501	Benign
565B	SFN	Benign	0.471	Benign
567B	SFN	Benign	0.346	Benign
570B	FLUS	Benign	0.491	Benign
574B	SFN	Benign	0.220	Benign
578B	SFN	Benign	0.271	Benign
579B	SFN	Benign	0.465	Benign
581B	AUS/FLUS	Benign	0.218	Benign
620B	SFN	Benign	0.202	Benign

Bethesda classification key:

AUS: atypia of undetermined significance;

FLUS: follicular lesion of undetermined significance;

FN: follicular neoplasm;

SFN: suspicious for follicular neoplasm;

SM: suspicious for malignancy.
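The combination-5 figures for these samples can be recomputed directly from Table 7-5. The sketch below assumes that the 0.52 threshold given for combination 5 in Example 5 applies here (a probability greater than 0.52 is called malignant) and uses the "Sample type" column as the true label. `evaluate` is an illustrative helper name, not part of the described method.

```python
# Rows of Table 7-5: (sample ID, true nodule type, predicted probability of malignancy)
TABLE_7_5 = [
    ("138T", "Malignant", 0.661), ("141T", "Malignant", 0.848),
    ("148T", "Malignant", 0.900), ("179T", "Malignant", 0.624),
    ("181T", "Malignant", 0.653), ("608T", "Malignant", 0.758),
    ("610T", "Malignant", 0.668), ("612T", "Malignant", 0.751),
    ("613T", "Malignant", 0.769), ("616T", "Malignant", 0.477),
    ("516B", "Benign", 0.352), ("519B", "Benign", 0.331),
    ("525B", "Benign", 0.226), ("531B", "Benign", 0.508),
    ("545B", "Benign", 0.346), ("559B", "Benign", 0.078),
    ("564B", "Benign", 0.501), ("565B", "Benign", 0.471),
    ("567B", "Benign", 0.346), ("570B", "Benign", 0.491),
    ("574B", "Benign", 0.220), ("578B", "Benign", 0.271),
    ("579B", "Benign", 0.465), ("581B", "Benign", 0.218),
    ("620B", "Benign", 0.202),
]

THRESHOLD = 0.52  # assumed: the combination-5 cut-off from Example 5


def evaluate(rows, threshold=THRESHOLD):
    """Call a sample malignant when its probability exceeds the threshold and
    tally the confusion matrix against the true nodule type."""
    tp = fp = tn = fn = 0
    for _sample_id, truth, prob in rows:
        called_malignant = prob > threshold
        if truth == "Malignant":
            if called_malignant:
                tp += 1
            else:
                fn += 1
        elif called_malignant:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(rows)),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    for name, value in evaluate(TABLE_7_5).items():
        print(f"{name}: {value:.0%}")
```

Running it prints a sensitivity of 90%, a specificity of 100%, and an accuracy of 96%, matching the figures stated for combination 5; it also prints a derived PPV of 100% and NPV of 94% (15/16), which the text does not report. Passing a different `threshold` shows how close some calls are to the cut-off: benign samples 531B (0.508) and 564B (0.501) sit just below 0.52, and the one missed malignant sample, 616T, scores 0.477.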

Claims

1. Use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more markers of interest in the manufacture of a detection reagent or a diagnostic kit for diagnosing benign and malignant thyroid nodules in an individual, and use of a device for determining the methylation status or level of at least one CpG dinucleotide of one or more markers of interest in the manufacture of a diagnostic kit for diagnosing benign and malignant thyroid nodules in an individual, wherein the one or more markers of interest are selected from the group consisting of: the PRDM16 sequence of the PRDM16 gene or genome, the CAMK2N1 sequence of the CAMK2N1 gene or genome, the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the IER5 sequence of the IER5 gene or genome, the ITPKB sequence of the ITPKB gene or genome, the ITGB1BP1 sequence of the ITGB1BP1 gene or genome, the MTHFD2 sequence of the MTHFD2 gene or genome, the BIN1 sequence of the BIN1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the LSG1 sequence of the LSG1 gene or genome, the SH3BP2 sequence of the SH3BP2 gene or genome, the SLC12A7 sequence of the SLC12A7 gene or genome, the NR2F1 sequence of the NR2F1 gene or genome, the EGR1 sequence of the EGR1 gene or genome, the LARP1 sequence of the LARP1 gene or genome, the RARS sequence of the RARS gene or genome, the TTBK1 sequence of the TTBK1 gene or genome, the FAM20C sequence of the FAM20C gene or genome, the CREB5 sequence of the CREB5 gene or genome, the LIMK1 sequence of the LIMK1 gene or genome, the PRKAG2 sequence of the PRKAG2 gene or genome, the SLC39A14 sequence of the SLC39A14 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the AGPAT2 sequence of the AGPAT2 gene or genome, the NRARP sequence of the NRARP gene or genome, the EGR2 sequence of the EGR2 gene or genome, the PPIF sequence of the PPIF gene or genome, the CHID1 sequence of the CHID1 gene or genome, the ADM sequence of the ADM gene or genome, the NAV2 sequence of the NAV2 gene or genome, the EHBP1L1 sequence of the EHBP1L1 gene or genome, the PHLDB1 sequence of the PHLDB1 gene or genome, the PARP11 sequence of the PARP11 gene or genome, the ANO6 sequence of the ANO6 gene or genome, the PLXNC1 sequence of the PLXNC1 gene or genome, the ZNF219 sequence of the ZNF219 gene or genome, the FOXA1 sequence of the FOXA1 gene or genome, the PAPLN sequence of the PAPLN gene or genome, the UACA sequence of the UACA gene or genome, the PGPEP1L sequence of the PGPEP1L gene or genome, the ITPRIPL2 sequence of the ITPRIPL2 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the RPL19 sequence of the RPL19 gene or genome, the ICAM2 sequence of the ICAM2 gene or genome, the TMC6 sequence of the TMC6 gene or genome, the CEP295NL sequence of the CEP295NL gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the TBCD sequence of the TBCD gene or genome, the METRNL sequence of the METRNL gene or genome, the MED16 sequence of the MED16 gene or genome, the SBNO2 sequence of the SBNO2 gene or genome, the CIRBP sequence of the CIRBP gene or genome, the KLF16 sequence of the KLF16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the SNAPC2 sequence of the SNAPC2 gene or genome, the ICAM1 sequence of the ICAM1 gene or genome, the ICAM5 sequence of the ICAM5 gene or genome, the IER2 sequence of the IER2 gene or genome, the ASF1B sequence of the ASF1B gene or genome, the CRTC1 sequence of the CRTC1 gene or genome, the ZNF536 sequence of the ZNF536 gene or genome, the LTBP4 sequence of the LTBP4 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, the KCNK15 sequence of the KCNK15 gene or genome, the UCKL1 sequence of the UCKL1 gene or genome, the RTN4R sequence of the RTN4R gene or genome, the BCR sequence of the BCR gene or genome, and the TEF sequence of the TEF gene or genome.

2. The use of claim 1, wherein the one or more target markers are selected from the group consisting of: the PRDM16 sequence of the PRDM16 gene or genome, the BIN1 sequence of the BIN1 gene or genome, the LIMK1 sequence of the LIMK1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the PPIF sequence of the PPIF gene or genome, the ZNF219 sequence of the ZNF219 gene or genome, the UACA sequence of the UACA gene or genome, the TNK1 sequence of the TNK1 gene or genome, the CEP295NL sequence of the CEP295NL gene or genome, the SBNO2 sequence of the SBNO2 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the ICAM5 sequence of the ICAM5 gene or genome, the CRTC1 sequence of the CRTC1 gene or genome, the RTN4R sequence of the RTN4R gene or genome, the CAMK2N1 sequence of the CAMK2N1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, a cstr 2 sequence of a csag 2 gene or genome, a cstr 2 sequence of a cstr 2 gene or genome, a cstr 2 or a cstr 2 gene or genome.

3. The use of claim 1, wherein the one or more target markers comprise one or more of the following target markers: the EGR3 sequence of the EGR3 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, and the BCR sequence of the BCR gene or genome.

4. The use of claim 1, wherein the one or more target markers comprise:

(a) the PRDM16 sequence of the PRDM16 gene or genome, the BIN1 sequence of the BIN1 gene or genome, the LIMK1 sequence of the LIMK1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the PPIF sequence of the PPIF gene or genome, the ZNF219 sequence of the ZNF219 gene or genome, the UACA sequence of the UACA gene or genome, the TNK1 sequence of the TNK1 gene or genome, the CEP295NL sequence of the CEP295NL gene or genome, the SBNO2 sequence of the SBNO2 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the ICAM5 sequence of the ICAM5 gene or genome, the CRTC1 sequence of the CRTC1 gene or genome, and the RTN4R sequence of the RTN4R gene or genome; or

(b) the CAMK2N1 sequence of the CAMK2N1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the ICAM2 sequence of the ICAM2 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, and the NOL4L-DT sequence of the NOL4L-DT gene or genome; or

(c) the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the LSG1 sequence of the LSG1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, and the BCR sequence of the BCR gene or genome; or

(d) the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, and the BCR sequence of the BCR gene or genome; or

(e) the TACSTD2 sequence of the TACSTD2 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, and the BCR sequence of the BCR gene or genome.

5. The use according to any one of claims 1-4, wherein the Hg19 coordinates of the one or more target markers are as follows:

PRDM16 gene: chr1:3155051:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:5904615:59042314; CRABP2 gene: chr1:15667274:156676773; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3BP2 gene: chr4:2795932:2795331; SLC12A7 gene: chr5:1117661:1118360; NR2F1 gene: chr5:92914797:92915496; EGR1 gene: chr5:137802399:137803098; LARP1 gene: chr5:154133955:154134654; RARS gene: chr5:167837780:167838499; TTBK1 gene: chr6:43215063:43215762; FAM20C gene: chr7:193512:194211; CREB5 gene: chr7:28449041:28449740; LIMK1 gene: chr7:73508743:73509442; PRKAG2 gene: chr7:151424814:151425513; SLC39A14 gene: chr8:22236914:22237613; EGR3 gene: chr8:22547976:22549090; DUSP26 gene: chr8:34104888:34105587; AGPAT2 gene: chr9:139581855:139582554; NRARP gene: chr9:140205734:140206433; EGR2 gene: chr10:64578269:64578968; PPIF gene: chr10:81001706:81002405; CHID1 gene: chr11:911289:911988; ADM gene: chr11:10328946:10329645; NAV2 gene: chr11:19734801:19736359; EHBP1L1 gene: chr11:65343387:65344086; PHLDB1 gene: chr11:118479144:118479843; PARP11 gene: chr12:4139935:4140634; ANO6 gene: chr12:45610331:45611030; PLXNC1 gene: chr12:94544076:94544775; ZNF219 gene: chr14:21559748:21560447; FOXA1 gene: chr14:3806876:38065575; PAPLN gene: chr14:73704629:73705328; UACA gene: chr15:70766881:70767580; PGPEP1L gene: chr15:99466242:99466941; ITPRIPL2 gene: chr16:19125694:19126393; TNK1 gene: chr17:7286958:7287657; RPL19 gene: chr17:37366033:37366732; ICAM2 gene: chr17:6207008:6207677; TMC6 gene: chr17:7613226:7624091; CEP295NL gene: chr17:7689761:768880460; BAIAP2 gene: chr17:79060865:79061564; TBCD gene: chr17:80744791:80745490; METRNL gene: chr17:81083812:81084511; MED16 gene: chr19:883793:884492; SBNO2 gene: chr19:1177275:1177974; CIRBP gene: chr19:1265690:1266389; KLF16 gene: chr19:1860343:1861042; C19orf77 gene: chr19:34666:3435687; SNAPC2 gene: chr19:7985709:7986108; ICAM1 gene: chr19:10381317:10382016; ICAM5 gene: chr19:10404832:10405531; IER2 gene: chr19:13266647:13267346; ASF1B gene: chr19:14248133:14248172; CRTC1 gene: chr19:18770961:18771660; ZNF536 gene: chr19:31039247:31039946; LTBP4 gene: chr19:41105706:41106405; NOL4L-DT gene: chr20:31162101:31162800; KCNK15 gene: chr20:43374048:43374747; UCKL1 gene: chr20:62588113:62588812; RTN4R gene: chr22:20226373:20227274; BCR gene: chr22:23624092:23624791; TEF gene: chr22:41771229:41771928.

6. The use according to any one of claims 1-5, wherein the Hg19 coordinates of the one or more target markers are as follows:

PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041685:59042064; CRABP2 gene: chr1:15667684:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH3BP2 gene: chr4:279282:2795581; SLC12A7 gene: chr5:1117911:1118110; NR2F1 gene: chr5:92915047:92915246; EGR1 gene: chr5:137802649:137802848; LARP1 gene: chr5:154134205:154134404; RARS gene: chr5:167838020:167838129; TTBK1 gene: chr6:43215313:43215512; FAM20C gene: chr7:193762:193961; CREB5 gene: chr7:28449291:28449490; LIMK1 gene: chr7:73508993:73509192; PRKAG2 gene: chr7:151425064:151425263; SLC39A14 gene: chr8:22237164:22237363; EGR3 gene: chr8:22548226:22548425; EGR3 gene: chr8:22548641:22548840; DUSP26 gene: chr8:3405138:34105337; AGPAT2 gene: chr9:139582105:13958234; NRARP gene: chr9:140205984:140206183; EGR2 gene: chr10:64578519:6457878; PPIF gene: chr10:81001956:81002155; CHID1 gene: chr11:911539:911738; ADM gene: chr11:1032996:10329395; NAV2 gene: chr11:19735051:19735250; NAV2 gene: chr11:19735910:19736109; EHBP1L1 gene: chr11:65343637:65343836; PHLDB1 gene: chr11:118479394:118479593; PARP11 gene: chr12:4140185:4140384; ANO6 gene: chr12:45610581:45610780; PLXNC1 gene: chr12:94544326:94544525; ZNF219 gene: chr14:21559998:2156097; FOXA1 gene: chr14:380665126:380565325; PAPLN gene: chr14:73704879:73705078; UACA gene: chr15:70767131:70767330; PGPEP1L gene: chr15:99466492:466691; ITPRIPL2 gene: chr16:19125944:19126143; TNK1 gene: chr17:7287208:7287407; RPL19 gene: chr17:37366283:37366482; ICAM2 gene: chr17:62076858:62057657; TMC6 gene: chr17:7613476:7613675; TMC6 gene: chr17:7623642:7623841; CEP295NL gene: chr17:768880011:768880210; BAIAP2 gene: chr17:79061115:79061314; TBCD gene: chr17:80745041:80745240; METRNL gene: chr17:81084062:81084261; MED16 gene: chr19:884043:884242; SBNO2 gene: chr19:1177525:1177724; CIRBP gene: chr19:1265940:1266139; KLF16 gene: chr19:1860593:1860792; C19orf77 gene: chr19:34916:3435115; C19orf77 gene: chr19:3435238:3435437; SNAPC2 gene: chr19:7985959:7986158; ICAM1 gene: chr19:10381567:10381766; ICAM5 gene: chr19:10405082:10405281; IER2 gene: chr19:13266897:13267096; ASF1B gene: chr19:14248383:14248582; CRTC1 gene: chr19:18771211:18771410; ZNF536 gene: chr19:31039497:31039696; LTBP4 gene: chr19:41105956:4106155; NOL4L-DT gene: chr20:31162351:31162550; KCNK15 gene: chr20:43374298:43374497; UCKL1 gene: chr20:62588863:6258562; RTN4R gene: chr22:20226623:20226822; RTN4R gene: chr22:20226825:20227024; BCR gene: chr22:23624342:23624541; TEF gene: chr22:41771479:41771678.

7. The use according to any one of claims 1 to 6, wherein the reagent comprises a primer and/or a probe molecule;

preferably, the primer molecule is identical, complementary or hybridizes under stringent conditions to the one or more target markers and comprises at least 9 consecutive nucleotides, and the probe molecule hybridizes under stringent conditions to the amplification product of the one or more target markers.

8. The use according to any one of claims 1 to 6, wherein the reagent is a reagent required to perform reduced-representation (genome-simplified) methylation sequencing.

9. A diagnostic reagent or diagnostic kit for detecting the methylation state or methylation level of at least one CpG dinucleotide of one or more markers of interest to diagnose a benign or malignant thyroid nodule, comprising a reagent for detecting the methylation state or level of at least one CpG dinucleotide of one or more markers of interest; wherein the one or more markers of interest are as defined in any one of claims 1-6.

10. The diagnostic reagent or diagnostic kit according to claim 9, wherein the diagnostic reagent or diagnostic kit comprises a primer and/or a probe molecule; preferably, the primer molecule is identical, complementary or hybridizes under stringent conditions to the one or more target markers and comprises at least 9 consecutive nucleotides, and the probe molecule hybridizes under stringent conditions to an amplification product of the one or more target markers;

Optionally, the diagnostic reagent or diagnostic kit further comprises a primer molecule and/or a probe molecule for detecting the reference gene ACTB.

11. The diagnostic reagent or diagnostic kit of claim 9, further comprising one or more substances selected from the group consisting of: PCR buffer, polymerase, dNTPs, restriction endonuclease, digestion buffer, fluorescent dye, fluorescence quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard, control, KCl, MgCl₂, and (NH₄)₂SO₄.

12. The diagnostic reagent or diagnostic kit of claim 9, wherein the reagent further comprises reagents used in one or more of the following methods: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction enzyme analysis, fluorescence quantitative PCR, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profiling, and mass spectrometry.

13. The diagnostic reagent or diagnostic kit of claim 12, wherein the reagent is selected from one or more of the following: bisulfite and derivatives thereof, fluorescent dyes, fluorescence quenchers, fluorescence reporters, internal standards and controls.

14. Use of at least one reagent or set of reagents that distinguish between methylated and unmethylated CpG dinucleotides within at least one target region of genomic DNA, wherein the method comprises contacting genomic DNA isolated from a biological sample of the individual with the at least one reagent or set of reagents, wherein the target region is identical or complementary to a sequence of at least 16 consecutive nucleotides of one or more target markers, wherein the consecutive nucleotides comprise at least one CpG dinucleotide sequence, thereby at least partially providing for detection and/or classification of benign and malignant thyroid nodules in an individual, wherein the one or more target markers are as defined in any one of claims 1-6.

15. Use of one or more reagents for converting cytosine bases that are unmethylated at the 5-position to uracil or to another base that is detectably different from cytosine in its hybridization properties, of an amplification enzyme, and of at least one primer comprising at least 9 consecutive nucleotides, in the preparation of a kit for use in a method of detecting and/or classifying thyroid nodules in an individual as benign or malignant, wherein the method comprises:

a) Isolating genomic DNA from the individual biological sample;

b) Treating the genomic DNA of a) or fragment thereof with the one or more reagents;

c) Contacting the treated genomic DNA or a treated fragment thereof with the amplification enzyme and the at least one primer that is identical, complementary or hybridizes under stringent conditions to one or more target markers, wherein the treated genomic DNA or fragment thereof is amplified to produce at least one amplification product or is not amplified; and

d) Determining the methylation status or level of at least one CpG dinucleotide of the one or more markers of interest, or a mean value reflecting the average methylation status or level of a plurality of CpG dinucleotides of the one or more markers of interest, based on the presence or nature of the amplification product, thereby at least partially detecting and/or classifying the thyroid nodule of the individual as benign or malignant;

wherein the one or more markers of interest are as defined in any one of claims 1-6.

16. The use of claim 15, wherein in step b) the genomic DNA or fragment thereof is treated with an agent selected from the group consisting of bisulfites, acid sulfites, metabisulfites and combinations thereof.

17. The use according to claim 16, wherein in c) the contacting or amplification of the nucleic acid molecule is performed by using a thermostable DNA polymerase as the amplification enzyme, using a polymerase lacking 5'-3' exonuclease activity, using a polymerase chain reaction and/or producing an amplification product with a detectable label.

18. The use of claim 15, wherein the contacting or amplifying in c) comprises the use of methylation specific primers.

19. Use of one or more methylation-sensitive restriction enzymes, an amplification enzyme, and at least one primer comprising at least 9 consecutive nucleotides, wherein the primer is identical or complementary to, or hybridizes under stringent conditions to, one or more target markers, in the preparation of a kit for use in a method of detecting and/or classifying thyroid nodules in an individual as benign or malignant; the method comprises the following steps:

a) Isolating genomic DNA from the individual biological sample;

b) Digesting the genomic DNA of a) or a fragment thereof with the one or more methylation sensitive restriction enzymes, and contacting the resulting digestion product with the amplification enzyme and the at least one primer; and

c) Determining the methylation status or level of at least one CpG dinucleotide of the one or more markers of interest based on the presence or nature of the amplification product, thereby at least partially detecting and/or classifying the thyroid nodule of the individual as benign or malignant.

20. The use of claim 19, wherein the presence or absence of amplification products is determined by hybridization of at least one nucleic acid or peptide nucleic acid that is identical or complementary to a fragment, at least 16 bases long, of a sequence selected from the one or more markers of interest.

21. Use of a treated nucleic acid derived from one or more markers of interest, wherein the treatment is suitable for converting at least one unmethylated cytosine base of the one or more markers of interest to uracil or other bases detectably different from cytosine on hybridization, in the preparation of a kit for diagnosing a benign or malignant thyroid nodule.

22. An apparatus for detecting and diagnosing benign and malignant thyroid nodules in an individual, the apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program: (1) obtaining the methylation level or methylation status of at least one CpG dinucleotide of one or more markers of interest in the sample; and (2) determining whether the thyroid nodule is benign or malignant according to the methylation level or methylation status obtained in (1).