CN103898199B

CN103898199B - High-throughput nucleic acid analysis method and application thereof

Info

Publication number: CN103898199B
Application number: CN201210581830.9A
Authority: CN
Inventors: 姜正文; 杨锋
Original assignee: Tian Hao Biomedical Technology (suzhou) Co Ltd; Shanghai Genesky Bio-Tech Co Ltd
Current assignee: Shanghai Haoweitai Biotechnology Co ltd; Tianhao Gene Technology Suzhou Co ltd
Priority date: 2012-12-27
Filing date: 2012-12-27
Publication date: 2016-12-28
Anticipated expiration: 2032-12-27
Also published as: CN103898199A; WO2014101655A1

Abstract

The present invention relates to a high-throughput gene analysis method and its application. Specifically, the method comprises the steps of: for n target nucleic acid fragments to be analyzed, providing, for each target nucleic acid fragment, at least 2 specific probes that bind to different binding regions of that fragment, each specific probe having a specific binding region and a universal sequence region, wherein the sequence of the specific binding region is complementary to the sequence of the binding region of the target nucleic acid fragment, the sequence of the universal sequence region corresponds to a sequencing primer sequence of a high-throughput single-molecule or single-molecule-amplified-cluster sequencing platform, and n is a positive integer ≥ 40; hybridizing a nucleic acid sample containing the target nucleic acid fragments to be analyzed with the probes and ligating the probes to obtain a mixture of probe ligation products; and sequencing the mixture of probe ligation products, or its amplification products, with the sequencing primers and analyzing the results, thereby achieving high-throughput quantitative analysis of target gene fragments.

Description

A high-throughput nucleic acid analysis method and its application

Technical Field

The present invention belongs to the fields of biotechnology and molecular diagnostics. Specifically, it relates to a high-throughput nucleic acid analysis method and its application.

Background Art

Genes are the material basis of heredity: specific nucleotide sequences on DNA or RNA molecules that carry genetic information. Apart from some viruses, whose genetic material is RNA, the genetic material of almost all non-viral organisms is DNA. Each species has its own specific gene sequences, so the species present in a sample can be determined by detecting the gene sequences in that sample.

In living organisms, the DNA of a gene is transcribed into mRNA, which then serves as the template for translation into biologically active protein molecules, thereby expressing the genetic information stored in the DNA sequence. Analyzing the amount of each mRNA in different tissues, together with the differences in physiological function between those tissues, reveals gene function. Gene expression analysis is therefore one of the most basic tools in molecular biology for studying gene function.

Gene expression is coordinated by many regulatory factors, and DNA methylation is one of the important modes of regulation. DNA methylation can alter chromatin structure, DNA conformation, DNA stability, and the way DNA interacts with proteins, and thereby controls gene expression. In the vast majority of cases, methylation occurs at the carbon-5 position of the cytosine ring of cytosine nucleotides in CpG sequences.

In addition, errors during gene replication produce "mutations", including point mutations, large deletions or duplications (known as copy number variation, CNV), gene inversions, and gene translocations. Some mutations severely impair the function of key genes and cause disease; because of selection, such mutations occur at very low frequency in the population. A considerable share of mutations, however, either do not seriously affect gene function or affect genes that place no survival pressure on the individual. These mutations persist in the population, change in frequency through random drift and founder effects, and become genetic polymorphisms. Polymorphisms involving a change of one or a few bases are called single nucleotide polymorphisms (SNPs), and polymorphisms involving deletion or duplication of large segments are called copy number polymorphisms (CNPs). Analysis of genetic polymorphisms and gene mutations is the most common genetic approach to studying gene function and the pathogenic mechanisms of hereditary diseases.

Gene identification, gene expression analysis, DNA methylation analysis, mutation screening, SNP typing, CNP typing, and CNV detection are therefore important tools in molecular genetics research and are also widely used in clinical molecular diagnosis. Because these analyses are so important, scientists and engineers have developed multiple detection methods for each of them.

Early detection methods mainly analyzed a limited number of target fragments. Target genes are identified by PCR amplification, and gene expression level, viral load, gene copy number, and methylation level are determined by real-time fluorescent quantitative PCR. Common DNA methylation analysis relies on methylation sequencing or methylation-specific PCR of bisulfite-treated DNA. Mutation screening mainly uses PCR amplification followed by Sanger sequencing, with mutations identified by comparing the sequencing result with a reference sequence. Many methods are available for SNP detection, such as TaqMan probe allele detection, restriction endonuclease digestion (RFLP), high-resolution melting curve analysis, single-base extension (time-of-flight mass spectrometry platforms, Multiplex SNaPshot), and thermostable-ligase detection techniques (LDR, SNPscan). Low- and medium-throughput CNV detection methods mainly include real-time quantitative PCR, FISH, multiplex ligation-dependent probe amplification (MLPA), and multiplex fluorescent competitive PCR (AccuCopy). These methods are highly flexible, but their biggest drawback is low throughput: they cannot meet the needs of research projects or diagnostics that require detection of a large number of loci.

Microarray chips are characterized by high-density probe arrays. A large number of DNA probes of known partial sequence are "printed" on the array. Based on the principle of molecular hybridization, processed, fluorescently labeled samples are hybridized to the array probes, non-specific hybridization signals are removed by washing, and fluorescence is read with a scanner. The signal associated with each target gene is determined from the fluorescence intensity and its position on the array. A chip can analyze thousands to millions of gene fragments or polymorphic sites at once, and microarrays are widely used for species identification, expression profiling, high-throughput SNP analysis, genome-wide methylation analysis, genome-wide copy number analysis, and so on. The greatest advantage of microarrays is high throughput, which allows gene changes to be analyzed at the whole-genome level. Their drawbacks are that non-specific hybridization is widespread and quantitative accuracy is therefore poor, that expensive hybridization and scanning instruments are required, that custom chips are costly and slow to produce, and that unknown genes cannot be detected.

The advent of second-generation sequencing brought revolutionary change to the field of genetic testing. Its main principle is sequencing after single-molecule PCR amplification on a chip, as in the MiSeq, GAIIx, and HiSeq 2000 sequencers from Illumina, the Ion PGM and SOLiD sequencers from ABI, and the 454 GS FLX sequencer from Roche. Second-generation sequencing can sequence millions or even hundreds of millions of single-molecule amplification products simultaneously. It is widely used for rapid identification of disease-causing genes by genome resequencing, transcriptome analysis, methylation profiling, microRNA identification, genome-wide studies of protein-DNA interactions, and genome sequencing of new species.

A new generation of direct single-molecule sequencing technologies is also developing rapidly, represented mainly by Pacific Biosciences and Helicos. The greatest advantage of these high-throughput sequencing technologies is their very high throughput, together with the ability to identify and quantify known and unknown genes at the same time, so both specificity and efficiency are very high. There are also shortcomings. Compared with conventional sequencing, next-generation sequencing is somewhat less accurate, and mutations introduced during single-molecule amplification affect the final analysis. Moreover, these platforms are suited to analyzing a whole genome or transcriptome; to analyze a target region or a set of genes, the sample must first be enriched for the target gene segments. Current enrichment methods include multiplex PCR and microfluidic digital PCR for a limited number of gene regions, while for large numbers of gene regions the target regions are mainly enriched by solid-phase or liquid-phase hybridization of the sample with high-density probe sequences covering the target regions. These enrichment techniques are mainly used for mutation detection in candidate genes. Because the enrichment process to some extent destroys the proportional relationship between the amount of product and the amount of original template, the enriched candidate gene fragments cannot be accurately quantified, for example for expression level or copy number analysis.

At present, therefore, the field still lacks effective methods for gene detection, particularly for gene identification, gene expression analysis, DNA methylation analysis, mutation screening, SNP typing, CNP typing, and CNV detection, and there is an urgent need to develop an effective high-throughput gene analysis method.

Summary of the Invention

The main purpose of the present invention is to provide a high-throughput gene analysis method and its application.

In a first aspect of the present invention, a high-throughput nucleic acid analysis method is provided, comprising the steps of:

(1) for n target nucleic acid fragments to be analyzed, providing, for each target nucleic acid fragment, at least 2 specific probes that bind to different binding regions of that fragment, each specific probe having a specific binding region and a universal sequence region, wherein the sequence of the specific binding region is complementary to the sequence of the binding region of the target nucleic acid fragment, the sequence of the universal sequence region corresponds to the sequence of a sequencing primer, and n is a positive integer ≥ 40;

(2) hybridizing a nucleic acid sample containing the target nucleic acid fragments to be analyzed with the probes of step (1), and ligating the probes to obtain a mixture of probe ligation products, wherein both the 3' and 5' ends of each probe ligation product are universal sequence regions whose sequences correspond to sequencing primer sequences;

(3) sequencing and/or analyzing the mixture of probe ligation products of step (2) to obtain information on the target nucleic acids.
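
As an illustration of the analysis in step (3), the following minimal Python sketch counts sequencing reads per target by looking for the junction formed when the two specific binding regions are ligated. The function name, the input layout, and the exact-match rule are assumptions made for illustration; the patent does not prescribe a particular read-counting algorithm.

```python
def count_ligation_products(reads, targets):
    """Count reads that contain the ligation junction of each target.

    reads   : iterable of read sequences (strings over A/C/G/T)
    targets : dict mapping a target name to (left_region, right_region),
              the two probe-derived regions that are adjacent in a
              ligation product (hypothetical input layout)
    """
    # The junction only exists if the two probes were actually ligated.
    junctions = {name: left + right for name, (left, right) in targets.items()}
    counts = {name: 0 for name in junctions}
    for read in reads:
        for name, junction in junctions.items():
            if junction in read:
                counts[name] += 1
                break  # assign each read to at most one target
    return counts


targets = {"T1": ("ACGT", "TTGA"), "T2": ("GGCC", "AATT")}
reads = ["CCACGTTTGACC", "ACGTTTGA", "GGCCAATT", "ACGTAAAA"]
print(count_ligation_products(reads, targets))
```

Exact substring matching is the simplest possible rule; a practical pipeline would probably tolerate sequencing errors, which this sketch does not attempt.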

In another preferred embodiment, the sequencing primer is a sequencing primer of a high-throughput single-molecule or single-molecule-amplified-cluster sequencing platform.

In another preferred embodiment, n is a positive integer ≥ 100, preferably a positive integer selected from 1000-10000.

In another preferred embodiment, "the sequence of the universal sequence region corresponds to the sequencing primer sequence" means that the sequence of the universal sequence region is completely identical to the sequencing primer sequence or identical over at least 8 bp, or is completely complementary to the sequencing primer sequence or complementary over at least 8 bp.
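
The "corresponds to" criterion above can be sketched as a small check. This is a hedged illustration only: it assumes that "at least 8 bp identical" means a contiguous stretch of 8 or more matching bases, and that "complementary" is evaluated against the reverse complement of the primer. Both readings, and all function names, are assumptions rather than definitions taken from the patent.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def shares_run(a, b, min_len=8):
    """True if a and b share an identical contiguous stretch of min_len bases."""
    if len(a) < min_len or len(b) < min_len:
        return False
    kmers = {a[i:i + min_len] for i in range(len(a) - min_len + 1)}
    return any(b[i:i + min_len] in kmers for i in range(len(b) - min_len + 1))


def corresponds_to_primer(universal, primer, min_len=8):
    """Check identity or complementarity, fully or over at least min_len bases."""
    rc = reverse_complement(primer)
    if universal in (primer, rc):
        return True  # completely identical or completely complementary
    return shares_run(universal, primer, min_len) or shares_run(universal, rc, min_len)


print(corresponds_to_primer("TTACGTACGTAA", "ACGTACGTAGCT"))
```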

In another preferred embodiment, the specific probe further has one or more features selected from the following group:

(1) the length of the specific probe is ≤ 100 bp, preferably 30-70 bp, more preferably 40-50 bp;

(2) the length of the specific binding region of the specific probe is ≤ 50 bp, preferably 15-35 bp, more preferably 20-25 bp;

(3) the length of the universal sequence region of the specific probe is ≥ 8 bp, preferably 15-35 bp, more preferably 20-25 bp;

(4) the sequence of the universal sequence region of the specific probe also corresponds to an amplification primer sequence;

(5) the specific probe includes a tag sequence.

In another preferred embodiment, the tag sequence is a sequence of specific bases (preferably 3-30 bases, more preferably 6-9 bases) used to distinguish probe ligation products originating from different samples.
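
To show how a tag sequence can separate pooled samples, here is a minimal demultiplexing sketch. It assumes, purely for illustration, that the tag sits at the very start of each read and has a fixed length; the patent does not state where in the ligation product the tag is placed, and the names below are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, sample_tags, tag_len):
    """Group reads by sample using a fixed-length tag at the start of each read.

    sample_tags : dict mapping sample name -> tag sequence (hypothetical layout)
    Reads whose tag matches no sample are collected under "unassigned".
    """
    by_tag = {tag: sample for sample, tag in sample_tags.items()}
    groups = defaultdict(list)
    for read in reads:
        sample = by_tag.get(read[:tag_len], "unassigned")
        groups[sample].append(read[tag_len:])  # strip the tag before analysis
    return dict(groups)


tags = {"S1": "ACGTAC", "S2": "TTGCAA"}
print(demultiplex(["ACGTACGGGG", "TTGCAACCCC", "AAAAAAGGGG"], tags, tag_len=6))
```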

In another preferred embodiment, the 2 probes for each target nucleic acid fragment are a 5' end probe and a 3' end probe; the 5' end probe is complementary to the binding region at the 3' end of the target nucleic acid fragment to be analyzed, and the 3' end probe is complementary to the binding region at the 5' end of the target nucleic acid fragment to be analyzed.

In another preferred embodiment, the structure of the 5' end probe or the 3' end probe is shown in Formula I:

5'-A-L-B-3'

Formula I

In Formula I,

A represents the universal sequence region;

B represents the specific binding region;

L represents the nucleic acid linker sequence between A and B;

wherein the positions of A and B can be interchanged.

In another preferred embodiment, L is 0 bases long.
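
Formula I can be expressed as simple string assembly. The sketch below is illustrative only: the function names are hypothetical, and the length check encodes just the outer limits stated in the text (probe ≤ 100 bp, specific binding region ≤ 50 bp, universal sequence region ≥ 8 bp).

```python
def build_probe(universal, specific, linker="", universal_first=True):
    """Assemble a probe 5'-A-L-B-3', or 5'-B-L-A-3' when A and B are swapped.

    universal : A, the universal sequence region
    specific  : B, the specific binding region
    linker    : L, the linker sequence (empty string means 0 bases)
    """
    parts = (universal, linker, specific) if universal_first else (specific, linker, universal)
    return "".join(parts)


def within_stated_limits(universal, specific, linker=""):
    """Check the outer length limits given in the text."""
    total = len(universal) + len(linker) + len(specific)
    return total <= 100 and len(specific) <= 50 and len(universal) >= 8


print(build_probe("ACGTACGTAC", "TTTTGGGGCC"))
print(build_probe("ACGTACGTAC", "TTTTGGGGCC", universal_first=False))
```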

In another preferred embodiment, the relationship between the 5' end probe and the 3' end probe is selected from one or more of the following:

(a) the 5' end probe and the 3' end probe are immediately adjacent: after the 5' end probe and the 3' end probe hybridize to the target nucleic acid fragment to be analyzed, the distance between them is 0 bases, and they are joined by a ligase to obtain the probe ligation product;

(b) the 5' end probe and the 3' end probe are 1-500 bases apart: after the 5' end probe and the 3' end probe hybridize to the target nucleic acid fragment to be analyzed, the gap is filled in and sealed by a DNA polymerase and a ligase to obtain the probe ligation product;

(c) in addition to the 5' end probe and the 3' end probe, the hybridization system includes a probe 3 that is immediately adjacent to both the 5' end probe and the 3' end probe; after the three probes hybridize to the target nucleic acid fragment to be analyzed, they are joined by a ligase to obtain the probe ligation product.
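
The three layouts (a) to (c) differ only in what lies between the two hybridized probes. The following sketch maps a probe layout to the matching scheme; the function name and its inputs are hypothetical and serve only to make the distinction explicit.

```python
def ligation_mode(gap, probe3_len=None):
    """Return 'a', 'b', or 'c' for the ligation scheme matching a probe layout.

    gap        : number of template bases between the hybridized 5' and 3' end probes
    probe3_len : length of an optional third probe placed between them
                 (hypothetical input; None means no third probe)
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    if gap == 0:
        return "a"  # adjacent probes, joined by ligase alone
    if gap > 500:
        raise ValueError("gap outside the 1-500 base range described in the text")
    if probe3_len is not None:
        if probe3_len != gap:
            raise ValueError("probe 3 must fill the gap exactly to abut both probes")
        return "c"  # three probes, joined by ligase alone
    return "b"  # gap filled by DNA polymerase, then sealed by ligase


print(ligation_mode(0), ligation_mode(5), ligation_mode(22, probe3_len=22))
```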

In another preferred embodiment, probe 3 is 1-500 bp long, preferably 15-35 bp, more preferably 20-25 bp.

In another preferred embodiment, the 5' end of the 3' end probe described in (a) is phosphorylated.

In another preferred embodiment, the 3' end of the 3' end probe and the 5' end of the 5' end probe described in (a) are modified to protect them against exonucleases.

In another preferred embodiment, the exonuclease-resistant modification is a phosphorothioate modification.

In another preferred embodiment, in (b), the 5' end probe and the 3' end probe are preferably 1-10 bases apart.

In another preferred embodiment, in (b), the DNA polymerase has no 5'-3' exonuclease activity.

In another preferred embodiment, the method further comprises, between step (2) and step (3), a step of amplifying the probe ligation products obtained in step (2).

In another preferred embodiment, in step (3), the mixture of probe ligation products obtained in step (2) is sequenced directly on a high-throughput single-molecule or single-molecule-amplified-cluster sequencing platform; or the amplification products of the mixture of probe ligation products are sequenced on such a platform.

In another preferred embodiment, in step (3), the mixture of probe ligation products or its amplification products is sequenced and analyzed using third-generation or second-generation sequencing technology.

In another preferred embodiment, in step (3), the information obtained on the target nucleic acids is one or more items selected from the following group: SNP typing information, DNA methylation information, mutation screening information, CNP typing information, CNV information, gene information of pathogenic microorganisms, gene information of transgenic animal and plant products, and gene expression levels.

In a second aspect of the present invention, a high-throughput SNP typing method is provided, comprising the step of: using the method of the first aspect to sequence the mixture of probe ligation products derived from a sample to be tested and perform SNP analysis, thereby obtaining SNP typing information for the target nucleic acids.

In another preferred embodiment, the high-throughput SNP typing method comprises the steps of:

(1) for n target nucleic acid fragments to be analyzed, providing, for each target nucleic acid fragment, 3 specific probes that bind to different binding regions of that fragment: two 5' end probes and one 3' end probe, wherein the 5' end probes are allele-specific probes whose last base corresponds to the respective allele base, the 3' end probe is a common probe, and n is a positive integer ≥ 40;

(3) sequencing and analyzing the mixture of probe ligation products of step (2) using the sequencing primers to obtain SNP typing information for the target nucleic acids.
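
Because each allele-specific 5' end probe yields its own ligation product, a genotype can in principle be inferred from the two read counts. The sketch below is one simple way to do this; the thresholds (`min_reads`, `hom_cutoff`, `het_range`) are illustrative assumptions, not values from the patent, and would need empirical calibration.

```python
def call_genotype(count_a, count_b, allele_a, allele_b,
                  min_reads=20, hom_cutoff=0.9, het_range=(0.3, 0.7)):
    """Call a biallelic SNP genotype from allele-specific ligation product counts.

    Returns e.g. "AA", "AG", "GG", or None when coverage is too low or the
    allele fraction falls in an ambiguous zone. All thresholds are hypothetical.
    """
    total = count_a + count_b
    if total < min_reads:
        return None  # too few reads to call
    frac_a = count_a / total
    if frac_a >= hom_cutoff:
        return allele_a * 2
    if frac_a <= 1 - hom_cutoff:
        return allele_b * 2
    if het_range[0] <= frac_a <= het_range[1]:
        return allele_a + allele_b
    return None  # ambiguous allele fraction


print(call_genotype(95, 5, "A", "G"))
print(call_genotype(48, 52, "A", "G"))
```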

In a third aspect of the present invention, a method for detecting CNV is provided, comprising the step of: using the method of the first aspect to sequence the mixture of probe ligation products derived from a sample to be tested and perform CNV analysis, thereby obtaining CNV information for the target nucleic acids.

In another preferred embodiment, the method for detecting CNV comprises the steps of:

(1) designing specific probes for each target gene fragment (preferably 2 probes: one 5' end probe and one 3' end probe);

(2) subjecting the ligation probes for all target gene fragments, together with the DNA template, to denaturation, annealing, and ligation (preferably multiple denaturation-annealing-ligation cycles);

(3) amplifying the ligation products by PCR, or digesting them directly with nuclease without amplification, then pooling the products from different samples and sequencing them on a next-generation high-throughput chip platform;

(4) analyzing the sequencing data to obtain the copy number of the target genes in each sample.
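
Step (4) can be sketched as a ratio calculation on read counts. The normalization shown here is an assumption for illustration: each target's count is divided by the summed count of control targets in the same sample, and the result is compared with a reference sample presumed to carry `reference_copies` copies of every target. The patent does not specify this or any other normalization scheme, and all names are hypothetical.

```python
def estimate_copy_number(sample_counts, reference_counts, control_targets,
                         reference_copies=2):
    """Estimate per-target copy number from ligation product read counts.

    sample_counts, reference_counts : dict target -> read count
    control_targets : targets assumed to have normal copy number in both samples
    """
    def normalised(counts):
        control_total = sum(counts[t] for t in control_targets)
        if control_total == 0:
            raise ValueError("control targets have no reads")
        return {t: c / control_total for t, c in counts.items()}

    sample = normalised(sample_counts)
    reference = normalised(reference_counts)
    return {
        t: reference_copies * sample[t] / reference[t]
        for t in sample
        if t in reference and reference[t] > 0
    }


sample = {"exon1": 50, "exon2": 100, "ctrl1": 100, "ctrl2": 100}
reference = {"exon1": 200, "exon2": 200, "ctrl1": 200, "ctrl2": 200}
print(estimate_copy_number(sample, reference, ["ctrl1", "ctrl2"]))
```

In this made-up example, exon1 comes out at about one copy (a heterozygous deletion) while exon2 and the controls stay at two.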

In a fourth aspect of the present invention, a high-throughput methylation analysis method is provided, comprising the step of: using the method of the first aspect to sequence the mixture of probe ligation products derived from a sample to be tested and perform methylation analysis, thereby obtaining methylation information for the target nucleic acids.

In another preferred embodiment, the high-throughput methylation analysis method comprises the steps of: treating genomic DNA with a methylation-sensitive restriction endonuclease, designing probes spanning the cleavage sites, and detecting the amount of uncleaved genomic DNA by the method of claim 1.

In another preferred embodiment, the high-throughput methylation analysis method comprises the steps of: treating genomic DNA with bisulfite, designing methylation-specific probes and non-methylation-specific probes for each target gene fragment, and determining the methylation level of the target gene segment from the amounts of ligation product formed by the two kinds of probe.
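
For the bisulfite-based variant, the methylation level of a target can be estimated from the two ligation product counts. The sketch below uses the fraction methylated / (methylated + unmethylated); this formula and the function name are assumptions for illustration, and the calculation presumes that both probe types ligate with similar efficiency.

```python
def methylation_levels(counts):
    """Compute per-target methylation level from ligation product read counts.

    counts : dict target -> (methylated_reads, unmethylated_reads)
    Returns a dict target -> fraction in [0, 1], or None when a target has no reads.
    """
    levels = {}
    for target, (meth, unmeth) in counts.items():
        total = meth + unmeth
        levels[target] = meth / total if total > 0 else None
    return levels


print(methylation_levels({"geneA": (30, 70), "geneB": (0, 0)}))
```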

In a fifth aspect of the present invention, a gene expression detection method is provided, comprising the step of: performing detection using the method of the first aspect.

It should be understood that, within the scope of the present invention, the technical features described above and those specifically described below (for example, in the examples) can be combined with one another to form new or preferred technical solutions. For reasons of space, these are not enumerated here.

Brief Description of the Drawings

The following drawings illustrate specific embodiments of the present invention and do not limit the scope of the invention as defined by the claims.

Figure 1 shows technical approach 1 for high-throughput detection in a specific embodiment of the present invention.

Figure 2 shows technical approach 2 for high-throughput detection in a specific embodiment of the present invention.

Figure 3 shows the workflow for high-throughput SNP typing using high-throughput ligation product detection based on single-molecule sequencing, performed directly or after amplification.

Figure 4 shows the workflow for high-throughput CNV detection using high-throughput ligation product detection based on single-molecule sequencing, performed directly or after amplification.

Figure 5 shows the workflow for high-throughput mutation screening of target genes using high-throughput ligation product detection based on single-molecule sequencing, performed directly or after amplification.

Figure 6 shows the workflow for high-throughput candidate gene expression analysis using high-throughput ligation product detection based on single-molecule sequencing, performed directly or after amplification.

Figure 7 shows the workflow for high-throughput gene methylation level analysis using high-throughput ligation product detection based on single-molecule sequencing, performed directly or after amplification.

Figure 8 shows the detection results for exon deletions and duplications of the DMD gene in Example 2.

Detailed Description

After extensive and in-depth research, the inventors have for the first time exploited the high specificity of multiplex ligation-dependent probe amplification and its faithful preservation of quantitative information about the target fragments, and used a next-generation high-throughput sequencing platform to identify and quantify the amplified ligation probe products by sequencing, thereby achieving high-throughput quantitative analysis of target gene fragments. The present invention was completed on this basis.

Specifically, the method comprises the steps of: for n target nucleic acid fragments to be analyzed, providing, for each target nucleic acid fragment, at least 2 specific probes that bind to different binding regions of that fragment, each specific probe having a specific binding region and a universal sequence region, wherein the sequence of the specific binding region is complementary to the sequence of the binding region of the target nucleic acid fragment, the sequence of the universal sequence region corresponds to a sequencing primer sequence, and n is a positive integer ≥ 40; hybridizing a nucleic acid sample containing the target nucleic acid fragments to be analyzed with the probes and ligating the probes to obtain a mixture of probe ligation products, wherein both the 3' and 5' ends of each probe ligation product are universal sequence regions whose sequences correspond to sequencing primer sequences; and sequencing the mixture of probe ligation products with the sequencing primers and analyzing the results, thereby achieving high-throughput quantitative analysis of target gene fragments.

多重连接探针扩增（MLPA）Multiplex Ligation Probe Amplification (MLPA)

多重连接探针扩增是一种能准确检测目的基因片段分子数目的技术，其基本流程包括探针和靶核酸序列进行杂交，之后通过连接、PCR扩增，产物毛细管电泳并收集数据，分析软件对收集的数据进行分析最后得出结论。Multiple ligation probe amplification is a technology that can accurately detect the number of target gene fragment molecules. Its basic process includes hybridization between the probe and the target nucleic acid sequence, followed by ligation, PCR amplification, product capillary electrophoresis and data collection, and analysis software Analyze the collected data and draw conclusions.

MLPA探针是一条包括一段引物序列和一段特异性序列的寡核苷酸片段。在MLPA反应中，这两者都与靶序列进行杂交，之后使用连接酶连接两部分探针。连接反应高度特异，只有当两个探针与靶序列完全杂交，即靶序列与探针特异性序列完全互补，连接酶才能将两段探针连接成一条完整的核酸单链；反之，如果靶序列与探针序列不完全互补，即使只有一个碱基的差别，就会导致杂交不完全，使连接反应无法进行或连接效率大大下降。The MLPA probe is an oligonucleotide fragment including a primer sequence and a specific sequence. In the MLPA reaction, both hybridize to the target sequence, after which the two-part probe is ligated using ligase. The ligation reaction is highly specific. Only when the two probes are fully hybridized to the target sequence, that is, the target sequence is completely complementary to the probe-specific sequence, can the ligase link the two probes into a complete nucleic acid single strand; otherwise, if the target The sequence is not completely complementary to the probe sequence, even if there is only one base difference, it will lead to incomplete hybridization, so that the ligation reaction cannot be carried out or the ligation efficiency is greatly reduced.

After ligation, the ligated probes are amplified with a single pair of universal primers. The amplification product of each probe has a unique length, in the range of 100 to 480 base pairs. The products are separated by capillary electrophoresis and analyzed with dedicated software.

PCR amplification, and hence collection of the amplification peak for a given probe, is possible only if ligation has taken place. If the target sequence carries a point mutation, a deletion, or an amplification, the peak of the corresponding probe is absent, reduced, or increased. Changes in the amplification peaks therefore indicate whether the target sequence has a copy number abnormality or a point mutation.

The advantage of MLPA is the high specificity of probe ligation, which allows many target gene fragments to be analyzed in one reaction. The amount of ligation product is proportional to the amount of original template, and because the ligation products of different gene fragments are amplified with the same universal primers, the amount of each amplification product preserves the information on the original template amount well. The amount of a target gene in the original template can therefore be determined by end-point analysis of the ligation PCR products.

MLPA has been applied in many research areas, including the detection of chromosomal aneuploidy, SNPs, point mutations, subtelomeric gene rearrangements, and common inherited childhood diseases.

The main shortcomings of this method are as follows. 1. The ligation products differ in length and are amplified with one pair of universal fluorescent PCR primers; the amount amplified at each locus is determined by electrophoresis according to the length of the fluorescently labeled PCR product. This severely limits the number of loci that can be detected in one reaction: only 40 to 50 nucleotide sequences can be detected at once, so throughput is low. 2. The ligation probes are usually very long (>100 bp), cannot be synthesized directly, and must be prepared by M13 cloning, which is laborious. 3. Because the probes are long and the lengths of probes and ligation products at different loci can differ by several hundred bases, ligation and amplification efficiencies vary and fluctuate considerably between loci, which affects detection accuracy.

High-throughput gene analysis method

The present invention provides a high-throughput gene analysis method. Its technical approach is as follows:

Approach 1 (Figure 1): taking the analysis of two target gene fragments (F1 and F2) as an example, the method comprises the following steps:

1. Design specific DNA probes for the target nucleic acid fragments. Three alternative probe designs are available:

In the first design, two immediately adjacent probes (probe 1 and probe 2) are designed for each target fragment: a 5'-end probe (probe 1) and a 3'-end probe (probe 2). The first half of the 5'-end probe (region a of probe 1) is a universal sequence matching the subsequent PCR amplification primer, and its second half (region b1 of probe 1) is a specific sequence that hybridizes to the target nucleic acid fragment. The 3'-end probe is phosphorylated at its 5' end; its first half (region b1 of probe 2) is a specific sequence that hybridizes to the target fragment, and its second half (region a of probe 2) is a universal sequence matching the subsequent PCR amplification primer. After hybridizing to the template DNA, the two probes are joined by a ligase.

In the second design, two probes (probe 1 and probe 2) with the same structure as in the first design are used, but they are separated by a gap of a few to a few dozen bases (the gap may be 1-500 bp, preferably 1-10 bp). After the probes hybridize to the template DNA, extension by a polymerase lacking 5'->3' exonuclease activity fills the gap between the two probes, which are then joined by a ligase.

In the third design, three probes (probe 1, probe 2, and probe 3) are used. The 5'-end and 3'-end probes (probe 1 and probe 2) have the same structure as in the first design, but they are separated by a gap of tens to hundreds of bases (preferably 20-25 bp). The middle probe (probe 3) is phosphorylated at its 5' end and exactly fills the gap between the 5'-end and 3'-end probes. After hybridizing to the template DNA, the three probes are joined by a ligase. To increase the yield of ligation product, a thermostable ligase such as Taq DNA ligase is preferably used to run multiple denaturation-annealing-ligation cycles.

2. Amplify the ligation products with a pair of PCR primers that match the amplification primers or sequencing primers of the next-generation sequencing platform, obtaining target gene fragments that contain the complete specific sequence.

Preferably, the PCR primers carry a tag sequence (an index) a few to a few dozen bases long. The ligation products of different samples can be amplified with PCR primers carrying different tag sequences, so that the amplification products of different samples can be pooled and the reads later assigned to their samples by tag sequence.

3. Sequence the amplified ligation products on a next-generation high-throughput chip-based sequencing platform, by single-molecule amplification sequencing or direct single-molecule sequencing.

4. Analyze the sequencing data to assign reads to samples, assign reads to gene loci, and quantify the ligation product corresponding to each gene fragment.

Reads are first assigned to samples according to their tag sequences and then assigned to the ligation product of the corresponding gene fragment according to their base sequence. The number of reads counted for each ligation product gives an estimate of the relative amount of that ligation product.
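The read-assignment and counting logic of step 4 can be sketched as follows. This is a minimal illustration, not the analysis software used in the invention: the index sequences, the product signature sequences, and the function name `count_ligation_products` are hypothetical, and reads are matched by exact prefix with no tolerance for sequencing errors.

```python
from collections import Counter, defaultdict

# Hypothetical sample indexes and ligation-product signatures (illustrative only).
SAMPLE_INDEX = {"ACGTAC": "S1", "TGCATG": "S2"}
PRODUCT_SEQ = {"GATTACAGATTACA": "F1", "CCGGAATTCCGGAA": "F2"}


def count_ligation_products(reads):
    """Count reads per ligation product, per sample.

    reads: iterable of (index_seq, insert_seq) pairs.
    Returns {sample: Counter({fragment: read_count})}.
    """
    counts = defaultdict(Counter)
    for index_seq, insert_seq in reads:
        # Step 1: assign the read to a sample by its tag (index) sequence.
        sample = SAMPLE_INDEX.get(index_seq)
        if sample is None:
            continue  # unknown index: discard the read
        # Step 2: assign the read to a ligation product by its base sequence.
        fragment = next(
            (name for sig, name in PRODUCT_SEQ.items() if insert_seq.startswith(sig)),
            None,
        )
        if fragment is None:
            continue  # read does not match any expected ligation product
        # Step 3: digital counting.
        counts[sample][fragment] += 1
    return counts
```

The per-fragment counts are relative quantities; comparing them across samples requires normalization, for example against reference fragments.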

Approach 2 (Figure 2): taking the analysis of two target gene fragments (F1 and F2) as an example, the method comprises the following steps:

1. Design specific DNA probes for the target nucleic acid fragments, using one of three alternative designs. In the first design, two immediately adjacent probes (probe 1 and probe 2) are used: a 5'-end probe (probe 1) and a 3'-end probe (probe 2). The first half of the 5'-end probe is a universal sequence matching the amplification primer or sequencing primer of the next-generation sequencing platform, and its second half is a specific sequence that hybridizes to the target nucleic acid fragment. The 3'-end probe is phosphorylated at its 5' end; its first half is a specific sequence that hybridizes to the target fragment, and its second half is a universal sequence matching the amplification primer or sequencing primer of the platform. Several bases at the 5' terminus of the 5'-end probe and several bases at the 3' terminus of the 3'-end probe carry phosphorothioate or other protecting-group modifications to protect them from exonuclease degradation. After hybridizing to the template DNA, the two probes are joined by a ligase.

In the second design, two probes with the same structure as in the first design are used, but they are separated by a gap of a few to a few dozen bases (the gap may be 1-500 bp, preferably 1-10 bp). After the probes hybridize to the template DNA, extension by a polymerase lacking 5'->3' exonuclease activity fills the gap between the two probes, which are then joined by a ligase.

In the third design, three probes are used. The 5'-end and 3'-end probes have the same structure as in the first design, but they are separated by a gap of tens to hundreds of bases (preferably 20-25 bp). The middle probe is phosphorylated at its 5' end and exactly fills the gap between the 5'-end and 3'-end probes. Typically, a tag sequence a few to a few dozen bases long is added to the 5'-end or 3'-end probe, so that the ligation products of different samples carry different tags; the ligation products of different samples can then be pooled and the reads later assigned to their samples by tag sequence. After hybridizing to the template DNA, the three probes are joined by a ligase. To increase the yield of ligation product, a thermostable ligase such as Taq DNA ligase can be used to run multiple denaturation-annealing-ligation cycles.

2. Digest the ligation reaction products with a combination of exonucleases, such as exonuclease I, exonuclease III, and lambda exonuclease, to remove all single-stranded or double-stranded DNA other than the ligation products, and then purify. (With all non-ligated nucleic acid removed, PCR amplification of the ligation products is unnecessary, and the sequencing results reflect the ligation products more faithfully.)

3. Sequence the unamplified ligation products directly on a next-generation high-throughput chip-based sequencing platform, by single-molecule amplification sequencing or direct single-molecule sequencing.

4. Analyze the sequencing data to assign reads to samples, assign reads to gene loci, and quantify the ligation product corresponding to each gene fragment. Reads are first assigned to samples according to their tag sequences and then assigned to the ligation product of the corresponding gene fragment according to their base sequence. The number of reads counted for each ligation product gives an estimate of the relative amount of that ligation product.
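The adjacent two-probe layout (the first design under both approaches) can be sketched as follows, assuming the target is given as a single strand written 5'->3' and the probes are complementary to it. The function names and the example universal sequences are hypothetical placeholders, not sequences from the invention; chemical modifications (5' phosphate, phosphorothioate protection) are only noted in comments.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_adjacent_probes(target, split, universal_5, universal_3):
    """Build two adjacent ligation probes for one target strand.

    target:      target strand, 5'->3'
    split:       position in the target where the ligation junction falls
    universal_5: universal sequence placed at the 5' end of probe 1 (assumed)
    universal_3: universal sequence placed at the 3' end of probe 2 (assumed)
    Returns (probe1, probe2, ligation_product), all written 5'->3'.
    """
    if not 0 < split < len(target):
        raise ValueError("split must fall inside the target")
    left, right = target[:split], target[split:]
    # Probe 1 (5'-end probe): universal sequence, then the specific region.
    # Its specific region pairs with the 3' part of the target.
    probe1 = universal_5 + reverse_complement(right)
    # Probe 2 (3'-end probe, 5'-phosphorylated in practice): specific region,
    # then universal sequence. It pairs with the 5' part of the target.
    probe2 = reverse_complement(left) + universal_3
    # Ligation joins the 3' end of probe 1 to the 5' end of probe 2.
    return probe1, probe2, probe1 + probe2
```

With this layout every ligation product has the form universal sequence, reverse complement of the target region, universal sequence, so products of similar target length have similar overall length.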

Primer

As used herein, the term "primer" refers generally to an oligonucleotide that pairs complementarily with a template and from which a DNA polymerase synthesizes a DNA strand complementary to the template. A primer may be natural RNA or DNA or any form of natural nucleotide, and may even consist of non-natural nucleotides such as LNA or ZNA.

A primer is "substantially" (or "essentially") complementary to a particular sequence on one strand of the template. It must be sufficiently complementary to that strand for extension to begin, but its sequence need not be perfectly complementary to the template. For example, if a sequence that is not complementary to the template is added to the 5' end of a primer whose 3' end is complementary to the template, the primer is still substantially complementary to the template. As long as enough of the primer binds the template adequately, an imperfectly complementary primer can still form a primer-template complex and support amplification.

In the present invention, primers include (but are not limited to) degenerate primers, sequencing primers, and adapter primers. A person of ordinary skill in the art can design and optimize primers by conventional methods.

High-throughput sequencing

Genome "resequencing" allows abnormal changes in disease-associated genes to be found as early as possible and supports in-depth study of the diagnosis and treatment of individual diseases.

Those skilled in the art can typically use one of three second-generation platforms for high-throughput sequencing: 454 FLX (Roche), Solexa Genome Analyzer (Illumina), and SOLiD (Applied Biosystems). What these platforms share is very high sequencing throughput: compared with conventional 96-capillary sequencing, one high-throughput run reads 400,000 to 3 billion sequences. Read length ranges from 25 bp to 450 bp depending on the platform, so a single run yields between 1 Gb and 300 Gb of sequence.

Solexa high-throughput sequencing comprises two steps, DNA cluster formation and on-instrument sequencing. The mixture of PCR amplification products is hybridized to sequencing probes immobilized on a solid support and amplified by solid-phase bridge PCR to form sequencing clusters. The clusters are then sequenced by sequencing-by-synthesis, giving the nucleotide sequences of the disease-associated nucleic acid molecules in the sample.

DNA clusters are formed on a sequencing chip (flow cell) whose surface carries a layer of single-stranded primers. Single-stranded DNA fragments are captured on the chip surface by base pairing between their adapter sequences and the surface primers. An amplification reaction converts each captured single strand into double-stranded DNA, which is denatured again into single strands. One end of each strand is anchored to the chip; the other end pairs at random with a nearby primer and is anchored in turn, forming a "bridge". Tens of millions of single DNA molecules undergo this reaction on the chip simultaneously. Each single-stranded bridge is amplified again on the chip surface, using the surrounding primers as amplification primers, to form a double strand; the double strand is denatured into single strands, which form bridges again and serve as templates for the next round of amplification. After 30 rounds of amplification, each single molecule has been amplified 1000-fold into a monoclonal DNA cluster.

The DNA clusters are sequenced by synthesis on the Solexa sequencer. In the sequencing reaction the four bases carry different fluorescent labels and the end of each base is blocked by a protecting group, so only one base can be added per reaction. After scanning to read the color of that reaction, the protecting group is removed and the next reaction can proceed; repeating this cycle yields the exact base sequence. Solexa multiplexed sequencing uses an index (tag or barcode) to distinguish samples: after the regular sequencing read is complete, an additional 7 cycles of sequencing are run on the index. By recognizing the index, more than 1000 different samples can be distinguished in a single sequencing lane.

Applications

The present invention also provides applications of the high-throughput gene analysis method.

SNP typing

When the method of the invention is used to detect SNPs, hundreds, thousands, or even tens of thousands of SNP loci can be detected in one reaction. In one specific embodiment, the steps are as follows (Figure 3):

1. Preferably, three probes are designed for each SNP locus: two 5'-end allele-specific probes and one shared 3'-end probe. The last base of each allele-specific probe corresponds to the respective allele base. To increase ligation specificity, a base at one of the second- to fourth-from-last positions of the probe is changed to introduce an additional mismatch;

2. The ligation probes for all SNP loci are subjected to denaturation-annealing-ligation with the DNA template. To increase the yield of ligation product, multiple denaturation-annealing-ligation cycles can be run;

3. The ligation products are amplified by PCR, or left unamplified and digested directly with nuclease and purified. The products of different samples are pooled and sequenced on a next-generation high-throughput chip-based platform;

4. The sequencing data are analyzed and genotypes are called from the ratio of the two allele-specific ligation products. Where non-specific ligation occurs, the counts of the two ligation products from multiple samples can be subjected to cluster analysis (three clusters are expected, corresponding to the three genotypes), and genotypes are called from the clustering result.
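The ratio-based genotype call in step 4 can be sketched as follows. The thresholds `min_reads` and `hom_threshold` are illustrative assumptions, not values given in this document; in practice they would be tuned per assay or replaced by the cluster analysis described above.

```python
def call_genotype(count_a, count_b, min_reads=20, hom_threshold=0.8):
    """Call a biallelic genotype from allele-specific ligation-product counts.

    count_a, count_b: read counts for the two allele-specific ligation products.
    min_reads:        minimum total reads required to make a call (assumed).
    hom_threshold:    allele fraction at or above which a homozygote is
                      called (assumed).
    Returns "AA", "AB", "BB", or None when coverage is too low.
    """
    total = count_a + count_b
    if total < min_reads:
        return None  # too few reads for a reliable call
    frac_a = count_a / total
    if frac_a >= hom_threshold:
        return "AA"
    if frac_a <= 1 - hom_threshold:
        return "BB"
    return "AB"
```

A fixed threshold works when non-specific ligation is low; when the allele fractions of heterozygotes drift away from 0.5, clustering across samples is the more robust choice.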

CNV detection

When the method of the invention is used to detect CNVs, hundreds, thousands, or even tens of thousands of target gene fragments can be detected in one reaction. In one specific embodiment, the steps are as follows (Figure 4):

1. Each reaction contains at least one reference gene fragment. A reference gene fragment is one considered to show no copy number polymorphism in the population of the species under test; it is used to correct for sampling differences between samples;

2. Preferably, two probes are designed for each target gene or reference gene fragment: one 5'-end probe and one 3'-end probe;

3. The ligation probes for all target gene and reference gene fragments are subjected to denaturation-annealing-ligation with the DNA template. To increase the yield of ligation product, multiple denaturation-annealing-ligation cycles can be run;

4. The ligation products are amplified by PCR, or left unamplified and digested directly with nuclease. The products of different samples are pooled and sequenced on a next-generation high-throughput chip-based platform;

5. Sequencing data analysis: the count of the ligation product for each target gene is divided by the count of the ligation product for a reference gene fragment to give the corrected value R (N_T1/N_R1 in the figure). This R value is then divided by the R value of the reference sample to give the corrected value RR. If there is more than one reference gene, an RR value is calculated for each reference gene fragment, and the median is taken as the relative copy number of the target gene. Multiplying this value by the copy number of the reference sample gives the copy number of the target gene in the sample (CN_T1 in the figure).
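The calculation in step 5 can be sketched as follows. The function and parameter names are hypothetical, and the default reference-sample copy number of 2 is an assumption for a diploid autosomal locus.

```python
from statistics import median


def copy_number(target_count, ref_counts,
                target_count_refsample, ref_counts_refsample,
                refsample_copy_number=2):
    """Estimate the copy number of one target fragment in a test sample.

    target_count:           reads for the target product in the test sample
    ref_counts:             {reference_fragment: reads} in the test sample
    target_count_refsample: reads for the target product in the reference sample
    ref_counts_refsample:   {reference_fragment: reads} in the reference sample
    refsample_copy_number:  known copy number in the reference sample (assumed 2)
    """
    rr_values = []
    for fragment, n_ref in ref_counts.items():
        r_test = target_count / n_ref  # R in the test sample
        r_reference = target_count_refsample / ref_counts_refsample[fragment]
        rr_values.append(r_test / r_reference)  # RR for this reference fragment
    # The median over reference fragments is the relative copy number.
    return median(rr_values) * refsample_copy_number
```

For example, a target with 150 reads against reference fragments at 100 and 200 reads, compared with a reference sample showing 100, 100, and 200 reads respectively, gives RR = 1.5 for both reference fragments and an estimated copy number of 3.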

Target gene mutation screening

The method of the invention can be used to screen for mutations in a target gene (Figure 5). In one specific embodiment, the steps are as follows. Because a mutation in the DNA template under a ligation probe markedly reduces ligation efficiency, high-density tiling probes are designed across the target region, and the copy number of each probe region is obtained using the detection steps and data analysis of CNV detection. A probe region whose copy number deviates from the normal value is a candidate region for a mutation site, which can be verified by conventional sequencing.

Multiplex analysis of candidate gene expression levels

When the method of the invention is used to analyze the expression levels of multiple candidate genes (Figure 6), the expression levels of hundreds, thousands, or even tens of thousands of target genes can be measured in one reaction. In one specific embodiment, the steps are as follows. Multiple probes can be designed for each gene, which allows the expression ratios of different splice variants to be distinguished. Probe ligation is carried out on cDNA obtained by reverse transcription, or directly on RNA as the template, and the ligation products are amplified and then sequenced on a next-generation high-throughput chip-based platform. In the analysis, the ligation product count for each target region of a gene is corrected against multiple reference genes, and the median is taken as the relative expression level of the gene, for analyzing differences in its expression between samples.

High-throughput methylation analysis

When the method of the invention is used to analyze methylation levels, the methylation levels of hundreds, thousands, or even tens of thousands of CpG islands can be measured in one reaction. In one specific embodiment, the method is as follows (Figure 7):

In the first approach, genomic DNA is treated with a methylation-sensitive restriction endonuclease, and probes are designed across the cleavage sites to measure the amount of uncut genomic DNA. In the second approach, genomic DNA is treated with bisulfite, methylation-specific and unmethylation-specific probes are designed for each target gene fragment, and the methylation level of the target gene region is estimated from the amounts of the two ligation products.

The probe ligation products are sequenced on a next-generation high-throughput chip-based platform to obtain the amount of each ligation product. With the first approach, regions of the genome that are fully methylated or hemimethylated must be selected as reference DNA fragments, and a sample not treated with restriction endonuclease serves as the reference sample. With the second approach, a reference DNA sample with a known methylation ratio in all target gene regions is required. It can be prepared by mixing whole-genome amplification product with methylated whole-genome amplification product in a fixed ratio, typically 1:1, which gives a reference sample with 50% methylation.
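For the second (bisulfite) approach, one possible way to use the reference sample is sketched below. The document does not give a formula, so the calibration model here is an assumption: the methylated and unmethylated probes are taken to differ by a constant efficiency bias, which is estimated from the reference sample of known methylation ratio and then divided out. The function and parameter names are hypothetical.

```python
def methylation_level(m, u, m_ref, u_ref, ref_level=0.5):
    """Estimate the methylation fraction of one region in a test sample.

    m, u:         reads for the methylation-specific and unmethylation-specific
                  ligation products in the test sample
    m_ref, u_ref: the same two counts in the reference sample
    ref_level:    known methylation fraction of the reference sample
                  (0.5 for a 1:1 mix)
    Assumed model: observed M/U odds = bias * true odds, with the same bias
    in the test and reference samples.
    """
    if m_ref <= 0 or u_ref <= 0:
        raise ValueError("reference counts must be positive")
    if not 0 < ref_level < 1:
        raise ValueError("ref_level must lie strictly between 0 and 1")
    if m + u == 0:
        raise ValueError("no reads for this region in the test sample")
    # Probe-pair bias estimated from the reference sample.
    bias = (m_ref / u_ref) / (ref_level / (1 - ref_level))
    # Corrected odds = (m / u) / bias; converting odds to a fraction gives:
    return m / (m + bias * u)
```

Under this model a test sample with the same counts as the reference returns exactly `ref_level`, which is a quick sanity check on the calibration.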

Identification of pathogenic microorganisms or transgenic animals and plants

When the method of the invention is used to identify pathogenic microorganisms or transgenic animals and plants, hundreds, thousands, or even tens of thousands of species-specific gene fragments can be detected in one reaction.

Multiple specific probes are designed for each microorganism or transgene, and probes are also designed for spiked-in reference gene fragments. The probe ligation products are sequenced on a next-generation high-throughput chip-based platform. After the amount of each probe ligation product is corrected against the spiked-in reference gene fragments, the species of pathogenic microorganism and the type of transgenic crop present in the test sample are determined.

The main advantages of the present invention are:

(1) One reaction can detect tens of thousands of gene fragments simultaneously, increasing detection throughput. The method runs on non-proprietary detection platforms and needs no additional equipment, and because one reaction analyzes tens of thousands of gene fragments, the cost per gene fragment is greatly reduced. A detection system can be set up quickly for any target gene fragment of interest, so the method is flexible in application;

(2) Compared with conventional chip hybridization, the invention identifies ligation products by sequencing and quantifies them by digital counting, so it is free of non-specific hybridization and detection background, which greatly improves accuracy;

(3) All ligation products in the invention have similar lengths, so differences in amplification efficiency between fragments are small when universal primers are used. Compared with capillary electrophoresis, which distinguishes ligation products by length, the proportions of the ligation products after amplification are more likely to match their proportions before amplification;

(4) When ligation products are treated with exonucleases, purified, and sequenced directly on a high-throughput chip-based platform without PCR amplification, the bias in the relative proportions of ligation products introduced by differences in PCR amplification efficiency is reduced;

(5) Sequence identification by sequencing of single-molecule amplification products, combined with digital counting for quantification, greatly improves sensitivity.

The present invention is further described below with reference to specific examples. It should be understood that these examples serve only to illustrate the invention and not to limit its scope. Experimental methods for which no specific conditions are given in the following examples generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer.

Example 1

Genotyping of 48 SNP loci

Ligation probes were designed for 48 SNP loci, three per locus: two 5'-end allele-specific probes and one shared 3'-end probe. A universal PCR sequence compatible with the Illumina second-generation sequencing platform was added as the first half of the 5'-end probes, and a second universal PCR sequence compatible with the Illumina platform was added as the second half of the 3'-end probe. Probes that paired well with the template were joined by Taq DNA ligase. The ligation products were amplified with universal PCR primers compatible with the Illumina second-generation sequencing platform, using universal primers with a different tag sequence for each sample; the products were then pooled in equal amounts, purified, and sequenced 1x72 on an Illumina GAIIx sequencer. After the sequencing reads were read out by software, they were assigned to samples by tag sequence, each read was assigned to the ligation product it came from, and the reads were counted per ligation product. Genotypes were called from the ratio of the read counts of the two allele-specific ligation products.

Experimental procedure:

Samples were whole blood from healthy individuals undergoing routine physical examination at Shanghai Ruijin Hospital. DNA was extracted from the whole blood with phenol-chloroform and dissolved in 1x TE.

Take 100-200 ng of DNA, dilute to 10 μl with 1x TE, incubate at 98°C for 5 minutes, and place on ice immediately;

Prepare the probe mix (ProbeMix) in 1x TE, with each probe at 0.005 μM;

Prepare 10 μl of 2x Ligation Premix: 2 μl 10x Taq ligase buffer, 1 μl 40 U/μl Taq ligase, 1 μl ProbeMix, 6 μl ddH₂O;

Add the 10 μl of 2x Ligation Premix to the 10 μl of denatured DNA sample and mix by gentle shaking;

Run the ligation reaction with the following program: 4 × (95°C for 30 s, 58°C for 4 h). After the reaction, place the product on ice immediately for use, or store it at -20°C or below;

Prepare the PCR primer mixes Pmix1, Pmix2 and Pmix3, consisting of NGMPCRF and NGMPCRR001, NGMPCRF and NGMPCRR002, and NGMPCRF and NGMPCRR003, respectively, with each primer at 2 μM;

Use 1 μl of ligation product as template in a 20 μl PCR reaction containing 2 μl 10x PCR buffer, 2 μl 2.5 mM dNTP mix, 2 μl Pmix1 for S1 (or Pmix2 for S2, or Pmix3 for S3), 1 μl ligation product, 0.2 μl 5 U/μl Taq DNA polymerase and 12.8 μl Milli-Q water. The PCR program is: 95°C for 5 min; 8 × (94°C for 20 s, 54°C for 40 s, 72°C for 1 min); 26 × (94°C for 20 s, 68°C for 1.5 min); hold at 4°C;

Check the amplification efficiency by electrophoresis, pool the three PCR products in equal amounts according to product concentration, separate the pool by electrophoresis, excise the gel, and purify the 100-150 bp fragments with the QIAquick Gel Extraction Kit;

Quantify the purified product by OD and estimate the number of molecules, then mix it with samples from other projects and perform bridge amplification on the chip according to the requirements of the TruSeq SR Cluster Kit v2;
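The molecule-number estimate in the step above can be sketched as follows. This is only an illustration, not the procedure used in the example: it assumes the common approximation of about 660 g/mol per double-stranded base pair, and the function names and the 125 bp fragment length (the midpoint of the 100-150 bp gel cut) are hypothetical choices.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 660.0    # assumed average mass of one double-stranded base pair


def dsdna_molecules(mass_ng: float, length_bp: int) -> float:
    """Approximate number of double-stranded DNA molecules in `mass_ng` nanograms."""
    if mass_ng < 0 or length_bp <= 0:
        raise ValueError("mass must be non-negative and length must be positive")
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


def concentration_nM(ng_per_ul: float, length_bp: int) -> float:
    """Convert a mass concentration (ng/ul) into a molar concentration (nmol/l)."""
    if ng_per_ul < 0 or length_bp <= 0:
        raise ValueError("concentration must be non-negative and length must be positive")
    return ng_per_ul * 1e6 / (length_bp * BP_MASS_G_PER_MOL)


# Hypothetical example: 1 ng of purified product, assumed mean length 125 bp.
molecules_per_ng = dsdna_molecules(1.0, 125)
molarity = concentration_nM(10.0, 125)
print(f"{molecules_per_ng:.3e} molecules per ng; 10 ng/ul is about {molarity:.1f} nM")
```

The result depends strongly on the assumed fragment length, so a measured size distribution would give a better estimate than a midpoint.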

Sequence the amplified product (1x72+7) on an Illumina GAIIx with the TruSeq SBS Kit v5. Instrument control and data acquisition use Genome Analyzer Data Collection Software SCS2.8, and the sequencing recipe selected is GA2-PEM_MP_72+7Cycle_v<#>;

Assign the sequencing reads to samples according to the tag sequence, then compare them against the library of expected ligation products; identify each read as an allele-specific ligation product and count the reads for each allele-specific ligation product;
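The read-assignment step above can be sketched as below. This is a minimal illustration under stated assumptions, not the software used in the example: reads are matched to expected ligation products by exact prefix match, the tag read is assumed to report the tag in the same orientation as it is written in the primer (on a real instrument it may be the reverse complement), and all function names, product names and toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def count_ligation_products(reads, tag_to_sample, expected_products, match_len):
    """Count reads per sample and per expected ligation product.

    reads: iterable of (insert_read, tag_read) pairs.
    tag_to_sample: maps a tag sequence to a sample name.
    expected_products: maps a product name to its expected sequence.
    match_len: number of leading bases compared (must cover the allele base).
    """
    # Index the expected products by prefix; prefixes must be unique.
    prefix_to_product = {}
    for name, seq in expected_products.items():
        prefix = seq[:match_len]
        if prefix in prefix_to_product:
            raise ValueError(f"{name} and {prefix_to_product[prefix]} share a prefix")
        prefix_to_product[prefix] = name

    counts = defaultdict(Counter)
    for insert_read, tag_read in reads:
        sample = tag_to_sample.get(tag_read)
        product = prefix_to_product.get(insert_read[:match_len])
        if sample is None or product is None:
            continue  # unknown tag or unrecognized ligation product
        counts[sample][product] += 1
    return counts


# Toy data: hypothetical 8-base "products" differing at the allele base.
expected = {"rs_demo_C": "ACGTACGC", "rs_demo_T": "ACGTACGT"}
tags = {"AAACTT": "S1", "TCCGGT": "S2"}
toy_reads = [
    ("ACGTACGCGG", "AAACTT"),
    ("ACGTACGTGG", "AAACTT"),
    ("ACGTACGCGG", "TCCGGT"),
    ("TTTTTTTTTT", "AAACTT"),  # not an expected product, skipped
    ("ACGTACGCGG", "GGGGGG"),  # unknown tag, skipped
]
counts = count_ligation_products(toy_reads, tags, expected, match_len=8)
```

Exact matching keeps the sketch short; a practical pipeline would usually tolerate sequencing errors, for example by aligning reads with a small mismatch allowance.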

Determine the genotype at each locus from the ratio of the read counts of its two ligation products and from the distribution of that ratio across samples. If ligation specificity is high and one allele's ligation product is at least 10 times, or at most 1/10 of, the other, the sample can usually be called directly as homozygous for the predominant allele. Otherwise, the ratios can be compared across multiple samples to see whether they cluster (for example, into three clusters corresponding to the three genotypes).
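The 10-fold rule described above can be sketched as a small function. This is an illustrative sketch only: the function name, the `min_depth` cutoff and the returned labels are assumptions introduced here, and the cross-sample clustering step is not implemented.

```python
def call_genotype(reads_a, reads_b, allele_a, allele_b, fold=10.0, min_depth=20):
    """Call a genotype from the read counts of two allele-specific ligation products.

    fold: ratio at or above which the sample is called homozygous (10 in the text).
    min_depth: hypothetical minimum total depth below which no call is made.
    """
    total = reads_a + reads_b
    if total == 0 or total < min_depth:
        return "no_call"
    if reads_a >= fold * reads_b:
        return allele_a + allele_a  # allele A predominates at least `fold` times
    if reads_b >= fold * reads_a:
        return allele_b + allele_b  # allele B predominates at least `fold` times
    # Intermediate ratio: defer to comparison of ratios across samples.
    return "needs_clustering"


calls = [
    call_genotype(500, 12, "A", "G"),
    call_genotype(8, 400, "A", "G"),
    call_genotype(260, 240, "A", "G"),
    call_genotype(3, 2, "A", "G"),
]
```

Samples returned as `needs_clustering` would then be resolved by plotting the allele ratio for many samples and checking for three groups, as the text describes.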

The universal primer sequences used in this example are as follows:

NGMPCRF (SEQ ID NO: 1)

AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC

NGMPCRR001 (SEQ ID NO: 2)

CAAGCAGAAGACGGCATACGAGATAAACTTGTGACTGGAGTTCAGACGTG

NGMPCRR002 (SEQ ID NO: 3)

CAAGCAGAAGACGGCATACGAGATTCCGGTGTGACTGGAGTTCAGACGTG

NGMPCRR003 (SEQ ID NO: 4)

CAAGCAGAAGACGGCATACGAGATCCAACTGTGACTGGAGTTCAGACGTG

The SNP loci, genotype calls and sequencing depths for the three samples are shown in Table 1.

Table 1

The results show that, compared with earlier sequencing results, all SNP loci except rs2231926 (that is, the other 47) were genotyped correctly in all three samples. The miscall at rs2231926 was mainly caused by non-specific ligation of the G-specific probe. If more samples are genotyped, this kind of error can be avoided by cluster analysis of the amounts of the two allele-specific ligation products.

Example 2

Detection of exon deletions and duplications in the DMD gene

The basic principle is shown in Figure 4. 141 probes were designed for each sample: 129 distributed over the 79 exons of the DMD gene, 6 reference-gene probes, and 6 sex-chromosome probes for sex determination (3 on the X chromosome and 3 on the Y chromosome). Two probes were designed for each site, one 5'-end probe and one 3'-end probe. The first half of the 5'-end probe is a universal sequence matching the subsequent PCR amplification primer, and its second half is a specific sequence that hybridizes to the target nucleic acid fragment. The 3'-end probe is phosphorylated at its 5' end; its first half is a specific sequence that hybridizes to the target nucleic acid fragment, and its second half is a universal sequence matching the subsequent PCR amplification primer. When well matched to the template, the probes are ligated by Taq DNA ligase, and the ligation products are amplified with universal PCR primers compatible with the Illumina second-generation sequencing platform. Each sample is amplified with a universal primer carrying a different tag sequence; the products are then pooled in equal amounts, purified, and sequenced (1x72+7) on an Illumina GAIIx sequencer. The sequencing data are then analyzed.
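The probe layout described above can be illustrated with a short sketch that builds a probe pair for one target region and the ligation product expected from it. Everything in it is hypothetical: the function names, the 12-base target, the junction position, and the two 8-base placeholder universal sequences, which stand in for the real platform-compatible sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_probe_pair(target, junction, universal_5p, universal_3p):
    """Build a 5'-end probe and a 3'-end probe for one target region.

    target: target strand (5'->3') to which both probes hybridize.
    junction: number of target bases, counted from the target's 5' end,
              covered by the 3'-end probe.
    """
    if not 0 < junction < len(target):
        raise ValueError("junction must fall inside the target region")
    probe_strand = reverse_complement(target)
    split = len(target) - junction
    # 5'-end probe: universal sequence first, then the specific sequence.
    five_prime_probe = universal_5p + probe_strand[:split]
    # 3'-end probe: specific sequence first (5' phosphorylated in practice),
    # then the second universal sequence.
    three_prime_probe = probe_strand[split:] + universal_3p
    return five_prime_probe, three_prime_probe


def ligation_product(five_prime_probe: str, three_prime_probe: str) -> str:
    """Joining the adjacent probes gives one molecule flanked by universal sequences."""
    return five_prime_probe + three_prime_probe


U5, U3 = "GGAATTCC", "CCTTAAGG"   # placeholder universal sequences
target = "ACGTTGCAAGGC"           # hypothetical target region
five, three = design_probe_pair(target, junction=6, universal_5p=U5, universal_3p=U3)
product = ligation_product(five, three)
```

Because the probes pair antiparallel to the target, the 5'-end probe covers the 3' part of the target region and the 3'-end probe covers its 5' part; a library of such expected products is what the sequencing reads are later compared against.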

Sample preparation: 2 ml of whole blood was drawn from each of two patients with pseudohypertrophic muscular dystrophy (P1, P2), one female carrier (P3) and one normal sample (P4), and whole-blood DNA was extracted by the conventional phenol-chloroform method for the subsequent experiments.

The universal primers NGMPCRF, NGMPCRR001, NGMPCRR002 and NGMPCRR003 used in this example are the same as in Example 1. The sequence of NGMPCRR004 (SEQ ID NO: 5) is as follows:

CAAGCAGAAGACGGCATACGAGATAATTAGGTGACTGGAGTTCAGACGTG. AATTAG is the tag sequence used when sequencing on the Illumina second-generation sequencer, and it serves to distinguish the sequencing data of different samples.
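The position of the tag inside the tagged universal primers can be checked with a short sketch. It assumes, from inspection of the four NGMPCRR primer sequences given in Examples 1 and 2, that each consists of a shared 24-base left flank, a 6-base tag and a shared 20-base right flank; the constant and function names are hypothetical.

```python
LEFT_FLANK = "CAAGCAGAAGACGGCATACGAGAT"   # shared 5' part of the tagged primers
RIGHT_FLANK = "GTGACTGGAGTTCAGACGTG"      # shared 3' part of the tagged primers

INDEX_PRIMERS = {
    "NGMPCRR001": "CAAGCAGAAGACGGCATACGAGATAAACTTGTGACTGGAGTTCAGACGTG",
    "NGMPCRR002": "CAAGCAGAAGACGGCATACGAGATTCCGGTGTGACTGGAGTTCAGACGTG",
    "NGMPCRR003": "CAAGCAGAAGACGGCATACGAGATCCAACTGTGACTGGAGTTCAGACGTG",
    "NGMPCRR004": "CAAGCAGAAGACGGCATACGAGATAATTAGGTGACTGGAGTTCAGACGTG",
}


def extract_tag(primer: str) -> str:
    """Return the tag located between the two shared flanking sequences."""
    if not (primer.startswith(LEFT_FLANK) and primer.endswith(RIGHT_FLANK)):
        raise ValueError("primer does not have the expected flanking sequences")
    return primer[len(LEFT_FLANK):len(primer) - len(RIGHT_FLANK)]


TAGS = {name: extract_tag(seq) for name, seq in INDEX_PRIMERS.items()}
for name, tag in TAGS.items():
    print(name, tag)
```

The tags are listed here as written in the primers; the orientation in which the instrument reports them in the tag read is not stated in the text and should be checked before demultiplexing.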

Experimental procedure: Take 100-200 ng of DNA, dilute to 10 μl with 1x TE, incubate at 98°C for 5 minutes, and place on ice immediately;

Prepare the probe mix (ProbeMix) in 1x TE, with each probe at 0.005 μM;

Prepare 10 μl of 2x Ligation Premix: 2 μl 10x Taq ligase buffer, 1 μl 40 U/μl Taq DNA ligase, 1 μl ProbeMix, 6 μl sterile water;

Add the 10 μl of 2x Ligation Premix to the 10 μl of denatured DNA sample and mix by gentle shaking;

Run the ligation reaction with the following program: 4 × (95°C for 30 s, 58°C for 4 h). After the reaction, place the product on ice immediately for use, or store it at -20°C or below;

Prepare the PCR primer mixes Pmix1, Pmix2, Pmix3 and Pmix4, consisting of NGMPCRF and NGMPCRR001, NGMPCRF and NGMPCRR002, NGMPCRF and NGMPCRR003, and NGMPCRF and NGMPCRR004, respectively, with each primer at 2 μM;

Use 1 μl of ligation product as template in a 20 μl PCR reaction containing 2 μl 10x PCR buffer, 2 μl 2.5 mM dNTP mix, 2 μl Pmix1 for P1 (or Pmix2 for P2, Pmix3 for P3, Pmix4 for P4), 1 μl ligation product, 0.2 μl 5 U/μl Taq DNA polymerase and 12.8 μl sterile water. The PCR program is: 95°C for 5 min; 8 × (94°C for 20 s, 54°C for 40 s, 72°C for 1 min); 26 × (94°C for 20 s, 68°C for 1.5 min); hold at 4°C;

Check the amplification efficiency by 2% agarose electrophoresis, pool the four PCR products in equal amounts according to product concentration, separate the pool by electrophoresis, excise the gel, and purify the 100-150 bp fragments with the QIAquick Gel Extraction Kit;

Quantify the purified product by OD and estimate the number of molecules, then mix it with samples from other projects and perform bridge amplification on the chip according to the requirements of the TruSeq SR Cluster Kit v2;

Sequence the amplified product (1x72+7) on an Illumina GAIIx with the TruSeq SBS Kit v5. Instrument control and data acquisition use Genome Analyzer Data Collection Software SCS2.8, and the sequencing recipe selected is GA2-PEM_MP_72+7Cycle_v<#>;

After the sequencing reads are extracted by software, they are assigned to samples according to the tag sequence; each read is then assigned to the ligation product it originates from, and the sequencing depth is counted for each ligation product.

For each target gene fragment, the read count of its ligation product is divided by the read count of the ligation product of a reference gene fragment to obtain the first normalized value (R). This R value is then divided by the corresponding R value of the reference sample to obtain the second normalized value (RR). One RR value is calculated for each reference gene fragment, giving 6 RR values in total, and their median is taken. Because the reference sample is a normal male, in whom the DMD gene and the gene fragments on the X and Y chromosomes all have a copy number of 1, this median is the copy number of the corresponding gene fragment in the test sample.
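The R and RR normalization described above can be sketched as follows. This is a minimal illustration with invented read counts and only three reference fragments (the example uses six); the function name, fragment names and the `ref_copy` parameter are assumptions introduced here.

```python
from statistics import median


def copy_number(test_counts, ref_counts, target, reference_fragments, ref_copy=1.0):
    """Estimate the copy number of `target` in the test sample.

    test_counts / ref_counts: read counts per ligation product for the test
    sample and for the reference sample (a normal male in the text).
    ref_copy: copy number of the target in the reference sample (1 in the text).
    """
    rr_values = []
    for ref in reference_fragments:
        if test_counts[ref] == 0 or ref_counts[ref] == 0 or ref_counts[target] == 0:
            raise ValueError(f"zero read count prevents normalization against {ref}")
        r_test = test_counts[target] / test_counts[ref]   # first normalized value, R
        r_ref = ref_counts[target] / ref_counts[ref]      # R of the reference sample
        rr_values.append(r_test / r_ref)                  # second normalized value, RR
    return ref_copy * median(rr_values)


# Invented counts: a female carrier of a deletion of one exon.
reference_sample = {"DMD_ex50": 400, "DMD_ex51": 420, "REF1": 800, "REF2": 1000, "REF3": 500}
test_sample = {"DMD_ex50": 200, "DMD_ex51": 420, "REF1": 400, "REF2": 500, "REF3": 250}
refs = ["REF1", "REF2", "REF3"]

cn_ex50 = copy_number(test_sample, reference_sample, "DMD_ex50", refs)
cn_ex51 = copy_number(test_sample, reference_sample, "DMD_ex51", refs)
```

Taking the median over the reference fragments limits the influence of any single reference fragment that amplifies unusually well or poorly in one sample.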

Results: The sequencing depth and copy-number results for the ligation product of each target gene fragment in the four samples are shown in Table 2.

Table 2

The results are shown in Figure 8: Figure 8.1, a male individual with a deletion of DMD exons 18-41; Figure 8.2, a male individual with a duplication of DMD exons 63-67; Figure 8.3, a female carrier of a DMD exon 50 deletion.

All documents mentioned in the present invention are incorporated by reference in this application as if each were individually incorporated by reference. Furthermore, it should be understood that, after reading the above teachings of the invention, those skilled in the art can make various changes or modifications to the invention, and such equivalents likewise fall within the scope defined by the claims appended to this application.

Claims

1. A high-throughput nucleic acid analysis method for non-diagnostic purposes, characterized in that it comprises the steps of:

(1) for n kinds of target nucleic acid fragments to be analyzed, providing, for each target nucleic acid fragment, at least 2 specific probes that bind to different binding regions of the target nucleic acid fragment, wherein each specific probe has a specific binding region and a universal sequence region, the sequence of the specific binding region is complementary to the sequence of the binding region of the target nucleic acid fragment, the sequence of the universal sequence region corresponds to the sequence of a sequencing primer, and n is a positive integer ≥ 40;

(2) hybridizing the nucleic acid sample containing the target nucleic acid fragments to be analyzed with the probes of step (1), and ligating the probes to obtain a mixture of probe ligation products, wherein both the 3' and 5' ends of each probe ligation product are the universal sequence regions whose sequences correspond to the sequences of the sequencing primers;

(3) sequencing and/or analyzing the mixture of probe ligation products of step (2) to obtain information on the target nucleic acid;

wherein the two probes corresponding to each target nucleic acid fragment are a 5'-end probe and a 3'-end probe; the 5'-end probe is complementary to the binding region at the 3' end of the target nucleic acid fragment to be analyzed, the 3'-end probe is complementary to the binding region at the 5' end of the target nucleic acid fragment to be analyzed, and the 3' end of the 3'-end probe and the 5' end of the 5'-end probe are modified for protection against exonucleases;

and wherein, in step (2), multiple cycles of denaturation-annealing-ligation are performed.

2. The method of claim 1, wherein the specific probe also has one or more characteristics selected from the group consisting of:

(1) The length of the specific probe is ≤100bp;

(2) The length of the specific binding region of the specific probe is ≤50bp;

(3) The length of the universal sequence region of the specific probe is ≥ 8 bp;

(4) The sequence of the universal sequence region of the specific probe also corresponds to the amplification primer sequence;

(5) The specific probe includes a tag sequence.

3. The method according to claim 2, characterized in that the length of the specific probe is 30-70bp.

4. The method according to claim 2, characterized in that the length of the specific probe is 40-50bp.

5. The method according to claim 2, characterized in that the length of the specific binding region of the specific probe is 15-35 bp.

6. The method according to claim 2, characterized in that the length of the specific binding region of the specific probe is 20-25 bp.

7. The method according to claim 2, characterized in that the length of the universal sequence region of the specific probe is 15-35 bp.

8. The method according to claim 2, characterized in that the length of the universal sequence region of the specific probe is 20-25 bp.

9. The method of claim 1, wherein the 5' end of the 3' end probe is phosphorylated.

10. The method according to claim 9, characterized in that the structure of the 5'-end probe or the 3'-end probe is as shown in Formula I:

5'-A-L-B-3'

Formula I

In Formula I,

A represents the universal sequence region;

B represents the specific binding region;

L represents the nucleic acid linking sequence between A and B;

wherein the positions of A and B are interchangeable.

11. The method according to claim 9 or 10, wherein the connection relationship between the 5' end probe and the 3' end probe is selected from one of the following groups:

(a) the 5'-end probe and the 3'-end probe are adjacent probes: after the 5'-end probe and the 3'-end probe hybridize to the target nucleic acid fragment to be analyzed, the two are 0 bases apart and are joined by a ligase to obtain the probe ligation product;

(b) the 5'-end probe and the 3'-end probe are 1-500 bases apart: after the 5'-end probe and the 3'-end probe hybridize to the target nucleic acid fragment to be analyzed, the gap is filled in by a DNA polymerase and the probes are joined by a ligase to obtain the probe ligation product;

(c) in addition to the 5'-end probe and the 3'-end probe, the hybridization system further includes a probe 3 that is adjacent to the 5'-end probe and to the 3'-end probe respectively; after the three probes hybridize to the target nucleic acid fragment to be analyzed, they are joined by a ligase to obtain the probe ligation product.

12. The method according to claim 1, further comprising, between step (2) and step (3), a step of amplifying the probe ligation products obtained in step (2).

13. The method according to claim 1, characterized in that, in step (3), the mixture of probe ligation products or its amplification products is sequenced and analyzed using third-generation or second-generation sequencing technology.

14. The method according to claim 1, characterized in that, in step (3), the information obtained on the target nucleic acid is one or more kinds of information selected from the following group: DNA methylation information, mutation screening information, genetic information of pathogenic microorganisms, genetic information of transgenic animal and plant products, and gene expression levels.

15. The method according to claim 1, characterized in that, in step (3), the information obtained on the target nucleic acid is one or more kinds of information selected from the following group: CNP typing information and CNV information.

16. The method of claim 1, wherein n is a positive integer ≥ 100.

17. A method for detecting CNV for non-diagnostic purposes, characterized in that it comprises the step of: using the method according to claim 1, performing sequencing and CNV analysis on the mixture of probe ligation products derived from the sample to be tested, to obtain CNV information on the target nucleic acid.

18. A high-throughput methylation analysis method for non-diagnostic purposes, characterized in that it comprises the step of: using the method according to claim 1, performing sequencing and methylation analysis on the mixture of probe ligation products derived from the sample to be tested, to obtain methylation information on the target nucleic acid.