CN101130823A - A reverse hybridization coupling extension DNA sequencing method

Info

Publication number: CN101130823A
Application number: CNA2007101315367A
Authority: CN
Inventors: 陆祖宏; 李燕强; 肖鹏峰; 潘志强
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2007-09-04
Filing date: 2007-09-04
Publication date: 2008-02-27
Anticipated expiration: 2027-09-04
Also published as: CN100582243C

Abstract

The invention discloses a reverse hybridization-coupled extension DNA sequencing method. An unknown DNA template is hybridized in parallel with multiple probes whose 3' ends consist of three or more known bases, and labeled bases are then extended from the probes. By detecting whether a label has been incorporated, the method determines, according to the principle of base complementarity, whether the unknown DNA template contains the corresponding run of four (or more) consecutive bases, and the DNA sequence to be determined is spliced together from this information. The method comprises repeated hybridization, extension and detection steps. By adding an extension step to traditional sequencing by hybridization, the method addresses the problems of false positives and high cost, and enables high-throughput, low-cost, highly sensitive and rapid sequencing of short DNA fragments.

Description

A reverse hybridization-coupled extension DNA sequencing method

Technical Field

The present invention relates to a DNA sequencing method, and in particular to a method for sequencing DNA by hybridization.

Background Art

As genome research deepens, it is becoming possible to understand, at the genetic level, the differences between living organisms, the rules governing the onset and progression of disease, and the interactions between drugs and living organisms. Although many factors contribute to disease, genetic variation (single nucleotide polymorphisms, methylation, etc.) is widely regarded as an important intrinsic factor. In basic research, mutation sites help to map the genome precisely, to study the inheritance patterns of disease genes, and to clone disease-causing genes; in applications, directly identifying mutation sites in disease-susceptibility genes is of great value. Most complex diseases, such as cancer, diabetes, cardiovascular disease, depression and asthma, result from the combined action of many genes and environmental factors. Large-scale identification and detection of mutant genotypes in a large number of genomic samples for a given disease can yield information on the genotypes associated with that disease. More importantly, screening for mutation sites associated with drug sensitivity provides a basis for individualized medication in the future. Discovering mutation sites associated with disease susceptibility allows people to take preventive measures early and thereby avoid the onset of disease.

Accordingly, as soon as the draft of the human genome had been completed, finding mutation sites in the human genome became one of the main tasks of the Human Genome Project. Many methods are currently available for detecting mutations and single nucleotide polymorphisms, such as DNA sequencing, restriction fragment length polymorphism, single-strand conformation polymorphism, pyrosequencing, and allele-specific oligonucleotide hybridization. Although these techniques can detect mutations or nucleotide polymorphisms to some extent, none of them can determine the complete sequence of the target fragments at the same time. The method that can provide comprehensive sequence detection is whole-genome sequencing. However, whole-genome sequencing is too expensive: sequencing the first human genome cost about 1 billion US dollars, and the cost has so far fallen only to about 20 million US dollars. Progress in functional genomics therefore remains limited by DNA sequencing technology, and a substantial reduction in the cost of DNA sequencing would greatly advance research in the life sciences and medicine, and could even bring revolutionary change.

Apart from improvements to existing electrophoresis-based DNA sequencing, the new sequencing technologies now under development focus mainly on non-electrophoretic approaches. Broadly, they fall into four categories. The first is sequencing by synthesis, in which detection takes place as bases are added to the growing DNA strand. The second is sequencing by hybridization, in which the sequence of a target gene is identified from the hybridization signals of a high-density oligonucleotide microarray. The third is molecular imaging, a family of techniques that can sequence at the single-molecule level. The last induces a DNA molecule to thread through a very small pore and reads out the bases electronically or optically as it passes; this is also known as nanopore sequencing.

In traditional sequencing by hybridization, the complete set of 4^K probes of length K bases is immobilized on a chip, the labeled DNA template to be sequenced is hybridized to the chip, and the hybridization spectrum of the template is determined from the hybridization signals. Determining a long DNA sequence requires a very large number of synthesized probes, so the cost rises steeply. More importantly, the method suffers from a high false-positive rate.

Summary of the Invention

Technical problem: The object of the present invention is to provide a reverse hybridization-coupled extension DNA sequencing method that overcomes the high false-positive rate of traditional sequencing by hybridization, avoids the large number of synthesized probes and the high cost of existing methods, and achieves high-throughput, low-cost, highly sensitive and rapid sequencing of short DNA fragments.

Technical solution: The present invention adopts a reverse hybridization-coupled extension DNA sequencing method. The base information of an unknown DNA template is determined by hybridizing the DNA template in parallel with multiple probes and extending labeled bases from them; by detecting whether a label has been incorporated, it is determined, according to the principle of base complementarity, whether the unknown DNA template contains the corresponding four (or more) consecutive bases. This is achieved by repeating "hybridization-extension-detection" steps on the DNA template; in other words, an extension step is added to traditional sequencing by hybridization, which addresses the problems of false positives and high cost.

The present invention adopts the following technical solution:

A reverse hybridization-coupled extension DNA sequencing method, characterized in that the DNA sequence is determined by subjecting hybridization primers whose 3' ends consist of known bases, in parallel, to multiple rounds of hybridization with the DNA sequence to be determined, extension and detection, comprising the following steps:

Step 1) A hybridization primer whose 3' end consists of three or more known bases is heated with the immobilized unknown DNA template in hybridization solution, and then cooled and annealed to hybridize;

Step 2) After the hybridization solution has been washed away with washing solution, labeled monomers are used, under polymerase catalysis, to extend the 3' end of the hybridization primer by one or more bases; the extended-base information is read with a chip scanner to determine a sequence of several bases on the unknown DNA template;

Step 3) The extended hybridization primer is eluted from the unknown DNA template with denaturing solution; a hybridization primer with a different known base sequence at its 3' end is then hybridized to the unknown DNA template as in step 1) and step 2) is carried out, determining a different sequence of several bases on the unknown DNA template; this process is repeated with different hybridization primers until the DNA sequence determination is complete.

The hybridization primer is an oligonucleotide fragment that can hybridize with, and only with, a specific universal sequence shared by all the DNA templates. During DNA template preparation, a universal sequence complementary to the specific hybridization primer is introduced by ligation or during amplification. The introduced universal sequence should differ from every other sequence segment in the template, so that only this universal sequence in the template can hybridize with the specific hybridization primer.

In the hybridization primer of the present invention, three or more bases at the 3' end are known bases, preferably three to five, and most preferably three; the 5' portion carries 7 or more bases, preferably 7 to 13. In the most preferred case, in which the 3' end consists of three known bases, the hybridization primer can be written as 5'-(N)nXYZ-3', where N is any one of the four bases T, A, G and C without restriction, or may be replaced by the universal base I; n is an integer from 7 to 13; and X, Y and Z are each a defined base selected from T, A, G and C. The sequence XYZ therefore has 4^3 = 64 different combinations, so the method of the present invention can be carried out with as few as 64 different synthesized hybridization primers, each only 10 to 16 bases long.
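By way of illustration only, the 64 primers of the most preferred case can be enumerated programmatically. The Python sketch below is not part of the disclosed method; the function name `hybridization_primers` and its parameters are assumptions introduced here, and the degenerate positions are simply written as the letter N.

```python
from itertools import product

BASES = "ACGT"


def hybridization_primers(known=3, n_degenerate=8, filler="N"):
    """Return every primer of the form 5'-(N)n + known 3' bases-3'.

    `known` is the number of defined 3'-terminal bases (at least 3 in the text),
    `n_degenerate` is the number of undefined 5' positions (7 to 13 preferred).
    """
    if known < 3:
        raise ValueError("the method calls for at least 3 known 3'-terminal bases")
    return [filler * n_degenerate + "".join(tail)
            for tail in product(BASES, repeat=known)]


primers = hybridization_primers()
print(len(primers))             # 64
print("NNNNNNNNACG" in primers)  # True
```

With `known=4` the same function yields 4^4 = 256 primers, which shows how quickly the primer count grows with the number of known 3' bases.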

The unknown DNA template is the DNA sequence to be determined, or a fragment of it, after amplification by a method that increases the amount of the target sequence of interest in the genome, such as conventional PCR, rolling circle amplification, solid-phase amplification or bridge amplification. Amplification may be singleplex, amplifying one target fragment at a time, or multiplex, amplifying several target fragments at a time. The unknown DNA template may be immobilized by chemical or physical means on a planar substrate, or on a carrier such as a 96-well or 384-well plate or beads with various modifications. Multiple DNA templates may be immobilized at the same time, or a single DNA template may be immobilized.

Step 1) is the hybridization step, which can follow prior-art methods: the hybridization primer is dissolved in hybridization solution, heated briefly at high temperature (about 60°C), and then annealed at low temperature (about 4°C) to complete hybridization.

Step 2) is the extension step. Once the hybridization primer has hybridized to the immobilized unknown DNA template, one or more labeled monomers (of the four types A, G, C and T) are extended from the hybridization primer and the extended bases are detected with a chip scanner, which reveals the base information obtained in this extension for the unknown DNA template at each spot. At this point, a sequence of several bases within a short stretch of DNA at a given spot (a monoclonal template DNA immobilized on the substrate) has been determined: it is the DNA sequence complementary to the sequence formed by the three or more known bases at the 3' end of the hybridization primer together with the one or more bases carried by the incorporated labeled monomers.
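The inference made in this step can be sketched in a few lines of Python. This is a minimal illustration under the assumption of a single-base extension and error-free detection; the names `reverse_complement` and `template_kmer` are hypothetical and do not come from the invention.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def template_kmer(primer_tail, extended_base):
    """Template bases (5'->3') implied by a primer 3' tail plus one extended base.

    A dUTP-derived signal is treated as T, since U pairs with A just as T does.
    """
    base = "T" if extended_base == "U" else extended_base
    return reverse_complement(primer_tail + base)


# A primer tail ACG extended by a labeled C ends in ACGC on the primer strand,
# so the template must contain the complementary four bases.
print(template_kmer("ACG", "C"))  # GCGT
```

The function returns the template word read 5'->3' on the template strand, which is the convention a later splicing step would most naturally consume.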

The labeled monomers are extended by a known general method: four different labeled monomers, such as dATP (ddATP), dCTP (ddCTP), dGTP (ddGTP) and dUTP (ddUTP), are added in buffer, and extension proceeds under the action of a polymerase at 4 to 10°C for 5 to 15 minutes, so that the 3' end of the hybridization primer is extended by labeled bases.

The labeled monomer is a nucleotide monomer (A, G, C or T) that carries a group or particle that can be detected directly or indirectly, and whose 3' position is dideoxy or blocked by a cleavable group. The detectable group or particle, i.e. the label, may be, for example, a fluorophore, biotin or a quantum dot attached as a modification, or the light signal that an unmodified monomer generates during the reaction. The labeled monomers may be single-color, with all four monomers carrying the same label, or multi-color, with the four monomers carrying labels that are different or not all the same. For example, the labeled monomers may be dATP (ddATP), dCTP (ddCTP), dGTP (ddGTP) and dUTP (ddUTP) carrying four different fluorescent labels.

Extension of the labeled monomers means extension of the four labeled monomers (A, G, C and T). If the labeled monomers are single-color, then for any given hybridization primer the four identically labeled monomers (A, G, C and T) should be extended separately, with the extension information read after each; after one monomer has been read, its label should be removed before the next labeled monomer is extended, so that the chip scanner can distinguish the different bases. Clearly, if the four monomers carry different labels, they can be extended simultaneously and the extended-base information read in a single pass.
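The trade-off between single-color and multi-color labeling can be made concrete by counting extension-and-scan rounds. The Python sketch below is illustrative only: the helper `rounds_per_primer` and the label names are assumptions, and the two-color case is an extrapolation from the text, which explicitly describes only the single-color case and the case of four different labels.

```python
from collections import Counter


def rounds_per_primer(labels):
    """Extension-and-scan rounds needed for one hybridization primer.

    `labels` maps each base to its label. Bases that share a label cannot be
    told apart in one scan, so they must be extended in separate rounds.
    """
    return max(Counter(labels.values()).values())


single_color = {"A": "dye1", "C": "dye1", "G": "dye1", "T": "dye1"}
four_color = {"A": "dye1", "C": "dye2", "G": "dye3", "T": "dye4"}
two_color = {"A": "dye1", "C": "dye1", "G": "dye2", "T": "dye2"}  # extrapolated case

for name, scheme in [("single", single_color), ("four", four_color), ("two", two_color)]:
    per_primer = rounds_per_primer(scheme)
    # Rounds per primer, then total rounds for a set of 64 primers.
    print(name, per_primer, per_primer * 64)
```

Under these assumptions a set of 64 primers needs 256 rounds with one label but only 64 rounds with four distinguishable labels.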

Monomer extension may be single-step, extending one base, or multi-step, extending several bases. In multi-base extension, one base is extended at a time, and detection establishes at which spots the unknown DNA template underwent that extension. The next base is then extended after the label has been removed, for example by cleaving off the fluorophore; alternatively, labels are allowed to accumulate, for example as additive fluorescence, to determine at which spots the unknown DNA template underwent multiple extensions.

Once several bases of the DNA sequence at a given spot have been determined, the extended hybridization primer can be eluted from the unknown DNA template with denaturing solution. The denaturing solution contains components that lower the denaturation temperature of DNA, such as urea, NaOH, ethylene glycol, glycerol and formamide.

After the hybridization primer carrying the extended labeled monomer has been separated from the DNA template, steps 1) and 2) can be repeated: another hybridization primer with a different known base sequence at its 3' end is hybridized to the unknown DNA template, monomers are extended, and the base sequence information of the unknown DNA sequence at each spot is read out. The "hybridization-extension-detection" cycle is repeated with different hybridization primers until the sequence spectrum of the unknown DNA template has been obtained, or until all 4^K hybridization primers (K being the number of known bases at the end of the hybridization primer) have been through the above steps, completing the DNA sequence determination.
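The full cycle can be simulated under idealized assumptions. The Python sketch below assumes three known 3' bases, single-base extension, perfect specificity and no positional constraints on where a primer can anneal; `sequencing_cycles` and the ten-base template are hypothetical and serve only to show that 4^3 = 64 cycles recover every four-base word present in the template.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sequencing_cycles(template, known=3):
    """Yield (primer_tail, extended_bases) for each hybridize-extend-detect cycle."""
    k = known + 1
    present = {template[i:i + k] for i in range(len(template) - k + 1)}
    for tail in map("".join, product("ACGT", repeat=known)):
        # A base is incorporated only if tail+base is complementary to the template.
        extended = {b for b in "ACGT" if reverse_complement(tail + b) in present}
        yield tail, extended


template = "GATTACAGGC"  # hypothetical template, written 5'->3'
spectrum = set()
cycles = 0
for tail, extended in sequencing_cycles(template):
    cycles += 1
    spectrum |= {reverse_complement(tail + b) for b in extended}

print(cycles)            # 64
print(sorted(spectrum))  # the four-base words found in the template
```

Most cycles give no signal for a template this short; only the primers whose tails match somewhere in the template contribute words to the spectrum.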

Beneficial effects: Compared with the prior art, the present invention has the following advantages:

1. The greatest advantage of the present invention is its accuracy and reliability. In traditional sequencing by hybridization, a signal may still be read when one or more bases at the end are mismatched, but that signal is wrong. The method of the present invention reads only the last few bases at the 3' end and adds an extension step (polymerases are very sensitive to mismatched bases at the 3' end), which greatly increases accuracy.

2. The sequencing method of the present invention is inexpensive. Large-scale DNA sequencing requires as few as 64 synthesized hybridization primers (each about 10 to 15 bases long), an ordinary polymerase, fluorescently labeled dNTPs and a chip scanner.

3. The present invention enables high-throughput DNA sequencing. The sequences on high-density DNA chips prepared by methods such as polony or emulsion PCR can readily be read out with the reverse hybridization-coupled extension DNA sequencing method.

Because it adds an extension step, the method proposed by the present invention addresses the problems of false positives and high cost. Carrying out hybridization-extension sequencing on a chip also achieves high throughput. In addition, the present invention does not need to destroy or cleave the label in order to remove the hybridization primer, so a fast, accurate and inexpensive genome sequencing method can be established.

The present invention is described in further detail below with reference to specific embodiments.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the reverse hybridization-coupled extension DNA sequencing method of the present invention.

In the figure: 1 is the unknown DNA template; 2 is the hybridization primer; 3 is the fluorophore; 4 is the substrate; i, j, k, l and m are spots on the chip.

Fig. 2 is a schematic diagram of DNA sequence splicing.

In the figure: 4 is the substrate; 1 is the unknown DNA template immobilized on the substrate; 2 is the hybridization primer; 3 is the extended labeled base; e is the sequence obtained by splicing.

Detailed Description of the Embodiments

Example 1

A DNA sequence to be determined is amplified by conventional PCR and immobilized on an acrylamide substrate to produce the unknown DNA template. Sixty-four hybridization primers are prepared, each with three known bases at the 3' end and 8 bases at the 5' end. The primers can be written as 5'-NNNNNNNNXYZ-3', where N is any one of the four bases T, A, G and C without restriction, or may be replaced by the universal base I, and X, Y and Z are each a defined base selected from T, A, G and C, for example NNNNNNNNACG or NNNNNNNNGAG. The XYZ positions of the hybridization primers make up 64 different sequences.

1) Hybridization: The hybridization primer NNNNNNNNACG is hybridized to the immobilized unknown DNA template. The primer is dissolved at 100 µM in 1X hybridization solution, and enough of the primer solution to fully immerse the template is applied to it. The template is denatured at 60°C for 2 minutes, annealed at 4°C for 3 minutes, and rinsed for 30 seconds with 1X polymerase buffer stored at 4°C (10 mM Tris-HCl (pH 7.5), 5 mM MgCl2, 7.5 mM DTT) to complete hybridization.

2) Extension: The four labeled monomers dATP (ddATP), dCTP (ddCTP), dGTP (ddGTP) and dUTP (ddUTP), each carrying a different fluorescent label, are added in 1X EcoPol buffer (10 mM Tris-HCl (pH 7.5), 5 mM MgCl2, 7.5 mM DTT). Under the action of the polymerase (Klenow fragment, 3'→5' exo-), the extension reaction proceeds at 4°C for 10 minutes, extending the 3' end of the hybridization primer by one labeled base.

3) Detection: The extended-base information is read with a chip scanner to determine the 4-base sequence information at each spot of the unknown DNA sequence. That is, it becomes known whether the sequence contains the four-base sequence complementary to the three known bases of the hybridization primer (ACG, i.e. GCA when read 3'→5') plus the one extended base.

As shown in Fig. 1, the unknown DNA template 1 is immobilized on the solid support, acrylamide substrate 4, and the hybridization primer NNNNNNNNACG 2 hybridizes with DNA template 1. After the labeled monomers are added, hybridization primer 2 undergoes the labeled-monomer extension reaction. Detection shows that, in this extension, among the spots i, j, k, l and m at different positions, the unknown DNA templates at i, j, l and m contain 4-base sequences complementary to ACGT, ACGC, ACGG and ACGA, respectively.

4) Once the hybridization primer NNNNNNNNACG has been tested, steps 1) to 3) are repeated in turn with the remaining hybridization primers, up to the remaining 63, until the sequence spectrum at every spot has been obtained. These sequences can then be read out by organizing and splicing the spectrum data.

DNA sequence splicing is shown in Fig. 2. DNA template 1 is immobilized on substrate 4, and each hybridization primer 2, once hybridized, is extended by one labeled base 3. In the figure, 19 hybridization primers cover one stretch of sequence; splicing them yields sequence e, so the DNA sequence to be determined contains a DNA sequence complementary to sequence e.
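Splicing can be sketched as chaining words that overlap by all but one base. The Python example below is an illustration under stated assumptions, not the splicing procedure of the invention, which the text does not specify: the function `splice` is hypothetical, it expects words read 5'->3' on a single strand, and it gives up whenever a repeated overlap makes the chain ambiguous.

```python
def splice(kmers):
    """Chain k-mers that overlap by k-1 bases into one sequence.

    Raises ValueError if the start is not unique, if a (k-1)-base overlap is
    shared by several unused k-mers, or if some k-mers cannot be placed.
    """
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    suffixes = {s[1:] for s in kmers}
    # The first word is the only one whose prefix is not another word's suffix.
    starts = [s for s in kmers if s[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique start word; the spectrum is ambiguous or circular")
    by_prefix = {}
    for s in kmers:
        by_prefix.setdefault(s[:-1], []).append(s)
    seq = starts[0]
    used = {starts[0]}
    while True:
        candidates = [s for s in by_prefix.get(seq[-(k - 1):], []) if s not in used]
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("a repeated overlap makes splicing ambiguous")
        seq += candidates[0][-1]
        used.add(candidates[0])
    if used != kmers:
        raise ValueError("some words could not be placed in the chain")
    return seq


words = {"GATT", "ATTA", "TTAC", "TACA", "ACAG", "CAGG", "AGGC"}  # hypothetical spectrum
print(splice(words))  # GATTACAGGC
```

Short words repeat often in long templates, which is one reason the examples in this document work with short fragments; a spectrum with repeated overlaps would need a more elaborate assembly strategy than this sketch.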

Example 2

Whole-genome sequencing of a single human by the reverse hybridization-coupled extension DNA sequencing method

The DNA template can be prepared as follows:

1. Extract human genomic DNA.

2. DNA fragmentation. The DNA is sonicated to an average size of about 50 bp.

3. Ligation of adapter primer a. The fragmented genomic DNA is first blunt-ended with a polymerase and treated so that its 3' end carries a single overhanging A. An adapter primer containing an MmeI endonuclease recognition site is then joined to the small DNA fragments by a ligase.

4. Restriction digestion. MmeI endonuclease cuts the whole genome into a genomic library in which the fragments to be sequenced are 19 bp long.

5. As in step 3, adapter primer b is ligated to the other end.

6. Microemulsion PCR is used to amplify a monoclonal genomic library.

7. Preparation of the whole-genome chip. The amplification products from the previous step are immobilized on an acrylamide substrate.

The immobilized DNA template thus prepared is subjected to hybridization-extension sequencing by the method of the present invention, as in Example 1. Whole-genome information is obtained after the hybridization sequences are assembled and spliced.

Example 3

RNA expression profiling by the reverse hybridization-coupled extension DNA sequencing method

The DNA template can be prepared as follows:

1. Extract mRNA separately from normal cells and diseased cells, and purify it with magnetic beads carrying immobilized nucleotide chains of 30 T residues, so that the mRNA is captured on the beads.

2. Reverse-transcribe the mRNA into cDNA and synthesize the second strand using RNase H, DNA polymerase, etc.

3. Digest the product of step 2 with an endonuclease that recognizes a four-base site (Sau3AI). Wash the magnetic beads with binding buffer.

4. Ligate adapter primer A, which carries an AcuI endonuclease recognition site, using a ligase.

5. Digest the product of step 4 with AcuI. This yields a cDNA library in which adapter primer A is followed by the 16 unknown bases to be sequenced.

6. Ligate the second adapter primer B, then amplify by emulsion PCR.

7. Immobilize the amplification products on a chip.

The immobilized DNA template thus prepared is subjected to hybridization-extension sequencing by the method of the present invention, as in Example 1. After data analysis, the DNA sequences to be determined are obtained by splicing.
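Once the tags from the two samples have been read, expression profiling reduces to counting how often each tag occurs in each sample. The text does not describe this analysis, so the following Python sketch is only one plausible approach; the function `fold_changes`, the pseudocount and the two 16-base tag sequences are all hypothetical.

```python
from collections import Counter


def fold_changes(normal, diseased, pseudocount=1):
    """Ratio of diseased to normal counts per tag.

    `normal` and `diseased` are Counters of tag sequences. The pseudocount
    avoids division by zero for tags seen in only one sample.
    """
    all_tags = set(normal) | set(diseased)
    return {tag: (diseased[tag] + pseudocount) / (normal[tag] + pseudocount)
            for tag in all_tags}


# Hypothetical tags read from the spots of each sample.
normal_tags = Counter(["ACGTACGTACGTACGT"] * 3 + ["TTTTGGGGCCCCAAAA"])
diseased_tags = Counter(["ACGTACGTACGTACGT"] + ["TTTTGGGGCCCCAAAA"] * 7)

changes = fold_changes(normal_tags, diseased_tags)
for tag, ratio in sorted(changes.items()):
    print(tag, ratio)
```

A ratio well above 1 suggests a transcript that is more abundant in the diseased cells, and a ratio well below 1 suggests the opposite; real data would also need normalization for the total number of tags per sample.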

Example 4

Detection, by the reverse hybridization-coupled extension DNA sequencing method, of a DNA fragment of the human genome containing the SNP site rs11053646

A pair of PCR primers is designed, one of which is modified with an acrylamide group. The PCR primers are used to amplify, from a human sample (such as blood or saliva), a DNA fragment containing the SNP site rs11053646. The PCR product is immobilized on a glass slide by polymerization, and the non-immobilized PCR strand and other impurities are removed by denaturation and electrophoresis, leaving a pure single DNA strand.

Using the method described in Example 1, the hybridization primers are hybridized to the above chip, and fluorescently labeled A, G, C and T are incorporated under extension reaction conditions. After the data have been read, the hybridized primer is washed off, and hybridization, extension and detection are repeated with the other hybridization primers until all of them have completed the hybridization-extension process. Sequence-spectrum analysis of the sequences obtained for the different samples then reveals the base at the SNP site rs11053646.
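The sequence-spectrum comparison can be illustrated as follows. The two 11-base sequences are invented placeholders, not the actual flanking sequence or alleles of rs11053646, and `call_allele` is a hypothetical helper; the sketch only shows that a single-base change alters up to four of the four-base words in the spectrum, which is what makes the SNP visible.

```python
def kmers(seq, k=4):
    """Set of all k-base words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def call_allele(observed, ref_seq, alt_seq, k=4):
    """Classify an observed spectrum by the allele-specific words it contains."""
    ref_only = kmers(ref_seq, k) - kmers(alt_seq, k)
    alt_only = kmers(alt_seq, k) - kmers(ref_seq, k)
    has_ref = ref_only <= observed
    has_alt = alt_only <= observed
    if has_ref and has_alt:
        return "heterozygous"
    if has_ref:
        return "reference"
    if has_alt:
        return "alternate"
    return "no call"


# Placeholder sequences differing at one position (C versus T).
REF = "TTGACCGTAAG"
ALT = "TTGACTGTAAG"

print(sorted(kmers(REF) - kmers(ALT)))  # words seen only with the reference base
print(call_allele(kmers(ALT), REF, ALT))
```

Requiring every allele-specific word to be present is deliberately strict; a practical analysis would probably tolerate a missing word or weigh signal intensities.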

Claims

1. A reverse hybridization-coupled extension DNA sequencing method, characterized in that the DNA sequence is determined by subjecting hybridization primers whose 3' ends consist of known bases, in parallel, to multiple rounds of hybridization with the DNA sequence to be determined, extension and detection, comprising the following steps:

Step 1) a hybridization primer whose 3' end consists of three or more known bases is heated with the immobilized unknown DNA template in hybridization solution, and then cooled and annealed to hybridize;

Step 2) after the hybridization solution has been washed away with washing solution, labeled monomers are used, under polymerase catalysis, to extend the 3' end of the hybridization primer by one or more bases; the extended-base information is read with a chip scanner to determine a sequence of several bases on the unknown DNA template;

Step 3) the extended hybridization primer is eluted from the unknown DNA template with denaturing solution; a hybridization primer with a different known base sequence at its 3' end is then hybridized to the unknown DNA template as in step 1) and step 2) is carried out, determining a different sequence of several bases on the unknown DNA template; this process is repeated with different hybridization primers until the DNA sequence determination is complete.

2. The DNA sequencing method according to claim 1, characterized in that the 5' end of the hybridization primer carries 7 or more bases.

3. The DNA sequencing method according to claim 2, characterized in that the 3' end of the hybridization primer consists of 3 known bases and the 5' end carries 7 to 13 bases, the hybridization primer having the sequence 5'-(N)nXYZ-3', where N is any one of the four bases T, A, G and C without restriction, or the universal base I; X, Y and Z are each a defined base selected from T, A, G and C; and n is an integer from 7 to 13.

4. The DNA sequencing method according to claim 1, characterized in that the unknown DNA template is the DNA sequence to be determined, or a sequence fragment of it, after the amount of the target sequence in the genome has been increased by conventional PCR, rolling circle amplification, solid-phase amplification or bridge amplification.

5. The DNA sequencing method according to claim 1, characterized in that, in step 1), the unknown DNA template is immobilized by chemical or physical means on a planar substrate, on a 96-well or 384-well plate, or on a carrier of beads with various modifications; and either a single DNA template or multiple DNA templates are immobilized.

6. The DNA sequencing method according to claim 1, characterized in that the labeled monomer is a nucleotide monomer (A, G, C or T) that carries a group or particle that can be detected directly or indirectly, and whose 3' position is dideoxy or blocked by a cleavable group.

7. The DNA sequencing method according to claim 6, characterized in that the detectable group or particle is a fluorophore, biotin or a quantum dot attached as a modification, or the light signal that an unmodified monomer generates during the reaction.

8. The DNA sequencing method according to claim 6, characterized in that the labeled monomers are dATP (ddATP), dCTP (ddCTP), dGTP (ddGTP) and dUTP (ddUTP) carrying four different fluorescent labels.

9. The DNA sequencing method according to claim 1, characterized in that the denaturing solution contains urea, NaOH, ethylene glycol, glycerol or formamide, components that lower the denaturation temperature of DNA.

10. The DNA sequencing method according to claim 1, characterized in that the DNA sequence of the unknown DNA template is determined, by the principle of base complementarity, from the known bases of the hybridization primer and the extended bases.