CN101189345A

CN101189345A - Reagents, Methods, and Libraries for Bead-Based Sequencing

Info

Publication number: CN101189345A
Application number: CNA2006800076930A
Authority: CN
Inventors: K·麦柯南; A·布兰查德; L·科特勒; G·科斯塔
Original assignee: Agencourt Bioscience Corp
Current assignee: Beckman Coulter Genomics Inc
Priority date: 2005-02-01
Filing date: 2006-02-01
Publication date: 2008-05-28

Abstract

The present invention provides methods for determining a nucleic acid sequence by performing successive cycles of duplex extension along a single stranded template. The cycles comprise steps of extension, ligation, and, preferably, cleavage. In certain embodiments the methods make use of extension probes containing phosphorothiolate linkages and employ agents appropriate to cleave such linkages. In certain embodiments the methods make use of extension probes containing an abasic residue or a damaged base and employ agents appropriate to cleave linkages between a nucleoside and an abasic residue and/or agents appropriate to remove a damaged base from a nucleic acid. The invention provides methods of determining information about a sequence using at least two distinguishably labeled probe families. In certain embodiments the methods acquire less than 2 bits of information from each of a plurality of nucleotides in the template in each cycle. In certain embodiments the sequencing reactions are performed on templates attached to beads, which are immobilized in or on a semi-solid support. The invention further provides sets of labeled extension probes containing phosphorothiolate linkages or trigger residues that are suitable for use in the method. In addition, the invention includes performing multiple sequencing reactions on a single template by removing initializing oligonucleotides and extended strands and performing subsequent reactions using different initializing oligonucleotides. The invention further provides efficient methods for preparing templates, particularly for sequencing multiple different templates in parallel. The invention also provides methods for performing ligation and cleavage. The invention also provides new libraries of nucleic acid fragments containing paired tags, and methods of preparing microparticles having multiple different templates (e.g., containing paired tags) attached thereto and of sequencing the templates individually.

Description

Reagents, Methods, and Libraries for Bead-Based Sequencing

Cross-Reference to Related Applications

This application claims priority to and the benefit of provisional applications USSN 60/649,294, filed February 1, 2005; USSN 60/656,599, filed February 25, 2005; USSN 60/673,749, filed April 21, 2005; USSN 60/699,541, filed July 15, 2005; and USSN 60/722,526, filed September 30, 2005, all of which are incorporated herein by reference.

Background of the Invention

Nucleic acid sequencing techniques are important in a wide range of fields, from basic research to clinical diagnosis. The results obtained from such techniques can include information of varying degrees of specificity. For example, useful information may include determining whether the sequence of a particular polynucleotide differs from that of a reference polynucleotide, confirming the presence of a particular polynucleotide sequence in a sample, determining partial sequence information such as the identity of one or more nucleotides within a polynucleotide, and determining the identity and order of the nucleotides within a polynucleotide.

A DNA strand is generally a polymer composed of four types of subunits, namely deoxynucleotides containing the bases adenine (A), cytosine (C), guanine (G), and thymine (T). These subunits are joined to one another by covalent phosphodiester bonds that link the 5' carbon of one deoxyribose group to the 3' carbon of the next. Most naturally occurring DNA consists of two such strands arranged in an antiparallel orientation and held together by hydrogen bonds formed between complementary bases, i.e., between A and T and between G and C.

Large-scale DNA sequencing became possible with the development of the chain termination or dideoxynucleotide method (Sanger et al., Proc. Natl. Acad. Sci. 74:5463-5467, 1977) and the chemical degradation method (Maxam and Gilbert, Proc. Natl. Acad. Sci. 74:560-564, 1977), the former of which has been widely used, improved, and automated. In particular, the use of fluorescently labeled chain terminators was important in the development of automated DNA sequencers. Common to both methods is the generation of one or more populations of labeled DNA fragments of different sizes, which must then be separated by length to identify the nucleotide at the 3' end of each fragment (chain termination method) or the nucleotide most recently cleaved from the fragment (chemical degradation method).

Although currently available sequencing technologies have enabled major achievements, such as the sequencing of many complete genomes, they have a number of shortcomings and remain in great need of improvement in many respects. Labeled DNA fragments are typically separated by polyacrylamide gel electrophoresis. However, this step has proven in many cases to be the major bottleneck limiting the speed and accuracy of sequencing. Although capillary array electrophoresis (CAE) proved to be the breakthrough that made completion of the Human Genome Project possible (Venter et al., Science, 291:1304-1351, 2001; Lander et al., Nature, 409:860-921, 2001), significant shortcomings remain. For example, CAE still requires a time-consuming separation step and still relies on discrimination by size, which can be inaccurate.

Various alternatives to the chain termination method have been proposed. In one approach, commonly referred to as "sequencing by synthesis", an oligonucleotide primer is first hybridized to a target template. The primer is then extended by successive cycles of polymerase-catalyzed addition of differently labeled nucleotides, whose incorporation into the growing strand is detected. The identity of the label serves to identify the complementary nucleotide in the template. Alternatively, multiple reactions can be performed in parallel, one for each nucleotide, and incorporation of the labeled nucleotide in the reaction that uses a particular nucleotide identifies the complementary nucleotide in the template. (See, e.g., Melamede, U.S. Patent 4,863,849; Cheeseman, U.S. Patent 5,302,509; Tsien et al., International Application WO 91/06678; Rosenthal et al., International Application WO 93/21340; Canard et al., Gene, 148:1-6 (1994); Metzker et al., Nucleic Acids Research, 22:4259-4267 (1994)).

Efficient sequencing of polynucleotides of any significant length requires the polymerase to incorporate exactly one nucleotide per cycle. It is therefore generally necessary to employ nucleotides that act as chain terminators, i.e., nucleotides whose incorporation prevents further extension by the polymerase. The incorporated nucleotide must then be modified, enzymatically or chemically, to allow the polymerase to incorporate the next nucleotide. Various nucleotide analogs have been proposed that act as chain terminators but can be modified after incorporation so that extension can continue in subsequent steps. Such "reversible terminators" have been described, for example, in U.S. Patents 5,302,509; 6,255,475; 6,309,836; and 6,613,513. However, it has proven difficult to identify reversible terminators that are efficiently incorporated by polymerases, possibly because, given the small size of nucleotides, modifications that allow a nucleotide to act as a terminator also affect its incorporation into the growing polynucleotide chain.

Other sequencing methods include pyrosequencing, which is based on the detection of pyrophosphate (PPi) released during DNA polymerization (see, e.g., U.S. Patents 6,210,891 and 6,258,568). Although it avoids the need for electrophoretic separation, pyrosequencing has a number of drawbacks that still limit its widespread application (Franca et al., Quarterly Reviews of Biophysics, 35(2):169-200, 2002). Sequencing by hybridization has also been proposed as an alternative (U.S. Patent 5,202,231; WO 99/60170; WO 00/56937; Drmanac et al., Advances in Biochemical Engineering/Biotechnology, 11:16-101, 2002), but it too has many drawbacks, including the potential for error when distinguishing highly similar sequences. In theory, single-molecule sequencing by exonuclease, which involves labeling every base on one strand and then detecting the 3' terminal nucleotides as they are sequentially cleaved off into a sample stream, is a very efficient way to determine the sequence of long DNA molecules rapidly (Stephan et al., J. Biotechnol., 86:255-267, 2001). However, many technical hurdles remain to be overcome before this potential can be realized (Stephan et al., 2001).

Diagnostic tests based on specific sequence variations are already available for a variety of diseases. It is widely believed that the sequencing of the human genome has ushered in an era of personalized medicine, in which treatments, including preventive treatments, are tailored to a patient's specific genetic makeup or selected on the basis of the identification of specific alleles or mutations. There is a growing need for rapid and accurate determination of sequence variants of pathogens such as HIV. The demand for accurate and rapid sequence determination will therefore certainly increase in the near future, and improved methods for all types of sequence determination are needed.

Summary of the Invention

The present invention provides new and improved sequencing methods that do not require fragment separation and, in certain embodiments, do not require the use of a polymerase. U.S. Patents 5,740,341 and 6,306,597 to Macevicz describe an alternative to the methods discussed in the Background of the Invention. That approach is based on repeated cycles of duplex extension along a single-stranded template. In preferred embodiments of those methods, one nucleotide is identified in each cycle. The present invention improves upon those methods. The improvements allow the methods to be carried out efficiently and are particularly suited to high-throughput sequencing. In addition, the present invention provides methods for sequence determination that include repeated cycles of duplex extension along a single-stranded template but do not involve identifying any individual nucleotide in each cycle.

In one aspect, the invention provides improved methods for sequencing based on successive cycles of duplex extension along a single-stranded template, ligation of a labeled extension probe, and detection of the label. In general, extension starts from a duplex formed by an initializing oligonucleotide and the template. The initializing oligonucleotide is extended by ligating an oligonucleotide probe to its end to form an extended duplex, and the extended duplex is then repeatedly extended by successive cycles of ligation. During each cycle, one or more nucleotides in the template are identified by identifying a label on, or associated with, the successfully ligated oligonucleotide probe. The label of the newly added probe can be detected before ligation, after ligation, or both. It is generally preferred to detect the label after ligation.

In preferred embodiments, the probe has a non-extendable moiety at its terminal position (at the end of the probe opposite the nucleotide that is ligated to the growing nucleic acid strand of the duplex), so that only a single extension of the extended duplex takes place in a single cycle. "Non-extendable" means that the moiety cannot serve as a ligase substrate without modification. For example, the moiety can be a nucleotide residue lacking a 5' phosphate or a 3' hydroxyl group. The moiety can also be a nucleotide to which a blocking group that prevents ligation is attached. In preferred embodiments of the invention, the non-extendable moiety is removed after ligation to regenerate an extendable terminus, so that the duplex can be extended further in subsequent cycles.

To allow removal of the non-extendable moiety, in certain embodiments of the invention the probe contains at least one internucleoside linkage that can be cleaved under conditions that do not substantially cleave phosphodiester bonds. Such a linkage is referred to herein as a "scissile internucleoside linkage" or "scissile linkage". Cleaving the scissile internucleoside linkage removes the non-extendable moiety and either regenerates an extendable probe terminus or leaves a terminal residue that can be modified to form an extendable probe terminus. The scissile internucleoside linkage can be located between any two nucleosides in the probe. Preferably, the scissile linkage is located at least several nucleotides away from (i.e., distal to) the newly formed bond. The nucleotides in the extension probe that lie between the terminal nucleotide ligated to the extendable terminus and the scissile linkage need not hybridize perfectly to the template. These nucleotides can serve as a "spacer", allowing nucleotides located at intervals along the template to be identified without performing a cycle for every nucleotide within the interval.

Preferably, the scissile internucleoside linkage and the label are positioned such that cleavage of the scissile internucleoside linkage separates the extension probe into a labeled portion and a portion that remains part of the growing nucleic acid strand, thereby allowing the labeled portion to diffuse away (e.g., upon raising the temperature). For example, the label can be attached to the terminal nucleotide of the extension probe at the end opposite the ligated nucleotide. Alternatively, the label can be removed by any other method.

The inventors have found that phosphorothiolate linkages, in which one of the bridging oxygen atoms of the phosphodiester bond is replaced by a sulfur atom, are particularly advantageous scissile internucleoside linkages. The sulfur atom of the phosphorothiolate linkage can be attached to the 3' carbon of one nucleoside or to the 5' carbon of the adjacent nucleoside.

In certain embodiments of the methods described above, multiple sequencing reactions are performed. The reactions use initializing oligonucleotides that hybridize to different sequences of the template, so that the termini at which the initial ligations occur are located at different positions on the template. For example, the positions at which the initial ligations occur can be shifted relative to one another, or placed "out of phase", in increments of one nucleotide. Thus, after each cycle of extension with oligonucleotide probes of the same length, the same relative phase exists between the termini of the initializing oligonucleotides on the different templates. The reactions can be performed in parallel, in separate vessels that each contain copies of the same template, or sequentially, i.e., the extended duplex is removed from the template after sequence information has been obtained with a first initializing oligonucleotide, and a further reaction is then performed with an initializing oligonucleotide that hybridizes to a different sequence of the template.
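The effect of shifting the initial ligation site can be illustrated with a short sketch. It assumes, purely for illustration, that every cycle adds a fixed number of nucleotides (`step`) to the extended duplex and that one position per probe (`query_index`) is identified; the function names and parameters are hypothetical and are not taken from this document.

```python
def interrogated_positions(offset, step, query_index, n_cycles):
    """Template positions identified in successive ligation cycles.

    offset      -- shift of the initializing oligonucleotide, in nucleotides
    step        -- nucleotides added to the extended duplex per cycle (assumed constant)
    query_index -- position within each added block that is identified
    Positions are 0-based, counted from the un-shifted ligation junction.
    """
    return [offset + cycle * step + query_index for cycle in range(n_cycles)]


def phase_coverage(step, query_index, n_cycles):
    """Map each identified template position to the offset (phase) that reads it."""
    covered = {}
    for offset in range(step):  # one reaction per one-nucleotide shift
        for pos in interrogated_positions(offset, step, query_index, n_cycles):
            covered[pos] = offset
    return covered


if __name__ == "__main__":
    coverage = phase_coverage(step=5, query_index=0, n_cycles=4)
    for position in sorted(coverage):
        print(f"template position {position:2d} is read by the reaction shifted by {coverage[position]}")
```

Under these assumptions, a single reaction reads only every fifth position, while five reactions shifted by one nucleotide each read every position in the covered stretch.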

In another aspect, the invention provides solutions useful for a variety of nucleic acid manipulations. In one embodiment, the invention provides a solution comprising or consisting essentially of an aqueous solution of 1.0-3.0% SDS, 100-300 mM NaCl, and 5-15 mM sodium bisulfate (NaHSO₄). The solution may comprise or consist essentially of an aqueous solution of about 2% SDS, about 200 mM NaCl, and about 10 mM sodium bisulfate (NaHSO₄). For example, in one embodiment, the solution comprises an aqueous solution of 2% SDS, 200 mM NaCl, and 10 mM sodium bisulfate (NaHSO₄). In another embodiment, the solution consists essentially of an aqueous solution of 2% SDS, 200 mM NaCl, and 10 mM sodium bisulfate (NaHSO₄). In certain embodiments, the pH of the solution is 2.0-3.0, e.g., 2.5. The solution can be used to separate a double-stranded nucleic acid, such as double-stranded DNA, into single strands, i.e., to denature (melt) the double-stranded nucleic acid. In certain embodiments, both strands are DNA. In other embodiments, both strands are RNA. In other embodiments, one strand is DNA and the other strand is RNA. In other embodiments, one or both strands contain both RNA and DNA. In other embodiments, one or both strands contain at least one nucleotide other than A, G, C, or T. In some embodiments, one or both strands contain non-naturally occurring nucleotides. In other embodiments, one or more residues are trigger residues, such as abasic residues or damaged bases. In some embodiments, one or more residues contain a universal base. In some embodiments, one or both strands contain a scissile linkage.

The double-stranded nucleic acids can be fully or partially double-stranded. They can be free molecules in solution, or one or both strands can be physically associated with (e.g., covalently or non-covalently attached to) a solid or semi-solid support or substrate. Notably, double-stranded nucleic acids incubated in these solutions are efficiently separated into single strands without heating and without strong denaturants. Heat or strong denaturants can cause gel delamination (e.g., when the nucleic acids are located in or attached to a semi-solid support such as a polyacrylamide gel) or can disrupt non-covalent linkages such as streptavidin (SA)-biotin linkages (e.g., when the nucleic acids are attached to a support or substrate via an SA-biotin linkage). In one embodiment, the solution is used to separate the strands of a double-stranded nucleic acid in which one strand is attached to a bead via an SA-biotin linkage.

The invention also provides a method of separating the strands of a double-stranded nucleic acid, comprising the step of contacting the double-stranded nucleic acid with any of the solutions described above, e.g., an aqueous solution containing about 1.0-3.0% SDS, about 100-300 mM NaCl, and about 5-15 mM sodium bisulfate (NaHSO₄), such as an aqueous solution containing 1.0-3.0% SDS, 100-300 mM NaCl, and 5-15 mM sodium bisulfate (NaHSO₄). In one embodiment, the solution contains about 2% SDS, about 200 mM NaCl, and about 10 mM sodium bisulfate (NaHSO₄), e.g., 2% SDS, 200 mM NaCl, and 10 mM sodium bisulfate (NaHSO₄). In another embodiment, the solution consists essentially of an aqueous solution of 2% SDS, 200 mM NaCl, and 10 mM sodium bisulfate (NaHSO₄). In certain embodiments, the pH of the solution is 2.0-3.0, e.g., 2.5. In some embodiments, the double-stranded nucleic acid is incubated in the solution. In other embodiments, the solution is used to wash the double-stranded nucleic acid (preferably a nucleic acid attached to a support or substrate). In some embodiments, the double-stranded nucleic acid is contacted with the solution for a time sufficient to separate at least 10% of the double-stranded nucleic acid molecules into single strands. In some embodiments, the double-stranded nucleic acid is contacted with the solution for a time sufficient to separate at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or more of the double-stranded nucleic acid into single strands. In exemplary embodiments, the double-stranded nucleic acid is contacted with the solution for between 15 seconds and 3 hours. In another embodiment, the double-stranded nucleic acid is contacted with the solution for between 1 minute and 1 hour. In certain embodiments, the double-stranded nucleic acid is contacted with the solution for about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 minutes. The method may further include a step of removing the solution, or removing some or all of the nucleic acid from the solution, after a period of incubation.
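As a worked example of the composition described above, the following sketch computes the mass of each component needed for a given volume. It rests on assumptions that the text does not state: the SDS percentage is taken as weight per volume (g per 100 mL), sodium bisulfate is taken as the anhydrous salt, and standard molar masses are used. The function name is hypothetical, and adjusting the pH to 2.0-3.0 is outside the scope of the sketch.

```python
# Standard molar masses in g/mol (anhydrous NaHSO4 is an assumption).
MOLAR_MASS = {"NaCl": 58.44, "NaHSO4": 120.06}

# Concentration ranges stated in the text.
RANGES = {"sds_percent": (1.0, 3.0), "nacl_mm": (100.0, 300.0), "nahso4_mm": (5.0, 15.0)}


def denaturing_solution_masses(volume_ml, sds_percent=2.0, nacl_mm=200.0, nahso4_mm=10.0):
    """Return grams of each solute for `volume_ml` of solution.

    Assumes % SDS means weight/volume (g per 100 mL).
    """
    values = {"sds_percent": sds_percent, "nacl_mm": nacl_mm, "nahso4_mm": nahso4_mm}
    for name, value in values.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the stated range {low}-{high}")
    litres = volume_ml / 1000.0
    return {
        "SDS_g": sds_percent / 100.0 * volume_ml,
        "NaCl_g": nacl_mm / 1000.0 * litres * MOLAR_MASS["NaCl"],
        "NaHSO4_g": nahso4_mm / 1000.0 * litres * MOLAR_MASS["NaHSO4"],
    }


if __name__ == "__main__":
    for solute, grams in denaturing_solution_masses(100.0).items():
        print(f"{solute}: {grams:.3f}")
```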

The solution can be used in one or more steps of many of the sequencing methods described herein, and in any of those methods. For example, the solution can be used to separate an extended duplex from the template. After cleavage of a scissile linkage, the solution can be used to remove the portion of the extension probe that is no longer attached to the extended duplex. The solution can also be used to separate the strands of a triple-stranded nucleic acid, or to separate double-stranded regions of a single-stranded nucleic acid that contains self-complementary portions hybridized to one another.

In another aspect, the invention provides methods for obtaining sequence information using a collection of at least two distinguishably labeled oligonucleotide probe families. The probes in a probe family contain an undefined portion and a defined portion. As in the methods described above, extension starts from a duplex formed by an initializing oligonucleotide and the template. The initializing oligonucleotide is extended by ligating an oligonucleotide probe to its end to form an extended duplex, which is then repeatedly extended by successive cycles of ligation. The probe contains a non-extendable moiety at its terminal position (at the end of the probe opposite the nucleotide that is ligated to the growing nucleic acid strand of the duplex), so that the extended duplex is extended only once in a single cycle. During each cycle, a label on, or associated with, the successfully ligated probe is detected, and the non-extendable moiety is removed or modified to produce an extendable terminus. The label corresponds to the probe family to which the probe belongs.

Successive cycles of extension, ligation, and detection produce an ordered list of the probe families to which the successively ligated probes belong. Sequence information is obtained from this ordered list of probe families. However, knowing which probe family a newly ligated probe belongs to is not, by itself, sufficient to determine the identity of a nucleotide in the template. Rather, knowing which probe family a newly ligated probe belongs to rules out certain sequences as possible sequences of the defined portion of the probe, but leaves at least two possible nucleotide identities at each position. Thus, there are at least two possibilities for the identity of each template nucleotide located opposite a nucleotide of the defined portion of the newly ligated probe (i.e., the nucleotide complementary to that nucleotide of the defined portion).
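How an ordered list of probe families narrows the possibilities without fixing the sequence can be sketched with a toy encoding. Everything in this sketch is an assumption made for illustration and is not specified in the text above: four probe families, a defined portion of two nucleotides, a family assignment computed as the XOR of two-bit base codes, and an ordered list in which successive entries correspond to overlapping adjacent dinucleotides of the template. For simplicity, families are indexed by the template dinucleotide rather than by its complement in the probe.

```python
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}  # hypothetical 2-bit codes


def family_of(dinucleotide):
    """Hypothetical family assignment: XOR of the two base codes (families 0-3)."""
    first, second = dinucleotide
    return CODE[first] ^ CODE[second]


def encode(sequence):
    """Ordered list of families for the overlapping dinucleotides of `sequence`."""
    return [family_of(sequence[i:i + 2]) for i in range(len(sequence) - 1)]


def candidate_sequences(families):
    """All sequences consistent with an ordered list of families.

    Each family leaves four possible dinucleotides, so the list alone does not
    fix any nucleotide; choosing the first base fixes all the others.
    """
    candidates = []
    for first in BASES:
        sequence = [first]
        for family in families:
            sequence.append(BASES[CODE[sequence[-1]] ^ family])
        candidates.append("".join(sequence))
    return candidates


if __name__ == "__main__":
    observed = encode("ATGGC")
    print(observed)
    print(candidate_sequences(observed))
```

In this toy encoding every candidate differs from every other candidate at each position, which is why a single additional item of information, such as one known nucleotide, is enough to pick out one candidate.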

In certain embodiments, after the desired number of cycles has been performed, the ordered list of probe families is used to generate a set of candidate sequences. The set of candidate sequences may itself provide enough information for the purpose at hand. In preferred embodiments of the invention, one or more additional steps are performed to select the correct sequence from among the candidate sequences. For example, the candidate sequences can be compared with a database of known sequences, and the candidate that most closely matches one of the sequences in the database is selected as the correct sequence. In other embodiments, the template is subjected to a further round of sequencing by successive cycles of extension, ligation, detection, and cleavage using a differently encoded set of probe families, and the information obtained in the second round is used to select the correct sequence. In other embodiments, at least one additional item of information is combined with the information obtained from the ordered list of probe families to determine the sequence.
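The database comparison mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the candidates and the reference sequences are invented, similarity is measured as the smallest number of mismatches over all ungapped alignments of a candidate within a reference, and the function names are hypothetical. A practical implementation would more likely use an established alignment tool.

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_window_distance(candidate, reference):
    """Fewest mismatches between `candidate` and any same-length window of `reference`."""
    k = len(candidate)
    if len(reference) < k:
        return k  # reference too short: treat as a complete mismatch
    return min(mismatches(candidate, reference[i:i + k])
               for i in range(len(reference) - k + 1))


def select_candidate(candidates, database):
    """Return (distance, candidate) for the candidate closest to any database entry.

    Ties are broken alphabetically; `database` must not be empty.
    """
    scored = [(min(best_window_distance(c, ref) for ref in database), c)
              for c in candidates]
    return min(scored)


if __name__ == "__main__":
    candidates = ["ATGGC", "CGTTA", "GCAAT", "TACCG"]  # invented example set
    database = ["TTGCAATCC", "AAAAAAAAA"]              # invented reference sequences
    print(select_candidate(candidates, database))
```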

The invention also provides methods for error checking when sequencing with probe families. Certain of these methods can distinguish single nucleotide polymorphisms (SNPs) from sequencing errors.

The invention also provides nucleic acid fragments (e.g., DNA fragments) that contain at least two segments of interest (e.g., at least two tags) and at least three primer binding regions (PBRs), so that at least two different templates, each corresponding to one segment of interest, can be amplified from each fragment. A "primer binding region" is a portion of a nucleic acid to which an oligonucleotide can hybridize, such that the oligonucleotide can serve as an amplification primer, sequencing primer, initializing oligonucleotide, etc. A primer binding region should therefore have a known sequence, so that an appropriate complementary oligonucleotide can be selected. As used herein and in the figures, a portion of a nucleic acid strand used in the methods of the invention may be referred to as a primer binding region regardless of whether, in practicing the method, a primer actually binds to that region (in which case the sequence of the primer is complementary or substantially complementary to that of the region) or binds to the corresponding portion of the complementary strand (in which case the sequence of the primer is identical or substantially identical to that of the region). A segment of interest is any nucleic acid segment for which sequence information is desired. For example, a segment of interest can be a tag, and for purposes of this disclosure the segment of interest will be assumed to be a tag (also referred to herein and elsewhere as an "end tag"). It should be understood, however, that the invention is not limited to segments of interest that are tags. In certain embodiments, the at least two tags are paired tags. A nucleic acid fragment can contain one or more pairs of paired tags, e.g., 2, 3, 4, 5, or more pairs. The invention also provides libraries containing such nucleic acid fragments, as well as methods of preparing templates and libraries.

The invention also provides microparticles, such as beads, having at least two different populations of nucleic acids attached thereto, wherein each of the at least two populations consists of a plurality of substantially identical nucleic acids, and wherein the populations are produced by amplification (e.g., PCR amplification) of a single nucleic acid fragment. In some embodiments, the single nucleic acid fragment contains a 5' tag and a 3' tag, wherein the 5' and 3' tags are paired tags. In some embodiments in which the single nucleic acid fragment contains a pair of 5' and 3' tags, one of the populations of nucleic acids attached to the microparticle includes at least a portion of the 5' tag, and one of the populations of nucleic acids attached to the microparticle includes at least a portion of the 3' tag. In preferred embodiments, one of the populations includes the complete 5' tag and one of the populations includes the complete 3' tag.

The nucleic acid fragment contains multiple PBRs, at least one of which is located between the tags and at least two of which flank the portion of the nucleic acid fragment that contains the tags, so that a region containing at least a portion of the 5' tag and a region containing at least a portion of the 3' tag can each be amplified to produce two different populations of nucleic acids. In preferred embodiments, the complete 5' tag and the complete 3' tag can be amplified. For example, the nucleic acid fragment may contain first and second primer binding sites flanking the 5' tag, and third and fourth primer binding sites flanking the 3' tag. PCR amplification using primers that bind to the first and second primer binding sites amplifies the 5' tag. PCR amplification using primers that bind to the third and fourth primer binding sites amplifies the 3' tag. It will be appreciated that the primers should be selected so that extension from each primer proceeds toward the region of the DNA fragment that contains the tag to be amplified. Alternatively, a first primer binding site may be located upstream of one of the tags, a second primer binding site may be located downstream of the other tag, and a third primer binding site may be located between the two tags. The third primer binding site serves as the binding site for a forward primer for PCR amplification of one tag and as the binding site for a reverse primer for PCR amplification of the other tag. Accordingly, one embodiment of the invention provides a microparticle, such as a bead, having at least two different populations of nucleic acids attached thereto, wherein each of the at least two populations consists of a plurality of substantially identical nucleic acids, and wherein the first population includes a 5' tag and the second population includes a 3' tag.
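The three-PBR arrangement described above can be illustrated with a minimal Python sketch. All sequences, the names `PBR1`, `PBR2`, `PBR3`, `TAG5`, `TAG3`, and the helper `amplicon` are hypothetical and chosen only for illustration; the sketch models PCR as a simple string search on one strand and is not a description of any actual laboratory protocol.

```python
# Illustrative sketch only: all sequences below are made up.

def revcomp(seq):
    """Reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))

def amplicon(template, fwd_primer, rev_primer):
    """Return the top-strand region that a PCR with these primers would span.

    fwd_primer is identical in sequence to the top strand; rev_primer is
    identical to the bottom strand, so its reverse complement is what
    appears on the top strand. Returns None if either site is missing.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = revcomp(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]

PBR1 = "ACGTTGCA"  # upstream of the 5' tag
PBR3 = "GGATCCAA"  # between the two tags (shared by both amplifications)
PBR2 = "TTCAGCGA"  # downstream of the 3' tag
TAG5 = "AAAACCCC"
TAG3 = "GGGGTTTT"

fragment = PBR1 + TAG5 + PBR3 + TAG3 + PBR2

# PBR3 acts as the reverse-primer site for the 5' tag ...
amp5 = amplicon(fragment, PBR1, revcomp(PBR3))
# ... and as the forward-primer site for the 3' tag.
amp3 = amplicon(fragment, PBR3, revcomp(PBR2))
```

In this toy layout each amplicon contains exactly one tag, which is the property that lets the two populations be produced from a single fragment.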

The invention also provides populations of microparticles, such as beads, in which each microparticle has at least two different populations of nucleic acids attached thereto, wherein each of the at least two populations consists of a plurality of substantially identical nucleic acids, and wherein the populations are produced by amplification (e.g., PCR amplification) of a single nucleic acid fragment. The populations of substantially identical nucleic acids can be, for example, a 5' tag and a 3' tag. The invention also provides arrays of such microparticles and sequencing methods that comprise sequencing the populations of substantially identical nucleic acids. For example, in one embodiment, the two populations of substantially identical nucleic acids attached to an individual microparticle each include a different primer binding region (PBR), so that by using different sequencing primers one population can be sequenced without interference from the other. If more than two populations of substantially identical nucleic acids are attached to a microparticle, each population may have a unique PBR, such that a primer that binds to a particular PBR does not bind to the PBRs present in the other populations attached to that microparticle. Thus the methods of the invention can produce microparticles having at least two different populations of substantially identical nucleic acids attached thereto (e.g., multiple copies of a template containing a 5' tag and multiple copies of a template containing a 3' tag), wherein the tags are paired tags. In accordance with the methods of the invention, the templates contain different PBRs, which provide binding sites for sequencing primers. Therefore, by selecting a sequencing primer complementary to the PBR in the template containing the 5' tag, sequence information can be obtained from the 5' tag without interference from the template containing the 3' tag, even though that template is present on the same microparticle. Likewise, by selecting a sequencing primer complementary to the PBR in the template containing the 3' tag, sequence information can be obtained from the 3' tag without interference from the template containing the 5' tag, even though that template is present on the same microparticle. The presence of both paired tags on the same microparticle means that the sequences of the 5' and 3' paired tags can be associated with each other, just as is known in the art when they are present in a single template.

The invention also provides automated sequencing systems that can be used, for example, to sequence templates arrayed in or on a substantially planar support. The invention further provides image processing methods, which may be stored on a computer-readable medium such as a hard disk, CD, zip disk, flash memory, etc. In certain preferred embodiments, the system identifies 40,000 or more nucleotides per second. In certain preferred embodiments, the system generates 8.6 gigabases (Gb) or more of sequence data per day (24 hours). In certain preferred embodiments, the system generates 48 Gb or more of sequence information (nucleotide identifications) per day.
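The per-second and per-day figures above belong to separately stated embodiments. As a hedged aid for comparing them, the following short Python sketch converts between nucleotides per second and gigabases per day, assuming continuous operation over a 24-hour day; the function names are illustrative and not part of the disclosure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a 24-hour day

def bases_per_day(bases_per_second):
    """Daily output for a system running continuously at the given rate."""
    return bases_per_second * SECONDS_PER_DAY

def bases_per_second_needed(gigabases_per_day):
    """Sustained rate required to reach a given daily output (1 Gb = 1e9 bases)."""
    return gigabases_per_day * 1e9 / SECONDS_PER_DAY

daily_at_40k = bases_per_day(40_000)          # about 3.46 Gb per day
rate_for_8_6 = bases_per_second_needed(8.6)   # just under 100,000 per second
rate_for_48 = bases_per_second_needed(48)     # roughly 556,000 per second
```

Under this assumption, 40,000 identifications per second corresponds to about 3.5 Gb per day, while 8.6 Gb and 48 Gb per day correspond to sustained rates of roughly 100,000 and 556,000 identifications per second, respectively.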

In addition, the invention provides computer-readable media that store information generated by applying the sequencing methods of the invention. The information may be stored in a database.

This application refers to various patents, patent applications, journal articles, and other publications, all of which are incorporated herein by reference. In addition, the following standard reference works are incorporated herein by reference: Current Protocols in Molecular Biology, John Wiley & Sons, New York, edition as of July 2002; Sambrook, Russell, and Sambrook, Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 2001. In the event of a conflict between this specification and any document incorporated by reference, this specification shall control, it being understood that the inventors may determine at any time whether a conflict or inconsistency exists.

Brief Description of the Drawings

Figure 1A is a schematic of initialization followed by two cycles of extension, ligation, and identification.

Figure 1B is a schematic of initialization followed by two cycles of extension, ligation, and identification in an embodiment in which extension proceeds inward from the free end of the template toward the support.

Figure 2 shows a color assignment scheme for oligonucleotide probes in which the identity of the 3' base of the probe is determined by identifying the color of the fluorophore.

Figure 3A is a schematic showing initializing oligonucleotides hybridized at different positions in the binding region of a template, followed by ligation of extension probes to form extended duplexes.

Figure 3B is a schematic showing assembly of a contiguous sequence by extension, ligation, and cleavage using extension probes designed to read every sixth base of the template molecule.

Figure 4A shows a 5'-S-phosphorothiolate linkage (3'-O-P-S-5').

Figure 4B shows a 3'-S-phosphorothiolate linkage (3'-S-P-O-5').

Figure 5A is a schematic of one cycle of extension, ligation, and cleavage for sequencing in the 5'→3' direction using an extension probe containing a 3'-O-P-S-5' phosphorothiolate linkage.

Figure 5B is a schematic of one cycle of extension, ligation, and cleavage for sequencing in the 3'→5' direction using an extension probe containing a 3'-O-P-S-5' phosphorothiolate linkage.

Figures 6A-6F are more detailed schematics of several sequencing reactions performed on a single template. These reactions use initializing oligonucleotides that bind to different portions of the template.

Figure 7 is a schematic of a synthesis scheme for the 3'-phosphoramidites of dA and dG.

Figures 8A-8E show results of gel shift assays demonstrating two cycles of successful ligation and cleavage of extension probes containing phosphorothiolate linkages.

Figure 8F is a schematic of the ligation mechanism of DNA ligase.

Figure 9 shows results of a gel shift assay demonstrating the ligation efficiency of degenerate, inosine-containing oligonucleotide probes.

Figure 10 shows results of a gel shift assay demonstrating the ligation efficiency of degenerate, inosine-containing oligonucleotide probes on a variety of substrates.

Figure 11 shows results of an analysis evaluating the fidelity of each of two DNA ligases (T4 DNA ligase and Taq DNA ligase) in 3'→5' extension.

Figure 12 shows results of a gel shift assay (A) demonstrating the ligation efficiency of degenerate, inosine-containing oligonucleotide probes and results of direct sequencing analysis of the ligation reactions (B), both used to evaluate the fidelity of T4 DNA ligase in ligating oligonucleotide probes. The results are tabulated in panels C-F.

Figures 13A-13C show results of an experiment in which ligation was performed in a gel with bead-based templates embedded in a polyacrylamide gel on a slide. Figure 13A shows the scheme of the ligation reaction. Ligation reactions were performed in the gel in the presence (B) and absence (C) of T4 DNA ligase.

Figure 14A shows an image of an emulsion PCR reaction performed on beads having a first amplification primer attached thereto, using a fluorescently labeled second amplification primer and an excess of template.

Figure 14B (top) shows a fluorescence image of a portion of a slide on which beads bearing templates hybridized to a Cy3-labeled oligonucleotide are immobilized in a polyacrylamide gel. (This slide was used in a different experiment but is representative of the slides used herein.) Figure 14B (bottom) shows a schematic of a slide fitted with a Teflon mask to contain the polyacrylamide solution.

Figure 15 shows three sets of labeled oligonucleotide probes designed to address questions of probe specificity and selectivity, together with the excitation and emission values for a set of four spectrally resolvable labels.

Figure 16 shows results of an experiment confirming the 4-color spectral identity of oligonucleotide probes. Hybridization and ligation reactions were performed with an oligonucleotide probe mixture containing probes bearing four unique fluorophores on slides containing four unique populations of single-stranded templates (A). The slides were imaged under bright light before and after ligation (B) and by fluorescence excitation using four bandpass filters. Individual populations are shown in false color (C). Spectral identities showing minimal signal overlap are plotted in (D).

Figure 17 shows an experiment confirming the ligation specificity of oligonucleotide extension probes. Figure 17(A) shows a schematic of the ligation. Figure 17(B) is a bright light image, and Figure 17(C) is the corresponding fluorescence image, of populations of beads embedded in a polyacrylamide gel after ligation. Figure 17(D) shows the fluorescence detected from each label before and after ligation.

Figure 18 shows another experiment confirming the ligation specificity and selectivity of oligonucleotide extension probes. Figure 18(A) shows a schematic of the ligation. Figure 18(B) is a bright light image, and Figure 18(C) is the corresponding fluorescence image, of populations of beads embedded in a polyacrylamide gel after ligation. Figure 18(D) shows expected versus observed ligation frequencies; the frequencies predicted from the proportion of each extension probe in the population correlate closely with the observed frequencies.

Figure 19 shows an experiment confirming that pools of oligonucleotide extension probes containing degenerate and universal bases can be used to achieve specific and selective ligation in a gel. Figure 19(A) shows a schematic of the ligation experiment, illustrating four differentially labeled pools of degenerate, inosine-containing probes after ligation. Figure 19(B) is a bright light image, and Figure 19(C) is the corresponding fluorescence image, of populations of beads embedded in a polyacrylamide gel after ligation. Figure 19(D) shows expected versus observed ligation frequencies; the frequencies predicted from the proportion of each extension probe in the population correlate closely with the observed frequencies. Figure 19(E) shows scatter plots of the raw, unprocessed data and of filtered data representing the top 90% of bead signal values.

Figure 20 is a bar graph showing the signal detected over successive cycles of hybridizing an initializing oligonucleotide (primer) to a template and stripping it. As shown, only a small loss of signal occurs over 10 cycles.

Figure 21 is a photograph of an automated sequencing system that can be used, for example, to collect sequence information from templates arrayed in or on a substantially planar support. Also shown is a dedicated computer that controls operation of the various components of the system, processes and stores the collected image data, provides a user interface, etc. The lower portion of the figure shows an enlarged view of the flow cell, which is oriented to achieve gravimetric bubble displacement.

Figure 22 is a schematic of a high-throughput automated sequencing instrument that can be used to sequence templates arrayed in or on a substantially planar support.

Figure 23 shows a scatter plot of alignment inconsistencies, illustrating that very few inconsistencies occur over 30 frames.

Figures 24A-I are schematics showing various views of a flow cell of the invention or portions thereof.

Figure 25A shows an exemplary encoding for a preferred collection of probe families comprising partially defined probes that contain a defined portion 2 nucleotides in length.

Figure 25B shows a preferred collection of probe families (upper panel) and a cycle of ligation, detection, and cleavage (lower panel).

Figure 26 shows an exemplary encoding for another preferred collection of probe families comprising partially defined probes that contain a defined portion 2 nucleotides in length.

Figures 27A-27C represent an alternative, diagrammatic way of determining the 24 preferred collections of probe families defined in Table 1.

Figure 28 shows a less preferred collection of probe families in which the probes contain a defined portion 2 nucleotides in length.

Figure 29A shows a diagram that can be used to generate the defined portions for a collection of probe families comprising probes that contain a defined portion 3 nucleotides in length.

Figure 29B shows a diagram of a mapping scheme that can be used to generate, from the 24 preferred collections of probe families, the defined portions for a collection of probe families comprising probes that contain a defined portion 3 nucleotides in length.

Figure 30 shows a method for sequence determination using a collection of probe families. An embodiment using a preferred collection of probe families is depicted.

Figures 31A-31C show a method for sequence determination in which a first collection of probe families is used to generate candidate sequences and a second collection of probe families is used to decode them.

Figure 32 shows a method for sequence determination using a less preferred collection of probe families.

Figure 33A is a schematic of a slide with beads attached thereto. DNA templates are attached to the beads.

Figure 33B shows a population of beads attached to a slide. The lower panels show the same region of the slide under white light (left) and under fluorescence microscopy. The upper panel shows a range of bead densities.

Figures 34A-34C show a scheme in which the two tags of a paired tag present in a nucleic acid fragment (template) are amplified as individual populations of nucleic acids and captured on a microparticle by the amplification.

Figures 35A and 35B show details of primer design and amplification for the scheme of Figure 34. Both strands of the nucleic acid fragment (template) are shown for clarity. Primers and primer binding regions having the same sequence are shown in the same color. For example, P1 is shown in dark blue, indicating that the sequence of primer P1, present on the microparticle and in solution, is identical to that of the correspondingly colored portion of the template strand shown. The dark blue region of the template (labeled P1) may be referred to as a primer binding region even though the corresponding primer (P1) actually binds to the complementary portion of the other strand and is identical in sequence to the region shown.

Figures 35C and 35D show sequencing of the first and second tags, respectively, attached to microparticles produced by the method shown in Figures 35A and 35B.

Definitions

To facilitate understanding of this specification, the following definitions are provided. It should be understood that, in general, terms not otherwise defined are to be given their ordinary meaning or the meaning generally accepted in the art.

As used herein, an "abasic residue" is a residue having the structure of the portion of a nucleoside or nucleotide that remains after removal of the nitrogenous base, or after removal of a sufficient portion of the nitrogenous base that the resulting molecule no longer participates in the hydrogen bonds characteristic of a nucleoside or nucleotide. An abasic residue may be generated by removing the nitrogenous base from a nucleoside or nucleotide. However, the term "abasic" refers to the structural features of the residue and is independent of the manner in which the residue is produced. The terms "abasic residue" and "abasic site" are used herein to refer to a residue within a nucleic acid that lacks a purine or pyrimidine base.

As used herein, an "apurinic/apyrimidinic (AP) endonuclease" is an enzyme that cleaves a bond on the 5' side, the 3' side, or both the 5' and 3' sides of an abasic residue in a polynucleotide. In certain embodiments of the invention the AP endonuclease is an AP lyase. Examples of AP endonucleases include, but are not limited to, Escherichia coli (E. coli) endonuclease VIII and homologs thereof, and E. coli endonuclease III and homologs thereof. It is to be understood that a reference to a specific enzyme, e.g., an endonuclease such as E. coli Endo VIII, Endo V, etc., is intended to encompass homologs from other species that are recognized in the art as homologs and that possess similar biochemical activity with respect to removal of damaged bases and/or cleavage of DNA containing abasic residues or other trigger residues.

The term "array" as used herein refers to a collection of entities distributed on or in a support substrate; the individual entities are preferably spaced far enough apart that discrete features of the array can be identified by any of a variety of techniques. The entities may be, for example, nucleic acid molecules, clonal populations of nucleic acid molecules, microparticles (optionally having clonal populations of nucleic acid molecules attached thereto), etc. When used as a verb, the term "array" and its variations refer to any process of forming an array, e.g., distributing entities on or in a support substrate.

A "damaged base" is a purine or pyrimidine base that differs from A, G, C, or T in a way that renders it a substrate for removal from DNA by a DNA glycosylase. Uracil is considered a damaged base for purposes of the present invention. In some embodiments of the invention the damaged base is hypoxanthine.

"Degenerate", with reference to a position in a polynucleotide that is one of a population of polynucleotides, means that the identity of the base forming part of the nucleoside that occupies that position varies among different members of the population. The population thus contains individual members whose sequences differ at the degenerate position. The term "position" refers to a numerical value assigned to each nucleoside in a polynucleotide, generally relative to the 5' or 3' end. For example, the nucleoside at the 3' end of an extension probe may be assigned position 1. Thus in a pool of extension probes of structure 3'-XXXNXXXX-5', N is at position 4. If the identity of N can vary among different members of the pool, position 4 is considered a degenerate position, and the pool of extension probes is said to be degenerate at position N. A position is said to be k-fold degenerate if it can be occupied by nucleosides of any of k different identities. For example, a position that can be occupied by nucleosides containing either of two different bases is 2-fold degenerate.
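The position numbering and k-fold degeneracy defined above can be made concrete with a small Python sketch. It assumes, for illustration only, that each probe is written as a string in the 3'→5' direction so that position 1 is the first character; the function name and the example pool are hypothetical.

```python
def degeneracy_by_position(probes_3to5):
    """Map each position (1 = 3' terminal nucleoside) to its k-fold degeneracy.

    probes_3to5: equal-length probe sequences written 3'->5'.
    The degeneracy of a position is the number of distinct bases
    observed at that position across the pool.
    """
    length = len(probes_3to5[0])
    if any(len(p) != length for p in probes_3to5):
        raise ValueError("all probes must have the same length")
    return {i + 1: len({p[i] for p in probes_3to5}) for i in range(length)}

# Hypothetical pool of structure 3'-XXXNXXXX-5' in which N is A or C.
pool = ["ACGATTGC", "ACGCTTGC"]
deg = degeneracy_by_position(pool)
degenerate_positions = [pos for pos, k in deg.items() if k > 1]
```

For this pool only position 4 varies, so the pool is 2-fold degenerate at position 4 and non-degenerate everywhere else.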

"Determining sequence information" encompasses "sequence determination" and also encompasses other levels of information, such as eliminating one or more possibilities for a sequence. It is noted that determining the sequence of a polynucleotide typically yields equivalent information about the sequence of a perfectly complementary (100% complementary) polynucleotide and is thus equivalent to determining the sequence of the perfectly complementary polynucleotide directly.
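The equivalence noted above, between sequencing a polynucleotide and sequencing its perfect complement, follows from Watson-Crick pairing. A minimal Python sketch, assuming DNA sequences written 5'→3' and using an illustrative made-up read:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence (5'->3') of the strand perfectly complementary to seq (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]

read = "AAGCT"                      # hypothetical determined sequence
partner = reverse_complement(read)  # sequence of the complementary strand
```

Applying the function twice returns the original read, which is why either strand carries the same sequence information.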

"Independent", with reference to a plurality of elements such as the nucleosides in an oligonucleotide probe molecule or a portion thereof, means that the identity of each element does not limit, and is not limited by, the identity of any other element; that is, the identity of each element is selected without regard to the identity of any other element. Knowing the identity of one or more of the elements therefore provides no information about the identity of any other element. For example, the nucleosides in the sequence NNNN are independent if each N can be A, G, C, or T regardless of the identities of the other Ns.
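As a hedged illustration of independence, the following Python sketch enumerates every sequence of the form NNNN and checks that fixing one position leaves the others unconstrained. The variable names are illustrative only.

```python
from itertools import product

BASES = "AGCT"

# Every sequence of the form NNNN with each N chosen independently.
pool = ["".join(p) for p in product(BASES, repeat=4)]

# Fix the first position and collect what the remaining positions can be.
tails_given_A = {s[1:] for s in pool if s[0] == "A"}
tails_given_G = {s[1:] for s in pool if s[0] == "G"}
```

The pool has 4^4 = 256 members, and the set of possible three-base tails is the same (4^3 = 64 members) whichever base occupies the first position, so knowing that base says nothing about the rest.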

"Ligation" means forming a covalent bond or linkage between the termini of two or more nucleic acids, e.g., oligonucleotides and/or polynucleotides, in a template-driven reaction. The nature of the bond or linkage may vary widely, and the ligation may be carried out enzymatically or chemically.

The term "microparticle" as used herein refers to a particle having a smallest cross-sectional dimension of 50 microns or less, preferably 10 microns or less. In certain embodiments the smallest cross-sectional dimension is about 3 microns or less, about 1 micron or less, or about 0.5 microns or less, e.g., about 0.1, 0.2, 0.3, or 0.4 microns. Microparticles may be made of a variety of inorganic or organic materials including, but not limited to, glass (e.g., controlled pore glass), silica, zirconia, cross-linked polystyrene, polyacrylate, polymethylmethacrylate, titanium dioxide, latex, polystyrene, etc. See, e.g., U.S. Patent 6,406,848 for various suitable materials and other considerations. Dynabeads, available from Dynal, Oslo, Norway, are an example of commercially available microparticles that can be used in the present invention. Magnetically responsive microparticles can be used. The magnetic responsiveness of certain preferred microparticles facilitates collection and concentration of the microparticle-attached templates after amplification, and also facilitates other steps (e.g., washes, reagent removal, etc.). In certain embodiments of the invention a population of microparticles having different shapes (e.g., some spherical and others nonspherical) is employed.

The term "microsphere" or "bead" as used herein refers to a substantially spherical microparticle having a diameter of 50 microns or less, preferably 10 microns or less. In certain embodiments the diameter is about 3 microns or less, about 1 micron or less, or about 0.5 microns or less, e.g., about 0.1, 0.2, 0.3, or 0.4 microns. In certain embodiments of the invention a monodisperse population of microspheres is employed, i.e., the microspheres are of substantially uniform size. For example, the coefficient of variation of the microparticle diameters may be less than 5%, e.g., 2% or less, 1% or less, etc. In other embodiments, however, the microparticle population has a coefficient of variation of 5% or greater, e.g., 5%, between 5% and 10% (inclusive), between 10% and 25% (inclusive), etc. In certain embodiments a mixed population of microparticles is employed. For example, a mixture of two populations, each of which has a coefficient of variation of less than 5%, may be used, yielding a mixed population that is not monodisperse. For example, a mixture of microspheres having diameters of 1 micron and 3 microns can be used. In certain embodiments of the invention, when sequencing is performed on templates attached to a population of microspheres that is not monodisperse, the size of the microspheres provides additional information. For example, different template libraries can be attached to microspheres of different sizes. In addition, because fewer template molecules can be attached to smaller particles, the signal intensity varies with particle size, which can facilitate multiplexed sequencing.
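
The monodispersity criterion above can be illustrated with a short calculation. The following is a minimal sketch, assuming the coefficient of variation is taken as the population standard deviation divided by the mean; the diameter values are hypothetical and are not measurements from this disclosure.

```python
from statistics import mean, pstdev


def coefficient_of_variation(diameters_um):
    """Return the coefficient of variation (std dev / mean) as a percentage."""
    return 100.0 * pstdev(diameters_um) / mean(diameters_um)


# Hypothetical bead diameters in microns (illustrative values only).
pop_1um = [0.98, 1.00, 1.02, 1.01, 0.99]
pop_3um = [2.95, 3.00, 3.05, 3.02, 2.98]
mixed = pop_1um + pop_3um

cv_1um = coefficient_of_variation(pop_1um)
cv_3um = coefficient_of_variation(pop_3um)
cv_mixed = coefficient_of_variation(mixed)

# Each population alone is monodisperse by the 5% criterion,
# while the mixture of the two is not.
print(f"1 um: {cv_1um:.2f}%  3 um: {cv_3um:.2f}%  mixed: {cv_mixed:.2f}%")
```

In this sketch each population on its own falls under the 5% threshold, whereas the 1 micron and 3 micron mixture has a much larger coefficient of variation, consistent with the statement that such a mixture is not monodisperse.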

The term "nucleic acid sequence" as used herein can refer to the nucleic acid material itself and is not restricted to the sequence information (i.e., the succession of letters chosen from among the five base letters A, G, C, T, or U) that biochemically characterizes a specific nucleic acid, e.g., a DNA or RNA molecule. Nucleic acids shown herein are presented in a 5'→3' orientation unless otherwise indicated.

A "nucleoside" comprises a nitrogenous base linked to a sugar molecule. As used herein, the term includes natural nucleosides in their 2'-deoxy and 2'-hydroxyl forms, as described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992), and nucleoside analogs. For example, natural nucleosides include adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine. Nucleoside "analogs" are synthetic nucleosides having modified base moieties and/or modified sugar moieties, e.g., as described generally by Scheit, Nucleotide Analogs (John Wiley, New York, 1980). Such analogs include synthetic nucleosides designed to enhance binding properties, reduce degeneracy, increase specificity, and the like. Nucleoside analogs include 2-aminoadenosine, 2-thiothymidine, pyrrolo-pyrimidine, 3-methyladenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 2-thiocytidine, etc. Nucleoside analogs may comprise any of the universal bases described herein.

The term "organism" is used herein to indicate any living or nonliving entity that comprises nucleic acid that is capable of being replicated and whose sequence is of interest. It includes plasmids; viruses; prokaryotic, archaebacterial, and eukaryotic cells; cell lines; fungi; protozoa; plants; animals; etc.

"Perfectly matched duplex," in reference to the protruding strands of probes and template polynucleotides, means that the protruding strand of one forms a double-stranded structure with the other such that each nucleoside in the double-stranded structure undergoes Watson-Crick base pairing with a nucleoside on the opposite strand. The term also comprehends the pairing of nucleoside analogs that may be employed to reduce the degeneracy of the probes, such as deoxyinosine, nucleosides with 2-aminopurine bases, and the like, whether or not such pairing involves the formation of hydrogen bonds.
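
The definition above can be expressed as a simple position-by-position check. The following is a minimal sketch, assuming both strands are written 5'→3' (so the template segment is read in reverse when paired against the probe) and assuming, purely for illustration, that the letter "I" denotes deoxyinosine and is allowed opposite any base. The function name and the "I" convention are hypothetical and are not taken from this disclosure.

```python
# Watson-Crick partners for the four standard DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumption for illustration: "I" stands for deoxyinosine, an analog
# treated here as able to sit opposite any template base.
PAIRS_WITH_ANY = {"I"}


def is_perfectly_matched(probe, template_segment):
    """Check whether every probe position pairs with the opposing template base.

    Both arguments are written 5'->3'; the strands in a duplex are
    antiparallel, so the template segment is traversed in reverse.
    """
    if len(probe) != len(template_segment):
        return False
    for probe_base, template_base in zip(probe, reversed(template_segment)):
        if probe_base in PAIRS_WITH_ANY:
            continue  # analog position: accepted opposite any base
        if COMPLEMENT.get(probe_base) != template_base:
            return False
    return True


print(is_perfectly_matched("CAGG", "CCTG"))  # every position pairs
print(is_perfectly_matched("CAIG", "CCTG"))  # analog at position 3
print(is_perfectly_matched("CATG", "CCTG"))  # T opposite C is a mismatch
```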

The term "plurality" means more than one.

The term "polymorphism" is given its ordinary meaning in the art and refers to a difference in genomic sequence among individuals of the same species. A "single nucleotide polymorphism" (SNP) is a polymorphism at a single position.

"Polynucleotide," "nucleic acid," or "oligonucleotide" refers to a linear polymer of nucleosides (including deoxyribonucleosides, ribonucleosides, or analogs thereof) joined by internucleoside linkages. Typically, a polynucleotide comprises at least three nucleosides. In certain embodiments of the invention one or more nucleosides in an extension probe comprises a universal base. Oligonucleotides usually range in size from a few monomeric units, e.g., 3-4, to several hundred monomeric units. Whenever a polynucleotide such as an oligonucleotide is represented by a sequence of letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'→3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless otherwise noted. As is standard in the art, the letters A, C, G, and T may be used to refer to the bases themselves, or to nucleosides or nucleotides comprising the bases.

In naturally occurring polynucleotides, the internucleoside linkage is typically a phosphodiester bond, and the subunits are referred to as "nucleotides." However, oligonucleotide probes comprising other internucleoside linkages, such as phosphorothiolate linkages, are used in certain embodiments of the invention. It will be appreciated that one or more of the subunits that make up an oligonucleotide probe having a non-phosphodiester linkage may not comprise a phosphate group. Such analogs of nucleotides are considered to fall within the scope of the term "nucleotide" as used herein, and nucleic acids comprising one or more internucleoside linkages that are not phosphodiester linkages are still referred to as "polynucleotides," "oligonucleotides," etc. In other embodiments, a polynucleotide such as an oligonucleotide probe comprises a linkage containing a site that is sensitive to an AP endonuclease. For example, the oligonucleotide probe may contain an abasic residue, a residue containing a damaged base that is a substrate for removal by a DNA glycosylase, or another residue or linkage that is a substrate for cleavage by an AP endonuclease. In another embodiment an oligonucleotide probe contains a disaccharide nucleoside.

The term "primer" refers to a short polynucleotide, typically between about 10 and 100 nucleotides in length, that binds to a target polynucleotide or "template" by hybridizing with the target. The primer preferably provides a point of initiation for template-directed synthesis of a polynucleotide complementary to the target, which can take place in the presence of appropriate enzymes, cofactors, and substrates such as nucleotides, oligonucleotides, etc. The primer typically provides a terminus from which extension can occur. In the case of primers for synthesis catalyzed by a polymerase such as a DNA polymerase (e.g., in "sequencing by synthesis," polymerase chain reaction (PCR) amplification, etc.), the primer typically has, or can be modified to have, a free 3' OH group. A PCR reaction typically employs a pair of primers (first and second amplification primers) comprising an "upstream" (or "forward") primer and a "downstream" (or "reverse") primer, which delimit the region to be amplified. In the case of primers for synthesis by successive cycles of extension, ligation, and (optionally) cleavage, the primer typically has, or can be modified to have, a free 5' phosphate group or 3' OH group that can serve as a substrate for DNA ligase.

A "probe family," as used herein, refers to a group of probes each of which comprises the same label.

As used herein in reference to polynucleotides, the terms "sequence determination," "determining a nucleotide sequence," "sequencing," and like terms include determination of partial as well as full sequence information of the polynucleotide. That is, the terms include sequence comparisons, fingerprinting, and like levels of information about a target polynucleotide, as well as the express identification and ordering of each nucleoside of the target polynucleotide within a region of interest. In certain embodiments of the invention "sequence determination" comprises identifying a single nucleotide, while in other embodiments more than one nucleotide is identified. In certain embodiments of the invention, sequence information that is insufficient by itself to identify any nucleotide is gathered in a single cycle. Identification of nucleosides, nucleotides, and/or bases is considered equivalent herein. It is noted that performing sequence determination on a polynucleotide typically yields equivalent information about the sequence of a perfectly complementary (100% complementary) polynucleotide and is therefore equivalent to sequence determination performed directly on the perfectly complementary polynucleotide.
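
The equivalence noted above, between sequencing a polynucleotide and sequencing its perfect complement, follows from Watson-Crick pairing and the 5'→3' writing convention. The following is a minimal sketch, assuming DNA sequences written with the letters A, C, G, and T; the function name is hypothetical.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the perfectly complementary strand, written 5'->3'."""
    return sequence.translate(_COMPLEMENT_TABLE)[::-1]


determined = "ATGCCTG"
complement = reverse_complement(determined)
print(complement)  # CAGGCAT

# Applying the operation twice recovers the original sequence, so either
# strand carries the same sequence information.
print(reverse_complement(complement) == determined)  # True
```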

A "sequencing reaction," as used herein, refers to a set of cycles of extension, ligation, and detection. When an extended duplex is removed from a template and a second set of cycles is performed on the template, each set of cycles is considered a separate sequencing reaction, although the resulting sequence information may be combined to generate a single sequence.

"Semi-solid," as used herein, refers to a compressible matrix having both a solid and a liquid component, wherein the liquid occupies pores, spaces, or other interstices between the solid matrix elements. Exemplary semi-solid matrices include matrices made of polyacrylamide, cellulose, polyamide (nylon), and cross-linked agarose, dextran, and polyethylene glycol. A semi-solid support may be provided on a second support, e.g., a substantially planar, rigid support, also referred to as a substrate, which supports the semi-solid support.

"Support," as used herein, refers to a matrix on or in which nucleic acid molecules, microparticles, and the like may be immobilized, i.e., to which they may be covalently or noncovalently attached, or in or on which they may be partially or completely embedded, so that they are largely or entirely prevented from diffusing freely or moving with respect to one another.

A "trigger residue" is a residue that, when present in a nucleic acid, renders the nucleic acid more susceptible to cleavage (e.g., cleavage of the nucleic acid backbone) by a cleavage agent (e.g., an enzyme, silver nitrate, etc.) or combination of cleavage agents than an otherwise identical nucleic acid that does not include the trigger residue, and/or is susceptible to modification to yield a residue that renders the nucleic acid more susceptible to such cleavage. Thus the presence of a trigger residue in a nucleic acid can result in the presence of a scissile linkage in the nucleic acid. For example, an abasic residue is a trigger residue, since the presence of an abasic residue in a nucleic acid renders the nucleic acid susceptible to cleavage by an enzyme such as an AP endonuclease. A nucleoside containing a damaged base is a trigger residue, since the presence of a nucleoside comprising a damaged base in a nucleic acid also renders the nucleic acid more susceptible to cleavage by an enzyme such as an AP endonuclease, e.g., after removal of the damaged base by a DNA glycosylase. The cleavage site may be a bond between the trigger residue and an adjacent residue, or may be a bond that is one or more residues removed from the trigger residue. For example, deoxyinosine is a trigger residue, since the presence of deoxyinosine in a nucleic acid renders the nucleic acid more susceptible to cleavage by E. coli endonuclease V and its homologs. This enzyme cleaves the second phosphodiester bond 3' to the deoxyinosine. Any of the probes disclosed herein may contain one or more trigger residues. A trigger residue may, but need not, comprise a ribose or deoxyribose moiety. Preferably the cleavage agent is one that does not substantially cleave a nucleic acid in the absence of a trigger residue but exhibits significant cleavage activity against a nucleic acid containing the trigger residue under the same conditions, which conditions may include the presence of an agent that modifies the nucleic acid to render it more sensitive to the cleavage agent. For example, preferably, if the cleavage agent is present in a composition containing nucleic acids of the same length, one of which contains the trigger residue and the others of which do not, the likelihood that the nucleic acid containing the trigger residue will be cleaved is at least 10; 25; 50; 100; 250; 500; 1000; 2500; 5000; 10,000; 25,000; 50,000; 100,000; 250,000; 500,000; 1,000,000 or more times as great as the likelihood that a nucleic acid lacking the trigger residue will be cleaved; that is, the ratio of the likelihood of cleavage of a nucleic acid containing the trigger residue to the likelihood of cleavage of an otherwise identical nucleic acid lacking it is between 10 and 10⁶, or any integral subrange thereof. It will be appreciated that this ratio may differ depending on the particular nucleic acid and on the position and nucleotide context of the trigger residue.

Preferably, if the nucleic acid containing the trigger residue needs to be modified in order to render it susceptible to cleavage by the cleavage agent, such modification occurs readily in the presence of a suitable modifying agent, e.g., in reasonable yield and within a reasonable period of time. For example, in certain embodiments of the invention at least 50%, at least 60%, at least 70%, preferably at least 80%, at least 90%, or more preferably at least 95% of the nucleic acids containing a trigger residue are modified within, e.g., 24 hours, preferably within 12 hours, more preferably within a period ranging from less than 1 minute to 4 hours.

A variety of suitable trigger residues and corresponding cleavage reagents are exemplified herein. Any trigger residue and cleavage reagent having activity similar to those described herein may be used. One of ordinary skill in the art will be able to determine whether a particular combination of trigger residue and cleavage reagent is suitable for use in the present invention, e.g., whether the efficiency and speed of cleavage, the selectivity of the cleavage agent for nucleic acids containing the trigger residue, etc., are suitable for use in the methods of the invention. It is noted that a "trigger residue" is distinguished from a nucleotide that merely forms part of a restriction enzyme site in that the ability of the trigger residue to confer increased susceptibility to cleavage generally does not depend significantly on the particular sequence context in which the trigger residue is found although, as noted above, the sequence context may have some effect on susceptibility to modification and/or cleavage. Of course, depending on the surrounding nucleotides, a trigger residue may form part of a restriction site. Thus in most cases the cleavage agent is not a restriction enzyme, although the use of an enzyme that is a restriction enzyme and also has non-sequence-specific cleavage ability is not excluded.

A "universal base," as used herein, is a base that can "pair" with more than one of the bases typically found in naturally occurring nucleic acids and can thus substitute for such naturally occurring bases in a duplex. The base need not be capable of pairing with each of the naturally occurring bases. For example, certain bases pair only or selectively with purines, or only or selectively with pyrimidines. Certain preferred universal bases (fully universal bases) can pair with any of the bases typically found in naturally occurring nucleic acids and can thus substitute for any of these bases in a duplex. The base need not be equally capable of pairing with each of the naturally occurring bases. If a probe mixture contains probes that comprise (at one or more positions) a universal base that does not pair with all of the naturally occurring nucleotides, it may be desirable to use two or more universal bases at that position in the particular probe, so that at least one of the universal bases pairs with A, at least one pairs with G, at least one pairs with C, and at least one pairs with T.
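
The coverage requirement in the last sentence above amounts to a set-union check. The following is a minimal sketch, assuming each candidate universal base is described by the set of natural bases it pairs with; the base names `X_purine`, `X_pyrimidine`, and `X_full` and their pairing sets are hypothetical placeholders, not properties of any specific compound named in this disclosure.

```python
NATURAL_BASES = {"A", "G", "C", "T"}

# Hypothetical pairing behavior for three placeholder universal bases.
PAIRING = {
    "X_purine": {"A", "G"},        # pairs selectively with purines
    "X_pyrimidine": {"C", "T"},    # pairs selectively with pyrimidines
    "X_full": set(NATURAL_BASES),  # fully universal
}


def covers_all_natural_bases(universal_bases):
    """Return True if the bases used at one probe position jointly pair with A, G, C, and T."""
    covered = set()
    for base in universal_bases:
        covered |= PAIRING[base]
    return NATURAL_BASES <= covered


print(covers_all_natural_bases(["X_purine"]))                  # False
print(covers_all_natural_bases(["X_purine", "X_pyrimidine"]))  # True
print(covers_all_natural_bases(["X_full"]))                    # True
```

Under these assumptions, a partially universal base is sufficient at a degenerate position only when combined with others whose pairing sets together span all four natural bases.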

A number of universal bases are known in the art including, but not limited to, hypoxanthine, 3-nitropyrrole, 4-nitroindole, 5-nitroindole, 4-nitrobenzimidazole, 5-nitroindazole, 8-aza-7-deazaadenine, 6H,8H-3,4-dihydropyrimido[4,5-c][1,2]oxazin-7-one (P. Kong Thoo Lin and D. M. Brown, Nucleic Acids Res., 1989, 17, 10373-10383), 2-amino-6-methoxyaminopurine (D. M. Brown and P. Kong Thoo Lin, Carbohydrate Research, 1991, 216, 129-139), etc. Hypoxanthine is one preferred fully universal base. Nucleosides comprising hypoxanthine include, but are not limited to, inosine, isoinosine, 2'-deoxyinosine, 7-deaza-2'-deoxyinosine, and 2-aza-2'-deoxyinosine.

Other universal bases are known in the art, as described in the relevant portions of the following references: Loakes, D. and Brown, D.M., Nucl. Acids Res. 22:4039-4043, 1994; Ohtsuka, E. et al., J. Biol. Chem. 260(5):2605-2608, 1985; Lin, P.K.T. and Brown, D.M., Nucleic Acids Res. 20(19):5149-5152, 1992; Nichols, R. et al., Nature 369(6480):492-493, 1994; Rahmon, M.S. and Humayun, N.Z., Mutation Research 377(2):263-8, 1997; Berger, M. et al., Nucleic Acids Research 28(15):2911-2914, 2000; Amosova, O. et al., Nucleic Acids Res. 25(10):1930-1934, 1997; and Loakes, D., Nucleic Acids Res. 29(12):2437-47, 2001. A universal base may, but need not, form hydrogen bonds with the oppositely located base. A universal base may form hydrogen bonds via Watson-Crick or non-Watson-Crick interactions (e.g., Hoogsteen interactions).

In certain embodiments of the invention, rather than employing an oligonucleotide probe comprising a universal base, an oligonucleotide probe comprising an abasic residue is used. An abasic residue can occupy a position opposite any of the four naturally occurring nucleotides and can thus serve the same function as a nucleotide comprising a universal base. In some embodiments of the invention the linkage adjacent to the abasic residue is cleaved by an AP endonuclease, but abasic residues may also be used (i.e., to serve the function of a universal base) in embodiments of the invention in which other scissile linkages (e.g., phosphorothiolates) are present and other cleavage reagents are employed.

DETAILED DESCRIPTION OF CERTAIN PREFERRED EMBODIMENTS OF THE INVENTION

A. Sequencing by Successive Cycles of Extension, Ligation, and Cleavage

FIG. 1A diagrammatically illustrates the general scheme of one aspect of the invention, which is generally similar to the methods described in U.S. Patents 5,740,341 and 6,306,597, issued to Macevicz. For convenience, these patents are referred to collectively herein as "Macevicz." In particular, Macevicz describes a method for identifying a sequence of nucleotides in a polynucleotide, the method comprising the steps of: (a) extending an initializing oligonucleotide along the polynucleotide by ligating an oligonucleotide probe thereto to form an extended duplex; (b) identifying one or more nucleotides of the polynucleotide; and (c) repeating steps (a) and (b) until the sequence of nucleotides is determined.

Macevicz further describes a method for determining the nucleotide sequence of a template polynucleotide, the method comprising the steps of: (a) providing a probe-template duplex formed by hybridization of an initializing oligonucleotide probe to the template polynucleotide, the probe having an extendable probe terminus; (b) ligating an extension oligonucleotide probe to the extendable probe terminus to form an extended duplex containing an extended oligonucleotide probe; (c) identifying, in the extended duplex, at least one nucleotide in the template polynucleotide that is either (1) complementary to the just-ligated extension probe or (2) a nucleotide residue in the template polynucleotide immediately downstream of the extended oligonucleotide probe; (d) generating an extendable probe terminus on the extended probe, if an extendable terminus is not already present, such that the terminus generated is different from the terminus to which the last extension probe was ligated; and (e) repeating steps (b), (c), and (d) until the nucleotide sequence of the target polynucleotide is determined. In certain embodiments of these methods each extension probe has a chain-terminating moiety at the terminus distal to the initializing oligonucleotide probe. In certain embodiments the step of regenerating includes chemically cleaving a scissile internucleoside linkage in the extended oligonucleotide probe.

In FIG. 1A, polynucleotide template 20, comprising a polynucleotide region 50 of unknown sequence and a binding region 40, is attached to support 10. Nucleotide 41 at the distal end of binding region 40 is adjacent to nucleotide 51 at the proximal end of polynucleotide region 50. An initializing oligonucleotide 30 is provided that hybridizes to binding region 40 to form a duplex at that location. Initializing oligonucleotide 30 is also referred to herein as a "primer," and binding region 40 may be referred to as a "primer binding region." The duplex may, but need not, be a perfectly matched duplex. The initializing oligonucleotide has an extendable terminus 31. In FIG. 1A the initializing oligonucleotide binds to the binding region such that extendable terminus 31 is located opposite nucleotide 41. However, the initializing oligonucleotide may bind elsewhere in the binding region, as discussed below. An extension oligonucleotide probe 60 of length N hybridizes to the template adjacent to the initializing oligonucleotide. Terminal nucleotide 61 of the extension oligonucleotide probe is ligated to extendable terminus 31.

Terminal nucleotide 61 is complementary to nucleotide 51, the first unknown nucleotide in polynucleotide region 50. The identity of terminal nucleotide 61 therefore specifies the identity of nucleotide 51. Nucleotide 51 is preferably identified by detecting a label (not shown) attached to an extension probe whose terminal nucleotide 61 is known to be A, G, C, or T. The label is removed following detection. FIG. 2 shows a scheme for assigning different labels, e.g., fluorophores of different colors, to extension probes having different 3' terminal nucleotides.
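
The inference described above, from a detected label to the identity of the template nucleotide, can be sketched as a two-step lookup. The following is a minimal illustration, assuming a one-label-per-terminal-nucleotide scheme of the kind shown in FIG. 2; the specific color names and their assignment to bases are hypothetical and are not taken from the figure.

```python
# Hypothetical assignment of label colors to the known terminal
# nucleotide of each extension probe (illustrative only).
LABEL_TO_PROBE_TERMINAL_BASE = {
    "blue": "A",
    "green": "C",
    "yellow": "G",
    "red": "T",
}

# Watson-Crick partners for the four standard DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_base_from_label(label):
    """Infer the template nucleotide opposite the probe's terminal nucleotide."""
    probe_base = LABEL_TO_PROBE_TERMINAL_BASE[label]
    return COMPLEMENT[probe_base]


# Labels detected over four hypothetical ligation cycles.
detected = ["blue", "yellow", "yellow", "red"]
template_calls = [template_base_from_label(label) for label in detected]
print(template_calls)  # ['T', 'C', 'C', 'A']
```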

Following ligation and detection, an extendable probe terminus is generated on extension probe 60 if probe 60 does not already have such a terminus. A second extension probe 70, preferably also of length N, anneals to the template adjacent to extension probe 60 and is ligated to the extendable terminus of probe 60. The identity of terminal nucleotide 71 of extension probe 70 specifies the identity of the oppositely located nucleotide 52 in polynucleotide region 50. Terminal nucleotide 71 thus constitutes the "sequence determining portion" of the extension probe, meaning the portion of the probe whose hybridization specificity is used as the basis for determining the identity of one or more nucleotides in the template. It will be appreciated that other nucleotides in the extension probe will generally also hybridize to the template, but only those nucleotides in the probe whose identity is correlated with a particular label are used to identify nucleotides in the template.

In preferred embodiments of the invention, generating the extendable terminus comprises cleaving an internucleoside linkage as described below. Preferably cleavage also removes the label. Cleavage removes a number M of nucleotides (not shown) from the extension probe. The duplex is therefore extended by N-M nucleotides in each cycle, and nucleotides located at intervals of N-M in the template are identified. It will be appreciated that multiple copies of a given template are typically attached to a single support, and the sequencing reaction is performed simultaneously on these templates.
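
The N-M spacing above can be made concrete with a short calculation. The following is a minimal sketch, assuming the arrangement of FIG. 1A in which the terminal nucleotide of each ligated probe is the sequence determining portion and the first probe's terminal nucleotide lies opposite position 1 of the unknown region; the values of N and M used in the example are hypothetical.

```python
def interrogated_positions(probe_length_n, removed_m, cycles):
    """List the template positions (1-based) identified over successive cycles.

    Each cycle ligates a probe of length N and cleavage removes M of its
    nucleotides, so the duplex grows by N - M nucleotides per cycle.
    """
    if not 0 <= removed_m < probe_length_n:
        raise ValueError("M must satisfy 0 <= M < N")
    step = probe_length_n - removed_m
    return [1 + cycle * step for cycle in range(cycles)]


# Hypothetical example: probes of length N = 8, with M = 3 removed per cycle.
positions = interrogated_positions(8, 3, 4)
print(positions)  # [1, 6, 11, 16]
```

In this sketch a single sequencing reaction samples every (N-M)th position; positions between them would have to be reached by a separate reaction that starts from a different point on the template.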

Macevicz teaches that the oligonucleotide probes should generally be capable of being ligated to the initializing oligonucleotide or to an extended duplex to generate the extended duplex of the next extension cycle; that the ligation should be template-driven, in that the probe should form a duplex with the template prior to ligation; that the probe should possess a blocking moiety to prevent ligation of multiple probes on the same template in a single extension cycle; that the probe should be capable of being treated or modified after ligation to regenerate an extendable end; and that the probe should possess a signaling moiety (i.e., a detectable moiety) that permits the acquisition of sequence information relating to the template after a successful ligation.

Macevicz describes features of certain suitable initializing oligonucleotides, extension oligonucleotide probes, templates, and binding sites, and a variety of methods for synthesizing, designing, producing, or obtaining such components. Macevicz also describes certain suitable ligases, ligation conditions, and a variety of suitable labels. Macevicz further describes an alternative method of identification in which a labeled chain-terminating nucleotide is added to a newly ligated extension probe by polymerase extension. The identity of the added nucleotide specifies the nucleotide at the opposite position in the template.

As will be appreciated by one of ordinary skill in the art, references to templates, initializing oligonucleotides, extension probes, primers, etc., generally mean populations or pools of nucleic acid molecules that are substantially identical within a relevant region, rather than single molecules. Thus, for example, a "template" generally means a plurality of substantially identical template molecules; a "probe" generally means a plurality of substantially identical probe molecules; etc. In the case of probes that are degenerate at one or more positions, it will be appreciated that the sequences of the probe molecules that make up a particular probe differ at the degenerate positions, i.e., the sequences of the probe molecules that constitute a particular probe may be substantially identical only at the nondegenerate positions. For purposes of description, the singular form is understood to encompass both single molecules and populations of substantially identical molecules. Where a single nucleic acid molecule (i.e., one molecule) is meant, the terms "template molecule," "probe molecule," "primer molecule," etc., are used. In certain instances the plural nature of a population of substantially identical nucleic acid molecules is stated explicitly.
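
The relationship between a degenerate probe and the probe molecules that constitute it can be illustrated by enumeration. The following is a minimal sketch, assuming degenerate positions are written with the letter "N" (a common convention, adopted here as an assumption) and may be occupied by any of A, C, G, or T; the function name and the example probe are hypothetical.

```python
from itertools import product


def expand_degenerate_probe(probe, alphabet="ACGT", wildcard="N"):
    """Enumerate the molecule sequences represented by one degenerate probe.

    Nondegenerate positions are kept fixed; each wildcard position takes
    every letter of the alphabet in turn.
    """
    choices = [alphabet if base == wildcard else base for base in probe]
    return ["".join(combo) for combo in product(*choices)]


# A hypothetical probe with two degenerate positions.
molecules = expand_degenerate_probe("ANNT")
print(len(molecules))  # 16
print(molecules[:4])   # ['AAAT', 'AACT', 'AAGT', 'AATT']
```

All sixteen sequences share the fixed first and last positions and differ only at the two degenerate positions, which is the sense in which the molecules of a degenerate probe are substantially identical only at the nondegenerate positions.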

A population of substantially identical nucleic acid molecules can be obtained or produced by any of a variety of known methods, including chemical synthesis, biological synthesis in cells, enzymatic amplification in vitro from one or more starting nucleic acid molecules, etc. For example, using methods well known in the art, a nucleic acid of interest can be cloned by inserting it into a suitable expression vector, e.g., a DNA or RNA plasmid, which is then introduced into a cell, e.g., a bacterial cell, in which it can replicate. Plasmid DNA or RNA containing copies of the nucleic acid of interest is then isolated from the cells. Genomic DNA isolated from viruses, cells, etc., or cDNA produced by reverse transcription of mRNA, can also be a source of a population of substantially identical nucleic acid molecules (e.g., template polynucleotides whose sequence is to be determined) without an intermediate step such as cloning or in vitro amplification, although it is generally preferred to perform such an intermediate step.

It will be understood that the members of a population need not be 100% identical; for example, a certain number of "errors" may occur during synthesis. Preferably at least 50% of the members of the population are at least 90%, or more preferably at least 95%, identical to a reference nucleic acid molecule (i.e., a molecule of defined sequence used as the basis for sequence comparison). More preferably at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or more of the members of the population are at least 90%, more preferably at least 95%, or more preferably at least 99% identical to the reference nucleic acid molecule. Preferably, members whose percent identity to the reference nucleic acid molecule is at least 95%, or more preferably at least 99%, make up at least 98%, 99%, 99.9% or more of the population. Percent identity can be computed by comparing two optimally aligned sequences, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions, and multiplying the result by 100. It will be understood that in certain instances a nucleic acid molecule such as a template, probe, primer, etc., may be a portion of a larger nucleic acid molecule that also contains a portion that does not serve as a template, probe, or primer. In that case, those portions of the individual members of the population need not be substantially identical.
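
The percent-identity calculation described above can be sketched as follows. This is a minimal illustration, not part of the disclosed methods: it assumes the sequences have already been optimally aligned and padded to equal length with "-" for gaps, and the function names `percent_identity` and `fraction_at_least` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Matched positions divided by total positions, times 100.

    Assumes both strings come from the same optimal alignment,
    so they have equal length; '-' marks a gap (an assumption).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def fraction_at_least(population, reference: str, threshold: float) -> float:
    """Fraction of population members whose identity to the reference
    is at least `threshold` percent."""
    hits = sum(1 for m in population if percent_identity(m, reference) >= threshold)
    return hits / len(population)


reference = "ACGTACGTAC"
population = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
print([percent_identity(m, reference) for m in population])
print(fraction_at_least(population, reference, 90.0))
```

With these two helpers, a criterion such as "at least 50% of members are at least 90% identical to the reference" becomes `fraction_at_least(population, reference, 90.0) >= 0.5`.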

Macevicz describes attaching the template to a support (e.g., a bead) and extending toward the end of the template that is distal to the support, as shown in Figure 1A. The binding region is thus closer to the support than the unknown sequence, and the extended duplex grows in the direction away from the support. However, the inventors have unexpectedly found that the method is advantageously practiced in an alternative manner in which the binding region is located at the end of the template distal to the support and extension proceeds inward, toward the support. Figure 1B depicts this embodiment, with the various elements numbered as in Figure 1A. The inventors have determined that sequencing "inward" from the distal end of the template toward the support gives better results. In particular, sequencing from the distal end of the template toward a support such as a bead yields higher ligation efficiency than sequencing outward from the support.

As further described by Macevicz, oligonucleotide probes are preferably applied to the template as a mixture of oligonucleotides comprising all possible sequences of a predetermined length. For example, a probe mixture comprising all possible sequences of 6 nucleotides (hexamers), having the structure NNNNNN (which may also be written (N)_k with k = 6), contains 4⁶ (4096) probe species. In general, probes have the structure X(N)_kN*, where N represents any nucleotide, k is between 1 and 100, * represents a label, and X represents a nucleotide whose identity corresponds to the label. In certain embodiments k is 1-100, 1-50, 1-30, or 1-20, e.g., 4-10. One or more of the nucleotides may comprise a universal base. Probes are typically 4-fold degenerate at the positions represented by N, or contain degeneracy-reducing nucleotides at one or more of those positions. If desired, the mixture can be divided into probe subsets ("stringency classes") whose perfectly matched duplexes with complementary sequences have similar stability or free energy of binding. As described by Macevicz, these subsets can be used in separate hybridization reactions.

The complexity of a probe mixture (i.e., the number of different sequences it contains) can be reduced in a number of ways, including the use of so-called degeneracy-reducing nucleotides or nucleotide analogs. For example, a probe library comprising all possible sequences of 8 nucleotides contains 4⁸ (65,536) probes. Using universal bases at two of the positions reduces the number of probes to 4⁶ (4096) while retaining desirable properties of an octamer library, such as its length. The invention encompasses the use of any of the universal bases mentioned above or described in the references cited above.
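
The arithmetic behind these complexity figures is simply 4 raised to the number of fully degenerate positions. The short sketch below illustrates it; the function name `probe_species` and its parameters are hypothetical, and it assumes each universal-base position contributes a single species rather than four.

```python
def probe_species(length: int, universal_positions: int = 0, alphabet: int = 4) -> int:
    """Number of distinct probe sequences in a fully degenerate mixture.

    Each ordinary position can hold any of `alphabet` bases; each
    universal-base position is assumed to contribute a factor of one.
    """
    if not 0 <= universal_positions <= length:
        raise ValueError("universal_positions must lie between 0 and length")
    return alphabet ** (length - universal_positions)


print(probe_species(6))      # hexamer library
print(probe_species(8))      # octamer library
print(probe_species(8, 2))   # octamer with universal bases at two positions
```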

Depending on the embodiment, the extended duplex or initializing oligonucleotide can be extended by oligonucleotide probes in either the 5'→3' direction or the 3'→5' direction, as described below. In general, the oligonucleotide probe need not form a perfectly matched duplex with the template, although such binding may be preferred. In embodiments in which one nucleotide of the template is identified in each extension cycle, perfect base pairing is required only for identifying that particular nucleotide. For example, in embodiments in which the oligonucleotide probe is enzymatically ligated to the extended duplex, perfect base pairing, i.e., proper Watson-Crick base pairing, is required between the terminal nucleotide of the probe being ligated and its complement in the template. In such embodiments the remaining nucleotides of the probe generally serve as "spacers" that ensure the next ligation takes place at a predetermined site, a fixed number of bases further along the template. That is, whether or not they are paired provides no further sequence information. Likewise, in embodiments that rely on polymerase extension for base identification, the probe serves mainly as a spacer, so specific hybridization to the template is not critical.

The method described above allows partial determination of a sequence, i.e., identification of individual nucleotides spaced apart from one another in the template. In preferred embodiments of the invention, in order to gather more complete information, a plurality of reactions is performed, each using a different initializing oligonucleotide i. The initializing oligonucleotides i bind to different portions of the binding region. Preferably, the initializing oligonucleotides bind at positions such that the extendable termini of the different initializing oligonucleotides, when hybridized to the binding region, are offset from one another by 1 nucleotide. For example, as shown in Figure 3, sequencing reactions 1...N are performed. Initializing oligonucleotides i₁...i_n are of the same length, and when bound to binding region 40 their terminal nucleotides 31, 32, 33, etc., hybridize to successive adjacent positions 41, 42, 43, etc., in binding region 40. Extension probes e₁...e_n therefore bind to successive adjacent regions of the template and are ligated to the extendable termini of the initializing oligonucleotides. Terminal nucleotide 61 of probe e_n ligated to i_n is complementary to nucleotide 55 of polynucleotide region 50, i.e., the first unknown nucleotide in the template. In the second cycle of extension, ligation, and detection, terminal nucleotide 71 of probe e_n is complementary to nucleotide 56 of polynucleotide region 50, i.e., the second nucleotide of the unknown sequence. Likewise, the terminal nucleotides of extension probes ligated to duplexes initialized with initializing oligonucleotides i₂, i₃, i₄, etc., are complementary to the third, fourth, and fifth nucleotides of unknown sequence 50. It will be appreciated that the initializing oligonucleotides may instead bind to regions progressively farther from polynucleotide region 50 rather than progressively closer to it.

The spacer function of the non-terminal nucleotides of the extension probe makes it possible to obtain sequence information at template positions a given number of nucleotides away from the position at which the initializing oligonucleotide binds, without performing a correspondingly large number of cycles on any given template. For example, in successive cycles of ligating a probe of length N and then cleaving to remove a single terminal nucleotide of the extension probe, nucleotides spaced N-1 nucleotides apart are identified in successive cycles. For example, 6 cycles identify the nucleotides at positions 1, N, 2N-1, 3N-2, 4N-3, and 5N-4 of the template, where position 1 of the template corresponds to the nucleotide ligated to the extendable terminus in the duplex formed by binding of the initializing oligonucleotide to the template. Similarly, if cleavage removes two nucleotides of an extension probe of length N, nucleotides at positions N-2 nucleotides apart are identified in successive rounds. For example, 6 cycles identify the nucleotides at positions 1, N-1, 2N-3, 3N-5, 4N-7, etc., of the template. Thus, if the probe is 8 nucleotides long and 2 nucleotides are removed in each cycle, the nucleotides at positions 1, 7, 13, 19, and 25 are identified. The number of cycles needed to identify a nucleotide at a distance X from the first nucleotide in the template is therefore approximately X/M, where M is the length of the extension probe that remains after cleavage, rather than approximately X.
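
The position arithmetic in the preceding paragraph can be checked with a few lines of code. This is an illustrative sketch only: the function names are hypothetical, and position 1 is taken to be the template nucleotide interrogated in the first cycle.

```python
def interrogated_positions(probe_length: int, removed_per_cycle: int, cycles: int):
    """Template positions identified in successive ligation/cleavage cycles.

    Each cycle advances by the length of probe that remains after
    cleavage (M = probe_length - removed_per_cycle).
    """
    step = probe_length - removed_per_cycle
    if step <= 0:
        raise ValueError("cleavage must leave at least one nucleotide")
    return [1 + cycle * step for cycle in range(cycles)]


def cycles_to_reach(position: int, step: int) -> int:
    """Smallest number of cycles whose last interrogated position is >= position."""
    return -(-(position - 1) // step) + 1  # ceiling division without imports


# Probe of length N = 8, one nucleotide removed: 1, N, 2N-1, 3N-2, 4N-3, 5N-4
print(interrogated_positions(8, 1, 6))
# Probe of length 8, two nucleotides removed per cycle
print(interrogated_positions(8, 2, 5))
# Cycles needed to reach position 25 with M = 6 (roughly X / M)
print(cycles_to_reach(25, 6))
```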

For example, the scheme shown in Figure 3B illustrates the net result of using cycles of extension, ligation, and cleavage with extension probes designed to read every sixth base of the template. By successively stripping and sequencing the template with 6 initializing oligonucleotides that bind at offset positions in the binding region, and combining the results, all template bases over a defined length can be determined. For example, if each of the 6 reactions comprises 10 successive ligations, the resulting read length is 60 contiguous base pairs, whereas if each reaction comprises 15 successive ligations, the resulting read length is 90 contiguous base pairs.
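
The way the six offset reactions interleave into one contiguous read can be sketched as follows. The function name and the convention that offset 0 starts at template position 1 are assumptions made for illustration; the actual offsets depend on where each initializing oligonucleotide binds.

```python
def covered_positions(step: int, ligations: int, offsets):
    """Union of template positions read by several offset sequencing reactions.

    Each reaction reads one base every `step` positions, starting at
    position (offset + 1), for `ligations` successive ligations.
    """
    covered = set()
    for offset in offsets:
        for cycle in range(ligations):
            covered.add(offset + 1 + cycle * step)
    return sorted(covered)


# Six initializing oligonucleotides offset by one nucleotide each,
# probes that read every sixth base.
ten = covered_positions(step=6, ligations=10, offsets=range(6))
fifteen = covered_positions(step=6, ligations=15, offsets=range(6))
print(len(ten), ten == list(range(1, 61)))
print(len(fifteen), fifteen == list(range(1, 91)))
```

Under these assumptions the six reactions tile the template without gaps or overlaps, which is why the read length is simply the number of reactions times the number of ligations per reaction.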

While not wishing to be bound by any theory, the inventors suggest that, in contrast to this approach, most serial sequencing-by-synthesis methods suffer from error accumulation, which ultimately limits the achievable read length. An advantageous feature of certain methods described herein is that they identify every nth base (n depending on the position of the scissile moiety in the probe), so that after a given number of cycles (y) base number n*y-(n-1) is reached (e.g., the 71st base after 15 cycles in the example above, or the 115th base after 20 cycles using probes with 6 bases on the 3' side of the cleavage site). The ability to "reset" the initializing oligonucleotide at positions n-1, n-2, etc., greatly reduces serial error accumulation (through dephasing or attrition) over a given length, because stripping the extended strand from the template and hybridizing a new initializing oligonucleotide effectively resets the background signal to zero. For example, comparing polymerase-based sequencing by synthesis with the ligation-based methods described herein, if the signal-to-noise ratio of each extension cycle is 99:1, then after 100 cycles the signal-to-noise ratio of the polymerase-based method is 37:63, whereas that of the ligase-based method is 85:15. The net result is a substantially greater read length for the ligase-based method than for the polymerase-based method.
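
The figures quoted above can be reproduced with a simple geometric-decay model. The sketch below is one possible reading, not something the text states: it assumes that the in-phase fraction of templates is multiplied by 0.99 in every cycle, and that the ligation-based figure corresponds to about 16 cycles on any one initializing oligonucleotide (roughly 100 bases at one base read per 6 template positions). The function names are hypothetical.

```python
def furthest_base(n: int, cycles: int) -> int:
    """Base number reached after `cycles` cycles when every nth base is read."""
    return n * cycles - (n - 1)


def retained_signal(per_cycle_fidelity: float, cycles: int) -> float:
    """Fraction of templates still in phase, assuming independent
    per-cycle losses (a simplifying assumption)."""
    return per_cycle_fidelity ** cycles


print(furthest_base(5, 15))   # every 5th base, 15 cycles
print(furthest_base(6, 20))   # every 6th base, 20 cycles

polymerase = retained_signal(0.99, 100)        # one cycle per base over 100 bases
ligase = retained_signal(0.99, 100 // 6)       # assumed ~16 cycles per initializer
print(round(100 * polymerase), round(100 * ligase))
```

Under these assumptions the model gives roughly 37% and 85% retained signal, matching the 37:63 and 85:15 ratios in the text.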

The ability to identify a nucleotide using fewer cycles than would be needed if every preceding nucleotide in the template required its own cycle is important for a number of reasons. In particular, no step of the method can be 100% efficient. For example, some templates may fail to be ligated to an extension probe; some extension probes may fail to be cleaved, etc. Consequently, with each cycle the reactions taking place on different copies of the template drift progressively out of phase, and the number of templates from which useful, accurate information can be obtained decreases. It is therefore particularly desirable to minimize the number of cycles needed to read nucleotides located far from the extendable terminus of the initializing oligonucleotide. However, increasing the length of the extension probe may increase the complexity of the probe mixture, which lowers the effective concentration of each individual probe sequence. As described herein, degeneracy-reducing nucleotides can be used to lower the complexity, but this may reduce hybridization strength and/or ligation efficiency. The inventors have recognized the need to balance these competing factors in order to optimize results. Accordingly, preferred embodiments of the invention use extension probes 8 nucleotides long, with degeneracy-reducing nucleotides at selected positions. In addition, the inventors have recognized the importance of selecting a suitable scissile linkage, together with cleavage conditions and times, so as to optimize both the efficiency of the cleavage step (i.e., the percentage of linkages successfully cleaved in each cleavage step) and its specificity for the intended linkage.

B. Oligonucleotide Extension Probe Design

Although Macevicz mentions that degeneracy-reducing nucleoside analogs can be used in oligonucleotide extension probes, he does not identify particular positions in the extension probe at which inclusion of such residues is especially desirable, nor does he describe specific probe structures (i.e., sequences) that incorporate degeneracy-reducing nucleosides. The inventors have recognized that using a particular number of degeneracy-reducing nucleosides (e.g., nucleosides comprising a universal base) at particular positions in the oligonucleotide extension probe can be especially advantageous. For example, in certain embodiments of the invention most or all of the nucleotides at position 6 or beyond (counting from X) comprise a universal base. For example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% of the nucleotides at position 6 or beyond may comprise a universal base. These nucleotides need not all comprise the same universal base. In certain embodiments of the invention hypoxanthine and/or nitroindole is used as the universal base. For example, a nucleoside such as inosine can be used.

The inventors have recognized that excellent results can be obtained with extension probes more than 6 nucleotides long in which one or more of the nucleotides at position 6 or beyond, counting from the nucleotide that is ligated to the extendable terminus (the proximal end of the probe), is a degeneracy-reducing nucleotide, e.g., one comprising a universal base. That is, if the most proximal nucleotide is taken to be position 1, one or more of the nucleotides at position 6 or beyond comprises a universal base; for example, 1, 2, or 3 of the nucleotides at position 6 or beyond in an 8-mer probe comprise a universal base. For example, for sequencing in the 3'→5' direction, a probe of structure 3'-XNNNNsINI-5' can be used, where X and N represent any nucleotide and "s" represents the scissile linkage, so that cleavage occurs between the fifth and sixth residues counting from the 3' end; preferably at least one residue between the scissile linkage and the 5' end bears a label corresponding to the identity of X. Another design is 3'-XNNNNsNII-5'. Yet another probe design is 3'-XNNNNsIII-5'. This design yields a probe mixture of moderate complexity containing 1024 different probes, is long enough to prevent the formation of significant amounts of adenylation product (see Example 1), and has the advantage that the extension product obtained after cleavage consists of unmodified DNA. One disadvantage is that this probe extends the primer by only 5 bases at a time. Since read length is a function of the extension length multiplied by the number of cycles, each additional base of extension length increases the read length by one base per cycle (e.g., by 20 bases if 20 cycles are used). Another probe design leaves one or more inosines (or other universal bases) at the end of the extended duplex after cleavage, so that the duplex is extended by 6 bases or more per cycle. For example, with the probe 3'-XNNNNIsII-5' the duplex is extended 6 bases at a time, leaving a 5' inosine at the junction. In these designs it is preferred that at least one residue between the scissile linkage and the 5' end bears a label corresponding to the identity of X. In certain embodiments of the invention the third nucleotide from the distal end of the probe, counting from the end opposite the nucleotide that is ligated to the extendable terminus, comprises a universal base (i.e., if the distal end is taken to be position K, the nucleotide at position K-2 comprises a universal base).
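
The trade-off among these designs (mixture complexity versus bases gained per cycle) can be tabulated with a small helper. This is a sketch under stated assumptions: probe strings are written 3'→5' without the end labels, "s" marks the scissile linkage, X and N are treated as 4-fold degenerate, I (inosine) as contributing a single species, and the function name `describe_probe` is hypothetical.

```python
def describe_probe(design: str, cycles: int) -> dict:
    """Summarize a probe design string such as 'XNNNNsIII'.

    Residues before the lowercase 's' remain on the extended duplex
    after cleavage; residues after it are removed.
    """
    if design.count("s") != 1:
        raise ValueError("design must contain exactly one scissile linkage 's'")
    retained = design.index("s")                  # bases added per cycle
    residues = design.replace("s", "")
    fourfold = sum(1 for r in residues if r in "XN")
    return {
        "length": len(residues),
        "extension_per_cycle": retained,
        "species": 4 ** fourfold,
        "approx_read_length": retained * cycles,
    }


for design in ("XNNNNsINI", "XNNNNsNII", "XNNNNsIII", "XNNNNIsII"):
    print(design, describe_probe(design, cycles=20))
```

Under these assumptions 3'-XNNNNsIII-5' gives 1024 species and 5 bases per cycle, while 3'-XNNNNIsII-5' keeps the same complexity but gains 6 bases per cycle, i.e., 20 more bases over 20 cycles.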

In certain embodiments of the invention, locked nucleic acid (LNA) bases are used at one or more positions in the initializing oligonucleotide, the extension probe, or both. Locked nucleic acids are described, for example, in U.S. Patent 6,268,490; Koshkin, AA, et al., Tetrahedron, 54:3607-3630, 1998; and Singh, SK, et al., Chem. Comm., 4:455-456, 1998. LNA can be synthesized using automated DNA synthesizers and standard phosphoramidite chemistry and can be incorporated into oligonucleotides that also contain naturally occurring nucleotides and/or nucleotide analogs. They can also be synthesized with labels such as those described below.

C. Template and Support Preparation Methods

Macevicz describes methods in which a template comprising a plurality of substantially identical template molecules is first synthesized, e.g., by amplification in a tube or other vessel using the conventional polymerase chain reaction (PCR). Macevicz teaches that the amplified template molecules are preferably attached to a support, such as magnetic microparticles (e.g., beads), after synthesis.

The inventors have recognized that it is advantageous to synthesize the templates to be sequenced on or in the support itself, e.g., by using a support, such as microparticles or any of various semi-solid supports such as a gel matrix, to which one of a pair of amplification primers is attached before the PCR reaction is performed. This approach eliminates the need for a separate step of attaching the template molecules to the support after synthesis. Multiple templates of different sequence can thus conveniently be amplified in parallel. For example, synthesis on microparticles according to the methods described below yields a population of individual microparticles, each having attached to it multiple copies of a particular template molecule (or its complement), where the template molecules attached to each microparticle differ in sequence from those attached to other microparticles. Each support thus has a clonal population of templates attached to it, e.g., support A has multiple copies of template X attached; support B has multiple copies of template Y attached; support C has multiple copies of template Z attached, etc. A "clonal population of templates", "clonal population of nucleic acids", etc., means a population of substantially identical template molecules, preferably produced by successive rounds of amplification starting from a single template molecule of interest (the starting template). The substantially identical template molecules may be substantially identical to the starting template or to its complement.

Amplification is typically performed by PCR, although other amplification methods can also be used (see below). It will be understood that the members of a clonal population need not be 100% identical; for example, a certain number of "errors" may occur during synthesis, e.g., during amplification. Preferably at least 50% of the members of the clonal population are at least 90%, or more preferably at least 95%, identical to the starting template molecule (or its complement). More preferably at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or more of the members of the population are at least 90%, more preferably at least 95%, or more preferably at least 99% identical to the starting template molecule (or its complement). Preferably the percent identity of at least 95%, or more preferably at least 99%, of the members of the population to the starting template molecule (or its complement) is at least 98%, 99%, 99.9% or greater.

Amplification primers can be attached to the support by a variety of techniques. For example, one end of the primer (the 5' end) can be functionalized with one member of a binding pair (e.g., biotin) and the support functionalized with the other member of the binding pair (e.g., streptavidin). Any similar binding pair can be used. For example, a nucleic acid tag of defined sequence can be attached to the support, and a primer containing a complementary nucleic acid tag can hybridize to the tag attached to the support. Various linkers and crosslinking agents can also be used.

Methods for performing PCR are well known in the art; see, e.g., U.S. Patents 4,683,195, 4,683,202, and 4,965,188, and Dieffenbach, C. and Dveksler, GS, PCR Primer: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 2003. Methods for amplifying nucleic acids on microparticles are well known and described in the art; for example, standard PCR can be performed in microtiter plate wells or tubes on beads having primers attached (e.g., beads prepared as in Example 12). While PCR is a convenient amplification method, many other methods known in the art can also be used, for example, multiple strand displacement amplification, helicase displacement amplification (HDA), nick translation, Qβ replicase amplification, rolling circle amplification, and other isothermal amplification methods.

Template molecules can be obtained from any source. For example, DNA can be isolated from a sample, which may be obtained or derived from a subject. The term "sample" is used in a broad sense to denote any source of a template on which sequence determination is to be performed. The phrase "derived from" is used to indicate that a sample obtained directly from a subject, and/or the nucleic acids in it, may be further processed to obtain template molecules. The source of a sample may be any viral, prokaryotic, archaebacterial, or eukaryotic species. In certain embodiments of the invention the source is a human. The sample may be, e.g., blood or another body fluid containing cells; sperm; a biopsy sample, etc. Genomic or mitochondrial DNA from any organism of interest can be sequenced. cDNA can be sequenced. RNA can also be sequenced, e.g., by first reverse transcribing it to produce cDNA using methods well known in the art, such as RT-PCR. Mixtures of DNA from different samples and/or subjects can be combined. Samples can be processed in any of a variety of ways. Nucleic acids can be isolated, purified, and/or amplified from a sample using known methods. Of course, entirely artificial synthetic nucleic acids and recombinant nucleic acids that are not derived from an organism can also be sequenced.

Templates can be provided in double-stranded or single-stranded form. Typically, when a template is initially provided in double-stranded form, the two strands are subsequently separated (e.g., by denaturing the DNA), and only one of the two strands is amplified to produce a localized clonal population of template molecules, e.g., attached to a microparticle or immobilized in or on a semi-solid support.

Templates can be selected or processed in a variety of other ways. For example, templates obtained from DNA treated with a methylation-sensitive restriction enzyme (e.g., MspI) can be used. This treatment, which generates DNA fragments, can be performed before amplification. Fragments containing methylated bases are not amplified. Sequence information obtained from hypermethylated templates can be compared with sequence information obtained from templates of the same source that have not been selected for hypermethylation.

Templates can be inserted into a library, provided in a library, or derived from a library. For example, hypermethylated libraries are known in the art. Inserting templates into a library makes it convenient to join additional nucleotide sequences, such as tags or binding sites for primers or initializing oligonucleotides, to the ends of the templates. For example, certain protocols allow the addition of a tag that contains multiple binding sites, such as an amplification primer binding site, an initializing oligonucleotide binding site, a capture agent binding site, etc.

A variety of suitable libraries are known in the art. For example, libraries of particular interest and methods for constructing them are described in USSN 10/978,224; PCT publications WO2005042781 and WO2005082098; and Shendure, J., et al., Science, 309(5741):1728-32, 2005, Sciencexpress, August 4, 2005 (www.sciencexpress.org). It will of course be understood that other methods of producing such libraries can also be used. Certain libraries of particular interest contain a plurality of nucleic acid fragments (typically DNA), each of which contains two nucleic acid segments of interest separated by sequences complementary to amplification and/or sequencing primers used in the sequencing steps, i.e., these sequences serve as primer binding regions (PBRs). In embodiments of particular interest, the nucleic acid segments are portions of a contiguous piece of naturally occurring DNA. For example, the segments may come from the 5' and 3' ends of a contiguous piece of genomic DNA, as described in the references above. Consistent with those references, such nucleic acid segments are referred to herein as "tags" or "end tags". Two tags derived from a single contiguous nucleic acid, e.g., from its 5' and 3' ends, are referred to as a "paired tag", "paired tags", or a "ditag". It will be understood that a "paired tag" comprises two tags even though the singular form is used. The distance separating the two tags is constrained by selecting the contiguous pieces of DNA from which the paired tags are generated to fall within predetermined size limits.

In addition to being separated by sequences complementary to sequencing and/or amplification primers, the nucleic acid fragments of the library generally also contain sequences complementary to sequencing and/or amplification primers flanking the tags; that is, a first such sequence can be located 5' of the tag nearer the 5' end of the fragment, and a second such sequence can be located 3' of the tag nearer the 3' end of the fragment. It should be understood that in various embodiments the positions of the two tags in the contiguous nucleic acid from which they are derived may, but need not, correspond to their positions in the library DNA fragment.

The nucleic acid fragments and tags can have a range of sizes. The fragments can generally be, for example, 80-300 nucleotides long, such as 100-200 nucleotides, 100-150 nucleotides, about 150 nucleotides, about 200 nucleotides, etc. The tags can be, for example, 15-25 nucleotides long, such as about 17-18 nucleotides. It should be noted that these lengths are exemplary and not limiting. Shorter or longer fragments and/or tags can be used.

It should also be noted that, while obtaining a paired tag from a single contiguous nucleic acid is a convenient way to construct a library, what matters about a paired tag is that its two tags are separated from each other by a distance (the "separation distance") in the nucleic acid from which they were originally derived, and that this separation distance falls within a predetermined range. Because the tags are known to be separated by a distance within that range, the tag sequences can be aligned against a reference sequence (e.g., a reference genome sequence). Without wishing to be bound by any theory, this may be advantageous for certain applications such as genome resequencing, where it allows shorter read lengths to be used while still placing the sequences accurately on the reference genome. The 5' and 3' tags of a paired tag represent segments of a larger nucleic acid fragment such as genomic DNA (i.e., they have the sequences of those segments), and these segments lie within a predetermined distance of each other in a naturally occurring DNA fragment such as a genomic DNA fragment. For example, in certain embodiments of the invention the 5' and 3' tags of a paired tag represent DNA segments that lie within 500 nucleotides, within 1 kb, within 2 kb, within 5 kb, within 10 kb, or within 20 kb of each other in the naturally occurring DNA fragment. In certain embodiments the 5' and 3' tags of a paired tag are separated by 500 nucleotides to 2 kb in the naturally occurring DNA fragment, such as 700 nucleotides to 1.2 kb, about 1 kb, etc. It should be noted that the exact separation distance of the two tags of a paired tag is not critical and is generally unknown. Furthermore, although tags are initially obtained from a larger nucleic acid fragment, the term "tag" is used for any nucleic acid segment that contains the tag sequence, whether it is present in its original sequence context or in a library fragment, an amplification product of a library fragment, a template to be sequenced, etc.
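The way a predetermined separation range helps place short tags on a reference can be illustrated with a minimal Python sketch. It is not a method from this document: the `TagHit` record, the `consistent_pairs` function, and the example coordinates are hypothetical, strand and orientation are ignored, and the default range simply borrows the 500-nucleotide to 2 kb example given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagHit:
    """One candidate alignment of a tag to the reference (hypothetical record)."""
    chrom: str
    pos: int  # start coordinate of the hit on the reference


def consistent_pairs(hits_5p, hits_3p, min_sep=500, max_sep=2000):
    """Return (5' hit, 3' hit) pairs whose separation lies in the predetermined range.

    Assumption: both tags must map to the same reference sequence, and only
    the absolute distance between start coordinates is checked.
    """
    pairs = []
    for a in hits_5p:
        for b in hits_3p:
            if a.chrom != b.chrom:
                continue
            if min_sep <= abs(b.pos - a.pos) <= max_sep:
                pairs.append((a, b))
    return pairs


# A short 5' tag that maps to two places is resolved by its uniquely mapped partner.
hits_5p = [TagHit("chr1", 10_000), TagHit("chr7", 52_000)]
hits_3p = [TagHit("chr1", 11_000)]
resolved = consistent_pairs(hits_5p, hits_3p)
```

In this toy case only the chr1 placement of the 5' tag survives, which is the sense in which a paired tag permits shorter reads while still locating them on the reference.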

Nucleic acid fragments (e.g., library molecules) may have the following structure:

Linker 1-Tag 1-Linker 3-Tag 2-Linker 2

Tag 1 and Tag 2 can be the 5' and 3' tags of a paired tag. Either tag can be the 5' tag or the 3' tag. Linker 1 and Linker 2 contain primer binding regions for one or more primers. In certain embodiments, Linkers 1 and 2 each contain a PBR for an amplification primer and a PBR for a sequencing primer. The primers for each linker can be nested, such that the sequencing primer PBR lies internal to the amplification primer PBR. Linker 3 can contain PBRs for one or more sequencing primers, so that Tag 1 and Tag 2 can be sequenced. The term "linker" refers to a nucleic acid sequence present in a plurality of the nucleic acid fragments of a library, e.g., in substantially all of the fragments. A linker may or may not have served an actual linking function during library construction; it can be regarded simply as a defined sequence common to most or all members of a given library. Such a sequence is also referred to as a "universal sequence". Thus, a nucleic acid complementary to a linker, or to a portion of one, hybridizes to multiple members of the library and can be used as an amplification primer or sequencing primer for most or all molecules in the library.

In certain embodiments of the invention, the nucleic acid fragment has the following structure:

Linker 1-Tag 1-Internal Adapter-Tag 2-Linker 2

Tag 1 and Tag 2 are as described above, and Linker 1 and Linker 2 contain PBRs as described above. The internal adapter contains two primer binding regions, which may be referred to as IA and IB, as described below. These PBRs can be used to produce microparticles to which two separate populations of substantially identical nucleic acids are attached, one population comprising Tag 1 and the other comprising Tag 2. The two populations have sequences that differ at least in part, e.g., in their tag regions. The internal adapter can contain a spacer between its two primer binding regions. The spacer can contain an abasic residue, which prevents a polymerase from extending through the spacer. Of course, a spacer containing any other blocking group that prevents polymerase extension through it can be used.

In other embodiments, the nucleic acid fragment includes one or more (e.g., 2, 4, 6, etc.) additional tags and one or more additional internal adapters. For example, a nucleic acid fragment can have the following structure:

Linker 1-Tag 1-Internal Adapter 1-Tag 2-Linker 2-Tag 3-Internal Adapter 2-Tag 4-Linker 3

It should be noted that the nucleic acid fragments of the invention, libraries of such fragments, microparticles bearing two or more populations of substantially identical nucleic acids, and arrays of such microparticles can be used with a variety of sequencing methods in addition to the ligation-based methods described herein. For example, sequencing methods such as FISSEQ, pyrosequencing, and the like can be used. See, e.g., WO2005082098. Of course, the ligation-based methods can also be used to advantage. It should be understood that, in the ligation-based methods described herein, the term "sequencing primer" is to be read as "initializing oligonucleotide".

In certain embodiments of the invention, the templates to be sequenced are synthesized by PCR in individual aqueous compartments of an emulsion (also referred to as "reactors"). Preferably, each compartment contains a particulate support, such as a bead with a suitable first amplification primer attached, a first copy of the template, a second amplification primer, and the components needed for the PCR reaction (e.g., nucleotides, polymerase, cofactors). For methods of preparing emulsions see, e.g., U.S. Patents 6,489,103 (Griffiths) and 5,830,663 (Embleton) and U.S. Publication No. 20040253731 (Ghadessy). For methods of performing PCR within individual emulsion compartments to produce clonal populations of templates attached to microparticles see, e.g., Dressman, D. et al., Proc. Natl. Acad. Sci., 100(15):8817-8822, 2003, and PCT publication WO2005010145.

The methods described in the above references, or modifications of them, can be used to produce clonal populations of templates attached to microparticles for sequencing. In a preferred, non-limiting embodiment, short (<500 nucleotide) templates suitable for PCR are produced by ligating universal adapter sequences to each end of a population of different target sequences (templates). ("Universal" here means that the same adapter sequences are ligated to every template, producing "adaptered" templates that can all be amplified with a single pair of PCR amplification primers.) A bulk PCR reaction is prepared from the adaptered templates, one free amplification primer, microparticles with the second amplification primer attached, and the other PCR reagents (e.g., polymerase, cofactors, nucleotides). The aqueous PCR reaction is mixed with an oil phase (containing light mineral oil and surfactant) at a ratio of 1:2. The mixture is vortexed to produce a water-in-oil emulsion. One milliliter of the mixture is sufficient to produce 4 × 10⁹ aqueous compartments in the emulsion, each a potential PCR reactor. Aliquots of the emulsion are dispensed into the wells of a microtiter plate (e.g., a 96-well or 384-well plate) and thermally cycled to achieve solid-phase PCR amplification on the microparticles. To ensure clonality, the microparticle and template concentrations are carefully controlled so that few reactors contain more than one bead or more than one template molecule. For example, in certain embodiments of the invention, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or more of the reactors contain one bead and one template. The members of each clonal template population are thus spatially localized by virtue of their attachment to a microparticle. In general, the points of attachment of the templates can be distributed substantially uniformly over the particle surface.
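The effect of bead and template concentration on reactor occupancy can be explored with a rough Python sketch. It rests on an assumption that is not stated in this document: beads and template molecules are taken to partition into compartments independently and at random, so that the number per compartment follows a Poisson distribution. The function names and the example loadings are hypothetical; only the figure of 4 × 10⁹ compartments per milliliter comes from the text.

```python
import math

COMPARTMENTS_PER_ML = 4e9  # compartments per milliliter of mixture, as quoted in the text


def poisson_pmf(k, mean):
    """Probability of exactly k items in a compartment for a given mean occupancy."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def reactor_fractions(beads_per_ml, templates_per_ml,
                      compartments_per_ml=COMPARTMENTS_PER_ML):
    """Estimate reactor occupancy under the independent-Poisson assumption."""
    bead_mean = beads_per_ml / compartments_per_ml
    template_mean = templates_per_ml / compartments_per_ml
    one_bead = poisson_pmf(1, bead_mean)
    one_template = poisson_pmf(1, template_mean)
    any_template = 1.0 - poisson_pmf(0, template_mean)
    multi_template = any_template - one_template
    return {
        # reactors holding exactly one bead and exactly one template
        "one_bead_one_template": one_bead * one_template,
        # reactors holding one bead but two or more templates (non-clonal beads)
        "one_bead_multiple_templates": one_bead * multi_template,
        # among reactors that received any template, the share with exactly one
        "single_template_share": one_template / any_template if any_template > 0 else 0.0,
    }


# Hypothetical loading: on average 0.1 bead and 0.1 template per compartment.
fractions = reactor_fractions(beads_per_ml=4e8, templates_per_ml=4e8)
```

Under this simple model, lowering the template loading raises the share of templated reactors that are clonal, at the cost of more reactors that contain a bead but no template, which is one reason an enrichment step for templated beads can be useful.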

Of particular interest is the use of the emulsion PCR method to produce populations of microparticles in which individual microparticles have attached to them distinct populations of amplified nucleic acid fragments containing the 5' tag and 3' tag of a paired tag. In other words, it is of particular interest to produce populations of microparticles in which each individual particle has a different nucleic acid fragment from the library amplified and attached to it as described above.

Methods known in the art for amplifying DNA in emulsions (as described in the above references) are limited in their ability to amplify large nucleic acid molecules and attach them to microparticles. For example, PCR efficiency has been shown to fall off exponentially with longer amplicons. This loss of efficiency reduces the efficiency with which nucleic acid fragments containing a paired tag and primer binding sites (as described above) are amplified in a PCR emulsion and thereby attached to microparticles. Consequently, approaches in which a single population of substantially identical nucleic acid fragments containing both the first and second tags of a paired tag is amplified in a PCR emulsion and thereby attached to beads are subject to significant limitations.

The methods provided by the invention allow smaller amplicons to be used while retaining the paired tag information that would be obtained if a single nucleic acid fragment containing both the 5' and 3' tags of a paired tag were attached to the microparticle by amplification. The invention provides microparticles, such as beads, to which at least two distinct populations of nucleic acids are attached, each population consisting of a plurality of substantially identical nucleic acids, wherein the first population includes a first nucleic acid segment of interest, such as a 5' tag, and the second population includes a second nucleic acid segment of interest, such as a 3' tag. The first and second populations are amplified from one larger nucleic acid fragment that contains both tags and also contains suitably placed primer binding sites flanking and separating the tags, so that two amplification reactions can be carried out sequentially or (preferably) simultaneously in a single PCR emulsion reactor in the presence of the microparticle and amplification reagents. The microparticle has two different primer populations attached: the sequence of one primer population corresponds to a primer binding region outside one tag in the nucleic acid fragment, and the sequence of the other corresponds to a primer binding region outside the other tag; i.e., these primer binding regions flank the tags.

Also provided are primers that bind to primer binding regions located between the two tags, so that two distinct PCR reactions take place, each amplifying a portion of the nucleic acid fragment that contains one tag. The amplified nucleic acid segments contain additional primer binding regions that differ from one another. In the nucleic acid fragment these additional primer binding regions lie internal to the PBRs for the amplification primers; i.e., the corresponding primers are nested. The additional PBRs serve as binding regions for two different sequencing primers. Thus, by applying one or the other of the two sequencing primers to a microparticle bearing the two populations of substantially identical nucleic acid segments, either segment can be sequenced without interference from the presence of the other. Each nucleic acid segment is considerably shorter than the nucleic acid fragment from which it is amplified, which improves the efficiency of emulsion-based PCR on fragment libraries containing paired tags while preserving the association between the tags of each paired tag.

The methods described above may be better understood by reference to the panels of Figures 34 and 35, in which nucleic acid portions having the same sequence are given the same color. The description above is to be read consistently with Figures 34 and 35. Figures 34A and 35A show the same step, with Figure 35A providing additional detail. As shown in Figures 34A and 35A, a paired-end library fragment containing two tags (Tag 1 and Tag 2) is constructed with an internal adapter cassette (IA-IB) and unique flanking linker sequences (P1 and P2). The internal adapter cassette and the flanking linker sequences all contain nucleotide sequences for PCR amplification and DNA sequencing. The PCR primer regions are designed so that nested DNA sequencing primers can be used. DNA capture microparticles (beads) are produced by attaching two oligonucleotides whose sequences are identical to the unique flanking linker sequences. For PCR amplification, DNA capture microparticles bearing oligonucleotides with the P1 and P2 sequences are seeded into a reaction containing a single ditag library fragment (i.e., a library fragment containing the 5' tag and 3' tag of a paired tag) and solution-phase PCR primers.

The solution-phase flanking linker primers (P1 and P2) are added in limiting amounts relative to the internal adapter primers (IA and IB) in order to drive amplification of the PCR-generated tag products efficiently onto the beads (i.e., [P1 << IB], [P2 << IA]). If desired, appropriate control of the primer amounts can also ensure that the nucleic acid populations contain substantially equal numbers of nucleic acids, e.g., about half of the nucleic acids on an individual microparticle belong to the first population and about half to the second. Thus, if desired, a form of asymmetric PCR can be used to control the ratio of the different populations.

During amplification, as shown in Figures 34B and 35B (where Figure 35B again provides additional detail relative to Figure 34B), a single paired-end library fragment gives rise to two distinct PCR products in the presence of the four oligonucleotide primers (P1, P2, IA, and IB). One population contains Tag 1 flanked by P1 and IA, and the second contains Tag 2 flanked by P2 and IB.
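The relationship between the library fragment and its two amplification products can be written out as a small Python sketch. It models the fragment purely as an ordered list of named regions; the `amplicon` and `length` helpers and all of the sequences are made-up placeholders for illustration, not sequences or software from this document.

```python
def amplicon(fragment, left_pbr, right_pbr):
    """Return the regions from left_pbr through right_pbr, inclusive.

    `fragment` is an ordered list of (region name, sequence) tuples. This is a
    bookkeeping model only; it does not simulate primer annealing or extension.
    """
    names = [name for name, _ in fragment]
    start, end = names.index(left_pbr), names.index(right_pbr)
    if start > end:
        raise ValueError("left_pbr must lie upstream of right_pbr")
    return fragment[start:end + 1]


def length(regions):
    """Total number of nucleotides across a list of (name, sequence) regions."""
    return sum(len(seq) for _, seq in regions)


# Placeholder sequences (hypothetical); tags are about 18 nucleotides long here.
fragment = [
    ("P1", "CCATCTCATCCCTGCGTGTC"),
    ("Tag1", "ACGTTGCAAGCTTGACCA"),
    ("IA", "GGTACCGATTCAGCTAGC"),
    ("IB", "TTAGCCGATAGGCTCAAG"),
    ("Tag2", "TGCATCGGATTACCGATG"),
    ("P2", "CTGAGACTGCCAAGGCACAC"),
]

product_1 = amplicon(fragment, "P1", "IA")  # Tag 1 flanked by P1 and IA
product_2 = amplicon(fragment, "IB", "P2")  # Tag 2 flanked by IB and P2
```

Each product is shorter than the full fragment, yet both arise from the same fragment in the same reactor and end up on the same bead, so the pairing between Tag 1 and Tag 2 is kept.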

After amplification, the microparticle is loaded with two distinct PCR populations, corresponding to Tag 1 and Tag 2, derived from the starting library fragment. Each tag is thus associated with a unique set of primer regions, allowing the tags to be sequenced one after the other, as shown in Figures 34C, 35C, and 35D. Figures 35C and 35D show sequential sequencing of Tags 1 and 2 using different sequencing primers. A variety of sequencing methods can be employed.

The method described above can be used to produce microparticles bearing more than two populations, e.g., 4, 6, 8, 12, 16, or 20 populations of different nucleic acid sequences, for example where the populations include 2, 3, 4, 6, 8, or 10 paired tags. Each population can be sequenced separately by providing a unique primer binding region in each sequence, as described above for the case of two tags.

The invention includes nucleic acid fragments having the structures shown in Figures 34 and 35 and described above; libraries of such fragments; microparticles having attached to them nucleic acid segments derived from such fragments; populations of such microparticles (in which the sequence of the nucleic acid populations attached to an individual microparticle differs from that of the populations attached to other microparticles); arrays of such microparticles; amplification primers for amplifying nucleic acid segments (tags) from the nucleic acid fragments; sequencing primers for sequencing the nucleic acid segments attached to the microparticles; methods of making such fragments, libraries, and microparticles; and methods of sequencing nucleic acids attached to the microparticles. The invention includes kits containing any combination of the foregoing components, optionally together with one or more enzymes, buffers, or other reagents for amplification, sequencing, and the like.

If desired, microparticles with templates attached can be enriched by a variety of methods. For example, a hybridization approach can be used in which an oligonucleotide (the capture agent) complementary to a portion of the amplification product (template) attached to the microparticles is attached to a capture entity such as another (preferably larger) microparticle, a microtiter well, or another surface. This portion of the amplification product may be referred to as the target region. The target region can be incorporated into the template during amplification, e.g., at one end of the portion of the template that contains unknown sequence. For example, the target region can be present in the amplification primer that is not attached to the microparticle, so that its complement is present in the amplified template. Many different templates can therefore include the same target region, and a single capture agent can hybridize to many different templates, which allows many microparticles to be captured using only one oligonucleotide sequence as the capture agent. Microparticles that have been subjected to amplification are contacted with the capture agent under conditions in which hybridization can occur. As a result, microparticles with amplified template attached become attached to the capture entity via the capture agent. Unattached microparticles are then removed, and the retained microparticles are released (e.g., by raising the temperature). In certain embodiments that employ particulate capture entities, the aggregates formed after hybridization, which consist of capture entities with microparticles attached, are separated from particulate capture entities with no microparticle attached and from microparticles not attached to a capture entity, e.g., by centrifugation in a viscous solution such as glycerol. Other separation methods based on size, density, etc. can also be used. Hybridization is only one of many methods that can be used for enrichment. For example, capture agents can be used that have affinity for any of a number of different ligands that can be incorporated into the template (e.g., during synthesis). Multiple rounds of enrichment can be performed.

Figure 14A shows an image of the compartments of a water-in-oil emulsion in which a PCR reaction was performed on beads bearing the first amplification primer, using a fluorescently labeled second amplification primer and excess template. The aqueous reactors fluoresce weakly from diffuse free primer, whereas the beads fluoresce strongly from primer concentrated on the bead as a result of solid-phase amplification (i.e., incorporation of the fluorescent primer into amplified template attached to the bead via the first amplification primer). The bead signal is uniform across reactors of different sizes.

After amplification, the microparticles are collected (e.g., with a magnet in the case of magnetic particles) and used for sequencing by repeated cycles of extension, ligation, and cleavage, as described herein. In certain embodiments of the invention, the microparticles are arrayed in or on a semi-solid support before sequencing, as described below. Examples 12, 13, 14, and 15 provide further details of representative, non-limiting methods that can be used to (i) prepare microparticles with amplification primers attached, for synthesis of templates on the microparticles (Example 12); (ii) prepare an emulsion containing a plurality of reactors for PCR (Example 13); (iii) perform PCR amplification in the emulsion compartments (Example 13); (iv) break the emulsion and recover the microparticles (Example 13); (v) enrich for microparticles with clonal template populations attached (Example 14); (vi) prepare glass slides for use as substrates for a semi-solid polyacrylamide support (Example 15); and (vii) mix the microparticles with unpolymerized acrylamide to form an array of template-bearing microparticles embedded in acrylamide on the substrate (Example 15). Example 15 also describes a polymerase capture protocol that can be used in certain of the methods when PCR is performed in a semi-solid support. One of ordinary skill in the art will recognize that many variations on these methods are possible.

In other embodiments of the invention, templates are amplified by PCR in a semi-solid support, such as a gel, in which suitable amplification primers are immobilized. The template, the other amplification primers, and the reagents needed for the PCR reaction are present in the semi-solid support. One or both primers of the amplification primer pair are attached to the semi-solid support via a suitable linking moiety such as an acrydite group. Attachment can take place during polymerization. The other reagents (e.g., template, second amplification primer, polymerase, nucleotides, cofactors) can be present before the semi-solid support forms (e.g., in the liquid before the gel forms), or one or more of the reagents can be diffused into the semi-solid support after it has formed. The pore size of the semi-solid support is chosen to allow such diffusion. As is well known in the art, the pore size of a polyacrylamide gel is determined mainly by the concentration of acrylamide monomer and, to a lesser extent, by the crosslinker. Similar considerations apply to other semi-solid support materials. A suitable crosslinker and concentration can be selected to achieve the desired pore size. In certain embodiments of the invention, the solution contains, before polymerization, additives such as cationic lipids, polyamines, or polycations, which form micelles or aggregates around the microparticles in the gel. The methods described in U.S. Patents 5,705,628, 5,898,071, and 6,534,262 can also be used. For example, various "crowding reagents" can be used to crowd DNA near the beads for clonal PCR. SPRI magnetic bead technology and/or conditions can also be used. See, e.g., U.S. Patent 5,665,572, which shows efficient PCR amplification in the presence of 10% polyethylene glycol (PEG). In certain embodiments of the methods of the invention, amplification (e.g., PCR), ligation, or both are carried out in the presence of reagents such as betaine, polyethylene glycol, PVP-40, and the like. These reagents can be added to a solution, be present in an emulsion, and/or be diffused into a semi-solid support.

The semi-solid support can be positioned or assembled on a substantially planar, rigid substrate. In certain preferred embodiments, the substrate is transparent to radiation at the excitation and emission wavelengths (e.g., about 400-900 nm) used to excite and detect typical labels (e.g., fluorescent labels, quantum dots, plasmon resonant particles, nanoclusters). Materials such as glass, plastic, quartz, and the like are suitable. The semi-solid support can adhere to the substrate and can optionally be fixed to it by any of a variety of methods. The substrate may or may not be coated with a substance that enhances adhesion or bonding, such as a silane or polylysine. U.S. Patent 6,511,803 describes methods of synthesizing clonal template populations by PCR in a semi-solid support, methods of preparing a semi-solid support on a substantially planar substrate, and the like. Similar methods can be used in the present invention. The substrate can have wells or depressions to contain the liquid before the semi-solid support forms. Alternatively, a raised border or mask can be used for this purpose.

The method described above provides an alternative to the use of reactors in an emulsion for producing spatially localized clonal template populations. The clonal populations are present at discrete locations in the semi-solid support, so that during sequencing a signal can be acquired from each population, e.g., by imaging, in order to detect newly ligated extension probes. In some embodiments of the invention, two or more different clonal populations are amplified from a single nucleic acid fragment and are present as a mixture at a discrete location in the semi-solid support. Each clonal population in the mixture can contain a tag, so that the discrete location contains fragments containing the 5' tag and fragments containing the 3' tag. The clonal templates containing the 5' tag and those containing the 3' tag contain binding regions for different sequencing primers, so that they can be sequenced independently of each other. This approach parallels the one described above for producing multiple populations of substantially identical nucleic acids on a microparticle and obtaining sequence information for both members of a paired tag from a single microparticle.

Typically, the semi-solid support used in any of the methods of the invention forms a layer about 100 microns thick or less, e.g., about 50 microns or less, e.g., about 20-40 microns. Preferably, before polymerization, a coverslip or similar object having a substantially flat surface is placed on the semi-solid support material to help produce a uniform gel layer, e.g., one that is substantially flat and/or substantially uniform in thickness.

In other embodiments of the invention, a modified form of the above method is used, in which templates are synthesized by PCR on microparticles to which a suitable amplification primer is attached, and in which the microparticles are immobilized in or on a semi-solid support before template synthesis, i.e., they are completely or partially embedded in the semi-solid support. Typically the semi-solid support completely surrounds the microparticles, but they may also rest on the underlying substrate. The microparticles therefore remain at substantially fixed positions relative to one another unless the semi-solid support is disrupted. This method provides another alternative to the use of emulsions for generating spatially confined clonal template populations. The microparticles can be mixed with the liquid before the semi-solid support forms. Alternatively, the microparticles can be arrayed on a substantially planar substrate and the liquid added to the microparticle array before polymerization, crosslinking, etc. A first amplification primer is attached to the microparticles. A second amplification primer may, but need not, be attached to the semi-solid support. Other reagents (e.g., template, second amplification primer, polymerase, nucleotides, cofactors) can be present before the semi-solid support forms (e.g., in the liquid before gel formation), or one or more reagents can diffuse into the semi-solid support after the gel forms. Typically, the semi-solid support is formed on a slide as described above.

In certain embodiments of the invention, the gel can be dissolved (e.g., digested, depolymerized, or melted) so that microparticles with attached clonal template populations can be conveniently recovered after template synthesis (e.g., using a magnet in the case of magnetic particles). Gels that can be dissolved, digested, depolymerized, etc., are referred to herein as "reversible" gels. Conventional polyacrylamide polymerization uses N,N'-methylenebisacrylamide (BIS) as a crosslinker and a suitable catalyst to initiate polymerization, e.g., N,N,N',N'-tetramethylethylenediamine (TEMED). To produce a reversible gel, an alternative crosslinker such as N,N'-diallyltartardiamide (DATD) can be used. This compound is structurally similar to BIS but has cis-diol groups that can be cleaved by periodic acid (e.g., a solution containing sodium periodate) (Anker, H.S., F.E.B.S. Lett., 7:293, 1970). DATD gels are therefore readily dissolved. Gels prepared with DATD as the crosslinker are highly transparent and bind strongly to glass. Another crosslinker with DATD-like properties that forms reversible gels is ethylene diacrylate (Choules, G.L. and Zimm, B.S., Anal. Biochem., 13:336-339, 1965). N,N'-bis(acryloyl)cystamine (BAC) is another crosslinker that can be used to form reversible polyacrylamide gels. Yet another crosslinker, which forms gels that dissolve in periodate, is N,N'-(1,2-dihydroxyethylene)bisacrylamide (DHEBA). Various other materials capable of forming reversible semi-solid supports can also be used. For example, thermoreversible polymers such as Pluronics (available from BASF) can be used. Pluronics are a family of poly(ethylene oxide)-poly(propylene oxide)-poly(ethylene oxide) (PEO-PPO-PEO) triblock copolymers (Nace, V.M. et al., Nonionic Surfactant, Marcel-Dekker, NY, 1996). These materials become semi-solid (gel) at elevated temperature (e.g., above room temperature) and liquefy on cooling. Pluronics can be chemically derivatized in various ways, e.g., to facilitate attachment of primers (see, e.g., Neff, J.A. et al., J. Biomed. Mater. Res., 40:511, 1998; Prud'homme, R.K. et al., Langmuir, 12:4651, 1996).

After the gel is dissolved, the microparticles can be collected and sequenced using repeated cycles of extension, ligation, and cleavage. Before sequencing, the microparticles can be arrayed in or on a second semi-solid support (e.g., at a higher density than that at which they were present in or on the first semi-solid support). The semi-solid support itself is supported by a substantially planar rigid substrate such as a glass slide.

Thus, two general approaches can be used to produce a semi-solid support having embedded in or on it an array of microparticles carrying clonal template populations. The first involves performing amplification on microparticles that are not present in a semi-solid support (e.g., by emulsion PCR) and then immobilizing the microparticles in or on a semi-solid support. The second involves immobilizing the microparticles in or on a semi-solid support and then performing amplification. In either case, steps may be needed to reduce microparticle clumping and/or to position the microparticles substantially in a single focal plane. For example, when particles are immobilized in a polyacrylamide gel, the monomer and crosslinker concentrations are chosen so that the particles settle to the bottom of the solution before polymerization is complete, so that they rest on the underlying flat substrate and thus lie in one plane. In certain embodiments of the invention, an object with a substantially flat surface, such as a coverslip, is placed on the liquid acrylamide (or other material capable of forming a semi-solid support) containing the microparticles, so that the acrylamide is held between the two layers of a "sandwich" structure. The sandwich is then inverted so that the microparticles settle under gravity and come to rest on the coverslip (or other object with a substantially flat surface). After polymerization, the coverslip is removed. The microparticles are thus embedded substantially in a single plane near the surface of the semi-solid support (e.g., tangent to that surface).

In certain embodiments of the invention, rather than immobilizing supports such as microparticles in a semi-solid matrix as described above, the microparticles are attached covalently or noncovalently to a substantially planar rigid substrate, without using a semi-solid support to immobilize them. Various methods are known in the art for attaching microparticles to substrates such as glass, plastic, quartz, and silicon. The substrate may or may not be coated (e.g., spin-coated) or functionalized with certain materials (e.g., various polymers) or with substances that promote attachment. The coating can be a thin film, a self-assembled monolayer, etc. The attachment can be made to the microparticle itself, to a moiety attached to the microparticle, or to an oligonucleotide (e.g., a template) attached to the microparticle.

In general, any pair of molecules that have mutual affinity and form a binding pair can be used to attach microparticles or templates to a substrate. The first member of the binding pair is attached covalently or noncovalently to the substrate, and the second member is attached covalently or noncovalently to the microparticle or template. The first binding member can be attached to the substrate through a linker, and the second binding member can be attached to the microparticle or template through a linker. For example, according to one approach, a glass slide or other suitable substrate is modified with an amine-reactive group (e.g., using a PEG linker containing an amine-reactive group). Under aqueous conditions (e.g., pH 8.0), the amine-reactive group reacts with amines such as the lysines in proteins (e.g., streptavidin). Microparticles functionalized with an amine-bearing moiety therefore become immobilized on the substrate. The amine-bearing moiety can be a protein or a suitably functionalized nucleic acid, such as a DNA template. Multiple moieties can be attached to a bead. For example, a bead can carry a protein that reacts with an NHS ester, to attach the bead to the substrate, and also a DNA template, which can be sequenced once the bead is attached to the substrate. Suitably coated slides bearing polymer tethers with an amine-reactive NHS moiety at one end are available from, e.g., Schott Nexterion, Schott North America, Inc., Elmsford, NY 10523. Alternatively, coated slides (e.g., biotin-coated slides) are available from Accelr8 Technology Corporation, Denver, CO. Their OptiChem™ technology represents one approach to attaching microparticles to a substrate. See, e.g., US Patent 6,844,028. Alternatively, polynucleotides on the beads can be functionalized with biotin, e.g., using terminal transferase with biotin-dideoxyATP and/or biotin-deoxyATP, and the beads then contacted with streptavidin-coated slides (available from, e.g., Accelr8 Technology Corporation, Denver, CO) under conditions that promote biotin-streptavidin binding, thereby attaching the microparticles to the substrate.

In general, nucleic acids such as oligonucleotide primers, probes, and templates can be modified by various methods known in the art to facilitate their attachment to microparticles or other supports or substrates. Likewise, microparticles or other supports can be modified by various methods known in the art to facilitate attachment of nucleic acids, attachment of the microparticles to a support or substrate, etc. Microspheres are available with surface chemistries that facilitate attachment of a desired functional group. Examples of these surface chemistries include, but are not limited to, amino groups (including aliphatic and aromatic amines), carboxylic acids, aldehydes, amides, chloromethyl groups, hydrazides, hydroxyl groups, sulfonates, and sulfates. These groups can react with groups present on nucleic acids, or the nucleic acids can be modified by attaching a reactive group. In addition, a large number of stable bifunctional groups are well known in the art, including homobifunctional and heterobifunctional linkers. See, e.g., the Pierce Chemical Technical Library, available at the Web site www.piercenet.com (originally published in the 1994-95 Pierce Catalog), and G.T. Hermanson, Bioconjugate Techniques, Academic Press, Inc., 1996. See also US Patent 6,632,655.

Microparticle arrays formed according to the methods described herein are generally random arrays. The terms "randomly patterned" and "random" as used herein refer to a non-ordered, non-Cartesian distribution of entities (features) on a support (in other words, not arranged at predetermined points along the x- and y-axes of a grid, or at defined 'clock positions', angles, or radii relative to the center of a radial pattern) that is not achieved through intentional design (or a program by which such a design could be achieved) or by placement of individual entities. Such a "randomly patterned" or "random" array of entities can be obtained by dropping, spraying, plating, spreading, distributing, etc., a solution, emulsion, aerosol, vapor, or dry preparation containing a pool of entities onto or into a support and allowing the entities to settle onto or into the support, without intervening in any way to direct them to specific sites in or on it. For example, the entities can be suspended in a solution containing a precursor of a semi-solid support (e.g., acrylamide monomer). The solution is then distributed on a second support, on which the semi-solid support forms. The entities become embedded in or on the semi-solid support. Non-random arrays can of course also be used. In general, the methods of forming arrays used herein differ from methods of synthesizing polynucleotides by sequential application of individual nucleotide subunits at predetermined locations on a substrate.

Figure 14B (top) shows a fluorescence image of a slide (1 inch × 3 inches) bearing a polyacrylamide gel. Beads (1 micron in diameter), carrying a fluorescently labeled oligonucleotide hybridized to the templates attached to them, are immobilized in the gel. The image shows a bead surface density (i.e., the number of beads per unit area of substrate in the region where beads are located) sufficient to image about 280 million beads per slide. The surface density and imageable area on one slide are sufficient to image at least 500 million beads. For example, Figure 14B (bottom) shows a schematic of a slide with a Teflon mask surrounding a clear area in which beads are embedded in a layer of semi-solid support such as polyacrylamide gel. The area of this mask is 864 mm². With 500 million beads, the surface density is 578,000 beads/mm². A close-packed hexagonal array of 1 micron beads contains 1,155,000 beads/mm², so this embodiment yields an array with 52% of the theoretical maximum density. It will be appreciated that smaller or larger numbers of beads, and lower or higher bead surface densities, than in this specific embodiment can be used.
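The density figures quoted for this embodiment can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes ideal touching spheres on a planar hexagonal lattice, where each bead occupies a rhombic cell of area (√3/2)·d². With the quoted inputs (500 million beads, 864 mm², 1 micron beads) this simple model gives a fraction of roughly 50% of the close-packed maximum, slightly below the 52% stated in the text, so the script prints the computed value rather than asserting the published one.

```python
import math


def hex_packing_density_per_mm2(bead_diameter_um: float) -> float:
    """Beads per mm^2 for touching spheres on a planar hexagonal lattice.

    Assumption: ideal close packing, one bead per rhombic unit cell
    of area (sqrt(3) / 2) * d^2.
    """
    d_mm = bead_diameter_um / 1000.0
    return 2.0 / (math.sqrt(3.0) * d_mm ** 2)


def surface_density_per_mm2(bead_count: float, area_mm2: float) -> float:
    """Average number of beads per mm^2 over the imaged area."""
    return bead_count / area_mm2


# Inputs taken from the paragraph above.
observed = surface_density_per_mm2(500e6, 864.0)
maximum = hex_packing_density_per_mm2(1.0)
fraction = observed / maximum

print(f"observed density : {observed:,.0f} beads/mm^2")
print(f"hexagonal maximum: {maximum:,.0f} beads/mm^2")
print(f"fraction of max  : {fraction:.1%}")
```

The same two functions can be reused to ask the inverse question, e.g., how many beads a given masked area would hold at a chosen fraction of the hexagonal maximum.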

Microparticles can be arrayed in or on a substantially planar semi-solid support, or another support or substrate, at a variety of densities, which can be defined in various ways. For example, density can be expressed as the number of microparticles (e.g., spherical microparticles) per unit area of a substantially planar array. In certain embodiments of the invention, the number of microparticles per unit area of the substantially planar array is at least 80% of the number in a hexagonal array ("hexagonal array" refers to a substantially planar array of microparticles in which each microparticle contacts at least six adjacent microparticles of equal area, as described in US Patent 6,406,848). In other embodiments of the invention, however, the density is lower, e.g., the number of microparticles per unit area of the substantially planar array is less than 80%, 70%, 60%, or 50% of the number in a hexagonal array. Without wishing to be bound by theory, lower densities such as these may be preferable so that reagents such as enzymes, primers, and cofactors can diffuse adequately, and to avoid the reagent partitioning effects that arise when certain reagents have a differential affinity for the microparticles or become trapped among them. Such effects can produce different reaction conditions at different locations in the array and may even prevent reagents from reaching some locations. These problems may be harder to manage when the reactions are performed in a flow cell, because the reagents pass through the flow cell in a directional manner. In certain embodiments of the invention, the chamber of the flow cell includes a mixing device, e.g., one that mixes the fluid by mechanical or acoustic means. Many suitable mixing devices are known in the art.

The sequencing methods of the invention can be practiced on templates arranged in arrays of all types, including random and non-random arrays, which can be arrays of microparticles or arrays of the templates themselves. For example, US Patent 5,641,658 and PCT Publication WO0018957 describe supports on which templates are arrayed. Arrays can be located on a variety of substrates such as filter paper, membranes (e.g., nylon), and metal surfaces. Other examples of array formats on which sequencing by repeated cycles of extension, ligation, and cleavage can be performed are arrays of beads located in wells at the terminal or distal end of individual optical fibers in a fiber optic bundle. Bead arrays and "arrays of arrays" are described, e.g., in US patents and publications 6,023,540; 6,429,027; 20040185483; and 2002187515; in PCT applications US98/05025 and US98/09163; and in PCT Publication WO0039587. Beads with attached templates can be arrayed as described herein. Amplification is preferably performed before the array is formed. Arrays formed on these substrates need not be substantially planar.

In other embodiments, PCR is performed on an array containing oligonucleotides attached to a substrate or support (see, e.g., US Patents 5,744,305; 5,800,992; 6,646,243 and related patents (Affymetrix); PCT Publications WO2004029586; WO03065038; WO03040410 (Nimblegen)). Typically, such oligonucleotides have a free 3' or 5' end. If necessary, the end can be modified, e.g., by adding a phosphate or OH group to the 3' end if none is present. Template molecules containing a region complementary to the oligonucleotides attached to the support or substrate are hybridized to the oligonucleotides, and PCR is performed in situ on the array, producing a clonal template population at each position of the array. The oligonucleotide attached to the array can serve as one of the amplification primers. The templates are then sequenced using the ligation-based methods described herein. Sequencing can also be performed on templates in arrays such as those described in US Publication 20030068629.

Other methods of preparing DNA arrays on surfaces can be used. For example, alkanethiols modified with terminal aldehyde groups can be used to prepare self-assembled monolayers (SAMs) on gold surfaces. The aldehyde groups of the monolayer can react with amine-modified oligonucleotides or other amine-bearing biomolecules to form a Schiff base, which can then be reduced to a stable secondary amine by treatment with sodium cyanoborohydride (Peelen and Smith, Langmuir, 21(1):266-71, 2005). PCR amplification of the template can then be performed. Alternatively, microparticles with attached clonal template populations can be attached to the surface by reacting amine groups on the microparticle or template, or an oligonucleotide attached to the particle, with the surface.

Another way to obtain microparticles with attached clonal template populations is the "solid phase cloning" approach described in US Patent 5,604,097, which uses oligonucleotide tags to sort polynucleotides onto microparticles so that only polynucleotides of the same sequence are attached to any particular microparticle.

In certain embodiments of the invention, sequencing by repeated cycles of extension, ligation, and cleavage is performed by diffusing the sequencing reagents (e.g., extension probes, ligase, phosphatase) into a semi-solid support such as a gel that contains clonal template populations immobilized in or on it, each clonal population being located in a spatially distinct region of the support. In certain embodiments, the templates are attached directly to the semi-solid support, as described above. In preferred embodiments, however, the templates are immobilized on a second support such as a microparticle, which is in turn immobilized in or on the semi-solid support, as described above.

As described in Example 1, the inventors have shown that robust ligation and cleavage can be performed on templates attached to beads immobilized in a polyacrylamide gel. The invention therefore provides a method of ligating a first polynucleotide to a second polynucleotide, comprising the steps of: (a) providing a first polynucleotide immobilized in or on a semi-solid support; (b) contacting the first polynucleotide with a second polynucleotide and a ligase; and (c) maintaining the first and second polynucleotides in the presence of the ligase under conditions suitable for ligation. Suitable conditions include a buffer, cofactors, temperature, time, etc., appropriate for the particular ligase used. In a preferred embodiment, the semi-solid support is a gel such as an acrylamide gel. In another preferred embodiment, the first polynucleotide is immobilized in or on the semi-solid support by being attached to a support such as a bead, which is itself immobilized in or on the semi-solid support, e.g., by being partially or completely embedded in the support matrix. Alternatively, the first polynucleotide can be attached directly to the semi-solid support through a linkage such as an acrydite moiety. The linkage can be covalent or noncovalent (e.g., through a biotin-avidin interaction). US Patent 6,511,803 describes various methods that can be used to attach nucleic acid molecules to polyacrylamide gels, a preferred support of the invention.

The invention also provides a method of cleaving a polynucleotide, comprising the steps of: (a) providing a polynucleotide immobilized in or on a semi-solid support, wherein the polynucleotide contains a scissile linkage; (b) contacting the polynucleotide with a cleavage agent; and (c) maintaining the polynucleotide in the presence of the cleavage agent under conditions suitable for cleavage. Suitable conditions include a buffer, temperature, time, etc., appropriate for the particular cleavage agent. In a preferred embodiment, the semi-solid support is a gel such as an acrylamide gel. In another preferred embodiment, the polynucleotide is immobilized in the semi-solid support by being attached to a support such as a bead, which is itself immobilized in the semi-solid support. Alternatively, the polynucleotide can be attached directly to the semi-solid support through a linkage such as an acrydite moiety. The linkage can be covalent or noncovalent (e.g., through a biotin-avidin interaction).

Macevicz discloses sequencing a single template of a particular sequence. He does not discuss the possibility of performing the method in parallel to sequence multiple templates of different sequence simultaneously. The inventors have recognized that, for efficient high-throughput sequencing, it is desirable to prepare a plurality of supports (e.g., beads) as described above, so that each support has templates of a particular sequence attached to it, and to perform the methods described herein simultaneously on the templates attached to each support. In certain embodiments of the method, the plurality of supports is arrayed in or on a planar substrate such as a slide. In certain embodiments, the supports are arrayed in or on a gel. The supports can be arranged randomly, i.e., the position of each support on the substrate is not predetermined. The supports need not be regularly spaced or arranged in ordered rows and columns. Preferably, the supports are arrayed at a density that allows individual signals from many or most of the supports to be detected. In certain preferred embodiments, the supports are distributed predominantly in a single focal plane. Multiple supports bearing templates of identical sequence can be included, e.g., for quality control. Sequencing reactions are performed in parallel on the templates attached to each support.

Signals can be collected in a variety of ways, including various imaging modalities. Preferably, in embodiments in which sequencing is performed on microparticles that are arrayed on a substrate before detection (e.g., beads embedded in a semi-solid support on the substrate), the imaging device has a resolution of 1 μm or less. For example, a scanning microscope fitted with a CCD camera of sufficient resolution, or a microarray scanner, can be used. Alternatively, the beads can be passed through a flow cell or fluidics workstation attached to a microscope equipped for fluorescence detection. Other methods of collecting signals include fiber optic bundles. Suitable image acquisition and processing software can be used.

In certain embodiments of the invention, sequencing is performed in a microfluidic device. For example, beads with attached templates can be loaded into the device and reagents flowed through it. Template synthesis by PCR can also be performed in the device. Examples of suitable microfluidic devices are described in US Patent 6,632,655.

D. Sequencing by reinitiation with different initializing oligonucleotides

In a preferred embodiment of the invention, after a sufficient number of cycles, the extended strand produced by extending the first initializing oligonucleotide is removed from the template, a second initializing oligonucleotide is annealed to the binding region, and cycles of extension, ligation, and detection are then performed. The process is repeated with any number of different initializing oligonucleotides. In embodiments in which the extension probes are cleaved, the number of different initializing oligonucleotides used (and thus the number of reactions) is preferably equal to the length of the portion of the extension probe that remains hybridized to the template after the distal portion of the probe is released. According to this embodiment, therefore, sequence information (e.g., the order and identity of each nucleotide) can be obtained from templates attached to a single support, while still reading deep into the sequence with far fewer cycles than would be needed if consecutive nucleotides were identified in each cycle.
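The relationship between the number of initializing (starting) oligonucleotides and the length of the retained probe portion can be illustrated with a small position-bookkeeping sketch. It rests on assumptions that the paragraph does not spell out: each cycle is taken to identify one template position and then advance the duplex by `step` nucleotides (the length of the probe portion left hybridized after cleavage), and successive initializing oligonucleotides are taken to start one nucleotide apart. The function names and the 0-based position numbering are hypothetical.

```python
from typing import List


def interrogated_positions(offset: int, step: int, cycles: int) -> List[int]:
    """Template positions read in one reaction.

    offset: start position set by the initializing oligonucleotide (assumed).
    step:   nucleotides by which the duplex advances per cycle, i.e. the
            length of the probe portion retained after cleavage.
    cycles: number of extension/ligation/cleavage cycles performed.
    """
    return [offset + step * c for c in range(cycles)]


def covered_positions(step: int, cycles: int) -> List[int]:
    """Union of positions read by `step` reactions with offsets 0..step-1."""
    covered = set()
    for offset in range(step):
        covered.update(interrogated_positions(offset, step, cycles))
    return sorted(covered)


# Example: a retained portion of 5 nucleotides and 4 cycles per reaction.
for offset in range(5):
    print(offset, interrogated_positions(offset, 5, 4))
print(covered_positions(5, 4) == list(range(20)))
```

Under these assumptions, five reactions of four cycles each cover 20 contiguous positions, and each reaction on its own reaches 15 or more positions into the template after only four cycles, which is the sense in which the read extends deep into the sequence with few cycles.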

Embodiments in which initializing oligonucleotides are bound sequentially to the same template have certain advantages over methods that require dividing the template into multiple aliquots, such as the method described by Macevicz. For example, applying the initializing oligonucleotides to the same template eliminates the need to track, and subsequently combine, data obtained from multiple aliquots. In embodiments in which the supports are arrayed randomly, so that the positions of individual supports cannot be predetermined, it may be difficult or impossible to reliably combine partial sequence information from multiple supports, each having templates of identical sequence attached.

E. Identifying multiple nucleotides on a template in each cycle

Macevicz describes identifying one nucleotide of the template in each cycle of extension, ligation, and detection. The present inventors have recognized, however, that the method can be modified so that multiple nucleotides of the template are identified in each cycle. In this case the extension probes are labeled so that the identity of two or more (preferably contiguous) nucleotides adjacent to the extended duplex can be determined from the label. In other words, the sequence-determining portion of the extension probe is more than one nucleotide long and generally comprises the proximal nucleotide, the nucleotide immediately adjacent to it, and possibly one or more additional (preferably contiguous) nucleotides, all of which hybridize specifically to the template. For example, rather than using 4 labels to identify the bases A, G, C, and T, 16 distinguishably labeled probes or probe combinations can be used to identify the 16 possible dinucleotides AA, AG, AC, AT, GA, GG, GC, GT, CA, CG, CC, CT, TA, TG, TC, and TT. The sequence-determining portion of each distinguishably labeled extension probe is complementary to one of these dinucleotides. A similar approach employing more labels allows longer nucleotide sequences to be identified in each cycle.
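The label count described above can be illustrated with a short Python sketch. It enumerates the 16 dinucleotides in the order listed in the text and pairs each with a complementary sequence-determining portion. The function names are illustrative assumptions, and the complement is taken base by base without regard to strand orientation, so this is only a sketch of the counting argument and not a probe design procedure.

```python
from itertools import product

BASES = "AGCT"  # order used in the text: A, G, C, T
COMPLEMENT = {"A": "T", "G": "C", "C": "G", "T": "A"}


def template_words(n):
    """Return every possible run of n template nucleotides."""
    return ["".join(p) for p in product(BASES, repeat=n)]


def probe_portion(word):
    """Base-by-base complement of a template word (orientation ignored)."""
    return "".join(COMPLEMENT[b] for b in word)


def labels_required(n):
    """Distinguishable labels needed to identify n nucleotides per cycle."""
    return len(BASES) ** n


dinucleotides = template_words(2)
probe_table = {w: probe_portion(w) for w in dinucleotides}
```

Under these assumptions, identifying two nucleotides per cycle calls for 16 distinguishable labels and identifying three calls for 64, which is why the number of labels grows quickly with the length of the sequence-determining portion.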

F. Labels

In a broad sense, the term "label" as used herein refers to any detectable moiety, or plurality of detectable moieties, attached to a probe that can be used to distinguish between different types of probes (e.g., probes having different terminal nucleotides). Thus there is not necessarily a one-to-one correspondence between a label and a particular detectable moiety. For example, multiple detectable moieties can be attached to a single probe, producing a combined signal that distinguishes that probe from probes bearing a different detectable moiety or set of detectable moieties. For example, combinations of detectable moieties can be used according to the labeling scheme known as "combinatorial multicolor coding", described in US Patent 6,632,609 and in Speicher et al., Nature Genetics, 12:368-375, 1996.

Probes of the invention can be labeled in a variety of ways, including by direct or indirect attachment of fluorescent or chemiluminescent moieties, colorimetric moieties, enzymatic moieties that generate a detectable signal when contacted with a substrate, and the like. Macevicz teaches that probes can be labeled with fluorescent dyes, e.g., as disclosed by Menchen et al., US Patent 5,188,934, and Begot et al., PCT application PCT/US90/05565. The terms "fluorescent dye" and "fluorophore" as used herein refer to moieties that absorb light energy at a defined excitation wavelength and emit light energy at a different wavelength. Preferably, the labels selected for use with a given mixture of probes are spectrally resolvable. As used herein, "spectrally resolvable" means that the labels can be distinguished under operating conditions on the basis of their spectral characteristics, in particular their fluorescence emission wavelengths. For example, the identity of one or more terminal nucleotides may be correlated with a distinct wavelength of maximum emission intensity, or with a distinct ratio of intensities at different wavelengths. The spectral characteristic of a label that is used to detect and identify it is referred to herein as its "color". It will be appreciated that a label is often identified by a specific spectral characteristic, e.g., the frequency of maximum emission intensity when the label consists of a single detectable moiety, or the frequencies of the emission peaks when the label consists of multiple detectable moieties.

Preferably, four probes are provided, with four spectrally resolvable fluorescent dyes each corresponding one-to-one with the four possible terminal nucleotides of the probes. Sets of spectrally resolvable dyes are disclosed in US Patents 4,855,225 and 5,188,934; International Application PCT/US90/05565; and Lee et al., Nucleic Acids Research, 20:2471-2483 (1992). In certain embodiments, a dye set consisting of FITC, HEX™, Texas Red, and Cy5 is preferred. Many suitable dyes are commercially available, e.g., from Molecular Probes, Inc., Eugene, OR. Specific examples of fluorescent dyes include, but are not limited to: Alexa Fluor dyes (Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 633, Alexa Fluor 660, and Alexa Fluor 680), AMCA, AMCA-S, BODIPY dyes (BODIPY FL, BODIPY R6G, BODIPY TMR, BODIPY TR, BODIPY 530/550, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665), CAL dyes, carboxyrhodamine 6G, carboxy-X-rhodamine (ROX), Cascade Blue, Cascade Yellow, cyanine dyes (Cy3, Cy5, Cy3.5, Cy5.5), dansyl, Dapoxyl, dialkylaminocoumarin, 4',5'-dichloro-2',7'-dimethoxy-fluorescein, DM-NERF, eosin, erythrosin, fluorescein, FAM, hydroxycoumarin, IRD dyes (IRD 40, IRD 700, IRD 800), JOE, Lissamine rhodamine B, Marina Blue, methoxycoumarin, naphthofluorescein, Oregon Green 488, Oregon Green 500, Oregon Green 514, Oyster dyes, Pacific Blue, PyMPO, pyrene, rhodamine 6G, rhodamine green, rhodamine red, Rhodol Green, 2',4',5',7'-tetrabromosulfone-fluorescein, tetramethyl-rhodamine (TMR), carboxytetramethylrhodamine (TAMRA), Texas Red, and Texas Red-X. For further description see The Handbook of Fluorescent Probes and Research Products, 9th Edition, Molecular Probes, Inc.

In non-radiative fluorescence resonance energy transfer (FRET), some fluorescent groups transfer their energy to another group, and the detectable signal is generated by the second group rather than by direct detection of the first. The use of quenchers is thus also within the scope of the invention. The term "quencher" refers to a moiety that, when in proximity to an excited fluorescent label, can absorb the energy of that label and dissipate it without emitting visible light. Examples of quenchers include, but are not limited to: DABCYL (4-(4'-dimethylaminophenylazo)benzoic acid) succinimidyl ester, diarylrhodamine carboxylic acid succinimidyl ester (QSY-7), and 4',5'-dinitrofluorescein carboxylic acid succinimidyl ester (QSY-33) (all available from Molecular Probes); quencher 1 (Q1; available from Epoch); and the "Black Hole Quenchers" BHQ-1, BHQ-2, and BHQ-3 (available from BioSearch, Inc.).

In addition to the various detectable moieties described above, the invention also contemplates the use of spectrally resolvable quantum dots, metal nanoparticles, nanoclusters, and the like, which can be attached directly to the oligonucleotide probe, or embedded in or attached to a polymer matrix that is in turn attached to the probe. As noted above, the detectable moiety need not itself be directly detectable. For example, it may react with a substrate in order to be detected, or it may require modification before it becomes detectable.

As noted above, in certain embodiments of the invention a label consists of multiple detectable moieties. The combined signal of these detectable moieties produces the color used to identify the probe. For example, a "purple" probe of a particular sequence can be constructed by attaching both a "blue" and a "red" detectable moiety to it. Alternatively, a mixed probe with a distinct color can be produced by mixing two probes that have the same sequence but are labeled with different detectable moieties. Thus a "purple" probe of a particular sequence can be produced by constructing two probes having that sequence, attaching a "red" detectable moiety to the first and a "blue" detectable moiety to the second, and mixing aliquots of the two probes. Different shades of purple can be produced by mixing the aliquots in different ratios. This approach offers several advantages. First, it allows many distinguishable probes to be produced from a smaller number of detectable moieties. Second, the use of mixed probes provides a degree of degeneracy that may help reduce bias arising from interactions between particular detectable moieties and particular nucleotides.
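The first advantage can be made concrete with a small Python sketch that counts how many distinguishable labels can be built from a given set of detectable moieties. It considers only which moieties are present in a label, assumes every such combination is resolvable in practice, and ignores the additional shades obtainable by varying mixing ratios. The function names are illustrative assumptions.

```python
from itertools import combinations


def moiety_sets(moieties):
    """All non-empty combinations of the given detectable moieties."""
    return [c for r in range(1, len(moieties) + 1)
            for c in combinations(moieties, r)]


def moieties_needed(n_probe_types):
    """Smallest number of moieties whose non-empty subsets cover the probes."""
    m = 1
    while 2 ** m - 1 < n_probe_types:
        m += 1
    return m


# "blue", "red", and the combined "purple" label (blue + red).
colors = moiety_sets(["blue", "red"])
```

With m moieties there are 2^m - 1 presence/absence combinations, so under these assumptions five moieties would suffice to distinguish 16 probe types, whereas a one-to-one scheme would need 16.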

In certain embodiments of the invention, the detectable moiety is attached to a nucleotide of the oligonucleotide extension probe by a cleavable linkage, so that the detectable moiety can be removed after ligation and detection. A variety of cleavable linkages can be used. As used herein, the term "cleavable linkage" refers to a chemical moiety that joins a detectable moiety to a nucleotide and that can be cleaved when desired to remove the detectable moiety without substantially altering the nucleotide or nucleic acid molecule to which it is attached. Depending on the nature of the linkage, cleavage can be accomplished by, for example, acid or base treatment, oxidation or reduction of the linkage, or treatment with light (photocleavage). For examples of cleavable linkages and cleavage agents see Shimkus et al., 1985, Proc. Natl. Acad. Sci. USA 82:2593-2597; Soukup et al., 1995, Bioconjug. Chem. 6:135-138; Shimkus et al., 1986, DNA 5:247-255; and Herman and Fenn, 1990, Meth. Enzymol. 184:584-588.

For example, as described in US Patent 6,511,803, a disulfide linkage can be reduced, and thereby cleaved, with a thiol reducing agent such as dithiothreitol (DTT). Fluorophores containing a sulfhydryl (SH) group that can be coupled to nucleotides bearing a reactive arylamino group (e.g., dCTP) are available (e.g., SH-containing cyanine 5 or cyanine 3 fluorophores; New England Nuclear-DuPont). Reactive pyridyldithiols react with sulfhydryl groups to give a sulfhydryl bond that can be cleaved with reducing agents such as dithiothreitol. An NHS-ester heterobifunctional crosslinker (Pierce) can be used to attach a pyridyldithiol group to a deoxynucleotide bearing a reactive arylamino group; the pyridyldithiol group then reacts with the SH group on the fluorophore to yield a disulfide-linked, cleavable nucleotide-fluorophore complex for use in the methods of the invention. Alternatively, a cis-diol linkage between the nucleotide and the fluorophore can be cleaved with periodate. Various cleavable linkages are described in US Patents 6,664,079 and 6,632,655, US Published Application 20030104437, WO 04/18497, and WO 03/48387.

In other embodiments of the invention, detectable moieties are used that can be rendered undetectable by exposure to electromagnetic energy such as light (photobleaching).

In embodiments of the invention that use extension probes bearing a label attached by a cleavable linkage, or a label that can be photobleached, the sequencing method generally includes a cleavage or photobleaching step in one or more cycles after ligation and label detection have been performed. As noted above, cleavage of the scissile linkage in the oligonucleotide extension probe may not proceed to completion (i.e., fewer than 100% of the newly ligated probes may be cleaved in the cycle in which they were ligated). Because such probes typically contain a non-extendable moiety or are capped, they cannot take part in subsequent cycles. However, failure to cleave the probe means that the label remains attached to the template molecule to which the probe is ligated, which produces a background signal (i.e., background fluorescence) and may increase noise in subsequent cycles. Adding a cleavage or photobleaching step that removes the label or renders it undetectable reduces this background and improves the signal-to-noise ratio. Cleavage or photobleaching can be performed in every cycle, or less often, e.g., once every two cycles, every three cycles, or every five or more cycles. In certain embodiments of the invention it is not actually necessary to add a separate step to cleave the cleavable linker. For example, a cleavage agent such as DTT may already be present in the wash buffer used to remove unligated extension probes.

G. Preferred scissile linkages

The inventors have found that extension probes containing at least one phosphorothiolate linkage are particularly useful in methods of sequencing by successive cycles of extension, ligation, detection, and cleavage. In such a linkage, one of the bridging oxygen atoms of the phosphodiester bond is replaced by a sulfur atom. The phosphorothiolate linkage can be a 5'-S-phosphorothiolate linkage (3'-O-P-S-5'), as shown in Figure 4A, or a 3'-S-phosphorothiolate linkage (3'-S-P-O-5'), as shown in Figure 4B. It will be appreciated that the phosphorus atom in a linkage written as 3'-O-P-S-5' or 3'-S-P-O-5' can be attached to two non-bridging oxygen atoms, as shown in Figures 4A and 4B (as in a typical phosphodiester bond). Alternatively, the phosphorus atom can be attached to various other atoms or groups, e.g., S, CH₃, BH₃, etc. Accordingly, one aspect of the invention is a labeled oligonucleotide probe containing a phosphorothiolate linkage. Although such probes are particularly useful in the sequencing methods described herein, they can also be used for a variety of other purposes. Specifically, the invention provides (i) oligonucleotides of the form 5'-O-P-O-X-O-P-S-(N)_kN_B*-3'; and (ii) oligonucleotides of the form 5'-N_B*(N)_k-S-P-O-X-3'. In these probes, N represents any nucleotide, N_B represents a moiety that cannot be extended by ligase, * represents a detectable moiety, X represents a nucleotide, and k is 1-100. In certain embodiments k is 1-50, 1-30, or 1-20, e.g., 4-10, with the proviso that a detectable moiety may be present on any nucleotide of (N)_k instead of, or in addition to, N_B. The terminal nucleotides of these probes may or may not include a phosphate group or a hydroxyl group. It should also be understood that in preferred embodiments the phosphorus atom is typically attached to two additional (non-bridging) oxygen atoms.

Methods for synthesizing oligonucleotides containing 5'-S-phosphorothiolate or 3'-S-phosphorothiolate linkages are known in the art, and some of them are suitable for automated solid-phase oligonucleotide synthesis. For synthetic methods see, e.g.: Cook, AF, J. Am. Chem. Soc., 92:190-195, 1970; Chladek, S. et al., J. Am. Chem. Soc., 94:2079-2084, 1972; Rybakov, VN et al., Nucleic Acids Res., 9:189-201, 1981; Cosstick, R. and Vyle, JS, J. Chem. Soc. Chem. Commun., 992-992, 1988; Mag, M. et al., Nucleic Acids Res., 19(7):1437-1441, 1991; Xu, Y and Kool, ET, Nucleic Acids Res., 26(13):3159-3164, 1998; Cosstick, R. and Vyle, JS, Tetrahedron Lett., 30:4693-4696, 1989; Cosstick, R. and Vyle, JS, Nucleic Acids Res., 18:829-835, 1990; Sun, SG and Piccirilli, JA, Nucl. Nucl., 16:1543-1545, 1997; Sun, SG et al., RNA, 3:1352-1363, 1997; Vyle, JS et al., Tetrahedron Lett., 33:3017-3020, 1992; Li, X. et al., J. Chem. Soc. Perkin Trans., 1:2123-2129, 1994; Liu, XH and Reese, CB, Tetrahedron Lett., 37:925-928, 1996; Weinstein, LB et al., J. Am. Chem. Soc., 118:10341-10350, 1996; and Sabbagh, G. et al., Nucleic Acids Res., 32(2):495-501, 2004. In addition, the inventors have developed new synthetic methods. For example, Figure 7 shows a scheme for synthesizing a 3'-phosphoramidite of dA. A similar scheme can be used to synthesize a 3'-phosphoramidite of dG. These phosphoramidites can be used to synthesize oligonucleotides containing 3'-S-phosphorothiolate linkages adjacent to purine nucleosides, e.g., on an automated DNA synthesizer.

Phosphorothiolate linkages can be cleaved with a variety of metal-containing agents. The metal can be, for example, Ag, Hg, Cu, Mn, Zn, or Cd. Preferably, the agent is a water-soluble salt that provides Ag⁺, Hg⁺⁺, Cu⁺⁺, Mn⁺⁺, Zn⁺, or Cd⁺ cations (salts that provide ions in other oxidation states can also be used). I₂ can also be used. Silver-containing salts such as silver nitrate (AgNO₃), or other salts that provide Ag⁺ ions, are particularly preferred. Suitable conditions include, for example, 50 mM AgNO₃ at about 22-37°C for 10 minutes or longer, e.g., 30 minutes. Preferably, the pH is 4.0-10.0, more preferably 5.0-9.0, e.g., about 6.0-8.0, e.g., about 7.0. See, e.g., Mag, M. et al., Nucleic Acids Res., 19(7):1437-1441, 1991. Example 1 provides an exemplary protocol.

Extension probes containing a 3'-O-P-S-5' linkage can be used to sequence in the 5'→3' direction. Figure 5A shows one cycle of hybridization, ligation, and cleavage using an extension probe of the form 5'-O-P-O-X-O-P-S-NN_B*-3', in which N represents any nucleotide, N_B represents a moiety that cannot be extended by ligase (e.g., N_B is a nucleotide that lacks a 3' hydroxyl group or has a blocking moiety attached), * represents a detectable moiety, and X represents a nucleotide whose identity corresponds to the detectable moiety. Alternatively, a bulky blocking moiety can be attached to the 3' terminal nucleotide to prevent multiple ligations. For example, attaching a bulky group at, e.g., the 2' or 3' position of the sugar moiety of the nucleotide will prevent ligation. A fluorescent label can serve as a suitable bulky group.

A template containing a binding region 40 and a polynucleotide region 50 of unknown sequence is attached to a support such as a bead. In a preferred embodiment, as shown in Figure 5A, the binding region is located at the end of the template opposite its point of attachment to the support. An initializing oligonucleotide 30 having an extendable terminus (in this case a free 3' OH group) is annealed to the binding region 40. An extension probe 60 hybridizes to the polynucleotide region 50 of the template. Nucleotide X forms a complementary base pair with the unknown nucleotide Y in the template. The extension probe 60 is ligated to the initializing oligonucleotide (e.g., using T4 ligase). After ligation, the label (not shown) attached to the extension probe 60 is detected. The label corresponds to the identity of nucleotide X, so nucleotide Y is identified as the nucleotide complementary to X. The extension probe 60 is then cleaved at the phosphorothiolate linkage (e.g., with AgNO₃ or another salt that provides Ag⁺ ions), yielding an extended duplex. Cleavage leaves a phosphate group at the 3' end of the extended duplex. Treatment with phosphatase generates an extendable probe terminus on the extended duplex. The process is repeated for the desired number of cycles.

In a preferred embodiment, sequencing is performed in the 3'→5' direction using extension probes containing a 3'-S-P-O-5' linkage. Figure 5B shows one cycle of hybridization, ligation, and cleavage using an extension probe of the form 5'-N_B*-NNNN-S-P-O-X-3', in which N represents any nucleotide, N_B represents a moiety that cannot be extended by ligase (e.g., N_B is a nucleotide that lacks a 5' phosphate group or has a blocking moiety attached), * represents a detectable moiety, and X represents a nucleotide whose identity corresponds to the detectable moiety.

A template containing a binding region 40 and a polynucleotide region 50 of unknown sequence is attached to a support such as a bead. In a preferred embodiment, as shown in Figure 5B, the binding region is located at the end of the template opposite its point of attachment to the support. An initializing oligonucleotide 30 having an extendable terminus (in this case a free 5' phosphate group) is annealed to the binding region 40. An extension probe 60 hybridizes to the polynucleotide region 50 of the template. Nucleotide X forms a complementary base pair with the unknown nucleotide Y in the template. The extension probe 60 is ligated to the initializing oligonucleotide (e.g., using T4 ligase). After ligation, the label (not shown) attached to the extension probe 60 is detected. The label corresponds to the identity of nucleotide X, so nucleotide Y is identified as the nucleotide complementary to X. The extension probe 60 is then cleaved at the phosphorothiolate linkage (e.g., with AgNO₃ or another salt that provides Ag⁺ ions), yielding an extended duplex. Cleavage leaves an extendable monophosphate group at the 5' end of the extended duplex, so no additional step is needed to generate an extendable terminus. The process is repeated for the desired number of cycles.
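The logic of the cycle just described, in which the detected label gives X and X in turn gives Y, can be sketched as an idealized Python simulation. The assignment of the four dyes to nucleotides is hypothetical (the dye names are taken from the preferred dye set mentioned earlier), ligation fidelity and cleavage are assumed to be perfect, and the template region is written in the order in which it is read, starting next to the initializing oligonucleotide.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical assignment of the four dyes to the probe nucleotide X.
DYE_FOR_X = {"A": "FITC", "C": "HEX", "G": "Texas Red", "T": "Cy5"}
X_FOR_DYE = {dye: base for base, dye in DYE_FOR_X.items()}


def run_cycles(template_region, n_cycles):
    """Return the dye observed in each cycle of an idealized reaction.

    In every cycle the probe whose nucleotide X pairs with the next
    unread template nucleotide Y is ligated and its dye is detected;
    cleavage then leaves X behind, so the duplex grows by one position.
    """
    observed = []
    for y in template_region[:n_cycles]:
        x = COMPLEMENT[y]  # only the matching probe is ligated
        observed.append(DYE_FOR_X[x])
    return observed


def call_template(observed_dyes):
    """Translate the detected dyes back into template nucleotides Y."""
    return "".join(COMPLEMENT[X_FOR_DYE[d]] for d in observed_dyes)


dyes = run_cycles("GATTC", 5)
called = call_template(dyes)
```

In this sketch each cycle identifies the next adjacent template position, which corresponds to a probe in which cleavage leaves only X on the extended duplex.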

It will be appreciated that many variations of this scheme can be used. For example, the probe can be shorter or longer than 6 nucleotides; the label need not be on the 3' terminal nucleotide; the P-S linkage can be located between any two adjacent nucleotides; and so on. In the embodiments described above, successive cycles of extension, ligation, detection, and cleavage identify nucleotides at adjacent positions. However, if the P-S linkage is positioned closer to the distal end of the extension probe (i.e., the end opposite that at which ligation occurs), the successively identified nucleotides are spaced at intervals along the template, as described above and shown in Figures 1 and 6.

Figures 6A-6F are a more detailed schematic of several sequencing reactions performed in succession on one template. Sequencing is performed in the 3'→5' direction using extension probes containing a 3'-S-P-O-5' linkage. Each sequencing reaction comprises multiple cycles of extension, ligation, detection, and cleavage. The reactions use initializing oligonucleotides that bind to different portions of the template. The extension probes are 8 nucleotides long and contain a phosphorothiolate linkage between the 6th and 7th nucleotides, counting from the 3' end of the probe. Nucleotides 2-6 serve as a spacer, so that each reaction identifies multiple nucleotides spaced at intervals along the template. The complete sequence of a portion of the template is determined by performing multiple reactions in succession and appropriately combining the partial sequence information obtained from each.

Figure 6A shows initialization with a first initializing oligonucleotide (referred to as a primer in Figures 6A-6F) that hybridizes to an adapter sequence in the template (referred to above as the binding region) to provide an extendable duplex. Figures 6B-6D show several cycles of nucleotide identification in which every 6th base of the template is read. In Figure 6B, a first extension probe whose 3' terminal nucleotide is complementary to the first unknown nucleotide of the template sequence binds to the template and is ligated to the extendable terminus of the primer. The label attached to the extension probe identifies the 3' terminal nucleotide of the probe as A, thereby identifying the first unknown nucleotide of the template sequence as the complementary nucleotide T. Figure 6C shows cleavage of the extension oligonucleotide at the phosphorothiolate linkage with AgNO₃, releasing the portion of the extension probe to which the label is attached. Figure 6D shows additional cycles of extension, ligation, and cleavage. Because the probe contains a spacer 5 nucleotides long, this sequencing reaction identifies every 6th nucleotide of the template.

After the desired number of cycles, the extended strand comprising the first initializing oligonucleotide is removed, and a second initializing oligonucleotide, which binds to a portion of the binding region different from that bound by the first, is hybridized to the template. Figure 6E shows a second sequencing reaction in which initialization with the second initializing oligonucleotide is followed by several cycles of nucleotide identification. Figure 6F shows initialization with a third initializing oligonucleotide, followed by several cycles of nucleotide identification. Extension from the second initializing oligonucleotide identifies every 6th base in a "reading frame" different from that of the nucleotides identified in the first sequencing reaction.
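The way the reading frames interleave can be sketched in Python. The sketch assumes that 6 probe nucleotides remain on the duplex after each cleavage (the 8-nucleotide probe cleaved between its 6th and 7th nucleotides) and that each successive initializing oligonucleotide ends one position further back in the binding region than the previous one. The `recess` parameter and the function names are illustrative assumptions, not details taken from the figures. Positions are numbered from 1 at the first unknown nucleotide, so positions below 1 lie in the known binding region.

```python
def positions_read(retained_length, n_cycles, recess=0):
    """Template positions identified in one sequencing reaction.

    retained_length: probe nucleotides left on the duplex after cleavage.
    recess: how many positions the end of the initializing oligonucleotide
            lies behind the start of the unknown region (assumed parameter).
    """
    return [1 - recess + retained_length * c for c in range(n_cycles)]


def merge_reads(template, retained_length, n_cycles):
    """Combine idealized partial reads from every reading frame."""
    calls = {}
    for recess in range(retained_length):
        for pos in positions_read(retained_length, n_cycles, recess):
            if 1 <= pos <= len(template):  # skip binding-region positions
                calls[pos] = template[pos - 1]
    return "".join(calls[p] for p in sorted(calls))


first_reaction = positions_read(6, 4)             # [1, 7, 13, 19]
second_reaction = positions_read(6, 4, recess=1)  # [0, 6, 12, 18]
merged = merge_reads("ACGT" * 6, 6, 4)
```

Under these assumptions, six reactions of four cycles each (24 ligations in total) cover the first 19 unknown positions without gaps, and the deepest position is reached after only four cycles in the first reaction.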

Although extension probes containing phosphorothiolate linkages are preferred in certain embodiments of the invention, a variety of other scissile linkages are also suitable. For example, many variations on the O-P-O linkage found in naturally occurring nucleic acids are known (see, e.g., Micklefield, J., Curr. Med. Chem., 8:1157-1179, 2001). Any of the structures described therein that contain a P-O bond can be modified to contain a scissile P-S bond. For example, an NH-P-O linkage can be changed to an NH-P-S linkage.

In some embodiments of the invention, the extension probe contains a trigger residue that, optionally after modification by a modifying agent, renders the nucleic acid susceptible to cleavage by a cleavage agent or combination of cleavage agents. In particular, the inventors have found that enzymes involved in DNA repair are advantageous cleavage reagents for sequencing by successive cycles of extension, ligation, detection, and cleavage. In general, the presence of a trigger residue such as a damaged base or an abasic residue in the extension probe renders the probe susceptible to cleavage by one or more DNA repair enzymes, optionally after modification by a DNA glycosylase. Extension probes containing a linkage that is a substrate for cleavage by an enzyme involved in DNA repair, such as an AP endonuclease, are therefore useful in the invention. Extension probes containing a residue that is a substrate for modification by an enzyme involved in DNA repair, such as a DNA glycosylase, where the modification renders the probe susceptible to cleavage by an AP endonuclease, are also particularly useful. In some embodiments, the extension probe contains an abasic residue, i.e., a residue that lacks a purine or pyrimidine base. The linkage between an abasic residue and an adjacent nucleoside is susceptible to cleavage by an AP endonuclease and is therefore a scissile linkage. In certain embodiments of the invention, the abasic residue comprises 2' deoxyribose. In some embodiments, the extension probe comprises a damaged base. The damaged base is a substrate for an enzyme that removes damaged bases, such as a DNA glycosylase. After the damaged base is removed, the linkage between the resulting abasic residue and the adjacent nucleoside is susceptible to cleavage by an AP endonuclease and is therefore considered a scissile linkage for purposes of the invention.

Many different AP endonucleases can be used as cleavage reagents in the invention. Two major classes of AP endonuclease are distinguished by the mechanism by which they cleave the linkage adjacent to an abasic residue. Class I AP endonucleases, such as E. coli endonuclease III (Endo III) and endonuclease VIII (Endo VIII) and the human homologs hNTH1, NEIL1, NEIL2, and NEIL3, are AP lyases that cleave the DNA on the 3' side of the AP residue; this cleavage yields a 5' portion bearing a 3' terminal phosphate and a 3' portion bearing a 5' terminal phosphate. Class II AP endonucleases, such as E. coli endonuclease IV (Endo IV) and exonuclease III (Exo III), cleave the DNA on the 5' side of the AP site; this cleavage yields a 3' OH and a 5' deoxyribose phosphate moiety at the ends of the resulting fragments. See, e.g., Doublie, S. et al., Proc. Natl. Acad. Sci., 101(28), 10284-10289, 2004; Haltiwanger, B.M. et al., Biochem. J., 345, 85-89, 2000; Levin, J. and Demple, B., Nucl. Acids Res., 18(17), 1990; and the references cited in each of the foregoing, for further discussion of the various class I and class II AP endonucleases and of the conditions under which they remove damaged bases from DNA and/or cleave DNA containing abasic residues. One of ordinary skill in the art will appreciate that various homologs of these enzymes exist in other organisms (e.g., yeast) and can be used in the invention.

Certain enzymes are bifunctional: they have both a glycosylase activity, which removes a damaged base to generate an AP residue, and a lyase activity, which cleaves the phosphodiester backbone 3' to the AP site generated by the glycosylase activity. These dual-activity enzymes are therefore both AP endonucleases and DNA glycosylases. For example, Endo VIII acts as both an N-glycosylase and an AP lyase. The N-glycosylase activity releases damaged pyrimidines from double-stranded DNA, generating an apurinic/apyrimidinic (AP) site. The AP lyase activity cleaves 3' and 5' to the AP site, leaving a 5' phosphate and a 3' phosphate. Damaged bases recognized and excised by endonuclease VIII include urea, 5,6-dihydroxythymine, thymine glycol, 5-hydroxy-5-methylhydantoin, uracil glycol, 6-hydroxy-5,6-dihydrothymine, and methyltartronylurea. See, e.g., Dizdaroglu, M. et al., Biochemistry, 32, 12105-12111, 1993; Hatahet, Z. et al., J. Biol. Chem., 269, 18814-18820, 1994; Jiang, D. et al., J. Biol. Chem., 272(51), 32220-32229, 1997; and Jiang, D. et al., J. Bact., 179(11), 3773-3782, 1997.

Fpg (formamidopyrimidine [fapy]-DNA glycosylase), also known as 8-oxoguanine DNA glycosylase, likewise acts as both an N-glycosylase and an AP lyase. The N-glycosylase activity releases damaged purines from double-stranded DNA, generating an apurinic (AP) site. The AP lyase activity cleaves both 3' and 5' to the AP site, thereby removing the AP site and leaving a 1-base gap. Damaged bases recognized and removed by Fpg include 7,8-dihydro-8-oxoguanine (8-oxoguanine), 8-oxoadenine, fapy-guanine, methyl-fapy-guanine, fapy-adenine, aflatoxin B1-fapy-guanine, 5-hydroxy-cytosine, and 5-hydroxy-uracil. See, e.g., Tchou, J. et al., J. Biol. Chem., 269, 15318-15324, 1994; Hatahet, Z. et al., J. Biol. Chem., 269, 18814-18820, 1994; Boiteux, S. et al., EMBO J., 5, 3177-3183, 1987; Jiang, D. et al., J. Biol. Chem., 272(51), 32220-32229, 1997; and Jiang, D. et al., J. Bact., 179(11), 3773-3782, 1997.

Many DNA glycosylases and AP endonucleases are commercially available, e.g., from New England Biolabs, Ipswich, MA.

In some embodiments of the invention, extension probes containing a site that is a substrate for cleavage by an AP endonuclease are used in the sequencing methods described above for extension probes containing phosphorothiolate linkages, or in sequencing methods AB (see below). In any of these methods, after the extension probe has been ligated to the growing nucleic acid strand, the extension probe is cleaved with an AP endonuclease to remove the portion of the probe that carries the label.

Depending on the particular AP endonuclease, and on whether sequencing is performed in the 3'→5' or the 5'→3' direction, it may be necessary or desirable to treat the extended duplex with a polynucleotide kinase or a phosphatase after cleavage in order to generate an extendable probe terminus on the extended duplex (see Figures 5A and 5B for a depiction of extendable probe termini). Thus, certain methods of the invention include treatment with a polynucleotide kinase or a phosphatase to generate an extendable terminus. One of ordinary skill in the art will appreciate that buffers appropriate for the various enzymes can be employed and that additional wash steps can be included to remove the enzymes and to provide suitable conditions for subsequent steps of the method.

In other embodiments, the extension probe contains a damaged base that is a substrate for removal by a DNA glycosylase. A variety of cytotoxic and mutagenic DNA bases are removed by different DNA glycosylases, which initiate the base excision repair pathway following DNA damage (Krokan, H.E. et al., Biochem. J., 325(Pt 1):1-16, 1997). DNA glycosylases cleave the N-glycosidic bond between the damaged base and deoxyribose, thereby releasing the free base and leaving an apurinic/apyrimidinic (AP) site. In some embodiments, the extension probe contains a uracil residue, which is removed by a uracil-DNA glycosylase (UDG). UDGs have been found in all living organisms studied to date, and a large number of such enzymes are known in the art and can be used in the invention (Frederica et al., Biochemistry, 29, 2353-2537, 1990; Krokan, supra). For example, mammalian cells contain at least 4 types of UDG: mitochondrial UNG1 and nuclear UNG2, SMUG1, TDG, and MBD4 (Krokan et al., Oncogene, 21, 8935-8948, 2002). UNG1 and UNG2 belong to a highly conserved family typified by E. coli Ung.

In embodiments in which the extension probe contains a damaged base, after the extension probe has been ligated to the extendable probe terminus, the extended duplex is contacted with a glycosylase that removes the damaged base, thereby producing an abasic residue. An extension probe containing a damaged base that is subject to removal by a glycosylase is considered "readily modifiable to contain a scissile linkage". The extended duplex is then contacted with an AP endonuclease, which cleaves the linkage between the abasic residue and an adjacent nucleoside, as described above. In certain embodiments of the invention, both reactions are performed by a dual-activity enzyme that is both a DNA glycosylase and an AP endonuclease. In some embodiments, the extended duplex containing the damaged base is contacted with a DNA glycosylase and an AP endonuclease. In various embodiments of the invention, these enzymes can be used in combination or sequentially (i.e., the glycosylase followed by the endonuclease).

In some embodiments of the invention, the trigger residue contained in the extension probe is deoxyinosine. As mentioned above, E. coli endonuclease V (Endo V), also called deoxyinosine 3' endonuclease, and its homologs cleave deoxyinosine-containing nucleic acids at the second phosphodiester bond 3' to the deoxyinosine residue, leaving 3' OH and 5' phosphate termini. This bond therefore serves as the scissile linkage of the extension probe. Endo V and its cleavage properties are known in the art (Yao, M. and Kow, Y.W., J. Biol. Chem., 271, 30672-30673 (1996); Yao, M. and Kow, Y.W., J. Biol. Chem., 270, 28609-28616 (1995); He, B. et al., Mutat. Res., 459, 109-114 (2000)). In addition to deoxyinosine, Endo V also recognizes deoxyuridine, deoxyxanthosine, and deoxyoxanosine (Hitchcock, T. et al., Nuc. Acids Res., 32(13) (2004)). Mammalian homologs such as mEndo V also exhibit cleavage activity (Moe, A. et al., Nuc. Acids Res., 31(14), 3893-3900 (2004)). Although Endo V is the preferred cleavage agent for probes containing deoxyinosine, other cleavage reagents can also be used to cleave such probes. For example, hypoxanthine, as a damaged base, can be removed by an appropriate DNA glycosylase, and the resulting extension probe containing an abasic residue can then be cleaved by an endonuclease.

It will be appreciated that if deoxyinosine is used as the trigger residue, it may be desirable to avoid using deoxyinosine elsewhere in the probe, particularly at positions between the terminus that will be ligated to the extendable probe terminus and the trigger residue. Thus, if the probe contains one or more universal bases, a nucleoside other than deoxyinosine can be used. It will also be appreciated that when an extension probe contains a trigger residue that renders the nucleic acid susceptible to cleavage by a particular cleavage agent, it may be desirable to avoid including, in that probe (or in other probes to be used with it in a sequencing reaction), other residues that would trigger cleavage by the same agent.

The invention encompasses the use of any enzyme that cleaves a nucleic acid containing a trigger residue. Additional enzymes can be identified by consulting the catalogs of enzyme suppliers such as New England Biolabs, Inc. The New England Biolabs Catalog, 2005 edition (New England Biolabs, Ipswich, MA 01938-2723) is incorporated herein by reference, and the invention contemplates the use of any enzyme disclosed therein that cleaves a nucleic acid containing a trigger residue, or a homolog of such an enzyme. Other enzymes of use include, for example, hOGG1 and its homologs (Radicella, J.P. et al., Proc. Natl. Acad. Sci. USA, 94(15):8010-5, 1997).

Methods for synthesizing oligonucleotides containing trigger residues such as damaged bases, abasic residues, and the like are known in the art. Methods for synthesizing oligonucleotides containing a site that is a substrate for an AP endonuclease, such as oligonucleotides containing an abasic residue, are likewise known and are generally amenable to automated solid-phase oligonucleotide synthesis. In some embodiments, an oligonucleotide is synthesized that contains uridine at the position where an abasic residue is desired. The oligonucleotide is then treated with an enzyme such as UDG, which removes uracil, producing an abasic residue wherever uridine was present in the oligonucleotide.

In some embodiments of the invention, the oligonucleotide probe contains a disaccharide nucleoside, as described in Nauwelaerts, K. et al., Nuc. Acids Res., 31(23), 2003. After ligation, the extended duplex is cleaved with periodate (NaIO₄) and then treated with base (e.g., NaOH) to remove the label, yielding free 3' OH and P5-OPO₃H₂ groups. Depending on whether sequencing is performed in the 3'→5' or the 5'→3' direction, it may be necessary or desirable to treat the extended duplex with a polynucleotide kinase or a phosphatase to generate an extendable terminus. Thus, certain methods of the invention include treatment with a polynucleotide kinase or a phosphatase to generate an extendable terminus.

A polynucleotide containing a disaccharide nucleoside is considered to contain an abasic residue. For example, a polynucleotide in which a ribose residue is inserted between the 3' OH of one nucleotide and the 5' phosphate group of the next nucleotide is considered to contain an abasic residue.

Capping

In some cases, not all extendable termini successfully undergo ligation in each cycle of extension, ligation, and cleavage. It will be appreciated that if such termini participate in subsequent cycles, the accuracy of each nucleotide identification step will progressively decline. Although the inventors have shown that extension probes containing phosphorothiolate linkages can be ligated with high efficiency, certain embodiments of the invention include a capping step to prevent extendable termini that failed to undergo ligation from participating in subsequent cycles. When sequencing in the 5'→3' direction using extension probes containing a 3'-O-P-S-5' phosphorothiolate linkage, capping can be performed, e.g., after the ligation or detection step, by extending the unligated extendable termini with a DNA polymerase and a non-extendable moiety, e.g., a chain-terminating nucleotide such as a dideoxynucleotide or a nucleotide with a blocking moiety attached. When sequencing in the 3'→5' direction using extension probes containing a 3'-S-P-O-5' phosphorothiolate linkage, capping can be performed, e.g., after ligation or detection, by treating the template with a phosphatase. Other capping methods can also be used.
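The progressive loss of accuracy described above can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions that are not taken from the specification: every extendable terminus ligates with the same probability in every cycle, an uncapped terminus that missed a cycle keeps ligating (one or more positions behind), and capping, when used, is complete. The function names are hypothetical.

```python
def in_phase_fraction(efficiency, cycles):
    """Fraction of strands that ligated successfully in every one of `cycles` cycles."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return efficiency ** cycles


def signal_purity(efficiency, cycle, capped):
    """Fraction of the label signal in `cycle` (1-based) that comes from in-phase strands.

    With capping, strands that missed a ligation no longer contribute signal,
    so all remaining signal is in phase. Without capping, a fraction
    `efficiency` of all strands ligates in this cycle, but only
    `efficiency ** cycle` of all strands are both ligating and in phase.
    """
    if capped:
        return 1.0
    return in_phase_fraction(efficiency, cycle) / efficiency


# Example: 99% ligation efficiency over 25 cycles.
remaining = in_phase_fraction(0.99, 25)          # signal intensity that remains in phase
uncapped_purity = signal_purity(0.99, 25, False)  # purity of the signal without capping
```

In this model capping trades signal intensity (which still decays as the in-phase fraction shrinks) for signal purity, which is the motivation given in the text for including a capping step.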

H. Sequencing Using Oligonucleotide Probe Families

In the sequencing methods described above, referred to collectively as "Methods A", there is a direct and known correspondence between the label attached to any particular extension probe and the identity of one or more nucleotides at the proximal end of the probe (i.e., the end that is ligated to the extendable probe terminus of the extended duplex). Identifying the label of a newly ligated extension probe is therefore sufficient to identify one or more nucleotides in the template. The invention provides additional sequencing methods, referred to collectively as "Methods AB", that take a different approach to nucleotide identification and that also involve successive cycles of extension, ligation, and, preferably, cleavage.

Sequencing methods AB provided by the invention employ a collection of at least two distinguishably labeled oligonucleotide probe families. Each probe family is assigned a name based on its label, e.g., "red", "blue", "yellow", "green". As in the methods described above, extension starts from a duplex formed by an initializing oligonucleotide and a template. The initializing oligonucleotide is extended by ligating an oligonucleotide probe to its end to form an extended duplex, which is then repeatedly extended by successive cycles of ligation. The probes contain a non-extendable moiety at a terminal position (at the end of the probe opposite the nucleotide that is ligated to the growing nucleic acid strand of the duplex), so that only a single extension of the extended duplex takes place in a single cycle. In each cycle, the label on or attached to the successfully ligated probe is detected, and the non-extendable moiety is removed or modified to generate an extendable terminus. Detection of the label determines the name of the probe family to which the probe belongs.

Successive cycles of extension, ligation, and detection produce an ordered list of label names. The labels correspond to the probe families to which the successfully ligated probes, hybridized to the template at successive positions, belong. After ligation, the proximal positions of the probes are opposite different nucleotides in the template. There is therefore a correspondence between the order of the probe family names and the order of the nucleotides in the template.

In embodiments of the invention in which the scissile linkage is located between the proximal nucleoside of the extension probe and the adjacent nucleoside, the ordered list of probe family names can be obtained by successive cycles of extension, ligation, detection, and cleavage starting from a single initializing oligonucleotide, because each cycle extends the growing strand by a single nucleotide. If the scissile linkage is located between two other nucleosides, the ordered list of probe family names is assembled from the results of multiple sequencing reactions that use initializing oligonucleotides hybridizing to different positions within the binding region, as described for sequencing methods A.

Knowing which probe family a newly ligated probe belongs to is not by itself sufficient to determine the identity of a nucleotide in the template. Determining the probe family name does, however, eliminate the possibility that certain combinations of nucleotides form the sequence of at least part of the probe, while leaving at least two possibilities for the identity of each nucleotide. Thus, in the absence of other information, knowing the probe family name leaves at least two possibilities for the identity of each template nucleotide located opposite a nucleotide of the newly ligated probe. No single cycle of extension, ligation, detection (and optional cleavage) can therefore by itself identify any nucleotide in the template. It does, however, provide sequence information by eliminating one or more possible sequences of the template. In certain embodiments of the invention, the template sequence can nevertheless be determined by appropriately designing the probes and probe families as described below. In certain embodiments of the invention, sequencing methods AB comprise two phases: a first phase in which an ordered list of probe family names is obtained, and a second phase in which the ordered list is decoded to determine the template sequence.
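The two phases can be illustrated with a small decoding sketch. The encoding used here is an assumption made purely for illustration and is not taken from this section: four hypothetical families, a constrained portion of two nucleosides, successive probes that overlap by one position, and a family assignment given by the XOR of 2-bit base codes. The sequence handled is that of the extended strand; the template is its complement.

```python
BASES = "ACGT"                                   # 2-bit codes: A=0, C=1, G=2, T=3
FAMILIES = ["blue", "green", "yellow", "red"]    # hypothetical family names


def family_of(dinucleotide):
    """Family name assigned to a two-nucleoside constrained portion (assumed encoding)."""
    first, second = (BASES.index(base) for base in dinucleotide)
    return FAMILIES[first ^ second]


def encode(sequence):
    """Phase 1 stand-in: ordered list of family names for overlapping dinucleotides."""
    return [family_of(sequence[i:i + 2]) for i in range(len(sequence) - 1)]


def decode(families, first_base):
    """Phase 2: rebuild the sequence from the ordered list plus one known base."""
    sequence = [first_base]
    for name in families:
        previous = BASES.index(sequence[-1])
        sequence.append(BASES[previous ^ FAMILIES.index(name)])
    return "".join(sequence)


def candidates(families):
    """All sequences consistent with the ordered list when no base is known."""
    return [decode(families, base) for base in BASES]


names = encode("ATGGC")
recovered = decode(names, "A")
possible = candidates(names)
```

In this toy encoding a single family name leaves four possible dinucleotides, so no single cycle identifies a nucleotide, yet the ordered list narrows the candidates to one sequence per possible first base, and one known base (for example, from the adapter) resolves the rest.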

Unless otherwise indicated, sequencing methods A and AB generally employ similar methods for synthesizing probes, preparing templates, and performing the steps of extension, ligation, cleavage, and detection.

Features of Oligonucleotide Extension Probes and Probe Families for Sequencing Methods AB

The probe families used in sequencing methods AB are characterized in that each probe family includes a plurality of labeled oligonucleotide probes of different sequence and, at each position in the sequence, a probe family includes at least 2 probes having different bases at that position. The probes in each probe family carry the same label. Preferably the probes comprise a scissile internucleoside linkage, which can be located anywhere in the probe. One end of the probe preferably contains a moiety that cannot be extended by ligase. The probe is preferably labeled at a position between the scissile linkage and the moiety that cannot be extended by ligase, so that cleaving the scissile linkage after the probe has been ligated to the extendable probe terminus yields an unlabeled portion that remains attached to the extendable probe terminus and a labeled portion that is no longer attached to the unlabeled portion.

The probes in each probe family preferably contain at least j nucleosides X, where j is at least 2 and each X is at least 2-fold degenerate among the probes of each probe family. The probes of each probe family also contain at least k nucleosides N, where k is at least 2 and N represents any nucleoside. In general, j + k is 100 or less, typically 30 or less. The nucleosides X can be located anywhere in the probe and need not occupy contiguous positions. Similarly, the nucleosides N need not occupy contiguous positions. In other words, the nucleosides X and N can be interspersed. Although the nucleosides X need not be contiguous, they can be considered to have a sequence in the 5'→3' direction. For example, the nucleosides X of a probe with the structure X_A N X_G N N X_C N are considered to have the sequence AGC. Similarly, the nucleosides N can be considered to have a sequence.

The nucleosides X can be the same or different but are not independently selected, i.e., the identity of each X is constrained by the identity of one or more other nucleosides X in the probe. In general, therefore, only certain combinations of nucleosides X are present in any particular probe and in the probes of any particular probe family. In other words, in each probe, the sequence of the nucleosides X can represent only a subset of all possible sequences of length j. The identity of one or more nucleotides in X thus limits the possible identities of one or more of the other nucleosides.

The nucleosides N are preferably independently selected and can be A, G, C, or T (or, optionally, a degeneracy-reducing nucleoside). The sequences of the nucleosides N preferably represent all possible sequences of length k, except that one or more N can be a degeneracy-reducing nucleoside. The probe thus contains two portions: the portion consisting of the nucleosides N is referred to as the unconstrained portion, and the portion consisting of the nucleosides X is referred to as the constrained portion. As noted above, neither portion need consist of contiguous nucleosides. Probes containing a constrained portion and an unconstrained portion are referred to herein as partially constrained probes. One or more nucleosides of the constrained portion are preferably located at the proximal end of the probe, i.e., the end containing the nucleoside that will be ligated to the extendable probe terminus, which can be the 5' or the 3' end of the oligonucleotide probe in different embodiments of the invention.

Because the constrained portion of any oligonucleotide probe can have only certain sequences, knowing the identity of one or more nucleosides of the constrained portion provides information about one or more of the other nucleosides. That information may or may not be sufficient to identify the other nucleoside(s) exactly, but it is sufficient to eliminate one or more possibilities for the identity of one or more other nucleosides of the constrained portion. In certain preferred embodiments of sequencing methods AB, knowing the identity of one nucleoside of the constrained portion of a probe is sufficient to identify each of the other nucleosides of the constrained portion exactly, i.e., to determine the identity and order of the nucleosides that make up the constrained portion.
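The properties required of a constrained portion can be checked mechanically. The sketch below is illustrative only: a probe family is represented simply as the set of sequences its constrained portion may take, and the example sets and function names are hypothetical rather than families disclosed here.

```python
def is_valid_constrained_set(members, j):
    """Check the two stated properties of the constrained portion of one family:
    the sequences are a proper subset of all 4**j sequences of length j, and
    every position is at least 2-fold degenerate across the family."""
    members = set(members)
    if not members or any(len(m) != j for m in members):
        return False
    proper_subset = len(members) < 4 ** j
    degenerate = all(len({m[i] for m in members}) >= 2 for i in range(j))
    return proper_subset and degenerate


def one_position_determines_rest(members):
    """True if knowing the nucleoside at any single position identifies the
    whole constrained portion (the preferred case described in the text)."""
    members = set(members)
    j = len(next(iter(members)))
    for i in range(j):
        seen = {}
        for m in members:
            # Two different members sharing the base at position i means
            # that position does not pin down the others.
            if seen.setdefault(m[i], m) != m:
                return False
    return True


preferred = {"AA", "CC", "GG", "TT"}   # one known base fixes the other
weaker = {"AA", "AC", "CA", "CC"}      # one known base only narrows the other
```

Both example sets satisfy the subset and degeneracy conditions, but only the first has the stronger property that one known nucleoside identifies the other exactly; the second merely eliminates some possibilities.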

As in the sequencing methods described above, the most proximal nucleoside of an extension probe that is complementary to the template is ligated to the extendable terminus of the initializing oligonucleotide (in the first cycle of extension, ligation, and detection) or to the extendable terminus of the extended oligonucleotide probe (in subsequent cycles of extension, ligation, and detection). Detection determines the name of the probe family to which the newly ligated probe belongs. Because each position of the constrained portion of the probe is at least 2-fold degenerate, the probe family name does not by itself identify any nucleotide of the constrained portion. However, because the sequence of the constrained portion is one of a subset of all possible sequences of length j, identifying the probe family does eliminate certain possibilities for the sequence of the constrained portion. The constrained portion of the probe constitutes its sequence-determining portion. Eliminating one or more possibilities for the identity of one or more nucleosides of the constrained portion, by identifying the probe family to which the probe belongs, therefore eliminates one or more possibilities for the identity of the nucleotides of the template to which the extension probe hybridizes. In preferred embodiments of the invention, the partially constrained probes contain a scissile linkage between any two nucleosides.

In certain embodiments, the partially constrained probes have the general formula (X)_j(N)_k, in which X represents a nucleoside; (X)_j is at least 2-fold degenerate at each position, so that X can be any of at least two nucleosides with different base-pairing specificities; N represents any nucleoside; j is at least 2; k is between 1 and 100; and at least one N, or an X other than the X at the probe terminus, carries a detectable moiety. Preferably, (N)_k is independently 4-fold degenerate at each position, so that in each probe (N)_k represents all possible sequences of length k, except that one or more positions in (N)_k may be occupied by a nucleotide of reduced degeneracy. The nucleosides in (X)_j can be identical or different but are not independently selected. In other words, in each probe (X)_j can represent only a subset of all possible sequences of length j. The identity of one or more nucleotides in (X)_j therefore limits the possible identities of one or more of the other nucleosides. The probe thus contains two portions, of which (N)_k is the unconstrained portion and (X)_j is the constrained portion.
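As a rough illustration of the general formula, the sketch below expands a probe of the form (X)_j(N)_k into the concrete sequences it represents. The constrained set used here is a hypothetical example chosen only for illustration, and the function name `expand_probe` is an assumption rather than anything named in the source.

```python
from itertools import product

BASES = "ACGT"


def expand_probe(constrained_set, k):
    """Return every concrete sequence represented by a probe (X)_j(N)_k.

    constrained_set: the allowed (X)_j sequences, a subset of all 4**j sequences.
    k: number of fully (4-fold) degenerate N positions.
    """
    tails = ["".join(t) for t in product(BASES, repeat=k)]
    return [x + n for x in sorted(constrained_set) for n in tails]


# Hypothetical constrained portion with j = 2: each position is 4-fold
# degenerate, but the two positions are not chosen independently.
example_set = {"AA", "CC", "GG", "TT"}
sequences = expand_probe(example_set, k=3)
print(len(sequences))  # 4 constrained sequences x 4**3 tails = 256
```

Only 4 of the 16 possible dinucleosides appear in the constrained positions, while the three N positions cover all 64 possible trinucleosides.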

In certain preferred embodiments of the invention, the partially constrained probes have the structure 5'-(X)_j(N)_kN_B^*-3' or 3'-(X)_j(N)_kN_B^*-5', in which N represents any nucleoside; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; (X)_j is the constrained portion of the probe and is at least 2-fold degenerate at each position; the nucleosides in (X)_j can be identical or different but are not independently selected; at least one internucleoside linkage is a scissile linkage; j is at least 2; and k is between 1 and 100, with the proviso that a detectable moiety may be present on any nucleoside N, or on an X other than the X at the probe terminus, instead of or in addition to N_B. The scissile linkage can be located between two nucleosides of (X)_j, between the most distal nucleotide of (X)_j and the most proximal nucleoside of (N)_k, between nucleosides within (N)_k, or between the terminal nucleoside of (N)_k and N_B. The scissile linkage is preferably a phosphorothiolate linkage.

In other, more preferred embodiments of the invention, the probes have the structure 5'-(XY)(N)_kN_B^*-3' or 3'-(XY)(N)_kN_B^*-5', in which N represents any nucleoside; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; XY is the constrained portion of the probe, in which X and Y represent nucleosides that are identical or different but are not independently selected; X and Y are at least 2-fold degenerate; at least one internucleoside linkage is a scissile linkage; and k is between 1 and 100, with the proviso that a detectable moiety may be present on any nucleoside N, or on an X other than the X at the probe terminus, instead of or in addition to N_B. The scissile linkage is preferably a phosphorothiolate linkage. Probes with the structure 5'-(XY)(N)_kN_B^*-3' are used for sequencing in the 5'→3' direction. Probes with the structure 3'-(XY)(N)_kN_B^*-5' are used for sequencing in the 3'→5' direction.

The structures of certain preferred probes are described in more detail below. For sequencing in the 5'→3' direction, partially constrained probes with the structure 5'-O-P-O-(X)_j(N)_k-O-P-S-(N)_iN_B^*-3' are used, in which N represents any nucleoside; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; (X)_j is the constrained portion of the probe and is at least 2-fold degenerate at each position; the nucleosides in (X)_j can be identical or different but are not independently selected; j is at least 2; (k+i) is between 1 and 100; k is between 1 and 100; and i is between 0 and 99, with the proviso that a detectable moiety may be present on any nucleoside of (N)_i instead of, or in addition to, N_B. In certain embodiments of the invention, (X)_j is (XY), in which X and Y are at least 2-fold degenerate and represent nucleosides that are identical or different but are not independently selected. In certain embodiments of the invention, i is 0.

Other preferred probes for sequencing in the 5'→3' direction have the structure 5'-O-P-O-(X)_j-O-P-S-(N)_iN_B^*-3', in which N represents any nucleoside; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; (X)_j is the constrained portion of the probe and is at least 2-fold degenerate at each position; the nucleosides in (X)_j can be identical or different but are not independently selected; j is at least 2; and i is between 1 and 100, with the proviso that a detectable moiety may be present on any nucleoside of (N)_i instead of, or in addition to, N_B. In certain embodiments of the invention, (X)_j is (XY), in which positions X and Y are at least 2-fold degenerate and X and Y represent nucleosides that are identical or different but are not independently selected.
Another preferred probe for sequencing in the 5'→3' direction has the structure 5'-O-P-O-(X)_j-O-P-S-(X)_k(N)_iN_B^*-3', in which N represents any nucleoside; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; (X)_j-O-P-S-(X)_k is the constrained portion of the probe and is at least 2-fold degenerate at each position; the nucleosides in (X)_j-O-P-S-(X)_k can be identical or different but are not independently selected; j and k are each at least 1; (j+k) is at least 2 (e.g., 2, 3, 4, or 5); and i is between 1 and 100, with the proviso that a detectable moiety may be present on any nucleoside of (N)_i instead of, or in addition to, N_B. In certain embodiments of the invention, j and k are both 1.

For sequencing in the 3'→5' direction, partially constrained probes with the structure 5'-N_B^*(N)_i-S-P-O-(N)_k-O-P-O-(X)_j-3' are used, in which N represents any nucleoside; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; (X)_j is the constrained portion of the probe and is at least 2-fold degenerate at each position; the nucleosides in (X)_j can be identical or different but are not independently selected; j is at least 2; (k+i) is between 1 and 100; k is between 1 and 100; and i is between 0 and 99, with the proviso that a detectable moiety may be present on any nucleoside of (N)_i instead of, or in addition to, N_B. In certain embodiments of the invention, (X)_j is (XY), in which X and Y are at least 2-fold degenerate and represent nucleosides that are identical or different but are not independently selected. In certain embodiments of the invention, i is 0.

Other preferred probes for sequencing in the 3'→5' direction have the structure 5'-N_B^*(N)_i-S-P-O-(X)_j-3', in which N represents any nucleoside; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; (X)_j is the constrained portion of the probe and is at least 2-fold degenerate at each position; the nucleosides in (X)_j can be identical or different but are not independently selected; j is at least 2; and i is between 1 and 100, with the proviso that a detectable moiety may be present on any nucleoside of (N)_i instead of, or in addition to, N_B. In certain embodiments of the invention, (X)_j is (XY), in which X and Y are at least 2-fold degenerate and represent nucleosides that are identical or different but are not independently selected. In certain embodiments of the invention, j is between 2 and 5, e.g., 2, 3, 4, or 5, in any of the partially constrained probes.

Another preferred probe for sequencing in the 3'→5' direction has the structure 5'-N_B^*(N)_i-S-P-O-(X)_k-O-P-O-(X)_j-3', in which N represents any nucleoside; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; (X)_k-O-P-O-(X)_j is the constrained portion of the probe and is at least 2-fold degenerate at each position; the nucleosides in (X)_k-O-P-O-(X)_j can be identical or different but are not independently selected; j and k are each at least 1; (j+k) is at least 2 (e.g., 2, 3, 4, or 5); and i is between 1 and 100, with the proviso that a detectable moiety may be present on any nucleoside of (N)_i instead of, or in addition to, N_B. In certain embodiments, j = 1 and k = 1.

In embodiments of the invention in which the scissile linkage lies between the most proximal nucleoside of (X)_j and the next most proximal nucleoside of (X)_j, an ordered list of probe family names can be obtained by successive cycles of extension, ligation, detection, and cleavage starting from a single initializing oligonucleotide, because each cycle extends the extended oligonucleotide probe by one nucleotide. In embodiments of the invention in which the scissile linkage lies between two other nucleosides, the ordered list of probe family names is assembled from the results of multiple sequencing reactions that use initializing oligonucleotides hybridizing at different positions in the binding region, as described for Sequencing Method A.

It will be appreciated that probes with many structures other than those described above can be used in Sequencing Methods AB. For example, a probe can have a structure such as XNY(N)_k, in which the constrained nucleosides X and Y are not adjacent, or XIY(N)_k, in which I is a universal base. (N)_kX(N)_l, (N)_iX(N)_jY(N)_kZ(N)_l, (N)_iX(N)_jYIZ(N)_l, and (N)_iX(N)_jY(N)_kZ(I)_l represent other possibilities. As described for the probes above, these probes contain a scissile linkage and a detectable moiety, and carry at one terminus a moiety that is not extendable by ligase. Preferably, the probes do not carry a detectable moiety on the nucleotide at the end of the probe opposite the moiety that is not extendable by ligase. Probe families made up of probes with any of these structures, and others, satisfy the criterion that each probe family comprises a plurality of labeled oligonucleotide probes of different sequence and that, at each position in the sequence, a probe family comprises at least two probes having different bases at that position. The total number of nucleosides in each probe is preferably 100 or fewer, e.g., 30 or fewer.

Encoded Oligonucleotide Extension Probe Families.

The sequencing methods of the invention make use of encoded probe families. "Encoding" refers to a scheme that associates a particular label with probes containing a portion whose sequence is one of a defined set of sequences, so that probes containing a portion whose sequence is a member of that defined set are labeled with that label. In general, an encoding associates each of a plurality of distinguishable labels with one or more probes, so that each distinguishable label is associated with a different group of probes and each probe is labeled with only a single label (which may comprise a combination of detectable moieties). Preferably, the probes in each group each contain a portion whose sequence is a member of the same defined set of sequences. The portion may be a single nucleoside or multiple nucleosides in length, e.g., 2, 3, 4, 5, or more nucleosides. The portion may constitute only a small fraction of the full length of the probe or may constitute the entire probe. The defined set of sequences may contain only a single sequence or any number of different sequences, depending on the length of the portion. For example, if the portion is a single nucleoside, the defined set of sequences can contain at most 4 elements (A, G, C, T). If the portion is two nucleosides in length, the defined set of sequences can contain up to 16 elements (AA, AG, AC, AT, GA, GG, GC, GT, CA, CG, CC, CT, TA, TG, TC, TT). In general, a defined set of sequences will contain fewer elements than the total number of possible sequences, and an encoding will employ more than one defined set of sequences.
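The upper bounds quoted above (4 elements for a one-nucleoside portion, 16 for a two-nucleoside portion) are simply 4 raised to the length of the portion. A minimal sketch, assuming the base order A, G, C, T used in the text; the helper name `all_sequences` is illustrative only:

```python
from itertools import product


def all_sequences(length, bases="AGCT"):
    """List every possible sequence of the given length over the four bases."""
    return ["".join(p) for p in product(bases, repeat=length)]


one = all_sequences(1)
two = all_sequences(2)
print(one)       # ['A', 'G', 'C', 'T']
print(len(two))  # 16
```

A defined set of sequences is then any subset of one of these lists, and an encoding assigns one label to each subset it uses.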

Sequencing Method A described herein generally makes use of a simple encoding, in which the proximal nucleoside of the probe (i.e., the nucleoside that is ligated to the extendable probe terminus) corresponds directly to the identity of the label. The proximal nucleoside is complementary to the template nucleotide with which it hybridizes, so the identity of the proximal nucleoside in the newly ligated probe determines the identity of the template nucleotide at the opposing position in the extended duplex. In general, probes for use in the other sequencing methods described herein have the structure X(N)_k, in which X is the proximal nucleoside and each nucleoside N is 4-fold degenerate, so that all possible sequences of length k are represented in the pool of oligonucleotide probe molecules that makes up a probe. Thus, for example, some oligonucleotide probe molecules contain A at position k = 1, others contain G at position k = 1, others contain C at position k = 1, and others contain T at position k = 1, and similarly for the other positions k, where the nucleoside of (N)_k adjacent to X is considered to occupy position k = 1, the next nucleoside of (N)_k is considered to occupy position k = 2, and so on. In any given oligonucleotide probe, however, X represents only a single base-pairing specificity, which generally corresponds to a particular nucleoside identity such as A, G, C, or T. X is therefore generally uniformly A, G, C, or T across the pool of probe molecules that makes up a particular probe. Figure 2 shows a suitable encoding for probes of structure X(N)_k. Under this encoding, the label "red" is assigned to probes with X = C; the label "yellow" is assigned to probes with X = A; the label "green" is assigned to probes with X = G; and the label "blue" is assigned to probes with X = T. There is thus a one-to-one correspondence between the sequence determining portion of a probe and its label.
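The one-to-one encoding of Figure 2 can be sketched as a pair of lookups: label to proximal nucleoside X, then X to its Watson-Crick complement in the template. The label assignments are those stated in the text; the function name `template_bases` is hypothetical, and the sketch ignores strand orientation and simply reports bases in the order the labels were detected.

```python
# Figure 2 encoding as stated in the text: label -> proximal nucleoside X.
LABEL_TO_X = {"red": "C", "yellow": "A", "green": "G", "blue": "T"}

# Watson-Crick complements, used to infer the template nucleotide opposite X.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_bases(labels):
    """Map each detected label to the template nucleotide opposite X."""
    return "".join(COMPLEMENT[LABEL_TO_X[label]] for label in labels)


print(template_bases(["red", "blue", "green", "yellow"]))  # GACT
```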

It will be appreciated that the approach above, in which the identity of the label on the newly ligated extension probe corresponds to the identity of the most proximal nucleoside of the extension probe, can be extended to encodings in which the identity of the label corresponds not only to the identity of the most proximal nucleoside but to the sequence of the two or more most proximal nucleosides of the extension probe, so that the identities of multiple template nucleotides are determined in a single cycle of extension, ligation, and detection (generally followed by cleavage). Such an encoding, however, still associates a label with a single sequence of the oligonucleotide extension probe, so as to identify the complementary nucleotides at the opposing positions in the template. As noted above, identifying two nucleotides in a single cycle would require 16 different oligonucleotide probes, each with a corresponding label (i.e., 16 distinguishable labels).

Sequencing Methods AB use a different approach to associating labels with probes. Rather than a one-to-one correspondence between the identity of the label and the sequence of the sequence determining portion of the probe, the same label is assigned to multiple probes having different sequence determining portions. The probes are partially constrained probes, and the constrained portion of a probe is its sequence determining portion. The same label is thus assigned to a plurality of different probes, each containing a constrained portion of different sequence, where that sequence is one of a defined set of sequences. As noted above, the probes that carry the same label make up a "probe family". The method employs a plurality of such probe families, each comprising a plurality of probes with constrained portions of different sequence, where that sequence is one of a defined set of sequences.

A plurality of probe families is referred to as a "collection" of probe families. The probes of each probe family in a collection are labeled with a label that is distinguishable from the labels used for the other probe families in the collection. Each probe family preferably has its own defined set of sequences. Preferably, the constrained portions of the probes within each probe family are of the same length, and preferably the constrained portions of the probe families within a collection are of the same length. Preferably, the union of the defined sets of sequences of the probe families in a collection includes all possible sequences of the length of the constrained portion. Preferably, a collection of probe families comprises or consists of 4 distinguishably labeled probe families. Preferably, the constrained portion of the probes is 2 nucleosides in length.

Collections of distinguishably labeled probe families with a variety of different encodings will satisfy the criteria above and can be used to practice the methods of the invention. Certain collections of probe families are preferred, however. An exemplary encoding for a preferred collection of 4 distinguishably labeled probe families made up of partially constrained probes is shown in Figure 25A. As shown in Figure 25A, the constrained portion consists of the two 3'-most nucleosides of the probe. The probe families are labeled "red", "yellow", "green", and "blue". The probes of each probe family include a constrained portion whose sequence is one of a defined set of sequences, and the defined set differs for each probe family. For example, reading each sequence from its 3' end, which is considered the proximal end of the probe, the defined set of sequences for the "red" probe family is {CT, AG, GA, TC}; for the "yellow" probe family it is {CC, AT, GG, TA}; for the "green" probe family it is {CA, AC, GT, TG}; and for the "blue" probe family it is {CG, AA, GC, TT}. Each defined set contains no member that is present in any other set, which is a preferred feature. In addition, the union of the defined sets of the probe families in the collection includes all possible sequences of length 2, i.e., all possible dinucleosides. Another feature of this collection (preferred but not required) is that each position in the constrained portion of the probes is 4-fold degenerate, i.e., each position can be occupied by A, G, C, or T. A further feature of this collection (preferred but not required) is that, within each defined set of sequences, only one sequence has any particular nucleoside at any given position, whether the most proximal position or any other position. It is particularly preferred, though not required, that if the most proximal nucleoside is considered position 1, then within each defined set only one sequence has any particular nucleoside at position 2 or at any higher position within the constrained portion. For example, in the defined set of sequences for the red probe family, only one sequence has T at position 2, only one sequence has G at position 2, only one sequence has A at position 2, and only one sequence has C at position 2.
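The three properties described for the Figure 25A collection (no shared members, coverage of all 16 dinucleosides, and each base appearing only once per position within a family) can be checked mechanically. The sketch below uses the four defined sets exactly as listed in the text; the function name `check_collection` is an assumption made for illustration.

```python
from itertools import product

# Defined sets of sequences for the Figure 25A collection, as listed in the text
# (first letter = proximal nucleoside).
FAMILIES = {
    "red": {"CT", "AG", "GA", "TC"},
    "yellow": {"CC", "AT", "GG", "TA"},
    "green": {"CA", "AC", "GT", "TG"},
    "blue": {"CG", "AA", "GC", "TT"},
}


def check_collection(families):
    """Return (disjoint, complete, unique_per_position) for a collection."""
    all_dinucs = {"".join(p) for p in product("ACGT", repeat=2)}
    members = [s for seqs in families.values() for s in seqs]
    disjoint = len(members) == len(set(members))
    complete = set(members) == all_dinucs
    # Within each family, every base must occur exactly once at each position.
    unique_per_position = all(
        sorted(s[pos] for s in seqs) == list("ACGT")
        for seqs in families.values()
        for pos in (0, 1)
    )
    return disjoint, complete, unique_per_position


print(check_collection(FAMILIES))  # (True, True, True)
```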

For any particular encoding such as that shown in Figure 25A, knowing the identity of one or more nucleosides in the constrained portion of a probe from a given probe family provides information about the other nucleotides in the constrained portion of that probe. In the most general sense, knowing the identity of one or more nucleosides in the constrained portion of a probe from a given probe family provides enough information to rule out one or more possible nucleoside identities at another position, because the defined set of sequences for that probe family does not include a sequence with that nucleoside identity at that position. In general, knowing the identity of one or more nucleosides in the constrained portion provides enough information to rule out one or more possible identities for multiple nucleosides, e.g., for each of the other nucleosides. In preferred encodings, knowing the identity of one or more nucleosides in the constrained portion of a probe from a given probe family rules out all but one possibility for each of the other nucleosides in the probe. For example, in the case of the encoded probe families shown in Figure 25A, if a probe is known to be a member of the red family and the most proximal nucleoside is known to be C, then the adjacent nucleoside must be T. Similarly, if a probe is known to be a member of the green family and the most proximal nucleoside is known to be G, then the adjacent nucleoside must be T. Knowing the identity of one nucleoside of the constrained portion is therefore sufficient to rule out all but one possibility for the other nucleoside, and thus to identify the other nucleoside completely. Without knowing the identity of at least one nucleoside in the constrained portion, however, knowledge of the probe family name alone gives no information about the identity of any particular nucleoside in the probe, because the nucleoside at each position of the constrained portion can be A, G, C, or T. Figure 25B shows a preferred collection of probe families (upper panel) and a cycle of ligation, detection, and cleavage (lower panel) for Sequencing Methods AB.
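The two worked examples above (red with proximal C, green with proximal G) amount to a lookup within the family's defined set. A minimal sketch using the Figure 25A sets from the text; the function name `other_nucleoside` and its arguments are hypothetical.

```python
# Defined sets for the Figure 25A collection (first letter = proximal nucleoside).
FAMILIES = {
    "red": {"CT", "AG", "GA", "TC"},
    "yellow": {"CC", "AT", "GG", "TA"},
    "green": {"CA", "AC", "GT", "TG"},
    "blue": {"CG", "AA", "GC", "TT"},
}


def other_nucleoside(family, known_base, known_position=0):
    """Infer the unknown nucleoside of a 2-nucleoside constrained portion.

    known_position is 0 for the proximal nucleoside, 1 for the adjacent one.
    """
    matches = [s for s in FAMILIES[family] if s[known_position] == known_base]
    if len(matches) != 1:
        raise ValueError("encoding does not determine the other nucleoside")
    return matches[0][1 - known_position]


print(other_nucleoside("red", "C"))    # T
print(other_nucleoside("green", "G"))  # T

# The family name alone leaves all four bases possible at each position.
print(sorted(s[0] for s in FAMILIES["red"]))  # ['A', 'C', 'G', 'T']
```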

The inventors have designed 24 collections of probe families whose constrained portions are 2 nucleosides in length and which have the advantageous features of the collection shown in Figure 25A. These probe families are maximally informative, in that knowing the name of the probe family to which a probe belongs, together with the identity of one nucleoside in the probe, is sufficient to identify the other nucleoside of the constrained portion precisely. This holds for all probes and for all nucleosides of each constrained portion. The encoding schemes for the 24 preferred collections are shown in Table 1, which assigns an encoding ID from 1 to 24 to each collection of probe families. Each encoding defines the constrained portions of a preferred collection of probe families of general structure (XY)(N)_k for use in Sequencing Methods AB, and thereby defines the collection itself. In Table 1, (i) a value of 1 in the column under an encoding ID indicates that, under that encoding, probes containing the nucleosides X and Y shown in the first and second columns, respectively, are assigned to the first probe family; (ii) a value of 2 indicates that, under that encoding, such probes are assigned to the second probe family; (iii) a value of 3 indicates that, under that encoding, such probes are assigned to the third probe family; and (iv) a value of 4 indicates that, under that encoding, such probes are assigned to the fourth probe family. The values 1, 2, 3, and 4 each represent a label. For example, encoding 9 defines the collection of probe families shown in Figure 25A, with 1 representing blue, 2 representing green, 3 representing red, and 4 representing yellow. It will be appreciated that the assignment of values to labels is arbitrary; e.g., 1 could equally well represent green, red, or yellow. Changing the association between the values 1, 2, 3, and 4 and the labels does not change the set of probes in each probe family; it only associates a different label with each probe family.
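One way to read the "maximally informative" property is as a 4 x 4 Latin square: with rows indexed by X and columns by Y, each family value must appear exactly once in every row and every column. That reading is an interpretation offered here, not a statement from the source. Under it, and fixing the row for X = A to 1, 2, 3, 4 (the arbitrary numbering convention visible in Table 1), a brute-force enumeration gives 24 squares, the same number as the encodings listed.

```python
from itertools import permutations


def latin_squares_with_fixed_first_row(n=4):
    """Enumerate n x n Latin squares whose first row is (1, 2, ..., n)."""
    first = tuple(range(1, n + 1))
    rows = list(permutations(first))
    squares = []

    def extend(square):
        if len(square) == n:
            squares.append(tuple(square))
            return
        for row in rows:
            # A new row may not repeat any value already used in its column.
            if all(row[c] != prev[c] for prev in square for c in range(n)):
                extend(square + [row])

    extend([first])
    return squares


squares = latin_squares_with_fixed_first_row()
print(len(squares))  # 24
```

For instance, encoding 9 (the Figure 25A collection) corresponds to the rows (1, 2, 3, 4), (2, 4, 1, 3), (3, 1, 4, 2), (4, 3, 2, 1) for X = A, C, G, T, with columns ordered Y = A, C, G, T.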

Table 1: Oligonucleotide probe family encodings

Rows: nucleosides X and Y of the constrained portion. Columns: encoding ID 1-24.
X Y   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
A A   1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
C A   2  4  3  2  2  4  3  2  2  3  4  3  2  3  3  4  2  3  4  4  2  4  3  4
G A   4  3  2  3  3  2  4  4  3  2  2  4  4  2  4  3  4  2  3  2  3  2  4  3
T A   3  2  4  4  4  3  2  3  4  4  3  2  3  4  2  2  3  4  2  3  4  3  2  2
A C   2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
C C   1  1  1  1  1  1  1  1  4  4  3  4  4  4  4  3  3  4  3  3  3  3  4  3
G C   3  4  4  4  4  3  3  3  1  1  1  1  3  3  3  4  1  1  1  1  4  4  3  4
T C   4  3  3  3  3  4  4  4  3  3  4  3  1  1  1  1  4  3  4  4  1  1  1  1
A G   3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
C G   4  2  4  4  4  2  4  4  1  1  1  1  1  1  1  1  4  2  2  2  4  2  2  2
G G   1  1  1  1  2  4  2  2  4  4  4  2  2  4  2  2  2  4  4  4  1  1  1  1
T G   2  4  2  2  1  1  1  1  2  2  2  4  4  2  4  4  1  1  1  1  2  4  4  4
A T   4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
C T   3  3  2  3  3  3  2  3  3  2  2  2  3  2  2  2  1  1  1  1  1  1  1  1
G T   2  2  3  2  1  1  1  1  2  3  3  3  1  1  3  1  3  3  2  3  2  3  2  2
T T   1  1  1  1  2  2  3  2  1  1  1  1  2  3  1  3  2  2  3  2  3  2  3  3

To further illustrate how Table 1 is used to define a preferred collection of probe families, consider encoding 17. Under this encoding, probes with the constrained portions AA, GC, TG, and CT are assigned to label 1 (e.g., red); probes with the constrained portions CA, AC, GG, and TT are assigned to label 2 (e.g., yellow); probes with the constrained portions TA, CC, AG, and GT are assigned to label 3 (e.g., green); and probes with the constrained portions GA, TC, CG, and AT are assigned to label 4 (e.g., blue). The resulting collection of probe families is shown in Figure 26.
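Reading a collection of probe families out of a Table 1 column is a matter of grouping the 16 dinucleosides by their value. The sketch below does this for encoding 17, using the values from that column of Table 1; the names `ENCODING_17` and `families_from_encoding` are illustrative only.

```python
from collections import defaultdict

# Column 17 of Table 1, keyed by the constrained dinucleoside XY.
ENCODING_17 = {
    "AA": 1, "CA": 2, "GA": 4, "TA": 3,
    "AC": 2, "CC": 3, "GC": 1, "TC": 4,
    "AG": 3, "CG": 4, "GG": 2, "TG": 1,
    "AT": 4, "CT": 1, "GT": 3, "TT": 2,
}


def families_from_encoding(encoding):
    """Group constrained portions by the probe family (label) assigned to them."""
    families = defaultdict(list)
    for dinucleoside, label in encoding.items():
        families[label].append(dinucleoside)
    return dict(families)


families = families_from_encoding(ENCODING_17)
for label in sorted(families):
    print(label, families[label])
```

The four groups produced this way are the same sets listed above for labels 1 through 4.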

Figures 27A-27C present another way of schematically defining the 24 preferred collections of probe families. The method uses a diagram such as that in Figure 27A. The first column of the diagram represents the first base. Each label is associated with four different base sequences, which are obtained by juxtaposing the base in the first column with the base in the column of the selected label. For example, for an A in the column headed "first base", probes containing a constrained portion with the sequence AA are assigned to probe family 1 (label 1); probes containing a constrained portion with the sequence AC are assigned to probe family 2 (label 2); probes containing a constrained portion with the sequence AG are assigned to probe family 3 (label 3); and probes containing a constrained portion with the sequence AT are assigned to probe family 4 (label 4). Probes whose constrained portions begin with C, G, or T are assigned to probe families in the same manner. The diagram of Figure 27A, filled in with the bases shown, thus translates into the encoding of Figure 27B, in which probes whose constrained portions belong to the set {AA, CC, GG, TT} are assigned to probe family 1; probes whose constrained portions belong to the set {AC, CA, GT, TG} are assigned to probe family 2; probes whose constrained portions belong to the set {AG, CT, GC, TA} are assigned to probe family 3; and probes whose constrained portions belong to the set {AT, CG, GA, TC} are assigned to probe family 4. Figure 27C shows the diagrams that can be substituted for the shaded portion of Figure 27A to generate each of the 24 preferred collections of probe families. Methods that use the preferred collections of probe families in sequencing methods AB are described further below.

The 24 encoded collections of probe families specified in Table 1 represent only preferred embodiments of the collections of probe families for use in sequencing methods AB. A variety of other encoding schemes, probe families, and probe structures based on the same principle can be used, in which knowledge of the probe family name, together with knowledge of the identity of one or more nucleosides of the constrained portion, provides information about one or more other nucleosides. Less preferred collections of probe families are generally less preferred than the preferred collections because (i) for at least some probes, knowing the probe family name and the identity of a nucleoside provides less information; or (ii) for at least some probes, knowing the probe family name alone provides more information.

In general, less preferred collections of probe families can be used to perform sequencing methods AB in much the same way as preferred collections. However, the steps required for decoding may differ. For example, in some cases comparing the candidate sequences with one another may be sufficient to determine at least a portion of the sequence.

An example of a less preferred collection of probe families, in which the probes contain a constrained portion 2 nucleosides in length, is shown in Figure 28. Under this encoding, probes whose constrained portions belong to the set {AA, AC, GA, GC} are assigned to probe family 1; probes whose constrained portions belong to the set {CA, CC, TA, TC} are assigned to probe family 2; probes whose constrained portions belong to the set {AG, AT, GG, GT} are assigned to probe family 3; and probes whose constrained portions belong to the set {CG, CT, TG, TT} are assigned to probe family 4. In this collection, knowing the probe family name, which is determined by detecting the label of the newly ligated extension probe, rules out certain possibilities for the identity of the template nucleotide located opposite the proximal nucleoside of that probe. For example, if the probe family name is 1, the proximal nucleoside of the newly ligated extension probe must be A or G, so the complementary nucleotide in the template must be T or C. Because at least two possibilities remain at each position of the constrained portion, the nucleotide cannot be identified exactly; however, in contrast to the situation with a preferred collection of probe families, a single cycle yields enough information to rule out some possibilities.
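The difference between a preferred and a less preferred collection can be checked mechanically. The sketch below is one possible way to express it, assuming that a collection counts as "preferred" when, within every family, each base occurs exactly once at each position of the dinucleotide; the function names and the convention that the first letter is the proximal nucleoside are assumptions made for illustration.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Probe families of Figure 27B (preferred) and Figure 28 (less preferred),
# copied from the text.
FIG_27B = {
    1: {"AA", "CC", "GG", "TT"},
    2: {"AC", "CA", "GT", "TG"},
    3: {"AG", "CT", "GC", "TA"},
    4: {"AT", "CG", "GA", "TC"},
}
FIG_28 = {
    1: {"AA", "AC", "GA", "GC"},
    2: {"CA", "CC", "TA", "TC"},
    3: {"AG", "AT", "GG", "GT"},
    4: {"CG", "CT", "TG", "TT"},
}


def is_preferred(families):
    """True if every family uses each base exactly once at each position."""
    for members in families.values():
        for position in (0, 1):
            if sorted(d[position] for d in members) != sorted(BASES):
                return False
    return True


def template_options(families, family):
    """Template nucleotides that may lie opposite the proximal nucleoside.

    Assumes the first letter of each dinucleotide is the proximal nucleoside.
    """
    return {COMPLEMENT[d[0]] for d in families[family]}


print(is_preferred(FIG_27B), is_preferred(FIG_28))
print(sorted(template_options(FIG_28, 1)))
```

With the Figure 28 collection, family 1 narrows the template nucleotide to C or T, whereas with the Figure 27B collection the family name alone leaves all four template nucleotides possible.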

In certain embodiments of the invention, partially constrained probes whose constrained portion is 3 nucleosides in length are used. To include probes whose constrained portions cover all possible sequences of length 3 (which is preferred), the collection of probe families should include 4³ = 64 different probes. Figure 29A shows a diagram that can be used to generate the constrained portions for a collection of probe families comprising probes with a constrained portion 3 nucleosides long (a trinucleoside). The diagram shows 4 groups of rows labeled A, G, C, and T and 4 columns headed with the probe family names 1, 2, 3, and 4. Each group of 4 rows lies opposite a box containing a nucleoside identity. To determine the probe family of a trinucleoside, first select the box containing the last nucleoside of the trinucleoside. Among the 4 rows adjacent to that box, select the row labeled with the letter identifying the first nucleoside of the trinucleoside. Within that row, select the column containing the second nucleoside of the trinucleoside. The trinucleoside is assigned to the probe family shown at the top of that column. For example, the trinucleoside "TCG" is assigned to a probe family as follows. Because the last nucleoside is "G", attention is restricted to the group of 4 rows opposite the box containing "G", i.e., the third group. Because the first nucleoside is "T", attention is further restricted to the last of the 4 rows in that group. The probe family assignment is determined by the heading of the column containing the middle nucleoside. Because the middle nucleoside is "C", the trinucleoside is assigned to probe family 1. The same procedure gives the following probe family assignments: AAA = 1; ATA = 2; AGA = 3; GTA = 4; GAG = 1; TGG = 2; and so on. The process is continued until every possible trinucleoside has been assigned to a probe family.

Figure 29B shows a method for constructing further sets of constrained portions for collections of probe families comprising probes with a constrained portion 3 nucleosides long. The method can be used to construct, from each of the 24 preferred collections of probe families described above in which the constrained portion is 2 nucleosides long, a collection containing 4 probe families. The upper panel of the figure shows an exemplary diagram representing a preferred collection of probe families. The columns of the upper panel are mapped directly onto the lower panel according to the colors assigned to them in the upper panel. The columns of the upper panel are thus, from left to right, blue, green, yellow, and red. The entries under column 1 of the lower panel are, from top to bottom, blue, green, yellow, and red, with each group of 4 nucleosides corresponding to a column of the upper panel. Columns 2, 3, and 4 of the lower panel are generated by progressively shifting each group of 4 nucleosides of column 1 downward.

It will be appreciated that a "probe family" can be regarded as a single "super-probe" comprising multiple different probes, each bearing the same label. In that case, the probe molecules that make up the super-probe are generally not a population of molecules that are substantially identical in any portion of the probe. The term "probe family" is not intended to be limiting in any way; it is used as a convenient way of describing the features of the probes that make up such "super-probes".

Decoding

As described above, successive cycles of extension, ligation, detection, and cleavage performed in a single sequencing reaction with a collection of probe families comprising at least two distinguishably labeled probe families yield an ordered list of probe family names; alternatively, the probe family names determined in multiple sequencing reactions initiated from different sites in the template are assembled into an ordered list. The number of cycles performed should be approximately equal to the desired sequence length. The ordered list contains a large amount of information but does not immediately yield the sequence of interest. Additional steps must be performed, at least one of which involves gathering at least one further item of information about the sequence, in order to obtain the sequence most likely to represent the sequence of interest. The sequence most likely to represent the sequence of interest is referred to herein as the "correct" sequence, and the process of extracting the correct sequence from the ordered list of probe families is referred to as "decoding". It will be appreciated that the elements of the "ordered list" may be rearranged during or after its generation, provided that the information content, including the correspondence between the elements of the list and the nucleotides in the template, is preserved, and provided that the rearrangement, fragmentation, and/or permutation is properly taken into account in the decoding process (described below). Accordingly, the term "ordered list" is intended to include rearranged, fragmented, and/or permuted versions of an ordered list generated as described above, so long as such versions contain substantially the same information content.

The ordered list can be decoded in various ways. Some of these methods involve generating a set of at least one candidate sequence from the ordered list of probe family names. The set of candidate sequences may itself provide enough information to achieve the objective. In preferred embodiments, one or more additional steps are performed to select the sequence most likely to represent the sequence of interest, either from among the candidate sequences or from a set of sequences with which the candidate sequences are compared. For example, in one method, at least a portion of at least one candidate sequence is compared with at least one other sequence, and the correct sequence is selected on the basis of the comparison. In certain embodiments of the invention, decoding involves repeating the method with a collection of probe families that is encoded differently from the original collection, thereby obtaining a second ordered list of probe family names. The information in the second ordered list is used to determine the correct sequence. In some embodiments, the information obtained from as few as one cycle of extension, ligation, and detection with the alternately encoded collection of probe families is sufficient to select the correct sequence. In other words, the first probe family identified with the alternately encoded collection provides enough information to determine which candidate sequence is correct.
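A brief sketch may make the use of a second encoding more concrete. It assumes four candidate sequences of the kind derived in the Figure 30 example in the following sections, and a second preferred partition (`SECOND_ENCODING`) chosen here purely for illustration; the helper names are hypothetical as well.

```python
# Four candidate sequences consistent with one ordered list of probe families
# (the candidates of the Figure 30 example).
CANDIDATES = ["CCAGC", "ATGAA", "GGTCG", "TACTT"]

# A second, differently encoded collection of probe families.
# Assumption: this particular partition is chosen for illustration only.
SECOND_ENCODING = {
    1: {"AA", "CC", "TG", "GT"},
    2: {"CA", "AC", "GG", "TT"},
    3: {"TA", "GC", "AG", "CT"},
    4: {"GA", "TC", "CG", "AT"},
}


def family_of(families, dinucleotide):
    """Return the name of the family that contains the dinucleotide."""
    for name, members in families.items():
        if dinucleotide in members:
            return name
    raise ValueError(f"unknown dinucleotide: {dinucleotide}")


def select_by_first_cycle(candidates, families, observed_family):
    """Keep candidates whose first dinucleotide falls in the observed family."""
    return [c for c in candidates if family_of(families, c[:2]) == observed_family]


# Suppose the first cycle with the second encoding reports family 3.
print(select_by_first_cycle(CANDIDATES, SECOND_ENCODING, 3))
```

Here the four candidates begin with CC, AT, GG, and TA, which fall into four different families of the second partition, so a single cycle singles out one candidate. With a second partition that places two of these dinucleotides in the same family, one cycle would not be enough and further cycles would be needed.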

Other decoding methods involve specifically identifying at least one nucleotide in the template using any available sequencing method, such as one cycle of sequencing method A. The information about the one or more nucleotides is used as a "key" to decode the ordered list of probe family names. Alternatively, the portion of the template that is sequenced may include a region of known sequence in addition to the region of unknown sequence. If sequencing method AB is applied to a portion of the template that includes both the unknown sequence and at least one nucleotide of the known sequence, the known sequence can serve as the "key" for decoding the ordered list of probe family names. The following section describes the process of generating candidate sequences. Subsequent sections describe selecting the correct sequence by comparing the candidate sequences with known sequences, by comparing them with a second set of candidate sequences, and by making use of a known nucleotide identity.

Generating candidate sequences

It will be appreciated that the portion of the template being sequenced is complementary to the extended duplex generated by successive cycles of extension, ligation, and cleavage. Generating candidate sequences for the extended duplex is therefore equivalent to generating candidate sequences for the template region being sequenced. In practice, candidate sequences can be generated for the template region directly, or candidate sequences can be generated for the extended duplex and their complements used to determine the candidate sequences for the template region. The latter approach is described here. To generate candidate sequences from the list of probe family names, the first member of the list is considered. The set of constrained portions associated with that probe family limits the possibilities for the initial nucleotides of the sequence over a length equal to the length of the constrained portion. For example, if the constrained portion is a dinucleotide, the possible sequences of the first dinucleotide in the extended duplex are limited to the constrained portions that occur in probes belonging to that probe family (and the possible sequences of the first dinucleotide in the template region being sequenced are therefore limited to those complementary to the constrained portions that occur in probes belonging to that probe family). The possibilities for the first dinucleotide are recorded, typically by computer. Similarly, the possible sequences of the second dinucleotide in the extended duplex (i.e., the dinucleotide offset by one nucleotide from the first dinucleotide) are limited to the constrained portions that occur in probes belonging to the second probe family (and the possible sequences of the second dinucleotide in the template, i.e., the dinucleotide offset by one nucleotide from the first dinucleotide, are therefore limited to those complementary to the constrained portions that occur in probes belonging to the second probe family). The possible sequences of the second dinucleotide are also recorded. The possibilities for each subsequent dinucleotide are recorded in the same way, until the possibilities for a number of dinucleotides corresponding to the desired length of the sequence have been recorded or until no probe families remain in the list.

A representative example of the method of recording possibilities is shown in Figure 30, in which it is assumed that the list of probe family names was generated using the collection of probe families shown in Figure 25A. The leftmost column of Figure 30 shows the list of probe families in order from top to bottom: yellow, green, red, blue. The sequence possibilities for the dinucleotide corresponding to each probe family in the list are shown to the right. Nucleotide positions are indicated above the sequence possibilities. The sequence begins at position 1, so the first dinucleotide occupies positions 1 and 2, the second dinucleotide occupies positions 2 and 3, and so on. For the yellow probe family the possibilities are CC, AT, GG, and TA, as shown in Figure 30. For the green probe family the possibilities are CA, AC, GT, and TG, and so on. The process of recording the possible sequences of each dinucleotide continues until the desired sequence length is reached.
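The bookkeeping described for Figure 30 can be sketched in a few lines of Python. Only the yellow and green families, whose members are stated explicitly above, are used; `record_possibilities` is a hypothetical helper name.

```python
# Members of two probe families of the Figure 25A collection, as stated in the text.
FAMILIES = {
    "yellow": {"CC", "AT", "GG", "TA"},
    "green": {"CA", "AC", "GT", "TG"},
}


def record_possibilities(family_names, families):
    """Return, for each cycle, the positions covered and the allowed dinucleotides.

    Positions are 1-based: the i-th dinucleotide covers positions i and i + 1.
    """
    table = []
    for i, name in enumerate(family_names, start=1):
        table.append(((i, i + 1), sorted(families[name])))
    return table


for positions, options in record_possibilities(["yellow", "green"], FAMILIES):
    print(positions, options)
```

Each entry pairs the two nucleotide positions occupied by a dinucleotide with the constrained portions that remain possible for it.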

After the sets of possibilities have been generated, a first assumption is made about the identity of the first nucleotide in the candidate sequence, which is taken to be at the 5' position of the sequence and is shown as position 1 in Figure 30. The first assumption may be that the nucleotide is A, that it is G, that it is C, or that it is T.

Note that the possible sequences of each dinucleotide are constrained by the possible sequences of the adjacent dinucleotides, because adjacent dinucleotides overlap: the second nucleotide of the first dinucleotide is also the first nucleotide of the second dinucleotide. For example, if the first nucleotide is assumed to be C, the first dinucleotide must be CC. If the first dinucleotide is CC, the second dinucleotide must have C at its first position. Because CA is the only possible sequence of the second dinucleotide with C at the first position, the second dinucleotide must be CA. The sequence of the first 3 nucleotides must therefore be CCA. Similarly, the possible sequences of the third dinucleotide are constrained by the sequence of the second dinucleotide. If the second dinucleotide is CA, the third dinucleotide must be AG, because that is the only possibility with A at the first position. The sequence of the first 4 nucleotides must therefore be CCAG. Continuing in this way gives the 5-nucleotide sequence 5'-CCAGC-3'. CCAGC is thus the first candidate sequence.

A second candidate sequence is generated by assuming that the first nucleotide is A. This assumption makes the first dinucleotide AT. TG is the only possible sequence of the second dinucleotide that is consistent with the first dinucleotide being AT. GA is the only possible sequence of the third dinucleotide that is consistent with the second dinucleotide being TG. AA is the only possible sequence of the fourth dinucleotide that is consistent with the third dinucleotide being GA. Assembling these dinucleotides into a full-length candidate sequence gives ATGAA. Similarly, assuming that the first nucleotide is G gives the candidate sequence GGTCG, and assuming that the first nucleotide is T gives the candidate sequence TACTT. Four candidate sequences are thus generated, each beginning with a different nucleotide assumed to be the first nucleotide of the sequence.
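The chaining argument above lends itself to a short program. In this sketch the yellow and green families are as stated in the text, while the memberships of the red and blue families are an assumption: they are inferred from the four candidate sequences worked out above, not read from Figure 25A. The function names are hypothetical.

```python
FAMILIES = {
    "yellow": {"CC", "AT", "GG", "TA"},  # stated in the text
    "green": {"CA", "AC", "GT", "TG"},   # stated in the text
    "red": {"AG", "GA", "TC", "CT"},     # assumption: inferred from the worked example
    "blue": {"GC", "AA", "CG", "TT"},    # assumption: inferred from the worked example
}


def extend_from_first_base(first_base, family_names, families):
    """Build the one candidate that starts with first_base (preferred collections)."""
    sequence = first_base
    for name in family_names:
        # Keep the dinucleotides of this family that overlap the last base so far.
        options = [d for d in families[name] if d[0] == sequence[-1]]
        if len(options) != 1:
            raise ValueError("collection is not a preferred one: ambiguous extension")
        sequence += options[0][1]
    return sequence


def candidate_sequences(family_names, families):
    """One candidate per assumed first nucleotide, in the order A, C, G, T."""
    return [extend_from_first_base(b, family_names, families) for b in "ACGT"]


print(candidate_sequences(["yellow", "green", "red", "blue"], FAMILIES))
```

For the ordered list yellow, green, red, blue this reproduces the four candidates of the worked example, one for each assumed first nucleotide.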

There is no requirement that the assumption be made about the first nucleotide rather than one of the other nucleotides. For example, an assumption about the identity of the fourth nucleotide achieves the same result, in which case the candidate sequence is generated by moving "backward" along the template (i.e., in the 3'→5' direction). For example, assuming that the fourth nucleotide is T implies that the fourth dinucleotide must be TT, the third dinucleotide must be CT, the second dinucleotide must be AC, and the first dinucleotide must be TA. (The nucleotides are written in the 5'→3' direction even though their identities are generated by moving through the sequence in the 3'→5' direction.) Alternatively, the assumption can be made about any nucleotide within the sequence, and the dinucleotide identities generated by moving in both the 5'→3' and 3'→5' directions. It will be appreciated that unless an assumption is made about one of the nucleotides, the identity of each nucleotide is completely undetermined, since each position could be occupied by A, G, C, or T.
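Anchoring the assumption at an arbitrary position can be sketched as follows. As before, the red and blue family memberships are an assumption inferred from the worked example, and `decode_from_anchor` is a hypothetical name; the sketch applies only to preferred collections, where each step has exactly one consistent extension.

```python
FAMILIES = {
    "yellow": {"CC", "AT", "GG", "TA"},  # stated in the text
    "green": {"CA", "AC", "GT", "TG"},   # stated in the text
    "red": {"AG", "GA", "TC", "CT"},     # assumption: inferred from the worked example
    "blue": {"GC", "AA", "CG", "TT"},    # assumption: inferred from the worked example
}


def decode_from_anchor(family_names, families, position, base):
    """Decode the whole sequence from one assumed nucleotide.

    position is the 1-based index of the nucleotide whose identity is assumed.
    The i-th family (0-based) constrains nucleotides i and i + 1 (0-based).
    """
    length = len(family_names) + 1
    sequence = [None] * length
    sequence[position - 1] = base
    # Move in the 5'->3' direction from the anchor.
    for i in range(position - 1, length - 1):
        (next_base,) = [d[1] for d in families[family_names[i]] if d[0] == sequence[i]]
        sequence[i + 1] = next_base
    # Move in the 3'->5' direction from the anchor.
    for i in range(position - 2, -1, -1):
        (previous_base,) = [d[0] for d in families[family_names[i]] if d[1] == sequence[i + 1]]
        sequence[i] = previous_base
    return "".join(sequence)


ordered_list = ["yellow", "green", "red", "blue"]
print(decode_from_anchor(ordered_list, FAMILIES, 4, "T"))
```

Assuming T at position 4 yields the same candidate as assuming T at position 1, which illustrates that the choice of anchor position does not change the set of candidates.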

When a preferred collection of probe families is used, assuming the identity of any single nucleotide (e.g., the first nucleotide) yields one and only one candidate sequence. When a less preferred collection of probe families is used, however, it may be necessary to assume the identity of more than one nucleotide; that is, assuming the identity of the first nucleotide does not fully determine the rest of the sequence. For example, a less preferred collection of probe families might include a family whose members have the constrained sequences AA and AC. In that case, assuming that the first nucleotide is A leaves two possibilities for the second nucleotide. Sequencing with less preferred collections of probe families is discussed further below. It will be appreciated that if the constrained portion consists of non-contiguous nucleotides, the method described above can still be used with minor modifications.
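With a less preferred collection, candidate generation branches instead of following a single path. The sketch below uses the Figure 28 families quoted earlier in the text; `enumerate_candidates` is a hypothetical helper, and the two-element family list is an arbitrary input chosen for illustration.

```python
# Less preferred collection of Figure 28, as given in the text.
FIG_28 = {
    1: {"AA", "AC", "GA", "GC"},
    2: {"CA", "CC", "TA", "TC"},
    3: {"AG", "AT", "GG", "GT"},
    4: {"CG", "CT", "TG", "TT"},
}


def enumerate_candidates(family_names, families):
    """Enumerate every sequence consistent with an ordered list of family names."""
    partial = list("ACGT")  # every possible first nucleotide
    for name in family_names:
        partial = [
            sequence + dinucleotide[1]
            for sequence in partial
            for dinucleotide in sorted(families[name])
            if dinucleotide[0] == sequence[-1]
        ]
    return partial


print(enumerate_candidates([1], FIG_28))
print(enumerate_candidates([1, 2], FIG_28))
```

For the single family 1, a first nucleotide of A is followed by either A or C, so the assumption about the first nucleotide alone does not fix the sequence; with the list 1, 2 four candidates remain, two beginning with A and two with G.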

Sequence identification by comparing candidate sequences with known sequences

In general, if candidate sequences for the extended duplex are determined as described above, the corresponding candidate sequences for the template region being sequenced are obtained by taking their complements. In some cases the candidate sequences themselves provide enough information for the purpose at hand. For example, if the purpose of sequencing is merely to rule out certain sequence possibilities, comparing the candidate sequences with those possibilities is sufficient. The candidate sequences shown in Figure 30 make it possible to determine, for example, that the sequenced region is not part of a poly-A tail. Longer sequences could confirm that the sequenced region is not part of a vector.
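As a small illustration, the candidates of the Figure 30 example can be converted to template candidates and checked against one excluded possibility. Writing the template strand 5'→3' as the reverse complement is a convention assumed here, and `template_sequence` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_sequence(extended_strand):
    """Complement of the extended strand, written 5'->3' (reverse complement)."""
    return extended_strand.translate(COMPLEMENT)[::-1]


# Candidate sequences for the extended duplex from the Figure 30 example.
candidates = ["CCAGC", "ATGAA", "GGTCG", "TACTT"]
template_candidates = [template_sequence(c) for c in candidates]
print(template_candidates)

# None of the template candidates consists solely of A, so a poly-A tail is ruled out.
could_be_poly_a = any(set(t) == {"A"} for t in template_candidates)
print(could_be_poly_a)
```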

In many cases it is desirable to determine the correct sequence unambiguously. According to preferred embodiments of the invention, the correct sequence is identified by comparing the candidate sequences for the template region being sequenced with a set of known sequences. The set of known sequences may be, for example, the set of sequences for a particular organism of interest. For example, if human DNA is being sequenced, the candidate sequences can be compared with the draft sequence of the human genome. See the website at www.ncbi.nih.gov/genome/guide/human/ for a guide to publicly available sources of human genome sequence. As another example, if nucleic acid derived from an infectious agent (such as a bacterium or virus isolated from a subject) is being sequenced, a database containing sequences of variant strains of that bacterium or virus can be searched. Many such organism-specific databases containing complete or partial sequences are known in the art, and more will become available as sequencing efforts accelerate. Some representative examples include the mouse database (see, e.g., the website at www.ncbi.nlm.nih.gov/genome/seq/MmHome.html), the human immunodeficiency virus database (see, e.g., the website at hiv-web.lanl.gov/content/hiv-db/mainpage.html), and the database for the malaria pathogen Plasmodium falciparum (see, e.g., the website at http://www.tigr.org/tdb/edb2/pfal/htmls/index.shtml). Of course, the set of sequences need not be specific to a particular organism. Databases such as GenBank (website at http://www.ncbi.nlm.nih.gov/Genbank/), which contains sequences from a wide variety of organisms and viruses, can be searched. The database need not even contain any sequence from the organism or virus from which the template was derived. In general, the sequences may be genomic sequences, cDNA sequences, ESTs, and so on. Multiple sets of sequences can be searched.

Performing the search alone may be sufficient for the purpose. For example, if a viral nucleic acid is isolated from a patient, comparing the candidate sequences with a set of known sequences for the virus can establish whether the nucleic acid contains sequence from that virus, even if the matching sequence itself is never examined. The presence of a match confirms that the patient is infected with the virus, whereas the absence of a match indicates that the patient is not infected with the virus.

In certain embodiments, the set of known sequences covers a narrower range of sequences, which may be tailored specifically to the purpose for which the sequencing is performed. Information about the nucleic acid being sequenced can thus be used to select the set of known sequences. For example, if the template is known to represent the sequence of a particular gene, the known sequences may represent different alleles, mutant or wild-type sequences, and so on, of the gene at a given locus of interest. It may be necessary to compare the candidate sequences with only one known sequence in order to determine which candidate sequence is correct. For example, in certain embodiments of the invention the template is obtained by amplifying DNA containing a region of interest (e.g., using primers that flank the region of interest). The region of interest may include a site of mutation or polymorphism, such as a mutation or polymorphism associated with a particular disease. If the template is known to represent the sequence of a particular region of interest, the candidate sequences need only be compared with a single reference sequence, such as the wild-type or mutant form of the sequence for that region. In other words, if part or all of the template sequence is known, comparison with multiple known sequences may be unnecessary. Instead, the candidate sequence that contains all or part of the known sequence is selected as the correct sequence. For example, mutations in the BRCA1 and BRCA2 genes are known to be associated with an increased risk of breast cancer, and there is considerable interest in determining whether a subject carries such a mutation. If the template is known to contain sequence from the BRCA1 gene, for example because primers flanking a region of interest comprising part of the gene were used to generate the clonal population of templates, the candidate sequences need only be compared with the wild-type or mutant BRCA1 sequence to determine the correct sequence.
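Comparison with a single reference can be as simple as testing which candidate occurs in the reference. The following sketch uses the four candidates of the Figure 30 example and a made-up reference fragment; `REFERENCE` is hypothetical and is not a real BRCA1 sequence, and exact substring matching is a simplification of whatever comparison would be used in practice.

```python
def select_by_reference(candidates, reference):
    """Return the candidates that occur as exact substrings of the reference."""
    return [c for c in candidates if c in reference]


candidates = ["CCAGC", "ATGAA", "GGTCG", "TACTT"]

# Hypothetical reference fragment, invented for this illustration.
REFERENCE = "GATTACCAGCTTGA"

print(select_by_reference(candidates, REFERENCE))
```

Only one of the four candidates occurs in the reference fragment, so that candidate would be selected as the correct sequence in this toy case.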

In more general terms, comparing the candidate sequences with a set of known sequences identifies any known sequences that are similar to a candidate sequence. Provided the candidate sequences are sufficiently long, it is very unlikely that the database will contain sequences identical or very similar to more than one of the candidate sequences. In other words, if the candidate sequences are long enough, it is very unlikely that more than one candidate sequence will be identical to a sequence in the set of known sequences. Any known sequence found to be sufficiently similar to a candidate sequence is considered a "match". It is generally necessary to set the identity threshold required to conclude that a match exists. For example, a candidate sequence may be considered to match a known sequence if the two are at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or even 100% identical. Percent identity is generally evaluated over a window of at least 10 nucleotides in length, e.g., 10-15 nucleotides, 15-20 nucleotides, 20-25 nucleotides, 25-30 nucleotides, etc. The window length can be selected according to various criteria, including but not limited to the number of sequences in the set of known sequences and the type or source of those sequences. For example, if the candidate sequences are compared with a large database such as GenBank, a longer window may be needed than with a database containing fewer sequences. In certain embodiments of the invention, sequences are compared over a plurality of different windows, which are not necessarily adjacent to one another. Preferably, the total length of the windows is at least 10 nucleotides, e.g., 10-15 nucleotides, 15-20 nucleotides, 20-25 nucleotides, 25-30 nucleotides, etc. In some cases, multiple sequences in the set of known sequences may match. These sequences may, for example, represent homologous genes found in the same organism as the one from which the template was derived, homologous genes from different organisms, pseudogenes, cDNA and genomic sequences, etc.
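The window-and-threshold comparison described above can be sketched as follows. This is a minimal illustration, not the method of the invention: the function names, the choice to test only the first window of the candidate, and the example sequences are all assumptions made for the sketch.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length windows agree."""
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and of equal length")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def find_matches(candidate, known, window=10, threshold=90.0):
    """Names of known sequences containing a window that matches the
    first `window` nucleotides of the candidate at >= threshold percent."""
    hits = []
    probe = candidate[:window]
    if len(probe) < window:
        return hits  # candidate shorter than the comparison window
    for name, seq in known.items():
        for i in range(len(seq) - window + 1):
            if percent_identity(probe, seq[i:i + window]) >= threshold:
                hits.append(name)
                break  # one matching window is enough for this sequence
    return hits


# Hypothetical known-sequence set, for illustration only.
known = {
    "ref1": "TTCAGACGACAAGTATAATGCC",
    "ref2": "GGGGGGGGGGGGGGGGGGGG",
}
print(find_matches("CAGACGACAAGT", known))                   # exact window
print(find_matches("CAGACGTCAAGT", known))                   # one mismatch in ten
print(find_matches("CAGACGTCAAGT", known, threshold=100.0))  # stricter threshold
```

Raising `window` or `threshold` makes a match more stringent, which is the adjustment the text suggests when searching a large database.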

Typically, the candidate sequence that most closely matches a sequence in the set of known sequences is selected as the correct sequence. Alternatively, if there is reason to believe that the sequencing method may have a high error rate, the corresponding sequence in the database is preferably selected as the correct sequence. For example, if the error rate is known to exceed a predetermined threshold, the sequence in the database is preferably selected as the correct sequence.

The length required to ensure a high likelihood of identifying a match among the candidate sequences depends on various factors, including but not limited to the particular set of known sequences and the threshold for accepting a match. In general, a sequence about 25-26 nucleotides long occurs only once in the genome of a typical organism. Therefore, generating candidate sequences of approximately this length is sufficient to identify the correct sequence. Generally, the candidate sequences should be at least 10 nucleotides long, preferably at least 15 or at least 20 nucleotides, e.g., 20-25, 25-30, 30-35, 35-40, 45-50 nucleotides, or even longer.

Sequence identification by comparing a first set of candidate sequences with a second set of candidate sequences

In certain embodiments of the invention, decoding is performed by using a first collection of probe families, encoded according to a first encoding scheme, to generate a first ordered list of probe families from which a first set of candidate sequences is generated, and then using a second collection of probe families, encoded according to a second encoding scheme, to generate a second ordered list of probe families from the same template, from which a second set of candidate sequences is generated. Either the newly synthesized DNA strand is removed from the template between the two sequencing reactions, or a template of identical sequence is sequenced with the second collection of probe families. The sets of candidate sequences are then compared. It should be understood that, whichever collection of probe families is used, one of the candidate sequences is the correct sequence and the others are not (or are at most partially correct). Thus each set of candidate sequences contains the correct sequence, but in most cases the other candidate sequences in any given set differ from the sequences found in the other set. The correct sequence can therefore be determined simply by comparing the two sets of candidate sequences. The two differently encoded collections of probe families need not be used to generate candidate sequences of equal length. In preferred embodiments of the invention, the candidate sequences generated with the second collection of probe families can be as short as 2 nucleotides; equivalently, the ordered list of probe families generated with the second collection can be as short as 1 element (i.e., 1 cycle of ligation and detection).

Figures 31A-31C show an example of candidate sequence generation and decoding using two collections of distinguishably labeled preferred probe families. Figure 31A shows a preferred collection of probe families encoded according to a first encoding scheme. Figure 31B shows the generation of 4 candidate sequences from the ordered list of probe families yellow, green, red, blue (which can be written "2314", where red = 1, yellow = 2, green = 3, blue = 4), in which the correct sequence is assumed to be CAGGC (shown in bold). Figure 31C shows a preferred collection of probe families encoded according to a second encoding scheme. Since the first dinucleotide in the template is CA, the uppermost probe of the yellow probe family will be ligated to the extendable terminus in the first cycle of extension. This restricts the first dinucleotide to one of the following candidates: CA, TC, GG, AT. Of the candidate sequences generated with the first collection of probe families, only CAGGC begins with any of these dinucleotides. It must therefore be the correct sequence. In general, the first and second collections of probe families preferably satisfy the following conditions when compared with each other: (i) 3 of the 4 probes in each probe family of the first collection should be assigned to a new probe family in the second collection; and (ii) each of these 3 reassigned probes should be assigned to a different probe family in the second collection.
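The Figure 31 example can be sketched in code as follows. The first encoding scheme is not reproduced in this section, so the sketch assumes a 2-base encoding in which the family number is the XOR of the two base indices (A=0, C=1, G=2, T=3) plus one; this assumption agrees with the assignments that can be read off the worked examples here (CA = 2, AG = 3, GG = 1, GC = 4). The set of first dinucleotides allowed by the second scheme is taken directly from the text. Function and variable names are illustrative.

```python
BASES = "ACGT"


def candidates(ordered_list: str) -> list[str]:
    """One candidate per assumed first nucleotide, using the assumed
    encoding family = (index(x) XOR index(y)) + 1."""
    result = []
    for first in BASES:
        seq = first
        for digit in ordered_list:
            prev = BASES.index(seq[-1])
            seq += BASES[prev ^ (int(digit) - 1)]
        result.append(seq)
    return result


# Ordered list yellow, green, red, blue = "2314" (first collection).
first_set = candidates("2314")
print(first_set)

# First dinucleotides compatible with the yellow family of the second
# collection, as stated in the text.
second_scheme_first_dinucleotides = {"CA", "TC", "GG", "AT"}

survivors = [c for c in first_set if c[:2] in second_scheme_first_dinucleotides]
print(survivors)
```

Only one candidate survives the filter, which illustrates why a single ligation and detection cycle with the second collection can be enough to pick the correct sequence.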

Decoding the ordered list of probe families using a known nucleotide identity

As described above, a candidate sequence can be generated by assuming the identity of one nucleotide in the extended duplex or in the template. Depending on the particular collection of probe families used, it is usually necessary to generate at least 4 candidate sequences. However, if the identity of at least one nucleotide in the template (and therefore in the extended duplex) is known, the need to generate multiple candidate sequences can be avoided. In this case only one candidate sequence needs to be generated, by the same method as described above. The identity of at least one nucleotide in the template can be determined by any sequencing method, including but not limited to sequencing method A, primer extension from the initializing oligonucleotide using a set of distinguishably labeled nucleotides and a polymerase, etc. It should be understood that one or more nucleotides in the template may first be sequenced by a method other than sequencing method AB, after which the initializing oligonucleotide and any extension products may be removed and the same template sequenced by sequencing method AB (or vice versa).

Another approach is simply to sequence, in addition to the portion whose sequence is to be determined, a portion of the template that contains one or more nucleotides of known identity. For example, the portion between the region to which the initializing oligonucleotide binds and the start of the unknown sequence may include one or more nucleotides of known identity. When sequencing method AB is applied to this portion of the template, the identity of one or more nucleotides in the sequence is known in advance and can therefore be used to generate a single candidate sequence, which will be the correct sequence.

Thus, the method described above comprises the steps of: (i) assigning an identity to the nucleotide adjacent to the nucleotide of known identity in the template, by determining which identity is consistent both with the known nucleotide identity and with the possible sequences of the constrained portion of the probe whose proximal nucleotide was ligated opposite the nucleotide adjacent to the nucleotide of known identity; (ii) assigning an identity to a subsequent nucleotide, by determining which identity is consistent with the possible sequences of the constrained portion of the probe whose proximal nucleotide was ligated opposite that subsequent nucleotide; and (iii) repeating step (ii) until the sequence is determined. It should be understood that these steps are equivalent to performing the same steps on the extended duplex, since there is an exact correspondence between the extended duplex and the template region being sequenced.
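Steps (i) to (iii) above can be sketched as a table lookup. The family table below is a hypothetical reconstruction: the entries CA, AG, GG and GC agree with the Figure 31 example in this section, and the remaining entries are filled in by symmetry as an assumption. The function name is illustrative.

```python
# Assumed constrained-portion dinucleotides for each probe family
# (hypothetical reconstruction; not copied from the figures).
FAMILIES = {
    1: {"AA", "CC", "GG", "TT"},
    2: {"AC", "CA", "GT", "TG"},
    3: {"AG", "GA", "CT", "TC"},
    4: {"AT", "TA", "CG", "GC"},
}


def decode_from_known(ordered_list: str, known_base: str) -> str:
    """Walk along the ordered list, at each step keeping the only
    dinucleotide of the detected family that is consistent with the
    nucleotide already assigned."""
    sequence = known_base
    for digit in ordered_list:
        options = [d for d in FAMILIES[int(digit)] if d[0] == sequence[-1]]
        if len(options) != 1:
            # A less preferred encoding could leave zero or several options.
            raise ValueError("encoding does not give a unique next nucleotide")
        sequence += options[0][1]
    return sequence


# With the first nucleotide known to be C, the list "2314" decodes to a
# single candidate.
print(decode_from_known("2314", "C"))
```

Because each family in this assumed table contains exactly one dinucleotide per first nucleotide, a single known nucleotide is enough to fix every later one.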

Sequencing with less preferred probe families

Sequencing method AB can be performed with a less preferred collection of probe families in a manner similar to that used with a preferred collection. However, the results may differ in a number of respects. For example, certain portions of the sequence can be fully identified from the candidate sequences without additional information. Figure 32 shows an example of sequence determination using the less preferred collection of probe families encoded as shown in Figure 28. Sequence determination generally proceeds as described for preferred collections of probe families. The template of interest has the sequence "GCATGA", for which the ordered list of probe families generated is "12341". Assuming that the nucleotide at position 1 is A, the candidate sequence generated is "ACATGA". However, unlike the case with a preferred collection of probe families, there are two possibilities for the second nucleotide, since label "1" is associated with two different dinucleotides having A as the first nucleotide, namely "AA" and "AG". Therefore, assuming that the nucleotide at position 1 is A, a second candidate sequence, "ACATGC", is also generated. Assuming that the nucleotide at position 1 is G, the candidate sequence "GCATGA" is generated, and "GCATGC" is also generated as a candidate. Since label "1" is not associated with any dinucleotide having C or T at position 1, no candidate sequences beginning with "C" or "T" are generated. Figure 32 shows the 4 candidate sequences aligned with one another. It will be observed that the middle 4 nucleotides are CATG in all candidate sequences. Positions 2-5 of the correct sequence must therefore be CATG. If only these nucleotides are of interest, no further decoding steps are needed.
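The observation that all four candidates share the same middle nucleotides can be checked mechanically. The sketch below takes the four candidate sequences quoted in the text and reports, column by column, the nucleotide on which they all agree; the use of "N" for an unresolved position is a convention chosen for this illustration.

```python
def conserved_positions(candidate_sequences: list[str]) -> str:
    """Column-wise agreement of equal-length candidates: the shared
    nucleotide where all candidates agree, 'N' elsewhere."""
    if len({len(c) for c in candidate_sequences}) != 1:
        raise ValueError("candidates must be non-empty and of equal length")
    return "".join(
        column[0] if len(set(column)) == 1 else "N"
        for column in zip(*candidate_sequences)
    )


# The four candidate sequences of the Figure 32 example.
figure_32_candidates = ["ACATGA", "ACATGC", "GCATGA", "GCATGC"]
print(conserved_positions(figure_32_candidates))  # positions 2-5 resolved
```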

As mentioned above, a collection of probe families need not consist of four different probe families, but may consist of more than 2 and fewer than 4^N families, where N is the length of the constrained portion. However, if fewer than 4 families are used, it may be necessary to generate more than 4 candidate sequences, and if more than 4 probe families are used, additional labels are required. For these and other reasons, collections consisting of 4 probe families are preferred.

Sequence identification by comparing candidate sequences with one another

In certain embodiments of the invention, part or all of the sequence of interest can be determined by comparing the candidate sequences with one another. Typically, such a comparison is not sufficient to determine which candidate sequence is correct over its entire length. However, if two or more candidate sequences are identical or sufficiently similar over a portion of the sequence, this information may be enough to unambiguously identify the nucleotide sequence within that portion of the template.

If desired, the template can be resequenced one or more times with alternatively encoded probe families to identify additional portions of the sequence. These portions can be combined to assemble a sequence of the desired length.

Error correction using probe families

It is often desirable to sequence multiple templates that represent all or part of the same DNA sequence and to align the resulting sequences. If the templates each contain only part of the region of interest, a longer sequence is obtained by assembling overlapping fragments. For example, when sequencing the genome of an organism, the DNA is typically fragmented and enough fragments are sequenced that each stretch of DNA is covered by several (e.g., 4-12) different fragments. Computer software for assembling overlapping sequences into longer sequences is known to those skilled in the art.

With conventional sequencing methods, it is often the case that multiple fragments align perfectly over a region, except that one of them (referred to as the anomalous fragment) differs from the others at a single position in that region. Determining whether such an isolated difference represents a sequencing error or a genuine difference at that position (e.g., a single nucleotide polymorphism) can be problematic.

The present invention provides a new method of error checking using sequencing method AB. According to this method, templates comprising fragments that represent the same stretch of DNA are sequenced using a collection of distinguishably labeled probe families as described above, generating an ordered list of probe families for each template. The ordered lists of probe families are then aligned. If several lists align perfectly over a predetermined length, e.g., 10, 15, 20, 25 or more elements, except that one list differs from the others at a single position, the difference is attributed to a sequencing error. If an actual polymorphism is present, the ordered list generated from the anomalous fragment will differ from the lists generated from the other fragments at two or more adjacent positions.

For example, applying sequencing method AB, with the preferred collection of probe families of encoding 4 in Table 1, to a template containing the sequence 5'-CAGACGACAAGTATAATG-3' generates the following ordered list of probe families: "23324322132444142", as shown below:

23324322132444142

CAGACGACAAGTATAATG

If there is an actual SNP (e.g., CAGACGA[G]AAGTATAATG, where the bracketed nucleotide marks the polymorphic site), two consecutive elements of the list change: 233243[33]132444142, where the brackets mark the changes caused by the SNP. The correspondence between the ordered list of probe families and the SNP-containing sequence is shown below:

233243[33]132444142

CAGACGA[G]AAGTATAATG

However, an error in identifying the label associated with a ligated extension probe produces a single error in the ordered list of probe families and changes the resulting candidate sequence from that point onward. For example, an error in determining the label associated with the 7th ligated extension probe, giving 233243[3]2132444142 (where the bracketed digit marks the misidentified label), changes the resulting candidate sequence to CAGACGA[GTTCATATTAC], where the bracketed portion marks the changes caused by the sequencing error. The correspondence between the ordered list of probe families and this sequence is shown below:

233243[3]2132444142

CAGACGA[GTTCATATTAC]
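The three cases above (reference, SNP, and misread label) can be reproduced with a short sketch. Table 1 is not part of this section, so the encoding used here is an assumption: family = (index(x) XOR index(y)) + 1 with A=0, C=1, G=2, T=3, which reproduces the ordered lists quoted in the text. The names `encode` and `decode` are illustrative.

```python
BASES = "ACGT"


def encode(sequence: str) -> str:
    """Ordered list of probe families for a sequence, one element per
    overlapping dinucleotide, under the assumed XOR-based encoding."""
    return "".join(
        str((BASES.index(a) ^ BASES.index(b)) + 1)
        for a, b in zip(sequence, sequence[1:])
    )


def decode(ordered_list: str, first_base: str) -> str:
    """Candidate sequence for an ordered list, given the first nucleotide."""
    seq = first_base
    for digit in ordered_list:
        seq += BASES[BASES.index(seq[-1]) ^ (int(digit) - 1)]
    return seq


reference_list = encode("CAGACGACAAGTATAATG")
snp_list = encode("CAGACGAGAAGTATAATG")           # C -> G at position 8
misread = decode("23324332132444142", "C")        # 7th label misidentified

print(reference_list)  # 23324322132444142
print(snp_list)        # 23324333132444142
print(misread)         # CAGACGAGTTCATATTAC
```

The SNP changes two adjacent elements of the list because the altered nucleotide belongs to two overlapping dinucleotides, whereas the single misread label shifts every nucleotide decoded after it.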

With a 3-base, 4-label scheme, a fragment containing a SNP produces 3 consecutive differences in the ordered list of probe families from the anomalous fragment, whereas a sequencing error produces only 1. For example, with a collection of probe families encoded as shown in Figure 29, the ordered list of probe families for the sequence CAGACGACAAGTATAATG is as follows:

2322224132412244

CAGACGACAAGTATAATG

An anomalous fragment containing a SNP, such as CAGACGA[G]AAGTATAATG, yields an ordered list of probe families that differs at 3 consecutive positions (bracketed) from the ordered list generated from a fragment without the SNP, as shown below:

23222[133]32412244

CAGACGA[G]AAGTATAATG

A sequencing error produces only one difference in the ordered list of probe families, and results in a candidate sequence that is entirely different from the point of the error onward.

Thus, when the ordered list of probe families generated from one fragment (the anomalous fragment) aligns with the ordered lists generated from other fragments representing the same stretch of DNA but differs from them at a single isolated position, the list containing the difference probably reflects a sequencing error (misidentification of a probe family). When the ordered list generated from one fragment (the anomalous fragment) aligns with the ordered lists generated from other fragments representing the same stretch of DNA but differs from them at 2 or more consecutive positions, the anomalous fragment probably contains a SNP. Preferably, the aligned portion of the ordered lists of probe families is at least 3 or 4 elements long, preferably at least 6, 8, or more elements long. Preferably, the aligned portions are at least 66% identical, at least 70% identical, at least 80% identical, at least 90% identical, or more, e.g., 100% identical.

Similarly, when the candidate sequence from one fragment aligns with the candidate sequences from other fragments representing the same stretch of DNA over a first portion of the sequence but differs significantly from them over a second portion, a sequencing error has probably occurred. When the candidate sequence from one fragment aligns with the candidate sequences from other fragments representing the same stretch of DNA over two portions of the sequence but differs at only a single position, the anomalous fragment probably contains a SNP. Preferably, the aligned portion of the candidate sequences is at least 4 nucleotides long. Preferably, the aligned portions are at least 66% identical, at least 70% identical, at least 80% identical, at least 90% identical, or more, e.g., 100% identical.

Accordingly, the invention provides a method of distinguishing a single nucleotide polymorphism from a sequencing error, the method comprising the steps of: (a) sequencing a plurality of templates using sequencing method AB, wherein the templates represent overlapping fragments of a single nucleic acid sequence; (b) aligning the sequences obtained in step (a); and (c) determining that a difference between the sequences represents a sequencing error if the sequences are substantially identical over a first portion and substantially different over a second portion, each portion being at least 3 nucleotides long. The invention also provides a method of distinguishing a single nucleotide polymorphism from a sequencing error, the method comprising the steps of: (a) performing sequencing method AB on a plurality of templates representing overlapping fragments of a single nucleic acid sequence, thereby obtaining a plurality of ordered lists of probe families; (b) aligning the ordered lists of probe families obtained in step (a) to obtain an aligned region over which the ordered lists are at least 90% identical; and (c) determining that a difference between the ordered lists of probe families represents a sequencing error if the lists differ at only one position within the aligned region; or (d) determining that a difference between the ordered lists of probe families represents a single nucleotide polymorphism if the lists differ at two or more consecutive positions within the aligned region.
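Steps (c) and (d) of the second method can be sketched as a comparison of two already aligned ordered lists. This is a minimal illustration under stated assumptions: the lists are taken to be of equal length and already aligned, the 90% identity check on the aligned region is left out, and the function names and the "unclassified" fallback are choices made for the sketch rather than part of the method.

```python
def differing_positions(list_a: str, list_b: str) -> list[int]:
    """1-based positions at which two aligned ordered lists differ."""
    if len(list_a) != len(list_b):
        raise ValueError("ordered lists must be aligned and of equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(list_a, list_b)) if x != y]


def classify_difference(consensus_list: str, anomalous_list: str) -> str:
    """Apply the single-position versus consecutive-positions rule."""
    diffs = differing_positions(consensus_list, anomalous_list)
    if not diffs:
        return "identical"
    if len(diffs) == 1:
        return "sequencing error"
    if diffs == list(range(diffs[0], diffs[0] + len(diffs))):
        return "possible SNP"
    return "unclassified"  # scattered differences: outside the stated rule


# Ordered lists quoted in the worked examples of this section.
print(classify_difference("23324322132444142", "23324332132444142"))
print(classify_difference("23324322132444142", "23324333132444142"))
print(classify_difference("2322224132412244", "2322213332412244"))
```

The same rule covers the 3-base scheme, where a SNP shows up as three consecutive differences instead of two.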

Delocalized information collection

As is well known in the art, a "bit" (binary digit) is a single digit in base 2, i.e., a 1 or a 0, and represents the smallest unit of digital data. Because a nucleotide can have any one of four identities, it will be appreciated that 2 bits are required to specify the identity of a nucleotide. For example, A, G, C, and T can be represented as 00, 01, 10, and 11, respectively. Two bits are likewise required to specify a probe family name in a preferred collection of distinguishably labeled probe families, since there are four distinguishably labeled probe families.

In most conventional forms of sequencing, and in sequencing method A, each nucleotide is identified as a discrete unit, and information corresponding to one nucleotide is collected at a time. Each detection step acquires two bits of information from one nucleotide. In contrast, sequencing method AB acquires less than 2 bits of information from each of a plurality of nucleotides in each detection step, while still acquiring 2 bits of information per detection step when a preferred collection of probe families is used. Each probe family name in the ordered list of probe families represents the identities of at least 2 nucleotides in the template, the exact number being determined by the length of the sequence-determining portion of the probes. For example, consider the ordered list of probe families obtained from the sequence 5'-CAGACGACAAGTATAATG-3' using a collection of probe families encoded according to encoding 4 of Table 1:

23324322132444142

CAGACGACAAGTATAATG

Probe family 2 is the first probe family in the list because the dinucleotide CA is one of the specified portions present in the probes of probe family 2. Probe family 3 is the second probe family in the list because the dinucleotide AG is one of the specified portions present in the probes of probe family 3. As noted above, since there are 4 probe families, each probe family identity represents 2 bits of information. Thus each detection step gathers 2 bits of information about 2 nucleotides, an average of 1 bit per nucleotide.

Accordingly, the invention provides a method of determining a sequence, wherein the method comprises multiple cycles of extension, ligation, and detection, and wherein the detection step comprises simultaneously acquiring an average of two bits of information from each of at least two nucleotides in the template without acquiring two bits of information from any individual nucleotide. The invention also provides a method of determining the nucleotide sequence of a template polynucleotide using a first collection of oligonucleotide probe families, the method comprising the steps of: (a) performing successive cycles of extension, ligation, detection, and cleavage, wherein in each cycle an average of two bits of information is simultaneously acquired from each of at least two nucleotides in the template without acquiring two bits of information from any individual nucleotide; and (b) combining the information obtained in step (a) with at least one bit of additional information to determine the sequence. In various embodiments of the invention, the at least one bit of additional information comprises information selected from the group consisting of: the identity of a nucleotide in the template; information obtained by comparing a candidate sequence with at least one known sequence; and information obtained by repeating the method using a second collection of oligonucleotide probe families.

Thus, although the method does not acquire 2 bits of information from any individual nucleotide, an average of 2 bits of information about the template is collected in each cycle, in a delocalized manner, when a preferred collection of probe families is used. When a collection of 2 or 3 probe families is used, less than 2 bits of information is collected per cycle.
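The bit counts in this discussion follow from the fact that distinguishing among k equally likely outcomes takes log2(k) bits. The short sketch below computes the information per detection cycle for 2, 3, and 4 probe families, and the average per nucleotide when each probe family name reflects a dinucleotide. It is a back-of-the-envelope illustration that assumes the probe families are equally likely to be observed; the function name is illustrative.

```python
import math


def bits_per_cycle(num_probe_families: int) -> float:
    """Information gained by one detection step, assuming each probe
    family is equally likely to be observed."""
    return math.log2(num_probe_families)


for families in (2, 3, 4):
    print(families, "families:", round(bits_per_cycle(families), 3), "bits per cycle")

# Two bits specify one nucleotide (four possible identities).
bits_per_nucleotide_identity = math.log2(4)

# With 4 families and a 2-nucleotide sequence-determining portion, each
# detection step yields 2 bits spread over 2 nucleotides.
average_bits_per_nucleotide = bits_per_cycle(4) / 2
print(average_bits_per_nucleotide)
```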

Delocalized information collection has a number of advantages, including the ability to apply the error-checking methods described above. In addition, since in preferred embodiments each nucleotide in the template is interrogated more than once, delocalized information collection helps avoid systematic bias in the detection of the fluorophore associated with a particular nucleotide.

In addition to methods that involve successive cycles of extension, ligation, and cleavage of probes, the probe families and collections of probe families described herein can be used in a variety of other sequencing methods. The invention also provides probe families and collections of probe families having the sequences and structures described above, wherein the probes optionally do not contain a scissile linkage. For example, the probes may contain only phosphodiester backbone linkages and/or may lack a trigger residue. In some embodiments of the invention, the probe families are used for sequencing by successive cycles of extension and ligation in which the cycles do not include cleavage. For example, the probe families can be used in ligation-based methods such as those described in WO2005021786 and elsewhere in the art. To use the probe families in such methods, the labels on the probes should be attached by a cleavable linker, as described in WO2005021786, so that the label can be removed without cleaving a scissile linkage in the nucleic acid. Such methods can be used to generate an ordered list of probe families, for example by performing multiple reactions in parallel or sequentially using probe families rather than the ligation cassettes described in WO2005021786, and then assembling the list of probe families. The list is decoded as described above.

I. Kits

Various kits can be provided for practicing different embodiments of the invention. Certain kits include extension oligonucleotide probes containing phosphorothiolate linkages. Such a kit may also include one or more initializing oligonucleotides. The kit may contain a cleavage reagent suitable for cleaving phosphorothiolate linkages, such as AgNO₃, and a suitable buffer in which to perform the cleavage. Certain kits include extension oligonucleotide probes containing a trigger residue, such as a nucleoside containing a damaged base or an abasic residue. Such a kit may also include one or more initializing oligonucleotides. The kit may contain a cleavage reagent suitable for cleaving the linkage between a nucleoside and an adjacent abasic residue and/or a reagent suitable for removing a damaged base from a polynucleotide, such as a DNA glycosylase. Certain kits include oligonucleotide probes containing disaccharide nucleotides and include periodate as a cleavage reagent. In certain embodiments, the kit contains a collection of distinguishably labeled oligonucleotide probe families.

Kits may also include ligation reagents (e.g., ligase, buffers, etc.) and instructions for practicing particular embodiments of the invention. Buffers suitable for other enzymes that may be employed, such as phosphatases and polymerases, may be included. In some cases these buffers may be the same. Kits may also include a support, such as magnetic beads, for anchoring the templates. The beads may be functionalized with PCR amplification primers. Other optional components include wash solutions; vectors into which templates are inserted for PCR amplification; PCR reagents such as amplification primers, thermostable polymerase, and nucleotides; reagents for preparing emulsions; reagents for preparing gels; etc.

In certain preferred kits, fluorescently labeled oligonucleotide probes containing phosphorothiolate linkages are provided such that probes corresponding to different probe terminal nucleotides carry different spectrally resolvable fluorescent dyes. More preferably, four such probes are provided, so that there is a one-to-one correspondence between the four spectrally resolvable fluorescent dyes and the four possible probe terminal nucleotides.
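
The one-to-one correspondence between dyes and probe terminal nucleotides can be pictured as a lookup table from detected dye to nucleotide. The following is a minimal Python sketch only; the dye names (`dye_1` to `dye_4`), the particular base assignment, and the brightest-channel rule are illustrative assumptions and are not taken from the kit description.

```python
# Hypothetical assignment of four spectrally resolvable dyes to the four
# possible probe terminal nucleotides (names and order are assumptions).
DYE_TO_BASE = {"dye_1": "A", "dye_2": "C", "dye_3": "G", "dye_4": "T"}


def is_one_to_one(mapping):
    """True if each of the four dyes maps to a distinct nucleotide."""
    return len(mapping) == 4 and set(mapping.values()) == set("ACGT")


def call_terminal_base(intensities):
    """Return the nucleotide whose dye channel has the highest intensity.

    intensities: dict mapping dye name -> measured signal (arbitrary units).
    """
    brightest = max(intensities, key=intensities.get)
    return DYE_TO_BASE[brightest]
```

Because the mapping is one-to-one, a single dominant channel identifies the probe terminal nucleotide unambiguously; a real base caller would also need to handle background and spectral cross-talk.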

An identifier, such as a barcode or a radio-frequency ID tag, may be present in or on the kit. The identifier can be used, for example, to uniquely identify the kit for purposes of quality control, inventory control, tracking, movement between workstations, and the like.

Kits generally include one or more vessels or containers so that certain reagents can be stored separately. Kits may also include a means for enclosing the individual containers in relatively close confinement for commercial sale, such as a plastic box, in which instructions and packaging materials such as styrofoam may also be placed.

J. Automated Sequencing Systems

The invention provides a variety of automated sequencing systems that can be used to gather sequence information from multiple templates in parallel (i.e., substantially simultaneously). Preferably, the templates are arrayed on a substantially planar substrate. Figure 21 shows photographs of one system of the invention. As shown in the upper photograph, the system includes a CCD camera, a fluorescence microscope, a movable stage, a Peltier flow cell, a temperature controller, a fluid handling device, and a dedicated computer. It will be appreciated that various substitutions may be made for these components. For example, a different image capture device may be used. See Example 9 for further details of this system.

It will be appreciated that the automated sequencing systems of the invention, and the associated image processing methods and software, can be used to practice a variety of sequencing methods. These include the ligation-based methods described herein as well as other methods, including but not limited to sequencing by synthesis, such as fluorescent in situ sequencing by synthesis (FISSEQ) (see, e.g., Mitra RD et al., Anal Biochem., 320(1):55-65, 2003). As with the ligation-based sequencing methods described herein, FISSEQ can be performed on templates immobilized directly in or on a semi-solid support, on templates attached to microparticles that are immobilized in or on a semi-solid support, on templates attached directly to a substrate, and so on.

One important aspect of the systems of the invention is the flow cell. In general, a flow cell comprises a chamber with input and output ports through which fluid can flow. See, e.g., U.S. Patents 6,406,848 and 6,654,505 and PCT Publication No. WO98053300 for discussion of various flow cells and of materials and methods for their manufacture. Fluid flow allows various reagents to be added to and removed from entities (e.g., templates, microparticles, analytes) located in the flow cell.

Preferably, a flow cell suitable for use in the sequencing systems of the invention includes a location at which a substantially planar substrate, such as a glass slide, can be mounted so that fluid flows over the surface of the substrate, and a window that allows illumination, excitation, signal acquisition, and the like. In accordance with the methods of the invention, entities such as microparticles are generally arrayed on the substrate before it is placed in the flow cell.

In certain embodiments of the invention, the flow cell is oriented vertically so that air bubbles escape from the top of the flow cell. The flow cell is arranged so that the fluid path runs from the bottom of the flow cell to the top, i.e., the input port is at the bottom of the flow cell and the output port is at the top. Because any bubbles that may be introduced are buoyant, they rise rapidly to the output port without obscuring the illumination window. This approach, in which bubbles rise to the surface of a liquid because their density is lower than that of the liquid, is referred to herein as "gravitational bubble displacement". The invention thus provides sequencing systems in which the orientation of the flow cell allows gravitational bubble displacement. Preferably, a substrate to which microparticles are attached directly or indirectly (e.g., covalently or non-covalently linked to the substrate), or a substrate bearing microparticles immobilized in or on a semi-solid support that is adhered or affixed to the substrate, is mounted vertically in the flow cell, i.e., with the largest planar surface of the substrate perpendicular to the ground plane. Because in preferred embodiments the microparticles are immobilized in or on the support or substrate, their relative positions are substantially fixed, which facilitates successive image acquisition and image recording.

Figures 24A-J show schematic views, in various orientations, of flow cells of the invention or portions thereof. The flow cells of the invention can be used for a variety of purposes, including but not limited to analytical methods (e.g., nucleic acid analysis such as sequencing and hybridization assays; protein analysis, binding assays, screening assays, etc.). Flow cells can also be used to perform syntheses, e.g., to generate combinatorial libraries.

Figure 22 shows a schematic diagram of another automated sequencing system of the invention. The flow cell is mounted on a temperature-controlled automated stage (similar to that described in Example 9) and is connected to a fluid handling system, such as a syringe pump fitted with a multi-port valve. The stage accommodates multiple flow cells, so that one flow cell can be imaged while other steps, such as extension, ligation, and cleavage, are carried out on another. This approach makes maximal use of the expensive optical system while increasing throughput.
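
The benefit of imaging one flow cell while chemistry runs on another can be estimated with a simple steady-state model. The sketch below is illustrative only: the function names are hypothetical, the step durations are made-up inputs, and the model assumes every flow cell has the same chemistry and imaging time and that only imaging needs the shared optics.

```python
import math


def optics_utilization(chemistry_min, imaging_min, n_flow_cells):
    """Fraction of time the shared optics spend imaging, in steady state.

    Each flow cell alternates between chemistry (extension, ligation,
    cleavage) and imaging; only imaging occupies the optics.
    """
    if chemistry_min < 0 or imaging_min <= 0 or n_flow_cells < 1:
        raise ValueError("durations must be positive and n_flow_cells >= 1")
    cycle_min = chemistry_min + imaging_min
    return min(1.0, n_flow_cells * imaging_min / cycle_min)


def flow_cells_to_saturate(chemistry_min, imaging_min):
    """Smallest number of flow cells that keeps the optics continuously busy."""
    return math.ceil((chemistry_min + imaging_min) / imaging_min)
```

Under these assumptions, with 60 minutes of chemistry and 30 minutes of imaging per cycle (hypothetical values), a single flow cell leaves the optics idle two thirds of the time, whereas three flow cells keep them in continuous use.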

The fluid lines are fitted with optical and/or conductivity sensors to detect air bubbles and to monitor reagent usage. Temperature control and sensors in the fluidics system ensure that reagents are held at a temperature suitable for long-term stability but are brought to working temperature as they enter the flow cell, which avoids temperature fluctuations during the annealing, ligation, and cleavage steps. Reagents are preferably pre-packaged in kits to prevent loading errors.

The optics include four cameras, each of which takes one image through one of four filter sets. To reduce photobleaching, the illumination optics can be engineered to illuminate only the region being imaged, which prevents repeated illumination at the edges of the field of view. The imaging optics can be built from standard infinity-corrected microscope objectives and standard beam splitters and filters. Images can be captured with standard 2,000 × 2,000 pixel CCD cameras. The system includes suitable mechanical supports for the optics. Illumination intensity is preferably monitored and recorded for use by the analysis software.

To acquire many images rapidly (e.g., about 1800 or more non-overlapping image fields in one representative embodiment), the system preferably employs a fast autofocus system. Autofocus systems based on analysis of the image itself are well known in the art. They generally require at least 5 frames per focusing event. Because the extra frames require additional illumination (which increases photobleaching), this approach is both slow and costly. In certain embodiments of the invention a different kind of autofocus system is used, e.g., a system based on independent optics that can focus as fast as the mechanical system can respond. Such systems are known in the art and include, for example, the focusing systems used in consumer CD players, which maintain sub-micron focus in real time while the CD is playing.
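
The cost of image-based autofocus can be made concrete with a rough count of exposures. This is a back-of-the-envelope sketch, not a description of the actual instrument: it assumes one focusing event per field, one data exposure per field, and equal illumination per frame, and the function names are hypothetical.

```python
def exposures_per_pass(fields, data_exposures_per_field, focus_frames_per_field):
    """Total number of exposures needed to image every field once."""
    return fields * (data_exposures_per_field + focus_frames_per_field)


def illumination_ratio(data_exposures_per_field, focus_frames_per_field):
    """Illumination dose relative to imaging with no image-based focus frames."""
    total = data_exposures_per_field + focus_frames_per_field
    return total / data_exposures_per_field


# Figures from the text: about 1800 fields, at least 5 frames per focus event.
with_image_autofocus = exposures_per_pass(1800, 1, 5)
with_independent_autofocus = exposures_per_pass(1800, 1, 0)
```

Under these assumptions, image-based autofocus adds 9,000 focus frames per pass and exposes each field to roughly six times the illumination, which is the motivation for an autofocus system that uses independent optics.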

In certain embodiments of the invention the system is operated remotely. Scripts that implement particular protocols can be stored in a central database and downloaded for each sequencing run. Samples can be barcoded to maintain the integrity of sample tracking and to associate samples with the final data. Central, real-time monitoring allows rapid resolution of process errors. In certain embodiments, images collected by the instrument are uploaded immediately to a central multi-terabyte storage system and one or more banks of processors. Using tracking data from the central database, the processors analyze the images and generate sequence data and, optionally, process metrics such as background fluorescence level and bead density, e.g., to track instrument performance.
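
A rough estimate of raw image volume shows why multi-terabyte central storage is called for. The sketch below combines figures stated in this section (about 1800 fields, four cameras, 2,000 × 2,000 pixel images) with two assumptions that are not from the source: 2 bytes per pixel and no compression. The function name and the 25-pass run length are hypothetical.

```python
def raw_bytes_per_imaging_pass(fields, cameras, width_px, height_px, bytes_per_pixel):
    """Uncompressed image data produced by imaging every field once."""
    return fields * cameras * width_px * height_px * bytes_per_pixel


# 1800 fields, 4 cameras, 2000 x 2000 pixels (from the text);
# 2 bytes per pixel is an assumption (e.g., 16-bit grayscale).
per_pass = raw_bytes_per_imaging_pass(1800, 4, 2000, 2000, 2)
per_pass_gb = per_pass / 1e9

# Hypothetical run length: 25 imaging passes on one slide.
per_run_tb = 25 * per_pass / 1e12
```

Under these assumptions, one imaging pass produces about 57.6 GB and a 25-pass run about 1.44 TB of raw data, consistent with uploading images to central storage rather than keeping them on the instrument.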

Control software is used to sequence the pumps, stage, cameras, filters, and temperature controller appropriately, and to annotate and store the image data. A user interface is provided, e.g., to assist the operator in setting up and maintaining the instrument; it preferably includes functions for positioning the stage for loading and unloading slides and for priming the fluid lines. Display functions may be included, e.g., to show the operator various run parameters such as temperatures, stage position, current filter configuration, and the state of the running protocol. An interface to a database that records tracking data, such as reagent lot numbers and sample IDs, is preferably included.

K. Image and Data Processing Methods

The invention provides a variety of image and data processing methods that are implemented at least in part as computer code (i.e., software) stored on a computer-readable medium. Further details are set forth in Examples 9 and 10. In addition, Sequencing Methods A and B generally employ suitable computer software to perform processing steps such as keeping track of the data gathered in multiple sequencing reactions, assembling those data, generating candidate sequences, and performing sequence comparisons.

L. Computer-Readable Media Storing Sequence Information

In addition, the invention provides computer-readable media that store information generated by applying the sequencing methods of the invention. The information includes raw data (i.e., data that have not been further processed or analyzed), processed or analyzed data, and so on. Data include images, numbers, etc. The information may be stored in a database, i.e., a collection of information (e.g., data) typically arranged for ease of retrieval, for example in computer memory. The information includes, for example, sequences and any information about the sequences, such as partial sequences, comparisons of a sequence with a reference sequence, results of sequence analysis, genomic information such as polymorphism information (e.g., whether a particular template contains a polymorphism) or mutation information, linkage information (i.e., information about the physical location of one nucleic acid sequence relative to another in a chromosome), and disease association information (i.e., information correlating the presence of or susceptibility to a disease with a physical trait of a subject, such as an allele of the subject). The information may be associated with a sample ID, a subject ID, etc. Other information about the sample, subject, etc. may be included, including but not limited to the source of the sample, the processing steps performed on the sample, interpretations of the information, and characteristics of the sample or subject. The invention also includes a method comprising receiving any of the foregoing information in computer-readable form (e.g., stored on a computer-readable medium). The method may further include a step of providing diagnostic, prognostic, or predictive information based on the information, or a step of simply providing the information, preferably stored on a computer-readable medium, to a third party.

The following examples are provided for purposes of illustration and do not limit the invention.

Example 1: Efficient Cleavage and Ligation of Phosphorothiolated Oligonucleotides

This example describes experiments demonstrating efficient ligation and cleavage of extension oligonucleotides containing 3'-S phosphorothiolate linkages.

Materials and Methods

Ligation Sequencing Method

Template preparation: To evaluate the feasibility of sequencing by cycles of oligonucleotide ligation and cleavage, and to explore the effect of varying certain aspects of the method, two sets of model bead-based template populations were prepared. In a preferred implementation, as described in the Examples, the cycles of oligonucleotide ligation and cleavage extend the strand in the 3'→5' direction. Accordingly, to evaluate ligation efficiency, the model templates were attached to beads at their 5' ends and designed with identical binding regions at their 3' ends. One set consisted of short (70 bp) oligonucleotides attached via a dual biotin moiety to streptavidin-coated magnetic beads (1 micron). Each of these short template populations was designed with an identical primer binding region (40 bp) at its 3' end and a unique sequence region (30 bp). The short oligonucleotide template populations are referred to as Ligation Sequencing Templates 1-7 (LST1-7).

A second set of bead-based template populations was designed from long PCR-generated DNA fragments (232 bp), which were produced by inserting a 183-bp spacer sequence (from a human p53 exon) into each template population. The templates were amplified with a forward primer bearing a dual biotin and a reverse primer containing the same unique 30-base 3'-end sequence as the corresponding short template population. Single-stranded templates were generated by melting off one strand with a buffer containing sodium hydroxide. These long template populations were designed to mimic the species that would be generated from the short-fragment paired-end libraries described in the co-pending patent application, and are referred to as long-LST1-7.

Primer hybridization: 2.5 μL of 100 μM FAM-labeled primer was premixed with 100 μL of 1× Klenow buffer. This solution was added to a 30 μL aliquot of template-bearing magnetic beads (10⁶/μL) from which the buffer had been removed, and the resulting suspension was mixed thoroughly. After template/primer hybridization (65 °C for 2 minutes, 40 °C for 2 minutes, and 2 minutes on ice), the primer/buffer was removed, and the beads were washed 3× with Wash 1E buffer and resuspended in 300 μL (10⁶/mL) of TENT buffer (10 mM Tris, 2 mM EDTA, 30 mM NaOAc, and 0.01% Triton X-100).

Ligation 1: 2.5 × 10⁶ LST7 beads hybridized with LigSeq-FAM were then incubated at 37 °C for 30 minutes in a mixture containing 1 μL of 100 μM LST7-1 nonamer, 4 μL of 5× T4 ligase buffer (Invitrogen), 14 μL of H₂O, and 1 μL of T4 ligase (1 u/μL, Invitrogen).
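
As a quick arithmetic check on the Ligation 1 mixture, the final concentrations follow from the stated volumes. The sketch below assumes the bead pellet contributes negligible volume, so that the reaction volume is simply the sum of the four components; the helper name is hypothetical.

```python
def final_concentration(stock_concentration, stock_volume_ul, total_volume_ul):
    """Dilution formula C_final = C_stock * V_stock / V_total."""
    return stock_concentration * stock_volume_ul / total_volume_ul


# Component volumes (uL) as listed for Ligation 1.
volumes_ul = {"nonamer": 1.0, "ligase_buffer": 4.0, "water": 14.0, "ligase": 1.0}
total_ul = sum(volumes_ul.values())

nonamer_um = final_concentration(100.0, volumes_ul["nonamer"], total_ul)      # uM
buffer_x = final_concentration(5.0, volumes_ul["ligase_buffer"], total_ul)    # fold (x)
ligase_u_per_ul = final_concentration(1.0, volumes_ul["ligase"], total_ul)    # u/uL
```

Under that assumption the 20 μL reaction contains the nonamer at 5 μM, ligase buffer at 1×, and ligase at 0.05 u/μL.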

Cleavage 1: The beads were then washed 3 times with 100 μL of LSWash1 (1× TE, 30 mM sodium acetate, 0.01% Triton X-100); a 10 μL aliquot of this suspension was removed and stored for analysis. The beads were then washed once with 100 μL of 30 mM sodium acetate. 50 μL of 50 mM AgNO₃ was added, and the resulting mixture was incubated at 37 °C for 20 minutes. The AgNO₃ was removed and the beads were washed once with 100 μL of 30 mM sodium acetate. The beads were then washed 3 times with 100 μL of LSWash1 and resuspended in 90 μL of Wash (TENT) buffer; a 10 μL aliquot of this suspension was removed and stored for analysis.

Ligation 2: After removal of the TENT buffer, the beads were resuspended in 14 μL of H₂O and incubated at 37 °C for 30 minutes with a mixture containing 1 μL of 100 μM LST7-5 nonamer, 4 μL of 5× T4 ligase buffer (Invitrogen), and 1 μL of T4 ligase (1 u/μL, Invitrogen).

Cleavage 2: The beads were washed 3 times with 100 μL of LSWash1 (1× TE, 30 mM sodium acetate, 0.01% Triton X-100) and resuspended in 45 μL of Wash1E. A 15 μL aliquot of this suspension was removed and stored for analysis. The beads were then washed once with 100 μL of 30 mM sodium acetate and resuspended in 5 μL of 20 mM sodium acetate. 50 μL of 50 mM AgNO₃ was added to the beads, and the mixture was incubated at 37 °C for 20 minutes. After removal of the AgNO₃, the beads were washed once with 100 μL of 30 mM sodium acetate. The beads were then washed 3 times with 100 μL of LSWash1 and resuspended in 30 μL of Wash1E. A 20 μL aliquot of this suspension was removed and stored for analysis.

Results

The experiment is best understood with reference to Figure 8. The upper portion of Figure 8 shows an overview of the experimental procedure. An initializing oligonucleotide (primer) is hybridized to a template (labeled LST7) attached to a bead via a biotin linkage. The initializing oligonucleotide contains a 5' phosphate and is fluorescently labeled with FAM at its 3' end. Two 9-mer (nonamer) oligonucleotide probes (the first and second cleavable oligonucleotides) were synthesized, each containing an internal phosphorothiolated thymidine base (sT, underlined). The first cleavable probe was ligated to the extendable terminus of the primer with T4 DNA ligase and then cleaved with silver nitrate. Cleavage removes the terminal 5 nucleotides of the extension probe and generates an extendable terminus on the portion of the probe that remains ligated to the primer. The second cleavable probe was then ligated to the extendable terminus and cleaved in the same way.

The ligation and cleavage steps were monitored with a fluorescent capillary electrophoresis gel-shift assay. In this assay the primer is hybridized to the template strand so that its 5' phosphate is available as a ligation substrate for the incoming oligonucleotide probe (the fluorophore serves as the reporter for mobility-based capillary gel electrophoresis). After each step, an aliquot of beads was removed for analysis. After ligation of an oligonucleotide probe, the magnetic beads were collected with a magnet, the ligation product formed by joining primer and probe was released from the template beads by heat denaturation, and fluorescent capillary electrophoresis was performed on an automated DNA sequencing instrument (ABI 3730) with a labeled size standard (lissamine ladder; size range 15-120 nucleotides; shown as the set of orange peaks in the chromatograms, see Figure 8). In a typical gel shift the possible peaks include (i) a primer peak (primer that was not extended), (ii) an adenylation peak (resulting from the action of DNA ligase, which attaches an adenosine residue to the 5' end at a non-productive ligation junction; see Figure 8F for the mechanism, and also Lehman, I.R., Science, 186:790-797, 1974), and (iii) a completion peak (resulting from ligation of the oligonucleotide probe). One advantage of using the gel-shift assay to evaluate ligation efficiency is that the area under each peak is directly related to the concentration of the corresponding species.
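
Because peak area tracks the concentration of each species, ligation completion can be estimated as the share of total peak area found in the completion peak. The following is a minimal sketch assuming strict proportionality between area and concentration and equal fluorescence per molecule for all three species; the function name and the example areas are hypothetical.

```python
def ligation_fractions(primer_area, adenylated_area, completed_area):
    """Return the fraction of labeled primer in each state.

    Areas are integrated peak areas (arbitrary units) for the unextended
    primer, the adenylated intermediate, and the ligated product.
    """
    total = primer_area + adenylated_area + completed_area
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {
        "unextended": primer_area / total,
        "adenylated": adenylated_area / total,
        "completed": completed_area / total,
    }


# Hypothetical areas, for illustration only.
example = ligation_fractions(primer_area=50.0, adenylated_area=150.0, completed_area=800.0)
```

With these made-up areas the estimate is 80% completion, with 15% of the primer trapped as adenylated intermediate.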

Figure 8A shows a control ligation performed with T4 DNA ligase and a perfectly matched probe containing only phosphodiester linkages (Figure 8A, left). The orange peaks represent the size markers. The blue peak on the left indicates the position of the primer in the absence of ligation. Ligation of the perfectly matched probe results in a shift to the left (arrow). Figure 8B shows a ligation performed under the same conditions with a probe containing an internal thiolated T base (Figure 8B, left). The same shift as for the control probe was observed (arrow). The bead-bound template population bearing the ligated phosphorothiolated probe was then incubated with silver nitrate to induce probe cleavage. Gel-shift analysis showed a left-shifted 4-bp cleavage product, confirming efficient cleavage (Figure 8C). The left side of Figure 8C shows the expected cleavage product. The cleaved bead-based template population was then subjected to a second round of ligation, which was shown to be productive by the appearance of a right-shifted 13-bp extension product (Figure 8D). The left side of Figure 8D shows the expected cleavage product. A second round of cleavage confirmed that multiple efficient cleavage steps can be accomplished, as shown by the expected left-shifted 8-bp cleavage product (Figure 8E).

These results demonstrate successful ligation and cleavage of probes containing phosphorothiolate linkages.

Clearly, ligation did not proceed to 100% completion in these experiments, although a higher degree of completion was observed in other experiments with T4 DNA ligase (see below). While it is certainly desirable for ligation to go to completion, it is not a requirement. For example, unligated 5' ends can be effectively "capped" by 5'-phosphatase treatment after the ligation step described above. In that case, however, the number of successive ligations that can be performed may be limited by depletion of ligatable molecules. For a given number of successive ligations, read length depends on the length of probe remaining after each ligation/cleavage cycle and on the number of sequencing reactions that can be performed on a given template, each followed by primer removal and hybridization of a primer that binds to a different portion of the primer binding site (also referred to as the number of "restarts"). This favors the use of longer probes with the cleavable linkage close to the 5' end of the probe. In our experiments, hexamer probes produced more unligatable adenylated product than octamer and longer probes. Octamer and longer probes ligated essentially to completion (see below). In addition, adding a fluorescent moiety to the 5' end of a hexamer probe appeared to reduce ligation efficiency, whereas adding a fluorescent moiety to an octamer probe had little or no effect. For these reasons, the use of octamer or longer probes is considered preferable.
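
The depletion effect described above can be illustrated with a simple geometric model. This is a sketch under stated assumptions, not a result from the experiments: it assumes a constant per-cycle ligation efficiency, that every unligated end is capped and permanently lost, and that cleavage is quantitative. The function names are hypothetical.

```python
def extendable_fraction(efficiency, cycles):
    """Fraction of strands still extendable after a number of ligation cycles."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return efficiency ** cycles


def max_cycles(efficiency, min_fraction):
    """Largest number of cycles that leaves at least min_fraction extendable."""
    if not 0.0 < efficiency < 1.0 or not 0.0 < min_fraction <= 1.0:
        raise ValueError("efficiency must be in (0, 1) and min_fraction in (0, 1]")
    cycles = 0
    fraction = 1.0
    while fraction * efficiency >= min_fraction:
        fraction *= efficiency
        cycles += 1
    return cycles
```

Under this model, 99% efficiency per cycle allows 68 cycles before fewer than half of the strands remain extendable, whereas 90% efficiency allows only 6, which illustrates why near-complete ligation matters for long runs of successive ligations.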

Other experiments (described below) have demonstrated ligation and cleavage of probes containing phosphorothiolate linkages and nucleotides of reduced degeneracy; 3'-end specificity and selectivity of ligated extension probes; ligation and cleavage in gels; successive cycles of primer hybridization and removal with only a small loss of signal; 100% fidelity of 3'→5' extension with T4 or Taq ligase; and 4-color spectral resolution of ligated extension probes. An automated system for performing the method was constructed.

Example 2: Efficient Cleavage and Ligation of Phosphorothiolated Oligonucleotides Containing Nucleotides of Reduced Degeneracy

Another consideration regarding probe length, however, is the fidelity of the extended oligonucleotide and its effect on the efficiency of subsequent ligations. The fidelity of T4 DNA ligase has been shown to fall off rapidly beyond the 5th base from the junction (Luo et al., Nucleic Acids Res., 24:3071-3078 and 3079-3085, 1996). If a mismatch is introduced on the 5' side of a newly ligated junction, ligation efficiency may be reduced through depletion; however, this does not cause dephasing or increased background signal, which are major obstacles in polymerase-based sequencing by synthesis.

Preferably, the probe set should be able to hybridize to any DNA sequence, so that uncharacterized DNA can be sequenced de novo. However, the complexity of a labeled probe set increases exponentially with probe length and with the number of 4-fold degenerate bases. Moreover, complex probe sets are harder to synthesize, and harder to purify, while maintaining substantially equal representation of all probe species. Higher concentrations of the probe mixture are also needed to keep the concentration of each individual species constant. One way to address this complexity is to use nucleotides that incorporate a universal base, such as deoxyinosine, in place of 4-fold degenerate bases at certain positions.
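
The exponential growth in probe-set complexity, and the saving from universal bases, can be shown with a short calculation. In this sketch a probe design is written as a string in which `N` marks a 4-fold degenerate position and `I` a universal base (inosine); the example design strings and function names are hypothetical and are not the probe designs tested in the examples.

```python
def probe_set_complexity(design):
    """Number of distinct species in a probe pool.

    Each 'N' (4-fold degenerate) position multiplies the pool size by 4;
    'I' (universal base) and fixed bases contribute a single species.
    """
    return 4 ** design.upper().count("N")


def per_species_concentration(total_concentration, design):
    """Concentration of each species, assuming equal representation."""
    return total_concentration / probe_set_complexity(design)


fully_degenerate = probe_set_complexity("NNNNNNNN")   # octamer, all positions degenerate
with_inosine = probe_set_complexity("NNNNNIII")       # three positions replaced by inosine
```

Replacing three degenerate positions with inosine in this hypothetical octamer reduces the pool from 65,536 species to 1,024, so each species is 64 times more concentrated at the same total probe concentration.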

Twelve octanucleotide probes were designed with 4-fold degenerate bases (N; equimolar amounts of A, C, G, and T) and the universal base inosine (I) at various positions within the octamer. (In B-DNA, inosine can form bidentate hydrogen bonds with any of the four canonical bases; the order of stability of inosine base pairs is I:C > I:A > I:T ≈ I:G.) One purpose of evaluating these probe designs was to determine how low the octamer complexity could be made, in the presence of inosine bases, while still supporting efficient ligation.

In a preliminary study, several oligonucleotide probes were ligated to a bead-based template (long-LST1) with T4 DNA ligase. After ligation, the fluorophore-labeled primer (3' FAM primer) is shifted to the right, and the shifted amount is proportional to the amount of oligonucleotide probe ligated. Probe design NI8-9 showed the highest level of completion, with >99% of the primer population shifted to the right as a result of efficient probe ligation (see Figure 9). These reactions were performed at 25 °C; when the reaction temperature was raised to 37 °C, ligation was slightly less efficient and completion was more variable.

Closer examination of these data showed that probes with fewer inosine bases within the first five nucleotides on the 3' side of the junction (underlined) had higher ligation efficiency. To investigate this further and to evaluate the possible effect of sequence context on ligation efficiency, four oligonucleotide probe designs having only one inosine residue in the first five bases on the 3' side of the junction were screened on all templates. Figure 10 shows gel-shift assays of the selected probe compositions on multiple templates with T4 DNA ligase to evaluate ligation completion. The data from these preliminary experiments show that ligation efficiency and completion are variable, and are sequence dependent when inosine residues occur in the first five 3' positions at the junction (underlined). With oligonucleotide probe design NI8-9, however, efficient ligation of the octamer was observed consistently, as evidenced by >99% completion on all templates tested.

While not wishing to be bound by any theory, these data (including the presence of an adenylated intermediate) support the conclusion that unfavorable inosine base pairs in the core DNA-binding site of T4 DNA ligase destabilize the DNA-protein complex sufficiently to reduce enzyme binding and subsequent ligation. An interesting question, however, is whether such destabilizing inosine base pairs affect the fidelity of the ligated oligonucleotide probes.

Example 3: Fidelity of Probe Ligation

Bacterial NAD-dependent ligases such as Taq DNA ligase have been reported to have high sequence fidelity at the junction: mismatches on the 3' side give essentially no nick-closing activity, whereas mismatches on the 5' side are tolerated to some degree (Luo et al., Nucleic Acids Res., 24:3071-3078 and 3079-3085, 1996). T4 DNA ligase, on the other hand, has been reported to be somewhat less stringent, allowing mismatches on both the 3' and 5' sides of the junction. It was therefore of interest to evaluate the fidelity of probe ligation with T4 DNA ligase compared with Taq DNA ligase in our system.

Using standard ABI sequencing technology, we developed two methods to assess the sequence fidelity of ligated oligonucleotides. The first method is to clone and sequence the ligation products. In this method, the ligated extension products are ligated to adapter sequences, cloned, and transformed into bacteria. Individual colonies are picked and sequenced to quantitatively assess the frequency of mismatches at each position of the junction. The second method is to sequence the ligation products directly. In this method, single-stranded ligation products are denatured from the bead-based templates and sequenced directly with a complementary primer. Positions of low accuracy show multiple overlapping peaks in the resulting sequence trace, which gives a qualitative assessment of sequence fidelity at that position.

The first method was used to assess the relative fidelity of probe ligation with T4 and Taq DNA ligases. A single bead-based template population (LST1) was hybridized to a universal sequencing primer used as the initializing oligonucleotide. A solution-based ligation reaction was then performed at 37°C for 30 minutes in the presence of a degenerate oligonucleotide probe (N7A, 3'ANNNNNNN5', 2000 pmol) using T4 DNA ligase (15 U/1x10⁶ beads) or Taq DNA ligase (60 U/1x10⁶ beads) (Figure 11, panel A). Ligation products were cloned and sequenced to assess the positional fidelity of each DNA ligase on the 3' side of the junction (positions 1-8) (Figure 11, panels B and C). The results indicate that T4 DNA ligase and Taq DNA ligase have essentially the same fidelity at the first 5 positions, but that T4 DNA ligase has lower fidelity at positions 6-8. These results were further confirmed by subsequent cloning experiments that evaluated the DNA sequence at the junction for three degenerate inosine-containing probe designs (3'-NNNNNIII-5', 3'-NNNNNINI-5' and 3'-NNNINNNI-5') on all seven templates (LST1-7). That study confirmed that T4 DNA ligase has low sequence fidelity at junction positions 6-8 but high fidelity at the first 5 positions on all templates tested (data not shown).

The direct sequencing method was used to evaluate the fidelity of T4 DNA ligase with degenerate inosine-containing probes. Oligonucleotide probes were evaluated at 25°C and 37°C in ligation reactions containing T4 DNA ligase and bead-based templates. Probe ligation efficiency was assessed by gel shift assay (Figure 12, panel A). The ligation reactions were sequenced directly on an ABI3730xl DNA Analyzer to evaluate the fidelity of T4 DNA ligase in oligonucleotide probe ligation (Figure 12, panel B). Ligation of an exact-match oligonucleotide probe and of two representative degenerate inosine-containing oligonucleotide probes (NI8-9 and NI8-11) reached >99% completion with a very low frequency of mismatches (no multiple peaks in the sequencing traces). The data indicate that efficiently ligated probes also have high sequence fidelity.

In other experiments, a single bead-based template population (LST1) was hybridized to a universal sequencing primer containing a 5' phosphate, used as the initializing oligonucleotide. Solution-based ligation reactions were performed at 37°C for 30 minutes with T4 DNA ligase (1 U/250,000 beads) in the presence of a degenerate inosine-containing oligonucleotide probe (3'NNNNNiii5', 3'NNNNNiNi5' or 3'NNNiNNNi5', 600 pmol). The ligation products were cloned, and colonies were picked and sequenced. Sequence fidelity was determined by counting the number of clones representing each position of the junction. The results are tabulated in Figures 12C-F. These studies demonstrate that ligation of degenerate inosine-containing probes in the 3'→5' direction with T4 DNA ligase has a high level of fidelity at positions 1-5.

Example 4: Ligation and Cleavage in a Gel

As described above, initial experiments to explore, develop and optimize methods for oligonucleotide ligation cycles were performed with bead-based templates in solution. In a second set of experiments, ligation and cleavage were performed on bead-based templates embedded in a polyacrylamide gel on a slide.

Slides were prepared by mixing several million beads, each bearing a clonal population of single-stranded DNA templates, with 5% polyacrylamide on the slide, where polymerization took place. A Teflon mask was used to contain the bead-containing polyacrylamide solution. Figure 14 (upper panel) shows a fluorescence image of part of a slide on which beads, bearing templates hybridized to a Cy3-labeled primer, are immobilized in a polyacrylamide gel. (This slide was used in a different experiment but is representative of the slides used here.) Figure 14 (lower panel) shows a schematic of a slide fitted with a Teflon mask to contain the polyacrylamide solution.

Reagents are introduced to the slide either by manually pipetting the appropriate solutions onto it or by placing the slide in an automated laminar flow cell. Preliminary studies demonstrated that efficient in-gel ligation can indeed be performed on templates attached to beads immobilized in the polyacrylamide matrix of such slides. In the experiment shown in Figure 15, single-stranded DNA template beads were immobilized on a slide in a gel containing acrylamide and DATD. After polymerization, a 3' fluorophore-labeled, 5'-phosphorylated universal primer (sequencing primer) was diffused into the gel and allowed to hybridize (panel A). The slide was washed to remove unbound sequencing primer, combined with a ligation mix containing T4 DNA ligase (10 U) and oligonucleotide probe, and incubated at 37°C for 30 minutes. The slide was then incubated in a buffer containing sodium periodate (0.1 M) to digest the acrylamide polymer and release the bead-based template population. Ligation products were released by heat denaturation of the template strands, collected, and analyzed by the gel shift assay described above. A ligation reaction performed in the gel in the absence of T4 DNA ligase showed a single peak representing unligated sequencing primer (panel B). A ligation reaction performed with an octamer probe in the presence of T4 DNA ligase showed efficient in-gel oligonucleotide ligation, with >99% of the bead-based template population ligated (panel C).

Example 5: Four-Color Detection

To maximize detection efficiency, it is desirable to use a set of oligonucleotide probes carrying distinguishable labels corresponding to each possible base addition product. This approach was modeled on an automated sequencing instrument equipped with appropriate excitation and emission filters, as shown in Figure 15. Three sets of octamer probes were designed to address probe specificity and selectivity. The first set comprises four octamers that are complementary to four unique template populations and carry different 3' bases and 5' dye labels. The second set comprises seven unique octamers with unique 3' bases and 5' dyes. The third set corresponds to four degenerate inosine-containing octamer probe designs, each with a unique 3' terminal base identified by a different 5' dye label.

To validate the four-color spectral identities, probe set #1 was used to detect four unique template populations (see Figure 16). A slide was prepared containing four unique single-stranded template populations attached to beads embedded in polyacrylamide (panel A). Each bead carries a clonal template population. A universal sequencing primer containing a 5' phosphate was hybridized in situ, and a ligation reaction was performed with an oligonucleotide probe mix containing four unique fluorophore-labeled probes (Cy5, CAL 610, CAL 560, FAM; 100 pmol each) and T4 DNA ligase (10 U/slide). The slide was incubated at 37°C for 30 minutes and washed to remove unbound probe. The slide was imaged under bright light to produce a white-light base image (panel B), and fluorescence excitation was performed with four bandpass filters (FITC, Cy3, Texas Red and Cy5). Fluorescence images were captured before and after ligation. Individual populations were false-colored (panel C), and the image values for the different spectral identities were plotted to verify minimal signal overlap (panel D).

Example 6: Demonstration of Ligation Specificity and Selectivity in a Gel

To verify 3' end specificity, probe set #2 was used to probe a single template population (see Figure 17). A slide was prepared with beads bearing one template population (LST1.T) embedded in a polyacrylamide gel, and a universal sequencing primer was hybridized in situ (panel A). An in-gel ligation reaction was performed with T4 DNA ligase (10 U/slide) and an oligonucleotide probe mix consisting of four 5' end-labeled probes that differ only in a single 3' base. The slide was incubated at 37°C for 30 minutes and washed to remove the unbound probe population. The slide was imaged under white light to produce a base image (panel B), and fluorescence excitation was performed with four bandpass filters (FITC, Cy3, Texas Red and Cy5). Fluorescence images captured before and after ligation confirmed the presence of a single FAM-based probe population (blue spots) after in-gel ligation with T4 DNA ligase, with no spectral overlap (panels C and D). These data show that the probe specificity of T4 DNA ligase is stringent and depends on the first 3' base at the junction.

To further confirm 3' end specificity and selectivity, probe set #2 was used to identify a mixture of bead-based template populations that differ by a single base and are present in different amounts. A slide was prepared with a mixture of beads, each bearing one of four template populations that carry different single nucleotide polymorphisms (LST1; A, G, C or T), as shown in Figure 18A. The beads were embedded in a polyacrylamide gel on the slide. The bead-based template populations were used at a range of frequencies, as shown in panel D. A universal sequencing primer was hybridized to the slide in situ. An in-gel ligation reaction was performed with T4 DNA ligase (10 U/slide) and an oligonucleotide probe mix containing equimolar amounts (100 pmol each) of four 5' end-labeled probes that differ only in a single 3' base. The slide was incubated at 37°C for 30 minutes and washed to remove the unbound probe population. The slide was imaged under white light to produce a base image (panel B), and fluorescence excitation was performed with four bandpass filters (FITC, Cy3, Texas Red and Cy5). The individual probe images were overlaid and false-colored (panel C). The fluorescence images were counted with bead-calling software. The results, shown in panel D, demonstrate that the observed ligation frequencies (Obs) correlate with the expected frequencies (Exp). The data show high probe specificity and probe selectivity after ligation in the presence of multiple templates, and confirm the ability to detect, by ligation, single nucleotide polymorphisms (SNPs), i.e., changes of a single nucleotide base in a segment of genomic DNA among different individuals in a population.

Example 7: Demonstration of Ligation Specificity and Selectivity in a Gel Using Four-Color Degenerate Inosine-Containing Extension Probes

Another set of experiments was performed with probe set #3 to evaluate the specificity and selectivity of probe ligation using a pool of four-color degenerate inosine-containing oligonucleotide probes. The results are shown in Figure 19. Bead-based slides were prepared as described above, except that four unique single-stranded template populations were present on the beads in different amounts; a universal sequencing primer was then hybridized in situ (panel A). An in-gel ligation reaction was performed in the presence of T4 DNA ligase (10 U/slide) with a probe pool consisting of octamers designed with five degenerate bases (N; complexity 4⁵ = 1024), two universal bases (I, inosine) and one known nucleotide at the 3' end, the known nucleotide corresponding to a specific 5' fluorophore (G-Cy5, A-CAL 610, T-CAL 560, A-FAM; 600 pmol each). The slide was incubated at 37°C for 30 minutes and washed to remove the unbound probe population. The slide was imaged under white light to produce a base image (panel B), and fluorescence excitation was performed with four bandpass filters (FITC, Cy3, Texas Red and Cy5). The individual probe images were overlaid and false-colored (panel C). The fluorescence images were counted with bead-calling software and the frequency of each ligation product was tabulated (panel D); spectral scatter plots of the unprocessed raw data and of filtered data representing the top 90% of bead signal values are shown in panel E. The data demonstrate that the observed ligation frequencies (Obs) correlate with the expected frequencies (Exp) based on the known concentration of each template. This validates that degenerate, universal-base-containing probe pools can be used with T4 DNA ligase to provide specific and selective in-gel ligation.

Example 8: Demonstration of Repeated Cycles of Hybridization and Removal of Initializing Oligonucleotides in a Gel

Experiments on templates immobilized in a gel on a microscope slide mounted in an automated flow cell (see below) demonstrated that multiple cycles of annealing and stripping of an initializing oligonucleotide can be applied, with minimal signal loss, to templates attached to beads embedded in a gel on a slide. A 44-base fluorescently labeled initializing oligonucleotide was used. As shown in Figure 20, minimal signal loss occurred over 10 cycles. The initializing oligonucleotide is referred to as the primer in Figure 20. As noted above, a major drawback of polymerase-based sequencing-by-synthesis methods is the tendency for positive and negative dephasing to occur on individual template strands. Positive dephasing occurs when a nucleotide is misincorporated into a growing strand, so that the sequence read from that strand runs ahead of the sequence obtained from the remaining templates and is out of phase by n+1 base calls. The more common negative dephasing occurs when a strand is not fully extended, resulting in background base calls that lag behind the growing strands (n-1). The ability to strip extension products efficiently and to "restart" the template by hybridizing an initializing oligonucleotide at a different position makes very long read lengths possible with little or no signal loss.

Example 9: Automated Sequencing System

This example describes a representative automated sequencing system of the invention that can be used to collect sequence information from one or more templates. Preferably, the templates are located on a substantially planar substrate such as a microscope slide. For example, the templates may be attached to beads arrayed on the substrate. A photograph of the system is shown in Figure 21. The system is based on an Olympus epifluorescence microscope body (mounted on its side) fitted with an automated, auto-focusing stage and a CCD camera. Four filter cubes in a rotating holder allow four-color detection at different excitation and emission wavelengths. A flow cell with a Peltier temperature controller is mounted on the stage; it can be opened and closed to accept a substrate such as a slide (with a gasket that seals around the edge of the region containing the semi-solid support, such as a gel). The vertical orientation of the flow cell is an important aspect of the system of the invention, since it allows air bubbles to escape from the top of the flow cell. The flow cell can be filled completely with air to expel all reagents before each wash step. The flow cell is connected to a fluid handler fitted with two 9-port Cavro syringe pumps, which can deliver four distinguishably labeled probe mixes, cleavage reagent, any other desired reagents, enzyme equilibration buffer, wash buffer and air to the flow cell through a single port. Operation of the system is fully automated and programmable through control software running on a dedicated computer with multiple I/O ports. The Cooke Sensicam camera has a 1.3-megapixel cooled CCD, although cameras with lower or higher sensitivity may also be used (e.g., 4 megapixel, 8 megapixel, etc.). The flow cell uses a 0.25 micron stage with an overall dimension of 1 micron.

Example 10: Image Acquisition and Processing Methods

This example describes representative methods for acquiring and processing images of arrays of beads bearing labeled nucleic acids. Accurate feature identification and alignment are important for reliable analysis of each acquired image. Features are identified by first discarding all but the highest-intensity pixels of each bead. The pixel values of a given image are histogrammed; pixels corresponding to background are discarded and the remaining pixel values are sorted. In a uniform image in which all beads have essentially the same intensity, the algorithm removes the bottom 80-90% of pixel values. Pixels with values in the top 10-20% are then scanned to identify those that are local maxima within a 4-pixel radius. The mean intensity of that region and the mean intensity of its perimeter are then recorded. These values form a normal distribution, and pixels whose values fall outside the distribution are removed. The percentage of pixels initially ignored, the size of the circular region, and the cutoff for eliminating candidate beads from the normal distribution are all parameterized and can be changed if desired. Alignment is accomplished by building a feature matrix for each image in the set to be aligned. The resulting matrices are then searched for the most frequent x,y coordinate offset to identify the optimal alignment.
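The alignment step described above (searching for the most frequent x,y offset between feature sets) can be sketched roughly as follows. This is a minimal illustration and not the implementation used in the system: the function name `most_frequent_offset`, the `max_shift` search window, and the example coordinates are all assumptions introduced here.

```python
from collections import Counter


def most_frequent_offset(features_a, features_b, max_shift=5):
    """Return the (dx, dy) offset shared by the most feature pairs.

    features_a, features_b: iterables of integer (x, y) bead coordinates.
    max_shift: hypothetical search window in pixels (not from the text).
    """
    votes = Counter()
    for xa, ya in features_a:
        for xb, yb in features_b:
            dx, dy = xb - xa, yb - ya
            # Only offsets inside the search window are allowed to vote.
            if abs(dx) <= max_shift and abs(dy) <= max_shift:
                votes[(dx, dy)] += 1
    if not votes:
        return None
    offset, _count = votes.most_common(1)[0]
    return offset


# Hypothetical feature lists: the second image is the first shifted by (+2, -1),
# plus one spurious feature that has no true partner.
reference = [(10, 10), (20, 35), (42, 17), (60, 60)]
shifted = [(x + 2, y - 1) for x, y in reference] + [(5, 5)]
best = most_frequent_offset(reference, shifted)
```

Because every true feature pair votes for the same offset while chance pairings scatter their votes, the mode is robust to a modest number of spurious or missing features.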

Bead images were collected in the Cy5 channel (corresponding to the sequencing primer) before addition of extension probes. These images were used to build a feature map of label location coordinates and raw signal intensity, in relative fluorescence units (RFU), for each bead. For each subsequent duplex extension, image sets were acquired before and after addition of Cy3-labeled nucleotides. These images were aligned to the original Cy5 image, and RFU values were assigned to each bead and recorded. Baseline correction was performed by subtracting, for each base addition, the intensity of the unlabeled image (before extension) from that of the labeled image (after fluorophore addition). These baseline-subtracted values were then normalized to the intensity found for each feature in the Cy5 image, forming the basis for calling a bead extended or not (i.e., a bead is considered extended if the duplexes attached to it are extended). With these methods, thousands of features on each of about 1,300 images per slide can be analyzed, so that 5-100 million template species are analyzed in each experimental run. The algorithms are designed so that they can later be ported readily from MATLAB to C++ for greater efficiency.
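The baseline correction and Cy5 normalization just described might look like the sketch below. The function names, the example RFU values, and in particular the `threshold` used to call a bead extended are hypothetical; the text does not give a cutoff.

```python
def normalized_extension_signal(pre_rfu, post_rfu, cy5_rfu):
    """Baseline-subtract a bead's labeled signal and normalize it to its Cy5 signal."""
    if cy5_rfu <= 0:
        raise ValueError("Cy5 reference intensity must be positive")
    return (post_rfu - pre_rfu) / cy5_rfu


def is_extended(pre_rfu, post_rfu, cy5_rfu, threshold=0.5):
    """Call a bead extended when its normalized signal reaches the cutoff.

    threshold is an assumed value chosen for illustration only.
    """
    return normalized_extension_signal(pre_rfu, post_rfu, cy5_rfu) >= threshold


# Hypothetical per-bead readings: (before extension, after extension, Cy5 reference)
beads = {
    "bead_1": (120.0, 920.0, 1000.0),
    "bead_2": (110.0, 160.0, 1000.0),
}
calls = {name: is_extended(*rfu) for name, rfu in beads.items()}
```

Dividing by the Cy5 value compensates for bead-to-bead differences in template loading, so a single cutoff can reasonably be applied across features.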

Example 11: Bead Alignment and Tracking and Sequence Decoding

This example describes representative methods for processing images of arrays of beads bearing labeled nucleic acids and for determining sequence from the resulting data.

Image analysis begins by convolving the image with a zero-integral circular top-hat kernel whose diameter matches the bead size. This automatically normalizes the background to zero while allowing the centers of individual beads to be identified as local maxima. The maxima are located, and those that are isolated from other local maxima are used as alignment points. Alignment points are computed for each image in the time series. For each pair of images, the alignment points are compared and a displacement vector is computed from the mean displacement of all alignment points common to both. This yields pairwise image displacements with sub-pixel resolution.
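One way to build a zero-integral circular top-hat kernel is sketched below, using a positive disk the size of a bead surrounded by a negative ring whose weights cancel the disk. The `ring_width` parameter, the function names, and the test image are assumptions for illustration; the text specifies only that the kernel is circular, integrates to zero, and matches the bead diameter.

```python
def top_hat_kernel(bead_radius, ring_width=2):
    """Zero-integral circular top-hat: +1 inside the disk, negative in the ring."""
    outer = bead_radius + ring_width
    size = 2 * outer + 1
    disk, ring = [], []
    for y in range(size):
        for x in range(size):
            r2 = (x - outer) ** 2 + (y - outer) ** 2
            if r2 <= bead_radius ** 2:
                disk.append((y, x))
            elif r2 <= outer ** 2:
                ring.append((y, x))
    kernel = [[0.0] * size for _ in range(size)]
    for y, x in disk:
        kernel[y][x] = 1.0
    for y, x in ring:
        # Ring weights are chosen so the whole kernel sums to zero.
        kernel[y][x] = -len(disk) / len(ring)
    return kernel


def filter_at(image, kernel, cy, cx):
    """Kernel response centered on (cy, cx); the caller keeps the window in bounds.

    The kernel is symmetric, so correlation and convolution give the same value.
    """
    half = len(kernel) // 2
    return sum(
        kernel[ky][kx] * image[cy + ky - half][cx + kx - half]
        for ky in range(len(kernel))
        for kx in range(len(kernel))
    )


kernel = top_hat_kernel(bead_radius=2)

# Flat background of 7 counts: the zero-integral kernel should return ~0.
flat = [[7.0] * 15 for _ in range(15)]
flat_response = filter_at(flat, kernel, 7, 7)

# Same background with one bright bead of radius 2 centered on (7, 7).
bead = [row[:] for row in flat]
for y in range(15):
    for x in range(15):
        if (x - 7) ** 2 + (y - 7) ** 2 <= 4:
            bead[y][x] += 100.0
center_response = filter_at(bead, kernel, 7, 7)
off_center_response = filter_at(bead, kernel, 7, 8)
```

The response is close to zero on uniform background and peaks at the bead center, which is why local maxima of the filtered image can serve as bead centers.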

For N images there are N*(N-1)/2 pairwise displacements, but only N-1 of them are independent, because the rest can be computed from the independent set. For example, measuring the displacement between images 1 and 2 and between images 1 and 3 implies the displacement between images 2 and 3. If the measured displacement between images 2 and 3 differs from the implied displacement, the measurements are inconsistent. The magnitude of this inconsistency can be used as a measure of how well the alignment algorithm is performing. Our preliminary tests show that the inconsistency is typically less than 0.1 pixel in each direction (see Figure 23).
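The consistency check above reduces to simple vector arithmetic, shown here as a small sketch. The function name `closure_error` and the displacement values are made up for illustration.

```python
def closure_error(d12, d13, d23):
    """Mismatch between the measured 2->3 shift and the one implied by 1->2 and 1->3."""
    implied_d23 = (d13[0] - d12[0], d13[1] - d12[1])
    return (d23[0] - implied_d23[0], d23[1] - implied_d23[1])


n_images = 4
n_pairs = n_images * (n_images - 1) // 2   # all pairwise displacements
n_independent = n_images - 1               # the remainder are implied by these

# Hypothetical measured displacements in pixels, as (dx, dy).
err = closure_error(d12=(1.25, -0.50), d13=(2.00, 0.25), d23=(0.80, 0.70))
within_tolerance = all(abs(component) < 0.1 for component in err)
```

With these example values the implied shift between images 2 and 3 is (0.75, 0.75), so the inconsistency is about 0.05 pixel in each direction.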

Once the image time series has been aligned, there are two ways to track individual beads. If the bead density is low and most beads do not touch other beads, the optical centroid of each bead can be identified and the bead intensity computed by integrating over the region around the bead. If the bead density is so high that most beads touch one another, individual beads cannot be identified by a surrounding band of dark background. However, once all images have been aligned to sub-pixel resolution, pixels belonging to the same bead can be identified by computing the correlation of neighboring pixels over time. Highly correlated pixel pairs can be reliably assigned to the same bead. A similar technique was applied to lane tracking in DNA sequencing gels with good results (Blanchard, A.P., Sequence-specific effects on the incorporation of dideoxynucleotides by a modified T7 polymerase, California Institute of Technology, 1993). Once a bead has been tracked through the entire four-color time series, the sequence can be decoded from knowledge of which color corresponds to which 3' terminal base of the probe oligonucleotide.
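The idea of grouping pixels by their correlation over time can be illustrated with a plain Pearson correlation, as in the sketch below. The choice of Pearson correlation, the `min_corr` cutoff, and the intensity series are assumptions; the text says only that highly correlated neighboring pixels are assigned to the same bead.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length intensity time series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_a * var_b)


def same_bead(series_a, series_b, min_corr=0.9):
    """Assign two neighboring pixels to one bead when they co-vary strongly.

    min_corr is a hypothetical cutoff, not a value from the text.
    """
    return pearson(series_a, series_b) >= min_corr


# Hypothetical intensities of three neighboring pixels across six images.
pixel_a = [900, 120, 110, 880, 130, 905]
pixel_b = [450, 70, 60, 430, 65, 460]     # dimmer, but rises and falls with pixel_a
pixel_c = [100, 870, 115, 120, 910, 95]   # bright in different images
```

Here `same_bead(pixel_a, pixel_b)` is true even though pixel_b is only half as bright, because correlation depends on the pattern over time and not on absolute intensity.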

Example 11: Throughput Calculations

In general, the throughput of a sequencing system is determined largely by the number of images the instrument can generate per day and the number of nucleotides (bases) of sequence data per image. Because the instrument is preferably designed to keep the camera busy at all times, the calculations assume 100% camera utilization. In embodiments in which each bead is imaged in four colors to determine the identity of one base, one may use four images taken by one camera, two images taken by each of two cameras, or one image taken by each of four cameras. Imaging with four cameras greatly increases throughput relative to the other options, and a preferred system uses this approach.

Our preliminary tests show that a pixel density of 50 pixels per bead (representing 5.4 square microns) provides a suitable density for standard image analysis. With a 4-megapixel CCD camera (now commonplace), ~80,000 beads can be captured in a single CCD frame (based on our existing image data). Capturing four images with the different cameras and moving to the next field of view on the flow cell takes no more than 1.5 seconds. If 75% of the beads yield useful information, we would collect about 80,000 beads × 0.75 / 1.5 s = 40,000 bases/second of raw sequence data.

An important issue in maintaining 100% camera utilization is matching the time taken to perform one cycle of ligation/cleavage chemistry to the time required to image the entire flow cell. A reasonable estimate of the time for a cycle of extension, cleavage and ligation is 1½ hours (5,400 seconds). Those 5,400 seconds would accommodate 1,800 image fields, or an area of about 15 mm × 45 mm, which is a suitable size for a flow cell. A conservative estimate of the throughput of a system using four cameras and a 15 mm × 45 mm flow cell is 40,000 bases per second. Based on the throughput we achieve with ABI3730xl sequencers, 28 runs per day with read lengths of about 650 bases (20 bases/second), this is equivalent to about 2,000 ABI3730xl sequencers. Increasing the bead density 2.5-fold, to 200,000 beads per image, raises the overall throughput to 100,000 bases/second, roughly equivalent to 5,000 ABI3730xl instruments. At this level of throughput the total output is about 8.6 Gb per day, so the time required to complete a 12X human genome sequence is ~4.2 days.
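The throughput figures in this example can be reproduced with a few lines of arithmetic. The sketch below uses the values given in the text; the haploid genome size of about 3 Gb is an assumption added here, chosen because it reproduces the stated ~4.2 days.

```python
# Values taken from the text
beads_per_image = 80_000
useful_fraction = 0.75
seconds_per_field = 1.5          # capture four images and move to the next field
cycle_seconds = 1.5 * 3600       # one extension/cleavage/ligation cycle
fields_per_cycle = 1_800
abi_bases_per_second = 20        # one ABI3730xl, as stated in the text

# Assumption not stated in the text
human_genome_gb = 3.0            # approximate haploid genome size in Gb

bases_per_second = beads_per_image * useful_fraction / seconds_per_field
abi_equivalents = bases_per_second / abi_bases_per_second

# Time budget per field implied by the cycle time and the number of fields
seconds_available_per_field = cycle_seconds / fields_per_cycle

# 2.5-fold higher bead density: 200,000 beads per image
dense_bases_per_second = bases_per_second * (200_000 / beads_per_image)
dense_abi_equivalents = dense_bases_per_second / abi_bases_per_second
gb_per_day = dense_bases_per_second * 86_400 / 1e9
days_for_12x_genome = 12 * human_genome_gb / gb_per_day

print(f"{bases_per_second:,.0f} bases/s, about {abi_equivalents:,.0f} ABI3730xl")
print(f"{gb_per_day:.2f} Gb/day, 12X genome in {days_for_12x_genome:.1f} days")
```

Note that the cycle time and field count leave 3 seconds per field, which comfortably covers the stated imaging time of no more than 1.5 seconds.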

It should be noted that the inventive sequencing methods described herein may be practiced with a variety of different sequencing systems, image capture and processing methods, and the like. See, e.g., U.S. Patents 6,406,848 and 6,654,505 and PCT Publication No. WO98053300 for details.

Example 12: Preparation of Microparticles on Which Templates Are Synthesized

This example describes the preparation of microparticles (magnetic beads in this example) bearing an attached amplification primer, for amplifying a template (e.g., by PCR) to generate a clonal population of template molecules attached to each microparticle. In general, an amplification bead carries one of the primers required for the clonal PCR reaction. This primer can be covalently coupled to the bead surface or, for example, bound via a biotin label to streptavidin on the bead surface. The beads can be used in standard PCR reactions (e.g., in microtiter plate wells, tubes, etc.), in emulsion PCR reactions as described in Example 13, and so on, to obtain beads bearing clonal populations of template molecules.

Materials

1x TE: 10 mM Tris (pH 8), 1 mM EDTA

1x PCR buffer (ThermoPol buffer, NEB):

20 mM Tris-HCl (pH 8.8)

10 mM KCl

10 mM (NH₄)₂SO₄

2 mM MgSO₄

0.1% Triton X-100

1 M betaine (added to 1x PCR-B buffer only)

1x Binding and Washing (B/W) buffer:

5 mM Tris-HCl (pH 7.5)

0.5 mM EDTA

1 M NaCl

DNA capture primer (20-mer, 500 μM stock)

Dual biotin-(HEG)5-P1: 5'-dual biotin-(HEG)5-CTA AGG TAG CGA CTG TCC TA-3'

(HEG)5 = hexaethylene glycol linker containing an 18-carbon spacer, one of many different spacer moieties that could be used. A spacer is included, e.g., to lift the P1 primer portion of the oligonucleotide off the bead surface. Such a spacer moiety can be incorporated into any of the primers described herein.

Dynal stock magnetic beads (1 μm diameter) = 10 mg/ml (7-12 × 10⁶ beads/μl).

Method

1. Remove 50 μl of beads (~450 × 10⁶ beads).

2. Add 200 μl 1x TE buffer and mix well. Separate with a magnet.

3. Wash once with 200 μl 1x TE buffer. Separate with a magnet.

4. Resuspend in 100 μl B/W buffer.

5. Add 3 μl P1 oligonucleotide (500 μM stock = 1,500 pmol).

6. Rotate at room temperature for >30 minutes.

7. Wash 3 times with 200 μl 1x TE buffer.

8. Resuspend in 50 μl (the starting volume) of 1x TE buffer.

9. Store the DNA capture beads at 4°C or on ice until use. Use the beads within 1 week (beads stored for more than 1 week tend to clump).
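The quantities in steps 1 and 5 follow from the stock concentrations listed under Materials. The sketch below is an illustrative check only. The function names are hypothetical, and the per-bead figure is the number of primer molecules offered to each bead, not the number actually bound, which the text does not state.

```python
AVOGADRO = 6.022e23  # molecules per mole


def beads_in_volume(volume_ul, beads_per_ul):
    """Number of beads in a given volume of bead stock."""
    return volume_ul * beads_per_ul


def pmol_from_stock(volume_ul, stock_um):
    """Picomoles delivered, using 1 uM = 1 pmol/ul."""
    return volume_ul * stock_um


# Step 1: 50 ul of Dynal stock at 7-12 x 10^6 beads/ul
low_beads = beads_in_volume(50, 7e6)
high_beads = beads_in_volume(50, 12e6)

# Step 5: 3 ul of 500 uM P1 oligonucleotide
primer_pmol = pmol_from_stock(3, 500)

# Primer molecules offered per bead at the ~450 x 10^6 beads quoted in step 1
primer_per_bead = primer_pmol * 1e-12 * AVOGADRO / 450e6

print(f"beads in 50 ul: {low_beads:.2e} to {high_beads:.2e}")
print(f"P1 added: {primer_pmol:.0f} pmol")
print(f"P1 molecules offered per bead: {primer_per_bead:.1e}")
```

The ~450 × 10⁶ beads quoted in step 1 sits inside the range implied by the stock specification.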

Example 13: Method for Performing PCR on Microparticles in an Emulsion

This example describes a method that can be used to perform PCR on microparticles in an emulsion, producing microparticles with clonal templates attached. The microparticles (referred to as DNA beads in the nomenclature used below) are first functionalized with a first primer (P1). The second primer (P2) is present in the aqueous phase in which the PCR reaction takes place. If desired, the aqueous phase can also contain P1 at a low concentration, e.g., 20-fold lower. This allows template to build up rapidly in the aqueous phase, where it serves as substrate for continued amplification. As P1 in solution is depleted, the reaction is forced to use the P1 attached to the microparticle. P1_P2degen10 is an oligonucleotide template (100 bp) that has sequences to which P1 and P2 hybridize for PCR amplification, together with a stretch of about 10 degenerate bases (incorporated during oligonucleotide synthesis) that gives the oligonucleotide population a complexity of 4¹⁰.

I. Emulsion protocol (1 μm beads)

1. Prepare the oil phase:

Span 80 (7%)

Tween 80 (0.4%)

Prepared in light mineral oil

Use only freshly prepared oil phase

Total oil phase = 450 μl

2. Prepare the aqueous phase (estimated to yield 2 × 10⁹ droplets of 115 fL each):

Reagent (stock)             μl/reaction   Final
dH₂O                        156.0         -
MgCl₂ buffer (10X)          32.0          1X
dNTP (100 mM each)          11.3          3.5 mM each
MgCl₂ (1 M)                 7.3           23 mM
Betaine (5 M)               32.0          0.5 M
P1 (primer 1) (10 μM)       1.6           11.25 pmol
P2 (primer 2) (200 μM)      40.0          5625 pmol
P1_P2 degen10 (100 pM)      6.6           5.9 × 10⁷/μl
DNA beads (8M/μl)           25.0          150M/emulsion
Platinum Taq (5 U/μl)       9.0           0.28 U/μl

Total aqueous phase volume = 320 μl

Final reaction = 255 μl aqueous phase : 450 μl oil phase
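The droplet estimate and the bead loading per droplet can be checked from the volumes above. The sketch below is illustrative only. The function names are hypothetical, and random (Poisson) loading of beads into uniformly sized droplets is an assumption that the text does not make.

```python
import math

# Reagent volumes (ul) from the aqueous-phase table, in table order
VOLUMES_UL = [156.0, 32.0, 11.3, 7.3, 32.0, 1.6, 40.0, 6.6, 25.0, 9.0]


def droplet_count(aqueous_ul, droplet_fl):
    """Number of droplets of a given size, using 1 ul = 1e9 fL."""
    return aqueous_ul * 1e9 / droplet_fl


def poisson_pmf(k, mean):
    """Probability of exactly k beads in a droplet under Poisson loading."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


total_aqueous_ul = sum(VOLUMES_UL)        # should be close to the stated 320 ul
n_droplets = droplet_count(255, 115)      # 255 ul emulsified as 115 fL droplets
beads_per_droplet = 150e6 / n_droplets    # 150M beads per emulsion (from the table)

p_empty = poisson_pmf(0, beads_per_droplet)
p_single = poisson_pmf(1, beads_per_droplet)
p_multi = 1.0 - p_empty - p_single

print(f"aqueous total: {total_aqueous_ul:.1f} ul")
print(f"droplets: {n_droplets:.2e}")
print(f"mean beads per droplet: {beads_per_droplet:.3f}")
print(f"P(1 bead) = {p_single:.3f}, P(2 or more beads) = {p_multi:.4f}")
```

Under this loading model most droplets are empty and multi-bead droplets are rare, which is consistent with the protocol's later microscopy check for single-bead droplets.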

3. Transfer the aqueous-phase tube to ice until it is added to the emulsion.

4. Add 450 μl of the oil phase to a 2 ml cryovial.

5. Place the cryovial upright in the foam holder attached to an IKA vortexer. Set the vortexer to 2500 rpm.

6. Add the aqueous phase in aliquots (3 aliquots of 85 μl each = 255 μl) to the vortexing oil phase. Dispense the aqueous phase into the agitated 2 ml cryovial by inserting the pipette tip into the tube and slowly releasing the aqueous phase from the tip into the vortexing oil phase. Repeat the addition twice with the remaining aqueous phase.

7. Continue vortexing the emulsion at 2500 rpm for 24 minutes.

8. Transfer ~100 μl aliquots of the emulsion into a 96-well plate (4 wells in total). At the same time, add an aliquot of the remaining aqueous phase (65 μl) to a separate well as a solution-based PCR control reaction. Seal the plate and cycle as described in the next section.

II. Emulsion amplification (1 μm beads)

1. PCR cycling parameters for 1 μm bead emulsions (primer Tm = 62°C):

Program: DTB-PCR

94°C, 2 minutes; n = 1

94°C, 15 seconds; 57°C, 30 seconds; 70°C, 60 seconds; n = 100

55°C, 5 minutes; n = 1

10°C, hold

2. Cycling takes about 6 hours.
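The cycling program can be written down as data, which makes the programmed hold time easy to compute. This is a minimal sketch with hypothetical names. It counts hold times only; ramping between temperatures depends on the instrument and is not modeled, which presumably accounts for the difference from the roughly 6 hours quoted above.

```python
# DTB-PCR program as (repeats, [(temperature in C, hold in seconds), ...]).
# The final 10 C hold is open-ended and therefore omitted.
DTB_PCR = [
    (1, [(94, 120)]),
    (100, [(94, 15), (57, 30), (70, 60)]),
    (1, [(55, 300)]),
]


def hold_seconds(program):
    """Total programmed hold time in seconds, excluding ramp time."""
    return sum(
        repeats * sum(seconds for _, seconds in steps)
        for repeats, steps in program
    )


total = hold_seconds(DTB_PCR)
print(f"programmed hold time: {total} s ({total / 3600:.2f} h)")
```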

3. Inspect the emulsion after cycling. A successful emulsion appears uniformly amber, with no separate aqueous phase visible. A "broken" emulsion (one that has come out of solution) shows a distinct aqueous phase at the bottom of the tube. Avoid collecting this phase, because the bead population in it is not clonal.

4. Evaluate the cycled emulsion by bright-field microscopy. Remove a 2 μl aliquot of the cycled emulsion and spot it onto a glass slide. Cover the emulsion sample with a 22 × 60 mm coverslip.

5. View the emulsion with a 20X objective. Preferably, the beads are monodisperse, with most droplets containing a single bead.

Note: If the emulsion sample contains many multi-bead droplets, pool the emulsion reaction into a 1.5 ml Eppendorf tube and centrifuge at 6000 rpm for 15 seconds. Remove the bead suspension that collects at the bottom of the tube. This population consists of free beads and multi-bead droplets, which are heavier than single-bead droplets and therefore settle to the bottom of the tube after a brief spin. This bead population is not clonal and should therefore be removed before further processing. Repeat steps 4 and 5 to re-evaluate the emulsion and confirm the integrity of the single-bead droplets in the emulsion sample.

6. Break the emulsion as described in the next section.

III. Emulsion breaking and strand melting (1 μm beads)

Bead Breaking Wash (BBW) buffer

2% Triton X-100; 2% Tween 20; 10 mM EDTA

Melting solution: 100 mM NaOH

1x TE: 10 mM Tris (pH 8), 1 mM EDTA

1x Binding and Washing (B/W) buffer

5 mM Tris-HCl (pH 7.5)

0.5 mM EDTA

1 M NaCl

1. Pool each emulsion set (4 aliquots) into one 1.5 ml Eppendorf tube.

2. Add 800 μl BBW buffer. Break the emulsion by vortexing the reaction tube for 10 seconds.

3. Centrifuge at 8000 rpm for 2 minutes.

4. Remove the top 800 μl (mostly oil phase). The DNA beads will pellet at the bottom of the tube.

5. Add 800 μl BBW, vortex, and centrifuge at 8000 rpm for 2 minutes. Remove the top 600 μl.

6. Wash twice more with 600 μl 1x TE, using a magnet to exchange each wash.

8. Add 50 μl melting solution to the bead pellet and resuspend the sample by vigorous pipetting. Incubate the beads in melting solution for 5 minutes at room temperature, flicking the tube intermittently.

9. Place the tube on a magnet to remove the melting solution. Wash once with 100 μl melting solution to ensure complete removal of the second strand.

10. Wash the bead pellet twice with 1x TE, then resuspend in 20 μl TE buffer and store at 4°C, or resuspend in 20 μl 1x B/W buffer if the next step is enrichment. If the beads aggregate, switch to 1x PCR-B buffer.

11. Continue with the enrichment method (optional).

Example 14: Method for Enriching Microparticles Bearing Clonal Template Populations

This example describes a method for enriching microparticles on which template amplification has succeeded, e.g., in a PCR emulsion. The method uses larger microparticles with a capture oligonucleotide attached. The capture oligonucleotide comprises a nucleotide region complementary to a nucleotide region present in the template.

I. Emulsion enrichment (1 μm)

A. Preparation of enrichment beads (capture entities)

Enrichment beads:

Spherotech streptavidin-coated polystyrene beads (~6.5 μm)

Bead stock (0.5% w/v): 33,125 beads/μl

Per protocol: (33,125 beads/μl)(800 μl) = 26.5 × 10⁶ beads

Application:

119 million beads per emulsion; estimated emulsion clonality (2%): ~3M template-positive beads per emulsion. Adding 2-3 enrichment beads per expected template-positive emulsion bead = 10 million enrichment beads per emulsion reaction.
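The bead bookkeeping above can be checked with a short calculation. The sketch below is illustrative only and uses hypothetical names. It uses the unrounded 2% figure, so the results come out somewhat below the rounded "~3M" and "10 million" values quoted in the text.

```python
def template_positive_beads(beads_per_emulsion, clonality_fraction):
    """Expected number of template-positive beads in one emulsion."""
    return beads_per_emulsion * clonality_fraction


def enrichment_beads_needed(positive_beads, beads_per_positive):
    """Enrichment beads required at a given ratio per positive bead."""
    return positive_beads * beads_per_positive


stock_beads = 33_125 * 800                    # beads in one 800 ul preparation
positives = template_positive_beads(119e6, 0.02)
needed_low = enrichment_beads_needed(positives, 2)
needed_high = enrichment_beads_needed(positives, 3)

print(f"enrichment beads per preparation: {stock_beads:,}")
print(f"template-positive beads per emulsion: {positives:,.0f}")
print(f"enrichment beads needed: {needed_low:,.0f} to {needed_high:,.0f}")
```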

Enrichment oligonucleotide (capture agent):

P2-enrichment (35-mer, Tm = 73°C)

5'-dual biotin-18-carbon spacer-ttaggaccgttatagttaggtgatgcattaccctg-3'

(or)

P2-enrichment (e.g., up to 35-mer, Tm = 52°C)

5'-dual biotin-18-carbon spacer-ggtgatgcattaccctg-3'

Glycerol solution, 60% (v/v)

6 ml glycerol

4 ml nuclease-free H₂O

1. Remove 800 μl of beads and exchange into B/W buffer by centrifuging at 13,000 rpm for 1 minute. Wash once with 500 μl B/W buffer and resuspend in 100 μl B/W buffer.

2. Add 20 μl enrichment oligonucleotide (500 μM stock = 10,000 pmol/rxn).

3. Rotate the bead reaction at room temperature for 1 hour.

4. Wash the beads 3 times with 500 μl 1x TE buffer, pelleting the beads between washes by centrifuging at 13,000 rpm for 1 minute.

5. Resuspend the beads in 25 μl B/W buffer. Concentration = 1M enrichment beads/μl.

Note: Pooling four enriched emulsion populations in 20-30 μl 1x B/W buffer yields ~40M template-positive beads. Multiple slides can then be run.

B. Enrichment procedure

1. Add 20 μl of enrichment beads to the tube containing the emulsion-derived beads (20 μl). Resuspend the bead mixture by gentle pipetting (or use a ratio of 2-3 enrichment beads per expected template-positive emulsion bead).

2. If using enrichment beads coated with the biotinylated P2-enrichment primer, incubate the bead mixture at 65°C for 2 minutes. Transfer the tube to ice for 10 minutes.

Note: Preliminary experiments suggest that enrichment may be less efficient with enrichment beads carrying a primer sequence that was used in the 100-cycle PCR (e.g., the P2 PCR primer), because such beads can also enrich beads bearing primer-dimers, which are driven onto the beads in template-free droplets. If using enrichment beads loaded with the shorter P2-enrichment primer above, incubate the bead mixture at 50°C for 2 minutes instead, because of this shorter primer's lower Tm.

3. Add the bead mixture to a 1.5 ml Eppendorf tube containing 300 μl of 60% glycerol solution.

4. Centrifuge at 13,000 rpm for 1 minute.

5. After centrifugation, negative beads pellet at the bottom of the tube. Enrichment beads with template beads attached float at the top of the glycerol phase. Collect the bead population from the upper phase and transfer it to a clean 1.5 ml Eppendorf tube.

Note: The beads that pellet at the bottom of the tube (beads without template) can be washed with a magnet and analyzed, using the same wash protocol as described for template-positive beads.

6. Add 1 ml nuclease-free H₂O to the beads collected from the upper phase to dilute the glycerol. Resuspend the bead mixture by gentle pipetting. Centrifuge at 13,000 rpm for 1 minute.

7. After centrifugation, remove the supernatant and wash twice with 100 μl TE.

8. Add 100 μl melting solution to the washed bead pellet. Rotate the tube at room temperature for 5 minutes.

9. Add another 100 μl melting solution and separate the template beads with a magnet.

10. Wash twice with 100 μl TE to remove the non-magnetic enrichment beads, using a magnet to separate the DNA beads from the enrichment beads.

11. Resuspend the template beads in 10-20 μl 1x TE. If the beads aggregate, dilute into 1x PCR-B buffer.

12. The template-bearing beads can be pooled with other enriched populations and applied to slides as described in the next example.

Example 15: Preparation of Microparticle Arrays Immobilized in or on a Semi-Solid Support

This example describes the preparation of slides on which template-bearing microparticles are immobilized (e.g., embedded) in a semi-solid support located on the slide. Such slides may be referred to as polony slides. The semi-solid support used in this example is polyacrylamide. One protocol uses a method that confines polymerase molecules near the template to enhance amplification.

Slide preparation

A. Glass slides: Bind-Silane treatment

Bind-Silane promotes adhesion of the polyacrylamide gel to the coverslip surface. Slides should be pretreated with Bind-Silane shortly before use.

Notes:

** Store the Bind-Silane solution in a chemical fume hood.

** Bind-Silane is an irritant. Work in a chemistry laboratory when preparing the solution.

** Make sure the Bind-Silane stock has not expired.

** Do not touch the slide surface when transferring slides from the rack.

Prepare the Bind-Silane solution:

1. To a 1 L plastic container add:

1 L dH₂O, 1 stir bar

Add 220 μl concentrated acetic acid (to bring the pH to 3.5). Add 4 ml Bind-Silane reagent and mix the solution on a stir plate for >15 minutes.

Treat the slides:

2. Load the slides (all facing the same direction) onto an inverted plastic 384-well plate.

3. Wash the slides with dH₂O; pour off the dH₂O.

4. Wash with 100% ethanol; pour off the ethanol.

5. Wash again with dH₂O, pour off the dH₂O, and place the slides in a tissue culture cabinet with the ventilation and UV lamp running. Allow the washed slides to dry (~30 minutes).

6. Place the plate in a plastic container and cover the slides with the Bind-Silane solution.

7. Let the solution react with the slides for 1 hour. Agitate the container intermittently to ensure that the Bind-Silane coats the glass evenly.

8. After incubation, wash the slides 3 times with dH₂O.

9. Wash once with 100% ethanol; pour off the ethanol.

10. Allow the slides to dry thoroughly before use.

11. Store the Bind-Silane-treated slides in a desiccator.

B. Acrylamide-based slides (small mask)

· Non-capture protocol

Keep all reagents on ice. Add the following pre-chilled reagents to a 1.5 ml Eppendorf tube:

Reagent                                      2 slides (μl)   1 slide (μl)
1x TE                                        13              6.5
Beads (1-3M, diluted in 1x TE)               10              5
Rhinohide                                    1               0.5
40% acrylamide:bisacrylamide (19:1, F/S)     5               2.5
TEMED (5%, prepared in 1x TE)                2               1
APS (0.5%, freshly prepared)                 3               1.5
Total                                        34 μl           17 μl
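The gel mix scales linearly with the number of slides, and the final acrylamide concentration follows from the dilution. The sketch below is illustrative only. The names are hypothetical and the per-slide volumes are taken from the table above.

```python
# Per-slide volumes (ul) from the non-capture gel mix table
RECIPE_PER_SLIDE_UL = {
    "1x TE": 6.5,
    "beads (diluted in 1x TE)": 5.0,
    "Rhinohide": 0.5,
    "40% acrylamide:bisacrylamide": 2.5,
    "5% TEMED": 1.0,
    "0.5% APS": 1.5,
}


def scale_recipe(recipe, n_slides):
    """Scale every component volume by the number of slides."""
    return {name: volume * n_slides for name, volume in recipe.items()}


def final_percent(stock_percent, stock_ul, total_ul):
    """Final concentration (%) of a component after dilution into the mix."""
    return stock_percent * stock_ul / total_ul


one_slide_total = sum(RECIPE_PER_SLIDE_UL.values())
two_slides = scale_recipe(RECIPE_PER_SLIDE_UL, 2)
two_slide_total = sum(two_slides.values())
acrylamide_final = final_percent(40, 2.5, one_slide_total)

print(f"total per slide: {one_slide_total} ul; for 2 slides: {two_slide_total} ul")
print(f"final acrylamide:bisacrylamide: {acrylamide_final:.1f}%")
```

The totals agree with the 17 μl and 34 μl given in the table, and the gel works out to a little under 6% acrylamide.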

Pipette the mixture vigorously to disperse the beads.

Load 17 μl per slide under a coverslip.

Polymerize inverted at room temperature for 60 minutes.

Remove the coverslip with a clean razor blade.

Soak and wash the slides twice in Wash 1E buffer for 15 minutes (to remove unbound beads).

Bead-embedded slides can be stored in Wash 1E at 4°C.

2. Hybridize the fluorophore-labeled sequencing primer to the embedded bead population. Equilibrate the slides from Wash 1E into 1x PCR-B buffer by dipping them briefly in a Coplin jar containing 1x PCR-B buffer.

3. In a 1.5 ml Eppendorf tube, add 1-6 μl primer (100 μM stock) to 99 μl 1x PCR buffer. Apply 100 μl of the primer solution to the acrylamide matrix and cover with a coverslip or sealing gasket.

4. Heat the slides using the <DEVIN> program (65°C for 2 minutes, slow anneal to 30°C) to hybridize the primer to the embedded beads. Wash the slides twice with Wash 1E, 2 minutes each. The slides are ready for ligation-based sequencing.

· Capture protocol

1. Prepare ssDNA template beads at 1M/μl. [Use 4-5M beads per slide to prepare polony slides.]

2. Resuspend the bead mixture in 30 μl 1x PCR buffer.

3. Add 1 μl sequencing primer (100 μM stock); mix well.

4. Heat to 65°C for 2 minutes.

5. Transfer to ice for 5 minutes.

6. Wash 3 times with 80 μl 1x TE.

7. Remove all liquid using a magnet.

8. Add the following reagents:

Reagent                                      2 slides (μl)
1x buffer                                    1.5
10x buffer                                   2.0
High-concentration (HC) enzyme               16.0
40% acrylamide:bisacrylamide (19:1, F/S)     14.4
Rhinohide                                    2.0
TEMED (5%, prepared in 1x TE)                2.0
APS (0.5%, freshly prepared)                 1.5
Total                                        39.4 μl

Pipette the mixture to disperse the beads.

Load 17 μl per slide under a coverslip.

9. Polymerize, preferably inverted, e.g., on an MJ Research Tetrad PCR machine using the <Pol-1> cycling program.

10. Remove the coverslip with a clean razor blade. Soak and wash the slides twice in Wash 1E buffer for 10 minutes (to remove unbound beads).

11. The polony slides are ready for ligation-based sequencing.

12. Bead-embedded polony slides can be stored at 4°C in a gasket in Wash 1E.

Example 16: Method for Preparing Microparticle Arrays Attached to a Solid Support

This example describes the preparation of slides on which template-bearing microparticles are attached to a solid support.

1. Glass slides prepared with polymer tethers bearing active NHS are stored at -20°C.

(Slide H, Product No. 1070936; Schott Nexterion; Schott North America, Inc., Elmsford, NY)

2. Equilibrate the slides to room temperature in the presence of desiccant shortly before use.

3. Wash the slides in 50 ml 1x PBS (300 mM sodium phosphate, pH 8.7) for 5 minutes. Repeat the wash twice.

4. Remove the slides from solution and cover with an adhesive gasket (for sample loading).

5. In a separate tube, add an aliquot of 100-400 million protein-coated or DNA-coated beads to 1x PBS, pH 8.7. The DNA can be, e.g., a DNA template for sequencing. The DNA may include, for example, an amine linker that reacts with NHS.

6. Wash the bead sample 3 times with 1x PBS, pH 8.7, by buffer exchange.

7. Resuspend the beads in 125 ml 1x PBS, pH 8.7.

8. Add the bead solution to the slide gasket so that it coats the slide surface evenly.

9. Enclose the slide in a dark chamber and incubate the reaction at room temperature for 1-2 hours.

10. After incubation, remove the unbound bead solution and transfer the slide to 50 ml 1x TE (10 mM Tris, 1 mM EDTA, pH 8).

11. Wash the slide 5 times in 50 ml 1x TE, 15 minutes per wash with constant stirring.

12. Slides can be stored in 1x TE at 4°C for several weeks.

13. If desired, the bead population can be assessed by white light (WL) bright-field image analysis, or by fluorescence using a complementary DNA oligonucleotide attached to a fluorophore-based dye. The DNA templates can be sequenced using, e.g., ligation-based sequencing.

Figure 33A shows a schematic of a slide with beads attached.

It should be noted that only a small fraction of the DNA template molecules is attached to the slide. One-micron beads (Dynabeads MyOne Streptavidin beads; Dynal Biotech, Inc., Product No. 650.01) were used. However, a variety of beads can be used.

Figure 33B shows populations of beads attached to a slide. The lower panels show the same region of a slide under white light (left) and under fluorescence microscopy. The upper panels show a range of bead densities.

Equivalents and Scope

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. The scope of the invention is not limited to the description above, but is as set forth in the appended claims. In the claims, articles such as "a", "an", and "the" may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include "or" between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process, unless indicated to the contrary or otherwise evident from the context.

Furthermore, it is to be understood that the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, descriptive terms, etc., from one or more of the listed claims are introduced into another claim. In particular, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim.

In addition, it is to be understood that any one or more embodiments may be explicitly excluded from the claims, even if the specific exclusion is not set forth explicitly herein. It should also be understood that where the specification and/or claims disclose a reagent for sequencing (e.g., a template, microsphere, probe, probe family, etc.), such disclosure also encompasses methods of sequencing using that reagent according to the specific methods disclosed herein or other methods known in the art, unless one of ordinary skill in the art would understand otherwise or the specification indicates otherwise. Likewise, where the specification and/or claims disclose a sequencing method, any one or more of the reagents disclosed herein may be used in that method, unless one of ordinary skill in the art would understand otherwise or use of the reagent in such a method is explicitly excluded in the specification. It should also be understood that where a particular component for use in sequencing is disclosed in the specification or claims, the invention also encompasses methods of making that component. The term "component" is used broadly to refer to any item used in sequencing, including templates, microparticles with attached templates, libraries, and the like. Moreover, the figures are an integral part of the specification, and the invention encompasses the structures shown in the figures, such as microparticles with attached templates, and the methods depicted in the figures.

Where ranges are given herein, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and the understanding of one of ordinary skill in the art, values expressed as ranges can assume any specific value or subrange within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.

Claims

1. method of identifying template polynucleotide inner nucleotide sequence said method comprising the steps of:

(a) by the duplex that oligonucleotide probe and initial few nucleic acid are connected to form prolongation described initial oligonucleotide is extended along described template polynucleotide, wherein said oligonucleotide probe comprises thiophosphatephosphorothioate and connects;

(b) one or more Nucleotide of the described polynucleotide of evaluation; With

(c) repeating step (a) and (b) is up to determining nucleotide sequence.

2. the method for claim 1 is characterized in that, described authentication step comprises the mark that detects the oligonucleotide probe that is connected in nearest connection.

3. the method for claim 1 comprises that also cutting described thiophosphatephosphorothioate with the cutting agent that contains the atom that is selected from Ag, Hg, Cu, Mn, Zn or Cd connects the step that produces extensible probe end.

4. method as claimed in claim 3 is characterized in that described cutting agent is AgNO ₃

5. the method for claim 1 is characterized in that, in semi-solid upholder or on carry out described extension step.

6. A method of determining a nucleotide sequence within a template polynucleotide, the method comprising the steps of:

(a) providing a probe-template duplex comprising a probe hybridized to a template polynucleotide, the probe having an extendable terminus;

(b) ligating an extension oligonucleotide probe to the extendable terminus to form an extended duplex containing an extended oligonucleotide probe, wherein the extension probe contains a phosphorothiolate linkage;

(c) identifying, in the extended duplex, at least one nucleotide in the template polynucleotide that is either (1) complementary to the just-ligated extension probe or (2) a nucleotide residue in the template polynucleotide immediately downstream of the extended oligonucleotide probe;

(d) if no extendable terminus is already present, generating an extendable terminus on the extended oligonucleotide probe, such that the generated terminus differs from the terminus to which the extension probe was ligated; and

(e) repeating steps (b), (c), and (d) until the nucleotide sequence within the template polynucleotide is determined.

7. The method of claim 6, wherein each extension probe has a non-extendable moiety at one terminus.

8. The method of claim 6, wherein the identifying step comprises detecting a label attached to the most recently ligated extension probe.

9. The method of claim 6, wherein the identifying step comprises removing the non-extendable moiety and extending the extended oligonucleotide probe with a nucleic acid polymerase in the presence of one or more labeled chain-terminating nucleoside triphosphates.

10. The method of claim 6, further comprising the step of capping an extended oligonucleotide probe whenever no extension probe has ligated to the extendable terminus in the ligating step.

11. The method of claim 6, wherein the generating step comprises cleaving the phosphorothiolate linkage with a cleavage agent containing an atom selected from Ag, Hg, Cu, Mn, Zn, and Cd.

12. The method of claim 11, wherein the cleavage agent is AgNO₃.

13. The method of claim 6, wherein the ligating and generating steps are performed in or on a semi-solid support.

14. The method of claim 6, wherein step (a) comprises providing a plurality of different probe-template duplexes in separate aliquots, each duplex comprising an initializing oligonucleotide probe hybridized to a template polynucleotide, wherein the template polynucleotide in each duplex is the same but the initializing oligonucleotide probe in each duplex binds to a different sequence of the template polynucleotide; and wherein steps (b)-(e) are performed independently on each aliquot.

15. The method of claim 14, wherein, for each aliquot, the extension oligonucleotide probe has a non-extendable moiety at one terminus.

16. The method of claim 15, wherein, for each aliquot, the identifying step comprises detecting a label attached to the most recently ligated extension probe.

17. The method of claim 15, wherein, for each aliquot, the identifying step comprises removing the non-extendable moiety and extending the extended oligonucleotide probe with a nucleic acid polymerase in the presence of one or more labeled chain-terminating nucleoside triphosphates.

18. The method of claim 15, further comprising the step of capping an extended oligonucleotide probe whenever no extension probe has ligated to the extendable terminus in the ligating step.

19. The method of claim 15, wherein the generating step comprises cleaving the phosphorothiolate linkage with a cleavage agent containing an atom selected from Ag, Hg, Cu, Mn, Zn, and Cd.

20. The method of claim 19, wherein the cleavage agent is AgNO₃.

21. The method of claim 15, wherein the ligating and generating steps are performed in or on a semi-solid support.

22. The method of claim 6, further comprising the steps of: (f) removing the ligated probes and the initializing oligonucleotide from the template; (g) repeating step (a) with a second oligonucleotide that binds to a different sequence of the template polynucleotide; and (h) repeating steps (b)-(e).

23. The method of claim 22, wherein the method is repeated multiple times with initializing oligonucleotides that bind to different sequences of the template polynucleotide.
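The repetition recited in claims 22 and 23 can be pictured with a small bookkeeping sketch. It assumes, purely for illustration, that each ligation and cleavage cycle advances the extended strand by a fixed number of template positions (`step`), that one position is read per cycle, and that successive initializing oligonucleotides are offset from one another by one nucleotide. None of these values, and none of the function names, comes from the claims.

```python
def interrogated_positions(offset, step, cycles):
    """Template positions read in one round that starts at `offset` (hypothetical model)."""
    return [offset + cycle * step for cycle in range(cycles)]


def merge_rounds(step, cycles):
    """Map each template position to the (round offset, cycle) that read it."""
    read_by = {}
    for offset in range(step):  # one round per initializing oligonucleotide
        for cycle, pos in enumerate(interrogated_positions(offset, step, cycles)):
            read_by[pos] = (offset, cycle)
    return read_by


if __name__ == "__main__":
    read_by = merge_rounds(step=5, cycles=3)
    # Positions covered once all rounds are combined
    print(sorted(read_by))
```

Under these assumptions a single round reads every fifth position, and the rounds together cover a contiguous stretch of the template.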

24. The method of claim 23, wherein the extension probe has a non-extendable moiety at one terminus.

25. The method of claim 23, wherein, in each repetition, the identifying step comprises detecting a label attached to the most recently ligated extension probe.

26. The method of claim 23, wherein, in each repetition, the identifying step comprises removing the non-extendable moiety and extending the extended oligonucleotide probe with a nucleic acid polymerase in the presence of one or more labeled chain-terminating nucleoside triphosphates.

27. The method of claim 23, further comprising the step of capping an extended oligonucleotide probe whenever no extension probe has ligated to the extendable terminus in the ligating step.

28. The method of claim 23, wherein the generating step comprises cleaving the phosphorothiolate linkage with a cleavage agent containing an atom selected from Ag, Hg, Cu, Mn, Zn, and Cd.

29. The method of claim 28, wherein the cleavage agent is AgNO₃.

30. The method of claim 23, wherein the ligating and generating steps are performed in or on a semi-solid support.

31. The method of claim 22, wherein the removing step comprises contacting the ligated probes, the initializing oligonucleotide, and the template with an aqueous solution containing about 1.0-3.0% SDS, 100-300 mM NaCl, and 5-15 mM sodium bisulfate (NaHSO₄).

32. The method of claim 22, wherein the removing step comprises contacting the ligated probes, the initializing oligonucleotide, and the template with a solution containing about 2% SDS, 200 mM NaCl, and 10 mM sodium bisulfate (NaHSO₄), e.g., a solution of 2% SDS, 200 mM NaCl, and 10 mM sodium bisulfate (NaHSO₄).

33. A method of identifying a nucleotide sequence within a template polynucleotide that is attached to a support at an attachment point, the method comprising the steps of:

(a) extending an initializing oligonucleotide along the template polynucleotide by ligating an oligonucleotide probe to the initializing oligonucleotide to form an extended duplex, wherein extension proceeds along the template toward its point of attachment to the support;

(b) identifying one or more nucleotides of the polynucleotide; and

(c) repeating steps (a) and (b) until the nucleotide sequence is determined.

34. The method of claim 33, wherein the identifying step comprises detecting a label attached to the most recently ligated extension probe.

35. The method of claim 33, wherein the oligonucleotide probe contains a phosphorothiolate linkage, and the generating step comprises cleaving the phosphorothiolate linkage with a cleavage agent containing an atom selected from Ag, Hg, Cu, Mn, Zn, and Cd.

36. The method of claim 35, wherein the cleavage agent is AgNO₃.

37. The method of claim 35, wherein the ligating and generating steps are performed in or on a semi-solid support.

38. A method of determining a nucleotide sequence within a template polynucleotide that is attached to a support at an attachment point, the method comprising the steps of:

(a) providing a probe-template duplex comprising a probe hybridized to a template polynucleotide, the probe having an extendable terminus;

(b) ligating an extension oligonucleotide probe to the extendable terminus to form an extended duplex containing an extended oligonucleotide probe;

39. The method of claim 38, wherein each extension probe has a non-extendable moiety at one terminus.

40. The method of claim 38, wherein the identifying step comprises detecting a label attached to the most recently ligated extension probe.

41. The method of claim 38, wherein the identifying step comprises removing the non-extendable moiety and extending the extended oligonucleotide probe with a nucleic acid polymerase in the presence of one or more labeled chain-terminating nucleoside triphosphates.

42. The method of claim 38, further comprising the step of capping an extended oligonucleotide probe whenever no extension probe has ligated to the extendable terminus in the ligating step.

43. The method of claim 38, wherein the oligonucleotide probe contains a phosphorothiolate linkage, and the generating step comprises cleaving the phosphorothiolate linkage with a cleavage agent containing an atom selected from Ag, Hg, Cu, Mn, Zn, and Cd.

44. The method of claim 43, wherein the cleavage agent is AgNO₃.

45. The method of claim 38, further comprising the steps of: (f) removing the ligated probes and the initializing oligonucleotide from the template; (g) repeating step (a) with a second oligonucleotide that binds to a different sequence of the template polynucleotide; and (h) repeating steps (b)-(e).

46. The method of claim 45, wherein the method is repeated multiple times with initializing oligonucleotides that bind to different sequences of the template polynucleotide.

47. The method of claim 38, wherein the ligating and generating steps are performed in or on a semi-solid support.

48. The method of claim 38, wherein the template is attached to a microparticle that is attached to a substantially planar rigid substrate.

49. A method of identifying a nucleotide sequence within a template polynucleotide, the method comprising the steps of:

(a) providing a template polynucleotide attached to a microparticle that is immobilized in or on a semi-solid support;

(b) extending an initializing oligonucleotide along the template polynucleotide by ligating an oligonucleotide probe to the initializing oligonucleotide to form an extended duplex, wherein the oligonucleotide probe contains a scissile linkage;

(c) identifying one or more nucleotides of the polynucleotide; and

(d) repeating steps (b) and (c) until the nucleotide sequence is determined.

50. The method of claim 49, wherein the extending step is performed on the semi-solid support.

51. The method of claim 49, wherein the template is attached to a microparticle that is attached to a substantially planar rigid substrate.

52. A method of determining a nucleotide sequence within a template polynucleotide, the method comprising the steps of:

(a) providing a probe-template duplex comprising a probe hybridized to a template polynucleotide, the probe having an extendable terminus, the probe-template duplex being attached to a microparticle that is embedded in or on a semi-solid support;

(b) ligating an extension oligonucleotide probe to the extendable terminus to form an extended duplex containing an extended oligonucleotide probe, wherein the extension probe contains a phosphorothiolate linkage;

53. The method of claim 52, wherein the ligating and generating steps are performed in the semi-solid support.

54. The method of claim 52, wherein the template is attached to a microparticle that is attached to a substantially planar rigid substrate.

55. A method of determining a nucleotide sequence within a template polynucleotide, the method comprising the steps of:

(a) amplifying a template polynucleotide molecule in a compartment of an emulsion in the presence of a microparticle, to produce a microparticle having a clonal population of template polynucleotides attached to it;

(b) recovering the microparticle from the emulsion;

(c) embedding the microparticle in or on a semi-solid support;

(d) extending an initializing oligonucleotide along the template polynucleotide by ligating an oligonucleotide probe to the initializing oligonucleotide to form an extended duplex, wherein the oligonucleotide probe contains a scissile linkage;

(e) identifying one or more nucleotides of the polynucleotide; and

(f) repeating steps (d) and (e) until the nucleotide sequence is determined.

56. The method of claim 55, wherein (i) a plurality of template polynucleotide molecules having different sequences are amplified in individual emulsion compartments; (ii) a plurality of microparticles are recovered from the emulsion and embedded in or on the support, each microparticle having a clonal population of template polynucleotides attached to it, wherein the clonal populations have different sequences; and (iii) steps (d), (e), and (f) are performed in parallel on the clonal populations attached to the embedded microparticles, so that a plurality of sequences are determined in parallel.

57. the method for template polynucleotide inner nucleotide sequence information is measured in first set of the oligonucleotide probe family of at least two kinds of distinctive marks of a use, said method comprising the steps of:

(a) by the duplex that oligonucleotide probe and initial oligonucleotide is connected to form prolongation initial oligonucleotide is extended along described template polynucleotide, wherein said oligonucleotide probe is the member of the oligonucleotide probe man family set of described distinctive mark;

(b) detect the mark that is connected with described oligonucleotide; With

(c) repeating step (a) and (b) is up to the ordered list that obtains probe family title; With

(d) adopt the ordered list of probe family title to get rid of one or more possible nucleotide sequences.

58. The method of claim 57, wherein step (d) comprises decoding the ordered list of probe family names to determine the sequence.

59. The method of claim 57, comprising providing a probe-template duplex comprising an initializing oligonucleotide probe hybridized to the template polynucleotide, the probe having an extendable terminus, wherein the extending step comprises ligating an oligonucleotide probe to the extendable terminus to form an extended duplex containing an extended oligonucleotide probe, and further comprising the step of capping any remaining extendable terminus whenever no oligonucleotide probe has ligated to the extendable terminus in the extending step.

60. The method of claim 57, wherein the oligonucleotide probes in each probe family have a non-extendable moiety at one terminus.

61. The method of claim 57, further comprising, after each detecting step: (f) if no extendable terminus is present, generating an extendable terminus on the most recently ligated oligonucleotide probe, such that the generated terminus differs from the terminus at which the most recently ligated oligonucleotide probe was ligated.

62. The method of claim 61, wherein the oligonucleotide probe contains a phosphorothiolate linkage, and the phosphorothiolate linkage is cleaved with a cleavage agent containing an atom selected from Ag, Hg, Cu, Mn, Zn, and Cd, thereby generating the extendable probe terminus.

63. The method of claim 62, wherein the cleavage agent is AgNO₃.

64. The method of claim 57, wherein the extending step is performed in or on a semi-solid support.

65. The method of claim 57, wherein the template is attached to a microparticle that is attached to a substantially planar rigid substrate.

66. The method of claim 57, wherein the collection comprises 2 distinguishably labeled probe families.

67. The method of claim 57, wherein the collection comprises 3 distinguishably labeled probe families.

68. The method of claim 57, wherein the collection comprises 4 distinguishably labeled probe families.

69. The method of claim 57, wherein the collection comprises more than 4 distinguishably labeled probe families.

70. The method of claim 57, wherein the oligonucleotide probes comprise a constrained portion whose nucleosides are not independently selected, and wherein oligonucleotide probes whose constrained portions differ in sequence are assigned to probe families according to an encoding scheme.

71. The method of claim 57, wherein the oligonucleotide probes are assigned to first, second, third, and fourth probe families according to one of the 24 encoding schemes listed in Table 1.

72. The method of claim 58, wherein the identity of at least one nucleotide in the template is known, and wherein the decoding step comprises:

(i) assigning an identity to the nucleotide in the template adjacent to the nucleotide of known identity, by determining which identity is consistent with both the known nucleotide identity and the possible sequences of the constrained portion of the probe whose proximal nucleotide was ligated opposite the nucleotide adjacent to the nucleotide of known identity;

(ii) assigning an identity to a succeeding nucleotide, by determining which identity is consistent with the possible sequences of the constrained portion of the probe whose proximal nucleotide was ligated opposite the succeeding nucleotide; and

(iii) repeating step (ii) until the sequence is determined.
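The decoding of claim 72 can be illustrated with a minimal sketch. It assumes a hypothetical four-family encoding in which the family of a dinucleotide is the bitwise XOR of the indices of its two bases (A=0, C=1, G=2, T=3); this is an illustrative choice and is not asserted to be any of the schemes of Table 1. For simplicity the sketch also treats each family name as describing two adjacent template nucleotides directly, ignoring probe-template complementarity. The function names are hypothetical.

```python
BASES = "ACGT"


def family_of(x, y):
    """Family name (0-3) of dinucleotide xy under the assumed XOR encoding."""
    return BASES.index(x) ^ BASES.index(y)


def encode(seq):
    """Ordered list of family names for successive overlapping dinucleotides."""
    return [family_of(a, b) for a, b in zip(seq, seq[1:])]


def decode(known_first, families):
    """Steps (i)-(iii): walk along the list, fixing one nucleotide at a time."""
    seq = [known_first]
    for fam in families:
        # Only one base is consistent with the previous base and this family.
        seq.append(BASES[BASES.index(seq[-1]) ^ fam])
    return "".join(seq)


if __name__ == "__main__":
    calls = encode("ATGGCA")
    print(calls)               # [3, 1, 0, 3, 1]
    print(decode("A", calls))  # ATGGCA
```

Because each family under this assumed encoding contains exactly one dinucleotide beginning with any given base, one known nucleotide is enough to resolve every later position.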

73. The method of claim 58, further comprising the step of:

(a) determining the identity of a nucleotide in the template, so that the nucleotide has a known identity, wherein the decoding step comprises:

(iii) repeating step (ii) until the sequence is determined.

74. The method of claim 73, wherein the determining step comprises contacting a template-probe duplex with a labeled nucleotide in the presence of a polymerase, under conditions in which the labeled nucleotide is incorporated if it is complementary to the template at the position adjacent to the duplex.

75. The method of claim 58, wherein the decoding step comprises: generating at least one candidate sequence from the ordered list of probe family names; and selecting a candidate sequence as the nucleotide sequence of the template.

76. The method of claim 75, wherein the generating step comprises generating at least 4 candidate sequences.

77. The method of claim 75, wherein the generating step comprises:

(i) assuming an identity for the first nucleotide of the nucleotide sequence;

(ii) assigning an identity to the nucleotide adjacent to the first nucleotide, by determining the possible identities of the adjacent nucleotide from the probe family name corresponding to the first nucleotide;

(iii) assigning an identity to a succeeding nucleotide, by determining the possible identities of the succeeding nucleotide from the probe family name corresponding to the nucleotide whose identity was most recently assigned;

(iv) repeating step (iii) until a candidate sequence is generated; and

(v) repeating steps (i)-(iv), wherein in each repetition the first nucleotide is assumed to have a different identity, until the desired number of candidate sequences has been generated.
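Steps (i)-(v) of claim 77 amount to decoding the same ordered list once per assumed first nucleotide. The sketch below does this under a hypothetical XOR encoding (family = index of first base XOR index of second base, with A=0, C=1, G=2, T=3), which is an assumption made for illustration and not one of the schemes confirmed by the claims; `candidate_sequences` is likewise a made-up name.

```python
BASES = "ACGT"


def candidate_sequences(families):
    """Return one candidate sequence per assumed identity of the first nucleotide."""
    candidates = []
    for first in BASES:                     # step (i), repeated by step (v)
        idx = BASES.index(first)
        seq = [first]
        for fam in families:                # steps (ii)-(iv)
            idx ^= fam                      # base consistent with the previous base and family
            seq.append(BASES[idx])
        candidates.append("".join(seq))
    return candidates


if __name__ == "__main__":
    print(candidate_sequences([3, 1, 0, 3, 1]))
    # ['ATGGCA', 'CGTTAC', 'GCAATG', 'TACCGT']
```

With four possible first nucleotides the list yields exactly four candidates, which matches the "at least 4 candidate sequences" of claim 76 under this assumed encoding.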

78. The method of claim 75, wherein the selecting step comprises comparing at least one candidate sequence with one or more known sequences, and selecting the candidate sequence that has a predetermined degree of identity with, or is closest to, the one or more known sequences.

79. The method of claim 78, wherein the template is derived from an organism of interest, and wherein the comparing step comprises comparing at least one candidate sequence with sequences in a database containing sequences obtained from that organism.

80. The method of claim 78, wherein the comparing step comprises comparing at least one candidate sequence with sequences in a database containing a plurality of comparison sequences, each of which contains a different possible sequence of the polynucleotide being sequenced.
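The selection of claims 78-80 can be sketched as a nearest-match search. The example assumes equal-length, already aligned sequences and uses the fraction of matching positions as the measure of identity; both are simplifying assumptions, and a real comparison against a database would use an alignment tool. The function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_candidate(candidates, known_sequences):
    """Return (candidate, score) for the candidate closest to any known sequence."""
    scored = [(max(identity(c, k) for k in known_sequences), c) for c in candidates]
    score, best = max(scored, key=lambda pair: pair[0])
    return best, score


if __name__ == "__main__":
    candidates = ["ATGGCA", "CGTTAC", "GCAATG", "TACCGT"]
    best, score = select_candidate(candidates, ["ATGGCT"])
    print(best, round(score, 3))  # ATGGCA 0.833
```

A caller wanting the "predetermined degree of identity" variant could compare the returned score against a threshold of its own choosing.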

81. The method of claim 75, wherein the selecting step comprises:

(i) obtaining a second ordered list of probe family names from the template using a second collection of distinguishably labeled encoded probe families, wherein the probes in the second collection of probe families are encoded differently from the probes in the first collection of probe families;

(ii) generating at least one comparison sequence from the second ordered list of probe family names;

(iii) comparing a portion of at least one candidate sequence with a portion of at least one comparison sequence; and

(iv) selecting, as the nucleotide sequence of the template, the candidate sequence that has a predetermined degree of identity with, or is closest to, a comparison sequence across the portion compared in step (iii).

82. The method of claim 81, wherein the compared portion is a dinucleotide.

83. The method of claim 81, wherein the second ordered list of probe family names contains only a single element.

84. The method of claim 57, wherein the oligonucleotide probes in each probe family have the structure 5'-(XY)(N)_k N_B*-3' or 3'-(XY)(N)_k N_B*-5', wherein N represents any nucleoside, N_B represents a moiety that cannot be extended by ligase, * represents a detectable moiety, XY is the constrained portion of the probe, wherein X and Y represent nucleosides that may be identical or different but are not independently selected, X and Y are each at least 2-fold degenerate, at least one internucleoside linkage is a scissile linkage, and k is 1-100, with the proviso that the detectable moiety may be present on Y or on any nucleotide of (N)_k instead of, or in addition to, N_B.

85. The method of claim 84, wherein the scissile linkage is a phosphorothiolate linkage.

86. The method of claim 84, wherein the detectable moiety is attached by a cleavable linker, can be photobleached, or both.

87. The method of claim 86, wherein the cleavable linker contains a disulfide bond.

88. The method of claim 84, wherein 4 distinguishably labeled oligonucleotide probe families are used, and wherein oligonucleotide probes whose constrained portions differ in sequence are assigned to first, second, third, and fourth probe families according to one of the 24 encoding schemes listed in Table 1.

89. The method of claim 57, wherein the detecting step comprises simultaneously acquiring an average of two bits of information from each of at least 2 nucleotides in the template, without acquiring two bits of information from any individual nucleotide.

90. The method of claim 57, wherein the detecting step comprises simultaneously acquiring less than two bits of information from each of at least 2 nucleotides in the template.

91. the method for template polynucleotide inner nucleotide sequence information is measured in first set with the oligonucleotide probe family of at least two kinds of distinctive marks, said method comprising the steps of:

(a) thus probe-template composite contacted with the oligonucleotide probe family of two kinds of distinctive marks at least make oligonucleotide probe hybridization, described probe-template composite contains double-stranded part and the strand part to be checked order with extensible end, and described oligonucleotide probe contains the template part complementary part that partly is close to described duplex;

(b) oligonucleotide probe with hybridization is connected with described extensible end, contains the probe-template composite that prolongs duplex thereby produce;

(c) detect the mark that links to each other with described linking probe;

(d), then on described prolongation duplex, produce extensible probe end if there is not ready-made extensible probe end; With

(e) repeating step (a)-(d) is up to the ordered list that obtains probe family title.

92. The method of claim 91, wherein the detecting step comprises simultaneously acquiring an average of two bits of information from each of at least 2 nucleotides in the template, without acquiring two bits of information from any individual nucleotide.

93. The method of claim 91, wherein the detecting step comprises simultaneously acquiring less than two bits of information from each of at least 2 nucleotides in the template.

94. A method of determining nucleotide sequence information of a template polynucleotide using a first collection of oligonucleotide probe families, the method comprising the steps of:

(a) performing successive ordered cycles of extension, ligation, detection, and cleavage, wherein the detecting step comprises simultaneously acquiring an average of two bits of information from each of at least two nucleotides in the template, without acquiring two bits of information from any individual nucleotide; and

(b) combining the information obtained in step (a) with at least one additional item of information to determine the sequence.

95. The method of claim 94, wherein the at least one additional item of information comprises an item selected from the group consisting of: the identity of a nucleotide in the template; information obtained by comparing a candidate sequence with at least one known sequence; and information obtained by repeating the method using a second collection of oligonucleotide probe families.

96. A method of distinguishing a single nucleotide polymorphism from a sequencing error, the method comprising the steps of:

(a) sequencing a plurality of templates by the method of claim 58, wherein the templates represent overlapping fragments of a single nucleotide sequence;

(b) comparing the sequences obtained in step (a); and

(c) determining that a difference between the sequences represents a sequencing error if the sequences are substantially identical across a first portion and substantially different across a second portion, each portion being at least 3 nucleotides in length.

97. A method of distinguishing a single nucleotide polymorphism from a sequencing error, the method comprising the steps of:

(a) performing steps (a)-(c) of claim 58 on a plurality of templates that represent overlapping fragments of a single nucleotide sequence, thereby obtaining a plurality of ordered lists of probe families;

(b) comparing the ordered lists of probe families obtained in step (a) to obtain a comparison region within which the lists are at least 90% identical; and

(c) determining that a difference between the ordered lists of probe families represents a sequencing error if the lists differ at only one position within the comparison region; or

(d) determining that a difference between the ordered lists of probe families represents a single nucleotide polymorphism if the lists differ at two or more adjacent positions within the comparison region.
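The decision rule of claim 97 can be sketched as a comparison of two aligned ordered lists of probe family names. The sketch assumes the lists are already aligned and of equal length, and the returned labels and the function name are illustrative. The underlying intuition, assuming an encoding in which each family call depends on two adjacent template nucleotides, is that one substituted nucleotide would change two consecutive calls, whereas one miscalled label changes only a single call.

```python
def classify_difference(list_a, list_b, min_identity=0.9):
    """Classify the difference between two aligned ordered lists of family names."""
    if len(list_a) != len(list_b) or not list_a:
        raise ValueError("lists must be aligned, non-empty, and of equal length")
    diffs = [i for i, (a, b) in enumerate(zip(list_a, list_b)) if a != b]
    matches = len(list_a) - len(diffs)
    if matches / len(list_a) < min_identity:
        return "not comparable"            # step (b): region is under 90% identical
    if not diffs:
        return "identical"
    if len(diffs) == 1:
        return "sequencing error"          # step (c): only one position differs
    if any(j - i == 1 for i, j in zip(diffs, diffs[1:])):
        return "polymorphism"              # step (d): adjacent positions differ
    return "undetermined"                  # several isolated differences


if __name__ == "__main__":
    reference = [3, 1, 0, 3, 1] * 4
    one_off = list(reference)
    one_off[7] = 2
    adjacent = list(reference)
    adjacent[7], adjacent[8] = 2, 1
    print(classify_difference(reference, one_off))   # sequencing error
    print(classify_difference(reference, adjacent))  # polymorphism
```

The "undetermined" branch covers a case the claim does not address, namely several differences none of which are adjacent.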

98. A collection of at least two distinguishably labeled oligonucleotide probe families, wherein the probes of each probe family comprise a constrained portion and an unconstrained portion, each position of the constrained portion is at least 2-fold degenerate, and the probes of each family contain a scissile internucleoside linkage.

99. The collection of distinguishably labeled oligonucleotide probe families of claim 98, wherein each probe contains a terminus that cannot be extended by ligase.

100. The collection of distinguishably labeled oligonucleotide probe families of claim 98, wherein each probe contains a terminus that cannot be extended by ligase, and each probe comprises a detectable moiety at a position between the scissile linkage and the terminus that cannot be extended by ligase.

101. The collection of distinguishably labeled oligonucleotide probe families of claim 98, wherein the scissile linkage is a phosphorothiolate linkage.

102. The collection of distinguishably labeled oligonucleotide probe families of claim 98, wherein the collection comprises 2 probe families.

103. The collection of distinguishably labeled oligonucleotide probe families of claim 98, wherein the collection comprises 3 probe families.

104. The collection of distinguishably labeled oligonucleotide probe families of claim 98, wherein the collection comprises 4 probe families.

105. The collection of distinguishably labeled oligonucleotide probe families of claim 98, wherein the collection comprises more than 4 probe families.

106. The collection of distinguishably labeled oligonucleotide probe families of claim 98, wherein the probes contain a detectable moiety that is attached by a cleavable linker, can be photobleached, or both.

107. A collection of at least two distinguishably labeled oligonucleotide probe families, wherein the oligonucleotide probes of each probe family have the structure 5'-(X)_j(N)_k N_B-3' or 3'-(X)_j(N)_k N_B-5', wherein N represents any nucleoside, N_B represents a moiety that cannot be extended by ligase, (X)_j is the constrained portion of the probe, wherein each X represents a nucleoside, the nucleosides in (X)_j may be identical or different but are not independently selected, each X is at least 2-fold degenerate, j is 2-5, k is 1-100, and each probe has a detectable moiety at its terminus, the position of the detectable moiety not being a nucleoside in (X)_j, wherein the probes within each probe family carry the same label, while the probes of different probe families carry different labels.

108. The collection of distinguishably labeled encoded oligonucleotide probe families of claim 107, wherein at least one internucleoside linkage is a scissile linkage.

109. The collection of distinguishably labeled oligonucleotide probe families of claim 107, wherein the scissile linkage is a phosphorothiolate linkage.

110. The collection of distinguishably labeled oligonucleotide probe families of claim 107, wherein the detectable moiety is attached by a cleavable linker, can be photobleached, or both.

111. The collection of distinguishably labeled oligonucleotide probe families of claim 110, wherein the cleavable linker contains a disulfide bond.

112. The collection of distinguishably labeled oligonucleotide probe families of claim 107, wherein the collection consists of four probe families, and wherein the oligonucleotide probes of each probe family have the structure 5'-(XY)(N)_kN_B*-3' or 3'-(XY)(N)_kN_B*-5', wherein N represents any nucleoside; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; XY is the constrained portion of the probe, in which X and Y represent nucleosides that are the same or different but are not independently selected, and X and Y are each at least 2-fold degenerate; at least one internucleoside linkage is a scissile linkage; and k is 1-100; with the proviso that the detectable moiety may be present on Y or on any nucleotide of (N)_k instead of, or in addition to, N_B.

113. The collection of distinguishably labeled oligonucleotide probe families of claim 112, wherein the scissile linkage is a phosphorothiolate linkage.

114. The collection of distinguishably labeled oligonucleotide probe families of claim 112, wherein the detectable moiety is attached by a cleavable linker, is photobleachable, or both.

115. The collection of distinguishably labeled oligonucleotide probe families of claim 114, wherein the cleavable linker contains a disulfide bond.

116. The collection of distinguishably labeled oligonucleotide probe families of claim 112, wherein oligonucleotide probes whose constrained portions differ in sequence are assigned to the first, second, third, and fourth probe families according to one of the 24 encoding schemes listed in Table 1.
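The idea of assigning the 16 possible two-nucleoside constrained portions (XY) to four labeled probe families can be illustrated with a minimal sketch. Table 1 is not reproduced in this section, so the XOR-based assignment below is only an assumed example and is not claimed to be one of the 24 listed schemes; the names `CODE`, `family_of`, `build_families`, and `decode` are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# Hypothetical 2-bit code per base; chosen only for illustration.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def family_of(x, y):
    """Probe family (0-3) for the constrained dinucleotide XY under the assumed XOR rule."""
    return CODE[x] ^ CODE[y]

def build_families():
    """Group all 16 dinucleotides into four families of four members each."""
    families = {f: [] for f in range(4)}
    for x, y in product(BASES, repeat=2):
        families[family_of(x, y)].append(x + y)
    return families

def decode(first_base, labels):
    """Recover a base sequence from one known base plus successive family labels.

    A single label does not identify a dinucleotide on its own; it only
    fixes Y once X is known, which is why X and Y are not independent.
    """
    bases = [first_base]
    for label in labels:
        bases.append(BASES[CODE[bases[-1]] ^ label])
    return "".join(bases)

if __name__ == "__main__":
    for fam, members in build_families().items():
        print(fam, members)
    print(decode("A", [1, 3, 0]))
```

In this assumed scheme every family contains four dinucleotides, X and Y each take all four values within a family, and Y is fixed once X and the family are known, which matches the "same or different but not independently selected" wording of the claim.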

117. A collection of at least two distinguishably labeled oligonucleotide probe families, wherein the oligonucleotide probes of each probe family have the structure 5'-(X)_j(N)_kN_B-3' or 3'-(X)_j(N)_kN_B-5', wherein N represents any nucleoside or abasic residue; N_B represents a moiety that is not extendable by ligase; (X)_j is the constrained portion of the probe, in which each X represents a nucleoside or abasic residue, with the proviso that X_1 represents a nucleotide, the nucleosides in (X)_j are the same or different but are not independently selected, and each X is at least 2-fold degenerate; j is 2-5; k is 1-100; each probe comprises a detectable moiety at its terminus, at a position that is not a nucleoside of (X)_j; and wherein the probes within each probe family carry the same label, whereas probes of different probe families carry different labels.

118. The collection of distinguishably labeled oligonucleotide probe families of claim 117, wherein at least one internucleoside linkage is a scissile linkage.

119. The collection of distinguishably labeled oligonucleotide probe families of claim 117, wherein the scissile linkage is between a nucleoside and an abasic residue.

120. The collection of distinguishably labeled oligonucleotide probe families of claim 117, wherein the oligonucleotide probes contain a trigger residue.

121. The collection of distinguishably labeled oligonucleotide probe families of claim 117, wherein the detectable moiety is attached by a cleavable linker, is photobleachable, or both.

122. The collection of distinguishably labeled oligonucleotide probe families of claim 121, wherein the cleavable linker contains a disulfide bond.

123. The collection of distinguishably labeled oligonucleotide probe families of claim 117, wherein the collection consists of four probe families, and wherein the oligonucleotide probes of each probe family have the structure 5'-(XY)(N)_kN_B*-3' or 3'-(XY)(N)_kN_B*-5', wherein N represents any nucleoside or abasic residue; N_B represents a moiety that is not extendable by ligase; * represents a detectable moiety; XY is the constrained portion of the probe, in which X and Y represent nucleosides that are the same or different but are not independently selected, and X and Y are each at least 2-fold degenerate; at least one internucleoside linkage is a scissile linkage; and k is 1-100; with the proviso that the detectable moiety may be present on Y or on any nucleotide of (N)_k instead of, or in addition to, N_B.

124. The collection of distinguishably labeled oligonucleotide probe families of claim 123, wherein the scissile linkage is between a nucleoside and an abasic residue.

125. The collection of distinguishably labeled oligonucleotide probe families of claim 123, wherein the detectable moiety is attached by a cleavable linker, is photobleachable, or both.

126. The collection of distinguishably labeled oligonucleotide probe families of claim 125, wherein the cleavable linker contains a disulfide bond.

127. The collection of distinguishably labeled oligonucleotide probe families of claim 123, wherein the oligonucleotide probes contain a trigger residue.

128. The collection of distinguishably labeled oligonucleotide probe families of claim 123, wherein oligonucleotide probes whose constrained portions differ in sequence are assigned to the first, second, third, and fourth probe families according to one of the 24 encoding schemes listed in Table 1.

129. A kit comprising a collection of at least two distinguishably labeled oligonucleotide probe families.

130. The kit of claim 129, wherein the probes contain a scissile internucleoside linkage.

131. The kit of claim 130, wherein the scissile internucleoside linkage is a phosphorothiolate linkage.

132. The kit of claim 129, further comprising at least one item selected from the group consisting of: a ligase, an agent capable of cleaving phosphorothiolate linkages, a phosphatase, a polymerase, a support, a buffer, a thermostable polymerase, nucleotides, reagents for preparing an emulsion, and reagents for preparing a gel.

133. A method of preparing a plurality of template polynucleotides, the method comprising the steps of:

(a) embedding a plurality of microparticles in or on a reversible semi-solid support to form a first microparticle array; and

(b) amplifying starting template polynucleotides in the semi-solid support such that, after amplification, each microparticle has a clonal population of template molecules attached to it.

134. The method of claim 133, further comprising the steps of:

(a) dissolving the semi-solid support; and

(b) collecting the microparticles.

135. The method of claim 134, further comprising the step of:

forming a second microparticle array in or on a second semi-solid support, wherein the density of microparticles in the second array is greater than the density of microparticles in the first array.

136. A kit comprising an oligonucleotide probe that contains a phosphorothiolate linkage, wherein the probe is labeled with a detectable moiety.

137. The kit of claim 136, wherein the detectable moiety is a fluorescent dye.

138. The kit of claim 136, further comprising an agent capable of cleaving phosphorothiolate linkages.

139. The kit of claim 136, further comprising a ligase.

140. The kit of claim 136, further comprising a ligase and an agent capable of cleaving phosphorothiolate linkages.

141. The kit of claim 136, further comprising at least one item selected from the group consisting of: a ligase, an agent capable of cleaving phosphorothiolate linkages, a phosphatase, a polymerase, a support, a buffer, a thermostable polymerase, nucleotides, reagents for preparing an emulsion, and reagents for preparing a gel.

142. The kit of claim 136, wherein the kit comprises a plurality of fluorescently labeled oligonucleotide probes containing phosphorothiolate linkages, such that probes corresponding to different probe terminal nucleotides carry different, spectrally resolvable fluorescent dyes.

143. An oligonucleotide probe of the form 5'-O-P-O-X-O-P-S-(N)_kN_B*-3', wherein N represents any nucleotide, N_B represents a moiety that is not extendable by ligase, * represents a detectable moiety, X represents a nucleotide, and k is 1-100, with the proviso that the detectable moiety may be present on any nucleotide of (N)_k instead of, or in addition to, N_B.

144. The oligonucleotide probe of claim 143, wherein the probe contains at least one nucleotide of reduced degeneracy.

145. A set of oligonucleotide probes of claim 143, wherein the set comprises a plurality of fluorescently labeled oligonucleotide probes, such that probes corresponding to different nucleotides X carry different, spectrally resolvable fluorescent dyes.

146. An oligonucleotide probe of the form 5'-N_B*(N)_kC-S-P-O-X-3', wherein N represents any nucleotide, N_B represents a moiety that is not extendable by ligase, * represents a detectable moiety, X represents a nucleotide, and k is 1-100, with the proviso that the detectable moiety may be present on any nucleotide of (N)_k instead of, or in addition to, N_B.

147. The oligonucleotide probe of claim 146, wherein the probe contains at least one nucleotide of reduced degeneracy.

148. A set of oligonucleotide probes of claim 146, wherein the set comprises a plurality of fluorescently labeled oligonucleotide probes, such that probes corresponding to different nucleotides X carry different, spectrally resolvable fluorescent dyes.

149. An oligonucleotide probe of the form 5'-O-P-O-X-O-(N)_k-O-P-S-(N)_iN_B*-3', wherein N represents any nucleotide, N_B represents a moiety that is not extendable by ligase, * represents a detectable moiety, X represents a nucleotide, (k+i) is 1-100, k is 1-100, and i is 0-99, with the proviso that the detectable moiety may be present on any nucleotide of (N)_i instead of, or in addition to, N_B.

150. The oligonucleotide probe of claim 149, wherein the probe contains at least one nucleotide of reduced degeneracy.

151. The oligonucleotide probe of claim 149, wherein i = 0.

152. A set of oligonucleotide probes of claim 149, wherein the set comprises a plurality of fluorescently labeled oligonucleotide probes, such that probes corresponding to different nucleotides X carry different, spectrally resolvable fluorescent dyes.

153. An oligonucleotide probe of the form 5'-N_B*(N)_i-S-P-O-(N)_k-O-P-O-X-3', wherein N represents any nucleotide, N_B represents a moiety that is not extendable by ligase, * represents a detectable moiety, X represents a nucleotide, (k+i) is 1-100, k is 1-100, and i is 0-99, with the proviso that the detectable moiety may be present on any nucleotide of (N)_i instead of, or in addition to, N_B.

154. The oligonucleotide probe of claim 153, wherein the probe contains at least one nucleotide of reduced degeneracy.

155. The oligonucleotide probe of claim 153, wherein i = 0.

156. A set of oligonucleotide probes of claim 153, wherein the set comprises a plurality of fluorescently labeled oligonucleotide probes, such that probes corresponding to different nucleotides X carry different, spectrally resolvable fluorescent dyes.

157. An oligonucleotide probe of a form selected from the group consisting of: 3'-XNNNNsINI-5', 3'-XNNNNsIII-5', 3'-XNNNNsNII-5', and 3'-XNNNNIsII-5', wherein X and N represent any nucleotide, "s" represents a scissile linkage, and at least one residue between the scissile linkage and the 5' end of the oligonucleotide carries a label corresponding to the particular X.

158. The probe of claim 157, wherein s represents a phosphorothiolate linkage.

159. A method of ligating a first polynucleotide to a second polynucleotide, the method comprising the steps of:

(a) providing a first polynucleotide immobilized in or on a semi-solid support;

(b) contacting the first polynucleotide with a second polynucleotide and a ligase; and

(c) maintaining the first and second polynucleotides in the presence of the ligase under conditions suitable for ligation.

160. The method of claim 159, wherein the first polynucleotide is attached directly to the semi-solid support by a covalent or noncovalent linkage.

161. The method of claim 159, wherein the first polynucleotide is attached to a support that is immobilized in or on the semi-solid support.

162. The method of claim 161, wherein the semi-solid support is a gel and the support is a microparticle.

163. The method of claim 161, wherein the semi-solid support is a gel and the support is a magnetic microparticle.

164. A method of cleaving a polynucleotide, the method comprising the steps of:

(a) providing a polynucleotide immobilized in or on a semi-solid support, wherein the polynucleotide contains a scissile linkage;

(b) contacting the polynucleotide with a cleavage agent; and

(c) maintaining the polynucleotide in the presence of the cleavage agent under conditions suitable for cleavage.

165. The method of claim 164, wherein the first polynucleotide is attached directly to the semi-solid support by a covalent or noncovalent linkage.

166. The method of claim 164, wherein the first polynucleotide is attached to a support that is immobilized in or on the semi-solid support.

167. The method of claim 166, wherein the semi-solid support is a gel and the support is a microparticle.

168. The method of claim 166, wherein the semi-solid support is a gel and the support is a magnetic microparticle.

169. An automated sequencing apparatus comprising a flow cell oriented so as to provide gravimetric bubble displacement.

170. An automated sequencing system capable of identifying 40,000 nucleotides per second.

171. An automated sequencing system that generates 8.6 Gb of sequence information per day.

172. An automated sequencing system that generates 48 Gb of sequence information per day.
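The throughput figures in claims 170-172 are stated in different units (nucleotides per second versus Gb per day). The short sketch below converts between them for comparison; it assumes that 1 Gb means 10^9 nucleotides and that the instrument runs continuously for 24 hours, neither of which is stated in the claims. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s, assuming continuous operation

def gb_per_day(nucleotides_per_second):
    """Convert a per-second identification rate to Gb/day (1 Gb = 1e9 nt, assumed)."""
    return nucleotides_per_second * SECONDS_PER_DAY / 1e9

def nucleotides_per_second(gb_day):
    """Convert a daily output in Gb to the equivalent average rate in nt/s."""
    return gb_day * 1e9 / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"40,000 nt/s  is about {gb_per_day(40_000):.3f} Gb/day")
    print(f"8.6 Gb/day   is about {nucleotides_per_second(8.6):,.0f} nt/s")
    print(f"48 Gb/day    is about {nucleotides_per_second(48):,.0f} nt/s")
```

Under these assumptions the three claims describe three distinct throughput levels rather than the same rate expressed in different units.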

173. A method of identifying a sequence of nucleotides in a template polynucleotide, the method comprising the steps of:

(a) extending an initializing oligonucleotide along the template polynucleotide by ligating an oligonucleotide probe to the initializing oligonucleotide to form an extended duplex, wherein the oligonucleotide probe contains a trigger residue;

(b) identifying one or more nucleotides of the polynucleotide;

(c) cleaving the oligonucleotide probe with a cleavage agent, thereby generating an extendable probe terminus; and

(d) repeating steps (a), (b), and (c) until the sequence of nucleotides is determined.

174. The method of claim 173, wherein the identifying step comprises detecting a label attached to the most recently ligated oligonucleotide probe.

175. The method of claim 173, wherein the oligonucleotide probe contains an abasic residue, a damaged base, or deoxyinosine.

176. The method of claim 173, wherein the oligonucleotide probe contains a damaged base, the method further comprising the step of removing the damaged base.

177. The method of claim 176, wherein the removing step comprises contacting the extended duplex with a DNA glycosylase.

178. The method of claim 173, further comprising the step of cleaving the oligonucleotide probe with a cleavage agent to generate an extendable probe terminus.

179. The method of claim 178, wherein the cleavage agent is selected from an AP endonuclease, Endo V, and periodate.

180. The method of claim 178, wherein the cleavage agent is an AP endonuclease.

181. The method of claim 178, wherein the cleavage agent is endonuclease VIII.

182. The method of claim 173, wherein the extending step is performed in or on a semi-solid support.

183. A method of determining a sequence of nucleotides in a template polynucleotide, the method comprising the steps of:

(b) ligating an extension oligonucleotide probe to the extendable terminus to form an extended duplex containing an extended oligonucleotide probe, wherein the extension probe contains a trigger residue;

184. The method of claim 183, wherein the extension probe contains an abasic residue, a damaged base, or deoxyinosine.

185. The method of claim 183, wherein one end of each extension probe contains a non-extendable moiety.

186. The method of claim 183, wherein the identifying step comprises detecting a label attached to the most recently ligated extension probe.

187. The method of claim 183, wherein the identifying step comprises removing the non-extendable moiety and extending the extended oligonucleotide probe with a nucleic acid polymerase in the presence of one or more labeled chain-terminating nucleoside triphosphates.

188. The method of claim 183, further comprising the step of capping an extended oligonucleotide probe whenever no extension probe is ligated to the extendable terminus in the ligating step.

189. The method of claim 183, wherein the generating step comprises cleaving the oligonucleotide probe with a cleavage agent selected from an AP endonuclease and periodate.

190. The method of claim 189, wherein the cleavage agent is an AP endonuclease.

191. The method of claim 189, wherein the cleavage agent is endonuclease VIII.

192. The method of claim 183, wherein the ligating and generating steps are performed in or on a semi-solid support.

193. The method of claim 183, wherein step (a) comprises providing, in separate aliquots, a plurality of different probe-template duplexes, each different duplex containing an initializing oligonucleotide probe hybridized to a template polynucleotide, wherein the template polynucleotide is the same in each duplex but the initializing oligonucleotide probe in each duplex binds to a different sequence of the template polynucleotide; and wherein steps (b)-(e) are carried out independently on each aliquot.

194. The method of claim 193, wherein, for each aliquot, one end of the extension oligonucleotide probe contains a non-extendable moiety.

195. The method of claim 194, wherein, for each aliquot, the identifying step comprises detecting a label attached to the most recently ligated extension probe.

196. The method of claim 194, wherein, for each aliquot, the identifying step comprises removing the non-extendable moiety and extending the extended oligonucleotide probe with a nucleic acid polymerase in the presence of one or more labeled chain-terminating nucleoside triphosphates.

197. The method of claim 194, further comprising the step of capping an extended oligonucleotide probe whenever no extension probe is ligated to the extendable terminus in the ligating step.

198. The method of claim 194, wherein the generating step comprises cleaving the oligonucleotide probe with a cleavage agent selected from an AP endonuclease and periodate.

199. The method of claim 198, wherein the cleavage agent is an AP endonuclease.

200. The method of claim 198, wherein the cleavage agent is endonuclease VIII.

201. The method of claim 194, wherein the ligating and generating steps are performed in or on a semi-solid support.

202. The method of claim 183, further comprising the steps of: (f) removing the ligated probe and the initializing oligonucleotide from the template; (g) repeating step (a) with a second oligonucleotide that binds to a different sequence of the template polynucleotide; and (h) repeating steps (b)-(e).

203. The method of claim 202, wherein the method is repeated a plurality of times with initializing oligonucleotides that bind to different sequences of the template polynucleotide.
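One way to picture repeating the method with initializing oligonucleotides that bind different template sequences is as a set of reads starting from different offsets. The toy model below is only a sketch under assumptions not stated in the claims: each ligation and cleavage cycle is taken to identify a single template position and to advance the extended strand by a fixed number of nucleotides (`step`, hypothetically 5 here), and each initializing oligonucleotide is modeled simply as a different starting offset.

```python
def interrogated_positions(template_len, offset, step):
    """Template positions identified in one reaction: one position per cycle (assumed)."""
    return list(range(offset, template_len, step))

def combined_coverage(template_len, offsets, step):
    """Union of positions identified across reactions started at different offsets."""
    covered = set()
    for offset in offsets:
        covered.update(interrogated_positions(template_len, offset, step))
    return sorted(covered)

if __name__ == "__main__":
    length, step = 20, 5  # hypothetical template length and per-cycle advance
    print(interrogated_positions(length, 0, step))
    print(combined_coverage(length, range(step), step))
```

In this model a single reaction samples every fifth position, and five reactions with offsets 0 through 4 together cover every position of the template.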

204. The method of claim 202, wherein the removing step comprises contacting the ligated probe, initializing oligonucleotide, and template with an aqueous solution containing 1.0-3.0% SDS, 100-300 mM NaCl, and 5-15 mM sodium bisulfate (NaHSO₄).

205. The method of claim 202, wherein the removing step comprises contacting the ligated probe, initializing oligonucleotide, and template with a solution containing about 2% SDS, about 200 mM NaCl, and about 10 mM sodium bisulfate (NaHSO₄), e.g., 2% SDS, 200 mM NaCl, and 10 mM sodium bisulfate (NaHSO₄).
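As a worked example of the solution recited in claims 204 and 205, the sketch below computes the mass of each component for a given volume and checks a recipe against the claimed ranges. It assumes that the SDS percentage is weight per volume and that anhydrous NaCl (58.44 g/mol) and anhydrous NaHSO₄ (120.06 g/mol) are used; the claims state neither. The function names are hypothetical.

```python
# Molar masses in g/mol; anhydrous salts are an assumption.
MOLAR_MASS = {"NaCl": 58.44, "NaHSO4": 120.06}

def removal_solution_masses(volume_ml, sds_pct=2.0, nacl_mM=200.0, nahso4_mM=10.0):
    """Grams of each component for volume_ml of solution (SDS taken as % w/v)."""
    litres = volume_ml / 1000.0
    return {
        "SDS_g": sds_pct / 100.0 * volume_ml,
        "NaCl_g": nacl_mM / 1000.0 * litres * MOLAR_MASS["NaCl"],
        "NaHSO4_g": nahso4_mM / 1000.0 * litres * MOLAR_MASS["NaHSO4"],
    }

def within_claim_204(sds_pct, nacl_mM, nahso4_mM):
    """True if the concentrations fall inside the ranges recited in claim 204."""
    return (1.0 <= sds_pct <= 3.0
            and 100.0 <= nacl_mM <= 300.0
            and 5.0 <= nahso4_mM <= 15.0)

if __name__ == "__main__":
    for name, grams in removal_solution_masses(100).items():
        print(f"{name}: {grams:.3f}")
    print(within_claim_204(2.0, 200.0, 10.0))
```

For 100 mL at the claim 205 concentrations this gives 2 g SDS, about 1.17 g NaCl, and about 0.12 g NaHSO₄ under the stated assumptions.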

206. The method of claim 203, wherein one end of the extension probe contains a non-extendable moiety.

207. The method of claim 203, wherein, in each repetition, the identifying step comprises detecting a label attached to the most recently ligated extension probe.

208. The method of claim 203, wherein, in each repetition, the identifying step comprises removing the non-extendable moiety and extending the extended oligonucleotide probe with a nucleic acid polymerase in the presence of one or more labeled chain-terminating nucleoside triphosphates.

209. The method of claim 203, further comprising the step of capping an extended oligonucleotide probe whenever no extension probe is ligated to the extendable terminus in the ligating step.

210. The method of claim 203, wherein the generating step comprises cleaving the oligonucleotide probe with a cleavage agent selected from an AP endonuclease and periodate.

211. The method of claim 210, wherein the cleavage agent is an AP endonuclease.

212. The method of claim 210, wherein the cleavage agent is endonuclease VIII.

213. The method of claim 203, wherein the ligating and generating steps are performed in or on a semi-solid support.

214. A method of identifying a sequence of nucleotides in a template polynucleotide that is attached to a support at a point of attachment, the method comprising the steps of:

(a) extending an initializing oligonucleotide along the template polynucleotide by ligating an oligonucleotide probe to the initializing oligonucleotide to form an extended duplex, wherein the oligonucleotide probe contains a trigger residue, and wherein the extension proceeds along the template toward its point of attachment to the support;

(b) identifying one or more nucleotides of the polynucleotide; and

(c) repeating steps (a) and (b) until the sequence of nucleotides is determined.

215. The method of claim 214, wherein the identifying step comprises detecting a label attached to the most recently ligated extension probe.

216. The method of claim 214, wherein the generating step comprises cleaving the oligonucleotide probe with a cleavage agent selected from an AP endonuclease, Endo V, and periodate.

217. The method of claim 216, wherein the cleavage agent is an AP endonuclease.

218. The method of claim 216, wherein the cleavage agent is endonuclease VIII.

219. The method of claim 216, wherein the ligating and generating steps are performed in or on a semi-solid support.

220. A method of determining a sequence of nucleotides in a template polynucleotide that is attached to a support at a point of attachment, the method comprising the steps of:

(a) providing a probe-template duplex formed by hybridization of a probe to the template polynucleotide, the probe having an extendable terminus;

(b) ligating an extension oligonucleotide probe to the extendable terminus to form an extended duplex containing an extended oligonucleotide probe, wherein the extension proceeds along the template toward its point of attachment to the support, and wherein the oligonucleotide probe contains a trigger residue;

(d) if no extendable terminus is already present, generating an extendable terminus on the extended oligonucleotide probe, such that the terminus generated is different from the terminus to which the extension probe was ligated; and

221. The method of claim 220, wherein one end of each extension probe contains a non-extendable moiety.

222. The method of claim 220, wherein the identifying step comprises detecting a label attached to the most recently ligated extension probe.

223. The method of claim 220, wherein the identifying step comprises removing the non-extendable moiety and extending the extended oligonucleotide probe with a nucleic acid polymerase in the presence of one or more labeled chain-terminating nucleoside triphosphates.

224. The method of claim 220, further comprising the step of capping an extended oligonucleotide probe whenever no extension probe is ligated to the extendable terminus in the ligating step.

225. The method of claim 220, wherein the generating step comprises cleaving the oligonucleotide probe with a cleavage agent.

226. The method of claim 225, wherein the cleavage agent is selected from an AP endonuclease, Endo V, and periodate.

227. The method of claim 225, wherein the cleavage agent is an AP endonuclease.

228. The method of claim 225, wherein the cleavage agent is endonuclease VIII.

229. The method of claim 220, further comprising the steps of: (f) removing the ligated probe and the initializing oligonucleotide from the template; (g) repeating step (a) with a second oligonucleotide that binds to a different sequence of the template polynucleotide; and (h) repeating steps (b)-(e).

230. The method of claim 229, wherein the method is repeated a plurality of times with initializing oligonucleotides that bind to different sequences of the template polynucleotide.

231. The method of claim 220, wherein the ligating and generating steps are performed in or on a semi-solid support.

232. The method of claim 220, wherein the template is attached to a microparticle that is attached to a substantially planar rigid substrate.

233. The method of claim 229, wherein the removing step comprises contacting the ligated probe, initializing oligonucleotide, and template with an aqueous solution containing about 1.0-3.0% SDS, 100-300 mM NaCl, and 5-15 mM sodium bisulfate (NaHSO₄).

234. The method of claim 229, wherein the removing step comprises contacting the ligated probe, initializing oligonucleotide, and template with an aqueous solution containing about 2% SDS, 200 mM NaCl, and 10 mM sodium bisulfate (NaHSO₄), e.g., 2% SDS, 200 mM NaCl, and 10 mM sodium bisulfate (NaHSO₄).

235. A method of identifying a sequence of nucleotides in a template polynucleotide, the method comprising the steps of:

(a) providing a template polynucleotide attached to a microparticle that is immobilized in or on a semi-solid support;

(b) extending an initializing oligonucleotide along the template polynucleotide by ligating an oligonucleotide probe to the initializing oligonucleotide to form an extended duplex, wherein the oligonucleotide probe contains a trigger residue;

(c) identifying one or more nucleotides of the polynucleotide; and

(d) repeating steps (b) and (c) until the sequence of nucleotides is determined.

236. The method of claim 235, wherein the extending step is performed in a semi-solid support.

237. The method of claim 235, wherein the template is attached to a microparticle that is attached to a substantially planar rigid substrate.

238. The method of claim 235, wherein the oligonucleotide contains a scissile linkage, or is readily modified to contain a scissile linkage, wherein the scissile linkage is between a nucleoside and an abasic residue.

239. A method of determining the nucleotide sequence of a template polynucleotide, the method comprising the steps of:

(a) providing a probe-template duplex formed by hybridizing a probe to the template polynucleotide, the probe having an extendable terminus, the probe-template duplex being attached to a microparticle embedded in or on a semi-solid support;

(b) ligating an extension oligonucleotide probe to the extendable terminus to form an extended duplex containing an extended oligonucleotide probe, wherein the extension probe contains a trigger residue;

240. The method of claim 239, wherein the ligating and generating steps are performed in the semi-solid support.

241. The method of claim 239, wherein the template is attached to a microparticle that is attached to a substantially planar rigid substrate.

242. A method of determining a nucleotide sequence within a template polynucleotide, the method comprising the steps of:

(a) amplifying a template polynucleotide molecule in a compartment of an emulsion in the presence of a microparticle, thereby producing a microparticle having attached thereto a clonal population of the template polynucleotide;

(b) recovering the microparticle from the emulsion;

(c) embedding the microparticle in or on a semi-solid support;

(d) extending an initializing oligonucleotide along the template polynucleotide by ligating an oligonucleotide probe to the initializing oligonucleotide to form an extended duplex, wherein the oligonucleotide probe contains a trigger residue;

(e) identifying one or more nucleotides of the polynucleotide; and

(f) repeating steps (d) and (e) until the nucleotide sequence is determined.

243. The method of claim 242, wherein (i) a plurality of template polynucleotide molecules having different sequences are amplified in individual emulsion compartments; (ii) a plurality of microparticles are recovered from the emulsion and embedded in or on the support, each microparticle having attached thereto a clonal population of template polynucleotides, wherein the clonal populations have different sequences; and (iii) steps (d), (e), and (f) are performed in parallel on the clonal populations attached to the embedded microparticles, so that a plurality of sequences are determined in parallel.

244. A method of determining nucleotide sequence information of a template polynucleotide using a first collection of at least two distinguishably labeled oligonucleotide probe families, the method comprising the steps of:

(a) extending an initializing oligonucleotide along the template polynucleotide by ligating an oligonucleotide probe to the initializing oligonucleotide to form an extended duplex, wherein the oligonucleotide probe is a member of the collection of distinguishably labeled oligonucleotide probe families and contains a trigger residue;

(b) detecting the label associated with the oligonucleotide; and

245. The method of claim 244, wherein step (d) comprises decoding an ordered list of probe family names to determine the sequence.

246. The method of claim 244, wherein the method comprises providing a probe-template duplex formed by hybridizing an initializing oligonucleotide probe to the template polynucleotide, the probe having an extendable terminus, wherein the extending step comprises ligating an oligonucleotide probe to the extendable terminus to form an extended duplex containing an extended oligonucleotide probe, and further comprising the step of capping any remaining extendable termini to which no oligonucleotide probe was ligated in the extending step.

247. The method of claim 244, wherein one terminus of the oligonucleotide probes in each probe family contains a non-extendable moiety.

248. The method of claim 244, further comprising, after each detecting step: (f) if no extendable terminus is present, generating an extendable terminus on the most recently ligated oligonucleotide probe, such that the generated terminus differs from the terminus at which the most recently ligated oligonucleotide probe was ligated.

249. The method of claim 248, wherein the extendable probe terminus is generated by cleaving the oligonucleotide with a cleavage agent selected from an AP endonuclease, Endo V, or periodate.

250. The method of claim 249, wherein the cleavage agent is an AP endonuclease.

251. The method of claim 249, wherein the cleavage agent is endonuclease VIII.

252. The method of claim 244, wherein the extending step is performed in or on a semi-solid support.

253. The method of claim 244, wherein the template is attached to a microparticle that is attached to a substantially planar rigid substrate.

254. The method of claim 244, wherein the collection comprises 2 distinguishably labeled probe families.

255. The method of claim 244, wherein the collection comprises 3 distinguishably labeled probe families.

256. The method of claim 244, wherein the collection comprises 4 distinguishably labeled probe families.

257. The method of claim 244, wherein the collection comprises more than 4 distinguishably labeled probe families.

258. The method of claim 244, wherein the oligonucleotide probes comprise a constrained portion in which the nucleosides are not independently selected, and wherein oligonucleotide probes whose constrained portions differ in sequence are assigned to probe families according to an encoding scheme.

259. The method of claim 244, wherein the oligonucleotide probes are assigned to first, second, third, and fourth probe families according to one of the 24 encoding schemes listed in Table 1.

260. The method of claim 245, wherein the identity of at least one nucleotide in the template is known, and wherein the decoding step comprises:

(iii) repeating step (ii) until the sequence is determined.

261. The method of claim 245, further comprising the steps of:

(iii) repeating step (ii) until the sequence is determined.

262. The method of claim 261, wherein the determining step comprises contacting a template-probe duplex with a labeled nucleotide in the presence of a polymerase under conditions in which the labeled nucleotide is incorporated if it is complementary to the template at the position adjacent to the duplex.

263. The method of claim 245, wherein the decoding step comprises: generating at least one candidate sequence from the ordered list of probe family names; and selecting a candidate sequence as the nucleotide sequence of the template.

264. The method of claim 263, wherein the generating step comprises generating at least 4 candidate sequences.
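By way of non-limiting illustration only, the generation of candidate sequences from an ordered list of probe family names may be sketched as follows. The sketch assumes four probe families and a hypothetical encoding in which the family of a dinucleotide is the bitwise XOR of two-bit base codes (A=0, C=1, G=2, T=3). This assignment is an assumption made for the example and is not asserted to be any particular scheme of Table 1. The function names `encode` and `candidates` are likewise hypothetical.

```python
BASES = "ACGT"  # assumed two-bit codes: A=0, C=1, G=2, T=3


def family_of(x, y):
    """Hypothetical encoding: probe family index (0-3) for dinucleotide xy."""
    return BASES.index(x) ^ BASES.index(y)


def encode(seq):
    """Ordered list of probe family indices for the overlapping dinucleotides."""
    return [family_of(a, b) for a, b in zip(seq, seq[1:])]


def candidates(families):
    """One candidate sequence per possible first nucleotide (four in total)."""
    result = []
    for first in BASES:
        seq = [first]
        for fam in families:
            # Under the assumed XOR scheme, the previous base and the
            # family index together determine the next base.
            seq.append(BASES[BASES.index(seq[-1]) ^ fam])
        result.append("".join(seq))
    return result


if __name__ == "__main__":
    observed = encode("ATGGC")
    print(observed)
    print(candidates(observed))
```

Under these assumptions the family list alone is consistent with exactly four sequences, one for each choice of first nucleotide, so one additional item of information (such as a known nucleotide) is needed to single out the template sequence.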

265. The method of claim 263, wherein the generating step comprises:

(iv) repeating step (iii) until a candidate sequence is generated; and

266. The method of claim 263, wherein the selecting step comprises comparing at least one candidate sequence with one or more known sequences, and selecting the candidate sequence that has a predetermined degree of identity with, or is closest to, the one or more known sequences.
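As a non-limiting sketch of such a selecting step, the candidate closest to any known sequence may be chosen by fractional identity over equal-length sequences. The identity measure, the optional threshold `min_identity`, and the function names are assumptions introduced for this example; the source does not specify how identity is computed.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_candidate(candidates, known, min_identity=0.0):
    """Return the candidate closest to any known sequence, or None if the
    best candidate falls below the (hypothetical) identity threshold."""
    def best_score(c):
        return max(identity(c, k) for k in known)

    best = max(candidates, key=best_score)
    return best if best_score(best) >= min_identity else None


if __name__ == "__main__":
    cands = ["ATGGC", "CGTTA", "GCAAT", "TACCG"]
    print(select_candidate(cands, ["ATGGA"], min_identity=0.8))
```

A real comparison against a sequence database would normally use alignment rather than position-by-position matching; the simpler measure is used here only to keep the example self-contained.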

267. The method of claim 266, wherein the template is derived from an organism of interest, and wherein the comparing step comprises comparing at least one candidate sequence with sequences in a database containing sequences obtained from the organism.

268. The method of claim 266, wherein the comparing step comprises comparing at least one candidate sequence with sequences in a database containing a plurality of comparison sequences, each of which contains a different possible sequence of the polynucleotide to be sequenced.

269. The method of claim 263, wherein the selecting step comprises:

270. The method of claim 269, wherein the compared portion is a dinucleotide.

271. The method of claim 269, wherein the second ordered list of probe family names contains only one element.

272. The method of claim 244, wherein the oligonucleotide probes in each probe family have the structure 5'-(XY)(N)_k N_B*-3' or 3'-(XY)(N)_k N_B*-5', wherein N represents any nucleoside, N_B represents a moiety that is not extendable by ligase, * represents a detectable moiety, XY is the constrained portion of the probe, in which X and Y represent nucleosides that are the same or different but are not independently selected, X and Y are each at least 2-fold degenerate, at least one internucleoside linkage is a scissile linkage between a nucleoside and an abasic residue or a linkage between a nucleoside and a residue containing a damaged base, and k is 1-100, with the proviso that the detectable moiety may be present on any nucleotide of Y or (N)_k instead of, or in addition to, N_B.

273. The method of claim 272, wherein the at least one internucleoside linkage is a scissile linkage between a nucleoside and an abasic residue.

274. The method of claim 272, wherein the detectable moiety is attached by a cleavable linker, is photobleachable, or both.

275. The method of claim 274, wherein the cleavable linker contains a disulfide bond.

276. The method of claim 272, wherein 4 distinguishably labeled oligonucleotide probe families are used, and wherein probes whose constrained portions differ in sequence are assigned to first, second, third, and fourth probe families according to one of the 24 encoding schemes listed in Table 1.

277. The method of claim 244, wherein the detecting step comprises simultaneously acquiring an average of 2 bits of information from each of at least 2 nucleotides in the template, without acquiring two bits of information from any single nucleotide.

278. The method of claim 244, wherein the detecting step comprises simultaneously acquiring less than 2 bits of information from each of at least 2 nucleotides in the template.
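As a rough arithmetic illustration only, the information obtainable from one detection step can be bounded by the number of distinguishable probe families. The sketch assumes error-free detection, equally likely families, and a constrained portion spanning a given number of template positions. These modeling assumptions and the function names are introduced for the example and are not taken from the claims.

```python
import math


def bits_per_detection(n_families):
    """Upper bound on information from one detection: log2 of the number
    of distinguishable probe families (assumes equally likely labels)."""
    return math.log2(n_families)


def mean_bits_per_nucleotide(n_families, constrained_positions=2):
    """Average bits per interrogated nucleotide when one detection is
    shared across all positions of the constrained portion."""
    return bits_per_detection(n_families) / constrained_positions


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, bits_per_detection(n), mean_bits_per_nucleotide(n))
```

With 4 families and a dinucleotide constrained portion, this bound gives 2 bits per detection shared between 2 nucleotides, i.e. fewer than 2 bits from each nucleotide in a single cycle.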

279. A method of determining nucleotide sequence information of a template polynucleotide using a first collection of at least two distinguishably labeled oligonucleotide probe families, the method comprising the steps of:

(a) contacting a probe-template complex with at least two distinguishably labeled oligonucleotide probe families so that an oligonucleotide probe hybridizes, the probe-template complex comprising a duplex portion having an extendable terminus and a single-stranded portion to be sequenced, the oligonucleotide probe containing a portion complementary to the portion of the template immediately adjacent to the duplex portion, wherein the probes of the probe families contain a trigger residue;

(b) ligating the hybridized oligonucleotide probe to the extendable terminus, thereby producing a probe-template duplex containing an extended duplex;

(c) detecting the label associated with the ligated probe;

(d) if no extendable probe terminus is already present, generating an extendable probe terminus on the extended duplex; and

280. The method of claim 279, wherein the detecting step comprises simultaneously acquiring an average of 2 bits of information from each of at least 2 nucleotides in the template, without acquiring two bits of information from any single nucleotide.

281. The method of claim 279, wherein the detecting step comprises simultaneously acquiring less than 2 bits of information from each of at least 2 nucleotides in the template.

282. A method of determining nucleotide sequence information of a template polynucleotide using a first collection of oligonucleotide probe families, wherein the probes of the probe families contain a trigger residue, the method comprising the steps of:

(a) performing successive ordered cycles of extension, ligation, detection, and cleavage, wherein the detecting step comprises simultaneously acquiring an average of two bits of information from each of at least two nucleotides in the template, without acquiring two bits of information from any single nucleotide; and

283. The method of claim 282, wherein the at least one other item of information comprises an item of information selected from the group consisting of: the identity of a nucleotide in the template; information obtained by comparing a candidate sequence with at least one known sequence; and information obtained by repeating the method using a second collection of oligonucleotide probe families.

284. A collection of at least two distinguishably labeled oligonucleotide probe families, wherein the probes of each probe family comprise a constrained portion and an unconstrained portion, each position of the constrained portion is at least 2-fold degenerate, and the probes of each family contain a trigger residue.

285. The collection of distinguishably labeled oligonucleotide probe families of claim 284, wherein each probe contains a terminus that is not extendable by ligase.

286. The collection of distinguishably labeled oligonucleotide probe families of claim 284, wherein each probe contains a terminus that is not extendable by ligase, and each probe comprises a detectable moiety at a position between the scissile linkage and the terminus that is not extendable by ligase.

287. The collection of distinguishably labeled oligonucleotide probe families of claim 284, wherein the collection comprises 2 probe families.

288. The collection of distinguishably labeled oligonucleotide probe families of claim 284, wherein the collection comprises 3 probe families.

289. The collection of distinguishably labeled oligonucleotide probe families of claim 284, wherein the collection comprises 4 probe families.

290. The collection of distinguishably labeled oligonucleotide probe families of claim 284, wherein the collection comprises more than 4 probe families.

291. The collection of distinguishably labeled oligonucleotide probe families of claim 284, wherein the probes contain a detectable moiety, and the detectable moiety is attached by a cleavable linker, is photobleachable, or both.

292. A kit comprising a collection of at least two distinguishably labeled oligonucleotide probe families, wherein the oligonucleotide probes contain a trigger residue.

293. The kit of claim 292, wherein the trigger residue is an abasic residue, deoxyinosine, or a residue containing a damaged base.

294. The kit of claim 292, wherein the probes contain a scissile linkage between a nucleoside and an abasic residue.

295. The kit of claim 292, further comprising at least one item selected from the group consisting of: a ligase, an agent capable of cleaving the scissile linkage, a phosphatase, a polymerase, a support, a buffer, a thermostable polymerase, nucleotides, reagents for preparing an emulsion, and reagents for preparing a gel.

296. A kit comprising an oligonucleotide probe containing a trigger residue, wherein the probe contains, or is readily modified to contain, a scissile linkage, and wherein the probe is labeled with a detectable moiety.

297. The kit of claim 296, wherein the detectable moiety is a fluorescent dye.

298. The kit of claim 296, further comprising an agent capable of cleaving the scissile linkage.

299. The kit of claim 296, further comprising a ligase.

300. The kit of claim 296, further comprising a ligase and an agent capable of cleaving the scissile linkage.

301. The kit of claim 296, further comprising at least one item selected from the group consisting of: a ligase, an agent capable of cleaving the scissile linkage, a phosphatase, a polymerase, a support, a buffer, a thermostable polymerase, nucleotides, reagents for preparing an emulsion, and reagents for preparing a gel.

302. The kit of claim 301, wherein the kit comprises a plurality of fluorescently labeled oligonucleotide probes, wherein the probes contain a scissile linkage between a nucleoside and a trigger residue, such that probes corresponding to different probe terminal nucleotides carry different, spectrally resolvable fluorescent dyes.

303. An oligonucleotide of the form 5'-O-P-O-X-O-P-O-(N)_k N_B*-3', wherein N represents any nucleotide or abasic residue, with the proviso that at least one N is a trigger residue, N_B represents a moiety that is not extendable by ligase, * represents a detectable moiety, X represents a nucleotide, and k is 1-100, with the proviso that the detectable moiety may be present on any nucleotide of (N)_k instead of, or in addition to, N_B.

304. The oligonucleotide probe of claim 303, wherein the probe contains at least one degeneracy-reducing nucleotide.

305. The set of oligonucleotide probes of claim 303, wherein the set comprises a plurality of fluorescently labeled oligonucleotide probes, such that probes corresponding to different nucleotides X carry different, spectrally resolvable fluorescent dyes.

306. An oligonucleotide probe of the form 5'-N_B*(N)_kC-O-P-O-X-3', wherein N represents any nucleotide or abasic residue, with the proviso that at least one N is a trigger residue, N_B represents a moiety that is not extendable by ligase, * represents a detectable moiety, X represents a nucleotide, and k is 1-100, with the proviso that the detectable moiety may be present on any nucleotide of (N)_k instead of, or in addition to, N_B.

307. The oligonucleotide probe of claim 306, wherein the probe comprises at least one degeneracy-reducing nucleotide.

308. The set of oligonucleotide probes of claim 306, wherein the set comprises a plurality of fluorescently labeled oligonucleotide probes, such that probes corresponding to different nucleotides X carry different, spectrally resolvable fluorescent dyes.

309. An oligonucleotide probe of a form selected from the group consisting of: 3'-XNNNNRINI-5', 3'-XNNNNRIII-5', 3'-XNNNNRNII-5', 3'-XNNNNIRII-5', 3'-XNNNNRNI-5', 3'-XNNNNRII-5', 3'-XNNNNRII-5', and 3'-XNNNNIRI-5', wherein X and N represent any nucleotide, "R" represents a trigger residue, and at least one residue between the trigger residue and the 5' terminus of the oligonucleotide contains a label corresponding to the specific X.

310. The probe of claim 309, wherein R represents a deoxyribose residue.

311. A method of ligating a first polynucleotide to a second polynucleotide, the method comprising the steps of:

(a) providing a first polynucleotide immobilized in or on a semi-solid support;

(c) maintaining the first and second polynucleotides in the presence of ligase under conditions suitable for ligation, wherein at least one of the polynucleotides contains a trigger residue.

312. The method of claim 311, wherein the first polynucleotide is attached directly to the semi-solid support by a covalent or noncovalent linkage.

313. The method of claim 311, wherein the first polynucleotide is attached to a support that is immobilized in or on the semi-solid support.

314. The method of claim 313, wherein the semi-solid support is a gel and the support is a microparticle.

315. The method of claim 313, wherein the semi-solid support is a gel and the support is a magnetic microparticle.

316. A method of cleaving a polynucleotide, the method comprising the steps of:

(a) providing a polynucleotide immobilized in or on a semi-solid support, wherein the polynucleotide contains a trigger residue;

(b) contacting the polynucleotide with a cleavage agent; and

317. The method of claim 316, wherein the trigger residue is an abasic residue, deoxyinosine, or a residue containing a damaged base.

318. The method of claim 316, wherein the first polynucleotide is attached directly to the semi-solid support by a covalent or noncovalent linkage.

319. The method of claim 316, wherein the first polynucleotide is attached to a support that is immobilized in or on the semi-solid support.

320. The method of claim 319, wherein the semi-solid support is a gel and the support is a microparticle.

321. The method of claim 320, wherein the semi-solid support is a gel and the support is a magnetic microparticle.

322. A collection of components for preparing a population of microparticles, the collection comprising:

(a) a population of microparticles, wherein individual microparticles have attached thereto at least a first population of primers and a second population of primers, wherein the primers of the first population differ in sequence from the primers of the second population; and

(b) a library of nucleic acid fragments, wherein each nucleic acid fragment contains first and second nucleic acid segments of interest, and wherein the first and second primers correspond to universal sequences located outside the first and second nucleic acid segments of interest.

323. The collection of components of claim 322, wherein the first and second nucleic acid segments of interest are the 5' and 3' tags of a paired tag.

324. The collection of components of claim 322, wherein the nucleic acid fragments comprise an internal adapter containing one or more primer binding sites for amplification primers, so that each nucleic acid segment can be amplified by PCR.

325. The collection of components of claim 324, further comprising a primer complementary to a primer binding site of the internal adapter.

326. A microparticle having attached thereto a first population of substantially identical nucleic acid sequences and a second population of substantially identical nucleic acid sequences, wherein the first nucleic acid population comprises a first nucleic acid segment of interest and the second nucleic acid population comprises a second nucleic acid segment of interest.

327. The microparticle of claim 326, wherein the first and second nucleic acid segments of interest are the 5' tag and 3' tag of a paired tag.

328. The microparticle of claim 326, wherein the first and second nucleic acid segments of interest are a 5' tag and a 3' tag located a predetermined distance apart in a naturally occurring contiguous nucleic acid.

329. The microparticle of claim 327 or 328, wherein the first and second nucleic acid segments are amplified from a single larger nucleic acid fragment.

330. The microparticle of claim 329, wherein the amplification is performed in a single compartment of a PCR emulsion.

331. A population of microparticles of claim 326, wherein the first and second populations of substantially identical nucleic acid sequences attached to an individual microparticle differ at least in part from the first and second populations of substantially identical nucleic acid sequences attached to other individual microparticles.

332. The population of microparticles of claim 331, wherein the first population of nucleic acid sequences attached to an individual microparticle contains a first nucleic acid segment of interest, and the second population of nucleic acid sequences attached to the individual microparticle contains a second nucleic acid segment of interest.

333. The population of microparticles of claim 332, wherein the first and second nucleic acid segments of interest in the first and second populations of nucleic acid sequences attached to an individual microparticle are, or contain, the 5' tag and 3' tag of a paired tag.

334. The population of microparticles of claim 332, wherein the first and second nucleic acid segments of interest in the first and second populations of nucleic acid sequences attached to an individual microparticle are amplified from a single larger nucleic acid fragment.

335. The population of microparticles of claim 334, wherein the amplification is performed in a single compartment of a PCR emulsion.

336. The population of microparticles of claim 331, wherein the first and second populations of nucleic acid sequences are attached to individual microparticles by amplification in individual compartments of a PCR emulsion, wherein at least some of the compartments contain one microparticle and one nucleic acid fragment containing the first and second nucleic acid segments.

337. An array comprising the population of microparticles of any one of claims 331-336.

338. The array of claim 337, wherein the microparticles are immobilized in or on a semi-solid support.

339. A method of producing a microparticle having attached thereto a first population of substantially identical nucleic acid sequences and a second population of substantially identical nucleic acid sequences, the method comprising the steps of:

(a) providing a microparticle having attached thereto a first population of primers and a second population of primers;

(b) providing a nucleic acid fragment containing first and second nucleic acid segments, wherein the first and second nucleic acid segments are flanked by binding regions for the first and second primers, the sequences of the primer binding regions correspond to the first and second primers attached to the microparticle, and the segments are separated by at least one additional primer binding region;

(c) incubating the microparticle and the nucleic acid fragment in the presence of suitable amplification reagents and primers under conditions that permit amplification, so that the first and second nucleic acid sequences are amplified and attached to the microparticle.

340. The method of claim 339, wherein the first and second primer binding regions each contain an amplification primer binding region and a sequencing primer binding region.

341. The method of claim 339, wherein the at least one additional primer binding region contains binding regions for two amplification primers, such that each nucleic acid sequence is flanked by primer binding sites for a pair of amplification primers, so that both nucleic acid sequences can be amplified by PCR.

342. The method of claim 339, wherein the amplification is performed in a single compartment of a PCR emulsion.

343. The method of claim 339, wherein the first and second nucleic acid sequences each contain a tag, wherein the tags are the 5' and 3' tags of a paired tag.

344. A method of producing a population of microparticles having attached thereto different populations of nucleic acid sequences, the nucleic acid sequences within each population being substantially identical, the method comprising performing the method of claim 339, wherein the amplification is performed in a plurality of compartments of a PCR emulsion, wherein at least some of the compartments contain one microparticle and one nucleic acid fragment comprising first and second nucleic acid segments, and wherein the first and second nucleic acid sequences of individual nucleic acid fragments differ from the first and second nucleic acid segments.
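Purely as an illustrative model, the fraction of emulsion compartments containing exactly one microparticle and exactly one nucleic acid fragment may be estimated by treating the loading of each as an independent Poisson process. The Poisson assumption, the mean loading values, and the function names below are hypothetical and do not appear in the source.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k items in a compartment at the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_one_bead_one_fragment(bead_mean, fragment_mean):
    """Fraction of compartments holding exactly one microparticle and one
    fragment, assuming the two loadings are independent."""
    return poisson_pmf(1, bead_mean) * poisson_pmf(1, fragment_mean)


if __name__ == "__main__":
    # Hypothetical loadings: 1 bead and 0.1 fragment per compartment on average.
    print(round(fraction_one_bead_one_fragment(1.0, 0.1), 4))
```

Under this model, dilute fragment loading keeps most occupied compartments clonal at the cost of many compartments containing no fragment, which is consistent with the claim language requiring only that at least some compartments hold one microparticle and one fragment.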

345. A method of producing a microparticle array, the method comprising immobilizing the population of microparticles of claim 344 in or on a semi-solid support.

346. A method of nucleic acid sequencing, the method comprising:

(a) obtaining sequence information from a first population of nucleic acid molecules attached to a microparticle;

(b) obtaining sequence information from a second population of nucleic acid molecules attached to the same microparticle, wherein the first population of nucleic acid molecules and the second population of nucleic acid molecules differ at least in part in sequence.

347. The method of claim 346, wherein the sequence information of (a) and (b) is obtained sequentially.

348. The method of claim 346, wherein the first population of nucleic acid molecules comprises the 5' tag of a paired tag and the second population of nucleic acid molecules comprises the 3' tag.

349. A sequencing method comprising performing the method of claim 346 on a population of microparticles, wherein the first and second nucleic acid molecules attached to each microparticle differ at least in part in sequence from the first and second nucleic acid molecules attached to other microparticles.

350. The method of claim 349, wherein the first populations of nucleic acid molecules on the individual microparticles are sequenced in parallel, and the second populations of nucleic acid molecules on the individual microparticles are sequenced in parallel.

351. A composition comprising an aqueous solution containing 1.0-3.0% SDS, 100-300 mM NaCl, and 5-15 mM sodium bisulfate (NaHSO₄).

352. The composition of claim 351, having a pH of 2.0-3.0.

353. The composition of claim 351, comprising an aqueous solution containing about 2% SDS, about 200 mM NaCl, and about 10 mM sodium bisulfate (NaHSO₄).

354. The composition of claim 353, having a pH of 2.0-3.0.
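By way of example only, the solute masses for a composition within the recited ranges may be computed as sketched below. The sketch assumes that the SDS percentage is weight/volume and that anhydrous salts are weighed (NaCl 58.44 g/mol, NaHSO₄ 120.06 g/mol); the claims do not state either point. The function names are hypothetical, and the pH of 2.0-3.0 would still need to be verified by measurement.

```python
# Molar masses in g/mol for the anhydrous salts.
MOLAR_MASS = {"NaCl": 58.44, "NaHSO4": 120.06}


def recipe(volume_ml, sds_percent=2.0, nacl_mm=200.0, nahso4_mm=10.0):
    """Grams of each solute for the requested final volume.

    Assumption: SDS percent is weight/volume (g per 100 mL).
    """
    litres = volume_ml / 1000.0
    return {
        "SDS": sds_percent / 100.0 * volume_ml,
        "NaCl": nacl_mm / 1000.0 * litres * MOLAR_MASS["NaCl"],
        "NaHSO4": nahso4_mm / 1000.0 * litres * MOLAR_MASS["NaHSO4"],
    }


def within_recited_ranges(sds_percent, nacl_mm, nahso4_mm):
    """True if the concentrations fall in the ranges 1.0-3.0% SDS,
    100-300 mM NaCl, and 5-15 mM NaHSO4."""
    return (1.0 <= sds_percent <= 3.0
            and 100.0 <= nacl_mm <= 300.0
            and 5.0 <= nahso4_mm <= 15.0)


if __name__ == "__main__":
    for name, grams in recipe(1000).items():
        print(f"{name}: {grams:.3f} g")
```

For 1 L at the default concentrations this gives about 20 g SDS, 11.69 g NaCl, and 1.20 g NaHSO₄.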