CN110869515B

CN110869515B - Sequencing method for genome rearrangement detection

Info

Publication number: CN110869515B
Application number: CN201880045564.3A
Authority: CN
Inventors: M·科里尼; D·N·罗伯茨
Original assignee: Agilent Technologies Inc
Current assignee: Agilent Technologies Inc
Priority date: 2017-07-12
Filing date: 2018-07-10
Publication date: 2024-08-23
Anticipated expiration: 2038-07-10
Also published as: JP2024099616A; WO2019014218A2; US11505826B2; US20220364169A1; US20190017113A1; CN110869515A; EP3652345A4; WO2019014218A3; EP3652345A2; JP2020530270A; JP7539770B2

Abstract

The present disclosure relates to single-end sequencing methods for improving the detection of genomic rearrangements, such as deletions, insertions, inversions and translocations, present in polynucleotides. A first priming event on the adapter allows sequencing of the target sequence, and a second priming event allows identification of the sequence that was amplified and tagged by selective amplification. Combining priming events in the same direction facilitates read alignment and the identification of any genomic rearrangements.

Description

Sequencing methods for genomic rearrangement detection

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Non-Provisional Application No. 15/648,240, filed on July 12, 2017, which is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present disclosure relates to sequencing methods, compositions and kits for improving the detection of genomic rearrangements, such as fusion genes. The present disclosure also relates to methods for preparing libraries of target polynucleotides containing genomic rearrangements.

BACKGROUND

The ability to identify genomic rearrangements using nucleic acid sequencing methods has proven highly beneficial in the detection of human genetic disorders and diseases. A genomic rearrangement generally refers to any rearrangement of nucleotides in a nucleic acid strand, including the deletion, insertion, inversion or translocation of one or more nucleotides, and can be detected by sequencing the nucleic acid of interest and comparing the sequence data with a reference (such as a known nucleic acid sequence). Next-generation sequencing (NGS) can be used to rapidly analyze polynucleotides and detect any genomic rearrangements in them. NGS allows a large number of sequences to be analyzed simultaneously and in parallel. In some formats, polynucleotides such as DNA are immobilized on a solid surface via one or more adapters and amplified to increase signal intensity. Typically, a library for sequencing is prepared by fragmenting a sample into polynucleotide fragments, tagging the fragments with one or more adapters, and amplifying the polynucleotide fragments. The fragments can be amplified with one or more amplification primers. In sequencing-by-synthesis formats, the fragments are hybridized with sequencing primers, and labeled dideoxynucleotides are added enzymatically. The signals from the labeled dideoxynucleotides are detected and analyzed to determine the sequence.
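To make the four rearrangement classes concrete, the following is a minimal sketch that applies each one to a toy DNA string. It is an illustration under simple assumptions (uppercase A/C/G/T only, 0-based half-open coordinates, an inversion modeled as replacement by the reverse complement); the function names are hypothetical and not part of the disclosure.

```python
# Toy model of the rearrangement types named above, applied to a DNA string.
# Coordinates are 0-based and half-open [start, end), as in Python slicing.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def deletion(seq: str, start: int, end: int) -> str:
    """Remove the nucleotides in [start, end)."""
    return seq[:start] + seq[end:]


def insertion(seq: str, pos: int, inserted: str) -> str:
    """Insert `inserted` immediately before position `pos`."""
    return seq[:pos] + inserted + seq[pos:]


def inversion(seq: str, start: int, end: int) -> str:
    """Replace [start, end) with its reverse complement, which is how a
    double-stranded inversion appears when the top strand is read 5'->3'."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def translocation(seq_a: str, break_a: int, seq_b: str, break_b: int) -> str:
    """Join the left part of seq_a to the right part of seq_b (one derivative junction)."""
    return seq_a[:break_a] + seq_b[break_b:]


reference = "AAACCCGGGTTT"
print(deletion(reference, 3, 6))   # AAAGGGTTT
print(inversion(reference, 0, 6))  # GGGTTTGGGTTT
```

Comparing each rearranged string with `reference` is the toy analogue of comparing sequence data with a known reference sequence.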

The polynucleotides of interest can be analyzed using single-end or paired-end sequencing methods. Single-end sequencing involves sequencing a genomic fragment from one end toward the other. Single-end sequencing provides one read per fragment, and that read corresponds to n base pairs at one of the two ends of the fragment, where n is the number of sequencing cycles. Single-end sequencing is generally poorly suited to detecting large-scale genomic rearrangements and repetitive sequence elements. A single-end read that spans a fusion junction provides base-pair evidence of the fusion event. However, it can be difficult to ensure that the single-end read has extended far enough, in base pairs, to identify the fusion event.
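The difficulty described above can be expressed as a small check: a single-end read of n cycles only gives base-pair evidence of a fusion if it crosses the junction with enough bases on both sides. This is a minimal sketch; the `min_anchor` threshold is a hypothetical parameter chosen for illustration, and real aligners apply their own criteria.

```python
def single_end_read(fragment: str, cycles: int) -> str:
    """A single-end read reports the first `cycles` bases from one end of the fragment."""
    return fragment[:cycles]


def read_spans_junction(junction: int, cycles: int, min_anchor: int) -> bool:
    """Return True if a read of `cycles` bases, starting at fragment position 0,
    crosses a fusion junction with at least `min_anchor` bases on each side.

    `junction` is the 0-based index of the first base from the partner gene.
    `min_anchor` is a hypothetical threshold chosen for illustration.
    """
    left = min(junction, cycles)        # bases read before the junction
    right = max(0, cycles - junction)   # bases read after the junction
    return left >= min_anchor and right >= min_anchor


# The same junction is or is not resolved depending on read length.
print(read_spans_junction(junction=80, cycles=100, min_anchor=20))  # True
print(read_spans_junction(junction=80, cycles=50, min_anchor=20))   # False
```

Because the junction position varies from fragment to fragment, a fixed read length resolves some junctions and misses others.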

Paired-end methods involve reading a nucleic acid fragment from one end toward the other until a specified read length is reached, followed by another round of reading from the other end of the fragment. In paired-end methods, a forward read and a reverse read are performed, and the data are paired as adjacent sequences. The sequences are matched to a reference to identify variants. Paired-end sequencing is often used to detect genomic rearrangements because it generally provides good positional information, making it easier to resolve structural rearrangements present in the genome. However, many sequencing instruments are not configured to perform paired-end sequencing and are capable only of single-end sequencing.
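For comparison with the single-end case, the following sketch shows where the two paired-end reads come from and one common way the positional information is used. The `is_discordant` rule (mates on different chromosomes or farther apart than an expected insert size) is a simplified, hypothetical heuristic for illustration, not a description of any particular aligner.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Forward read: first read_len bases of the top strand.
    Reverse read: first read_len bases of the bottom strand, i.e. the reverse
    complement of the last read_len bases of the top strand."""
    return fragment[:read_len], reverse_complement(fragment)[:read_len]


def is_discordant(chrom1: str, pos1: int, chrom2: str, pos2: int,
                  max_insert: int) -> bool:
    """Toy flag: mates mapped to different chromosomes, or farther apart than
    the expected maximum insert size, suggest a structural rearrangement."""
    return chrom1 != chrom2 or abs(pos2 - pos1) > max_insert


print(paired_end_reads("AAAACCCCGGGGTTTA", 4))  # ('AAAA', 'TAAA')
```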

WO 2007133831A2 discusses methods and compositions for obtaining nucleotide sequence information of a target sequence using adapters interspersed in a target polynucleotide. The method can be used to insert multiple adapters at spaced positions within a target polynucleotide or fragment. The adapters can serve as platforms for interrogating adjacent sequences using various sequencing chemistries, such as those that identify nucleotides by primer extension, probe ligation, and the like. That disclosure includes methods and compositions for inserting a known adapter sequence into a target sequence so that a contiguous target sequence is interrupted by the adapter. It notes that the entire target sequence can be identified by sequencing both "upstream" and "downstream" of the adapter.

WO2015112974A1 discusses aspects related to methods for preparing and analyzing nucleic acids. In some embodiments, methods for preparing nucleic acids for sequence analysis (e.g., using next-generation sequencing) are provided.

WO2015148219A1 discusses a method for analyzing a target nucleic acid fragment, the method comprising: using one strand of the target as a template, generating a first strand by primer extension of a first oligonucleotide primer that comprises, from 5' to 3', an overhang adapter region, a primer ID region, a sequencing primer binding site, and a target-specific sequence region complementary to one end of the target fragment; optionally removing unbound primers; amplifying the target from the generated first strand to produce an amplification product; and detecting the amplification product. That disclosure also discusses why unique primers can be used in such target analysis methods.

Improved methods for detecting genomic rearrangements using single-end sequencing would be a useful contribution to the field, particularly if such methods can be combined with high-throughput sequencing analysis.

SUMMARY OF THE INVENTION

Methods, compositions and kits for detecting genomic rearrangements in polynucleotides are provided. The methods, compositions and kits of the present invention can be used to detect genomic rearrangements more easily and reliably using single-end sequencing of nucleic acids of interest.

These and other features and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The present teachings are best understood from the following detailed description when read in conjunction with the accompanying drawings. The features are not necessarily drawn to scale.

FIG. 1 shows one embodiment of a method for preparing polynucleotides for sequencing.

FIG. 2 shows another embodiment of a method for preparing polynucleotides for sequencing.

DEFINED TERMS

It should be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. The defined terms are in addition to the technical and scientific meanings of those terms as commonly understood and accepted in the technical field of the present teachings.

As used in the specification and the appended claims, and in addition to their ordinary meanings, the terms "substantial" and "substantially" mean within limits or degrees acceptable to those of ordinary skill in the art. For example, "substantially canceled" means that those skilled in the art would consider the cancellation acceptable.

As used in the specification and the appended claims, and in addition to their ordinary meanings, the terms "approximately" and "about" mean within limits or amounts acceptable to one of ordinary skill in the art. The term "about" generally refers to plus or minus 15% of the indicated value. For example, "about 10" may indicate a range of 8.5 to 11.5. Similarly, "approximately the same" means that one of ordinary skill in the art would consider the items being compared to be the same.
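The plus-or-minus 15% reading of "about" is simple arithmetic; the sketch below computes the bounds for a given value. The function name `about_range` is hypothetical and used only for illustration.

```python
def about_range(value: float, tolerance: float = 0.15) -> tuple[float, float]:
    """Return the (low, high) bounds of "about `value`" under a +/-15% reading."""
    delta = abs(value) * tolerance
    return value - delta, value + delta


low, high = about_range(10)
print(round(low, 6), round(high, 6))  # 8.5 11.5
```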

The terms "polynucleotide" and "nucleic acid" are used interchangeably herein to describe a polymer of any length composed of nucleotides, e.g., greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, or up to about 10,000 or more bases, such as deoxyribonucleotides or ribonucleotides, or synthetically produced compounds (e.g., PNAs, as described in U.S. Pat. No. 5,948,902 and the references cited therein) that can hybridize with naturally occurring nucleic acids in a sequence-specific manner similar to that of two naturally occurring nucleic acids, e.g., by participating in Watson-Crick base-pairing interactions. Naturally occurring nucleotides include guanine, cytosine, adenine and thymine (G, C, A and T, respectively). As used in the specification and the appended claims, unless otherwise indicated, a polynucleotide can be an adapted polynucleotide, a polynucleotide amplicon, or an adapted polynucleotide amplicon. An adapted polynucleotide differs from the polynucleotide of interest in that an adapter has been added to the polynucleotide of interest.

As used herein, the term "target nucleic acid" or "target" refers to a nucleic acid containing a target nucleic acid sequence. The target nucleic acid can be single-stranded or double-stranded, and is often double-stranded DNA. As used herein, "target nucleic acid sequence," "target sequence," or "target region" refers to a specific sequence or its complement. The target sequence can be in a nucleic acid in vitro, or in vivo within the genome of a cell, and can be in any single-stranded or double-stranded form.

"Hybridization" or "hybridizing" refers to the process by which fully or partially complementary nucleic acid strands come together under specified hybridization conditions to form a double-stranded structure or region in which the two constituent strands are held together by hydrogen bonds. Although hydrogen bonds typically form between adenine and thymine or uracil (A and T or U) or between cytosine and guanine (C and G), other base pairs can also form hydrogen bonds (e.g., Adams et al., "The Biochemistry of the Nucleic Acids," 11th ed., 1992).

The term "primer" refers to an enzymatically prepared or synthetic oligonucleotide that, when it forms a duplex with a polynucleotide template, can serve as a starting point for nucleic acid synthesis and be extended from its 3' end along the template to form an extended duplex. The sequence of nucleotides added during extension is determined by the sequence of the template polynucleotide. Primers serve as starting points for nucleotide polymerization catalyzed by DNA polymerase, RNA polymerase or reverse transcriptase. A primer can be 4-1000 bases or more in length, for example 10-500 bases.
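The annealing and 3' extension described above can be modeled on strings. This is a minimal sketch assuming perfect complementarity, a single binding site, and both sequences written 5'->3'; the helper name `extend_primer` is hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(template: str, primer: str) -> Optional[str]:
    """Anneal `primer` to `template` (both written 5'->3') and extend it.

    The primer hybridizes where the template contains the primer's reverse
    complement. The polymerase then adds bases to the primer's 3' end, copying
    the template toward the template's 5' end, so the product is the primer
    followed by the reverse complement of the template upstream of the site.
    Returns None if there is no perfectly complementary site.
    """
    site = template.find(reverse_complement(primer))
    if site == -1:
        return None
    return primer + reverse_complement(template[:site])


print(extend_primer("ACGTTGAAACCC", "GGGTTT"))  # GGGTTTCAACGT
```

The extended product equals the reverse complement of the template from the binding site to its 5' end, which is what "the added sequence is determined by the template" means in this toy model.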

As used herein, the term "primer extension" refers to extending, with a polymerase, a primer that has been annealed to a specific oligonucleotide template.

The term "adapter" refers to a nucleic acid molecule that is attached to a polynucleotide of interest to form a synthetic polynucleotide. An adapter can be single-stranded or double-stranded and can comprise DNA, RNA and/or artificial nucleotides. An adapter can be located at an end of the polynucleotide of interest, or in its middle or interior. An adapter can add one or more functions or characteristics to the polynucleotide of interest, such as providing a priming site for amplification or sequencing, or adding a barcode. For example, an adapter can include a universal primer and/or a universal priming site, including a priming site for sequencing. As a further example, an adapter can contain one or more barcodes of various types or for various purposes, such as a molecular barcode, a sample barcode and/or a target-specific barcode. Various adapters are known in the art and can be used, as is or after modification, in the methods, compositions and kits of the present invention. For example, adapters include Y adapters, which can be attached to a polynucleotide to produce a library with variable 5' ends. Adapters can also comprise separate sequences (e.g., A/B adapters), in which the A adapter is attached to one end of a polynucleotide and the B adapter to the other end. Adapters further include stem-loop adapters, in which a hairpin loop is attached to the end of a polynucleotide; a portion (usually the stem) can be cleaved before amplification or sequencing. Adapters can be attached to the polynucleotide of interest by any suitable technique, including but not limited to ligation, use of a transposase, hybridization and/or primer extension. For example, an adapter can be ligated to an end of the polynucleotide of interest. As another example, an adapter can be attached by using a transposase to insert a transposon comprising the adapter into the polynucleotide of interest, thereby placing the adapter at the end of a fragment of the polynucleotide of interest. In some embodiments, an adapter comprises a target-specific primer and a target-specific barcode, which allow the adapter to be attached to the polynucleotide of interest (more specifically, to a complementary polynucleotide) by primer extension of the target-specific primer.
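As a simple data model for the adapter features listed above, the sketch below represents an adapter as a priming site plus an optional barcode and joins adapters to an insert. For simplicity it writes every component as it would appear on one top strand, 5'->3'; that convention, and the names `Adapter` and `attach_adapters`, are assumptions for illustration and do not model ligation chemistry or Y/stem-loop structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Adapter:
    priming_site: str      # e.g. a universal or sequencing-primer binding site
    barcode: str = ""      # optional sample or molecular barcode

    @property
    def sequence(self) -> str:
        return self.priming_site + self.barcode


def attach_adapters(insert: str, five_prime: Adapter,
                    three_prime: Optional[Adapter] = None) -> str:
    """Return 5'-[adapter]-insert-[optional adapter]-3' as one top-strand string
    (an A/B-style layout when both adapters are given)."""
    tail = three_prime.sequence if three_prime is not None else ""
    return five_prime.sequence + insert + tail


print(attach_adapters("ACGT", Adapter("TTT", "GG"), Adapter("CC")))  # TTTGGACGTCC
```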

The term "sequencing" refers to determining the identity of one or more nucleotides, i.e., whether each nucleotide is G, A, T or C.

The term "single-end sequencing" refers to determining the sequence of a polynucleotide using reads from one end of the polynucleotide ("single-end reads"). Single-end reads can be obtained by any sequencing process, including next-generation sequencing and other massively parallel sequencing technologies. Instruments configured to perform single-end sequencing are commercially available from many companies. For example, Illumina's HiSeq 2500 can provide single-end read lengths of 50 bp and 100 bp. In some embodiments, the nominal, average, mean or absolute length of a single-end read is at least 20, at least 30, at least 40, or at least 50 contiguous nucleotides. In some embodiments, the nominal, average, mean or absolute length of a single-end read is at most 300, at most 200, at most 150, at most 120, or at most 100 contiguous nucleotides. Any of the foregoing minimum and maximum values can be combined to form a range.

As used herein, the term "portion" or "fragment" of a sequence refers to any part of a sequence (e.g., a nucleotide subsequence or an amino acid subsequence) that is smaller than the entire sequence. A portion of a polynucleotide can be of any length, for example at least 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 300 or 500 or more nucleotides. A portion of a guide sequence can be about 50%, 40%, 30%, 20% or 10% of the guide sequence, for example one-third of the guide sequence or less, e.g., 7, 6, 5, 4, 3 or 2 nucleotides in length.

The term "fusion gene" refers to a polynucleotide formed from two previously separate genes. Fusion genes can arise from translocations, interstitial deletions or chromosomal inversions, which frequently occur in human cancer cells. A fusion gene can lead to expression of a fusion transcript, which is translated into a fusion protein that alters the normal regulatory pathways of the cell and/or promotes cancer cell growth. Gene variants can also produce aberrant proteins that affect normal regulatory pathways. Many fusion gene polynucleotides are known, and more are being discovered. For example, US20100279890, US20140120540, US20140272956 and US20140315199 disclose many fusion genes associated with cancer and other diseases, as well as methods for detecting such fusion genes. The methods, compositions and kits of the present invention can be used both to detect known gene fusions and to discover previously unknown ones.
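One way to see how a read provides base-pair evidence of a fusion is a split-read check: find a split point where the left part of the read occurs in one gene and the right part occurs in the other. The sketch below uses exact substring matching on a single strand; the gene names, sequences and `min_anchor` value are hypothetical, and real fusion callers use alignment with mismatches and both strands.

```python
from typing import Optional


def find_fusion_breakpoint(read: str, gene_a: str, gene_b: str,
                           min_anchor: int = 10) -> Optional[int]:
    """Return the first split k such that read[:k] occurs in gene_a and read[k:]
    occurs in gene_b, with at least `min_anchor` bases on each side; else None.

    Exact, same-strand matching only. If the two genes share sequence near the
    junction, several k may qualify; this sketch simply returns the smallest.
    """
    for k in range(min_anchor, len(read) - min_anchor + 1):
        if read[:k] in gene_a and read[k:] in gene_b:
            return k
    return None


gene_a = "AAAAACCCCCGGGGG"   # hypothetical gene A
gene_b = "TTTTTATATATCGCG"   # hypothetical gene B
print(find_fusion_breakpoint("ACCCCCATATAT", gene_a, gene_b, min_anchor=4))  # 6
```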

As used herein, the term "priming site" refers to a site within an oligonucleotide or polynucleotide that is configured to hybridize with a primer, so that adjacent sequences, or sequences close enough for single-end sequencing, can be amplified or sequenced, for example by primer extension. The priming site can be a sequence already present in the polynucleotide of interest, or a sequence added to the polynucleotide by attaching an adapter that contains the priming site. An adapter containing a priming site can be added by ligation, by use of a transposase, by primer extension, or by other techniques.

In the present disclosure, numerical ranges are inclusive of the numbers defining the range. Wherever the word "comprising" appears in the present disclosure, it is contemplated that the phrase "consisting essentially of" or "consisting of" may be used instead. It should be recognized that chemical structures and formulae may be elongated or enlarged for illustrative purposes.

As used in the specification and the appended claims, the terms "a", "an" and "the" include both singular and plural referents unless the context clearly indicates otherwise. Thus, for example, "a primer" includes one primer and a plurality of primers. In the present disclosure, ordinal terms such as first, second and third do not indicate that the first event occurs before the second event (unless the context indicates otherwise); rather, they are used to distinguish different events from one another.

Many numerical ranges are provided herein. It should be understood that each intervening value between the upper and lower limits of a range, to one-tenth of the unit of the lower limit unless the context clearly indicates otherwise, is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that range is encompassed within the scope of the present invention. The upper and lower limits of these smaller ranges may independently be included in or excluded from the range, and each range in which either, neither or both limits are included in the smaller range is also encompassed within the scope of the present invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the present invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present disclosure belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present teachings, some exemplary methods and materials are now described.

All patents and publications mentioned herein are expressly incorporated herein by reference. Any publication is cited for its disclosure prior to the filing date and should not be construed as an admission that the present claims are not entitled to antedate such publication. In addition, the publication dates provided may differ from the actual publication dates, which may need to be independently confirmed.

As will be apparent to those skilled in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features that can be readily separated from or combined with the features of any of the other embodiments without departing from the scope or spirit of the present teachings. Any recited method can be carried out in the recited order of events or in any other order that is logically possible.

DETAILED DESCRIPTION OF THE INVENTION

In some embodiments, the present disclosure provides a method for preparing a polynucleotide for sequencing by attaching a target-specific barcode. The method includes amplifying the polynucleotide with a first amplification primer and a second amplification primer, wherein the first amplification primer comprises a first priming sequence and a target-specific barcode, and the first priming sequence hybridizes to a first priming site of the polynucleotide. This amplification produces a polynucleotide amplicon comprising a sequence identical or complementary to the polynucleotide of interest, together with the target-specific barcode.
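The amplification step can be sketched as a toy in-silico PCR: the first primer (a target-specific barcode tail plus a priming sequence) binds the first priming site, the second primer binds upstream (for example at an adapter priming site), and the product carries both the covered template and the barcode. This is a minimal sketch assuming perfect matches, one binding site per primer, and everything written on the template's top strand; the function name and example sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def amplify(template: str, first_priming_seq: str, target_barcode: str,
            second_primer: str) -> Optional[str]:
    """Toy in-silico amplification, written on the template's top strand (5'->3').

    first_priming_seq: 3' part of the first primer; it anneals where the top
        strand carries its reverse complement (the first priming site).
    target_barcode: 5' tail of the first primer (target-specific barcode).
    second_primer: identical to a top-strand sequence upstream of the first
        priming site (e.g. an adapter priming site).
    Returns the amplicon top strand, or None if either primer cannot bind.
    """
    first_site = template.find(reverse_complement(first_priming_seq))
    second_site = template.find(second_primer)
    if first_site == -1 or second_site == -1:
        return None
    if second_site + len(second_primer) > first_site:
        return None  # primers must face each other, second primer upstream
    covered = template[second_site:first_site + len(first_priming_seq)]
    # The barcode tail is copied into the amplicon; on the top strand it reads
    # as its reverse complement at the 3' end.
    return covered + reverse_complement(target_barcode)


# adapter site "GATTACA" + target "GCTGCT" + first priming site "AAACCC" + "TTT"
print(amplify("GATTACAGCTGCTAAACCCTTT", "GGGTTT", "AC", "GATTACA"))
# GATTACAGCTGCTAAACCCGT
```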

The first amplification primer is target-specific (i.e., it is complementary to and/or hybridizes with a target sequence within the adapted polynucleotide). The first amplification primer further comprises a target-specific barcode, which is a barcode specific to the target sequence, for example a barcode specific to a portion of a gene (such as a portion of a fusion gene). The amplification produces a polynucleotide amplicon comprising a sequence identical or complementary to the polynucleotide of interest, together with the target-specific barcode. The second amplification primer either (1) hybridizes with a portion of an adapter attached to the polynucleotide at a distance from the first priming site, or (2) hybridizes with a second priming site of the polynucleotide that is located at a distance from the first priming site. In some embodiments, the method can further include attaching an adapter to the polynucleotide to form an adapted polynucleotide, wherein the adapter comprises the second priming site and, optionally, an adapter barcode. In some embodiments, the second priming site is on the adapter and is a universal priming site and/or a site for a sequencing primer, and/or the second priming site is a universal priming site at the 5' end of the adapted polynucleotide. In some embodiments, the adapter and/or the second priming site is at the 5' end of a strand of the polynucleotide, and the first priming site is at the 3' end of that strand. In some embodiments, the adapter barcode is a sample barcode or a molecular barcode. A molecular barcode can be a unique sequence, in that it is unique within the set of adapters attached to a pool of polynucleotides of interest.
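A molecular barcode is useful because it is unique within the set of adapters attached to a pool. The following is a minimal sketch of drawing such a set at random and grouping reads by barcode so that reads derived from one tagged molecule stay together. Random draws with a fixed seed are an assumption made for reproducibility; practical barcode designs usually also enforce a minimum edit distance between barcodes, which this sketch does not do.

```python
import random
from collections import defaultdict


def make_unique_barcodes(n: int, length: int, seed: int = 0) -> list[str]:
    """Draw n distinct random barcodes of the given length (requires n <= 4**length)."""
    if n > 4 ** length:
        raise ValueError("not enough distinct barcodes of this length")
    rng = random.Random(seed)
    barcodes: set[str] = set()
    while len(barcodes) < n:
        barcodes.add("".join(rng.choice("ACGT") for _ in range(length)))
    return sorted(barcodes)


def group_by_barcode(tagged_reads: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (molecular_barcode, read) pairs so reads from one molecule stay together."""
    groups: dict[str, list[str]] = defaultdict(list)
    for barcode, read in tagged_reads:
        groups[barcode].append(read)
    return dict(groups)
```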

In some embodiments, the present disclosure provides methods, compositions and kits for preparing a polynucleotide library for sequencing by attaching target-specific barcodes. A pool of polynucleotides is amplified using a first set of amplification primers and a second set of amplification primers, wherein the first set of amplification primers hybridizes with a plurality of different sequences in the polynucleotide pool, and each primer of the first set comprises a different target-specific barcode. In some embodiments, an adapter comprising an adapter barcode is attached to the polynucleotide amplicons. The second set of amplification primers either (1) hybridizes with a portion of an adapter attached to a polynucleotide at a distance from a first priming site, or (2) hybridizes with a second priming site of a polynucleotide that is located at a distance from the first priming site. Amplification with the first and second sets of primers produces a library of polynucleotide amplicons. Adapters can be added to the polynucleotide amplicons. In some embodiments, the adapter is added before amplification and comprises a second priming site that hybridizes with the second set of amplification primers. In some embodiments, the adapter is added after amplification, for example to provide a sequencing priming site on the polynucleotide amplicons.
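Because each primer in the first set carries a different target-specific barcode, reading the barcode later identifies which primer, and therefore which target, produced an amplicon. The sketch below builds such a primer set and maps barcode reads back to targets. Target names such as "geneA" and the sequences are hypothetical placeholders, and the full primer is written simply as the barcode tail followed by the priming sequence.

```python
from typing import Optional


def build_primer_set(targets: dict[str, str],
                     barcodes: list[str]) -> dict[str, tuple[str, str]]:
    """Pair each target's priming sequence with its own target-specific barcode.

    targets: hypothetical target name -> priming sequence.
    Returns barcode -> (target name, full primer written 5' barcode + priming sequence).
    """
    chosen = barcodes[: len(targets)]
    if len(chosen) < len(targets) or len(set(chosen)) != len(chosen):
        raise ValueError("each target needs its own distinct barcode")
    return {bc: (name, bc + priming)
            for (name, priming), bc in zip(targets.items(), chosen)}


def demultiplex(barcode_reads: list[str],
                primer_set: dict[str, tuple[str, str]]) -> list[Optional[str]]:
    """Map each barcode read back to the target whose primer carried that barcode."""
    return [primer_set[bc][0] if bc in primer_set else None for bc in barcode_reads]


primers = build_primer_set({"geneA": "GGGTTT", "geneB": "CCCAAA"}, ["AC", "GT"])
print(demultiplex(["GT", "AC", "TT"], primers))  # ['geneB', 'geneA', None]
```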

Each of the plurality of polynucleotide amplicons can be sequenced at two locations by performing a first primer extension and a second primer extension, wherein, for each adapted polynucleotide amplicon, the first and second primer extensions used for sequencing proceed in the same direction. Genomic rearrangements can be identified based on the data generated by sequencing from the first and second primer extensions.
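The way the two same-direction reads combine can be sketched as follows: the target read shows which reference sequences are actually present in the amplicon, and the barcode read says which target-specific primer produced it. If the target read also contains sequence from a gene other than the primed one, the amplicon is a candidate rearrangement such as a fusion. This is a minimal sketch using exact shared k-mers as a stand-in for alignment; it does not model strand or read direction, and the gene names, sequences and k value are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def call_rearrangement(read1: str, barcode_read: str,
                       barcode_to_gene: dict[str, str],
                       reference: dict[str, str], k: int = 8) -> tuple:
    """Combine the two reads of one amplicon.

    read1: target read primed from the adapter priming site.
    barcode_read: target-specific barcode read from the second priming event.
    Returns (primed_gene, partner_genes): the gene the barcode says was primed,
    and any other reference genes that share an exact k-mer with read1.
    A non-empty partner list marks a candidate rearrangement such as a fusion.
    """
    primed = barcode_to_gene.get(barcode_read)
    read_kmers = kmers(read1, k)
    hits = sorted(g for g, seq in reference.items() if read_kmers & kmers(seq, k))
    partners = [g for g in hits if g != primed]
    return primed, partners


reference = {"geneA": "AAAAACCCCCGGGGG", "geneB": "TTTTTATATATCGCG"}
print(call_rearrangement("ACCCCCATATAT", "AC", {"AC": "geneA"}, reference, k=5))
# ('geneA', ['geneB'])
```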

In other embodiments, the present disclosure provides compositions and kits for detecting a genomic rearrangement in a polynucleotide having a first binding site. The compositions and kits comprise first and second amplification primers. The first amplification primer comprises a target-specific primer and a target-specific barcode. The compositions and kits may further comprise an adapter, which comprises a second priming site and an adapter barcode. In some embodiments, the second amplification primer comprises a priming sequence complementary or identical to a sequence within the adapter, such as the second priming site. In some embodiments of the compositions and kits, the second amplification primer either (1) hybridizes to a portion of the adapter attached to the polynucleotide at a distance from the first priming site, or (2) hybridizes to the second priming site of the polynucleotide, wherein the second priming site is at a distance from the first priming site. In some embodiments, the adapter and/or the second priming site is at the 5' end of a strand of the polynucleotide, and the first priming site is at the 3' end of that strand.

In other embodiments, the present disclosure provides methods, compositions and kits for detecting genomic rearrangements in polynucleotides. The methods, compositions and kits involve amplifying a polynucleotide with a first amplification primer and a second amplification primer. The first amplification primer hybridizes to a first priming site of the polynucleotide and further comprises a target-specific barcode. The amplification produces a polynucleotide amplicon comprising sequences identical or complementary to the polynucleotide of interest and to the target-specific barcode. The polynucleotide amplicon is sequenced at first and second positions by performing a first primer extension and a second primer extension, which can be performed in the same direction.

In the foregoing methods, compositions and kits, the target-specific barcode is specific for a target, such as a gene, a portion of a gene, a fusion gene, a portion of a fusion gene, or another polynucleotide of interest. The fusion gene can be a known fusion gene, including the junction of a known fusion gene, and/or a suspected or hypothetical fusion gene, or the junction of such a fusion gene. The target can be a genomic rearrangement, such as a deletion, insertion, inversion or translocation in a polynucleotide of interest. In some embodiments, the target is a cDNA junction or an exon junction.

In some embodiments, the second amplification primer hybridizes to a portion of the adapter, such as the second priming site, which can be a sequencing priming site of the adapter. In some embodiments, the adapter-bearing polynucleotide comprises the adapter at its 5' end and/or the target-specific barcode at its 3' end.

In some embodiments, the polynucleotide of interest comprises a plurality of polynucleotides of interest, and the method comprises attaching a plurality of adapters to the plurality of polynucleotides, thereby forming a plurality of adapter-bearing polynucleotides that each comprise a different adapter barcode. Alternatively or additionally, where the polynucleotide of interest comprises a plurality of polynucleotides of interest, the first amplification primer comprises a plurality of first amplification primers having different target-specific primers and target-specific barcodes, thereby forming a plurality of adapter-bearing polynucleotide amplicons that each comprise a different target-specific barcode.

In some embodiments, the adapter-bearing polynucleotide amplicon is sequenced at first and second positions by performing a first primer extension and a second primer extension in the same direction. In some embodiments, the first primer extension is performed with a first sequencing primer that is complementary or identical to a portion of the adapter, such as the second priming site. In some embodiments, the second primer extension is performed with a second sequencing primer that is complementary or identical to a portion of the first amplification primer, such as a portion adjacent to, or close enough to, the target-specific barcode to allow single-end sequencing of that barcode.

Sequencing by primer extension is performed by hybridizing a primer to the polynucleotide amplicon, extending the primer by adding one or more labeled nucleotides, and detecting the incorporated labeled nucleotides. The sequencing primers may be complementary or identical to sequences on the adapters. In some embodiments, the first and second primer extensions are performed in the same direction on the polynucleotide in separate sequencing runs. In some embodiments, the sequencing is next-generation sequencing (NGS) or massively parallel sequencing. Data generated by the first and/or second primer extension can be compared with known nucleic acid sequences, such as a known gDNA sequence.
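The hybridize, extend and detect cycle described above can be illustrated with a toy, error-free simulation. This is a sketch under stated assumptions, not an instrument model: the template is written 5'→3', the sequencing primer anneals to the template's 3' end, exactly one labeled nucleotide is incorporated and detected per cycle, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sequence_by_extension(template: str, primer: str, cycles: int) -> str:
    """Toy sequencing by primer extension: one labeled base incorporated and read per cycle.

    `template` is written 5'->3'; `primer` must be the reverse complement of the
    template's 3' end, so extension proceeds toward the template's 5' end.
    """
    k = len(primer)
    if revcomp(template[-k:]) != primer:
        raise ValueError("primer does not anneal to the 3' end of the template")
    calls = []
    pos = len(template) - k - 1  # first unpaired template base beyond the primer's 3' end
    for _ in range(cycles):
        if pos < 0:
            break  # extension has reached the 5' end of the template
        calls.append(template[pos].translate(COMPLEMENT))  # base paired opposite template[pos]
        pos -= 1
    return "".join(calls)


print(sequence_by_extension("GATTACACCGGT", "ACCG", cycles=5))  # GTGTA
```

Running enough cycles to reach the template's 5' end yields the primer plus its extension as the full reverse complement of the template, which is the read that would then be compared with a reference.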

The methods, compositions and kits of the present invention can be used to sequence polynucleotides including genomic DNA (gDNA), complementary DNA (cDNA) derived from RNA templates (e.g., messenger RNA (mRNA) or microRNA), mitochondrial DNA (mtDNA), RNA (e.g., mRNA or microRNA) and other polynucleotides. The polynucleotides can be of any origin, such as microbial, viral, fungal, plant or mammalian.

In some embodiments, the methods, compositions and kits of the present invention are used to detect the presence, position or absence of genomic rearrangements in polynucleotides of interest. A genomic rearrangement can be a deletion, duplication, insertion, inversion or translocation, and the methods, compositions and kits can be used to detect whether certain genomic sequences or genes in the polynucleotides of interest have been deleted, duplicated, inserted, inverted or translocated. In some embodiments, they are used specifically to detect genomic deletions; in others, genomic duplications, insertions, inversions or translocations. In some embodiments, they are used to detect genomic rearrangements in polynucleotides such as gDNA or cDNA derived from RNA. In some embodiments, the genomic rearrangement is present at a frequency of about 100% or less, about 50% or less, about 10% or less, about 5% or less, or about 1% or less. In some embodiments, the method further comprises detecting genomic rearrangements using single-end sequencing of the polynucleotide amplicon, for example by identifying genomic rearrangements from the data generated by the first and second primer extensions. In some embodiments, the genomic rearrangement is a translocation.
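One way to see how the two same-direction reads help locate a rearrangement is a simple discordance check. The sketch below is hypothetical and heavily simplified: it assumes the barcode in read 2 identifies where the target-specific primer binds in the reference, that read 1 (primed from the adapter) has already been aligned, and that in an unrearranged fragment read 1 maps to the same chromosome and strand, upstream of the primer site by no more than 600 bases (the upper end of one fragment-size range given in this disclosure). The single-coordinate `Locus` type and the classification rules are illustrative assumptions; practical structural-variant callers weigh many more signals.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    pos: int     # reference coordinate (simplified: one coordinate per read or site)
    strand: str  # "+" or "-"


def classify_read1(read1: Locus, primer_site: Locus, max_insert: int = 600) -> str:
    """Compare where the adapter-primed read maps with where the target-specific
    primer (identified from the barcode in read 2) binds."""
    if read1.chrom != primer_site.chrom:
        return "translocation"
    if read1.strand != primer_site.strand:
        return "inversion"
    # Signed distance from read 1 to the primer site, measured along the target strand;
    # in an unrearranged fragment read 1 lies upstream, within one fragment length.
    if primer_site.strand == "+":
        gap = primer_site.pos - read1.pos
    else:
        gap = read1.pos - primer_site.pos
    if gap < 0:
        return "unexpected order"
    if gap > max_insert:
        return "deletion"
    return "no rearrangement detected"


site = Locus("chr2", 1_400, "+")  # hypothetical primer-binding coordinate
for r1 in (Locus("chr2", 1_000, "+"), Locus("chr2", 100, "+"),
           Locus("chr2", 1_200, "-"), Locus("chr22", 5_000, "+")):
    print(r1, "->", classify_read1(r1, site))
```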

Sequencing methods useful for detecting genomic rearrangements in polynucleotides are provided. The methods allow genomic rearrangements to be detected more easily and reliably using single-end sequencing of the nucleic acids of interest, and can be used in next-generation sequencing (NGS) workflows to detect deletions, insertions, inversions and translocations in polynucleotides of interest. The methods involve sequencing by first and second primer extensions performed in the same direction, which improves the accuracy of rearrangement detection. The combined sequence data from the two primer extensions facilitate read alignment and the identification of genomic rearrangements: because both reads are generated in the same direction, the relative positions of nucleic acid sequences within a polynucleotide can be determined more accurately. Compared with standard single-end sequencing, the methods improve the ability to determine the relative positions of nucleotides in the genome, and so resolve structural rearrangements more effectively.

The methods of the present invention can be used in high-throughput sequencing, such as next-generation sequencing (NGS). In some embodiments, high-throughput sequencing comprises three steps: library preparation, immobilization and sequencing. Polynucleotides are typically fragmented at random, and adapters are ligated to one or both ends of the fragments. The adapters can be linear, circular or bubble adapters. The library fragments are immobilized on a solid support, and parallel sequencing reactions are performed to interrogate the polynucleotide sequences. High-throughput sequencing methods can use emulsion PCR, bridge PCR or rolling circle amplification to generate copies of the original polynucleotides.

Polymerases tend to introduce errors during PCR, most commonly nucleotide misincorporation, and errors that arise in early cycles appear as variants when the sequencing data are analyzed. Molecular barcodes can be used to distinguish PCR errors from true variants in the polynucleotides of interest. The idea is that each polynucleotide in the pool to be amplified is attached to a unique molecular barcode. Reads with different molecular barcodes represent different original DNA molecules, whereas reads with the same barcode result from PCR copying of the same original molecule. Molecular barcodes called degenerate base regions (DBRs) are disclosed in U.S. Patent 8,481,292 (Population Genetics Technologies Ltd.). DBRs are random sequence tags attached to the molecules present in a sample. DBRs and other molecular barcodes allow PCR errors introduced during sample preparation to be distinguished from mutations and other variants present in the original polynucleotides.
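The molecular-barcode principle can be sketched as follows: reads sharing a barcode are collapsed into one consensus per original molecule by a per-position majority vote, so an error present in only some PCR copies is outvoted, while a difference carried by a separate barcode family is kept. This is an illustration only, not the DBR method of U.S. 8,481,292; it assumes exact barcode matches, equal-length reads within each family, and a hypothetical function name.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads):
    """Collapse (barcode, sequence) pairs into one majority-vote sequence per barcode."""
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)
    consensus = {}
    for barcode, seqs in families.items():
        if len({len(s) for s in seqs}) != 1:
            raise ValueError(f"reads in family {barcode} differ in length")
        # zip(*seqs) yields one tuple per position; keep the most common base there
        consensus[barcode] = "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
    return consensus


reads = [
    ("AAT", "ACGT"),
    ("AAT", "ACGT"),
    ("AAT", "ACTT"),  # G->T in one PCR copy: outvoted within its family
    ("GGC", "ACTT"),  # same change on a different original molecule: retained
]
print(consensus_by_barcode(reads))  # {'AAT': 'ACGT', 'GGC': 'ACTT'}
```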

Attaching adapters to polynucleotides

In some embodiments, polynucleotides are attached to adapters to form adapter-bearing polynucleotides. Adapters can be attached to polynucleotides before or after amplification; in some embodiments, the polynucleotides are polynucleotide amplicons and the adapter-bearing polynucleotides are adapter-bearing polynucleotide amplicons. Adapters can be attached by any suitable technique, such as ligation, transposase-mediated insertion, hybridization and/or primer extension. In some embodiments, adapters are ligated to one or both ends of the polynucleotides. In a ligation reaction, a covalent bond or linkage is formed between the ends of two or more polynucleotides (such as nucleic acids of interest) or oligonucleotides (such as adapters). The nature of the bond or linkage can vary, and ligation can be carried out enzymatically or chemically. Ligation is usually enzymatic, forming a phosphodiester bond between the 5' carbon of the terminal nucleotide of one polynucleotide or oligonucleotide and the 3' carbon of another. In some embodiments, the adapter is a Y adapter, which can generate libraries with varying 5' ends and with P5 and P7 priming sites suitable for use on MiniSeq, NextSeq and HiSeq 3000/4000 sequencing instruments.

In some embodiments, A/B adapters are attached to the polynucleotides of interest, with an A adapter attached to one end of the polynucleotide and a B adapter attached to the other end. In some embodiments, the A/B adapters are attached by random ligation, by use of a transposase, or by amplification via primer extension. The distinct A and B adapters are intended to ensure that each polynucleotide included in the sequencing procedure carries both an A and a B adapter (i.e., one type of adapter is attached to the 5' end of each polynucleotide undergoing sequencing and the other type to the 3' end, denoted an A/B adapter combination). Because ligation is random, polynucleotides with A/A and B/B adapters are also generated, and subsequent processing steps can be used to ensure that only molecules with the A/B combination are selected and/or included in the sequencing procedure. Before or after the amplification described herein for attaching target-specific barcodes, adapter-bearing polynucleotides can be amplified with primers directed to portions of the adapters to increase the amount of the polynucleotides of interest. In some embodiments, adapters are ligated in such a manner, and to a sufficient number of polynucleotides, as to generate a fully sequenceable library for massively parallel sequencing.
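To put a number on the A/A and B/B by-products, suppose (an idealization not stated in the source) that each end of a fragment independently receives an A adapter with probability p and a B adapter otherwise. The expected fractions are then p² for A/A, 2p(1 − p) for A/B and (1 − p)² for B/B, so even with equimolar adapters only about half of the ligated molecules carry the desired A/B combination. The function name is illustrative.

```python
def adapter_pair_fractions(p_a: float = 0.5) -> dict:
    """Expected fraction of each end-pair type under independent random adapter attachment."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be a probability")
    p_b = 1.0 - p_a
    return {"A/A": p_a ** 2, "A/B": 2 * p_a * p_b, "B/B": p_b ** 2}


print(adapter_pair_fractions())     # {'A/A': 0.25, 'A/B': 0.5, 'B/B': 0.25}
print(adapter_pair_fractions(0.3))  # A adapter in deficit: A/B drops to about 0.42
```

Because 2p(1 − p) peaks at p = 0.5, skewing the adapter ratio can only reduce the A/B fraction in this model, which is why the selection steps described above are needed rather than adjusting concentrations alone.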

In some embodiments, the adapter comprises an adapter barcode. The adapter barcode can serve any desired purpose, such as identifying the source or a property of a polynucleotide. A barcode generally refers to any sequence information used to identify, group or process polynucleotides. Barcodes can be included to identify a single read, a read group, or a subset of reads associated with a probe, an exon, a sample or any other group, or any combination thereof. For example, sequences can be sorted (e.g., using a computer processor) by sample, exon, probe group, or a combination thereof by reference to the barcode information. Barcode information can also be used to assemble contigs: a computer processor can recognize the barcodes and assemble reads by grouping those that share a barcode.
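Sorting reads by barcode information amounts to grouping on one or more decoded barcode fields. A minimal sketch, assuming each read record has already been annotated with a sample (from the adapter barcode) and a target (from the target-specific barcode); the field names, read IDs and function name are hypothetical.

```python
from collections import defaultdict


def group_reads(records, keys=("sample", "target")):
    """Group read IDs by the tuple of barcode-derived fields named in `keys`."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["read_id"])
    return dict(groups)


reads = [
    {"read_id": "r1", "sample": "S1", "target": "EML4-ALK"},
    {"read_id": "r2", "sample": "S1", "target": "BCR-ABL"},
    {"read_id": "r3", "sample": "S2", "target": "EML4-ALK"},
    {"read_id": "r4", "sample": "S1", "target": "EML4-ALK"},
]
print(group_reads(reads))                    # by sample and target
print(group_reads(reads, keys=("target",)))  # by target only, pooling samples
```

Passing a different `keys` tuple gives the per-sample, per-target or combined groupings described above without changing the grouping logic.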

Polynucleotides can be obtained by any suitable means. The polynucleotides of interest can be genomic deoxyribonucleic acid (gDNA), cDNA, mRNA, mitochondrial DNA or other types, and can be mammalian, viral, fungal or bacterial polynucleotides or mixtures thereof. In some embodiments, polynucleotide strands such as genomic DNA are fragmented by any suitable technique before adapters are attached. As is known in the art, polynucleotide strands can be fragmented physically, enzymatically or by chemical shearing. In some embodiments, polynucleotides are fragmented by physical methods such as sonication, acoustic shearing or hydrodynamic shearing. In some embodiments, polynucleotides are fragmented with restriction enzymes. In some embodiments, polynucleotides are fragmented with enzymes such as DNase I or a transposase. In some embodiments, polynucleotides are fragmented by chemical shearing, such as heat digestion in the presence of metal cations. In some embodiments, polynucleotides are randomly fragmented. In some embodiments, polynucleotides can be treated with sodium bisulfite or other chemical modifiers. In some embodiments, the polynucleotide fragments are used to populate a sequencing library.

The polynucleotide fragments can be of any suitable length. In some embodiments, the fragments are about 30 to about 2,000 bases long. In some embodiments, they are about 30 to about 800 bases long. In some embodiments, they are about 30 to about 500 bases long. In some embodiments, they are about 100 to about 800 bases long. In some embodiments, they are about 200 to about 600 bases long.
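The effect of choosing one of these length windows can be explored with a toy model in which breakpoints are placed uniformly at random along a molecule. This is an idealization: real sonication or enzymatic fragmentation has sequence and size biases. The 200 to 600 base window comes from the last embodiment above; the molecule length, number of breakpoints, seed and function names are illustrative assumptions.

```python
import random


def random_fragment_lengths(length, n_breaks, seed=0):
    """Cut a molecule of `length` bases at `n_breaks` distinct, uniformly random positions."""
    rng = random.Random(seed)
    cuts = sorted(rng.sample(range(1, length), n_breaks))
    bounds = [0, *cuts, length]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def size_select(lengths, min_len=200, max_len=600):
    """Keep fragments whose length falls in the inclusive window [min_len, max_len]."""
    return [n for n in lengths if min_len <= n <= max_len]


frags = random_fragment_lengths(100_000, n_breaks=249, seed=1)  # mean length 400 bases
kept = size_select(frags)
print(f"{len(kept)} of {len(frags)} fragments kept, {sum(kept)} of 100000 bases retained")
```

With uniformly random breakpoints the fragment lengths are spread widely around the mean, so a size window discards a sizable share of the input; this is one reason fragmentation conditions are tuned toward the desired range.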

After fragmentation, one or more adapters can be attached to the polynucleotide fragments. In some embodiments, the adapter is a linear adapter, a circular adapter or a bubble adapter. In some embodiments, the polynucleotide is ligated to at least one circular adapter. In some embodiments, the polynucleotide fragments are contacted with circular adapters to generate circular polynucleotide molecules. In some embodiments, only the circular polynucleotide molecules are amplified. In any of these embodiments, the adapter may include an adapter barcode.

Amplification of target polynucleotides

The methods of the present invention include amplifying the polynucleotide before and/or after it is attached to the adapter. In some embodiments, the adapter is located 5' of the sequence of interest in the polynucleotide and provides a priming site for amplifying the sequence of interest. The adapter-bearing polynucleotide is amplified using first and second amplification primers. The first amplification primer is specific for a target sequence in the polynucleotide and can hybridize to a portion of that target sequence (the polynucleotide of interest). The second amplification primer can hybridize to the priming site of the adapter or to a target-specific priming site of the polynucleotide of interest. During amplification, the first amplification primer hybridizes to the target sequence and the second primer hybridizes to the priming site on the adapter. In some embodiments, the first amplification primer hybridizes at the 5' end of the adapter-bearing polynucleotide. The primers used in the methods should be long enough to hybridize adequately to the target sequence of the polynucleotide.
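As a rough illustration of why primer length matters for hybridization, the Wallace rule of thumb estimates the melting temperature of a short oligonucleotide (roughly 14 nucleotides or fewer) as Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule is not part of the disclosure, and practical primer design normally relies on nearest-neighbor thermodynamic models; the sketch only shows that the estimated Tm rises with length and GC content. The function name is illustrative.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (degrees C) of a short DNA oligo by the Wallace rule."""
    seq = primer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for p in ("ACGTAC", "ACGTACGTAC", "ACGTACGTACGTAC"):
    print(p, len(p), wallace_tm(p))  # 18, 30 and 42 degrees C
```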

For amplification, the polynucleotide of interest is hybridized to the first amplification primer, which comprises a target-specific barcode, is complementary to at least a portion of the polynucleotide, and hybridizes to the first priming site of the polynucleotide. The polynucleotide comprises a target sequence at its 3' end, optionally followed by an adapter. If the target sequence is present in the adapter-bearing polynucleotide, the first amplification primer hybridizes to it, allowing selective amplification and detection of the target sequence. The first amplification primer can be complementary to and/or hybridize to a genomic rearrangement, such as a deletion, insertion, inversion or translocation in the polynucleotide of interest. In some embodiments, the first amplification primer is complementary to and/or hybridizes to a cDNA junction or an exon junction. In some embodiments, the first amplification primer is complementary to and/or hybridizes to a fusion gene, such as a known fusion gene (including the junction of a known fusion gene), a suspected or hypothetical fusion gene, or the junction of a suspected or hypothetical fusion gene.
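Whether a candidate first amplification primer actually covers a fusion junction can be checked in silico. The sketch below is hypothetical: it takes a fusion sequence assembled as the 5' partner followed by the 3' partner, plus the index at which the 3' partner begins, and requires the primer (on either strand) to overlap each partner by a minimum number of bases, so that its match depends on the rearranged sequence rather than on either partner alone. The 5-base minimum overlap and the toy partner sequences are arbitrary illustrative choices, not real gene sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def spans_junction(fusion_seq: str, junction: int, primer: str, min_overlap: int = 5) -> bool:
    """True if `primer` or its reverse complement matches `fusion_seq` at a position that
    overlaps both sides of `junction` by at least `min_overlap` bases.
    `junction` is the index of the first base of the 3' partner in `fusion_seq`."""
    for probe in (primer, revcomp(primer)):
        start = fusion_seq.find(probe)
        while start != -1:  # check every occurrence, not just the first
            end = start + len(probe)
            if junction - start >= min_overlap and end - junction >= min_overlap:
                return True
            start = fusion_seq.find(probe, start + 1)
    return False


five_prime = "ATGCCGTAGCTAGGCT"   # toy 5' partner
three_prime = "TTACGGATCCAGTCAA"  # toy 3' partner
fusion = five_prime + three_prime
junction = len(five_prime)
print(spans_junction(fusion, junction, fusion[10:26]))  # True: straddles the junction
print(spans_junction(fusion, junction, fusion[0:12]))   # False: lies within the 5' partner
```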

The second amplification primer hybridizes to the polynucleotide or adapter at a distance from the first priming site. In some embodiments, the second amplification primer hybridizes to a portion of an adapter attached to the polynucleotide at a distance from the first priming site. In some embodiments, the second amplification primer hybridizes to a second priming site of the polynucleotide, wherein the second priming site is at a distance from the first priming site.

Any suitable method can be used to amplify the polynucleotide of interest. In some embodiments, polynucleotides are amplified by the polymerase chain reaction (PCR). PCR typically comprises denaturing the polynucleotide strands (e.g., DNA melting), annealing primers to the denatured strands, and extending the primers with a polymerase to synthesize complementary polynucleotides. The process typically requires a DNA polymerase, forward and reverse primers, deoxynucleoside triphosphates, divalent cations and a buffer solution. In some embodiments, polynucleotides are amplified by linear amplification. In some embodiments, polynucleotides are amplified by emulsion PCR, bridge PCR or rolling circle amplification. The amplified polynucleotides can be analyzed by a suitable sequencing method to determine the order of their bases.
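The difference between exponential (PCR) and linear amplification can be made concrete with the idealized copy-number formulas N = N₀(1 + E)ⁿ for PCR with per-cycle efficiency E, and N = N₀(1 + nE) for linear amplification with a single primer, where each cycle copies only the original templates. Both are textbook idealizations that ignore reagent depletion and plateau effects; the function names are illustrative.

```python
def pcr_copies(n0, cycles, efficiency=1.0):
    """Idealized exponential PCR: every molecule is copied with probability `efficiency` per cycle."""
    return n0 * (1 + efficiency) ** cycles


def linear_copies(n0, cycles, efficiency=1.0):
    """Idealized linear amplification: only the n0 original templates are copied each cycle."""
    return n0 * (1 + cycles * efficiency)


print(pcr_copies(10, 10))     # 10240.0, i.e. 10 * 2**10
print(linear_copies(10, 10))  # 110.0, i.e. 10 * (1 + 10)
print(round(pcr_copies(10, 10, efficiency=0.9)))  # a 90%-efficient PCR
```

The exponential form is also why an error introduced in an early PCR cycle can end up in a large share of the final product, as discussed for molecular barcodes above.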

In some embodiments, one or more primers or polynucleotides are immobilized on a solid support. Immobilizing the amplification primers and/or polynucleotides can facilitate washing to remove unwanted species (e.g., deoxynucleotides). In some embodiments, the polynucleotides comprise one or more adapters attached to the solid support, thereby immobilizing the polynucleotides on the support. In some embodiments, polynucleotides are immobilized on the surface of a flow cell or slide. In some embodiments, polynucleotides are immobilized in microtiter wells or on magnetic beads. In some embodiments, the solid support can be coated with a polymer attached to a functional group or moiety. In some embodiments, the solid support can carry functional groups, such as amino, hydroxyl or carboxyl groups, or other moieties, such as avidin or streptavidin, for attaching adapters.

The polynucleotide amplicon can be an adapter-bearing polynucleotide amplicon. In some embodiments, the adapter-bearing polynucleotide or polynucleotide amplicon comprises a binding partner, such as a biotin moiety. The polynucleotide can be attached to an adapter that comprises a binding partner, or it can be amplified using one or more primers that comprise a binding partner. In some embodiments, the method includes forming a complex between mutual binding partners, such as a biotinylated primer extension product and solid-supported avidin or streptavidin. The method may further include enriching a sample for adapter-bearing polynucleotides that contain a binding partner by binding them to the mutual binding partner. Avidin and streptavidin form exceptionally tight complexes with biotin and certain biotin analogs. Typically, when biotin is coupled to a second molecule through its carboxyl side chain, the resulting conjugate is still tightly bound by avidin or streptavidin; the second molecule in such a conjugate is said to be "biotinylated". Typically, the methods involve complexing a biotinylated nucleic acid with avidin or streptavidin and then detecting, analyzing and/or using the complex. In some embodiments, biotinylated polynucleotides are immobilized on a streptavidin-coated flow cell or on streptavidin-coated metal beads. In some embodiments of the methods, compositions and kits of the present invention, a target-specific primer (e.g., the first amplification primer) can be attached to a binding partner, such as a biotin moiety, to allow selection or purification by binding to a mutual binding partner such as streptavidin or avidin. Useful binding partner pairs include biotin:avidin, biotin:streptavidin, antibody:antigen and complementary nucleic acids. In some embodiments, a target-specific primer may include a binding partner, such as biotin, to allow capture of the selectively amplified pool.

When polynucleotides are prepared for next-generation sequencing, target enrichment is generally performed before sequencing, and one or more target enrichment schemes may be included in the present methods. Enriching for one or more desired target polynucleotides focuses sequencing on those targets, reducing workload and cost and/or increasing coverage depth. Examples of current enrichment schemes for next-generation sequencing include hybridization-based capture schemes, such as Agilent's SureSelect Hybrid Capture and Illumina's TruSeq Capture, and PCR-based schemes, such as Agilent's HaloPlex, ThermoFisher's AmpliSeq, Illumina's TruSeq Amplicon, and Raindance's emulsion/digital PCR.
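The coverage benefit of enrichment can be estimated with a common back-of-the-envelope relation: mean depth ≈ (number of reads × read length × fraction of sequenced bases on target) / target size. The sketch below applies it with made-up numbers. The unenriched on-target fraction (about 0.04%, roughly a 1.2 Mb target's share of a ~3.1 Gb human genome) and the 80% enriched fraction are illustrative assumptions, not performance figures for any of the products named above.

```python
def mean_depth(n_reads, read_len, on_target_fraction, target_size):
    """Average per-base coverage over the target region, assuming uniform coverage."""
    return n_reads * read_len * on_target_fraction / target_size


# Hypothetical run: 1 million 150-base reads over a 1.2 Mb target region
unenriched = mean_depth(1_000_000, 150, 0.0004, 1_200_000)
enriched = mean_depth(1_000_000, 150, 0.8, 1_200_000)
print(f"{unenriched:.2f}x without enrichment vs {enriched:.1f}x with enrichment")
```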

In some embodiments, a library of polynucleotides having universal adapters at both ends is amplified by a method such as PCR. Target-specific primers containing custom adapters can be added to the reaction to amplify target sequences. Such embodiments generate two pools of fragments: (a) fragments with universal adapters at both ends, and (b) fragments generated by selective amplification that carry sequence-specific adapters at one or both ends. If desired, target enrichment can be performed on the mixed pool of fragments.

In some embodiments of the methods, compositions and kits of the invention, more than one target-specific primer is employed or provided for amplification. Amplification can be singleplex or multiplex. Multiplex PCR is a molecular biology technique for amplifying multiple nucleic acid targets in a single PCR experiment. Kits for multiplex amplification of target sequences are available from Multiplicom NV.

In some embodiments of the methods, compositions and kits of the present invention, polynucleotide amplicons are used in transposable element (TE) schemes. A transposase can insert a transposon comprising an adapter into the amplicon, thereby providing adapters at the ends of the resulting amplicon fragments. In some embodiments, the polynucleotide can be fragmented and barcoded simultaneously. For example, a transposase (e.g., as in NEXTERA) can be used to fragment the polynucleotide and add a barcode to it.

Fusion genes

The target-specific primer may be complementary or identical to a portion of any known or suspected fusion gene. For example, the target-specific primer may be complementary or identical to any fusion gene disclosed in US20100279890, US20140120540, US20140272956, or US20140315199. As a further example, the target-specific primer may be complementary or identical to any of the following fusion genes: BCR-ABL, EML4-ALK, TEL-AML1, AML1-ETO, and TMPRSS2-ERG. Alternatively, the target-specific primer may be complementary or identical to a newly discovered fusion gene or to the junction of such a fusion gene. Alternatively, the target-specific primer may be complementary or identical to a suspected or hypothetical fusion gene or to the junction of such a fusion gene.
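Whether a candidate primer is "identical" or "complementary" to a fusion sequence, and whether it covers the junction, can be checked in silico. The Python sketch below assumes exact matching only (no mismatches, no melting-temperature or specificity checks), reports only the first occurrence of a site, and takes the junction as a 0-based index of the first base of the 3' partner; the sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_relation(primer: str, target: str) -> str:
    """'identical' if the primer occurs in the target as written,
    'complementary' if its reverse complement occurs, else 'none'."""
    primer, target = primer.upper(), target.upper()
    if primer in target:
        return "identical"
    if reverse_complement(primer) in target:
        return "complementary"
    return "none"


def spans_junction(primer: str, fusion_seq: str, junction: int) -> bool:
    """True if the primer's binding site (either orientation) starts before
    and ends after `junction`, i.e. it covers bases from both partners."""
    fusion_seq = fusion_seq.upper()
    for site in (primer.upper(), reverse_complement(primer)):
        start = fusion_seq.find(site)
        if start != -1 and start < junction < start + len(site):
            return True
    return False


fusion = "TTGCAGCA" + "GTCCAAGT"  # hypothetical 5' partner + 3' partner
print(primer_relation("GGACTGC", fusion))   # complementary
print(spans_junction("GGACTGC", fusion, 8))  # True
print(spans_junction("TTGCAG", fusion, 8))   # False
```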

In some embodiments, the methods, compositions and kits of the present invention include multiple target-specific primers for different fusion genes. For example, the target-specific primers may include a first target-specific primer for a BCR-ABL junction and a second target-specific primer for EML4-ALK. In some embodiments, the methods, compositions and kits of the present invention include multiple target-specific primers for a single fusion gene (including primers for multiple junctions of a single fusion gene). For example, the target-specific primers may include a first target-specific primer for a first EML4-ALK junction and a second target-specific primer for a second EML4-ALK junction. The methods, compositions and kits of the present invention may include a third, fourth or fifth target-specific primer, up to a twentieth target-specific primer, or even more target-specific primers.

Sequencing of target sequences

After amplification, the adapter-bearing polynucleotide amplicons can be sequenced. For example, sequencing can be performed by a first primer extension and a second primer extension on the adapter-bearing polynucleotide amplicons generated during amplification. In some embodiments, the first and second primer extensions are performed in the same direction on an individual amplicon or on a group of identical amplicons. In the first primer extension, the sequence is determined by detecting the bases incorporated as the first primer (and other primers) is extended, which allows at least a portion of the target sequence of the polynucleotide to be determined, particularly the portion located 5' of the adapter. The adapter-bearing polynucleotide may include a sequencing priming site, such as a P5 or P7 priming site. In some embodiments, the first primer extension can also be used to detect the sequence of the adapter barcode. In the second primer extension, the sequence is determined by detecting the bases incorporated as the second primer is extended, which allows the target-specific barcode to be detected. Sequencing the target-specific barcode confirms the presence and/or position, in the polynucleotide of interest, of the gene or other polynucleotide with which that barcode is specifically associated.

In some embodiments, sequencing is performed by massively parallel sequencing by synthesis using reversible dye terminators. In some embodiments, sequencing is performed by massively parallel sequencing by ligation. In some embodiments, sequencing is performed by single-molecule sequencing. In some embodiments, sequencing is performed by pyrosequencing.

Any suitable reaction method can be used to sequence the polynucleotides. In some embodiments, a single reaction cycle is performed with a single nucleotide (i.e., a nucleotide corresponding to G, A, T or C), and the method involves detecting whether that nucleotide is incorporated. If it is incorporated, the identity of the incorporated nucleotide is known. In such embodiments, the method may involve cycling through all four nucleotides (i.e., nucleotides corresponding to G, A, T and C) in turn, one of which should be incorporated. The addition of nucleotides can be detected by known methods, for example by detecting pyrophosphate release, proton release or fluorescence. For example, in some embodiments, the chain terminator nucleotide can be a terminal-phosphate-labeled fluorescent nucleotide (i.e., a nucleotide with a fluorophore attached to the terminal phosphate), and the identification step includes reading the fluorescence. In other embodiments, the chain terminator nucleotide can be a fluorescent nucleotide carrying a quencher on the terminal phosphate. In such embodiments, incorporation of the nucleotide removes the quencher, thereby allowing the fluorescent label to be detected. In other embodiments, the terminal-phosphate-labeled chain terminator nucleotide can carry, on the terminal phosphate, a mass tag, charge label, charge-blocking label, chemiluminescent label, redox label, or other detectable label.
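The one-nucleotide-per-cycle scheme can be modeled as a flowgram. In this Python sketch each flow adds a single nucleotide type in a fixed order (the order "GATC" is an assumption). With native nucleotides a homopolymer run is incorporated in one flow and the signal count reflects its length, as in pyrosequencing; with `terminator=True` at most one base is added per flow, as with chain terminator nucleotides. The input is the strand being synthesized, not the template, and signals are idealized integers with no noise.

```python
from typing import List, Tuple


def simulate_flows(strand: str, n_flows: int, flow_order: str = "GATC",
                   terminator: bool = False) -> List[Tuple[str, int]]:
    """Return (nucleotide, number incorporated) for each flow."""
    pos = 0
    flowgram = []
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        count = 0
        # Extend while the next base to be synthesized matches the flowed nucleotide.
        while pos < len(strand) and strand[pos] == base:
            count += 1
            pos += 1
            if terminator:  # a terminator blocks further extension in this flow
                break
        flowgram.append((base, count))
    return flowgram


def call_bases(flowgram: List[Tuple[str, int]]) -> str:
    """Reconstruct the synthesized sequence from idealized flow signals."""
    return "".join(base * count for base, count in flowgram)


print(simulate_flows("GGATC", 4))  # [('G', 2), ('A', 1), ('T', 1), ('C', 1)]
print(call_bases(simulate_flows("GGATC", 8, terminator=True)))  # GGATC
```

The terminator mode needs more flows to read the same sequence because each homopolymer base costs a full cycle through the four nucleotides.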

In some embodiments, a single reaction cycle can be performed using all four nucleotides (i.e., nucleotides corresponding to G, A, T, and C), each labeled with a different fluorophore. In such embodiments, the sequencing step can include adding four chain terminators corresponding to G, A, T, and C to the amplified polynucleotide, wherein the four chain terminators comprise different fluorophores. In such embodiments, the identifying step can include identifying which of the four chain terminators was added to the end of the primer.
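For the four-fluorophore variant, each cycle yields one intensity per fluorophore, and the base can be called from the brightest channel. A minimal Python sketch; the channel names, dye-to-base assignment and intensity values are hypothetical, and production base callers also correct for spectral cross-talk and phasing.

```python
from typing import Dict, List

# Hypothetical assignment of fluorophore channels to chain terminators.
CHANNEL_TO_BASE = {"dye1": "G", "dye2": "A", "dye3": "T", "dye4": "C"}


def call_cycle(intensities: Dict[str, float]) -> str:
    """Call the base whose fluorophore channel is brightest in this cycle."""
    channel = max(intensities, key=intensities.get)
    return CHANNEL_TO_BASE[channel]


def call_read(cycles: List[Dict[str, float]]) -> str:
    return "".join(call_cycle(c) for c in cycles)


cycles = [
    {"dye1": 0.90, "dye2": 0.10, "dye3": 0.00, "dye4": 0.05},
    {"dye1": 0.10, "dye2": 0.20, "dye3": 0.80, "dye4": 0.10},
]
print(call_read(cycles))  # GT
```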

The sequencing step can be performed using single-end sequencing, i.e., the first primer extension sequence and the second primer extension sequence are read in the same direction. In some embodiments, a genome analyzer capable of single-end sequencing is used to sequence the polynucleotides. In some embodiments, the method includes continuously monitoring the sequencing reaction (i.e., base incorporation) in real time. This can be achieved simply by including a "detection enzyme" in the chain extension reaction mixture and performing the chain extension and detection (signal-generating) reactions simultaneously. In other embodiments, the chain extension reaction is first performed on its own as a first reaction step, followed by a separate "detection" reaction in which the primer extension products are detected.

Sequencing data analysis

Genomic rearrangements can be identified from the data generated by sequencing the first and second primer extensions, and the methods of the present invention include identifying a genomic rearrangement in the polynucleotide on that basis. The sequencing data from the first primer extension provide the nucleotide sequence of the target sequence. The sequencing data from the second primer extension provide the nucleotide sequence of the adapter, which can be used to indicate or confirm the presence of the target sequence because the adapter is designed to hybridize specifically with the target sequence in the polynucleotide sample. Together, the data from the two primer extensions provide positional information for determining any genomic rearrangement in the polynucleotide.
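Combining the two reads can be expressed as a simple decision rule. This Python sketch assumes a hypothetical panel that maps each target-specific barcode to the gene its primer targets, and assigns read 1 to a gene by exact match of its first k bases against short reference snippets; a real pipeline would use a proper aligner. A fusion candidate is reported when the gene read directly and the gene implied by the barcode disagree. All gene names and sequences are placeholders.

```python
from typing import Dict, Optional, Tuple

# Hypothetical reference snippets and barcode panel (placeholders).
REFERENCE = {"GENE_A": "TTGCAGCACCATGGAT", "GENE_B": "GTCCAAGTCGATTGCA"}
BARCODE_TO_GENE = {"ACGTAC": "GENE_B", "TGCATG": "GENE_A"}


def assign_gene(read1: str, reference: Dict[str, str], k: int = 12) -> Optional[str]:
    """Assign read 1 to the single reference gene containing its first k bases."""
    prefix = read1[:k]
    hits = [gene for gene, seq in reference.items() if prefix in seq]
    return hits[0] if len(hits) == 1 else None


def call_fusion(read1: str, barcode: str, reference: Dict[str, str],
                barcode_to_gene: Dict[str, str], k: int = 12) -> Optional[Tuple[str, str]]:
    gene1 = assign_gene(read1, reference, k)  # gene sequenced directly (read 1)
    gene2 = barcode_to_gene.get(barcode)      # gene implied by the barcode (read 2)
    if gene1 is None or gene2 is None or gene1 == gene2:
        return None
    return (gene1, gene2)


print(call_fusion("TTGCAGCACCAT", "ACGTAC", REFERENCE, BARCODE_TO_GENE))  # ('GENE_A', 'GENE_B')
print(call_fusion("TTGCAGCACCAT", "TGCATG", REFERENCE, BARCODE_TO_GENE))  # None
```

The second gene is never sequenced in this rule; its presence is inferred only from the barcode, mirroring the logic described above.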

The data generated from the first and second primer extensions are compared with a reference sample. Any difference between the reference sample and these data indicates that a genomic rearrangement may be present in the sample being studied. The sequence of the reference sample, together with the sequences generated by the first and second primer extensions, can be used to identify the type and location of any genomic rearrangement.
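One simple way to turn a reference comparison into a breakpoint position is to walk along read 1 until it stops matching the reference of the first gene, then look for the remainder in the candidate partner. This Python sketch rests on strong assumptions: the read's start position on the first reference is already known, matching is exact, and any mismatch (including a sequencing error or SNV) is treated as a candidate breakpoint. Bases shared by both genes at the junction make the exact breakpoint ambiguous. The sequences are invented.

```python
from typing import Optional, Tuple


def locate_breakpoint(read: str, ref_a: str, start_a: int,
                      ref_b: str) -> Optional[Tuple[int, int]]:
    """Return (offset in read, position in ref_b) of a candidate junction,
    or None if the read matches ref_a throughout or the rest is not in ref_b."""
    for i, base in enumerate(read):
        pos = start_a + i
        if pos >= len(ref_a) or ref_a[pos] != base:
            j = ref_b.find(read[i:])  # does the unmatched remainder come from the partner?
            return (i, j) if j != -1 else None
    return None


ref_a = "TTGCAGCACCATGGAT"  # hypothetical first gene
ref_b = "GTCCAAGTCGATTGCA"  # hypothetical partner gene
print(locate_breakpoint("GCACCATCAAGTCGA", ref_a, 5, ref_b))  # (7, 3)
print(locate_breakpoint("GCACCATGG", ref_a, 5, ref_b))        # None
```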

The methods, compositions and kits of the invention can be used to detect any sequence of interest, including sequences associated with common deletion syndromes.

Example 1

Figure 1 shows a method for preparing a polynucleotide for sequencing by attaching an adapter and a barcode to the polynucleotide, together with the adapter-bearing polynucleotides and adapter-bearing polynucleotide amplicons generated by the present technique. According to one embodiment of the present invention, the adapter-bearing polynucleotides can be used to detect a fusion event by selective gene amplification. In Figure 1, adapter-bearing polynucleotide 102 comprises a nucleic acid of interest, in this case the junction of a fusion gene. Adapter-bearing polynucleotide 102 comprises a first gene 104 and a second gene 106, and also comprises adapters 108 and 110, one at each end. The adapters can be attached by any suitable procedure, such as ligation. At least one of the adapters comprises an adapter barcode 112, which can be a molecular barcode or a sample barcode.

In time period A, the adapter-bearing polynucleotides are prepared for target-specific amplification. They can be denatured to provide single-stranded polynucleotides, or double-stranded polynucleotides can be provided for amplification. In some embodiments, the adapter-bearing polynucleotides are amplified in a non-specific manner (e.g., with primers complementary to the priming sites on the adapters attached to the library members). In some embodiments, as discussed above, the adapter-bearing polynucleotides are enriched before they are amplified.

The adapter-bearing polynucleotide is contacted with a first amplification primer 114 comprising a target-specific primer 116. The target-specific primer 116 is complementary to a sequence known or suspected to be present in the adapter-bearing polynucleotide (e.g., a sequence within the second gene 106). The first amplification primer 114 also comprises a target-specific barcode 118 that is specific to a portion of a gene or other target known or suspected to be present in the sample being analyzed or the polynucleotide of interest. In this context, "gene-specific" does not mean that the barcode is complementary to the gene, but rather that the barcode is specifically associated with the gene, so that detecting the sequence of the gene-specific barcode reliably indicates that the associated sequence is present.

In time period B, the adapter-bearing polynucleotides are amplified in the presence of the first amplification primer 114 and a second amplification primer 120 to generate a library of adapter-bearing polynucleotide amplicons. These amplicons contain the nucleic acid of interest, the adapters or their complements, and the gene-specific barcode or its complement. For ease of illustration, Figure 1 shows a single set of first amplification primer 114 and second amplification primer 120, although the amplification reaction can use many target-specific primers directed to various sequences and can generate amplicons of many nucleic acids of interest. In some embodiments, the adapter-bearing polynucleotides are enriched from a polynucleotide pool, for example where their tag includes biotin or another binding partner.

In some embodiments (which may be used in addition to or instead of enrichment), polynucleotides (including adapter-bearing polynucleotides) can be amplified with outer and inner (nested) primers. In such embodiments, the outer primers, or the primers used in earlier amplification rounds, are target-specific primers that need not include a target-specific barcode. The inner primers, or the primers used in subsequent amplification rounds, are also target-specific primers, and they include a target-specific barcode. Generally, nested PCR refers to one or more subsequent rounds of PCR amplification performed with one or more new primers that bind at least one base pair internal to the primers used in the earlier rounds. Nested PCR reduces unwanted amplification products because the subsequent reactions amplify only those products of the previous round that contain the correct internal sequence. Nested PCR typically requires designing primers that bind entirely within the region bounded by the previous outer primer binding sites.
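The nesting requirement can be checked in silico by confirming that the inner primer sites lie entirely between the outer primer sites on the template. A minimal Python sketch, assuming exact matches, reverse primers written 5' to 3' on the bottom strand, and use of the first occurrence of each site; the template and primer sequences are invented.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_site(template: str, primer: str, reverse: bool = False) -> Optional[Tuple[int, int]]:
    """Half-open (start, end) of a primer's binding site on the top strand.
    A reverse primer binds the bottom strand, so its reverse complement is searched."""
    site = reverse_complement(primer) if reverse else primer
    start = template.find(site)
    return None if start == -1 else (start, start + len(site))


def is_nested(template: str, outer_f: str, outer_r: str,
              inner_f: str, inner_r: str) -> bool:
    sites = [
        find_site(template, outer_f),
        find_site(template, outer_r, reverse=True),
        find_site(template, inner_f),
        find_site(template, inner_r, reverse=True),
    ]
    if any(site is None for site in sites):
        return False
    outer_f_site, outer_r_site, inner_f_site, inner_r_site = sites
    # Required order along the top strand: outer F | inner F | inner R | outer R
    return (outer_f_site[1] <= inner_f_site[0]
            and inner_f_site[1] <= inner_r_site[0]
            and inner_r_site[1] <= outer_r_site[0])


template = "ACGTTGCA" + "TTAG" + "CCGATGAC" + "GGTACATT" + "AGCTCAAG" + "TCTA" + "GACTTCGG"
outer = ("ACGTTGCA", reverse_complement("GACTTCGG"))
inner = ("CCGATGAC", reverse_complement("AGCTCAAG"))
print(is_nested(template, *outer, *inner))  # True
print(is_nested(template, *inner, *outer))  # False
```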

然后可以对带衔接子的多核苷酸扩增子进行测序。在一些实施方案中，采用与衔接子108的第一引发位点124互补的第一测序引物122来进行第一引物延伸，以便对至少第一基因104测序。在测序反应中将标记的核苷酸添加到引物中，并且生成与带衔接子的多核苷酸扩增子互补的第一延伸序列126，从而提供有关带衔接子的多核苷酸的序列信息。第一引物延伸发生在带衔接子的多核苷酸扩增子的第一位置。第一引发位点124通常可以在衔接子条形码的5'或3'，这取决于是否希望与第一基因104一起或与第一基因104分开对衔接子条形码进行测序。使用第二测序引物128来进行第二引物延伸，从而至少对基因特异性条形码118进行测序。第二测序引物128与第一扩增引物114的部分130互补，所述部分130为基因特异性条形码118的3'和靶特异性序列116的5'。在测序反应中将标记的核苷酸添加到引物中，并且生成与基因特异性条形码互补的第二延伸序列132，从而提供有关基因特异性条形码的序列信息。如上所述，序数第一和第二并不意味着在第二引物之前使用第一引物；相反，它们用于彼此区分不同的引物。The polynucleotide amplicon with the adapter can then be sequenced. In some embodiments, a first sequencing primer 122 complementary to the first priming site 124 of the adapter 108 is used to perform a first primer extension to sequence at least the first gene 104. Labeled nucleotides are added to the primer in the sequencing reaction, and a first extension sequence 126 complementary to the polynucleotide amplicon with the adapter is generated, thereby providing sequence information about the polynucleotide with the adapter. The first primer extension occurs at the first position of the polynucleotide amplicon with the adapter. The first priming site 124 can usually be at the 5' or 3' of the adapter barcode, depending on whether it is desired to sequence the adapter barcode together with the first gene 104 or separately from the first gene 104. A second primer extension is performed using a second sequencing primer 128 to sequence at least the gene-specific barcode 118. The second sequencing primer 128 is complementary to a portion 130 of the first amplification primer 114, which is 3' of the gene-specific barcode 118 and 5' of the target-specific sequence 116. The labeled nucleotides are added to the primers in the sequencing reaction, and a second extended sequence 132 complementary to the gene-specific barcode is generated, thereby providing sequence information about the gene-specific barcode. 
As described above, the ordinal numbers first and second do not mean that the first primer is used before the second primer; rather, they are used to distinguish different primers from each other.
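Assume the second sequencing primer anneals to the amplicon strand that carries the sequence of first amplification primer 114, laid out 5' to 3' as barcode 118, portion 130, then target-specific sequence 116. Its 3' end then faces that strand's 5' end, so its extension copies the bases just 5' of portion 130, i.e., the barcode. This Python sketch models only that step under the stated assumption; all sequences are hypothetical, and in this model the read reports the reverse complement of barcode 118.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_first_amplification_primer(barcode: str, portion_130: str,
                                     target_specific: str, tail: str = "") -> str:
    """5'->3' layout assumed here: optional tail, barcode, portion 130, target-specific sequence."""
    return tail + barcode + portion_130 + target_specific


def second_read(strand: str, portion_130: str, cycles: int) -> str:
    """Extend a primer complementary to portion_130 on `strand` (written 5'->3').
    Synthesis copies the bases upstream (5') of portion_130, nearest base first."""
    idx = strand.find(portion_130)
    if idx == -1:
        return ""
    return reverse_complement(strand[max(0, idx - cycles):idx])


barcode, portion_130 = "ACGTAC", "GTTCAG"
strand = build_first_amplification_primer(barcode, portion_130, "TTGCAGCA") + "CCATGGAT"
read2 = second_read(strand, portion_130, cycles=6)
print(read2, reverse_complement(read2) == barcode)  # GTACGT True
```

Because the read is the reverse complement of the barcode in this model, it would be matched against reverse-complemented entries of the barcode panel.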

In time period C, the data from the sequencing reaction are processed and interpreted. In some embodiments, the first extension sequence 126 is determined to be the sequence of the first gene, and the second extension sequence 132 is determined to be the sequence of a gene-specific barcode associated with the second gene. Based on these determinations, the data are interpreted as indicating the presence of a fusion gene in the nucleic acid of interest. The fusion gene contains portions of both the first gene and the second gene, and its presence can be determined without directly sequencing the second gene 106 itself.

Example 2

Figure 2 shows that polynucleotides can be amplified with target-specific primers with or without prior attachment of adapters. For example, polynucleotides can be prepared for sequencing by attaching adapters and then performing target-specific amplification (the workflow on the left side of Figure 2). Figure 2 also shows the adapter-bearing polynucleotides and adapter-bearing polynucleotide amplicons generated by the present technique. According to one embodiment of the present invention, the adapter-bearing polynucleotides can be used to detect genomic rearrangements or other fusion events by selective gene amplification. In Figure 2, polynucleotide 202 comprises a nucleic acid of interest, in this case the junction of a fusion gene. Polynucleotide 202 comprises a first gene 204 and a second gene 206.

In time period A, the polynucleotides are prepared for target-specific amplification. They can be denatured to provide single-stranded polynucleotides, or double-stranded polynucleotides can be provided for amplification. In some embodiments, the polynucleotides are amplified in a non-specific manner (e.g., with primers complementary to the priming sites on adapters attached to the library members). In some embodiments, as discussed above, the polynucleotides are enriched before they are amplified.

The polynucleotide is contacted with a first amplification primer 214 comprising a target-specific primer 216. The target-specific primer 216 is complementary to a sequence known or suspected to be present in the polynucleotide, such as a sequence within gene 206. The first amplification primer 214 also comprises a gene-specific barcode 218, which is specific to a portion of a gene known or suspected to be present in the sample being analyzed or the polynucleotide of interest. The polynucleotide is also contacted with a second amplification primer 215 comprising a target-specific primer 217. The target-specific primer 217 is complementary to a sequence known or suspected to be present in the polynucleotide, such as a sequence within gene 204. The second amplification primer 215 also comprises a barcode 219, such as a target-specific barcode, a sample barcode, a molecular barcode, another barcode, or a combination of barcodes. One or both of the first and second amplification primers may comprise an adapter.

In time period B, the polynucleotide is amplified in the presence of the first amplification primer 214 and the second amplification primer 215 to generate a library of polynucleotide amplicons. The polynucleotide amplicons contain the nucleic acid of interest and the target-specific barcode or its complement. The polynucleotide amplicons can be adapter-bearing amplicons, in which case they contain the nucleic acid of interest, the adapter or its complement, and the target-specific barcode or its complement. For ease of illustration, Figure 2 shows a single set of first amplification primer 214 and second amplification primer 215, although the amplification reaction can use many target-specific primers directed to various sequences and can generate amplicons of many nucleic acids of interest.
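Amplifying with two tailed, target-specific primers on either side of a junction can be sketched as an in silico PCR: a product forms only when both annealing sites lie on the same molecule in the right orientation, and the product carries both 5' tails. This Python sketch assumes exact matching, a forward primer whose 3' annealing part is identical to the top strand of gene 204, and a reverse primer whose annealing part binds the bottom strand of gene 206. The tail strings stand in for barcodes 219 and 218 (plus any adapter or portion 230) and, like the gene sequences, are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd_tail: str, fwd_anneal: str,
                  rev_tail: str, rev_anneal: str) -> Optional[str]:
    """Return the top-strand product, or None if no product can form."""
    f = template.find(fwd_anneal)                     # forward site on the top strand
    r = template.find(reverse_complement(rev_anneal))  # reverse site, seen on the top strand
    if f == -1 or r == -1 or r < f + len(fwd_anneal):
        return None
    interior = template[f + len(fwd_anneal):r]
    # 5' tails are not templated but are copied into every product.
    return fwd_tail + fwd_anneal + interior + reverse_complement(rev_tail + rev_anneal)


gene_204, gene_206 = "TTGCAGCACCATGGAT", "GTCCAAGTCGATTGCA"
primers = ("AAAC", "GCACCAT", "GGGT", reverse_complement("TCGATTG"))
print(in_silico_pcr(gene_204 + gene_206, *primers))  # AAACGCACCATGGATGTCCAAGTCGATTGACCC
print(in_silico_pcr(gene_204, *primers), in_silico_pcr(gene_206, *primers))  # None None
```

Separate, unfused copies of the two genes give no product here, which is why a barcoded product signals that both genes sit on one molecule.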

The polynucleotide amplicons can then be sequenced, or they can be subjected to additional processing steps, such as enrichment, further amplification and/or attachment of adapters. For example, adapters can be attached to each end of the amplicons so that the resulting adapter-bearing polynucleotide amplicons have sequencing priming sites and/or can be attached to a solid support. In time period C, the polynucleotide amplicons have hybridized to primers attached to a solid support, and those primers have been extended to provide complementary copies of the amplicons attached to the support. A first primer extension is performed with a first sequencing primer 222, complementary to the first priming site 224 of adapter 208, to sequence at least the first gene 204. Labeled nucleotides are added to the primer in the sequencing reaction, generating a first extension sequence 226 that is complementary to the adapter-bearing polynucleotide amplicon and thereby providing sequence information about the adapter-bearing polynucleotide. The first primer extension occurs at a first position of the adapter-bearing polynucleotide amplicon. The first priming site 224 can be located 5' or 3' of the adapter barcode, depending on whether the adapter barcode is to be sequenced together with the first gene 204 or separately from it. A second primer extension is performed with a second sequencing primer 228 to sequence at least the gene-specific barcode 218. The second sequencing primer 228 is complementary to a portion 230 of the first amplification primer 214, which lies 3' of the gene-specific barcode 218 and 5' of the target-specific sequence 216. Labeled nucleotides are added to the primer in the sequencing reaction, generating a second extension sequence 232 that is complementary to the gene-specific barcode and thereby providing sequence information about the gene-specific barcode. As described above, the ordinals "first" and "second" do not imply that the first primer is used before the second primer; they only distinguish the primers from each other.

In time period C, the data from the sequencing reaction are processed and interpreted. In some embodiments, the first extension sequence 226 is determined to be the sequence of the first gene, and the second extension sequence 232 is determined to be the sequence of a gene-specific barcode associated with the second gene. Based on these determinations, the data are interpreted as indicating the presence of a fusion gene in the nucleic acid of interest. The fusion gene contains portions of both the first gene and the second gene, and its presence can be determined without directly sequencing the second gene 206 itself.

Exemplary embodiments

Embodiment 1. A method for preparing a polynucleotide for sequencing by attaching a target-specific barcode, the method comprising: amplifying the polynucleotide with a first amplification primer and a second amplification primer, wherein the first amplification primer hybridizes to a first priming site of the polynucleotide and the first amplification primer comprises a target-specific barcode; wherein the amplification produces a polynucleotide amplicon, and wherein the polynucleotide amplicon comprises a sequence identical or complementary to the polynucleotide of interest and the target-specific barcode.

Embodiment 2. The method of embodiment 1, wherein the second amplification primer (1) hybridizes to a portion of an adapter attached to the polynucleotide at a distance from the first priming site, or (2) hybridizes to a second priming site of the polynucleotide, wherein the second priming site is at a distance from the first priming site.

Embodiment 3. The method of embodiment 1, further comprising attaching an adapter to the polynucleotide at a distance from the first priming site, wherein the adapter comprises a second priming site.

Embodiment 4. The method of embodiment 3, wherein the adapter further comprises an adapter barcode, wherein the adapter barcode is a sample barcode or a molecular barcode.

Embodiment 5. The method of any one of the preceding embodiments, wherein the first priming site is part of a fusion gene and the target-specific barcode is specific for that part of the fusion gene.

Embodiment 6. The method of embodiment 5, wherein the part of the fusion gene is the junction of the fusion gene.

Embodiment 7. The method of any one of the preceding embodiments, wherein the polynucleotide is genomic DNA (gDNA) or complementary DNA (cDNA) derived from an RNA template.

Embodiment 8. The method of any one of the preceding embodiments, wherein the polynucleotide of interest comprises a plurality of polynucleotides of interest, and the method comprises attaching a plurality of adapters to the plurality of polynucleotides, thereby forming a plurality of adapter-bearing polynucleotides, wherein each of the plurality of adapter-bearing polynucleotides comprises a different molecular barcode.

Embodiment 9. The method of any one of the preceding embodiments, wherein the polynucleotide of interest comprises a plurality of polynucleotides of interest, and the first amplification primer comprises a plurality of first amplification primers having different target-specific primers and different target-specific barcodes, thereby forming a plurality of adapter-bearing polynucleotide amplicons, wherein each of the plurality of adapter-bearing polynucleotide amplicons comprises a different target-specific barcode.

Embodiment 10. The method of any one of the preceding embodiments, wherein the polynucleotide amplicon or adapter-bearing polynucleotide comprises a binding partner, such as a biotin moiety.

Embodiment 11. The method of any one of the preceding embodiments, further comprising sequencing the polynucleotide amplicon at first and second positions by performing a first primer extension and a second primer extension, wherein the first primer extension and the second primer extension are performed in the same direction. Sequencing at the first position can provide the sequence of at least a portion of the polynucleotide of interest, and sequencing at the second position can provide the sequence of the target-specific barcode.

Embodiment 12. The method of embodiment 11, wherein the first primer extension and the second primer extension are performed in the same direction on the polynucleotide in separate sequencing runs.

Embodiment 13. The method of embodiment 11, wherein the sequencing is next-generation sequencing (NGS) or massively parallel sequencing.

Embodiment 14. The method of any one of the preceding embodiments, further comprising detecting a genomic rearrangement by single-end sequencing of at least one of the polynucleotide amplicons, for example by identifying the genomic rearrangement based on data generated by sequencing from the first primer extension and the second primer extension.
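The following sketch shows one way the data from the two same-direction primer extensions in Embodiment 14 could be combined. It assumes the first read begins on the adapter side of the insert. The barcode read names the target-specific primer, and therefore the gene on the known side of the amplicon. The first read is aligned to find which gene the adapter-side part of the insert comes from. If the two genes differ, the amplicon is a candidate rearrangement. Alignment and barcode decoding are assumed to have been done upstream. The `Read` record, the `primer_gene` table, and the gene names are hypothetical.

```python
from typing import NamedTuple, Optional


class Read(NamedTuple):
    read1_gene: Optional[str]    # gene the first read aligned to (None if unaligned)
    read2_target: Optional[str]  # target decoded from the barcode read (None if unassigned)


def flag_rearrangements(reads, primer_gene):
    """Return reads whose first-read gene differs from the primer's gene.

    primer_gene maps each target-specific primer name to the gene it anneals in.
    Reads lacking either piece of information, or naming an unknown primer,
    are skipped.
    """
    flagged = []
    for r in reads:
        if r.read1_gene is None or r.read2_target is None:
            continue
        expected = primer_gene.get(r.read2_target)
        if expected is not None and r.read1_gene != expected:
            flagged.append(r)
    return flagged


primer_gene = {"ALK_exon20": "ALK", "ROS1_exon34": "ROS1"}
reads = [
    Read("ALK", "ALK_exon20"),   # consistent with the primer's gene
    Read("EML4", "ALK_exon20"),  # partner gene differs: candidate fusion
    Read(None, "ROS1_exon34"),   # unaligned, skipped
]
print(flag_rearrangements(reads, primer_gene))
# [Read(read1_gene='EML4', read2_target='ALK_exon20')]
```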

Embodiment 15. The method of embodiment 14, wherein the frequency of the genomic rearrangement is about 10% or less, or 5% or less.
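Embodiment 15 gives rearrangement frequencies of about 10% or less, or 5% or less. This sketch assumes frequency means the fraction of molecules (or reads) at a target that carry the rearrangement. The embodiment does not define the term. Under that assumption the calculation is a simple ratio. The counts below are made-up illustrative numbers, not results.

```python
def rearrangement_frequency(supporting: int, total: int) -> float:
    """Fraction of molecules (or reads) at a target that carry the rearrangement."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= supporting <= total:
        raise ValueError("supporting must be between 0 and total")
    return supporting / total


# Illustrative counts only: 12 rearranged molecules out of 400 at one target.
freq = rearrangement_frequency(supporting=12, total=400)
print(f"{freq:.1%}")  # 3.0%
print(freq <= 0.05)   # True: within the "5% or less" range
```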

Embodiment 16. The method of embodiment 14, wherein the genomic rearrangement is a translocation.

Embodiment 17. The method of embodiment 14, wherein data generated by sequencing from the first primer extension are compared with a known nucleic acid sequence, such as a known gDNA sequence, to determine the genomic rearrangement.
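Embodiment 17 compares first-read data with a known sequence such as gDNA. The sketch below uses exact substring matching as a toy stand-in for a real aligner, which is an assumption and not the method described here. It looks for a split point where the start of a read matches one hypothetical reference segment exactly and the remainder matches another. A read that satisfies this test spans a candidate junction between the two segments. Real data would require mismatch-tolerant alignment.

```python
def find_junction(read: str, ref_a: str, ref_b: str, min_len: int = 5):
    """Find a split where read[:i] occurs in ref_a and read[i:] occurs in ref_b.

    Exact substring matching is a toy stand-in for alignment. Both pieces
    must be at least min_len bases long. Returns the first qualifying split
    index i, or None if the read does not span an a-to-b junction.
    """
    for i in range(min_len, len(read) - min_len + 1):
        if read[:i] in ref_a and read[i:] in ref_b:
            return i
    return None


# Hypothetical reference segments for two loci.
ref_a = "AAAACCCCGGGGTTTT"
ref_b = "GTACGTACTTAGCTAG"

# A read built from ref_a[4:12] joined to ref_b[4:12].
fusion_read = ref_a[4:12] + ref_b[4:12]
print(find_junction(fusion_read, ref_a, ref_b))  # 8
print(find_junction(ref_a, ref_a, ref_b))        # None (no junction)
```

If the bases around the junction are shared by both segments, more than one split can qualify. This sketch simply reports the first one.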

Embodiment 18. A method of preparing a library of polynucleotides for sequencing by attaching target-specific barcodes, the method comprising: amplifying a pool of polynucleotides using a first set of amplification primers and a second set of amplification primers, wherein the first set of amplification primers hybridizes to a plurality of different sequences within the pool, and wherein each primer of the first set comprises a different target-specific barcode.
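Embodiment 18 requires each primer in the first set to carry a different target-specific barcode. One practical check when designing such a pool goes beyond the embodiment and is an assumption. Every barcode should be unique, and any two should differ at enough positions to tolerate sequencing errors. With a minimum pairwise Hamming distance of 3, a barcode read containing a single substitution is still closer to its true barcode than to any other. The pool contents and function names below are hypothetical.

```python
from itertools import combinations


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two equal-length barcodes."""
    if len({len(b) for b in barcodes}) > 1:
        raise ValueError("barcodes must all have the same length")
    return min(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(barcodes, 2)
    )


def check_primer_pool(pool, min_distance=3):
    """pool maps each target name to its target-specific barcode.

    True if all barcodes are unique and pairwise at least min_distance apart.
    """
    barcodes = list(pool.values())
    if len(set(barcodes)) != len(barcodes):
        return False
    if len(barcodes) < 2:
        return True
    return min_pairwise_distance(barcodes) >= min_distance


# Hypothetical pools of target-specific barcodes.
good_pool = {"ALK_exon20": "ACGTAC", "ROS1_exon34": "TGCATG", "RET_exon12": "GATCGA"}
bad_pool = {"ALK_exon20": "ACGTAC", "ROS1_exon34": "ACGTAA"}
print(check_primer_pool(good_pool))  # True
print(check_primer_pool(bad_pool))   # False (barcodes differ at one position)
```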

Embodiment 19. The method of embodiment 18, further comprising generating a library of adapter-bearing polynucleotides, wherein each adapter-bearing polynucleotide comprises an adapter attached to the polynucleotide, and the adapter comprises a second priming site and an adapter barcode.

Embodiment 20. The method of embodiment 19, wherein the second set of amplification primers hybridizes to the second priming site on the adapter, thereby generating adapter-bearing polynucleotide amplicons.

Embodiment 21. The method of embodiment 18, further comprising sequencing each of a plurality of adapter-bearing polynucleotide amplicons at two positions by performing a first primer extension and a second primer extension, wherein, for each polynucleotide amplicon, sequencing by the first primer extension and the second primer extension is performed in the same direction.

Embodiment 22. The method of embodiment 21, further comprising identifying a genomic rearrangement based on data generated by sequencing from the first primer extension and the second primer extension.

Embodiment 23. A composition or kit for detecting a genomic rearrangement in a polynucleotide having a first binding site, the composition or kit comprising: a first amplification primer comprising a target-specific primer and a target-specific barcode; and a second amplification primer.

Embodiment 24. The composition or kit of embodiment 23, further comprising an adapter comprising a second priming site and an adapter barcode, wherein the second amplification primer comprises a priming sequence that is complementary or identical to a sequence within the adapter.

Embodiment 25. The composition or kit of embodiment 23, wherein the second amplification primer hybridizes (1) to a portion of an adapter attached to the polynucleotide at a distance from the first priming site, or (2) to a second priming site of the polynucleotide, wherein the second priming site is at a distance from the first priming site.

Embodiment 26. The composition or kit of embodiment 24, wherein the adapter and/or the second priming site is at the 5' end of a strand of the polynucleotide and the first priming site is at the 3' end of that strand.

Embodiment 27. A method of detecting a genomic rearrangement in a polynucleotide, the method comprising: amplifying the polynucleotide with a first amplification primer and a second amplification primer, wherein the first amplification primer hybridizes to a first priming site of the polynucleotide and further comprises a target-specific barcode, and wherein the amplification produces a polynucleotide amplicon comprising sequences identical or complementary to the polynucleotide of interest and the target-specific barcode; and sequencing the polynucleotide amplicon at a first and a second position by performing a first primer extension and a second primer extension, wherein the first primer extension and the second primer extension are performed in the same direction.

Embodiment 28. The method of embodiment 27, wherein sequencing at the first position provides the sequence of at least a portion of the polynucleotide of interest and sequencing at the second position provides the sequence of the target-specific barcode.

Embodiment 29. The method of embodiment 27 or 28, wherein the first primer extension and the second primer extension are performed in the same direction on the polynucleotide during separate sequencing runs.

Embodiment 30. The method of any one of embodiments 27 to 29, wherein the sequencing is next-generation sequencing (NGS) or massively parallel sequencing.

Embodiment 31. The method of any one of embodiments 27 to 30, further comprising detecting a genomic rearrangement by single-end sequencing of at least one of the polynucleotide amplicons, for example by identifying the genomic rearrangement based on data generated by sequencing from the first primer extension and the second primer extension.

Embodiment 32. The method of embodiment 31, wherein the frequency of the genomic rearrangement is about 10% or less.

Embodiment 33. The method of embodiment 31 or 32, wherein the genomic rearrangement is a translocation.

Embodiment 34. The method of any one of embodiments 27 to 33, wherein data generated by sequencing from the first primer extension are compared with a known nucleic acid sequence, such as a known gDNA sequence, to determine the genomic rearrangement.

In view of this disclosure, it should be noted that the methods can be implemented in accordance with the present teachings. The various components, materials, structures, and parameters are included by way of illustration and example only and not in any limiting sense. In view of this disclosure, the present teachings can be implemented in other applications, and the components, materials, structures, and equipment for implementing those applications can be determined while remaining within the scope of the appended claims.

Claims

1. A method of preparing a polynucleotide for sequencing by attaching a target-specific barcode, the method comprising:

amplifying a polynucleotide with a first amplification primer and a second amplification primer, wherein the first amplification primer hybridizes to a first priming site of the polynucleotide and comprises a target-specific barcode, wherein the second amplification primer hybridizes to (1) a portion of an adapter attached to the polynucleotide at a distance from the first priming site, or (2) a second priming site of the polynucleotide, wherein the second priming site is at a distance from the first priming site, and wherein the first priming site is a portion of a fusion gene and the target-specific barcode is specific for that portion of the fusion gene;

wherein the amplification produces a polynucleotide amplicon comprising sequences identical or complementary to the polynucleotide of interest and the target-specific barcode,

wherein the method further comprises single-end sequencing of the polynucleotide amplicon at a first and a second location by performing a first sequencing primer extension and a second sequencing primer extension, wherein the first sequencing primer extension and the second sequencing primer extension are performed in the same direction.

2. The method of claim 1, further comprising attaching an adapter to the polynucleotide at a distance from the first priming site, wherein the adapter comprises a second priming site.

3. The method of claim 1, wherein the portion of the fusion gene is the junction of the fusion gene.

4. The method of claim 1, wherein the polynucleotide of interest comprises a plurality of polynucleotides of interest, and the method comprises attaching a plurality of adapters to the plurality of polynucleotides, thereby forming a plurality of adapter-bearing polynucleotides, each comprising a different molecular barcode.

5. The method of claim 1, wherein the polynucleotide of interest comprises a plurality of polynucleotides of interest and the first amplification primer comprises a plurality of first amplification primers having different target-specific primers and different target-specific barcodes, thereby forming a plurality of adapter-bearing polynucleotide amplicons, wherein each of the plurality of adapter-bearing polynucleotide amplicons comprises a different target-specific barcode.

6. The method of claim 1, wherein the polynucleotide amplicon or the adapter-bearing polynucleotide comprises a binding partner.

7. The method of claim 6, wherein the binding partner is a biotin moiety.

8. The method of claim 1, wherein the first sequencing primer extension and the second sequencing primer extension are performed in the same direction on the polynucleotide in separate sequencing runs.

9. A composition or kit for detecting a genomic rearrangement in a polynucleotide having a first priming site by single-end sequencing, the composition or kit comprising:

a first amplification primer comprising a target-specific primer and a target-specific barcode; and

a second amplification primer comprising a second primer sequence,

wherein the first priming site is a portion of a fusion gene and the target-specific barcode is specific for that portion of the fusion gene,

wherein the second amplification primer hybridizes (1) to a portion of an adapter attached to the polynucleotide at a distance from the first priming site, or (2) to a second priming site of the polynucleotide,

wherein the single-end sequencing comprises performing a first sequencing primer extension and a second sequencing primer extension, wherein the first sequencing primer extension and the second sequencing primer extension are performed in the same direction.

10. The composition or kit of claim 9, further comprising:

an adapter comprising a second priming site and an adapter barcode, and

wherein the second amplification primer comprises a priming sequence that is complementary or identical to a sequence within the adapter.

11. A method of detecting genomic rearrangements in a polynucleotide, the method comprising:

amplifying a polynucleotide with a first amplification primer and a second amplification primer, wherein the first amplification primer hybridizes to a first priming site of the polynucleotide and further comprises a target-specific barcode,

wherein the amplification produces a polynucleotide amplicon comprising sequences identical or complementary to the polynucleotide of interest and the target-specific barcode; and

performing single-end sequencing of the polynucleotide amplicon at a first and a second location by performing a first sequencing primer extension and a second sequencing primer extension, wherein the first sequencing primer extension and the second sequencing primer extension are performed in the same direction,

wherein the method is not used for diagnosing a disease.

12. The method of claim 11, wherein sequencing at the first location provides a sequence of at least a portion of the polynucleotide of interest and sequencing at the second location provides a sequence of a target-specific barcode.

13. The method of claim 11, wherein the first sequencing primer extension and the second sequencing primer extension are performed in the same direction on the polynucleotide in separate sequencing runs.

14. The method of claim 11, wherein the sequencing is next-generation sequencing or massively parallel sequencing.

15. The method of claim 11, further comprising detecting a genomic rearrangement by single-end sequencing of at least one of the polynucleotide amplicons.

16. The method of claim 15, wherein the genomic rearrangement is identified based on data generated by the first sequencing primer extension and the second sequencing primer extension.

17. The method of claim 15, wherein the frequency of genomic rearrangements is 10% or less.

18. The method of claim 15, wherein the genomic rearrangement is a translocation.

19. The method of claim 11, wherein data generated by sequencing from the first sequencing primer extension are compared with a known nucleic acid sequence to determine a genomic rearrangement.

20. The method of claim 19, wherein the known nucleic acid sequence is a known gDNA sequence.