
CN111868255A - Methods and reagents for enriching nucleic acid material for sequencing applications and other nucleic acid material interrogation - Google Patents

Methods and reagents for enriching nucleic acid material for sequencing applications and other nucleic acid material interrogation

Info

Publication number
CN111868255A
Authority
CN
China
Prior art keywords
nucleic acid
target
sequencing
acid material
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980019408.4A
Other languages
Chinese (zh)
Inventor
J·J·索尔克
L·N·威廉姆斯
李覃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Twinstrand Biosciences Inc
Original Assignee
Twinstrand Biosciences Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Twinstrand Biosciences Inc
Publication of CN111868255A
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10: Transferases (2.)
    • C12N9/12: Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241: Nucleotidyltransferases (2.7.7)
    • C12N9/1276: RNA-directed DNA polymerase (2.7.7.49), i.e. reverse transcriptase or telomerase
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6816: Hybridisation assays characterised by the detection means
    • C12Q1/6818: Hybridisation assays characterised by the detection means involving interaction of two or more labels, e.g. resonant energy transfer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/686: Polymerase chain reaction [PCR]
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/10: Type of nucleic acid
    • C12N2310/20: Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/50: Physical structure
    • C12N2310/53: Physical structure partially self-complementary or closed
    • C12N2310/531: Stem-loop; Hairpin
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2531/00: Reactions of nucleic acids characterised by
    • C12Q2531/10: Reactions of nucleic acids characterised by the purpose being amplify/increase the copy number of target nucleic acid
    • C12Q2531/113: PCR

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Medicinal Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present technology relates generally to methods and compositions for targeted nucleic acid sequence enrichment, and to uses of such enrichment in error-corrected nucleic acid sequencing applications and other nucleic acid sequence interrogation. In some embodiments, the provided methods offer non-amplification-based targeted enrichment strategies that are compatible with the use of molecular barcodes for error correction. Other embodiments provide non-amplification-based targeted enrichment strategies that are compatible with direct digital sequencing (DDS) and other sequencing strategies that do not use molecular barcodes (e.g., single-molecule sequencing modalities and interrogation).

Figure 201980019408

Description

Methods and reagents for enriching nucleic acid material for sequencing applications and other nucleic acid material interrogation

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of U.S. Provisional Patent Application No. 62/643,738, filed March 15, 2018, the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

Numerous approaches have been developed at the level of protocol development, chemistry/biochemistry, and data processing to mitigate the impact of PCR-based errors in massively parallel sequencing (MPS, sometimes also referred to as next-generation DNA sequencing, NGS) applications. In addition, techniques are in common use in which PCR duplicates arising from an individual DNA fragment can be resolved, either on the basis of unique random shear points or through exogenous tagging before or during amplification (i.e., using molecular barcodes, also known as molecular tags, unique molecular identifiers [UMIs], and single molecule identifiers [SMIs]). This approach has been used to improve counting accuracy for DNA and RNA templates. Because all amplicons derived from a single starting molecule can be explicitly identified, any variation among the sequences of identically tagged sequencing reads can be used to correct base errors that arise during PCR or sequencing. For example, Kinde et al. (Proc Natl Acad Sci USA 108, 9530-9535, 2011) introduced SafeSeqS, which uses single-stranded molecular barcodes to reduce the sequencing error rate by grouping PCR copies that share a barcode sequence and forming a consensus. However, single-stranded molecular barcodes cannot completely eliminate PCR artifacts that arise in the first round of amplification, which are carried onto the derivative copies as "jackpot" events.
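The barcode-based grouping described above can be illustrated with a minimal Python sketch. It is not the SafeSeqS implementation: the function name `single_strand_consensus`, the `min_family_size` and `min_agreement` parameters, and the toy reads are all assumptions made for illustration. Reads are assumed to be already aligned and given as (barcode, sequence) pairs.

```python
from collections import Counter, defaultdict


def single_strand_consensus(reads, min_family_size=3, min_agreement=0.9):
    """Group (barcode, sequence) reads by barcode and call a per-position consensus.

    Hypothetical helper: thresholds and names are illustrative assumptions.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few PCR copies to out-vote an isolated error
        length = min(len(s) for s in seqs)
        called = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            # Keep the majority base only if enough family members agree.
            called.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(called)
    return consensus


reads = [
    ("AACG", "ACGTACGT"),
    ("AACG", "ACGTACGT"),
    ("AACG", "ACGAACGT"),  # isolated error at index 3 in one PCR copy
    ("TTGC", "ACGTACGT"),  # family of one read, dropped
]
lenient = single_strand_consensus(reads, min_agreement=0.6)
strict = single_strand_consensus(reads)
```

With the lenient threshold the isolated error is out-voted; with the strict default the disputed position is masked as `N`. An error introduced in the first amplification round would be present in every copy of the family, so a vote of this kind cannot remove it, which is the "jackpot" limitation noted above.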

Methods for higher-accuracy genotyping of single nucleotide polymorphism (SNP) loci, short tandem repeat (STR) loci, and many other forms of mutations and genetic variants are desirable in a variety of applications in medicine, forensics, genetic toxicology, and other scientific and industrial fields. One challenge, however, is how to generate sequence information most efficiently from as many of the relevant copies of genetic material being sequenced as possible, with the highest confidence yet at a reasonable cost. Various consensus sequencing approaches (both molecular-barcode-based and non-molecular-barcode-based) have been used successfully for error correction to help better identify variants in mixtures (see J. Salk et al., Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations, Nature Reviews Genetics, 2018, for a detailed discussion), but each involves trade-offs in performance. We have previously described Duplex Sequencing, an ultra-high-accuracy sequencing method that relies on genotyping and comparing the sequences of the independent strands of double-stranded nucleic acid molecules for the purpose of error correction. Aspects of the technology set forth herein describe methods for improving cost efficiency, recovery efficiency, and other performance metrics, as well as overall process speed, for Duplex Sequencing and other sequencing applications that aim to achieve high-accuracy sequencing reads.
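The idea of comparing the two independent strands of one molecule can be sketched as follows. This is a simplified illustration, not the published Duplex Sequencing pipeline: the names `reverse_complement` and `duplex_consensus` are hypothetical, and each strand is assumed to have already been reduced to its own consensus sequence, with the bottom strand written 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(top_consensus, bottom_consensus):
    """Keep a base only where both strand consensuses agree; mask the rest as N."""
    bottom_as_top = reverse_complement(bottom_consensus)
    if len(top_consensus) != len(bottom_as_top):
        raise ValueError("strand consensus sequences must cover the same positions")
    return "".join(
        t if t == b and t != "N" else "N"
        for t, b in zip(top_consensus, bottom_as_top)
    )


true_molecule = "ACGTACGA"
top = "ACGTTCGA"  # top-strand consensus carrying a first-round PCR error at index 4
bottom = reverse_complement(true_molecule)  # bottom-strand consensus, error-free
duplex = duplex_consensus(top, bottom)
```

Because an amplification error on one strand is unlikely to be mirrored by the complementary error at the same position on the other strand, the disagreement at index 4 is masked rather than reported as a variant.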

SUMMARY OF THE INVENTION

The present technology relates generally to methods for targeted nucleic acid sequence enrichment and to uses of such enrichment in error-corrected nucleic acid sequencing applications and other nucleic acid material interrogation. In some embodiments, highly accurate, error-corrected, and massively parallel sequencing of nucleic acid material is possible using target nucleic acid material that has been enriched from a sample. In some aspects, the target-enriched nucleic acid material is double-stranded, and one or more methods of uniquely labeling the strands of a double-stranded nucleic acid complex can be used in such a way that each strand can be informatically related to its complementary strand yet distinguished from it after each strand, or an amplified product derived therefrom, is sequenced; this information can further be used for error correction of the determined sequence. Some aspects of the present technology provide methods and compositions for improving the cost, the conversion efficiency of molecules sequenced, and the time efficiency of generating labeled molecules for targeted ultra-high-accuracy sequencing. In some embodiments, the provided methods and compositions allow accurate analysis of very small amounts of nucleic acid material (e.g., from a small clinical sample, from DNA floating freely in blood, or from a sample taken from a crime scene). In some embodiments, the provided methods and compositions allow detection of mutations present in a sample of nucleic acid material at a frequency of less than one in one hundred cells or molecules (e.g., less than one in one thousand, less than one in ten thousand, or less than one in one hundred thousand cells or molecules).
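To give a sense of scale for the mutation frequencies mentioned above, the following Python sketch estimates how many independent molecules would have to be sequenced to observe at least one copy of a variant. It assumes simple independent sampling with no sequencing error or molecule loss, which is an idealization and not a statement from this disclosure; the function name `molecules_needed` and the 95% default are likewise assumptions.

```python
import math


def molecules_needed(variant_frequency, detection_probability=0.95):
    """Smallest n such that 1 - (1 - f)**n >= p under independent sampling.

    f is the fraction of molecules carrying the variant and p is the desired
    probability of sampling at least one such molecule.
    """
    if not 0 < variant_frequency < 1:
        raise ValueError("variant_frequency must be between 0 and 1")
    if not 0 < detection_probability < 1:
        raise ValueError("detection_probability must be between 0 and 1")
    n = math.log(1 - detection_probability) / math.log(1 - variant_frequency)
    return math.ceil(n)


# Frequencies named in the text: 1 in 100, 1 in 1,000, 1 in 10,000, 1 in 100,000.
requirements = {f: molecules_needed(f) for f in (1e-2, 1e-3, 1e-4, 1e-5)}
```

Under these assumptions the required number of molecules grows roughly in inverse proportion to the variant frequency, which is one reason the efficiency with which input molecules are converted into sequenced molecules matters for rare-variant detection.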

Aspects of the present technology are directed to methods for enriching target nucleic acid material, the methods comprising providing nucleic acid material and cutting the nucleic acid material with one or more targeted endonucleases so that a target region of predetermined length is separated from the rest of the nucleic acid material. The methods may further comprise enzymatically destroying non-targeted nucleic acid material; releasing the target region of predetermined length from the targeted endonuclease; and analyzing the cut target region.
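The notion of a target region of predetermined length can be made concrete with a small in-silico sketch in Python. It models the two cut positions as plain zero-based coordinates supplied by the caller; how a particular targeted endonuclease selects its cut position is not modeled, and the function name `excise_target` and the toy sequence are assumptions for illustration only.

```python
def excise_target(sequence, cut_left, cut_right):
    """Split a sequence at two cut positions.

    Returns (left_flank, target, right_flank). The target length is fixed
    entirely by the chosen cut coordinates: cut_right - cut_left.
    """
    if not 0 <= cut_left < cut_right <= len(sequence):
        raise ValueError("require 0 <= cut_left < cut_right <= len(sequence)")
    return sequence[:cut_left], sequence[cut_left:cut_right], sequence[cut_right:]


# Toy molecule: 10-base left flank, 14-base target, 12-base right flank.
sequence = "TTTTTTTTTT" + "GATTACAGATTACA" + "CCCCCCCCCCCC"
left, target, right = excise_target(sequence, cut_left=10, cut_right=24)
```

Because both cut sites are chosen in advance, every molecule that is cut at both sites yields a target fragment of the same known length, in contrast to fragments produced by random shearing.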

Additional aspects of the present technology are directed to methods for enriching target nucleic acid material, the methods comprising providing nucleic acid material; cutting the nucleic acid material with one or more targeted endonucleases so that a target region of predetermined length is separated from the rest of the nucleic acid material, wherein at least one targeted endonuclease comprises a capture label; capturing the target region of predetermined length with an extraction moiety configured to bind the capture label; releasing the target region of predetermined length from the targeted endonuclease; and analyzing the cut target region.

Additional aspects of the present technology are directed to methods for enriching target nucleic acid material, comprising providing nucleic acid material; binding a catalytically inactive CRISPR-associated (Cas) enzyme to a target region of the nucleic acid material; enzymatically treating the nucleic acid material with one or more nucleic acid digesting enzymes so that non-targeted nucleic acid material is destroyed while the target region is protected from the digesting enzymes by the bound catalytically inactive Cas enzyme; releasing the target region from the catalytically inactive Cas enzyme; and analyzing the target region.

Another aspect of the present technology is directed to methods for enriching target nucleic acid material, comprising providing nucleic acid material; providing a pair of catalytically active targeted endonucleases and at least one catalytically inactive targeted endonuclease comprising a capture label, wherein the catalytically inactive targeted endonuclease is directed to bind a target region of the nucleic acid material, and wherein the pair of catalytically active targeted endonucleases is directed to bind the target region on either side of the catalytically inactive targeted endonuclease; cutting the nucleic acid material with the pair of catalytically active targeted endonucleases so that the target region is separated from the rest of the nucleic acid material; capturing the target region with an extraction moiety configured to bind the capture label; releasing the target region from the targeted endonucleases; and analyzing the cut target region.

Further aspects include methods for enriching target nucleic acid material from a sample comprising a plurality of nucleic acid fragments, comprising providing one or more catalytically inactive CRISPR-associated (Cas) enzymes having a capture label to a sample comprising target nucleic acid fragments and non-target nucleic acid fragments, wherein the one or more catalytically inactive Cas enzymes are configured to bind the target nucleic acid fragments; providing a surface comprising an extraction moiety configured to bind the capture label; and separating the target nucleic acid fragments from the non-target nucleic acid fragments by capturing the target nucleic acid fragments through binding of the capture label by the extraction moiety.

Various embodiments provide methods for enriching target double-stranded nucleic acid material, comprising providing nucleic acid material; cutting the nucleic acid material with one or more targeted endonucleases to generate a double-stranded target nucleic acid fragment comprising a 5' sticky end having a 5' predetermined nucleotide sequence and/or a 3' sticky end having a 3' predetermined nucleotide sequence; and separating the double-stranded target nucleic acid molecule from the rest of the nucleic acid material via at least one of the 5' sticky end and the 3' sticky end.

Additional embodiments provide kits for enriching target nucleic acid material, comprising a nucleic acid library that includes nucleic acid material and a plurality of catalytically inactive Cas enzymes, wherein the Cas enzymes comprise a tag having a sequence code, and wherein the plurality of Cas enzymes bind to a plurality of site-specific target regions along the nucleic acid material. The kits further comprise a plurality of probes, wherein each probe comprises an oligonucleotide sequence and a capture label, the oligonucleotide sequence comprising the complement of a corresponding sequence code. The kits can also include a look-up table cataloguing the relationships among the site-specific target regions, the sequence codes associated with the site-specific target regions, and the probes comprising the complements of the corresponding sequence codes.
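One way such a look-up table might be represented in software is sketched below in Python. The data layout, the names `LookupEntry`, `build_lookup_table`, and `region_for_probe`, and the region labels and sequence codes are all hypothetical; the disclosure does not specify a format. The probe sequence is taken here to be the reverse complement of the sequence code, which is one reading of "complement" for two oligonucleotides that hybridize.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class LookupEntry:
    target_region: str   # site-specific target region bound by a tagged Cas enzyme
    sequence_code: str   # sequence code carried on that enzyme's tag
    probe_sequence: str  # probe oligonucleotide that hybridizes to the code


def build_lookup_table(codes_by_region):
    """Map each target region to its sequence code and matching probe."""
    return {
        region: LookupEntry(region, code, reverse_complement(code))
        for region, code in codes_by_region.items()
    }


def region_for_probe(table, probe_sequence):
    """Return the target region captured by a given probe, or None if unknown."""
    for entry in table.values():
        if entry.probe_sequence == probe_sequence:
            return entry.target_region
    return None


# Hypothetical region labels and codes, for illustration only.
table = build_lookup_table({"region_A": "ACGGTC", "region_B": "TTGACA"})
```

Looking up `region_for_probe(table, "GACCGT")` returns `"region_A"`, so a user who pulls down material with a given probe can tell which target region it corresponds to.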

In some embodiments, the error-corrected sequence reads are used to identify or characterize, in an organism or subject from which the double-stranded target nucleic acid molecule is derived, a cancer, a cancer risk, a cancer mutation, a cancer metabolic state, a mutant phenotype, a carcinogen exposure, a toxin exposure, a chronic inflammation exposure, an age, a neurodegenerative disease, a pathogen, a drug-resistant variant, a fetal molecule, a forensically relevant molecule, an immunologically relevant molecule, a mutated T-cell receptor, a mutated B-cell receptor, a mutated immunoglobulin locus, a kataegis site in a genome, a hypermutable site in a genome, a low-frequency variant, a subclonal variant, a minority population of molecules, a source of contamination, a nucleic acid synthesis error, an enzymatic modification error, a chemical modification error, a gene editing error, a gene therapy error, a piece of nucleic acid information storage, a microbial quasispecies, a viral quasispecies, an organ transplant, an organ transplant rejection, a cancer relapse, residual cancer after treatment, a preneoplastic state, a dysplastic state, a microchimerism state, a stem cell transplant state, a cellular therapy state, a nucleic acid label affixed to another molecule, or a combination thereof. In some embodiments, the error-corrected sequence reads are used to identify a carcinogenic compound or exposure. In some embodiments, the error-corrected sequence reads are used to identify a mutagenic compound or exposure. In some embodiments, the nucleic acid material is derived from a forensic sample, and the error-corrected sequence reads are used in forensic analysis.

In some embodiments, the single molecule identifier sequence comprises an endogenous shear point or an endogenous sequence that can be positionally related to the shear point. In some embodiments, the single molecule identifier sequence is at least one of a degenerate or semi-degenerate barcode sequence, one or more nucleic acid fragment ends of the nucleic acid material, or a combination thereof, that uniquely labels the double-stranded nucleic acid molecule. In some embodiments, the adapter and/or adapter sequence comprises at least one nucleotide position that is at least partially non-complementary, or comprises at least one non-standard base. In some embodiments, the adapter comprises a single "U-shaped" oligonucleotide sequence formed by about 5 or more self-complementary nucleotides.

In accordance with various embodiments, any of a variety of nucleic acid materials may be used. In some embodiments, the nucleic acid material may comprise at least one modification to a polynucleotide within the canonical sugar-phosphate backbone. In some embodiments, the nucleic acid material may comprise at least one modification within any base in the nucleic acid material. By way of non-limiting example, in some embodiments, the nucleic acid material is or comprises at least one of double-stranded DNA, double-stranded RNA, peptide nucleic acids (PNAs), and locked nucleic acids (LNAs).

In some embodiments, the provided methods further comprise ligating adapter molecules to the double-stranded nucleic acid molecule. In some embodiments, the ligating step comprises ligating the double-stranded nucleic acid material to at least one double-stranded degenerate barcode sequence to form a double-stranded nucleic acid molecule-barcode complex, wherein the double-stranded degenerate barcode sequence comprises the single molecule identifier sequence in each strand. In some embodiments, the double-stranded nucleic acid molecule is a double-stranded DNA molecule or a double-stranded RNA molecule. In some embodiments, the double-stranded nucleic acid molecule comprises at least one modified nucleotide or non-nucleotide molecule.

In some embodiments, ligating comprises the activity of at least one ligase. In some embodiments, the at least one ligase is selected from a DNA ligase and an RNA ligase. In some embodiments, ligating comprises ligase activity at a ligation domain associated with the adapter molecule. In some embodiments, ligating comprises ligase activity at a ligation domain associated with the adapter molecule and at a ligateable end of the nucleic acid molecule. In some embodiments, the ligation domain and the ligateable end of the double-stranded nucleic acid molecule are compatible (e.g., have single-stranded regions that are complementary to each other). In some embodiments, the ligation domain is a nucleotide sequence from, or associated with, one or more degenerate or semi-degenerate nucleotides. In some embodiments, the ligation domain is a nucleotide sequence from one or more non-degenerate nucleotides. In some embodiments, the ligation domain contains one or more modified nucleotides. In some embodiments, the ligation domain and/or the ligateable end comprises a T-overhang, an A-overhang, a CG-overhang, a blunt end, a recombination sequence, an endonuclease cut-site overhang, a restriction digest overhang, or another ligateable region. In some embodiments, at least one strand of the ligation domain is phosphorylated. In some embodiments, the ligation domain comprises an endonuclease cleavage sequence or a portion thereof.

In some embodiments, the endonuclease cleavage sequence is cut by an endonuclease (e.g., a tunable endonuclease, a restriction endonuclease) to produce a blunt end or an overhang with a ligateable region. In some embodiments, the ligateable end of the double-stranded nucleic acid molecule comprises an endonuclease cleavage sequence or a portion thereof. In some embodiments, an endonuclease (e.g., a programmable/targeted endonuclease, a restriction endonuclease) produces an overhang comprising a "sticky end" or single-stranded overhang region of known nucleotide length (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides) and sequence.
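Whether two single-stranded overhangs of known length and sequence are compatible for annealing can be checked with a short Python sketch. The function name `overhangs_compatible` and the convention that both overhangs have the same polarity and are written 5' to 3' are assumptions for illustration; ligation chemistry, such as phosphorylation state, is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_compatible(overhang_a, overhang_b):
    """True when two same-polarity overhangs, both written 5'->3', can base-pair fully.

    Annealing strands run antiparallel, so one overhang must equal the
    reverse complement of the other and both must have the same length.
    """
    return (
        len(overhang_a) == len(overhang_b)
        and overhang_a == reverse_complement(overhang_b)
    )


palindromic = overhangs_compatible("AATT", "AATT")  # self-complementary 4-base overhang
a_and_t = overhangs_compatible("A", "T")            # single-base A-overhang with T-overhang
mismatched = overhangs_compatible("ACGT", "TTTT")   # no full base-pairing
```

A predetermined overhang sequence therefore determines which adapter or capture oligonucleotide end it can pair with, which is what allows a fragment to be selected through its sticky end.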

In some embodiments, the identifier sequence is or comprises a single molecule identifier (SMI) sequence. In some embodiments, the SMI sequence is an endogenous SMI sequence. In some embodiments, the endogenous SMI sequence is related to a shear point. In some embodiments, the SMI sequence comprises at least one degenerate or semi-degenerate nucleic acid. In some embodiments, the SMI sequence is non-degenerate. In some embodiments, the SMI sequence is a nucleotide sequence of one or more degenerate or semi-degenerate nucleotides. In some embodiments, the SMI sequence is a nucleotide sequence of one or more non-degenerate nucleotides. In some embodiments, the SMI sequence comprises at least one modified nucleotide or non-nucleotide molecule. In some embodiments, the SMI sequence comprises a primer binding domain.

In some embodiments, the modified nucleotide or non-nucleotide molecule is selected from 2-aminopurine, 2,6-diaminopurine (2-amino-dA), 5-bromo dU, deoxyuridine, inverted dT, inverted dideoxy-T, dideoxy-C, 5-methyl dC, deoxyinosine, Super [Figure BDA0002682281560000051], Super [Figure BDA0002682281560000052], locked nucleic acids, 5-nitroindole, 2'-O-methyl RNA bases, hydroxymethyl dC, iso-dG, iso-dC, fluoro C, fluoro U, fluoro A, fluoro G, 2-methoxyethoxy A, 2-methoxyethoxy MeC, 2-methoxyethoxy G, 2-methoxyethoxy T, 8-oxo-A, 8-oxo-G, 5-hydroxymethyl-2'-deoxycytidine, 5'-methylisocytosine, tetrahydrofuran, isocytosine, isoguanosine, uracil, methylated nucleotides, RNA nucleotides, ribonucleotides, 8-oxo-G, BrdU, Loto dU, furan, fluorescent dyes, azide nucleotides, abasic nucleotides, 5-nitroindole nucleotides, and digoxigenin nucleotides.

In some embodiments, the cleavage site is or includes a restriction endonuclease recognition sequence. In some embodiments, the cleavage site is or includes a user-directed recognition sequence for a targeted endonuclease (e.g., a CRISPR or CRISPR-like endonuclease) or other tunable endonuclease. In some embodiments, cleaving the nucleic acid material can include at least one of: enzymatic digestion, enzymatic cleavage, enzymatic cleavage of one strand, enzymatic cleavage of both strands, incorporation of a modified nucleic acid followed by enzymatic treatment that leads to cleavage of one or both strands, incorporation of a replication-blocking nucleotide, incorporation of a chain terminator, incorporation of a photocleavable linker, incorporation of uracil, incorporation of a ribose base, incorporation of an 8-oxo-guanine adduct, use of a restriction endonuclease, use of a ribonucleoprotein endonuclease (e.g., a Cas enzyme such as Cas9 or Cpf1) or other programmable endonuclease (e.g., a homing endonuclease, a zinc finger nuclease, a TALEN, a meganuclease (e.g., a megaTAL nuclease), an Argonaute nuclease, etc.), and any combination thereof.

In some embodiments, the capture label is or includes at least one of: acridine, azide, azide (NHS ester), digoxigenin (NHS ester), I-Linker, amino modifier C6, amino modifier C12, amino modifier C6 dT, Unilink amino modifier, hexynyl, 5-octadiynyl dU, biotin, biotin (azide), biotin dT, biotin TEG, dual biotin, PC biotin, desthiobiotin TEG, thiol modifier C3, dithiol, thiol modifier C6 S-S, and succinyl groups.

In some embodiments, the extraction moiety is or includes at least one of: aminosilane, epoxysilane, isothiocyanate, aminophenylsilane, aminopropylsilane, mercaptosilane, aldehyde, epoxide, phosphonate, streptavidin, avidin, a hapten that recognizes an antibody, a particular nucleic acid sequence, magnetically attractable particles (Dynabeads), and a photolabile resin.

In some embodiments, the provided methods further include amplifying the nucleic acid material using primers specific to the adapter sequences and/or primers specific to a non-adapter portion of the nucleic acid product. It is contemplated that any of a variety of methods for amplifying nucleic acid material may be used in accordance with various embodiments. For example, in some embodiments, at least one amplification step includes polymerase chain reaction (PCR), rolling circle amplification (RCA), multiple displacement amplification (MDA), isothermal amplification, polony amplification within an emulsion, bridge amplification on a surface, on the surface of a bead, or within a hydrogel, and any combination thereof. In some embodiments, amplifying the nucleic acid material includes using single-stranded oligonucleotides that are at least partially complementary to regions of the first adapter sequence and the second adapter sequence (e.g., at least partially complementary to the adapter sequences on the 5' and/or 3' ends of each strand of the nucleic acid material). In some embodiments, amplifying the nucleic acid material includes using a single-stranded oligonucleotide that is at least partially complementary to a region of a genomic sequence of interest and a single-stranded oligonucleotide that is at least partially complementary to a region of the adapter sequence.

In some embodiments, amplifying the nucleic acid material includes generating a plurality of amplicons derived from the first strand and a plurality of amplicons derived from the second strand.

In some embodiments, the provided methods further include the steps of cutting the nucleic acid material with one or more targeted endonucleases such that target nucleic acid fragments of substantially known length are formed, and isolating the target nucleic acid fragments based on the substantially known length. In some embodiments, the provided methods further include ligating an adapter (e.g., an adapter sequence) to the target nucleic acid (e.g., a target nucleic acid fragment) of substantially known length (e.g., after a size enrichment step).

In some embodiments, the nucleic acid material can be or include one or more target nucleic acid fragments. In some embodiments, the one or more target nucleic acid fragments each include a genomic sequence of interest from one or more locations in the genome. In some embodiments, the one or more target nucleic acid fragments include a targeted sequence from a substantially known region of the nucleic acid material. In some embodiments, isolating the target nucleic acid fragments based on the substantially known length includes enriching for the target nucleic acid fragments by gel electrophoresis, gel purification, liquid chromatography, size exclusion purification, filtration, or SPRI bead purification.
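By way of a non-limiting illustration, the relationship between targeted cut positions, the resulting fragment lengths, and a subsequent size-based selection can be sketched as follows. The coordinates, the size window, and the function names `fragments_from_cuts` and `size_select` are hypothetical and chosen only for the example; the sketch models the bookkeeping, not any particular laboratory procedure.

```python
def fragments_from_cuts(molecule_length, cut_positions):
    """Return (start, end) intervals produced by cutting a linear molecule.

    Positions are 0-based offsets along the molecule. Cuts at or beyond
    the molecule ends are ignored.
    """
    cuts = sorted({p for p in cut_positions if 0 < p < molecule_length})
    bounds = [0] + cuts + [molecule_length]
    return list(zip(bounds, bounds[1:]))


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the selection window."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]


# Hypothetical example: two guide-directed cuts 300 bases apart on a 10 kb molecule.
frags = fragments_from_cuts(10_000, [4_000, 4_300])
print(frags)                         # [(0, 4000), (4000, 4300), (4300, 10000)]
print(size_select(frags, 250, 350))  # [(4000, 4300)]
```

Because the cut sites are chosen in advance, the target fragment length is known before the experiment, so a narrow size window retains the target while excluding most off-target material.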

In some embodiments, the provided methods further include the step of cutting double-stranded nucleic acid material with one or more targeted endonucleases such that a double-stranded target nucleic acid fragment is formed, the fragment having a substantially known length and/or one or both ends with a single-stranded overhang of known sequence. In some embodiments, the provided methods further include the step of isolating the double-stranded target nucleic acid fragment based on the substantially known length and/or the sequence of the single-stranded overhang. In some embodiments, the provided methods further include ligating an adapter (e.g., an adapter sequence) to the double-stranded target nucleic acid (e.g., a target nucleic acid fragment) having the substantially known length and/or single-stranded overhang sequence. In some embodiments, the double-stranded target nucleic acid can have ligatable ends that are substantially uniquely compatible with (e.g., complementary to) the ligation domain of a ligation-selected adapter molecule, such that one or more target nucleic acid fragments that include a targeted sequence from a substantially known region within the nucleic acid material can be selectively enriched by amplification with primers specific to the adapter sequence associated with the ligation-selected adapter.

In accordance with various embodiments, some provided methods can be used to sequence any of a variety of suboptimal (e.g., damaged or degraded) samples of nucleic acid material. For example, in some embodiments, at least some of the nucleic acid material is damaged. In some embodiments, the damage is or includes at least one of oxidation, alkylation, deamination, methylation, hydrolysis, hydroxylation, nicking, intra-strand crosslinks, inter-strand crosslinks, blunt end strand breakage, staggered end double-strand breakage, phosphorylation, dephosphorylation, sumoylation, glycosylation, deglycosylation, putrescinylation, carboxylation, halogenation, formylation, single-stranded gaps, damage from heat, damage from desiccation, damage from UV exposure, damage from gamma radiation, damage from X-radiation, damage from ionizing radiation, damage from non-ionizing radiation, damage from heavy particle radiation, damage from nuclear decay, damage from beta radiation, damage from alpha radiation, damage from neutron radiation, damage from proton radiation, damage from cosmic radiation, damage from high pH, damage from low pH, damage from reactive oxidative species, damage from free radicals, damage from peroxide, damage from hypochlorite, damage from tissue fixation such as formalin or formaldehyde, damage from reactive iron, damage from low ionic conditions, damage from high ionic conditions, damage from unbuffered conditions, damage from nucleases, damage from environmental exposure, damage from fire, damage from mechanical stress, damage from enzymatic degradation, damage from microorganisms, damage from preparative mechanical shearing, damage from preparative enzymatic fragmentation, damage that occurs naturally in vivo, damage that occurs during nucleic acid extraction, damage that occurs during sequencing library preparation, damage introduced by a polymerase, damage introduced during nucleic acid repair, damage that occurs during nucleic acid end-tailing, damage that occurs during nucleic acid ligation, damage that occurs during sequencing, damage that occurs from mechanical handling of DNA, damage that occurs during passage through a nanopore, damage that occurs as part of aging in an organism, damage that occurs as a result of chemical exposure of an individual, damage caused by a mutagen, damage caused by a carcinogen, damage caused by a clastogen, damage that occurs due to in vivo inflammatory damage caused by oxygen exposure, damage due to one or more strand breaks, and any combination thereof.

It is contemplated that the nucleic acid material can come from any of a variety of sources. For example, in some embodiments, the nucleic acid material (e.g., including one or more double-stranded nucleic acid molecules) is provided from a sample from a human subject, an animal, a plant, a fungus, a virus, a bacterium, a protozoan, or any other life form. In other embodiments, the sample includes nucleic acid material that has been at least partially artificially synthesized. In some embodiments, the sample is or includes body tissue, a biopsy sample, a skin sample, blood, serum, plasma, sweat, saliva, cerebrospinal fluid, mucus, uterine lavage fluid, a vaginal swab, a Pap smear, a nasal swab, an oral swab, a tissue scraping, hair, a fingerprint, urine, stool, vitreous humor, peritoneal wash, sputum, bronchial lavage, oral lavage, pleural lavage, gastric lavage, gastric juice, bile, pancreatic duct lavage, bile duct lavage, common bile duct lavage, gallbladder fluid, synovial fluid, an infected wound, a non-infected wound, an archaeological sample, a forensic sample, a water sample, a tissue sample, a food sample, a bioreactor sample, a plant sample, a bacterial sample, a protozoan sample, a fungal sample, an animal sample, a viral sample, a multi-organism sample, a fingernail scraping, semen, prostatic fluid, vaginal fluid, a vaginal swab, fallopian tube lavage, cell-free nucleic acid, nucleic acid within a cell, a metagenomics sample, a lavage or swab of an implanted foreign body, nasal lavage, intestinal fluid, an epithelial brushing, epithelial lavage, a tissue biopsy sample, an autopsy sample, a necropsy sample, an organ sample, a human identification sample, a non-human identification sample, an artificially produced nucleic acid sample, a synthetic gene sample, a banked or stored nucleic acid sample, tumor tissue, a fetal sample, an organ transplant sample, a microbial culture sample, a nuclear DNA sample, a mitochondrial DNA sample, a chloroplast DNA sample, an apicoplast DNA sample, an organelle sample, and any combination thereof. In some embodiments, the nucleic acid material is from more than one source.

As described herein, in some embodiments it is advantageous to process the nucleic acid material so as to improve the efficiency, accuracy, and/or speed of a sequencing process. In some embodiments, the nucleic acid material includes nucleic acid molecules of a substantially uniform length and/or a substantially known length. In some embodiments, the substantially uniform length and/or substantially known length is between about 1 and about 1,000,000 bases. For example, in some embodiments, the substantially uniform length and/or substantially known length may be at least 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 15; 20; 25; 30; 35; 40; 50; 60; 70; 80; 90; 100; 120; 150; 200; 300; 400; 500; 600; 700; 800; 900; 1000; 1200; 1500; 2000; 3000; 4000; 5000; 6000; 7000; 8000; 9000; 10,000; 15,000; 20,000; 30,000; 40,000; or 50,000 bases in length. In some embodiments, the substantially uniform length and/or substantially known length may be at most 60,000; 70,000; 80,000; 90,000; 100,000; 120,000; 150,000; 200,000; 300,000; 400,000; 500,000; 600,000; 700,000; 800,000; 900,000; or 1,000,000 bases. As a specific, non-limiting example, in some embodiments the substantially uniform length and/or substantially known length is about 100 to about 500 bases. In some embodiments, the methods described herein include a step of targeted enrichment of the nucleic acid material, thereby providing nucleic acid molecules of one or more lengths and/or substantially known lengths. In some embodiments, the nucleic acid material is cut by one or more targeted endonucleases into nucleic acid molecules of a substantially uniform length and/or a substantially known length. In some embodiments, the targeted endonuclease includes at least one modification.

In some embodiments, the nucleic acid material includes nucleic acid molecules having a length within one or more substantially known size ranges. In some embodiments, the nucleic acid molecules may be 1 to about 1,000,000 bases, about 10 to about 10,000 bases, about 100 to about 1000 bases, about 100 to about 600 bases, about 100 to about 500 bases, or some combination thereof.

In some embodiments, the targeted endonuclease is or includes at least one of a restriction endonuclease (i.e., a restriction enzyme) that cleaves DNA at or near a recognition site (e.g., EcoRI, BamHI, XbaI, HindIII, AluI, AvaII, BsaJI, BstNI, DsaV, Fnu4HI, HaeIII, MaeIII, NlaIV, NsiI, MspJI, FspEI, NaeI, Bsu36I, NotI, HinfI, Sau3AI, PvuII, SmaI, HgaI, AluI, EcoRV, etc.). Lists of restriction endonucleases are available in both printed and computer-readable forms, and are provided by many commercial suppliers (e.g., New England Biolabs, Ipswich, MA). One of ordinary skill in the art will appreciate that any restriction endonuclease may be used in accordance with various embodiments of the present technology. In other embodiments, the targeted endonuclease is or includes at least one of a ribonucleoprotein complex, such as, for example, a CRISPR-associated (Cas) enzyme/guide RNA complex (e.g., Cas9 or Cpf1) or a Cas9-like enzyme. In other embodiments, the targeted endonuclease is or includes a homing endonuclease, a zinc finger nuclease, a TALEN, and/or a meganuclease (e.g., a megaTAL nuclease, etc.), an Argonaute nuclease, or a combination thereof. In some embodiments, the targeted endonuclease includes Cas9 or Cpf1 or a derivative thereof. In some embodiments, more than one targeted endonuclease may be used (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, a targeted endonuclease may be used to cut more than one potential target region of the nucleic acid material (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, where there is more than one target region of the nucleic acid material, each target region may be of the same (or substantially the same) length. In some embodiments, where there is more than one target region of the nucleic acid material, at least two of the target regions of known length differ in length (e.g., a first target region with a length of 100 bp and a second target region with a length of 1,000 bp).
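As a simplified, non-limiting illustration of how a recognition sequence determines fragment boundaries, the following sketch performs an in silico digest of a single top-strand sequence. It uses the well-known EcoRI site GAATTC, which is cut after the first G; the input sequence, the function name `digest`, and the top-strand-only model (which ignores the complementary strand and the resulting overhangs) are assumptions made for the example.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Split a top-strand sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site at which the top strand
    is cut (1 for EcoRI, which cuts G^AATTC). Only the top strand is modeled.
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical input containing two EcoRI sites.
fragments = digest("AAAGAATTCTTTTGAATTCGG")
print(fragments)                    # ['AAAG', 'AATTCTTTTG', 'AATTCGG']
print([len(f) for f in fragments])  # [4, 10, 7]
```

The same bookkeeping applies to guide-directed nucleases, except that the cut positions come from the chosen guide sequences instead of a fixed recognition motif.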

In some embodiments, at least one amplification step includes at least one primer and/or adapter sequence that is or includes at least one non-standard nucleotide. As a further example, in some embodiments, at least one adapter sequence is or includes at least one non-standard nucleotide. In some embodiments, the non-standard nucleotide is selected from uracil, a methylated nucleotide, an RNA nucleotide, a ribonucleotide, 8-oxo-guanine, a biotinylated nucleotide, a desthiobiotin nucleotide, a thiol-modified nucleotide, an acrylate-modified nucleotide, iso-dC, iso-dG, a 2'-O-methyl nucleotide, an inosine nucleotide, a locked nucleic acid, a peptide nucleic acid, 5-methyl dC, 5-bromo deoxyuridine, 2,6-diaminopurine, a 2-aminopurine nucleotide, an abasic nucleotide, a 5-nitroindole nucleotide, an adenylated nucleotide, an azide nucleotide, a digoxigenin nucleotide, an I-linker, a 5' hexynyl-modified nucleotide, 5-octadiynyl dU, a photocleavable spacer, a non-photocleavable spacer, a click chemistry compatible modified nucleotide, a fluorescent dye, biotin, furan, BrdU, fluoro-dU, iodo-dU, and any combination thereof.

In accordance with several embodiments, any of a variety of analysis steps may be used in order to improve one or more of the accuracy, speed, and efficiency of the provided processes. For example, in some embodiments, sequencing each of the first nucleic acid strand and the second nucleic acid strand of a double-stranded nucleic acid molecule includes comparing the sequences of a plurality of strands derived from the first nucleic acid strand to determine a first strand consensus sequence, and comparing the sequences of a plurality of strands derived from the second nucleic acid strand to determine a second strand consensus sequence. In some embodiments, comparing the sequence of the first nucleic acid strand to the sequence of the second nucleic acid strand includes comparing the first strand consensus sequence and the second strand consensus sequence to provide an error-corrected consensus sequence. In other embodiments, the error-corrected sequence of a double-stranded target nucleic acid molecule can be determined by comparing a single sequence read from the first nucleic acid strand with a single sequence read from the second nucleic acid strand.
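The two-stage comparison described above can be sketched, for illustration only, as a per-position majority vote within each strand family followed by a position-by-position agreement check between the two strand consensus sequences. The sketch assumes that all reads have equal length and have already been oriented to the same reference strand; the agreement threshold, the use of "N" to mask unresolved positions, and the function names are assumptions for the example and are not prescribed by the disclosure.

```python
from collections import Counter


def single_strand_consensus(reads, min_fraction=0.6):
    """Per-position majority call across reads derived from one original strand.

    Positions where the most common base falls below min_fraction are masked
    with "N". Reads are assumed to be equal in length and already aligned.
    """
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(first_strand_consensus, second_strand_consensus):
    """Keep a base only where both strand consensus sequences agree."""
    return "".join(
        a if a == b else "N"
        for a, b in zip(first_strand_consensus, second_strand_consensus)
    )


# Hypothetical read families: one sporadic error in the first-strand family and
# a recurring mismatch (for example, from single-strand damage) in the second.
first = single_strand_consensus(["ACGTAC", "ACGTAC", "ACTTAC"])
second = single_strand_consensus(["ACGTAC", "ACGAAC", "ACGAAC"])
print(first, second)                    # ACGTAC ACGAAC
print(duplex_consensus(first, second))  # ACGNAC
```

In this example the sporadic error is removed within the first strand family, while the mismatch that dominates the second strand family is masked only when the two strands are compared, which is the motivation for requiring agreement between both strands.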

One aspect provided by some embodiments is the ability to generate high-quality sequencing information from very small amounts of nucleic acid material. In some embodiments, the provided methods and compositions can be used with an amount of starting nucleic acid material of at most about 1 picogram (pg); 10 pg; 100 pg; 1 nanogram (ng); 10 ng; 100 ng; 200 ng; 300 ng; 400 ng; 500 ng; 600 ng; 700 ng; 800 ng; 900 ng; or 1000 ng. In some embodiments, the provided methods and compositions can be used with an input amount of nucleic acid material of at most 1 molecular copy or genome equivalent, 10 molecular copies or genome equivalents thereof, 100 molecular copies or genome equivalents thereof, 1,000 molecular copies or genome equivalents thereof, 10,000 molecular copies or genome equivalents thereof, 100,000 molecular copies or genome equivalents thereof, or 1,000,000 molecular copies or genome equivalents thereof. For example, in some embodiments, at most 1,000 ng of nucleic acid material is initially provided for a particular sequencing process. In some embodiments, at most 100 ng of nucleic acid material is initially provided for a particular sequencing process. In some embodiments, at most 10 ng of nucleic acid material is initially provided for a particular sequencing process. In some embodiments, at most 1 ng of nucleic acid material is initially provided for a particular sequencing process. In some embodiments, at most 100 pg of nucleic acid material is initially provided for a particular sequencing process. In some embodiments, at most 1 pg of nucleic acid material is initially provided for a particular sequencing process.
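To relate an input mass to an approximate number of genome equivalents, the mass of one genome copy can be estimated from its length. The sketch below is an approximation under stated assumptions that are not taken from the disclosure: an average mass of about 650 g/mol per base pair of double-stranded DNA and a haploid human genome of about 3.1 × 10^9 base pairs. The function names are hypothetical.

```python
AVOGADRO = 6.022e23              # molecules per mole
AVG_BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one DNA base pair


def genome_mass_pg(genome_bp):
    """Approximate mass, in picograms, of one copy of a double-stranded genome."""
    return genome_bp * AVG_BP_MASS_G_PER_MOL / AVOGADRO * 1e12


def genome_equivalents(input_mass_ng, genome_bp=3.1e9):
    """Approximate number of genome copies in a given input mass.

    The default genome_bp is an assumed haploid human genome size.
    """
    return input_mass_ng * 1000.0 / genome_mass_pg(genome_bp)


for mass_ng in (0.001, 1.0, 100.0):  # 1 pg, 1 ng, 100 ng
    print(f"{mass_ng} ng -> about {genome_equivalents(mass_ng):.1f} genome equivalents")
```

Under these assumptions one haploid human genome copy weighs roughly 3.3 pg, so 1 ng corresponds to on the order of 300 genome equivalents and 1 pg to less than one.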

As used in this application, the terms "about" and "approximately" are used as equivalents. Any publications, patents, or patent applications cited herein are incorporated by reference in their entirety. Any numerals used in this application, with or without "about" or "approximately", are meant to cover any normal fluctuations appreciated by one of ordinary skill in the relevant art.

In various embodiments, enrichment of nucleic acid material, including enrichment of the nucleic acid material for regions of interest, is provided at a faster rate (e.g., with fewer steps) and at lower cost (e.g., using fewer reagents), and results in an increase in the desired data. Various aspects of the present technology have many applications in both pre-clinical and clinical testing and diagnostics, as well as other applications.

Specific details of several embodiments of the technology are described below and with reference to FIGS. 1-22C. Although many of the embodiments are described herein with respect to Duplex Sequencing, other sequencing modalities capable of generating error-corrected sequencing reads, and other sequencing modalities for providing sequence information, are within the scope of the present technology in addition to those described herein. Further, other nucleic acid interrogations are contemplated to benefit from the nucleic acid enrichment methods and reagents described herein. Additionally, other embodiments of the present technology can have configurations, components, or procedures different from those described herein. A person of ordinary skill in the art will therefore understand that the technology can have other embodiments with additional elements, and that the technology can have other embodiments without several of the features shown and described below with reference to FIGS. 1-22C.

Description of the Drawings

Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily drawn to scale. Instead, emphasis is placed on clearly illustrating the principles of the present disclosure.

FIG. 1 is a graph plotting the relationship between nucleic acid insert size and the resulting family size following amplification, in accordance with an embodiment of the present technology.

FIGS. 2A and 2B are schematic diagrams showing sequencing data generated for different nucleic acid insert sizes in accordance with aspects of the present technology.

FIG. 3 is a schematic diagram illustrating the steps of a method for generating targeted fragment sizes with CRISPR/Cas9 in accordance with an embodiment of the present technology. Panel A shows gRNA-facilitated binding of Cas9 at the targeted DNA sites. Cas9-directed cleavage releases a blunt-ended double-stranded target DNA fragment of known length, as shown in Panel B. Panel C depicts a further processing step for positive enrichment/selection of the target DNA fragment by size selection. Optionally, as depicted in Panel D, the enriched DNA fragment can be ligated to adapters for nucleic acid interrogation, such as sequencing.

FIG. 4 is a schematic diagram illustrating the steps of a method for generating targeted nucleic acid fragments of known/selected length with a CRISPR/Cas9 variant in accordance with an embodiment of the present technology. Using a CRISPR/Cas9 ribonucleoprotein complex engineered to remain bound to DNA under suitable conditions, Panel A shows gRNA-facilitated binding of the variant Cas9 to the targeted DNA sites. Following cleavage, and while Cas9 remains bound to the cleaved 5' and 3' ends of the target DNA fragment, Panel B shows treatment of the sample with an exonuclease to hydrolyze exposed phosphodiester bonds at the exposed 3' or 5' ends of the DNA. Following negative enrichment/selection of the target DNA fragment through exonuclease destruction of all non-targeted DNA, Cas9 is dissociated from the DNA, releasing a blunt-ended double-stranded target DNA fragment of known length, as shown in Panel C. Panel D depicts an optional further processing step for positive enrichment/selection of the target DNA fragment by size selection. Optionally, as depicted in Panel E, the enriched DNA fragment can be ligated to adapters for nucleic acid interrogation, such as sequencing.

FIG. 5 is a schematic diagram illustrating the steps of a method for generating targeted nucleic acid fragments of known/selected length with a CRISPR/Cas9 variant in accordance with another embodiment of the present technology. Panel A shows the use of a CRISPR/Cas9 ribonucleoprotein complex engineered to remain bound to DNA under suitable conditions, wherein the ribonucleoprotein complex includes a capture label. Guide RNA (gRNA)-facilitated binding of the capture-labeled variant Cas9 ribonucleoprotein complex is followed by cleavage of the double-stranded target DNA. Following cleavage, and while Cas9 remains bound to the cleaved 5' and 3' ends of the target DNA fragment, Panel B shows treatment of the sample with an exonuclease to hydrolyze exposed phosphodiester bonds at the exposed 3' or 5' ends of the DNA. Following negative enrichment/selection of the target DNA fragment through exonuclease destruction of all non-targeted DNA, and while Cas9 remains bound, Panel C shows a positive enrichment/selection process for capturing the target nucleic acid, which includes the stepwise addition of a functionalized surface capable of binding the capture label associated with the ribonucleoprotein complex while the complex remains bound to the target nucleic acid. Following the affinity-based enrichment step, and as depicted in Panel D, Cas9 is dissociated from the DNA, releasing a blunt-ended double-stranded target DNA fragment of known length. Panel E depicts an optional further processing step for positive enrichment/selection of the target DNA fragment by size selection. Optionally, as depicted in Panel F, the enriched DNA fragment can be ligated to adapters for nucleic acid interrogation, such as sequencing.

Figure 6 is a schematic diagram illustrating the steps of a method for generating targeted nucleic acid fragments of known/selected length using a catalytically inactive variant of Cas9, in accordance with an embodiment of the present technology. The method uses a catalytically inactive Cas9 ribonucleoprotein complex engineered to target and bind double-stranded DNA. Panel A shows gRNA-facilitated binding of the variant Cas9 to the targeted DNA sites. Panel B shows that, after binding, the sample is treated with an exonuclease to hydrolyze the exposed phosphodiester bonds at the exposed 3' or 5' ends of the DNA. The catalytically inactive variant of Cas9 does not cleave the target DNA but confers exonuclease resistance, such that the exonuclease removes nucleotides one by one until it is blocked by the bound Cas9 complex. After negative enrichment/selection of the target DNA fragment through exonuclease destruction of all non-targeted DNA, the catalytically inactive Cas9 dissociates from the DNA and releases a double-stranded target DNA fragment of known length, as shown in Panel C. Panel D depicts an optional further processing step of positive enrichment/selection of the target DNA fragment by size selection. Optionally, as depicted in Panel E, the enriched DNA fragment can be ligated to adaptors for nucleic acid interrogation, such as sequencing.
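The way a bound, catalytically inactive Cas9 defines the length of the surviving fragment can be illustrated with a small model. The following Python sketch is illustrative only and is not part of the described method: it assumes an idealized linear duplex that is digested inward from both ends until the outermost bound complex is reached, and it ignores the strand polarity of real exonucleases. The function name `protected_fragment` and the coordinate convention (half-open footprints in base pairs) are hypothetical.

```python
def protected_fragment(dna_length, footprints):
    """Return (start, end) of the region that survives digestion, or None.

    Idealized assumption: exonuclease trims from both ends of a linear
    duplex and stops at the outermost bound complex on each side.
    footprints: list of (start, end) half-open intervals, in base pairs,
    covered by bound complexes (hypothetical coordinates).
    """
    if not footprints:
        return None  # nothing blocks digestion, so the molecule is degraded
    for start, end in footprints:
        if not (0 <= start < end <= dna_length):
            raise ValueError("footprint lies outside the molecule")
    left = min(start for start, _ in footprints)   # leftmost protected base
    right = max(end for _, end in footprints)      # one past rightmost protected base
    return left, right


# Two complexes bound at known sites on a 1000 bp molecule.
region = protected_fragment(1000, [(200, 223), (700, 723)])
fragment_length = region[1] - region[0]
```

Because the binding sites are chosen in advance, the fragment length in this model follows directly from the guide design, which is the sense in which the released fragment has a known length.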

Figure 7 is a schematic diagram illustrating the steps of a method for generating targeted fragment sizes using a catalytically inactive variant of Cas9, in accordance with another embodiment of the present technology. Panel A shows the use of a catalytically inactive variant of Cas9 in a ribonucleoprotein complex that is engineered to remain bound to DNA under suitable conditions and that includes a capture label. Guide RNA (gRNA)-facilitated binding of the capture-labeled, catalytically inactive variant Cas9 ribonucleoprotein complex is followed by addition of an exonuclease to the sample to hydrolyze the exposed phosphodiester bonds at the exposed 3' or 5' ends of the DNA. The catalytically inactive variant of Cas9 does not cleave the target DNA but confers exonuclease resistance, such that the exonuclease removes nucleotides one by one until it is blocked by the bound Cas9 complex. After negative enrichment/selection of the target DNA fragment through exonuclease destruction of all non-targeted DNA, and while the catalytically inactive Cas9 remains bound, Panel C shows a positive enrichment/selection process for target nucleic acid capture. This process includes stepwise addition of a functionalized surface capable of binding the capture label associated with the ribonucleoprotein complex while the complex remains bound to the target nucleic acid. Following the affinity-based enrichment step, and as depicted in Panel D, Cas9 dissociates from the DNA and releases a double-stranded target DNA fragment of known length. Panel E depicts an optional further processing step of positive enrichment/selection of the target DNA fragment by size selection. Optionally, as depicted in Panel F, the enriched DNA fragment can be ligated to adaptors for nucleic acid interrogation, such as sequencing.

Figure 8 is a schematic diagram illustrating a target nucleic acid enrichment scheme using catalytically active and catalytically inactive Cas9, in accordance with another embodiment of the present technology. Both catalytically active and catalytically inactive Cas9 ribonucleoprotein complexes can be targeted to desired sequences in a sample. The catalytically active Cas9 ribonucleoprotein complexes are directed to regions flanking the target DNA region and are used to cleave the target double-stranded DNA, releasing a blunt-ended double-stranded target DNA fragment of known length. One or more capture-labeled, catalytically inactive ribonucleoprotein complexes are directed to the target sequence region between the two selected cleavage sites. After the target DNA is cleaved to release the DNA fragment, addition of a functionalized surface capable of binding the capture label associated with the catalytically inactive ribonucleoprotein complex can facilitate positive enrichment/selection of the target fragment.

Figures 9A and 9B are conceptual illustrations of method steps for positive enrichment/selection of target nucleic acid fragments using a capture-labeled, catalytically inactive variant of the Cas9 ribonucleoprotein complex, in accordance with an embodiment of the present technology. Fragmented double-stranded DNA in a sample (e.g., mechanically sheared DNA, acoustically fragmented DNA, cell-free DNA, etc.) can be positively enriched/selected through target-directed binding of the catalytically inactive Cas9 ribonucleoprotein complex in solution (Figure 9A). Stepwise addition of a functionalized surface capable of binding the capture label associated with the ribonucleoprotein complex, while the complex remains bound to the target nucleic acid, facilitates pull-down (e.g., affinity purification) of the desired double-stranded DNA fragments while non-targeted fragments are discarded (Figure 9B).

Figure 10 is a schematic diagram illustrating method steps for positive enrichment/selection of target nucleic acid fragments using a capture-labeled, catalytically inactive variant of the Cas9 ribonucleoprotein complex, in accordance with an embodiment of the present technology. Panel A shows multiple fragmented double-stranded DNA fragments of different sizes in a sample, including molecule 2, which is too small to be reliably enriched by size selection or affinity-based methods. Panel B shows ligation of adaptors to the 5' and 3' ends of the molecules in the sample, which increases the length of such DNA fragments. Panel C shows a positive enrichment/selection step for molecule 2 through target-directed binding of the capture-labeled, catalytically inactive Cas9 ribonucleoprotein complex in solution, followed by affinity purification by pull-down.

Figure 11 is a schematic diagram illustrating the steps of a method for enriching targeted nucleic acid material using a negative enrichment scheme (Panel A) and a positive enrichment scheme (Panel B), in accordance with an embodiment of the present technology. Panel A shows ligation of hairpin adaptors to the 5' and 3' ends of a double-stranded target DNA molecule to generate an adaptor-nucleic acid complex with no exposed ends. In the negative enrichment/selection scheme, the adaptor-nucleic acid complexes are treated with an exonuclease to eliminate nucleic acid fragments and adaptors having unprotected 5' and 3' ends (e.g., adaptor-nucleic acid complexes lacking all four ligated phosphodiester bonds, unligated DNA, single-stranded nucleic acid material, free adaptors, etc.), as shown on the right side of Panel B. The exonuclease-resistant adaptor-nucleic acid complexes can be further enriched by size selection or by target sequence (e.g., CRISPR/Cas9 pull-down) (Panel B, left). The desired adaptor-target nucleic acid complexes can be further processed by amplification and/or sequencing.

Figure 12 shows an embodiment in which hairpin adaptors bearing a capture label are ligated to target double-stranded DNA for affinity-based enrichment, in accordance with another embodiment of the present technology.

Figure 13 is a schematic diagram, in accordance with an embodiment of the present technology, showing method steps for positive enrichment of adaptor-target nucleic acid complexes using hairpin adaptors (Panel A), followed by rolling circle amplification (Panels B and C) and amplicon preparation steps for generating amplicons of the first and second strands of the double-stranded nucleic acid fragment in substantially equal proportions (Panel D).

Figure 14 is a schematic diagram illustrating the steps of a method for using CRISPR/Cpf1 to generate targeted nucleic acid fragments of known/selected length having different 5' and 3' ligatable ends, in accordance with an embodiment of the present technology. The ligatable ends include single-stranded overhang regions of known nucleotide length and sequence. Panel A shows gRNA-facilitated binding of Cpf1 at the targeted DNA sites. Cpf1-directed cleavage produces a staggered cut that provides a 4-nucleotide (depicted) or 5-nucleotide overhang (e.g., a "sticky end"). Site-directed Cpf1 cleavage flanking the target DNA sequence generates a double-stranded target DNA fragment of known length (which can, for example, be enriched by size selection), with sticky end 1 at the 5' end of the fragment and sticky end 2 at the 3' end of the fragment (Panel B). Panel B further shows ligation of adaptor 1 at the 5' end of the fragment and ligation of adaptor 2 at the 3' end of the fragment, wherein adaptor 1 and adaptor 2 include overhang sequences that are at least partially complementary to sticky ends 1 and 2 on the fragment, respectively.
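The end-specific adaptor ligation described for Figure 14 depends on each adaptor overhang being complementary to one particular sticky end. The Python sketch below shows one way such a compatibility check could be expressed. It is an illustration under stated assumptions, not the method of the present technology: both overhangs are assumed to be written 5' to 3', full complementarity is required (the text only requires at least partial complementarity), and the example sequences are invented.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhangs_compatible(fragment_overhang, adaptor_overhang):
    """True if two single-stranded overhangs (each written 5'->3') can anneal.

    Simplifying assumption: the overhangs must be fully complementary
    over their whole length.
    """
    return adaptor_overhang.upper() == reverse_complement(fragment_overhang)


# Hypothetical 4-nucleotide sticky ends on the two ends of one fragment.
sticky_end_1 = "AGTC"
sticky_end_2 = "CCAT"
adaptor_1_overhang = reverse_complement(sticky_end_1)  # "GACT"
adaptor_2_overhang = reverse_complement(sticky_end_2)  # "ATGG"
```

With distinct, non-palindromic sticky ends, adaptor 1 pairs only with sticky end 1 and adaptor 2 only with sticky end 2, which is what allows different adaptors to be placed on the two ends of the same fragment.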

Figure 15 is a schematic diagram illustrating the steps of a method for affinity-based enrichment of target DNA fragments that include sticky ends (e.g., target DNA fragments such as those generated in the method of Figure 14), in accordance with an embodiment of the present technology. Panel A shows stepwise addition of a functionalized surface capable of binding the sticky end associated with the cleaved target DNA fragment in solution. Once the fragment is bound to the functionalized surface, the affinity interaction facilitates pull-down (e.g., affinity purification) of the desired double-stranded DNA fragments while non-targeted fragments are discarded, as shown in Panel B.

Figure 16 is a schematic diagram illustrating the steps of a method for affinity-based enrichment of target DNA fragments that include sticky ends (e.g., target DNA fragments such as those generated in the method of Figure 14), in accordance with another embodiment of the present technology. Panel A shows stepwise addition of a capture-labeled oligonucleotide having a nucleotide sequence that is at least partially complementary to a portion of the sticky end associated with the cleaved target DNA fragment in solution. As shown in Panel B, further addition of a functionalized surface capable of binding the capture label facilitates pull-down (e.g., affinity purification) of the desired double-stranded DNA fragments while non-targeted fragments are discarded.

Figure 17 is a schematic diagram illustrating the steps of a method for targeted fragment enrichment, using Cas9 nickases, of nucleic acid material of known length having different 5' and 3' ligatable ends, in accordance with an embodiment of the present technology. The ligatable ends include long single-stranded overhang regions of known nucleotide length and sequence. Panel A shows gRNA-targeted binding of paired Cas9 nickases in the targeted DNA region. Double-strand breaks can be introduced by using paired nickases to excise the target DNA region; when paired Cas9 nickases are used, a long overhang (sticky ends 1 and 2), rather than a blunt end, is produced at each cleaved end, as shown in Panel B. Panel C shows stepwise addition of a functionalized surface capable of binding the long sticky end (e.g., sticky end 1) associated with the cleaved target DNA fragment in solution. Once the fragment is bound to the functionalized surface, the affinity interaction facilitates pull-down (e.g., affinity purification) of the desired double-stranded DNA fragments while non-targeted fragments are discarded, as shown in Panel D. Panel E shows a variation of the positive enrichment step that includes adding a capture-labeled oligonucleotide having a nucleotide sequence that is at least partially complementary to a portion of the long sticky end (e.g., sticky end 1) associated with the cleaved target DNA fragment in solution. Panel F shows annealing of a second oligonucleotide strand that is at least partially complementary to a portion of the capture-labeled oligonucleotide. Enzymatic extension of the second oligonucleotide strand and ligation to the template DNA fragment generate an adaptor-target DNA complex. Further steps may include introducing a functionalized surface (not shown) capable of binding the capture label to facilitate pull-down (e.g., affinity purification) of the desired adaptor-double-stranded DNA complexes while non-targeted fragments are discarded.

Figure 18 is a schematic diagram illustrating a target nucleic acid enrichment scheme using catalytically inactive Cas9, in accordance with another embodiment of the present technology. Catalytically inactive Cas9 ribonucleoprotein complexes can be targeted to desired sequences in a sample. One or more catalytically inactive ribonucleoprotein complexes bearing one or more capture labels direct other protein complex structures to the target DNA region. Exonuclease resistance is provided where the protein complex structure covers the target DNA region. After treatment with an exonuclease, or with a combination of an endonuclease and an exonuclease, and affinity purification of the protein complexes (e.g., via capture labels bound to a functionalized surface, antibody pull-down, etc.), the target nucleic acid fragments can be released from the ribonucleoprotein complexes.

Figures 19A and 19B are conceptual illustrations of a DNA library and reagents prepared in accordance with an embodiment of the present technology, which can be used as a tool to selectively interrogate DNA regions of interest. Uniquely labeled, catalytically inactive Cas9 is targeted to multiple (e.g., spaced-apart) regions of isolated/unfragmented genomic DNA (or other large DNA fragments) (Figure 19A). Each catalytically inactive Cas9 ribonucleoprotein includes a known oligonucleotide tag with a known sequence (e.g., a code sequence) and binds to a pre-designed region of the genome. When using the DNA library, a user can add, stepwise, one or more probes that include the complement (e.g., an anti-code sequence) of the code sequence corresponding to the genomic region of interest. A fragmentation method (e.g., restriction enzyme digestion, mechanical shearing, etc.) can be used to fragment the genomic DNA into various sizes. The probes include a capture label attached or bound thereto (Figure 19B). A functionalized surface capable of binding the capture label can be added for affinity purification and positive enrichment of the desired genomic regions for interrogation.

Figure 20 shows the steps of a method for affinity-based enrichment and sequencing of target DNA fragments for use with a direct digital sequencing method, in accordance with an embodiment of the present technology. Panel A shows ligation of selected adaptors to a target DNA fragment that includes sticky ends (e.g., a target DNA fragment such as one generated in the method of Figure 14 or Figure 17). Panel A further shows ligation of adaptor 1 at the 5' end of the fragment and ligation of adaptor 2 at the 3' end of the fragment, wherein adaptor 1 and adaptor 2 include overhang sequences that are at least partially complementary to sticky ends 1 and 2 on the fragment, respectively. Adaptor 1 has a Y shape and includes 5' and 3' single-stranded arms bearing different labels (A and B) with different properties. Adaptor 2 is a hairpin-shaped adaptor. Panel B shows steps in the direct digital sequencing method, wherein label A is configured to bind to a functionalized surface. Label B provides a physical property (e.g., charge, magnetism, etc.) such that application of an electric or magnetic field causes denaturation of the first and second strands of the double-stranded adaptor-DNA complex, followed by electro-stretching of the DNA fragment. The first and second strands remain tethered by the hairpin adaptor, such that sequence information from the enriched/targeted strands provides duplex sequence information for error correction and other nucleic acid interrogation (e.g., assessment of DNA damage, etc.).

Figure 21 shows the steps of a method for affinity-based enrichment of target DNA fragments for sequencing using a direct digital sequencing method, in accordance with another embodiment of the present technology. Panel A shows affinity-based enrichment of a target DNA fragment that includes sticky ends (e.g., a target DNA fragment such as one generated in the method of Figure 14 or Figure 17). As shown, a hairpin adaptor has been ligated to the 3' end of the double-stranded DNA fragment in a sequence-dependent manner. The target DNA molecules can be flowed over a functionalized surface (e.g., one having bound oligonucleotides) capable of binding the sticky end associated with the cleaved target DNA fragment. In addition, a second oligonucleotide strand that includes label B and is at least partially complementary to a portion of the bound oligonucleotide is added to the solution. Annealing and ligation of the adaptor/DNA fragment components provide an adaptor-target double-stranded DNA complex bound to a surface suitable for direct digital sequencing (Panel B). Application of an electric or magnetic field and electro-stretching of the adaptor-DNA complex for the sequencing steps can proceed as described, for example, for Figure 20.

Figure 22A shows a nucleic acid adaptor molecule for use with some embodiments of the present technology, and a double-stranded adaptor-nucleic acid complex produced by ligation of the adaptor molecule to a double-stranded nucleic acid fragment, in accordance with an embodiment of the present technology.

Figures 22B and 22C are conceptual illustrations of various duplex sequencing method steps, in accordance with an embodiment of the present technology.

Definitions

In order for the present disclosure to be more readily understood, certain terms are first defined below. Additional definitions for the following terms and for other terms are set forth throughout the specification.

In this application, unless otherwise clear from context, the term "a" may be understood to mean "at least one." As used in this application, the term "or" may be understood to mean "and/or." In this application, the terms "comprising" and "including" may be understood to encompass itemized components or steps, whether presented by themselves or together with one or more additional components or steps. Where ranges are provided herein, the endpoints are included. As used in this application, the term "comprise" and variations of the term, such as "comprising" and "comprises," are not intended to exclude other additives, components, integers, or steps.

About: The term "about," when used herein in reference to a value, refers to a value that is similar, in context, to the referenced value. In general, those skilled in the art who are familiar with the context will appreciate the relevant degree of variance encompassed by "about" in that context. For example, in some embodiments, the term "about" may encompass a range of values within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less of the referenced value.
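As a simple arithmetic illustration of this definition, the following Python sketch tests whether a value falls within a chosen percentage of a referenced value. The function name `is_about` and the default tolerance of 10% are assumptions made for illustration; the definition above leaves the applicable percentage to context.

```python
def is_about(value, reference, tolerance=0.10):
    """True if value lies within tolerance (a fraction) of a nonzero reference.

    The 10% default is an arbitrary illustrative choice; the text lists
    percentages from 25% down to 1% or less, depending on context.
    """
    if reference == 0:
        raise ValueError("a relative tolerance is undefined for a zero reference")
    return abs(value - reference) <= tolerance * abs(reference)


within_10_percent = is_about(108, 100)            # difference 8, limit 10
outside_25_percent = is_about(130, 100, 0.25)     # difference 30, limit 25
```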

Analog: As used herein, the term "analog" refers to a substance that shares one or more particular structural features, elements, components, or moieties with a reference substance. Typically, an "analog" shows significant structural similarity to the reference substance, for example sharing a core or consensus structure, but also differs from it in certain discrete ways. In some embodiments, an analog is a substance that can be generated from the reference substance, e.g., by chemical manipulation of the reference substance. In some embodiments, an analog is a substance that can be generated by performing a synthetic process substantially similar to (e.g., sharing a plurality of steps with) the process that generates the reference substance. In some embodiments, an analog is, or can be, generated by performing a synthetic process different from that used to generate the reference substance.

Biological sample: As used herein, the term "biological sample" or "sample" typically refers to a sample obtained or derived from a biological source of interest (e.g., a tissue, organism, or cell culture), as described herein. In some embodiments, the source of interest comprises an organism, such as an animal or a human. In other embodiments, the source of interest comprises a microorganism, such as a bacterium, virus, protozoan, or fungus. In further embodiments, the source of interest may be a synthetic tissue, organism, cell culture, nucleic acid, or other material. In yet further embodiments, the source of interest may be a plant-based organism. In yet another embodiment, the sample may be an environmental sample, such as a water sample, a soil sample, an archaeological sample, or another sample collected from a non-living source. In other embodiments, the sample may be a multi-organism sample (e.g., a mixed-organism sample). In some embodiments, a biological sample is or comprises biological tissue or fluid. In some embodiments, a biological sample may be or comprise bone marrow; blood; blood cells; ascites; tissue or fine needle biopsy samples; cell-containing body fluids; free-floating nucleic acids; sputum; saliva; urine; cerebrospinal fluid; peritoneal fluid; pleural fluid; feces; lymph; gynecological fluids; skin swabs; vaginal swabs; Pap smears; oral swabs; nasal swabs; washings or lavages, such as ductal lavages or bronchoalveolar lavages; vaginal fluids; aspirates; scrapings; bone marrow specimens; tissue biopsy specimens; fetal tissue or fluids; surgical specimens; other body fluids, secretions, and/or excretions; and/or cells therefrom. In some embodiments, a biological sample is or comprises cells obtained from an individual. In some embodiments, the obtained cells are or include cells from the individual from whom the sample is obtained. In a particular embodiment, the biological sample is a liquid biopsy sample obtained from a subject. In some embodiments, a sample is a "primary sample" obtained directly from a source of interest by any appropriate means. For example, in some embodiments, a primary biological sample is obtained by a method selected from the group consisting of biopsy (e.g., fine needle aspiration or tissue biopsy), surgery, and collection of body fluid (e.g., blood, lymph, feces, etc.). In some embodiments, as will be clear from context, the term "sample" refers to a preparation obtained by processing a primary sample (e.g., by removing one or more components of the primary sample and/or by adding one or more agents to it), for example by filtering through a semi-permeable membrane. Such a "processed sample" may comprise, for example, nucleic acids or proteins extracted from a sample or obtained by subjecting a primary sample to techniques such as amplification or reverse transcription of mRNA, or isolation and/or purification of certain components.

Capture label: As used herein, the term "capture label" (which may also be referred to as a "capture tag," "capture moiety," "affinity label," "affinity tag," "epitope tag," "tag," or "prey" moiety or chemical group, among other names) refers to a moiety that can be integrated into, or onto, a target molecule or substrate for purposes of purification. In some embodiments, the capture label is selected from a group comprising a small molecule, a nucleic acid, a peptide, or any uniquely bindable moiety. In some embodiments, the capture label is attached to the 5' end of a nucleic acid molecule. In some embodiments, the capture label is attached to the 3' end of a nucleic acid molecule. In some embodiments, the capture label is conjugated to a nucleotide within the internal sequence of a nucleic acid molecule rather than at either end. In some embodiments, the capture label is a sequence of nucleotides within the nucleic acid molecule. In some embodiments, the capture label is selected from the group of biotin, biotin deoxythymidine (dT), biotin NHS, biotin TEG, desthiobiotin NHS, digoxigenin NHS, DNP, TEG, thiols, and others. In some embodiments, capture labels include, without limitation, biotin, avidin, streptavidin, a hapten recognized by an antibody, a particular nucleic acid sequence, and magnetically attractable particles. In some embodiments, a chemical modification of a nucleic acid molecule (e.g., Acridite™-modified, adenylated, azide-modified, alkyne-modified, I-Linker™-modified, etc.) can serve as a capture label.

Cut site: Also referred to as a "cleavage site" or "nick site", a cut site is a bond, or pair of bonds, between nucleotides in a nucleic acid molecule. In the case of a double-stranded nucleic acid molecule, such as double-stranded DNA, the cut site may comprise bonds (typically phosphodiester bonds) that lie directly opposite each other in the double-stranded molecule, such that a "blunt" end is formed upon cutting. The cut site may also comprise two nucleotide bonds, one on each strand of the pair, that are not directly opposite each other, such that cutting leaves a "sticky end" in which a region of single-stranded nucleotides remains at the end of the molecule. A cut site can be defined by a specific nucleotide sequence that is recognized by an enzyme, such as a restriction enzyme, or by another endonuclease with sequence recognition capability, such as CRISPR/Cas9. The cut site can lie within the recognition sequence of such an enzyme (i.e., type 1 restriction enzymes) or adjacent to it at some defined nucleotide spacing (i.e., type 2 restriction enzymes). A cut site can also be defined by the position of a modified nucleotide that is recognized by certain nucleases. For example, abasic sites can be recognized and cleaved by endonuclease VII as well as by the enzyme FPG. Uracil bases can be recognized by the enzyme UDG and converted into abasic sites. A ribose-containing nucleotide in an otherwise DNA sequence can be recognized and cleaved by RNase H2 when annealed to a complementary DNA sequence.
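As a non-limiting illustration of the blunt versus sticky distinction described above, the following Python sketch models a sequence-defined cut site by its recognition sequence and the cut position on each strand. The class name, field names, and helper function are hypothetical and are not part of the present disclosure; EcoRI (G^AATTC) and SmaI (CCC^GGG) are used only as familiar examples of a sticky and a blunt cutter, respectively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CutSite:
    """Hypothetical model of a sequence-defined cut site."""
    name: str
    recognition: str  # recognition sequence, written 5'->3' on the top strand
    top_cut: int      # cut falls after this many bases of the site (top strand)
    bottom_cut: int   # cut on the bottom strand, in the same top-strand coordinates

    def end_type(self) -> str:
        # Bonds directly opposite each other give a blunt end; offset bonds
        # leave a single-stranded overhang (a "sticky end").
        return "blunt" if self.top_cut == self.bottom_cut else "sticky"

    def overhang_length(self) -> int:
        return abs(self.top_cut - self.bottom_cut)


def find_cut_positions(sequence: str, site: CutSite) -> list[int]:
    """Return top-strand cut coordinates for every occurrence of the site."""
    positions = []
    start = sequence.find(site.recognition)
    while start != -1:
        positions.append(start + site.top_cut)
        start = sequence.find(site.recognition, start + 1)
    return positions


ECORI = CutSite("EcoRI", "GAATTC", top_cut=1, bottom_cut=5)  # 4-nt 5' overhang
SMAI = CutSite("SmaI", "CCCGGG", top_cut=3, bottom_cut=3)    # blunt cutter
```

In this sketch only the forward strand is searched, which is sufficient for palindromic recognition sequences such as the two examples shown.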

Determining: Many of the methods described herein include a step of "determining". Those of ordinary skill in the art, reading this specification, will appreciate that such "determining" can utilize or be accomplished through use of any of a variety of techniques available to those skilled in the art, including, for example, specific techniques explicitly referred to herein. In some embodiments, determining involves manipulation of a physical sample. In some embodiments, determining involves consideration and/or manipulation of data or information, for example utilizing a computer or other processing unit adapted to perform a relevant analysis. In some embodiments, determining involves receiving relevant information and/or materials from a source. In some embodiments, determining involves comparing one or more features of a sample or entity to a comparable reference.

Expression: As used herein, "expression" of a nucleic acid sequence refers to one or more of the following events: (1) production of an RNA template from a DNA sequence (e.g., by transcription); (2) processing of an RNA transcript (e.g., by splicing, editing, 5' cap formation, and/or 3' end formation); (3) translation of an RNA into a polypeptide or protein; and/or (4) post-translational modification of a polypeptide or protein.

Extraction moiety: As used herein, the term "extraction moiety" (which may also be referred to as a "binding partner", "affinity partner", or "bait" moiety or chemical group, among other names) refers to an isolatable moiety, or any type of molecule, that allows affinity separation of nucleic acids bearing a capture label from nucleic acids lacking the capture label. In some embodiments, the extraction moiety is selected from the group comprising small molecules, nucleic acids, peptides, antibodies, or any uniquely bindable moiety. The extraction moiety can be linked to, or linkable to, a solid phase or other surface in order to form a functionalized surface. In some embodiments, the extraction moiety is a sequence of nucleotides attached to a surface (e.g., a solid surface, a bead, a magnetic particle, etc.). In some embodiments, the extraction moiety is selected from the group of avidin, streptavidin, antibodies, polyhistidine tags, FLAG tags, or any chemical modification of a surface used for attachment chemistry. Non-limiting examples of the latter include azide and alkyne groups, or thiol azides and terminal alkynes, which can form 1,2,3-triazole bonds by "click" methods; aldehyde- and ketone-modified surfaces, which can react to immobilize I-Linker™-labeled oligonucleotides; and thiol-modified surfaces, which can react covalently with acrylate-modified oligonucleotides.

Functionalized surface: As used herein, the term "functionalized surface" refers to a solid surface, a bead, or another fixed structure that is capable of binding or immobilizing a capture label. In some embodiments, the functionalized surface comprises an extraction moiety capable of binding a capture label. In some embodiments, the extraction moiety is attached directly to the surface. In some embodiments, a chemical modification of the surface functions as the extraction moiety. In some embodiments, a functionalized surface can comprise controlled pore glass (CPG), magnetic porous glass (MPG), and other glass or non-glass surfaces. Chemical functionalization can include ketone modification, aldehyde modification, thiol modification, azide modification, and alkyne modification, among others. In some embodiments, the functionalized surface and an oligonucleotide used for adaptor synthesis are linked using one or more of a group of immobilization chemistries that form amide bonds, alkylamine bonds, thiourea bonds, diazo bonds, hydrazine bonds, and other surface chemistries. In some embodiments, the functionalized surface and an oligonucleotide used for adaptor synthesis are linked using one or more of a group of reagents including EDAC, NHS, sodium periodate, glutaraldehyde, pyridyl disulfides, nitrous acid, biotin, and other linking reagents.

gRNA: As used herein, "gRNA" or "guide RNA" refers to a short RNA molecule that comprises a scaffold sequence suitable for binding by a targeted endonuclease (e.g., a Cas enzyme such as Cas9 or Cpf1, or another ribonucleoprotein with similar properties) and a substantially target-specific sequence that facilitates cleavage of a specific region of DNA or RNA.

Nucleic acid: As used herein, in its broadest sense, "nucleic acid" refers to any compound and/or substance that is or can be incorporated into an oligonucleotide chain. In some embodiments, a nucleic acid is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage. As will be clear from context, in some embodiments, "nucleic acid" refers to an individual nucleic acid residue (e.g., a nucleotide and/or nucleoside); in some embodiments, "nucleic acid" refers to an oligonucleotide chain comprising individual nucleic acid residues. In some embodiments, a "nucleic acid" is or comprises RNA; in some embodiments, a "nucleic acid" is or comprises DNA. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleic acid residues. In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleic acid analogs. In some embodiments, a nucleic acid analog differs from a nucleic acid in that it does not utilize a phosphodiester backbone. For example, in some embodiments, a nucleic acid is, comprises, or consists of one or more "peptide nucleic acids", which are known in the art, have peptide bonds instead of phosphodiester bonds in the backbone, and are considered within the scope of the present technology. Alternatively or additionally, in some embodiments, a nucleic acid has one or more phosphorothioate and/or 5'-N-phosphoramidite linkages rather than phosphodiester bonds. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine). In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyladenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 2-thiocytidine, methylated bases, intercalated bases, and combinations thereof). In some embodiments, a nucleic acid comprises one or more modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, hexose, or locked nucleic acids) as compared with those in commonly occurring natural nucleic acids. In some embodiments, a nucleic acid has a nucleotide sequence that encodes a functional gene product such as an RNA or protein. In some embodiments, a nucleic acid includes one or more introns. In some embodiments, a nucleic acid may be a non-protein-coding RNA product, such as a microRNA, a ribosomal RNA, or a CRISPR/Cas9 guide RNA. In some embodiments, a nucleic acid serves a regulatory purpose in a genome. In some embodiments, a nucleic acid does not arise from a genome. In some embodiments, a nucleic acid includes intergenic sequences. In some embodiments, a nucleic acid derives from an extrachromosomal element or a non-nuclear genome (mitochondrial, chloroplast, etc.). In some embodiments, nucleic acids are prepared by one or more of isolation from a natural source, enzymatic synthesis by polymerization based on a complementary template (in vivo or in vitro), reproduction in a recombinant cell or system, and chemical synthesis. In some embodiments, a nucleic acid is at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000 or more residues long. In some embodiments, a nucleic acid is partly or wholly single-stranded; in some embodiments, a nucleic acid is partly or wholly double-stranded. In some embodiments, a nucleic acid has a nucleotide sequence comprising at least one element that encodes a polypeptide, or is the complement of a sequence that encodes a polypeptide. In some embodiments, a nucleic acid has enzymatic activity. In some embodiments, a nucleic acid serves a mechanical function, for example in a ribonucleoprotein complex or a transfer RNA. In some embodiments, a nucleic acid functions as an adaptor. In some embodiments, a nucleic acid can be used for data storage. In some embodiments, a nucleic acid can be chemically synthesized in vitro.

Reference: As used herein, "reference" describes a standard or control relative to which a comparison is performed. For example, in some embodiments, an agent, animal, individual, population, sample, sequence, or value of interest is compared with a reference or control agent, animal, individual, population, sample, sequence, or value. In some embodiments, a reference or control is tested and/or determined substantially simultaneously with the testing or determination of interest. In some embodiments, a reference or control is a historical reference or control, optionally embodied in a tangible medium. Typically, as would be understood by those skilled in the art, a reference or control is determined or characterized under conditions or circumstances comparable to those under assessment. Those skilled in the art will appreciate when sufficient similarities are present to justify reliance on and/or comparison to a particular possible reference or control.

Single molecule identifier (SMI): As used herein, the term "single molecule identifier" or "SMI" (which may be referred to as a "tag", "barcode", "molecular barcode", "unique molecular identifier", or "UMI", among other names) refers to any material (e.g., a nucleotide sequence or a nucleic acid molecule feature) that is capable of distinguishing an individual molecule within a large heterogeneous population of molecules. In some embodiments, an SMI can be or comprise an exogenously applied SMI. In some embodiments, an exogenously applied SMI may be or comprise a degenerate or semi-degenerate sequence. In some embodiments, a substantially degenerate SMI may be referred to as a random unique molecular identifier (R-UMI). In some embodiments, an SMI may comprise a code (e.g., a nucleic acid sequence) from within a pool of known codes. In some embodiments, predefined SMI codes are referred to as defined unique molecular identifiers (D-UMIs). In some embodiments, an SMI can be or comprise an endogenous SMI. In some embodiments, an endogenous SMI may be or comprise information related to specific shear points of a target sequence, or to features relating to the terminal ends of individual molecules comprising a target sequence. In some embodiments, an SMI may relate to a sequence variation in a nucleic acid molecule caused by random or semi-random damage, chemical modification, enzymatic modification, or other modification to the nucleic acid molecule. In some embodiments, the modification may be deamination of methylcytosine. In some embodiments, the modification may entail sites of nucleic acid nicks. In some embodiments, an SMI may comprise both exogenous and endogenous elements. In some embodiments, an SMI may comprise physically adjacent SMI elements. In some embodiments, SMI elements may be spatially distinct within a molecule. In some embodiments, an SMI may be a non-nucleic acid. In some embodiments, an SMI may comprise two or more different types of SMI information. Various embodiments of SMIs are further disclosed in International Patent Publication No. WO2017/100441, which is incorporated herein by reference in its entirety.
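As a non-limiting illustration of an SMI that combines an exogenous element with an endogenous element, the following Python sketch groups sequence reads into families keyed on both an exogenous tag sequence and the shear point of the fragment. The function name, the tuple layout of the reads, and the use of a single integer coordinate for the shear point are assumptions made for illustration only.

```python
from collections import defaultdict


def group_reads_by_smi(reads):
    """Group reads into families sharing the same composite SMI.

    Each read is a hypothetical (tag, shear_point, sequence) tuple, where
    `tag` is an exogenously applied identifier sequence and `shear_point`
    is the mapped coordinate of the fragment end (an endogenous element).
    """
    families = defaultdict(list)
    for tag, shear_point, sequence in reads:
        # The composite key distinguishes molecules that happen to share a
        # tag but were sheared at different positions, and vice versa.
        families[(tag, shear_point)].append(sequence)
    return dict(families)
```

Reads that fall in the same family are presumed, under these assumptions, to derive from the same original molecule.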

Strand defining element (SDE): As used herein, the term "strand defining element" or "SDE" refers to any material that allows a specific strand of a double-stranded nucleic acid material to be identified and thus differentiated from the other, complementary strand (e.g., any material that renders the amplification products of each of the two single-stranded nucleic acids resulting from a target double-stranded nucleic acid substantially distinguishable from each other after sequencing or other nucleic acid interrogation). In some embodiments, an SDE may be or comprise one or more segments of substantially non-complementary sequence within an adaptor sequence. In particular embodiments, a segment of substantially non-complementary sequence within an adaptor sequence can be provided by an adaptor molecule comprising a Y shape or a "loop" shape. In other embodiments, a segment of substantially non-complementary sequence within an adaptor sequence may form an unpaired "bubble" in the middle of adjacent complementary sequences within the adaptor sequence. In other embodiments, an SDE may comprise a nucleic acid modification. In some embodiments, an SDE may comprise physical separation of the paired strands into physically separated reaction compartments. In some embodiments, an SDE may comprise a chemical modification. In some embodiments, an SDE may comprise a modified nucleic acid. In some embodiments, an SDE may relate to a sequence variation in a nucleic acid molecule caused by random or semi-random damage, chemical modification, enzymatic modification, or other modification to the nucleic acid molecule. In some embodiments, the modification may be deamination of methylcytosine. In some embodiments, the modification may entail sites of nucleic acid nicks. Various embodiments of SDEs are further disclosed in International Patent Publication No. WO2017/100441, which is incorporated herein by reference in its entirety.

Subject: As used herein, the term "subject" refers to an organism, typically a mammal (e.g., a human, in some embodiments including prenatal human forms). In some embodiments, a subject is suffering from a relevant disease, disorder, or condition. In some embodiments, a subject is susceptible to a disease, disorder, or condition. In some embodiments, a subject displays one or more symptoms or characteristics of a disease, disorder, or condition. In some embodiments, a subject does not display any symptom or characteristic of a disease, disorder, or condition. In some embodiments, a subject is a person with one or more features characteristic of susceptibility to, or risk of, a disease, disorder, or condition. In some embodiments, a subject is a patient. In some embodiments, a subject is an individual to whom a diagnosis and/or therapy is and/or has been administered.

Substantially: As used herein, the term "substantially" refers to the qualitative condition of exhibiting total or near-total extent or degree of a characteristic or property of interest. One of ordinary skill in the biological arts will understand that biological and chemical phenomena rarely, if ever, go to completion and/or proceed to completeness, or achieve or avoid an absolute result. The term "substantially" is therefore used herein to capture the potential lack of completeness inherent in many biological and chemical phenomena.

Detailed Description

The present technology relates generally to methods for enriching nucleic acid material for sequencing applications and other nucleic acid material interrogation, and to associated reagents for use in such methods. Some embodiments of the technology are directed to enriching one or more regions of interest in nucleic acid material for sequencing applications, such as double-stranded sequencing applications and other sequencing applications for achieving high-accuracy sequencing reads. For example, various embodiments of the present technology include selectively enriching nucleic acid material (e.g., genomic DNA material) for regions of interest and performing a double-stranded sequencing method to provide error-corrected sequence reads of the enriched nucleic acid material. Further examples of the present technology are directed to methods for performing double-stranded sequencing methods or other sequencing methods (e.g., single consensus sequencing methods, Hyb&Seq™ sequencing methods, nanopore sequencing methods, etc.) on nucleic acid material enriched for regions of interest. In various embodiments, enrichment of nucleic acid material, including enrichment of nucleic acid material for regions of interest, is provided at a faster rate (e.g., with fewer steps) and at lower cost (e.g., using fewer reagents), and results in an increase in desirable data. Various aspects of the present technology have many applications in pre-clinical and clinical testing and diagnostics, as well as other applications.

Double-stranded sequencing (DS) is a method for producing error-corrected nucleic acid sequence reads from double-stranded nucleic acid molecules. In certain aspects of the technology, DS can be used to independently sequence both strands of individual nucleic acid molecules in such a way that, during massively parallel sequencing, the derivative sequence reads can be recognized as having originated from the same double-stranded parent molecule while also remaining distinguishable from each other as separate entities after sequencing. The resulting sequence reads from each strand are then compared in order to obtain an error-corrected sequence of the original double-stranded nucleic acid molecule, referred to as a double-stranded consensus sequence. The DS process makes it possible to confirm whether one or both strands of an original double-stranded nucleic acid molecule are represented in the sequencing data used to form the double-stranded consensus sequence.
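As a non-limiting illustration of the strand comparison described above, the following Python sketch first collapses the reads derived from each strand into a per-strand consensus and then retains a base only where the two per-strand consensuses agree. It is a simplified sketch rather than the method of the present disclosure: it assumes that reads have already been grouped by parent molecule and strand, are of equal length, and are all expressed in the same orientation. The function names and the 0.7 agreement threshold are hypothetical.

```python
from collections import Counter


def strand_consensus(reads, min_fraction=0.7):
    """Collapse equal-length reads from one strand into a consensus string."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must all have the same length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        # Positions without a sufficiently dominant base are masked as N.
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def double_strand_consensus(top_reads, bottom_reads, min_fraction=0.7):
    """Keep a base only where both per-strand consensuses agree."""
    top = strand_consensus(top_reads, min_fraction)
    bottom = strand_consensus(bottom_reads, min_fraction)
    if len(top) != len(bottom):
        raise ValueError("strand consensuses must have the same length")
    return "".join(
        a if a == b and a != "N" else "N" for a, b in zip(top, bottom)
    )
```

Under these assumptions, an error confined to one strand (for example, a base change introduced during amplification of that strand) produces a disagreement between strands and is masked rather than reported as a variant.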

The error rate of standard next-generation sequencing is approximately 1/100 to 1/1000, and when fewer than 1/100 to 1/1000 of the molecules carry a sequence variant, the presence of that variant is obscured by the background error rate of the sequencing process. DS, on the other hand, can accurately detect extremely low-frequency variants owing to the high degree of error correction obtained. The high degree of error correction provided by the strand comparison technique of DS reduces sequencing errors of double-stranded nucleic acid molecules by multiple orders of magnitude as compared with standard next-generation sequencing methods. This reduction in error improves the accuracy of sequencing for nearly all types of sequences, but it can be particularly well suited to biochemically challenging sequences that are well known in the art to be especially error prone, or to cases in which the population of molecules being sequenced is heterogeneous (i.e., a minor subset of the molecules carries a sequence variant that the others do not). One non-limiting example of such a type of sequence is homopolymers or other microsatellites/short tandem repeats. Another non-limiting example of error-prone sequences that benefit from DS error correction is molecules that have been damaged, for example by heating, radiation, mechanical stress, or a variety of chemical exposures, which create chemical adducts that are error prone during copying by one or more nucleotide polymerases, as well as damage that creates single-stranded DNA at the ends of molecules or as nicks and gaps. Double-stranded sequencing is particularly useful for reducing the high levels of error caused by damage in highly damaged DNA (oxidation, deamination, etc.), such as occurs through fixation processes (e.g., FFPE in clinical pathology), in ancient DNA, or in forensic applications in which the material has been exposed to harsh chemicals or environments.
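As a non-limiting, back-of-the-envelope illustration of why comparing the two strands can reduce errors by orders of magnitude, the following Python sketch estimates the probability that both strands independently carry the same erroneous base at a given position. The model is an assumption made for illustration and is not taken from the present disclosure: errors on the two strands are treated as independent, and an erroneous base is treated as equally likely to be any of the three alternative bases.

```python
def paired_strand_error_rate(single_strand_error_rate, alternative_bases=3):
    """Estimate the chance of a matching error on both strands at one position.

    Illustrative model only: errors are assumed independent between strands
    and uniformly distributed over the alternative bases.
    """
    if not 0.0 <= single_strand_error_rate <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    return single_strand_error_rate ** 2 / alternative_bases


def is_masked_by_background(variant_fraction, background_error_rate):
    """A variant rarer than the background error rate cannot be resolved."""
    return variant_fraction < background_error_rate
```

Under this simplified model, a per-strand error rate of 1/1000 corresponds to a matching two-strand error rate on the order of 1 in several million, so a variant present in 1 of 100,000 molecules would be masked by the single-strand background but not by the two-strand background.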

In further embodiments, DS can also be used for the accurate detection of minority sequence variants among a population of double-stranded nucleic acid molecules. One non-limiting example of this application is detection of a small number of DNA molecules derived from a cancer among a larger number of unmutated molecules from non-cancerous tissues within a subject. DS is also well suited to accurate genotyping of difficult-to-sequence regions of the genome (homopolymers, microsatellites, G-quadruplexes, etc.), where the error rate of standard sequencing is especially high. Another non-limiting application of rare variant detection by DS is early detection of DNA damage resulting from genotoxin exposure. A further non-limiting application of DS is detection of mutations generated by genotoxic or non-genotoxic carcinogens by observing clones in which driver mutations emerge. A yet further non-limiting application for accurate detection of minority sequence variants is generation of mutagenic signatures associated with genotoxins. Additional non-limiting examples of the utility of DS can be found in Salk et al., Nature Reviews Genetics 2018, PMID 29576615, which is incorporated herein by reference in its entirety.

Various embodiments related to the enrichment of nucleic acid material for sequencing applications and other nucleic acid material interrogation have utility in single-molecule sequencing applications and direct digital sequencing methods. In some embodiments, techniques using single-molecule hybridization with barcoded probes can be used to characterize and/or quantify genomic regions. In general, such techniques use molecular "barcodes" and single-molecule imaging to detect and count specific nucleic acid targets in a single reaction without amplification. Typically, each color-coded barcode is attached to a single target-specific probe corresponding to a genomic region of interest. Mixed together with controls, these form a multiplexed code set. In some embodiments, two probes are used to hybridize each individual target nucleic acid. In a particular arrangement, a reporter probe carries the signal and a capture probe allows the complex to be immobilized for data collection. Following hybridization, excess probes are removed, and the immobilized probe/target complexes can be analyzed by a digital analyzer for data collection. The color codes are counted and tabulated for each target molecule (e.g., a genomic region of interest). A suitable digital analyzer includes the [Figure BDA0002682281560000251] analysis system (NanoString™ Technologies; Seattle, WA). Methods and reagents including molecular "barcodes", and devices suitable for NanoString™ technology, are further described in, for example, U.S. Patent Publication Nos. 2010/0112710, 2010/0047924, and 2010/0015607, each of which is incorporated herein by reference in its entirety.
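As a non-limiting illustration of the counting and tabulating step described above, the following Python sketch maps each detected color code to its target through a code set and tallies the result. The data layout, the color-code strings, and the target names are hypothetical and are not drawn from any particular analysis system.

```python
from collections import Counter


def count_targets(detected_codes, code_set):
    """Tally detected color codes per target.

    `code_set` is a hypothetical mapping from a color-code string to the
    name of the target (e.g., a genomic region of interest) it reports on.
    Codes that are not in the code set are ignored.
    """
    counts = Counter()
    for code in detected_codes:
        target = code_set.get(code)
        if target is not None:
            counts[target] += 1
    return counts
```

Because each immobilized probe/target complex contributes one detected code in this sketch, the resulting counts are direct digital tallies of target molecules rather than amplified signal.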

Direct digital sequencing (DDS) technology includes methods for providing highly accurate single-molecule sequencing that simultaneously captures and directly sequences DNA and RNA for a variety of research, diagnostic, and other applications. DDS provides both short and long sequencing reads without the need for library creation or amplification steps, and is described, for example, in International Patent Publication No. WO 2016/081740, which is incorporated herein by reference. Typically, direct sequencing of a nucleic acid target is accomplished by hybridizing fluorescent molecular barcodes onto the native nucleic acid target. As further described in U.S. Patent No. 7,919,237, and as available from NanoString™ Technologies, Inc. (Seattle, WA), an oligomer that is an extension of the targeting nucleotide sequence is stretched by an electrostretching technique so that its monomers are spatially separated, with each monomer attached to a unique label. The pattern of labeled monomers can therefore be used to identify the barcode on the oligomeric tag.

In addition, various embodiments related to the enrichment of nucleic acid material have utility in other forms of characterization and/or quantification of nucleic acid material that are known in the art. For example, characterization of nucleic acid material to determine the presence or absence of genomic mutations or DNA variants, to quantify DNA or RNA copy number, and for other applications can benefit from selective enrichment of target nucleic acid material as provided herein. Examples of such methods include, but are not limited to, single-molecule sequencing (e.g., single-molecule real-time sequencing, nanopore sequencing, high-throughput sequencing or next-generation sequencing (NGS), etc.), digital PCR, bridge PCR, emulsion PCR, semiconductor sequencing, and the like. One of ordinary skill in the art will recognize other nucleic acid interrogation methods and technologies that are suitable for interrogating, and/or can benefit from, enriched nucleic acid material.

Methods incorporating DS, as well as other sequencing modalities, can include ligating one or more sequencing adapters to a target double-stranded nucleic acid molecule to produce a double-stranded target nucleic acid complex. Such adapter molecules may include one or more of a variety of features suitable for MPS platforms, such as, for example, sequencing primer recognition sites, amplification primer recognition sites, barcodes (e.g., single molecule identifier (SMI) sequences, index sequences, single-stranded portions, double-stranded portions, strand-distinguishing elements or features, etc.). The use of highly pure sequencing adapters for DS, or for any next-generation sequencing technology, is important for obtaining high-quality, reproducible data and for maximizing the sequence yield of a sample (i.e., the relative percentage of input molecules that are converted into independent sequence reads). This is particularly important for DS because both strands of the original double-stranded molecule must be successfully recovered.

Regarding the efficiency of the DS process or other high-accuracy sequencing modalities, two types of efficiency are further described herein: conversion efficiency and workflow efficiency. For the purpose of discussing the efficiency of DS, conversion efficiency can be defined as the fraction of unique nucleic acid molecules input into a sequencing library preparation reaction from which at least one double-stranded consensus sequence read is produced. Workflow efficiency may relate to relative inefficiencies in the amount of time, the relative number of steps, and/or the financial cost of reagents/materials needed to carry out the steps that produce a DS library and/or perform targeted enrichment of sequences of interest.

In some cases, limitations in one or both of conversion efficiency and workflow efficiency may limit the utility of high-accuracy DS for some applications to which it would otherwise be well suited. For example, where the number of copies of the target double-stranded nucleic acid is limited, a low conversion efficiency may yield less sequence information than desired. Non-limiting examples of this concept include DNA from circulating tumor cells or cell-free DNA derived from tumors, or DNA from a prenatal infant, that is shed into body fluids such as plasma and mixed with an excess of DNA from other tissues. Although DS typically has the accuracy to resolve one mutant molecule among more than one hundred thousand unmutated molecules, if, for example, only 10,000 molecules are available in a sample, then even at an ideal efficiency of 100% in converting them to double-stranded consensus sequence reads, the lowest mutation frequency that could be measured would be 1/(10,000*100%)=1/10,000. For a clinical diagnostic, maximum sensitivity for detecting low-level signals of cancer or of therapeutically relevant mutations may be important, so a relatively low conversion efficiency would be undesirable in this setting. Similarly, in forensic applications, very little DNA is often available for testing. When only nanogram or picogram quantities can be recovered from a crime scene or the site of a natural disaster, and DNA from multiple individuals is mixed together, maximal conversion efficiency may be important for detecting the presence of DNA from every individual in the mixture.
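The arithmetic in the preceding paragraph can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name `lowest_measurable_frequency` and its parameters are hypothetical, and the model simply treats the detection floor as one mutant molecule among the molecules that are successfully converted to double-stranded consensus reads.

```python
def lowest_measurable_frequency(input_molecules: int, conversion_efficiency: float) -> float:
    """Return the lowest mutation frequency that could be measured.

    Assumption (illustrative only): the floor is a single mutant molecule
    among the molecules converted to double-stranded consensus reads,
    i.e. 1 / (input_molecules * conversion_efficiency).
    """
    if input_molecules <= 0:
        raise ValueError("input_molecules must be positive")
    if not 0.0 < conversion_efficiency <= 1.0:
        raise ValueError("conversion_efficiency must be in (0, 1]")
    return 1.0 / (input_molecules * conversion_efficiency)


# The case from the text: 10,000 molecules at an ideal 100% efficiency.
ideal = lowest_measurable_frequency(10_000, 1.0)

# A hypothetical 25% efficiency raises the floor fourfold.
reduced = lowest_measurable_frequency(10_000, 0.25)
```

Under this simple model, any loss of conversion efficiency raises the detection floor proportionally, which is why limited-input samples are the cases most affected.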

In some cases, workflow inefficiencies can be similarly challenging for certain nucleic acid interrogation applications. One non-limiting example is clinical microbiology testing. It is sometimes necessary to rapidly determine the nature of one or more infectious organisms, for example in a microbial or polymicrobial bloodstream infection in which some organisms are resistant to particular antibiotics because of the unique genetic variants they carry, yet the time needed to culture the infectious organisms and determine their antibiotic susceptibility empirically is much longer than the window in which a decision about antibiotic therapy must be made. DNA sequencing of DNA from blood (or other infected tissues or body fluids) has the potential to be more rapid, and DS, for example, among other high-accuracy sequencing methods, can very accurately detect therapeutically important minority variants in the infecting population based on DNA signatures. Because workflow turnaround time to data generation can be critical for determining treatment options (e.g., as in the example used herein), applications that increase the speed of arriving at data output would also be desirable.

Further disclosed herein are methods and compositions for targeted nucleic acid sequence enrichment for a variety of nucleic acid material interrogation applications. In particular, some aspects of the present technology relate to methods and compositions for targeted enrichment of nucleic acid material, and to the use of such enrichment in error-corrected nucleic acid sequencing applications, which provide improvements in cost, in the conversion of molecules sequenced, and in the time efficiency of generating labeled molecules for targeted ultra-high-accuracy sequencing.

I. Selected Embodiments of Methods and Reagents for Enrichment of Nucleic Acid Material

In some embodiments, provided methods offer targeted enrichment strategies that are compatible with the use of molecular barcodes for error correction. Other embodiments provide methods for non-amplification-based targeted enrichment strategies that are compatible with DDS and with other sequencing strategies (e.g., single-molecule sequencing modalities and interrogations) that do not use molecular barcodes.

In some embodiments, it is advantageous to process nucleic acid material so as to increase the efficiency, accuracy, and/or speed of the sequencing process. According to further aspects of the present technology, the efficiency of DS can be enhanced, for example, by targeted nucleic acid fragmentation. Traditionally, fragmentation of nucleic acid (e.g., genomic, mitochondrial, plasmid, etc.) is accomplished either by physical shearing (e.g., sonication) or by relatively non-sequence-specific enzymatic approaches that use an enzyme cocktail to cleave DNA phosphodiester bonds. The result of either approach is a sample in which the intact nucleic acid material (e.g., genomic DNA (gDNA)) is reduced to a mixture of nucleic acid fragments of random or semi-random size. Although effective, these approaches generate nucleic acid fragments of variable size, which can lead to amplification bias (e.g., short fragments tend to PCR-amplify more efficiently than long fragments, and may cluster-amplify more readily during polony formation) and to uneven sequencing depth. For example, Figure 1 is a graph plotting the relationship between nucleic acid insert size and the resulting family size after amplification of a population of DNA molecules labeled with different molecular barcodes during library preparation. As shown in Figure 1, because shorter fragments tend to amplify preferentially, more copies of each of these shorter fragments are, on average, generated and sequenced, which gives the corresponding regions a disproportionate level of sequencing depth.

Furthermore, for longer fragments, the portion of the DNA that lies between the limits of a sequencing read (or between the ends of paired-end sequencing reads) extends beyond the maximum read length of the sequencing platform and is "dark": it cannot be interrogated even though the fragment was successfully ligated, amplified, and captured (Figure 2A). Likewise, for short fragments sequenced with paired-end reads, the two reads overlap on the same sequence in the middle of the molecule, which provides redundant information and is not cost-efficient (Figure 2B). Random or semi-random nucleic acid fragmentation can also produce unpredictable breakpoints in target molecules, yielding fragments that may have no complementarity, or reduced complementarity, to the bait strands used for hybrid capture, thereby reducing target capture efficiency. Random or semi-random fragmentation can also break up sequences of interest and/or produce very small or very large fragments that are lost during other stages of library preparation, which can reduce data yield and efficiency.
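The "dark" and redundant regions described above follow from simple geometry, which the following Python sketch makes explicit. It is illustrative only: the function name and the idealized model (two reads of equal length starting at opposite ends of the insert, with no adapter or quality trimming) are assumptions, not part of the disclosure.

```python
def paired_end_coverage(insert_len: int, read_len: int) -> dict:
    """Estimate dark and redundant bases for one paired-end sequenced insert.

    Idealized assumption: read 1 starts at one end of the insert, read 2 at
    the other, and both have the same length.
    """
    if insert_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    # A read cannot cover more bases than the insert contains.
    effective = min(read_len, insert_len)
    # Bases in the middle of a long insert that neither read reaches.
    dark = max(0, insert_len - 2 * effective)
    # Bases in the middle of a short insert that both reads cover.
    overlap = max(0, 2 * effective - insert_len)
    return {"dark": dark, "overlap": overlap, "covered": insert_len - dark}


long_insert = paired_end_coverage(insert_len=500, read_len=150)
short_insert = paired_end_coverage(insert_len=200, read_len=150)
matched_insert = paired_end_coverage(insert_len=300, read_len=150)
```

In this model an insert of exactly twice the read length has neither dark nor redundant bases, which illustrates why a predetermined fragment length matched to the read length can make better use of sequencing capacity.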

Another problem with many methods of random fragmentation, particularly mechanical or acoustic methods, is that they introduce damage beyond double-strand breaks, which can render portions of double-stranded DNA no longer double-stranded. For example, mechanical shearing can create 3' or 5' overhangs at the ends of molecules and single-stranded nicks or gaps in the middle of molecules. To make these single-stranded portions amenable to adapter ligation, a cocktail of enzymes (such as "end repair" enzymes) is used to artificially render them double-stranded again, and this can be a source of artifactual errors (such as, for example, "pseudo-double-stranded molecules" as described herein). In many embodiments, it is optimal to maximize the amount of the double-stranded nucleic acid of interest that remains in its native double-stranded form during processing. In addition, the high energy involved in many methods of random or semi-random mechanical fragmentation increases the abundance of DNA damage, such as oxidation, deamination, or the formation of other adducts, which may be mutagenic or inhibitory during amplification or sequencing and may introduce artifactual base calls or reduced signal. Some random or semi-random enzymatic fragmentation methods can similarly leave mutagenic or blocking "scars" at sites of partial cleavage.

Furthermore, for DS processing, both strands of the original target nucleic acid molecule must be successfully ligated. For example, in embodiments in which adapters are ligated to both the 5' and 3' ends of a molecule, four phosphodiester bonds must be successfully formed. If any one of these bonds fails to form, it is not possible to amplify and sequence both strands of the molecule. As noted above, failure to form the necessary bonds can occur for a variety of reasons, including, for example, damage to the ends of the target double-stranded nucleic acid molecule, incomplete end repair or tailing of library fragments, incompletely synthesized or damaged adapter molecules, and contamination of the ligation or preceding reactions with undesired enzymatic activity (e.g., exonuclease activity that can destroy the ligatable ends of adapters or library fragments, or degradation of the ligase that impairs its multi-step catalytic activity), among other reasons. Damage to the ends of library fragments can be particularly common with high-energy sonication or other mechanical DNA fragmentation.
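To show why the four-bond requirement matters, the following Python sketch estimates the fraction of molecules for which every required bond forms. It is a simplification under a labeled assumption: each bond is treated as forming independently with the same probability, which the source does not state, and the function name is hypothetical.

```python
def fully_ligated_fraction(per_bond_success: float, bonds_required: int = 4) -> float:
    """Fraction of molecules in which every required phosphodiester bond forms.

    Assumption (illustrative only): bonds form independently and with equal
    probability, so the joint probability is a simple power.
    """
    if not 0.0 <= per_bond_success <= 1.0:
        raise ValueError("per_bond_success must be in [0, 1]")
    if bonds_required < 1:
        raise ValueError("bonds_required must be at least 1")
    return per_bond_success ** bonds_required


# Hypothetical per-bond success rates, chosen only to show the compounding.
at_90_percent = fully_ligated_fraction(0.90)
at_70_percent = fully_ligated_fraction(0.70)
```

Under this independence assumption, a 90% per-bond success rate leaves roughly two thirds of molecules with both strands recoverable, and a 70% rate leaves roughly one quarter, so modest end damage compounds quickly.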

In addition to successful adapter ligation, both the first and second strands of the adapter-target nucleic acid complex must be amplifiable in order to achieve double-stranded sequence accuracy. For example, if a particular strand of a target nucleic acid molecule is nicked or damaged in a way that a polymerase cannot traverse, amplification of that strand will not occur and no double-stranded consensus sequence read can be generated. As non-limiting examples, non-traversable damage can be introduced by ultrasonic DNA fragmentation, by high-temperature or prolonged enzymatic steps, or by single-strand nicking activity during library preparation.

Thus, among other applications, DS can benefit from improved efficiency through the use of one or more methods for enriching target nucleic acid in a sample, including enriching the target nucleic acid material before an amplification step. Regardless of the underlying method, detection of rare nucleic acid variants requires screening a large number of molecules; however, the more molecules (i.e., genome equivalents) that are prepared into a library at once, the lower the relative efficiency of the process.

Various aspects of the present technology provide methods, reagents, nucleic acid libraries, and kits for enriching nucleic acid material for sequencing applications and other nucleic acid interrogations. Additional aspects of the present technology provide a variety of solutions for improving the conversion efficiency and workflow efficiency of DS and other sequencing modalities in order to overcome most of the limitations listed above.

Some aspects of the present technology relate to methods of enriching regions of interest using the clustered regularly interspaced short palindromic repeats (CRISPR) programmable endonuclease system. In other aspects, CRISPR-like or other programmable endonucleases, such as zinc-finger nucleases or TALEN nucleases, or other sequence-specific endonucleases, such as homing endonucleases or simple restriction nucleases, or derivatives thereof, can be used alone or in combination as part of the disclosed technology.

In particular, CRISPR/Cas9 (or another programmable or non-programmable endonuclease, or a combination thereof) can be used to selectively cleave the nucleic acid backbone at one or more defined or semi-defined regions so as to functionally excise one or more sequence regions of interest from longer nucleic acid molecules, where the excised target regions are designed to have one or more predetermined or substantially predetermined lengths, thereby enabling enrichment of one or more nucleic acid target regions of interest by size selection prior to library preparation for sequencing applications such as DS. In other embodiments, CRISPR/Cas9 (or another programmable or non-programmable endonuclease, or a combination thereof) can be used to selectively excise one or more sequence regions of interest, where the excised target regions are designed to have a substantially predetermined length and overhang sequence. These programmable endonucleases can be used alone or in combination with other forms of targeted nucleases, such as restriction endonucleases, or with other enzymatic or non-enzymatic methods for cleaving nucleic acids.

In some embodiments, provided methods can include the steps of providing nucleic acid material, cleaving the nucleic acid material with a targeted endonuclease (e.g., a ribonucleoprotein complex) such that one or more target regions of substantially predetermined length are separated or enriched from the remainder of the nucleic acid material, and analyzing the cleaved target regions. In other embodiments, one or more cleaved regions can be negatively enriched (i.e., depleted) from the remainder of the nucleic acid material and not analyzed. In some embodiments, provided methods can further include ligating at least one SMI and/or adapter sequence to at least one of the 5' or 3' ends of a cleaved target region of predetermined length. In some embodiments, the analysis can be or include quantification and/or sequencing.

In some embodiments, quantification can be or include spectrophotometric analysis, real-time PCR, and/or fluorescence-based quantification (e.g., using fluorescent dye labeling). In some embodiments, sequencing can be or include Sanger sequencing, shotgun sequencing, bridge PCR, nanopore sequencing, single-molecule real-time sequencing, Ion Torrent sequencing, pyrosequencing, digital sequencing (e.g., digital barcode-based sequencing), sequencing by ligation, polony-based sequencing, electrical current-based sequencing (e.g., tunneling current), sequencing by mass spectrometry, microfluidics-based sequencing, Illumina sequencing, next-generation sequencing, massively parallel sequencing, and any combination thereof.

In some embodiments, the targeted endonuclease is or includes at least one of a CRISPR-associated (Cas) enzyme (e.g., Cas9 or Cpf1) or other ribonucleoprotein complex, a homing endonuclease, a zinc-finger nuclease, a transcription activator-like effector nuclease (TALEN), an argonaute nuclease, a megaTAL nuclease, a meganuclease, and/or a restriction endonuclease. In some embodiments, more than one targeted endonuclease can be used (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, a targeted nuclease can be used to cleave more than one potential target region of predetermined length (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments in which there is more than one target region of predetermined length, each target region can have the same (or substantially the same) length. In some embodiments in which there is more than one target region of predetermined length, at least two of the target regions differ in length (e.g., a first target region has a length of 100 bp and a second target region has a length of 1,000 bp).

The present disclosure also provides methods and reagents for affinity-based enrichment of target nucleic acid material. In some embodiments of such methods, one or more capture labels or moieties can be used to enrich/select desired target nucleic acid material from a sample that includes genomic material, off-target nucleic acid material, contaminating nucleic acid material, nucleic acid material from mixed samples, cfDNA material, and the like. For example, some embodiments include the use of one or more capture labels/moieties for positive enrichment/selection of desired target nucleic acid material (e.g., fragments that include a target sequence or genomic region of interest, or targeted genomic regions of interest within unfragmented genomic DNA). In other embodiments, capture labels can be used for negative enrichment/selection in order to exclude, or reduce the abundance of, undesired genomic material.

For example, in some embodiments involving positive enrichment, an adapter oligonucleotide can carry a capture label that is or includes an attached chemical moiety (e.g., biotin), which can be used to isolate or separate the desired adapter-nucleic acid complexes by capture in one or more subsequent purification steps, for example with an extraction moiety (e.g., streptavidin) bound to a functionalized surface (e.g., paramagnetic beads or other forms of beads). In some embodiments involving negative enrichment, a capture label that is or includes an attached chemical moiety (e.g., biotin) can be used to remove undesired genomic material (e.g., off-target nucleic acid fragments, etc.) that is ligated or attached to the adapter (or to another probe that includes the capture label), again by capture in one or more subsequent purification steps, for example with an extraction moiety (e.g., streptavidin) bound to a functionalized surface (e.g., paramagnetic beads or other forms of beads).

Size-based enrichment of nucleic acid material

In some embodiments, provided methods and compositions make use of a targeted endonuclease (e.g., a ribonucleoprotein complex (a CRISPR-associated endonuclease such as Cas9 or Cpf1), a homing endonuclease, a zinc-finger nuclease, a TALEN, an argonaute nuclease, a restriction endonuclease, and/or a meganuclease (e.g., a megaTAL nuclease, etc.), or a combination thereof) or another technology capable of cleaving nucleic acid material (e.g., one or more restriction enzymes) to excise target sequences of interest at a fragment size that is optimal for sequencing. In some embodiments, the targeted endonuclease is able to excise precise sequence regions of interest specifically and selectively. By preselecting the cleavage sites, for example with a programmable endonuclease (e.g., a CRISPR-associated (Cas) enzyme/guide RNA complex) that produces fragments of predetermined and substantially uniform size, bias and the presence of non-informative reads can be greatly reduced. Moreover, because of the size difference between the excised fragments and the remaining uncleaved DNA, a size selection step (described further below) can be performed to remove large off-target regions, thereby pre-enriching the sample before any further processing steps. The need for an end repair step can also be reduced or eliminated, which saves time and avoids the risk of pseudo-double-stranded molecules, and in some cases the need for computational trimming of data near the ends of molecules is reduced or eliminated, which improves efficiency. A further advantage of targeted enzymatic cleavage is a reduced potential for nicks, nucleic acid adducts, and other forms of damage caused by mechanical fragmentation methods.

The method referred to as CRISPR-DS allows very high on-target enrichment (which can reduce the need for subsequent hybrid capture steps), and this can substantially reduce time and cost and increase conversion efficiency. Figure 3 is a schematic diagram illustrating the steps of a method for generating targeted fragment sizes with CRISPR/Cas9 in accordance with an embodiment of the present technology. For example, CRISPR/Cas9 can be used to cut at one or more specific sites (e.g., protospacer adjacent motif or "PAM" sites) within a target sequence through gRNA-facilitated binding of Cas9 (Figure 3, Panel A). Cas9-directed cleavage releases blunt-ended double-stranded target DNA fragments of known length, as shown in Panel B. Panel C of Figure 3 depicts a further processing step in which the target DNA fragments are positively enriched/selected by size selection. One method of separating the excised target portions uses SPRI/Ampure beads and magnetic purification to remove high molecular weight DNA while leaving the predetermined shorter fragments. In other embodiments, a variety of size selection methods (including, but not limited to, gel electrophoresis, gel purification, liquid chromatography, size exclusion purification, and/or filtration purification methods), among others, can be used to separate the excised portions of predetermined length from undesired DNA fragments and other high molecular weight genomic DNA (if applicable). After size selection, the CRISPR-DS method can include steps consistent with the DS method, including A-tailing (CRISPR/Cas9 excision leaves blunt ends), ligation of adapters (e.g., DS adapters), double-strand amplification, an optional capture step, and amplification (e.g., PCR), followed by sequencing of each strand and generation of a double-stranded consensus sequence. 
In addition to improving workflow efficiency, CRISPR-based size selection/target enrichment provides fragment lengths that are optimal for efficient amplification and sequencing steps. Various aspects of CRISPR-DS are disclosed in International Patent Publication No. WO/2018/175997, the entire contents of which are incorporated herein by reference.
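The excision and size selection steps described above can be sketched computationally as follows. This is a minimal illustration, not the disclosed laboratory method: the function names, the coordinates, and the hard length window are hypothetical, and real bead-based or gel-based size selection has soft rather than sharp cutoffs.

```python
def fragments_from_cuts(molecule_len: int, cut_positions: list[int]) -> list[int]:
    """Return fragment lengths produced by cutting a linear molecule.

    Each cut position is the number of bases to the left of the cut.
    """
    cuts = sorted(set(cut_positions))
    if any(c <= 0 or c >= molecule_len for c in cuts):
        raise ValueError("cut positions must fall strictly inside the molecule")
    boundaries = [0] + cuts + [molecule_len]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


def size_select(fragment_lengths: list[int], min_len: int, max_len: int) -> list[int]:
    """Keep fragments inside an idealized, sharp length window."""
    return [n for n in fragment_lengths if min_len <= n <= max_len]


# Hypothetical 10,000 bp molecule with three guide-directed cuts placed so
# that two adjacent 250 bp target fragments are released.
lengths = fragments_from_cuts(10_000, [4_000, 4_250, 4_500])
retained = size_select(lengths, min_len=100, max_len=500)
```

Because the two flanking pieces (4,000 bp and 5,500 bp) are far longer than the designed fragments, even a coarse size cutoff separates them, which is the basis of the pre-enrichment described in the text.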

In certain embodiments, CRISPR-DS addresses several common problems associated with NGS, including, for example, inefficient target enrichment, which can be optimized by CRISPR-based size selection; sequencing errors, which can be eliminated by using DS technology to generate error-corrected double-stranded consensus sequences; and uneven fragment size, which is reduced by predesigned CRISPR/Cas9 fragmentation. As will be understood by those of skill in the art, and as described herein, CRISPR-DS can have application in the sensitive identification of mutations in situations where samples are DNA-limited, such as forensic and early cancer detection applications, among others.

In vitro digestion of DNA material with Cas9 nuclease makes use of the formation of a ribonucleoprotein complex that recognizes and cleaves predetermined sites (e.g., PAM sites; Figure 3, Panel A). This complex is formed from guide RNA ("gRNA", e.g., crRNA + tracrRNA) and Cas9. For multiplex cleavage, the gRNAs can be complexed either by pooling all of the crRNAs and then complexing them with tracrRNA, or by complexing each crRNA with tracrRNA separately and then pooling. In some embodiments, the second option may be preferred because it eliminates competition among crRNAs. Other CRISPR systems that use different Cas proteins may rely on different PAM motif sequences, may not require a PAM motif sequence, or may rely on other forms of nucleic acid sequence to direct delivery of the nuclease to the targeted nucleic acid region.
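As an illustration of how candidate cleavage sites might be located when designing guides, the following Python sketch scans one strand of a sequence for PAM motifs. It relies on assumptions that are not stated in the source: the commonly described behavior of Streptococcus pyogenes Cas9 (an NGG PAM immediately 3' of a 20-nucleotide protospacer, with a blunt cut about 3 bp 5' of the PAM). The function name and return format are hypothetical, and the reverse strand and off-target scoring are deliberately ignored.

```python
def find_spcas9_sites(seq: str, protospacer_len: int = 20) -> list[tuple[str, str, int]]:
    """List candidate SpCas9 sites on the given strand.

    Returns (protospacer, pam, cut_index) tuples, where cut_index is the
    number of bases to the left of the assumed blunt cut (3 bp 5' of the PAM).
    """
    seq = seq.upper()
    sites = []
    # pam_start is the index of the 'N' in the NGG motif.
    for pam_start in range(protospacer_len, len(seq) - 2):
        if seq[pam_start + 1:pam_start + 3] == "GG":
            protospacer = seq[pam_start - protospacer_len:pam_start]
            pam = seq[pam_start:pam_start + 3]
            sites.append((protospacer, pam, pam_start - 3))
    return sites


# Toy sequence, for illustration only: 20 bases, then a TGG PAM, then filler.
toy = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
candidate_sites = find_spcas9_sites(toy)
```

Pairs of such cut indices flanking a region of interest would determine the excised fragment length, so guides could be chosen to produce fragments of a predetermined size.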

In some embodiments, the nucleic acid material comprises nucleic acid molecules of substantially uniform length. In some embodiments, the substantially uniform length is about 1 to 1,000,000 bases. For example, in some embodiments, the substantially uniform length may be at least 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 15; 20; 25; 30; 35; 40; 50; 60; 70; 80; 90; 100; 120; 150; 200; 300; 400; 500; 600; 700; 800; 900; 1000; 1200; 1500; 2000; 3000; 4000; 5000; 6000; 7000; 8000; 9000; 10,000; 15,000; 20,000; 30,000; 40,000; or 50,000 bases in length. In some embodiments, the substantially uniform length may be at most 60,000; 70,000; 80,000; 90,000; 100,000; 120,000; 150,000; 200,000; 300,000; 400,000; 500,000; 600,000; 700,000; 800,000; 900,000; or 1,000,000 bases. As a specific non-limiting example, in some embodiments, the substantially uniform length is about 100 to about 500 bases. In some embodiments, a size selection step, such as those described herein, can be performed before any particular amplification step. In some embodiments, a size selection step, such as those described herein, can be performed after any particular amplification step. In some embodiments, a size selection step such as those described herein can be followed by additional steps, such as a digestion step and/or another size selection step. In some embodiments, size selection can be performed before or after the step of ligating adaptors. In some embodiments, size selection can be performed concurrently with the cleavage step. In some embodiments, size selection can be performed after the cleavage step.
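
A size selection step can be pictured, in a very simplified way, as a filter on fragment length. The sketch below is only an illustration: the default window of 100 to 500 bases is the non-limiting example given above, and the function name `size_select` is hypothetical. Real size selection (gels, beads, columns) has soft cutoffs rather than the sharp bounds modeled here.

```python
def size_select(fragment_lengths, min_len=100, max_len=500):
    """Keep only fragment lengths (in bases) inside the inclusive window.

    This models an idealized size selection with sharp cutoffs, which is an
    assumption made for illustration.
    """
    return [n for n in fragment_lengths if min_len <= n <= max_len]
```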

In addition to the use of targeted endonucleases, any other suitable method of obtaining nucleic acid molecules of substantially uniform length can be used. By way of non-limiting example, such methods may be or include the use of one or more of the following: agarose or other gels, gel electrophoresis, affinity columns, HPLC, PAGE, filtration, gel filtration, exchange chromatography, SPRI/Ampure-type beads, or any other suitable method that will be recognized by those skilled in the art.

In some embodiments, processing the nucleic acid material to produce nucleic acid molecules of substantially uniform length (or mass) can be used to recover one or more desired target regions from a sample (e.g., a target sequence of interest). In some embodiments, processing the nucleic acid material to produce nucleic acid molecules of substantially uniform length (or mass) can be used to exclude specific portions of a sample (e.g., nucleic acid material from an undesired species or from an undesired subject of the same species). In some embodiments, the nucleic acid material may be present in a variety of sizes (e.g., not in substantially uniform length or mass).

In some embodiments, more than one targeted endonuclease or other method (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) can be used to provide nucleic acid molecules of substantially uniform length. In some embodiments, a targeted nuclease can be used to cleave more than one potential target region of the nucleic acid material (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments in which there is more than one target region of the nucleic acid material, each target region may have the same (or substantially the same) length. In some embodiments in which there is more than one target region of the nucleic acid material, at least two target regions of known length differ in length (e.g., a first target region has a length of 100 bp and a second target region has a length of 1,000 bp).
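
To make the idea of target regions of known length concrete, the following sketch computes the fragment lengths that result from cutting a linear molecule at a set of positions. It is a toy model under stated assumptions: cuts are blunt and complete, coordinates are 0-based, and the function name `digest` and the example coordinates are hypothetical.

```python
def digest(length, cut_positions):
    """Return fragment lengths after cutting a linear molecule of `length` bases.

    Each cut position is the index of the first base to the right of a cut.
    Positions at or beyond the molecule ends are ignored, as are duplicates.
    """
    cuts = sorted({p for p in cut_positions if 0 < p < length})
    bounds = [0] + cuts + [length]
    return [right - left for left, right in zip(bounds, bounds[1:])]
```

For instance, cutting a 5,000-base molecule at positions 1000 and 1100 and again at 3000 and 4000 excises one 100-base and one 1,000-base target region, mirroring the example lengths given above.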

In some embodiments, multiple targeted endonucleases (e.g., programmable endonucleases) can be used in combination to fragment multiple regions of a target nucleic acid of interest. In some embodiments, one or more programmable targeted endonucleases can be used in combination with other targeted nucleases. In some embodiments, one or more targeted endonucleases can be used in combination with random or semi-random nucleases. In some embodiments, one or more targeted endonucleases can be used in combination with other random or semi-random methods of nucleic acid fragmentation, such as mechanical or acoustic shearing. In some embodiments, it may be advantageous to perform cleavage in sequential steps with one or more intervening size selection steps. In some embodiments in which targeted fragmentation is used in combination with random or semi-random fragmentation, the random or semi-random nature of the latter can serve the purpose of a unique molecular identifier (UMI) sequence. In some embodiments in which targeted fragmentation is used in combination with random or semi-random fragmentation, the random or semi-random nature of the latter can be used to facilitate sequencing of regions of a nucleic acid that are not readily cleaved in a targeted manner, such as long or highly repetitive regions, or regions that are substantially similar to other regions in one or more genomes and that might otherwise be difficult to enrich by conventional hybrid capture methods.

Targeted endonucleases

Targeted endonucleases (e.g., CRISPR-associated ribonucleoprotein complexes such as Cas9 or Cpf1, homing nucleases, zinc finger nucleases, TALENs, megaTAL nucleases, argonaute nucleases, and/or derivatives thereof) can be used to selectively cleave and excise targeted portions of nucleic acid material for the purpose of enriching such targeted portions for sequencing applications. In some embodiments, a targeted endonuclease can be modified, such as with amino acid substitutions, to provide, for example, enhanced thermostability, salt tolerance, and/or pH tolerance, enhanced specificity, alternative PAM site recognition, or higher binding affinity. In other embodiments, the targeted endonuclease can be biotinylated, fused to streptavidin, and/or combined with other affinity-based (e.g., bait/prey) technologies. In certain embodiments, the targeted endonuclease can have an altered recognition site specificity (e.g., a SpCas9 variant with altered PAM site specificity). In other embodiments, the targeted endonuclease can be catalytically inactive, such that cleavage does not occur once it is bound to the targeted portion of the nucleic acid material. In some embodiments, the targeted endonuclease is modified to cleave a single strand of the targeted portion of the nucleic acid material (e.g., a nickase variant), thereby generating a nick in the nucleic acid material. CRISPR-based targeted endonucleases are discussed further herein to provide more detailed, non-limiting examples of the use of targeted endonucleases. We note that the nomenclature surrounding such targeted nucleases is still in flux. For purposes herein, we use the term "CRISPR-based" to refer generally to an endonuclease that includes a nucleic acid sequence, the sequence of which can be modified to redefine the nucleic acid sequence to be cleaved. Cas9 and Cpf1 are examples of such targeted endonucleases currently in use, but many more appear to exist in nature, and the availability of different variants of such targeted and readily tunable nucleases is expected to grow rapidly in the coming years. For example, Cas12a, Cas13, CasX, and others are contemplated for use in various embodiments. Similarly, multiple engineered variants of these enzymes that enhance or alter their properties are becoming available. Herein, we expressly contemplate the use of substantially functionally similar targeted endonucleases that are not expressly described herein or have not yet been discovered, to achieve purposes similar to those of the disclosure described herein.

Restriction endonucleases

It is particularly contemplated that any of a variety of restriction endonucleases (i.e., restriction enzymes) can be used to provide nucleic acid material of substantially uniform length and/or to excise targeted regions of nucleic acid material. Restriction enzymes are typically produced by certain bacteria or other prokaryotes and cut at, near, or between specific sequences in a given DNA segment.

It will be apparent to those skilled in the art that a restriction enzyme is selected to cut at a particular site or, alternatively, at a site that has been generated in order to produce a restriction site for cleavage. In some embodiments, the restriction enzyme is a synthetic enzyme. In some embodiments, the restriction enzyme is not a synthetic enzyme. In some embodiments, a restriction enzyme as used herein has been modified to introduce one or more changes in the genome of the enzyme itself. In some embodiments, restriction enzymes produce double-stranded cuts between defined sequences in a given portion of DNA.

Although any restriction enzyme (e.g., Type I, II, III, and/or IV) can be used according to some embodiments, the following is a non-limiting list of restriction enzymes that can be used: AluI, ApoI, AspHI, BamHI, BfaI, BsaI, CfrI, DdeI, DpnI, DraI, EcoRI, EcoRII, EcoRV, HaeII, HaeIII, HgaI, HindII, HindIII, HinfI, HpyCH4III, KpnI, MamI, MnlI, MseI, MstI, MstII, NcoI, NdeI, NotI, PacI, PstI, PvuI, PvuII, RcaI, RsaI, SacI, SacII, SalI, Sau3AI, ScaI, SmaI, SpeI, SphI, StuI, TaqI, XbaI, XhoI, XhoII, XmaI, XmaII, and any combination thereof. Extensive but non-exhaustive lists of suitable restriction enzymes can be found in publicly available catalogs and on the internet (e.g., from New England Biolabs, Ipswich, MA, USA). Those skilled in the art will appreciate that various enzymes, ribozymes, or other nucleic acid modifying enzymes that can be used, alone or in combination, to target cleavage of the phosphodiester backbone of nucleic acid molecules and thereby achieve the same purpose may not be included in the above list or may not yet have been discovered. A variety of nucleic acid modifying enzymes can recognize base modifications (e.g., CpG methylation), which can be used to target further modification of an adjacent nucleic acid sequence (e.g., to generate an abasic site) that can then be cleaved (e.g., by an enzyme with lyase activity). Thus, substantial sequence specificity of cleavage can be achieved based on recognition of DNA or RNA modifications, and this can be used alone or in combination with targeted endonucleases to achieve targeted nucleic acid fragmentation.
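
As an illustration of how a restriction enzyme produces fragments of predictable length, the sketch below performs a simple in-silico digest of the top strand. The defaults model EcoRI, which recognizes GAATTC and cuts after the first G; the function name `restriction_fragments` is hypothetical, and the model ignores the complementary strand, overhang structure, methylation sensitivity, and incomplete digestion.

```python
def restriction_fragments(seq, site="GAATTC", cut_offset=1):
    """Split the top strand of `seq` at every occurrence of `site`.

    `cut_offset` is the number of bases of the recognition site left on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC). Only the top strand
    is modeled, which is a simplification.
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments
```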

Methods for negative and positive enrichment/selection of nucleic acid material

In some embodiments, provided methods and compositions utilize targeted endonucleases (e.g., ribonucleoprotein complexes (CRISPR-associated endonucleases such as Cas9 or Cpf1), homing endonucleases, zinc finger nucleases, TALENs, argonaute nucleases, and/or meganucleases (e.g., megaTAL nucleases, etc.), or combinations thereof) or other technologies capable of site-specific interaction with nucleic acid material to positively enrich for desired (on-target) nucleic acid molecules. Other embodiments provide methods and compositions for negative enrichment/selection of desired nucleic acid molecules by removing undesired (e.g., off-target) nucleic acid material from a sample. Some embodiments described herein combine positive and negative enrichment schemes. In some embodiments, provided methods can further comprise ligating at least one SMI and/or adaptor sequence to at least one of the 5' or 3' ends of the enriched target region. In some embodiments, analysis may be or include quantification and/or sequencing.

In some embodiments, negative enrichment/selection of target nucleic acid material can be facilitated by removing or destroying non-target or unwanted nucleic acid material. Figure 4 is a schematic diagram illustrating the steps of a method of using a CRISPR/Cas9 variant to generate targeted nucleic acid fragments of substantially known/selected length, in accordance with an embodiment of the present technology. Panel A shows gRNA-facilitated binding of a variant Cas9 to the targeted DNA sites as described above, using a CRISPR/Cas9 ribonucleoprotein complex that optionally has enhanced thermostability and/or is engineered to remain bound to dsDNA under suitable conditions (e.g., until removed, enzymatically displaced, etc.). In one embodiment, after cleavage and while Cas9 remains bound to the cleaved 5' and 3' ends of the target DNA fragment, the sample can be treated with an exonuclease to hydrolyze the exposed phosphodiester bonds at exposed 3' or 5' ends of the DNA (Panel B). During exonuclease treatment, unwanted or non-targeted DNA is destroyed by the enzymatic activity, leaving only the exonuclease-resistant target dsDNA fragments. As shown in Figure 4, the bound ribonucleoprotein complex can provide protection from the exonuclease. Following negative enrichment/selection of the target DNA fragments by exonuclease destruction of non-targeted DNA, Cas9 dissociates from the DNA and releases blunt-ended double-stranded target DNA fragments of known length, as shown in Panel C. In some embodiments, the method can further comprise a step that incorporates a positive enrichment/selection scheme, such as size selection (Panel D). In some embodiments, enriching for fragments of the desired and/or predicted target size can further filter out genomic fragments that remained undigested and/or were protected by off-target Cas9 binding. Optionally, as depicted in Panel E, the enriched DNA fragments can be ligated to adaptors for nucleic acid interrogation, such as sequencing. For example, the blunt ends of the target fragments can be ligated directly to blunt-ended adaptors. If desired in a particular application, ligating adaptors to the cleaved double-stranded nucleic acid material can include end repair and 3'-dA tailing of the fragments. In other embodiments, further processing of the fragments to generate suitable ligatable ends can comprise any of a variety of forms or steps for forming ligatable ends having, for example, blunt ends, A-3' overhangs, or "sticky" ends comprising a one-nucleotide 3' overhang, a two-nucleotide 3' overhang, a three-nucleotide 3' overhang, a 3' overhang of 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more nucleotides, a one-nucleotide 5' overhang, a two-nucleotide 5' overhang, a three-nucleotide 5' overhang, a 5' overhang of 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more nucleotides, etc. The 5' base of the ligation site can be phosphorylated and the 3' base can have a hydroxyl group, or either can be, alone or in combination, dephosphorylated or dehydroxylated, or further chemically modified, either to facilitate enhanced ligation of one strand or to prevent ligation of one strand, optionally until a later time point.
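
The negative enrichment of Figure 4 followed by the optional size selection can be sketched as two successive filters. This is a deliberately simplified model: it assumes that any fragment with an exposed end is fully degraded by the exonuclease and that size selection has a sharp tolerance window. The `Fragment` class, the function name, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    length: int          # fragment length in bp
    ends_protected: int  # number of ends (0, 1, or 2) shielded by bound Cas9


def negative_then_size_select(fragments, expected_len, tolerance=10):
    """Apply exonuclease-based negative enrichment, then size selection."""
    # Negative enrichment: anything with an exposed end is assumed degraded.
    survivors = [f for f in fragments if f.ends_protected == 2]
    # Size selection: keep fragments close to the predicted target size,
    # which removes fragments protected only by off-target Cas9 binding.
    return [f for f in survivors if abs(f.length - expected_len) <= tolerance]
```

In this model, a fragment protected at both ends by off-target binding survives the exonuclease step but is removed by size selection, which is the rationale given above for combining the two steps.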

In another embodiment, positive enrichment/selection of target nucleic acid material using CRISPR/Cas can be facilitated by affinity-based enrichment of the target nucleic acid material. Figure 5 is a schematic diagram illustrating the steps of a method of using a CRISPR/Cas9 variant to generate targeted nucleic acid fragments of substantially known/selected length, in accordance with another embodiment of the present technology. Panel A shows the use of a CRISPR/Cas9 ribonucleoprotein complex that is optionally further engineered to remain strongly bound to DNA under suitable conditions (as described above), wherein the ribonucleoprotein complex includes a capture label (e.g., biotin). The capture label can be attached to the gRNA (e.g., crRNA, tracrRNA) or to the Cas9 protein. The ribonucleoprotein complex thus provides an affinity tag for a subsequent pull-down step.

Guide RNA (gRNA)-facilitated binding of the variant Cas9 ribonucleoprotein complex bearing the capture label is followed by cleavage of the double-stranded target DNA. After cleavage and while Cas9 remains bound to the cleaved 5' and 3' ends of the target DNA fragment, the reaction mixture is contacted with a functionalized surface to which one or more extraction moieties are bound. The provided extraction moieties are capable of binding to the capture label (e.g., streptavidin beads where the capture label is biotin) in order to immobilize and separate molecules bearing the capture label. In particular, the extraction moiety can be any member of a binding pair, such as biotin/streptavidin, hapten/antibody, or complementary nucleic acid sequences (DNA/DNA pairs, DNA/RNA pairs, RNA/RNA pairs, LNA/DNA pairs, etc.). In the illustrated embodiment, the capture label attached to the CRISPR/Cas9 ribonucleoprotein complex bound to the (cleaved) target dsDNA fragment is captured by its binding partner (e.g., the extraction moiety), which is attached to a separable moiety (e.g., magnetically attractable particles or large particles that can be pelleted by centrifugation). Thus, the capture label can be any type of molecule/moiety that allows affinity separation of nucleic acids associated with the capture label (e.g., bound by Cas9) from nucleic acids that are not associated with the capture label. Examples of capture labels are biotin, which allows affinity separation by binding to streptavidin attached or attachable to a solid phase, and an oligonucleotide, which allows affinity separation by binding to a complementary oligonucleotide attached or attachable to a solid phase. Undesired or non-targeted nucleic acid material can remain free in solution. Beneficially, free/unbound nucleic acid material that does not bear, or is not associated with, any capture label can be efficiently removed/separated from the desired target nucleic acid material. In further embodiments, the functionalized surface can be washed to remove residual by-products or other contaminants.

Using the affinity-based enrichment scheme shown in Figure 5, the abundance of undesired or non-targeted nucleic acid material can be significantly reduced. Collection of the desired/target nucleic acid fragments can be accomplished in any manner suitable for the application. As a specific example, in some embodiments, collection of the desired nucleic acid material can be accomplished by one or more of the following: removing the functionalized surface by size filtration, magnetic methods, charge-based methods, centrifugation density methods, or any other method; collecting the eluted fraction if a column-based purification method or similar method is used; or any other purification practice commonly understood by those skilled in the art.

In some embodiments, an affinity-based positive enrichment step can be combined with, or used in conjunction with, a negative enrichment step. For example, after cleavage and while Cas9 remains bound to the cleaved 5' and 3' ends of the target DNA fragment (before or after the affinity-based enrichment step), the sample can be treated with an exonuclease to destroy any unwanted nucleic acid material or contaminants in the sample. Following the affinity-based enrichment step and the optional negative exonuclease clean-up step shown in Panels A and B, Cas9 dissociates from the DNA to release blunt-ended double-stranded target DNA fragments of known length (Panel D). Optionally, the above enrichment steps can be combined with a size-based enrichment step as described above (Panel E), and in some embodiments, the enriched DNA fragments can be ligated to adaptors for nucleic acid interrogation, such as sequencing as described above (Panel F).

Figure 6 is a schematic diagram illustrating the steps of a method for negative enrichment/selection of target nucleic acid material in accordance with another embodiment of the present technology. For example, enrichment of target double-stranded nucleic acid material can be facilitated by removing or destroying non-target or undesired nucleic acid material. Figure 6 shows an embodiment in which a catalytically inactive variant of Cas9 is used to enrich for targeted nucleic acid fragments of substantially known/selected length. Using catalytically inactive Cas9 ribonucleoprotein complexes engineered to target and selectively bind double-stranded DNA, gRNAs facilitate the binding of a pair of catalytically inactive Cas9 variants to sites flanking the targeted DNA region (Panel A). After binding, the sample can be treated with one or more exonucleases to hydrolyze the exposed phosphodiester bonds at exposed 3' or 5' ends of the DNA. The catalytically inactive variant of Cas9 does not cleave the target DNA but provides exonuclease resistance, such that the exonuclease removes nucleotides one at a time until it is blocked by a bound Cas9 complex. Thus, exonuclease treatment destroys all non-targeted nucleic acid material in the sample that has exposed ends, leaving the fragments protected by the pairs of catalytically inactive Cas9. In certain embodiments, a mixture of endonucleases and exonucleases can be used to destroy undesired nucleic acid material. For example, endonucleases (e.g., site-specific restriction enzymes) can be used to generate multiple exposed 5' and 3' ends on which the exonuclease can act.
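
The protection provided by a flanking pair of catalytically inactive Cas9 complexes can be sketched as an interval calculation: the exonuclease trims inward from each free end until it meets the first bound complex. The function below is a toy model with hypothetical names and coordinates. It assumes complete digestion, treats each bound complex as a single point, and treats a molecule with fewer than two bound complexes as fully degraded.

```python
def protected_interval(length, bound_positions):
    """Return the (start, end) interval that survives exonuclease treatment.

    `bound_positions` are 0-based coordinates of bound, catalytically
    inactive Cas9 complexes on a linear molecule of `length` bases.
    Returns None when fewer than two complexes are bound (assumed degraded).
    """
    sites = sorted(p for p in bound_positions if 0 <= p <= length)
    if len(sites) < 2:
        return None
    # Digestion from the left stops at the first complex, and digestion
    # from the right stops at the last one.
    return sites[0], sites[-1]
```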

After negative enrichment/selection of the target DNA fragments by exonuclease destruction of all non-targeted DNA (Panel B), the catalytically inactive Cas9 dissociates from the DNA, releasing double-stranded target DNA fragments of known length, as shown in Panel C. As discussed above, an additional size selection step can be performed to further enrich for the target double-stranded DNA fragments (Panel D). Optionally, the enriched DNA fragments can be polished, blunted, or tailed to form suitable ligatable ends, and subsequently ligated to adaptors for nucleic acid interrogation, such as sequencing (Panel E).

In another embodiment, depicted in Figure 7, both a negative enrichment scheme and a positive enrichment scheme can be implemented using a catalytically inactive variant of Cas9. Panel A shows the use of a catalytically inactive variant of Cas9 in a ribonucleoprotein complex that is engineered to remain bound to DNA under suitable conditions, and wherein the ribonucleoprotein complex includes a capture label (e.g., on the guide RNA or tethered to the Cas9 protein). Guide RNA (gRNA)-facilitated binding of the capture-labeled, catalytically inactive variant Cas9 ribonucleoprotein complex is followed by addition of an exonuclease to the sample to hydrolyze the exposed phosphodiester bonds at exposed 3' or 5' ends of the DNA. The catalytically inactive variant of Cas9 does not cleave the target DNA but provides exonuclease resistance, such that the exonuclease removes nucleotides one at a time until it is blocked by a bound Cas9 complex. After negative enrichment/selection of the target DNA fragments by exonuclease destruction of all non-targeted DNA, and while the catalytically inactive Cas9 remains bound, the stepwise addition of a functionalized surface capable of binding the capture label associated with the ribonucleoprotein complex while it remains bound to the target nucleic acid (e.g., a functionalized surface having one or more extraction moieties bound thereto) can immobilize molecules carrying and/or associated with the capture label and/or separate them from undesired nucleic acid material remaining in the sample (Panel B). In some embodiments, provided methods allow for the removal of all or substantially all unwanted nucleic acid material in a sample, or for a substantial reduction in its abundance. Collection of the desired target nucleic acid material can be accomplished in any manner suitable for the application. As a specific example, in some embodiments, collection of the desired target nucleic acid fragments can be accomplished by one or more of the following: removing the functionalized surface by size filtration, magnetic methods, charge-based methods, centrifugation density methods, or any other method; collecting the eluted fraction if a column-based purification method or similar method is used; or any other commonly understood purification practice.

Following the affinity-based enrichment step, and as depicted in Panel D, Cas9 dissociates from the DNA and releases double-stranded target DNA fragments of known length. Panel E depicts an optional further processing step of positive enrichment/selection of the target DNA fragments by size selection. Optionally, as depicted in Panel F, the enriched DNA fragments can be ligated to adaptors for nucleic acid interrogation, such as sequencing.

In some embodiments, a combination of catalytically active and catalytically inactive CRISPR/Cas complexes can be used to positively enrich for fragments comprising a target double-stranded nucleic acid region. Referring to Figure 8, both catalytically active and catalytically inactive Cas9 ribonucleoprotein complexes can be targeted, in a sequence-dependent manner, to a desired nucleic acid region in a sample (e.g., a specific genomic locus). Catalytically active Cas9 ribonucleoprotein complexes are directed to the regions flanking the target DNA region and are used to cleave the target double-stranded DNA, releasing blunt-ended double-stranded target DNA fragments of known length. One or more catalytically inactive ribonucleoprotein complexes bearing a capture label (e.g., biotin) are directed to the region of the target sequence between the two selected cleavage sites. After cleavage of the target DNA to release the DNA fragments, addition of a functionalized surface capable of binding the capture label associated with the catalytically inactive ribonucleoprotein complexes can facilitate positive enrichment/selection of the target fragments. It will be appreciated that many other forms of targeted nucleic acid fragmentation, such as those described above, can be substituted for the active Cas9 ribonucleoprotein complexes in this example.

In some embodiments, a positive enrichment/selection step can be used to enrich for target sequences from a sample in which the nucleic acid material has already been fragmented (e.g., mechanically sheared material or a cell-free DNA sample, such as one from a liquid biopsy). Figures 9A and 9B are conceptual illustrations of the steps of a method for positive enrichment/selection of target nucleic acid fragments using a catalytically inactive variant of the Cas9 ribonucleoprotein complex bearing a capture label as described above. Fragmented double-stranded DNA in a sample (e.g., mechanically sheared DNA, acoustically fragmented DNA, cell-free DNA, etc.) can be positively enriched/selected through target-directed binding of one or more catalytically inactive Cas9 ribonucleoprotein complexes in solution (Figure 9A).

In some embodiments, a method can include the use of two or more capture labels (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more), which can be used to differentially label multiple Cas9 ribonucleoprotein complexes. For example, a sample can be enriched for multiple target nucleic acids simultaneously. While in some embodiments it is contemplated that all of the Cas9 complexes bear the same capture label (e.g., biotin), so that all targeted sequences can be pulled down (affinity purified) together from a single sample, in other embodiments the separation of differently targeted sequences can be facilitated by pairing substantially unique capture labels with Cas9 complexes that target different regions. In some embodiments, at least two of the capture labels used in a method differ from each other (e.g., a small molecule and a peptide). In some embodiments, including two or more different capture labels allows both positive enrichment/selection and negative enrichment/selection to be used. Including two or more capture labels can be helpful especially where nucleic acid fragments comprising different target sequences need to be physically separated for subsequent nucleic acid interrogation (e.g., sequencing).

The reaction mixture is contacted with a functionalized surface to which one or more extraction moieties are bound. The provided extraction moieties are capable of binding the capture label (e.g., streptavidin beads where the capture label is biotin) in order to immobilize and isolate the molecules bearing the capture label (Figure 9B).

In some embodiments, it is desirable to enrich or isolate target nucleic acid material from a sample that contains fragments of different sizes, including fragment sizes that are small and that might otherwise be lost during processing steps (e.g., DS process steps). Figure 10 is a schematic diagram showing method steps for positive enrichment/selection of target nucleic acid fragments using a catalytically inactive variant of the Cas9 ribonucleoprotein complex bearing a capture label. Panel A shows multiple fragmented double-stranded DNA fragments of different sizes in a sample, including molecule 2, which is too small to be reliably enriched by size selection or by affinity-based methods. In this embodiment, adaptors (e.g., sequencing adaptors) can be ligated/attached to the fragment ends using known sequencing library preparation steps. In this way, certain small nucleic acid fragments are lengthened by the flanking adaptor molecules. Positive enrichment of the targeted fragments from solution can then proceed as described above with respect to Figures 9A and 9B. For example, panel B of Figure 10 shows adaptors ligated to the 5' and 3' ends of the molecules in the sample, making such DNA fragments longer. Panel C shows a positive enrichment/selection step for molecule 2 through target-directed binding of a catalytically inactive Cas9 ribonucleoprotein complex bearing a capture label in solution, followed by affinity purification.

Figure 11 is a schematic diagram showing steps of a method for enriching targeted nucleic acid material using a negative enrichment scheme (panel A) and a positive enrichment scheme (panel B), in accordance with an embodiment of the present technology. Panel A shows hairpin adaptors ligated to the 5' and 3' ends of a double-stranded target DNA molecule to generate adaptor-nucleic acid complexes with no exposed ends. In the negative enrichment/selection scheme, the adaptor-nucleic acid complexes are treated with an exonuclease to eliminate nucleic acid fragments and adaptors that have unprotected 5' and 3' ends (e.g., adaptor-nucleic acid complexes lacking all 4 ligated phosphodiester bonds, unligated DNA, single-stranded nucleic acid material, free adaptors, etc.), as shown on the right side of panel B.

As shown in Figure 11, a hairpin adaptor can include a cleavable moiety in its linker portion, such as a uracil group or any other enzymatically, chemically, or photoelectrically cleavable group. When treated with a combination of uracil DNA glycosylase (UDG) and an enzyme having abasic-site DNA lyase activity (such as endonuclease VIII or formamidopyrimidine [fapy]-DNA glycosylase (FPG)), or with a commercial premixed combination (e.g., USER™ enzyme), cleavage at the uracil can convert the hairpin adaptor into a Y-shaped adaptor suitable for polony formation (bridge amplification) and certain sequencing modalities.

The exonuclease-resistant adaptor-nucleic acid complexes can be further enriched by size selection or by target sequence (e.g., CRISPR/Cas9 pull-down) (Figure 11, left side of panel B). In another embodiment, hairpin adaptors bearing a capture label can be used (as shown in Figure 12); these are directly suitable for affinity-based enrichment using a functionalized surface with exposed extraction moieties.

In embodiments that follow the negative enrichment of hairpin-adaptor-ligated target nucleic acid fragments described in Figure 11, an additional positive enrichment step can be performed. For example, Figure 13 is a schematic diagram showing method steps for positive enrichment of adaptor-target nucleic acid complexes using hairpin adaptors (panel A) followed by rolling circle amplification (panels B and C). The rolling circle amplification step can be used to (1) provide a substantially 1:1 ratio of first-strand amplicons to second-strand amplicons, and (2) prevent strand dissociation before labeling and/or during library clean-up steps. Long-molecule sequencing platforms may be suited to sequencing the rolling circle amplicons directly (panel C). For short-read sequencing platforms, however, one can (1) enzymatically cleave hairpin linker segments that include a cleavage site (e.g., a restriction endonuclease recognition site) to generate a substantially even ratio of first-strand and second-strand amplicons (left side of panel D), or (2) use PCR amplification to generate multiple short amplicons comprising a substantially equal ratio of first sequences and second sequences (right side of panel D).

Figure 14 is a schematic diagram showing steps of a method for using site-directed binding and cleavage by CRISPR/Cpf1 to generate targeted nucleic acid fragments of known/selected length with different 5' and 3' ligatable ends. In various embodiments, the 5' and 3' ligatable ends include single-stranded overhang regions of known nucleotide length and sequence. Cpf1 is a targeted endonuclease that recognizes a T-rich PAM located 5' of the guide-targeted sequence and makes a staggered cut in the double-stranded DNA target sequence. For example, one variant of Cpf1 cuts 19 bp after the PAM on the sense strand and 23 bp after it on the antisense strand, as shown in Figure 14. Panel A shows gRNA-facilitated binding of Cpf1 at the targeted DNA site. Cpf1-directed cleavage produces a staggered cut that provides a 4-nucleotide (as depicted) or 5-nucleotide overhang (i.e., a "sticky end"). Site-directed Cpf1 cleavage flanking the target DNA sequence generates a double-stranded target DNA fragment of known length (which can optionally be further enriched by size selection), with sticky end 1 at the 5' end of the fragment and sticky end 2 at the 3' end of the fragment (panel B). Panel B further shows adaptor 1 ligated at the 5' end of the fragment and adaptor 2 ligated at the 3' end, where adaptor 1 and adaptor 2 include overhang sequences that are at least partially complementary to sticky ends 1 and 2 on the fragment, respectively.
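The cut geometry described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the PAM is matched as TTTN on the sense strand, both cut offsets (19 and 23) are counted from the end of the PAM, and the function name `cpf1_staggered_cut` and the example sequence are hypothetical rather than taken from the disclosure.

```python
import re

# Offsets from the end of the PAM, as stated in the text for one Cpf1 variant.
SENSE_CUT_OFFSET = 19
ANTISENSE_CUT_OFFSET = 23


def cpf1_staggered_cut(sense, pam_pattern="TTT[ACGT]"):
    """Locate the first PAM on the sense strand and return the cut geometry.

    Simplified model (an assumption): coordinates are 0-based indices on the
    sense strand, and both offsets are measured from the end of the PAM.
    Returns None when no PAM is found or the sequence is too short.
    """
    match = re.search(pam_pattern, sense)
    if match is None:
        return None
    sense_cut = match.end() + SENSE_CUT_OFFSET
    antisense_cut = match.end() + ANTISENSE_CUT_OFFSET
    if antisense_cut > len(sense):
        return None
    return {
        "pam": match.group(),
        "sense_cut": sense_cut,
        "antisense_cut": antisense_cut,
        # Staggered region between the two cuts, in sense-strand coordinates.
        "overhang": sense[sense_cut:antisense_cut],
    }


# Hypothetical sequence: 2 nt, PAM (TTTA), 19 nt, 4 nt staggered region, 4 nt.
example = "GG" + "TTTA" + "A" * 19 + "CGTA" + "GGGG"
cut = cpf1_staggered_cut(example)
print(cut["overhang"], len(cut["overhang"]))
```

With offsets of 19 and 23, the staggered region is always 4 nucleotides long, which matches the 4-nucleotide overhang depicted in the figure; a 5-nucleotide overhang would correspond to a different pair of offsets.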

By design, the sequence of sticky end 1 (the overhang at the 5' end of the targeted fragment) is known. Likewise, the sequence of sticky end 2 (the overhang at the 3' end of the targeted fragment) is known. Specific adaptors that include substantially complementary sequences can be synthesized so that the fragment can be ligated to adaptors at both ends. In one embodiment, the adaptors can be of the same type (e.g., Y-shaped, U-shaped, barcoded adaptors, etc.). In another embodiment, the adaptors can differ (e.g., adaptor 1 can be Y-shaped and adaptor 2 can be U-shaped). Other distinguishing features can include different primer sites for amplification, different types or positions of barcodes or other unique molecular identifiers, adaptors with a capture label versus adaptors without one, fluorescent tags on certain adaptors, and so on. In some applications, there is a clear advantage in designing a specific adaptor to sit at the 5' end or the 3' end of the fragment. The specificity of the substantially unique sticky ends on the targeted fragment facilitates these types of applications. In addition, positive selection of target fragments that were successfully cleaved and adaptor-ligated can ensure that only target-enriched nucleic acid regions are amplified and sequenced.
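As a rough illustration of end-specific adaptor design, the sketch below selects the adaptors whose overhang pairs with a known sticky end. It assumes full (not partial) complementarity for simplicity, and the adaptor names and overhang sequences are hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def pick_adaptors(sticky_end, adaptors):
    """Return names of adaptors whose overhang fully pairs with sticky_end.

    Simplifying assumption: an adaptor is compatible only when its overhang
    is the exact reverse complement of the fragment's sticky end.
    """
    wanted = reverse_complement(sticky_end)
    return [name for name, overhang in adaptors.items() if overhang == wanted]


# Hypothetical adaptor set: name -> single-stranded overhang (5'->3').
adaptors = {"adaptor_1_Y": "TACG", "adaptor_2_hairpin": "GGCT"}

print(pick_adaptors("CGTA", adaptors))  # sticky end 1
print(pick_adaptors("AGCC", adaptors))  # sticky end 2
```

Because each sticky end matches only one adaptor in this toy set, the Y-shaped adaptor can only land on one end of the fragment and the hairpin adaptor on the other.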

In some embodiments, the substantially unique sticky ends generated by Cpf1 cleavage can be used in additional positive enrichment schemes. For example, Figure 15 is a schematic diagram showing steps of a method for affinity-based enrichment of target DNA fragments comprising sticky ends (e.g., target DNA fragments such as those generated in the method of Figure 14), in accordance with an embodiment of the present technology. Panel A shows the stepwise addition of a functionalized surface capable of binding a sticky end associated with the cleaved target DNA fragment in solution. For example, the functionalized surface can have one or more extraction moieties bound to it that are suitable as binding partners for one or more targeted DNA overhang sequences. The provided extraction moiety can be, for example, a synthetic oligonucleotide having a predefined or known sequence that is at least partially complementary to the sticky end generated on the Cpf1-cleaved target sequence. The oligonucleotide can include DNA, RNA, or LNA sequence capable of binding the capture label (e.g., the sticky end) in order to immobilize and isolate targets comprising the sticky end. Once the fragment is bound to the functionalized surface, the affinity interaction facilitates pull-down (e.g., affinity purification) of the desired double-stranded DNA fragments while non-targeted fragments are discarded, as shown in panel B.

Figure 16 is a schematic diagram showing steps of a method for affinity-based enrichment of target DNA fragments comprising sticky ends (e.g., target DNA fragments such as those generated in the method of Figure 14), in accordance with another embodiment of the present technology. Panel A shows the stepwise addition of an oligonucleotide bearing a capture label; the oligonucleotide has a predefined or known sequence that is at least partially complementary to a portion of the sticky end associated with the cleaved target DNA fragment in solution. In particular examples, the oligonucleotide strand can be synthesized in the 3' to 5' direction, such as by the phosphoramidite method (e.g., on controlled-pore glass (CPG) fragments, etc.), and a chemical moiety can be attached (e.g., by covalent, non-covalent, ionic, or other attachment chemistry) to the 5' end after synthesis of the oligonucleotide, or attached as part of the synthesis, such as by incorporating a non-standard phosphoramidite molecule at the 5' end, near the 5' end, or at an internal position in the oligonucleotide.

As shown in panel B, further addition of a functionalized surface capable of binding the capture label facilitates pull-down (e.g., affinity purification) of the desired double-stranded DNA fragments while non-targeted fragments are discarded.

Referring to Figures 15 and 16 together, in a subsequent step (not shown) the targeted fragments can be eluted by releasing them from the extraction moiety. In some non-limiting examples, a cleavable moiety can be incorporated near the binding end of the oligonucleotide extraction moiety. In another embodiment, the temperature or other conditions can be changed to denature the short capture label/extraction moiety pairing while preserving the double-stranded nature of the target nucleic acid fragment. In yet another embodiment, a hairpin adaptor can be used at the second sticky end of the target fragment to tether the two strands together during elution and further processing. In various embodiments, after the enrichment step, the sticky ends can be polished, trimmed, or filtered bioinformatically, as described herein, to avoid false recombination errors.

Figure 17 is a schematic diagram showing steps of a method for targeted fragment enrichment, using Cas9 nickases, of nucleic acid material of known length having different 5' and 3' ligatable ends, in accordance with an embodiment of the present technology; the ligatable ends include long single-stranded overhang regions of known nucleotide length and sequence. Panel A shows gRNA-targeted binding of paired Cas9 nickases in the targeted DNA region. Double-strand breaks can be introduced by using paired nickases to excise the target DNA region, and when paired Cas9 nickases are used, a long overhang is produced on each cleaved end (sticky ends 1 and 2), as shown in panel B. Thus, in contrast to cleavage with catalytically active Cas9, which produces blunt ends, strategic pairing of Cas9 nickases can provide staggered single-strand cuts on opposite DNA strands to produce long overhangs, as depicted in panel B. As described above with respect to Figure 15, the stepwise addition of a functionalized surface capable of binding a long sticky end associated with the cleaved target DNA fragment in solution (e.g., sticky end 1) provides a positive enrichment step for the targeted DNA fragments in solution. For example, the extraction moiety can be an oligonucleotide having a predefined or known sequence that is substantially complementary to the predefined or known sequence of the fragment's long sticky end. Once the fragment is bound to the functionalized surface, the affinity interaction facilitates pull-down (e.g., affinity purification) of the desired double-stranded DNA fragments while non-targeted fragments are discarded, as shown in panel D.

Figure 17, panel E shows a variation of the positive enrichment step that includes adding and annealing an oligonucleotide bearing a capture label; the oligonucleotide has a predefined or known sequence that is at least partially complementary to a portion of the long sticky end (e.g., sticky end 1) associated with the cleaved target DNA fragment in solution. Panel F shows annealing of a second oligonucleotide strand that is at least partially complementary to a portion of the capture-labeled oligonucleotide. Enzymatic extension of the second oligonucleotide strand and ligation to the template DNA fragment generates an adaptor-target DNA complex. As shown, the first and second oligonucleotide strands include single-stranded portions, so that the resulting adaptor complex includes the asymmetry used for DS processing. In addition, the first oligonucleotide strand can include a degenerate or semi-degenerate SMI sequence, so that when the second oligonucleotide strand is extended, the first oligonucleotide strand serves as the template strand and the SMI sequence is made double-stranded. A further step can include introducing a functionalized surface (not shown) capable of binding the capture label, to facilitate pull-down (e.g., affinity purification) of the desired adaptor-double-stranded DNA complexes while non-targeted fragments are discarded.

Various aspects of the present technology include methods for negatively enriching nucleic acid regions by providing exonuclease and endonuclease resistance through protein binding. In one embodiment, as shown in Figure 18, site-selected proteins bound to the target DNA can be used to provide exonuclease and endonuclease resistance. As shown, the target nucleic acid enrichment scheme uses catalytically inactive Cas9 ribonucleoprotein complexes to protect the targeted genomic region. Through the gRNA, Cas9 can be targeted to a desired sequence in the sample. One or more catalytically inactive ribonucleoprotein complexes bearing one or more capture labels can be positioned in close proximity and/or adjacent to one another to protect a region of genomic DNA from enzymatic digestion. In some embodiments, as shown, the ribonucleoprotein complexes can be engineered to direct other protein complex structures to the target DNA region. Exonuclease resistance is provided where the protein complex structure covers the target DNA region. After treatment with an exonuclease, or with a combination of an endonuclease and an exonuclease, affinity purification of the protein complexes (e.g., through capture labels that bind a functionalized surface, antibody pull-down, etc.) separates the target DNA fragments from other unwanted nucleic acid material or unbound protein in solution. The target nucleic acid fragments can then be released from the bound ribonucleoprotein complexes.

Nucleic acid libraries and methods for making and using nucleic acid libraries

In some embodiments, the provided methods can include the steps of providing nucleic acid material and directing a plurality of targeted, catalytically inactive endonucleases (e.g., ribonucleoprotein complexes) to a plurality of regions distributed along the nucleic acid material, to produce a nucleic acid library that can be interrogated with selective probes at any time.

Figures 19A and 19B are conceptual illustrations of a prepared DNA library and reagents, in accordance with an embodiment of the present technology, that can be used as a tool for selectively interrogating DNA regions of interest. Uniquely labeled, catalytically inactive Cas9 is targeted to multiple (e.g., spaced-apart) regions of isolated/unfragmented genomic DNA (or other large DNA fragments) (Figure 19A). Each catalytically inactive Cas9 ribonucleoprotein includes a known oligonucleotide tag with a known sequence (e.g., a code sequence) and binds a pre-designed region of the genome. As shown schematically in Figure 19A, multiple inactive Cas9 ribonucleoprotein complexes (e.g., iCas9A, iCas9B, iCas9C, iCas9N) are directed by gRNAs to bind genomic sites (site A, site B, site C, site N) distributed across a genomic region (e.g., a large selected region, the entire genome, etc.). Each iCas9 complex includes an oligonucleotide tag comprising an oligonucleotide code sequence (AAAAAAA), where "A" is any nucleotide (unmodified or modified). The sequence of nucleotides provides a substantially unique code that can be recorded and later looked up in a look-up table.

When a specific target sequence or a smaller region needs to be interrogated (e.g., sequenced), the library can be probed with specially designed capture probes that pull down the desired region. A fragmentation method (e.g., restriction enzyme digestion, mechanical shearing, etc.) can be used to fragment the genomic DNA into various sizes. Because each iCas9 complex includes a substantially unique oligonucleotide tag that is computationally associated with a DNA site, the user can add, stepwise, one or more probes that include the complement (e.g., the anti-code sequence) of the code sequence corresponding to the genomic region of interest. For example, and as shown in Figure 19B, an anti-code sequence is a nucleotide sequence that is substantially complementary to the associated code sequence. To extract the region that includes site A, for instance, the user looks up the code sequence (AAAAAAA) associated with the iCas9A complex bound at site A. Then, using an oligonucleotide probe that has a capture label attached to or incorporated in it and that includes the anti-code sequence (A'A'A'A'A'A'A'), the region of interest can be functionally selected and enriched by introducing a functionalized surface bearing the appropriate extraction moiety (e.g., streptavidin, where biotin is the capture label).
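The look-up step can be pictured as a small table keyed by genomic site. The following is a minimal sketch, assuming that the anti-code is the exact reverse complement of the code; the site names and 7-nucleotide code sequences are hypothetical placeholders, not sequences from the disclosure.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical look-up table: genomic site -> code sequence on the iCas9 tag.
CODE_TABLE = {
    "site_A": "ACGTTGA",
    "site_B": "GGATCCA",
    "site_C": "TTGACCA",
}


def anti_code(code):
    """Return the anti-code, modeled here as the exact reverse complement."""
    return code.translate(COMPLEMENT)[::-1]


def probes_for(sites, table=CODE_TABLE):
    """Return {site: anti-code probe sequence} for the requested sites."""
    missing = [site for site in sites if site not in table]
    if missing:
        raise KeyError(f"no code recorded for: {missing}")
    return {site: anti_code(table[site]) for site in sites}


# Design capture probes for two regions of interest.
print(probes_for(["site_A", "site_C"]))
```

Keeping the table separate from the probe logic mirrors the idea that the same prepared library can be re-probed later for a different set of sites simply by looking up different codes.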

In various embodiments, a nucleic acid library can serve as a resource for several probing interrogations. In addition, several libraries can be prepared that are pre-bound with multiple site-directed CRISPR/Cas complexes. Furthermore, some libraries can be pre-fragmented or cut using mechanical shearing or endonuclease cleavage (using one or more restriction endonucleases). When the desired target region is excised (e.g., by targeted endonuclease digestion such as CRISPR/Cas, restriction enzymes, etc.), the length of the target fragment will be known, and after pull-down with the probe the target fragment can be further enriched by size selection.

Additional methods

Some aspects of the present technology are applicable to long-read sequencing technologies, such as direct digital sequencing (DDS) platforms. In some embodiments, it is desirable to enrich target sequences of interest for DDS. In such embodiments, amplification-free enrichment of the target sequences is desired. It is further desirable to generate duplex sequencing data on such platforms.

Figure 20 shows steps of a method for affinity-based enrichment and sequencing of target DNA fragments for use with a direct digital sequencing method, in accordance with an embodiment of the present technology. Panel A shows ligation of selected adaptors to a target DNA fragment comprising sticky ends (e.g., a target DNA fragment such as those generated in the method of Figure 14 or Figure 17). Panel A further shows adaptor 1 ligated at the 5' end of the fragment and adaptor 2 ligated at the 3' end, where adaptor 1 and adaptor 2 include overhang sequences that are at least partially complementary to sticky ends 1 and 2 on the fragment, respectively. Adaptor 1 is Y-shaped and includes 5' and 3' single-stranded arms that bear different labels (A and B) with different properties. Adaptor 2 is a hairpin-shaped adaptor.

Panel B shows steps in the direct digital sequencing method, in which label A is configured to bind a functionalized surface. Label B provides a physical property (e.g., charge, magnetism, etc.) such that applying an electric or magnetic field denatures the first and second strands of the double-stranded adaptor-DNA complex, followed by electro-stretching of the DNA fragment. The first and second strands remain tethered by the hairpin adaptor, so that the sequence information from the enriched/targeted strands provides duplex sequence information for error correction and other nucleic acid interrogation (e.g., assessment of DNA damage, etc.). For example, the sequence generated from the first strand can be compared with the sequence generated from the second strand for error correction or, in another example, to determine the sites and characteristics of DNA damage. In some embodiments, the enriched targeted genomic region can have a length between about 1 and 1,000,000 bases. For example, in some embodiments, and when denatured and sequenced, the enriched nucleic acid fragment can be at least 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 15; 20; 25; 30; 35; 40; 50; 60; 70; 80; 90; 100; 120; 150; 200; 300; 400; 500; 600; 700; 800; 900; 1000; 1200; 1500; 2000; 3000; 4000; 5000; 6000; 7000; 8000; 9000; 10,000; 15,000; 20,000; 30,000; 40,000; or 50,000 bases in length. In some embodiments, the fragment can be at most 60,000; 70,000; 80,000; 90,000; 100,000; 120,000; 150,000; 200,000; 300,000; 400,000; 500,000; 600,000; 700,000; 800,000; 900,000; or 1,000,000 bases in length.
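The strand-to-strand comparison mentioned above can be illustrated with a generic agreement check. This is a sketch of the general idea only, not the disclosed error-correction procedure: it assumes two already-aligned, equal-length reads, each written 5' to 3', and masks any position where the strands disagree. The function names are hypothetical.

```python
# Translation table for complementing DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(first_strand_read, second_strand_read):
    """Compare the two strands of one molecule and mask disagreements.

    The second-strand read is first brought into first-strand orientation.
    Returns (consensus, mismatch_positions); a position is masked with "N"
    when the strands disagree or either strand reports "N".
    """
    second_as_first = reverse_complement(second_strand_read)
    if len(first_strand_read) != len(second_as_first):
        raise ValueError("reads must be the same length (pre-aligned)")
    consensus = []
    mismatches = []
    for i, (a, b) in enumerate(zip(first_strand_read, second_as_first)):
        if a == b and a != "N":
            consensus.append(a)
        else:
            consensus.append("N")
            mismatches.append(i)
    return "".join(consensus), mismatches


# Strands agree everywhere.
print(duplex_consensus("ACGTAC", "GTACGT"))
# The second strand carries one discordant base.
print(duplex_consensus("ACGTAC", "GTACGA"))
```

Positions reported in the mismatch list are candidates for either sequencing error or strand-specific DNA damage; telling the two apart would need information beyond this sketch.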

Figure 21 shows steps of a method for affinity-based enrichment of target DNA fragments for sequencing with a DDS method, in accordance with another embodiment of the present technology. Panel A shows affinity-based enrichment of a target DNA fragment comprising sticky ends (e.g., a target DNA fragment such as those generated in the method of Figure 14 or Figure 17). As shown, a hairpin adaptor has been ligated to the 3' end of the double-stranded DNA fragment in a sequence-dependent manner. The target DNA molecules can be flowed over a functionalized surface (e.g., one with bound oligonucleotides) capable of binding the sticky end associated with the cleaved target DNA fragment. In addition, a second oligonucleotide strand that includes label B and is at least partially complementary to a portion of the bound oligonucleotide is added to the solution. Annealing and ligation of the adaptor and DNA fragment components provides an adaptor-target double-stranded DNA complex bound to a surface suitable for direct digital sequencing (panel B). Application of the electric or magnetic field for the sequencing step and electro-stretching of the adaptor-DNA complex can proceed as described, for example, for Figure 20.

Reagents and methods

Adaptor types

Although most examples in this disclosure depict Y-shaped or loop-shaped adaptors, any known adaptor structure can be used according to various embodiments, such as those described in WO2017/100441 (which is incorporated herein by reference in its entirety). For example, various adaptor shapes that include a bubble (e.g., an internal region of non-complementarity) are also contemplated.

Separation

As described herein, various methods include at least one separation step. It is specifically contemplated that any of a variety of separation steps may be included in various embodiments. For example, in some embodiments, the separation may be or include physical separation, size separation, magnetic separation, solubility separation, charge separation, hydrophobicity separation, polarity separation, electrophoretic mobility separation, density separation, chemical elution separation, SBIR bead separation, and the like. For example, a physical group can have a magnetic property, a charge property, or an insolubility property. In one embodiment, when the physical group has a magnetic property and a magnetic field is applied, the associated adaptor nucleic acid sequences that include the physical group are separated from the adaptor nucleic acid sequences that do not. In another embodiment, when the physical group has a charge property and an electric field is applied, the associated adaptor nucleic acid sequences that include the physical group are separated from the adaptor nucleic acid sequences that do not. In a further embodiment, when the physical group has an insolubility property and the adaptor nucleic acid sequences are held in a solution in which the physical group is insoluble, the adaptor nucleic acid sequences that include the physical group precipitate away from the adaptor nucleic acid sequences that lack it, which remain in solution.

Any of a variety of physical separation methods may be included in various embodiments. As specific examples, a non-limiting set of methods includes size-selective filtration, density centrifugation, HPLC separation, gel filtration separation, FPLC separation, density gradient centrifugation, and gel chromatography, among others.

Any of a variety of magnetic separation methods may be included in various embodiments. Typically, a magnetic separation method includes incorporating or adding one or more physical groups having a magnetic property, such that when a magnetic field is applied, molecules that include such physical groups are separated from molecules that do not. As specific examples, physical groups that exhibit a magnetic property include, but are not limited to, ferromagnetic materials such as iron, nickel, cobalt, dysprosium, gadolinium, and alloys thereof. The paramagnetic beads commonly used for chemical and biochemical separations embed such materials within a surface (such as polystyrene) that reduces chemical interaction between the material and the chemicals being manipulated, and that surface can be functionalized for the affinity properties described above.

Capture Labels

As described herein, in some embodiments, a capture label may be present in any of a variety of configurations along an oligonucleotide probe, an adaptor, or a ribonucleotide sequence, or on a protein (e.g., of a ribonucleoprotein complex), and the like. In some embodiments, a capture label can be incorporated into or attached to an oligonucleotide strand in a region 5' of the sequence. In some embodiments, a capture label may be present somewhere in the middle of an oligonucleotide strand (i.e., not at the 5' or 3' end of the oligonucleotide). In embodiments that include two or more capture labels, each capture label may be present at a different position along the oligonucleotide.

In some embodiments, the capture label is selected from the group consisting of biotin, biotin deoxythymidine dT, biotin NHS, biotin TEG, biotin-6-aminoallyl-2'-deoxyuridine-5'-triphosphate, biotin-16-aminoallyl-2'-deoxycytidine-5'-triphosphate, biotin-16-aminoallylcytidine-5'-triphosphate, N4-biotin-OBEA-2'-deoxycytidine-5'-triphosphate, biotin-16-aminoallyluridine-5'-triphosphate, biotin-16-7-deaza-7-aminoallyl-2'-deoxyguanosine-5'-triphosphate, 5'-biotin-G-monophosphate, 5'-biotin-A-monophosphate, 5'-biotin-dG-monophosphate, 5'-biotin-dA-monophosphate, desthiobiotin NHS, desthiobiotin-6-aminoallyl-2'-deoxycytidine-5'-triphosphate, digoxigenin NHS, DNP, TEG, thiol, colicin E2, Im2, glutathione, glutathione-S-transferase (GST), nickel, polyhistidine, FLAG tag, myc tag, and the like. In some embodiments, capture labels include, but are not limited to, biotin, avidin, streptavidin, a hapten recognized by an antibody, a particular nucleic acid sequence, and/or a magnetically attractable particle. In some embodiments, one or more chemical modifications of a nucleic acid molecule (e.g., Acrydite modification and many others, some of which are described elsewhere in this application) can serve as a capture label.

Extraction Moieties

An extraction moiety can be a physical binding partner of, or pair to, a targeted capture label. The term refers to a separable moiety, or any type of molecule, that allows affinity separation of nucleic acids bearing the capture label, or nucleic acids bound by a molecule bearing the capture label (e.g., an oligonucleotide, a protein, a ribonucleoprotein complex, etc.), from nucleic acids lacking the capture label. The extraction moiety can be linked directly or indirectly (e.g., via a nucleic acid, an antibody, an adaptor, etc.) to a substrate, such as a solid surface. In some embodiments, the extraction moiety is selected from the group comprising a small molecule, a nucleic acid, a peptide, an antibody, or any uniquely bindable moiety. The extraction moiety can be linked, or linkable, to a solid phase or other surface to form a functionalized surface. In some embodiments, the extraction moiety is a sequence of nucleotides linked to a surface (e.g., a solid surface, a bead, a magnetic particle, etc.). In some embodiments in which the capture label is biotin, the extraction moiety is selected from avidin and streptavidin. Those skilled in the art will appreciate that any of a variety of affinity binding pairs may be used in accordance with various embodiments.

In certain embodiments, the extraction moiety can be a physical or chemical property that interacts with the targeted capture label. For example, the extraction moiety can be a magnetic field, a charge field, or a liquid solution in which the targeted capture label is insoluble. Such a physical or chemical property can be applied so that adaptor nucleic acids bearing the capture label are immobilized within or against a vessel (surface) or column. Depending on whether positive or negative enrichment/selection is desired, either the immobilized molecules (positive enrichment) or the non-immobilized molecules (negative enrichment) can be retained for further purification, processing, or use.

Solid Surfaces

When the affinity partner/extraction moiety is attached to a solid surface or substrate and bound to the capture label, adaptor nucleic acid sequences that include the capture label can be separated from adaptor nucleic acid sequences that do not include the affinity label. The solid surface or substrate can be a bead, a separable particle, a magnetic particle, or another immobilized structure.

As described herein, and as will be appreciated by those skilled in the art, any of a variety of functionalized surfaces may be used in accordance with various embodiments. For example, in some embodiments, a functionalized surface can be or include a bead (e.g., a controlled pore glass bead, a macroporous polystyrene bead, etc.). Those skilled in the art will appreciate, however, that many other chemical moiety/surface pairs can be used in a similar way to achieve the same purpose. The specific functionalized surfaces described herein are examples only, and any other suitable immobilized structure or substrate capable of being associated with (e.g., linked to, bound to, etc.) one or more extraction moieties may be used.

Cleavage of Nucleic Acids

Various aspects of the present technology, including enrichment of nucleic acid material using adaptors, oligonucleotides, and capture labels, may incorporate enzymatic cleavage, enzymatic cleavage of one strand, enzymatic cleavage of both strands, incorporation of a modified nucleic acid followed by enzymatic treatment that leads to cleavage of one or both strands, incorporation of a photocleavable linker, incorporation of uracil, incorporation of a ribose base, incorporation of an 8-oxo-guanine adduct, use of a restriction endonuclease, use of a site-directed cutting enzyme, and the like. In other embodiments, endonucleases may be used, such as a ribonucleoprotein endonuclease (e.g., a Cas enzyme such as Cas9 or Cpf1) or another programmable endonuclease (e.g., a homing endonuclease, a zinc finger nuclease, a TALEN, a meganuclease (e.g., a megaTAL nuclease), an Argonaute nuclease, etc.), and any combination thereof.

As described herein, various embodiments include the use of one or more endonucleases that recognize a unique nucleotide sequence or modification, or other entities that recognize base or other backbone chemical modifications, for cutting and/or cleaving a double-stranded nucleic acid (e.g., DNA or RNA) at a specific position in one or more strands. Examples include uracil (which is recognized and can be cleaved by a combination of uracil DNA glycosylase and an abasic site lyase such as endonuclease VIII or FPG) and ribonucleotides, which can be recognized and cleaved by RNase H2 when base-paired with DNA. The nucleic acid can be DNA, RNA, or a combination thereof, and optionally includes a peptide nucleic acid (PNA), a locked nucleic acid (LNA), or another modified nucleic acid. In some embodiments, cleavage can be performed using one or more restriction endonucleases. In some embodiments, cleavage can be performed using a cleavable linker (e.g., uracil desthiobiotin-TEG, ribose cleavage, or other methods). In some embodiments, the cleavable linker can be a photocleavable linker or a chemically cleavable linker that does not require, or only partially requires, an enzyme.

One of ordinary skill in the art will appreciate that a variety of restriction endonucleases (i.e., restriction enzymes) that cleave DNA at or near a recognition site (e.g., EcoRI, BamHI, XbaI, HindIII, AluI, AvaII, BsaJI, BstNI, DsaV, Fnu4HI, HaeIII, MaeIII, NlaIV, NsiI, MspJI, FspEI, NaeI, Bsu36I, NotI, HinfI, Sau3AI, PvuII, SmaI, HgaI, EcoRV, etc.) may be compatible with various embodiments of the present technology. Lists of restriction endonucleases are available in printed and computer-readable form and are provided by many commercial suppliers (e.g., New England Biolabs, Ipswich, MA). A non-limiting list of restriction endonucleases and their associated recognition sites can be found at www.neb.com/tools-and-resources/selection-charts/alphabetized-list-of-recognition-specificities.
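As an illustration of how recognition sites for the enzymes named above might be located in a target sequence, the following minimal Python sketch scans a DNA string for three well-known palindromic sites (EcoRI, GAATTC; BamHI, GGATCC; HindIII, AAGCTT). The function names, the example sequence, and the choice of enzymes are assumptions made for illustration only; they are not part of the disclosed methods, and the sketch reports site positions rather than exact cut positions.

```python
# Recognition sequences (5'->3') for three commonly used restriction enzymes.
# These sites are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(sequence, site):
    """Return the 0-based start index of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping occurrences are also reported.
        start = sequence.find(site, start + 1)
    return positions


def map_sites(sequence):
    """Map each enzyme name to the list of site positions in `sequence`."""
    return {name: find_sites(sequence, site)
            for name, site in RECOGNITION_SITES.items()}


# Hypothetical example sequence, invented for this sketch.
example = "TTGAATTCAAGGATCCTTAAGCTTGGGAATTC"
print(map_sites(example))
```

Enzymes with degenerate or non-palindromic recognition sequences would need pattern matching on both strands, which this sketch deliberately leaves out.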

In some embodiments, a modified nucleotide or a non-nucleotide moiety can provide a cleavable moiety. Examples include uracil bases (cleavable, as one example, with a combination of UDG and endonuclease VIII or FPG), abasic sites (cleavable, as one example, with endonuclease VIII), 8-oxo-guanine (cleavable, as one example, with FPG or OGG1), and ribonucleotides (cleavable, in one example, with RNase H2 when paired with DNA).

Ligatable Ends

In some embodiments, adaptor products are generated that have a ligatable 3' end suitable for ligation to a target double-stranded nucleic acid sequence (e.g., for sequencing library preparation). The ligation domain present in each double-stranded adaptor product may be capable of being ligated to one corresponding strand of a double-stranded target nucleic acid sequence. In some embodiments, one of the ligation domains includes a T-overhang, an A-overhang, a CG-overhang, a multiple-nucleotide overhang, a blunt end, or another ligatable nucleic acid sequence. In some embodiments, a double-stranded 3' ligation domain includes a blunt end. In certain embodiments, at least one of the ligation domain sequences includes a modified or non-standard nucleic acid. In some embodiments, the modified nucleotide can be an abasic site, uracil, tetrahydrofuran, 8-oxo-7,8-dihydro-2'-deoxyadenosine (8-oxo-A), 8-oxo-7,8-dihydro-2'-deoxyguanosine (8-oxo-G), deoxyinosine, 5-nitroindole, 5-hydroxymethyl-2'-deoxycytidine, isocytosine, 5-methyl-isocytosine, or isoguanosine. In some embodiments, at least one strand of the ligation domain includes a dephosphorylated base. In some embodiments, at least one of the ligation domains includes a dehydroxylated base. In some embodiments, at least one strand of the ligation domain has been chemically modified to render it unligatable (e.g., until a further action is taken to render the ligation domain ligatable). In some embodiments, a 3' overhang is obtained by using a polymerase with terminal transferase activity. In one example, Taq polymerase can add a single-nucleotide overhang. In some embodiments, this nucleotide is an "A".

Non-Standard Nucleotides

In some embodiments, a provided template and/or extension strand can include one or more non-standard/non-canonical nucleotides. In some embodiments, a non-standard nucleotide can be or include uracil, a methylated nucleotide, an RNA nucleotide, a ribonucleotide, 8-oxo-guanine, a biotinylated nucleotide, a desthiobiotin nucleotide, a thiol-modified nucleotide, an acrylate-modified nucleotide, iso-dC, iso-dG, a 2'-O-methyl nucleotide, an inosine nucleotide, a locked nucleic acid, a peptide nucleic acid, 5-methyl dC, 5-bromodeoxyuridine, 2,6-diaminopurine, a 2-aminopurine nucleotide, an abasic nucleotide, a 5-nitroindole nucleotide, an adenylated nucleotide, an azide nucleotide, a digoxigenin nucleotide, an I-linker, a 5' hexynyl-modified nucleotide, 5-octadiynyl dU, a photocleavable spacer, a non-photocleavable spacer, a click-chemistry-compatible modified nucleotide, a fluorescent dye, biotin, furan, BrdU, fluoro-dU, iodo-dU, and any combination thereof.

Additional Aspects

In accordance with aspects of the present disclosure, some embodiments provide high-quality sequencing information from very small amounts of nucleic acid material. In some embodiments, provided methods and compositions can be used with an amount of starting nucleic acid material of at most about 1 picogram (pg), 10 pg, 100 pg, 1 nanogram (ng), 10 ng, 100 ng, 200 ng, 300 ng, 400 ng, 500 ng, 600 ng, 700 ng, 800 ng, 900 ng, or 1000 ng. In some embodiments, provided methods and compositions can be used with an input amount of nucleic acid material of at most 1 molecular copy or genome equivalent, 10 molecular copies or genome equivalents thereof, 100 molecular copies or genome equivalents thereof, 1,000 molecular copies or genome equivalents thereof, 10,000 molecular copies or genome equivalents thereof, 100,000 molecular copies or genome equivalents thereof, or 1,000,000 molecular copies or genome equivalents thereof. For example, in some embodiments, at most 1,000 ng of nucleic acid material is initially provided for a particular sequencing process. In other embodiments, at most 100 ng, at most 10 ng, at most 1 ng, at most 100 pg, or at most 1 pg of nucleic acid material is initially provided for a particular sequencing process.
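The input amounts above are stated both as masses and as genome equivalents. The following rough Python sketch shows how the two might be related. It assumes a haploid human genome of about 3.1 billion base pairs and an average molar mass of about 650 g/mol per base pair; both constants, and the function names, are assumptions introduced here for illustration and do not come from the disclosure.

```python
AVOGADRO = 6.022e23         # molecules per mole
BP_MASS_G_PER_MOL = 650.0   # assumed average molar mass of one DNA base pair
HUMAN_HAPLOID_BP = 3.1e9    # assumed size of a haploid human genome


def genome_mass_pg(genome_bp=HUMAN_HAPLOID_BP):
    """Approximate mass, in picograms, of one copy of a genome."""
    grams = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e12


def genome_equivalents(input_ng, genome_bp=HUMAN_HAPLOID_BP):
    """Approximate number of genome copies contained in `input_ng` of DNA."""
    return input_ng * 1000.0 / genome_mass_pg(genome_bp)


for ng in (0.001, 0.1, 1, 10, 100, 1000):
    copies = genome_equivalents(ng)
    print(f"{ng:>8} ng  ~ {copies:,.1f} haploid genome equivalents")
```

Under these assumptions one haploid human genome weighs a little over 3 pg, so a 1 pg input holds less than one full genome copy while a 10 ng input holds on the order of a few thousand.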

In accordance with other aspects of the present technology, some provided methods can be used to sequence any of a variety of suboptimal (e.g., damaged or degraded) samples of nucleic acid material. For example, in some embodiments, at least some of the nucleic acid material is damaged. In some embodiments, the damage is or includes at least one of oxidation, alkylation, deamination, methylation, hydrolysis, nicking, intra-strand crosslinks, inter-strand crosslinks, blunt-end strand breakage, staggered-end double-strand breakage, phosphorylation, dephosphorylation, sumoylation, glycosylation, single-stranded gaps, damage from heat, damage from desiccation, damage from UV exposure, damage from gamma radiation, damage from X-radiation, damage from ionizing radiation, damage from non-ionizing radiation, damage from heavy particle radiation, damage from nuclear decay, damage from beta radiation, damage from alpha radiation, damage from neutron radiation, damage from proton radiation, damage from cosmic radiation, damage from high pH, damage from low pH, damage from reactive oxidizing species, damage from free radicals, damage from peroxide, damage from hypochlorite, damage from tissue fixation such as formalin or formaldehyde, damage from reactive iron, damage from low ionic conditions, damage from high ionic conditions, damage from unbuffered conditions, damage from nucleases, damage from environmental exposure, damage from fire, damage from mechanical stress, damage from enzymatic degradation, damage from microorganisms, damage from preparative mechanical shearing, damage from preparative enzymatic fragmentation, damage that occurred naturally in vivo, damage that occurred during nucleic acid extraction, damage that occurred during sequencing library preparation, damage introduced by a polymerase, damage introduced during nucleic acid repair, damage that occurred during nucleic acid end-tailing, damage that occurred during nucleic acid ligation, damage that occurred during sequencing, damage that occurred from mechanical handling of DNA, damage that occurred during passage through a nanopore, damage that occurred as part of aging in an organism, damage that occurred as a result of chemical exposure of an individual, damage caused by a mutagen, damage caused by a carcinogen, damage caused by a clastogen, damage that occurred as a result of in vivo inflammatory damage caused by oxygen exposure, damage due to breakage of one or more strands, and any combination thereof.

II. Selected Embodiments of Duplex Sequencing Methods and Associated Adaptors and Reagents

Duplex Sequencing is a method for producing error-corrected DNA sequences from double-stranded nucleic acid molecules. It was originally described in International Patent Publication No. WO 2013/142389, in U.S. Patent No. 9,752,188, and in WO 2017/100441, as well as in Schmitt et al., PNAS, 2012 [1]; Kennedy et al., PLOS Genetics, 2013 [2]; Kennedy et al., Nature Protocols, 2014 [3]; and Schmitt et al., Nature Methods, 2015 [4]. Each of the above patents, patent applications, and publications is incorporated herein by reference in its entirety. As shown in FIGS. 1A-1C, and in certain aspects of the technology, Duplex Sequencing can be used to independently sequence both strands of an individual DNA molecule in such a way that, during massively parallel sequencing (MPS), also commonly known as next-generation sequencing (NGS), the derivative sequence reads can be recognized as having originated from the same double-stranded nucleic acid parent molecule, while also being distinguishable from each other as separate entities after sequencing. The sequence reads obtained from each strand are then compared to obtain an error-corrected sequence of the original double-stranded nucleic acid molecule, known as a Duplex Consensus Sequence (DCS). The Duplex Sequencing process makes it possible to explicitly confirm that both strands of the original double-stranded nucleic acid molecule are represented in the generated sequencing data used to form a DCS.

In certain embodiments, methods incorporating DS can include ligating one or more sequencing adaptors to a target double-stranded nucleic acid molecule, which comprises a first-strand target nucleic acid sequence and a second-strand target nucleic acid sequence, to produce a double-stranded target nucleic acid complex (e.g., FIG. 22A).

In various embodiments, the resulting target nucleic acid complex can include at least one SMI sequence, which may entail an exogenously applied degenerate or semi-degenerate sequence (e.g., the randomized duplex tag shown in FIG. 22A, the sequences identified as α and β in FIG. 22A), endogenous information related to the specific shear points of the target double-stranded nucleic acid molecule, or a combination thereof. The SMI can render the target nucleic acid molecule substantially distinguishable from the plurality of other molecules in the population being sequenced, either alone or in combination with distinguishing elements of the nucleic acid fragments to which it is ligated. The substantially distinguishable feature of the SMI element can be carried independently by each of the single strands that form the double-stranded nucleic acid molecule, such that the derivative amplification products of each strand can be recognized after sequencing as having come from the same original, substantially unique double-stranded nucleic acid molecule. In other embodiments, the SMI may include additional information and/or may be used in other methods for which such molecule-distinguishing functionality is useful, such as those described in the publications referenced above. In another embodiment, the SMI element may be incorporated after adaptor ligation. In some embodiments, the SMI is double-stranded in nature. In other embodiments, it is single-stranded in nature (e.g., the SMI can be on the single-stranded portion of the adaptor). In still other embodiments, it is a combination of single-stranded and double-stranded in nature.

In some embodiments, each double-stranded target nucleic acid sequence complex can further include an element (e.g., an SDE) that renders the amplification products of the two single-stranded nucleic acids that form the target double-stranded nucleic acid molecule substantially distinguishable from each other after sequencing. In one embodiment, an SDE may comprise asymmetric primer sites included within the sequencing adaptors. In other arrangements, sequence asymmetry may be introduced into the adaptor molecules outside the primer sequences, such that, following amplification and sequencing, at least one position in the nucleotide sequences of the first-strand target nucleic acid sequence complex and the second-strand target nucleic acid sequence complex differs. In other embodiments, the SDE may comprise another biochemical asymmetry between the two strands that differs from the canonical nucleotide sequences A, T, C, G, or U but is converted into at least one canonical nucleotide sequence difference in the two amplified and sequenced molecules. In yet another embodiment, the SDE may be a means of physically separating the two strands before amplification, such that the derivative amplification products from the first-strand target nucleic acid sequence and the second-strand target nucleic acid sequence are kept substantially physically isolated from one another in order to maintain the distinction between the two. Other such arrangements or methods for providing an SDE function that allows the first and second strands to be distinguished may be used, such as those described in the publications referenced above, or other methods that serve the functional purpose described.

After generating the double-stranded target nucleic acid complex comprising at least one SMI and at least one SDE, or where one or both of these elements will be introduced subsequently, the complex can be subjected to DNA amplification, such as PCR or any other biochemical method of DNA amplification (e.g., rolling circle amplification, multiple displacement amplification, isothermal amplification, bridge amplification, or surface-bound amplification), such that one or more copies of the first-strand target nucleic acid sequence and one or more copies of the second-strand target nucleic acid sequence are produced (e.g., FIG. 22B). The one or more amplified copies of the first-strand target nucleic acid molecule and the one or more amplified copies of the second-strand target nucleic acid molecule can then be subjected to DNA sequencing, preferably using a "next-generation" massively parallel DNA sequencing platform (e.g., FIG. 22B).

Sequence reads produced from the first-strand target nucleic acid molecule and the second-strand target nucleic acid molecule derived from the original double-stranded target nucleic acid molecule can be identified on the basis of sharing a related, substantially unique SMI, and can be distinguished from the opposite-strand target nucleic acid molecule by virtue of the SDE. In some embodiments, the SMI can be a sequence based on a mathematically derived error-correcting code (e.g., a Hamming code), whereby certain amplification errors, sequencing errors, or SMI synthesis errors can be tolerated for the purpose of relating the SMI sequences on the complementary strands of an original duplex (e.g., a double-stranded nucleic acid molecule). For example, for a double-stranded exogenous SMI that comprises 15 base pairs of fully degenerate sequence of standard DNA bases, an estimated 4^15 = 1,073,741,824 SMI variants will exist in the fully degenerate SMI population. If two SMIs that differ by only one nucleotide within the SMI sequence are recovered from the sequencing reads of a sampled population of 10,000 SMIs, the probability of this occurring by random chance can be calculated mathematically, a decision can be made as to whether the single base pair difference is more likely to reflect one of the aforementioned types of errors, and, if so, the SMI sequences can be determined to have in fact derived from the same original duplex molecule. In some embodiments in which the SMI is, at least in part, an exogenously applied sequence whose variants are not fully degenerate with respect to one another and are, at least in part, known sequences, the identities of the known sequences can be designed such that one or more errors of the aforementioned types will not convert the identity of one known SMI sequence into that of another, which reduces the probability of one SMI being misinterpreted as another. In some embodiments, this SMI design strategy comprises a Hamming code approach or a derivative thereof. Once identified, one or more sequence reads produced from the first-strand target nucleic acid molecule are compared with one or more sequence reads produced from the second-strand target nucleic acid molecule to produce an error-corrected target nucleic acid molecule sequence (e.g., Figure 22C). For example, nucleotide positions at which the bases from the first-strand and second-strand target nucleic acid sequences agree are deemed to be true sequence, whereas nucleotide positions that disagree between the two strands are recognized as potential sites of technical error that can be discounted, eliminated, corrected, or otherwise identified. An error-corrected sequence of the original double-stranded target nucleic acid molecule can thus be produced (shown in Figure 22C). In some embodiments, after the sequencing reads produced from the first-strand target nucleic acid molecule and from the second-strand target nucleic acid molecule have each been grouped separately, a single-strand consensus sequence can be generated for each of the first and second strands. The single-strand consensus sequences from the first-strand and second-strand target nucleic acid molecules can then be compared to produce an error-corrected target nucleic acid molecule sequence (e.g., Figure 22C).
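The SMI arithmetic described above can be illustrated with a short Python sketch. It is not the implementation of this disclosure: it assumes SMIs are drawn independently and uniformly from the fully degenerate pool, and the function names (`smi_space`, `hamming`, `expected_distance_one_pairs`, `min_pairwise_distance`) are hypothetical names chosen for illustration.

```python
from itertools import combinations


def smi_space(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct fully degenerate SMIs of a given length."""
    return alphabet_size ** length


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length SMIs differ."""
    if len(a) != len(b):
        raise ValueError("SMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def expected_distance_one_pairs(n_sampled: int, length: int,
                                alphabet_size: int = 4) -> float:
    """Expected number of SMI pairs differing at exactly one position,
    ASSUMING independent, uniform sampling from the degenerate pool."""
    pairs = n_sampled * (n_sampled - 1) / 2
    # A given pair differs at exactly one of `length` positions with
    # probability length * (alphabet_size - 1) / alphabet_size**length.
    p_one = length * (alphabet_size - 1) / alphabet_size ** length
    return pairs * p_one


def min_pairwise_distance(smis: list[str]) -> int:
    """Smallest Hamming distance within a designed (non-degenerate) SMI set.
    A set with minimum distance d cannot have one member converted into
    another by fewer than d substitution errors."""
    return min(hamming(a, b) for a, b in combinations(smis, 2))
```

Under these assumptions, sampling 10,000 SMIs from the 15-mer pool yields on the order of two pairs at distance one by chance alone, which is the kind of baseline against which an observed single-nucleotide difference could be weighed.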

Alternatively, in some embodiments, sites of sequence disagreement between the two strands can be recognized as potential sites of biologically derived mismatches in the original double-stranded target nucleic acid molecule. Alternatively, in some embodiments, sites of sequence disagreement between the two strands can be recognized as potential sites of DNA synthesis-derived mismatches in the original double-stranded target nucleic acid molecule. Alternatively, in some embodiments, sites of sequence disagreement between the two strands can be recognized as potential sites where a damaged or modified nucleotide base was present on one or both strands and was converted to a mismatch by an enzymatic process (e.g., a DNA polymerase, a DNA glycosylase, or another nucleic acid-modifying enzyme) or by a chemical process. In some embodiments, this latter finding can be used to infer the presence of nucleic acid damage or nucleotide modification prior to the enzymatic process or chemical treatment.

In some embodiments, and in accordance with aspects of the present technology, sequencing reads generated from the Duplex Sequencing steps discussed herein can be further filtered to eliminate sequencing reads from DNA-damaged molecules (e.g., molecules damaged during storage or shipping, during or following tissue or blood extraction, during or following library preparation, etc.). For example, DNA repair enzymes such as uracil-DNA glycosylase (UDG), formamidopyrimidine DNA glycosylase (FPG), and 8-oxoguanine DNA glycosylase (OGG1) can be used to eliminate or correct DNA damage (e.g., in vitro DNA damage or in vivo damage). These DNA repair enzymes are, for example, glycosylases that remove damaged bases from DNA. For example, UDG removes uracil that results from cytosine deamination (caused by spontaneous hydrolysis of cytosine), and FPG removes 8-oxoguanine (e.g., a common DNA lesion that results from reactive oxygen species). FPG also has lyase activity that can generate a 1-base gap at abasic sites. Such abasic sites will generally fail to amplify subsequently by PCR, for example because the polymerase fails to copy the template. Accordingly, the use of such DNA damage repair/elimination enzymes can effectively remove damaged DNA that does not carry a true mutation but might otherwise go undetected as an error following sequencing and duplex sequence analysis. Although an error due to a damaged base can usually be corrected by Duplex Sequencing, in rare cases a complementary error could theoretically occur at the same position on both strands; reducing error-increasing damage can therefore reduce the probability of artifacts. Furthermore, during library preparation, certain fragments of DNA to be sequenced may be single-stranded, either from their source or as a result of processing steps (e.g., mechanical DNA shearing). These regions are typically converted to double-stranded DNA during an "end repair" step known in the art, in which a DNA polymerase and nucleoside substrates are added to a DNA sample to extend 5' recessed ends. A mutagenic site of DNA damage in the single-stranded portion of the DNA being copied (i.e., a single-stranded 5' overhang at one or both ends of the DNA duplex, or an internal single-stranded nick or gap) can cause an error during the fill-in reaction. Such an error can render a single-stranded mutation, synthesis error, or site of nucleic acid damage into a double-stranded form that could be misinterpreted in the final duplex consensus sequence as a true mutation present in the original double-stranded nucleic acid molecule when, in fact, it was not. This scenario, termed "pseudo-duplex", can be reduced or prevented by use of such damage-destroying/repair enzymes. In other embodiments, this occurrence can be reduced or eliminated through strategies that destroy or prevent the formation of single-stranded portions of the original duplex molecule (e.g., fragmenting the original double-stranded nucleic acid material with certain enzymes rather than with mechanical shearing or with certain other enzymes that may leave nicks or gaps). In other embodiments, processes that eliminate single-stranded portions of original double-stranded nucleic acids (e.g., single-strand-specific nucleases such as S1 nuclease or mung bean nuclease) can be used for a similar purpose.

In further embodiments, sequencing reads generated from the Duplex Sequencing steps discussed herein can be further filtered to eliminate false mutations by trimming the ends of the reads, which are most prone to pseudo-duplex artifacts. For example, DNA fragmentation can generate single-stranded portions at the terminal ends of double-stranded molecules. These single-stranded portions can be filled in during end repair (e.g., by Klenow or T4 polymerase). In some instances, the polymerase makes copying errors in these end-repaired regions, leading to the generation of "pseudo-duplex molecules". Once sequenced, these library preparation artifacts can incorrectly appear to be true mutations. Such errors, which result from the end repair mechanism, can be eliminated or reduced in post-sequencing analysis by trimming the ends of the sequencing reads to exclude any mutations that may have occurred in these higher-risk regions, thereby reducing the number of false mutations. In one embodiment, such trimming of sequencing reads can be performed automatically (e.g., as a normal process step). In another embodiment, the mutation frequency in the fragment end regions can be assessed, and if a threshold level of mutations is observed in the fragment end regions, sequencing read trimming can be performed before double-strand consensus sequence reads of the DNA fragments are generated.
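The two trimming embodiments above (unconditional trimming, and trimming only when the end regions exceed a mutation threshold) can be sketched as follows. This is a simplified illustration under stated assumptions: reads are taken to be already aligned to a reference of identical length with no insertions or deletions, and the names `trim_ends`, `end_mismatch_fraction`, and `conditional_trim`, as well as the `window` and `threshold` parameters, are hypothetical.

```python
def trim_ends(read: str, n_trim: int) -> str:
    """Remove n_trim bases from each end of a read."""
    if n_trim < 0:
        raise ValueError("n_trim must be non-negative")
    if 2 * n_trim >= len(read):
        return ""
    return read[n_trim:len(read) - n_trim]


def end_mismatch_fraction(reads: list[str], reference: str, window: int) -> float:
    """Fraction of end-region bases (first and last `window` positions)
    that differ from the reference, pooled over all reads."""
    n = len(reference)
    w = min(window, n)
    end_positions = set(range(w)) | set(range(n - w, n))
    mismatches = 0
    total = 0
    for read in reads:
        if len(read) != n:
            raise ValueError("reads must match the reference length")
        for i in end_positions:
            total += 1
            mismatches += read[i] != reference[i]
    return mismatches / total if total else 0.0


def conditional_trim(reads: list[str], reference: str,
                     window: int, threshold: float) -> list[str]:
    """Trim read ends only if the end-region mismatch fraction
    meets or exceeds the chosen threshold; otherwise return reads as-is."""
    if end_mismatch_fraction(reads, reference, window) >= threshold:
        return [trim_ends(r, window) for r in reads]
    return list(reads)
```

In practice the window size and threshold would be chosen empirically for a given library preparation; the values used in any example are placeholders.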

As a specific example, in some embodiments, provided herein are methods of generating an error-corrected sequence read of a double-stranded target nucleic acid material, comprising the step of ligating the double-stranded target nucleic acid material to at least one adapter sequence to form an adapter-target nucleic acid material complex, wherein the at least one adapter sequence comprises (a) a degenerate or semi-degenerate single molecule identifier (SMI) sequence that uniquely labels each molecule of the double-stranded target nucleic acid material, and (b) a first nucleotide adapter sequence that tags a first strand of the adapter-target nucleic acid material complex, and a second nucleotide adapter sequence that is at least partially non-complementary to the first nucleotide sequence and that tags a second strand of the adapter-target nucleic acid material complex, such that each strand of the adapter-target nucleic acid material complex has a distinctly identifiable nucleotide sequence relative to its complementary strand. The method can next include the step of amplifying each strand of the adapter-target nucleic acid material complex to produce a plurality of first-strand adapter-target nucleic acid complex amplicons and a plurality of second-strand adapter-target nucleic acid complex amplicons. The method can further include the step of amplifying both the first and second strands to provide a first nucleic acid product and a second nucleic acid product. The method can also include the steps of sequencing each of the first nucleic acid product and the second nucleic acid product to produce a plurality of first-strand sequence reads and a plurality of second-strand sequence reads, and confirming the presence of at least one first-strand sequence read and at least one second-strand sequence read. The method can further include comparing the at least one first-strand sequence read with the at least one second-strand sequence read, and generating an error-corrected sequence read of the double-stranded target nucleic acid material by discounting nucleotide positions that do not agree, or alternatively by removing compared first-strand and second-strand sequence reads having one or more nucleotide positions at which the compared first-strand and second-strand sequence reads are non-complementary.

As an additional specific example, in some embodiments, provided herein are methods of identifying a DNA variant from a sample, comprising the steps of ligating both strands of a nucleic acid material (e.g., a double-stranded target DNA molecule) to at least one asymmetric adapter molecule to form an adapter-target nucleic acid material complex having a first nucleotide sequence associated with a first strand (e.g., a top strand) of the double-stranded target DNA molecule and a second nucleotide sequence that is at least partially non-complementary to the first nucleotide sequence and is associated with a second strand (e.g., a bottom strand) of the double-stranded target DNA molecule; and amplifying each strand of the adapter-target nucleic acid material, resulting in each strand generating a distinct yet related set of amplified adapter-target nucleic acid products. The method can further include the steps of sequencing each of a plurality of first-strand adapter-target nucleic acid products and a plurality of second-strand adapter-target nucleic acid products, confirming the presence of at least one amplified sequence read from each strand of the adapter-target nucleic acid material complex, and comparing the at least one amplified sequence read obtained from the first strand with the at least one amplified sequence read obtained from the second strand to form a consensus sequence read of the nucleic acid material (e.g., a double-stranded target DNA molecule) having only nucleotide bases at which the sequences of both strands of the nucleic acid material agree, such that a variant occurring at a particular position in the consensus sequence read (e.g., as compared to a reference sequence) is identified as a true DNA variant.

In some embodiments, provided herein are methods of generating a high-accuracy consensus sequence from a double-stranded nucleic acid material, comprising the step of tagging individual duplex DNA molecules with an adapter molecule to form tagged DNA material, wherein each adapter molecule comprises (a) a degenerate or semi-degenerate single molecule identifier (SMI) that uniquely labels the duplex DNA molecule, and (b) first and second non-complementary nucleotide adapter sequences that, for each tagged DNA molecule, distinguish an original top strand from an original bottom strand of each individual DNA molecule within the tagged DNA material; and the step of generating a set of duplicates of the original top strand of the tagged DNA molecule and a set of duplicates of the original bottom strand of the tagged DNA molecule to form amplified DNA material. The method can further include the steps of creating a first single-strand consensus sequence (SSCS) from the duplicates of the original top strand and a second single-strand consensus sequence (SSCS) from the duplicates of the original bottom strand, comparing the first SSCS of the original top strand with the second SSCS of the original bottom strand, and generating a high-accuracy consensus sequence having only nucleotide bases at which the sequence of the first SSCS of the original top strand and the sequence of the second SSCS of the original bottom strand are complementary.
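The two-stage consensus described above (an SSCS per strand, followed by a comparison of the two SSCSs) can be sketched in Python. This is a minimal illustration rather than the method as claimed: it assumes equal-length, gap-free reads, a simple majority rule with a hypothetical `min_fraction` cutoff, and that bottom-strand reads are supplied in their own 5'→3' orientation so that complementarity is checked by reverse-complementing. All function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def sscs(reads: list[str], min_fraction: float = 0.7) -> str:
    """Single-strand consensus: per position, keep the most common base
    if it reaches min_fraction of the family, otherwise emit N."""
    if not reads:
        raise ValueError("at least one read is required")
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must have equal length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(top_sscs: str, bottom_sscs: str) -> str:
    """Duplex consensus in top-strand orientation: keep a base only where
    the bottom-strand SSCS is complementary to the top-strand SSCS."""
    bottom_as_top = revcomp(bottom_sscs)
    if len(top_sscs) != len(bottom_as_top):
        raise ValueError("SSCSs must have equal length")
    return "".join(
        t if t == b and t != "N" else "N"
        for t, b in zip(top_sscs, bottom_as_top)
    )
```

Masking a non-complementary position with N corresponds to the "discount" option in the text; removing the whole read pair instead would be an equally valid design choice under the alternative embodiment.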

In further embodiments, provided herein are methods of detecting and/or quantifying DNA damage from a sample comprising double-stranded target DNA molecules, comprising the step of ligating both strands of each double-stranded target DNA molecule to at least one asymmetric adapter molecule to form a plurality of adapter-target DNA complexes, wherein each adapter-target DNA complex has a first nucleotide sequence associated with a first strand of a double-stranded target DNA molecule and a second nucleotide sequence that is at least partially non-complementary to the first nucleotide sequence and is associated with a second strand of the double-stranded target DNA molecule, and, for each adapter-target DNA complex, amplifying each strand of the adapter-target DNA complex, resulting in each strand generating a distinct yet related set of amplified adapter-target DNA amplicons. The method can further include the steps of sequencing each of a plurality of first-strand adapter-target DNA amplicons and a plurality of second-strand adapter-target DNA amplicons, confirming the presence of at least one sequence read from each strand of the adapter-target DNA complex, and comparing the at least one sequence read obtained from the first strand with the at least one sequence read obtained from the second strand to detect and/or quantify nucleotide bases at which the sequence read of one strand of the double-stranded DNA molecule disagrees with (e.g., is non-complementary to) the sequence read of the other strand of the double-stranded DNA molecule, such that sites of DNA damage can be detected and/or quantified. In some embodiments, the method can further include the steps of creating a first single-strand consensus sequence (SSCS) from the first-strand adapter-target DNA amplicons and a second single-strand consensus sequence (SSCS) from the second-strand adapter-target DNA amplicons, comparing the first SSCS of the original first strand with the second SSCS of the original second strand, and identifying nucleotide bases at which the sequence of the first SSCS and the sequence of the second SSCS are non-complementary, to detect and/or quantify DNA damage associated with the double-stranded target DNA molecules in the sample.
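The strand-disagreement readout used for damage detection can be sketched as follows. This is an illustrative simplification, not the claimed method: it assumes both single-strand consensus sequences have already been placed in the same reference orientation (so agreement means identical bases), treats N as "no call", and uses the hypothetical function names `discordant_sites` and `discordance_rate`.

```python
def discordant_sites(top_sscs: str, bottom_sscs: str) -> list[tuple[int, str, str]]:
    """Positions (0-based) where the two strand consensuses disagree.
    Both inputs are ASSUMED to be in the same reference orientation."""
    if len(top_sscs) != len(bottom_sscs):
        raise ValueError("consensus sequences must have equal length")
    sites = []
    for pos, (t, b) in enumerate(zip(top_sscs, bottom_sscs)):
        if "N" in (t, b):
            continue  # no call on at least one strand; not comparable
        if t != b:
            sites.append((pos, t, b))
    return sites


def discordance_rate(top_sscs: str, bottom_sscs: str) -> float:
    """Discordant positions divided by positions called on both strands."""
    compared = sum(
        1 for t, b in zip(top_sscs, bottom_sscs) if "N" not in (t, b)
    )
    if compared == 0:
        return 0.0
    return len(discordant_sites(top_sscs, bottom_sscs)) / compared
```

Pooling these per-molecule sites across a sample would give a count or rate of candidate damage sites; how such a rate is normalized or interpreted is outside what this sketch shows.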

Single Molecule Identifier Sequence (SMI)

In accordance with various embodiments, provided methods and compositions include one or more SMI sequences on each strand of a nucleic acid material. The SMI can be carried independently by each of the single strands that result from a double-stranded nucleic acid molecule, such that the derivative amplification products of each strand can be recognized after sequencing as having come from the same original, substantially unique double-stranded nucleic acid molecule. In some embodiments, the SMI can include additional information and/or can be used in other methods for which such molecule-distinguishing functionality is useful, as will be recognized by one of skill in the art. In some embodiments, an SMI element can be incorporated before, substantially simultaneously with, or after ligation of an adapter sequence to the nucleic acid material.

In some embodiments, an SMI sequence can include at least one degenerate or semi-degenerate nucleic acid. In other embodiments, an SMI sequence can be non-degenerate. In some embodiments, the SMI can be a sequence associated with or near a fragment end of the nucleic acid molecule (e.g., a randomly or semi-randomly sheared end of ligated nucleic acid material). In some embodiments, an exogenous sequence can be considered in conjunction with the sequence corresponding to the randomly or semi-randomly sheared ends of ligated nucleic acid material (e.g., DNA) to obtain an SMI sequence capable of distinguishing, for example, individual DNA molecules from one another. In some embodiments, the SMI sequence is a portion of an adapter sequence that is ligated to a double-stranded nucleic acid molecule. In certain embodiments, the adapter sequence comprising the SMI sequence is double-stranded, such that each strand of the double-stranded nucleic acid molecule includes an SMI following ligation to the adapter sequence. In another embodiment, the SMI sequence is single-stranded before or after ligation to a double-stranded nucleic acid molecule, and a complementary SMI sequence can be generated by extending the opposite strand with a DNA polymerase to yield a complementary double-stranded SMI sequence. In other embodiments, the SMI sequence is located in a single-stranded portion of the adapter (e.g., an arm of an adapter having a Y shape). In such embodiments, the SMI can facilitate grouping of families of sequence reads derived from an original strand of a double-stranded nucleic acid molecule, and in some instances can confer a relationship between the original first and second strands of the double-stranded nucleic acid molecule (e.g., all or part of the SMIs may be relatable via a lookup table). In embodiments in which the first and second strands are labeled with different SMIs, the sequence reads from the two original strands can be related using one or more endogenous SMIs (e.g., a fragment-specific feature such as a sequence associated with or near a fragment end of the nucleic acid molecule), an additional molecular tag shared by the two original strands (e.g., a barcode in a double-stranded portion of the adapter), or a combination thereof. In some embodiments, each SMI sequence can include between about 1 and about 30 nucleic acids (e.g., 1, 2, 3, 4, 5, 8, 10, 12, 14, 16, 18, 20, or more degenerate or semi-degenerate nucleic acids).
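The grouping of reads into per-molecule, per-strand families described above can be sketched as follows. This is an assumption-laden illustration: each read is taken to arrive already annotated with an SMI string and a strand label (here "top" or "bottom") derived from whatever strand-distinguishing feature is in use, and the `Read` type and function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    smi: str       # tag shared by both strands of one original molecule
    strand: str    # "top" or "bottom", as resolved from the strand label
    sequence: str


def group_families(reads: list[Read]) -> dict[str, dict[str, list[str]]]:
    """Group read sequences first by SMI, then by strand of origin."""
    families: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: {"top": [], "bottom": []}
    )
    for read in reads:
        if read.strand not in ("top", "bottom"):
            raise ValueError(f"unknown strand label: {read.strand!r}")
        families[read.smi][read.strand].append(read.sequence)
    return dict(families)


def duplex_ready(families: dict[str, dict[str, list[str]]],
                 min_reads: int = 1) -> list[str]:
    """SMIs for which both strands have at least min_reads reads,
    i.e. molecules for which a two-strand comparison is possible."""
    return [
        smi for smi, fam in families.items()
        if len(fam["top"]) >= min_reads and len(fam["bottom"]) >= min_reads
    ]
```

Where the two strands carry different SMIs, the dictionary key would instead be whatever shared feature relates them (for example a lookup-table entry or a fragment-end sequence), which this sketch does not model.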

In some embodiments, an SMI is capable of being ligated to one or both of a nucleic acid material and an adapter sequence. In some embodiments, an SMI can be ligated to at least one of a T-overhang, an A-overhang, a CG-overhang, an overhang comprising a "sticky end" or single-stranded overhang region of known nucleotide length (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more nucleotides), a dehydroxylated base, and a blunt end of a nucleic acid material.

In some embodiments, the sequence of an SMI can be considered in conjunction with (or designed in accordance with) the sequence corresponding to, for example, randomly or semi-randomly sheared ends of a nucleic acid material (e.g., a ligated nucleic acid material), to obtain an SMI sequence capable of distinguishing individual nucleic acid molecules from one another.

In some embodiments, at least one SMI can be an endogenous SMI (e.g., an SMI related to a shear point (e.g., a fragment end), for example using the shear point itself or using a defined number of nucleotides in the nucleic acid material immediately adjacent to the shear point [e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides from the shear point]). In some embodiments, at least one SMI can be an exogenous SMI (e.g., an SMI comprising a sequence that is not found on the target nucleic acid material).
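As a small illustration of an endogenous SMI built from the nucleotides adjacent to the shear points, optionally combined with an exogenous tag, consider the sketch below. It assumes the full fragment sequence is available as a string; the function names, the `k` parameter, and the `:` separator are hypothetical choices made for this example only.

```python
def endogenous_smi(fragment: str, k: int = 5) -> str:
    """Identifier formed from the k nucleotides at each end of a fragment,
    i.e. the bases immediately adjacent to the two shear points."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(fragment) < 2 * k:
        raise ValueError("fragment is too short for the requested k")
    return fragment[:k] + fragment[-k:]


def combined_smi(exogenous_tag: str, fragment: str, k: int = 5) -> str:
    """Exogenous tag considered together with the fragment-end sequence."""
    return f"{exogenous_tag}:{endogenous_smi(fragment, k)}"
```

With aligned data, the mapped coordinates of the fragment ends could serve the same role as the end sequences used here.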

In some embodiments, an SMI can be or comprise an imaging moiety (e.g., a fluorescent or otherwise optically detectable moiety). In some embodiments, such SMIs allow for detection and/or quantification without the need for an amplification step.

In some embodiments, an SMI element can comprise two or more distinct SMI elements that are located at different positions on the adapter-target nucleic acid complex.

Various embodiments of SMIs are further disclosed in International Patent Publication No. WO2017/100441, which is incorporated by reference herein in its entirety.

Strand Defining Element (SDE)

In some embodiments, each strand of a double-stranded nucleic acid material can further include an element that renders the amplification products of the two single-stranded nucleic acids that form the target double-stranded nucleic acid material substantially distinguishable from each other after sequencing. In some embodiments, an SDE can be or comprise asymmetric primer sites included within a sequencing adapter or, in other arrangements, sequence asymmetries can be introduced into the adapter sequences outside the primer sequences, such that at least one position in the nucleotide sequences of the first-strand target nucleic acid sequence complex and the second-strand target nucleic acid sequence complex differs between the two following amplification and sequencing. In other embodiments, the SDE can comprise another biochemical asymmetry between the two strands that differs from the canonical nucleotides A, T, C, G, or U but is converted into at least one canonical nucleotide sequence difference in the two amplified and sequenced molecules. In yet another embodiment, the SDE can be or comprise a means of physically separating the two strands before amplification, such that the derivative amplification products from the first-strand target nucleic acid sequence and the second-strand target nucleic acid sequence are maintained in substantial physical isolation from one another for the purpose of preserving the distinction between the two derivative amplification products. Other such arrangements or methodologies for providing an SDE function that allows the first and second strands to be distinguished can be utilized.

In some embodiments, an SDE may be capable of forming a loop (e.g., a hairpin loop). In some embodiments, the loop can comprise at least one endonuclease recognition site. In some embodiments, the target nucleic acid complex can contain an endonuclease recognition site that facilitates a cleavage event within the loop. In some embodiments, the loop can comprise a non-canonical nucleotide sequence. In some embodiments, the contained non-canonical nucleotide can be recognized by one or more enzymes that facilitate strand cleavage. In some embodiments, the contained non-canonical nucleotide can be targeted by one or more chemical processes that facilitate strand cleavage in the loop. In some embodiments, the loop can contain a modified nucleic acid linker that can be targeted by one or more enzymatic, chemical, or physical processes that facilitate strand cleavage in the loop. In some embodiments, this modified linker is a photocleavable linker.

A variety of other molecular tools can serve as SMIs and SDEs. Beyond shear points and DNA-based tags, single-molecule compartmentalization methods that keep paired strands in physical proximity, or other non-nucleic acid tagging methods, can serve the strand-relating function. Similarly, asymmetric chemical labeling of the adapter strands in a way that allows them to be physically separated can serve as an SDE. A recently described variation of Duplex Sequencing uses bisulfite conversion to transform naturally occurring strand asymmetry, in the form of cytosine methylation, into sequence differences that distinguish the two strands. Although this implementation limits the types of mutations that can be detected, the concept of capitalizing on native asymmetry is noteworthy in the context of emerging sequencing technologies that can directly detect modified nucleotides. Various embodiments of SDEs are further disclosed in International Patent Publication No. WO2017/100441, which is incorporated by reference herein in its entirety.

Adapters and Adapter Sequences

In various arrangements, adapter molecules that include SMIs (e.g., molecular barcodes), SDEs, primer sites, flow cell sequences, and/or other features are contemplated for use with many of the embodiments disclosed herein. In some embodiments, provided adapters can be or comprise one or more sequences complementary or at least partially complementary to PCR primers (e.g., primer sites) that have at least one of the following properties: 1) high target specificity; 2) capability of being multiplexed; and 3) robust and minimally biased amplification.

In some embodiments, adapter molecules can be "Y"-shaped, "U"-shaped, or "hairpin"-shaped, or can have bubbles (e.g., a portion of sequence that is non-complementary) or other features. In other embodiments, adapter molecules can comprise a "Y" shape, a "U" shape, a "hairpin" shape, or a bubble. Certain adapters may comprise modified or non-standard nucleotides, restriction sites, or other features for manipulation of structure or function in vitro. Adapter molecules may be ligated to a variety of nucleic acid materials having a terminal end. For example, adapter molecules can be suited to ligate to a T-overhang, an A-overhang, a CG-overhang, a multiple-nucleotide overhang (also referred to herein as a "sticky end" or "sticky overhang"), a dehydroxylated base, a blunt end of a nucleic acid material, or the end of a molecule where the 5' end of the target is dephosphorylated or otherwise blocked from traditional ligation. In other embodiments, the adapter molecule can contain a dephosphorylated or otherwise ligation-preventing modification on the 5' strand at the ligation site. In the latter two embodiments, such strategies may be useful for preventing dimerization of library fragments or adapter molecules.

In some embodiments, an adapter molecule can include a capture moiety suitable for isolating the desired target nucleic acid molecule to which it is ligated.

An adapter sequence can mean a single-stranded sequence, a double-stranded sequence, a complementary sequence, a non-complementary sequence, a partially complementary sequence, an asymmetric sequence, a primer binding sequence, a flow cell sequence, a ligation sequence, or another sequence provided by an adapter molecule. In particular embodiments, an adapter sequence can mean a sequence used for amplification by way of a complementary oligonucleotide.

In some embodiments, provided methods and compositions include at least one adapter sequence (e.g., two adapter sequences, one on each of the 5' and 3' ends of a nucleic acid material). In some embodiments, provided methods and compositions may comprise 2 or more adapter sequences (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, at least two of the adapter sequences differ from one another (e.g., by sequence). In some embodiments, each adapter sequence differs from every other adapter sequence (e.g., by sequence). In some embodiments, at least one adapter sequence is at least partially non-complementary to at least a portion of at least one other adapter sequence (e.g., is non-complementary by at least one nucleotide).

In some embodiments, an adapter sequence comprises at least one non-standard nucleotide. In some embodiments, a non-standard nucleotide is selected from an abasic site, a uracil, tetrahydrofuran, 8-oxo-7,8-dihydro-2'-deoxyadenosine (8-oxo-A), 8-oxo-7,8-dihydro-2'-deoxyguanosine (8-oxo-G), deoxyinosine, 5'-nitroindole, 5-hydroxymethyl-2'-deoxycytidine, isocytosine, 5'-methyl-isocytosine, isoguanosine, a methylated nucleotide, an RNA nucleotide, a ribose nucleotide, an 8-oxoguanine, a photocleavable linker, a biotinylated nucleotide, a desthiobiotin nucleotide, a thiol-modified nucleotide, an acrylate-modified nucleotide, an iso-dC, an iso-dG, a 2'-O-methyl nucleotide, an inosine nucleotide, a locked nucleic acid, a peptide nucleic acid, a 5-methyl dC, a 5-bromo deoxyuridine, a 2,6-diaminopurine, a 2-aminopurine nucleotide, an abasic nucleotide, a 5-nitroindole nucleotide, an adenylated nucleotide, an azide nucleotide, a digoxigenin nucleotide, an I-linker, a 5' hexynyl-modified nucleotide, a 5-octadiynyl dU, a photocleavable spacer, a non-photocleavable spacer, a click chemistry-compatible modified nucleotide, and any combination thereof.

In some embodiments, an adapter sequence comprises a moiety having a magnetic property (i.e., a magnetic moiety). In some embodiments, this magnetic property is paramagnetic. In some embodiments where an adapter sequence comprises a magnetic moiety (e.g., a nucleic acid material ligated to an adapter sequence comprising a magnetic moiety), when a magnetic field is applied, the adapter sequence comprising the magnetic moiety is substantially separated from adapter sequences that do not comprise a magnetic moiety (e.g., a nucleic acid material ligated to an adapter sequence that does not comprise a magnetic moiety).

In some embodiments, at least one adapter sequence is located 5' to an SMI. In some embodiments, at least one adapter sequence is located 3' to an SMI.

In some embodiments, an adapter sequence may be linked to at least one of an SMI and a nucleic acid material via one or more linker domains. In some embodiments, a linker domain may be composed of nucleotides. In some embodiments, a linker domain may include at least one modified nucleotide or non-nucleotide molecule (e.g., as described elsewhere in this disclosure). In some embodiments, a linker domain may be or include a loop.

In some embodiments, an adapter sequence on either or both ends of each strand of a double-stranded nucleic acid material may further include one or more elements that provide an SDE. In some embodiments, an SDE may be or include asymmetric primer sites comprised within the adapter sequences.

In some embodiments, an adapter sequence may be or comprise at least one SDE and at least one ligation domain (i.e., a domain amenable to the activity of at least one ligase, for example, a domain suitable for ligating to a nucleic acid material through the activity of a ligase). In some embodiments, from 5' to 3', an adapter sequence may be or comprise a primer binding site, an SDE, and a ligation domain.

Various methods for synthesizing Duplex Sequencing adapters have been previously described in, e.g., U.S. Patent No. 9,752,188, International Patent Publication No. WO2017/100441, and International Patent Application No. PCT/US18/59908 (filed November 8, 2018), all of which are incorporated by reference herein in their entireties.

Primers

In some embodiments, one or more PCR primers having at least one of the following properties are contemplated for use in various embodiments in accordance with aspects of the present technology: 1) high target specificity; 2) capable of being multiplexed; and 3) exhibiting robust and minimally biased amplification. A number of prior studies and commercial products have designed primer mixtures that satisfy certain of these criteria for conventional PCR-CE. However, it has been noted that these primer mixtures are not always optimal for use with MPS. Indeed, developing highly multiplexed primer mixtures can be a challenging and time-consuming process. Conveniently, both Illumina and Promega have recently developed multiplex-compatible primer mixtures for the Illumina platform that show robust and efficient amplification of a variety of standard and non-standard STR and SNP loci. Because these kits use PCR to amplify their target regions prior to sequencing, the 5' end of each read in paired-end sequencing data corresponds to the 5' end of the PCR primers used to amplify the DNA. In some embodiments, provided methods and compositions include primers designed to ensure uniform amplification, which may entail varying reaction concentrations and melting temperatures and minimizing secondary structure and intra-/inter-primer interactions. Many techniques have been described for highly multiplexed primer optimization for MPS applications. In particular, these techniques are often referred to as ampliseq methods, as described in the art.
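The primer properties mentioned above (melting temperature and GC content) can be screened with a few lines of code. The following is a minimal sketch, not a method taken from this disclosure: it assumes the Wallace rule (Tm = 2(A+T) + 4(G+C), a rough approximation intended only for short oligonucleotides), and the function names are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule.

    Assumption: a rough estimate for short oligos only; real designs
    would use a nearest-neighbor model with salt correction.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def tm_spread(primers: list[str]) -> int:
    """Difference between the highest and lowest Tm in a primer pool."""
    tms = [wallace_tm(p) for p in primers]
    return max(tms) - min(tms)
```

A small `tm_spread` across a pool is one simple indicator that the primers may anneal under a shared cycling condition; it does not address secondary structure or primer-primer interactions.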

Amplification

In various embodiments, provided methods and compositions make use of, or are of use in, at least one amplification step in which a nucleic acid material (or a portion thereof, for example, a specific target region or locus) is amplified to form an amplified nucleic acid material (e.g., some number of amplicon products).

In some embodiments, amplifying the nucleic acid material includes a step of amplifying nucleic acid material derived from each of the first and second nucleic acid strands of the original double-stranded nucleic acid material using at least one single-stranded oligonucleotide that is at least partially complementary to a sequence present in the first adapter sequence, such that the SMI sequence is at least partially maintained. The amplification step further includes using a second single-stranded oligonucleotide to amplify each strand of interest; such a second single-stranded oligonucleotide can be (a) at least partially complementary to a target sequence of interest, or (b) at least partially complementary to a sequence present in the second adapter sequence, such that the at least one single-stranded oligonucleotide and the second single-stranded oligonucleotide are oriented in a manner to effectively amplify the nucleic acid material.

In some embodiments, amplifying nucleic acid material in a sample can include amplifying nucleic acid material in "tubes" (e.g., PCR tubes), in emulsion droplets, in microchambers, in the other examples described above, or in other known vessels. In some embodiments, amplifying nucleic acid material can include amplifying it in two or more physically separated samples (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, or more samples held in tubes, droplets, chambers, vessels, etc.). For example, an initial sample may be split into multiple vessels prior to the amplification step. In some embodiments, each sample comprises substantially the same amount of amplified nucleic acid material as every other sample; in some embodiments, at least two samples comprise substantially different amounts of amplified nucleic acid material.

In some embodiments, at least one amplification step includes at least one primer that is or comprises at least one non-standard nucleotide. In some embodiments, a non-standard nucleotide is selected from a uracil, a methylated nucleotide, an RNA nucleotide, a ribose nucleotide, an 8-oxoguanine, a biotinylated nucleotide, a locked nucleic acid, a peptide nucleic acid, a high-Tm nucleic acid variant, an allele-discriminating nucleic acid variant, any other nucleotide or linker variant described elsewhere herein, and any combination thereof.

While any application-appropriate amplification reaction is contemplated as compatible with some embodiments, by way of specific example, in some embodiments an amplification step may be or comprise a polymerase chain reaction (PCR), rolling circle amplification (RCA), multiple displacement amplification (MDA), isothermal amplification, polony amplification within an emulsion, bridge amplification on a surface, on the surface of a bead, or within a hydrogel, and any combination thereof.

In some embodiments, amplifying a nucleic acid material includes use of single-stranded oligonucleotides that are at least partially complementary to regions of the adapter sequences on the 5' and 3' ends of each strand of the nucleic acid material. In some embodiments, amplifying a nucleic acid material includes use of at least one single-stranded oligonucleotide that is at least partially complementary to a target region or target sequence of interest (e.g., a genomic sequence, a mitochondrial sequence, a plasmid sequence, a synthetically produced target nucleic acid, etc.) and a single-stranded oligonucleotide that is at least partially complementary to a region of the adapter sequence (e.g., a primer site).

In general, robust amplification, for example PCR amplification, can be highly dependent on the reaction conditions. Multiplex PCR, for example, can be sensitive to buffer composition, monovalent or divalent cation concentration, detergent concentration, crowding agent (e.g., PEG, glycerol) concentration, primer concentrations, primer Tm values, primer designs, primer GC content, the properties of modified nucleotides in the primers, and cycling conditions (i.e., temperatures, extension times, and rates of temperature change). Optimization of buffer conditions can be a difficult and time-consuming process. In some embodiments, an amplification reaction may use at least one of a buffer, primer pool concentrations, and PCR conditions in accordance with a previously known amplification protocol. In some embodiments, a new amplification protocol may be created, and/or an amplification reaction optimization may be used. As a specific example, in some embodiments, a PCR optimization kit may be used, such as a PCR Optimization Kit from [Figure BDA0002682281560000651]®, which contains a number of pre-formulated buffers that are partially optimized for a variety of PCR applications, such as multiplex, real-time, GC-rich, and inhibitor-resistant amplifications. These pre-formulated buffers can be rapidly supplemented with different Mg2+ and primer concentrations, as well as primer pool ratios. In addition, in some embodiments, a variety of cycling conditions (e.g., thermal cycling) may be assessed and/or used. In assessing whether a particular embodiment is appropriate for a particular desired application, one or more of specificity, allele coverage ratio for heterozygous loci, interlocus balance, and depth, among other aspects, may be assessed. Measurements of amplification success may include DNA sequencing of the products; evaluation of the products by gel or capillary electrophoresis, HPLC, or other size separation methods followed by fragment visualization; melt curve analysis using double-stranded nucleic acid binding dyes or fluorescent probes; mass spectrometry; or other methods known in the art.

In accordance with various embodiments, any of a variety of factors may influence the length of a particular amplification step (e.g., the number of cycles in a PCR reaction). For example, in some embodiments, a provided nucleic acid material may be compromised or otherwise suboptimal (e.g., degraded and/or contaminated). In such cases, a longer amplification step may help ensure that a desired product is amplified to an acceptable degree. In some embodiments, an amplification step may provide an average of 3 to 10 sequenced PCR copies from each starting DNA molecule, though in other embodiments only a single copy of each of the first strand and the second strand is required. Without wishing to be held to a particular theory, too many or too few PCR copies could result in reduced assay efficiency and, ultimately, reduced depth. Generally, the number of nucleic acid (e.g., DNA) fragments used in an amplification (e.g., PCR) reaction is a primary adjustable variable that can dictate the number of reads that share the same SMI/barcode sequence.
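The relationship between the number of input fragments and the number of reads sharing an SMI can be illustrated with simple arithmetic. The sketch below is an idealized estimate under stated assumptions, not a formula from this disclosure: it assumes every read derives from a tagged strand, that reads are spread evenly across strands, and that each input molecule contributes two strands. The function names and the 3 to 10 default window (taken from the range mentioned above) are illustrative.

```python
def mean_family_size(total_reads: int, input_molecules: int,
                     strands_per_molecule: int = 2) -> float:
    """Average number of reads per single-strand SMI family.

    Assumption: uniform sampling with no losses; real libraries show a
    skewed family-size distribution.
    """
    if input_molecules <= 0:
        raise ValueError("input_molecules must be positive")
    return total_reads / (input_molecules * strands_per_molecule)


def in_target_range(size: float, low: float = 3.0, high: float = 10.0) -> bool:
    """True if the mean family size falls in the desired window."""
    return low <= size <= high
```

For example, 12 million reads over 1 million input molecules gives a mean of 6 reads per strand family, whereas 1 million reads over the same input gives 0.5, which suggests reducing the input or sequencing more deeply.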

Nucleic Acid Material

Types

In accordance with various embodiments, any of a variety of nucleic acid materials may be used. In some embodiments, a nucleic acid material may comprise at least one modification to a polynucleotide within the canonical sugar-phosphate backbone. In some embodiments, a nucleic acid material may comprise at least one modification within any base in the nucleic acid material. By way of non-limiting example, in some embodiments the nucleic acid material is or comprises at least one of double-stranded DNA, single-stranded DNA, double-stranded RNA, single-stranded RNA, peptide nucleic acids (PNAs), and locked nucleic acids (LNAs).

Sources

It is contemplated that nucleic acid material may come from any of a variety of sources. For example, in some embodiments, nucleic acid material is provided from a sample from at least one subject (e.g., a human or animal subject) or other biological source. In some embodiments, a nucleic acid material is provided from a banked/stored sample. In some embodiments, a sample is or comprises at least one of blood, serum, sweat, saliva, cerebrospinal fluid, mucus, uterine lavage fluid, a vaginal swab, a nasal swab, an oral swab, a tissue scraping, hair, a fingerprint, urine, stool, vitreous humor, peritoneal wash, sputum, bronchial lavage, oral lavage, pleural lavage, gastric lavage, gastric juice, bile, pancreatic duct lavage, bile duct lavage, common bile duct lavage, gall bladder fluid, synovial fluid, an infected wound, a non-infected wound, an archaeological sample, a forensic sample, a water sample, a tissue sample, a food sample, a bioreactor sample, a plant sample, a fingernail scraping, semen, prostatic fluid, fallopian tube lavage, cell-free nucleic acid, nucleic acid within a cell, a metagenomics sample, a lavage of an implanted foreign body, a nasal lavage, intestinal fluid, an epithelial brushing, an epithelial lavage, a tissue biopsy, an autopsy sample, a necropsy sample, an organ sample, a human identification sample, an artificially produced nucleic acid sample, a synthetic gene sample, a nucleic acid data storage sample, tumor tissue, and any combination thereof. In other embodiments, a sample is or comprises at least one of a microorganism, a plant-based organism, or any collected environmental sample (e.g., water, soil, archaeological, etc.).

Modifications

In accordance with various embodiments, nucleic acid material may receive one or more modifications prior to, substantially simultaneously with, or subsequent to any particular step, depending upon the application for which a particular provided method or composition is used.

In some embodiments, a modification may be or comprise repair of at least a portion of the nucleic acid material. While any application-appropriate manner of nucleic acid repair is contemplated as compatible with some embodiments, certain exemplary methods and compositions are described below and in the Examples.

By way of non-limiting example, in some embodiments, DNA repair enzymes such as uracil-DNA glycosylase (UDG), formamidopyrimidine DNA glycosylase (FPG), and 8-oxoguanine DNA glycosylase (OGG1) can be utilized to correct DNA damage (e.g., in vitro DNA damage). As discussed above, these DNA repair enzymes are, for example, glycosylases that remove damaged bases from DNA. For example, UDG removes uracil that results from cytosine deamination (caused by spontaneous hydrolysis of cytosine), and FPG removes 8-oxoguanine (e.g., the most common DNA lesion that results from reactive oxygen species). FPG also has lyase activity that can generate a 1-base gap at abasic sites. Such abasic sites will subsequently fail to amplify by PCR, for example, because the polymerase fails to copy the template. Accordingly, the use of such DNA damage repair enzymes can effectively remove damaged DNA that carries no true mutation but that might otherwise go undetected as an error following sequencing and duplex sequence analysis.

As discussed above, in further embodiments, sequencing reads generated from the processing steps discussed herein can be further filtered to eliminate false mutations by trimming the ends of the reads, which are the most prone to artifacts. For example, DNA fragmentation can generate single-stranded portions at the terminal ends of double-stranded molecules. These single-stranded portions can be filled in (e.g., by Klenow) during end repair. In some instances, polymerases make copy mistakes in these end-repaired regions, leading to the generation of "pseudo-duplex molecules." Once sequenced, these artifacts can appear to be true mutations. These errors, which result from the end-repair mechanism, can be eliminated from post-sequencing analysis by trimming the ends of the sequencing reads to exclude any mutations that may have occurred there, thereby reducing the number of false mutations. In some embodiments, such trimming of sequencing reads can be accomplished automatically (e.g., as a normal process step). In some embodiments, a mutation frequency can be assessed for the fragment end regions, and if a threshold level of mutations is observed in the fragment end regions, sequencing read trimming can be performed before generating a double-strand consensus sequence read of the DNA fragments.
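The threshold-based end trimming described above can be sketched as follows. This is a simplified, per-read illustration under stated assumptions, not the pipeline of this disclosure: it assumes the read is already aligned to a reference of equal length with no indels, and the function names, `window`, and `threshold` parameters are hypothetical. In practice the end-region mutation frequency would likely be assessed across many fragments rather than one read.

```python
def end_mismatch_fraction(read: str, reference: str, window: int) -> float:
    """Fraction of mismatching bases within `window` bases of either end."""
    if len(read) != len(reference):
        raise ValueError("read and reference must have equal length")
    window = min(window, len(read) // 2)
    if window == 0:
        return 0.0
    # Positions in the first and last `window` bases of the read.
    positions = list(range(window)) + list(range(len(read) - window, len(read)))
    mismatches = sum(read[i] != reference[i] for i in positions)
    return mismatches / len(positions)


def trim_if_needed(read: str, reference: str, window: int,
                   threshold: float) -> str:
    """Trim `window` bases from both ends if end mismatches exceed threshold."""
    if end_mismatch_fraction(read, reference, window) > threshold:
        w = min(window, len(read) // 2)
        return read[w:len(read) - w]
    return read
```

Trimming a fixed window from both ends trades a small loss of coverage for removal of the region where fill-in errors from end repair are concentrated.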

Some embodiments of DS methods provide PCR-based targeted enrichment strategies compatible with the use of molecular barcodes for error correction. For example, the method steps of the sequencing enrichment strategy that uses separated PCRs of linked templates for sequencing ("SPLiT-DS") may also benefit from pre-enriched nucleic acid material prepared using one or more of the embodiments described herein. SPLiT-DS was originally described in International Patent Publication No. WO/2018/175997, which is incorporated herein by reference in its entirety. A SPLiT-DS approach can begin with labelling (e.g., tagging) fragmented double-stranded nucleic acid material (e.g., from a DNA sample) with molecular barcodes, in a manner similar to that described above and with reference to a standard DS library construction protocol. In some embodiments, the double-stranded nucleic acid material may already be fragmented (e.g., as with cell-free DNA, damaged DNA, etc.); in other embodiments, however, various steps can include fragmenting the nucleic acid material using mechanical shearing such as sonication, or other DNA cutting methods such as those described further herein. Aspects of labelling the fragmented double-stranded nucleic acid material can include end repair and 3'-dA-tailing, if required in a particular application, followed by ligation of the double-stranded nucleic acid fragments with DS adapters containing an SMI. In other embodiments, the SMI can be an endogenous sequence, or a combination of exogenous and endogenous sequences, for uniquely relating information from both strands of an original nucleic acid molecule. Following ligation of adapter molecules to the double-stranded nucleic acid material, the method can continue with amplification (e.g., PCR amplification, rolling circle amplification, multiple displacement amplification, isothermal amplification, bridge amplification, surface-bound amplification, etc.).

In certain embodiments, primers specific to, for example, one or more adapter sequences can be used to amplify each strand of the nucleic acid material, resulting in multiple copies of nucleic acid amplicons derived from each strand of an original double-stranded nucleic acid molecule, with each amplicon retaining the originally associated SMI. Following the steps of amplification and removal of reaction byproducts, the sample can be split (preferably, but not necessarily, substantially evenly) into two or more separate samples (e.g., in tubes, in emulsion droplets, in microchambers, in isolated droplets on a surface, or in other known vessels, collectively referred to as "tubes"). Following separation, and in accordance with one embodiment of a SPLiT-DS procedure, the method can include amplifying the first strand in a first sample using a primer specific to a first adapter sequence to provide a first nucleic acid product, and amplifying the second strand in a second sample using a primer specific to a second adapter sequence to provide a second nucleic acid product. Next, the method can include sequencing each of the first nucleic acid product and the second nucleic acid product, and comparing the sequence of the first nucleic acid product to the sequence of the second nucleic acid product. In some embodiments, the nucleic acid material comprises an adapter sequence on each of the 5' and 3' ends of each strand of the nucleic acid material. In certain applications, amplification of the individual strands in the separated samples can be accomplished using a single-stranded oligonucleotide that is at least partially complementary to a target sequence of interest, such that the single molecule identifier sequence is at least partially maintained.
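The comparison of first-strand and second-strand products that share an SMI can be sketched as follows. This is a minimal illustration under stated assumptions rather than the method of this disclosure: reads are assumed to be pre-grouped by SMI, of equal length, and already oriented so that both strands read in the same direction; the strand labels, function names, and the majority-vote rule are hypothetical choices.

```python
from collections import Counter, defaultdict


def consensus(seqs: list[str]) -> str:
    """Per-position strict-majority consensus; 'N' where no base wins."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count > len(column) / 2 else "N")
    return "".join(out)


def duplex_consensus(reads) -> dict[str, str]:
    """Build a duplex consensus per SMI from (smi, strand, seq) tuples.

    `strand` is "first" or "second". Families missing either strand are
    skipped; positions where the two strand consensuses disagree become 'N'.
    """
    families = defaultdict(lambda: {"first": [], "second": []})
    for smi, strand, seq in reads:
        families[smi][strand].append(seq)
    result = {}
    for smi, fam in families.items():
        if not fam["first"] or not fam["second"]:
            continue  # both strands are needed to form a duplex call
        a = consensus(fam["first"])
        b = consensus(fam["second"])
        result[smi] = "".join(x if x == y else "N" for x, y in zip(a, b))
    return result
```

Requiring agreement between the two strand consensuses is what allows an error present on only one strand (for example, a PCR or sequencing error) to be masked rather than reported as a mutation.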

Selected Examples of Applications

As described herein, provided methods and compositions may be used for any of a variety of purposes and/or in any of a variety of scenarios. Non-limiting examples of applications and/or scenarios are described below for the purpose of specific illustration only.

Monitoring Response to Therapy (Tumor Mutations, etc.)

The advent of next-generation sequencing (NGS) in genomic research has enabled the characterization of the mutational landscape of tumors in unprecedented detail and has resulted in the cataloguing of diagnostic, prognostic, and clinically actionable mutations. Collectively, these mutations hold great promise for improving cancer outcomes through personalized medicine, as well as for potential early cancer detection and screening. Prior to the present disclosure, a critical limitation in the field was the inability to detect these mutations when they are present at low frequency. Clinical biopsies are often composed mostly of normal cells, and detecting cancer cells based on their DNA mutations is a technological challenge even for modern NGS. Identifying tumor mutations amongst thousands of normal genomes is analogous to finding a needle in a haystack, and requires a level of sequencing accuracy beyond that of previously known methods.

In general, this problem is exacerbated in the case of liquid biopsies, where the challenge is not only to provide the extreme sensitivity required to find tumor mutations, but also to do so with the minimal amounts of DNA typically present in these biopsies. The term "liquid biopsy" typically refers to the ability of blood to inform on cancer based on the presence of circulating tumor DNA (ctDNA). ctDNA is released by cells into the bloodstream and has shown great promise for monitoring, detecting, and predicting cancer, as well as for tumor genotyping and therapy selection. These applications could revolutionize the current management of patients with cancer; however, progress has been slower than previously anticipated. The main issue is that ctDNA typically represents only a very small fraction of all the cell-free DNA (cfDNA) present in plasma. In metastatic cancers its frequency may be >5%, but in localized cancers it is only between 1% and 0.001%. In theory, DNA subpopulations of any size can be detected by assaying a sufficient number of molecules. However, a fundamental limitation of previous methods is the high frequency with which bases are scored incorrectly. Errors commonly arise during cluster generation and sequencing cycles, and from poor cluster resolution and template degradation. As a result, approximately 0.1-1% of sequenced bases are called incorrectly. Further problems can arise from polymerase errors and amplification bias during PCR, which can skew populations or introduce false mutant allele frequencies (MAFs). Taken together, previously known techniques, including conventional NGS, cannot perform at the level required to detect low-frequency mutations.
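
The figures quoted above can be put side by side with a short Python calculation. This is a back-of-the-envelope sketch, not part of the disclosed method: the function names are hypothetical, molecules are assumed to be sampled independently, and every miscalled base at a position is counted as a potential false variant, which is a simplification.

```python
def prob_at_least_one_mutant(frequency, molecules):
    """Chance that at least one mutant molecule is among those sampled,
    assuming independent sampling at the given mutant frequency."""
    return 1.0 - (1.0 - frequency) ** molecules


def expected_calls(frequency, error_rate, depth):
    """Expected true mutant calls and expected erroneous calls at one position."""
    return frequency * depth, error_rate * depth


# Lower end of the localized-cancer range (0.001%) and the lower end of the
# stated error range (0.1%), at an assumed depth of 100,000 molecules.
p = prob_at_least_one_mutant(1e-5, 100_000)
true_calls, false_calls = expected_calls(1e-5, 1e-3, 100_000)
```

Under these assumptions, the depth yields roughly one genuine mutant observation (and only about a 63% chance of seeing any), against roughly one hundred erroneous calls at the same position. That is the sense in which the base-calling error rate, rather than the number of molecules assayed, limits detection of low-frequency mutations.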

Because of its high accuracy, DS, together with methods for increasing the conversion efficiency and workflow efficiency of these sequencing platforms, holds promise in the field of oncology. As described herein, the provided methods and compositions enable innovative approaches to DS that integrate the duplex molecular tagging of DS with target nucleic acid enrichment, improving efficiency and scalability while maintaining error correction.

In addition to requiring highly accurate and efficient assays, the realities of the clinical laboratory also demand assays that are fast, scalable, and reasonably cost-effective. Accordingly, various embodiments that improve the workflow efficiency of DS (e.g., enrichment strategies for DS) in accordance with aspects of the present technology are highly desirable. As described herein, digestion/size-selection enrichment and affinity-based enrichment of specific target sequences for DS applications provide high target specificity, performance at low DNA input, scalability, and minimal cost.
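
One way to picture digestion/size-selection enrichment is that a pair of targeted endonucleases cutting at known positions on either side of a target releases a fragment of predictable length, and fragments near that length can then be retained. The Python sketch below illustrates only that bookkeeping. The function names, the coordinates, and the `tolerance` parameter are hypothetical, and a real size selection (e.g., by gel or bead purification) is a physical step rather than a list filter.

```python
def expected_fragment_lengths(cut_pairs):
    """Map each target name to the distance between its two cut positions."""
    return {name: right - left for name, (left, right) in cut_pairs.items()}


def size_select(observed_lengths, expected, tolerance=10):
    """Keep fragment lengths within `tolerance` bases of any expected length."""
    targets = set(expected.values())
    return [n for n in observed_lengths
            if any(abs(n - t) <= tolerance for t in targets)]


# Invented cut coordinates and fragment lengths, for illustration only.
cut_pairs = {"target_A": (1000, 1500), "target_B": (7200, 7900)}
expected = expected_fragment_lengths(cut_pairs)
observed = [120, 498, 503, 700, 950, 2400]
selected = size_select(observed, expected)
```

Here the two hypothetical targets yield expected lengths of 500 and 700 bases, so only the observed fragments close to those lengths are kept.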

Some embodiments of the provided methods and compositions are especially significant for cancer research in general and for the field of ctDNA in particular, as the technology developed herein has the potential to identify cancer mutations with unprecedented sensitivity while minimizing DNA input, preparation time, and cost. The target nucleic acid enrichment embodiments disclosed herein can be used in clinical applications that may significantly increase survival through improved patient management and early cancer detection.

Patient Stratification

Patient stratification, which generally refers to the partitioning of patients based on one or more non-treatment-related factors, is a topic of considerable interest in the medical community. Much of this interest may be due to the fact that certain therapeutic candidates have failed to receive FDA approval, in part because of previously unrecognized differences among the patients in a trial. These differences may be or include one or more genetic differences that cause a therapeutic agent to be metabolized differently, or that cause side effects to occur or be exacerbated in one group of patients relative to one or more other groups. In some cases, some or all of these differences can be detected as one or more distinct genetic profiles in patients that result in a response to the therapeutic agent different from that of other patients who do not exhibit the same genetic profile.

Accordingly, in some embodiments, the provided methods and compositions can be used to determine which subject or subjects in a particular patient population (e.g., patients suffering from a common disease, disorder, or condition) may respond to a particular therapy. For example, in some embodiments, the provided methods and/or compositions can be used to assess whether a particular subject possesses a genotype associated with poor response to the therapy. In some embodiments, the provided methods and/or compositions can be used to assess whether a particular subject possesses a genotype associated with positive response to the therapy.

Forensics

Previous approaches to forensic DNA analysis have relied almost entirely on capillary electrophoretic separation of PCR amplicons to identify length polymorphisms in short tandem repeat (STR) sequences. Since its introduction in 1991, this type of analysis has proven extremely valuable. Since then, several publications have introduced standardized protocols, validated their use in laboratories worldwide, detailed its use in many different population groups, and introduced more efficient approaches, such as miniSTRs.

While this approach has proven extremely successful, the technology has a number of drawbacks that limit its utility. For example, current approaches to STR genotyping often give rise to background signal resulting from PCR stutter, which is caused by slippage of the polymerase on the template DNA. This issue is especially important in samples with more than one contributor, because it is difficult to distinguish stutter alleles from genuine alleles. Another issue arises when analyzing degraded DNA samples. Variation in fragment length often results in significantly lower, or even absent, signal from longer PCR fragments. As a consequence, profiles from degraded DNA often have lower power of discrimination.

The introduction of MPS systems has the potential to address several challenging problems in forensic analysis. For example, these platforms offer an unparalleled capacity for simultaneous analysis of STRs and SNPs in nuclear DNA and mtDNA, which would dramatically increase the power of discrimination between individuals and offers the possibility of determining ethnicity and even physical attributes. Furthermore, unlike PCR-CE, which simply reports the average genotype of an aggregate population of molecules, MPS technology digitally tabulates the full nucleotide sequence of many individual DNA molecules, offering a unique ability to detect MAFs within a heterogeneous DNA mixture. Because forensic specimens comprising two or more contributors remain one of the most problematic issues in forensics, the impact of MPS on the field could be enormous.

The publication of the human genome highlighted the immense power of MPS platforms. Until fairly recently, however, the full power of these platforms was of limited use to forensics because read lengths were significantly shorter than STR loci, precluding the ability to call length-based genotypes. Initially, pyrosequencers, such as the Roche 454 platform, were the only platforms with sufficient read length to sequence the core STR loci. However, read lengths in competing technologies have since increased, bringing their utility for forensic applications into play. A number of studies have revealed the potential of MPS genotyping of STR loci. In general, the overall outcome of all these studies, regardless of platform, is that STRs can be successfully typed, producing genotypes comparable to those from CE analysis, even from compromised forensic samples.

While all of these studies show concordance with traditional PCR-CE approaches, and even indicate additional benefits such as the detection of intra-STR SNPs, they also highlight a number of current issues with the technology. For example, current MPS approaches to STR genotyping rely on multiplex PCR both to provide enough DNA to sequence and to introduce PCR primers. However, because multiplex PCR kits were designed for PCR-CE, they contain primers for amplicons of various sizes. This variation results in coverage imbalance with a bias toward amplification of smaller fragments, which can result in allele drop-out. Indeed, recent studies have shown that differences in PCR efficiency can affect mixture components, especially at low MAFs. To address this issue, several sequencing kits designed specifically for forensics are now commercially available, and validation studies are beginning to be reported. Nonetheless, amplification bias remains evident owing to the high level of multiplexing.

Like PCR-CE, MPS is not immune to PCR stutter. The vast majority of MPS studies on STRs report the occurrence of artifactual drop-in alleles. Recently, a systematic MPS study reported that most stutter events appear as shorter length polymorphisms that differ from the true allele in four-base-pair units, the most common being n-4, with n-8 and n-12 positions also observed. Stutter typically occurs in about 1% of reads but can be as high as 3% at some loci, suggesting that MPS may exhibit stutter at a higher rate than PCR-CE.
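
The n-4, n-8, and n-12 pattern described above can be tallied from read lengths at a single locus. The following Python sketch is illustrative only: the function name `stutter_profile` and the toy read lengths are invented, a single known true allele length is assumed (as for a single-contributor, homozygous locus), and a four-base repeat unit is used as the default.

```python
from collections import Counter


def stutter_profile(read_lengths, true_length, repeat_unit=4):
    """Count reads matching the true allele and reads shorter by whole repeat units."""
    counts = Counter()
    for length in read_lengths:
        offset = length - true_length
        if offset == 0:
            counts["allele"] += 1
        elif offset < 0 and offset % repeat_unit == 0:
            counts[f"n{offset}"] += 1  # offset is negative, giving labels such as "n-4"
        else:
            counts["other"] += 1
    return counts


# Toy locus: 97 reads at the true length, two n-4 reads, and one n-8 read.
lengths = [40] * 97 + [36, 36, 32]
profile = stutter_profile(lengths, true_length=40)
stutter_fraction = (profile["n-4"] + profile["n-8"] + profile["n-12"]) / len(lengths)
```

In this invented data set, 3 of 100 reads fall at stutter positions, which corresponds to the upper figure quoted above for some loci.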

In contrast, in some embodiments, the provided methods and compositions allow high-quality and efficient sequencing of low-quality and/or low-quantity samples, as described above and in the Examples below. Accordingly, in some embodiments, the provided methods and/or compositions can be used to detect rare variants in the DNA of one individual that is mixed, at low abundance, with the DNA of another individual of a different genotype.

Forensic DNA samples commonly contain non-human DNA. Potential sources of this extraneous DNA are the source of the DNA itself (e.g., microbes in saliva or buccal samples), the surface environment from which the sample was collected, and contamination from the laboratory (e.g., reagents, work area, etc.). Another aspect provided by some embodiments is that certain provided methods and compositions allow contaminating nucleic acid material from other sources (e.g., different species) and/or from surface or environmental contaminants to be distinguished, so that these materials (and/or their effects) can be removed from the final analysis and do not bias the sequencing results.

In highly degraded DNA, locus-specific PCR may not work well because the DNA fragments do not contain the necessary primer annealing sites, resulting in allele drop-out. This situation limits the uniqueness of genotype calls and makes the confidence of a match less certain, especially in mixture testing. In some embodiments, however, the provided methods and compositions allow for the use of single nucleotide polymorphisms (SNPs) in addition to, or as an alternative to, STR markers.

Indeed, as data on human genetic variation continue to grow, SNPs are becoming increasingly relevant to forensic work. As such, in some embodiments, the provided methods and compositions use a primer design strategy that allows multiplex primer panels to be created, e.g., based on currently available sequencing kits, that virtually ensure that reads traverse one or more SNP positions.

Further Examples

1. A method for enriching target nucleic acid material, comprising:

providing nucleic acid material;

cleaving the nucleic acid material with one or more targeted endonucleases such that a target region of predetermined length is separated from the remainder of the nucleic acid material;

enzymatically destroying non-targeted nucleic acid material;

releasing the target region of predetermined length from the targeted endonuclease; and

analyzing the cleaved target region.

2. The method of example 1, wherein enzymatically destroying non-targeted nucleic acid material comprises providing an exonuclease.

3. The method of example 1, wherein enzymatically destroying non-targeted nucleic acid material comprises providing one or more of an exonuclease and an endonuclease.

4. The method of example 1, wherein destroying comprises at least one of enzymatic digestion and enzymatic cleavage.

5. The method of any one of examples 1-4, wherein the one or more targeted endonucleases remain bound to the target region during the enzymatic destruction step.

6. The method of any one of examples 1-5, wherein at least one targeted endonuclease is a ribonucleoprotein complex comprising a capture label, and wherein the target region of predetermined length is physically separated from the remainder of the nucleic acid by way of the capture label while the at least one targeted endonuclease remains bound to the target region.

7. The method of any one of examples 1-5, wherein at least one targeted endonuclease is a ribonucleoprotein complex comprising a capture label, and wherein the method further comprises capturing the target region with an extraction moiety configured to bind the capture label.

8. The method of example 6 or example 7, wherein the capture label is or comprises at least one of: acridine, azide, azide (NHS ester), digoxigenin (NHS ester), I-Linker, amino modifier C6, amino modifier C12, amino modifier C6 dT, Unilink amino modifier, hexynyl, 5-octadiynyl dU, biotin, biotin (azide), biotin dT, biotin TEG, dual biotin, PC biotin, desthiobiotin TEG, thiol modifier C3, dithiol, thiol modifier C6 S-S, and a succinyl group.

9. The method of example 7, wherein the extraction moiety is or comprises at least one of: amino silane, epoxy silane, isothiocyanate, aminophenyl silane, aminopropyl silane, mercapto silane, aldehyde, epoxide, phosphonate, streptavidin, avidin, hapten-recognizing antibodies, a particular nucleic acid sequence, magnetically attractable particles (Dynabeads), and a photolabile resin.

10. The method of example 7, wherein the extraction moiety is bound to a surface.

11. The method of example 7, wherein the target region is physically separated after the non-targeted nucleic acid material is enzymatically destroyed.

12. The method of any one of examples 1-11, wherein the one or more targeted endonucleases are selected from the group consisting of a ribonucleoprotein, a Cas enzyme, a Cas9-like enzyme, a Cpf1 enzyme, a meganuclease, a transcription activator-like effector-based nuclease (TALEN), a zinc-finger nuclease, an argonaute nuclease, or a combination thereof.

13. The method of any one of examples 1-12, wherein the one or more targeted endonucleases comprise Cas9 or Cpf1 or a derivative thereof.

14. The method of any one of examples 1-13, wherein cleaving the nucleic acid material comprises cleaving the nucleic acid material with one or more targeted endonucleases such that more than one target nucleic acid fragment of substantially known length is formed.

15. The method of example 14, further comprising separating the more than one target nucleic acid fragment based on the predetermined length.

16. The method of example 15, wherein the target nucleic acid fragments have different substantially known lengths.

17. The method of example 15, wherein the target nucleic acid fragments each comprise a genomic sequence of interest from one or more different locations in a genome.

18. The method of example 15, wherein the target nucleic acid fragments each comprise a targeted sequence from a substantially known region within the nucleic acid material.

19. The method of any one of examples 15-18, wherein separating the target nucleic acid fragments based on the substantially known length comprises enriching for the target nucleic acid fragments by gel electrophoresis, gel purification, liquid chromatography, size-exclusion purification, filtration, or SPRI bead purification.

20. The method of example 1, further comprising ligating at least one SMI and/or adaptor sequence to at least one of the 5' or 3' ends of the cleaved target region of predetermined length.

21. The method of example 1, wherein analyzing comprises quantitation and/or sequencing of the target region.

22. The method of example 21, wherein quantitation comprises at least one of spectrophotometric analysis, real-time PCR, and/or fluorescence-based quantitation.

23. The method of example 21, wherein sequencing comprises duplex sequencing, SPLiT-duplex sequencing, Sanger sequencing, shotgun sequencing, bridge amplification/sequencing, nanopore sequencing, single-molecule real-time sequencing, ion torrent sequencing, pyrosequencing, digital sequencing (e.g., digital barcode-based sequencing), direct digital sequencing, sequencing by ligation, polony-based sequencing, electrical current-based sequencing (e.g., tunneling currents), sequencing via mass spectrometry, microfluidics-based sequencing, or any combination thereof.

24. The method of example 21, wherein sequencing comprises:

sequencing a first strand of the target region to generate a first strand sequence read;

sequencing a second strand of the target region to generate a second strand sequence read; and

comparing the first strand sequence read to the second strand sequence read to generate an error-corrected sequence read.

25. The method of example 24, wherein the error-corrected sequence read comprises the nucleotide bases that agree between the first strand sequence read and the second strand sequence read.

26. The method of example 24 or example 25, wherein a variation occurring at a particular position in the error-corrected sequence read is identified as a true variant.

27. The method of any one of examples 24-26, wherein a variation occurring at a particular position in only one of the first strand sequence read or the second strand sequence read is identified as a potential artifact.

28. The method of any one of examples 24-27, wherein the error-corrected sequence read is used to identify or characterize a cancer, a cancer risk, a cancer mutation, a cancer metabolic state, a mutator phenotype, a carcinogen exposure, a toxin exposure, a chronic inflammation exposure, an age, a neurodegenerative disease, a pathogen, a drug-resistant variant, a fetal molecule, a forensically relevant molecule, an immunologically relevant molecule, a mutated T-cell receptor, a mutated B-cell receptor, a mutated immunoglobulin locus, a kataegis site in a genome, a hypermutable site in a genome, a low-frequency variant, a subclonal variant, a minority population of molecules, a source of contamination, a nucleic acid synthesis error, an enzymatic modification error, a chemical modification error, a gene editing error, a gene therapy error, a piece of nucleic acid information storage, a microbial quasispecies, a viral quasispecies, an organ transplant, an organ transplant rejection, a cancer relapse, residual cancer after treatment, a preneoplastic state, a dysplastic state, a microchimerism state, a stem cell transplant state, a cellular therapy state, a nucleic acid label affixed to another molecule, or a combination thereof, in an organism or subject from which the double-stranded target nucleic acid molecule is derived.

29. The method of any one of examples 24-27, wherein the error-corrected sequence read is used to identify a mutagenic compound or exposure.

30. The method of any one of examples 24-27, wherein the error-corrected sequence read is used to identify a carcinogenic compound or exposure.

31. The method of any one of examples 24-27, wherein the nucleic acid material is derived from a forensic sample, and wherein the error-corrected sequence read is used in a forensic analysis.

32. The method of example 1, wherein the targeted endonuclease comprises at least one of a CRISPR-associated (Cas) enzyme, a ribonucleoprotein complex, a homing endonuclease, a zinc-finger nuclease, a transcription activator-like effector nuclease (TALEN), an argonaute nuclease, and/or a megaTAL nuclease.

33. The method of example 32, wherein the CRISPR-associated (Cas) enzyme is Cas9 or Cpf1.

34. The method of example 32, wherein the CRISPR-associated (Cas) enzyme is Cpf1, and wherein the target region comprises a 5' overhang and a 3' overhang of predetermined or known nucleotide sequence.

35. The method of example 1, wherein cleaving the nucleic acid material with a targeted endonuclease comprises cleaving the nucleic acid material with more than one targeted endonuclease.

36. The method of example 35, wherein the more than one targeted endonuclease comprises more than one Cas enzyme directed to more than one target region.

37. The method of example 35, wherein cleaving the nucleic acid material with a targeted endonuclease such that a target region of predetermined length is separated from the remainder of the nucleic acid material comprises cleaving the target region with a pair of targeted endonucleases directed to cleave the nucleic acid material at a predetermined distance apart, so as to generate a target region having the predetermined length.

38. The method of example 37, wherein the pair of targeted endonucleases comprises a pair of Cas enzymes.

39. The method of example 38, wherein the pair of Cas enzymes comprises the same type of Cas enzyme.

40. The method of example 38, wherein the pair of Cas enzymes comprises two different types of Cas enzyme.

41. A method for enriching target nucleic acid material, comprising:

providing nucleic acid material;

cleaving the nucleic acid material with one or more targeted endonucleases such that a target region of predetermined length is separated from the remainder of the nucleic acid material, wherein at least one targeted endonuclease comprises a capture label;

capturing the target region of predetermined length with an extraction moiety configured to bind the capture label;

releasing the target region of predetermined length from the targeted endonuclease; and

analyzing the cleaved target region.

42. A method for enriching target nucleic acid material, comprising:

providing nucleic acid material;

binding a catalytically inactive CRISPR-associated (Cas) enzyme to a target region of the nucleic acid material;

treating the nucleic acid material with one or more nucleic acid digesting enzymes such that non-targeted nucleic acid material is destroyed and the target region is protected from the digesting enzymes by the bound catalytically inactive Cas enzyme;

releasing the target region from the catalytically inactive Cas enzyme; and

analyzing the target region.

43. The method of example 42, wherein the binding step comprises binding a pair of catalytically inactive Cas enzymes to the target region such that nucleic acid material between the bound Cas enzymes is protected from the digesting enzymes, thereby enriching the target nucleic acid material of the target region.

44. The method of example 42, wherein the catalytically inactive Cas enzyme comprises a capture label, and wherein the method further comprises capturing the target region with an extraction moiety configured to bind the capture label.

45. The method of example 42, further comprising enriching for the target region by size selection.

46. A method for enriching target nucleic acid material, comprising:

providing nucleic acid material;

providing a pair of catalytically active targeted endonucleases and at least one catalytically inactive targeted endonuclease comprising a capture label, wherein the catalytically inactive targeted endonuclease is directed to bind a target region of the nucleic acid material, and wherein the pair of catalytically active targeted endonucleases is directed to bind the target region on either side of the catalytically inactive targeted endonuclease;

cleaving the nucleic acid material with the pair of catalytically active targeted endonucleases such that the target region is separated from the remainder of the nucleic acid material;

capturing the target region with an extraction moiety configured to bind the capture label;

releasing the target region from the targeted endonucleases; and

analyzing the cleaved target region.

47.一种用于从包括多个核酸片段的样品中富集靶核酸材料的方法,包括:47. A method for enriching target nucleic acid material from a sample comprising a plurality of nucleic acid fragments, comprising:

向包括靶核酸片段和非靶核酸片段的样品提供具有捕获标记的一种或多种无催化活性的CRISPR相关的(Cas)酶,其中一种或多种无催化活性的Cas酶被配置为结合靶核酸片段;providing one or more catalytically inactive CRISPR-associated (Cas) enzymes with capture labels to a sample comprising target nucleic acid fragments and non-target nucleic acid fragments, wherein the one or more catalytically inactive Cas enzymes are configured to bind to target nucleic acid fragment;

提供包括提取部分的表面,所述提取部分被配置为结合捕获标记;和providing a surface comprising an extraction moiety configured to bind a capture label; and

通过经由由提取部分结合捕获标记来捕获靶核酸片段,将靶核酸片段与非靶核酸片段分离。The target nucleic acid fragments are separated from the non-target nucleic acid fragments by capturing the target nucleic acid fragments via binding of the capture label by the extraction moiety.

48. The method of example 47, further comprising ligating adaptor molecules to the ends of the plurality of nucleic acid fragments prior to providing the one or more catalytically inactive CRISPR-associated (Cas) enzymes.

49. A method for enriching target double-stranded nucleic acid material, comprising:

providing a nucleic acid material;

cleaving the nucleic acid material with one or more targeted endonucleases to generate double-stranded target nucleic acid fragments comprising a 5' sticky end having a 5' predetermined nucleotide sequence and/or a 3' sticky end having a 3' predetermined nucleotide sequence; and

separating the double-stranded target nucleic acid molecule from the remainder of the nucleic acid material by way of at least one of the 5' sticky end and the 3' sticky end.

50. The method of example 49, further comprising providing at least one sequencing adaptor molecule comprising a ligatable end that is at least partially complementary to the 5' predetermined nucleotide sequence or the 3' predetermined nucleotide sequence;

ligating the at least one sequencing adaptor molecule to the double-stranded target nucleic acid molecule; and

analyzing the double-stranded target nucleic acid fragments by sequencing.

51. The method of example 50, wherein the at least one adaptor molecule comprises a Y-shape or a U-shape.

52. The method of example 50, wherein the at least one adaptor molecule is a hairpin molecule.

53. The method of example 50, wherein the at least one adaptor molecule comprises a capture molecule configured to be bound by an extraction moiety.

54. The method of example 50, wherein a sequencing adaptor molecule is ligated to each of the 5' sticky end and the 3' sticky end of the double-stranded target nucleic acid fragments.

55. The method of example 49, wherein separating the double-stranded target nucleic acid molecule from the remainder of the nucleic acid material by way of at least one of the 5' sticky end and the 3' sticky end comprises providing an oligonucleotide having a sequence at least partially complementary to the 5' predetermined nucleotide sequence or the 3' predetermined nucleotide sequence.

56. The method of example 55, wherein the oligonucleotide is bound to a surface.

57. The method of example 55, wherein the oligonucleotide comprises a capture label configured to bind an extraction moiety.

58. The method of example 49, wherein the one or more targeted endonucleases comprise Cpf1.

59. The method of example 49, wherein the one or more targeted endonucleases comprise a Cas9 nickase.

60. A kit for enriching target nucleic acid material, comprising:

a nucleic acid library comprising:

nucleic acid material; and

a plurality of catalytically inactive Cas enzymes, wherein the Cas enzymes comprise a tag having a sequence code,

wherein the plurality of Cas enzymes are bound to a plurality of site-specific target regions along the nucleic acid material;

a plurality of probes, wherein each probe comprises:

an oligonucleotide sequence comprising a complement of a corresponding sequence code; and

a capture tag; and

a look-up table cataloguing the relationships between the site-specific target regions, the sequence codes associated with the site-specific target regions, and the probes comprising the complements of the corresponding sequence codes.

61. The method of any one of the preceding examples, wherein the nucleic acid material is or comprises at least one of double-stranded DNA and double-stranded RNA.

62. The method of any one of the preceding examples, wherein at least some of the nucleic acid material is damaged.

63. The method of example 62, wherein the damage is or comprises at least one of oxidation, alkylation, deamination, methylation, hydrolysis, hydroxylation, nicking, intra-strand crosslinks, inter-strand crosslinks, blunt-end strand breakage, staggered-end double-strand breakage, phosphorylation, dephosphorylation, sumoylation, glycosylation, deglycosylation, putrescinylation, carboxylation, halogenation, formylation, single-stranded gaps, damage from heat, damage from desiccation, damage from UV exposure, damage from gamma radiation, damage from X-radiation, damage from ionizing radiation, damage from non-ionizing radiation, damage from heavy particle radiation, damage from nuclear decay, damage from beta radiation, damage from alpha radiation, damage from neutron radiation, damage from proton radiation, damage from cosmic radiation, damage from high pH, damage from low pH, damage from reactive oxidative species, damage from free radicals, damage from peroxide, damage from hypochlorite, damage from tissue fixation such as with formalin or formaldehyde, damage from reactive iron, damage from low ionic conditions, damage from high ionic conditions, damage from unbuffered conditions, damage from nucleases, damage from environmental exposure, damage from fire, damage from mechanical stress, damage from enzymatic degradation, damage from microorganisms, damage from preparative mechanical shearing, damage from preparative enzymatic fragmentation, damage having naturally occurred in vivo, damage having occurred during nucleic acid extraction, damage having occurred during sequencing library preparation, damage having been introduced by a polymerase, damage having been introduced during nucleic acid repair, damage having occurred during nucleic acid end-tailing, damage having occurred during nucleic acid ligation, damage having occurred during sequencing, damage having occurred from mechanical handling of DNA, damage having occurred during passage through a nanopore, damage having occurred as part of aging in an organism, damage having occurred as a result of chemical exposure of an individual, damage having occurred by a mutagen, damage having occurred by a carcinogen, damage having occurred by a clastogen, damage having occurred from in vivo inflammation damage due to oxygen exposure, damage due to one or more strand breaks, and any combination thereof.

64. The method of any one of the preceding examples, wherein the nucleic acid material is provided from a sample comprising one or more double-stranded nucleic acid molecules derived from a subject or organism.

65. The method of example 64, wherein the sample is or comprises body tissue, biopsy samples, skin samples, blood, serum, plasma, sweat, saliva, cerebrospinal fluid, mucus, uterine lavage fluid, vaginal swabs, Pap smears, nasal swabs, oral swabs, tissue scrapings, hair, fingerprints, urine, stool, vitreous humor, peritoneal wash, sputum, bronchial lavage, oral lavage, pleural lavage, gastric lavage, gastric juice, bile, pancreatic duct lavage, bile duct lavage, common bile duct lavage, gallbladder fluid, synovial fluid, infected wounds, non-infected wounds, archaeological samples, forensic samples, water samples, tissue samples, food samples, bioreactor samples, plant samples, bacterial samples, protozoan samples, fungal samples, animal samples, viral samples, multi-organism samples, fingernail scrapings, semen, prostatic fluid, vaginal fluid, vaginal swabs, fallopian tube lavage, cell-free nucleic acids, nucleic acids within cells, metagenomic samples, lavages or swabs of implanted foreign bodies, nasal lavage, intestinal fluid, epithelial brushings, epithelial lavage, tissue biopsy samples, autopsy samples, necropsy samples, organ samples, human identification samples, non-human identification samples, artificially produced nucleic acid samples, synthetic gene samples, banked or stored samples, tumor tissue, fetal samples, organ transplant samples, microbial culture samples, nuclear DNA samples, mitochondrial DNA samples, chloroplast DNA samples, apicoplast DNA samples, organelle samples, and any combination thereof.

66. The method of any one of the preceding examples, wherein the nucleic acid material comprises nucleic acid molecules of substantially uniform length or near-uniform length.

67. The method of any one of the preceding examples, wherein the target nucleic acid material is derived from a subject or organism.

68. The method of any one of the preceding examples, wherein the target nucleic acid material has been at least partially artificially synthesized.

69. The method of any one of the preceding examples, wherein at most 1000 ng of nucleic acid material is initially provided.

70. The method of any one of the preceding examples, wherein at most 10 ng of nucleic acid material is initially provided.

71. The method of any one of the preceding examples, wherein the nucleic acid material comprises nucleic acid material from more than one source.
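
The look-up table recited in the kit of example 60 can be pictured as a small data structure. The following is a minimal sketch under stated assumptions, not the applicant's implementation: the region labels, the 8-nucleotide sequence codes, and the names `LookupEntry`, `build_lookup_table`, and `region_for_probe` are hypothetical, and each probe is assumed to be the exact reverse complement of its sequence code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class LookupEntry:
    target_region: str   # hypothetical label for a site-specific target region
    sequence_code: str   # code carried on the tag of the inactive Cas enzyme
    probe_sequence: str  # probe oligonucleotide, assumed to be the code's reverse complement


def build_lookup_table(codes_by_region: Dict[str, str]) -> Dict[str, LookupEntry]:
    """Index entries by sequence code so that every code maps to exactly one region."""
    table: Dict[str, LookupEntry] = {}
    for region, code in codes_by_region.items():
        code = code.upper()
        if code in table:
            raise ValueError(f"sequence code {code} is assigned to two regions")
        table[code] = LookupEntry(region, code, reverse_complement(code))
    return table


def region_for_probe(table: Dict[str, LookupEntry], probe_sequence: str) -> Optional[str]:
    """Return the target region a probe would capture, or None if no code matches."""
    entry = table.get(reverse_complement(probe_sequence))
    return entry.target_region if entry else None


# Illustrative (made-up) codes for two target regions.
table = build_lookup_table({"region_A": "AACCGGTA", "region_B": "GATTACAG"})
print(table["GATTACAG"].probe_sequence)     # probe designed against region_B's code
print(region_for_probe(table, "TACCGGTT"))  # region captured by this probe
```

Indexing by sequence code makes the uniqueness requirement explicit: a code shared by two target regions would make the captured fragments ambiguous, so the sketch rejects it when the table is built.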

Equivalents and Scope

The foregoing detailed description of embodiments of the present technology is not intended to be exhaustive or to limit the technology to the precise forms disclosed above. Although specific embodiments of, and examples for, the technology are described above for illustrative purposes, various equivalent modifications are possible within the scope of the technology, as those skilled in the relevant art will recognize. For example, although steps are presented in a given order, alternative embodiments may perform the steps in a different order. The various embodiments described herein may also be combined to provide further embodiments. All references cited herein are incorporated by reference as if fully set forth herein.

From the foregoing, it will be appreciated that specific embodiments of the technology have been described herein for purposes of illustration, but well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the technology. Where the context permits, singular or plural terms may also include the plural or singular term, respectively. Furthermore, while advantages associated with certain embodiments of the technology have been described in the context of those embodiments, other embodiments may also exhibit such advantages, and not all embodiments need exhibit such advantages to fall within the scope of the technology. Accordingly, the disclosure and associated technology can encompass other embodiments not expressly shown or described herein.

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the disclosed technology described herein. The scope of the present technology is not intended to be limited to the above description, but rather is as set forth in the appended claims.

Claims (60)

1. A method for enriching a target nucleic acid material, comprising:
providing a nucleic acid material;
cleaving the nucleic acid material with one or more targeted endonucleases such that a target region of a predetermined length is separated from the remainder of the nucleic acid material;
enzymatically disrupting non-targeted nucleic acid material;
releasing the target region of the predetermined length from the targeted endonuclease; and
analyzing the cleaved target region.
2. The method of claim 1, wherein enzymatically disrupting non-targeted nucleic acid material comprises providing an exonuclease.
3. The method of claim 1, wherein enzymatically disrupting non-targeted nucleic acid material comprises providing one or more of an exonuclease and an endonuclease.
4. The method of claim 1, wherein disrupting comprises at least one of enzymatic digestion and enzymatic cleavage.
5. The method of any one of claims 1-4, wherein the one or more targeted endonucleases remain bound to the target region during the enzymatic disruption step.
6. The method according to any one of claims 1-5, wherein at least one targeted endonuclease is a ribonucleoprotein complex comprising a capture label, and wherein the target region of predetermined length is physically separated from the remainder of the nucleic acid by the capture label while the at least one targeted endonuclease remains bound to the target region.
7. The method of any one of claims 1-5, wherein at least one targeted endonuclease is a ribonucleoprotein complex comprising a capture label, and wherein the method further comprises capturing the target region with an extraction moiety configured to bind the capture label.
8. The method of claim 6 or claim 7, wherein the capture label is or comprises at least one of: Acrydite, azide (NHS ester), digoxigenin (NHS ester), I-Linker, amino modifier C6, amino modifier C12, amino modifier C6 dT, Unilink amino modifier, hexynyl, 5-octadiynyl dU, biotin (azide), biotin dT, biotin TEG, dual biotin, PC biotin, desthiobiotin TEG, thiol modifier C3, dithiol, thiol modifier C6 S-S, and a succinyl group.
9. The method of claim 7, wherein the extraction moiety is or comprises at least one of an aminosilane, an epoxysilane, an isothiocyanate, an aminophenylsilane, an aminopropylsilane, a mercaptosilane, an aldehyde, an epoxide, a phosphonate, streptavidin, avidin, a hapten recognized by an antibody, a specific nucleic acid sequence, a magnetically attractable particle (e.g., Dynabeads), and a photolabile resin.
10. The method of claim 7, wherein the extraction moiety is bound to a surface.
11. The method of claim 7, wherein the target region is physically separated after enzymatically disrupting the non-targeted nucleic acid material.
12. The method of any one of claims 1-11, wherein the one or more targeted endonucleases are selected from the group consisting of ribonucleoproteins, Cas enzymes, Cas9-like enzymes, Cpf1 enzymes, meganucleases, transcription activator-like effector-based nucleases (TALENs), zinc finger nucleases, Argonaute nucleases, and combinations thereof.
13. The method of any one of claims 1-12, wherein the one or more targeted endonucleases comprise Cas9 or Cpf1 or derivatives thereof.
14. The method of any one of claims 1-13, wherein cleaving the nucleic acid material comprises cleaving the nucleic acid material with one or more targeted endonucleases such that more than one target nucleic acid fragment of substantially known length is formed.
15. The method of claim 14, further comprising isolating the more than one target nucleic acid fragments based on the predetermined length.
16. The method of claim 15, wherein the target nucleic acid fragments have different substantially known lengths.
17. The method of claim 15, wherein the target nucleic acid fragments each comprise related genomic sequences from one or more different locations in a genome.
18. The method of claim 15, wherein the target nucleic acid fragments each comprise a targeted sequence from a substantially known region within the nucleic acid material.
19. The method of any one of claims 15-18, wherein isolating the target nucleic acid fragments based on a substantially known length comprises enriching the target nucleic acid fragments by gel electrophoresis, gel purification, liquid chromatography, size exclusion purification, filtration, or SPRI bead purification.
20. The method of claim 1, further comprising ligating at least one SMI and/or adaptor sequence to at least one of the 5' or 3' ends of the cleaved target region of predetermined length.
21. The method of claim 1, wherein analyzing comprises quantification and/or sequencing of the target region.
22. The method of claim 21, wherein quantifying comprises at least one of spectrophotometric analysis, real-time PCR, and/or fluorescence-based quantification.
23. The method of claim 21, wherein sequencing comprises duplex sequencing, SPLiT-duplex sequencing, Sanger sequencing, shotgun sequencing, bridge amplification/sequencing, nanopore sequencing, single molecule real-time sequencing, Ion Torrent sequencing, pyrosequencing, digital sequencing (e.g., digital barcode-based sequencing), direct digital sequencing, sequencing by ligation, polony-based sequencing, current-based sequencing (e.g., tunneling current), sequencing by mass spectrometry, microfluidic-based sequencing, and any combination thereof.
24. The method of claim 21, wherein sequencing comprises:
sequencing a first strand of the target region to generate first strand sequence reads;
sequencing a second strand of the target region to generate second strand sequence reads; and
comparing the first strand sequence reads to the second strand sequence reads to generate error-corrected sequence reads.
25. The method of claim 24, wherein the error-corrected sequence reads comprise nucleotide bases that are identical between the first strand sequence reads and the second strand sequence reads.
26. The method of claim 24 or claim 25, wherein a variation occurring at a particular position in the error-corrected sequence reads is identified as a true variant.
27. The method of any one of claims 24-26, wherein variations that occur only at specific positions in one of the first strand sequence reads or the second strand sequence reads are identified as potential artifacts.
28. The method of any one of claims 24-27, wherein the error-corrected sequence reads are used to identify or characterize cancer, cancer risk, cancer mutation, cancer metabolic state, mutation phenotype, carcinogen exposure, toxin exposure, chronic inflammatory exposure, age, neurodegenerative disease, pathogen, drug-resistant variant, fetal molecule, forensic-related molecule, immunologically-related molecule, mutated T cell receptor, mutated B cell receptor, mutated immunoglobulin locus, kataegis site in genome, hypervariable site in genome, low frequency variant, subclonal variant, minority molecule population, contamination source, nucleic acid synthesis error, enzymatic modification error, chemical modification error, gene editing error, gene therapy error, nucleic acid information storage fragment, nucleic acid in an organism or subject from which a double-stranded target nucleic acid molecule is derived, a microbial quasispecies, a viral quasispecies, an organ transplant rejection, a cancer recurrence, a post-treatment residual cancer, a pre-neoplastic state, a dysplastic state, a micro-chimeric state, a stem cell transplant state, a cell therapy state, a nucleic acid marker attached to another molecule, or a combination thereof.
29. The method of any one of claims 24-27, wherein the error corrected sequence reads are used to identify mutagenic compounds or exposures.
30. The method of any one of claims 24-27, wherein the error corrected sequence reads are used to identify an oncogenic compound or exposure.
31. The method of any one of claims 24-27, wherein the nucleic acid material is from a forensic sample, and wherein the error corrected sequence reads are used for forensic analysis.
32. The method of claim 1, wherein the targeted endonuclease comprises at least one of a CRISPR-associated (Cas) enzyme, a ribonucleoprotein complex, a homing endonuclease, a zinc finger nuclease, a transcription activator-like effector nuclease (TALEN), an Argonaute nuclease, and/or a megaTAL nuclease.
33. The method of claim 32, wherein the CRISPR-associated (Cas) enzyme is Cas9 or Cpf1.
34. The method of claim 32, wherein the CRISPR-associated (Cas) enzyme is Cpf1, and wherein the target region comprises a 5' overhang and a 3' overhang of a predetermined or known nucleotide sequence.
35. The method of claim 1, wherein cleaving the nucleic acid material with a targeted endonuclease comprises cleaving the nucleic acid material with more than one targeted endonuclease.
36. The method of claim 35, wherein the more than one targeted endonuclease comprises more than one Cas enzyme for more than one target region.
37. The method of claim 35, wherein cleaving the nucleic acid material with a targeted endonuclease such that a target region of a predetermined length is separated from the remainder of the nucleic acid material comprises cleaving the target region with a pair of targeted endonucleases, the targeted endonucleases being oriented to cleave the nucleic acid material at a predetermined distance so as to generate the target region of a predetermined length.
38. The method of claim 37, wherein the pair of targeted endonucleases comprises a pair of Cas enzymes.
39. The method of claim 38, wherein the pair of Cas enzymes comprises the same type of Cas enzyme.
40. The method of claim 38, wherein the pair of Cas enzymes comprises two different types of Cas enzymes.
41. A method for enriching a target nucleic acid material, comprising:
providing a nucleic acid material;
cleaving the nucleic acid material with one or more targeted endonucleases such that a target region of a predetermined length is separated from the remainder of the nucleic acid material, wherein at least one targeted endonuclease includes a capture label;
capturing the target region of the predetermined length with an extraction moiety configured to bind the capture label;
releasing the target region of the predetermined length from the targeted endonuclease; and
analyzing the cleaved target region.
42. A method for enriching a target nucleic acid material, comprising:
providing a nucleic acid material;
binding a catalytically inactive CRISPR-associated (Cas) enzyme to a target region of the nucleic acid material;
enzymatically treating the nucleic acid material with one or more nucleic acid digesting enzymes such that non-targeted nucleic acid material is destroyed and the target region is protected from the digesting enzymes by a bound catalytically inactive Cas enzyme;
releasing the target region from the catalytically inactive Cas enzyme; and
analyzing the target region.
43. The method of claim 42, wherein the binding step comprises binding a pair of catalytically inactive Cas enzymes to the target region such that nucleic acid material between the bound Cas enzymes is enzymatically protected from the digestive enzymes, thereby enriching the target nucleic acid material of the target region.
44. The method of claim 42, wherein the catalytically inactive Cas enzyme comprises a capture label, and wherein the method further comprises capturing the target region with an extraction moiety configured to bind the capture label.
45. The method of claim 42, further comprising enriching the target region by size selection.
46. A method for enriching a target nucleic acid material, comprising:
providing a nucleic acid material;
providing a pair of catalytically active targeted endonucleases and at least one catalytically inactive targeted endonuclease comprising a capture label, wherein the catalytically inactive targeted endonuclease is directed to bind to a target region of the nucleic acid material, and wherein the pair of catalytically active targeted endonucleases are directed to bind to the target region on either side of the catalytically inactive targeted endonuclease;
cleaving the nucleic acid material with the pair of catalytically active targeted endonucleases such that the target region is separated from the remainder of the nucleic acid material;
capturing the target region with an extraction moiety configured to bind the capture label;
releasing the target region from the targeted endonuclease; and
analyzing the cleaved target region.
47. A method for enriching a target nucleic acid material from a sample comprising a plurality of nucleic acid fragments, comprising:
providing one or more catalytically inactive CRISPR-associated (Cas) enzymes with a capture label to a sample comprising a target nucleic acid fragment and a non-target nucleic acid fragment, wherein the one or more catalytically inactive Cas enzymes are configured to bind to the target nucleic acid fragment;
providing a surface comprising an extraction moiety configured to bind to the capture label; and
separating the target nucleic acid fragments from the non-target nucleic acid fragments by capturing the target nucleic acid fragments via binding of the capture label by the extraction moiety.
48. The method of claim 47, further comprising ligating an adaptor molecule to the ends of the plurality of nucleic acid fragments prior to providing the one or more catalytically inactive CRISPR-associated (Cas) enzymes.
49. A method for enriching a target double-stranded nucleic acid material, comprising:
providing a nucleic acid material;
cleaving the nucleic acid material with one or more targeted endonucleases to generate double-stranded target nucleic acid fragments comprising a 5' sticky end having a 5' predetermined nucleotide sequence and/or a 3' sticky end having a 3' predetermined nucleotide sequence; and
separating the double-stranded target nucleic acid molecule from the remainder of the nucleic acid material by way of at least one of the 5' sticky end and the 3' sticky end.
50. The method of claim 49, further comprising providing at least one sequencing adaptor molecule comprising a ligatable end at least partially complementary to said 5' predetermined nucleotide sequence or said 3' predetermined nucleotide sequence;
ligating said at least one sequencing adaptor molecule to said double-stranded target nucleic acid molecule; and
analyzing the double-stranded target nucleic acid fragments by sequencing.
51. The method of claim 50, wherein the at least one adaptor molecule comprises a Y-shape or a U-shape.
52. The method of claim 50, wherein the at least one adaptor molecule is a hairpin molecule.
53. The method of claim 50, wherein the at least one adaptor molecule comprises a capture molecule configured to be bound by an extraction moiety.
54. The method of claim 50, wherein a sequencing adaptor molecule is ligated to each of the 5' sticky end and the 3' sticky end of the double-stranded target nucleic acid fragments.
55. The method of claim 49, wherein separating the double-stranded target nucleic acid molecule from the remainder of the nucleic acid material by way of at least one of the 5' sticky end and the 3' sticky end comprises providing an oligonucleotide having a sequence at least partially complementary to the 5' predetermined nucleotide sequence or the 3' predetermined nucleotide sequence.
56. The method of claim 55, wherein the oligonucleotide is bound to a surface.
57. The method of claim 55, wherein the oligonucleotide comprises a capture label configured to bind to an extraction moiety.
58. The method of claim 49, wherein the one or more targeted endonucleases comprise Cpf1.
59. The method of claim 49, wherein the one or more targeted endonucleases comprise a Cas9 nickase.
60. A kit for enriching a target nucleic acid material, comprising:
a nucleic acid library comprising:
a nucleic acid material; and
a plurality of catalytically inactive Cas enzymes, wherein the Cas enzymes comprise a tag having a sequence code,
wherein the plurality of Cas enzymes are bound to a plurality of site-specific target regions along the nucleic acid material;
a plurality of probes, wherein each probe comprises:
an oligonucleotide sequence comprising a complement of a corresponding sequence code; and
a capture tag; and
a look-up table cataloguing the relationships between the site-specific target regions, the sequence codes associated with the site-specific target regions, and the probes comprising the complements of the corresponding sequence codes.
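
The strand comparison recited in claims 24 to 27 can be illustrated with a short sketch. It is a simplified illustration under stated assumptions rather than the claimed method itself: both strand reads are assumed to be already aligned and oriented to the same reference strand, discordant positions are masked with `N` (one possible convention), and the function names `duplex_consensus` and `call_variants` are hypothetical.

```python
from typing import List, Tuple


def duplex_consensus(first_strand: str, second_strand: str) -> Tuple[str, List[int]]:
    """Compare two strand reads position by position.

    Returns the error-corrected read, which keeps only bases that are identical
    in both strands (claim 25) and masks every other position with 'N', together
    with the discordant positions, which are potential artifacts (claim 27).
    """
    if len(first_strand) != len(second_strand):
        raise ValueError("strand reads must be aligned to the same length")
    consensus = []
    discordant = []
    for pos, (a, b) in enumerate(zip(first_strand.upper(), second_strand.upper())):
        if a == b and a in "ACGT":
            consensus.append(a)
        else:
            consensus.append("N")
            discordant.append(pos)
    return "".join(consensus), discordant


def call_variants(consensus: str, reference: str) -> List[Tuple[int, str, str]]:
    """List (position, reference base, observed base) for every unmasked
    consensus base that differs from the reference (claim 26)."""
    return [
        (pos, ref, obs)
        for pos, (ref, obs) in enumerate(zip(reference.upper(), consensus))
        if obs != "N" and obs != ref
    ]


reference = "ACGTACGT"
first = "ACGTTCGA"   # carries A>T at position 4 and a one-strand-only change at position 7
second = "ACGTTCGT"  # carries A>T at position 4 only

consensus, artifacts = duplex_consensus(first, second)
print(consensus)                            # position 7 is masked
print(artifacts)                            # positions seen on one strand only
print(call_variants(consensus, reference))  # variants supported by both strands
```

In this toy input the change at position 4 appears on both strands and survives as a variant, while the change at position 7 appears on one strand only and is masked as a potential artifact.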
CN201980019408.4A 2018-03-15 2019-03-15 Methods and reagents for enriching nucleic acid material for sequencing applications and other nucleic acid material interrogation Pending CN111868255A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862643738P 2018-03-15 2018-03-15
US62/643,738 2018-03-15
PCT/US2019/022640 WO2019178577A1 (en) 2018-03-15 2019-03-15 Methods and reagents for enrichment of nucleic acid material for sequencing applications and other nucleic acid material interrogations

Publications (1)

Publication Number Publication Date
CN111868255A true CN111868255A (en) 2020-10-30

Family

ID=67908450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980019408.4A Pending CN111868255A (en) 2018-03-15 2019-03-15 Methods and reagents for enriching nucleic acid material for sequencing applications and other nucleic acid material interrogation

Country Status (9)

Country Link
US (1) US20210010065A1 (en)
EP (1) EP3765063A4 (en)
JP (2) JP2021515579A (en)
CN (1) CN111868255A (en)
AU (1) AU2019233918A1 (en)
CA (1) CA3093846A1 (en)
IL (1) IL277325A (en)
SG (1) SG11202008929WA (en)
WO (1) WO2019178577A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114752668A (en) * 2022-05-13 2022-07-15 深圳市优圣康生物科技有限公司 Anemia screening kit and method for CRISPR and CAS9 targeted capture of long DNA fragments
CN115927539A (en) * 2022-09-14 2023-04-07 首都体育学院 Target nucleic acid enrichment method and kit and application thereof
CN117448422A (en) * 2023-10-23 2024-01-26 复旦大学附属肿瘤医院 Method for enriching cfDNA in urine based on biotin double probes
CN118064548A (en) * 2024-03-11 2024-05-24 青岛大学 A method for preparing biotinylated 8-oxo-Gua nucleic acid

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10844428B2 (en) 2015-04-28 2020-11-24 Illumina, Inc. Error suppression in sequenced DNA fragments using redundant reads with unique molecular indices (UMIS)
EP3387152B1 (en) 2015-12-08 2022-01-26 Twinstrand Biosciences, Inc. Improved adapters, methods, and compositions for duplex sequencing
US10650312B2 (en) 2016-11-16 2020-05-12 Catalog Technologies, Inc. Nucleic acid-based data storage
AU2017363139B2 (en) 2016-11-16 2023-09-21 Catalog Technologies, Inc. Nucleic acid-based data storage
AU2018210188B2 (en) 2017-01-18 2023-11-09 Illumina, Inc. Methods and systems for generation and error-correction of unique molecular index sets with heterogeneous molecular lengths
WO2018204423A1 (en) 2017-05-01 2018-11-08 Illumina, Inc. Optimal index sequences for multiplex massively parallel sequencing
DK3622089T3 2017-05-08 2024-10-14 Illumina Inc Method for sequencing using universal short adapters for indexing polynucleotide samples
EP3638809A4 (en) * 2017-06-13 2021-03-10 Genetics Research, LLC, D/B/A ZS Genetics, Inc. Negative-positive enrichment for nucleic acid detection
US20180355437A1 (en) * 2017-06-13 2018-12-13 Genetics Research, Llc, D/B/A Zs Genetics, Inc. Plasma/serum target enrichment
US10081829B1 (en) * 2017-06-13 2018-09-25 Genetics Research, Llc Detection of targeted sequence regions
EP3638812A4 (en) * 2017-06-13 2021-04-28 Genetics Research, LLC, D/B/A ZS Genetics, Inc. Rare nucleic acid detection
US11447818B2 (en) 2017-09-15 2022-09-20 Illumina, Inc. Universal short adapters with variable length non-random unique molecular identifiers
AU2018366213B2 (en) 2017-11-08 2025-05-15 Twinstrand Biosciences, Inc. Reagents and adapters for nucleic acid sequencing and methods for making such reagents and adapters
JP7364604B2 (en) 2018-03-16 2023-10-18 カタログ テクノロジーズ, インコーポレイテッド Chemical methods for nucleic acid-based data storage
JP7497879B2 * 2018-05-16 2024-06-11 ツインストランド・バイオサイエンシズ・インコーポレイテッド Methods and Reagents for Analysing Nucleic Acid Mixtures and Mixed Cell Populations and Related Uses
KR20210029147A (en) 2018-05-16 2021-03-15 카탈로그 테크놀로지스, 인크. Compositions and methods for storing nucleic acid-based data
WO2020014693A1 (en) 2018-07-12 2020-01-16 Twinstrand Biosciences, Inc. Methods and reagents for characterizing genomic editing, clonal expansion, and associated applications
CA3120359A1 (en) * 2018-11-19 2020-05-28 The Regents Of The University Of California Methods for detecting and sequencing a target nucleic acid
US11610651B2 (en) 2019-05-09 2023-03-21 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
EP4041920A1 (en) 2019-10-11 2022-08-17 Catalog Technologies, Inc. Nucleic acid security and authentication
JP2023519782A (en) * 2020-01-17 2023-05-15 ジャンプコード ゲノミクス,インク. Methods of targeted sequencing
CN111424075B (en) * 2020-04-10 2021-01-15 西咸新区予果微码生物科技有限公司 Third-generation sequencing technology-based microorganism detection method and system
AU2021271639A1 (en) 2020-05-11 2022-12-08 Catalog Technologies, Inc. Programs and functions in DNA-based data storage
WO2022060707A1 (en) * 2020-09-15 2022-03-24 Rutgers, The State University Of New Jersey Systems for gene editing and methods of use thereof
CN117441026A (en) * 2021-04-06 2024-01-23 瑞普瑞德诊断有限责任公司 Methods and systems for analyzing complex genomic regions
GB202111195D0 (en) * 2021-08-03 2021-09-15 Cergentis B V Method for targeted sequencing
CN114672549A (en) * 2022-04-22 2022-06-28 厦门大学 Rett syndrome early auxiliary diagnosis kit
WO2024138484A1 (en) * 2022-12-29 2024-07-04 深圳华大生命科学研究院 Sequencing method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010148115A1 (en) * 2009-06-18 2010-12-23 The Penn State Research Foundation Methods, systems and kits for detecting protein-nucleic acid interactions
US20150044687A1 (en) * 2012-03-20 2015-02-12 University Of Washington Through Its Center For Commercialization Methods of lowering the error rate of massively parallel dna sequencing using duplex consensus sequencing
WO2015075056A1 (en) * 2013-11-19 2015-05-28 Thermo Fisher Scientific Baltics Uab Programmable enzymes for isolation of specific dna fragments
US20160208241A1 (en) * 2014-08-19 2016-07-21 Pacific Biosciences Of California, Inc. Compositions and methods for enrichment of nucleic acids
US20170107560A1 (en) * 2013-05-29 2017-04-20 Agilent Technologies, Inc. Nucleic acid enrichment using cas9

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG11201702066UA (en) * 2014-07-21 2017-04-27 Illumina Inc Polynucleotide enrichment using crispr-cas systems
EP3957745A1 (en) * 2014-12-20 2022-02-23 Arc Bio, LLC Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using crispr/cas system proteins
US20180355437A1 (en) * 2017-06-13 2018-12-13 Genetics Research, Llc, D/B/A Zs Genetics, Inc. Plasma/serum target enrichment

Also Published As

Publication number Publication date
IL277325A (en) 2020-10-29
AU2019233918A1 (en) 2020-10-15
JP2025060959A (en) 2025-04-10
SG11202008929WA (en) 2020-10-29
WO2019178577A1 (en) 2019-09-19
EP3765063A1 (en) 2021-01-20
JP2021515579A (en) 2021-06-24
US20210010065A1 (en) 2021-01-14
CA3093846A1 (en) 2019-09-19
EP3765063A4 (en) 2021-12-15

Similar Documents

Publication Publication Date Title
CN111868255A (en) Methods and reagents for enriching nucleic acid material for sequencing applications and other nucleic acid material interrogation
US12006532B2 (en) Methods for targeted nucleic acid sequence enrichment with applications to error corrected nucleic acid sequencing
CN113661249B (en) Compositions and methods for isolating cell-free DNA
AU2017386533A1 (en) Analysis system for orthogonal access to and tagging of biomolecules in cellular compartments
US20220220543A1 (en) Methods and reagents for nucleic acid sequencing and associated applications
CN114555802A (en) Single cell analysis
US20230235393A1 (en) Methods of enriching for target nucleic acid molecules and uses thereof
US20240327900A1 (en) Bodily fluid target enrichment
US20230095295A1 (en) Phi29 mutants and use thereof
CN109072296A (en) Methods for direct target sequencing using nuclease protection
US12270125B2 (en) System and method for modular and combinatorial nucleic acid sample preparation for sequencing
HK40039255A (en) Methods and reagents for enrichment of nucleic acid material for sequencing applications and other nucleic acid material interrogations
HK40065550A (en) Methods and reagents for nucleic acid sequencing and associated applications
HK40087991B (en) Methods for targeted nucleic acid sequence enrichment with applications to error corrected nucleic acid sequencing
HK40087991A (en) Methods for targeted nucleic acid sequence enrichment with applications to error corrected nucleic acid sequencing
WO2024158720A2 (en) Fine needle aspiration methods
JP2022539630A (en) Preparation of nucleic acid libraries using electrophoresis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40039255; Country of ref document: HK)