IP-2724-PCT/531.2724WO01 METHODS FOR INCREASING SEQUENCING QUALITY OF GC-RICH REGIONS
[0001] CROSS-REFERENCE TO RELATED APPLICATIONS
[0002] This application claims the benefit of U.S. Provisional Application Serial No. 63/661,111, filed June 18, 2024, which is incorporated by reference herein in its entirety.
[0003] SEQUENCE LISTING
[0004] This application contains a Sequence Listing electronically submitted via Patent Center to the United States Patent and Trademark Office as an XML file entitled “IP-2724-PCT-5312724WO01.xml” having a size of 29,588 bytes and created on June 17, 2025. The information contained in the Sequence Listing is incorporated by reference herein.
[0005] FIELD
[0006] The present disclosure is concerned with reducing secondary structure during sequencing to improve the quality of the resulting data. In particular, the present disclosure includes methods for using dGTP analogs during the production of clusters, during strand resynthesis, or both.
[0007] BACKGROUND
[0008] GC-rich regions of the genome are often underrepresented in sequencing results (Tilak et al., Genome Biol Evol. 2018 Feb 1;10(2):616-622. doi: 10.1093/gbe/evy022). GC-rich regions can include homopolymers of G or C and DNA secondary structures such as G-quadruplexes. G homopolymers, C homopolymers, and G-quadruplexes can lead to systematic errors, or sequence-specific errors (SSEs), in sequencing-by-synthesis (SBS) methodologies. SSEs reduce the sequencing quality of those regions and cannot be minimized by increasing sequencing depth. Although homopolymers constitute the bulk of SSEs, G-quadruplexes account for about 5-14% of SSEs, some of which occur in clinically relevant regions and thus affect the ability of clinicians to identify important mutations and make critical genome-based decisions affecting the health of patients.
[0009] SUMMARY OF THE APPLICATION
[0010] Genomic DNA can include GC-rich regions that are challenging to sequence. The challenges may be largely attributed to the effects that G homopolymers, C homopolymers, and secondary structure have on library preparation, to inefficient seeding of GC-rich templates, to the inability to cluster efficiently through GC-rich regions, and to the inability of DNA polymerase to sequence through GC-rich regions. Provided herein are methods, compositions, arrays, cartridges, and kits that reduce secondary structure in GC-rich regions, reduce G-quadruplex formation and stability, and reduce bias against the representation of GC-rich regions in sequence data.
[0011] The present disclosure provides methods. In one embodiment, a method can include providing an amplification reagent including (i) an array of amplification sites, (ii) a composition including a plurality of modified target nucleic acids, (iii) a composition including nucleotide triphosphates (NTPs), where the NTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog, and (iv) a composition including a polymerase. The method can include reacting the amplification reagent to produce a plurality of populated amplification sites, where the plurality of populated amplification sites each include a clonal population of amplicons from an individual modified target nucleic acid from the plurality of modified target nucleic acids.
[0012] In one embodiment, a method can include providing an amplification reagent including (i) an array of amplification sites, where each amplification site includes a capture sequence and a single-stranded modified target nucleic acid immobilized thereto; (ii) a composition including nucleotide triphosphates (NTPs), where the NTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog; and (iii) a composition including a polymerase. The method can further include reacting the amplification reagent to produce a plurality of amplification sites that each include a clonal population of amplicons from the single-stranded modified target nucleic acid immobilized thereto in (i).
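The copying step in [0011] and [0012] can be illustrated with a toy model. The Python sketch below is an illustration only, not the method of this disclosure: it assumes the polymerase chooses the dGTP analog over natural dGTP with a probability equal to the analog's share of the G pool (that is, equal incorporation efficiency), and it uses the hypothetical symbol "Z" for an incorporated analog such as 7-deaza-dG. The helper names copy_strand and as_natural are likewise hypothetical. Because the analog pairs with C, it appears in a new strand only opposite a C in the template, while base calling reads it as G.

```python
import random

# Base pairing used when copying a template. "Z" is a hypothetical symbol for an
# incorporated dGTP analog (for example 7-deaza-dG); it pairs with C like G does.
# "N" (unknown base) is copied as "N".
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C", "Z": "C", "N": "N"}


def copy_strand(template: str, analog_fraction: float, rng: random.Random) -> str:
    """Return the complementary copy, 5'->3', of a template written 5'->3'.

    Each G in the new strand is written as "Z" with probability analog_fraction,
    which assumes the polymerase incorporates the analog and dGTP equally well.
    """
    if not 0.0 <= analog_fraction <= 1.0:
        raise ValueError("analog_fraction must be between 0 and 1")
    copy = []
    # Reading the template 3'->5' emits the new strand in 5'->3' order.
    for base in reversed(template.upper()):
        partner = PAIRS[base]
        if partner == "G" and rng.random() < analog_fraction:
            partner = "Z"
        copy.append(partner)
    return "".join(copy)


def as_natural(seq: str) -> str:
    """Read analog positions as G, as base calling would report them."""
    return seq.replace("Z", "G")


if __name__ == "__main__":
    rng = random.Random(0)
    template = "ACCCGGGTTACCC"
    first = copy_strand(template, 0.25, rng)   # first amplicon: the complement
    second = copy_strand(first, 0.25, rng)     # next copy: same sequence as template
    print(as_natural(first))                   # GGGTAACCCGGGT
    print(as_natural(second) == template)      # True
```

Under this model the analog is scattered randomly over the G positions of each newly synthesized strand, yet every copy still reads back as the expected complementary or identical sequence.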
[0013] In one embodiment, a method can include providing an array including a plurality of amplification sites, where the amplification sites include two populations of capture nucleic acids immobilized to the amplification sites at the 5’ end, each population including a capture sequence. The first population of capture nucleic acids can include, at each amplification site, a clonal population of a modified target nucleic acid, with the 5’ end of the clonal population of the modified target nucleic acid attached to the 3’ end of the first population of capture nucleic acids. The clonal population of the modified target nucleic acid at each amplification site can be a member of a sequencing library. The method can further include contacting the plurality of amplification sites with a resynthesis reagent including (i) a composition including nucleotide triphosphates (NTPs), where the NTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog including a nucleobase, and (ii) a composition including a polymerase. The method can further include reacting the resynthesis reagent to produce a plurality of re-populated amplification sites attached to the array, where the plurality of re-populated amplification sites each include a clonal population of a resynthesized target nucleic acid immobilized to the amplification sites at the 5’ end. The clonal population of the resynthesized target nucleic acid can include a nucleic acid sequence that is a complement of the clonal population of the modified target nucleic acid of the providing step.
[0014] In one embodiment, a method can include providing an array including a plurality of amplification sites, where the amplification sites include two populations of capture nucleic acids immobilized to the amplification sites at the 5’ end, each population including a capture sequence. A first population of capture nucleic acids can include, at each amplification site, a clonal population of a modified target nucleic acid, where the 5’ end of the clonal population of the modified target nucleic acid is attached to the 3’ end of the first population of capture nucleic acids. A second population of capture nucleic acids can include (i) the complement of the clonal population of the modified target nucleic acid at each amplification site, where the 5’ end of the complement of the clonal population of the modified target nucleic acid is attached to the 3’ end of the second population of capture nucleic acids, and (ii) a cleavage site. The clonal population of the modified target nucleic acid at each amplification site is a member of a sequencing library. The method can further include contacting the amplification sites with a cleavage agent, thereby cleaving the second population of capture nucleic acids and releasing the complement of the clonal population of the modified target nucleic acid attached to the 3’ end of the second population of capture nucleic acids. The method can further include removing the released complement of the clonal population of the modified target nucleic acid from the amplification sites. The method can further include contacting the plurality of amplification sites with a resynthesis reagent including (i) a composition that includes nucleotide triphosphates (NTPs), where the NTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog, and (ii) a composition including a polymerase, and reacting the resynthesis reagent to produce a plurality of re-populated amplification sites attached to the array. The plurality of re-populated amplification sites each include a clonal population of a resynthesized target nucleic acid immobilized to the amplification sites at the 5’ end, where the clonal population of the resynthesized target nucleic acid includes a nucleic acid sequence that is a complement of the clonal population of the modified target nucleic acid of the providing step.
[0015] In some embodiments, the amplification reagent and the resynthesis reagent can include the dGTP analog at no greater than 25% of the total amount of dGTP in the reagent.
[0016] The present disclosure also provides arrays. In one embodiment, an array includes a plurality of populated amplification sites attached to the array, where the plurality of populated amplification sites each include a clonal population of amplicons from an individual modified target nucleic acid from a library of modified target nucleic acids. The amplicons include nucleotides incorporated from dATP, dTTP, dGTP, dCTP, and a dGTP analog.
[0017] The present disclosure also provides cartridges. A cartridge can be for use with a sequencing apparatus and can include a first chamber that has a nucleotide composition, where the nucleotide composition includes dATP, dTTP, dGTP, dCTP, and a dGTP analog.
[0018] The present disclosure also provides kits. A kit can be for use with a sequencing apparatus and can include a cartridge. The cartridge can include a first chamber that has a nucleotide composition, where the nucleotide composition includes dATP, dTTP, dGTP, dCTP, and a dGTP analog.
[0019] In some embodiments, the cartridge and the kit can include the dGTP analog at no greater than 25% of the total amount of dGTP in the first chamber.
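Checking a reagent or chamber against the 25% limit in [0015] and [0019] is simple arithmetic, but the basis of the percentage matters. The Python sketch below assumes that "the total amount of dGTP" means the combined G pool (natural dGTP plus analog); under that reading, a 25% analog share corresponds to an analog:dGTP ratio of 1:3. If the percentage is instead read as analog relative to natural dGTP alone, a 25% ratio corresponds to a 20% share of the combined pool, which ratio_to_share converts. All function names and example concentrations are hypothetical.

```python
import math

MAX_ANALOG_SHARE = 0.25  # "no greater than 25%", read as a share of the combined G pool


def analog_share(dgtp: float, analog: float) -> float:
    """Fraction of the combined G pool (dGTP + analog) that is the analog."""
    total = dgtp + analog
    if dgtp < 0 or analog < 0 or total == 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return analog / total


def ratio_to_share(analog_to_dgtp: float) -> float:
    """Convert an analog:dGTP ratio (e.g. 0.25 for 1:4) into a combined-pool share."""
    if analog_to_dgtp < 0:
        raise ValueError("ratio must be non-negative")
    return analog_to_dgtp / (1.0 + analog_to_dgtp)


def split_g_pool(total_g: float, share: float) -> tuple[float, float]:
    """Return (dGTP, analog) amounts that give the requested analog share."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    analog = total_g * share
    return total_g - analog, analog


def within_limit(dgtp: float, analog: float, limit: float = MAX_ANALOG_SHARE) -> bool:
    """True if the analog share does not exceed the limit (tolerating float rounding)."""
    share = analog_share(dgtp, analog)
    return share <= limit or math.isclose(share, limit)


if __name__ == "__main__":
    # Hypothetical mix: 150 uM dGTP + 50 uM analog is 25% of the G pool.
    print(analog_share(150.0, 50.0))   # 0.25
    print(within_limit(150.0, 50.0))   # True
    print(split_g_pool(200.0, 0.10))   # (180.0, 20.0)
```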
[0020] The dGTP analog of the methods, compositions, articles, and kits can include a nucleobase that can be 7-deaza-dGTP, a 7-deaza-dGTP analog substituted at the 7 position, a 7-deaza-dGTP analog substituted at the 8 position, or a 7-deaza-dGTP analog substituted at the 7 position and the 8 position. The dGTP analog of the methods, compositions, articles, and kits can include a nucleobase that is 8-aza-7-deaza-dGTP or an 8-aza-7-deaza-dGTP analog substituted at the 7 position. The dGTP analog of the methods, compositions, articles, and kits can include a nucleobase that can be a 7-N-sub-dGTP.
[0021] In some embodiments, the nucleobase of the dGTP analog is of Formula 5:
[Formula 5: chemical structure not reproduced]
wherein J is C or N; Z is C or N; R1 is hydrogen, halo, alkyl, acyl, trihaloalkyl, cyano, sulfinyl, sulfonyl, or alkynyl; and R2 is hydrogen or halo, such as chloro. In some embodiments, R1 is a C1 to C6 alkyl, R1 is methyl, or R1 is an acyl of the formula -C(O)-R10, where R10 is a C1 to C6 alkyl, such as methyl. In some embodiments, R1 is a trihaloalkyl of the formula -(CH2)n1C(X)3, where n1 is 0, 1, 2, 3, or 4, and X is halo, such as F. In some embodiments, R1 is a cyano of the formula -(CH2)n2CN, where n2 is 0, 1, 2, 3, or 4. In some embodiments, R1 is a sulfinyl of the formula -S(O)-R20, where R20 is a C1 to C6 alkyl or a trihaloalkyl of the formula -(CH2)n1C(X)3, n1 is 0, 1, 2, 3, or 4, and X is halo, such as F. In some embodiments, R1 is a sulfonyl of the formula -S(O)2-R30, where R30 is a C1 to C6 alkyl, such as methyl, or a trihaloalkyl of the formula -(CH2)n1C(X)3, n1 is 0, 1, 2, 3, or 4, and X is halo, such as F. In some embodiments, R1 is an alkynyl of the formula -C≡C-(CH2)n3-R40, where n3 is 1, 2, 3, or 4, and R40 is CH3 or an amine.
[0022] In some embodiments, the nucleobase of the dGTP analog is of Formula 6:
[Formula 6: chemical structure not reproduced]
where R100 is hydrogen or a C1 to C6 alkyl, such as methyl.
[0023] In some embodiments, the nucleobase of the dGTP analog is one of the following structures: [chemical structures not reproduced]
[0024] Terms used herein will be understood to take on their ordinary meaning in the relevant art unless specified otherwise. Several terms used herein and their meanings are set forth below.
[0025] As used herein, "GC-rich region" refers to a series of guanosine (G) nucleotides, cytosine (C) nucleotides, or both guanosine and cytosine nucleotides on a strand of a nucleic acid. A GC-rich region can be a series of G nucleotides (a G homopolymer), a series of C nucleotides (a C homopolymer), or a combination of G and C nucleotides. A GC-rich region can include G nucleotides that can form one or more G-quadruplex structures. A GC-rich region can have a GC content of at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% on a single strand over a defined length of nucleotides. GC content can be calculated as (number of G + C nucleotides) / (number of A + T + G + C nucleotides) × 100%. The defined length can be any number, such as 25, 50, 100, or 200 nucleotides.
[0026] As used herein, the term “amplification site” refers to a site in or on an array where one or more amplicons can be generated. An amplification site can be further configured to contain, hold, or attach at least one amplicon that is generated at the site.
[0027] As used herein, the term “array” refers to a population of sites that can be differentiated from each other according to relative location. Different molecules that are at different sites of an array can be differentiated from each other according to the locations of the sites in the array. An individual site of an array can include one or more molecules of a particular type. For example, a site can include a single target nucleic acid molecule having a particular sequence, or a site can include several nucleic acid molecules having the same sequence (and/or complementary sequence thereof). The sites of an array can be different features located on the same substrate.
Exemplary features include, without limitation, wells in a substrate, beads (or other particles) in or on a substrate, projections from a substrate, ridges on a substrate, or channels in a substrate. The sites of an array can be separate substrates each bearing a different molecule. Different molecules attached to separate substrates can be identified according to the locations of the substrates on a surface with which the substrates are associated or according to the locations of the substrates in a liquid or gel. Exemplary arrays in which separate substrates are located on a surface include, without limitation, those having beads in wells.
[0028] As used herein, the term “amplicon,” when used in reference to a nucleic acid, means the product of copying the nucleic acid, where the product has a nucleotide sequence that is the same as or complementary to at least a portion of the nucleotide sequence of the nucleic acid. An amplicon can be produced by any of a variety of amplification methods that use the nucleic acid (e.g., a target nucleic acid or an amplicon thereof) as a template, including, for example, polymerase extension, polymerase chain reaction (PCR), rolling circle amplification (RCA), ligation extension, or ligation chain reaction. An amplicon can be a nucleic acid molecule having a single copy of a particular nucleotide sequence (e.g., a polymerase extension product) or multiple copies of the nucleotide sequence (e.g., a concatemeric product of RCA). A first amplicon of a target nucleic acid is typically a complementary copy. Subsequent amplicons are copies that are created, after generation of the first amplicon, from the target nucleic acid or from the first amplicon. A subsequent amplicon can have a sequence that is substantially complementary to the target nucleic acid or substantially identical to the target nucleic acid.
[0029] As used herein, the term “capture agent” refers to a material, chemical, molecule, or moiety thereof that is capable of attaching, retaining, or binding to a target molecule (e.g., a target nucleic acid). Exemplary capture agents include, without limitation, a capture nucleic acid that is complementary to at least a portion of a modified target nucleic acid (e.g., a universal capture binding sequence), a member of a receptor-ligand binding pair (e.g., avidin, streptavidin, biotin, lectin, carbohydrate, nucleic acid binding protein, epitope, antibody, etc.)
capable of binding to a modified target nucleic acid (or linking moiety attached thereto), or a chemical reagent capable of forming a covalent bond with a modified target nucleic acid (or linking moiety attached thereto). In one embodiment, a capture agent is a nucleic acid. A nucleic acid capture agent can also be used as an amplification primer.
[0030] The terms “P5” and “P7” may be used when referring to a nucleic acid capture agent. The terms “P5’” (P5 prime) and “P7’” (P7 prime) refer to the complements of P5 and P7, respectively. It will be understood that any suitable nucleic acid capture agent can be used in the methods presented herein, and that P5 and P7 are exemplary embodiments only. The use of nucleic acid capture agents such as P5 and P7 on flow cells is known in the art, as exemplified by the disclosures of WO 2007/010251, WO 2006/064199, WO 2005/065814, WO 2015/106941, WO 1998/044151, and WO 2000/018957. One of skill in the art will recognize that a nucleic acid capture agent can also function as an amplification primer. For example, any suitable nucleic acid capture agent can act as a forward amplification primer, whether immobilized or in solution, and can be useful in the methods presented herein for hybridization to a sequence (e.g., a universal capture binding sequence) and amplification of a sequence. Similarly, any suitable nucleic acid capture agent can act as a reverse amplification primer, whether immobilized or in solution, and can be useful in the methods presented herein for hybridization to a sequence (e.g., a universal capture binding sequence) and amplification of a sequence. In view of the general knowledge available and the teachings of the present disclosure, one of skill in the art will understand how to design and use sequences that are suitable for capture and amplification of target nucleic acids as presented herein.
[0031] As used herein, the term “polymerase” is intended to be consistent with its use in the art and includes, for example, an enzyme that produces a complementary replicate of a nucleic acid molecule using the nucleic acid as a template strand. Typically, DNA polymerases bind to the template strand and then move down the template strand, sequentially adding nucleotides to the free hydroxyl group at the 3' end of a growing strand of nucleic acid. DNA polymerases typically synthesize complementary DNA molecules from DNA templates, and RNA polymerases typically synthesize RNA molecules from DNA templates (transcription).
Polymerases can use a short RNA or DNA strand, called a primer, to begin strand growth. Some polymerases can displace the strand upstream of the site where they are adding bases to a chain. Such polymerases are said to be strand displacing, meaning they have an activity that removes a complementary strand from a template strand being read by the polymerase. Exemplary polymerases having strand displacing activity include, without limitation, the large fragment of Bsu (Bacillus subtilis), Bst (Bacillus stearothermophilus) polymerase, exo- Klenow polymerase, or sequencing grade T7 exo-polymerase. Some polymerases degrade the strand in front of them, effectively replacing it with the growing chain behind (5' exonuclease activity). Some polymerases have an activity that degrades the strand behind them (3' exonuclease activity). Some useful polymerases have been modified, either by mutation or otherwise, to reduce or eliminate 3' and/or 5' exonuclease activity. Different polymerases can be used at different times during the sequencing process, including library production (e.g., amplification or reverse transcription), production of clonal populations of amplicons at amplification sites (e.g., a polymerase for Exclusion Amplification or Bridge Amplification), or sequencing (e.g., a polymerase that can be used with 3'-blocked nucleotides).
[0032] As used herein, the terms “nucleic acid” and “polynucleotide” are used interchangeably, are intended to be consistent with their use in the art, and include naturally occurring nucleic acids and functional analogs thereof. Particularly useful functional analogs are capable of hybridizing to a nucleic acid in a sequence-specific fashion or capable of being used as a template for replication of a particular nucleotide sequence. Naturally occurring nucleic acids generally have a backbone containing phosphodiester bonds. An analog structure can have an alternate backbone linkage including any of a variety of those known in the art. Naturally occurring nucleic acids generally have a deoxyribose sugar (e.g., found in deoxyribonucleic acid (DNA)) or a ribose sugar (e.g., found in ribonucleic acid (RNA)). A nucleic acid can contain any of a variety of analogs of these sugar moieties that are known in the art. A nucleic acid can include native or non-native bases. In this regard, a native deoxyribonucleic acid can have one or more bases selected from adenine, thymine, cytosine, or guanine, and a ribonucleic acid can have one or more bases selected from uracil, adenine, cytosine, or guanine. Useful non-native bases that can be included in a nucleic acid are known in the art.
In some embodiments, non-native bases that can be included in a nucleic acid include a guanine modified as described herein (e.g., a guanine present in a dGTP analog). The term “target,” when used in reference to a nucleic acid, is intended as a semantic identifier for the nucleic acid in the context of a method or composition set forth herein and does not necessarily limit the structure or function of the nucleic acid beyond what is otherwise explicitly indicated. A target nucleic acid having a universal sequence at each end, for instance a universal adapter at each end, can be referred to as a modified target nucleic acid.
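The GC-content calculation in [0025], applied over a defined window length, and the identification of G and C homopolymers can be sketched in Python as follows. This is an illustrative sketch, not part of the disclosed methods: the function names, the default 50-nucleotide window and 60% threshold (chosen from the example values listed in [0025]), the one-nucleotide step, and the six-nucleotide minimum homopolymer length are all assumptions, as is ignoring non-ACGT characters such as N.

```python
import re
from typing import Iterator

# Maximal runs of G or of C; a mixed run such as GGGCCC yields two separate runs.
GC_RUN = re.compile(r"G+|C+")


def gc_content(seq: str) -> float:
    """GC content (%) = (G + C) / (A + T + G + C) * 100.

    Characters other than A, C, G, and T (for example N) are ignored.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / acgt


def gc_rich_windows(seq: str, window: int = 50, min_gc: float = 60.0,
                    step: int = 1) -> Iterator[tuple[int, float]]:
    """Yield (0-based start, GC %) for each window at or above min_gc."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        try:
            pct = gc_content(seq[start:start + window])
        except ValueError:
            continue  # window contains no A, C, G, or T
        if pct >= min_gc:
            yield start, pct


def g_or_c_homopolymers(seq: str, min_len: int = 6) -> list[tuple[int, str]]:
    """Return (0-based start, run) for G or C homopolymers of at least min_len."""
    return [(m.start(), m.group()) for m in GC_RUN.finditer(seq.upper())
            if len(m.group()) >= min_len]


if __name__ == "__main__":
    demo = "ATGGGGGGTACCCCCCCA"
    print(gc_content(demo))
    print(g_or_c_homopolymers(demo))  # [(2, 'GGGGGG'), (10, 'CCCCCCC')]
```

Overlapping windows that pass the threshold can then be merged into contiguous GC-rich regions if a region-level view is needed.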
[0033] As used herein, the symbol “ ” (hereinafter referred to as “a point of attachment bond”) denotes a point of attachment between two chemical entities, one of which is depicted as being attached to the point of attachment bond and the other of which is not depicted as being attached to the point of attachment bond. For example, “ XY ” indicates that the chemical entity “XY” is bonded to another chemical entity via the point of attachment bond.
[0034] The point of attachment of the organic group to the compound may be described in several ways. For example, in some embodiments, the chemical entity (or chemical group or moiety) may be described as the monovalent group or radical of the respective functional group (e.g., alkyl for alkane, aryl for aromatic ring, aminyl for a primary or secondary amine). In some embodiments, where a general formula is shown with a covalent bond connecting a chemical moiety to a compound, the chemical moiety may be described as the common functional group. For example, if the organic group R is described relative to the formula CH3CH2CH2-R, the organic group may be described, for example, as an aromatic ring, sulfoxide, amine, or any other common functional group name.
[0035] As used herein, "alkyl" refers to a monovalent group that is a radical of an alkane and includes straight-chain, branched-chain, cyclic, and bicyclic alkyl groups, and combinations thereof, including both unsubstituted and substituted alkyl groups. Alkyl may be used to describe an alkane substituent attached to a compound. An alkyl substituent may include other functional groups, including, for example, carbonyls, halogens, amines, and others.
[0036] The terms “alkynyl” and “alkynyl group” refer to a monovalent group that is a radical of an alkyne and includes groups that are linear, branched, cyclic, or combinations thereof. An alkynyl group has one or more triple bonds. The triple bond may be located anywhere along the alkynyl. For example, the radical may be part of the triple bond (e.g., ∙C≡C-). Alternatively, the radical may be part of a single bond (e.g., ∙CH2-). Alkynyl may be used to describe an alkyne-containing substituent attached to a compound.
[0037] The term "sulfinyl" means a divalent group of formula -SO-. Sulfinyl may be used to describe a sulfoxide that is covalently connected to a compound.
[0038] The term "sulfonyl" means a divalent group of formula -SO2-. Sulfonyl may be used to describe a sulfone connected to a compound.
[0039] The term “acyl” refers to a group derived by removing one or more hydroxyl groups from a carboxylic acid. An acyl group may include an alkyl group and an oxygen atom double bonded to a carbon atom.
[0040] Unless otherwise specified, "a," "an," "the," and "at least one" are used interchangeably and mean one or more than one.
[0041] As used in this specification and the appended claims, the term "or" is generally employed in its sense including "and/or" unless the content clearly dictates otherwise. The term "and/or" means one or all of the listed elements or a combination of any two or more of the listed elements. The use of "and/or" in some instances does not imply that the use of "or" in other instances may not mean "and/or."
[0042] The words "preferred" and "preferably" refer to embodiments of the disclosure that may afford certain benefits, under certain circumstances. However, other embodiments may also be preferred, under the same or other circumstances. Furthermore, the recitation of one or more preferred embodiments does not imply that other embodiments are not useful, and is not intended to exclude other embodiments from the scope of the disclosure.
[0043] As used herein, "have," "has," "having," "include," "includes," "including," "comprise," "comprises," "comprising" or the like are used in their open-ended inclusive sense, and generally mean "include, but not limited to," "includes, but not limited to," or "including, but not limited to."
[0044] It is understood that wherever embodiments are described herein with the language "have," "has," "having," "include," "includes," "including," "comprise," "comprises," "comprising" and the like, otherwise analogous embodiments described in terms of "consisting of" and/or "consisting essentially of" are also provided. The term "consisting of" means including, and limited to, whatever follows the phrase "consisting of." That is, "consisting of" indicates that the listed elements are required or mandatory, and that no other elements may be present. The term "consisting essentially of" indicates that any elements listed after the phrase are included, and that elements other than those listed may be included provided that those elements do not interfere with or contribute to the activity or action specified in the disclosure for the listed elements.
[0045] Conditions that are "suitable" for an event to occur, or "suitable" conditions, are conditions that do not prevent such events from occurring. Thus, these conditions permit, enhance, facilitate, and/or are conducive to the event.
[0046] As used herein, "providing" in the context of, for instance, an amplification or resynthesis reagent, an array, or a composition, means making the amplification or resynthesis reagent, array, or composition; purchasing the amplification or resynthesis reagent, array, or composition; or otherwise obtaining the amplification or resynthesis reagent, array, or composition.
[0047] Reference throughout this specification to "one embodiment," "an embodiment," "certain embodiments," or "some embodiments," etc., means that a particular feature, configuration, composition, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Thus, the appearances of such phrases in various places throughout this specification are not necessarily referring to the same embodiment of the disclosure. Furthermore, the particular features, configurations, compositions, or characteristics may be combined in any suitable manner in one or more embodiments.
[0048] Throughout this disclosure, various aspects of the disclosure can be presented in a range format.
It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.
[0049] In the description herein, particular embodiments may be described in isolation for clarity. Unless otherwise expressly specified that the features of a particular embodiment are incompatible with the features of another embodiment, certain embodiments can include a combination of compatible features described herein in connection with one or more embodiments.
[0050] For any method disclosed herein that includes discrete steps, the steps may be conducted in any feasible order. And, as appropriate, any combination of two or more steps may be conducted simultaneously.
[0051] The above summary of the present disclosure is not intended to describe each disclosed embodiment or every implementation of the present disclosure. The description that follows more particularly exemplifies illustrative embodiments. In several places throughout the application, guidance is provided through lists of examples, which examples can be used in various combinations. In each instance, the recited list serves only as a representative group and should not be interpreted as an exclusive list.
[0052] BRIEF DESCRIPTION OF THE FIGURES
[0053] The following detailed description of illustrative embodiments of the present disclosure may be best understood when read in conjunction with the following drawings.
[0054] FIGS. 1A-1B show G-quadruplex structures. FIG. 1A shows the structure of a G-quadruplex with four guanine nucleobases and a central cation (C+). The Watson-Crick and Hoogsteen hydrogen bonds between positions 6 and 1 and between positions 2 and 7 of adjacent nucleobases are shown by the dashed lines. FIG. 1B shows an example of a structure of an intramolecular G-quadruplex with three stacked G-quadruplexes.
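FIG. 1B depicts an intramolecular G-quadruplex formed by one strand folding into three stacked layers of guanines, which requires four runs of at least three guanines separated by loops. A widely used sequence heuristic for flagging such regions searches for four G runs of three or more guanines separated by loops of one to seven nucleotides; the Python sketch below applies that heuristic. The pattern, its loop limits, and the function name find_g4_motifs are assumptions for illustration rather than criteria stated in this disclosure, and a motif match does not by itself show that a quadruplex forms.

```python
import re

# Heuristic G4 motif: four runs of >= 3 G separated by loops of 1-7 nucleotides.
# Loops may contain any base (N is allowed so ambiguous calls do not break a match).
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
# A G4 motif on the complementary strand appears as C runs on this strand.
C4_PATTERN = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, strand) hits; start is 0-based, end exclusive.

    strand "+" means the G runs are on the given strand; "-" means the given
    strand has C runs, so the G-rich motif lies on the complementary strand.
    Matches are non-overlapping within each strand.
    """
    s = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PATTERN.finditer(s)]
    hits += [(m.start(), m.end(), "-") for m in C4_PATTERN.finditer(s)]
    return sorted(hits)


if __name__ == "__main__":
    # Four copies of the human telomeric repeat TTAGGG.
    print(find_g4_motifs("TTAGGGTTAGGGTTAGGGTTAGGG"))  # [(3, 24, '+')]
    print(find_g4_motifs("GGGAGGGAGGG"))               # [] (only three G runs)
```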
[0055] FIG. 2 shows a schematic drawing of embodiments that can occur during cluster generation or resynthesis. For simplicity, only one amplification site of an array is shown.
[0056] FIG. 3 shows a general block diagram of a portion of an illustrative sequencing workflow including use of a dGTP analog according to the present disclosure.
[0057] FIGS. 4A-4B show schematic drawings of embodiments that can occur during seeding of an amplification site and first strand synthesis. For simplicity, only one amplification site of an array and an associated target nucleic acid (FIG. 4A), or one amplification site of an array and an immobilized complement of a target nucleic acid (FIG. 4B), are shown. The figures use the following convention when numbering single strands of nucleic acids: the strand that is a member of a sequencing library is numbered with a prime (e.g., strand 21’ of FIG. 4A); the strand that is immobilized and is the complement of the strand that is a member of a sequencing library is numbered without a prime (e.g., strand 21 of FIG. 4B).
[0058] FIGS. 5A-5D show schematic drawings of an embodiment of producing clonal clusters. For simplicity, only one amplification site of an array and a limited number of target nucleic acids are shown.
[0059] FIGS. 6A-6F show schematic drawings of an embodiment of paired-end sequencing. For simplicity, only one amplification site of an array and a limited number of target nucleic acids are shown. The figures use the following convention when numbering capture nucleic acids: capture nucleic acids prior to cleavage are numbered (e.g., capture nucleic acid 23 of FIG. 6A); capture nucleic acids after cleavage are also numbered, but the number is modified with the symbol "*" (e.g., strand 23* of FIG. 6B).
[0060] FIG. 7A shows the assay used to assess improvements afforded by 7-deaza-dGTP-containing templates. FIGS. 7B-7C show enhanced ffC incorporation kinetics observed with 7-deaza-dGTP-containing templates in strong G4 regions (FIG. 7B), especially with K+, but not in weak G4 regions (FIG. 7C).
[0061] FIG.8A shows that a polymerase has similar ffC incorporation kinetics against dGTP-containing and 7-deaza-dGTP-containing templates. FIG.8B shows that the clustering polymerase Bsu can incorporate both dGTP and 7-deaza-dGTP efficiently.
[0062] FIG.9A shows first base incorporation intensities on HiSeqX flow cells at different conditions (lanes 1 and 5-8). FIG.9B shows the calculated mean intensities of the HiSeqX flow cell lanes corresponding to the different conditions tested; clustering efficiency was reduced at higher 7-deaza-dGTP ratios. Lanes 2-4 are from a separate experimental set not described here.
[0063] FIG.10 shows a primary metric analysis of mixed 7-deaza-dGTP:dGTP % ratios when used in a combined seed and amp ExAmp formulation on the 4-channel HiSeq.
[0064] FIG.11 shows the impact of increasing % ratios of 7-deaza-dGTP (to dGTP) on secondary metrics, as analyzed by Fluente Sequence Specific Errors (SSE) Scan on BaseSpace Sequencing Hub.
[0065] FIG.12 is an Integrative Genomics Viewer (IGV) plot showing sequence resolution for a G4 region (nucleotides present in box) with control compared to 10% to 50% 7-deaza-dGTP.
[0066] FIG.13 shows BaseSpace Sequencing Hub SSE Scan G-quadruplex coverage, downsampled to 30X, for standard NextSeq2k clustering (control), T2 clustering with a water spike-in (control), and T2 clustering with increasing concentrations of a 7-deaza-dGTP spike-in (treatments).
[0067] FIG.14 shows the mean percentage of bases soft-clipped for the 50 most severe known G4 sequences within BacPac 450 with control NextSeq2k clustering and a control water spike-in in the T2 clustering format, as well as T2 clustering with increasing % ratios of 7-deaza-dGTP (to dGTP), on NextSeq2k with Gen2 chemistry.
[0068] FIG.15 is an IGV plot showing sequence resolution of a known G4 region (nucleotides present in box) in BacPac 450 library with control and 5% to 20% 7-deaza-dGTP (to dGTP) on NextSeq2k with Gen2 chemistry.
[0069] FIG.16 shows a kinetic analysis of incorporation opposite a G base using ffC (FIG.16A) and ffT (FIG.16B), indicating potentially reduced misincorporation in SBS when 7-deaza-dGTP is used.
[0070] FIG.17 shows the same gel stained with SYBR Gold (left) and unstained (right), illustrating the quenching effect of 7-deaza-dGTP. The yield was not affected in this case.
[0071] FIG.18 is an IGV plot showing sequence resolution of a known G4 region (nucleotides present in box) in BacPac 450 library with control and 20% 7-CF3SO4-dGTP on NextSeq2k with Gen2 chemistry.
[0072] FIG.19 is an IGV plot showing sequence resolution of a known G4 region (nucleotides present in box) in BacPac 450 library with control and 10% 7-deaza-7-Iodo-dGTP on NextSeq2k with Gen2 chemistry.
[0073] FIG.20 is an IGV plot showing sequence resolution of a known G4 region (nucleotides present in box) in BacPac 450 library with control and 20% 7-deaza-7-CF3-dGTP on NextSeq2k with Gen2 chemistry.
[0074] FIG.21 is an IGV plot showing sequence resolution of a known G4 region (nucleotides present in box) in BacPac 450 library with control and 20% 7-deaza-7-CN-dGTP on NextSeq2k with Gen2 chemistry.
[0075] FIG.22 is an IGV plot showing sequence resolution of a known G4 region (nucleotides present in box) in BacPac 450 library with control and 10% 7-deaza-7-F-dGTP on NextSeq2k with Gen2 chemistry.
[0076] FIG.23 is an IGV plot showing sequence resolution of a known G4 region (nucleotides present in box) in BacPac 450 library with control and 10% 7-deaza-7-MeSO2-dGTP on NextSeq2k with Gen2 chemistry.
[0077] FIG.24 is an IGV plot showing sequence resolution of a known G4 region (nucleotides present in box) in BacPac 450 library with control and 10% 8-aza-7-deaza-dGTP on NextSeq2k with Gen2 chemistry.
[0078] FIG.25 is an IGV plot showing sequence resolution of a known G4 region (nucleotides present in box) in BacPac 450 library with control and 10% 7-paraG-dGTP on iSeq100 with Gen2 chemistry.
[0079] FIG.26 is a synthetic scheme for synthesizing 7-deaza-7-trifluoromethyl-dGTP (7-deaza-7-CF3-dGTP).
[0080] FIG.27 is a synthetic scheme for synthesizing 7-deaza-7-methyl sulfoxide-dGTP (7-deaza-7-SO2Me-dGTP).
[0081] FIG.28 is a synthetic scheme for synthesizing 7-deaza-7-cyano-dGTP (7-deaza-7-CN-dGTP).
[0082] FIG.29 is a synthetic scheme for synthesizing 7-deaza-7-fluoro-dGTP (7-deaza-7-F-dGTP).
[0083] FIG.30 is a synthetic scheme for synthesizing 7-deaza-7-chloro-dGTP (7-deaza-7-Cl-dGTP).
[0084] FIG.31 is a synthetic scheme for synthesizing 7-deaza-7,8-dichloro-dGTP (7-deaza-7,8-diCl-dGTP).
[0085] FIG.32 is a synthetic scheme for synthesizing 7-deaza-7-trifluoromethylsulfoxide-dGTP (7-deaza-7-SOCF3-dGTP) and 7-deaza-7-trifluoromethylsulfone-dGTP (7-deaza-7-SO2CF3-dGTP).
[0086] FIG.33 is a synthetic scheme for synthesizing 7-deaza-acetoxy-dGTP (7-deaza-7-Ac-dGTP).
[0087] FIG.34 is a synthetic scheme for synthesizing 7-N-methyl-dGTP (7-NMe-dGTP).
[0088] The schematic drawings are not necessarily to scale. Like numbers used in the figures refer to like components, steps, and the like. However, it will be understood that the use of a number to refer to a component in a given figure is not intended to limit the component in another figure labeled with the same number. In addition, the use of different numbers to refer to components is not intended to indicate that the different numbered components cannot be the same as or similar to other numbered components.
[0089] DETAILED DESCRIPTION
[0090] A G-quadruplex (interchangeably referred to herein as G-quadruplex, G4, G-tetrad, and G-quad) is a highly thermodynamically stable structure formed in guanine-rich regions under physiological conditions (FIG.1A). Its stability is due to both Watson-Crick and Hoogsteen hydrogen bonds, and G-quadruplexes are further stabilized by monovalent cations typically present in storage and sequencing buffers. Two or more G-quadruplexes can stack to form a stacked structure (FIG.1B), with a central cation coordinated by the guanine O6 carbonyl oxygens. The guanines of a stacked structure can be from a single strand of DNA (intramolecular) or from different strands (intermolecular). The stability of a G-quadruplex relies on the formation of Hoogsteen-type circular H-bonds, which involve the O6 carbonyl oxygen and the N1, N2, and N7 nitrogen atoms of each guanine.
[0091] G-quadruplex secondary structures reduce representation of GC-rich regions in sequencing methods, including SBS methods. For instance, secondary structure can reduce polymerase synthesis during amplification to produce monoclonal clusters and/or during resynthesis after the first round of SBS in paired-end sequencing, because a polymerase has difficulty traversing the region of secondary structure. As depicted in FIG.2, a polymerase 21 cannot easily read through a G-quadruplex 22 during cluster generation or resynthesis at an amplification site 20.
[0092] Interfering with the highly organized H-bond network of a G-quadruplex (for example, by chemically modifying guanosine residues) may affect its stability, and a reduction in G-quadruplex stability is expected to decrease G-quadruplex occurrence. However, any change to dGTP that interferes with the highly organized hydrogen-bond network must also be compatible with the reagents and methods used in a sequencing workflow. For instance, a modification of dGTP used in the sequencing process must be compatible with one or more of cluster generation, sequencing of clusters, and paired-end turn methods.
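G-quadruplex-prone stretches of the kind discussed above are often flagged computationally with a widely used consensus motif: four runs of at least three guanines separated by loops of one to seven nucleotides (G3+N1-7G3+N1-7G3+N1-7G3+). The Python sketch below applies that pattern to one strand. It is an illustration under that assumption, not a method taken from this disclosure, and the name `find_g4_motifs` is hypothetical. A C-rich run on the scanned strand indicates a possible G4 on the complementary strand, which this sketch does not report.

```python
import re

# Consensus G4 motif (an assumption, not taken from this disclosure):
# four runs of >= 3 G separated by loops of 1-7 nucleotides of any base.
# The zero-width lookahead lets matches starting at different positions overlap.
G4_PATTERN = re.compile(
    r"(?=(G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}))"
)


def find_g4_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) pairs for putative G4 motifs on this strand."""
    seq = seq.upper()
    return [(m.start(1), m.group(1)) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Human telomeric repeat, a well-known G4-forming sequence.
    telomeric = "ATTGGGTTAGGGTTAGGGTTAGGGAA"
    for start, motif in find_g4_motifs(telomeric):
        print(start, motif)
```

Because the pattern is a heuristic, it both misses some non-canonical G4s (for example, two-tetrad or long-loop structures) and flags some sequences that do not fold under sequencing buffer conditions.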
[0093] A sequencing workflow can include sequencing library preparation (often including an amplification), cluster generation (often including seeding amplification sites, first strand extension, and amplification), sequencing (often including a first read, a paired-end turn, and a second read), and data analysis (FIG.3). The present disclosure provides methods related to cluster generation and to resynthesis between the first and second rounds of sequencing. Also included in the present disclosure are compositions, articles, and kits related to cluster generation and resynthesis. In particular, the methods, compositions, articles, and kits described herein include dGTP analogs that reduce the formation of G-quadruplexes in double-stranded DNA, single-stranded DNA, or both. Reducing G-quadruplexes in nucleic acids reduces one or more of: secondary structure of nucleic acids, bias in populating or seeding amplification sites of an array, and bias during the first extension of nucleic acids seeded at amplification sites. The data obtained from subsequent sequencing show increased representation of secondary structure-prone regions, reduced SSEs, increased output from GC-rich regions, and increased quality from GC-rich regions.
[0094] In one embodiment, a method of the present disclosure includes reducing G-quadruplex formation during the production of amplification sites, e.g., during cluster formation (FIG.3, block 32). A method can include providing an amplification reagent. An amplification reagent can include (i) an array of amplification sites, (ii) a plurality of modified target nucleic acids, (iii) nucleotide triphosphates (dNTPs), wherein the dNTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog, and (iv) a polymerase. In some embodiments, an amplification reagent does not include a dGTP analog. The amplification reagent is reacted, for instance in an amplification reaction, to produce a plurality of populated amplification sites, where the plurality of populated amplification sites each include a clonal population of amplicons from an individual modified target nucleic acid from the plurality of modified target nucleic acids. When a dGTP analog is present, the amplicons at the amplification sites will include the dGTP analog incorporated in both strands. The dGTP analog will be present in the amplicons at a level that depends on the percentage or ratio of dGTP to dGTP analog in the amplification reagent, and the G-quadruplexes in the amplicons will be reduced in number and/or stability compared to the same amplicon that does not include a dGTP analog.
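The dependence of analog content on the dGTP:analog ratio can be made concrete with simple arithmetic. As a rough sketch only, assume that (a) the stated percentage is the analog's share of the combined dGTP + analog pool (one possible reading of "% 7-deaza-dGTP (to dGTP)"), (b) the polymerase incorporates dGTP and the analog with equal efficiency, which FIG.8B suggests for Bsu with 7-deaza-dGTP but which need not hold for every analog or polymerase, and (c) each G position is filled independently. Under those assumptions each G position carries the analog with probability f, and a motif with n guanines escapes modification entirely with probability (1 − f)^n. The function names and the 200 µM pool below are hypothetical.

```python
def split_g_pool(total_g_uM: float, analog_percent: float) -> tuple[float, float]:
    """Split a total G-nucleotide concentration into (dGTP, analog) concentrations.

    Assumes analog_percent is the analog's share of the combined dGTP + analog pool.
    """
    if not 0.0 <= analog_percent <= 100.0:
        raise ValueError("analog_percent must be between 0 and 100")
    analog = total_g_uM * analog_percent / 100.0
    return total_g_uM - analog, analog


def p_motif_without_analog(analog_percent: float, n_guanines: int) -> float:
    """Probability that none of n_guanines G positions receives the analog.

    Assumes equal incorporation efficiency and independent positions, so each
    G position carries the analog with probability f = analog_percent / 100.
    """
    f = analog_percent / 100.0
    return (1.0 - f) ** n_guanines


if __name__ == "__main__":
    # Hypothetical 200 uM total G-nucleotide pool with 20% analog.
    dgtp, analog = split_g_pool(200.0, 20.0)
    print(f"dGTP {dgtp} uM, analog {analog} uM")
    # A G4 built from three stacked tetrads contains 12 guanines.
    print(f"P(no analog in a 12-G motif) = {p_motif_without_analog(20.0, 12):.3f}")
```

Under these idealized assumptions, even a modest analog share leaves few 12-guanine motifs completely unmodified, which is one way to see why partial substitution, rather than full replacement of dGTP, may be enough to destabilize G-quadruplexes.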
In embodiments where the method includes targeted sequencing (sequencing of specific regions of DNA), most or all of the amplification sites can include target nucleic acids with GC-rich regions. In other embodiments, only some of the populated amplification sites may include GC-rich regions.
[0095] In another embodiment, a method of the present disclosure includes reducing G-quadruplex formation during paired-end turn resynthesis (FIG.3, block 34). In one embodiment, a method includes providing a resynthesis reagent. A resynthesis reagent can include (i) an array of amplification sites, where each amplification site includes immobilized modified target nucleic acids, (ii) nucleotide triphosphates (dNTPs), wherein the dNTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog, and (iii) a polymerase. In some embodiments, a resynthesis reagent does not include a dGTP analog. The resynthesis reagent is reacted to produce, at each amplification site, a population of strands that are complementary to the strand sequenced during the first round. The population of complementary strands is sequenced during the second round. When a dGTP analog is present, the complementary strands will have the dGTP analog incorporated. The dGTP analog will be present in the complementary strands at a level that depends on the percentage or ratio of dGTP to dGTP analog in the resynthesis reagent, and the G-quadruplexes in the complementary strands will be reduced in number and/or stability compared to the same strands that do not include a dGTP analog.
[0096] Surprisingly, methods described herein can also result in reduced misincorporation rates during sequencing. Specifically, incorporation of thymidine opposite some dGTP analogs was much lower than opposite dGTP. This reduction in misincorporation rate can further improve sequencing quality.
[0097] dGTP Analogs
[0098] The methods, compositions, articles, and kits described herein can include 2ʹ-deoxyguanosine triphosphate (dGTP) analogs that aid in reducing the formation of G-quadruplexes in double-stranded DNA, single-stranded DNA, or both. A dGTP analog can be used in certain DNA synthesis steps during cluster generation and resynthesis. The terms “dGTP analog” and “analog of dGTP” are used interchangeably and refer to a compound having a nucleobase that differs from the nucleobase of dGTP by addition of at least one component, removal of at least one component, exchange of at least one component, or any combination thereof.
A component can be one or more atoms, one or more functional groups, or one or more substructures. At least one component of dGTP can be removed and replaced with at least one other component.
[0099] The nucleobase of dGTP has the structure shown in Formula 1, in which each atom of the purine ring system is labeled. This numbering scheme is used for dGTP and the dGTP analogs described herein. The dGTP analogs useful herein may be 7-deaza-2´-deoxyguanosine 5´-triphosphate (7-deaza-dGTP; see Formula 2) or an analog thereof, or a 7-N-substituted-2´-deoxyguanosine 5´-triphosphate (7-N-sub-dGTP; see Formula 4). In 7-deaza-dGTP and 7-deaza-dGTP analogs, the nitrogen at position 7 of the purine in dGTP is replaced with CH or CR, where R is a substituent group other than H. In some embodiments, a 7-deaza-dGTP analog includes a substituent other than hydrogen (H) at the 7 position. In some embodiments, a 7-deaza-dGTP analog includes a substituent other than H at the 8 position. In some embodiments, a 7-deaza-dGTP analog includes a substituent other than H at the 7 position and a substituent other than H at the 8 position. In other embodiments, a 7-deaza-dGTP analog is 8-aza-7-deaza-2´-deoxyguanosine 5´-triphosphate (8-aza-7-deaza-dGTP) or an analog thereof. 8-aza-7-deaza-dGTP and analogs thereof include a nitrogen (aza) at position 8, as shown in Formula 3. 8-aza-7-deaza-dGTP analogs may or may not include a substituent other than H at the 7 position. In a 7-N-sub-dGTP, the nitrogen at position 7 of dGTP is substituted (the R group in Formula 4) such that the N is positively charged. In Formulas 1, 2, 3, and 4, it is understood that the point of attachment bond is covalently coupled to the 1ʹ position of the deoxyribose of the dGTP analog.
[00100] In some embodiments, the dGTP analog is 7-deaza-dGTP or a 7-deaza-dGTP analog substituted at the 7 position, the 8 position, or both the 7 and 8 positions. In other embodiments, the dGTP analog is 8-aza-7-deaza-dGTP or an 8-aza-7-deaza-dGTP analog substituted at the 7 position. In yet other embodiments, the dGTP analog is a 7-N-substituted dGTP; that is, dGTP substituted at the nitrogen of position 7. In some embodiments, the nucleobase of the dGTP analog is of Formula 5. It is understood that the point of attachment bond is covalently coupled to the 1ʹ position of the deoxyribose of the dGTP analog of Formula 5.
[00101] In Formula 5, Z may be C or N. In some embodiments, Z is C. In some embodiments, Z is N. In Formula 5, J can be C or N. In some embodiments, J is C. In some embodiments, J is N. In some embodiments when Z is C and J is C, the dGTP analog is 7-deaza-dGTP or a 7-deaza-dGTP analog. In some embodiments when Z is N and J is C, the dGTP analog is 8-aza-7-deaza-dGTP or an 8-aza-7-deaza-dGTP analog. R1 can be hydrogen, halo, alkyl, acyl, trihaloalkyl, cyano, sulfinyl, sulfonyl, or alkynyl. R2 can be hydrogen or halo. In some embodiments when Z is C, R2 is H, J is N, and R1 is not H, the dGTP analog is a 7-N-sub-dGTP analog.
[00102] In some embodiments, Z is N, J is C, and R1 is H.
[00103] R1 may be halo. In some embodiments, R1 is fluoro (F). In some embodiments, R1 is iodo (I). In some embodiments, R1 is chloro (Cl). In some embodiments, R1 is bromo (Br).
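The combinations of Z, J, R1, and R2 described in [00101] and [00102] can be summarized as a small decision procedure. The Python sketch below is a hypothetical helper (`classify_formula5`), not part of the disclosure. It assumes that J is the ring atom at position 7 and Z the atom at position 8, with R1 on J and R2 on Z; that placement is inferred from the families named in the text, since the Formula 5 drawing is not reproduced here. It further assumes that Z = C, J = N with R1 = R2 = H is the unmodified guanine nucleobase, and it reports any combination the text does not name as "not described".

```python
def classify_formula5(z: str, j: str, r1: str = "H", r2: str = "H") -> str:
    """Name the dGTP family for a Formula 5 nucleobase (hypothetical helper).

    z, j:   "C" or "N" for ring atoms Z (assumed position 8) and J (assumed position 7).
    r1, r2: substituent labels; "H" means unsubstituted. R2 is ignored when Z is N,
            because an 8-aza ring nitrogen is assumed to carry no R2 substituent.
    """
    z, j = z.upper(), j.upper()
    if z not in {"C", "N"} or j not in {"C", "N"}:
        raise ValueError("Z and J must each be 'C' or 'N'")
    if z == "C" and j == "C":
        # 7-deaza family; any non-H substituent at 7 and/or 8 gives an analog.
        return "7-deaza-dGTP" if r1 == "H" and r2 == "H" else "7-deaza-dGTP analog"
    if z == "N" and j == "C":
        # 8-aza-7-deaza family; [00102] covers the R1 = H case.
        return "8-aza-7-deaza-dGTP" if r1 == "H" else "8-aza-7-deaza-dGTP analog"
    if z == "C" and j == "N":
        if r1 == "H" and r2 == "H":
            return "dGTP"  # natural guanine nucleobase, not an analog
        if r2 == "H":
            return "7-N-sub-dGTP analog"  # substituent on N7 ([00101])
    return "not described"


if __name__ == "__main__":
    for args in [("C", "C", "CF3"), ("N", "C"), ("C", "N", "CH3"), ("N", "N")]:
        print(args, "->", classify_formula5(*args))
```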
[00104] R1 may be alkyl. The alkyl may be linear, branched, or cyclic. The alkyl may be a C1 to C6 alkyl. In some embodiments, R1 is methyl, ethyl, propyl, isopropyl, n-butyl, iso-butyl, or sec-butyl.
[00105] R1 may be acyl. R1 may be acyl of the formula -C(O)-R10, where R10 is alkyl. R10 may be a C1 to C6 alkyl. In some embodiments, R10 is methyl, ethyl, propyl, or isopropyl. In some embodiments, R1 is -C(O)-CH3.
[00106] R1 may be trihaloalkyl. The trihaloalkyl includes three halos attached to the terminal carbon of an alkyl. The trihaloalkyl may be of the formula -(CH2)n1C(X)3, where n1 is 0, 1, 2, 3, or 4 and X is halo. X may be fluoro (F), bromo (Br), iodo (I), or chloro (Cl). In some embodiments, X is fluoro (F). In some embodiments, n1 is 0 and X is fluoro (F). In some such embodiments, the trihaloalkyl can be referred to as trifluoromethyl.
[00107] R1 may be cyano. The cyano may be of the formula -(CH2)n2CN, where n2 is 0, 1, 2, 3, or 4. In some embodiments, n2 is 1. In some embodiments, n2 is 2. In some embodiments, R1 is -CN.
[00108] R1 may be sulfinyl. The sulfinyl may be of the formula -S(O)-R20, where R20 is alkyl or trihaloalkyl. R20 may be a C1 to C6 alkyl. In some embodiments, R20 is methyl, ethyl, propyl, or isopropyl. When R20 is trihaloalkyl, the trihaloalkyl may be of the formula -(CH2)n1C(X)3, where n1 is 0, 1, 2, 3, or 4 and X is halo. X may be fluoro (F), bromo (Br), iodo (I), or chloro (Cl). In some embodiments, X is fluoro (F). In some embodiments, n1 is 1 and X is fluoro (F). In some embodiments, R1 is -S(O)-CH3. In some embodiments, R1 is -S(O)-CF3.
[00109] R1 may be sulfonyl. The sulfonyl may be of the formula -S(O)2-R30, where R30 is alkyl, such as C1 to C6 alkyl, or trihaloalkyl. In some embodiments, R30 is methyl, ethyl, propyl, or isopropyl. When R30 is trihaloalkyl, the trihaloalkyl may be of the formula -(CH2)n1C(X)3, where n1 is 0, 1, 2, 3, or 4 and X is halo. X may be fluoro (F), bromo (Br), iodo (I), or chloro (Cl). In some embodiments, X is fluoro (F). In some embodiments, n1 is 1 and X is fluoro (F). In some embodiments, R1 is -S(O)2-CH3. In some embodiments, R1 is -S(O)2-CF3.
[00110] R1 may be alkynyl. The alkynyl may be of the formula -C≡C-(CH2)n3-R40, where n3 is 1, 2, 3, or 4 and R40 may be CH3 or an amine. In some embodiments, n3 is 1 or 2. The amine may be a primary amine, a secondary amine, or a tertiary amine. In some embodiments, the amine is a primary amine. In some embodiments, R1 is -C≡C-(CH2)1-NH2.
[00111] R2 can be hydrogen or halo. In some embodiments, R2 is fluoro (F). In some embodiments, R2 is iodo (I). In some embodiments, R2 is chloro (Cl). In some embodiments, R2 is bromo (Br).
[00112] In some embodiments, R1 and R2 are each independently halo. In some such embodiments, both R1 and R2 are F. In other such embodiments, both R1 and R2 are Cl.
[00113] In some embodiments, the nucleobase of the dGTP analog is of Formula 6. It is understood that the point of attachment bond is covalently coupled to the 1ʹ position of the deoxyribose of the dGTP analog of Formula 6.
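Several R1 groups in [00106] through [00109] are defined by small integer and halogen variables (n1, n2, X), so their finite combinations can be enumerated to make the described space explicit. The Python sketch below covers only the trihaloalkyl and cyano groups and the trihaloalkyl branches of sulfinyl and sulfonyl; the alkyl branches (C1 to C6, including branched and cyclic forms) are omitted because their isomer count is open-ended. The condensed-formula strings are an illustrative notation rather than a chemical file format, and the function names are hypothetical.

```python
HALOGENS = ("F", "Cl", "Br", "I")
N_RANGE = range(0, 5)  # n1 and n2 may each be 0, 1, 2, 3, or 4


def _chain(n: int, tail: str) -> str:
    """Prefix tail with n CH2 units, e.g. _chain(2, "CN") -> "CH2-CH2-CN"."""
    return "CH2-" * n + tail


def r1_options() -> dict[str, list[str]]:
    """Enumerate the finite R1 families of [00106]-[00109] as condensed formulas."""
    # -(CH2)n1C(X)3 for every allowed n1 and halogen X
    trihalo = [_chain(n1, f"C{x}3") for n1 in N_RANGE for x in HALOGENS]
    return {
        "trihaloalkyl": ["-" + t for t in trihalo],
        "cyano": ["-" + _chain(n2, "CN") for n2 in N_RANGE],
        # Only the trihaloalkyl branch of R20 / R30; alkyl branches are omitted.
        "sulfinyl": ["-S(O)-" + t for t in trihalo],
        "sulfonyl": ["-S(O)2-" + t for t in trihalo],
    }


if __name__ == "__main__":
    for family, groups in r1_options().items():
        print(family, len(groups), groups[:2])
```

The enumeration shows, for example, that the trifluoromethyl group of 7-deaza-7-CF3-dGTP is the n1 = 0, X = F member of a 20-member trihaloalkyl family.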
[00114] In Formula 6, R100 is alkyl or H. In some embodiments, R100 is H. R100 may be a C1 to C6 alkyl. In some embodiments, R100 is methyl, ethyl, propyl, or isopropyl. In some embodiments, R100 is -CH3 (methyl).
[00115] Table 1 provides the structure, the common name, and the abbreviated name of exemplary dGTP analogs. Only the nucleobase portion of the dGTP analog is shown. The point of attachment bond is covalently coupled to the 1ʹ position of the deoxyribose of the dGTP analog.
[00116] Table 1: Exemplary dGTP analogs (Structure | Common Name | Abbreviation; structure drawings not reproduced)
7-deaza-dGTP | -
7-deaza-7-propargylamino-dGTP | 7-deaza-7-paraG-dGTP; 7-deaza-paraG-dGTP; 7-…
7-deaza-acetoxy-dGTP | -
[00117] Described herein are methods for producing a nucleic acid by synthesis in the presence of a dGTP analog. In some embodiments, the synthesis is performed in the absence of potassium ions. That is, one or more of the amplification reactions, compositions, articles, and kits described herein are free of potassium salts that can dissociate to form potassium ions. Without intending to be limiting, potassium ions are able to stabilize secondary structures such as G-quadruplexes. As such, it may be advantageous to perform one or more of the nucleic acid synthesis steps, e.g., amplification and/or first strand synthesis, in the absence of potassium ions. In some such embodiments, a method, composition, article, or kit may not include a salt, or may include a salt that does not contain potassium, such as lithium chloride.
[00118] Arrays
[00119] Some embodiments of the methods, compositions, articles, and kits described herein include an array of amplification sites. An array of amplification sites can be present on one or more substrates. Exemplary types of substrate materials that can be used for an array include glass, modified glass, functionalized glass, inorganic glasses, microspheres (e.g., inert and/or magnetic particles), plastics, polysaccharides, nylon, nitrocellulose, ceramics, resins, silica, silica-based materials, carbon, metals, an optical fiber or optical fiber bundles, polymers, and multiwell (e.g., microtiter) plates. Exemplary plastics include acrylics, polystyrene, copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, and Teflon™. Exemplary silica-based materials include silicon and various forms of modified silicon.
[00120] In particular embodiments, a substrate can be within or part of a vessel such as a well, tube, channel, cuvette, Petri plate, bottle, or the like. A particularly useful vessel is a flow-cell, for example, as described in U.S. Pat. No. 8,241,573 or Bentley et al., Nature 456:53-59 (2008). Exemplary flow-cells are those that are commercially available from Illumina, Inc. (San Diego, Calif.). Another particularly useful vessel is a well in a multiwell plate or microtiter plate.
[00121] In some embodiments, the sites of an array can be configured as features on a surface. The features can be present in any of a variety of desired formats. For example, the sites can be wells, pits, channels, ridges, raised regions, pegs, posts, or the like. As set forth herein, the sites can contain beads. However, in particular embodiments the sites need not contain a bead or particle.
Exemplary sites include wells that are present in substrates used for commercial sequencing platforms sold by 454 LifeSciences (a subsidiary of Roche, Basel, Switzerland) or Ion Torrent (a subsidiary of Life Technologies, Carlsbad, Calif.). Other substrates having wells include, for example, etched fiber optics and other substrates described in U.S. Pat. No. 6,266,459; U.S. Pat. No. 6,355,431; U.S. Pat. No. 6,770,441; U.S. Pat. No. 6,859,570; U.S. Pat. No. 6,210,891; U.S. Pat. No. 6,258,568; U.S. Pat. No. 6,274,320; U.S. Pat. No. 8,262,900; U.S. Pat. No. 7,948,015; U.S. Pat. Pub. No. 2010/0137143; U.S. Pat. No. 8,349,167; or PCT Publication No. WO 00/63437. In several of these references, the substrates are exemplified for applications that use beads in the wells. The well-containing substrates can be used with or without beads in the methods or compositions of the present disclosure. In some embodiments, wells of a substrate can include gel material (with or without beads) as set forth in U.S. Pat. No. 9,512,422.
[00122] The sites of an array can be metal features on a non-metallic surface such as glass, plastic, or other materials exemplified herein. A metal layer can be deposited on a surface using methods known in the art such as wet plasma etching, dry plasma etching, atomic layer deposition, ion beam etching, chemical vapor deposition, vacuum sputtering, or the like. Any of a variety of commercial instruments can be used as appropriate including, for example, the FlexAL®, OpAL®, Ionfab 300Plus®, or Optofab 3000® systems (Oxford Instruments, UK). A metal layer can also be deposited by e-beam evaporation or sputtering as set forth in Thornton, Ann. Rev. Mater. Sci. 7:239-60 (1977). Metal layer deposition techniques, such as those exemplified herein, can be combined with photolithography techniques to create metal regions or patches on a surface. Exemplary methods for combining metal layer deposition techniques and photolithography techniques are provided in U.S. Pat. No. 8,778,848 and U.S. Pat. No. 8,895,249.
[00123] In particular embodiments, an array can include a collection of beads or other particles. The particles can be suspended in a solution or they can be located on the surface of a substrate. Examples of bead arrays in solution are those commercialized by Luminex (Austin, Tex.). Examples of arrays having beads located on a surface include those wherein beads are located in wells such as a BeadChip array (Illumina Inc., San Diego, Calif.) or substrates used in sequencing platforms from 454 LifeSciences (a subsidiary of Roche, Basel, Switzerland) or Ion Torrent (a subsidiary of Life Technologies, Carlsbad, Calif.). Other arrays having beads located on a surface are described in U.S. Pat. No. 6,266,459; U.S. Pat. No. 6,355,431; U.S. Pat. No. 6,770,441; U.S. Pat. No. 6,859,570; U.S. Pat. No. 6,210,891; U.S. Pat. No. 6,258,568; U.S. Pat. No. 6,274,320; US 2009/0026082 A1; US 2009/0127589 A1; US 2010/0137143 A1; US 2010/0282617 A1; or PCT Publication No. WO 00/63437. Several of the above references describe methods for attaching target nucleic acids to beads prior to loading the beads in or on an array substrate. It will, however, be understood that the beads can be made to include amplification primers and the beads can then be used to load an array, thereby forming amplification sites for use in a method set forth herein. As set forth previously herein, the substrates can be used without beads. For example, amplification primers can be attached directly to the wells or to gel material in wells. Thus, the references are illustrative of materials, compositions, or apparatus that can be modified for use in the methods and compositions set forth herein.
[00124] In particular embodiments, a capture agent, such as a capture nucleic acid, can be attached to the amplification site. For example, the capture agent can be attached to the surface of a feature of an array. The attachment can be via an intermediate structure such as a bead, particle, or gel. An example of attachment of capture nucleic acids to an array via a gel is described in U.S. Pat. No. 8,895,249 and further exemplified by flow-cells available commercially from Illumina Inc. (San Diego, Calif.) or described in WO 2008/093098. Exemplary gels that can be used in the methods and apparatus set forth herein include, but are not limited to, those having a colloidal structure, such as agarose; a polymer mesh structure, such as gelatin; or a cross-linked polymer structure, such as polyacrylamide, SFA (see, for example, U.S. Pat. App. Pub. No. 2011/0059865 A1), or PAZAM (see, for example, U.S. Prov. Pat. App. Ser. No. 61/753,833 and U.S. Pat. No. 9,012,022). Attachment via a bead can be achieved as exemplified in the description and cited references set forth previously herein.
[00125] Amplification sites of an array can include a plurality of capture agents capable of binding to target nucleic acids. In one embodiment, a capture agent includes a capture nucleic acid. In typical conditions used to prepare arrays for sequencing, the nucleotide sequence of the capture nucleic acid is complementary to a sequence of one or more modified target nucleic acids, such as a universal capture binding sequence present on a target nucleic acid. In some embodiments, the capture nucleic acid can also function as a primer for amplification of the modified target nucleic acid.
In some embodiments, one population of capture nucleic acids includes a P5 primer or the complement thereof, and a second population of capture nucleic acids includes a P7 primer or the complement thereof.
[00126] A capture nucleic acid can be immobilized by single-point covalent attachment to an array at or near the 5' end of the capture nucleic acid, leaving the template-specific portion of the capture nucleic acid free to anneal to its cognate universal capture binding sequence and the 3' hydroxyl group free for extension. Any suitable covalent attachment means known in the art may be used for this purpose. The chosen attachment chemistry will depend on the nature of the solid support and any derivatization or functionalization applied to it. The capture nucleic acid itself may include a moiety, which may be a non-nucleotide chemical modification, to facilitate attachment. In a particular embodiment, the primer may include a sulfur-containing nucleophile, such as phosphorothioate or thiophosphate, at the 5' end.
[00127] In some embodiments, the features on the surface of an array substrate are non-contiguous, being separated by interstitial regions of the surface. Interstitial regions that have a substantially lower quantity or concentration of capture agents, compared to the features of the array, are advantageous. Interstitial regions that lack capture agents are particularly advantageous. For example, a relatively small amount or absence of capture moieties at the interstitial regions favors localization of target nucleic acids, and subsequently generated clusters, to desired features. In particular embodiments, the features can be concave features in a surface (e.g., wells) and the features can contain a gel material. The gel-containing features can be separated from each other by interstitial regions on the surface where the gel is substantially absent or, if present, the gel is substantially incapable of supporting localization of nucleic acids. Methods and compositions for making and using substrates having gel-containing features, such as wells, are set forth in U.S. Pat. No. 9,512,422. The size of the features and/or spacing between the regions can vary such that arrays can be high density, medium density, or lower density. High density arrays are characterized as having regions separated by less than about 15 µm. Medium density arrays have regions separated by about 15 to 30 µm, while low density arrays have regions separated by greater than 30 µm. An array useful in the disclosure can have regions that are separated by less than 100 µm, 50 µm, 10 µm, 5 µm, 1 µm, or 0.5 µm.
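The spacing thresholds in [00127] map directly onto a small classifier. The Python sketch below is illustrative: it treats "about 15" and "about 30" µm as hard cutoffs (assigning exactly 15 and 30 µm to medium density, an assumption) and treats spacing as center-to-center pitch. It also adds a hypothetical helper that converts a pitch into sites per mm² for an ideal hexagonal lattice, one of the patterns listed in [00128]; in such a lattice each site occupies a unit cell of area (√3/2)·p². Both function names are hypothetical.

```python
import math


def array_density_class(spacing_um: float) -> str:
    """Classify an array by center-to-center spacing between regions (um).

    Cutoffs follow [00127]: high below about 15 um, medium about 15-30 um,
    low above 30 um. Exact boundary handling is an assumption.
    """
    if spacing_um <= 0:
        raise ValueError("spacing must be positive")
    if spacing_um < 15:
        return "high"
    if spacing_um <= 30:
        return "medium"
    return "low"


def hex_sites_per_mm2(pitch_um: float) -> float:
    """Site density of an ideal hexagonal lattice with nearest-neighbor pitch p (um).

    Each site occupies a rhombic unit cell of area (sqrt(3) / 2) * p**2;
    1 mm^2 = 1e6 um^2.
    """
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    cell_area_um2 = math.sqrt(3) / 2 * pitch_um ** 2
    return 1e6 / cell_area_um2


if __name__ == "__main__":
    for spacing in (0.5, 20.0, 45.0):
        print(spacing, array_density_class(spacing))
    print(f"{hex_sites_per_mm2(1.0):,.0f} sites/mm^2 at 1 um hexagonal pitch")
```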
[00128] In some embodiments, the solid support comprises a patterned surface. A "patterned surface" refers to an arrangement of different regions in or on an exposed layer of a solid support. For example, one or more of the regions can be features where one or more amplification primers are present. In some embodiments, the pattern can be an x-y format of features that are in rows and columns. In some embodiments, the pattern can be a repeating arrangement of features and/or interstitial regions. In some embodiments, the pattern can be a random arrangement of features and/or interstitial regions. In some embodiments, the pattern can appear as a grid of spots or patches. The features can be located in a repeating pattern or in
IP-2724-PCT/531.2724WO01 an irregular non-repeating pattern. Particularly useful patterns are hexagonal patterns, rectilinear patterns, grid patterns, patterns having reflective symmetry, patterns having rotational symmetry, or the like. Asymmetric patterns can also be useful. The pitch can be the same between different pairs of nearest neighbor features or the pitch can vary between different pairs of nearest neighbor features. In particular embodiments, features of an array can each have an area that is larger than about 100 nm2, 250 nm2, 500 nm2, 1 ^m2, 2.5 ^m2, 5 ^m2, 10 ^m2, 100 ^m2, or 500 ^m2. Alternatively, or additionally, features of an array can each have an area that is smaller than about 1 mm2, 500 ^m2, 100 ^m2, 25 ^m2, 10 ^m2, 5 ^m2, 1 ^m2, 500 nm2, or 100 nm2. Indeed, a region can have a size that is in a range between an upper and lower limit selected from those exemplified above. Exemplary patterned surfaces that can be used in the methods and compositions set forth herein are described in U.S. Pat. Nos. 8,778,848, 8,778,849 and 9,079,148, and U.S. Pat. Appl. Pub. No.2014/0243224. [00129] The features in a patterned surface can be wells in an array of wells (e.g., microwells or nanowells) on glass, silicon, plastic or other suitable solid supports with patterned, covalently-linked gel such as poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide) (PAZAM, see, for example, US Pub. No. 2013/184796, WO 2016/066586, and WO 2015/002813). The process can create gel pads used for sequencing that can be stable over sequencing runs with a large number of cycles. The covalent linking of the polymer to the wells is helpful for maintaining the gel in the structured features throughout the lifetime of the structured substrate during a variety of uses. However, in many embodiments the gel need not be covalently linked to the wells. For example, in some conditions silane free acrylamide (SFA, see, for example, US Pat. 
No. 8,563,477), which is not covalently attached to any part of the structured substrate, can be used as the gel material. [00130] In particular embodiments, a structured substrate can be made by patterning a solid support material with wells (e.g., microwells or nanowells), coating the patterned support with a gel material (e.g., PAZAM, SFA, or chemically modified variants thereof, such as the azidolyzed version of SFA (azido-SFA)), and polishing the gel-coated support, for example via chemical or mechanical polishing, thereby retaining gel in the wells but removing or inactivating substantially all of the gel from the interstitial regions on the surface of the structured
substrate between the wells. Primer nucleic acids can be attached to the gel material. A solution of modified target nucleic acids can then be contacted with the polished substrate such that individual modified target nucleic acids will seed individual wells via interactions with primers attached to the gel material; however, the target nucleic acids will not occupy the interstitial regions due to absence or inactivity of the gel material. Amplification of the modified target nucleic acids will be confined to the wells since absence or inactivity of gel in the interstitial regions prevents outward migration of the growing nucleic acid colony. The process is convenient to manufacture, being scalable and utilizing conventional micro- or nanofabrication methods. [00131] Target nucleic acids [00132] Some embodiments of the methods, compositions, articles, and kits described herein include target nucleic acids. The terms “target nucleic acid,” “target fragment,” “target nucleic acid fragment,” “target molecule,” and “target nucleic acid molecule” are used interchangeably to refer to nucleic acid molecules that are to be sequenced, such as on an array. The target nucleic acid may be essentially any nucleic acid of known or unknown sequence. It may be, for example, a fragment of genomic DNA or cDNA. Sequencing may result in determination of the sequence of the whole, or a part of the target molecule. The targets can be derived from a primary nucleic acid sample that has been randomly fragmented. In one embodiment, the targets can be processed into templates suitable for amplification by the placement of universal amplification sequences, e.g., sequences present in a universal adaptor.
[00133] The primary nucleic acid sample may originate in double-stranded DNA (dsDNA) form (e.g., genomic DNA fragments, amplification products and the like) from a sample or may have originated in single-stranded form from a sample, as DNA or RNA, and been converted to dsDNA form. By way of example, mRNA molecules may be copied into double-stranded cDNAs suitable for use in a method described herein using standard techniques well known in the art. The precise sequence of the polynucleotide molecules from a primary nucleic acid sample is generally not material to the disclosure, and may be known or unknown. [00134] In one embodiment, the primary polynucleotide molecules from a primary nucleic acid sample are DNA molecules. More particularly, the primary polynucleotide molecules
represent the entire genetic complement of an organism, and are genomic DNA molecules which include both intron and exon sequences, as well as non-coding regulatory sequences such as promoter and enhancer sequences. In one embodiment, particular sub-sets of polynucleotide sequences or genomic DNA can be used, such as, for example, particular chromosomes. Yet more particularly, the sequence of the primary polynucleotide molecules is not known. Still yet more particularly, the primary polynucleotide molecules are human genomic DNA molecules. The DNA target nucleic acids may be treated chemically or enzymatically either prior or subsequent to any random fragmentation processes, and prior or subsequent to the ligation of a universal sequence, such as universal adapter sequences. [00135] The nucleic acid sample can include high molecular weight material such as genomic DNA (gDNA). The sample can include low molecular weight material such as nucleic acid molecules obtained from FFPE or archived DNA samples. In another embodiment, low molecular weight material includes enzymatically or mechanically fragmented DNA. The sample can include cell-free circulating DNA. A sample can include, but is not limited to, nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, plasma, semen, hair, laser capture micro-dissections, surgical resections, and other clinical or laboratory obtained samples. In some embodiments, the sample can be an epidemiological, agricultural, forensic or pathogenic sample. [00136] The biological source of a sample is not intended to be limiting. In some embodiments, the sample can include nucleic acid molecules obtained from a eukaryote, such as an animal or a plant. Examples of an animal include, but are not limited to, a mammal including a human. In some embodiments, the sample can include nucleic acid molecules obtained from a prokaryote, such as a bacterium or archaeon.
In some embodiments, the sample can include nucleic acid molecules obtained from a virus. In some embodiments, the source of the nucleic acid molecules may be an archived or extinct sample or species. [00137] Random fragmentation refers to the fragmentation of a polynucleotide molecule from a primary nucleic acid sample in a non-ordered fashion by enzymatic, chemical or mechanical means. Such fragmentation methods are known in the art and use standard methods (Sambrook and Russell, Molecular Cloning, A Laboratory Manual, third edition). In one
embodiment, enzymatic fragmentation can be accomplished using a process often referred to as tagmentation. Tagmentation uses a transposome complex that can include both transposon and transposase and combines fragmentation and ligation into a single step to add universal sequences that can be used as universal adapters or for the addition of other universal sequences (Gunderson et al., WO 2016/130704). For the sake of clarity, generating smaller fragments of a larger piece of nucleic acid via specific PCR amplification of such smaller fragments is not equivalent to fragmenting the larger piece of nucleic acid because the larger piece of nucleic acid remains intact (i.e., is not fragmented by the PCR amplification). Moreover, random fragmentation is designed to produce fragments irrespective of the sequence identity or position of nucleotides comprising and/or surrounding the break. More particularly, the random fragmentation is by mechanical means such as nebulization or sonication to produce fragments of about 50 base pairs in length to about 1500 base pairs in length, still more particularly 50-700 base pairs in length, yet more particularly 50-400 base pairs in length. Most particularly, the method is used to generate smaller fragments of 50-150 base pairs in length. [00138] Fragmentation of polynucleotide molecules by mechanical means (nebulization, sonication and HydroShear, for example) results in fragments with a heterogeneous mix of blunt and 3'- and 5'-overhanging ends. It is therefore desirable to repair the fragment ends using methods or kits (such as the Lucigen DNA terminator End Repair Kit) known in the art to generate ends that are optimal for insertion, for example, into blunt sites of cloning vectors. In a particular embodiment, the fragment ends of the population of nucleic acids are blunt ended. More particularly, the fragment ends are blunt ended and phosphorylated.
The phosphate moiety can be introduced via enzymatic treatment, for example, using polynucleotide kinase. [00139] A population of target nucleic acids, or amplicons thereof, can have an average strand length that is desired or appropriate for a particular application of the methods or compositions set forth herein. For example, the average strand length can be less than about 100,000 nucleotides, 50,000 nucleotides, 10,000 nucleotides, 5,000 nucleotides, 1,000 nucleotides, 500 nucleotides, 100 nucleotides, or 50 nucleotides. Alternatively, or additionally, the average strand length can be greater than about 10 nucleotides, 50 nucleotides, 100 nucleotides, 500 nucleotides, 1,000 nucleotides, 5,000 nucleotides, 10,000 nucleotides,
50,000 nucleotides, or 100,000 nucleotides. The average strand length for a population of target nucleic acids, or amplicons thereof, can be in a range between a maximum and minimum value set forth above. It will be understood that amplicons generated at an amplification site (or otherwise made or used herein) can have an average strand length that is in a range between an upper and lower limit selected from those exemplified above. [00140] In some cases, a population of target nucleic acids can be produced under conditions or otherwise configured to have a maximum length for its members. For example, the maximum length for the members that are used in one or more steps of a method set forth herein or that are present in a particular composition can be less than 100,000 nucleotides, less than 50,000 nucleotides, less than 10,000 nucleotides, less than 5,000 nucleotides, less than 1,000 nucleotides, less than 500 nucleotides, less than 100 nucleotides, or less than 50 nucleotides. Alternatively, or additionally, a population of target nucleic acids, or amplicons thereof, can be produced under conditions or otherwise configured to have a minimum length for its members. For example, the minimum length for the members that are used in one or more steps of a method set forth herein or that are present in a particular composition can be more than 10 nucleotides, more than 50 nucleotides, more than 100 nucleotides, more than 500 nucleotides, more than 1,000 nucleotides, more than 5,000 nucleotides, more than 10,000 nucleotides, more than 50,000 nucleotides, or more than 100,000 nucleotides. The maximum and minimum strand length for target nucleic acids in a population can be in a range between a maximum and minimum value set forth above.
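The relationship between random fragmentation, average strand length, and minimum/maximum length bounds can be illustrated with a short simulation. This is a minimal sketch, assuming that break points fall uniformly at random along the molecule (in keeping with fragmentation "irrespective of the sequence identity or position of nucleotides"); the function names and the 100,000-nucleotide starting length are hypothetical and chosen only for illustration. The 50-400 window mirrors one of the size ranges recited above.

```python
import random
from statistics import mean


def random_fragment(length, n_breaks, seed=0):
    """Break a molecule of `length` nt at `n_breaks` uniformly random positions.

    Break positions are chosen without regard to sequence, so this models
    random (not sequence-specific) fragmentation. Returns fragment lengths,
    which always sum to `length`.
    """
    rng = random.Random(seed)
    # Distinct internal break points between nucleotide 1 and length - 1.
    cuts = sorted(rng.sample(range(1, length), n_breaks))
    edges = [0] + cuts + [length]
    return [right - left for left, right in zip(edges, edges[1:])]


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= f <= max_len]


frags = random_fragment(100_000, 499, seed=1)   # hypothetical input molecule
kept = size_select(frags, 50, 400)
print(len(frags), sum(frags))                   # 500 100000
print("mean kept length:", round(mean(kept)) if kept else None)
```

Because the breaks are random, the fragment lengths are broadly distributed around the overall average (here 100,000 / 500 = 200 nt), and applying a minimum and maximum length narrows the population to the selected window.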
It will be understood that amplicons generated at an amplification site (or otherwise made or used herein) can have maximum and/or minimum strand lengths in a range between the upper and lower limits exemplified above. [00141] In particular embodiments, the target nucleic acids are sized relative to the area of the amplification sites, for example, to facilitate exclusion amplification. For example, the area for each of the sites of an array can be greater than the diameter of the excluded volume of the target nucleic acids in order to achieve exclusion amplification. Taking, for example, embodiments that use an array of features on a surface, the area for each of the features can be greater than the diameter of the excluded volume of the target nucleic acids that are transported to the amplification sites. The excluded volume for a target nucleic acid and its
diameter can be determined, for example, from the length of the target nucleic acid. Methods for determining the excluded volume of nucleic acids and the diameter of the excluded volume are described, for example, in U.S. Pat. No. 7,785,790; Rybenkov et al., Proc. Natl. Acad. Sci. U.S.A. 90:5307-5311 (1993); Zimmerman et al., J. Mol. Biol. 222:599-620 (1991); or Sobel et al., Biopolymers 31:1559-1564 (1991). [00142] In a particular embodiment, the target fragment sequences are prepared with single overhanging nucleotides by, for example, activity of certain types of DNA polymerase such as Taq polymerase or Klenow exo minus polymerase, which have a non-template-dependent terminal transferase activity that adds a single deoxynucleotide, for example, deoxyadenosine (A), to the 3' ends of a DNA molecule, for example, a PCR product. Such enzymes can be used to add a single nucleotide ‘A’ to the blunt ended 3' terminus of each strand of the double-stranded target fragments. Thus, an ‘A’ could be added to the 3' terminus of each end repaired strand of the double-stranded target fragments by reaction with Taq or Klenow exo minus polymerase, while a universal adapter polynucleotide construct could be a T-construct with a compatible ‘T’ overhang present on the 3' terminus of each region of double stranded nucleic acid of the universal adapter. This end modification also prevents self-ligation of both adapter and target such that there is a bias towards formation of the combined ligated adaptor-target-adaptor molecules. [00143] Sequencing Library Preparation [00144] A sequencing library of the methods, compositions, articles, and kits described herein typically includes a target nucleic acid having a universal adapter attached to one or both ends.
The terms “target nucleic acid,” “target fragment,” “target nucleic acid fragment,” “target molecule,” and “target nucleic acid molecule” are used interchangeably to refer to nucleic acid molecules that are to be sequenced. A target nucleic acid having a universal adapter on one or both ends can be referred to as a "modified target nucleic acid." A library of target nucleic acids refers to the collection of target nucleic acids containing known common sequences at their 3' and 5' ends, and may also be referred to as a 3' and 5' modified library.
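The A-tail/T-overhang strategy described in [00142] reduces to a simple compatibility rule: two single-base 3' overhangs anneal only when they are complementary, and ligation also needs a 5'-phosphate on the joining strand. The sketch below is a deliberately simplified model that treats each end as a single overhanging base; the function names are hypothetical. It shows why A-tailed inserts and T-overhang adapters favor adapter-insert joining over insert-insert or adapter-adapter self-ligation.

```python
# Watson-Crick partner for each base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def can_ligate(overhang_a: str, overhang_b: str, has_5p_phosphate: bool = True) -> bool:
    """Return True if two ends carrying single-base 3' overhangs can be joined.

    Simplified model: the overhangs must be complementary so the ends anneal,
    and a 5'-phosphate must be present for DNA ligase to form the
    phosphodiester bond.
    """
    return has_5p_phosphate and COMPLEMENT[overhang_a] == overhang_b


def ligation_outcomes(insert_overhang: str = "A", adapter_overhang: str = "T") -> dict:
    """Evaluate the three possible end pairings in an A-tail/T-overhang reaction."""
    pairings = {
        "insert-insert": (insert_overhang, insert_overhang),
        "adapter-adapter": (adapter_overhang, adapter_overhang),
        "adapter-insert": (adapter_overhang, insert_overhang),
    }
    return {name: can_ligate(a, b) for name, (a, b) in pairings.items()}


print(ligation_outcomes())
# {'insert-insert': False, 'adapter-adapter': False, 'adapter-insert': True}
```

If the insert carried a tail base that is not complementary to the adapter overhang (for example `insert_overhang="G"`), every pairing would fail in this model, which is why the adapter overhang is chosen to match the added ‘A’.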
[00145] Methods for attaching a universal adapter to one or both ends of a target nucleic acid are known to the person skilled in the art. The attachment can be through standard library preparation techniques using ligation (Chesney et al., U.S. Pat. Pub. No. 2018/0305753 A1), through tagmentation using transposase complexes (Gunderson et al., WO 2016/130704), or through primer extension, for instance when preparing a sample for targeted sequencing. [00146] Target nucleic acids are often amplified during sequencing library preparation. Amplification of modified target nucleic acids during sequencing library preparation can be by linear amplification, exponential amplification, or both linear and exponential amplification steps. Amplification conditions useful during sequencing library preparation are routine and known to the person of ordinary skill in the art. For instance, amplification profiles (e.g., number of cycles and the temperature and time of each cycle), and concentrations of target nucleic acids, buffers, ions, dNTPs, and polymerase are known or can be easily determined using commercially available algorithms. [00147] In one embodiment, double-stranded target nucleic acids from a sample, e.g., a fragmented sample, are treated by first ligating identical universal adaptor molecules to the 5' and 3' ends of the double-stranded target nucleic acids (which may be of known, partially known or unknown sequence). In some embodiments, the identical universal adaptor molecules can be ‘mismatched adaptors’, the general features of which are defined below, and further described in Gormley et al., US 7,741,463, and Bignell et al., US 8,053,192. In some embodiments, the identical universal adaptor molecules can include fully complementary polynucleotide strands. A universal adaptor typically includes the universal capture binding sequences that aid in immobilizing the target nucleic acids on an array for subsequent cluster generation.
In one embodiment, library preparation of target nucleic acids having universal adaptor molecules at the 5' and 3' ends includes one or more amplification steps, for instance by PCR, before immobilizing the target nucleic acids on an array for subsequent cluster generation. [00148] In some embodiments, for instance when a universal adapter is added by tagmentation, it is desirable to modify the universal adapter present at each end of target nucleic acids before cluster generation. The modification can occur by an amplification step, such as PCR. For
instance, an initial primer extension reaction is carried out using a universal primer binding site, in which extension products complementary to both strands of each target nucleic acid are formed and a universal capture binding sequence is added. The resulting primer extension products, and amplified copies thereof, collectively provide a library of modified target nucleic acids that can be immobilized, clonally expanded to form clusters, and then sequenced. In some embodiments, a library includes target nucleic acids originating from the same source, e.g., the same tissue, same cell, and/or same individual (for instance, a sample of cell-free DNA). The 3’ ends, and optionally the 5’ ends, of the universal adapters attached to the target nucleic acids can include a homogeneous population or a heterogeneous population of universal capture binding sequences described herein. [00149] Generally, amplification reactions require at least two amplification primers, often denoted ‘forward’ and ‘reverse’ primers (primer oligonucleotides), that are capable of annealing specifically to a part of the nucleic acid sequence to be amplified, e.g., a universal adapter at the ends of target nucleic acids, under conditions encountered in the primer annealing step of each cycle of an amplification reaction. It will be understood by the skilled person that if the primers contain any nucleotide sequence that does not anneal to the modified target nucleic acids in the first amplification cycle, then this sequence may be copied into the amplification products. For instance, when primers having universal capture binding sequences (i.e., sequences that do not anneal to the universal adapter at the ends of target nucleic acids) are used, the universal capture binding sequences will be incorporated into the resulting amplicon. [00150] Amplification primers are generally single stranded polynucleotide structures.
They may also contain a mixture of natural and non-natural bases, as well as natural and non-natural backbone linkages, provided that any non-natural modifications do not preclude function as a primer, that function being defined as the ability to anneal to a template polynucleotide strand under the conditions of the amplification reaction and to act as an initiation point for synthesis of a new polynucleotide strand complementary to the template strand. Primers may additionally include non-nucleotide chemical modifications, for example phosphorothioates to increase exonuclease resistance, again provided that such modifications do not prevent primer function.
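The point made in [00149], that any 5' primer sequence not annealing to the template is copied into the amplicon, can be shown with plain string operations. This is a minimal sketch assuming perfect, gap-free primer binding at a single site; all sequences are short hypothetical placeholders rather than real adapter or capture sequences, and `pcr_product` is an illustrative helper, not a library function.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pcr_product(template: str, fwd_primer: str, rev_primer: str, anneal_len: int) -> str:
    """Top strand (5'->3') of the amplicon produced from `template` (top strand).

    Only the 3'-terminal `anneal_len` bases of each primer must match the
    template; any 5' tail on either primer ends up in the product.
    """
    fwd_site = fwd_primer[-anneal_len:]
    # The reverse primer anneals to the bottom strand, so its binding site
    # appears on the top strand as the reverse complement of its 3' end.
    rev_site = revcomp(rev_primer[-anneal_len:])
    start = template.find(fwd_site)
    end = template.find(rev_site, start + len(fwd_site)) if start >= 0 else -1
    if start < 0 or end < 0:
        raise ValueError("primer binding site not found")
    insert = template[start + len(fwd_site):end]
    return fwd_primer + insert + revcomp(rev_primer)


template = "GGACGTACGATTGCATCCATGCC"  # hypothetical target with two binding sites
fwd = "TTTT" + "ACGTAC"               # hypothetical 5' tail + 6-nt binding region
rev = "GGGG" + "CATGGA"               # hypothetical 5' tail + 6-nt binding region
product = pcr_product(template, fwd, rev, anneal_len=6)
print(product)  # TTTTACGTACGATTGCATCCATGCCCC
```

Neither "TTTT" nor the "CCCC" derived from the reverse primer tail is present in the template, yet both appear at the ends of the product, in the same way that universal capture binding sequences carried on primer tails are incorporated into amplicons.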
[00151] In some embodiments, the universal adapters used in the method of the disclosure are referred to as ‘mismatched’ adaptors because the adaptors include a region of sequence mismatch, i.e., they are not formed by annealing of fully complementary polynucleotide strands. Mismatched adaptors for use herein typically include at least one double-stranded region, also referred to as a region of double stranded nucleic acid, and at least one unmatched single-stranded region, also referred to as a region of single-stranded non-complementary nucleic acid strands. Mismatched adapters are routinely used in producing sequencing libraries, and the characteristics of useful mismatched adapters are known to the skilled person. [00152] The ‘double-stranded region’ of the universal adapter is a short double-stranded region, typically including 5 or more consecutive base pairs, formed by annealing of the two partially complementary polynucleotide strands. As used herein, the term “double stranded,” when used in reference to a nucleic acid molecule, means that substantially all of the nucleotides in the nucleic acid molecule are hydrogen bonded to a complementary nucleotide. A partially double stranded nucleic acid can have at least 10%, 25%, 50%, 60%, 70%, 80%, 90% or 95% of its nucleotides hydrogen bonded to a complementary nucleotide. [00153] The double-stranded region can form the ‘ligatable’ end of the adaptor, i.e., the end that is joined to a double-stranded target nucleic acid in the ligation reaction. The ligatable end of the universal adaptor may be blunt or, in other embodiments, short 5' or 3' overhangs of one or more nucleotides may be present to facilitate/promote ligation. The 5' terminal nucleotide at the ligatable end of the universal adapter is typically phosphorylated to enable phosphodiester linkage to a 3' hydroxyl group on the target polynucleotide.
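The percentages in [00152] and the criterion of 5 or more consecutive base pairs can be checked for a candidate adapter design. The sketch below is a simplification that assumes the two strands have equal length and are aligned antiparallel end to end with no gaps or bulges; it counts Watson-Crick pairs only and ignores duplex thermodynamics. The sequences are hypothetical placeholders with an 8-bp stem and non-complementary arms.

```python
PAIR = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def paired_positions(top: str, bottom: str) -> list:
    """Indices of `top` (5'->3') that pair with `bottom` (5'->3') when the two
    strands are aligned antiparallel, end to end, with no gaps."""
    if len(top) != len(bottom):
        raise ValueError("this sketch assumes equal-length strands")
    n = len(top)
    return [i for i in range(n) if (top[i], bottom[n - 1 - i]) in PAIR]


def fraction_paired(top: str, bottom: str) -> float:
    """Fraction of all nucleotides (both strands) that are base paired."""
    paired = len(paired_positions(top, bottom))
    return 2 * paired / (len(top) + len(bottom))


def longest_stem(top: str, bottom: str) -> int:
    """Length of the longest run of consecutive base pairs."""
    best = run = 0
    prev = None
    for i in paired_positions(top, bottom):
        run = run + 1 if prev is not None and i == prev + 1 else 1
        best = max(best, run)
        prev = i
    return best


stem = "CGTAGCTA"                       # hypothetical double-stranded region
top = "AAAAAAAA" + stem                 # hypothetical unmatched arm + stem
bottom = revcomp(stem) + "AAAAAAAA"     # stem complement + unmatched arm
print(fraction_paired(top, bottom), longest_stem(top, bottom))  # 0.5 8
```

In this example half of the nucleotides are paired, matching the 50% case listed above, and the 8-bp stem meets the 5-or-more-base-pair criterion while the A-rich arms remain single stranded.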
[00154] The term ‘unmatched region’ refers to a region of the universal adaptor, the region of single-stranded non-complementary nucleic acid strands, wherein the sequences of the two polynucleotide strands forming the universal adaptor exhibit a degree of non-complementarity such that the two strands are not capable of fully annealing to each other under standard annealing conditions for a primer extension or PCR reaction. The unmatched region(s) may exhibit some degree of annealing under standard reaction conditions for an
enzyme-catalyzed ligation reaction, provided that the two strands revert to single stranded form under annealing conditions in an amplification reaction. [00155] A universal adapter can include at least one universal primer binding site. A universal primer binding site is a universal sequence that can be used for amplification and/or sequencing of a target nucleic acid attached to the universal adapter. Examples of universal primer binding sites include, but are not limited to, sequences complementary to a Read1 or Read2 primer. [00156] A universal adapter can include at least one index. An index can be used as a marker characteristic of the source of a particular target nucleic acid on an array. Generally, the index is a synthetic sequence of nucleotides that is part of the universal adapter which is added to the target nucleic acids as part of the library preparation step. Accordingly, an index is a nucleic acid sequence which is attached to each of the target molecules of a particular sample, the presence of which is indicative of, or is used to identify, the sample or source from which the target molecules were isolated. [00157] In some embodiments, the index may be up to 20 nucleotides in length, more preferably 1-10 nucleotides, and most preferably 4-8 nucleotides in length. For example, a four-nucleotide index gives a possibility of multiplexing 256 (4⁴) samples on the same array, whereas a six-nucleotide index enables 4,096 (4⁶) samples to be processed on the same array. [00158] In one embodiment, the universal capture binding sequence and/or universal primer binding site is part of the universal adapter when it is ligated to the double-stranded target fragments, and in another embodiment the universal capture binding sequence and/or universal primer binding site is added to the universal adapter after the universal adapter is ligated to the double-stranded target fragments.
The addition can be accomplished using routine methods, including PCR-based methods. [00159] The precise nucleotide sequence of the universal adapters is generally not material to the invention and may be selected by the user such that the desired sequence elements are ultimately included in the common sequences of the plurality of different modified target nucleic acids, for example, to provide for the universal capture binding sequences and universal primer binding sites for particular sets of universal primers. Additional sequence
elements may be included, for example, to provide binding sites for sequencing primers, e.g., Read1 and Read2 primers, which will ultimately be used in sequencing of target nucleic acids in the library, or products derived from amplification of the target nucleic acids in the library, for example on a solid support. In some embodiments, a universal adapter may include mixtures of natural and non-natural nucleotides (e.g., one or more ribonucleotides) linked by a mixture of phosphodiester and non-phosphodiester backbone linkages. [00160] Ligation methods for adding a universal adapter to a target nucleic acid are known in the art and use standard methods. Such methods use ligase enzymes such as DNA ligase to effect or catalyze joining of the ends of the two polynucleotide strands of, in this case, the universal adapter and the double-stranded target nucleic acids, such that covalent linkages are formed. The universal adapter may contain a 5'-phosphate moiety to facilitate ligation to the 3'-OH present on the target fragment. The double-stranded target nucleic acid contains a 5'-phosphate moiety, either residual from the shearing process, or added using an enzymatic treatment step, and has been end repaired, and optionally extended by an overhanging base or bases, to give a 3'-OH suitable for ligation. [00161] As discussed herein, in one embodiment universal adaptors used in the ligation are complete and include a universal capture binding sequence and other universal sequences, e.g., a universal primer binding site and an index sequence. The resulting plurality of modified target nucleic acids can be amplified before immobilization for sequencing. Also, as discussed herein, in one embodiment universal adaptors used in the ligation include a universal primer binding site and an index sequence, and do not include a universal capture binding sequence.
The resulting plurality of modified target nucleic acids can be further modified to include specific sequences, such as a universal capture binding sequence, and can be amplified before immobilization for sequencing. [00162] Immobilizing Modified Target Nucleic Acids at Amplification Sites and Production of Clonal Clusters [00163] The present disclosure includes methods, compositions, articles, and kits related to initial steps of cluster generation, e.g., seeding amplification sites and/or first strand extension. In one embodiment, a method of the present disclosure can include contacting a plurality of
amplification sites of an array with a single-stranded sequencing library. Each amplification site of an array includes at least one, and in some embodiments two or more, populations of capture agents immobilized to the amplification sites. The method includes using conditions suitable for attaching the universal adapter to one of the capture agents to result in a plurality of amplification sites that each include one member of the sequencing library. The conditions useful for the attaching are routinely used in sequencing workflows and are known to the skilled person. [00164] In embodiments where the modified target nucleic acids include at least one universal capture binding sequence and a complementary capture nucleic acid is present in one of the immobilized capture agents, the universal capture binding sequence and the complementary capture nucleic acid hybridize to result in a plurality of amplification sites that each include one member of the sequencing library. The addition of a member of a sequencing library to an amplification site is referred to as “seeding” the site (FIG. 3, block 31). The seeding can be accomplished by use of a seeding reagent. A seeding reagent can include an array of amplification sites and a plurality of target nucleic acids. An example is shown in FIG. 4A, which shows an amplification site 20 containing an immobilized capture agent 24 and a member of a sequencing library 21’. The 3’ end of the member of the sequencing library 21’ is hybridized to a complementary capture nucleic acid that is present in the universal capture binding sequence 25. The skilled person will recognize that some amplification sites can include more than one member of the sequencing library at this stage without significantly reducing the ability to obtain useful data from the subsequent sequencing reaction. The skilled person will also recognize that not all amplification sites of an array need to be occupied.
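Because seeding is a random process, the observations above that some sites may receive more than one library member and that not all sites need to be occupied can be quantified with a simple model. This sketch assumes, for illustration only, that the number of molecules seeding each site follows a Poisson distribution with mean `mean_per_site`; it does not model the exclusion amplification mentioned elsewhere in this disclosure, and the function name is hypothetical.

```python
import math


def poisson_occupancy(mean_per_site: float) -> dict:
    """Fractions of sites seeded by 0, exactly 1, or more than 1 molecule,
    assuming Poisson-distributed seeding with the given mean."""
    p_empty = math.exp(-mean_per_site)
    p_single = mean_per_site * p_empty
    return {"empty": p_empty, "single": p_single, "multiple": 1.0 - p_empty - p_single}


for lam in (0.1, 0.5, 1.0, 2.0):
    occ = poisson_occupancy(lam)
    print(lam, {k: round(v, 3) for k, v in occ.items()})
```

Under this model the single-occupancy fraction λe^(−λ) peaks at λ = 1, where about 36.8% of sites hold exactly one molecule and about 36.8% are empty; loading more molecules per site mainly increases the share of multiply seeded sites.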
[00165] The method can further include first strand synthesis to result in immobilization of a modified target nucleic acid to an amplification site. First strand synthesis and immobilization can be accomplished by extending the 3’ end of the first capture nucleic acid associated with the member of the sequencing library at the amplification sites. The extending includes the incorporation of nucleotides by a DNA polymerase using the attached member of the sequencing library as a template, and results in an extended nucleic acid that is immobilized to the surface of the amplification site. The immobilization can be accomplished by use of
an immobilization reagent. An immobilization reagent can include an array of amplification sites, a plurality of target nucleic acids, dNTPs (e.g., dATP, dTTP, dCTP, and dGTP), and a polymerase. As shown in FIG. 4A, a polymerase extends the immobilized capture agent 24, as indicated by the dashed line, using the nucleotide sequence of the member of the sequencing library 21’ as a template, resulting in an immobilized complement 21 of the member of the sequencing library 21’ (FIG. 4B). Under some conditions, such as when kinetic exclusion is used for cluster generation, seeding and first strand synthesis can occur essentially simultaneously. [00166] The methods of the present disclosure can further include generating clonal clusters, i.e., producing a plurality of amplification sites that each include a clonal population of amplicons derived from the modified target nucleic acid originally present at each amplification site (FIG. 3, block 32). Secondary structure from G-quadruplexes formed during cluster production can reduce the representation of GC-rich regions. In some embodiments, the present disclosure includes the use of a dGTP analog during cluster production to reduce the impact of G-quadruplexes. [00167] In one embodiment, the method can include providing an amplification reagent and an array of amplification sites that include an immobilized nucleic acid. An amplification reagent can include (i) an array of populated amplification sites (e.g., amplification sites seeded with members of a sequencing library), (ii) nucleotide triphosphates (NTPs) including dATP, dTTP, dCTP, dGTP, and a dGTP analog, and (iii) a polymerase. In some embodiments, the amplification reagent does not include a dGTP analog. In some embodiments, the NTPs further include a dGTP analog in addition to dATP, dTTP, dCTP, and dGTP. The amplification sites are populated with an immobilized nucleic acid that is to be clonally amplified.
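To see which regions of a template are likely to be affected by G-quadruplex secondary structure during cluster production, a widely used screening heuristic searches for four runs of three or more G separated by short loops. The sketch below assumes loops of 1-7 nucleotides (a common but not universal choice) and reports C-rich matches as putative G-quadruplexes on the opposite strand. It is an illustrative screen under those assumptions, not a method set forth in this disclosure, and the function names are hypothetical.

```python
import re

# Four G-runs (>= 3 G each) separated by loops of 1-7 nt. The C-pattern finds
# the same motif on the complementary strand of the input sequence.
G4_PLUS = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")
G4_MINUS = re.compile(r"C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def find_putative_g4(seq: str) -> list:
    """Return sorted (start, end, strand) tuples for non-overlapping matches."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


example = "TTGGGAGGGAGGGAGGGTT"  # hypothetical G-rich template fragment
print(find_putative_g4(example))  # [(2, 17, '+')]
print(round(gc_fraction(example), 2))
```

A screen like this only flags sequences with G-quadruplex-forming potential; whether a structure actually forms depends on conditions such as cation concentration and temperature, which the pattern does not capture.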
[00168] In some embodiments, the nucleic acid at each amplification site is a modified target nucleic acid that originally seeded the site, and the clonal amplification includes first strand synthesis and the subsequent amplification. For instance, in embodiments that include clonal cluster generation by kinetic exclusion, first strand synthesis and subsequent amplification can occur essentially simultaneously. In other embodiments, the nucleic acid
at each amplification site includes the complement of the modified target nucleic acid that originally seeded the site, i.e., first strand synthesis has occurred. The amplification reagent is reacted to produce a plurality of populated amplification sites, where the plurality of populated amplification sites each include a clonal population of amplicons, each clonal population being derived from the modified target nucleic acid that originally seeded the site. FIG. 5 shows an example of generating clonal clusters. FIG. 5A shows an amplification site 20 containing immobilized strand 21. Exposure to suitable conditions results in the 3’ end of immobilized strand 21 hybridizing to complementary nucleotides of capture nucleic acid 23 (FIG. 5B), and immobilized strand 21 is used as a template for synthesis initiated from the 3’ end of capture nucleic acid 23 to result in strand 22. [00169] The methods for cluster generation described herein can differ from typical cluster generation due to the inclusion of a dGTP analog. Accordingly, the extension reactions that occur during cluster generation (e.g., extension from capture nucleic acid 23 of FIG. 5B) can include a dGTP analog. The amount of dGTP analog can be described in relation to the normal dGTP present. In one embodiment, the amount of dGTP analog can be expressed as a percentage of the normal dGTP present in an amplification reaction. For instance, an amplification reagent can include both dGTP and a dGTP analog, with the amount of the analog expressed relative to the dGTP present.
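The relative amounts described in [00169] can be converted into concrete concentrations. The text can be read either as the analog's share of the combined dGTP-plus-analog pool or as a ratio to dGTP alone; the sketch below assumes the former. It also estimates how many G positions in a newly synthesized strand would carry the analog if the polymerase incorporated dGTP and the analog in proportion to their mole fractions. That is a simplifying assumption, since polymerases can discriminate between natural and modified nucleotides. The 200 µM total and the example template are hypothetical values, and the function names are illustrative.

```python
def split_g_pool(total_g_um: float, analog_pct: float) -> dict:
    """Split a fixed total G-nucleotide concentration (µM) into dGTP and analog.

    Assumes `analog_pct` is the analog's share of the combined dGTP + analog pool.
    """
    if not 0 <= analog_pct <= 100:
        raise ValueError("analog_pct must be between 0 and 100")
    analog_um = total_g_um * analog_pct / 100
    return {"dGTP_uM": total_g_um - analog_um, "analog_uM": analog_um}


def expected_analog_sites(template: str, analog_pct: float) -> float:
    """Expected number of analog residues in the strand copied from `template`.

    Each C in the template directs incorporation of a G nucleotide; assumes the
    polymerase shows no preference between dGTP and the analog.
    """
    return template.upper().count("C") * analog_pct / 100


print(split_g_pool(200, 25))                     # {'dGTP_uM': 150.0, 'analog_uM': 50.0}
print(expected_analog_sites("CCCCGGGGAT", 50))   # 2.0
```

Under the alternative reading (analog expressed relative to dGTP alone), a value of 25% would instead mean 40 µM analog and 160 µM dGTP for the same 200 µM total, so the convention should be fixed before comparing formulations.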
In some embodiments, the amount of dGTP analog in an amplification reaction can be at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 11%, at least 12%, at least 13%, at least 14%, at least 15%, at least 16%, at least 17%, at least 18%, at least 19%, at least 20%, at least 21%, at least 22%, at least 23%, at least 24%, at least 25%, at least 26%, at least 27%, at least 28%, at least 29%, at least 30%, at least 31%, at least 32%, at least 33%, at least 34%, at least 35%, at least 36%, at least 37%, at least 38%, at least 39%, at least 40%, at least 41%, at least 42%, at least 43%, at least 44%, at least 45%, at least 46%, at least 47%, at least 48%, at least 49%, at least 50%, at least 51%, at least 52%, at least 53%, at least 54%, at least 55%, at least 56%, at least 57%, at least 58%, at least 59%, at least 60%, at least 61%, at least 62%, at least 63%, at least 64%, at least 65%, at least 66%, at least 67%, at least 68%, at least 69%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% of the total amount of dGTP. In some embodiments, the amount of dGTP analog in an amplification reaction can be no greater than 99%, no greater than 98%, no greater than 97%, no greater than 96%, no greater than 95%, no greater than 94%, no greater than 93%, no greater than 92%, no greater than 91%, no greater than 90%, no greater than 89%, no greater than 88%, no greater than 87%, no greater than 86%, no greater than 85%, no greater than 84%, no greater than 83%, no greater than 82%, no greater than 81%, no greater than 80%, no greater than 79%, no greater than 78%, no greater than 77%, no greater than 76%, no greater than 75%, no greater than 74%, no greater than 73%, no greater than 72%, no greater than 71%, no greater than 70%, no greater than 69%, no greater than 68%, no greater than 67%, no greater than 66%, no greater than 65%, no greater than 64%, no greater than 63%, no greater than 62%, no greater than 61%, no greater than 60%, no greater than 59%, no greater than 58%, no greater than 57%, no greater than 56%, no greater than 55%, no greater than 54%, no greater than 53%, no greater than 52%, no greater than 51%, no greater than 50%, no greater than 49%, no greater than 48%, no greater than 47%, no greater than 46%, no greater than 45%, no greater than 44%, no greater than 43%, no greater than 42%, no greater than 41%, no greater than 40%, no greater than 39%, no greater than 38%, no greater than 37%, no greater than 36%, no greater than 35%, no greater than 34%, no greater than 33%, no greater than 32%, no greater than 31%, no greater than 30%, no greater than 29%, no greater than 28%, no greater than 27%, no greater than 26%, no greater than 25%, no greater than 24%, no greater than 23%, no greater than 22%, no greater than 21%, no greater than 20%, no greater than 19%, no greater than 18%, no greater than 17%, no greater than 16%, no greater than 15%, no greater than 14%, no greater than 13%, no greater than 12%, no greater than 11%, no greater than 10%, no greater than 9%, no greater than 8%, no greater than 7%, no greater than 6%, no greater than 5%, or no greater than 4% of the total amount of dGTP. In some embodiments, the amount of dGTP analog in an amplification reaction is 100%, that is, no natural dGTP is present, only the dGTP analog and the other dNTPs useful in an amplification, such as dATP, dCTP, and dTTP.
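The percentage convention above can be turned into reagent concentrations with simple arithmetic. The Python sketch below assumes that the stated percentage is the analog's share of the combined guanine-nucleotide pool (natural dGTP plus dGTP analog), which is consistent with 100% meaning that no natural dGTP is present. The names `split_guanine_pool` and `GuanineMix`, and the 200 µM total used in the usage lines, are hypothetical and chosen only for illustration; they are not concentrations taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuanineMix:
    """Concentrations (µM) of natural dGTP and a dGTP analog in one reagent."""
    dgtp_um: float
    analog_um: float


def split_guanine_pool(total_g_um: float, analog_percent: float) -> GuanineMix:
    """Split a total guanine-nucleotide concentration into dGTP and analog.

    Assumption: analog_percent is the analog's share of (dGTP + analog),
    so 100 means analog only and no natural dGTP.
    """
    if total_g_um <= 0:
        raise ValueError("total_g_um must be positive")
    if not 0 <= analog_percent <= 100:
        raise ValueError("analog_percent must be between 0 and 100")
    analog_um = total_g_um * analog_percent / 100.0
    return GuanineMix(dgtp_um=total_g_um - analog_um, analog_um=analog_um)


if __name__ == "__main__":
    # Hypothetical 200 µM guanine-nucleotide pool with 10% analog.
    print(split_guanine_pool(200.0, 10.0))
    # Analog only: no natural dGTP remains.
    print(split_guanine_pool(200.0, 100.0))
```

In this sketch, a 10% analog share of a 200 µM pool gives 20 µM analog and 180 µM dGTP. Holding the total fixed means that only the identity of the guanine nucleotide, not the overall guanine-nucleotide concentration, changes as the analog percentage is varied.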
[00170] Examples of ranges of the amount of dGTP analog in an amplification reaction include, but are not limited to, a lower amount of the range selected from at least 3% to at least 24% and a higher amount of the range selected from no greater than 25% to no greater than 4%, for instance, at least 3% to no greater than 25%, at least 3% to no greater than 7%, at least 7% to no greater than 12%, at least 12% to no greater than 17%, or at least 17% to no greater than 22%. Other examples of ranges of the amount of dGTP analog in an amplification reaction include, but are not limited to, a lower amount of the range selected from at least 3% to at least 13% and a higher amount of the range selected from no greater than 16% to no greater than 6%, for instance, at least 3% to no greater than 16%, at least 5% to no greater than 14%, or at least 7% to no greater than 12%. [00171] dGTP analogs useful in clustering include those having the nucleobase of Formula 5:
where J, Z, R1 and R2 are described herein. Other dGTP analogs useful in clustering include those having the nucleobase of Formula 6:
wherein R100 is described herein. [00172] Examples of dGTP analogs useful in clustering include, but are not limited to, 7-deaza-dGTP, 7-deaza-7-trifluoromethyl-dGTP, 7-deaza-7-methyl sulfoxide-dGTP, 7-deaza-7-cyano-dGTP, 8-aza-7-deaza-dGTP, 7-deaza-7-propargylamino-dGTP, 7-deaza-7-iodo-dGTP, 7-deaza-7-fluoro-dGTP, 7-deaza-7-trifluoromethylsulfone-dGTP, 7-deaza-7-trifluoromethylsulfoxide-dGTP, and 7-deaza-acetoxy-dGTP. In one embodiment, a dGTP analog useful in clustering is 7-deaza-MeSO2-dGTP. [00173] In some embodiments, an array includes two populations of primers (e.g., capture nucleic acids) immobilized at amplification sites. In some embodiments, the amplification sites of the array include one population of a first primer (e.g., a first capture nucleic acid) immobilized thereto, and a second primer (e.g., a second nucleic acid) can be provided in solution during the reacting. In practice, there will be a plurality of identical first primers and/or a plurality of identical second primers immobilized at the amplification sites, as the amplification process requires an excess of primers to sustain amplification. [00174] As will be appreciated by the person of ordinary skill in the art, any given amplification reaction requires at least one type of forward primer and at least one type of reverse primer specific for the target nucleic acid to be amplified. However, in certain embodiments the forward and reverse primers may include target-specific portions of identical sequence, and may have entirely identical nucleotide sequence and structure (including any non-nucleotide modifications). In other words, it is possible to carry out amplification at amplification sites using only one type of primer, and such single-primer methods are encompassed within the scope of the disclosure. Other embodiments may use forward and reverse primers that contain identical target-specific sequences but differ in some other structural feature. For example, one type of primer may contain a non-nucleotide modification that is not present in the other. [00175] The production of a plurality of populated amplification sites on an array typically occurs by amplification at each amplification site.
The term "solid-phase amplification" as used herein refers to any nucleic acid amplification reaction carried out on or in association with an array such that all or a portion of the amplified products are immobilized at amplification sites on the array as they are formed. In particular, the term encompasses solid-phase polymerase chain reaction (solid-phase PCR) and solid-phase isothermal amplification, which are reactions analogous to standard solution-phase amplification except that one or both of the forward and reverse capture agents (e.g., amplification primers) are immobilized on the array. Solid-phase PCR covers systems such as emulsions, where one primer is anchored to, for instance, a bead and the other is in free solution, and colony formation in solid-phase gel matrices, where one primer is anchored to the array and the other is in free solution. [00176] In one embodiment, a plurality of target nucleic acids is used to prepare clustered arrays of nucleic acid colonies, analogous to those described in U.S. Pub. No. 2005/0100900, U.S. Pat. No. 7,115,400, WO 00/18957 and WO 98/44151, by solid-phase amplification, such as solid-phase isothermal amplification. The terms "cluster" and "colony" are used interchangeably herein to refer to a discrete site on a solid support including a plurality of identical immobilized nucleic acid strands and a plurality of identical immobilized complementary nucleic acid strands. The term "clustered array" refers to an array formed from such clusters or colonies. [00177] Clustered arrays can be prepared using either a process of thermocycling, as described in WO 98/44151, or a process where the temperature is maintained at a constant value and the cycles of extension and denaturing are performed using changes of reagents. Such isothermal amplification methods include, but are not limited to, bridge amplification and exclusion amplification (ExAmp, also referred to as kinetic exclusion amplification (KEA)). Isothermal amplification methods are described in WO 02/46456, U.S. Pub. No. 2008/0009420, U.S. Pat. No. 8,895,249, U.S. Pub. No. 2013/0338042, and U.S. Pat. No. 9,169,513. Isothermal amplification by exclusion amplification may be used with, for instance, the Bsu (Bacillus subtilis) DNA polymerase or the large fragment of Bsu. Isothermal amplification by bridge amplification may be used with, for instance, the Bst (Bacillus stearothermophilus) DNA polymerase. Optionally, the polymerase is deficient in 5' exonuclease activity, 3' exonuclease activity, or both activities.
In some embodiments, cluster generation can be accomplished using commercially available machines such as the cBot (Illumina, San Diego, CA) and certain sequencing instruments such as iSeq 100, MiniSeq, NextSeq 550 Series, NextSeq 1000 & 2000, NovaSeq 6000 Series, and NovaSeq X Series (Illumina, San Diego, CA). [00178] Examples of dGTP analogs useful during cluster generation by bridge amplification, and likewise during cluster generation by exclusion amplification, include, but are not limited to, 7-deaza-dGTP, 7-deaza-7-trifluoromethyl-dGTP, 7-deaza-7-methyl sulfoxide-dGTP, 7-deaza-7-cyano-dGTP, 8-aza-7-deaza-dGTP, 7-deaza-7-propargylamino-dGTP, 7-deaza-7-iodo-dGTP, 7-deaza-7-fluoro-dGTP, 7-deaza-7-trifluoromethylsulfone-dGTP, 7-deaza-7-trifluoromethylsulfoxide-dGTP, and 7-deaza-acetoxy-dGTP. [00179] It will be appreciated that any of the amplification methodologies described herein or generally known in the art may be used with universal or target-specific primers to amplify immobilized DNA fragments. Suitable methods for amplification include, but are not limited to, the polymerase chain reaction (PCR), strand displacement amplification (SDA), transcription mediated amplification (TMA) and nucleic acid sequence-based amplification (NASBA), as described in U.S. Pat. No. 8,003,354. The amplification methods may be employed to amplify one or more nucleic acids of interest. For example, PCR, including multiplex PCR, SDA, TMA, NASBA and the like may be utilized to amplify immobilized DNA fragments. In some embodiments, primers directed specifically to the polynucleotide of interest are included in the amplification reaction. [00180] Other suitable methods for amplification of target nucleic acids may include oligonucleotide extension and ligation, rolling circle amplification (RCA) (Lizardi et al., Nat. Genet. 19:225-232 (1998)) and oligonucleotide ligation assay (OLA) (see generally U.S. Pat. Nos. 7,582,420, 5,185,243, 5,679,524 and 5,573,907; EP 0320308 B1; EP 0336731 B1; EP 0439182 B1; WO 90/01069; WO 89/12696; and WO 89/09835) technologies.
It will be appreciated that these amplification methodologies may be designed to amplify immobilized target nucleic acids. For example, in some embodiments, the amplification method may include ligation probe amplification or oligonucleotide ligation assay (OLA) reactions that contain primers directed specifically to a nucleic acid of interest. In some embodiments, the amplification method may include a primer extension-ligation reaction that contains primers directed specifically to the nucleic acid of interest. As a non-limiting example of primer extension and ligation primers that may be specifically designed to amplify a nucleic acid of interest, the amplification may include primers used for the GoldenGate assay (Illumina, Inc., San Diego, CA) as exemplified by U.S. Pat. Nos. 7,582,420 and 7,611,869. [00181] DNA nanoballs can also be used in combination with methods, systems, compositions and kits as described herein. Methods for creating and using DNA nanoballs for genomic sequencing can be found in, for example, U.S. Pat. No. 7,910,354 and U.S. Pub. Nos. 2009/0264299, 2009/0011943, 2009/0005252, 2009/0155781, and 2009/0118488, and as described in, for example, Drmanac et al. (2010, Science 327(5961): 78-81). Briefly, following production of modified target nucleic acids, the modified target nucleic acids are circularized and amplified by rolling circle amplification (Lizardi et al., 1998. Nat. Genet. 19:225-232; US 2007/0099208 A1). The extended concatemeric structure of the amplicons promotes coiling, creating compact DNA nanoballs. The DNA nanoballs can be captured on substrates, preferably to create an ordered or patterned array in which the distance between nanoballs is maintained, thereby allowing sequencing of the separate DNA nanoballs. In some embodiments, such as those used by Complete Genomics (Mountain View, Calif.), consecutive rounds of adapter addition, amplification, and digestion are carried out prior to circularization to produce head-to-tail constructs having several target nucleic acids separated by adapter sequences. [00182] Exemplary isothermal amplification methods that may be used in a method of the present disclosure include, but are not limited to, Multiple Displacement Amplification (MDA) as exemplified by, for example, Dean et al., Proc. Natl. Acad. Sci. USA 99:5261-66 (2002), or isothermal strand displacement nucleic acid amplification as exemplified by, for example, U.S. Pat. No. 6,214,587.
Other non-PCR-based methods that may be used in the present disclosure include, for example, strand displacement amplification (SDA), which is described in, for example, Walker et al., Molecular Methods for Virus Detection, Academic Press, Inc., 1995; U.S. Pat. Nos. 5,455,166 and 5,130,238; and Walker et al., Nucl. Acids Res. 20:1691-96 (1992), or hyper-branched strand displacement amplification, which is described in, for example, Lage et al., Genome Res. 13:294-307 (2003). Isothermal amplification methods may be used with, for instance, the strand-displacing Phi 29 polymerase or Bst DNA polymerase large fragment, 5'->3' exo-, for random primer amplification of genomic DNA. The use of these polymerases takes advantage of their high processivity and strand-displacing activity. High processivity allows the polymerases to produce fragments that are 10-20 kb in length. As set forth herein, smaller fragments may be produced under isothermal conditions using polymerases having low processivity and strand-displacing activity, such as Klenow polymerase. Additional description of amplification reactions, conditions and components is set forth in detail in U.S. Patent No. 7,670,810. [00183] In some embodiments, amplification sites in an array can be, but need not be, entirely clonal. Rather, for some applications, an individual amplification site can be predominantly populated with amplicons from a first modified target nucleic acid and can also have a low level of contaminating amplicons from a second modified target nucleic acid. An array can have one or more amplification sites that have a low level of contaminating amplicons so long as the level of contamination does not have an unacceptable impact on a subsequent use of the array. For example, when the array is to be used in a detection application, an acceptable level of contamination would be a level that does not impact the signal-to-noise ratio or resolution of the detection technique in an unacceptable way. Accordingly, apparent clonality will generally be relevant to a particular use or application of an array made by the methods set forth herein. Exemplary levels of contamination that can be acceptable at an individual amplification site for particular applications include, but are not limited to, at most 0.1%, 0.5%, 1%, 5%, 10% or 25% contaminating amplicons. An array can include one or more amplification sites having these exemplary levels of contaminating amplicons.
For example, up to 5%, 10%, 25%, 50%, 75%, or even 100% of the amplification sites in an array can have some contaminating amplicons. It will be understood that in an array or other collection of sites, at least 50%, 75%, 80%, 85%, 90%, 95% or 99% or more of the sites can be clonal or apparently clonal. [00184] An amplification reagent can include further components that facilitate amplicon formation, and in some cases increase the rate of amplicon formation. An example is a recombinase in isothermal reactions, including exclusion amplification. A mixture of recombinase and single-stranded DNA-binding (SSB) protein is particularly useful, as SSB can further facilitate amplification. Exemplary formulations for recombinase-facilitated amplification include those sold commercially as TwistAmp kits by TwistDx (Cambridge, UK). Useful components of recombinase-facilitated amplification reagents and reaction conditions are set forth in US 5,223,414 and US 7,399,590. [00185] Another example of a component that can be included in an amplification reagent to facilitate amplicon formation, and in some cases to increase the rate of amplicon formation, is a helicase. Exemplary formulations for helicase-facilitated amplification include those sold commercially as IsoAmp kits from Biohelix (Beverly, MA). Further examples of useful formulations that include a helicase protein are described in US 7,399,590 and US 7,829,284. [00186] Yet another example of a component that can be included in an amplification reagent to facilitate amplicon formation, and in some cases to increase the rate of amplicon formation, is an origin binding protein. [00187] The presence of molecular crowding reagents in the solution can be used to aid exclusion amplification. Examples of useful molecular crowding reagents include, but are not limited to, polyethylene glycol (PEG), Ficoll®, dextran, or polyvinyl alcohol. Exemplary molecular crowding reagents and formulations are set forth in U.S. Pat. No. 7,399,590. [00188] The rate at which an amplification reaction occurs can be increased by increasing the concentration or amount of one or more of the active components of an amplification reaction. For example, the amount or concentration of polymerase, nucleotide triphosphates, primers, recombinase, helicase or SSB can be increased to increase the amplification rate. In some cases, the one or more active components of an amplification reaction that are increased in amount or concentration (or otherwise manipulated in a method set forth herein) are non-nucleic acid components of the amplification reaction.
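The notion of apparent clonality described in paragraph [00183] can be made concrete with a per-site contamination cutoff. The Python sketch below is purely illustrative: it assumes that the template of origin of every amplicon at a site is known (represented by hypothetical string labels), which a real array does not report directly, and it uses 5%, one of the exemplary per-site contamination levels, as the cutoff. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def contamination_fraction(amplicon_sources: Sequence[str]) -> float:
    """Fraction of a site's amplicons that do not come from its dominant template."""
    if not amplicon_sources:
        raise ValueError("site has no amplicons")
    dominant_count = Counter(amplicon_sources).most_common(1)[0][1]
    return 1.0 - dominant_count / len(amplicon_sources)


def fraction_apparently_clonal(
    sites: Iterable[Sequence[str]], max_contamination: float
) -> float:
    """Share of sites whose contamination is at or below max_contamination."""
    passes = [contamination_fraction(site) <= max_contamination for site in sites]
    if not passes:
        raise ValueError("no sites supplied")
    return sum(passes) / len(passes)


if __name__ == "__main__":
    # Hypothetical sites: each label names the template an amplicon came from.
    sites = [
        ["a"] * 98 + ["b"] * 2,   # 2% contamination
        ["c"] * 100,              # fully clonal
        ["d"] * 80 + ["e"] * 20,  # 20% contamination
        ["f"] * 97 + ["g"] * 3,   # 3% contamination
    ]
    print(fraction_apparently_clonal(sites, max_contamination=0.05))
```

With these four hypothetical sites, three fall at or below the 5% cutoff, so 75% of the sites would count as apparently clonal. Changing `max_contamination` reflects the point that acceptable contamination depends on the intended use of the array.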
[00189] Amplification rate can also be increased in a method set forth herein by adjusting the temperature. For example, the rate of amplification at one or more amplification sites can be increased by increasing the temperature at the site(s) up to a maximum temperature, above which the reaction rate declines due to denaturation or other adverse events. Optimal or desired temperatures can be determined from known properties of the amplification components in use or empirically for a given amplification reaction mixture. Such adjustments can be made based on a priori predictions of primer melting temperature (Tm) or empirically. [00190] The rate at which an amplification reaction occurs can be increased by increasing the activity of one or more amplification reagents. For example, a cofactor that increases the extension rate of a polymerase can be added to a reaction where the polymerase is in use. In some embodiments, metal cofactors such as magnesium, zinc or manganese can be added to a polymerase reaction, or betaine can be added. [00191] In some embodiments of the methods set forth herein, it is desirable to use a population of target nucleic acids that is double-stranded. It has been observed that amplicon formation at an array of sites under exclusion amplification conditions is efficient for double-stranded target nucleic acids. For example, a plurality of amplification sites having clonal populations of amplicons can be more efficiently produced from double-stranded target nucleic acids (compared to single-stranded target nucleic acids at the same concentration) in the presence of recombinase and SSB protein. Nevertheless, it will be understood that single-stranded target nucleic acids can be used in some embodiments of the methods set forth herein. [00192] Methods of Sequencing [00193] An array of the present disclosure, for example, having been produced by a method set forth herein and including amplified target nucleic acids at amplification sites, can be used for any of a variety of applications. A particularly useful application is nucleic acid sequencing. One example is sequencing-by-synthesis (SBS).
In SBS, extension of a nucleic acid primer along a nucleic acid template (e.g., a target nucleic acid or amplicon thereof) is monitored to determine the sequence of nucleotides in the template. The underlying chemical process can be polymerization (e.g., as catalyzed by a polymerase enzyme). In a particular polymerase-based SBS embodiment, fluorescently labeled nucleotides are added to a primer (thereby extending the primer) in a template-dependent fashion such that detection of the order and type of nucleotides added to the primer can be used to determine the sequence of the template. A plurality of different templates at different sites of an array set forth herein can be subjected to an SBS technique under conditions where events occurring for different templates can be distinguished due to their location in the array. Examples of DNA polymerases useful for sequencing include, but are not limited to, polymerases described in U.S. Patent No. 11,104,888, U.S. Pat. No. 11,001,816, U.S. Pat. Appl. No. 18/373,620, and U.S. Published Patent Application No. 2023/0047225. [00194] Flow cells provide a convenient format for housing an array that is produced by the methods of the present disclosure and that is subjected to an SBS or other detection technique that involves repeated delivery of reagents in cycles. For example, to initiate a first SBS cycle, one or more labeled nucleotides, DNA polymerase, etc., can be flowed into/through a flow cell that houses an array of nucleic acid templates. Those sites of an array where primer extension causes a labeled nucleotide to be incorporated can be detected. Optionally, the nucleotides can further include a reversible termination property that terminates further primer extension once a nucleotide has been added to a primer. For example, a nucleotide analog having a reversible terminator moiety can be added to a primer such that subsequent extension cannot occur until a deblocking agent is delivered to remove the moiety. Thus, for embodiments that use reversible termination, a deblocking reagent can be delivered to the flow cell (before or after detection occurs). Washes can be carried out between the various delivery steps. The cycle can then be repeated n times to extend the primer by n nucleotides, thereby detecting a sequence of length n. Exemplary SBS procedures, fluidic systems and detection platforms that can be readily adapted for use with an array produced by the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008); WO 04/018497; U.S. Pat. No. 7,057,026; WO 91/06678; WO 07/123,744; U.S. Pat. No. 7,329,492; U.S. Pat. No. 7,211,414; U.S. Pat. No. 7,315,019; U.S. Pat. No. 7,405,281; and U.S. Pat. No. 8,343,746. Examples of nucleotides having a reversible termination property include nucleotides modified at the 3'-OH of the sugar moiety, such as with a 3'-O-azidomethyl (–CH2N3) blocking group, a 3'-OH acetal blocking group, or a 3'-OH thiocarbamate blocking group (U.S. Patent No. 11,293,061; U.S. Published Patent Application No. 2022/0396832). [00195] Other sequencing procedures that use cyclic reactions can be used, such as pyrosequencing. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into a nascent nucleic acid strand (Ronaghi et al., Analytical Biochemistry 242(1), 84-9 (1996); Ronaghi, Genome Res. 11(1), 3-11 (2001); Ronaghi et al. Science 281(5375), 363 (1998); U.S. Pat. No. 6,210,891; U.S. Pat. No. 6,258,568; and U.S. Pat. No. 6,274,320). In pyrosequencing, released PPi can be detected by being immediately converted to adenosine triphosphate (ATP) by ATP sulfurylase, and the level of ATP generated can be detected via luciferase-produced photons. Thus, the sequencing reaction can be monitored via a luminescence detection system. Excitation radiation sources used for fluorescence-based detection systems are not necessary for pyrosequencing procedures. Useful fluidic systems, detectors and procedures that can be used for application of pyrosequencing to arrays of the present disclosure are described, for example, in WIPO Published Pat. App. 2012/058096, US 2005/0191698 A1, U.S. Pat. No. 7,595,883, and U.S. Pat. No. 7,244,559. [00196] Sequencing-by-ligation reactions are also useful, including, for example, those described in Shendure et al. Science 309:1728-1732 (2005); U.S. Pat. No. 5,599,675; and U.S. Pat. No. 5,750,341. Some embodiments can include sequencing-by-hybridization procedures as described, for example, in Bains et al., Journal of Theoretical Biology 135(3), 303-7 (1988); Drmanac et al., Nature Biotechnology 16, 54-58 (1998); Fodor et al., Science 251(4995), 767-773 (1991); and WO 1989/10977. In both sequencing-by-ligation and sequencing-by-hybridization procedures, template nucleic acids (e.g., a target nucleic acid or amplicons thereof) that are present at sites of an array are subjected to repeated cycles of oligonucleotide delivery and detection. Fluidic systems for SBS methods as set forth herein or in references cited herein can be readily adapted for delivery of reagents for sequencing-by-ligation or sequencing-by-hybridization procedures.
Typically, the oligonucleotides are fluorescently labeled and can be detected using fluorescence detectors similar to those described with regard to SBS procedures herein or in references cited herein. [00197] Some embodiments can use methods involving the real-time monitoring of DNA polymerase activity. For example, nucleotide incorporations can be detected through fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and γ-phosphate-labeled nucleotides, or with zero-mode waveguides (ZMWs). Techniques and reagents for FRET-based sequencing are described, for example, in Levene et al. Science 299, 682-686 (2003); Lundquist et al. Opt. Lett. 33, 1026-1028 (2008); and Korlach et al. Proc. Natl. Acad. Sci. USA 105, 1176-1181 (2008). [00198] Some SBS embodiments include detection of a proton released upon incorporation of a nucleotide into an extension product. For example, sequencing based on detection of released protons can use an electrical detector and associated techniques that are commercially available from Ion Torrent (Guilford, Conn., a Life Technologies subsidiary) or sequencing methods and systems described in US 2009/0026082 A1; US 2009/0127589 A1; US 2010/0137143 A1; or US 2010/0282617 A1. Methods set forth herein for amplifying target nucleic acids using exclusion amplification can be readily applied to substrates used for detecting protons. More specifically, methods set forth herein can be used to produce clonal populations of amplicons at the sites of the arrays that are used to detect protons. [00199] Sequencing of templates in a cluster often includes the technique of "paired-end" or "pairwise" sequencing (U.S. Pat. No. 7,754,429 and U.S. Pat. No. 8,017,335). In some embodiments, a dGTP analog is used during the resynthesis that occurs during paired-end sequencing of templates at a cluster. Paired-end sequencing is a multi-step process that allows the determination of two "reads" of sequence by sequencing both strands of a double-stranded nucleic acid. The advantage of the paired-end approach is that significantly more information is gained from sequencing two stretches of bases from the complementary strands of a single template than from sequencing the same number of bases from each of two independent templates in a random fashion. With the use of appropriate software tools for the assembly of sequence information, it is possible to use the knowledge that the "paired-end" sequences are not completely random, but are known to occur on a single template, and are therefore linked or paired in the genome.
This information greatly aids the assembly of whole genome sequences into a consensus sequence. [00200] After production of clonal clusters, each cluster includes immobilized complementary strands. In order to provide more suitable templates for sequencing, substantially all or at least a portion of one of the immobilized strands is removed to generate a template that is at least partially single-stranded. The single-stranded portion of the template is then available for hybridization to a sequencing primer. The process of removing all or a portion of one immobilized strand is referred to as "linearization." There are various ways to perform linearization, including but not limited to enzymatic cleavage (e.g., uracil DNA glycosylase (UDG) and endonuclease VII, or oxoguanine glycosylase), chemical cleavage (e.g., palladium reagents for Pd linearization, or nickel reagents for Ni linearization), and photochemical cleavage. Non-limiting examples of linearization methods are disclosed in US Serial No. 18/473,971, filed Sep. 25, 2023; PCT Publication No. WO 2019/222264; US Published Patent Application No. 2019/0352327; WO 2007/010251; US Patent Application Publication No. 2009/0088327; and US Patent Publication No. 2009/0118128, which are incorporated by reference in their entireties. [00201] Sequence data can be obtained from both immobilized complementary strands by performing a linearization to remove the strand attached via one capture nucleic acid, e.g., P5; obtaining a sequence read from the remaining first strand using a primer; copying the first strand using immobilized primers for strand resynthesis and repopulation of the cluster with the strand initially removed by the first linearization; releasing the first strand; and sequencing the second, copied strand. In one embodiment, resynthesis and repopulation of clusters includes use of a resynthesis reagent. A resynthesis reagent can include (i) an array of amplification sites, where each amplification site includes immobilized modified target nucleic acids, (ii) deoxynucleotide triphosphates (dNTPs), where the dNTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog, and (iii) a polymerase. In some embodiments, a resynthesis reagent does not include a dGTP analog. The resynthesis reagent is reacted to produce, at each amplification site, a population of strands that are complementary to the strand sequenced during the first round. The population of complementary strands is sequenced during the second round.
When a dGTP analog is present, the complementary strands will have the dGTP analog incorporated. The dGTP analog will be present in the complementary strands at a level that depends on the percentage or ratio of dGTP to dGTP analog in the resynthesis reagent, and the G-quadruplexes in the amplicons will be reduced in number and/or stability compared to the same amplicon that does not include a dGTP analog. [00202] Secondary structure from a G-quadruplex during cluster repopulation/strand resynthesis can reduce the representation of GC-rich regions. The present disclosure includes the use of
a dGTP analog, in some embodiments, during the step of cluster repopulation/strand resynthesis to reduce the impact of G-quadruplexes. An example of these steps is shown in FIG. 6. FIG. 6A shows an amplification site 20 containing immobilized complementary strands 21 and 22. A linearization is performed to remove strand 22 by cleaving the capture nucleic acid 23, for instance P5, at the X. Cleavage results in a cleaved capture nucleic acid 23* and one population of immobilized target nucleic acids, 21, as shown in FIG. 6B. Strand 21 can be sequenced by the sequential addition of nucleotides to a sequencing primer using strand 21 as the template. For instance, as shown in FIG. 6C, a sequencing primer 24 is annealed to strand 21 and is ready for extension by a DNA polymerase in a sequencing reaction. The strand extended during the sequencing reaction is not immobilized and is removed, and strand resynthesis occurs to repopulate the cluster with the complement of the sequenced strand. As shown in FIG. 6D, the sequenced strand 21 is used as the template to repopulate the amplification site 20 using bridge amplification. The result is shown in FIG. 6E, where the amplification site is repopulated with strand 22*, which is identical to strand 22 in FIG. 6A except that the cleavage site X is no longer present. Linearization is performed to remove sequenced strand 21 by cleaving the capture nucleic acid 24, for instance P7, at the Y. As shown in FIG. 6F, the repopulated amplification site 20 is ready for sequencing of the other strand, thereby resulting in paired-end sequencing. [00203] The methods for cluster repopulation/strand resynthesis described herein can differ from typical cluster repopulation/strand resynthesis by the inclusion of a dGTP analog. Thus, the extension reactions that occur during resynthesis, e.g., extension from capture nucleic acid 23* of FIG. 6D, can include a dGTP analog.
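The FIG. 6 sequence, and where a dGTP analog ends up when it is included during resynthesis, can be illustrated with a deliberately simplified Python model. It is a sketch under stated assumptions, not the disclosed method: every copy is assumed error-free, the polymerase is assumed to incorporate dGTP and the analog with equal efficiency (so the analog fraction at G positions matches the analog fraction of the mix), and each "read" is assumed to report the full complement of its template, ignoring primer position and read length. The function names are hypothetical; the dictionary keys P5 and P7 stand for the capture nucleic acids in the example above.

```python
import random
from typing import List, Tuple

# Watson-Crick pairing; an incorporated dGTP analog still pairs like G and is
# tracked with a separate per-position flag in this toy model.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the 5'->3' sequence of the strand complementary to seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def resynthesize(template: str, analog_fraction: float,
                 rng: random.Random) -> Tuple[str, List[bool]]:
    """Copy a template strand in a dGTP/dGTP-analog mix (hypothetical model).

    Assumes equal incorporation of dGTP and analog, so each G in the new
    strand is an analog with probability analog_fraction (0.0 to 1.0).
    """
    new_strand = reverse_complement(template)
    flags = [base == "G" and rng.random() < analog_fraction
             for base in new_strand]
    return new_strand, flags


def paired_end_cycle(strand_21: str, analog_fraction: float, seed: int = 0):
    """Follow FIG. 6A-6F for one cluster; keys name the capture nucleic acid."""
    rng = random.Random(seed)
    cluster = {"P7": strand_21, "P5": reverse_complement(strand_21)}  # FIG. 6A
    del cluster["P5"]                            # linearize at X (FIG. 6B)
    read_1 = reverse_complement(cluster["P7"])   # extend primer on strand 21 (FIG. 6C)
    strand_22_star, flags = resynthesize(cluster["P7"], analog_fraction, rng)
    cluster["P5"] = strand_22_star               # repopulated site (FIG. 6E)
    del cluster["P7"]                            # linearize at Y (FIG. 6F)
    read_2 = reverse_complement(cluster["P5"])   # second read
    return read_1, read_2, flags


if __name__ == "__main__":
    # A C-rich strand 21 gives a G-rich strand 22*, where the analog lands.
    r1, r2, g_flags = paired_end_cycle("CCCTAACCCTAACCCTAACCC", analog_fraction=0.10)
    print(r1, r2, sum(g_flags))
```

In this model the analog appears only in the resynthesized strand 22*, and only at G positions, which mirrors the statement that the analog level in the complementary strands tracks its proportion in the resynthesis reagent.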
The amount of dGTP analog can be described in relation to the normal dGTP present. In one embodiment, the amount of dGTP analog can be expressed as a percentage of the total amount of dGTP (dGTP plus dGTP analog) present in a resynthesis reaction. For instance, a resynthesis reagent can include dGTP and a dGTP analog. In some embodiments, the amount of dGTP analog in a resynthesis reaction can be at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 11%, at least 12%, at least 13%, at least 14%, at least 15%, at least 16%, at least 17%, at least 18%, at least 19%, at least 20%, at least 21%, at least 22%, at least 23%, at
least 24%, at least 25%, at least 26%, at least 27%, at least 28%, at least 29%, at least 30%, at least 31%, at least 32%, at least 33%, at least 34%, at least 35%, at least 36%, at least 37%, at least 38%, at least 39%, at least 40%, at least 41%, at least 42%, at least 43%, at least 44%, at least 45%, at least 46%, at least 47%, at least 48%, at least 49%, at least 50%, at least 51%, at least 52%, at least 53%, at least 54%, at least 55%, at least 56%, at least 57%, at least 58%, at least 59%, at least 60%, at least 61%, at least 62%, at least 63%, at least 64%, at least 65%, at least 66%, at least 67%, at least 68%, at least 69%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% of the total amount of dGTP.
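As a worked illustration of this percentage convention (analog expressed as a share of dGTP plus analog), the sketch below splits a total guanine-nucleotide concentration into its dGTP and dGTP-analog parts and converts an analog:dGTP mixing ratio into a percentage. The function names and the 100 µM total used in the example are hypothetical illustrative values, not taken from the disclosure.

```python
def split_dgtp_pool(total_um: float, analog_percent: float) -> tuple[float, float]:
    """Split a total guanine-nucleotide concentration into (dGTP, analog).

    analog_percent is the dGTP analog as a percentage of the total amount of
    dGTP (dGTP plus analog); 100 means no unmodified dGTP remains.
    Units follow total_um (micromolar is only an example).
    """
    if not 0.0 <= analog_percent <= 100.0:
        raise ValueError("analog_percent must be between 0 and 100")
    analog_um = total_um * analog_percent / 100.0
    return total_um - analog_um, analog_um


def percent_from_ratio(analog_parts: float, dgtp_parts: float) -> float:
    """Convert an analog:dGTP mixing ratio (e.g. 1:9) into analog percent."""
    total_parts = analog_parts + dgtp_parts
    if total_parts <= 0:
        raise ValueError("ratio parts must sum to a positive number")
    return 100.0 * analog_parts / total_parts


if __name__ == "__main__":
    # Hypothetical 100 uM total guanine nucleotide at a 10% analog setting.
    dgtp, analog = split_dgtp_pool(100.0, 10.0)
    print(f"dGTP {dgtp} uM, analog {analog} uM")
    print(percent_from_ratio(1, 9))
```

Expressing the analog against the combined pool, rather than against dGTP alone, keeps the 100% case meaningful: it corresponds to a mix with no unmodified dGTP.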
In some embodiments, the amount of dGTP analog in an amplification reaction can be no greater than 99%, no greater than 98%, no greater than 97%, no greater than 96%, no greater than 95%, no greater than 94%, no greater than 93%, no greater than 92%, no greater than 91%, no greater than 90%, no greater than 89%, no greater than 88%, no greater than 87%, no greater than 86%, no greater than 85%, no greater than 84%, no greater than 83%, no greater than 82%, no greater than 81%, no greater than 80%, no greater than 79%, no greater than 78%, no greater than 77%, no greater than 76%, no greater than 75%, no greater than 74%, no greater than 73%, no greater than 72%, no greater than 71%, no greater than 70%, no greater than 69%, no greater than 68%, no greater than 67%, no greater than 66%, no greater than 65%, no greater than 64%, no greater than 63%, no greater than 62%, no greater than 61%, no greater than 60%, no greater than 59%, no greater than 58%, no greater than 57%, no greater than 56%, no greater than 55%, no greater than 54%, no greater than 53%, no greater than 52%, no greater than 51%, no greater than 50%, no greater than 49%, no greater than 48%, no greater than 47%, no greater than 46%, no greater than 45%, no greater than 44%, no greater than 43%, no greater than 42%, no greater than 41%, no greater than 40%, no greater than 39%, no greater than 38%, no greater than 37%, no greater than 36%, no greater than 35%, no greater than 34%, no greater than 33%, no greater than 32%, no greater than 31%, no greater than 30%, no greater than 29%, no greater than 28%, no greater than 27%, no greater than 26%, no
greater than 25%, no greater than 24%, no greater than 23%, no greater than 22%, no greater than 21%, no greater than 20%, no greater than 19%, no greater than 18%, no greater than 17%, no greater than 16%, no greater than 15%, no greater than 14%, no greater than 13%, no greater than 12%, no greater than 11%, no greater than 10%, no greater than 9%, no greater than 8%, no greater than 7%, no greater than 6%, no greater than 5%, or no greater than 4% of the total amount of dGTP. In some embodiments, the amount of dGTP analog in an amplification reaction is 100%, that is, no dGTP is present, only the dGTP analog and the other dNTPs useful in an amplification, such as dATP, dCTP, and dTTP. [00204] Examples of ranges of the amount of dGTP analog in a resynthesis reaction include, but are not limited to, a range having a lower amount selected from at least 3% to at least 24% and a higher amount selected from no greater than 4% to no greater than 25%, for instance, at least 3% to no greater than 25%, at least 3% to no greater than 7%, at least 7% to no greater than 12%, at least 12% to no greater than 17%, or at least 17% to no greater than 22%. Other examples of ranges of the amount of dGTP analog in a resynthesis reaction include, but are not limited to, a range having a lower amount selected from at least 3% to at least 13% and a higher amount selected from no greater than 6% to no greater than 16%, for instance, at least 3% to no greater than 16%, at least 5% to no greater than 14%, or at least 7% to no greater than 12%. [00205] dGTP analogs useful in the amplification that occurs during resynthesis include those having the nucleobase of Formula 5:
where J, Z, R1, and R2 are as described herein. Other dGTP analogs useful in clustering include those having the nucleobase of Formula 6:
herein. [00206] Examples of dGTP analogs useful in resynthesis include, but are not limited to, 7-deaza-dGTP, 7-deaza-7-trifluoromethyl-dGTP, 7-deaza-7-methyl sulfoxide-dGTP, 7-deaza-7-cyano-dGTP, 8-aza-7-deaza-dGTP or 7-deaza-8-aza-dGTP, 7-deaza-7-propargylamino-dGTP, 7-deaza-7-iodo-dGTP, 7-deaza-7-trifluoromethylsulfone-dGTP, 7-deaza-7-trifluoromethylsulfoxide-dGTP, and 7-deaza-acetoxy-dGTP. In some embodiments, when linearization is accomplished using an oxoguanine glycosylase, a dGTP analog other than 7-deaza-dGTP or 8-aza-7-deaza-dGTP is useful during resynthesis. In one embodiment, a dGTP analog useful in resynthesis is 7-deaza-MeSO2-dGTP. [00207] The present disclosure provides integrated sequencing systems capable of making an array using one or more of the methods set forth herein, e.g., producing clusters that include a dGTP analog. An integrated sequencing system can be capable of detecting nucleic acids on the arrays using techniques such as those described herein, including resynthesis in the presence of a dGTP analog during paired-end sequencing. Thus, an integrated sequencing system of the present disclosure can include fluidic components, such as pumps, valves, reservoirs, fluidic lines and the like, capable of delivering amplification reagents to an array of amplification sites. Examples of useful fluidic components include a flow cell and a cartridge. A flow cell can be configured and/or used in an integrated sequencing system to create an array of the present disclosure and to detect the array. Exemplary flow cells are described, for example, in US 2010/0111768 A1 and U.S. Pat. No. 8,951,781. A cartridge can be configured to include the components of an amplification or resynthesis reagent in one or more chambers. As exemplified for flow cells, one or more of the fluidic components of an integrated sequencing system can be used for an amplification method and for a detection method.
Taking a nucleic acid sequencing embodiment as an example, one or more of the fluidic components of an integrated sequencing system can be used for an
amplification method set forth herein and for the delivery of sequencing reagents in a sequencing method, including a resynthesis method, such as those described herein. Alternatively, an integrated sequencing system can include separate fluidic systems to carry out amplification methods and to carry out detection methods and resynthesis methods. Examples of integrated sequencing systems that are capable of creating arrays of nucleic acids and also determining the sequence of the nucleic acids include, without limitation, the MiSeq™, HiSeq™, NextSeq™, MiniSeq™, NovaSeq™ and iSeq™ platforms (Illumina, Inc., San Diego, Calif.) and devices described in U.S. Pat. No. 8,951,781. Such devices can be modified to make arrays using exclusion amplification in accordance with the guidance set forth herein. [00208] A system capable of carrying out a method set forth herein need not be integrated with a detection device. Rather, a stand-alone system or a system integrated with other devices is also possible. Fluidic components similar to those exemplified herein in the context of an integrated sequencing system can be used in such embodiments. [00209] A system capable of carrying out a method set forth herein, whether integrated with detection capabilities or not, can include a system controller that is capable of executing a set of instructions to perform one or more steps of a method, technique or process set forth herein. For example, the instructions can direct the performance of steps for creating an array under exclusion amplification conditions. Optionally, the instructions can further direct the performance of steps for detecting nucleic acids using methods set forth previously herein.
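As a hedged sketch of the kind of instruction set a system controller might execute, the following Python models a run as an ordered list of steps, each naming the reagent that a fluidics layer should deliver, in the order of the FIG. 6 workflow. The `Step` and `SystemController` classes, the recipe, and the reagent labels are hypothetical; they do not describe the control software of any commercial instrument, and the `deliver` callback merely stands in for real pumps and valves.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """One instruction: a named step and the reagent to deliver for it."""
    name: str
    reagent: str


# Hypothetical paired-end recipe. The dGTP analog is shown in both the
# amplification and resynthesis reagents; some embodiments omit it from either.
PAIRED_END_RECIPE: List[Step] = [
    Step("cluster_generation", "amplification reagent with dGTP analog"),
    Step("linearization_1", "cleavage agent for P5 capture nucleic acid"),
    Step("read_1", "sequencing reagents"),
    Step("resynthesis", "resynthesis reagent with dGTP analog"),
    Step("linearization_2", "cleavage agent for P7 capture nucleic acid"),
    Step("read_2", "sequencing reagents"),
]


class SystemController:
    """Executes steps in order by handing each reagent to a delivery callback."""

    def __init__(self, deliver: Callable[[str], None]) -> None:
        self._deliver = deliver
        self.completed: List[str] = []

    def run(self, recipe: List[Step]) -> List[str]:
        for step in recipe:
            self._deliver(step.reagent)  # stand-in for valve/pump actions
            self.completed.append(step.name)
        return self.completed


if __name__ == "__main__":
    delivered: List[str] = []
    controller = SystemController(delivered.append)
    print(controller.run(PAIRED_END_RECIPE))
```

Keeping the recipe as data separate from the controller reflects the idea that the same hardware can run cluster generation, resynthesis, and detection steps from one set of instructions.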
A useful system controller may include any processor-based or microprocessor-based system, including systems using microcontrollers, reduced instruction set computers (RISC), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), logic circuits, and any other circuit or processor capable of executing functions described herein. A set of instructions for a system controller may be in the form of a software program. As used herein, the terms “software” and “firmware” are interchangeable, and include any computer program stored in memory for execution by a computer, including RAM, ROM, EPROM, EEPROM, and non-volatile RAM (NVRAM) memory. The software may be in various forms such as system software or application software. Further, the software may be in the form of a collection of separate programs, or a
program module within a larger program or a portion of a program module. The software also may include modular programming in the form of object-oriented programming. [00210] Several applications for arrays of the present disclosure have been exemplified herein in the context of ensemble detection, wherein multiple amplicons present at each amplification site are detected together. In alternative embodiments, a single nucleic acid, whether a target nucleic acid or amplicon thereof, can be detected at each amplification site. For example, an amplification site can be configured to contain a single nucleic acid molecule having a target nucleotide sequence that is to be detected and a plurality of filler nucleic acids. In this example, the filler nucleic acids function to fill the capacity of the amplification site and they are not necessarily intended to be detected. The single molecule that is to be detected can be detected by a method that is capable of distinguishing the single molecule in the background of the filler nucleic acids. Any of a variety of single molecule detection techniques can be used including, for example, modifications of the ensemble detection techniques set forth herein to detect the sites at increased gain or using more sensitive labels. Other examples of single molecule detection methods that can be used are set forth in U.S. 2011/0312529 A1; U.S. Pat. No. 9,279,154; and U.S. 2013/0085073 A1. [00211] It will be understood that an array of the present disclosure, for example, having been produced by a method set forth herein, need not be used for a detection method. Rather, the array can be used to store a nucleic acid library. Accordingly, the array can be stored in a state that preserves the nucleic acids therein. For example, an array can be stored in a desiccated state, frozen state (e.g., in liquid nitrogen), or in a solution that is protective of nucleic acids.
Alternatively, or additionally, the array can be used to replicate a nucleic acid library. For example, an array can be used to create replicate amplicons from one or more of the sites on the array. [00212] Several embodiments of the disclosure have been exemplified herein with regard to transporting target nucleic acids to amplification sites of an array and making copies of the captured target nucleic acids at the amplification sites. Similar methods can be used for non-nucleic acid target molecules. Thus, methods set forth herein can be used with other target molecules in place of the exemplified target nucleic acids. For example, a method of the
present disclosure can be carried out to transport individual target molecules from a population of different target molecules. Each target molecule can be transported to (and in some cases captured at) an individual amplification site of an array to initiate a reaction at the site of capture. The reaction at each site can, for example, produce copies of the captured molecule or the reaction can alter the site to isolate or sequester the captured molecule. In either case, the end result can be sites of the array that are each pure with respect to the type of target molecule that is present from a population that contained different types of target molecules. [00213] Compositions [00214] Clusters produced by amplification, e.g., during cluster generation or during resynthesis, in the presence of a dGTP analog have one or more of several characteristics. In one embodiment, the analog is present in both strands of the amplified target nucleic acids, and the percentage of dGTP analog present in the strands is a function of the amount of dGTP analog present in an amplification reaction. In one embodiment, the analog is present in one strand of the amplified target nucleic acids. For instance, the analog is present in the population of single strands present at a cluster before the first round of sequencing, or the analog is present in the population of single strands present at a cluster before the second round of sequencing (e.g., after resynthesis in the presence of a dGTP analog). The percentage of dGTP analog present in a strand is a function of the amount of dGTP analog present in an amplification reaction. The amount of dGTP analog present in the strands can be at least 3% to at least 99% of the total amount of dGTP, and no greater than 4% to no greater than 99% of the total amount of dGTP. [00215] Also provided are compositions that include a dGTP analog, including those having the nucleobase of Formula 5:
6:
wherein R100 is hydrogen or alkyl as described herein. [00216] Further provided are compositions that include a dGTP analog described herein, e.g., 7-deaza-dGTP, 7-deaza-7-trifluoromethyl-dGTP, 7-deaza-7-methyl sulfoxide-dGTP, 7-deaza-7-cyano-dGTP, 8-aza-7-deaza-dGTP, 7-deaza-7-propargylamino-dGTP, 7-deaza-7-iodo-dGTP, 7-deaza-7-fluoro-dGTP, 7-deaza-7-trifluoromethylsulfone-dGTP, 7-deaza-7-trifluoromethylsulfoxide-dGTP, or 7-deaza-acetoxy-dGTP. In one embodiment, a composition includes 7-deaza-7-trifluoromethylsulfoxide-dGTP. In one embodiment, a composition includes 7-deaza-7-methyl sulfoxide-dGTP. In one embodiment, a composition includes 7-deaza-7-trifluoromethylsulfone-dGTP. [00217] [00218] In one embodiment, a characteristic of clusters prepared as described herein is reduced secondary structure of the amplicons compared to clusters produced in the same way but without use of a dGTP analog. The present disclosure includes compositions, arrays, cartridges, and kits that include clusters having one or more of these characteristics, in any combination. For instance, the present disclosure includes an array that has amplification sites populated with clusters that include a dGTP analog. The array can be a flow cell, and the flow cell can be one that is configured to interact with a cartridge that can be used with
a sequencing apparatus. In one embodiment, a flow cell, such as one having clusters that include a dGTP analog, can be releasably attached to a cartridge. [00219] Kits and articles [00220] The present disclosure also provides kits and articles, such as arrays and cartridges, for carrying out the methods disclosed herein. The kits and articles can be configured for use with a sequencing instrument, such as an integrated sequencing system. [00221] In some embodiments, a cartridge for use with a sequencing system may include a chamber from which a composition (such as a composition that includes a plurality of target nucleic acids, nucleotide triphosphates (dNTPs) including dATP, dTTP, dCTP, dGTP, and an optional dGTP analog, or a polymerase) may be withdrawn or expelled for use in a method disclosed herein (e.g., cluster generation, resynthesis, or both). A cartridge may include a releasably attached flow cell. [00222] In one embodiment, an array including a sequencing library includes a plurality of amplification sites that include a first and a second nucleic acid sequence. The first nucleic acid can include one single-stranded member of a sequencing library attached thereto, where the attachment includes an interaction between a capture agent immobilized by its 5’ end to the amplification site and a universal capture binding sequence at the 3’ end of the first nucleic acid. The first nucleic acid can include nucleotides dATP, dTTP, dGTP, dCTP, and a dGTP analog, and the second nucleic acid can include the capture agent and the complement of the first nucleic acid. [00223] In some embodiments, a kit includes components for use with the methods of the present disclosure. For instance, a kit may include one or more compositions configured to perform one or more of the amplification and resynthesis steps of cluster generation or amplification site repopulation. A kit may be configured for use with a cartridge.
For example, a kit may include the compositions for disposing into the chambers of the cartridge. [00224] The invention is defined in the claims. However, below there is provided a non-exhaustive listing of non-limiting exemplary aspects. Any one or more of the features of these aspects
may be combined with any one or more features of another example, embodiment, or aspect described herein. [00225] Exemplary Aspects [00226] Aspect 1 is a method for reducing bias in generating clonal amplification sites, including (a) providing an amplification reagent including (i) an array of amplification sites, (ii) a composition including a plurality of modified target nucleic acids, (iii) a composition including nucleotide triphosphates (NTPs), where the NTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog including a nucleobase, and (iv) a composition including a polymerase; and (b) reacting the amplification reagent to produce a plurality of populated amplification sites, where the plurality of populated amplification sites each include a clonal population of amplicons from an individual modified target nucleic acid from the plurality of modified target nucleic acids. [00227] Aspect 2 is a method for reducing bias in generating clonal amplification sites, including (a) providing an amplification reagent including (i) an array of amplification sites, where each amplification site includes a capture sequence and a single-stranded modified target nucleic acid immobilized thereto; (ii) a composition including nucleotide triphosphates (NTPs), where the NTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog including a nucleobase, and (iii) a composition including a polymerase; and (b) reacting the amplification reagent to produce a plurality of amplification sites that each include a clonal population of amplicons from the single-stranded modified target nucleic acid immobilized thereto in step (a)(i). [00228] Aspect 3 is the method of any of Aspects 1 or 2 or 4 to 39, where the compositions of (ii) and (iii) are present together in a mixture, the compositions of (iii) and (iv) are present together in a mixture, or the compositions of (ii), (iii), and (iv) are present together in a mixture.
[00229] Aspect 4 is the method of any of Aspects 1 to 3 or 5 to 39, where the compositions of (ii) and (iii) are present together in a mixture.
[00230] Aspect 5 is the method of any of Aspects 1 to 4 or 6 to 39, where the array includes a flow cell. [00231] Aspect 6 is the method of any of Aspects 1 to 5 or 7 to 39, where the polymerase is Bsu or Bst. [00232] Aspect 7 is the method of any of Aspects 1 to 6 or 8 to 39, where the reacting includes kinetic exclusion amplification or bridge amplification. [00233] Aspect 8 is the method of any of Aspects 1 to 7 or 9 to 39, where the nucleobase of the dGTP analog is 7-deaza-dGTP or 7-deaza-dGTP substituted at the 7 position. [00234] Aspect 9 is the method of any of Aspects 1 to 8 or 10 to 39, where the 7-deaza-dGTP is 8-aza-7-deaza-dGTP or 8-aza-7-deaza-dGTP substituted at the 7 position. [00235] Aspect 10 is the method of any of Aspects 1 to 9 or 11 to 39, where the nucleobase of the dGTP analog is substituted at the nitrogen of position 7. [00236] Aspect 11 is the method of any of Aspects 1 to 10 or 12 to 39, where the nucleobase of the dGTP analog is of Formula 5:
where J is C or N; where Z is C or N; where R1 is hydrogen, halo, alkyl, acyl, trihaloalkyl, cyano, sulfinyl, sulfonyl, or alkynyl; and R2 is hydrogen or halo. [00237] Aspect 12 is the method of any of Aspects 1 to 11 or 13 to 39, where R1 is a C1 to C6 alkyl. [00238] Aspect 13 is the method of any of Aspects 1 to 12 or 14 to 39, where R1 is methyl.
[00239] Aspect 14 is the method of any of Aspects 1 to 13 or 15 to 39, where R1 is an acyl of the formula -C(O)-R10, and where R10 is a C1 to C6 alkyl. [00240] Aspect 15 is the method of any of Aspects 1 to 14 or 16 to 39, where R10 is methyl. [00241] Aspect 16 is the method of any of Aspects 1 to 15 or 17 to 39, where R1 is a trihaloalkyl of the formula -(CH2)n1C(X)3, where n1 is 0, 1, 2, 3, or 4, and where X is halo. [00242] Aspect 17 is the method of any of Aspects 1 to 16 or 18 to 39, where X is F. [00243] Aspect 18 is the method of any of Aspects 1 to 17 or 19 to 39, where n1 is 0. [00244] Aspect 19 is the method of any of Aspects 1 to 18 or 20 to 39, where R1 is a cyano of the formula -(CH2)n2CN, and where n2 is 0, 1, 2, 3, or 4. [00245] Aspect 20 is the method of any of Aspects 1 to 19 or 21 to 39, where n2 is 0. [00246] Aspect 21 is the method of any of Aspects 1 to 20 or 22 to 39, where R1 is a sulfinyl of the formula -S(O)-R20, where R20 is a C1 to C6 alkyl or a trihaloalkyl of the formula -(CH2)n1C(X)3, and where n1 is 0, 1, 2, 3, or 4, and X is halo. [00247] Aspect 22 is the method of any of Aspects 1 to 21 or 23 to 39, where X is F. [00248] Aspect 23 is the method of any of Aspects 1 to 22 or 24 to 39, where n1 is 0. [00249] Aspect 24 is the method of any of Aspects 1 to 23 or 25 to 39, where R1 is a sulfonyl of the formula -S(O)2-R30, where R30 is a C1 to C6 alkyl or a trihaloalkyl of the formula -(CH2)n1C(X)3, and where n1 is 0, 1, 2, 3, or 4, and X is halo. [00250] Aspect 25 is the method of any of Aspects 1 to 24 or 26 to 39, where X is F. [00251] Aspect 26 is the method of any of Aspects 1 to 25 or 27 to 39, where n1 is 0. [00252] Aspect 27 is the method of any of Aspects 1 to 26 or 28 to 39, where R30 is methyl. [00253] Aspect 28 is the method of any of Aspects 1 to 27 or 29 to 39, where R1 is an alkynyl of the formula -C≡C-(CH2)n3-R40, where n3 is 1, 2, 3, or 4, and where R40 is CH3 or an amine.
[00254] Aspect 29 is the method of any of Aspects 1 to 28 or 30 to 39, where n3 is 1. [00255] Aspect 30 is the method of any of Aspects 1 to 29 or 31 to 39, where R2 is H. [00256] Aspect 31 is the method of any of Aspects 1 to 30 or 32 to 39, where R2 is halo. [00257] Aspect 32 is the method of any of Aspects 1 to 31 or 33 to 39, where R2 is chloro. [00258] Aspect 33 is the method of any of Aspects 1 to 32 or 34 to 39, where the nucleobase of the dGTP analog is of Formula 6:
where R100 is hydrogen or a C1 to C6 alkyl. [00259] Aspect 34 is the method of any of Aspects 1 to 33 or 35 to 39, where R100 is methyl. [00260] Aspect 35 is the method of any of Aspects 1 to 34 or 36 to 39, where the nucleobase of the dGTP analog is
[00261] includes the dGTP analog at no greater than 50%, no greater than 40%, no greater than 30%, no greater than 20%, no greater than 10%, or no greater than 5% of the total amount of dGTP. [00262] Aspect 37 is the method of any of Aspects 1 to 36 or 38 to 39, further including (c) performing a sequencing procedure on the array to determine the nucleotide sequences for the clonal populations of amplicons. [00263] Aspect 38 is the method of any of Aspects 1 to 37 or 39, where the sequencing procedure includes use of nucleotides including a modification at the 3'-OH of the nucleotide sugar moiety.
[00264] Aspect 39 is the method of any of Aspects 1 to 38, where the modification includes a 3'-O-azidomethyl blocking group, a 3'-OH acetal blocking group, or a 3'-OH thiocarbamate blocking group. [00265] Aspect 40 is an array including a sequencing library, the array including: a plurality of populated amplification sites attached to the array, where the plurality of populated amplification sites each include a clonal population of amplicons from an individual modified target nucleic acid from a library of modified target nucleic acids, where the amplicons include nucleotides dATP, dTTP, dGTP, dCTP, and a dGTP analog including a nucleobase. [00266] Aspect 41 is the array of any of Aspects 40 or 42 to 46, where the array includes a flow cell. [00267] Aspect 42 is the array of any of Aspects 40, 41, or 43 to 46, where the nucleobase of the dGTP analog is 7-deaza-dGTP or 7-deaza-dGTP substituted at the 7 position. [00268] Aspect 43 is the array of any of Aspects 40 to 42 or 44 to 46, where the 7-deaza-dGTP is 8-aza-7-deaza-dGTP or 8-aza-7-deaza-dGTP substituted at the 7 position. [00269] Aspect 44 is the array of any of Aspects 40 to 43 or 45 to 46, where the nucleobase of the dGTP analog is substituted at the nitrogen of position 7. [00270] Aspect 45 is the array of any of Aspects 40 to 44 or 46, where the nucleobase of the dGTP analog is of Formula 5:
wherein J is C or N; where Z is C or N; where R1 is hydrogen, halo, alkyl, acyl, trihaloalkyl, cyano, sulfinyl, sulfonyl, or alkynyl; and R2 is hydrogen or halo.
[00271] Aspect 46 is the array of any of Aspects 40 to 45, where the nucleobase of the dGTP analog is of Formula 6:
where R100 is hydrogen or a C1 to C6 alkyl. [00272] Aspect 47 is a method for reducing bias during resynthesis of modified target nucleic acids at amplification sites, including: (a) providing an array including a plurality of amplification sites, where the amplification sites include two populations of capture nucleic acids immobilized to the amplification sites at the 5’ end, each population including a capture sequence, where a first population of capture nucleic acids includes at each amplification site a clonal population of a modified target nucleic acid, the 5’ end of the clonal population of the modified target nucleic acid attached to the 3’ end of the first population of capture nucleic acids, where the clonal population of the modified target nucleic acid at each amplification site is a member of a sequencing library, (b) contacting the plurality of amplification sites with a resynthesis reagent including (i) a composition including nucleotide triphosphates (NTPs), where the NTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog including a nucleobase, and (ii) a composition including a polymerase; and (c) reacting the resynthesis reagent to produce a plurality of re-populated amplification sites attached to the array, where the plurality of re-populated amplification sites each include a clonal population of a resynthesized target nucleic acid immobilized to the amplification sites at the 5’ end, where the clonal population of the resynthesized target nucleic acid includes a nucleic acid sequence that is a complement of the clonal population of the modified target nucleic acid of step (a). [00273] Aspect 48 is a method for reducing bias during resynthesis of target nucleic acids at amplification sites, including: (a) providing an array including a plurality of amplification sites, where the amplification sites include two populations of capture nucleic acids
immobilized to the amplification sites at the 5’ end, each population including a capture sequence, where a first population of capture nucleic acids includes at each amplification site a clonal population of a modified target nucleic acid, the 5’ end of the clonal population of the modified target nucleic acid attached to the 3’ end of the first population of capture nucleic acids, where a second population of capture nucleic acids includes (i) the complement of the clonal population of the modified target nucleic acid at each amplification site, the 5’ end of the complement of the clonal population of the modified target nucleic acid attached to the 3’ end of the second population of capture nucleic acids, and (ii) a cleavage site, where the clonal population of the modified target nucleic acid at each amplification site is a member of a sequencing library, (b) contacting the amplification sites with a cleavage agent, thereby cleaving the second population of capture nucleic acids, and releasing the clonal population of the modified target nucleic acid attached to the 3’ end of the second population of capture nucleic acids; (c) removing the released clonal population of the modified target nucleic acid attached to the 3’ end of the second population of capture nucleic acids from the amplification sites; (d) contacting the plurality of amplification sites with a resynthesis reagent including (i) a composition including nucleotide triphosphates (NTPs), where the NTPs include dATP, dTTP, dCTP, dGTP, and a dGTP analog including a nucleobase, and (ii) a composition including a polymerase; and (e) reacting the resynthesis reagent to produce a plurality of re-populated amplification sites attached to the array, where the plurality of re-populated amplification sites each include a clonal population of a resynthesized target nucleic acid immobilized to the amplification sites at the 5’ end, where the clonal population of the resynthesized target nucleic acid
includes a nucleic acid sequence that is a complement of the clonal population of the modified target nucleic acid of step (a). [00274] Aspect 49 is the method of any of Aspects 47, 48, or 50 to 55, where the array includes a flow cell. [00275] Aspect 50 is the method of any of Aspects 47 to 49 or 51 to 55, where the nucleobase of the dGTP analog is 7-deaza-dGTP or 7-deaza-dGTP substituted at the 7 position. [00276] Aspect 51 is the method of any of Aspects 47 to 50 or 52 to 55, where the 7-deaza-dGTP is 8-aza-7-deaza-dGTP or 8-aza-7-deaza-dGTP substituted at the 7 position.
[00277] Aspect 52 is the method of any of Aspects 47 to 51 or 53 to 55, where the nucleobase of the dGTP analog is substituted at the nitrogen of position 7. [00278] Aspect 53 is the method of any of Aspects 47 to 52 or 54 to 55, where the nucleobase of the dGTP analog is of Formula 5:
where J is C or N; where Z is C or N; where R1 is hydrogen, halo, alkyl, acyl, trihaloalkyl, cyano, sulfinyl, sulfonyl, or alkynyl; and R2 is hydrogen or halo. [00279] Aspect 54 is the method of any of Aspects 47 to 53 or 55, where the nucleobase of the dGTP analog is of Formula 6:
where R100 is hydrogen or a C1 to C6 alkyl. [00280] Aspect 55 is the method of any of Aspects 47 to 54, where the resynthesis reagent includes the dGTP analog at no greater than 50%, no greater than 40%, no greater than 30%, no greater than 20%, no greater than 10%, or no greater than 5% of the total amount of dGTP. [00281] Aspect 56 is a cartridge for use with a sequencing apparatus, the cartridge including: a first chamber including a nucleotide composition, the nucleotide composition including dATP, dTTP, dGTP, dCTP, and a dGTP analog including a nucleobase.
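Aspects 55 and 65 cap the dGTP analog as a percentage of the total amount of dGTP. A minimal Python sketch of the corresponding arithmetic follows. It assumes, as an interpretation not stated in the text, that the "total amount of dGTP" is the combined pool of natural dGTP plus analog; the 200 µM total and the function names are hypothetical and used for illustration only.

```python
"""Split a guanine-nucleotide pool into natural dGTP and a dGTP analog.

Assumption (not from the source): "total amount of dGTP" is read as the
combined pool of natural dGTP plus the dGTP analog.
"""

# Upper limits recited in Aspects 55 and 65, as percent of the total G pool.
RECITED_CEILINGS_PCT = (50, 40, 30, 20, 10, 5)


def split_g_pool(total_g_um: float, analog_pct: float) -> tuple[float, float]:
    """Return (dGTP_uM, analog_uM) for a given analog percentage."""
    if not 0 <= analog_pct <= 100:
        raise ValueError("analog_pct must be between 0 and 100")
    analog_um = total_g_um * analog_pct / 100.0
    return total_g_um - analog_um, analog_um


def ceilings_satisfied(analog_pct: float) -> list[int]:
    """List every recited ceiling that the given percentage does not exceed."""
    return [c for c in RECITED_CEILINGS_PCT if analog_pct <= c]


if __name__ == "__main__":
    # Hypothetical 200 uM total G pool with a 10% analog fraction.
    dgtp, analog = split_g_pool(200.0, 10.0)
    print(f"dGTP {dgtp:.1f} uM, analog {analog:.1f} uM")
    print("Within ceilings (%):", ceilings_satisfied(10.0))
```

Under this reading, a 10% analog fraction satisfies every recited ceiling except "no greater than 5%".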
[00282] Aspect 57 is the cartridge of Aspect 56, further including a flow cell, where the flow cell is releasably attached to the cartridge. [00283] Aspect 58 is a kit for use with a sequencing apparatus, the kit including: a cartridge, the cartridge including a first chamber including a nucleotide composition, the nucleotide composition including dATP, dTTP, dGTP, dCTP, and a dGTP analog including a nucleobase. [00284] Aspect 59 is the kit of Aspect 58, further including a flow cell, where the flow cell is configured to releasably attach to the cartridge. [00285] Aspect 60 is the cartridge or the kit of any of Aspects 58 to 59 or 61 to 65, where the nucleobase of the dGTP analog is 7-deaza-dGTP or 7-deaza-dGTP substituted at the 7 position. [00286] Aspect 61 is the cartridge or the kit of any of Aspects 58 to 60 or 62 to 65, where the 7-deaza-dGTP is 8-aza-7-deaza-dGTP or 8-aza-7-deaza-dGTP substituted at the 7 position. [00287] Aspect 62 is the cartridge or the kit of any of Aspects 58 to 61 or 63 to 65, where the nucleobase of the dGTP analog is substituted at the nitrogen of position 7. [00288] Aspect 63 is the cartridge or the kit of any of Aspects 58 to 62 or 64 to 65, where the nucleobase of the dGTP analog is of Formula 5:
wherein J is C or N; where Z is C or N; where R1 is hydrogen, halo, alkyl, acyl, trihaloalkyl, cyano, sulfinyl, sulfonyl, or alkynyl; and R2 is hydrogen or halo.
[00289] Aspect 64 is the cartridge or the kit of any of Aspects 58 to 63 or 65, where the nucleobase of the dGTP analog is of Formula 6:
where R100 is hydrogen or a C1 to C6 alkyl. [00290] Aspect 65 is the cartridge or the kit of any of Aspects 58 to 64, where the dGTP analog is present at no greater than 50%, no greater than 40%, no greater than 30%, no greater than 20%, no greater than 10%, or no greater than 5% of the total amount of dGTP. [00291] EXAMPLES [00292] The present disclosure is illustrated by the following examples. It is to be understood that the particular examples, materials, amounts, and procedures are to be interpreted broadly in accordance with the scope and spirit of the disclosure as set forth herein. [00293] Example 1 [00294] Assessing 7-deaza-dGTP and G-quadruplex formation [00295] To determine whether 7-deaza-dGTP reduces the formation of G-quadruplex (G4) structures, 7-deaza-dGTP was assessed in various steps prior to sequencing runs. DNA templates that included known sequences for weak and strong G4s were synthesized using either dGTP or 7-deaza-dGTP. The sequence for weak G4s was GGGCTGGGGGCGTGGGCACGTGGGGT (SEQ ID NO:1), and the sequence for strong G4s was GGGGGAGGGGGAGGGGGAGGGGGT (SEQ ID NO:2). Incorporation of ffC at the beginning of G4 regions by a polymerase described in US Patent No. 11,001,816 was then tested (FIG. 7A). While there was no difference in incorporation rates between the two templates for weak G4 regions, significantly more incorporation events were observed in 7-deaza-dGTP templates for strong G4 regions, particularly in the presence of K+ (FIG. 7B, C). This suggests that 7-deaza-dGTP could alleviate G4-related quality drops in sequencing. [00296] ffN incorporation kinetics by the polymerase were characterized with templates containing multiple 7-deaza-dGTP-derived (vs. dGTP-derived) nucleotides in the polymerase footprint. The incorporation kinetics were similar for dGTP and 7-deaza-dGTP templates (FIG. 8A), indicating that polymerase activity during sequencing remains unaffected. In a similar experiment, a polymerase used during clustering, Bsu, incorporated dGTP and 7-deaza-dGTP with comparable efficiency (FIG. 8B). [00297] Clustering experiments with cBot, an automated clonal cluster generator (Illumina, San Diego, CA), were successfully performed with a PhiX library on a HiSeqX V2.5 flow cell and clustering plate. A custom-made EPX1 reagent mix containing various 7-deaza-dGTP:dGTP ratios (e.g., 100% dGTP, 100% 7-deaza-dGTP, and 50% dGTP with 50% 7-deaza-dGTP) was used. First-base incorporation was used to visualize the clusters formed (FIG. 9A). Clustering was less efficient with higher 7-deaza-dGTP content (FIG. 9B), suggesting that a mixture of 7-deaza-dGTP and dGTP, rather than complete substitution, would be beneficial. [00298] Example 2 [00299] Sequencing metrics using 7-deaza-dGTP with Gen1 chemistry [00300] Sequencing metrics were characterized using various ratios of 7-deaza-dGTP in clustering on a 4-channel HiSeqX, followed by sequencing using ffNs having a 3'-O-azidomethyl blocking group at the 3'-OH of the nucleotide sugar moiety (referred to herein as "Gen1 chemistry"). HiSeqX clustering was set up as standard using Illumina reagents as specified by the manufacturer, and clustering was performed as described in Example 1. BacPac nano 450 was used as the library, with a PhiX spike-in.
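The weak and strong G4 templates of Example 1 (SEQ ID NOs: 1 and 2) each contain four G-tracts separated by short loops. The sketch below flags such motifs with the widely used "four runs of at least three G, separated by 1-7 nt loops" regular-expression heuristic. This is a swapped-in technique for illustration only, not the method used to define the known G4s in BacPac 450, and the function name `find_g4_motifs` is hypothetical.

```python
"""Flag candidate G-quadruplex motifs with a simple regular expression.

Swapped-in heuristic (not from the source): four G-tracts of >= 3 G
separated by loops of 1-7 nt. Illustrative only.
"""
import re

# G-rich strand: four runs of at least three G with 1-7 nt loops.
G4_PLUS = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")
# A C-rich run of the same shape marks a G4 on the opposite strand.
G4_MINUS = re.compile(r"C{3,}(?:[ACGTN]{1,7}C{3,}){3}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, strand) hits; end is exclusive."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    weak = "GGGCTGGGGGCGTGGGCACGTGGGGT"    # SEQ ID NO:1
    strong = "GGGGGAGGGGGAGGGGGAGGGGGT"    # SEQ ID NO:2
    for name, s in (("weak", weak), ("strong", strong)):
        print(name, find_g4_motifs(s))
```

The regex reports non-overlapping hits only and says nothing about G4 stability; it cannot distinguish the weak motif from the strong one, which in Example 1 was resolved experimentally.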
[00301] As depicted in FIG. 10, increasing the ratio of 7-deaza-dGTP to dGTP appears to decrease clustering efficiency, as reflected in reduced clusters passing filter (PF) and reduced intensity. However, ratios below 50% appear to have minimal impact on the primary metrics when compared to the custom EPX1 control.
[00302] Increasing the percent ratio of 7-deaza-dGTP was found to cause an associated increase in G-quadruplex coverage (a global coverage metric of all defined G4s within the BacPac library) (FIG. 11). Although increasing the ratio above 50% appears to decrease G4 coverage, this could be associated with a decrease in poly-G coverage. This is likely the result of a reduced ability of the currently selected enzymes to efficiently incorporate multiple 7-deaza-G bases successively, or of quenching by the incorporated 7-deaza-dGTP. This may be overcome by increasing clustering time, using new deaza variants, or optimizing the formulation. [00303] The data also show that a large proportion of the known G4s in BacPac 450 can be resolved with an optimized ratio of 7-deaza-dGTP. This ratio appears to lie within 25-50% 7-deaza-dGTP (FIG. 12). [00304] Example 3 [00305] Sequencing metrics using 7-deaza-dGTP with Gen2 chemistry [00306] Sequencing metrics were also characterized using various ratios of 7-deaza-dGTP in clustering on NextSeq2000, followed by sequencing using ffNs having a 3'-OH acetal blocking group or a 3'-OH thiocarbamate blocking group at the 3'-OH of the nucleotide sugar moiety (referred to herein as "Gen2 chemistry"). The NextSeq2000 was set up with modified reagents, a modified instrument configuration for basecalling, and a recipe adapted to the different chemistry type. The recipe was edited to remove the on-board incorporation mix mixing stages, a pre-mixed incorporation mix was placed directly into the cartridge, and the deblocking time was reduced. Standard clustering on the NextSeq2000 took a total of 60 min, comprising two pushes of ExAmp reagent, each followed by a static wait of 30 min. Clustering with analogs was performed using a custom recipe that increased the static wait of each push by 30 min, giving two pushes of 60 min each and an overall total of 120 min. 
The 7-deaza-dGTP was spiked into cartridges containing the existing ExAmp Clustering Reagent 1 (ECX1) to achieve various ratios. A control with purified water spiked into the ECX1 was also run.
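Because 7-deaza-dGTP was spiked into cartridges that already contain ECX1, the volume needed for a given ratio depends on how much dGTP the reagent already holds. A minimal sketch of that calculation follows. The ECX1 dGTP concentration, reagent volume, and analog stock concentration used here are hypothetical (none is given in the text), the ratio is taken as analog / (analog + dGTP), and the function names are illustrative only.

```python
"""Volume of analog stock to spike into an existing clustering reagent.

Hypothetical inputs: the dGTP concentration of ECX1, the reagent volume,
and the analog stock concentration are not given in the source. The
ratio is analog / (analog + dGTP); because the spike dilutes both
nucleotides equally, dilution does not change this ratio.
"""


def spike_volume_ul(reagent_ul: float, dgtp_um: float,
                    stock_um: float, target_pct: float) -> float:
    """Return the uL of analog stock that gives target_pct of the G pool."""
    if not 0 <= target_pct < 100:
        raise ValueError("target_pct must be in [0, 100)")
    f = target_pct / 100.0
    dgtp_pmol = dgtp_um * reagent_ul          # 1 uM x 1 uL = 1 pmol
    analog_pmol = dgtp_pmol * f / (1.0 - f)   # solve f = A / (A + G) for A
    return analog_pmol / stock_um             # pmol / (pmol/uL) = uL


def achieved_pct(reagent_ul: float, dgtp_um: float,
                 stock_um: float, spike_ul: float) -> float:
    """Analog percentage of the G pool after adding spike_ul of stock."""
    analog_pmol = stock_um * spike_ul
    dgtp_pmol = dgtp_um * reagent_ul
    return 100.0 * analog_pmol / (analog_pmol + dgtp_pmol)


if __name__ == "__main__":
    # Hypothetical: 1 mL ECX1 at 200 uM dGTP, 10 mM analog stock, 5% target.
    v = spike_volume_ul(1000.0, 200.0, 10_000.0, 5.0)
    print(f"spike {v:.3f} uL -> {achieved_pct(1000.0, 200.0, 10_000.0, v):.2f}%")
```

A volume-matched water spike, presumably the intent of the purified-water control, would apply the same dilution without adding analog.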
[00307] As with the primary metrics results previously described for HiSeq, on the NextSeq2k platform an increase in the percent ratio of 7-deaza-dGTP (to dGTP) results in a decrease in primary metric quality (Table 2). Unlike on HiSeq, the decrease in primary metric quality is less associated with decreased clustering efficiency, as seen from the relatively stable %PF, and instead correlates with the decrease in intensity in both the blue and green channels. This is thought to result from 7-deaza-dGTP quenching the ffC cloud, which in this system comprises a mix of ffCs labeled with blue and green dyes ('dual' cloud pairing). The increased quenching at higher % ratios of 7-deaza-dGTP (to dGTP) correlates directly with the decrease in primary metric quality. However, the impact on %PF, %ER, and %Q30 appears to be minimal with a 5% 7-deaza-dGTP spike-in. The variation in the intensity values may also arise from other variables. [00308] Table 2. Primary metrics analysis of increasing % ratios of 7-deaza-dGTP (to dGTP) on NextSeq2k with Gen2 chemistry. Columns: Treatment; %PF; %ER; %Q30; Cycle 1 Intensity, Green; Cycle 1 Intensity, Blue.
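Example 3 also reports the mean percentage of soft-clipped bases within known G4 regions. The text does not state how this metric was computed, so the sketch below shows one plausible way under stated assumptions: reads are supplied as (0-based reference start, CIGAR string) pairs, a read counts toward a region when its aligned reference span overlaps the region, and the percentage is soft-clipped bases over all read bases of those reads. All function names are hypothetical.

```python
"""Percent soft-clipped bases over reads overlapping a region.

Assumptions (not from the source): reads are (start, CIGAR) pairs with
0-based reference starts; a read "covers" a region if its aligned
reference span overlaps it; percentage = soft-clipped / all read bases.
"""
import re
from statistics import mean

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
READ_OPS = set("MIS=X")   # operations that consume read bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def ref_end(start: int, cigar: str) -> int:
    """Exclusive reference end of the aligned portion of a read."""
    return start + sum(n for n, op in parse_cigar(cigar) if op in REF_OPS)


def pct_softclipped(reads, region_start: int, region_end: int) -> float:
    """Percent soft-clipped bases among reads overlapping [start, end)."""
    clipped = total = 0
    for start, cigar in reads:
        if start < region_end and ref_end(start, cigar) > region_start:
            ops = parse_cigar(cigar)
            clipped += sum(n for n, op in ops if op == "S")
            total += sum(n for n, op in ops if op in READ_OPS)
    return 100.0 * clipped / total if total else 0.0


def mean_pct_softclipped(reads, regions) -> float:
    """Mean of the per-region percentages across a set of G4 regions."""
    return mean(pct_softclipped(reads, s, e) for s, e in regions)
```

Plain tuples keep the sketch self-contained; in practice the reads would come from an alignment file, for example via a BAM-reading library.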
[00309] Notably, analysis of the BaseSpace Sequence Hub SSE metrics demonstrates that even with just a 5% 7-deaza-dGTP spike-in, G-quadruplex coverage increased by approximately 1X, as seen in FIG. 13. [00310] In addition, analysis of the percentage of bases that are soft-clipped (due to errors and/or miscalls) within the covered regions of known G-quadruplex sequences in BacPac 450 demonstrates that adding a mere 5% 7-deaza-dGTP (to dGTP) results in a substantial reduction in the mean % soft-clipping across a population of 50 G4s on NextSeq2k with Gen2 chemistry
(FIG. 14). Although the data in FIG. 15 suggest that the 20% 7-deaza-dGTP spike provides the best G-quadruplex resolution, its impact on the primary metrics suggests that the optimal ratio of 7-deaza-dGTP (to dGTP) may actually fall in the range of 1-10% on this platform with Gen2 chemistry. [00311] Example 4 [00312] Reduction in misincorporation rates [00313] An unexpected and surprising reduction in misincorporation rates was observed during the initial radiochemistry kinetics gel assays. While ffCs were efficiently incorporated against both dGTP and 7-deaza-dGTP (FIG. 16A), incorporation of ffTs against 7-deaza-dGTP was much lower than against dGTP (FIG. 16B). This reduction in misincorporation rates could provide a further improvement in sequencing quality. [00314] Example 5 [00315] Use in Bridge Amplification (BridgeAmp) and Exclusion Amplification (ExAmp) [00316] Previous investigations using 7-deaza-dGTP for clustering by BridgeAmp suggested that 7-deaza-dGTP yielded less product than dGTP. It has now been determined that the apparently low yield was, in part, due to the reagent sets utilized: the previous experiments were conducted with BridgeAmp clustering, whereas the experiments disclosed here were conducted with ExAmp reactions. It was also observed that DNA templates containing 7-deaza-dGTP do not stain well, giving rise to an illusion of low yield. In FIG. 17, the SYBR Gold-stained gel appeared to show more product formed with dGTP than with 7-deaza-dGTP. However, the same gel imaged in the FAM channel (with a 5’ FAM-labeled primer) before SYBR Gold staining showed that the same amount of product was formed with both G nucleotides. The seemingly lower yield was due to the quenching effect of 7-deaza-dGTP. [00317] Example 6 [00318] Use of other dGTP analogs with Gen2 chemistry
[00319] Resolution of G4s was determined using other dGTP analogs. Various ratios of dGTP analogs were used in clustering with ExAmp reagent on NextSeq2000, followed by sequencing with ffNs having Gen2 chemistry as described in Example 3. A large proportion of the known G4s in BacPac 450 can be resolved with 20% 7-deaza-CF3SO2-dGTP (FIG. 18), 10% 7-Iodo-dGTP (FIG. 19), 10% 7-deaza-CF3-dGTP (FIG. 20), 20% 7-deaza-7-CN-dGTP (FIG. 21), 10% 7-deaza-7-F-dGTP (FIG. 22), or 10% 7-deaza-MeSO2-dGTP (FIG. 23). Further experimental work with 7-deaza-MeSO2-dGTP at levels of 5%, 10%, 20%, 30%, and 50% showed no deleterious effects on primary metrics, including percent error rate for both Read 1 and Read 2, percent mismatched bases, and G4 callability of autosomes. [00320] 8-aza-7-deaza-dGTP was used in clustering on NextSeq2000 followed by sequencing with ffNs having Gen2 chemistry as described in Example 3, but using BridgeAmp clustering reagent instead of ExAmp reagent. Ten percent 8-aza-7-deaza-dGTP resolved a known G4 (FIG. 24). [00321] 7-paraG-dGTP was used in clustering on iSeq100 followed by sequencing with ffNs having Gen2 chemistry and the default ExAmp reagent. Ten percent 7-paraG-dGTP resolved known G4s (FIG. 25). [00322] Example 7 [00323] Synthesis of various dGTP analogs [00324] Various dGTP analogs were synthesized according to the following procedures. [00325] 7-deaza-7-trifluoromethyl-dGTP (7-deaza-7-CF3-dGTP) [00326] 7-deaza-7-CF3-dGTP was synthesized according to the synthetic scheme in FIG. 26. 7-Deaza-7-iodoguanosine (compound 1) was treated with tert-butyldiphenylsilyl chloride (TBDPSCl) to install the TBDPS protecting group on the 5ʹ hydroxyl, affording intermediate 2. Intermediate 2 was treated with isobutyryl chloride to protect the 3ʹ alcohol and the primary amine, yielding intermediate 3. Intermediate 3 was treated with methyl 2,2-difluoro-2-(fluorosulfonyl)acetate and copper iodide to install the CF3 group at the 7 position, yielding
intermediate 4. The 5ʹ hydroxyl protecting group was removed using tetra-n-butylammonium fluoride (TBAF); the 3ʹ protecting group was removed using methylamine; the protecting group on the primary amine was removed using methylamine; and triphosphate synthesis was accomplished using phosphorus oxychloride (POCl3) and tetra-n-butylammonium pyrophosphate to afford 7-deaza-7-CF3-dGTP. [00327] 7-deaza-7-methylsulfonyl-dGTP (7-deaza-7-SO2Me-dGTP) [00328] 7-deaza-7-SO2Me-dGTP was synthesized according to the synthetic scheme in FIG. 27. 7-Deaza-7-iodoguanosine (compound 1) was treated with tert-butyldiphenylsilyl chloride (TBDPSCl) to install the TBDPS protecting group on the 5ʹ hydroxyl, affording intermediate 2. Intermediate 2 was treated with isobutyryl chloride to protect the 3ʹ alcohol and the primary amine, yielding intermediate 3. Intermediate 3 was treated with sodium methanesulfinate and copper iodide to install the SO2Me group at the 7 position, yielding intermediate 4. The 5ʹ hydroxyl protecting group was removed using tetra-n-butylammonium fluoride (TBAF); the 3ʹ protecting group was removed using methylamine; the protecting group on the primary amine was removed using methylamine; and triphosphate synthesis was accomplished using phosphorus oxychloride (POCl3) and tetra-n-butylammonium pyrophosphate to afford 7-deaza-7-SO2Me-dGTP. [00329] 7-deaza-7-cyano-dGTP (7-deaza-7-CN-dGTP) [00330] 7-deaza-7-CN-dGTP was synthesized according to the synthetic scheme in FIG. 28. 7-Deaza-7-iodoguanosine (compound 1) was treated with tert-butyldiphenylsilyl chloride (TBDPSCl) to install the TBDPS protecting group on the 5ʹ hydroxyl, affording intermediate 2. Intermediate 2 was treated with isobutyryl chloride to protect the 3ʹ alcohol and the primary amine, yielding intermediate 3. Intermediate 3 was treated with copper cyanide to install the CN group at the 7 position, yielding intermediate 4. 
The 5ʹ hydroxyl protecting group was removed using tetra-n-butylammonium fluoride (TBAF); the 3ʹ protecting group was removed using methylamine; the protecting group on the primary amine was removed using methylamine; and triphosphate synthesis was accomplished using phosphorus oxychloride (POCl3) and tetra-n-butylammonium pyrophosphate to afford 7-deaza-7-CN-dGTP.
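The procedures for the 7-CF3, 7-SO2Me, and 7-CN analogs (FIGS. 26-28) follow the same route and differ only in the reagent used to functionalize the 7 position. As an illustrative aid, the shared sequence can be captured in a small Python data structure. Step descriptions paraphrase the text above, and the names `Step`, `route`, and `C7_INSTALLATION` are hypothetical.

```python
"""Shared route for the 7-substituted analogs of FIGS. 26-28.

A descriptive data structure only; step wording paraphrases the text,
and all names here are hypothetical.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    reagents: str
    purpose: str


PROTECT = [
    Step("TBDPSCl", "protect the 5'-OH as a TBDPS ether (intermediate 2)"),
    Step("isobutyryl chloride",
         "protect the 3'-OH and the primary amine (intermediate 3)"),
]

DEPROTECT_AND_PHOSPHORYLATE = [
    Step("TBAF", "remove the 5'-O-TBDPS group"),
    Step("methylamine", "remove the isobutyryl protecting groups"),
    Step("POCl3, then tetra-n-butylammonium pyrophosphate",
         "install the 5'-triphosphate"),
]

# Only the 7-position functionalization differs between these routes.
C7_INSTALLATION = {
    "7-deaza-7-CF3-dGTP": "methyl 2,2-difluoro-2-(fluorosulfonyl)acetate + CuI",
    "7-deaza-7-SO2Me-dGTP": "sodium methanesulfinate + CuI",
    "7-deaza-7-CN-dGTP": "copper cyanide",
}


def route(target: str) -> list[Step]:
    """Ordered steps from 7-deaza-7-iodoguanosine to the target analog."""
    c7 = Step(C7_INSTALLATION[target],
              "install the 7-substituent (intermediate 4)")
    return PROTECT + [c7] + DEPROTECT_AND_PHOSPHORYLATE


if __name__ == "__main__":
    for i, step in enumerate(route("7-deaza-7-CN-dGTP"), start=1):
        print(f"{i}. {step.reagents}: {step.purpose}")
```

Keeping the common protection and deprotection steps separate from the 7-position step makes the single point of variation explicit; an unlisted target raises `KeyError`.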
[00331] 7-deaza-7-fluoro-dGTP (7-deaza-7-F-dGTP) [00332] 7-deaza-7-F-dGTP was synthesized according to the synthetic scheme in FIG. 29. 7-Deaza-6-chloroguanine having a protected amine (compound 1) was treated with SELECTFLUOR to install the fluoro group at the 7 position of the protected guanine, affording intermediate 2. Intermediate 2 was treated with a protected 1-chloro-2-deoxyribose (1-chloro-2-deoxy-3,5-di-O-toluoyl-α-D-ribofuranose) to covalently couple the sugar to the nucleobase, affording intermediate 3. Intermediate 3 was treated with sodium methoxide to replace the chloro group with a methyl ether and to remove the alcohol protecting groups, yielding intermediate 4. Intermediate 4 was treated with sodium hydroxide (2 M) to convert the methyl ether to a ketone, yielding intermediate 5. Triphosphate synthesis was accomplished using phosphorus oxychloride (POCl3) and tetra-n-butylammonium pyrophosphate to afford 7-deaza-7-F-dGTP. [00333] 7-deaza-7-chloro-dGTP (7-deaza-7-Cl-dGTP) [00334] 7-deaza-7-Cl-dGTP was synthesized according to the synthetic scheme in FIG. 30. 7-Deazaguanosine (compound 1) was treated with isobutyryl chloride to install 5ʹ and 2ʹ hydroxyl protecting groups, affording intermediate 2. Intermediate 2 was treated with N-chlorosuccinimide to install the Cl group at the 7 position, yielding intermediate 3. The 2ʹ and 5ʹ hydroxyl protecting groups were removed using methylamine (intermediate 4), and triphosphate synthesis was accomplished using phosphorus oxychloride (POCl3) and tetra-n-butylammonium pyrophosphate to afford 7-deaza-7-Cl-dGTP. [00335] 7-deaza-7,8-dichloro-dGTP (7-deaza-7,8-diCl-dGTP) [00336] 7-deaza-7,8-diCl-dGTP was synthesized according to the synthetic scheme in FIG. 31. 7-Deazaguanosine (compound 1) was treated with tert-butyldiphenylsilyl chloride (TBDPSCl) to install the TBDPS protecting group on the 5ʹ hydroxyl, affording intermediate 2. 
Intermediate 2 was treated with isobutyryl chloride to install the 2ʹ hydroxyl protecting group to afford intermediate 3. Intermediate 3 was treated with N-chlorosuccinimide to install the Cl group at the 7 position and the 8 position yielding intermediate 4. The 5ʹ hydroxyl protecting group was removed using tetra-n-butylammonium fluoride (TBAF) (intermediate
5); triphosphate synthesis was accomplished using phosphorus oxychloride (POCl3) and tetra-n-butylammonium pyrophosphate; and the 2ʹ protecting group was removed using methylamine to afford 7-deaza-7,8-diCl-dGTP. [00337] 7-deaza-7-trifluoromethylsulfoxide-dGTP (7-deaza-7-SOCF3-dGTP) and 7-deaza-7-trifluoromethylsulfone-dGTP (7-deaza-7-SO2CF3-dGTP) [00338] 7-deaza-7-SOCF3-dGTP and 7-deaza-7-SO2CF3-dGTP were synthesized according to the synthetic scheme in FIG. 32. 7-Deaza-7-iodoguanosine (compound 1) was treated with tert-butyldiphenylsilyl chloride (TBDPSCl) to install the TBDPS protecting group on the 5ʹ hydroxyl, affording intermediate 2. Intermediate 2 was treated with isobutyryl chloride to protect the 3ʹ alcohol and the primary amine, yielding intermediate 3. Intermediate 3 was treated with copper(I) trifluoromethanethiolate to install the SCF3 group at the 7 position, yielding intermediate 4. Intermediate 4 was oxidized using meta-chloroperbenzoic acid (m-CPBA), yielding intermediates 5 and 6. The 5ʹ hydroxyl protecting group was removed using tetra-n-butylammonium fluoride (TBAF) (giving intermediates 7 and 8); the 3ʹ protecting group was removed using methylamine; the protecting group on the primary amine was removed using methylamine; and triphosphate synthesis was accomplished using phosphorus oxychloride (POCl3) and tetra-n-butylammonium pyrophosphate to afford 7-deaza-7-SOCF3-dGTP and 7-deaza-7-SO2CF3-dGTP. [00339] 7-deaza-7-acetyl-dGTP (7-deaza-7-Ac-dGTP) [00340] 7-deaza-7-Ac-dGTP was synthesized according to the synthetic scheme in FIG. 33. 7-Deaza-7-iodoguanosine (compound 1) was treated with trimethylsilylacetylene, tetrakis(triphenylphosphine)palladium, copper iodide, and triethylamine in dichloromethane to add the silyl alkyne at the 7 position, affording intermediate 2. Intermediate 2 was treated with potassium carbonate in methanol to remove the silyl group, affording intermediate 3. 
Intermediate 3 was treated with sulfuric acid in a water/methanol mixture to hydrate the alkyne to an acetyl group, affording intermediate 4. Triphosphate synthesis was accomplished using phosphorus oxychloride (POCl3) and tetra-n-butylammonium pyrophosphate to afford 7-deaza-7-Ac-dGTP.
[00341] 7-N-methyl-dGTP (7-NMe-dGTP) [00342] 7-N-methyl-dGTP was synthesized according to the synthetic scheme in FIG.34. dGTP was treated with dimethyl sulfate yielding 7-N-methyl-dGTP. [00343] The complete disclosure of all patents, patent applications, and publications, and electronically available material (including, for instance, nucleotide sequence submissions in, e.g., GenBank and RefSeq, and amino acid sequence submissions in, e.g., SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq) cited herein are incorporated by reference in their entirety. Supplementary materials referenced in publications (such as supplementary tables, supplementary figures, supplementary materials and methods, and/or supplementary experimental data) are likewise incorporated by reference in their entirety. In the event that any inconsistency exists between the disclosure of the present application and the disclosure(s) of any document incorporated herein by reference, the disclosure of the present application shall govern. The foregoing detailed description and examples have been given for clarity of understanding only. No unnecessary limitations are to be understood therefrom. The disclosure is not limited to the exact details shown and described, for variations obvious to one skilled in the art will be included within the disclosure defined by the claims. [00344] Unless otherwise indicated, all numbers expressing quantities of components, molecular weights, and so forth used in the specification and claims are to be understood as being modified in all instances by the term "about." Accordingly, unless otherwise indicated to the contrary, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by the present disclosure. 
At the very least, and not as an attempt to limit the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. [00345] Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the disclosure are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. All numerical values, however, inherently contain a
range necessarily resulting from the standard deviation found in their respective testing measurements. [00346] All headings are for the convenience of the reader and should not be used to limit the meaning of the text that follows the heading, unless so specified.