WO2025149478A1

WO2025149478A1 - Compositions of modified nucleoside triphosphates

Info

Publication number: WO2025149478A1
Application number: PCT/EP2025/050239
Authority: WO
Inventors: Drew GOODMAN; Aaron Jacobs; Mark Stamatios Kokoris; Tylor LEHMANN; Matthew Lopez; Melud Nabavi; Dylan O'CONNELL; John C. Tabone
Original assignee: F Hoffmann La Roche AG; Roche Sequencing Solutions Inc
Current assignee: F Hoffmann La Roche AG; Roche Sequencing Solutions Inc
Priority date: 2024-01-12
Filing date: 2025-01-07
Publication date: 2025-07-17
Anticipated expiration: 2026-07-12

Abstract

The invention relates to a diastereomer of nucleoside triphosphates suitable for use in sequencing by expansion. The diastereomer provides a better acceptance and incorporation by a DNA polymerase and better performance in sequencing by expansion workflows. The invention also relates to sequencing methods using the diastereomer of the nucleoside triphosphates.

Description

COMPOSITIONS OF MODIFIED NUCLEOSIDE TRIPHOSPHATES

SEQUENCE LISTING INCORPORATION BY REFERENCE

[0001] This application hereby incorporates by reference a sequence listing submitted herewith in a computer-readable format.

FIELD OF THE INVENTION

[0002] The present invention relates to compositions comprising an excess of a specific diastereomer of nucleoside triphosphates with a 5’ phosphoramidate. The invention also relates to methods for generating a complementary strand and for sequencing using the compositions.

BACKGROUND

[0003] Over the last two decades, biological membranes have emerged as an important tool in a variety of biomedical applications. This includes the use of lipid bilayer membranes in nanopore based sequencing applications, where nanopores provide a constant and reproducible physical aperture, through which a target molecule can be directed and sequenced.

[0004] One approach for nanopore-based sequencing of, for example, nucleic acids involves a sequencing-by-expansion approach by transcribing the sequence of nucleic acids into a simple to measure polymer molecule called an Xpandomer. Much like with polymerase chain reaction (PCR), Xpandomer synthesis is based on the natural function of DNA replication where expandable nucleoside triphosphates (XNTPs) act as substrates for replication.

[0005] Xpandomer synthesis is based on four easily differentiated XNTPs that include High Signal-to-Noise Reporters, one for each DNA base. Engineered polymerases incorporate these modified nucleotides into Xpandomers, producing a copy of the target nucleic acid template from the library. As the Xpandomer molecule transits through the nanopore, the distinct electrical signal of each base reporter is easily identifiable to enable highly accurate and high throughput nanopore-based nucleic acid sequencing. See, e.g., U.S. Pat. No. 7,939,259, titled “High Throughput Nucleic Acid Sequencing by Expansion;” and PCT publication WO 2020/236526 A1, titled “Translocation control elements, reporter codes, and further means for translocation control for use in nanopore sequencing”, both of which are hereby incorporated herein in their entirety.

[0006] Modified nucleoside triphosphates with two clickable (e.g. terminal alkyne) groups, such as dNTP-2c, are used as building blocks for reagents in nanopore sequencing, especially within the technology of sequencing by expansion. Within this technology, such building blocks are typically clicked to tethers that contain reporter and translocation control elements in order to generate XNTPs. Structures and processes of sequencing by expansion and reagents used therein are disclosed in WO 2016/081871, WO 2020/236526 and WO 2020/172479.

[0007] The synthesis of dNTP-2c has previously been conducted by solid-phase synthesis using commercially available DNA/RNA-synthesizers with a proprietary synthetic method. Since the α-phosphoramidate in dNTP-2c (and XNTPs derived therefrom) is chiral, dNTP-2c (and XNTPs derived therefrom) are obtained as 1:1 diastereomeric mixtures of two isomers in this process.

SUMMARY OF THE INVENTION

[0008] The disclosure relates to a nucleoside triphosphate with a 5’ phosphoramidate that may be useful in the field of sequencing by expansion. The α-phosphoramidate in such nucleoside triphosphates is chiral and can have two distinct stereoconfigurations:

[0009] The present inventors have surprisingly found that one stereoconfiguration (“active” isomer) of XNTPs provides the desired functional performance in sequencing by expansion due to better polymerase incorporation and Xpandomer production compared to the other, “inactive” isomer or a mixture of both. As shown in the Examples, only the active isomer allows generation of full-length complementary strand (Xpandomer) product. Surprisingly, the inactive isomer not only fails to yield any full-length product, but its presence in a mixture with the active isomer also has a negative impact on the yield of full-length product.

[0010] The specific use of a composition comprising an excess of the active isomer over the inactive isomer (e.g. in which at least 80%, preferably at least 90% or even 100% of the nucleoside triphosphates represents the active isomer) thus enables efficient and cost-effective Xpandomer production for sequencing by expansion. Based on these findings, it is desirable to remove the inactive isomer from the final solution used in Xpandomer synthesis, or even to avoid its production in the first place. The disclosure thus provides a composition comprising a nucleoside triphosphate with a chiral 5’ phosphoramidate, wherein one isomer is present in excess over the other.

[0011] Exemplary embodiments of the disclosure are as follows:

[0012] 1. A composition comprising nucleoside triphosphates having the structure: wherein NB is a nucleobase; R¹ comprises or consists of a hydrocarbon; R² is independently H, OH or any 2’-ribose modification; R³ is H or any protecting group; and R⁴ comprises or consists of a hydrocarbon; G¹ and G² independently represent terminal clickable groups; L¹ and L² independently represent linking groups; and T is a tether molecule; wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have the following stereoconfiguration at the α-phosphoramidate:

[0013] 2. The composition of item 1, wherein at least 90% of the nucleoside triphosphates have the following stereoconfiguration at the α-phosphoramidate:

[0014] 3. The composition of item 1 or 2, wherein 100% of the nucleoside triphosphates have the following stereoconfiguration at the α-phosphoramidate:

[0015] 4. The composition of any one of the preceding items, wherein NB is selected from cytosine, thymine, 7-deazaadenine and 7-deazaguanine.

[0016] 5. The composition of any one of the preceding items, wherein R¹ is attached to position 5 of the nucleobase when the nucleobase is a pyrimidine nucleobase, and to position 7 of the nucleobase when the nucleobase is a purine nucleobase.

[0017] 6. The composition of any one of the preceding items, wherein NB has one of the following structures:

[0018] 7. The composition of any one of the preceding items, comprising a mixture of four different types of nucleoside triphosphates.

[0019] 8. The composition of item 7, wherein the four different types of nucleoside triphosphates comprise four different types of nucleobases.

[0020] 9. The composition of item 7 or 8, wherein the four different types of nucleoside triphosphates base pair with guanine, adenine, thymine and cytosine, respectively.

[0021] 10. The composition of any one of items 7-9, wherein the four different types of nucleoside triphosphates comprise the four different types of nucleobases of item 6, respectively.

[0022] 11. The composition of any one of the preceding items, wherein R¹ comprises or consists of an unsaturated hydrocarbon.

[0023] 12. The composition of any one of the preceding items, wherein R¹ consists of a hydrocarbon, such as an alkynyl.

[0024] 13. The composition of any one of the preceding items, wherein R¹ is acyclic.

[0025] 14. The composition of any one of the preceding items, wherein R¹ is linear.

[0026] 15. The composition of any one of the preceding items, wherein R¹ comprises 1-20 carbon atoms, such as 1-10 carbon atoms or 5-10 carbon atoms, such as 6 or 8 carbon atoms.

[0027] 16. The composition of any one of the preceding items, wherein R¹ is a hexa-1-ynyl or octa-1-ynyl group.

[0028] 17. The composition of any one of the preceding items, wherein R¹ with G¹ is an octa-1,7-diynyl or a deca-1,9-diynyl group.

[0029] 18. The composition of any one of the preceding items, wherein both R² are H.

[0030] 19. The composition of any one of the preceding items, wherein R³ is H.

[0031] 20. The composition of any one of the preceding items, wherein R⁴ comprises or consists of a saturated hydrocarbon.

[0032] 21. The composition of any one of the preceding items, wherein R⁴ comprises 1-20 carbon atoms, such as 3-15 carbon atoms or 3-10 carbon atoms, such as 4 carbon atoms.

[0033] 22. The composition of any one of the preceding items, wherein R⁴ is acyclic.

[0034] 23. The composition of any one of the preceding items, wherein R⁴ is linear.

[0035] 24. The composition of any one of the preceding items, wherein R⁴ consists of a hydrocarbon.

[0036] 25. The composition of any one of the preceding items, wherein R⁴ is an n-butyl group.

[0037] 26. The composition of any one of the preceding items, wherein R⁴ with G² is a hex-5-ynyl group.

[0038] 27. The composition of any one of items 1-23, wherein R⁴ comprises or consists of two or more hydrocarbons that are linked by an atom or group of atoms other than carbon, such as a phosphorus atom and/or an oxygen atom.

[0039] 28. The composition of any one of the preceding items, wherein the terminal clickable group is a terminal alkyne group or a terminal azide group, preferably a terminal alkyne group.

[0040] 29. The composition of any one of the preceding items, wherein G¹ and G² represent the same type of terminal clickable group.

[0041] 30. The composition of any one of the preceding items, wherein L¹ and L² independently represent linking groups formed via click reactions.

[0042] 31. The composition of any one of the preceding items, wherein each of L¹ and L² is a 1,2,3-triazole.

[0043] 32. The composition of any one of the preceding items, wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have a structure selected from the following structures:

[0044] 33. The composition of any one of the preceding items, wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have a structure selected from the following structures:

[0045] 34. The composition of any one of items 1-29, wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have a structure selected from the following structures:

[0046] 35. The composition of any one of items 1-29, wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have a structure selected from the following structures:

[0047] 36. A method for generating a complementary strand to a nucleic acid, comprising contacting the nucleic acid with the composition of any one of items 1-33.

[0048] 37. The method of item 36, wherein the nucleic acid is comprised in a library of nucleic acids.

[0049] 38. The method of item 36 or 37, wherein the nucleic acid is a DNA.

[0050] 39. The method of any one of items 36-38, wherein the composition further comprises a nucleic acid polymerase.

[0051] 40. The method of any one of items 36-39, wherein the composition further comprises a buffering agent, such as TrisCl, and/or a polymerase cofactor, such as MnCl2.

[0052] 41. The method of any one of items 36-40, comprising hybridizing a primer to the nucleic acid, at the same time as or followed by contacting the nucleic acid with the composition.

[0053] 42. A method for determining the sequence of a nucleic acid, comprising the following steps in order:

1) Generating a complementary strand to the nucleic acid by the method of any one of items 36-41;

2) Selectively cleaving the P-N bond within the nucleoside triphosphate to generate an expanded complementary strand;

3) Sequencing the expanded complementary strand;

4) Determining the sequence of the nucleic acid based on the sequence of the expanded complementary strand.

[0054] 43. The method of item 42, wherein the P-N bond is selectively cleaved in step 2) under acidic conditions.

[0055] 44. The method of item 42 or 43, wherein the expanded complementary strand is sequenced in step 3) by nanopore-based sequencing.

[0056] 45. The method of item 44, wherein the nanopore-based sequencing comprises:

(a) providing a chip for nanopore-based sequencing comprising:

(i) an electrochemically resistive barrier disposed over an aperture on a surface of the chip, wherein the barrier separates a cis side from a trans side;

(ii) a nanopore inserted into the barrier, wherein the nanopore has an entrance side on the cis side of the barrier and an exit side on the trans side of the barrier;

(b) contacting the cis side of the barrier with the expanded complementary strand;

(c) applying a voltage across the barrier of the chip to translocate the expanded complementary strand to the trans side;

(d) determining one or more changes in an electrical characteristic of the nanopore associated with occupation of the nanopore by the expanded complementary strand during the translocation; and

(e) determining, based on the one or more changes in the electrical characteristic of the nanopore, a sequence for the expanded complementary strand.

BRIEF DESCRIPTION OF THE DRAWINGS

[0057] FIG. 1: HPLC chromatograms showing two separate peaks for the active vs. inactive diastereomers for four different dNTP-2c molecules.

[0058] FIG. 2: Graphical representation of data selected from Table 2. Ratio: the ratio of active vs. inactive isomer that is present during Xpandomer synthesis; % Full-length: percentage of full-length complementary strand among all complementary strand products detected.

[0059] FIG. 3: Gel electrophoresis after Xpandomer synthesis using active or inactive isomer. A full-length Xpandomer product is obtained using the active isomer (lane 37), whereas no full-length Xpandomer product is obtained using the inactive isomer (lane 38).

DETAILED DESCRIPTION OF THE INVENTION

[0060] The invention will now be described in detail by way of reference only using the following definitions and examples. All patents and publications, including all sequences disclosed within such patents and publications, referred to herein are expressly incorporated by reference.

[0061] Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton (Singleton et al., Dictionary of microbiology and molecular biology, 2nd ed., 1994, John Wiley and Sons, New York), Hale (Hale and Marham, The Harper Collins dictionary of biology, 1991, Harper Perennial, NY) and Walker (Walker and Cox, The Language of Biotechnology: A Dictionary of Terms. 1988, American Chemical Society, Washington, D.C. ISBN-0-8412-1499-1) provide one of skill with a general dictionary of many of the terms used in this invention. Practitioners are particularly directed to Sambrook (Sambrook et al., Molecular cloning: A laboratory manual, 1989, Cold Spring Harbor Laboratory Press), and Ausubel (Ausubel et al., Current protocols in molecular biology, 1993, John Wiley & Sons, Inc.), for definitions and terms of the art. It is to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary.

[0062] As used herein, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.

[0063] In the structures shown herein, when not all natural valencies of an atom are filled by named groups, it should be understood that the unfilled valencies are filled by hydrogen. When a wavy line in a structure intersects a bond, then the intersected bond is the location where the structure joins to the remainder of a molecule.

[0064] When a structure depicts a molecule with one or more negatively charged oxygens, the structure likewise encompasses the molecule with the oxygen(s) in conjunction with H+ and/or any organic or inorganic cations. When a structure depicts a molecule with one or more hydroxyl groups, the structure likewise encompasses the molecule with the oxygen(s) from the hydroxyl group(s) in conjunction with H+ and/or any organic or inorganic cations.

[0065] Reference throughout this specification to "one embodiment" or "an embodiment" and variations thereof means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

[0066] Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.

[0067] The headings provided herein are not limitations of the various aspects or embodiments of the invention, which can be had by reference to the specification as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole.

I. Terms

[0068] Percent identity: The term “% identity” in the context of nucleic acid or amino acid sequences refers to the level of sequence identity between a nucleic acid sequence and a reference nucleic acid sequence or between an amino acid sequence and a reference amino acid sequence, when aligned using a sequence alignment program. For example, as used herein, 80% identity indicates that a sequence has greater than 80% sequence identity over a length of the reference sequence. Exemplary levels of sequence identity include, but are not limited to, 80% or more, 85% or more, 90% or more, 95% or more, and 98% or more sequence identity to a reference sequence, e.g., the wildtype sequence for any one of the polypeptides described herein. Exemplary computer programs which can be used to determine identity between two sequences include, but are not limited to, the suite of BLAST programs, e.g., BLASTN, BLASTX, and TBLASTX, BLASTP and TBLASTN, publicly available on the Internet. See also, Altschul I and Altschul II. Sequence searches are typically carried out using the BLASTN program when evaluating a given nucleic acid sequence relative to nucleic acid sequences in the GenBank DNA Sequences and other public databases. The BLASTX program may be used for searching nucleic acid sequences that have been translated in all reading frames against amino acid sequences in the GenBank Protein Sequences and other public databases. The BLASTP program may be used for searching amino acid sequence against amino acid sequences in the GenBank Protein Sequences and other public databases. All of BLASTN, BLASTX and BLASTP are run using default parameters of an open gap penalty of 11.0, and an extended gap penalty of 1.0, and utilize the BLOSUM-62 matrix. (See, e.g., Altschul II).
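As an illustration of the “% identity” term only, the following minimal sketch computes percent identity from two sequences that have already been aligned by a separate program. It is not the BLAST or CLUSTAL-W calculation. The function name, the "-" gap symbol, and the choice of the reference length as denominator (following the wording “over a length of the reference sequence”) are assumptions made for this example.

```python
def percent_identity(aligned_query: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent identity of a query relative to a reference, given a pairwise alignment.

    Assumption: both strings are rows of the same alignment (equal length),
    with `gap` marking inserted gap positions.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")

    # Count alignment columns in which both rows carry the same residue.
    # A gap never counts as a match.
    matches = sum(
        1
        for q, r in zip(aligned_query, aligned_reference)
        if q == r and q != gap
    )

    # Reference length = alignment columns that are not gaps in the reference row.
    reference_length = sum(1 for r in aligned_reference if r != gap)
    if reference_length == 0:
        raise ValueError("reference contains no residues")

    return 100.0 * matches / reference_length


if __name__ == "__main__":
    # Hypothetical aligned pair: five of six reference positions match.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))
```

Alignment programs may report identity over the alignment length instead of the reference length, so the denominator should be chosen to match the convention in use.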
In certain example embodiments, an alignment of selected sequences in order to determine “% identity” between two or more sequences is performed using, for example, the CLUSTAL-W program in MacVector version 13.0.7, operated with default parameters, including an open gap penalty of 10.0, an extended gap penalty of 0.1, and a BLOSUM 30 similarity matrix.

[0069] Phosphate: A “phosphate” includes an “organophosphate” as well as variants thereof, such as an “amidophosphate” (which is a synonym for “phosphoramidate”). A phosphate can include a side chain, such as -R⁴-G² in the nucleoside triphosphates disclosed herein. The first, second and third phosphate counted from the 5’ end of a nucleoside are also referred to as α-phosphate, β-phosphate and γ-phosphate, respectively (or, in case the α-phosphate is a phosphoramidate, it can also be referred to as “α-phosphoramidate”). The type of a given phosphate is also derivable from the structures provided herein.

[0070] Expandable NTP: An “expandable NTP” or “XNTP” refers to a 5' phosphate modified non-natural nucleoside triphosphate (NTP) molecule (typically a non-natural 2’-deoxynucleoside triphosphate molecule) compatible with template-dependent enzymatic polymerization. Each XNTP has two distinct functional regions, i.e., a selectively cleavable bond (e.g. a phosphoramidate bond) linking the 5’ α-phosphate to a sugar comprised in a nucleoside and a tether that is attached within the XNTP at positions that allow for controlled expansion by cleavage of the cleavable bond (e.g. a tether linking the 5’ α-phosphate and the nucleobase). An XNTP can thus be present in a constrained configuration (when the cleavable bond is still intact) or in an expanded configuration (when the cleavable bond has been cleaved, e.g. via acid treatment).

[0071] dNTP-2c: A “dNTP-2c” refers to a 5' phosphate modified non-natural dNTP molecule that can serve as an intermediate in the synthesis of XNTPs. A dNTP-2c comprises two clickable groups, such as terminal alkynes, one as part of a modification at the 5’ α-phosphate, and one as part of a modification at the nucleobase. The two clickable groups allow addition of a tether between the α-phosphate and the nucleobase to form an XNTP.

[0072] Xpandomer: An “Xpandomer” or “Xp” refers to a molecule consisting of at least two XNTPs. An Xpandomer is obtainable, for example, by polymerase-mediated synthesis of a complementary strand to a template nucleic acid using XNTPs as polymerase substrates. An expanded configuration of the Xpandomer can be obtained by cleavage of the phosphoramidate bond in the XNTPs, e.g. via acid treatment.
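The template-directed order of XNTP nucleobases in an Xpandomer can be sketched as follows. This is a simplified illustration, assuming the four nucleobases named in the exemplary embodiments (cytosine, thymine, 7-deazaadenine and 7-deazaguanine, which base pair with guanine, adenine, thymine and cytosine, respectively) and a DNA template written 5' to 3'. The function and dictionary names are hypothetical, and the sketch ignores tethers, reporters and polymerase behavior.

```python
# Template base -> nucleobase of the XNTP that pairs with it
# (pairing taken from the exemplary embodiments; names are illustrative).
XNTP_FOR_TEMPLATE_BASE = {
    "G": "cytosine",
    "A": "thymine",
    "T": "7-deazaadenine",
    "C": "7-deazaguanine",
}


def xpandomer_base_order(template_5_to_3: str) -> list[str]:
    """Return the XNTP nucleobases of the complementary strand in 5'->3' order."""
    template = template_5_to_3.upper()

    unknown = set(template) - set(XNTP_FOR_TEMPLATE_BASE)
    if unknown:
        raise ValueError(f"unsupported template bases: {sorted(unknown)}")

    # The complementary strand is antiparallel to the template, so its 5' end
    # pairs with the 3' end of the template: walk the template in reverse.
    return [XNTP_FOR_TEMPLATE_BASE[base] for base in reversed(template)]


if __name__ == "__main__":
    print(xpandomer_base_order("GAT"))
```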

II. Nucleoside triphosphates with clickable groups

[0073] In some embodiments, the disclosure relates to nucleoside triphosphates comprising two clickable groups, one attached to the α-phosphoramidate, the other attached to the nucleobase. This allows linking the α-phosphoramidate to the nucleobase via a tether molecule by a click reaction to yield expandable NTPs. Such a nucleoside triphosphate generally has the following structure: wherein NB is a nucleobase; R¹ comprises or consists of a hydrocarbon; R² is independently H, OH or any 2’-ribose modification; R³ is H or any protecting group; and R⁴ comprises or consists of a hydrocarbon; and G¹ and G² independently represent terminal clickable groups. Preferably, the nucleoside triphosphate is a modified 2’-deoxynucleoside triphosphate (dNTP).

[0074] The nucleoside triphosphate comprises a stereocenter at a phosphorus atom, and can therefore exist as two different isomers with different stereoconfigurations at the α-phosphoramidate as follows:

[0075] The disclosure thus provides a composition comprising the nucleoside triphosphate (with two clickable groups) disclosed herein. In some embodiments, the disclosure provides a composition comprising nucleoside triphosphates having the structure: wherein NB is a nucleobase; R¹ comprises or consists of a hydrocarbon; R² is independently H, OH or any 2’-ribose modification; R³ is H or any protecting group; and R⁴ comprises or consists of a hydrocarbon; G¹ and G² independently represent terminal clickable groups; and T is a tether molecule;

[0076] wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have the following stereoconfiguration at the α-phosphoramidate:

[0077] Preferably, at least 90%, and more preferably 100% of the nucleoside triphosphates in the composition have the following stereoconfiguration at the α-phosphoramidate:

[0078] Thus, for example, at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates in the composition can have the following structure:
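The percentage thresholds recited above (at least 80%, 90%, 95%, 99%, or 100%) can be illustrated with a short calculation. The sketch below assumes that the two diastereomers are quantified as two separate peaks, for example by HPLC, and that both isomers give the same detector response, so that relative peak area reflects relative molar amount. Those assumptions, the function names and the example areas are illustrative and are not taken from the disclosure.

```python
def active_isomer_percent(active_area: float, inactive_area: float) -> float:
    """Percentage of the active diastereomer from two integrated peak areas.

    Assumption: equal detector response for both isomers, so that
    area fraction equals mole fraction.
    """
    if active_area < 0 or inactive_area < 0:
        raise ValueError("peak areas must be non-negative")
    total = active_area + inactive_area
    if total == 0:
        raise ValueError("at least one peak area must be positive")
    return 100.0 * active_area / total


def meets_threshold(active_area: float, inactive_area: float,
                    minimum_percent: float = 80.0) -> bool:
    """True if the active isomer makes up at least `minimum_percent` of the total."""
    return active_isomer_percent(active_area, inactive_area) >= minimum_percent


if __name__ == "__main__":
    # A 1:1 diastereomeric mixture does not reach the 80% threshold.
    print(meets_threshold(50.0, 50.0))
    # A hypothetical enriched preparation (95:5) reaches the 90% threshold.
    print(meets_threshold(95.0, 5.0, minimum_percent=90.0))
```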

[0079] A clickable group can be any group that allows selective reaction with a complementary clickable group via click chemistry. Click chemistry and suitable pairs of clickable groups are well known in the art, see e.g. Fantoni et al., 2021, Chemical Reviews, 121 (12): 7122-7154; and Klöcker et al., 2020, Chem. Soc. Rev., 49:8749-8773. Examples of click reactions include copper-catalyzed alkyne + azide reactions (CuAAC), copper-free strain-promoted azide-alkyne cycloaddition (SPAAC) reactions (e.g. DBCO + azide), and inverse-electron demand Diels-Alder cycloaddition (IEDDA). Thus, for example, a terminal clickable group can be a terminal alkyne or azide group, preferably an alkyne group. In preferred embodiments, G¹ and G² are terminal clickable groups of the same type. Preferably, both G¹ and G² represent a terminal alkyne group.

[0080] NB is a nucleobase, and generally will be a pyrimidine nucleobase or a purine nucleobase. This includes naturally occurring nucleobases, like adenine, guanine, cytosine or thymine, and nucleobases with modifications that do not interfere with base pairing to a complementary nucleobase. For instance, pyrimidine nucleobases can be modified at the position 5, and purine nucleobases can be modified at the position 7. Non-limiting examples of nucleobases are adenine, guanine, thymine, cytosine, uracil, xanthine, hypoxanthine, 8-azapurine, purines substituted at the 8 position with methyl or bromine, 9-oxo-N6-methyladenine, 2-aminoadenine, 7-deazaxanthine, 7-deazaguanine, 7-deazaadenine, N4-ethanocytosine, 2,6-diaminopurine, N6-ethano-2,6-diaminopurine, 5-methylcytosine, 5-(C3-C10)-alkynylcytosine, 5-fluorouracil, 5-bromouracil, thiouracil, pseudoisocytosine, 2-hydroxy-5-methyl-4-triazolopyridine, isocytosine, isoguanine, inosine, 7,8-dimethylalloxazine, 6-dihydrothymine, 5,6-dihydrouracil, 4-methyl-indole, ethenoadenine and the non-naturally occurring nucleobases described in U.S. Pat. Nos. 5,432,272 and 6,150,510 and published PCT applications WO 92/002258, WO 93/10820, WO 94/22892 and WO 94/24144, and Fasman ("Practical Handbook of Biochemistry and Molecular Biology", pp. 385-394, 1989, CRC Press, Boca Raton, Fla.), all herein incorporated by reference in their entireties. In one embodiment, the nucleobase is selected from adenine, guanine, uracil, and cytosine, and modified versions of these nucleobases, such as those disclosed herein (e.g. 7-deazaadenine or 7-deazaguanine). NB is preferably selected from cytosine, thymine, 7-deazaadenine and 7-deazaguanine. In preferred embodiments, NB has one of the following structures:

[0081] R¹ is typically such that it does not interfere with base-pairing with a complementary nucleobase. For example, R¹ is attached to position 5 of the nucleobase when the nucleobase is a pyrimidine nucleobase, and to position 7 of the nucleobase when the nucleobase is a purine nucleobase (wherein a naturally occurring nitrogen at position 7 can be replaced by a carbon, as e.g. in 7-deazaadenine or 7-deazaguanine). Nucleobases with modifications at position 5 (pyrimidine bases) or 7 (purine bases) and their synthesis are commonly known, see e.g. Kozak et al., 2020 (Russ. Chem. Rev., 2020, 89 (3) 281-310) and Matyugina et al., 2021 (Russ. Chem. Rev., 2021, 90 (11) 1454-1491). For concrete synthesis methods for nucleosides with bases as shown above, see also WO 2016/081871.

[0082] R¹ comprises or consists of a (substituted or unsubstituted, preferably unsubstituted) hydrocarbon. For application in sequencing by expansion, R¹ typically consists of a (substituted or unsubstituted, preferably unsubstituted) hydrocarbon. The hydrocarbon can be saturated or unsaturated, preferably unsaturated. For example, R¹ can comprise or consist of a (substituted or unsubstituted, preferably unsubstituted) alkyl, alkenyl, or alkynyl, preferably alkynyl.

[0083] Preferably, R¹ comprises 1-100 carbon atoms, preferably 1-30 carbon atoms or 1-20 carbon atoms, such as 3-20 carbon atoms, 3-10 carbon atoms or 5-10 carbon atoms, such as 6 or 8 carbon atoms. Typically, R¹ will be acyclic. Preferably, R¹ is linear. In some embodiments, the molecular weight of R¹ is 1500 g/mol or less, 1000 g/mol or less, 500 g/mol or less, 200 g/mol or less, or 100 g/mol or less.

[0084] In some embodiments, R¹ is a substituted or unsubstituted, branched or unbranched, saturated or unsaturated alkyl group comprising 1-100 carbon atoms, which optionally includes one or more oxygen, nitrogen, phosphorus or sulfur heteroatoms (e.g. to include an ether, a thioether, a phosphordiester or phosphortriester, or PEG, a heterocycle, such as a triazole or imidazole).

[0085] In some embodiments, R¹ is -R^w-Z, wherein R^w is a substituted or unsubstituted, branched or unbranched, saturated or unsaturated alkyl group having between 1 and 100 carbon atoms, which optionally includes one or more oxygen, nitrogen, phosphorus or sulfur heteroatoms, and where Z is alkyl, alkenyl, alkynyl, acyl, -Het, or -CH₂-Het, where "Het" is a substituted or unsubstituted 5- or 6-membered heterocyclic moiety.

[0086] As a preferred example, R¹ consists of a linear hydrocarbon, such as an alkynyl, and G¹ is a terminal alkyne group. Preferably, R¹ is a hexa-1-ynyl or octa-1-ynyl group. In most preferred examples, R¹ with G¹ is an octa-1,7-diynyl or a deca-1,9-diynyl group.

[0087] R² is independently H, OH or any 2’-ribose modification. In some embodiments, both R² are H or one R² is H and the other R² is OH. Preferably, both R² are H. 2’-ribose modifications are known in the art and include, for example, tert-butyldimethylsilyl and tri-iso-propylsilyloxymethyl ether groups as well as a 2’-O-methyl group or a 2’-fluoro group.

[0088] R³ is H or any protecting group, preferably H. Protecting groups are known in the art. Examples of a protecting group include acetyl, benzoyl, benzyl, methoxyethoxymethyl ether, dimethoxytrityl, ethoxymethyl ether, methoxytrityl, p-methoxybenzyl ether, p-methoxyphenyl ether, methylthiomethyl ether, pivaloyl, tert-butyl ethers, tetrahydropyranyl, tetrahydrofuran, trityl, silyl ether (e.g. trimethylsilyl, tert-butyldimethylsilyl, tri-iso-propylsilyloxymethyl, or triisopropylsilyl ethers), methyl ethers, and ethoxyethyl ethers.

[0089] R⁴ comprises or consists of a hydrocarbon. For application in sequencing by expansion, R⁴ typically consists of a hydrocarbon. The hydrocarbon can be substituted or unsubstituted, preferably unsubstituted. The hydrocarbon can be saturated or unsaturated, preferably saturated. For example, R⁴ can comprise or consist of an alkyl, alkenyl, or alkynyl, preferably alkyl.

[0090] Typically, R⁴ comprises 1-100 carbon atoms, preferably 1-30 carbon atoms or 1-20 carbon atoms, such as 1-15 carbon atoms, 3-15 carbon atoms, 3-10 carbon atoms or 3-6 carbon atoms, such as 4 carbon atoms. Typically, R⁴ will be acyclic. Preferably, R⁴ is linear. In preferred embodiments, R⁴ comprises or consists of a linear (saturated) alkyl. In some embodiments, the molecular weight of R⁴ is 1500 g/mol or less, 1000 g/mol or less, 500 g/mol or less, 200 g/mol or less, or 100 g/mol or less.

[0091] In some embodiments, R⁴ comprises or consists of a branched, linear, cyclic or heterocyclic, substituted or unsubstituted, saturated or unsaturated hydrocarbon, optionally including one or more heteroatoms, optionally selected from nitrogen, oxygen, phosphorus and sulfur. A cyclic or heterocyclic hydrocarbon can be 5-membered or 6-membered, for example. A cyclic or heterocyclic hydrocarbon can be aromatic, for example.

[0092] In some embodiments, R⁴ is a substituted or unsubstituted, branched or unbranched, saturated or unsaturated alkyl group comprising 1-100 carbon atoms, which optionally includes one or more oxygen, nitrogen, phosphorus or sulfur heteroatoms (e.g. to include an ether, a thioether, a phosphodiester or phosphotriester, or PEG, a heterocycle, such as a triazole or imidazole).

[0093] In some embodiments, R⁴ is -R^w-Z, wherein R^w is a substituted or unsubstituted, branched or unbranched, saturated or unsaturated alkyl group having between 1 and 100 carbon atoms, which optionally includes one or more oxygen, nitrogen, phosphorus or sulfur heteroatoms, and where Z is alkyl, alkenyl, alkynyl, acyl, -Het, or -CH₂-Het, where "Het" is a substituted or unsubstituted 5- or 6-membered heterocyclic moiety.

[0094] In a preferred embodiment, R⁴ consists of a linear hydrocarbon, such as an alkyl, and G² is a terminal alkyne group. As a preferred example, R⁴ with G² is a hex-5-ynyl group. Thus, as a preferred example, R⁴ with G² has the structure:

[0095] R⁴ can also comprise or consist of two or more hydrocarbons that are linked by an atom or group of atoms other than carbon, such as a phosphorus atom and/or an oxygen atom. For example, two hydrocarbons (each independently comprising 1-10, 2-10 or 2-6 carbon atoms), such as alkyls, can be linked by an oxygen atom. In another example, two or three hydrocarbons (each independently comprising 1-10, 2-10 or 2-6 carbon atoms), such as alkyls, can be linked by a phosphate diester or a phosphate triester, respectively. Thus, in an embodiment, R⁴ with G² has the structure:

[0096] During production, R⁴ comprising a phosphate typically has a protective group at the hydroxyl function of the phosphate, such as a beta-cyanoethyl group. The protective group can be removed after the pyrophosphate has been added to yield a triphosphate.

[0097] For example, R⁴ can consist of a) a linear hydrocarbon, or b) two linear hydrocarbons linked by a phosphate diester, wherein R⁴ comprises 3-15 carbon atoms, and G² is a terminal alkyne group.

[0098] In preferred embodiments, at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphate in the composition have a structure selected from the following structures:

[0099] In more preferred embodiments, at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphate in the composition have a structure selected from the following structures:

[0100] A racemic mixture of the nucleoside triphosphate with two clickable groups as disclosed herein can be produced, for example, by the methods disclosed in WO 2016/081871. Certain types of R⁴, e.g. those comprising a phosphate, may comprise a protective group, such as an ethyl cyanide group, at the phosphate, until the 5’ triphosphate has been generated. A given diastereomerically pure isomer can then be separated from the other isomer by high-performance liquid chromatography (HPLC), for example. Concrete conditions for separating the two isomers by preparative HPLC are given in Example 1. The active configuration can be identified as eluting first in HPLC as described in Example 1. This configuration can be further functionally identified, for example, by separating the two enantiomers and testing whether expandable nucleoside triphosphates synthesized with a given enantiomer are suitable for Xpandomer synthesis as described in Example 2.

III. Expandable NTPs

[0101] The nucleoside triphosphates disclosed herein can be used, for example, for sequencing by expansion. In this case, the α-phosphoramidate is typically linked to the nucleobase, e.g. via a tether molecule (to form an expandable NTP). For example, R¹ can be linked to R⁴, e.g. via a tether molecule. The disclosure thus also relates to nucleoside triphosphates that can be expanded, and are thus suitable for sequencing by expansion, for example. Such a nucleoside triphosphate is obtainable, for example, by linking the α-phosphoramidate and the nucleobase in the nucleoside triphosphate with two clickable groups by a click reaction. Such a nucleoside triphosphate generally has the following structure: wherein T is a tether molecule; NB is a nucleobase; R¹ comprises or consists of a hydrocarbon; R² is independently H, OH or any 2'-ribose modification; R³ is H or any protecting group; R⁴ comprises or consists of a hydrocarbon, and L¹ and L² independently represent linking groups. The further description of NB, R¹, R², R³ and R⁴ from the context of the nucleoside triphosphate with two clickable groups equally applies.

[0102] The nucleoside triphosphate comprises a stereocenter at a phosphorus atom, and can therefore exist as two different diastereomers with different stereoconfigurations at the α-phosphoramidate as follows:

(active) (inactive)

[0103] The disclosure thus provides a composition comprising (expandable) nucleoside triphosphates as disclosed herein suitable for sequencing by expansion. In some embodiments, the disclosure provides a composition comprising (expandable) nucleoside triphosphates having the structure: wherein NB is a nucleobase; R¹ comprises or consists of a hydrocarbon; R² is independently H, OH or any 2'-ribose modification; R³ is H or any protecting group; and R⁴ comprises or consists of a hydrocarbon; L¹ and L² independently represent linking groups; and T is a tether molecule; wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have the following stereoconfiguration at the α-phosphoramidate: preferably

[0104] Thus, for example, at least 80%, preferably at least 90%, such as at least 95%, at least 99% or 100% of the nucleoside triphosphate in the composition can have the following structure:

[0105] The composition can also comprise a mixture of different types of the nucleoside triphosphate. In some embodiments, the composition comprises a mixture of four different types of nucleoside triphosphates. Typically, the four different types of nucleoside triphosphates comprise four different types of nucleobases. Preferably, the four different types of nucleoside triphosphate base pair with guanine, adenine, thymine and cytosine, respectively. Thus, in some embodiments, the composition comprises four different types of nucleoside triphosphates, each comprising a unique nucleobase, e.g. selected from 7-deazaadenine, 7-deazaguanine, thymine and cytosine. Preferably, the four different types of nucleoside triphosphates each comprise a unique nucleobase selected from:

[0106] The tether molecule is not particularly limited, but will typically comprise a reporter (to allow specific identification of the attached nucleobase, e.g. via nanopore-based sequencing). A tether molecule can be, for example, a symmetrically synthesized reporter tether (SSRT) as disclosed in WO 2020/236526 A1. Such tether molecules typically have the following structure: Linker A - reporter - Linker B. Linker A can be attached to the α-phosphoramidate and linker B to the nucleobase, or vice versa. For example, Linker A and Linker B can be polymers comprising two or more repeat units selected from: spermine (Q), hexaethylene glycol (D), 2-((4-((3-(benzoyloxy)-2-(((1-(3-(benzoyloxy)-2-((benzoyloxy)methyl)-2-((phosphodiester-oxy)methyl)propyl)-1H-1,2,3-triazol-4-yl)methoxy)methyl)-2-((benzoyloxy)methyl)propoxy)methyl)-1H-1,2,3-triazol-1-yl)methyl)-2-O-phosphodiester-propane-1,3-diyl dibenzoate, 1,3-O-bis(phosphodiester-2,2-bis(1-Me-4-(Me-O-PEG2-O-Bz)-1,2,3-triazole)-propane, 1,3-O-bis(phosphodiester-2-(4-(Me-O-PEG5)-1-(Et-O-Ac)-1,2,3-triazole)-propane, 1,3-O-bis(phosphodiester-2s-O-(4-(Me-O-PEG7)-1-(Et-OBz)-1,2,3-triazole)-propane, 1,3-O-bis(phosphodiester-2s-O-(4-(Me-O-PEG3)-1-(Et-2,2,2-Tris-(Me-O-Bz))-1,2,3-triazole)-propane, 1,3-O-bis(phosphodiester-2-(4-(Me-O-PEG5)-1-(Et-2,2,2-Tris-(Me-O-Ac))-1,2,3-triazole)-propane, 1,2-O-bis(phosphodiester)-3-(4-(Me-O-PEG3-O-Bz)-1-(1,2,3-triazole))-propane, 1,3-O-bis(phosphodiester-2,2-bis(4-(Me-O-PEG2-O-Me)-1-(Et-O-Bz)-1,2,3-triazole)-propane, 1,3-O-bis(phosphodiester-2,2-bis(4-(Me-O-PEG3-O-Me)-1-(Et-2,2,2-Tris-(Me-O-Bz))-1,2,3-triazole)-propane, 1,2-O-bis(phosphodiester)-3-(4-methylpiperazine-1-yl)-propane, 1,3-O-bis(phosphodiester-2,2-bis(4-(Me-O-PEG3-O-Me)-1-(Et-O-Bz)-1,2,3-triazole)-propane, and 1,1’-O-bis(phosphodiester)-N(p-tolyl)-diethanolamine, preferably spermine. In some embodiments, linker A and B are inverted copies of each other.

[0107] For example, a reporter can be a polymer comprising two or more repeat units selected from: hexaethylene glycol (D), ethane (L), triethylene glycol (X), 1,3-O-bis(phosphodiester)-2S-O-mPEG4-propane, 1,3-O-bis(phosphodiester)-2-(4-Me-O-PEG3)-1-(Et-O-Ac)-1,2,3-triazole)-propane, 1,3-O-bis(phosphodiester-2,2-bis(Me-O-mPEG2)-propane, 1,3-O-bis(phosphodiester-2S-O-(PEG4-O-Bz)-propane, 1,3-O-bis(phosphodiester)-2s-O-mPEG6-propane, 1,3-O-bis(phosphodiester-2s-O-(4-(Me-O-PEG3)-1-(Et-2,2,2-Tris-(Me-O-Bz))-1,2,3-triazole)-propane, 1,3-O-bis(phosphodiester-2s-O-(4-(Me-O-PEG3)-1-(Me-acetate)-1,2,3-triazole)-propane, 1,3-O-bis(phosphodiester)-2s-O-(4-(Me-O-PEG2)-1-(Et-OBz)-1,2,3-triazole)-propane, 1,3-O-bis(phosphodiester)-2-(4-Et-1-(Et-O-mPEG1)-1,2,3-triazole)-propane, 2,3-O-bis(phosphodiester)-1-(1 dimethoxyquinazolinedione)-propane, 2,3-O-bis(phosphodiester)-1-(N9-(3,6-dimethoxycarbazole)-propane, 1,1’-O-bis(phosphodiester)-2,2’-(sulfonylbis(benz-4-yl))-diethanol, 1,1’-O-bis(phosphodiester)-2,2’-bipyridin-4,4’-yl)-dimethanol, 2,3-O-bis(phosphodiester)-1-(N1-(4,6-dimethoxy-3-Me-indole)-propane, 3-(1,2-O-bis(phosphodiester)-propyl)-8,8-dimethylhexahydro-3H-3a,6-methanobenzo[c]isothiazole 2,2-dioxide, 2,3-O-bis(phosphodiester)-1-(N1-(6-Azathymine))-propane, 1,5-O-bis(phosphodiester)-hexahydrofuro[2,6]furan, 1,1’-O-bis(phosphodiester)-octahydro-2,6-dimethyl-3,8:4,7-dimethano-2,6-naphthyridin-4,8-diyl)-dimethanol, 2,3-O-bis(phosphodiester)-1-(N1-(2-Me-5-nitroindole)-propane, 2,3-O-bis(phosphodiester)-1-(N1-(2-Me-5-nitroindole)-propane, 2,3-O-bis(phosphodiester)-1-(5-benzofuran)-propane, 1,2-O-bis(phosphodiester)-3-O-mPEG2-propane, 1,3-O-bis(phosphodiester)-2-(4-Et-1-(Et-O-mPEG3)-1,2,3-triazole)-propane, and 1,3-O-bis(phosphodiester)-3-O-mPEG4-propane (see WO 2020/236526 A1). In some embodiments, the reporter comprises or consists of two inverted copies of the same polymer comprising two or more repeat units selected from the above. The two inverted copies of the same polymer in the reporter can be linked via a branching element that is further linked to a translocation control element (TCE), as described in WO 2020/236526 A1.

[0108] L¹ and L² independently represent linking groups. The linking groups are not particularly limited, and include substituted or unsubstituted hydrocarbons, including e.g. a 1,2,3-triazole.

[0109] Preferably, the tether molecule is attached via click chemistry reactions. In other words, the terminal clickable groups in G¹ and G² can be used to link a tether molecule via a click reaction. These terminal clickable groups are typically clickable groups of the same type, e.g. each is an alkyne group. When the tether is attached via click reactions, L¹ and L² will each be a product of a click reaction, such as a 1,2,3-triazole. For example, when both G¹ and G² are terminal alkyne groups, a tether molecule attached to terminal azide groups on two ends can be reacted with G¹ and G², thereby yielding two 1,2,3-triazoles (or vice versa, i.e. G¹ and G² are terminal azide groups and the tether is attached to two terminal alkyne groups). In preferred embodiments, L¹ and L² are 1,2,3-triazoles.

[0110] In preferred embodiments, at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphate in the composition have a structure selected from the following structures:

[0111] The first two structures are obtainable by reacting a nucleoside triphosphate with two clickable groups as disclosed herein with R¹ with G¹ = octa-1,7-diynyl or deca-1,9-diynyl group, and R⁴ with G² = hex-5-ynyl group with a tether attached to two terminal azide groups.

[0112] In more preferred embodiments, at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphate in the composition have a structure selected from the following structures:

[0113] The composition can also be provided as a master mix or a reaction solution. The composition can thus further comprise additional reagents for complementary strand synthesis. For example, the composition can further comprise a nucleic acid polymerase (for details see section IV.). Moreover, the composition can further comprise a buffering agent, such as TrisCl. In some embodiments, the composition further comprises at least one of (including each of) TrisOAc, NH₄OAc, PEG, a water-miscible organic solvent, such as DMF or NMP, polyphosphate 60, N-methyl succinimide (NMS), MnCl₂, a single-strand binding protein (SSB), and urea. The SSB can be Kod SSB (from Thermococcus kodakarensis), for example. The composition may further comprise a polymerase-enhancing molecule (PEM), such as described in EP 3 735 409 B1. A reaction solution will also typically comprise at least one nucleic acid. In these embodiments, the composition typically comprises four different types of the nucleoside triphosphate, wherein the four different types of the nucleoside triphosphate base pair with guanine, adenine, thymine and cytosine, respectively.

[0114] The (expandable) nucleoside triphosphates can be obtained, for example, by linking the α-phosphoramidate to the nucleobase of the nucleoside triphosphate with the two clickable groups G¹ and G². The α-phosphoramidate can be linked to the nucleobase via a tether molecule that can be attached by click chemistry reactions. For example, when G¹ and G² are terminal alkyne groups and the tether molecule is attached to two terminal azide groups, a double click reaction at each end will attach the tether molecule to R¹ and R⁴ via two 1,2,3-triazoles. The reaction can look as follows: wherein G¹ and G² independently represent terminal clickable groups, G^1a and G^2a independently represent terminal clickable groups, L¹ and L² represent linking groups formed by reacting G¹ and G² with G^1a and G^2a, respectively, and T, NB, R¹, R², R³, and R⁴ are as defined above.

[0115] The nucleoside triphosphate can be further purified, for example by HPLC. Purification can take place e.g. after the reaction with a pyrophosphate and/or after linking the α-phosphoramidate to the nucleobase.

IV. Methods

[0116] The disclosure also provides a method for generating a complementary strand to a nucleic acid, comprising contacting the nucleic acid with a composition comprising a nucleoside triphosphate as disclosed herein. The complementary strand generated by such method is typically an Xpandomer (in constrained configuration).
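The base-pairing logic behind complementary strand generation can be sketched as follows. This is an illustration only: the function and dictionary names are hypothetical, the nucleobase set is the one named in this description (7-deazaadenine, 7-deazaguanine, thymine and cytosine), and the sketch assumes a DNA template copied over its full length, ignoring the primer and any incorporation errors.

```python
# Watson-Crick partner of each template base, using the nucleobases named
# in this description (7-deaza analogues take the place of adenine and guanine).
PARTNER = {
    "A": "thymine",
    "T": "7-deazaadenine",
    "G": "cytosine",
    "C": "7-deazaguanine",
}


def incorporation_order(template_5_to_3: str) -> list[str]:
    """List the nucleobases of the new strand in 5'->3' order.

    The template is read from its 3' end, so it is traversed in reverse.
    """
    bases = []
    for base in reversed(template_5_to_3.upper()):
        if base not in PARTNER:
            raise ValueError(f"unsupported template base: {base!r}")
        bases.append(PARTNER[base])
    return bases


print(incorporation_order("ATGC"))
```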

[0117] The type of nucleic acid is not particularly limited, and includes DNA or RNA. In preferred embodiments, the nucleic acid is a DNA, such as a genomic DNA or a cDNA. Typically, the DNA is cell-free DNA.

[0118] The nucleic acid can be part of a library of nucleic acids. For example, the library can be a library of genomic DNA or cDNA.

[0119] The generation of the complementary strand is typically primed by a primer. The design and generation of primers is known in the art. The primer to be used is not particularly limited and can be designed, for example, to hybridize with the nucleic acid at a position so as to allow the generation of the complementary strand to parts of the nucleic acid that are of interest, including full-length. When a library of nucleic acids is to be sequenced, it is possible, for example, to use a standard primer binding to all nucleic acids of interest in the library, or a random primer mixture.

[0120] In some embodiments, the method can comprise hybridizing a primer to the nucleic acid, followed by contacting the nucleic acid with a composition comprising a nucleoside triphosphate as disclosed herein.

[0121] If necessary, the nucleic acid can also be denatured, e.g. to facilitate primer hybridization. Means for denaturation are not particularly limited, and include e.g. applying heat (e.g. 90°C-100°C). Thus, in some embodiments, the method can comprise denaturing the nucleic acid and then hybridizing a primer to the nucleic acid, followed by contacting the nucleic acid with a composition comprising a nucleoside triphosphate as disclosed herein.

[0122] Typically, the complementary strand is generated by using a polymerase, such as a (DNA-dependent) DNA polymerase. Thus, the composition used preferably further comprises a nucleic acid polymerase. Polymerases will typically be used comprising mutations that sterically allow the use of XNTPs as substrates. A suitable class of polymerases for incorporating XNTPs includes the translesion DNA polymerase (i.e. class Y polymerase) family that includes e.g. the DPO4 polymerase. Translesion DNA polymerases exhibit a more flexible substrate recognition than conventional (e.g. replication) polymerases owing to their relatively large substrate binding sites, which have evolved to accommodate naturally occurring, bulky DNA lesions. Suitable polymerases include e.g. modified DPO4 polymerases as described in WO 2017/087281, WO 2018/204707, or WO 2019/118372, which are herein incorporated by reference in their entireties. Suitable examples are provided herein as SEQ ID NOs: 1-5. Thus, in some embodiments, the polymerase has at least 95%, such as at least 98% or 99% sequence identity to the amino acid sequence of SEQ ID NO: 1. Such a polymerase has DNA-dependent DNA polymerase activity, and more specifically, is capable of using XNTPs as polymerization substrate.
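The sequence identity threshold mentioned above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and uses one common convention, identical positions divided by the number of alignment columns; other conventions exist and the disclosure does not prescribe one. The function name and the short toy sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy example: two 10-residue fragments differing at one position.
identity = percent_identity("MIVLFVDFDY", "MTVLFVDFDY")
print(f"{identity:.1f}% identity, meets 95% threshold: {identity >= 95}")
```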

[0123] Moreover, the composition typically further comprises a buffering agent, such as TrisCl. In some embodiments, the composition comprises at least one of (including each of) TrisOAc, NH₄OAc, PEG, a water-miscible organic solvent, such as dimethylformamide (DMF) or N-methylpyrrolidone (NMP), polyphosphate 60, N-methyl succinimide (NMS), MnCl₂, a single-strand binding protein (SSB), and urea. The SSB can be Kod SSB, for example.

[0124] The composition may further comprise a polymerase-enhancing molecule (PEM), such as described in EP 3 735 409 B1. For example, a PEM can be as described in the claims of EP 3 735 409 B1, i.e. a compound of the following formula that increases the processivity, rate, or fidelity of the nucleic acid polymerase reaction: wherein independently at each occurrence: m is 1, 2 or 3; n is 0, 1 or 2; p is 0, 1 or 2; Ar1 is optionally substituted aryl; Ar2 is selected from 5- and 6-membered monocyclic aromatic rings and 9- and 10-membered fused bicyclic rings comprising two 5- and/or 6-membered monocyclic rings fused together, where at least one of the two monocyclic rings is an aromatic ring, where Ar2 is optionally substituted with one or more substituents selected from halide, C₁-C₆alkyl, C₁-C₆haloalkyl, E-CO₂R⁰, E-CONH₂, E-CHO, E-C(O)NH(OH), E-N(R⁰)₂, and E-OR⁰, where E is selected from a direct bond and C₁-C₆alkylene; and R⁰ is selected from H, C₁-C₆alkyl and C₁-C₆haloalkyl, M is selected from hydrogen, halogen and C₁-C₄alkyl; and L is a linking group; or a solvate, hydrate, tautomer, chelate or salt thereof.

[0125] The composition typically comprises four different types of the nucleoside triphosphate, wherein the four different types of the nucleoside triphosphate base pair with guanine, adenine, thymine and cytosine, respectively.

[0126] The disclosure also provides a method for sequencing a nucleic acid using the expandable nucleoside triphosphate or the composition comprising the same as disclosed herein.

[0127] The disclosure thus provides a method for determining the sequence of a nucleic acid, comprising the following steps in order:

1) Generating a complementary strand to the nucleic acid by the method for generating a complementary strand as disclosed herein;

2) Selectively cleaving the P-N bond within the nucleoside triphosphate to generate an expanded complementary strand;

3) Sequencing the expanded complementary strand.

[0128] The nucleoside triphosphates used in such a method are such that they allow the generation of an expanded complementary strand. This is typically achieved by using nucleoside triphosphates in which the α-phosphoramidate is linked to the nucleobase via a tether molecule. When the phosphoramidate (P-N) bond is cleaved, the α-phosphoramidate and the nucleobase remain linked via the tether molecule, thereby generating the expanded complementary strand.

[0129] In some embodiments, the complementary strand is separated from the nucleic acid after step 1), for example by denaturation. The complementary strand can optionally be purified before proceeding with step 2).

[0130] The P-N bond can be selectively cleaved in step 2) under acidic conditions, for example. This can be achieved by addition of an acid, such as DCl. Cleavage in step 2) typically yields an Xpandomer in expanded configuration. The product of step 2) can optionally be purified before proceeding with step 3).

[0131] In preferred embodiments, the expanded complementary strand is sequenced in step 3) by nanopore-based sequencing. Methods for nanopore-based sequencing are known in the art, see e.g. WO 2020/236526. For example, nanopore-based sequencing can comprise:

(a) providing a chip for nanopore-based sequencing comprising:

(i) an electrochemically resistive barrier disposed over an aperture on a surface of the chip, wherein the barrier separates a cis side from a trans side;

(ii) a nanopore inserted into the barrier, wherein the nanopore has an entrance side on the cis side of the barrier and an exit side on the trans side of the barrier;

[0132] The barrier is typically a lipid bilayer membrane, such as a DPhPE/hexadecane bilayer membrane. A nanopore, such as an α-hemolysin nanopore, can be inserted into the membrane by electroporation in a buffer, such as a buffer of 2 M NH₄Cl and 100 mM HEPES, pH 7.4. The cis well can be perfused with a buffer containing 0.4M NH₄Cl, 600mM GuanCl, 100mM HEPES; pH 7.4, and 5% glycerol and the trans well can be perfused with buffer containing 0.4M NH₄Cl, 600mM GuanCl, 5% ethyl acetate, 10mM HEPES; pH 7.4, before introducing the Xpandomer to the cis side for sequencing.

V. Examples

[0133] Example 1: Production of diastereomers of dNTP-2c molecules

[0134] Racemic mixtures comprising nucleoside triphosphates with two clickable groups as disclosed herein were produced by the methods disclosed in WO 2016/081871. Four separate racemic mixtures were produced for four different types of nucleoside triphosphates with four different nucleobases corresponding to C, T, A and G, respectively, each one with -R¹-G¹ on the nucleobase and -R⁴-G² on the α-phosphoramidate as disclosed herein. For each racemic mixture, the two isomers were then separated from one another by preparative HPLC.

HPLC system: Agilent 1290 Infinity II Preparative LC System

Column: 50 x 250 mm Waters XBridge C18

Guard column: Waters XBridge C18

Mobile phase: MeOH/H₂O were premixed to remove the heat of mixing. 1 M TEAB was mixed on the HPLC instrument by pumping it at 10%

[0135] Table 1 shows the preparative workflow used. Table 1

[0136] Exemplary HPLC chromatograms for the four different types of nucleoside triphosphates are shown in Fig. 1. Each chromatogram shows two distinct peaks, one for the active and one for the inactive isomer of the respective nucleoside triphosphate. The two isomers were separated from one another by collecting suitable fractions from the two peaks. This allows the production of compositions in which one of the two isomers is present in excess.
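How the excess of one isomer in a collected fraction might be quantified can be sketched as follows. This is an assumption-laden illustration, not the procedure of Example 1: it supposes that the two peaks are baseline-resolved, that both diastereomers have the same detector response, and that the first-eluting peak is the active isomer as stated in this description. The function names and the peak areas are hypothetical.

```python
# Purity levels recited in this description for the active isomer (percent).
THRESHOLDS = (80, 90, 95, 99)


def active_percent(area_first_peak: float, area_second_peak: float) -> float:
    """Share of the first-eluting (active) isomer, from integrated peak areas."""
    total = area_first_peak + area_second_peak
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * area_first_peak / total


def thresholds_met(percent: float) -> list[int]:
    """Return the recited purity levels that a given percentage satisfies."""
    return [t for t in THRESHOLDS if percent >= t]


# Hypothetical peak areas (arbitrary units) for a re-analysed fraction.
purity = active_percent(96.0, 4.0)
print(f"active isomer: {purity:.1f}%, thresholds met: {thresholds_met(purity)}")
```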

[0137] Example 2: Xpandomer synthesis and sequencing using active vs. inactive isomers

[0138] To produce Xpandomer copies of a DNA template, solid-state primer extension reactions are conducted using equimolar amounts of each XNTP, 4pmol template and 20pmol E-oligo primer (solid-state Xpandomer synthesis in which the extension oligo is covalently bound to a chip substrate is described in WO 2020/172479 A1, which is herein incorporated by reference in its entirety). The 50 µl extension reaction includes the following reagents: 50mM TrisCl, pH 8.84, 200mM NH₄OAc, 50mM GuCl, 20% PEG8K, 10% N-methylpyrrolidone (NMP), 15nmol polyphosphate PP-60.23, 2.5 µg Kod single-strand binding protein (SSB), 0.1M urea, 15mM PEM additive and 13 µg purified recombinant DNA polymerase C4760 (SEQ ID NO: 2, a variant of DPO4 polymerase; other suitable variants include SEQ ID NOs: 1 and 3-5). The extension reaction is run for 60 minutes at 37°C.

[0139] Xpandomer products are next sequenced using the SBX protocol. Briefly, the constrained Xpandomer products are washed in buffer B.064 (1% Tween-20/3% SDS/5mM HEPES, pH 8.0/100mM NaPO₄/15% DMF) and cleaved to generate linearized Xpandomer by adding 200 µl buffer C.001 (7.5M DCl) and incubating for 30 minutes at 23°C. The sample is then neutralized by adding 2000 µl buffer B.064 and incubating for 2min at RT. The Xpandomer sample is then subjected to amine modification by adding 500pmol succinic anhydride in buffer B.064 and incubating for 5 minutes at 23°C. The sample is then washed in buffer D.102 (50% ACN) and the Xpandomers are released from the substrate by photocleavage and eluted in 60 µl elution buffer.

[0140] Protein nanopores are prepared by inserting α-hemolysin into a DPhPE/hexadecane bilayer membrane in a buffer of 2 M NH₄Cl and 100 mM HEPES, pH 7.4. The cis well is perfused with buffer AG242 containing 0.4M NH₄Cl, 600mM GuanCl, 100mM HEPES; pH 7.4, and 5% glycerol and the trans well is perfused with buffer AB080 containing 0.4M NH₄Cl, 600mM GuanCl, 5% ethyl acetate, 10mM HEPES; pH 7.4. The Xpandomer sample is heated to 70°C for 2 minutes, cooled completely and vortexed, then a 2 µL aliquot is added to the cis well. The voltage parameters are run as follows: 70mV/625mV/6µs/1.0ms (read voltage/pulse voltage/pulse voltage duration/pulse frequency). Data are acquired via LabVIEW acquisition software.

[0141] A mixture of four XNTPs (complementary to the four naturally occurring nucleobases A, G, C and T, respectively) was used for Xpandomer synthesis and subsequent sequencing by expansion in the form of 1) the diastereomerically pure active isomer, 2) the diastereomerically pure inactive isomer, or 3) mixtures of 90:10 or 75:25 of active:inactive isomers. The results are summarized in the following Table 2:

[0142] Table 2

[0143] The use of active isomer at a concentration of 100 µM or higher yielded ~40% full-length Xpandomer product, while 75 µM of active isomer yielded 33% full-length product, thus providing a correlation between the yield of full-length products and the concentration of active isomer up to 100 µM. This is also graphically shown in Fig. 2 (see data series for 100:0 ratio). Table 2 further shows that the % full-length value did not drastically change at a concentration of 150 µM of active isomer compared to 100 µM, suggesting a possible saturation effect between 75 and 100 µM of active isomer. In contrast, the use of 100 µM inactive isomer did not yield any full-length Xpandomer product at all. This difference between the use of diastereomerically pure active and inactive isomers is confirmed by Fig. 3 showing a representative image after a gel electrophoresis with products from Xpandomer synthesis using the active or inactive isomers. Fig. 3 shows that the active isomer allows for the generation of full-length Xpandomers, as evidenced by the presence of a band in lane 37. In contrast, the use of the inactive isomer does not yield any full-length Xpandomer, as evidenced by the absence of any band in lane 38.

[0144] Table 2 also shows that the truncated Xpandomer products that could be obtained using the inactive isomer had higher rates of sequence errors (deletions, substitutions, or insertion-deletions) than the Xpandomer products synthesized in the presence of the active isomer.

[0145] Moreover, as further demonstrated by Table 2, the use of the active isomer achieved a higher percentage of full-length Xpandomers compared to a mixture of both isomers. This was even the case when the concentration of the active isomer was the same in the mixture as in the diastereomerically pure preparation: While 100 µM of pure active isomers yielded 40% of full-length product, the 90:10 mixture at 111 µM (comprising 100 µM active and 11 µM inactive isomers) or the 75:25 mixture at 133 µM (comprising 100 µM active and 33 µM inactive isomers) only yielded 37% and 32% of full-length product, respectively. Likewise, while 75 µM of pure active isomers yielded 33% of full-length product, the 75:25 mixture at 100 µM (comprising 75 µM active and 25 µM inactive isomers) only yielded 29% of full-length product. This is graphically shown in Fig. 2. The figure plots the concentration of active isomer vs. the % full-length Xpandomer product obtained. It is evident that the % full-length value at a given concentration of active isomer decreases when the inactive isomer is present, depending on the amount of inactive isomer present: While there was only a slight negative effect in a 90:10 mixture, the negative effect was stronger with a 75:25 mixture.
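The mixture concentrations quoted above follow from simple ratio arithmetic, which the following minimal sketch reproduces. The function names are hypothetical; concentrations are unit-agnostic, so the outputs are in whatever unit the inputs use.

```python
def split_mixture(total: float, active_parts: float, inactive_parts: float):
    """Split a total concentration into (active, inactive) by the given ratio."""
    parts = active_parts + inactive_parts
    return total * active_parts / parts, total * inactive_parts / parts


def total_for_active(active_target: float, active_parts: float,
                     inactive_parts: float) -> float:
    """Total concentration needed so the active isomer reaches active_target."""
    return active_target * (active_parts + inactive_parts) / active_parts


# Total concentration at which a mixture contains 100 units of active isomer.
for ratio in ((90, 10), (75, 25)):
    total = total_for_active(100, *ratio)
    active, inactive = split_mixture(total, *ratio)
    print(f"{ratio[0]}:{ratio[1]} mixture: total {total:.0f}, "
          f"active {active:.0f}, inactive {inactive:.0f}")
```

Rounded to whole numbers, this gives totals of 111 and 133 for the 90:10 and 75:25 mixtures, matching the values stated in the paragraph above.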

[0146] A negative effect of the inactive isomer on Xpandomer length is also supported by the mean length of the products obtained. See Table 2 showing 281 nt mean length with 100 µM of pure active isomers, and slightly shorter mean lengths of 279 nt with the mixtures comprising 100 µM of active isomers plus 11 or 33 µM inactive isomers.

[0147] Overall, the data demonstrate that the yield of full-length Xpandomer product directly depends on the presence of active isomer. The inactive isomer is incapable of producing any high-quality full-length Xpandomer products, and, surprisingly, even has a concentration-dependent negative impact on the yield of full-length Xpandomer product in the presence of the active isomer. Without wishing to be bound by any theory, one explanation for this negative effect might be that - despite a strong preference for the active isomer - the polymerase can occasionally incorporate an inactive isomer which may result in premature termination of Xpandomer elongation.

[0148] In conclusion, it is desirable for Xpandomer synthesis to use a composition comprising an excess of active isomers, for example at least 80%, preferably at least 90% or even 100% of active isomers. Conversely, a composition comprising an excess of inactive isomers may have some utility e.g. as negative control for Xpandomer synthesis.

SEQUENCES

[0149] SEQ ID NO: 1 (DPO4 C4552)

[0150] MTVLFVDFDYFYAQVEEVLNPSLKGKPVWCVFSGRFEDSGWATANYEAR

KFGVYAGIPIVEAKKILPNAVYLPWRDLVYWGVSERIMNLLREYSEKIEIASIDEAYLDISDK

VRDYREAYNLGLEIKNKILEKEKITVTVGISKNKVFAAVAGRMAKPNGIKVIDDEEVKRLIR

ELDIADVQGIPYFTAEKLKKLGINKLVDTLSIEFDKLKGMIGEAKAKYLISLARDEYNEPIRT

RVRKSIGRTVTMKRNSRNLEEIKPYLFRAIEESYYKLDKRIPKAIHVVAWKSYWNSQYRWS WFPHGISKETAYSESVQLLQQILKKDKRKIRRIGVRFSKF

[0151] SEQ ID NO: 2 (DPO4 C4760)

[0152] MIVLFVDFDYFYAQVEEVLNPSLKGKPVVVCVFSGRFEDSGVVATANYEAR

KFGVYAGIPIVRAKKILPNAVYLPWRDLVYWGVSERIMNLLREYSEKIEIASIDEAYLDISDK

VRDYREAYNLGLEIKNKILEKEKITVTVGISKNKVFAAVAGRMAKPNGIKVIDDEEVKRLIR

ELDIADVQGIPYFTAEKLKKLGINKLVDTLSIEFDKLKGMIGEAKAKYLISLARDEYNEPIRT RVRKSIGRTVTMKRNSRNLEEIKPYLFRAIEESYYKLDKRIPKAIHWAWKSYWNSQYRWS

WFPHGISKETAYSESVQLLQQILKKDKRKIRRIGVRFSKF

[0153] SEQ ID NO: 3 (DPO4 C4842)

[0154] MIVLFVDFDYFYAQVEEVLNPSLKGKPVWCVFSGRFEDSGWATANYEAR

KFGVYAGIPIVRAKKILPNAVYLPWRDLVYWGVSERIMNLLREYSEKIEIASIDEAYLDISDK

VRDYREAYNLGLEIKNKILEKEKITVTVGISKNKVFAAVAGRMAKPNGIKVIDDEEVKRLIR

ELDIADVQGIPYFTAEKLKKLGINKLVDTLSIEFDKLKGMIGEAKAKYLISLARDEYNEPIRT

RVRRSIGRTVTMKRNSRNLEEIKPYLFRAIEESYYKLDKRIPKAIHVVAWKSYWNSQYRWS WFPHGISKETAYSESVQLLQQILKKDKRKIRRIGVRFSKF

[0155] SEQ ID NO: 4 (DPO4 C4852)

[0156] MIVLFVDFDYFYAQVEEVLNPSLKGKPVWCVFSGRFEDSGWATANYEARKFGVYAGIPIVRAKKILPNAVYLPWRDLVYWGVSERIMNLLREYSEKIEIASIDEAYLDISDKVRDYREAYNLGLEIKNKILEKEKITVTVGISKNKVFAAVAGRMAKPNGIKVIDDEEVKRLIRELDIADVQGIPYFTAEKLKKLGINKLVDTLSIEFDKLKGMIGEAKAKYLISLARDEYNEPIRTRVRKSIGRTVTMKRDSRNLEEIKPYLFRAIEESYYKLDKRIPKAIHWAWKSYWNSQYRWSWFPHGISKETAYSESVQLLQQILKKDKRKIRRIGVRFSKF

[0157] SEQ ID NO: 5 (DPO4 C4862)

[0158] MIVLFVDFDYFYAQVEEVLNPSLKGKPVWCVFSGRFEDSGWATANYEARKFGVYAGIPIKRAKKILPNAVYLPWRDLVYWGVSERIMNLLREYSEKIEIASIDEAYLDISDKVRDYREAYNLGLEIKNKILEKEKITVTVGISKNKVFAAVAGRMAKPNGIKVIDDEEVKRLIRELDIADVQGIPYFTAEKLKKLGINKLVDTLSIEFDKLKGMIGEAKAKYLISLARDEYNEPIRTRVRKSIGRTVTMKRNSRNLEEIKPYLFRAIEESYYKLDKRIPKAIHWAWKSYWNSQYRWSWFPHGISKETAYSESVQLLQQILKKDKRKIRRIGVRFSKF

Claims

PATENT CLAIMS What is claimed is:

1. A composition comprising nucleoside triphosphates having the structure: wherein NB is a nucleobase; R¹ comprises or consists of a hydrocarbon; R² is independently H, OH or any 2'-ribose modification; R³ is H or any protecting group; and R⁴ comprises or consists of a hydrocarbon; G¹ and G² independently represent terminal clickable groups; L¹ and L² independently represent linking groups; and T is a tether molecule; wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have the following stereoconfiguration at the α-phosphoramidate:

2. The composition of claim 1, wherein at least 90% of the nucleoside triphosphates have the following stereoconfiguration at the α-phosphoramidate:

3. The composition of claim 1 or 2, wherein 100% of the nucleoside triphosphates have the following stereoconfiguration at the α-phosphoramidate:

4. The composition of any one of the preceding claims, wherein NB is selected from cytosine, thymine, 7-deazaadenine and 7-deazaguanine.

5. The composition of any one of the preceding claims, wherein R¹ is attached to position 5 of the nucleobase when the nucleobase is a pyrimidine nucleobase, and to position 7 of the nucleobase when the nucleobase is a purine nucleobase.

6. The composition of any one of the preceding claims, wherein NB has one of the following structures:

7. The composition of any one of the preceding claims, comprising a mixture of four different types of nucleoside triphosphates.

8. The composition of claim 7, wherein the four different types of nucleoside triphosphates comprise four different types of nucleobases.

9. The composition of claim 7 or 8, wherein the four different types of nucleoside triphosphate base pair with guanine, adenine, thymine and cytosine, respectively.

10. The composition of any one of claims 7-9, wherein the four different types of nucleoside triphosphate comprise the four different types of nucleobases of claim 6, respectively.

11. The composition of any one of the preceding claims, wherein R¹ comprises or consists of an unsaturated hydrocarbon.

12. The composition of any one of the preceding claims, wherein R¹ consists of a hydrocarbon, such as an alkynyl.

13. The composition of any one of the preceding claims, wherein R¹ is acyclic.

14. The composition of any one of the preceding claims, wherein R¹ is linear.

15. The composition of any one of the preceding claims, wherein R¹ comprises 1-20 carbon atoms, such as 1-10 carbon atoms or 5-10 carbon atoms, such as 6 or 8 carbon atoms.

16. The composition of any one of the preceding claims, wherein R¹ is a hexa-1-ynyl or octa-1-ynyl group.

17. The composition of any one of the preceding claims, wherein R¹ with G¹ is an octa-1,7-diynyl or a deca-1,9-diynyl group.

18. The composition of any one of the preceding claims, wherein both R² are H.

19. The composition of any one of the preceding claims, wherein R³ is H.

20. The composition of any one of the preceding claims, wherein R⁴ comprises or consists of a saturated hydrocarbon.

21. The composition of any one of the preceding claims, wherein R⁴ comprises 1-20 carbon atoms, such as 3-15 carbon atoms or 3-10 carbon atoms, such as 4 carbon atoms.

22. The composition of any one of the preceding claims, wherein R⁴ is acyclic.

23. The composition of any one of the preceding claims, wherein R⁴ is linear.

24. The composition of any one of the preceding claims, wherein R⁴ consists of a hydrocarbon.

25. The composition of any one of the preceding claims, wherein R⁴ is an n-butyl group.

26. The composition of any one of the preceding claims, wherein R⁴ with G² is a hex-5-ynyl group.

27. The composition of any one of claims 1-23, wherein R⁴ comprises or consists of two or more hydrocarbons that are linked by an atom or group of atoms other than carbon, such as a phosphorus atom and/or an oxygen atom.

28. The composition of any one of the preceding claims, wherein the terminal clickable group is a terminal alkyne group or a terminal azide group, preferably a terminal alkyne group.

29. The composition of any one of the preceding claims, wherein G¹ and G² represent the same type of terminal clickable group.

30. The composition of any one of the preceding claims, wherein L¹ and L² independently represent linking groups formed via click reactions.

31. The composition of any one of the preceding claims, wherein each of L¹ and L² is a 1,2,3-triazole.

32. The composition of any one of the preceding claims, wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have a structure selected from the following structures:

33. The composition of any one of the preceding claims, wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have a structure selected from the following structures:

34. The composition of any one of claims 1-29, wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have a structure selected from the following structures:

35. The composition of any one of claims 1-29, wherein at least 80%, preferably at least 90%, such as at least 95%, at least 99%, or 100% of the nucleoside triphosphates have a structure selected from the following structures:

36. A method for generating a complementary strand to a nucleic acid, comprising contacting the nucleic acid with the composition of any one of claims 1-33.

37. The method of claim 36, wherein the nucleic acid is comprised in a library of nucleic acids.

38. The method of claim 36 or 37, wherein the nucleic acid is a DNA.

39. The method of any one of claims 36-38, wherein the composition further comprises a nucleic acid polymerase.

40. The method of any one of claims 36-39, wherein the composition further comprises a buffering agent, such as TrisCl, and/or a polymerase cofactor, such as MnCl₂.

41. The method of any one of claims 36-40, comprising hybridizing a primer to the nucleic acid, at the same time as or followed by contacting the nucleic acid with the composition.

42. A method for determining the sequence of a nucleic acid, comprising the following steps in order:

1) Generating a complementary strand to the nucleic acid by the method of any one of claims 36-41;

3) Sequencing the expanded complementary strand,

43. The method of claim 42, wherein the P-N bond is selectively cleaved in step 3) under acidic conditions.

44. The method of claim 42 or 43, wherein the expanded complementary strand is sequenced in step 4) by nanopore-based sequencing.

45. The method of claim 44, wherein the nanopore-based sequencing comprises:

(a) providing a chip for nanopore-based sequencing comprising:

(ii) a nanopore inserted into the barrier, wherein the nanopore has an entrance side on the cis side of the barrier and an exit side on the trans side of the barrier;

(b) contacting the cis side of the barrier with the expanded complementary strand;