WO2025202473A1

WO2025202473A1 - A nucleic acid deaminase, a base editor and uses thereof

Info

Publication number: WO2025202473A1
Application number: PCT/EP2025/058591
Authority: WO
Inventors: Robin Léo Philippe LOESCH
Original assignee: Revvity Discovery Ltd
Current assignee: Revvity Discovery Ltd
Priority date: 2024-03-28
Filing date: 2025-03-28
Publication date: 2025-10-02
Anticipated expiration: 2026-09-28

Abstract

The present disclosure relates to systems, methods and compositions for nucleic acid modification (e.g., base editing) by using a nucleic acid deaminase or a base editor comprising the nucleic acid deaminase. The nucleic acid deaminase is capable of deaminating a deoxyadenosine (dA) in a nucleic acid molecule and comprises an amino acid sequence of a wild-type TadA, wherein one or more amino acid residue(s) of the wild-type TadA are deleted and wherein the one or more deleted amino acid residue(s) are located within a region of the wild-type TadA contacting the nucleic acid molecule.

Description

A NUCLEIC ACID DEAMINASE, A BASE EDITOR AND USES THEREOF

Technical Field

The present disclosure relates to systems, methods and compositions for nucleic acid modification (e.g., base editing) by using a nucleic acid deaminase or a base editor comprising the nucleic acid deaminase.

Background

Base editing is an approach to genome editing that enables the direct conversion of one nucleobase into another in a programmable manner. Base editing does not require cleavage of the double-stranded nucleic acid backbone (e.g., DNA), a donor template, or reliance on cellular homology-directed repair (HDR). Accordingly, base editing may be a safer alternative to conventional genome editing approaches (e.g., RNA-programmable CRISPR-associated (Cas) nucleases) by minimizing the risks to cells created by double-stranded breaks (DSBs), such as the formation of DSB-associated byproducts.

DNA base editors may comprise fusions between a catalytically impaired Cas nuclease (e.g., Cas nickase/nCas or a dead Cas/dCas) and a base-modification enzyme that operates on single-stranded DNA (ssDNA). Upon binding of the editor to its target site in DNA, base pairing between the guide RNA and the complementary strand of the DNA leads to displacement of a small segment of ssDNA (the displaced strand of DNA that is non-complementary to the guide RNA) in a structure called an R-loop. DNA bases within the ssDNA are modified by the base-modification enzyme (e.g., a nucleic acid deaminase). Two classes of DNA base editors with nucleic acid deaminases have been described. Cytidine base editors (CBEs) rely on naturally occurring enzymes to convert a C-G base pair into a T-A base pair (C-to-T transition mutation). Adenine base editors (ABEs) use a modified version of a transfer RNA adenosine deaminase enzyme to convert an A-T base pair to a G-C base pair (A-to-G transition mutation). For example, an ABE catalyses deamination of an adenosine within the ssDNA yielding inosine, which in the context of transcription or replication exhibits the base-pairing preference of guanosine in the active site of a polymerase.
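The net sequence outcome of the two editor classes described above can be illustrated with a minimal Python sketch. It models only the final transition (A-to-G for an ABE, C-to-T for a CBE) on the displaced strand and ignores the inosine intermediate, repair, and replication. The function name `apply_base_edit`, the example sequence, and the 1-based position convention are assumptions made for illustration.

```python
# Minimal sketch: net outcome of a single base edit on the displaced strand.
# The function name, the example sequence and 1-based positions are assumptions.

# Transition produced by each editor class, as described in the text.
CONVERSIONS = {
    "ABE": ("A", "G"),  # adenine base editor: A-to-G transition
    "CBE": ("C", "T"),  # cytidine base editor: C-to-T transition
}


def apply_base_edit(protospacer: str, position: int, editor: str = "ABE") -> str:
    """Return the protospacer after one edit at the given 1-based position."""
    source, product = CONVERSIONS[editor]
    sequence = protospacer.upper()
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the protospacer")
    if sequence[position - 1] != source:
        raise ValueError(f"{editor} needs {source} at position {position}")
    return sequence[: position - 1] + product + sequence[position:]


if __name__ == "__main__":
    print(apply_base_edit("GTCATGCA", 4, "ABE"))  # A at position 4 becomes G
    print(apply_base_edit("GTCATGCA", 3, "CBE"))  # C at position 3 becomes T
```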

Bacterial genomes encode a wild-type transfer RNA adenosine deaminase (TadA) enzyme which specifically catalyses the deamination of adenosine to inosine at the wobble position 34 of transfer RNA (tRNA) in bacteria. In its native context (e.g., in Escherichia coli), wild-type TadA acts as a homodimer, with one monomer catalyzing deamination (catalytic monomer) and the other monomer contributing to tRNA substrate binding (structural monomer). The wild-type TadA enzyme does not exhibit any significant enzymatic activity on ssDNA when fused to a catalytically impaired Cas9 protein (D10A nickase Cas9, nCas9) in a single polypeptide chain.

While the wild-type Escherichia coli TadA (ecTadA) enzyme does not exhibit any significant enzymatic activity on single-stranded DNA (ssDNA) when fused to a Cas9 nickase, successive rounds of evolution performed on ecTadA have caused the enzyme to modify ssDNA. In the context of adenosine base editors (ABEs) comprising an nCas9 fusion protein, a directed protein evolution strategy transformed a protein with no ability to deaminate adenine at target loci in DNA into forms that edit DNA, identifying an ABE protein that comprises as many as 14 mutations to achieve a good base editing efficiency. Therefore, more than ten amino acid substitutions, identified by directed protein evolution, were required for ecTadA engineering to generate potent ABEs (evolved ecTadA deaminase) which edit adenosines at the DNA level. In consideration of the complicated ecTadA evolution process, additional TadA orthologs have not been exploited and engineered for functional base editors.

Available approaches for base editing using adenosine deaminases currently suffer from various limitations, including: (i) a lack of variety of adenosine deaminases acting on DNA, as the ones identified so far are fused to nCas9 in a single polypeptide chain to generate sufficient editing rates, (ii) low total DNA base editing activity, (iii) compatibility of the TadA deaminase with sequence-targeting proteins, (iv) base editing window preference (e.g., at canonical positions A5-A7 in a protospacer compared with non-canonical positions A3, A4, A8-A10), (v) broad base editing window, (vi) presence of off-target modifications, (vii) variable efficiency depending on the -1 nucleotide, etc. Moreover, there remains a need in the field for more specific, controlled, and safer methods of base editing nucleic acids in cells, in particular, in human and animal cells.

Summary

The present disclosure meets or addresses at least some of the above needs and aims at solving the above problems in the art by providing novel nucleic acid deaminases with deoxyadenosine base editing activity (e.g., a TadA deletion variant), and novel base editors comprising the nucleic acid deaminase (e.g., an adenosine base editor). The present disclosure also provides isolated nucleic acids, expression constructs and vectors encoding the disclosed nucleic acid deaminases and/or the disclosed base editors wherein the components of the base editor may be encoded on a single or separate molecule(s); kits of parts, pharmaceutical compositions and cells containing the disclosed nucleic acid deaminases, the disclosed base editors, the disclosed isolated nucleic acids, the disclosed expression constructs and/or the disclosed vectors. The present disclosure also provides methods and uses which employ the disclosed nucleic acid deaminase or the disclosed base editor for base editing in a nucleic acid molecule including methods for modifying a nucleic acid molecule (e.g., to obtain a genetically engineered isolated cell); methods for target site-specific modification of a nucleic acid molecule; methods of preventing or treating subjects having or at risk of developing a disease, disorder or condition associated with a point mutation; and uses as a basic research tool and/or a screening tool. The present disclosure is defined in the corresponding independent claims.

In a first aspect, the present disclosure provides a nucleic acid deaminase capable of deaminating a deoxyadenosine (dA) in a nucleic acid molecule (e.g., DNA). The nucleic acid deaminase comprises an amino acid sequence of a wild-type TadA, wherein one or more amino acid residue(s) of the wild-type TadA are deleted, and wherein the one or more deleted amino acid residue(s) are located within a region of the wild-type TadA contacting the nucleic acid molecule (e.g., β4-β5 loop segment).

Evolved TadA enzymes that enable base conversion of A-T to G-C base pairs in DNA (A-to-G transition) were centered around amino acid substitutions (existing TadA substitution variants), whereas deletions of amino acids remained overlooked. The inventors have found that the TadA deletion variants according to the present disclosure achieve a base editing activity (e.g., DNA editing activity as defined by percentage of A-to-G transition, as measured by sequencing) that is equal to or higher than that of a TadA substitution variant that has a substitution at the corresponding deleted amino acid residue. Accordingly, the disclosed TadA deletion variants are able to deaminate new substrates (e.g., DNA substrates as compared to RNA substrates for wild-type TadA).
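The editing activity metric mentioned above (percentage of A-to-G transition, as measured by sequencing) can be sketched as follows. This is a simplified illustration rather than the analysis pipeline used by the inventors. It assumes reads that are already aligned to the reference without insertions or deletions, and the function name `a_to_g_percentage` and the toy reads are hypothetical.

```python
# Minimal sketch: percentage of A-to-G transition at one reference position.
# Assumes gap-free reads of the same length as the reference (an assumption;
# real sequencing data would first need alignment and quality filtering).
from typing import Sequence


def a_to_g_percentage(reference: str, reads: Sequence[str], position: int) -> float:
    """Percent of usable reads carrying G where the reference has A (1-based)."""
    index = position - 1
    if reference[index].upper() != "A":
        raise ValueError("the reference base at this position is not A")
    # Keep only reads that span the full reference in this toy model.
    usable = [read for read in reads if len(read) == len(reference)]
    if not usable:
        raise ValueError("no usable reads")
    edited = sum(1 for read in usable if read[index].upper() == "G")
    return 100.0 * edited / len(usable)


if __name__ == "__main__":
    toy_reads = ["TCGGT", "TCAGT", "TCGGT", "TCAGT"]
    print(a_to_g_percentage("TCAGT", toy_reads, 3))
```

The same function can be applied to a deletion variant and to the corresponding substitution variant to compare their activities at a given adenine.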

Surprisingly, compared to existing TadA substitution variants that include more than 10 amino acid substitutions and are capable of modifying DNA, the nucleic acid deaminase of the present disclosure achieves a similar base editing efficiency (transition of A-to-G) with only one, two or three amino acid deletions. The inventors have also found that the previously reported substitution D108N of ecTadA is not essential for creating a nucleic acid deaminase with detectable DNA editing activity.
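The one, two or three deletions referred to above can be enumerated with a short Python sketch. Item 9 below names A106, R107 and D108 of E. coli TadA as the deletable residues, which gives seven possible combinations. The sketch builds the local sequence window for each combination and compares it with the corresponding windows of SEQ ID NOs: 9 to 15 as listed in item 10. The wild-type window (flanks `RVVFG` and `AKTGAAG` around A-R-D) is an assumption inferred from those items, because SEQ ID NO: 1 is not reproduced in this section; all identifiers are hypothetical.

```python
# Minimal sketch: enumerate deletions of A106, R107 and D108 in a local window.
# The wild-type window is inferred from items 9 and 10 (an assumption).
from itertools import combinations

LEFT_FLANK = "RVVFG"                      # assumed to end at residue 105
TARGET = {106: "A", 107: "R", 108: "D"}   # residues named in item 9
RIGHT_FLANK = "AKTGAAG"                   # assumed to start at residue 109

# Local windows copied from SEQ ID NOs: 9 to 15 in item 10.
LISTED_WINDOWS = {
    9: "RVVFGARAKTGAAG",
    10: "RVVFGADAKTGAAG",
    11: "RVVFGRDAKTGAAG",
    12: "RVVFGAAKTGAAG",
    13: "RVVFGAKTGAAG",
    14: "RVVFGDAKTGAAG",
    15: "RVVFGRAKTGAAG",
}


def deletion_variants() -> dict:
    """Map each tuple of deleted positions to the resulting local window."""
    positions = sorted(TARGET)
    variants = {}
    for count in (1, 2, 3):
        for deleted in combinations(positions, count):
            kept = "".join(TARGET[p] for p in positions if p not in deleted)
            variants[deleted] = LEFT_FLANK + kept + RIGHT_FLANK
    return variants


if __name__ == "__main__":
    window_to_seq_id = {window: seq_id for seq_id, window in LISTED_WINDOWS.items()}
    for deleted, window in deletion_variants().items():
        print(deleted, window, "SEQ ID NO:", window_to_seq_id.get(window))
```

Under these assumptions, each of the seven generated windows matches exactly one of the listed sequences.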

The inventors also found that a "free" nucleic acid deaminase according to the present disclosure can deaminate any exposed dA in a nucleic acid molecule (e.g., a DNA). In a further aspect, the present disclosure provides a base editor comprising (i) an effector protein comprising at least one nucleic acid deaminase according to the present disclosure, (ii) a sequence-targeting protein, and (iii) an RNA-ligand binding complex. The base editor of the present disclosure is specific for a target site in a nucleic acid molecule, wherein the sequence-targeting protein (ii) and the RNA-ligand binding complex (iii) are capable of recruiting the effector protein (i) to the target site (e.g., a protospacer) in the nucleic acid molecule.

The base editor of the present disclosure showed desirable base editing window preference (e.g., at canonical positions A5-A7, A11 in a protospacer compared with non-canonical positions A3, A4, A8-A10) and a narrow base editing window (e.g., at a single dA). The present inventors showed that the base editor of the present disclosure created precise A-to-G transitions with equal or fewer bystander edits compared to a base editor comprising a TadA substitution variant that has a substitution at the corresponding deleted amino acid residue (existing adenosine base editor, ABE). Moreover, the disclosed base editor is not limited to a specific sequence motif at the target site (in contrast, a wild-type ecTadA has a strict UACG sequence motif requirement in RNA; see, e.g., Fig. 1 of Kim et al. Biochemistry, Vol. 45, No. 20, 2006, doi: 10.1021/bi0522394).
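The window terminology used above can be made concrete with a small Python sketch that lists the adenines of a protospacer and groups them by position. The default position sets follow the canonical (A5-A7) and non-canonical (A3, A4, A8-A10) positions named in the text; numbering from the PAM-distal 5' end, the function name `classify_adenines`, and the example protospacer are assumptions. Further positions can be passed in through the parameters.

```python
# Minimal sketch: group the adenines of a protospacer by editing-window class.
# Position 1 is taken as the 5' (PAM-distal) end, which is an assumption here.
from typing import Dict, FrozenSet, List

CANONICAL: FrozenSet[int] = frozenset({5, 6, 7})             # A5-A7
NON_CANONICAL: FrozenSet[int] = frozenset({3, 4, 8, 9, 10})  # A3, A4, A8-A10


def classify_adenines(
    protospacer: str,
    canonical: FrozenSet[int] = CANONICAL,
    non_canonical: FrozenSet[int] = NON_CANONICAL,
) -> Dict[str, List[int]]:
    """Return 1-based positions of every A, grouped by window class."""
    groups: Dict[str, List[int]] = {"canonical": [], "non_canonical": [], "outside": []}
    for position, base in enumerate(protospacer.upper(), start=1):
        if base != "A":
            continue
        if position in canonical:
            groups["canonical"].append(position)
        elif position in non_canonical:
            groups["non_canonical"].append(position)
        else:
            groups["outside"].append(position)
    return groups


if __name__ == "__main__":
    # Hypothetical 20-nt protospacer used only for illustration.
    print(classify_adenines("CTAAAGACGATCGTTACGGA"))
```

Adenines that fall in the same window as the intended target are the candidates for the bystander edits discussed above.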

Many of the existing adenosine base editors include a fusion protein containing a heterodimer of a wild-type (WT) TadA monomer that plays a structural role during base editing and a TadA substitution variant monomer that catalyzes dA deamination. Surprisingly, a base editor comprising a monomer of the disclosed TadA deletion variant was capable of maintaining or improving DNA base editing levels compared to a base editor comprising a heterodimer of a WT TadA monomer and the disclosed TadA deletion variant or a homodimer of two disclosed TadA deletion variants.

The present inventors also found that a base editor comprising (i) an effector protein comprising at least one nucleic acid deaminase according to the present disclosure and at least one ligand capable of binding to a ligand binding moiety (e.g., an RNA binding domain such as MCP), (ii) a sequence-targeting protein, and (iii) an RNA-ligand binding complex comprising at least one ligand binding moiety (e.g., an RNA motif selected from a single MS2 RNA motif, Qbeta RNA motif, boxB RNA motif, Csy4 RNA motif or PP7 RNA motif) had an equal or improved base editing performance (also referred to as "aptamer-recruitment-dependent base editing system") compared to existing adenosine base editors based on a fusion between the deaminase and the sequence-targeting protein. The present inventors also found that the disclosed base editor enables concurrent adenine and cytosine editing (a dual-function base editor). For example, the inventors found that a first base editor comprising at least one nucleic acid deaminase according to the present disclosure may be multiplexed with at least one further base editor. The inventors showed that multiplexing a first base editor comprising a nucleic acid deaminase according to the present disclosure and a second base editor comprising a further protein capable of site-specific deamination (e.g., a cytidine deaminase) yielded combined DNA editing of A-to-G and C-to-T at the target site.
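The three-component arrangement described above can be pictured as a small data model. The following Python sketch is only an organizational illustration: the class and field names are hypothetical, and the recruitment check simply asks whether the ligand on the effector protein matches the RNA motif carried by the RNA-ligand binding complex. The MS2/MCP pair is the one named in the text; the PP7/PCP pair is added from general knowledge as an assumption.

```python
# Minimal sketch of the three components (i), (ii), (iii) as plain data.
# All class and field names are hypothetical and for illustration only.
from dataclasses import dataclass

# RNA motif -> RNA binding domain that recognizes it.
# MS2/MCP follows the text; PP7/PCP is an assumed, commonly used pair.
MOTIF_TO_LIGAND = {"MS2": "MCP", "PP7": "PCP"}


@dataclass(frozen=True)
class EffectorProtein:            # component (i)
    deaminase: str                # e.g. a TadA deletion variant
    ligand: str                   # RNA binding domain fused to the deaminase


@dataclass(frozen=True)
class RnaLigandBindingComplex:    # component (iii)
    spacer: str                   # RNA moiety that binds the target site
    motif: str                    # ligand binding moiety (RNA motif)


@dataclass(frozen=True)
class BaseEditor:
    effector: EffectorProtein
    targeting_protein: str        # component (ii), e.g. a Cas nickase
    rna_complex: RnaLigandBindingComplex

    def effector_is_recruited(self) -> bool:
        """True if the effector ligand matches the motif on the RNA."""
        expected = MOTIF_TO_LIGAND.get(self.rna_complex.motif)
        return expected == self.effector.ligand


if __name__ == "__main__":
    editor = BaseEditor(
        effector=EffectorProtein("TadA deletion variant", "MCP"),
        targeting_protein="nCas9",
        rna_complex=RnaLigandBindingComplex("CTAAAGACGATCGTTACGGA", "MS2"),
    )
    print(editor.effector_is_recruited())
```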

Further examples of advantages or benefits over the existing adenosine base editors include (i) providing a wider variety of base editors acting on DNA, (ii) higher or enhanced DNA base editing activity, (iii) reduction of off-target modifications or of bystander edits, (iv) the disclosed base editor is not limited to a specific sequence motif at the target site, and (v) a narrower base editing window.

In a further aspect, the present disclosure provides methods and uses which employ the disclosed nucleic acid deaminase or the disclosed base editor for base editing in a nucleic acid molecule. The methods disclosed herein can be used to modify specific target sites contained in nucleic acid molecules of eukaryotic and prokaryotic cells. The methods can further be used to modify specific target sites contained in nucleic acid molecules within organelles (e.g., chloroplasts and/or mitochondria). The methods can further be used to generate a cell (e.g., a genetically engineered isolated cell). The disclosed nucleic acid deaminase and the disclosed base editor are also useful as a basic research tool (e.g., to identify, test, and/or manipulate target site(s) in a nucleic acid molecule), and/or as a screening tool (e.g., diagnostic screens).

In a further aspect, the present disclosure provides the disclosed nucleic acid deaminase or the disclosed base editor for medical applications to prevent or treat diseases, disorders or conditions that are associated with or caused by one or more point mutation(s) that may be corrected by deaminase-mediated base editing (e.g., as a medicament, for treating or preventing a genetic disease, or for correcting a pathogenic point mutation or a loss-of-function mutation). Precise and efficient base editing is critical for therapeutic applications (e.g., in gene therapy), as any bystander or off-target edit may result in undesired mutations in an (on- or off-) target site.

Different embodiments are set out in the respective dependent claims, as well as in the numbered embodiments described below.

1. A nucleic acid deaminase capable of deaminating a deoxyadenosine (dA) in a nucleic acid molecule, wherein the nucleic acid deaminase comprises an amino acid sequence of a wild-type TadA, wherein one or more amino acid residue(s) of the wild-type TadA are deleted, and wherein the one or more deleted amino acid residue(s) are located within a region of the wild-type TadA contacting the nucleic acid molecule.

2. The nucleic acid deaminase according to item 1, wherein the deoxyadenosine (dA) is contained in a double-stranded or single-stranded DNA region of the nucleic acid molecule.

3. The nucleic acid deaminase according to item 2, wherein the deoxyadenosine (dA) is contained in a single-stranded DNA region of the nucleic acid molecule; optionally wherein the single-stranded DNA region is part of an R-loop.

4. The nucleic acid deaminase according to any one of items 1 to 3, wherein the wild-type TadA is a bacterial TadA selected from the group consisting of Escherichia coli TadA, Salmonella enterica TadA, Staphylococcus aureus TadA, Streptococcus pyogenes TadA, Salmonella typhi TadA, Haemophilus influenzae TadA, Caulobacter vibrioides TadA, and Shewanella putrefaciens TadA.

5. The nucleic acid deaminase according to any one of items 1 to 4, wherein the region of the wild-type TadA contacting the nucleic acid molecule comprises amino acid residues 105 to 130 of E. coli TadA shown in SEQ ID NO: 1 or corresponding amino acid residues in a bacterial TadA shown in one of SEQ ID NOs: 2 to 8.

6. The nucleic acid deaminase according to any one of items 1 to 5, wherein the region of the wild-type TadA contacting the nucleic acid molecule consists of amino acid residues 105 to 130 of E. coli TadA shown in SEQ ID NO: 1 or corresponding amino acid residues in a bacterial TadA shown in one of SEQ ID NOs: 2 to 8.

7. The nucleic acid deaminase according to any one of items 1 to 6, wherein the region of the wild-type TadA contacting the nucleic acid molecule comprises amino acid residues 105 to 110 of E. coli TadA shown in SEQ ID NO: 1 or corresponding amino acid residues in a bacterial TadA shown in one of SEQ ID NOs: 2 to 8.

8. The nucleic acid deaminase according to any one of items 1 to 7, wherein the region of the wild-type TadA contacting the nucleic acid molecule consists of amino acid residues 105 to 110 of E. coli TadA shown in SEQ ID NO: 1 or corresponding amino acid residues in a bacterial TadA shown in one of SEQ ID NOs: 2 to 8.

9. The nucleic acid deaminase according to any one of items 1 to 8, wherein the one or more amino acid residue(s) are one, two or three amino acid residue(s) selected from A106, R107 and D108 of E. coli TadA shown in SEQ ID NO: 1 or a corresponding residue in a bacterial TadA shown in one of SEQ ID NOs: 2 to 8.

10. The nucleic acid deaminase according to any one of items 1 to 9, wherein the nucleic acid deaminase comprises an amino acid sequence derived from any one of

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIM ALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARAKTGAAGSLMDVLHH PGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 9);

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIM ALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGADAKTGAAGSLMDVLHH PGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 10);

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIM ALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGRDAKTGAAGSLMDVLHH PGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 11);

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIM ALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGAAKTGAAGSLMDVLHHP GMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 12);

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIM ALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGAKTGAAGSLMDVLHHPG MNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 13);

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIM ALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGDAKTGAAGSLMDVLHHP GMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 14); or

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIM ALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGRAKTGAAGSLMDVLHHP GMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 15).

11. The nucleic acid deaminase according to any one of items 1 to 10, wherein the nucleic acid deaminase comprises an amino acid sequence derived from any one of SEQ ID NOs: 9 to 15, and wherein the amino acid sequence of the nucleic acid deaminase has one or more mutation(s) compared to said SEQ ID NOs.

12. The nucleic acid deaminase according to item 11, wherein the one or more mutation(s) are selected from K20A, R21A, E59A, A106W, N108Q, L145T, F148A, R153del, V82G and V82W of wild-type E. coli TadA shown in SEQ ID NO: 1 or corresponding amino acid residues in a bacterial TadA shown in one of SEQ ID NOs: 2 to 8.

13. The nucleic acid deaminase according to any one of items 1 to 12, wherein the nucleotide sequence of the nucleic acid deaminase is codon-optimized for expression in a prokaryotic cell or a eukaryotic cell.

14. The nucleic acid deaminase according to item 13, wherein the eukaryotic cell is a plant cell, insect cell, animal cell or fungal cell.

15. The nucleic acid deaminase according to any one of items 1 to 14, wherein the nucleic acid deaminase has an amino acid sequence as set forth in SEQ ID NOs: 9 to 15.

16. The nucleic acid deaminase according to any one of items 1 to 15, wherein the amino acid sequence of the nucleic acid deaminase further comprises a nuclear localization sequence (NLS).

17. The nucleic acid deaminase according to any one of items 1 to 16, wherein the nucleic acid molecule is a DNA selected from the group consisting of genomic DNA, nuclear DNA, chromosomal DNA, organellar DNA, exogenous DNA, viral DNA and a stably maintained plasmid.

18. A base editor comprising (i) an effector protein comprising at least one nucleic acid deaminase according to any one of items 1 to 17.

19. The base editor according to item 18, wherein in (i), the effector protein comprises one or two nucleic acid deaminase(s) according to any one of items 1 to 17.

20. The base editor according to item 18 or 19, wherein in (i), the at least one nucleic acid deaminase comprises an amino acid sequence derived from any one of SEQ ID NOs: 9 to 15.

21. The base editor according to any one of items 18 to 20, wherein in (i), the effector protein further comprises at least one nucleic acid deaminase comprising an amino acid sequence of a wild-type TadA, wherein the wild-type TadA is a bacterial TadA selected from the group consisting of Escherichia coli TadA, Salmonella enterica TadA, Staphylococcus aureus TadA, Streptococcus pyogenes TadA, Salmonella typhi TadA, Haemophilus influenzae TadA, Caulobacter vibrioides TadA, and Shewanella putrefaciens TadA.

22. The base editor according to any one of items 18 to 21, wherein in (i), the effector protein further comprises at least one ligand capable of binding to a ligand binding moiety.

23. The base editor according to item 22, wherein in (i), the nucleic acid deaminase and the ligand are directly fused, connected via a linker or non-covalently linked.

24. The base editor according to item 22, wherein in (i), the nucleic acid deaminase and the at least one ligand are covalently linked.

25. The base editor according to item 22 or 23, wherein in (i), the ligand is an RNA binding domain.

26. The base editor according to any one of items 18 to 25, wherein the base editor further comprises (ii) a sequence-targeting protein.

27. The base editor according to item 26, wherein in (ii), the sequence-targeting protein is a nuclease.

28. The base editor according to item 26 or 27, wherein in (ii), the sequence-targeting protein is a nuclease comprising at least one catalytically inactive nuclease domain.

29. The base editor according to any one of items 26 to 28, wherein in (ii), the sequence-targeting protein is a CRISPR-Cas protein.

30. The base editor according to any one of items 26 to 29, wherein in (ii), the sequence-targeting protein is a type V Cas protein or a type II Cas protein.

31. The base editor according to any one of items 26 to 30, wherein in (ii), the sequence-targeting protein is a CRISPR-Cas protein with at least one catalytically inactive nuclease domain (nickase) or a nuclease-dead CRISPR-Cas protein.

32. The base editor according to any one of items 26 to 31, wherein the effector protein (i) and the sequence-targeting protein (ii) are directly fused, connected via a linker or non-covalently linked.

33. The base editor according to item 32, wherein the effector protein (i) and the sequence-targeting protein (ii) are connected via a linker comprising 1 to 100 amino acids.

34. The base editor according to any one of items 26 to 31, wherein the base editor further comprises (iii) an RNA-ligand binding complex.

35. The base editor according to item 34, wherein in (iii), the RNA-ligand binding complex comprises an RNA moiety capable of binding to the sequence-targeting protein and an RNA moiety capable of binding to a target site in a nucleic acid molecule.

36. The base editor according to item 34 or 35, wherein in (iii), the RNA-ligand binding complex further comprises at least one ligand binding moiety.

37. The base editor according to any one of items 34 to 36, wherein the effector protein (i) and the sequence-targeting protein (ii) are connected via the RNA-ligand binding complex (iii).

38. The base editor according to any one of items 34 to 37, wherein in (iii), the ligand binding moiety is an RNA motif.

39. The base editor according to any one of items 34 to 38, wherein in (iii), the ligand binding moiety is an RNA motif selected from a single MS2 phage operator stem-loop (MS2) RNA motif, Qbeta RNA motif, boxB RNA motif, telomerase Ku binding motif, telomerase Sm7 binding motif, SfMu phage Com stem-loop, Csy4 RNA motif and PP7 phage operator stem-loop RNA motif.

40. The base editor according to any one of items 34 to 39, wherein in (iii), the ligand binding moiety is located at an extension of a stem-loop of the RNA-ligand binding complex; optionally wherein the ligand binding moiety is located at the 5' end or 3' end of the RNA-ligand binding complex.

41. The base editor according to any one of items 34 to 40, wherein in (i) the effector protein further comprises at least one ligand capable of binding to a ligand binding moiety, wherein the ligand is an RNA binding domain, and wherein in (iii) the ligand binding moiety is an RNA motif.

42. The base editor according to any one of items 36 to 41, wherein in (iii), the RNA moiety capable of binding to the sequence-targeting protein, the RNA moiety capable of binding to a target site, and/or the ligand binding moiety are located within the same or different molecule(s).

43. The base editor according to any one of items 35 to 42, wherein in (iii), the RNA moiety capable of binding to the sequence-targeting protein comprises a tracrRNA or scoutRNA, and the RNA moiety capable of binding to a target site comprises a crRNA.

44. The base editor according to any one of items 18 to 43, wherein the base editor comprises:

(i) an effector protein comprising at least one nucleic acid deaminase according to any one of items 1 to 17 and at least one ligand capable of binding to a ligand binding moiety,

(ii) a sequence-targeting protein, and

(iii) an RNA-ligand binding complex comprising at least one ligand binding moiety; and wherein the at least one ligand of the effector protein (i) binds to the ligand binding moiety of the RNA-ligand binding complex (iii).

45. The base editor according to any one of items 18 to 44, wherein the base editor comprises:

(i) an effector protein comprising one or two nucleic acid deaminase(s) according to any one of items 1 to 17 and at least one ligand capable of binding to a ligand binding moiety,

(ii) a sequence-targeting protein comprising a CRISPR-Cas protein, and

(iii) an RNA-ligand binding complex comprising (a) an RNA moiety capable of binding to the sequence-targeting protein, (b) an RNA moiety capable of binding to a target site and (c) at least one ligand binding moiety; and wherein the at least one ligand of the effector protein (i) binds to the ligand binding moiety of the RNA-ligand binding complex (iii).

46. The base editor according to any one of items 18 to 31 and 34 to 45, wherein the base editor comprises:

(i) an effector protein comprising one nucleic acid deaminase having an amino acid sequence as set forth in any one of SEQ ID NOs: 9 to 15 and at least one ligand capable of binding to a ligand binding moiety,

(ii) a sequence-targeting protein comprising a CRISPR-Cas protein with at least one catalytically inactive nuclease domain (nickase) or a nuclease-dead CRISPR-Cas protein, and

(iii) an RNA-ligand binding complex comprising (a) an RNA moiety capable of binding to the sequence-targeting protein, (b) an RNA moiety capable of binding to a target site and (c) at least one ligand binding moiety; and wherein the at least one ligand of the effector protein (i) binds to the ligand binding moiety of the RNA-ligand binding complex (iii).

47. The base editor according to any one of items 18 to 46, wherein the deoxyadenosine (dA) is in a target site in the nucleic acid molecule.

48. The base editor according to item 47, wherein the target site comprises a deoxyadenosine (dA) and a protospacer.

49. The base editor according to item 48, wherein the deoxyadenosine (dA) is located at a position from 1 to 20 within the protospacer 5' of a protospacer adjacent motif (PAM).

50. The base editor according to item 48, wherein the deoxyadenosine (dA) is located at any one of the positions 4, 5, 6, 7 and 11 within the protospacer.

51. The base editor according to any one of items 18 to 50, wherein the base editor is a deoxyadenosine (dA) base editor.

52. The base editor according to any one of items 18 to 51, wherein the nucleic acid molecule is a DNA selected from the group consisting of genomic DNA, nuclear DNA, chromosomal DNA, organellar DNA, exogenous DNA, viral DNA and a stably maintained plasmid.

53. An isolated nucleic acid, an expression construct, or a vector comprising a sequence encoding the nucleic acid deaminase according to any one of items 1 to 17.

54. An isolated nucleic acid, an expression construct, or a vector comprising a sequence encoding the base editor according to any one of items 18 to 52, wherein the components (i), (ii) and/or (iii) are encoded on a single or separate molecule(s).

55. The isolated nucleic acid, the expression construct, or the vector according to item 53 or 54, wherein the isolated nucleic acid, the expression construct, or the vector further comprise a promoter operably linked to the sequence encoding the nucleic acid deaminase or the base editor, wherein the promoter is active in a prokaryotic cell or a eukaryotic cell.

56. A cell comprising the nucleic acid deaminase according to any one of items 1 to 17.

57. A cell comprising the base editor according to any one of items 18 to 52, wherein the cell is obtained from a prokaryotic cell or a eukaryotic cell.

58. A cell comprising the isolated nucleic acid, the expression construct, or the vector according to any one of items 53 to 55.

59. The cell according to any one of items 56 to 58, wherein the cell is obtained from a eukaryotic cell, and wherein the eukaryotic cell is a plant cell, insect cell, animal cell or fungal cell.

60. The cell according to item 59, wherein the animal cell is a mammalian cell.

61. The cell according to any one of items 56 to 60, wherein the cell is a genetically modified cell.

62. A kit of parts comprising part (i), wherein part (i) comprises an effector protein comprising at least one nucleic acid deaminase according to any one of items 1 to 17, and part (ii), wherein part (ii) comprises a sequence-targeting protein.

63. The kit of parts according to item 62, further comprising part (iii), wherein part (iii) comprises an RNA-ligand binding complex.

64. The kit of parts according to item 63 comprising parts (i), (ii) and (iii), wherein in part (i) the effector protein further comprises at least one ligand capable of binding to a ligand binding moiety, wherein in part (ii) the sequence-targeting protein is a CRISPR-Cas protein, and wherein in part (iii) the RNA-ligand binding complex comprises at least one ligand binding moiety; and wherein the at least one ligand of part (i) is capable of binding to the at least one ligand binding moiety of part (iii).

65. The kit of parts according to any one of items 62 to 64, wherein the parts (i), (ii), and/or (iii) are encoded on one or more expression construct(s).

66. A method for modifying a nucleic acid molecule, said method comprising contacting the nucleic acid molecule with at least one nucleic acid deaminase according to any one of items 1 to 17.

67. The method for modifying a nucleic acid molecule according to item 66, wherein the nucleic acid molecule comprises at least one deoxyadenosine (dA).

68. The method for modifying a nucleic acid molecule according to item 67, wherein the modifying results in an A-to-G transition.

69. The method for modifying a nucleic acid molecule according to any one of items 66 to 68, wherein the nucleic acid molecule is comprised in a cell.

70. The method for modifying a nucleic acid molecule according to item 69, wherein the at least one nucleic acid deaminase is delivered into the cell through a plasmid, an mRNA, a ribonucleoprotein particle (RNP) complex, a viral vector, or lipid nanoparticles (LNPs).

71. A method for target site-specific modification of a nucleic acid molecule, said method comprising contacting the nucleic acid molecule with a base editor according to any one of items 18 to 52 at a target site.

72. The method for target site-specific modification of a nucleic acid molecule according to item 71, wherein the nucleic acid molecule comprises at least one deoxyadenosine (dA).

73. The method for target site-specific modification of a nucleic acid molecule according to item 72, wherein the site-specific modification results in an A-to-G transition.

74. The method for target site-specific modification of a nucleic acid molecule according to any one of items 71 to 73, wherein the base editor comprises (i) an effector protein, (ii) a sequence-targeting protein and (iii) an RNA-ligand binding complex, and wherein in (iii) the RNA-ligand binding complex comprises an RNA moiety capable of binding to the sequence-targeting protein and an RNA moiety capable of binding to the target site.

75. The method for target site-specific modification of a nucleic acid molecule according to any one of items 71 to 74, said method comprising a first step of contacting the nucleic acid molecule with a first base editor according to any one of items 18 to 52 at a first target site and a simultaneous or subsequent second step of contacting the nucleic acid molecule with a second base editor at the first target site or at a second target site.

76. The method for target site-specific modification of a nucleic acid molecule according to item 75, wherein the first target site and the second target site are on the same nucleic acid molecule or different nucleic acid molecules.

77. The method for target site-specific modification of a nucleic acid molecule according to any one of items 72 to 76, wherein the deoxyadenosine (dA) is in a target site in the nucleic acid molecule.

78. The method for target site-specific modification of a nucleic acid molecule according to any one of items 71 to 77, wherein the nucleic acid molecule is comprised in a cell.

79. The method for target site-specific modification of a nucleic acid molecule according to item 78, wherein the cell is a prokaryotic cell or a eukaryotic cell.

80. The method for target site-specific modification of a nucleic acid molecule according to any one of items 74 to 79, wherein the effector protein (i) comprising at least one nucleic acid deaminase according to any one of items 1 to 17 is delivered into a cell.

81. The method for target site-specific modification of a nucleic acid molecule according to any one of items 74 to 80, wherein the sequence-targeting protein (ii) is delivered into a cell.

82. The method for target site-specific modification of a nucleic acid molecule according to any one of items 74 to 81, wherein the RNA-ligand binding complex (iii) is delivered into a cell.

83. The method for target site-specific modification of a nucleic acid molecule according to any one of items 78 to 82, wherein the base editor is delivered into the cell.

84. The method for target site-specific modification of a nucleic acid molecule according to any one of items 78 to 83, wherein the delivery into the cell is through a plasmid, an mRNA, an RNP complex, a viral vector or LNPs.

85. The method for modifying a nucleic acid molecule according to any one of items 66 to 70 or the method for target site-specific modification of a nucleic acid molecule according to any one of items 71 to 84, wherein the nucleic acid molecule is a DNA molecule.

86. The method according to item 85, wherein the nucleic acid molecule is a DNA selected from the group consisting of genomic DNA, nuclear DNA, chromosomal DNA, organellar DNA, exogenous DNA, viral DNA and a stably maintained plasmid.

87. A genetically engineered isolated cell obtained according to the method of any one of items 66 to 86.

88. A pharmaceutical composition comprising the nucleic acid deaminase according to any one of items 1 to 17 or the base editor according to any one of items 18 to 52.

89. The nucleic acid deaminase according to any one of items 1 to 17 or the base editor according to any one of items 18 to 52 for use as a medicament.

90. The nucleic acid deaminase according to any one of items 1 to 17 or the base editor according to any one of items 18 to 52 for use in treating or preventing a disease by target site-specific modification of a nucleic acid molecule.

91. The nucleic acid deaminase or the base editor for use according to item 90, wherein the nucleic acid molecule is selected from the group consisting of genomic DNA, nuclear DNA, chromosomal DNA, organellar DNA, exogenous DNA and viral DNA.

92. The nucleic acid deaminase or the base editor for use according to item 90 or 91, wherein a point mutation or a loss or gain-of-function mutation is targeted by the nucleic acid deaminase or the base editor to correct the mutation.

93. Use of the nucleic acid deaminase of any one of items 1 to 17 or the base editor according to any one of items 18 to 52 for the manufacture of a medicament.

94. A method for treating or preventing a disease comprising the target site-specific deamination of a deoxyadenosine (dA) in a nucleic acid molecule using the nucleic acid deaminase of any one of items 1 to 17 or the base editor according to any one of items 18 to 52.

95. Use of the nucleic acid deaminase of any one of items 1 to 17 or the base editor according to any one of items 18 to 52 as a research tool and/or as a screening tool.

Brief Description of Drawings

In order to best describe the manner in which the above-described embodiments are implemented, as well as define other advantages and features of the present disclosure, a more particular description is provided below and is illustrated in the appended drawings. Understanding that these drawings depict only exemplary embodiments of the present disclosure and are thus not to be considered to be limiting in scope, the examples will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

Figure 1 is an overview of the designs of the Escherichia coli transfer RNA adenosine deaminase enzyme deletion (ecTadAΔ) variants generated. It shows amino acid residues 105 to 130 ("β4-β5 loop segment") of the wild-type ecTadA deaminase sequence (SEQ ID NO: 1), referred to here as WT, which are located in the β4-β5 loop segment (based on PDB: P68398). The different ecTadAΔ variants generated are ecTadA-D108Δ (SEQ ID NO: 9), referred to as D1; ecTadA-R107Δ (SEQ ID NO: 10), referred to as D2; ecTadA-A106Δ (SEQ ID NO: 11), referred to as D3; ecTadA-D108Δ-R107Δ (SEQ ID NO: 12), referred to as D4; ecTadA-D108Δ-R107Δ-A106Δ (SEQ ID NO: 13), referred to as D5; ecTadA-R107Δ-A106Δ (SEQ ID NO: 14), referred to as D6; and ecTadA-D108Δ-A106Δ (SEQ ID NO: 15), referred to as D7. The deleted amino acid positions are highlighted in grey. The SEQ ID NOs depicted in the figure relate to the β4-β5 loop segment of the wild-type ecTadA (amino acid residues 105-130; SEQ ID NO: 16) and the β4-β5 loop segment of each ecTadAΔ variant (SEQ ID NOs: 17-23).

Figure 2 is a representation of the percentage of adenine (A) or other nucleotides to guanine (G) ("N-to-G") conversions across the Site2a protospacer sequence (SEQ ID NO: 24) using the different ecTadAΔ variants D1, D2, D3, D4 and D7, a SpCas9 nickase and a single guide RNA targeting Site2a. The values shown represent the degree of nucleotide conversion (in percentage) for each of the Site2a nucleotides shown. "NT" represents the non-transfected control. In the context of N-to-G editing (e.g., TadAΔ variants), G-to-G events were excluded from the analysis, shown as "-" (not applicable).

Figure 3 is a representation of the percentage of N-to-G conversions across the Site2a (SEQ ID NO: 24) (Fig. 3A), Site2c (SEQ ID NO: 25) (Fig. 3B), Site45 (SEQ ID NO: 26) (Fig. 3C) and Site312 (SEQ ID NO: 27) (Fig. 3D) protospacer sequences using an aptamer-recruitment-dependent base editing system comprising the different ecTadAΔ variants C-terminally fused to MCP: D1-MCP, D2-MCP, D3-MCP, D4-MCP, D5-MCP, D6-MCP and D7-MCP, the SpCas9_D10A nuclease and a gRNA containing an MS2 aptamer in 3' position. The MS2 aptamer recruits the ecTadAΔ variants fused to the MCP protein. The values shown represent the degree of A-to-G transition (in percentage) for each nucleotide across the corresponding protospacer. "NT" represents the non-transfected control. In the context of N-to-G editing (e.g., TadAΔ variants), G-to-G events were excluded from the analysis, shown as "-" (not applicable).

Figure 4 is a dot plot that provides a representation of the A-to-G transition (in percentage) of the A in position 5 ("A5") of Site2a (Fig. 4A), in position 5 ("A5") of Site2c (Fig. 4B), in position 6 ("A6") of Site45 (Fig. 4C), and in positions 5 and 7 ("A5"; "A7") for Site312 (Fig. 4D) protospacer sequences using the same aptamer-recruitment-dependent base editing system as described in Fig. 3. It shows the comparison of the percentage of A-to-G transition between the ecTadAΔ variants D1-MCP, D2-MCP and D3-MCP and a corresponding ecTadA substitution variant which contains an amino acid substitution instead of the deletion at the same position. D1-MCP was compared to ecTadA-D108N-MCP, D2-MCP was compared to ecTadA-R107C-MCP, and D3-MCP was compared to ecTadA-A106V. For Site312 in Fig. 4D, the A-to-G transition of the A in position 7 is presented in open circled dots. "NT" represents the non-transfected control.

Figure 5 is a representation of the percentage of N-to-G conversions across the Site2c (SEQ ID NO: 25) (Fig. 5A) and Site45 (SEQ ID NO: 26) (Fig. 5B) protospacer sequences using the aptamer-recruitment-dependent base editing system described in Example 2 with the ecTadAΔ variants: D1-MCP as a monomer, N-terminally fused MCP-D1, D1-D1-MCP as a homodimer, D1-WT-MCP as a heterodimer, N-terminally fused MCP-D1-WT as a heterodimer, and WT-D1-MCP as a heterodimer. The values shown represent the percentage of N-to-G conversion for each nucleotide across each protospacer. "NT" represents the non-transfected control. In the context of N-to-G editing (e.g., TadAΔ variants), G-to-G events were excluded from the analysis, shown as "-" (not applicable).

Figure 6 is a representation of the percentage of N-to-G conversions across the Site45 protospacer (SEQ ID NO: 26) (Fig. 6A), SiteB2M protospacer (SEQ ID NO: 28) (Fig. 6B) and Site2c protospacer (SEQ ID NO: 25) (Fig. 6C) sequences using a base editing system comprising either a CRISPR-Cas protein with at least one catalytically inactive nuclease domain (e.g., SpCas9_D10A nickase) or a CRISPR-Cas protein with no catalytically active nuclease domains (e.g., nuclease-dead SpCas9_D10A+H840A). The base editing system further comprised the ecTadAΔ variant D1-MCP, as described in Example 2. The values shown represent the percentage of N-to-G conversions for each nucleotide across each protospacer. "NT" represents the non-transfected control. In the context of N-to-G editing (e.g., TadAΔ variants), G-to-G events were excluded from the analysis, shown as "-" (not applicable).

Figure 7 is a representation of the percentage of N-to-G and N-to-T conversions across the Site2c protospacer (SEQ ID NO: 25) (Fig. 7A) and SiteB2M protospacer (SEQ ID NO: 28) (Fig. 7B) for a multiplex base editing with an ecTadAΔ deletion variant and ratAPOBEC1. Base conversions were achieved using an aptamer-recruitment-dependent base editing system comprising an individual ecTadAΔ variant (e.g., SEQ ID NO: 43) and ratAPOBEC1 (SEQ ID NO: 72), both C-terminally fused to MCP (D1-MCP and ratAPOBEC-MCP), the SpCas9_D10A nuclease and a gRNA containing an MS2 aptamer in 3' position. The MS2 aptamer recruited D1-MCP and ratAPOBEC-MCP to the respective target regions at Site2c or SiteB2M to allow for simultaneous base editing of adenosine and cytidine in the protospacer sequence. The values shown represent the degree of N-to-G (D1-MCP, left column) or N-to-T (ratAPOBEC-MCP, right column) conversion in percent for each nucleotide across the corresponding protospacer. "NT" represents the non-transfected control. The unprocessed Sanger sequence trace data is shown in Fig. 7C for B2M (see also Example 7; Fig. 7B). "NT B2M" represents the non-transfected control at the SiteB2M protospacer (SEQ ID NO: 28). In the context of N-to-G editing (e.g., TadAΔ variant), G-to-G events were excluded from the analysis, shown as "-" (not applicable). In the context of N-to-T editing (e.g., APOBEC1), T-to-T events were excluded from the analysis, shown as "-" (not applicable).

Figure 8 is a representation of the percentage of N-to-G conversions across the Site2c protospacer (SEQ ID NO: 25) (Fig. 8A) and across the Site45 protospacer (Fig. 8B) using a fusion system of a SpCas9_D10A nuclease and an ecTadAΔ variant. The values shown represent the degree of A-to-G transition (in percentage) for each nucleotide across the corresponding protospacer. "NT" represents the non-transfected control. In the context of N-to-G editing, G-to-G events were excluded from the analysis, shown as "-" (not applicable).

Figure 9 is a representation of the percentage of N-to-G conversions across the Site2c protospacer (SEQ ID NO: 25) in hiPSCs, using an aptamer-recruitment-dependent base editing system comprising one ecTadAΔ variant C-terminally fused to MCP: D1-MCP, a Cas9 nickase (SpCas9_D10A) and a gRNA containing an MS2 aptamer in 3' position. The values shown represent the degree of A-to-G transition (in percentage) for each nucleotide across the corresponding protospacer. "NT" represents the non-transfected control. In the context of N-to-G editing, G-to-G events were excluded from the analysis, shown as "-" (not applicable).
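
The per-position N-to-G percentages reported in the figures, with G-to-G events excluded, can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `n_to_g_percent`, the per-position base-count input format and the toy counts are hypothetical and are not taken from the disclosure, which does not specify the analysis software used.

```python
from typing import Dict, List, Optional


def n_to_g_percent(protospacer: str,
                   base_counts: List[Dict[str, int]]) -> List[Optional[float]]:
    """Return the percentage of G calls at each protospacer position.

    Positions whose reference base is already G are reported as None,
    mirroring the "-" (not applicable) entries for G-to-G events.
    `base_counts` holds hypothetical per-position base tallies.
    """
    if len(protospacer) != len(base_counts):
        raise ValueError("one count dictionary is required per position")
    percentages: List[Optional[float]] = []
    for reference_base, counts in zip(protospacer.upper(), base_counts):
        if reference_base == "G":
            percentages.append(None)  # G-to-G is excluded from the analysis
            continue
        total = sum(counts.values())
        if total == 0:
            raise ValueError("no base calls available for a position")
        percentages.append(100.0 * counts.get("G", 0) / total)
    return percentages


# Toy three-nucleotide example (values are illustrative only).
example = n_to_g_percent(
    "AGT",
    [{"A": 60, "G": 40}, {"G": 100}, {"T": 100}],
)
```

In this toy input, the first position would be reported as 40% A-to-G conversion, the second as not applicable, and the third as 0%.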

Description of Embodiments

Definitions

All publications, patents and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.

In accordance with the present disclosure, each occurrence of the term "comprising" may optionally be substituted with the term "consisting of" or "consisting essentially of". The word "comprise" or variations, such as "comprises" or "comprising" will be understood to imply the inclusion of a stated integer (or components) or group of integers (or components), but not the exclusion of any other integer (or components) or group of integers (or components). The articles "a / an" and "the" are used herein to refer to one or to more than one (e.g., to at least one) of the grammatical object of the article unless otherwise clearly indicated by contrast. By way of example, "an element" means one element or more than one element. The term "or" is used herein to mean, and is used interchangeably with, the term "and/or," unless context clearly indicates otherwise.

The term "about" as used herein refers to a deviation of ± 10 % from the recited value. When the word "about" is used herein in reference to a number, it should be understood that still another embodiment includes that number not modified by the presence of the word "about". In the absence of the term "about" and unless the context dictates otherwise, generally accepted rounding rules apply to the specified values.

"Polypeptide", "peptide", and "protein", as used herein, are used interchangeably herein and refer to a polymeric form of amino acids of any length, which can include genetically coded and non-genetically coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The term includes fusion proteins including fusion proteins with a heterologous amino acid sequence, fusions with heterologous and homologous leader sequences, with or without N-terminal methionine residue, immunologically tagged proteins, and the like. "Fusion protein", as used herein, refers to a non-naturally occurring fusion which contain one or more protein(s). Fusion proteins are molecules that contain a portion or a complete amino sequence of each of two or more proteins. The components of fusion proteins may be fused directly to each other through, for example, covalent bonds or through linkers as described below. Fusion proteins may also be associated with moieties that do not contain amino acids such as nucleotides sequences.

"Linker", as used herein, refers to a bond (e.g., covalent bond), chemical group, or a molecule linking two molecules or moieties, e.g., two domains of a fusion protein, such as, for example, a Cas protein and an effector protein. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 1-100 amino acids in length, for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.

"Naturally-occurring", as used herein, refers to a nucleic acid, a protein, a cell, or an organism that is found in nature.

"Recombinant", as used herein, refers to a nucleic acid, a protein, a cell, or an organism that is artificially produced (e.g., formed by laboratory methods). "Recombinant nucleic acid", as used herein, refers to a nucleic acid that is the product of various combinations of cloning, restriction, and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from nucleic acids found in natural systems. "Recombinant polypeptide", as used herein, refers to a polypeptide that is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of amino acid(s).

"Nucleic acid" and "polynucleotide", as used herein, are used interchangeably, and refer to biopolymers containing nucleotides. Nucleic acid molecules refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, this term includes single-, double- or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The nucleic acid sequence can be DNA or RNA or combinations thereof of any length, can be linear, circular or branched and can be single-stranded or double-stranded or combinations thereof. As applicable to the embodiment being described, the terms "nucleic acid" and "polynucleotide" include single-stranded, double-stranded or multi-stranded polynucleotides. A "nucleotide" refers to a ribonucleotide or a deoxyribonucleotide or modified form thereof, as well as analogs thereof. Nucleotides include species that comprise purines, e.g., adenine, hypoxanthine, guanine, and their derivatives and analogs, as well as pyrimidines, e.g., cytosine, uracil, thymine, and their derivatives and analogs. Preferably, a nucleotide comprises a cytosine, uracil, thymine, adenine, or guanine moiety. Further, the term nucleotide also includes those species that have a detectable label, such as for example a radioactive or fluorescent moiety, or mass label attached to the nucleotide. The term nucleotide also includes what are known in the art as universal bases. By way of example, universal bases include but are not limited to 3-nitropyrrole, 5- nitroindole, or nebularine. Nucleotide analogs are, for example, meant to include nucleotides with bases such as inosine, queuosine, xanthine, sugars such as 2'-methyl ribose, and non-natural phosphodiester internucleotide linkages such as methylphosphonates, phosphorothioates, phosphoroacetates and peptides.

"Nucleic acid molecule", as used herein, refers to biopolymers containing nucleotides (see, e.g., "nucleic acid"). In the context of the nucleic acid deaminase of the disclosure, the nucleic acid molecule comprises at least one base as a substrate for a nucleic acid deaminase (e.g., a deoxyadenosine, dA). In some instances, the nucleic acid molecule comprises a "DNA region" comprising the at least one dA. In some instances, the nucleic acid molecule may be a DNA selected from the group consisting of genomic DNA, nuclear DNA, chromosomal DNA, organellar DNA (e.g., mitochondrial DNA, chloroplast DNA, and the like), exogenous DNA (e.g., episomal DNA, a minicircle, viral DNA, a stably maintained plasmid, and the like).

"Isolated", as used herein, refers to a nucleic acid, a protein, a cell, or an organism that is in an environment different from that in which the nucleic acid, the protein, the cell, or the organism naturally occurs. "Isolated nucleic acid", as used herein, can encompass naturally occurring as well as artificial (e.g., chemically or enzymatically modified) parts or building blocks. "Isolated cell", as used herein, can encompass a cell that is substantially separated from other cells of a tissue. "Regulatory element", "regulatory sequence", and "control element", used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate expression of a (non) coding sequence and/or production of the encoded polypeptide in a host cell.

"Operably linked", as used herein, refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For example, a promoter is operably linked to a coding sequence if the promoter affects its transcription, expression, or degradation level.

"Vector", "plasmid" or "construct", as used herein, are used interchangeably and refer to a vehicle (e.g., a molecule or complex) to transfer genetic material into a cell for the purpose of the expression and/or propagation of the genetic material, or to be used in the construction of other genetic material. In some instances, a vector is a plasmid (e.g., a stably maintained plasmid), a virus or bacteriophage, a cosmid or an artificial chromosome. In most instances, a vector refers to a nucleic acid molecule harboring at least one origin of replication, a multiple cloning site (MCS) and one or more selection marker(s). A vector can be introduced into cells and organisms to express RNA transcripts, proteins, and peptides, and may be termed an "expression vector".

"Cell", as used herein, are used interchangeably, and refer to an in vivo or in vitro cell, such as an in vivo or in vitro eukaryotic cell, prokaryotic cell, or a cell obtained from a multicellular organism (e.g., a cell line) used as recipients for a nucleic acid (e.g., an expression vector), and/or a polypeptide (e.g., a ribonucleoprotein, "RNP" complex). A recombinant host cell is a host cell which has been introduced a nucleic acid (e.g., an expression vector), and/or a polypeptide (e.g., RNP complex). For example, a host cell supports the replication of a vector or expression of a protein. In most instances, host cells are eukaryotic cells such as yeast, fungal, protozoal, higher plant, insect (e.g., Sf9), amphibian cells, or mammalian cells (e.g., HEK293, U2OS, immune cells, primary immune cells, primary T cells, induced pluripotent stem cells (iPSCs), stem cells, CHO, HeLa, Vero, MDCK, BHK, COS-1 etc.). Mammalian cells include, e.g., cultured cells (in vitro), explants and primary cultures (in vitro and ex vivo), and cells in vivo. In some embodiments, the cell is an in vitro cell that is not the human body, at the various stages of its formation and development. "Genetically modified cell" or "genetically engineered cell", as used herein, are used interchangeably and refer to an in vivo or in vitro cell, such as an in vivo or in vitro eukaryotic cell, prokaryotic cell, or a cell obtained from a multicellular organism (e.g., a cell line) which has been modified in its nucleic acid, such as its extrachromosomal nucleic acid or its genomic nucleic acid or genomic nucleic acid on a chromosome. For example, the genetically engineered isolated cell can be modified by the method of the present disclosure in its DNA genome or in an extrachromosomal DNA or in genomic DNA on a chromosome.

"Transformation", as used herein, refers to a transient or a permanent genetic change induced in a cell following introduction of a vector into the cell (e.g., a nucleic acid exogenous to the cell). Suitable methods for genetic modification include viral infection, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, and the like. The choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (e.g., in vitro, ex vivo, or in vivo).

"Sequence identity", as used herein, with respect to a nucleic acid sequence or a protein sequence is defined as the percentage of nucleotides or amino acid residues in a candidate sequence that are identical with the nucleotides or amino acid residues in the specific (parental) sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent nucleotide or amino acid sequence identity can be achieved in various ways that are within the skill in the art. For instance, using publicly available computer software such as BLAST (available at ncbi.nlm.nih.gov/BLAST (Altschul et al J Mol Biol. 1990 Oct 5;215(3):403-10. doi: 10.1016/S0022-2836(05)80360-2), BLAST-2, ALIGN or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.

"Complementary", as used herein, refers to the ability of a nucleic acid to form one or more hydrogen bonds with another nucleic acid sequence by either traditional Watson-Crick base-pairing or other non-traditional types of base pairs. A percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary, respectively). "Perfect complementarity" means that all of the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. "Substantially complementary" refers to a degree of complementarity that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99%, over a region of, for example, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more consecutive nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.

"Wild-type sequence" and "WT sequence" and "WT", as used herein, are used interchangeably, and refer to an amino acid sequence or a nucleotide sequence that can be used as template for subsequent reactions or modifications. The wild-type sequence may include a nucleic acid sequence (such as DNA or RNA or combinations thereof) or an amino acid sequence or may be composed of different chemical entities. In some instances, the sequence initially provided would be regarded as wild-type sequence in view of downstream processes based thereon, irrespective of whether the sequence itself is a natural (i.e., is found in nature, including allelic variations and has not been intentionally modified) or a modified sequence (e.g., the sequence was modified with regard to another wild-type sequence or is completely artificial). Methods to obtain a wild-type sequence by chemical, enzymatic or other means are known in the art. A nucleic acid wild-type sequence may be obtained by PCR amplification of a corresponding template region or may be synthesized de novo based on assembly of synthetic oligonucleotides.

"Wild-type TadA", as used herein, refers to the wild-type (WT) TadA enzyme which comprises a five-stranded beta-sheet core, with five alpha-helices wrapped around to form the active site. In addition, wild-type TadA displays a loop region that joins the beta4 and beta5 strands ("loop between |34 and |35" or "|34-|35 loop") of TadA. In wild-type Escherichia coli TadA, ecTadA, the |34-|35 loop comprises or consists of residue numbers 118 to 142 (24 amino acids) (Fig. 2 of Kim et al. Biochemistry, Vol. 45, No. 20, 2006, doi: 10.1021/bi0522394; Rallapalli et al. 2020, doi: 10.1126/sciadv.aaz2309).

"|34-|35 loop segment", as used herein, refers to amino acid residues from 105 to 130 of the wild-type ecTadA or a corresponding region in other wild-type TadA proteins from other organisms. "A region of the wild-type TadA contacting the nucleic acid molecule", as used herein, refers to a region contacting the target nucleic acid molecule or a corresponding region in other wild-type TadA proteins.

"Nucleic acid deaminase", as used herein, refers to enzymes involved in purine metabolism or pyrimidine metabolism. A "nucleic acid deaminase capable of deaminating a deoxyadenosine (dA) in a nucleic acid molecule" as used herein refers to a deaminase enzyme that acts on deoxyadenosine (also referred to as "adenosine deaminase") and deaminates the deoxyadenosine to convert it into an inosine, which base pairs like guanine (G) in the context of DNA.

"Nucleic acid deaminase variant" or "TadA variant", as used herein, refers to a polynucleotide or polypeptide of a nucleic acid deaminase which has a sequence substantially similar to a reference WT polynucleotide or polypeptide. In particular, the "nucleic acid deaminase variant" or "TadA variant" comprises at least one amino acid residue difference (e.g., a deletion, insertion or a substitution) compared to a wild-type nucleic acid deaminase. Sequences may be analyzed by sequence comparison, typically one sequence acts as a reference sequence (e.g., wild-type TadA), to which one or more sequence(s) (e.g., TadA variant) are compared. When using a sequence comparison algorithm, a reference sequence and test sequence(s) are entered into a computer and sequence algorithm program parameters are designated. Default program parameters can be used, as described for the BLASTN (nucleic acids) and BLASTP (proteins) programs, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequence(s) relative to the reference sequence, based on the program parameters. One algorithm suitable for determining percent sequence identity and sequence similarity are BLAST algorithms. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. Similarities and/or differences in sequences between a variant and the reference polynucleotide can be detected using conventional techniques known in the art, for example polymerase chain reaction (PCR) and hybridization techniques.

"TadA deletion variant", "TadA variant", "TadAA variant" or "TadAA deletion variant", as used herein, comprises an amino acid sequence of a wild-type TadA, wherein one or more amino acid residue(s) of the wild-type TadA are deleted, i.e., refers to a TadA variant wherein one or more amino acid residue(s) are deleted in the obtained amino acid sequence (TadAA) as compared to the wild-type amino acid sequence (WT TadA).

"Mutation", as used herein, refers to a substitution of a residue within a sequence with another residue (e.g., a nucleic acid or amino acid sequence), a deletion of one or more residues within a sequence (e.g., a nucleic acid or amino acid sequence), or an insertion of one or more residues within a sequence (e.g., a nucleic acid or amino acid sequence). For example, "mutation of an amino acid residue" means that a specific amino acid residue in an amino acid sequence of a protein is substituted, deleted or inserted with an amino acid residue different from an amino acid residue in a corresponding wild-type amino acid sequence. The determination of the presence or absence of the mutation of the amino acid residue described above may be performed by known methods. Methods to obtain a mutated sequence by chemical, enzymatic or other means are known in the art.

"One or more mutation(s)", as used herein, refers to a one or more mutation are selected from any one of the mutations which have an effect on the editing efficiency, nucleotide context preference and/or off targets. For example, mutation may be selected from mutations described in Zhou, C. et al. Nature 571, 275-278 (2019), doi: 10.1038/s41586- 019-1314-0; Li, J. et al. Nat. Commun. 12, 2287 (2021), doi: 10.1038/s41467-021-22519-z; Grunewald, J. et al. Nat. Biotechnol. 37, 1041-1048 (2019), doi:10.1038/s41587-019-0236- 6; Rees et al Sci Adv (2019), doi: 10.1126/sciadv.aax5717.

"One or more amino acid residue(s) of the wild-type amino acid sequence are deleted", as used herein, refers to one or more amino acid residue(s) which are missing in the obtained amino acid sequence as compared to the wild-type amino acid sequence. Methods to obtain a nucleic acid deaminase variant wherein one or more amino acid residue(s) of the wild-type amino acid sequence are deleted are site-directed mutagenesis, G-block™ synthetic generation, custom PCR amplification and others known by the person skilled in the art. Methods to delete one or more amino acid residues have been described, for example in Kenneth W. Walker, Jeremy D. King. Site-Directed Mutagenesis. Encyclopedia of Cell Biology (Second Edition), Academic Press, 2023, pages 161-169. In some embodiments, apart from the one or more amino acid residue(s) of the wild-type amino acid sequence being deleted, the variant comprises other mutations.

The term "amino acid deletion" or "deletion," as used herein, refers to the removal of an amino acid at a particular position in a parent, reference or wild-type (WT) polypeptide sequence. For example, D108- or D108A or D108del designates a deletion of aspartic acid at position 108.

"TadA substitution variant", as used herein, refers to a TadA variant wherein one or more amino acid residue(s) are substituted in the obtained amino acid sequence by another amino acid residue, as compared to the wild-type amino acid sequence (WT TadA).

"ecTadA", as used herein, refers to Escherichia coli (ec) transfer RNA adenosine deaminase enzyme (TadA).

"Aptamer-recruitment-dependent base editing system", as used herein, refers to a base editor comprising (i) an effector protein comprising at least one nucleic acid deaminase according to the present disclosure (TadA deletion variant) and at least one ligand capable of binding to a ligand binding moiety; (ii) a sequence-targeting protein; and (iii) an RNA- ligand binding complex comprising (a) an RNA moiety capable of binding to the sequencetargeting protein, (b) an RNA moiety capable of binding to a target site and (c) at least one ligand binding moiety; and wherein the at least one ligand of the effector protein (i) binds to the ligand binding moiety of the RNA-ligand binding complex (iii). For example, if all parts (i), (ii) and (iii) are present in a cell, the sequence-targeting protein (ii) and the RNA-ligand binding complex (iii) are capable of recruiting the effector protein (i) to a target site (e.g., a protospacer) in the nucleic acid molecule.

"Base editing", as used herein, refers to a process wherein a nucleotide base is modified as compared to the initial (e.g., wild-type) base at the same position. Base editing (e.g., targeted point mutations) will necessarily reproduce the change in any mRNA that is transcribed from the edited DNA. Adenosine deaminases remove an amino group from the deoxyadenosine nucleotide target, converting the dA into inosine. During DNA repair or replication, inosine is recognized as guanine by polymerase enzymes, resulting in transition (also referred to as "conversion") of an A:T base pair into a G:C base pair in the DNA that has been edited.

"Target site" of a base editor, as used herein, refers to a nucleic acid sequence comprising at least one deoxyadenosine (dA). To specifically define and distinguish a target site from other sites on a nucleic acid molecule (e.g., a single nucleic acid molecule; two or more nucleic acid molecules having different sequences), the target site must have a sufficient sequence length which depends on the approximate size of the total sequence space (e.g., in the context of a mammalian genome, the target site comprises more than 10 specifically defined nucleotides including the at least one dA). For example, in the context of a base editor comprising a sequence-targeting protein such as a CRISPR-Cas protein, a target site may comprise 17-20 nucleotides (protospacer) including at least one dA (e.g., located in a position between 4 to 11 within the protospacer).

"Editing activity", as used herein, refers to the adenosine deaminase activity for converting a target N-to-G in a DNA target and the level or percentage of N-to-G conversions in a desired target nucleic acid in an editing assay, as measured, e.g., by Sanger sequencing, deep-sequencing or Next Generation Sequencing or any other method known to the person skilled in the art. The conversion is sufficient if at least about 1%, about 3%, about 5% of the editing target bases are converted under appropriate conditions.

"Sequence-targeting protein", as used herein, refers to a protein capable of targeting a specific nucleic acid sequence (e.g., binding to a specific nucleic acid sequence is preferred as compared to nucleic acid sequences having at least one nucleotide difference to the specific nucleic acid sequence). The sequence-targeting protein may be selected from the group consisting of an amino acid sequence motif (e.g., a pentatricopeptide repeat, PPR), a meganuclease (MN), a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN) and a CRISPR-Cas system.

"Guide RNA" or "gRNA", as used herein, refers to a nucleic acid comprising a nucleotide sequence (guide sequence) that is complementary to a sequence (target site) of a target nucleic acid, and the nucleic acid of the gRNA forms a complex with a CRISPR-Cas protein. A gRNA comprises, consists essentially of, or consists of a CRISPR RNA (crRNA) and in some embodiments, it may also comprise a trans-activating CRISPR RNA (tracrRNA). It may be created synthetically or enzymatically, and it may be in the form of a contiguous strand of nucleotides in which case it is a single guide RNA (sgRNA), or in some embodiments, formed by the hybridization of a crRNA and a tracrRNA that are not covalently linked together to form a contiguous chain of nucleotides (also referred to as a two-part gRNA). Additionally, each gRNA (or component thereof, e.g., crRNA and tracrRNA if present) may independently be encoded by a vector such as a plasmid, a lentivirus, an adeno associated virus (AAV), a retrovirus, an adenovirus, a coronavirus, a Sendai virus, and the like.

"Ligand binding moiety", as used herein, refers to a moiety such as an aptamer e.g., oligonucleotide or peptide or another compound that binds to a specific ligand and can reversibly or irreversibly be associated with that ligand. To be reversibly associated means that two molecules or complexes can retain association with each other by, for example, non-covalent forces such as hydrogen bonding, and be separated from each other without either molecule or complex losing the ability to associate with other molecules or complexes. In a base editor, the ligand binding moiety links the nucleic acid deaminase and the sequence-targeting protein through the binding of the ligand binding moiety to its ligand.

In some embodiments, the at least one nucleic acid deaminase and the ligand are directly fused or connected, i.e., no linker is present between the deaminase and the ligand. In other embodiments, both molecules are connected through a linker. In other embodiments, both molecules are linked via other molecules or domains, such as via a SH3 domain, wherein the two molecules are connected through a SH3 (Src homology 3) domain in one molecule and a SHL (SH3 interaction ligand) in the other molecule.

In some embodiments, the effector protein and the sequence-targeting protein are directly fused or connected, i.e., no linker is present between the effector and the sequence-targeting protein. In other embodiments, both molecules are connected through a linker. In other embodiments, both molecules are linked via other molecules or domains, such as via a SH3 domain, wherein the two molecules are connected through a SH3 (Src homology 3) domain in one molecule and a SHL (SH3 interaction ligand) in the other molecule.

"RNA-ligand binding complex", as used herein, refers to a complex comprising two subcomponents: (a) an RNA moiety capable of binding to the sequence-targeting protein and (b) an RNA moiety capable of binding to a target site in a nucleic acid molecule. As a further component, the complex may comprise at least one ligand binding moiety. In one embodiment, the ligand binding moiety is an RNA motif. In some embodiments, the RNA moiety capable of binding to the sequence-targeting protein is a guide RNA. In some embodiments, the RNA moiety capable of binding to a target site in a nucleic acid molecule is a CRISPR motif, such as a tracrRNA.

The nucleotides within the RNA strand(s) of the components of the RNA-ligand binding complex and of a gRNA may be entirely ribonucleotides or a combination of ribonucleotides and other nucleotides such as deoxyribonucleotides. Each nucleotide may be unmodified, or one or more nucleotides, if not all nucleotides may be modified, e.g. with one or more of the following modifications: 2'-O-methyl, 2' fluoro or 2' aminopurine. In some embodiments, over one or more ranges of one to forty or two to twenty or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, or 36 nucleotides, there are consecutively modified nucleotides or a modification pattern of every second, or every third or every fourth nucleotide being modified at its 2' position with all other nucleotides being unmodified. Additionally, or alternatively, between one or more pairs or every pair of consecutive nucleotides, there may be modified or unmodified internucleoside linkages.

"Treatment" or "treating", as used herein, refers to obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom(s) thereof and/or may be therapeutic in terms of a partial or complete cure for a disease and/or adverse effect attributable to the disease. The treatment covers any treatment of a disease in a subject and includes preventing the disease from occurring in the subject which may be predisposed to the disease but has not yet been diagnosed as having it, inhibiting the disease (arresting its development), and relieving the disease (causing regression of the disease).

A "subject" (also referred to as "individual" or "patient"), as used herein, refers to an individual organism, e.g., a mammal including murines, simians, non-human primates, humans, animals (e.g., farm animals, sport animals, and pets such as dogs and cats), and plants. The tissues, cells and their progeny of an organism or other biological entity obtained in vivo or cultured in vitro are also encompassed within the terms subject and patient. Additionally, in some embodiments, a subject may be an invertebrate animal, for example, an insect or a nematode; while in others, a subject may be a plant or a fungus.

Detailed Description of Embodiments

Hereinafter, embodiments for carrying out the present disclosure will be described in detail. However, the present disclosure is not limited to the following embodiments.

A nucleic acid deaminase (TadA deletion variant)

The wild-type TadA enzyme does not exhibit any significant enzymatic activity on ssDNA when fused to a catalytically impaired Cas9 protein (Cas9 D10A nickase, nCas9) in a single polypeptide chain. Surprisingly, deleting one or more amino acids in the WT TadA deaminase enables the TadA deletion variants of the present disclosure to modify DNA. Without wishing to be bound by any particular theory, the TadA deletion variants described herein deaminate A bases in DNA, causing A-to-G mutations via inosine formation. Inosine preferentially hydrogen bonds with C, resulting in an A-to-G mutation during DNA replication.

Provided are nucleic acid deaminase enzymes comprising an amino acid sequence of a wild-type TadA, wherein one or more amino acid residue(s) of the wild-type TadA are deleted (TadA deletion variant), and wherein the one or more deleted amino acid residue(s) are located within a region of the wild-type TadA contacting the nucleic acid molecule.

In one embodiment, the deleted amino acid residue(s) of the WT TadA which are deleted in the TadA deletion variant are one, two or three amino acid residue(s) selected from A106, R107 and D108 of ecTadA (SEQ ID NOs: 9-15) or a corresponding residue in a bacterial TadA (e.g., shown in any one of SEQ ID NOs: 2-8). In some embodiments, the TadA variants comprise other mutations apart from the one or more deletions. For example, the TadA variant may be optimized for minimized A-to-I RNA off-target activity by mutating one or more amino acid residues of wild-type TadA, e.g., by introducing ecTadA F148A (Zhou, C. et al. Nature 571, 275-278 (2019)), e.g., by introducing ecTadA R153 deletion (Li, J. et al. Nat. Commun. 12, 2287 (2021)), and/or by introducing ecTadA V82G/W (Grunewald, J. et al. Nat. Biotechnol. 37, 1041-1048 (2019)) or corresponding residues in a bacterial TadA (e.g., shown in any one of SEQ ID NOs: 2-8). In other embodiments, the TadA variant may be optimized by mutating one or more amino acid residues of wild-type TadA, e.g., by introducing ecTadA A106W (Gaudelli et al. 2020; doi:10.1038/nature24644), e.g., by introducing ecTadA N108Q and/or L145T (Chen et al. 2022; doi: 10.1101/2022.08.12.503700), e.g., by introducing ecTadA E59A (Rees et al. 2019; doi: 10.1126/sciadv.aax5717), e.g., by introducing ecTadA K20A, R21A and/or V82G (Grunewald, J. et al. Nat. Biotechnol. 37, 1041-1048 (2019)), or combinations thereof.

In some embodiments, the nucleic acid deaminase comprises or consists of a wild-type E.coli WT TadA, ecTadA enzyme (SEQ ID NO: 1) or a deaminase comprising at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% of sequence identity to SEQ ID NO: 1, wherein one, two or three amino acid residue(s) selected from A106, R107 and D108 are deleted.

In some embodiments, the nucleic acid deaminase comprises or consists of a Salmonella enterica WT TadA enzyme (SEQ ID NO: 2) or a deaminase comprising at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% of sequence identity to SEQ ID NO: 2, wherein one, two or three amino acid residue(s) selected from A117, R118 and D119 are deleted.

In some embodiments, the nucleic acid deaminase comprises or consists of a Staphylococcus aureus WT TadA enzyme (SEQ ID NO: 3) or a deaminase comprising at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% of sequence identity to SEQ ID NO: 3, wherein one, two or three amino acid residue(s) selected from A102, D103 and D104 are deleted.

In some embodiments, the nucleic acid deaminase comprises or consists of a Streptococcus pyogenes WT TadA enzyme (SEQ ID NO: 4) or a deaminase comprising at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% of sequence identity to SEQ ID NO: 4, wherein one, two or three amino acid residue(s) selected from A106, S107 and N108 are deleted.

In some embodiments, the nucleic acid deaminase comprises or consists of a Salmonella typhi WT TadA enzyme (SEQ ID NO: 5) or a deaminase comprising at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% of sequence identity to SEQ ID NO: 5, wherein one, two or three amino acid residue(s) selected from the corresponding residues of ecTadA's A106, R107 and D108 are deleted.

In some embodiments, the nucleic acid deaminase comprises or consists of a Haemophilus influenzae WT TadA enzyme (SEQ ID NO: 6) or a deaminase comprising at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% of sequence identity to SEQ ID NO: 6, wherein one, two or three amino acid residue(s) selected from the corresponding residues of ecTadA's A106, R107 and D108 are deleted.

In some embodiments, the nucleic acid deaminase comprises or consists of a Caulobacter vibrioides (Caulobacter crescentus) WT TadA enzyme (SEQ ID NO: 7) or a deaminase comprising at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% of sequence identity to SEQ ID NO: 7, wherein one, two or three amino acid residue(s) selected from the corresponding residues of ecTadA's A106, R107 and D108 are deleted.

In some embodiments, the nucleic acid deaminase comprises or consists of a Shewanella putrefaciens strain 4H WT TadA enzyme (SEQ ID NO: 8) or a deaminase comprising at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% of sequence identity to SEQ ID NO: 8, wherein one, two or three amino acid residue(s) selected from the corresponding residues of ecTadA's A106, R107 and D108 are deleted.

Provided are nucleic acid deaminase enzymes comprising a TadA deletion variant in the form of a monomer, a homodimer (two identical TadA variants) or a heterodimer (e.g., two different TadAΔ deletion variants, or a heterodimer formed by a TadAΔ deletion variant and a WT TadA). In one embodiment, a TadA deletion variant is provided in the form of a monomer. In other embodiments, the TadA deletion variant forms part of a multimeric form, such as a trimer or a combination of any other number of TadA deletion variants and WT TadA proteins. In other embodiments, the TadA deletion variant deaminase forms part of a multimeric form in combination with a cytidine deaminase.

TadA deaminases have been described in various species, i.e., considered TadA orthologs. In some embodiments, the wild-type TadA is a bacterial TadA deaminase. In some embodiments, the bacterial TadA deaminase is selected from the group consisting of Escherichia coli TadA, ecTadA (e.g., UniProtKB P68398; SEQ ID NO: 1), Salmonella enterica TadA (e.g., BioCyc database, STM2568; SEQ ID NO: 2), Staphylococcus aureus TadA (e.g., PDB: 2B3J_A; SEQ ID NO: 3), Streptococcus pyogenes TadA (e.g., UniProtKB Q5XE14; SEQ ID NO: 4), Salmonella typhi TadA (e.g., UniProtKB Q8XGY4; SEQ ID NO: 5), Haemophilus influenzae TadA (e.g., NCBI Reference Sequence: NZ_JAUPHP010000001.1; SEQ ID NO: 6), Caulobacter vibrioides (Caulobacter crescentus) TadA (e.g., UniProtKB A0A258DDM2; SEQ ID NO: 7), and Shewanella putrefaciens TadA (e.g., NCBI Reference Sequence: NZ_CP104755.1; SEQ ID NO: 8).

In some embodiments, the one or more deleted amino acid residue(s) in the WT TadA protein are located within a region of the wild-type TadA contacting the nucleic acid molecule. In one embodiment, said region interacting with the nucleic acid molecule to be modified comprises amino acids 105 to 130 of E.coli WT TadA, or a corresponding region in a TadA ortholog: amino acids 116-141 in Salmonella enterica TadA, amino acids 101-126 in Staphylococcus aureus and 105-130 in Streptococcus pyogenes. See Table 1.

Table 1: Corresponding amino acids and numbering of the β4-β5 loop segment in different bacterial TadA proteins. Shaded amino acids are identical amino acids between the four bacterial species.

Compositions comprising the nucleic acid deaminase of the present disclosure and a sequence-targeting protein

Provided are compositions and methods that include one or more of (1) a nucleic acid deaminase protein (also referred to as "TadA deletion variant" or "TadAΔ variant"), a nucleic acid encoding the deaminase protein, and/or a modified cell comprising the deaminase protein (and/or a nucleic acid encoding the same) and (2) a sequence-targeting protein.

The sequence-targeting protein may be selected from the group consisting of an amino acid sequence motif (e.g., a pentatricopeptide repeat, PPR), a meganuclease (MN), a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN), and a CRISPR-Cas system. A person skilled in the art will select the most adequate sequence-targeting protein based on, e.g., context, feasibility, efficiency, and safety. In one embodiment, the sequence-targeting protein is a CRISPR-Cas system.

In some embodiments, provided are compositions and methods that include one or more of (1) a "nucleic acid deaminase" protein (also referred to as "TadA deletion variant" or "TadAΔ variant"), a nucleic acid encoding the deaminase protein, and/or a modified cell comprising the protein (and/or a nucleic acid encoding the same); and (2) a CRISPR-Cas protein.

In other embodiments, provided also are compositions and methods that include (1) a "nucleic acid deaminase" protein (also referred to as "TadA deletion variant" or "TadAΔ variant"), a nucleic acid encoding the protein, and/or a modified cell comprising the protein (and/or a nucleic acid encoding the same); and one or more of (2) a CRISPR-Cas system, wherein a Cas protein bound to RNA is responsible for binding to a targeted sequence.

In some embodiments, the sequence-targeting protein is part of a fusion protein wherein the sequence-targeting protein is operably linked to a fusion partner with an activity (an effector protein comprising at least one nucleic acid deaminase according to the present disclosure) and may comprise one or more of a nuclear localization sequence (NLS), a linker, and combinations thereof.

In some embodiments, a CRISPR-Cas protein is part of a fusion protein wherein the CRISPR-Cas protein is operably linked to a fusion partner with an activity (an effector protein comprising at least one nucleic acid deaminase according to the present disclosure) and may comprise one or more of a nuclear localization sequence (NLS), a linker, and combinations thereof.

In some embodiments, this fusion protein including the nucleic acid deaminase of the disclosure is organized in one of the following non-limiting ways (written N terminus to C terminus; the "-" used in the general architecture indicates the presence of an optional linker or a direct peptide bond):

[CRISPR-Cas protein]-[nucleic acid deaminase];

[NLS]-[CRISPR-Cas protein]-[nucleic acid deaminase];

[NLS]-[CRISPR-Cas protein]-[NLS]-[nucleic acid deaminase];

[CRISPR-Cas protein]-[nucleic acid deaminase]-[NLS];

[CRISPR-Cas protein]-[NLS]-[nucleic acid deaminase];

[CRISPR-Cas protein]-[NLS]-[nucleic acid deaminase]-[NLS];

[nucleic acid deaminase]-[CRISPR-Cas protein];

[nucleic acid deaminase]-[NLS]-[CRISPR-Cas protein];

[NLS]-[nucleic acid deaminase]-[CRISPR-Cas protein];

[nucleic acid deaminase]-[NLS]-[CRISPR-Cas protein]-[NLS];

[nucleic acid deaminase]-[CRISPR-Cas protein]-[NLS].

The optional linker may be a peptide linker comprising between 1 and 200 amino acids. In some embodiments, the peptide linker comprises the amino acid sequence selected from the group consisting of a (GGGGS)n (SEQ ID NOs: 97 and 98), a (G)n (SEQ ID NO: 99), an (EAAAK)n (SEQ ID NOs: 100 and 101), a (GGS)n (SEQ ID NO: 102), an SGSETPGTSESATPES (SEQ ID NO: 103) motif (see, e.g., Guilinger JP et al. Nat. Biotechnol. 2014; 32(6): 577-82) and an (XP)n (SEQ ID NO: 104) motif, or a combination of any of these, wherein n is independently an integer between 1 and 30. In some embodiments, n is independently 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30, or, if more than one linker or more than one linker motif is present, any combination thereof. Additional suitable linker motifs and linker configurations will be apparent to those of skill in the art, e.g., to improve folding, stability, expression, and/or bioactivity of the construct comprising the linker(s). In some embodiments, suitable linker motifs and configurations include those described in Chen et al., Fusion protein linkers: property, design and functionality. Adv Drug Deliv Rev. 2013; 65(10): 1357-69, the entire contents of which are incorporated herein by reference. For example, the linker(s) may be derived from naturally-occurring multi-domain proteins, linker(s) may be selected from empirical linkers such as flexible linkers, rigid linkers and cleavable linkers and combinations thereof (e.g., Chen et al. 2013, Table 3). In some embodiments, the peptide linker comprises the amino acid sequence set forth in SEQ ID NOs: 32-41.
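The repeat-based linker motifs can be illustrated with a short Python sketch that expands a motif n times and checks the limits stated above (n between 1 and 30, linker length between 1 and 200 amino acids). The function name `build_linker` and the `REPEAT_MOTIFS` tuple are hypothetical helpers; the (XP)n motif is omitted because X stands for a variable residue, and no SEQ ID NO is reproduced here.

```python
# Repeat units named in the text for (GGGGS)n, (G)n, (EAAAK)n and (GGS)n.
REPEAT_MOTIFS = ("GGGGS", "G", "EAAAK", "GGS")


def build_linker(motif: str, n: int) -> str:
    """Expand a repeat motif n times, enforcing the stated bounds."""
    if motif not in REPEAT_MOTIFS:
        raise ValueError(f"unsupported motif: {motif!r}")
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    linker = motif * n
    if not 1 <= len(linker) <= 200:
        raise ValueError("linker must comprise between 1 and 200 amino acids")
    return linker


print(build_linker("GGGGS", 3))       # flexible glycine-serine linker
print(len(build_linker("EAAAK", 30)))  # longest (EAAAK)n allowed by the bound on n
```

Because the longest motif has five residues, n = 30 gives at most 150 amino acids, so the 200-residue limit only becomes relevant when several motifs are combined into one linker.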

Table 2: Amino acid sequences of linkers.

In some embodiments, instead of fusing the nucleic acid deaminase of the disclosure to the N- or C-terminus of a CRISPR-Cas protein, the nucleic acid deaminase can also be inlaid into the Cas sequence. This configuration is well known in the art and has been described, for example, in Wang et al. Sig Transduct Target Ther 4, 36 (2019) and Nguyen Tran et al. Nat Commun. 2020; 11: 4871.

In some embodiments, a nucleic acid deaminase of the present disclosure is a small molecule inducible nucleic acid deaminase (e.g., an adenosine deaminase). For example, the disclosed nucleic acid deaminase may be split into two parts to provide regulatory control over promiscuous deaminase activity. In some embodiments, said configuration comprises the use of a rapamycin-regulated heterodimerization of FK506 binding protein 12 (FKBP12) and FKBP rapamycin binding domain (FRB), wherein a first portion of the disclosed nucleic acid deaminase is fused to a sequence-targeting protein (e.g., a CRISPR-Cas protein) and to FKBP12 and a second portion of the disclosed nucleic acid deaminase is fused to FRB. Upon dimerization of FKBP12 and FRB after contact with a dimerization agent, the two portions of the split nucleic acid deaminase enzyme reassemble, thereby forming a small molecule inducible complex. An example of a base editor including a split deaminase has been described, for example, in Berrios et al. Nature Chemical Biology volume 17, pages 1262-1270 (2021).

In some embodiments, a base editor of the present disclosure comprises an effector protein comprising at least one nucleic acid deaminase according to the present disclosure (TadA deletion variant) linked to a sequence targeting protein via an RNA-ligand binding complex (aptamer-recruitment-dependent base editing system; or RNA scaffold mediated effector recruitment). Said RNA-ligand binding complex may comprise an RNA moiety capable of binding to the sequence-targeting protein, an RNA moiety capable of binding to a target site in a nucleic acid molecule and a ligand-binding moiety.

In some embodiments, the ligand binding moiety and the ligand can be a pair selected from the group consisting of (1) a telomerase Ku binding motif and Ku protein or an RNA-binding section thereof, (2) a telomerase Sm7 binding motif and Sm7 protein or an RNA-binding section thereof, (3) a MS2 phage operator stem-loop and MS2 coat protein (MCP) or an RNA-binding section thereof, (4) a PP7 phage operator stem-loop and PP7 coat protein (PCP) or an RNA-binding section thereof, (5) a SfMu phage Com stem-loop and Com RNA binding protein or an RNA-binding section thereof, and (6) a non-natural RNA aptamer and corresponding aptamer ligand or an RNA-binding section thereof.

In some embodiments, the base editor of the disclosure comprises one or more nucleic acid deaminase according to the present disclosure (TadA deletion variant) fused to a ligand (such as MCP) and comprising the recruitment of the TadA deletion variant via a ligand binding moiety (such as MS2 aptamer), wherein an RNA-ligand binding complex comprises the ligand binding moiety. In some instances, the ligand is located C-terminally of the TadA deletion variant (e.g., C-terminal fusion of MCP).

In some embodiments, a base editor comprises a TadA deletion variant (TadAΔ), a Cas protein, a Nuclear Localization Signal (NLS), a linker and an RNA-ligand binding complex. Non-limiting examples of said compositions include:

Composition 1:
o TadAΔ, e.g., ecTadAΔ;
o Cas nickase linked to a NLS (e.g., NLS-SpCas9_D10A-NLS) or nuclease-dead Cas protein linked to a NLS (e.g., NLS-SpCas9_D10A+H840A-NLS); and
o RNA-ligand binding complex comprising an RNA moiety capable of binding to the Cas protein and an RNA moiety capable of binding to a target site in a nucleic acid molecule (e.g., guide RNA, gRNA).

Composition 2:
o TadAΔ linked to a ligand (e.g., ecTadAΔ-MCP or ecTadAΔ-Linker-MCP);
o Cas nickase linked to a NLS (e.g., NLS-SpCas9_D10A-NLS) or nuclease-dead Cas protein linked to a NLS (e.g., NLS-SpCas9_D10A+H840A-NLS); and
o RNA-ligand binding complex comprising an RNA moiety capable of binding to the Cas protein, an RNA moiety capable of binding to a target site in a nucleic acid molecule and a ligand-binding moiety (e.g., gRNA-MS2).

Composition 3:
o TadAΔ linked to a further protein capable of site-specific deamination and a ligand (e.g., protein-TadAΔ-MCP or TadAΔ-protein-MCP, wherein "-" denotes a direct peptide bond or a linker; e.g., APOBEC-Linker2-ecTadAΔ-Linker1-MCP or ecTadAΔ-Linker2-APOBEC-Linker1-MCP);
o Cas nickase linked to a NLS (e.g., NLS-SpCas9_D10A-NLS) or nuclease-dead Cas protein linked to a NLS (e.g., NLS-SpCas9_D10A+H840A-NLS); and
o RNA-ligand binding complex comprising an RNA moiety capable of binding to the Cas protein, an RNA moiety capable of binding to a target site in a nucleic acid molecule and a ligand-binding moiety (e.g., gRNA-MS2).

In some embodiments, a base editor comprising a TadA deletion variant may further comprise a protein capable of site-specific deamination selected from a cytosine deaminase (e.g., a naturally occurring cytidine deaminase such as APOBEC, AID or CDA; or variants thereof) or an adenine deaminase (e.g., a TadA variant such as a TadA substitution variant or a TadA deletion variant).

In some embodiments, the base editor comprising a TadA deletion variant and a further protein capable of site-specific deamination may comprise one or more units of uracil glycosylase inhibitor (UGI) peptide(s).

CRISPR-Cas systems

Programmable, sequence-specific targeting systems (e.g., CRISPR-Cas systems) comprise a guide RNA and a sequence-targeting protein (e.g., an RNA-guided sequence-targeting protein).

CRISPR-Cas systems include a CRISPR-Cas protein that interacts with (binds to) a corresponding guide RNA (gRNA) to form a ribonucleoprotein (RNP) complex that is targeted to a particular sequence in a target nucleic acid via base pairing between the gRNA and the particular sequence. A gRNA includes a nucleotide sequence (guide sequence) that is complementary to a sequence (target site) in a nucleic acid. Thus, a CRISPR-Cas protein forms a complex with a gRNA, wherein the gRNA provides target specificity to the complex via the guide sequence, and the CRISPR-Cas protein provides the site-specific activity (e.g., cleave, nick or bind). A target site within a target nucleic acid includes a nucleotide sequence (protospacer) within a chromosomal or extrachromosomal nucleic acid (e.g., an episomal nucleic acid, a minicircle nucleic acid, a mitochondrial nucleic acid, a chloroplast nucleic acid, and the like). The nucleotide sequence (protospacer) is adjacent to a "protospacer adjacent motif" (PAM) sequence which is recognized by the CRISPR-Cas protein. A protospacer and its adjacent PAM sequence may collectively be referred to as a target site. For example, the Class 2 CRISPR system of S. pyogenes uses targeted sites having N12-20NGG, where NGG represents the PAM site from S. pyogenes, and N12-20 represents the 12-20 nucleotides directly 5' to the PAM site. Additional PAM site sequences from other species of bacteria include NGGNG, NNNNGATT, NNAGAA, NNAGAAW, and NAAAAC. See, e.g., US 20140273233, WO 2013176772, Cong et al., Science 339 (6121): 819-823 (2012), Jinek et al., Science 337 (6096): p. 816-821 (2012), Mali et al., Science 339 (6121): p. 823-826 (2013), Gasiunas et al., Proc Natl Acad. Sci. U S A, 109 (39): p. E2579-E2586 (2012), Cho et al., Nature Biotechnology 31: p. 230-232 (2013), Hou et al., Proc. Natl Acad. Sci. U S A. 110(39): p.15644-15649 (2013), Mojica et al., Microbiology 155 (Pt 3): p. 733-740 (2009), and www.addgene.org/CRISPR/.
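The N12-20NGG pattern described above can be located in a DNA sequence with a regular expression, as in this Python sketch. The function name `find_ngg_sites`, the returned tuple layout and the toy sequence are assumptions for illustration; only the strand as given is searched, so a practical search would also scan the reverse complement.

```python
import re


def find_ngg_sites(dna: str, protospacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (0-based start, protospacer, PAM) for every protospacer followed by NGG."""
    if not 12 <= protospacer_len <= 20:
        raise ValueError("protospacer length must be between 12 and 20")
    seq = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy (hypothetical) sequence: a 12-nt protospacer followed by the PAM TGG.
toy_dna = "ACGTACGTACGTTGG"
print(find_ngg_sites(toy_dna, protospacer_len=12))
```

The PAM is returned separately from the protospacer because it is recognized by the Cas protein in the target DNA, whereas the guide sequence is complementary to the protospacer only.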
Examples of Cas proteins and their preferred PAM sequences are listed in Table 3.

| Biological Origin | Cas | PAM |
|---|---|---|
| Francisella novicida | FnCas12a (Cpf1) | 5' (T)TTV-(PS) |
| Lachnospiraceae bacterium | LbCas12a (Cpf1) | 5' TTTV-(PS) |
| Acidaminococcus sp. | AsCas12a (Cpf1) | 5' TTTN-(PS) |
| Alicyclobacillus acidoterrestris | AacCas12b (C2c1) | 5' TTN-(PS) |
| Bacillus hisashii | BhCas12b v4 | 5' ATTN-(PS) |
| Oleiphilus sp. | OspCas12c (C2c3) | 5' TG-(PS) |
| | Cas12c1 (C2c3) | 5' TG-(PS) |
| | Cas12c2 (C2c3) | 5' TN-(PS) |
| | Cas12d (CasY) | 5' TA-(PS) |
| | Cas12e (CasX) | 5' TTCN-(PS) |
| | Cas12f1 (Cas14a1) | 5' TTTR-(PS) |
| - | Cas12g1 | No PAM requirement |
| | Cas12h1 | 5' RTR-(PS) |
| | Cas12i1 | 5' TTN-(PS) |

PS = protospacer; N = any base; R = A/G; V = A/C/G.

Table 3: Examples of Cas proteins and their preferred PAM sequences.
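The 5' PAM-(PS) notation of Table 3 can be illustrated with a short sketch that expands the degenerate codes from the table legend (N, R, V) and scans one strand for a PAM immediately followed by a protospacer. The function name, the spacer length argument and the example sequences are assumptions made for illustration only; they are not part of the disclosure.

```python
import re

# Degenerate codes from the Table 3 legend, plus the four plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "V": "[ACG]"}

def find_5prime_pam_sites(dna: str, pam: str, spacer_len: int) -> list[tuple[int, str]]:
    """Return (0-based protospacer start, protospacer) for each 5' PAM-(PS) match.

    Only the strand given in `dna` is scanned.
    """
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(f"(?=({pam_regex})([ACGT]{{{spacer_len}}}))")
    return [(m.start() + len(pam), m.group(2)) for m in pattern.finditer(dna.upper())]

# Hypothetical toy sequence: TTTA matches TTTV, followed by a 5-nt protospacer.
sites = find_5prime_pam_sites("ACTTTAGGCATCC", "TTTV", 5)
```

In this sketch a real Cas12a guide would use a longer spacer length; the short value only keeps the toy example readable.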

A CRISPR-Cas protein (and/or a nucleic acid encoding the CRISPR-Cas protein) can be a naturally existing CRISPR-Cas protein (e.g., a nuclease) or variants thereof (e.g., a nickase, or a dead nuclease), or an engineered CRISPR-Cas protein derived from a naturally occurring (wild-type) protein. In some embodiments, the sequence-targeting protein (e.g., a CRISPR-Cas protein) is a nuclease comprising at least one catalytically inactive nuclease domain, such as a CRISPR-Cas protein with at least one catalytically inactive nuclease domain (e.g., a nickase) or a nuclease with no catalytically active nuclease domain (e.g., a nuclease-dead CRISPR-Cas protein).

Non-limiting examples of Cas proteins include: Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SauriCas9, Cas10, Cas10d, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12h, Cas12i, Cas12j, Cas12l (CasBeta), CasMINI, Mad7, CasX, CasY, Cas13a, Cas14, C2c1, C2c2, C2c3, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof. Unless otherwise stated or implicit from context, the recitation of a Cas protein includes all active and deactivated versions, as well as homologs and derivatives thereof. In some embodiments, the Cas protein is a Type II Cas protein, such as Cas9. In other embodiments, the Cas protein is a Type V Cas protein selected from the group consisting of Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12f1, Cas12h, Cas12i, Cas12j, Cas12l, CasMINI and ErCas12a (MAD7®). Modified versions of Cas proteins that may be used in the present disclosure include but are not limited to catalytically inactive versions of the Cas protein, such as dCas9 and dCas12, or Cas versions that have modified attenuated catalytic activity to provide a nicking function, such as nickase nCas9. A nicking enzyme or nickase is an enzyme that cuts one strand of a double-stranded DNA at a specific recognition nucleotide sequence. These enzymes cut only one strand of the DNA duplex, to produce DNA molecules that are "nicked," rather than cleaved. Further information about Cas proteins can be found, e.g., in Makarova et al. Nat Rev Microbiol. 2020 Feb;18(2):67-83.

Non-limiting examples of amino acid sequences of Cas proteins that may be of use in connection with the present disclosure are:

Streptococcus pyogenes (Sp) Cas9:

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLL AQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEI FFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELH AILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQ SFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKV TVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREM IEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHD DSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSD YDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSN IMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIL PKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDF LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPED NEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPA AFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD (SEQ ID NO: 59)

SpCas9_D10A nickase:

MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLL AQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEI FFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELH AILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQ SFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKV TVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREM IEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHD DSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSD YDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSN IMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIL PKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDF LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPED NEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPA AFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD (SEQ ID NO: 60)
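The relationship between the wild-type and nickase sequences can be checked with a small sketch that applies a substitution written in the usual "D10A" form (original residue, 1-based position, replacement). The helper name is hypothetical, and only the first 20 residues of SEQ ID NO: 59 are used as a stand-in for the full protein.

```python
def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a substitution such as 'D10A' (1-based position) to a protein sequence."""
    wild_type, position, replacement = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1
    # Refuse to edit if the stated original residue is not actually at that position.
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]

# First 20 residues of SEQ ID NO: 59 (wild-type SpCas9), used as a short stand-in.
SPCAS9_N_TERM = "MDKKYSIGLDIGTNSVGWAV"
nickase_n_term = apply_substitution(SPCAS9_N_TERM, "D10A")
```

The result matches the first 20 residues of SEQ ID NO: 60. The same helper could be applied to the full-length sequence for the H840A substitution of the nuclease-dead variant.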

Nuclease-dead SpCas9_D10A+H840A:

MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLL AQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEI FFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELH AILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQ SFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKV TVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREM IEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHD DSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSD YDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSN IMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIL PKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDF LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPED NEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPA AFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD (SEQ ID NO: 61)

Cas12f:

MAKNTITKTLKLRIVRPYNSAEVEKIVADEKNNREKIALEKNKDKVKEACSKHLKVAAYCTTQVERNAC

LFCKARKLDDKFYQKLRGQFPDAVFWQEISEIFRQLQKQAAEIYNQSLIELYYEIFIKGKGIANASSVEHY

LSDVCYTRAAELFKNAAIASGLRSKIKSNFRLKELKNMKSGLPTTKSDNFPIPLVKQKGGQYTGFEISNH

NSDFIIKIPFGRWQVKKEIDKYRPWEKFDFEQVQKSPKPISLLLSTQRRKRNKGWSKDEGTEAEIKKVM

NGDYQTSYIEVKRGSKIGEKSAWMLNLSIDVPKIDKGVDPSIIGGIDVGVKSPLVCAINNAFSRYSISDN

DLFHFNKKMFARRRILLKKNRHKRAGHGAKNKLKPITILTEKSERFRKKLIERWACEIADFFIKNKVGTV

QMENLESMKRKEDSYFNIRLRGFWPYAEMQNKIEFKLKQYGIEIRKVAPNNTSKTCSKCGHLNNYFN

FEYRKKNKFPHFKCEKCNFKENADYNAALNISNPKLKSTKEEP (SEQ ID NO: 62)

Nuclease-dead Cas12f:

MAKNTITKTLKLRIVRPYNSAEVEKIVADEKNNREKIALEKNKDKVKEACSKHLKVAAYCTTQVERNAC

LFCKARKLDDKFYQKLRGQFPDAVFWQEISEIFRQLQKQAAEIYNQSLIELYYEIFIKGKGIANASSVEHY

LSDVCYTRAAELFKNAAIASGLRSKIKSNFRLKELKNMKSGLPTTKSDNFPIPLVKQKGGQYTGFEISNH

NSDFIIKIPFGRWQVKKEIDKYRPWEKFDFEQVQKSPKPISLLLSTQRRKRNKGWSKDEGTEAEIKKVM

NGDYQTSYIEVKRGSKIGEKSAWMLNLSIDVPKIDKGVDPSIIGGIAVGVKSPLVCAINNAFSRYSISDN

DLFHFNKKMFARRRILLKKNRHKRAGHGAKNKLKPITILTEKSERFRKKLIERWACEIADFFIKNKVGTV

QMENLESMKRKEDSYFNIRLRGFWPYAEMQNKIEFKLKQYGIEIRKVAPNNTSKTCSKCGHLNNYFN

FEYRKKNKFPHFKCEKCNFKENAAYNAALNISNPKLKSTKEEP (SEQ ID NO: 63)

CasMINI:

MAKNTITKTLKLRIVRPYNSAEVEKIVADEKNNREKIALEKNKDKVKEACSKHLKVAAYCTTQVERNAC

LFCKARKLDDKFYQKLRGQFPDAVFWQEISEIFRQLQKQAAEIYNQSLIELYYEIFIKGKGIANASSVEHY

LSRVCYRRAAELFKNAAIASGLRSKIKSNFRLKELKNMKSGLPTTKSDNFPIPLVKQKGGQYTGFEISNH

NSDFIIKIPFGRWQVKKEIDKYRPWEKFDFEQVQKSPKPISLLLSTQRRKRNKGWSKDEGTEAEIKKVM

NGDYQTSYIEVKRGSKICEKSAWMLNLSIDVPKIDKGVDPSIIGGIDVGVRSPLVCAINNAFSRYSISDN

DLFHFNKKMFARRRILLKKNRHKRAGHGAKNKLKPITILTEKSERFRKKLIERWACEIADFFIKNKVGTV

QMENLESMKRKEDSYFNIRLRGFWPYAEMQNKIEFKLKQYGIEIRKVAPNNTSKTCSKCGHLNNYFN

FEYRKKNKFPHFKCEKCNFKENADYNAALNISNPKLKSTKERP (SEQ ID NO: 64)

Nuclease-dead CasMINI:

MAKNTITKTLKLRIVRPYNSAEVEKIVADEKNNREKIALEKNKDKVKEACSKHLKVAAYCTTQVERNAC

LFCKARKLDDKFYQKLRGQFPDAVFWQEISEIFRQLQKQAAEIYNQSLIELYYEIFIKGKGIANASSVEHY

LSRVCYRRAAELFKNAAIASGLRSKIKSNFRLKELKNMKSGLPTTKSDNFPIPLVKQKGGQYTGFEISNH

NSDFIIKIPFGRWQVKKEIDKYRPWEKFDFEQVQKSPKPISLLLSTQRRKRNKGWSKDEGTEAEIKKVM

NGDYQTSYIEVKRGSKICEKSAWMLNLSIDVPKIDKGVDPSIIGGIAVGVRSPLVCAINNAFSRYSISDN

DLFHFNKKMFARRRILLKKNRHKRAGHGAKNKLKPITILTEKSERFRKKLIERWACEIADFFIKNKVGTV

QMENLESMKRKEDSYFNIRLRGFWPYAEMQNKIEFKLKQYGIEIRKVAPNNTSKTCSKCGHLNNYFN

FEYRKKNKFPHFKCEKCNFKENAAYNAALNISNPKLKSTKERP (SEQ ID NO: 65)

A corresponding guide RNA (gRNA) may comprise any one or more of the following: an RNA moiety capable of binding to a target site in a nucleic acid molecule, such as a crRNA, and an RNA moiety capable of binding to the sequence-targeting protein, such as a tracrRNA and/or a scoutRNA.

For embodiments wherein a Type II Cas protein is utilised, the guide RNA may comprise a crRNA and a tracrRNA. For embodiments wherein a Type V Cas protein is utilised, the guide RNA may comprise crRNA, crRNA and tracrRNA, or crRNA and scoutRNA.

Type II CRISPR-Cas System

Type II CRISPR-Cas systems utilise a guide RNA comprising a crRNA and a tracrRNA (Type II crRNA:tracrRNA guide). The crRNA comprises a sequence that is complementary to a target site in a nucleic acid. The guide RNA may further comprise a tracrRNA that at minimum can hybridise with the crRNA over a range of at least three nucleotides, and when hybridised over that region can retain association with a Type II Cas protein. The guide RNA can be either a single RNA molecule or a complex of multiple RNA molecules.

Thus, in embodiments wherein the sequence targeting component comprises a Type II Cas protein, it is preferable that the RNA-ligand binding complex comprises a Type II crRNA:tracrRNA guide RNA.

Among the sub-components of the RNA-ligand binding complex disclosed herein, the crRNA provides the targeting specificity. The crRNA comprises a programmable spacer that is complementary and capable of hybridisation to a pre-selected target site of interest. In various embodiments, this spacer can comprise from about 10 nucleotides to more than about 25 nucleotides. For example, the region of base pairing between the spacer and the corresponding target site sequence can be about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, or more than 25 nucleotides in length. In an exemplary embodiment, the spacer is about 17-20 nucleotides in length, such as 20 nucleotides. The spacer is variable and may be selected based on the target sequence of interest at which the Cas protein and/or effector is to cause base editing. The spacer does not hybridise with tracrRNA, and it may be downstream of the Cas association region.

One requirement for selecting a suitable target nucleic acid for Type II CRISPR-Cas systems is that it has a PAM site/sequence. Each target sequence and its corresponding PAM site/sequence are referred to herein as a Cas-targeted site. The Type II CRISPR-Cas system, one of the most well characterised systems, needs only a Cas9 protein and a guide RNA complementary to a target sequence to effect target cleavage. The Type II CRISPR-Cas system of S. pyogenes uses target sites having N12-20NGG, where NGG represents the PAM site from S. pyogenes, and N12-20 represents the 12-20 nucleotides directly 5' to the PAM site. Additional PAM site sequences from other species of bacteria include NGGNG, NNNNGATT, NNAGAA, NNAGAAW, and NAAAAC. See, e.g., US 20140273233, WO 2013176772, Cong et al., (2012), Science 339 (6121): 819-823, Jinek et al., (2012), Science 337 (6096): 816-821, Mali et al., (2013), Science 339 (6121): 823-826, Gasiunas et al., (2012), Proc Natl Acad Sci U S A. 109 (39): E2579-E2586, Cho et al., (2013) Nature Biotechnology 31, 230-232, Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9, Mojica et al., Microbiology. 2009 Mar;155(Pt 3):733-40, and www.addgene.org/CRISPR/. The contents of these documents are incorporated herein by reference in their entireties. A PAM site/sequence is also a requirement for selecting a suitable target nucleic acid for a Type V CRISPR system.

The target site in a nucleic acid can be in either of the two strands of a genomic DNA in a cell. Examples of such genomic DNA include, but are not necessarily limited to, a cell chromosome, organelle DNA, such as mitochondrial DNA, and a stably maintained plasmid. However, it is to be understood that the present method can be practiced on other DNA present in a cell, such as non-stable plasmid DNA, viral DNA, and phagemid DNA, as long as there is a Cas-targeted site, regardless of the nature of the cell dsDNA.
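As a rough illustration of the N12-20NGG target-site pattern and of the point that a target site may lie on either strand, the following sketch scans both a sequence and its reverse complement for a protospacer followed directly by an NGG PAM. The function names and the toy sequences are assumptions introduced for illustration; reported coordinates are relative to the strand on which the site was found.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.translate(COMPLEMENT)[::-1]

def find_ngg_sites(dna: str, spacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """Return (strand, 0-based start on that strand, protospacer, PAM) for each site."""
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(f"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    sites = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna.upper()))):
        for m in pattern.finditer(seq):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites

# Hypothetical toy inputs with a 4-nt spacer to keep the example short.
plus_sites = find_ngg_sites("AAAATGGCC", spacer_len=4)
minus_sites = find_ngg_sites("CCATTTT", spacer_len=4)
```

Here the first input carries a site on the given strand, while the second carries one only on the complementary strand.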

In addition to the spacer, the crRNA also comprises a repeat region. The repeat region hybridises with the anti-repeat region of the tracrRNA, described below, to form a repeat:antirepeat duplex. In various embodiments, the repeat region can comprise from about 10 nucleotides to more than about 25 nucleotides. For example, the repeat region can be about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, or more than 25 nucleotides in length. In an exemplary embodiment, the repeat region is about 21-25 nucleotides in length, such as 23 nucleotides.

The repeat:antirepeat duplex acts as a Cas association region, and is designed based on the Cas RNA binding domain of the Cas protein with which it is intended to associate. Not all of the nucleotides within the Cas association region need directly associate with the Cas protein. Not all of the nucleotides within the repeat:antirepeat duplex hybridise. The repeat:antirepeat duplex comprises a lower stem, a bulge and an upper stem. The bulge is essential for interaction with the Cas protein. In a non-limiting example, the repeat:antirepeat duplex of a guide RNA capable of interacting with Cas9 from S. pyogenes may comprise a lower stem of about 6 nucleotides in length, a bulge of about 6 nucleotides in length and an upper stem of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 nucleotides in length. Optionally, the upper stem may be absent from the repeat:antirepeat duplex.

In addition to a crRNA, the guide RNA of Type II CRISPR-Cas systems further comprises a trans-activating CRISPR RNA (tracrRNA). The tracrRNA sequence may comprise from about 40 nucleotides to more than about 100 nucleotides. For example, the tracrRNA can be about 40, 50, 60, 70, 80, 90 or more than 100 nucleotides in length. In an exemplary embodiment, the tracrRNA is about 88-90 nucleotides in length, such as 89 nucleotides. The tracrRNA sequence comprises an anti-repeat region, a nexus, stem loop 2 and stem loop 3. The tracrRNA further comprises a distal region that does not hybridise with the crRNA, and it may be upstream of the anti-repeat region. The anti-repeat region is at least 80%, at least 85%, at least 90%, at least 95%, or 100% complementary to the crRNA repeat region over at least 7 consecutive nucleotides. Thus, the repeat region of the above-described programmable crRNA and the anti-repeat region of the tracrRNA are capable of hybridising to form a hybridisation region, herein referred to as a repeat:anti-repeat duplex.

In some embodiments, the tracrRNA is from Streptococcus pyogenes.

In some embodiments, the tracrRNA activity and crRNA activity are part of a single continuous strand of nucleotides, known as single guide RNA (or sgRNA). In some embodiments, the crRNA may be immediately upstream of the tracrRNA or it may be upstream of the tracrRNA with an intervening sequence or moiety between the tracrRNA and crRNA. If the tracrRNA and crRNA are part of a contiguous strand of nucleotides (sgRNA), there may be a loop region between the tracrRNA and the crRNA of, for example, 3 to 6 nucleotides, herein referred to as a tetraloop. See e.g., WO2014099750, US 20140179006, and US 20140273226. The contents of these documents are incorporated herein by reference in their entireties. Methods for generating crRNA:tracrRNA guide RNAs and sgRNAs are well known in the art.
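The single-strand arrangement described above (crRNA upstream, a short loop, then the tracrRNA) can be sketched as a simple concatenation with a check on the 3 to 6 nucleotide loop length. The function name and the default GAAA loop are assumptions chosen for illustration and are not sequences taken from the present disclosure.

```python
def build_sgrna(crrna: str, tracrrna: str, loop: str = "GAAA") -> str:
    """Join crRNA and tracrRNA (5' to 3') into one strand with a short loop between them.

    The default loop is an illustrative assumption, not a sequence from the disclosure.
    """
    if not 3 <= len(loop) <= 6:
        raise ValueError("loop is expected to be 3 to 6 nucleotides long")
    return crrna + loop + tracrrna

# Hypothetical placeholder sequences, far shorter than real crRNA or tracrRNA.
sgrna = build_sgrna("AAAA", "CCCC")
```

For an arrangement in which the crRNA is downstream of the other RNA, the two arguments would simply be supplied in the opposite order.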

In some embodiments, the tracrRNA and the crRNA comprising the guide RNA are in two separate RNA molecules, which together form the functional guide RNA and are part of the RNA-ligand binding complex. In this case, the molecule with the tracrRNA activity should be able to interact with (usually by base pairing) the molecule with the crRNA activity to form a two-part guide crRNA:tracrRNA. Various tracrRNA and crRNA sequences are known in the art, and non-limiting examples of crRNAs and tracrRNAs that may be used in connection with the present disclosure are provided below (e.g., SEQ ID NOs: 75-84). As used herein, an active portion of a tracrRNA retains the ability to form a complex with a Cas protein, such as Cas9 or dCas9 or nCas9. See, e.g., WO 2014/144592.

Type II crRNA

5' NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTATGCTGTTTTG 3' (SEQ ID NO: 74) (N's denote target-specific spacer; N may be any one of "a", "c", "g", or "t/u").

Type II tracrRNA

5' AACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCGCGCACATGAGGATCACCCATGTGCTTTTTTT 3' (SEQ ID NO: 95)
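The complementarity between the crRNA repeat and the tracrRNA anti-repeat can be examined with a short sketch that finds the longest stretch of perfect, antiparallel Watson-Crick pairing between the repeat portion of SEQ ID NO: 74 and the 5' end of SEQ ID NO: 95. The function names are hypothetical, G:U wobble pairs are deliberately not counted, and taking the first 25 nucleotides of the tracrRNA as the region of interest is an assumption made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence written with A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]

def longest_complementary_run(repeat: str, anti_repeat: str) -> str:
    """Longest stretch of anti_repeat that pairs perfectly (antiparallel) with repeat."""
    target = reverse_complement(repeat)
    best_end, best_len = 0, 0
    prev = [0] * (len(anti_repeat) + 1)
    # Longest-common-substring dynamic programme over target and anti_repeat.
    for i in range(1, len(target) + 1):
        curr = [0] * (len(anti_repeat) + 1)
        for j in range(1, len(anti_repeat) + 1):
            if target[i - 1] == anti_repeat[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr
    return anti_repeat[best_end - best_len:best_end]

CRRNA_REPEAT = "GTTTTAGAGCTATGCTGTTTTG"      # repeat portion of SEQ ID NO: 74
TRACR_5PRIME = "AACAGCATAGCAAGTTAAAATAAGG"   # first 25 nt of SEQ ID NO: 95
run = longest_complementary_run(CRRNA_REPEAT, TRACR_5PRIME)
```

Under these assumptions the longest perfectly paired stretch is 11 nucleotides, which exceeds the 7 consecutive nucleotides mentioned above.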

Thus, in embodiments wherein the sequence targeting component of the present disclosure comprises a Type II Cas protein, it is preferable that the RNA-ligand binding complex comprises a Type II crRNA:tracrRNA guide RNA.

Type V CRISPR-Cas Systems

Type V CRISPR-Cas systems may utilise a guide RNA comprising a crRNA (Type V crRNA-only guide), a crRNA and tracrRNA (Type V crRNA:tracrRNA guide), or a crRNA and scoutRNA (Type V crRNA:scoutRNA guide). The guide RNA can be either a single RNA molecule or a complex of multiple RNA molecules. Thus, in embodiments where the sequence targeting component of the present disclosure comprises a Type V Cas protein, it is preferable that the RNA-ligand binding complex comprises a crRNA guide RNA, a crRNA:tracrRNA guide RNA or a crRNA:scoutRNA guide RNA.

The target site in a nucleic acid can be in either of the two strands of a genomic DNA in a cell. Examples of such genomic DNA include, but are not necessarily limited to, a cell chromosome, organelle DNA, such as mitochondrial DNA, and a stably maintained plasmid. However, it is to be understood that the present method can be practiced on other DNA present in a cell, such as non-stable plasmid DNA, viral DNA, and phagemid DNA, as long as there is a Cas-targeted site, regardless of the nature of the cell DNA.

The crRNA provides targeting specificity and comprises a spacer that has a nucleotide sequence that is complementary and capable of hybridisation to a pre-selected target site of interest. In various embodiments, the spacer can comprise from about 10 nucleotides to more than about 25 nucleotides. For example, the region of base pairing between the spacer and the corresponding target site sequence can be about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, or more than 25 nucleotides in length. In an exemplary embodiment, the targeting sequence is about 18-25 nucleotides in length, such as 24 nucleotides. The spacer is variable and may be selected based on where one wishes for the Cas protein and/or effector to cause base editing. The spacer does not hybridise with the tracrRNA or scoutRNA, when present.

One requirement for selecting a suitable target nucleic acid for some Type V CRISPR-Cas systems (as detailed in Table 3) is that it has a PAM site/sequence. Some Type V CRISPR-Cas systems, for example Cas12g1, do not require a PAM site/sequence for target recognition. Each target sequence and (if required) its corresponding PAM site/sequence are referred to herein as a Cas-targeted site.

Type V CRISPR-Cas System: crRNA-only Guide

Type V CRISPR-Cas systems wherein the sequence-targeting component is a Cas protein such as Cas12a from Acidaminococcus sp. BV3L6 (AsCas12a) require a guide RNA comprising only a crRNA molecule (Type V crRNA-only guide).

In addition to a spacer, the crRNA in Type V crRNA-only guide comprises a direct repeat that forms a pseudoknot-type hairpin secondary structure. The secondary structure forms critical contacts with the Cas enzyme.

Thus, in embodiments wherein the sequence targeting component comprises a Type V Cas protein such as AsCas12a, MAD7® or Cas12j, it is preferable that the RNA-ligand binding complex comprises a Type V crRNA-only guide RNA.

Type V CRISPR-Cas System: crRNA:tracrRNA Guide

Type V CRISPR-Cas systems wherein the sequence-targeting component is a Cas protein such as Cas12b from Alicyclobacillus acidiphilus (AaCas12b) or uncultured archaeon 1 (Un1) Cas12f1, also referred to as Cas12f1, and its evolved derivatives (CasMINI), require a guide RNA comprising a crRNA and a tracrRNA (Type V crRNA:tracrRNA guide). In addition to the spacer, the crRNA also comprises a repeat region. The repeat region hybridises with the anti-repeat region of the tracrRNA, described below, to form a repeat:antirepeat duplex. In various embodiments, the repeat region can comprise from about 20 nucleotides to more than about 35 nucleotides. In an exemplary embodiment, the repeat region is about 28-33 nucleotides in length, such as 31 nucleotides.

Besides the above-described crRNA, the Type V crRNA:tracrRNA guide comprises a trans-activating CRISPR RNA (tracrRNA). The tracrRNA sequence may comprise from about 40 nucleotides to more than about 100 nucleotides. For example, the tracrRNA can be about 40, 50, 60, 70, 80, 90 or more than 100 nucleotides in length. In an exemplary embodiment, the tracrRNA is about 98-100 nucleotides in length, such as 100 nucleotides. The tracrRNA sequence comprises an anti-repeat region and stem loops. Various tracrRNA sequences are known in the art. As used herein, an active portion of a tracrRNA retains the ability to form a complex with a Type V Cas protein, such as Cas12b or dCas12b or nCas12b or Cas12f1 or CasMINI.

In some embodiments, the tracrRNA is from uncultured archaeon 1 (Un1).

In some embodiments, the tracrRNA activity and crRNA activity are part of a single continuous strand of nucleotides, known as single guide RNA (or sgRNA). In some embodiments, the crRNA may be immediately downstream of the tracrRNA or it may be downstream of the tracrRNA with an intervening sequence or moiety between the tracrRNA and crRNA. If the tracrRNA and crRNA are part of a contiguous strand of nucleotides (sgRNA), there may be a loop region between the tracrRNA and the crRNA of for example 3 to 6 nucleotides, herein referred to as a tetraloop. Methods for generating Type V crRNA:tracrRNA guide RNAs and sgRNAs are known in the art.

In some embodiments, the tracrRNA activity and the crRNA comprising the guide RNA are two separate RNA molecules, which together form the functional guide RNA and part of the RNA-ligand binding complex. In this case, the molecule with the tracrRNA activity should be able to interact with (usually by base pairing) the molecule with the crRNA activity, to form a two-part Type V crRNA:tracrRNA guide.

Thus, in embodiments wherein the sequence targeting component comprises a Type V Cas protein such as Cas12b from Alicyclobacillus acidiphilus (AaCas12b) or Cas12f1 or evolved derivatives of Cas12f1 (such as CasMINI), it is preferable that the RNA-ligand binding complex comprises a Type V crRNA:tracrRNA guide. In some embodiments, the sequence targeting component comprises a Type V Cas protein, such as Cas12f1. In some embodiments, the sequence targeting component comprises a Type V Cas protein, such as CasMINI.

Type V CRISPR-Cas System: crRNA:scoutRNA Guide

Type V CRISPR-Cas systems wherein the sequence-targeting component is a Cas protein such as Cas12d15 require a guide RNA comprising a crRNA and a scoutRNA (crRNA:scoutRNA guide RNA).

In addition to the spacer, the crRNA also comprises a 5' direct-repeat region. This region comprises a conserved 5-nt sequence that hybridises to a complementary sequence of the scoutRNA, described below. In various embodiments, the repeat region can comprise from about 20 nucleotides to more than about 35 nucleotides. In an exemplary embodiment, the repeat region is about 28-33 nucleotides in length, such as 31 nucleotides.

Besides the above-described crRNA, the Type V crRNA:scoutRNA guide of this disclosure comprises a short-complementary untranslated RNA (scoutRNA). ScoutRNA, together with crRNA, is required for the activity of certain Type V CRISPR systems, for example Cas12d-catalysed activity. The scoutRNA differs in secondary structure from previously described tracrRNAs used by CRISPR-Cas9 and some Cas12 enzymes, and in Cas12d-containing systems, scoutRNA includes a conserved five-nucleotide sequence that is essential for hybridisation to the crRNA and subsequent enzymatic activity. In addition to supporting crRNA-directed DNA recognition, biochemical and cell-based experiments establish scoutRNA as an essential cofactor for Cas12c-catalysed pre-crRNA maturation.

The scoutRNA may be 40 to 100 nucleotides long. The scoutRNA sequence comprises a crRNA complementary region, an upstream region that is upstream of the crRNA complementary region, and a downstream region that is downstream of the crRNA complementary region. Preferably, the crRNA complementary region is 5 nucleotides long. The crRNA complementary region may be located at or near the 5' end of the scoutRNA, at or near the 3' end of the scoutRNA, or between consecutive nucleotides within the scoutRNA that are neither at nor near the 5' or 3' end of the scoutRNA.

In some native scoutRNAs, self-complementary regions allow for one or more self-hybridisation regions, loops, and bulges, as well as optionally 5' ssRNA overhangs and/or no overhangs. In some embodiments, any bulge or bulges of naturally occurring scoutRNAs are preserved even when the ligand binding moiety is attached.

In some embodiments, the anti-repeat region is at least 80%, at least 85%, at least 90%, at least 95%, or 100% complementary to the Cas association region over at least 5 consecutive nucleotides. Thus, the Cas association region of the above-described programmable crRNA and the anti-repeat region of the scoutRNA are capable of hybridising to form a hybridisation region. If the scoutRNA self-hybridises to form one or more hairpin regions, in some embodiments, its anti-repeat region may form a bulge.

When the anti-repeat region of the scoutRNA and the Cas association region of the crRNA hybridise to form the hybridisation region, the RNA that contains both the scoutRNA and the crRNA is capable of retaining association with a Cas RNA binding domain of a Type V Cas protein.

In some embodiments, the scoutRNA activity and crRNA activity are part of a single continuous strand of nucleotides. In some embodiments, the crRNA may be immediately downstream of the scoutRNA or it may be downstream of the scoutRNA with an intervening sequence or moiety between the scoutRNA and crRNA. The intervening sequence or moiety may be the ligand binding moiety, a nucleotide or non-nucleotide loop region, or ethylene glycol spacers such as 18S, 9S or C3. In a 5' to 3' direction, the crRNA:scoutRNA guide RNA may comprise, consist essentially of or consist of a first region of the scoutRNA, the anti-repeat region of the scoutRNA, a second region of the scoutRNA, the loop between the scoutRNA, a Cas association region and the targeting region.

In some embodiments, the scoutRNA activity and the crRNA comprising the guide RNA are two separate RNA molecules, which together form the functional guide RNA and part of the RNA-ligand binding complex. In this case, the molecule with the scoutRNA activity should be able to interact with (usually by base pairing) the molecule (crRNA) having the targeting sequence to form a two-part guide crRNA:scoutRNA.

Non-limiting examples of scoutRNAs and crRNAs that may be used in a Type V CRISPR-Cas system in connection with the present disclosure appear below.

Cas12d15 scoutRNA

5' CTTAGTTAAGGATGTTCCAGGTTCTTTCGGGAGCCTTGGCCTTCTCCCTTAACCTATGCCACTAATGATT 3' (SEQ ID NO: 96)

Thus, in embodiments wherein the sequence targeting component comprises a Type V Cas protein such as Cas12d15 (Fu, B.X.H., Smith, J.D., Fuchs, R.T. et al. Target-dependent nickase activities of the CRISPR-Cas nucleases Cpf1 and Cas9. Nat Microbiol 4, 888-897 (2019). https://doi.org/10.1038/s41564-019-0382-0), it is preferable that the RNA-ligand binding complex comprises a Type V crRNA:scoutRNA guide.
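The preferred pairings between Cas proteins and guide architectures stated in the preceding sections can be summarised in a small lookup sketch. The enum, the dictionary and the function name are hypothetical constructs for illustration; only the pairings themselves are taken from the text, with SpCas9 standing in for a Type II Cas protein.

```python
from enum import Enum

class GuideArchitecture(Enum):
    CRRNA_ONLY = "crRNA-only"
    CRRNA_TRACRRNA = "crRNA:tracrRNA"
    CRRNA_SCOUTRNA = "crRNA:scoutRNA"

# Pairings as stated in the text; the key spellings are illustrative labels.
PREFERRED_GUIDE = {
    "SpCas9": GuideArchitecture.CRRNA_TRACRRNA,    # Type II
    "AsCas12a": GuideArchitecture.CRRNA_ONLY,      # Type V, crRNA-only
    "MAD7": GuideArchitecture.CRRNA_ONLY,
    "Cas12j": GuideArchitecture.CRRNA_ONLY,
    "AaCas12b": GuideArchitecture.CRRNA_TRACRRNA,  # Type V, crRNA:tracrRNA
    "Cas12f1": GuideArchitecture.CRRNA_TRACRRNA,
    "CasMINI": GuideArchitecture.CRRNA_TRACRRNA,
    "Cas12d15": GuideArchitecture.CRRNA_SCOUTRNA,  # Type V, crRNA:scoutRNA
}

def preferred_guide(cas_name: str) -> GuideArchitecture:
    """Look up the guide architecture described as preferable for a given Cas protein."""
    try:
        return PREFERRED_GUIDE[cas_name]
    except KeyError:
        raise KeyError(f"no guide architecture recorded for {cas_name!r}") from None
```

A lookup of this kind is only a convenience for the listed examples and says nothing about Cas proteins that the text does not pair with a guide type.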

Cells

In one aspect, the disclosure relates to a cell that comprises the nucleic acid deaminase according to the present disclosure or a base editor comprising said nucleic acid deaminase.

In another aspect, the disclosure relates to a cell that is a genetically engineered or a modified cell, that has been engineered or modified by the nucleic acid deaminase according to the disclosure or a base editor comprising said nucleic acid deaminase. In another aspect, the disclosure relates to a cell that is a genetically engineered or modified cell, that has been engineered or modified by one of the methods of the disclosure (a modified cell obtainable by methods disclosed herein). In some embodiments, the cell has been isolated from a human or non-human subject.

In some embodiments, the cell can be selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.

In some embodiments, the cell can be selected from the group consisting of epithelial cells, endothelial cells and mesenchymal cells. In more specific embodiments, the cell can be selected from the group consisting of bone cells, muscle cells, fat cells, nerve cells, and other cells derived from these cells.

In some embodiments, the cell can be selected from the group consisting of a stem cell, an immune cell and a lymphocyte. Examples of the stem cell include embryonic stem cells, embryonic stem (ES)-like stem cells, fetal stem cells, adult stem cells, pluripotent stem cells, induced pluripotent stem cells (iPSCs), hematopoietic stem cells (HSCs), multipotent stem cells, oligopotent stem cells, unipotent stem cells and others derived from these cells. Examples of the immune cell include a T cell, a B cell, an NK cell, a macrophage, a mixture thereof, and others described herein. In some embodiments, the cell is an in vitro cell that is not the human body, at the various stages of its formation and development.

Also provided is a pharmaceutical composition comprising an effective amount of the cell and a pharmaceutically acceptable carrier.

Expression constructs

The present disclosure also provides expression constructs and vectors encoding the nucleic acid deaminase variants and/or base editors, pharmaceutical compositions thereof and cells containing the nucleic acid deaminase variants, base editors or expression constructs thereof.

In some embodiments, the nucleic acid deaminase variant and/or base editors are delivered to the cells as an mRNA, such that the nucleic acid deaminase variant is expressed from the mRNA encoding it.

Numerous promoters for expression of nucleic acids are known and may be used in the practice of the disclosure. Such promoters are selected from constitutive, regulatable and tissue-specific promoters. In some instances, common promoters for mammalian expression are, e.g., CMV promoter, human U6 (hU6) promoter, SV40 promoter/enhancer, viral LTRs, promoters of constitutively expressed genes (actin, GAPDH), promoters of genes expressed in a tissue-specific manner, promoters of inducible genes (e.g., steroid hormones). For example, a hU6 promoter may be used for expression of gRNA molecules and a CMV promoter may be used for expression of RNA encoding a protein. In some embodiments, the nucleic acid molecules comprise a cell type specific promoter. A "cell type specific" promoter is a promoter that primarily drives expression in certain cell types in one or more organs.

In some instances, the nucleic acid sequence contains one or more of a coding region, an open reading frame (ORF), an expression cassette, a promoter/enhancer or terminator region, an untranslated region (UTR), and a cleavage site.

Applications

The nucleic acid deaminase, base editors, cells and methods disclosed herein have a wide variety of utilities including modifying and editing (e.g., inactivating or activating) a target polynucleotide in a multitude of cell types, such as the cell types listed above. As such, the nucleic acid deaminase, base editors, cells and methods have a broad spectrum of applications, e.g., in research and therapy, and can be used in a wide spectrum of organisms: animals, plants, prokaryotic organisms, fungi, etc.

The disclosure provides a nucleic acid deaminase, base editors, cells and methods that can be used for the prevention and/or treatment of a disease or disorder in a subject, wherein a mutation can be corrected by a nucleic acid deaminase provided herein (or a base editor provided herein). In some embodiments, a method of modifying a nucleic acid is provided, wherein said method comprises administering to a subject having such a disease an effective amount of the nucleic acid deaminase of the disclosure (or the base editor of the disclosure) that corrects the point mutation that leads to a loss or gain of function in a gene product or introduces a deactivating mutation into a disease-associated gene. In some embodiments, the disease is a proliferative disease, such as cancer. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease or a metabolic disease, such as, for example, type I diabetes. In other embodiments, the nucleic acid deaminase of the disclosure introduces a deactivating point mutation into an oncogene (e.g., in the treatment of a proliferative disease). A deactivating mutation may, in some embodiments, generate a premature stop codon in a coding sequence, which results in the expression of a truncated gene product, e.g., a truncated protein lacking the function of the full-length protein. Other diseases that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated gene will be known to those of skill in the art, and the disclosure is not limited in this respect.

In some embodiments, the target site in the nucleic acid comprises a point mutation associated with a disease or a disorder. In some embodiments, the activity of the nucleic acid deaminase results in a correction of the point mutation. In some embodiments, the target site in the nucleic acid comprises a G to A point mutation associated with a disease or disorder, and the deamination of the deoxyadenosine results in a sequence that is not associated with a disease or disorder. In some embodiments, the disease or disorder to be treated is selected from the group consisting of phenylketonuria, von Willebrand disease, a neoplastic disease, a neoplastic disease associated with a mutant PTEN or BRCA1, alpha-1 antitrypsin deficiency and Li-Fraumeni syndrome.
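The correction of a G to A point mutation described above can be illustrated with a minimal Python sketch. It models deamination of a deoxyadenosine simply as an A-to-G change at one position of a DNA string. The function names and the toy sequences are hypothetical and are not taken from the disclosure or from any disease-associated gene.

```python
def find_g_to_a_sites(reference: str, mutant: str) -> list[int]:
    """Return 0-based indices where the reference has G and the mutant has A."""
    if len(reference) != len(mutant):
        raise ValueError("sequences must have equal length")
    return [
        i for i, (ref_base, mut_base) in enumerate(zip(reference, mutant))
        if ref_base == "G" and mut_base == "A"
    ]


def deaminate_adenine(seq: str, index: int) -> str:
    """Model dA deamination (read out as A-to-G) at a 0-based index."""
    if seq[index] != "A":
        raise ValueError(f"position {index} is {seq[index]!r}, not 'A'")
    return seq[:index] + "G" + seq[index + 1:]


# Hypothetical toy sequences for illustration only.
reference = "ATGGCCGGTACC"
mutant = "ATGGCCGATACC"

sites = find_g_to_a_sites(reference, mutant)
corrected = mutant
for site in sites:
    corrected = deaminate_adenine(corrected, site)
```

In this sketch the corrected string equals the reference only because every G to A difference is edited and no bystander adenines are touched; a real editing window may contain additional adenines.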

In some embodiments, the nucleic acid deaminase, base editors, cells and methods of the disclosure can be used in cosmetic applications to modify and/or edit (e.g., inactivating or activating) a target polynucleotide, to correct a mutation or introduce desired edits into a target polynucleotide. For cosmetic applications, the nucleic acid deaminase, base editors or cells of the disclosure can be introduced into or, otherwise, applied to the subject for cleansing, beautifying, promoting attractiveness, or altering the appearance.

Other applications of the nucleic acid deaminase, base editors, cells and methods of the disclosure are in biological computing.

The nucleic acid deaminase, base editors, cells and methods disclosed herein can be used to generate a transgenic non-human animal or plant having one or more genetic modification(s) of interest. In some embodiments, the transgenic non-human animal is homozygous for the genetic modification. In some embodiments, the transgenic non-human animal is heterozygous for the genetic modification. In some embodiments, the transgenic non-human animal is a vertebrate, for example, a fish (e.g., zebra fish, gold fish, puffer fish, cave fish, etc.), an amphibian (e.g., frog, salamander, etc.), a bird (e.g., chicken, turkey, etc.), a reptile (e.g., snake, lizard, etc.), or a mammal (e.g., an ungulate, e.g., a pig, a cow, a goat, a sheep, etc.; a lagomorph, e.g., a rabbit; a rodent, e.g., a rat, a mouse; or a non-human primate).

Examples

Hereinafter, the present disclosure will be more specifically described based on examples. It is understood that various other aspects may be practiced, given the general and detailed descriptions provided elsewhere herein.

The wild-type (WT) TadA protein predominantly acts on tRNA. The data provided in the below Examples describe the engineering of WT TadA deaminases capable of dA base editing (TadA deletion variant), e.g., for use as base editors.

Example 1: Base editing with ecTadAΔ deletion variants

Escherichia coli TadA (ecTadA) variants were engineered and used in a base editing system as an adenosine deaminase acting on DNA, e.g., single-stranded DNA (ssDNA). One, two or three specific amino acids in the β4-β5 loop segment (amino acid residues from 105 to 130) of the wild-type ecTadA (WT) were deleted to generate seven (7) different ecTadAΔ deletion variants (D1: ecTadA-D108Δ; D2: ecTadA-R107Δ; D3: ecTadA-A106Δ; D4: ecTadA-D108Δ-R107Δ; D5: ecTadA-D108Δ-R107Δ-A106Δ; D6: ecTadA-D108Δ-A106Δ, and D7: ecTadA-R107Δ-A106Δ), as summarised in Fig. 1.

The ecTadAΔ deletion mutations correspond to a specific in-frame deletion of three, six or nine consecutive nucleotides at the DNA level of the WT ecTadA. The nucleic acids encoding the ecTadAΔ deletion variants were cloned into expression plasmids and expressed in vitro in HEK293T cells, along with a SpCas9 nickase and a site-specific guide RNA. A non-transfected (NT) control was included in this study as a negative control.
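The relationship between residue deletions and in-frame codon deletions can be sketched in Python as follows. This is a minimal illustration, not the design procedure used in the Examples. The protein window (assumed here to span residues 104 to 112 of WT ecTadA), the codon choices and all function names are assumptions made for the sketch; only five of the variants are shown.

```python
def delete_residues(protein: str, positions: set[int], first_residue_number: int) -> str:
    """Remove residues identified by their numbering in the full-length protein."""
    return "".join(
        aa for offset, aa in enumerate(protein)
        if first_residue_number + offset not in positions
    )


def delete_codons(cds: str, positions: set[int], first_residue_number: int) -> str:
    """Remove whole codons so that the reading frame is preserved."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(
        codon for offset, codon in enumerate(codons)
        if first_residue_number + offset not in positions
    )


# Assumed window around the deleted residues (A106, R107, D108); illustrative only.
WINDOW_START = 104
WT_WINDOW = "FGARDAKTG"
# Hypothetical codons encoding the window above, one codon per residue.
WT_CDS = "TTCGGTGCGCGTGATGCGAAAACCGGT"

VARIANTS = {
    "D1": {108},
    "D2": {107},
    "D3": {106},
    "D4": {107, 108},
    "D5": {106, 107, 108},
}

for name, deleted in VARIANTS.items():
    protein = delete_residues(WT_WINDOW, deleted, WINDOW_START)
    cds = delete_codons(WT_CDS, deleted, WINDOW_START)
    removed_nt = len(WT_CDS) - len(cds)
    print(f"{name}: {protein} ({removed_nt} nucleotides removed)")
```

Deleting whole codons keeps the downstream reading frame intact, which is why one, two or three residue deletions correspond to three, six or nine nucleotides.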

The ability to perform base editing using systems that utilise the ecTadA variants compared to the NT was measured via Sanger sequencing 72 hours after transfection. Experiments were performed in duplicates. To measure base editing efficiency (N-to-G) for Site2a from fluorescence-based Sanger sequencing data, the following processing steps were applied:

(i) estimating average background noises for each base (over the span of the input guide sequence) from the trace data to account for variability in sequencing by: a) identifying and capping outliers in the noise data by median absolute deviation (MAD) method; b) calculating the geometric mean;

(ii) calculating the percentage of background-subtracted peak values of each base over their sum at that position.
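Processing steps (i) and (ii) can be sketched in Python as shown below. This is a simplified illustration under stated assumptions, not the analysis code used to generate the Figures. It assumes that the noise for a base is the set of its peak values at positions where a different base is called, that outliers are capped at the median plus or minus three MADs, and that values below 1.0 are floored before the geometric mean is taken. The trace values, the cut-off and all names are hypothetical.

```python
import math
from statistics import median

BASES = "ACGT"


def cap_outliers_mad(values, k=3.0):
    """Clip values to median +/- k * MAD (k is an assumed cut-off)."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    low, high = med - k * mad, med + k * mad
    return [min(max(v, low), high) for v in values]


def geometric_mean(values, floor=1.0):
    """Geometric mean; values are floored so that the logarithm is defined."""
    logs = [math.log(max(v, floor)) for v in values]
    return math.exp(sum(logs) / len(logs))


def background_per_base(trace, called):
    """Step (i): average background noise for each base over the guide span."""
    background = {}
    for base in BASES:
        # Assumption: noise = signal of `base` where another base is called.
        noise = [peaks[base] for peaks, call in zip(trace, called) if call != base]
        background[base] = geometric_mean(cap_outliers_mad(noise)) if noise else 0.0
    return background


def base_percentages(peaks, background):
    """Step (ii): background-subtracted peak of each base over their sum."""
    corrected = {b: max(peaks[b] - background[b], 0.0) for b in BASES}
    total = sum(corrected.values())
    if total == 0:
        return {b: 0.0 for b in BASES}
    return {b: 100.0 * corrected[b] / total for b in BASES}


# Hypothetical peak values for a four-base span; position 1 is partly edited.
called = "GAAC"
trace = [
    {"A": 4, "C": 5, "G": 900, "T": 6},
    {"A": 600, "C": 5, "G": 300, "T": 4},
    {"A": 880, "C": 6, "G": 8, "T": 5},
    {"A": 5, "C": 910, "G": 7, "T": 6},
]
background = background_per_base(trace, called)
percentages = [base_percentages(peaks, background) for peaks in trace]
```

In this toy input the genuine G signal at the edited position enters the noise set for G, and the MAD capping prevents it from inflating the background estimate.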

The base editing efficiency data (N-to-G) were expressed as the mean. In the context of N-to-G editing (TadA variants), G-to-G events were excluded from the analysis, shown as (not applicable) in the Figures.

Methods

Engineering of ecTadAΔ deletion variants for use in a base editing system

The nucleotide sequence corresponding to the different ecTadAΔ deletion variants was designed in-silico and synthesized as a G-block™, which was cloned into an expression plasmid.

Plasmids expressing a WT ecTadA or an ecTadAΔ deletion variant under a CMV promoter were generated. A total of seven constructs were generated, which included an engineered ecTadAΔ deletion variant with one, two or three amino acid deletions. Each ecTadAΔ deletion variant contained an NLS sequence N-terminally fused. The regions from amino acids 105-130 of the resulting protein sequences of the seven variants are displayed in Fig. 1 and the corresponding deletion variants are named D1 through D7.

CRISPR-Cas protein and guide RNA (gRNA)

An SpCas9 nuclease containing the D10A mutation (i.e., an SpCas9 nickase, also referred to as SpCas9_D10A) was used. A plasmid was used to express SpCas9 nickase. An SpCas9 gRNA was used to target the SpCas9_D10A to Site2A (SEQ ID NO: 24). The gRNA was expressed under the control of a U6 promoter in a plasmid format.

Table 4: Nucleotide sequences of the gRNA targeting Site2a. In the above sequence annotation, the MS2 hairpin sequence is displayed in italics and bold whilst the targeting region of the gRNA (20 nucleotides) is underlined, "t" represents uracil in the RNA sequence.

Cell culture and transfection

For each of the seven ecTadAΔ deletion variants, the three plasmids described above were co-transfected in HEK293T cells to provide all three components of the base editing platform: (a) the SpCas9_D10A nickase, (b) the guide RNA targeting the Site2A genomic locus, hereinafter referred to as Site2A, and (c) the respective ecTadAΔ deletion variant catalysing the A-to-G transition. Briefly, HEK293T cells were grown in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 100 U ml⁻¹ penicillin/streptomycin. 24 hours prior to transfection, 10,000 cells were seeded into a single well of a 96-well plate. After 24 hours, the cells were lipid transfected with 200 ng of plasmid DNA (75 ng SpCas9_D10A plasmid, 75 ng ecTadAΔ deletion variant plasmid, and 50 ng gRNA plasmid) using DharmaFECT™ DUO transfection reagent (Dharmacon™ reagents; T-2012-02). Transfections were set up in triplicates for each ecTadAΔ deletion variant.
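The per-well plasmid amounts stated above can be checked and scaled with a short Python sketch. The helper name and the optional pipetting overage are assumptions added for illustration; the Example does not specify how master mixes were prepared.

```python
# Per-well plasmid DNA amounts in ng, as stated in the protocol above.
PER_WELL_NG = {
    "SpCas9_D10A plasmid": 75,
    "ecTadA deletion variant plasmid": 75,
    "gRNA plasmid": 50,
}


def master_mix(per_well_ng, wells, overage=0.0):
    """Scale per-well amounts to `wells` wells with an optional fractional overage.

    The overage parameter is a hypothetical convenience, not part of the protocol.
    """
    factor = wells * (1.0 + overage)
    return {name: ng * factor for name, ng in per_well_ng.items()}


total_per_well = sum(PER_WELL_NG.values())  # total DNA per well in ng
triplicate_mix = master_mix(PER_WELL_NG, wells=3)

for name, ng in triplicate_mix.items():
    print(f"{name}: {ng:.0f} ng for 3 wells")
```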

Cell lysis

Briefly, 72 hours after transfection, the medium was removed, the cells were washed 1x with PBS and 50 µl of Trypsin enzyme (Thermo Fisher Scientific) was added to each well. After the cells were dissociated, 20 µl of the resuspended cell solution were transferred to a 96-well plate and were incubated with 60 µl of DirectPCR lysis reagent (Viagen Biotech) under the following conditions: 55°C for 45 minutes followed by 95°C for 15 minutes. The cell lysates were stored at -20°C until further use.

PCR amplification of targeted regions

For the amplification of the targeted sites, 1 µl of cell lysate obtained via the cell lysis protocol using the DirectPCR lysis reagent was used per PCR reaction.

Primers used in the PCR are Site2A-Forward (Site2A-F; SEQ ID NO: 87) and Site2A-Reverse (Site2A-R; SEQ ID NO: 88).

Table 5: Primers used in the PCR reaction to amplify the targeted Site2A.

Determination of the level of base editing

PCR-amplified targeted regions were Sanger sequenced. Base editing efficiency (N-to-G) for Site2a was measured from the fluorescence-based Sanger sequencing data by applying processing steps (i) and (ii) set out earlier in this Example.

The base editing efficiency data (N-to-G) were expressed as the mean. In the context of N-to-G editing (TadA variants), G-to-G events were excluded from the analysis, shown as (not applicable).

Results and Discussion

As shown in Fig. 2, only background-level A-to-G transition was observed at Site2A for the non-transfected (NT) control. In contrast, each ecTadAΔ deletion variant tested induced A-to-G transition, with base editing efficiencies ranging from 5 to 21% depending on the ecTadAΔ deletion variant. These results show that specific single amino acid deletions in the β4-β5 loop segment of ecTadA are sufficient to drastically change the adenosine deaminase activity of ecTadA on ssDNA, from non-detectable to very active when used in a base editing system, as demonstrated by the level of A-to-G transition observed in the experiments. These data also imply that targeted deletions in the β4-β5 loop segment of ecTadA are not detrimental to the whole protein structure and activity.

Example 2: Base editing with ecTadAΔ deletion variants fused to an aptamer binding protein as part of an aptamer-recruitment-dependent base editing system

The ecTadAΔ deletion variants described in Example 1 were engineered, fused to an MS2 Coat Protein (MCP) binding protein and used in a base editing system as an adenosine deaminase acting on ssDNA. Said base editing system comprises the recruitment of the ecTadAΔ deletion variant via an MS2 aptamer on the gRNA that binds to the MCP protein. The ecTadAΔ deletion variants summarised in Fig. 1 were fused to MCP via a linker (see Table 2) according to the general structure (NLS-TADA-LINKER-MCP; see Table 6). The different ecTadAΔ-MCP deletion variants (D1-MCP, D2-MCP, D3-MCP, D4-MCP, D5-MCP, D6-MCP or D7-MCP) and WT-MCP were cloned into expression plasmids and expressed in vitro in HEK293T cells, along with a plasmid expressing an SpCas9_D10A and one of four specific gRNA plasmids for each of four targets Site2A, Site2c, Site45 and Site312. These gRNAs contained a 3' RNA aptamer hairpin (MS2 hairpin sequence) which mediates the recruitment of the respective ecTadAΔ-MCP. A WT-MCP was included in this study as a negative control.

The ability to perform base editing using aptamer-recruitment-dependent base editing systems that utilise the ecTadAΔ deletion variants compared to WT-MCP was measured via Sanger sequencing 72 hours after transfection. Experiments were performed in duplicates. To measure base editing efficiency (N-to-G) from fluorescence-based Sanger sequencing data, the following processing steps were applied:

(i) estimating average background noises for each base (over the span of the input guide sequence) from the trace data to account for variability in sequencing by: a) identifying and capping outliers in the noise data by median absolute deviation (MAD) method; b) calculating the geometric mean;

(ii) calculating the percentage of background-subtracted peak values of each base over their sum at that position.

The base editing efficiency data (N-to-G) were expressed as the mean. In the context of N-to-G editing (TadA variants), G-to-G events were excluded from the analysis, shown as (not applicable).

Methods

Engineering of ecTadAΔ-MCP deletion variants to use in an aptamer-recruitment-dependent base editing system

Plasmids expressing a wild-type or a mutant ecTadA fused to MCP under a pCMV promoter were generated. A total of seven constructs were generated, which include an engineered ecTadAΔ deletion variant with one, two or three amino acid deletions. Each ecTadAΔ deletion variant consists of an NLS sequence fused to the ecTadA N-terminus which is fused to MCP via a linker. The amino acid sequences of the seven ecTadAΔ-MCP deletion variants are displayed in Table 6.

Table 6: Amino acid sequences of the seven TadA deletion variants generated (D1-D7), and the amino acid sequence of the wild-type TadA (WT ecTadA; referred to as "WT") used in the experiments. The general structure is "MA-[NLS]-[ecTadA]-[L]-[MCP]". In the above sequence annotation, the N-terminal NLS sequence, [NLS], is highlighted in bold, the linker, [L], is depicted in italic/bold/underlined and the MCP is underlined (NLS-TADA-Linker1-MCP). ecTadA deletion variants: ecTadA-D108Δ (referred to as "D1"); ecTadA-R107Δ (referred to as "D2"); ecTadA-A106Δ (referred to as "D3"); ecTadA-D108Δ-R107Δ (referred to as "D4"); ecTadA-D108Δ-R107Δ-A106Δ (referred to as "D5"); ecTadA-R107Δ-A106Δ (referred to as "D6"); ecTadA-D108Δ-A106Δ (referred to as "D7").

CRISPR-Cas protein and gRNA

An SpCas9 nuclease containing the D10A mutation (i.e., an SpCas9 nickase, also referred to as SpCas9_D10A) was used. A plasmid was used to express SpCas9 nickase. An SpCas9 gRNA scaffold containing an RNA aptamer hairpin (MS2 hairpin sequence) in 3' position was used to target the SpCas9_D10A to the four different genomic targets Site2a (SEQ ID NO: 24), Site2c (SEQ ID NO: 25), Site45 (SEQ ID NO: 26) and Site312 (SEQ ID NO: 27). The gRNAs were expressed under the control of a U6 promoter in a plasmid format. The sequences of the different gRNAs are displayed in Table 7.

Table 7: Nucleotide sequences of different gRNAs targeting Site2c, Site2a, Site45 and Site312. In the above sequence annotation, the MS2 hairpins are displayed in italics and bold whilst the targeting region of the gRNA (20 nucleotides) is underlined, "t" represents uracil in the RNA sequence.

Cell culture and transfection

For each of the seven ecTadAΔ deletion variants, the three plasmids described above were co-transfected in HEK293T cells to provide all three components of the base editing platform: (a) the SpCas9_D10A nickase, (b) the guide RNA containing the MS2 aptamer and targeting the different sites, and (c) the respective ecTadAΔ deletion variant fused to MCP and catalysing the A-to-G transition. HEK293T cells were grown in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 100 U ml⁻¹ penicillin/streptomycin. 24 hours prior to transfection, 10,000 cells were seeded into a single well of a 96-well plate. After 24 hours, the cells were lipid transfected with 200 ng of plasmid DNA (75 ng SpCas9_D10A plasmid, 75 ng ecTadAΔ-MCP or WT-MCP plasmid, and 50 ng gRNA plasmid) using DharmaFECT™ DUO (Dharmacon™ reagents; T-2012-02).

Cell lysis

72 hours after transfection, the medium was removed, the cells were washed 1x with PBS and 50 µl of Trypsin enzyme (Thermo Fisher Scientific) was added to each well. After the cells were dissociated, 20 µl of the resuspended cell solution were transferred to a 96-well plate and were incubated with 60 µl of DirectPCR lysis reagent (Viagen Biotech) under the following conditions: 55°C for 45 minutes followed by 95°C for 15 minutes. The cell lysates were stored at -20°C until further use.

PCR amplification of targeted regions

For the amplification of the targeted sites, 1 µl of cell lysate obtained using the DirectPCR lysis reagent was used per PCR reaction. Primers used in the PCR reaction for the different sites are detailed in Table 8.

Table 8: Primers used in the PCR reaction to amplify the targeted sites.

Determination of the level of base editing

PCR-amplified targeted regions were Sanger sequenced. To measure base editing efficiency (N-to-G) for Site2a, Site2c, Site45 and Site312 from fluorescence-based Sanger sequencing data, the following processing steps were applied:

(i) estimating average background noises for each base (over the span of the input guide sequence) from the trace data to account for variability in sequencing by: a) identifying and capping outliers in the noise data by median absolute deviation (MAD) method; b) calculating the geometric mean;

(ii) calculating the percentage of background-subtracted peak values of each base over their sum at that position.

Results and Discussion

As presented in Figs. 3A, 3B, 3C and 3D, only background-level A-to-G transition was observed across all four sites (Site2a, Site2c, Site45 and Site312) when the WT-MCP was used, similar to NT. Among the seven ecTadAΔ deletion variants, the three variants with only one amino acid deletion (D1-MCP (ecTadA-D108Δ); D2-MCP (ecTadA-R107Δ); D3-MCP (ecTadA-A106Δ)) and the double deletion variant D4-MCP (ecTadA-D108Δ-R107Δ) were the most reproducibly active variants across the four sites, with notably up to 56% base editing on A5 of Site2c with D3-MCP (see Fig. 3B), up to 48% base editing on A6 of Site45 with D4-MCP (see Fig. 3C) and up to 15% base editing on A7 of Site312 with D4-MCP (see Fig. 3D).

These results show that targeted single amino acid deletions in ecTadA and the fusion of the resulting ecTadAΔ deletion variant to MCP result in adenosine deaminase activity of the ecTadAΔ-MCP on ssDNA, when used in an aptamer-recruitment-dependent base editing system. This is demonstrated by the level of A-to-G transition observed in these experiments. These data also imply that targeted deletions in the β4-β5 loop segment of ecTadA in combination with the fusion to MCP via a linker are not detrimental to the whole protein structure and function.

Example 3: Base editing with ecTadAΔ deletion variants or ecTadA substitution variants fused to an aptamer binding protein as part of an aptamer-recruitment-dependent base editing system

The ecTadAΔ-MCP deletion variants (D1-MCP, D2-MCP, D3-MCP) were tested alongside ecTadA substitution variants which contain a substitution instead of the deletion at the same amino acid position. D1-MCP (ecTadA-D108Δ) was compared to ecTadA-D108N-MCP (substitution variant for D1; SEQ ID NO: 50), D2-MCP (ecTadA-R107Δ) was compared to ecTadA-R107C-MCP (substitution variant for D2; SEQ ID NO: 51), and D3-MCP (ecTadA-A106Δ) was compared to ecTadA-A106V (substitution variant for D3; SEQ ID NO: 52). All ecTadAΔ deletion variants fused to MCP via a linker or ecTadA substitution variants fused to MCP via a linker were cloned into expression plasmids and expressed in vitro in HEK293T cells along with an SpCas9_D10A plasmid and one of four specific gRNA plasmids for each of four targets: Site2a, Site2c, Site45 and Site312. These gRNAs contained a 3' RNA aptamer hairpin (MS2 hairpin sequence) which mediates the recruitment of the respective ecTadAΔ-MCP deletion variants or ecTadA substitution variants. A non-transfected (NT) control was included in this study as a negative control.

The ability to perform base editing using aptamer-recruitment-dependent base editing systems that utilise the ecTadAΔ-MCP deletion variants compared to the ecTadA-MCP substitution variants was determined via Sanger sequencing 72 hours after transfection. Experiments were performed in duplicates or triplicates. Base editing efficiency (N-to-G) was measured from the fluorescence-based Sanger sequencing data by applying processing steps (i) and (ii) as described in Example 1.

Methods

Engineering of ecTadA deletion variants or ecTadA substitution variants fused to MCP to use in an aptamer-recruitment-dependent base editing system.

Plasmids expressing the ecTadAΔ deletion variants fused to MCP (D1-MCP, D2-MCP and D3-MCP; see sequences in Table 6) or ecTadA substitution variants corresponding to the deletion variants but substituting the deleted amino acid by another amino acid fused to MCP under a CMV promoter were generated. A total of six constructs were generated, three of which included an engineered ecTadA with one amino acid deletion and three of which included a substitution mutation at the same position. Each variant contained an NLS sequence fused to the ecTadA N-terminus and an MCP ligand fused to the C-terminus via Linker1 (see Table 2).

TadA substitution constructs:

MA-NLS-ecTadA-D108N-linker-MCP amino acid sequence ("reference construct for D1"; annotation below: NLS in bold; linker in italics, bold and underlined; MCP underlined)

MAPKKKRKVSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTA

HAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVL

HHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTDELKTPLGDTTHTSPPCPAPELL

GGPMASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKV

EVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 50)

MA-NLS-ecTadA-R107C-linker-MCP amino acid sequence ("reference construct for D2"; annotation below: NLS in bold; linker in italics, bold and underlined; MCP underlined)

MAPKKKRKVSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTA

HAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGACDAKTGAAGSLMDVL

HHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTDELKTPLGDTTHTSPPCPAPELL

GGPMASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKV

EVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 51)

MA-NLS-ecTadA-A106V-linker-MCP amino acid sequence ("reference construct for D3"; annotation below: NLS in bold; linker in italics, bold and underlined; MCP underlined)

MAPKKKRKVSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTA

HAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGVRDAKTGAAGSLMDVL

HHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTDELKTPLGDTTHTSPPCPAPELL

GGPMASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKV

EVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 52)

CRISPR-Cas protein and gRNA

An SpCas9 nuclease containing the D10A mutation (i.e., an SpCas9 nickase, also referred to as SpCas9_D10A) was used. A plasmid was used to express SpCas9 nickase. An SpCas9 gRNA scaffold containing an RNA aptamer hairpin (MS2 hairpin sequence) in 3' position was used to target the SpCas9_D10A to the four genomic targets: Site2a (SEQ ID NO: 24), Site2c (SEQ ID NO: 25), Site45 (SEQ ID NO: 26) and Site312 (SEQ ID NO: 27). The gRNAs were expressed under the control of a U6 promoter in a plasmid format. The sequences of the different gRNAs directed to the four targets are displayed in Table 7 above.

Cell culture and transfection

For each of the three ecTadAΔ deletion variants and the three corresponding ecTadA substitution variants, the three plasmids described above were co-transfected in HEK293T cells to provide all three components of the base editing platform: (a) the SpCas9_D10A nickase, (b) the guide RNA containing the MS2 aptamer and targeting the different sites, and (c) the respective ecTadAΔ deletion variant or ecTadA substitution variant fused to MCP and catalysing the A-to-G transition. HEK293T cells were grown in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 100 U ml⁻¹ penicillin/streptomycin. 24 hours prior to transfection, 10,000 cells were seeded into a single well of a 96-well plate. After 24 hours, the cells were lipid transfected with 200 ng of plasmid DNA (75 ng SpCas9_D10A vector, 75 ng ecTadAΔ-MCP or corresponding ecTadA substitution variant fused to MCP vector, and 50 ng gRNA expression vector) using DharmaFECT™ DUO (Dharmacon™ reagents; T-2012-02).

Cell lysis

72 hours after transfection, the medium was removed, the cells were washed 1x with PBS and 50 µl of Trypsin enzyme (Thermo Fisher Scientific) was added to each well. After the cells were dissociated, 20 µl of the resuspended cell solution were transferred to a 96-well plate and were incubated with 60 µl of DirectPCR lysis reagent (Viagen Biotech) under the following conditions: 55°C for 45 minutes followed by 95°C for 15 minutes. The cell lysates were stored at -20°C.

PCR amplification of targeted regions

Determination of the level of base editing

Results and Discussion

As can be seen in the data presented in Figs. 4A, 4B, 4C, and 4D, no A-to-G transition was observed at any of the four sites in the NT controls. Each of the ecTadAΔ-MCP deletion variants tested induced A-to-G editing, with base editing efficiencies ranging from 5 to 60% depending on the ecTadAΔ-MCP deletion variant and the site. Overall, D1-MCP and the corresponding ecTadA-D108N-MCP showed comparable editing efficiency over the four sites, with D1-MCP at Site2c (see Fig. 4B) and Site45 (see Fig. 4C) showing the highest levels of base editing. Among the three ecTadA substitution variants, two variants, ecTadA-R107C-MCP and ecTadA-A106V, showed negligible or no base editing across the four sites. In contrast, the corresponding deletion variants D2-MCP and D3-MCP did show base editing activity across the four sites (see Figs. 4A-4C).

These results showed that targeted single amino acid deletions in the β4-β5 loop segment of TadA (i.e., amino acid residues from 105 to 130 of the wild-type ecTadA or a corresponding region in other wild-type TadA proteins from other organisms) and the fusion of the resulting ecTadA deletion variant to MCP resulted in DNA editing activity (adenosine deaminase activity) that was equal to or higher than that of the ecTadA substitution variants (which have a substitution at the position of the corresponding deleted residue and are also fused to MCP), when used in an aptamer-recruitment-dependent base editing system. This was demonstrated by the level of N-to-G editing observed in these experiments, as shown by Sanger sequencing.

Example 4: Base editing with homo- and heterodimer ecTadAΔ deletion variants

An ecTadA variant was engineered and fused to either (i) an identical ecTadA variant to generate an ecTadA homodimer or (ii) WT to generate a heterodimer. The ecTadA variant was then further fused to an MCP binding protein and used in a base editing system as an adenosine deaminase effector. The homo- and heterodimers fused to MCP were recruited via an MS2 aptamer present on the gRNA. Different configurations of the homo- and heterodimer ecTadAΔ deletion variants fused to MCP were prepared. The schematics of the homo- and heterodimer ecTadAΔ deletion variants are depicted in the left part of Fig. 5A and Fig. 5B. Base editing efficiency was tested and compared to an ecTadAΔ deletion variant monomer fused to MCP.

The D1-MCP monomer, N-terminally fused MCP-D1 monomer, D1-D1-MCP homodimer, D1-WT-MCP heterodimer, N-terminally fused MCP-D1-WT heterodimer or WT-D1-MCP heterodimer were cloned into expression plasmids and expressed in vitro in HEK293T cells along with an SpCas9_D10A plasmid and one of two specific gRNA plasmids for each of two targets Site2c and Site45. These gRNAs contained an RNA aptamer hairpin (MS2 hairpin sequence) in 3' position, which mediates the recruitment of the respective monomer or homo- or heterodimer ecTadAΔ-MCP deletion variant.

The ability to perform base editing using aptamer-recruitment-dependent base editing systems that utilise a homo- or heterodimer of an ecTadAΔ variant was determined via sequencing 72 hours after transfection. Non-transfected (NT) control was used for comparison. Experiments were performed in duplicates. Base editing efficiency (N-to-G) was measured from the fluorescence-based Sanger sequencing data by applying processing steps (i) and (ii) as described in Example 1.

Methods

Engineering of homo- and heterodimer of an ecTadAΔ deletion variant to use in an aptamer-recruitment-dependent base editing system.

Plasmids expressing the ecTadAΔ deletion variant (D1) as a monomer or as homo- or heterodimer were generated, under a pCMV promoter. A total of seven constructs were designed, as shown below. Each variant also contained an NLS sequence via a linker (not shown).

1. D1-MCP monomer with C-terminally fused MCP (SEQ ID NO: 43),

2. N-terminally fused MCP-D1 monomer (SEQ ID NO: 54),

3. D1-D1-MCP homodimer with C-terminally fused MCP (SEQ ID NO: 55),

4. D1-WT-MCP heterodimer with C-terminally fused MCP (SEQ ID NO: 56),

5. N-terminally fused MCP-D1-WT heterodimer (SEQ ID NO: 57), and

6. WT-D1-MCP heterodimer with C-terminally fused MCP (SEQ ID NO: 58).
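The domain order of the six constructs listed above can be represented as a simple data structure, as in the following Python sketch. It is only an illustration of the naming scheme: the NLS and the linkers are omitted, and the dictionary and function names are hypothetical.

```python
# Domain order (N-terminus to C-terminus) of the listed constructs.
# NLS and linkers are deliberately omitted in this simplified representation.
CONSTRUCTS = {
    "D1-MCP": ("D1", "MCP"),
    "MCP-D1": ("MCP", "D1"),
    "D1-D1-MCP": ("D1", "D1", "MCP"),
    "D1-WT-MCP": ("D1", "WT", "MCP"),
    "MCP-D1-WT": ("MCP", "D1", "WT"),
    "WT-D1-MCP": ("WT", "D1", "MCP"),
}


def classify(domains):
    """Return (oligomeric form, MCP position) for an ordered domain tuple."""
    deaminases = [d for d in domains if d != "MCP"]
    if len(deaminases) == 1:
        form = "monomer"
    elif len(set(deaminases)) == 1:
        form = "homodimer"
    else:
        form = "heterodimer"
    mcp_position = "N-terminal" if domains[0] == "MCP" else "C-terminal"
    return form, mcp_position


for name, domains in CONSTRUCTS.items():
    form, mcp_position = classify(domains)
    print(f"{name}: {form}, {mcp_position} MCP")
```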

CRISPR-Cas protein and gRNA

An SpCas9 nuclease containing the D10A mutation (i.e., an SpCas9 nickase, also referred to as SpCas9_D10A) was used. A plasmid was used to express SpCas9 nickase. An SpCas9 gRNA scaffold containing an RNA aptamer hairpin (MS2 hairpin sequence) in 3' position was used to target the SpCas9_D10A to 2 genomic targets: Site2c (SEQ ID NO: 25) and Site45 (SEQ ID NO: 26). The gRNAs were expressed under the control of a U6 promoter in a plasmid format. The sequences of the different gRNAs are displayed in Table 7 above.

Cell culture and transfection

For each of the monomers, homodimers and heterodimers, the three plasmids described above were co-transfected in HEK293T cells to provide all three components of the base editing platform: (a) the SpCas9_D10A nickase, (b) the guide RNA containing the MS2 aptamer and targeting the different sites, and (c) the respective ecTadAA deletion variant or ecTadA substitution variant fused to MCP and catalysing the A-to-G transition. HEK293T cells were grown in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 100 U ml-1 penicillin/streptomycin. 24 hours prior to transfection, 10,000 cells were seeded into a single well of a 96-well plate. After 24 hours, the cells were lipid transfected with 200 ng of plasmid DNA (75 ng SpCas9_D10A vector, 75 ng ecTadAA-MCP or WT ecTadA-MCP vector, and 50 ng gRNA expression vector) using DharmaFECT™ DUO (Dharmacon™ reagents; T-2012-02).

Cell lysis

72 hours after transfection, the medium was removed, the cells were washed 1x with PBS, and 50 µl of Trypsin enzyme (Thermo Fisher Scientific) was added to each well. After the cells were dissociated, 20 µl of the resuspended cells were transferred to a 96-well plate and incubated with 60 µl of DirectPCR lysis reagent (Viagen Biotech) under the following conditions: 55°C for 45 minutes followed by 95°C for 15 minutes. The cell lysates were stored at -20°C.

PCR amplification of targeted regions

For the amplification of the targeted sites, 1 µl of cell lysate obtained using the DirectPCR lysis reagent was used per PCR reaction. Primers used in the PCR reaction for Site2c and Site45 are detailed in Table 8.

Determination of the level of base editing

PCR-amplified targeted regions were Sanger sequenced and resultant data files were processed. To measure base editing efficiency (N-to-G) for Site2c and Site45 from fluorescence-based Sanger sequencing data, the following processing steps were applied:

Results and Discussion

As is shown in Figs. 5A and 5B, the NT control shows no editing at Site2c. All the homo- and heterodimer configurations of the D1 ecTadAA deletion variant show base editing ranging from 6% to 52% A-to-G transition (see Fig. 5A). The highest mean level of editing was seen with the D1-MCP monomer at position 5 of Site2c. Fig. 5B shows the editing activities of the homo- and heterodimers at Site45. At this site, all variants show editing at position 6, demonstrated by A-to-G conversion. The highest mean level of editing was seen with D1-MCP.

These data show that an ecTadAA deletion variant created through the deletion of one or more amino acids in the β4-β5 loop segment and fused to MCP performs the function of an adenosine deaminase on ssDNA either as a monomer, a homodimer, or as a heterodimer when paired with a WT ecTadA.

Example 5: Base editing with ecTadAA deletion variants using base editing systems comprising a catalytically inactive SpCas9 nuclease

The ecTadAA deletion variant D1 C-terminally fused to MCP (D1-MCP) described in Example 2 was used in combination with a base editing system comprising either a SpCas9_D10A nickase or a nuclease-dead SpCas9. The nuclease-dead SpCas9 is a catalytically inactive SpCas9 enzyme containing a D10A and H840A double-mutation (also referred to as SpCas9_D10A+H840A).

The base editing system further comprised a guide RNA specifically targeting either Site45, SiteB2M or Site2c. The guide RNA contained an RNA aptamer hairpin (MS2 hairpin sequence) in 3' position for the recruitment of D1-MCP.

The individual base editing system components were cloned into expression plasmids and expressed in vitro in HEK293T cells following co-transfection of three individual plasmids. The first plasmid expressed either the SpCas9_D10A nickase or the dead SpCas9, the second plasmid expressed the specific guide RNA targeting either Site45, SiteB2M or Site2c that contained MS2, and the third plasmid expressed the ecTadAA deletion variant (D1-MCP).

The ability to perform base editing using the aptamer-recruitment-dependent base editing system with either a SpCas9_D10A nickase or a nuclease-dead SpCas9 at three target sites was measured via Sanger sequencing 72 hours after transfection. Experiments were performed in duplicate. To measure base editing efficiency (N-to-G) from fluorescence-based Sanger sequencing data, the following processing steps were applied:

Methods

The engineering of ecTadAA deletion variant D1-MCP was performed as described in Example 2.

CRISPR-Cas protein and guide RNA

An SpCas9 nuclease containing the D10A mutation (i.e., an SpCas9 nickase, also referred to as SpCas9_D10A) and an SpCas9 nuclease containing both a D10A and H840A double-mutation (i.e., a catalytically inactive SpCas9 or dead SpCas9 or nuclease-dead SpCas9, also referred to as SpCas9_D10A+H840A) were used. A guide RNA scaffold containing an RNA aptamer hairpin (MS2 hairpin sequence) in 3' position was used to guide the SpCas9_D10A or SpCas9_D10A+H840A to the three different genomic targets Site45 (SEQ ID NO: 26), SiteB2M (SEQ ID NO: 28) and Site2c (SEQ ID NO: 25). The guide RNAs were expressed under the control of a U6 promoter in a plasmid format. The sequences of the different guide RNAs are displayed in Table 9.

Table 9: Nucleotide sequences of different gRNAs targeting Site45, SiteB2M and Site2c. In the above sequence annotation, the MS2 hairpin sequences are displayed in italics and bold whilst the targeting region of the guide RNA (20 nucleotides) is underlined; "t" represents uracil in the RNA sequence.

Cell culture and transfection

The three plasmids described above were co-transfected in HEK293T cells to provide all three components of the base editing platform: (a) the SpCas9_D10A nickase or dead SpCas9, (b) the specific guide RNA for targeting of either Site2c, Site45 or SiteB2M that either did or did not contain MS2, and (c) the ecTadAA deletion variant (D1 or D1-MCP) catalysing the N-to-G conversion. HEK293T cells were grown in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 100 U ml-1 penicillin/streptomycin. 24 hours prior to transfection, 10,000 cells were seeded into a single well of a 96-well plate. After 24 hours, the cells were lipid transfected with 200 ng of plasmid DNA (75 ng SpCas9_D10A or SpCas9_D10A+H840A plasmid, 75 ng D1 or D1-MCP plasmid, and 50 ng gRNA plasmid) using DharmaFECT™ DUO (Dharmacon™ reagents; T-2012-02).

Cell lysis

PCR amplification of targeted regions

For the amplification of the targeted sites, 1 µl of cell lysate obtained using the DirectPCR lysis reagent was used per PCR reaction. Primers used in the PCR reaction for the different sites are detailed in Table 10:

Table 10: Primers used in the PCR reaction to amplify the targeted sites.

Determination of the level of base editing

PCR-amplified targeted regions were Sanger sequenced. To measure base editing efficiency (N-to-G) for Site2c, Site45 and SiteB2M from fluorescence-based Sanger sequencing data, the following processing steps were applied:

Results and Discussion

As is presented in Figs. 6A, 6B and 6C, no N-to-G conversion was observed at any of the three sites (Site45, SiteB2M and Site2c) in the NT control.

In the presence of SpCas9_D10A nickase, the ecTadAA deletion variant D1-MCP induced A-to-G transition of 28% and 30% at the adenosine in position 6 of Site45 (see Fig. 6A), of 7% and 13% at the adenosine in position 5 of SiteB2M (see Fig. 6B) and of 26% and 32% at the adenosine in position 5 of Site2c (see Fig. 6C). In the presence of a dead SpCas9 enzyme (SpCas9_D10A+H840A), the ecTadAA deletion variant D1-MCP also induced A-to-G transition at all three sites (see Figs. 6A, 6B, 6C).
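For orientation, the duplicate values reported above for the nickase condition can be summarised as a mean and a spread per site. The following is a minimal sketch; the dictionary layout and the helper name `summarise` are illustrative assumptions, and only the percentages themselves are taken from the text.

```python
from statistics import mean

# Duplicate A-to-G percentages stated above for D1-MCP with the SpCas9_D10A
# nickase. The keys (site plus edited adenosine position) are illustrative labels.
NICKASE_DUPLICATES = {
    "Site45 A6": (28, 30),
    "SiteB2M A5": (7, 13),
    "Site2c A5": (26, 32),
}


def summarise(duplicates: dict) -> dict:
    """Return the mean and the absolute difference of each duplicate pair."""
    return {
        site: {"mean": mean(values), "spread": max(values) - min(values)}
        for site, values in duplicates.items()
    }


if __name__ == "__main__":
    for site, stats in summarise(NICKASE_DUPLICATES).items():
        print(f"{site}: mean {stats['mean']}%, spread {stats['spread']} percentage points")
```

With only two replicates per condition, the spread is reported instead of a standard deviation.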

These results show that the aptamer-recruitment-dependent base editing system with the ecTadAA deletion variant fused to MCP is able to efficiently base edit target sites with either nickase or dead SpCas9 enzymes. This is demonstrated by the level of A-to-G transition at the respective target adenosines observed in these experiments. These data also suggest that targeted deletions in the β4-β5 loop segment of ecTadA, in combination with the fusion to MCP via a linker, are not detrimental to the whole protein structure and function.

Example 6: Base editing with ecTadAA deletion variant SpCas9 fusion base editor

ecTadAA deletion variants described in Example 1 are fused to the N-terminus or C-terminus of SpCas9 nuclease, SpCas9 nickase or dead SpCas9 nuclease to generate ecTadAA-SpCas9 fusion base editors. The SpCas9 nickase contains a D10A mutation (also referred to as SpCas9_D10A) and the dead SpCas9 nuclease contains a D10A and H840A double-mutation (also referred to as SpCas9_D10A+H840A).

In particular, the ecTadAA deletion variants described in Example 1 (D1, D2, D3, D4 and D7) were fused to the N-terminus of a Cas9 nickase (nCas9; SpCas9_D10A) to generate ecTadAA-nCas9 fusion base editors.

The base editing system comprises a fusion base editor as described above and guide RNAs specifically targeting genomic loci Site2c and Site45 of HEK293T cells.

The individual base editing system components were cloned into expression plasmids and expressed in vitro in HEK293T cells following co-transfection of two individual plasmids. The first plasmid expresses the ecTadAA-nCas9 fusion base editor and the second plasmid expresses a specific guide RNA targeting the genomic locus in the HEK293T cells.

The ability to perform base editing using the different ecTadAA-nCas9 fusion base editors at different target sites in HEK293T cells was measured via Sanger sequencing 72 hours after transfection. Experiments were performed in duplicate. To measure base editing efficiency (N-to-G) from fluorescence-based Sanger sequencing data, the processing steps as in Example 1 were applied.

Methods

Plasmids expressing ecTadAA deletion variants fused to SpCas9 nickase (nCas9; SpCas9_D10A) under a pCMV promoter were generated. A total of five ecTadAA-nCas9 fusion base editor constructs were generated, which include an engineered ecTadAA deletion variant with one or two amino acid deletions (D1, D2, D3, D4 and D7). Each ecTadAA-nCas9 fusion base editor construct consists of an NLS sequence fused to the N-terminus of an ecTadAA, which is fused to the N-terminus of an SpCas9 nickase (nCas9; SpCas9_D10A) via a linker, and an NLS sequence fused to the C-terminus of said SpCas9 nickase. The amino acid sequences of the five ecTadAA-nCas9 fusion base editor constructs are displayed in Table 11 below.

Table 11: Amino acid sequences of the five ecTadAA-nCas9 fusion base editor constructs generated. The N- and C-terminal NLS sequence is highlighted in bold, the linker is depicted in italics and the SpCas9-D10A (nCas9) is underlined (NLS-TADA-LINKER-SpCas9-D10A-NLS).

Cell culture and transfection

For each of the five ecTadAA-nCas9 fusion base editor constructs, the two plasmids described above were co-transfected in HEK293T cells to provide all components of the base editing platform: (a) the guide RNA targeting the Site2c and Site45 sites, and (b) the respective ecTadAA deletion variant fused to SpCas9-D10A. HEK293T cells were grown in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 100 U ml-1 penicillin/streptomycin. 24 hours prior to transfection, 10,000 cells were seeded into a single well of a 96-well plate. After 24 hours, the cells were lipid transfected with 200 ng of plasmid DNA (120 ng ecTadAA-SpCas9-D10A plasmid, and 80 ng gRNA plasmid) using DharmaFECT™ DUO (Dharmacon™ reagents; T-2012-02). A non-transfected (NT) control was included in this study as a negative control.

Cell lysis

PCR amplification of targeted regions

For the amplification of the targeted sites, 1 µl of cell lysate obtained using the DirectPCR lysis reagent was used per PCR reaction. Primers used in the PCR reaction for the two sites are detailed in Table 4 from Example 2.

Determination of the level of base editing

PCR-amplified targeted regions were Sanger sequenced and resultant data files were processed as in Example 2.

Results and Discussion

As is presented in Figs. 8A and 8B, no A-to-G conversion was observed at either site (Site2c and Site45) in the NT condition. All five ecTadAA-nCas9 fusion base editor constructs were active, with variant D1-SpCas9-D10A (ecTadA-D108A) being the most reproducibly active variant, with up to 28% base editing on A5 of Site2c (see Fig. 8A, D1-nCas9).

These results show that a targeted amino acid deletion in ecTadA (e.g., one, two, three or more amino acid residue(s) of amino acid residues 105 to 130 of E.coli TadA shown in SEQ ID NO: 1) and the fusion of the resulting ecTadAA variant to SpCas9-D10A result in adenosine deaminase activity of the ecTadAA, when used in a fusion base editing system. This is demonstrated by the level of A-to-G conversion observed in these experiments.
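The notion of a targeted residue deletion can be illustrated on a short sequence window. The following is a minimal sketch, not the procedure used in the examples: the wild-type window `RVVFGARDAKTGAAG` (taken here as residues 101 to 115) is an assumption reconstructed by placing A106, R107 and D108 between the flanking residues shared by SEQ ID NOs: 9 to 12, and the function name `delete_residues` is hypothetical.

```python
# ASSUMPTION: wild-type E. coli TadA residues 101-115, reconstructed from the
# flanks common to SEQ ID NOs: 9 to 12 with A106-R107-D108 restored.
WINDOW_START = 101
WT_WINDOW = "RVVFGARDAKTGAAG"


def delete_residues(window: str, start: int, deletions: dict) -> str:
    """Remove residues by 1-based position, verifying each expected identity.

    `deletions` maps a residue number to the one-letter code expected there,
    e.g. {108: "D"} for a deletion of D108.
    """
    end = start + len(window) - 1
    for position in deletions:
        if not start <= position <= end:
            raise ValueError(f"position {position} lies outside {start}-{end}")
    kept = []
    for offset, residue in enumerate(window):
        position = start + offset
        if position in deletions:
            if deletions[position] != residue:
                raise ValueError(
                    f"expected {deletions[position]}{position}, found {residue}{position}"
                )
            continue  # residue is deleted
        kept.append(residue)
    return "".join(kept)


if __name__ == "__main__":
    for label, deletions in [
        ("del D108", {108: "D"}),
        ("del R107", {107: "R"}),
        ("del A106", {106: "A"}),
        ("del R107 + D108", {107: "R", 108: "D"}),
    ]:
        print(label, delete_residues(WT_WINDOW, WINDOW_START, deletions))
```

Checking the expected residue identity before deleting guards against an off-by-one error in the numbering, for example when a sequence is listed without its initiator methionine.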

Example 7: Multiplex base editing with an ecTadAA deletion variant and ratAPOBEC1 individually fused to an aptamer binding protein as part of an aptamer-recruitment-dependent base editing system

The ecTadAA deletion variant D1 described in Example 1 and the cytidine deaminase ratAPOBEC1 were individually fused to an MS2 Coat Protein (MCP) binding protein via a linker (e.g., Linker1) to generate D1-MCP and ratAPOBEC1-MCP. They were used in a base editing system as an adenosine deaminase and a cytidine deaminase, respectively, acting on ssDNA simultaneously. Said base editing system comprises the individual recruitment of D1-MCP and ratAPOBEC1-MCP via the MS2 aptamer on the gRNA that binds to the MCP protein.

D1-MCP and ratAPOBEC1-MCP were each separately cloned into two expression plasmids and co-transfected together with a plasmid expressing one of two specific gRNAs for each of the two targets Site2c and SiteB2M into HEK293T cells stably expressing SpCas9_D10A nickase.

Methods

HEK293T cells stably expressing SpCas9_D10A nickase fused to uracil glycosylase inhibitor (UGI; SEQ ID NO: 29) were grown in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS), 100 U ml-1 penicillin/streptomycin and 0.7 µg/ml puromycin. 24 hours prior to transfection, 10,000 cells were seeded into a single well of a 96-well plate. After 24 hours, the cells were lipid transfected with 125 ng of plasmid DNA for the singleplex experiment (75 ng ecTadAA-MCP or 75 ng ratAPOBEC1, and 50 ng gRNA plasmid) or 200 ng of plasmid DNA for the multiplex experiment (75 ng ecTadAA-MCP and 75 ng ratAPOBEC1, and 50 ng gRNA plasmid) using DharmaFECT™ DUO (Dharmacon™ reagents; T-2012-02).
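The plasmid amounts stated above can be checked and scaled with simple arithmetic. The following is a minimal sketch; the per-well amounts are taken from the text, whereas the helper `master_mix` and its `overage` parameter are hypothetical conveniences that are not described in the disclosure.

```python
# Plasmid DNA per well in ng, as stated above for the two experiment types.
SINGLEPLEX_NG = {"deaminase plasmid": 75, "gRNA plasmid": 50}
MULTIPLEX_NG = {
    "ecTadAA-MCP plasmid": 75,
    "ratAPOBEC1 plasmid": 75,
    "gRNA plasmid": 50,
}


def total_ng(mix: dict) -> int:
    """Total plasmid DNA per well in ng."""
    return sum(mix.values())


def master_mix(mix: dict, wells: int, overage: float = 0.10) -> dict:
    """Scale per-well amounts to several wells.

    ASSUMPTION: `overage` is a hypothetical pipetting excess (10% by default).
    """
    factor = wells * (1.0 + overage)
    return {name: amount * factor for name, amount in mix.items()}


if __name__ == "__main__":
    print("singleplex total:", total_ng(SINGLEPLEX_NG), "ng")
    print("multiplex total:", total_ng(MULTIPLEX_NG), "ng")
    print("multiplex, 2 wells:", master_mix(MULTIPLEX_NG, wells=2))
```

The totals reproduce the 125 ng (singleplex) and 200 ng (multiplex) figures, with the gRNA plasmid held at 50 ng in both cases.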

Cell lysis and PCR amplification of targeted regions in Site2c and SiteB2M were performed as described in Example 2. The ability to perform base editing using the aptamer-recruitment-dependent base editing system that utilises both the ecTadAA deletion variant and the cytidine deaminase ratAPOBEC1 simultaneously was measured via Sanger sequencing 72 hours after transfection. Experiments were performed in duplicate. To measure base editing efficiency (N-to-G) from fluorescence-based Sanger sequencing data, the following processing steps were applied:

In the context of N-to-G editing (e.g., TadAA variant), G-to-G events were excluded from the analysis, shown as (not applicable). In the context of N-to-T editing (e.g., APOBEC1), T-to-T events were excluded from the analysis, shown as (not applicable).
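The exclusion rule described above can be sketched as follows. This is an illustrative assumption, not the processing pipeline of the disclosure: the conversion percentage is taken here as the target-base peak height divided by the sum of the four peak heights at a position, and the function names `n_to_x_percent` and `editing_profile` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def n_to_x_percent(ref_base: str, peaks: Dict[str, float], target: str = "G") -> Optional[float]:
    """Percentage of signal from `target` at one position.

    Returns None ("not applicable") when the reference base already equals the
    target, i.e. G-to-G for N-to-G editing or T-to-T for N-to-T editing.
    ASSUMPTION: `peaks` holds Sanger peak heights for A, C, G and T.
    """
    if ref_base.upper() == target.upper():
        return None
    total = sum(peaks.values())
    if total <= 0:
        return None  # no usable signal at this position
    return 100.0 * peaks.get(target.upper(), 0.0) / total


def editing_profile(
    ref_seq: str, trace: List[Dict[str, float]], target: str = "G"
) -> List[Tuple[int, str, Optional[float]]]:
    """Per-position (1-based position, reference base, percent) along a spacer."""
    if len(ref_seq) != len(trace):
        raise ValueError("reference and trace must have the same length")
    return [
        (index + 1, base, n_to_x_percent(base, peaks, target))
        for index, (base, peaks) in enumerate(zip(ref_seq, trace))
    ]


if __name__ == "__main__":
    # Made-up peak heights for a two-base toy reference, for demonstration only.
    toy_trace = [
        {"A": 70, "C": 0, "G": 30, "T": 0},
        {"A": 0, "C": 0, "G": 100, "T": 0},
    ]
    print(editing_profile("AG", toy_trace, target="G"))
```

Passing `target="T"` applies the same rule to N-to-T editing, with T-to-T positions reported as not applicable.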

Results and Discussion

As is presented in Figs. 7A and 7B, only background level N-to-G conversion (D1-MCP) and N-to-T conversion (ratAPOBEC1-MCP) was observed for Site2c and SiteB2M in the absence of the aptamer-recruitment-dependent base editing system (NT).

At Site2c, D1-MCP showed 8% and 30% base editing at the adenosine in position 5 (A5, see Fig. 7A, left column), while ratAPOBEC1 showed 30% and 53% base editing at the cytidine in position 6 (C6, see Fig. 7A, right column). In the multiplex experiment, the presence of both D1-MCP and ratAPOBEC1-MCP resulted in simultaneous base editing at A5 by D1-MCP and at C6 by ratAPOBEC1 (see Fig. 7A, last row). At SiteB2M, D1-MCP showed 4% and 8% base editing at A5 (see Fig. 7B, left column) while ratAPOBEC1 showed between 59% and 63% base editing at C4 and C6 (see Fig. 7B, right column). In the multiplex experiment, the presence of both D1-MCP and ratAPOBEC1-MCP resulted in simultaneous base editing at A5 by D1-MCP and at C4 and C6 by ratAPOBEC1 (see Fig. 7B, last row; Fig. 7C).

These results demonstrate that targeted single amino acid deletions in ecTadA and the fusion of the resulting ecTadAA deletion variant to MCP result in adenosine deaminase activity in a multiplex aptamer-recruitment-dependent base editing system and not only in a singleplex experiment (see, e.g., Example 2). In addition, the presence of UGI at the C-terminus of the SpCas9_D10A nickase does not affect the base editing by the TadAA deletion variant.

Example 8: Base editing with ecTadAA deletion variant in human induced pluripotent stem cells (hiPSCs)

The ecTadAA deletion variant D1 C-terminally fused to MCP (D1-MCP) described in Example 2 was expressed in vitro in hiPSCs from mRNA along with mRNA expressing an SpCas9_D10A and a synthetic gRNA for the Site2c target. The synthetic gRNA contained a 3' RNA aptamer hairpin (MS2 hairpin sequence) which mediates the recruitment of D1-MCP. A non-transfected (NT) control was included in this study as a negative control.

The ability to perform base editing using a system that utilizes the D1-MCP variant compared to the NT control was measured via Sanger sequencing 72 hours after electroporation. The measurement of the base editing efficiency from the sequencing data was performed as described in Example 2.

Methods

mRNA generation of D1-MCP

Plasmids expressing D1-MCP under a T7 promoter were generated. The amino acid sequence of D1-MCP is displayed in Table 6. mRNA was produced using an in vitro transcription (IVT) assay. DNA plasmids were linearized with SspI-HF restriction enzyme (New England Biolabs) for 4 hours at 37°C followed by column cleanup with the DNA Clean & Concentrator kit (Zymo Research). The resulting template was used for in vitro transcription with the T7 RiboMAX™ Large Scale RNA Production System (Promega) followed by RQ1 DNase (Promega) treatment for 15 minutes at 37°C. The resulting mRNA was purified with the Monarch® Spin RNA Cleanup Kit (New England Biolabs), aliquoted, and analyzed for quality via the TapeStation High Sensitivity RNA ScreenTape (Agilent).

CRISPR-Cas protein and gRNA

An SpCas9 nuclease containing the D10A mutation (i.e., an SpCas9 nickase, also referred to as SpCas9_D10A) was used as mRNA to express the SpCas9 nickase in hiPSCs. An SpCas9 gRNA scaffold containing an RNA aptamer hairpin (MS2 hairpin sequence) in 3' position was used to target the SpCas9_D10A to the genomic target Site2c. The gRNA was synthesized in-house, and the sequence of the gRNA is displayed in Table 7.

Cell culture and electroporation

hiPSCs were cultured on Geltrex matrix (Thermo Fisher Scientific)-coated cell culture plates in mTeSR™ PLUS medium (STEMCELL Technologies) at 37°C and 5% CO2. Sub-confluent hiPSC cultures were passaged by non-enzymatic dissociation with Gibco™ Versene™ solution (Thermo Fisher Scientific) and re-plated as clumps.

3 hours prior to electroporation, sub-confluent hiPSC cultures were treated with 10 µM Rho kinase inhibitor (Y-27632, STEMCELL Technologies). hiPSC colonies were dissociated to single cells using Gibco™ StemPro™ Accutase™ reagent (Thermo Fisher Scientific) prior to resuspension in P3 electroporation buffer (Lonza). Dissociated hiPSC samples (250,000 cells per sample) were electroporated with 40 pmol of the gRNA targeting Site2c, 1.6 pmol (2.5 µg) of the SpCas9 nickase mRNA and 1.6 pmol (0.75 µg) of the D1-MCP mRNA using the Amaxa 4D-Nucleofector® System with a 20 µl Nucleocuvette® Strip format (Lonza). After electroporation, hiPSCs were transferred directly to Geltrex matrix-coated cell culture vessels containing pre-warmed mTeSR PLUS medium supplemented with 10 µM Rho kinase inhibitor and incubated at 37°C, 5% CO2 for 24 hours. Culture medium was then changed for mTeSR PLUS medium without additional Rho kinase inhibitor and electroporated hiPSCs were expanded with daily complete medium changes.
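The paired molar and mass amounts given above for the two mRNAs can be cross-checked with a rough conversion. The following is a minimal sketch, reading the masses as micrograms and assuming an average single-stranded RNA nucleotide mass of about 320.5 g/mol plus 159 g/mol for a 5' triphosphate. Both constants are approximations, cap and poly(A) contributions are not modelled separately, and the resulting lengths are implied estimates that are not stated in the disclosure.

```python
# ASSUMPTIONS: approximate constants for single-stranded RNA.
AVERAGE_NT_MASS_G_PER_MOL = 320.5
FIVE_PRIME_TRIPHOSPHATE_G_PER_MOL = 159.0


def molar_mass_g_per_mol(mass_ug: float, amount_pmol: float) -> float:
    """Molar mass implied by a mass (micrograms) and an amount (picomoles)."""
    return (mass_ug * 1e-6) / (amount_pmol * 1e-12)


def implied_length_nt(mass_ug: float, amount_pmol: float) -> float:
    """Approximate transcript length in nucleotides implied by mass and amount."""
    molar_mass = molar_mass_g_per_mol(mass_ug, amount_pmol)
    return (molar_mass - FIVE_PRIME_TRIPHOSPHATE_G_PER_MOL) / AVERAGE_NT_MASS_G_PER_MOL


if __name__ == "__main__":
    for label, mass_ug in [("SpCas9 nickase mRNA", 2.5), ("D1-MCP mRNA", 0.75)]:
        length = implied_length_nt(mass_ug, 1.6)
        print(f"{label}: roughly {length:.0f} nt implied by {mass_ug} ug per 1.6 pmol")
```

Equal molar amounts of the two mRNAs therefore correspond to different masses simply because the transcripts differ in length.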

Cell lysis

72 hours after electroporation, the medium was removed, and the cells were washed once with PBS and resuspended in 50 µl PBS. After the cells were dissociated, 20 µl of the resuspended cell solution were transferred to a 96-well plate and incubated with 60 µl of DirectPCR® lysis reagent (Viagen Biotech) under the following conditions: 55°C for 45 minutes followed by 95°C for 15 minutes. The cell lysates were stored at -20°C until further use.

PCR amplification of targeted regions

For the amplification of the targeted site Site2c, 1 µl of cell lysate obtained using the DirectPCR lysis reagent was used per PCR reaction. Primers used in the PCR reaction for Site2c are detailed in Table 8.

Determination of the level of base editing

PCR-amplified targeted regions were Sanger sequenced, and resultant data files were processed through the Revvity in-house Bio-IT pipeline to generate the editing data for Site2c.

Results and Discussion

As presented in Fig. 9, for the NT control only background level A-to-G transition was observed across the whole 20-nucleotide spacer sequence (Site2c), whereas D1-MCP showed a highly specific base editing activity of 38% at position A5 only. These results demonstrate that the ecTadAA deletion variant is active as an adenine deaminase within an aptamer-recruitment-dependent base editing system using an mRNA delivery system for the ecTadAA deletion variant and the Cas9 nickase together with a synthetic sgRNA in a physiologically relevant model, such as hiPSCs.

Industrial Applicability

The present disclosure is useful in a wide range of fields, including research, medicine (e.g., therapeutic treatment, drug discovery), agriculture (e.g., agricultural, fishery and livestock production, breeding etc.), and biological material production. Thus, the present disclosure is industrially applicable.

Further applications and practical exploitation in industry may be derived from the present description by the skilled person's general knowledge.

Claims

1. A nucleic acid deaminase capable of deaminating a deoxyadenosine (dA) in a nucleic acid molecule, wherein the nucleic acid deaminase comprises an amino acid sequence of a wild-type TadA, wherein one or more amino acid residue(s) of the wild-type TadA are deleted, and wherein the one or more deleted amino acid residue(s) are located within a region of the wild-type TadA contacting the nucleic acid molecule.

2. The nucleic acid deaminase according to claim 1, wherein the wild-type TadA is a bacterial TadA selected from the group consisting of Escherichia coli TadA, Salmonella enterica TadA, Staphylococcus aureus TadA, Streptococcus pyogenes TadA, Salmonella typhi TadA, Haemophilus influenzae TadA, Caulobacter vibrioides TadA, and Shewanella putrefaciens TadA.

3. The nucleic acid deaminase according to claim 1 or 2, wherein the region of the wild-type TadA contacting the nucleic acid molecule comprises amino acid residues 105 to 130 of E. coli TadA shown in SEQ ID NO: 1 or corresponding amino acid residues in a bacterial TadA shown in one of SEQ ID NOs: 2 to 8.

4. The nucleic acid deaminase according to any one of claims 1 to 3, wherein the one or more amino acid residue(s) are one, two or three amino acid residue(s) selected from A106, R107 and D108 of E. coli TadA shown in SEQ ID NO: 1 or a corresponding residue in a bacterial TadA shown in one of SEQ ID NOs: 2 to 8.

5. The nucleic acid deaminase according to any one of claims 1 to 4, wherein the nucleic acid deaminase comprises an amino acid sequence derived from any one of

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 9);

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGADAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 10);

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGRDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 11);

SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGAAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 12);

6. A base editor comprising (i) an effector protein comprising at least one nucleic acid deaminase according to any one of claims 1 to 5.

7. The base editor according to claim 6, wherein the base editor further comprises (ii) a sequence-targeting protein, wherein optionally the sequence-targeting protein is a nuclease comprising at least one catalytically inactive nuclease domain.

8. The base editor according to claim 6 or 7, wherein the base editor further comprises (iii) an RNA-ligand binding complex.

9. The base editor according to any one of claims 6 to 8, wherein the base editor comprises:

(i) an effector protein comprising at least one nucleic acid deaminase according to any one of claims 1 to 5 and at least one ligand capable of binding to a ligand binding moiety,

(ii) a sequence-targeting protein, and

(iii) an RNA-ligand binding complex comprising at least one ligand binding moiety; and wherein the at least one ligand of the effector protein (i) binds to the ligand binding moiety of the RNA-ligand binding complex (iii).

10. An isolated nucleic acid, an expression construct or a vector comprising a sequence encoding the nucleic acid deaminase according to any one of claims 1 to 5 or comprising a sequence encoding the base editor according to any one of claims 6 to 9.

11. A cell comprising the nucleic acid deaminase according to any one of claims 1 to 5, the base editor according to any one of claims 6 to 9, or the isolated nucleic acid, the expression construct or the vector according to claim 10.

12. A kit of parts comprising part (i), wherein part (i) comprises an effector protein comprising at least one nucleic acid deaminase according to any one of claims 1 to 5, and part (ii), wherein part (ii) comprises a sequence-targeting protein.

13. A method for modifying a nucleic acid molecule, said method comprising contacting the nucleic acid molecule with at least one nucleic acid deaminase according to any one of claims 1 to 5 or contacting the nucleic acid molecule with a base editor according to any one of claims 6 to 9 at a target site.

14. The nucleic acid deaminase according to any one of claims 1 to 5 or the base editor according to any one of claims 6 to 9 for use as a medicament.

15. The nucleic acid deaminase according to any one of claims 1 to 5 or the base editor according to any one of claims 6 to 9 for use in treating or preventing a disease by target site-specific modification of a nucleic acid molecule.